`infer`

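`infer` is a function passed to `onCheckpoint` on the `CheckpointContext`. It runs an inference request against the checkpoint adapter that was just saved and returns the raw `Response`. There is no top-level `infer` export: the SDK exposes it only as a callback argument, so every call is automatically scoped to the correct job and checkpoint step.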

```typescript
onCheckpoint: async ({ step, infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "I can't log in." },
    ],
  });
  // stream defaults to true, so res.text() returns the raw SSE event text
  console.log(`step=${step} sample=`, await res.text());
}
```

Input

```typescript
interface InferArgs {
  messages: ChatMessage[];
  temperature?: number;
  topP?: number;
  maxTokens?: number;
  /** Default: true. Set to false to get a single JSON body instead of SSE. */
  stream?: boolean;
  /** Tool definitions for OpenAI-compatible Function Calling. */
  tools?: ToolDefinition[];
  /** "auto" | "none" | "required" | { type: "function", function: { name } } */
  toolChoice?: ToolChoice;
  /** OpenAI-compatible response_format (text / json_object / json_schema). */
  responseFormat?: ResponseFormat;
  /** vLLM extension for constraints responseFormat cannot express (regex / choice / grammar). */
  structuredOutputs?: StructuredOutputs;
  signal?: AbortSignal;
}
```

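| Field | Type | Notes |
| --- | --- | --- |
| `messages` | `ChatMessage[]` | Chat history. A discriminated union of `system` / `user` / `assistant` (with optional `tool_calls`) / `tool` (with `tool_call_id`) that is fully compatible with OpenAI's message shape, so histories that include Function Calling round-trip unchanged. |
| `temperature` | `number?` | Sampling temperature. Uses the backend default when omitted. |
| `topP` | `number?` | Nucleus sampling. Uses the backend default when omitted. |
| `maxTokens` | `number?` | Maximum number of tokens in the response. Uses the backend default when omitted. |
| `stream` | `boolean?` | Defaults to `true` (SSE). Set `false` for a single JSON body. |
| `tools` | `ToolDefinition[]?` | Function definitions the model may call. If `toolChoice` is not set explicitly, it defaults to `"auto"`, as in the OpenAI spec. If the endpoint does not support automatic tool-call extraction, the request fails with `400 tool_calling_not_configured`. |
| `toolChoice` | `ToolChoice?` | `"auto"` / `"none"` / `"required"` / `{ type: "function", function: { name } }`. Only `"auto"` (including the implicit default when `tools` is set) needs the auto-extraction parser; the other values go through the guided-decoding path. |
| `responseFormat` | `ResponseFormat?` | The standard OpenAI switch for structured output: `{ type: "text" }` / `{ type: "json_object" }` / `{ type: "json_schema", json_schema: { name, schema, strict? } }`. Prefer it whenever it can express your constraint. |
| `structuredOutputs` | `StructuredOutputs?` | vLLM extension for constraints that `responseFormat` cannot express. Set at most one of `json` / `regex` / `choice` / `grammar` / `json_object` / `structural_tag` (they are mutually exclusive). Field names are snake_case to match vLLM's wire format (`json_object`, `structural_tag`, `disable_any_whitespace`, `whitespace_pattern`). |
| `signal` | `AbortSignal?` | Aborts the local fetch only. It does not stop work on the backend: the model keeps generating, and you simply stop reading. |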
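To make the mutual-exclusivity rule for `structuredOutputs` concrete, here is a minimal client-side guard you could run before calling `infer`. It is a sketch under assumptions: `StructuredOutputsSketch` and `activeConstraint` are hypothetical names, and the field types are assumed from vLLM's structured-output parameters rather than taken from the SDK's exported `StructuredOutputs` type.

```typescript
// Hypothetical local mirror of the documented StructuredOutputs keys.
// Field types are assumptions; the SDK's real type may differ.
type StructuredOutputsSketch = {
  json?: unknown;
  regex?: string;
  choice?: string[];
  grammar?: string;
  json_object?: boolean;
  structural_tag?: unknown;
  disable_any_whitespace?: boolean;
  whitespace_pattern?: string;
};

// The constraint keys that are mutually exclusive (at most one may be set).
const CONSTRAINT_KEYS = [
  "json",
  "regex",
  "choice",
  "grammar",
  "json_object",
  "structural_tag",
] as const;

type ConstraintKey = (typeof CONSTRAINT_KEYS)[number];

// Returns the single constraint key in use, or null if none is set.
// Throws if more than one constraint key is present.
function activeConstraint(so: StructuredOutputsSketch): ConstraintKey | null {
  const set = CONSTRAINT_KEYS.filter((k) => so[k] !== undefined);
  if (set.length > 1) {
    throw new Error(`structuredOutputs: only one of ${set.join(", ")} may be set`);
  }
  return set[0] ?? null;
}
```

For example, `activeConstraint({ choice: ["positive", "negative"] })` returns `"choice"`, while passing both `regex` and `choice` throws before any request is sent. Modifier fields such as `disable_any_whitespace` are deliberately not counted as constraints.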

Function Calling example

```typescript
onCheckpoint: async ({ infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
    toolChoice: "auto", // also the default whenever tools is set
    stream: false, // single JSON body instead of SSE
  });
  const data = (await res.json()) as { choices: Array<{ message: ChatMessage }> };
  // data.choices[0].message may be { role: "assistant", tool_calls: [...] }
  // instead of a plain text reply.
};
```
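If the model does return `tool_calls`, the OpenAI-compatible message shape lets you run the tools locally and send the results back in a follow-up `infer` call. The sketch below assumes the standard OpenAI layout (`tool_calls[].id`, `function.name`, and `function.arguments` as a JSON string); the `Message` type, `appendToolResults`, and the tool implementations are hypothetical illustrations, not SDK exports.

```typescript
// Hypothetical local types mirroring the OpenAI chat message shapes.
// The SDK's own ChatMessage type may be stricter or richer.
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

type AssistantMessage = Extract<Message, { role: "assistant" }>;

// Local tool implementations keyed by function name (hypothetical).
type ToolImpl = (args: Record<string, unknown>) => string;

// Appends the assistant turn plus one `tool` message per tool call, producing
// a history that can be sent back through infer({ messages, tools }).
function appendToolResults(
  history: Message[],
  assistant: AssistantMessage,
  impls: Partial<Record<string, ToolImpl>>,
): Message[] {
  const next: Message[] = [...history, assistant];
  for (const call of assistant.tool_calls ?? []) {
    const impl = impls[call.function.name];
    const content = impl
      ? impl(JSON.parse(call.function.arguments) as Record<string, unknown>)
      : `error: unknown tool ${call.function.name}`;
    // tool_call_id ties this result to the call the model made
    next.push({ role: "tool", tool_call_id: call.id, content });
  }
  return next;
}
```

The returned array can then be passed as `messages` (with the same `tools`) to a second `infer` call, casting to `ChatMessage[]` if the SDK's type is stricter than this local one.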

Structured output example

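```typescript
// Inside onCheckpoint: async ({ infer }) => { ... }
const res = await infer({
  messages: [{ role: "user", content: "Extract the user's email address." }],
  responseFormat: {
    type: "json_schema",
    json_schema: {
      name: "user",
      schema: {
        type: "object",
        properties: { email: { type: "string", format: "email" } },
        required: ["email"],
      },
      strict: true,
    },
  },
  stream: false, // a single JSON body is easier to parse than SSE
});
```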
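With `stream: false`, the schema-constrained JSON arrives as a string in the first choice's `message.content`, assuming the same `choices[0].message` layout used in the Function Calling example. A minimal, defensive parse might look like this; `CompletionBody`, `User`, and `parseUser` are hypothetical names, and the check only confirms that `email` is a string (it does not validate `format: "email"`).

```typescript
// Hypothetical shape of a non-streaming body, following the
// choices[0].message layout from the Function Calling example.
type CompletionBody = {
  choices: Array<{ message: { role: string; content: string | null } }>;
};

type User = { email: string };

// Pulls message.content out of the first choice and parses it as JSON,
// checking the shape locally instead of trusting it blindly.
function parseUser(body: CompletionBody): User {
  const content = body.choices[0]?.message.content;
  if (content == null) throw new Error("no content in first choice");
  const parsed: unknown = JSON.parse(content);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { email?: unknown }).email !== "string"
  ) {
    throw new Error("response did not match the `user` schema");
  }
  return { email: (parsed as { email: string }).email };
}
```

In the callback this would be used as `const user = parseUser((await res.json()) as CompletionBody);`.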

Output

`infer` returns `Promise<Response>`, the raw Fetch `Response`. The SDK does not parse the body; how you consume it is up to you:

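```typescript
// Assumes `messages` is a ChatMessage[] defined earlier.
// Each option is a separate request: a Response body can only be read once.

// Streaming (default): iterate over the raw body bytes
{
  const res = await infer({ messages });
  for await (const chunk of res.body!) {
    // chunk: a Uint8Array of raw SSE bytes; it may hold several frames or part of one
  }
}

// Or read the whole stream at once
{
  const res = await infer({ messages });
  const text = await res.text();
}

// Or set stream: false and parse the JSON body
{
  const res = await infer({ messages, stream: false });
  const data = await res.json();
}
```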

With `stream: true` (the default), the body is an SSE event stream with the same shape the Studio Playground consumes. The SDK does not currently provide a frame parser for this stream. If you need decoded text deltas, copy the small `extractInferenceDelta` helper from `packages/studio-app/src/lib/api.ts`, or write your own parser with `eventsource-parser`.
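As a dependency-free starting point for your own parser, here is a minimal SSE text-delta decoder. It rests on an assumption the text above does not confirm: that each event is a `data: <json>` line holding an OpenAI-style `chat.completion.chunk` (text in `choices[0].delta.content`) and that the stream ends with `data: [DONE]`. It is not Studio's `extractInferenceDelta`, `textDeltas` is a hypothetical name, and it skips SSE features such as multi-line `data:` fields.

```typescript
// ASSUMPTION: frames are `data: <json>` lines carrying OpenAI-style
// chat.completion.chunk objects, terminated by `data: [DONE]`.
type ChunkJson = { choices?: Array<{ delta?: { content?: string | null } }> };

// Yields text deltas from a byte stream such as `res.body`.
async function* textDeltas(body: AsyncIterable<Uint8Array>): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const bytes of body) {
    // { stream: true } keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(bytes, { stream: true });
    let newline: number;
    // Handle only complete lines; a trailing partial line stays buffered.
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).replace(/\r$/, "");
      buffer = buffer.slice(newline + 1);
      // Skip blank separators, comments (":"), and event:/id:/retry: fields
      if (!line.startsWith("data:")) continue;
      const payload = line.slice("data:".length).trimStart();
      if (payload === "[DONE]") return;
      const chunk = JSON.parse(payload) as ChunkJson;
      const delta = chunk.choices?.[0]?.delta?.content;
      if (delta) yield delta;
    }
  }
}
```

Usage: `for await (const delta of textDeltas(res.body!)) { text += delta; }`. Buffering by line rather than by network chunk matters because chunk boundaries are not guaranteed to line up with SSE frames.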

Constraints

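`infer` exists only on `CheckpointContext`; the SDK has no equivalent for completed jobs. For those, call the cloud API directly or run training again. The Studio Playground is the UI-level route for chatting with a completed adapter.
Calls are scoped to `{ kind: "checkpoint", jobId, step }`. From inside `onCheckpoint` you cannot point `infer` at a different checkpoint or a different model.
The function is not memoized: every call reaches the backend.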
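Because every call reaches the backend, repeating an identical prompt within one checkpoint costs a round trip each time. One way to avoid that is a small memoizing wrapper created inside `onCheckpoint`, so its cache never outlives a single step. This is a sketch: `Args`, `InferFn`, and `memoizeInfer` are hypothetical stand-ins for `InferArgs` and the real `infer`, and the wrapper caches the body text because a `Response` body can only be read once.

```typescript
// Hypothetical minimal stand-ins so the sketch runs on its own; in real code
// these would be InferArgs and the infer passed to onCheckpoint.
type Args = {
  messages: Array<{ role: string; content: string }>;
  stream?: boolean;
};
type InferFn = (args: Args) => Promise<{ text(): Promise<string> }>;

// Returns a function that sends each distinct request to the backend once
// and replays the cached body text for identical repeats.
function memoizeInfer(infer: InferFn): (args: Args) => Promise<string> {
  const cache = new Map<string, Promise<string>>();
  return (args) => {
    // JSON.stringify is order-sensitive: reordered keys count as a new request.
    const key = JSON.stringify(args);
    let hit = cache.get(key);
    if (!hit) {
      // Cache the text, not the Response: a body can only be consumed once.
      hit = infer(args).then((res) => res.text());
      cache.set(key, hit);
      // Forget failed requests so a later call can retry.
      hit.catch(() => cache.delete(key));
    }
    return hit;
  };
}
```

Creating the wrapper per callback (`const cachedInfer = memoizeInfer(infer);` at the top of `onCheckpoint`) keeps results from one step from being replayed for the next.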

Use cases

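Mid-training sanity checks. Compare the step-50 and step-100 checkpoints on a fixed prompt. If the loss curve (the metric tracking the model's error) looks fine but the outputs are degrading, you find out before training finishes.
Custom early stopping (automatically cutting training short). Pair `infer` with a simple eval prompt; if the outputs drift, stop training with `controller.abort()` (see `abortSignal`) and stop the backend with `trainer.cancel()`. See the Early Stopping recipe for details.
Live previews in your own UI. Send checkpoint outputs to Slack, an internal review queue, or a preview channel in your own app.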
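For the early-stopping case, a common pattern is to stop only after several consecutive bad checkpoints, so a single noisy sample does not end the run. The sketch below is a hypothetical patience counter: `passesEval` stands in for whatever check your eval prompt actually needs, and stopping itself would still go through `controller.abort()` and `trainer.cancel()` as described above.

```typescript
// Hypothetical eval: treats an output as acceptable if it is non-empty and
// contains an expected phrase. Replace with a check that fits your task.
function passesEval(output: string, expected: string): boolean {
  return output.trim().length > 0 && output.includes(expected);
}

// Returns a function that records one checkpoint's output and reports true
// once `patience` consecutive checkpoints have failed the eval.
function createDriftMonitor(patience: number) {
  let failures = 0;
  return (output: string, expected: string): boolean => {
    // A passing checkpoint resets the streak; a failing one extends it.
    failures = passesEval(output, expected) ? 0 : failures + 1;
    return failures >= patience;
  };
}
```

Inside `onCheckpoint`, you would feed the decoded sample text to the monitor and, when it returns `true`, trigger the abort and cancel.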
