infer

infer is a function provided on the CheckpointContext argument passed to your onCheckpoint callback. It runs an inference request bound to the just-saved checkpoint adapter and returns the raw Response. There is no top-level infer export; the SDK exposes it as a callback argument so that the call is automatically scoped to the right job and checkpoint step.
onCheckpoint: async ({ step, infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "I can't log in." },
    ],
  });
  console.log(`step=${step} sample=`, await res.text());
}

Input

interface InferArgs {
  messages: ChatMessage[];
  temperature?: number;
  topP?: number;
  maxTokens?: number;
  /** Default: true. Set false to get a single JSON body instead of SSE. */
  stream?: boolean;
  /** OpenAI-compatible function-calling tool definitions. */
  tools?: ToolDefinition[];
  /** "auto" | "none" | "required" | { type: "function", function: { name } } */
  toolChoice?: ToolChoice;
  /** OpenAI-compatible response_format (text / json_object / json_schema). */
  responseFormat?: ResponseFormat;
  /** vLLM structured outputs (regex / choice / grammar) for cases response_format can't express. */
  structuredOutputs?: StructuredOutputs;
  signal?: AbortSignal;
}
| Field | Type | Notes |
| --- | --- | --- |
| messages | ChatMessage[] | Chat history. Discriminated union over system / user / assistant (with optional tool_calls) / tool (with tool_call_id) — matches the OpenAI message shape, so a tool-calling history can round-trip. |
| temperature | number? | Sampling temperature. Backend default if omitted. |
| topP | number? | Nucleus sampling. Backend default if omitted. |
| maxTokens | number? | Maximum response tokens. Backend default if omitted. |
| stream | boolean? | Default true (SSE). Set false for a single JSON body. |
| tools | ToolDefinition[]? | Function declarations the model is allowed to call. When set without an explicit toolChoice, the OpenAI-compatible default "auto" applies; the underlying endpoint must be configured for auto-tool extraction or the request returns 400 tool_calling_not_configured. |
| toolChoice | ToolChoice? | "auto" / "none" / "required" / { type: "function", function: { name } } — only "auto" (and the default when tools is present) needs the auto-extraction parser; the rest go through the guided-decoding path. |
| responseFormat | ResponseFormat? | OpenAI's standard structured-output knob: { type: "text" }, { type: "json_object" }, or { type: "json_schema", json_schema: { name, schema, strict? } }. Prefer this when expressible. |
| structuredOutputs | StructuredOutputs? | vLLM extension for constraints responseFormat can't express. Mutually exclusive: pick at most one of json / regex / choice / grammar / json_object / structural_tag. Field names are snake_case (json_object, structural_tag, disable_any_whitespace, whitespace_pattern) to match vLLM's wire format. |
| signal | AbortSignal? | Aborts the local fetch (example below). Does not stop work on the backend; the model finishes generating but you stop reading. |
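For example, a single probe call can be bounded with a timeout. A minimal sketch using the standard AbortSignal.timeout helper (the prompt is illustrative); this only stops the local read, not backend generation:

// Give up on reading after 30s; the backend still finishes generating.
const res = await infer({
  messages: [{ role: "user", content: "Quick smoke test." }],
  signal: AbortSignal.timeout(30_000),
});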

Tool calling example

onCheckpoint: async ({ infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
    toolChoice: "auto",
    stream: false,
  });
  const data = (await res.json()) as { choices: Array<{ message: ChatMessage }> };
  // data.choices[0].message may be { role: "assistant", tool_calls: [...] }
};
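Because the message shape is OpenAI-compatible, you can append the assistant's tool_calls message and a matching tool result, then call infer again. A hedged sketch continuing inside the same onCheckpoint callback; the hard-coded weather value stands in for your own tool implementation:

const msg = data.choices[0].message;
if (msg.role === "assistant" && msg.tool_calls?.length) {
  const call = msg.tool_calls[0];
  const args = JSON.parse(call.function.arguments) as { city: string };

  // Run the tool yourself; this result is a stand-in for a real weather lookup.
  const weather = { city: args.city, tempC: 18, conditions: "cloudy" };

  const followUp = await infer({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
      msg, // assistant message carrying tool_calls
      { role: "tool", tool_call_id: call.id, content: JSON.stringify(weather) },
    ],
    stream: false,
  });
  console.log(await followUp.json());
}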

Structured-output example

const res = await infer({
  messages: [{ role: "user", content: "Extract the user's email." }],
  responseFormat: {
    type: "json_schema",
    json_schema: {
      name: "user",
      schema: {
        type: "object",
        properties: { email: { type: "string", format: "email" } },
        required: ["email"],
      },
      strict: true,
    },
  },
});
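For constraints that JSON Schema cannot express, structuredOutputs is the escape hatch. A minimal sketch assuming the choice variant accepts an array of allowed strings, mirroring vLLM's guided choice:

// Force the answer to be exactly one of the listed strings.
const res = await infer({
  messages: [{ role: "user", content: "Is this ticket about billing or shipping?" }],
  structuredOutputs: { choice: ["billing", "shipping"] },
  stream: false,
});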

Output

infer returns Promise<Response>: the raw Fetch Response. The SDK does not parse the body; you decide how to consume it:
// Streaming (default)
const res = await infer({ messages });
for await (const chunk of res.body!) {
  // chunk: Uint8Array of one or more SSE frames
}

// Or read the whole stream at once
const text = await res.text();

// Or, if you set stream: false, parse the JSON body
const res = await infer({ messages, stream: false });
const data = await res.json();
When stream: true (the default), the body is an SSE event stream in the same shape Studio’s Playground consumes. The SDK does not currently expose a frame parser for this stream; if you need decoded text deltas, copy the small extractInferenceDelta helper from packages/studio-app/src/lib/api.ts or write a parser around eventsource-parser.
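Until then, a hand-rolled loop is small. A hedged sketch that splits SSE frames and prints text deltas; it assumes the stream carries OpenAI-style chat-completion chunks (choices[0].delta.content) and a terminal [DONE] frame:

const res = await infer({ messages });
const decoder = new TextDecoder();
let buffer = "";

for await (const chunk of res.body!) {
  buffer += decoder.decode(chunk, { stream: true });

  // SSE frames are separated by a blank line.
  const frames = buffer.split("\n\n");
  buffer = frames.pop() ?? ""; // keep any trailing partial frame

  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") continue;
      // Assumption: OpenAI-style chat-completion chunk shape.
      const delta = JSON.parse(payload)?.choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}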

Constraints

  • infer lives only on CheckpointContext. There is no equivalent for completed jobs from the SDK side; for that path use the cloud-api directly or trigger the run again. Studio’s Playground is the UI-level route to chat with a completed adapter.
  • The call is scoped to { kind: "checkpoint", jobId, step }. You cannot retarget it to a different checkpoint or a different model from inside onCheckpoint.
  • The function is not memoized: every call hits the backend.

When you would use it

  • Sanity check during a run. Compare a checkpoint at step 50 to one at step 100 against a fixed prompt. If the loss curve looks fine but outputs are degraded, you find out before the run finishes.
  • Custom early stopping. Combine with a simple eval prompt: if outputs diverge, abort the run via controller.abort() (see abortSignal) and call trainer.cancel() to stop the backend. See the Early stopping recipe for the full pattern; a minimal sketch follows after this list.
  • Live preview into your own UI. Send the checkpoint output to Slack, an internal review queue, or your own app’s preview channel.
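A minimal sketch of the early-stopping pattern from the second bullet. The AbortController wiring and trainer.cancel() follow the Early stopping recipe and abortSignal pages referenced above; the length check is a stand-in for a real eval:

const controller = new AbortController();
// Pass controller.signal to the run as its abortSignal (see the abortSignal page).

onCheckpoint: async ({ step, infer }) => {
  const res = await infer({
    messages: [{ role: "user", content: "Summarize our refund policy in one sentence." }],
    stream: false,
  });
  const data = (await res.json()) as { choices: Array<{ message: { content?: string } }> };
  const output = data.choices[0]?.message?.content ?? "";

  // Stand-in eval: treat an empty or implausibly short answer as divergence.
  if (output.trim().length < 10) {
    console.warn(`step=${step}: degraded output, stopping early`);
    await trainer.cancel(); // stop the backend run
    controller.abort();     // stop waiting locally
  }
}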