infer

infer is a function provided on the CheckpointContext argument passed to your onCheckpoint callback. It runs an inference request bound to the just-saved checkpoint adapter and returns the raw Response. There is no top-level infer export; the SDK exposes it as a callback argument so that the call is automatically scoped to the right job and checkpoint step.
onCheckpoint: async ({ step, infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "I can't log in." },
    ],
  });
  console.log(`step=${step} sample=`, await res.text());
}

Input

interface InferArgs {
  messages: ChatMessage[];
  temperature?: number;
  topP?: number;
  maxTokens?: number;
  /** Default: true. Set false to get a single JSON body instead of SSE. */
  stream?: boolean;
  /** OpenAI-compatible function-calling tool definitions. */
  tools?: ToolDefinition[];
  /** "auto" | "none" | "required" | { type: "function", function: { name } } */
  toolChoice?: ToolChoice;
  /** OpenAI-compatible response_format (text / json_object / json_schema). */
  responseFormat?: ResponseFormat;
  /** vLLM structured outputs (regex / choice / grammar) for cases response_format can't express. */
  structuredOutputs?: StructuredOutputs;
  signal?: AbortSignal;
}
| Field | Type | Notes |
| --- | --- | --- |
| messages | ChatMessage[] | Chat history. Discriminated union over system / user / assistant (with optional tool_calls) / tool (with tool_call_id) — matches the OpenAI message shape, so a tool-calling history can round-trip. |
| temperature | number? | Sampling temperature. Backend default if omitted. |
| topP | number? | Nucleus sampling. Backend default if omitted. |
| maxTokens | number? | Maximum response tokens. Backend default if omitted. |
| stream | boolean? | Default true (SSE). Set false for a single JSON body. |
| tools | ToolDefinition[]? | Function declarations the model is allowed to call. When set without an explicit toolChoice, the OpenAI-compatible default "auto" applies; the underlying endpoint must be configured for auto-tool extraction or the request returns 400 tool_calling_not_configured. |
| toolChoice | ToolChoice? | "auto" / "none" / "required" / { type: "function", function: { name } } — only "auto" (and the default when tools is present) needs the auto-extraction parser; the rest go through the guided-decoding path. |
| responseFormat | ResponseFormat? | OpenAI's standard structured-output knob: { type: "text" }, { type: "json_object" }, or { type: "json_schema", json_schema: { name, schema, strict? } }. Prefer this when expressible. |
| structuredOutputs | StructuredOutputs? | vLLM extension for constraints responseFormat can't express. Mutually exclusive: pick at most one of json / regex / choice / grammar / json_object / structural_tag. Field names are snake_case (json_object, structural_tag, disable_any_whitespace, whitespace_pattern) to match vLLM's wire format. |
| signal | AbortSignal? | Aborts the local fetch (example below). Does not stop work on the backend; the model finishes generating but you stop reading. |
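For example, a single probe call can be bounded with a timeout. A minimal sketch using the standard AbortSignal.timeout helper (the prompt is illustrative); this only stops the local read, not backend generation:

// Give up on reading after 30s; the backend still finishes generating.
const res = await infer({
  messages: [{ role: "user", content: "Quick smoke test." }],
  signal: AbortSignal.timeout(30_000),
});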

Tool calling example

onCheckpoint: async ({ infer }) => {
  const res = await infer({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
    toolChoice: "auto",
    stream: false,
  });
  const data = (await res.json()) as { choices: Array<{ message: ChatMessage }> };
  // data.choices[0].message may be { role: "assistant", tool_calls: [...] }
};
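Because the message shape is OpenAI-compatible, you can append the assistant's tool_calls message and a matching tool result, then call infer again. A hedged sketch continuing inside the same onCheckpoint callback; the hard-coded weather value stands in for your own tool implementation:

const msg = data.choices[0].message;
if (msg.role === "assistant" && msg.tool_calls?.length) {
  const call = msg.tool_calls[0];
  const args = JSON.parse(call.function.arguments) as { city: string };

  // Run the tool yourself; this result is a stand-in for a real weather lookup.
  const weather = { city: args.city, tempC: 18, conditions: "cloudy" };

  const followUp = await infer({
    messages: [
      { role: "user", content: "What's the weather in Tokyo?" },
      msg, // assistant message carrying tool_calls
      { role: "tool", tool_call_id: call.id, content: JSON.stringify(weather) },
    ],
    stream: false,
  });
  console.log(await followUp.json());
}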

Structured-output example

const res = await infer({
  messages: [{ role: "user", content: "Extract the user's email." }],
  responseFormat: {
    type: "json_schema",
    json_schema: {
      name: "user",
      schema: {
        type: "object",
        properties: { email: { type: "string", format: "email" } },
        required: ["email"],
      },
      strict: true,
    },
  },
});
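For constraints that JSON Schema cannot express, structuredOutputs is the escape hatch. A minimal sketch assuming the choice variant accepts an array of allowed strings, mirroring vLLM's guided choice:

// Force the answer to be exactly one of the listed strings.
const res = await infer({
  messages: [{ role: "user", content: "Is this ticket about billing or shipping?" }],
  structuredOutputs: { choice: ["billing", "shipping"] },
  stream: false,
});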

Output

infer returns Promise<Response>: the raw Fetch Response. The SDK does not parse the body; you decide how to consume it:
// Streaming (default)
const res = await infer({ messages });
for await (const chunk of res.body!) {
  // chunk: Uint8Array of one or more SSE frames
}

// Or read the whole stream at once
const text = await res.text();

// Or, if you set stream: false, parse the JSON body
const res = await infer({ messages, stream: false });
const data = await res.json();
When stream: true (the default), the body is an SSE event stream in the same shape Studio’s Playground consumes. The SDK does not currently expose a frame parser for this stream; if you need decoded text deltas, copy the small extractInferenceDelta helper from packages/studio-app/src/lib/api.ts or write a parser around eventsource-parser.
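Until then, a hand-rolled loop is small. A hedged sketch that splits SSE frames and prints text deltas; it assumes the stream carries OpenAI-style chat-completion chunks (choices[0].delta.content) and a terminal [DONE] frame:

const res = await infer({ messages });
const decoder = new TextDecoder();
let buffer = "";

for await (const chunk of res.body!) {
  buffer += decoder.decode(chunk, { stream: true });

  // SSE frames are separated by a blank line.
  const frames = buffer.split("\n\n");
  buffer = frames.pop() ?? ""; // keep any trailing partial frame

  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") continue;
      // Assumption: OpenAI-style chat-completion chunk shape.
      const delta = JSON.parse(payload)?.choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}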

Constraints

  • infer lives only on CheckpointContext. There is no equivalent for completed jobs from the SDK side; for that path use the cloud-api directly or trigger the run again. Studio’s Playground is the UI-level route to chat with a completed adapter.
  • The call is scoped to { kind: "checkpoint", jobId, step }. You cannot retarget it to a different checkpoint or a different model from inside onCheckpoint.
  • The function is not memoized: every call hits the backend.

When you would use it

  • Sanity check during a run. Compare a checkpoint at step 50 to one at step 100 against a fixed prompt. If the loss curve looks fine but outputs are degraded, you find out before the run finishes.
  • Custom early stopping. Combine with a simple eval prompt: if outputs diverge, abort the run via controller.abort() (see abortSignal) and call trainer.cancel() to stop the backend. See the Early stopping recipe for the full pattern; a minimal sketch follows after this list.
  • Live preview into your own UI. Send the checkpoint output to Slack, an internal review queue, or your own app’s preview channel.
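A minimal sketch of the early-stopping pattern from the second bullet. The AbortController wiring and trainer.cancel() follow the Early stopping recipe and abortSignal pages referenced above; the length check is a stand-in for a real eval:

const controller = new AbortController();
// Pass controller.signal to the run as its abortSignal (see the abortSignal page).

onCheckpoint: async ({ step, infer }) => {
  const res = await infer({
    messages: [{ role: "user", content: "Summarize our refund policy in one sentence." }],
    stream: false,
  });
  const data = (await res.json()) as { choices: Array<{ message: { content?: string } }> };
  const output = data.choices[0]?.message?.content ?? "";

  // Stand-in eval: treat an empty or implausibly short answer as divergence.
  if (output.trim().length < 10) {
    console.warn(`step=${step}: degraded output, stopping early`);
    await trainer.cancel(); // stop the backend run
    controller.abort();     // stop waiting locally
  }
}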