# infer
`infer` is a function passed into `onCheckpoint` on `CheckpointContext`. It runs an inference request bound to the just-saved checkpoint adapter and returns the raw `Response`. There is no top-level `infer` export; the SDK exposes it as a callback argument so that the call is automatically scoped to the right job and checkpoint step.
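A minimal sketch of where `infer` shows up. Only the `infer` callback on the context is the documented surface here; the `trainer` object and the `onCheckpoint` registration shape are illustrative assumptions:

```ts
// Sketch: `infer` arrives on the CheckpointContext handed to your
// onCheckpoint callback. `trainer` and the hook name are assumptions;
// adapt them to your actual SDK setup.
trainer.onCheckpoint(async (ctx) => {
  const res = await ctx.infer({
    messages: [{ role: "user", content: "Say hello." }],
    stream: false, // single JSON body instead of the default SSE stream
  });
  console.log(await res.json());
});
```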
## Input
| Field | Type | Notes |
|---|---|---|
| `messages` | `ChatMessage[]` | Chat history. Discriminated union over `system` / `user` / `assistant` (with optional `tool_calls`) / `tool` (with `tool_call_id`); matches the OpenAI message shape, so a tool-calling history can round-trip. |
| `temperature` | `number?` | Sampling temperature. Backend default if omitted. |
| `topP` | `number?` | Nucleus sampling. Backend default if omitted. |
| `maxTokens` | `number?` | Maximum response tokens. Backend default if omitted. |
| `stream` | `boolean?` | Default `true` (SSE). Set `false` for a single JSON body. |
| `tools` | `ToolDefinition[]?` | Function declarations the model is allowed to call. When set without an explicit `toolChoice`, the OpenAI-compatible default `"auto"` applies; the underlying endpoint must be configured for auto-tool extraction or the request returns `400 tool_calling_not_configured`. |
| `toolChoice` | `ToolChoice?` | `"auto"` / `"none"` / `"required"` / `{ type: "function", function: { name } }`. Only `"auto"` (and the default when `tools` is present) needs the auto-extraction parser; the rest go through the guided-decoding path. |
| `responseFormat` | `ResponseFormat?` | OpenAI's standard structured-output knob: `{ type: "text" }`, `{ type: "json_object" }`, or `{ type: "json_schema", json_schema: { name, schema, strict? } }`. Prefer this when the constraint is expressible this way. |
| `structuredOutputs` | `StructuredOutputs?` | vLLM extension for constraints `responseFormat` can't express. Mutually exclusive: pick at most one of `json` / `regex` / `choice` / `grammar` / `json_object` / `structural_tag`. Field names are snake_case (`json_object`, `structural_tag`, `disable_any_whitespace`, `whitespace_pattern`) to match vLLM's wire format. |
| `signal` | `AbortSignal?` | Aborts the local fetch. Does not stop work on the backend; the model finishes generating, but you stop reading. |
## Tool calling example
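A sketch under the OpenAI-compatible shapes described in the Input table, inside the `onCheckpoint` callback shown above. The weather tool and the `choices[0].message.tool_calls` response path are illustrative assumptions, not guaranteed SDK types:

```ts
// Sketch: let the checkpoint adapter call a function. Follows the
// OpenAI-compatible wire shapes documented above; the tool itself and
// the response-body shape are assumptions for illustration.
const res = await ctx.infer({
  messages: [{ role: "user", content: "What's the weather in Oslo?" }],
  stream: false,
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Look up the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  // No toolChoice: the OpenAI-compatible default "auto" applies, so the
  // endpoint must have auto-tool extraction configured or the request
  // fails with 400 tool_calling_not_configured.
});

const body = await res.json();
// Assumes the OpenAI-compatible response shape.
for (const call of body.choices?.[0]?.message?.tool_calls ?? []) {
  console.log(call.function.name, call.function.arguments);
}
```

To force one specific function instead, pass `toolChoice: { type: "function", function: { name: "get_weather" } }`; per the table above, that goes through the guided-decoding path rather than the auto-extraction parser.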
## Structured-output example
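A sketch using the standard `responseFormat` knob, again inside the `onCheckpoint` callback; the schema itself is illustrative:

```ts
// Sketch: constrain the checkpoint's output to a JSON schema via
// OpenAI's standard structured-output knob. The schema is illustrative.
const res = await ctx.infer({
  messages: [
    { role: "user", content: "Extract the city and country: 'I grew up in Lyon, France.'" },
  ],
  stream: false,
  responseFormat: {
    type: "json_schema",
    json_schema: {
      name: "location",
      strict: true,
      schema: {
        type: "object",
        properties: {
          city: { type: "string" },
          country: { type: "string" },
        },
        required: ["city", "country"],
        additionalProperties: false,
      },
    },
  },
});

const body = await res.json();
// Assumes the OpenAI-compatible response shape; on success, content is a
// JSON string matching the schema.
console.log(JSON.parse(body.choices[0].message.content));
```

For a constraint `responseFormat` can't express, swap in the vLLM extension instead, e.g. `structuredOutputs: { choice: ["yes", "no"] }`; remember to set at most one of its constraint kinds.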
## Output
`infer` returns `Promise<Response>`: the raw Fetch `Response`. The SDK does not parse the body; you decide how to consume it:
- With `stream: true` (the default), the body is an SSE event stream in the same shape Studio's Playground consumes. The SDK does not currently expose a frame parser for this stream; if you need decoded text deltas, copy the small `extractInferenceDelta` helper from `packages/studio-app/src/lib/api.ts`, write a parser around `eventsource-parser`, or hand-roll one as in the sketch below.
- With `stream: false`, the body is a single JSON payload; read it with `await res.json()`.
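A hand-rolled reader for the streaming case. It assumes OpenAI-style frames (`data: {...}` lines carrying deltas at `choices[0].delta.content`, terminated by `data: [DONE]`); verify the frame shape against `extractInferenceDelta` before relying on it:

```ts
// Sketch: decode text deltas from the SSE body by hand. The frame shape
// (OpenAI-style `data:` lines, deltas at choices[0].delta.content, and a
// `[DONE]` sentinel) is an assumption; confirm against the Playground.
const res = await ctx.infer({
  messages: [{ role: "user", content: "Tell me a short story." }],
  stream: true,
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // SSE events end with a blank line; keep any trailing partial event
  // buffered until more bytes arrive.
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    for (const line of event.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length).trim();
      if (payload === "[DONE]") continue;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}
```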
## Constraints
- `infer` lives only on `CheckpointContext`. There is no equivalent for completed jobs from the SDK side; for that path, use the cloud-api directly or trigger the run again. Studio's Playground is the UI-level route to chat with a completed adapter.
- The call is scoped to `{ kind: "checkpoint", jobId, step }`. You cannot retarget it to a different checkpoint or a different model from inside `onCheckpoint`.
- The function is not memoized: every call hits the backend.
## When you would use it
- Sanity check during a run. Compare a checkpoint at step 50 to one at step 100 against a fixed prompt. If the loss curve looks fine but outputs are degraded, you find out before the run finishes.
- Custom early-stopping. Combine with a simple eval prompt: if outputs diverge, abort the run via `controller.abort()` (see `abortSignal`) and call `trainer.cancel()` to stop the backend. See the Early stopping recipe for the full pattern, and the sketch after this list for the minimal shape.
- Live preview into your own UI. Send the checkpoint output to Slack, an internal review queue, or your own app's preview channel.
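The minimal early-stopping shape. `trainer`, the `onCheckpoint` hook name, and `looksDegraded` are placeholders; `controller.abort()` and `trainer.cancel()` follow the `abortSignal` docs and the Early stopping recipe:

```ts
// Sketch: probe every checkpoint with a fixed prompt and stop the run
// when outputs degrade. `trainer`, the hook shape, and looksDegraded()
// are placeholders; see the Early stopping recipe for the full pattern.
const controller = new AbortController();

trainer.onCheckpoint(async (ctx) => {
  const res = await ctx.infer({
    messages: [
      { role: "user", content: "Summarize: The quick brown fox jumps over the lazy dog." },
    ],
    temperature: 0, // keep the probe comparable across checkpoints
    stream: false,
    signal: controller.signal,
  });
  const text = (await res.json()).choices?.[0]?.message?.content ?? "";

  if (looksDegraded(text)) {
    controller.abort();     // stop reading locally
    await trainer.cancel(); // stop the backend run
  }
});
```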