AI Model Gateway¶
Preloop can act as a control plane for managed model traffic, not just as an MCP policy layer for tools.
What the Model Gateway Does¶
The model gateway gives managed runtimes a consistent way to send model traffic through Preloop instead of talking directly to upstream providers.
This unlocks several capabilities in one place:
- Managed ingress for OpenAI-compatible and Anthropic-compatible clients
- Centralized budget checks before upstream dispatch
- Allowed-model enforcement based on the active subject scope
- Usage accounting for tokens, spend, provider, and runtime attribution
- Execution and session observability through gateway events and runtime-session views
Supported Ingress Shapes¶
Today the gateway is centered around these API shapes:
- GET /openai/v1/models
- POST /openai/v1/chat/completions
- POST /openai/v1/responses
- POST /anthropic/v1/messages
This lets multiple agent runtimes use one shared control plane even when their preferred client protocol differs.
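As a minimal sketch, the ingress shapes above can be modeled as a routing table keyed by client protocol. The helper name, operation names, and structure here are illustrative assumptions, not part of Preloop's API:

```python
# Hypothetical routing table: (protocol, operation) -> (HTTP method, gateway path).
# The paths mirror the supported ingress shapes listed above.
INGRESS_ROUTES = {
    ("openai", "list_models"): ("GET", "/openai/v1/models"),
    ("openai", "chat_completions"): ("POST", "/openai/v1/chat/completions"),
    ("openai", "responses"): ("POST", "/openai/v1/responses"),
    ("anthropic", "messages"): ("POST", "/anthropic/v1/messages"),
}

def ingress_route(protocol: str, operation: str) -> tuple[str, str]:
    """Return the (method, path) a runtime would call for its client protocol."""
    try:
        return INGRESS_ROUTES[(protocol, operation)]
    except KeyError:
        raise ValueError(f"unsupported ingress shape: {protocol}/{operation}")
```

Keeping the protocol-specific paths behind one table is what lets runtimes with different preferred client protocols share a single control plane.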
How Requests Flow¶
At a high level, a managed runtime sends a request to Preloop, Preloop evaluates whether the request is allowed, and only then forwards the request to the upstream provider.
```mermaid
sequenceDiagram
    participant Runtime as Managed Runtime
    participant Gateway as Preloop Model Gateway
    participant Policy as Governance + Budget Checks
    participant Provider as Upstream Provider
    participant Ledger as Usage Ledger
    Runtime->>Gateway: Model request
    Gateway->>Policy: Validate subject scope + budgets
    alt Request allowed
        Policy-->>Gateway: Approved
        Gateway->>Provider: Forward upstream request
        Provider-->>Gateway: Model response
        Gateway->>Ledger: Record usage + attribution
        Gateway-->>Runtime: Response
    else Request denied
        Policy-->>Gateway: Denied with reason
        Gateway-->>Runtime: Budget/policy error
    end
```
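The allow/deny decision in this flow can be sketched as a single check function. The function name, argument names, and error shapes below are illustrative assumptions; the key property from the diagram is that nothing is forwarded upstream until both checks pass:

```python
def handle_model_request(model: str, estimated_cost: float,
                         allowed_models: set[str],
                         remaining_budget: float) -> dict:
    """Validate subject scope and budget before any upstream dispatch (sketch)."""
    # Allowed-model enforcement for the active subject scope.
    if model not in allowed_models:
        return {"error": "policy_denied", "reason": f"model {model!r} not allowed"}
    # Budget check happens before the request leaves the gateway.
    if estimated_cost > remaining_budget:
        return {"error": "budget_exceeded", "reason": "insufficient remaining budget"}
    # Only now would the gateway forward upstream and record usage.
    return {"status": "forwarded", "model": model}
```

A denied request never reaches the provider, which is what makes the denial reason cheap to return and safe to surface to the calling runtime.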
Gateway-Enabled vs Direct-Provider Access¶
Preloop supports two broad model access patterns.
Gateway-Enabled Models¶
When gateway routing is enabled:
- The runtime can receive a managed base URL, a model alias, and a short-lived bearer token
- Model traffic flows through Preloop
- Budget checks and allowed-model checks happen before the request leaves Preloop
- Usage and spend are recorded centrally
This is the preferred path when you want centralized governance and observability.
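To make the shape of gateway-enabled configuration concrete, here is an illustrative sketch of what a runtime might receive. The field names and values are assumptions for this example, not Preloop's actual payload:

```python
# Illustrative only: field names are assumptions, not Preloop's schema.
gateway_config = {
    "base_url": "https://preloop.example.com/openai/v1",  # managed base URL
    "model_alias": "default-chat",                        # resolved by the gateway
    "bearer_token": "<short-lived token>",                # rotated frequently by design
}
```

The runtime points its OpenAI- or Anthropic-compatible client at the managed base URL instead of the provider, so every request passes through the gateway's checks.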
Direct-Provider Models¶
When gateway routing is not enabled or not yet supported for a given runtime/provider combination:
- The runtime can still use the configured model directly
- Observability may be narrower than on the gateway path
- Budget enforcement and captured gateway events only apply to traffic that actually flows through Preloop
Budget Enforcement¶
The gateway can enforce multiple budget layers before the upstream call is made.
Examples include:
- Account-level limits
- Flow-level limits
- Model-specific limits
- Trial hosted-model hard caps
When a request is denied, the gateway returns a reason that can be surfaced to operators and to the calling runtime.
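A minimal sketch of layered enforcement, assuming each layer exposes a remaining budget: the layers are tried in order and the first exceeded layer denies the request with a named reason. The helper and data shape are illustrative, not Preloop internals:

```python
def check_budget_layers(estimated_cost: float, layers: dict[str, float]):
    """Return (allowed, reason); `layers` maps layer name -> remaining budget."""
    for name, remaining in layers.items():
        if estimated_cost > remaining:
            # Deny on the first exhausted layer, naming it for operators.
            return False, f"{name} budget exceeded"
    return True, "within all budget layers"
```

Naming the exhausted layer in the denial is what lets the same reason string be surfaced both to operators and to the calling runtime.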
Attribution and Usage Accounting¶
Gateway requests are recorded in the shared usage ledger with fields such as:
- Account
- User or API key
- Flow and flow execution
- Runtime session
- Runtime principal
- Provider and model alias
- Prompt, completion, and total tokens
- Estimated cost
This shared ledger powers account usage summaries, dashboard telemetry, session views, and per-model observability.
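The ledger fields above can be sketched as a record type. The class name, field names, and types here are assumptions for illustration, not Preloop's schema:

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """Illustrative usage-ledger entry; field names are assumptions."""
    account: str
    principal: str          # user or API key
    flow_execution: str
    runtime_session: str
    provider: str
    model_alias: str
    prompt_tokens: int
    completion_tokens: int
    estimated_cost: float

    @property
    def total_tokens(self) -> int:
        # Total tokens are derived from prompt + completion counts.
        return self.prompt_tokens + self.completion_tokens
```

Because every gateway request lands in one record shape, the same data can be rolled up by account for usage summaries, by session for session views, and by model alias for per-model observability.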
Gateway Events and Redaction¶
Preloop can capture normalized gateway events for debugging and operator review.
These events are designed to be useful without exposing secrets unnecessarily:
- Request and response previews are normalized across providers
- Sensitive values should be redacted before capture or logging
- Execution-scoped event views help explain what happened during one flow run
- Session- and account-scoped views help operators find broader patterns over time
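A minimal sketch of redaction before capture, assuming events are plain dictionaries: sensitive keys are masked, including in nested structures. The key list and helper name are illustrative assumptions:

```python
# Hypothetical set of sensitive key names; a real deployment would tune this.
SENSITIVE_KEYS = {"authorization", "api_key", "bearer_token", "cookie"}

def redact(event: dict) -> dict:
    """Return a copy of the event with sensitive values masked."""
    redacted = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            redacted[key] = "[REDACTED]"
        elif isinstance(value, dict):
            redacted[key] = redact(value)  # recurse into nested structures
        else:
            redacted[key] = value
    return redacted
```

Redacting at capture time, rather than at display time, means downstream execution-, session-, and account-scoped views never hold the secret in the first place.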