AI Model Gateway

Preloop can act as a control plane for managed model traffic, not just as an MCP policy layer for tools.

What the Model Gateway Does

The model gateway gives managed runtimes a consistent way to send model traffic through Preloop instead of talking directly to upstream providers.

This unlocks several capabilities in one place:

  • Managed ingress for OpenAI-compatible and Anthropic-compatible clients
  • Centralized budget checks before upstream dispatch
  • Allowed-model enforcement based on the active subject scope
  • Usage accounting for tokens, spend, provider, and runtime attribution
  • Execution and session observability through gateway events and runtime-session views

Supported Ingress Shapes

Today the gateway is centered on these API shapes:

  • GET /openai/v1/models
  • POST /openai/v1/chat/completions
  • POST /openai/v1/responses
  • POST /anthropic/v1/messages

This lets multiple agent runtimes use one shared control plane even when their preferred client protocol differs.
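As a rough sketch of how a runtime might target these shapes, the mapping below pairs each supported operation with its HTTP method and path. The function name, base URL, and operation keys are illustrative; the actual base URL is issued by Preloop.

```python
# Hypothetical sketch: resolve the gateway ingress endpoints listed above.
# Only the paths come from the documentation; everything else is illustrative.
GATEWAY_PATHS = {
    ("openai", "models"): ("GET", "/openai/v1/models"),
    ("openai", "chat_completions"): ("POST", "/openai/v1/chat/completions"),
    ("openai", "responses"): ("POST", "/openai/v1/responses"),
    ("anthropic", "messages"): ("POST", "/anthropic/v1/messages"),
}

def gateway_endpoint(base_url: str, protocol: str, operation: str) -> tuple[str, str]:
    """Return (HTTP method, full URL) for a protocol/operation pair."""
    method, path = GATEWAY_PATHS[(protocol, operation)]
    return method, base_url.rstrip("/") + path
```

Because both OpenAI-compatible and Anthropic-compatible shapes resolve against the same base URL, runtimes with different client protocols can share one control plane.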

How Requests Flow

At a high level, a managed runtime sends a request to Preloop, Preloop evaluates whether the request is allowed, and only then forwards the request to the upstream provider.

```mermaid
sequenceDiagram
    participant Runtime as Managed Runtime
    participant Gateway as Preloop Model Gateway
    participant Policy as Governance + Budget Checks
    participant Provider as Upstream Provider
    participant Ledger as Usage Ledger

    Runtime->>Gateway: Model request
    Gateway->>Policy: Validate subject scope + budgets
    alt Request allowed
        Policy-->>Gateway: Approved
        Gateway->>Provider: Forward upstream request
        Provider-->>Gateway: Model response
        Gateway->>Ledger: Record usage + attribution
        Gateway-->>Runtime: Response
    else Request denied
        Policy-->>Gateway: Denied with reason
        Gateway-->>Runtime: Budget/policy error
    end
```

Gateway-Enabled vs Direct-Provider Access

Preloop supports two broad model access patterns.

Gateway-Enabled Models

When gateway routing is enabled:

  • The runtime can receive a managed base URL, model alias, and short-lived bearer token
  • Model traffic flows through Preloop
  • Budget checks and allowed-model checks happen before the request leaves Preloop
  • Usage and spend are recorded centrally

This is the preferred path when you want centralized governance and observability.
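A gateway-enabled runtime therefore works with three handed-down values: a base URL, a model alias, and a short-lived bearer token. A minimal sketch of bundling them for an HTTP client (names and structure are assumptions, not Preloop's actual schema):

```python
def gateway_client_config(base_url: str, token: str, model_alias: str) -> dict:
    """Bundle the managed values a runtime receives when gateway routing is enabled."""
    return {
        "base_url": base_url.rstrip("/"),
        "headers": {"Authorization": f"Bearer {token}"},  # short-lived bearer token
        "model": model_alias,  # alias resolved by the gateway, not a raw provider model id
    }
```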

Direct-Provider Models

When gateway routing is not enabled or not yet supported for a given runtime/provider combination:

  • The runtime can still use the configured model directly
  • Observability may be narrower than on the gateway path
  • Budget enforcement and captured gateway events only apply to traffic that actually flows through Preloop

Budget Enforcement

The gateway can enforce multiple budget layers before the upstream call is made.

Examples include:

  • Account-level limits
  • Flow-level limits
  • Model-specific limits
  • Trial hosted-model hard caps

When a request is denied, the gateway returns a reason that can be surfaced to operators and to the calling runtime.

Attribution and Usage Accounting

Gateway requests are recorded in the shared usage ledger with fields such as:

  • Account
  • User or API key
  • Flow and flow execution
  • Runtime session
  • Runtime principal
  • Provider and model alias
  • Prompt, completion, and total tokens
  • Estimated cost

This shared ledger powers account usage summaries, dashboard telemetry, session views, and per-model observability.
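The field list above can be pictured as a single immutable record per gateway request. This dataclass is a sketch of that shape, not the ledger's actual schema; field names and types are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    """Illustrative shape of one usage-ledger entry for a gateway request."""
    account: str
    principal: str          # user or API key
    flow_execution: str
    runtime_session: str
    runtime_principal: str
    provider: str
    model_alias: str
    prompt_tokens: int
    completion_tokens: int
    estimated_cost: float

    @property
    def total_tokens(self) -> int:
        # Total tokens are derivable from prompt + completion counts.
        return self.prompt_tokens + self.completion_tokens
```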

Gateway Events and Redaction

Preloop can capture normalized gateway events for debugging and operator review.

These events are designed to be useful without exposing secrets unnecessarily:

  • Request and response previews are normalized across providers
  • Sensitive values should be redacted before capture or logging
  • Execution-scoped event views help explain what happened during one flow run
  • Session- and account-scoped views help operators find broader patterns over time
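To illustrate the redaction principle, a minimal sketch that masks sensitive values in a nested event preview before capture. The key set and function are hypothetical examples of the pattern, not Preloop's redaction logic:

```python
SENSITIVE_KEYS = {"authorization", "api_key", "token", "cookie"}  # illustrative set

def redact(event: dict) -> dict:
    """Recursively mask values under sensitive keys before an event is captured."""
    out = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)  # recurse into nested request/response previews
        else:
            out[key] = value
    return out
```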

Where to Explore It Next