AI Model Gateway

Preloop can act as a control plane for managed model traffic, not just as an MCP policy layer for tools.

What the Model Gateway Does

The model gateway gives managed runtimes a consistent way to send model traffic through Preloop instead of talking directly to upstream providers.

This unlocks several capabilities in one place:

  • Managed ingress for OpenAI-compatible and Anthropic-compatible clients
  • Centralized budget checks before upstream dispatch
  • Allowed-model enforcement based on the active subject scope
  • Usage accounting for tokens, spend, provider, and runtime attribution
  • Execution and session observability through gateway events and runtime-session views

Supported Ingress Shapes

Today the gateway is centered on these API shapes:

  • GET /openai/v1/models
  • POST /openai/v1/chat/completions
  • POST /openai/v1/responses
  • POST /anthropic/v1/messages

This lets multiple agent runtimes use one shared control plane even when their preferred client protocol differs.
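As a rough sketch of how a runtime might target these shapes, the mapping below pairs each supported operation with its HTTP method and path. The function name, base URL, and operation keys are illustrative; the actual base URL is issued by Preloop.

```python
# Hypothetical sketch: resolve the gateway ingress endpoints listed above.
# Only the paths come from the documentation; everything else is illustrative.
GATEWAY_PATHS = {
    ("openai", "models"): ("GET", "/openai/v1/models"),
    ("openai", "chat_completions"): ("POST", "/openai/v1/chat/completions"),
    ("openai", "responses"): ("POST", "/openai/v1/responses"),
    ("anthropic", "messages"): ("POST", "/anthropic/v1/messages"),
}

def gateway_endpoint(base_url: str, protocol: str, operation: str) -> tuple[str, str]:
    """Return (HTTP method, full URL) for a protocol/operation pair."""
    method, path = GATEWAY_PATHS[(protocol, operation)]
    return method, base_url.rstrip("/") + path
```

Because both OpenAI-compatible and Anthropic-compatible shapes resolve against the same base URL, runtimes with different client protocols can share one control plane.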

How Requests Flow

At a high level, a managed runtime sends a request to Preloop, Preloop evaluates whether the request is allowed, and only then forwards the request to the upstream provider.

```mermaid
sequenceDiagram
    participant Runtime as Managed Runtime
    participant Gateway as Preloop Model Gateway
    participant Policy as Governance + Budget Checks
    participant Provider as Upstream Provider
    participant Ledger as Usage Ledger

    Runtime->>Gateway: Model request
    Gateway->>Policy: Validate subject scope + budgets
    alt Request allowed
        Policy-->>Gateway: Approved
        Gateway->>Provider: Forward upstream request
        Provider-->>Gateway: Model response
        Gateway->>Ledger: Record usage + attribution
        Gateway-->>Runtime: Response
    else Request denied
        Policy-->>Gateway: Denied with reason
        Gateway-->>Runtime: Budget/policy error
    end
```

Gateway-Enabled vs Direct-Provider Access

Preloop supports two broad model access patterns.

Gateway-Enabled Models

When gateway routing is enabled:

  • The runtime can receive a managed base URL, model alias, and short-lived bearer token
  • Model traffic flows through Preloop
  • Budget checks and allowed-model checks happen before the request leaves Preloop
  • Usage and spend are recorded centrally

This is the preferred path when you want centralized governance and observability.
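A gateway-enabled runtime therefore works with three handed-down values: a base URL, a model alias, and a short-lived bearer token. A minimal sketch of bundling them for an HTTP client (names and structure are assumptions, not Preloop's actual schema):

```python
def gateway_client_config(base_url: str, token: str, model_alias: str) -> dict:
    """Bundle the managed values a runtime receives when gateway routing is enabled."""
    return {
        "base_url": base_url.rstrip("/"),
        "headers": {"Authorization": f"Bearer {token}"},  # short-lived bearer token
        "model": model_alias,  # alias resolved by the gateway, not a raw provider model id
    }
```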

Direct-Provider Models

When gateway routing is not enabled or not yet supported for a given runtime/provider combination:

  • The runtime can still use the configured model directly
  • Observability may be narrower than on the gateway path
  • Budget enforcement and captured gateway events only apply to traffic that actually flows through Preloop

Budget Enforcement

The gateway can enforce multiple budget layers before the upstream call is made.

Examples include:

  • Account-level limits
  • Flow-level limits
  • Model-specific limits
  • Trial hosted-model hard caps

When a request is denied, the gateway returns a reason that can be surfaced to operators and to the calling runtime.

Attribution and Usage Accounting

Gateway requests are recorded in the shared usage ledger with fields such as:

  • Account
  • User or API key
  • Flow and flow execution
  • Runtime session
  • Runtime principal
  • Provider and model alias
  • Prompt, completion, and total tokens
  • Estimated cost

This shared ledger powers account usage summaries, dashboard telemetry, session views, and per-model observability.
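The field list above can be pictured as a single immutable record per gateway request. This dataclass is a sketch of that shape, not the ledger's actual schema; field names and types are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    """Illustrative shape of one usage-ledger entry for a gateway request."""
    account: str
    principal: str          # user or API key
    flow_execution: str
    runtime_session: str
    runtime_principal: str
    provider: str
    model_alias: str
    prompt_tokens: int
    completion_tokens: int
    estimated_cost: float

    @property
    def total_tokens(self) -> int:
        # Total tokens are derivable from prompt + completion counts.
        return self.prompt_tokens + self.completion_tokens
```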

Gateway Events and Redaction

Preloop can capture normalized gateway events for debugging and operator review.

These events are designed to be useful without exposing secrets unnecessarily:

  • Request and response previews are normalized across providers
  • Sensitive values should be redacted before capture or logging
  • Execution-scoped event views help explain what happened during one flow run
  • Session- and account-scoped views help operators find broader patterns over time
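To illustrate the redaction principle, a minimal sketch that masks sensitive values in a nested event preview before capture. The key set and function are hypothetical examples of the pattern, not Preloop's redaction logic:

```python
SENSITIVE_KEYS = {"authorization", "api_key", "token", "cookie"}  # illustrative set

def redact(event: dict) -> dict:
    """Recursively mask values under sensitive keys before an event is captured."""
    out = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)  # recurse into nested request/response previews
        else:
            out[key] = value
    return out
```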

Where to Explore It Next