Smartflow — API and SDK Reference

Platform v3.0  •  SDK v0.3.0  •  February 2026

Smartflow is an enterprise AI gateway that proxies requests to multiple LLM providers, enforces compliance policy, caches semantically, and orchestrates MCP tools and A2A agents. This document covers every API surface the platform exposes: the LLM proxy, management APIs, MCP gateway, A2A gateway, vector store, RAG pipeline, and the Python SDK.

Architecture Overview

Smartflow runs as five cooperating services:

Service                  Default Port  Purpose
smartflow (proxy)        7775          LLM proxy, MCP gateway, A2A gateway, semantic caching, pre/post-call compliance hooks
api_server (management)  7778          Virtual keys, routing chains, audit logs, analytics
compliance_api_server    7777          ML content scanning, PII redaction, adaptive learning, intelligent scan
policy_perfect_api       7782          Policy and preset CRUD, AI document-to-policy generation, assignment management
smartflow-hybrid-bridge  3500          Cross-datacenter Redis log aggregation

All five services share a single Redis instance for state: routing tables, semantic cache, VAS logs, provider latency metrics, virtual key budgets, and the MCP server registry. The Policy Perfect API additionally requires PostgreSQL for durable policy and preset storage. In production the proxy sits behind a TLS-terminating reverse proxy (Caddy or nginx); the management, compliance, and policy APIs are backend surfaces.

Authentication

Virtual Keys

The primary credential for clients. Issue sk-sf-{48-hex} tokens via the management API (POST /api/enterprise/vkeys). Each key is stored in Redis and carries optional USD budget caps, period resets, and rate limits. The raw token is returned exactly once at creation — it cannot be retrieved again.

# OpenAI-compatible and all /v1/* routes
Authorization: Bearer sk-sf-a1b2c3d4...

# Anthropic native /anthropic/v1/* routes
x-api-key: sk-sf-a1b2c3d4...

On every request the proxy:

  1. Extracts the sk-sf-* token from Authorization: Bearer or x-api-key
  2. Looks up the key in Redis — returns 429 if not found or revoked
  3. Checks USD budget against accumulated spend — returns 429 with X-Smartflow-Budget-Exceeded: true if exceeded
  4. Forwards request to the provider using the server-side provider API key
  5. Records actual spend (cost_usd) back to the key's spend counter after completion
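The budget check in steps 2–5 can be sketched as pure logic. This is an illustrative simplification: `check_budget` is a hypothetical helper, and the real proxy reads accumulated spend from Redis rather than taking it as an argument.

```python
def check_budget(spent_usd, budget_usd, est_cost_usd=0.0):
    """Return (allowed, headers). budget_usd of None means unlimited."""
    if budget_usd is None:
        return True, {}
    if spent_usd + est_cost_usd > budget_usd:
        remaining = max(budget_usd - spent_usd, 0.0)
        # Headers match the documented budget-exceeded response
        return False, {
            "X-Smartflow-Budget-Exceeded": "true",
            "X-Smartflow-Budget-Remaining": f"{remaining:.4f}",
        }
    return True, {}

# A key with a $100 monthly budget that has already spent $100.04:
allowed, headers = check_budget(spent_usd=100.04, budget_usd=100.00)
```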

Provider API Keys

Stored server-side in Redis (smartflow:api_keys:{provider}_api_key). Clients never send raw provider credentials. The proxy resolves the correct key from the key store when forwarding to each provider.

Anthropic Native Passthrough

For /anthropic/* routes, clients send their Smartflow virtual key as x-api-key. The proxy validates budget, then replaces it with the real ANTHROPIC_API_KEY before forwarding.

JWT (SafeChat / Dashboard)

The SafeChat web app and admin dashboard use a smartflow_token cookie-based JWT for browser sessions. JWT validation occurs at the application layer, not in the proxy core.

LLM Proxy Endpoints

All proxy endpoints are on port 7775 by default.

/v1/chat/completions

POST /v1/chat/completions

OpenAI-compatible chat completions. Accepts any OpenAI-format request body. Provider and model are resolved from the model name or an explicit prefix.

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}

Model-prefix routing:

Prefix / Pattern              Provider
gpt-*, o1-*, o3-*, chatgpt-*  OpenAI
claude-*                      Anthropic
gemini-*                      Google Gemini
grok-*                        xAI
mistral-*, mixtral-*          Mistral AI
command-*, c4ai-*             Cohere
llama-*, groq/*               Groq
openrouter/*                  OpenRouter
ollama/*                      Local Ollama
azure/*                       Azure OpenAI

No prefix is required for the primary supported providers; the model-name heuristic detects gemini-*, claude-*, gpt-*, and similar names automatically. An explicit provider/model prefix always takes precedence.
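The two-stage resolution (explicit prefix first, then heuristic) can be sketched as follows. The patterns mirror the tables in this document; the final OpenAI fallback for unrecognized names is an assumption, not documented behavior.

```python
# Explicit provider/model prefixes (checked first; always win)
PREFIX_PROVIDERS = {
    "openrouter/": "openrouter", "ollama/": "ollama", "azure/": "azure",
    "groq/": "groq", "mistral/": "mistral", "cohere/": "cohere", "xai/": "xai",
}
# Model-name heuristics (checked second)
HEURISTICS = [
    (("gpt-", "o1-", "o3-", "chatgpt-"), "openai"),
    (("claude-",), "anthropic"),
    (("gemini-",), "google"),
    (("grok-",), "xai"),
    (("mistral-", "mixtral-"), "mistral"),
    (("command-", "c4ai-"), "cohere"),
    (("llama-",), "groq"),
]

def resolve_provider(model):
    """Return (provider, bare_model). Explicit prefix wins over the heuristic."""
    for prefix, provider in PREFIX_PROVIDERS.items():
        if model.startswith(prefix):
            return provider, model[len(prefix):]
    for patterns, provider in HEURISTICS:
        if model.startswith(patterns):
            return provider, model
    return "openai", model  # assumed default; not specified in this document
```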

Multimodal — Image

{
  "model": "gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
    ]
  }]
}

Multimodal — Audio (gpt-4o-audio-preview)

{
  "type": "input_audio",
  "input_audio": {"data": "<base64>", "format": "mp3"}
}

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Paris."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 24, "completion_tokens": 3, "total_tokens": 27}
}

/anthropic/v1/messages

POST /anthropic/v1/messages

Native Anthropic Messages API passthrough. The proxy injects the API key from the server key store. The full Anthropic request and response format is preserved with no translation. Also accessible as /cursor/v1/messages for Cursor IDE passthrough. The [1m] suffix that Claude Code appends to model names is stripped automatically.

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "You are a helpful assistant.",
  "messages": [
    {"role": "user", "content": "Hello, Claude."}
  ]
}
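The model-name normalization mentioned above can be illustrated with a tiny helper (hypothetical name; the proxy performs this internally):

```python
def normalize_claude_model(model: str) -> str:
    """Strip the "[1m]" suffix that Claude Code appends to model names."""
    return model.removesuffix("[1m]")
```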

Multimodal — Image (native Anthropic)

{
  "type": "image",
  "source": {"type": "base64", "media_type": "image/png", "data": "..."}
}

Multimodal — PDF Document (native Anthropic)

{
  "type": "document",
  "source": {"type": "base64", "media_type": "application/pdf", "data": "..."}
}

/v1/embeddings

POST /v1/embeddings

Generate vector embeddings. Supports multi-provider routing via model prefix.

{"model": "text-embedding-3-small", "input": "Your text here"}

Response follows the OpenAI embeddings format with data[].embedding float arrays.

/v1/audio/transcriptions

POST /v1/audio/transcriptions

Transcribe audio. Multipart form upload. Routes to OpenAI Whisper by default. Use groq/whisper-large-v3 for Groq, deepgram/nova-2 for Deepgram.

Content-Type: multipart/form-data

file=@audio.mp3
model=whisper-1

/v1/audio/speech

POST /v1/audio/speech

Text-to-speech synthesis. Returns raw audio bytes.

{
  "model": "tts-1", "input": "Hello, world.",
  "voice": "nova", "response_format": "mp3"
}

/v1/images/generations

POST /v1/images/generations

{
  "model": "dall-e-3", "prompt": "A futuristic city at sunrise",
  "n": 1, "size": "1024x1024", "quality": "hd",
  "style": "vivid", "response_format": "url"
}

/v1/rerank

POST /v1/rerank

Document reranking. Compatible with Cohere's rerank API.

{
  "model": "rerank-english-v3.0",
  "query": "What is the return policy?",
  "documents": ["Document one text.", "Document two text."],
  "top_n": 3
}

/v1/models

GET /v1/models

List available models across all enabled providers.

/v1/completions

POST /v1/completions

Legacy text completions. Forwarded to the configured provider.

Routing and Provider Selection

Automatic Model-Name Heuristic

For requests to /v1/chat/completions with no explicit provider prefix, the proxy infers the provider from the model name. An explicit provider/model prefix always takes precedence over heuristic detection.

Pattern                                                         Inferred Provider
gpt-*, o1-*, o3-*, o4-*, chatgpt-*, whisper-*, tts-*, dall-e-*  OpenAI
claude-*                                                        Anthropic
gemini-*                                                        Google
grok-*                                                          xAI
mistral-*, mixtral-*                                            Mistral
command-*                                                       Cohere
llama-*                                                         Groq

Routing Strategies

Configured per fallback chain via the management API:

Strategy           Behavior
round_robin        Distribute requests across targets in order
weighted           Traffic proportional to assigned weights
least_connections  Send to the provider with the fewest in-flight requests
random             Random selection among healthy providers
priority           Try targets in order; fall back only on failure
latency            Route to the provider with the lowest p95 rolling EMA latency (tracked in Redis)
cost               Route to the provider with the lowest per-token cost; skip providers over their daily budget cap

Fallback Chains

Named ordered provider lists with retry logic. Configured at POST /api/routing/fallback-chains.

{
  "name": "production-chain",
  "targets": [
    {"provider": "openai",    "model": "gpt-4o",                        "weight": 1},
    {"provider": "anthropic", "model": "claude-sonnet-4-6", "weight": 1},
    {"provider": "google",    "model": "gemini-1.5-pro",             "weight": 1}
  ],
  "retry_on": ["429", "500", "502", "503"],
  "max_retries": 2,
  "backoff_ms": 500
}

On a 429 or 5xx response the proxy retries against the next target with exponential backoff. Non-retryable 4xx errors bypass retry. Providers that have exceeded their daily budget cap are excluded from selection automatically.
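The retry behavior can be sketched like this. It is a simplification: `send` stands in for the provider call, and the interaction between max_retries and chain length is reduced to one backoff per failed target.

```python
import time

RETRYABLE = {429, 500, 502, 503}  # matches the chain's retry_on example

def call_with_fallback(targets, send, backoff_ms=500):
    """Try targets in order. Retryable errors (429/5xx) advance to the next
    target after an exponentially growing backoff; other 4xx errors fail fast.
    send(target) returns an (http_status, body) tuple."""
    last = None
    for attempt, target in enumerate(targets):
        status, body = send(target)
        if status < 400:
            return status, body
        last = (status, body)
        if status not in RETRYABLE:
            return last  # non-retryable 4xx bypasses retry
        time.sleep((backoff_ms / 1000.0) * (2 ** attempt))
    return last
```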

MetaCache — Semantic Caching

The MetaCache intercepts every /v1/chat/completions request before any provider call is made.

How It Works

The incoming query is embedded and its cosine similarity is computed against stored request embeddings. If similarity exceeds the configured threshold, the cached response is returned. Otherwise the request is forwarded to the provider and the response is stored. Responses are semantically compressed before storage to reduce Redis footprint while preserving meaning.

Three tiers operate in sequence: L1 in-process memory, L2 Redis semantic similarity, L3 Redis exact match. A lookup checks the tiers in order and returns on the first hit; only when all three miss is the request forwarded to the provider.
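The L2 semantic tier can be illustrated with a brute-force sketch. The real implementation stores embeddings in Redis and uses a configured threshold; the 0.92 default here is an assumption, and the linear scan is for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cache_lookup(query_vec, cache, threshold=0.92):
    """Return the cached response most similar to the query, or None on a miss.
    cache is a list of (request_embedding, response) pairs."""
    best, best_sim = None, threshold
    for emb, response in cache:
        sim = cosine(query_vec, emb)
        if sim >= best_sim:
            best, best_sim = response, sim
    return best
```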

Per-Request Cache Controls

Header                             Effect
Cache-Control: no-cache            Bypass cache read; always query the provider
Cache-Control: no-store            Bypass cache write; do not cache this response
x-smartflow-cache-ttl: 3600        Override the TTL in seconds for this response
x-smartflow-cache-namespace: <ns>  Scope the cache to a logical partition

Cached responses return x-smartflow-cache-hit: true and x-smartflow-cache-key for client-side correlation.
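For illustration, a small client-side helper (hypothetical, not part of the SDK) that assembles these headers for use with any HTTP client:

```python
def cache_headers(*, no_cache=False, no_store=False, ttl=None, namespace=None):
    """Assemble Smartflow's per-request cache-control headers."""
    headers = {}
    directives = [d for d, on in (("no-cache", no_cache), ("no-store", no_store)) if on]
    if directives:
        headers["Cache-Control"] = ", ".join(directives)
    if ttl is not None:
        headers["x-smartflow-cache-ttl"] = str(ttl)
    if namespace:
        headers["x-smartflow-cache-namespace"] = namespace
    return headers
```

Pass the result as extra request headers on a /v1/chat/completions call.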

MCP Gateway

Smartflow implements the Model Context Protocol (MCP) gateway. Register external MCP servers and invoke their tools through the proxy with shared authentication, budgeting, and audit logging.

Server Registry

GET /api/mcp/servers

List registered MCP servers.

POST /api/mcp/servers

Register an MCP server.

{
  "id": "github-tools",
  "name": "GitHub MCP Server",
  "base_url": "https://mcp.github.example.com",
  "auth_type": "bearer",
  "allowed_tools": ["list_repos", "create_issue"],
  "disallowed_tools": [],
  "cost_info": {"per_call_usd": 0.001},
  "guardrail_mode": "strict"
}

Tool Invocation

POST /{server_id}/mcp/
POST /mcp/v1/{server_id}/tools/call

The proxy authenticates the request, applies per-tool access controls, records cost, and forwards to the server.

Catalog and Search

GET /api/mcp/catalog

Browse the tool catalog across all registered servers.

GET /api/mcp/tools/search?q={query}&k={n}

Semantic search over the tool catalog. Returns the top k tools matching the natural-language query.

GET /api/mcp/tools/index

Full indexed tool list with embedding metadata.

Access Control

Per-server configuration fields for access control:

Field                         Type      Description
allowed_tools                 string[]  If non-empty, only these tools may be called
disallowed_tools              string[]  These tools are always blocked
allowed_params                object    Per-tool parameter allowlists
guardrail_mode                string    "strict" blocks on policy violation; "log" flags and continues
available_on_public_internet  bool      If false, only accessible from approved network segments

Access Request Flow

GET /api/mcp/catalog/requests
POST /api/mcp/catalog/requests
POST /api/mcp/catalog/requests/{id}/approve
POST /api/mcp/catalog/requests/{id}/deny

OAuth Flow

GET /api/mcp/auth/initiate?server_id={id}
GET /api/mcp/auth/callback
GET /api/mcp/auth/tokens

Usage and Logs

GET /api/mcp/usage

Aggregated cost and call counts per server and tool.

GET /api/mcp/logs

Per-invocation audit logs.

API Generation from OpenAPI Spec

POST /api/mcp/generate

Auto-generate an MCP server adapter from an OpenAPI specification.

{
  "spec": "<OpenAPI JSON or YAML string>",
  "server_id": "my-api",
  "server_name": "My REST API",
  "base_url": "https://api.example.com",
  "include_methods": ["GET", "POST"]
}

A2A Agent Gateway

Smartflow implements the A2A (Agent-to-Agent) protocol for inter-agent communication. Register external agents and invoke them with full logging and routing.

Agent Card

GET /a2a/{agent_id}/.well-known/agent.json

Returns the agent's machine-readable capability card: name, capabilities, supported task types, and authentication requirements.

Task Invocation

POST /a2a/{agent_id}

Send a task to a registered agent. The proxy forwards the request, captures the response, and logs both.

{
  "id": "task-uuid",
  "message": {
    "role": "user",
    "parts": [{"type": "text", "text": "Summarize the latest earnings report."}]
  }
}

Supports synchronous JSON responses and SSE streaming for long-running tasks. Include x-a2a-trace-id to correlate task invocations across agents in distributed workflows.
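A minimal client-side sketch of building the task payload and trace header. The helper is hypothetical; the payload mirrors the example above, and the Bearer auth header is an assumption based on the virtual-key convention used elsewhere in this document.

```python
import uuid

def build_a2a_task(text, trace_id=None, api_key="sk-sf-..."):
    """Build a minimal A2A task payload and headers for POST /a2a/{agent_id}."""
    payload = {
        "id": f"task-{uuid.uuid4()}",
        "message": {"role": "user", "parts": [{"type": "text", "text": text}]},
    }
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    if trace_id:
        headers["x-a2a-trace-id"] = trace_id  # correlates tasks across agents
    return payload, headers
```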

Vector Store API

Built-in vector store backed by Redis. No external vector database required. All endpoints are on the proxy at port 7775.

POST /v1/vector_stores

Create a vector store.

{
  "name": "product-documentation",
  "description": "Internal product docs",
  "metadata": {"team": "engineering"}
}

Response includes id, name, description, file_count, created_at.

GET /v1/vector_stores

List all vector stores.

GET /v1/vector_stores/{id}

Get a specific vector store.

DELETE /v1/vector_stores/{id}

Delete a vector store and all its files.

POST /v1/vector_stores/{id}/files

Add a text document. The document is chunked and embedded automatically.

{
  "content": "Full document text...",
  "filename": "architecture.md",
  "metadata": {"version": "3.0"}
}

GET /v1/vector_stores/{id}/files

List files in a vector store.

POST /v1/vector_stores/{id}/search

Semantic search over stored documents.

{
  "query": "How does the caching layer work?",
  "max_results": 5,
  "score_threshold": 0.7
}

// Response
{
  "results": [
    {"file_id": "vf_xyz", "filename": "architecture.md", "content": "...chunk...", "score": 0.91}
  ],
  "total": 1
}

RAG Pipeline API

Built on top of the vector store. Ingest documents with automatic chunking, then retrieve context for LLM augmentation.

POST /v1/rag/ingest

Chunk a document, embed each chunk, and store in a named vector store.

{
  "content": "Full document text...",
  "vector_store_id": "vs_abc123",
  "filename": "report-q4.txt",
  "chunk_size": 512,
  "chunk_overlap": 64,
  "metadata": {"source": "internal"}
}

Field            Type    Default   Description
content          string  required  Full document text
vector_store_id  string  required  Target store (must already exist)
filename         string  ""        Display name for the file
chunk_size       int     512       Characters per chunk
chunk_overlap    int     64        Overlap between consecutive chunks
metadata         object  {}        Arbitrary key-value metadata

Response: { "store_id", "file_id", "chunks_created", "status": "completed" }
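The chunk_size/chunk_overlap semantics can be sketched with a character-based splitter. The server's actual splitter is not documented here and may respect word or sentence boundaries; this sketch only shows how the two parameters interact.

```python
def chunk_text(content, chunk_size=512, chunk_overlap=64):
    """Split content into overlapping character chunks (defaults match the API)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each chunk's start advances
    return [content[i:i + chunk_size]
            for i in range(0, max(len(content) - chunk_overlap, 1), step)]
```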

POST /v1/rag/query

Embed a question, retrieve matching chunks, and optionally assemble a context string for injection into an LLM system prompt.

{
  "query": "What were the Q4 revenue figures?",
  "vector_store_id": "vs_abc123",
  "max_results": 5,
  "score_threshold": 0.0,
  "include_context": true
}

Field            Default   Description
query            required  Natural-language question
vector_store_id  required  Store to search
max_results      5         Maximum chunks to return
score_threshold  0.0       Minimum cosine similarity (0 = return all)
include_context  true      Concatenate chunks into a context string field

Response includes chunks[], context (concatenated string for prompt injection), and total.
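Assembling retrieved chunks into a chat request can be sketched as follows. The separator and system-prompt wording are assumptions; the exact format of the server-built context field is not documented.

```python
def build_context(chunks, separator="\n\n"):
    """Concatenate retrieved chunk texts, as include_context does server-side."""
    return separator.join(c["content"] for c in chunks)

def build_rag_messages(question, context):
    """Inject retrieved context into an OpenAI-style message list."""
    return [
        {"role": "system",
         "content": f"Answer using only the following context:\n\n{context}"},
        {"role": "user", "content": question},
    ]
```

The resulting message list can be sent to /v1/chat/completions (for example via sf.chat_completions(messages=...)).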

Management API

Management API runs on port 7778.

Virtual Keys

All virtual key endpoints live on the management port (7778) under /api/enterprise/vkeys and require admin-level auth.

POST /api/enterprise/vkeys

Create a virtual key. The token is returned once only — store it immediately. All fields except name and owner_id are optional.

// Request
{
  "name":       "team-alpha-prod",       // human label
  "owner_id":   "alice@example.com",     // user or team id
  "provider":   null,                    // null = any provider; or "openai","anthropic"…
  "budget_usd": 100.00,                  // USD ceiling per period; null = unlimited
  "period":     "monthly",               // daily | weekly | monthly | lifetime
  "tpm_limit":  100000,                  // tokens per minute; null = unlimited
  "rpm_limit":  60,                      // requests per minute; null = unlimited
  "tags": { "team": "alpha", "env": "prod" }
}

// Response — token shown ONCE
{
  "success": true,
  "token":   "sk-sf-a1b2c3d4e5f6...",  // 48-hex — deliver to your client now
  "key_id":  "sk-sf-a1b2c3d4e5f6...",
  "message": "Virtual key created. Store the token securely — it will not be shown again."
}

GET /api/enterprise/vkeys

List all virtual keys. Tokens are never returned here — only metadata (name, owner, budget, active status).

{
  "success": true,
  "total": 3,
  "keys": [
    {
      "key_id":       "sk-sf-a1b2...",
      "name":         "team-alpha-prod",
      "owner_id":     "alice@example.com",
      "active":       true,
      "budget_usd":   100.00,
      "period":       "monthly",
      "period_start": "2026-03-01T00:00:00Z",
      "tpm_limit":    100000,
      "rpm_limit":    60,
      "created_at":   "2026-02-15T10:30:00Z",
      "last_used":    "2026-03-10T08:22:00Z",
      "tags": { "team": "alpha" }
    }
  ]
}

GET /api/enterprise/vkeys/user/{owner_id}

List all keys owned by a specific user or team id.

GET /api/enterprise/vkeys/{key_id}/budget

Get current period spend and remaining budget for a key.

{
  "success": true,
  "budget": {
    "key_id":         "sk-sf-a1b2...",
    "name":           "team-alpha-prod",
    "active":         true,
    "budget_usd":     100.00,
    "period":         "monthly",
    "spent_usd":      23.47,
    "remaining_usd":  76.53,
    "percent_used":   23.47,
    "last_used":      "2026-03-10T08:22:00Z"
  }
}

POST /api/enterprise/vkeys/{key_id}/reset

Reset the spend counter to zero and restart the budget period. No request body needed.

{ "success": true, "message": "Budget reset for key 'sk-sf-a1b2...'" }

POST /api/enterprise/vkeys/{key_id}/revoke

Deactivate a key. Revoked keys receive 429 with the message "Virtual key is revoked" on all subsequent requests. The key metadata is retained in Redis for audit purposes.

{ "success": true, "message": "Virtual key 'sk-sf-a1b2...' revoked" }

Budget Exceeded Response

When a request is blocked by budget enforcement, the proxy returns:

HTTP/1.1 429 Too Many Requests
X-Smartflow-Budget-Exceeded: true
X-Smartflow-Budget-Remaining: 0.0000

{
  "error": {
    "message": "Budget limit exceeded: $100.00 spent of $100.00 (monthly budget)",
    "type":    "budget_exceeded",
    "code":    "insufficient_quota"
  },
  "spent_usd":     100.04,
  "remaining_usd": 0.0
}
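Client code can distinguish budget exhaustion from ordinary rate limiting using the documented headers. A sketch:

```python
def handle_proxy_response(status, headers):
    """Classify a proxy response: budget exhaustion vs. rate limiting vs. ok."""
    if status == 429 and headers.get("X-Smartflow-Budget-Exceeded") == "true":
        remaining = float(headers.get("X-Smartflow-Budget-Remaining", "0"))
        return "budget_exceeded", remaining
    if status == 429:
        return "rate_limited", None  # back off and retry instead
    return "ok", None
```

A budget_exceeded result will not resolve by retrying; the key needs a budget reset or a new period.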

Quick Start — cURL

# Create a key with a $50/month budget
curl -s https://YOUR_SMARTFLOW_HOST/api/enterprise/vkeys \
  -H "Authorization: Bearer YOUR_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-app-prod",
    "owner_id": "alice@example.com",
    "budget_usd": 50.0,
    "period": "monthly"
  }'

# Check spend
curl -s https://YOUR_SMARTFLOW_HOST/api/enterprise/vkeys/sk-sf-xxx/budget \
  -H "Authorization: Bearer YOUR_ADMIN_KEY"

# Revoke
curl -s -X POST https://YOUR_SMARTFLOW_HOST/api/enterprise/vkeys/sk-sf-xxx/revoke \
  -H "Authorization: Bearer YOUR_ADMIN_KEY"

Routing API

GET /api/routing/fallback-chains
POST /api/routing/fallback-chains
DELETE /api/routing/fallback-chains/{name}
GET /api/routing/status

Current routing state: active provider, fallback chain, last failure.

POST /api/routing/force-provider

{"provider": "openai", "duration_seconds": 600}

Audit Logs (VAS)

GET /api/vas/logs?limit=50&provider=openai

Retrieve VAS audit logs. Every request proxied through Smartflow produces a log entry including: timestamp, provider, model, prompt tokens, completion tokens, cost in USD, cache hit flag, compliance flags, user context, and latency.

GET /api/vas/logs/hybrid

Retrieve logs aggregated across multiple Smartflow instances via the hybrid bridge.

Analytics

GET /api/analytics?period=7d

Usage analytics: request volume, cost by provider, cache hit rate, top models, top users.

Compliance API

The Compliance API runs on port 7777. It provides ML-based content scanning, PII detection and redaction, and an adaptive learning loop that improves over time based on human feedback. The proxy integrates with this service on every request when pre/post-call scanning is enabled.

POST /v1/compliance/scan

Rule-based compliance scan against configured policies.

{
  "content": "Text to scan",
  "policy": "enterprise_standard",
  "user_id": "user-123",
  "org_id": "acme"
}

// Response
{
  "has_violations": false,
  "compliance_score": 0.97,
  "risk_level": "low",
  "recommended_action": "Allow",
  "violations": [],
  "pii_detected": []
}

POST /v1/compliance/intelligent-scan

Maestro ML policy engine. Evaluates intent against your organization's policy documents — not keyword matching.

Response includes risk_score (0–1), risk_level, recommended_action (Allow / Flag / Block), violations, explanation.

POST /v1/compliance/feedback

Submit a correction to improve the ML model's future predictions.

{
  "scan_id": "scan-xyz",
  "correct_action": "Allow",
  "correct_risk_level": "low",
  "notes": "False positive — internal terminology"
}

POST /v1/compliance/redact

Detect and redact PII from content. Returns the redacted string.

GET /v1/compliance/learning/status/{user_id}
GET /v1/compliance/learning/summary
GET /v1/compliance/ml/stats
GET /v1/compliance/org/baseline/{org_id}

Policy Perfect API

The Policy Perfect API runs on port 7782. It manages the organization's compliance policy library — the source documents the Maestro ML engine reads when evaluating requests. Backed by PostgreSQL for durable policy storage.

GET /health

Liveness check for the Policy Perfect service.

GET /api/stats

Aggregate counts for the current state of the policy library.

{
  "total_policies": 42,
  "total_presets": 8,
  "total_applications": 1204,
  "compliance_violations": 3
}

Policies

Policies are named compliance rules attached to scopes. The Maestro engine evaluates all active policies on every request.

Policy types:

Type        Description
compliance  Regulatory rules: HIPAA, GDPR, SOC 2, PCI-DSS, etc.
brand       Brand voice and communication standards
format      Output format constraints
role        Role-based access and behavior restrictions
industry    Industry-specific usage rules
legal       Legal department rules and disclaimers
security    Security guardrails and data handling policies

GET /api/policies

List all active policies.

POST /api/policies

Create a policy.

{
  "name": "HIPAA PHI Protection",
  "description": "Prevent transmission of protected health information",
  "policy_type": "compliance",
  "content": "Do not include patient names, diagnoses, medical record numbers, or any PHI in AI responses.",
  "priority": 90,
  "applicable_providers": ["all"],
  "applicable_models": ["all"],
  "regulatory_framework": "HIPAA",
  "severity": "critical",
  "metadata": {
    "departments": ["clinical", "billing"],
    "ad_groups": ["clinicians", "admins"]
  }
}
Field                 Type      Description
name                  string    Policy display name
policy_type           string    One of the seven policy types above
content               string    Policy text read by the Maestro ML engine
priority              int       Evaluation order (0–100); higher values evaluated first
applicable_providers  string[]  Providers this policy applies to; ["all"] for universal
applicable_models     string[]  Models this policy applies to; ["all"] for universal
regulatory_framework  string    HIPAA, GDPR, SOC2, PCI-DSS, etc.
severity              string    critical, high, medium, low
metadata              object    Layer 2/3 targeting: source_ips, ad_groups, departments, applications

GET /api/policies/{id}

Get a policy by ID.

PUT /api/policies/{id}

Update a policy. All fields optional; only supplied fields are changed. Set "is_active": false to deactivate without deleting.

DELETE /api/policies/{id}

Delete a policy permanently.

Presets

Presets are named, ordered collections of policies. Assign a preset to a team, role, or virtual key instead of managing individual policies per scope.

GET /api/presets

List all presets. Each entry includes the preset metadata and its ordered policy list.

POST /api/presets

Create a preset.

{
  "name": "Healthcare Standard",
  "description": "Default policy set for all clinical staff",
  "use_case": "Clinical AI assistant",
  "policy_ids": ["pol_hipaa_phi", "pol_brand_tone", "pol_no_diagnosis"]
}

Policy order in policy_ids determines evaluation priority.

GET /api/presets/{id}

Get a preset and its full ordered policy list.

AI Document-to-Policy Generation

Upload a compliance document (PDF, DOCX, TXT — up to 50 MB). The service uses GPT-4o to extract structured policy suggestions automatically. Processing is asynchronous; poll for progress with the returned job ID.

POST /api/policies/generate-from-document

Multipart form upload. Field name: file.

Content-Type: multipart/form-data

file=@hipaa-policy-handbook.pdf

Immediate response:

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Document processing started."
}

GET /api/documents/job/{job_id}/progress

Poll for processing status. Status values: pending, processing, completed, failed.

{
  "success": true,
  "job": {
    "id": "550e8400-...",
    "filename": "hipaa-policy-handbook.pdf",
    "status": "processing",
    "progress_pct": 62,
    "created_at": "2026-02-19T10:00:00Z"
  }
}

GET /api/documents/job/{job_id}/results

Retrieve suggested policies once status is completed. Each suggestion includes a confidence score (0–1). Review suggestions and create live policies via POST /api/policies.

{
  "success": true,
  "job_id": "550e8400-...",
  "filename": "hipaa-policy-handbook.pdf",
  "total_policies": 7,
  "suggested_policies": [
    {
      "id": "sugg_abc",
      "name": "Minimum Necessary Standard",
      "type": "compliance",
      "content": "Limit PHI access and disclosure to the minimum necessary...",
      "priority": 85,
      "regulatory_framework": "HIPAA",
      "confidence": 0.94
    }
  ]
}
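A simple polling loop over the progress endpoint can look like this (a sketch: `get_progress` stands in for an HTTP call to /api/documents/job/{job_id}/progress, and the poll interval and timeout are arbitrary):

```python
import time

def wait_for_job(get_progress, job_id, poll_seconds=2.0, timeout=600.0):
    """Poll until the job reaches a terminal status ("completed" or "failed").
    get_progress(job_id) returns the job dict from the progress endpoint."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_progress(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Once the returned status is "completed", fetch the suggestions from the results endpoint.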

Alerting

Smartflow fires HTTP POST webhooks when threshold events occur. Configuration is via environment variables on the proxy server.

Alert Type       Trigger
BudgetThreshold  Provider or virtual key spend exceeds the configured cap
ProviderFailure  Error rate for a provider exceeds the spike threshold
SlowRequest      Request latency exceeds the slow-request threshold
Custom           Programmatic alerts from the management API

Configure any combination of webhook destinations:

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
TEAMS_WEBHOOK_URL=https://outlook.office.com/webhook/...
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...
SMARTFLOW_ALERTS_ENABLED=true

Alerts are fire-and-forget — they do not block the request that triggered them.

Observability

GET /health/liveliness

Returns 200 OK with {"status":"ok"} when the proxy process is running.

GET /health/readiness

Returns 200 OK when Redis is connected and providers are reachable.

GET /metrics

Prometheus-compatible metrics. Exposed metrics:

Metric                             Description
smartflow_requests_total           Request counter by provider, model, status
smartflow_request_latency_seconds  Request latency histogram
smartflow_cache_hits_total         Cache hit counter by tier (L1/L2/L3)
smartflow_cache_misses_total       Cache miss counter
smartflow_provider_errors_total    Upstream error counter by provider and status
smartflow_tokens_total             Token usage by provider and direction
smartflow_cost_usd_total           Cumulative cost by provider
smartflow_mcp_calls_total          MCP tool invocation counter by server and tool
smartflow_vkey_spend_usd           Per-virtual-key spend gauge

Python SDK

Installation

pip install smartflow-sdk

# or from source
pip install git+https://github.com/SRAGroupTX/SmartflowV3.git#subdirectory=sdk/python

Requirements: Python 3.10+, httpx >= 0.24

SmartflowClient

The primary async client.

class SmartflowClient(
    base_url: str,
    api_key: Optional[str] = None,
    timeout: float = 30.0,
    management_port: int = 7778,
    compliance_port: int = 7777,
    bridge_port: int = 3500,
)

Parameter        Type   Default   Description
base_url         str    required  Proxy URL, e.g. "https://smartflow.example.com"
api_key          str    None      Virtual key sent as Authorization: Bearer
timeout          float  30.0      Request timeout in seconds
management_port  int    7778      Management API port
compliance_port  int    7777      Compliance API port
bridge_port      int    3500      Hybrid bridge port

from smartflow import SmartflowClient

async with SmartflowClient("https://smartflow.example.com", api_key="sk-sf-...") as sf:
    reply = await sf.chat("What is the capital of France?")
    print(reply)

Core AI Methods

chat()

async def chat(
    message: str,
    model: str = "gpt-4o",
    system_prompt: Optional[str] = None,
    temperature: float = 0.7,
    max_tokens: Optional[int] = None,
    **kwargs,
) -> str

Send a message, receive the reply as a plain string.

reply = await sf.chat("Summarise this in one sentence.", model="claude-sonnet-4-6")

chat_completions()

async def chat_completions(
    messages: List[Dict[str, str]],
    model: str = "gpt-4o",
    temperature: float = 0.7,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    **kwargs,
) -> AIResponse

Full OpenAI-compatible completions. Returns an AIResponse object.

response = await sf.chat_completions(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user",   "content": "What is 2 + 2?"},
    ],
    model="gpt-4o-mini", max_tokens=50,
)
print(response.content)
print(response.usage.total_tokens)

stream_chat()

async def stream_chat(message: str, model: str = "gpt-4o", ...) -> AsyncIterator[str]

Async generator that yields text delta strings as they stream.

async for chunk in sf.stream_chat("Tell me a story about a robot"):
    print(chunk, end="", flush=True)

embeddings()

async def embeddings(
    input: Union[str, List[str]],
    model: str = "text-embedding-3-small",
    encoding_format: str = "float",
    dimensions: Optional[int] = None,
    input_type: Optional[str] = None,
    **kwargs,
) -> Dict[str, Any]

result = await sf.embeddings("Hello world")
vector = result["data"][0]["embedding"]

# Cohere with input_type
result = await sf.embeddings(
    ["doc one", "doc two"],
    model="cohere/embed-english-v3.0",
    input_type="search_document",
)

# Reduce dimensions
result = await sf.embeddings("Hello", model="text-embedding-3-large", dimensions=256)

rerank()

result = await sf.rerank(
    "What is the return policy?",
    ["We accept returns within 30 days.", "Contact support@example.com."],
    top_n=1,
)

claude_message()

Send a message to Claude via the native Anthropic Messages API path (/anthropic/v1/messages). The proxy injects the API key automatically.

async def claude_message(
    message: str,
    model: str = "claude-sonnet-4-6",
    max_tokens: int = 1024,
    system: Optional[str] = None,
    anthropic_key: Optional[str] = None,
) -> str

reply = await sf.claude_message(
    "Summarise this contract in three bullet points.",
    model="claude-sonnet-4-6",
    system="You are a legal assistant.",
    max_tokens=512,
)

chatbot_query()

Query Smartflow's built-in system chatbot for natural-language operational queries about logs, cache stats, cost analysis, and system health.

result = await sf.chatbot_query("show me today's cache stats")
print(result["response"])

result = await sf.chatbot_query("which provider had the most errors this week?")

Provider Routing Examples

All chat and completion methods accept a model argument that determines which provider is used. No additional client-side configuration is required.

# OpenAI
reply = await sf.chat("Hello", model="gpt-4o")
reply = await sf.chat("Hello", model="gpt-4o-mini")
reply = await sf.chat("Hello", model="o3-mini")

# Anthropic — model-name heuristic, no prefix needed
reply = await sf.chat("Hello", model="claude-sonnet-4-6")
reply = await sf.chat("Hello", model="claude-3-opus-20240229")

# Google Gemini — model-name heuristic
reply = await sf.chat("Hello", model="gemini-1.5-pro")
reply = await sf.chat("Hello", model="gemini-2.0-flash")

# xAI Grok — explicit prefix
reply = await sf.chat("Hello", model="xai/grok-2-latest")

# Mistral
reply = await sf.chat("Hello", model="mistral/mistral-large-latest")

# Cohere
reply = await sf.chat("Hello", model="cohere/command-r-plus")

# Groq (fast Llama inference)
reply = await sf.chat("Hello", model="groq/llama-3.1-70b-versatile")

# OpenRouter (200+ models through one key)
reply = await sf.chat("Hello", model="openrouter/meta-llama/llama-3.1-405b")

# Local Ollama
reply = await sf.chat("Hello", model="ollama/llama3.2")

# Azure OpenAI — deployment name as suffix
reply = await sf.chat("Hello", model="azure/my-gpt4o-deployment")

# Native Anthropic path (uses /anthropic/v1/messages directly)
reply = await sf.claude_message("Hello", model="claude-sonnet-4-6")

MCP Tool Invocation via SDK

The Python SDK does not expose dedicated MCP methods. MCP tool calls are made as direct HTTP requests to the proxy. Use httpx or any HTTP client. All requests go through the proxy's authentication, access control, and cost tracking.

import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://smartflow.example.com/github-tools/mcp/",
        headers={
            "Authorization": "Bearer sk-sf-...",
            "Content-Type": "application/json",
        },
        json={
            "jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {
                "name": "create_issue",
                "arguments": {
                    "repo": "my-org/my-repo",
                    "title": "Bug: login fails on mobile",
                    "body": "Steps to reproduce..."
                }
            }
        }
    )
    print(response.json()["result"]["content"])
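The JSON-RPC envelope is identical for every tool call, so it can be worth factoring out. The helper below (`mcp_tool_call` is a hypothetical convenience for this example, not part of the SDK) builds the same `tools/call` body shown above:

```python
from typing import Any, Dict

def mcp_tool_call(call_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call envelope for an MCP gateway request."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Same request body as the httpx example above
payload = mcp_tool_call(1, "create_issue", {"repo": "my-org/my-repo", "title": "Bug"})
```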

Discover available tools with a natural-language search:

async with httpx.AsyncClient() as client:
    r = await client.get(
        "https://smartflow.example.com/api/mcp/tools/search",
        params={"q": "create github issue", "k": 3},
        headers={"Authorization": "Bearer sk-sf-..."},
    )
    for tool in r.json()["results"]:
        print(f"{tool['server_id']}.{tool['name']}: {tool['description']}")

A2A Agent Invocation via SDK

A2A tasks are sent as HTTP POST requests to the proxy. The proxy forwards to the registered agent, logs the exchange, and returns the result. Include x-a2a-trace-id to correlate across multi-agent workflows.

import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://smartflow.example.com/a2a/summarizer-agent",
        headers={
            "Authorization": "Bearer sk-sf-...",
            "Content-Type": "application/json",
            "x-a2a-trace-id": "trace-abc-123",
        },
        json={
            "id": "task-uuid-001",
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": "Summarise the Q4 earnings report."}]
            }
        }
    )
    print(response.json()["result"]["parts"][0]["text"])

    # Retrieve the agent's capability card (inside the same client context)
    r = await client.get(
        "https://smartflow.example.com/a2a/summarizer-agent/.well-known/agent.json",
        headers={"Authorization": "Bearer sk-sf-..."},
    )
    print(r.json()["capabilities"])
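The task body has the same shape for any agent: an `id` plus a `message` with typed parts. A small builder keeps callers from hand-writing the envelope (`a2a_task` is illustrative, not an SDK method):

```python
import uuid
from typing import Any, Dict, Optional

def a2a_task(text: str, task_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a minimal A2A task body with a single text part.
    Generates a UUID-based id when none is supplied."""
    return {
        "id": task_id or f"task-{uuid.uuid4()}",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": text}],
        },
    }

# Same body as the httpx example above
body = a2a_task("Summarise the Q4 earnings report.", task_id="task-uuid-001")
```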

Audio and Image Methods

audio_transcription()

with open("recording.mp3", "rb") as f:
    result = await sf.audio_transcription(f, model="whisper-1")
print(result["text"])

# Groq Whisper (faster, same format)
with open("recording.mp3", "rb") as f:
    result = await sf.audio_transcription(f, model="groq/whisper-large-v3")

text_to_speech()

audio = await sf.text_to_speech("Hello, this is Smartflow.", voice="nova")
with open("output.mp3", "wb") as f:
    f.write(audio)

image_generation()

result = await sf.image_generation(
    "A mountain landscape at dawn",
    model="dall-e-3", size="1792x1024", quality="hd",
)
print(result["data"][0]["url"])

Compliance Methods

check_compliance()

result = await sf.check_compliance("User message text", policy="hipaa")
if result.has_violations:
    print(result.violations)

intelligent_scan()

ML-based scan combining regex, embedding similarity, behavioral analysis, and organization baselines.

async def intelligent_scan(
    content: str,
    user_id: Optional[str] = None,
    org_id: Optional[str] = None,
    context: Optional[str] = None,
) -> IntelligentScanResult

result = await sf.intelligent_scan(
    "My SSN is 123-45-6789",
    user_id="user-123",
    org_id="acme-corp",
    context="customer_support",
)
print(f"{result.risk_level}: {result.recommended_action}")
# "high: Block"

redact_pii()

clean = await sf.redact_pii("My SSN is 123-45-6789, email me at john@example.com")
# "My SSN is [SSN], email me at [EMAIL]"

submit_compliance_feedback()

Mark a scan result as a false positive, or confirm it as a true positive. Feedback is used to retrain the ML model.

async def submit_compliance_feedback(
    scan_id: str,
    is_false_positive: bool,
    user_id: Optional[str] = None,
    notes: Optional[str] = None,
) -> Dict[str, Any]

await sf.submit_compliance_feedback(
    scan_id="scan-xyz",
    is_false_positive=True,
    notes="This was a test phone number, not real PII",
)

ML Learning and Org Baseline Methods

| Method | Returns | Description |
| --- | --- | --- |
| get_learning_status(user_id) | LearningStatus | Adaptive learning progress for a specific user |
| get_learning_summary() | LearningSummary | Organization-wide learning summary |
| get_ml_stats() | MLStats | Pattern counts, accuracy, and categories for the ML engine |
| get_org_summary() | Dict | Organization-level compliance summary |
| get_org_baseline(org_id) | OrgBaseline | Behavioral baseline used for anomaly detection |
| get_persistence_stats() | PersistenceStats | Redis persistence statistics for compliance data |
| save_compliance_data() | Dict | Trigger a manual flush of compliance data to Redis |
| get_intelligent_health() | Dict | Health status of the ML engine and all sub-components |

Monitoring Methods

get_cache_stats()

stats = await sf.get_cache_stats()
print(f"Hit rate:     {stats.hit_rate:.1%}")
print(f"Tokens saved: {stats.tokens_saved:,}")
print(f"Cost saved:   ${stats.cost_saved_usd:.4f}")
print(f"L1/L2/L3:     {stats.l1_hits} / {stats.l2_hits} / {stats.l3_hits}")

health_comprehensive()

h = await sf.health_comprehensive()
print(h.overall_status)       # "healthy"
print(h.redis_connected)      # True
print(h.providers_available)  # ["openai", "anthropic", "google"]

Other monitoring methods

| Method | Returns |
| --- | --- |
| health() | Dict — basic liveness check |
| get_provider_health() | List[ProviderHealth] — latency and success rate per provider |
| get_logs(limit, provider) | List[VASLog] — audit log entries from the local instance |
| get_logs_hybrid(limit) | List[Dict] — logs aggregated across all instances via the hybrid bridge |
| get_analytics(start_date, end_date) | Dict — usage and cost analytics |
| get_routing_status() | Dict — current routing state |
| force_provider(provider, duration_seconds) | Dict — force routing to a provider for a duration |
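As one illustration of consuming these endpoints, the sketch below triages get_provider_health() output to find a provider worth deprioritising in a routing chain. The field names (provider, latency_ms, success_rate) are assumptions for the example, not a documented schema:

```python
from typing import Dict, List

def slowest_healthy_provider(health: List[Dict[str, float]], min_success: float = 0.95) -> str:
    """Return the provider with the highest latency among those meeting the
    success-rate threshold. Field names here are illustrative."""
    healthy = [h for h in health if h["success_rate"] >= min_success]
    if not healthy:
        raise ValueError("no provider meets the success threshold")
    return max(healthy, key=lambda h: h["latency_ms"])["provider"]

sample = [
    {"provider": "openai", "latency_ms": 420.0, "success_rate": 0.99},
    {"provider": "anthropic", "latency_ms": 610.0, "success_rate": 0.98},
    {"provider": "groq", "latency_ms": 95.0, "success_rate": 0.91},
]
# groq misses the success threshold; anthropic is the slowest healthy provider
```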

SmartflowAgent

Stateful agent with conversation memory and per-message compliance scanning.

async with SmartflowClient("https://smartflow.example.com", api_key="sk-...") as sf:
    agent = SmartflowAgent(
        client=sf,
        name="SupportBot",
        model="gpt-4o",
        system_prompt="You are a helpful customer support agent.",
        user_id="user-123",
        org_id="acme",
    )
    r1 = await agent.chat("How do I reset my password?")
    r2 = await agent.chat("What if I forgot my email too?")

    print(agent.message_count)
    agent.clear_history()

| Method | Description |
| --- | --- |
| chat(message, scan_input=True, scan_output=True) | Send a message; raises ComplianceError if blocked |
| clear_history() | Reset the conversation, preserving the system prompt |
| get_history() | Return a copy of the message history |
| message_count | Number of messages in history |

SmartflowWorkflow

Chain AI operations with branching and error handling.

workflow = SmartflowWorkflow(client, name="TicketFlow")

workflow \
    .add_step("classify", action="chat",
              config={"prompt": "Classify this ticket: {input}", "model": "gpt-4o-mini"}) \
    .add_step("check", action="compliance_check",
              config={"content": "{output}"}) \
    .add_step("route", action="condition",
              config={"field": "output", "cases": {"billing": "billing_step"}, "default": "general_step"})

result = await workflow.execute({"input": ticket_text})
print(result.output)
print(result.steps_executed)
print(result.execution_time_ms)

| Action | Config fields | Description |
| --- | --- | --- |
| "chat" | prompt, model, temperature | Chat completion; {input} / {output} are template variables |
| "compliance_check" | content | Compliance scan |
| "condition" | field, cases, default | Branch on a context value |
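The "condition" action's branching amounts to looking up the context field's value in cases, with default as the fallback. A minimal sketch of that resolution logic (resolve_condition is illustrative, not the engine's actual code):

```python
from typing import Any, Dict

def resolve_condition(context: Dict[str, Any], config: Dict[str, Any]) -> str:
    """Pick the next step name from a condition-step config: look up the
    context field's value in `cases`, falling back to `default`."""
    value = context.get(config["field"])
    return config["cases"].get(value, config["default"])

# Same config as the "route" step in the workflow example above
config = {"field": "output", "cases": {"billing": "billing_step"}, "default": "general_step"}
```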

SyncSmartflowClient

Synchronous wrapper for scripts and Jupyter notebooks. Every async method is available without await.

from smartflow import SyncSmartflowClient

sf = SyncSmartflowClient("https://smartflow.example.com", api_key="sk-...")

reply      = sf.chat("Hello!")
emb        = sf.embeddings("Hello", model="text-embedding-3-small")
img        = sf.image_generation("A sunset", model="dall-e-3")
transcript = sf.audio_transcription(open("audio.mp3", "rb"))
audio      = sf.text_to_speech("Hello!", voice="nova")
ranked     = sf.rerank("What is the return policy?", ["doc1", "doc2"])

In Jupyter, where an event loop is already running, install nest_asyncio (pip install nest_asyncio) and call nest_asyncio.apply() before creating the client.

OpenAI Drop-in Replacement

Any existing code that targets the OpenAI API works unchanged once base_url points at Smartflow. MetaCache, compliance scanning, VAS logging, and routing all apply transparently.

from openai import OpenAI

client = OpenAI(
    api_key="sk-sf-your-virtual-key",
    base_url="https://smartflow.example.com/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

Response Types

AIResponse

| Field | Type | Description |
| --- | --- | --- |
| content | str | First choice text |
| choices | list | Full choices array |
| usage | Usage | Token usage |
| model | str | Model used |
| id | str | Response ID |

CacheStats

| Field | Type |
| --- | --- |
| hit_rate | float |
| total_requests | int |
| tokens_saved | int |
| cost_saved_usd | float |
| l1_hits | int |
| l2_hits | int |
| l3_hits | int |

ComplianceResult

| Field | Type |
| --- | --- |
| has_violations | bool |
| compliance_score | float |
| violations | list[str] |
| pii_detected | list[str] |
| risk_level | str — "low" / "medium" / "high" / "critical" |
| recommendations | list[str] |
| redacted_content | str \| None |

IntelligentScanResult

| Field | Type |
| --- | --- |
| risk_score | float — 0.0 to 1.0 |
| risk_level | str |
| recommended_action | str — "Allow" / "Flag" / "Block" |
| violations | list |
| explanation | str |

Response Headers

Every proxied response includes these headers:

| Header | Description |
| --- | --- |
| x-smartflow-provider | Provider that served the request |
| x-smartflow-model | Actual model used |
| x-smartflow-request-id | Unique request ID for log correlation |
| x-smartflow-cache-hit | true if the response was served from MetaCache |
| x-smartflow-cache-key | Cache key when cache-hit is true |
| x-smartflow-latency-ms | Total proxy latency in milliseconds |
| x-smartflow-cost-usd | Estimated cost in USD for this request |
| x-smartflow-compliance-score | Compliance score (0–1) when the pre-call scan is enabled |
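Header values arrive as strings, so numeric and boolean fields need coercion before use. A small hypothetical parser, assuming the header names listed above:

```python
from typing import Any, Dict, Mapping

def parse_smartflow_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Coerce string-valued x-smartflow-* headers into typed fields.
    Missing headers fall back to None / False / 0."""
    return {
        "provider": headers.get("x-smartflow-provider"),
        "cache_hit": headers.get("x-smartflow-cache-hit") == "true",
        "latency_ms": float(headers.get("x-smartflow-latency-ms", 0)),
        "cost_usd": float(headers.get("x-smartflow-cost-usd", 0)),
    }

# e.g. meta = parse_smartflow_headers(response.headers)
meta = parse_smartflow_headers({
    "x-smartflow-provider": "anthropic",
    "x-smartflow-cache-hit": "true",
    "x-smartflow-latency-ms": "182",
    "x-smartflow-cost-usd": "0.0031",
})
```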

Environment Variables

Set on the Smartflow server; not used in client code.

Provider Keys

VariableProvider
OPENAI_API_KEYOpenAI
ANTHROPIC_API_KEYAnthropic
GEMINI_API_KEYGoogle Gemini
XAI_API_KEYxAI / Grok
OPENROUTER_API_KEYOpenRouter
AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSIONAzure OpenAI
MISTRAL_API_KEYMistral AI
COHERE_API_KEYCohere
GROQ_API_KEYGroq
DEEPGRAM_API_KEYDeepgram
FIREWORKS_API_KEYFireworks AI
NVIDIA_NIM_API_KEY, NVIDIA_NIM_API_BASENVIDIA NIM
HUGGINGFACE_API_KEY, HUGGINGFACE_API_BASEHuggingFace
TOGETHER_API_KEYTogether AI
PERPLEXITY_API_KEYPerplexity AI
REPLICATE_API_KEYReplicate
VERTEXAI_API_KEY, VERTEXAI_PROJECT, VERTEXAI_LOCATIONVertex AI
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGIONAWS Bedrock
NOVITA_API_KEYNovita AI
VERCEL_AI_GATEWAY_API_KEYVercel AI Gateway

Feature Flags and Ports

| Variable | Default | Description |
| --- | --- | --- |
| GEMINI_ENABLED | false | Enable Google Gemini in intelligent routing |
| SMARTFLOW_ALERTS_ENABLED | true | Enable webhook alerting |
| SLACK_WEBHOOK_URL | (unset) | Slack incoming webhook |
| TEAMS_WEBHOOK_URL | (unset) | Microsoft Teams webhook |
| DISCORD_WEBHOOK_URL | (unset) | Discord webhook |
| PROXY_PORT | 7775 | LLM proxy port |
| MANAGEMENT_PORT | 7778 | Management API port |
| COMPLIANCE_PORT | 7777 | Compliance API port |
| BRIDGE_PORT | 3500 | Hybrid bridge port |
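Putting the provider keys and feature flags together, a server-side environment file might look like the fragment below. This is a sketch with placeholder values, not a complete production config:

```shell
# .env for the Smartflow proxy — illustrative values only
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_ENABLED=true
SMARTFLOW_ALERTS_ENABLED=true
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
PROXY_PORT=7775
```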

Error Reference

HTTP Status Codes

| Code | Meaning |
| --- | --- |
| 400 | Malformed request — check the body format |
| 401 | Missing or invalid API key |
| 402 | Virtual key budget exceeded |
| 403 | Request blocked by compliance policy |
| 404 | Resource or route not found |
| 429 | Rate limit exceeded (RPM or TPM) |
| 500 | Proxy internal error |
| 502 | Upstream provider returned an error |
| 503 | No providers available — fallback chain exhausted |
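When wrapping raw HTTP calls, 429, 502, and 503 are transient and usually worth retrying, while the 4xx auth, budget, and compliance errors are final. A minimal retry predicate (an illustrative policy, not part of the SDK):

```python
# Transient statuses from the table above; 4xx client errors are final.
RETRYABLE = {429, 502, 503}

def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """True when the status is transient and attempts remain."""
    return status in RETRYABLE and attempt < max_attempts
```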

SDK Exceptions

| Exception | Condition |
| --- | --- |
| SmartflowError | Base class for all SDK errors |
| ConnectionError | Cannot connect to the proxy |
| AuthenticationError | 401 — invalid or missing key |
| RateLimitError | 429 — rate limit hit |
| ComplianceError | 403 — request blocked by policy |
| ProviderError | Upstream provider error |
| TimeoutError | Request timeout |

from smartflow import ComplianceError, RateLimitError
import asyncio

try:
    result = await sf.chat("sensitive message")
except ComplianceError as e:
    print(f"Blocked by policy: {e}")
except RateLimitError:
    await asyncio.sleep(60)
    # retry
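The retry placeholder above is commonly filled with capped exponential backoff plus jitter rather than a fixed sleep. One way to compute the delay (an illustrative helper, not an SDK utility):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)],
    suitable for `await asyncio.sleep(backoff_delay(attempt))` before a retry."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```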

Changelog

v3.0 (proxy) / v0.3.0 (SDK) — 2026

New in the proxy:

  • Vector Store API (/v1/vector_stores/*) — Redis-backed, no external vector database required
  • RAG Pipeline API (/v1/rag/ingest, /v1/rag/query) — document chunking, embedding, context retrieval
  • A2A Agent Gateway (/a2a/*) — A2A protocol for inter-agent orchestration
  • Webhook alerting — Slack, Teams, Discord for budget, failure, and latency events
  • Model-name heuristic routing — claude-*, gemini-*, gpt-* detected automatically
  • Anthropic API key injection for /anthropic/* passthrough
  • Cost-based and latency-based routing strategies
  • Prometheus metrics endpoint (/metrics)
  • MCP access control — allowed_tools, disallowed_tools, guardrail_mode per server
  • MCP cost tracking via Redis HINCRBYFLOAT

New in the SDK:

  • image_generation() — multi-provider image generation
  • audio_transcription() — multipart audio, Groq/Deepgram/Fireworks routing
  • text_to_speech() — returns raw audio bytes
  • stream_chat() — async SSE iterator
  • rerank() — Cohere-compatible document reranking
  • Extended embeddings() with encoding_format, dimensions, input_type

v2.0 (proxy) / v0.2.0 (SDK)

  • MCP gateway with server registry, catalog, OAuth flow
  • SmartflowAgent with compliance scanning and conversation memory
  • SmartflowWorkflow for multi-step AI pipelines
  • Maestro ML policy engine (intelligent compliance)

v1.0 (proxy) / v0.1.0 (SDK)

  • OpenAI-compatible proxy, virtual keys, 3-tier semantic cache
  • Initial SDK: chat, chat_completions, embeddings
  • VAS audit logging, SyncSmartflowClient