Release · March 2026

Smartflow 1.5

4-Phase VectorLite Cache  ·  Kubernetes / Helm  ·  Concurrency Hardening
Caching
1
Phase 4 — VectorLite BERT Semantic KNN Cache (NEW)
MetaCache

The MetaCache retrieval pipeline has been extended with a fourth phase: full BERT semantic similarity search using sentence-transformers/all-MiniLM-L6-v2 (384-dimensional vectors, local inference) against a K-nearest-neighbours index in Redis Stack.


When Phases 1–3 miss (intent fingerprint, near-miss, exact key), Phase 4 embeds the request text locally and searches all stored response embeddings for cosine similarity ≥0.90. A paraphrased question with different wording but the same meaning returns the cached response — no LLM call, no external API, no additional network hop.
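At its core, the Phase 4 hit test is a cosine-similarity comparison against the 0.90 threshold. A minimal sketch in Rust — function names are illustrative, and the real pipeline runs the KNN search inside Redis Stack rather than comparing vectors one by one:

```rust
/// Cosine similarity between two embedding vectors (e.g. 384-dim MiniLM).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Default Phase 4 hit threshold.
const PHASE4_THRESHOLD: f32 = 0.90;

/// A stored response is a semantic hit when similarity clears the threshold.
fn is_semantic_hit(query: &[f32], cached: &[f32]) -> bool {
    cosine_similarity(query, cached) >= PHASE4_THRESHOLD
}
```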


Phase 1
Intent Fingerprint Exact Match
In-process memory, <1ms. Catches repeated questions regardless of minor phrasing variation.
Phase 2
Intent Fingerprint Near-Miss
Token-overlap scoring on intent signatures. Catches reformulations of the same core question.
Phase 3
SHA-256 Exact Key (Redis)
1–3ms Redis lookup. Reliable fallback for structured payloads and multimodal inputs.
Phase 4 — NEW
VectorLite BERT KNN Search
all-MiniLM-L6-v2, 384-dim, cosine similarity ≥ 0.90. Redis Stack vector index. 5–20ms. Local BERT inference — no external vector DB.
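The four phases above form a fallback cascade — cheapest first, with Phase 4 reached only on a triple miss. An illustrative sketch; the stub functions stand in for the real in-memory and Redis lookups:

```rust
#[derive(Debug, PartialEq)]
struct CachedResponse(&'static str);

// Stand-ins for the real phase implementations (hypothetical names).
fn phase1_fingerprint_exact(_q: &str) -> Option<CachedResponse> { None }
fn phase2_fingerprint_near_miss(_q: &str) -> Option<CachedResponse> { None }
fn phase3_sha256_exact(_q: &str) -> Option<CachedResponse> { None }
fn phase4_vector_knn(_q: &str) -> Option<CachedResponse> {
    Some(CachedResponse("semantic hit"))
}

/// Try each phase in cost order; later phases run only when earlier ones miss.
fn lookup(query: &str) -> Option<CachedResponse> {
    phase1_fingerprint_exact(query)
        .or_else(|| phase2_fingerprint_near_miss(query))
        .or_else(|| phase3_sha256_exact(query))
        .or_else(|| phase4_vector_knn(query))
}
```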
Unlocks

Support bots, knowledge-base tools, and internal Q&A systems see dramatically higher cache hit rates. Phase 4 captures the most common real-world cache-miss pattern: same question, different wording. In validation, genuine paraphrases scored 0.91–0.95 similarity, while related-but-distinct questions correctly fell below the threshold and missed.

2
Default Similarity Threshold Updated: 0.85 → 0.90
MetaCache
The default cosine similarity threshold for Phase 4 cache hits has been raised from 0.85 to 0.90. The higher threshold reduces false-positive hits (returning a cached response for a question that is related but semantically distinct) while maintaining strong recall for genuine paraphrases. Configurable per deployment via an environment variable.
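A sketch of how such a per-deployment override might be read, with the 0.90 default as fallback. The variable name `METACACHE_SIMILARITY_THRESHOLD` is an assumption for illustration — check the deployment docs for the actual name:

```rust
use std::env;

/// Phase 4 similarity threshold, overridable per deployment.
/// NOTE: the env var name here is hypothetical.
fn phase4_threshold() -> f32 {
    env::var("METACACHE_SIMILARITY_THRESHOLD")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(0.90) // documented default
}
```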
Deployment
3
Kubernetes / Helm Chart — Production Validated
Deployment

The helm/smartflow chart is now validated for production use on managed Kubernetes. Tested on DigitalOcean Kubernetes (v1.35) with NGINX Ingress Controller, cert-manager TLS (Let's Encrypt), TimescaleDB StatefulSet, Redis Stack with PVC, and horizontal pod scaling across all services.


New Helm values:

  • proxy.image.pullPolicy: Always — force fresh pull on pod restart
  • compliance.replicas: 3 — horizontal scale for concurrent validation
  • policyPerfect.replicas: 2 — horizontal scale for policy checks
  • proxy.env.RATE_LIMIT_REQUESTS_PER_HOUR — per-IP rate limit tuning
  • proxy.env.TOKIO_WORKER_THREADS: 16 — async worker pool sizing
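In values.yaml form, the keys above nest as follows — a config sketch, with the surrounding chart structure and the rate-limit value assumed for illustration:

```yaml
proxy:
  image:
    pullPolicy: Always
  env:
    RATE_LIMIT_REQUESTS_PER_HOUR: "1000"   # illustrative value
    TOKIO_WORKER_THREADS: "16"
compliance:
  replicas: 3
policyPerfect:
  replicas: 2
```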
Unlocks

Single-command Kubernetes deployment on DigitalOcean, AWS EKS, GKE, and AKS. Automatic TLS, health probes, persistent storage, and service mesh-ready inter-pod routing out of the box.

Reliability & Concurrency Fixes
4
Fix: Key Store Reader Deadlock (RwLock Starvation)
Bug Fix
Root cause: API key refresh was executing synchronously on the Tokio request path using std::sync::RwLock. On Linux, RwLock write-lock acquisition blocks all readers while pending. Under concurrent load with an empty Redis key store, every request triggered a refresh. All refresh threads competed for the write lock, blocked all readers, and exhausted the Tokio worker pool — complete proxy freeze.

Fix: Replaced RwLock with std::sync::Mutex for write-fairness. Introduced AtomicU64 for lock-free cache expiry checking — the hot read path never acquires any lock. Background refresh is debounced with AtomicBool; only one refresh runs at a time, always in a detached std::thread::spawn — never on a Tokio worker thread.
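The lock-free expiry check plus debounced background refresh can be sketched as below. This is a simplified stand-in, not the actual Smartflow code — the refresh body is a placeholder for the real Redis fetch:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

static EXPIRES_AT: AtomicU64 = AtomicU64::new(0); // unix seconds
static REFRESHING: AtomicBool = AtomicBool::new(false);

fn now_secs() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs()
}

fn maybe_refresh() {
    // Hot path: one atomic load, no lock acquired.
    if now_secs() < EXPIRES_AT.load(Ordering::Acquire) {
        return;
    }
    // Debounce: only one refresh may be in flight.
    if REFRESHING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
    {
        // Detached OS thread — never a Tokio worker.
        std::thread::spawn(|| {
            // ... fetch keys from Redis, update the Mutex-guarded store ...
            EXPIRES_AT.store(now_secs() + 60, Ordering::Release);
            REFRESHING.store(false, Ordering::Release);
        });
    }
}
```

Requests that arrive while a refresh is pending take the fast path or bounce off the compare-exchange; none of them block.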
Impact Before Fix

Proxy would completely freeze under 20+ concurrent requests when Redis key store was empty or stale. All Tokio workers blocked. HTTP server stopped accepting connections.

5
Fix: Compliance Check Blocking Tokio Runtime
Bug Fix
Root cause: determine_compliance_info() used tokio::task::block_in_place(|| Handle::current().block_on(...)) with a 60-second timeout. This occupied a Tokio worker thread for up to 60 seconds per request. Under concurrent load, all worker threads could be held simultaneously — deadlock.

Fix: Function converted to async fn. Compliance HTTP call is now directly .awaited with tokio::time::timeout(Duration::from_secs(8), ...). No worker thread is blocked. On timeout, check fails open — request passes through with a warning log. All four call sites updated to .await.
Impact Before Fix

Requests with slow compliance service responses would hold a Tokio worker for 60 seconds. Four concurrent requests with slow compliance responses could exhaust a 4-worker pool entirely.

6
Fix: MAESTRO UUID Validation Error for Anonymous Users
Bug Fix
Root cause: For requests with no policies applied (anonymous users), vas_log.policies_applied was an empty string. Splitting on commas produced vec![""]. Passing an empty string as a PostgreSQL UUID parameter caused invalid input syntax for type uuid: "" errors, returned to the proxy as HTTP 503 compliance failures.

Fix: Empty strings are filtered from policy ID lists before dispatch (filter(|s| !s.is_empty())). The load_policy() function in the storage layer also guards against empty IDs as defence-in-depth — if id.is_empty() { return Ok(None); }.
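The splitting-and-filtering step can be sketched as a small helper — the function name is illustrative, but it shows why the guard is needed: splitting an empty `policies_applied` string yields `[""]`, which must never reach a UUID parameter binding:

```rust
/// Split a comma-separated policy list, dropping empty fragments so that
/// an anonymous (no-policy) request produces an empty Vec, not vec![""].
fn policy_ids(policies_applied: &str) -> Vec<String> {
    policies_applied
        .split(',')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}
```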
Impact Before Fix

Every request from an anonymous (no-policy) user would fail with a 503 compliance error when POLICY_FAIL_OPEN=false. Blocked all unauthenticated API access.

Post-Fix Concurrency Benchmark
7
Concurrency Benchmark — K8s vs Bare Metal
Performance
Post-fix benchmarks across 20 concurrent cache-hit requests and 15 concurrent Phase 4 semantic variant requests:
Metric                                   Bare Metal   K8s (2× s-4vcpu-8gb)
20 concurrent cache hits — p50           0.14s        0.57s
20 concurrent cache hits — wall time     0.22s        0.59s
15 concurrent semantic variants — p50    0.18s        0.41s
HTTP 200 rate                            100%         100%
Errors / deadlocks                       0            0
K8s overhead is network and container runtime latency. Application-level deadlocks are fully resolved on both platforms.
Release Summary
#   Feature / Fix                                                                                        Area
1   Phase 4 VectorLite BERT semantic KNN cache — all-MiniLM-L6-v2, 384-dim, cosine ≥ 0.90, Redis Stack   Caching
2   Four-phase MetaCache pipeline (intent fingerprint → near-miss → exact key → VectorLite)              Caching
3   Default similarity threshold raised 0.85 → 0.90                                                      Caching
4   Kubernetes / Helm chart production validation (DigitalOcean, NGINX ingress, cert-manager TLS)        Deployment
5   Compliance & policy-perfect horizontal scaling (replicas: 3/2) in Helm values                        Deployment
6   Fix: Key store reader deadlock — Mutex + AtomicU64 expiry + background refresh                       Bug Fix
7   Fix: Compliance check async refactor — 8s timeout, zero block_in_place                               Bug Fix
8   Fix: MAESTRO UUID empty-string guard — filter + storage layer defence-in-depth                       Bug Fix