TL;DR: Verification is a critical task requiring maximum accuracy. Cloud LLMs (GPT-4, Claude) deliver 90-95% accuracy vs 70-80% for local models, making them ideal for production verification.

🎯 The Core Problem

Verification requires two things:
  1. LLM understands the query (natural language → structured reasoning)
  2. Symbolic verifier proves the answer (SymPy, Z3, AST)
If the LLM gets step 1 wrong, verification fails even with perfect symbolic math.

📊 LLM Accuracy Comparison

Math Verification Example

Query: “What is the integral of 2x?”
| Model | Type | Accuracy | Typical Response |
| --- | --- | --- | --- |
| GPT-4o-mini | Cloud | ~95% | “x² + C” ✅ |
| Claude 3 Haiku | Cloud | ~93% | “x² + C” ✅ |
| Llama 3 8B | Local | ~75% | Sometimes “x² + C” ✅, sometimes “2x²/2” ❌ |
| Mistral 7B | Local | ~70% | Inconsistent; may confuse derivative and integral |

Why This Matters

When QWED verifies:
1. LLM says: "x² + C"
2. SymPy computes: integrate(2*x, x) = x**2
3. QWED compares: ✅ MATCH!
If the LLM is wrong:
1. LLM says: "2x²" (incorrect)
2. SymPy computes: x**2
3. QWED: ❌ NO MATCH → Verification fails
Result: The user sees a failure, even though QWED’s symbolic engine is correct!
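The comparison above can be sketched with SymPy directly (a minimal illustration; `verify_integral` is a hypothetical helper, not the QWED API):

```python
import sympy as sp

def verify_integral(llm_answer: str, integrand: str, var: str = "x") -> bool:
    """Symbolically check an LLM's claimed antiderivative of `integrand`."""
    x = sp.Symbol(var)
    # Naively drop the constant of integration ("+ C") before parsing
    claimed = sp.sympify(llm_answer.replace("C", "0"))
    # Differentiate the claim back: F' must equal the integrand
    return sp.simplify(sp.diff(claimed, x) - sp.sympify(integrand)) == 0

# A correct answer ("x**2 + C") verifies; a wrong one ("2*x**2") does not
print(verify_integral("x**2 + C", "2*x"))  # True
print(verify_integral("2*x**2", "2*x"))    # False
```

Comparing by differentiating back (rather than integrating and matching strings) makes any valid constant of integration pass, which matters because LLMs phrase equivalent answers in many forms.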

🤔 When to Use Each

| Use Case | Local LLM (Ollama) | Cloud LLM (OpenAI/Anthropic) |
| --- | --- | --- |
| Development/Testing | ✅ Free, fast iteration | ⚠️ Costs add up |
| Production (Critical) | ❌ Lower accuracy | ✅ Recommended |
| Privacy-Sensitive Data | ✅ 100% local + PII masking | ⚠️ Use with PII masking |
| Cost-Sensitive | ✅ $0/month | ⚠️ ~$5-50/month |
| High-Stakes Decisions | ❌ Risk of errors | ✅ Recommended |

💡 QWED’s Hybrid Approach

Best Practice: Use both strategically

Development Setup (Free)

from qwed_sdk import QWEDLocal

# Local LLM for development
client_dev = QWEDLocal(
    base_url="http://localhost:11434/v1",  # Ollama
    model="llama3",
    cache=True  # Cache responses
)

# Test your queries
result = client_dev.verify("What is 2+2?")
Cost: $0/month
Use for: Prototyping, experimentation, learning

Production Setup (Reliable)

import os
from qwed_sdk import QWEDLocal

# Cloud LLM for production
client_prod = QWEDLocal(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini",
    mask_pii=True,   # Privacy protection
    cache=True        # 50-80% cost savings!
)

# Critical verification
result = client_prod.verify("Verify calculation: ...")
Cost: ~$5-10/month (with caching!)
Use for: Production, high-stakes decisions

💰 Cost Analysis

Local LLM (Ollama)

  • Setup: 10 minutes (download model)
  • Monthly Cost: $0
  • Accuracy: 70-80% on math/logic
  • Privacy: 100% local
  • Best for: Development, testing, learning

Cloud LLM (OpenAI GPT-4o-mini)

  • Setup: 2 minutes (API key)
  • Monthly Cost: $5-10 (with caching)
  • Accuracy: 90-95% on math/logic
  • Privacy: Use PII masking
  • Best for: Production, critical tasks

With QWED Caching (Smart Cost Savings)

# First query: Hits LLM (costs $$)
result1 = client.verify("What is 2+2?")

# Same query within 24 hours: Cache hit (FREE!)
result2 = client.verify("What is 2+2?")  # $0 cost!
Real Savings: 50-80% cost reduction on repeated queries!
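The cache behaviour can be sketched as a time-windowed memo around the LLM call (an illustrative sketch only; QWED’s real cache keying and storage are not specified here, and the 24-hour TTL is taken from the example above):

```python
import time

class TTLCache:
    """Memoize query results for a fixed time window (default 24 h)."""
    def __init__(self, ttl_seconds=24 * 3600):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (result, timestamp)

    def get_or_compute(self, query, compute):
        hit = self._store.get(query)
        if hit is not None and time.time() - hit[1] < self.ttl:
            return hit[0]           # cache hit: no LLM call, $0
        result = compute(query)     # cache miss: pay for the LLM call
        self._store[query] = (result, time.time())
        return result

calls = []
def fake_llm(query):
    calls.append(query)  # stand-in for a paid LLM request
    return "4"

cache = TTLCache()
cache.get_or_compute("What is 2+2?", fake_llm)
cache.get_or_compute("What is 2+2?", fake_llm)  # served from cache
print(len(calls))  # 1: only the first call reached the "LLM"
```

Keying the cache on the exact query string is the simplest scheme; it is why the savings figure applies to *repeated* queries specifically.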

🔒 Privacy Considerations

Local LLM Advantages

  • 100% private - data never leaves your machine
  • No API keys - no third-party access
  • Compliance - easier GDPR/HIPAA compliance

Cloud LLM with PII Masking

client = QWEDLocal(
    provider="openai",
    mask_pii=True,  # Auto-mask emails, SSNs, etc.
    pii_entities=["EMAIL_ADDRESS", "CREDIT_CARD", "US_SSN"]
)

# Sensitive data protected!
result = client.verify("User email: john@example.com, calculate 2+2")
# OpenAI sees: "User email: <EMAIL_ADDRESS>, calculate 2+2"
Result: Cloud accuracy + local privacy! 🔒
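A rough sketch of this kind of entity masking using stdlib regexes (illustrative only; QWED’s actual masking is driven by the `pii_entities` names above, and production PII detection needs far more robust patterns):

```python
import re

# Simplified patterns; real PII detection is much more involved
PATTERNS = {
    "EMAIL_ADDRESS": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "US_SSN": r"\b\d{3}-\d{2}-\d{4}\b",
}

def mask_pii(text: str) -> str:
    """Replace detected PII spans with <ENTITY_NAME> placeholders."""
    for name, pattern in PATTERNS.items():
        text = re.sub(pattern, f"<{name}>", text)
    return text

masked = mask_pii("User email: john@example.com, calculate 2+2")
print(masked)  # User email: <EMAIL_ADDRESS>, calculate 2+2
```

Masking happens before the text leaves your machine, so the cloud provider only ever sees the placeholders.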

🎯 Recommendation by Use Case

Healthcare (HIPAA)

# Option 1: Local LLM (most private)
client = QWEDLocal(
    base_url="http://localhost:11434/v1",
    model="llama3"
)

# Option 2: Cloud + PII masking (more accurate)
client = QWEDLocal(
    provider="openai",
    mask_pii=True,
    pii_entities=["PERSON", "US_SSN", "MEDICAL_LICENSE"]
)
Recommendation: Cloud + PII masking for critical diagnoses

Finance (PCI-DSS)

client = QWEDLocal(
    provider="openai",
    mask_pii=True,
    pii_entities=["CREDIT_CARD", "IBAN_CODE"]
)
Recommendation: Cloud + PII masking (accuracy matters for money!)

Enterprise (General)

# Development
dev_client = QWEDLocal(base_url="http://localhost:11434/v1", model="llama3")

# Production
prod_client = QWEDLocal(provider="openai", mask_pii=True, cache=True)
Recommendation: Hybrid approach
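One way to wire the hybrid approach is a small environment-driven config factory (a hypothetical helper; the `APP_ENV` variable is an assumption, and the returned keys simply mirror the QWEDLocal parameters used above):

```python
import os

def qwed_config(env=None):
    """Pick local vs cloud QWED settings based on the deployment environment."""
    env = env or os.getenv("APP_ENV", "development")
    if env == "production":
        # Cloud LLM with privacy and cost controls enabled
        return {"provider": "openai", "mask_pii": True, "cache": True}
    # Default: free local Ollama for development and testing
    return {"base_url": "http://localhost:11434/v1", "model": "llama3"}

# client = QWEDLocal(**qwed_config())  # same code path in dev and prod
print(qwed_config("production")["provider"])  # openai
```

Keeping the switch in one place means the rest of your code never needs to know which backend is in use.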

🚀 The QWED Advantage

Even with local LLMs, QWED catches errors!

Scenario: Local LLM Makes Mistake

client = QWEDLocal(base_url="http://localhost:11434/v1", model="llama3")

# Llama 3 might say: "Derivative of x² is x" (WRONG!)
result = client.verify("What is the derivative of x²?")

# QWED's symbolic verification:
# SymPy: diff(x**2, x) = 2*x
# LLM said: "x"
# QWED: ❌ NO MATCH! 
# result.verified = False
User sees: “Verification failed - LLM answer doesn’t match symbolic proof”
QWED still catches the mistake, but:
  • More verification failures mean a worse user experience
  • Cloud LLMs fail less often, so verification succeeds more of the time

📈 Accuracy in Practice

From QWED internal testing:
| Domain | Local LLM (Llama 3 8B) | Cloud LLM (GPT-4o-mini) |
| --- | --- | --- |
| Basic Math | 85% | 98% |
| Calculus | 75% | 95% |
| Logic (SAT) | 70% | 93% |
| Code Security | 80% | 96% |

Takeaway: Cloud LLMs are 13-23 percentage points more accurate across these domains, which sharply reduces verification failures!

🎓 Bottom Line

Start with Local LLM

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull llama3

# Use with QWED
python -c "from qwed_sdk import QWEDLocal; \
  client = QWEDLocal(base_url='http://localhost:11434/v1', model='llama3'); \
  print(client.verify('2+2'))"
Perfect for: Learning, prototyping, hobby projects

Scale to Cloud LLM

# Get API key from OpenAI
export OPENAI_API_KEY="sk-..."

# Use with QWED
python -c "from qwed_sdk import QWEDLocal; \
  client = QWEDLocal(provider='openai', mask_pii=True, cache=True); \
  print(client.verify('2+2'))"
Perfect for: Production, enterprise, critical decisions

❓ FAQ

Q: Can I use Llama 3 70B instead of GPT-4?
A: Yes! Larger local models (70B+) approach cloud accuracy but require significant hardware (40GB+ VRAM).
Q: Is Ollama really free?
A: Yes! Fully open source. You just need hardware to run it.
Q: What about Google Gemini?
A: QWED supports Gemini! Similar accuracy to GPT-4/Claude.
Q: Can I switch between local and cloud?
A: Absolutely! Change the provider parameter anytime.
Q: Do I need PII masking with local LLMs?
A: Not necessarily, but it’s still good practice for audit trails.

The choice is yours - QWED works with both! 🚀 Recommendation: Start local (free), scale to cloud (reliable) when it matters.