TL;DR: Verification is a critical task requiring maximum accuracy. Cloud LLMs (GPT-4, Claude) deliver 90-95% accuracy vs 70-80% for local models, making them ideal for production verification.

🎯 The core problem

Verification requires two things:
  1. LLM translates the query (natural language → structured reasoning)
  2. Symbolic verifier proves the answer (SymPy, Z3, AST)
If the LLM gets step 1 wrong, verification fails even with perfect symbolic math.
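The two steps can be sketched as follows. This is a minimal illustration, not QWED's internal code: the task dictionary layout and the `verify` helper are assumptions made for the example, and the LLM translation step is replaced by hand-written dictionaries. It shows that a wrong translation makes verification fail even though SymPy itself computes correctly.

```python
import sympy as sp

x = sp.Symbol("x")

def verify(task: dict) -> bool:
    """Check a claimed answer against SymPy for a structured task.

    `task` is a hypothetical structure standing in for the LLM's
    translation of a natural-language query (step 1).
    """
    expr = sp.sympify(task["expression"])
    claimed = sp.sympify(task["claimed_answer"])
    if task["operation"] == "differentiate":
        expected = sp.diff(expr, x)
    elif task["operation"] == "integrate":
        expected = sp.integrate(expr, x)
    else:
        raise ValueError(f"unsupported operation: {task['operation']}")
    # Step 2: the symbolic check succeeds only if the difference simplifies to zero.
    return sp.simplify(expected - claimed) == 0

# Query: "What is the derivative of x^2?" translated correctly.
good = {"operation": "differentiate", "expression": "x**2", "claimed_answer": "2*x"}

# Same query, but the translation confused derivative with integral.
bad = {"operation": "integrate", "expression": "x**2", "claimed_answer": "2*x"}

print(verify(good))  # True
print(verify(bad))   # False: the math is right, the translation was not
```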

📊 LLM accuracy comparison

Math verification example

Query: “What is the integral of 2x?”

| Model | Type | Accuracy | Typical Response |
| --- | --- | --- | --- |
| GPT-4o-mini | Cloud | ~95% | “x² + C” ✅ |
| Claude 3 Haiku | Cloud | ~93% | “x² + C” ✅ |
| Llama 3 8B | Local | ~75% | Sometimes “x² + C” ✅, sometimes “2x²” ❌ |
| Mistral 7B | Local | ~70% | Inconsistent, may confuse derivative/integral |

Why this matters

When QWED verifies:
1. LLM says: "x² + C"
2. SymPy computes: integrate(2*x, x) = x**2 (SymPy omits the constant of integration)
3. QWED compares: ✅ MATCH!
If the LLM is wrong:
1. LLM says: "2x²" (incorrect)
2. SymPy computes: x**2
3. QWED: ❌ NO MATCH → Verification fails
Result: The user sees a failure, even though QWED’s symbolic engine is correct!
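One way to make such a comparison robust to the constant of integration is to differentiate the LLM's answer and compare it with the integrand. The sketch below assumes the LLM answer has already been parsed into SymPy-compatible syntax; `is_antiderivative` is a hypothetical helper for illustration and is not taken from the QWED SDK.

```python
import sympy as sp

x = sp.Symbol("x")
C = sp.Symbol("C")  # constant of integration

def is_antiderivative(candidate: str, integrand: str) -> bool:
    """Return True if d/dx(candidate) equals the integrand."""
    names = {"x": x, "C": C}
    f = sp.sympify(integrand, locals=names)
    F = sp.sympify(candidate, locals=names)
    # Differentiating removes any "+ C", so both "x**2" and "x**2 + C" pass.
    return sp.simplify(sp.diff(F, x) - f) == 0

print(is_antiderivative("x**2 + C", "2*x"))  # True
print(is_antiderivative("2*x**2", "2*x"))    # False
```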

🤔 When to use each

| Use Case | Local LLM (Ollama) | Cloud LLM (OpenAI/Anthropic) |
| --- | --- | --- |
| Development/Testing | ✅ Free, fast iteration | ⚠️ Costs add up |
| Production (Critical) | ❌ Lower accuracy | ✅ Recommended |
| Privacy-Sensitive Data | ✅ 100% local + PII masking | ⚠️ Use with PII masking |
| Cost-Sensitive | ✅ $0/month | ⚠️ ~$5-50/month |
| High-Stakes Decisions | ❌ Risk of errors | ✅ Recommended |

💡 QWED’s hybrid approach

Best Practice: Use both strategically

Development setup (free)

from qwed_sdk import QWEDLocal

# Local LLM for development
client_dev = QWEDLocal(
    base_url="http://localhost:11434/v1",  # Ollama
    model="llama3",
    cache=True  # Cache responses
)

# Test your queries
result = client_dev.verify("What is 2+2?")
Cost: $0/month
Use for: Prototyping, experimentation, learning

Production setup (reliable)

import os
from qwed_sdk import QWEDLocal

# Cloud LLM for production
client_prod = QWEDLocal(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini",
    mask_pii=True,   # Privacy protection
    cache=True        # 50-80% cost savings!
)

# Critical verification
result = client_prod.verify("Verify calculation: ...")
Cost: ~$5-10/month (with caching!)
Use for: Production, high-stakes decisions
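To keep both setups in one codebase, the constructor arguments can be chosen from an environment setting. The sketch below only builds the keyword arguments shown in the two setups above; the `APP_ENV` variable and the `qwed_client_kwargs` helper are assumptions for illustration, not part of the QWED SDK.

```python
import os

def qwed_client_kwargs(env: str) -> dict:
    """Return QWEDLocal keyword arguments for the given environment name."""
    if env == "production":
        # Cloud LLM: higher accuracy, PII masking and caching enabled.
        return {
            "provider": "openai",
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": "gpt-4o-mini",
            "mask_pii": True,
            "cache": True,
        }
    # Anything else falls back to the free local Ollama setup.
    return {
        "base_url": "http://localhost:11434/v1",
        "model": "llama3",
        "cache": True,
    }

# Usage (requires qwed_sdk to be installed; APP_ENV is a hypothetical variable):
# from qwed_sdk import QWEDLocal
# client = QWEDLocal(**qwed_client_kwargs(os.getenv("APP_ENV", "development")))
```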

💰 Cost analysis

Local LLM (Ollama)

  • Setup: 10 minutes (download model)
  • Monthly Cost: $0
  • Accuracy: 70-80% on math/logic
  • Privacy: 100% local
  • Best for: Development, testing, learning

Cloud LLM (OpenAI GPT-4o-mini)

  • Setup: 2 minutes (API key)
  • Monthly Cost: $5-10 (with caching)
  • Accuracy: 90-95% on math/logic
  • Privacy: Use PII masking
  • Best for: Production, critical tasks
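The monthly cloud figure depends on query volume and on how often the cache answers a query. A rough way to estimate it is sketched below. The per-query price and the volume are placeholders chosen for the example, not real provider pricing, and `monthly_llm_cost` is a hypothetical helper.

```python
def monthly_llm_cost(queries_per_month: int,
                     cost_per_query: float,
                     cache_hit_rate: float) -> float:
    """Estimate monthly spend when cache hits cost nothing."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_queries = queries_per_month * (1.0 - cache_hit_rate)
    return billable_queries * cost_per_query

# Placeholder numbers: 100,000 queries at $0.0002 each.
print(f"${monthly_llm_cost(100_000, 0.0002, 0.0):.2f}")  # $20.00 without caching
print(f"${monthly_llm_cost(100_000, 0.0002, 0.6):.2f}")  # $8.00 with a 60% hit rate
```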

With QWED caching (cost savings)

# First query: Hits LLM (costs $$)
result1 = client.verify("What is 2+2?")

# Same query within 24 hours: Cache hit (FREE!)
result2 = client.verify("What is 2+2?")  # $0 cost!
Savings: repeated queries hit the cache and skip the LLM call.
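The idea behind a 24-hour cache can be illustrated with a small time-to-live (TTL) cache. This is a sketch of the general technique and not QWED's actual cache: `TTLCache` and `fake_llm` are hypothetical names, and the LLM call is replaced by a stub.

```python
import time
from typing import Callable, Dict, Tuple

class TTLCache:
    """Cache answers per query string and expire them after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.misses = 0  # number of times the expensive call was made

    def get_or_compute(self, query: str, compute: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: no LLM call
        self.misses += 1
        value = compute(query)  # cache miss: pay for the LLM call
        self._store[query] = (now, value)
        return value

def fake_llm(query: str) -> str:
    """Stand-in for a paid LLM request."""
    return f"answer to {query}"

cache = TTLCache()
first = cache.get_or_compute("What is 2+2?", fake_llm)
second = cache.get_or_compute("What is 2+2?", fake_llm)
print(cache.misses)  # 1: the second call was served from the cache
```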

🔒 Privacy considerations

Local LLM advantages

✅ Private - data stays on your machine
✅ No API keys - no third-party access
✅ Compliance - easier GDPR/HIPAA compliance

Cloud LLM with PII masking

client = QWEDLocal(
    provider="openai",
    mask_pii=True,  # Auto-mask emails, SSNs, etc.
    pii_entities=["EMAIL_ADDRESS", "CREDIT_CARD", "US_SSN"]
)

# Sensitive data protected!
result = client.verify("User email: john@example.com, calculate 2+2")
# OpenAI sees: "User email: <EMAIL_ADDRESS>, calculate 2+2"
Result: Cloud accuracy + local privacy! 🔒
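To make the masking step concrete, here is a minimal regex-based sketch that reproduces the placeholder format shown above. It is an illustration only and makes no claim about how QWED detects PII internally. The two patterns are deliberately simple and would miss many real-world formats.

```python
import re

# Simplified patterns for illustration; production PII detection needs more care.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace each detected entity with a <LABEL> placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

masked = mask_pii("User email: john@example.com, calculate 2+2")
print(masked)  # User email: <EMAIL_ADDRESS>, calculate 2+2
```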

🎯 Recommendation by use case

Healthcare (HIPAA)

# Option 1: Local LLM (most private)
client = QWEDLocal(
    base_url="http://localhost:11434/v1",
    model="llama3"
)

# Option 2: Cloud + PII masking (more accurate)
client = QWEDLocal(
    provider="openai",
    mask_pii=True,
    pii_entities=["PERSON", "US_SSN", "MEDICAL_LICENSE"]
)
Recommendation: Cloud + PII masking for critical diagnoses

Finance (PCI-DSS)

client = QWEDLocal(
    provider="openai",
    mask_pii=True,
    pii_entities=["CREDIT_CARD", "IBAN_CODE"]
)
Recommendation: Cloud + PII masking (accuracy matters for money!)

Enterprise (general)

# Development
dev_client = QWEDLocal(base_url="http://localhost:11434/v1", model="llama3")

# Production
prod_client = QWEDLocal(provider="openai", mask_pii=True, cache=True)
Recommendation: Hybrid approach

🚀 The QWED advantage

Even with local LLMs, QWED catches errors!

Scenario: local LLM makes mistake

client = QWEDLocal(base_url="http://localhost:11434/v1", model="llama3")

# Llama 3 might say: "Derivative of x² is x" (WRONG!)
result = client.verify("What is the derivative of x²?")

# QWED's symbolic verification:
# SymPy: diff(x**2, x) = 2*x
# LLM said: "x"
# QWED: ❌ NO MATCH!
# result.verified = False
User sees: “Verification failed - LLM answer doesn’t match symbolic proof”
But:
  • More failures = worse UX
  • Cloud LLMs = fewer verification failures = better UX

📈 Accuracy in practice

From QWED internal testing:
| Domain | Local LLM (Llama 3 8B) | Cloud LLM (GPT-4o-mini) |
| --- | --- | --- |
| Basic Math | 85% | 98% |
| Calculus | 75% | 95% |
| Logic (SAT) | 70% | 93% |
| Code Security | 80% | 96% |
Takeaway: In these tests, the cloud LLM scores 13-23 percentage points higher than the local model, so far fewer verifications fail!
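The gap can be read two ways: as a gain in accuracy points, or as a relative drop in failures. The short calculation below derives both from the table figures; `compare` is a helper written for this example.

```python
# (local accuracy %, cloud accuracy %) from the table above
accuracy = {
    "Basic Math": (85, 98),
    "Calculus": (75, 95),
    "Logic (SAT)": (70, 93),
    "Code Security": (80, 96),
}

def compare(local_pct: int, cloud_pct: int) -> tuple:
    """Return (accuracy gain in points, relative reduction in failures)."""
    point_gain = cloud_pct - local_pct
    local_failures = 100 - local_pct
    cloud_failures = 100 - cloud_pct
    relative_drop = (local_failures - cloud_failures) / local_failures
    return point_gain, relative_drop

for domain, (local, cloud) in accuracy.items():
    gain, drop = compare(local, cloud)
    print(f"{domain}: +{gain} points, {drop:.0%} fewer failures")
```

For Calculus, for example, failures fall from 25% to 5% of queries, which is a gain of 20 points and an 80% relative reduction.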

🎓 Bottom line

Start with local LLM

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull llama3

# Use with QWED
python -c "from qwed_sdk import QWEDLocal; \
  client = QWEDLocal(base_url='http://localhost:11434/v1', model='llama3'); \
  print(client.verify('2+2'))"
Perfect for: Learning, prototyping, hobby projects

Scale to cloud LLM

# Get API key from OpenAI
export OPENAI_API_KEY="sk-..."

# Use with QWED
python -c "from qwed_sdk import QWEDLocal; \
  client = QWEDLocal(provider='openai', mask_pii=True, cache=True); \
  print(client.verify('2+2'))"
Perfect for: Production, enterprise, critical decisions

โ“ FAQ

Q: Can I use Llama 3 70B instead of GPT-4?
A: Yes! Larger local models (70B+) approach cloud accuracy but require significant hardware (40GB+ VRAM).
Q: Is Ollama really free?
A: Yes! Fully open source. You just need hardware to run it.
Q: What about Google Gemini?
A: QWED supports Gemini! Similar accuracy to GPT-4/Claude.
Q: Can I switch between local and cloud?
A: Absolutely! Change the constructor arguments anytime: provider for a cloud LLM, base_url for a local server.
Q: Do I need PII masking with local LLMs?
A: Not necessarily, but it’s still good practice for audit trails.

The choice is yours - QWED works with both! 🚀
Recommendation: Start local (free), scale to cloud (reliable) when it matters.