## 🎯 The Core Problem

Verification requires two things:

- LLM understands the query (natural language → structured reasoning)
- Symbolic verifier proves the answer (SymPy, Z3, AST)
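The two-step split above can be sketched with SymPy for the integral example used throughout this page. The function name and control flow here are illustrative, not QWED's actual API:

```python
import sympy as sp

def verify_integral(integrand_expr: str, llm_answer: str) -> bool:
    """Symbolically check an LLM's claimed antiderivative.

    Differentiate the answer and compare it to the original
    integrand; if they match, the answer is proven correct.
    (Illustrative sketch, not QWED's real verifier.)
    """
    x = sp.Symbol("x")
    integrand = sp.sympify(integrand_expr)
    # Strip the "+ C" constant of integration before parsing.
    answer = sp.sympify(llm_answer.replace("+ C", ""))
    return sp.simplify(sp.diff(answer, x) - integrand) == 0

print(verify_integral("2*x", "x**2 + C"))  # correct answer passes
print(verify_integral("2*x", "2*x"))       # derivative/integral mix-up fails
```

This is why a weaker LLM still can't sneak a wrong answer through: the symbolic check is exact regardless of which model produced the text.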
## 📊 LLM Accuracy Comparison

### Math Verification Example

Query: “What is the integral of 2x?”

| Model | Type | Accuracy | Typical Response |
|---|---|---|---|
| GPT-4o-mini | Cloud | ~95% | “x² + C” ✅ |
| Claude 3 Haiku | Cloud | ~93% | “x² + C” ✅ |
| Llama 3 8B | Local | ~75% | Sometimes “x² + C” ✅, sometimes “2x²/2” ❌ |
| Mistral 7B | Local | ~70% | Inconsistent, may confuse derivative/integral |
### Why This Matters

When QWED verifies an answer, a wrong LLM translation means a failed verification, and more verification failures mean a worse user experience.

## 🤔 When to Use Each
| Use Case | Local LLM (Ollama) | Cloud LLM (OpenAI/Anthropic) |
|---|---|---|
| Development/Testing | ✅ Free, fast iteration | ⚠️ Costs add up |
| Production (Critical) | ❌ Lower accuracy | ✅ Recommended |
| Privacy-Sensitive Data | ✅ 100% local + PII masking | ⚠️ Use with PII masking |
| Cost-Sensitive | ✅ $0/month | ⚠️ ~$5-50/month |
| High-Stakes Decisions | ❌ Risk of errors | ✅ Recommended |
## 💡 QWED’s Hybrid Approach

Best Practice: Use both strategically.

### Development Setup (Free)
Use for: Prototyping, experimentation, learning
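A development setup might look like the sketch below. The key names are assumptions for illustration, not QWED's documented configuration schema; only the Ollama endpoint is its real default:

```python
# Hypothetical dev-time settings sketch (key names are illustrative).
dev_config = {
    "provider": "ollama",                 # runs locally, $0/month
    "model": "llama3:8b",                 # ~75% math accuracy (see table above)
    "base_url": "http://localhost:11434", # Ollama's default local endpoint
}
```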
### Production Setup (Reliable)
Use for: Production, high-stakes decisions
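And the production counterpart, again as a hypothetical sketch (the masking and caching flags are assumptions standing in for the features covered in the guides linked below):

```python
# Hypothetical production settings sketch (key names are illustrative).
prod_config = {
    "provider": "openai",          # cloud, ~95% accuracy on math/logic
    "model": "gpt-4o-mini",
    "enable_pii_masking": True,    # assumed flag: mask PII before sending
    "enable_cache": True,          # assumed flag: reuse repeat-query results
}
```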
## 💰 Cost Analysis

### Local LLM (Ollama)
- Setup: 10 minutes (download model)
- Monthly Cost: $0
- Accuracy: 70-80% on math/logic
- Privacy: 100% local
- Best for: Development, testing, learning
### Cloud LLM (OpenAI GPT-4o-mini)
- Setup: 2 minutes (API key)
- Monthly Cost: $5-10 (with caching)
- Accuracy: 90-95% on math/logic
- Privacy: Use PII masking
- Best for: Production, critical tasks
### With QWED Caching (Smart Cost Savings)
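The idea behind the savings is simple: identical queries should never pay for a second API call. This toy sketch shows the mechanism; QWED's real caching layer (see the Caching Guide) may differ in keys, storage, and eviction:

```python
import hashlib

# Illustrative in-memory cache keyed by a hash of the query text.
_cache: dict[str, str] = {}

def cached_call(query: str, call_llm) -> str:
    key = hashlib.sha256(query.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(query)  # paid call happens only on a miss
    return _cache[key]

calls = 0
def fake_llm(query: str) -> str:
    global calls
    calls += 1          # count how many "paid" calls we make
    return "x**2 + C"

cached_call("integral of 2x", fake_llm)
cached_call("integral of 2x", fake_llm)  # second call served from cache
print(calls)  # 1 - the repeat query cost nothing
```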
## 🔒 Privacy Considerations

### Local LLM Advantages

- ✅ 100% private - data never leaves your machine
- ✅ No API keys - no third-party access
- ✅ Compliance - easier GDPR/HIPAA compliance
### Cloud LLM with PII Masking
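With a cloud provider, sensitive values are replaced with placeholders before the text leaves your machine. The sketch below is deliberately minimal; the patterns and function name are illustrative, and the PII Masking Guide covers the real implementation:

```python
import re

# Minimal illustrative patterns - real PII masking needs far more coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Substitute labeled placeholders for detected PII."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Patient jane@example.com, SSN 123-45-6789"))
# → "Patient [EMAIL], SSN [SSN]"
```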
## 🎯 Recommendation by Use Case

### Healthcare (HIPAA)

Local LLM, or cloud LLM with PII masking enabled - patient data should never leave your control unmasked.

### Finance (PCI-DSS)

Cloud LLM with PII masking - high-stakes decisions favor cloud accuracy, with card data masked before any API call.

### Enterprise (General)

Local LLM for development and testing; cloud LLM (e.g. GPT-4o-mini) for production.
## 🚀 The QWED Advantage

Even with local LLMs, QWED catches errors!

### Scenario: Local LLM Makes a Mistake
- More failures = worse UX
- Cloud LLMs = fewer verification failures = better UX
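The failure-to-UX link can be made concrete with a retry loop. This is an illustrative sketch, not QWED's actual control flow: a wrong LLM answer never reaches the user, but each failed verification costs an extra round trip:

```python
# Illustrative verify-and-retry loop (not QWED's real implementation).
def verified_answer(query, llm, verify, max_retries=3):
    for attempt in range(1, max_retries + 1):
        answer = llm(query)
        if verify(query, answer):
            return answer, attempt  # more attempts = more latency and cost
    raise ValueError("verification failed after retries")

# A flaky local model: wrong on the first try, right on the second.
responses = iter(["2*x", "x**2"])
llm = lambda query: next(responses)
verify = lambda query, answer: answer == "x**2"  # stand-in for symbolic check

answer, attempts = verified_answer("integral of 2x", llm, verify)
print(attempts)  # 2 - the local model's mistake cost an extra round trip
```

The user still gets a verified answer either way; the cloud model simply gets there in fewer attempts.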
## 📈 Accuracy in Practice

From QWED internal testing:

| Domain | Local LLM (Llama 3 8B) | Cloud LLM (GPT-4o-mini) |
|---|---|---|
| Basic Math | 85% | 98% |
| Calculus | 75% | 95% |
| Logic (SAT) | 70% | 93% |
| Code Security | 80% | 96% |
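The Code Security row corresponds to the AST verifier mentioned at the top of this page. Here is a minimal sketch of how an AST-based check can flag dangerous calls regardless of which LLM wrote the code; the rule set is deliberately tiny and illustrative:

```python
import ast

# Toy deny-list - a real security check would cover far more patterns.
BANNED_CALLS = {"eval", "exec"}

def flags_banned_calls(source: str) -> bool:
    """Return True if the code calls eval/exec anywhere."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in BANNED_CALLS
        for node in ast.walk(tree)
    )

print(flags_banned_calls("eval(user_input)"))  # True - unsafe
print(flags_banned_calls("print('hello')"))    # False - clean
```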
## 🎓 Bottom Line

1. Start with a local LLM - free, private, and good enough for development.
2. Scale to a cloud LLM - when accuracy and reliability matter in production.
## 🔗 Related Documentation
- LLM Configuration Guide - Complete LLM setup
- PII Masking Guide - Privacy protection
- Caching Guide - Cost savings
## ❓ FAQ

Q: Can I use Llama 3 70B instead of GPT-4?
A: Yes! Larger local models (70B+) approach cloud accuracy but require significant hardware (40GB+ VRAM).

Q: Is Ollama really free?
A: Yes! Fully open source. You just need hardware to run it.

Q: What about Google Gemini?
A: QWED supports Gemini! Similar accuracy to GPT-4/Claude.

Q: Can I switch between local and cloud?
A: Absolutely! Change the `provider` parameter anytime.

Q: Do I need PII masking with local LLMs?
A: Not necessarily, but it’s still good practice for audit trails.

The choice is yours - QWED works with both! 🚀 Recommendation: Start local (free), scale to cloud (reliable) when it matters.