TL;DR: QWED verifies LLM outputs against developer-written code specifications, not natural language. If you let the LLM generate both the answer AND the spec, you’ve just verified a hallucination.
## How QWED Actually Works

### The Golden Rule

The specification and the ground truth always come from the developer; the LLM supplies only the candidate answer.
| Component | Who Provides It | Example |
|---|---|---|
| Specification | Developer (you) | `expected = "P * (1 + r)**n"` |
| LLM Output | LLM | `"$150,000"` |
| Ground Truth | Developer (you) | `P=100000, r=0.05, n=10` |
## Examples by Engine
### Math Engine
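A minimal sketch of the kind of check this engine performs, using the compound-interest example from the table above. It calls SymPy directly rather than QWED's actual API, and all variable names are illustrative:

```python
# Math-engine sketch: the developer supplies the formula (spec) and the
# inputs (ground truth); SymPy evaluates them deterministically. No LLM
# is involved in the verification step.
import sympy as sp

P, r, n = sp.symbols("P r n")
spec = P * (1 + r) ** n                        # developer-written spec
truth = {P: 100000, r: sp.Rational(5, 100), n: 10}

expected = float(spec.subs(truth))             # ≈ 162889.46
llm_answer = 150000.0                          # the LLM's claim ("$150,000")

verified = abs(llm_answer - expected) < 0.01
print(verified)  # False — the claim fails the spec
```

Because the spec is symbolic, the same check works for any values of `P`, `r`, and `n` without re-deriving anything by hand.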
### SQL Engine
### Code Engine
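A sketch of a security scan over LLM-generated code, using Python's standard `ast` module rather than QWED's API; the denylist is an illustrative assumption:

```python
# Code-engine sketch: the dangerous patterns are predefined by the
# developer; the ast module scans LLM-generated source deterministically,
# with no LLM in the verification loop.
import ast

BANNED_CALLS = {"eval", "exec", "__import__"}  # developer-defined denylist

llm_code = """
result = eval(user_input)
"""

tree = ast.parse(llm_code)
hits = [
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id in BANNED_CALLS
]
print(hits)  # ['eval'] — dangerous call detected
```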
### Logic Engine
## When QWED Does NOT Work
| Use Case | Why It Fails |
|---|---|
| "Verify this essay is factually correct" | No ground truth to compare against |
| "Check if this code does what I want" | "What you want" is ambiguous |
| "Validate this creative writing" | No deterministic notion of correctness |
| "Verify this translation is accurate" | Requires semantic understanding |
## When QWED Works Best
| Use Case | Why It Works |
|---|---|
| Financial calculations | Formula is known, answer must match |
| SQL query validation | Schema is known, query must follow rules |
| Code security scanning | Dangerous patterns are predefined |
| Logic proofs | Premises are given, conclusion must follow |
| Statistical claims | Data is known, statistics must be correct |
## Summary
| Question | Answer |
|---|---|
| Does QWED use LLMs to verify LLMs? | No. Uses SymPy, Z3, AST, SQLGlot. |
| Can QWED verify any LLM output? | No. Only structured, domain-specific outputs. |
| Who provides the ground truth? | You, the developer. |
| What if the spec is wrong? | Output will be wrong. Same as any software. |
> "Math verifies the proof, not the premise. If you hallucinate the constraints, you've verified a hallucination."

This is valid criticism, and we agree with it. That is exactly why the specs come from you, not from the LLM.