
QWED Verification: Specification Guide

TL;DR: QWED verifies LLM outputs against developer-written code specifications, not natural language. If you let the LLM generate both the answer AND the spec, you've just verified a hallucination.

How QWED Actually Works

┌─────────────────────────────────────────────────────────────┐
│ CORRECT USAGE                                               │
├─────────────────────────────────────────────────────────────┤
│ Developer provides: expected_value = "100000 * 1.05**10"    │
│ LLM generates:      answer = "150000"                       │
│ QWED verifies:      150000 ≠ 162889.46 → REJECTED           │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ INCORRECT USAGE                                             │
├─────────────────────────────────────────────────────────────┤
│ LLM generates: expected_value = "100000 * 1.05**5"          │
│ LLM generates: answer = "127628"                            │
│ QWED verifies: 127628 ≈ 127628.16 → VERIFIED ❌             │
│ (Hallucination verified because spec was also hallucinated) │
└─────────────────────────────────────────────────────────────┘

The Golden Rule

| Component     | Who Provides It | Example                       |
|---------------|-----------------|-------------------------------|
| Specification | Developer (you) | expected = "P * (1 + r)**n"   |
| LLM Output    | LLM             | "$150,000"                    |
| Ground Truth  | Developer (you) | P=100000, r=0.05, n=10        |

QWED does NOT solve: Natural language → Formal specification translation.

QWED DOES solve: "Is this LLM output correct given my known ground truth?"
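
To make the split concrete, here is a minimal sketch of the correct pattern in plain Python (illustrative only, not the QWED API). It uses SymPy, one of the deterministic backends named in the summary below, to evaluate the developer's spec; the LLM contributes nothing but the claimed answer.

from sympy import sympify

# Developer-provided spec and ground truth (trusted):
spec = "100000 * 1.05**10"
ground_truth = float(sympify(spec))   # 162889.4626777442

# LLM-provided output (the only untrusted input):
llm_answer = 150000.0

# Deterministic comparison: no LLM in the verification loop
is_verified = abs(llm_answer - ground_truth) < 0.01
# is_verified == False → REJECTED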


Examples by Engine

Math Engine

from qwed_new.core.verifier import VerificationEngine

engine = VerificationEngine()

# ✅ CORRECT: Developer provides formula
result = engine.verify_compound_interest(
    principal=100000,   # Developer knows this
    rate=0.05,          # Developer knows this
    time=10,            # Developer knows this
    n=1,                # Developer knows this
    expected=150000,    # LLM claimed this
)
# Result: is_correct=False, calculated=162889.46

# ❌ WRONG: Letting LLM provide the formula
# If LLM says "use rate=0.03" and you trust it, QWED can't help
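
For intuition, verify_compound_interest can be thought of as recomputing A = P * (1 + r/n)**(n*t) from the developer's trusted inputs and comparing. The sketch below is hypothetical (check_compound_interest is not a QWED function), but it shows why the claim above fails:

def check_compound_interest(principal, rate, time, n, expected, tol=0.01):
    # Recompute the answer from trusted inputs, then compare
    calculated = principal * (1 + rate / n) ** (n * time)
    return abs(calculated - expected) <= tol, calculated

ok, calc = check_compound_interest(100000, 0.05, 10, 1, expected=150000)
# ok is False, calc == 162889.4626777442 → the LLM's claim is rejected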

SQL Engine

from qwed_new.core.sql_verifier import SQLVerifier

verifier = SQLVerifier()

# ✅ CORRECT: Developer defines allowed tables
result = verifier.verify_query(
    query="SELECT * FROM users",                # LLM generated
    allowed_tables=["users", "orders"],         # Developer defines
    allowed_columns=["id", "name", "email"],    # Developer defines
)

# ❌ WRONG: Letting LLM define what tables are allowed
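
For intuition, an allowlist check of this shape can be built on SQLGlot (listed in the summary as one of QWED's backends). This sketch is illustrative, not SQLVerifier's actual implementation:

import sqlglot
from sqlglot import exp

def query_is_allowed(query, allowed_tables, allowed_columns):
    # Parse the query, then collect every table and column it references
    parsed = sqlglot.parse_one(query)
    tables = {t.name for t in parsed.find_all(exp.Table)}
    columns = {c.name for c in parsed.find_all(exp.Column)}
    return tables <= set(allowed_tables) and columns <= set(allowed_columns)

query_is_allowed("SELECT name FROM users", ["users", "orders"], ["id", "name", "email"])
# True: every referenced table and column is on the developer's allowlist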

Code Engine

from qwed_new.core.code_verifier import CodeSecurityVerifier

verifier = CodeSecurityVerifier()

# ✅ CORRECT: Developer defines forbidden patterns
result = verifier.analyze_code(
    code="eval(user_input)",  # LLM generated
    # Verifier has built-in dangerous pattern detection
)
# Result: UNSAFE - eval detected

# The patterns (eval, exec, __import__) are defined by QWED, not LLM
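
For intuition, this kind of detection can be done with Python's ast module (AST analysis is named in the summary): parse the code without running it, then walk the tree for calls to forbidden names. A hypothetical sketch:

import ast

FORBIDDEN = {"eval", "exec", "__import__"}

def find_dangerous_calls(code):
    # ast.parse builds the tree without executing the code
    return [
        node.func.id
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FORBIDDEN
    ]

find_dangerous_calls("eval(user_input)")  # ['eval'] → UNSAFE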

Logic Engine

from qwed_new.core.logic_verifier import LogicVerifier

verifier = LogicVerifier()

# ✅ CORRECT: Developer provides premises
result = verifier.verify_conclusion(
    premises=["All humans are mortal", "Socrates is human"],  # Developer
    conclusion="Socrates is mortal",                          # LLM claimed
)

# ❌ WRONG: Letting LLM provide premises
# If LLM says "All humans are immortal", QWED will verify wrong logic
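
For intuition, the syllogism above can be checked with Z3 (listed in the summary) by asserting the premises plus the negated conclusion and showing the combination is unsatisfiable. A sketch under that assumption, not LogicVerifier's actual code:

from z3 import (BoolSort, Const, DeclareSort, ForAll, Function,
                Implies, Not, Solver, unsat)

Thing = DeclareSort("Thing")
human = Function("human", Thing, BoolSort())
mortal = Function("mortal", Thing, BoolSort())
socrates = Const("socrates", Thing)
x = Const("x", Thing)

s = Solver()
s.add(ForAll([x], Implies(human(x), mortal(x))))  # "All humans are mortal"
s.add(human(socrates))                            # "Socrates is human"
s.add(Not(mortal(socrates)))                      # negate the LLM's conclusion
s.check() == unsat  # True: the conclusion follows from the premises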

When QWED Does NOT Work

| Use Case                                  | Why It Fails                        |
|-------------------------------------------|-------------------------------------|
| "Verify this essay is factually correct"  | No ground truth to compare against  |
| "Check if this code does what I want"     | "What you want" is ambiguous        |
| "Validate this creative writing"          | No deterministic correctness        |
| "Verify this translation is accurate"     | Requires semantic understanding     |

When QWED Works Best

| Use Case               | Why It Works                               |
|------------------------|--------------------------------------------|
| Financial calculations | Formula is known, answer must match        |
| SQL query validation   | Schema is known, query must follow rules   |
| Code security scanning | Dangerous patterns are predefined          |
| Logic proofs           | Premises are given, conclusion must follow |
| Statistical claims     | Data is known, statistics must be correct  |

Summary

| Question                            | Answer                                      |
|-------------------------------------|---------------------------------------------|
| Does QWED use LLMs to verify LLMs?  | No. It uses SymPy, Z3, AST, and SQLGlot.    |
| Can QWED verify any LLM output?     | No. Only structured, domain-specific outputs. |
| Who provides the ground truth?      | You, the developer.                         |
| What if the spec is wrong?          | Output will be wrong. Same as any software. |

"Math verifies the proof, not the premise. If you hallucinate the constraints, you've verified a hallucination."

— Valid criticism we agree with. That's why specs come from you, not the LLM.