TL;DR: CrossHair symbolic execution has real-world limitations. QWED addresses these with bounded model checking, depth limits, and graceful fallbacks.
The path explosion problem
Symbolic execution explores all possible execution paths. This works great for small code, but real-world code has:| Challenge | Example | Impact |
|---|---|---|
| Deep loops | for i in range(1000) | 1000+ paths to explore |
| Recursion | fibonacci(n) | Exponential path growth |
| Complex conditionals | Nested if/else chains | 2^n paths |
| Data structures | Dict/list operations | Unbounded state space |
When CrossHair works
| Code Type | Works Well? | Example |
|---|---|---|
| Pure functions | ✅ Yes | def add(a, b): return a + b |
| Simple validation | ✅ Yes | def is_positive(x): return x > 0 |
| Bounded loops | ✅ Yes | for i in range(10) |
| LLM-generated utilities | ✅ Yes | Most generated code is simple |
When CrossHair fails
| Code Type | Works? | Why |
|---|---|---|
| Deep recursion (n > 50) | ❌ No | Path explosion |
| Unbounded loops | ❌ No | Infinite paths |
| External I/O | ❌ No | Non-deterministic |
| Complex frameworks | ❌ No | Too much state |
QWED’s bounded model checking solution
We implemented depth limits to prevent path explosion:Fail-closed verification results
verify_code() is fail-closed: absence of proof is never reported as success. is_verified is only True when at least one function was actually checked, every checked function was proven, and no functions were skipped, unverifiable, or produced counterexamples.
The result dictionary always includes the following counters:
True only if symbolic proof actually succeeded for every discovered function. Defaults to False whenever proof was incomplete. The result also exposes is_safe and verified as aliases for the same boolean.One of:
verified— all functions were proven symbolically.counterexamples_found— CrossHair produced concrete counterexamples.no_verifiable_functions— code had no functions, or no typed functions could be checked.unverifiable— at least one function was skipped or could not be proven (e.g. missing type annotations).verification_error— verification did not complete cleanly.crosshair_not_available— CrossHair is not installed in the runtime.syntax_error— the submitted code could not be parsed.
Human-readable explanation of the status.
Total functions found in the submitted code.
Functions that were actually run through symbolic execution. A skipped function does not count as checked.
Functions that were proven by CrossHair without counterexamples.
Functions skipped because they cannot be analyzed (for example, no type annotations).
Functions that could not be proven, including skipped functions and functions whose verification raised an error.
Number of concrete counterexamples produced by CrossHair.
List of issue objects with
type (unverifiable, counterexample, or error), function name, and description.Why untyped code fails closed
CrossHair requires type annotations to perform symbolic execution. Functions without type hints are reported asskipped and unverifiable, and the overall result fails closed:
is_verified=False and status="unverifiable". Code that contains no functions at all returns status="no_verifiable_functions" rather than a pass-like result.
Pre-verification exits (crosshair_not_available, syntax_error) return the same fail-closed shape — is_verified=False and zeroed counters — so callers can rely on the same fields regardless of how verification ended.
Configuration guide
Conservative (fast, less coverage)
Balanced (default)
Thorough (slow, more coverage)
Fallback strategy
When symbolic execution fails, QWED falls back to:Honest benchmarks
We tested CrossHair on different code types:| Code Type | Lines | Loops | Result | Time |
|---|---|---|---|---|
| Simple math | 5 | 0 | ✅ Verified | 0.2s |
| String utils | 15 | 1 | ✅ Verified | 1.2s |
| Data validation | 30 | 3 | ✅ Verified | 5.8s |
| Complex parser | 100 | 10 | ⏱️ Timeout | 30s |
| Framework code | 500+ | Many | ❌ Not suitable | - |
Best practices
Do use symbolic execution for
- ✅ LLM-generated utility functions
- ✅ Mathematical calculations
- ✅ Validation logic
- ✅ Simple transformations
- ✅ Code with clear contracts
Don’t use symbolic execution for
- ❌ Entire applications
- ❌ Code with external dependencies
- ❌ Deep recursion algorithms
- ❌ Real-time systems
- ❌ Code with I/O operations
API reference
Bounded model checking
verify_bounded() applies loop and recursion bounds to the code before verification. If the bounds transform itself fails (e.g., the AST cannot be unparsed after transformation), the method returns a bounds_transform_error status instead of silently falling back to the original code:
Fail-closed result semantics
SymbolicVerifier.verify_code() treats absence of proof as failure. A result is only is_verified: true when CrossHair actually proved every function it checked. Code that contains no functions, contains untyped functions, or mixes typed and untyped functions cannot pass.
Result fields
The result dictionary distinguishes how each function was handled, so you can tell the difference between “verified” and “not actually checked”:| Field | Meaning |
|---|---|
is_verified | true only if at least one function was checked, all checked functions were proven, no functions were skipped or unverifiable, and no counterexamples were found. |
status | One of verified, no_verifiable_functions, unverifiable, counterexamples_found, or verification_error. |
message | Human-readable explanation of the status. |
functions_discovered | Total functions found in the source. |
functions_checked | Functions actually submitted to CrossHair. |
functions_verified | Functions CrossHair proved. |
functions_skipped | Functions skipped (for example, missing type annotations). |
functions_unverifiable | Functions that could not be proven, including skipped functions and verification errors. |
counterexamples_found | Number of counterexample issues reported. |
issues | Per-function issue records. Skipped functions appear with type: "unverifiable". |
Status values
status | When it applies |
|---|---|
verified | At least one function was checked and every checked function was proven. |
no_verifiable_functions | No functions were found, or every discovered function was skipped (for example, all untyped). |
unverifiable | At least one function was skipped or could not be proven. Some functions may still have been verified. |
counterexamples_found | CrossHair returned at least one counterexample. |
verification_error | Verification did not complete cleanly for another reason. |
Typed function — passes
Untyped function — fails closed
CrossHair requires type hints. Untyped functions are now reported as skipped and unverifiable rather than silently passing:Mixed typed and untyped — fails closed
Even if some functions verify, a single skipped function prevents an overall pass:Code with no functions — fails closed
Source that contains only top-level statements has nothing to prove and is no longer reported asis_verified: true:
Summary
| Question | Answer |
|---|---|
| Does symbolic execution work on all code? | No. Limited by path explosion. |
| How does QWED handle this? | Bounded model checking + fallbacks. |
| What’s the typical code size limit? | ~50-100 lines of simple code. |
| When should I use it? | LLM-generated utilities and validation. |
“My guess would be that technique falls apart at depths required in real world coding environments.” — Reddit criticism. We agree. That’s why we have bounds and fallbacks.