Symbolic execution limits in QWED

TL;DR: CrossHair symbolic execution has real-world limitations. QWED addresses these with bounded model checking, depth limits, and graceful fallbacks.

The path explosion problem

Symbolic execution explores all possible execution paths. This works great for small code, but real-world code has:

Challenge	Example	Impact
Deep loops	`for i in range(1000)`	1000+ paths to explore
Recursion	`fibonacci(n)`	Exponential path growth
Complex conditionals	Nested if/else chains	2^n paths
Data structures	Dict/list operations	Unbounded state space

When CrossHair works

Code Type	Works Well?	Example
Pure functions	✅ Yes	`def add(a, b): return a + b`
Simple validation	✅ Yes	`def is_positive(x): return x > 0`
Bounded loops	✅ Yes	`for i in range(10)`
LLM-generated utilities	✅ Yes	Most generated code is simple

When CrossHair fails

Code Type	Works?	Why
Deep recursion (n > 50)	❌ No	Path explosion
Unbounded loops	❌ No	Infinite paths
External I/O	❌ No	Non-deterministic
Complex frameworks	❌ No	Too much state

QWED’s bounded model checking solution

We implemented depth limits to prevent path explosion:

from qwed_new.core.symbolic_verifier import SymbolicVerifier

verifier = SymbolicVerifier(
    max_loop_iterations=100,  # Bound loops
    max_recursion_depth=50,   # Bound recursion
    timeout_seconds=30        # Hard timeout
)

result = verifier.verify_code(code)

if result["status"] == "TIMEOUT":
    # Fallback to static analysis
    print("Symbolic execution timed out, using AST analysis")

Fail-closed verification results

verify_code() is fail-closed: absence of proof is never reported as success. is_verified is only True when at least one function was actually checked, every checked function was proven, and no functions were skipped, unverifiable, or produced counterexamples. The result dictionary always includes the following counters:

boolean

True only if symbolic proof actually succeeded for every discovered function. Defaults to False whenever proof was incomplete. The result also exposes is_safe and verified as aliases for the same boolean.

string

One of:

verified — all functions were proven symbolically.
counterexamples_found — CrossHair produced concrete counterexamples.
no_verifiable_functions — code had no functions, or no typed functions could be checked.
unverifiable — at least one function was skipped or could not be proven (e.g. missing type annotations).
verification_error — verification did not complete cleanly.
crosshair_not_available — CrossHair is not installed in the runtime.
syntax_error — the submitted code could not be parsed.

string

Human-readable explanation of the status.

integer

Total functions found in the submitted code.

integer

Functions that were actually run through symbolic execution. A skipped function does not count as checked.

integer

Functions that were proven by CrossHair without counterexamples.

integer

Functions skipped because they cannot be analyzed (for example, no type annotations).

integer

Functions that could not be proven, including skipped functions and functions whose verification raised an error.

integer

Number of concrete counterexamples produced by CrossHair.

array

List of issue objects with type (unverifiable, counterexample, or error), function name, and description.

Why untyped code fails closed

CrossHair requires type annotations to perform symbolic execution. Functions without type hints are reported as skipped and unverifiable, and the overall result fails closed:

code = """
def add(a, b):
    return a + b
"""

result = verifier.verify_code(code)

assert result["is_verified"] is False
assert result["status"] == "no_verifiable_functions"
assert result["functions_discovered"] == 1
assert result["functions_checked"] == 0
assert result["functions_skipped"] == 1
assert result["functions_unverifiable"] == 1

Mixed code that contains both typed and untyped functions also fails closed — a single skipped function is enough to set is_verified=False and status="unverifiable". Code that contains no functions at all returns status="no_verifiable_functions" rather than a pass-like result. Pre-verification exits (crosshair_not_available, syntax_error) return the same fail-closed shape — is_verified=False and zeroed counters — so callers can rely on the same fields regardless of how verification ended.

Configuration guide

Conservative (fast, less coverage)

config = {
    "max_loop_iterations": 10,
    "max_recursion_depth": 10,
    "timeout_seconds": 5
}

Balanced (default)

config = {
    "max_loop_iterations": 100,
    "max_recursion_depth": 50,
    "timeout_seconds": 30
}

Thorough (slow, more coverage)

config = {
    "max_loop_iterations": 1000,
    "max_recursion_depth": 100,
    "timeout_seconds": 300
}

Fallback strategy

When symbolic execution fails, QWED falls back to:

1. Symbolic Execution (CrossHair)
   ↓ timeout/failure
2. Static Analysis (AST)
   ↓ insufficient
3. Type Checking (mypy patterns)
   ↓ still need more
4. Manual Review Flag

Honest benchmarks

We tested CrossHair on different code types:

Code Type	Lines	Loops	Result	Time
Simple math	5	0	✅ Verified	0.2s
String utils	15	1	✅ Verified	1.2s
Data validation	30	3	✅ Verified	5.8s
Complex parser	100	10	⏱️ Timeout	30s
Framework code	500+	Many	❌ Not suitable	-

Best practices

Do use symbolic execution for

✅ LLM-generated utility functions
✅ Mathematical calculations
✅ Validation logic
✅ Simple transformations
✅ Code with clear contracts

Don’t use symbolic execution for

❌ Entire applications
❌ Code with external dependencies
❌ Deep recursion algorithms
❌ Real-time systems
❌ Code with I/O operations

API reference

from qwed_new.core.symbolic_verifier import SymbolicVerifier

# Create verifier with custom limits
verifier = SymbolicVerifier(
    max_loop_iterations=100,
    max_recursion_depth=50,
    timeout_seconds=30
)

# Check if code is verifiable before executing
budget = verifier.get_verification_budget(code)
if budget["estimated_time_seconds"] > 60:
    print("Code too complex for symbolic execution")

# Verify with fallback
result = verifier.verify_code(
    code=code,
    fallback_to_ast=True  # Use AST if symbolic fails
)

Bounded model checking

verify_bounded() applies loop and recursion bounds to the code before verification. If the bounds transform itself fails (e.g., the AST cannot be unparsed after transformation), the method returns a bounds_transform_error status instead of silently falling back to the original code:

result = verifier.verify_bounded(code, loop_bound=50, recursion_depth=20)

if result["status"] == "bounds_transform_error":
    print(result["message"])  # Describes why the transform failed
    print(result["issues"])   # [{"type": "error", "description": "..."}]

Fail-closed result semantics

SymbolicVerifier.verify_code() treats absence of proof as failure. A result is only is_verified: true when CrossHair actually proved every function it checked. Code that contains no functions, contains untyped functions, or mixes typed and untyped functions cannot pass.

Result fields

The result dictionary distinguishes how each function was handled, so you can tell the difference between “verified” and “not actually checked”:

Field	Meaning
`is_verified`	`true` only if at least one function was checked, all checked functions were proven, no functions were skipped or unverifiable, and no counterexamples were found.
`status`	One of `verified`, `no_verifiable_functions`, `unverifiable`, `counterexamples_found`, or `verification_error`.
`message`	Human-readable explanation of the status.
`functions_discovered`	Total functions found in the source.
`functions_checked`	Functions actually submitted to CrossHair.
`functions_verified`	Functions CrossHair proved.
`functions_skipped`	Functions skipped (for example, missing type annotations).
`functions_unverifiable`	Functions that could not be proven, including skipped functions and verification errors.
`counterexamples_found`	Number of counterexample issues reported.
`issues`	Per-function issue records. Skipped functions appear with `type: "unverifiable"`.

Status values

`status`	When it applies
`verified`	At least one function was checked and every checked function was proven.
`no_verifiable_functions`	No functions were found, or every discovered function was skipped (for example, all untyped).
`unverifiable`	At least one function was skipped or could not be proven. Some functions may still have been verified.
`counterexamples_found`	CrossHair returned at least one counterexample.
`verification_error`	Verification did not complete cleanly for another reason.

Typed function — passes

from qwed_new.core.symbolic_verifier import SymbolicVerifier

verifier = SymbolicVerifier(timeout_seconds=5)

result = verifier.verify_code("""
def add(x: int, y: int) -> int:
    return x + y
""")

result["is_verified"]            # True
result["status"]                 # "verified"
result["functions_checked"]      # 1
result["functions_verified"]     # 1
result["functions_unverifiable"] # 0

Untyped function — fails closed

CrossHair requires type hints. Untyped functions are now reported as skipped and unverifiable rather than silently passing:

result = verifier.verify_code("""
def add(a, b):
    return a + b
""")

result["is_verified"]              # False
result["status"]                   # "no_verifiable_functions"
result["functions_discovered"]     # 1
result["functions_checked"]        # 0
result["functions_skipped"]        # 1
result["functions_unverifiable"]   # 1
result["issues"][0]["type"]        # "unverifiable"

Mixed typed and untyped — fails closed

Even if some functions verify, a single skipped function prevents an overall pass:

result = verifier.verify_code("""
def typed_add(a: int, b: int) -> int:
    return a + b

def untyped_add(a, b):
    return a + b
""")

result["is_verified"]            # False
result["status"]                 # "unverifiable"
result["functions_discovered"]   # 2
result["functions_checked"]      # 1
result["functions_verified"]     # 1
result["functions_skipped"]      # 1

Code with no functions — fails closed

Source that contains only top-level statements has nothing to prove and is no longer reported as is_verified: true:

result = verifier.verify_code("x = 1 + 2\nprint(x)\n")

result["is_verified"]   # False
result["status"]        # "no_verifiable_functions"

Do not gate downstream behavior on is_verified alone without also inspecting functions_checked. If functions_checked == 0, no symbolic proof was performed regardless of any other field.

Summary

Question	Answer
Does symbolic execution work on all code?	No. Limited by path explosion.
How does QWED handle this?	Bounded model checking + fallbacks.
What’s the typical code size limit?	~50-100 lines of simple code.
When should I use it?	LLM-generated utilities and validation.

“My guess would be that technique falls apart at depths required in real world coding environments.” — Reddit criticism. We agree. That’s why we have bounds and fallbacks.

Architecture

Concepts

Specifications

Integration Patterns

Contributing

Symbolic execution limits in QWED

The path explosion problem

When CrossHair works

When CrossHair fails

QWED’s bounded model checking solution

Fail-closed verification results

Why untyped code fails closed

Configuration guide

Conservative (fast, less coverage)

Balanced (default)

Thorough (slow, more coverage)

Fallback strategy

Honest benchmarks

Best practices

Do use symbolic execution for

Don’t use symbolic execution for

API reference

Bounded model checking

Fail-closed result semantics

Result fields

Status values

Typed function — passes

Untyped function — fails closed

Mixed typed and untyped — fails closed

Code with no functions — fails closed

Summary

​The path explosion problem

​When CrossHair works

​When CrossHair fails

​QWED’s bounded model checking solution

​Fail-closed verification results

​Why untyped code fails closed

​Configuration guide

​Conservative (fast, less coverage)

​Balanced (default)

​Thorough (slow, more coverage)

​Fallback strategy

​Honest benchmarks

​Best practices

​Do use symbolic execution for

​Don’t use symbolic execution for

​API reference

​Bounded model checking

​Fail-closed result semantics

​Result fields

​Status values

​Typed function — passes

​Untyped function — fails closed

​Mixed typed and untyped — fails closed

​Code with no functions — fails closed

​Summary

The path explosion problem

When CrossHair works

When CrossHair fails

QWED’s bounded model checking solution

Fail-closed verification results

Why untyped code fails closed

Configuration guide

Conservative (fast, less coverage)

Balanced (default)

Thorough (slow, more coverage)

Fallback strategy

Honest benchmarks

Best practices

Do use symbolic execution for

Don’t use symbolic execution for

API reference

Bounded model checking

Fail-closed result semantics

Result fields

Status values

Typed function — passes

Untyped function — fails closed

Mixed typed and untyped — fails closed

Code with no functions — fails closed

Summary