Features
- Chain-of-Thought Validation - Parse and verify reasoning steps
- Result Caching - LRU + Redis for repeated queries
- Multi-Provider Support - Anthropic, Azure OpenAI, Google Gemini, OpenAI
- Semantic Fact Extraction - Identify verifiable claims
Usage
Multi-provider verification
Caching
The engine caches results to avoid redundant LLM calls. Each ReasoningVerifier instance owns its own cache, so verifiers configured with different providers or modes do not share entries. Cache keys incorporate the query, the primary formula, the provider list (in configured order), and the cross-validation flag. Cached results are returned as defensive copies so callers cannot mutate stored entries.
Cache key composition
A cache entry is only reused when every component below matches the current verification call. Changing any of these forces a fresh computation:
- The verification query text, byte-for-byte.
- The expression of the primary task derived from the query.
- The configured provider list, in the order it was supplied. Provider precedence is part of cache identity: reordering providers (for example, swapping the primary) invalidates the cache, as do additions or removals.
- Whether cross-validation is requested. Toggling this flag invalidates cache reuse, because cross-validated and single-provider results are not interchangeable.
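As an illustration of the four components above, the following is a minimal sketch of how such a key could be derived. The function name make_cache_key, its parameter names, and the use of JSON plus SHA-256 are assumptions made for this example; the engine's actual key format is not documented here.

```python
import hashlib
import json
from typing import Sequence


def make_cache_key(
    query: str,
    primary_formula: str,
    providers: Sequence[str],
    enable_cross_validation: bool,
) -> str:
    """Hypothetical helper: hash the four cache-identity components."""
    payload = json.dumps(
        {
            "query": query,                      # compared byte-for-byte
            "formula": primary_formula,          # expression of the primary task
            "providers": list(providers),        # list order is preserved on purpose
            "cross_validation": enable_cross_validation,
        },
        sort_keys=True,       # stable ordering of the dictionary keys only
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the provider list is serialized in order, swapping the primary and secondary providers yields a different key, which matches the invalidation behavior described above.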
TTL enforcement
Cached entries are bound to their creation time and are evaluated against cache_ttl_seconds on every lookup. When an entry’s age exceeds the TTL, it is evicted and the verifier recomputes the result. Stale entries are never returned, even if they remain within the LRU window.
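The interaction between TTL and LRU can be sketched as follows. This is an illustrative in-memory cache, not the engine's implementation: the class name TTLCache, the max_entries parameter, and the injectable clock are assumptions; only cache_ttl_seconds comes from the text above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical LRU cache whose entries also expire by age."""

    def __init__(
        self,
        max_entries: int = 128,
        cache_ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._entries = OrderedDict()  # key -> (created_at, value)
        self._max_entries = max_entries
        self._ttl = cache_ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        created_at, value = item
        # The TTL is checked on every lookup, before any LRU bookkeeping.
        if self._clock() - created_at > self._ttl:
            del self._entries[key]  # evict the stale entry
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        # Drop least recently used entries once the size limit is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

A lookup that returns None signals the caller to recompute and store a fresh result, so an expired entry is never served even when it has not yet been pushed out by the LRU limit.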
Instance isolation
The cache is per-instance state, not shared across ReasoningVerifier objects. Two verifiers, even with identical configuration, start with independent caches and never read or write each other’s entries. This prevents stale results from one verification context replaying under a different one.
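A small sketch may help show what per-instance caching with defensive copies looks like in practice. The class SketchVerifier and its compute callback are hypothetical stand-ins invented for this example; they do not reflect the real ReasoningVerifier API.

```python
import copy
from typing import Callable, Dict


class SketchVerifier:
    """Hypothetical stand-in for a verifier that owns a private cache."""

    def __init__(self, compute: Callable[[str], dict]) -> None:
        self._compute = compute
        # Created in __init__, so every instance gets its own dictionary.
        # A class-level attribute here would be shared by all instances.
        self._cache: Dict[str, dict] = {}

    def verify(self, query: str) -> dict:
        if query not in self._cache:
            self._cache[query] = self._compute(query)
        # Return a deep copy so callers cannot mutate the stored entry.
        return copy.deepcopy(self._cache[query])
```

With this layout, two SketchVerifier objects built from the same configuration each call compute once for the same query, and mutating a returned result leaves the cached entry unchanged.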
Fail-closed prerequisites
The engine fails closed when required reasoning evidence is missing. A verification result is only marked valid when both prerequisites below are satisfied.
A substantive reasoning trace must be produced by the primary provider. The verifier scans each numbered or bulleted trace entry and rejects entries that contain any of the following non-substantive markers (case-insensitive):
- no llm provider
- could not generate reasoning trace
- no structured reasoning trace generated
- failed to generate reasoning trace
- n/a
- unavailable
- no reasoning
- rate limit exceeded
Entries are recognized by a leading number or - after stripping leading whitespace. If no substantive entries remain, the result includes the issue "Reasoning trace unavailable or non-substantive" and is_valid is False.
Cross-validation requires a secondary provider that is different from the primary. The primary provider is no longer reused as a fallback. If enable_cross_validation=True and no distinct secondary provider is configured, the result includes the issue "Cross-validation requested but no distinct secondary provider is available" and is_valid is False.
Formula equivalence
When verifying whether two arithmetic formulas produce the same result, the engine uses a safe AST-based evaluator instead of Python’s eval(). The evaluator only allows basic arithmetic operators (+, -, *, /, **) and numeric literals; any other expression is rejected. This prevents code-injection risks while still supporting numeric fallback checks during reasoning validation.
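The general technique can be sketched with Python's standard ast module. The function names safe_eval and formulas_equivalent, the tolerance value, and the decision to accept unary plus and minus are assumptions made for this example; the engine's own evaluator may differ in these details.

```python
import ast
import math
import operator

# Whitelisted binary operators: +, -, *, /, **
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
# Assumption: signed literals such as -3 are accepted as well.
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval_node(node):
    if isinstance(node, ast.Constant):
        # Only plain numbers are literals here; bool, str, None are rejected.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Names, calls, attributes, subscripts, and everything else end up here.
    raise ValueError(f"disallowed expression: {type(node).__name__}")


def safe_eval(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


def formulas_equivalent(left: str, right: str, rel_tol: float = 1e-9) -> bool:
    """Hypothetical helper: compare two formulas by numeric value."""
    return math.isclose(safe_eval(left), safe_eval(right), rel_tol=rel_tol)
```

An input such as `__import__('os').getcwd()` parses to a call node and is rejected with ValueError before anything runs. A production evaluator would likely also bound exponent size, since expressions like `9 ** 9 ** 9` are syntactically valid but expensive to compute.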
Fail-closed prerequisites
The Reasoning Engine refuses to mark a result as valid when the evidence required to verify it is missing. Absence of issues is not proof: if the engine cannot produce a substantive reasoning trace, or cannot actually run cross-validation when requested, the result is reported as invalid with an explanatory issue.
Required reasoning trace
Verification fails closed when no usable reasoning trace can be produced. This occurs when:
- No LLM provider is available to generate the trace.
- The provider returns an empty or placeholder response.
- Trace generation raises an error or hits a provider rate limit.
- The trace contains only non-substantive markers such as N/A, unavailable, no reasoning, rate limit exceeded, or messages indicating the trace could not be generated.
A trace line that begins with a number (1. ...) or - is treated as a reasoning step. Lines that match a non-substantive marker are not counted as substantive reasoning, even if they are formatted as a numbered list.
In any of these cases, result.is_valid is False and result.issues contains either "Reasoning trace missing" or "Reasoning trace unavailable or non-substantive".
verify_reasoning.
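The trace check described in this section can be approximated with the sketch below. The function names, the regular expression, and the rule that an entirely empty trace maps to "Reasoning trace missing" are assumptions for illustration; the marker strings and issue messages are taken from this document.

```python
import re
from typing import List

NON_SUBSTANTIVE_MARKERS = (
    "no llm provider",
    "could not generate reasoning trace",
    "no structured reasoning trace generated",
    "failed to generate reasoning trace",
    "n/a",
    "unavailable",
    "no reasoning",
    "rate limit exceeded",
)

# A reasoning step is a line starting with "1." style numbering or "-".
_ENTRY_PATTERN = re.compile(r"^(?:\d+\.|-)\s*(.*)$")


def substantive_entries(trace: str) -> List[str]:
    """Return trace steps that do not contain a non-substantive marker."""
    entries = []
    for line in trace.splitlines():
        match = _ENTRY_PATTERN.match(line.lstrip())  # strip leading whitespace
        if match is None:
            continue  # not formatted as a numbered or bulleted step
        text = match.group(1).strip()
        lowered = text.lower()  # markers are matched case-insensitively
        if not text or any(marker in lowered for marker in NON_SUBSTANTIVE_MARKERS):
            continue
        entries.append(text)
    return entries


def trace_issues(trace: str) -> List[str]:
    """Hypothetical check: an empty list means the trace prerequisite is met."""
    if not trace.strip():
        # Assumption: an empty trace is reported as "missing".
        return ["Reasoning trace missing"]
    if not substantive_entries(trace):
        return ["Reasoning trace unavailable or non-substantive"]
    return []
```

Under this sketch, a trace consisting only of `1. N/A` fails even though it is formatted as a numbered step, while a trace with at least one real step passes the prerequisite.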
Distinct secondary provider for cross-validation
When enable_cross_validation=True, the engine requires a secondary provider that is distinct from the primary. The primary provider is no longer reused as a fallback secondary path, because verifying a model against itself does not constitute cross-validation.
If only one provider is configured, cross-validation fails closed with the issue "Cross-validation requested but no distinct secondary provider is available".
Set enable_cross_validation=False to skip the secondary check.
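The provider-selection rule can be sketched as follows. The helper names, the representation of providers as plain strings, and the convention that the first configured provider is the primary are assumptions for this example; the issue message is quoted from this document.

```python
from typing import List, Optional, Sequence

NO_SECONDARY_ISSUE = (
    "Cross-validation requested but no distinct secondary provider is available"
)


def pick_secondary(providers: Sequence[str]) -> Optional[str]:
    """Return the first provider that differs from the primary, if any."""
    if not providers:
        return None
    primary = providers[0]  # assumption: the first entry is the primary
    for candidate in providers[1:]:
        if candidate != primary:
            return candidate
    return None  # never fall back to the primary itself


def cross_validation_issues(
    providers: Sequence[str], enable_cross_validation: bool
) -> List[str]:
    """Hypothetical check: an empty list means the prerequisite is satisfied."""
    if not enable_cross_validation:
        return []  # secondary check skipped entirely
    if pick_secondary(providers) is None:
        return [NO_SECONDARY_ISSUE]
    return []
```

A configuration that lists the same provider twice is treated the same as a single-provider setup, since a model checking its own output is not an independent second opinion.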