The Stats Engine executes statistical queries on tabular data. All model-generated code runs inside a secure Docker sandbox — in-process execution paths (Wasm and restricted Python) are disabled.

Features

  • Docker sandbox — Full container isolation for all statistical code execution
  • Fail-closed execution — If Docker is unavailable, verification is blocked rather than falling back to in-process execution
  • Pre-execution security validation — AST-based code analysis before Docker execution
  • Live Docker health checks — The executor verifies Docker availability on each request, not just at startup
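As an illustration of what AST-based pre-execution validation can look like, here is a minimal sketch built on Python's standard `ast` module. It is not the Stats Engine's actual rule set: the function name `find_violations` and the two blocklists are assumptions made for this example.

```python
import ast

# Hypothetical blocklists for illustration; the real policy is not documented here.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil", "sys"}
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def find_violations(source: str) -> list[str]:
    """Parse the code without running it and list any blocked constructs."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Code that does not parse is rejected rather than passed along.
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"import from '{node.module}'")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to '{node.func.id}'")
    return violations


print(find_violations("import pandas as pd\nresult = df['sales'].mean()"))
print(find_violations("import os\nos.system('ls')"))
```

A check like this only inspects syntax, which is why it complements container isolation rather than replacing it.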

Prerequisites

The Stats Engine requires a running Docker daemon. Without Docker, all statistical verification requests return HTTP 503. See the deployment guide for setup instructions.
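Before starting the service, it can help to confirm that the daemon is reachable from the host. The sketch below shells out to the standard `docker info` CLI command, which exits with a non-zero status when the daemon cannot be contacted. The helper name `docker_daemon_reachable` is hypothetical, and this is not necessarily how the Stats Engine performs its own health checks.

```python
import subprocess


def docker_daemon_reachable(timeout_seconds: float = 5.0) -> bool:
    """Return True only if `docker info` runs and exits with status 0."""
    try:
        completed = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        # CLI missing, not executable, or the daemon did not answer in time.
        return False
    return completed.returncode == 0


if docker_daemon_reachable():
    print("Docker daemon is reachable")
else:
    print("Docker daemon is not reachable; expect HTTP 503 from the Stats Engine")
```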

Usage

import pandas as pd
from qwed_sdk import QWEDClient

client = QWEDClient(api_key="qwed_...")

# Create sample data
df = pd.DataFrame({
    "product": ["A", "B", "C"],
    "sales": [100, 200, 150]
})

# Verify statistical claim
result = client.verify_stats(
    query="What is the average sales?",
    data=df
)
print(result.answer)  # 150.0

Execution model

All generated statistical code is executed inside a Docker container with enforced memory and CPU limits. The engine does not fall back to in-process execution under any circumstances.
Scenario                                 | Behavior
-----------------------------------------|------------------------------------------------------------
Docker running                           | Code executes in an isolated container
Docker unavailable at startup            | Requests return 503 Service Temporarily Unavailable
Docker becomes unavailable mid-operation | Request is blocked and returns 503
Code fails AST security check            | Request returns 403 Verification Blocked by Security Policy
Code generation fails                    | Request returns Internal verification error

Previous versions of QWED offered Wasm and restricted Python fallbacks when Docker was unavailable. These fallback paths have been removed. You must have a running Docker daemon for statistical verification to work.
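The scenarios above can be sketched as a single fail-closed control flow. Everything here is illustrative: `Outcome`, `execute_fail_closed`, and the three injected callables are hypothetical names, the order of the health check and the security check is an assumption, and `ConnectionError` merely stands in for whatever signals a lost daemon in the real engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    status_code: int
    detail: str


def execute_fail_closed(
    code: str,
    docker_is_healthy: Callable[[], bool],
    passes_security_check: Callable[[str], bool],
    run_in_container: Callable[[str], str],
) -> Outcome:
    # Health is checked on every request, not cached from startup.
    if not docker_is_healthy():
        return Outcome(503, "Service Temporarily Unavailable")
    # Static validation happens before anything is executed.
    if not passes_security_check(code):
        return Outcome(403, "Verification Blocked by Security Policy")
    try:
        result = run_in_container(code)
    except ConnectionError:
        # Daemon lost mid-operation: block the request, never run in-process.
        return Outcome(503, "Service Temporarily Unavailable")
    return Outcome(200, result)


outcome = execute_fail_closed(
    "df['sales'].mean()",
    docker_is_healthy=lambda: False,
    passes_security_check=lambda code: True,
    run_in_container=lambda code: "150.0",
)
print(outcome.status_code)
```

The point of the structure is that no branch leads to an alternative executor: every failure path ends in a blocked request.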

Error handling

When the Stats Engine encounters an internal failure — such as a code generation or translation error — it returns a generic "Internal verification error" message. Sensitive details like file paths, credentials, or stack traces are never included in the API response. If you receive this error, check the server-side logs for diagnostic details. The engine logs the exception type for debugging while keeping the client response opaque.
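The pattern described here, an opaque client response paired with a server-side log of the exception type, can be sketched as follows. The wrapper name `safe_verify`, the logger name, and the response shape are assumptions for illustration, not the engine's actual code.

```python
import logging

logger = logging.getLogger("stats_engine_sketch")  # hypothetical logger name

GENERIC_ERROR = "Internal verification error"


def safe_verify(generate_and_run):
    """Run a verification callable and hide internal failure details from the caller."""
    try:
        return {"status": "SUCCESS", "result": generate_and_run()}
    except Exception as exc:
        # Log only the exception type on the server; the message may hold
        # paths or credentials, so it is kept out of the client response.
        logger.error("Stats verification failed: %s", type(exc).__name__)
        return {"status": "ERROR", "error": GENERIC_ERROR}


def failing_step():
    raise RuntimeError("could not read /srv/secrets/token.txt")


response = safe_verify(failing_step)
print(response["error"])
```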

Direct operations

For simple operations, call compute_statistics directly to bypass code generation:
result = client.compute_statistics(
    data=df,
    column="sales",
    operation="mean"  # mean, median, std, var, sum, count, min, max, mode
)
Operation | Description
----------|------------------------------------------
mean      | Arithmetic mean of the column
median    | Median value
std       | Standard deviation
var       | Variance
sum       | Sum of all values
count     | Number of non-NaN values
min       | Minimum value
max       | Maximum value
mode      | Most frequent value (fails if multimodal)

Fail-closed validation

compute_statistics returns SUCCESS only when the result is clearly defined and safely verifiable. It returns ERROR in the following cases:
Condition                                                | Error
---------------------------------------------------------|-------------------------------------------------------------
Column not found                                         | Column '{name}' not found
Unknown operation                                        | Unknown operation '{name}'
Multiple modes (multimodal data)                         | mode is ambiguous because {n} equally frequent values exist
Mode with no values                                      | mode produced an undefined result (NaN)
Result is NaN (includes empty series or all-NaN columns) | {operation} produced an undefined result (NaN)

Empty series and all-NaN columns are caught by the NaN result check: if the underlying pandas operation returns NaN, the method returns an ERROR status rather than propagating the undefined value.
import pandas as pd

# Empty series — returns ERROR (NaN result)
df_empty = pd.DataFrame({"col": pd.Series([], dtype="float64")})
result = client.compute_statistics(data=df_empty, column="col", operation="mean")
print(result["status"])  # ERROR

# Multimodal data — returns ERROR for mode
df_multi = pd.DataFrame({"col": [1, 1, 2, 2]})
result = client.compute_statistics(data=df_multi, column="col", operation="mode")
print(result["status"])  # ERROR
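To make the validation rules concrete, the following is a rough local sketch of how such fail-closed checks could be layered on top of pandas. It is not the SDK's implementation: `compute_statistics_sketch`, the order of the checks, and the shape of the returned dictionary are assumptions, while the error messages follow the table above.

```python
import pandas as pd

OPERATIONS = {"mean", "median", "std", "var", "sum", "count", "min", "max", "mode"}


def _error(message):
    return {"status": "ERROR", "error": message}


def compute_statistics_sketch(data, column, operation):
    """Return SUCCESS only for a single, well-defined result (illustrative only)."""
    if column not in data.columns:
        return _error(f"Column '{column}' not found")
    if operation not in OPERATIONS:
        return _error(f"Unknown operation '{operation}'")

    series = data[column]
    if operation == "mode":
        modes = series.mode()  # NaN values are dropped by default
        if len(modes) > 1:
            return _error(
                f"mode is ambiguous because {len(modes)} equally frequent values exist"
            )
        if modes.empty:
            return _error("mode produced an undefined result (NaN)")
        value = modes.iloc[0]
    else:
        # Each remaining operation name matches a pandas Series method.
        value = getattr(series, operation)()

    if pd.isna(value):
        return _error(f"{operation} produced an undefined result (NaN)")
    return {"status": "SUCCESS", "result": value}


df_multi = pd.DataFrame({"col": [1, 1, 2, 2]})
print(compute_statistics_sketch(df_multi, "col", "mode")["status"])
```

Returning an explicit ERROR instead of NaN means callers cannot mistake an undefined statistic for a verified value.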