Monitoring QWED in Production

Track QWED's performance, errors, and usage in your production environment.

Key Metrics to Track

1. Verification Success Rate

What to track:

  • Percentage of successful verifications
  • Number of failed verifications
  • Failure reasons

Example logging:

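from datadog import statsd

result = qwed.verify(query)

# Count every outcome so the success rate can be derived downstream.
if result.verified:
    statsd.increment('qwed.verification.success')
else:
    statsd.increment('qwed.verification.failure')
    # Per-reason counter; keep the set of reasons small to limit metric cardinality.
    statsd.increment(f'qwed.failure.{result.reason}')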

Target: >99% success rate
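
To illustrate how the counters above relate to the 99% target, the sketch below keeps in-process tallies and derives the success rate and a per-reason failure breakdown. `VerificationStats` is a hypothetical helper, not part of QWED or Datadog; in production the same arithmetic would normally be done by the metrics backend.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VerificationStats:
    """Hypothetical in-process tally of verification outcomes."""

    successes: int = 0
    failures: Counter = field(default_factory=Counter)

    def record(self, verified: bool, reason: str = "unknown") -> None:
        # Count a success, or a failure keyed by its reason.
        if verified:
            self.successes += 1
        else:
            self.failures[reason] += 1

    @property
    def total(self) -> int:
        return self.successes + sum(self.failures.values())

    def success_rate(self) -> float:
        # Percentage of successful verifications; 100.0 when nothing was recorded.
        if self.total == 0:
            return 100.0
        return 100.0 * self.successes / self.total

    def meets_target(self, target_pct: float = 99.0) -> bool:
        # The target is a strict ">" comparison, matching ">99%".
        return self.success_rate() > target_pct
```

Calling `stats.record(result.verified, result.reason)` after each verification would keep the tally current; `stats.failures.most_common()` then lists the dominant failure reasons.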


2. Response Time

What to track:

  • Average response time
  • p50, p95, p99 latency
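  • Slow queries

Example logging:

import logging
import time

from datadog import statsd

logger = logging.getLogger('qwed')

# perf_counter() is monotonic, so the measurement is immune to clock adjustments.
start = time.perf_counter()
result = qwed.verify(query)
duration = time.perf_counter() - start

# Log to monitoring
statsd.timing('qwed.response_time', duration * 1000)  # ms

if duration > 5:  # Slow query threshold, in seconds
    logger.warning(f"Slow QWED query: {duration:.2f}s")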

Target: p95 < 3 seconds
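
The percentiles listed above are usually computed by the metrics backend, but it can help to see what they mean. The following is a minimal sketch using the nearest-rank method on a list of recorded durations in seconds; `percentile` and `latency_summary` are illustrative names, and other tools may interpolate and return slightly different values.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(durations_s: list[float]) -> dict[str, float]:
    """Average plus p50/p95/p99 for a non-empty list of durations in seconds."""
    return {
        "avg": sum(durations_s) / len(durations_s),
        "p50": percentile(durations_s, 50),
        "p95": percentile(durations_s, 95),
        "p99": percentile(durations_s, 99),
    }
```

Checking `latency_summary(durations)["p95"] < 3` against a recent window of samples is one way to test the target locally.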


3. API Quota Usage

What to track:

  • Daily API calls
  • Remaining quota
  • Quota usage trend

Example:

from datadog import statsd

# After each call
current_quota = client.get_quota_status()

statsd.gauge('qwed.quota.used', current_quota.used)
statsd.gauge('qwed.quota.remaining', current_quota.remaining)

# alert_team() stands for whatever paging or chat hook your team uses.
if current_quota.remaining < 1000:
    alert_team("QWED quota low!")
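
For the usage trend, one simple approach is to project how long the remaining quota will last at the recent burn rate. The sketch below assumes the quota is a fixed pool that is not reset within the projection window, and that you already collect daily call counts; `days_until_exhausted` is a hypothetical helper.

```python
import math


def days_until_exhausted(remaining: int, daily_usage: list[int]) -> float:
    """Estimate days of quota left from recent daily call counts.

    Uses the plain average of the supplied days as the burn rate and
    returns math.inf when there has been no usage.
    """
    if not daily_usage:
        raise ValueError("need at least one day of usage data")
    avg_per_day = sum(daily_usage) / len(daily_usage)
    if avg_per_day == 0:
        return math.inf
    return remaining / avg_per_day
```

Alerting on the projected days left, rather than only on a fixed remaining count, gives earlier warning when traffic grows.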

4. Error Rates

What to track:

  • Network errors
  • Timeout errors
  • Authentication errors
  • Validation errors

Example:

# Prefer importing only the exception classes you handle.
from qwed.exceptions import *

try:
    result = qwed.verify(query)
except TimeoutError:
    statsd.increment('qwed.error.timeout')
except AuthenticationError:
    statsd.increment('qwed.error.auth')
    alert_team("QWED auth failure!")
except Exception as e:
    # Catch-all: network, validation, and any other error without a dedicated handler above.
    statsd.increment('qwed.error.unknown')
    logger.error(f"QWED error: {e}")

Target: Error rate < 0.1%
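
To give each of the four error categories its own counter, the exception-to-metric mapping can live in one table. This is a sketch only: the three `Qwed*` classes are stand-ins for whatever exceptions your client raises, and the `qwed.error.validation` and `qwed.error.network` metric names are assumptions that extend the naming used above.

```python
class QwedTimeout(Exception):
    """Stand-in for the client's timeout exception (hypothetical)."""


class QwedAuthError(Exception):
    """Stand-in for the client's authentication exception (hypothetical)."""


class QwedValidationError(Exception):
    """Stand-in for the client's validation exception (hypothetical)."""


# Ordered from most to least specific; the first matching type wins.
ERROR_METRICS = [
    (QwedTimeout, "qwed.error.timeout"),
    (QwedAuthError, "qwed.error.auth"),
    (QwedValidationError, "qwed.error.validation"),
    (ConnectionError, "qwed.error.network"),  # built-in base for connection failures
]


def metric_for(exc: BaseException) -> str:
    """Return the counter name for an exception, falling back to 'unknown'."""
    for exc_type, name in ERROR_METRICS:
        if isinstance(exc, exc_type):
            return name
    return "qwed.error.unknown"


def error_rate_pct(errors: int, total_calls: int) -> float:
    """Errors as a percentage of all calls; 0.0 when no calls were made."""
    return 0.0 if total_calls == 0 else 100.0 * errors / total_calls
```

A single `except Exception as e: statsd.increment(metric_for(e))` handler can then replace the chain of `except` clauses, and `error_rate_pct` can be compared against the 0.1% target.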


Monitoring Dashboard Example

Grafana Dashboard

{
  "dashboard": {
    "title": "QWED Monitoring",
    "panels": [
      {
        "title": "Verification Success Rate",
        "targets": [
          {
            "expr": "rate(qwed_verification_success_total[5m]) / rate(qwed_verification_total[5m]) * 100"
          }
        ]
      },
      {
        "title": "Response Time (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(qwed_response_time_bucket[5m])))"
          }
        ]
      },
      {
        "title": "Error Rate",
        "targets": [
          {
            "expr": "rate(qwed_error_total[5m])"
          }
        ]
      }
    ]
  }
}

Alerting Rules

Critical Alerts

1. High Error Rate

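- alert: QWEDHighErrorRate
  # Ratio of errors to all verifications, so 0.01 means 1%.
  expr: sum(rate(qwed_error_total[5m])) / sum(rate(qwed_verification_total[5m])) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "QWED error rate above 1%"
    description: "Error rate: {{ $value | humanizePercentage }}"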

2. Slow Response Time

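- alert: QWEDSlowResponses
  # The threshold is 5 seconds, so the histogram must be recorded in seconds.
  expr: histogram_quantile(0.95, sum by (le) (rate(qwed_response_time_bucket[5m]))) > 5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "QWED p95 latency > 5s"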

3. Quota Low

- alert: QWEDQuotaLow
  expr: qwed_quota_remaining < 1000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "QWED quota running low"
    description: "Remaining: {{ $value }}"

Logging Best Practices

Structured Logging

import json
import logging
import time

logger = logging.getLogger('qwed')

def verify_with_logging(query, user_id):
    log_data = {
        'timestamp': time.time(),
        'user_id': user_id,
        'query': query[:100],  # Truncate to keep log lines small
    }

    try:
        # perf_counter() is monotonic and better suited to measuring durations.
        start = time.perf_counter()
        result = qwed.verify(query)
        duration = time.perf_counter() - start

        log_data.update({
            'verified': result.verified,
            'duration_ms': int(duration * 1000),
            'status': 'success'
        })

        logger.info(json.dumps(log_data))
        return result

    except Exception as e:
        log_data.update({
            'status': 'error',
            'error': str(e)
        })
        logger.error(json.dumps(log_data))
        # Re-raise so callers still see the failure.
        raise
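
An alternative to calling `json.dumps` at every log site is to move the serialization into a formatter from the standard `logging` module. The sketch below is one possible arrangement, not a QWED feature; `JsonFormatter`, `build_logger`, and the `fields` key passed through `extra` are names chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} are merged in.
        payload.update(getattr(record, "fields", {}))
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


def build_logger(name: str = "qwed") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `logger.info("verification", extra={"fields": {"user_id": user_id, "verified": True}})` then produces one JSON line with those fields included.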

Health Checks

Endpoint Health Check

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health/qwed')
def qwed_health():
    try:
        # Test QWED connection
        result = qwed.verify("2+2=4", timeout=5)

        if result.verified:
            return jsonify({
                'status': 'healthy',
                'qwed': 'operational'
            }), 200
        else:
            return jsonify({
                'status': 'degraded',
                'qwed': 'verification_failed'
            }), 503

    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'qwed': 'error',
            'error': str(e)
        }), 503
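
The decision logic in the endpoint above can also be kept independent of the web framework, which makes it easy to unit test with a stub. This is a sketch under that assumption; `check_qwed` is a hypothetical helper and `verify` is any callable that behaves like `qwed.verify`.

```python
from typing import Callable, Tuple


def check_qwed(verify: Callable[[str], object]) -> Tuple[dict, int]:
    """Run one known-good verification and map the outcome to an HTTP status."""
    try:
        result = verify("2+2=4")
    except Exception as exc:
        # Any exception means the service could not be reached or used.
        return {"status": "unhealthy", "qwed": "error", "error": str(exc)}, 503
    if getattr(result, "verified", False):
        return {"status": "healthy", "qwed": "operational"}, 200
    # The call returned, but the known-good statement was not verified.
    return {"status": "degraded", "qwed": "verification_failed"}, 503
```

Inside a Flask route this could be used as `body, code = check_qwed(lambda q: qwed.verify(q, timeout=5))` followed by `return jsonify(body), code`.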

Troubleshooting Alerts

When alerts fire:

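  1. Check the dashboard - Review success rate, p95 latency, and error rate around the time the alert fired
  2. Check the logs - Look for error entries and their failure reasons
  3. Test manually - Run a test script against QWED to confirm the problem
  4. Contact support - If the issue persists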
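
For the manual test in step 3, a small script that runs a few known cases and reports timing is often enough. The following is a minimal sketch; `run_smoke_test` is a hypothetical helper, `verify` is any callable that behaves like `qwed.verify`, and the cases you pass in are your own.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_smoke_test(
    verify: Callable[[str], object],
    cases: Iterable[Tuple[str, bool]],
) -> List[dict]:
    """Run each (query, expected_verified) case and record outcome and timing."""
    report = []
    for query, expected in cases:
        start = time.perf_counter()
        try:
            verified = bool(getattr(verify(query), "verified", False))
            error = None
        except Exception as exc:
            # A raised exception always counts as a failed case.
            verified, error = None, str(exc)
        report.append({
            "query": query,
            "passed": error is None and verified == expected,
            "error": error,
            "duration_s": time.perf_counter() - start,
        })
    return report
```

Printing the entries where `passed` is false, together with `error` and `duration_s`, gives support something concrete to work from if the issue persists.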

Next: Troubleshooting Guide