# Monitoring

> Track QWED verification success rate, response time, and errors in production. Includes Datadog integration examples and key metrics to monitor for health.

## Key metrics to track

### 1. Verification success rate

**What to track:**

* Percentage of successful verifications
* Number of failed verifications
* Failure reasons

**Example logging:**

```python theme={null}
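from datadog import statsd

# `qwed` is your configured QWED client; `query` is the claim to verify.
result = qwed.verify(query)

if result.verified:
    statsd.increment('qwed.verification.success')
else:
    # Count the failure overall, then once more under its specific reason.
    statsd.increment('qwed.verification.failure')
    statsd.increment(f'qwed.failure.{result.reason}')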
```

**Target:** >99% success rate
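
Free-form failure reasons can produce an unbounded number of metric names. The sketch below shows one way to map a reason onto a small, metric-safe set of values before emitting it. The `reason_tag` helper and the `KNOWN_REASONS` allow-list are hypothetical; replace the allow-list with the reasons your QWED client actually returns.

```python
import re

# Hypothetical allow-list; replace with the reasons your QWED client returns.
KNOWN_REASONS = {"timeout", "invalid_input", "unverifiable"}


def reason_tag(reason):
    """Map a free-form failure reason to a bounded, metric-safe value."""
    if not reason:
        return "unknown"
    # Lowercase, then collapse anything that is not a-z, 0-9, or underscore.
    slug = re.sub(r"[^a-z0-9_]+", "_", str(reason).lower()).strip("_")
    return slug if slug in KNOWN_REASONS else "other"
```

With this in place, the reason can be sent as a tag instead of a metric name, for example `statsd.increment('qwed.verification.failure', tags=[f'reason:{reason_tag(result.reason)}'])`.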

***

### 2. Response time

**What to track:**

* Average response time
* p50, p95, p99 latency
* Slow queries

```python theme={null}
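import logging
import time

from datadog import statsd

logger = logging.getLogger('qwed')

# perf_counter() is monotonic, so clock adjustments cannot skew the measurement
start = time.perf_counter()
result = qwed.verify(query)
duration = time.perf_counter() - start

# Report latency in milliseconds
statsd.timing('qwed.response_time', duration * 1000)

if duration > 5:  # Slow query threshold (seconds)
    logger.warning(f"Slow QWED query: {duration:.2f}s")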
```

**Target:** p95 \< 3 seconds
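
If your monitoring backend does not compute percentiles for you, they can be derived from raw samples. This is a minimal sketch using only the standard library; `latency_percentiles` is a hypothetical helper name, and it assumes you collect latency samples in milliseconds yourself.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95, and p99 for a list of latency samples (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Comparing the returned `p95` against 3000 ms gives a direct check of the target above.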

***

### 3. API quota usage

**What to track:**

* Daily API calls
* Remaining quota
* Quota usage trend

```python theme={null}
# After each call (`client` is your QWED client instance)
current_quota = client.get_quota_status()

statsd.gauge('qwed.quota.used', current_quota.used)
statsd.gauge('qwed.quota.remaining', current_quota.remaining)

if current_quota.remaining < 1000:
    # alert_team is a placeholder for your own paging or notification helper
    alert_team("QWED quota low!")
```
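
To turn the usage trend into something actionable, you can project how long the remaining quota will last. The sketch below is one possible approach, assuming you keep two readings of the `used` value and the time between them; `hours_until_exhausted` is a hypothetical helper, not part of the QWED client.

```python
def hours_until_exhausted(used_before, used_now, elapsed_seconds, remaining):
    """Project hours until the quota runs out from two `used` readings."""
    consumed = used_now - used_before
    if consumed <= 0 or elapsed_seconds <= 0:
        # No measurable consumption (or the counter was reset) in this window.
        return float("inf")
    calls_per_hour = consumed / elapsed_seconds * 3600
    return remaining / calls_per_hour
```

A projection of only a few hours is often a more useful alert condition than a fixed remaining-call threshold, because it adapts to traffic.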

***

### 4. Error rates

**What to track:**

* Network errors
* Timeout errors
* Authentication errors
* Validation errors

```python theme={null}
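import logging

from datadog import statsd
from qwed.exceptions import *  # prefer explicit imports once you know which exceptions you handle

logger = logging.getLogger('qwed')

try:
    result = qwed.verify(query)
except TimeoutError:
    statsd.increment('qwed.error.timeout')
except AuthenticationError:
    statsd.increment('qwed.error.auth')
    alert_team("QWED auth failure!")  # alert_team: your own notification helper
except Exception:
    statsd.increment('qwed.error.unknown')
    # logger.exception records the message together with the traceback
    logger.exception("Unexpected QWED error")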
```

**Target:** Error rate \< 0.1%
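
For an in-process check against this target, the error rate can be tracked over a window of recent calls. This is a minimal sketch; the `ErrorRateWindow` class and its default window size are assumptions for illustration.

```python
from collections import deque


class ErrorRateWindow:
    """Track the error rate over the most recent `size` calls."""

    def __init__(self, size=1000):
        # Each entry is True for a failed call; old entries fall off the left.
        self._outcomes = deque(maxlen=size)

    def record(self, failed):
        self._outcomes.append(bool(failed))

    def rate(self):
        """Return the fraction of failed calls in the window (0.0 if empty)."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)
```

Call `record(True)` in each `except` branch and `record(False)` after a successful call, then compare `rate()` with `0.001`.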

***

## Monitoring dashboard example

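### Grafana dashboard

The panel queries are PromQL, so they assume the QWED metrics are exported to Prometheus under the names shown (for example `qwed_verification_total`). Adjust the names to match your exporter.
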
```json theme={null}
{
  "dashboard": {
    "title": "QWED Monitoring",
    "panels": [
      {
        "title": "Verification Success Rate",
        "targets": [
          {
            "expr": "rate(qwed_verification_success_total[5m]) / rate(qwed_verification_total[5m]) * 100"
          }
        ]
      },
      {
        "title": "Response Time (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(qwed_response_time_bucket[5m])) by (le))"
          }
        ]
      },
      {
        "title": "Error Rate",
        "targets": [
          {
            "expr": "rate(qwed_error_total[5m])"
          }
        ]
      }
    ]
  }
}
```

***

## Alerting rules

### Critical alerts

**1. High Error Rate**

```yaml theme={null}
- alert: QWEDHighErrorRate
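  # Errors as a fraction of all verifications, so 0.01 means 1%
  expr: rate(qwed_error_total[5m]) / rate(qwed_verification_total[5m]) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "QWED error rate above 1%"
    description: "Error rate: {{ $value | humanizePercentage }}"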
```

**2. Slow Response Time**

```yaml theme={null}
- alert: QWEDSlowResponses
  # Assumes the histogram records seconds
  expr: histogram_quantile(0.95, sum(rate(qwed_response_time_bucket[5m])) by (le)) > 5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "QWED p95 latency > 5s"
```

**3. Quota Low**

```yaml theme={null}
- alert: QWEDQuotaLow
  expr: qwed_quota_remaining < 1000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "QWED quota running low"
    description: "Remaining: {{ $value }}"
```

***

## Logging best practices

### Structured logging

```python theme={null}
import json
import logging
import time

logger = logging.getLogger('qwed')

def verify_with_logging(query, user_id):
    log_data = {
        'timestamp': time.time(),
        'user_id': user_id,
        'query': query[:100],  # Truncate
    }
    
    try:
        start = time.time()
        result = qwed.verify(query)
        duration = time.time() - start
        
        log_data.update({
            'verified': result.verified,
            'duration_ms': int(duration * 1000),
            'status': 'success'
        })
        
        logger.info(json.dumps(log_data))
        return result
        
    except Exception as e:
        log_data.update({
            'status': 'error',
            'error': str(e)
        })
        logger.error(json.dumps(log_data))
        raise
```
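
An alternative to calling `json.dumps` at every log site is to do the serialization once in a formatter. The sketch below uses only the standard `logging` module; the `JsonFormatter` class and the `qwed` key under `extra` are assumptions chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"qwed": {...}} are set on the record itself.
        payload.update(getattr(record, "qwed", {}))
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)
```

Attach it with `handler.setFormatter(JsonFormatter())`, then log with `logger.info("verify", extra={"qwed": {"verified": True, "duration_ms": 12}})`.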

***

## Health checks

### Endpoint health check

```python theme={null}
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health/qwed')
def qwed_health():
    try:
        # Test QWED connection
        result = qwed.verify("2+2=4", timeout=5)
        
        if result.verified:
            return jsonify({
                'status': 'healthy',
                'qwed': 'operational'
            }), 200
        else:
            return jsonify({
                'status': 'degraded',
                'qwed': 'verification_failed'
            }), 503
            
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'qwed': 'error',
            'error': str(e)
        }), 503
```
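
Because each probe makes a real verification call, a frequently polled health endpoint can add latency and consume quota. One option is to cache the probe result for a short time. This is a sketch under that assumption; `CachedHealthCheck` and its `ttl` default are hypothetical, and `probe` is any callable you supply that returns a truthy value when QWED is healthy.

```python
import time


class CachedHealthCheck:
    """Cache a health probe result for `ttl` seconds."""

    def __init__(self, probe, ttl=30.0, clock=time.monotonic):
        self._probe = probe  # callable returning True when QWED is healthy
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested
        self._checked_at = None
        self._healthy = False

    def is_healthy(self):
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            try:
                self._healthy = bool(self._probe())
            except Exception:
                self._healthy = False  # treat any probe error as unhealthy
            self._checked_at = now
        return self._healthy
```

For example, `CachedHealthCheck(lambda: qwed.verify("2+2=4", timeout=5).verified)` would run the probe at most once every 30 seconds.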

***

## Troubleshooting alerts

When alerts fire:

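1. **Check the dashboard** - Identify which metric moved (success rate, p95 latency, error rate, or quota) and when the change began.
2. **Check the logs** - Look for `qwed` log entries with `status: error` around that time.
3. **Test manually** - Run a test script that sends a known-good query such as `2+2=4` and confirm it verifies.
4. **Contact support** - If the issue persists, include the time window, error messages, and sample queries.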
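
A manual test script for step 3 could look like the sketch below. `run_smoke_test` is a hypothetical helper: it takes the verify callable as an argument (for example `qwed.verify`) and assumes, as in the examples above, that the result exposes a `verified` attribute.

```python
import time


def run_smoke_test(verify, cases):
    """Run (query, expected_verified) cases; return mismatches and errors."""
    failures = []
    for query, expected in cases:
        start = time.perf_counter()
        try:
            verified = bool(verify(query).verified)
            error = None
        except Exception as exc:
            # Record the error instead of aborting the remaining cases.
            verified, error = None, repr(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if error is not None or verified != expected:
            failures.append({
                "query": query,
                "verified": verified,
                "error": error,
                "elapsed_ms": round(elapsed_ms, 1),
            })
    return failures
```

An empty return value means every case behaved as expected; otherwise each entry shows the query, what came back, and how long the call took.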

***

**Next:** [Troubleshooting guide](./troubleshooting)
