[!IMPORTANT] PII masking is an enterprise privacy feature that detects and masks Personally Identifiable Information (PII) before your data is sent to LLM providers. This is critical for HIPAA, GDPR, and PCI-DSS compliance.
📋 Table of Contents
- What is PII Masking?
- Why It Matters
- Installation
- Quick Start
- Supported PII Types
- Usage Examples
- Enterprise Use Cases
- How It Works
- Configuration
- Limitations
- FAQ
What is PII Masking?
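PII (Personally Identifiable Information) masking automatically detects sensitive data and replaces it with placeholders before queries are sent to LLM providers. For example, an email address such as john@example.com is replaced with the placeholder `<EMAIL_ADDRESS>`.
Why It Matters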
The Problem
When you send queries to cloud LLM providers (OpenAI, Anthropic, etc.), your data passes through their servers:
- 💳 Credit card numbers exposed
- 📧 Email addresses harvested
- 🔢 SSNs leaked
- 📞 Phone numbers stored
- 🏥 Medical data (HIPAA violation)
The Solution
QWED masks PII before sending to LLMs:
- ✅ HIPAA Compliant (Healthcare)
- ✅ GDPR Compliant (EU Privacy)
- ✅ PCI-DSS Compliant (Finance)
- ✅ Zero Trust architecture
Installation
Step 1: Install PII Extra
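PII masking requires Microsoft Presidio, an optional dependency that is installed with the `pii` extra: `pip install 'qwed[pii]'`.
Step 2: Download spaCy Model
Presidio uses spaCy for NLP, so a spaCy language model must also be downloaded.
Verify Installation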
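One way to verify the installation from Python is to check that the optional packages can be found. This is a minimal sketch, not QWED's own verification command: `check_pii_dependencies` and `require_pii_dependencies` are hypothetical helpers, and the only facts assumed are the import names `presidio_analyzer`, `presidio_anonymizer`, and `spacy`.

```python
import importlib.util

# Import names of the optional packages (Presidio's two modules plus spaCy).
REQUIRED_MODULES = ("presidio_analyzer", "presidio_anonymizer", "spacy")


def check_pii_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def require_pii_dependencies(modules=REQUIRED_MODULES):
    """Raise ImportError with an install hint if any module is missing."""
    missing = [name for name, ok in check_pii_dependencies(modules).items() if not ok]
    if missing:
        raise ImportError(
            f"Missing optional packages: {', '.join(missing)}. "
            "Install them with: pip install 'qwed[pii]'"
        )


if __name__ == "__main__":
    for name, ok in check_pii_dependencies().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

The check only confirms that the packages are present; it does not confirm that a spaCy model has been downloaded.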
Quick Start
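Python API
Pass `mask_pii=True` in your code to enable masking; what was masked is then reported under `evidence['pii_masked']`.
CLI
Try masking a string from the command line with `qwed pii "your sensitive text"`.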
Supported PII Types
QWED detects 9 types of PII using Microsoft Presidio:

| Entity Type | Examples | Use Case |
|---|---|---|
| EMAIL_ADDRESS | john@example.com | Identity |
| CREDIT_CARD | 4532-1234-5678-9010 | Finance (PCI-DSS) |
| PHONE_NUMBER | 555-123-4567, +1-555-1234 | Contact info |
| US_SSN | 123-45-6789 | Identity (US) |
| IBAN_CODE | DE89370400440532013000 | Banking (EU) |
| IP_ADDRESS | 192.168.1.1 | Network security |
| PERSON | John Doe, Jane Smith | Names |
| LOCATION | New York, 123 Main St | Addresses |
| MEDICAL_LICENSE | DEA-1234567 | Healthcare (HIPAA) |
Detection Examples
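To illustrate what detection produces, the sketch below finds three of the entity types from the table and reports their type and position. It is a regex-only stand-in written for this page, not Presidio's recognizers (which also use checksums and NLP context); `detect_pii` and the patterns are illustrative assumptions.

```python
import re

# Simplified illustrative patterns; real recognizers are more thorough.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii(text):
    """Return (entity_type, start, end, matched_text) tuples sorted by position."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((entity_type, match.start(), match.end(), match.group()))
    return sorted(findings, key=lambda item: item[1])


if __name__ == "__main__":
    sample = "Contact john@example.com, SSN 123-45-6789, host 192.168.1.1"
    for entity_type, start, end, value in detect_pii(sample):
        print(f"{entity_type:<14} [{start}:{end}] {value}")
```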
Usage Examples
Example 1: Healthcare (HIPAA)
Scenario: AI-powered medical assistant
- ✅ PHI (Protected Health Information) never sent to cloud
- ✅ HIPAA compliance maintained
- ✅ Transparent audit trail in evidence
Example 2: Finance (PCI-DSS)
Scenario: Fraud detection system
- ✅ PCI-DSS Level 1 compliance
- ✅ Card numbers never in LLM logs
- ✅ Works with local LLMs (zero cloud exposure)
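Card-number detection commonly pairs a digit pattern with a Luhn checksum, so that arbitrary long digit strings (order IDs, reference numbers) are not masked by mistake. The following standalone sketch shows that idea; it is not QWED's or Presidio's code, and `mask_card_numbers` and the candidate regex are illustrative assumptions.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_card_numbers(text):
    """Replace Luhn-valid card numbers with the <CREDIT_CARD> placeholder."""
    def replace(match):
        return "<CREDIT_CARD>" if luhn_valid(match.group()) else match.group()
    return CARD_CANDIDATE.sub(replace, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number that passes Luhn.
    print(mask_card_numbers("Flag transaction on card 4111-1111-1111-1111"))
```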
Example 3: Legal (Attorney-Client Privilege)
Scenario: Contract analysis
How It Works
Architecture
Detection Process
- Analyze: Presidio scans text for PII patterns
- Detect: Identifies entity types and positions
- Mask: Replaces with `<ENTITY_TYPE>` placeholders
- Send: Masked text goes to LLM
- Evidence: PII metadata saved for audit
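The steps above can be sketched end to end in a few functions. This is a simplified model under stated assumptions: a single email regex stands in for Presidio's analysis, `prepare_query` is a hypothetical name, and the evidence dictionary uses an invented layout rather than QWED's actual `evidence['pii_masked']` structure.

```python
import re

# Stand-in detector; the real pipeline uses Presidio here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def analyze(text):
    """Analyze and detect: record the entity type and character positions."""
    return [
        {"entity_type": "EMAIL_ADDRESS", "start": m.start(), "end": m.end()}
        for m in EMAIL.finditer(text)
    ]


def mask(text, findings):
    """Replace each span with <ENTITY_TYPE>, working right to left so that
    earlier offsets stay valid while the string changes length."""
    for item in sorted(findings, key=lambda f: f["start"], reverse=True):
        text = text[: item["start"]] + f"<{item['entity_type']}>" + text[item["end"]:]
    return text


def prepare_query(text):
    """Return the masked text to send plus audit metadata.
    The evidence layout is hypothetical and stores no original values."""
    findings = analyze(text)
    evidence = {
        "count": len(findings),
        "entity_types": [f["entity_type"] for f in findings],
    }
    return mask(text, findings), evidence


if __name__ == "__main__":
    masked, evidence = prepare_query("Email a@b.com or c@d.org today")
    print(masked)
    print(evidence)
```

Because the evidence records only entity types and counts, nothing in it can be used to recover the original values.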
One-Way Masking
QWED uses non-reversible masking:
- ✅ Simple and secure
- ✅ No mapping tables to leak
- ✅ “Proving without revealing”
Configuration
Custom Entity Types
Only detect specific PII types.
Disable for Specific Queries
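Restricting detection to chosen entity types and skipping masking for a single query can both be modeled as arguments on the masking call. The sketch below uses a hypothetical `mask_query` function with an assumed `entities` parameter and a per-call `mask_pii` flag; QWED's actual configuration options may be named or shaped differently.

```python
import re

# Simplified stand-in patterns keyed by entity type.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_query(text, entities=None, mask_pii=True):
    """Mask the selected entity types; return text unchanged if mask_pii is False.
    An unknown entity name raises KeyError."""
    if not mask_pii:
        return text
    if entities is None:
        selected = PATTERNS
    else:
        selected = {name: PATTERNS[name] for name in entities}
    for name, pattern in selected.items():
        text = pattern.sub(f"<{name}>", text)
    return text


if __name__ == "__main__":
    query = "john@example.com has SSN 123-45-6789"
    print(mask_query(query))                       # all known types
    print(mask_query(query, entities=["US_SSN"]))  # SSNs only
    print(mask_query(query, mask_pii=False))       # masking skipped for this call
```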
Environment-Based
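Environment-based configuration usually means reading a flag from the process environment at startup. The variable name `QWED_MASK_PII` below is hypothetical, chosen only for illustration, as is the `pii_masking_enabled` helper; consult the QWED configuration reference for the real setting.

```python
import os

# Strings treated as "enabled"; anything else counts as disabled.
TRUE_VALUES = {"1", "true", "yes", "on"}


def pii_masking_enabled(environ=None, default=False):
    """Read the hypothetical QWED_MASK_PII variable and parse it as a boolean."""
    environ = os.environ if environ is None else environ
    raw = environ.get("QWED_MASK_PII")
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


if __name__ == "__main__":
    print("PII masking enabled:", pii_masking_enabled())
```

For regulated deployments it may be safer to call this with `default=True`, so that a missing variable does not silently turn masking off.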
Limitations
1. Detection Accuracy
- False Positives: May mask non-PII (e.g., “john” as name)
- False Negatives: May miss obfuscated PII
- Language: English only (v2.2.0)
2. Performance
- Latency: Adds ~100-200ms per query
- Memory: Requires ~150MB for spaCy model
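Because the latency figure is approximate, it can be worth measuring the overhead on your own inputs. The helper below times any masking callable with `time.perf_counter`; `median_latency_ms` is a hypothetical name, and the regex masker is only a placeholder that will run far faster than a Presidio-based pipeline.

```python
import re
import statistics
import time


def median_latency_ms(func, text, runs=20):
    """Call func(text) several times and return the median duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(text)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def placeholder_masker(text):
    """Stand-in for the real masking call; swap in your own function."""
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<US_SSN>", text)


if __name__ == "__main__":
    ms = median_latency_ms(placeholder_masker, "Patient SSN 123-45-6789")
    print(f"median masking latency: {ms:.3f} ms")
```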
3. Context Loss
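Masked data loses semantic meaning: once a name becomes `<PERSON>`, for example, the LLM can no longer use who that person is when answering.
FAQ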
Q: Does PII masking work with Ollama?
A: Yes! In fact, it’s perfect for Ollama:
- ✅ LLM runs locally
- ✅ PII masked locally
- ✅ Zero cloud exposure
Q: What if Presidio isn’t installed?
A: QWED fails gracefully with an error that includes install instructions.
Q: Can I see what was masked?
A: Yes! Check the evidence under `evidence['pii_masked']`.
Q: Does it work with caching?
A: Yes! Cached results also include PII info.
Q: What’s the performance impact?
A: Typically 100-200ms of added latency, which is negligible compared to an LLM API call (~1-3s).
Q: Is it secure?
A: Yes:
- ✅ Runs locally (not a cloud service)
- ✅ Microsoft Presidio (enterprise-grade)
- ✅ No data sent to QWED servers
- ✅ One-way masking (no reverse mapping)
Enterprise Use Cases
Healthcare: HIPAA Compliance
Finance: PCI-DSS Compliance
Legal: Data Privacy Laws
Next Steps
- Install: `pip install 'qwed[pii]'`
- Test: `qwed pii "your sensitive text"`
- Integrate: Add `mask_pii=True` to your code
- Audit: Check `evidence['pii_masked']` for compliance
© 2025 QWED. Privacy-first AI verification.