PII Detection
An Evaluator that identifies and categorizes various types of personally identifiable information (PII) present in text using an ensemble of advanced entity recognition techniques.
- Use Case: PII Detection and Data Privacy Compliance
- Analytic Engine: Ensemble
- Related OWASP Risks:
- Related Regulations:
- Valid Inputs: Text
- Scope: Full Exchange
- Last Update: 2025-03-04
- License: MIT License
- Dependencies: N/A
Detailed Description
The PII Detection - Balanced Evaluator detects and categorizes personally identifiable information (PII) in text. It combines several entity recognition techniques in an ensemble: named entity recognition (NER), pattern matching, and semantic analysis. It identifies a wide range of PII types, including names, email addresses, phone numbers, Social Security numbers (SSNs), and physical addresses. The Evaluator is designed for regulatory compliance use cases where both accuracy and comprehensive coverage are essential.
For more advanced use cases, custom Search Evaluators can be used for lightweight detection of specific patterns of PII. Additionally, custom Foundational Evaluators can be used for a more comprehensive and contextual analysis of PII presence.
Input Descriptions:
The Evaluator accepts text input from both Prompts and Completions in an Exchange.
Law Usage Example:
This input would trigger the Evaluator since it contains multiple types of PII:
Hello, my name is John Smith and my email is john.smith@example.com. You can also reach me at (555) 123-4567 or find me at 123 Main Street, New York, NY.
This input would also trigger the Evaluator for SSN detection:
Please update my social security number from 123-45-6789 to my new one 987-65-4321.
This input would not trigger the Evaluator since it contains no PII:
I'd like to discuss the latest technological developments in the AI field and how they might affect business operations in the coming years.
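The pattern-matching part of such a detector can be illustrated with a few regular expressions. This is a minimal sketch, not the Evaluator's implementation: the patterns and the `email` and `phone_number` labels are assumptions chosen for illustration (only `us_ssn` appears elsewhere in this document), and `find_pii` is a hypothetical helper. Regexes alone will not find the name or street address in the first example; those need the NER and semantic components of the ensemble.

```python
import re

# Illustrative patterns only (assumptions, not the Evaluator's rules).
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone_number": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def find_pii(text):
    """Return (entity_type, start, end) tuples for every pattern hit, ordered by start offset."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


examples = [
    "Hello, my name is John Smith and my email is john.smith@example.com. "
    "You can also reach me at (555) 123-4567 or find me at 123 Main Street, New York, NY.",
    "Please update my social security number from 123-45-6789 to my new one 987-65-4321.",
    "I'd like to discuss the latest technological developments in the AI field "
    "and how they might affect business operations in the coming years.",
]

for text in examples:
    # An empty list means none of the illustrative patterns matched.
    print([(kind, text[start:end]) for kind, start, end in find_pii(text)])
```

With these patterns, the first input yields an email and a phone number hit, the second yields two SSN hits, and the third yields none.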
Output Descriptions:
Returns a Finding containing detection results for various PII entity types:
{
  "matched_entities": [
    {
      "entity_type": "example1",
      "score": [0-1],
      "start": int,
      "end": int
    },
    ...
  ],
  "name": "PII Detector Balanced"
}
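A consumer of this output might parse the Finding and keep only the higher-confidence matches. The sketch below assumes the Finding arrives as a JSON string shaped like the schema above; the sample entity types, scores, and offsets are invented for illustration, and `confident_entities` and its `min_score` threshold are hypothetical, not part of the Evaluator.

```python
import json

# Hypothetical Finding payload following the schema above.
# Entity types, scores, and offsets are made up for illustration.
raw_finding = """
{
  "matched_entities": [
    {"entity_type": "email", "score": 0.98, "start": 45, "end": 67},
    {"entity_type": "phone_number", "score": 0.40, "start": 94, "end": 108}
  ],
  "name": "PII Detector Balanced"
}
"""


def confident_entities(finding, min_score=0.5):
    """Return the matched entities whose score is at least min_score."""
    return [
        entity
        for entity in finding.get("matched_entities", [])
        if entity["score"] >= min_score
    ]


finding = json.loads(raw_finding)
for entity in confident_entities(finding):
    print(entity["entity_type"], entity["score"], entity["start"], entity["end"])
```

The right threshold depends on how costly a missed detection is compared with a false alarm, so treat the value of 0.5 as a placeholder.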
Configuration Options:
| Parameter | Description | Default |
|---|---|---|
| pii_entities | List of specific entity types to detect | All supported entities |
| apply_to | Specifies whether to analyze the "prompt", "completion", or "both" | "prompt" |
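The two options and their defaults can be sketched as a small validation helper. This document does not specify how configuration is supplied, so everything here is an assumption: `build_config` is a hypothetical function, the dictionary layout is invented, and `None` is used as a stand-in for "all supported entities".

```python
# Values for apply_to taken from the table above.
VALID_APPLY_TO = {"prompt", "completion", "both"}


def build_config(pii_entities=None, apply_to="prompt"):
    """Build a config dict with the documented defaults.

    Assumption: pii_entities=None stands for "all supported entities".
    """
    if apply_to not in VALID_APPLY_TO:
        raise ValueError(f"apply_to must be one of {sorted(VALID_APPLY_TO)}, got {apply_to!r}")
    if pii_entities is not None:
        # Copy into a list so later changes by the caller do not affect the config.
        pii_entities = list(pii_entities)
    return {"pii_entities": pii_entities, "apply_to": apply_to}


# Restrict detection to SSNs and scan both sides of the Exchange.
print(build_config(pii_entities=["us_ssn"], apply_to="both"))
```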
Data & Dependencies
Data Sources
The entity recognition models are trained on the following focused datasets:
- AI4Privacy PII Dataset - PII Entities
- LMsys Chatbot Arena Conversations - Control Dataset
Benchmarks
The PII Detection - Balanced Evaluator has been tested against the following benchmark datasets to assess its effectiveness:
| Dataset | Sample Size | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ThirdLaw PII Benchmark | 210k conversations | 71.4% | 68.1% | 91.8% | 78.2% |
| PrivateAI PII Benchmark | 3000 words | N/A | 96% | 95% | 94% |
Key Findings:
- Performance varied across different PII categories
Benchmarks last updated: March 2025
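For readers interpreting the table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for the ThirdLaw PII Benchmark row; `f1_score` is an illustrative helper written here, not part of any Evaluator tooling.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ThirdLaw PII Benchmark row: precision 68.1%, recall 91.8%.
print(round(100 * f1_score(0.681, 0.918), 1))  # prints 78.2
```

On that benchmark, recall is well above precision, meaning the Evaluator misses relatively little PII but also flags some text that is not PII.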
Ways to Use and Deploy this Evaluator
To incorporate the PII Detection - Balanced Evaluator in a Law:
if PiiDetection-Ensemble.pii_detected in Both then run BlockCompletion
You can create more targeted interventions for specific PII types:
if PiiDetection-Ensemble.us_ssn.detected in Both then run LogAndAlert
if PiiDetection-Ensemble.credit_card.detected in Both then run MaskAndLog
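To show what a masking intervention could do with the `start` and `end` offsets in a Finding, here is a minimal sketch. It is not the MaskAndLog implementation: `mask_entities` is a hypothetical helper, and it assumes `start` is inclusive and `end` is exclusive, which this document does not state.

```python
def mask_entities(text, matched_entities, mask_char="*"):
    """Replace each matched span with mask characters, keeping the text length unchanged.

    Assumption: start is inclusive and end is exclusive (Python slice semantics).
    """
    chars = list(text)
    for entity in matched_entities:
        # Clamp offsets so an out-of-range span cannot grow the text.
        start = max(0, entity["start"])
        end = min(len(chars), entity["end"])
        if start < end:
            chars[start:end] = mask_char * (end - start)
    return "".join(chars)


text = "Please update my social security number from 123-45-6789 to my new one 987-65-4321."
first = text.index("123-45-6789")
second = text.index("987-65-4321")

# Hypothetical Finding entries; scores are invented for illustration.
matched = [
    {"entity_type": "us_ssn", "score": 0.99, "start": first, "end": first + 11},
    {"entity_type": "us_ssn", "score": 0.99, "start": second, "end": second + 11},
]

print(mask_entities(text, matched))
```

Masking by offset keeps the surrounding text intact, so the masked result can still be logged and reviewed without exposing the detected values.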
Security, Compliance & Risk Assessment
Security Considerations:
- Entity detection operates entirely within your secure environment, never transmitting sensitive data externally
Compliance & Privacy:
- GDPR - Supports Article 9 (processing of special categories of personal data) and Article 35 (data protection impact assessment) compliance
- HIPAA - Identifies Protected Health Information (PHI) to prevent unauthorized disclosure
- CCPA - Detects personal information as defined under California law
- EU AI Act - Helps implement safeguards for high-risk AI systems handling personal data
Revision History:
2025-03-04: Initial release
- Comprehensive entity detection across 39 PII entity types
- ThirdLaw benchmark results
- Initial documentation