PII Detection

An Evaluator that identifies and categorizes various types of personally identifiable information (PII) present in text using an ensemble of advanced entity recognition techniques.

PII Detection - Balanced

Use Case: PII Detection and Data Privacy Compliance
Analytic Engine: Ensemble
Related OWASP Risks:
- LLM02: Insecure Output Handling
- LLM06: Sensitive Information Disclosure
Related Regulations:
- GDPR - Articles 9, 35
- HIPAA - PHI Protection
- CCPA - Personal Information
- EU AI Act - High-Risk AI Systems
Valid Inputs: Text
Scope: Full Exchange
Last Update: 2025-03-04
License: MIT License
Dependencies: N/A

Detailed Description

The PII Detection - Balanced Evaluator detects and categorizes personally identifiable information (PII) in text. It combines an ensemble of entity recognition techniques: named entity recognition (NER), pattern matching, and semantic analysis. It identifies a wide range of PII types, including names, email addresses, phone numbers, Social Security numbers (SSNs), and addresses. The Evaluator is designed for regulatory compliance use cases where both accuracy and comprehensive coverage are essential.

For more advanced use cases, custom Search Evaluators can be used for lightweight detection of specific patterns of PII. Additionally, custom Foundational Evaluators can be used for a more comprehensive and contextual analysis of PII presence.

Input Descriptions:

The Evaluator accepts text input from both Prompts and Completions in an Exchange.

Law Usage Example:

This input would trigger the Evaluator since it contains multiple types of PII:

Triggering Example

  Hello, my name is John Smith and my email is john.smith@example.com. You can also reach me at (555) 123-4567 or find me at 123 Main Street, New York, NY.

This input would also trigger the Evaluator for SSN detection:

Triggering Example

  Please update my social security number from 123-45-6789 to my new one 987-65-4321.

This input would not trigger the Evaluator since it contains no PII:

Non-Triggering Example

  I'd like to discuss the latest technological developments in the AI field and how they might affect business operations in the coming years.
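To make the triggering behavior more concrete, here is a minimal Python sketch of the pattern-matching part of such an ensemble. It is not the Evaluator's implementation: the regular expressions, the `find_pii` helper, and the `email` and `phone` labels are assumptions made for illustration (only `us_ssn` appears elsewhere on this page). Names and street addresses are deliberately not covered, since those typically need NER rather than fixed patterns.

```python
import re

# Illustrative patterns only (assumed, not the Evaluator's rules).
# Real detectors need far broader coverage and validation.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def find_pii(text):
    """Return (entity_type, start, end) tuples for every regex hit, ordered by start."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "Please update my social security number from 123-45-6789 to my new one 987-65-4321."
    for entity_type, start, end in find_pii(sample):
        print(entity_type, sample[start:end])
```

Run against the three examples above, this sketch flags the email address and phone number in the first, both SSNs in the second, and nothing in the third.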

Output Descriptions:

Returns a Finding containing detection results for various PII entity types:

Finding Structure (Abbreviated)
{
  "matched_entities": [
    {
      "entity_type": "example1",
      "score": [0-1],
      "start": int,
      "end": int
    },
    ...
  ],
  "name": "PII Detector Balanced"
}
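As a rough illustration of consuming a Finding, the Python sketch below reads a payload shaped like the abbreviated structure and pulls out the matched text. The payload values (including the `0.97` score), the `entities_above` helper, and the `min_score` threshold are made up for this example. It also assumes `start` and `end` are character offsets with an exclusive `end`, which this page does not confirm.

```python
import json

# Hypothetical Finding shaped like the abbreviated structure; values are invented.
# Assumption: start/end are character offsets into the text, end exclusive.
text = "Please update my social security number from 123-45-6789."
finding_json = """
{
  "matched_entities": [
    {"entity_type": "us_ssn", "score": 0.97, "start": 45, "end": 56}
  ],
  "name": "PII Detector Balanced"
}
"""


def entities_above(finding, text, min_score=0.5):
    """Return (entity_type, matched_text, score) for entities scoring at least min_score."""
    results = []
    for ent in finding.get("matched_entities", []):
        if ent["score"] >= min_score:
            matched_text = text[ent["start"]:ent["end"]]
            results.append((ent["entity_type"], matched_text, ent["score"]))
    return results


finding = json.loads(finding_json)
print(entities_above(finding, text))
```

Filtering on `score` is one way to trade recall for precision downstream; whether a threshold is appropriate depends on the compliance requirement.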

Configuration Options:

Parameter	Description	Default
`pii_entities`	List of specific entity types to detect	All supported entities
`apply_to`	Specifies whether to analyze the "prompt", "completion", or "both"	"prompt"
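The semantics of the two parameters can be sketched in Python as follows. This is only a reading of the table, not the Evaluator's code: the `select_text` and `filter_entities` helpers are hypothetical, and the lowercase `"both"` value is taken from the table's description.

```python
def select_text(prompt, completion, apply_to="prompt"):
    """Return the texts to analyze for a given apply_to value (default "prompt")."""
    if apply_to == "prompt":
        return [prompt]
    if apply_to == "completion":
        return [completion]
    if apply_to == "both":
        return [prompt, completion]
    raise ValueError(f"unsupported apply_to value: {apply_to!r}")


def filter_entities(matched_entities, pii_entities=None):
    """Keep only the requested entity types; None stands for all supported entities."""
    if pii_entities is None:
        return list(matched_entities)
    wanted = set(pii_entities)
    return [ent for ent in matched_entities if ent["entity_type"] in wanted]


if __name__ == "__main__":
    print(select_text("my SSN is 123-45-6789", "Thanks, noted.", apply_to="both"))
```

Note that with the default `apply_to` of "prompt", PII that appears only in a Completion would not be analyzed.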

Data & Dependencies

Data Sources

The entity recognition models are trained on the following focused datasets:

AI4Privacy PII Dataset - PII Entities
LMsys Chatbot Arena Conversations - Control Dataset

Benchmarks

The PII Detection - Balanced Evaluator has been tested against benchmark datasets to assess its effectiveness:

Dataset	Sample Size	Accuracy	Precision	Recall	F1 Score
ThirdLaw PII Benchmark	210k conversations	71.4%	68.1%	91.8%	78.2%
PrivateAI PII Benchmark	3000 words	N/A	96%	95%	94%

Key Findings:

Performance varied across different PII categories

Benchmarks last updated: March 2025
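For readers who want to relate the columns to each other, the F1 score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the precision and recall reported for the ThirdLaw PII Benchmark; the `f1_score` helper is illustrative and not part of any ThirdLaw tooling.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 68.1% and recall 91.8% from the ThirdLaw PII Benchmark row.
print(round(f1_score(0.681, 0.918) * 100, 1))  # 78.2
```

The high recall relative to precision indicates that, on this benchmark, the Evaluator misses little PII at the cost of more false positives.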

Ways to Use and Deploy this Evaluator

Here's how to incorporate the PII Detection - Balanced Evaluator in your Law:

ThirdLaw DSL
if PiiDetection-Ensemble.pii_detected in Both then run BlockCompletion

You can create more targeted interventions for specific PII types:

ThirdLaw DSL
  if PiiDetection-Ensemble.us_ssn.detected in Both then run LogAndAlert
  if PiiDetection-Ensemble.credit_card.detected in Both then run MaskAndLog
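This page does not document what an action such as `MaskAndLog` does internally. As a rough idea of how entity offsets from a Finding could drive masking, here is a hypothetical Python sketch. The `mask_entities` helper and the `[TYPE]` placeholder format are assumptions, `start`/`end` are assumed to be character offsets with an exclusive `end`, and overlapping spans are not handled.

```python
def mask_entities(text, matched_entities, mask_types):
    """Replace spans of the given entity types with a [TYPE] placeholder."""
    spans = sorted(
        (ent for ent in matched_entities if ent["entity_type"] in mask_types),
        key=lambda ent: ent["start"],
        reverse=True,
    )
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in spans:
        placeholder = f"[{ent['entity_type'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111"
    entities = [
        {"entity_type": "us_ssn", "start": 4, "end": 15},
        {"entity_type": "credit_card", "start": 22, "end": 41},
    ]
    print(mask_entities(sample, entities, {"credit_card"}))
```

Masking only selected types mirrors the targeted rules above, where different entity types trigger different actions.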

Security, Compliance & Risk Assessment

Security Considerations:

Entity detection operates entirely within your secure environment, never transmitting sensitive data externally

Compliance & Privacy:

GDPR - Supports Article 9 (processing of special categories of personal data) and Article 35 (data protection impact assessment) compliance
HIPAA - Identifies Protected Health Information (PHI) to prevent unauthorized disclosure
CCPA - Detects personal information as defined under California law
EU AI Act - Helps implement safeguards for high-risk AI systems handling personal data

Revision History:

2025-03-04: Initial release

Comprehensive entity detection across 39 PII entity types
ThirdLaw benchmark results
Initial documentation
