PII Detection

An Evaluator that identifies and categorizes various types of personally identifiable information (PII) present in text using an ensemble of advanced entity recognition techniques.

PII Detection - Balanced

Detailed Description

The PII Detection - Balanced Evaluator detects and categorizes personally identifiable information (PII) in text. It uses an ensemble of entity recognition techniques: named entity recognition (NER), pattern matching, and semantic analysis. It identifies a wide range of PII types, including names, email addresses, phone numbers, Social Security numbers (SSNs), and physical addresses. The Evaluator is designed for regulatory compliance use cases where both accuracy and comprehensive coverage are essential.

For more advanced use cases, custom Search Evaluators can be used for lightweight detection of specific patterns of PII. Additionally, custom Foundational Evaluators can be used for a more comprehensive and contextual analysis of PII presence.

Input Descriptions:

The Evaluator accepts text input from both Prompts and Completions in an Exchange.

Law Usage Example:

This input would trigger the Evaluator since it contains multiple types of PII:

Triggering Example
  Hello, my name is John Smith and my email is john.smith@example.com. You can also reach me at (555) 123-4567 or find me at 123 Main Street, New York, NY.

This input would also trigger the Evaluator for SSN detection:

Triggering Example
  Please update my social security number from 123-45-6789 to my new one 987-65-4321.

This input would not trigger the Evaluator since it contains no PII:

Non-Triggering Example
  I'd like to discuss the latest technological developments in the AI field and how they might affect business operations in the coming years.
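Pattern matching is one of the techniques named in the description. As a rough illustration only, and not the Evaluator's actual implementation, the sketch below runs a few simplified regular expressions over text such as the examples above. The pattern set, the entity labels, and the `find_pii_patterns` helper are all hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not the Evaluator's rules).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii_patterns(text: str) -> list[dict]:
    """Return one dict per regex match, ordered by position in the text."""
    matches = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append({
                "entity_type": entity_type,
                "start": m.start(),
                "end": m.end(),  # exclusive offset, as returned by re
                "text": m.group(),
            })
    return sorted(matches, key=lambda e: e["start"])

sample = "Please update my social security number from 123-45-6789 to my new one 987-65-4321."
for hit in find_pii_patterns(sample):
    print(hit["entity_type"], hit["text"])
```

Fixed-shape identifiers such as SSNs suit this approach, but names and street addresses have no fixed shape, so patterns alone would miss them. That is one plausible reason for combining pattern matching with NER and semantic analysis.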

Output Descriptions:

Returns a Finding containing detection results for various PII entity types:

Finding Structure (Abbreviated)
{
  "matched_entities": [
    {
      "entity_type": "example1",
      "score": [0-1],
      "start": int,
      "end": int
    },
    ...
  ],
  "name": "PII Detector Balanced"
}
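
To show how the fields of a Finding might be consumed, here is a minimal sketch that filters `matched_entities` by score and slices the matched substrings out of the analyzed text. It assumes `start` and `end` are character offsets with an exclusive `end`, which the abbreviated structure does not confirm. The sample Finding values, the `EMAIL_ADDRESS` label, and the `extract_matches` helper are invented for illustration.

```python
import json

text = "Contact me at john.smith@example.com today."

# Hypothetical Finding that follows the abbreviated structure; values are invented.
finding_json = """
{
  "matched_entities": [
    {"entity_type": "EMAIL_ADDRESS", "score": 0.97, "start": 14, "end": 36}
  ],
  "name": "PII Detector Balanced"
}
"""

def extract_matches(finding: dict, source_text: str, min_score: float = 0.5) -> list[tuple]:
    """Return (entity_type, matched_text, score) for entities at or above min_score."""
    results = []
    for entity in finding.get("matched_entities", []):
        if entity["score"] >= min_score:
            # Assumption: offsets index characters and `end` is exclusive.
            snippet = source_text[entity["start"]:entity["end"]]
            results.append((entity["entity_type"], snippet, entity["score"]))
    return results

finding = json.loads(finding_json)
print(extract_matches(finding, text))
```

Raising `min_score` trades recall for precision, which may matter when a Law blocks completions outright.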

Configuration Options:

| Parameter | Description | Default |
|---|---|---|
| pii_entities | List of specific entity types to detect | All supported entities |
| apply_to | Specifies whether to analyze the "prompt", "completion", or "both" | "prompt" |
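
The semantics of the two options can be sketched in a few lines. This is an assumption-laden illustration, not the product's configuration API: the `Exchange` dataclass, `select_texts`, and `filter_entities` are hypothetical names, and the entity labels are borrowed from the DSL examples in this document.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Exchange:
    """Hypothetical stand-in for an Exchange: one Prompt and one Completion."""
    prompt: str
    completion: str

def select_texts(exchange: Exchange, apply_to: str = "prompt") -> list[str]:
    """Pick the text(s) to analyze; the default mirrors the documented "prompt"."""
    if apply_to == "prompt":
        return [exchange.prompt]
    if apply_to == "completion":
        return [exchange.completion]
    if apply_to == "both":
        return [exchange.prompt, exchange.completion]
    raise ValueError(f"unsupported apply_to value: {apply_to!r}")

def filter_entities(entities: list[dict], pii_entities: Optional[list[str]] = None) -> list[dict]:
    """Keep only the requested entity types; None means all supported entities."""
    if pii_entities is None:
        return entities
    allowed = set(pii_entities)
    return [e for e in entities if e["entity_type"] in allowed]

exchange = Exchange(prompt="My SSN is 123-45-6789.", completion="Thanks, noted.")
print(select_texts(exchange, "both"))
```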

Data & Dependencies

Data Sources

The entity recognition models are trained on the following focused datasets:

  • AI4Privacy PII Dataset - PII Entities
  • LMsys Chatbot Arena Conversations - Control Dataset

Benchmarks

The PII Detection - Balanced Evaluator has been tested against benchmark datasets to assess its effectiveness:

| Dataset | Sample Size | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ThirdLaw PII Benchmark | 210k conversations | 71.4% | 68.1% | 91.8% | 78.2% |
| PrivateAI PII Benchmark | 3000 words | N/A | 96% | 95% | 94% |
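
For readers relating the columns to each other, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall reported for the ThirdLaw PII Benchmark; `f1_score` is a helper defined here for illustration, not part of any ThirdLaw API.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 68.1% and recall 91.8% from the ThirdLaw PII Benchmark row.
print(f"{f1_score(0.681, 0.918):.1%}")  # prints 78.2%
```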

Key Findings:

  • Performance varied across different PII categories

Benchmarks last updated: March 2025


Ways to Use and Deploy this Evaluator

Here's how to incorporate the PII Detection - Balanced Evaluator in your Law:

ThirdLaw DSL
if PiiDetection-Ensemble.pii_detected in Both then run BlockCompletion

You can create more targeted interventions for specific PII types:

ThirdLaw DSL
if PiiDetection-Ensemble.us_ssn.detected in Both then run LogAndAlert
if PiiDetection-Ensemble.credit_card.detected in Both then run MaskAndLog
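
As a rough idea of what a masking intervention could do with the detected spans, the sketch below replaces each span with a placeholder tag. This is not how `MaskAndLog` is implemented. The `mask_entities` helper, the placeholder format, and the exclusive-`end` offset convention are assumptions made for illustration.

```python
def mask_entities(text: str, entities: list[dict]) -> str:
    """Replace each detected span with a <entity_type> placeholder.

    Assumes non-overlapping spans with character offsets and an exclusive end.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:e["start"]] + f"<{e['entity_type']}>" + text[e["end"]:]
    return text

text = "Please update my social security number from 123-45-6789 to my new one 987-65-4321."
# Offsets are computed here for the demo; in practice they would come from a Finding.
entities = [
    {"entity_type": "us_ssn", "start": text.index(s), "end": text.index(s) + len(s)}
    for s in ("123-45-6789", "987-65-4321")
]
print(mask_entities(text, entities))
# prints: Please update my social security number from <us_ssn> to my new one <us_ssn>.
```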

Security, Compliance & Risk Assessment

Security Considerations:

  • Entity detection operates entirely within your secure environment, never transmitting sensitive data externally

Compliance & Privacy:

  • GDPR - Supports Article 9 (processing of special categories of personal data) and Article 35 (data protection impact assessment) compliance
  • HIPAA - Identifies Protected Health Information (PHI) to prevent unauthorized disclosure
  • CCPA - Detects personal information as defined under California law
  • EU AI Act - Helps implement safeguards for high-risk AI systems handling personal data

Revision History:

2025-03-04: Initial release

  • Comprehensive entity detection across 39 PII entity types
  • ThirdLaw benchmark results
  • Initial documentation