Website Detection
A Search-based Evaluator that identifies website and URL patterns in text using curated regular expressions. It classifies each match as one of four types of web address: full URLs, domain names, IP addresses, or local development URLs.
- Use Case: Website Detection, Security Monitoring
- Analytic Engine: Search
- OWASP Risks:
- Compliance Areas:
  - EU AI Act - System Security
  - NIS Directive - Cybersecurity
  - GDPR - Data Protection
- Valid Inputs: Text
- Scope: Full Response
- Last Update: 2025-02-26
- License: ThirdLaw License
- Dependencies: N/A
Detailed Description
The Website Detection - Search Evaluator uses regular-expression pattern matching to identify web addresses in text content. It detects four categories: full URLs with a protocol (http or https), domain names (with or without a www prefix), IP addresses (with optional ports), and local development URLs (localhost or 127.0.0.1). This is useful for monitoring whether LLMs are generating or processing website references that could be used for data exfiltration, phishing attempts, or unauthorized network connections.
This Evaluator helps organizations identify when an LLM references external web resources or internal network locations, which supports security monitoring, data leakage prevention, and enforcing boundaries on LLM capabilities.
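The four categories described above can be illustrated with a minimal Python sketch. The patterns below are assumptions written for illustration only; they are not the Evaluator's actual pattern library, and the names PATTERNS and classify are hypothetical.

```python
import re

# Hypothetical, simplified patterns; the real Evaluator's expressions are not published here.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"  # one IPv4 octet, 0-255

PATTERNS = {
    # Full URL: an http/https scheme followed by non-space characters.
    "is_full_url": re.compile(r"\bhttps?://[^\s<>]+", re.IGNORECASE),
    # Domain name: optional www prefix, one or more labels, alphabetic TLD.
    "is_domain": re.compile(
        r"\b(?:www\.)?(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
        re.IGNORECASE,
    ),
    # IPv4 address with an optional port.
    "is_ip_address": re.compile(
        r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"(?::\d{1,5})?\b"
    ),
    # Local development host with an optional port.
    "is_localhost": re.compile(
        r"\b(?:localhost|127\.0\.0\.1)(?::\d{1,5})?\b", re.IGNORECASE
    ),
}


def classify(text):
    """Return one Boolean per category indicating whether any match was found."""
    return {name: bool(pattern.search(text)) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    print(classify("The server is running at 192.168.1.100:8080."))
```

In this sketch a single text can set several flags at once. For example, a full URL also contains a domain name, so both is_full_url and is_domain would be set for it. Whether the actual Evaluator behaves the same way is not stated in this document.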
Input Descriptions:
The Evaluator accepts text input from both Prompt and Response Events within an Exchange.
Law Usage Example:
This Response would trigger the Evaluator since it contains a full URL with protocol and domain name:
Please visit https://example.com/products?category=electronics for more information about our latest offerings.
This Response would trigger the Evaluator since it contains a domain name that, even without a protocol, is recognizable as a website:
You can reach our support team through the contact form on example.org or by email.
This Response would trigger the Evaluator since it contains an IP address with port number that could be used to access a web server:
The server is running at 192.168.1.100:8080 and can be accessed from your internal network.
This Response would trigger the Evaluator since it contains a localhost reference commonly used for web development:
For local development, open your browser and navigate to localhost:3000 to view the application.
This Response would not trigger the Evaluator since it doesn't contain any website patterns, URLs, domain names, IP addresses, or localhost references:
The documentation explains how to configure your application settings properly.
Output Descriptions:
Returns a Finding containing Boolean flags for each type of web address:
{
"WebsiteDetection-Search.any": [True/False],
"WebsiteDetection-Search.is_full_url": [True/False],
"WebsiteDetection-Search.is_domain": [True/False],
"WebsiteDetection-Search.is_ip_address": [True/False],
"WebsiteDetection-Search.is_localhost": [True/False]
}
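As a rough illustration of the Finding's shape, the following Python sketch assembles the five flags from per-category results. It assumes that the "any" flag is the logical OR of the four category flags, which this document implies but does not state. The helper name build_finding is hypothetical.

```python
import json

EVALUATOR = "WebsiteDetection-Search"
CATEGORIES = ("is_full_url", "is_domain", "is_ip_address", "is_localhost")


def build_finding(flags):
    """Build a Finding-shaped dict from per-category Booleans.

    Assumption: ".any" is True when at least one category flag is True.
    Missing categories are treated as False.
    """
    finding = {f"{EVALUATOR}.any": any(flags.get(c, False) for c in CATEGORIES)}
    for category in CATEGORIES:
        finding[f"{EVALUATOR}.{category}"] = bool(flags.get(category, False))
    return finding


if __name__ == "__main__":
    # Example: only an IP address was detected in the text.
    print(json.dumps(build_finding({"is_ip_address": True}), indent=2))
```

Keeping a separate flag per category lets a Law react to one type of address (for example, only IP addresses) without parsing the matched text.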
Configuration Options:
N/A
Data & Dependencies
Data Sources
Pattern library developed based on standard URL and domain name formats, following RFC 3986 (URI) and RFC 1034/1035 (DNS) specifications.
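For readers who want to relate matched strings back to the URI structure defined in RFC 3986, the Python standard library can split a URL into its components and check an IP address. This is a complementary sketch, not part of the Evaluator; the function names describe_url and describe_ip are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the generic URI components (scheme, host, port, path, query)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }


def describe_ip(candidate):
    """Return basic facts about an IP address, or None if the string is not a valid address."""
    try:
        addr = ipaddress.ip_address(candidate)
    except ValueError:
        return None
    return {
        "version": addr.version,
        "is_private": addr.is_private,
        "is_loopback": addr.is_loopback,
    }


if __name__ == "__main__":
    print(describe_url("https://example.com/products?category=electronics"))
    print(describe_ip("192.168.1.100"))
```

A check like describe_ip could be used downstream to distinguish internal addresses from public ones, though this document does not say whether the Evaluator makes that distinction.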
Ways to Use and Deploy this Evaluator
Here's how to incorporate the Website Detection - Search Evaluator into your Law:
if WebsiteDetection-Search in Response then run InterventionType
For monitoring all types of web references across the entire exchange:
if WebsiteDetection-Search.any in Exchange then run LogAlert
Blocking responses with potentially risky network references:
if WebsiteDetection-Search.is_ip_address or WebsiteDetection-Search.is_localhost in Response then run BlockResponse
Using multiple detectors together for layered protection:
if WebsiteDetection-Search.is_domain in Response and PromptInjection-Search.is_prompt_injection in Exchange then run BlockResponse and LogSecurityEvent
Security, Compliance & Risk Assessment
Security Considerations:
- Detects potential data exfiltration channels by identifying when an LLM references external websites or suggests that users visit domains that could be malicious and used in social engineering or phishing attacks.
- Helps prevent unwanted network access by detecting references to internal IP addresses, localhost, or other network identifiers that could be used to probe internal systems or attempt lateral movement within a network environment.
Compliance & Privacy:
- EU AI Act - supports compliance with security requirements for AI systems by monitoring external reference attempts and preventing potentially harmful connections
- NIS Directive - supports cybersecurity requirements by providing early detection of potential network security threats and unauthorized connection attempts
- GDPR - helps prevent unauthorized data transfers to external websites and supports data protection by monitoring potential exfiltration channels
Revision History:
2025-02-26: Initial release
- Initial pattern library for website and URL detection, covering four pattern categories: full URLs, domain names, IP addresses, and localhost references
- Initial documentation with usage examples and security guidance