Website Detection

A Search-based Evaluator that rapidly identifies website and URL patterns in text using carefully curated regular expressions. It classifies different types of web addresses including full URLs, domain names, IP addresses, and local development URLs.

Website Detection - Search

Use Case: Website Detection, Security Monitoring
Analytic Engine: Search
OWASP Risks:
Compliance Areas:
- EU AI Act - System Security
- NIS Directive - Cybersecurity
- GDPR - Data Protection
Valid Inputs: Text
Scope: Full Response
Last Update: 2025-02-26
License: ThirdLaw License
Dependencies: N/A

Detailed Description

The Website Detection - Search Evaluator uses specialized pattern recognition to identify various types of web addresses that might appear in text content. It detects full URLs with protocols (http/https), domain names (with or without www prefix), IP addresses (with optional ports), and local development URLs (localhost/127.0.0.1). This capability is particularly useful for monitoring whether LLMs are generating or processing website references that could potentially be used for data exfiltration, phishing attempts, or unauthorized network connections.
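The four categories described above can be approximated with a few regular expressions. The sketch below is illustrative only: the patterns, the function names (`detect_web_addresses`, `has_valid_ip`), and the decision to validate IP octets in code are assumptions made for this example, not the Evaluator's curated pattern library.

```python
import re

# Illustrative patterns only; the Evaluator's actual expressions are not shown in this document.
FULL_URL = re.compile(r"\bhttps?://[^\s<>]+", re.IGNORECASE)
DOMAIN = re.compile(
    r"\b(?:www\.)?(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)
IP_ADDRESS = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3})(?::\d{1,5})?\b")
LOCALHOST = re.compile(r"\b(?:localhost|127\.0\.0\.1)(?::\d{1,5})?\b", re.IGNORECASE)


def has_valid_ip(text: str) -> bool:
    """Return True if the text contains a dotted quad whose octets are all <= 255."""
    for match in IP_ADDRESS.finditer(text):
        if all(int(octet) <= 255 for octet in match.group(1).split(".")):
            return True
    return False


def detect_web_addresses(text: str) -> dict[str, bool]:
    """Return one Boolean flag per category, plus an overall 'any' flag."""
    flags = {
        "is_full_url": bool(FULL_URL.search(text)),
        "is_domain": bool(DOMAIN.search(text)),
        "is_ip_address": has_valid_ip(text),
        "is_localhost": bool(LOCALHOST.search(text)),
    }
    flags["any"] = any(flags.values())
    return flags


print(detect_web_addresses("The server is running at 192.168.1.100:8080."))
```

In this sketch the categories can overlap: a full URL such as https://example.com also sets `is_domain`, because the domain is embedded in the URL. Whether the Evaluator itself reports overlapping categories this way is not stated here.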

This Evaluator helps organizations identify when an LLM might be attempting to reference external web resources or internal network locations, which is important for security monitoring, preventing data leakage, and enforcing proper boundaries for LLM capabilities.

Input Descriptions:

The Evaluator accepts text input from both Prompt and Response Events within an Exchange.

Law Usage Example:

This Response would trigger the Evaluator since it contains a full URL with protocol and domain name:

Triggering Example

  Please visit https://example.com/products?category=electronics for more information about our latest offerings.

This Response would trigger the Evaluator since it contains a domain name that, even without a protocol, is still recognizable as a website:

Triggering Example

  You can reach our support team through the contact form on example.org or by email.

This Response would trigger the Evaluator since it contains an IP address with a port number that could be used to access a web server:

Triggering Example

  The server is running at 192.168.1.100:8080 and can be accessed from your internal network.

This Response would trigger the Evaluator since it contains a localhost reference commonly used for web development:

Triggering Example

  For local development, open your browser and navigate to localhost:3000 to view the application.

This Response would not trigger the Evaluator since it doesn't contain any website patterns, URLs, domain names, IP addresses, or localhost references:

Non-Triggering Example

  The documentation explains how to configure your application settings properly.

Output Descriptions:

Returns a Finding containing Boolean flags for each type of web address:

Finding Structure
{
    "WebsiteDetection-Search.any": [True/False],
    "WebsiteDetection-Search.is_full_url": [True/False],
    "WebsiteDetection-Search.is_domain": [True/False],
    "WebsiteDetection-Search.is_ip_address": [True/False],
    "WebsiteDetection-Search.is_localhost": [True/False]
}
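A consumer of this Finding might want the list of categories that fired rather than the individual flags. The following is a minimal sketch of that step; the helper name `detected_categories` is hypothetical, and the sample Finding assumes that `any` is simply the logical OR of the four category flags.

```python
PREFIX = "WebsiteDetection-Search."


def detected_categories(finding: dict[str, bool]) -> list[str]:
    """Return the names of the category flags that are True, ignoring the 'any' summary flag."""
    return sorted(
        key[len(PREFIX):]
        for key, value in finding.items()
        if key.startswith(PREFIX) and key != PREFIX + "any" and value
    )


# Sample Finding for a Response that mentions 192.168.1.100:8080 (values assumed for illustration).
sample_finding = {
    "WebsiteDetection-Search.any": True,
    "WebsiteDetection-Search.is_full_url": False,
    "WebsiteDetection-Search.is_domain": False,
    "WebsiteDetection-Search.is_ip_address": True,
    "WebsiteDetection-Search.is_localhost": False,
}

print(detected_categories(sample_finding))  # ['is_ip_address']
```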

Configuration Options:

N/A

Data & Dependencies

Data Sources

The pattern library was developed from standard URL and domain name formats, following the RFC 3986 (URI) and RFC 1034/1035 (DNS) specifications.
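For readers unfamiliar with the URI components that RFC 3986 defines (scheme, authority, path, query), Python's standard library exposes them through `urllib.parse.urlsplit`. The sketch below is only a way to inspect those components for the kinds of addresses this Evaluator targets; the helper `split_address` is hypothetical and is not part of the Evaluator.

```python
import ipaddress
from urllib.parse import urlsplit


def split_address(address: str):
    """Split an address into (scheme, hostname, port, path, query)."""
    # Scheme-less input needs a leading "//" so urlsplit treats it as an authority.
    if "://" not in address:
        address = "//" + address
    parts = urlsplit(address)
    return parts.scheme, parts.hostname, parts.port, parts.path, parts.query


print(split_address("https://example.com/products?category=electronics"))
print(split_address("192.168.1.100:8080"))
print(split_address("localhost:3000"))

# The ipaddress module can tell private and loopback addresses apart.
print(ipaddress.ip_address("192.168.1.100").is_private)
print(ipaddress.ip_address("127.0.0.1").is_loopback)
```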

Ways to Use and Deploy this Evaluator

Here's how to incorporate the Website Detection - Search Evaluator into your Law:

ThirdLaw DSL
  if WebsiteDetection-Search in Response then run InterventionType

For monitoring all types of web references across the entire exchange:

ThirdLaw DSL
  if WebsiteDetection-Search.any in Exchange then run LogAlert

Blocking responses with potentially risky network references:

ThirdLaw DSL
  if WebsiteDetection-Search.is_ip_address or WebsiteDetection-Search.is_localhost in Response then run BlockResponse

Using multiple detectors together for layered protection:

ThirdLaw DSL
  if WebsiteDetection-Search.is_domain in Response and PromptInjection-Search.is_prompt_injection in Exchange then run BlockResponse and LogSecurityEvent
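As a mental model only, the Boolean logic of the monitoring and blocking rules above can be written out in Python. The ThirdLaw platform evaluates the DSL itself; the function `choose_interventions` and the idea of passing one Finding per scope are assumptions made for this sketch, and the intervention names are taken from the DSL examples.

```python
P = "WebsiteDetection-Search."


def choose_interventions(exchange_finding: dict[str, bool],
                         response_finding: dict[str, bool]) -> list[str]:
    """Mirror two of the example rules as plain Boolean checks (illustrative only)."""
    actions = []
    # "if WebsiteDetection-Search.any in Exchange then run LogAlert"
    if exchange_finding.get(P + "any", False):
        actions.append("LogAlert")
    # "if ...is_ip_address or ...is_localhost in Response then run BlockResponse"
    if response_finding.get(P + "is_ip_address", False) or response_finding.get(P + "is_localhost", False):
        actions.append("BlockResponse")
    return actions


# Assumed Finding for a Response that mentions localhost:3000.
finding = {P + "any": True, P + "is_localhost": True, P + "is_ip_address": False}
print(choose_interventions(finding, finding))  # ['LogAlert', 'BlockResponse']
```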

Security, Compliance & Risk Assessment

Security Considerations:

Detects potential data exfiltration channels by identifying when an LLM references external websites or suggests that users visit potentially malicious domains, as in social engineering or phishing attacks.
Helps prevent unwanted network access by detecting references to internal IP addresses, localhost, or other network identifiers that could be used to probe internal systems or move laterally within a network.

Compliance & Privacy:

- EU AI Act: supports compliance with security requirements for AI systems by monitoring external references and flagging those that could lead to harmful connections.
- NIS Directive: supports cybersecurity requirements by providing early detection of potential network security threats and references to unauthorized connections.
- GDPR: helps prevent unauthorized data transfers to external websites and supports data protection by monitoring potential exfiltration channels.

Revision History:

2025-02-26: Initial release

Initial pattern library for website and URL detection, covering four pattern categories: full URLs, domain names, IP addresses, and localhost references
Initial documentation with usage examples and security guidance
