
Code Detection

A semantic analysis-based Evaluator that uses vector embeddings to identify code patterns and programming language constructs in text by comparing against curated collections of code and neutral content.

Code Detection - Semantic

Detailed Description

The Evaluator leverages a vector database to perform semantic similarity searches against collections of code and neutral content. This approach provides more nuanced detection compared to pattern-based approaches, enabling it to catch code-like content even when it has been obfuscated or modified.
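The closest-collection idea described above can be sketched in a few lines. This is a minimal illustration, not the Evaluator's actual implementation: the `embed` function is a toy character-trigram stand-in for a real embedding model, and the `COLLECTIONS` contents, `cosine`, and `classify` are hypothetical names introduced only for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character trigram counts (assumption)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference collections; a real deployment would hold many
# curated examples in a vector database.
COLLECTIONS = {
    "code": [
        "def add(a, b): return a + b",
        "for (let i = 0; i < n; i++) { total += i; }",
    ],
    "neutral": [
        "The weather was pleasant and we walked to the market.",
        "Please send the report by Friday afternoon.",
    ],
}


def classify(text: str):
    """Return the closest collection and the max similarity per collection."""
    query = embed(text)
    scores = {
        name: max(cosine(query, embed(doc)) for doc in docs)
        for name, docs in COLLECTIONS.items()
    }
    closest = max(scores, key=scores.get)
    return closest, scores


closest, scores = classify("def calculate_sum(a, b): return a + b")
print(closest, scores)
```

The design point is that the decision compares the best match in each collection rather than matching fixed patterns, which is why lightly modified code can still land closer to the code collection.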

For quick pattern-based detection of common programming languages, consider using the CodeDetection-Search Evaluator. The Search and Semantic Evaluators can be used together for comprehensive code detection - Search providing fast initial screening and Semantic providing deeper analysis.
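One way to picture the two-stage arrangement is the sketch below. It is an assumption-laden illustration: `CODE_PATTERNS`, `pattern_screen`, and `detect_code` are hypothetical names, the regular expressions are not the Search Evaluator's actual rules, and the semantic stage is passed in as a plain callable rather than a real Evaluator call.

```python
import re
from typing import Callable

# Hypothetical quick patterns for a few common languages (illustrative only).
CODE_PATTERNS = [
    re.compile(r"^\s*def\s+\w+\s*\(", re.MULTILINE),  # Python function
    re.compile(r"\bfunction\s+\w+\s*\("),              # JavaScript function
    re.compile(r"\bfn\s+\w+\s*\("),                    # Rust function
]


def pattern_screen(text: str) -> bool:
    """Fast first pass: True if any quick pattern matches."""
    return any(p.search(text) for p in CODE_PATTERNS)


def detect_code(text: str, semantic_check: Callable[[str], bool]) -> str:
    """Return which stage flagged the text: 'search', 'semantic', or 'none'."""
    if pattern_screen(text):
        return "search"
    # Only pay for the slower semantic analysis when the quick pass is silent.
    if semantic_check(text):
        return "semantic"
    return "none"


# Usage with a stub semantic check that never fires.
print(detect_code("def f(x):\n    return x", lambda t: False))
```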

Input Descriptions

The Evaluator accepts text input and processes both Prompts and completions.

Law Usage Example

Each of the following Responses would trigger the Evaluator, since each contains code-like patterns:

Triggering Example
def calculate_sum(a, b):
    result = a + b
    return result

print(calculate_sum(5, 3))
Triggering Example
function calculateTotal(items) {
  return items.reduce((sum, item) => {
    return sum + item.price;
  }, 0);
}

const cart = [{price: 10}, {price: 20}];
console.log(calculateTotal(cart));
Triggering Example
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    fn distance(&self, other: &Point) -> f64 {
        let dx = (self.x - other.x) as f64;
        let dy = (self.y - other.y) as f64;
        (dx * dx + dy * dy).sqrt()
    }
}

This Response would not trigger the Evaluator, since it contains only natural language:

Non-Triggering Example
The sum of two numbers can be calculated by adding them together. For example, five plus three equals eight.

Output Descriptions

Returns a Finding containing the closest matching collection and the maximum similarity score (between 0 and 1) for each collection:

Finding Structure
{
  "CodeDetector-Balanced.closest_collection": ["code", "neutral"],
  "CodeDetector-Balanced.code.max_similarity": [0-1],
  "CodeDetector-Balanced.neutral.max_similarity": [0-1]
}
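As a rough illustration of consuming such a Finding, the sketch below reads the three keys listed above and derives a simple margin between the two similarity scores. The example values are invented for illustration, and `summarize_finding` is a hypothetical helper, not part of any ThirdLaw API.

```python
PREFIX = "CodeDetector-Balanced"  # key prefix as shown in the Finding structure


def summarize_finding(finding: dict, prefix: str = PREFIX) -> dict:
    """Extract the closest collection and the code-vs-neutral similarity margin."""
    closest = finding[f"{prefix}.closest_collection"]
    code_sim = finding[f"{prefix}.code.max_similarity"]
    neutral_sim = finding[f"{prefix}.neutral.max_similarity"]
    return {
        "is_code": closest == "code",
        # A small margin suggests a borderline match worth reviewing.
        "margin": round(code_sim - neutral_sim, 4),
    }


# Hypothetical Finding with made-up similarity values.
example_finding = {
    "CodeDetector-Balanced.closest_collection": "code",
    "CodeDetector-Balanced.code.max_similarity": 0.83,
    "CodeDetector-Balanced.neutral.max_similarity": 0.41,
}

summary = summarize_finding(example_finding)
print(summary)
```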

Configuration Options

N/A


Data & Dependencies

Data Sources

  • Synthetically generated code examples.

Benchmarks

The Code Detection - Semantic Evaluator has been tested against one benchmark dataset to assess its effectiveness:

The ThirdLaw Legal Document dataset is composed of the following open-source datasets:

Dataset | Sample Size | Accuracy | Precision | Recall | F1
Code Detection Test Set | 18023 conversations | 57.5% | 46.9% | 97.1% | 63.3%
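The reported F1 can be sanity-checked from the precision and recall in the table, since F1 is their harmonic mean. The short sketch below uses only the figures shown above; `f1_score` is a helper written for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the benchmark table.
precision, recall = 0.469, 0.971
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```

The result is approximately 0.632, which agrees with the reported 63.3% to within the rounding of the inputs. The combination of high recall and lower precision indicates that the Evaluator rarely misses code but flags a substantial share of non-code content.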

Benchmarks last updated: April 2025


Ways to Use and Deploy this Evaluator

Basic Example

Here's how to incorporate the Code Detection - Semantic Evaluator in your Law:

ThirdLaw DSL
if CodeDetection-Semantic in ScopeType then run InterventionType

Security, Compliance & Risk Assessment

Security Considerations

  • Identifies LLM attempts to generate any code, which may pose a security risk

Compliance & Privacy

  • EU AI Act - supports EU AI Act compliance through semantic analysis of AI system outputs
  • NIS Directive - supports cybersecurity by detecting potentially malicious code execution

Revision History

2025-02-22: Initial Release

  • Initial implementation with code and neutral collections
  • Documentation