Code Detection
A Search-based Evaluator that rapidly identifies common programming language patterns in text to detect code snippets and determine their likely programming language.
- Use Case: Unauthorized Code Detection
- Analytic Engine: Search
- Related OWASP Risks:
- Related Regulations:
  - EU AI Act - Output Safety
  - GDPR - Data Protection
  - NIS Directive - Cybersecurity
  - EAR - Export Control
- Valid Inputs: Text
- Scope: Full Exchange
- Last Update: 2025 02 14
- License: Apache 2.0
- Dependencies: N/A
Detailed Description
The Code Detection - Search Evaluator uses a collection of carefully curated regular expressions to identify programming language-specific patterns in both Prompts and Responses. It's designed to quickly detect code snippets and classify them into specific programming languages based on their syntax patterns.
For more advanced detection, especially of obfuscated or modified code, consider using the Code Detection - Semantic Evaluator. The Search and Semantic Evaluators can be used together for more comprehensive code detection, with Search providing fast initial screening and Semantic providing deeper analysis.
It currently detects the following languages:
- Python
- Dart
- WebAssembly
- Rust
- x86/x86_64
- C
- C++
- C#
- Objective-C
Initial search terms adapted from NVIDIA's garak.
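The approach described above can be sketched as a table of per-language regular expressions, each searched independently against the input text. This is a minimal illustration under stated assumptions: the three patterns and the `detect_languages` and `contains_code` helpers are hypothetical and are not the Evaluator's actual pattern library or API.

```python
import re

# Hypothetical patterns for three of the supported languages. They illustrate
# the keyword-regex approach; they are NOT the Evaluator's real pattern library.
LANGUAGE_PATTERNS = {
    "python": re.compile(r"\bdef\s+[A-Za-z_]\w*\s*\(|^\s*from\s+\w+\s+import\s", re.MULTILINE),
    "rust": re.compile(r"\bfn\s+main\s*\(\)|\bprintln!|\blet\s+(mut\s+)?\w+\s*="),
    "c": re.compile(r"#include\s*<\w+\.h>|\bint\s+main\s*\(|\bfprintf\s*\("),
}


def detect_languages(text: str) -> dict:
    """Return one Boolean per language: True if that language's pattern matches."""
    return {lang: bool(pattern.search(text)) for lang, pattern in LANGUAGE_PATTERNS.items()}


def contains_code(text: str) -> bool:
    """Flag the text as code if at least one language pattern matches."""
    return any(detect_languages(text).values())


print(detect_languages("fn main() { let x = 10; }"))
# {'python': False, 'rust': True, 'c': False}
```

Because each language is searched independently, a single snippet can match several patterns (C code, for instance, often also looks like C++), so a per-language flag is more informative than a single label.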
Input Descriptions:
The Evaluator accepts the raw text input from both Prompt and Response Events.
Law Usage Example:
This Response would trigger the Evaluator since it contains code patterns:
Here's a simple Python function to calculate factorial:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
This Response would not trigger the Evaluator since it contains no code patterns:
The lambda of an idea should always be concise, much like how one might map over thoughts in a list of considerations.
Python Detector
This Response would trigger the Evaluator since it contains a function definition (def), which matches the regex for detecting Python code:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
This Response would not trigger the Evaluator since it lacks any identifiable Python syntax:
We definitely need to print out our "Hello, World" imports to verify them.
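A minimal sketch of how a `def`-based pattern separates the two examples above. `PYTHON_DEF` is a hypothetical stand-in, not the Evaluator's actual Python pattern.

```python
import re

# Hypothetical pattern: the `def` keyword, whitespace, an identifier, then "(".
PYTHON_DEF = re.compile(r"\bdef\s+[A-Za-z_]\w*\s*\(")

positive = "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
negative = 'We definitely need to print out our "Hello, World" imports to verify them.'

print(bool(PYTHON_DEF.search(positive)))  # True
print(bool(PYTHON_DEF.search(negative)))  # False: "definitely" has no whitespace after "def"
```

The negative example deliberately contains `print`, `import`, and the letters `def`. Keying on the full `def name(` shape, rather than on bare words, keeps it from triggering.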
Dart Detector
This Response would trigger the Evaluator since it includes File, a class commonly used in Dart I/O operations:
import 'dart:io';
void main() {
  File file = File('example.txt');
  print(file.readAsStringSync());
}
This Response would not trigger the Evaluator since it lacks any Dart-specific syntax:
This is a simple text file for reading and writing.
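One way such a Dart check could work is sketched below. The `DART` regex is a hypothetical assumption that keys on a `dart:` library import or a call to the `File` constructor.

```python
import re

# Hypothetical Dart pattern: a `dart:` library import or a call to File(...).
# Matching is case-sensitive, so the lowercase prose word "file" is ignored.
DART = re.compile(r"import\s+'dart:[a-z_]+';|\bFile\s*\(")

positive = "import 'dart:io';\nFile file = File('example.txt');"
negative = "This is a simple text file for reading and writing."

print(bool(DART.search(positive)))  # True
print(bool(DART.search(negative)))  # False
```

Case sensitivity does real work here. A case-insensitive `File` pattern would fire on ordinary English sentences such as the negative example.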
WebAssembly Detector
This Response would trigger the Evaluator since it contains param i32 and local.get, which match the WebAssembly regex pattern:
(func $add (param i32 i32) (result i32)
  local.get 0
  local.get 1
  i32.add)
This Response would not trigger the Evaluator since it does not contain any WebAssembly syntax:
Adding two numbers together is a common operation in programming.
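The sketch below shows a hypothetical WebAssembly text-format pattern applied to both examples. The `WASM` regex is an illustrative assumption, not the Evaluator's real pattern.

```python
import re

# Hypothetical WAT pattern: a typed param list such as "(param i32 i32)",
# a local.get with a numeric index, or a typed arithmetic instruction.
WASM = re.compile(r"\(param(\s+[if](32|64))+\)|\blocal\.get\s+\d+|\b[if](32|64)\.(add|sub|mul)\b")

positive = "(func $add (param i32 i32) (result i32)\n  local.get 0\n  local.get 1\n  i32.add)"
negative = "Adding two numbers together is a common operation in programming."

print(bool(WASM.search(positive)))  # True
print(bool(WASM.search(negative)))  # False
```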
Rust Detector
This Response would trigger the Evaluator since it contains fn main() and let x =, which match Rust syntax:
fn main() {
    let x = 10;
    println!("Value: {}", x);
}
This Response would not trigger the Evaluator since it does not contain any Rust-specific syntax:
A variable holds a value and can be changed over time.
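A minimal sketch of a Rust check follows. The `RUST` regex is hypothetical and combines the `fn main()` signature, a `let` binding, and the `println!` macro.

```python
import re

# Hypothetical Rust pattern: `fn main()`, a `let` (or `let mut`) binding,
# or the println! macro (the "!" separates it from a plain identifier).
RUST = re.compile(r"\bfn\s+main\s*\(\)|\blet\s+(mut\s+)?[a-z_]\w*\s*=|\bprintln!\(")

positive = 'fn main() {\n    let x = 10;\n    println!("Value: {}", x);\n}'
negative = "A variable holds a value and can be changed over time."

print(bool(RUST.search(positive)))  # True
print(bool(RUST.search(negative)))  # False
```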
x86_64 Assembly Detector
This Response would trigger the Evaluator since it contains register names like rsi and rax, which match the x86_64 pattern:
mov rax, rsi
add rax, rdx
This Response would not trigger the Evaluator since it does not contain any x86_64-specific syntax:
Computers process instructions using their CPU registers.
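The sketch below shows a hypothetical register-name pattern for x86_64. The word boundaries matter: without them, "rsi" would match inside ordinary words such as "versions".

```python
import re

# Hypothetical x86_64 pattern: a 64-bit general-purpose register name as a
# whole word. \b stops matches inside words like "versions" (which contains "rsi").
X86_64 = re.compile(r"\b(rax|rbx|rcx|rdx|rsi|rdi|rsp|rbp)\b", re.IGNORECASE)

positive = "mov rax, rsi\nadd rax, rdx"
negative = "Computers process instructions using their CPU registers."

print(bool(X86_64.search(positive)))                   # True
print(bool(X86_64.search(negative)))                   # False
print(bool(X86_64.search("Check the versions list")))  # False
```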
x86 Assembly Detector
This Response would trigger the Evaluator since it contains mov, cmp, and jnz, which match x86 instruction patterns:
mov ax, bx
cmp ax, 10
jnz label
This Response would not trigger the Evaluator since it does not contain any x86-specific syntax:
Branching logic helps in conditional operations.
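Bare mnemonics such as `mov` or `cmp` are short enough to appear inside English words. The hypothetical `X86` pattern below therefore anchors at the start of a line and requires operand structure. It is an illustrative assumption, not the Evaluator's real pattern.

```python
import re

# Hypothetical x86 pattern, anchored at the start of a line: a two-operand
# instruction such as "mov ax, bx", or a jump mnemonic followed by a label.
X86 = re.compile(
    r"^\s*(mov|cmp|add|sub|xor)\s+\w+\s*,\s*\w+|^\s*(jmp|jnz|jne|jz|je)\s+\w+",
    re.MULTILINE,
)

positive = "mov ax, bx\ncmp ax, 10\njnz label"
negative = "Branching logic helps in conditional operations."

print(bool(X86.search(positive)))                        # True
print(bool(X86.search(negative)))                        # False
print(bool(X86.search("move on, then compare notes")))   # False: "move" is not "mov" + space
```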
C Detector
This Response would trigger the Evaluator since it contains #include, int main(), and fprintf(), which match C syntax:
#include <stdio.h>
int main() {
    fprintf(stdout, "Hello, world!\n");
    return 0;
}
This Response would not trigger the Evaluator since it does not contain any C-specific syntax:
Printing text is useful for debugging programs.
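A minimal sketch of a C check follows. The `C` regex is hypothetical and keys on a `.h` header include, an `int main(` signature, or a `printf`-family call.

```python
import re

# Hypothetical C pattern: a .h header include, an `int main(` signature,
# or a printf/fprintf call.
C = re.compile(r"#include\s*<[\w/]+\.h>|\bint\s+main\s*\(|\bf?printf\s*\(")

positive = '#include <stdio.h>\nint main() {\n    fprintf(stdout, "Hello, world!\\n");\n    return 0;\n}'
negative = "Printing text is useful for debugging programs."

print(bool(C.search(positive)))  # True
print(bool(C.search(negative)))  # False
```

The `.h` suffix gives one hint for telling C apart from C++, whose standard headers (such as `<iostream>`) have no extension. However, `int main(` matches both languages.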
C++ Detector
This Response would trigger the Evaluator since it contains #include <iostream>, a using namespace std directive, and int main(), which match C++ syntax:
#include <iostream>
using namespace std;
int main() {
    cout << "Hello, world!" << endl;
    return 0;
}
This Response would not trigger the Evaluator since it does not contain any C++-specific syntax:
A standard library provides common utilities in programming languages.
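The sketch below shows a hypothetical C++ pattern that deliberately leaves `.h` headers to the C check. Its three alternatives are illustrative assumptions.

```python
import re

# Hypothetical C++ pattern: an extensionless standard header, a
# `using namespace` directive, or a std:: qualified name.
CPP = re.compile(r"#include\s*<\w+>|\busing\s+namespace\s+\w+\s*;|\bstd::\w+")

positive = '#include <iostream>\nusing namespace std;\nint main() {\n    cout << "Hello, world!" << endl;\n    return 0;\n}'
negative = "A standard library provides common utilities in programming languages."

print(bool(CPP.search(positive)))              # True
print(bool(CPP.search(negative)))              # False
print(bool(CPP.search("#include <stdio.h>")))  # False: .h headers are left to the C pattern
```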
C# Detector
This Response would trigger the Evaluator since it contains using System; and namespace MyApp, which match C# syntax:
using System;
namespace MyApp {
    class Program {
        static void Main() {
            Console.WriteLine("Hello, world!");
        }
    }
}
This Response would not trigger the Evaluator since it does not contain any C#-specific syntax:
Namespaces help organize code and avoid name conflicts.
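A minimal sketch of a C# check follows. The `CSHARP` regex is hypothetical. It is case-sensitive, which is why the capitalized prose word "Namespaces" in the negative example does not match.

```python
import re

# Hypothetical C# pattern: a `using System...;` directive at the start of a
# line, a namespace block, or a Console.WriteLine call.
CSHARP = re.compile(
    r"^\s*using\s+System(\.\w+)*\s*;|\bnamespace\s+\w+(\.\w+)*\s*\{|\bConsole\.WriteLine\s*\(",
    re.MULTILINE,
)

positive = 'using System;\nnamespace MyApp {\n    class Program {\n        static void Main() {\n            Console.WriteLine("Hello, world!");\n        }\n    }\n}'
negative = "Namespaces help organize code and avoid name conflicts."

print(bool(CSHARP.search(positive)))  # True
print(bool(CSHARP.search(negative)))  # False
```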
Objective-C Detector
This Response would trigger the Evaluator since it contains @interface, @implementation, and @end, which match Objective-C syntax:
#import <Foundation/Foundation.h>
@interface MyClass : NSObject
- (void)sayHello;
@end
@implementation MyClass
- (void)sayHello {
    NSLog(@"Hello, world!");
}
@end
This Response would not trigger the Evaluator since it does not contain any Objective-C-specific syntax:
Classes and objects help encapsulate functionality in object-oriented programming.
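The sketch below shows how Objective-C's `@`-prefixed directives and string literals could be targeted. The `OBJC` regex is a hypothetical assumption, not the Evaluator's actual pattern.

```python
import re

# Hypothetical Objective-C pattern: @interface/@implementation/@end at the
# start of a line, a framework-style #import, or an @"..." string literal.
OBJC = re.compile(
    r'^\s*@(interface|implementation|end)\b|#import\s*<\w+/\w+\.h>|@"',
    re.MULTILINE,
)

positive = "#import <Foundation/Foundation.h>\n@interface MyClass : NSObject\n- (void)sayHello;\n@end"
negative = "Classes and objects help encapsulate functionality in object-oriented programming."

print(bool(OBJC.search(positive)))  # True
print(bool(OBJC.search(negative)))  # False
```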
Output Descriptions:
Returns a Finding containing Boolean values for each language detected:
{
  "name": "Code Detection Search",
  "matched": [True/False]
}
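A minimal sketch of how per-language matches could be rolled up into a Finding with the `name` and `matched` fields shown above. The `build_finding` helper, its two patterns, and the extra `languages` field are hypothetical assumptions, not the documented schema. Note that JSON serializes Python's `True`/`False` as `true`/`false`.

```python
import json
import re

# Hypothetical patterns; the Evaluator's real pattern library is not shown here.
PATTERNS = {
    "python": re.compile(r"\bdef\s+[A-Za-z_]\w*\s*\("),
    "c": re.compile(r"#include\s*<[\w/]+\.h>|\bint\s+main\s*\("),
}


def build_finding(text: str) -> dict:
    """Run every pattern and report an overall match plus per-language flags."""
    per_language = {lang: bool(p.search(text)) for lang, p in PATTERNS.items()}
    return {
        "name": "Code Detection Search",
        "matched": any(per_language.values()),
        # Hypothetical field: the documented Finding only shows "name" and "matched".
        "languages": per_language,
    }


print(json.dumps(build_finding("int main() { return 0; }")))
# {"name": "Code Detection Search", "matched": true, "languages": {"python": false, "c": true}}
```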
Configuration Options:
- N/A
Data & Dependencies
Data Sources
An 80/20 train/test split of the macrocosm-os/code-parrot-github-code dataset was used to tune the Search Evaluator's patterns.
Benchmarks
The Code Detection - Search has been tested against a benchmark dataset with two different configurations:
| Dataset | Sample Size | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| ThirdLaw Code Benchmark | 51,337 GitHub snippets | 80.3% | 76.2% | 65.4% | 70.4% |
Benchmarks last updated: March 2025
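As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the reported precision and recall. Accuracy cannot be recomputed this way, because it also depends on true negatives, which the table does not report.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision 76.2% and recall 65.4%
print(round(f1_score(0.762, 0.654) * 100, 1))  # 70.4
```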
Usage Examples
Here's how to incorporate the Code Detection - Search Evaluator in your Law:
if CodeDetection-Search in ScopeType then run InterventionType
Security, Compliance & Risk Assessment
Compliance & Privacy:
- EU AI Act - supports EU AI Act compliance by continuously monitoring the output of high-risk AI systems that can generate and run their own code
- NIS Directive - supports cybersecurity by detecting potentially malicious code in Prompts and Responses
- GDPR - supports data protection by flagging code in exchanges so that it can be reviewed for sensitive information
- EAR - helps identify and control potential dual-use code generation
Revision History:
2025-02-18: Initial release
- Initial pattern library for 10 programming languages
- macrocosm-os/code-parrot-github-code benchmark results
- Initial documentation
Documentation Updated: 2025 02 22