
Code Detection

A Search-based Evaluator that rapidly identifies common programming language patterns in text to detect code snippets and determine their likely programming language.

Code Detection - Search

Detailed Description

The Code Detection - Search Evaluator uses a collection of carefully curated regular expressions to identify programming language-specific patterns in both Prompts and Responses. It's designed to quickly detect code snippets and classify them into specific programming languages based on their syntax patterns.
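As a rough illustration of the approach described above, the sketch below maps each language to a compiled regular expression and reports which ones match. The patterns and the `detect_languages` helper are hypothetical simplifications written for this example; they are not the Evaluator's curated expressions.

```python
import re

# Hypothetical, simplified patterns for illustration only; the Evaluator's
# curated expressions are not reproduced here.
PATTERNS = {
    "python": re.compile(r"^\s*def\s+\w+\s*\(.*\)\s*:", re.MULTILINE),
    "rust": re.compile(r"\bfn\s+\w+\s*\(.*\)\s*(->\s*[\w<>&]+\s*)?\{"),
    "c": re.compile(r"#include\s*<\w+\.h>"),
}


def detect_languages(text: str) -> list[str]:
    """Return the names of all languages whose pattern matches the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


print(detect_languages("def factorial(n):\n    return 1"))   # ['python']
print(detect_languages("fn main() {\n    let x = 10;\n}"))    # ['rust']
print(detect_languages("Printing text is useful for debugging programs."))  # []
```

Compiling the patterns once and scanning with `search` keeps each check cheap, which is the main appeal of a regex-based screen.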

For more advanced detection capabilities, especially with obfuscated or modified code, consider using the Code Detection - Semantic Evaluator. The Search and Semantic Evaluators can be used together for comprehensive code detection, with Search providing fast initial screening and Semantic providing deeper analysis.
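One way to picture the two Evaluators working together is a two-stage check: a cheap pattern match runs first, and the slower analysis runs only when the pattern finds nothing. The sketch below is an assumption about how such a pipeline could be wired, not a description of the product's internals; `FAST_PATTERN`, `search_stage`, and `detect_code` are hypothetical names, and the semantic stage is a caller-supplied stand-in.

```python
import re
from typing import Callable

# Hypothetical fast pattern standing in for the Search Evaluator.
FAST_PATTERN = re.compile(r"^\s*(def|fn|func)\s+\w+\s*\(", re.MULTILINE)


def search_stage(text: str) -> bool:
    """Cheap regex screen."""
    return FAST_PATTERN.search(text) is not None


def detect_code(text: str, semantic_stage: Callable[[str], bool]) -> bool:
    """Run the regex screen first; call the slower stage only on a miss."""
    if search_stage(text):
        return True
    return semantic_stage(text)


# Stand-in for a slower semantic check; it always answers False here.
def never_code(text: str) -> bool:
    return False


print(detect_code("def f(x):\n    return x", never_code))  # True
print(detect_code("A plain sentence.", never_code))         # False
```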

It currently detects the following languages:

  • Python
  • Dart
  • WebAssembly
  • Rust
  • x86/x86_64
  • C
  • C++
  • C#
  • Objective-C

Initial search terms adapted from NVIDIA's garak.

Input Descriptions:

The Evaluator accepts the raw text input from both Prompt and Response Events.

Law Usage Example:

This Response would trigger the Evaluator since it contains code patterns:

Triggering Example
Here's a simple Python function to calculate factorial:

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

This Response would not trigger the Evaluator since it contains no code patterns:

Non-Triggering Example
    The lambda of an idea should always be concise, much like how one might map over thoughts in a list of considerations.
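The non-triggering sentence uses words such as "lambda", "map", and "list" in ordinary prose. The sketch below shows why a pattern tied to syntax, rather than to bare keywords, avoids flagging it. Both patterns are hypothetical and written only for this illustration.

```python
import re

prose = ("The lambda of an idea should always be concise, much like how one "
         "might map over thoughts in a list of considerations.")
code = "squares = list(map(lambda n: n * n, [1, 2, 3]))"

# A bare keyword check fires on ordinary prose.
keyword_only = re.compile(r"\b(lambda|map|list)\b")

# A hypothetical syntax-aware pattern: `lambda`, optional parameters, then a colon.
lambda_syntax = re.compile(r"\blambda\b[^:\n]*:")

print(bool(keyword_only.search(prose)))   # True  (false positive)
print(bool(lambda_syntax.search(prose)))  # False
print(bool(lambda_syntax.search(code)))   # True
```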

Python Detector

This Response would trigger the Evaluator since it contains a function definition (def), which matches the regex for detecting Python code:

Triggering Example
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

This Response would not trigger the Evaluator since it lacks any identifiable Python syntax:

Non-Triggering Example
We definitely need to print out our "Hello, World" imports to verify them.

Dart Detector

This Response would trigger the Evaluator since it includes File, a class commonly used in Dart I/O operations:

Triggering Example
import 'dart:io';

void main() {
  File file = File('example.txt');
  print(file.readAsStringSync());
}

This Response would not trigger the Evaluator since it lacks any Dart-specific syntax:

Non-Triggering Example
This is a simple text file for reading and writing.

WebAssembly Detector

This Response would trigger the Evaluator since it contains i32.add and param i32, which match the WebAssembly regex pattern:

Triggering Example
(func $add (param i32 i32) (result i32)
  local.get 0
  local.get 1
  i32.add)

This Response would not trigger the Evaluator since it does not contain any WebAssembly syntax:

Non-Triggering Example
Adding two numbers together is a common operation in programming.

Rust Detector

This Response would trigger the Evaluator since it contains fn main() and let x =, which match Rust syntax:

Triggering Example
fn main() {
    let x = 10;
    println!("Value: {}", x);
}

This Response would not trigger the Evaluator since it does not contain any Rust-specific syntax:

Non-Triggering Example
A variable holds a value and can be changed over time.

x86_64 Assembly Detector

This Response would trigger the Evaluator since it contains register names like rsi and rax, which match the x86_64 pattern:

Triggering Example
mov rax, rsi
add rax, rdx

This Response would not trigger the Evaluator since it does not contain any x86_64-specific syntax:

Non-Triggering Example
Computers process instructions using their CPU registers.

x86 Assembly Detector

This Response would trigger the Evaluator since it contains mov, cmp, and jnz, which match x86 instruction patterns:

Triggering Example
mov ax, bx
cmp ax, 10
jnz label

This Response would not trigger the Evaluator since it does not contain any x86-specific syntax:

Non-Triggering Example
Branching logic helps in conditional operations.

C Detector

This Response would trigger the Evaluator since it contains #include, int main(), and fprintf(), which match C syntax:

Triggering Example
#include <stdio.h>
int main() {
    fprintf(stdout, "Hello, world!\n");
    return 0;
}

This Response would not trigger the Evaluator since it does not contain any C-specific syntax:

Non-Triggering Example
Printing text is useful for debugging programs.

C++ Detector

This Response would trigger the Evaluator since it contains #include, int main(), and std::cout, which match C++ syntax:

Triggering Example
#include <iostream>
int main() {
    std::cout << "Hello, world!" << std::endl;
    return 0;
}

This Response would not trigger the Evaluator since it does not contain any C++-specific syntax:

Non-Triggering Example
A standard library provides common utilities in programming languages.

C# Detector

This Response would trigger the Evaluator since it contains using System; and namespace MyApp, which match C# syntax:

Triggering Example
using System;
namespace MyApp {
    class Program {
        static void Main() {
            Console.WriteLine("Hello, world!");
        }
    }
}

This Response would not trigger the Evaluator since it does not contain any C#-specific syntax:

Non-Triggering Example
Namespaces help organize code and avoid name conflicts.

Objective-C Detector

This Response would trigger the Evaluator since it contains @interface, @implementation, and @end, which match Objective-C syntax:

Triggering Example
#import <Foundation/Foundation.h>
@interface MyClass : NSObject
- (void)sayHello;
@end

@implementation MyClass
- (void)sayHello {
    NSLog(@"Hello, world!");
}
@end

This Response would not trigger the Evaluator since it does not contain any Objective-C-specific syntax:

Non-Triggering Example
Classes and objects help encapsulate functionality in object-oriented programming.

Output Descriptions:

Returns a Finding containing Boolean values for each language detected:

Finding Structure
{
"name": "Code Detection Search",
"matched": [True/False],

}
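A minimal sketch of producing and consuming a Finding of this shape follows. It uses only the two fields shown above; `build_finding` is a hypothetical helper, and any additional fields the real Finding may carry are not documented here. Note that serializing to JSON renders the Boolean as `true`/`false`.

```python
import json


def build_finding(matched: bool) -> dict:
    """Assemble a Finding with the two fields shown in the structure above."""
    return {"name": "Code Detection Search", "matched": matched}


finding = build_finding(True)
print(json.dumps(finding, indent=2))

# Downstream logic can branch on the Boolean result.
if finding["matched"]:
    print("Code detected")
```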

Configuration Options:

  • N/A

Data & Dependencies

Data Sources

An 80/20 train-test split of macrocosm-os/code-parrot-github-code was used to tune the Search Evaluator.
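For readers unfamiliar with the term, an 80/20 split holds out one fifth of the samples for testing. The sketch below shows one reproducible way to do this with the standard library; the `train_test_split` helper, the seed, and the placeholder snippets are assumptions for illustration and do not reflect how the split above was actually produced.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and hold out `test_fraction` of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for real code snippets.
snippets = [f"snippet_{i}" for i in range(10)]
train, test = train_test_split(snippets)
print(len(train), len(test))  # 8 2
```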

Benchmarks

The Code Detection - Search Evaluator has been tested against the following benchmark dataset:

Dataset                  | Sample Size            | Accuracy | Precision | Recall | F1 Score
ThirdLaw Code Benchmark  | 51,337 GitHub snippets | 80.3%    | 76.2%     | 65.4%  | 70.4%

Benchmarks last updated: March 2025
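The F1 score is the harmonic mean of precision and recall, so the reported figures can be cross-checked with a few lines of arithmetic. The `f1_score` helper below is written for this check and is not part of the product.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent here)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the benchmark table.
print(round(f1_score(76.2, 65.4), 1))  # 70.4
```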


Usage Examples

Here's how to incorporate the Code Detection - Search Evaluator in your Law:

ThirdLaw DSL
if CodeDetection-Search in ScopeType then run InterventionType

Security, Compliance & Risk Assessment

Compliance & Privacy:

  • EU AI Act - supports compliance through continuous monitoring of high-risk AI systems that can generate and run their own code.
  • NIS Directive - supports cybersecurity by detecting potentially malicious code before it is executed.
  • GDPR - supports data protection by identifying and flagging sensitive information in code.
  • EAR - helps identify and control potential dual-use code generation.

Revision History:

2025-02-18: Initial release

Documentation Updated: 2025-02-22