Code Detection

A Search-based Evaluator that rapidly identifies common programming language patterns in text to detect code snippets and determine their likely programming language.

Code Detection - Search

Use Case: Unauthorized Code Detection
Analytic Engine: Search
Related OWASP Risks:
- LLM05: Improper Output Handling
- LLM06: Excessive Agency
Related Regulations:
- EU AI Act - Output Safety
- GDPR - Data Protection
- NIS Directive - Cybersecurity
- EAR - Export Control
Valid Inputs: Text
Scope: Full Exchange
Last Update: 2025 02 14
License: Apache 2.0
Dependencies: N/A

Detailed Description

The Code Detection - Search Evaluator uses a collection of carefully curated regular expressions to identify programming language-specific patterns in both Prompts and Responses. It's designed to quickly detect code snippets and classify them into specific programming languages based on their syntax patterns.
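As a rough illustration of the approach described above, the following minimal sketch maps language names to regular expressions and reports every language whose pattern matches. The three patterns and the `detect_languages` helper are hypothetical and far simpler than a curated library; they are not the Evaluator's actual regular expressions.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; the Evaluator's curated
# regular expressions are not reproduced here.
PATTERNS = {
    "python": re.compile(r"^\s*def\s+\w+\s*\(.*\)\s*:", re.MULTILINE),
    "rust": re.compile(r"\bfn\s+\w+\s*\(.*\)\s*(?:->\s*[\w<>&]+\s*)?\{"),
    "c": re.compile(r"^\s*#include\s*<\w+\.h>", re.MULTILINE),
}


def detect_languages(text: str) -> List[str]:
    """Return the names of all languages whose pattern matches the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


print(detect_languages("fn main() { let x = 10; }"))  # ['rust']
```

Each pattern anchors on syntax (a keyword followed by a name and parentheses) rather than on a bare keyword, which is one way to keep ordinary prose from matching.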

For more advanced detection, especially of obfuscated or modified code, consider using the Code Detection - Semantic Evaluator. The Search and Semantic Evaluators can be used together for comprehensive code detection, with Search providing fast initial screening and Semantic providing deeper analysis.
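One possible way to combine the two stages is sketched below: a cheap regex screen runs first, and the slower check runs only when the screen finds nothing. The `FAST_SCREEN` pattern, the `contains_code` function, and the `semantic_check` callable are all assumptions made for this example and do not represent the actual Evaluator interfaces.

```python
import re
from typing import Callable

# Hypothetical fast screen; a stand-in for a regex-based Search stage.
FAST_SCREEN = re.compile(
    r"^\s*(?:def|fn)\s+\w+\s*\(|^\s*#include\s*<", re.MULTILINE
)


def contains_code(text: str, semantic_check: Callable[[str], bool]) -> bool:
    """Run the cheap regex screen first; fall back to the deeper check."""
    if FAST_SCREEN.search(text):
        return True  # fast path: obvious syntax found, skip the slow check
    # Slow path: the caller supplies the deeper analysis (a stand-in here).
    return semantic_check(text)


# Usage with a placeholder semantic check that never flags anything.
print(contains_code("def f(x):\n    return x", lambda text: False))  # True
```

In this arrangement the expensive check runs only on text the regex screen did not flag, which is where obfuscated code would be expected to land.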

It currently detects the following languages:

Python
Dart
WebAssembly
Rust
x86/x86_64
C
C++
C#
Objective-C

Initial search terms adapted from NVIDIA's garak.

Input Descriptions:

The Evaluator accepts the raw text input from both Prompt and Response Events.

Law Usage Example:

This Response would trigger the Evaluator since it contains code patterns:

Triggering Example
Here's a simple Python function to calculate factorial:

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

This Response would not trigger the Evaluator since it contains no code patterns:

Non-Triggering Example

    The lambda of an idea should always be concise, much like how one might map over thoughts in a list of considerations.
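The non-triggering example above contains words such as "lambda" and "map" that are also programming terms. A minimal sketch of why a syntax-aware pattern can ignore them follows; both patterns are hypothetical and are shown only to contrast a bare keyword match with a match that requires surrounding syntax.

```python
import re

# Hypothetical patterns, for illustration only.
BARE_KEYWORD = re.compile(r"\blambda\b")            # the word, anywhere
LAMBDA_SYNTAX = re.compile(r"\blambda\b[\w,\s]*:")  # 'lambda <args>:'

prose = (
    "The lambda of an idea should always be concise, much like how one "
    "might map over thoughts in a list of considerations."
)
code = "squares = list(map(lambda x: x * x, range(5)))"

print(BARE_KEYWORD.search(prose) is not None)   # True: keyword alone is noisy
print(LAMBDA_SYNTAX.search(prose) is not None)  # False: no 'lambda ...:' form
print(LAMBDA_SYNTAX.search(code) is not None)   # True: real lambda expression
```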

Python Detector

This Response would trigger the Evaluator since it contains a function definition (def), which matches the regex for detecting Python code:

Triggering Example
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

This Response would not trigger the Evaluator since it lacks any identifiable Python syntax:

Non-Triggering Example

We definitely need to print out our "Hello, World" imports to verify them.

Dart Detector

This Response would trigger the Evaluator since it includes File, a class commonly used in Dart I/O operations:

Triggering Example
import 'dart:io';
File file = File('example.txt');
print(file.readAsStringSync());

This Response would not trigger the Evaluator since it lacks any Dart-specific syntax:

Non-Triggering Example

This is a simple text file for reading and writing.

WebAssembly Detector

This Response would trigger the Evaluator since it contains i32.add and param i32, which match the WebAssembly regex pattern:

Triggering Example
(func $add (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add)

This Response would not trigger the Evaluator since it does not contain any WebAssembly syntax:

Non-Triggering Example

Adding two numbers together is a common operation in programming.

Rust Detector

This Response would trigger the Evaluator since it contains fn main() and let x =, which match Rust syntax:

Triggering Example
fn main() {
    let x = 10;
    println!("Value: {}", x);
}

This Response would not trigger the Evaluator since it does not contain any Rust-specific syntax:

Non-Triggering Example

A variable holds a value and can be changed over time.

x86_64 Assembly Detector

This Response would trigger the Evaluator since it contains register names like rsi and rax, which match the x86_64 pattern:

Triggering Example

mov rax, rsi
add rax, rdx

This Response would not trigger the Evaluator since it does not contain any x86_64-specific syntax:

Non-Triggering Example

Computers process instructions using their CPU registers.

x86 Assembly Detector

This Response would trigger the Evaluator since it contains mov, cmp, and jnz, which match x86 instruction patterns:

Triggering Example
mov ax, bx
cmp ax, 10
jnz label

This Response would not trigger the Evaluator since it does not contain any x86-specific syntax:

Non-Triggering Example

Branching logic helps in conditional operations.
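The two assembly detectors above key on different signals: 64-bit register names for x86_64 and instruction mnemonics for x86. A minimal sketch of that distinction is shown below, assuming a small hypothetical register list and mnemonic set; the real patterns are likely broader.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
X86_64_REGISTERS = re.compile(r"\b(?:rax|rbx|rcx|rdx|rsi|rdi|rsp|rbp)\b")
X86_INSTRUCTION = re.compile(
    r"^\s*(?:mov|cmp|add)\s+\w+\s*,\s*\w+|^\s*jnz\s+\w+", re.MULTILINE
)


def classify_assembly(text: str) -> Optional[str]:
    """Return 'x86_64', 'x86', or None for text with no assembly match."""
    if X86_64_REGISTERS.search(text):
        return "x86_64"  # 64-bit register names are the stronger signal
    if X86_INSTRUCTION.search(text):
        return "x86"
    return None


print(classify_assembly("mov rax, rsi\nadd rax, rdx"))       # x86_64
print(classify_assembly("mov ax, bx\ncmp ax, 10\njnz label"))  # x86
```

Checking registers first is a design choice made for this sketch: 64-bit code also uses `mov` and `add`, so testing mnemonics first would label it as plain x86.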

C Detector

This Response would trigger the Evaluator since it contains #include, int main(), and fprintf(), which match C syntax:

Triggering Example
#include <stdio.h>
int main() {
    fprintf(stdout, "Hello, world!\n");
    return 0;
}

This Response would not trigger the Evaluator since it does not contain any C-specific syntax:

Non-Triggering Example

Printing text is useful for debugging programs.

C++ Detector

This Response would trigger the Evaluator since it contains #include <iostream>, using namespace std;, and cout <<, which match C++ syntax:

Triggering Example
#include <iostream>
using namespace std;
int main() {
    cout << "Hello, world!" << endl;
    return 0;
}

This Response would not trigger the Evaluator since it does not contain any C++-specific syntax:

Non-Triggering Example

A standard library provides common utilities in programming languages.

C# Detector

This Response would trigger the Evaluator since it contains using System; and namespace MyApp, which match C# syntax:

Triggering Example
using System;
namespace MyApp {
    class Program {
        static void Main() {
            Console.WriteLine("Hello, world!");
        }
    }
}

This Response would not trigger the Evaluator since it does not contain any C#-specific syntax:

Non-Triggering Example

Namespaces help organize code and avoid name conflicts.

Objective-C Detector

This Response would trigger the Evaluator since it contains @interface, @implementation, and @end, which match Objective-C syntax:

Triggering Example
#import <Foundation/Foundation.h>
@interface MyClass : NSObject
- (void)sayHello;
@end

@implementation MyClass
- (void)sayHello {
    NSLog(@"Hello, world!");
}
@end

This Response would not trigger the Evaluator since it does not contain any Objective-C-specific syntax:

Non-Triggering Example

Classes and objects help encapsulate functionality in object-oriented programming.

Output Descriptions:

Returns a Finding containing Boolean values for each language detected:

Finding Structure
{
    "name": "Code Detection Search",
    "matched": [True/False],

}
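As a sketch of how a result in this shape might be produced, the example below builds a dictionary with the "name" and "matched" fields shown above and serializes it as JSON. The `make_finding` helper and its single Python pattern are hypothetical; any further fields the real Finding may carry are not shown.

```python
import json
import re

# Hypothetical single-language pattern, for illustration only.
PYTHON_DEF = re.compile(r"^\s*def\s+\w+\s*\(", re.MULTILINE)


def make_finding(text: str) -> dict:
    """Build a Finding-like dict using only the two fields shown above."""
    return {
        "name": "Code Detection Search",
        "matched": PYTHON_DEF.search(text) is not None,
    }


print(json.dumps(make_finding("def f():\n    pass")))
# {"name": "Code Detection Search", "matched": true}
```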

Configuration Options:

Data & Dependencies

Data Sources

An 80/20 training-test split of macrocosm-os/code-parrot-github-code was used to tune the Search Evaluator's patterns.
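For readers unfamiliar with the term, an 80/20 split can be sketched as follows. The `split_80_20` helper, the fixed seed, and the shuffling strategy are assumptions for this example; how the actual split was drawn is not documented here.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(samples: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the samples and split it 80% / 20%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train, test = split_80_20(range(100))
print(len(train), len(test))  # 80 20
```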

Benchmarks

The Code Detection - Search Evaluator has been tested against a benchmark dataset with two different configurations:

Dataset	Sample Size	Accuracy	Precision	Recall	F1 Score
ThirdLaw Code Benchmark	51,337 GitHub snippets	80.3%	76.2%	65.4%	70.4%

Benchmarks last updated: March 2025
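The F1 Score in the table is the harmonic mean of Precision and Recall. The short check below recomputes it from the reported 76.2% and 65.4%; the `f1_score` helper is written for this example only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and Recall as reported in the benchmark table.
f1 = f1_score(0.762, 0.654)
print(f"{f1:.1%}")  # 70.4%
```

The recomputed value agrees with the 70.4% reported in the table.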

Usage Examples

Here's how to incorporate the Code Detection - Search Evaluator in your Law:

ThirdLaw DSL
if CodeDetection-Search in ScopeType then run InterventionType

Security, Compliance & Risk Assessment

Compliance & Privacy:

EU AI Act - supports compliance through continuous monitoring of high-risk AI systems that can generate and run their own code.
NIS Directive - supports cybersecurity by detecting potentially malicious code execution.
GDPR - supports data protection by identifying and flagging sensitive information in code.
EAR - helps identify and control potential dual-use code generation.

Revision History:

2025-02-18: Initial release

Initial pattern library for 10 programming languages
macrocosm-os/code-parrot-github-code benchmark results
Initial documentation

Documentation Updated: 2025 02 22
