Semantic Similarity
The Semantic Similarity Analytic Engine analyzes the meaning of input content to identify thematic relationships, conceptual similarities, and content patterns by comparing it against pre-defined reference collections.
- Use Case: Identify conceptual and thematic relationships in content
- Technology: Vector-based semantic comparison
- Valid Inputs: Text / Image / Audio / Video / Documents in Exchanges
- Available Evaluators: Code Detection
- Last Engine Update: 2025-03-03
- Dependencies: Vector database
Detailed Description
The Semantic Similarity Analytic Engine uses vector embeddings to detect higher-level semantic relationships and thematic similarities between input content and reference collections. Unlike the Search engine, which focuses on direct similarity matches, this engine specializes in broader semantic connections and conceptual relationships. All vector embeddings and databases used by this engine are maintained within the ThirdLaw VPC to ensure data security and privacy.
How It Works
The Semantic Similarity Analytic Engine processes content by analyzing vector embeddings to identify thematic and conceptual similarities. When an Evaluator that uses this engine is initialized, the engine connects to a vector database containing pre-computed embeddings of one or more reference collections. Each collection typically represents a category of content, such as code samples, legal documents, or controversial topics.
When processing input content, the engine first converts the input into vector embeddings that are compatible with the reference collections (that is, in the same embedding space). It then compares these embeddings against multiple reference collections simultaneously using vector similarity operations and calculates a similarity score between the input and each collection. Because the comparison operates on embeddings, the scores reflect semantic proximity and conceptual overlap rather than exact text matches.
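The comparison step can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the metric and a brute-force scan standing in for the vector database query; the function names and the in-memory `reference` mapping are hypothetical, not the engine's API. Only the `top_n` nearest reference vectors are kept, and each collection's score is the highest similarity among its retained vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_collections(query_vec, reference, top_n=5):
    """Return {collection_name: max similarity} over the top_n nearest vectors.

    `reference` is a hypothetical stand-in for the vector database: it maps
    each collection name to a list of pre-computed embeddings.
    """
    # Score every reference vector and remember which collection it came from.
    scored = [
        (cosine(query_vec, vec), name)
        for name, vectors in reference.items()
        for vec in vectors
    ]
    # Keep only the top_n nearest neighbours, as a vector-DB query would.
    scored.sort(reverse=True)
    best = {}
    for score, name in scored[:top_n]:
        best[name] = max(score, best.get(name, float("-inf")))
    return best


reference = {"code": [[1.0, 0.0], [0.9, 0.1]], "legal": [[0.0, 1.0]]}
print(query_collections([1.0, 0.0], reference, top_n=2))  # {'code': 1.0}
```

With `top_n=2`, both retained neighbours belong to `code`, so `legal` does not appear in the result; raising `top_n` to 3 lets it in with a score of 0.0.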
After comparison, the engine ranks the collections by their semantic proximity to the input, identifying which reference collections are most similar to the input content. The engine also calculates average similarity scores across all segments of the input, so it can detect semantic relationships even when only portions of the content match a reference collection. This makes the Semantic Similarity Analytic Engine particularly effective at identifying conceptual similarities that pattern-based approaches, such as keyword or regular-expression matching, might miss.
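A minimal sketch of the ranking step, assuming the input has already been split into segments and each segment has per-collection similarity scores (for example, from a nearest-neighbour query). The `rank_collections` name, the segment shape, and the rule of counting a collection that a segment did not retrieve as 0.0 are assumptions for illustration, not documented engine behaviour.

```python
def rank_collections(segment_scores, collections):
    """Average each collection's score over all segments and rank them.

    segment_scores: list of {collection_name: similarity} dicts, one per
    input segment (hypothetical shape). A collection missing from a
    segment's dict counts as 0.0 for that segment.
    """
    if not segment_scores:
        return []
    averages = {
        name: sum(s.get(name, 0.0) for s in segment_scores) / len(segment_scores)
        for name in collections
    }
    # Highest average similarity first; ties broken alphabetically by name.
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))


segments = [{"code": 0.75, "legal": 0.25}, {"code": 0.25}]
print(rank_collections(segments, ["code", "legal"]))  # [('code', 0.5), ('legal', 0.125)]
```

Counting absent collections as zero means a collection that matches only part of the input still receives a non-zero average, just a lower one than a collection that matches throughout.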
Configuration Options
The Semantic Similarity Analytic Engine supports the following configuration parameters:
| Parameter | Description | Default |
|---|---|---|
| collections | List of reference collections to compare the input against | Required |
| desired_collections | List of collections that should be closest to the input; the Finding reports found = True when the closest collection is one of these | Required |
| top_n | Number of top results to retrieve per query | 5 |
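The parameters above could be mirrored in code as a small validated configuration object. This is a hypothetical sketch: the class name and its checks (collections must be non-empty, every desired collection must also appear in collections, top_n must be at least 1) are assumptions, not documented engine rules. Only the parameter names and the top_n default of 5 come from the table.

```python
from dataclasses import dataclass


@dataclass
class SemanticSimilarityConfig:
    """Hypothetical in-code mirror of the engine's configuration parameters."""

    collections: list[str]          # required
    desired_collections: list[str]  # required
    top_n: int = 5                  # documented default

    def __post_init__(self):
        if not self.collections:
            raise ValueError("collections must not be empty")
        # Assumption: a desired collection must also be one that is analysed.
        missing = set(self.desired_collections) - set(self.collections)
        if missing:
            raise ValueError(f"desired_collections not in collections: {sorted(missing)}")
        if self.top_n < 1:
            raise ValueError("top_n must be at least 1")


config = SemanticSimilarityConfig(collections=["code", "legal"], desired_collections=["code"])
print(config.top_n)  # 5
```

Validating at construction time surfaces a misconfigured Evaluator immediately instead of at the first query.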
Finding Structure
A generic Evaluator based on the Semantic Similarity Analytic Engine returns a Finding with the following structure. The last field is repeated once for each collection defined in the Evaluator, with collection_name replaced by that collection's name.
```
{
  "EvaluatorName-Semantic": True/False,  # Default Finding (duplicate of finding.any)
  "EvaluatorName-Semantic.found": True/False,  # True if the closest collection is one of the desired collections
  "EvaluatorName-Semantic.closest_collection": "string",  # Name of the closest collection
  "EvaluatorName-Semantic.closest_collection.max_similarity": [0-1],  # Maximum similarity score of the closest collection
  "EvaluatorName-Semantic.collection_name.max_similarity": [0-1],  # Maximum similarity score achieved by that collection
}
```
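To make the key layout concrete, the sketch below assembles a Finding from per-collection maximum similarity scores. It assumes the closest collection is the one with the highest maximum similarity and that the default key carries the same value as .found; the `build_finding` name and its arguments are hypothetical.

```python
def build_finding(evaluator_name, max_scores, desired_collections):
    """Assemble a Finding dict following the documented key layout.

    max_scores: {collection_name: maximum similarity in [0, 1]} (must be non-empty).
    Assumptions: the closest collection has the highest maximum similarity,
    and the default key mirrors `.found`.
    """
    prefix = f"{evaluator_name}-Semantic"
    closest = max(max_scores, key=max_scores.get)
    found = closest in desired_collections
    finding = {
        prefix: found,
        f"{prefix}.found": found,
        f"{prefix}.closest_collection": closest,
        f"{prefix}.closest_collection.max_similarity": max_scores[closest],
    }
    # One max_similarity entry per defined collection.
    for name, score in max_scores.items():
        finding[f"{prefix}.{name}.max_similarity"] = score
    return finding


finding = build_finding("CodeDetection", {"code": 0.91, "legal": 0.4}, ["code"])
print(finding["CodeDetection-Semantic.found"])  # True
```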
Available Evaluators
The following table lists common Evaluators that can be created using the Semantic Similarity Analytic Engine:
| Evaluator Name | Description | Common Use Cases |
|---|---|---|
| Code Detection | Detects code patterns and programming language constructs in text | Unauthorized code detection, vulnerability analysis |
Dependencies
- Vector Database: Stores and searches the reference collection embeddings
- Embedding Pipeline: Requires pre-computed embeddings from a Search Engine or similar processor
Revision History
- 2025-03-03: Initial documentation creation