DataGyro Evaluate
DataGyro’s Evaluate vertical provides a robust solution for assessing the performance of your Information Retrieval (IR) systems. It mirrors the industry-standard TREC (Text REtrieval Conference) evaluation process, offering a streamlined way to measure and improve your system’s effectiveness. Evaluate is currently a fully separate system from Search and is composed of four main components:

- Datasets: Upload your data to automatically generate queries and relevance judgments (qrels).
- Benchmarks: Run your custom IR algorithm against the generated queries and get comprehensive metric scores.
- Explore: Inspect the generated queries and qrels to understand the evaluation data.
- Compare: Compare your algorithm’s results against standard baselines like BM25.
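TREC-style evaluations conventionally exchange relevance judgments as whitespace-separated qrels files, one judgment per line (`query_id iteration doc_id relevance`). As a minimal sketch, here is how such lines can be parsed into a lookup structure; the sample data is illustrative, not actual DataGyro output:

```python
# Parse TREC-style qrels lines: "query_id iteration doc_id relevance".
# The sample below is illustrative data, not DataGyro output.
from collections import defaultdict

SAMPLE_QRELS = """\
q1 0 doc3 2
q1 0 doc7 1
q2 0 doc1 1
"""

def parse_qrels(text):
    """Return {query_id: {doc_id: relevance}} from TREC qrels text."""
    qrels = defaultdict(dict)
    for line in text.strip().splitlines():
        qid, _iteration, doc_id, rel = line.split()
        qrels[qid][doc_id] = int(rel)
    return dict(qrels)

qrels = parse_qrels(SAMPLE_QRELS)
print(qrels["q1"])  # {'doc3': 2, 'doc7': 1}
```

Graded relevance values (0, 1, 2, …) allow both binary metrics (Precision, Recall, MAP) and graded metrics (nDCG) to be computed from the same judgments.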
Core Evaluation Process
The Evaluate vertical allows you to:

- Automate TREC-Style Evaluation: DataGyro automates the creation of test queries and relevance judgments (qrels) directly from your datasets. This eliminates the manual effort typically associated with preparing evaluation sets.
- Run Against Your IR Algorithm: You can then run these automatically generated queries and qrels against your own Information Retrieval algorithm.
- Generate Comprehensive TREC Reports: After the evaluation run, DataGyro generates a detailed TREC report. This report includes a suite of standard IR evaluation metrics, such as:
  - nDCG (normalized Discounted Cumulative Gain): measures the ranking quality of your results.
  - Recall: indicates the proportion of relevant documents retrieved.
  - MAP (Mean Average Precision): provides an overall measure of precision across different recall levels.
  - Precision: shows the proportion of retrieved documents that are relevant.
  - Other key metrics that give you a holistic view of your system’s performance.
- Systematically Improve Performance: By leveraging these insights, you can systematically measure, benchmark, and improve the effectiveness of your IR system.
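The metrics above have standard textbook definitions. As a minimal sketch for a single query (DataGyro’s report computes these, and more, across all generated queries; the ranking and qrels here are hypothetical):

```python
# Standard single-query implementations of the metrics listed above.
# The ranking and relevance judgments are illustrative examples.
import math

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def recall(ranking, relevant):
    """Fraction of all relevant docs that were retrieved."""
    return sum(1 for d in ranking if d in relevant) / len(relevant)

def average_precision(ranking, relevant):
    """Mean of precision values at each rank holding a relevant doc."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

def ndcg(ranking, rel_grades, k):
    """Normalized discounted cumulative gain at cutoff k."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [rel_grades.get(d, 0) for d in ranking[:k]]
    ideal = sorted(rel_grades.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if ideal else 0.0

rel_grades = {"doc3": 2, "doc7": 1}         # graded qrels for one query
relevant = set(rel_grades)
ranking = ["doc3", "doc1", "doc7", "doc9"]  # your algorithm's output

print(precision_at_k(ranking, relevant, 4))  # 0.5
print(recall(ranking, relevant))             # 1.0
print(average_precision(ranking, relevant))  # (1/1 + 2/3) / 2 ≈ 0.833
print(ndcg(ranking, rel_grades, 4))
```

MAP is then simply `average_precision` averaged over all queries in the evaluation set.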
How Evaluate Works (Conceptual)
- Upload Data (Datasets): Provide the dataset you want to use for evaluating your IR system. DataGyro processes this to create test queries and relevance judgments.
- Run Benchmark (Benchmarks): Configure and run an evaluation using your IR algorithm’s endpoint against the generated queries.
- Analyze Report: Review the comprehensive TREC report generated by DataGyro to understand your system’s performance.
- Deep Dive (Explore, Compare): Further inspect the evaluation data and compare results to baselines.
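The Run Benchmark step can be sketched end to end: each generated query is sent to your retrieval algorithm, and the ranked results are written out as TREC-format run lines (`query_id Q0 doc_id rank score run_tag`), which is the shape of file that TREC-style scorers consume alongside the qrels. The `my_ir_algorithm` stub below is a hypothetical stand-in for your real endpoint; all names and data are illustrative:

```python
# Sketch of the Benchmarks step: run each generated query through a
# retrieval function and emit TREC-format run lines. `my_ir_algorithm`
# is a hypothetical stand-in for your real IR endpoint.

def my_ir_algorithm(query):
    """Stub: return (doc_id, score) pairs ranked by descending score."""
    fake_index = {
        "apple pie": [("doc3", 9.1), ("doc7", 4.2)],
        "bread recipe": [("doc1", 7.7)],
    }
    return fake_index.get(query, [])

def build_run_lines(queries, run_tag="my_algo"):
    """Format results as TREC run lines: qid Q0 docid rank score tag."""
    lines = []
    for qid, query in queries:
        for rank, (doc_id, score) in enumerate(my_ir_algorithm(query), 1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score} {run_tag}")
    return lines

queries = [("q1", "apple pie"), ("q2", "bread recipe")]
for line in build_run_lines(queries):
    print(line)
# q1 Q0 doc3 1 9.1 my_algo
# q1 Q0 doc7 2 4.2 my_algo
# q2 Q0 doc1 1 7.7 my_algo
```

Scoring these run lines against the generated qrels yields the metric report described above; the Compare component repeats the same scoring with a baseline run (e.g. BM25) so the two reports can be viewed side by side.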