Evaluate: Benchmarks
The “Benchmarks” component of DataGyro’s Evaluate vertical allows you to test your custom Information Retrieval (IR) algorithm against the queries generated from your datasets and receive a comprehensive performance report.
Setting Up a Benchmark
To run a benchmark, you will need:
- A “Ready” Dataset: This means you have already uploaded your data in the Datasets section, and DataGyro has successfully generated queries and qrels from it.
- Your IR Algorithm’s URL: DataGyro needs an accessible HTTP endpoint for your IR algorithm. This endpoint should accept a query and return a ranked list of document IDs (or full results). A minimal sketch of such an endpoint follows this list.
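The exact request and response contract is defined by DataGyro's benchmark settings and is not reproduced here. The sketch below is only a hypothetical illustration of an endpoint of this kind, assuming a JSON request with `query` and `top_k` fields, a `/search` route, and a JSON response containing ranked document IDs with scores; all of these names are assumptions for illustration, not DataGyro's actual schema.

```python
# Hypothetical IR endpoint sketch. The route, field names, and response shape
# are assumptions for illustration; check DataGyro's benchmark settings for
# the exact contract it expects.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Toy in-memory "index": doc_id -> document text.
DOCS = {
    "doc1": "how to evaluate information retrieval systems",
    "doc2": "a gentle introduction to gradient descent",
    "doc3": "trec style evaluation with qrels and runs",
}

@app.route("/search", methods=["POST"])
def search():
    payload = request.get_json(force=True)
    query = payload.get("query", "")
    top_k = int(payload.get("top_k", 10))

    # Trivial term-overlap scoring; replace with your real IR algorithm.
    terms = set(query.lower().split())
    scored = [
        (doc_id, len(terms & set(text.lower().split())))
        for doc_id, text in DOCS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    # Return a ranked list of document IDs with scores.
    return jsonify({
        "results": [
            {"doc_id": doc_id, "score": score}
            for doc_id, score in scored[:top_k]
        ]
    })

if __name__ == "__main__":
    app.run(port=8080)
```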
Running the Benchmark
- Navigate to the “Evaluate” section and select the “Benchmarks” tab.
- Initiate a new benchmark run.
- Select the “Ready” dataset you want to use.
- Provide the URL for your custom IR algorithm.
- Configure any additional parameters for the benchmark run (e.g., number of results to retrieve per query).
- Start the benchmark.
DataGyro will then systematically send the generated queries (from your chosen dataset) to your IR algorithm’s URL.
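Before starting a run, it can help to smoke-test your endpoint with the kind of request you expect it to receive. A minimal sketch, assuming the hypothetical `/search` route and JSON fields from the endpoint example above (not DataGyro's actual request format):

```python
# Hypothetical smoke test for the endpoint sketched earlier; the URL, payload
# fields, and response shape are assumptions, not DataGyro's actual contract.
import requests

resp = requests.post(
    "http://localhost:8080/search",
    json={"query": "trec evaluation", "top_k": 5},
    timeout=10,
)
resp.raise_for_status()

for rank, hit in enumerate(resp.json()["results"], start=1):
    print(rank, hit["doc_id"], hit["score"])
```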
Scoring and Metrics Generation
As your algorithm returns results for each query, DataGyro compares these results against the pre-generated qrels (relevance judgments) for that dataset.
Based on this comparison, DataGyro calculates a wide range of standard TREC evaluation metrics (a toy scoring example follows the list), including but not limited to:
- nDCG (normalized Discounted Cumulative Gain)
- Recall
- MAP (Mean Average Precision)
- Precision
- Other metrics such as Mean Reciprocal Rank (MRR) and cutoff variants such as Precision@k and Recall@k
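To make the scoring step concrete, the sketch below scores one query's ranked results against made-up qrels and computes a few of these metrics by hand. The qrels and run data are invented for illustration; DataGyro performs this comparison automatically across every query in the dataset.

```python
# Toy illustration of scoring one query's ranked results against qrels.
# The qrels and ranked list below are made up for illustration only.
import math

# qrels for one query: doc_id -> graded relevance (0 = not relevant).
qrels = {"doc3": 2, "doc7": 1, "doc9": 1}

# Ranked doc IDs returned by the IR algorithm for the same query.
ranked = ["doc3", "doc1", "doc7", "doc5", "doc9"]

k = 5
relevant_retrieved = [d for d in ranked[:k] if qrels.get(d, 0) > 0]
total_relevant = sum(1 for rel in qrels.values() if rel > 0)

precision_at_k = len(relevant_retrieved) / k
recall_at_k = len(relevant_retrieved) / total_relevant

# Reciprocal rank: 1 / rank of the first relevant result (0 if none found).
rr = 0.0
for rank, doc_id in enumerate(ranked, start=1):
    if qrels.get(doc_id, 0) > 0:
        rr = 1.0 / rank
        break

# nDCG@k: discounted cumulative gain normalized by the ideal ordering.
# Linear gain is used here; some tools use 2**rel - 1 instead.
def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

gains = [qrels.get(d, 0) for d in ranked[:k]]
ideal = sorted(qrels.values(), reverse=True)[:k]
ndcg_at_k = dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

print(f"P@{k}={precision_at_k:.2f}  R@{k}={recall_at_k:.2f}  "
      f"RR={rr:.2f}  nDCG@{k}={ndcg_at_k:.2f}")
```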
TREC Report
Once the benchmark run is complete, DataGyro generates a comprehensive TREC report. This report presents all the calculated metric scores, allowing you to:
- Understand the overall performance of your IR algorithm.
- Identify strengths and weaknesses across different metrics.
- Track improvements over iterations of your algorithm.
This report is crucial for data-driven development and optimization of your IR system.
After running a benchmark, you might want to Explore the queries and qrels or Compare your results.