Evaluate: Benchmarks
The “Benchmarks” component of DataGyro’s Evaluate vertical allows you to test your custom Information Retrieval (IR) algorithm against the queries generated from your datasets and receive a comprehensive performance report.
Setting Up a Benchmark
To run a benchmark, you will need:
- A “Ready” Dataset: This means you have already uploaded your data in the Datasets section, and DataGyro has successfully generated queries and qrels from it.
- Your IR Algorithm’s URL: DataGyro needs an accessible HTTP endpoint for your IR algorithm. This endpoint should accept a query and return a ranked list of document IDs or results (a minimal sketch of such an endpoint follows this list).
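The exact request and response schema DataGyro uses is not described here, so the sketch below is only an illustration of the general shape of such an endpoint: it assumes a JSON POST body with `query` and `top_k` fields, a `/search` path, and a `results` list of document IDs in the response. The toy corpus and term-overlap scoring stand in for your real retrieval logic.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy in-memory corpus standing in for your real index; replace the scoring
# below with your own retrieval logic.
CORPUS = {
    "doc1": "neural ranking models for information retrieval",
    "doc2": "classic bm25 term weighting",
    "doc3": "dense passage retrieval with transformers",
}

def rank_documents(query: str, top_k: int) -> list[str]:
    # Naive term-overlap scoring, highest score first.
    q_terms = set(query.lower().split())
    scored = [
        (doc_id, len(q_terms & set(text.lower().split())))
        for doc_id, text in CORPUS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

@app.route("/search", methods=["POST"])
def search():
    payload = request.get_json(force=True)
    query = payload["query"]
    top_k = int(payload.get("top_k", 10))
    # Return document IDs ordered from most to least relevant.
    return jsonify({"results": rank_documents(query, top_k)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```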
Running the Benchmark
- Navigate to the “Evaluate” section and select the “Benchmarks” tab.
- Initiate a new benchmark run.
- Select the “Ready” dataset you want to use.
- Provide the URL for your custom IR algorithm.
- Configure any additional parameters for the benchmark run (e.g., number of results to retrieve per query).
- Start the benchmark.
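Before starting a run, it can be useful to sanity-check the endpoint URL you provide: send one sample query and confirm a ranked list comes back. The URL, field names, and response shape below mirror the sketch above and are assumptions, not a DataGyro requirement.

```python
import requests

ENDPOINT_URL = "http://localhost:8080/search"  # your IR algorithm's URL

response = requests.post(
    ENDPOINT_URL,
    json={"query": "dense passage retrieval", "top_k": 5},
    timeout=10,
)
response.raise_for_status()
ranked_ids = response.json()["results"]

assert isinstance(ranked_ids, list) and ranked_ids, "expected a non-empty ranked list"
print("Endpoint looks healthy; top results:", ranked_ids)
```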
Scoring and Metrics Generation
As your algorithm returns results for each query, DataGyro compares those results against the pre-generated qrels (relevance judgments) for that dataset. Based on this comparison, DataGyro calculates a wide range of standard TREC evaluation metrics (a worked single-query example follows this list), including but not limited to:
- nDCG (normalized Discounted Cumulative Gain)
- Recall
- MAP (Mean Average Precision)
- Precision
- Other relevant metrics like Mean Reciprocal Rank (MRR), Precision@k, Recall@k, etc.
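To make the scoring concrete, here is a small worked example for a single query, using toy qrels and a toy ranking (not real benchmark data). It computes Precision@k, Recall@k, the reciprocal rank used by MRR, the per-query Average Precision that MAP averages over all queries, and nDCG@k.

```python
import math

# Toy data for one query: qrels map relevant doc IDs to relevance grades,
# and "ranking" is the ordered list your algorithm returned for that query.
qrels = {"d1": 1, "d3": 1, "d5": 1}
ranking = ["d3", "d2", "d1", "d4", "d5"]
k = 5

retrieved_k = ranking[:k]
num_relevant_retrieved = sum(1 for d in retrieved_k if qrels.get(d, 0) > 0)

precision_at_k = num_relevant_retrieved / k
recall_at_k = num_relevant_retrieved / len(qrels)

# Reciprocal rank of the first relevant result (MRR averages this over queries).
rr = next((1 / (i + 1) for i, d in enumerate(ranking) if qrels.get(d, 0) > 0), 0.0)

# Average Precision: mean of precision values at each relevant hit
# (MAP is the mean of this value over all queries).
hits, precisions = 0, []
for i, d in enumerate(ranking):
    if qrels.get(d, 0) > 0:
        hits += 1
        precisions.append(hits / (i + 1))
average_precision = sum(precisions) / len(qrels)

# nDCG@k: DCG of the ranking divided by the DCG of an ideal ordering.
def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

dcg_k = dcg([qrels.get(d, 0) for d in retrieved_k])
ideal_dcg_k = dcg(sorted(qrels.values(), reverse=True)[:k])
ndcg_at_k = dcg_k / ideal_dcg_k

print(f"P@{k}={precision_at_k:.3f} R@{k}={recall_at_k:.3f} "
      f"MRR={rr:.3f} AP={average_precision:.3f} nDCG@{k}={ndcg_at_k:.3f}")
```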
TREC Report
Once the benchmark run is complete, DataGyro generates a comprehensive TREC report. This report presents all the calculated metric scores, allowing you to:
- Understand the overall performance of your IR algorithm.
- Identify strengths and weaknesses across different metrics.
- Track improvements over iterations of your algorithm.
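One simple way to track improvements over iterations is to place the headline scores from two reports side by side and look at the deltas. A minimal sketch follows; the metric names and values are placeholders you would replace with numbers taken from your own reports.

```python
# Compare headline scores from two benchmark reports (placeholder values).
baseline = {"nDCG@10": 0.412, "MAP": 0.305, "Recall@100": 0.671, "MRR": 0.498}
candidate = {"nDCG@10": 0.447, "MAP": 0.318, "Recall@100": 0.660, "MRR": 0.521}

for metric in baseline:
    delta = candidate[metric] - baseline[metric]
    direction = "improved" if delta > 0 else "regressed" if delta < 0 else "unchanged"
    print(f"{metric:>11}: {baseline[metric]:.3f} -> {candidate[metric]:.3f} "
          f"({delta:+.3f}, {direction})")
```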