Evaluate: Benchmarks
The “Benchmarks” component of DataGyro’s Evaluate vertical allows you to test your custom Information Retrieval (IR) algorithm against the queries generated from your datasets and receive a comprehensive performance report.
Setting Up a Benchmark
To run a benchmark, you will need:
- A “Ready” Dataset: This means you have already uploaded your data in the Datasets section, and DataGyro has successfully generated queries and qrels from it.
- Your IR Algorithm’s URL: DataGyro needs an accessible HTTP endpoint for your IR algorithm. This endpoint should accept a query and return a ranked list of document IDs (or full results). A minimal sketch of such an endpoint follows this list.
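The exact request and response contract is defined by DataGyro's benchmark settings and is not reproduced here. The sketch below is only a hypothetical illustration of an endpoint of this kind, assuming a JSON request with `query` and `top_k` fields, a `/search` route, and a JSON response containing ranked document IDs with scores; all of these names are assumptions for illustration, not DataGyro's actual schema.

```python
# Hypothetical IR endpoint sketch. The route, field names, and response shape
# are assumptions for illustration; check DataGyro's benchmark settings for
# the exact contract it expects.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Toy in-memory "index": doc_id -> document text.
DOCS = {
    "doc1": "how to evaluate information retrieval systems",
    "doc2": "a gentle introduction to gradient descent",
    "doc3": "trec style evaluation with qrels and runs",
}

@app.route("/search", methods=["POST"])
def search():
    payload = request.get_json(force=True)
    query = payload.get("query", "")
    top_k = int(payload.get("top_k", 10))

    # Trivial term-overlap scoring; replace with your real IR algorithm.
    terms = set(query.lower().split())
    scored = [
        (doc_id, len(terms & set(text.lower().split())))
        for doc_id, text in DOCS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    # Return a ranked list of document IDs with scores.
    return jsonify({
        "results": [
            {"doc_id": doc_id, "score": score}
            for doc_id, score in scored[:top_k]
        ]
    })

if __name__ == "__main__":
    app.run(port=8080)
```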
Running the Benchmark
- Navigate to the “Evaluate” section and select the “Benchmarks” tab.
- Initiate a new benchmark run.
- Select the “Ready” dataset you want to use.
- Provide the URL for your custom IR algorithm.
- Configure any additional parameters for the benchmark run (e.g., number of results to retrieve per query).
- Start the benchmark.
DataGyro will then systematically send the generated queries (from your chosen dataset) to your IR algorithm’s URL.
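Before starting a run, it can help to smoke-test your endpoint with the kind of request you expect it to receive. A minimal sketch, assuming the hypothetical `/search` route and JSON fields from the endpoint example above (not DataGyro's actual request format):

```python
# Hypothetical smoke test for the endpoint sketched earlier; the URL, payload
# fields, and response shape are assumptions, not DataGyro's actual contract.
import requests

resp = requests.post(
    "http://localhost:8080/search",
    json={"query": "trec evaluation", "top_k": 5},
    timeout=10,
)
resp.raise_for_status()

for rank, hit in enumerate(resp.json()["results"], start=1):
    print(rank, hit["doc_id"], hit["score"])
```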
Scoring and Metrics Generation
As your algorithm returns results for each query, DataGyro compares these results against the pre-generated qrels (relevance judgments) for that dataset.
Based on this comparison, DataGyro calculates a wide range of standard TREC evaluation metrics (a toy scoring example follows the list), including but not limited to:
- nDCG (normalized Discounted Cumulative Gain)
- Recall
- MAP (Mean Average Precision)
- Precision
- Other metrics such as Mean Reciprocal Rank (MRR) and cutoff variants such as Precision@k and Recall@k
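To make the scoring step concrete, the sketch below scores one query's ranked results against made-up qrels and computes a few of these metrics by hand. The qrels and run data are invented for illustration; DataGyro performs this comparison automatically across every query in the dataset.

```python
# Toy illustration of scoring one query's ranked results against qrels.
# The qrels and ranked list below are made up for illustration only.
import math

# qrels for one query: doc_id -> graded relevance (0 = not relevant).
qrels = {"doc3": 2, "doc7": 1, "doc9": 1}

# Ranked doc IDs returned by the IR algorithm for the same query.
ranked = ["doc3", "doc1", "doc7", "doc5", "doc9"]

k = 5
relevant_retrieved = [d for d in ranked[:k] if qrels.get(d, 0) > 0]
total_relevant = sum(1 for rel in qrels.values() if rel > 0)

precision_at_k = len(relevant_retrieved) / k
recall_at_k = len(relevant_retrieved) / total_relevant

# Reciprocal rank: 1 / rank of the first relevant result (0 if none found).
rr = 0.0
for rank, doc_id in enumerate(ranked, start=1):
    if qrels.get(doc_id, 0) > 0:
        rr = 1.0 / rank
        break

# nDCG@k: discounted cumulative gain normalized by the ideal ordering.
# Linear gain is used here; some tools use 2**rel - 1 instead.
def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

gains = [qrels.get(d, 0) for d in ranked[:k]]
ideal = sorted(qrels.values(), reverse=True)[:k]
ndcg_at_k = dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

print(f"P@{k}={precision_at_k:.2f}  R@{k}={recall_at_k:.2f}  "
      f"RR={rr:.2f}  nDCG@{k}={ndcg_at_k:.2f}")
```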
TREC Report
Once the benchmark run is complete, DataGyro generates a comprehensive TREC report. This report presents all the calculated metric scores, allowing you to:
- Understand the overall performance of your IR algorithm.
- Identify strengths and weaknesses across different metrics.
- Track improvements over iterations of your algorithm.
This report is crucial for data-driven development and optimization of your IR system.
After running a benchmark, you might want to Explore the queries and qrels or Compare your results.