DataGyro Evaluate
DataGyro’s Evaluate vertical provides a robust solution for assessing the performance of your Information Retrieval (IR) systems. It mirrors the industry-standard TREC (Text REtrieval Conference) evaluation process, offering a streamlined way to measure and improve your system’s effectiveness. Evaluate is currently a fully separate system from Search and is composed of four main components:

- Datasets: Upload your data to automatically generate queries and relevance judgments (qrels).
- Benchmarks: Run your custom IR algorithm against the generated queries and get comprehensive metric scores.
- Explore: Inspect the generated queries and qrels to understand the evaluation data.
- Compare: Compare your algorithm’s results against standard baselines like BM25.
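TREC-style evaluations conventionally exchange relevance judgments as whitespace-separated qrels files, one judgment per line (`query_id iteration doc_id relevance`). As a minimal sketch, here is how such lines can be parsed into a lookup structure; the sample data is illustrative, not actual DataGyro output:

```python
# Parse TREC-style qrels lines: "query_id iteration doc_id relevance".
# The sample below is illustrative data, not DataGyro output.
from collections import defaultdict

SAMPLE_QRELS = """\
q1 0 doc3 2
q1 0 doc7 1
q2 0 doc1 1
"""

def parse_qrels(text):
    """Return {query_id: {doc_id: relevance}} from TREC qrels text."""
    qrels = defaultdict(dict)
    for line in text.strip().splitlines():
        qid, _iteration, doc_id, rel = line.split()
        qrels[qid][doc_id] = int(rel)
    return dict(qrels)

qrels = parse_qrels(SAMPLE_QRELS)
print(qrels["q1"])  # {'doc3': 2, 'doc7': 1}
```

Graded relevance values (0, 1, 2, …) allow both binary metrics (Precision, Recall, MAP) and graded metrics (nDCG) to be computed from the same judgments.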
Core Evaluation Process
The Evaluate vertical allows you to:

- Automate TREC-Style Evaluation: DataGyro automates the creation of test queries and relevance judgments (qrels) directly from your datasets. This eliminates the manual effort typically associated with preparing evaluation sets.
- Run Against Your IR Algorithm: You can then run these automatically generated queries and qrels against your own Information Retrieval algorithm.
- Generate Comprehensive TREC Reports: After the evaluation run, DataGyro generates a detailed TREC report. This report includes a suite of standard IR evaluation metrics, such as:
  - nDCG (normalized Discounted Cumulative Gain): measures the ranking quality of your results.
  - Recall: indicates the proportion of relevant documents retrieved.
  - MAP (Mean Average Precision): provides an overall measure of precision across different recall levels.
  - Precision: shows the proportion of retrieved documents that are relevant.
  - Other key metrics that give you a holistic view of your system’s performance.
- Systematically Improve Performance: By leveraging these insights, you can systematically measure, benchmark, and improve the effectiveness of your IR system.
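The metrics above have standard textbook definitions. As a minimal sketch for a single query (DataGyro’s report computes these, and more, across all generated queries; the ranking and qrels here are hypothetical):

```python
# Standard single-query implementations of the metrics listed above.
# The ranking and relevance judgments are illustrative examples.
import math

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def recall(ranking, relevant):
    """Fraction of all relevant docs that were retrieved."""
    return sum(1 for d in ranking if d in relevant) / len(relevant)

def average_precision(ranking, relevant):
    """Mean of precision values at each rank holding a relevant doc."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

def ndcg(ranking, rel_grades, k):
    """Normalized discounted cumulative gain at cutoff k."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [rel_grades.get(d, 0) for d in ranking[:k]]
    ideal = sorted(rel_grades.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if ideal else 0.0

rel_grades = {"doc3": 2, "doc7": 1}         # graded qrels for one query
relevant = set(rel_grades)
ranking = ["doc3", "doc1", "doc7", "doc9"]  # your algorithm's output

print(precision_at_k(ranking, relevant, 4))  # 0.5
print(recall(ranking, relevant))             # 1.0
print(average_precision(ranking, relevant))  # (1/1 + 2/3) / 2 ≈ 0.833
print(ndcg(ranking, rel_grades, 4))
```

MAP is then simply `average_precision` averaged over all queries in the evaluation set.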
How Evaluate Works (Conceptual)
- Upload Data (Datasets): Provide the dataset you want to use for evaluating your IR system. DataGyro processes this to create test queries and relevance judgments.
- Run Benchmark (Benchmarks): Configure and run an evaluation using your IR algorithm’s endpoint against the generated queries.
- Analyze Report: Review the comprehensive TREC report generated by DataGyro to understand your system’s performance.
- Deep Dive (Explore, Compare): Further inspect the evaluation data and compare results to baselines.
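The Run Benchmark step can be sketched end to end: each generated query is sent to your retrieval algorithm, and the ranked results are written out as TREC-format run lines (`query_id Q0 doc_id rank score run_tag`), which is the shape of file that TREC-style scorers consume alongside the qrels. The `my_ir_algorithm` stub below is a hypothetical stand-in for your real endpoint; all names and data are illustrative:

```python
# Sketch of the Benchmarks step: run each generated query through a
# retrieval function and emit TREC-format run lines. `my_ir_algorithm`
# is a hypothetical stand-in for your real IR endpoint.

def my_ir_algorithm(query):
    """Stub: return (doc_id, score) pairs ranked by descending score."""
    fake_index = {
        "apple pie": [("doc3", 9.1), ("doc7", 4.2)],
        "bread recipe": [("doc1", 7.7)],
    }
    return fake_index.get(query, [])

def build_run_lines(queries, run_tag="my_algo"):
    """Format results as TREC run lines: qid Q0 docid rank score tag."""
    lines = []
    for qid, query in queries:
        for rank, (doc_id, score) in enumerate(my_ir_algorithm(query), 1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score} {run_tag}")
    return lines

queries = [("q1", "apple pie"), ("q2", "bread recipe")]
for line in build_run_lines(queries):
    print(line)
# q1 Q0 doc3 1 9.1 my_algo
# q1 Q0 doc7 2 4.2 my_algo
# q2 Q0 doc1 1 7.7 my_algo
```

Scoring these run lines against the generated qrels yields the metric report described above; the Compare component repeats the same scoring with a baseline run (e.g. BM25) so the two reports can be viewed side by side.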