Evaluate: Datasets
The “Datasets” component in DataGyro’s Evaluate vertical is your starting point for any Information Retrieval (IR) system evaluation. Its primary purpose is to take your raw data and automatically prepare it for a TREC-style evaluation by generating queries and relevance judgments (qrels).
Uploading Your Data
Currently, DataGyro supports uploading datasets in the JSONL (JSON Lines) format. Support for other formats like plain JSON and Parquet is planned for future releases.
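JSONL stores one complete JSON object per line, which makes large datasets easy to stream and parse. A minimal sketch of writing and reading such a file (the field names `id` and `text` are illustrative; check DataGyro's expected schema for your data):

```python
import json

# Hypothetical records -- the "id" and "text" fields are an assumption,
# not DataGyro's required schema.
records = [
    {"id": "doc1", "text": "DataGyro evaluates IR systems with TREC-style metrics."},
    {"id": "doc2", "text": "JSONL stores one JSON object per line."},
]

# Write one JSON object per line.
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read it back: each non-empty line parses independently.
with open("dataset.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded))  # 2
```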
To upload a dataset:
- Navigate to the “Evaluate” section in DataGyro and select the “Datasets” tab.
- Click on “Upload Dataset” (or a similar button).
- Choose your JSONL file for upload.
Automatic Query and Qrel Generation
Once you upload your JSONL file, DataGyro works in the background to:
- Parse your data: Understand the structure and content.
- Generate Queries: Create a set of test queries based on the information within your dataset. These queries are designed to be representative of how a user might search for information within that data.
- Generate Qrels (Relevance Judgments): For each generated query, DataGyro identifies and marks the relevant documents or passages within your dataset. These form the ground truth for the evaluation.
This automated process saves significant time and effort compared to manually creating query sets and relevance judgments.
Dataset Status: “Ready”
After the background processing is complete, your dataset’s status will change to “Ready”. This indicates that the queries and qrels have been successfully generated, and the dataset can now be used in the “Benchmarks” component to evaluate an IR algorithm.
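If you want to automate the wait for this status change, a generic polling helper can wrap however you fetch the status. This is a sketch only: `get_status` is a stand-in callable, not a DataGyro API, and the timeout and interval values are arbitrary.

```python
import time

def wait_until_ready(get_status, timeout=300.0, interval=5.0):
    """Poll get_status() until it returns "Ready" or the timeout elapses.

    get_status is a placeholder for however you check the dataset's
    status (e.g. via a client of your own); it is not part of DataGyro.
    Returns True once "Ready" is seen, False if the timeout is reached.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "Ready":
            return True
        time.sleep(interval)
    return False

# Demo with a fake status source that reports "Ready" on the third call.
statuses = iter(["Processing", "Processing", "Ready"])
print(wait_until_ready(lambda: next(statuses), timeout=10.0, interval=0.0))  # True
```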
Future Enhancements
- Support for additional file formats (JSON, Parquet).
- More configuration options for the query and qrel generation process.
Next, learn how to use your “Ready” dataset in Evaluate: Benchmarks.