Evaluate: Compare

The “Compare” component in DataGyro’s Evaluate vertical lets you benchmark your Information Retrieval (IR) algorithm not only in isolation but also against standard retrieval models. Currently, the primary baseline offered is BM25 (Best Matching 25), a widely recognized and strong baseline in information retrieval. DataGyro also plans to add its own custom IR algorithm as a comparison baseline in the near future.
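
For readers unfamiliar with the baseline, the following is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. It is not DataGyro’s implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant used here are assumptions chosen for illustration, and the variant DataGyro runs may differ.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Illustrative only: k1, b, and the IDF variant are assumed defaults,
    not settings taken from DataGyro.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight; the "1 +" keeps IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus, invented for this example.
docs = [["cat", "sat"], ["dog", "barked"], ["cat", "cat", "purred"]]
scores = bm25_scores(["cat"], docs)
```

Documents that do not contain any query term score zero, and repeated occurrences of a term raise the score with diminishing returns, which is why BM25 is a hard baseline to beat on keyword-heavy queries.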

Why Compare?

Comparing your algorithm’s results with established baselines like BM25 helps you:
  • Contextualize Performance: Understand if your custom algorithm is performing better or worse than well-known methods on your specific dataset and queries.
  • Identify Areas for Improvement: If BM25 outperforms your algorithm on certain types of queries, it can highlight areas where your model might need refinement.
  • Validate Innovations: Demonstrate the added value of your custom IR solution if it consistently surpasses standard baselines.
  • Analyze Individual Queries: Where available, view side-by-side results for individual queries, showing which documents your algorithm retrieved versus which BM25 retrieved, and how their relevance scores differ.
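
The per-query idea above can be sketched offline as follows. This is a minimal illustration, assuming you already have one metric value per query for each system; the function name `compare_per_query`, the query IDs, and the scores are hypothetical and do not reflect a DataGyro export format.

```python
def compare_per_query(yours, baseline, tol=1e-9):
    """Label each shared query as a win, loss, or tie for your algorithm.

    `yours` and `baseline` map query IDs to a per-query metric value
    (for example nDCG). Only queries present in both are compared.
    """
    outcome = {}
    for qid in sorted(yours.keys() & baseline.keys()):
        delta = yours[qid] - baseline[qid]
        if abs(delta) <= tol:
            label = "tie"
        elif delta > 0:
            label = "win"
        else:
            label = "loss"
        outcome[qid] = (label, delta)
    return outcome

# Hypothetical per-query scores, invented for this example.
your_scores = {"q1": 0.8, "q2": 0.4, "q3": 0.5}
bm25_query_scores = {"q1": 0.6, "q2": 0.7, "q3": 0.5}

for qid, (label, delta) in compare_per_query(your_scores, bm25_query_scores).items():
    print(f"{qid}: {label} ({delta:+.2f})")
```

Queries labeled as losses are the natural starting point when looking for the query types on which the baseline does better.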

How Comparison Works

  1. Select a Benchmark Run: Start by selecting a completed benchmark run from the Benchmarks section; this run contains the results from your custom IR algorithm.
  2. Choose a Baseline: Select BM25 (and in the future, DataGyro’s custom algorithm) as the baseline for comparison.
  3. View Comparative Results: DataGyro will then display:
    • Overall metric comparisons (e.g., nDCG for your algorithm vs. nDCG for BM25).
    • Per-query result comparisons, allowing you to drill down into specific examples.
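
To make the overall metric comparison concrete, here is a minimal sketch of nDCG@k computed for two rankings of the same query. It assumes graded relevance judgments and the linear-gain form of DCG; the names `dcg` and `ndcg_at_k`, the document IDs, and the judgments are hypothetical, and DataGyro’s exact nDCG definition and cutoff may differ.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for one query.

    `judgments` maps document IDs to graded relevance; documents without
    a judgment are treated as non-relevant (an assumption of this sketch).
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments and rankings, invented for this example.
judgments = {"d1": 3, "d2": 2, "d3": 0}
your_ranking = ["d1", "d2", "d3"]
bm25_ranking = ["d3", "d2", "d1"]

print("yours:", ndcg_at_k(your_ranking, judgments))
print("bm25: ", ndcg_at_k(bm25_ranking, judgments))
```

Averaging the per-query values over all queries gives a single figure per system, which is the kind of number an overall comparison reports.
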
Use this feature to evaluate your IR system rigorously and to show where it outperforms the baseline. After comparing, you might revisit the Datasets section to refine your data or the Explore section to better understand query characteristics.