Collections

As a fundamental component of DataGyro’s Search vertical, Collections are logical groupings of data that allow you to organize your information for natural language search and LLM retrieval.

What are Collections?

Collections serve as an abstraction layer over your data sources and are the interface you query with natural language. Currently, collections have a 1:1 mapping with data sources: each collection is created from a single data source. While data sources provide the connection to your data, collections transform that data into a format optimized for LLM retrieval, handling chunking, embedding, and indexing automatically.

Creating Collections

To create a collection:
  1. Navigate to the “Collections” section in your project
  2. Click “New Collection”
  3. Enter a name for your collection
  4. Select the data source you want to use
  5. Click “Create Collection”
That’s it! Your collection will now begin processing.

Collection Processing Time

After you create a collection, DataGyro needs time to process your data. During processing, it:
  • Syncs the data from your database (for SQL connection strings) or imports your SQL dump file
  • Analyzes and chunks your data for retrieval
  • Generates embeddings using models chosen for your data type
  • Creates hybrid search indexes that combine semantic and keyword search
  • Optimizes the collection for fast, accurate retrieval
This process usually takes about 5-10 minutes for large datasets. You can follow the progress in real time in the Collections tab.

Automatic Updates for SQL Databases

For collections connected to SQL databases via connection strings, any changes you make to your database are automatically pulled and applied to your collection, so the collection stays in sync with your source data.
To enable automatic updates, configure your database by running the following SQL commands as a database administrator. The commands use PostgreSQL syntax: they set up logical replication and create a dedicated user with the permissions DataGyro needs to monitor changes.
Before running the commands, replace the placeholders:
  • Replace xxxxxxxx with a secure password for the DataGyro user
  • Replace <DB_NAME> with your actual database name
  • Replace <SCHEMA_NAME> with your schema name (repeat the schema commands for each schema you want to sync)
-- Create a publication for logical replication (required for PostgreSQL CDC)
CREATE PUBLICATION data_gyro_publication FOR ALL TABLES;

-- Create a dedicated user for DataGyro with the REPLICATION attribute
CREATE ROLE datagyro WITH LOGIN PASSWORD 'xxxxxxxx' REPLICATION;

-- Grant privileges on the database
GRANT ALL PRIVILEGES ON DATABASE <DB_NAME> TO datagyro;

-- Grant privileges on the schema and its tables (repeat for each schema you want to sync)
GRANT ALL PRIVILEGES ON SCHEMA <SCHEMA_NAME> TO datagyro;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA <SCHEMA_NAME> TO datagyro;
GRANT USAGE ON SCHEMA <SCHEMA_NAME> TO datagyro;

-- Ensure the REPLICATION attribute is set (already granted by CREATE ROLE above)
ALTER ROLE datagyro WITH REPLICATION;
After running these commands, provide the database connection details, including the datagyro user credentials, when setting up your data source in DataGyro.
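Before connecting, it may help to confirm that the setup took effect. The queries below are a minimal sketch using standard PostgreSQL settings and catalog views; they assume the publication and role names from the commands above. The wal_level check reflects a general PostgreSQL requirement for logical replication rather than anything this guide states, so treat it as an assumption to verify for your hosting environment.

```sql
-- Logical replication generally requires wal_level = 'logical'
-- (a standard PostgreSQL requirement, not stated in this guide).
SHOW wal_level;

-- Confirm the publication exists and covers all tables.
SELECT pubname, puballtables
FROM pg_publication
WHERE pubname = 'data_gyro_publication';

-- Confirm the role can log in and has the REPLICATION attribute.
SELECT rolname, rolcanlogin, rolreplication
FROM pg_roles
WHERE rolname = 'datagyro';
```

If the second or third query returns no rows, the corresponding CREATE statement likely did not run in this database or cluster.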

Collection Schema

Each collection inherits its schema from the underlying data source:
  • Fields: The columns or attributes from your data source
  • Data Types: The type of data each field contains (string, number, date, etc.)
  • Primary Keys: Fields that uniquely identify records in your collection
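For a PostgreSQL source, you can preview the fields, data types, and primary keys a collection is likely to inherit by querying the standard information_schema views. This is only a sketch: it assumes the collection schema mirrors what the database reports, and <SCHEMA_NAME> is the same placeholder used in the setup commands earlier.

```sql
-- Fields and data types for every table in one schema.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = '<SCHEMA_NAME>'
ORDER BY table_name, ordinal_position;

-- Primary key columns for the same schema.
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
 AND kcu.table_name = tc.table_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_schema = '<SCHEMA_NAME>'
ORDER BY tc.table_name, kcu.ordinal_position;
```

Comparing this output with the schema shown in the collection details is a quick way to spot tables or columns that did not sync.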

Managing Collections

Viewing Collection Details

To view details about a collection:
  1. Go to the “Collections” section
  2. Click on the collection name
  3. Review the schema and data preview

Deleting Collections

To delete a collection:
  1. Go to the “Collections” section
  2. Find the collection you want to delete
  3. Click the “Delete” button
  4. Confirm the deletion
Deleting a collection cannot be undone. Make sure you no longer need the collection before deleting it.

Coming Soon: Advanced Collection Features

We’re actively working on expanding the capabilities of collections. In the near future, collections will allow you to:

1. Combine Multiple Data Sources

Create unified collections that bring together data from different sources, enabling cross-source natural language queries and comprehensive retrieval.

2. Create Custom Views

Define specific views of your data focused on particular use cases, optimizing retrieval for different LLM-powered features in your product.

3. Apply Filters

Set up persistent filters to refine the search scope, ensuring your LLMs always work with the most relevant data subsets.