Datasets

Datasets define the relationships between your views and their metrics.

Datasets: Organizing views and defining relationships

A dataset is a collection of views and the relationships between them. It defines the structured tables that users work with when they query catalog data in the canvas. Design each dataset around a specific business objective or function so that it is meaningful, actionable, and aligned with business needs.

Each dataset is stored in a separate YAML file and can be built from one or more views. When you combine views, you must define the relationship between them (for example, one_to_many or many_to_one) so that measures aggregate correctly across the join through symmetric aggregation. Once configured, a dataset generates SQL automatically from these definitions, which keeps data exploration in the canvas accurate and consistent.


Datasets are listed in the Datasets section of the catalog YAML editor and are also shown on the catalog homepage.

Dataset YAML schema

The schema for dataset definitions is displayed next to the YAML editor in Count. The example below defines a dataset built on the matches view, with the players view joined twice: once as winners and once as losers.

name: matches_and_players
label: Matches and Players

description: Match summary data with player attributes for both winners and losers of each match.

from: matches

join:
  - view: players
    alias: winners
    label: winners
    constraint: winners.player_id=matches.winner_id
    relationship: many_to_one
  
  - view: players
    alias: losers
    label: losers
    constraint: losers.player_id=matches.loser_id
    relationship: many_to_one
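To make the example concrete, the Python sketch below builds tiny in-memory matches and players tables with sqlite3 and runs roughly the SQL this dataset describes. matches is the base view, and players is joined twice under the aliases winners and losers using each join's constraint. This is an illustration, not the SQL Count actually generates. The sample rows, the match_id and name columns, and the choice of LEFT JOIN are all assumptions, because the example does not state a join type.

```python
import sqlite3

# Hypothetical sample data. Table and key names (matches, players,
# winner_id, loser_id, player_id) follow the example dataset; the rows
# and the extra columns match_id and name are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (match_id INTEGER PRIMARY KEY,
                          winner_id INTEGER, loser_id INTEGER);
    INSERT INTO players VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO matches VALUES (10, 1, 2), (11, 1, 3), (12, 3, 2);
""")

# Roughly the join the dataset describes: players is joined twice,
# aliased as winners and losers, using each join's constraint.
# LEFT JOIN is an assumption; the example does not specify a join type.
query = """
    SELECT matches.match_id,
           winners.name AS winner_name,
           losers.name  AS loser_name
    FROM matches
    LEFT JOIN players AS winners ON winners.player_id = matches.winner_id
    LEFT JOIN players AS losers  ON losers.player_id  = matches.loser_id
    ORDER BY matches.match_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (10, 'Ana', 'Bo'), then (11, 'Ana', 'Cy'), then (12, 'Cy', 'Bo')

# many_to_one: each match finds at most one winner and one loser,
# so the joined result keeps exactly one row per match (no fan-out).
match_count = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]
assert len(rows) == match_count
```

Because both joins are many_to_one, each match picks up at most one winner and one loser. The result therefore has one row per match, and measures defined on matches are not duplicated by the join.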

Why datasets matter

Datasets are essential for ensuring that views are properly related and aggregated. They:

  • Maintain data integrity by defining the relationships between views with the appropriate relationship type (one_to_many, many_to_one, etc.).

  • Enable symmetric aggregation, so that measures are aggregated consistently across joined views and are not inflated by rows duplicated in a join.

  • Simplify querying by generating SQL automatically, reducing the need to write it by hand.

  • Ensure consistency in queries and data explorations in the canvas.
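The following Python sketch uses an in-memory sqlite3 database to show why the relationship type matters for aggregation. It assumes a hypothetical dataset whose base view is players, with matches joined one_to_many on the winner, and a hypothetical ranking_points column on players. Summing across the join counts a player once per match won. Deduplicating by the players primary key before summing, one common way to implement symmetric aggregates, counts each player once. This is not a description of how Count implements symmetric aggregation internally.

```python
import sqlite3

# Hypothetical data: a ranking_points measure on players, and matches
# joined one_to_many from players (a player can win many matches).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, ranking_points INTEGER);
    CREATE TABLE matches (match_id INTEGER PRIMARY KEY, winner_id INTEGER);
    INSERT INTO players VALUES (1, 100), (2, 50);
    INSERT INTO matches VALUES (10, 1), (11, 1), (12, 2);
""")

join_clause = "FROM players JOIN matches ON matches.winner_id = players.player_id"

# Naive aggregation: player 1 appears once per match won, so their
# points are counted twice (100 + 100 + 50).
naive = conn.execute(
    f"SELECT SUM(players.ranking_points) {join_clause}"
).fetchone()[0]

# Symmetric aggregation: reduce the joined rows to one row per player
# (the primary key of the view that owns the measure) before summing.
symmetric = conn.execute(f"""
    SELECT SUM(ranking_points) FROM (
        SELECT DISTINCT players.player_id, players.ranking_points {join_clause}
    )
""").fetchone()[0]

print(naive, symmetric)  # 250 150
```

Declaring the relationship is what tells the SQL generator which side of a join can fan out, and therefore which measures need this kind of deduplication.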

Building Datasets

The next page will guide you through creating and customizing your datasets.

  1. Creating Datasets – Learn how to set up new datasets to organize your data effectively.
