Datasets

Datasets define the relationships between your views and their metrics.

Datasets: Organizing views and defining relationships

A dataset is a collection of views and the relationships between them. It defines the structured tables that users interact with when querying data from the catalog in the canvas. Design each dataset around a specific business objective or function so that it stays meaningful, actionable, and aligned with business needs.

Each dataset is stored in a separate YAML file and can be built from one or more views. When combining views, you must define the relationship for each join (e.g., one_to_many, many_to_one) so that aggregations across the joined views stay symmetric. Once configured, a dataset automatically generates SQL from this predefined logic, enabling accurate and consistent data exploration within the canvas.


Dataset YAML schema

The schema for dataset definitions is displayed next to the YAML editor in Count. An example dataset definition is shown below.

name: workspaces
label: Workspaces
from: workspaces
description: Workspace and user summary dataset over time.

join:
  - view: integrations
    constraint: integrations.workspace_id = workspaces.workspace_id
    relationship: one_to_many

  - view: events
    constraint: events.workspace_id = workspaces.workspace_id
    relationship: one_to_many

  - view: user_permissions
    constraint: user_permissions.workspace_id = workspaces.workspace_id
    relationship: one_to_many

  - view: users
    constraint: users.user_id = user_permissions.user_id
    relationship: one_to_many

  - view: workspace_users_over_time
    constraint: workspace_users_over_time.workspace_id = workspaces.workspace_id
    relationship: one_to_many

Why datasets matter

Datasets are essential for ensuring that views are properly related and aggregated. They:

  • Maintain data integrity by defining relationships between views with the appropriate relationship type (one_to_many, many_to_one, etc.).

  • Enable symmetric aggregation, so that aggregations are applied consistently across joined views and results stay free of discrepancies.

  • Simplify querying by automatically generating SQL, reducing the need to write it by hand.

  • Ensure consistency in queries and data explorations in the canvas.
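To illustrate why the relationship type matters for aggregation, here is a minimal Python sketch of the problem that symmetric aggregation addresses. It is not the SQL that Count generates. The `seats` column and the sample rows are hypothetical and exist only for illustration; the view names mirror the example dataset above.

```python
# Hypothetical data: one row per workspace, many event rows per workspace.
# The "seats" column is an assumption made for this illustration only.
workspaces = [
    {"workspace_id": 1, "seats": 10},
    {"workspace_id": 2, "seats": 5},
]
events = [
    {"event_id": 100, "workspace_id": 1},
    {"event_id": 101, "workspace_id": 1},
    {"event_id": 102, "workspace_id": 1},
    {"event_id": 103, "workspace_id": 2},
]

# A one_to_many join repeats each workspace row once per matching event.
joined = [
    {**w, **e}
    for w in workspaces
    for e in events
    if e["workspace_id"] == w["workspace_id"]
]

# Naive aggregation over the joined rows counts seats once per event.
naive_total_seats = sum(row["seats"] for row in joined)

# Symmetric-style aggregation: count each workspace once, keyed by its primary key.
seats_by_workspace = {row["workspace_id"]: row["seats"] for row in joined}
symmetric_total_seats = sum(seats_by_workspace.values())

print(naive_total_seats)      # 35 (workspace 1 is counted three times)
print(symmetric_total_seats)  # 15 (each workspace is counted once)
```

Because workspace 1 has three events, the naive sum inflates its seats threefold. Knowing that the join is one_to_many is what lets an aggregation deduplicate on the primary key of the "one" side before summing.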

Building datasets

The next page will guide you through creating and customizing your datasets.

  1. Creating Datasets – Learn how to set up new datasets to organize your data effectively.
