Datasets
A dataset is a collection of views and the relationships between them. It defines the structured tables that users interact with when they query data from the catalog in the canvas. Design each dataset around a specific business objective or function so that it stays meaningful, actionable, and aligned with business needs.
Each dataset is stored in a separate YAML file and can be built from one or more views. When combining views, you must define a join type (e.g., one-to-many, many-to-one) for each relationship so that aggregations stay correct and symmetric across the joined views. Once configured, a dataset automatically generates SQL from this predefined logic, enabling accurate and consistent data exploration within the canvas.
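To make the idea of generating SQL from declared relationships more concrete, here is a minimal Python sketch. It is a toy illustration only: the `Join` class, its field names, and the `build_sql` function are hypothetical and do not mirror the actual YAML schema or the SQL generator described on this page.

```python
from dataclasses import dataclass


# Hypothetical structure for illustration; not the product's YAML schema.
@dataclass
class Join:
    left: str                 # view on the left side of the join
    right: str                # view being joined in
    on: tuple[str, str]       # (left column, right column)
    relationship: str         # e.g. "one_to_many" or "many_to_one"


def build_sql(base: str, joins: list[Join], select: list[str]) -> str:
    """Assemble a SELECT statement from a base view and its declared joins."""
    lines = [f"SELECT {', '.join(select)}", f"FROM {base}"]
    for j in joins:
        lines.append(
            f"LEFT JOIN {j.right} ON {j.left}.{j.on[0]} = {j.right}.{j.on[1]}"
        )
    return "\n".join(lines)


# One customer can have many orders, so the relationship is one_to_many.
joins = [Join("customers", "orders", ("id", "customer_id"), "one_to_many")]
sql = build_sql("customers", joins, ["customers.name", "orders.amount"])
print(sql)
```

In this sketch the `relationship` field is only recorded, not acted on. A real generator would presumably use it to decide how to aggregate safely across the join.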
Datasets are essential for ensuring that views are properly related and aggregated. They:
Maintain data integrity by defining relationships between views with appropriate join types (one_to_many, many_to_one, etc.).
Enable symmetric aggregation, so that aggregations are applied consistently across joined views and results are free of discrepancies.
Simplify querying by automatically generating SQL, reducing the need to write queries by hand.
Ensure consistency in queries and data explorations in the canvas.
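The aggregation problem that join types guard against can be reproduced with a small, self-contained Python example using the standard `sqlite3` module. The tables and values are made up for illustration. The deduplicating query shown is just one way to avoid double counting across a one-to-many join, and the SQL that datasets actually generate may use a different technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, credit_limit INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 100), (2, 50);
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# One customer has many orders, so this join repeats customer rows.
join = "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id"

# Naive sum: customer 1 appears once per order, so its limit is counted twice.
naive = conn.execute(f"SELECT SUM(c.credit_limit) {join}").fetchone()[0]

# Safe sum: collapse to one row per customer key before aggregating.
safe = conn.execute(
    "SELECT SUM(credit_limit) FROM ("
    f"SELECT DISTINCT c.id AS id, c.credit_limit AS credit_limit {join}"
    ") AS t"
).fetchone()[0]

# Measures on the "many" side are not duplicated by this join.
order_total = conn.execute(f"SELECT SUM(o.amount) {join}").fetchone()[0]

print(naive, safe, order_total)  # 250 150 35
```

The naive total (250) overstates the true credit limit (150) because the join fans out customer rows. Knowing that the relationship is one-to-many is what lets a query generator pick the safe form.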
The next page will guide you through creating and customizing your datasets.