Dynamic query compilation engine

Literally magic.

Count features a unique dynamic query compilation engine that enables unmatched flexibility in how you write and interact with SQL and no-code cells in the canvas. It is designed to harness the speed of modern cloud data warehouses while minimizing data movement and keeping your data secure.

How does the dynamic query compilation engine work?

At query runtime, Count dynamically compiles the SQL for each cell and sends it to the appropriate database, then renders the result. When two or more cells are joined together in the DAG, Count will include all SQL necessary to execute each cell - including the SQL from any parent cells. For example:

  • Cells a and b are SQL cells, cell b contains the SQL select * from a.

  • At runtime, Count sends the SQL for cell a to the database as-is (potentially adding a limit).

  • At runtime, Count generates the SQL for cell b which looks like with a as (<query for a>), select * from a

Because the query for each cell contains all of the necessary SQL, all queries can be executed in parallel. Due to this fact, it is common to see the results of some cells appear before their parents have finished executing, particularly if the results for the parent cell are larger.

Remember, each cell is a compilation of CTEs - you can see the compiled SQL at any time and export it to your database.

How does query compilation work for local cells?

For any local cells in your DAG, query compilation and execution behaves slightly differently:

  • If a local cell references a parent cell from a data warehouse, the SQL for the parent cell is not compiled into the local cell. Rather, the parent cell is treated as if it is an external database table.

  • Local cells have to wait until their parent cells have finished executing before they can be executed, as local cells operate on cell results.

What are the benefits of dynamic query compilation?

By compiling SQL dynamically at runtime and relying on your underlying database for computation, Count enables the following:

  • Reusability - define a metric in one cell, then query that cell in downstream analysis.

  • Reactivity - changes flow through the DAG and cause downstream cells to update automatically.

  • Auto-lineage - Count builds the DAG for you by checking which tables you reference in your cells.

  • Control over data locality - you don't need to pull all of your database results into the canvas to benefit from cell referencing.

  • Performance - Count takes full advantage of the capabilities of your cloud data warehouse for scalability and query speed.

  • Write in native SQL - whatever your database, you can use your native SQL syntax to work in Count rather than learning a tool specific query language.

When are cells executed?

Cells are executed in response to several events. Here 'cell' could mean a SQL cell, visual cell, control cell etc.

  • A canvas is opened, and the cell is visible or referenced by a visible cell.

  • A report is opened, and the cell is in one of the report frames or referenced by one that is.

  • The input for a cell is edited.

  • The input of a parent of a cell is edited (this means that editing a single cell can cause multiple cells to execute).

  • A cell is duplicated or pasted into the canvas.

  • A new reference in a local cell is created to a cell that has not previously executed.

  • A new dynamic text reference is created to a cell that has not previously executed.

  • A frame is unhidden, and contains cells which have not previously executed.

  • The view is switched from a report to a canvas, which contains cells that have not previously executed.

To disable automatic execution of cells that run on your data warehouse, select this option in the workspace settings or in the canvas query execution menu.

Appendix: What is a DAG?

You'll see this acronym crop up a lot in our documentation - DAG stands for 'directed acyclic graph'. Yes, this is a bit of computer-science jargon, but it's also a crucial concept that's becoming more relevant in data analytics, and is already important in data engineering. Let's break it down:

  • Directed - the only reason a pair of cells can be related is that one cell is influencing the results of another, and this influence must be in one direction only.

  • Acyclic - it is impossible to construct a cycle of dependencies in a DAG (and therefore Count). For example, if you write two cells which look like a: select * from b and b: select * from a, both cells will display an error.

  • Graph - this is just the generic name for a bunch of boxes with arrows between them (apologies to any reader with a formal background in graph theory).

The DAG is represented in Count by the colored arrows which appear between cells. You don't have to think about it, Count will construct and manage the DAG automatically, but it's very helpful to be aware of the relationships between cells as those determine how cells execute in response to changes.

Last updated