DuckDB on the server
When a cell returns a large result set, Count by default sends only the first 10,000 rows of the results to your browser. When this happens, the footer of the cell shows information about the full result set.
You can choose to download all of the results, in which case downstream Python and DuckDB cells will continue to run in your browser.
Otherwise, any downstream DuckDB cells will be executed on Count's servers rather than in your browser. When running on the server:
The full results of any upstream cells are always available
DuckDB may have access to more working memory than it does when running in a browser
Queries may fail if they use too much memory, or attempt to access files on the local filesystem
The versions of DuckDB in your browser and on Count's servers are the same, so you shouldn't notice any difference when running queries.
You may also see DuckDB cells running on the server when your local DuckDB database is nearing its memory limit.
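The rules above for where a downstream DuckDB cell runs can be summarised in a small sketch. This is an illustration only, not Count's implementation: the function name and its parameters are hypothetical, and treating browser memory pressure as an unconditional switch to the server is an assumption (the text only says this may happen).

```python
PREVIEW_ROW_LIMIT = 10_000  # rows sent to the browser by default (from the text above)


def execution_location(total_rows: int,
                       downloaded_all: bool = False,
                       browser_memory_near_limit: bool = False) -> str:
    """Return "browser" or "server" for a downstream DuckDB cell (hypothetical helper)."""
    # Assumption: a local DuckDB database close to its memory limit
    # always moves execution to the server.
    if browser_memory_near_limit:
        return "server"
    # The browser holds the full upstream result if it fit within the
    # preview limit, or if the user chose to download all of the results.
    if total_rows <= PREVIEW_ROW_LIMIT or downloaded_all:
        return "browser"
    # Otherwise only the server has the full upstream result.
    return "server"


print(execution_location(50_000))                       # upstream result truncated
print(execution_location(50_000, downloaded_all=True))  # full result downloaded
```

The first call models a truncated upstream result, the second the case where all results were downloaded.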
When using DuckDB on the server, your queries are executed in isolated virtual machines (VMs). Each VM executes one query at a time and is subject to the following limits, which may depend on your subscription plan:
Total available RAM - this is the total amount of RAM that the VM can use. It covers the space required to load any parent cells, execute the query, and stream the query results out of DuckDB. The VM uses a minimal operating system, which adds approximately 100-150 MB of RAM overhead while the query is running.
Maximum result size - Count will stop streaming the results out of DuckDB once the total uncompressed size of the results (in Arrow format) reaches the limit configured for your workspace. This limit is typically half of the RAM size allocated to the VM.
Total query duration - queries are terminated after they have been running for 1 hour.
Maximum query concurrency - each workspace can execute up to 100 concurrent DuckDB queries on the server. (DuckDB queries in the browser have no concurrency limits.)
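To get a feel for the maximum result size limit, the uncompressed Arrow size of a result can be roughly estimated before running a query. The sketch below is a back-of-the-envelope estimate under stated assumptions: it covers only fixed-width columns, ignores validity bitmaps and buffer padding, and uses a hypothetical 1 GiB limit (the actual limit for your workspace is not given here). The function names are illustrative and not part of Count or DuckDB.

```python
# Byte widths of a few fixed-width Arrow types.
FIXED_WIDTH_BYTES = {"int32": 4, "int64": 8, "float64": 8, "timestamp": 8}


def estimate_arrow_bytes(rows: int, column_types: list[str]) -> int:
    """Rough uncompressed size: row count times the sum of the column widths."""
    return rows * sum(FIXED_WIDTH_BYTES[t] for t in column_types)


def fits_result_limit(rows: int, column_types: list[str], max_result_bytes: int) -> bool:
    """True if the estimated result stays within the given result size limit."""
    return estimate_arrow_bytes(rows, column_types) <= max_result_bytes


GIB = 1024 ** 3
hypothetical_limit = 1 * GIB  # assumption: not a documented value
columns = ["int64", "float64", "timestamp"]  # 24 bytes per row

print(estimate_arrow_bytes(50_000_000, columns))                   # 1200000000
print(fits_result_limit(50_000_000, columns, hypothetical_limit))  # False
print(fits_result_limit(10_000_000, columns, hypothetical_limit))  # True
```

Variable-width columns such as strings need their average length added to the per-row figure, so treat the estimate as a lower bound.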
Why is the maximum result size less than the total available RAM?
When executing queries, the VM needs memory space to load its operating system and allocate memory during the processing of the query. (Results are always streamed out of the VM, so their impact on memory usage is minimal.)
Additionally, it is common for queries to reference multiple parent cells, which requires loading multiple parent result sets into VM memory.
For these reasons, Count sets the maximum query result size to half of the VM RAM size by default (a factor of 2). If you find that this default is not appropriate for your workload, please contact Count support.
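As a worked example of this arithmetic, the sketch below derives the default result size limit and the memory left after operating system overhead for a given VM size. The 4096 MiB VM size is a hypothetical figure chosen for illustration, the helper names are not part of any Count API, and interpreting the stated 150 MB overhead as 150 MiB is an assumption.

```python
MIB = 1024 * 1024


def default_max_result_bytes(vm_ram_bytes: int, factor: int = 2) -> int:
    """Default result size limit: VM RAM divided by the default factor of 2."""
    return vm_ram_bytes // factor


def memory_left_for_query(vm_ram_bytes: int, os_overhead_bytes: int = 150 * MIB) -> int:
    """RAM left for parent cells and query execution, using the upper overhead estimate."""
    return vm_ram_bytes - os_overhead_bytes


vm_ram = 4096 * MIB  # hypothetical VM size

print(default_max_result_bytes(vm_ram) // MIB)  # 2048
print(memory_left_for_query(vm_ram) // MIB)     # 3946
```

The memory left over has to hold every parent result set the query references as well as the query's own working memory, which is why the result size limit sits well below the total RAM.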