Python cells
Create a Python cell by either:
Using the Y keyboard shortcut to place a new cell
Selecting the Python cell option from the control bar
Selecting the Add Python cell option when referencing a cell from the + icon that appears when the cell is selected.
Python cells work very similarly to SQL cells, consisting of a text input area and an output area. Python cells are reactive just like all other cells, and their relationships are indicated by the same connector lines.
In Count, Python cells are executed locally in your browser using a version of Python built to run on the web. The first time you execute a Python cell, the Python environment is downloaded and started, which may take a few seconds.
In Python cells, there is a special global cells variable that contains the results of other cells formatted as pandas DataFrames. Access cell results using keys or attributes on this object:
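A minimal sketch of both access styles. The cells object only exists inside Count, so the CellResults stand-in class and the cell name orders are hypothetical, defined here purely so the snippet runs on its own.

```python
import pandas as pd


class CellResults(dict):
    """Hypothetical stand-in for Count's `cells` global (key and attribute access)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# In Count, `cells` is provided for you; this line exists only for the sketch.
cells = CellResults(orders=pd.DataFrame({"id": [1, 2, 3], "total": [9.5, 20.0, 4.25]}))

by_key = cells["orders"]      # access by key
by_attribute = cells.orders   # access by attribute

# Both styles return the same pandas DataFrame.
order_total = by_key["total"].sum()
```

Key access is the safer choice when a cell name contains spaces or clashes with a Python keyword.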
When importing non-Python cells as DataFrames, text columns may be imported as a categorical series if the cardinality of the column is low. This lowers the memory usage of the column, which is helpful when importing large result sets.
Most operations work the same way on categorical vs non-categorical columns, but if a non-categorical form is required, then use the astype method:
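A minimal sketch of that conversion, using a small hypothetical DataFrame in place of an imported cell result:

```python
import pandas as pd

# Hypothetical column that arrived as a categorical series.
df = pd.DataFrame({"country": pd.Categorical(["UK", "FR", "UK", "DE"])})

# Convert the column back to plain strings.
df["country"] = df["country"].astype(str)

# The column no longer has a categorical dtype.
is_categorical = isinstance(df["country"].dtype, pd.CategoricalDtype)
```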
All variables defined at the root scope of a Python cell are global, and can be accessed in any other Python cell. Count will detect relationships between Python cells based on the variables they reference, and add connector lines automatically.
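As an illustration, the two cells below are shown in a single block, separated by comments. The cell boundaries and all names are hypothetical.

```python
# --- Cell 1 (hypothetical) ---
tax_rate = 0.2  # root-scope variable: visible to other Python cells


def add_tax(amount):
    rounded = round(amount * (1 + tax_rate), 2)  # local to the function, not global
    return rounded


# --- Cell 2 (hypothetical) ---
# References `add_tax`, which is defined at the root scope of Cell 1,
# so Count would be expected to connect the two cells.
gross = add_tax(100)
```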
Because Count executes cells reactively, it's possible to accidentally create circular dependencies. In this case, like other cells, Python cells will display an error and refuse to execute:
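To see why a circular dependency cannot be executed, here is a sketch using the standard-library graphlib module. It illustrates the general idea only and is not Count's implementation; the cell names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each cell name maps to the cells it reads from.
dependencies = {
    "cell_a": {"cell_b"},  # cell_a references a variable defined in cell_b
    "cell_b": {"cell_a"},  # cell_b references a variable defined in cell_a
}

try:
    order = list(TopologicalSorter(dependencies).static_order())
    cycle_found = False
except CycleError:
    # No valid execution order exists, so neither cell can run.
    order = []
    cycle_found = True
```

Breaking the cycle usually means moving the shared variable into a third cell that both cells read from.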
The last expression in a Python cell is special, and becomes the single output of that cell. If this output can be represented as a table, it can be queried by local DuckDB cells too:
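A minimal sketch of a cell whose final expression is a DataFrame. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data; in Count this would typically come from the `cells` global.
sales = pd.DataFrame({"region": ["north", "south", "north"], "amount": [10, 5, 7]})

# Intermediate statements are not the cell output...
totals = sales.groupby("region", as_index=False)["amount"].sum()

# ...but this final bare expression is, and a DataFrame is representable as a table.
totals
```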
As Python is a more expressive language than SQL, it is able to output more data types:
Table output - if the final expression of the cell is representable as a table
Image output - if the final expression of the cell is a bytes object containing a PNG-formatted image
Logs output - if the cell has printed anything during its execution
The output type defaults to Automatic, which can be overridden from the Output type button above the cell.
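One way to produce PNG bytes for an image output is to render a matplotlib figure into an in-memory buffer. This is a sketch: the plot data is hypothetical, and the Agg backend line is an assumption so that the snippet also runs headless outside Count.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside Count
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("Hypothetical example plot")

# Render the figure into an in-memory buffer as PNG bytes.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

png_bytes = buffer.getvalue()
png_bytes  # final expression: a bytes object holding a PNG image
```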
Because Python cells work just like any other cell, you can use control cells to add interactivity to any Python cell. In the example below, the parameters of a plot are adjustable using control cells:
If a Python cell returns an object with an HTML representation, a View output button will appear. Clicking this button will cause the output to be displayed in the output section of the cell.
Only one HTML output can be displayed at once - if another output is shown, the previous output will be hidden. To close an output, click the Close rich output button from the floating cell controls:
An object is considered to have HTML output if:
It has a method called _ipython_display_ which returns a string
It has a method called _repr_html_ which returns a string
It has a method called _repr_mimebundle_ which returns a dict with a key text/html
Any HTML output is contained within a sandboxed iframe, so some functionality may be restricted. Most existing packages which conform to the IPython standard methods described above should work with Count. If you encounter a package which does not work as expected, please contact Count support.
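As a sketch of the second case, the hypothetical Badge class below defines _repr_html_ and returns a string of markup. The class and its parameters are invented for illustration.

```python
from html import escape


class Badge:
    """Hypothetical object that exposes an HTML representation."""

    def __init__(self, label, colour="#2e7d32"):
        self.label = label
        self.colour = colour

    def _repr_html_(self):
        # Escape the label before embedding it in markup.
        return (
            f'<span style="color: {self.colour}; font-weight: bold">'
            f"{escape(self.label)}</span>"
        )


badge = Badge("Pipeline <healthy>")
badge  # final expression: an object with an HTML representation
```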
To load a module, just import it as usual and Count will attempt to automatically download it and make it available. The first import of a new package may take a few seconds for this reason.
As your Python code is running in your browser, there are some restrictions on the modules that you can load. Available packages include:
pandas
numpy
matplotlib
scipy
scikit-learn
Any module hosted on PyPI that is written in pure Python
Any module specifically built for running in the browser. Many popular data-focussed modules are already supported, with more on the way. See the full list here.
The popular requests and urllib3 modules are supported in Count, with the exception of streaming responses - responses are always loaded fully into memory.
When making a network request from Python, a description of the request is sent to a Count server, and the actual request is performed by an ephemeral virtual machine. The response is then proxied back through a Count server, and returned to your browser. Count does not read the response, though it may read the request to inject secrets (see below).
Maximum request payload limits apply, so network requests will fail if they attempt to send too much data.
If your project has any secrets configured, it is possible to use these secrets in network requests. Just format the secret name (not value!) using the secret method exposed by the count_requests module. The result of this method is just another string which looks like $$abc123$$, so it can be used in other places a string is expected. For example:
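A minimal sketch, assuming secret takes the secret name as its only argument, as the description above suggests. The secret name MY_API_TOKEN, the Bearer scheme and the helper functions are hypothetical, and the fallback stub exists only so the snippet can be tried outside Count.

```python
try:
    from count_requests import secret  # available inside Count, per the text above
except ImportError:
    def secret(name):
        """Stand-in for running outside Count; the placeholder format is illustrative only."""
        return f"$${name}$$"


def build_headers(secret_name):
    # The placeholder is an ordinary string, so it can be formatted into a header.
    return {"Authorization": f"Bearer {secret(secret_name)}"}


def fetch_items(url, secret_name):
    import requests  # imported lazily so build_headers can be tried without it

    response = requests.get(url, headers=build_headers(secret_name), timeout=30)
    response.raise_for_status()
    return response.json()


headers = build_headers("MY_API_TOKEN")  # hypothetical secret name
```

Building the Authorization header by hand keeps the placeholder intact, so it can still be recognised and replaced when the request reaches the Count server.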
There are some security considerations to note when using secrets in Python cells:
Secrets can only be used in network requests.
Secret values are inserted into the network request once it arrives at a Count server, and then sent to the URL specified in the request.
Secret values are not accessible from the Count app regardless of your permission level.
If you grant edit access to a canvas, you should assume that the editor will be able to access any secret you have defined by, for example, sending it to a URL that they control.
You should never directly enter a secret value into the text of a Python cell, as it will be visible to all viewers of that canvas, even if the cell input is hidden.
Be careful when using libraries which transform secrets. For example, instead of using the auth parameter in requests (which automatically base64-encodes its arguments), construct the Authorization header manually.
Unlike in a Jupyter notebook where cells execute top-to-bottom, in Count your Python cells execute in DAG order, just like SQL cells.
When opening a canvas containing Python cells, Count will:
Analyse all of the Python code and look for variable definitions and references.
Draw connector lines between cells where variables defined in one cell are referenced in another.
Download all of the imported modules and perform all of the imports.
Wait for any upstream SQL cells to finish executing, if any Python cells depend on them.
Execute the Python cells in order based on which variables they define.
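The analysis and ordering steps above can be sketched with the standard-library ast and graphlib modules. This is a simplified illustration, not Count's implementation: the cell names and sources are hypothetical, and the name analysis ignores function scopes, imports and attribute access.

```python
import ast
import builtins
from graphlib import TopologicalSorter

# Hypothetical cell sources, deliberately listed out of order.
cell_sources = {
    "report": "summary = cleaned_total * rate",
    "clean": "cleaned_total = sum(raw_values)",
    "load": "raw_values = [1, 2, 3]\nrate = 0.5",
}


def names_in(source):
    """Return (defined, referenced) names for one cell (simplified: ignores scopes)."""
    defined, referenced = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                referenced.add(node.id)
    # Names the cell defines itself, and builtins such as `sum`, are not dependencies.
    return defined, referenced - defined - set(dir(builtins))


analysis = {name: names_in(src) for name, src in cell_sources.items()}
defined_in = {var: cell for cell, (defined, _) in analysis.items() for var in defined}

# Each cell depends on the cells that define the names it references.
graph = {
    cell: {defined_in[var] for var in referenced if var in defined_in}
    for cell, (_, referenced) in analysis.items()
}
execution_order = list(TopologicalSorter(graph).static_order())
```

Note that defined_in keeps only one defining cell per variable, which mirrors why defining the same variable in several cells makes the resulting order hard to predict.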
Currently it is not possible to stop the execution of a single Python cell. Instead, you can choose to restart the Python kernel - this will clear all variables from memory and re-initialise the Python session, and all of your Python cells will be re-executed in DAG order.
When importing a Jupyter notebook, a common pitfall is to have variables defined multiple times in different cells. If this happens, the cells in Count may not execute in the order you expect (though it will be the same order every time the canvas is opened).
You can always check where Count thinks your variables have been defined by following the cell connector lines to their source:
Not all packages are supported in Python cells, but many are. You can import a package in a Python cell by importing it as usual. If that doesn't work, then there are some other options you can try.
Step 1: Use micropip
Micropip is a library for installing Python packages, and is used like:
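A hedged sketch of a typical micropip call, wrapped in a helper so that it degrades gracefully outside the browser. The helper name and the package name are hypothetical, and the usage comment assumes that top-level await is available in a Python cell.

```python
async def install_package(requirement):
    """Install a package with micropip; return False when micropip is unavailable."""
    try:
        import micropip  # ships with Pyodide-based environments, not standard Python
    except ImportError:
        return False
    await micropip.install(requirement)
    return True


# Inside a Python cell (assuming top-level await is supported there):
#     await install_package("example-package")
# which is equivalent to the direct form:
#     import micropip
#     await micropip.install("example-package")
```

micropip accepts standard requirement strings, so a pinned version such as "example-package==1.2.0" can be passed in the same way.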
If this step doesn't succeed, the cause might be:
The package contains non-Python native extensions, which need to be handled specially. Many popular packages have already been compiled to work in a web browser, but if not, the process is quite tricky even for Python experts - see the instructions here. In many cases, it will not be possible to support this package. To check whether or not a given package uses native extensions, follow the steps below.
The package depends on other packages with incompatible versions. In this case it is often possible to install the package by using an older version. Follow the steps below.
To determine if your package is supported, you can try following these steps:
Step 2: Find the package on PyPI.
You can use the built-in search function, or try a search engine query for "<package name> PyPI". An example page for a package looks like this:
Step 3: Find the .whl file for the package
Click the Download files link, and look under the Built Distributions list for a file name that contains none-any:
If the filename ends with none-any.whl, it means that the package is pure Python, and it should be possible to load it in Count.
If you can't find a .whl file that ends with none-any, then it is likely that the package contains non-Python extensions, and will need to be compiled to work in the browser.
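The filename check can also be expressed in code. The sketch below relies on the standard wheel naming scheme, where the last three dash-separated fields are the Python, ABI and platform tags. The function name and the example filenames are hypothetical.

```python
def is_pure_python_wheel(filename):
    """Return True when a wheel filename carries the none-any ABI and platform tags."""
    if not filename.endswith(".whl"):
        return False
    # Wheel names end with: <python tag>-<abi tag>-<platform tag>.whl
    parts = filename[: -len(".whl")].split("-")
    return len(parts) >= 5 and parts[-2:] == ["none", "any"]


# Hypothetical filenames for illustration.
pure = is_pure_python_wheel("example_pkg-1.2.0-py3-none-any.whl")
native = is_pure_python_wheel("example_pkg-1.2.0-cp311-cp311-manylinux_2_17_x86_64.whl")
```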
Step 4: Load the .whl file in Count
Copy the link to the .whl file (right-click > Copy link address), and import it in Count using micropip:
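A hedged sketch of installing from a wheel link. micropip accepts a wheel URL in place of a package name. The helper name is hypothetical, and the usage comment assumes that top-level await is available in a Python cell.

```python
async def install_wheel(wheel_url):
    """Install a pure-Python wheel from a direct URL (Pyodide-based environments only)."""
    if not wheel_url.endswith("-none-any.whl"):
        raise ValueError("expected a pure-Python wheel ending in none-any.whl")
    import micropip  # not available in standard Python

    await micropip.install(wheel_url)


# Inside a Python cell (assuming top-level await is supported there), pass the
# link copied from PyPI:
#     await install_wheel("<link copied from PyPI>")
```

The early check gives a clearer error than a failed install when the copied link points at a source archive or a platform-specific wheel.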
Step 5 (optional): Install an older package version
If the latest version of a package can't be installed, sometimes an earlier version can be. To find older package versions, click the Release history link and choose an older version. Then, follow steps 3 and 4 to try an older .whl file.
When executing a Python cell, the following steps are performed:
Imported packages are downloaded and installed (if needed)
Upstream cells are loaded into pandas DataFrames
Your Python code is executed
The result of your Python code is converted into a table to be displayed
Most of these steps should be quite quick. If a cell is slow, first try profiling your code to see if there are any particularly slow parts. For example, you could use the cProfile package to define a profile function:
which you would then use as follows:
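A minimal sketch covering both the helper and its use. The exact shape of profile is an assumption, and compile_many_patterns is a hypothetical workload chosen for this sketch.

```python
import cProfile
import io
import pstats
import re


def profile(fn, top=10):
    """Run fn() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


def compile_many_patterns():
    # Hypothetical workload: distinct patterns avoid the `re` module's cache.
    for i in range(2000):
        re.compile(f"item-{i}-[a-z]+")


report = profile(compile_many_patterns)
print(report)  # printed text appears in the cell's Logs output
```

Sorting by cumulative time puts the functions that account for the most total time, including their callees, at the top of the report.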
In this (trivial) example, it is clear that all of the time is spent in the regular expression constructor method __init__.
If you do not identify any long-running parts of your code, and are concerned that your Python cells are slow for other reasons, please contact Count support.