
Databricks

How to connect Databricks to Count.



Click "Connect a database"

Enter your connection details

You'll need:

  • Hostname and path of your Databricks cluster. Use this guide to obtain your hostname and path.

You may also need to ensure that your firewall accepts requests from the following IP addresses:

34.107.75.117
35.198.189.90
35.234.110.177
104.155.13.208

Authentication options

Access token

To connect using an access token, select this option from the Authentication options dropdown, enter the required values, and click Connect. Use this guide to obtain an access token.

OAuth

To connect using OAuth, you must first create a custom OAuth application in your Databricks account console:

  1. Give the connection a sensible name (e.g. Count-OAuth) and select "All APIs" for the Access scopes.

  2. Set the Redirect URL to https://count.co/connection-oauth. By default, the Refresh token TTL is set to 7 days, meaning that every 7 days Count's connection will need to be re-authorized (see below) to re-enable the connection to Databricks. Count recommends a value between 1 week (10080 minutes) and 90 days (129600 minutes) to avoid re-authorizing too often.

  3. Once the application is created, make a note of the Client ID and the Client secret, as these will be required to create the connection in Count.

Then create the connection in Count:

  4. Navigate to your Count workspace, click Connect a database, and choose Databricks.

  5. Select OAuth in the Authentication options dropdown, and enter your Host and the Client ID obtained in step 3.

  6. When you click Authorize, you will be redirected to your usual Databricks login flow (including any MFA required by your account).

  7. Once you have successfully authorized on the Databricks side, you will be automatically redirected back to the Connect to Databricks page in Count. You'll now be asked for the Client secret obtained in step 3, your Database name, and any other optional properties you wish to configure, including Refresh token validity (see below). The initial code retrieved from Databricks has a very short lifetime (1 minute), so make sure you complete this step within that time.

  8. Click Connect to complete the connection.

Set the value of Refresh token validity in your Count Databricks connection to 60 × the Refresh token TTL (i.e. the TTL expressed in seconds) to receive email reminders when re-authorization is due.
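For example (a worked illustration rather than something from this guide): a Refresh token TTL of 10080 minutes corresponds to a Refresh token validity of 10080 × 60 = 604800 seconds.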

If an error is encountered during step 7, you will need to re-authorize once again (steps 5 - 7).

Re-authorization

The refresh token obtained by Count from Databricks during step 7 enables Count to obtain a short-lived (60 minutes by default) session token from Databricks to, amongst other things, run queries against your database. The refresh token itself expires after the Refresh token TTL set above, so the connection will need to be re-authorized whenever it is due to expire.

Navigate to your Databricks connection settings page, and click Re-authorize to generate a new refresh token (steps 5 - 7 above).

OAuth Service Principal

To connect using an OAuth Service Principal:

  1. Create a Service Principal in the Databricks account console.

  2. Assign workspace level permissions to that Service Principal.

  3. Create an OAuth secret for that Service Principal, making a note of the Client ID and Client Secret.

  4. In Count, enter your Host, Path, Client ID, and Client Secret (and any other optional parameters), then click Connect.

Connect

Once you hit Connect, you will see your connection details, including a full list of the tables accessible with this connection.
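If tables you expect to see are missing, it can help to compare against what the warehouse itself reports. A minimal sketch you could run in the Databricks SQL editor (my_schema is a hypothetical name, and SHOW TABLES is standard Databricks SQL rather than anything Count-specific):

SHOW TABLES IN my_schema -- lists the tables visible to the authenticated principal in that schema

If a table appears here but not in Count, the difference may come down to the permissions granted to the credentials used in the connection.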

Troubleshooting

The most common issue when connecting to a Databricks cluster is that the warehouse is unresponsive because it has stopped due to inactivity. Count's connecting and querying logic will retry every 10 seconds for 3 minutes. If Count is still unable to connect to your cluster after this time, a 503 error will be returned. Ensure that your warehouse is running and available to accept connections from the Count IP addresses (see above) and retry the connection in Count. If you are still having trouble connecting, reach out to us to schedule a support session.
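If the warehouse has auto-stopped, one way to wake it before retrying in Count (a minimal sketch, not a Count feature; run it from the Databricks SQL editor) is to submit any cheap query and wait for it to return:

SELECT 1 AS warehouse_check -- submitting a query starts a stopped warehouse; once it returns, retry the connection in Count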

Limitations

BigInt

The BigInt data type in Databricks can accommodate integers in the range -2^63 to 2^63 - 1. Due to a limitation in the Databricks client, Count can accommodate a reduced range of -2^53 to 2^53 - 1 for this data type. Values outside of this reduced range will be displayed as null values.
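If you need to work with values outside that reduced range in Count, one possible workaround (our suggestion rather than something prescribed by this guide; my_table and big_id are hypothetical names) is to cast the column in your SQL cell so the exact 64-bit value survives the client as text:

SELECT CAST(big_id AS STRING) AS big_id_str -- the exact value is preserved as text
FROM my_table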

VALUES clause

Count uses a reduced parse to determine which cells are referenced in your SQL statements. Databricks' SELECT statement and querying syntax is more flexible than that of other databases. As a result, there is an unresolvable ambiguity in Count's reduced parse for statements of the form

SELECT * FROM VALUES 1, 2, 3

Running a SQL statement similar to this will result in an error such as "Unexpected number", "Unexpected literal", etc. The workaround for this issue is to parenthesise the row values in the VALUES clause, e.g.

SELECT * FROM VALUES (1), (2), (3)
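The same workaround extends to multi-column inline tables; a small sketch (the alias t(id, label) is just an illustrative name, not from this guide):

SELECT * FROM VALUES (1, 'a'), (2, 'b') AS t(id, label) -- each row wrapped in parentheses, matching the workaround above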
