Databricks Integration Docs

Summary

The Databricks Data Intelligence Platform allows client’s to use data and AI. It’s built on a lake-house to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. MetaRouter allows clients to pass event and identity to their chosen cloud storage account in real time. By ensuring that compliant and correctly configured event data goes into the cloud data storage, and then to the data warehouse, MetaRouter can reduce processing time, post-processing data loops, and improve overall schema quality. MetaRouter can also offer enterprise level client access to additional vendor ID’s that might be used for attribution, identity graphing or 2nd party data sales.

How Does It Work?

MetaRouter, like most warehouse implementations, will pass the your events to their chosen cloud data storage account, where it can then be loaded into Databricks via their own connectors. You will need to pick your desired cloud storage account, connect your event data to that storage via a MetaRouter integration, and then within the cloud storage account and Databricks, connect the data to your warehouse. Using these steps clients can quickly pipeline their eventing data from their digital properties into Databricks for warehousing use-cases.

Integration Setup

Step 1:

Chose and Setup Cloud Storage Account:

Setup Amazon’s S3 - https://aws.amazon.com/pm/serv-s3/
Setup Google Cloud Storage - https://support.google.com/cloud/answer/6250993?hl=en
Setup Microsoft Azure - https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-setup-guide/

Step 2:

Once your desired cloud storage account is chosen and setup, you can use MetaRouter to pipe data into the cloud storage account. Our integrations provide strong flexibility to shape and transform your event data when loading into these Data Lake options, allowing for a cleaner, more consistent schema and warehouse structure when that data is pulled into Databricks.

We provide integrations for all three of the storage options listed above, and instructions for how to configure those integrations within our platform can be found here:

Setup the MetaRouter Amazon’s S3 integrations - https://docs.metarouter.io/docs/amazon-s3
Setup the MetaRouter Google Cloud Storage integration - https://docs.metarouter.io/docs/google-cloud-storage
Setup the MetaRouter Microsoft Azure Blob integration - https://docs.metarouter.io/docs/blob

Step 3:

Configure Databricks to read and consume event data via your chosen cloud storage account:

Configure Databricks to Consume Amazon’s S3 - https://docs.databricks.com/en/storage/amazon-s3.html
Configure Databricks to Consume Google Cloud Storage - https://docs.databricks.com/en/storage/gcs.html
Configure Databricks to Consume Microsoft Azure - https://docs.databricks.com/en/storage/azure-storage.html

And you’re done! This will allow data to flow from your digital property, process through MetaRouter’s cloud storage integration, and ultimately be consumed (on a schedule of your choosing within Databricks’ UI) into your Databricks warehouse.