Google BigQuery
What Is Google BigQuery?
Google BigQuery is a fully-managed, server-less data warehouse that enables scalable and fast SQL queries using the processing power of Google's infrastructure. It is designed to handle large datasets, allowing users to analyze terabytes of data in seconds and petabytes in minutes.
MetaRouter provides an efficient Extract, Transform and Load (ETL) platform that streams structured customer data into BigQuery in real-time. Ensure that data is always structured properly with MetaRouter' parameter-by-parameter data mapping and transformation playbooks.
What are the benefits of integrating MetaRouter with Google BigQuery?
- Private data stream that can never be co-mingled with anyone else's data.
- In-stream transformations providing highly structured data to the warehouse, improving data availability and reducing costly errors.
- Data streamed into BigQuery near-real-time - no batching required.
- Addressable third-party IDs are provided within your warehouse when paired with the Sync Injector.
Important Notes and Limitations
-
Event Name and Table Matching:
-
The event name specified in MetaRouter will match the table name inside of BigQuery. Changing the name of the event will create a new table.
- If you select
false
forBIG_QUERY_OPTIONS
under the connection parameters, then you do not have to givebigquery.datasets.create
permission in the IAM role--the integration will not create a new dataset, and no events will be delivered if the event does not exist.
- If you select
-
We cannot route events to custom names tables.
-
-
Event Table Schema Consistency:
- Once an event table is created in BigQuery, its schema must remain consistent. MetaRouter does not programmatically delete or alter the structure of existing event tables to prevent data inconsistencies and ensure data integrity. This means any changes to the event structure should be carefully planned and manually handled.
- MetaRouter cannot alter or delete existing columns, but we can add new columns based on new parameters being included in an event payload and mapped to the destination. This allows for extending the data model as new requirements emerge, without disrupting the existing data flow. However, ensure that new columns follow the established schema and data type conventions to maintain consistency.
-
Data Type Consistency:
- It is crucial to maintain data type consistency for each field across all events. For instance, if a field is initially set as a string, all subsequent events must adhere to this data type. Inconsistent data types can cause ingestion errors and complicate data analysis. Establishing and following a strict schema ensures smooth data processing and accurate analysis.
Getting Started - From Your Google BigQuery Account
Gather your Google Project ID:
- Navigate to the Google Cloud Console:
- Go to the Google Cloud Console.
- Select Your Project:
- Click on the project drop-down menu at the top of the page. If you have multiple projects, you can select the appropriate one from this list. If you don’t have a project, you can create a new one by clicking on "New Project".
- Locate the Project ID:
- Once you have selected your project, the Project ID will be displayed on the project dashboard. It is typically in the format of
my-project-id
.
- Once you have selected your project, the Project ID will be displayed on the project dashboard. It is typically in the format of
Set up a service account and gather credentials:
- Create a Service Account:
- Navigate to the Google Cloud Console.
- Go to "IAM & Admin" section and select "Service Accounts".
- Click "Create Service Account", enter a name, and provide a description.
- Please be sure to associate this account with the project associated with the project ID you will use for this integration.
- Grant Permissions:
- Assign the necessary roles to your service account. You will need to grant at least the "BigQuery Data Editor" roles. Please see more information on user roles below:
- BigQuery IAM Roles and Permissions - If you select
BIG_QUERY_OPTIONS
asfalse
in the connection parameters, then you do not have to give permission to create a dataset. - Control access to resources with IAM
- BigQuery IAM Roles and Permissions - If you select
- Assign the necessary roles to your service account. You will need to grant at least the "BigQuery Data Editor" roles. Please see more information on user roles below:
- Create and Download Key File:
- After creating the service account, go to the "Keys" section and add a key.
- Select "Create new key", choose "JSON", and download the key file. This JSON file contains the credentials needed to authenticate your application with BigQuery.
- Use the Key File as
CREDENTIALS
:- Upload the key file to your MetaRouter integration configuration as part of the setup process for BigQuery.
Getting Started - From Your MetaRouter Account
Adding a Google BigQuery integration.
From the integration library, add a Google BigQuery integration. Then, fill out the Connection Parameters:
Connection Parameter | Description |
---|---|
BIG_QUERY_OPTIONS | Select true if you would like the integration to create a dataset if it does not exist for an event. If you select false , then you do not have to give bigquery.datasets.create permission in the IAM role--the integration will not create a new dataset, and no events will be delivered if the event does not exist. |
CREDENTIALS | JSON key from file downloaded from Google Cloud Console. |
DATASET | Name of dataset. If inputting a name of a dataset that doesn’t exist, then the integration will create a new dataset with the given name. Please follow dataset naming best practices as outlined https://cloud.google.com/bigquery/docs/datasets#dataset-naming. |
PROJECT_ID | Project ID gathered from Google Cloud Console. |
REGION | Location specified for your BigQuery dataset. ENUM accepting string values, please see https://cloud.google.com/bigquery/docs/locations#regions. |
Event Mappings
Due to the complexity of event mapping in BigQuery, it is highly recommended to reach out to the MetaRouter team and collaborate with a solutions engineer. While MetaRouter includes recommended parameters under Global mapping, you will need to accurately map events when initially building out the integration. This collaboration ensures precise and efficient configuration tailored to your specific needs.
Global
Global mappings will be applied to all events. If your parameter names do not match the Expected Inputs provided, you will need to overwrite the Inputs provided with your own.
Ensuring event and data consistency is crucial when using BigQuery with MetaRouter. Please see Notes and Limitations above.
Output Key | Description | Expected Input |
---|---|---|
id | String: A unique identifier for the message. This field ensures each event can be uniquely identified and tracked. | messageId |
user_id | String: A unique identifier for the user. | userId |
previous_id | String: The identifier for the previous event. | previousId |
anonymous_id | String: An anonymous identifier for the user. | anonymousId |
original_timestamp | Timestamp: The original timestamp of the event. | originalTimestamp |
received_at | Timestamp: The timestamp when the event was received. | receivedAt |
sent_at | Timestamp: The timestamp when the event was sent. | sentAt |
timestamp | Timestamp: The timestamp of the event. | timestamp |
integrations | String: Integration-specific options or overrides. | integrations |
type | String: The type of event (e.g., identify, track). | type |
event | String: The name of the event. | event |
name | String: The name associated with the event. | name |
category | String: The category associated with the event. | category |
— | The context object in MetaRouter provides useful information about the environment in which an event occurred. This can include details about the page, device, browser, location, and more. | context |
— | Traits are attributes that describe a user or a group. When you call the identify method, you can pass in a traits object that includes key-value pairs of user information such as name, email, phone number, etc. | traits |
— | Properties are specific to events and provide additional context about the event itself. For instance, in a purchase event, properties might include details like the order value, product ID, quantity, and payment method. These are different from traits because they are transient and event-specific rather than being persistent attributes of a user. | properties |
Additional Google BigQuery Documentation
Google BigQuery - Data Types
Updated about 1 month ago