LogoLogo
  • Contributing to RudderStack
  • Destination_Name
  • LICENSE
  • RudderStack Docs
  • docs
    • FAQ
    • Identity Resolution
    • Home
    • cloud-extract-sources
      • ActiveCampaign Source
      • Bing Ads
      • Chargebee
      • Common Settings
      • Facebook Ads
      • Freshdesk
      • Google Ads Source
      • Google Analytics
      • Google Search Console
      • Google Sheets
      • Cloud Extract Sources
      • Intercom v2
      • Intercom
      • Mailchimp
      • Marketo
      • Mixpanel
      • NetSuite
      • Pipedrive
      • QuickBooks
      • Salesforce Pardot
      • Sendgrid Source
      • Stripe Source
      • Xero
      • Zendesk Chat
      • Zendesk
      • hubspot
        • HubSpot Data Model and Schema Information
        • HubSpot
      • salesforce
        • Salesforce
        • Schema Comparison: RudderStack vs. Segment
    • connections
      • Connection Modes: Cloud Mode vs. Device Mode
    • data-governance
      • Data Governance
      • RudderTyper
      • Data Governance API
      • RudderTyper
      • tracking-plans
        • Tracking Plans
        • Tracking Plan Spreadsheet
    • data-warehouse-integrations
      • Amazon Redshift
      • Azure Data Lake
      • Azure Synapse
      • ClickHouse
      • Databricks Delta Lake
      • Google Cloud Storage Data Lake
      • Google BigQuery
      • Identity Resolution
      • Warehouse Destinations
      • Microsoft SQL Server
      • PostgreSQL
      • Amazon S3 Data Lake
      • Snowflake
      • FAQ
      • Warehouse Schema
    • destinations
      • Destinations
      • Webhooks
      • advertising
        • Bing Ads
        • Criteo
        • DCM Floodlight
        • Facebook App Events
        • Facebook Custom Audience
        • Facebook Pixel
        • Google Ads (gtag.js)
        • Google AdWords Enhanced Conversions
        • Google Adwords Remarketing Lists (Customer Match)
        • Advertising
        • LinkedIn Insight Tag
        • Lotame
        • Pinterest Tag
        • Reddit Pixel
        • Snap Pixel
        • TikTok Ads
      • analytics
        • Amplitude
        • AWS Personalize
        • Chartbeat
        • Firebase
        • FullStory
        • Google Analytics 360
        • Google Analytics
        • Heap.io
        • Hotjar
        • Analytics
        • Indicative
        • Keen
        • Kissmetrics
        • Kubit
        • Lytics
        • Mixpanel
        • Pendo
        • PostHog
        • Quantum Metric
        • Singular
        • adobe-analytics
          • Adobe Analytics Heartbeat Measurement
          • Mobile Device Mode Settings
          • Web Device Mode Settings
          • E-commerce Events
          • Adobe Analytics
          • Setting Up Adobe Analytics in RudderStack
        • google-analytics-4
          • Cloud Mode
          • Device Mode
          • Google Analytics 4
          • Setting up Google Analytics 4
        • profitwell
          • ProfitWell
          • Cloud Mode
          • Device Mode
      • attribution
        • Adjust
        • AppsFlyer
        • Branch
        • Attribution
        • Kochava
        • TVSquared
      • business-messaging
        • Business Messaging
        • Intercom
        • Kustomer
        • Slack
        • Trengo
      • continuous-integration
        • Visual Studio App Center
        • Continuous Integration
      • crm
        • Delighted
        • HubSpot
        • CRM
        • Salesforce
        • Variance
        • Zendesk
      • customer-data-platform
        • Customer Data Platform
        • Segment
      • error-reporting
        • Bugsnag
        • Error Reporting
        • Sentry
      • marketing
        • ActiveCampaign
        • AdRoll
        • Airship
        • Appcues
        • Autopilot
        • Blueshift
        • Braze
        • CleverTap
        • Customer.io
        • Gainsight PX
        • Gainsight
        • Marketing
        • Iterable
        • Klaviyo
        • Leanplum
        • Mailchimp
        • Marketo Lead Import
        • Marketo
        • MoEngage
        • Ometria
        • Pardot
        • Post Affiliate Pro
        • Qualtrics
        • SendGrid
        • Salesforce Marketing Cloud
        • Userlist
        • drip
          • Cloud Mode
          • Device Mode
          • Drip
          • Setting Up Drip in RudderStack
      • productivity
        • Google Sheets
        • Productivity
      • storage-platforms
        • Amazon S3
        • DigitalOcean Spaces
        • Google Cloud Storage
        • Storage Platforms
        • Azure Blob Storage
        • MinIO
        • Redis
      • streaming-platforms
        • Amazon EventBridge
        • Amazon Kinesis Firehose
        • Amazon Kinesis
        • Azure Event Hubs
        • BigQuery Stream
        • Confluent Cloud
        • Google Pub/Sub
        • Streaming Platforms
        • Apache Kafka
      • tag-managers
        • Google Tag Manager
        • Tag Managers
      • testing-and-personalization
        • Algolia Insights
        • Candu
        • Google Optimize
        • A/B Testing & Personalization
        • LaunchDarkly
        • Monetate
        • Optimizely Full Stack
        • Optimizely Web
        • Split.io
        • Statsig
        • VWO (Visual Website Optimizer)
    • get-started
      • RudderStack Cloud vs. RudderStack Open Source
      • Glossary
      • Get Started
      • RudderStack Architecture
    • reverse-etl
      • Amazon Redshift
      • Amazon S3
      • ClickHouse
      • FAQ
      • Google BigQuery
      • Reverse ETL
      • PostgreSQL
      • Snowflake
      • common-settings
        • Importing Data using Models
        • Importing Data using Tables
        • Common Settings
        • Sync Modes
        • Sync Schedule
      • features
        • Airflow Provider
        • Features
        • Models
        • Visual Data Mapper
    • rudderstack-api
      • Data Regulation API
      • HTTP API
      • RudderStack API
      • Personal Access Tokens
      • Pixel API
      • Test API
      • api-specification
        • Application Lifecycle Events Specification
        • API Specification
        • Video Events Specification
        • rudderstack-ecommerce-events-specification
          • Browsing
          • Coupons
          • E-Commerce Events Specification
          • Ordering
          • Promotions
          • Reviewing
          • Sharing
          • Wishlist
        • rudderstack-spec
          • Alias
          • Common Fields
          • Group
          • Identify
          • RudderStack Event Specification
          • Page
          • Screen
          • Track
    • rudderstack-cloud
      • Audit Logs
      • Dashboard Overview
      • Destinations
      • RudderStack Cloud
      • Live Events
      • Connection Modes: Cloud Mode vs. Device Mode
      • Sources
      • Teammates (User Management)
      • connections
        • Adding a Destination
        • Connections
    • rudderstack-open-source
      • Control Plane Setup
      • RudderStack Open Source
      • installing-and-setting-up-rudderstack
        • Developer Machine Setup
        • Docker
        • Data Plane Setup
        • Kubernetes
        • Sending Test Events
    • stream-sources
      • App Center
      • AppsFlyer
      • Auth0
      • Braze
      • Customer.io
      • Extole
      • Event Stream Sources
      • Iterable
      • Looker
      • PostHog
      • Segment
      • Shopify
      • Webhook Source
      • rudderstack-sdk-integration-guides
        • Client-side Event Filtering
        • SDKs
        • AMP Analytics
        • Cordova
        • .NET
        • Go
        • Java
        • Node.js
        • PHP
        • Python
        • React Native
        • Ruby
        • Rust
        • Unity
        • SDK FAQs
        • rudderstack-android-sdk
          • Adding Application Class
          • Flushing Events Periodically
          • Android
        • rudderstack-flutter-sdk
          • Flutter SDK v1
          • Flutter v2
          • Flutter
        • rudderstack-ios-sdk
          • iOS
          • tvOS
          • watchOS
        • rudderstack-javascript-sdk
          • Data Storage in Cookies
          • Detecting Ad-blocked Pages
          • JavaScript
          • JavaScript SDK Enhancements
          • JavaScript SDK FAQs
          • Querystring API
          • Quick Start Guide
          • Version Migration Guide
          • consent-managers
            • Consent Managers
            • OneTrust
    • transformations
      • Access Token
      • FAQ
      • Transformations
      • Transformations API
    • user-guides
      • User Guides
      • administrators-guide
        • Troubleshooting Guide
        • Alerting Guide
        • Bucket Configuration Settings for Event Backups
        • Configuration Parameters
        • Event Replay
        • High Availability
        • Horizontal Scaling
        • Administrator's Guides
        • Infrastructure Provisioning
        • Monitoring and Metrics
        • Okta SSO Setup
        • OneLogin SSO Setup
        • RudderStack Grafana Dashboard
        • Software Releases
      • how-to-guides
        • How to Use Custom Domains
        • How to Develop Integrations for RudderStack
        • How to Configure a Destination via the Event Payload
        • How to Filter Events using Different Methods
        • How to Filter Selective Destinations
        • How to Submit a Pull Request for a New Integration
        • How-to Guides
        • How to Debug Live Destination Events
        • How to Use AWS Lambda Functions with RudderStack
        • create-a-new-destination-transformer-for-rudder
          • Best Practices for Coding Transformation Functions in JavaScript
          • How to Create a New Destination Transformation for RudderStack
        • implement-native-js-sdk-integration
          • How to Add a Device Mode SDK to RudderStack JavaScript SDK
          • How to Implement a Native JavaScript SDK Integration
        • rudderstack-jamstack-integration
          • How to Integrate RudderStack with Your JAMstack Site
          • How to Integrate Rudderstack with Your Angular App
          • How to Integrate Rudderstack with Your Astro Site
          • How to Integrate Rudderstack with Your Eleventy Site
          • How to Integrate Rudderstack with Your Ember.js App
          • How to Integrate Rudderstack with a Gatsby Website
          • How to Integrate Rudderstack with a Hugo Site
          • How to Integrate Rudderstack with Your Jekyll Site
          • How to Integrate Rudderstack with Your Next.js App
          • How to Integrate Rudderstack with Your Nuxt.js App
          • How to Integrate Rudderstack with Your Svelte App
          • How to Integrate Rudderstack with Your Vue App
      • migration-guides
        • Migrating from Blendo to RudderStack
        • Migrating Your Warehouse Destination from Segment to RudderStack
        • Migration Guides
        • Migrating from Segment to RudderStack
  • src
    • @rocketseat
      • gatsby-theme-docs
        • text
          • Home
Powered by GitBook
On this page
  • Setting up the BigQuery project
  • Setting up the service account for RudderStack
  • IPs to be whitelisted
  • Configuring Google BigQuery destination in RudderStack
  • Connection settings
  • Schema, partitioned tables, views, and deduplication
  • Partitioned tables and views
  • Deduplication method
  • Switching from deduplication to partitioned tables
  • FAQs
  • How are reserved words handled by RudderStack?
  • When sending data into a data warehouse, how can I change the table where this data is sent?
  • I'm looking to send data to BigQuery through RudderStack and I'm trying to understand what data is populated in each column. How do I go about this?
  • I am trying to load data into my BigQuery destination and I get the error "Cannot read and write in different locations". What should I do?
  • When piping data to a BigQuery destination, I can set the bucket but not a folder within the bucket. Is there a way to put Rudderstack data in a specific bucket folder?
  • Does open source RudderStack support near real-time syncing to BigQuery and event replay?
  • What is the current sync frequency for BigQuery?
  • Do I need to stop the running pipeline to change my sync frequency? Or will the new change be effective even without stopping the pipeline?
  • When configuring the BigQuery destination, where does Google use the credentials JSON from?
  • When configuring the BigQuery destination, should the user permissions be set for the specific dataset or the whole project?
  • How long are the failed syncs retried before being aborted?
  • Contact us

Was this helpful?

  1. docs
  2. data-warehouse-integrations

Google BigQuery

Step-by-step guide on setting up Google BigQuery as a destination in RudderStack.

PreviousGoogle Cloud Storage Data LakeNextIdentity Resolution

Last updated 2 years ago

Was this helpful?

is an industry-leading, fully-managed cloud data warehouse that lets you efficiently store and analyze petabytes of data.

RudderStack lets you configure BigQuery as a destination to which you can send your event data seamlessly.

Refer to the guide for more information on how the events are mapped to the tables in BigQuery.

Find the open-source transformer code for this destination in the .

Setting up the BigQuery project

Before you set up BigQuery as a destination in RudderStack, follow these steps to set up your BigQuery project:

  1. Create a Google Cloud Platform (GCP) project if you don't have one already. For more details, refer to this .

  2. Enable the BigQuery API for your existing project if it's not done already. For more information, refer to this .

  3. Log into your . Copy the project ID as shown in the following image:

This project ID is required for configuring BigQuery as a destination in RudderStack.

To enable RudderStack to load data into your BigQuery cluster, make sure that is enabled for the project.

  1. Create a new Google Cloud Storage (GCS) bucket or provide an existing one to store files before loading the data into your BigQuery instance.

To make sure that the data loads from GCS to BigQuery, co-locate your GCS storage bucket with BigQuery. For more information, refer to the .

Setting up the service account for RudderStack

For RudderStack to successfully send events to your BigQuery instance, you also need to set up a service account with the necessary permissions.

Make sure that you create the service account for the same BigQuery project.

Follow the steps below to set up a service account:

  1. Create a new service account by going to IAM & Admin > Service Accounts.

  2. Add the service account permissions as specified below:

  • Add the Storage Object Creator and Storage Object Viewer roles to the account.

  • Add the BigQuery Job User and BigQuery Data Owner roles to the account.

If a dataset name (configurable by the setting: Namespace in the RudderStack dashboard) already exists, the role of BigQuery Data Editor would suffice instead of BigQuery Data Owner.

  1. Create a key for the service account with JSON as the type and store it.

  1. Create and download the private JSON key required for configuring BigQuery as a destination in RudderStack, as shown:

IPs to be whitelisted

You will need to whitelist the following RudderStack IPs to enable network access:

  • 3.216.35.97

  • 34.198.90.241

  • 54.147.40.62

  • 23.20.96.9

  • 18.214.35.254

  • 35.83.226.133

  • 52.41.61.208

  • 44.227.140.138

  • 54.245.141.180

  • 3.66.99.198

  • 3.64.201.167

If you have your deployment in the EU region, you can whitelist only the following two IPs:

  • 3.66.99.198

  • 3.64.201.167

All the outbound traffic is routed through these RudderStack IPs.

Configuring Google BigQuery destination in RudderStack

To send event data to SQL Server, you first need to add it as a destination in RudderStack and connect it to your data source. Once the destination is enabled, events will automatically start flowing to SQL Server via RudderStack.

To configure SQL Server as a destination in RudderStack, follow these steps:

  1. Assign a name to your destination and then click on Next.

Connection settings

Enter the following credentials in the Connection Credentials page:

  • Project: The GCP project ID where the BigQuery database is located.

  • Location: The GCP region for your dataset.

  • Prefix: If specified, RudderStack creates a folder in the bucket with this prefix and pushes all the data within that folder.

  • Namespace: Enter the schema name where RudderStack will create all the tables. If you don't specify any namespace, RudderStack will set this to the source name, by default.

  • Sync Frequency: Specify how often RudderStack should sync the data to your BigQuery dataset.

  • Sync Starting At: This optional setting lets you specify the particular time of the day (in UTC) when you want RudderStack to sync the data to BigQuery.

  • Exclude Window: This optional setting lets you set a time window when RudderStack will not sync the data to your database.

Schema, partitioned tables, views, and deduplication

RudderStack uses the source name (written in snake case, for example, source_name) to create a dataset in BigQuery.

RudderStack supports two modes by which data is ingested into BigQuery:

  • Partitioned tables (default method)

  • Deduplication

By default, RudderStack uses the partitioned tables method to ingest data into BigQuery. Deduplication is disabled.

Ingestion method

RSERVER_WAREHOUSE_BIGQUERY_IS_DEDUP_ENABLED

Partitioned tables

false

Deduplication

true

Partitioned tables and views

RudderStack creates ingestion-time partition tables based on the load date, so you can take advantage of it to query a subset of data.

Duplicate data is not discarded when loading it into BigQuery.

In addition to tables, RudderStack creates a view (<table_name>_view) for every table for de-duplication purposes.

Users can modify the view query to change the time window of the view. The default value is set to 60 days.

Deduplication method

In this method, RudderStack automatically discards any duplicate events while loading them into the BigQuery tables. To enable this method, set the environment variable RSERVER_WAREHOUSE_BIGQUERY_IS_DEDUP_ENABLED in your RudderStack data plane setup to true.

Views will not be created for the tables if you choose the deduplication method.

Note that the querying cost will increase if you choose this method, as RudderStack removes the duplicates from the tables through sequential scans that happen on every warehouse upload. For example, if your warehouse upload frequency is set to every 30 minutes, RudderStack performs a full scan on the tracks table every 30 minutes to upload any new events.

Switching from deduplication to partitioned tables

You can easily switch from using the deduplication method to ingest your data in BigQuery to the partitioned tables by setting the environment variable RSERVER_WAREHOUSE_BIGQUERY_IS_DEDUP_ENABLED in your RudderStack setup to false.

RudderStack then does the following:

  • Creates users_view while loading the users table, if not already present.

  • Creates the views for the new track events.

FAQs

How are reserved words handled by RudderStack?

Besides, integers are not allowed at the start of the schema or table name. Hence, such schema, column or table names will be prefixed with a _.

For instance, '25dollarpurchase' will be changed to '_25dollarpurchase'.

When sending data into a data warehouse, how can I change the table where this data is sent?

By default, RudderStack sends the data to the table/dataset based on the source it is connected to. For example, if the source is Google Tag Manager, RudderStack sets the schema name as gtm_*. However, you can override this behavior by setting the Namespace field in the BigQuery destination settings, as shown:

I'm looking to send data to BigQuery through RudderStack and I'm trying to understand what data is populated in each column. How do I go about this?

I am trying to load data into my BigQuery destination and I get the error "Cannot read and write in different locations". What should I do?

Make sure that both your BigQuery dataset and the bucket have the same region.

When piping data to a BigQuery destination, I can set the bucket but not a folder within the bucket. Is there a way to put Rudderstack data in a specific bucket folder?

Yes, you can set the desired folder name in the Prefix field while setting up your BigQuery destination in RudderStack.

Does open source RudderStack support near real-time syncing to BigQuery and event replay?

The near-realtime BigQuery syncing feature is currently under development and is planned to be released in the coming months. Unfortunately, Event Replay is not a part of open-source RudderStack currently.

What is the current sync frequency for BigQuery?

If you're using open source RudderStack, the minimum sync frequency is 30 minutes. If you're self-hosting the data plane or using RudderStack Cloud Pro / Enterprise, you can tweak the config to set uploadFreqInS in config.yaml to 0 and determine the best possible value for near real-time sync.

Do I need to stop the running pipeline to change my sync frequency? Or will the new change be effective even without stopping the pipeline?

To change the sync frequency, you need not stop the pipeline.

When configuring the BigQuery destination, where does Google use the credentials JSON from?

When configuring the BigQuery destination, should the user permissions be set for the specific dataset or the whole project?

You need to set the user permissions for the whole project. Otherwise, you may encounter issues.

How long are the failed syncs retried before being aborted?

Contact us

In your , set up the data source. Then, select BigQuery from the list of destinations.

Staging GCS Storage Bucket Name: The name of the storage bucket as specified in the section.

Credentials: Your GCP service account credentials JSON as created in the section.

For more details on the tables and columns created by RudderStack, refer to the guide.

If you are using or hosting the RudderStack data plane, you can manage these modes via the environment variable RSERVER_WAREHOUSE_BIGQUERY_IS_DEDUP_ENABLED, as shown in the following table:

You can find more details on the BigQuery partitioned tables in . For information on how RudderStack creates these tables on load, refer to the section of the BigQuery documentation.

More information on views can be found in this .

It is highly recommended that you use the corresponding view (containing the events from the last 60 days) to avoid duplicate events in your query results. Since BigQuery are merely logical views and are not cached, you can create a native table from it to save money - by avoiding running the query that defines the view every time.

All the other skipped views are not regenerated. You can either manually create those views or to get them created.

There are some limitations when it comes to using in a schema, table, or column names. If such words are used in event names, traits or properties, they will be prefixed with a _when RudderStack creates tables or columns for them in your schema.

Refer to the documentation for details on how RudderStack generates the schema in the warehouse and populates the data in each column.

For more information, refer to this .

BigQuery uses the credentials JSON from the dashboard configuration when setting up the destination. For more information, refer to the section.

RudderStack retries the failed syncs for up to 3 hours before aborting them. For more information, refer to this .

For queries on any of the sections covered in this guide, you can or start a conversation in our community.

RudderStack dashboard
Setting up the BigQuery Project
Warehouse Schema
open source RudderStack
BigQuery documentation
Creating partitioned tables
BigQuery guide
views
contact us
reserved words
RudderStack Warehouse Schema
FAQ
FAQ
contact us
Slack
Setting up the service account for RudderStack
Setting up the service account for RudderStack
Google BigQuery
Warehouse Schemas
GitHub repository
guide
BigQuery guide
BigQuery console
billing
BigQuery documentation
BigQuery connection settings
BigQuery connection settings
Namespace
Create a key
JSON key required for the RudderStack UI
Copy the Project ID