# BigQuery

Loads data from a BigQuery table using timestamp-based tracking and scheduled ingestion.

## BigQuery Input

The BigQuery Input allows you to read data from BigQuery tables or custom queries incrementally.
## Requirements

### Setting up a Service Account
- Create a new service account:

  ```shell
  # Set your project ID
  export PROJECT_ID="your-project-id"

  # Create the service account
  gcloud iam service-accounts create bigquery-reader \
    --description="Service account for BigQuery data reading" \
    --display-name="BigQuery Reader"
  ```
- Grant the required permissions:

  ```shell
  # Grant BigQuery Data Viewer role (for reading data)
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"

  # Grant BigQuery Job User role (for executing queries)
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
  ```
- Create and save the service account key (credentials):

  ```shell
  # Create and save the key file
  gcloud iam service-accounts keys create bigquery-reader-credentials.json \
    --iam-account=bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com
  ```

This creates a `bigquery-reader-credentials.json` file in your current directory. Use the contents of this file as the value for the `credentials_json` secret in your input configuration.

Note: Keep this credentials file secure and never commit it to version control.
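Before pasting the key file's contents into the `credentials_json` secret, it can be worth a quick sanity check that it really is a service account key. The helper below is a minimal sketch (the function name is ours, not part of the connector); the field names it checks are the standard ones found in every Google Cloud service account key file.

```python
import json

# Fields present in every Google Cloud service account key file
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_credentials_json(raw):
    """Validate that `raw` looks like a service account key; return the parsed dict."""
    creds = json.loads(raw)
    missing = REQUIRED_FIELDS - creds.keys()
    if missing:
        raise ValueError(f"credentials are missing fields: {sorted(missing)}")
    if creds.get("type") != "service_account":
        raise ValueError("'type' must be 'service_account'")
    return creds
```

A common failure mode this catches is accidentally pasting an OAuth client file (whose `type` differs) instead of a service account key.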
## Configuration

The BigQuery Input is configured using the following settings:

### Settings
| Setting | Type | Required | Description |
|---|---|---|---|
| project | string | Yes | The Google Cloud Project ID where your BigQuery dataset is located |
| dataset | string | Yes | The name of the BigQuery dataset containing your table |
| table | string | No | The name of the table to read from (required if query is not provided) |
| timestamp_column | string | Yes | Column used for incremental loading |
| query | string | No | Optional custom query to use instead of table (required if table is not provided) |
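To make the settings above concrete, here is a sketch of how a table-based incremental read could be expressed as SQL. The function name and the exact query shape are illustrative assumptions, not the connector's actual internals; `last_timestamp` stands for the high-water mark saved from the previous run (absent on the first sync).

```python
def build_incremental_query(settings, last_timestamp=None):
    """Sketch: the SELECT an incremental table read might run.

    `settings` mirrors the Settings table above.
    """
    table = f"`{settings['project']}.{settings['dataset']}.{settings['table']}`"
    ts = settings["timestamp_column"]
    query = f"SELECT * FROM {table}"
    if last_timestamp is not None:
        # Later runs read only rows newer than the saved state
        query += f" WHERE {ts} > TIMESTAMP '{last_timestamp}'"
    return query + f" ORDER BY {ts}"
```

On the first run the filter is omitted and the full table is read; afterwards only rows with a newer `timestamp_column` value are fetched.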
### Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| credentials_json | text | Yes | The Google Cloud service account credentials in JSON format |
## Features

### Sync Mode
- Incremental Sync: Uses the timestamp_column provided to read new or updated records
- Automatically manages state between runs for incremental syncs
### Query Support
- Simple table queries with automatic schema detection
- Custom SQL queries for complex data transformations
- Wraps custom queries in CTEs for incremental loading
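The CTE wrapping mentioned above lets the connector apply an incremental filter without editing the user's SQL. A minimal sketch of what that wrapping could look like follows; the CTE name and exact clause ordering are assumptions, and the connector's real implementation may differ.

```python
def wrap_query_in_cte(custom_query, timestamp_column, last_timestamp=None):
    """Sketch: wrap a custom query in a CTE so an incremental
    timestamp filter can be applied around it, untouched."""
    wrapped = f"WITH user_query AS (\n{custom_query}\n)\nSELECT * FROM user_query"
    if last_timestamp is not None:
        wrapped += f"\nWHERE {timestamp_column} > TIMESTAMP '{last_timestamp}'"
    return wrapped + f"\nORDER BY {timestamp_column}"
```

This is why the `timestamp_column` must appear in the custom query's result columns: the outer filter references it by name.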
### Efficient Data Reading
- Uses BigQuery Query API for reliable data access
- Automatic connection and retry handling
- Supports both simple and complex queries
## Best Practices

- Query Optimization
  - Keep queries simple for better performance
  - Test custom queries with small data sets first
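One simple way to test a custom query against a small data set, without touching the query itself, is to run it through a capped wrapper first. This tiny helper is a hypothetical convenience, not part of the connector:

```python
def limited(query, n=100):
    """Sketch: cap the row count while validating a custom query."""
    return f"SELECT * FROM (\n{query}\n) LIMIT {n}"
```

Note that in BigQuery on-demand pricing, `LIMIT` caps rows returned but not bytes scanned, so this helps with iteration speed more than with cost.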
## Example Configurations

### Incremental Sync with Table
```json
{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "table": "my_table",
    "timestamp_column": "updated_at"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}
```
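A configuration like the one above can be checked locally before deploying. The validator below is a sketch that encodes the rules in the Settings table; it assumes `table` and `query` are mutually exclusive (the table only states each is required when the other is absent), and the function name is ours.

```python
def validate_settings(settings):
    """Sketch: enforce the requirements from the Settings table."""
    for key in ("project", "dataset", "timestamp_column"):
        if not settings.get(key):
            raise ValueError(f"'{key}' is required")
    has_table = bool(settings.get("table"))
    has_query = bool(settings.get("query"))
    if has_table == has_query:
        # Neither, or both: assumed invalid either way
        raise ValueError("provide exactly one of 'table' or 'query'")
```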
## Troubleshooting

Common issues and solutions:
- Permission Denied
  - Verify the service account has both bigquery.dataViewer and bigquery.jobUser roles
  - Check that the project ID matches the credentials
  - Ensure the dataset and table exist
- Invalid Query
  - Test custom queries in the BigQuery console first
  - Verify column names and types
  - Check for proper SQL syntax
- Timestamp Column Issues
  - Verify the column exists in the table or query results
  - Ensure the column is of TIMESTAMP type
  - Check that the column name matches exactly
- Performance Issues
  - Review query efficiency
  - Check BigQuery usage quotas
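For timestamp column issues specifically, BigQuery's `INFORMATION_SCHEMA.COLUMNS` view reports each column's name and data type, which makes it easy to confirm the `timestamp_column` exists and is a TIMESTAMP. The helper below just builds that diagnostic query as a string (running it requires the usual credentials); the function name is ours.

```python
def column_type_query(project, dataset, table, column):
    """Build a diagnostic query against INFORMATION_SCHEMA.COLUMNS
    to inspect one column's declared data type."""
    return (
        f"SELECT column_name, data_type\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table}' AND column_name = '{column}'"
    )
```

If the query returns no rows, the column name is wrong (remember it is case-sensitive here); if `data_type` is not `TIMESTAMP`, incremental filtering on it will fail.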