Skip to main content

BigQuery

Loads data from a BigQuery table using timestamp-based tracking and scheduled ingestion.

BigQuery Input

The BigQuery Input allows you to read data from BigQuery tables or custom queries incrementally.

Requirements

Setting up a Service Account

  1. Create a new service account:
# Set your project ID
export PROJECT_ID="your-project-id"

# Create the service account
gcloud iam service-accounts create bigquery-reader \
--description="Service account for BigQuery data reading" \
--display-name="BigQuery Reader"
  1. Grant the required permissions:
# Grant BigQuery Data Viewer role (for reading data)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.dataViewer"

# Grant BigQuery Job User role (for executing queries)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.jobUser"
  1. Create and save the service account key (credentials):
# Create and save the key file
gcloud iam service-accounts keys create bigquery-reader-credentials.json \
--iam-account=bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com

This creates a bigquery-reader-credentials.json file in your current directory. Use the contents of this file as the value for the credentials_json secret in your input configuration.

Note: Keep this credentials file secure and never commit it to version control.

Configuration

The BigQuery Input is configured using the following settings:

Settings

SettingTypeRequiredDescription
projectstringYesThe Google Cloud Project ID where your BigQuery dataset is located
datasetstringYesThe name of the BigQuery dataset containing your table
tablestringNoThe name of the table to read from (required if query is not provided)
timestamp_columnstringYesColumn used for incremental loading
querystringNoOptional custom query to use instead of table (required if table is not provided)

Secrets

SettingTypeRequiredDescription
credentials_jsontextYesThe Google Cloud service account credentials in JSON format

Features

Sync Mode

  • Incremental Sync: Uses the timestamp_column provided to read new or updated records
  • Automatically manages state between runs for incremental syncs

Query Support

  • Simple table queries with automatic schema detection
  • Custom SQL queries for complex data transformations
  • Wraps custom queries in CTEs for incremental loading

Efficient Data Reading

  • Uses BigQuery Query API for reliable data access
  • Automatic connection and retry handling
  • Supports both simple and complex queries

Best Practices

  1. Query Optimization
    • Keep queries simple for better performance
    • Test custom queries with small data sets first

Example Configurations

Incremental Sync with Table

{
"settings": {
"project": "my-project-123",
"dataset": "my_dataset",
"table": "my_table",
"timestamp_column": "updated_at"
},
"secrets": {
"credentials_json": {
"value": "{ your service account JSON here }"
}
}
}

Troubleshooting

Common issues and solutions:

  1. Permission Denied

    • Verify service account has both bigquery.dataViewer and bigquery.jobUser roles
    • Check project ID matches credentials
    • Ensure dataset and table exist
  2. Invalid Query

    • Test custom queries in BigQuery console first
    • Verify column names and types
    • Check for proper SQL syntax
  3. Timestamp Column Issues

    • Verify column exists in table/query results
    • Ensure column is TIMESTAMP type
    • Check column name matches exactly
  4. Performance Issues

    • Review query efficiency
    • Check BigQuery usage quotas