# BigQuery

Loads data from a BigQuery table using timestamp-based tracking and scheduled ingestion.

## BigQuery Input

The BigQuery Input allows you to read data from BigQuery tables or custom queries incrementally.
## Requirements

### Setting up a Service Account
- Create a new service account:

  ```shell
  # Set your project ID
  export PROJECT_ID="your-project-id"

  # Create the service account
  gcloud iam service-accounts create bigquery-reader \
    --description="Service account for BigQuery data reading" \
    --display-name="BigQuery Reader"
  ```
- Grant the required permissions:

  ```shell
  # Grant BigQuery Data Viewer role (for reading data)
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"

  # Grant BigQuery Job User role (for executing queries)
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
  ```
- Create and save the service account key (credentials):

  ```shell
  # Create and save the key file
  gcloud iam service-accounts keys create bigquery-reader-credentials.json \
    --iam-account=bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com
  ```

This creates a `bigquery-reader-credentials.json` file in your current directory. Use the contents of this file as the value for the `credentials_json` secret in your input configuration.

Note: Keep this credentials file secure and never commit it to version control.
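Before pasting the key file's contents into the `credentials_json` secret, it can be worth a quick sanity check that it really is a service account key. The helper below is a minimal sketch (the function name is ours, not part of the connector); the field names it checks are the standard ones found in every Google Cloud service account key file.

```python
import json

# Fields present in every Google Cloud service account key file
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_credentials_json(raw):
    """Validate that `raw` looks like a service account key; return the parsed dict."""
    creds = json.loads(raw)
    missing = REQUIRED_FIELDS - creds.keys()
    if missing:
        raise ValueError(f"credentials are missing fields: {sorted(missing)}")
    if creds.get("type") != "service_account":
        raise ValueError("'type' must be 'service_account'")
    return creds
```

A common failure mode this catches is accidentally pasting an OAuth client file (whose `type` differs) instead of a service account key.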
## Configuration

The BigQuery Input is configured using the following settings:

### Settings
| Setting | Type | Required | Description |
|---|---|---|---|
| project | string | Yes | The Google Cloud Project ID where your BigQuery dataset is located |
| dataset | string | Yes | The name of the BigQuery dataset containing your table |
| table | string | No | The name of the table to read from (required if query is not provided) |
| timestamp_column | string | Yes | Column used for incremental loading |
| query | string | No | Optional custom query to use instead of table (required if table is not provided) |
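To make the settings above concrete, here is a sketch of how a table-based incremental read could be expressed as SQL. The function name and the exact query shape are illustrative assumptions, not the connector's actual internals; `last_timestamp` stands for the high-water mark saved from the previous run (absent on the first sync).

```python
def build_incremental_query(settings, last_timestamp=None):
    """Sketch: the SELECT an incremental table read might run.

    `settings` mirrors the Settings table above.
    """
    table = f"`{settings['project']}.{settings['dataset']}.{settings['table']}`"
    ts = settings["timestamp_column"]
    query = f"SELECT * FROM {table}"
    if last_timestamp is not None:
        # Later runs read only rows newer than the saved state
        query += f" WHERE {ts} > TIMESTAMP '{last_timestamp}'"
    return query + f" ORDER BY {ts}"
```

On the first run the filter is omitted and the full table is read; afterwards only rows with a newer `timestamp_column` value are fetched.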
### Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| credentials_json | text | Yes | The Google Cloud service account credentials in JSON format |
## Features

### Sync Mode
- Incremental Sync: Uses the timestamp_column provided to read new or updated records
- Automatically manages state between runs for incremental syncs
### Query Support
- Simple table queries with automatic schema detection
- Custom SQL queries for complex data transformations
- Wraps custom queries in CTEs for incremental loading
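The CTE wrapping mentioned above lets the connector apply an incremental filter without editing the user's SQL. A minimal sketch of what that wrapping could look like follows; the CTE name and exact clause ordering are assumptions, and the connector's real implementation may differ.

```python
def wrap_query_in_cte(custom_query, timestamp_column, last_timestamp=None):
    """Sketch: wrap a custom query in a CTE so an incremental
    timestamp filter can be applied around it, untouched."""
    wrapped = f"WITH user_query AS (\n{custom_query}\n)\nSELECT * FROM user_query"
    if last_timestamp is not None:
        wrapped += f"\nWHERE {timestamp_column} > TIMESTAMP '{last_timestamp}'"
    return wrapped + f"\nORDER BY {timestamp_column}"
```

This is why the `timestamp_column` must appear in the custom query's result columns: the outer filter references it by name.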
### Efficient Data Reading
- Uses BigQuery Query API for reliable data access
- Automatic connection and retry handling
- Supports both simple and complex queries
## Best Practices

- Query Optimization
  - Keep queries simple for better performance
  - Test custom queries with small data sets first
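One simple way to test a custom query against a small data set, without touching the query itself, is to run it through a capped wrapper first. This tiny helper is a hypothetical convenience, not part of the connector:

```python
def limited(query, n=100):
    """Sketch: cap the row count while validating a custom query."""
    return f"SELECT * FROM (\n{query}\n) LIMIT {n}"
```

Note that in BigQuery on-demand pricing, `LIMIT` caps rows returned but not bytes scanned, so this helps with iteration speed more than with cost.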
## Example Configurations

### Incremental Sync with Table
```json
{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "table": "my_table",
    "timestamp_column": "updated_at"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}
```
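A configuration like the one above can be checked locally before deploying. The validator below is a sketch that encodes the rules in the Settings table; it assumes `table` and `query` are mutually exclusive (the table only states each is required when the other is absent), and the function name is ours.

```python
def validate_settings(settings):
    """Sketch: enforce the requirements from the Settings table."""
    for key in ("project", "dataset", "timestamp_column"):
        if not settings.get(key):
            raise ValueError(f"'{key}' is required")
    has_table = bool(settings.get("table"))
    has_query = bool(settings.get("query"))
    if has_table == has_query:
        # Neither, or both: assumed invalid either way
        raise ValueError("provide exactly one of 'table' or 'query'")
```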
## Troubleshooting

Common issues and solutions:
- Permission Denied
  - Verify the service account has both bigquery.dataViewer and bigquery.jobUser roles
  - Check that the project ID matches the credentials
  - Ensure the dataset and table exist
- Invalid Query
  - Test custom queries in the BigQuery console first
  - Verify column names and types
  - Check for proper SQL syntax
- Timestamp Column Issues
  - Verify the column exists in the table or query results
  - Ensure the column is of TIMESTAMP type
  - Check that the column name matches exactly
- Performance Issues
  - Review query efficiency
  - Check BigQuery usage quotas
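For timestamp column issues specifically, BigQuery's `INFORMATION_SCHEMA.COLUMNS` view reports each column's name and data type, which makes it easy to confirm the `timestamp_column` exists and is a TIMESTAMP. The helper below just builds that diagnostic query as a string (running it requires the usual credentials); the function name is ours.

```python
def column_type_query(project, dataset, table, column):
    """Build a diagnostic query against INFORMATION_SCHEMA.COLUMNS
    to inspect one column's declared data type."""
    return (
        f"SELECT column_name, data_type\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table}' AND column_name = '{column}'"
    )
```

If the query returns no rows, the column name is wrong (remember it is case-sensitive here); if `data_type` is not `TIMESTAMP`, incremental filtering on it will fail.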