BigQuery Cron
Loads all data from a BigQuery table on a configured schedule.
BigQuery Cron Input
The BigQuery Cron Input allows you to read data from BigQuery tables or custom queries with support for full sync mode.
Requirements
Setting up a Service Account
- Create a new service account:
```shell
# Set your project ID
export PROJECT_ID="your-project-id"

# Create the service account
gcloud iam service-accounts create bigquery-reader \
  --description="Service account for BigQuery data reading" \
  --display-name="BigQuery Reader"
```
- Grant the required permissions:
```shell
# Grant the BigQuery Data Viewer role (for reading data)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Grant the BigQuery Job User role (for executing queries)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```
- Create and save the service account key (credentials):
```shell
# Create and save the key file
gcloud iam service-accounts keys create bigquery-reader-credentials.json \
  --iam-account=bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com
```
This creates a bigquery-reader-credentials.json file in your current directory. Use the contents of this file as the value for the credentials_json secret in your input configuration.
Note: Keep this credentials file secure and never commit it to version control.
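To confirm the key works before wiring it into the input, you can authenticate as the service account and run a trivial query (a quick sanity check, assuming the `gcloud` and `bq` CLIs are installed and `$PROJECT_ID` is still set from the steps above):

```shell
# Authenticate as the service account using the downloaded key file
gcloud auth activate-service-account \
  --key-file=bigquery-reader-credentials.json

# Run a trivial query to verify the jobUser role is in place
bq query --project_id="$PROJECT_ID" --use_legacy_sql=false 'SELECT 1'
```

If the query succeeds, the credentials file is valid and the service account can submit query jobs.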
Configuration
The BigQuery Cron Input is configured using the following settings:
Settings
| Setting | Type | Required | Description |
|---|---|---|---|
| project | string | Yes | The Google Cloud Project ID where your BigQuery dataset is located |
| dataset | string | Yes | The name of the BigQuery dataset containing your table |
| table | string | No | The name of the table to read from (required if query is not provided) |
| query | string | No | Custom SQL query to run instead of reading the whole table (required if table is not provided) |
| cron | string | Yes | Cron expression defining when to run the sync |
Secrets
| Setting | Type | Required | Description |
|---|---|---|---|
| credentials_json | text | Yes | The Google Cloud service account credentials in JSON format |
Features
Sync Mode
- Full Sync: Reads all data from the table/query on each run
Query Support
- By default, the input runs a `SELECT *` against the specified table
- A custom SQL query can be supplied for filtering or complex data transformations
Efficient Data Reading
- Uses BigQuery Query API for reliable data access
- Automatic connection and retry handling
- Supports both simple and complex queries
Best Practices
- Query Optimization
  - Keep queries simple for better performance
  - Test custom queries with small data sets first
  - Consider the cost implications of full vs. incremental sync
- Scheduling
  - Consider BigQuery's quotas and pricing
  - Allow enough time between runs for processing
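One way to gauge cost before committing to a schedule is a dry run, which reports the bytes a query would scan without executing it (a sketch using the `bq` CLI and the illustrative project, dataset, and table names from the example below):

```shell
# Dry-run the query: estimates bytes processed without running it or incurring cost
bq query --use_legacy_sql=false --dry_run \
  'SELECT * FROM `my-project-123.my_dataset.my_table`'
```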
Example Configurations
Full Sync with Table
```json
{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "table": "my_table",
    "cron": "0 */1 * * *"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}
```
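Full Sync with Custom Query
A configuration using the query setting instead of table might look like this (the table and column names are illustrative):

```json
{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "query": "SELECT id, name FROM `my-project-123.my_dataset.my_table` WHERE status = 'active'",
    "cron": "0 */1 * * *"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}
```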
Troubleshooting
Common issues and solutions:
- Permission Denied
  - Verify the service account has both the bigquery.dataViewer and bigquery.jobUser roles
  - Check that the project ID matches the credentials
  - Ensure the dataset and table exist
- Invalid Query
  - Test custom queries in the BigQuery console first
  - Verify column names and types
  - Check for proper SQL syntax
- Performance Issues
  - Use the regular BigQuery Input for larger tables that can be processed incrementally
  - Review query efficiency
  - Check BigQuery usage quotas