BigQuery Cron

Loads all data from a BigQuery table on a configured schedule.

BigQuery Cron Input

The BigQuery Cron Input allows you to read data from BigQuery tables or custom queries with support for full sync mode.

Requirements

Setting up a Service Account

  1. Create a new service account:
# Set your project ID
export PROJECT_ID="your-project-id"

# Create the service account
gcloud iam service-accounts create bigquery-reader \
  --description="Service account for BigQuery data reading" \
  --display-name="BigQuery Reader"
  2. Grant the required permissions:
# Grant BigQuery Data Viewer role (for reading data)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Grant BigQuery Job User role (for executing queries)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
  3. Create and save the service account key (credentials):
# Create and save the key file
gcloud iam service-accounts keys create bigquery-reader-credentials.json \
  --iam-account=bigquery-reader@$PROJECT_ID.iam.gserviceaccount.com

This creates a bigquery-reader-credentials.json file in your current directory. Use the contents of this file as the value for the credentials_json secret in your input configuration.

Note: Keep this credentials file secure and never commit it to version control.
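Before pasting the key into your configuration, it can help to confirm the file actually looks like a service account key. The helper below is a hypothetical convenience (not part of this input) that checks for the fields Google's service account keys always carry:

```python
import json

# Hypothetical helper (not shipped with this input): sanity-check that a
# downloaded key file looks like a service account key before using its
# contents as the credentials_json secret.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def looks_like_service_account_key(raw: str) -> bool:
    """Return True if the JSON has the fields a service account key carries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return data.get("type") == "service_account" and REQUIRED_KEYS <= data.keys()
```

This only checks the shape of the JSON; it does not verify the key against Google Cloud.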

Configuration

The BigQuery Cron Input is configured using the following settings:

Settings

| Setting | Type   | Required | Description                                                              |
| ------- | ------ | -------- | ------------------------------------------------------------------------ |
| project | string | Yes      | The Google Cloud Project ID where your BigQuery dataset is located       |
| dataset | string | Yes      | The name of the BigQuery dataset containing your table                   |
| table   | string | No       | The name of the table to read from (required if query is not provided)   |
| query   | string | No       | Custom SQL query to use instead of table (required if table is not provided) |
| cron    | string | Yes      | Cron expression defining when to read data                               |
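The cron setting uses the standard five-field cron format (minute, hour, day of month, month, day of week). As an illustration only, a small helper (hypothetical, not part of this input) can label the fields of an expression such as the `0 */1 * * *` used in the example configuration below:

```python
# Hypothetical helper: label the five fields of a standard cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def label_cron(expression: str) -> dict:
    """Split a five-field cron expression into a field-name -> value mapping."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected a five-field cron expression")
    return dict(zip(CRON_FIELDS, parts))

# "0 */1 * * *" fires at minute 0 of every hour.
```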

Secrets

| Setting          | Type | Required | Description                                               |
| ---------------- | ---- | -------- | --------------------------------------------------------- |
| credentials_json | text | Yes      | The Google Cloud service account credentials in JSON format |

Features

Sync Mode

  • Full Sync: Reads all data from the table/query on each run

Query Support

  • By default this input performs a `SELECT *` on the specified table
  • Custom SQL queries for complex data transformations

Efficient Data Reading

  • Uses BigQuery Query API for reliable data access
  • Automatic connection and retry handling
  • Supports both simple and complex queries
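The retry handling mentioned above is internal to the input; as a sketch of the general technique only (the function name, attempt count, and backoff policy here are assumptions, not the input's actual implementation):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))
```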

Best Practices

  1. Query Optimization

    • Keep queries simple for better performance
    • Test custom queries with small data sets first
    • Consider cost implications of full vs incremental sync
  2. Scheduling

    • Consider BigQuery's quotas and pricing
    • Allow enough time between runs for processing

Example Configurations

Full Sync with Table

{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "table": "my_table",
    "cron": "0 */1 * * *"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}
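
Full Sync with Custom Query

A second sketch, using the query setting in place of table (the SQL shown is illustrative; substitute your own project, dataset, and query):

{
  "settings": {
    "project": "my-project-123",
    "dataset": "my_dataset",
    "query": "SELECT id, name FROM `my-project-123.my_dataset.my_table` WHERE active = TRUE",
    "cron": "0 */1 * * *"
  },
  "secrets": {
    "credentials_json": {
      "value": "{ your service account JSON here }"
    }
  }
}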

Troubleshooting

Common issues and solutions:

  1. Permission Denied

    • Verify service account has both bigquery.dataViewer and bigquery.jobUser roles
    • Check project ID matches credentials
    • Ensure dataset and table exist
  2. Invalid Query

    • Test custom queries in BigQuery console first
    • Verify column names and types
    • Check for proper SQL syntax
  3. Performance Issues

    • Use the regular BigQuery Input for larger tables that can be incrementally processed
    • Review query efficiency
    • Check BigQuery usage quotas