GitHub Actions Workflow Logs
Streams GitHub Actions workflow run logs for completed runs in a repository, chunked into ~1 MB pieces split at newline boundaries.
Sync Type: Incremental
Sync Interval: 10 minutes
Requirements
Before configuring this input, you need to:
- Create a Personal Access Token (PAT) or GitHub App (see GitHub authentication docs)
Option A: Personal Access Token (PAT)
- Go to your GitHub settings
- Navigate to Developer settings > Personal access tokens > Tokens (classic) or Fine-grained tokens
- Click "Generate new token"
- For fine-grained tokens, select the repository and grant Actions: Read and Metadata: Read permissions
- For classic tokens, select the repo or public_repo scope (and optionally actions:read)
- Copy and securely store the generated token
Option B: GitHub App
- Create a GitHub App in your organization or account
- Grant the app Actions: Read and Metadata: Read permissions
- Install the app on the target repository
- Generate and store the private key in PEM format
- Note the Client ID and Installation ID
- Token Scopes & Permissions:
  - Personal Access Token (Classic): repo (full control) or public_repo (public repos only), optionally add actions:read
  - Personal Access Token (Fine-grained): Actions: Read and Metadata: Read on the target repository
  - GitHub App: Actions: Read and Metadata: Read permissions
- API Access: Ensure the account/organization has not restricted API access for PATs or apps
Details
How Incremental Syncing Works
On the first sync with no Backfill Start Time configured, the input starts from the current time — the first sync emits nothing and subsequent syncs pick up new completed runs going forward. Configure Backfill Start Time to pull historical data.
Each sync queries a lookback window anchored to the cursor (a structural marker of "the highest time we've fully covered"), not to the current time. This means a run whose lifecycle spans a sync boundary — it was created before the last sync ended but did not complete until after — is still captured in the next sync, because the window looks back by up to Maximum Job Execution Time hours behind the cursor.
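A minimal sketch of the lookback arithmetic, in Python. It assumes the lower bound of each sync's created_at query is the cursor minus Maximum Job Execution Time, clipped to the 30-day safety floor described under Special Cases, and it enforces the 1 to 12 hour range from the Settings table. The function name lookback_start is hypothetical; the input's real implementation may compose these bounds differently.

```python
from datetime import datetime, timedelta, timezone

# Safety floor described under Special Cases: never query more than 30 days back.
MAX_LOOKBACK = timedelta(days=30)


def lookback_start(cursor: datetime, now: datetime, max_exec_hours: int = 2) -> datetime:
    """Hypothetical helper: earliest created_at the next sync should query.

    The window is anchored to the cursor rather than to `now`, so a run created
    up to `max_exec_hours` before the cursor but completed after it is still seen.
    """
    if not 1 <= max_exec_hours <= 12:
        raise ValueError("Maximum Job Execution Time must be between 1 and 12 hours")
    start = cursor - timedelta(hours=max_exec_hours)
    # Clip to the 30-day floor (tighter than GitHub's ~90-day log retention).
    return max(start, now - MAX_LOOKBACK)


now = datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc)
cursor = datetime(2026, 4, 28, 11, 50, tzinfo=timezone.utc)
print(lookback_start(cursor, now))  # 2026-04-28 09:50:00+00:00
```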
The walk is checkpointed per sub-window rather than once at the end. As each sub-window of the lookback range is successfully drained, the cursor advances to reflect that coverage — even if the sub-window emitted no records. The benefit is that crash recovery during a long backfill is cheap: a sync that fails partway through a 30-day backfill picks up from the last successfully-drained sub-window, not from the original starting cursor.
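The checkpointing order can be sketched as follows. The sub-window size, the fetch/emit/save_cursor callbacks, and the function names are all hypothetical; the point is only that the cursor is saved after a sub-window is fully drained, including when it produced no records.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, Tuple

Window = Tuple[datetime, datetime]


def sub_windows(start: datetime, end: datetime, step: timedelta) -> Iterator[Window]:
    """Yield consecutive [lo, hi) sub-windows of at most `step` covering [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def walk(
    start: datetime,
    end: datetime,
    step: timedelta,
    fetch: Callable[[datetime, datetime], Iterable[dict]],
    emit: Callable[[dict], None],
    save_cursor: Callable[[datetime], None],
) -> None:
    """Drain each sub-window in order, checkpointing the cursor after each one.

    If fetch or emit raises, the last saved cursor marks the last fully drained
    sub-window, so a retry only repeats the interrupted sub-window.
    """
    for lo, hi in sub_windows(start, end, step):
        for record in fetch(lo, hi):
            emit(record)
        save_cursor(hi)  # advances even when the sub-window was empty
```

Resuming after a crash then amounts to calling walk again with start set to the last saved cursor.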
A run that completes within the last ~5 seconds of a sync's walk is deferred to a future sync rather than emitted in the current one. This propagation buffer gives GitHub's API state time to fully settle — so when the cursor advances past a given timestamp, any second-granularity sibling runs at that timestamp have all had time to become visible to the list endpoint, and the next sync sees them together rather than missing the laggards. The practical effect is a ~5-second freshness lag at the tail of every sync, in exchange for exactly-once emission with no boundary-second duplicates.
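One way to picture the propagation buffer, assuming a run's completion time is its updated_at and the buffer is exactly 5 seconds (the text says ~5): runs completed after the walk's end minus the buffer are held back, and the cursor stops at that horizon so the next sync picks them up together with any late-visible siblings. The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

PROPAGATION_BUFFER = timedelta(seconds=5)  # "~5 seconds" above; exact value assumed


def split_at_horizon(runs: List[dict], walk_end: datetime) -> Tuple[List[dict], List[dict], datetime]:
    """Return (emit_now, deferred, new_cursor) for the runs seen during one walk."""
    horizon = walk_end - PROPAGATION_BUFFER
    emit_now = [r for r in runs if r["updated_at"] <= horizon]
    deferred = [r for r in runs if r["updated_at"] > horizon]
    # Advancing the cursor only to the horizon means deferred runs (and any
    # same-second siblings not yet visible) fall into the next sync's range.
    return emit_now, deferred, horizon
```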
Correctness contract: logs are guaranteed for any completed run where (updated_at − created_at) ≤ Maximum Job Execution Time. Runs with a longer execution duration may be missed. Increase Maximum Job Execution Time if your workflows regularly run longer than 2 hours.
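The contract is a plain duration comparison. The sketch below evaluates it from the created_at and updated_at strings GitHub returns (e.g. 2026-04-28T09:00:00Z), which can help audit which runs fall outside the guarantee; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def _parse(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def logs_guaranteed(created_at: str, updated_at: str, max_exec_hours: int = 2) -> bool:
    """True when (updated_at - created_at) <= Maximum Job Execution Time."""
    return _parse(updated_at) - _parse(created_at) <= timedelta(hours=max_exec_hours)
```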
Subsequent syncs pick up where the previous one ended, using an internal cursor advanced per sub-window. Combined with the propagation buffer, this gives exactly-once emission under normal operation. (See Duplicate Records After Errors or Wall-Clock Cap under Troubleshooting for partial-walk semantics.)
Data Retrieval
- Workflow runs are fetched from the GitHub REST API, filtered by their creation date
- Log archives (ZIP files) are downloaded from GitHub's CDN using signed URLs — these do not count against REST API rate limits
- Root-level per-job log files are extracted and chunked for emission. Per-step logs are skipped.
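The root-level versus per-step distinction can be illustrated with Python's zipfile module. This sketch assumes, consistent with the description above, that per-job logs sit at the archive root (e.g. 0_build.txt) and per-step logs sit in subdirectories; the function name is hypothetical.

```python
import io
import zipfile
from typing import Dict


def root_level_logs(archive: bytes) -> Dict[str, bytes]:
    """Return {filename: raw bytes} for the root-level entries of a run's log ZIP.

    Entries inside subdirectories (names containing "/") are treated as
    per-step logs and skipped.
    """
    logs = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir() or "/" in info.filename:
                continue
            logs[info.filename] = zf.read(info)
    return logs
```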
Log Chunking
A single job's log file is split into ~1 MB chunks at newline boundaries. Each chunk is emitted as one record. Each chunk carries its position via chunk_meta.chunk_index and its end-of-file marker via chunk_meta.last_chunk. Records that share a file_id reassemble into one full log file — see Reconstructing full log files below.
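A sketch of newline-boundary chunking under stated assumptions: the limit is exactly 1 MiB (the text says ~1 MB); a single line longer than the limit is cut to the limit while keeping its line ending, and counted in truncated_lines; and content is decoded with U+FFFD replacement as described in the note on non-UTF-8 content. The record layout mirrors the documented chunk_meta fields, but chunk_log itself is hypothetical.

```python
from typing import Dict, List

CHUNK_LIMIT = 1024 * 1024  # "~1 MB" in the docs; the exact byte limit is an assumption


def chunk_log(data: bytes, limit: int = CHUNK_LIMIT) -> List[Dict]:
    """Split one job's log into chunks of at most `limit` bytes at newline boundaries."""
    pieces = []  # (raw chunk bytes, truncated line count)
    buf = bytearray()
    buf_truncated = 0
    for line in data.splitlines(keepends=True):
        truncated = len(line) > limit
        if truncated:
            # Keep the head of the oversized line plus its original line ending.
            body = line.rstrip(b"\r\n")
            ending = line[len(body):]
            line = body[: limit - len(ending)] + ending
        # Close the current chunk if this line would push it past the limit.
        if buf and len(buf) + len(line) > limit:
            pieces.append((bytes(buf), buf_truncated))
            buf, buf_truncated = bytearray(), 0
        buf += line
        buf_truncated += int(truncated)
    if buf or not pieces:
        pieces.append((bytes(buf), buf_truncated))
    return [
        {
            "chunk_meta": {
                "chunk_index": i,
                "last_chunk": i == len(pieces) - 1,
                "chunk_size_bytes": len(raw),
                "truncated_lines": count,
            },
            # Invalid UTF-8 (e.g. a multi-byte character cut by truncation) becomes U+FFFD.
            "content": raw.decode("utf-8", errors="replace"),
        }
        for i, (raw, count) in enumerate(pieces)
    ]
```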
The input automatically batches API queries to stay under GitHub's per-query result cap.
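GitHub documents that the list-workflow-runs endpoint returns at most 1,000 results per search when filtering by created. One general way to stay under such a cap is recursive bisection of the time range, sketched below with a hypothetical count callback that would report total_count for a given range. This illustrates the technique; it is not necessarily how the input batches its queries.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

RESULT_CAP = 1000
Window = Tuple[datetime, datetime]


def plan_windows(
    start: datetime,
    end: datetime,
    count: Callable[[datetime, datetime], int],
    cap: int = RESULT_CAP,
    min_span: timedelta = timedelta(seconds=1),
) -> List[Window]:
    """Bisect [start, end) until every piece reports fewer than `cap` runs.

    Windows are half-open here; GitHub's created=lo..hi range syntax matches
    both endpoints, so a real client must avoid double-counting boundary runs.
    """
    if end - start <= min_span or count(start, end) < cap:
        return [(start, end)]
    mid = start + (end - start) / 2
    return plan_windows(start, mid, count, cap, min_span) + plan_windows(mid, end, count, cap, min_span)
```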
Special Cases
- Expired logs: GitHub retains workflow logs for ~90 days. Expired runs (404 / 410 HTTP status) are skipped without failing the sync. The query window's lower bound is floored at 30 days ago (a safety bound, tighter than GitHub's retention) so in normal operation we don't approach the retention edge.
- Skipped workflow runs: Workflow runs with conclusion: skipped (e.g., a workflow whose top-level if: condition evaluated false, or an Auto approve workflow on an already-approved PR) are not emitted. Their log archives contain only nested per-step runner system entries, with no root-level per-job log to ingest. If you compare your emitted-run count against GitHub's status=completed count for the same window and see a gap, expect it to be exactly the number of conclusion=skipped runs in that range.
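The reconciliation rule above can be turned into a quick audit. The sketch assumes you have GitHub's run objects for the window (each with id and conclusion) and the workflow_meta.run_id values your destination received; the helper name is hypothetical. Any ID it returns is a gap not explained by skipped runs.

```python
from typing import Iterable, List


def unexplained_gap(completed_runs: Iterable[dict], emitted_run_ids: Iterable[int]) -> List[int]:
    """IDs of completed runs that were neither emitted nor conclusion=skipped."""
    emitted = set(emitted_run_ids)  # one run yields many chunk records; a set collapses them
    return sorted(
        run["id"]
        for run in completed_runs
        if run["id"] not in emitted and run.get("conclusion") != "skipped"
    )
```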
Re-runs and the created_at filter
GitHub's workflow-runs list endpoint only supports filtering by created_at. A re-run preserves the original created_at but bumps run_attempt, so once an input sync has advanced past a run's window, subsequent re-runs of that same run will not be picked up. This is a limitation of GitHub's API, not the input.
Similarly, when a run's first sync sees run_attempt already > 1, only the most recent attempt's logs are fetched — earlier attempts cannot be retrieved retroactively.
Configuration
Settings
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| Owner | string | Yes | - | Repository owner (user or organization name) |
| Repository | string | Yes | - | Repository name (without owner prefix, e.g., api-service not owner/api-service) |
| Authentication Method | oneOf | Yes | Personal Access Token | Authentication method to use: Personal Access Token or GitHub App. See sub-fields below. |
| Maximum Job Execution Time (Hours) | integer | No | 2 | Upper bound on workflow run duration (updated_at minus created_at). Runs that complete within this duration are guaranteed to have their logs captured. Runs exceeding this duration may be missed. Accepted range: 1–12. Increase this value if your workflows regularly run longer than 2 hours. |
| Backfill Start Time | date (RFC3339) | No | Current time | Captures runs whose updated_at is after this time. Defaults to the current time on first sync (no historical backfill). Values older than 30 days are silently clipped to a 30-day lookback cap (a safety bound, tighter than GitHub's ~90-day workflow-log retention). |
| Generate Synthetic Data | boolean | No | false | Generate synthetic demo data instead of connecting to the real data source. |
Authentication Method: Personal Access Token
| Field | Type | Required | Description |
|---|---|---|---|
| Personal Access Token | secret | Yes | Personal access token with repo and/or actions:read scopes. |
Authentication Method: GitHub App
| Field | Type | Required | Description |
|---|---|---|---|
| Client ID | string | Yes | The GitHub App's client ID |
| Installation ID | string | Yes | The installation ID for accessing the repository |
| Private Key | secret | Yes | The GitHub App's private key in PEM format |
Record Fields
Each emitted record represents a single chunk of a job's log file. The following table describes the top-level and most commonly used nested fields:
| Field | Description |
|---|---|
workflow_meta | Object containing identifying metadata about the run, workflow, and job that produced this log file. The same workflow_meta is repeated on every chunk of the same file. |
workflow_meta.run_id | Unique identifier for the workflow run within the repository (GitHub's run ID). |
workflow_meta.run_attempt | The attempt number for re-runs; starts at 1. Useful for distinguishing retry attempts of the same run. |
workflow_meta.workflow_id | Unique identifier for the workflow definition itself. |
workflow_meta.workflow_name | Human-readable name of the workflow (e.g., "CI/Build"). |
workflow_meta.run_number | Sequential run number within the workflow. |
workflow_meta.head_branch | Git ref the run was associated with — branch name, tag (e.g. v1.2.3), or PR ref (e.g. refs/pull/N/head). |
workflow_meta.head_sha | Git commit SHA for the code that ran. |
workflow_meta.event | The trigger event (e.g., push, pull_request, schedule). |
workflow_meta.status | Run status. Always completed — this input only ingests completed runs. |
workflow_meta.conclusion | Final result of the completed run: success, failure, cancelled, etc. |
workflow_meta.created_at | RFC3339 timestamp when the run was created. |
workflow_meta.updated_at | RFC3339 timestamp of the most recent update to the run. |
workflow_meta.run_started_at | RFC3339 timestamp when the run began executing. |
workflow_meta.html_url | Link to the run in the GitHub UI. |
workflow_meta.repository | Repository identifier in owner/repo format. |
workflow_meta.actor | GitHub user or app that triggered the run. |
workflow_meta.triggering_actor | The actor that initiated this particular run attempt (may differ from actor, e.g. when someone else re-runs the workflow). |
workflow_meta.job_name | Human-readable name of the job within the workflow. |
workflow_meta.log_file | Source filename within GitHub's log archive (e.g., 0_build.txt). Each job in a run has its own log file, so this distinguishes jobs within the same run. |
file_id | Stable, deterministic identifier for the source log file: {owner}/{repo}/{run_id}-{run_attempt}-{log_file}. Group records by file_id to reassemble a full file. |
chunk_meta.chunk_index | Zero-based position of this chunk within the source file. Use to order chunks when reassembling. |
chunk_meta.last_chunk | true on the final chunk of a file; false on every earlier chunk. Signals completeness of the log. |
chunk_meta.chunk_size_bytes | Byte length of the content for this chunk. |
chunk_meta.truncated_lines | Count of log lines in this chunk that exceeded 1 MB and were truncated. 0 for nearly all real-world logs. |
content | The raw log text for this chunk, as UTF-8. See Note on non-UTF-8 content below. |
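The file_id format above, and the filename convention mentioned under Troubleshooting (<N>_<job_name>.txt or <job_name>.txt), can be expressed as follows. This is an illustrative sketch: the regular expression is an assumption about how the input derives job_name from log_file, and a job whose own name begins with digits and an underscore would be parsed ambiguously.

```python
import re

# Assumed filename convention: "<N>_<job_name>.txt" or "<job_name>.txt".
_LOG_FILE = re.compile(r"^(?:\d+_)?(?P<job>.+)\.txt$")


def make_file_id(owner: str, repo: str, run_id: int, run_attempt: int, log_file: str) -> str:
    """Build file_id as {owner}/{repo}/{run_id}-{run_attempt}-{log_file}."""
    return f"{owner}/{repo}/{run_id}-{run_attempt}-{log_file}"


def job_name_from(log_file: str) -> str:
    """Return the job name, or "" when the filename does not match the pattern."""
    m = _LOG_FILE.match(log_file)
    return m.group("job") if m else ""
```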
Reconstructing full log files
To rebuild a complete log file from its chunks:
- Group by file_id: Collect all records that share the same file_id.
- Sort by chunk index: Within each group, sort records ascending by chunk_meta.chunk_index.
- Verify completeness: Confirm that the highest-index record has chunk_meta.last_chunk == true, and that chunk indexes form a contiguous sequence from 0 to N (no gaps).
- Concatenate: Append the content strings in index order. The result is the original log file as it appeared in GitHub's archive.
If chunk_meta.last_chunk == true is missing or the indexes have gaps, the log is incomplete (e.g., a partial sync, or chunks pending in a downstream queue). Wait for the next sync or investigate.
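The four steps can be sketched in Python, assuming records are dicts shaped like the Record Fields table. The helper name is hypothetical; collapsing records by chunk_index also tolerates the bounded duplicates described under Troubleshooting.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reassemble(records: Iterable[dict]) -> Tuple[Dict[str, str], List[str]]:
    """Return ({file_id: full log text}, [file_ids that are still incomplete])."""
    groups = defaultdict(list)
    for rec in records:  # 1. group by file_id
        groups[rec["file_id"]].append(rec)

    complete, incomplete = {}, []
    for file_id, recs in groups.items():
        # Keying by chunk_index also collapses re-emitted duplicate chunks.
        by_index = {r["chunk_meta"]["chunk_index"]: r for r in recs}
        indexes = sorted(by_index)  # 2. sort by chunk index
        ordered = [by_index[i] for i in indexes]
        # 3. verify: indexes 0..N with no gaps, and the last one is flagged last_chunk
        if indexes != list(range(len(indexes))) or not ordered[-1]["chunk_meta"]["last_chunk"]:
            incomplete.append(file_id)
            continue
        complete[file_id] = "".join(r["content"] for r in ordered)  # 4. concatenate
    return complete, sorted(incomplete)
```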
Content format
GitHub's job log files have a consistent structure that's preserved verbatim in the content field:
- Leading byte-order mark. The first chunk of every log file begins with U+FEFF (the UTF-8 BOM). When reassembling a full file, the BOM appears once at the start. Most parsers handle this transparently; strict ones may need to strip it.
- Per-line timestamp prefix. Each line is formatted as <RFC3339Nano timestamp><space><message>, e.g. 2026-04-28T00:31:56.3946773Z Current runner version: '2.334.0'. Timestamps are UTC with sub-second precision (seven fractional digits in this example, i.e. 100 ns resolution).
- Inline GitHub annotation markers. GitHub uses tags like ##[group]...##[endgroup], ##[error], ##[warning], ##[command], and ##[debug] inline in the log text to drive UI rendering (collapsible sections, severity badges). They appear as literal text in content; downstream consumers can parse them out or display them as-is.
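A sketch of splitting one content line into its timestamp, optional annotation tag, and message. The regular expressions are assumptions based on the format described above, and the function name is hypothetical. Python's datetime stores only microseconds, so fractional-second digits beyond the sixth are dropped.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumed line shape: "<timestamp>Z <message>", with an optional "##[tag]" marker.
_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z (.*)$")
_MARKER = re.compile(r"^##\[(\w+)\](.*)$")


def parse_line(line: str) -> Tuple[Optional[datetime], Optional[str], str]:
    """Split one content line into (timestamp, annotation tag, message)."""
    line = line.lstrip("\ufeff")  # the BOM precedes the first line of a file
    m = _LINE.match(line)
    if not m:
        return None, None, line  # line does not match the assumed format
    base, frac, message = m.groups()
    ts = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if frac:
        # Truncate (not round) to microseconds.
        ts = ts.replace(microsecond=int(frac[:6].ljust(6, "0")))
    tag = _MARKER.match(message)
    if tag:
        return ts, tag.group(1), tag.group(2)
    return ts, None, message
```

To process a whole chunk, apply parse_line to each element of content.splitlines().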
Note on non-UTF-8 content
GitHub Actions logs are UTF-8 encoded. If a job's stdout or stderr writes raw non-UTF-8 bytes (rare — usually only when a process emits binary data into its log stream), those bytes are replaced with the Unicode replacement character U+FFFD ("�") when the record is JSON-encoded. This replacement is not reversible. The vast majority of real-world workflow logs are clean UTF-8 and unaffected.
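Python's errors="replace" decoding shows the same lossy effect: invalid bytes become U+FFFD, and encoding the result back does not restore the original bytes. This only illustrates the behavior; it is not the input's code.

```python
# Bytes 0xFF and 0xFE never occur in valid UTF-8.
raw = b"progress: 50% \xff\xfe done"
text = raw.decode("utf-8", errors="replace")
print(text)  # progress: 50% �� done
print(text.encode("utf-8") == raw)  # False: the original bytes are lost
```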
Rate Limits
| Scope | Limit | Window | Notes |
|---|---|---|---|
| REST API (Core) - Personal Access Token | 5,000 requests | Per hour | Increases to 15,000 for Enterprise Cloud |
| REST API (Core) - GitHub App | 5,000 base + scales | Per hour | Scales to 12,500 per hour; 15,000 on Enterprise Cloud |
| Secondary Rate Limit | 900 points | Per minute | 100 concurrent requests max |
| Log Downloads | Not counted | N/A | Downloaded from signed CDN URLs; does not count against REST API limits |
Rate Limit Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-RateLimit-Used
Source: GitHub REST API Rate Limits
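A minimal sketch of waiting out the limits using the headers listed above. It assumes headers arrive as a plain dict with the capitalization shown (real HTTP clients usually expose case-insensitive mappings) and also honors Retry-After, which GitHub sends with some secondary-rate-limit responses. The function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next REST call, from GitHub's response headers."""
    if "Retry-After" in headers:  # sent with some secondary-rate-limit responses
        return float(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    now = time.time() if now is None else now
    # X-RateLimit-Reset is the window reset time in UTC epoch seconds.
    return max(0.0, int(headers["X-RateLimit-Reset"]) - now)
```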
Troubleshooting
1. Authentication Errors
- "invalid credentials": Verify the PAT or GitHub App private key is correct and hasn't expired
- "insufficient permissions": Ensure the classic PAT has repo (or public_repo for public repositories), the fine-grained PAT has Actions: Read and Metadata: Read, or the GitHub App has Actions: Read permission
- "repository not found": Double-check the Owner and Repository settings; verify the token has access to the repository
2. Rate Limit Errors
- The input automatically handles rate limiting by waiting until the reset time
- If sync is consistently slow due to rate limits, consider:
  - Using a GitHub App for higher rate limits
  - Configuring a more recent Backfill Start Time to reduce historical data to process
  - Running syncs during off-peak hours
3. No Data
- First sync: The default starting point is the current time, so the initial sync emits nothing. Set Backfill Start Time to an RFC3339 timestamp to pull historical runs.
- Missing logs: Workflow logs are retained for 90 days by default. Older runs won't have downloadable logs and are skipped.
- Empty repositories: Verify that the repository has Actions workflows enabled and has run at least once.
4. Missing Runs (Sync Boundary Gap)
- If runs that were in-flight during a previous sync appear to be missing, check whether their total execution time exceeded Maximum Job Execution Time. Increase this setting (up to 12 hours) to widen the lookback window and capture longer-running workflows.
5. Partial Records
- Records may have an empty workflow_meta.job_name if the log filename doesn't follow the expected <N>_<job_name>.txt or <job_name>.txt pattern
- A non-zero chunk_meta.truncated_lines value indicates one or more log lines in that chunk exceeded 1 MB and were truncated to their first 1 MB. The record is still valid; only the tail of the offending line was dropped
6. Duplicate Records After Errors or Wall-Clock Cap
- The cursor advances per sub-window (typically every few minutes of the lookback range). If a sync errors mid-walk (transient API failure, network issue) or hits the 4-hour wall-clock cap (only triggered by very large backfills or pathological workloads), the next sync resumes from the last successfully-drained sub-window — re-emitting only records from the in-flight sub-window that was interrupted, not from earlier successfully-drained sub-windows.
- The re-emit set is bounded by one sub-window's worth of records (typically minutes of activity, not hours).
- Destinations that key on file_id + chunk_meta.chunk_index (most upsert-capable sinks) absorb this transparently. Append-only destinations will see duplicate records bounded by what got through before the failure.
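For append-only destinations, the same key can be applied at read time. A minimal sketch, assuming records are dicts shaped like the Record Fields table and that a re-emitted chunk is identical to the first copy (i.e. chunking is deterministic for a given file); the helper name is hypothetical.

```python
from typing import Iterable, Iterator


def dedupe_chunks(records: Iterable[dict]) -> Iterator[dict]:
    """Yield the first record seen for each (file_id, chunk_index) key."""
    seen = set()
    for rec in records:
        key = (rec["file_id"], rec["chunk_meta"]["chunk_index"])
        if key not in seen:
            seen.add(key)
            yield rec
```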
Sample Record
Each record represents one ~1 MB chunk of a single job's combined log file. Workflow/run/job identifying metadata is nested under workflow_meta; chunk-specific fields are under chunk_meta.
Related Articles
- GitHub Actions REST API — Workflow Runs
- List workflow runs for a repository
- Download workflow run logs
- GitHub Authentication
- Personal Access Token (Classic) Scopes