New Feature · Pushed May 1, 2026 · M

Daily sync now skips already-processed commits

GitPulse can now skip commits it has already analyzed. On each daily run, the action fetches the manifest from the deployed site, restores prior stories locally, and only invokes the LLM for brand-new commits.

GitPulse analyzes git commits and generates AI summaries for project activity feeds. Re-running that analysis on every commit every day becomes expensive and slow as history grows.

At startup, the action now checks the deployed GitHub Pages site and fetches [[code]]manifest.json[[/code]], which lists every commit SHA already processed. Prior story JSON files are restored to disk in parallel. New commits are identified by filtering the git log against the manifest's SHA set, so only the delta goes to the LLM.

The bootstrap case is handled automatically: if no manifest exists yet (first run), all commits in the configured window are processed normally. From then on, the number of LLM calls per run scales with the number of new commits rather than with total history depth.

Output paths moved from [[code]]site/src/content/[[/code]] to [[code]]site/public/data/[[/code]] so the JSON files are served at HTTP-accessible paths after deploy. The old content directory is gone.

In the [[code]]@gitpulse/action[[/code]] package, [[code ref=1]]SiteFetcher[[/code]] handles HTTP retrieval with graceful fallback, and [[code ref=2]]state.ts[[/code]] manages the manifest and cursor tracking. Configuration accepts [[code ref=4]]GITPULSE_SITE_URL[[/code]] to override the default [[code]]https://{owner}.github.io/{repo}/[[/code]] URL.
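The delta selection described above can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the action's code: the `Commit` and `Manifest` shapes and the `selectNewCommits` name are hypothetical, since the real `manifest.json` schema isn't shown here. A `null` manifest stands for the bootstrap case.

```typescript
// Hypothetical shapes; the real manifest.json schema may differ.
interface ManifestEntry {
  id: string;   // story ID
  sha: string;  // commit SHA the story was generated from
  date: string; // ISO-8601 commit date
}

interface Manifest {
  stories: ManifestEntry[];
}

interface Commit {
  sha: string;
  date: string;
}

// Returns only the commits that still need LLM analysis.
// A null manifest means there was no previous deploy (bootstrap), so every commit is new.
function selectNewCommits(allCommits: Commit[], priorManifest: Manifest | null): Commit[] {
  if (priorManifest === null) {
    return allCommits;
  }
  // A Set gives average O(1) membership checks, so the filter is linear in allCommits.
  const processed = new Set(priorManifest.stories.map((entry) => entry.sha));
  return allCommits.filter((commit) => !processed.has(commit.sha));
}

// Usage: two commits were handled by a previous run, one is new.
const allCommits: Commit[] = [
  { sha: "a1", date: "2026-04-29T10:00:00Z" },
  { sha: "b2", date: "2026-04-30T10:00:00Z" },
  { sha: "c3", date: "2026-05-01T10:00:00Z" },
];
const priorManifest: Manifest = {
  stories: [
    { id: "story-a1", sha: "a1", date: "2026-04-29T10:00:00Z" },
    { id: "story-b2", sha: "b2", date: "2026-04-30T10:00:00Z" },
  ],
};

console.log(selectNewCommits(allCommits, priorManifest).map((c) => c.sha)); // [ 'c3' ]
console.log(selectNewCommits(allCommits, null).length); // 3
```

The git walk itself stays linear in the window; what shrinks to the delta is the set of commits handed to the LLM.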
Technical description
This PR implements incremental processing for GitPulse's daily GitHub Action runs. Previously, every scheduled run walked all commits in the configured window and sent each one to the LLM, so LLM calls grew linearly (O(n)) with the number of commits in that window on every run. As repositories accumulate commits, this becomes expensive and slow.

The solution: at startup, fetch a manifest from the already-deployed GitHub Pages site. This manifest, written by the previous run, maps each processed commit SHA to its story ID. Restore those prior stories to the working tree, then walk the local git log and filter out any commit already in the manifest. Only the delta flows through the LLM.

**State Management**

Persistence relies on two new data files, both built by functions exported from [[code ref=2]]state.ts[[/code]]. [[code ref=5]]buildManifestFromStories[[/code]] creates the manifest, which contains all story IDs and their commit SHAs, sorted by date. [[code ref=6]]buildStateFromStories[[/code]] extracts the cursor (the newest commit SHA and its timestamp) into [[code]]state.json[[/code]]. Both files are written to [[code]]site/public/data/[[/code]] so they are HTTP-fetchable after GitHub Pages deployment.

On subsequent runs, [[code ref=3]]priorManifest[[/code]] is null-checked. If it exists, prior stories are restored via [[code ref=1]]SiteFetcher.restorePriorStories()[[/code]] before the git walk, and the manifest's SHA set is used to filter [[code]]allCommits[[/code]] into [[code]]newCommits[[/code]].

**SiteFetcher**

The new [[code ref=1]]SiteFetcher[[/code]] class in [[code]]action/src/site-fetcher.ts[[/code]] fetches JSON from the deployed site over HTTPS. It handles missing files gracefully by returning null, which makes bootstrap a natural first-run case rather than one that needs special flags. Parallel restoration uses the existing [[code]]pMap[[/code]] concurrency utility. Failed fetches are counted and reported but do not block the run; the LLM can regenerate any missing stories.
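The manifest and cursor described under State Management can be sketched as two pure functions. The names `manifestFromStories` and `cursorFromStories`, the `Story` fields, and the oldest-first sort order are all assumptions for illustration; they mirror the roles of `buildManifestFromStories` and `buildStateFromStories` but are not their actual signatures.

```typescript
// Hypothetical story shape; field names are assumptions, not GitPulse's real types.
interface Story {
  id: string;
  sha: string;
  date: string; // ISO-8601 commit timestamp
}

interface ManifestEntry {
  id: string;
  sha: string;
  date: string;
}

interface Cursor {
  lastSha: string | null;
  lastDate: string | null;
}

// Manifest sketch: every story ID with its commit SHA, sorted oldest-first (assumed order).
function manifestFromStories(stories: Story[]): { stories: ManifestEntry[] } {
  const entries = stories
    .map(({ id, sha, date }) => ({ id, sha, date }))
    .sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
  return { stories: entries };
}

// Cursor sketch: the newest commit SHA and its timestamp; nulls when there are no stories.
function cursorFromStories(stories: Story[]): Cursor {
  let newest: Story | null = null;
  for (const story of stories) {
    if (newest === null || Date.parse(story.date) > Date.parse(newest.date)) {
      newest = story;
    }
  }
  return { lastSha: newest ? newest.sha : null, lastDate: newest ? newest.date : null };
}

// Usage: serialize both objects as they might be written to site/public/data/.
const stories: Story[] = [
  { id: "story-b2", sha: "b2", date: "2026-04-30T10:00:00Z" },
  { id: "story-a1", sha: "a1", date: "2026-04-29T10:00:00Z" },
];
console.log(JSON.stringify(manifestFromStories(stories)));
console.log(JSON.stringify(cursorFromStories(stories))); // {"lastSha":"b2","lastDate":"2026-04-30T10:00:00Z"}
```

Keeping both builders pure (stories in, plain objects out) leaves file writing as a separate step, which makes the manifest and cursor easy to test without touching disk.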
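The fetch-with-fallback and parallel-restore behavior described for SiteFetcher can be sketched as below. Everything here is hypothetical: `SiteFetcherSketch`, the `FetchLike` type, the `data/stories/{id}.json` URL layout, and `mapWithConcurrency` (a minimal stand-in for the repository's `pMap`, whose signature isn't shown). The fetch function is injected so the sketch runs against a fake site; in real use the global `fetch` fits the same shape.

```typescript
// Minimal response/fetch shapes so the sketch runs against the global fetch or a fake.
interface MinimalResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<MinimalResponse>;

// Stand-in for a pMap-style helper: runs at most `limit` tasks at a time.
async function mapWithConcurrency<T>(items: T[], limit: number, fn: (item: T) => Promise<void>): Promise<void> {
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // safe: JS runs this synchronously on one thread
      await fn(items[index]);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Hypothetical class mirroring the described behavior, not the actual SiteFetcher.
class SiteFetcherSketch {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(siteUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = siteUrl.replace(/\/+$/, ""); // drop trailing slashes before joining paths
    this.fetchImpl = fetchImpl;
  }

  // Returns parsed JSON, or null when the file is missing, the request fails, or the body is not JSON.
  async fetchJson<T>(path: string): Promise<T | null> {
    try {
      const res = await this.fetchImpl(`${this.baseUrl}/${path}`);
      return res.ok ? ((await res.json()) as T) : null;
    } catch {
      return null;
    }
  }

  // Restores stories in parallel; failures are counted instead of aborting the run.
  async restorePriorStories(ids: string[], concurrency = 8): Promise<{ restored: Map<string, unknown>; failed: number }> {
    const restored = new Map<string, unknown>();
    let failed = 0;
    await mapWithConcurrency(ids, concurrency, async (id) => {
      const story = await this.fetchJson<unknown>(`data/stories/${id}.json`); // assumed URL layout
      if (story === null) {
        failed += 1;
      } else {
        restored.set(id, story);
      }
    });
    return { restored, failed };
  }
}

// Usage with an in-memory fake site: s1 exists, s2 is missing.
const fakeSite = new Map<string, unknown>([
  ["https://octo.github.io/gitpulse/data/stories/s1.json", { id: "s1" }],
]);
const fakeFetch: FetchLike = async (url) => ({
  ok: fakeSite.has(url),
  json: async () => fakeSite.get(url),
});

const fetcher = new SiteFetcherSketch("https://octo.github.io/gitpulse/", fakeFetch);
fetcher.restorePriorStories(["s1", "s2"]).then(({ restored, failed }) => {
  console.log(`restored ${restored.size}, failed ${failed}`); // restored 1, failed 1
});
```

Collapsing "404", "network error", and "malformed JSON" into a single `null` is what lets a first run with no deployed site fall through to bootstrap without any special flag.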
**Configuration Changes**

In [[code]]action/src/config.ts[[/code]], [[code]]outDir[[/code]] was replaced with [[code]]dataDir[[/code]] and [[code]]storiesDir[[/code]]. The new [[code ref=4]]GITPULSE_SITE_URL[[/code]] environment variable overrides the default GitHub Pages URL; without it, the URL is auto-constructed following the [[code]]https://{owner}.github.io/{repo}/[[/code]] convention.

**Site Path Migration**

The [[code]]site[[/code]] app previously read stories from [[code]]src/content/stories/[[/code]], a directory generated at build time. It now reads from [[code]]public/data/stories/[[/code]], which GitHub Pages serves statically. [[code]]site/src/lib/repo.ts[[/code]] and [[code]]site/src/lib/stories-loader.ts[[/code]] point to the new paths, and the old [[code]]site/src/content/.gitkeep[[/code]] is deleted.

````mermaid
graph LR
  A[Daily Run Starts] --> B{Fetch manifest.json?}
  B -->|No| C[Bootstrap: process all commits]
  B -->|Yes| D[Restore prior stories from site]
  D --> E[Walk local git log]
  E --> F[Filter: only new SHAs]
  F --> G[LLM processes delta]
  C --> G
  G --> H[Write stories + manifest + state]
  H --> I[Deploy to GitHub Pages]
  I --> J((Ready for next run))
````

Files at a Glance:

- [[code]]action/src/site-fetcher.ts[[/code]]: fetches the manifest and stories from the deployed site
- [[code]]action/src/state.ts[[/code]]: builds and writes the manifest and state
- [[code]]action/src/config.ts[[/code]]: updated paths and new siteUrl config
- [[code]]action/src/index.ts[[/code]]: incremental logic, filtering, cursor tracking
- [[code]]site/src/lib/repo.ts[[/code]]: path updated to public/data/
- [[code]]site/src/lib/stories-loader.ts[[/code]]: path updated to public/data/
- [[code]]site/src/content/.gitkeep[[/code]]: deleted
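The site URL fallback described under Configuration Changes might look roughly like this. `resolveSiteUrl` is a hypothetical helper, and the trailing-slash normalization and the error on missing repository info are design choices of this sketch, not documented behavior. The one external fact it relies on is that GitHub Actions sets `GITHUB_REPOSITORY` to `owner/repo`. In the action you would pass `process.env`.

```typescript
// Hypothetical helper; config.ts may construct the URL differently.
function resolveSiteUrl(env: Record<string, string | undefined>): string {
  const override = env["GITPULSE_SITE_URL"]?.trim();
  if (override) {
    // Normalize to a trailing slash so relative data paths resolve under the site root.
    return override.endsWith("/") ? override : `${override}/`;
  }
  // GitHub Actions sets GITHUB_REPOSITORY to "owner/repo".
  const [owner, repo] = (env["GITHUB_REPOSITORY"] ?? "").split("/");
  if (!owner || !repo) {
    throw new Error("Set GITPULSE_SITE_URL or run inside GitHub Actions (GITHUB_REPOSITORY missing)");
  }
  return `https://${owner}.github.io/${repo}/`;
}

console.log(resolveSiteUrl({ GITHUB_REPOSITORY: "octo/gitpulse" })); // https://octo.github.io/gitpulse/
console.log(resolveSiteUrl({ GITHUB_REPOSITORY: "octo/gitpulse", GITPULSE_SITE_URL: "https://pulse.example.com" })); // https://pulse.example.com/
```

Checking the override first means a custom domain works even outside Actions, for example in a local dry run.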

Categories

  • New Feature (60%): Adds incremental processing, which restores prior state from the deployed site and only processes new commits
  • Performance (35%): Reduces LLM calls from O(all commits) to O(new commits only); bootstrap is a one-time cost
  • Configuration (5%): New environment variable GITPULSE_SITE_URL and changed output paths