Case study

Using bug history to make pull request review more effective.

I built an autonomous regression-review workflow that compares pull request diffs against historical failure patterns, then posts evidence-backed feedback before risky changes reach production.

The project sits at the overlap of quality engineering and AI workflow design: use LLM reasoning where it adds leverage, but anchor decisions in concrete historical evidence rather than vague summarisation.

Context

The problem it solves

Traditional code review is good at spotting style issues, obvious mistakes, and architectural concerns. It is much less reliable at noticing when a new change quietly resembles a past production failure, especially in large or fast-moving codebases.

Quality teams often have valuable defect history spread across Sentry, issue trackers, and commit logs, but that knowledge rarely shows up at the exact moment a risky pull request is under review.

Design goal

Turn historical bug knowledge into something reviewable and immediate, not something buried in old tickets after a regression has already shipped.

My role

What I built

Workflow

How it works

1. Build the knowledge base

Past bugs, incidents, issue tickets, and git blame history are collected and stored in a vector-backed retrieval layer, so the system can surface similar historical failures for each new change.

2. Inspect the pull request diff

The agent analyses changed files and code patterns, then retrieves the most relevant historical examples before generating any review output.

3. Reason over evidence

GPT-4o serves as a reasoning layer, not as the source of truth. The model compares the diff against the retrieved examples and flags likely regression patterns, citing the historical references that support each finding.

4. Post review comments

Findings are sent back to GitHub as structured review comments so the PR author gets signal in the same workflow where decisions are already happening.
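
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Incident` record, the function names, the similarity threshold, and the comment dictionary shape are all hypothetical. A toy bag-of-words similarity stands in for the real embedding and vector store, and the model call itself is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    ref: str      # hypothetical identifier of a past bug or incident
    summary: str  # short description of what went wrong


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would use a learned embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(diff: str, history: list[Incident], k: int = 2,
             min_score: float = 0.1) -> list[tuple[float, Incident]]:
    # Steps 1 and 2: rank stored incidents by similarity to the diff and
    # drop weak matches so the model never sees unrelated "evidence".
    query = embed(diff)
    scored = [(cosine(query, embed(inc.summary)), inc) for inc in history]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(diff: str, matches: list[tuple[float, Incident]]) -> str:
    # Step 3: the model may only reason over the retrieved incidents.
    evidence = "\n".join(f"- [{inc.ref}] {inc.summary}" for _, inc in matches)
    return (
        "Compare the diff with the historical failures below. "
        "Report a regression risk only if you can cite a listed reference.\n\n"
        f"Historical failures:\n{evidence}\n\nDiff:\n{diff}"
    )


def to_review_comments(path: str,
                       matches: list[tuple[float, Incident]]) -> list[dict]:
    # Step 4: one structured comment per match (illustrative shape only,
    # not the exact payload of any particular API).
    return [
        {
            "path": path,
            "body": f"Resembles past failure {inc.ref} "
                    f"(similarity {score:.2f}): {inc.summary}",
        }
        for score, inc in matches
    ]
```

The design choice worth noting is that the evidence filter runs before the prompt is built: if nothing in the history clears the threshold, the prompt contains no references and there is nothing for the model to cite.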

Key decisions

Important design choices

Impact

Why this matters

The value of this project is not just AI-assisted review. It demonstrates a way to turn QA knowledge into an active engineering system that improves decision quality earlier in the lifecycle, where prevention is cheaper than recovery.

Stack

Tools involved