Research Document v1.0
Abstract
This document synthesizes key findings from the Advanced Context Engineering for Coding Agents (ACE-FCA) framework developed by HumanLayer, combined with Stanford’s developer productivity research. The central thesis: context management is not just important for AI coding agents; it is the determinant of output quality. With AI tools generating increasing volumes of code, the ability to measure and filter quality has become the critical differentiator between productive AI augmentation and a “tech debt factory.”
Source: humanlayer/advanced-context-engineering-for-coding-agents
1. The Problem: AI Tools in Production Codebases
1.1 Stanford Developer Productivity Findings
The Stanford study on AI’s impact on developer productivity found two critical insights:
- Rework Problem: A significant portion of “extra code” shipped by AI tools ends up just reworking the slop that was shipped last week
- Brownfield Penalty: Coding agents excel at greenfield projects but are often counter-productive for established codebases and complex tasks
Common complaints from engineering teams:
- “Too much slop”
- “Tech debt factory”
- “Doesn’t work in big repos”
- “Doesn’t work for complex systems”
1.2 The Default Response
The common response falls somewhere between two positions:
- Pessimist: “This will never work”
- Optimist: “Maybe someday when there are smarter models”
The ACE-FCA thesis: You can get remarkably far with today’s models if you embrace core context engineering principles.
2. Why Context Is Everything
2.1 LLMs Are Stateless Functions
At any given point, a turn in a coding agent is a stateless function call:
Context Window In -> Next Step Out
The contents of your context window are the ONLY lever you have to affect the quality of your output.
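The stateless-function view above can be sketched in a few lines of Python. This is only an illustration: `run_turn` and `toy_model` are hypothetical names, and `toy_model` is a stand-in for a real model call, not any vendor's API. The point it shows is that the same model produces a different next step only because the context changed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Turn:
    context_window: str  # everything the model sees on this turn
    next_step: str       # the action or text the model returns


def run_turn(model: Callable[[str], str], context_window: str) -> Turn:
    """One agent turn: the output depends only on the context passed in."""
    return Turn(context_window, model(context_window))


def toy_model(context_window: str) -> str:
    """Hypothetical stand-in for a real model call, for illustration only."""
    if "auth.py" in context_window:
        return "edit auth.py"
    return "search for relevant files"


# Same model, different context, different next step.
sparse = run_turn(toy_model, "Task: fix the login bug")
informed = run_turn(
    toy_model,
    "Task: fix the login bug\nResearch: the bug is in auth.py",
)
```

Nothing carries over between the two calls, so whatever the agent needs to know on a given turn has to be in `context_window`.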
2.2 Context Window Optimization Priorities
Optimize your context window for (in order):
- Correctness - Accurate information
- Completeness - All necessary information present
- Size - Minimal noise
- Trajectory - Pointing toward the goal
The worst things that can happen to your context window, in order:
- Incorrect Information
- Missing Information
- Too Much Noise
2.3 The ~170K Token Constraint
As Geoff Huntley puts it:
“The name of the game is that you only have approximately 170k of context window to work with. So it’s essential to use as little of it as possible. The more you use the context window, the worse the outcomes you’ll get.”
3. The 40% Compaction Principle
3.1 Frequent Intentional Compaction
The ACE-FCA framework centers on “frequent intentional compaction”: deliberately structuring how you feed context to the AI throughout development.
Key practice: Keep context utilization in the 40-60% range, compacting frequently rather than filling the window.
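As a rough sketch, the 40-60% guideline can be turned into a simple check. The 170,000-token limit is the approximate figure quoted in section 2.3. Treating 60% as the point at which to compact is an assumption made for this example, not a rule stated by the framework, and the function names are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 170_000  # approximate figure quoted in section 2.3
COMPACT_ABOVE = 0.60            # assumed trigger: top of the 40-60% range


def utilization(tokens_used: int, limit: int = CONTEXT_LIMIT_TOKENS) -> float:
    """Fraction of the context window currently in use."""
    return tokens_used / limit


def should_compact(
    tokens_used: int,
    limit: int = CONTEXT_LIMIT_TOKENS,
    threshold: float = COMPACT_ABOVE,
) -> bool:
    """True once utilization rises above the assumed compaction threshold."""
    return utilization(tokens_used, limit) > threshold
```

With these numbers, 40% of the window is 68,000 tokens and 60% is 102,000 tokens, so a session at 110,000 tokens would be flagged for compaction while one at 90,000 would not.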
3.2 What Eats Context?
- Searching for files (Glob/Grep results)
- Understanding code flow
- Applying edits
- Test/build logs
- Huge JSON blobs from tools
Compaction distills these into structured artifacts.
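One possible shape for such an artifact, assuming search results arrive as `(path, line_number, text)` tuples: the raw matched text is dropped and only the file-and-line map is kept. The input format and both function names are hypothetical and chosen for illustration; real tool output will differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, int, str]  # (path, line_number, matched_text), assumed format


def compact_search_hits(hits: Iterable[Hit]) -> Dict[str, List[int]]:
    """Collapse raw hits into {path: sorted unique line numbers}."""
    by_file: Dict[str, List[int]] = defaultdict(list)
    for path, line_number, _text in hits:
        by_file[path].append(line_number)
    return {path: sorted(set(nums)) for path, nums in sorted(by_file.items())}


def render_artifact(title: str, by_file: Dict[str, List[int]]) -> str:
    """Render the compacted map as a short text block for the next context."""
    lines = [title]
    for path, nums in by_file.items():
        lines.append(f"- {path}: lines {', '.join(map(str, nums))}")
    return "\n".join(lines)


hits = [
    ("src/auth.py", 42, "def login(user, password):"),
    ("src/auth.py", 10, "import jwt"),
    ("src/db.py", 7, "def get_user(user_id):"),
]
artifact = render_artifact("Relevant files", compact_search_hits(hits))
```

The design choice here is to keep pointers (paths and line numbers) rather than content, on the assumption that the agent can re-read a file when it actually needs the text.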
3.3 The Research/Plan/Implement Workflow
The ACE-FCA workflow splits tasks into three phases:
Research Phase
- Understand the codebase
- Identify relevant files
- Map information flows
- Identify potential causes
Plan Phase
- Outline exact implementation steps
- Specify files to edit and how
- Detail testing/verification for each phase
Implement Phase
- Step through plan phase by phase
- Compact status back into plan after each phase
This workflow deliberately creates compaction points where human review provides maximum leverage.
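A minimal sketch of how a plan might be held as a structured artifact, with status compacted back into it after each phase. The `Plan` and `PlanPhase` classes, their fields, and the example task are all hypothetical; the framework itself does not prescribe a data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanPhase:
    name: str
    files: List[str]     # files to edit in this phase
    verification: str    # how the phase will be tested or verified
    status: str = "pending"
    notes: str = ""      # compacted result, filled in after implementation


@dataclass
class Plan:
    goal: str
    phases: List[PlanPhase] = field(default_factory=list)

    def next_phase(self) -> Optional[PlanPhase]:
        """Return the first phase that has not been completed yet."""
        return next((p for p in self.phases if p.status == "pending"), None)

    def compact_status(self, phase_name: str, notes: str) -> None:
        """Write a short result summary back into the plan."""
        for phase in self.phases:
            if phase.name == phase_name:
                phase.status, phase.notes = "done", notes
                return
        raise KeyError(f"no phase named {phase_name!r}")

    def render(self) -> str:
        """Produce the text that seeds the next, fresh context window."""
        lines = [f"Goal: {self.goal}"]
        for phase in self.phases:
            mark = "x" if phase.status == "done" else " "
            files = ", ".join(phase.files)
            lines.append(f"[{mark}] {phase.name} ({files}); verify: {phase.verification}")
            if phase.notes:
                lines.append(f"    result: {phase.notes}")
        return "\n".join(lines)


plan = Plan(
    goal="Fix token refresh",
    phases=[
        PlanPhase("Add expiry check", ["auth/session.py"], "unit tests pass"),
        PlanPhase("Wire refresh call", ["auth/client.py"], "integration test passes"),
    ],
)
plan.compact_status("Add expiry check", "done; 3 tests added")
```

After each phase, `plan.render()` is short enough for a human to review and can start the next phase in a clean context instead of carrying the full transcript forward.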
4. Human Leverage Points
4.1 The Error Amplification Problem
- A bad line of code = 1 bad line
- A bad line of plan = hundreds of bad lines of code
- A bad line of research = thousands of bad lines of code
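The amplification can be made concrete with illustrative fan-out factors. The specific numbers below (200 code lines per plan line, 10 plan lines per research line) are assumptions picked only to match the "hundreds" and "thousands" orders of magnitude above; they are not measurements from the source.

```python
# Assumed fan-out factors, for illustration only.
CODE_LINES_PER_PLAN_LINE = 200
PLAN_LINES_PER_RESEARCH_LINE = 10

FANOUT = {
    "code": 1,
    "plan": CODE_LINES_PER_PLAN_LINE,
    "research": PLAN_LINES_PER_RESEARCH_LINE * CODE_LINES_PER_PLAN_LINE,
}


def bad_code_lines(stage: str, bad_lines: int = 1) -> int:
    """Estimate how many bad code lines result from errors at a given stage."""
    return bad_lines * FANOUT[stage]
```

Under these assumptions, one bad research line costs as much as 2,000 bad code lines, which is why review effort is better spent upstream.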
4.2 High-Leverage Review
Focus human attention on the HIGHEST LEVERAGE parts of the pipeline:
| Review Point | Leverage | What to Look For |
|---|---|---|
| Research | Highest | Incorrect assumptions, missing context, wrong file locations |
| Plan | High | Flawed approach, missing edge cases, wrong testing strategy |
| Code | Medium | Implementation bugs, style issues |
When you review research and plans, you get more leverage than reviewing code alone.
4.3 Mental Alignment
The most important outcome of research/plan/implement isn’t code quality; it’s mental alignment.
“I can’t read 2000 lines of golang daily. But I can read 200 lines of a well-written implementation plan.”
A guaranteed side effect of shipping more code is that a larger proportion of the codebase becomes unfamiliar to any given engineer. Reviewing research and plans, which are far shorter than the code they produce, lets the team keep up with what is changing and why.
5. Measurement: The Missing Link
5.1 Beyond Context to Measurement
While the ACE-FCA framework emphasizes context engineering, the deeper insight is that measurement is what validates context quality.
The research/plan/implement workflow includes built-in measurement points:
- Research is validated by human review
- Plans are tested against research
- Implementation is verified against plans
- Tests confirm implementation correctness
5.2 The Rework Metric
From the Stanford study: the true productivity metric isn’t lines of code shipped, but lines of code that survive without rework.
AI tools that optimize for code volume without quality measurement end up creating negative productivity when rework is factored in.
5.3 Quality Signals to Track
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Rework Rate | % of code changed within 2 weeks | Indicates slop/tech debt |
| Plan Accuracy | % of plan steps requiring modification | Research quality signal |
| Context Utilization | % of window used at task completion | Efficiency indicator |
| Human Review Findings | Issues caught at research/plan phase | Leverage effectiveness |
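Two of these metrics can be sketched as small functions. The record format is hypothetical (one record per shipped line, with an optional date of first modification), and counting a change on exactly day 14 as rework is an assumption. How a team actually extracts this data from version control is outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence

REWORK_WINDOW = timedelta(days=14)  # "within 2 weeks"; day 14 counts (assumed)


@dataclass
class LineRecord:
    shipped_on: date
    changed_on: Optional[date] = None  # None: not modified since shipping


def rework_rate(records: Sequence[LineRecord]) -> float:
    """Fraction of shipped lines that were changed within the rework window."""
    if not records:
        return 0.0
    reworked = sum(
        1
        for r in records
        if r.changed_on is not None
        and r.changed_on - r.shipped_on <= REWORK_WINDOW
    )
    return reworked / len(records)


def plan_modification_rate(total_steps: int, modified_steps: int) -> float:
    """Fraction of plan steps that had to be modified (lower is better)."""
    if total_steps == 0:
        return 0.0
    return modified_steps / total_steps


shipped = date(2026, 1, 1)
records = [
    LineRecord(shipped, date(2026, 1, 5)),   # reworked after 4 days
    LineRecord(shipped, date(2026, 1, 15)),  # reworked on day 14
    LineRecord(shipped, date(2026, 2, 1)),   # changed later, outside the window
    LineRecord(shipped),                     # never changed
]
```

For the sample records, two of the four lines count as rework, so the rework rate is 0.5 and the surviving share is the remaining half.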
6. Practical Implications
6.1 What This Means for Teams
- Context is the bottleneck, not intelligence - Models are capable enough; feed them better context
- Compact early and often - Don’t wait for context overflow
- Review upstream - Catch errors at research/plan phase
- Measure what matters - Track rework, not just output volume
- Maintain mental alignment - Team knowledge is a product
6.2 When It Doesn’t Work
The ACE-FCA approach has limitations:
- Requires engaged human participation throughout
- Complex dependency chains can defeat research depth
- Needs at least one domain expert on the team
- Some problems cannot be solved by prompting, no matter how much time is spent
6.3 The Real Skill
“AI for coding is not just for toys and prototypes, but rather a deeply technical engineering craft.”
The craft is:
- Knowing when to compact
- Knowing what to keep vs. discard
- Knowing where to focus human review
- Knowing how to measure quality
7. Key Takeaways
- 40-60% Context Utilization: Keep the window under-filled, compact frequently
- Research -> Plan -> Implement: Build compaction into workflow structure
- Human Review at High Leverage Points: Research errors cost more than code errors
- Measure Rework, Not Output: Productivity = code that survives
- Mental Alignment Matters: Teams need to understand AI-generated code
References
- ACE-FCA: Advanced Context Engineering for Coding Agents - HumanLayer
- Stanford Study on AI’s Impact on Developer Productivity - Yegor Denisov-Blanch et al.
- Specs are the new code - Sean Grove, AI Engineer 2025
- Ralph Wiggum as a Software Engineer - Geoff Huntley
- 12-Factor Agents - HumanLayer
- Code Review Essentials for Software Teams - Blake Smith
Document generated 2026-02-02. Based on research from the ACE-FCA framework by HumanLayer.