Engineering Docs With AI: What It Writes Well vs What It Doesn't

6/15/2026 · 9 min read

TL;DR

  • AI writes API references, setup guides, and runbooks well — they're mechanical translations of structured artifacts (code, scripts, alerts) into prose.
  • AI writes design docs and architecture decision records badly — those need judgment, trade-off framing, and the kind of context only humans hold.
  • The right split isn't "AI vs human" — it's "AI drafts the deterministic layer, humans own the judgment layer." Trying to merge them produces the worst of both.

When a VP of engineering at a 200-person company told me they'd "let AI handle the docs problem," I asked which docs. Their team had silently committed to AI-generated everything — API references, runbooks, design docs, ADRs. Six months later the design docs had been quietly rewritten by humans, and the API reference was the best it had ever been.

Why does AI documentation work for some docs and not others?

Because "documentation" is three different jobs in a trench coat. Reference docs describe what a thing is. Operational docs describe how to use it. Decision docs describe why a thing exists the way it does. AI is competent at the first two and unreliable at the third — and the reason is structural, not a model limitation.

Definition: Reference doc — a doc whose source of truth is code or config (API signatures, environment variables, command flags). Updatable mechanically.

Reference and operational docs derive from artifacts the AI can read. API signatures are in code. Environment variables are in config. Runbook steps map to alerts and incident history. The AI's job is translation, not invention — and translation is where it's strongest.

Design docs and decision records derive from judgment the AI can't read. Why did the team pick Postgres over DynamoDB? Because two engineers had operational scars from DynamoDB at a previous job, the CTO weighted reliability over scale-out, and the team's existing observability tooling already had Postgres dashboards. None of that is in the codebase.

What AI writes well

API reference

AI generates accurate API reference from the source code, including parameter types, return shapes, error codes, and basic example usage. The output is consistent across endpoints, which humans struggle to maintain when the reference doc is a hand-written sidecar. The maintenance loop closes automatically: when the code changes, the doc regenerates.
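To make "mechanical translation" concrete, here is a minimal sketch of the deterministic part of that loop: reading a signature and docstring and rendering a reference entry. It uses only Python's standard `inspect` module. The `create_invoice` handler and the output layout are hypothetical, made up for illustration, and a real pipeline would put an AI drafting step on top of this extracted structure.

```python
import inspect


def _type_name(annotation) -> str:
    """Readable name for an annotation, or 'untyped' when none is given."""
    if annotation is inspect.Signature.empty:
        return "untyped"
    return getattr(annotation, "__name__", str(annotation))


def render_reference(func) -> str:
    """Build a plain-text reference entry from a function's signature and docstring."""
    sig = inspect.signature(func)
    doc = inspect.getdoc(func) or "(no docstring)"
    lines = [f"{func.__name__}{sig}", doc.splitlines()[0], "Parameters:"]
    for name, param in sig.parameters.items():
        if param.default is inspect.Parameter.empty:
            default = "required"
        else:
            default = f"default={param.default!r}"
        lines.append(f"  {name}: {_type_name(param.annotation)}, {default}")
    lines.append(f"Returns: {_type_name(sig.return_annotation)}")
    return "\n".join(lines)


# Hypothetical endpoint handler, used only to exercise the sketch.
def create_invoice(customer_id: str, amount_cents: int, currency: str = "USD") -> dict:
    """Create an invoice for a customer."""
    return {"customer_id": customer_id, "amount_cents": amount_cents, "currency": currency}


reference_entry = render_reference(create_invoice)
print(reference_entry)
```

Because the entry is derived from the code itself, rerunning this on every merge is what keeps the reference from drifting.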

Setup guide / onboarding doc

AI writes setup guides well from the actual setup scripts, the README, and the deploy config. It catches inconsistencies between what the README says and what the script does, which is the #1 reason setup guides go stale. A first-day onboarding doc that the AI drafts from the repo, scripts, and environment variables is typically 80% correct on day one and improves with feedback.
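One way to picture the README-versus-script check is a plain drift report on environment variable names. The sketch below assumes variables are written in UPPER_SNAKE_CASE with at least one underscore, which is a convention and not a rule, and the README and script snippets are invented for the example.

```python
import re

# Assumption: env vars look like DATABASE_URL (uppercase, at least one underscore).
ENV_VAR = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def env_var_drift(readme_text: str, script_text: str) -> dict:
    """Compare env var names mentioned in the README with those used in the script."""
    documented = set(ENV_VAR.findall(readme_text))
    used = set(ENV_VAR.findall(script_text))
    return {
        "documented_but_unused": sorted(documented - used),
        "used_but_undocumented": sorted(used - documented),
    }


# Hypothetical inputs, standing in for a real README and setup script.
readme = "Set DATABASE_URL and REDIS_URL before running setup.sh."
script = 'export DATABASE_URL="$1"\nexport SENTRY_DSN="$2"\n'

drift = env_var_drift(readme, script)
print(drift)
```

A non-empty report is the signal to regenerate the setup guide or to ask a human which side is wrong.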

Runbook

Runbooks are operational reference — for alert X, do Y, escalate Z. AI drafts these well from the alert definition, the service architecture, and prior incident history. The pattern that works: every new alert ships with an AI-drafted runbook entry that the on-call engineer edits within the first week of seeing the alert fire.

Definition: Runbook — operational doc that pairs an alert or failure mode with the actions an on-call engineer should take. Lives or dies on freshness.
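The "every new alert ships with a runbook entry" pattern can be checked mechanically. The following is a rough sketch under stated assumptions: alerts are plain dicts with hypothetical `name`, `condition`, and `escalation` keys, and the stub layout is invented. It measures coverage and emits an unreviewed stub for the on-call engineer to edit.

```python
def runbook_coverage(alert_names, runbook_entries):
    """Return (coverage fraction, sorted alert names with no runbook entry)."""
    alerts = set(alert_names)
    missing = sorted(alerts - set(runbook_entries))
    covered = len(alerts) - len(missing)
    fraction = covered / len(alerts) if alerts else 1.0
    return fraction, missing


def draft_stub(alert: dict) -> str:
    """Draft a runbook stub from an alert definition; a human fills in the actions."""
    return "\n".join([
        f"Alert: {alert['name']}",
        f"Fires when: {alert['condition']}",
        "First actions: TODO (on-call engineer to edit)",
        f"Escalate to: {alert.get('escalation', 'TODO')}",
        "Status: AI-drafted, unreviewed",
    ])


# Hypothetical alert names and existing runbook entries.
alerts = ["HighErrorRate", "DiskAlmostFull", "QueueBacklog", "CertExpiring", "ReplicaLag"]
runbooks = {"HighErrorRate": "...", "DiskAlmostFull": "..."}

coverage, missing = runbook_coverage(alerts, runbooks)
stub = draft_stub({"name": "QueueBacklog", "condition": "queue depth > 10000 for 5m"})
print(f"{coverage:.0%} covered, missing: {missing}")
print(stub)
```

The "Status" line matters: it tells whoever is paged that the entry has not yet been edited by someone who has seen the alert fire.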

What AI writes badly

Design doc

A design doc justifies a non-obvious choice. The AI can describe options and trade-offs in the abstract but cannot say which trade-off this team should accept and why. The result is bland: "Option A has higher availability, Option B has lower cost." A human design doc says: "We're picking Option A because last quarter's outage cost us 3 enterprise renewals and we're not eating that again."

Architecture decision record (ADR)

ADRs are explicitly about the "why." AI-generated ADRs read like Wikipedia summaries of the technology. Human-written ADRs read like a memory of the conversation — which is the only thing the ADR is for. Six months later, when someone asks "why did we pick this," only the human version answers.

Roadmap-adjacent docs

Anything that says "what's coming and why" — quarterly engineering plans, roadmap framing docs, internal positioning of a refactor — depends on context that lives outside the codebase. AI drafts here read as generic. Humans write these or they read as marketing.

The split — copy/paste matrix

## Doc-by-doc AI/human split

| Doc type            | AI drafts? | Human owns?   | Refresh trigger |
|---------------------|------------|---------------|-----------------|
| API reference       | Yes, 100%  | Spot review   | On code change  |
| Setup guide         | Yes, 100%  | Edit weekly   | On repo change  |
| Runbook entry       | Yes, 100%  | On-call edits | On alert change |
| Postmortem timeline | Yes, 100%  | RCA section   | Per incident    |
| Design doc          | Outline    | Full body     | On scope change |
| ADR                 | Template   | Full body     | Per decision    |
| Quarterly plan      | No         | Full body     | Quarterly       |
| Onboarding wiki     | Sections   | Org context   | Monthly         |

## Refresh rules
- AI-drafted docs auto-regenerate on source change + post a diff PR.
- Human-owned docs have a named owner with a quarterly review reminder.
- Mixed docs (outline AI / body human) get a "last reviewed" stamp.

The matrix is what makes the split survive a quarter. Without it, every new doc becomes a debate about whether AI should write it.
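If the matrix lives in the repo as data instead of in a slide, tooling can enforce it. Here is a minimal sketch of that idea; the `DocPolicy` type, the key names, and the "full draft" label are assumptions made for this example, with the row values taken from the matrix above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocPolicy:
    ai_drafts: str        # how much the AI is allowed to produce
    human_owns: str       # what stays with a named human
    refresh_trigger: str  # what event should prompt a refresh


# Rows mirror the split matrix; key names are illustrative.
SPLIT_MATRIX = {
    "api_reference": DocPolicy("full draft", "spot review", "code change"),
    "setup_guide": DocPolicy("full draft", "weekly edit", "repo change"),
    "runbook_entry": DocPolicy("full draft", "on-call edits", "alert change"),
    "postmortem_timeline": DocPolicy("full draft", "RCA section", "per incident"),
    "design_doc": DocPolicy("outline", "full body", "scope change"),
    "adr": DocPolicy("template", "full body", "per decision"),
    "quarterly_plan": DocPolicy("none", "full body", "quarterly"),
    "onboarding_wiki": DocPolicy("sections", "org context", "monthly"),
}


def policy_for(doc_type: str) -> DocPolicy:
    """Look up the policy, failing loudly for doc types nobody has classified yet."""
    try:
        return SPLIT_MATRIX[doc_type]
    except KeyError:
        raise ValueError(
            f"No policy for {doc_type!r}; add a row to the matrix before drafting."
        ) from None


def ai_may_write_body(doc_type: str) -> bool:
    """True only when the matrix lets the AI draft the whole document."""
    return policy_for(doc_type).ai_drafts == "full draft"


print(ai_may_write_body("runbook_entry"), ai_may_write_body("design_doc"))
```

Failing on an unknown doc type is a deliberate choice here: it forces the "should AI write this?" decision to happen once, in the matrix, instead of per document.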

Tool tip (Course for Business): Engineering doc rollouts fail not because of the tooling but because nobody on the team knows where the AI boundary sits. Our 6-week program uses the "Augment, don't replace" framing to draw that boundary explicitly per doc type, and AI Champions (1:15-20) make sure engineers know which docs they're still expected to write themselves. Week 3 includes a Shoulder-to-Shoulder session where a senior engineer pairs with a junior on rewriting an AI-drafted runbook: the junior learns the doc craft, and the runbook becomes accurate. Walk through the program at https://course.aiadvisoryboard.me/business.

Team scan (what AI champions report after week 1)

  • ~85% of engineers have used AI to draft at least one doc this week
  • The most common first AI doc use case: regenerating a stale README or API ref
  • Saved time per doc: 30-90 minutes on reference docs, near zero on design docs
  • Top complaint: "the AI design doc sounded confident and was wrong" — flag the doc type, not the AI
  • One engineer has rewritten an AI-drafted ADR completely — this is the expected pattern, share it
  • Runbook coverage on alerts went from ~40% to ~85% in one week — biggest win
  • API reference last-updated stamp went from 6 months stale to current
  • Zero design docs shipped purely AI-written — this is the line you hold
  • One champion ran the doc-by-doc split matrix in the engineering all-hands — adopt that as the kickoff pattern

Micro-case (what changes after 7-14 days)

A 90-engineer SaaS company had three documentation problems compounding: API reference was six months stale, runbooks covered 40% of paging alerts, and the design doc folder was full of half-finished drafts because seniors didn't want to spend a day on a doc. They split the work by type in week one — AI took the API ref and runbook regeneration on a daily job, humans kept the design docs and quarterly plans as before. By day fourteen, API reference was current and auto-regenerating on code merge, runbook coverage hit 85% with on-call engineers editing each new entry, and the design-doc backlog stopped growing because seniors had two extra hours a week back. Net effect: on-call engineers were finding the runbook before the page resolved, and new hires were ramping in five days instead of three weeks because the setup guide was finally accurate. The design docs didn't improve — but they didn't get worse either, and the team's appetite to write them came back.

Note on this case: This example is illustrative — based on typical patterns we observe with companies of 30-500 employees, not a single named client. Specific numbers are rounded approximations of common ranges, not guarantees.

Tool tip (Course for Business): The thing that makes engineering doc adoption stick is the named AI Champion (1:15-20) running the weekly doc-by-doc review — what's working, what's drifting, where humans need to take a doc back. Our 6-week program teaches that review explicitly in week 4, with a Shoulder-to-Shoulder hot seat where a champion sits with the eng manager and walks through the matrix on real docs from the team's repo. Book a 30-min mapping call at https://course.aiadvisoryboard.me/business.

FAQ

Should we let AI write our public-facing docs? For reference content (API docs, SDK guides) — yes, with editorial review. For tutorial content and conceptual explainers that shape how customers understand your product — no, those are positioning artifacts and need human voice.

What about diagrams? AI is now usable for sequence diagrams and basic architecture diagrams when fed accurate input. It's bad at "which level of abstraction does this diagram need to be at" — which is the same judgment problem as design docs. Use AI to draft, humans to choose the abstraction.

Will AI doc quality keep improving? Yes for reference and operational docs. The judgment gap for design docs and ADRs is unlikely to close soon: that's a context-access problem, not a model-quality problem.

How do we stop AI docs from being subtly wrong? Two things. First, the source-of-truth chain must be tight: AI reads code, not other AI-generated docs. Second, the "last regenerated" stamp must be visible to readers so they can sanity-check against the codebase.
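A visible stamp is easy to sketch. The version below is a rough illustration, not a prescribed format: it assumes the stamp is the first line of the doc and carries a short SHA-256 fingerprint of the source the doc was generated from, so a reader or a CI job can tell whether the source has changed since. The function names and the stamp layout are hypothetical.

```python
import hashlib
from datetime import date


def source_fingerprint(source_text: str) -> str:
    """Short, stable fingerprint of the source the doc was generated from."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()[:12]


def stamp(doc_body: str, source_text: str, today: date) -> str:
    """Prepend a reader-visible 'last regenerated' line to a generated doc."""
    header = (
        f"Last regenerated: {today.isoformat()} "
        f"(source {source_fingerprint(source_text)})"
    )
    return f"{header}\n\n{doc_body}"


def is_stale(stamped_doc: str, current_source: str) -> bool:
    """True when the stamp's fingerprint no longer matches the current source."""
    first_line = stamped_doc.partition("\n")[0]
    return source_fingerprint(current_source) not in first_line


# Hypothetical source and doc text, for illustration only.
source_v1 = "def list_invoices(): ..."
stamped = stamp("GET /invoices returns a list.", source_v1, date(2026, 6, 15))

print(stamped.splitlines()[0])
print(is_stale(stamped, source_v1), is_stale(stamped, "def list_invoices(limit): ..."))
```

Fingerprinting the code, not the previous doc, is what keeps the source-of-truth chain tight: the check never depends on other AI-generated text.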

What about Notion/Confluence-style team docs (decisions, retros, project pages)? AI fills the structured parts (timelines, status fields, generated summaries) but the narrative parts stay human. The split matrix above applies — figure out which job each doc is doing and route accordingly.

Conclusion

AI engineering docs are not a binary win or a binary failure. They're a specific tool for a specific layer: the deterministic translation of code, config, and alerts into prose. Get that layer right and you save engineers 30-90 minutes per doc and end up with reference docs that are actually current. Try to extend it into design docs and you ship confident-sounding blandness that no one trusts.

Run the doc-by-doc split matrix this week. Pick the three doc types where you'll let AI draft 100%. Pick the three where humans stay fully responsible. Write the boundary down.

If you want every employee — including your engineers — to ship their first AI automation in five days with this kind of structured boundary, book a 30-min call and we'll map your team's first week at https://course.aiadvisoryboard.me/business.
