Manage AI like a team, not a brain
Koshy John wrote a piece called “A.I. Should Elevate Your Thinking, Not Replace It”. It’s worth reading. He argues AI is splitting engineers into two groups: people who use it to think faster, and people who use it to avoid thinking. The first group compounds. The second group hollows out, simulating competence without ever building it.
He’s right. I’ve done it myself, and not the lazy junior version of it. The polished senior version, where the output was fluent enough that I didn’t catch it. Colleagues did. That was the embarrassing bit, and I own it. Worse for early-career engineers, because they don’t have the reps yet. You build judgement by struggling, and there’s no shortcut.
But the piece left me with a question he doesn’t quite answer. If I agree with all of that, what does the operating model actually look like? Because the bar he sets, “understand everything that is done on your behalf,” doesn’t scale. It’s the same bar engineering managers can’t meet for their own teams, and never have. The smartest leaders don’t read every PR or every doc. They build judgement about what to read carefully, what to sample, and what to trust.
So here’s what I keep coming back to.
AI is a team you manage, not a brain you absorb.
That changes the question. It’s no longer “can I understand everything this thing produced.” It’s “do I have a system that catches the things I’d want to catch, so my attention can go to the calls that actually need me.”
There’s a genuinely new constraint at play. AI doesn’t just speed work up; it generates output at a volume that exhausts critical thinking. The more I let it produce, the more noise it creates, and the more I get worn down reviewing things that don’t need me. By the time something actually does need my judgement, I’m already tired. That’s the modern version of Koshy’s failure mode. It’s not laziness, it’s attention bankruptcy.
The way out is the same way it’s always been in management. You codify the judgement you’ve already done. Coding standards don’t replace your taste in code, they externalise it so juniors stop getting reviewed on the same five things forever. AGENTS.md files, product principles, prompt rules, sub-agents that validate against principles before anything reaches you. Same idea. I’ve open-sourced one version of this as a skill called counsel. It fans a draft out to multiple local agents in parallel and synthesises their critiques, so by the time something reaches me it has already been peer-reviewed by other models with different blind spots. Each one of these tools is a piece of judgement applied once, captured, and never reapplied manually. Your attention compounds at the frontier where new judgement is genuinely required.
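To make the fan-out concrete, here’s a minimal sketch of the pattern, not the counsel skill itself. It assumes a local OpenAI-compatible endpoint (Ollama exposes one at localhost:11434/v1), and the model names and reviewer brief are placeholders for whatever you actually run.

```python
# Sketch of the fan-out-and-synthesise pattern: send one draft to several
# local models in parallel, then have one model merge their critiques.
# Assumes an OpenAI-compatible endpoint (e.g. Ollama at localhost:11434/v1);
# model names are placeholders.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
REVIEWERS = ["llama3.1", "qwen2.5", "mistral"]  # different blind spots

def critique(model: str, draft: str) -> str:
    """Ask one model to critique the draft against a fixed reviewer brief."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Critique this draft: flag weak arguments, unsupported claims, and unclear passages. Be specific."},
            {"role": "user", "content": draft},
        ],
    )
    return resp.choices[0].message.content

def counsel(draft: str) -> str:
    # Fan out to every reviewer in parallel.
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        critiques = list(pool.map(lambda m: critique(m, draft), REVIEWERS))
    # Synthesise: one model merges the critiques into a single review.
    merged = "\n\n---\n\n".join(critiques)
    resp = client.chat.completions.create(
        model=REVIEWERS[0],
        messages=[
            {"role": "system", "content": "Merge these independent critiques into one prioritised review. Drop duplicates, keep disagreements visible."},
            {"role": "user", "content": merged},
        ],
    )
    return resp.choices[0].message.content
```

The plumbing is trivial on purpose. The reviewer brief and the synthesis instruction are the judgement you wrote once; the script is just the delivery mechanism.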
It’s a ratchet. Every time the LLM lacks the right instinct, you don’t conclude “AI is dumb.” You ask why it lacked the instinct, codify the missing context or principle, and move on. Next time, that judgement runs downstream automatically. You’re not thinking less. You’re thinking about harder things, because the easier thinking has been captured.
I’ve been doing this in my own job recently. I kept leaving the same comments on engineering RFCs, repeating the same product judgement over and over, and eventually wrote up a memo to my senior team admitting the problem was mine to fix. The proposal was to codify our product principles into a prompt that engineers run against their own docs before sharing them. Self-service, not dependent on me catching things in review. This isn’t about removing the thinking. The thinking is still forced on them, and on me. They still have to reason about the principles, decide where to follow them and where to deviate, and explain why. I still have to reason about the judgement calls that actually need me. What changes is that neither of us is doing the foundational thinking from scratch every time. The thinking is preserved. It’s streamlined.
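As an illustration of what that self-service check can look like, here’s a sketch in the same vein. It is not our actual prompt; PRINCIPLES.md, the model name, and the wording are all placeholders. The shape is the point: concatenate the principles with the draft, ask for a review against them, and explicitly forbid a rewrite.

```python
# Sketch of a self-service principles check an engineer runs on their own
# RFC before sharing it. PRINCIPLES.md and the model name are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # or point base_url at a local endpoint as above

def check_against_principles(rfc_path: str, principles_path: str = "PRINCIPLES.md") -> str:
    principles = Path(principles_path).read_text()
    rfc = Path(rfc_path).read_text()
    prompt = (
        "Here are our product principles:\n\n"
        f"{principles}\n\n"
        "Review the RFC below against them. For each principle, say whether the "
        "RFC follows it, deviates from it, or doesn't address it. Where it "
        "deviates, quote the relevant passage and note whether the deviation is "
        "explained. Do not rewrite the RFC.\n\n"
        f"{rfc}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever your team standardises on
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(check_against_principles("my-rfc.md"))
```

The output is a review, not a revision, which is what keeps the thinking with the engineer: they still have to decide whether each flagged deviation is a mistake or a deliberate, explained choice.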
There’s a second-order point most takes on this miss. You’re not building this for the AI you have today. You’re building it for the AI you’ll have in 6-12 months. Cursor got built in a window where the underlying models weren’t quite good enough yet, on the bet that they’d be good enough soon. Same logic applies here. The judgement you codify, the validators you wire up, the principles you encode, all of it gets more powerful as the underlying models improve. Wait until the models are obviously good enough and you’re already a year behind the people who built infrastructure for the better models that were always coming. None of this excuses you from the thinking part. The principles are still yours. The ownership is still yours. But you have to plan for where the model will be, not where it is.
Here’s where the management analogy breaks, and Koshy’s worry kicks back in. With humans, you build calibration over years. Reputation corrects errors. People flag uncertainty. AI doesn’t do any of that by default. It produces fluent confidence whether or not it should, and worse, the next AI down the line will cite its mistakes as fact. We’ve seen this at Ably. One LLM-generated inaccuracy ends up in a doc, another agent picks it up as truth, and now you’ve got a foundational layer that’s quietly wrong. Karpathy talks about this. Context rot at the foundation. It’s real, and the model I’m describing doesn’t prevent it. It bounds it, and makes it visible faster.
So the firewall has to be explicit. Judgement, the actual choosing of words and decisions, is not offloadable. When I write, I own every word, even when an LLM has helped me draft. When I make a call, the call is mine. The way I make that real for myself is heavy audio review. I read content out loud, dictate revisions, and iterate four or five times before anything gets published. I’ve leaned into the practice hard enough that I’m apparently in the top 10 for dictation use, which says more about volume than precision, but it makes reading every word fast enough to be true rather than aspirational. What’s offloadable is everything that surrounds the judgement. Research, drafting, validating against known principles, surfacing risks. Everything that lets me arrive at the decision with full context but without exhausted attention.
That’s the line I think Koshy and I would agree on entirely. You can’t shortcut the formation of judgement. Juniors still have to do the reps. Seniors still have to read every word that goes out the door under their name. But everything between those two endpoints, the support work, the boilerplate, the noise reduction, is exactly what AI should be doing, and what most engineers still aren’t using it for properly.
The discriminating skill in the AI era is the one good engineering managers have always had. Telling fluency from reasoning. Knowing where to dive deep and where to trust. Owning the output regardless. The mechanism is new. The skill isn’t.
If your AI use isn’t making your team better at this, it’s probably making them worse.

