Code Isn't Cheap. It Costs Tokens
Shashank Rajak
Feb 28, 2026
7 min read

The developer world is currently buzzing with two powerful, yet contradictory, statements.
For 25 years, we lived by Linus Torvalds' mantra: "Talk is cheap. Show me the code." It was a call for proof over promises. Coding was high-effort, high-skill, and the only metric that mattered.
Then, Kailash Nadh recently flipped the script in a brilliant blog post, arguing that AI has made code generation abundant. In this new world, he says, "Code is cheap. Show me the talk." The bottleneck is no longer typing syntax; it's our ability to articulate, architect, and think critically.
Both are right. But as we move beyond simple code-completion to relying on autonomous AI agents, there is a third reality we have to face:
Coding is not cheap. It costs you tokens.
Over the last few months, I’ve been heavily using coding agents for my own day-to-day work. In fact, I am barely writing any raw code myself these days. I've fully embraced the agentic workflow and am still exploring how to master it and become more efficient.
But diving into this deep end has taught me a few hard lessons. I quickly realized that if 'talk' is the new programming language, we have to accept that our "talking" is heavily metered. We are no longer paying for software with just our biological fatigue. The bottleneck is no longer how many hours we can stay awake typing, but how efficiently we can manage our context windows.
Here are a few things I’ve learned about surviving—and mastering—this new AI-driven engineering economy.
1. The Return of the Essay: Communication is Non-Negotiable
Remember how we used to dread writing long-form essays, structured answers, and detailed reports in school? We often wondered when we'd ever use those skills in the "real world."
Well, welcome to the real world of AI. Clear communication has transitioned from a "soft skill" to a hard, non-negotiable engineering requirement.
Why? Because at their core, LLMs are stateless, dumb models. They do not possess intuition, they haven't sat in on your product meetings, and they don't know your unwritten business logic. If you don't explicitly spell out every relevant bit of information, they will output something statistically plausible instead—because generating plausible-sounding text is exactly what they are built to do.
Thinking clearly, writing meticulously, and drawing detailed system diagrams are no longer tasks you can neglect or push to the bottom of the backlog. They are the prerequisites for getting meaningful work out of an agent.
2. Specs are the New Syntax: Precision Prevents Burnout
When talk becomes the input, sloppy talk becomes the new spaghetti code. In human conversation, ambiguity is cleared up with a quick clarifying question. In agentic coding, ambiguity triggers a cascade of expensive errors.
When your AI agent gets stuck—trying to fix a bug, failing, reading logs, and trying again—it isn't just wasting time. It is burning thousands of inference tokens per loop.
The Trap: Writing "build me a dashboard" forces the agent to guess your database schema, your state management, and your UI framework. It will generate the wrong architecture, which you will then ask it to fix, kicking off an expensive cycle of trial and error.
The Prescription: Write bulletproof specifications. Detail the exact stack, the specific API endpoints, the pagination logic, and the design tokens before you hit enter. Writing a flawless spec on the first try minimizes agent guesswork and saves substantial token costs.
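To make the cost of that trial-and-error loop concrete, here is a rough back-of-envelope sketch. The price and token counts are made-up assumptions for illustration, not any provider's real rates:

```python
# Illustrative only: the price and token counts are hypothetical assumptions,
# not real provider rates.
PRICE_PER_1K_TOKENS = 0.01  # assumed blended input+output price, USD

def request_cost(context_tokens, output_tokens, rounds):
    """Total cost when every retry re-sends the full context."""
    return (context_tokens + output_tokens) * rounds * PRICE_PER_1K_TOKENS / 1000

one_shot = request_cost(context_tokens=3_000, output_tokens=1_500, rounds=1)
retries = request_cost(context_tokens=3_000, output_tokens=1_500, rounds=5)
print(f"precise spec: ${one_shot:.3f} vs vague prompt: ${retries:.3f}")
```

Because an agent re-sends the whole conversation each round, the vague prompt costs five times more here; in practice, error logs accumulating in the context make the real multiplier worse.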
3. Context Curation: Managing the New Memory
We used to manage limited RAM; now we manage the context window.
Because LLMs are stateless, they need the full picture every single time you prompt them. The temptation is to throw your entire repository at the agent and say, "figure it out." This is the equivalent of loading an entire hard drive into memory just to edit a single text file.
Feeding tens of thousands of lines of a complex backend architecture into an LLM just to tweak one routing function is a massive financial waste. It dilutes the model’s focus, increases the chance of hallucinations, and skyrockets your per-request inference cost.
The new technical skill is Context Curation. Like a surgeon, you must extract and map out only the specific files, schemas, and dependencies relevant to the immediate task. High-signal, low-noise prompts yield faster, cheaper, and more accurate results.
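As a sketch of the idea (not how any particular agent tool actually implements it), relevance filtering can be as simple as scoring files against the task's keywords. Real tools use import graphs or embeddings, but the principle is the same:

```python
import pathlib

def curate_context(repo_root, task_keywords, max_files=5):
    """Naive relevance filter: keep only the few files that actually mention
    the task's keywords, instead of dumping the whole repo into the prompt."""
    scored = []
    for path in pathlib.Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        score = sum(text.count(kw) for kw in task_keywords)
        if score > 0:
            scored.append((score, str(path)))
    scored.sort(reverse=True)  # highest-signal files first
    return [path for _, path in scored[:max_files]]
```

Even a crude filter like this keeps the prompt high-signal: five relevant files instead of five hundred irrelevant ones.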
4. Model Routing: Match the Tool to the Task
Not every task requires the absolute heaviest, state-of-the-art reasoning model on the market.
Using the most advanced, most expensive model available to generate simple HTML boilerplate, write basic unit tests, or format a JSON object is the definition of burning tokens.
The modern AI Engineer needs a strategic mental map of model tiers and what each is good at. Use fast, lightweight models for scaffolding, simple scripts, or syntax refactoring. Reserve the heavy-hitting, advanced reasoning models strictly for complex architectural decisions, multi-threading logic, or difficult debugging scenarios. Model routing is the new tech stack selection.
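A minimal sketch of that mental map, with placeholder tiers and model names (none of these are real provider SKUs):

```python
# Placeholder task categories and model names, purely illustrative.
LIGHT_TASKS = {"boilerplate", "unit_tests", "formatting", "rename"}
HEAVY_TASKS = {"architecture", "concurrency", "debugging"}

def pick_model(task_type: str) -> str:
    """Route mechanical work to a cheap model, hard reasoning to an expensive one."""
    if task_type in LIGHT_TASKS:
        return "small-fast-model"
    if task_type in HEAVY_TASKS:
        return "large-reasoning-model"
    return "mid-tier-model"  # a reasonable default for everything in between
```

The routing table is deliberately boring: the skill is in maintaining it as your tasks and the model landscape change, not in the code itself.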
5. The Physical Cost: Why Knowing How to Code is Still Mandatory
It’s easy to treat AI like magic, but inference is computationally heavy, really heavy. Walk through the linear algebra of a single forward pass and you'll see the real story. Every prompt relies on massive data centers burning significant amounts of electricity. This is exactly why every major model provider imposes strict usage limits, rate-limits your API keys, or caps your prompts even on premium paid plans. Compute is finite, and it is expensive. This is just the beginning, and it's easy to see how token economics will heavily influence the future of engineering.
Fixing a simple console.log error by blindly pasting 1,000 lines of context into an LLM isn't just financially wasteful—it's an irresponsible use of resources that will bite us in the long run.
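A back-of-envelope estimate makes the waste obvious. Assuming roughly ten tokens per line of code (a rough figure; real tokenizers vary by language and style):

```python
TOKENS_PER_LINE = 10  # rough assumption; actual tokenization varies

def pasted_context_tokens(lines):
    """Approximate context tokens consumed by pasting N lines of code."""
    return lines * TOKENS_PER_LINE

whole_file = pasted_context_tokens(1_000)  # the blind paste
snippet = pasted_context_tokens(5)         # just the failing function
print(whole_file // snippet)  # the blind paste is 200x larger
```

And that multiplier applies on every retry in the loop, not just once.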
This brings up a crucial reality check: while we embrace the new agentic workflow and proudly joke that we "don't write code anymore," we cannot afford to forget how to code. We definitely cannot stop teaching the fundamentals to young developers.
In fact, knowing how to code is more critical now than ever. When an AI generates a complex multi-threading script or a database migration, you still have to review it.
In this new era, AI writes the code, and overnight everyone has proudly adopted the role of Tech Lead, just reviewing the pull requests. But getting promoted to Tech Lead without knowing the ABCs of your tech is a recipe for disaster. If you can't read the syntax or make sense of the logic, your 'quick review' will take longer than writing the code from scratch.
Conclusion: The Rise of the Articulate Architect
We are not entering an era where coding skill is obsolete. We are entering an era where orchestration and communication are paramount.
The developers who will thrive in this token economy are not the ones who blindly hit "generate." They are the articulate architects: the ones who can think clearly and logically, write unambiguous specs, surgically curate context windows, and route tasks to the most efficient models.
They understand that while "talk" may be the new syntax, in the inference economy, precision pays.