Constructivist Software Development

Status quo

I’ve always been impressed with what Anthropic’s LLMs are capable of in terms of coding. They have helped me create prototypes and navigate documentation like no other tool. I can’t imagine working without coding agents as they can and do speed things up immensely. However, working with Claude Code (CC) has always been frustrating for me, for several reasons:

1. Inefficiency

CC is expensive and inefficient for the tasks I’m delegating to it. CC loves reverse engineering library ZIP files and taking the most complicated way of retrieving information. Instead of consulting documentation or asking me—a rule explicitly stated in all of my CLAUDE.md files—it tries to show off, burning through a nonsensical amount of credits on irrelevant, out-of-scope tasks.

I assume this “hold my beer” mindset is built on the assumption that CC is meant to completely replace software engineers instead of helping them out, at the expense of knowing how things work under the hood. I’m not implying we need to understand all assembler instructions to deliver software in 2026. What has always differentiated exceptional software engineers from mediocre ones is a deep understanding of the abstraction layer just beneath the one they normally operate at.

In the world of agentic coding, this is software design. That said, I fully respect people messing around and prototyping new ideas in their free time and personally discourage any gatekeeping. However, many people seem to ignore or gloss over the fact that there are people who love the process of learning, allowing them to make much more precise and informed decisions.

2. Product Quality

CC is one of the most poorly designed pieces of software I’ve ever had the chance to use. Yes, I’m taking esoteric Java Swing profilers into consideration. The terminal still flickers and randomly overwrites its own output. Subagent execution is an inaccessible black box, officially documented as intentional.

Security and privacy are things CC doesn’t even remotely care about: I’ve lost count of the times CC rummaged through my .env files without hesitation, even though I had explicitly banned it from doing so.

The only reason I kept trying out CC was the brilliant underlying models, not the software wrapper around them, which many already consider a thing of the past as well. We’re writing loops now, it seems.

With Java Swing desktop apps, at least you knew where you stood. Image source: Picryl

3. Quotas

Saying that customers of Anthropic are being billed under questionable terms is an understatement and has already been discussed elsewhere. To this day, however, I still have no idea how Anthropic rate limits Claude Code usage. This directly impacted how I used CC: no matter the task, I had to keep an eye on the specific model being used, the size of the context window, and the length of the session, distracting me completely from the main job.

I partially solved this by sticking to Opus 4.6 in very short-lived sessions, as this approach minimised CC’s interference. Most of the time, this resulted in a clearer focus on the task at hand and a minimal amount of overcomplicated and overengineered hand-written jungles of code.

Nonetheless, I still fail to understand why the quota is presented so obscurely as one mysterious percentage, instead of giving users insight into how it is computed so that they can optimize their usage. The recent usage breakdown down to the tool level is a good step forward but still far from optimal. I’m not against the subscription model in general, but I prefer not to be forced into a particular way of using a product.

4. Inconsistency

Further, my experience with Claude Code is that it’s not reliable. Unreliability manifests at multiple levels: generic errors from Anthropic’s API servers without any guidance, inexplicable agent freezes mid-session without doing anything visibly substantial, and the unpredictable quality of the answers themselves.

On one hand, Claude Code can brilliantly solve tougher problems, e.g. completing the coverage of a unit test suite with corner cases I’ve missed. On the other, it’s capable of screwing up a trivial issue without even realizing or admitting that it screwed up.

To give you a very specific example, I was recently debugging a serverless function that did nothing too fancy: it just generated a UUIDv7 and pushed it to the user’s JWT as a custom claim, something the model has probably seen millions of times. However, CC was so convinced that the bug lay in the underlying middleware the function ran on that it insisted on running a small webserver inside the serverless function, because that would supposedly help me deliver a fix to production fast.
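For scale, this is roughly what such a function boils down to. It is a minimal sketch using only the Python standard library, not the original function: the claim name `session_id`, the shared secret and the HS256 signing are assumptions made for illustration, and the UUIDv7 bit layout follows RFC 9562.

```python
import base64
import hashlib
import hmac
import json
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Build a version 7 UUID by hand: 48-bit ms timestamp, then random bits."""
    ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & 0xFFFF_FFFF_FFFF) << 80          # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")    # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)      # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)      # variant field = 0b10
    return uuid.UUID(int=value)


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs expect."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def add_claim(claims: dict, secret: bytes) -> str:
    """Return an HS256-signed JWT carrying a fresh UUIDv7 as a custom claim."""
    payload = {**claims, "session_id": str(uuid7())}  # hypothetical claim name
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Calling `add_claim({"sub": "user-1"}, b"secret")` returns a three-part token whose payload contains the new claim. Nothing in it calls for a webserver.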

Now, you might argue that “I’m just holding it wrong”. Let’s even suppose that this is the case. However, I don’t have time for fiddling with CC skills or polishing the right CLAUDE.md or the right prompt tone, as I have more important things to do. I would expect everything to be set once and for all as long as I’m not asking for different behavior. You’re not forced to fiddle around with your operating system every single time you use it, so why should it be any different for any other software product, regardless of its sophistication or emergent intelligence? The latter is the most paradoxical argument of all.

Claude Code helped me write obnoxious and repetitive code so that I could focus on more challenging aspects of my project. However, always adapting to its quirks was increasingly taking me more time than writing the annoying code in the first place, not to mention I learned almost nothing in the process but how to please an opinionated moody teenager.

It makes no sense to keep forcing a tool into a use it wasn’t designed for. Image source: YouTube

We Need to Talk, Claude Code

That’s exactly why I started experimenting this week with Claude Code alternatives that would fit my needs better. I’m completely okay with paying a bit extra for API billing, as it provides very honest feedback on whether I’m overusing those tools and becoming a Prompterstein. I guess that’s the price (literally) you pay for staying away from oversubsidized omnipotent coding agents whose lifespans are already questionable.

After a bit of research, I decided to use Aider as my base CLI interface and picked OpenRouter as the inference provider so that I don’t have to juggle a list of API keys.

To avoid the rest of the article sounding like an Aider and OpenRouter promo, let me share what I generally expect from my coding agents instead:

  1. I want help with specific tasks rather than one-shotting my next million-dollar SaaS that I have no idea how to implement. Vibecoding platforms or “tools for builders” are not my cup of tea as they intentionally don’t let you learn and understand how things work. Having insights and learning along the way is my hard requirement.

  2. I am still doing manual code reviews because I need to understand the produced code so that I can make well-informed decisions, including those in the future. For me, it’s also about taking responsibility. I am the one shipping the code and delivering the service, thus putting my name on it, so I must know what I’m shipping. Therefore, “it works, all linters and tests are passing, thank me later” isn’t sufficient: the tool must strictly adhere to what I tell it to do and must be able to explain itself.

  3. Referring to documentation, best industry practices, or any other already proven code is something that has always been taken for granted in my industry. The agent shouldn’t hallucinate nonsense out of thin air but back it up with existing code or documentation. This is especially important when it’s working on something that’s not yet present in the codebase. Writing out a custom grammar parser is cute but unsustainable long-term as there are people who have put a lot of thought and care into grammar parsing.

  4. I don’t need long-term memory management or any recall capabilities. Speaking of which, I personally think that developing agent memory is a crude misapprehension of how LLMs work. The long-term memory, if anything, is the codebase and its tests. Navigating a codebase natively is something the agent will be doing on a daily basis, so it should be good at it. It doesn’t need to load the whole codebase but should know where to look and what to look for. Agent memories have the same issue as preemptive docstrings like “this getter returns the surname of the Person entity”. Once they desynchronize, which at the rapid vibecoding pace they almost unavoidably will, you’re dealing with a set of text files that contradict each other. I know this from personal experience, all too well.

  5. I prefer technology that respects my privacy. I don’t want my agent rummaging through my .env files and config files just to see how to help me. If the agent wants information from me, it should ask me directly, instead of reverse-engineering a linked library just to learn a signature of one method.

The Agents Family

After reflecting on my main use cases for coding agents, I came up with a 5-step process in the spirit of the UNIX philosophy: do one thing and do it right, then pass it on to the next agent. Here is the pattern I settled on with my new CLI setup:

  1. I’m working on a new feature or fixing a bug. I pass an initial prompt to an agent, point it to existing code, provide it with all necessary context and tell it what needs to be done. Then, we discuss what the spec really should be and which parts are ambiguous or need clarification. If the agent sees multiple interpretations of the prompt, I forbid it from guessing or nodding along; it has to ask follow-up questions until the implementation plan is elaborate enough.

  2. Once the plan is ready, the agent starts implementing the feature. It writes code that’s self-explanatory and that anyone, including a human, can understand.

  3. Once the first implementation draft is ready, I confront it with an adversarial agent that looks for inconsistencies, sources of new bugs, inefficiencies or any other violation of my original task or my general system instructions. Only when the adversarial agent says we’re good to go is it time for me to do exploratory testing of the implementation. I don’t read the code yet, I’m just making sure we’re headed in the right direction.

  4. Once I explore the changes, I read the code to understand it on a conceptual level, asking follow-up questions to probe its design decisions:

    Why was this code handwritten and a library wasn’t used?

    Why is this line of code implemented in this specific way?

    What benefit does this caching have here?

    Why are you hardcoding this constant here?

  5. I then pass the code to the last agent, which does a code review, picks up the remaining nitpicks and polishes the code from a technical standpoint (KISS, DRY, decomposition, SRP, testability) so that it’s shippable. This is where I also focus on reading the code in more detail and making sure I actually understand it down to the finest detail.
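The hand-offs between the agents can be sketched as a small pipeline. This is a rough illustration under stated assumptions, not my actual setup: the `Ask` callable, the role names and the `"OK"` verdict convention are all hypothetical, and nothing here is an Aider or OpenRouter API. The human steps (exploratory testing, reading the code and asking “why”) are deliberately left out of the automation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signature: (role, prompt) -> reply. In practice this would wrap
# whatever CLI or API sits behind each agent.
Ask = Callable[[str, str], str]


@dataclass
class Task:
    prompt: str
    plan: str = ""
    code: str = ""
    notes: list[str] = field(default_factory=list)


def make_plan(task: Task, ask: Ask) -> Task:
    """Step 1: turn the prompt into an agreed implementation plan."""
    task.plan = ask("planner", task.prompt)
    return task


def implement(task: Task, ask: Ask) -> Task:
    """Step 2: write code from the plan."""
    task.code = ask("implementer", task.plan)
    return task


def challenge(task: Task, ask: Ask, max_rounds: int = 3) -> Task:
    """Step 3: loop until the adversarial agent has no objections left."""
    for _ in range(max_rounds):
        verdict = ask("adversary", task.code)
        if verdict.strip() == "OK":  # assumed convention for "good to go"
            return task
        task.notes.append(verdict)
        task.code = ask("implementer", f"{task.plan}\n\nFix: {verdict}")
    raise RuntimeError("adversary still objects; time for a human to step in")


def review(task: Task, ask: Ask) -> Task:
    """Step 5: final code review; its findings are recorded as notes."""
    task.notes.append(ask("reviewer", task.code))
    return task


def run(prompt: str, ask: Ask) -> Task:
    task = make_plan(Task(prompt), ask)
    task = implement(task, ask)
    task = challenge(task, ask)
    # Exploratory testing and step 4 happen here, by hand.
    return review(task, ask)
```

Each stage only reads and writes the shared `Task`, so any agent can be swapped out or rerun without touching the others.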

Do one thing and do it right. Image source: Picryl

Even though I haven’t had a chance to test it under full load, I’m hoping this will be a good step forward from the CC mess. I’ll also slowly switch to API-based billing for other GenAI use cases outside of software development, namely research, idea brainstorming and proofreading, so that I don’t have to keep track of multiple AI expenses.

Finishing thoughts

While writing this, it dawned on me that this revised workflow aligns perfectly with the very product I’m currently developing: Heuri. It’s an online learning platform based on the constructivist educational philosophy. Constructivism holds that skills and knowledge persist and are reusable if they are built by the students themselves, not if they are presented pre-chewed by someone else.

CC is more aligned with the instructivist style of information transmission, which matches Anthropic’s voice exceptionally well:

“Our mission at Anthropic is to deprecate software developers, so here, take this tool and see how it’s done.”

By switching to more context-aware tools that will most likely be more elaborate to set up in the beginning, I’m hoping to strengthen my existing skills by having a knowledgeable guide by my side.

Happy learning!
