<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>HanLHo. - Fractional Architect &amp; Software Product Engineer - testing-dsls</title>
    <link rel="self" type="application/atom+xml" href="https://hanlho.com/tags/testing-dsls/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://hanlho.com"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-02-04T00:00:00+00:00</updated>
    <id>https://hanlho.com/tags/testing-dsls/atom.xml</id>
    <entry xml:lang="en">
        <title>AI’s Opportunity: Pacing Control Loops with Development</title>
        <published>2026-02-04T00:00:00+00:00</published>
        <updated>2026-02-04T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://hanlho.com/p/ais-opportunity-pacing-control-loops-with-development/"/>
        <id>https://hanlho.com/p/ais-opportunity-pacing-control-loops-with-development/</id>
        
        <content type="html" xml:base="https://hanlho.com/p/ais-opportunity-pacing-control-loops-with-development/">&lt;p&gt;What caught my attention in the book &lt;em&gt;Vibe Coding&lt;&#x2F;em&gt; by Gene Kim and Steve Yegge is the idea that, as LLMs and coding agents change how we build software, control loops—tests, reviews, and other signals that tell you whether a change behaves as expected—should be faster and more integrated into development feedback loops than before. My intuition says this makes perfect sense.&lt;&#x2F;p&gt;
&lt;p&gt;For example, when there is a dedicated test stage or a QA role that tests after the fact, that role inevitably struggles to keep up with the speed of development. Over time, this makes it increasingly difficult to sustain quality through &#x27;testing after the fact&#x27;.&lt;&#x2F;p&gt;
&lt;p&gt;So how do we solve this?&lt;&#x2F;p&gt;
&lt;p&gt;Some may think the solution is to introduce AI at the test stage, still after the fact. However, at the pace of development I see and read about, this approach will struggle to keep up. One either has to accept not taking full advantage of what AI can contribute to development, or rethink how testing is integrated into the development process.&lt;&#x2F;p&gt;
&lt;p&gt;Put bluntly: if AI lets you produce a feature in hours but the first meaningful acceptance signal only arrives days later in a separate stage, quality assurance becomes the bottleneck.&lt;&#x2F;p&gt;
&lt;p&gt;To me, the logical consequence is a stronger shift towards automated quality controls, including at least acceptance tests and code reviews. By acceptance tests I mean executable specifications of expected behaviour, written in domain language before or alongside the code. This implies that testing &lt;em&gt;has&lt;&#x2F;em&gt; to move earlier in the development chain &lt;em&gt;because&lt;&#x2F;em&gt; of AI.&lt;&#x2F;p&gt;
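&lt;p&gt;As a minimal, hypothetical sketch of what such an executable specification can look like, here is a builder-style test DSL in Rust. All names (&lt;code&gt;Cmd&lt;&#x2F;code&gt;, &lt;code&gt;details_flag&lt;&#x2F;code&gt;, &lt;code&gt;when_run&lt;&#x2F;code&gt;) are illustrative, and the system under test is stubbed:&lt;&#x2F;p&gt;

```rust
// Hypothetical sketch of a builder-style acceptance-test DSL.
// All names are illustrative and the system under test is stubbed.

#[derive(Default)]
pub struct Cmd {
    details: bool,
    breakdown: String,
}

pub struct Outcome {
    succeeded: bool,
    output: String,
}

impl Cmd {
    // Given: describe the setup in domain language.
    pub fn given() -> Self {
        Cmd::default()
    }

    pub fn details_flag(mut self) -> Self {
        self.details = true;
        self
    }

    pub fn breakdown_flag(mut self, period: impl ToString) -> Self {
        self.breakdown = period.to_string();
        self
    }

    // When: run the (stubbed) system under test.
    pub fn when_run(self) -> Outcome {
        let mut output = format!("breakdown by {}", self.breakdown);
        if self.details {
            output.push_str(" with task details");
        }
        Outcome { succeeded: true, output }
    }
}

impl Outcome {
    // Then: assert on observable behaviour, not on implementation details.
    pub fn should_succeed(self) -> Self {
        assert!(self.succeeded);
        self
    }

    pub fn expect_output_containing(self, fragment: impl ToString) -> Self {
        let fragment = fragment.to_string();
        assert!(self.output.contains(fragment.as_str()));
        self
    }
}
```

&lt;p&gt;A specification then reads as domain language: given the details flag and a daily breakdown, when run, it should succeed and show task details.&lt;&#x2F;p&gt;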
&lt;p&gt;AI is an opportunity to start writing acceptance tests if you have not yet. It pushes us to invest time in strategic test design, testing against stable contracts, testing from a behavioural point of view, and isolating test descriptions from the actual implementation.&lt;&#x2F;p&gt;
&lt;p&gt;Put differently, the shift in development practices that LLMs are causing should inspire &lt;em&gt;more&lt;&#x2F;em&gt; adherence to testing best practices, not less. That is, if you want to keep on adding new features, fix and prevent bugs, and keep up the pace of development.&lt;&#x2F;p&gt;
&lt;p&gt;More broadly, to keep benefiting from AI over time, we should shift towards tightly coupled feedback loops embedded in everyday development. This is not limited to testing but also applies to, for example, reviews. In that sense, AI doesn’t remove quality practices; it raises the stakes if you don’t have them.&lt;&#x2F;p&gt;
&lt;p&gt;If testing &#x27;shifts left&#x27;, team structures must change as well. This evolution points towards smaller, more autonomous teams where testing, development, and feedback are inseparable rather than sequential.&lt;&#x2F;p&gt;
&lt;p&gt;AI presents us with an opportunity: not faster quality control after the fact, but to design systems, processes, and teams that make quality the fastest path forward, so control loops can keep up with the increasing pace of development loops.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Implementing an Urgent Feature with Opencode, Claude, and Zed</title>
        <published>2025-12-18T00:00:00+00:00</published>
        <updated>2025-12-18T00:00:00+00:00</updated>
        
        <author>
          <name>Unknown</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://hanlho.com/p/implementing-an-urgent-feature-with-llms-and-zed/"/>
        <id>https://hanlho.com/p/implementing-an-urgent-feature-with-llms-and-zed/</id>
        
        <content type="html" xml:base="https://hanlho.com/p/implementing-an-urgent-feature-with-llms-and-zed/">&lt;p&gt;This is a short post to share a positive experience I had using an LLM agent to quickly add a feature to an existing personal CLI &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lhohan&#x2F;simple-time-tracker&quot;&gt;time-tracking application&lt;&#x2F;a&gt;. Below, I describe how I added it using Zed, Opencode and Claude.&lt;&#x2F;p&gt;
&lt;p&gt;To start, I wasn&#x27;t even sure the feature I needed existed in the text-based time-tracking application I use day-to-day to keep track of what I work on. My application &lt;em&gt;has&lt;&#x2F;em&gt; a way of getting out the information I needed, but the feature that should make this easy was missing details on &lt;em&gt;what&lt;&#x2F;em&gt; I had actually worked on.&lt;&#x2F;p&gt;
&lt;p&gt;So, should I invest the time and expand the feature, or accept that it was missing for now and spend a lot more time on manual work? Or I could try an LLM agent and see if it could help me implement the change quickly. Each choice had downsides: spending &lt;em&gt;more&lt;&#x2F;em&gt; time digging through time-tracking information to fill in timesheets is not very appealing, and implementing the feature (with or without agents) could become a time sink. I also had lots of other work planned for the day.&lt;&#x2F;p&gt;
&lt;p&gt;I decided to implement it, telling myself I’d stop if it didn’t look like I had a clear path to finish it within one hour.&lt;&#x2F;p&gt;
&lt;p&gt;Because I think it is relevant to the (spoiler) successful implementation, let me share a little about this project. It is a Rust codebase that I use to test development practices, and I think it is structured and implemented fairly well. The CLI, the main part of the application, has over 95% coverage using behaviour-driven, DSL-style acceptance tests. This setup gives the LLM models both structure and plenty of examples to follow when adding tests. I will not go into the details here, but I have added a brief example at the end. Also noteworthy: this is a small project, which definitely makes a difference.&lt;&#x2F;p&gt;
&lt;p&gt;For this implementation, I used Zed with its Opencode integration. Lately, I have been on the command line building smaller apps prompt-driven, without worrying much about the fine details. But for this project the actual implementation mattered to me, so I wanted to track changes more closely in an IDE. Opencode taps into my Claude subscription; I can use Opus for planning and Haiku for implementation.&lt;&#x2F;p&gt;
&lt;p&gt;Honestly, I was very pleased with how smoothly this feature was implemented. What contributed to this was the plan-first approach before implementing anything. For anything non-trivial, always plan first!&lt;&#x2F;p&gt;
&lt;p&gt;Here is a high-level overview of my interaction with Opencode in Zed:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Investigate
&lt;ul&gt;
&lt;li&gt;I started with Claude Opus and asked whether the feature I needed already existed, rather than looking it up myself because I was under time pressure. It didn&#x27;t.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Plan
&lt;ul&gt;
&lt;li&gt;I asked Claude to plan the feature and use a test-driven approach. It broke the work into nine tests, and I asked it to pause after each one for me to review.&lt;&#x2F;li&gt;
&lt;li&gt;Before starting any implementation, I asked it to write the plan to a Markdown file in the backlog.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;em&gt;Then I reviewed the plan.&lt;&#x2F;em&gt; That sounds more superficial than what I did, but I cannot say much more than that it simply looked good.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Implementation
&lt;ul&gt;
&lt;li&gt;I switched to Claude Haiku for implementation.&lt;&#x2F;li&gt;
&lt;li&gt;It started off well and asked for feedback after each test cycle, and I asked for a refactor to remove duplication.&lt;&#x2F;li&gt;
&lt;li&gt;While it was implementing, I discovered I wanted a different kind of description for the tasks, so I told it to change that in the plan. I did not switch models for this.&lt;&#x2F;li&gt;
&lt;li&gt;The plan was updated in all the correct places.&lt;&#x2F;li&gt;
&lt;li&gt;After this, the workflow changed: it stopped asking me for feedback after each step, and before I realised it, seven of nine tests were running. Not the TDD flow I asked for, but it worked.&lt;&#x2F;li&gt;
&lt;li&gt;Instead of asking it to redo anything, I reviewed the implementation (it was not a lot of code) and continued.&lt;&#x2F;li&gt;
&lt;li&gt;I ran the program against my own data and everything worked as intended. Aside from the one refactoring to remove duplication, I did not change the code.&lt;&#x2F;li&gt;
&lt;li&gt;One loose end I had to remind it of: update the documentation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I have shared the full session here: &lt;a href=&quot;https:&#x2F;&#x2F;opncd.ai&#x2F;share&#x2F;aBYozahW&quot;&gt;https:&#x2F;&#x2F;opncd.ai&#x2F;share&#x2F;aBYozahW&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;There were some pitfalls of using LLMs that I ran into, and I admittedly leaned into them. Speed beating accuracy is a real risk. The feature works and the code looks good, but if I were coding hands-on I probably would have reviewed more thoroughly. It is hard to tell whether the end result would have been &lt;em&gt;drastically&lt;&#x2F;em&gt; better. It takes discipline not to race along with the agent and accept everything just because the outcome looks as expected. The LLM gave me what I needed and any follow-up changes should be small, but I still see little things I would have done differently had I done it manually (which other developers may disagree with too, to be fair). For example, some of the tests could do with fewer assertions. The current code organisation makes that easy to address later. &lt;em&gt;If&lt;&#x2F;em&gt; any tech debt was added, it is very small and under control, so I stopped, generated my reports and filled in my timesheets.&lt;&#x2F;p&gt;
&lt;p&gt;Overall, working in Zed made it easy to review the code, and Opencode&#x27;s plan phases kept things organised. The existing, structured DSL-based test approach with plenty of examples also helped.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;extra-how-an-llm-can-work-better-with-a-well-structured-dsl&quot;&gt;Extra: How an LLM can work better with a well-structured DSL&lt;&#x2F;h2&gt;
&lt;p&gt;To give some context, what the application needed was to combine two existing flags: &lt;code&gt;breakdown&lt;&#x2F;code&gt; and &lt;code&gt;details&lt;&#x2F;code&gt;. The time &lt;code&gt;breakdown&lt;&#x2F;code&gt; reports were already implemented but were only reporting time spent per day, week, month or year. What I needed were details of the projects I had worked on. The application already had a &lt;code&gt;details&lt;&#x2F;code&gt; flag but it was not implemented for this view.&lt;&#x2F;p&gt;
&lt;p&gt;In the test DSL, the flags are set by calling methods in the &lt;code&gt;given&lt;&#x2F;code&gt; setup phase: &lt;code&gt;breakdown_flag(...)&lt;&#x2F;code&gt; and &lt;code&gt;details_flag()&lt;&#x2F;code&gt;. The breakdown feature did not implement the &lt;code&gt;details&lt;&#x2F;code&gt; flag, so it was not used in the tests for this feature. What is nice (and I credit this way of testing for it) is that the LLM was able to figure out that &lt;code&gt;details_flag&lt;&#x2F;code&gt; was already present and decided to re-use it: &lt;code&gt;Cmd::given().details_flag()....&lt;&#x2F;code&gt;. Here is an example of such a DSL test:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; style=&quot;background-color:#eff1f5;color:#4f5b66;&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&lt;span&gt;#[&lt;&#x2F;span&gt;&lt;span style=&quot;color:#bf616a;&quot;&gt;test&lt;&#x2F;span&gt;&lt;span&gt;]
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#b48ead;&quot;&gt;fn &lt;&#x2F;span&gt;&lt;span style=&quot;color:#8fa1b3;&quot;&gt;breakdown_day_with_details_should_show_tasks_per_day&lt;&#x2F;span&gt;&lt;span&gt;() {
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#b48ead;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span&gt; some_content = &lt;&#x2F;span&gt;&lt;span style=&quot;color:#b48ead;&quot;&gt;r&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;## TT 2020-01-01
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;- #project-a 1h Task A
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;- #project-b 2h Task B&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    Cmd::given()
&lt;&#x2F;span&gt;&lt;mark style=&quot;background-color:#a7adba30;&quot;&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;details_flag&lt;&#x2F;span&gt;&lt;span&gt;()
&lt;&#x2F;span&gt;&lt;&#x2F;mark&gt;&lt;mark style=&quot;background-color:#a7adba30;&quot;&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;breakdown_flag&lt;&#x2F;span&gt;&lt;span&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;day&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;&#x2F;mark&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;tags_filter&lt;&#x2F;span&gt;&lt;span&gt;(&amp;amp;[&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;project-a&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;project-b&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;])
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;at_date&lt;&#x2F;span&gt;&lt;span&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;2020-01-01&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;a_file_with_content&lt;&#x2F;span&gt;&lt;span&gt;(some_content)
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;when_run&lt;&#x2F;span&gt;&lt;span&gt;()
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;should_succeed&lt;&#x2F;span&gt;&lt;span&gt;()
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;expect_task_with_duration&lt;&#x2F;span&gt;&lt;span&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;project-a&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;1h 00m&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;)
&lt;&#x2F;span&gt;&lt;span&gt;        .&lt;&#x2F;span&gt;&lt;span style=&quot;color:#96b5b4;&quot;&gt;expect_task_with_duration&lt;&#x2F;span&gt;&lt;span&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;project-b&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;, &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#a3be8c;&quot;&gt;2h 00m&lt;&#x2F;span&gt;&lt;span&gt;&amp;quot;);
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Thank you for reading,&lt;&#x2F;p&gt;
&lt;p&gt;Hans&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
