This is a short post to share a positive experience I had using an LLM agent to quickly add a feature to an existing personal CLI time-tracking application. Below, I describe how I added it using Zed, Opencode and Claude.
To start, I wasn't even sure the feature I needed existed in the text-based time-tracking application I use day-to-day to keep track of what I work on. The application has a way to get the information I needed out, but the report that should make this easy was missing the details of what I had actually worked on.
So, should I invest the time to expand the feature, or accept the gap for now and spend a lot more time on manual work? Each choice had downsides: spending more time digging through time-tracking data to fill in timesheets is not very appealing, and implementing the feature (with or without agents) could become a time sink; I also had lots of other work planned for the day. A third option was to try an LLM agent and see whether it could help me implement the change quickly.
I decided to implement it, telling myself I’d stop if it didn’t look like I had a clear path to finish it within one hour.
Because I think it is relevant to the (spoiler) successful implementation, let me share a little about this project. It is a Rust codebase that I use to test development practices, and I think it is structured and implemented fairly well. The CLI, the main part of the application, has over 95% coverage through behaviour-driven, DSL-style acceptance tests. This setup gives an LLM both structure and plenty of examples to follow when adding tests. I will not go into the details here, but I have added a brief example at the end. Also noteworthy: this is a small project, which definitely makes a difference.
For this implementation, I used Zed with its Opencode integration. Lately I have been building smaller apps prompt-driven from the command line, without worrying much about the fine details. But for this project the actual implementation mattered to me, so I wanted to track changes more closely in an IDE. Opencode taps into my Claude subscription, so I can use Opus for planning and Haiku for implementation.
Honestly, I was very pleased with how smoothly this feature was implemented. What contributed to this was the plan-first approach before implementing anything. For anything non-trivial, always plan first!
Here is a high-level overview of my interaction with Opencode in Zed:
- Investigate
  - I started with Claude Opus and asked whether the feature I needed already existed, rather than looking it up myself, because I was under time pressure. It didn't.
- Plan
  - I asked Claude to plan the feature using a test-driven approach. It broke the work into nine tests, and I asked it to pause after each one for my review.
  - Before any implementation started, I asked it to write the plan to a Markdown file in the backlog.
  - Then I reviewed the plan. That sounds more superficial than it was, but there is little more to say than that the plan simply looked good.
- Implementation
  - I switched to Claude Haiku for the implementation.
  - It started off well, asking for feedback after each test cycle, and I asked for a refactor to remove duplication.
  - While it was implementing, I discovered I wanted a different kind of description for the tasks, so I told it to change that in the plan. I did not switch models for this.
  - The plan was updated in all the correct places.
  - After this, the workflow changed: it stopped asking me for feedback after each step, and before I realised it, seven of the nine tests were running. Not the TDD flow I asked for, but it worked.
  - Instead of asking it to redo anything, I reviewed the implementation (it was not a lot of code) and continued.
  - I ran the program against my own data and everything worked as intended. Aside from the one refactoring to remove duplication, I did not change the code.
  - One loose end I had to remind it of: updating the documentation.
I have shared the full session here: https://opncd.ai/share/aBYozahW.
There were some pitfalls of using LLMs that I ran into, and I admittedly leaned into them. Speed beating accuracy is a real risk: the feature works and the code looks good, but if I had been coding hands-on I would probably have reviewed more thoroughly. It is hard to tell whether the end result would have been drastically better. It takes discipline not to just run along with the agent and accept everything once the outcome matches expectations. The LLM gave me what I needed, and any follow-up changes should be small, but I still see little things I would have done differently if I had done it manually (which other developers may disagree with too, to be fair). For example, some of the tests could do with fewer assertions. The current code organisation makes that easy to address later. If any tech debt was added, it is very small and under control, so I stopped, generated my reports and filled in my timesheets.
Overall, working in Zed made it easy to review the code, and Opencode's plan phase kept things organised. The existing, structured DSL-based test approach, with plenty of examples to follow, also helped.
Extra: How an LLM can work better with a well-structured DSL
To give some context, what the application needed was a combination of two existing flags: `breakdown` and `details`. The time breakdown reports were already implemented, but they only reported time spent per day, week, month or year. What I needed were the details of the projects I had worked on. The application already had a `details` flag, but it was not implemented for this view.
In the test DSL, the flags are set by calling methods in the given setup phase: `breakdown_flag(...)` and `details_flag()`. The breakdown feature did not implement the details flag, so it was not used in the tests for that feature. What is nice (and I credit this way of testing for it) is that the LLM figured out that `details_flag()` was already present and decided to reuse it: `Cmd::given().details_flag()...`. Here is an example of such a DSL test:
```rust
#[test]
fn breakdown_day_with_details_should_show_tasks_per_day() {
    let some_content = r"## TT 2020-01-01
- #project-a 1h Task A
- #project-b 2h Task B";

    Cmd::given()
        .details_flag()
        .breakdown_flag("day")
        .tags_filter(&["project-a", "project-b"])
        .at_date("2020-01-01")
        .a_file_with_content(some_content)
        .when_run()
        .should_succeed()
        .expect_task_with_duration("project-a", "1h 00m")
        .expect_task_with_duration("project-b", "2h 00m");
}
```
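If you are curious how a DSL like this can be wired up, below is a minimal sketch of one way to do it. To be clear, this is not the actual implementation: the flag spellings, the binary name `tt`, and the report's output format are assumptions purely for illustration. The idea is a consuming builder. The given phase accumulates CLI arguments and fixture content, `when_run()` shells out to the binary, and the expect methods assert on the captured output and return `self`, so the whole test reads as a single given/when/then sentence.

```rust
use std::process::Command;

/// Entry point of the DSL: `Cmd::given()` starts the given phase.
struct Cmd;

impl Cmd {
    fn given() -> Given {
        Given { args: Vec::new(), content: None }
    }
}

/// Given phase: accumulates CLI flags and fixture content.
struct Given {
    args: Vec<String>,
    content: Option<String>,
}

impl Given {
    fn details_flag(mut self) -> Self {
        self.args.push("--details".to_string()); // flag spelling is an assumption
        self
    }

    fn breakdown_flag(mut self, period: &str) -> Self {
        self.args.push(format!("--breakdown={period}"));
        self
    }

    fn tags_filter(mut self, tags: &[&str]) -> Self {
        self.args.push(format!("--tags={}", tags.join(",")));
        self
    }

    fn at_date(mut self, date: &str) -> Self {
        self.args.push(format!("--date={date}"));
        self
    }

    fn a_file_with_content(mut self, content: &str) -> Self {
        self.content = Some(content.to_string());
        self
    }

    /// When phase: write the fixture to a temp file and run the real CLI.
    fn when_run(self) -> Outcome {
        // A real harness would use a unique temp file per test.
        let path = std::env::temp_dir().join("tt-fixture.md");
        std::fs::write(&path, self.content.unwrap_or_default()).unwrap();
        // CARGO_BIN_EXE_* is set by Cargo for integration tests;
        // a binary target named `tt` is assumed here.
        let output = Command::new(env!("CARGO_BIN_EXE_tt"))
            .args(&self.args)
            .arg(&path)
            .output()
            .expect("failed to run CLI");
        Outcome {
            stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
            success: output.status.success(),
        }
    }
}

/// Then phase: assertions that return `self` so they chain.
struct Outcome {
    stdout: String,
    success: bool,
}

impl Outcome {
    fn should_succeed(self) -> Self {
        assert!(self.success, "CLI exited with an error:\n{}", self.stdout);
        self
    }

    fn expect_task_with_duration(self, tag: &str, duration: &str) -> Self {
        // Assumes the report prints "<tag> <duration>" on one line.
        let needle = format!("{tag} {duration}");
        assert!(self.stdout.contains(&needle), "expected `{needle}` in:\n{}", self.stdout);
        self
    }
}
```

The consuming-builder shape is what gives the model so much to latch onto: every existing test is a complete, self-describing example of how to add the next one.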
Thank you for reading,
Hans