HanLHo. - Fractional Architect & Software Product Engineer - llm

Notes on Why AI is the Third Coming of Domain-Driven Design

2026-03-29T00:00:00+00:00

Notes on the "Dear Architects" podcast episode "Why AI is the Third Coming of Domain-Driven Design".

The title is only a small part of the episode. My main takeaway is that, because AI changes the medium of communication to natural language, precise ubiquitous language will matter even more.

Condensed takeaways:

Modularity should make future change easier.

Coupling should be chosen deliberately.

Boundaries should reflect how teams actually work.

Modeling should optimize for usefulness, not completeness.

Architecture should record its reasoning and assumptions.

Architecture is ongoing judgment and trade-offs, not a one-time design step.

Creating architecture diagrams with C4 and coding agents

2026-03-11T00:00:00+00:00

LLMs can draw diagrams, but you get better results with a conceptual model, a validation loop, and a lightweight verification pass against the codebase than with free-form diagramming.

I have used the C4 model extensively to map architecture landscapes. Last week I saw an opportunity to catch up with it and try it with coding agents. I found that modelling architecture in a text-based format with guardrails (a DSL with rules) is easier and more consistent for a coding agent. I tried it on a small Rust project I know well. This post is a field note of my findings.

C4 is a zoom-in model for software architecture.

This post discusses only the levels we actually need:

System context: people and software systems.

Container: deployable/runnable things inside a software system.

Component: the main building blocks inside a container.
A key element for coding agents: C4 can be expressed as a text model (a DSL), so the architecture model can be edited like code and validated/exported via a CLI.
Expressed in a DSL, C4 becomes model-as-code: one model, many views/diagrams.
The test project
To try this out, I used one of my personal projects: a text-based time-tracking application with two runtime modes (a CLI and a web dashboard). Both operate on the same domain and the same Markdown time-entry files.
The functionality does not matter much for this post, but two properties of the codebase do. First, it is relatively small and easy to analyse. Second, it is well-structured: ports and adapters, plus behaviour-driven, DSL-based acceptance tests.
I've used C4 on larger landscapes too. I expect the workflow to translate, but the experience will differ on larger (or less structured) codebases.
Building the model
I started the coding agent session with a direct request to build C4 diagrams for the project at system and container level, with the DSL written first.
```
build me a c4 model at system level and container level (as defined by the C4 model). Please create the DSL first
reference: c4model.com
```
Below, I'll go through the process using the diagrams, but keep in mind that these are all generated from a text-based DSL. From the start, the agent produced a working model in the Structurizr DSL. I then gave it a command to run the Structurizr CLI as a check at each step.
To start with, the agent inspected the Rust codebase to work out the system boundary. It established one `Time Tracker` software system with two runtime modes: a CLI and a web dashboard. Both use the same Markdown time-entry files.
(Apologies for the dark diagrams; dark mode was enabled when I took these screenshots. To enlarge them, open the images in a new tab.)

Here is a summary of the session:
One of the first decisions was scope: whether to model only the web path or both runtime paths. We chose to represent both, as separate containers.

We modelled a software system, two application containers, an internal datastore for run statistics, and the Markdown time-entry files as an external dependency. That first version was structurally correct. (I had completely forgotten about the runtime statistics feature ...)

After the first version, something felt missing between the Markdown files and the CLI/web containers: the shared core logic. In C4 terms, that is not another container; it belongs at component level. So I kept the container model strict and added the component level to make the shared logic explicit.

I had initially asked the agent to model the shared core logic as a container, but it pushed back, and the model improved because of it. Instead, I asked it to add component views for both runtime containers rather than inventing a fake `core` container. That kept the container model strict while making the architecture more insightful.

Naming discussions helped sharpen the model. The agent came up with names I was not sure about, although on a first pass it probably did a better job than I would have. The names were not bad, but naming is not where I want to leave room for interpretation, so I set one explicit direction: name things as close as possible to the codebase.

To support those component views, we introduced a shared component fragment that both the CLI and web containers include. That shared layer covered parsing, domain types, reporting, and execution statistics. The result was a more accurate picture of how the code is actually organised.

Once the model structure felt right, I shifted to presentation. I asked the agent to style the diagrams so different roles were easier to distinguish: CLI and web containers, shared components, adapters, renderers, and datastores.
Then I asked for rounded boxes and a more explicit person-style user element.

For the final result, I have also published the generated static site with the diagrams; with help from the agent, this was straightforward. On the site, you can click the small magnifying-glass icons to zoom into the next level.

In summary, this result took several passes: boundaries first, then the component layer, then names aligned with the code, and finally presentation.

The DSL in practice

One important artefact we have not discussed yet is the DSL itself. Here is the full model, with the diagrams defined in the Structurizr DSL. All the edits were done by the agent, including the initial creation from scratch; I reviewed, asked questions, and iterated.

Before this, I typed every box and relationship by hand (scrolling up and down the file, or keeping two windows open), added tech stacks (taking care not to confuse the order of strings), and so on. Using the agent was a major documentation speed boost, and the DSL came out clean and organised the way I prefer: relationships after the element definitions, not inside them.

I do see the risk of not thinking things through. But besides relieving me of painstaking manual editing of elements and relationships, working with the agent also gave me:

Iteration close to the code

Meaningful discussions on abstraction levels and naming

A knowledgeable architecture assistant at hand

Why I think it worked: the model is defined in text, so the agent can edit it like code. C4 provides guardrails through a small number of nested abstraction levels, and the DSL keeps names, descriptions, and styles consistent across views. A CLI tool to validate the model closes the loop, so the agent can check its work as it goes. In addition, you can ask the LLM to review the model, with or without the actual codebase as context.
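The naming direction ("as close as possible to the codebase") can also be checked mechanically, as a cheap complement to an LLM review. This check is not part of the workflow described above; it is a hypothetical Python sketch that assumes components are declared one per line as `identifier = component "name" ...` and that each component is named after a Rust module file. All function names are illustrative.

```python
import re
from pathlib import Path

# Assumed declaration style, one per line:
#   parser = component "parser" "Parses Markdown time entries" "Rust"
COMPONENT_DECL = re.compile(r'^\s*[\w-]+\s*=\s*component\s+"([^"]+)"', re.MULTILINE)


def component_names(dsl_text: str) -> set[str]:
    """Collect the quoted names of all component declarations in a DSL text."""
    return set(COMPONENT_DECL.findall(dsl_text))


def rust_module_names(src_dir: Path) -> set[str]:
    """Hypothetical file-to-module mapping: `foo.rs` and `foo/mod.rs` both give `foo`."""
    names = set()
    for path in src_dir.rglob("*.rs"):
        if path.stem in {"main", "lib"}:
            continue  # crate roots, not modules
        names.add(path.parent.name if path.stem == "mod" else path.stem)
    return names


def naming_drift(dsl_text: str, module_names: set[str]) -> tuple[list[str], list[str]]:
    """Return (components without a matching module, modules without a component)."""
    declared = component_names(dsl_text)
    return sorted(declared - module_names), sorted(module_names - declared)
```

An empty result on both sides means every component maps to a module and vice versa; anything else is a prompt for a conversation with the agent, not necessarily an error, since not every module deserves a box.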
Operationalising the workflow

(I used Codex CLI with Codex 5.3; any other recent coding agent and model will probably work as well.)

Going forward, here is how I will instruct LLMs to work with C4 and keep the architecture diagrams up to date.

Agent skill

First, after completing this experiment, I turned what I learned into a reusable agent skill called `modelling-c4-diagrams`, which I can now use from any project.

AGENTS.md instructions

In the project's `AGENTS.md`, I added a short reference so future agents can discover the DSL files and know how to validate and export them. This avoids repeating the discovery work in each new session.

```
- **Architecture docs (C4)**: source DSL at `docs/c4/time-tracker.dsl` (shared components in `docs/c4/shared-tracking-core.dsl`); validate with `just architecture-docs-validate`; export static site with `just architecture-docs-export`
```

Verification

In this project, the LLM and I used the following commands to verify the output:

Validate the C4 Structurizr DSL: `structurizr-cli validate -workspace docs/c4/time-tracker.dsl`

Export the C4 diagrams to docs/site for inspection (and GitHub Pages publishing): `structurizr-cli export -workspace docs/c4/time-tracker.dsl -format static -output docs/site`

View the architecture documentation: `open docs/site/index.html`

Structurizr interface to LLM

I found the validation loop with the CLI to work well. If the export succeeds, the DSL is valid and the views conform to the tool's rules. That still does not tell you whether the model is accurate, or whether the diagrams communicate well; the C4 diagram review checklist is a good yardstick for that.

The LLM did not seem to require much extra instruction to create a proper model and views. I pointed it to c4model.com at the beginning of the session, and that may have been enough context. (It is hard to tell what it knows or does under the hood.)
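The validate-then-export loop can be wrapped in a small script, so the agent always runs the same checks in the same order and gets a single pass/fail signal. A minimal Python sketch, assuming `structurizr-cli` is on the `PATH`; the two commands are the ones listed under Verification, while the wrapper and its names (`C4_STEPS`, `run_steps`) are hypothetical.

```python
import subprocess
import sys

# The two commands from the Verification section, run in order.
C4_STEPS = [
    ["structurizr-cli", "validate", "-workspace", "docs/c4/time-tracker.dsl"],
    ["structurizr-cli", "export", "-workspace", "docs/c4/time-tracker.dsl",
     "-format", "static", "-output", "docs/site"],
]


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0 if all pass."""
    for cmd in steps:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print(f"not found: {cmd[0]}", file=sys.stderr)
            return 127  # conventional shell exit code for "command not found"
        if result.returncode != 0:
            # Echo the tool's own output so the agent can read the error and fix the DSL.
            print(f"failed: {' '.join(cmd)}", file=sys.stderr)
            print(result.stdout + result.stderr, file=sys.stderr)
            return result.returncode
    return 0
```

Ending such a script with `sys.exit(run_steps(C4_STEPS))` gives the agent one command to run after every edit, and a non-zero exit code is an unambiguous signal that it is not done yet.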
The skill I created and referenced above now serves as the main interface.

Conclusion

The architecture model and diagrams are insightful artefacts ("pictures can say more than words"). Most of the thinking and modelling usually happens visually, but recording it often becomes a chore. This experiment showed me that LLMs can help keep a model up to date without turning it into a separate manual process.

When the model is constrained (C4) and expressed as text (a DSL), you can version it like code, review it like code, and validate and export it through a CLI. Constrained text models plus validation give coding agents a better architecture-diagram workflow than free-form diagramming.

Addendum: Next experiments

Some follow-ups I might try if I run this workflow again.

Keeping the model in sync

Work with the LLM to design how to encode parts of the architecture model directly in the codebase. Use the C4 views as shared context, then define a way to keep the model in sync with the implementation. Unless you are using a very principled framework (maybe Spring in Java?), I expect this to be quite custom per project anyway. Coding agents may lower the barrier to getting started with this kind of non-obvious quality-improvement work.

Verification beyond the CLI

Use an MCP server such as Chrome DevTools MCP to inspect exported diagrams as a second verification step.

One concrete use case: manual editing is often required to position boxes and, especially, dependency lines. A visual inspection could double-check that no text boxes overlap and that lines do not cross boxes.

Coding agents could also be used to evaluate the shape of the architecture outside of the code.

Publishing and representation

Export to Mermaid (or PlantUML) for embedding in the agent's instructions, but keep the Structurizr DSL as the source of truth.

Split the DSL so the documentation for each container or component lives closer to the code.
A better LLM interface in tooling

This requires changes to Structurizr. It could provide build/run instructions for LLMs via an extensive `--help` output, or ship a dedicated subcommand that prints LLM instructions (similar to `bd prime`).

A skill to support TIL creation

2026-03-02T00:00:00+00:00

Skills are a great way to introduce capabilities in your agent flows. To support my "The Day I Learned" repository and website, I created a skill to extract learnings from coding and LLM sessions.

The skill below is installed in my global agent settings (`AGENTS.md`), and from each project I can call it to create a TIL in a dedicated project location. In my til project, I also have a script that collects these across all projects. That is usually where I do some reordering and rewriting, but the skill works well and removes friction when sharing short learning snippets. Instead of creating a Markdown file manually in the correct format, I get a structured draft that I only need to rework. It is much easier to improve something than to start from a blank canvas.

Full skill definition:

````markdown
---
name: til-learning-partner
description: Use when successful tool usage should be captured as a concise, utilitarian TIL note or Tilly is mentioned.
---

# TIL Learning Partner

## Overview

Capture short, practical "Today I Learned" notes from successful outcomes.
Each note is a standalone markdown file in `docs/tils/` with a slug-only filename.

## Trigger Conditions

- User asks to capture or write a TIL.
- A command/tool outcome usage is successful and worth preserving.
- User asks to summarize what worked into a reusable note.

## Workflow

1. Confirm the outcome is successful and concrete.
2. Ask one lightweight prompt: `Capture this as a TIL?`
3. If user declines, do not write a file.
4. If user confirms, extract one focused learning:
   - Issue solved
   - How it was solved
   - Key command(s) learned
   - Practical takeaway
5. Generate a slug from the learning title and save to `docs/tils/<slug>.md`.
6. If `docs/tils/<slug>.md` exists, use deterministic suffixes:
   - `<slug>-2.md`, `<slug>-3.md`, and so on.

## Output Contract

- Path: `docs/tils/<slug>.md`
- Slug rules:
  - Lowercase
  - ASCII-safe
  - Hyphen-separated words
  - No date in filename or slug

## TIL Template

~~~~markdown
# TIL: <specific issue solved>

## Issue

<one concise problem statement>

## Solution

<what worked and why>

## Key commands

```bash
<command 1>
<command 2>
```

## Takeaway

<how to apply this next time>

_Created: YYYY-MM-DD_
~~~~

## Guardrails

- Keep notes utilitarian, direct, and brief.
- Cover one specific learning per file.
- Avoid exploratory or uncertain phrasing.
- Avoid blog-style storytelling and long introductions.
- Do not invent commands or outcomes that were not observed.
````

Experience report: Site update using coding agents and Beads

2026-03-01T00:00:00+00:00

This is an experience report on updating my (this!) website using coding agents and Beads ('a distributed, git-backed graph issue tracker for AI agents.').

Site update using coding agents

Over the past week, I used coding agents to update my website. The result is, I hope, a cleaner design and new features, delivered faster than I could have managed alone and to a design standard I probably would not have reached on my own either.

Context: in October, I moved my blog from Hashnode to a self-hosted Zola site to take more control of my online presence and potentially support freelance work.

Tools

For improving the overall look and feel, I mostly relied on Impeccable, which is built on Anthropic's original frontend-design skill. It has a clean and clear command set covering diagnosis, audit, and several kinds of gradual improvement (e.g. `/bolder`). In my case, what definitely helped is that I already had a visual style in place that I wanted to maintain: simple and burgundy-based. I do not know how it compares to other frameworks, but for what I did, I would recommend trying Impeccable: easy setup and good workflows.
As for the AI assistance: I mostly used Codex and Opencode as coding agents, GPT-5.2 for planning and analysis, and Kimi 2.5 and Gemini Flash 3 for implementation. For quality reviews, I relied entirely on the coding tools, aside from adjusting a size value here and there, so it is safe to say these updates were vibe-coded, in the original sense of the term, rather than 'engineered'.

Changes made

Looking at the completed task list, I finished over 50 tasks. The most important changes, aside from the content overhaul:

A visual refresh: much cleaner and more appealing, I think.

An Atom feed. Maybe I'll integrate a newsletter later, but at least people can subscribe.

A Bluesky sidebar with live post rotation (handled entirely on the client side).

Better code block readability in posts.

At the end of each post, instead of comments, a simple 'Want to respond?' section for basic interaction.

'Tags' are now visible and browsable. I also added 'categories' ('Experience Report', 'How-to', etc.).

Last but not least: dark mode. I thought I was finished. Then, for the fun of it, I asked for a dark mode, and it produced the current style in one shot. That would have taken me ages to get right; this time it took 10 minutes. The only change I had to ask for was the styling and position of the dark mode toggle. The LLM also added a fade effect when toggling dark mode. Normally I do not like an LLM adding things I did not ask for, but this one pleasantly surprised me, so I kept it. Try it; I think it looks pretty nice.

Using Beads for the first time

For every project, I have a custom Markdown-based backlog or tracking system. Each one is a little different and works better or worse depending on when I started the project. I have tried Task Master before, but I could not get it working reliably or easily. Enter Beads.
Beads markets itself as an AI-native task tracker, and I think it shows: the tool feels designed by someone who actually uses AI to get work done. After installing it, you can simply use it from the command line. You can also use it to initialise your coding agents' instructions for your projects, which adds instructions to your AGENTS.md (or CLAUDE.md) file. (Note: if you are working in a shared repo, you can use Beads in stealth mode so it does not interfere with others: `bd init --stealth`.)

I found it works reasonably well, and `bd quickstart` makes it easy to get started. This is the first task management system I've worked with that has brought some consistency across my projects. While using it, I have come up with some customisations in the form of skills:

`Land the plane`: Instead of repeating the same commands for finishing up a task in each agent instructions file, I created a skill. (I also needed some customisation.)

`Create task`: The CLI already has a nice way to create tasks, but when working with an agent I prefer my own skill, which I can simply ask to create a task while dictating.

Beads is actually a pretty good name: it is easy to create and start tasks, in other words to keep going, one task (or bead) after another. I will have to see how it holds up once a whole backlog builds up, but based on my experience so far, I will continue using Beads to keep track of work on personal projects.

Harness Engineering

2026-02-27T00:00:00+00:00

Today I heard the term “harness engineering” for the first time:

> Harness engineering is the practice of building tooling, tests, and automation that let coding agents execute tasks safely and reliably.

If more and more code is written by LLMs, the focus seems to be shifting to creating guardrails so agents can validate their own work.
Heard in: The Pragmatic Engineer - Mitchell Hashimoto’s new way of writing code

AI’s Opportunity: Pacing Control Loops with Development

2026-02-04T00:00:00+00:00

What caught my attention in the book Vibe Coding by Gene Kim and Steve Yegge is the idea that, as LLMs and coding agents change how we build software, control loops (tests, reviews, and other signals that tell you whether a change behaves as expected) should be faster and more integrated into development feedback loops than before.

My intuition says this makes perfect sense. For example, when a dedicated test stage or QA role tests after the fact, it inevitably struggles to keep up with the speed of development. Over time, this makes a 'testing after the fact' organisation of quality increasingly difficult to sustain.

So how do we solve this? Some may think the solution is to introduce AI at that after-the-fact test stage. However, given the pace of development I see and read about, that approach will struggle to keep up. Either you accept not taking full advantage of what AI can do for development, or you rethink how testing is integrated into the development process. Put bluntly: if AI lets you produce a feature in hours but the first meaningful acceptance signal only arrives days later in a separate stage, quality assurance becomes the bottleneck.

To me, the logical consequence is a stronger shift towards automated quality controls, including, at the very least, acceptance tests and code reviews. By acceptance tests I mean executable specifications of expected behaviour, written in domain language before or alongside the code. This implies that, because of AI, testing has to move earlier in the development chain. AI is an opportunity to start writing acceptance tests if you have not already.
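To make "executable specifications in domain language" concrete, here is a minimal Python sketch in the style of DSL-based acceptance tests. Everything in it is hypothetical: `InMemoryTimeLog` stands in for a real system, and the test talks to it only through a small domain-language layer (`TimeTrackingSpec`) and a narrow driver interface (`TimeLogDriver`).

```python
from typing import Protocol


class TimeLogDriver(Protocol):
    """What the specification needs from the system, and nothing more."""

    def add(self, project: str, minutes: int) -> None: ...
    def total_for(self, project: str) -> int: ...


class InMemoryTimeLog:
    """Hypothetical stand-in for the real system: a minimal in-memory time log."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, int]] = []

    def add(self, project: str, minutes: int) -> None:
        self._entries.append((project, minutes))

    def total_for(self, project: str) -> int:
        return sum(m for p, m in self._entries if p == project)


class TimeTrackingSpec:
    """Domain-language layer: tests speak in projects and hours, not storage details."""

    def __init__(self, driver: TimeLogDriver) -> None:
        self._driver = driver

    def given_logged(self, project: str, hours: int = 0, minutes: int = 0) -> None:
        self._driver.add(project, hours * 60 + minutes)

    def then_total_is(self, project: str, hours: int = 0, minutes: int = 0) -> None:
        expected = hours * 60 + minutes
        actual = self._driver.total_for(project)
        assert actual == expected, f"{project}: expected {expected}m, got {actual}m"


def test_time_accumulates_per_project() -> None:
    # The executable specification: it reads as expected behaviour, not as code paths.
    spec = TimeTrackingSpec(InMemoryTimeLog())
    spec.given_logged("prj-content", hours=2)
    spec.given_logged("prj-content", hours=1, minutes=30)
    spec.given_logged("admin", minutes=30)
    spec.then_total_is("prj-content", hours=3, minutes=30)
    spec.then_total_is("admin", minutes=30)
```

Swapping `InMemoryTimeLog` for a driver that calls the real application would leave the test itself unchanged; that separation keeps such specifications stable while AI speeds up the implementation underneath.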
AI pushes us to invest time in strategic test design, testing against stable contracts, testing from a behavioural point of view, and isolating test descriptions from the actual implementation. Put differently, the shift in development practices that LLMs are causing should inspire more adherence to testing best practices, not less, at least if you want to keep adding features, fixing and preventing bugs, and keeping up the pace of development.

More broadly, to keep benefiting from AI over time, we should shift towards tightly coupled feedback loops embedded in everyday development. This is not limited to testing; it also applies to reviews, for example. In that sense, AI doesn’t remove quality practices; it raises the stakes if you don’t have them.

If testing 'shifts left', team structures must change as well. This evolution points towards smaller, more autonomous teams where testing, development, and feedback are inseparable rather than sequential.

AI presents us with an opportunity: not to speed up quality control after the fact, but to design systems, processes, and teams that make quality the fastest path forward, so control loops can keep up with the increasing pace of development loops.

Experience Report: Building a time-tracking AI assistant

2026-02-04T00:00:00+00:00

This is a short experience report about using skills (with Codex and its models) to build a personal AI assistant that helps me maintain my time-tracking log.

To set expectations: the assistant does not manage my calendar or tasks. It helps me keep a time-tracking log that lives in a Markdown file by interpreting logging requests and editing the file for me (while categorising entries correctly). I start most days with a bit of planning, which means adding entries to that log. The format is completely custom and tailored to my needs, and I wrote a small companion CLI tool, `tt`, to generate reports from it.
(The project is open source on GitHub, but honestly I don't think it is useful to anyone other than me.)

To give an idea, this is what a day entry looks like:

```
## TT 2026-02-04
- #admin ##work 30m inbox and daily planning
- #prj-content ##work 2h article outline and research notes
- #prj-content ##work 1h 30m first draft writing
- #prj-personal-assistant #llm ##work 1h walking skeleton
- #prj-personal-assistant #llm ##work 1h create skills
- #break ##energy 20m outdoor walk
- #learning ##work 1h documentation reading and summary
```

And using `tt`, I can generate reports like:

```
Overview 2026-02-04 -> 2026-02-04:
- prj-content: 3h 30m
- prj-personal-assistant: 2h 00m
- learning: 1h 00m
- admin: 30m
- break: 20m
Total: 7h 20m

Breakdown:
- ##work: 7h 00m
- ##energy: 20m
```

Editing the file is not hard, but it is tedious. The goal of this project was not to replace my log format, but to make it easier to operate.

Today I ran a small LLM experiment to make logging less cumbersome. Instead of writing an entry like `#prj-personal-assistant #llm #codex ##work 2h Setup walking skeleton`, I want to be able to say: "Create a new task to set up a walking skeleton, add tags codex and llm, and attribute the time to the personal assistant project." And by "say" I mean it literally: I dictate it in normal language, and it gets transcribed and sent to the LLM. This turned out to be a surprisingly fast (and fun) experiment with promising first results.

Technical details: I used the Codex agent and its models, mostly Codex 5.2. Working with Codex was smooth, but this post is not about comparing coding agents; I suspect it would work with any capable agent that supports skills.

I started with a log file containing over a year of time entries. That history was a good dataset to prime the LLM on the format: what a day looks like, what an entry line looks like, and how entries should be categorised with tags.
From there I moved into implementation, with a small set of local files and skills. This is the file tree I ended up with (not ready to call it "architecture" yet):

```
AGENTS.md
skills
├── tt-cli
│   ├── references
│   │   └── command-cheatsheet.md
│   └── SKILL.md
└── tt-log
    ├── references
    │   ├── log-structure.md
    │   ├── tag-inference.md
    │   └── validation.md
    ├── scripts
    │   └── validate_tt_update.py
    └── SKILL.md
time-tracking-log.md
```

`tt` is an abbreviation for "time tracking". In practice, `AGENTS.md` tells the agent which skill to use for which capability:

```
### Time tracking

- Use `tt`, the custom time-tracking CLI, for time-tracking operations.
- Use `$tt-log` for `time-tracking-log.md` edits, tag inference, and 7h to 8h daily policy checks.
- Use `$tt-cli` for `tt` command discovery, report commands, and CLI troubleshooting.
- Rule of thumb: log edits/validation => `$tt-log`; reporting/CLI usage questions => `$tt-cli`.
```

The two skills are the heart of the implementation:

`tt-cli` handles the `tt` CLI tool: command discovery, reporting, filters, and general troubleshooting.

`tt-log` handles log editing, task insertion, tag inference, section ordering, and policy checks.

From the start I wanted to use skills, because my custom format and tooling are a specialised capability. Initially Codex suggested a single skill, but it was clear to me that reading/querying and writing/editing were different responsibilities, so I pushed it in that direction (it agreed, ha!). That split improved the quality of the outcomes. Beyond maintainability, making responsibilities explicit made behaviour more predictable, because the CLI skill gives the LLM a way to validate its work. The `tt-log` skill can focus on reliable edits and validation, while `tt-cli` handles queries like "how much do I still need to log today?" and validates the log.
The `references` locations for both skills were set up by the LLM while we created the skills. They are pretty clean in terms of responsibility, and reviewing and refining the split proved useful.

During implementation, I also wanted basic checks for "did I log enough today?", so we added a validation workflow that checks a daily target range (7h to 8h). The logic is always the same, so I had the agent write a script: `skills/tt-log/scripts/validate_tt_update.py`.

I iteratively refined the default logging rules (which tags to use for which kinds of tasks, the fact that not all my days look the same, and so on). I don't expect it to be perfect, and I will probably tweak it over the next couple of weeks as exceptions pop up.

(As an aside, when I was finishing up, Codex offered to create a 'one-page PDF summary' of this project. I think it did a pretty good job.)

So in short:

Created initial time-tracking skill behaviour for structured log edits, based on existing time-tracking data.

Split responsibilities into two dedicated skills (`tt-log` and `tt-cli`).

Added automated validation for parse integrity, per-day totals, and a daily policy range.

Iteratively refined defaults and behaviour based on real usage (for example, a longer workout-at-noon baseline on Tuesdays).

Things I can now ask: "I want to fill the rest of the day with work on a project I forgot the tag of. Give me the last 5 projects I recorded time on so I can tell you what to log to."

Before, this was not hard, but it involved a bunch of small chores: checking previous days, finding the right project tag, copying it into a new line, and calculating the time left for the day.

Besides the usefulness (and fun), there was an unexpectedly valuable lesson: AI assistance works best in the same way good code does. Define clear boundaries and add executable checks, so changes are easier to make and the system can validate its own work.
On Building Reliable Software with LLMs

2026-01-27T00:00:00+00:00

This post captures my current thinking on how LLMs are impacting software development, particularly around software quality and engineering discipline.

My main observation: most of the best practices we've relied on for years are just as important (maybe even more so) in an LLM-assisted development environment. Working with LLMs requires more discipline and attention to fundamentals, not less.

When using LLMs, there is a heightened risk of losing understanding: of the problem domain, the underlying technology, and the implementation details. Code can become messy quickly without careful attention, review, and guidance.

While this is certainly true, we didn't need LLMs for this to happen. Why else have so many projects failed historically? Why is technical debt a topic in most projects? The critical difference with LLMs is the increased risk and temptation of velocity: we move too fast and skip the practices that help us maintain and change software in the future. Discipline and rigour have become more important than ever.

These practices are becoming MORE crucial in an LLM-assisted workflow:

Codebase quality. This matters for LLM agents too, because they take their cues from existing code. A clean, well-organised codebase helps agents perform better; inconsistencies lead to poorer results. An LLM will mimic what's already there.

Feedback loops and testing. If an LLM is helping you write code, you need reliable ways to verify it hasn't broken anything. A well-designed, automated test suite that's easy to extend and interpret helps maintain understanding and control of implemented functionality.

Well-designed boundaries and contracts, both within and outside your application. These allow you to constrain, shape, isolate, and test the work an LLM produces.

Managing risk and technical debt. Be intentional and explicit about where you rely on LLMs and where you don't. Document these decisions.
Maintain a technical debt log with risk assessments and timelines.

Documentation of past decisions. Keep a history of architectural decisions through decision logs and ADRs, and make sure the LLM you're working with is aware of them. I've had LLMs point out inconsistencies in the codebase or flag where new change requests conflict with past decisions.

Taken together, these practices are what make LLM-assisted development sustainable rather than brittle.

I don't think the skill gap in building and delivering software will ultimately be about prompt cleverness. LLM agents will be genuinely helpful tools, and working effectively with them will be an accelerator. However, as we rely on them more to write code (even when we review that code carefully), the most important work increasingly becomes the disciplined practice of boxing them in with testing, architecture, and contracts.

Avoid painting yourself into a corner. In an LLM-assisted workflow, that means being deliberate about where you let agents move fast, and where you slow them down with guardrails. LLMs make it easier to move fast, and easier to get stuck.

Agent Chisels: My LLM Agent Skills and Workflows

2026-01-13T00:00:00+00:00

I have been meaning to share more about my LLM workflows and tooling for a while, partly to have a reference for conversations, but mostly to learn in public. Agent Chisels is where I will be sharing the custom artefacts (primarily `skills`, with `commands` and `agents` to follow) that I find most useful and actively use in my daily workflow.

Skills

I have shared two skills I use almost daily. I've also included a third skill, more of a meta-skill for evaluating other skills, which I used when reviewing these for release. I actively use it to iterate on and improve my skills, so it fits the goal of this project.

`documenting-architectural-decisions`: Document and manage architectural decisions using ADRs.
Supports Y-statement and traditional ADR formats. Used for creating, reviewing, or searching decision records. This repository contains several examples of decision logs created with this skill, for example, here</a> is the one for the jj</code> plugin.</li> </ul> I use jj</code> or Jujutsu</code></a>, an alternative version control system, in all my projects. Getting LLMs to work reliably with it is quite a challenge, so I have a skill to detect and remind an LLM to use jj</code> and one to add the capability of using jj</code>. The Claude Code plugin</a> also adds a use-jj</code> command and a hook to remind an LLM of using jj</code>. detecting-jujutsu</code> — Verify if the current repository uses Jujutsu (jj) instead of git. Used when confirming VCS state before operations.</li> using-jujutsu</code> — Detailed guidance on Jujutsu (jj) VCS operations including committing, pushing, searching history, and working with revisions/revsets.</li> </ul> And finally, there is the meta-skill to evaluate skills. Here is an example</a> of a report generated by this skill. evaluating-skills</code> — A skill to evaluate skills against best practices for size, structure, examples, and prompt engineering. Use when reviewing skills for deployment, optimisation, or standards compliance.</li> </ul> To use these, you can use the Claude Code plugin system or install them manually; take a look at the installation section</a> for more details. A little on the setup of the repository</h2> I use symbolic links liberally to avoid duplication. For example, symbolic links allow me to share the independent skills with the Claude Code plugin while also using them in this project itself. All skills I share in this repo are dynamically linked to my ~/.claude/skills</code> directory. Note that this is also the easiest way to make these skills available to other LLM CLI agents</a> like Opencode, Codex, and Mistral Vibe. 
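The linking step itself can be scripted. Below is a minimal Rust sketch of the idea, not the setup used in the Agent Chisels repository: the helper name link_skills, its two command-line arguments and the choice of absolute link targets are all assumptions, and it is Unix-only because it relies on std::os::unix::fs::symlink.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink; // Unix-only; Windows has std::os::windows::fs::symlink_dir
use std::path::{Path, PathBuf};

/// Hypothetical helper: symlink every skill directory in `skills_dir` into
/// `target_dir` (e.g. ~/.claude/skills). Existing entries are left alone,
/// so the function is safe to re-run.
fn link_skills(skills_dir: &Path, target_dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(target_dir)?;
    let mut created = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let entry = entry?;
        // Only directories count as skills; loose files are ignored.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let link = target_dir.join(entry.file_name());
        // symlink_metadata does not follow links, so even a broken link counts as taken.
        if fs::symlink_metadata(&link).is_ok() {
            continue;
        }
        // Absolute target (an assumption), so the link does not depend on where target_dir lives.
        let target = fs::canonicalize(entry.path())?;
        symlink(&target, &link)?;
        created.push(link);
    }
    created.sort();
    Ok(created)
}

fn main() -> io::Result<()> {
    // Usage: link_skills <skills_dir> <target_dir>
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: link_skills <skills_dir> <target_dir>");
        return Ok(());
    }
    for link in link_skills(Path::new(&args[1]), Path::new(&args[2]))? {
        println!("linked {}", link.display());
    }
    Ok(())
}
```

Skipping names that already exist keeps the sketch idempotent. The repository listing below uses relative targets (../../skills/...) instead, which keep working when the whole repository is moved.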
In this repo, I have mostly worked with Opencode, and the skills in the .claude location just work with it.

```
❯ ls -l .claude/skills
l... detecting-jujutsu -> ../../skills/detecting-jujutsu
l... documenting-architectural-decisions -> ../../skills/documenting-architectural-decisions
l... evaluating-skills -> ../../skills/evaluating-skills
l... using-jujutsu -> ../../skills/using-jujutsu
d... verify-release-readiness
```

The l at the beginning of each line stands for symbolic link. You'll notice one real directory in there (marked d); it is a skill only relevant to this repository.

Future plans

I'll be adding LLM artefacts as I move more and more of my own setup to this repository. Since I'm trying to reuse as much as possible (within reason) between different LLM agents, I need a central location anyway, preferably vendor-neutral yet pragmatic (e.g. using the .claude/skills location to share skills). I also hope to make this repository a live, automatically up-to-date version of the artefacts I use day to day.

Related to this, I am thinking of creating a setup similar to dotfiles (where developers share configuration files), but for LLM agent configurations: 'agentfiles'. I intend to share my LLM agent configurations and how I integrate them. Let me know if you would be interested in this or are already sharing yours.

Implementing an Urgent Feature with Opencode, Claude, and Zed

2025-12-18T00:00:00+00:00

This is a short post to share a positive experience I had using an LLM agent to quickly add a feature to an existing personal CLI time-tracking application. Below, I describe how I added it using Zed, Opencode and Claude.

To start, I wasn't even sure the feature I needed existed in the text-based time-tracking application I use day to day to keep track of what I work on. My application has a way of getting the information I needed out, but the feature that should make this easy was missing details on what I had actually worked on.

So, should I invest the time and expand the feature, or accept that it was missing for now and spend a lot more time on manual work? I could try an LLM agent and see if it could help me implement the change quickly. Each choice had downsides: spending more time digging through time-tracking information to fill in timesheets is not very appealing, and implementing the feature (with or without agents) could become a time sink. I also had lots of other work planned for the day. I decided to implement it, telling myself I'd stop if it didn't look like I had a clear path to finishing it within one hour.

Because I think it is relevant to the (spoiler) successful implementation, let me share a little about this project. It is a Rust codebase that I use to test development practices, and I think it is structured and implemented fairly well. The CLI, the main part of the application, has over 95% coverage using behaviour-driven, DSL-style acceptance tests. This setup gives the LLM both structure and plenty of examples to follow when adding tests. I will not go into the details here, but I have added a brief example at the end. Also noteworthy: this is a small project, which definitely makes a difference.

For this implementation, I used Zed with its Opencode integration. Lately, I have been on the command line building smaller apps prompt-driven, without worrying much about the fine details. But for this project the actual implementation mattered to me, so I wanted to track changes more closely in an IDE. Opencode taps into my Claude subscription; I can use Opus for planning and Haiku for implementation.

Honestly, I was very pleased with how smoothly this feature was implemented. What contributed to this was the plan-first approach before implementing anything. For anything non-trivial, always plan first!
Here is a high-level overview of my interaction with Opencode in Zed:

Investigate

I started with Claude Opus and asked whether the feature I needed already existed, rather than looking it up myself, because I was under time pressure. It didn't.

Plan

I asked Claude to plan the feature and use a test-driven approach. It broke the work into nine tests, and I asked it to pause after each one for me to review.

Before starting any implementation, I asked it to write the plan to a Markdown file in the backlog.

Then I reviewed the plan. That sounds more superficial than what I did, but I cannot say much more than that it simply looked good.

Implementation

I switched to Claude Haiku for implementation.

It started off well and asked for feedback after each test cycle, and I asked for a refactor to remove duplication.

While it was implementing, I discovered I wanted a different kind of description for the tasks, so I told it to change that in the plan. I did not switch models for this.

The plan was updated in all the correct places.

After this, the workflow changed: it stopped asking me for feedback after each step, and before I realised it, seven of nine tests were running. Not the TDD flow I asked for, but it worked.

Instead of asking it to redo anything, I reviewed the implementation (it was not a lot of code) and continued.

I ran the program against my own data and everything worked as intended. Aside from the one refactoring to remove duplication, I did not change the code.

One loose end I had to remind it of: update the documentation.

I have shared the full session here: https://opncd.ai/share/aBYozahW.

I ran into some of the pitfalls of using LLMs, and admittedly I leaned into them. Speed beating accuracy is a real risk. The feature works and the code looks good, but if I had been coding hands-on I probably would have reviewed more thoroughly. It is hard to tell if the end result would have been drastically better. It takes discipline not to run along with the agent, and not to accept everything just because the outcome is as expected.

The LLM gave me what I needed and any follow-up changes should be small, but I still see little things I would have done differently if I had done it manually (which other developers may disagree with too, to be fair). For example, some of the tests could do with fewer assertions. The current code organisation makes that easy to address later. If any tech debt was added, it is very small and under control, so I stopped, generated my reports and filled in my timesheets.

Overall, working in Zed made it easy to review the code, and combining it with Opencode's plan phase kept things organised. The existing, structured DSL-based test approach with plenty of examples also helped.

Extra: How an LLM can work better with a well-structured DSL

To give some context, the application needed to combine two existing flags: breakdown and details. The time breakdown reports were already implemented but only reported time spent per day, week, month or year. What I needed were details of the projects I had worked on. The application already had a details flag, but it was not implemented for this view.

In the test DSL, the flags are set by calling methods in the given setup phase: breakdown_flag(...) and details_flag(). The breakdown feature did not implement the details flag, so it was not used in the tests for this feature. What is nice (and I credit this way of testing for it) is that the LLM was able to figure out that details_flag was already present and decided to reuse it: Cmd::given().details_flag()....
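Part of why the LLM could find details_flag is that each flag is a single, named, chainable method. The sketch below shows a stripped-down version of that builder shape in Rust; it is not the project's actual Cmd. Only given, details_flag and breakdown_flag mirror names from the post; the CLI flag spellings (--details, --breakdown) and the args accessor are hypothetical, and the when_run/should_succeed phases are left out.

```rust
/// Hypothetical, stripped-down given/when/then test builder.
#[derive(Debug, Default)]
struct Cmd {
    args: Vec<String>,
}

impl Cmd {
    /// Start the "given" phase with no flags set.
    fn given() -> Self {
        Cmd::default()
    }

    /// One chainable method per flag, so existing flags are easy to find and reuse.
    fn details_flag(mut self) -> Self {
        self.args.push("--details".to_string());
        self
    }

    fn breakdown_flag(mut self, period: &str) -> Self {
        self.args.push("--breakdown".to_string());
        self.args.push(period.to_string());
        self
    }

    /// A real suite would run the binary in a when_run() step; this only exposes the arguments.
    fn args(&self) -> &[String] {
        &self.args
    }
}

fn main() {
    let cmd = Cmd::given().details_flag().breakdown_flag("day");
    println!("{:?}", cmd.args());
}
```

Because every flag lives in one place, an agent scanning the builder sees what already exists before adding anything new, which is the reuse behaviour described above.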
Here is an example of such a DSL test:

```rust
#[test]
fn breakdown_day_with_details_should_show_tasks_per_day() {
    let some_content = r"## TT 2020-01-01
- #project-a 1h Task A
- #project-b 2h Task B";

    Cmd::given()
        .details_flag()
        .breakdown_flag("day")
        .tags_filter(&["project-a", "project-b"])
        .at_date("2020-01-01")
        .a_file_with_content(some_content)
        .when_run()
        .should_succeed()
        .expect_task_with_duration("project-a", "1h 00m")
        .expect_task_with_duration("project-b", "2h 00m");
}
```

Thank you for reading,
Hans

My agentic coding stack for October 2025

2025-10-05T00:00:00+00:00

After wrapping my head around the constant changes in LLM subscriptions and performance, here's my new coding stack for October 2025:

Warp Pro: my go-to agentic CLI, speedy and reliable for coding and task automation. Supports most top-end LLMs.

Zed Pro (using the $20 trial, then €10/month)

OpenCode for agentic CLI dev (local + cloud models, easy swapping, no lock-in)

I dropped Claude Code and did not subscribe to Codex. If this turns out to be a bad idea, I can always resubscribe.

My main decision drivers: cost control, flexibility and avoiding vendor lock-in (Claude Code's changes over the past month prompted this). After spending a few months in the CLI, I want to look again at working more in an IDE, hence Zed is on this list.

Note: I do not subscribe to Max plans; I combine multiple lower-cost plans.

Grounded decision records from AI conversations

2025-09-02T00:00:00+00:00

If you've read some of my posts before or worked with me, you know I like using Architectural Decision Records (ADRs) for lots of reasons. To me the most important one is documenting the why of a decision.

If you've worked with AI models before, you've probably asked them for options when brainstorming solutions while unsure about direction. In this situation, I've found it quite easy, maybe even more logical, to document the decision in a decision record with help from the AI model you have been working with. After all, you got feedback from it anyway.

This post briefly explains how I use Claude Code to write decision records (not necessarily architectural ones) when it has helped me make a decision. My goal isn't an elaborate investigation into the pros and cons of this approach, but I will touch upon a few points. The process is not specific to Claude Code and can be adapted to other AI models or tools as well.

Here's the process I've been following:

1. Ask for the decision record during the conversation

After going back and forth on options and reaching a decision, I ask something like:

```
Given your feedback, I think Option 2 is the way I want to implement. Principally I do not want to complicate things with 2 tools at the moment. Before implementing, summarise this into an ADR in the 'adr/' directory as a way of confirming our mutual understanding. So, write an ADR first, then ask me to confirm the ADR. Once I do, continue implementing.
```

The phrasing is a bit sloppy, as this is what I actually wrote during my recent changes, but that's OK; Claude Code can work with this.

2. Review and edit the generated draft

What it comes up with initially is actually pretty good. A lot will depend on the conversation you had with it and the input you gave it, of course. But the decision record mostly captures what was discussed and decided quite well. The text will be structured like an ADR without me having to explain what an ADR is. It even picked up the next decision record number.

Usually content editing is needed, but having a complete draft makes a big difference compared to starting from a blank page. So after the draft is written, I edit it, mostly on content.

3. Use as implementation foundation

When done, the decision is basically documented and it can serve as the basis of our next step. To base the continuation on the decision record only, you could clear or restart the Claude Code session at this point to start fresh.

This last step is meant to ground the decision record in reality. When you include the decision record as the basis for implementation, it becomes living documentation that creates a feedback loop. During implementation you may encounter issues not anticipated during the decision-making process. These can then be documented in the decision record, creating a continuous improvement cycle.

I often use my home projects as playgrounds to experiment with new ideas, technologies, and methodologies I want to try out, because they're new to me or because I want to confirm they're still useful. This is why most of my home projects serve two purposes: build the thing and apply what I think are best practices.

In this particular project, I felt the need to create decision records. They aren't necessarily architecture-related decisions, but rather decisions I'd like to document to remember why I went this way. I mainly use the Markdown format of ADRs to document my decisions.

Generally speaking, this is mostly an experiment at the moment. I'm trying this out in the setting of personal projects, but I think it may lower the bar for writing decision records in general. Not everyone wants to go to great lengths to record decisions, even if they see the value. Personally, I think documenting a decision doesn't need to take a lot of time, yet I often end up spending much more time on it than I anticipated or hoped for when I started writing one.

Here is a decision record example for those interested. It is mostly written by an LLM. I know this may be controversial, but it is also useful. I do not think this is 'AI slop': if you review it and it all makes sense, I don't see much need for a complete rewrite. A decision documented with AI help is worth more than a decision left undocumented.

I haven't tried this in a team or organisational setting, but I'm curious to see how it works out and how people would feel about it. In any case, whoever created the decision record with the help of an AI is always responsible for the decision and its recording.

Also, "working with an LLM replaces thinking" is an often-heard argument. I agree, and its impact on my own thinking concerns me too. But an LLM also comes up with good ideas, and it is easy and tempting to simply accept what's there. Maybe not all decisions require deep thinking, though, and we just want to note down why we're doing the thing we're doing this way.

I am also thinking it may be useful to record in the decision record to what extent an LLM was used to generate the content. Firstly, it settles the question up front for people who are suspicious about whether, and to what extent, AI was used. I think this is comparable to time or other constraints one has when writing ADRs or decision records in general: these may be worthwhile to document too. Sometimes there is not enough time to think deeply about the decision or consider more options, and you have to make a decision with quite a few unknowns. In these cases I recommend documenting this in the decision record too.

Finally, there are quite a few interesting things that can be done next. When working with Claude Code, if you create decision records often, it makes sense to create a command to use as a 'saved prompt'. I actually created an agent for this as well (to try out agents mostly, I admit).
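If you turn this into a saved command or agent, the mechanical part (picking the next number and file name) doesn't need an LLM at all. A minimal Rust sketch, assuming decision records live in adr/ and are named like 0007-some-title.md; the post doesn't specify the naming scheme, and next_adr_number and adr_file_name are hypothetical helpers.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: return the number after the highest `NNNN-...` prefix
/// found in `adr_dir` (1 for an empty directory). Files without a numeric
/// prefix, such as README.md, are ignored.
fn next_adr_number(adr_dir: &Path) -> io::Result<u32> {
    let mut highest: u32 = 0;
    for entry in fs::read_dir(adr_dir)? {
        let file_name = entry?.file_name();
        let name = file_name.to_string_lossy();
        // Assumed naming scheme: digits, a dash, then the title.
        if let Some((prefix, _)) = name.split_once('-') {
            if let Ok(n) = prefix.parse::<u32>() {
                highest = highest.max(n);
            }
        }
    }
    Ok(highest + 1)
}

/// Build a file name such as `0008-record-decisions-with-llms.md` from a title.
fn adr_file_name(number: u32, title: &str) -> String {
    let words: Vec<String> = title
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    format!("{:04}-{}.md", number, words.join("-"))
}

fn main() -> io::Result<()> {
    let dir = Path::new("adr");
    let number = if dir.is_dir() { next_adr_number(dir)? } else { 1 };
    println!("{}", adr_file_name(number, "Record decisions with LLMs"));
    Ok(())
}
```

The LLM still writes the content of the record; this only removes one detail it could get wrong.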
I've written about ADRs before, if you want to read more about them:

Less Mentioned Benefits of Architecture Decision Records

Ground Your ADRs with a Verification Section

Thank you for reading,
Hans

Grounding AI Instructions in Living Documentation

2025-07-28T00:00:00+00:00

Context engineering shows interesting potential to ground documentation in actual code, or, as I sometimes refer to it, reality. Linking AI instruction files (CLAUDE.md, .rules, .cursorrules, etc.) to development documentation may turn static docs into living resources. Each code generation cycle tests the documentation's accuracy and real-world applicability. This creates a direct feedback loop that keeps documentation aligned with actual development workflows.

Also, this coupling of documentation and implementation may create friction, but I expect this to be a good thing long-term. It signals opportunities for documentation improvement, encouraging streamlined, practical documentation that genuinely serves developers, while also identifying code that diverges from documented standards.

Some quick thoughts:

The feedback loop is fuzzy given how LLMs work, but an LLM can likely explain why it implemented something based on the documentation.

Documentation will likely become more actionable and directive.

The 'why' behind guidelines typically belongs elsewhere, but could become an evaluation against actual code.

Developer documentation may work better in the code repository (and in Markdown).

LLMs currently work best with concise instruction files; this constraint likely benefits developer documentation too.
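One small, deterministic piece of that feedback loop is checking that the links from an instruction file to the documentation still resolve. A minimal Rust sketch, assuming the instruction file (here CLAUDE.md in the current directory) references docs with relative Markdown links such as [Testing](docs/testing.md); the function names are hypothetical, the link parsing is deliberately naive, and it says nothing about whether the code actually follows the docs.

```rust
use std::path::Path;

/// Hypothetical parser: collect local Markdown link targets such as
/// `docs/testing.md` from `[Testing](docs/testing.md)`. URLs are skipped
/// and `#fragment` suffixes are dropped.
fn markdown_link_targets(text: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        let end = match after.find(')') {
            Some(end) => end,
            None => break,
        };
        let target = &after[..end];
        let path = target.split('#').next().unwrap_or(target);
        if !path.is_empty() && !target.contains("://") {
            targets.push(path);
        }
        rest = &after[end + 1..];
    }
    targets
}

/// Return the link targets that do not exist relative to `root`.
fn missing_targets<'a>(text: &'a str, root: &Path) -> Vec<&'a str> {
    markdown_link_targets(text)
        .into_iter()
        .filter(|t| !root.join(t).exists())
        .collect()
}

fn main() {
    let root = Path::new(".");
    match std::fs::read_to_string(root.join("CLAUDE.md")) {
        Ok(text) => {
            for t in missing_targets(&text, root) {
                println!("broken doc link: {}", t);
            }
        }
        Err(e) => eprintln!("could not read CLAUDE.md: {}", e),
    }
}
```

Run in CI, a check like this turns a silently outdated instruction file into a visible failure, which is the kind of friction described above.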

Addendum: Next experiments

Some follow-ups I might try if I run this workflow again.

Publishing and representation

Export to Mermaid (or PlantUML) for embedding in the agent's instructions, but keep the Structurizr DSL as the source of truth. Split the DSL so documentation for each container or component lives closer to the code.
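To illustrate the source-of-truth direction (generate the Mermaid text from the model, never edit it by hand), here is a minimal Rust sketch. It does not use Structurizr: the Element and Relationship types are hypothetical stand-ins for a model parsed from the DSL, and in practice Structurizr's own export tooling would presumably do this step. It also rejects relationships that point at unknown elements, a cheap validation check before rendering.

```rust
/// Hypothetical, heavily simplified stand-ins for a parsed C4 model.
struct Element {
    id: &'static str,
    name: &'static str,
}

struct Relationship {
    from: &'static str,
    to: &'static str,
    description: &'static str,
}

/// Render the model as a Mermaid flowchart. Relationships that reference an
/// unknown element id are rejected instead of silently drawn.
fn to_mermaid(elements: &[Element], relationships: &[Relationship]) -> Result<String, String> {
    for r in relationships {
        for id in [r.from, r.to] {
            if !elements.iter().any(|e| e.id == id) {
                return Err(format!("relationship references unknown element `{}`", id));
            }
        }
    }
    let mut out = String::from("flowchart TD\n");
    for e in elements {
        out.push_str(&format!("    {}[\"{}\"]\n", e.id, e.name));
    }
    for r in relationships {
        out.push_str(&format!("    {} -->|{}| {}\n", r.from, r.description, r.to));
    }
    Ok(out)
}

fn main() {
    // Hypothetical example model.
    let elements = [
        Element { id: "user", name: "User" },
        Element { id: "cli", name: "CLI" },
        Element { id: "store", name: "Data files" },
    ];
    let relationships = [
        Relationship { from: "user", to: "cli", description: "Runs reports" },
        Relationship { from: "cli", to: "store", description: "Reads" },
    ];
    match to_mermaid(&elements, &relationships) {
        Ok(diagram) => print!("{}", diagram),
        Err(e) => eprintln!("invalid model: {}", e),
    }
}
```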

A better LLM interface in tooling

This requires changes to Structurizr. It could provide build/run instructions for LLMs via an extensive --help output, or ship a dedicated subcommand that prints LLM instructions (similar to bd prime).
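A rough sketch of the second option, written for a generic Rust CLI rather than Structurizr itself: the subcommand name llm-instructions and the instruction text are placeholders. Shipping the text inside the binary means the instructions an agent reads always match the installed version of the tool.

```rust
/// Placeholder instructions; a real tool would maintain this text alongside
/// its code (for example via include_str! on a Markdown file).
const LLM_INSTRUCTIONS: &str = "\
Edit the DSL model; never edit exported diagrams by hand.
Validate the model after every change.
Re-export the diagrams before committing.";

/// Dispatch on the first argument. Only the hypothetical `llm-instructions`
/// subcommand is implemented here.
fn run(args: &[&str]) -> Result<String, String> {
    match args.first().copied() {
        Some("llm-instructions") => Ok(LLM_INSTRUCTIONS.to_string()),
        Some(other) => Err(format!("unknown command: {}", other)),
        None => Err("usage: tool <command>".to_string()),
    }
}

fn main() {
    let owned: Vec<String> = std::env::args().skip(1).collect();
    let args: Vec<&str> = owned.iter().map(String::as_str).collect();
    match run(&args) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("{}", e),
    }
}
```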

Operationalising the workflow

I used Codex CLI with Codex 5.3; any other recent coding agent and model will probably work as well.

Going forward, here is how I will instruct LLMs to work with C4 and keep the architecture diagrams up to date.
