HanLHo. - Fractional Architect & Software Product Engineer - agent-skills

Creating architecture diagrams with C4 and coding agents

2026-03-11T00:00:00+00:00

LLMs can draw diagrams, but you get better results with a conceptual model, a validation loop, and a lightweight verification pass against the codebase than with free-form diagramming.

I have used the C4 model extensively to map architecture landscapes. Last week I saw an opportunity to catch up with it and try it out with coding agents, on a small Rust project I know well. I found that modelling architecture in a text-based format with guardrails (a DSL with rules) is easier for a coding agent and gives more consistent results. This post is a field note of my findings.

C4 is a zoom-in model for software architecture.

This post discusses only the levels we actually need:

- System context: people and software systems.
- Container: deployable/runnable things inside a software system.
- Component: the main building blocks inside a container.

A key element for coding agents: C4 can be expressed as a text model (a DSL), so the architecture model can be edited like code and validated/exported via a CLI.

C4 is model-as-code: one model, many views/diagrams.

The test project

To try this out, I used one of my personal projects: a text-based time-tracking application with two runtime modes (a CLI and a web dashboard). Both operate on the same domain and the same Markdown time-entry files.

The functionality does not matter much for this post, except for two things. First, the codebase is relatively small and easy to analyse. Second, it is well-structured: ports and adapters, plus behaviour-driven, DSL-based acceptance tests.

I've used C4 on larger landscapes too. I expect the workflow to translate, but the experience will differ on larger (or less structured) codebases.

Building the model

I started the coding agent session with a direct request to build C4 diagrams for the project at system and container level, with the DSL written first.

```text
build me a c4 model at system level and container level (as defined by the C4 model). Please create the DSL first
reference: c4model.com
```

Below, I'll go through the process using the diagrams, but keep in mind these are all generated from a text-based DSL. From the start, the agent produced a working model in the Structurizr DSL. I then gave it a command to run Structurizr CLI as a check at each step.

To start with, the agent inspected the Rust codebase to work out the system boundary. It established one `Time Tracker` software system with two runtime modes: a CLI and a web dashboard. Both use the same Markdown time-entry files.
(Apologies for the dark diagrams; dark mode was enabled when I took these screenshots. To enlarge them, open the images in a new tab.)

Here is a summary of the session:
- One of the first decisions was scope: whether to model only the web path or both runtime paths. The choice was to represent them as separate containers.
- We modelled a software system, two application containers, an internal datastore for run statistics, and the Markdown time-entry files as an external dependency. That first version was structurally correct.

(I had completely forgotten about the runtime statistics feature ...)

After the first version, something felt missing between the Markdown files and the CLI/web containers: the shared core logic. In C4 terms, that isn't another container; it belongs at component level. So I kept the container model strict and added the component level to make the shared logic explicit.

- I initially asked it to model the shared core logic as a container, but the agent pushed back, and the model improved because of it. I asked it to add component views for both runtime containers instead of inventing a fake `core` container. That preserved a strict container model while making the architecture more insightful.
- Naming discussions helped sharpen the model. The agent came up with names I was not sure about, but on a first pass it probably did a better job than I would have. One direction I set explicitly was to name things as close as possible to the codebase. The names were not bad, but this is not where I want to leave room for interpretation.
- To support those component views, we introduced a shared component fragment that both CLI and web could include. That shared layer covered parsing, domain types, reporting, and execution statistics. The result was a more accurate picture of how the code is actually organised.

Once the model structure felt right, I shifted to presentation.

- I asked the agent to style it so different roles were easier to distinguish: CLI and web containers, shared components, adapters, renderers, and datastores.
- Then I asked for rounded boxes and a more explicit person-style user element.

The final result:

I have also made the generated static site with the diagrams available, as it was straightforward to do with help from the agent. You can click the small magnifying glass icons to zoom into the next level.

In summary, this result took several passes: boundaries first; then the component layer; then names aligned with the code; and finally presentation.

The DSL in practice

One important artefact we have not discussed yet: the DSL itself. Here is the full model with the diagrams defined in the Structurizr DSL.

All the edits were done by the agent, including the initial creation from scratch. I reviewed, asked questions, and iterated. Before this, I typed every box and relationship by hand (scrolling up and down the file, or keeping two windows open), added tech stacks (taking care not to confuse the order of strings), and so on. Using the agent was a major documentation speed boost, and the DSL came out clean and organised the way I prefer: relationships after the element definitions, not inside them.

I do see the risk of not thinking things through once the painstaking manual editing of elements and relationships is taken away. Even so, working with the agent also gave me:

- Iteration close to the code
- Meaningful discussions on abstraction levels and naming
- A knowledgeable architecture assistant at hand

Why I think it worked: the model is defined in text, so the agent can edit it like code. C4 provides guardrails through a small number of nested abstraction levels, and the DSL keeps names, descriptions, and styles consistent across views. A CLI tool to validate the model closes the loop, so the agent can check its work as it goes. In addition, you can ask the LLM to review the model, with or without the actual codebase as context.
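To make the "CLI closes the loop" point concrete, here is a minimal Python sketch of a validation step an agent (or a pre-commit hook) could call after each DSL edit. It wraps the `structurizr-cli validate -workspace` command used in this project. The function names, the injectable `runner`, and the assumption that the CLI signals an invalid workspace with a non-zero exit code are mine, not something taken from the session.

```python
import subprocess
from typing import Callable, Sequence

# Workspace path as used in this project's AGENTS.md entry.
WORKSPACE = "docs/c4/time-tracker.dsl"


def validate_command(workspace: str = WORKSPACE) -> list[str]:
    """Build the Structurizr CLI validation command."""
    return ["structurizr-cli", "validate", "-workspace", workspace]


def run_cli(command: Sequence[str]) -> tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def validation_feedback(
    workspace: str = WORKSPACE,
    runner: Callable[[Sequence[str]], tuple[int, str]] = run_cli,
) -> str:
    """Return a short message the agent can act on after editing the DSL.

    Assumption: a non-zero exit code means the workspace is invalid.
    """
    code, output = runner(validate_command(workspace))
    if code == 0:
        return "valid"
    return f"invalid (exit code {code}):\n{output.strip()}"
```

The `runner` parameter is only there so the loop can be exercised without the CLI installed; in normal use, `validation_feedback()` with its defaults is enough.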

Operationalising the workflow

I used Codex CLI with Codex 5.3; any other recent coding agent and model will probably work as well.

Going forward, here is how I will instruct LLMs to work with C4 and keep the architecture diagrams up to date.

Agent Skill

First, after completing this experiment, I turned my learning into a reusable agent skill called `modelling-c4-diagrams`, which I can now use from any project.

AGENTS.md instructions

In the project's `AGENTS.md` I added a short reference so future agents can discover the DSL files and know how to validate/export. This avoids repeating the discovery work in each new session.

```markdown
- **Architecture docs (C4)**: source DSL at `docs/c4/time-tracker.dsl` (shared components in `docs/c4/shared-tracking-core.dsl`); validate with `just architecture-docs-validate`; export static site with `just architecture-docs-export`
```

Verification

In this project, the LLM and I used the following commands to verify the output:

- Validate C4 Structurizr DSL:
  - `structurizr-cli validate -workspace docs/c4/time-tracker.dsl`
- Export C4 diagrams to docs/site for inspection (and GitHub Pages publishing):
  - `structurizr-cli export -workspace docs/c4/time-tracker.dsl -format static -output docs/site`
- View the architecture documentation:
  - `open docs/site/index.html`

Structurizr interface to LLM

I found the validation loop with the CLI to work well. If the export succeeds, the DSL is valid and the views conform to the tool's rules. That still does not tell you whether the model is accurate, or whether the diagrams communicate well. The C4 diagram review checklist is a good yardstick.

The LLM did not seem to require much extra instruction to create a proper model and views. I pointed it to c4model.com at the beginning of the session, and that may have been enough context. (Hard to tell what it knows or does under the hood.)
The skill I created and referenced above now serves as a main interface.

Conclusion

The architecture model and diagrams are insightful artefacts ("pictures can say more than words"). But most of the thinking and modelling usually happens visually, while recording it often becomes a chore.

This experiment showed me that LLMs can help keep a model up to date without turning it into a separate manual process. When the model is constrained (C4) and expressed as text (a DSL), you can version it like code, review it like code, and validate/export it through a CLI.

Constrained text models plus validation give coding agents a better architecture-diagram workflow than free-form diagramming.

Addendum: Next experiments

Some follow-ups I might try if I run this workflow again.

Keeping the model in sync

Work with the LLM to design how to encode parts of the architecture model directly in the codebase. Use the C4 views as shared context, then define a way to keep the model in sync with the implementation. Unless you are using a very principled framework (maybe Spring in Java?), I expect this to be quite custom per project anyway. Coding agents may lower the barrier to getting started with this kind of non-obvious quality-improvement work.

Verification beyond the CLI

- Use an MCP like Chrome DevTools to inspect exported diagrams as a second verification step.
- One concrete use case: manual editing is often required to position boxes and, especially, dependencies. A visual inspection could double-check that no text boxes overlap and that lines do not cross boxes.
- Coding agents could be used to evaluate the shape of the architecture outside of the code.

Publishing and representation

Export to Mermaid (or PlantUML) for embedding in the agent's instructions, but keep the Structurizr DSL as the source of truth. Split the DSL so documentation for each container or component lives closer to the code.

A better LLM interface in tooling

This requires changes to Structurizr. It could provide build/run instructions for LLMs via an extensive `--help` output, or ship a dedicated subcommand that prints LLM instructions (similar to `bd prime`).

A skill to support TIL creation

2026-03-02T00:00:00+00:00

Skills are a great way to introduce capabilities in your agent flows. To support my "The Day I Learned" repository and website, I created a skill to extract learnings from coding and LLM sessions.

The skill below is installed in my global agent settings (`AGENTS.md`), and from each project I can call it to create a TIL in a dedicated project location. In my til project, I also have a script that collects these across all projects.

This is usually where I do some reordering and rewriting, but this skill works well and removes friction when sharing short learning snippets. Instead of creating a markdown file manually in the correct format, this gives me a structured draft that I only need to rework. It is much easier to improve something than to start from a blank canvas.

Full skill definition:

````markdown
---
name: til-learning-partner
description: Use when successful tool usage should be captured as a concise, utilitarian TIL note or Tilly is mentioned.
---

# TIL Learning Partner

## Overview

Capture short, practical "Today I Learned" notes from successful outcomes.
Each note is a standalone markdown file in `docs/tils/` with a slug-only filename.

## Trigger Conditions

- User asks to capture or write a TIL.
- A command/tool outcome usage is successful and worth preserving.
- User asks to summarize what worked into a reusable note.

## Workflow

1. Confirm the outcome is successful and concrete.
2. Ask one lightweight prompt: `Capture this as a TIL?`
3. If user declines, do not write a file.
4. If user confirms, extract one focused learning:
   - Issue solved
   - How it was solved
   - Key command(s) learned
   - Practical takeaway
5. Generate a slug from the learning title and save to `docs/tils/<slug>.md`.
6. If `docs/tils/<slug>.md` exists, use deterministic suffixes:
   - `<slug>-2.md`, `<slug>-3.md`, and so on.

## Output Contract

- Path: `docs/tils/<slug>.md`
- Slug rules:
  - Lowercase
  - ASCII-safe
  - Hyphen-separated words
  - No date in filename or slug

## TIL Template

~~~~markdown
# TIL: <specific issue solved>

## Issue
<one concise problem statement>

## Solution
<what worked and why>

## Key commands
\```bash
<command 1>
<command 2>
\```

## Takeaway
<how to apply this next time>

_Created: YYYY-MM-DD_
~~~~

## Guardrails

- Keep notes utilitarian, direct, and brief.
- Cover one specific learning per file.
- Avoid exploratory or uncertain phrasing.
- Avoid blog-style storytelling and long introductions.
- Do not invent commands or outcomes that were not observed.
````

Combining agent skills

2026-02-26T00:00:00+00:00

A short post on combining skills. It's not always clear when a skill will be called or triggered. The easiest way is to write explicit instructions and invoke the skill you want directly by name.

For example, for my Today I Learned mini blog site, I have two skills to keep my TILs focussed and concise. One skill checks that my text is concise and utilitarian, which is what I aim for in a TIL. I also have a more general skill, which I apply to this blog too, that assesses the text against UK English 'as done by an English teacher'.

To combine both, I have the `review-til` skill that invokes both and specifies the output:

```markdown
Review the provided TIL file or text by running both checks in parallel:

1. **TIL Rules Check**: Invoke the `til-rules-checker` skill to verify content guidelines (word count, focus, clarity, format)
2. **Text Grading**: Invoke the `text-grader` skill to assess spelling, grammar, and clarity using UK English standards

After both checks complete, provide a unified summary with:

- TIL rules verdict (PASS/NEEDS REVISION)
- Text grade (A/B/C/D/F)
- Combined list of actionable improvements (if any)

If a file path is provided as an argument, read that file first. If text is provided directly, use that text.
```

Note both `Invoke the '...'` instructions. This works in Codex and Opencode.

When I asked to review two TILs at once (`Review both non-committed TILs`), the result looked like this:

```text
Review summary:

- beads-query-default-limit.md
  - Word count: 32
  - TIL rules: PASS (focused, clear, non-blog style, utilitarian)
  - Text grade: A (96%)
- beads-query-subcommand.md
  - Word count: 55
  - TIL rules: PASS (focused, clear intent, non-blog style, utilitarian)
  - Text grade: B (84%) due to grammar issues (hard cap applied)
```

(The reason for the `B` grade is also detailed in the full output.)

In case you are interested, these are the skills I have currently published online:

- check-til-rules: This is the skill that has my TIL rules.
- review-til: The skill I usually invoke that combines the two skills. The content at the time of writing, aside from the frontmatter, is above.

Experience Report: Building a time-tracking AI assistant

2026-02-04T00:00:00+00:00

This is a short experience report about using skills (with Codex and its models) to build a personal AI assistant that helps me maintain my time-tracking log.

To set expectations: the assistant does not manage my calendar or tasks. It helps me keep a time-tracking log that lives in a Markdown file by interpreting logging requests and editing the file for me (while categorising entries correctly).

I start most days with a bit of planning, which means adding entries to that log. The format is completely custom and tailored to my needs, and I wrote a small companion CLI tool, `tt`, to generate reports from it. (The project is open source on GitHub, but honestly I don't think it is useful to anyone other than me.)

To give an idea, this is what a day entry looks like:

```text
## TT 2026-02-04
- #admin ##work 30m inbox and daily planning
- #prj-content ##work 2h article outline and research notes
- #prj-content ##work 1h 30m first draft writing
- prj-personal-assistant #llm ##work 1h walking skeleton
- prj-personal-assistant #llm ##work 1h create skills
- #break ##energy 20m outdoor walk
- #learning ##work 1h documentation reading and summary
```

And using `tt`, I can generate reports like:

```text
Overview 2026-02-04 -> 2026-02-04:
- prj-content: 3h 30m
- prj-personal-assistant: 2h 00m
- learning: 1h 00m
- admin: 30m
- break: 20m
Total: 7h 20m

Breakdown:
- ##work: 7h 00m
- ##energy: 20m
```

Editing the file is not hard, but it is tedious. The goal of this project was not to replace my log format, but to make it easier to operate.

Today I ran a small LLM experiment to make logging less cumbersome. Instead of writing an entry like `#prj-personal-assistant #llm #codex ##work 2h Setup walking skeleton`, I want to be able to say: "Create a new task to set up a walking skeleton, add tags codex and llm, and attribute the time to the personal assistant project." And by "say" I mean it literally: I dictate it in normal language, it gets transcribed and sent to the LLM.

This turned out to be a surprisingly fast (and fun) experiment with promising first results.

Technical details: I used the Codex agent and its models, mostly Codex 5.2. Working with Codex was smooth, but this post is not about comparing coding agents; I suspect it would work with any capable agent that supports skills.

I started with a log file containing over a year of time entries. That history was a good dataset to prime the LLM on the format: what a day looks like, what an entry line looks like, and how entries should be categorised with tags. From there I moved into implementation, with a small set of local files and skills.
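To show how the day entry above maps onto the report, here is a small Python sketch that groups the sample lines by project and by `##` category. This is not how `tt` is implemented (that is a Rust tool I have not shown here). The parsing rules are assumptions inferred from the example: the first token is the report key with any leading `#` removed, the `##` token is the category, and the duration is the run of `Nh`/`Nm` tokens before the free-text description.

```python
import re
from collections import defaultdict

DURATION = re.compile(r"^(\d+)([hm])$")

# The sample day from this post.
DAY = """\
## TT 2026-02-04
- #admin ##work 30m inbox and daily planning
- #prj-content ##work 2h article outline and research notes
- #prj-content ##work 1h 30m first draft writing
- prj-personal-assistant #llm ##work 1h walking skeleton
- prj-personal-assistant #llm ##work 1h create skills
- #break ##energy 20m outdoor walk
- #learning ##work 1h documentation reading and summary
""".splitlines()


def parse_entry(line: str) -> tuple[str, str, int]:
    """Return (report key, ## category, minutes) for one entry line."""
    tokens = line.lstrip("- ").split()
    project = tokens[0].lstrip("#")  # assumption: first tag is the report key
    category = next((t for t in tokens if t.startswith("##")), "")
    minutes, started = 0, False
    for token in tokens:
        match = DURATION.match(token)
        if match:
            value, unit = int(match.group(1)), match.group(2)
            minutes += value * 60 if unit == "h" else value
            started = True
        elif started:
            break  # the free-text description starts here
    return project, category, minutes


def summarise(lines: list[str]) -> tuple[dict[str, int], dict[str, int]]:
    """Sum minutes per project and per category, skipping non-entry lines."""
    by_project: dict[str, int] = defaultdict(int)
    by_category: dict[str, int] = defaultdict(int)
    for line in lines:
        if not line.startswith("- "):
            continue  # e.g. the "## TT <date>" heading
        project, category, minutes = parse_entry(line)
        by_project[project] += minutes
        by_category[category] += minutes
    return dict(by_project), dict(by_category)


def fmt(minutes: int) -> str:
    """Format minutes in the report's style, e.g. 3h 30m or 20m."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d}m" if hours else f"{rest}m"
```

Calling `summarise(DAY)` and formatting the values with `fmt` yields the same per-project and per-category figures as the report above, which is a useful sanity check on the inferred rules.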

This is the file tree I ended up with (not ready to call it "architecture" yet):

```text
AGENTS.md
skills
├── tt-cli
│   ├── references
│   │   └── command-cheatsheet.md
│   └── SKILL.md
└── tt-log
    ├── references
    │   ├── log-structure.md
    │   ├── tag-inference.md
    │   └── validation.md
    ├── scripts
    │   └── validate_tt_update.py
    └── SKILL.md
time-tracking-log.md
```

`tt` is an abbreviation for "time tracking".

In practice, `AGENTS.md` tells the agent which skill to use for which capability:

```markdown
### Time tracking

- Use `tt`, the custom time-tracking CLI, for time-tracking operations.
- Use `$tt-log` for `time-tracking-log.md` edits, tag inference, and 7h to 8h daily policy checks.
- Use `$tt-cli` for `tt` command discovery, report commands, and CLI troubleshooting.
- Rule of thumb: log edits/validation => `$tt-log`; reporting/CLI usage questions => `$tt-cli`.
```

The two skills are the heart of the implementation:

- `tt-cli` handles the `tt` CLI tool: command discovery, reporting, filters, and general troubleshooting.
- `tt-log` handles log editing, task insertion, tag inference, section ordering, and policy checks.

From the start I wanted to use skills because my custom format and tooling are a specialised capability. Initially Codex suggested a single skill, but it was clear to me that reading/querying and writing/editing were different responsibilities, so I pushed it in that direction (it agreed, ha!).

That split improved the quality of outcomes. Beyond maintainability, making responsibilities explicit made behaviour more predictable, because the CLI skill gives the LLM a way to validate its work. The `tt-log` skill can focus on reliable edits and validation, while `tt-cli` handles queries like "how much do I still need to log today?" and validates the log.

The `references` locations for both skills were set up by the LLM while we created the skills.
They are pretty clean in terms of responsibility, and reviewing and refining the split proved useful.

During implementation I also wanted basic checks for "did I log enough today?", so we added a validation workflow that checks a daily target range (7h to 8h). The logic is always the same, so I had it write a script: `skills/tt-log/scripts/validate_tt_update.py`.

I iteratively refined the default logging rules (which tags to use for which kinds of tasks, the fact that not all my days look the same, and so on). I don't expect it to be perfect, and I will probably tweak it over the next couple of weeks as exceptions pop up.

As an aside, when finishing up, Codex offered to create a 'one-page PDF summary' of this project. I think it did a pretty good job.

So in short:

1. Created initial time-tracking skill behaviour for structured log edits based on existing time-tracking data.
2. Split responsibilities into two dedicated skills (`tt-log` and `tt-cli`).
3. Added automated validation for parse integrity, per-day totals, and a daily policy range.
4. Iteratively refined defaults and behaviour based on real usage (for example, a longer workout-at-noon baseline on Tuesdays).

Things I can now ask: "I want to fill the rest of the day with work on a project I forgot the tag of. Give me the last 5 projects I recorded time on so I can tell you what to log to."

Before, this was not hard, but it involved a bunch of small chores: checking previous days, finding the right project tag, copying it into a new line, and calculating the time left for the day.

Besides the usefulness (and fun), there was an unexpectedly valuable lesson: AI assistance works best in the same way good code does. Define clear boundaries and add executable checks so changes are easier to make and the system can validate its own work.
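The daily policy range mentioned above is a good example of such an executable check. The sketch below is not the contents of `validate_tt_update.py`; it is a minimal Python illustration of a 7h to 8h check over per-day totals in minutes. The names `PolicyResult`, `check_day` and `check_log` are hypothetical, and treating both bounds as inclusive is my assumption.

```python
from dataclasses import dataclass

TARGET_MIN = 7 * 60  # lower bound of the daily range, in minutes
TARGET_MAX = 8 * 60  # upper bound of the daily range, in minutes


@dataclass(frozen=True)
class PolicyResult:
    status: str     # "under", "ok" or "over"
    remaining: int  # minutes still needed to reach the lower bound


def check_day(total_minutes: int) -> PolicyResult:
    """Classify one day's total against the target range (bounds inclusive)."""
    if total_minutes < TARGET_MIN:
        return PolicyResult("under", TARGET_MIN - total_minutes)
    if total_minutes > TARGET_MAX:
        return PolicyResult("over", 0)
    return PolicyResult("ok", 0)


def check_log(totals: dict[str, int]) -> dict[str, PolicyResult]:
    """Apply the daily policy to a mapping of date -> logged minutes."""
    return {day: check_day(minutes) for day, minutes in totals.items()}
```

For the sample day in this post (7h 20m, so 440 minutes), `check_day(440)` reports `ok`; a day with 5h logged comes back as `under` with 120 minutes remaining, which is the number behind a question like "how much do I still need to log today?".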
