<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>HanLHo. - Fractional Architect &amp; Software Product Engineer - tool-calling</title>
    <link rel="self" type="application/atom+xml" href="https://hanlho.com/tags/tool-calling/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://hanlho.com"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-05-12T00:00:00+00:00</updated>
    <id>https://hanlho.com/tags/tool-calling/atom.xml</id>
    <entry xml:lang="en">
        <title>When to Use Expert Tool-Calling LLMs</title>
        <published>2026-05-12T00:00:00+00:00</published>
        <updated>2026-05-12T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://hanlho.com/p/when-to-use-expert-tool-calling-llms/"/>
        <id>https://hanlho.com/p/when-to-use-expert-tool-calling-llms/</id>
        
        <content type="html" xml:base="https://hanlho.com/p/when-to-use-expert-tool-calling-llms/">&lt;p&gt;Often, large language models (LLMs) are evaluated in certain categories, for example, coding, writing, or tool-calling. I wasn&#x27;t sure what tool-calling meant, so I decided to do some AI-assisted research. I used &lt;a href=&quot;https:&#x2F;&#x2F;chat.mistral.ai&#x2F;chat&quot;&gt;LeChat&lt;&#x2F;a&gt; and GPT-5.4, going back and forth between them, checking references, to get a better understanding.&lt;&#x2F;p&gt;
&lt;p&gt;My main desired outcome was an infographic that is usable, and I think it does a good high-level job of explaining what tool-calling is and when you should use it. That is why I decided to share it here as well.&lt;&#x2F;p&gt;
&lt;p&gt;To find LLMs to use for this category you could check &lt;a href=&quot;https:&#x2F;&#x2F;llm-stats.com&quot;&gt;LLM Stats&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;To be clear and explicit: &lt;em&gt;the below is generated by an LLM but driven and reviewed by me&lt;&#x2F;em&gt;. I also added a text version that is an almost exact copy after the graph.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;llm-tool-calling&quot;&gt;LLM Tool-Calling&lt;&#x2F;h1&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;info-when-to-use-expert-tool-calling-llms.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Here is &lt;em&gt;the same content&lt;&#x2F;em&gt; in text:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;h1 id=&quot;when-to-use-expert-tool-calling-llms&quot;&gt;When to Use Expert Tool-Calling LLMs&lt;&#x2F;h1&gt;
&lt;p&gt;&lt;em&gt;A quick guide to deciding where tool-calling models create the most value.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;1-final-rule&quot;&gt;1. Final Rule&lt;&#x2F;h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Use a tool-calling LLM when the task is not just &lt;em&gt;“answer this”&lt;&#x2F;em&gt; but &lt;em&gt;“go get information, act on it, inspect what happened, and decide the next step.”&lt;&#x2F;em&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;2-decision-heuristic&quot;&gt;2. Decision Heuristic&lt;&#x2F;h2&gt;
&lt;p&gt;Use a tool-calling model when the task requires &lt;strong&gt;at least 2&lt;&#x2F;strong&gt; of the following:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
&lt;strong&gt;2+ checks = strong fit&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th style=&quot;text-align: right&quot;&gt;Check&lt;&#x2F;th&gt;&lt;th&gt;Heuristic&lt;&#x2F;th&gt;&lt;th&gt;What it means&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right&quot;&gt;✅&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;External state access&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Needs live or private data from code, databases, logs, tickets, APIs, calendars, or documents.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right&quot;&gt;✅&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;Iterative execution&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Requires action → inspect result → refine → rerun.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right&quot;&gt;✅&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;Structured action&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Needs API calls, SQL queries, JSON arguments, file edits, or workflow steps.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right&quot;&gt;✅&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;Cross-system synthesis&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Combines information or actions across multiple tools or systems.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right&quot;&gt;✅&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;Verifiable output&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Produces something checkable: tests, tickets, records, charts, or citations.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;3-top-3-high-differentiation-use-cases&quot;&gt;3. Top 3 High-Differentiation Use Cases&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;1-multi-system-workflow-automation&quot;&gt;1. Multi-System Workflow Automation&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Best when&lt;&#x2F;strong&gt; work spans multiple tools and requires conditional logic, structured API calls, and stateful execution.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub issue → Jira ticket → Slack notification&lt;&#x2F;li&gt;
&lt;li&gt;Support ticket → order lookup → refund check → escalation&lt;&#x2F;li&gt;
&lt;li&gt;CRM update → email draft → calendar follow-up&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h3 id=&quot;2-codebase-exploration-debugging-maintenance&quot;&gt;2. Codebase Exploration, Debugging &amp;amp; Maintenance&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Best when&lt;&#x2F;strong&gt; the model must inspect code, follow references, run tests, read logs, and compare implementation to specs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Trace authentication flow&lt;&#x2F;li&gt;
&lt;li&gt;Investigate failing tests&lt;&#x2F;li&gt;
&lt;li&gt;Find affected files for a feature change&lt;&#x2F;li&gt;
&lt;li&gt;Draft a patch with file references&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h3 id=&quot;3-structured-data-analysis-operational-analytics&quot;&gt;3. Structured Data Analysis &amp;amp; Operational Analytics&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Best when&lt;&#x2F;strong&gt; the model needs to query, transform, analyze, or visualize live data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Generate and refine SQL&lt;&#x2F;li&gt;
&lt;li&gt;Analyze churn or revenue by segment&lt;&#x2F;li&gt;
&lt;li&gt;Run Python anomaly detection&lt;&#x2F;li&gt;
&lt;li&gt;Create a chart and explain outliers&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;4-do-not-use-a-tool-calling-llm-primarily-for&quot;&gt;4. Do NOT Use a Tool-Calling LLM Primarily For&lt;&#x2F;h2&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Avoid&lt;&#x2F;th&gt;&lt;th&gt;Reason&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Pure writing or ideation&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Better handled by a writing-optimized or general-purpose model.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Summarizing text already provided&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;No external data access or tool use is required.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Static Q&amp;amp;A without external data&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;A general reasoning model is usually sufficient.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;High-risk production changes without approval&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Requires human oversight, review, and explicit approval.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Legal, medical, or financial decisions without expert review&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Tool use can support retrieval and analysis, but expert validation is required.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;hr &#x2F;&gt;
&lt;h2 id=&quot;compact-decision-rule&quot;&gt;Compact Decision Rule&lt;&#x2F;h2&gt;
&lt;p&gt;Use an expert tool-calling LLM when the task involves:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Getting information&lt;&#x2F;strong&gt; from live or private systems.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Taking structured action&lt;&#x2F;strong&gt; through tools, APIs, queries, or file operations.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Inspecting results&lt;&#x2F;strong&gt; and adapting the next step.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Producing verifiable output&lt;&#x2F;strong&gt; such as tests, tickets, records, charts, or citations.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;If the task is only to write, summarize provided text, or answer from static context, a tool-calling LLM is usually unnecessary.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
