Large Language Models vs their Harness
"Gemini deleted all the files on my machine." "Claude deleted my entire project, including my git history." "ChatGPT did..." You have probably seen statements like these thrown around online. They reflect a common confusion about the distinction between what large language models are capable of by themselves and the agentic harnesses that are built around them.
At its core, a large language model is simply a system that takes input in one or more modalities and produces output in another (or the same); in most cases, that means taking in text and outputting text. It does not inherently know how to interact with the outside world, browse the web, or execute code. It is just a sophisticated pattern-matching engine that predicts the next sequence of tokens based on the context it is provided.
What is a Harness?
At its simplest, a harness is just the software that orchestrates when to call the large language model. It acts as the control layer around the model. When you want the model to interact with external systems, the harness adds a list of tools, along with instructions on how to call them, to the system prompt. This gives the model the context it needs to understand what capabilities it has access to.
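As a concrete sketch, here is roughly what injecting a tool list into the system prompt might look like. The tool names, the JSON schema, and the `<tool_call>` syntax are all illustrative assumptions for this post, not any particular vendor's API.

```python
import json

# Hypothetical tool definitions the harness makes available to the model.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file and return its contents.",
        "parameters": {"path": "string"},
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return its output.",
        "parameters": {"command": "string"},
    },
]

def build_system_prompt(tools):
    # The harness injects the tool list plus calling instructions, so the
    # model knows what capabilities it can request.
    return (
        "You can call a tool by emitting a line of the form\n"
        '<tool_call>{"name": ..., "arguments": {...}}</tool_call>\n\n'
        "Available tools:\n" + json.dumps(tools, indent=2)
    )

print(build_system_prompt(TOOLS))
```

The model never executes anything here; it only ever sees this text describing what it may ask for.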
The Core Orchestration Loop
When the large language model generates its response, it outputs text. The harness then steps in and scans this text for special syntax to see whether the model emitted any tool calls. If it did, the harness itself executes the actual tools. It is the harness, the surrounding software layer, that runs the code, queries the database, or fetches the webpage.
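The scanning step can be sketched in a few lines. The `<tool_call>` marker here is the same illustrative syntax assumed above; real harnesses use the provider's structured tool-call format or their own markers.

```python
import json
import re

# Assumed marker syntax: <tool_call>{...json...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(\{.*?\})</tool_call>", re.DOTALL)

def extract_tool_calls(model_output: str):
    """Scan the model's raw text output and parse any tool-call markers."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(model_output)]

output = (
    'Let me check that file. '
    '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'
)
calls = extract_tool_calls(output)
print(calls)  # a list with one parsed call named "read_file"
```

Everything the model "does" is just text like `output` above; it is the harness that turns that text into an actual action.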
Once the tools return their results, the harness appends those results to the context window and calls the large language model again. This loop allows the system to keep reasoning with the new information and take further actions if necessary.
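Putting the pieces together, the whole orchestration loop fits in one small function. This is a minimal sketch: `call_model`, `run_tool`, and `extract_tool_calls` are hypothetical helpers passed in as arguments, and the message roles are assumptions, not a real provider's API.

```python
def agent_loop(user_prompt, call_model, run_tool, extract_tool_calls, max_turns=10):
    """Core orchestration loop: call model, run tools, feed results back."""
    messages = [{"role": "user", "content": user_prompt}]
    reply = ""
    for _ in range(max_turns):
        reply = call_model(messages)            # model: text in, text out
        messages.append({"role": "assistant", "content": reply})
        calls = extract_tool_calls(reply)       # harness scans the text
        if not calls:
            return reply                        # no tool calls: we are done
        for call in calls:
            result = run_tool(call["name"], call["arguments"])
            # Feed the outcome back into the context before the next call.
            messages.append({"role": "tool", "content": result})
    return reply
```

That really is the whole trick: a loop, a parser, and a tool dispatcher wrapped around a text-in, text-out model.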
Going back to the opening examples, a destructive command like rm -rf * --no-preserve-root might be generated by the large language model, but it is the harness that actually invokes the tool that executes it. By adding a control layer over which commands the harness is allowed to run, you can block some potentially dangerous actions. You cannot catch everything, but as the human you can maintain an allow or block list for added safety. Harnesses like Cursor, Antigravity, and others have this kind of functionality built in to protect your system.
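A command allow/block list can be as simple as checking tokens before the harness ever touches the shell. The specific commands listed below are illustrative choices, not a recommendation of a complete policy.

```python
import shlex

# Illustrative policy: a small allowlist plus a few hard-blocked tokens.
ALLOWED_COMMANDS = {"ls", "cat", "rg", "git"}
BLOCKED_TOKENS = {"rm", "sudo", "mkfs"}

def is_command_allowed(command: str) -> bool:
    """Return True only if the harness should be willing to execute this."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if any(t in BLOCKED_TOKENS for t in tokens):
        return False
    return tokens[0] in ALLOWED_COMMANDS

print(is_command_allowed("ls -la"))    # True
print(is_command_allowed("rm -rf *"))  # False: blocked before execution
```

The key point is where this check lives: in the harness, after the model has produced text but before anything runs. The model can generate whatever it likes; the harness decides what happens.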
There have been some great tutorials published recently on how to build harnesses, which really drive home the point that the core orchestration loop is quite simple. For instance, Thorsten Ball wrote a fantastic guide on building a harness in Go here. Similarly, Mihail Eric authored an excellent piece on building one in Python here. There is also a great video by Theo that references both of these articles here.
While these resources are fantastic, I want to abstract the notion of a harness even further. A harness is ultimately just a wrapper around a large language model. Those articles all focus on terminal-based tools, but you could just as easily build a harness that works entirely in the browser and calls web APIs. Terminal tools are typically more useful in practice, but the concept of a harness is far broader than the command line.
To that end, I have prepared a demonstration. While not very useful for daily tasks, it is designed to show you just how broad and widely applicable the concept of a harness truly is.
Interactive Demonstration - Web-based Harness
To make this concept crystal clear, I built a simulated educational harness below. It does not use real large language models or backends, but it visualizes the step-by-step process we just discussed.
When you select an action, you can observe the following orchestration steps in real time:
- User Request: The user sends a prompt to the large language model.
- Generation: The model streams its response, realizing it needs a tool, and outputs the necessary tool syntax.
- Interception: The harness, which is constantly scanning the output, detects the tool syntax, pauses the stream, and extracts the command.
- Execution: The harness executes the actual web API using the extracted arguments.
- Feedback Loop: The harness feeds the execution result back into the model's context so it is aware of the outcome.
- Completion: The model reads the tool result and provides a final conversational response to the user.
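The six steps above can also be sketched as a scripted trace. Like the demo itself, this is a simulation with canned responses; the tool name, the weather result, and the marker syntax are all made up for illustration.

```python
def simulate(prompt: str):
    """Walk the six orchestration steps with hard-coded, simulated data."""
    trace = []
    trace.append(("user_request", prompt))                            # 1. user sends a prompt
    model_output = (
        '<tool_call>{"name": "get_weather", '
        '"arguments": {"city": "Paris"}}</tool_call>'
    )
    trace.append(("generation", model_output))                        # 2. model emits tool syntax
    call = {"name": "get_weather", "arguments": {"city": "Paris"}}
    trace.append(("interception", call))                              # 3. harness extracts the call
    result = "18C and cloudy"                                         # pretend web API response
    trace.append(("execution", result))                               # 4. harness calls the web API
    trace.append(("feedback", {"role": "tool", "content": result}))   # 5. result fed back to context
    trace.append(("completion", "It is 18C and cloudy in Paris."))    # 6. model answers the user
    return trace

for step, payload in simulate("What's the weather in Paris?"):
    print(step, "->", payload)
```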
Try selecting an option below to see this orchestration in action.
This browser simulation is very similar to how an agentic harness in a terminal environment operates. Instead of interacting with web APIs, a coding harness gives the model access to tools like ripgrep, ls, and cat in order to find, view, and modify your files. The underlying orchestration loop remains exactly the same.
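A terminal-oriented tool dispatcher might look something like the sketch below, wrapping the same utilities mentioned above. The dispatch table and argument names are illustrative assumptions; real coding harnesses add sandboxing, timeouts, and permission checks around every one of these.

```python
import subprocess

def run_tool(name: str, args: dict) -> str:
    """Dispatch an assumed tool name to a real local command or file read."""
    if name == "ls":
        proc = subprocess.run(["ls", args.get("path", ".")],
                              capture_output=True, text=True)
        return proc.stdout
    if name == "cat":
        # Reading directly is simpler than shelling out to cat.
        with open(args["path"]) as f:
            return f.read()
    if name == "rg":
        proc = subprocess.run(["rg", args["pattern"], args.get("path", ".")],
                              capture_output=True, text=True)
        return proc.stdout
    return f"unknown tool: {name}"
```

Swap the dispatch targets from shell utilities to fetch calls and you have the browser harness; the loop around it never changes.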
Going Further: Production Ready Harnesses
To go even further and explore how to build actual production ready harnesses, I strongly encourage you to read the recent research paper Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems by Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, and Zhiqiang Shen.
This paper dives deep into the actual architecture, memory management, and control plane permissions required for a production ready harness, using Claude Code and OpenClaw as practical examples.
Sources:
- Thorsten Ball, How to build an agent
- Mihail Eric, The Emperor Has No Clothes
- Theo, How does Claude Code actually work?
- Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, and Zhiqiang Shen, Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems