Coordinating Actions of AI Agents

Now that my Gmail agent works, it’s time to add more fun agents! A few I have in mind to start with include:

A ski-report generator
A documentation assistant (for reminders, grocery lists, etc.)
A health coach that can advise on how to meet my health goals
A local event finder
And many more

However, I’ve configured my Home Assistant to route queries to an API endpoint. The Home Assistant platform has some functionality for routing queries to different endpoints based on keywords, but I’ve decided to take a much more robust approach: building out an “Orchestrator Agent” that receives the query from Home Assistant, assesses which agent in my collection is best suited to respond to it, gets a response from that agent, and then returns the response to Home Assistant. Let’s take a look at what that might look like.

Design

In order to effectively route queries to the appropriate agent, Claude and I built an Orchestrator Agent that receives the query, and then uses an “intent router” to determine via code or LLM which agent that query should be passed to. It also maintains a record of the conversation so that follow-up questions are handled smoothly without losing context. The overall structure looks something like this:

Let’s take a closer look at some of the components of the Orchestrator Agent.

The Agent Registry

In order for the Orchestrator Agent to delegate queries and tasks effectively, it needs to know which agents it’s able to delegate to. The ‘AgentRegistry’ serves as a “source-of-truth” reference for the orchestrator. An AgentRegistry instance holds a dictionary of all of the agents, along with a collection of methods for retrieving the ‘capabilities’ of each of those agents.
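To make that concrete, here is a minimal sketch of how such a registry might look. It is written in Python as an assumption, and the 'AgentInfo' class, the method names, and the example capabilities are all placeholders rather than the actual project code.

```python
from dataclasses import dataclass, field


@dataclass
class AgentInfo:
    """Hypothetical description of one specialist agent."""
    name: str
    description: str
    capabilities: list[str] = field(default_factory=list)


class AgentRegistry:
    """Source-of-truth lookup for the agents the orchestrator can delegate to."""

    def __init__(self) -> None:
        # Maps agent name -> AgentInfo (the "dictionary with all of the agents").
        self._agents: dict[str, AgentInfo] = {}

    def register(self, agent: AgentInfo) -> None:
        self._agents[agent.name] = agent

    def get_capabilities(self, name: str) -> list[str]:
        # Return a copy so callers cannot mutate the registry by accident.
        return list(self._agents[name].capabilities)

    def all_capabilities(self) -> dict[str, list[str]]:
        return {name: list(a.capabilities) for name, a in self._agents.items()}


registry = AgentRegistry()
registry.register(
    AgentInfo("gmail", "Reads and searches email", ["search email", "summarize messages"])
)
registry.register(AgentInfo("ski", "Generates ski reports", ["snow report"]))
```

Keeping the capabilities as plain strings means the same registry can feed both a keyword-based check and a prompt for an LLM.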

Conversation Context

For the orchestrator agent to smoothly handle ongoing conversations, it needs to track context with a ‘ConversationContext’ object. Each of these objects includes a unique conversation ID, a record of all of the “turns” in the conversation (both the user’s messages and the agents’ responses), and the last agent to have responded to the user.
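A rough sketch of what that object might hold, assuming Python dataclasses. The 'Turn' class, the field names, and the 'add_turn' helper are illustrative guesses, not the real implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    """One exchange: what the user said and which agent answered (hypothetical shape)."""
    user_message: str
    agent_name: str
    agent_response: str


@dataclass
class ConversationContext:
    # A random UUID is one simple way to get a unique conversation ID.
    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    turns: list[Turn] = field(default_factory=list)
    last_agent: Optional[str] = None

    def add_turn(self, user_message: str, agent_name: str, agent_response: str) -> None:
        """Record a completed turn and remember which agent spoke last."""
        self.turns.append(Turn(user_message, agent_name, agent_response))
        self.last_agent = agent_name


context = ConversationContext()
context.add_turn("Check my email", "gmail", "You have 2 new messages.")
```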

The Intent Router

Finally, we have the Intent Router - this is where the magic happens.

First, when a query comes in, the intent router checks to see if the query seems to continue an existing conversation. For example, if an incoming query starts with “What else…” or “Tell me more about…” and an agent has responded to a query recently, then the router will send the query back to that agent to continue the conversation.
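One way this check could be written is sketched below. The prefix list and the two-minute definition of "recently" are assumptions made for illustration; the post doesn't give the real values.

```python
import time
from typing import Optional

# Hypothetical follow-up phrases; a real list would likely be longer.
FOLLOW_UP_PREFIXES = ("what else", "tell me more")
# Assumption: "recently" means within the last two minutes.
RECENCY_WINDOW_SECONDS = 120.0


def route_follow_up(
    query: str,
    last_agent: Optional[str],
    last_response_time: Optional[float],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the previous agent if the query looks like a follow-up, else None."""
    if last_agent is None or last_response_time is None:
        return None  # nothing to continue
    now = time.time() if now is None else now
    if now - last_response_time > RECENCY_WINDOW_SECONDS:
        return None  # the earlier exchange is too old to count as ongoing
    if query.strip().lower().startswith(FOLLOW_UP_PREFIXES):
        return last_agent
    return None
```

Passing 'now' as a parameter keeps the function easy to test without waiting on a real clock.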

Next, the router checks to see if the query is just a “Hello” or a “Thanks”. If it’s one of those, then the orchestrator agent responds directly to the user, rather than spending time and compute trying to figure out which agent to delegate the query to.
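A small sketch of that shortcut, assuming simple regex matching. The patterns and the canned replies are made up for the example.

```python
import re
from typing import Optional

# Hypothetical canned replies the orchestrator can give without delegating.
CANNED_REPLIES = {
    "greeting": "Hello! What can I help you with?",
    "thanks": "You're welcome!",
}

# The patterns are anchored so that only a bare greeting or thank-you matches.
SMALL_TALK_PATTERNS = {
    "greeting": re.compile(r"^(hi|hello|hey)( there)?[\s!.]*$", re.IGNORECASE),
    "thanks": re.compile(r"^(thanks|thank you)( so much)?[\s!.]*$", re.IGNORECASE),
}


def direct_reply(query: str) -> Optional[str]:
    """Return a canned reply for pure small talk, or None to keep routing."""
    text = query.strip()
    for kind, pattern in SMALL_TALK_PATTERNS.items():
        if pattern.match(text):
            return CANNED_REPLIES[kind]
    return None
```

Anchoring the patterns matters: “Hello, check my email” should still go on to the real routing logic rather than being answered with a greeting.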

Now the router moves on to actually figuring out where it should send queries that have made it this far. There are two mechanisms by which the router tries to identify the appropriate agent. The first is a code-based classifier. Since code is faster and cheaper than LLM-based classification, this takes care of the “easy” cases. It checks the query for certain keywords or regex patterns, gives the query a “score” for each agent, and sees whether the score for one agent is high enough to be confident that it’s the right choice. For example, if the query was “Check my email for messages about my recent snowboard purchase”, then the router would assign a high score to my Gmail Agent (since the query includes words like “email” and “messages”, and matches a regex pattern like “check * email *”), and it would give a low score to the Ski Agent (since the query includes the word “snowboard”). Since the Gmail Agent has a sufficiently high score, the Orchestrator Agent would correctly route the query to the Gmail Agent.
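The scoring idea can be sketched roughly as follows. The keyword weights, the regex patterns, and both thresholds are invented for illustration, and the margin check between the top two agents is my assumption about what “sufficiently high” means in practice.

```python
import re
from typing import Optional

# Hypothetical per-agent rules: weighted keywords plus weighted regex patterns.
AGENT_RULES = {
    "gmail": {
        "keywords": {"email": 2.0, "messages": 1.0, "inbox": 2.0},
        "patterns": [(re.compile(r"\bcheck\b.*\bemail\b"), 3.0)],
    },
    "ski": {
        "keywords": {"snowboard": 1.0, "ski": 2.0, "snow": 1.0},
        "patterns": [(re.compile(r"\bsnow report\b"), 3.0)],
    },
}
MIN_SCORE = 3.0   # assumption: minimum score to trust the winner
MIN_MARGIN = 2.0  # assumption: required lead over the runner-up


def score_agents(query: str) -> dict[str, float]:
    """Score every agent by summing the weights of matching keywords and patterns."""
    text = query.lower()
    words = set(re.findall(r"[a-z']+", text))
    scores = {}
    for agent, rules in AGENT_RULES.items():
        score = sum(w for kw, w in rules["keywords"].items() if kw in words)
        score += sum(w for pattern, w in rules["patterns"] if pattern.search(text))
        scores[agent] = score
    return scores


def classify(query: str) -> Optional[str]:
    """Return the winning agent, or None when the scores are too low or too close."""
    ranked = sorted(score_agents(query).items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score >= MIN_SCORE and best_score - runner_up_score >= MIN_MARGIN:
        return best
    return None


print(classify("Check my email for messages about my recent snowboard purchase"))
```

With these made-up weights, the snowboard query scores 6.0 for the Gmail agent and 1.0 for the ski agent, so it is routed to Gmail. A query that matches nothing returns None, which is the signal to fall back to a slower classifier.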

However, there might be some cases where keywords and regex patterns are not sufficient to identify which agent to route things to. Maybe the code-based classifier assigns similarly high scores to two separate agents, or low scores to all agents. In these cases, the query is passed to an LLM-based router, which is slower and more costly than the code-based classifier, but better at handling nuance. For example, if the query was “Have I heard anything from Carol in the last 4 hours?” then the code-based classifier would assign similarly low scores to all of my agents. There’s no specific word or pattern in that message that suggests that I’m looking for an email. However, an LLM-based classifier is capable of looking at that query, assessing the capabilities of the existing agents (Ski Agent, Notes Agent, Gmail Agent, etc.) and determining that, out of those agents, the only one that could possibly give an appropriate response to that question is the Gmail Agent. The router would pass the message along to the Gmail Agent, and we’d be off to the races!
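Here is a hedged sketch of how the LLM fallback could be wired up. To avoid guessing at any particular SDK, the model call is passed in as a plain function ('complete', a hypothetical name), and a fake stand-in is used so the example runs on its own. The prompt wording and the capability lists are also assumptions.

```python
from typing import Callable, Optional


def build_routing_prompt(query: str, capabilities: dict[str, list[str]]) -> str:
    """Describe each agent's capabilities and ask the model to pick one by name."""
    lines = [
        "Pick the single best agent for the user's query.",
        "Reply with only the agent name, or 'none' if no agent fits.",
        "",
        "Agents:",
    ]
    for name, caps in capabilities.items():
        lines.append(f"- {name}: {', '.join(caps)}")
    lines += ["", f"Query: {query}"]
    return "\n".join(lines)


def llm_route(
    query: str,
    capabilities: dict[str, list[str]],
    complete: Callable[[str], str],
) -> Optional[str]:
    """Ask the model for an agent name and accept it only if that agent exists."""
    answer = complete(build_routing_prompt(query, capabilities)).strip().lower()
    return answer if answer in capabilities else None


def fake_complete(prompt: str) -> str:
    # Stand-in for a real model call; a real version would send `prompt` to an LLM.
    return " Gmail\n"


capabilities = {
    "gmail": ["search email", "summarize messages from a sender"],
    "ski": ["snow report"],
    "notes": ["reminders", "grocery lists"],
}
choice = llm_route(
    "Have I heard anything from Carol in the last 4 hours?", capabilities, fake_complete
)
```

Validating the model's answer against the known agent names is a cheap guard: if the model replies with something unexpected, the router gets None instead of a made-up agent.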

Once the router passes the query to the appropriate agent, the agent streams its response back to the orchestrator, which in turn streams it back to the Home Assistant OS over the API.
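The relay step might look something like the sketch below, which uses plain Python generators as a stand-in for whatever streaming mechanism the API actually uses. The 'fake_agent_stream' function and the idea of saving the full response into a transcript are assumptions added for illustration.

```python
from typing import Iterable, Iterator


def fake_agent_stream(query: str) -> Iterator[str]:
    # Stand-in for a specialist agent that produces its answer in chunks.
    yield from ("You have ", "2 new ", "messages.")


def relay(agent_chunks: Iterable[str], transcript: list[str]) -> Iterator[str]:
    """Forward each chunk as soon as it arrives, then save the full response."""
    parts = []
    for chunk in agent_chunks:
        parts.append(chunk)
        yield chunk  # the caller can send this on immediately
    # Runs once the agent's stream is exhausted.
    transcript.append("".join(parts))


transcript: list[str] = []
received = list(relay(fake_agent_stream("Check my email"), transcript))
```

Forwarding chunks one at a time means the user starts hearing a response before the agent has finished, while the assembled text is still available afterwards for the conversation record.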

Summary

Now that my Orchestrator Agent has been configured, the general framework of this project is complete! I can ask my Home Assistant device any question, and as long as I have my Orchestrator Agent and a suitable specialist agent configured, I should in theory get a quick, accurate response. That being said, there’s a high likelihood that as I add more and more agents, it will become more challenging for the Orchestrator Agent to identify the appropriate agent for a given query. To keep track of this, and to identify when it’s time to revisit my orchestration logic, I’m going to take a pass at building out an evaluation framework similar to what Anthropic describes here. To see how that pans out, check out my next post!
