Clarvis Agent Design Prototype

There are many, many things I’d like a personal home assistant to be able to do. Schedule my doctors appointment, pay my bills, write my grocery list. Hopefully I’ll get to all of that eventually, but keeping an eye on my personal inbox, dodging spam and looking for things that actually deserve a response, seems like a pretty good starting point.

Planning Process

In addition to this being my first time building an agent with the Claude Agent SDK, it was also my first time building with Claude Code. There were as many learnings about how to use Claude Code as there were about how to build Agents, and I’ll do my best to describe them as I go.

In trying to wrap my head around Claude Code, one of the first things I asked it to do was “Write me a quickstart guide for using Claude Code”. The result was a beautifully formatted markdown file with all kinds of helpful tips and tricks for getting started. Very importantly, I learned about “Plan Mode” - a feature that makes Claude Code come up with an entire plan to achieve whatever goal you give it before it starts taking action. I knew I’d need a good MCP server to integrate with Gmail, so I set Claude Code to plan mode and told it something along the lines of “Come up with a plan to build me an agent with the Claude Agents SDK, using the GongRzhe MCP server, that can check my emails for me.” Claude came up with a plan, I reviewed it, and we were off to the races.

Let me pause for a moment to share a couple things I wish I had known during this phase:

In Claude Code, you can use /model to change which model (Haiku, Sonnet, or Opus) is being used. The default is Sonnet, but for coding work I’ve found that Opus is almost always worth the extra tokens. When I started on this agent, I didn’t consider that the agent backing Claude Code might not be done with Opus, so all my planning, and the first part of my development, was guided by Sonnet.
You can explicitly tell Claude how hard you want it to think about certain things. I’ve found this to be especially helpful during planning phases. You can even specify exactly how hard you want Claude to think - ie "think" < "think hard" < "think harder" < "ultrathink”, with deeper thinking costing more tokens. I didn’t know this when I started this project, so I’m not actually sure I was using thinking mode at all when setting up my plan. In hindsight, for a task of this complexity (building a MCP backed agent completely from scratch) I probably would have told Claude to “think pretty hard” at a minimum.

Building

For being the step where the rubber hits the road, I have very little to say about the building phase. But then again I suppose that’s the beauty of Claude Code. Once the plan is laid out, Claude can pretty much go to town with the building. Before I knew it, Claude told me “✅ Setup Status - READY TO RUN”!

Although, as we’ll see in the next section, the build could have been executed better, it was actually fairly faithful to the plan that I had approved. Truly, the plan step is the crux of any implementations success. In this case, there was only one thing I wish I had told Claude to incorporate into the build process:

For each step of the build, I’ve found that it helps to ask Claude to come up with tests that demonstrate that the current build step has been completed successfully. Typically, now I’ll include something like this in the plan prompt: “For each step of the build, write one or more tests. If a test fails, identify and resolve the cause until the passes. Do not proceed to the next step until all tests pass.” That’s made successive projects much smoother. In this case, I didn’t ask for those tests, and as we’ll see in the next section, there was a decent amount of debugging required to get things running.

Testing/Troubleshooting

Once the initial build was complete, it was time to test. This is where things started to get shaky, and I had to pay for not doing the things I mentioned above (using Opus, thinking mode, and tests).

When I initialized the agent, I would actually start up and let me input a prompt (like “Tell me what the last email I got was”) but it would quickly respond with something like "I don't have the necessary permissions to access your Gmail account". Thus began a long back and forth between Claude and I trying to figure out how to troubleshoot this.

Claude would say “Oh we forgot to set up this file!” and I would say “No, it’s actually just in this different directory”.

Then Claude thought my environment variables (including the secrets necessary to access my Gmail account) weren’t being passed through to the MCP server (they were).

At one point, we tested the MCP with MCP Inspector and found that it worked, which led Claude to believe that there was a problem with the Claude Agent SDK, and that I should submit a bug ticket to Anthropic (I’m glad I didn’t - that would’ve been embarrassing).

After a substantial amount of guiding Claude through troubleshooting steps, we found that it was something obvious. In order to allow agents to use MCP tools, they have to be explicitly allow-listed. After a couple hours of back and forth troubleshooting, it turned out to be some standard configuration.

options = ClaudeAgentOptions(
            system_prompt=SYSTEM_PROMPT,
            mcp_servers={
                "gmail": {
                    "type": mcp_config["type"],
                    "command": mcp_config["command"],
                    "args": mcp_config["args"],
                    "env": mcp_config.get("env", {}),
                },
            },
            model=self.config.model,
            max_turns=self.config.max_turns,
            allowed_tools=self.config.allowed_tools
            # ^ this line allows the tools I needed. 
            # That one missing line is what took so long to debug.
        )

In the end, the biggest pointers I learned about troubleshooting were:

Make Claude create and maintain a running doc of troubleshooting steps and results. As context rots or is compacted, Claude will need a quality source of truth to remain productive.
Be wary of taking Claude’s findings at face value. Verify yourself, or ask Claude to explain its reasoning or show evidence. Also, when prompting Claude to solve something, it can help to tell it “If you can’t find the answer, say so” in the prompt. That will help prevent it from hallucinating solutions.

Mission Accomplished…Finally

After a few hours (most of which were spent troubleshooting) I got a functional Gmail agent that I can ask to search through my inbox for things. As a proof of concept, it has a lot of room for improvement, and will ultimately be just one piece of a much larger system of agents/subagents.

The real value here was the lessons learned along the way. In future posts I’ll outline how my use of Claude Code has changed over time.

And a final note - I could have saved myself a whole lot of time had I just read this lovely guide on how to use Claude Code from the start :’)

Clarvis Agent Prototyping

Planning Process

Building

Testing/Troubleshooting

Mission Accomplished…Finally

Comments

Building Clarvis

Hardware Setup

More from this blog

Where Things Stand

Evaluating the Orchestrator

Orchestrating Agents

Hardware Setup

Command Palette

Planning Process

Building

Testing/Troubleshooting

Mission Accomplished…Finally

Comments

Building Clarvis

Hardware Setup

More from this blog