Hello Home Assistant


Professional Solutions Architect by day, casual project-er by night. Based in Portland, OR.

As I mentioned in my last article, this blog will mostly document the projects I work on. A couple of years back, I built a homemade plant-watering system to wrap my head around AWS-based IoT development (unfortunately, at the time I didn’t have the sense to document my progress). This time around, I’m planning to build a voice-activated home assistant, supported by agents that specialize in helpful tasks like checking email, updating my calendar, maintaining notes/lists, and more.

The general idea is that it should operate largely the same way as a Google Home or an Amazon Alexa - voice activation, internet access, the works. However, my homemade system will have a few advantages. It won’t use any of my data for commercial purposes (serving ads), and I’ll have more direct control over the flow of that data. Additionally, if all goes to plan, I’ll be able to add tools that give my system the ability and autonomy to do the things that help me out the most.

Groundwork Learning

I’m coming into this project with a lot to learn, but I’m also coming in with a lot that I do already know. That’s in large part thanks to the following courses:

  • “AI Engineer Core Track: LLM Engineering, RAG, QLoRA, Agents” by Ed Donner - this was a great first step into LLM engineering for someone like myself, who already had a good understanding of Python and a decent amount of exposure to frontier LLMs (only via web UI), but no experience making calls to LLMs via API or SDK, let alone running an LLM locally. The course starts simple, with an intro to LLM APIs for a variety of models (from OpenAI, Anthropic, Google, and others), both via direct API calls and via the OpenAI Python client. It then detours into running small open source LLMs locally (with tools like Ollama), and introduces a few platforms helpful to LLM engineers, like HuggingFace and Gradio. The second half of the course is spent primarily on methods to improve the outputs of LLMs, via inference-time techniques (prompt engineering and RAG) and training-time techniques (fine-tuning). I found this course super helpful for understanding what’s “under the hood” of a Large Language Model, and I finished it with a good understanding of the basic techniques and methods, plus a decent amount of hands-on experience from the small projects throughout the course.

  • “AI Engineer Agentic Track: The Complete Agent & MCP Course” by Ed Donner - this follows the first course nicely, with lessons that translate a general understanding of LLMs into an understanding of how to leverage LLMs to build agentic tools and workflows. While the course covers several frameworks for building agents, I opted to go through the sections on general agent concepts like tool use, general agent patterns (as described by Anthropic here), evaluations, etc., then move on to the OpenAI Agents SDK, and finally move on to setting up and using MCP servers.

  • “Building with the Claude API” from Anthropic - this is the course I’m working through currently. While it covers a lot of the same material as the courses above, it’s been super helpful for understanding how to use the Claude API specifically, and a couple of sections have gone more in depth than the courses above. For example, there was a lot of great content in the Prompt Engineering and Prompt Evaluation sections, and the intro to Claude Code has been very helpful as well. I’ve still got a bit to wrap up on this one (Features of Claude, Model Context Protocol, and Agents and Workflows), but so far I’d highly recommend it.

  • There are a couple other courses I’ve either barely started on or am considering, which I’ll mention in future posts as I make more headway on them.
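For readers new to the tool-use concept mentioned in the agent course above, here is a minimal sketch of the dispatch side of it, with no LLM involved. It assumes the model emits a tool call as a JSON object with a name and arguments; the tool names (`add_to_list`, `read_list`) and the in-memory list are hypothetical stand-ins, not anything from the courses or from a specific SDK.

```python
import json

# Hypothetical in-memory store, used only for this illustration.
SHOPPING_LIST: list[str] = []


def add_to_list(item: str) -> str:
    """Hypothetical tool: append an item to the list."""
    SHOPPING_LIST.append(item)
    return f"Added {item}"


def read_list() -> str:
    """Hypothetical tool: return the list as a comma-separated string."""
    return ", ".join(SHOPPING_LIST) or "The list is empty"


# Registry mapping the tool names a model may request to Python callables.
TOOLS = {"add_to_list": add_to_list, "read_list": read_list}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a JSON tool call and return its text result."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"Unknown tool: {call['name']}"
    return tool(**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "add_to_list", "arguments": {"item": "milk"}}'))
    print(dispatch('{"name": "read_list"}'))
```

In a real agent, the string returned by `dispatch` would be sent back to the model as the tool result so it can decide what to say or do next.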

The Task at Hand

As I mentioned above, the objective of this project is to build a fully functional home assistant, augmented with agents that enable it to go beyond what a typical off-the-shelf home assistant can do. The hardware I’ve picked out to start with includes:

  • A Home Assistant Voice Preview Edition - this is the box that will contain the speaker/mic for I/O.

  • A MINISFORUM UN100P mini PC - this is where the local orchestration will happen. The Home Assistant OS will run here, handling the basic I/O, and with an Intel N100 processor, 16GB of RAM, and a 256GB SSD, it should have the capacity to handle tool use and host some use-case-specific MCP servers if I decide to make my LLM calls from the box.

On the software side, I’ll be developing on a Mac, and I’ll handle any cloud compute on AWS. I plan on leveraging Claude Code heavily, and I’d like to try out a concept that’s new to me called “Compound Engineering” - devised by the good folks at Every. The general idea behind Compound Engineering is that a coding agent goes through a plan/build/review loop during the development process, and on each pass it “compounds” its learnings so that it’s more effective on the next loop/task. More on this later.

This week, the goal is just to get through the basics - i.e., setting up the Home Assistant OS on my mini PC and starting on a base agent to use for an initial test round. As I get through this, I’ll get a better grasp on what the full architecture should look like (Should I make my LLM calls directly from the box? Or from AWS Lambda or Bedrock? Where does speech-to-text happen? Etc.). In my next post, I’ll have some updates on how things play out and where we go from there.

Building Clarvis

Part 6 of 7

In this series, I'll be documenting my latest project - building a personalized Home Assistant named "Clarvis".
