Day one, and the terminal is winning.
I have spent twenty years buying media and roughly four hours, so far, learning that the Unix terminal does not care about my feelings. I can plan a national CTV campaign in my sleep. Ask me to navigate a file system with no mouse and I am suddenly a man holding a torch in his own house, looking for the light switch. There is a real learning curve here, and I am very much still on the steep bit.
The point of all this is Leo, my home AI setup. The first proper milestone was getting Open WebUI running so I had a sane front end to talk to models through, rather than living entirely in a black rectangle. That part actually went well. The trouble started when I asked it to do anything serious.
Leo currently lives on a machine with 32GB of RAM, which it turns out is plenty for browsing and pretending, and not nearly enough for running a decent local model with any room to think. Things slow to a crawl, the fans spin up like the thing is trying to take off, and you sit there watching a token appear every few seconds like it is being delivered by post. So I have done the only sensible, completely unhinged thing and ordered a Mac Studio with a great deal more memory to give it actual headroom. Money where the mouth is.
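For a rough sense of why 32GB runs out, the model weights alone take roughly parameter count × bits per weight ÷ 8 bytes. The sketch below is a back-of-envelope estimate under stated assumptions, not a measurement. It ignores the KV cache, context length and runtime overhead, all of which push real usage higher. The 70-billion-parameter model and the 8GB of headroom are illustrative placeholders, not specific models or recommended settings.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights, each bits_per_weight / 8 bytes,
    divided by 1e9 bytes per GB; the two 1e9 factors cancel out.
    """
    return params_billions * bits_per_weight / 8


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if the weights fit with some memory left over.

    headroom_gb is an assumed allowance for the OS, the KV cache and
    everything else running; it is a placeholder, not a measured figure.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at 16-bit and at 4-bit quantisation.
    for bits in (16, 4):
        size = weight_memory_gb(70, bits)
        print(f"70B at {bits}-bit: ~{size:.0f} GB of weights, "
              f"fits in 32 GB: {fits_in_ram(70, bits, ram_gb=32)}")
```

On these assumptions even a 4-bit 70B model wants about 35GB for its weights before anything else is loaded, which is the wall a 32GB machine hits.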
In the meantime I have been leaning on hosted models through OpenRouter, which is brilliant and also a quiet little minefield. Leave the taps running on the wrong model and the costs add up faster than you would like. It is a useful discipline, honestly. Nothing teaches you to write a tighter prompt like watching a meter tick.
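The meter itself is simple arithmetic. Hosted models are commonly priced per million tokens, with output tokens usually dearer than input. The sketch below assumes that pricing shape; the prices, token counts and request volume are made-up placeholders, not OpenRouter's actual rates for any model, so check the model's own listing before trusting a number it produces.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def daily_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of the same request repeated requests_per_day times."""
    return requests_per_day * request_cost(
        input_tokens, output_tokens, input_price_per_m, output_price_per_m)


if __name__ == "__main__":
    # Placeholder prices: $3 per million input tokens, $15 per million output.
    per_request = request_cost(2_000, 500, 3.0, 15.0)
    per_day = daily_cost(200, 2_000, 500, 3.0, 15.0)
    print(f"per request: ${per_request:.4f}, per day: ${per_day:.2f}")
```

A shorter prompt shrinks the input term directly, while a model priced several times higher scales every term in the sum, which is how leaving the taps running on the wrong one bites.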
The other thing that has stood out, having now poked at a fair few of them, is how stark the gap still is in the agentic space. For anything that needs to plan, use tools and not lose the plot halfway through, Claude is still clearly out in front. The others are catching up in patches, but it is not close yet for the work I actually want Leo to do.
Next on the bench: once the Mac Studio lands, I want to properly road-test a local model on it. I am eyeing Gemma first and I am keen to benchmark it honestly against Qwen, on my own tasks rather than someone else's leaderboard. No numbers to report yet. That is rather the point of a log.