A few observations from building agents over the past few weeks:
-
Raw model calls are the wrong primitive for 99% of agent-building use cases. Most use cases don't require building a harness from scratch.
-
Most agents are coding agents. Regardless of the business outcome an agent drives, it ends up looking a lot like a coding agent: interacting with a filesystem, writing & executing code, calling APIs, running bash. Coding ability is converging with general knowledge-work capability.
-
Models are reaching parity on evals, but they all "behave" differently: tool-calling patterns, verbosity, laziness, sycophancy, hallucination tendencies, instruction following, output formatting/consistency, etc. This means there's value in harnesses tuned to each model. Perhaps model providers themselves are best positioned to build these?
-
With a pre-built harness, the things people should actually have to configure are prompts, tools/MCPs, hooks, and the context/filesystem available to the agent. "Agent APIs" that expose these as first-class params will likely become the dominant primitive vs. model calls wrapped in custom, hand-rolled harnesses.
-
In that world, for products that let end users swap models, expect model switchers to get replaced by agent (model + harness) switchers. This creates room for something like OpenRouter for Agent APIs.
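
To make the last two points concrete, here is a rough sketch of what an "Agent API" surface and an agent switcher could look like. Everything in it is an assumption for illustration: `AgentConfig`, `AgentSpec`, `run_agent`, `REGISTRY`, and the model/harness names are hypothetical and don't match any real provider's API. No model is called; `run_agent` is a stand-in for a hosted harness loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical, invented for illustration only.
Tool = Callable[[str], str]
Hook = Callable[[str, str], None]  # (event_name, payload) -> None


@dataclass
class AgentConfig:
    """The four things a caller configures; the harness owns the rest."""
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    hooks: List[Hook] = field(default_factory=list)
    workspace: Dict[str, str] = field(default_factory=dict)  # path -> contents


@dataclass(frozen=True)
class AgentSpec:
    """An agent is a (model, harness) pair, not a bare model name."""
    model: str
    harness: str


def run_agent(spec: AgentSpec, config: AgentConfig, task: str) -> str:
    """Stand-in for a provider-hosted agent loop (assumption: no real model).
    It runs each tool once on the task and fires hooks before each tool."""
    results = []
    for name, tool in config.tools.items():
        for hook in config.hooks:
            hook("before_tool", name)
        results.append(f"{name}={tool(task)}")
    return f"[{spec.model}+{spec.harness}] " + ", ".join(results)


# An "agent switcher": end users pick an agent, not a model.
REGISTRY: Dict[str, AgentSpec] = {
    "agent-a": AgentSpec(model="model-a", harness="harness-tuned-for-a"),
    "agent-b": AgentSpec(model="model-b", harness="harness-tuned-for-b"),
}

events: List[str] = []
config = AgentConfig(
    system_prompt="You are a careful assistant.",
    tools={"word_count": lambda text: str(len(text.split()))},
    hooks=[lambda event, payload: events.append(f"{event}:{payload}")],
    workspace={"notes.txt": "scratch space"},
)

task = "summarize the quarterly report"
output_a = run_agent(REGISTRY["agent-a"], config, task)
output_b = run_agent(REGISTRY["agent-b"], config, task)
print(output_a)
print(output_b)
```

The point of the sketch is the shape: the same `AgentConfig` is reused unchanged while the `AgentSpec` is swapped, which is roughly what a router for Agent APIs would need to support.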