Agentic engineering: Modeling tools as capabilities

June 1, 2026

For as long as there have been human hands, there have been tools, and a tool has always been the same kind of thing: something we set between ourselves and the world to reach what we, on our own, never could. We are small and the world is large, and the tool is how we close the distance. Look closely and every one of them faces in one of two directions. Some reach out to let us see further than our senses allow. Others reach out to let us change what our hands alone could never move.

The telescope and the microscope are pure observation: they alter nothing out there, they only widen the narrow window of the eye, until a smear of light becomes a galaxy and a single drop of water becomes a teeming city. The lever and the plow and the lit fire are the other kind entirely: they reach into the world and leave it changed, a stone moved, a field turned over, a night held back a while longer. One sort of tool brings the world to us. The other carries us out into the world. And the whole of what follows lives in the difference between those two.

A world small enough to see

Here is the quiet miracle hidden in all that tool-use. Because we can watch the change a tool makes, and check the reading it hands back against what we already trust, we slowly build, on the inside, a model of the world: a small private copy of how things are and how they answer when they are pushed. On that inner model we do our reasoning, predicting and weighing the consequence of a move before we ever make it, and only then reaching out. This is the very thing we must now give to a language model, which can think but cannot reach, no more than your own hand can pass through the glass of a screen. So we will build it a world of exactly this shape, small enough to hold whole in the mind.

That world is a maze: a small grid of corridors and walls, one square its entrance, one its way out. Call it the world model, the ground truth of the whole little game, and mark well where it sits, outside the program we are about to write, never once inside it. A single rule gives the maze the whole of its meaning, and everything that follows, every clean step and every door never offered, falls out of it like rain from a single cloud: the model cannot see it. The walls live here, in the world model, and nowhere else, ours to gaze down upon from above and hidden completely from the very thing we will one day set loose inside.

It is always easier to show than to tell, so here is the whole of it at once, ours to take in from the outside, and dark, completely, to the one who must someday cross it.

A four by four grid maze. Open corridors are clean light squares; walls are hatched squares. The entrance is marked S at the top left and the exit E at the bottom right. A dashed frame labels the whole grid as hidden state, the part the model never sees.

Between that blind model and the world stands everything we are now going to build, and the whole of it is made of tools, which are the only door into the world model: nothing reads it, and nothing changes it, except through them. The two directions we found every human tool facing, the seeing and the changing, are the two kinds of tool we are about to build. A query reads the world and leaves it untouched. A mutation changes it. And because every last touch of the world runs through a tool we ourselves wrote, the world model stays deterministic, accountable down to its smallest move: nothing ever happens in it that we did not first make possible. Keep that one division, the seeing from the changing, close at hand; everything we are about to build is built upon it.

A tool is a capability

To bring the model through the maze we must let it act, and yet a model cannot reach into the world and rearrange it as it pleases. The world model changes only through the actions we define and lay within its reach, and the unit of that reach is what Scott Wlaschin calls a capability. A capability is a move with the world already baked inside it: not a description of something the model might like to try, but the deed itself, sealed and ready, that the model may only take or leave. move_east is a capability. To exercise it is not to ask to go east; it is to go east, to change where the explorer stands. The model cannot invent a move the maze does not define, cannot conjure a square out of nothing and step onto it. It can only ever choose among the capabilities it has been given.

To read the world and to change the world are not the same act, and that difference, the very one we drew between the telescope and the lever, now orders everything from here. A query is of the seeing kind. It can show the model where it stands, or what lies just beside it, and in a world as small as this one it costs us nothing at all, for it leaves the world exactly as it found it. A mutation is of the changing kind, and something else entirely. Every single one we hand over is a grant of authority, a standing permission to alter the world in one particular way, and to give the model a mutation is to give it exactly that much power over the state of the world model, no less.

So the question that governs all the rest is not how clever the model is. It is which capabilities the model holds when its turn comes round. And once you see that a capability is a grant of authority, the safe answer almost writes itself: work out, beforehand, the moves that are truly legal from where the model stands, and hand it those, and only those.

Handing over only the doors that exist

And here is what lets us do exactly that: the world model is deterministic. Within its walls, given the state of things, what may happen next is something we can work out beforehand, by our own hand, before the model is asked anything at all. So we settle the exact set of moves that are legal from where it stands, and hand it those and nothing besides. At a junction the thing is plain to the eye: we draw only the doors that genuinely exist and hold them out, and the walls are not refused, you understand, they are simply never offered.

The queries are fixed, the same readings on offer every single turn. The mutations are not. The legal moves are not something the model asks for; they are something we compute, fresh, before its turn has even begun. From the state the world model happens to be in, we work out which moves are legal and call exactly those, and only those, into being, as the mutations the model may reach for this turn. At the entrance nothing lies to the north or the west but the world's own edge, so only move_east and move_south are ever created; the other two simply never come to exist, and so there is nothing there to call. The computation is deliberately, almost proudly, dumb. It does not plan, it does not search, it has no notion where the exit lies. It reads the board and lists what is legal, the way a tic-tac-toe board lists its empty cells, and nothing grander than that.

Because the move was proven legal before it was ever built, taking it cannot fail; there is no branch inside it for a wall. The illegal move is not rejected. It is unrepresentable, which is a stronger and a quieter thing. The mutation that walks into the wall was never created this turn, so the model cannot call it, cannot so much as name it. And notice, the model needs no separate sense to find the open doors: the moves it is handed are the open doors. The shape of its choices, narrowed each turn to only what is legal, is all the sight it needs to know which ways lie open. Stand a moment in its shoes and feel how strange and how safe this is: you reach for the ways forward, and your hand closes only ever on ways that are truly open, because no false one was ever set within your reach. This is the old principle of least authority: you make a system safe not by trusting the caller to behave itself, but by handing it only what it is permitted to do. Wlaschin's counsel to design as if for a malicious caller turns out to describe a language model rather beautifully. Not malicious, no. But unbounded. And best met in the very same way.

The loop that wields the tool

So one whole turn now works, from end to end: see the world through a query, choose among the changes that are real, change the world through the one you chose. The step from here is the simplest imaginable. We do it again. That repetition has a name and a shape, and the name is a loop, and the heart of its turn is simple enough to hold in a single breath. From the state the world model is in, it computes the legal moves and builds them into the mutations for this turn, and hands the model those, and the standing queries, and nothing else. The model chooses one. The loop applies it to the world model, reads the world the move has left behind, and comes around again. The choosing belongs to the model; the working out of what may be chosen belongs to the loop, and here, in a world this small, it is only arithmetic.

This is worth lingering over, for it is exactly where the safety lives. Legality is settled by that computation, deterministic and dumb, before the model is ever consulted. The loop does not ask the model whether a move is allowed; it has already settled the matter, and builds only the moves that pass. In a world this small the test is a single line of arithmetic, and that line, modest as it is, is precisely the part we can truly reason about. Other corners of the loop, as we will see in a moment, do lean on models of their own. This corner never does. The boundary of what the agent may do is drawn by a computation we can read with our own eyes, and the model's judgment is never once permitted to widen it.

A window, and a memory behind it

Now let the loop run, and a trouble appears that a single turn never had. Turn after turn after turn, each one leaves its trace behind: where the explorer stood, the move it chose, the world that move left in its wake. Let it run long enough and that record swells past anything the model can hold before it at once. The context is not endless, you see. It is a window, a fixed number of recent moves, and that bound is the agent's budget for the turn. The window cannot grow without end, so neither can the cost of any single turn, however long the run goes on.

But a window that simply lets its oldest turn drop is a window that marches the explorer straight back into walls it had already learned by heart. So before the oldest turns slip over the edge, we compact them. Compaction is not a single stroke but a little pipeline of filters, each one putting a narrow question to the turns about to be lost. Which of these still bear on the road ahead? Which were made meaningless by where the explorer now stands? Which merely say again what is already known? The questions are strict, but the judgment they call for is not arithmetic, and so the whole pipeline is given to a second model, the compactor, which takes the filters one by one, reads the turns, and rules on what shall survive. What passes is kept. What does not is gently let go.

What survives is folded into a memory, and the memory is kept the same way, by judgment and not by arithmetic. A third model, standing behind the player and the compactor, takes what compaction spared and the memory already held, and draws the two together into the facts worth keeping: the walls discovered, the dead ends sealed, the shape of the path walked so far. It never reads back across the whole long history, only over what is leaving the window and what is already distilled, so its own work stays bounded however long the run goes on. The memory is not a growing heap of all that ever happened. It is a small and careful reading of it, rewritten as the run lengthens so that it stays a handful of facts and never a hill of them, small enough to set back into the player's hands inside the very window it was meant to relieve. And that is what makes the budget bearable: the explorer carries forward what it learned, without dragging behind it every single step that taught the lesson.

Notice how the labor is divided here. The player does none of this. It does not decide what to remember, nor when to compact, nor which turn tumbles out of the window. It only chooses its next move. The remembering and the forgetting are done all around it, by models of their own, so that the player is never asked to keep house over its own memory while it is busy playing. And not one of those models can widen what the player is allowed to do. They shape what it knows; the legal moves, computed turn by turn, still bound what it may attempt. Intelligence is spent freely on context, and never, ever, on the boundary of authority.

The same loop, a different maze

None of this, of course, was ever truly about mazes. Strip the corridors away and look at what remains there on the page: a computation that reads the state of some world and yields up the moves that are legal within it, and a loop that builds those moves, takes the one the model points to, and asks again, holding only a window of recent moves and a memory of what they taught. Change the world entirely and the shape does not so much as shiver.

Now set a repository where the maze once stood. The legal moves are no longer compass directions but the things you may do to a codebase, and which of them exist still depends, exactly as before, on the state of things. Commit only when something is staged, merge only on a clean branch, deploy only when the tests run green, and the model can no sooner deploy a red build than the explorer could step bodily through a wall. The loop keeps its shape without a tremor, and the window and the memory cross over with it: build the legal moves, take the one the model points to, read the world it left behind, ask again, holding in view only the slice of recent work and a curated record of what has been tried, decided, and learned.

But we owe the reader honesty about what the maze made easy. In the maze, legality was a single line of arithmetic we could read with our own eyes, and it was so for one reason only: we had built the world ourselves, and could see the whole of it at once. A repository is neither of those things. It is not a world we own, and never one we see entire. "Is the branch clean" is still cheap and certain, but "are the tests green" is neither, for the only way to know is to run them, long minutes of work for an answer that may already be stale by the time the next move is chosen. The shape of the loop is the easy half, and it does carry over untouched. The computing of what is legal is the hard half, and it is exactly the half the maze handed to us for free. Nearly everything difficult about a real agent lives in that one step, the honest drawing of the legal set in a world too large to hold whole and too alive to wholly trust. The maze was never the whole of the thing. It was the half we had already solved, drawn at last small enough to see.

The runtime is the room

There is a more principled version of this very room, and it goes by a name in the stack I tend to reach for. We have drawn the room's shape here by hand, but it is a shape an effect system will draw for you. A capability becomes a service: something the program may use only because the runtime has chosen to provide it. The agent loop becomes an effect that declares, right there in its very type, the services it leans upon, and refuses, flatly, to run at all until they are supplied.

The runtime is the one that provides them. It reads what an effect requires and hands over exactly that, and nothing more. This is the outer boundary, the question of which capabilities the program may ever touch at all, fixed before it draws breath and far beyond the model's reach to widen. The finer line, which of those capabilities are legal this turn, the loop goes on drawing just as we have drawn it here. The two of them stack, one atop the other: the runtime settles what is possible at all, and the loop settles what is offered now. Least authority stops being a discipline you must remember to keep, and becomes, at the outer edge, a property the types themselves quietly enforce. But that is a post all its own, and the very next one. For now it is enough to notice that the room we raised by hand already has a proper architecture waiting, patient, just beneath the floorboards.

Conclusion

We set out to bring a model through a world it could not see, and the whole of it came down, in the end, to the tools we placed within its reach. A query reads the world and, in a world like this, costs us nothing to grant. A mutation changes it, and so it is a grant of authority; and the safe way to hand one over is to compute, first, the moves that are genuinely legal, and to build only those. The illegal move never has to be refused, because it never becomes a move at all. This bounds what the agent can do, mind you, not whether it does the wise thing: handed only legal moves, it may still reach for a poor one, and choosing well remains forever the model's own work. Around those moves turns a loop, and the loop does not labor alone. Three models sit at its table, each to its own task: one plays and chooses, one compacts the turns as they age, one distills them down into a memory. Yet not a single one of the three may widen what the player is permitted to do, for that boundary is the loop's to draw, and it draws it by plain arithmetic. Intelligence is poured out freely on what the agent knows, in a bounded window and a curated memory, and never once on what it may do. And beneath the whole arrangement a runtime waits, ready to hold this very shape in its types, which is a room we will step into next.

A model is already the cleverest thing in the room, after all. What it lacks is never cleverness, only a world built well enough to hold it, and the right tools laid within its reach. So build the smallest world your model needs, hand it only the moves that truly exist, and let it choose. And then, perhaps, turn the same question gently on yourself. You too move through a world you only ever half see, reaching into it through tools, carrying in your head a model of how it answers back. Which world model are you reasoning from, and which capabilities, truly, are the ones already in your hands? The agent feeling its way through the maze was only ever us, drawn at last small enough to see.