Dispatch, Don't Chat: The Fire-and-Forget Model for AI Agents
Most AI products keep you staring at a stream of tokens. The dispatcher model is different: one sentence out, the agent runs in the background, results come back when ready. Why fire-and-forget is the right interaction model for agents on the go.
Eric Shang
Founder, Nexting Inc.

The right way to use a capable AI agent is to dispatch it, not chat with it. Say one sentence, let the agent work in the background, and let the result come back when it is ready — the way you delegate to a competent colleague. The chat-and-stream interface that dominates today trains you to babysit a machine that does not need babysitting. This essay argues that fire-and-forget is the correct interaction model for capable agents, explains how we drifted into chat, and shows what a good dispatcher must do.
The uncomfortable thing you do all day with AI
Watch yourself the next time you use a capable AI model. You type a request. A cursor blinks. Then tokens begin to arrive, one small fragment at a time, and you read them as they land. You are not really reading — you are surveilling. You scan each phrase to check whether the machine is going off the rails, whether it misunderstood, whether you should hit stop and re-prompt. Your eyes are locked to a progress bar made of words. For thirty seconds, or two minutes, or longer, a tool that is supposed to give you back your time is instead consuming your full attention.
This is the default posture of AI in 2026, and almost nobody questions it, because it is the posture every product ships. Open any assistant and the screen is a chat thread. Open any coding tool and a panel streams the agent's thoughts at you in real time. The interaction is synchronous by construction: the human and the machine are pinned to the same clock, and the human waits.
The thesis of this essay is simple and, I think, increasingly obvious: for capable agents that can run for minutes or hours, synchronous chat is the wrong interface. It made sense for a question-answering toy. It makes no sense for a worker. The correct model is the one every functioning organization already uses to coordinate competent people — you dispatch a task and you walk away. We call it dispatch, not chat. Fire-and-forget.
“I just want results. I don't need real-time interaction. I need a tool to dispatch tasks anytime, anywhere.”
That line is from Eric Shang, the founder of Nexting, and it is the cleanest statement of the problem I have found. It sounds almost too plain to be a product thesis. But sit with it. The entire industry has built its interface around real-time interaction, and a large fraction of users do not want real-time interaction. They want the work done. The gap between those two facts is the whole story.
How we ended up chatting with our software
The chat interface did not arrive because someone proved it was the best way to command an intelligent machine. It arrived through a long accident of history, and it stuck because it was easy to ship.
The dream of talking to a computer is old. In 1950 Alan Turing framed the question of machine intelligence as a conversation: could a program converse with a person without being unmasked as a machine? That framing put dialogue at the center of the field before anyone had built anything. Then in 1966 Joseph Weizenbaum wrote ELIZA at MIT, a program that imitated a psychotherapist by pattern-matching the user's words and reflecting them back as questions. ELIZA understood nothing. It was, by Weizenbaum's own description, a caricature. And yet people poured their private anguish into it, convinced something was listening.
That reaction terrified Weizenbaum and should still instruct us. The conversational frame is psychologically sticky. A back-and-forth exchange triggers our social machinery whether or not anyone is home on the other side. Decades of chatbots followed the same groove — PARRY in 1972, A.L.I.C.E. in the late 1990s — all of them conversational, because conversation is what we expected of a thinking machine and conversation is what felt alive.
Then the interface got real, and got frozen
When large language models finally made the conversation genuinely useful, the conversational shell was already the default expectation. ChatGPT shipped as a chat thread because that was the obvious container for a model you talk to, and because token-by-token streaming solved a real product problem: early models were slow, and a response that appears all at once after a long blank pause feels broken, while a response that streams feels responsive and lets you bail early if it starts wrong.
So streaming chat became the norm for sound reasons that applied to a 2023 toy. The interface gave quick feedback, a sense of responsiveness, and the chance to interrupt based on partial output. Those benefits are real. The problem is that we never revisited the decision when the underlying thing changed. The model stopped being a slow autocomplete you supervise and became an agent that can plan, call tools, edit files, and run for an hour. The clothes no longer fit, but we kept wearing them.
The hidden tax of watching a token stream
Streaming feels free. It is not. There is a measurable cognitive cost to reading content that is being pushed at you on the machine's schedule rather than pulled by you on yours, and recent human-computer interaction research has started to quantify it.
A 2025 line of work on cognitive-load-aware streaming makes the mechanism precise. Normal reading is a pull-based activity: you decide when to move your eyes, when to slow down on a hard sentence, when to skim. Token streaming inverts that into push-based delivery: the system decides when each fragment arrives, and you must interpret partial, unfolding content, hold an incomplete picture in working memory, and repeatedly re-orient your attention whenever the generation timing fails to match your reading rhythm. The researchers found that where the stream stutters matters — a pause at a sentence boundary is tolerable, but an interruption mid-phrase disproportionately hurts comprehension and increases effort. A follow-up at the 2026 CHI conference proposed pacing tokens adaptively to the reader's inferred load, which only underscores the point: the naive stream actively fights your brain.
Put plainly: watching a token stream is work. It is low-grade, continuous, attention-shredding work, and you do it many times a day without counting it.
And then the interruption tax on top
The damage does not end when the stream does. The deeper cost is what synchronous waiting does to everything else you were trying to accomplish. Professor Gloria Mark's research at UC Irvine found that after an interruption, a knowledge worker takes an average of about 23 minutes to return to the original task. Other recent measurements pile on: workers toggle between apps on the order of a thousand times a day, and Microsoft's 2025 Work Trend Index reported employees facing interruptions roughly every two minutes during core hours.
A synchronous AI session is, structurally, a self-inflicted interruption. You context-switch out of your real work into the role of stream-supervisor, you sit in that role for the length of the generation, and then you context-switch back — paying the re-focus tax on both ends. Multiply by the number of times a day you ask an AI to do something nontrivial. The tool sold to you as a time-saver can become one of the largest sources of fragmented attention in your day.
- The stream tax: reading push-paced fragments costs more than reading at your own rhythm.
- The vigilance tax: you stay alert to catch the agent going wrong, so you cannot truly look away.
- The switch tax: entering and exiting the waiting state breaks the focus you needed for your actual work.
- The opportunity tax: the minutes spent watching are minutes not spent on the thing only you can do.
How humans actually delegate to other humans
Here is the tell that chat is wrong: we already know how to hand work to a capable agent, and it looks nothing like chat. We do it with each other every day.
When you ask a colleague to draft a memo, you do not pull a chair up behind them and read over their shoulder as they type each word, ready to grab the keyboard the instant a sentence wanders. That would be insulting to them and ruinous to you. You say what you need, you trust them to ask if they get stuck, and you go do something else. The result lands in your inbox later. You review the finished thing, not the keystrokes.
This is asynchronous delegation, and it is the basic coordination pattern of every organization that has ever functioned at scale. Its defining feature is that the delegator's attention is released the moment the task is handed off. The work and the watching are decoupled. The competence of the worker is precisely what earns them the right not to be watched.
“You do not measure a good employee by how entertaining their typing is to watch. You measure them by what shows up when they are done.”
Chat-based AI violates this on purpose. It asks you to supervise the keystrokes of an agent that is, in many domains, already more capable than you at the narrow task at hand. We accept the over-the-shoulder posture from a machine that we would never inflict on a person, partly out of habit and partly because we do not yet trust the machine. But trust is a property you build by letting go in low-stakes cases and seeing good results return — not a property you build by staring harder.
Synchronous versus asynchronous, made concrete
The distinction is not academic. It maps onto a real architectural fork that the agent-tooling world has been racing down throughout 2025 and 2026. On one branch are tools that pair with you in real time. On the other are tools you delegate to and walk away from.
The asynchronous branch has a name now — the background agent. A background agent takes a task, spins up an isolated environment, works autonomously, and returns a finished artifact when it is done. You do not watch it work. OpenAI's Codex cloud mode assigns a task to a sandboxed virtual machine that clones your repository, works on its own, and opens a pull request at the end. Google's Jules follows the same shape: you say “update this project to the next framework version,” and it goes off, plans, edits, runs tests, and comes back with a pull request. Framework vendors have started shipping native fire-and-forget primitives so that any tool can offload heavy work to a background queue and return immediately. The pattern has a clear summary in the practitioner literature: synchronous wins when human attention is locked in and waiting; async wins when the work outlasts the human's active attention.
That last sentence is the design rule of this entire essay. The question is never “is chat good or bad” in the abstract. The question is whether the work outlasts your willingness to sit and watch. For a one-line factual answer, it does not, and chat is fine. For anything a capable agent should be doing — multi-step research, code changes, drafting, planning, errands across tools — the work outlasts your attention, and dispatch wins.
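To make the fire-and-forget primitive mentioned above concrete, here is a minimal Python sketch using the standard `asyncio` library. The names `run_agent`, `dispatch`, and `inbox` are hypothetical, chosen only for illustration and not taken from any real agent framework; the short sleep stands in for minutes or hours of autonomous work.

```python
import asyncio


async def run_agent(task: str) -> str:
    """Stand-in for a long-running agent; the sleep simulates background work."""
    await asyncio.sleep(0.05)
    return f"done: {task}"


def dispatch(task: str, inbox: list[str]) -> asyncio.Task:
    """Start the work and return immediately; the result lands in `inbox` later."""
    job = asyncio.create_task(run_agent(task))
    # The callback fires when the agent finishes, not when the caller is watching.
    job.add_done_callback(lambda finished: inbox.append(finished.result()))
    return job


async def main() -> list[str]:
    inbox: list[str] = []
    job = dispatch("update this project to the next framework version", inbox)
    # The caller is free at this point: nothing is streaming and nothing has arrived.
    assert inbox == []
    await job               # awaited only so this demo can exit cleanly
    await asyncio.sleep(0)  # give the done-callback a turn on the event loop
    return inbox


result_inbox = asyncio.run(main())
```

The point of the sketch is the shape of `dispatch`: it returns before any work is done, and the only thing that re-enters the caller's world is the finished result.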
| Dimension | Chat / stream (synchronous) | Dispatch (fire-and-forget) |
|---|---|---|
| Your attention | Held for the whole generation | Released at the moment of handoff |
| Unit you consume | Tokens as they stream | The finished result |
| Posture | Supervising keystrokes | Reviewing an outcome |
| Time the work can take | As long as you will sit and wait | Minutes to hours, unattended |
| Parallelism | One conversation at a time, mostly | Many tasks dispatched in parallel |
| Failure feel | You catch it live, mid-stream | You catch it at review, with full context |
| Best for | Quick Q&A, tight iterative editing | Real work that outlasts your patience |
When chat is still the right tool
Dispatch is not a universal replacement, and any honest version of this argument has to draw the line carefully. Chat earns its keep in a real and recurring set of situations, and pretending otherwise would be the same overreach this essay is criticizing.
Chat is right when the loop is genuinely tight and your judgment is needed on every turn. If you are workshopping a paragraph and you want to react to each revision, the cost of dispatching and waiting for a round-trip exceeds the cost of watching. Chat is right when you are exploring and you do not yet know what you want — the conversation is how you discover the question, and a fire-and-forget handoff presumes a clear instruction you do not have. Chat is right when stakes are high and reversibility is low, so that catching a wrong turn at token five saves you from a bad outcome at token five hundred. And chat is right for learning, when watching the reasoning unfold is the point, not a tax.
- Tight creative iteration — you react to every revision and the round-trip latency dominates.
- Open-ended exploration — you are using the dialogue to find the question itself.
- High-stakes, low-reversibility steps — catching a mistake live is cheaper than undoing it.
- Learning and understanding — the reasoning trace is the deliverable.
The mistake the industry made was not building chat. Chat is excellent at the things above. The mistake was making chat the only shape, the universal front door, so that even tasks that obviously want delegation get squeezed through a real-time supervision interface. The fix is not to abolish chat. It is to add dispatch as a first-class peer and route each task to the interaction model that fits it.
Why the streaming default quietly shrinks what you ask for
There is a subtler harm in synchronous-by-default than wasted minutes, and it is the one I care most about. When the only way to use an agent is to sit and watch it, you unconsciously restrict yourself to tasks short enough to be worth watching. The interface sets a ceiling on ambition.
Think about what you actually ask a chat assistant to do. You ask for things that finish in a paragraph or two, because anything longer means a longer vigil. You do not ask it to spend forty minutes cross-referencing six sources and producing a memo, because you cannot stand to babysit for forty minutes, and the interface gives you no other way to make the request. So the tool's real capability — sustained, multi-step, autonomous work — goes unused, not because the model cannot do it but because the interface makes asking for it unpleasant.
Dispatch removes the ceiling. When the cost of asking is one sentence and the cost of waiting is zero attention, you start handing over the big, slow, valuable tasks — precisely the ones worth handing over. The interface stops being a leash on what you delegate and starts being a faithful conduit for it. This is the quiet, compounding win of fire-and-forget: it does not just save you the minutes you currently waste, it unlocks the work you currently never start.
You can see the same ceiling effect in how teams adopted background coding agents through 2025 and 2026. As long as the only way to use an agent was an interactive pair-programming session, people used it for small, supervised edits — the equivalent of short chat answers. The moment the deliverable became an asynchronous pull request you could review whenever you got to it, the size of the tasks people handed over jumped. They started asking for whole migrations and multi-file refactors, work that nobody would sit and watch a stream produce. Nothing about the model's capability changed in that transition; only the interface changed, and the interface changed what people dared to ask. That is the pattern to generalize beyond code: when the cost of delegating drops to one sentence and the cost of waiting drops to zero, the ambition of what you delegate rises to meet the real capability of the agent rather than the limit of your patience. The interface had been hiding most of the value the whole time.
Parallelism is the part nobody talks about
There is a structural advantage to dispatch that goes beyond saving the minutes you spend watching, and it is the one that changes the shape of a working day rather than just trimming it. A synchronous chat is a single-threaded relationship. You can really only watch one stream at a time, because watching is a full-attention act — that is the whole problem. So no matter how many agents you theoretically have access to, the chat interface throttles you to one conversation, one task, one wait, in series.
Dispatch is multi-threaded by nature. Because you release your attention at the moment of handoff, there is nothing stopping you from dispatching a second task, and a third, and a tenth, each to a different agent or the same one, all running at once while you go about your day. The dispatcher becomes a small operations center: a dozen things in flight, none of them needing you until they are done, each surfacing only when it has something to show. This is how a manager of competent people works — many tasks delegated, attention spent only on review and unblocking — and it is structurally impossible under a chat interface that demands you supervise each stream in real time.
The practitioner consensus emerging in 2026 names this directly: synchronous tooling caps your throughput at one human-attention-unit, while asynchronous tooling lets the work fan out and run wider than any single person could watch. When you internalize that, the chat thread starts to look less like a feature and more like a bottleneck — a funnel that forces all your parallel intentions through one serial pipe.
- Chat: one task in your attention at a time; throughput capped by your patience.
- Dispatch: many tasks in flight at once; throughput capped only by how fast you can form intentions.
- The shift: your role moves from operator to dispatcher — from doing the watching to deciding what gets handed off and reviewing what comes back.
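The fan-out described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real dispatcher: `run_agent` and `dispatch_many` are hypothetical names, and the sleep durations are made-up stand-ins for tasks of different lengths.

```python
import asyncio


async def run_agent(name: str, seconds: float) -> str:
    """Stand-in for one delegated task; it just sleeps, then reports its name."""
    await asyncio.sleep(seconds)
    return name


async def dispatch_many(jobs: dict[str, float]) -> list[str]:
    """Start every task at once and collect results in the order they finish."""
    tasks = [asyncio.create_task(run_agent(name, secs)) for name, secs in jobs.items()]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
    return finished


# Three tasks handed off in one go; none of them waits for the others.
order = asyncio.run(
    dispatch_many({"refactor": 0.25, "research memo": 0.05, "inbox triage": 0.15})
)
```

Each result surfaces when its own task is done, regardless of the order in which the tasks were handed off, and the total wait is roughly the length of the longest task rather than the sum of all three.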
The economics of attention
It helps to put a rough price on what synchronous AI use costs, because that cost is paid in small, untracked increments and so stays invisible. Suppose you ask a capable agent to do something nontrivial ten times in a working day, a conservative number for anyone who has folded AI into their actual workflow. Suppose each of those tasks streams for an average of two minutes while you watch. That is twenty minutes of pure stream-supervision, which already sounds bad.
But the twenty minutes is the small part. The expensive part is the focus you destroy entering and leaving each of those ten waiting states. If Gloria Mark's roughly 23-minute refocus figure is even directionally right for the deeper of those interruptions, you do not need many of them per day before the real cost dwarfs the visible two-minute waits. The visible cost is the tip; the context-switch cost is the iceberg. This is why people who use AI heavily often report feeling busier and more scattered rather than freer — the tool did save them keystrokes, but it taxed them in the far more expensive currency of sustained attention.
Dispatch attacks the iceberg, not the tip. By releasing your attention at handoff, it removes the entering-and-leaving cost almost entirely. You form the intent, you speak it, and your focus never left the thing you were doing. The two minutes of streaming you would have watched simply do not happen in your timeline at all — they happen in the background, on the agent's clock, and the only moment that re-enters your timeline is the brief, batched moment of reviewing a finished result. The arithmetic is lopsided in dispatch's favor, and it gets more lopsided the more capable the agents become, because more capable agents mean longer tasks, and longer tasks mean larger waits avoided.
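The arithmetic can be written down as a back-of-the-envelope calculation. The ten tasks, the two-minute watch, and the roughly 23-minute refocus figure come from the text above; the assumption that three of the ten interruptions are deep enough to cost a full refocus is purely illustrative, and `daily_attention_cost` is a hypothetical helper.

```python
def daily_attention_cost(tasks_per_day: int, watch_minutes: int,
                         deep_interruptions: int, refocus_minutes: int = 23):
    """Return (visible, hidden) minutes of attention spent per day.

    visible: time spent watching streams.
    hidden:  refocus time lost after the interruptions that break deep focus.
    """
    visible = tasks_per_day * watch_minutes
    hidden = deep_interruptions * refocus_minutes
    return visible, hidden


# Ten two-minute streams per day; ASSUME three of them break deep focus.
visible, hidden = daily_attention_cost(10, 2, deep_interruptions=3)
print(f"visible: {visible} min, hidden: {hidden} min")
```

Under that assumption the hidden cost is 69 minutes against 20 visible ones. The exact ratio depends entirely on the number you plug in for deep interruptions, but even a small number makes the refocus cost the larger term.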
“The cheapest thing an AI can give you is a faster answer. The most valuable thing it can give you is your attention back. Chat optimizes the first; dispatch optimizes the second.”
The one-sentence problem
If the human side of dispatch is “say one sentence and walk away,” then the hardest design problem is making one sentence enough. A sentence is a tiny, lossy, ambiguous thing. “Handle the Anderson follow-up” means nothing without a mountain of context: who Anderson is, what the follow-up is, what handling it looks like, what tone, what deadline, what counts as done. Between two people who share that context, the sentence is sufficient. Between you and a context-free machine, it is hopeless — and the machine's response to a hopeless sentence is to ask clarifying questions, which drags you right back into a synchronous chat.
So the entire viability of dispatch rests on closing the context gap, and there are only a few honest ways to do it.
Memory, not interrogation
The first is durable memory of you — your projects, your people, your prior tasks, your standards — so that a thin sentence can be thickened by what the dispatcher already knows. This is the difference between a new contractor who needs everything spelled out and a long-time colleague who needs three words. The dispatcher has to be the colleague. Every bit of standing context it holds is a clarifying question it does not have to ask, and every avoided question is a step that keeps the interaction asynchronous.
Asking only when it truly matters
The second is disciplined clarification. A good dispatcher does not ask — it acts on its best reading — except when the ambiguity is both genuine and consequential, in which case asking once is far cheaper than delivering the wrong result confidently. The skill is in the threshold: ask too often and you have rebuilt chat; ask too rarely and you get garbage from misread sentences. Calibrating that threshold to your tolerance, and learning it over time, is a real part of what a dispatcher must do.
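One way to picture that threshold is as a rule with two dials. The sketch below is a simplification under assumptions: the `Reading` type, the 0-to-1 scores, and the default floors of 0.5 are all hypothetical, and a real dispatcher would have to estimate and tune them rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    ambiguity: float    # 0..1, how unsure the dispatcher is about the intent
    consequence: float  # 0..1, how costly a wrong result would be


def should_ask(reading: Reading, ambiguity_floor: float = 0.5,
               consequence_floor: float = 0.5) -> bool:
    """Ask only when the ambiguity is BOTH genuine and consequential."""
    return (reading.ambiguity >= ambiguity_floor
            and reading.consequence >= consequence_floor)


unclear_and_costly = should_ask(Reading(ambiguity=0.8, consequence=0.9))
unclear_but_cheap = should_ask(Reading(ambiguity=0.8, consequence=0.1))
clear_but_costly = should_ask(Reading(ambiguity=0.1, consequence=0.9))
```

Only the first case triggers a question. Lowering the floors rebuilds chat; raising them too far produces confident wrong results, which is why the floors are the part worth learning per user.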
Reviewable, reversible output
The third is to make the output safe to be wrong. If a misread sentence produces a draft you can glance at and discard, the cost of ambiguity is small and you can dispatch freely. If it produces an irreversible action, the cost is large and you are forced back toward supervision. So a dispatcher should bias, wherever it can, toward proposing reviewable artifacts over taking irreversible actions — because reviewability is what lets you forgive the inevitable imperfection of one-sentence instructions.
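The same bias can be expressed as a small rule. This is a sketch, assuming each action arrives already tagged as reversible or not; the `Action` type and the `plan_output` function are hypothetical, and deciding reversibility is the hard part that the sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    description: str
    reversible: bool


def plan_output(action: Action) -> str:
    """Carry out what can be undone; turn everything else into a proposal."""
    if action.reversible:
        return f"EXECUTE: {action.description}"
    return f"PROPOSE FOR REVIEW: {action.description}"


draft = plan_output(Action("save a draft reply to Anderson", reversible=True))
send = plan_output(Action("send the reply to Anderson", reversible=False))
```

A misread sentence then costs a discarded draft rather than a sent message, which is what keeps one-sentence dispatch affordable.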
Solve the one-sentence problem and dispatch works. Fail to solve it and dispatch silently degrades back into chat, one clarifying question at a time. This is the make-or-break of the whole model, and it is mostly a problem of context and trust rather than of raw model capability.
What dispatch does to your relationship with the work
There is a softer point underneath all the productivity arithmetic, and I think it is the one that actually matters to how a day feels. The chat-and-stream interface puts you in a posture of supervision toward your tools. You are the overseer, eyes on the worker, hand near the stop button. That posture is subtly exhausting and subtly diminishing — you spend your day watching a machine do things you have decided not to do yourself, which is neither the satisfaction of doing the work nor the freedom of being done with it.
Dispatch puts you in a different posture entirely: the posture of someone who decides what should happen and then trusts it to happen. You spend your attention on the part that is irreducibly yours — judgment, intent, taste, the choice of what is worth doing — and you let the execution fall away into the background where it belongs. This is closer to how a good leader relates to a good team, and it feels qualitatively better than supervision. You are not babysitting a machine. You are directing a small operation and getting on with your life.
I think a lot of the low-grade unease people feel about AI in their daily work comes from the supervision posture rather than from the AI itself. Watching a machine type all day is a strange and slightly hollow way to spend a mind. Saying what you want and then having it quietly turn out to be done is something else — it feels like leverage rather than surveillance. The interaction model, not the model weights, is what determines which of those two experiences you have.
The anatomy of a good dispatcher
If dispatch is the right model, then the product question becomes: what does a good dispatcher have to do? Saying “fire and forget” is easy. Building something you can actually trust to fire and forget is hard, because the moment you stop watching, every weakness in the system becomes invisible to you until the result lands — or fails to. A dispatcher you can trust has four non-negotiable properties.
1. It knows you
Delegation works between people because of shared context. You can say three words to a good colleague because they know the project, your standards, and what “done” means to you. A dispatcher with no memory of you forces you back into the long, explicit, supervised prompt — which is to say, back into chat. A dispatcher that knows your projects, your preferences, your people, and your history can turn one ambiguous sentence into a correctly-scoped task. Context is what makes brevity safe.
2. It routes to the right agent
One sentence is not one task. “Push the fix and tell the team” is a coding job and a messaging job. A dispatcher's real intelligence is in the routing: parsing intent, choosing which agent or tool should own each piece, and handing each piece to the surface that does it best. The dispatcher is not the worker. It is the switchboard between your intent and the specialized agents you already run.
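A toy version of that routing step is sketched below. It is deliberately naive and rests on assumptions: the `ROUTES` keyword table and the `route` function are hypothetical, and a real dispatcher would parse intent with a model rather than with keyword matching.

```python
import re

# Hypothetical keyword table mapping an agent type to trigger words.
ROUTES = {
    "coding": ("push", "fix", "refactor", "deploy"),
    "messaging": ("tell", "message", "notify", "email"),
}


def route(sentence: str) -> list[tuple[str, str]]:
    """Split one sentence into clauses and assign each clause an owner."""
    clauses = [c.strip() for c in re.split(r"\band\b|,|;", sentence) if c.strip()]
    plan = []
    for clause in clauses:
        words = clause.lower().split()
        owner = next(
            (agent for agent, keys in ROUTES.items() if any(w in keys for w in words)),
            "unrouted",  # nothing matched, so a real system would need to ask
        )
        plan.append((owner, clause))
    return plan


plan = route("Push the fix and tell the team")
```

Here `plan` holds two pieces of work with two different owners, which is the minimum a dispatcher has to get right before any agent starts.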
3. It delivers reliably
Fire-and-forget is a broken promise if the result silently fails to come back. The single hardest engineering requirement of a dispatcher is delivery you can count on without looking. The result has to reach you when it is ready, through whatever channel you are actually reachable on, even though you have long since walked away and forgotten the task exists. If you have to remember to go check, it was never fire-and-forget — it was just chat with extra steps.
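In code, "delivery you can count on" usually means retries plus fallback channels. The sketch below assumes each channel is a plain function that returns `True` on success or raises `ConnectionError` on failure; `deliver`, `flaky_push`, and `email` are hypothetical names used only for this illustration.

```python
from typing import Callable


def deliver(result: str, channels: list[Callable[[str], bool]],
            attempts_per_channel: int = 2) -> bool:
    """Try each channel in order, retrying a few times, until one succeeds."""
    for send in channels:
        for _ in range(attempts_per_channel):
            try:
                if send(result):
                    return True
            except ConnectionError:
                continue  # transient failure: retry, then fall through to next channel
    return False  # every channel failed; a real system must surface this loudly


delivered_to: list[tuple[str, str]] = []


def flaky_push(message: str) -> bool:
    raise ConnectionError("phone unreachable")


def email(message: str) -> bool:
    delivered_to.append(("email", message))
    return True


ok = deliver("memo ready", [flaky_push, email])
```

The push channel fails on both attempts and the result still arrives by email. The `False` branch matters as much as the happy path: a delivery that fails silently is exactly the broken promise described above.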
4. It works when you are not at a screen
The whole premise is that you dispatch and go live your life. That means the dispatcher cannot assume you are sitting at a keyboard, because the entire value is that you are not. It has to accept the task and deliver the result while you are walking, driving, cooking, in a meeting, or carrying a phone that is locked in your pocket. A dispatcher chained to a desk has quietly re-imposed the synchrony it claimed to remove.
| Requirement | What breaks without it |
|---|---|
| Knows you | You fall back to long supervised prompts — i.e. chat |
| Routes correctly | One sentence does the wrong job, or only half of it |
| Delivers reliably | Results vanish; you babysit to be safe |
| Works off-screen | You are chained to a desk — synchrony returns |
Why the dispatcher wants to be worn
Trace those four requirements to their conclusion and you arrive somewhere specific. A dispatcher must capture intent the instant you have it, must not require a screen, and must deliver results while you move through the world. The device that fits that description is not an app you open. It is something you wear.
Consider the friction of dispatching from a phone. You feel the impulse to hand off a task. Now you have to take the phone out, unlock it, find the app, wait for it to load, tap into a thread, and type or dictate — and somewhere in that sequence the impulse decays, or you get pulled into the seven other things a phone shows you the moment it lights up. The phone is a context-switching machine. Routing your one-sentence dispatch through it reintroduces exactly the attention fragmentation that dispatch was meant to eliminate.
A worn dispatcher collapses that to a single act. You raise your hand or tap once, you say the sentence, you are done. No screen, no unlock, no app, no feed. The intent goes out the moment you have it, while it is still fully formed, and your attention never leaves what you were doing. The body is the lowest-friction surface there is for issuing a command, because it is always already present. This is the physical argument for the wearable agent dispatcher, and it falls directly out of the interaction model rather than being bolted on as a gadget gimmick.
How Nexting implements dispatch
This is the model Nexting is built around, so let me be concrete about what it actually does — and careful to claim only what is real. Nexting is a wearable agent dispatcher: a small device you wear that lets you talk to your own AI agents anywhere, with no phone in hand and no app to open.
The interaction is the thesis made physical. You wear the device. You say one line. The agent runs in the background. The result comes back when it is ready — and it comes back even while your phone is locked or you have walked away from it. That is the entire loop, and the design discipline is in refusing to add a token stream you are expected to watch. Dispatch, then go.
It dispatches to your own agents
Nexting does not try to be the smart one. It is the switchboard. It connects to the capable agents you already run — with deep integration for Claude Code, OpenClaw, and Codex — and routes your spoken intent to them. This is bring-your-own-agent, and it is free: you connect the agents you own rather than renting a single proprietary assistant locked inside the device. For people who would rather not run their own, there is an optional managed tier, Nexting Pro, with hosted models — but the default posture is that the intelligence is yours and the device is the dispatcher.
It delivers while you are gone
The hard requirement — reliable delivery off-screen — is the part most easily under-built, so it is worth stating plainly: results are pushed back to you when the work is done, even if your phone is locked or in your pocket. You are not asked to remember the task and go check on it. The point of firing and forgetting is that you genuinely get to forget, and the result finds you. You can even reach a long-running session you left on your laptop — for instance, a running Claude Code session — and steer it from your pocket without sitting back down at the desk.
It respects that an always-on device hears your life
A wearable you can talk to anywhere only works if privacy is real, not asserted. In bring-your-own-agent modes the agent runs on your own device with your keys, and the system is end-to-end encrypted by default — the cloud relays ciphertext rather than reading your content. The managed Pro tier necessarily involves the cloud to run hosted models, and we say so rather than pretending otherwise. Across both, the policy is the same: no training on your data, no selling it, no sharing it, and deletion on request. The honest caveat belongs here too — a managed model you do not host is a different trust posture than an agent running on your own hardware, and the product is built to let you choose which one you want.
What the form factor actually is
Two shapes carry the same dispatcher idea, and I will give you only the numbers I can stand behind. Nexting PIN is shipping now at $129, a small device that pins to your collar, with free worldwide shipping from Shenzhen. Nexting Ring is the flagship, currently in private beta — I am not going to quote a price or a date for it, because there is not a public one to quote, and inventing one would be exactly the kind of unearned claim this essay argues against.
The hardware is deliberately not the headline. The device is the means; the dispatch model is the point. Nexting's current hardware is a Co-Builder Edition, 3D-printed today, which is an honest description of an early-stage device built in public rather than a polished mass-market gadget pretending to be finished. The features that matter for dispatch are the ones that serve the loop: speak-to-text capture, schedules and reminders, iPhone integration, a Feishu and Lark voice bridge, and a skills ecosystem so the agents you dispatch to can grow new capabilities over time.
“The device should disappear. What you should feel is that you had a thought, you said it, and later the thing was done.”
The objections, taken seriously
A thesis is only worth as much as its handling of the strongest counterarguments, so here are the real ones.
“If I do not watch, the agent will go wrong and I will not catch it.”
This is the honest fear, and it is correct in proportion to how much you trust the agent. The answer is not to watch forever; it is to dispatch the class of tasks where a wrong result is reviewable and reversible, build trust as good results return, and reserve live supervision for the genuinely irreversible steps. You already do this with people. You do not read a trusted colleague's every keystroke, but you do review a contract before it is signed. Dispatch does not abolish review — it moves review from the keystrokes to the outcome, which is where review belongs.
“Streaming makes it feel responsive. Dispatch will feel dead.”
Responsiveness and synchrony are not the same thing. A dispatcher can acknowledge instantly — yes, I heard you, I am on it — which is all the responsiveness the moment of handoff actually requires. The streaming sensation of liveness is a substitute for trust, and like most substitutes it is worse than the real thing. What feels genuinely good is not watching a machine type; it is the result arriving while you were busy living, the small pleasant surprise of work that finished itself.
“This only matters for engineers running coding agents.”
Coding agents are where async went mainstream first, because the deliverable — a pull request — is so clean to review asynchronously. But the model generalizes to any task with a reviewable artifact: a research summary, a drafted message, a booked appointment, a triaged inbox, a scheduled reminder. Anywhere the work outlasts your patience and the result can be checked after the fact, dispatch beats chat. That is most of the useful work, not a niche.
Where this goes
The industry is converging on this from two directions at once. From the infrastructure side, background agents, durable long-running execution, pause-and-resume, and native fire-and-forget primitives are showing up across the major agent frameworks through 2025 and 2026 — the plumbing for asynchronous delegation is being laid as fast as anyone can pour it. From the interface side, the chat window is starting to feel, for the first time, like a legacy default rather than the obvious answer. When the plumbing for async exists and the synchronous interface starts to chafe, the front door changes. It is changing now.
My prediction is that the dominant way people invoke capable agents a few years from now will not be a chat thread. It will be a dispatch: a short instruction issued from wherever you happen to be, to an agent that knows you, that routes the work correctly, that runs while you do something else, and that delivers the result reliably to wherever you are reachable. The chat thread will survive for what it is genuinely good at — tight iteration, exploration, learning — and it will stop being the place you go to get real work done.
The deepest reason is not efficiency, though the efficiency is real. It is that delegation is how humans have always coordinated competence, and we are finally building machines competent enough to delegate to. The right interface for a capable worker is the one we have used for capable workers all along: tell them what you need, trust them to ask if they are stuck, and go live your life. Dispatch, don't chat. Say the sentence, and let the work find its way back to you.
That is the whole idea, and it is small enough to wear.