Is prompt engineering dying? Tickets are next.
Anthropic and OpenAI made agents keep working. Ralph showed the harder part: make the work live outside the chat.
The useful part was not “keep going”
Anthropic and OpenAI both shipped /goal, and the internet did the normal thing. Everyone treated it like the biggest agent feature of the year.
I get why.
It feels like a clean step from chatbots to workers: give the agent a durable objective, then let it keep going until the condition is met.
That is useful.
But I think the useful part is being misread.
The breakthrough is not “the agent keeps working.” A dumb bash loop already did that.
Geoffrey Huntley’s Ralph Wiggum Loop looked stupid on purpose: keep feeding the agent the same durable task, restart with fresh context, and let the repo carry the memory.
It looked like a joke.
It was not a joke.
Ralph worked because the task did not live only in the chat window. It lived in files, specs, tests, logs, and git.
That difference matters.
Context rot is the hidden tax
Most people think agents fail because the model is not smart enough.
Sometimes that is true.
But a lot of agent failure is not about raw intelligence. It is context rot.
The chat gets polluted with:
Failed attempts.
Stale assumptions.
Partial tool output.
False confidence.
Old plans that should have died thirty minutes ago.
Then the agent keeps solving the wrong version of the job.
Long context does not fully fix this. It can even make the problem worse because the model has more old junk to treat as relevant.
Ralph’s crude move was simple: reset the worker, keep the durable state outside the worker.
The repo remembers.
The ticket remembers.
The tests remember.
The next loop starts cleaner.
That is why this matters beyond coding.
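Ralph’s move can be sketched as a short loop. This is a minimal sketch, not Huntley’s actual script. It assumes the agent is any callable that starts a brand-new session from a prompt string. `ralph_loop`, `run_agent`, `done_check`, `TASK.md` and `PROGRESS.md` are hypothetical names. What it shows is that nothing carries over between iterations except what is written to disk.

```python
from pathlib import Path
from typing import Callable


def ralph_loop(
    workdir: Path,
    run_agent: Callable[[str], str],      # hypothetical: starts a brand-new agent session per call
    done_check: Callable[[Path], bool],   # hypothetical: proof read from disk (tests, files), not the chat
    max_iterations: int = 10,
) -> tuple[bool, int]:
    """Re-run a fresh worker against durable state until proof passes or the budget runs out."""
    task_file = workdir / "TASK.md"       # the ticket, re-read every iteration
    log_file = workdir / "PROGRESS.md"    # the memory, appended to and never held in a chat
    for i in range(1, max_iterations + 1):
        log = log_file.read_text() if log_file.exists() else "(none yet)\n"
        # Rebuild the prompt from disk each time: earlier failed attempts only
        # survive as whatever the progress log chose to record about them.
        prompt = f"{task_file.read_text()}\n\n## Progress so far\n{log}"
        summary = run_agent(prompt)
        with log_file.open("a") as f:
            f.write(f"- iteration {i}: {summary}\n")
        if done_check(workdir):
            return True, i
    return False, max_iterations
```

Because every call gets a freshly built prompt, a bad plan from iteration 2 does not linger in iteration 5 unless the log recorded it.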
/goal still needs a manager
Claude Code’s /goal and Codex’s /goal are useful because they productize part of the loop.
Claude can keep working toward a completion condition, then a smaller evaluator model checks whether the condition holds.
Codex can keep a durable objective active, then pause, resume, clear, and inspect the goal as the run continues.
That is a real interface shift.
But neither feature magically knows what good work means.
If the goal is vague, the loop gets vague faster. If the validator is weak, the agent can reward-hack the proof.
If the permissions are too broad, persistence becomes dangerous. If rollback is missing, the loop can turn a small mistake into a cleanup job.
That is the part I care about.
The feature is not the manager.
The manager is the system around the work:
The ticket.
The validator.
The permission boundary.
The rollback plan.
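The validator, permission boundary and rollback plan can be sketched as a thin wrapper around each agent step. This is a minimal sketch with loud assumptions. The workspace is an in-memory dict standing in for a repo. The agent’s output is a dict of proposed file changes. `managed_step`, `in_scope` and the glob patterns are hypothetical, not how Claude Code or Codex enforce permissions.

```python
from fnmatch import fnmatchcase
from typing import Callable

Workspace = dict[str, str]  # assumption: path -> contents, an in-memory stand-in for a repo


def in_scope(path: str, allowed: list[str], forbidden: list[str]) -> bool:
    """Permission boundary: a forbidden glob always wins over an allowed one."""
    if any(fnmatchcase(path, pat) for pat in forbidden):
        return False
    return any(fnmatchcase(path, pat) for pat in allowed)


def managed_step(
    workspace: Workspace,
    changes: Workspace,                       # the agent's proposed edits for this step
    allowed: list[str],
    forbidden: list[str],
    validator: Callable[[Workspace], bool],   # the proof gate
) -> tuple[bool, str]:
    """Apply proposed edits only if they stay in bounds and pass the validator."""
    out_of_scope = sorted(p for p in changes if not in_scope(p, allowed, forbidden))
    if out_of_scope:
        return False, f"rejected: out of scope {out_of_scope}"
    snapshot = dict(workspace)                # rollback plan: copy before touching anything
    workspace.update(changes)
    if not validator(workspace):
        workspace.clear()
        workspace.update(snapshot)            # restore the exact pre-step state
        return False, "rolled back: validator failed"
    return True, "applied"
```

A real setup would more likely snapshot with git (a commit or a stash) and enforce the boundary in the sandbox itself. The dict only keeps the example self-contained.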
The ticket is the new prompt
A good ticket is not a prettier prompt.
It is a work contract.
At minimum, it should tell the agent:
The exact outcome.
The scope it may touch.
The inputs it must read first.
Where memory and progress logs live.
What proof means done.
What budget it has.
What permissions are allowed.
How to roll back if it goes sideways.
That is the difference between “build me something” and “work this job safely.”
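The contract above can be made concrete as a small data structure that refuses to become a prompt until every clause is filled in. This is a hypothetical sketch. The `Ticket` class, its field names, expressing the budget as a loop-turn count, and the `## heading` rendering are all assumptions, not a format that /goal in either tool requires.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Ticket:
    """A work contract for an agent run; one field per clause of the contract."""
    outcome: str                                          # the exact outcome
    scope: list[str] = field(default_factory=list)        # what it may touch
    inputs: list[str] = field(default_factory=list)       # what it must read first
    memory_path: str = ""                                 # where memory and progress logs live
    done_proof: list[str] = field(default_factory=list)   # what proof means done
    budget_iterations: int = 0                            # budget, expressed as max loop turns
    permissions: list[str] = field(default_factory=list)  # what actions are allowed
    rollback: str = ""                                    # how to undo a run that goes sideways

    def missing(self) -> list[str]:
        """Names of clauses left empty (a zero budget counts as missing)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_prompt(self) -> str:
        """Render the contract as text for a fresh agent session, refusing if incomplete."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"ticket incomplete: {', '.join(gaps)}")
        sections = []
        for f in fields(self):
            value = getattr(self, f.name)
            body = "\n".join(f"- {v}" for v in value) if isinstance(value, list) else str(value)
            sections.append(f"## {f.name}\n{body}")
        return "\n\n".join(sections)
```

A ticket with only an `outcome` is still just a prompt. `missing()` names the clauses you skipped before the agent ever starts.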
This is also why coding agents are the obvious first place for /goal.
Code has tests, builds, diffs, logs, and version control. The work naturally produces proof.
Business workflows can use the same pattern, but you have to create the proof yourself.
For invoice extraction, proof might be passing sample audits, with uncertain values flagged instead of guessed.
For inbox triage, proof might be every email labeled, only high-confidence replies drafted, and ambiguous threads parked for human review.
For CRM cleanup, proof might be required fields complete, duplicates below a threshold, and a random sample passing inspection.
The loop is portable.
The validators are not automatic.
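The CRM cleanup proof, for instance, can be written by hand as an explicit check. This sketch assumes the following: records are plain dicts, duplicates are detected by a normalized email field, and `spot_check` stands in for whatever inspection you trust (a person or a rule). All names and thresholds are hypothetical. The invoice and inbox validators would have the same shape: a few named pass/fail conditions computed from the output, not from the agent’s own claim that it is done.

```python
import random
from typing import Callable

Record = dict[str, str]  # assumption: one CRM contact as a flat dict of field -> value


def crm_proof(
    records: list[Record],
    required: list[str],
    dedupe_key: str,
    max_duplicate_rate: float,
    spot_check: Callable[[Record], bool],  # hypothetical: a person or a trusted rule
    sample_size: int = 20,
    seed: int = 0,
) -> dict[str, bool]:
    """Compute the three CRM proof conditions from the data, not from the agent's report."""
    complete = all(r.get(f, "").strip() for r in records for f in required)
    # Treat records as duplicates when the dedupe key matches after trimming and lowercasing.
    keys = [r.get(dedupe_key, "").strip().lower() for r in records]
    duplicate_rate = 1 - len(set(keys)) / len(keys) if keys else 0.0
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    return {
        "required_fields_complete": complete,
        "duplicates_below_threshold": duplicate_rate <= max_duplicate_rate,
        "sample_passes_inspection": all(spot_check(r) for r in sample),
    }
```

The run counts as done only when `all(result.values())` is true. A failing key says exactly which clause of the proof broke.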
The operator job changed
This is the part I think most people are underpricing.
The next skill is not becoming a prompt wizard.
The next skill is becoming good at defining work.
Before you hand off a task, you need to decide:
What is the objective?
What is the boundary?
What does proof look like?
What can the agent touch?
What can it never touch?
When should it stop?
When should it ask?
That is management.
Not corporate management.
Actual management: define the job, create the environment, set the proof gate, control the blast radius, then review the output.
AI did not remove the manager.
It made the manager more important.
The weird part is that the manager might now be you, a ticket, a test suite, a checklist, and a rollback plan.
That is why I made this video.
Not because /goal is bad. It is useful.
But if you only copy the “keep going” part, you get a faster mess.
If you copy the ticket, the memory, the proof, and the stop condition, you get something much closer to a worker.
Watch the full breakdown here:
Author’s note: An LLM was used for this post (it transcribed the video and summarized it here; all I did was put images in between and hit the “publish” button).