rlm-workflow: Recursive Language Models for coding agents


Installation
npx skills add https://github.com/doubleuuser/rlm-workflow --skill rlm-workflow

skills.sh
https://skills.sh/doubleuuser/rlm-workflow/rlm-workflow

GitHub
https://github.com/doubleuuser/rlm-workflow


Since the Recursive Language Models[0] paper demonstrated a method of increasing effective context length to 10M tokens by using sub-agents to move information from the context window to an information store outside the chat, there have been a number of different takes on how to put this into practice in development workflows. Some even go as far as storing entire session contexts in a database for later retrieval, to preserve the reasoning behind changes that were made.

rlm-workflow is yet another take, this time with a slightly different angle:

Important information like requirements, codebase analysis and implementation plans should not be passed in the chat in the first place. Chat is effectively CLI and should be used for invocations and commands, not for passing information.

rlm-workflow is modelled after a regular kanban workflow, from requirements to implementation plan to testing and manual QA. The workflow is sequential and phased: each phase outputs one markdown doc and takes the previous phases' docs as input. Each phase is gated on fulfilling criteria defined in the previous phase, and at the end of a phase, its output docs are locked.

The user first creates the 00-requirements.md doc in an RLM folder, then invokes the workflow in chat. It then runs until the manual QA stage, where it waits for user approval before continuing. After finishing an RLM run, the agent updates DECISIONS.md, which is a ledger of previously implemented requirements, their whys and whats, and links to the respective RLM docs. It also updates STATE.md, an overview of the app's current state.

To be practical, this is what your repo will look like:

rlm/00-my-first-requirements/
- 00-requirements.md (user-created)
- 01-as-is.md
- 02-to-be.md
- 03-implementation-summary.md
- 04-manual-qa.md (test cases are pre-defined; the user enters pass/fail and notes in the doc)
+ /addenda/ if needed
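
Because each phase leaves exactly one doc behind, the state of a run can be read straight from the folder. A minimal sketch of that idea, assuming the file names listed above; the helper next_phase is hypothetical and not part of rlm-workflow:

```python
from pathlib import Path
from typing import Optional

# Phase docs in the order listed above. The file names come from the post;
# the helper itself is a hypothetical illustration, not rlm-workflow code.
PHASE_DOCS = [
    "00-requirements.md",
    "01-as-is.md",
    "02-to-be.md",
    "03-implementation-summary.md",
    "04-manual-qa.md",
]


def next_phase(run_dir: Path) -> Optional[str]:
    """Return the first phase doc that does not exist yet, or None if all exist."""
    for name in PHASE_DOCS:
        if not (run_dir / name).is_file():
            return name
    return None
```

Calling next_phase(Path("rlm/00-my-first-requirements")) on a folder that only holds 00-requirements.md and 01-as-is.md would return "02-to-be.md", which is all an agent needs to know to resume.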

To summarize:
1. Specs are never passed through the chat, so they do not suffer from context rot.
2. Work is always done based on docs that are locked, so it cannot suffer from degradation.
3. The workflow is self-documenting and easily human-readable, and it can also be used to generate information for non-technical stakeholders.
4. There is no need to index the codebase in a database. The rlm docs provide progressive disclosure and point the model in the right direction, which should significantly reduce token usage.
5. In my simple test, the workflow improved both quality and time to success for complex requirements.

Yes, it is essentially a waterfall workflow but the agent iterates within each phase before passing. Iteration with the user happens in the QA phase, where you will normally discover edge cases etc. You can add new requirements as addenda docs or ask the agent to do so, and it will implement according to the workflow.

What about the chat sessions? Forget about them. Instructions don't matter, only outcomes.


rlm-workflow simulates a standard kanban workflow with distinct phases: requirements, codebase analysis, implementation plan, implementation summary, verification, manual QA of the implementation, and finally an update of the global repo artifacts (STATE.md and DECISIONS.md) that document the codebase.

The benefits of using rlm-workflow for assisted engineering include improved traceability through workflow and global docs, reduced token usage, reduced context rot, improved accuracy and code quality, and improved speed.

[0]: https://arxiv.org/abs/2512.24601

recursive-mode for coding agents

Installation
npx skills add try-works/recursive-mode --skill '*' --full-depth

skills.sh
https://skills.sh/try-works/recursive-mode/recursive-mode

GitHub
https://github.com/try-works/recursive-mode

recursive-mode.dev
For detailed documentation



recursive-mode is a workflow for agentic engineering that I have been using daily and refining over the past four months. It starts from a simple observation: chat context is a poor medium for engineering state. It solves that problem and opens up a few new possibilities for further coding performance improvements by using generative recursion during planning, implementation and verification. It is installable and invokable as a skill, but in practice it is more like a strict inner harness than the open-ended instructions most skills are.

In a typical agentic coding session, requirements, codebase analysis, implementation plans, execution details, test output, and verification all end up in one expanding conversation. This works for smaller pieces of work but breaks down on larger tasks. As the session grows, it becomes harder to tell whether the agent is still working from the original requirements or just following the most recent context. Over time that causes drifting implementations and severe trust issues, as we can see in this Claude Code user’s case (see further discussion on Hacker News):

‘Claude has regressed to the point it cannot be trusted to perform complex engineering.

  1. Ignores instructions
  2. Claims "simplest fixes" that are incorrect
  3. Does the opposite of requested activities
  4. Claims completion against instructions’

Thankfully, we can fix these issues by using better workflows. recursive-mode addresses them by moving the important state into repository docs and advancing work through explicit recursive phases during planning, implementation, and verification.

At a high level, recursive-mode formalizes the workflow most of us already try to follow, whether manually or through skills: define requirements, analyze the codebase, create an implementation plan, implement the plan, and so on. On top of that, it adds recursion at every step.

recursive-mode is sequential and phased. Each phase takes accepted docs from earlier phases as input and produces one locked markdown doc as output. Each phase is gated on criteria defined earlier in the run. Once a phase is accepted, its output is locked. Later phases can extend or correct earlier work through addenda, but they do not silently rewrite prior state.

That constraint changes the workflow in useful ways. It makes the run resumable. It makes the reasoning legible. It prevents later work from quietly changing the original intent. It gives verification phases something stable to verify against.
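
How the lock is enforced is not spelled out here, so the following is only one possible way to picture it: record a content hash when a phase doc is accepted, and later check that the doc still matches. The lockfile and the helpers lock_doc and changed_docs are assumptions made for illustration, not recursive-mode's actual mechanism:

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def digest(path: Path) -> str:
    """SHA-256 of a file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _read_locks(lockfile: Path) -> Dict[str, str]:
    """Load the doc-name -> digest mapping, or an empty one if no lockfile exists."""
    return json.loads(lockfile.read_text()) if lockfile.exists() else {}


def lock_doc(doc: Path, lockfile: Path) -> None:
    """Record the digest of an accepted phase doc in a JSON lockfile (hypothetical)."""
    locks = _read_locks(lockfile)
    locks[doc.name] = digest(doc)
    lockfile.write_text(json.dumps(locks, indent=2, sort_keys=True))


def changed_docs(run_dir: Path, lockfile: Path) -> List[str]:
    """Names of locked docs that are missing or no longer match their digest."""
    return sorted(
        name
        for name, expected in _read_locks(lockfile).items()
        if not (run_dir / name).is_file() or digest(run_dir / name) != expected
    )
```

A verification phase could refuse to start while changed_docs returns anything, which is what makes a silent rewrite of an earlier phase detectable. Corrections would go into a new addendum file rather than into the locked doc.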

How it works
The user starts by creating a 00-requirements.md document inside a run folder, then invokes the recursive-mode skill in chat. Note that chat should be treated as a control surface for invocations and commands, never, ever as a store for plans and tasks. The run then proceeds through its phases until manual QA, where it waits for user approval before continuing.

After the run is complete, the agent updates DECISIONS.md, which acts as a ledger of previously implemented requirements, including why they were implemented, what changed, and links to the relevant run docs. It also updates STATE.md, which provides a current overview of the application.
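
The exact format of a DECISIONS.md entry is not defined in this post. As a rough sketch of the shape described above (why, what, and a link back to the run), assuming a hypothetical Decision record and a hypothetical ledger_entry helper:

```python
from dataclasses import dataclass


@dataclass
class Decision:
    run: str   # run folder name, e.g. "00-my-first-requirements"
    why: str   # why the requirement was implemented
    what: str  # what changed


def ledger_entry(d: Decision, run_root: str = "run") -> str:
    """Render one ledger entry as text, linking back to the run docs.

    The entry layout is an assumption for illustration; run_root is the
    run folder's location relative to DECISIONS.md.
    """
    return (
        f"## {d.run}\n"
        f"- Why: {d.why}\n"
        f"- What: {d.what}\n"
        f"- Docs: [{d.run}]({run_root}/{d.run}/)\n"
    )
```

Appending one such entry per completed run keeps the ledger short while still pointing the model, or a human, at the full run docs when more detail is needed.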

In practice, a repo using recursive-mode will generate this documentation per run:

.recursive/
├── /memory/              # Structured memory bank
├── RECURSIVE.md        # Canonical workflow spec
├── STATE.md              # Current repository state
├── DECISIONS.md         # Decisions ledger
└── run/00-my-first-requirements/                  
    ├── 00-requirements.md          # User-created requirement
    ├── 01-as-is.md                 # Analysis of codebase current state
    ├── 02-to-be.md                 # Implementation plan
    ├── 03-implementation-summary.md  # What was done in practice
    ├── 04-test-summary.md        # Automated test summary
    ├── 05-manual-qa.md           # Test cases
    └── addenda/                  # Addenda docs added as needed


The specific file names are not what matters. What matters is that requirements, codebase analysis, plans, implementation summaries, verification output, and manual QA exist as durable repo artifacts instead of being left inside a volatile chat session. The worktree diffs also combine with this documentation for an even stronger trace of codebase evolution and its reasons.

This gives us a number of concrete benefits:

  • Specs do not suffer context rot.
  • The workflow documents itself. It is readable by both humans and models, and it also produces material that can be reused for summaries and stakeholder communication.
  • There is less need to compensate for a weak workflow by indexing large amounts of chat context into an external database. The recursive docs provide progressive disclosure and direct the model toward the right context at the right time. In practice, this should also reduce token usage compared with less structured workflows.
  • In my testing of the very first version of this workflow, then called rlm-workflow, both speed and code quality increased significantly when building a new project from scratch based on a 700-line requirement. The quality improvements compound and make you faster over time.




Factory.ai’s Missions is similar to recursive-mode in that both approaches treat long-running agentic work as something that should be broken into bounded units and driven by externalized state rather than one growing chat session. The difference is that recursive-mode is repo-native and document-first. It is open source, and it fits directly into a normal repository workflow. It installs and runs as a skill and works for pretty much any agent, IDE or CLI. 

Using recursive-mode also opens up new possibilities:

Create a harness specific to your codebase by using AutoAgent or similar
https://github.com/kevinrgu/autoagent

Finetune or post-train an open model on your codebase using Embarrassingly Simple Self-Distillation https://arxiv.org/abs/2604.01193

Both are possible because recursive-mode leaves behind a comprehensive data set of documentation for each change: run docs are created before, during and after planning, implementation and verification, and worktrees hold the actual code diffs.
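
To make the data set idea concrete, here is a minimal sketch that gathers the markdown docs of each run folder into one JSON record per run. The helpers collect_run and to_jsonl and the record layout are assumptions for illustration; they are not part of recursive-mode, AutoAgent, or the self-distillation paper, and code diffs are left out:

```python
import json
from pathlib import Path
from typing import Dict, List


def collect_run(run_dir: Path) -> Dict[str, str]:
    """Map each markdown doc in a run folder (including addenda) to its text."""
    return {
        p.relative_to(run_dir).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(run_dir.rglob("*.md"))
    }


def to_jsonl(run_dirs: List[Path]) -> str:
    """Serialize runs as JSON Lines: one object per run with its name and docs."""
    lines = [
        json.dumps({"run": d.name, "docs": collect_run(d)}, sort_keys=True)
        for d in run_dirs
    ]
    return "\n".join(lines)
```

Each record then carries the requirement, the analysis, the plan and the outcome for one change, which is the kind of paired before-and-after material a harness builder or fine-tuning pipeline could start from.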

If you found this interesting, follow me on my new Twitter :)

Coding agents and the growing 1% problem

2026-04-19

The mainstreaming of AI workflows will cause a massive hiring boom because the more AI-assisted work you do, the more work you create for yourself.

The “1% problem” with AI is that you can spend a couple of minutes, 30 minutes, or an hour and get something interesting and passable. But in order for you to really share it with other people in your organization and for you to be able to guarantee the quality of the experience, the accuracy of the data and so on, you have to put in orders of magnitude more work, 10x or 100x, than it took to get the first draft out.




To make something out of this that you could publish in your application and share with real end-users you would have to spend 100x more time and effort on it. 

When you can do more with less effort you easily end up doing more and getting exhausted.



The other aspect of the 1% problem is that every feature you launch increases the complexity of your codebase, your product, and even your organization, as someone will at some point need to be able to answer customer questions about it, market it, sell it and so on. You’ll also have to maintain it and develop it further as your codebase evolves. That is at least 10x the effort it took to launch the first version.

If everyone builds their own agentic workspaces and workflows and adds a chatbot or AG-UI to their applications, then someone will need to maintain all of this. That means new hires, which is why AI will result in a massive hiring boom, at least across software and likely across almost every industry, though not at exactly the same time.

This also means that we need to go back to thinking before we build something because otherwise we will either end up with massive debt and increasing hiring needs, or churned users.

Why Chinese AI labs went open and will remain open

2026-04-17

All across the internet there's speculation and confusion about why Chinese labs open source their models, along with claims that they're about to go closed. Chinese labs will remain open, because the reason they went open to begin with is still valid.

In late September of last year, Alibaba hosted their big AI conference, Apsara. I took a look at the main video on YouTube the day after. How many views did it have? I think it didn't break 50 views in 24 hours. The same video from OpenAI or Anthropic would have had at least 100k views and probably much more.

Internet comments say that open sourcing is a national strategy, a loss maker subsidized by the government. On the contrary, it is a commercial strategy and the best strategy available in this industry. 

When it comes to building global businesses, China has two unicorns: DJI and Insta360. No, not Xiaomi, not Lenovo, not Tencent. ByteDance acquired TikTok (musical.ly, with 200M users) and their attempts at building TikTok Shop have been disasters so they don't make it into this list either.

DJI and Insta360 are unicorns because they don't just make the best products in their respective industries, by far, but are also the clear category leaders in the minds of consumers and the clear go-to brands for anyone considering a drone or action camera. They are trusted to have the best products on the market, and the same cannot be said for the other brands I listed.

DJI and Insta360 are successful in part because of clear technical and product vision, and in part because of focused, professional marketing. That marketing is in large part video content on YouTube, on their owned channels and through influencers. YouTube is such an important marketing channel for any business, and especially for Chinese businesses with no presence and few PR contacts abroad, because it gets them into the conversation. It is the beginning of trust-building because known personalities approve of the product.

This is how important YouTube is for Chinese brands:




Above I've talked about hardware products. What about language models? Being part of the conversation is just as important regardless of industry. When OpenAI's launch videos can get 100k+ views by default and Alibaba's get just a handful, it's clear that even a company like Alibaba has no pull outside of China. For MiniMax, Kimi, Z.ai, it is of course even harder. 

So what can they do to be part of the conversation? If Qwen could only be accessed through Alibaba Cloud APIs, why would anyone bother trying it out other than for novelty when they're already satisfied with their GPTs and Claudes?

Open sourcing models is the answer. That's how these labs drive thousands of conversations across YouTube, Reddit and X, and eventually get into the tech media and even mainstream media, despite having had no international marketing teams whatsoever back in 2023-2024.

As a display of this importance, there's even an account on Xiaohongshu tracking metrics like GLM's mentions on r/LocalLLaMA:




Open sourcing models is not a commercial risk because barely anyone can run them locally, few companies have the ability to manage and post-train their own models, and models lose relevance quickly. The real risk that exists is inference providers competing with the labs themselves, but that is being fixed with non-commercial licenses for models in 2026. 

There are additional benefits to open source. Even Google is open sourcing their smaller models, Gemma. The benefit is building affinity between end-users and on-device models, because the future of inference isn't local or cloud, it's hybrid local and cloud. Google would love to be the preference in both cases. 

We are also going to see proprietary open source models released in 2026, in the sense of models with their own memory systems and perhaps recursive capabilities. These have no standard, and every lab would prefer to be the one to define the new standards, like OpenAI and Anthropic have done with inference APIs.

Additionally, we'll also see fine-tuned and post-trained open models, sold by independent labs to both individual and corporate end users in 2026. These help set standards, too.

So in conclusion, there are many commercial benefits to open source, and as long as Chinese labs do not have strong international marketing and sales capabilities within their organizations they will keep open sourcing their models, because there is no other choice. Their business depends on giving models away for free, because open source is like PR, but real.





© Try-Works