Networks of Workspaces: A user-friendly approach to capability amplification
The following is our best guess for a practical system that might be scalable with respect to human work, and thus might be able to solve the evaluation tasks using only short-term contributions.
This page will only make sense if you've read about the corresponding features on the taxonomy of mechanisms page.
Proposed system
Stage 1: Recursion + pointers + edits + persistence (+ caching)
To solve any nontrivial task, we need to instantiate more than one agent. Recursion and iteration are the natural candidates. It is easier to simulate iteration via recursion than vice versa, so choosing recursion seems more user-friendly.
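To illustrate why recursion can stand in for iteration, here is a minimal sketch in Python. The `iterate` helper and its parameter names are hypothetical, not part of the proposed system; each recursive call plays the role of one agent that does a single unit of work and hands the remainder to a sub-agent.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def iterate(step: Callable[[S], S], done: Callable[[S], bool], state: S) -> S:
    """Run `step` until `done` holds, using only recursive calls.

    Each call stands in for one agent: it does a single unit of work and
    delegates everything that remains to a fresh sub-agent.
    """
    if done(state):
        return state
    return iterate(step, done, step(state))


# Example: sum the numbers 1..5 without a loop.
# The state is (next number, running total).
result = iterate(
    step=lambda s: (s[0] + 1, s[1] + s[0]),
    done=lambda s: s[0] > 5,
    state=(1, 0),
)
print(result)  # (6, 15)
```

Going the other way, simulating recursion with a loop, would require each agent to manage an explicit stack, which is the kind of bookkeeping the choice of recursion avoids.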
We also need to handle large data structures, such as books, both at the top level and as intermediate structures that agents create and pass around, so we need pointers in some form. Given pointers, caching is an easy way to reduce human labor.
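One way pointers and caching might fit together is sketched below. The `Store` and `CachedAgent` classes, the hash-based pointer format, and the choice to key the cache on the workspace text are all assumptions made for illustration, not a description of an existing implementation.

```python
import hashlib
from typing import Callable, Dict


class Store:
    """Content-addressed store: a pointer is a short hash of the content."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, content: str) -> str:
        pointer = "#" + hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
        self._data[pointer] = content
        return pointer

    def get(self, pointer: str) -> str:
        return self._data[pointer]


class CachedAgent:
    """Wraps a (human) agent and reuses answers for identical workspaces."""

    def __init__(self, agent: Callable[[str], str]) -> None:
        self._agent = agent
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of times the underlying agent was consulted

    def ask(self, workspace: str) -> str:
        if workspace not in self._cache:
            self.calls += 1
            self._cache[workspace] = self._agent(workspace)
        return self._cache[workspace]


store = Store()
book = store.put("A very long book. " * 1000)

# The workspace text mentions only the pointer, so it stays short, and
# identical workspaces hit the cache no matter how large the book is.
agent = CachedAgent(lambda ws: "Delegate: summarize each chapter.")
question = f"Summarize the book at {book}"
first = agent.ask(question)
second = agent.ask(question)
print(agent.calls)  # 1
```

Because the agent sees only the pointer, the second request is answered from the cache without any human work.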
Recursive Q&A without persistence is challenging (in my experience), so I’d introduce edits and persistence to reduce the responsibility of each individual agent.
Stage 2: Interaction via indirection
We want the system to interact with the world, especially through dialog, e.g. to acquire the personal background needed to solve cost-benefit analysis tasks. Given that we have edits and persistence, it seems natural to implement interaction via indirection, with dependency tracking to propagate changes in the external context.
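A minimal sketch of what indirection with dependency tracking could look like, under assumptions of my own: external values live behind named pointers, every read is recorded, and a change marks the reading workspaces as stale. The `World` class, its methods, and the workspace names are hypothetical.

```python
from typing import Dict, Set


class World:
    """External values live behind pointers; readers are tracked per pointer."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._readers: Dict[str, Set[str]] = {}
        self.stale: Set[str] = set()  # workspaces that need to be revisited

    def update(self, pointer: str, value: str) -> None:
        changed = self._values.get(pointer) != value
        self._values[pointer] = value
        if changed:
            # Everything that read the old value may now be out of date.
            self.stale |= self._readers.get(pointer, set())

    def read(self, pointer: str, workspace: str) -> str:
        # Record the dependency before handing out the value.
        self._readers.setdefault(pointer, set()).add(workspace)
        return self._values[pointer]

    def mark_fresh(self, workspace: str) -> None:
        self.stale.discard(workspace)


world = World()
world.update("user_reply", "(no reply yet)")
world.read("user_reply", workspace="cost_benefit_root")
world.read("user_reply", workspace="estimate_commute_cost")

# The user answers later; both workspaces that read the old value go stale.
world.update("user_reply", "I commute 40 minutes each way.")
print(sorted(world.stale))  # ['cost_benefit_root', 'estimate_commute_cost']
```

With persistence, a stale workspace can simply be reopened and edited rather than recomputed from scratch.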
Stage 3: Reflection
For solving large tasks, I suspect that it is useful for the system to be able to introspect on its computations. However, it is an empirical question (1) how large computations can become before something like this technique is necessary and (2) how large computations need to be to implement reasoning about computations. If (2) is larger than (1), we are in trouble, since implementing reflection would itself require computations larger than we can manage without it.
What's not included?
Internal dialog
Internal dialog clashes with edits—they can fill similar roles, and updating dependencies through a dialog seems like a hassle.
It also doesn’t seem to buy much if pointers to data are available. Given their time constraints, individual agents can’t build up much internal state anyway.
Meta-execution
In the long run, we might want meta-execution. Without it, caching can never result in full automation of responses in new contexts. It also seems worthwhile to investigate how meta-execution compares to or interacts with reflection as an approach for thinking about computation.
However, it seems that meta-execution requires a lot of work to get off the ground. Before we reach the stage where it can do anything without human involvement, there is a long stretch where for each object-level action, we have to do a large amount of meta-level work. And for this meta-level thinking to work out well, we need our basic system to work fairly smoothly (since this is what meta-execution uses to decide on an action). It seems easier to first iterate on the basic system in more applied domains.
Once the basic system is in place and working, it is always possible to switch from H being involved in object-level actions to the meta-execution setting where H is only indirectly involved by choosing what to do on the meta-level. It doesn’t seem that much is lost by delaying this step.
Open questions
How should budgets work?
I describe an option under persistence, but it is not fully specified, and even for the parts that are, I’m not confident that it is the right choice.
How should we structure the content of workspaces?
My guess is that we’d want to use free-form text content with a flexible markup language (perhaps GitHub Flavored Markdown). That leaves open how to integrate:
- The instantiation of sub-workspaces, the associated budget control, and the responses returned from such workspaces.
- Hiding and revealing content behind pointers. For hiding, we could automatically generate a variable name for each unit of content in the current workspace (each paragraph, list, sub-list, list item, link, etc.), so that it can be referenced in messages to other workspaces.
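The automatic naming of content units could be sketched as follows. This is a simplified illustration, not a settled design: it handles only paragraphs and flat `- ` list items (no sub-lists or links), and the `$1`, `$2`, ... naming scheme and the `name_units` and `render` helpers are hypothetical.

```python
import re
from typing import Dict, Set


def name_units(text: str) -> Dict[str, str]:
    """Assign a generated name ($1, $2, ...) to each paragraph and list item."""
    units: Dict[str, str] = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if all(line.lstrip().startswith("- ") for line in lines):
            pieces = [line.lstrip()[2:] for line in lines]  # one unit per item
        else:
            pieces = [" ".join(line.strip() for line in lines)]
        for piece in pieces:
            units[f"${len(units) + 1}"] = piece
    return units


def render(units: Dict[str, str], hidden: Set[str]) -> str:
    """Show each unit with its name; hidden units appear as the name only."""
    return "\n".join(
        name if name in hidden else f"{name}: {body}"
        for name, body in units.items()
    )


text = "Background on the task.\n\n- First option\n- Second option\n\nOpen question."
units = name_units(text)
print(render(units, hidden={"$1"}))
# $1
# $2: First option
# $3: Second option
# $4: Open question.
```

A message to another workspace could then mention `$1` without including its content, leaving the recipient to decide whether to reveal it.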
Compared to operations on registers (or lists) that contain commands and their results, edits to free-form text are more difficult to automate. I tend to think that we should ignore this concern for now and focus on building a system that works well for human contributors.