A taxonomy of approaches to capability amplification
We'd like to design mechanisms for solving cognitive tasks that are scalable with respect to human work. This page reviews ideas that could be part of such mechanisms.
Here are some examples of cognitive tasks:
- Read this book and tell me why x did y.
- Provide a detailed analysis of the pros and cons of these two products.
- Tell me how to invest $100k to achieve the most social good.
- Write a paper on NLP that substantially advances the state of the art.
Below, H refers to a human worker who has 15 minutes to solve a short task, and f(x, n) refers to an algorithm that takes a task x and a number n of calls to the human worker, and then produces a solution to x while making at most n calls to H.
We’ll start with simple ideas that are definitely not sufficient on their own if we want to solve non-trivial problems, but will form a basis for later ideas.
This page is mostly a review of Paul Christiano’s work on amplification and (to a lesser extent) our work on dialog markets, with a few new bits here and there.
Q&A: One-shot question-answering
As a starting point, we can pass the task to H and return what they return:
f(x, n) = H(x)
This will only solve problems that H can solve in 15 minutes.
Iteration: A sequence of short tasks
A second idea is to increase the total time available for solving the task by calling H n times, each time providing the problem x, the number of calls remaining, and the previous partial answer. For n=3, this looks like this:
f(x, n=3) = H(x, 0, H(x, 1, H(x, 2, "init")))
This is already much more powerful. It’s question-answering with a potentially very large number of steps for thinking, sketching, and revising. Each instance of H can leave notes for its successor, as long as the notes can be written and read within the available time.
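As a minimal sketch (assuming H is given as a Python function `human(task, remaining, partial)` that returns an updated partial answer):

```python
def iterate(human, task, n):
    """Call H n times, threading the partial answer from each call into the next."""
    partial = "init"
    for remaining in reversed(range(n)):      # n-1, n-2, ..., 0, matching f(x, n=3) above
        partial = human(task, remaining, partial)
    return partial
```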
Recursion: Making sub-calls to agents that make sub-calls
As an alternative to iteration, we can allow each instance of H to make sub-calls to further instances, up to a global limit of n calls:
f(x, n) = H(x, H_{limit n})
The limiting case where n is unbounded is HCH.
However, we want to actually implement systems, and ideally get guarantees of roughly the following form: “For any task and any ‘achievable’ solution quality, there is a finite amount of computation that allows the system to solve the task at that quality.” This requires that we define how f behaves for finite n.
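A minimal sketch of one way to make this finite, assuming H is given as a function `human(task, ask)` that can delegate via `ask`; a shared counter enforces the global limit of n calls:

```python
class CallLimitExceeded(Exception):
    pass

def recursive_f(human, task, n):
    """Answer `task` by calling H at most n times, where each H may make sub-calls."""
    calls_left = [n]                          # mutable cell shared by all sub-calls

    def ask(subtask):
        if calls_left[0] <= 0:
            raise CallLimitExceeded(subtask)
        calls_left[0] -= 1
        return human(subtask, ask)            # this instance of H may itself call `ask`

    return ask(task)
```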
Example
Suppose the task x is “What is 351 * 5019?” I’ll represent the workspace that H sees using a table with three elements: The question, a body that contains the sub-questions H has asked so far, and potentially the reply H has sent. (H is done after sending the reply; I’m including it for our purposes here.)
H can now ask sub-questions and they will appear in the body along with their answers:
Each sub-question is handled by a separate instance of H, who sees a workspace of the same shape and can herself ask sub-questions.
Eventually, top-level H has sufficient information to return a response:
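As a hypothetical rendering in code (one natural decomposition is 351 * 5019 = 300 * 5019 + 51 * 5019; the class and field names are illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Workspace:
    question: str
    body: list = field(default_factory=list)   # (sub-question, answer) pairs
    response: Optional[str] = None

ws = Workspace(question="What is 351 * 5019?")
ws.body.append(("What is 300 * 5019?", "1505700"))
ws.body.append(("What is 51 * 5019?", "255969"))
ws.body.append(("What is 1505700 + 255969?", "1761669"))
ws.response = "1761669"
```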
Issues
This example raises two issues:
Do we assume that each sub-call returns immediately?
For thinking about the abstract problem, this seems cleanest. If sub-calls take real time, the time available for the entire tree of agents is limited by the time the top-level agent H has, defeating the purpose of sub-calls.
For implementations with real humans, our options and their corresponding disadvantages are:
- Pause the timer for H while waiting on sub-calls. This effectively gives H more time to think and thus breaks one of the key constraints for this project.
- Pause the timer while waiting on sub-calls, and also emulate H using multiple workers. That is, worker 1 handles everything up to the first sub-call, worker 2 everything up to the next one, etc. This destroys any internal state that workers might have built up before they initiated their sub-call. This is probably acceptable if we allow workers to add notes to the workspace.
- Require H to specify a simple program that is used to combine the results of sub-calls. This limits H to computations that are tail-recursive with respect to human computation.
Of these, the second option seems best to me.
How do we implement the limit on the total number of sub-calls?
We can associate a budget with each context (workspace state). When H makes a sub-call, she specifies what portion of her budget the sub-call may use (at most). When the sub-call returns, it also returns how much budget it used up, and this is deducted from the budget for H’s workspace. (This is the strategy followed by Paul’s ALBA implementation.)
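A rough sketch of this bookkeeping (an illustration of the idea above, not Paul's ALBA code), where each call to H costs one unit of budget and `human(question, subcall)` may delegate via `subcall(subquestion, allocation)`:

```python
def run(human, question, budget):
    """Return (answer, budget_used) for one workspace, spending at most `budget` calls to H."""
    assert budget >= 1, "need at least one call to H to do anything"
    used = [1]                                     # this call to H costs 1

    def subcall(subquestion, allocation):
        remaining = budget - used[0]
        allocation = min(allocation, remaining)    # the sub-call may use at most this much
        if allocation < 1:
            return "(budget exhausted)"
        answer, sub_used = run(human, subquestion, allocation)
        used[0] += sub_used                        # deduct what the sub-call actually used
        return answer

    answer = human(question, subcall)
    return answer, used[0]
```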
Pointers: Handling data abstractly
So far, the size of the data that the collection of Hs can handle has been bounded by the constraint that all information needs to be passed explicitly between different instances of H. This means that the mechanisms so far would fail even on simple tasks like sorting a list with a million elements, since we wouldn’t be able to communicate the list to the first instance of H. We can address this by introducing pointers to data.
A simple way to implement this is using nested messages (as introduced in annotated functional programming).
We’ve previously considered that H could ask the sub-question “What is 300 * 5019?” Instead, H could use square brackets to designate certain parts of the question as sub-messages:
In this case, the next instance of H (let’s call it H2) sees a message with opaque pointers to data:
H2 can pass these pointers to sub-questions without looking at their values:
Alternatively, H2 can expand a pointer to reveal its value:
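A simplified sketch of this (ignoring nested brackets inside sub-messages): square-bracketed spans become fresh pointers, and `expand` reveals one of them.

```python
import re

def make_message(text):
    """Turn 'What is [300] * [5019]?' into ('What is #1 * #2?', {'#1': '300', '#2': '5019'})."""
    env = {}
    def replace(match):
        name = f"#{len(env) + 1}"
        env[name] = match.group(1)
        return name
    return re.sub(r"\[([^\]]*)\]", replace, text), env

def expand(message, env, pointer):
    """Reveal the value behind one pointer in the message text."""
    return message.replace(pointer, env[pointer])

msg, env = make_message("What is [300] * [5019]?")
# msg == "What is #1 * #2?";  expand(msg, env, "#1") == "What is 300 * #2?"
```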
Using pointers, we can provide large datasets to bounded agents. For example, we could ask H to summarize a book represented as a linked list. Here H looks at the first few sentences:
Pointers don’t always have to be explicit. If we show a page in a book with arrows for flipping to the previous/next page, we can think about this as a combination of pointers and edits.
We don't have to implement pointers using nested messages. Any approach to constructing functional data structures will do. The reason to be functional is that this makes it easy to describe the intended behavior of H: "Return what is most helpful to the agent that instantiated you." If mutations can affect computations far away in the tree of agents in ways that don't flow through the parent, the intended behavior is less clear. (We'll see clean ways to allow mutation when we talk about interaction.)
A mechanism with recursion (or iteration) and pointers is computationally universal:
- Given that our evaluation and data model are close to Scheme, implementing the meta-circular evaluator from SICP on top of our system would probably be straightforward, at least the purely functional version that doesn't have definitions and assignments.
- We could also pick a simple Universal Turing Machine. Each instance of H is responsible for implementing a single step, and so only needs access to the current location of the tape. To allow for bidirectional navigation, the tape could be encoded as a doubly linked list.
Internal dialog: Pointing to agents and sending follow-up messages
We can separate the act of instantiating an agent from sending messages to it. Instantiation creates an agent that hasn’t observed any messages yet, and returns a pointer to it:
When H sends a message to @1, she receives a response and a pointer to that agent’s new state, @2:
Follow-up messages work the same way as initial messages:
We treat agents as functional data structures: @1, @2 and @3 all point to different versions of the same agent. H can send follow-up messages to any of them, as often as she likes. (The command “instantiate” doesn’t actually create an agent—it simply returns a pointer to the agent that hasn’t received any observations.)
Just like pointers to data, pointers to agents can be used in messages:
Internal dialog was introduced in Strong HCH.
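A sketch of this interface, assuming H is given as a function `human(history, message)`; agent pointers are immutable, so @1 remains usable after a follow-up has produced @2:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPointer:
    history: tuple        # messages observed so far; immutable, so pointers can be shared

def instantiate():
    return AgentPointer(history=())               # @1: an agent with no observations yet

def send(human, pointer, message):
    """Send a message; return the reply and a pointer to the agent's new state."""
    reply = human(pointer.history, message)
    return reply, AgentPointer(pointer.history + (message, reply))

# at_1 = instantiate()
# reply_1, at_2 = send(h, at_1, "What unit should the answer use?")
# reply_2, at_3 = send(h, at_2, "Why?")                # follow-up to @2
# reply_3, at_4 = send(h, at_1, "A different question")  # @1 is still usable
```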
Issues
Humans are not functional data structures
We can’t actually store references to a particular human’s state at a point in time and resume from that state multiple times. We can address this using the same strategy we used to simulate sub-calls that return immediately: We identify worker states with workspace states. We use different humans to handle different messages to a single pointer, accept that we lose any internal state that the original agent referred to by @1 had, but rely on the corresponding workspace providing enough information for the different workers to have roughly comparable mental states.
Follow-up messages conflict with the question/body/response workspace structure
With follow-up messages, workspaces no longer necessarily have a single question and answer. So, the separation into question/body/response doesn’t hold up (or would need to be repeated). In this case, it seems simpler to treat workspaces as sequences of interleaved observations and actions. For example:
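Here is a hypothetical rendering (the specific events and the follow-up question are illustrative), reusing the multiplication task from earlier and writing @3 for the parent workspace:

```python
# A workspace as a sequence of interleaved observations and actions.
workspace = [
    ("observe", "Question from @3: What is 351 * 5019?"),
    ("act",     "Ask: What is 300 * 5019?"),
    ("observe", "Answer: 1505700"),
    ("act",     "Ask: What is 51 * 5019?"),
    ("observe", "Answer: 255969"),
    ("act",     "Reply to @3: 1761669"),
    ("observe", "Follow-up from @3: Which sub-results did you rely on?"),
    ("act",     "Reply to @3: 1505700 and 255969, added together."),
]
```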
Here, @3 is the name of the parent workspace that asks a follow-up question directed at the workspace we're looking at.
In the following, I will still often use the question/body/response workspace structure, which is simple and appropriate when we don’t have internal dialog, and can be adapted when we do.
Edits: Incremental changes to workspaces
So far, we have treated workspaces as append-only. For example, this is a workspace:
At this point, we’d usually append a next action (and next result) to the body:
Alternatively, our workspaces could support destructive operations (edits and deletions). One approach is to use indexed registers:
In this case, we can apply the same action as before, but we specify a target register (say A). When the call returns, it overwrites the information in register A and the workspace looks like this:
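A sketch of what this could look like, with an assumed register layout and the earlier multiplication sub-questions as filler content:

```python
workspace = {
    "question": "What is 351 * 5019?",
    "body": {
        "A": ("Ask: What is 300 * 5019?", "1505700"),
        "B": ("Ask: What is 51 * 5019?", "255969"),
    },
    "response": None,
}

def apply_action(workspace, register, action, result):
    """Destructive edit: overwrite the target register with the new action/result pair."""
    workspace["body"][register] = (action, result)
    return workspace

# Re-using register A overwrites its previous contents:
apply_action(workspace, "A", "Ask: What is 1505700 + 255969?", "1761669")
```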
Instead of registers, we could also treat the workspace body as free-form text. In this case, the actions would be more complex, and we’d need a story for how sub-calls and their results interact with the text. (See also: Structured content)
Issues
Edits exacerbate the difference between a single stateful agent and its multi-agent simulation
Previously, we simulated a single agent accumulating state using multiple agents who each had access to successive versions of a workspace. Destructive edits increase the difference between what the single agent would have done (whose state was built up with access to the entire version history of the workspace), and what a sequence of agents would do (each of which has access only to a single revision). This may not matter—we can probably just give up on the idea of H acting over 15 minutes to generate multiple actions that ultimately result in a response, and directly start with the setting where H has 15 minutes to generate a single action that modifies a given workspace.
Persistence: Iterative improvement of workspace trees
The schemes described so far are all fairly demanding for H. In any recursive question-answering scheme, if the human H at the top of the tree screws up, the overall answer will be wrong. If one of the intermediate Hs returns the wrong response and the workers consuming this result are not sufficiently careful, it can affect the correctness of the overall response as well. While each H can reduce her responsibility by only doing a tiny bit of work and passing on most of the task to her descendants, she still faces the challenge of getting this reduction in responsibility right. If she takes on too much work, or otherwise chooses a bad decomposition, and doesn’t communicate well to what extent her results can be relied upon, flaws can affect the global computation. (See also: Reflection)
Edits address this issue to some extent: They reduce H’s responsibility to only making a small change to a workspace. However, once H is done editing the body and submits a response, the workspace effectively terminates, without room for improvement. This makes sense, since whoever instantiated this sub-problem can then use the response and move on. But it also means that whatever work was done before submission is now set in stone.
By contrast, consider the setting where all workspaces persist even after an initial response has been sent. In this case, we can generate an infinite sequence of tasks for each workspace by asking H to make a small improvement to a workspace we present to her (or to pass, if she doesn’t see a way to improve it). If H adds a new sub-call, we create a new workspace as usual. If there was already a response, and H changes it, we propagate this change to the parent workspace and mark the parent as stale, i.e. requiring editing by another instance of H. We likewise propagate changes to child workspaces when the question text of sub-calls gets edited, since these workspaces now have a persistent identity independent of the question associated with them.
In this setting, it is natural to start with a “null answer” and to improve it step by step over time by iterating on notes and subcalls in the body and updating the overall response based on their outputs, thus growing and adjusting the computation tree. When parts of the tree of workspaces get edited, these edits propagate to the rest of the tree step by step as we mark dependent structures as stale.
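A minimal sketch of this propagation machinery, assuming a tree of workspace objects with explicit parent and child links:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PersistentWorkspace:
    question: str
    parent: Optional["PersistentWorkspace"] = None
    children: list = field(default_factory=list)
    response: str = "(null answer)"
    stale: bool = False

def edit_response(ws, new_response):
    ws.response = new_response
    if ws.parent is not None:
        ws.parent.stale = True      # the parent must revisit a workspace whose input changed

def edit_question(child, new_question):
    child.question = new_question
    child.stale = True              # the child must be revisited for the new question
```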
Issues
How do we choose which workspace gets edited next?
Previously, at each point there was a unique workspace that was waiting for an action by H. Now, any workspace can be acted upon, and the “stale” markers roughly correspond to the previous notion of “waiting for an action”. H can’t make that choice, since we posit that each H only ever sees a single workspace. So, we need an algorithm that does make that choice.
In practice, how we choose what gets edited next might have a big impact on how efficient our scheme is. In theory, if what we care about is eventual convergence to high-quality responses, it may be good enough to repeatedly loop over all workspaces.
How do budgets interact with persistence?
It’s not obvious how to generalize budgets to this setting. My best guess: Budgets can be used for iterating on the current workspace, and can also be passed on to sub-workspaces. When a sub-workspace is instantiated, a portion of the parent’s budget is assigned to it. This portion starts small, and can be increased later. An increase marks the corresponding sub-workspace as stale, so that it can be edited to take into account the increase in budget. Workspaces can return unused budget to their parents.
How does persistence interact with robustness and error correction?
As we increase the number of calls to H, the probability that one of the calls fails goes up. To address this, we need to amplify reliability along with capability. Iterative improvement of persistent workspace trees seems related—can we make this relation more precise?
Structured content: Organizing workspaces as documents, registers, trees, etc.
So far, we have seen a few ideas about what individual workspace bodies could look like:
- A list of H’s actions (view, instantiate, ask) and resulting observations
- Named registers, each of which contains an action and observation
This is an axis along which proposals for amplification can vary. Here are some other possibilities:
- Free-form plain text
- Rich text documents
- Tree-structured outlines (as in Workflowy)
If we want to include access to external computation:
- Computational scratchpads (think Jupyter notebook)
- Virtual machine states, along with some desktop environment
For each choice, we want to limit the visible portion to a fixed finite size, so that we can treat the policy that H implements as finite.
Reflection: Reifying computation for inspection
As the tree of questions and answers (or workspaces) grows, one might worry that the computation will start to break down for non-local reasons. That is, each instance of H is trying to do their best to act in the context they see, and none of them take actions that locally look like mistakes, but since no one can see the big picture, some failures might go unnoticed.
To help address this, we can give H access to a command that reifies the entire tree of workspaces into a data structure:
H can then instantiate sub-agents that reason about properties of this computation.
Or:
If reified computations don’t just include sub-computations, but also information about ancestors of the workspace where “reflect” was executed, this can replace communication to some extent, as long as internal states accumulated by instances of H don’t play a big role relative to explicit content. For example, instead of asking a follow-up question directed at the parent to clarify what unit would be most useful, we can run a sub-computation with access to our ancestors:
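A sketch of what a “reflect” command could return, assuming a simple tree of workspace nodes; the reified structure would be handed to sub-agents behind a pointer, and could also include information about the reflecting workspace’s ancestors:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    question: str
    response: str
    children: list = field(default_factory=list)

def reflect(node):
    """Reify a workspace tree into plain data that sub-agents can inspect piecemeal."""
    return {
        "question": node.question,
        "response": node.response,
        "children": [reflect(child) for child in node.children],
    }

# A sub-agent could then be asked, for example (hypothetical wording):
#   "Does any workspace in #1 return an answer that its sub-answers don't support?"
# where #1 points to the reified tree.
```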
Issues
How difficult is reasoning about large computations?
Reasoning about large computations without any individual having access to more than a small portion will be challenging, probably for the same reasons that make it difficult to solve any complex problem by composing local thinking. This may point to a bootstrapping issue: If our system could reason about large computations, we could use that ability to fix issues that prevent us from getting local thinking to solve complex problems; but maybe our system can’t successfully implement such reasoning using local thinking without already having the ability to reflect.
Meta-execution: Agents made out of agents
If reflection is about the agent-collective understanding one of its computations after the fact, then meta-execution is about the collective understanding a computation ahead of time through careful planning.
We have looked at a number of mechanisms that all make calls to H. I’ll use MH to refer to one such mechanism. Each call to H includes some context (e.g., a question) and H returns an action (e.g., a sub-question to ask) that seems good to her in this context. We don’t know how H works—she is a human, her brain is a black box to us.
In meta-execution (introduced here), we replace H with an implementation X that we will provide ourselves. Let’s call the resulting mechanism MX. If we want to solve an object-level task such as “Read this book and tell me why x did y.”, we present it to MX.
In the process of solving such tasks, MX will quickly encounter sub-tasks that it would have presented to H, and will present them to our implementation X instead. Here’s how X works: We create a new cognitive task that looks like this...
...and present it to MH. Here, #1 is a data structure with information about the current sub-task, but this context is now hidden behind pointers. For example, it might look like this:
#1 = “The question that starts with word #2 and continues with words #3”
#2 = “The word that starts with letter #4 and continues with letters #5”
#4 = “R”
#5 = “The list that starts with letter #6 and continues with letters #7”
#6 = “e”
...
Any actions that don’t look at the entire data structure will be applicable across multiple contexts (see: Caching), and if looking at the value of the data structure would have caused H to malfunction, we may be able to avoid this by analyzing it piecemeal through MH.
In short, meta-execution replaces MH with MMH. Our mechanism still bottoms out at H, but we have introduced a level of indirection, so that H chooses meta-level actions, and only indirectly contributes to solving the object-level task.
Previously, we could think about the collection of Hs as operating on a tree of workspaces, with each H responsible for a single change to a single workspace. Now each change itself corresponds to a tree of workspaces that generates that change.
With minor modification, we can also use meta-execution to simulate stateful agents (e.g. if we want to create an agent that maintains state over multiple actions in a workspace). In that case, the context includes the agent’s state, and the returned action includes an updated state.
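A sketch of this indirection, with assumed names: `mh` is the mechanism run with H, `make_pointer` hides the object-level context, and the wording of the meta-level task is hypothetical:

```python
def meta_h(context, mh, make_pointer):
    """A drop-in replacement X for H: choose the next action by running MH at the meta level."""
    pointer = make_pointer(context)          # the context becomes opaque, like #1 above
    meta_task = f"What action should be taken in the context {pointer}?"
    return mh(meta_task)
    # For the stateful variant, `context` would also include the agent's state,
    # and the returned action would include an updated state.
```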
Interaction: Responding to a stateful world
So far, we have assumed that the problem our mechanisms are solving is given as a single question. Through the use of pointers, question and response can be large, but they have been taken to be fully specified in some sense, with no way to query for more information. We have also assumed that the only interaction between the system and the real world is through the final response it returns.
How could we enable back-and-forth interaction between our mechanism and the real world?
Types of interactions
For some kinds of interactions, we can treat the world as functional. For queries to Google and Wikipedia, this is probably acceptable in cases where the returned information changes slowly relative to the time scale on which the problem is solved. In this case, we can simply provide H with access to a function that retrieves the relevant information.
For other interactions, we really do need to treat the world as stateful, changing with each interaction. If the system holds a conversation with a user, we need to take into account that each message changes the situation substantially. If H interacts with a computer that maintains state for more than 15 minutes, this is the case as well. In this setting, we can’t give each H direct access to the world, since their interactions would interfere with each other. We can’t simply introduce a pointer to the user (or computer) and approach it the same way we approach dialog with internal agents, since we treat such agents as purely functional data structures.
Interaction through resource pointers
A first idea is to use internal dialog, but to introduce a new type of pointer &1 that cannot be copied. When it is passed to a sub-call, the H that originally had access to it loses that access. When a message is sent to &1, the pointer is “used up” and no further messages can be sent to it. When &1 replies, it returns a new resource pointer &2 along with its response.
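A sketch of the intended semantics (the rule that resource pointers cannot be copied or retained would have to be enforced by the mechanism itself; Python will not enforce it):

```python
class UsedUpError(Exception):
    pass

class ResourcePointer:
    """A handle to a stateful world that is spent by a single send."""

    def __init__(self, world):
        self._world = world
        self._used = False

    def send(self, message):
        """Send one message; afterwards this pointer is dead and a fresh one is returned."""
        if self._used:
            raise UsedUpError("this resource pointer has already been spent")
        self._used = True
        reply = self._world.respond(message)           # assumed interface to the world
        return reply, ResourcePointer(self._world)     # &2: fresh pointer to the same world
```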
Interaction through indirection
Alternatively, we can apply indirection, similar to the meta-execution indirection. The task we present to the system includes a pointer to all observations so far:
Once the system returns an answer, we create a new top-level task of the same form and re-run the entire computation. Since the observations behind the pointer changed, computations that accessed it may now behave differently, and the overall response may be different as well. We thus generate the dialog with the user step by step through a sequence of tasks. We can also pass state between these tasks if necessary by reformulating the task as follows:
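A sketch of the resulting outer loop, with assumed names: `solve` is the amplification mechanism (which only sees pointers to its inputs), and `get_user_message` / `send_user_message` connect it to the user; the task wording is hypothetical.

```python
def dialog(solve, get_user_message, send_user_message):
    """Generate a dialog turn by turn by re-running the whole computation each turn."""
    observations, state = [], "(no state yet)"
    while True:
        observations.append(get_user_message())
        # The top-level task points at all observations so far and at the carried state.
        task = ("Given the observations #1 and the previous state #2, "
                "what should we reply, and what is the new state?")
        reply, state = solve(task, {"#1": observations, "#2": state})
        send_user_message(reply)
```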
If we use persistent workspaces and the corresponding propagation machinery, we can make rerunning the computations more tractable: We treat changes to observations of external state just like we treat other edits. That is, we track what sub-computations depend on the new observations and mark them as stale. If the context changed only a little bit, we can approximately reuse a lot of computation this way and thus reduce the number of calls to H.
From a computational point of view, interaction via indirection and persistence might recover a version of self-adjusting computation.
Out of scope
There are additional mechanisms that will matter a lot for practical implementations, but that are out of scope if we are primarily concerned with what capabilities we can theoretically reach by amplifying human thinking, and don’t care about reducing the number of calls to H:
Caching
Mechanisms that make calls to H can eliminate redundant work by memoizing H. That is, we save all responses from H for the corresponding inputs, and whenever we would usually call H, we first check whether there is already a response in the cache table. If so, we use that response instead of calling H.
This is especially powerful in combination with abstraction (pointers). If the content behind a pointer hasn’t been revealed yet, and so H chose their action without knowledge of that content, we can reuse their response across all values that could be behind the pointer.
For example, this makes it possible to learn the high-level structure of an algorithm for computing the length of a list by caching the actions that generate these two workspaces:
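A sketch of the idea, with an assumed `human` function and a cache keyed on exactly what H sees:

```python
cache = {}

def cached_human(human, visible_context):
    """Reuse H's action whenever the visible context (with pointers unexpanded) repeats."""
    if visible_context not in cache:
        cache[visible_context] = human(visible_context)
    return cache[visible_context]

# For list length, two cached contexts would be enough to pin down the recursion
# (illustrative, not the original workspaces):
#   "What is the length of the empty list?"                 -> reply "0"
#   "What is the length of the list with head #1, tail #2?" -> ask "What is the length of #2?",
#                                                              reply "1 + that answer"
```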
Automation using ML
Beyond caching, we can consider more sophisticated automation strategies. For example, we could introduce automation to decide when to hide values behind pointers, or automation that learns when two natural-language questions are likely to be semantically identical.
Aggregation & incentives
Real implementations need to divide up work between humans whose background knowledge, motivation, and abilities differ. One approach to incentivizing and evaluating their work is the one taken in dialog markets: occasionally evaluate in depth by solving the task “What should the reward for contribution #1 be?” using our system, and learn to predict the outcome of such deep evaluations from cheaper signals.
Serial vs. parallel execution
To run tasks that require a large number of human work hours, we want to parallelize where possible, so that different humans can simultaneously work on different parts of the same global task.
Eager vs. lazy evaluation
We might be able to reduce the amount of human work by delaying sub-tasks until they are needed to generate the overall return value in as much detail as its consumer requires.
Existing systems
The table below is a rough attempt to relate the mechanisms above to some existing and proposed systems. Many have key features that are not captured by the mechanisms discussed above, especially multi-user features, so this is an extremely lossy projection.
(Table not reproduced here; its rows included systems such as Q&A with edits, Chat, and Strong HCH, matched against the mechanisms above.)