All of the software you use is designed for the purpose of turning very specific kinds of data into other very specific kinds of data. Press a key, and a certain numerical code is reported up through your BIOS, operating system, and application code, triggering reactions along the way—such as the display of a character on the screen, or the selection of a menu item, or the skipping of a music track. “Character on the screen” sounds simple, but the representation of a character in a desktop application is astonishingly complex:
- the character probably lives in a larger data structure representing a text document or database field
- the visible character on the screen probably sits inside a specific user interface element such as a text box
- that UI element might have specific settings for things like font size and colour
- turning those settings into graphics on screen will involve the use of an operating system graphics toolkit
- the graphics toolkit will need to talk to the video card, via a low-level graphics library
- the low-level graphics library will need to talk to a video card driver
- the video card driver tells the video card which pixels to set to which colours
I’ve simplified and skipped multiple levels and sub-systems here, and it’s already complicated.
To make this work, software engineers need to maintain a certain level of rigour. Code must work with precise data structures, just so, or errors will likely be introduced. Each layer of the application, operating system, and driver code must talk to each other layer in very precise terms, lest the whole system collapse.
As a result, most parts of the system are very selective in what they will talk to, and what they will say. The same goes for networked applications, where online services hide behind narrow APIs that permit only certain others to interact with the system in certain ways. In software engineering, one of the names for this is encapsulation, the idea that a (sub-)system should be a self-contained unit that can remain wholly reliable provided you communicate with it in the specific ways that it requires.
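To make encapsulation concrete, here is a minimal sketch in Python. The `Playlist` class and its method names are invented for illustration and do not come from any real library. The point is only that the internal state is hidden, and the one way to change it rejects anything outside its narrow contract.

```python
class Playlist:
    """A hypothetical self-contained unit with a deliberately narrow interface."""

    def __init__(self, tracks: list[str]) -> None:
        if not tracks:
            raise ValueError("a playlist needs at least one track")
        self._tracks = list(tracks)  # private copy: callers cannot mutate it
        self._position = 0           # internal state, not part of the interface

    def current(self) -> str:
        """Return the name of the track at the current position."""
        return self._tracks[self._position]

    def skip(self, count: int = 1) -> str:
        """Move forward by `count` tracks, wrapping around at the end."""
        # Refuse anything that is not a positive integer rather than guess.
        if isinstance(count, bool) or not isinstance(count, int) or count < 1:
            raise ValueError("skip() requires a positive integer")
        self._position = (self._position + count) % len(self._tracks)
        return self.current()
```

A caller that passes `skip("1")` gets an error instead of a best guess, which is exactly the selectiveness described above.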
In practice, this is hard. Each module has its own little world model in which certain data structures have certain meanings that they might not have elsewhere. Software engineers are obliged to adopt the mindset of the philosopher, in which things mean what they are defined as meaning, not what they might mean in common usage. Since there are many interacting pieces of software in even trivial applications, this involves being mindful of many different conceptual models. Sometimes, we get confused.
In RFC 760, which defines the Internet Protocol, a foundational document in computer-computer communication, Jon Postel wrote:
> In general, an implementation should be conservative in its sending behavior, and liberal in its receiving behavior. That is, it should be careful to send well-formed datagrams, but should accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear).
As “be liberal in what you accept, and conservative in what you send”, this principle has come to be known as “Postel’s law”. It means that software engineers should try to code their systems to accept some degree of error from the systems they must communicate with, while striving to avoid introducing errors of their own.
This is a fine aspiration in some ways, but it has a cost. If your system is too liberal in what it accepts, then it ceases to have meaningful standards at all; if other systems begin to rely on the acceptance of malformed or confusing data, then the receiving system must continue to accept these forever, or break the systems relying on it. This significantly complicates maintenance, because now the system has both a formal standard for how it behaves, and an informal set of behaviours that it will exhibit when receiving the “wrong” data. In practice, being liberal with what you accept is not feasible beyond trivial cases.
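A small sketch of that cost, assuming a made-up protocol in which a flag is documented as being either `true` or `false`. Both function names are hypothetical. The strict parser implements the formal standard; the liberal one implements the standard plus an informal set of extra spellings.

```python
def parse_flag_strict(raw: str) -> bool:
    """Accept only the two documented spellings."""
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise ValueError(f"not a valid flag: {raw!r}")


def parse_flag_liberal(raw: str) -> bool:
    """Accept anything whose meaning seems clear, in the spirit of Postel's law."""
    cleaned = raw.strip().lower()
    if cleaned in {"true", "yes", "y", "1", "on"}:
        return True
    if cleaned in {"false", "no", "n", "0", "off"}:
        return False
    raise ValueError(f"not a valid flag: {raw!r}")
```

Once some client starts sending `" YES "` and finds that it works, every one of those extra spellings is part of the protocol in practice, whatever the documentation says.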
As a result, our software is stuck with the need to communicate in very specific and abstract ways. The “value chain” of a software application starts with some data (from the user, or from the network), and proceeds through a series of abstract transformations that require precise alignment in order to work. The engineering effort required is high, and so most applications have only a limited capacity to talk to other applications, even when they both work with similar kinds of data.
This is why most schemes that involve having lots of computers talk to each other fail. It is really, really hard for computers to agree on semantics and protocols, and to tolerate error without creating the side-effect of making the error itself part of the protocol.
This tweet from Gordon Brander got me thinking:
> Focusing on structured data when GPT-3 exists feels like solving yesterday’s problem.
GPT-3 means that we don’t need to worry so much about how we structure our data, because GPT-3 is really good at turning one kind of data into another. Say Alice has some data, and Bob has an API for doing something useful with data, but Alice’s data doesn’t match Bob’s API. The old way of solving that problem would be to think like a philosopher, teasing out the semantics of Alice’s data and the paradigmatic model of Bob’s API, then constructing a transformation between the two. Now, we can just ask GPT-3 to do it for us.
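For comparison, this is roughly what the old way looks like. It is a sketch under invented assumptions: Alice stores names as "Last, First" and dates as day/month/year, while Bob's API wants separate name fields and an ISO 8601 date. None of these field names come from a real system.

```python
from datetime import datetime


def alice_to_bob(record: dict) -> dict:
    """Hand-written translation from Alice's (hypothetical) record to Bob's format."""
    # Alice's semantics: a single "Last, First" string.
    last, first = (part.strip() for part in record["name"].split(",", 1))
    # Alice's semantics: day/month/year. Bob's: ISO 8601 (year-month-day).
    joined = datetime.strptime(record["joined"], "%d/%m/%Y").date()
    return {
        "first_name": first,
        "last_name": last,
        "joined_on": joined.isoformat(),
    }


print(alice_to_bob({"name": "Lovelace, Ada", "joined": "10/12/2022"}))
```

Every line encodes a decision about what Alice's data means, and somebody had to work each of those decisions out by hand.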
Here’s an example of GPT-3 turning plain text into JSON data. Here’s GPT-3 generating SQL queries. Here’s ChatGPT generating code to interact with an API. These are the building blocks of an interaction between two systems, and—with the right prompting—an LLM can generate them, on demand, in seconds. This is a profoundly different kind of problem-solving.
LLMs are not philosophers. They don’t really care about semantics or abstract data models. They know how to get some things done, and you can ask them to do it. LLMs are hustlers. If you want to get some data from A to B, the hustler will get it done faster than the philosopher, and most of the time with no loss in quality.
The immediate implication is that software integration projects will get a lot easier. The translation layers, the glue code that lets us stick two systems together, can be generated by LLMs. Individual components can afford to break Postel’s law and become more conservative in what they accept, more rigorous about standards, because the cognitive load of handling the necessary data transformations can be outsourced to the LLM rather than the programmer. Since LLMs can learn, APIs can change much faster than they can now, when we’re reliant on humans to a) understand the change and its implications, and b) update all of the relevant code. We will need to build a lot of safety harnesses before we can feel comfortable with this, but the payoff is so high that we will be willing to take some risks here too.
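One possible shape for such a safety harness, as a rough sketch and not a recommendation of any particular tool. The `generate` argument stands in for whatever LLM call you use; it is just a function from text to text here, and no real model API is assumed. The field names in `REQUIRED_FIELDS` are also invented. The receiving side stays strict, and the harness simply refuses to pass on anything that fails validation.

```python
import json
from typing import Callable

# Hypothetical schema for the receiving API: field name -> expected type.
REQUIRED_FIELDS = {"first_name": str, "last_name": str, "joined_on": str}


def validate(candidate: str) -> dict:
    """Parse the generated text and check it against the strict schema."""
    data = json.loads(candidate)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        raise ValueError("unexpected or missing fields")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} has the wrong type")
    return data


def translate(source: str, generate: Callable[[str], str], attempts: int = 3) -> dict:
    """Ask the generator for a translation, retrying until one passes validation."""
    last_error = None
    for _ in range(attempts):
        try:
            return validate(generate(source))
        except ValueError as error:
            last_error = error  # remember why the last attempt was rejected
    raise RuntimeError(f"no valid translation after {attempts} attempts") from last_error
```

The hustler gets several chances to produce the glue, but the component on the other side never has to loosen its standards to accommodate it.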
We can think of inter-application communication as a kind of trade, and the capacity for specific transactions to occur as being a matching problem. Matching theory describes “the formation of mutually beneficial relationships over time”, typically concerning economic relations between people, but the principle is applicable to relations between computer systems too. Matching is not just about an employee’s skills and an employer’s salary offer, but also about whether they can find each other, and about the influence of geography, language, social networks, and so on. If there is a job available, and a person willing and able to do it, but that person remains unemployed and the vacancy unfilled, then this is attributed to “friction” in the job market: something must have got in the way of the person taking the job.
Communication between computers has a lot of friction, despite all of our best efforts. Communication between different parts of the same computer, or even within the same application, often has friction! We might have some data over here, and an API over there, and with just a bit of hustle, we could get the two to talk to each other. But philosophers don’t hustle, and so our current applications can’t do it.
In the analogy with a market, the LLM is the entrepreneur matching a source of supply with a source of demand: some system needs a certain kind of computation doing, and some other system is capable of doing it, given the right data as an input. An entrepreneurial LLM transforms the data, and writes the code to invoke the API of the system that can do the computation. API marketplaces currently exist, but are poorly utilised because the marginal cost of interacting with a new API is high. Even when what we want is reasonable, the persnickety details of data formats and API schemas and network protocols get in the way. With the right framework, LLMs can help us to realise the aim of networks of collaborating software applications that we have spent the last few decades imagining.