
Towards a Linguistics of AI

Cinnamonlab’s goal is to develop a “linguistics of AI.” This statement rests on three claims, each of which I defend in this essay: that LLMs are a form of Language, that interpretability is the scientific study of AI, and that LLM interpretability is therefore the linguistics of AI.

I show how thinking of LLM interpretability as a “linguistics of AI” provides a conceptual framework that unifies Cinnamonlab’s six research themes.

LLMs as a Form of Language

Linguists understand Language (with a capital L) to be a conventional system of arbitrary symbols that combine with one another to form representations of abstract ideas. People experience these symbols in the form of speech sounds, hand gestures, or written characters; but linguists recognize that these are merely physical instantiations of a Language that is ultimately abstract (Saussure, 1916). Most linguists today think of Language as a sort of mental computational system that is responsible for generating expressions of speech or signs and parsing expressions generated by others (Chomsky, 1968, 1986).

In the natural world, many animal species have systems of communication that resemble Language. These systems usually consist of finite collections of memorized symbols, which refer to objects and states of being that are immediate in time and space. Human Language is distinguished from animal communication systems in that it is fully abstract, both in the physical form of its symbols and in the range of ideas that its expressions can convey. More specifically, Language is characterized among animal communication systems by four properties identified by Hockett (1960): duality (meaningful expressions are built from inherently meaningless units), productivity (a finite vocabulary can be combined into an unbounded range of novel expressions), displacement (expressions can refer to things remote in time and space), and transmissibility (the conventions of the system are learned from other users rather than fixed in advance).

Just as evolution has produced systems of communication used by animal species, human engineers have devised a number of systems that allow computers to communicate with humans and with one another. When we evaluate these systems of computer communication against Hockett’s four criteria, we find that LLMs are the only ones that fulfill the definition of Language. Most systems of human–computer interaction are designed by engineers without the involvement of users (violating transmissibility), and they can only convey meanings that are directly related to device functions (violating displacement). Some systems, like programming languages, satisfy duality and productivity; others, like graphical user interfaces, do not. LLMs are the only system of computer communication that satisfies all four essential properties of Language. LLM expressions are meaningful, yet they are built from meaningless tokens (duality). LLM semantics are by no means limited to device functions; indeed, many LLMs have no device functions at all, apart from generating text (displacement). LLMs can generate an infinite range of texts by concatenating tokens (productivity). And the primitive symbols and rules of LLM text generation are not hard-coded by engineers, but inferred from text via machine learning (transmissibility).
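To make the duality and productivity points concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the GPT-2 tokenizer (an illustrative choice; any LLM tokenizer would do): a word is decomposed into subword tokens that are meaningless on their own, and the same finite token vocabulary recombines into an unbounded range of texts.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# A single word is built from subword tokens that carry no meaning on their own.
print(tokenizer.tokenize("interpretability"))
# e.g. ['inter', 'pret', 'ability'] -- the exact split depends on the tokenizer

# The same finite vocabulary recombines, without limit, into novel texts.
print(tokenizer.tokenize("The linguistics of AI studies how such tokens come to mean things."))
```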

Thus, LLMs are a form of Language, in the sense that they are a conventional system of arbitrary symbols that combine with one another to form representations of abstract ideas. LLM Language clearly differs from human Language, but it fulfills the technical definition of Language according to linguists.

Interpretability as the Scientific Study of AI

Interpretability has emerged as a thriving field of study within AI, but there does not seem to be any consensus on what it is. What interpretability researchers seem to agree on is that it is desirable for AI models to be “interpretable” (or “explainable” or “understandable”) in some way. But when we actually try to operationalize “interpretability” in the form of benchmarks or other evaluation paradigms, we often find that the results are misleading or mysterious, or altogether grounded in circular logic. In practice, much of interpretability research “[relies] on some notion of ‘you’ll know it when you see it’” (Doshi-Velez and Kim, 2017), which makes it difficult to determine what exactly has been learned from that research.

One view of interpretability, held either explicitly or implicitly by many linguists and cognitive scientists, is that interpretability, in its current form, is the scientific study of AI. Mainstream research in AI subfields like NLP and computer vision is dominated by techniques for producing systems that can be described as “black boxes.” Interpretabilists working in such subfields are therefore concerned with “opening the black box.” In that vein, much of the recent progress in interpretability has consisted of fine-grained description of AI model behavior, assignment of meaning to neural network representations, discovery of causal relations between AI model components, formulation of theories of deep learning and neural architectures, and development of methods for automating such analyses and conducting them at scale. These are essentially the same activities that any scientific discipline is engaged in; only in this case, AI models are the object of study.
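As a concrete illustration of one of these activities, the discovery of causal relations between model components, the sketch below runs a simple activation patching experiment: a clean activation is spliced into a corrupted forward pass to test whether a particular block carries the relevant information. It assumes the Hugging Face transformers library and GPT-2 small; the prompts, the choice of layer, and the target token are illustrative assumptions, not part of the original argument.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

clean = tokenizer("The Eiffel Tower is located in the city of", return_tensors="pt")
corrupt = tokenizer("The Colosseum is located in the city of", return_tensors="pt")

LAYER = 8   # which transformer block to intervene on (an illustrative choice)
cache = {}

def save_hook(module, inputs, output):
    cache["clean"] = output[0].detach()           # GPT2Block output is a tuple; [0] is hidden states

def patch_hook(module, inputs, output):
    patched = output[0].clone()
    patched[:, -1, :] = cache["clean"][:, -1, :]  # splice the clean activation into the final position
    return (patched,) + output[1:]

block = model.transformer.h[LAYER]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)                                # 1) cache the clean run's activation
    handle.remove()

    baseline = model(**corrupt).logits[0, -1]     # 2) corrupted run, no intervention
    handle = block.register_forward_hook(patch_hook)
    patched = model(**corrupt).logits[0, -1]      # 3) corrupted run with the clean activation patched in
    handle.remove()

paris = tokenizer.encode(" Paris")[0]
print("P(' Paris') without patch:", baseline.softmax(-1)[paris].item())
print("P(' Paris') with patch:   ", patched.softmax(-1)[paris].item())
```

If patching this block raises the probability of “ Paris,” that is evidence (to be confirmed by converging experiments) that the block carries subject information forward; a single run like this is a measurement, not an explanation.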

But if interpretability is the scientific study of AI, then what is the rest of AI? Most AI subfields are applied sciences aiming to solve real-world problems, while interpretability is a pure science aiming to understand AI models as a natural phenomenon. In other words, AI engineers build AI, while AI interpretabilists study it. (That is not to say that interpretability does not involve any applied research: interpretability studies rely on tools that have been engineered for that purpose, while insights uncovered by the interpretability literature regularly spur innovation in applied AI research.) Under this view, the importance of interpretability to AI becomes clear. Who, in the 21st century, would be willing to enter a building designed by an architect who had never studied physics? Who would be willing to take a medication developed without any knowledge of biochemistry? Who would feel safe getting into a self-driving car, equipped with the best-performing, state-of-the-art models, when the science of computer vision is still poorly understood?

Of course, merely saying that interpretability is the scientific study of AI does not solve the problem of “you’ll know it when you see it”; but it does provide interpretabilists with a sense of direction, by allowing them to take inspiration from other scientific disciplines. Benchmarks and model analysis techniques should be viewed not as oracles that generate explanations, but as instruments whose measurements are reliable within a certain domain of application. Claims about causal relations should be justified by mechanistic theories and controlled experiments. Insights about AI models should be supported by a diverse array of converging evidence. Progress in the field should be measured by interpretabilists’ ability to answer pertinent questions about AI models. And the lack of unified standards of practice is not a reason to dismiss interpretability, but rather a symptom experienced by all new, emerging scientific disciplines (Kuhn, 1962).

LLM Interpretability as the Linguistics of AI

If LLMs are a form of Language, and interpretability is the science of AI, and linguistics is the science of Language, then it follows that the interpretability of LLMs can be understood as the linguistics of AI. It is now popular to think of LLMs as a form of task-agnostic “general intelligence,” or even as a sort of conscious entity with something resembling a soul. Framing LLM interpretability as the linguistics of AI reminds us that before all of these things, LLMs are conventional systems of arbitrary symbols that combine with one another to form representations of abstract ideas. Whether the LLM is (apparently) solving a math problem, answering a real-world question, or correcting your writing, its output is, first and foremost, a sequence of tokens from which the user extracts meaning. True “understanding” of LLMs therefore requires an understanding of how those tokens are generated, and how users assign meaning to them.
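A minimal sketch of that point, assuming the Hugging Face transformers library and GPT-2 small (the prompt is illustrative): whatever task the model appears to be performing, each output token is drawn from a probability distribution over the vocabulary, conditioned on the tokens generated so far.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer("Two plus two equals", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):
        probs = model(ids).logits[0, -1].softmax(-1)   # distribution over the next token
        next_id = int(probs.argmax())                  # greedy decoding (one rule among many)
        print(repr(tokenizer.decode([next_id])), round(probs[next_id].item(), 3))
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=-1)

print(tokenizer.decode(ids[0]))   # the "answer" is just the most probable continuation
```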

To that end, Cinnamonlab’s research focuses on six thematic areas, which aim to provide a comprehensive view of LLMs as a form of Language.

AI is supposed to transform computers from rote instruction-followers into agents capable of human-like “intelligence.” But the goal of intelligence can never be achieved if we do not understand what intelligence is. Human Language is a fundamental component of human intelligence, and AI linguistics applies the practice of rigorous scientific inquiry to a fundamental component of artificial intelligence. Progress on the questions above is progress towards a world where the workings of LLMs are no longer mysterious to us, and where it is possible to develop and deploy AI technology in a more principled manner.
