Towards a Linguistics of AI
Cinnamonlab’s goal is to develop a “linguistics of AI.” This statement rests on three claims, which I defend in this essay.
- Large language models (LLMs) are a form of Language, in the sense that they are a system of symbolic communication that fulfills the definition of Language according to linguists.
- The field of AI interpretability is a pure-scientific discipline whose object of study is black-box AI models created using deep learning.
- Since LLMs are a form of Language, AI interpretability is the science of AI, and linguistics is the science of Language, the interpretability of LLMs and other natural language processing (NLP) systems should be understood as a “linguistics of AI.”
I show how thinking of LLM interpretability as a “linguistics of AI” provides a conceptual framework that unifies Cinnamonlab’s six research themes.
LLMs as a Form of Language
Linguists understand Language (with a capital L) to be a conventional system of arbitrary symbols that combine with one another to form representations of abstract ideas. People experience these symbols in the form of speech sounds, hand gestures, or written characters; but linguists recognize that these are merely physical instantiations of a Language that is ultimately abstract (Saussure, 1916). Most linguists today think of Language as a sort of mental computational system that is responsible for generating expressions of speech or signs and parsing expressions generated by others (Chomsky, 1968, 1986).
In the natural world, many animal species have systems of communication that resemble Language. These systems usually consist of finite collections of memorized symbols, which refer to objects and states of being that are immediate in time and space. Human Language is distinguished from animal communication systems in that it is fully abstract, both in the physical form of its symbols and in the range of ideas that expressions of Language can convey. More specifically, Language is set apart from animal communication systems by the following properties, identified by Hockett (1960).
- Duality: Linguistic symbols (words and morphemes) carry meaning, but they are made of elements that are meaningless (phonemes). Language can assign any meaning to any symbol, and it does not require the physical instantiation of symbols (whether as sounds, gestures, or characters) to resemble their meanings in any way.
- Displacement: The meaning of Linguistic expressions is not limited to the concrete or the immediate; or in Hockett’s phraseology, the “here and now.” Linguistic expressions can refer to concepts with no physical instantiation, to entities that are not currently visible to the speaker or the listener, to events from the past or the future, and to counterfactual states of the world.
- Productivity: Language allows expressions to be combined with one another to form new expressions, resulting in an infinite range of possible expressions constructed from a finite set of primitive symbols (von Humboldt, 1836). The construction of completely new utterances that have never been uttered before is an everyday experience for Language users (see the sketch after this list).
- Transmissibility: The primitive symbols of Language are not hard-coded in human DNA, but socially constructed and transmitted across generations. Any human being can become a native speaker of any particular language, as long as they are sufficiently exposed to that language during childhood.
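To make the notion of productivity concrete, here is a minimal sketch of the “infinite use of finite means”: a toy grammar whose symbols and rules are invented purely for illustration (they are not drawn from any particular linguistic analysis), yet which can generate an unbounded set of sentences, because one of its rules can invoke itself.

```python
import random

# A toy recursive grammar: a finite lexicon and a handful of rules, but an
# unbounded set of possible sentences, because the NP rule can re-invoke VP
# (and hence, indirectly, itself).
rules = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],   # recursion enters here
    "VP": [["Vt", "NP"], ["Vi"]],
    "N":  [["dog"], ["cat"], ["mouse"]],
    "Vt": [["saw"], ["chased"]],
    "Vi": [["slept"], ["smiled"]],
}

def expand(symbol):
    """Recursively rewrite a symbol until only words remain."""
    if symbol not in rules:
        return [symbol]                      # terminal symbol: an actual word
    expansion = random.choice(rules[symbol])
    return [word for part in expansion for word in expand(part)]

for _ in range(3):
    print(" ".join(expand("S")))
```

Sentences like “the dog that chased the cat slept” come out of a dozen rules and nine words; human Language achieves its open-endedness through recursive combination of the same general kind, only with vastly richer rules.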
Just as evolution has produced systems of communication used by animal species, human engineers have devised a number of systems that allow computers to communicate with humans and with one another. When evaluating these systems of computer communication according to the four criteria above, we find that LLMs are the only such systems that fulfill the definition of Language. Most systems of human–computer interaction are designed by engineers without the involvement of users (thus violating transmissibility), and they can only convey meanings that are directly related to device functions (thus violating displacement). Some systems satisfy duality and productivity (like programming languages), and others do not (like graphical user interfaces). LLMs are the only systems of computer communication that satisfy all four essential properties of Language. LLM expressions are meaningful, but they are made of meaningless tokens. LLM semantics are by no means limited to device functions; in fact, many LLMs have no device functions at all, apart from generating text. LLMs can generate an infinite range of texts by concatenating tokens. And the primitive symbols and rules of LLM text generation are not hard-coded by engineers, but inferred from text via machine learning.
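To illustrate that last point, here is a deliberately tiny caricature of LLM text generation; everything in it (the corpus, the bigram statistics, the function names) is invented for this sketch, and a real LLM learns billions of neural network parameters rather than counting word pairs. The point is only that the rules of generation are estimated from text rather than written by hand, and that output is produced one token at a time from a finite vocabulary.

```python
import random
from collections import Counter, defaultdict

# A tiny stand-in for an LLM: its "rules" are not written by hand but
# estimated from a corpus (bigram counts here, learned parameters in a
# real model).
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat saw the dog ."
).split()

# "Training": infer next-token statistics from text.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def generate(prompt, max_new_tokens=10):
    """Autoregressively extend a prompt by sampling one token at a time."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        options = bigram_counts.get(tokens[-1])
        if not options:
            break
        words, counts = zip(*options.items())
        tokens.append(random.choices(words, weights=counts)[0])
    return " ".join(tokens)

print(generate("the dog"))
```

Even this toy model shows a crude form of productivity: it can emit strings that never occur verbatim in its training text, such as “the dog sat on the mat.”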
Thus, LLMs are a form of Language, in the sense that they are a conventional system of arbitrary symbols that combine with one another to form representations of abstract ideas. LLM Language clearly differs from human Language, but it fulfills the technical definition of Language according to linguists.
Interpretability as the Scientific Study of AI
Interpretability has emerged as a thriving field of study within AI, but there does not seem to be any consensus on what it is. What interpretability researchers seem to agree on is that it is desirable for AI models to be “interpretable” (or “explainable” or “understandable”) in some way. But when we actually try to operationalize “interpretability” in the form of benchmarks or other evaluation paradigms, we often find that the results are misleading or mysterious, or altogether grounded in circular logic. In practice, much of interpretability research “[relies] on some notion of ‘you’ll know it when you see it’” (Doshi-Velez and Kim, 2017), which makes it difficult to determine what exactly has been learned from that research.
One view of interpretability, held either explicitly or implicitly by many linguists and cognitive scientists, is that interpretability, in its current form, is the scientific study of AI. Mainstream research in AI subfields like NLP and computer vision is dominated by techniques for producing systems that can be described as “black boxes.” Interpretabilists working in such subfields are therefore concerned with “opening the black box.” In that vein, much of the recent progress in interpretability has consisted of fine-grained description of AI model behavior, assignment of meaning to neural network representations, discovery of causal relations between AI model components, formulation of theories of deep learning and neural architectures, and development of methods for automating such analyses and conducting them at scale. These are essentially the same activities that any scientific discipline engages in; only in this case, AI models are the object of study.
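As one concrete, heavily simplified example of what “assigning meaning to representations” can look like, the sketch below fits a linear probe: given hidden vectors and a linguistic label for each one, it tests whether the label can be read off the vectors. The data here are synthetic (random vectors with a planted “grammatical number” direction), so the code stands in for the general methodology rather than reproducing any particular study.

```python
import numpy as np

# Synthetic stand-ins for a model's hidden activations over nouns, with a
# planted direction encoding grammatical number (0 = singular, 1 = plural).
rng = np.random.default_rng(0)
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
hidden = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction) * 2.0

# Fit a minimal linear probe by least squares on a training split, then
# check whether it predicts the label on held-out vectors.
train, test = slice(0, 150), slice(150, n)
w, *_ = np.linalg.lstsq(hidden[train], labels[train] * 2.0 - 1.0, rcond=None)
preds = (hidden[test] @ w > 0).astype(int)
print("probe accuracy:", (preds == labels[test]).mean())
```

High probe accuracy on held-out data is evidence (though not proof) that the feature is linearly encoded; interpretabilists must still rule out confounds and show that the model actually uses the feature, hence the need for controlled experiments and converging evidence, discussed below.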
But if interpretability is the scientific study of AI, then what is the rest of AI? Most AI subfields are applied sciences aiming to solve real-world problems, while interpretability is a pure science aiming to understand AI models as a natural phenomenon. In other words, AI engineers build AI, while AI interpretabilists study it. (That is not to say that interpretability does not involve any applied research: interpretability studies rely on tools that have been engineered for that purpose, while insights uncovered by the interpretability literature regularly spur innovation in applied AI research.) Under this view, the importance of interpretability to AI becomes clear. Who, in the 21st century, would be willing to enter a building designed by an architect who had never studied physics? Who would be willing to take a medication developed without any knowledge of biochemistry? Who would feel safe getting into a self-driving car equipped with the best-performing, state-of-the-art models, when the computer vision systems it relies on are still so poorly understood?
Of course, merely saying that interpretability is the scientific study of AI does not solve the problem of “you’ll know it when you see it”; but it does provide interpretabilists with a sense of direction, by allowing them to take inspiration from other scientific disciplines. Benchmarks and model analysis techniques should be viewed not as oracles that generate explanations, but as instruments whose measurements are reliable within a certain domain of application. Claims about causal relations should be justified by mechanistic theories and controlled experiments. Insights about AI models should be supported by a diverse array of converging evidence. Progress in the field should be measured by interpretabilists’ ability to answer pertinent questions about AI models. And the lack of unified standards of practice is not a reason to dismiss interpretability, but rather a symptom experienced by all new, emerging scientific disciplines (Kuhn, 1962).
LLM Interpretability as the Linguistics of AI
If LLMs are a form of Language, and interpretability is the science of AI, and linguistics is the science of Language, then it follows that the interpretability of LLMs can be understood as the linguistics of AI. It is now popular to think of LLMs as a form of task-agnostic “general intelligence,” or even as a sort of conscious entity with something resembling a soul. Framing LLM interpretability as the linguistics of AI reminds us that before all of these things, LLMs are conventional systems of arbitrary symbols that combine with one another to form representations of abstract ideas. Whether the LLM is (apparently) solving a math problem, answering a real-world question, or correcting your writing, its output is, first and foremost, a sequence of tokens from which the user extracts meaning. True “understanding” of LLMs therefore requires an understanding of how those tokens are generated, and how users assign meaning to them.
To that end, Cinnamonlab’s research focuses on six thematic areas, which aim to provide a comprehensive view of LLMs as a form of Language.
- Foundations of CL/NLP: What does it mean to do research in computational linguistics and/or natural language processing?
- Theoretical Linguistics: LLM Language is supposed to imitate human Language. What is human Language like?
- Theory of Architectures: LLM Language and human Language are built on fundamentally different computational architectures. What are the capabilities and limitations of each kind of architecture?
- Linguistic Evaluation: What do LLMs learn about human Language through training? What ought they know about human Language?
- Analysis of Representations: How do LLMs store and operationalize linguistic knowledge? What mechanisms are responsible for linking representations of knowledge to model behavior?
- AI and Society: Human Language allows humans to be social, and LLM Language allows LLMs to participate in human society. What role should LLMs play in human society? What can language technologies tell us humans about ourselves?
AI is supposed to transform computers from rote instruction-followers into agents capable of human-like “intelligence.” But the goal of intelligence can never be achieved if we do not understand what intelligence is. Human Language is a fundamental component of human intelligence, and AI linguistics applies the practice of rigorous scientific inquiry to a fundamental component of artificial intelligence. Progress on the questions above is progress towards a world where the workings of LLMs are no longer mysterious to us, and where it is possible to develop and deploy AI technology in a more principled manner.