Research

Cinnamonlab conducts cutting-edge research in computational linguistics, interpretability, and social aspects of AI using a diverse range of interdisciplinary methods and perspectives. We publish in ACL conferences and workshops as well as other top-tier venues in computational linguistics, theoretical linguistics, and AI.

Research Themes

Analysis of Representations: probing, feature attribution, causality

Analyzing what information is encoded by neural representations and how they drive model behavior

probing-causality

Linguistic Evaluation: behavioral tests, psycholinguistic modeling

Assessing the linguistic abilities of language models and comparing them to human language processing

ling-eval

Theory of Architectures: formal language theory, computational complexity

Describing the expressive power of neural network architectures and of linguistic theories

formal-analysis

Theoretical Linguistics: syntax, phonology, psycholinguistics

Gaining insights about human language and linguistics using computational techniques

linguistics

AI and Society: bias, fairness, humanities and social science

Understanding AI’s relation to society; applying AI to the humanities and social sciences

ai-society

Foundations of CL/NLP: position papers, metatheory, philosophy of science

Reflecting on what computational linguistics is, why we do it, and how we ought to do it

foundations

Publications

2026

Context-Free Recognition with Transformers

Sélim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill

arXiv · Jan 6, 2026 · arxiv:2601.01754

formal-analysis

2025

ModelCitizens: Representing Community Voices in Online Safety

Ashima Suvarna, Christina A Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel

EMNLP · Nov 5, 2025 · doi:10.18653/v1/2025.emnlp-main.1571

ai-society

Generative linguistics, Large Language Models, and the social nature of scientific success

Sophie Hao

Italian Journal of Linguistics · Jul 1, 2025

linguistics foundations

What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length

Lindia Tjuatja, Graham Neubig, Tal Linzen, Sophie Hao

NAACL · May 1, 2025 · doi:10.18653/v1/2025.naacl-long.109

ling-eval linguistics foundations

ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors

Qinchan Li, Sophie Hao

NAACL · Apr 30, 2025 · doi:10.18653/v1/2025.naacl-long.159

ling-eval

2024

Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels

Kexin Luo, Yue Mao, Bei Zhang, Sophie Hao

LREC-Coling · May 22, 2024

ai-society

Universal Generation for Optimality Theory Is PSPACE-Complete

Sophie Hao

CL · Mar 1, 2024 · doi:10.1162/coli_a_00494

linguistics formal-analysis foundations

2023

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

Sophie Hao, Tal Linzen

EMNLP Findings · Dec 8, 2023 · doi:10.18653/v1/2023.findings-emnlp.300

probing-causality ling-eval

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Sophie Hao, and 446 other authors

TMLR · May 11, 2023

ling-eval

2022

Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity

Sophie Hao, Dana Angluin, Robert Frank

TACL · Jul 27, 2022 · doi:10.1162/tacl_a_00490

formal-analysis

Theory and Applications of Attribution for Interpretable Language Technology

Sophie Hao

PhD Dissertation · Jul 1, 2022

probing-causality foundations

An Adversarial Benchmark for Fake News Detection Models

Lorenzo Jaime Yu Flores, Sophie Hao

AAAI-22 AdvML Workshop · Feb 28, 2022 · arxiv:2201.00912

ling-eval

2020

Evaluating Attribution Methods using White-Box LSTMs

Sophie Hao

BlackboxNLP · Nov 20, 2020 · doi:10.18653/v1/2020.blackboxnlp-1.28

probing-causality formal-analysis

Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling

Sophie Hao, Simon Mendelsohn, Rachel Sterneck, Randi Martinez, Robert Frank

CMCL · Nov 19, 2020 · doi:10.18653/v1/2020.cmcl-1.10

ling-eval linguistics

Rhythmic Syncope in Subregular Phonology

Dustin Bowers, Sophie Hao

Penn Linguistics Conference · Oct 1, 2020

linguistics formal-analysis

Attribution Analysis of Grammatical Dependencies in LSTMs

Sophie Hao

arXiv · May 4, 2020 · arxiv:2005.00062

probing-causality

Computing Vowel Harmony: The Generative Capacity of Search & Copy

Vendala Ruby, Hossep Dolatian, Sophie Hao

Annual Meeting on Phonology · May 2, 2020 · doi:10.3765/amp.v8i0.4752

linguistics formal-analysis

Metrical Grids and Generalized Tier Projection

Sophie Hao

SCiL · Jan 4, 2020 · doi:10.7275/scil.1227

linguistics formal-analysis

2019

Finite-state Optimality Theory: non-rationality of Harmonic Serialism

Sophie Hao

Journal of Language Modelling · Sep 16, 2019 · doi:10.15398/jlm.v7i2.210

linguistics formal-analysis foundations

Action-Sensitive Phonological Dependencies

Sophie Hao, Dustin Bowers

SIGMORPHON · Aug 2, 2019 · doi:10.18653/v1/W19-4225

linguistics formal-analysis

Unbounded Stress in Subregular Phonology

Sophie Hao, Vendala Ruby

SIGMORPHON · Aug 2, 2019 · doi:10.18653/v1/W19-4216

linguistics formal-analysis

Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing

William Merrill, Lenny Khazan, Noah Amsel, Sophie Hao, Simon Mendelsohn, Robert Frank

BlackboxNLP · Aug 1, 2019 · doi:10.18653/v1/W19-4823

formal-analysis probing-causality

Learnability and Overgeneration in Computational Syntax

Sophie Hao

SCiL · Jan 5, 2019 · doi:10.7275/1qmz-bg76

linguistics formal-analysis foundations

2018

Context-Free Transductions with Neural Stacks

Sophie Hao, William Merrill, Dana Angluin, Robert Frank, Noah Amsel, Andrew Benz, Simon Mendelsohn

BlackboxNLP · Nov 1, 2018 · doi:10.18653/v1/W18-5433

formal-analysis probing-causality

2017

Harmonic Serialism and Finite-State Optimality Theory

Sophie Hao

FSMNLP · Jan 1, 2017 · doi:10.18653/v1/W17-4003

Important: The main result of this paper is incorrect. Please see Hao (2019) for the corrected version.

linguistics formal-analysis foundations

2015

Model-Theoretic Minimalism

Sophie Hao

BA Thesis · May 14, 2015

linguistics formal-analysis foundations

2014

The peril of sounding manly: A look at vocal characteristics of lawyers before the United States Supreme Court

Alan C. L. Yu, Katie Franich, Jacob Phillips, Betsy Pillion, Sophie Hao, Zhigang Yin, Daniel Chen

LabPhon · Jul 27, 2014

linguistics

2013

The Dimension and Entropy of ω-Languages

Sophie Hao

Manuscript · Jul 29, 2013

formal-analysis

Invited Talks

2025

Towards a Linguistics of Large Language Models

Sophie Hao

Harvard University · Oct 21, 2025

Towards a Linguistics of Large Language Models

Sophie Hao

Stony Brook University · Oct 17, 2025

Towards a Science and Mathematics of Language Models

Sophie Hao

Mathematics of Language Conference · Aug 17, 2025

Towards a Linguistics of Large Language Models

Sophie Hao

Boston University · Mar 21, 2025

Towards a Linguistics of Large Language Models

Sophie Hao

University College London · Mar 4, 2025

Towards a Linguistics of Large Language Models

Sophie Hao

UC San Diego · Jan 30, 2025

2024

Word Embeddings: Examining Culture Through a Data-Driven Lens

Sophie Hao

Vanderbilt University · Apr 3, 2024

2023

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

Sophie Hao

FLaNN · Dec 18, 2023

Transformers and Circuit Complexity

Sophie Hao

Flatiron Institute · Nov 17, 2023

2022

Understanding RNNs and Transformers using Formal Languages

Sophie Hao

ETH Zürich · Nov 21, 2022

Understanding RNNs and Transformers using Formal Languages

Sophie Hao

University of Notre Dame · Nov 7, 2022

Finding Interpretable Word Embedding Subspaces using Covariance and Correlation Maximization

Sophie Hao

New York University · Sep 14, 2022