cinnamonlab Boston University

Research

Cinnamonlab conducts research in computational linguistics, interpretability, and the social aspects of AI, drawing on methods and perspectives from a range of disciplines. We publish at ACL conferences and workshops, as well as at other leading venues in computational linguistics, theoretical linguistics, and AI.

Research Themes

Analysis of Representations  ·  probing, feature attribution, causality

Analyzing what information is encoded by neural representations and how they drive model behavior

Linguistic Evaluation  ·  behavioral tests, psycholinguistic modeling

Assessing the linguistic abilities of language models and comparing them to human language processing

Theory of Architectures  ·  formal language theory, computational complexity

Characterizing the expressive power of neural network architectures and of linguistic theories

Theoretical Linguistics  ·  syntax, phonology, psycholinguistics

Gaining insight into human language and linguistic theory using computational techniques

AI and Society  ·  bias, fairness, humanities and social science

Understanding AI’s relation to society; applying AI to the humanities and social sciences

Foundations of CL/NLP  ·  position papers, metatheory, philosophy of science

Reflecting on what computational linguistics is, why we do it, and how we ought to do it

Publications

2026

Context-Free Recognition with Transformers
Sélim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill
arXiv  ·  Jan 5, 2026  ·  arxiv:2601.01754

2025

ModelCitizens: Representing Community Voices in Online Safety
Ashima Suvarna, Christina A Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel
EMNLP  ·  Nov 5, 2025  ·  doi:10.18653/v1/2025.emnlp-main.1571
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
Lindia Tjuatja, Graham Neubig, Tal Linzen, Sophie Hao
NAACL  ·  May 1, 2025  ·  doi:10.18653/v1/2025.naacl-long.109
ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors
Qinchan Li, Sophie Hao
NAACL  ·  Apr 30, 2025  ·  doi:10.18653/v1/2025.naacl-long.159

2024

Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels
Kexin Luo, Yue Mao, Bei Zhang, Sophie Hao
LREC-COLING  ·  May 22, 2024

2023

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Sophie Hao, Tal Linzen
EMNLP Findings  ·  Dec 8, 2023  ·  doi:10.18653/v1/2023.findings-emnlp.300
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Sophie Hao, and 446 other authors
TMLR  ·  May 11, 2023

2022

Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Sophie Hao, Dana Angluin, Robert Frank
TACL  ·  Jul 27, 2022  ·  doi:10.1162/tacl_a_00490
An Adversarial Benchmark for Fake News Detection Models
Lorenzo Jaime Yu Flores, Sophie Hao
AAAI-22 AdvML Workshop  ·  Feb 28, 2022  ·  arxiv:2201.00912

2020

Evaluating Attribution Methods using White-Box LSTMs
Sophie Hao
BlackboxNLP  ·  Nov 20, 2020  ·  doi:10.18653/v1/2020.blackboxnlp-1.28
Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling
Sophie Hao, Simon Mendelsohn, Rachel Sterneck, Randi Martinez, Robert Frank
CMCL  ·  Nov 19, 2020  ·  doi:10.18653/v1/2020.cmcl-1.10
Rhythmic Syncope in Subregular Phonology
Dustin Bowers, Sophie Hao
Penn Linguistics Conference  ·  Oct 1, 2020
Attribution Analysis of Grammatical Dependencies in LSTMs
Sophie Hao
arXiv  ·  May 4, 2020  ·  arxiv:2005.00062
Computing Vowel Harmony: The Generative Capacity of Search & Copy
Vendala Ruby, Hossep Dolatian, Sophie Hao
Annual Meeting on Phonology  ·  May 2, 2020  ·  doi:10.3765/amp.v8i0.4752
Metrical Grids and Generalized Tier Projection
Sophie Hao
SCiL  ·  Jan 4, 2020  ·  doi:10.7275/scil.1227

2019

Finite-state Optimality Theory: non-rationality of Harmonic Serialism
Sophie Hao
Journal of Language Modelling  ·  Sep 16, 2019  ·  doi:10.15398/jlm.v7i2.210
Action-Sensitive Phonological Dependencies
Sophie Hao, Dustin Bowers
SIGMORPHON  ·  Aug 2, 2019  ·  doi:10.18653/v1/W19-4225
Unbounded Stress in Subregular Phonology
Sophie Hao, Vendala Ruby
SIGMORPHON  ·  Aug 2, 2019  ·  doi:10.18653/v1/W19-4216
Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing
William Merrill, Lenny Khazan, Noah Amsel, Sophie Hao, Simon Mendelsohn, Robert Frank
BlackboxNLP  ·  Aug 1, 2019  ·  doi:10.18653/v1/W19-4823

2018

Context-Free Transductions with Neural Stacks
Sophie Hao, William Merrill, Dana Angluin, Robert Frank, Noah Amsel, Andrew Benz, Simon Mendelsohn
BlackboxNLP  ·  Nov 1, 2018  ·  doi:10.18653/v1/W18-5433

2017

Harmonic Serialism and Finite-State Optimality Theory
Sophie Hao
FSMNLP  ·  Jan 1, 2017  ·  doi:10.18653/v1/W17-4003
Important: The main result of this paper is incorrect. Please see Hao (2019) for the corrected version.

2015

Model-Theoretic Minimalism
Sophie Hao
BA Thesis  ·  May 14, 2015

2014

The peril of sounding manly: A look at vocal characteristics of lawyers before the United States Supreme Court
Alan C. L. Yu, Katie Franich, Jacob Phillips, Betsy Pillion, Sophie Hao, Zhigang Yin, Daniel Chen
LabPhon  ·  Jul 27, 2014

2013

The Dimension and Entropy of ω-Languages
Sophie Hao
Manuscript  ·  Jul 29, 2013

Invited Talks

2025

Towards a Linguistics of Large Language Models
Sophie Hao
Harvard University  ·  Oct 21, 2025
Towards a Linguistics of Large Language Models
Sophie Hao
Stony Brook University  ·  Oct 17, 2025
Towards a Science and Mathematics of Language Models
Sophie Hao
Mathematics of Language Conference  ·  Aug 17, 2025
Towards a Linguistics of Large Language Models
Sophie Hao
Boston University  ·  Mar 21, 2025
Towards a Linguistics of Large Language Models
Sophie Hao
University College London  ·  Mar 4, 2025
Towards a Linguistics of Large Language Models
Sophie Hao
UC San Diego  ·  Jan 30, 2025

2024

Word Embeddings: Examining Culture Through a Data-Driven Lens
Sophie Hao
Vanderbilt University  ·  Apr 3, 2024

2023

Transformers and Circuit Complexity
Sophie Hao
Flatiron Institute  ·  Nov 17, 2023

2022

Understanding RNNs and Transformers using Formal Languages
Sophie Hao
ETH Zürich  ·  Nov 21, 2022
Understanding RNNs and Transformers using Formal Languages
Sophie Hao
University of Notre Dame  ·  Nov 7, 2022