About
I am a post-doctoral researcher at the École Normale Supérieure in Paris (Institut Jean Nicod). I completed my PhD in linguistics and cognitive science at ENS and Tel Aviv University, where I was co-advised by Emmanuel Chemla (ENS) and Roni Katzir (TAU). My dissertation can be found here.
I am mainly interested in building computational models that try to learn natural language in the same way we think humans do. I am also interested in the evolution of language, formal language theory, information theory, as well as in animal linguistics and comparative cognition, and I like combining these fields in thinking about the origins of the human language capacity.
I did my master's in computational linguistics under the supervision of Roni Katzir, after completing a bachelor's degree in Computer Science (double-major with a BA in Film), both at Tel Aviv University. In between I worked for several tech companies in Tel Aviv, and as an editor and writer in the Israeli press.
Papers and manuscripts
Projects
Minimum Description Length Recurrent Neural Networks
With Michal Geyer, Emmanuel Chemla & Roni Katzir
Binary addition network found through neuroevolution guided by MDL
Neural networks still struggle with tasks that are very easy for humans, like recognizing simple regularities such as 10101010... or aaabbb..., and learning basic arithmetic.
To make networks generalize better, we replace standard objectives with a computable version of Kolmogorov Complexity, the Minimum Description Length principle (MDL), which balances the network's architecture size with its accuracy.
Using neuroevolution guided by MDL, we find small and perfect networks for tasks that are notoriously hard for traditional networks, such as basic addition and formal languages like Dyck-1, aⁿbⁿ, aⁿb²ⁿ, aⁿbᵐcⁿ⁺ᵐ, and aⁿbⁿcⁿ. MDL networks are very small, often containing only one or two hidden units, which makes it possible to prove that they are correct for all strings. To our knowledge, no other neural network has been proven to do that.
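To give a concrete picture of the objective, here is a minimal Python sketch of an MDL-style score. It is not the paper's exact encoding scheme; the architecture cost is a hypothetical stand-in for the number of bits needed to describe the network and its weights.

import math

def data_cost_in_bits(probs_of_observed):
    # Cost of encoding the training data given the network:
    # the sum of -log2 p(symbol) over the observed symbols.
    return -sum(math.log2(p) for p in probs_of_observed)

def mdl_score(architecture_cost_in_bits, probs_of_observed):
    # Total description length = |network| + |data given network|.
    # Smaller is better: accuracy and network size trade off directly.
    return architecture_cost_in_bits + data_cost_in_bits(probs_of_observed)

# Example: a tiny network that takes 150 bits to encode and assigns
# probability 0.5 to each of 20 observed symbols costs 150 + 20 = 170 bits.
print(mdl_score(150, [0.5] * 20))  # 170.0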
Bridging the Empirical-Theoretical Gap in Formal Language Learning
With Emmanuel Chemla & Roni Katzir
L1 and L2 regularization surfaces compared to MDL
Neural nets are known to be able to perfectly represent formal languages such as aⁿbⁿ and Dyck. Yet such networks are never found using standard techniques. How come?
We manually build an optimal aⁿbⁿ LSTM and find that it is not an optimum of the standard cross-entropy loss, even with regularization terms that are commonly thought to lead to good generalization. Meta-heuristics like early stopping and dropout don't help either. Moving to the Minimum Description Length objective, however, does lead to the perfect target network.
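The intuition behind the hand-built network can be conveyed in a few lines of Python: aⁿbⁿ can be tracked with a single counter that goes up on 'a' and down on 'b'. The paper's actual construction realizes this idea with concrete LSTM weights and probabilistic outputs, which this sketch leaves out.

def accepts_anbn(string):
    # One counter suffices: +1 for every 'a', -1 for every 'b',
    # and reject if an 'a' ever appears after a 'b'.
    counter = 0
    seen_b = False
    for symbol in string:
        if symbol == 'a':
            if seen_b:
                return False
            counter += 1
        elif symbol == 'b':
            seen_b = True
            counter -= 1
            if counter < 0:      # more b's than a's so far
                return False
        else:
            return False
    return counter == 0

print(accepts_anbn('aaabbb'))  # True
print(accepts_anbn('aabbb'))   # False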
Large Language Models and the Argument From the Poverty of the Stimulus
With Emmanuel Chemla & Roni Katzir
GPT-3's surprisal values for the grammatical parasitic-gap sentence 'Who did the fact that Mary remembered surprise yesterday' (blue) and its ungrammatical variant ending in '*you' (red)
Modern language models are trained on huge corpora that amount to years or even lifetimes of human linguistic experience. Can we use this fact to learn about the initial state of a human child acquiring language?
Building on work by Wilcox et al. (2022), we examine the knowledge of state-of-the-art language models, including GPT-J and GPT-3, regarding key syntactic constraints. We find that all models fail to acquire adequate knowledge of these phenomena, delivering predictions that clash with the judgments of human speakers. Since these models are trained on data that go far beyond the linguistic experience of children, our findings support the claim that children are equipped with innate linguistic biases that these models lack.
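The kind of comparison we run can be illustrated in a few lines of Python, here using GPT-2 through the Hugging Face transformers library as a freely available stand-in for the larger models tested in the paper, and measuring only the sentence-final token rather than the full critical region.

# Requires: pip install torch transformers
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def final_token_surprisal(sentence):
    # Surprisal (in bits) of the sentence's final token given its prefix.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # prediction for each next token
    targets = ids[0, 1:]
    surprisals = -log_probs[torch.arange(len(targets)), targets] / math.log(2)
    return surprisals[-1].item()

# If the model tracked the constraint the way humans do, the ungrammatical
# continuation should be far more surprising at the critical word.
print(final_token_surprisal("Who did the fact that Mary remembered surprise yesterday"))
print(final_token_surprisal("Who did the fact that Mary remembered surprise you"))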
Misc.
Contact
π