Nur Lan
About
I am a PhD candidate in linguistics and cognitive science at the École Normale Supérieure in Paris and Tel Aviv University, co-advised by Emmanuel Chemla (ENS) and Roni Katzir (TAU). My PhD is funded by the ED3C.
I am mainly interested in building computational models that try to learn natural language in the same way we think humans do. I am also interested in the evolution of language, formal language theory, and information theory, as well as in animal linguistics and comparative cognition, and I like combining these fields when thinking about the origins of the human language capacity.
I did my master's in computational linguistics under the supervision of Roni Katzir, after completing a bachelor's degree in Computer Science (as a double major with a BA in Film), both at Tel Aviv University. In between, I worked for several tech companies in Tel Aviv and as an editor and writer in the Israeli press.
Projects
Minimum Description Length Recurrent Neural Networks
With Michal Geyer, Emmanuel Chemla & Roni Katzir
Neural networks are remarkably successful, but they still fail on tasks that are very easy for humans, such as recognizing simple regularities like 10101010... or aaabbb..., and learning basic arithmetic operations like addition and multiplication.
To make networks generalize better, we replace standard training objectives with a computable version of Kolmogorov complexity, the Minimum Description Length (MDL) principle, which balances the size of the network's architecture against its accuracy on the data.
Using an evolutionary architecture search guided by MDL, we find small, perfect networks that handle tasks which are notoriously hard for traditional networks, such as basic addition and the recognition of formal languages like Dyck-1, aⁿbⁿ, aⁿb²ⁿ, aⁿbᵐcⁿ⁺ᵐ, and aⁿbⁿcⁿ. MDL networks are very small, often containing only one or two hidden units, which makes it possible to prove that they are correct for all strings. To our knowledge, no other neural network has been proven to do that.
Addition network
aⁿbⁿcⁿ network
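For a concrete sense of how the MDL objective trades off network size against fit, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the helper below takes a pre-computed encoding length for the network (|H|) and the probabilities it assigned to the observed data, whereas the actual project encodes full RNN architectures and weights and searches over them with a genetic algorithm.

```python
# Illustrative MDL-style score: |H| + |D:H|, both in bits.
# (Hypothetical helper; not the project's actual implementation.)
import math

def mdl_score(hypothesis_bits, predicted_probs):
    """hypothesis_bits: encoding length of the network itself, |H|.
    predicted_probs: the probability the network assigned to each observed
    symbol; summing -log2(p) gives the data's encoding length given H, |D:H|.
    """
    data_bits = sum(-math.log2(p) for p in predicted_probs)
    return hypothesis_bits + data_bits

# Toy comparison over 100 observed symbols:
small_net = mdl_score(50.0, [0.50] * 100)   # short encoding, decent fit -> 150.0 bits
large_net = mdl_score(900.0, [0.99] * 100)  # near-perfect fit, long encoding -> ~901.4 bits
print(small_net, large_net)
```

Under a score of this kind, a tiny network that captures the underlying regularity can beat a much larger one that merely fits the training data a bit more tightly, which is what pushes the search toward the small, provably correct networks described above.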
Large Language Models and the Argument From the Poverty of the Stimulus
With Emmanuel Chemla & Roni Katzir
Modern language models are trained on huge corpora that amount to years or even lifetimes of human linguistic experience. Can we use this fact to learn about the initial state of a human child acquiring language?
Building on work by Wilcox et al. (2022), we examine the knowledge that state-of-the-art language models, including GPT-J and GPT-3, have of important syntactic constraints. We find that all models fail to acquire adequate knowledge of these phenomena, delivering predictions that clash with the judgments of human speakers. Since these models are trained on data that go well beyond the linguistic experience of children, our findings support the claim that children are equipped with innate linguistic biases that these models lack.
GPT-3's surprisal values for the grammatical parasitic-gap sentence 'Who did the fact that Mary remembered surprise yesterday' (blue) and its ungrammatical variant ending in '*you' (red)
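As a rough illustration of the kind of comparison behind the figure above, here is a minimal sketch that computes per-token surprisal, -log2 p(token | preceding context), with a Hugging Face causal language model. GPT-2 stands in here only because it is small and publicly downloadable; the study itself probes larger models such as GPT-J and GPT-3, and its materials and critical regions come from the published work rather than from this toy script.

```python
# Minimal surprisal sketch (assumes the `transformers` and `torch` packages).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_surprisals(sentence):
    """Return (token, surprisal-in-bits) pairs; token t is predicted from
    the tokens before it, so the first token gets no surprisal value."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    return [
        (tokenizer.decode(ids[0, t]),
         -log_probs[0, t - 1, ids[0, t]].item() / math.log(2))
        for t in range(1, ids.size(1))
    ]

# The minimal pair from the figure: a model that has acquired the constraint
# should find the grammatical continuation less surprising than the starred one.
for sentence in ["Who did the fact that Mary remembered surprise yesterday?",
                 "Who did the fact that Mary remembered surprise you?"]:
    print(sentence)
    for token, bits in token_surprisals(sentence):
        print(f"  {token!r}: {bits:.2f} bits")
```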
Papers and manuscripts
Contact
Misc.
π