

Fieldwork on child language learning in Dagestan


Graduate student Annie Gagliardi and her collaborators are working with children in remote villages in Dagestan to help explain why children are such successful language learners. Gagliardi is a trainee in the University of Maryland’s new program on “Biological and Computational Foundations of Language Diversity”, which is supported by NSF’s Integrative Graduate Education and Research Traineeship (IGERT) program. She is focusing on the processes that allow children to learn how their language organizes nouns into different classes. Languages classify nouns in many different ways. The number of classes ranges from one (as in English) to as many as thirteen. The classification may be based on gender (as in Spanish), on biological properties (e.g., whether something is an animal), on shape, or on the sound of the word, or it may be arbitrary. Noun classes often determine the form of other words in a sentence, such as adjectives or verbs. Gagliardi and her collaborators are integrating research techniques from linguistics, machine learning and developmental psychology in order to identify the cognitive mechanisms that underlie how these categories are learned.

This research draws on several research traditions. Traditional linguistics has typically identified the features relevant to classification by describing the vocabulary of various languages and looking for commonalities across them. Traditional psychology has typically explored categorization mechanisms through behavioral experiments in which people categorize artificial stimuli. Traditional computer science has focused on engineering tools for finding classes, using computational modeling to extract information from large corpora. While each approach has made valuable contributions within its own field, these results have generally not been linked together to serve a broader goal: understanding how children discover and learn categories when the language they are acquiring depends on them. The IGERT program encourages collaboration among linguists, psycholinguists and computer scientists, making it possible to combine these traditional approaches in new ways to identify how human learners form classes.

The investigation centers on Tsez, a Nakh-Dagestanian language spoken in the northeast Caucasus. This language was chosen because it has a system of four noun classes and because it is still being acquired as a first language by children. As a first step, Gagliardi collected a large sample of naturalistic speech directed to children. Because this speech forms the input to the learning mechanism, it is important to characterize it fully. Second, she used computational models, with the naturalistic data as input, to determine how informative various cues are about noun class. Third, she tested the models’ predictions in behavioral experiments that measure children’s and adults’ sensitivity to these cues.

Address Goals

The research is preparing graduate student Annie Gagliardi to be a uniquely well-trained interdisciplinary scientist who can draw on tools from linguistics, psychology, and computer science to address questions that cannot be answered with the tools of a single discipline.

The research promises to significantly advance our understanding of how children form categories and use category information to draw inferences, both of which are key features of human language learning.