Developing the new area of "Statistical Linguistics"

Research Achievements

For our IGERT, we proposed to develop the new area of "Statistical Linguistics", applying insights from statistical computational linguistics to problems in theoretical linguistics. In work with trainee Marieke Obdeyn, new core faculty member Colin Wilson has shown that applying the maximum-entropy model, a very successful probabilistic model from computational linguistics, leads to substantial improvements in accounting for a central problem in theoretical phonology: apparent cross-linguistic variation in a fundamental principle that prohibits similar consonants from appearing in the same word. This principle had long been recognized as a statistical language universal, but previous analyses deployed mathematically ad hoc approaches, which Wilson and Obdeyn showed to be systematically biased. When these biases are removed through the maximum-entropy model, the range of possible cross-linguistic variation can be substantially narrowed.