The Sinclair Lecture 2021: Of wo/men and machines: an interdisciplinary take on language in use

11 Jul 2021

Linguistics has gone through major changes in how language knowledge is conceptualized and explored. Over the past century, the neat and tidy structuralist approach to language knowledge has given way to a fascination with the probabilistic messiness of language use. The advent of the computer and of machine-readable corpora facilitated this transformation. Yet, whereas during the 20th century theory and methodology arguably developed in parallel, the 21st-century revolution appears to have made both theoretical framing and methodological rigour all but redundant: ever larger datasets are mined with ever more powerful algorithms, while sampling principles are forgotten and the inner workings of the mining models have become intractable. The results are superhuman, but they reaffirm the belief in the Big and Mighty and, ultimately, instil doubt in theoretical traditions.

In this talk I will argue that, if we want to understand language knowledge, our field needs to approach human-sized datasets with algorithms, inspired by biologically based learning, that pay attention to patterns in language that human brains can pick up, too. I will present research carried out by the Out Of Our Minds team [outofourminds.bham.ac.uk] that aims to achieve this and, in so doing, returns to a basic insight of corpus linguistics: “a language user has available … a large number of semi-preconstructed phrases that constitute single choices” (Sinclair 1991: 110), at both the lexical and the grammatical level.

Using a combination of smaller-scale corpus linguistic studies, larger-scale computational simulations and behavioural experimentation, I show that language users do detect patterns in the input, and that these patterns are very different from both linguistic abstractions and algorithmic trends. Linguists cut abstract patterns loose from the words on which they manifest themselves and from the contexts in which they occur; algorithms, with their unconstrained appetite for data, overcommit to the input at the expense of generalizability; but language users retain the meaningful link between a pattern and the word(s) it is marked on in context, and do so with relatively little input. Sinclair (1991: 100) struck the balance when he said that “grammar is nothing but a generalization over the usage patterns of individual words in utterances”.