What is learned from exposure
The problem
If you speak Polish and want to say that you don’t have a computer or phone, you face a difficult decision: do you say Nie mam komputera or do you say Nie mam komputeru? And is it Nie mam telefona or Nie mam telefonu? And what about your tablet? And your iPhone? Polish speakers get confused about this too. You can see it for yourself here!
It should be Nie mam komputera but Nie mam telefonu. Why, we hear you ask in despair? Well, that’s the question linguists have been trying to answer for a long time, and they have proposed many classifications of things that take -a and things that take -u.
The tradition
Some linguists are “lumpers” and try to explain the variation with as few categories as possible. For example, some have proposed that the ending -a would be used for small, manipulable items, while the ending -u would be reserved for large, immovable objects. But that doesn’t even hold for our small sample: surely, phones are, and have always been, more movable than computers?
Other linguists are “splitters” and try to make their categories so small that no exceptions remain. Westfal (1956) is a prototypical example of a splitter: he wrote a 365-page book that deals with -a versus -u variation. There may not be any exceptions to his rules, but there are nearly as many rules as there are masculine inanimate words! We think Westfal lost hope at some point too, because he concluded his book by saying that the -u ending would be the elegant one, while the -a ending would come with a tinge of vulgarity or roughness.
What we did
We took a different approach: using our biologically and psychologically plausible learning algorithm, we re-examined the input, unconstrained by linguistic tradition. That means we weren’t limiting ourselves to the patterns linguists have previously looked for. After all, maybe there is a system that falls outside the purview of linguistic convention, but that we patternovores nevertheless discern and learn?
What we found
We found that words that take -a tend to contain three-letter sequences that activate comparatively few other words in Polish. Semantically speaking, these words tend to be poorly entrenched or rather atypical. Words that take -u, on the other hand, contain three-letter sequences that activate many other words in Polish. Semantically speaking, these words are well entrenched and contextually typical.
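To give a feel for what “three-letter sequences that are shared with many or few other words” means, here is a minimal sketch in Python. It is not our model: it simply counts, for every three-letter sequence, how many words in a word list contain it, and then averages those counts per word. The function names, the '#' edge marker and the five-word toy lexicon are all made up for illustration, and the numbers it produces say nothing about Polish as a whole.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of three-letter sequences in a word.

    '#' marks the word edges (an illustrative convention, assumed here).
    """
    padded = f"#{word}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_spread(lexicon):
    """Map each three-letter sequence to the number of distinct words containing it."""
    counts = defaultdict(int)
    for word in set(lexicon):
        for tri in trigrams(word):
            counts[tri] += 1
    return dict(counts)


def mean_spread(word, spread):
    """Average, over a word's three-letter sequences, of how many words share each one."""
    tris = trigrams(word)
    return sum(spread.get(tri, 0) for tri in tris) / len(tris)


# Toy word list, invented for this example only.
lexicon = ["telefon", "telewizor", "teleskop", "komputer", "komin"]
spread = trigram_spread(lexicon)

for word in ("telefon", "komputer"):
    print(word, round(mean_spread(word, spread), 2))
```

A crude count like this only captures how widely a sequence is distributed over the word list; a learning algorithm additionally weighs how informative each sequence is about the ending, which simple counting cannot do.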
Intriguingly, this finding provides a possible explanation for two observations. First, recall that Westfal (1956) concludes that the -u ending would be the elegant one while the -a ending would come with a tinge of vulgarity (or roughness). Words that take -u are made up of three-letter sequences that are distributed over many other words and are hence typical for the Polish language. Words that take -a, on the other hand, contain three-letter sequences that are distributed over fewer words, i.e. three-letter sequences that are less typical for Polish, and hence such words appear less desirable. Second, words that are typical for Polish are naturally also better known than words that are atypical, so this finding may explain why the minority ending -a is the one that attracts proportionally more foreign words.
We tested our computational findings regarding the three-letter sequences on people who have Polish as their mother tongue. We asked them to say “Nie ma X” where X was a made-up word that looks Polish but does not actually exist in the language. And sure enough, they preferred the -u ending significantly more often when our model preferred the -u ending, and they picked the -a ending significantly more often when our model preferred the -a ending.
Article, data and code
You can read the details of our study here. The dataset is available from UBIRA and the code is on our GitHub repository.