### About language, learning and tomatoes

[3] Insights for language: Applying the algorithm
Published: 01 Jul 2019

In our last blog we walked you through the Rescorla-Wagner rule and worked through a classic example from the literature. This week we would like to repeat the procedure on an actual linguistic example. Let's say that we want to use the same algorithm to learn, from exposure to examples alone, when to use the verb *start* and when to use the verb *begin*. There are two forms to choose from: START and BEGIN. Native speakers of English won't find it hard to select the verb that best fits the grammatical and semantic context, but what guides them? Let's look at a few examples:

• The movie ______ at 7:30pm.
• The car wouldn't ______ .
• My grandfather _____ his company in 1927.
• The meeting ____ at 9am.
• We should ___ early, before traffic gets busy.
• The day _____ well.

Table 6 presents the same kind of information as Table 1, but this time for verbs. Let's agree to label the context that requires one of the two verbs we're interested in <commence>.

Table 6

| Cues | Outcome | Frequency |
|---|---|---|
| commence; journey | START | 2 |
| commence; machine | START | 11 |
| commence; movie | START | 19 |
| commence; movie | BEGIN | 10 |
| commence; business | START | 2 |
| commence; meeting | BEGIN | 17 |
| commence; meeting | START | 9 |
| commence; day | BEGIN | 8 |
| commence; day | START | 3 |

This time we will not expand Table 6 in full: it contains a total of 81 learning events! Instead, we will present only the first ten learning events as they would appear after such expansion and randomization (as we did in the previous exercises).
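The expansion-and-randomization step can be sketched in a few lines of Python (a minimal sketch, not our actual code; the frequencies are copied directly from Table 6):

```python
import random

# Frequencies from Table 6: (cues, outcome, number of learning events).
table6 = [
    (("commence", "journey"), "START", 2),
    (("commence", "machine"), "START", 11),
    (("commence", "movie"), "START", 19),
    (("commence", "movie"), "BEGIN", 10),
    (("commence", "business"), "START", 2),
    (("commence", "meeting"), "BEGIN", 17),
    (("commence", "meeting"), "START", 9),
    (("commence", "day"), "BEGIN", 8),
    (("commence", "day"), "START", 3),
]

# Expand each row into `freq` individual learning events...
events = [(cues, outcome) for cues, outcome, freq in table6 for _ in range(freq)]

# ...and randomise their order, as in the previous exercises.
random.shuffle(events)

print(len(events))  # → 81 learning events in total
```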

Table 7

| Trial | Cues | Outcome |
|---|---|---|
| 1 | commence; meeting | BEGIN |
| 2 | commence; movie | START |
| 3 | commence; meeting | BEGIN |
| 4 | commence; day | BEGIN |
| 5 | commence; meeting | BEGIN |
| 6 | commence; machine | START |
| 7 | commence; movie | START |
| 8 | commence; machine | START |
| 9 | commence; machine | START |
| 10 | commence; movie | START |
| … | … | … |

Finally, Table 8 shows the values of the association weights as they get updated for each cue-outcome combination, for the first ten and the last three learning events.

Table 8

| Trial | commence BEGIN | commence START | business BEGIN | business START | day BEGIN | day START | journey BEGIN | journey START | machine BEGIN | machine START | meeting BEGIN | meeting START | movie BEGIN | movie START |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 |
| 2 | 0.010 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.010 |
| 3 | 0.020 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.020 | 0.000 | 0.000 | 0.010 |
| 4 | 0.030 | 0.010 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.020 | 0.000 | 0.000 | 0.010 |
| 5 | 0.039 | 0.010 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.029 | 0.000 | 0.000 | 0.010 |
| 6 | 0.039 | 0.020 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.029 | 0.000 | 0.000 | 0.010 |
| 7 | 0.038 | 0.029 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.029 | 0.000 | 0.000 | 0.020 |
| 8 | 0.038 | 0.039 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | -0.001 | 0.020 | 0.029 | 0.000 | 0.000 | 0.020 |
| 9 | 0.037 | 0.048 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | -0.001 | 0.029 | 0.029 | 0.000 | 0.000 | 0.020 |
| 10 | 0.037 | 0.058 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | -0.001 | 0.029 | 0.029 | 0.000 | -0.001 | 0.029 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 79 | 0.208 | 0.295 | -0.003 | 0.016 | 0.060 | 0.011 | -0.003 | 0.016 | -0.012 | 0.091 | 0.121 | 0.035 | 0.045 | 0.126 |
| 80 | 0.205 | 0.302 | -0.003 | 0.016 | 0.060 | 0.011 | -0.003 | 0.016 | -0.012 | 0.091 | 0.118 | 0.042 | 0.045 | 0.126 |
| 81 | 0.212 | 0.297 | -0.003 | 0.016 | 0.060 | 0.011 | -0.003 | 0.016 | -0.012 | 0.091 | 0.118 | 0.042 | 0.052 | 0.122 |
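The updates in Table 8 can be reproduced with a short Rescorla-Wagner sketch. Assuming a learning rate (α·β) of 0.01 and a maximum associative strength λ of 1 (values we infer from the 0.010 step on the first trial, not stated in this post), the first ten trials of Table 7 yield exactly the weights shown above:

```python
from collections import defaultdict

# First ten trials from Table 7: (active cues, outcome that occurred).
trials = [
    (("commence", "meeting"), "BEGIN"),
    (("commence", "movie"),   "START"),
    (("commence", "meeting"), "BEGIN"),
    (("commence", "day"),     "BEGIN"),
    (("commence", "meeting"), "BEGIN"),
    (("commence", "machine"), "START"),
    (("commence", "movie"),   "START"),
    (("commence", "machine"), "START"),
    (("commence", "machine"), "START"),
    (("commence", "movie"),   "START"),
]

OUTCOMES = ("BEGIN", "START")
RATE = 0.01    # alpha * beta; an assumption inferred from the 0.010 step on trial 1
LAMBDA = 1.0   # maximum associative strength (assumed)

weights = defaultdict(float)  # (cue, outcome) -> association weight, all start at 0

for cues, present_outcome in trials:
    for outcome in OUTCOMES:
        # Summed prediction for this outcome from all cues active on the trial.
        v_total = sum(weights[(cue, outcome)] for cue in cues)
        # Prediction error: LAMBDA if the outcome occurred, 0 if it did not.
        error = (LAMBDA if outcome == present_outcome else 0.0) - v_total
        # Every active cue shares the same update.
        for cue in cues:
            weights[(cue, outcome)] += RATE * error

print(round(weights[("commence", "START")], 3))  # → 0.058, as in row 10 of Table 8
```

Note that cues compete for predictive value: on trial 2, BEGIN does not occur, so the small positive <commence>→BEGIN weight is nudged downward even though nothing "negative" happened.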

The whole learning session can be presented visually:

The figure reveals some interesting results of learning.

First, it appears that the context of <commence> gives a bit more support to the outcome START than to BEGIN. This follows from the fact that <commence> was present on every trial and that, overall, START was somewhat more frequent than BEGIN (46 vs. 35 trials). We can also see that the association weight between <commence> and BEGIN shows a tendency to stabilise (that is, reach asymptote) around the value of $$0.2$$. This is not the case for the weight between <commence> and START, which continues to grow in strength until the end of the learning session.

Next, our machine learned that a <machine> and a <movie> have a tendency to START, but that a <meeting> most likely BEGINs. There are some other, weaker positive connections, which reveal the 'doubts' our machine has: for example, a <movie> can sometimes BEGIN, and a <meeting> can occasionally START, but a <machine> will most definitely not BEGIN, since that association weight ends up with a small negative value.
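These preferences can be read off directly by comparing each noun cue's two weights (a small illustrative sketch; the weight values are copied from the last row, trial 81, of Table 8):

```python
# Final cue-to-outcome association weights, copied from trial 81 of Table 8.
final = {
    ("commence", "BEGIN"): 0.212,  ("commence", "START"): 0.297,
    ("business", "BEGIN"): -0.003, ("business", "START"): 0.016,
    ("day",      "BEGIN"): 0.060,  ("day",      "START"): 0.011,
    ("journey",  "BEGIN"): -0.003, ("journey",  "START"): 0.016,
    ("machine",  "BEGIN"): -0.012, ("machine",  "START"): 0.091,
    ("meeting",  "BEGIN"): 0.118,  ("meeting",  "START"): 0.042,
    ("movie",    "BEGIN"): 0.052,  ("movie",    "START"): 0.122,
}

# Compare each noun cue's two weights to read off its preferred verb.
for noun in ("machine", "meeting", "movie"):
    begin_w = final[(noun, "BEGIN")]
    start_w = final[(noun, "START")]
    preferred = "BEGIN" if begin_w > start_w else "START"
    print(f"{noun}: BEGIN={begin_w:+.3f}  START={start_w:+.3f}  ->  {preferred}")
# meeting is the only one of the three whose weights favour BEGIN
```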

Linguists will recognize in this description of learning the dimensions that are typically distinguished in corpus-linguistic approaches to lexicography. This parallelism illustrates nicely how a computational modelling approach can be directly relevant to core questions in linguistics, while adding insight into how the relevant dimensions of lexical experience are identified and learned over time. Crucially, this approach applies to much more than lexical phenomena; we refer to our Outputs page for other examples.

Petar/Dagmar