About language, learning and tomatoes

[2] The maths behind the plants: Running the algorithm
Published: 03 Jun 2019

Last week we explained the Rescorla-Wagner rule and set out to learn which liquid (or perhaps a combination of the two) is most likely to yield tomatoes. In this post we calculate the first few learning weights by hand, to give you a good understanding of how the machinery works.

In the first trial (row 1 in Table 4) the cue pot is present and the outcome is NO_TOMATO. This means that the second condition (2) of the three learning situations above is satisfied: there is positive evidence for the outcome, and we need to calculate the sum of weights \(\sum{w_{.j}}\) from the present cues to that outcome. On the very first trial this sum can only be zero, because learning is just commencing. Next, still as per the formula in (2), we subtract this zero from one \((1 - 0)\), yielding 1, and multiply the result by the learning rate (speed) \(\gamma\), which we fix at a conveniently small value of 0.01. Hence, we calculate \(0.01 \times 1 = 0.01\). This \(0.01\) is the value of the connection weight between pot and NO_TOMATO after the first update, as reflected in Table 5 below.

For the remaining cues and outcomes, the situation is as follows. The red and blue liquids are not present, so condition (1) from the learning situations above applies and nothing happens: their weights stay at zero. The cue pot is present on this trial, but the outcome TOMATO is not, so we need to apply rule (3) for negative evidence: \(\gamma(0 - \sum{w_{.j}}) = 0.01(0 - 0) = 0\). In sum, all the remaining cells in the first row of Table 5 contain zeros.
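To make the bookkeeping concrete, the update can also be written out in a few lines of code. The sketch below is our own illustration rather than code from the model's original description; the cue and outcome labels are simply the ones used in this post, and `GAMMA` is the learning rate fixed to 0.01 above.

```python
# Rescorla-Wagner update for one learning trial.
GAMMA = 0.01  # learning rate, fixed to 0.01 in the text

CUES = ["pot", "red", "blue"]
OUTCOMES = ["TOMATO", "NO_TOMATO"]

def rw_update(weights, present_cues, present_outcome):
    """Apply one trial's update in place and return the weights.

    weights: dict mapping cue -> {outcome: weight}.
    Absent cues are left untouched (learning situation 1); for present
    cues, a present outcome uses gamma * (1 - sum) (situation 2) and an
    absent outcome uses gamma * (0 - sum) (situation 3).
    """
    for outcome in OUTCOMES:
        target = 1.0 if outcome == present_outcome else 0.0
        # Sum the weights from all *present* cues to this outcome,
        # taken before anything is changed on this trial.
        total = sum(weights[cue][outcome] for cue in present_cues)
        delta = GAMMA * (target - total)
        for cue in present_cues:
            weights[cue][outcome] += delta
    return weights
```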

In the second trial the cues pot, red, and blue and the outcome TOMATO are present. Here we need to apply rule (2): \(\Delta w^{t} = \gamma(1 - \sum{w_{.j}})\), and then update: \(w_{ij}^{t+1} = w_{ij}^{t} + \Delta w^{t}\). As this is the very first “appearance” of the outcome TOMATO, the sum of all weights (\(\sum{w_{.j}}\)) is zero, and thus, in the case of the cue pot, for example, we have: \(\gamma(1 - \sum{w_{.j}}) = 0.01(1 - 0) = 0.01 \times 1 = 0.01\). The same applies to the two remaining cues, red and blue. After the second update, the three respective cells in the second row of Table 5 (pot and TOMATO; red and TOMATO; blue and TOMATO) contain exactly those values.

The outcome NO_TOMATO was not present on the second learning trial, which means we need to apply the negative evidence (or error-correcting) equation to capture the change: \(\Delta w^{t} = \gamma(0 - \sum{w_{.j}})\), and then update the connection weights with: \(w_{ij}^{t+1} = w_{ij}^{t} + \Delta w^{t}\). Now, recall that NO_TOMATO has already been "experienced" on the first learning trial, which means that the sum of weights from the currently present cues (pot, red, blue) to NO_TOMATO is not necessarily zero. Instead it is \(\sum{w_{.j}} = 0.01\), a consequence of the pot being present in the first trial and gaining a weight of 0.01 there. Our change thus becomes: \(\Delta w^{t} = \gamma(0 - \sum{w_{.j}}) = 0.01(0 - 0.01) = 0.01 \times (-0.01) = -0.0001\). From that we can update the values for all cues: red and blue, which have not occurred before, still have zero weights (\(w_{ij}^{t} = 0\)), so \(w_{ij}^{t+1} = w_{ij}^{t} + \Delta w^{t} = 0 + (-0.0001) = -0.0001\); the pot gets \(w_{ij}^{t+1} = w_{ij}^{t} + \Delta w^{t} = 0.01 + (-0.0001) = 0.0099\). This is shown in the second row of Table 5.
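To check the arithmetic, we can run the sketch from above over the first two trials; the printed values (up to floating-point rounding) match the first two rows of Table 5:

```python
# Start with all weights at zero, then run the first two trials.
weights = {cue: {o: 0.0 for o in OUTCOMES} for cue in CUES}

rw_update(weights, ["pot"], "NO_TOMATO")              # trial 1
# pot: TOMATO 0.0000, NO_TOMATO 0.0100 -- row 1 of Table 5

rw_update(weights, ["pot", "red", "blue"], "TOMATO")  # trial 2
# pot:  TOMATO 0.0100, NO_TOMATO  0.0099
# red:  TOMATO 0.0100, NO_TOMATO -0.0001
# blue: TOMATO 0.0100, NO_TOMATO -0.0001 -- row 2 of Table 5

for cue in CUES:
    print(cue, {o: round(w, 4) for o, w in weights[cue].items()})
```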

We can now proceed to calculate the weights after the third learning trial, which combines pot and red paired with the outcome TOMATO. Let's first get the sums of weights (\(\sum{w_{.j}}\)) for the two possible outcomes, TOMATO and NO_TOMATO. Table 5 does the book-keeping for us, keeping all the weights aligned. After trial 2 the connection weight from each of pot and red to TOMATO is 0.01, which gives \(\sum{w_{.j}} = 0.01 + 0.01 = 0.02\). Analogously, the sum of weights for the outcome NO_TOMATO is \(\sum{w_{.j}} = 0.0099 - 0.0001 = 0.0098\). Now, given that in the third learning trial pot, red, and TOMATO are present, we need to apply the positive evidence update rule (2) in those cases, the no-change rule (1) for all combinations involving the absent blue liquid, and the negative evidence update rule (3) in the remaining cases (i.e., pot and NO_TOMATO, and red and NO_TOMATO). Let's list all the results (a short code check follows the table):

| Cue \(\rightarrow\) Outcome | Change \(\Delta w^{t}\) | Update \(w_{ij}^{t+1} = w_{ij}^{t} + \Delta w^{t}\) |
|---|---|---|
| pot \(\rightarrow\) TOMATO | \(0.01(1 - 0.02) = 0.0098\) | \(0.01 + 0.0098 = 0.0198\) |
| red \(\rightarrow\) TOMATO | \(0.01(1 - 0.02) = 0.0098\) | \(0.01 + 0.0098 = 0.0198\) |
| pot \(\rightarrow\) NO_TOMATO | \(0.01(0 - 0.0098) = -0.000098\) | \(0.0099 - 0.000098 = 0.009802\) |
| red \(\rightarrow\) NO_TOMATO | \(0.01(0 - 0.0098) = -0.000098\) | \(-0.0001 - 0.000098 = -0.000198\) |
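Continuing the snippet from above, the same function reproduces these trial-3 numbers:

```python
rw_update(weights, ["pot", "red"], "TOMATO")          # trial 3
# pot:  TOMATO 0.0198, NO_TOMATO  0.009802  (0.0098 rounded in Table 5)
# red:  TOMATO 0.0198, NO_TOMATO -0.000198  (-0.0002 rounded)
# blue: untouched, because it is absent on this trial (rule 1)
```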

These values are listed in Table 5, together with all the values for all the other trials.

Table 5. Connection weights between each cue and each outcome after every learning trial.

| Trial | pot \(\rightarrow\) TOMATO | pot \(\rightarrow\) NO_TOMATO | red \(\rightarrow\) TOMATO | red \(\rightarrow\) NO_TOMATO | blue \(\rightarrow\) TOMATO | blue \(\rightarrow\) NO_TOMATO |
|---|---|---|---|---|---|---|
| 1 | 0.0000 | 0.0100 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 2 | 0.0100 | 0.0099 | 0.0100 | -0.0001 | 0.0100 | -0.0001 |
| 3 | 0.0198 | 0.0098 | 0.0198 | -0.0002 | 0.0100 | -0.0001 |
| 4 | 0.0194 | 0.0197 | 0.0194 | 0.0097 | 0.0100 | -0.0001 |
| 5 | 0.0291 | 0.0195 | 0.0194 | 0.0097 | 0.0197 | -0.0003 |
| 6 | 0.0286 | 0.0293 | 0.0194 | 0.0097 | 0.0192 | 0.0095 |
| 7 | 0.0381 | 0.0289 | 0.0289 | 0.0093 | 0.0192 | 0.0095 |
| 8 | 0.0376 | 0.0385 | 0.0289 | 0.0093 | 0.0186 | 0.0191 |
| 9 | 0.0469 | 0.0381 | 0.0383 | 0.0088 | 0.0186 | 0.0191 |
| 10 | 0.0561 | 0.0376 | 0.0474 | 0.0084 | 0.0186 | 0.0191 |
| 11 | 0.0648 | 0.0369 | 0.0562 | 0.0077 | 0.0274 | 0.0185 |
| 12 | 0.0739 | 0.0364 | 0.0562 | 0.0077 | 0.0365 | 0.0179 |
| 13 | 0.0822 | 0.0358 | 0.0645 | 0.0071 | 0.0448 | 0.0173 |
| 14 | 0.0810 | 0.0452 | 0.0645 | 0.0071 | 0.0436 | 0.0268 |
| 15 | 0.0895 | 0.0447 | 0.0731 | 0.0066 | 0.0436 | 0.0268 |
| 16 | 0.0882 | 0.0540 | 0.0731 | 0.0066 | 0.0422 | 0.0361 |
| 17 | 0.0866 | 0.0634 | 0.0715 | 0.0160 | 0.0422 | 0.0361 |
| 18 | 0.0953 | 0.0624 | 0.0715 | 0.0160 | 0.0509 | 0.0351 |
| 19 | 0.0943 | 0.0718 | 0.0715 | 0.0160 | 0.0509 | 0.0351 |
| 20 | 0.0929 | 0.0807 | 0.0715 | 0.0160 | 0.0495 | 0.0440 |
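Finally, the whole of Table 5 can be generated by looping the update over the full trial schedule. In the sketch below only the first three trials, spelled out above, are filled in; the remaining trials follow Table 4 from last week's post and are not repeated here.

```python
def simulate(schedule):
    """Run the Rescorla-Wagner update over a whole trial schedule."""
    weights = {cue: {o: 0.0 for o in OUTCOMES} for cue in CUES}
    history = []
    for present_cues, present_outcome in schedule:
        rw_update(weights, present_cues, present_outcome)
        # Record a snapshot of the weights: one row of Table 5.
        history.append({c: dict(weights[c]) for c in CUES})
    return history

# The first three trials as described in the text.
schedule = [
    (["pot"], "NO_TOMATO"),
    (["pot", "red", "blue"], "TOMATO"),
    (["pot", "red"], "TOMATO"),
    # ... remaining trials from Table 4
]
history = simulate(schedule)
```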


Petar/Dagmar