Rescorla–Wagner model

The Rescorla-Wagner model is a model of classical conditioning in which the animal is theorized to learn from the discrepancy between what is expected to happen and what actually happens. It is a trial-level model: each stimulus is treated as either present or absent on a given trial. The prediction of the unconditioned stimulus on a trial is the sum of the associative strengths of all conditioned stimuli present on that trial. This summation was a major advance over previous models and allowed a straightforward explanation of important experimental phenomena such as blocking. For this reason, the Rescorla-Wagner model has become one of the most influential models of learning, though it has been frequently criticized since its publication. It has attracted considerable attention in recent years, as many studies have suggested that the phasic activity of midbrain dopamine neurons in mesostriatal projections encodes the type of prediction error described in the model.

The Rescorla-Wagner model was created by Robert A. Rescorla of the University of Pennsylvania and Allan R. Wagner of Yale University.

Success and popularity
The Rescorla-Wagner model has been successful and popular because:
 * 1) it can generate clear and ordinal predictions
 * 2) it has a number of successful predictions
 * 3) processing event representation by intensity and unexpectedness has an intuitive appeal
 * 4) it provides considerable heuristic value
 * 5) it has relatively few free parameters and independent variables
 * 6) it has had little competition from other theories

Basic assumptions of the model

 * 1) The amount of surprise an organism experiences when encountering an unconditioned stimulus (US) is assumed to depend on the summed associative value of all cues present during that trial. This assumption differs from previous models, which considered only the associative value of a particular conditioned stimulus (CS) to determine surprise.
 * 2) Excitation and inhibition are opposite features. One stimulus can only have a positive associative strength (being a conditioned excitor) or a negative associative strength (being a conditioned inhibitor); it cannot have both.
 * 3) The associative strength of a stimulus is expressed directly in the behaviour it elicits/inhibits. There is no way of learning about a stimulus without showing what was learned in the organism's reactions.
 * 4) The salience of a CS is a constant. The salience of a CS (alpha) is assumed not to change during training and can thus be represented by a constant.
 * 5) The history of a cue does not have any effects on its current state. It is only the current associative value of a cue which determines the amount of learning. It does not matter whether the CS may have undergone several conditioning-extinction sessions or the like.

The first two assumptions are unique to the Rescorla-Wagner model. The last three assumptions were present in antecedents of the model and are less central to the theory but still important to the structure of the model.

Equation
$$ \Delta V^{n+1}_X = \alpha_X \beta (\lambda - V_{tot}) $$

and

$$ V^{n+1}_X = V^n_X + \Delta V^{n+1}_X $$

where
 * $$\Delta V_X$$ is the change in the strength of association of X on a trial
 * $$\alpha_X$$ is the salience of the CS X (bounded by 0 and 1)
 * $$\beta$$ is the rate parameter for the US (bounded by 0 and 1), sometimes called its association value
 * $$\lambda$$ is the maximum conditioning possible for the US
 * $$V_X$$ is the current associative strength of X
 * $$V_{tot}$$ is the total associative strength of all CSs present on the trial
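
As an illustration, the two equations above can be sketched in Python and applied to blocking (A-US trials followed by AB-US trials). The function names `rw_update` and `run_blocking`, and the parameter values (salience 0.3, $$\beta = 1$$, $$\lambda = 1$$, 50 trials per phase), are assumptions chosen for the example and are not part of the model's specification.

```python
def rw_update(V, present, lam, alpha, beta):
    """Apply one Rescorla-Wagner trial and return the updated strengths.

    V       : dict mapping cue name -> current associative strength
    present : list of cue names present on this trial
    lam     : lambda, the maximum conditioning supported by the US
              (0 when the US is absent)
    alpha   : dict mapping cue name -> salience (between 0 and 1)
    beta    : rate parameter of the US (between 0 and 1)
    """
    v_tot = sum(V[c] for c in present)   # summed strength of all present cues
    error = lam - v_tot                  # prediction error shared by all cues
    new_V = dict(V)
    for c in present:                    # only present cues are updated
        new_V[c] = V[c] + alpha[c] * beta * error
    return new_V


def run_blocking(n_phase1=50, n_phase2=50, a=0.3, beta=1.0, lam=1.0):
    """Blocking design: A-US trials, then AB-US trials (illustrative values)."""
    V = {"A": 0.0, "B": 0.0}
    alpha = {"A": a, "B": a}
    for _ in range(n_phase1):            # Phase 1: A alone predicts the US
        V = rw_update(V, ["A"], lam, alpha, beta)
    for _ in range(n_phase2):            # Phase 2: compound AB predicts the US
        V = rw_update(V, ["A", "B"], lam, alpha, beta)
    return V


if __name__ == "__main__":
    print("blocking group:", run_blocking())
    print("control (AB-US only):", run_blocking(n_phase1=0))
```

Under these assumed values, A absorbs nearly all of the available associative strength in Phase 1, so the prediction error on AB trials is close to zero and B gains almost nothing. In the control condition without Phase 1, A and B share the associative strength equally.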

The Revised RW model by Van Hamme and Wasserman (1994)
Van Hamme and Wasserman extended the original Rescorla-Wagner (RW) model and introduced a new factor in their revised RW model in 1994: they suggested that not only conditioned stimuli physically present on a given trial can undergo changes in their associative strength; the associative value of a CS can also be altered through a within-compound association with a CS present on that trial. A within-compound association is established if two CSs are presented together during training (compound stimulus). If one of the two component CSs is subsequently presented alone, it is assumed to activate a representation of the other (previously paired) CS as well. Van Hamme and Wasserman propose that stimuli indirectly activated through within-compound associations have a negative learning parameter, which allows phenomena of retrospective revaluation to be explained.

Consider the following example, an experimental paradigm called "backward blocking" that is indicative of retrospective revaluation, where AB is the compound stimulus A+B:

Phase 1:     AB-US

Phase 2:      A-US

Test trials: In Group 1, which received both Phase 1 and Phase 2 trials, B elicits a weaker conditioned response (CR) than in the Control group, which received only Phase 1 trials.

The original RW model cannot account for this effect, but the revised model can. In Phase 2, stimulus B is indirectly activated through its within-compound association with A. A physically present stimulus has a positive learning parameter (usually called alpha), but B, being absent yet indirectly activated during Phase 2, has a negative one. Thus, during the second phase, B's associative strength declines, whereas A's value increases because of its positive learning parameter.

Thus, the revised RW model can explain why the CR elicited by B after backward blocking training is weaker compared with AB-only conditioning.
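
The backward blocking account can be sketched in Python as follows. This is a minimal illustration rather than Van Hamme and Wasserman's own formulation: the function names, the parameter values (0.3 for present cues, -0.15 for absent but indirectly activated cues) and the choice to compute the prediction error from the physically present cues only are all assumptions made for the example.

```python
def revised_rw_trial(V, present, absent_expected, lam,
                     alpha_present=0.3, alpha_absent=-0.15, beta=1.0):
    """One trial of a revised RW update (illustrative sketch).

    present         : cues physically present on the trial
    absent_expected : cues that are absent but indirectly activated through
                      a within-compound association
    alpha_absent    : negative learning parameter for those absent cues
                      (the value is an assumption)
    """
    # Assumption: the prediction error is computed from present cues only.
    error = lam - sum(V[c] for c in present)
    new_V = dict(V)
    for c in present:
        new_V[c] += alpha_present * beta * error   # positive learning parameter
    for c in absent_expected:
        new_V[c] += alpha_absent * beta * error    # negative learning parameter
    return new_V


def backward_blocking(n_phase1=20, n_phase2=20, lam=1.0):
    """Phase 1: AB-US. Phase 2: A-US, with B absent but indirectly activated."""
    V = {"A": 0.0, "B": 0.0}
    for _ in range(n_phase1):
        V = revised_rw_trial(V, ["A", "B"], [], lam)
    for _ in range(n_phase2):
        V = revised_rw_trial(V, ["A"], ["B"], lam)
    return V


if __name__ == "__main__":
    print("Group 1 (Phase 1 + Phase 2):", backward_blocking())
    print("Control (Phase 1 only):     ", backward_blocking(n_phase2=0))
```

With these assumed values, B ends Phase 1 with a strength of about 0.5 in both groups. During Phase 2 the positive prediction error on A-US trials is multiplied by B's negative learning parameter, so B's strength falls in Group 1 while it stays unchanged in the control group.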

Some failures of the RW Model
Spontaneous recovery from extinction and recovery from extinction caused by reminder treatments (reinstatement)
 * It is a well-established observation that a time-out interval after completion of extinction results in partial recovery from extinction, i.e. the previously extinguished reaction or response recurs, but usually at a lower level than before extinction training. Reinstatement refers to the phenomenon that exposure to the training US alone after completion of extinction results in partial recovery from extinction. The RW model cannot account for these phenomena.

Extinction of a previously conditioned inhibitor
 * The RW model predicts that repeated presentation of a conditioned inhibitor alone (a CS with negative associative strength) results in extinction of this stimulus (a decline of its negative associative value toward zero). This prediction is false. On the contrary, experiments show that repeated presentation of a conditioned inhibitor alone even increases its inhibitory potential.
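
The model's prediction here can be worked through directly from its update rule. When an inhibitor X is presented alone without the US, $$\lambda = 0$$ and the total associative strength equals $$V_X < 0$$, so each trial gives

$$ \Delta V^{n+1}_X = \alpha_X \beta (0 - V^n_X) = -\alpha_X \beta V^n_X > 0 $$

and therefore

$$ V^{n+1}_X = (1 - \alpha_X \beta) V^n_X , \qquad V^n_X = (1 - \alpha_X \beta)^n V^0_X $$

For any $$0 < \alpha_X \beta \le 1$$ the negative strength shrinks toward zero. As an illustration with assumed values $$V^0_X = -0.5$$ and $$\alpha_X \beta = 0.2$$, ten presentations of X alone would leave $$V^{10}_X = -0.5 \times 0.8^{10} \approx -0.054$$, i.e. the inhibition would be almost completely lost, which is the prediction contradicted by the experimental findings.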

Facilitated reacquisition after extinction
 * One of the assumptions of the model is that the conditioning history of a CS does not have any influence on its present status; only its current associative value is important. Contrary to this assumption, many experiments show that stimuli that were first conditioned and then extinguished are more easily reconditioned (i.e. fewer trials are necessary for conditioning).

The exclusiveness of excitation and inhibition
 * The RW model also assumes that excitation and inhibition are opponent features. A stimulus can have either excitatory potential (a positive associative strength) or inhibitory potential (a negative associative strength), but not both. By contrast, it is sometimes observed that stimuli can have both qualities. One example is backward excitatory conditioning, in which a CS is backwardly paired with a US (US-CS instead of CS-US). This usually makes the CS become a conditioned excitor. Interestingly, the stimulus also has inhibitory features, which can be demonstrated by the retardation of acquisition test. This test is used to assess the inhibitory potential of a stimulus, since excitatory conditioning with a previously conditioned inhibitor is observed to be retarded. The backwardly conditioned stimulus passes this test and thus seems to have both excitatory and inhibitory features.

Pairing a novel stimulus with a conditioned inhibitor
 * A conditioned inhibitor is assumed to have a negative associative value. When an inhibitor is presented together with a novel stimulus (whose associative strength is zero) and no US, the model predicts that the novel cue should become a conditioned excitor. This prediction follows from the model's basic term (lambda-V): the summed associative strength of all stimuli (V) present on the trial is negative (zero + inhibitory potential) and lambda is zero (no US present), so the resulting change in associative strength is positive, making the novel cue a conditioned excitor. This is not the case in experimental situations.
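
A short Python sketch makes this prediction concrete. The function names, the starting strength of the inhibitor (-0.5) and the learning parameters (alpha 0.3, beta 1.0) are assumptions chosen for the illustration.

```python
def rw_trial(V, present, lam, alpha=0.3, beta=1.0):
    """One Rescorla-Wagner trial; returns a new dict of associative strengths."""
    # lambda minus the summed strength of the cues present on this trial
    error = lam - sum(V[c] for c in present)
    return {c: (v + alpha * beta * error) if c in present else v
            for c, v in V.items()}


def pair_novel_with_inhibitor(v_inhibitor=-0.5, n_trials=30):
    """Present inhibitor X together with novel cue N, never with the US."""
    V = {"X": v_inhibitor, "N": 0.0}
    history = [V]
    for _ in range(n_trials):
        V = rw_trial(V, ["X", "N"], lam=0.0)   # lambda = 0: no US present
        history.append(V)
    return history


if __name__ == "__main__":
    history = pair_novel_with_inhibitor()
    print("after 1 trial: ", history[1])
    print("after 30 trials:", history[-1])
```

On the first trial the error term is 0 - (-0.5) = 0.5, so both cues gain 0.3 x 0.5 = 0.15. Over further trials the summed strength approaches zero, and with equal saliences the novel cue N settles at about +0.25 while the inhibitor X rises to about -0.25. In the model, then, the novel cue ends up as an excitor, which is the prediction that experiments do not support.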

CS-preexposure effect
 * The CS-preexposure effect (also called latent inhibition) is the well-established observation that conditioning is retarded when the stimulus later used as the CS has previously been presented on its own. The RW model does not predict any effect of presenting a novel stimulus without a US: both the associative strength of the stimulus and lambda are zero, so no learning takes place.

Higher-order conditioning
 * In higher-order conditioning, a previously conditioned CS is paired with a novel cue (i.e. first CS1-US, then CS2-CS1). This usually makes the novel cue CS2 elicit reactions similar to those elicited by CS1. The model cannot account for this phenomenon, since no US is present during CS2-CS1 trials. However, by allowing CS1 to act similarly to a US, one can reconcile the model with this effect.

Sensory preconditioning
 * Sensory preconditioning refers to first pairing two novel cues (CS1-CS2) and then pairing one of them with a US (CS2-US). This turns both CS1 and CS2 into conditioned excitors. The RW model cannot explain this, since during the CS1-CS2-phase both stimuli have an associative value of zero and lambda is also zero (no US present) which results in no change in the associative strength of the stimuli.