Removal of reinforcement improves instrumental performance in humans by decreasing a general action bias rather than unmasking learnt associations
Fig 3
Computational modelling results.
(A) Bayesian information criterion (BIC) of each model relative to the baseline model. Negative BIC differences indicate a decrease in BIC relative to the baseline model and hence better fit. Conversely, a positive BIC difference indicates worse fit. The bias model provided the best fit. (B) The bias model contained two separate bias parameters, bR and bP, for reinforced and probe blocks, respectively. The bias was reduced on probe compared to reinforced trials. (C) Initial estimates Q0 of option values. On average, estimates were initialized with positive values. (D) Softmax probability of selecting an option as a function of its value. The sigmoids for reinforced and probe trials were generated using the mean fitted parameters. This panel illustrates how a reduction in response bias together with a positive value initialization resulted in the increase in d′ observed in behaviour. The solid vertical grey line indicates the average Q0. As values of go stimuli were acquired (shifting rightwards from the vertical line), the difference in action probabilities between probe and reinforced trials became smaller (green arrow). Conversely, as values of no-go stimuli were acquired (shifting leftwards from the vertical line), the difference became more pronounced (red arrow), thus leading to a stronger reduction in false alarm rates. (E) Time course of simulated go-response probabilities. The probability P(Go) for go trials (green) and no-go trials (red) was simulated based on the bias model. Darker shades of green and red indicate probe trials. Solid lines represent the mean and shaded areas the SEM across simulations.
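The logic of panel D can be sketched numerically. The snippet below is a minimal illustration, not the fitted model: it assumes the bias enters additively inside a logistic (two-option softmax) function, and the inverse temperature `beta`, the biases `B_R` and `B_P`, the initial value `Q0`, and the learnt values `Q_GO` and `Q_NOGO` are all hypothetical numbers chosen only to reproduce the qualitative pattern (positive initialization, smaller bias on probe trials).

```python
import math
from statistics import NormalDist


def p_go(q, bias, beta=1.0):
    """Logistic probability of a go response for value q.

    Assumed form (not taken from the paper): P(go) = sigmoid(beta * q + bias).
    """
    return 1.0 / (1.0 + math.exp(-(beta * q + bias)))


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical parameters, for illustration only.
B_R, B_P = 1.0, 0.2       # bias on reinforced vs probe trials (B_P < B_R)
Q0 = 0.5                  # positive initial value estimate
Q_GO = Q0 + 1.0           # go stimulus: value has moved rightwards from Q0
Q_NOGO = Q0 - 1.0         # no-go stimulus: value has moved leftwards from Q0


def gap(q):
    """Reinforced-minus-probe difference in P(go) at value q."""
    return p_go(q, B_R) - p_go(q, B_P)


# d' on reinforced and probe trials, treating P(go | go stimulus) as the
# hit rate and P(go | no-go stimulus) as the false alarm rate.
d_reinforced = d_prime(p_go(Q_GO, B_R), p_go(Q_NOGO, B_R))
d_probe = d_prime(p_go(Q_GO, B_P), p_go(Q_NOGO, B_P))

if __name__ == "__main__":
    print(f"gap at Q_GO   : {gap(Q_GO):.3f}")
    print(f"gap at Q0     : {gap(Q0):.3f}")
    print(f"gap at Q_NOGO : {gap(Q_NOGO):.3f}")
    print(f"d' reinforced : {d_reinforced:.3f}")
    print(f"d' probe      : {d_probe:.3f}")
```

With these illustrative numbers, the reinforced-minus-probe gap is larger at the no-go value than at the go value, so lowering the bias cuts false alarms more than hits and d′ comes out higher on probe trials. Other parameter choices can change the size of this effect, or even its direction if the values fall on the other side of the point where the two sigmoids differ most.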