ABSTRACT
The human brain excels at constructing and using abstractions, such as rules or concepts. Here, in two fMRI experiments, we demonstrate a mechanism of abstraction built upon the valuation of sensory features. Human volunteers learned novel association rules based on simple visual features. Reinforcement-learning algorithms revealed that, with learning, high-value abstract representations increasingly guided participants' behaviour, resulting in better choices and higher subjective confidence. We also found that the brain area computing value signals, the ventromedial prefrontal cortex, prioritised and selected latent task elements during abstraction, both locally and through its connection to the visual cortex. Such a coding scheme predicts a causal role for valuation. Hence, in a second experiment, we used multivoxel neural reinforcement to test for the causality of feature valuation in the sensory cortex as a mechanism of abstraction. Tagging the neural representation of a task feature with rewards evoked abstraction-based decisions. Together, these findings provide a novel interpretation of value as a goal-dependent, key factor in forging abstract representations.
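The value-based abstraction described above can be illustrated with a minimal sketch, under assumptions of ours rather than the authors' actual model: a Rescorla-Wagner style update assigns value to individual visual features, so that the task-relevant feature comes to dominate choice. All names, parameters, and the toy task are illustrative.

```python
import random

ALPHA = 0.3  # learning rate (illustrative value)

def stimulus_value(features, w):
    """Value of a stimulus = sum of its learned feature values."""
    return sum(w.get(f, 0.0) for f in features)

def update(features, reward, w):
    """Rescorla-Wagner style update shared across the stimulus' features."""
    delta = reward - stimulus_value(features, w)  # reward prediction error
    for f in features:
        w[f] = w.get(f, 0.0) + ALPHA * delta

# Toy task (our assumption): reward depends only on colour, the latent rule.
random.seed(0)
w = {}
for _ in range(200):
    colour = random.choice(["red", "blue"])
    shape = random.choice(["circle", "square"])
    reward = 1.0 if colour == "red" else 0.0
    update([colour, shape], reward, w)

# With learning, the task-relevant feature acquires higher value than the
# irrelevant alternative, so choices generalise to novel feature combinations.
print(w["red"] > w["blue"])
```

The point of the sketch is that value attaches to the abstract feature rather than to whole stimuli, which is one way to read the paper's claim that valuation forges abstract representations.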
Subject(s)
Brain/physiology , Learning/physiology , Algorithms , Behavior , Brain Mapping , Female , Humans , Magnetic Resonance Imaging/methods , Male , Parietal Lobe , Prefrontal Cortex/physiology , Reinforcement, Psychology , Reward , Young Adult

ABSTRACT
Due to insufficient generalization in the state space, common reinforcement-learning methods suffer from slow learning, especially in the early trials. This paper introduces a model-based method in discrete state spaces for increasing the learning speed in terms of required experiences (but not required computation time) by exploiting generalization in the experiences of the subspaces. A subspace is formed by choosing a subset of features from the original state representation. Generalization and faster learning in a subspace are due to the many-to-one mapping of experiences from the state space to each state in the subspace. Nevertheless, due to inherent perceptual aliasing (PA) in the subspaces, the policy suggested by each subspace does not generally converge to the optimal policy. Our approach, called model-based learning with subspaces (MoBLeS), calculates confidence intervals of the estimated Q-values in the state space and in the subspaces. These confidence intervals are used in decision-making, such that the agent benefits most from the possible generalization while avoiding the detriment of PA in the subspaces. The convergence of MoBLeS to the optimal policy is theoretically investigated. In addition, we show through several experiments that MoBLeS improves the learning speed in the early trials.
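The interval-based arbitration between the state space and a subspace can be sketched as follows. This is not the authors' MoBLeS implementation: the normal-approximation intervals, the greedy tie-breaking rule, and all names are our assumptions, kept only to show how pooled subspace experience can guide choices in states the agent has never visited.

```python
import math
from collections import defaultdict

class QEstimate:
    """Running mean of a Q-value with a crude confidence interval."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, target):
        self.n += 1
        self.mean += (target - self.mean) / self.n

    def ci_halfwidth(self, z=1.96, spread=1.0):
        # Normal-approximation half-width; infinite when unvisited.
        return float("inf") if self.n == 0 else z * spread / math.sqrt(self.n)

def choose(state, actions, q_full, q_sub, project):
    """Act greedily on whichever estimate has the tighter interval."""
    def value(action):
        full = q_full[(state, action)]
        sub = q_sub[(project(state), action)]
        # The subspace pools many full states, so early on its interval
        # is usually narrower; later the full-state estimate takes over,
        # limiting the harm from perceptual aliasing.
        return full.mean if full.ci_halfwidth() <= sub.ci_halfwidth() else sub.mean
    return max(actions, key=value)

q_full = defaultdict(QEstimate)
q_sub = defaultdict(QEstimate)
project = lambda s: s[0]  # toy subspace: keep only the first feature

# Seed subspace experience: action "a" was rewarded in states sharing "red".
for _ in range(20):
    q_sub[("red", "a")].update(1.0)
    q_sub[("red", "b")].update(0.0)

# A novel full state has no direct experience, so the agent falls back
# on the subspace estimates and picks "a".
print(choose(("red", "square"), ["a", "b"], q_full, q_sub, project))
```

The design choice this illustrates is the paper's central trade-off: pooled subspace statistics accelerate early learning, while the interval comparison keeps the aliased subspace estimate from overriding a well-sampled full-state estimate.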