The theory of reinforcement learning provides a normative account1, deeply rooted in psychological2 and neuroscientific3 perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems4,5, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms3. While reinforcement learning agents have achieved some successes in a variety of domains6–8, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks9–11 to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning.
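The temporal-difference Q-learning update that underlies the deep Q-network can be sketched in tabular form. The toy chain environment, hyperparameters, and helper names below are illustrative assumptions, not the paper's setup; the deep Q-network replaces this lookup table with a deep convolutional network trained on raw pixel inputs.

```python
# Tabular Q-learning on an assumed toy chain MDP: states 0..4, move left or
# right, reward 1 only on reaching the terminal state 4. Purely illustrative.
import random

N_STATES = 5            # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)      # move left or right along the chain


def step(state, action):
    """Deterministic transition; reward 1 on reaching the terminal state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection, with random tie-breaking
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # TD target: r + gamma * max_a' Q(s', a'), with no bootstrap
            # term at terminal states
            target = reward if done else reward + gamma * max(
                q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The greedy policy derived from the learned Q-table
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy moves right from every non-terminal state, and the learned values approach the discounted returns gamma^k for k steps from the goal.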