The theory of reinforcement learning provides a normative account1, deeply rooted in psychological2 and neuroscientific3 perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems4,5, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms3. While reinforcement learning agents have achieved some successes in a variety of domains6–8, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks9–11 to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning.
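The temporal-difference Q-learning update that underlies the deep Q-network can be sketched in tabular form. The toy chain environment, hyperparameters, and helper names below are illustrative assumptions, not the paper's setup; the deep Q-network replaces this lookup table with a deep convolutional network trained on raw pixel inputs.

```python
# Tabular Q-learning on an assumed toy chain MDP: states 0..4, move left or
# right, reward 1 only on reaching the terminal state 4. Purely illustrative.
import random

N_STATES = 5            # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)      # move left or right along the chain


def step(state, action):
    """Deterministic transition; reward 1 on reaching the terminal state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection, with random tie-breaking
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # TD target: r + gamma * max_a' Q(s', a'), with no bootstrap
            # term at terminal states
            target = reward if done else reward + gamma * max(
                q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The greedy policy derived from the learned Q-table
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy moves right from every non-terminal state, and the learned values approach the discounted returns gamma^k for k steps from the goal.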