OK, enough commentary; let's look at the opening of their paper. They use a CNN to estimate the Q-values in RL.


All follow-ups · Add follow-up · New Threads Reading Forum (新语丝读书论坛)

Posted by: 短江学者 on 2016-03-12, 20:43:32:

Quote:
We set out to create a single algorithm that would be able to develop a wide range of competencies on a varied range of challenging tasks—a central goal of general artificial intelligence that has eluded previous efforts. To achieve this, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network [16] known as deep neural networks. Notably, recent advances in deep neural networks, in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data. We use one particularly successful architecture, the deep convolutional network [17], which uses hierarchical layers of tiled convolutional filters to mimic the effects of receptive fields—inspired by Hubel and Wiesel's seminal work on feedforward processing in early visual cortex [18]—thereby exploiting the local spatial correlations present in images, and building in robustness to natural transformations such as changes of viewpoint or scale.

We consider tasks in which the agent interacts with an environment through a sequence of observations, actions and rewards. The goal of the agent is to select actions in a fashion that maximizes cumulative future reward. More formally, we use a deep convolutional neural network to approximate the optimal action-value function, which is the maximum sum of rewards r_t discounted by γ at each time-step t, achievable by a behaviour policy π = P(a|s), after making an observation (s) and taking an action (a).
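The quantity the quoted passage defines (the sum of rewards r_t discounted by γ) and the one-step bootstrap used in Q-learning can be sketched in a few lines of plain Python. This is an illustrative sketch only; the function names and the γ = 0.99 value are my assumptions, not details taken from the paper:

```python
import numpy as np

GAMMA = 0.99  # discount factor γ (illustrative choice, not the paper's hyperparameter table)

def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards r_t discounted by gamma at each time-step t:
    r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def q_learning_target(reward, q_next, gamma=GAMMA, terminal=False):
    """One-step Bellman target for Q-learning:
    r + gamma * max_a' Q(s', a'), or just r at a terminal state."""
    if terminal:
        return reward
    return reward + gamma * float(np.max(q_next))
```

In the actual DQN agent, `q_next` would come from a convolutional network evaluated on the next observation; here it is just an array of action values, which is enough to show how the discounted target is formed.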


