rlstructures with GPUs

Based on rlstructures v0.2

The techniques described here are available in the rlalgos/reinforce_device directory, which illustrates the use of GPUs for REINFORCE. Note that the ability to use a GPU (for loss computation) is also provided for DQN and A2C in the repository.

Let us start again from the REINFORCE implementation presented in a previous tutorial. In such an implementation, a GPU can be used at the following levels to speed up learning:

  • At the loss computation level, to accelerate loss computation.
  • At the batcher level, to accelerate data acquisition (for both the learning batcher and the evaluation batcher).

Using a GPU for loss computation is the easiest way to use a GPU in your RL algorithm. It takes the following steps:

  • Maintain the learning model on GPU during training.
  • Move the acquired trajectories to GPU when computing the loss.
  • Compute the loss on GPU. Note that if you use the replay_agent function, you also need an agent that works on GPU, as explained in the next section.
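
The three steps above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the code from rlalgos/reinforce_device: a plain dict of tensors stands in for the trajectories returned by a batcher, and the model size, batch shapes and key names are made up for the example.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch also runs on machines without CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1: keep the learning model on the chosen device during training.
learning_model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(learning_model.parameters(), lr=1e-3)

# Hypothetical trajectories acquired on CPU: 8 trajectories of 5 steps each.
trajectories = {
    "observation": torch.randn(8, 5, 4),
    "action": torch.randint(0, 2, (8, 5)),
    "reward": torch.randn(8, 5),
}

# Step 2: move the acquired trajectories to the same device as the model.
trajectories = {name: t.to(device) for name, t in trajectories.items()}

# Step 3: compute the REINFORCE loss on that device.
logits = learning_model(trajectories["observation"])  # shape (8, 5, 2)
log_probs = torch.log_softmax(logits, dim=-1)
action_log_probs = log_probs.gather(
    -1, trajectories["action"].unsqueeze(-1)
).squeeze(-1)  # shape (8, 5)

# Undiscounted reward-to-go: reversed cumulative sum along the time axis.
returns = trajectories["reward"].flip(1).cumsum(1).flip(1)

loss = -(action_log_probs * returns).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the model and the trajectories change device here; the loss expression itself is the same as on CPU.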

Moving agents to GPU may be useful in two cases:

  1. When using the default replay_agent function over trajectories stored on GPU, since this function makes use of the RL_Agent.__call__ method.
  2. When executing batchers on GPU instead of CPU, which is the topic of the next section.

In rlstructures, we consider that an Agent on GPU has the following properties:

  • Its initial_state function must return a DictTensor stored on GPU (or an empty DictTensor).
  • Its __call__ function must return DictTensors on GPU. Moreover, it assumes that the observation and state arguments are also on GPU.
  • Note that agent_info is assumed to always be on CPU.

To check that an RL_Agent satisfies these constraints, we provide the RL_Agent_CheckDevice class, which behaves like the agent it wraps but checks that the inputs and outputs of this agent are on the right device. It can be used as a debugging tool.
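
The kind of check described above can be sketched in plain PyTorch. The wrapper below is a hypothetical stand-in written for illustration, not the library's RL_Agent_CheckDevice: the class name DeviceCheckedAgent, the helper assert_on_device, the simplified two-argument call signature and the use of plain dicts in place of DictTensor are all assumptions.

```python
import torch


def assert_on_device(tensors, device, what):
    """Raise if any tensor of the dict lives on another device type.

    Only the device type (cpu / cuda) is compared, not the device index.
    """
    for name, value in tensors.items():
        if value.device.type != device.type:
            raise RuntimeError(
                f"{what}['{name}'] is on {value.device}, expected {device}"
            )


class DeviceCheckedAgent:
    """Hypothetical wrapper: forwards calls to `agent` and checks devices."""

    def __init__(self, agent, device):
        self.agent = agent
        self.device = torch.device(device)

    def __call__(self, state, observation):
        # Inputs must already be on the expected device.
        assert_on_device(state, self.device, "state")
        assert_on_device(observation, self.device, "observation")
        action, new_state = self.agent(state, observation)
        # Outputs must be produced on the expected device too.
        assert_on_device(action, self.device, "action")
        assert_on_device(new_state, self.device, "new_state")
        return action, new_state


def toy_agent(state, observation):
    """Toy agent: the action is the sum of the observation features."""
    return {"action": observation["frame"].sum(dim=-1)}, state


# An empty state passes the check trivially, like an empty DictTensor would.
agent = DeviceCheckedAgent(toy_agent, "cpu")
action, _ = agent({}, {"frame": torch.ones(3, 4)})
```

Wrapping the same toy agent with "cuda" as the expected device and calling it with CPU tensors raises a RuntimeError that names the offending entry, which is what makes this kind of wrapper convenient while debugging.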

In our example, assuming that our agent works on GPU, we can use the replay_agent function in the loss computation with a GPU-based agent built as follows:

Creation of the learning agent, which uses the learning model. This agent may be used in replay_agent.


First, note that we provide a copy_model argument so that the agents created by the batcher maintain a copy of the model in each process. The learning agent (self.agent) does not need a copy, since we want to use self.learning_model to compute gradients.

Second, if we remove the RL_Agent_CheckDevice container, our code will still work without any device check.

Batcher on GPU

While using a GPU for loss computation is a good way to obtain a large speed-up, in some particular cases it can also be worthwhile to use GPUs when acquiring trajectories, i.e. at batcher execution time. In that case, one can use a different GPU than the one used for loss computation, which allows rlstructures to work with multiple GPUs (for instance, one GPU for the loss computation, one GPU for the training batcher, one GPU for the evaluation batcher, …).

IMPORTANT: Each process that uses a GPU consumes a baseline amount of memory (between 300 and 700 MB) due to CUDA initialization, so it is not practical to create GPU batchers with dozens of processes. Moreover, a GPU is sequential by nature, so multiple processes on the same GPU will not provide a large speed-up. The best configuration in that case is a batcher with 1 process and N environments per process, with N large.
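
To make the memory argument concrete, the short calculation below multiplies the 300 to 700 MB range quoted above by a few process counts. The process counts are illustrative assumptions, and the function name is made up for the example.

```python
def cuda_overhead_mb(n_processes, per_process_mb):
    """Memory consumed by CUDA initialization alone, in MB."""
    return n_processes * per_process_mb


# Illustrative process counts; 300 and 700 MB are the bounds quoted above.
for n_processes in (1, 4, 16, 32):
    low = cuda_overhead_mb(n_processes, 300)
    high = cuda_overhead_mb(n_processes, 700)
    print(f"{n_processes:2d} processes: {low} to {high} MB before any model or data")
```

With 32 processes, initialization alone takes between 9600 and 22400 MB, which is why a single process with many environments is the recommended GPU configuration.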

To use batchers on GPUs we need:

  • To define an RL_Agent that works on the GPU (see the previous section).
  • To define an environment that works on GPU. Such an environment produces its outputs on GPU and takes its inputs on GPU. While such an environment can be written manually, we provide a wrapper (DeviceEnv) that wraps any CPU environment and moves its inputs and outputs to GPU.
Building an environment from Gym that works on any device
  • To inform the Batcher at construction time that it will work on GPU, and to make sure that each process has its own copy of the model.
Creation of a Batcher working on ‘batcher_device’
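
As a rough illustration of the environment-wrapping idea from the list above, the sketch below moves actions to CPU before calling a CPU environment and moves the resulting observations to the target device. It is not the library's DeviceEnv: the classes ToyCPUEnv and ToDeviceEnv, their reset/step interface and the dict keys are all hypothetical.

```python
import torch


class ToyCPUEnv:
    """Hypothetical CPU environment: the observation is a running sum of actions."""

    def reset(self):
        self.total = torch.zeros(1)
        return {"frame": self.total.clone()}

    def step(self, action):
        # The wrapped environment only ever sees CPU tensors.
        assert action["action"].device.type == "cpu"
        self.total = self.total + action["action"].float()
        return {"frame": self.total.clone()}


class ToDeviceEnv:
    """Hypothetical wrapper exposing a CPU environment on another device."""

    def __init__(self, env, device):
        self.env = env
        self.device = torch.device(device)

    def _to_device(self, tensors):
        return {name: t.to(self.device) for name, t in tensors.items()}

    def reset(self):
        return self._to_device(self.env.reset())

    def step(self, action):
        # Inputs arrive on the wrapper's device and are moved to CPU first.
        cpu_action = {name: t.cpu() for name, t in action.items()}
        return self._to_device(self.env.step(cpu_action))


# Fall back to CPU so the sketch also runs on machines without CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
env = ToDeviceEnv(ToyCPUEnv(), device)
obs = env.reset()
obs = env.step({"action": torch.tensor([2], device=device)})
```

The environment logic stays on CPU; only the tensors crossing the wrapper boundary change device, so each step pays for one transfer in each direction.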

Remark: It appears that copying the learning model parameters to GPU:k with k>0 actually consumes memory on GPU:0. To avoid this effect, we first move the learning parameters to CPU, and then send them to the batcher.
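
The workaround in this remark can be sketched with standard PyTorch calls. The snippet only shows the CPU round trip for the parameters; how they are then handed to the batcher is left out, and the model shapes and the name acting_model are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch also runs on machines without CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
learning_model = nn.Linear(4, 2).to(device)

# Copy every parameter to CPU first. clone() guarantees an independent copy
# even when the model already lives on CPU.
cpu_state = {
    name: tensor.detach().cpu().clone()
    for name, tensor in learning_model.state_dict().items()
}

# Hypothetical receiving side: a model with the same architecture loads the
# CPU copy, and only then would it be moved to its own device.
acting_model = nn.Linear(4, 2)
acting_model.load_state_dict(cpu_state)
```

Because the parameters travel as CPU tensors, the sending side never allocates anything on the receiver's GPU.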


While doing loss computation on GPU instead of CPU is very easy to implement, doing data acquisition on GPU is more complex. We advise users to focus on two settings: a batcher with many processes on CPU, or a batcher with a single process on GPU.

Research Scientist at Facebook/FAIR -- publications are my own
