Based on rlstructures v0.2

The described techniques are available in the rlalgos/reinforce_device directory, which illustrates the use of GPUs for REINFORCE. Note that GPU support (for loss computation) is also provided for DQN and A2C in the repository.

Let us restart from the REINFORCE implementation provided in a previous tutorial. In such an implementation, GPUs can be used at three different locations to speed up the learning process:

  • At the loss computation level, to accelerate the computation of the loss (a minimal sketch follows below).
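To make this first point concrete, here is a minimal sketch (plain PyTorch, not the repository's exact code) of computing a REINFORCE loss on a GPU: trajectories are collected on CPU by the batchers, and only the tensors needed by the loss are moved to the CUDA device. The tensor names and shapes below are illustrative assumptions.

import torch

# Illustrative sketch: move trajectory tensors to the GPU before
# computing the REINFORCE loss. Names and shapes are assumptions,
# not the exact rlstructures API.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def reinforce_loss(log_probs, rewards, gamma=0.99):
    # log_probs: (T, B) log-probabilities of the chosen actions
    # rewards:   (T, B) per-step rewards
    T, B = rewards.shape
    returns = torch.zeros_like(rewards)
    acc = torch.zeros(B, device=rewards.device)
    for t in reversed(range(T)):
        acc = rewards[t] + gamma * acc  # discounted return-to-go
        returns[t] = acc
    return -(log_probs * returns.detach()).mean()

# Stand-ins for tensors extracted from CPU-collected trajectories:
log_probs = torch.randn(100, 8, requires_grad=True).to(device)
rewards = torch.rand(100, 8).to(device)
loss = reinforce_loss(log_probs, rewards)  # computed on the GPU
loss.backward()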


Based on rlstructures v0.2

In this tutorial, we describe how to implement actor-critic methods. We explain:

  • How auto-reset environments can be used to avoid wasting computation time
  • How the actor-critic loss can be computed with recurrent architectures

Recurrent Architectures and Policies

A first step is to implement the underlying model and corresponding agent. In our case, the model is a classic recurrent model that outputs, at each timestep, both action probabilities and a critic value. Note that the critic and action…
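To make this concrete, here is a minimal sketch of such a model (a hand-written illustration using a GRU cell and discrete actions, not the tutorial's exact code): a shared recurrent state feeds two heads, one producing action probabilities and one producing the critic value.

import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    # Sketch of a recurrent model that outputs, at each timestep,
    # both action probabilities and a critic value computed from a
    # shared GRU state. Illustrative only.
    def __init__(self, obs_size, hidden_size, n_actions):
        super().__init__()
        self.gru = nn.GRUCell(obs_size, hidden_size)
        self.actor_head = nn.Linear(hidden_size, n_actions)
        self.critic_head = nn.Linear(hidden_size, 1)

    def forward(self, obs, hidden):
        # obs: (B, obs_size), hidden: (B, hidden_size)
        hidden = self.gru(obs, hidden)
        probs = torch.softmax(self.actor_head(hidden), dim=-1)
        value = self.critic_head(hidden).squeeze(-1)
        return probs, value, hidden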


Based on rlstructures v0.2

In this tutorial, we illustrate the flexibility of rlstructures by showing how the previous implementation of REINFORCE can easily be modified to implement a completely different model used in an unsupervised RL setting, where multiple policies are learned simultaneously. The model implemented is the DIAYN model proposed in https://arxiv.org/abs/1802.06070 (but in a REINFORCE version):

@inproceedings{DBLP:conf/iclr/EysenbachGIL19,
  author    = {Benjamin Eysenbach and Abhishek Gupta and Julian Ibarz and Sergey Levine},
  title     = {Diversity is All You Need: Learning Skills without a Reward Function},
  booktitle = {ICLR},
  year      = {2019}
}
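As a reminder of the paper's core mechanism (taken from the paper, not from the rlstructures code): DIAYN samples a skill z for each episode, conditions the policy on z, and replaces the environment reward with a diversity reward log q(z|s) - log p(z), where q is a learned discriminator that tries to recover the skill from the visited states. A minimal sketch, with assumed sizes and a linear discriminator:

import torch
import torch.nn as nn

# Sketch of the DIAYN diversity reward (following the paper; the
# discriminator architecture and sizes are assumptions).
n_skills, state_size = 10, 16
discriminator = nn.Linear(state_size, n_skills)   # logits of q(z|s)
log_p_z = torch.log(torch.tensor(1.0 / n_skills))  # uniform skill prior

def diversity_reward(state, z):
    # state: (B, state_size), z: (B,) integer skill indices
    log_q = torch.log_softmax(discriminator(state), dim=-1)
    return log_q.gather(1, z.unsqueeze(1)).squeeze(1) - log_p_z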


Based on rlstructures v0.2

In this tutorial we:

  1. Implement a parallelized version of REINFORCE using rlstructures
  2. Show how we can add a parallel evaluation of the learned policy without slowing down the learning process (a.k.a. as-fast-as-possible evaluation)

The complete source code is available in the rlstructures tutorial repository: http://github.com/facebookresearch/rlstructures

A quick note about logging

rlstructures provides an rlalgos.logger object that can log scalars, images, text, etc., both in TensorBoard format and in CSV format for future…
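To give an idea of what such dual logging looks like, here is a self-contained sketch that writes the same scalar both to TensorBoard and to a CSV file. It uses plain PyTorch and the standard library; it illustrates the idea behind rlalgos.logger, not its actual API.

import csv
from torch.utils.tensorboard import SummaryWriter

# Sketch of dual TensorBoard + CSV logging; NOT the rlalgos.logger API.
class DualLogger:
    def __init__(self, logdir):
        self.writer = SummaryWriter(logdir)  # creates logdir if needed
        self.csv_file = open(f"{logdir}/metrics.csv", "w", newline="")
        self.csv = csv.writer(self.csv_file)
        self.csv.writerow(["name", "value", "step"])

    def add_scalar(self, name, value, step):
        self.writer.add_scalar(name, value, step)  # TensorBoard stream
        self.csv.writerow([name, value, step])     # CSV for later analysis

    def close(self):
        self.writer.close()
        self.csv_file.close()

logger = DualLogger("./logs")
logger.add_scalar("reward", 1.0, 0)
logger.close()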


Based on rlstructures v0.2

In this article, we detail the different concepts used by rlstructures that will allow anyone to implement their own RL algorithm. The concepts are:

  1. Data structures: rlstructures provides two main data structures that are used everywhere, namely DictTensor and TemporalDictTensor (and Trajectories, which is simply a pair of one DictTensor and one TemporalDictTensor); a toy illustration follows after this list
  2. Agent API: the agent API allows one to implement policies acting on a batch of environments in a simple way
  3. Batcher…
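To make the data-structure idea concrete, here is a toy stand-in (a hand-rolled class, not the library's actual DictTensor): essentially a dictionary of tensors that all share the same batch dimension.

import torch

# Toy illustration of the DictTensor idea (NOT the library's class).
class ToyDictTensor:
    def __init__(self, tensors):
        batch_sizes = {v.size(0) for v in tensors.values()}
        assert len(batch_sizes) == 1, "all tensors must share the batch dim"
        self.tensors = tensors

    def __getitem__(self, key):
        return self.tensors[key]

    def n_elems(self):
        return next(iter(self.tensors.values())).size(0)

# A batch of 4 elements, each carrying an observation and a reward:
d = ToyDictTensor({"obs": torch.randn(4, 8), "reward": torch.zeros(4)})
print(d.n_elems())  # 4

# A TemporalDictTensor follows the same idea with an extra time
# dimension, i.e. tensors of shape (batch, time, ...).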

Based on rlstructures v0.2

Link: https://github.com/facebookresearch/rlstructures

Introduction

This tutorial is the first of a series of tutorials explaining how rlstructures can be used to implement complex RL algorithms that work at scale (multiple CPUs, multiple GPUs). We will publish new tutorials every one or two weeks, focused on classical RL algorithms, but also on non-conventional ones (e.g. hierarchical RL, unsupervised RL) and non-conventional application domains (e.g. RL for computer vision, RL for compiler optimization, etc.).

We encourage…

Ludovic Denoyer

Research Scientist at Facebook/FAIR -- publications are my own
