Getting OpenAI Gym environments to render properly in remote environments such as Google Colab and Binder turned out to be more challenging than I expected. In this post I lay out my solution in the hope that I might save others the time and effort of working it out independently.

Google Colab Preamble

If you wish to use Google Colab, then this section is for you! Otherwise, you can skip to the next section for the Binder Preamble.

Install X11 system dependencies

Install necessary X11 dependencies, in particular Xvfb, which is an X server that can run on machines with no display hardware and no physical input devices.

!apt-get install -y xvfb x11-utils
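
If you want a quick, optional sanity check, you can confirm that the Xvfb binary is now on your PATH.

!which Xvfb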

Install additional Python dependencies

Now that you have installed Xvfb, you need to install a Python wrapper, pyvirtualdisplay, in order to interact with Xvfb virtual displays from within Python. Next you need to install the Python bindings for OpenGL: PyOpenGL and PyOpenGL-accelerate. The former is the actual set of Python bindings; the latter is an optional set of C (Cython) extensions providing acceleration of common operations for slow points in PyOpenGL 3.x.

!pip install pyvirtualdisplay==0.2.* PyOpenGL==3.1.* PyOpenGL-accelerate==3.1.*
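
Another quick, optional sanity check is to confirm that the newly installed packages can be imported (note that PyOpenGL is imported under the name OpenGL).

import OpenGL
import pyvirtualdisplay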

Install OpenAI Gym

Next you need to install the OpenAI Gym package. Note that, depending on which Gym environment you are interested in working with, you may need to add additional dependencies. Since I am going to simulate the LunarLander-v2 environment in my demo below, I need to install the box2d extra, which enables Gym environments that depend on the Box2D physics simulator.

!pip install gym[box2d]==0.17.* 
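
If you want to double check which version of Gym was installed (the code in this post assumes the 0.17 series), you can print the package's version string.

import gym

print(gym.__version__)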

Create a virtual display in the background

Next you need to create a virtual display in the background which the Gym Envs can connect to for rendering purposes. You can check that there is no display at present by confirming that the value of the DISPLAY environment variable has not yet been set.

!echo $DISPLAY

The code in the cell below creates a virtual display in the background that your Gym Envs can connect to for rendering. You can adjust the size of the virtual buffer as you like but you must set visible=False when working with Xvfb.

This code only needs to be run once per session to start the display.

import pyvirtualdisplay


_display = pyvirtualdisplay.Display(visible=False,  # use False with Xvfb
                                    size=(1400, 900))
_ = _display.start()

After running the cell above you can echo out the value of the DISPLAY environment variable again to confirm that you now have a display running.

!echo $DISPLAY
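
Equivalently, you can inspect the same environment variable from within Python; pyvirtualdisplay sets DISPLAY for the current process when the virtual display starts.

import os

print(os.environ.get("DISPLAY"))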

For convenience I have gathered the above steps into two cells that you can copy and paste into the top of your Google Colab notebooks.

%%bash

# install required system dependencies
apt-get install -y xvfb x11-utils

# install required python dependencies (you might need additional gym extras depending on your chosen environment)
pip install gym[box2d]==0.17.* pyvirtualdisplay==0.2.* PyOpenGL==3.1.* PyOpenGL-accelerate==3.1.*
import pyvirtualdisplay


_display = pyvirtualdisplay.Display(visible=False,  # use False with Xvfb
                                    size=(1400, 900))
_ = _display.start()

Binder Preamble

If you wish to use Binder, then this section is for you! Although there really isn't much of anything that needs doing.

No additional installation required!

Unlike Google Colab, with Binder you can bake all the required dependencies (including the X11 system dependencies!) into the Docker image on which the Binder instance is based using Binder config files. These config files can either live in the root directory of your Git repo or in a binder sub-directory, as is the case here. If you are interested in learning more about Binder, then check out the documentation for BinderHub, which is the underlying technology behind the Binder project.

# config file for system dependencies
!cat ../binder/apt.txt
freeglut3-dev
xvfb
x11-utils

# config file describing the conda environment
!cat ../binder/environment.yml
name: null

channels:
  - conda-forge
  - defaults

dependencies:
  - gym-box2d=0.17
  - jupyterlab=2.0
  - matplotlib=3.2
  - pip=20.0
  - python=3.7
  - pyvirtualdisplay=0.2

# config file containing python deps not available via conda channels
!cat ../binder/requirements.txt
PyOpenGL==3.1.*
PyOpenGL-accelerate==3.1.*

Create a virtual display in the background

Next you need to create a virtual display in the background which the Gym Envs can connect to for rendering purposes. You can check that there is no display at present by confirming that the value of the DISPLAY environment variable has not yet been set.

!echo $DISPLAY

The code in the cell below creates a virtual display in the background that your Gym Envs can connect to for rendering. You can adjust the size of the virtual buffer as you like but you must set visible=False when working with Xvfb.

This code only needs to be run once per session to start the display.

import pyvirtualdisplay


_display = pyvirtualdisplay.Display(visible=False,  # use False with Xvfb
                                    size=(1400, 900))
_ = _display.start()

After running the cell above you can echo out the value of the DISPLAY environment variable again to confirm that you now have a display running.

!echo $DISPLAY

Demo

Just to prove that the above setup works as advertised I will run a short simulation. First I will define an Agent that chooses an action randomly from the set of possible actions and then define a function that can be used to create such agents.

import typing

import numpy as np


# represent states as arrays and actions as ints
State = np.ndarray
Action = int

# agent is just a function! 
Agent = typing.Callable[[State], Action]


def uniform_random_policy(state: State,
                          number_actions: int,
                          random_state: np.random.RandomState) -> Action:
    """Select an action at random from the set of feasible actions."""
    feasible_actions = np.arange(number_actions)
    probs = np.ones(number_actions) / number_actions
    action = random_state.choice(feasible_actions, p=probs)
    return action


def make_random_agent(number_actions: int,
                      random_state: typing.Optional[np.random.RandomState] = None) -> Agent:
    """Factory for creating an Agent that selects actions uniformly at random."""
    _random_state = np.random.RandomState() if random_state is None else random_state
    return lambda state: uniform_random_policy(state, number_actions, _random_state)
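
For example, using LunarLander-v2's four actions and a dummy 8-dimensional state purely for illustration, an agent created by this factory can be queried as follows.

toy_agent = make_random_agent(number_actions=4)
action = toy_agent(np.zeros(8))  # the uniform random policy ignores the state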
    

In the cell below I wrap up the code to simulate a single episode of an OpenAI Gym environment. Note that the implementation assumes that the provided environment supports rgb_array rendering (which not all Gym environments support!).

import gym
import matplotlib.pyplot as plt
from IPython import display


def simulate(agent: Agent, env: gym.Env) -> None:
    """Simulate a single episode, rendering each frame inline."""
    state = env.reset()
    img = plt.imshow(env.render(mode='rgb_array'))  # create the image only once
    done = False
    while not done:
        action = agent(state)
        img.set_data(env.render(mode='rgb_array'))  # just update the image data
        plt.axis('off')
        display.display(plt.gcf())
        display.clear_output(wait=True)
        state, reward, done, _ = env.step(action)
    env.close()
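
Not every Gym environment supports rgb_array rendering, so if you are unsure about a particular environment you can inspect its metadata before calling simulate. The check below is a small sketch that relies on the render.modes metadata key that Gym environments expose (at least in the 0.17 series used here); I am assuming the environment populates this key.

def supports_rgb_array(env: gym.Env) -> bool:
    """Return True if the environment advertises rgb_array rendering."""
    return 'rgb_array' in env.metadata.get('render.modes', [])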
    

Finally you can set up your desired environment...

lunar_lander_v2 = gym.make('LunarLander-v2')
_ = lunar_lander_v2.seed(42)

...and run a simulation!

random_agent = make_random_agent(lunar_lander_v2.action_space.n, random_state=None)
simulate(random_agent, lunar_lander_v2)

Currently there appears to be a non-trivial amount of flickering during the simulation. I am not entirely sure what is causing this undesirable behavior. If you have any idea how to improve this, please leave a comment below. I will be sure to update this post accordingly if I find a good fix.
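
In the meantime, one idea that might reduce the flickering is to collect the rendered frames during the episode and only animate them after the episode has finished, rather than redrawing the whole figure on every step. The sketch below uses matplotlib.animation for this; I have not tested it extensively, so treat it as a starting point rather than a definitive fix.

import matplotlib.animation


def simulate_to_animation(agent: Agent, env: gym.Env) -> matplotlib.animation.FuncAnimation:
    """Run a single episode, collecting frames, and animate them afterwards."""
    frames = []
    state = env.reset()
    done = False
    while not done:
        frames.append(env.render(mode='rgb_array'))
        action = agent(state)
        state, _, done, _ = env.step(action)
    env.close()

    # build the animation from the collected frames after the episode ends
    fig, ax = plt.subplots()
    ax.axis('off')
    img = ax.imshow(frames[0])

    def update(i):
        img.set_data(frames[i])

    return matplotlib.animation.FuncAnimation(fig, update, frames=len(frames), interval=50)

The resulting animation can then be rendered inline with display.HTML(anim.to_jshtml()).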