My install environment:

  • Ubuntu 14.04
  • Anaconda 3
  • Python 3.6
  • TensorFlow-gpu 1.13.1
  • Nvidia Driver 430.26
  • Cuda 10.0
  • Cudnn 7.6

MuJoCo (Multi-Joint dynamics with Contact)

MuJoCo 2.0

  1. Download: https://www.roboti.us/index.html.
  2. Get License: https://www.roboti.us/license.html.
    Get the computer ID:
    
    chmod +x getid_linux
    ./getid_linux
    
  3. Unzip the download into ~/.mujoco:
    mkdir ~/.mujoco
    cp mujoco200_linux.zip ~/.mujoco
    cd ~/.mujoco
    unzip mujoco200_linux.zip
    mv mujoco200_linux mujoco200
    
  4. Copy mjkey.txt to ~/.mujoco and ~/.mujoco/mujoco200/bin:
    cp mjkey.txt ~/.mujoco
    cp mjkey.txt ~/.mujoco/mujoco200/bin
    
  5. Add the environment variables to ~/.bashrc (replace /home/csy with your own home directory):
    gedit ~/.bashrc
    
    export LD_LIBRARY_PATH=/home/csy/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export MUJOCO_KEY_PATH=/home/csy/.mujoco${MUJOCO_KEY_PATH}
    
  6. Test MuJoCo:
    cd ~/.mujoco/mujoco200/bin
    ./simulate ../model/humanoid.xml
    

mujoco-py

  1. Create a virtual environment using Anaconda and activate it:
    conda create -n mujoco-gym python=3.6
    conda activate mujoco-gym
    
  2. Install mujoco-py:
    cd drl/mujoco-gym
    git clone https://github.com/openai/mujoco-py.git
    cd mujoco-py
    pip install -e .
    
  3. If an error occurs, e.g. no file named 'patchelf', install it:
    Download: https://nixos.org/releases/patchelf/patchelf-0.9/patchelf-0.9.tar.gz
    tar -xzf patchelf-0.9.tar.gz
    cd patchelf-0.9
    ./configure
    make
    sudo make install
    
    patchelf --version    # verify the installation
    
  4. Test mujoco-py:
    $ python
    import mujoco_py
    import os
    mj_path, _ = mujoco_py.utils.discover_mujoco()
    xml_path = os.path.join(mj_path, 'model', 'humanoid.xml')
    model = mujoco_py.load_model_from_path(xml_path)
    sim = mujoco_py.MjSim(model)
    
    print(sim.data.qpos)
    # [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    
    sim.step()
    print(sim.data.qpos)
    # [-2.09531783e-19  2.72130735e-05  6.14480786e-22 -3.45474715e-06
    #   7.42993721e-06 -1.40711141e-04 -3.04253586e-04 -2.07559344e-04
    #   8.50646247e-05 -3.45474715e-06  7.42993721e-06 -1.40711141e-04
    #  -3.04253586e-04 -2.07559344e-04 -8.50646247e-05  1.11317030e-04
    #  -7.03465386e-05 -2.22862221e-05 -1.11317030e-04  7.03465386e-05
    #  -2.22862221e-05]
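    sim.step() advances the simulation by one physics timestep, which is why qpos is no longer all zeros afterwards. As a rough illustration of the idea (a toy 1-D semi-implicit Euler sketch, not MuJoCo's actual solver; MuJoCo's default timestep is 0.002 s):
    
    ```python
    # Toy sketch of what one physics "step" does conceptually:
    # semi-implicit Euler integration for a 1-D point mass under gravity.
    # This is NOT MuJoCo's solver, just an illustration of sim.step().

    def step(pos, vel, acc=-9.81, dt=0.002):
        """Advance the state by one timestep of size dt."""
        vel = vel + acc * dt      # update velocity from acceleration
        pos = pos + vel * dt      # update position from the NEW velocity
        return pos, vel

    pos, vel = 0.0, 0.0           # state starts at rest, like qpos above
    pos, vel = step(pos, vel)     # after one step the state is nonzero
    print(pos, vel)
    ```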
    

Gym

Installation

  1. Install Gym:
    pip install gym[all]
    
  2. Test Gym:
    import gym
    env = gym.make('Humanoid-v2')
    
    from gym import envs
    print(envs.registry.all())    # print the available environments
    
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    
    for i_episode in range(200):
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()    # take a random action
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break
    env.close()
    

If an error like the following occurs:

ERROR: GLEW initalization error: Missing GL version

Install the following package:

sudo apt-get install libglew-dev

Then, add the following path to ~/.bashrc:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so

Usage of Gym

Gym is a toolkit of environments for developing and comparing RL algorithms.

Every environment has the following functions:

  • step(action): returns four values.
    • observation: observation of the environment after the action, e.g. pixel data or joint angles.
    • reward: amount of reward achieved by the previous action.
    • done: if True, the episode has terminated and reset() should be called.
    • info: diagnostic information useful for debugging.
  • reset(): returns the initial observation.
  • render(mode='human'): renders the environment.
  • close(): closes the environment and any rendering windows.

We can also create new environments for Gym; see the Gym documentation on creating custom environments.
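As a sketch of what a custom environment must provide, here is a toy "guess the coin flip" environment. It is written as a plain Python class for illustration only; a real Gym environment subclasses gym.Env, defines action_space / observation_space as gym.spaces objects, and is registered via gym.envs.registration.register (the class and episode logic here are made up for the example):

```python
import random

class CoinFlipEnv:
    """Toy environment following the Gym interface: guess a coin flip.
    Illustration only; a real env subclasses gym.Env and defines
    action_space / observation_space as gym.spaces objects."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        self.coin = random.randint(0, 1)
        return self.coin

    def step(self, action):
        """Return (observation, reward, done, info), like Gym's step()."""
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        done = self.t >= self.max_steps
        self.coin = random.randint(0, 1)   # next observation
        return self.coin, reward, done, {}

    def render(self, mode='human'):
        print("step {}, coin={}".format(self.t, self.coin))

    def close(self):
        pass

# Same interaction loop as the Gym test above, with a random policy:
env = CoinFlipEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(random.randint(0, 1))
    total += reward
env.close()
print("episode return:", total)
```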

Baselines

  1. Install the required packages:
    sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
    
  2. Download baselines:
    git clone https://github.com/openai/baselines.git
    cd baselines
    
  3. Install TensorFlow:
    nvidia-smi   # make sure the Nvidia driver has been installed
    cat /usr/local/cuda/version.txt    # check the version of cuda
    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2    # check the version of cudnn
    
    pip install tensorflow-gpu==1.13.1 -i https://pypi.doubanio.com/simple/    # a PyPI mirror can speed up the download
    
  4. Install baselines:
    pip install -e .
    
  5. Test the installation:
    pip install pytest    # pytest is used to run the 'test*' files in baselines
    pytest    # some tests may fail until extra packages such as matplotlib and pandas are installed
    
  6. Training models:
    python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
    
    python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7    # training PPO with MuJoCo Humanoid
    

Packages in the mujoco-gym virtual environment

After everything above is installed, the environment.yml of the environment is as follows:

name: mujoco-gym
channels:
  - defaults
dependencies:
  - ca-certificates=2019.5.15=0
  - certifi=2019.3.9=py36_0
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.1=he6710b0_1
  - openssl=1.1.1c=h7b6447c_1
  - pip=19.1.1=py36_0
  - python=3.6.8=h0371630_0
  - readline=7.0=h7b6447c_5
  - setuptools=41.0.1=py36_0
  - sqlite=3.28.0=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - wheel=0.33.4=py36_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - absl-py==0.7.1
    - alabaster==0.7.12
    - astor==0.8.0
    - atari-py==0.1.15
    - atomicwrites==1.3.0
    - attrs==19.1.0
    - babel==2.7.0
    - backcall==0.1.0
    - bleach==1.5.0
    - cffi==1.12.3
    - chardet==3.0.4
    - click==7.0
    - cloudpickle==1.2.1
    - cycler==0.10.0
    - cython==0.29.10
    - decorator==4.4.0
    - docutils==0.14
    - enum34==1.1.6
    - filelock==3.0.12
    - future==0.17.1
    - gast==0.2.2
    - glfw==1.8.1
    - grpcio==1.20.1
    - gym==0.12.5
    - h5py==2.9.0
    - html5lib==0.9999999
    - idna==2.8
    - imagehash==4.0
    - imageio==2.5.0
    - imagesize==1.1.0
    - importlib-metadata==0.18
    - ipdb==0.12
    - ipython==7.5.0
    - ipython-genutils==0.2.0
    - jedi==0.13.3
    - jinja2==2.10.1
    - joblib==0.13.2
    - keras-applications==1.0.7
    - keras-preprocessing==1.0.9
    - kiwisolver==1.1.0
    - lockfile==0.12.2
    - markdown==3.1.1
    - markupsafe==1.1.1
    - matplotlib==3.1.0
    - mock==3.0.5
    - more-itertools==7.0.0
    - mpi4py==3.0.2
    - numpy==1.16.4
    - numpydoc==0.9.1
    - opencv-python==4.1.0.25
    - packaging==19.0
    - pandas==0.24.2
    - parso==0.4.0
    - pexpect==4.7.0
    - pickleshare==0.7.5
    - pillow==6.0.0
    - pluggy==0.12.0
    - prompt-toolkit==2.0.9
    - protobuf==3.7.1
    - ptyprocess==0.6.0
    - py==1.8.0
    - pycparser==2.19
    - pyglet==1.3.2
    - pygments==2.4.2
    - pyparsing==2.4.0
    - pytest==4.6.3
    - pytest-instafail==0.3.0
    - python-dateutil==2.8.0
    - pytz==2019.1
    - pywavelets==1.0.3
    - requests==2.22.0
    - scipy==1.3.0
    - six==1.12.0
    - snowballstemmer==1.2.1
    - sphinx==2.1.2
    - sphinx-rtd-theme==0.4.3
    - sphinxcontrib-applehelp==1.0.1
    - sphinxcontrib-devhelp==1.0.1
    - sphinxcontrib-htmlhelp==1.0.2
    - sphinxcontrib-jsmath==1.0.1
    - sphinxcontrib-qthelp==1.0.2
    - sphinxcontrib-serializinghtml==1.1.3
    - tensorboard==1.13.1
    - tensorflow-estimator==1.13.0
    - tensorflow-gpu==1.13.1
    - tensorflow-tensorboard==0.4.0
    - termcolor==1.1.0
    - tqdm==4.32.2
    - traitlets==4.3.2
    - urllib3==1.25.3
    - wcwidth==0.1.7
    - werkzeug==0.15.4
    - zipp==0.5.1
prefix: /home/csy/anaconda3/envs/mujoco-gym
