A Survey of UAV Simulation With Reinforcement Learning

Simulation is an invaluable tool for the robotics researcher. It allows algorithms to be developed and tested safely and cheaply, without the time-consuming and expensive process of dealing with real-world hardware.

Simulators allow engineers to identify errors early in the development process.
Simulation systems provide not only massive amounts of data, but also the labels required for training algorithms.
They provide a safe environment for learning from experience, which is useful for reinforcement learning (RL) methods.

The ideal simulator has three main characteristics:

Fast, to collect a large amount of data with limited time and compute (e.g., MuJoCo).
Physically accurate, to represent the dynamics of the real world with high fidelity.
Photo-realistic, to minimize the discrepancy between simulated and real-world sensor observations.

PX4 Simulation Doc

Simulators

AirSim

Code
- Train quadrotors to follow high-tension power lines.
Blog
- Python
- State: 29D
- Action: 3D [roll, pitch, thrust]

Air Learning

Code
Python
State: Image, UAV state
Action: Forward, Left, Right, Back

GymFC

Code
Python
State: rotor speed, angular velocity error
Action: control signals of each motor

ETHZ reinmav-gym

Code
Python
State: Position, quaternion, velocity
Action: angular velocity, thrust

RAI

Code
C++
State: rotation matrix, position, linear velocity, angular velocity
Action: rotor thrusts

Drone Racing Simulators

Velocidrone: https://www.velocidrone.com/
The Drone Racing League (DRL): https://thedroneracingleague.com/
Liftoff: https://www.liftoff-game.com/
Unity: https://unity.com/

PaddlePaddle/RLSchool

Code
DDPG for UAV velocity control
PARL-PPO for UAV control
State: sensor measurements, flight state, and task-related state.
Action: voltage values of the four propeller motors, each in the range \([0.1,15.0]\).
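Continuous-control policies such as those trained with DDPG or PPO commonly emit bounded actions, for example through a tanh output layer. A minimal sketch, assuming such a policy outputs four values in [-1, 1], of mapping them affinely onto the stated voltage range; the helper `to_voltages` is hypothetical and not part of RLSchool's API.

```python
import numpy as np

# Voltage range stated for the RLSchool UAV task.
V_MIN, V_MAX = 0.1, 15.0


def to_voltages(policy_output):
    """Hypothetical helper: map policy outputs in [-1, 1]^4 to motor voltages.

    Values outside [-1, 1] are clipped first, so the result always lies in
    [V_MIN, V_MAX].
    """
    a = np.clip(np.asarray(policy_output, dtype=float), -1.0, 1.0)
    return V_MIN + (a + 1.0) * 0.5 * (V_MAX - V_MIN)


volts = to_voltages([-1.0, 0.0, 1.0, 2.0])
```

Clipping before rescaling keeps out-of-range outputs (e.g. from exploration noise) from producing voltages the task would reject.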

Flightmare

https://uzh-rpg.github.io/flightmare/
Code
Two main components (decoupled and run independently):
- A configurable rendering engine built on Unity, rendering at up to 230 Hz.
- A flexible physics engine for dynamics simulation, running at up to 200,000 Hz.
End users can trade off accuracy against speed.
The interface between the rendering engine and the quadrotor dynamics is implemented using the high-performance asynchronous messaging library ZeroMQ.
Multi-modal sensor suite:
- Visual: RGB, depth, semantic segmentation.
- IMU.
- 3D point-cloud of the scene.
API for RL: a Python wrapper implements an OpenAI Gym-style interface for RL tasks.
Interface for multi-agent simulation, which can simulate hundreds of quadrotors in parallel.
Integration with a virtual-reality headset for interaction with the environment.
Used for the following quadrotor tasks:
- quadrotor control policy learning.
- quadrotor path-planning in a complex 3D environment.
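To illustrate what an OpenAI Gym-style interface means in practice, here is a minimal sketch of a toy 1D altitude-hold environment exposing `reset()` and `step(action)`, where `step` returns the classic `(observation, reward, done, info)` tuple used by OpenAI Gym and stable-baselines. The class `ToyHoverEnv`, its dynamics, its reward, and the PD gains in the usage loop are hypothetical; this is not Flightmare's Python wrapper or its task definitions.

```python
import numpy as np


class ToyHoverEnv:
    """Hypothetical 1D altitude-hold task with a Gym-style reset/step interface."""

    def __init__(self, dt=0.02, mass=1.0, target=1.0, max_steps=500, seed=None):
        self.dt, self.mass, self.target, self.max_steps = dt, mass, target, max_steps
        self.g = 9.81
        self.rng = np.random.default_rng(seed)

    def _obs(self):
        # Observation: altitude error and vertical velocity.
        return np.array([self.z - self.target, self.vz])

    def reset(self):
        # Start at a random altitude, at rest.
        self.z = float(self.rng.uniform(0.5, 1.5))
        self.vz = 0.0
        self.t = 0
        return self._obs()

    def step(self, action):
        # Action: collective thrust in newtons, clipped to [0, 2 * m * g].
        thrust = float(np.asarray(action, dtype=float).item())
        thrust = min(max(thrust, 0.0), 2.0 * self.mass * self.g)
        # Semi-implicit Euler integration of the vertical dynamics.
        self.vz += (thrust / self.mass - self.g) * self.dt
        self.z += self.vz * self.dt
        self.t += 1
        err = self.z - self.target
        reward = -(err ** 2) - 0.01 * self.vz ** 2
        done = self.z <= 0.0 or self.t >= self.max_steps
        return self._obs(), float(reward), bool(done), {}


# Usage: roll out one episode with a hypothetical PD controller around hover thrust.
env = ToyHoverEnv(seed=0)
obs = env.reset()
done = False
while not done:
    action = env.mass * env.g - 4.0 * obs[0] - 3.0 * obs[1]
    obs, reward, done, info = env.step(action)
```

An RL library only needs these two methods (plus action and observation space descriptions in real Gym environments) to collect trajectories, which is what makes the Gym convention a convenient boundary between a simulator and a learning algorithm.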

RotorS vs. Hector vs. AirSim vs. CARLA vs. FlightGoggles vs. Flightmare

RotorS
- built on Gazebo with ROS.
- provides several quadrotors such as AscTec Hummingbird, Pelican, and Firefly.
- used for path-planning, mapping, exploration, etc.
- Gazebo has limited rendering capabilities and is not designed for efficient parallel dynamics simulation.
Hector
- built on Gazebo with ROS.
- used for autonomous mapping and navigation with rescue robots.
AirSim
- Photo-realistic simulator built on Unreal Engine.
- limited simulation speed makes it difficult to apply to model-free RL tasks (e.g., training an end-to-end control policy for quadrotor stabilization or for flying through a fast-moving gate).
CARLA
- Photo-realistic simulator built on Unreal Engine.
- mainly made for autonomous driving research and only provides dynamics of ground vehicles.
FlightGoggles
- photo-realistic sensor simulator for perception-driven robotic vehicles.
- exteroceptive sensors:
  - RGB-D cameras.
    - distortion-free.
    - camera projection model with optional motion blur, lens dirt, auto-exposure, and bloom.
    - parameters can be changed via API using ROS param or LCM config.
  - time-of-flight distance sensors.
    - a downward-facing single-point range finder for altitude estimation.
  - infrared radiation (IR) beacon sensors.
    - provide image-space measurements of IR beacons in the camera’s field of view.
    - the beacons can be placed at static locations or on moving objects.
- two separate components (modular architecture):
  - a photo-realistic rendering engine built on Unity.
    - uses the position and orientation of the vehicle to simulate camera imagery and exteroceptive sensors, and to detect collisions (using polygon colliders).
    - dynamic elements, such as moving obstacles, lights, vehicles, and human actors can be added.
  - a quadrotor dynamics simulation implemented in C++.
    - vehicle state is updated at 960Hz.
- vehicle-in-the-loop simulation (using a motion capture system).
  - circumvents the need to estimate complex and hard-to-model interactions such as aerodynamics, motor mechanics, battery electrochemistry, and the behavior of other agents.
  - acquires the pose of the vehicle in real time.
  - real dynamics, real inertial sensing.
  - can be seen as an extension of customary hardware-in-the-loop configurations.
- the API provides:
  - dynamics states.
  - control inputs.
  - sensor outputs.
- Message interface:
  - ROS.
  - LCM.
- useful for rendering camera images given trajectories and inertial measurements from vehicles flying in the real world.
- decouples the dynamics modelling from the photo-realistic rendering engine.
- Applications:
  - visual inertial navigation research for fast and agile vehicles:
    - Visual inertial odometry (VIO) to estimate the vehicle state.
      - environment and camera parameters can be changed, enabling quick verification of VIO performance over a multitude of scenarios.
  - human-vehicle interaction.
  - active sensor selection.
  - multi-agent systems.
  - AlphaPilot challenge:
    - an autonomous drone racing challenge.
    - tests autonomous guidance, navigation, and control capabilities in a realistic simulation environment.
    - sensor data provided (via ROS API):
      - (stereo) cameras.
      - IMU.
      - downward-facing time-of-flight range sensor.
      - infrared gate beacons.
    - the autonomous system obtains sensor measurements and provides collective thrust and attitude rate inputs to the quadrotor's low-level rate mode controller.
      - methods:
        - end-to-end learning-based methods.
        - traditional pipelines: estimation, planning, and control.
          - estimation: Kalman filter, ROVIO, VINS-Mono, etc.
          - planning: visual servoing using infrared beacons, polynomial trajectory planning, manually defined waypoints, sampling-based techniques for building trajectory libraries.
          - control: linear control, model predictive control, geometric and backstepping control.
    - environment: FlightGoggles Abandoned Factory.
      - the exact gate locations were subject to random unknown perturbations.
    - Score = 10 × (number of gates passed) - time.
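The traditional estimation pipelines listed above often start from a Kalman filter. As a minimal sketch, assume level flight over flat ground, so the downward-facing time-of-flight range sensor measures altitude directly, and assume the IMU's vertical acceleration has already been rotated into the world frame and gravity-compensated. A linear Kalman filter over the state [altitude, vertical velocity] then fuses the two. The class name and noise values are hypothetical and are not taken from FlightGoggles or any AlphaPilot entry.

```python
import numpy as np


class AltitudeKF:
    """Hypothetical linear Kalman filter with state x = [z, vz]."""

    def __init__(self, dt, accel_var=0.5, range_var=0.01):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity transition
        self.B = np.array([0.5 * dt ** 2, dt])          # effect of vertical acceleration
        self.H = np.array([[1.0, 0.0]])                 # range sensor observes z only
        self.Q = accel_var * np.outer(self.B, self.B)   # process noise from accel noise
        self.R = np.array([[range_var]])                # range measurement noise
        self.x = np.zeros(2)
        self.P = np.eye(2)

    def predict(self, accel_z):
        # Propagate with the (assumed) gravity-compensated vertical IMU acceleration.
        self.x = self.F @ self.x + self.B * accel_z
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, range_z):
        # Correct with an altitude measurement from the range sensor.
        y = np.array([range_z]) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain, shape (2, 1)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P


# Usage: a vehicle climbing at a constant 0.5 m/s with noise-free range readings.
dt = 0.01
kf = AltitudeKF(dt)
for k in range(1, 501):
    kf.predict(accel_z=0.0)
    kf.update(range_z=0.5 * k * dt)
```

Although the range sensor never measures velocity, the filter recovers it from the sequence of altitude readings through the velocity term of the transition model.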

Rendering Engine

Built with Unity.
Various high-quality 3D environments: warehouse, nature forest, etc.
- users can add environment perturbations, such as wind.
A new environment or asset can easily be created or directly purchased from the Unity Asset Store.
Sensors:
- RGB cameras with ground-truth depth and semantic segmentation.
  - users can change the camera intrinsics such as field of view, focal length, and lens distortion.
  - sensor noise: the engine can also simulate physical camera effects, including motion blur, lens dirt, and bloom.
- Rangefinders.
- Collision detection between agents and their surroundings.
Provides a graphical user interface (GUI) as well as a C++ API for users to extract ground-truth point clouds of the environment.
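The camera intrinsics listed above are related by the standard pinhole model: for an image W pixels wide with a horizontal field of view FOV, the focal length in pixels is f = (W / 2) / tan(FOV / 2). The sketch below computes this and projects camera-frame points to pixels with a simple two-coefficient radial distortion model. The distortion model and the function names are assumptions for illustration, not necessarily what Flightmare implements.

```python
import math

import numpy as np


def focal_from_fov(fov_deg, width_px):
    """Pinhole focal length in pixels from a horizontal field of view in degrees."""
    return 0.5 * width_px / math.tan(math.radians(fov_deg) / 2.0)


def project(points_cam, f, cx, cy, k1=0.0, k2=0.0):
    """Project camera-frame points (x right, y down, z forward) to pixel coordinates.

    Normalized coordinates are scaled by the radial factor 1 + k1*r^2 + k2*r^4
    (an assumed distortion model) before applying the focal length and the
    principal point (cx, cy).
    """
    p = np.asarray(points_cam, dtype=float)
    x = p[:, 0] / p[:, 2]
    y = p[:, 1] / p[:, 2]
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2
    return np.stack([f * x * d + cx, f * y * d + cy], axis=1)


# Usage: a 640x480 image with a 90-degree horizontal field of view.
f = focal_from_fov(90.0, 640)
pixels = project([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]], f, 320.0, 240.0, k1=0.1)
```

Narrowing the field of view at a fixed image width increases f, which is why the two parameters cannot be set independently once the resolution is fixed.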

Dynamic Modelling

Gazebo-based quadrotor dynamics, slower but more realistic.
- basic model: noise-free.
- more advanced rigid-body dynamics: including friction and rotor drag.
Real-world dynamics: an interface to combine real-world dynamics with photo-realistic rendering.
- inertial sensing and motor encoders depend directly on the physics model.
A parallelized implementation of classical quadrotor dynamics, useful for large-scale RL applications.
control modes:
- body-rate mode: implement a low-level controller for tracking the desired body rates.
- rotor-thrusts mode: low-level controller generates desired rotor thrusts for each motor.
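The parallelized classical dynamics and the rotor-thrusts mode can be sketched together: stacking the states of N quadrotors into arrays lets a single NumPy call advance all of them, which is what makes large-scale RL sampling cheap. This is a minimal sketch assuming an X-configuration quadrotor, a drag-free rigid-body model, and semi-implicit Euler integration. The mass, inertia, arm length, torque coefficient, rotor ordering, and spin directions are hypothetical values, not Flightmare's.

```python
import numpy as np

# Hypothetical vehicle parameters (not Flightmare's defaults).
MASS = 0.75                               # kg
ARM = 0.15                                # m, center-to-rotor distance
KAPPA = 0.016                             # rotor drag-torque / thrust ratio, m
J = np.diag([0.0025, 0.0025, 0.0045])     # inertia, kg m^2
J_INV = np.linalg.inv(J)
GRAVITY = np.array([0.0, 0.0, -9.81])

# Rotor (x, y) positions in the body frame (x forward, y left, z up).
# Diagonal rotors share a spin direction; SPIN = +1 is assumed to give +z reaction torque.
_l = ARM / np.sqrt(2.0)
ROTOR_XY = np.array([[_l, -_l], [-_l, _l], [_l, _l], [-_l, -_l]])
SPIN = np.array([1.0, 1.0, -1.0, -1.0])

# Allocation: 4 rotor thrusts -> [collective thrust, tau_x, tau_y, tau_z].
# A thrust f along body z at (x, y) produces torque (y * f, -x * f, 0).
ALLOC = np.vstack([np.ones(4), ROTOR_XY[:, 1], -ROTOR_XY[:, 0], KAPPA * SPIN])


def skew(v):
    """Batch of skew-symmetric matrices (N, 3, 3) from vectors (N, 3)."""
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    o = np.zeros_like(x)
    return np.stack([np.stack([o, -z, y], -1),
                     np.stack([z, o, -x], -1),
                     np.stack([-y, x, o], -1)], -2)


def so3_exp(phi):
    """Rodrigues' formula for a batch of rotation vectors phi (N, 3)."""
    theta = np.linalg.norm(phi, axis=1)
    small = theta < 1e-6
    t = np.where(small, 1.0, theta)       # avoid division by zero
    a = np.where(small, 1.0 - theta ** 2 / 6.0, np.sin(t) / t)
    b = np.where(small, 0.5 - theta ** 2 / 24.0, (1.0 - np.cos(t)) / t ** 2)
    S = skew(phi)
    return np.eye(3) + a[:, None, None] * S + b[:, None, None] * (S @ S)


def step(p, v, R, w, thrusts, dt):
    """Advance N quadrotors by dt. R maps body to world; w is the body angular rate."""
    wrench = thrusts @ ALLOC.T                              # (N, 4)
    c, tau = wrench[:, 0], wrench[:, 1:]
    acc = GRAVITY + R[:, :, 2] * (c / MASS)[:, None]        # body z axis in world frame
    w_dot = (tau - np.cross(w, w @ J.T)) @ J_INV.T          # Euler's rotation equation
    v = v + acc * dt
    p = p + v * dt
    w = w + w_dot * dt
    R = R @ so3_exp(w * dt)
    return p, v, R, w


# Usage: 100 quadrotors commanded with equal hover thrusts stay in place.
N = 100
p, v, w = np.zeros((N, 3)), np.zeros((N, 3)), np.zeros((N, 3))
R = np.tile(np.eye(3), (N, 1, 1))
hover = np.full((N, 4), MASS * 9.81 / 4.0)
for _ in range(1000):
    p, v, R, w = step(p, v, R, w, hover, dt=0.001)
```

A body-rate mode would sit on top of this: a low-level controller turns desired body rates and collective thrust into the wrench, and the inverse of `ALLOC` maps that wrench back to rotor thrusts.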

Learning a Policy for Quadrotor Control

Three tasks:

stabilize a quadrotor from randomly initialized poses.
- state: \((p, \theta, v)\).
- action: \((c, \omega_x, \omega_y, \omega_z)\).
stabilize a quadrotor from randomly initialized poses under a single motor failure.
- state: \((p, \theta, v, \omega)\).
- action: \((f_1, f_2, f_3)\).
control a quadrotor to fly through static gates as fast as possible.
- state: \((p, \theta, v, \omega, p_{gate}, \theta_{gate})\).
- action: \((f_1, f_2, f_3, f_4)\).

Method:

Train neural network controllers for each task using the PPO algorithm and the OpenAI stable-baselines implementation.
Simulate 100 quadrotors in parallel for trajectory sampling, collecting a total of 25 million time steps per task (250,000 steps per quadrotor).
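PPO as commonly implemented maximizes a clipped surrogate objective built from the probability ratio between the new and old policies, with advantages usually computed by generalized advantage estimation (GAE). A minimal NumPy sketch of both pieces for a single rollout follows; the defaults γ = 0.99, λ = 0.95, and ε = 0.2 are common choices assumed here, not the settings reported for these tasks.

```python
import numpy as np


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one rollout.

    dones[t] = 1 if the episode ended after step t; last_value bootstraps the
    value of the state that follows the final step.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]
        # One-step TD error, then an exponentially weighted sum of future TD errors.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
    return adv


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (to be minimized)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return -np.mean(np.minimum(unclipped, clipped))
```

With many simulated quadrotors, each environment contributes its own rollout to `gae`, and the resulting advantages are concatenated into one batch for the loss.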

Point Cloud and Path Planning

Provides an interface to export the 3D information of the full environment as a point cloud at any desired resolution.
Example: a section of the complex nature forest environment, exported at a resolution of 0.1 m, contains detailed 3D structure information of the forest.
Compute the shortest collision-free path between two points, from point A to point B.
Run OMPL on the point cloud extracted from the forest, using a default solver for path planning.
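The path-planning step above uses OMPL's default sampling-based solver. As a simpler, swapped-in alternative that illustrates the same idea, the sketch below voxelizes a point cloud (0.1 m by default, matching the export resolution above) and runs A* search over the free voxels of a 26-connected grid, returning a shortest collision-free path on that grid. The function names and grid bounds are hypothetical, and this is not OMPL's API.

```python
import heapq
import itertools
import math


def voxelize(points, resolution=0.1):
    """Map 3D points (metres) to the set of occupied integer voxel indices."""
    return {tuple(math.floor(c / resolution) for c in pt) for pt in points}


def astar(start, goal, occupied, bounds):
    """A* over a 26-connected voxel grid.

    start/goal are voxel indices; bounds = (lo, hi) are inclusive index corners.
    Returns the list of voxels from start to goal, or None if no path exists.
    """
    lo, hi = bounds
    offsets = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

    def h(node):
        # Euclidean distance never overestimates the remaining grid cost.
        return math.dist(node, goal)

    open_heap = [(h(start), 0.0, start)]
    came_from = {start: None}
    g = {start: 0.0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if cost > g[node]:
            continue  # stale heap entry
        for d in offsets:
            nb = tuple(n + dd for n, dd in zip(node, d))
            if nb in occupied or any(c < l or c > u for c, l, u in zip(nb, lo, hi)):
                continue
            new_cost = cost + math.sqrt(sum(dd * dd for dd in d))
            if new_cost < g.get(nb, math.inf):
                g[nb] = new_cost
                came_from[nb] = node
                heapq.heappush(open_heap, (new_cost + h(nb), new_cost, nb))
    return None


# Usage: a wall at x = 2 with a single gap at y = 4 on a flat 5x5 grid.
wall = {(2, y, 0) for y in range(4)}
path = astar((0, 0, 0), (4, 0, 0), wall, ((0, 0, 0), (4, 4, 0)))
```

Unlike a sampling-based planner, a grid search is resolution-complete and optimal only with respect to the chosen voxel size, and its memory cost grows quickly with the size of the environment.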

Other Applications

Virtual reality and safe human-robot interaction.
Can be used to study the implications of large-scale multi-robot systems.
Can be extremely useful for testing odometry and SLAM systems.
Can also be used to learn deep sensorimotor policies via imitation learning.