Motivation

  • With increasing data volumes and accuracy requirements in practical applications, stable autonomous guidance and control has become one of the most critical capabilities.
  • Efficient tracking algorithms enable a smooth trajectory and hence lower system power dissipation during flight.
  • A PID controller works well when the process dynamics are benign and the performance requirements are modest.
  • Due to the UAV's multiple degrees of freedom, tracking methods based on conventional control theory such as PID have limitations:
    • response time: cannot handle processes with large time delays efficiently.
    • robustness: poor performance on tracking problems that require aggressive dynamic configurations, including compensation of uncertain internal disturbances and recovery from imbalances.
  • A model-based approach that computes forces and torques from the UAV's current state is complicated and rigid.
  • It is hard to obtain a high-fidelity mathematical model of a UAV, which is an under-actuated system with nonlinear dynamics.
  • To improve stability and real-time control, DNNs are embedded on different hardware platforms.
  • Trained on large amounts of data, the DNN-based control system achieves adaptability and robustness that keep the flight stable while tolerating unexpected disturbances.

Contributions

  • Present an actor-critic RL framework that controls UAV trajectory through a set of desired waypoints.
  • A deep neural network is constructed to learn the optimal tracking policy.
  • RL is developed to optimize the resulting tracking scheme.
  • Implemented on an FPGA, a single decision can be made within 0.0002 s at only 0.03 mW power consumption per decision epoch.
  • Experimental results:
    • less position error.
    • less system power consumption.
    • faster attainment (shorter time to complete the tracking task).

Method

Trajectory Generation

  • Waypoints are generated considering:
    • the actuator locations on the body of the quadrotor.
    • obstacles and potential collision hazards detected in the field of view of the onboard sensors.
  • Trajectory generation:
    • After waypoints are selected, a \(C^2\) trajectory is generated for the desired inertial position over time to connect these waypoints (a spline-based sketch follows this list).
      • \(C^2\): continuous and twice differentiable w.r.t. time.
      • \(p_{d_t}\): desired position.
      • \(\dot{p}_{d_t}\): desired translational velocity.
      • \(\ddot{p}_{d_t}\): desired translational acceleration.
      • \(R_{d_t}\): desired attitude.
      • \(f_{d_t}\): desired UAV control thrust.
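
A minimal sketch of one way to obtain such a \(C^2\) reference: a natural cubic spline through the waypoints is twice continuously differentiable, so desired position, velocity, and acceleration can be sampled directly from it. The waypoint coordinates, timing, and the use of `scipy` splines are assumptions; the paper does not specify the interpolation scheme.

```python
import numpy as np
from scipy.interpolate import CubicSpline  # natural cubic splines are C^2 in time

# Hypothetical waypoints (x, y, z) and the times at which each should be reached.
t_wp = np.array([0.0, 2.0, 4.0, 6.0])
waypoints = np.array([[0.0, 0.0, 1.0],
                      [1.0, 1.0, 1.5],
                      [2.0, 0.0, 2.0],
                      [3.0, 1.0, 1.0]])

# One spline per inertial axis; "natural" boundary conditions give zero curvature at the ends.
spline = CubicSpline(t_wp, waypoints, axis=0, bc_type="natural")

def desired_state(t):
    """Sample the C^2 reference: desired position, velocity, and acceleration at time t."""
    p_d = spline(t)       # desired position
    v_d = spline(t, 1)    # desired translational velocity (1st time derivative)
    a_d = spline(t, 2)    # desired translational acceleration (2nd time derivative)
    return p_d, v_d, a_d
```

The desired attitude \(R_{d_t}\) and thrust \(f_{d_t}\) would then typically be derived from the desired acceleration through the quadrotor dynamics, which this sketch omits.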

Trajectory Tracking

  • Goal: minimize the differences between desired poses and actual poses during tracking.
  • Desired state (18D): \(S_{d_t} = \{ p_{d_t}, v_{d_t}, a_{d_t}, R_{d_t} \}\).
  • Actual State (18D): \(S_t = \{ p_t, v_t, a_t, R_t \}\).

State

\[\mathbb{S}_t = \{ S_{d_t}, S_t \}.\]

Action

\[\mathbb{A}_t = \{ f_t, \tau_t \}.\]
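
A small sketch, under the definitions above, of how the 36-D RL state (desired plus actual 18-D states) and the action could be assembled as flat vectors. Treating \(f_t\) as a scalar collective thrust and \(\tau_t\) as a 3-D body torque is an assumption.

```python
import numpy as np

def flatten_uav_state(p, v, a, R):
    """Stack position, velocity, acceleration (3-D each) and the 3x3 attitude into an 18-D vector."""
    return np.concatenate([p, v, a, np.asarray(R).reshape(-1)])

def rl_state(desired, actual):
    """S_t = {S_d_t, S_t}: concatenation of the desired and actual 18-D states (36-D)."""
    return np.concatenate([flatten_uav_state(*desired), flatten_uav_state(*actual)])

def rl_action(f, tau):
    """A_t = {f_t, tau_t}: assumed scalar thrust plus 3-D torque (4-D)."""
    return np.concatenate([[f], tau])
```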

Reward

  • Minimize the distance between the desired position and actual position.
  • Maintain the stability of the quadrotor.
  • Simply using a linear combination of \(\Delta P_t, \Delta V_t, \Delta R_t\) as the reward function makes convergence difficult in the learning process.
  • Using a geometrically discounted reward prevents the accumulated reward from becoming infinite and makes the model more tractable.
    • Therefore, the reward at each time step is defined by the standard normal density of the tracking error.
      • This guarantees that the largest reward is obtained when the total difference between the desired and actual trajectory at time \(t\) reaches \(0\).
      • The reward approaches \(0\) as the total difference increases.
\[r_t = \Delta P_t + \Delta V_t + \Delta R_t = \vert p_t - p_{d_t} \vert + \vert v_t - v_{d_t} \vert + \vert R_t - R_{d_t} \vert,\]

\[R_t = \frac{1}{\sqrt{2 \pi}} \exp\left(- \frac{r_t^2}{2}\right).\]

Then, the total discounted reward is:

\[\mathbb{R} = \sum_{t=0}^\infty \gamma^t R_t.\]
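
A minimal sketch of the per-step reward and the discounted return as defined above. Using element-wise absolute differences summed per term (an L1-style norm matching the \(\vert\cdot\vert\) notation) and \(\gamma = 0.99\) are assumptions.

```python
import numpy as np

def step_reward(p, v, R, p_d, v_d, R_d):
    """Gaussian-shaped reward: peaks at 1/sqrt(2*pi) when the tracking error r_t is zero."""
    r_t = (np.abs(p - p_d).sum()      # Delta P_t
           + np.abs(v - v_d).sum()    # Delta V_t
           + np.abs(R - R_d).sum())   # Delta R_t
    return np.exp(-0.5 * r_t ** 2) / np.sqrt(2.0 * np.pi)

def discounted_return(rewards, gamma=0.99):
    """R = sum_t gamma^t * R_t over a (finite) rollout."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```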

DNN

  • The actor model is pre-trained using labeled pair data \((\mathbb{S}_t, \mathbb{A}_t)\).
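
A sketch of that supervised pre-training step: regress actions from states on the labeled pairs before RL fine-tuning. The layer sizes, MSE loss, and Adam optimizer are assumptions; the paper only states that labeled \((\mathbb{S}_t, \mathbb{A}_t)\) pairs are used.

```python
import torch
import torch.nn as nn

# Hypothetical actor: 36-D state in (desired + actual), 4-D action out (thrust + torque).
actor = nn.Sequential(
    nn.Linear(36, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

def pretrain_actor(states, actions, epochs=100, lr=1e-3):
    """Supervised regression of A_t from S_t on the labeled pairs (assumed MSE loss)."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(actor(states), actions)
        loss.backward()
        opt.step()
    return actor
```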

Experiment

Hardware Configuration

  • FPGA: all massive parallel computations are implemented on it.
    • DSP blocks are used as multipliers.
    • Implement DNN controller (actor model).
    • Output control commands.
  • ARM: low-level control.
    • Takes control commands from FPGA as input.
    • Calculates actuations in each freedom.
    • Sends actuations to UAV and accepts flight states and sensor data from UAV via UART.

PID Implementation (baseline)

  • Three PID controllers are used, one for each velocity component (see the sketch after this list).
    • Each computes the error between the desired and achieved velocity component.
  • The achieved position \(p_t\) is calculated through environment simulation and used as feedback for the current velocity.
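
A sketch of the baseline under these assumptions: three independent PID loops, one per velocity component, each driven by the velocity error. The gains and time step are placeholders.

```python
import numpy as np

class PID:
    """Single-axis PID controller (placeholder gains)."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, err, dt):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# One controller per velocity component, as in the baseline description.
pids = [PID(), PID(), PID()]

def velocity_command(v_desired, v_achieved, dt=0.01):
    """Per-axis control output from the error between desired and achieved velocity."""
    return np.array([c.update(vd - va, dt)
                     for c, vd, va in zip(pids, v_desired, v_achieved)])
```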

Evaluation

  • L1-Norm of position tracking error.
  • L1-Norm of velocity tracking error.
  • Time used to complete tracking.
  • Power consumption.
  • Robustness: tracking in a noisy environment.
    • Add different levels of random Gaussian noise to the position (such noise can model the effect of a wind gust); a sketch of the metrics and the noise injection follows this list.
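
A sketch of the L1 error metrics and the position-noise injection used in the robustness test; the noise standard deviation below is a placeholder.

```python
import numpy as np

def l1_tracking_error(actual, desired):
    """L1-norm of the position (or velocity) tracking error accumulated over the trajectory."""
    return np.abs(np.asarray(actual) - np.asarray(desired)).sum()

def add_position_noise(positions, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to the position trace (models e.g. a wind gust)."""
    rng = rng or np.random.default_rng()
    return np.asarray(positions) + rng.normal(0.0, sigma, size=np.shape(positions))
```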

References

  • Y. Li et al., “Fast and accurate trajectory tracking for unmanned aerial vehicles based on deep reinforcement learning,” IEEE 25th International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA, 2019.