Motivation

  • With increasing data volumes and accuracy requirements in practical applications, stable autonomous guidance and control has become one of the most critical capabilities.
  • Efficient tracking algorithms enable a smooth trajectory and hence lower system power dissipation during flight.
  • A PID controller works well when the process dynamics are benign and the performance requirements are modest.
  • Due to the UAV's multiple degrees of freedom, tracking methods based on conventional control theory such as PID have limitations:
    • response time: cannot handle processes with large time delays efficiently.
    • robustness: poor performance on tracking problems that require aggressive dynamic configurations, including compensation of uncertain internal disturbances and recovery from imbalances.
  • A model-based approach that computes forces and torques from the UAV's current state is complicated and rigid.
  • It is hard to obtain a high-fidelity mathematical model of a UAV, which is an under-actuated system with nonlinear dynamics.
  • To improve stability and real-time control, DNNs are embedded on different hardware platforms.
  • Trained on large amounts of data, the DNN-based control system achieves adaptability and robustness that keep the flight stable while tolerating unexpected disturbances.

Contributions

  • Present an actor-critic RL framework that controls UAV trajectory through a set of desired waypoints.
  • A deep neural network is constructed to learn the optimal tracking policy.
  • RL is developed to optimize the resulting tracking scheme.
  • Implemented on an FPGA, a single decision can be made within 0.0002 s at only 0.03 mW power consumption per decision epoch.
  • Experimental results:
    • less position error.
    • less system power consumption.
    • faster attainment (shorter time to complete the tracking task).

Method

Trajectory Generation

  • Waypoints are generated considering:
    • the actuator locations on the body of the quadrotor.
    • obstacles and potential collision hazards detected in the field of view of the onboard sensors.
  • Trajectory generation:
    • After waypoints are selected, a \(C^2\) trajectory is generated for the desired inertial position over time to connect these waypoints (a spline-based sketch follows this list).
      • \(C^2\): continuous and twice differentiable w.r.t. time.
      • \(p_{d_t}\): desired position.
      • \(\dot{p}_{d_t}\): desired translational velocity.
      • \(\ddot{p}_{d_t}\): desired translational acceleration.
      • \(R_{d_t}\): desired attitude.
      • \(f_{d_t}\): desired UAV control thrust.
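
A minimal sketch of one way to obtain such a \(C^2\) reference: a natural cubic spline through the waypoints is twice continuously differentiable, so desired position, velocity, and acceleration can be sampled directly from it. The waypoint coordinates, timing, and the use of `scipy` splines are assumptions; the paper does not specify the interpolation scheme.

```python
import numpy as np
from scipy.interpolate import CubicSpline  # natural cubic splines are C^2 in time

# Hypothetical waypoints (x, y, z) and the times at which each should be reached.
t_wp = np.array([0.0, 2.0, 4.0, 6.0])
waypoints = np.array([[0.0, 0.0, 1.0],
                      [1.0, 1.0, 1.5],
                      [2.0, 0.0, 2.0],
                      [3.0, 1.0, 1.0]])

# One spline per inertial axis; "natural" boundary conditions give zero curvature at the ends.
spline = CubicSpline(t_wp, waypoints, axis=0, bc_type="natural")

def desired_state(t):
    """Sample the C^2 reference: desired position, velocity, and acceleration at time t."""
    p_d = spline(t)       # desired position
    v_d = spline(t, 1)    # desired translational velocity (1st time derivative)
    a_d = spline(t, 2)    # desired translational acceleration (2nd time derivative)
    return p_d, v_d, a_d
```

The desired attitude \(R_{d_t}\) and thrust \(f_{d_t}\) would then typically be derived from the desired acceleration through the quadrotor dynamics, which this sketch omits.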

Trajectory Tracking

  • Goal: minimize the differences between desired poses and actual poses during tracking.
  • Desired state (18D): \(S_{d_t} = \{ p_{d_t}, v_{d_t}, a_{d_t}, R_{d_t} \}\).
  • Actual State (18D): \(S_t = \{ p_t, v_t, a_t, R_t \}\).

State

\[\mathbb{S}_t = \{ S_{d_t}, S_t \}.\]

Action

\[\mathbb{A}_t = \{ f_t, \tau_t \}.\]
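
A small sketch, under the definitions above, of how the 36-D RL state (desired plus actual 18-D states) and the action could be assembled as flat vectors. Treating \(f_t\) as a scalar collective thrust and \(\tau_t\) as a 3-D body torque is an assumption.

```python
import numpy as np

def flatten_uav_state(p, v, a, R):
    """Stack position, velocity, acceleration (3-D each) and the 3x3 attitude into an 18-D vector."""
    return np.concatenate([p, v, a, np.asarray(R).reshape(-1)])

def rl_state(desired, actual):
    """S_t = {S_d_t, S_t}: concatenation of the desired and actual 18-D states (36-D)."""
    return np.concatenate([flatten_uav_state(*desired), flatten_uav_state(*actual)])

def rl_action(f, tau):
    """A_t = {f_t, tau_t}: assumed scalar thrust plus 3-D torque (4-D)."""
    return np.concatenate([[f], tau])
```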

Reward

  • Minimize the distance between the desired position and actual position.
  • Maintain the stability of the quadrotor.
  • Simply using a linear combination of \(\Delta P_t, \Delta V_t, \Delta R_t\) as the reward function makes convergence difficult in the learning process.
  • Using a geometrically discounted reward prevents the accumulated reward from becoming infinite and makes the model more tractable.
    • Therefore, the reward at each time step is defined by the standard normal density of the tracking error.
      • This guarantees that the largest reward is obtained when the total difference between the desired and actual trajectory at time \(t\) reaches \(0\).
      • The reward approaches \(0\) as the total difference increases.
\[r_t = \Delta P_t + \Delta V_t + \Delta R_t = \vert p_t - p_{d_t} \vert + \vert v_t - v_{d_t} \vert + \vert R_t - R_{d_t} \vert,\]

\[R_t = \frac{1}{\sqrt{2 \pi}} \exp\left(- \frac{r_t^2}{2}\right).\]

Then, the total discounted reward is:

\[\mathbb{R} = \sum_{t=0}^\infty \gamma^t R_t.\]
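
A minimal sketch of the per-step reward and the discounted return as defined above. Using element-wise absolute differences summed per term (an L1-style norm matching the \(\vert\cdot\vert\) notation) and \(\gamma = 0.99\) are assumptions.

```python
import numpy as np

def step_reward(p, v, R, p_d, v_d, R_d):
    """Gaussian-shaped reward: peaks at 1/sqrt(2*pi) when the tracking error r_t is zero."""
    r_t = (np.abs(p - p_d).sum()      # Delta P_t
           + np.abs(v - v_d).sum()    # Delta V_t
           + np.abs(R - R_d).sum())   # Delta R_t
    return np.exp(-0.5 * r_t ** 2) / np.sqrt(2.0 * np.pi)

def discounted_return(rewards, gamma=0.99):
    """R = sum_t gamma^t * R_t over a (finite) rollout."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```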

DNN

  • The actor model is pre-trained using labeled pair data \((\mathbb{S}_t, \mathbb{A}_t)\).
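
A sketch of that supervised pre-training step: regress actions from states on the labeled pairs before RL fine-tuning. The layer sizes, MSE loss, and Adam optimizer are assumptions; the paper only states that labeled \((\mathbb{S}_t, \mathbb{A}_t)\) pairs are used.

```python
import torch
import torch.nn as nn

# Hypothetical actor: 36-D state in (desired + actual), 4-D action out (thrust + torque).
actor = nn.Sequential(
    nn.Linear(36, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

def pretrain_actor(states, actions, epochs=100, lr=1e-3):
    """Supervised regression of A_t from S_t on the labeled pairs (assumed MSE loss)."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(actor(states), actions)
        loss.backward()
        opt.step()
    return actor
```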

Experiment

Hardware Configuration

  • FPGA: all massive parallel computations are implemented on it.
    • DSP blocks are used as multipliers.
    • Implement DNN controller (actor model).
    • Output control commands.
  • ARM: low-level control.
    • Takes control commands from FPGA as input.
    • Calculates actuations in each freedom.
    • Sends actuations to UAV and accepts flight states and sensor data from UAV via UART.

PID Implementation (baseline)

  • Three PID controllers are used, one for each velocity component (see the sketch after this list).
    • Each computes the error between the desired and achieved velocity component.
  • The achieved position \(p_t\) is calculated through environment simulation and used as feedback for the current velocity.
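
A sketch of the baseline under these assumptions: three independent PID loops, one per velocity component, each driven by the velocity error. The gains and time step are placeholders.

```python
import numpy as np

class PID:
    """Single-axis PID controller (placeholder gains)."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, err, dt):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# One controller per velocity component, as in the baseline description.
pids = [PID(), PID(), PID()]

def velocity_command(v_desired, v_achieved, dt=0.01):
    """Per-axis control output from the error between desired and achieved velocity."""
    return np.array([c.update(vd - va, dt)
                     for c, vd, va in zip(pids, v_desired, v_achieved)])
```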

Evaluation

  • L1-Norm of position tracking error.
  • L1-Norm of velocity tracking error.
  • Time used to complete tracking.
  • Power consumption.
  • Robustness: tracking in a noisy environment.
    • Add different levels of random Gaussian noise to the position (such noise can model the effect of a wind gust); a sketch of the metrics and the noise injection follows this list.
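
A sketch of the L1 error metrics and the position-noise injection used in the robustness test; the noise standard deviation below is a placeholder.

```python
import numpy as np

def l1_tracking_error(actual, desired):
    """L1-norm of the position (or velocity) tracking error accumulated over the trajectory."""
    return np.abs(np.asarray(actual) - np.asarray(desired)).sum()

def add_position_noise(positions, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to the position trace (models e.g. a wind gust)."""
    rng = rng or np.random.default_rng()
    return np.asarray(positions) + rng.normal(0.0, sigma, size=np.shape(positions))
```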

References

  • Y. Li et al., “Fast and accurate trajectory tracking for unmanned aerial vehicles based on deep reinforcement learning,” IEEE 25th International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA, 2019.