Fast and Accurate Trajectory Tracking for UAV based on DRL
Motivation
- With the increasing data volume and accuracy requirements of practical applications, stable autonomous guidance and control has become one of the most critical capabilities for UAVs.
- Efficient tracking algorithms enable a smooth trajectory and hence lower system power dissipation during flight.
- A PID controller works well when the process dynamics are benign and the performance requirements are modest. Due to a UAV's multiple degrees of freedom, tracking methods based on conventional control theory such as PID have limitations:
  - Response time: they cannot efficiently handle processes with large time delays.
  - Adjustment robustness: they show poor performance on tracking problems that require aggressive dynamic configurations, including uncertain internal disturbance compensation and imbalance recovery.
- A model-based approach that calculates the forces and torques from the UAV's current state is complicated and rigid.
  - It is hard to obtain a high-fidelity mathematical model of a UAV, which is an under-actuated system with nonlinear dynamics.
- To improve stability and real-time control, DNNs are embedded on different hardware platforms. Through training on large amounts of data, the DNN-based control system achieves adaptability and robustness that guarantee flight stability with tolerance of unexpected disturbances.
Contributions
- Present an actor-critic RL framework that controls UAV trajectory through a set of desired waypoints.
- A deep neural network is constructed to learn the optimal tracking policy; RL is developed to optimize the resulting tracking scheme.
- Implemented on an FPGA, a single decision can be made within 0.0002 s at only 0.03 mW power consumption per decision epoch.
- Experimental results:
  - lower position error.
  - lower system power consumption.
  - faster completion of the tracking task.
Method
Trajectory Generation
Waypoints are generated considering:
- the actuator locations on the body of the quadrotor.
- obstacles and potential collision hazards detected in the field of view of the onboard sensors.
Trajectory generation:
- After waypoints are selected, a \(C^2\) trajectory is generated for the desired inertial position in time to connect these waypoints.
  - \(C^2\): continuous and twice differentiable w.r.t. time.
- The trajectory specifies, at each time step:
  - \(p_{d_t}\): desired position.
  - \(\dot{p}_{d_t}\): desired translational velocity.
  - \(\ddot{p}_{d_t}\): desired translational acceleration.
  - \(R_{d_t}\): desired attitude.
  - \(f_{d_t}\): desired UAV control thrust.
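The paper does not specify the interpolation scheme, so as one minimal sketch of a \(C^2\) trajectory segment, the helper below fits a quintic polynomial between two waypoints with zero boundary velocity and acceleration; consecutive segments then join with continuous position, velocity, and acceleration. The function names and boundary conditions are assumptions for illustration, run once per position axis.

```python
import numpy as np

def quintic_segment(p0, p1, T):
    """Coefficients c0..c5 of a quintic polynomial from p0 to p1 over
    duration T, with zero velocity and acceleration at both endpoints,
    so back-to-back segments connect with C^2 continuity.
    (Illustrative helper; the paper does not give its interpolation.)"""
    # Boundary conditions: p(0)=p0, p'(0)=0, p''(0)=0, p(T)=p1, p'(T)=0, p''(T)=0
    A = np.array([
        [1, 0, 0,     0,       0,        0],
        [0, 1, 0,     0,       0,        0],
        [0, 0, 2,     0,       0,        0],
        [1, T, T**2,  T**3,    T**4,     T**5],
        [0, 1, 2*T,   3*T**2,  4*T**3,   5*T**4],
        [0, 0, 2,     6*T,     12*T**2,  20*T**3],
    ], dtype=float)
    b = np.array([p0, 0.0, 0.0, p1, 0.0, 0.0])
    return np.linalg.solve(A, b)

def eval_poly(c, t, der=0):
    """Evaluate the polynomial (or its der-th time derivative) at time t."""
    return np.polyval(np.polyder(np.poly1d(c[::-1]), der), t)
```

Evaluating `eval_poly(c, t)`, `eval_poly(c, t, 1)`, and `eval_poly(c, t, 2)` yields the desired \(p_{d_t}\), \(\dot{p}_{d_t}\), and \(\ddot{p}_{d_t}\) for that axis.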
Trajectory Tracking
Goal: minimize the differences between desired poses and actual poses during tracking.
- Desired state (18D): \(S_{d_t} = \{ p_{d_t}, v_{d_t}, a_{d_t}, R_{d_t} \}\).
- Actual state (18D): \(S_t = \{ p_t, v_t, a_t, R_t \}\).
- (Position, velocity, and acceleration are each 3D; the attitude \(R\) is a \(3 \times 3\) rotation matrix, giving \(3+3+3+9 = 18\) dimensions.)
State
\[\mathbb{S}_t = \{ S_{d_t}, S_t \}.\]
Action
\[\mathbb{A}_t = \{ f_t, \tau_t \}.\]
Reward
- The reward reflects two objectives:
  - minimizing the distance between the desired position and the actual position;
  - maintaining the stability of the quadrotor.
- Simply using a linear combination of \(\Delta P_t, \Delta V_t, \Delta R_t\) as the reward function makes convergence difficult in the learning process.
- Using a geometrically discounted reward prevents the accumulated reward from becoming infinite and makes the model more tractable.
- Therefore, the reward at each time step is defined following a standard normal distribution of the tracking differences:
  - the largest reward is obtained when the total difference between the desired trajectory and the actual trajectory at time \(t\) reaches \(0\);
  - the reward approaches \(0\) as the total differences increase.
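A minimal sketch of such a Gaussian-shaped per-step reward and its geometric discounting; the error weighting and the width `sigma` are assumptions, since the paper only specifies the standard-normal shape of the reward:

```python
import numpy as np

def step_reward(dp, dv, dR, sigma=1.0):
    """Gaussian-shaped per-step reward: maximal (1.0) when the combined
    tracking error is zero, decaying toward 0 as the error grows.
    The simple sum of error norms and sigma=1.0 are illustrative choices."""
    err = np.linalg.norm(dp) + np.linalg.norm(dv) + np.linalg.norm(dR)
    return np.exp(-0.5 * (err / sigma) ** 2)

def discounted_return(rewards, gamma=0.99):
    """Total geometrically discounted reward: sum_t gamma^t * R_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```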
Then, the total discounted reward is:
\[\mathbb{R} = \sum_{t=0}^\infty \gamma^t R_t.\]
DNN
- The actor model is pre-trained using labeled pair data \((\mathbb{S}_t, \mathbb{A}_t)\).
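As an illustration of this supervised pre-training step, the sketch below fits a linear stand-in for the actor on synthetic \((\mathbb{S}_t, \mathbb{A}_t)\) pairs. The 36-D input (desired plus actual 18-D states), the 4-D output (thrust and three torques), and the least-squares fit are all assumptions standing in for the paper's DNN and flight data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labeled pairs (S_t, A_t): 36-D concatenated state mapped to a
# 4-D action; a linear policy fit by least squares stands in for the DNN.
S = rng.normal(size=(500, 36))
true_W = rng.normal(size=(36, 4))          # "ground truth" only for the demo
A = S @ true_W + 0.01 * rng.normal(size=(500, 4))

W, *_ = np.linalg.lstsq(S, A, rcond=None)  # supervised "pre-training"
policy = lambda s: s @ W                   # actor: state -> action
```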
Experiment
Hardware Configuration
FPGA: all massively parallel computations are implemented on it.
- DSP blocks are used as multipliers.
- Implements the DNN controller (actor model).
- Outputs control commands.
ARM: low-level control.
- Takes control commands from the FPGA as input.
- Calculates actuations for each degree of freedom.
- Sends actuations to the UAV and receives flight states and sensor data from the UAV via UART.
PID Implementation (baseline)
- Three PID controllers are used, one for each component of velocity.
  - The error in each component is calculated between the desired velocity and the achieved velocity.
- The achieved position \(p_t\) is calculated through environment simulation and used as feedback for the current velocity.
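A minimal sketch of one such per-component velocity PID loop; the gains and time step are illustrative, not values from the paper:

```python
class PID:
    """Textbook PID controller for a single velocity component."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, desired, achieved):
        """One control update from the current velocity error."""
        err = desired - achieved
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# One controller per velocity component, as in the baseline (gains assumed).
pids = [PID(kp=1.0, ki=0.1, kd=0.05, dt=0.01) for _ in range(3)]
```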
Evaluation
- L1-norm of position tracking error.
- L1-norm of velocity tracking error.
- Time used to complete tracking.
- Power consumption.
- Robustness: tracking in a noisy environment.
  - Different levels of random Gaussian noise are added to the position (such noise can model the effect of wind gusts).
References
- Y. Li et al., "Fast and Accurate Trajectory Tracking for Unmanned Aerial Vehicles Based on Deep Reinforcement Learning," in Proc. IEEE 25th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2019.