Applications

  • DL: No action
    • Classification
    • Regression
  • DRL: Discrete or Continuous actions
    • Robotic Perception and Control

Implementation

Differences between the DL and DRL pipelines

Process               | DL                                          | DRL
Input                 | Dataset (states \( x \) and labels \( y \)) | Trajectories \( \{(s,a,r,s',d), \cdots\} \)
NN                    | CNN (\( w,b \))                             | NN (\( \phi,\theta \))
Output                | Estimated \( \hat{y} \)                     | Q-function value \( Q_\phi \), policy \( \pi_\theta \)
Loss function         | Cross-entropy, MSE                          | Rewards, mean squared Bellman error (MSBE) (\( Q_\phi - Q^* \))
Optimizer / algorithm | SGD, Adam                                   | DQN, DDPG, TRPO, PPO
Evaluation            | Accuracy computed on a test dataset         | Gym, MuJoCo, TensorBoard
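
To make the loss-function contrast concrete, here is a minimal sketch in plain Python. It is illustrative only: the helper names `mse_loss` and `msbe_loss`, the tabular Q-function `q`, the discount factor `gamma`, and the toy numbers are all assumptions, not part of the original material. The unknown \( Q^* \) is approximated by a bootstrapped target \( r + \gamma (1-d) \max_{a'} Q(s',a') \), as is done in DQN-style methods.

```python
# Supervised DL: mean squared error between predictions y_hat and labels y.
def mse_loss(y_hat, y):
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


# DRL: mean squared Bellman error over a batch of transitions (s, a, r, s', d).
# q is a hypothetical tabular Q-function: q[state][action] -> value.
# gamma is an assumed discount factor.
def msbe_loss(q, batch, gamma=0.99):
    total = 0.0
    for s, a, r, s_next, d in batch:
        # Bootstrapped target standing in for Q*; (1 - d) drops the
        # future term when the episode has ended (d == 1).
        target = r + gamma * (1 - d) * max(q[s_next])
        total += (q[s][a] - target) ** 2
    return total / len(batch)


# Toy supervised data: two predictions against two labels.
supervised = mse_loss([0.5, 1.0], [1.0, 1.0])

# Toy RL data: 2 states x 2 actions, and two transitions (s, a, r, s', d).
q = [[0.0, 1.0], [2.0, 0.0]]
batch = [(0, 1, 1.0, 1, 0), (1, 0, 2.0, 0, 1)]
bellman = msbe_loss(q, batch, gamma=0.5)

print(supervised, bellman)
```

The structural difference is in where the target comes from: in the DL case the label \( y \) is given by the dataset, while in the DRL case the target is built from the reward and the network's own estimate at the next state.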