Reinforcement Learning_Policy Gradient
The following notes cover Lecture 7 of David Silver's lecture series [1] and Chapter 9 of Shiyu Zhao's Mathematical Foundations of Reinforcement Learning [2].
This part originally included a lot of frustrating mathematical content. Since I do not yet understand it well, that content is set aside for later discussion.





References
[1] https://www.davidsilver.uk/teaching/
[2] https://github.com/MathFoundationRL/Book-Mathmatical-Foundation-of-Reinforcement-Learning