

Reinforcement Learning_Code_Temporal Difference Learning_Frozen

2023-04-02 22:56 by 別叫我小紅

Here is some terrible code that has lots of redundancy, is not well object-oriented, and produces poor results. I hope to draw a lesson from it in the future.


RESULTS:

Visualizations of (i) action value tables and optimal actions, (ii) changes in steps and rewards with episodes, and (iii) animation results are shown below, respectively.

(It should be noted that, due to some mistakes, the animation results may differ from those indicated by the action value tables.)

1. Q-Learning (bootstrap, off-policy)
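For reference, the core of Q-Learning is the off-policy temporal-difference update sketched below. This is a minimal sketch rather than the exact code in QLearningLeaner.py; the table layout and the parameter names learning_rate and gamma are assumptions.

import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, learning_rate, gamma):
    # Off-policy TD target: bootstrap from the greedy (max) action in the next state,
    # regardless of which action the explorer will actually take next.
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += learning_rate * (td_target - q_table[state, action])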

(1) With Epsilon-greedy Explorer
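As a rough illustration of what the epsilon-greedy explorer does (the actual class lives in EpsilonGreedyExplorer.py; the names below are assumptions):

import numpy as np

def epsilon_greedy_action(q_table, state, epsilon, rng):
    # With probability epsilon take a uniformly random action (explore),
    # otherwise take the current greedy action (exploit).
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))

# rng = np.random.default_rng(seed) would typically be created once by the agent.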

Fig. 1.(1).1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.

Fig. 1.(1).2. Changes in steps and rewards with episodes.

Fig. 1.(1).3. Animation result with map_size = 4.

Fig. 1.(1).4. Animation result with map_size = 7.
Fig. 1.(1).5. Animation result with map_size = 9.
Fig. 1.(1).6. Animation result with map_size = 11.

(2) With Random Explorer



Fig. 1.(2).1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 1.(2).2. Changes in steps and rewards with episodes.

From the steps results in Fig. 1.(2).2, we can see that the average number of steps barely decreases with episodes. This may be caused by the random explorer, which simply chooses a random direction when asked to take an action and ignores any improvements already made to the target policy.
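A uniform explorer in this sense can be as simple as the sketch below (names assumed; not necessarily the exact UniformExplorer.py):

def uniform_random_action(n_actions, rng):
    # Pick one of the FrozenLake directions uniformly at random,
    # ignoring the learned action values completely.
    return int(rng.integers(n_actions))

Because Q-Learning is off-policy, the action value table can still improve toward the greedy target policy even though this behaviour policy never changes, which is consistent with the flat steps curve.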


Fig. 1.(2).3. Animation result with map_size = 11.

2. Sarsa (bootstrap, on-policy)
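For comparison with Q-Learning, Sarsa's on-policy TD update bootstraps from the action the behaviour policy actually chooses in the next state. A minimal sketch (parameter names assumed; not the exact SarsaAgent.py):

def sarsa_update(q_table, state, action, reward, next_state, next_action, learning_rate, gamma):
    # On-policy TD target: bootstrap from the action actually taken in the
    # next state (hence S, A, R, S', A').
    td_target = reward + gamma * q_table[next_state, next_action]
    q_table[state, action] += learning_rate * (td_target - q_table[state, action])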


Fig. 2.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 2.2. Changes in steps and rewards with episodes.


Fig. 2.3. Animation result with map_size = 11.

3. Sarsa(λ) (bootstrap, on-policy)

Based on Sarsa, Sarsa(λ) introduces the backward view of temporal-difference learning and maintains an eligibility trace.
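A minimal sketch of one Sarsa(λ) step with an accumulating eligibility trace (names and trace variant are assumptions, not necessarily what SarsaLambdaAgent.py does):

def sarsa_lambda_update(q_table, trace, state, action, reward, next_state, next_action, learning_rate, gamma, lam):
    # trace is an array the same shape as q_table, reset to zeros at the start of each episode.
    # Backward view: compute the one-step TD error once, then spread it over
    # all recently visited state-action pairs according to their eligibility.
    td_error = reward + gamma * q_table[next_state, next_action] - q_table[state, action]
    trace[state, action] += 1.0            # accumulating trace for the current pair
    q_table += learning_rate * td_error * trace
    trace *= gamma * lam                   # decay every trace toward zero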

Fig. 3.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.

Fig. 3.2. Changes in steps and rewards with episodes.


Fig. 3.3. Animation result with map_size = 11.

4. Monte Carlo (not bootstrap, on-policy)
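Unlike the TD methods above, Monte Carlo does not bootstrap: it waits until the episode terminates and updates each visited state-action pair toward the complete return. A first-visit sketch (assumed names; not necessarily the exact MonteCarolAgent.py):

def monte_carlo_update(q_table, counts, episode, gamma):
    # episode: list of (state, action, reward) tuples from one finished episode.
    # counts has the same shape as q_table and holds visit counts for averaging.
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)
    g = 0.0
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        g = gamma * g + r                      # return G_t accumulated backwards
        if first_visit[(s, a)] == t:           # update only on the first visit
            counts[s, a] += 1
            q_table[s, a] += (g - q_table[s, a]) / counts[s, a]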




Fig. 4.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 4.2. Changes in steps and rewards with episodes.



Fig. 4.3. Animation result with map_size = 11.

CODES:

FrozenLake_bench.py

Params.py


QLearningLeaner.py


EpsilonGreedyExplorer.py


UniformExplorer.py


SarsaAgent.py


SarsaLambdaAgent.py


MonteCarolAgent.py


Visualization.py


The above code is based on the Gymnasium documentation tutorial "Frozenlake benchmark" [1] and extends it with solutions for the Sarsa, Sarsa(λ), and Monte Carlo algorithms.
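For context, the benchmark environment in the tutorial [1] is created roughly as below. The generate_random_map helper and the FrozenLake-v1 keyword arguments come from the Gymnasium toy-text API; is_slippery, p, and the seed are illustrative values, not necessarily the settings used for the figures above.

import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

# Random frozen lake of one of the sizes used above (4, 7, 9 or 11).
env = gym.make(
    "FrozenLake-v1",
    desc=generate_random_map(size=11, p=0.9),  # p: probability that a tile is frozen
    is_slippery=False,
    render_mode="rgb_array",
)
state, info = env.reset(seed=42)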


Reference

[1] Gymnasium Documentation, "Frozenlake benchmark" tutorial: https://gymnasium.farama.org/tutorials/training_agents/FrozenLake_tuto/
