[PPO × Family] Lesson 6: Coordinating Multiple Agents

Sorry for the delay! The links for Lesson 6 are now compiled below.
00:43 For examples of multi-agent systems in biological and robotic cooperation, see the following links:
- https://www.earth.com/news/fish-swim-in-schools-to-save-energy/
- https://twitter.com/Interior/status/1519073932992778244
- https://www.newscientist.com/article/2357548-us-military-plan-to-create-huge-autonomous-drone-swarms-sparks-concern/
- https://www.nist.gov/programs-projects/performance-collaborative-robot-systems
01:05 For an explanation of and tutorial on SMAC (a cooperative multi-agent reinforcement learning environment built on StarCraft II), see its GitHub repository and the DI-engine documentation:
- https://github.com/oxwhirl/smac
- https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html
02:18 For a more detailed look at the research areas involved in multi-agent cooperation, see:
https://www.karltuyls.net/wp-content/uploads/2020/06/MA-DM-ICML-ACAI.pdf
02:55 For a more detailed explanation of the general setting of multi-agent decision making, see:
https://www.karltuyls.net/wp-content/uploads/2020/06/MA-DM-ICML-ACAI.pdf
04:03 For an introduction to Dec-POMDP, see:
http://rbr.cs.umass.edu/camato/decpomdp/overview.html
08:06 For details on the VDN and QMIX value-decomposition algorithms, see the papers:
- VDN: https://arxiv.org/pdf/1706.05296.pdf
- QMIX: https://arxiv.org/pdf/1803.11485.pdf
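As a rough illustration of what value decomposition means here, the sketch below mixes per-agent Q-values into a joint value in two ways: a plain sum (the VDN idea) and a weighted sum with non-negative weights (a heavily simplified stand-in for QMIX's monotonic mixing network, which in the paper is produced by a state-conditioned hypernetwork). The Q-tables, weights, and function names are all made up for illustration and are not taken from either paper's code or from DI-engine.

```python
from itertools import product


def vdn_mix(agent_qs):
    """VDN idea: the joint value is the plain sum of per-agent values."""
    return sum(agent_qs)


def qmix_style_mix(agent_qs, weights, bias):
    """Monotonic mixing: abs() forces non-negative weights, so the joint
    value never decreases when any single agent's value increases.
    (Assumption: a single linear layer with fixed weights; the real QMIX
    mixer is a small network whose weights depend on the global state.)"""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


def greedy_joint_action(q_tables, mixer):
    """Centralized argmax: enumerate every joint action and score it."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(
        joint_actions,
        key=lambda acts: mixer([q[a] for q, a in zip(q_tables, acts)]),
    )


def greedy_local_actions(q_tables):
    """Decentralized argmax: each agent looks only at its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


# Hypothetical toy numbers: two agents, three actions each.
Q_TABLES = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.7]]
WEIGHTS = [-1.5, 0.4]  # sign is discarded by abs() inside the mixer
BIAS = 0.2

local = greedy_local_actions(Q_TABLES)
vdn_joint = greedy_joint_action(Q_TABLES, vdn_mix)
qmix_joint = greedy_joint_action(
    Q_TABLES, lambda qs: qmix_style_mix(qs, WEIGHTS, BIAS)
)
print(local, vdn_joint, qmix_joint)
```

With either mixer, the joint greedy action found by brute-force enumeration matches the actions each agent would pick on its own. That consistency is what lets these methods train centrally and act in a decentralized way.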
08:33 For more on the value-decomposition family of methods, see the supplementary material:
https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_value_dec.pdf
09:41 For details on the cases where value-decomposition methods can fail, see the papers:
- QTRAN: https://arxiv.org/pdf/1905.05408.pdf
- QPLEX: https://arxiv.org/pdf/2008.01062.pdf
10:28 For more on the challenges facing MAPG methods, see the papers:
- https://arxiv.org/pdf/2108.08612.pdf
- https://arxiv.org/pdf/2008.01062.pdf
12:29 For a detailed introduction to MAPPO, see the paper:
https://arxiv.org/pdf/2103.01955.pdf
14:09 The complete code example with one-click switching between IPPO and MAPPO is available in the PPOxFamily GitHub repository:
- Full example: https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_application_demo.py
- Reference tutorial: https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/marl_zh.html#di-engine-marl
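For readers who want the gist before opening the demo: the usual conceptual difference between IPPO and MAPPO is what the critic sees. The sketch below is only an illustration under that assumption. It is not the code from `chapter6_application_demo.py`, and `critic_inputs` and its `mode` argument are hypothetical names. It also assumes the simplest form of global information, namely the concatenation of all agents' local observations.

```python
def critic_inputs(local_obs, mode):
    """Return one critic input vector per agent.

    local_obs: list of per-agent observation vectors (lists of floats).
    mode: "ippo"  -> each critic sees only its own agent's observation;
          "mappo" -> each critic sees shared global information
                     (assumed here: all observations concatenated).
    """
    if mode == "ippo":
        return [list(obs) for obs in local_obs]
    if mode == "mappo":
        joint = [x for obs in local_obs for x in obs]
        return [list(joint) for _ in local_obs]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical observations for two agents.
obs = [[1.0, 2.0], [3.0, 4.0]]
ippo_in = critic_inputs(obs, "ippo")
mappo_in = critic_inputs(obs, "mappo")
print(ippo_in)
print(mappo_in)
```

In both modes the actor still acts on local observations only; the switch changes just the value function's input, which is why it can be exposed as a single option.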
14:21 For a detailed explanation and examples of the MPE environment, see:
https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html
14:53 The complete video demo of the PPO + MPE practice is available in the PPOxFamily GitHub repository:
https://github.com/opendilab/PPOxFamily/issues/62
16:43 A detailed analysis of the characteristics of TRPO/PPO can be found in the Lesson 1 manuscript:
https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_manuscript.pdf
18:18 For the detailed theory behind HATRPO/HAPPO, see the paper:
https://arxiv.org/pdf/2109.11251.pdf
21:02 If you are interested in the HATRPO/HAPPO training procedure, see the supplementary material:
https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_happo.pdf
21:33 For more on the PPO + MA MuJoCo practice and a MuJoCo tutorial, see the following GitHub repository and the DI-engine documentation:
- https://github.com/schroederdewitt/multiagent_mujoco
- https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html
22:31 For a detailed introduction to Bi-DexHands, see the GitHub link:
https://github.com/PKU-MARL/DexterousHands
22:58 For a detailed analysis and explanation of the MAT architecture, see the paper:
https://proceedings.neurips.cc/paper_files/paper/2022/file/69413f87e5a34897cd010ca698097d0a-Paper-Conference.pdf
24:05 If you are interested in parameter sharing (Param. Sharing), see the paper:
https://arxiv.org/pdf/2102.07475.pdf
25:07 For a detailed introduction to masks (Various Mask), see the paper:
https://arxiv.org/pdf/2006.14171.pdf
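One common kind of mask is the action mask, which removes unavailable actions from a policy's output. A minimal sketch, assuming the standard approach of replacing the logits of invalid actions with negative infinity before the softmax; `masked_softmax` is a hypothetical helper written for this note, not a function from DI-engine or the linked paper.

```python
import math


def masked_softmax(logits, valid):
    """Softmax over logits where invalid actions get probability 0.

    logits: list of floats, one per action.
    valid:  list of bools, True where the action is currently allowed.
    """
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Invalid actions are pushed to -inf so exp() maps them to exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: the second of three actions is unavailable.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
print(probs)
```

The masked action receives zero probability, and the remaining probabilities are renormalized over the valid actions only.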
26:28 For more on how ACE works and how to use it, see the supplementary material provided:
https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_supp_ace.pdf