In Depth: How Does AlphaZero, the Strongest AI Engine, Learn Chess?

Reposted from the WeChat public account 國際象棋動態. Subscriptions welcome!
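How exactly does AlphaZero learn chess? How does it decide to play a particular move? How does it treat concepts such as king safety or piece coordination? How does it learn openings, and how does its understanding of them differ from existing human opening theory? This article, based on a Chess.com article and the latest paper from the DeepMind team, looks at how AlphaZero, the strongest AI engine, learns chess.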
How Does AlphaZero Learn Chess?
In some respects, AlphaZero's learning process resembles the way humans learn. According to the latest paper from the DeepMind team (which includes observations from the 14th world champion, Vladimir Kramnik), many human-interpretable ideas and concepts can be found in AlphaZero's neural network, even though it has never studied a single human game.
How does AlphaZero learn chess? Why does it make certain moves? What values does it give to concepts such as king safety or mobility? How does it learn openings, and how is that different from how humans developed opening theory?
Questions like these are being discussed in a fascinating new paper by DeepMind, titled Acquisition of Chess Knowledge in AlphaZero. It was written by Thomas McGrath, Andrei Kapishnikov, Nenad Tomasev, Adam Pearce, Demis Hassabis, Been Kim, and Ulrich Paquet together with Kramnik. It is the second cooperation between DeepMind and Kramnik, after their research from last year when they used AlphaZero to explore the design of different variants of the game of chess, with different sets of rules.
Encoding Human Conceptual Knowledge
In their latest paper, the researchers tried a method for encoding human conceptual knowledge, to determine the extent to which the AlphaZero network represents human chess concepts. Examples of such concepts are the bishop pair, material (im)balance, mobility, or king safety. These concepts have in common that they are pre-specified functions that encapsulate a particular piece of domain-specific knowledge.
Some of these concepts were taken from Stockfish 8's evaluation function, such as material, imbalance, mobility, king safety, threats, passed pawns, and space. Stockfish 8 uses these as sub-functions that give individual scores leading to a "total" evaluation that is exported as a continuous value, such as "0.25" (a slight advantage to White) or "-1.48" (a big advantage to Black). Note that more recent versions of Stockfish have adopted AlphaZero-like neural networks, but these were not used for this paper.
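As a rough illustration of how per-concept sub-scores can add up to one continuous evaluation, here is a minimal Python sketch. The sub-score numbers, the centipawn units, and the function names `total_evaluation` and `describe` are assumptions made for this example; they are not taken from Stockfish's source code.

```python
# Hypothetical per-concept scores in centipawns, from White's point of view.
# The concept names follow the article; the numbers are invented.
SUB_SCORES = {
    "material": 0,
    "imbalance": 5,
    "mobility": 30,
    "king_safety": -20,
    "threats": 10,
    "passed_pawns": 0,
    "space": 0,
}


def total_evaluation(sub_scores):
    """Add the per-concept scores and convert centipawns to pawn units."""
    return sum(sub_scores.values()) / 100


def describe(score):
    """Turn a signed evaluation into a plain-language verdict."""
    if score > 0:
        return "advantage to White"
    if score < 0:
        return "advantage to Black"
    return "equal"


score = total_evaluation(SUB_SCORES)
print(score, describe(score))
```

Keeping each concept as a separate term is what makes such an evaluation easy to inspect: every sub-score can be read off and compared on its own.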
The third type of concepts encapsulates more specific lower-level features, such as the existence of forks, pins, or contested files, as well as a range of features regarding pawn structure.
Having established this wide array of human concepts, the next step for the researchers was to try and find them within the AlphaZero network, for which they used a sparse linear regression model. After that, they started visualizing the human concept learning with what they call what-when-where plots: what concept is learned when in training time where in the network.
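The idea of probing a network with a sparse linear regression model can be sketched in plain Python. This is only an illustration under stated assumptions: the "activations" are synthetic random numbers, the "concept" is built from two of them by construction, and the probe is a standard L1-regularized (lasso) regression fitted by coordinate descent. The paper's actual data, layer choices, and fitting procedure are not reproduced here, and `lasso_probe` is a hypothetical helper name.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_probe(activations, concept, l1_penalty, sweeps=50):
    """Fit a sparse linear map from activations to a concept value.

    Minimizes (1 / 2n) * ||y - Xw||^2 + l1_penalty * ||w||_1
    by cyclic coordinate descent.
    """
    n = len(activations)
    d = len(activations[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, target in zip(activations, concept):
                others = sum(row[k] * weights[k] for k in range(d) if k != j)
                rho += row[j] * (target - others)
                norm += row[j] ** 2
            weights[j] = soft_threshold(rho / n, l1_penalty) / (norm / n)
    return weights


# Synthetic stand-in for network activations: 200 positions, 6 units.
rng = random.Random(0)
activations = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
# A made-up "concept" that depends only on units 0 and 3.
concept = [2.0 * row[0] - 1.0 * row[3] for row in activations]

weights = lasso_probe(activations, concept, l1_penalty=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected, [round(w, 2) for w in weights])
```

The L1 penalty is what makes the probe sparse: units that do not help predict the concept are driven to a weight of exactly zero, so the surviving weights point to where the concept is represented.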
According to the researchers, AlphaZero indeed develops representations that are closely related to a number of human concepts over the course of training, including high-level evaluation of the position, potential moves and consequences, and specific positional features.
One interesting result was about material imbalance. As was demonstrated in Matthew Sadler and Natasha Regan's award-winning book Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI (New In Chess, 2019), AlphaZero seems to view material imbalance differently from Stockfish 8. The paper gives empirical evidence that this is the case at the representational level: AlphaZero initially "follows" Stockfish 8's evaluation of material more and more during its training, but at some point, it turns away from it again.
Piece Value and Material
The next step for the researchers was to relate the human concepts to AlphaZero's value function. One of the first concepts they looked at was piece value, something a beginner will first learn when starting to play chess. The classical values are nine for a queen, five for a rook, three for both the bishop and knight, and one for a pawn. The left figure below (taken from the paper) shows the evolution of piece weights during AlphaZero's training, with piece values converging towards commonly-accepted values.
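The classical piece values quoted above are enough to compute a simple material balance. The following sketch assumes the position is given as the piece-placement field of a FEN string (uppercase for White, lowercase for Black); the function name `material_balance` is chosen for this example.

```python
# Classical piece values from the article; the king is not counted.
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}


def material_balance(fen_placement):
    """Return White's material minus Black's material in pawn units."""
    balance = 0
    for symbol in fen_placement:
        value = PIECE_VALUES.get(symbol.lower())
        if value is None:
            continue  # skip digits, rank separators, and kings
        balance += value if symbol.isupper() else -value
    return balance


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
BLACK_QUEEN_MISSING = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"

print(material_balance(START))
print(material_balance(BLACK_QUEEN_MISSING))
```

A count like this is the beginner's view of material; the weights AlphaZero learns are reported to converge toward similar values during training.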

Right: a comparison of the weights AlphaZero's neural network gives to the six concepts above over the course of training. Image: DeepMind.
The image on the right shows that during AlphaZero's training, material becomes more and more important in the early stages of learning chess (consistent with human learning), but it then reaches a plateau. At some point, more subtle concepts such as mobility and king safety become more important, while material actually decreases in importance.
AlphaZero's Training vs. the Progression of Human Chess Knowledge
Another part of the paper is dedicated to comparing AlphaZero's training to the progression of human knowledge over history. The researchers point out that there is a marked difference between AlphaZero’s progression of move preferences through its history of training steps, and what is known of the progression of human understanding of chess since the 15th century:
AlphaZero starts with a uniform opening book, allowing it to explore all options equally, and largely narrows down plausible options over time. Recorded human games over the last five centuries point to an opposite pattern: an initial overwhelming preference for 1.e4, with an expansion of plausible options over time.
The researchers compare the games AlphaZero is playing against itself with a large sample taken from the ChessBase Mega Database, starting with games from the year 1475 up till the 21st century.
Humans initially played 1.e4 almost exclusively but 1.d4 was slightly more popular in the early 20th century, soon followed by the increasing popularity of more flexible systems like 1.c4 and 1.Nf3. AlphaZero, on the other hand, tries out a wide array of opening moves in the early stage of its training before starting to value the "main" moves higher.
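One way to put a number on "narrowing" versus "expanding" opening preferences is the Shannon entropy of the first-move distribution. The sketch below is an illustration only, not the paper's method: the uniform distribution over the 20 legal first moves stands in for an untrained AlphaZero, and the narrowed distribution is invented for the example.

```python
import math


def entropy_bits(distribution):
    """Shannon entropy, in bits, of a move-probability distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# White has 20 legal first moves: 16 pawn moves and 4 knight moves.
# The labels are placeholders; only the probabilities matter here.
uniform_start = {f"move_{i}": 1 / 20 for i in range(20)}

# Hypothetical preferences after training has narrowed the options.
narrowed = {"e4": 0.45, "d4": 0.45, "Nf3": 0.05, "c4": 0.05}

print(round(entropy_bits(uniform_start), 2))
print(round(entropy_bits(narrowed), 2))
```

Under this measure, AlphaZero's training would show entropy falling from its maximum, while the human record described above would show it rising from near zero as alternatives to 1.e4 gained ground.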

The Berlin Defense of the Ruy Lopez
A more specific example provided is about the Berlin variation of the Ruy Lopez (the move 3...Nf6 after 1.e4 e5 2.Nf3 Nc6 3.Bb5), which only became popular at the top level in the early 21st century, after Kramnik successfully used it in his world championship match with GM Garry Kasparov in 2000. Before that, it was considered somewhat passive and slightly better for White, with the move 3...a6 being preferable.
The researchers write:
Looking back in time, it took a while for human chess opening theory to fully appreciate the benefits of Berlin defense and to establish effective ways of playing with Black in this position. On the other hand, AlphaZero develops a preference for this line of play quite rapidly, upon mastering the basic concepts of the game. This already highlights a notable difference in opening play evolution between humans and the machine.

Remarkably, when different versions of AlphaZero are trained from scratch, half of them strongly prefer 3...a6, while the other half strongly prefer 3...Nf6! This is interesting, as it means that there is no "unique" good chess player. The following table shows the preferences of four different AlphaZero neural networks:

In a similar vein, AlphaZero develops its own opening "theory" for a much wider array of openings over the course of its training. At some point, 1.d4 and 1.e4 are discovered to be good opening moves and are rapidly adopted. Similarly, AlphaZero's preferred continuation after 1.e4 e5 is determined in another short temporal window. The figure below illustrates how both 2.d4 and 2.Nf3 are quickly learned as reasonable White moves, but 2.d4 is then dropped almost as quickly in favor of 2.Nf3 as a standard reply.

Kramnik's Qualitative Assessment
Kramnik's contribution to the paper is a qualitative assessment, as an attempt to identify themes and differences in the style of play of AlphaZero at different stages of its training. The 14th world champion was provided sample games from four different stages to look at.
According to Kramnik, in the early training stage, AlphaZero has "a crude understanding of material value and fails to accurately assess material in complex positions. This leads to potentially undesirable exchange sequences, and ultimately losing games on material." In the second stage, AlphaZero seemed to have "a solid grasp on material value, thereby being able to capitalize on the material assessment weakness" of the early version.
In the third stage, Kramnik feels that AlphaZero has a better understanding of king safety in imbalanced positions. This manifests in the second version "potentially underestimating the attacks and long-term material sacrifices of the third version, as well as the second version overestimating its own attacks, resulting in losing positions."
In the fourth stage of training, AlphaZero has a "much deeper understanding" of which attacks will succeed and which will fail. Kramnik notices that it sometimes accepts sacrifices played by the "third version," proceeds to defend well, keeps the material advantage, and ultimately converts it to a win.
Another point Kramnik makes, which feels similar to how humans learn chess, is that tactical skills appear to precede positional skills as AlphaZero learns. By generating self-play games over separate opening sets (e.g. the Berlin or the Queen's Gambit Declined in the "positional" set and the Najdorf and King's Indian in the "tactical" set), the researchers manage to provide circumstantial evidence but note that further work is needed to understand the order in which skills are acquired.

Implications Beyond the Chess World
For a long time, it was believed that machine-learning systems learn uninterpretable representations that have little in common with human understanding of the domain they are trained on. In other words, how and what AI teaches itself is mostly gibberish to humans.
With their latest paper, the researchers have provided strong evidence for the existence of human-understandable concepts in an AI system that wasn't exposed to human-generated data. AlphaZero's network shows the use of human concepts, even though AlphaZero has never seen a human game of chess.
This might have implications outside the chess world. The researchers conclude:
The fact that human concepts can be located even in a superhuman system trained by self-play broadens the range of systems in which we should expect to find human-understandable concepts. We believe that the ability to find human-understandable concepts in the AZ network indicates that a closer examination will reveal more.
Co-author Nenad Tomasev commented to Chess.com that for him personally, he was really curious to consider if there is such a thing as a "natural" progression of chess theory:
Even in the human context—if we were to 'restart' history, go back in time— would the theory of chess have developed in the same way? There were a number of prominent schools of thought in terms of the overall understanding of chess principles and middlegame positions: the importance of dynamism vs. structure, material vs. sacrificial attacks, material imbalance, the importance of space vs. the hypermodern school that invites overextension in order to counterattack, etc. This also informed the openings that were played. Looking at this progression, what remains unclear is whether it would have happened the same way again. Maybe some pieces of chess knowledge and some perspectives are simply easier and more natural for the human mind to grasp and formulate? Maybe the process of refining them and expanding them has a linear trajectory, or not? We can't really restart history, so we can only ever guess what the answer might be.
However, when it comes to AlphaZero, we can retrain it many times—and also compare the findings to what we have previously seen in human play. We can therefore use AlphaZero as a Petri dish for this question, as we look at how it acquires knowledge about the game. As it turns out, there are both similarities and dissimilarities in how it builds its understanding of the game compared to human history. Also, while there is some level of stability (results being in agreement across different training runs), it is by no means absolute (sometimes the training progression looks a little bit different, and different opening lines end up being preferred).
Now, this is by no means a definitive answer to what is, to me personally, a fascinating question. There is still plenty to think about here. Yet, we hope that our results provide an interesting perspective and make it possible for us to start thinking a bit deeper about how we learn, grow, improve—the very nature of intelligence and how it goes all the way from a blank slate to what is a deep understanding of a very complex domain like chess.
Kramnik's View
"There are two major things which we can try to find out with this work. One is: how does AlphaZero learn chess, how does it improve? That is actually quite important. If we manage one day to understand it fully, then maybe we can interpret it into the human learning process.
Secondly, I believe it is quite fascinating to discover that there are certain patterns that AlphaZero finds meaningful, which actually make little sense for humans. That is my impression. That actually is a subject for further research, in fact, I was thinking that it might easily be that we are missing some very important patterns in chess, because after all, AlphaZero is so strong that if it uses those patterns, I suspect they make sense. That is actually also a very interesting and fascinating subject to understand, if maybe our way of learning chess, of improving in chess, is actually quite limited. We can expand it a bit with the help of AlphaZero, of understanding how it sees chess."