Reinforcement Learning and Optimal Control
Link: https://pan.baidu.com/s/1wgyQz-jpgF32a-lkQzuV7A?pwd=8xc7
Extraction code: 8xc7

Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, a guest professor at the Center for Complex and Networked Systems at Tsinghua University, and an internationally renowned author in electrical engineering and computer science, with more than a dozen best-selling textbooks and monographs to his name, including Nonlinear Programming, Network Optimization, and Convex Optimization. The purpose of this book is to consider large and challenging multistage decision problems, which can in principle be solved by dynamic programming and optimal control, but whose exact solution is computationally intractable. The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known as reinforcement learning, and also go by names such as approximate dynamic programming and neuro-dynamic programming.
The subject of the book arose from the interplay of ideas from optimal control and artificial intelligence. One of its aims is to explore the common boundary between these two fields and to build a bridge that is accessible to professionals with a background in either field.
About the Book
The purpose of Reinforcement Learning and Optimal Control (English Edition) is to consider large and challenging multistage decision problems, which can in principle be solved by dynamic programming and optimal control, but whose exact solution is computationally intractable. The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known as reinforcement learning, and also go by names such as approximate dynamic programming and neuro-dynamic programming. The subject of the book arose from the interplay of ideas from optimal control and artificial intelligence. One of its aims is to explore the common boundary between these two fields and to build a bridge that is accessible to readers with a background in either field.
About the Author
Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, and a guest professor at the Center for Complex and Networked Systems at Tsinghua University. He is an internationally renowned author in electrical engineering and computer science, with more than a dozen best-selling textbooks and monographs to his name, including Nonlinear Programming, Network Optimization, and Convex Optimization.
Contents
1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming . . . . . p. 2
1.1.1. Deterministic Problems . . . . . p. 2
1.1.2. The Dynamic Programming Algorithm . . . . . p. 7
1.1.3. Approximation in Value Space . . . . . p. 12
1.2. Stochastic Dynamic Programming . . . . . p. 14
1.3. Examples, Variations, and Simplifications . . . . . p. 18
1.3.1. Deterministic Shortest Path Problems . . . . . p. 19
1.3.2. Discrete Deterministic Optimization . . . . . p. 21
1.3.3. Problems with a Termination State . . . . . p. 25
1.3.4. Forecasts . . . . . p. 26
1.3.5. Problems with Uncontrollable State Components . . . . . p. 29
1.3.6. Partial State Information and Belief States . . . . . p. 34
1.3.7. Linear Quadratic Optimal Control . . . . . p. 38
1.3.8. Systems with Unknown Parameters - Adaptive Control . . . . . p. 40
1.4. Reinforcement Learning and Optimal Control - Some Terminology . . . . . p. 43
1.5. Notes and Sources . . . . . p. 45
2. Approximation in Value Space
2.1. Approximation Approaches in Reinforcement Learning . . . . . p. 50
2.1.1. General Issues of Approximation in Value Space . . . . . p. 54
2.1.2. Off-Line and On-Line Methods . . . . . p. 56
2.1.3. Model-Based Simplification of the Lookahead Minimization . . . . . p. 57
2.1.4. Model-Free Off-Line Q-Factor Approximation . . . . . p. 58
2.1.5. Approximation in Policy Space on Top of Approximation in Value Space . . . . . p. 61
2.1.6. When is Approximation in Value Space Effective? . . . . . p. 62
2.2. Multistep Lookahead . . . . . p. 64
2.2.1. Multistep Lookahead and Rolling Horizon . . . . . p. 65
2.2.2. Multistep Lookahead and Deterministic Problems . . . . . p. 67
2.3. Problem Approximation . . . . . p. 69
2.3.1. Enforced Decomposition . . . . . p. 69
2.3.2. Probabilistic Approximation - Certainty Equivalent Control . . . . . p. 76
2.4. Rollout and the Policy Improvement Principle . . . . . p. 83
2.4.1. On-Line Rollout for Deterministic Discrete Optimization . . . . . p. 84
2.4.2. Stochastic Rollout and Monte Carlo Tree Search . . . . . p. 95
2.4.3. Rollout with an Expert . . . . . p. 104
2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics . . . . . p. 106
2.5.1. Model Predictive Control . . . . . p. 108
2.5.2. Target Tubes and the Constrained Controllability Condition . . . . . p. 115
2.5.3. Variants of Model Predictive Control . . . . . p. 118
2.6. Notes and Sources . . . . . p. 120
3. Parametric Approximation
3.1. Approximation Architectures . . . . . p. 126
3.1.1. Linear and Nonlinear Feature-Based Architectures . . . . . p. 126
3.1.2. Training of Linear and Nonlinear Architectures . . . . . p. 134
3.1.3. Incremental Gradient and Newton Methods . . . . . p. 135
3.2. Neural Networks . . . . . p. 149
3.2.1. Training of Neural Networks . . . . . p. 153
3.2.2. Multilayer and Deep Neural Networks . . . . . p. 157
3.3. Sequential Dynamic Programming Approximation . . . . . p. 161
3.4. Q-Factor Parametric Approximation . . . . . p. 162
3.5. Parametric Approximation in Policy Space by Classification . . . . . p. 165
3.6. Notes and Sources . . . . . p. 171
4. Infinite Horizon Dynamic Programming
4.1. An Overview of Infinite Horizon Problems . . . . . p. 174
4.2. Stochastic Shortest Path Problems . . . . . p. 177
4.3. Discounted Problems . . . . . p. 187
4.4. Semi-Markov Discounted Problems . . . . . p. 192
4.5. Asynchronous Distributed Value Iteration . . . . . p. 197
4.6. Policy Iteration . . . . . p. 200
4.6.1. Exact Policy Iteration . . . . . p. 200
4.6.2. Optimistic and Multistep Lookahead Policy Iteration . . . . . p. 205
4.6.3. Policy Iteration for Q-factors . . . . . p. 208
4.7. Notes and Sources . . . . . p. 209
4.8. Appendix: Mathematical Analysis . . . . . p. 211
4.8.1. Proofs for Stochastic Shortest Path Problems . . . . . p. 212
4.8.2. Proofs for Discounted Problems . . . . . p. 217
4.8.3. Convergence of Exact and Optimistic Policy Iteration . . . . . p. 218
5. Infinite Horizon Reinforcement Learning
5.1. Approximation in Value Space - Performance Bounds . . . . . p. 222
5.1.1. Limited Lookahead . . . . . p. 224
5.1.2. Rollout and Approximate Policy Improvement . . . . . p. 227
5.1.3. Approximate Policy Iteration . . . . . p. 232
5.2. Fitted Value Iteration . . . . . p. 235
5.3. Simulation-Based Policy Iteration with Parametric Approximation . . . . . p. 239
5.3.1. Self-Learning and Actor-Critic Methods . . . . . p. 239
5.3.2. Model-Based Variant of a Critic-Only Method . . . . . p. 241
5.3.3. Model-Free Variant of a Critic-Only Method . . . . . p. 243
5.3.4. Implementation Issues of Parametric Policy Iteration . . . . . p. 246
5.3.5. Convergence Issues of Parametric Policy Iteration - Oscillations . . . . . p. 249
5.4. Q-Learning . . . . . p. 253
5.4.1. Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN . . . . . p. 255
5.5. Additional Methods - Temporal Differences . . . . . p. 256
5.6. Exact and Approximate Linear Programming . . . . . p. 267
5.7. Approximation in Policy Space . . . . . p. 270
5.7.1. Training by Cost Optimization - Policy Gradient, Cross-Entropy, and Random Search Methods . . . . . p. 276
5.7.2. Expert-Based Supervised Learning . . . . . p. 286
5.7.3. Approximate Policy Iteration, Rollout, and Approximation in Policy Space . . . . . p. 288
5.8. Notes and Sources . . . . . p. 293
5.9. Appendix: Mathematical Analysis . . . . . p. 298
5.9.1. Performance Bounds for Multistep Lookahead . . . . . p. 299
5.9.2. Performance Bounds for Rollout . . . . . p. 301
5.9.3. Performance Bounds for Approximate Policy Iteration . . . . . p. 304
6. Aggregation
6.1. Aggregation with Representative States . . . . . p. 308
6.1.1. Continuous State and Control Space Discretization . . . . . p. 314
6.1.2. Continuous State Space - POMDP Discretization . . . . . p. 315
6.2. Aggregation with Representative Features . . . . . p. 317
6.2.1. Hard Aggregation and Error Bounds . . . . . p. 320
6.2.2. Aggregation Using Features . . . . . p. 322
6.3. Methods for Solving the Aggregate Problem . . . . . p. 328
6.3.1. Simulation-Based Policy Iteration . . . . . p. 328
6.3.2. Simulation-Based Value Iteration . . . . . p. 331
6.4. Feature-Based Aggregation with a Neural Network . . . . . p. 332
6.5. Biased Aggregation . . . . . p. 334
6.6. Notes and Sources . . . . . p. 337
6.7. Appendix: Mathematical Analysis . . . . . p. 340
References . . . . . p. 345
Index . . . . . p. 369
Preface
Turning to the succor of modern computing machines, let us
renounce all analytic tools.
Richard Bellman [Bel57]
From a teleological point of view the particular numerical solution
of any particular set of equations is of far less importance than
the understanding of the nature of the solution.
Richard Bellman [Bel57]
In this book we consider large and challenging multistage decision problems,
which can be solved in principle by dynamic programming (DP for short),
but their exact solution is computationally intractable. We discuss solution
methods that rely on approximations to produce suboptimal policies with
adequate performance. These methods are collectively known by several
essentially equivalent names: reinforcement learning, approximate dynamic
programming, and neuro-dynamic programming. We will use primarily the
most popular name: reinforcement learning.
Our subject has benefited greatly from the interplay of ideas from
optimal control and from artificial intelligence. One of the aims of the
book is to explore the common boundary between these two fields and to
form a bridge that is accessible by workers with background in either field.
Another aim is to organize coherently the broad mosaic of methods that
have proved successful in practice while having a solid theoretical and/or
logical foundation. This may help researchers and practitioners to find
their way through the maze of competing ideas that constitute the current
state of the art.
There are two general approaches for DP-based suboptimal control.
The first is approximation in value space, where we approximate in some
way the optimal cost-to-go function with some other function. The major
alternative to approximation in value space is approximation in policy space.
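
For readers coming from the programming side, the one-step lookahead idea behind approximation in value space can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book; the names one_step_lookahead, step_cost, next_state, and J_approx are hypothetical placeholders for a stage-cost model, deterministic system dynamics, and an approximate cost-to-go function. The toy usage at the bottom assumes a one-dimensional state with a goal at x = 5 and a unit cost per stage.

def one_step_lookahead(state, controls, step_cost, next_state, J_approx):
    """Pick the control minimizing stage cost plus approximate cost-to-go."""
    best_u, best_q = None, float("inf")
    for u in controls:
        # Q-factor of control u: immediate cost plus approximate future cost
        q = step_cost(state, u) + J_approx(next_state(state, u))
        if q < best_q:
            best_u, best_q = u, q
    return best_u

if __name__ == "__main__":
    controls = [-1, 0, 1]                 # admissible moves (toy example)
    step_cost = lambda x, u: 1.0          # unit cost per stage
    next_state = lambda x, u: x + u       # deterministic dynamics
    J_approx = lambda x: abs(x - 5)       # rough distance-to-goal heuristic
    print(one_step_lookahead(2, controls, step_cost, next_state, J_approx))  # prints 1

Replacing J_approx with a better approximation of the optimal cost-to-go (for example, one obtained by rollout or by training a parametric architecture) improves the resulting suboptimal policy, which is the recurring theme of the book.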