
2023 Alibaba Global Mathematics Competition: Personal Solutions to the Qualifying Round and Selected Final Round Problems (Part 5)

2023-06-26 08:59 Author: saqatl

Applied and Computational Mathematics, Problem 2. LLM Reward Collapse

Given a specific prompt, let an LLM generate $n$ responses. An annotator ranks these $n$ responses from best to worst. Let $G$ be strongly concave and monotonically increasing on $[-1,1]$. We want to train a reward model that assigns the $i$-th response a score $0 \le r_i \le 1$. The rewards $r_i$ should solve the following optimization problem:

$$\max_{0 \le r_1, \ldots, r_n \le 1} \sum_{i<j} G(r_i - r_j)$$

(a) Prove that

$$L(r_1, \ldots, r_n) := \sum_{i<j} G(r_i - r_j)$$

is concave, and that the optimization problem above has a unique solution $(r_1^*, \ldots, r_n^*) \in [0,1]^n$.

(b) Prove that this solution satisfies: (1) $1 = r_1^* \ge r_2^* \ge \cdots \ge r_n^* = 0$; (2) for every $i = 1, \ldots, n$, $r_i^* + r_{n+1-i}^* = 1$.

(c) Suppose that as $n \to \infty$ the empirical distribution of $r_1, \ldots, r_n$ converges to a probability measure $\mu$ on the interval $[0,1]$. The optimization problem then becomes

$$\sup_{\mu} \mathbb{E}_{X,X' \stackrel{\text{i.i.d.}}{\sim} \mu} G(|X - X'|)$$

If $\mathbb{E}_{X,X' \stackrel{\text{i.i.d.}}{\sim} \mu} G(|X - X'|)$ attains its maximum at a probability measure $\mu^*$, prove that $\mathbb{E}_{X \sim \mu^*} G(|X - c|)$ does not depend on $c \in [0,1]$.

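(a) and (b): Denote a solution of the optimization problem by $\mathbf{r}^*$ (a maximizer exists because $L$ is continuous on the compact set $[0,1]^n$, with $G$ taken to be continuous). Applying the concavity of $G$ term by term gives

$$\begin{aligned}
L(\mathbf{r}) + L(\mathbf{r}') &= \sum_{i<j} \left[ G(r_i - r_j) + G(r_i' - r_j') \right] \\
& \le \sum_{i<j} 2G\left(\frac{r_i + r_i' - r_j - r_j'}{2}\right) = 2L\left(\frac{\mathbf{r} + \mathbf{r}'}{2}\right)
\end{aligned}$$

Therefore $L$ is concave. Moreover, since $G$ is strictly concave, equality holds if and only if $r_i - r_j = r_i' - r_j'$ for all $1 \le i < j \le n$, that is, if and only if $\mathbf{r}$ and $\mathbf{r}'$ differ by a constant shift. Since $G$ is increasing, every optimal solution has $r_1^* = 1$: otherwise raising $r_1$ to $1$ would increase each term $G(r_1 - r_j)$ and leave the other terms unchanged. Two distinct optimal solutions $\mathbf{r}_1, \mathbf{r}_2$ therefore share the same first coordinate and cannot differ by a constant shift, so the inequality above is strict and $\frac{\mathbf{r}_1 + \mathbf{r}_2}{2}$ gives a strictly larger value of $L$, a contradiction. Hence the solution is unique.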

Next we prove that the optimal solution satisfies $r_1^* \ge r_2^* \ge \cdots \ge r_n^*$. If not, there exists $k \ge 1$ such that $r_1^* \ge r_2^* \ge \cdots \ge r_k^*$ but $r_k^* < r_{k+1}^*$. Consider the vector obtained by swapping these two entries:

$$\tilde{r}_i = \begin{cases}
r_i^* & i \ne k, k+1 \\
r_{k+1}^* & i = k \\
r_k^* & i = k+1
\end{cases}$$

The swap only changes the term for the pair $(k, k+1)$; for every other index $i$, the two terms pairing $i$ with $k$ and with $k+1$ are merely exchanged. Hence

$$\sum_{i<j} G(r_i^* - r_j^*) - \sum_{i<j} G(\tilde r_i - \tilde r_j) = G(r_k^* - r_{k+1}^*) - G(r_{k+1}^* - r_k^*) < 0$$

which contradicts the optimality of $\mathbf{r}^*$.

Finally, let $\tilde r_i = 1 - r_{n-i+1}^*$. Then $\tilde r_i - \tilde r_j = r_{n-j+1}^* - r_{n-i+1}^*$, and $(i,j) \mapsto (n-j+1, n-i+1)$ is a bijection on the pairs $i < j$, so $L(\mathbf{r}^*) = L(\tilde{\mathbf{r}})$. By uniqueness of the solution, $\mathbf{r}^* = \tilde{\mathbf{r}}$, and therefore $r_i^* + r_{n+1-i}^* = 1$. In particular, $r_n^* = 1 - r_1^* = 0$, which completes (b)(1).
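The properties in (a) and (b) can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes the specific utility $G(x) = \log \sigma(x)$ (log-sigmoid, which is increasing and strongly concave on $[-1,1]$) and uses plain projected gradient ascent. The names `solve_rewards`, `G_prime`, and `objective`, the step size, and the iteration count are all illustrative choices.

```python
import math

def G(x):
    # Assumed utility: G(x) = log(sigmoid(x)) = -log(1 + e^{-x}).
    return -math.log1p(math.exp(-x))

def G_prime(x):
    # Derivative of the assumed utility: G'(x) = 1 / (1 + e^x) > 0.
    return 1.0 / (1.0 + math.exp(x))

def objective(r):
    # L(r) = sum over pairs i < j of G(r_i - r_j).
    n = len(r)
    return sum(G(r[i] - r[j]) for i in range(n) for j in range(i + 1, n))

def solve_rewards(n, steps=5000, init=None):
    # Projected gradient ascent on the box [0, 1]^n.
    r = list(init) if init is not None else [0.5] * n
    lr = 1.0 / n  # conservative step: |G''| <= 1/4, so the gradient of L is (n/4)-Lipschitz
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(i + 1, n):
                g = G_prime(r[i] - r[j])
                grad[i] += g  # dL/dr_i receives +G'(r_i - r_j)
                grad[j] -= g  # dL/dr_j receives -G'(r_i - r_j)
        # Gradient step followed by projection (clamping) onto [0, 1].
        r = [min(1.0, max(0.0, x + lr * gk)) for x, gk in zip(r, grad)]
    return r

if __name__ == "__main__":
    r_star = solve_rewards(5, init=[0.3, 0.9, 0.1, 0.6, 0.2])
    print([round(x, 4) for x in r_star], round(objective(r_star), 6))
```

Starting from a deliberately unsorted, asymmetric initial point, the iterates should still end at a vector with $r_1 = 1$, $r_n = 0$, decreasing entries, and $r_i + r_{n+1-i} = 1$, in line with the uniqueness argument above.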

(c) Let $\mu^*$ be the probability measure to which the optimal solution $\mathbf{r}^*$ converges as $n \to \infty$, and assume that it has a probability density $f \ge a > 0$, where $a$ is a constant. Define

$$\mathcal{F}[f] = \mathbb{E}_{X,X' \stackrel{\text{i.i.d.}}{\sim} \mu^*} G(|X - X'|) = \int_{[0,1]^2} G(|x - y|) f(x) f(y) \, dx \, dy$$

Since the kernel $G(|x - y|)$ is symmetric, the functional derivative of $\mathcal{F}$ is twice the quantity we are interested in:

$$\mathbb{E}_{X \sim \mu^*} G(|X - c|) = \frac{1}{2} \frac{\delta \mathcal{F}}{\delta f}(c) = \int_{[0,1]} G(|x - c|) f(x) \, dx$$

Since $G$ is continuous on the compact set $[0,1]$, $\int_{[0,1]} G(|x - c|) f(x) \, dx$ is also continuous in $c$. Therefore, if $\int_{[0,1]} G(|x - c|) f(x) \, dx$ is not constant, we can choose a perturbation $f + \delta f$ with $\int \delta f = 0$ and $\|\delta f\|_{\infty} < a/2$ (so that $f + \delta f$ is still a probability density) for which $\mathcal{F}[f + \delta f] > \mathcal{F}[f]$, a contradiction.
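One way to make the perturbation step explicit is sketched below. The symbols $\Phi$, $\bar\Phi$, and $t$ are introduced here for illustration and do not appear in the original argument; the choice $\delta f = t(\Phi - \bar\Phi)$ is one possible perturbation, not necessarily the one the author had in mind.

```latex
% Notation (assumed for this sketch):
%   \Phi(c) = \int_{[0,1]} G(|x - c|) f(x) \, dx,   \bar\Phi = \int_0^1 \Phi(c) \, dc.
% Take h = \Phi - \bar\Phi, which is continuous with \int_0^1 h = 0, and set \delta f = t h.
\begin{aligned}
\mathcal{F}[f + t h]
  &= \mathcal{F}[f]
   + 2t \int_0^1 \Phi(c) \, h(c) \, dc
   + t^2 \int_{[0,1]^2} G(|x - y|) \, h(x) h(y) \, dx \, dy \\
  &= \mathcal{F}[f]
   + 2t \int_0^1 \bigl(\Phi(c) - \bar\Phi\bigr)^2 dc
   + O(t^2).
\end{aligned}
% If \Phi is not constant, the first-order coefficient is strictly positive,
% so \mathcal{F}[f + t h] > \mathcal{F}[f] for all sufficiently small t > 0.
% Choosing t with t \|h\|_\infty < a/2 keeps f + t h \ge a/2 > 0, a valid density.
```

The second line uses $\int_0^1 \bar\Phi \, h(c) \, dc = 0$, so that $\int \Phi h = \int (\Phi - \bar\Phi)^2$.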

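I did not choose the following problem during the competition; I worked on it with someone afterwards purely out of personal interest.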

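Combinatorics and Probability, Problem 5. Prove: for every $\varepsilon > 0$ there exists $n_0$ such that for all $n \ge n_0$, every simple graph on $n$ vertices with at least $n^{1 + \varepsilon}$ edges contains a cycle $C$ that has at least $|E(C)|$ chords. Here $|\cdot|$ denotes the cardinality of a set. (A chord of the cycle $C$ is an edge that joins two vertices of $C$ but does not belong to the edge set $E(C)$.)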

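First, a complaint about this problem. Partly this is because the column is about to run out of room for formulas again, and partly because this was an open problem until a paper solving it was posted on arXiv three days before the competition began (https://arxiv.org/abs/2306.09157), which is rather outrageous.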

