

《注意力矩陣乘法》Attention as matrix multiplication

2023-02-27 20:38 作者:學的很雜的一個人


來源:https://e2eml.school/transformers.html#softmax

中英雙語版,由各類翻譯程序和少量自己理解的意思做中文注釋


相關文章匯總在文集:Transformers from Scratch(中文注釋)

--------------------------------------------------------------------------------------------------------------------


Feature weights could be straightforward to build by counting how often each word pair/next word transition occurs in training, but attention masks are not.

通過(guò)計(jì)算每個(gè)單詞對(duì)/下一個(gè)單詞轉(zhuǎn)換在訓(xùn)練中發(fā)生的頻率,可以很容易地建立特征權(quán)重,但注意力掩碼不是。

Up to this point, we've pulled the mask vector out of thin air.

到目前為止,我們一直是憑空變出掩碼向量的。

How transformers find the relevant mask matters.

transformers 如何找到相關的掩碼是很重要的。

It would be natural to use some sort of lookup table, but now we are focusing hard on expressing everything as matrix multiplications.

使用某種查找表是很自然的,但現在我們專注於把所有內容都表示為矩陣乘法。

We can use the same lookup method we introduced above by stacking the mask vectors for every word into a matrix and using the one-hot representation of the most recent word to pull out the relevant mask.

我們可以使用與上面介紹的相同的查找方法:把每個單詞的掩碼向量堆疊成一個矩陣,並用最近一個單詞的獨熱(one-hot)表示來提取相關的掩碼。
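A minimal sketch of this one-hot lookup; the mask values below are invented just to show the mechanics.

下面是這種獨熱查找的最小示意;掩碼數值是為演示機制而編造的。

```python
import numpy as np

# One mask vector per vocabulary word, stacked as rows (invented values).
mask_matrix = np.array([
    [1.0, 0.0, 1.0, 0.0],  # mask for word 0
    [0.0, 1.0, 0.0, 1.0],  # mask for word 1
    [1.0, 1.0, 0.0, 0.0],  # mask for word 2
])

# One-hot representation of the most recent word (word 2 here).
one_hot = np.array([0.0, 0.0, 1.0])

# Multiplying by a one-hot vector selects the matching row:
# a lookup table expressed as a matrix multiplication.
selected_mask = one_hot @ mask_matrix
```

The multiplication zeroes out every row except the one flagged by the one-hot vector, so the result is exactly that word's mask.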

In the matrix showing the collection of mask vectors, we've only shown the one we're trying to pull out, for clarity.

在顯示掩碼向量集合的矩陣中,為了清楚起見,我們只畫出了試圖提取的那一個。

We're finally getting to the point where we can start tying into the paper.

我們終於到了可以開始和論文對應起來的地方。

This mask lookup is represented by the QK^T term in the attention equation.

這種掩碼查找由注意力方程中的 QK^T 項表示。

The query Q represents the feature of interest and the matrix K represents the collection of masks.

查詢 Q 表示感興趣的特征,矩陣 K 表示掩碼的集合。

Because it's stored with masks in columns, rather than rows, it needs to be transposed (with the T operator) before multiplying.

因為它是把掩碼存儲在列中而不是行中,所以在相乘之前需要先轉置(用 T 運算符)。
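Putting the last few sentences together, a toy version of the QK^T lookup might look like this; the shapes and mask values are assumptions for illustration, not the paper's actual learned matrices.

把前面幾句話合起來,QK^T 查找的玩具版本大致如下;形狀和掩碼數值只是演示用的假設,並非論文中實際學到的矩陣。

```python
import numpy as np

# K stores one mask per vocabulary word in its COLUMNS: shape (d, n),
# where d is the mask length and n the vocabulary size (invented values).
K = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Q is the one-hot row vector for the most recent word (word 1 here),
# shape (1, n).
Q = np.array([[0.0, 1.0, 0.0]])

# Transposing K puts the masks in rows, so Q @ K.T pulls out word 1's mask.
mask = Q @ K.T
```

Because every step is a matrix multiplication, the whole lookup stays differentiable, which is the property the later softmax-based version preserves.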

By the time we're all done, we'll make some important modifications to this, but at this level it captures the concept of a differentiable lookup table that transformers make use of.

等我們全部完成時,我們還會對此做一些重要的修改,但在這個層面上,它已經體現了 transformers 所使用的可微查找表這一概念。
