
45 Handpicked Transformer Papers! Models, Architectures, and Training Methods in One Read!

2023-08-16 17:55 | Author: 深度之眼 official account

Let's talk about the Transformer today.

Thanks to ChatGPT's explosive popularity, large models have arguably been the hottest research direction in AI this year. The Transformer, the foundational work behind large models, is back in the spotlight, with new results coming out one after another. In short: it's a crowded field.

For students just getting into AI, the Transformer is a must-learn topic; for students in other AI subfields, it is equally essential foundational knowledge.

So this time I have compiled Transformer-related papers for you: 23 on models, 10 on architecture, 8 on post-pre-training processing, and 4 on training methods, so that beginners can get up to speed quickly and others can use it to organize their own knowledge.

The paper list is as follows:

Scan the QR code to add Xiao Xiang and reply "精選45" to get all 45 papers plus the code collection for free.

1. Models (23)

GPT

Improving Language Understanding by Generative Pre-Training

GPT-2

Language Models are Unsupervised Multitask Learners

GPT-3

Language Models are Few-Shot Learners

GPT-3.5

Models referred to as "GPT 3.5"

GPT-4

GPT-4 Technical Report

GPT-NeoX

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

GPT-J

Pretrained Models

Gopher

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

AlphaCode

Competition-Level Code Generation with AlphaCode

RETRO

Improving language models by retrieving from trillions of tokens

Chinchilla

Training Compute-Optimal Large Language Models

Flamingo

Flamingo: a Visual Language Model for Few-Shot Learning

Gato

A Generalist Agent

Anthropic LM

A General Language Assistant as a Laboratory for Alignment

PaLM

PaLM: Scaling Language Modeling with Pathways

GLaM

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

LaMDA

LaMDA: Language Models for Dialog Applications

LLaMA

LLaMA: Open and Efficient Foundation Language Models

Switch

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

BLOOM

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Galactica

Galactica: A Large Language Model for Science

OPT

OPT: Open Pre-trained Transformer Language Models

GLM-130B

GLM-130B: An Open Bilingual Pre-trained Model

2. Architecture (10)

Multi-Query Attention

Fast Transformer Decoding: One Write-Head is All You Need

Sparse Attention

Generating Long Sequences with Sparse Transformers

Mixture of Experts

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Unified Scaling Laws for Routed Language Models

Efficient Large Scale Language Modeling with Mixtures of Experts

FlashAttention

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Encoder + Decoder

Attention Is All You Need

Parallel Attention

PaLM: Scaling Language Modeling with Pathways

RoPE

RoFormer: Enhanced Transformer with Rotary Position Embedding

ALiBi

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
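To make the rotary-embedding entry above (RoPE) concrete, here is a minimal NumPy sketch of the idea: pairs of channels are rotated by a position-dependent angle, so attention dot products depend only on relative position. This is the split-halves variant used in many open implementations; the function name and shapes are illustrative, not taken from any specific library.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Channel i in the first half is paired with channel i in the second
    half, and each pair is rotated by angle m * theta_i at position m,
    where theta_i = base ** (-i / (dim/2)).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE needs an even embedding dimension"
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # theta_i, shape (half,)
    angles = np.outer(np.arange(seq_len), freqs)   # m * theta_i, (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied independently to each channel pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because each pair undergoes a pure rotation, norms are preserved and the inner product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is exactly the property RoFormer exploits.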

三、預(yù)訓(xùn)練后處理(8)

RLHF with the PPO Algorithm

Deep Reinforcement Learning from Human Preferences

Learning to summarize from human feedback

Constitutional

Constitutional AI: Harmlessness from AI Feedback

Minerva

Solving Quantitative Reasoning Problems with Language Models

Codex

Evaluating Large Language Models Trained on Code

FeedME (SFT)

Training language models to follow instructions with human feedback

Fine-Tuning Language Models from Human Preferences

FLAN

Finetuned Language Models Are Zero-Shot Learners

4. Training Methods (4)

Hyperparameter Setting

Training Compute-Optimal Large Language Models

Scaling Laws for Neural Language Models

Pretraining from Human Feedback

Pretraining Language Models with Human Preferences

MuP

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer


