

We Ground Through 190+ Must-Read ViT Papers from Top Conferences! Covering General, Training, Convolution, and Other Subfields

2023-09-11 18:00 · By 深度之眼官方賬號(hào) (Deepshare official account)

Today we share 190+ vision Transformer papers from the major top conferences of the past three years (2021-2023), covering subfields such as general ViT, efficient ViT, Transformer training, and convolutional Transformers.

All the papers and their open-source code have been packaged for you.

Scan the QR code to add 小享 (Xiaoxiang) and reply "ViT200"

to get the complete collection of papers + open-source code for free.

General Vision Transformer

1. GPViT: "GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation", ICLR, 2023

Title: GPViT: A High-Resolution Non-Hierarchical Vision Transformer with Group Propagation

Summary: This paper proposes an efficient alternative for exchanging global information, the Group Propagation block (GP block). In each GP block, features are first grouped by a fixed number of learnable group tokens, then group propagation is performed among the group features to exchange global information, and finally the global information in the updated group features is returned to the image features through a transformer decoder. The authors evaluate GPViT on a variety of visual recognition tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Compared with prior work, the method achieves significant gains on all tasks, especially those requiring high-resolution outputs; for example, on ADE20K semantic segmentation, GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU with only half as many parameters.
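
To make the grouping–propagation–ungrouping pipeline concrete, here is a minimal PyTorch sketch of a GP-style block. It is a sketch under assumptions: cross-attention is used for both grouping and ungrouping, and a plain MLP stands in for the paper's propagation module, so the actual GPViT layers may differ.

```python
import torch
import torch.nn as nn

class GPBlock(nn.Module):
    """Simplified Group Propagation block: tokens -> groups -> tokens."""
    def __init__(self, dim: int, num_groups: int = 64, heads: int = 4):
        super().__init__()
        self.group_tokens = nn.Parameter(torch.randn(num_groups, dim))
        self.group_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.propagate = nn.Sequential(   # exchange global info among groups
            nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
            nn.GELU(), nn.Linear(4 * dim, dim))
        self.ungroup_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) high-resolution image tokens; cost is O(N * num_groups)
        g = self.group_tokens.expand(x.size(0), -1, -1)
        g, _ = self.group_attn(g, x, x)       # group features by group tokens
        g = g + self.propagate(g)             # propagate global information
        out, _ = self.ungroup_attn(x, g, g)   # decode groups back to tokens
        return x + out

x = torch.randn(2, 56 * 56, 128)   # stride-4 tokens of a 224px image
print(GPBlock(128)(x).shape)       # torch.Size([2, 3136, 128])
```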

2. CPVT: "Conditional Positional Encodings for Vision Transformers", ICLR, 2023

Title: Conditional Positional Encodings for Vision Transformers

Summary: This paper proposes a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are predefined and independent of the input tokens, CPE is generated dynamically and conditioned on the local neighborhood of the input tokens. As a result, CPE generalizes easily to input sequences longer than those the model saw during training. Moreover, CPE preserves the translation equivariance desired in vision tasks, which improves performance. The authors implement CPE with a simple Positional Encoding Generator (PEG) that integrates seamlessly into current Transformer frameworks. Building on PEG, they propose the Conditional Positional Encoding Vision Transformer (CPVT). Experiments show that CPVT's attention maps closely resemble those of learned positional encodings, and that it achieves results on par with or better than the state of the art.
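
Since the PEG is the heart of CPVT, a minimal PyTorch sketch is shown below. It follows the paper's default design as summarized above: a depthwise 3×3 convolution (whose zero padding exposes position information) applied to the re-shaped token map and added back residually.

```python
import torch
import torch.nn as nn

class PEG(nn.Module):
    """Positional Encoding Generator: position info from local neighborhoods."""
    def __init__(self, dim: int):
        super().__init__()
        # groups=dim -> depthwise conv; zero padding leaks absolute position
        self.proj = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (B, N, C) patch tokens with N == h * w (class token excluded)
        B, N, C = tokens.shape
        feat = tokens.transpose(1, 2).reshape(B, C, h, w)
        feat = self.proj(feat) + feat            # conditional positional encoding
        return feat.flatten(2).transpose(1, 2)   # back to (B, N, C)

x = torch.randn(2, 14 * 14, 192)   # e.g. tokens of a 224px image, patch size 16
print(PEG(192)(x, 14, 14).shape)   # torch.Size([2, 196, 192])
```

Because PEG only depends on a local neighborhood, the same module works unchanged when the token grid grows at higher input resolutions.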

3. LipsFormer: "LipsFormer: Introducing Lipschitz Continuity to Vision Transformers", ICLR, 2023

Title: LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

Summary: This paper presents LipsFormer, a Lipschitz-continuous Transformer, and explores, both theoretically and empirically, how to improve the training stability of Transformer-based models. Unlike prior empirical tricks that address training instability through learning-rate warmup, layer normalization, attention tweaks, and weight initialization, the paper argues that Lipschitz continuity is a more fundamental property for ensuring training stability. In LipsFormer, unstable Transformer components are replaced with Lipschitz-continuous counterparts: LayerNorm is replaced by CenterNorm, Xavier initialization by spectral initialization, dot-product attention by scaled cosine-similarity attention, and weighted residual connections are introduced. The authors prove that these modules satisfy Lipschitz continuity and derive an upper bound on LipsFormer's Lipschitz constant.
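
As one illustration of these Lipschitz-friendly replacements, below is a hedged sketch of scaled cosine-similarity attention: L2-normalizing queries and keys bounds the attention logits, unlike unbounded dot products. The fixed `tau` here is a stand-in for the paper's learned scale.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_attention(q, k, v, tau: float = 10.0):
    # Normalizing q and k keeps each logit in [-tau, tau], so the softmax
    # input cannot blow up the way raw dot products can during training.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = (tau * q @ k.transpose(-2, -1)).softmax(dim=-1)
    return attn @ v

q = k = v = torch.randn(2, 8, 196, 64)         # (batch, heads, tokens, head_dim)
print(scaled_cosine_attention(q, k, v).shape)  # torch.Size([2, 8, 196, 64])
```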

Other 51 papers

  1. BiFormer: "BiFormer: Vision Transformer with Bi-Level Routing Attention", CVPR, 2023

  2. AbSViT: "Top-Down Visual Attention from Analysis by Synthesis", CVPR, 2023

  3. DependencyViT: "Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention", CVPR, 2023

  4. ResFormer: "ResFormer: Scaling ViTs with Multi-Resolution Training", CVPR, 2023

  5. SViT: "Vision Transformer with Super Token Sampling", CVPR, 2023

  6. PaCa-ViT: "PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers", CVPR, 2023

  7. GC-ViT: "Global Context Vision Transformers", ICML, 2023

  8. MAGNETO: "MAGNETO: A Foundation Transformer", ICML, 2023

  9. SMT: "Scale-Aware Modulation Meet Transformer", ICCV, 2023

  10. CrossFormer++: "CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention", arXiv, 2023

  11. QFormer: "Vision Transformer with Quadrangle Attention", arXiv, 2023

  12. LIT: "Less is More: Pay Less Attention in Vision Transformers", AAAI, 2022

  13. DTN: "Dynamic Token Normalization Improves Vision Transformer", ICLR, 2022

  14. RegionViT: "RegionViT: Regional-to-Local Attention for Vision Transformers", ICLR, 2022

  15. CrossFormer: "CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention", ICLR, 2022

  16. CSWin: "CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows", CVPR, 2022

  17. MPViT: "MPViT: Multi-Path Vision Transformer for Dense Prediction", CVPR, 2022

  18. Diverse-ViT: "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy", CVPR, 2022

  19. DW-ViT: "Beyond Fixation: Dynamic Window Visual Transformer", CVPR, 2022

  20. MixFormer: "MixFormer: Mixing Features across Windows and Dimensions", CVPR, 2022

  21. DAT: "Vision Transformer with Deformable Attention", CVPR, 2022

  22. Swin-Transformer-V2: "Swin Transformer V2: Scaling Up Capacity and Resolution", CVPR, 2022

  23. MSG-Transformer: "MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens", CVPR, 2022

  24. NomMer: "NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition", CVPR, 2022

  25. Shunted: "Shunted Self-Attention via Multi-Scale Token Aggregation", CVPR, 2022

  26. PyramidTNT: "PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture", CVPRW, 2022

  27. ReMixer: "ReMixer: Object-aware Mixing Layer for Vision Transformers", CVPRW, 2022

  28. UN: "Unified Normalization for Accelerating and Stabilizing Transformers", ACMMM, 2022

  29. Wave-ViT: "Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning", ECCV, 2022

  30. DaViT: "DaViT: Dual Attention Vision Transformers", ECCV, 2022

  31. MaxViT: "MaxViT: Multi-Axis Vision Transformer", ECCV, 2022

  32. VSA: "VSA: Learning Varied-Size Window Attention in Vision Transformers", ECCV, 2022

  33. LITv2: "Fast Vision Transformers with HiLo Attention", NeurIPS, 2022

  34. ViT: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR, 2021

  35. Perceiver: "Perceiver: General Perception with Iterative Attention", ICML, 2021

  36. PiT: "Rethinking Spatial Dimensions of Vision Transformers", ICCV, 2021

  37. VT: "Visual Transformers: Where Do Transformers Really Belong in Vision Models?", ICCV, 2021

  38. PVT: "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions", ICCV, 2021

  39. iRPE: "Rethinking and Improving Relative Position Encoding for Vision Transformer", ICCV, 2021

  40. CaiT: "Going deeper with Image Transformers", ICCV, 2021

  41. Swin-Transformer: "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", ICCV, 2021

  42. T2T-ViT: "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet", ICCV, 2021

  43. DPT: "DPT: Deformable Patch-based Transformer for Visual Recognition", ACMMM, 2021

  44. Focal: "Focal Attention for Long-Range Interactions in Vision Transformers", NeurIPS, 2021

  45. Twins: "Twins: Revisiting Spatial Attention Design in Vision Transformers", NeurIPS, 2021

  46. ARM: "Blending Anti-Aliasing into Vision Transformer", NeurIPS, 2021

  47. DVT: "Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length", NeurIPS, 2021

  48. TNT: "Transformer in Transformer", NeurIPS, 2021

  49. ViTAE: "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias", NeurIPS, 2021

  50. DeepViT: "DeepViT: Towards Deeper Vision Transformer", arXiv, 2021

  51. LV-ViT: "All Tokens Matter: Token Labeling for Training Better Vision Transformers", NeurIPS, 2021

Efficient Vision Transformer

1. Tri-Level: "Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training", AAAI, 2023

Title: Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

Summary: This paper proposes an end-to-end efficient training framework, Tri-Level E-ViT, built around three kinds of sparsity. Specifically, the authors adopt a hierarchical data-redundancy reduction scheme that exploits sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens in the attention weights. Extensive experiments show that the proposed technique significantly accelerates training for various ViT architectures while preserving accuracy.

2. ToMe: "Token Merging: Your ViT But Faster", ICLR, 2023

Title: Token Merging: Your ViT, but Faster

Summary: The authors propose Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without retraining. ToMe uses a generic, lightweight matching algorithm to gradually merge similar tokens inside the transformer; it is as fast as pruning while being more accurate. Off the shelf, ToMe doubles the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and raises ViT-L throughput on video by 2.2x, with an accuracy drop of only 0.2-0.3%. ToMe can also be applied easily during training, in practice speeding up MAE fine-tuning on video by nearly 2x. Training with ToMe further minimizes the accuracy drop, doubling ViT-B throughput on audio with only a 0.4% mAP decrease. Qualitatively, the authors find that ToMe merges object parts into a single token, even across multiple video frames. Overall, ToMe's accuracy and speed are competitive with the state of the art on images, video, and audio.
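
For intuition, here is a simplified single-image sketch of bipartite token matching in PyTorch. It follows the recipe summarized above; the official implementation additionally handles batching, the class token, size-weighted averaging, and duplicate matches, all omitted here.

```python
import torch
import torch.nn.functional as F

def token_merge(x: torch.Tensor, r: int) -> torch.Tensor:
    # Alternate tokens into two sets A and B, match every A-token to its
    # most similar B-token by cosine similarity, and average the r most
    # similar pairs into B (a naive average in this sketch).
    a, b = x[0::2], x[1::2]                              # (Na, C), (Nb, C)
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T
    score, match = sim.max(dim=-1)                       # best partner in B
    merged = score.topk(r).indices                       # r most similar A-tokens
    keep = torch.ones(a.size(0), dtype=torch.bool)
    keep[merged] = False
    b = b.clone()
    b[match[merged]] = (b[match[merged]] + a[merged]) / 2
    return torch.cat([a[keep], b], dim=0)                # N - r tokens remain

x = torch.randn(196, 384)          # one image's tokens
print(token_merge(x, r=16).shape)  # torch.Size([180, 384])
```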

3. HiViT: "HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer", ICLR, 2023

Title: HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer

Summary: This paper proposes a new hierarchical vision Transformer design called HiViT (short for Hierarchical ViT), which combines high efficiency with strong performance in masked image modeling (MIM). The key is to remove the unnecessary "local inter-unit operations", yielding a structurally simple hierarchical vision Transformer whose masked units can be serialized just like in a plain vision Transformer. To this end, the authors start from Swin Transformer and (i) set the masked unit size to the token size of Swin Transformer's main stage, (ii) turn off inter-unit self-attention before the main stage, and (iii) remove all operations after the main stage.

Other 39 papers

  1. STViT: "Making Vision Transformers Efficient from A Token Sparsification View", CVPR, 2023

  2. SparseViT: "SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer", CVPR, 2023

  3. Slide-Transformer: "Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention", CVPR, 2023

  4. RIFormer: "RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer", CVPR, 2023

  5. EfficientViT: "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention", CVPR, 2023

  6. Castling-ViT: "Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference", CVPR, 2023

  7. ViT-Ti: "RGB no more: Minimally-decoded JPEG Vision Transformers", CVPR, 2023

  8. LTMP: "Learned Thresholds Token Merging and Pruning for Vision Transformers", ICMLW, 2023

  9. Evo-ViT: "Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer", AAAI, 2022

  10. PS-Attention: "Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention", AAAI, 2022

  11. ShiftViT: "When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism", AAAI, 2022

  12. EViT: "Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations", ICLR, 2022

  13. QuadTree: "QuadTree Attention for Vision Transformers", ICLR, 2022

  14. Anti-Oversmoothing: "Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice", ICLR, 2022

  15. QnA: "Learned Queries for Efficient Local Attention", CVPR, 2022

  16. LVT: "Lite Vision Transformer with Enhanced Self-Attention", CVPR, 2022

  17. A-ViT: "A-ViT: Adaptive Tokens for Efficient Vision Transformer", CVPR, 2022

  18. Rev-MViT: "Reversible Vision Transformers", CVPR, 2022

  19. ATS: "Adaptive Token Sampling For Efficient Vision Transformers", ECCV, 2022

  20. EdgeViT: "EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers", ECCV, 2022

  21. SReT: "Sliced Recursive Transformer", ECCV, 2022

  22. SiT: "Self-slimmed Vision Transformer", ECCV, 2022

  23. M(3)ViT: "M(3)ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design", NeurIPS, 2022

  24. ResT-V2: "ResT V2: Simpler, Faster and Stronger", NeurIPS, 2022

  25. EfficientFormer: "EfficientFormer: Vision Transformers at MobileNet Speed", NeurIPS, 2022

  26. GhostNetV2: "GhostNetV2: Enhance Cheap Operation with Long-Range Attention", NeurIPS, 2022

  27. DeiT: "Training data-efficient image transformers & distillation through attention", ICML, 2021

  28. ConViT: "ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases", ICML, 2021

  29. HVT: "Scalable Visual Transformers with Hierarchical Pooling", ICCV, 2021

  30. CrossViT: "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification", ICCV, 2021

  31. ViL: "Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding", ICCV, 2021

  32. Visformer: "Visformer: The Vision-friendly Transformer", ICCV, 2021

  33. MultiExitViT: "Multi-Exit Vision Transformer for Dynamic Inference", BMVC, 2021

  34. SViTE: "Chasing Sparsity in Vision Transformers: An End-to-End Exploration", NeurIPS, 2021

  35. DGE: "Dynamic Grained Encoder for Vision Transformers", NeurIPS, 2021

  36. GG-Transformer: "Glance-and-Gaze Vision Transformer", NeurIPS, 2021

  37. DynamicViT: "DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification", NeurIPS, 2021

  38. ResT: "ResT: An Efficient Transformer for Visual Recognition", NeurIPS, 2021

  39. SOFT: "SOFT: Softmax-free Transformer with Linear Complexity", NeurIPS, 2021

Conv + Transformer

1. SATA: "Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets", WACV, 2023

Title: Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets

Summary: The authors propose splitting attention weights into trivial and non-trivial ones by a threshold, and then suppressing the accumulated trivial attention weights with the proposed Trivial WeIghts Suppression Transformation (TWIST) to reduce attention noise. Extensive experiments on CIFAR-100 and Tiny-ImageNet show that the suppression method improves vision Transformer accuracy by up to 2.3%.
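
A hedged sketch of the thresholding idea follows: weights below a threshold are treated as attention noise, scaled down, and the rows renormalized. The threshold, the suppression factor, and this exact functional form are illustrative assumptions, not the paper's precise TWIST definition.

```python
import torch

def suppress_trivial_attention(attn: torch.Tensor,
                               thresh: float = 0.02, alpha: float = 0.5):
    # attn: (..., N, N) post-softmax attention; each row sums to 1.
    # Weights below `thresh` are treated as trivial and scaled by `alpha`;
    # rows are renormalized so they still sum to 1 afterwards.
    suppressed = torch.where(attn < thresh, attn * alpha, attn)
    return suppressed / suppressed.sum(dim=-1, keepdim=True)

attn = torch.softmax(torch.randn(2, 8, 196, 196), dim=-1)
out = suppress_trivial_attention(attn)
print(torch.allclose(out.sum(-1), torch.ones_like(out.sum(-1))))  # True
```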

2. SparK: "Sparse and Hierarchical Masked Modeling for Convolutional Representation Learning", ICLR, 2023

Title: Sparse and Hierarchical Masked Modeling for Convolutional Representation Learning

Summary: The authors identify and overcome two key obstacles to extending BERT-style pre-training, i.e., masked image modeling, to convolutional networks (convnets): (i) convolutions cannot handle irregular, randomly masked input images, and (ii) the single-scale nature of BERT pre-training is inconsistent with the hierarchical structure of convnets. For (i), unmasked pixels are treated as sparse voxels of a 3D point cloud and encoded with sparse convolution; this is the first use of sparse convolution in 2D masked modeling. For (ii), a hierarchical decoder is developed to reconstruct the image from multi-scale encoded features. The method, called Sparse Masked Modeling (SparK), is general: it can be used directly with any convolutional model, without backbone modifications.

3. MOAT: "MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models", ICLR, 2023

Title: MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

Summary: This paper presents MOAT, a family of neural networks built on MObile convolution (i.e., inverted residual blocks) and ATtention. Unlike current works that stack mobile-convolution blocks and transformer blocks separately, the authors effectively merge them into a single MOAT block: starting from a standard Transformer block, they replace its multi-layer perceptron with a mobile convolution block and reorder it to come before the self-attention operation. The mobile convolution block not only enhances the network's representational capacity but also produces better downsampled features. The conceptually simple MOAT network is surprisingly effective, achieving 89.1% top-1 accuracy on ImageNet-1K and 81.5% on ImageNet-1K-V2, both with ImageNet-22K pre-training.
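
A rough PyTorch sketch of the block structure described above: the MLP is replaced by an inverted-residual (mobile) convolution that runs before self-attention. Normalization and downsampling details of the real model are simplified away here.

```python
import torch
import torch.nn as nn

class MOATBlock(nn.Module):
    """Sketch: inverted-residual (mobile) conv first, self-attention second."""
    def __init__(self, dim: int, heads: int = 4, expand: int = 4):
        super().__init__()
        hidden = dim * expand
        self.mbconv = nn.Sequential(                 # replaces the MLP
            nn.Conv2d(dim, hidden, 1), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU(),
            nn.Conv2d(hidden, dim, 1))
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map
        x = x + self.mbconv(x)                       # mobile convolution first
        B, C, H, W = x.shape
        seq = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        n = self.norm(seq)
        seq = seq + self.attn(n, n, n)[0]            # then self-attention
        return seq.transpose(1, 2).reshape(B, C, H, W)

print(MOATBlock(64)(torch.randn(2, 64, 14, 14)).shape)  # (2, 64, 14, 14)
```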


Other 14 papers

  1. InternImage: "InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions", CVPR, 2023

  2. PSLT: "PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift", TPAMI, 2023

  3. MobileViT: "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer", ICLR, 2022

  4. Mobile-Former: "Mobile-Former: Bridging MobileNet and Transformer", CVPR, 2022

  5. TinyViT: "TinyViT: Fast Pretraining Distillation for Small Vision Transformers", ECCV, 2022

  6. ParC-Net: "ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer", ECCV, 2022

  7. ?: "How to Train Vision Transformer on Small-scale Datasets?", BMVC, 2022

  8. DHVT: "Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets", NeurIPS, 2022

  9. iFormer: "Inception Transformer", NeurIPS, 2022

  10. LeViT: "LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference", ICCV, 2021

  11. CeiT: "Incorporating Convolution Designs into Visual Transformers", ICCV, 2021

  12. Conformer: "Conformer: Local Features Coupling Global Representations for Visual Recognition", ICCV, 2021

  13. CoaT: "Co-Scale Conv-Attentional Image Transformers", ICCV, 2021

  14. CvT: "CvT: Introducing Convolutions to Vision Transformers", ICCV, 2021

Training + Transformer

1. MixPro: "MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer", ICLR, 2023

Title: MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer

Summary: The authors propose MaskMix in image space and Progressive Attention Labeling (PAL) in label space. Specifically, from the image-space perspective, MaskMix mixes two images according to a grid-like mask; the size of each mask patch is adjustable and is an integer multiple of the image patch size, which guarantees that every image patch comes from only one image and contains more global content. From the label-space perspective, PAL uses a progressive factor to dynamically re-weight the attention weights of the mixed attention label. Finally, MaskMix and Progressive Attention Labeling are combined into a new data augmentation method named MixPro.
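
The MaskMix step is easy to sketch: a random binary grid whose cell size is an integer multiple of the ViT patch size selects which image each patch comes from, and the resulting area ratio serves as the label-mixing coefficient. The PAL re-weighting is omitted in this sketch.

```python
import torch

def maskmix(img_a: torch.Tensor, img_b: torch.Tensor,
            patch: int = 16, mult: int = 2, p: float = 0.5):
    # Grid cells are `mult` patches wide, so every ViT patch is copied
    # whole from exactly one of the two source images.
    cell = patch * mult
    H, W = img_a.shape[-2:]
    grid = (torch.rand(H // cell, W // cell) < p).float()
    mask = grid.repeat_interleave(cell, 0).repeat_interleave(cell, 1)
    mixed = mask * img_a + (1 - mask) * img_b
    lam = mask.mean().item()        # fraction of pixels taken from img_a
    return mixed, lam               # mix labels as lam*y_a + (1-lam)*y_b

a, b = torch.randn(3, 224, 224), torch.randn(3, 224, 224)
mixed, lam = maskmix(a, b)
print(mixed.shape, round(lam, 2))
```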

2. ConMIM: "Masked Image Modeling with Denoising Contrast", ICLR, 2023

Title: Masked Image Modeling with Denoising Contrast

Summary: Masked image modeling (MIM) has recently achieved state-of-the-art performance with vision Transformers (ViTs); at its core, it strengthens the network's modeling of patch-level image context through a denoising auto-encoding mechanism. Rather than adding an extra training stage for an image tokenizer as in prior work, the authors tap the great potential of contrastive learning for denoising auto-encoding and propose ConMIM, a pure MIM method that uses simple intra-image inter-patch contrastive constraints as the sole learning objective for masked patch prediction. They further strengthen the denoising mechanism with asymmetric designs, including image perturbations and model progress rates, to improve network pre-training.
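
As a rough illustration (the exact loss form here is an assumption, and the paper's asymmetric augmentation and target-update details are omitted), an intra-image inter-patch contrastive objective can be sketched as follows: each masked patch's prediction must match its own target patch feature, with the image's other patches serving as negatives.

```python
import torch
import torch.nn.functional as F

def patch_infonce(pred, target, mask, temp: float = 0.07):
    # pred:   (B, N, C) student features for every patch position
    # target: (B, N, C) target features of the uncorrupted patches
    # mask:   (B, N) bool, True where the patch was masked
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.transpose(1, 2) / temp        # (B, N, N)
    labels = torch.arange(pred.size(1)).expand(pred.size(0), -1)
    return F.cross_entropy(logits[mask], labels[mask])   # masked patches only

B, N, C = 2, 196, 256
mask = torch.rand(B, N) < 0.6
print(patch_infonce(torch.randn(B, N, C), torch.randn(B, N, C), mask))
```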

3. MFM: "Masked Frequency Modeling for Self-Supervised Visual Pre-Training", ICLR, 2023

Title: Masked Frequency Modeling for Self-Supervised Visual Pre-Training

Summary: The authors present Masked Frequency Modeling (MFM), a unified frequency-domain approach to self-supervised pre-training of vision models. Instead of randomly inserting masked tokens into the input embeddings in the spatial domain, MFM works from the frequency-domain perspective: it first masks out a portion of the input image's frequency components and then predicts the missing frequencies on the frequency spectrum. The key insight is that, because of heavy spatial redundancy, predicting masked components in the frequency domain reveals underlying image patterns better than predicting masked patches in the spatial domain. The findings suggest that, with the right configuration of the mask-and-predict strategy, both the structural information in high-frequency components and the low-level statistics in low-frequency components are useful for learning good representations.
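
The corruption step can be sketched in a few lines. This version assumes a simple circular low-/high-pass mask in the FFT domain, whereas the paper studies the mask configuration more systematically.

```python
import torch

def mask_frequencies(img: torch.Tensor, radius: float = 0.3,
                     low_pass: bool = True) -> torch.Tensor:
    # Corrupt an image by zeroing out a band of frequency components.
    # The model would then be trained to predict the missing frequencies
    # on the spectrum; the circular mask here is an illustrative choice.
    H, W = img.shape[-2:]
    fy = torch.fft.fftfreq(H).reshape(-1, 1)
    fx = torch.fft.fftfreq(W).reshape(1, -1)
    dist = (fy ** 2 + fx ** 2).sqrt()
    band = dist <= radius * dist.max()
    keep = band if low_pass else ~band
    spec = torch.fft.fft2(img) * keep.float()   # mask out a frequency band
    return torch.fft.ifft2(spec).real           # corrupted input image

x = torch.randn(3, 224, 224)
print(mask_frequencies(x).shape)   # torch.Size([3, 224, 224])
```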

Other 46 papers

  1. VisualAtom: "Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves", CVPR, 2023

  2. LGSimCLR: "Learning Visual Representations via Language-Guided Sampling", CVPR, 2023

  3. DisCo-CLIP: "DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training", CVPR, 2023

  4. MaskCLIP: "MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining", CVPR, 2023

  5. MAGE: "MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis", CVPR, 2023

  6. MixMIM: "MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning", CVPR, 2023

  7. iTPN: "Integrally Pre-Trained Transformer Pyramid Networks", CVPR, 2023

  8. DropKey: "DropKey for Vision Transformer", CVPR, 2023

  9. FlexiViT: "FlexiViT: One Model for All Patch Sizes", CVPR, 2023

  10. CLIPPO: "CLIPPO: Image-and-Language Understanding from Pixels Only", CVPR, 2023

  11. DMAE: "Masked Autoencoders Enable Efficient Knowledge Distillers", CVPR, 2023

  12. HPM: "Hard Patches Mining for Masked Image Modeling", CVPR, 2023

  13. MaskAlign: "Stare at What You See: Masked Image Modeling without Reconstruction", CVPR, 2023

  14. RILS: "RILS: Masked Visual Reconstruction in Language Semantic Space", CVPR, 2023

  15. FDT: "Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens", CVPR, 2023

  16. OpenCLIP: "Reproducible scaling laws for contrastive language-image learning", CVPR, 2023

  17. DiHT: "Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training", CVPR, 2023

  18. M3I-Pretraining: "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information", CVPR, 2023

  19. SN-Net: "Stitchable Neural Networks", CVPR, 2023

  20. MAE-Lite: "A Closer Look at Self-supervised Lightweight Vision Transformers", ICML, 2023

  21. GHN-3: "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?", ICML, 2023

  22. A(2)MIM: "Architecture-Agnostic Masked Image Modeling - From ViT back to CNN", ICML, 2023

  23. PQCL: "Patch-level Contrastive Learning via Positional Query for Visual Pre-training", ICML, 2023

  24. DreamTeacher: "DreamTeacher: Pretraining Image Backbones with Deep Generative Models", ICCV, 2023

  25. BEiT: "BEiT: BERT Pre-Training of Image Transformers", ICLR, 2022

  26. iBOT: "Image BERT Pre-training with Online Tokenizer", ICLR, 2022

  27. AutoProg: "Automated Progressive Learning for Efficient Training of Vision Transformers", CVPR, 2022

  28. MAE: "Masked Autoencoders Are Scalable Vision Learners", CVPR, 2022

  29. SimMIM: "SimMIM: A Simple Framework for Masked Image Modeling", CVPR, 2022

  30. SelfPatch: "Patch-Level Representation Learning for Self-Supervised Vision Transformers", CVPR, 2022

  31. Bootstrapping-ViTs: "Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training", CVPR, 2022

  32. TransMix: "TransMix: Attend to Mix for Vision Transformers", CVPR, 2022

  33. data2vec: "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language", ICML, 2022

  34. SSTA: "Self-supervised Models are Good Teaching Assistants for Vision Transformers", ICML, 2022

  35. MP3: "Position Prediction as an Effective Pretraining Strategy", ICML, 2022

  36. CutMixSL: "Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning", IJCAI, 2022

  37. BootMAE: "Bootstrapped Masked Autoencoders for Vision BERT Pretraining", ECCV, 2022

  38. TokenMix: "TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers", ECCV, 2022

  39. ?: "Locality Guidance for Improving Vision Transformers on Tiny Datasets", ECCV, 2022

  40. HAT: "Improving Vision Transformers by Revisiting High-frequency Components", ECCV, 2022

  41. AttMask: "What to Hide from Your Students: Attention-Guided Masked Image Modeling", ECCV, 2022

  42. SLIP: "SLIP: Self-supervision meets Language-Image Pre-training", ECCV, 2022

  43. mc-BEiT: "mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training", ECCV, 2022

  44. SL2O: "Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models", ECCV, 2022

  45. TokenMixup: "TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers", NeurIPS, 2022

  46. GreenMIM: "Green Hierarchical Vision Transformer for Masked Image Modeling", NeurIPS, 2022

Robustness + Transformer (16 papers)

  1. RobustCNN: "Can CNNs Be More Robust Than Transformers?", ICLR, 2023

  2. DMAE: "Denoising Masked AutoEncoders are Certifiable Robust Vision Learners", ICLR, 2023

  3. TGR: "Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization", CVPR, 2023

  4. ?: "Vision Transformers are Robust Learners", AAAI, 2022

  5. PNA: "Towards Transferable Adversarial Attacks on Vision Transformers", AAAI, 2022

  6. MIA-Former: "MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation", AAAI, 2022

  7. Patch-Fool: "Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?", ICLR, 2022

  8. Smooth-ViT: "Certified Patch Robustness via Smoothed Vision Transformers", CVPR, 2022

  9. RVT: "Towards Robust Vision Transformer", CVPR, 2022

  10. VARS: "Visual Attention Emerges from Recurrent Sparse Reconstruction", ICML, 2022

  11. FAN: "Understanding The Robustness in Vision Transformers", ICML, 2022

  12. CFA: "Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment", IJCAI, 2022

  13. ?: "Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem", ECML-PKDD, 2022

  14. ViP: "ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers", ECCV, 2022

  15. ?: "When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture", NeurIPS, 2022

  16. RobustViT: "Optimizing Relevance Maps of Vision Transformers Improves Robustness", NeurIPS, 2022

Model Compression + Transformer (12 papers)

  1. TPS: "Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers", CVPR, 2023

  2. BinaryViT: "BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models", CVPRW, 2023

  3. OFQ: "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023

  4. UPop: "UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers", ICML, 2023

  5. COMCAT: "COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models", ICML, 2023

  6. UVC: "Unified Visual Transformer Compression", ICLR, 2022

  7. MiniViT: "MiniViT: Compressing Vision Transformers with Weight Multiplexing", CVPR, 2022

  8. SPViT: "SPViT: Enabling Faster Vision Transformers via Soft Token Pruning", ECCV, 2022

  9. PSAQ-ViT: "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022

  10. Q-ViT: "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022

  11. VTC-LFC: "VTC-LFC: Vision Transformer Compression with Low-Frequency Components", NeurIPS, 2022

  12. PSAQ-ViT-V2: "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", arXiv, 2022


