

80+ High-Scoring Multimodal Machine Learning Papers and Source Code, Including the Latest 2023 Work!

2023-11-20 17:30 | Author: 深度之眼官方賬號

Multimodal machine learning (MultiModal Machine Learning, MMML) is an approach to machine learning aimed at complex tasks, such as multimodal sentiment analysis and cross-lingual image search, that require jointly considering data from multiple modalities and extracting useful information from each.

Thanks to the steadily improving performance of large language, vision, video, and audio models, multimodal machine learning has been gaining momentum. It helps AI systems understand their surroundings more comprehensively and deeply, improves model generalization and robustness, and encourages exchange and integration across disciplines.

As the field develops, multimodal machine learning research faces challenges on many fronts. For students who want to publish, understanding these challenges and the existing solutions is essential: it lets you innovate on top of prior work and quickly find your own idea.

To help you get your own paper out, I have once again compiled 81 multimodal machine learning papers, organized into six core technical challenges: representation, alignment, reasoning, generation, transfer, and quantification. For reasons of space, each category gets only a brief introduction.

Scan the QR code to add Xiao Xiang and reply "多模態(tài)ML"

to receive all 81 papers and source code for free.

Representation (12 papers)

1.Multiplicative Interactions and Where to Find Them

Summary: This paper examines the role of multiplicative interactions in neural network design, presenting them as a unifying framework that describes a range of architectural patterns such as gating, attention layers, hypernetworks, and dynamic convolutions. The authors argue that multiplicative interaction layers enrich the function class a network can express and provide a strong inductive bias when fusing multiple information streams or performing conditional computation. Through applications to large-scale reinforcement learning and sequence modeling tasks, they demonstrate the potential and effectiveness of multiplicative interactions: these layers improve performance and suggest new directions for designing network architectures.
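
To make the construction concrete, here is a minimal PyTorch sketch of a bilinear multiplicative interaction layer in the spirit of the paper: one input stream generates a weight matrix that multiplicatively modulates the other. The class name, dimensions, and initialization are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class MultiplicativeInteraction(nn.Module):
    """Bilinear multiplicative interaction between two input streams.

    Computes f(x, z) = z^T W x + U x + V z + b, where W is a 3-D tensor,
    so the contribution of x is modulated multiplicatively by z
    (generalizing gating and hypernetwork-style conditioning).
    """
    def __init__(self, x_dim: int, z_dim: int, out_dim: int):
        super().__init__()
        # illustrative small-scale initialization
        self.W = nn.Parameter(torch.randn(z_dim, x_dim, out_dim) * 0.01)
        self.U = nn.Linear(x_dim, out_dim, bias=False)
        self.V = nn.Linear(z_dim, out_dim, bias=True)

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # z-conditioned weight matrix: (batch, x_dim, out_dim)
        Wz = torch.einsum('bz,zxo->bxo', z, self.W)
        bilinear = torch.einsum('bx,bxo->bo', x, Wz)
        return bilinear + self.U(x) + self.V(z)

# e.g. fusing a 64-d language feature with a 32-d vision feature:
mi = MultiplicativeInteraction(x_dim=64, z_dim=32, out_dim=128)
out = mi(torch.randn(8, 64), torch.randn(8, 32))  # -> (8, 128)
```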

  • 2.Tensor fusion network for multimodal sentiment analysis

  • 3.On the Benefits of Early Fusion in Multimodal Representation Learning

  • 4.Extending long short-term memory for multi-view structured learning

  • 5.Devise: A deep visual-semantic embedding model

  • 6.Learning transferable visual models from natural language supervision

  • 7.Order-embeddings of images and language

  • 8.Learning Concept Taxonomies from Multi-modal Data

  • 9.Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!

  • 10.Learning factorized multimodal representations

  • 11.Multimodal clustering networks for self-supervised learning from unlabeled videos

  • 12.Deep multimodal subspace clustering networks

Alignment (10 papers)

1.Visual Referring Expression Recognition: What Do Systems Actually Learn?

Summary: This paper presents an empirical analysis of state-of-the-art referring expression recognition systems and finds that they can ignore linguistic structure, instead relying on shallow correlations introduced during data selection and annotation. As an example, a system trained and tested on input images without the input referring expression still reaches 71.2% precision in its top-2 predictions, and a system that predicts only the object category from the input reaches 84.2% top-2 precision. These results show that, to make real progress on grounded language tasks, it is essential to carefully analyze what models are actually learning and how datasets are constructed.
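
A simple way to run this kind of sanity check on your own model is to re-evaluate it with the language input ablated and compare top-k precision. The sketch below assumes a hypothetical `model(image, expression)` interface and data loader; all names are illustrative, not the paper's code.

```python
import torch

@torch.no_grad()
def topk_precision(model, loader, k=2, drop_language=False):
    """Top-k precision of a referring-expression model, optionally with the
    language input ablated, to expose shortcut learning.
    Assumes a hypothetical model(image, expression) -> (batch, num_boxes)."""
    hits, total = 0, 0
    for image, expression, target_box in loader:
        if drop_language:
            expression = torch.zeros_like(expression)  # blank the expression
        scores = model(image, expression)              # (batch, num_boxes)
        topk = scores.topk(k, dim=-1).indices
        hits += (topk == target_box.unsqueeze(-1)).any(-1).sum().item()
        total += target_box.numel()
    return hits / total

# If precision barely drops with drop_language=True, the model is relying
# on image-only biases rather than on the referring expression.
```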

  • 2.Unsupervised multimodal representation learning across medical images and reports

  • 3.Clip-event: Connecting text and images with event structures

  • 4.Learning by aligning videos in time

  • 5.Multimodal adversarial network for cross-modal retrieval

  • 6.Videobert: A joint model for video and language representation learning

  • 7.Visualbert: A simple and performant baseline for vision and language

  • 8.Decoupling the role of data, attention, and losses in multimodal transformers

  • 9.Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks

  • 10.MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences

Reasoning (18 papers)

1.Neural module networks

Summary: This paper describes a procedure for constructing and learning neural module networks, which compose jointly trained neural "modules" into deep networks for question answering. The approach decomposes each question into its linguistic substructure and uses that structure to dynamically instantiate a modular network (with reusable components for recognizing dogs, classifying colors, and so on). The resulting composite network is trained jointly. The authors evaluate the method on two challenging visual question answering datasets, achieving state-of-the-art results both on the VQA natural-image dataset and on a new dataset of complex questions about abstract shapes.
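
The sketch below illustrates the composition idea with two toy modules: an attention module selected for a noun and an answer module stacked on top of it. The module designs, shapes, and the hand-built layout are simplified assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class Find(nn.Module):
    """Attends over image regions for one word (e.g. 'dog')."""
    def __init__(self, feat_dim, vocab_size, embed_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.score = nn.Linear(feat_dim + embed_dim, 1)

    def forward(self, feats, word):              # feats: (regions, feat_dim)
        w = self.embed(word).expand(feats.size(0), -1)
        logits = self.score(torch.cat([feats, w], -1)).squeeze(-1)
        return torch.softmax(logits, 0)           # attention over regions

class Describe(nn.Module):
    """Answers from attention-weighted features (e.g. 'what color?')."""
    def __init__(self, feat_dim, num_answers):
        super().__init__()
        self.out = nn.Linear(feat_dim, num_answers)

    def forward(self, feats, attention):
        return self.out((attention.unsqueeze(-1) * feats).sum(0))

# Dynamically assemble a network from the question's parse,
# e.g. "what color is the dog" -> Describe(Find[dog]):
find, describe = Find(512, 1000), Describe(512, 20)
feats = torch.randn(36, 512)                      # 36 region features
answer_logits = describe(feats, find(feats, torch.tensor(7)))
```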

  • 2.Dynamic memory networks for visual and textual question answering

  • 3.A Survey of Reinforcement Learning Informed by Natural Language

  • 4.Mfas: Multimodal fusion architecture search

  • 5.Multi-view intact space learning

  • 6.Neuro-Symbolic Visual Reasoning: Disentangling Visual from Reasoning

  • 7.Probabilistic neural symbolic models for interpretable visual question answering

  • 8.Learning by abstraction: The neural state machine

  • 9.Socratic models: Composing zero-shot multimodal reasoning with language

  • 10.Vqa-lol: Visual question answering under the lens of logic

  • 11.Multimodal logical inference system for visual-textual entailment

  • 12.Towards causal vqa: Revealing and reducing spurious correlations by invariant and covariant semantic editing

  • 13.Counterfactual vqa: A cause-effect look at language bias

  • 14.Exploring visual relationship for image captioning

  • 15.KAT: A Knowledge Augmented Transformer for Vision-and-Language

  • 16.Building a large-scale multimodal knowledge base system for answering visual queries

  • 17.Visualcomet: Reasoning about the dynamic context of a still image

  • 18.From Recognition to Cognition: Visual Commonsense Reasoning


Generation (12 papers)

1.Multimodal summarization of complex sentences

復(fù)雜句的多模態(tài)總結(jié)

簡述:論文提出了將復(fù)雜句子自動說明為多模態(tài)總結(jié)的想法,這些總結(jié)結(jié)合了圖片、結(jié)構(gòu)和簡化壓縮文本。除了圖片之外,多模態(tài)總結(jié)還提供了關(guān)于發(fā)生了什么、誰做的、對誰做和如何做的額外線索,這可能有助于閱讀困難的人或希望快速瀏覽的人。作者提出了ROC-MMS,一個(gè)用于自動創(chuàng)建復(fù)雜句子的多模態(tài)總結(jié)(MMS)的系統(tǒng),通過生成圖片、文本摘要和結(jié)構(gòu),作者發(fā)現(xiàn),僅憑圖片不足以幫助人們理解大多數(shù)句子,尤其是對不熟悉該領(lǐng)域的讀者而言。

  • 2.Extractive Text-Image Summarization Using Multi-Modal RNN

  • 3.Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video

  • 4.Multimodal abstractive summarization for how2 videos

  • 5.Deep fragment embeddings for bidirectional image sentence mapping

  • 6.Phrase-based image captioning

  • 7.Style transfer for co-speech gesture animation: A multi-speaker conditional-mixture approach

  • 8.You said that?: Synthesising talking faces from audio

  • 9.Zero-shot text-to-image generation

  • 10.Stochastic video generation with a learned prior

  • 11.Parallel wavenet: Fast high-fidelity speech synthesis

  • 12.Arbitrary talking face generation via attentional audio-visual coherence learning

Transfer (13 papers)

1.Integrating Multimodal Information in Large Pretrained Transformers

Summary: This paper proposes an attachment called the Multimodal Adaptation Gate (MAG), which can be added to BERT and XLNet so that they accept multimodal nonverbal data during fine-tuning. The MAG works by generating a shift to the internal representations of BERT and XLNet, conditioned on the visual and acoustic modalities. Experiments show that fine-tuning MAG-BERT and MAG-XLNet significantly improves sentiment analysis performance, beating previous baselines as well as language-only fine-tuned BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time.
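
Here is a minimal sketch of the gating mechanism following the equations described in the paper: relu gates computed from the concatenated language and nonverbal features weight a projected shift, which is norm-clipped before being added to the word representation. The hidden sizes, feature dimensions, and beta value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAG(nn.Module):
    """Multimodal Adaptation Gate: shifts a word representation h using
    visual (v) and acoustic (a) features (a sketch of the paper's equations;
    sizes and beta are illustrative)."""
    def __init__(self, h_dim, v_dim, a_dim, beta=1.0, eps=1e-6):
        super().__init__()
        self.gate_v = nn.Linear(h_dim + v_dim, h_dim)
        self.gate_a = nn.Linear(h_dim + a_dim, h_dim)
        self.proj_v = nn.Linear(v_dim, h_dim)
        self.proj_a = nn.Linear(a_dim, h_dim)
        self.norm = nn.LayerNorm(h_dim)
        self.beta, self.eps = beta, eps

    def forward(self, h, v, a):
        g_v = F.relu(self.gate_v(torch.cat([h, v], -1)))   # visual gate
        g_a = F.relu(self.gate_a(torch.cat([h, a], -1)))   # acoustic gate
        shift = g_v * self.proj_v(v) + g_a * self.proj_a(a)
        # scale the shift so it cannot overwhelm the language embedding
        alpha = torch.clamp(h.norm(dim=-1, keepdim=True)
                            / (shift.norm(dim=-1, keepdim=True) + self.eps),
                            max=1.0) * self.beta
        return self.norm(h + alpha * shift)

# e.g. per-token fusion inside a BERT layer (feature dims are assumptions):
mag = MAG(h_dim=768, v_dim=47, a_dim=74)
out = mag(torch.randn(2, 20, 768), torch.randn(2, 20, 47), torch.randn(2, 20, 74))
```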

  • 2.Multimodal few-shot learning with frozen language models

  • 3.HighMMT: Towards Modality and Task Generalization for High-Modality Representation Learning

  • 4.FLAVA: A Foundational Language And Vision Alignment Model

  • 5.Pretrained transformers as universal computation engines

  • 6.Scaling up visual and visual language representation learning with noisy text supervision

  • 7.Foundations of multimodal co-learning

  • 8.Found in translation: Learning robust joint representations by cyclic translations between modalities

  • 9.Vokenization: Improving Language Understanding with Contextualized, VisualGrounded Supervision

  • 10.Combining labeled and unlabeled data with co-training

  • 11.Cross-modal data programming enables rapid medical machine learning

  • 12.An information theoretic framework for multi-view learning

  • 13.Comprehensive Semi-Supervised Multi-Modal Learning

Quantification (16 papers)

1.Perceptual Score: What Data Modalities Does Your Model Perceive?

Summary: This paper introduces a new metric, the perceptual score, for assessing how strongly a model relies on different subsets of its input features, i.e., its modalities. Using the perceptual score, the authors find a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art visual question answering and multimodal dialogue models tend to perceive visual data less than their predecessors. This trend is worrying, since answers are increasingly inferred from textual cues alone. The perceptual score also helps analyze model biases by decomposing the score into contributions from data subsets. The authors hope to open a discussion about the perceptual capabilities of multimodal models and encourage the community working on multimodal classifiers to start quantifying perception via the proposed score.
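
The core idea can be sketched as a permutation test: permute one modality's inputs across samples so they no longer match the labels, and measure the accuracy drop. The `model(images, texts)` interface and trial count below are illustrative assumptions, not the paper's exact protocol.

```python
import torch

@torch.no_grad()
def perceptual_score(model, images, texts, labels, modality="image", trials=5):
    """Estimate how much a classifier relies on one modality by measuring the
    accuracy drop when that modality is permuted across samples.
    Assumes a hypothetical model(images, texts) -> (batch, num_classes)."""
    def accuracy(img, txt):
        return (model(img, txt).argmax(-1) == labels).float().mean().item()

    base = accuracy(images, texts)
    drops = []
    for _ in range(trials):
        perm = torch.randperm(len(labels))       # break input-label pairing
        if modality == "image":
            drops.append(base - accuracy(images[perm], texts))
        else:
            drops.append(base - accuracy(images, texts[perm]))
    return sum(drops) / trials  # near 0 => the modality is barely used
```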

  • 2.Multimodal explanations: Justifying decisions and pointing to the evidence

  • 3.Women also snowboard: Overcoming bias in captioning models

  • 4.FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment

  • 5.Smil: Multimodal learning with severely missing modality

  • 6.VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

  • 7.Behind the scene: Revealing the secrets of pre-trained vision-and-language models

  • 8.Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

  • 9.Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!

  • 10.MultiViz: Towards Visualizing and Understanding Multimodal Models

  • 11.M2Lens: Visualizing and explaining multimodal models for sentiment analysis

  • 12.HighMMT: Towards Modality and Task Generalization for High-Modality Representation Learning

  • 13.One model to learn them all

  • 14.What Makes Training Multi-Modal Classification Networks Hard?

  • 15.Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

  • 16.MultiBench: Multiscale Benchmarks for Multimodal Representation Learning


