2023.03.20, 03.21, 03.22 ArXiv Picks
Areas of focus:
AIGC
3D computer vision learning
Fine-grained learning
GNN
Other
Disclaimer
With many papers and limited time, this column cannot give full walkthroughs of each article; we only pick out papers that match PaperABC's research interests and current hot topics. If your research topic relates to the areas above, this column can serve as your source of paper updates or as a paper reading list.

Paper list:
Today (03.22): 142 new papers on ArXiv
03.21: 221 new papers on ArXiv
03.20: 99 new papers on ArXiv

AIGC
Diffusion-based Document Layout Generation
https://arxiv.org/pdf/2303.10787.pdf

Uses a diffusion model for document layout generation.
LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
https://arxiv.org/pdf/2303.11589.pdf

Text2Tex: Text-driven Texture Synthesis via Diffusion Models
https://arxiv.org/pdf/2303.11396.pdf

Uses a diffusion model for text-guided texture synthesis on 3D objects. There really is a research angle for everything!
SKED: Sketch-guided Text-based 3D Editing
https://arxiv.org/pdf/2303.10735.pdf

Work from Simon Fraser University and NVIDIA.
Combines sketches and diffusion models with NeRF to achieve controllable, region-level editing of 3D shapes guided by sketches and text.
This direction is getting seriously crowded!
3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
https://arxiv.org/pdf/2303.10406.pdf

This paper proposes a unified generative 3D shape prior, built mainly on a VQ-VAE and a diffusion model. It supports many tasks, such as point cloud completion, unconditional shape generation, and cross-modal shape generation.
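To make the VQ-VAE-plus-diffusion recipe in the summary concrete, here is a minimal sketch of the generic pipeline: a codebook quantizer turns shape latents into discrete tokens, and a discrete diffusion prior is then trained over those tokens. Everything here (sizes, module names such as `VectorQuantizer`) is an illustrative assumption, not the 3DQD implementation.

```python
# Sketch only: a generic VQ-VAE bottleneck; all sizes and names are assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Maps continuous part latents to their nearest codebook entries."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                 # z: (B, N, dim) part latents
        flat = z.reshape(-1, z.size(-1))                  # (B*N, dim)
        dists = torch.cdist(flat, self.codebook.weight)   # (B*N, num_codes)
        idx = dists.argmin(dim=-1).view(z.shape[:-1])     # (B, N) discrete shape tokens
        z_q = self.codebook(idx)                          # quantized latents
        z_q = z + (z_q - z).detach()                      # straight-through gradient trick
        return z_q, idx

# A discrete diffusion prior would then corrupt the token grid `idx` (e.g. randomly
# replace tokens with a [MASK] id) and train a transformer to denoise it, optionally
# conditioned on partial shapes or text, which is how one model can cover completion,
# unconditional generation, and cross-modal generation.
```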
Vox-E: Text-guided Voxel Editing of 3D Objects
https://arxiv.org/pdf/2303.12048.pdf

This area really is crowded; a piece of work from Google. Uses an LDM to perform text-driven editing of 3D objects, operating on voxels.
Zero-1-to-3: Zero-shot One Image to 3D Object
https://arxiv.org/pdf/2303.11328.pdf

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
https://arxiv.org/pdf/2303.11989.pdf


Diffusion models
Object-Centric Slot Diffusion
https://arxiv.org/pdf/2303.10834.pdf

This paper explores how diffusion models can support object-centric understanding of complex scenes, proposing a diffusion-based object-centric slot attention.
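Since the summary hinges on slot attention, here is a minimal sketch of the standard Slot Attention update (Locatello et al., 2020) that such a model builds on; it is not the paper's diffusion-based variant, and all dimensions and module names are assumptions.

```python
# Sketch only: vanilla Slot Attention; the paper replaces the decoder with a
# latent diffusion model, which is not shown here.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=7, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_init = nn.Parameter(torch.randn(1, num_slots, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, feats):                    # feats: (B, N, dim) encoder features
        B = feats.size(0)
        slots = self.slots_init.expand(B, -1, -1)
        k, v = self.to_k(feats), self.to_v(feats)
        for _ in range(self.iters):
            q = self.to_q(slots)
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)  # slots compete per location
            attn = attn / attn.sum(dim=-1, keepdim=True)
            updates = attn @ v                                          # (B, num_slots, dim)
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view(B, self.num_slots, -1)
        return slots  # object-wise latents that a (diffusion) decoder could reconstruct from
```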

VLP
EVA-02: A Visual Representation for Neon Genesis
https://arxiv.org/pdf/2303.11331.pdf

The EVA series from Yue Cao's group. All I can say is that it is extremely strong.

VLP for 3D
Grounding 3D Object Affordance from 2D Interactions in Images
https://arxiv.org/pdf/2303.10437.pdf

This paper learns interaction cues from 2D images and uses them to support 3D object affordance grounding.
3D Concept Learning and Reasoning from Multi-View Images
https://arxiv.org/pdf/2303.11327.pdf

A new benchmark for 3D multi-view visual question answering. A fresh direction worth digging into.
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
https://arxiv.org/pdf/2303.11313.pdf

Extends CLIP to the 3D domain.

Medical imaging
HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image Segmentation
https://arxiv.org/pdf/2303.10333.pdf

Applies the masked image modeling (MIM) idea to 3D medical images.
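As a reminder of what the MIM idea looks like when lifted from 2D images to 3D volumes, here is a minimal sketch of random cubic-patch masking; the patch size, mask ratio, and function name are illustrative assumptions, not HybridMIM's actual hybrid scheme.

```python
# Sketch only: MAE-style random masking of cubic patches in a 3D volume.
import torch

def random_mask_3d(volume, patch=16, mask_ratio=0.75):
    """Split a (B, C, D, H, W) volume into cubic patches and mask a random subset.

    Returns the flattened patches and a boolean mask marking which patches a
    reconstruction network would have to predict from the visible ones.
    """
    B, C, D, H, W = volume.shape
    patches = (volume
               .unfold(2, patch, patch)
               .unfold(3, patch, patch)
               .unfold(4, patch, patch)              # (B, C, nd, nh, nw, p, p, p)
               .reshape(B, C, -1, patch, patch, patch)
               .permute(0, 2, 1, 3, 4, 5)
               .reshape(B, -1, C * patch ** 3))      # (B, num_patches, C*p^3)
    num_patches = patches.size(1)
    num_masked = int(mask_ratio * num_patches)
    noise = torch.rand(B, num_patches)
    mask = noise.argsort(dim=1).argsort(dim=1) < num_masked   # True = masked patch
    return patches, mask

# Example: a 96^3 crop gives 6*6*6 = 216 patches, 162 of them masked.
patches, mask = random_mask_3d(torch.randn(2, 1, 96, 96, 96))
```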