
CVPR'23: Latest 89 Papers in One Download | Covering Video Object Detection, Keypoint Detection, Anomaly Detection, and More

2023-04-04 11:46 Author: 極市平臺

Editor: 極市平臺

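The CVPR 2023 acceptance results are out: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the conference itself, to help readers find and learn from cutting-edge computer vision work sooner, 極市 is tracking the latest CVPR 2023 papers. This includes papers organized by research direction, a roundup of released code, and live-streamed technical talks on the papers.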

The by-topic collection of CVPR 2023 papers is updated continuously in the 極市 community, and 693 papers have been added so far. Project page: https://www.cvmart.net/community/detail/7422

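Below are the most recently added CVPR 2023 papers, covering detection, segmentation, faces, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and other directions.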

Download pack: https://www.cvmart.net/community/detail/7480

2D Object Detection

[1]What Can Human Sketches Do for Object Detection?
paper：https://arxiv.org/abs/2303.15149

Video Object Detection

[1]Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
paper：https://arxiv.org/abs/2303.14768
code：https://github.com/tencentyouturesearch/highlightdetection-clc

[2]3D Video Object Detection with Learnable Object-Centric Global Optimization
paper：https://arxiv.org/abs/2303.15416
code：https://github.com/jiaweihe1996/ba-det

3D Object Detection

[1]Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
paper：https://arxiv.org/abs/2303.14311

[2]Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
paper：https://arxiv.org/abs/2303.14488
code：https://github.com/cuogeihong/ceasc

[3]Viewpoint Equivariance for Multi-View 3D Object Detection
paper：https://arxiv.org/abs/2303.14548
code：https://github.com/tri-ml/vedet

Camouflaged Object Detection

[1]Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
paper：https://arxiv.org/abs/2303.14816
code：https://github.com/zhouhuang23/fspnet

Keypoint Detection

[1]Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
paper：https://arxiv.org/abs/2303.15270

Anomaly Detection

[1]WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
paper：https://arxiv.org/abs/2303.14814

[2]SimpleNet: A Simple Network for Image Anomaly Detection and Localization
paper：https://arxiv.org/abs/2303.15140
code：https://github.com/donaldrr/simplenet

[3]Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
paper：https://arxiv.org/abs/2303.15167

Image Segmentation

[1]Parameter Efficient Local Implicit Image Function Network for Face Segmentation
paper：https://arxiv.org/abs/2303.15122

[2]EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
paper：https://arxiv.org/abs/2303.15440

Panoptic Segmentation

[1]You Only Segment Once: Towards Real-Time Panoptic Segmentation
paper：https://arxiv.org/abs/2303.14651
code：https://github.com/hujiecpp/yoso

Semantic Segmentation

[1]Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
paper：https://arxiv.org/abs/2303.14360

[2]Instant Domain Augmentation for LiDAR Semantic Segmentation
paper：https://arxiv.org/abs/2303.14378

[3]Leveraging Hidden Positives for Unsupervised Semantic Segmentation
paper：https://arxiv.org/abs/2303.15014
code：https://github.com/hynnsk/hp

Instance Segmentation

[1]DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
paper：https://arxiv.org/abs/2303.14373

[2]The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
paper：https://arxiv.org/abs/2303.15062

Video Object Segmentation

[1]Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
paper：https://arxiv.org/abs/2303.14361
code：https://github.com/shaoyuanlo/stpl

Dense Prediction

[1]Ensemble-based Blackbox Attacks on Dense Prediction
paper：https://arxiv.org/abs/2303.14304

[2]Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
paper：https://arxiv.org/abs/2303.14960
code：https://github.com/PaddlePaddle/PaddleDetection

Video Processing

[1]Affordance Grounding from Demonstration Video to Target Image
paper：https://arxiv.org/abs/2303.14644
code：https://github.com/showlab/afformer

[2]Frame Flexible Network
paper：https://arxiv.org/abs/2303.14817
code：https://github.com/bespontaneous/ffn

[3]Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
paper：https://arxiv.org/abs/2303.15043
code：https://github.com/shangwei5/vidue

Human Parsing/Human Pose Estimation

[1]ScarceNet: Animal Pose Estimation with Scarce Annotations
paper：https://arxiv.org/abs/2303.15023
code：https://github.com/chaneyddtt/scarcenet

[2]Human Pose Estimation in Extremely Low-Light Conditions
paper：https://arxiv.org/abs/2303.15410

Super Resolution

[1]Learning Generative Structure Prior for Blind Text Image Super-resolution
paper：https://arxiv.org/abs/2303.14726
code：https://github.com/csxmli2016/marconet

[2]Learning to Zoom and Unzoom
paper：https://arxiv.org/abs/2303.15390

Image Restoration/Image Enhancement/Image Reconstruction

[1]Visual-Tactile Sensing for In-Hand Object Reconstruction
paper：https://arxiv.org/abs/2303.14498

[2]3D-Aware Multi-Class Image-to-Image Translation with NeRFs
paper：https://arxiv.org/abs/2303.15012
code：https://github.com/sen-mao/3di2i-translation

Image Shadow Removal/Image Reflection Removal

[1]Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
paper：https://arxiv.org/abs/2303.15046
code：https://github.com/ykdai/BracketFlare

Image Denoising/Deblurring/Deraining/Dehazing

[1]Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
paper：https://arxiv.org/abs/2303.14218
code：https://github.com/yuzheng9/c2pnet

[2]Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
paper：https://arxiv.org/abs/2303.14934
code：https://github.com/nagejacob/spatiallyadaptivessid

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[1]OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
paper：https://arxiv.org/abs/2303.14662

[2]High-fidelity 3D Human Digitization from Single 2K Resolution Images
paper：https://arxiv.org/abs/2303.15108

[3]FaceLit: Neural 3D Relightable Faces
paper：https://arxiv.org/abs/2303.15437

Image & Video Retrieval/Video Understanding

[1]Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
paper：https://arxiv.org/abs/2303.14348
code：https://github.com/buptlinfy/zse-sbir

[2]Selective Structured State-Spaces for Long-Form Video Understanding
paper：https://arxiv.org/abs/2303.14526

Action/Activity Recognition, Detection, Segmentation, and Localization

[1]3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
paper：https://arxiv.org/abs/2303.14474

Person Re-Identification/Detection

[1]Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
paper：https://arxiv.org/abs/2303.14481
code：https://github.com/zyk100/llcm

Medical Imaging

[1]Label-Free Liver Tumor Segmentation
paper：https://arxiv.org/abs/2303.14869
code：https://github.com/mrgiovanni/synthetictumors

[2]Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
paper：https://arxiv.org/abs/2303.15038

Image Generation/Image Synthesis

[1]Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
paper：https://arxiv.org/abs/2303.14377

[2]Freestyle Layout-to-Image Synthesis
paper：https://arxiv.org/abs/2303.14412
code：https://github.com/essunny310/freestylenet

Point Cloud

[1]Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
paper：https://arxiv.org/abs/2303.14505

[2]NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
paper：https://arxiv.org/abs/2303.15126
code：https://github.com/ispc-lab/neuralpci

[3]Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
paper：https://arxiv.org/abs/2303.15385

3D Reconstruction

[1]PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
paper：https://arxiv.org/abs/2303.14587
code：https://github.com/shuhongchen/panic3d-anime-reconstruction

Scene Reconstruction/View Synthesis/Novel View Synthesis

[1]DyLiN: Making Light Field Networks Dynamic
paper：https://arxiv.org/abs/2303.14243

[2]FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
paper：https://arxiv.org/abs/2303.14368

[3]NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
paper：https://arxiv.org/abs/2303.14435
code：https://github.com/jokeryan/nerf-ds

[4]SUDS: Scalable Urban Dynamic Scenes
paper：https://arxiv.org/abs/2303.14536

[5]JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
paper：https://arxiv.org/abs/2303.15427

Knowledge Distillation

[1]Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
paper：https://arxiv.org/abs/2303.14666

Neural Network Structure Design

[1]Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
paper：https://arxiv.org/abs/2303.14404
code：https://github.com/akhtarvision/bpc_calibration

[2]Compacting Binary Neural Networks by Sparse Kernel Selection
paper：https://arxiv.org/abs/2303.14470

Graph Neural Networks (GNN)

[1]Mind the Label Shift of Augmentation-based Graph OOD Generalization
paper：https://arxiv.org/abs/2303.14859

Image Compression

[1]Learned Image Compression with Mixed Transformer-CNN Architectures
paper：https://arxiv.org/abs/2303.14978
code：https://github.com/jmliu206/lic_tcm

Model Training/Generalization

[1]Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
paper：https://arxiv.org/abs/2303.14382
code：https://github.com/yichen928/activeft

[2]CFA: Class-wise Calibrated Fair Adversarial Training
paper：https://arxiv.org/abs/2303.14460
code：https://github.com/pku-ml/cfa

Vision-Language

[1]VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
paper：https://arxiv.org/abs/2303.14302

[2]Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
paper：https://arxiv.org/abs/2303.14369
code：https://github.com/jpthu17/HBI

[3]IFSeg: Image-free Semantic Segmentation via Vision-Language Model
paper：https://arxiv.org/abs/2303.14396
code：https://github.com/alinlab/ifseg

[4]Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
paper：https://arxiv.org/abs/2303.14968
code：https://github.com/zwx8981/liqe

Dataset

[1]CelebV-Text: A Large-Scale Facial Text-Video Dataset
paper：https://arxiv.org/abs/2303.14717
code：https://github.com/CelebV-Text/CelebV-Text

[2]On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
paper：https://arxiv.org/abs/2303.14840
code：https://github.com/junggy/hammer-dataset

[3]Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
paper：https://arxiv.org/abs/2303.15166
code：https://github.com/dreemurr-t/baid

[4]Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
paper：https://arxiv.org/abs/2303.15417
code：https://github.com/jaehakim97/blurhand_release

Few-shot Learning/Zero-shot Learning

[1]Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
paper：https://arxiv.org/abs/2303.14652

[2]ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
paper：https://arxiv.org/abs/2303.14679
code：https://github.com/casia-iva-lab/zbs

[3]Learning Attention as Disentangler for Compositional Zero-shot Learning
paper：https://arxiv.org/abs/2303.15111
code：https://github.com/haoosz/ade-czsl

[4]Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
paper：https://arxiv.org/abs/2303.15322
code：https://github.com/manliucoder/psvma

Continual Learning/Life-long Learning

[1]Preserving Linear Separability in Continual Learning by Backward Feature Projection
paper：https://arxiv.org/abs/2303.14595

Scene Graph Prediction

[1]VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
paper：https://arxiv.org/abs/2303.14408
code：https://github.com/wz7in/cvpr2023-vlsat

Visual Localization/Pose Estimation

[1]Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
paper：https://arxiv.org/abs/2303.15274

Visual Reasoning/VQA

[1]MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
paper：https://arxiv.org/abs/2303.14933
code：https://github.com/zzc-1998/md-vqa

Transfer Learning/Domain Adaptation

[1]BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
paper：https://arxiv.org/abs/2303.14773
code：https://github.com/changdaeoh/blackvip

Contrastive Learning

[1]Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
paper：https://arxiv.org/abs/2303.14865

Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning

[1]Detecting Backdoors in Pre-trained Encoders
paper：https://arxiv.org/abs/2303.15180
code：https://github.com/giantseaweed/decree

Neural Network Interpretability

[1]IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
paper：https://arxiv.org/abs/2303.14242
code：https://github.com/yangruo1226/idgi

Federated Learning

[1]The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
paper：https://arxiv.org/abs/2303.14868

Other

[1]DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
paper：https://arxiv.org/abs/2303.14585
code：https://github.com/yizhiwang96/deepvecfont-v2

[2]PDPP:Projected Diffusion for Procedure Planning in Instructional Videos
paper：https://arxiv.org/abs/2303.14676

[3]Disentangling Writer and Character Styles for Handwriting Generation
paper：https://arxiv.org/abs/2303.14736
code：https://github.com/dailenson/sdt

[4]Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
paper：https://arxiv.org/abs/2303.14926

[5]DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
paper：https://arxiv.org/abs/2303.15101
code：https://github.com/lmozart/cvpr2023-dani-net

[6]Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
paper：https://arxiv.org/abs/2303.15266
code：https://github.com/zhourixin/bronze-ding

[7]Handwritten Text Generation from Visual Archetypes
paper：https://arxiv.org/abs/2303.15269
code：https://github.com/aimagelab/vatr

Tags: artificial intelligence, computer vision, deep learning