CVPR'23: 125 Latest Papers Organized by Research Area | Detection, Segmentation, Face, Video Processing, Medical Imaging, Neural Net

2023-03-27 10:06 Author: 極市平臺

Editor | 極市平臺

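CVPR 2023 acceptance results are out: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the main conference, and to help readers find and study cutting-edge computer vision work more quickly, 極市 is tracking the latest CVPR 2023 papers, including papers grouped by research area, code roundups, and livestreamed technical talks on the papers.
The CVPR 2023 paper collection, organized by research area, is being updated continuously in the 極市 community, with 381 papers added so far. Project page: https://www.cvmart.net/community/detail/7422
Below are the most recently added CVPR 2023 papers, covering detection, segmentation, face, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and other areas.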

- Detection
- Segmentation
- Video Processing
- Estimation
- Face
- Object Tracking
- Image & Video Retrieval/Video Understanding
- Medical Imaging
- GAN/Generative/Adversarial
- Image Generation/Image Synthesis
- Neural Network Structure Design
- Data Processing
- Model Training/Generalization
- Image Feature Extraction and Matching
- Visual Representation Learning
- Model Evaluation
- Multi-Modal Learning
- Vision-based Prediction
- Dataset
- Few-shot Learning/Zero-shot Learning
- Continual Learning
- Transfer Learning/Domain Adaptation
- Scene Graph
- Visual Localization/Pose Estimation
- Visual Reasoning/VQA
- Contrastive Learning
- Reinforcement Learning
- Robotics
- Semi-supervised/Weakly-supervised/Unsupervised/Self-supervised Learning
- Other

Detection

2D Object Detection

[1]Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
paper：https://arxiv.org/abs/2303.05892

3D Object Detection

[1]Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
paper：https://arxiv.org/abs/2303.05886

[2]PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
paper：https://arxiv.org/abs/2303.08129
code：https://github.com/blvlab/pimae

[3]MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
paper：https://arxiv.org/abs/2303.08316

[4]CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
paper：https://arxiv.org/abs/2303.10209
code：https://github.com/PaddlePaddle/Paddle3D

[5]Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
paper：https://arxiv.org/abs/2303.08686

[6]AeDet: Azimuth-invariant Multi-view 3D Object Detection
paper：https://arxiv.org/abs/2211.12501
code：https://github.com/fcjian/AeDet

Anomaly Detection

[1]DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
paper：https://arxiv.org/abs/2211.11317

Segmentation

Panoptic Segmentation

[1]UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
paper：https://arxiv.org/abs/2206.15083

Semantic Segmentation

[1]MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
paper：https://arxiv.org/abs/2303.08600
code：https://github.com/jialeli1/lidarseg3d

[2]Side Adapter Network for Open-Vocabulary Semantic Segmentation
paper：https://arxiv.org/abs/2302.12242
code：https://github.com/mendelxu/san

[3]Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
paper：https://arxiv.org/abs/2211.10206

Instance Segmentation

[1]FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
paper：https://arxiv.org/abs/2303.08594

[2]SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
paper：https://arxiv.org/abs/2303.08578
code：https://github.com/lslrh/sim

[3]DynaMask: Dynamic Mask Selection for Instance Segmentation
paper：https://arxiv.org/abs/2303.07868
code：https://github.com/lslrh/dynamask

Video Object Segmentation

[1]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper：https://arxiv.org/abs/2303.07815

[2]InstMove: Instance Motion for Object-centric Video Segmentation
paper：https://arxiv.org/abs/2303.08132
code：https://github.com/wjf5203/vnext

[3]Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
paper：https://arxiv.org/abs/2303.10100

Video Processing

[1]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper：https://arxiv.org/abs/2303.07815

[2]InstMove: Instance Motion for Object-centric Video Segmentation
paper：https://arxiv.org/abs/2303.08132
code：https://github.com/wjf5203/vnext

[3]Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
paper：https://arxiv.org/abs/2303.09757
code：https://github.com/jiaqixuac/map-net

[4]Blind Video Deflickering by Neural Filtering with a Flawed Atlas
paper：https://arxiv.org/abs/2303.08120
code：https://github.com/chenyanglei/all-in-one-deflicker

Video Generation/Video Synthesis

[1]3D Cinemagraphy from a Single Image
paper：https://arxiv.org/abs/2303.05724

[2]VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
paper：https://arxiv.org/abs/2303.08320
code：https://github.com/modelscope/modelscope

Video Super-Resolution

[1]Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
paper：https://arxiv.org/abs/2303.08331

Estimation

Optical Flow/Motion Estimation

[1]Rethinking Optical Flow from Geometric Matching Consistent Perspective
paper：https://arxiv.org/abs/2303.08384
code：https://github.com/dqiaole/matchflow

Depth Estimation

[1]Fully Self-Supervised Depth Estimation from Defocus Clue
paper：https://arxiv.org/abs/2303.10752
code：https://github.com/ehzoahis/dered

Human Parsing/Human Pose Estimation

[1]Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
paper：https://arxiv.org/abs/2303.08475

[2]Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
paper：https://arxiv.org/abs/2302.14338

Gesture Estimation

[1]CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
paper：https://arxiv.org/abs/2303.05725

Image Processing

[1]DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
paper：https://arxiv.org/abs/2303.06285
code：https://github.com/yueming6568/deltaedit

Image Restoration/Image Enhancement/Image Reconstruction

[1]Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank
paper：https://arxiv.org/abs/2303.09101
code：https://github.com/huang-shirui/semi-uir

[2]ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
paper：https://arxiv.org/abs/2303.05938
code：https://github.com/zhengdiyu/arbitrary-hands-3d-reconstruction

Style Transfer

[1]StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
paper：https://arxiv.org/abs/2303.10598

[2]Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN
paper：https://arxiv.org/abs/2204.14079
code：https://github.com/LeeDongYeun/FixNoise

Face

Facial Recognition/Detection

[1]Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection
paper：https://arxiv.org/abs/2303.08545

[2]Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling
paper：https://arxiv.org/abs/2303.08419

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[1]Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
paper：https://arxiv.org/abs/2106.09614
code：https://github.com/unibas-gravis/Occlusion-Robust-MoFA

Object Tracking

[1]MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
paper：https://arxiv.org/abs/2303.10404

[2]Visual Prompt Multi-Modal Tracking
paper：https://arxiv.org/abs/2303.10826
code：https://github.com/jiawen-zhu/vipt

Image & Video Retrieval/Video Understanding

[1]Data-Free Sketch-Based Image Retrieval
paper：https://arxiv.org/abs/2303.07775

[2]DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
paper：https://arxiv.org/abs/2303.07929

[3]Dual-path Adaptation from Image to Video Transformers
paper：https://arxiv.org/abs/2303.09857
code：https://github.com/park-jungin/dualpath

Image/Video Captioning

[1]Dual-Stream Transformer for Generic Event Boundary Captioning
paper：https://arxiv.org/abs/2207.03038
code：https://github.com/gx77/dual-stream-transformer-for-generic-event-boundary-captioning

Action/Activity Recognition, Detection, Segmentation, and Localization

[1]Video Test-Time Adaptation for Action Recognition
paper：https://arxiv.org/abs/2211.15393

Person Re-Identification/Detection

[1]TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
paper：https://arxiv.org/abs/2303.06819
code：https://github.com/kali-hac/transg

Medical Imaging

[1]Neuron Structure Modeling for Generalizable Remote Physiological Measurement
paper：https://arxiv.org/abs/2303.05955
code：https://github.com/lupaopao/nest

[2]Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
paper：https://arxiv.org/abs/2303.08364
code：https://github.com/junbongjang/contour-tracking

[3]Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
paper：https://arxiv.org/abs/2303.08446

GAN/Generative/Adversarial

[1]Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
paper：https://arxiv.org/abs/2303.10774

[2]Graph Transformer GANs for Graph-Constrained House Generation
paper：https://arxiv.org/abs/2303.08225

Image Generation/Image Synthesis

[1]3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
paper：https://arxiv.org/abs/2303.10406
code：https://github.com/colorful-liyu/3dqd

[2]A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
paper：https://arxiv.org/abs/2303.09875
code：https://github.com/megvii-research/CVPR2023-DMVFN

[3]Regularized Vector Quantization for Tokenized Image Synthesis
paper：https://arxiv.org/abs/2303.06424

3D Vision

Point Cloud

[1]Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
paper：https://arxiv.org/abs/2303.07938

[2]Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
paper：https://arxiv.org/abs/2303.08134
code：https://github.com/zrrskywalker/point-nn

[3]Rotation-Invariant Transformer for Point Cloud Matching
paper：https://arxiv.org/abs/2303.08231

[4]Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
paper：https://arxiv.org/abs/2303.09950
code：https://github.com/qinzheng93/graphscnet

3D Reconstruction

[1]Masked Wavelet Representation for Compact Neural Radiance Fields
paper：https://arxiv.org/abs/2212.09069

[2]Decoupling Human and Camera Motion from Videos in the Wild
paper：https://arxiv.org/abs/2302.12827

[3]Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
paper：https://arxiv.org/abs/2303.05937

[4]NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
paper：https://arxiv.org/abs/2303.07653

[5]PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
paper：https://arxiv.org/abs/2303.09554

[6]SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
paper：https://arxiv.org/abs/2212.04493
code：https://github.com/yccyenchicheng/SDFusion

Scene Reconstruction/View Synthesis/Novel View Synthesis

[1]Robust Dynamic Radiance Fields
paper：https://arxiv.org/abs/2301.02239

[2]I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
paper：https://arxiv.org/abs/2303.07634

[3]MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
paper：https://arxiv.org/abs/2208.00277
code：https://github.com/google-research/jax3d

Neural Network Structure Design

[1]LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
paper：https://arxiv.org/abs/2206.10555
code：https://github.com/dvlab-research/largekernel3d

CNN

[1]Randomized Adversarial Training via Taylor Expansion
paper：https://arxiv.org/abs/2303.10653
code：https://github.com/alexkael/randomized-adversarial-training

[2]Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
paper：https://arxiv.org/abs/2303.08085
code：https://github.com/hmichaeli/alias_free_convnets

Transformer

[1]BiFormer: Vision Transformer with Bi-Level Routing Attention
paper：https://arxiv.org/abs/2303.08810
code：https://github.com/rayleizhu/biformer

[2]Making Vision Transformers Efficient from A Token Sparsification View
paper：https://arxiv.org/abs/2303.08685

Graph Neural Networks (GNN)

[1]Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks
paper：https://arxiv.org/abs/2303.06199

Data Processing

[1]TINC: Tree-structured Implicit Neural Compression
paper：https://arxiv.org/abs/2211.06689
code：https://github.com/richealyoung/tinc

Image Clustering

[1]On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
paper：https://arxiv.org/abs/2303.09877
code：https://github.com/danieltrosten/deepmvc

Model Training/Generalization

[1]HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
paper：https://arxiv.org/abs/2303.05675

[2]Universal Instance Perception as Object Discovery and Retrieval
paper：https://arxiv.org/abs/2303.06674
code：https://github.com/MasterBin-IIAU/UNINEXT

[3]Sharpness-Aware Gradient Matching for Domain Generalization
paper：https://arxiv.org/abs/2303.10353
code：https://github.com/wang-pengfei/sagm

Image Feature Extraction and Matching

[1]Referring Image Matting
paper：https://arxiv.org/abs/2206.05149
code：https://github.com/jizhizili/rim

[2]Iterative Geometry Encoding Volume for Stereo Matching
paper：https://arxiv.org/abs/2303.06615
code：https://github.com/gangweix/igev

Visual Representation Learning

[1]MARLIN: Masked Autoencoder for facial video Representation LearnINg
paper：https://arxiv.org/abs/2211.06627
code：https://github.com/ControlNet/MARLIN

Model Evaluation

[1]TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
paper：https://arxiv.org/abs/2303.05762
code：https://github.com/chenweixin107/trojdiff

Multi-Modal Learning

[1]Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos
paper：https://arxiv.org/abs/2303.10421
code：https://github.com/xkwangcn/abaw-5th-rt-iai

[2]Emotional Reaction Intensity Estimation Based on Multimodal Data
paper：https://arxiv.org/abs/2303.09167

[3]Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
paper：https://arxiv.org/abs/2303.09164

[4]Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
paper：https://arxiv.org/abs/2303.05952

Audio-Visual Learning

[1]Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
paper：https://arxiv.org/abs/2303.08536
code：https://github.com/joannahong/av-relscore

[2]CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
paper：https://arxiv.org/abs/2303.06357

Vision-Language

[1]Lana: A Language-Capable Navigator for Instruction Following and Generation
paper：https://arxiv.org/abs/2303.08409
code：https://github.com/wxh1996/lana-vln

Vision-based Prediction

[1]TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
paper：https://arxiv.org/abs/2303.09998

Dataset

[1]A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
paper：https://arxiv.org/abs/2212.04825
code：https://github.com/facebookresearch/Whac-A-Mole

[2]MVImgNet: A Large-scale Dataset of Multi-view Images
paper：https://arxiv.org/abs/2303.06042

[3]SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
paper：https://arxiv.org/abs/2303.09095
code：https://github.com/climbingdaily/SLOPER4D

Few-shot Learning/Zero-shot Learning

[1]DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
paper：https://arxiv.org/abs/2303.09674
code：https://github.com/phoenix-v/digeo

[2]Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
paper：https://arxiv.org/abs/2303.09352
code：https://github.com/uitml/nohub

[3]Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
paper：https://arxiv.org/abs/2303.08698
code：https://github.com/zhicaiwww/bi-vaegan

Continual Learning/Life-long Learning

[1]Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
paper：https://arxiv.org/abs/2303.09483
code：https://github.com/kim-sanghwan/ancl

Transfer Learning/Domain Adaptation

[1]Trainable Projected Gradient Method for Robust Fine-tuning
paper：https://arxiv.org/abs/2303.10720

[2]DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
paper：https://arxiv.org/abs/2103.17084

[3]Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
paper：https://arxiv.org/abs/2203.15793
code：https://github.com/vibashan/irg-sfda

Scene Graph

Scene Graph Understanding

[1]PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
paper：https://arxiv.org/abs/2211.16312
code：https://github.com/cvmi-lab/pla

Visual Localization/Pose Estimation

[1]PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
paper：https://arxiv.org/abs/2303.09187

[2]StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition
paper：https://arxiv.org/abs/2212.00937

Visual Reasoning/VQA

[1]Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
paper：https://arxiv.org/abs/2303.10482
code：https://github.com/szzexpoi/poem

[2]Generative Bias for Robust Visual Question Answering
paper：https://arxiv.org/abs/2208.00690

Contrastive Learning

[1]Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
paper：https://arxiv.org/abs/2303.10323
code：https://github.com/mlii0117/dcl

Reinforcement Learning

[1]EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
paper：https://arxiv.org/abs/2303.10876
code：https://github.com/mediabrain-sjtu/eqmotion

Robotics

[1]Efficient Map Sparsification Based on 2D and 3D Discretized Grids
paper：https://arxiv.org/abs/2303.10882

Semi-supervised/Weakly-supervised/Unsupervised/Self-supervised Learning

[1]Extracting Class Activation Maps from Non-Discriminative Features as well
paper：https://arxiv.org/abs/2303.10334
code：https://github.com/zhaozhengchen/lpcam

[2]TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
paper：https://arxiv.org/abs/2303.09870
code：https://github.com/devavrattomar/tesla

[3]LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
paper：https://arxiv.org/abs/2303.09665

[4]MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
paper：https://arxiv.org/abs/2303.09061
code：https://github.com/lliuz/mixteacher

[5]Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
paper：https://arxiv.org/abs/2303.06380

[6]Non-Contrastive Unsupervised Learning of Physiological Signals from Video
paper：https://arxiv.org/abs/2303.07944

Other

[1]Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition
paper：https://arxiv.org/abs/2303.10849

[2]Partial Network Cloning
paper：https://arxiv.org/abs/2303.10597
code：https://github.com/jngwenye/pncloning

[3]Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
paper：https://arxiv.org/abs/2303.10449
code：https://github.com/lufan31/et-ood

[4]Adversarial Counterfactual Visual Explanations
paper：https://arxiv.org/abs/2303.09962
code：https://github.com/guillaumejs2403/ace

[5]A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
paper：https://arxiv.org/abs/2303.09165
code：https://github.com/huitangtang/on_the_utility_of_synthetic_data

[6]Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
paper：https://arxiv.org/abs/2303.09119
code：https://github.com/advocate99/diffgesture

[7]Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
paper：https://arxiv.org/abs/2303.08658
code：https://github.com/kebii/r2et

[8]Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
paper：https://arxiv.org/abs/2202.04235
code：https://github.com/twweeb/composite-adv

[9]Backdoor Defense via Deconfounded Representation Learning
paper：https://arxiv.org/abs/2303.06818
code：https://github.com/zaixizhang/cbd

[10]Label Information Bottleneck for Label Enhancement
paper：https://arxiv.org/abs/2303.06836

[11]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
paper：https://arxiv.org/abs/2303.08137
code：https://github.com/CyberAgentAILab/layout-dm

[12]Diversity-Aware Meta Visual Prompting
paper：https://arxiv.org/abs/2303.08138
code：https://github.com/shikiw/dam-vp

Tags: Artificial Intelligence, AI, Computer Vision, Deep Learning, Object Detection