Related papers: Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition

Related papers

Robust Context-Aware Object Recognition [15.318646611581741]
RCOR treats localization as an integral part of recognition to decouple object-centric and context-aware modelling. Results confirm that localization before recognition is now possible even in complex scenes such as those in ImageNet-1k.
arXiv Detail & Related papers (2025-10-01T07:45:38Z)
A Novel Local Focusing Mechanism for Deepfake Detection Generalization [10.223643897131192]
Deepfake generation techniques have intensified the need for robust and generalizable detection methods. We propose a novel Local Focus Mechanism (LFM) that explicitly attends to discriminative local features for differentiating fake from real images. LFM achieves a 3.7 improvement in accuracy and a 2.8 increase in average precision over the state-of-the-art Neighboring Pixel Relationships (NPR) method.
arXiv Detail & Related papers (2025-08-23T14:06:30Z)
VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection [9.029534000674388]
VG-DETR integrates a Vision Foundation Model (VFM) into the training pipeline in a "free lunch" manner. We introduce a VFM-guided pseudo-label mining strategy that leverages the VFM's semantic priors to assess the reliability of the generated pseudo-labels. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels.
arXiv Detail & Related papers (2025-08-15T02:35:56Z)
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly detection is critical in fields such as medical diagnostics and industrial defect detection. CLIP's coarse-grained image-text alignment limits localization and detection performance for fine-grained anomalies. Crane improves the state-of-the-art ZSAD from 2% to 28%, at both image and pixel levels, while remaining competitive in inference speed.
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
DCA: Dividing and Conquering Amnesia in Incremental Object Detection [25.11059547936733]
We study the cause of forgetting and discover a forgetting imbalance between localization and recognition in transformer-based IOD. We propose a Divide-and-Conquer Amnesia (DCA) strategy, which redesigns the transformer-based IOD into a localization-then-recognition process. Our approach achieves state-of-the-art performance, especially for long-term incremental scenarios.
arXiv Detail & Related papers (2025-03-19T15:17:14Z)
RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning [20.688641105430467]
Global localization is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition (PR) and pose estimation (PE). We introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate place recognition by directly deriving it from pose estimation. We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space, compatible with both vision and LiDAR sensors.
arXiv Detail & Related papers (2024-08-30T18:42:53Z)
Eliminating Feature Ambiguity for Few-Shot Segmentation [95.9916573435427]
Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features. This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross attention-based FSS methods.
arXiv Detail & Related papers (2024-07-13T10:33:03Z)
Fine-grained Background Representation for Weakly Supervised Semantic Segmentation [35.346567242839065]
This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse background (BG) semantics. We present an active sampling strategy to mine the foreground (FG) negatives on-the-fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning. Our method achieves 73.2 mIoU and 45.6 mIoU segmentation results on Pascal VOC and MS COCO test sets, respectively.
arXiv Detail & Related papers (2024-06-22T06:45:25Z)
Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization. It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
arXiv Detail & Related papers (2024-04-15T06:02:09Z)
Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios. GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used. We evaluate our method on multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
arXiv Detail & Related papers (2023-07-27T01:53:57Z)
Green Steganalyzer: A Green Learning Approach to Image Steganalysis [30.486433532000344]
Green Steganalyzer (GS) is a learning solution to image steganalysis based on the green learning paradigm. GS consists of three modules: 1) pixel-based anomaly prediction, 2) embedding location detection, and 3) decision fusion for image-level detection.
arXiv Detail & Related papers (2023-06-06T20:43:07Z)
Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation [25.59883486325534]
This paper aims to solve the video object segmentation (VOS) task in a scribble-supervised manner. We propose a scribble-supervised learning mechanism to facilitate the learning of our model to predict dense results.
arXiv Detail & Related papers (2023-03-25T07:21:40Z)
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations. DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals. We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
Context-Aware Video Reconstruction for Rolling Shutter Cameras [52.28710992548282]
In this paper, we propose a context-aware global shutter (GS) video reconstruction architecture. We first estimate the bilateral motion field so that the pixels of the two rolling shutter (RS) frames are warped to a common GS frame. Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames.
arXiv Detail & Related papers (2022-05-25T17:05:47Z)
Learning Non-target Knowledge for Few-shot Semantic Segmentation [160.69431034807437]
We propose a novel framework, namely the Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate background (BG) and distracting object (DO) regions in the query. A BG Mining Module (BGMM) is proposed to extract the BG region via learning a general BG prototype. A BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature.
arXiv Detail & Related papers (2022-05-10T13:52:48Z)
Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. GREW is the first large-scale dataset for gait recognition in the wild. SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy. We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization [65.50321170655225]
We introduce gait recognition as an auxiliary task to drive the image ReID model to learn cloth-agnostic representations. Experiments on image-based Cloth-Changing ReID benchmarks, e.g., LTCC, PRCC, Real28, and VC-Clothes, demonstrate that GI-ReID performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2021-03-29T12:10:50Z)
Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA). IDA-DAO is used to align the similarity scores considering the discrepancy between each image and its neighbors. IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields. Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans. We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
arXiv Detail & Related papers (2020-11-03T04:07:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.