Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition
- URL: http://arxiv.org/abs/2411.15933v1
- Date: Sun, 24 Nov 2024 17:39:39 GMT
- Title: Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition
- Authors: Klara Janouskova, Cristian Gavrus, Jiri Matas
- Abstract summary: "Segment to Recognize Robustly" (S2R^2) is a novel recognition approach that decouples foreground (FG) and background (BG) modelling and combines them in a simple, robust, and interpretable manner.
S2R^2 achieves state-of-the-art results on in-domain data while maintaining robustness to BG shifts.
- Score: 21.917582794820095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In image recognition, both foreground (FG) and background (BG) play an important role; however, standard deep image recognition often leads to unintended over-reliance on the BG, limiting model robustness in real-world deployment settings. Current solutions mainly suppress the BG, sacrificing BG information for improved generalization. We propose "Segment to Recognize Robustly" (S2R^2), a novel recognition approach which decouples the FG and BG modelling and combines them in a simple, robust, and interpretable manner. S2R^2 leverages recent advances in zero-shot segmentation to isolate the FG and the BG before or during recognition. By combining FG and BG, potentially also with a standard full-image classifier, S2R^2 achieves state-of-the-art results on in-domain data while maintaining robustness to BG shifts. The results confirm that segmentation before recognition is now possible.
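To make the FG/BG decomposition concrete, the sketch below splits an image with a binary mask and then fuses per-class probabilities from an FG classifier, a BG classifier, and an optional full-image classifier. It is a minimal illustration under stated assumptions, not the S2R^2 implementation: the mask is assumed to come from some zero-shot segmentation model, and the log-linear fusion rule, the weights `w_fg`, `w_bg`, `w_full`, and the names `split_fg_bg` and `fuse_log_probs` are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]


def split_fg_bg(image: Image, mask: List[List[int]]) -> Tuple[Image, Image]:
    """Split an image into FG and BG copies using a binary mask (1 = FG).

    Assumption: the mask is produced elsewhere, e.g. by a zero-shot
    segmentation model; here it is simply passed in.
    """
    fg = [[p if m else 0 for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]
    bg = [[0 if m else p for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]
    return fg, bg


def fuse_log_probs(
    p_fg: Sequence[float],
    p_bg: Sequence[float],
    p_full: Optional[Sequence[float]] = None,
    w_fg: float = 1.0,
    w_bg: float = 0.5,
    w_full: float = 1.0,
) -> List[float]:
    """Hypothetical log-linear fusion of per-class probabilities.

    Each classifier contributes w * log(p) per class; the summed scores are
    renormalized with a softmax. Lowering w_bg reduces reliance on the BG.
    """
    eps = 1e-12  # guards against log(0)
    scores = []
    for c in range(len(p_fg)):
        s = w_fg * math.log(p_fg[c] + eps) + w_bg * math.log(p_bg[c] + eps)
        if p_full is not None:
            s += w_full * math.log(p_full[c] + eps)
        scores.append(s)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Setting `w_bg` to 0 reduces the fused output to the FG classifier alone, so the weight acts as a single, interpretable knob for how much BG evidence is trusted; whether S2R^2 uses such a knob is not stated in the abstract.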
Related papers
- DCA: Dividing and Conquering Amnesia in Incremental Object Detection [25.11059547936733]
We study the cause of forgetting and discover a forgetting imbalance between localization and recognition in transformer-based incremental object detection (IOD).
We propose a Divide-and-Conquer Amnesia (DCA) strategy, which redesigns the transformer-based IOD into a localization-then-recognition process.
Our approach achieves state-of-the-art performance, especially for long-term incremental scenarios.
Date: 2025-03-19T15:17:14Z
- RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning [20.688641105430467]
Global localization is crucial in autonomous driving and robotics applications when GPS signals are unreliable.
Most approaches achieve global localization through sequential place recognition (PR) and pose estimation (PE).
We introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate place recognition by directly deriving it from pose estimation.
We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space, compatible with both vision and LiDAR sensors.
Date: 2024-08-30T18:42:53Z
- Eliminating Feature Ambiguity for Few-Shot Segmentation [95.9916573435427]
Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features.
This paper presents a novel plug-in, the ambiguity elimination network (AENet), which can be integrated into any existing cross-attention-based FSS method.
Date: 2024-07-13T10:33:03Z
- Fine-grained Background Representation for Weakly Supervised Semantic Segmentation [35.346567242839065]
This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse BG semantics.
We present an active sampling strategy to mine FG negatives on the fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning.
Our method achieves 73.2 and 45.6 mIoU on the PASCAL VOC and MS COCO test sets, respectively.
Date: 2024-06-22T06:45:25Z
- Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization.
It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
Date: 2024-04-15T06:02:09Z
- Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios.
GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used.
We evaluate our method on multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
Date: 2023-07-27T01:53:57Z
- Green Steganalyzer: A Green Learning Approach to Image Steganalysis [30.486433532000344]
Green Steganalyzer (GS) is a learning solution to image steganalysis based on the green learning paradigm.
GS consists of three modules: 1) pixel-based anomaly prediction, 2) embedding location detection, and 3) decision fusion for image-level detection.
Date: 2023-06-06T20:43:07Z
- Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation [25.59883486325534]
This paper aims to solve the video object segmentation (VOS) task in a scribble-supervised manner.
We propose a scribble-supervised learning mechanism that helps our model learn to predict dense results.
Date: 2023-03-25T07:21:40Z
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to combine the strengths of two existing approaches to source-free domain adaptation while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples and treats each group with a tailored objective.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
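For reference, the squared MMD between two feature sets can be estimated as below. This sketch uses the standard biased estimator with a Gaussian kernel; DaC's memory bank, kernel choice, and bandwidth are not described in this summary, so `sigma` and the function names `gaussian_kernel`, `mean_kernel`, and `mmd2_biased` are assumptions for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )
```

Minimizing such a term pulls the two feature distributions together; in a memory-bank variant, one of the two sets would presumably be drawn from stored features rather than the current batch.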
Date: 2022-11-12T09:21:49Z
- Context-Aware Video Reconstruction for Rolling Shutter Cameras [52.28710992548282]
In this paper, we propose a context-aware architecture that reconstructs global shutter (GS) video from rolling shutter (RS) frames.
We first estimate the bilateral motion field so that the pixels of the two RS frames are warped to a common GS frame.
A refinement scheme then guides GS frame synthesis, together with bilateral occlusion masks, to produce high-fidelity GS video frames.
Date: 2022-05-25T17:05:47Z
- Learning Non-target Knowledge for Few-shot Semantic Segmentation [160.69431034807437]
We propose a novel framework, the Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate background (BG) and distracting object (DO) regions in the query.
A BG Mining Module (BGMM) is proposed to extract the BG region via learning a general BG prototype.
A BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature.
Date: 2022-05-10T13:52:48Z
- Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems.
GREW is the first large-scale dataset for gait recognition in the wild.
SPOSGait is the first NAS-based gait recognition model.
Date: 2022-05-05T14:57:39Z
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-the-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
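The thresholding rule behind a cross-entropy open-set baseline can be sketched as follows. This assumes the baseline scores each input by its maximum softmax probability (or, optionally, its maximum logit) and returns "none of the above" when the score falls below a threshold; the threshold values and the names `open_set_predict` and `UNKNOWN` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

UNKNOWN = -1  # label for the "none-of-the-above" decision


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(
    logits: Sequence[float], threshold: float, use_max_logit: bool = False
) -> int:
    """Return the argmax class, or UNKNOWN if the confidence score is too low.

    The score is the maximum softmax probability by default, or the
    maximum raw logit when use_max_logit is True.
    """
    score = max(logits) if use_max_logit else max(softmax(logits))
    if score < threshold:
        return UNKNOWN
    return max(range(len(logits)), key=lambda i: logits[i])
```

Under this rule, a classifier with higher closed-set accuracy tends to produce more confident scores on known classes, which is consistent with the correlation the summary describes.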
Date: 2021-10-12T17:58:59Z
- Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization [65.50321170655225]
We introduce gait recognition as an auxiliary task that drives the image ReID model to learn cloth-agnostic representations.
Experiments on image-based cloth-changing ReID benchmarks, e.g., LTCC, PRCC, Real28, and VC-Clothes, demonstrate that GI-ReID performs favorably against state-of-the-art methods.
Date: 2021-03-29T12:10:50Z
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores, taking into account the discrepancy between each image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
Date: 2021-03-02T08:20:08Z
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
Date: 2020-11-25T11:03:11Z
- Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields.
Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans.
We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
Date: 2020-11-03T04:07:13Z
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.