Related papers: Reliable Propagation-Correction Modulation for Video Object Segmentation

URL: http://arxiv.org/abs/2112.02853v1
Date: Mon, 6 Dec 2021 08:22:58 GMT
Title: Reliable Propagation-Correction Modulation for Video Object Segmentation
Authors: Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu
Abstract summary: We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings. This avoids overriding the effects of the reliable correction modulator by the propagation modulator. Our model achieves state-of-the-art performance on YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks.
Score: 19.51247081512788
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Error propagation is a general but crucial problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a correction mechanism with high reliability. The key insight is to disentangle the correction from the conventional mask propagation process with reliable cues. We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively. Specifically, we assemble the modulators with a cascaded propagation-correction scheme. This avoids overriding the effects of the reliable correction modulator by the propagation modulator. Although the reference frame with the ground truth label provides reliable cues, it could be very different from the target frame and introduce uncertain or incomplete correlations. We augment the reference cues by supplementing reliable feature patches to a maintained pool, thus offering more comprehensive and expressive object representations to the modulators. In addition, a reliability filter is designed to retrieve reliable patches and pass them in subsequent frames. Our model achieves state-of-the-art performance on YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance. Code is available at: https://github.com/JerryX1110/RPCMVOS.
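The abstract describes the modulators, the cascade, and the reliability filter only at a high level. Below is a minimal pure-Python sketch of how those pieces could fit together. It assumes a toy representation: features are C x N nested lists, and each modulator is a sigmoid gate computed from the mean response of the same channel in its cue features. It also assumes the pool keeps a fixed number of the most confident patches above a threshold. The names `channel_modulate`, `ReliablePool`, `threshold`, `capacity`, and `propagate_then_correct` are hypothetical. This is not the RPCM implementation; the linked repository has the actual code.

```python
import math
from dataclasses import dataclass, field


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_modulate(target, cues):
    """Channel-wise re-calibration (toy version, an assumption):
    scale each channel of `target` (C x N) by a sigmoid gate derived
    from the mean response of the same channel in `cues` (C x M)."""
    gated = []
    for t_ch, c_ch in zip(target, cues):
        gate = sigmoid(sum(c_ch) / len(c_ch))
        gated.append([v * gate for v in t_ch])
    return gated


@dataclass
class ReliablePool:
    """Hypothetical pool of reliable feature patches (C-dim vectors)."""
    threshold: float = 0.9  # hypothetical reliability cutoff
    capacity: int = 4       # hypothetical maximum number of pooled patches
    patches: list = field(default_factory=list)  # (confidence, patch) pairs

    def add(self, patch, confidence):
        # Reliability filter: reject patches below the confidence threshold.
        if confidence < self.threshold:
            return False
        self.patches.append((confidence, patch))
        # Keep only the most confident patches, up to `capacity`.
        self.patches.sort(key=lambda p: p[0], reverse=True)
        del self.patches[self.capacity:]
        return True

    def as_cues(self, reference_patches):
        # Reference-frame patches augmented with pooled patches, as C x M.
        vectors = list(reference_patches) + [p for _, p in self.patches]
        return [list(ch) for ch in zip(*vectors)]


def propagate_then_correct(target, prev_cues, pool, reference_patches):
    # Cascade: propagation modulator first, correction modulator last,
    # so the correction is applied on top of the propagated features.
    propagated = channel_modulate(target, prev_cues)
    reliable_cues = pool.as_cues(reference_patches)
    return channel_modulate(propagated, reliable_cues)
```

For example, with a two-channel target `[[1.0, 2.0], [3.0, 4.0]]`, zero-valued previous-frame cues, and a single zero reference patch, each stage applies a gate of 0.5, giving `[[0.25, 0.5], [0.75, 1.0]]`. Running correction last mirrors the ordering the abstract motivates: the reliable cues act after, rather than before, the propagation modulator.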

Related papers

Joint Source-Channel-Generation Coding: From Distortion-oriented Reconstruction to Semantic-consistent Generation [58.67925548779465]
We propose Joint Source-Channel-Generation Coding (JSCGC), a novel paradigm that shifts the focus from perceptual reconstruction to probabilistic generation. JSCGC substantially improves semantic quality and semantic fidelity, significantly outperforming conventional distortion-oriented JSCC methods.
arXiv Detail & Related papers (2026-01-19T08:12:47Z)
On Exact Editing of Flow-Based Diffusion Models [97.0633397035926]
We propose Conditioned Velocity Correction (CVC) to reformulate flow-based editing as a distribution transformation problem driven by a known source prior. CVC rethinks the role of velocity in inter-distribution transformation by introducing a dual-perspective velocity conversion mechanism. We show that CVC consistently achieves superior fidelity, better semantic alignment, and more reliable editing behavior across diverse tasks.
arXiv Detail & Related papers (2025-12-30T06:29:20Z)
Causality-Inspired Safe Residual Correction for Multivariate Time Series [12.183024727781449]
We propose CRC (Causality-inspired Safe Residual Correction), a plug-and-play framework explicitly designed to ensure non-degradation. It employs a causality-inspired encoder to expose direction-aware structure by decoupling self- and cross-variable dynamics, and a hybrid corrector to model residual errors. Experiments show that CRC consistently improves accuracy, while an in-depth ablation study confirms that its core safety mechanisms ensure exceptionally high non-degradation rates (NDR).
arXiv Detail & Related papers (2025-12-27T01:34:14Z)
Corrective Diffusion Language Models [12.724100711773593]
We study corrective behavior in diffusion language models, defined as the ability to assign lower confidence to incorrect tokens and iteratively refine them while preserving correct content. We propose a correction-oriented post-training principle that explicitly supervises visible incorrect tokens, enabling error-aware confidence and targeted refinement.
arXiv Detail & Related papers (2025-12-17T17:04:38Z)
Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal [31.458406135473805]
We present UniCR, a unified framework that turns heterogeneous uncertainty evidence into a calibrated probability of correctness. UniCR learns a lightweight calibration head with temperature scaling and proper scoring. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics.
arXiv Detail & Related papers (2025-09-01T13:14:58Z)
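Temperature scaling, mentioned in the UniCR summary above, is a standard post-hoc calibration technique. A single scalar T divides the logits before the softmax and is fitted on held-out data by minimizing a proper scoring rule, here the negative log-likelihood. The sketch below shows only this generic technique, fitted by a simple grid search. It is not UniCR's calibration head. The grid range and function names are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood, a proper scoring rule."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    # Hypothetical search grid: T = 0.25, 0.5, ..., 10.0.
    grid = grid or [0.25 * k for k in range(1, 41)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

For an overconfident binary classifier that outputs logits `[5.0, 0.0]` for four held-out examples, of which three belong to class 0, the fitted temperature is greater than 1. This softens the predicted confidence toward the observed 75% accuracy.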
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z)
REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing [42.89229070245538]
We introduce REACT, a framework for precise and controllable knowledge editing. In the initial phase, we utilize tailored stimuli to extract latent factual representations. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar.
arXiv Detail & Related papers (2025-05-25T01:57:06Z)
Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval [39.65722543824425]
The Gap-Aware Retrieval (GARE) framework introduces a learnable, pair-specific increment Δ_ij between text t_i and video v_j. GARE consistently improves alignment accuracy and robustness to noisy supervision.
arXiv Detail & Related papers (2025-05-18T17:18:06Z)
Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework [14.23793349540553]
We provide a general theoretical framework for adapters that maintain frame consistency in DDIM-based models under a temporal consistency loss. We analyze the stability of modules in the DDIM inversion procedure, showing that the associated error remains controlled.
arXiv Detail & Related papers (2025-04-22T16:28:35Z)
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention [5.044679241062448]
Transformer models leverage self-attention mechanisms to capture dependencies, demonstrating exceptional performance in various applications. Existing fault tolerance methods protect each operation separately using decoupled kernels, incurring substantial computational and memory overhead. We propose a novel error-resilient framework for Transformer models, integrating end-to-end fault tolerant attention.
arXiv Detail & Related papers (2025-04-03T02:05:08Z)
Building the Self-Improvement Loop: Error Detection and Correction in Goal-Oriented Semantic Communications [2.677520298504178]
Semantic communication (SemCom) focuses on transmitting meaning rather than symbols, leading to significant improvements in communication efficiency. Despite these advantages, semantic errors -- stemming from discrepancies between transmitted and received meanings -- present a major challenge to system reliability. This paper proposes a comprehensive framework for detecting and correcting semantic errors in SemCom systems.
arXiv Detail & Related papers (2024-11-03T12:29:23Z)
Perception-Oriented Video Frame Interpolation via Asymmetric Blending [20.0024308216849]
Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. We propose PerVFI (Perception-oriented Video Frame Interpolation) to mitigate these challenges. Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods.
arXiv Detail & Related papers (2024-04-10T02:40:17Z)
Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information. Inaccurate alignment usually leads to aligned features with significant artifacts. In addition, propagation modules only propagate features of the same timestep forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z)
Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning [1.4061979259370274]
Multipath effects and non-line-of-sight conditions cause ranging errors between anchors and tags. Existing approaches for mitigating these ranging errors rely on collecting large labeled datasets. This paper proposes a novel self-supervised deep reinforcement learning approach that does not require labeled ground truth data.
arXiv Detail & Related papers (2024-03-28T09:36:55Z)
Friendly Attacks to Improve Channel Coding Reliability [0.33993877661368754]
"Friendly attack" aims at enhancing the performance of error correction channel codes. Inspired by the concept of adversarial attacks, our method leverages the idea of introducing slight perturbations to the neural network input. We demonstrate that the proposed friendly attack method can improve the reliability across different channels, modulations, codes, and decoders.
arXiv Detail & Related papers (2024-01-25T13:46:21Z)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision-language models. We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GANs to real images. Existing methods invert video frames individually, often leading to undesired inconsistent results over time. We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID). Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
Error Correction Code Transformer [92.10654749898927]
We propose, for the first time, to extend the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths. We encode each dimension of the channel output into a high-dimensional representation, so that the information of each bit can be processed separately. The proposed approach demonstrates the power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature. We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse. We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution. We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.