Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations
- URL: http://arxiv.org/abs/2502.02867v2
- Date: Fri, 14 Feb 2025 11:57:25 GMT
- Title: Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations
- Authors: Minung Kim, Kawon Lee, Jungmo Kim, Sungho Choi, Seungyul Han
- Abstract summary: Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. We propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors.
- Score: 5.971046215117033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.
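The abstract outlines two mechanisms: per-frame domain-invariant feature extraction and frame-wise time labeling that turns temporal alignment into a reward. Below is a minimal sketch of how such a reward could be wired up; the module names and the log-probability reward form are illustrative assumptions, not DIFF-IL's exact formulation.

```python
# Hedged sketch: per-frame encoder + frame-wise time labeling used as a reward.
# Module names and the reward form are assumptions, not DIFF-IL's exact design.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Maps one visual frame to a feature intended to be domain-invariant."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, frame):               # frame: (B, 3, H, W)
        return self.net(frame)              # (B, feat_dim)

class TimeLabeler(nn.Module):
    """Predicts which expert timestep a per-frame feature belongs to."""
    def __init__(self, feat_dim: int = 128, horizon: int = 100):
        super().__init__()
        self.head = nn.Linear(feat_dim, horizon)

    def forward(self, feat):
        return self.head(feat)              # logits over expert timesteps

encoder, labeler = FrameEncoder(), TimeLabeler()

def time_aligned_reward(frame: torch.Tensor, t: int) -> torch.Tensor:
    """High when the agent's frame resembles expert frames at timestep t."""
    with torch.no_grad():
        logp = torch.log_softmax(labeler(encoder(frame)), dim=-1)
    return logp[:, t]

# e.g. r = time_aligned_reward(obs_batch, t=5)  # obs_batch: (B, 3, 64, 64)
```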
Related papers
- HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection [6.060036926093259]
HAMLET-FFD is a cross-domain generalization framework for face forgery detection. It integrates visual evidence with conceptual cues, emulating expert forensic analysis. By design, HAMLET-FFD freezes all pretrained parameters, serving as an external plugin.
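The entry notes that HAMLET-FFD freezes all pretrained parameters and operates as an external plugin. A minimal sketch of that freeze-and-attach pattern in PyTorch follows; the ResNet backbone and linear head are placeholders, not the paper's actual modules.

```python
# Sketch of the "frozen pretrained backbone + trainable external plugin"
# pattern; the ResNet backbone and linear head are placeholder choices.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                 # freeze every pretrained weight
backbone.fc = nn.Identity()                 # expose 512-d pooled features

plugin_head = nn.Linear(512, 2)             # only the plugin is trained
optimizer = torch.optim.Adam(plugin_head.parameters(), lr=1e-4)

logits = plugin_head(backbone(torch.rand(4, 3, 224, 224)))
```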
arXiv Detail & Related papers (2025-07-28T15:09:52Z)
- CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning [30.111296778234124]
CorrMoE is a correspondence pruning framework that enhances robustness under cross-domain and cross-scene variations. For scene diversity, we design a Bi-Fusion Mixture of Experts module that adaptively integrates multi-perspective features. Experiments on benchmark datasets demonstrate that CorrMoE achieves superior accuracy and generalization compared to state-of-the-art methods.
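A mixture-of-experts module of the kind this entry describes gates several expert transforms per input; the soft gating below is a generic sketch, not CorrMoE's actual Bi-Fusion design.

```python
# Generic soft-gated mixture of experts; an assumption-level sketch rather
# than CorrMoE's actual Bi-Fusion Mixture of Experts module.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):                            # x: (B, dim)
        w = torch.softmax(self.gate(x), dim=-1)      # per-sample expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, dim)
        return (w.unsqueeze(-1) * outs).sum(dim=1)   # weighted expert blend

fused = SoftMoE(dim=64)(torch.randn(8, 64))          # (8, 64)
```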
arXiv Detail & Related papers (2025-07-16T01:44:01Z)
- Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation [38.0401463751139]
We present a novel Spectral-dEcomposed Token (SET) learning framework to advance domain generalized semantic segmentation.
In particular, the frozen VFM features are first decomposed into phase and amplitude components in the frequency space.
We develop an attention optimization method to bridge the gap between style-affected representation and static tokens during inference.
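The phase/amplitude decomposition mentioned above can be written directly with a 2-D FFT; treating the FFT as the frequency transform is an assumption here, since the entry does not name one.

```python
# Phase/amplitude decomposition of a frozen feature map in frequency space;
# using a plain 2-D FFT is an assumption, as the entry names no transform.
import torch

feat = torch.randn(1, 256, 32, 32)          # e.g. a frozen VFM feature map
spec = torch.fft.fft2(feat)                  # complex spectrum per channel
amplitude, phase = spec.abs(), spec.angle()

# Amplitude is often treated as style-like, phase as content-like; the two
# components reassemble the original features exactly.
recon = torch.fft.ifft2(amplitude * torch.exp(1j * phase)).real
assert torch.allclose(recon, feat, atol=1e-3)
```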
arXiv Detail & Related papers (2024-07-26T07:50:48Z)
- Selective Domain-Invariant Feature for Generalizable Deepfake Detection [21.671221284842847]
We propose a novel framework that reduces sensitivity to face forgery by fusing content features and styles.
Both qualitative and quantitative results on existing benchmarks and proposed ones demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-19T13:09:19Z)
- Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences [79.21515035128832]
We conduct a thorough analysis of the performance of current state-of-the-art feature extraction networks under various domain changes.
We propose a novel data-centric method, Implicit Cross-Domain Correspondences (iCDC).
iCDC represents the same environment with multiple Neural Radiance Fields, each fitting the scene under individual visual domains.
arXiv Detail & Related papers (2023-11-06T18:53:01Z)
- DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions [14.52296033767276]
We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism.
Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions.
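As a rough picture of what a high-frequency-component (HFC) block computes, the sketch below keeps the residual after Gaussian low-pass filtering; this is an assumption about the $\nabla$HFC block, whose exact operator is not given here.

```python
# High-frequency components as the residual of a Gaussian low-pass filter;
# an assumption about what an HFC block computes, not the actual $\nabla$HFC.
import torch
import torchvision.transforms.functional as TF

def high_freq(img: torch.Tensor, kernel: int = 9, sigma: float = 2.0):
    low = TF.gaussian_blur(img, kernel_size=kernel, sigma=sigma)
    return img - low                         # keeps edges and fine texture

hfc = high_freq(torch.rand(1, 3, 224, 224))  # same shape as the input
```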
arXiv Detail & Related papers (2023-10-06T11:53:04Z)
- Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on specific-domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 (detecting and grounding multi-modal media manipulation) problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
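The wavelet decomposition this entry describes is a one-liner with PyWavelets; the Haar wavelet and the single decomposition level below are arbitrary choices for illustration.

```python
# Single-level 2-D discrete wavelet transform: one approximation band plus
# horizontal/vertical/diagonal detail bands ('haar' is an arbitrary choice).
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)   # one grayscale channel
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
# Each sub-band is (128, 128); forgery artifacts tend to show up in the
# detail bands, which a frequency encoder can attend over within and across.
```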
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - Improving Anomaly Segmentation with Multi-Granularity Cross-Domain
Alignment [17.086123737443714]
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
While existing methods demonstrate noteworthy results on synthetic data, they often fail to consider the disparity between synthetic and real-world data domains.
We introduce the Multi-Granularity Cross-Domain Alignment framework, tailored to harmonize features across domains at both the scene and individual sample levels.
arXiv Detail & Related papers (2023-08-16T22:54:49Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation [26.610588734000316]
This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate.
The proposed framework architecture leverages cross-domain CNN, allowing for learning representations from different hyperspectral images.
The experimental results demonstrate the advantage of the proposed self-supervised representation over models trained from scratch or other transfer learning methods.
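A self-supervised contrastive objective of the kind this entry relies on is typically an InfoNCE loss over paired views; the sketch below is generic and does not reproduce the paper's cross-domain CNN.

```python
# Minimal InfoNCE contrastive loss over two views of the same samples; a
# generic sketch, not the paper's cross-domain CNN architecture.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau               # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))       # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```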
arXiv Detail & Related papers (2022-02-08T16:16:45Z)
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by the observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains.
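One way to remove FoV-dependent scale differences is to reproject image-plane coordinates onto a sphere, compressing the periphery; the arctan mapping below is an assumption about what such a position-invariant transform looks like, not PIT's published formula.

```python
# FoV-normalizing reprojection sketch: plane coordinate x is remapped to
# f * arctan(x / f). The mapping is an assumption, not PIT's exact transform.
import numpy as np

def reproject(x: np.ndarray, focal: float) -> np.ndarray:
    return focal * np.arctan(x / focal)

xs = np.linspace(-500.0, 500.0, 5)           # offsets from the principal point
print(reproject(xs, focal=720.0))            # peripheral pixels get compressed
```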
arXiv Detail & Related papers (2021-08-16T15:16:47Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
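Domain-adversarial training, one of AFAN's two ingredients, is commonly implemented with a gradient reversal layer; whether AFAN uses exactly this layer is an assumption.

```python
# Gradient reversal layer, the standard mechanism behind domain-adversarial
# training; treating it as AFAN's exact implementation is an assumption.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.clone()                      # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None      # flipped gradient for features

def grad_reverse(x: torch.Tensor, lam: float = 1.0):
    return GradReverse.apply(x, lam)

feats = torch.randn(8, 64, requires_grad=True)
domain_logits = torch.nn.Linear(64, 2)(grad_reverse(feats))
domain_logits.sum().backward()                # feats.grad is sign-flipped
```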
arXiv Detail & Related papers (2021-06-10T05:01:20Z)