Related papers: Learning Generalizable Agents via Saliency-Guided Features Decorrelation

URL: http://arxiv.org/abs/2310.05086v2
Date: Fri, 22 Dec 2023 09:36:17 GMT
Title: Learning Generalizable Agents via Saliency-Guided Features Decorrelation
Authors: Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi Chang, Lichao Sun, Bo Yang
Abstract summary: We propose Saliency-Guided Features Decorrelation (SGFD) to eliminate correlations among features in the state space, so that the policy can separate each feature's association with the decisions. Random Fourier Functions (RFF) are used to estimate the complex non-linear correlations in high-dimensional images, while a saliency map identifies the changed features. Guided by the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features.
Score: 25.19044461705711
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.
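The reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SGFD implementation: it omits the saliency map, works on two scalar features instead of image features, and all function names (`rff`, `dependence`, `reweight`) and parameter values are assumptions made for illustration. It measures non-linear dependence as the squared Frobenius norm of a weighted cross-covariance between random Fourier features, then adjusts per-sample weights to reduce that measure.

```python
import numpy as np


def rff(x, num_features, rng):
    """Map a 1-D feature to random Fourier features sqrt(2) * cos(w * x + b)."""
    w = rng.normal(size=num_features)                     # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)  # random phases
    return np.sqrt(2.0) * np.cos(np.outer(x, w) + b)      # shape (n, num_features)


def dependence(u, v, weights):
    """Squared Frobenius norm of the weighted cross-covariance of u and v.

    Returns the loss and its gradient with respect to each sample weight.
    """
    mu_u = weights @ u
    mu_v = weights @ v
    cov = (u * weights[:, None]).T @ v - np.outer(mu_u, mu_v)
    loss = float(np.sum(cov ** 2))
    u_cov = u @ cov  # row i holds u_i^T C
    grad = 2.0 * (np.sum(u_cov * v, axis=1) - u_cov @ mu_v - v @ (cov.T @ mu_u))
    return loss, grad


def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()


def reweight(u, v, steps=50, step_size=100.0):
    """Gradient descent on softmax logits, halving the step until the loss drops."""
    theta = np.zeros(len(u))
    weights = softmax(theta)  # start from uniform weights
    loss, grad = dependence(u, v, weights)
    for _ in range(steps):
        grad_theta = weights * (grad - weights @ grad)  # chain rule through softmax
        lr = step_size
        for _attempt in range(30):
            cand_theta = theta - lr * grad_theta
            cand_weights = softmax(cand_theta)
            cand_loss, cand_grad = dependence(u, v, cand_weights)
            if cand_loss < loss:
                theta, weights = cand_theta, cand_weights
                loss, grad = cand_loss, cand_grad
                break
            lr *= 0.5
        else:
            break  # no step size reduced the loss; stop early
    return weights, loss


rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=n)  # non-linearly tied to x

u = rff(x, 5, rng)
v = rff(y, 5, rng)

uniform = np.full(n, 1.0 / n)
before, _ = dependence(u, v, uniform)
weights, after = reweight(u, v)
print(f"dependence before: {before:.4f}, after: {after:.4f}")
```

The sketch has no regularizer on the weights, so with enough steps they can concentrate on a few samples. A practical method would likely constrain or penalize the weights to keep the effective sample size reasonable.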

Related papers

OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences [16.828732283348817]
We propose an unsupervised method named Offline Learning based Anomaly Detection (OIL-AD). OIL-AD detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association. Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
arXiv Detail & Related papers (2024-02-07T04:06:53Z)
Conditional Mutual Information for Disentangled Representations in Reinforcement Learning [13.450394764597663]
Reinforcement Learning environments can produce training data with spurious correlations between features. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features.
arXiv Detail & Related papers (2023-05-23T14:56:19Z)
ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning [2.984934409689467]
Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space. We propose ReCCoVER, an algorithm which detects causal confusion in an agent's reasoning before deployment.
arXiv Detail & Related papers (2022-03-21T13:17:30Z)
A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics. We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes. Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions.
arXiv Detail & Related papers (2021-07-18T12:37:05Z)
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet. We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve person-of-interest between heterogeneous modalities. This paper presents a Geometry-Guided Dual-Alignment learning framework (G$^2$DA) to tackle sample-level modality difference.
arXiv Detail & Related papers (2021-06-15T03:14:31Z)
Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model. The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution [27.48620879003556]
We present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels.
arXiv Detail & Related papers (2020-04-20T23:55:30Z)
Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection [86.69077525494106]
Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models. Existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift. We propose a Self-Guided Adaptation (SGA) model, targeted at aligning feature representation and transferring object detection models across domains.
arXiv Detail & Related papers (2020-03-19T13:30:45Z)
When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs). In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which achieves better generalization and stability. Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.