Learning Generalizable Agents via Saliency-Guided Features Decorrelation
- URL: http://arxiv.org/abs/2310.05086v2
- Date: Fri, 22 Dec 2023 09:36:17 GMT
- Title: Learning Generalizable Agents via Saliency-Guided Features Decorrelation
- Authors: Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi
Chang, Lichao Sun, Bo Yang
- Abstract summary: We propose Saliency-Guided Features Decorrelation (SGFD) to eliminate the correlations among features in the state space through sample reweighting.
Random Fourier Functions (RFF) are used to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features.
Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features.
- Score: 25.19044461705711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In visual-based Reinforcement Learning (RL), agents often struggle to
generalize well to environmental variations in the state space that were not
observed during training. The variations can arise in both task-irrelevant
features, such as background noise, and task-relevant features, such as robot
configurations, that are related to the optimal decisions. To achieve
generalization in both situations, agents are required to accurately understand
the impact of changed features on the decisions, i.e., establishing the true
associations between changed features and decisions in the policy model.
However, due to the inherent correlations among features in the state space,
the associations between features and decisions become entangled, making it
difficult for the policy to distinguish them. To this end, we propose
Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations
through sample reweighting. Concretely, SGFD consists of two core techniques:
Random Fourier Functions (RFF) and the saliency map. RFF is utilized to
estimate the complex non-linear correlations in high-dimensional images, while
the saliency map is designed to identify the changed features. Under the
guidance of the saliency map, SGFD employs sample reweighting to minimize the
estimated correlations related to changed features, thereby achieving
decorrelation in visual RL tasks. Our experimental results demonstrate that
SGFD can generalize well on a wide range of test environments and significantly
outperforms state-of-the-art methods in handling both task-irrelevant
variations and task-relevant variations.
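The abstract describes estimating non-linear correlations with random Fourier features and penalizing only those that involve changed features. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names (`rff`, `weighted_dependence`, `saliency_guided_loss`), the squared Frobenius norm of a weighted cross-covariance as the dependence measure, and the boolean per-feature saliency mask are all assumptions made for illustration. In the method described above the sample weights would be learned to minimize such a loss; here they are fixed to uniform so the measure itself can be inspected.

```python
import numpy as np

def rff(x, num_features, rng):
    """Map a feature column x of shape (n,) to random Fourier features (n, num_features)."""
    freqs = rng.standard_normal(num_features)             # frequencies w ~ N(0, 1)
    phases = rng.uniform(0.0, 2.0 * np.pi, num_features)  # phases phi ~ U(0, 2*pi)
    return np.sqrt(2.0) * np.cos(np.outer(x, freqs) + phases)

def weighted_dependence(x, y, weights, num_features=8, seed=0):
    """Squared Frobenius norm of the weighted cross-covariance of RFF(x) and RFF(y)."""
    rng = np.random.default_rng(seed)
    u, v = rff(x, num_features, rng), rff(y, num_features, rng)
    w = weights / weights.sum()        # normalize sample weights to sum to 1
    u_c = u - w @ u                    # subtract weighted means
    v_c = v - w @ v
    cov = (u_c * w[:, None]).T @ v_c   # shape (num_features, num_features)
    return float(np.sum(cov ** 2))

def saliency_guided_loss(features, weights, saliency):
    """Sum pairwise dependence over feature pairs that involve a salient (changed) feature.

    `saliency` is a hypothetical boolean mask with one entry per feature column.
    """
    d = features.shape[1]
    loss = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            if saliency[i] or saliency[j]:
                loss += weighted_dependence(features[:, i], features[:, j], weights)
    return loss

rng = np.random.default_rng(42)
n = 4000
x = rng.standard_normal(n)
# Column 1 is a non-linear function of column 0; column 2 is independent noise.
features = np.column_stack([x, x ** 2, rng.standard_normal(n)])
uniform = np.ones(n)

dep_nonlinear = weighted_dependence(features[:, 0], features[:, 1], uniform)
dep_independent = weighted_dependence(features[:, 0], features[:, 2], uniform)
print(dep_nonlinear > dep_independent)
```

Note that `x` and `x ** 2` are linearly uncorrelated for a symmetric distribution, so a plain covariance would miss their relationship; the random Fourier mapping is what exposes it. Restricting the loss to salient pairs mirrors the abstract's point that only correlations related to changed features need to be minimized.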
Related papers
- OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences [16.828732283348817]
We propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD).
OIL-AD detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association.
Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
arXiv Detail & Related papers (2024-02-07T04:06:53Z)
- Conditional Mutual Information for Disentangled Representations in Reinforcement Learning [13.450394764597663]
Reinforcement Learning environments can produce training data with spurious correlations between features.
Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features.
We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features.
arXiv Detail & Related papers (2023-05-23T14:56:19Z)
- ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning [2.984934409689467]
Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space.
We propose ReCCoVER, an algorithm that detects causal confusion in an agent's reasoning before deployment.
arXiv Detail & Related papers (2022-03-21T13:17:30Z)
- A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions.
arXiv Detail & Related papers (2021-07-18T12:37:05Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve person-of-interest between heterogeneous modalities.
This paper presents a Geometry-Guided Dual-Alignment learning framework (G$^2$DA) to tackle sample-level modality difference.
arXiv Detail & Related papers (2021-06-15T03:14:31Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
- Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution [27.48620879003556]
We present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps.
In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations.
This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels.
arXiv Detail & Related papers (2020-04-20T23:55:30Z)
- Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection [86.69077525494106]
Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models.
Existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift.
We propose a Self-Guided Adaptation (SGA) model, aimed at aligning feature representations and transferring object detection models across domains.
arXiv Detail & Related papers (2020-03-19T13:30:45Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that achieves better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.