Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
- URL: http://arxiv.org/abs/2402.15430v2
- Date: Thu, 11 Apr 2024 06:40:12 GMT
- Title: Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
- Authors: Shuren Qi, Yushu Zhang, Chao Wang, Zhihua Xia, Xiaochun Cao, Jian Weng
- Abstract summary: We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With this over-completeness, discriminative features w.r.t. the task can be formed adaptively in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
- Score: 54.78115855552886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm embeds task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, which restricts their use in larger-scale trustworthy vision tasks. For this open problem, we conduct a systematic investigation of hierarchical invariance, exploring the topic from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture, yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework for a given task. With the over-completeness, discriminative features w.r.t. the task can be formed adaptively in a Neural Architecture Search (NAS)-like manner. We demonstrate the above arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). These applications show that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
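As a concrete illustration of this pipeline, the following minimal sketch builds an over-complete bank of rotation invariants (magnitudes of complex image moments, which pick up only a phase factor under rotation), stacks two levels into a small hierarchy, and uses greedy forward selection as a stand-in for the NAS-like adaptive step. All function names and the scoring heuristic are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of the hierarchical-invariance idea (not the authors' code).
# Assumptions: rotation invariance via magnitudes of complex image moments;
# a two-layer cascade as the "hierarchy"; greedy selection as the NAS-like step.
import numpy as np

def complex_moment(patch, p, q):
    """Complex moment c_{p,q}; |c_{p,q}| is rotation-invariant about the patch center."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    z = (xs - w / 2) + 1j * (ys - h / 2)
    return np.sum((z ** p) * (np.conj(z) ** q) * patch)

def invariant_layer(image, orders=((1, 1), (2, 0), (2, 1), (3, 1))):
    """One layer: an over-complete bank of rotation-invariant features."""
    return np.array([abs(complex_moment(image, p, q)) for p, q in orders])

def hierarchical_invariants(image):
    """Two-level cascade: invariants of the image, then of its gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    grad = np.hypot(gx, gy)
    return np.concatenate([invariant_layer(image), invariant_layer(grad)])

def greedy_select(X, y, k=3):
    """NAS-like adaptive step: greedily keep the k most discriminative features,
    scored here by a crude between/within class-variance ratio."""
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = X[:, chosen + [j]]
            mu = [cols[y == c].mean(axis=0) for c in np.unique(y)]
            between = np.var(np.stack(mu), axis=0).sum()
            within = sum(cols[y == c].var(axis=0).sum() for c in np.unique(y)) + 1e-9
            if between / within > best_score:
                best, best_score = j, between / within
        chosen.append(best)
    return chosen
```

A feature matrix for `greedy_select` would be built as `X = np.stack([hierarchical_invariants(img) for img in images])`.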
Related papers
- Disentangling Representations through Multi-task Learning [0.0]
We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve classification tasks.
We experimentally validate these predictions in RNNs trained on multi-task classification.
We find that transformers are particularly suited for disentangling representations, which might explain their unique world-understanding abilities.
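A minimal sketch of the multi-task setup, assuming a shared encoder with one linear head per task; the paper's RNN and transformer variants are not reproduced, and all names and sizes here are illustrative.

```python
# Hedged sketch: one shared representation optimized for several classification tasks.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=32, rep_dim=16, task_classes=(4, 3, 5)):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, rep_dim))  # shared representation
        self.heads = nn.ModuleList(nn.Linear(rep_dim, c) for c in task_classes)

    def forward(self, x):
        z = self.encoder(x)  # the representation whose disentanglement is studied
        return z, [head(z) for head in self.heads]

net = MultiTaskNet()
x = torch.randn(8, 32)
z, logits = net(x)
loss = sum(nn.functional.cross_entropy(l, torch.randint(0, l.shape[1], (8,)))
           for l in logits)  # all tasks optimized jointly
loss.backward()
```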
arXiv Detail & Related papers (2024-07-15T21:32:58Z)
- Geometric Understanding of Discriminability and Transferability for Visual Domain Adaptation [27.326817457760725]
Invariant representation learning for unsupervised domain adaptation (UDA) has made significant advances in computer vision and pattern recognition communities.
Recently, empirical connections between transferability and discriminability have received increasing attention.
In this work, we systematically analyze the essentials of transferability and discriminability from the geometric perspective.
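For intuition about the two quantities being related, the sketch below measures them with standard proxies: a linear MMD for transferability and a Fisher ratio for discriminability. These are assumptions for illustration, not necessarily the paper's exact geometric measures.

```python
# Common proxies for the two properties of a feature space (illustrative only).
import numpy as np

def linear_mmd(Xs, Xt):
    """Squared distance between domain means: small => more transferable features."""
    return np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)

def fisher_ratio(X, y):
    """Between-class over within-class scatter: large => more discriminative features."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    between = sum(len(X[y == c]) * np.sum((X[y == c].mean(axis=0) - mu) ** 2) for c in classes)
    within = sum(np.sum((X[y == c] - X[y == c].mean(axis=0)) ** 2) for c in classes)
    return between / (within + 1e-9)
```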
arXiv Detail & Related papers (2024-06-24T13:31:08Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the adversarial transferability of adversarial examples.
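A toy sketch of the transfer-based black-box setting: an adversarial example is crafted with FGSM on a surrogate model and applied, unmodified, to an unseen target model. Both models and the data here are stand-ins.

```python
# Transfer attack sketch: gradients come only from the surrogate, never the target.
import torch
import torch.nn as nn

surrogate = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
target = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # never queried for gradients

def fgsm(model, x, y, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach().clamp(0, 1)

x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = fgsm(surrogate, x, y)       # crafted on the surrogate...
pred = target(x_adv).argmax(dim=1)  # ...transferred to the unseen target
```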
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Multi-Scale and Multi-Layer Contrastive Learning for Domain Generalization [5.124256074746721]
We argue that the generalization ability of deep convolutional neural networks can be improved by taking advantage of multi-layer and multi-scale representations of the network.
We introduce a framework that aims at improving domain generalization of image classifiers by combining both low-level and high-level features at multiple scales.
We show that our model surpasses previous DG methods and consistently produces competitive, state-of-the-art results across all datasets.
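A rough sketch of the idea under stated assumptions: features are taken from two depths of a small CNN and an InfoNCE-style contrastive loss is applied at each level. The encoder and loss details are illustrative, not the paper's architecture.

```python
# Contrastive terms applied at more than one depth of the network (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.low = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.high = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())

    def forward(self, x):
        f_low = self.low(x)
        f_high = self.high(f_low)
        return [f.mean(dim=(2, 3)) for f in (f_low, f_high)]  # one embedding per level

def info_nce(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau  # positives sit on the diagonal
    return F.cross_entropy(logits, torch.arange(len(z1)))

enc = TwoLevelEncoder()
v1, v2 = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)   # two views/domains
loss = sum(info_nce(a, b) for a, b in zip(enc(v1), enc(v2)))  # summed over levels
```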
arXiv Detail & Related papers (2023-08-28T08:54:27Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
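A hedged sketch of combining the two discriminator types in a non-saturating GAN objective; the feature extractor stands in for the pretrained visual features, and all module names and shapes are illustrative rather than the paper's architecture.

```python
# Image-level discriminator on (stand-in) semantic features + object-level
# discriminator on crops, combined in the generator loss (illustrative only).
import torch
import torch.nn as nn

feat_extractor = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # stand-in for pretrained features
d_semantic = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))  # judges whole-image features
d_object = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))    # judges object crops

fake = torch.rand(2, 3, 64, 64)      # completed image from a generator
crops = fake[:, :, 16:48, 16:48]     # object regions (e.g., from detection boxes)
adv = nn.functional.softplus         # non-saturating GAN loss: softplus(-D(fake))
g_loss = adv(-d_semantic(feat_extractor(fake))).mean() + adv(-d_object(crops)).mean()
```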
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- A Principled Design of Image Representation: Towards Forensic Tasks [75.40968680537544]
We investigate the forensic-oriented image representation as a distinct problem, from the perspectives of theory, implementation, and application.
At the theoretical level, we propose a new representation framework for forensics, called Dense Invariant Representation (DIR), which is characterized by stable description with mathematical guarantees.
We demonstrate the above arguments on the dense-domain pattern detection and matching experiments, providing comparison results with state-of-the-art descriptors.
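For intuition, a dense description computes a local invariant at every pixel; the toy sketch below uses window energy (approximately rotation-invariant) in place of DIR's mathematically characterized invariants.

```python
# Dense description sketch: a rotation-invariant scalar per pixel (toy stand-in for DIR).
import numpy as np

def dense_invariant_map(image, radius=3):
    """Local rotation-invariant statistic (window RMS energy) at each pixel."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    pad = np.pad(image.astype(float), radius, mode="reflect")
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = np.sqrt(np.mean(win ** 2))  # approx. unchanged under window rotation
    return out
```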
arXiv Detail & Related papers (2022-03-02T07:46:52Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to new tasks in a sample-efficient manner.
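A toy sketch of the routing idea, assuming soft routing over a small set of learned linear "functions"; the actual Neural Interpreters design (typed signatures, iterative interpretation) is not reproduced here.

```python
# Tokens dispatched to learned "function" modules via a learned soft routing score.
import torch
import torch.nn as nn

class Router(nn.Module):
    def __init__(self, dim=16, n_funcs=4):
        super().__init__()
        self.funcs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_funcs))
        self.gate = nn.Linear(dim, n_funcs)  # routing is learned end-to-end

    def forward(self, tokens):                       # tokens: (batch, n_tokens, dim)
        weights = self.gate(tokens).softmax(dim=-1)  # how much each token uses each function
        outs = torch.stack([f(tokens) for f in self.funcs], dim=-1)  # (..., dim, n_funcs)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)

x = torch.randn(2, 5, 16)
y = Router()(x)  # same shape as x; gradients flow through the routing weights
```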
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Explainability-aided Domain Generalization for Image Classification [0.0]
We show that applying methods and architectures from the explainability literature can achieve state-of-the-art performance for the challenging task of domain generalization.
We develop a set of novel algorithms including DivCAM, an approach where the network receives guidance during training via gradient-based class activation maps to focus on a diverse set of discriminative features.
Since these methods offer competitive performance on top of explainability, we argue that the proposed methods can be used as a tool to improve the robustness of deep neural network architectures.
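An illustrative sketch of CAM-style guidance with a toy diversity penalty over class activation maps; this is an assumption-laden stand-in, not the exact DivCAM algorithm.

```python
# Plain CAMs from the last conv layer + a toy penalty that discourages all
# class maps from peaking in the same spatial location (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 8, 3, padding=1)
fc = nn.Linear(8, 5)  # classifier weights double as CAM weights

x = torch.rand(2, 3, 16, 16)
feats = F.relu(conv(x))                                 # (B, 8, 16, 16)
logits = fc(feats.mean(dim=(2, 3)))                     # global-average-pool classifier
cams = torch.einsum("kc,bchw->bkhw", fc.weight, feats)  # one map per class

flat = F.softmax(cams.flatten(2), dim=-1)               # (B, K, H*W) spatial distributions
overlap = torch.einsum("bkn,bjn->bkj", flat, flat)      # pairwise map overlaps
div_loss = (overlap - torch.diag_embed(torch.diagonal(overlap, dim1=1, dim2=2))).mean()
loss = F.cross_entropy(logits, torch.randint(0, 5, (2,))) + 0.1 * div_loss
```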
arXiv Detail & Related papers (2021-04-05T02:27:01Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
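A hedged sketch of concept-wise prototype scoring, assuming each concept is a contiguous slice of the embedding; the slicing and aggregation here are illustrative, not COMET's learned concept dimensions.

```python
# Few-shot scoring: per-concept prototypes, distances aggregated over concepts.
import torch

def comet_like_scores(query, support, support_y, n_concepts=4):
    """query: (d,); support: (n, d); each concept is a contiguous slice of dims."""
    scores = []
    for q_c, s_c in zip(query.chunk(n_concepts), support.chunk(n_concepts, dim=1)):
        protos = torch.stack([s_c[support_y == k].mean(dim=0)
                              for k in support_y.unique()])  # per-class, per-concept
        scores.append(-torch.cdist(q_c[None], protos)[0])    # negative distance
    return torch.stack(scores).sum(dim=0)                    # aggregate over concepts

support = torch.randn(10, 32)
support_y = torch.randint(0, 2, (10,))
print(comet_like_scores(torch.randn(32), support, support_y))
```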
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.