AnANet: Modeling Association and Alignment for Cross-modal Correlation
Classification
- URL: http://arxiv.org/abs/2109.00693v1
- Date: Thu, 2 Sep 2021 03:42:35 GMT
- Title: AnANet: Modeling Association and Alignment for Cross-modal Correlation
Classification
- Authors: Nan Xu, Junyan Wang, Yuan Tian, Ruike Zhang, and Wenji Mao
- Abstract summary: We present a comprehensive analysis of image-text correlation and define a new classification system based on implicit association and explicit alignment.
Experimental results on our newly constructed image-text correlation dataset show the effectiveness of our model.
- Score: 20.994250472941427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The explosive growth of multimodal data creates great demand for
cross-modal applications, which typically rely on a strict prior assumption of
relevance between modalities. Researchers have therefore studied how to define
cross-modal correlation categories and have constructed various classification
systems and predictive models. However, these systems focus on the fine-grained
relevant types of cross-modal correlation and ignore a large amount of
implicitly relevant data, which is often lumped into the irrelevant types.
Worse still, none of the previous predictive models reflect the essence of
cross-modal correlation, as captured by their own definitions, at the modeling
stage. In this paper, we present a comprehensive analysis of image-text
correlation and define a new classification system based on implicit
association and explicit alignment. To predict the type of image-text
correlation, we propose the Association and Alignment Network (AnANet), built
on this definition, which implicitly represents the global discrepancy and
commonality between image and text and explicitly captures cross-modal local
relevance. Experimental results on our newly constructed image-text correlation
dataset show the effectiveness of our model.
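The abstract describes two complementary signals: an implicit association between global image and text representations (their commonality and discrepancy) and an explicit alignment between local regions and tokens. The PyTorch sketch below illustrates only that two-branch idea; the feature dimensions, fusion choices, and number of correlation classes are assumptions, not the authors' released AnANet implementation.

```python
# Minimal two-branch sketch in the spirit of AnANet: a global association branch
# plus a local alignment branch feeding a correlation-type classifier.
# Dimensions and the number of classes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssociationAlignmentSketch(nn.Module):
    def __init__(self, dim=512, num_classes=4):
        super().__init__()
        # Association branch: models global commonality and discrepancy.
        self.assoc_mlp = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU())
        # Alignment branch: summarizes local region-token relevance.
        self.align_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.classifier = nn.Linear(dim * 2, num_classes)

    def forward(self, img_global, txt_global, img_regions, txt_tokens):
        # img_global, txt_global: (B, D); img_regions: (B, Nr, D); txt_tokens: (B, Nt, D)
        commonality = img_global * txt_global                 # shared global content
        discrepancy = torch.abs(img_global - txt_global)      # global modality gap
        assoc = self.assoc_mlp(torch.cat([commonality, discrepancy], dim=-1))

        # Explicit local alignment: cosine similarity between every region and token.
        r = F.normalize(img_regions, dim=-1)
        t = F.normalize(txt_tokens, dim=-1)
        sim = torch.bmm(r, t.transpose(1, 2))                 # (B, Nr, Nt)
        aligned = torch.bmm(sim.softmax(dim=-1), txt_tokens)  # text aligned to each region
        align = self.align_mlp(aligned.mean(dim=1))           # pool over regions

        return self.classifier(torch.cat([assoc, align], dim=-1))

# Usage with random tensors standing in for CNN/BERT features.
model = AssociationAlignmentSketch()
logits = model(torch.randn(2, 512), torch.randn(2, 512),
               torch.randn(2, 36, 512), torch.randn(2, 20, 512))
print(logits.shape)  # torch.Size([2, 4])
```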
Related papers
- Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching [10.709744162565274]
We propose a novel method called DIAS to bridge the modality gap from two aspects.
The method achieves 4.3%-10.2% rSum improvements on Flickr30k and MSCOCO benchmarks.
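The rSum figure quoted above is the standard aggregate retrieval metric: the sum of Recall@1, @5, and @10 for both image-to-text and text-to-image retrieval (600 at the ceiling). A small NumPy sketch of how it is typically computed from a similarity matrix, using a 1-to-1 toy setup rather than the 5-captions-per-image protocol of Flickr30k/MSCOCO, and not the DIAS code itself:

```python
# Illustrative rSum computation: sum of Recall@{1,5,10} in both retrieval directions.
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    # sim[i, j]: similarity of query i to gallery item j; ground truth is the diagonal.
    ranks = (-sim).argsort(axis=1)  # best match first
    gt_rank = np.array([np.where(ranks[i] == i)[0][0] for i in range(sim.shape[0])])
    return {k: float((gt_rank < k).mean()) * 100 for k in ks}

def rsum(sim):
    i2t = recall_at_k(sim)      # image query -> text gallery
    t2i = recall_at_k(sim.T)    # text query -> image gallery
    return sum(i2t.values()) + sum(t2i.values())

rng = np.random.default_rng(0)
sim = rng.normal(size=(100, 100)) + 5.0 * np.eye(100)  # toy scores favouring the true pair
print(round(rsum(sim), 1))  # approaches 600 as retrieval becomes perfect
```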
arXiv Detail & Related papers (2024-10-22T09:37:29Z)
- Towards Deconfounded Image-Text Matching with Causal Inference [36.739004282369656]
We propose an innovative Deconfounded Causal Inference Network (DCIN) for image-text matching task.
DCIN decomposes the intra- and inter-modal confounders and incorporates them into the encoding stage of visual and textual features.
It can learn causality instead of spurious correlations caused by dataset bias.
arXiv Detail & Related papers (2024-08-22T11:04:28Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
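The pipeline described here, probing the classifier with counterfactual images and then folding them back in as augmented training data, can be summarized as a short fine-tuning loop. The sketch below assumes the counterfactual set already exists (e.g., produced by a language-guided generator) and uses placeholder dataset and optimizer settings, not the paper's actual tooling:

```python
# Hedged sketch of the test-then-augment loop described above.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def find_weaknesses(model, counterfactual_loader, device="cpu"):
    """Return the counterfactual examples the current model misclassifies."""
    model.eval()
    hard_x, hard_y = [], []
    with torch.no_grad():
        for x, y in counterfactual_loader:
            pred = model(x.to(device)).argmax(dim=1).cpu()
            mask = pred != y
            hard_x.append(x[mask]); hard_y.append(y[mask])
    return TensorDataset(torch.cat(hard_x), torch.cat(hard_y))

def reinforce(model, original_ds, counterfactual_ds, epochs=1, lr=1e-4, device="cpu"):
    """Fine-tune on the original data augmented with counterfactual images."""
    loader = DataLoader(ConcatDataset([original_ds, counterfactual_ds]),
                        batch_size=32, shuffle=True)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
    return model
```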
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Spuriousness-Aware Meta-Learning for Learning Robust Classifiers [26.544938760265136]
Spurious correlations are brittle associations between certain attributes of inputs and target variables.
Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold.
Mitigating the impact of spurious correlations is crucial for robust model generalization, but it often requires annotations of the spurious correlations in the data.
arXiv Detail & Related papers (2024-06-15T21:41:25Z)
- CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models [1.068128849363198]
We introduce a novel framework to assess the causal effect of concepts on specific classification outcomes.
We leverage state-of-the-art diffusion-based generative models to estimate counterfactual outcomes.
Our approach compares these causal attributions with closely related associational attributions, both theoretically and empirically.
arXiv Detail & Related papers (2024-05-24T18:33:18Z)
- Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose a Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
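Matching modules of this kind start from a dense correlation volume between query and support feature maps. The sketch below computes that generic cosine-similarity cost volume, optionally masked to the annotated support foreground; it is a common primitive, not FECANet's correlation reconstruction module itself:

```python
# Generic dense correlation volume between query and support feature maps,
# the kind of cost volume few-shot matching modules refine.
import torch
import torch.nn.functional as F

def correlation_volume(query_feat, support_feat, support_mask=None):
    # query_feat, support_feat: (B, C, H, W); support_mask: (B, 1, H, W) with 1 = foreground
    if support_mask is not None:
        support_feat = support_feat * support_mask   # keep only annotated support pixels
    q = F.normalize(query_feat.flatten(2), dim=1)    # (B, C, HW)
    s = F.normalize(support_feat.flatten(2), dim=1)  # (B, C, HW)
    corr = torch.bmm(q.transpose(1, 2), s)           # (B, HW_query, HW_support) cosine similarities
    return corr.clamp(min=0)                         # common choice: drop negative correlations

corr = correlation_volume(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
print(corr.shape)  # torch.Size([1, 1024, 1024])
```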
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing [29.820473012776283]
Existing entity typing models are subject to the problem of spurious correlations.
We identify six types of existing model biases, including mention-context bias, lexical overlapping bias, named entity bias, pronoun bias, dependency bias, and overgeneralization bias.
Augmenting the original training set with bias-free counterparts forces models to fully comprehend the sentences.
arXiv Detail & Related papers (2022-05-25T10:34:22Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
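The claim concerns the accuracies of many models plotted against each other, so it can be checked with a simple correlation and linear fit. The sketch below uses made-up accuracy pairs for illustration; the paper reports its fits on probit-scaled axes, which the last lines approximate:

```python
# Checking an "accuracy on the line" style claim: correlate in-distribution (ID)
# and out-of-distribution (OOD) accuracies across a collection of models.
# The numbers below are hypothetical, one (ID, OOD) pair per model, in percent.
import numpy as np
from scipy import stats

id_acc  = np.array([92.1, 94.3, 90.5, 95.8, 88.0, 96.4, 93.2])
ood_acc = np.array([81.0, 84.9, 78.2, 87.5, 73.9, 88.8, 83.1])

r, p = stats.pearsonr(id_acc, ood_acc)
slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
print(f"linear fit: OOD ~ {slope:.2f} * ID + {intercept:.1f}")

# Probit-scaled version, mirroring the axes used in the paper's plots.
phi_inv = stats.norm.ppf
r_probit, _ = stats.pearsonr(phi_inv(id_acc / 100), phi_inv(ood_acc / 100))
print(f"Pearson r on probit axes = {r_probit:.3f}")
```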
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
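The core idea, treating the raw correlation map between two images as a set of tokens and letting a transformer refine the noisy matching costs, can be sketched in a few lines. The layer sizes, the single feature level, and the missing appearance-affinity terms below are simplifications, not the CATs architecture:

```python
# Minimal "aggregate a correlation map with a transformer" sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostAggregationSketch(nn.Module):
    def __init__(self, hw=256, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hw, nhead=heads, batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat_src, feat_tgt):
        # feat_src, feat_tgt: (B, C, H, W) features of the two images.
        src = F.normalize(feat_src.flatten(2), dim=1)   # (B, C, HW)
        tgt = F.normalize(feat_tgt.flatten(2), dim=1)
        cost = torch.bmm(src.transpose(1, 2), tgt)      # (B, HW_src, HW_tgt) raw correlations
        # Each source position becomes a token whose embedding is its row of matching
        # costs; self-attention refines the noisy raw correlation map.
        refined = self.aggregator(cost)                 # (B, HW_src, HW_tgt)
        return refined.softmax(dim=-1)                  # soft correspondence per source position

model = CostAggregationSketch(hw=16 * 16)
corr = model(torch.randn(2, 128, 16, 16), torch.randn(2, 128, 16, 16))
print(corr.shape)  # torch.Size([2, 256, 256])
```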
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
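A common way to realize this kind of objective is to add a feature-decorrelation penalty (the off-diagonal covariance of the learned representation) to the usual classification loss. The sketch below shows that generic penalty with assumed `model.backbone` and `model.head` accessors; it is an illustrative simplification, not the PFDL algorithm:

```python
# Generic feature-decorrelation penalty of the kind decorrelation-based OOD methods
# build on; not the authors' PFDL implementation.
import torch
import torch.nn.functional as F

def decorrelation_penalty(features):
    # features: (B, D) representations from the backbone.
    z = features - features.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (features.shape[0] - 1)          # (D, D) sample covariance
    off_diag = cov - torch.diag(torch.diagonal(cov))   # zero out the diagonal
    return (off_diag ** 2).sum() / features.shape[1]

def training_loss(model, x, y, lam=0.1):
    # model.backbone / model.head are hypothetical accessors for a feature extractor
    # returning (B, D) features and a linear classifier on top of them.
    feats = model.backbone(x)
    logits = model.head(feats)
    return F.cross_entropy(logits, y) + lam * decorrelation_penalty(feats)
```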
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.