Referring Expressions with Rational Speech Act Framework: A
Probabilistic Approach
- URL: http://arxiv.org/abs/2205.07795v1
- Date: Mon, 16 May 2022 16:37:50 GMT
- Title: Referring Expressions with Rational Speech Act Framework: A
Probabilistic Approach
- Authors: Hieu Le, Taufiq Daryanto, Fabian Zhafransyah, Derry Wijaya, Elizabeth
Coppock, Sang Chin
- Abstract summary: This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene.
Several recent REG systems have used deep learning approaches to represent the speaker/listener agents.
This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes.
- Score: 2.1425861443122383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on a referring expression generation (REG) task in which
the aim is to pick out an object in a complex visual scene. One common
theoretical approach to this problem is to model the task as a two-agent
cooperative scheme in which a `speaker' agent would generate the expression
that best describes a targeted area and a `listener' agent would identify the
target. Several recent REG systems have used deep learning approaches to
represent the speaker/listener agents. The Rational Speech Act framework (RSA),
a Bayesian approach to pragmatics that can predict human linguistic behavior
quite accurately, has been shown to generate high quality and explainable
expressions on toy datasets involving simple visual scenes. Its application to
large scale problems, however, remains largely unexplored. This paper applies a
combination of the probabilistic RSA framework and deep learning approaches to
larger datasets involving complex visual scenes in a multi-step process with
the aim of generating better-explained expressions. We carry out experiments on
the RefCOCO and RefCOCO+ datasets and compare our approach with other
end-to-end deep learning approaches as well as a variation of RSA to highlight
our key contribution. Experimental results show that while achieving lower
accuracy than SOTA deep learning methods, our approach outperforms a similar
RSA approach in human comprehension and has an advantage over end-to-end deep
learning in limited-data scenarios. Lastly, we provide a detailed analysis of
the expression generation process with concrete examples, giving a systematic
view of error types and deficiencies in the generation process and identifying
possible areas for future improvement.
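To make the Bayesian recursion behind the RSA framework concrete, below is a minimal sketch of the literal-listener / pragmatic-speaker step on a toy lexicon. The lexicon, uniform utterance costs, and rationality parameter alpha are illustrative assumptions for exposition, not the paper's actual model or hyperparameters.
```python
import numpy as np

# Truth-conditional lexicon (assumed toy example): rows are candidate referents,
# columns are utterances; 1.0 means the utterance is literally true of the referent.
lexicon = np.array([
    [1.0, 1.0, 0.0],   # referent 0: "object", "red"
    [1.0, 0.0, 1.0],   # referent 1: "object", "striped"
])
utterance_cost = np.zeros(lexicon.shape[1])  # uniform utterance cost (assumption)
alpha = 1.0                                  # speaker rationality (assumption)

def literal_listener(lex):
    """L0: P(referent | utterance), truth values renormalized over referents."""
    return lex / lex.sum(axis=0, keepdims=True)

def pragmatic_speaker(lex):
    """S1: P(utterance | referent) via softmax of log L0 minus utterance cost."""
    l0 = literal_listener(lex)                         # shape (referents, utterances)
    utility = alpha * (np.log(l0 + 1e-12) - utterance_cost)
    scores = np.exp(utility)
    return scores / scores.sum(axis=1, keepdims=True)  # normalize over utterances

if __name__ == "__main__":
    print(pragmatic_speaker(lexicon))
```
Running the sketch, the speaker prefers the utterance that best discriminates the target ("red" for referent 0, "striped" for referent 1); this is the pragmatic behavior the paper scales up to complex RefCOCO-style visual scenes with neural speaker/listener components.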
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
arXiv Detail & Related papers (2024-10-11T18:02:46Z)
- READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE).
Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
Beam search seeks the transcript with the greatest likelihood computed using the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
arXiv Detail & Related papers (2022-12-27T06:42:26Z)
- Multivariate Business Process Representation Learning utilizing Gramian Angular Fields and Convolutional Neural Networks [0.0]
Learning meaningful representations of data is an important aspect of machine learning.
For predictive process analytics, it is essential to have all explanatory characteristics of a process instance available.
We propose a novel approach for representation learning of business process instances.
arXiv Detail & Related papers (2021-06-15T10:21:14Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.