Human Parity on CommonsenseQA: Augmenting Self-Attention with External
Attention
- URL: http://arxiv.org/abs/2112.03254v1
- Date: Mon, 6 Dec 2021 18:59:02 GMT
- Title: Human Parity on CommonsenseQA: Augmenting Self-Attention with External
Attention
- Authors: Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng,
Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
- Abstract summary: We propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear.
We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems.
The proposed system reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4% in comparison to the human accuracy of 88.9%.
- Score: 66.93307963324834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of today's AI systems focus on using self-attention mechanisms and
transformer architectures on large amounts of diverse data to achieve
impressive performance gains. In this paper, we propose to augment the
transformer architecture with an external attention mechanism to bring external
knowledge and context to bear. By integrating external information into the
prediction process, we hope to reduce the need for ever-larger models and
increase the democratization of AI systems. We find that the proposed external
attention mechanism can significantly improve the performance of existing AI
systems, allowing practitioners to easily customize foundation AI models to
many diverse downstream applications. In particular, we focus on the task of
Commonsense Reasoning, demonstrating that the proposed external attention
mechanism can augment existing transformer models and significantly improve the
model's reasoning capabilities. The proposed system, Knowledge External
Attention for Reasoning (KEAR), reaches human parity on the open CommonsenseQA
research benchmark with an accuracy of 89.4% in comparison to the human
accuracy of 88.9%.
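The core idea of external attention can be illustrated with a minimal sketch: retrieved knowledge text is encoded alongside the input, and attention runs over the concatenation of the two, so every query token can draw on both the input context and the external knowledge. This is a simplified single-head illustration under assumed shapes, not KEAR's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(queries, input_keys, input_values, ext_keys, ext_values):
    """Attend over the concatenation of input and external-knowledge tokens.

    The external knowledge (e.g. retrieved dictionary or KB entries, already
    encoded into vectors) is appended to the self-attention context, so the
    model's predictions can be conditioned on information outside the input.
    """
    d = queries.shape[-1]
    keys = np.concatenate([input_keys, ext_keys], axis=0)
    values = np.concatenate([input_values, ext_values], axis=0)
    scores = queries @ keys.T / np.sqrt(d)   # (n_queries, n_input + n_ext)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))                  # 4 input tokens as queries
k_in = rng.normal(size=(4, 8))               # keys/values for input tokens
v_in = rng.normal(size=(4, 8))
k_ext = rng.normal(size=(6, 8))              # 6 retrieved knowledge tokens
v_ext = rng.normal(size=(6, 8))
out = external_attention(q, k_in, v_in, k_ext, v_ext)
print(out.shape)  # (4, 8)
```

Because the knowledge tokens enter only through the attended context, a pretrained transformer can be customized to a downstream task by changing what is retrieved, without growing the model itself.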
Related papers
- Planning-Aware Diffusion Networks for Enhanced Motion Forecasting in Autonomous Driving [0.0]
Planning-Integrated Forecasting Model (PIFM) is a novel framework inspired by neural mechanisms governing decision-making and multi-agent coordination in the brain.
PIFM is able to forecast future trajectories of all agents within a scenario.
This architecture enhances model transparency, as it parallels the brain's method of dynamically adjusting predictions based on external stimuli and other agents' behaviors.
arXiv Detail & Related papers (2024-10-25T15:44:51Z) - To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems [11.690126756498223]
The vision of optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems.
In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable.
arXiv Detail & Related papers (2024-09-22T09:43:27Z) - From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures [1.5266118210763295]
Recent developments in artificial intelligence like the Transformer architecture incorporate the idea of attention in model designs.
Our review aims to provide a comparative analysis of these mechanisms from a cognitive-functional perspective.
arXiv Detail & Related papers (2024-04-25T05:13:38Z) - LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z) - Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline
Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z) - KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z) - AutonoML: Towards an Integrated Framework for Autonomous Machine
Learning [9.356870107137095]
This review seeks to motivate a more expansive perspective on what constitutes an automated/autonomous ML system.
In doing so, we survey developments in the following research areas.
We develop a conceptual framework throughout the review, augmented by each topic, to illustrate one possible way of fusing high-level mechanisms into an autonomous ML system.
arXiv Detail & Related papers (2020-12-23T11:01:10Z) - Attention that does not Explain Away [54.42960937271612]
Models based on the Transformer architecture have achieved better accuracy than models based on competing architectures across a large set of tasks.
A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances.
We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect.
arXiv Detail & Related papers (2020-09-29T21:05:39Z)
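The doubly-normalized idea can be sketched as alternating normalization of the attention matrix along both axes, so that no key's attention mass can be entirely "explained away" by competing keys. The following is a generic Sinkhorn-style illustration of double normalization, assumed for exposition rather than the paper's exact formulation.

```python
import numpy as np

def doubly_normalized_attention(scores, n_iter=5):
    """Sinkhorn-style double normalization of attention scores.

    Alternately normalizes columns (per-key mass across queries) and rows
    (per-query distribution over keys). The column step guarantees every
    key retains some total attention, mitigating the explaining-away
    effect of plain row-wise softmax.
    """
    w = np.exp(scores - scores.max())            # positive weights
    for _ in range(n_iter):
        w = w / w.sum(axis=0, keepdims=True)     # normalize over queries
        w = w / w.sum(axis=1, keepdims=True)     # normalize over keys
    return w

rng = np.random.default_rng(1)
s = rng.normal(size=(4, 4))                      # raw attention scores
w = doubly_normalized_attention(s)
print(np.allclose(w.sum(axis=1), 1.0))           # True: rows are distributions
```

Ending on the row step keeps each query's weights a valid distribution, while the interleaved column steps spread mass across keys.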
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.