Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social
Commonsense
- URL: http://arxiv.org/abs/2105.05913v1
- Date: Wed, 12 May 2021 19:18:02 GMT
- Authors: Ting-Yun Chang, Yang Liu, Karthik Gopalakrishnan, Behnam Hedayatnia,
Pei Zhou, Dilek Hakkani-Tur
- Abstract summary: We focus on the Social IQA dataset, a task requiring social and emotional commonsense reasoning.
We propose several architecture variations and extensions, as well as leveraging external commonsense corpora.
Our proposed system achieves results competitive with those of the top-ranking models on the leaderboard.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models have demonstrated outstanding performance in many
NLP tasks recently. However, their social intelligence, which requires
commonsense reasoning about the current situation and mental states of others,
is still developing. Towards improving language models' social intelligence, we
focus on the Social IQA dataset, a task requiring social and emotional
commonsense reasoning. Building on top of the pretrained RoBERTa and GPT2
models, we propose several architecture variations and extensions, as well as
leveraging external commonsense corpora, to optimize the model for Social IQA.
Our proposed system achieves results competitive with those of the top-ranking
models on the leaderboard. This work demonstrates the strengths of pretrained language
models, and provides viable ways to improve their performance for a particular
task.
Related papers
- Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization [0.6629765271909505]
This paper introduces a novel approach to model alignment through weak-to-strong generalization in the context of language models.
Our results suggest that this facilitation-based approach not only enhances model performance but also provides insights into the nature of model alignment.
arXiv Detail & Related papers (2024-09-11T15:16:25Z) - Measuring Social Norms of Large Language Models [13.648679166997693]
We present a new challenge to examine whether large language models understand social norms.
Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions.
We propose a multi-agent framework based on large language models to improve the models' ability to understand social norms.
arXiv Detail & Related papers (2024-04-03T05:58:57Z) - SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM)
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z) - Qwen Technical Report [132.54304067403922]
We introduce Qwen, the first installment of our large language model series.
The series comprises Qwen, the base pretrained language models, and Qwen-Chat, chat models fine-tuned with human alignment techniques.
We have also developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat.
arXiv Detail & Related papers (2023-09-28T17:07:49Z) - Large Language Models Are Also Good Prototypical Commonsense Reasoners [11.108562540123387]
Traditional fine-tuning approaches can be resource-intensive and potentially compromise a model's generalization capacity.
We draw inspiration from the outputs of large models on tailored tasks and semi-automatically develop a set of novel prompts.
With better-designed prompts, we achieve a new state of the art (SOTA) on the ProtoQA leaderboard.
arXiv Detail & Related papers (2023-09-22T20:07:24Z) - Generative Agent-Based Modeling: Unveiling Social System Dynamics
through Coupling Mechanistic Models with Generative Artificial Intelligence [0.5898893619901381]
We discuss the emerging new opportunity for building feedback-rich computational models of social systems using generative artificial intelligence.
Referred to as Generative Agent-Based Models (GABMs), such individual-level models utilize large language models such as ChatGPT to represent human decision-making in social settings.
We provide a GABM case in which human behavior can be incorporated in simulation models by coupling a mechanistic model of human interactions with a pre-trained large language model.
arXiv Detail & Related papers (2023-09-20T16:43:05Z) - Training Socially Aligned Language Models on Simulated Social
Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z) - Beyond the Imitation Game: Quantifying and extrapolating the
capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
Big-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z) - ANNA: Enhanced Language Representation for Question Answering [5.713808202873983]
We show how these approaches affect performance both individually and when jointly applied in pre-training models.
We propose an extended pre-training task, and a new neighbor-aware mechanism that attends neighboring tokens more to capture the richness of context for pre-training language modeling.
Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet.
arXiv Detail & Related papers (2022-03-28T05:26:52Z) - Knowledge Distillation for Quality Estimation [79.51452598302934]
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations.
Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results.
We show that this approach, in combination with data augmentation, leads to light-weight QE models that, with 8x fewer parameters, perform competitively with distilled pre-trained representations.
arXiv Detail & Related papers (2021-07-01T12:36:21Z) - What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.