Teaching Human Behavior Improves Content Understanding Abilities Of LLMs
- URL: http://arxiv.org/abs/2405.00942v3
- Date: Thu, 10 Oct 2024 07:59:09 GMT
- Title: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs
- Authors: Somesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy
- Abstract summary: Training LLMs on receiver behavior can help improve their content-understanding abilities.
We show this performance increase across 46 video and image understanding tasks on 26 benchmark datasets.
We release cleaned receiver behavior (comments and likes) for 750k images and videos collected from multiple platforms, along with our instruction-tuning data.
- Score: 56.574610730939646
- Abstract: Communication is defined as "Who says what to whom with what effect". A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Despite carrying these signals, behavior data is often ignored while training large language models. We show that training LLMs on receiver behavior can actually help improve their content-understanding abilities. Specifically, we show that training LLMs to predict the receiver behavior of likes and comments improves the LLM's performance on a wide variety of downstream content understanding tasks. We show this performance increase across 46 video and image understanding tasks on 26 benchmark datasets, in both 0-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior, such as likes and comments, is collected by default on the internet and does not need any human annotations to be useful, the performance improvement we get after training on this data is essentially a free lunch. We release cleaned receiver behavior (comments and likes) for 750k images and videos collected from multiple platforms, along with our instruction-tuning data.
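A minimal sketch of how such behavior data could be converted into instruction-tuning records, assuming a simple JSON schema and prompt template (both are illustrative assumptions, not the paper's released format):

```python
# Hypothetical sketch: turning scraped receiver behavior (likes, comments)
# into instruction-tuning records for behavior prediction. Field names and
# the prompt template are assumptions, not the paper's exact format.
import json

def build_behavior_sample(caption: str, likes: int, comments: list[str]) -> dict:
    """Create one instruction-tuning record that asks the model to predict
    receiver behavior for a described piece of content."""
    instruction = (
        "Given the following description of a post, predict how receivers "
        f"will respond to it.\nPost: {caption}"
    )
    # The supervision target is the behavior itself: like count plus comments.
    response = f"Likes: {likes}\nComments: " + " | ".join(comments)
    return {"instruction": instruction, "output": response}

if __name__ == "__main__":
    sample = build_behavior_sample(
        caption="A tutorial video on fine-tuning small language models.",
        likes=412,
        comments=["Super clear explanation!", "What GPU did you use?"],
    )
    print(json.dumps(sample, indent=2))
```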
Related papers
- Feedback Loops With Language Models Drive In-Context Reward Hacking [78.9830398771605]
We show that feedback loops can cause in-context reward hacking (ICRH).
We identify and study two processes that lead to ICRH: output-refinement and policy-refinement.
As AI development accelerates, the effects of feedback loops will proliferate.
arXiv Detail & Related papers (2024-02-09T18:59:29Z)
- From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes.
The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models.
Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z)
- Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior [66.4024040742149]
We introduce the receivers' "behavior tokens," such as shares, likes, clicks, purchases, and retweets, in the LLM's training corpora to optimize content for the receivers and predict their behaviors.
In addition to showing performance comparable to LLMs on content understanding tasks, our trained models show generalization capabilities on the behavior dimension.
We call these models Large Content and Behavior Models (LCBMs).
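A minimal sketch of the behavior-token idea, assuming a simple serialized token format (the token vocabulary below is an illustrative assumption, not LCBM's actual one):

```python
# Hypothetical sketch of "behavior tokens": serializing receiver behavior
# and appending it to the content in the training corpus, so the LM models
# content and downstream behavior jointly. The <likes=...> style tokens are
# an assumption, not the paper's exact vocabulary.
def with_behavior_tokens(content: str, likes: int, shares: int, clicks: int) -> str:
    behavior = f"<likes={likes}> <shares={shares}> <clicks={clicks}>"
    return f"{content} {behavior}"

print(with_behavior_tokens("New phone review: the battery lasts two days.", 980, 120, 4500))
```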
arXiv Detail & Related papers (2023-09-01T09:34:49Z)
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs [101.50522135049198]
BuboGPT is a multi-modal LLM with visual grounding that can perform cross-modal interaction between vision, audio and language.
Our contributions are two-fold: 1) An off-the-shelf visual grounding module based on SAM that extracts entities in a sentence and finds corresponding masks in the image.
Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interactions with humans.
arXiv Detail & Related papers (2023-07-17T15:51:47Z)
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training [108.25635150124539]
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs.
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects.
arXiv Detail & Related papers (2023-06-02T17:11:37Z)
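A minimal sketch of a fine-grained reward in this spirit: separate scores per text segment and per category, combined with weights. The scorers below are stand-in stubs, not the paper's learned reward models:

```python
# Hypothetical sketch of a fine-grained reward: per-segment, per-category
# scores combined with category weights. Stub scorers stand in for the
# learned reward models described in the paper (an assumption).
from typing import Callable

def fine_grained_reward(
    segments: list[str],
    scorers: dict[str, Callable[[str], float]],
    weights: dict[str, float],
) -> float:
    """Sum weighted per-category scores over each segment of a generation."""
    total = 0.0
    for segment in segments:
        for category, scorer in scorers.items():
            total += weights[category] * scorer(segment)
    return total

# Toy scorers: reward on-topic segments, penalize toxic ones.
scorers = {
    "relevance": lambda s: 1.0 if "topic" in s else 0.0,
    "toxicity": lambda s: -1.0 if "idiot" in s else 0.0,
}
weights = {"relevance": 0.5, "toxicity": 1.0}
print(fine_grained_reward(["This stays on topic.", "You idiot."], scorers, weights))
```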