The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
- URL: http://arxiv.org/abs/2311.09193v1
- Date: Wed, 15 Nov 2023 18:39:21 GMT
- Title: The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
- Authors: Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C. Gee,
Yixin Nie
- Abstract summary: The study explores the effectiveness of the Chain-of-Thought approach in improving vision-language tasks.
We present the "Description then Decision" strategy, which is inspired by how humans process signals.
- Score: 51.47803406138838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study explores the effectiveness of the Chain-of-Thought approach, known
for its proficiency in language tasks by breaking them down into sub-tasks and
intermediate steps, in improving vision-language tasks that demand
sophisticated perception and reasoning. We present the "Description then
Decision" strategy, which is inspired by how humans process signals. This
strategy significantly improves probing task performance by 50%, establishing
the groundwork for future research on reasoning paradigms in complex
vision-language tasks.
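Read as a procedure, the "Description then Decision" strategy is a two-stage prompting pipeline: first elicit a detailed textual description of the image (perception), then reason over that description together with the question to reach an answer (decision). The sketch below illustrates this idea under stated assumptions; `query_vlm` and `query_llm` are hypothetical model wrappers and the prompt wording is illustrative, not the authors' released code.

```python
# Hedged sketch of a "Description then Decision" pipeline.
# `query_vlm` and `query_llm` are assumed wrappers around a
# vision-language model call and a text-only model call; they are
# placeholders, not APIs from the paper.

from typing import Callable


def describe_then_decide(
    image,                      # image object accepted by the VLM wrapper
    question: str,
    query_vlm: Callable,        # (prompt: str, image) -> str
    query_llm: Callable,        # (prompt: str) -> str
) -> str:
    # Stage 1 (perception): ask for a task-agnostic description of the image.
    description = query_vlm(
        "Describe this image in detail, covering objects, attributes, "
        "and spatial relations.",
        image,
    )

    # Stage 2 (decision): reason over the description plus the question,
    # without re-reading pixels, and return the final answer.
    decision_prompt = (
        f"Image description:\n{description}\n\n"
        f"Question: {question}\n"
        "Reason step by step over the description, then give a final answer."
    )
    return query_llm(decision_prompt)
```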
Related papers
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
The survey underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Using Left and Right Brains Together: Towards Vision and Language Planning [95.47128850991815]
We introduce a novel vision-language planning framework to perform concurrent visual and language planning for tasks with inputs of any form.
We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks.
arXiv Detail & Related papers (2024-02-16T09:46:20Z)
- Improving Agent Interactions in Virtual Environments with Language Models [0.9790236766474201]
This research focuses on a collective building assignment in the Minecraft dataset.
We employ language modeling to enhance task understanding through state-of-the-art methods.
arXiv Detail & Related papers (2024-02-08T06:34:11Z)
- Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents [80.5213198675411]
Large language models (LLMs) have dramatically enhanced the field of language intelligence.
LLMs leverage chain-of-thought (CoT) reasoning techniques, which oblige them to formulate intermediate steps en route to deriving an answer (a generic prompting sketch follows this list).
Recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents.
arXiv Detail & Related papers (2023-11-20T14:30:55Z)
- Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling [0.0]
Our proposed method employs state-of-the-art (SOTA) language modeling to enhance task understanding.
Our experimental results provide compelling evidence of the superiority of our proposed method.
arXiv Detail & Related papers (2023-06-21T17:17:09Z)
- Context-Aware Language Modeling for Goal-Oriented Dialogue Systems [84.65707332816353]
We formulate goal-oriented dialogue as a partially observed Markov decision process.
We derive a simple and effective method to finetune language models in a goal-aware way.
We evaluate our method on a practical flight-booking task using AirDialogue.
arXiv Detail & Related papers (2022-04-18T17:23:11Z)
- Multitasking Inhibits Semantic Drift [46.71462510028727]
We study the dynamics of learning in latent language policies (LLPs).
LLPs can solve challenging long-horizon reinforcement learning problems.
Previous work has found that LLP training is prone to semantic drift.
arXiv Detail & Related papers (2021-04-15T03:42:17Z)
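For reference alongside the "Igniting Language Intelligence" entry above: chain-of-thought prompting, in its plainest form, asks the model to write out intermediate steps before committing to a final answer. A minimal sketch follows, assuming a generic text-in/text-out `answer_fn` callable; the prompt wording and answer-extraction convention are illustrative assumptions, not taken from any of the listed papers.

```python
# Minimal chain-of-thought prompt construction; the wording and the
# `answer_fn` callable are illustrative placeholders, not any paper's API.

def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Let's think step by step, writing out each intermediate step, "
        "then state the final answer on a new line prefixed with 'Answer:'."
    )


def solve_with_cot(question: str, answer_fn) -> str:
    # `answer_fn` is any text-in / text-out model call.
    completion = answer_fn(cot_prompt(question))
    # Extract the final answer line if present; fall back to the full text.
    for line in reversed(completion.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()
```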