Multi-Novelty: Improve the Diversity and Novelty of Contents Generated by Large Language Models via inference-time Multi-Views Brainstorming
- URL: http://arxiv.org/abs/2502.12700v1
- Date: Tue, 18 Feb 2025 10:04:20 GMT
- Title: Multi-Novelty: Improve the Diversity and Novelty of Contents Generated by Large Language Models via inference-time Multi-Views Brainstorming
- Authors: Arash Lagzian, Srinivas Anumasa, Dianbo Liu
- Abstract summary: Large Language Models (LLMs) demonstrate remarkable proficiency in generating accurate and fluent text. However, they often struggle with diversity and novelty, leading to repetitive or overly deterministic responses. We introduce an inference-time multi-view brainstorming method, a novel approach that enriches input prompts with diverse perspectives.
- Score: 3.591342811819669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) demonstrate remarkable proficiency in generating accurate and fluent text. However, they often struggle with diversity and novelty, leading to repetitive or overly deterministic responses. These limitations stem from constraints in training data, including gaps in specific knowledge domains, outdated information, and an over-reliance on textual sources. Such shortcomings reduce their effectiveness in tasks requiring creativity, multi-perspective reasoning, and exploratory thinking, such as LLM-based AI scientist agents and creative artist agents. To address this challenge, we introduce an inference-time multi-view brainstorming method, a novel approach that enriches input prompts with diverse perspectives derived from both textual and visual sources, which we refer to as "Multi-Novelty". By incorporating additional contextual information as diverse starting points for chains of thought, this method enhances the variety and creativity of generated outputs. Importantly, our approach is model-agnostic, requiring no architectural modifications and being compatible with both open-source and proprietary LLMs.
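To make the idea concrete, here is a minimal Python sketch of inference-time multi-view brainstorming. It assumes a generic `call_llm` completion function (stubbed below) and hand-written perspectives; in the paper the perspectives are derived from textual and visual sources, and the exact prompt templates are not given here, so all wording is illustrative.

```python
# A minimal sketch of inference-time multi-view brainstorming, assuming a
# generic completion function `call_llm`; the perspective wording and the
# stub are illustrative, not the paper's exact prompts.

def call_llm(prompt: str, temperature: float = 0.9) -> str:
    """Stub for any chat/completion API (open-source or proprietary).
    Replace the body with a call to your LLM backend."""
    return f"[model response to: {prompt[:48]}...]"

def multi_view_brainstorm(question: str, views: list[str]) -> list[str]:
    """Enrich one prompt with several perspectives and collect one
    chain-of-thought answer per perspective."""
    answers = []
    for view in views:
        enriched = (
            f"Consider the question from the perspective of {view}.\n"
            f"Question: {question}\n"
            "Think step by step, then answer."
        )
        answers.append(call_llm(enriched))
    return answers

# In the paper, perspectives come from textual and visual sources
# (e.g., descriptions of related images); here they are hand-written.
views = ["a historian", "an engineer", "a visual artist"]
for answer in multi_view_brainstorm("How might cities adapt to rising sea levels?", views):
    print(answer)
```

Because the method only rewrites prompts, a loop like this works unchanged against any open-source or proprietary model, matching the model-agnostic claim above.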
Related papers
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey [124.23247710880008]
Multimodal CoT (MCoT) reasoning has recently garnered significant research attention.
Existing MCoT studies design various methodologies to address the challenges of image, video, speech, audio, 3D, and structured data.
We present the first systematic survey of MCoT reasoning, elucidating the relevant foundational concepts and definitions.
arXiv Detail & Related papers (2025-03-16T18:39:13Z) - Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey [46.617998833238126]
Large language models (LLMs) and computer vision (CV) systems are driving advancements in natural language understanding and visual processing. The convergence of these technologies has catalyzed the rise of multimodal AI, enabling richer, cross-modal understanding that spans text, vision, audio, and video modalities. Multimodal large language models (MLLMs) have emerged as a powerful framework, demonstrating impressive capabilities in tasks like image-text generation, visual question answering, and cross-modal retrieval. Despite these advancements, the complexity and scale of MLLMs introduce significant challenges in interpretability and explainability, essential for establishing...
arXiv Detail & Related papers (2024-12-03T02:54:31Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
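For orientation, the sketch below shows the generic retrieve-then-generate pattern that such plugins build on. It is not RA-BLIP's actual architecture (which learns an adaptive multimodal retriever during pre-training); the toy corpus, lexical scorer, and prompt template are assumptions.

```python
# A generic retrieve-then-generate loop illustrating the "plugin" idea;
# the toy corpus, overlap scorer, and prompt wording are placeholders.

from collections import Counter

CORPUS = [
    "Retrieval augmentation supplies external knowledge at inference time.",
    "Adaptive retrieval decides when retrieved evidence should be used.",
]

def overlap(query: str, doc: str) -> int:
    """Toy lexical-overlap score; a real system would use a learned,
    possibly multimodal, retriever."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve_then_generate(query: str, llm, top_k: int = 1) -> str:
    """Rank documents, prepend the best ones as context, then generate."""
    docs = sorted(CORPUS, key=lambda doc: overlap(query, doc), reverse=True)[:top_k]
    prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)

print(retrieve_then_generate(
    "What does retrieval augmentation provide?",
    lambda p: f"[LLM answer conditioned on {len(p)} prompt chars]",
))
```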
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation [1.9939549451457024]
This paper explores how different prompting techniques impact performance on riddles that demand diverse reasoning skills.
We introduce RISCORE, a fully automated prompting method that generates and utilizes contextually reconstructed sentence-based puzzles.
Our experiments demonstrate that RISCORE significantly improves the performance of language models in both vertical and lateral thinking tasks.
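A rough sketch of that recipe, as described in the abstract: each few-shot exemplar riddle is paired with a contextually reconstructed variant that keeps the same answer. The prompt templates and the stubbed `llm` callable below are assumptions, not the paper's exact implementation.

```python
# A sketch of the RISCORE idea: augment few-shot exemplars with
# contextually reconstructed variants of the same riddle. Prompt
# wording and the `llm` stub are assumptions.

def reconstruct(riddle: str, answer: str, llm) -> str:
    """Ask the model to restate a riddle in a new surface setting while
    keeping the reasoning (and answer) intact."""
    return llm(
        f"Rewrite the riddle below in a different setting so that the "
        f"answer is still '{answer}'.\nRiddle: {riddle}"
    )

def build_riscore_prompt(exemplars, target: str, llm) -> str:
    """Interleave each exemplar with its reconstructed variant, then
    append the unsolved target riddle."""
    shots = []
    for riddle, answer in exemplars:
        shots.append(f"Riddle: {riddle}\nAnswer: {answer}")
        variant = reconstruct(riddle, answer, llm)
        shots.append(f"Riddle: {variant}\nAnswer: {answer}")
    return "\n\n".join(shots) + f"\n\nRiddle: {target}\nAnswer:"

stub = lambda p: "[reconstructed riddle]"
print(build_riscore_prompt(
    [("What has keys but opens no locks?", "a piano")],
    "What has hands but cannot clap?",
    stub,
))
```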
arXiv Detail & Related papers (2024-09-24T18:35:09Z) - Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach. Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens. We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z) - Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z) - Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models [25.058162782167503]
Chain-of-thought (CoT) reasoning has exhibited impressive performance in language models for solving complex tasks and answering questions.
We introduce a novel approach for multi-modal CoT reasoning that utilizes latent space learning via diffusion processes to generate effective image features that align with language thoughts.
Our method fuses image features and text representations at a deep level and improves the complex reasoning ability of multi-modal CoT.
arXiv Detail & Related papers (2023-12-14T09:13:09Z) - TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild [102.93338424976959]
We introduce TextBind, an almost annotation-free framework for empowering large language models with multi-turn interleaved multimodal instruction-following capabilities.
Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model.
To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models.
arXiv Detail & Related papers (2023-09-14T15:34:01Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
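The mechanism is simple enough to show directly: the sketch below presents the question twice before a standard chain-of-thought trigger. The exact template wording is an assumption based on the abstract, not copied from the paper.

```python
# A minimal sketch of the re-reading idea: the question appears twice
# before the model is asked to answer. Template wording is illustrative.

def re2_prompt(question: str) -> str:
    """Combine Re2-style re-reading with a standard CoT trigger."""
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A: Let's think step by step."
    )

print(re2_prompt("If a train travels 60 km in 40 minutes, what is its speed in km/h?"))
```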
arXiv Detail & Related papers (2023-09-12T14:36:23Z) - Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector.
The diverse multi-modal masked language modeling is realized by an object divergence constraint upon traditional multi-modal masked language modeling (MLM).
arXiv Detail & Related papers (2023-08-30T08:33:13Z) - Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
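A compact sketch of that debate loop: several model instances answer independently, then each revises after reading the others' responses. The `call_llm` stub, round count, and prompt wording below are placeholders rather than the paper's exact procedure.

```python
# A compact sketch of multiagent debate: independent first answers,
# then revision rounds conditioned on the other agents' responses.
# The stub, round count, and prompts are placeholders.

def debate(question: str, call_llm, n_agents: int = 3, n_rounds: int = 2) -> list[str]:
    # Round 1: each agent answers independently.
    answers = [
        call_llm(f"Question: {question}\nGive your answer with reasoning.")
        for _ in range(n_agents)
    ]
    # Later rounds: each agent sees the others' answers and revises.
    for _ in range(n_rounds - 1):
        answers = [
            call_llm(
                f"Question: {question}\nOther agents answered:\n"
                + "\n".join(f"- {a}" for j, a in enumerate(answers) if j != i)
                + "\nUpdate your answer, adopting their reasoning where it helps."
            )
            for i in range(n_agents)
        ]
    return answers  # a final step would aggregate these into one answer

stub = lambda p: f"[agent reply, prompt of {len(p)} chars]"
print(debate("What is 17 * 24?", stub))
```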
arXiv Detail & Related papers (2023-05-23T17:55:11Z) - OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z) - Retrieving Multimodal Information for Augmented Generation: A Survey [35.33076940985081]
We review methods that assist and augment generative models by retrieving multimodal knowledge.
Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness.
arXiv Detail & Related papers (2023-03-20T05:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.