Related papers: Comparative Analysis of Pooling Mechanisms in LLMs: A Sentiment Analysis Perspective

Comparative Analysis of Pooling Mechanisms in LLMs: A Sentiment Analysis Perspective

URL: http://arxiv.org/abs/2411.14654v1
Date: Fri, 22 Nov 2024 00:59:25 GMT
Title: Comparative Analysis of Pooling Mechanisms in LLMs: A Sentiment Analysis Perspective
Authors: Jinming Xing, Ruilin Xing, Yan Sun,
Abstract summary: Transformer-based models like BERT and GPT rely on pooling layers to aggregate token-level embeddings into sentence-level representations. Common pooling mechanisms such as Mean, Max, and Weighted Sum play a pivotal role in this aggregation process. This paper investigates the effects of these pooling mechanisms on two prominent LLM families -- BERT and GPT, in the context of sentence-level sentiment analysis.
Score: 2.2334256816037987
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have revolutionized natural language processing (NLP) by delivering state-of-the-art performance across a variety of tasks. Among these, Transformer-based models like BERT and GPT rely on pooling layers to aggregate token-level embeddings into sentence-level representations. Common pooling mechanisms such as Mean, Max, and Weighted Sum play a pivotal role in this aggregation process. Despite their widespread use, the comparative performance of these strategies on different LLM architectures remains underexplored. To address this gap, this paper investigates the effects of these pooling mechanisms on two prominent LLM families -- BERT and GPT, in the context of sentence-level sentiment analysis. Comprehensive experiments reveal that each pooling mechanism exhibits unique strengths and weaknesses depending on the task's specific requirements. Our findings underline the importance of selecting pooling methods tailored to the demands of particular applications, prompting a re-evaluation of common assumptions regarding pooling operations. By offering actionable insights, this study contributes to the optimization of LLM-based models for downstream tasks.

Related papers

Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models [1.4999444543328293]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks. This paper investigates the quantization of LLMs, focusing on the LLaMA architecture and its derivatives. We propose a novel mixed-precision quantization approach tailored for LLaMA-like models.
arXiv Detail & Related papers (2025-04-30T11:52:18Z)
Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks [0.0]
This study explores the potential of small language model(SLM) ensembles to achieve accuracy comparable to proprietary large language models (LLMs) We propose Ensemble Bayesian Inference (EBI), a novel approach that applies Bayesian estimation to combine judgments from multiple SLMs.
arXiv Detail & Related papers (2025-04-24T15:55:10Z)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification [17.512415475301395]
We investigate model editing to serve an efficient method for adapting large language models (LLMs) to solve aspect-based sentiment classification. Our findings reveal that a distinct set of mid-layer representations is essential for detecting the sentiment polarity of given aspect words. We develop a model editing approach that focuses exclusively on these critical parts of the LLM, leading to a more efficient method for adapting LLMs.
arXiv Detail & Related papers (2025-03-19T11:21:37Z)
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning [52.29223403698673]
This paper examines the use of Conformal Language Modelling (CLM) alongside Answer Set Programming (ASP) We apply CLM to generate sets of ASP programs from an LLM, providing statistical guarantees on the correctness of the outputs. Experimental results show that CLM significantly outperforms baseline models that use standard sampling methods.
arXiv Detail & Related papers (2025-03-07T14:10:10Z)
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems [102.36545569092777]
We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. Experiments demonstrate that Heterogeneous Swarms outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks.
arXiv Detail & Related papers (2025-02-06T21:27:11Z)
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks. To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
Probing Ranking LLMs: Mechanistic Interpretability in Information Retrieval [22.875174888476295]
We study the workings of state-of-the-art, fine-tuning-based passage-reranking transformer networks. Our approach involves a probing-based, layer-by-layer analysis of neurons within ranking LLMs. We identify individual or groups of known human-engineered and semantic features within the network's activations.
arXiv Detail & Related papers (2024-10-24T08:20:10Z)
EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model [14.480267340831542]
We propose Structure-aware Planning with Accurate World Model (SWAP) for large language models (LLMs) SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks.
arXiv Detail & Related papers (2024-10-04T04:23:36Z)
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? [18.990655668481075]
We propose a new pooling strategy, Multi-Layers Trainable Pooling, which transforms the outputs of all hidden layers, rather than just the last layer, using a cross-attention network. This paper sheds light on effective training strategies for LLM-based embedding models.
arXiv Detail & Related papers (2024-09-04T14:01:48Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox [46.39670209441478]
Large language models (LLMs) have exhibited exciting progress in multiple scenarios. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. This work provides a comprehensive benchmark suite for this research topic, including an evaluation system, detailed analyses, and a general toolbox.
arXiv Detail & Related papers (2024-06-15T12:02:14Z)
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration [39.35476224845088]
Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling. We propose a training-free ensemble framework DeePEn, fusing the informative probability distributions yielded by different LLMs at each decoding step.
arXiv Detail & Related papers (2024-04-19T08:52:22Z)
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models [56.256069117502385]
Chain of Thought (CoT) approaches can be used to enhance the capability of Large Language Models (LLMs) on complex reasoning tasks. However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains less explored. We introduce a novel approach that addresses this challenge by using retrieval mechanisms to automatically select demonstration examples.
arXiv Detail & Related papers (2023-12-04T08:07:21Z)
Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs. We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.