Related papers: Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models

Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models

URL: http://arxiv.org/abs/2307.05646v1
Date: Tue, 11 Jul 2023 12:43:28 GMT
Title: Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models
Authors: Dhruv Mullick, Bilal Ghanem, Alona Fyshe
Abstract summary: Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC) Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR) We propose a framework to improve an LLM's performance on CR-containing reviews by fine tuning on highly inferential tasks.
Score: 4.2605449879340656
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Customer feedback is invaluable to companies as they refine their products. Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC) which allows us to analyse specific aspects of the products in reviews. Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR). In this work, we propose a framework to improve an LLM's performance on CR-containing reviews by fine tuning on highly inferential tasks. We show that the performance improvement is likely attributed to the improved model CR ability. We also release a new dataset that focuses on CR in ALSC.

Related papers

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement [49.687224320842105]
Large language models (LLMs) have recently transformed from text-based assistants to autonomous agents capable of planning, reasoning, and iteratively improving their actions. In this work, we introduce Critique-Guided Improvement (CGI), a novel two-player framework, comprising an actor model that explores an environment and a critic model that generates detailed nature language feedback.
arXiv Detail & Related papers (2025-03-20T10:42:33Z)
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.861013614500024]
We introduce a new benchmark designed to assess the critique capabilities of Large Language Models (LLMs) Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques.
arXiv Detail & Related papers (2025-01-24T13:48:10Z)
Enabling Scalable Oversight via Self-Evolving Critic [59.861013614500024]
SCRIT (Self-evolving CRITic) is a framework that enables genuine self-evolution of critique abilities. It self-improves by training on synthetic data, generated by a contrastive-based self-critic. It achieves up to a 10.3% improvement on critique-correction and error identification benchmarks.
arXiv Detail & Related papers (2025-01-10T05:51:52Z)
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment [44.84304822376291]
Reward models (RMs) guide the alignment of large language models (LLMs) We propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios. Based on our benchmark, we conduct extensive analysis on the state-of-the-art RMs.
arXiv Detail & Related papers (2024-10-13T16:06:54Z)
Large Language Models for Page Stream Segmentation [0.03495246564946555]
Page Stream (PSS) is an essential prerequisite for automated document processing at scale. This paper introduces TABME++, an enhanced benchmark featuring commercial Optical Character Recognition (OCR) annotations. We evaluate the performance of large language models (LLMs) on PSS, focusing on decoder-based models fine-tuned with parameter-efficient methods.
arXiv Detail & Related papers (2024-08-21T20:28:42Z)
Learning to Refine with Fine-Grained Natural Language Feedback [81.70313509881315]
We propose looking at refinement with feedback as a composition of three distinct LLM competencies. A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors. We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries.
arXiv Detail & Related papers (2024-07-02T16:15:01Z)
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols. This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs [95.15814662348245]
Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs) have demonstrated remarkable proficiency in such reasoning tasks.
arXiv Detail & Related papers (2024-06-12T12:54:27Z)
Calibrated Self-Rewarding Vision Language Models [27.686545023186852]
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image. We propose the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning.
arXiv Detail & Related papers (2024-05-23T14:30:33Z)
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance. We also show that CF augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z)
Towards Automated Classification of Code Review Feedback to Support Analytics [4.423428708304586]
This study aims to develop an automated code review comment classification system. We trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics. Our approach outperforms Fregnan et al.'s approach by achieving 18.7% higher accuracy.
arXiv Detail & Related papers (2023-07-07T21:53:20Z)
High Quality Segmentation for Ultra High-resolution Images [72.97958314291648]
We propose the Continuous Refinement Model for the ultra high-resolution segmentation refinement task. Our proposed method is fast and effective on image segmentation refinement.
arXiv Detail & Related papers (2021-11-29T11:53:06Z)
CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align- Classification (CRAC) CRAC yields new state-of-the-art performances on many benchmarks. In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.