An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction
- URL: http://arxiv.org/abs/2311.01713v1
- Date: Fri, 3 Nov 2023 05:00:44 GMT
- Title: An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction
- Authors: Junxian Zhou, Haiqin Yang, Ye Junpeng, Yuxuan He and Hao Mou
- Abstract summary: We construct two large Chinese ASQP datasets crawled from multiple online platforms.
The datasets have several significant characteristics: larger size (each with 10,000+ samples), rich aspect categories, more words per sentence, and higher density than existing ASQP datasets.
We are the first to evaluate the performance of Generative Pre-trained Transformer (GPT) series models on ASQP and to reveal potential issues.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Aspect sentiment quad prediction (ASQP) is a critical subtask of aspect-level
sentiment analysis. Current ASQP datasets are characterized by their small size
and low quadruple density, which hinders technical development. To expand
capacity, we construct two large Chinese ASQP datasets crawled from multiple
online platforms. The datasets have several significant characteristics: larger
size (each with 10,000+ samples), rich aspect categories, more words per
sentence, and higher density than existing ASQP datasets. Moreover, we are the
first to evaluate the performance of Generative Pre-trained Transformer (GPT)
series models on ASQP and to reveal potential issues. The experiments with
state-of-the-art ASQP baselines underscore the need to explore additional
techniques to address ASQP, as well as the importance of further investigation
into methods to improve the performance of GPTs.
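
To make the terms in the abstract concrete, here is a minimal Python sketch of how an ASQP sample and its quadruple density might be represented. It assumes a quad consists of (aspect term, aspect category, opinion term, sentiment polarity) and that "density" means the average number of quads per sample. The class names, the quad_density helper, and the example reviews are hypothetical illustrations, not taken from the paper or its datasets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Quad:
    """One sentiment quadruple (field names are illustrative assumptions)."""
    aspect_term: str
    aspect_category: str
    opinion_term: str
    sentiment: str  # e.g. "positive", "negative", "neutral"


@dataclass
class Sample:
    """A review sentence together with its annotated quads."""
    text: str
    quads: List[Quad]


def quad_density(samples: List[Sample]) -> float:
    """Average number of quads per sample (assumed meaning of 'density')."""
    if not samples:
        return 0.0
    return sum(len(s.quads) for s in samples) / len(samples)


# Hypothetical examples, written in English for readability.
samples = [
    Sample(
        text="The screen is bright but the battery drains fast.",
        quads=[
            Quad("screen", "display", "bright", "positive"),
            Quad("battery", "battery life", "drains fast", "negative"),
        ],
    ),
    Sample(
        text="Delivery was slow.",
        quads=[Quad("delivery", "logistics", "slow", "negative")],
    ),
]

print(quad_density(samples))  # 1.5
```

Under this assumed definition, a dataset with more annotated quads per sentence has a higher density, which is the property the abstract contrasts with earlier ASQP datasets.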
Related papers
- Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
The post-training quantization (PTQ) technique has been extensively adopted for large language model (LLM) compression.
Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo [0.5110571587151475]
'RetChemQA' is a benchmark dataset designed to evaluate the capabilities of machine learning models in the domain of reticular chemistry.
This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type.
The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group.
arXiv Detail & Related papers (2024-05-03T14:29:54Z)
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning [64.55339431760727]
We introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M.
Our model, TextSquare, considerably surpasses previous open-source state-of-the-art text-centric MLLMs.
It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks.
arXiv Detail & Related papers (2024-04-19T11:38:08Z)
- Adaptive Data Augmentation for Aspect Sentiment Quad Prediction [21.038795249448675]
Aspect sentiment quad prediction (ASQP) aims to predict the quad sentiment elements for a given sentence.
The data imbalance issue has not received sufficient attention in the ASQP task.
We propose an Adaptive Data Augmentation (ADA) framework to tackle the imbalance issue.
arXiv Detail & Related papers (2024-01-12T06:20:56Z)
- A Unified One-Step Solution for Aspect Sentiment Quad Prediction [3.428123050377681]
Aspect sentiment quad prediction (ASQP) is a challenging yet significant subtask in aspect-based sentiment analysis.
We release two new datasets for ASQP, which contain the following characteristics: larger size, more words per sample, and higher density.
We propose a unified one-step solution for ASQP, namely One-ASQP, to detect the aspect categories and to identify the aspect-opinion-sentiment triplets simultaneously.
arXiv Detail & Related papers (2023-06-07T05:00:01Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.