A Scalable Unsupervised Framework for Multi-Aspect Labeling of Multilingual and Multi-Domain Review Data
- URL: http://arxiv.org/abs/2505.09286v1
- Date: Wed, 14 May 2025 11:11:17 GMT
- Title: A Scalable Unsupervised Framework for Multi-Aspect Labeling of Multilingual and Multi-Domain Review Data
- Authors: Jiin Park, Misuk Kim
- Abstract summary: We propose a multilingual, scalable, and unsupervised framework for cross-domain aspect detection. This framework is designed for multi-aspect labeling of multilingual and multi-domain review data. A human evaluation confirms that the quality of the automatic labels is comparable to that of manually created labels.
- Score: 11.92436948211501
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Effectively analyzing online review data is essential across industries. However, many existing studies are limited to specific domains and languages or depend on supervised learning approaches that require large-scale labeled datasets. To address these limitations, we propose a multilingual, scalable, and unsupervised framework for cross-domain aspect detection. This framework is designed for multi-aspect labeling of multilingual and multi-domain review data. In this study, we apply automatic labeling to Korean and English review datasets spanning various domains and assess the quality of the generated labels through extensive experiments. Aspect category candidates are first extracted through clustering, and each review is then represented as an aspect-aware embedding vector using negative sampling. To evaluate the framework, we conduct multi-aspect labeling and fine-tune several pretrained language models to measure the effectiveness of the automatically generated labels. Results show that these models achieve high performance, demonstrating that the labels are suitable for training. Furthermore, comparisons with publicly available large language models highlight the framework's superior consistency and scalability when processing large-scale data. A human evaluation also confirms that the quality of the automatic labels is comparable to that of manually created ones. This study demonstrates the potential of a robust multi-aspect labeling approach that overcomes limitations of supervised methods and is adaptable to multilingual, multi-domain environments. Future research will explore automatic review summarization and the integration of artificial intelligence agents to further improve the efficiency and depth of review analysis.
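The first stage of the pipeline described in the abstract, clustering to obtain aspect-category candidates and then assigning each review every aspect it matches, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the embedding model and negative-sampling step are omitted, and all names (`kmeans`, `multi_aspect_labels`) and the 0.7 similarity threshold are assumptions made for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means over review embeddings; centroids serve as
    aspect-category candidates."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance of every embedding to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids

def multi_aspect_labels(X, centroids, threshold=0.7):
    """Multi-aspect labeling: a review receives every aspect whose
    centroid it resembles above a cosine-similarity threshold."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = Xn @ Cn.T
    return [np.flatnonzero(row >= threshold).tolist() for row in sims]
```

Because assignment is threshold-based rather than argmax-based, a single review can receive several aspect labels, which is the property the framework needs for multi-aspect data.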
Related papers
- Open-Vocabulary Object Detection via Language Hierarchy [58.674088014474506]
We design Language Hierarchical Self-training (LHST) that introduces language hierarchy into weakly-supervised detector training.
LHST expands the image-level labels with language hierarchy and enables co-regularization between the expanded labels and self-training.
The proposed techniques achieve superior generalization performance consistently across 14 widely studied object detection datasets.
arXiv Detail & Related papers (2024-10-27T08:20:03Z) - LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification. LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
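The core idea in the LC-Protonets summary, one prototype per label combination drawn from the power set of labels observed in the support items, can be sketched as follows. The function name and the use of a plain mean over matching items are assumptions for illustration, not the paper's exact formulation.

```python
from itertools import combinations
import numpy as np

def lc_prototypes(embeddings, label_sets):
    """Build one prototype per observed label combination: every non-empty
    subset of each support item's label set is a candidate combination, and
    its prototype is the mean embedding of all items carrying that subset."""
    combos = set()
    for labels in label_sets:
        for r in range(1, len(labels) + 1):
            combos.update(combinations(sorted(labels), r))
    protos = {}
    for combo in combos:
        members = [e for e, ls in zip(embeddings, label_sets)
                   if set(combo) <= set(ls)]
        if members:
            protos[combo] = np.mean(members, axis=0)
    return protos
```

At inference, a query item would be matched against these combination-level prototypes rather than single-label ones, which lets one nearest-prototype decision emit a full label set.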
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Universal Cross-Lingual Text Classification [0.3958317527488535]
This research proposes a novel perspective on Universal Cross-Lingual Text Classification.
Our approach involves blending supervised data from different languages during training to create a universal model.
The primary goal is to enhance label and language coverage, aiming for a label set that represents a union of labels from various languages.
arXiv Detail & Related papers (2024-06-16T17:58:29Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework.
It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations.
Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z) - Multi-label and Multi-target Sampling of Machine Annotation for Computational Stance Detection [44.90471123149513]
We introduce a multi-label and multi-target sampling strategy to optimize the annotation quality.
Experimental results on the benchmark stance detection corpora show that our method can significantly improve performance and learning efficacy.
arXiv Detail & Related papers (2023-11-08T06:54:34Z) - Reliable Representation Learning for Incomplete Multi-View Missing Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view missing multi-label classification network named RANK. We break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample. Our model is not only able to handle complete multi-view multi-label data, but also works on datasets with missing instances and labels.
arXiv Detail & Related papers (2023-03-30T03:09:25Z) - AX-MABSA: A Framework for Extremely Weakly Supervised Multi-label Aspect Based Sentiment Analysis [8.067010122141985]
We present an extremely weakly supervised multi-label Aspect Category Sentiment Analysis framework.
We only rely on a single word per class as an initial indicative information.
We propose an automatic word selection technique to choose these seed categories and sentiment words.
arXiv Detail & Related papers (2022-11-07T19:44:42Z) - Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [1.215785021723604]
Developing a conversational agent for a specific educational scenario is time-consuming.
Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes.
We propose using a multi-task weak supervision method combined with active learning to address these concerns.
arXiv Detail & Related papers (2020-10-23T23:39:40Z) - Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
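The "maximal loss for label-preserving input perturbations" in the summary above is the standard inner step of adversarial training, often instantiated as an FGSM-style perturbation of the input embeddings. The sketch below shows that inner step for a logistic classifier in NumPy; the function name, the logistic model, and the epsilon value are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def adversarial_perturbation(x, w, b, y, eps=0.1):
    """FGSM-style inner maximization: move x within an L-infinity ball of
    radius eps in the direction that increases the logistic loss. Training
    then minimizes the loss at this worst-case perturbed input."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))   # predicted probability of class 1
    grad_x = (p - y) * w           # gradient of logistic loss w.r.t. x
    return x + eps * np.sign(grad_x)
```

In the cross-lingual setting, such perturbations would typically be applied to continuous token or sentence embeddings rather than raw text, since the perturbed input must remain label-preserving.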
arXiv Detail & Related papers (2020-07-29T19:38:35Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.