RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts
- URL: http://arxiv.org/abs/2305.17679v1
- Date: Sun, 28 May 2023 10:04:15 GMT
- Title: RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts
- Authors: Anton Golubev, Nicolay Rusnachenko, Natalia Loukachevitch
- Abstract summary: The paper describes the RuSentNE-2023 evaluation devoted to targeted sentiment analysis in Russian news texts.
The dataset for the RuSentNE-2023 evaluation is based on the Russian news corpus RuSentNE, which has rich sentiment-related annotation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper describes the RuSentNE-2023 evaluation devoted to targeted
sentiment analysis in Russian news texts. The task is to predict sentiment
towards a named entity in a single sentence. The dataset for the RuSentNE-2023
evaluation is based on the Russian news corpus RuSentNE, which has rich
sentiment-related annotation. The corpus is annotated with named entities and
sentiments towards these entities, along with related effects and emotional
states. The evaluation was organized using the CodaLab competition framework.
The main evaluation measure was the macro-averaged F-measure over the positive
and negative classes. The best result achieved was a 66% macro F-measure
(positive and negative classes). We also tested ChatGPT on the test set from
our evaluation and found that its zero-shot answers reached a 60% F-measure,
which corresponds to 4th place in the evaluation. ChatGPT also provided
detailed explanations of its conclusions. This result can be considered quite
high for a zero-shot application.
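For context, the main metric rewards correct positive and negative predictions while leaving the neutral class out of the averaging. Below is a minimal sketch of such a measure; the string labels "positive", "negative", "neutral" and the toy examples are illustrative assumptions, not the official label names or data of the competition.

```python
# Sketch of a macro F-measure restricted to the positive and negative classes,
# as described in the abstract. Label names and sample data are assumptions.

def f1_for_class(y_true, y_pred, cls):
    """Per-class F1 from exact-match counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def posneg_macro_f1(y_true, y_pred):
    """Average the F1 of the positive and negative classes only;
    neutral predictions still influence their precision and recall."""
    return (f1_for_class(y_true, y_pred, "positive") +
            f1_for_class(y_true, y_pred, "negative")) / 2

if __name__ == "__main__":
    gold = ["positive", "negative", "neutral", "negative", "positive"]
    pred = ["positive", "neutral", "neutral", "negative", "negative"]
    print(f"PosNeg macro F1: {posneg_macro_f1(gold, pred):.3f}")  # 0.583
```

The same value can be obtained with scikit-learn's f1_score(gold, pred, labels=["positive", "negative"], average="macro"), assuming the same label encoding.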
Related papers
- Implicit Sentiment Analysis Based on Chain of Thought Prompting [1.4582633500696451]
This paper introduces a Sentiment Analysis of Thinking (SAoT) framework.
The framework first analyzes the implicit aspects and opinions in the text using common sense and thinking chain capabilities.
The model is evaluated on the SemEval 2014 dataset, consisting of 1120 restaurant reviews and 638 laptop reviews.
arXiv Detail & Related papers (2024-08-22T06:55:29Z)
- Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation [50.60733773088296]
We conduct a comprehensive human evaluation of the results of several shared tasks from the last International Workshop on Spoken Language Translation (IWSLT 2023)
We propose an effective evaluation strategy based on automatic resegmentation and direct assessment with segment context.
Our analysis revealed that: 1) the proposed evaluation strategy is robust and its scores correlate well with other types of human judgements; 2) automatic metrics are usually, but not always, well correlated with direct assessment scores; and 3) COMET is a slightly stronger automatic metric than chrF.
arXiv Detail & Related papers (2024-06-06T09:18:42Z)
- Can ChatGPT evaluate research quality? [3.9627148816681284]
ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match REF criteria.
Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks.
arXiv Detail & Related papers (2024-02-08T10:00:40Z)
- Beyond Sentiment: Leveraging Topic Metrics for Political Stance Classification [1.0878040851638]
This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and complement to sentiment metrics in stance classification.
The experiment results show that BERTopic improves coherence scores by 17.07% to 54.20% when compared to traditional approaches.
Our findings suggest topic metrics are especially effective for context-rich texts and corpora where stance and sentiment correlations are weak.
arXiv Detail & Related papers (2023-10-24T00:50:33Z)
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models [66.12432440863816]
We propose Prometheus, a fully open-source Large Language Model (LLM) that is on par with GPT-4's evaluation capabilities.
Prometheus scores a Pearson correlation of 0.897 with human evaluators when evaluating with 45 customized score rubrics.
Prometheus achieves the highest accuracy on two human preference benchmarks.
arXiv Detail & Related papers (2023-10-12T16:50:08Z)
- INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback [80.57617091714448]
We present InstructScore, an explainable evaluation metric for text generation.
We fine-tune a text evaluation metric based on LLaMA, producing a score for generated text and a human readable diagnostic report.
arXiv Detail & Related papers (2023-05-23T17:27:22Z)
- Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study [31.719155787410685]
ChatGPT has drawn great attention from both the research community and the public.
We provide a preliminary evaluation of ChatGPT on the understanding of opinions, sentiments, and emotions contained in the text.
arXiv Detail & Related papers (2023-04-10T00:55:59Z)
- Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
- Just Rank: Rethinking Evaluation with Word and Sentence Similarities [105.5541653811528]
Intrinsic evaluation for embeddings lags far behind, with no significant update in the past decade.
This paper first points out the problems using semantic similarity as the gold standard for word and sentence embedding evaluations.
We propose a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks.
arXiv Detail & Related papers (2022-03-05T08:40:05Z)
- An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets [94.37521840642141]
We present the full-size Russian complexly NER-labeled corpus of Internet user reviews.
A set of advanced deep learning neural networks is used to extract pharmacologically meaningful entities from Russian texts.
arXiv Detail & Related papers (2021-04-30T19:46:24Z)