Towards a comprehensive assessment of the book impact by integrating multiple
evaluation sources
- URL: http://arxiv.org/abs/2107.10434v1
- Date: Thu, 22 Jul 2021 03:11:10 GMT
- Title: Towards a comprehensive assessment of the book impact by integrating
multiple evaluation sources
- Authors: Qingqing Zhou, Chengzhi Zhang
- Abstract summary: This paper measures book impact based on an evaluation system constructed by integrating multiple evaluation sources.
Various technologies (e.g., topic extraction, sentiment analysis, and text classification) were used to extract the corresponding evaluation metrics.
The reliability of the evaluation system was verified by comparing its results with those of expert evaluation.
- Score: 6.568523667580746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The surge in the number of published books makes it difficult for
manual evaluation methods to assess books efficiently. Books' citations and
alternative evaluation metrics can assist manual evaluation and reduce its
cost. However, most existing evaluation research relies on a single evaluation
source with coarse-grained analysis, which may yield incomplete or one-sided
assessments of book impact. Moreover, relying on a single resource carries the
risk that no evaluation result can be obtained at all when the evaluation data
are missing, especially for newly published books. Hence, this paper measures
book impact with an evaluation system constructed by integrating multiple
evaluation sources. Specifically, we conducted finer-grained mining on these
sources, covering both books' internal evaluation resources and external
evaluation resources. Various technologies (e.g., topic extraction, sentiment
analysis, and text classification) were used to extract the corresponding
evaluation metrics from the internal and external resources. Expert judgment
combined with the analytic hierarchy process (AHP) was then used to integrate
the metrics and construct a book impact evaluation system. Finally, the
reliability of the system was verified by comparison with the results of
expert evaluation, and detailed, diversified evaluation results were obtained.
The experimental results reveal that different evaluation resources measure
books' impact along different dimensions, and that integrating multiple
evaluation sources allows books to be assessed more comprehensively.
Meanwhile, the book impact evaluation system can provide personalized results
according to users' evaluation purposes. In addition, disciplinary differences
should be considered when assessing books' impact.
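The abstract describes a two-stage construction: NLP techniques mine evaluation metrics from internal resources (e.g., book contents and reviews) and external resources (e.g., citations), and expert judgment combined with the analytic hierarchy process (AHP) weights and integrates those metrics into an overall impact assessment. No code accompanies this listing, so the following is only a minimal sketch of the standard AHP weighting-and-aggregation step; the metric groups, pairwise judgments, and metric values are hypothetical placeholders, not figures from the paper.

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> tuple[np.ndarray, float]:
    """Derive criterion weights from an AHP pairwise-comparison matrix.

    Returns the normalized principal eigenvector (the weights) and the
    consistency ratio (CR); CR < 0.1 is conventionally considered acceptable.
    """
    n = pairwise.shape[0]
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = np.argmax(eigvals.real)
    weights = np.abs(eigvecs[:, k].real)
    weights /= weights.sum()
    # Saaty's consistency index, with random index (RI) values for n = 1..9.
    ci = (eigvals[k].real - n) / (n - 1)
    ri = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45][n - 1]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr

# Hypothetical expert judgments over three metric groups
# (review sentiment, citation counts, library holdings) -- illustrative only.
pairwise = np.array([
    [1.0, 3.0, 2.0],
    [1/3, 1.0, 1/2],
    [1/2, 2.0, 1.0],
])
weights, cr = ahp_weights(pairwise)

# Min-max normalized metric values for one book, in the same order.
metrics = np.array([0.72, 0.40, 0.55])
impact_score = float(weights @ metrics)
print(f"weights={np.round(weights, 3)}, CR={cr:.3f}, impact={impact_score:.3f}")
```

In the paper's setting, the pairwise judgments would come from domain experts and the normalized metric values from the mined internal and external evaluation sources; the consistency-ratio check is the standard AHP safeguard against contradictory expert judgments.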
Related papers
- Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework [61.38174427966444]
Large Language Models (LLMs) are increasingly used for automated evaluation in various scenarios.
Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models.
We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z)
- A Critical Look at Meta-evaluating Summarisation Evaluation Metrics [11.541368732416506]
We argue that the time is ripe to build more diverse benchmarks that enable the development of more robust evaluation metrics.
We call for research focusing on user-centric quality dimensions that consider the generated summary's communicative goal.
arXiv Detail & Related papers (2024-09-29T01:30:13Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Evaluation in Neural Style Transfer: A Review [0.7614628596146599]
We provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices.
We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons but will also enhance the comprehension and interpretation of research findings in the field.
arXiv Detail & Related papers (2024-01-30T15:45:30Z)
- Evaluation and Measurement of Software Process Improvement -- A Systematic Literature Review [6.973622134568803]
Software Process Improvement (SPI) is a systematic approach to increase the efficiency and effectiveness of a software development organization.
This paper aims to identify and characterize evaluation strategies and measurements used to assess the impact of different SPI initiatives.
arXiv Detail & Related papers (2023-07-24T21:51:15Z)
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z)
- Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units.
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z)
- Social Biases in Automatic Evaluation Metrics for NLG [53.76118154594404]
We propose an evaluation method based on Word Embeddings Association Test (WEAT) and Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics.
We construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image captioning and text summarization tasks (a minimal sketch of the WEAT statistic appears after this list).
arXiv Detail & Related papers (2022-10-17T08:55:26Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
The task is cast as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation [3.624563211765782]
We show that the best choice of evaluation method can vary from one quality aspect to another.
We show that the total number of annotators can have a strong impact on study power.
Current statistical analysis methods can inflate type I error rates up to eight-fold.
arXiv Detail & Related papers (2021-01-27T10:14:15Z)
- User and Item-aware Estimation of Review Helpfulness [4.640835690336653]
We investigate the role of deviations in the properties of reviews as helpfulness determinants.
We propose a novel helpfulness estimation model that extends previous ones.
Our model is thus an effective tool to select relevant user feedback for decision-making.
arXiv Detail & Related papers (2020-11-20T15:35:56Z)
- Re-evaluating Evaluation in Text Summarization [77.4601291738445]
We re-evaluate the evaluation method for text summarization using top-scoring system outputs.
We find that conclusions about evaluation metrics on older datasets do not necessarily hold on modern datasets and systems.
arXiv Detail & Related papers (2020-10-14T13:58:53Z)
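One related paper above quantifies social bias in NLG evaluation metrics with the Word Embeddings Association Test (WEAT) and its sentence-level variant SEAT. That paper's exact protocol is not reproduced here; the snippet below is only a minimal sketch of the generic WEAT effect size over word vectors, using randomly generated stand-in embeddings, to make the cited test concrete.

```python
import numpy as np

def _cos(u: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Cosine similarity between one vector u and each row of V."""
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

def weat_effect_size(X: np.ndarray, Y: np.ndarray,
                     A: np.ndarray, B: np.ndarray) -> float:
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    s(w, A, B) = mean cos(w, a) - mean cos(w, b); the effect size is the
    difference of mean s over X and Y, divided by the std of s over X and Y.
    """
    def s(W: np.ndarray) -> np.ndarray:
        return np.array([_cos(w, A).mean() - _cos(w, B).mean() for w in W])
    s_x, s_y = s(X), s(Y)
    pooled = np.concatenate([s_x, s_y])
    return float((s_x.mean() - s_y.mean()) / pooled.std(ddof=1))

# Toy stand-in embeddings (rows are word vectors); in practice these would be
# looked up from the embedding model or evaluation metric under test.
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(5, 50)) for _ in range(4))
print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):+.3f}")
```

In an actual bias audit, X and Y would be target word sets (e.g., gendered terms) and A and B attribute word sets, with vectors taken from the representation being tested rather than generated at random.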
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.