Optical character recognition quality affects perceived usefulness of
historical newspaper clippings
- URL: http://arxiv.org/abs/2206.00369v1
- Date: Wed, 1 Jun 2022 10:07:50 GMT
- Title: Optical character recognition quality affects perceived usefulness of
historical newspaper clippings
- Authors: Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula
Pääkkönen and Juha Rautiainen
- Abstract summary: Thirty-two users ran searches on an article collection from the Finnish newspaper Uusi Suometar, 1869-1918.
The article search database held two versions of each article, each with a different optical character recognition quality.
- Score: 0.6299766708197884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introduction. We study the effect of optical character
recognition quality on interactive information retrieval in a collection of one
digitized historical Finnish newspaper. Method. This study is based on the
simulated interactive information retrieval work task model. Thirty-two users
ran searches on an article collection from the Finnish newspaper Uusi Suometar,
1869-1918, containing ca. 1.45 million automatically segmented articles. Our
article search database had two versions of each article with different optical
character recognition quality. Each user performed six pre-formulated and six
self-formulated short queries and subjectively evaluated the top-10 results
on a graded relevance scale of 0-3 without knowing about the optical character
recognition quality differences of the otherwise identical articles. Analysis.
The user evaluations were analysed by comparing the means of
evaluation scores in user sessions. Differences in query results were detected
by analysing the lengths of returned articles in pre-formulated and self-formulated
queries and the number of different documents retrieved overall in these two
sessions. Results. The main result of the study is that improved optical
character recognition quality affects perceived usefulness of historical
newspaper articles positively. Conclusions. We were able to show that
improvement in optical character recognition quality of documents leads to
higher mean relevance evaluation scores of query results in our historical
newspaper collection. To the best of our knowledge, this simulated interactive
user task is the first to show empirically that users' subjective relevance
assessments are affected by a change in the quality of optically read text.
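The analysis step described in the abstract, comparing mean graded relevance scores between the two optical character recognition versions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scores are invented, and the names `sessions`, `old_ocr`, `new_ocr`, `session_means` and `mean_difference` are hypothetical. This is not the authors' data or their exact procedure.

```python
from statistics import mean

# Hypothetical graded relevance judgements on the 0-3 scale.
# These numbers are invented for illustration, not taken from the study.
sessions = {
    "user01": {"old_ocr": [0, 1, 2, 1, 0], "new_ocr": [1, 2, 2, 3, 1]},
    "user02": {"old_ocr": [2, 1, 1, 0, 3], "new_ocr": [2, 2, 1, 1, 3]},
}


def session_means(sessions):
    """Return the mean relevance score per user and per OCR version."""
    return {
        user: {version: mean(scores) for version, scores in versions.items()}
        for user, versions in sessions.items()
    }


def mean_difference(sessions):
    """Average, over users, of (new OCR mean minus old OCR mean)."""
    per_user = session_means(sessions)
    return mean(m["new_ocr"] - m["old_ocr"] for m in per_user.values())


if __name__ == "__main__":
    for user, means in session_means(sessions).items():
        print(user, means)
    print("mean difference (new - old):", mean_difference(sessions))
```

A positive mean difference would correspond to the direction the abstract reports: results from the better-quality text receiving higher relevance scores. Whether such a difference is statistically meaningful would need a significance test, which this sketch does not attempt.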
Related papers
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Comprehending Lexical and Affective Ontologies in the Demographically
Diverse Spatial Social Media Discourse [0.0]
This study aims to comprehend linguistic and socio-demographic features, encompassing English language styles, conveyed sentiments, and lexical diversity within social media data.
Our analysis entails the extraction and examination of various statistical, grammatical, and sentimental features from two groups.
Our investigation unveils substantial disparities in certain linguistic attributes between the two groups, yielding a macro F1 score of approximately 0.85.
arXiv Detail & Related papers (2023-11-12T04:23:33Z) - Chain-of-Factors Paper-Reviewer Matching [32.86512592730291]
We propose a unified model for paper-reviewer matching that jointly considers semantic, topic, and citation factors.
We demonstrate the effectiveness of our proposed Chain-of-Factors model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
arXiv Detail & Related papers (2023-10-23T01:29:18Z) - Exploring the Use of Large Language Models for Reference-Free Text
Quality Evaluation: An Empirical Study [63.27346930921658]
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference.
The Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches.
arXiv Detail & Related papers (2023-04-03T05:29:58Z) - Where Does the Performance Improvement Come From? - A Reproducibility
Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval.
We first examine the related concerns and why the focus is on image-text retrieval tasks.
We analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z) - OCR quality affects perceived usefulness of historical newspaper
clippings -- a user study [0.6299766708197884]
The effects of Optical Character Recognition (OCR) quality are studied in a user-oriented information retrieval setting.
The main result of the study is that improved optical character recognition quality affects perceived usefulness of historical newspaper articles significantly.
arXiv Detail & Related papers (2022-03-04T11:49:54Z) - Automatic Main Character Recognition for Photographic Studies [78.88882860340797]
Main characters in images are the most important humans that catch the viewer's attention upon first look.
Identifying the main character in images plays an important role in traditional photographic studies and media analysis.
We propose a method for identifying the main characters using machine learning based human pose estimation.
arXiv Detail & Related papers (2021-06-16T18:14:45Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Cognitive Representation Learning of Self-Media Online Article Quality [24.084727302752377]
Self-media online articles are mainly created by users, which have the appearance characteristics of different text levels and multi-modal hybrid editing.
We establish a joint model CoQAN in combination with the layout organization, writing characteristics and text semantics.
We have also constructed a large scale real-world assessment dataset.
arXiv Detail & Related papers (2020-08-13T02:59:52Z)