Fugu-MT Paper Translation (Abstract): OCR quality affects perceived usefulness of historical newspaper clippings -- a user study

Paper overview: OCR quality affects perceived usefulness of historical newspaper clippings -- a user study

arxiv url: http://arxiv.org/abs/2203.03557v1
Date: Fri, 4 Mar 2022 11:49:54 GMT
Status: Translation complete
Last updated in system: 2022-03-10 12:00:49.275187
Title: OCR quality affects perceived usefulness of historical newspaper clippings -- a user study
Title (reference translation): OCR quality affects the usefulness of historical newspaper clippings: a user study
Authors: Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen
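Abstract summary: The effects of Optical Character Recognition (OCR) quality were studied in a user-oriented information retrieval setting. The main result of the study is that improved OCR quality significantly affects the perceived usefulness of historical newspaper articles.
Reference score (independently computed attention score): 0.6299766708197884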
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios that measure the effectiveness of retrieval results. Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or used test collections containing texts based on authentic low-quality OCR data (see, e.g., [3]). In this paper, the effects of OCR quality are studied in a user-oriented information retrieval setting. Thirty-two users each subjectively evaluated the query results of six topics (out of 30 topics), based on pre-formulated queries, in a simulated work task setting. To the best of our knowledge, our simulated work task experiment is the first to show empirically that users' subjective relevance assessments of retrieved documents are affected by a change in the quality of optically read text. Users of historical newspaper collections have so far commented on the effects of OCR data quality mainly in impressionistic ways, and controlled user environments for studying the effects of OCR quality on users' relevance assessments of retrieval results have so far been missing. To remedy this, the National Library of Finland (NLF) set up an experimental query environment for the contents of one Finnish historical newspaper, Uusi Suometar 1869-1918, in order to compare users' evaluations of search results at two different OCR qualities for digitized newspaper articles. The query interface presented the same underlying document to the user in one of two versions, based on either the lower or the higher OCR quality, and the choice was randomized. The users did not know about the quality differences in the article texts they evaluated. The main result of the study is that improved OCR quality significantly affects the perceived usefulness of historical newspaper articles. The mean average evaluation score for the improved OCR results was 7.94% higher than the mean average evaluation score of the old OCR results.
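As a rough illustration of how a figure such as "7.94% higher" can be read, the sketch below computes the relative increase of one mean score over another. The abstract does not give the underlying scores or the rating scale, so the function name `relative_increase_percent` and all ratings here are hypothetical and are not taken from the study.

```python
from statistics import mean


def relative_increase_percent(old_scores, new_scores):
    """Return the percentage by which mean(new_scores) exceeds mean(old_scores)."""
    old_mean = mean(old_scores)
    new_mean = mean(new_scores)
    return (new_mean - old_mean) / old_mean * 100.0


# Hypothetical ratings, invented for illustration only (not data from the paper).
old_ocr_scores = [2, 3, 2, 1]  # mean 2.0
new_ocr_scores = [3, 2, 3, 2]  # mean 2.5

increase = relative_increase_percent(old_ocr_scores, new_ocr_scores)
print(f"Relative increase: {increase:.2f}%")
```

The percentage is relative to the old mean, not a difference in percentage points, which is presumably how the reported 7.94% should be read.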
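Abstract (reference translation): The effects of Optical Character Recognition (OCR) quality on historical information retrieval have been examined in data-oriented scenarios concerning the effectiveness of retrieval results. Such studies have focused either on the effects of artificially degraded OCR quality (e.g., [1-2]) or on test collections containing texts based on genuinely low-quality OCR data (e.g., [3]). This paper examines the effects of OCR quality in a user-oriented information retrieval setting. Using a simulated work task setting, query results for six topics (out of 30 topics) were subjectively evaluated based on pre-formulated queries. To the best of our knowledge, our simulated work task experiment is the first to show empirically that users' subjective relevance assessments of retrieved documents are affected by a change in the quality of optically read text. Users of historical newspaper collections have so far described the effects of OCR data quality mainly in impressionistic terms, and user environments for examining the effect of OCR quality on relevance assessments of retrieval results have so far been missing. The National Library of Finland (NLF) built an experimental query environment for the contents of the Finnish historical newspaper Uusi Suometar 1869-1918 in order to compare users' evaluations of search results at two different OCR qualities for digitized newspaper articles. The query interface could present the same underlying document to the user in one of two alternatives, lower OCR quality or higher OCR quality, and the choice was randomized. The users did not know about the quality differences in the article texts they evaluated. The main result of this study is that improved OCR accuracy significantly affects the usefulness of historical newspaper articles. The mean evaluation score of the improved OCR results was 7.94% higher than the mean evaluation score of the old OCR results.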

Related papers