Fugu-MT paper translation (abstract): CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding

Paper overview: CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding

arxiv url: http://arxiv.org/abs/2105.10912v1
Date: Sun, 23 May 2021 11:08:45 GMT
Status: translation complete
Last updated in system: 2021-05-25 15:26:19.174800
Title: CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding
Authors: Dustin Wright and Isabelle Augenstein
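Abstract summary: This work is an in-depth study of cite-worthiness detection in English, where each sentence is labelled for whether or not it cites an external source. CiteWorth is high-quality, challenging, and suitable for studying problems such as domain adaptation.
Reference score (attention score computed by this site): 23.930041685595775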
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific document understanding is challenging as the data is highly domain specific and diverse. However, datasets for tasks with scientific text require expensive manual annotation and tend to be small and limited to only one or a few fields. At the same time, scientific documents contain many potential training signals, such as citations, which can be used to build large labelled datasets. Given this, we present an in-depth study of cite-worthiness detection in English, where a sentence is labelled for whether or not it cites an external source. To accomplish this, we introduce CiteWorth, a large, contextualized, rigorously cleaned labelled dataset for cite-worthiness detection built from a massive corpus of extracted plain-text scientific documents. We show that CiteWorth is high-quality, challenging, and suitable for studying problems such as domain adaptation. Our best performing cite-worthiness detection model is a paragraph-level contextualized sentence labelling model based on Longformer, exhibiting a 5 F1 point improvement over SciBERT which considers only individual sentences. Finally, we demonstrate that language model fine-tuning with cite-worthiness as a secondary task leads to improved performance on downstream scientific document understanding tasks.
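The abstract notes that citations can serve as a training signal: a sentence is labelled by whether or not it cites an external source. The following is a minimal sketch of that idea, not the authors' cleaning pipeline. The two citation patterns (numeric brackets such as `[3, 4]` and simple author-year parentheses) and the function names `label_sentence` and `label_paragraph` are assumptions made for illustration; real scientific text would need much more careful handling.

```python
import re

# Hypothetical citation patterns, chosen only for illustration.
# NUMERIC matches markers like "[3]" or "[3, 4]";
# AUTHOR_YEAR matches markers like "(Smith et al., 2020)".
NUMERIC = r"\[\d+(?:\s*[,;\-]\s*\d+)*\]"
AUTHOR_YEAR = (
    r"\([A-Z][A-Za-z\-]+(?: et al\.| and [A-Z][A-Za-z\-]+)?,? \d{4}[a-z]?\)"
)
CITATION = re.compile(rf"\s*(?:{NUMERIC}|{AUTHOR_YEAR})")


def label_sentence(sentence: str) -> tuple[str, bool]:
    """Return (sentence with citation markers removed, cite-worthy label).

    The marker is stripped so that a classifier trained on the output
    cannot rely on the marker itself to predict the label.
    """
    cleaned, n_removed = CITATION.subn("", sentence)
    return cleaned.strip(), n_removed > 0


def label_paragraph(sentences: list[str]) -> list[tuple[str, bool]]:
    """Label every sentence of a paragraph, keeping paragraph order."""
    return [label_sentence(s) for s in sentences]
```

Keeping the sentences of a paragraph together, as `label_paragraph` does, is what would let a paragraph-level model such as the Longformer-based one mentioned in the abstract see the surrounding context of each sentence.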

Related papers

Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning [12.377363857246602]
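We propose MI-DELIGHT, a new model for short text classification. First, it performs multi-source information exploration to alleviate the sparsity problem. Next, it adopts a graph learning approach to learn representations of short texts.
Paper reference translation (metadata) (2025-01-16T00:26:15Z)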
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
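Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. To address these challenges, we introduce new methodologies and datasets. We propose MhBART, an encoder-decoder model that emulates human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
Paper reference translation (metadata) (2024-12-17T08:47:41Z)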
Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models [0.0]
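This paper proposes a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism and contextual information to detect sentences that need citations. We create a new large-scale dataset (PMOA-CITE) based on the PubMed Open Access Subset.
Paper reference translation (metadata) (2024-05-20T17:45:36Z)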
Context-Enhanced Language Models for Generating Multi-Paper Citations [35.80247519023821]
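This paper proposes a method that uses Large Language Models (LLMs) to generate multi-sentence citation text. The approach takes a single source paper and a set of target papers and produces a coherent paragraph containing multi-sentence citation text.
Paper reference translation (metadata) (2024-04-22T04:30:36Z)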
ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science [0.0]
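Large language models achieve impressive performance on many natural language processing tasks. Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources. This paper proposes a structure-aware retrieval-augmented language model that accounts for document structure during retrieval augmentation.
Paper reference translation (metadata) (2023-11-21T02:02:46Z)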
CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing [44.75251805925605]
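This paper introduces a labelled dataset of 178M sentences for citation-worthiness detection in the legal domain, drawn from the Caselaw Access Project (CAP). The paper examines the performance of various deep learning models. Domain-specific pre-trained models tend to outperform other models, reaching an F1 score of 88% on the citation-worthiness detection task.
Paper reference translation (metadata) (2023-05-03T04:20:56Z)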
CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation. The CiteBench code is available at https://github.com/UKPLab/citebench.
Paper reference translation (metadata) (2022-12-19T16:10:56Z)
Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
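We focus on leveraging citation graphs to improve extractive summarization of scientific papers under different settings. Preliminary results show that citation graphs are useful even in a simple unsupervised framework. We therefore propose a Graph-based Supervised Summarization model (GSS) to obtain more accurate results on the task when large-scale labelled data are available.
Paper reference translation (metadata) (2022-12-08T11:53:12Z)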
Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to handle multiple long inputs. Experiments show that the proposed approach provides more comprehensive features for generating citation sentences.
Paper reference translation (metadata) (2021-12-02T15:32:24Z)
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
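We use the citation graph of referential links between citing and cited papers. We observe a sizable improvement in end-to-end information extraction over the state of the art.
Paper reference translation (metadata) (2021-06-03T03:00:12Z)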
Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
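We redefine the task of scientific paper summarization using citation graphs. We construct a new scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers across different domains. Our model achieves competitive performance compared with pre-trained models.
Paper reference translation (metadata) (2021-04-07T11:13:35Z)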
SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
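SPECTER generates document-level embeddings of scientific documents based on pre-training a Transformer language model. SciDocs is a new evaluation benchmark consisting of seven document-level tasks, ranging from citation prediction to document classification and recommendation.
Paper reference translation (metadata) (2020-04-15T16:05:51Z)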

The related papers list is generated automatically from the titles and abstracts of papers on this site.