On the Use of Context for Predicting Citation Worthiness of Sentences in
Scholarly Articles
- URL: http://arxiv.org/abs/2104.08962v1
- Date: Sun, 18 Apr 2021 21:47:30 GMT
- Title: On the Use of Context for Predicting Citation Worthiness of Sentences in
Scholarly Articles
- Authors: Rakesh Gosangi, Ravneet Arora, Mohsen Gheisarieha, Debanjan Mahata,
Haimin Zhang
- Abstract summary: We formulate this problem as a sequence labeling task solved using a hierarchical BiLSTM model.
We contribute a new benchmark dataset containing over two million sentences and their corresponding labels.
Our results quantify the benefits of using context and contextual embeddings for citation worthiness.
- Score: 10.28696219236292
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we study the importance of context in predicting the citation
worthiness of sentences in scholarly articles. We formulate this problem as a
sequence labeling task solved using a hierarchical BiLSTM model. We contribute
a new benchmark dataset containing over two million sentences and their
corresponding labels. We preserve the sentence order in this dataset and
perform document-level train/test splits, which importantly allows
incorporating contextual information in the modeling process. We evaluate the
proposed approach on three benchmark datasets. Our results quantify the
benefits of using context and contextual embeddings for citation worthiness.
Lastly, through error analysis, we provide insights into cases where context
plays an essential role in predicting citation worthiness.
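The abstract does not come with code; purely as a rough illustration, the following is a minimal PyTorch sketch of the general architecture it describes: a word-level BiLSTM that builds sentence vectors and a sentence-level BiLSTM that labels each sentence of a document as citation-worthy or not. All layer sizes, the mean-pooling step, and the class and variable names are our own assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class HierarchicalBiLSTMTagger(nn.Module):
    """Sketch only: word-level BiLSTM -> sentence vectors -> sentence-level BiLSTM -> per-sentence label."""
    def __init__(self, vocab_size, emb_dim=100, word_hidden=128, sent_hidden=128, num_labels=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Word-level BiLSTM: builds a representation of each sentence from its tokens.
        self.word_lstm = nn.LSTM(emb_dim, word_hidden, batch_first=True, bidirectional=True)
        # Sentence-level BiLSTM: passes context between neighboring sentences in the document.
        self.sent_lstm = nn.LSTM(2 * word_hidden, sent_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * sent_hidden, num_labels)

    def forward(self, token_ids):
        # token_ids: (num_sentences, max_tokens) for a single document.
        word_states, _ = self.word_lstm(self.embedding(token_ids))
        # Mean-pool word states into one vector per sentence (an assumption;
        # max-pooling or attention would be equally plausible).
        sent_vectors = word_states.mean(dim=1)
        # Treat the document as an ordered sequence of sentence vectors.
        sent_states, _ = self.sent_lstm(sent_vectors.unsqueeze(0))
        return self.classifier(sent_states.squeeze(0))  # (num_sentences, num_labels)

model = HierarchicalBiLSTMTagger(vocab_size=10000)
doc = torch.randint(1, 10000, (12, 30))  # a toy document: 12 sentences, 30 token ids each
print(model(doc).shape)                  # torch.Size([12, 2]): one citation-worthiness label per sentence

Consistent with the abstract, a document-level train/test split would then assign whole documents, rather than individual sentences, to each partition, so that a sentence's surrounding context never leaks across splits.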
Related papers
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
- Dataset of Quotation Attribution in German News Articles [19.222705178881558]
We present a new, freely available, creative-commons-licensed dataset for quotation attribution in German news articles based on WIKINEWS.
The dataset provides curated, high-quality annotations across 1000 documents (250,000 tokens).
arXiv Detail & Related papers (2024-04-25T17:19:13Z)
- On Measuring Context Utilization in Document-Level MT Systems [12.02023514105999]
We propose to complement accuracy-based evaluation with measures of context utilization.
We show that automatically-annotated supporting context gives similar conclusions to human-annotated context.
arXiv Detail & Related papers (2024-02-02T13:37:07Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- Inline Citation Classification using Peripheral Context and Time-evolving Augmentation [23.88211560188731]
We propose a new dataset, named 3Cext, which provides discourse information using the cited sentences.
We propose PeriCite, a Transformer-based deep neural network that fuses peripheral sentences and domain knowledge.
arXiv Detail & Related papers (2023-03-01T09:11:07Z)
- Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets [18.754562380068815]
State-of-the-art contextualized models such as BERT use tasks such as WiC and WSD to evaluate their word-in-context representations.
This study presents the first quantitative analysis (using probing baselines) on the context-word interaction being tested in major contextual lexical semantic tasks.
arXiv Detail & Related papers (2021-12-13T15:37:05Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as or better than traditional approaches to problems arising in short texts.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Article citation study: Context enhanced citation sentiment detection [11.610277023001807]
Citation sentiment analysis is one of the less-studied tasks in scientometric analysis.
We develop eight datasets of citation sentences, manually annotated into three sentiment polarities.
We propose an ensemble feature-engineering method that combines word embeddings of the texts, part-of-speech tags, and dependency relationships.
arXiv Detail & Related papers (2020-05-10T00:27:19Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content of the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.