Anchor Prediction: Automatic Refinement of Internet Links
- URL: http://arxiv.org/abs/2305.14337v2
- Date: Wed, 24 May 2023 07:12:33 GMT
- Title: Anchor Prediction: Automatic Refinement of Internet Links
- Authors: Nelson F. Liu and Kenton Lee and Kristina Toutanova
- Abstract summary: We introduce the task of anchor prediction.
The goal is to identify the specific part of the linked target webpage that is most related to the source linking context.
We release the AuthorAnchors dataset, a collection of 34K naturally-occurring anchored links.
- Score: 25.26235117917374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Internet links enable users to deepen their understanding of a topic by
providing convenient access to related information. However, the majority of
links are unanchored -- they link to a target webpage as a whole, and readers
may expend considerable effort localizing the specific parts of the target
webpage that enrich their understanding of the link's source context. To help
readers effectively find information in linked webpages, we introduce the task
of anchor prediction, where the goal is to identify the specific part of the
linked target webpage that is most related to the source linking context. We
release the AuthorAnchors dataset, a collection of 34K naturally-occurring
anchored links, which reflect relevance judgments by the authors of the source
article. To model reader relevance judgments, we annotate and release
ReaderAnchors, an evaluation set of anchors that readers find useful. Our
analysis shows that effective anchor prediction often requires jointly
reasoning over lengthy source and target webpages to determine their implicit
relations and identify parts of the target webpage that are related but not
redundant. We benchmark a performant T5-based ranking approach to establish
baseline performance on the task, finding ample room for improvement.
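To make the task in the abstract concrete, anchor prediction can be framed as ranking candidate parts of the target page against the source linking context. The following is a minimal sketch under stated assumptions, not the T5-based ranker from the paper: it swaps in an IDF-weighted token-overlap score, and the names `tokenize` and `rank_anchors` are hypothetical. A purely lexical score like this favors overlapping text, so it cannot capture the "related but not redundant" requirement noted in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_anchors(source_context, target_paragraphs):
    """Return paragraph indices ordered from most to least related.

    Assumption: relatedness is approximated by IDF-weighted token overlap
    between the source context and each target paragraph. This is a
    stand-in for a learned ranker, not the method benchmarked in the paper.
    """
    docs = [set(tokenize(p)) for p in target_paragraphs]
    n = len(docs)
    # Document frequency: in how many paragraphs each token appears.
    df = Counter(tok for doc in docs for tok in doc)
    query = set(tokenize(source_context))

    def score(doc):
        # Tokens that occur in fewer paragraphs contribute more.
        return sum(math.log(1 + n / df[tok]) for tok in query & doc)

    scores = [score(doc) for doc in docs]
    # sorted() is stable, so tied paragraphs keep their page order.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

The first index returned by `rank_anchors` would be the predicted anchor; a learned model could replace `score` without changing the surrounding ranking logic.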
Related papers
- Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents [68.22496852535937]
We introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning.
Our approach first discovers the underlying intents from target domain demonstrations in an unsupervised manner.
We train our intent predictor to predict the next intent given the agent's past observations and actions.
arXiv Detail & Related papers (2024-10-29T21:37:04Z)
- Revisiting Link Prediction: A Data Perspective [61.52668130971441]
Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction.
Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets.
We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity.
arXiv Detail & Related papers (2023-10-01T21:09:59Z)
- Unsupervised Dense Retrieval Training with Web Anchors [29.44275536993025]
We train an unsupervised dense retriever, Anchor-DR, with a contrastive learning task that matches the anchor text and the linked document.
Experiments show that Anchor-DR outperforms state-of-the-art methods on unsupervised dense retrieval by a large margin.
Our analysis further reveals that the pattern of anchor-document pairs is similar to that of search query-document pairs.
arXiv Detail & Related papers (2023-05-10T01:46:17Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Anchor Prediction: A Topic Modeling Approach [2.0411082897313984]
We propose an annotation task, which we refer to as anchor prediction.
Given a source document and a target document, this task consists in automatically identifying anchors in the source document.
We propose a contextualized relational topic model, CRTM, that models directed links between documents.
arXiv Detail & Related papers (2022-05-29T11:26:52Z)
- Target-aware Abstractive Related Work Generation with Contrastive Learning [48.02845973891943]
The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers.
Most of the existing related work section generation methods rely on extracting off-the-shelf sentences.
We propose an abstractive target-aware related work generator (TAG), which can generate related work sections consisting of new sentences.
arXiv Detail & Related papers (2022-05-26T13:20:51Z)
- Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management [61.88858330222619]
We present an approach for predicting trust links between peers in social media.
We propose a data-driven, multi-faceted trust modeling approach that incorporates many distinct features for a comprehensive analysis.
We evaluate the proposed framework on a trust-aware item recommendation task, using a large Yelp dataset.
arXiv Detail & Related papers (2021-11-11T19:40:51Z)
- Prediction of new outlinks for focused Web crawling [0.0]
This work provides a methodology for detecting new links effectively using a short history.
We provide statistical models for three targets: the link change rate, the presence of new links, and the number of new links.
A notable finding is that, when the history of the target page is not available, our new features, which represent the history of related pages, are the most predictive of new links in the target page.
arXiv Detail & Related papers (2021-11-09T11:36:21Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach [2.2889152373118975]
Despite Wikipedia editors' efforts to add and maintain its content, the distribution of links remains sparse in many language editions.
This paper introduces a machine-in-the-loop entity linking system that can comply with community guidelines for adding a link.
We develop an interactive recommendation interface that proposes candidate links to editors who can confirm, reject, or adapt the recommendation.
arXiv Detail & Related papers (2021-05-31T16:29:42Z)
- Predicting Links on Wikipedia with Anchor Text Information [0.571097144710995]
We study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia.
We propose an appropriate evaluation sampling methodology and compare several algorithms.
arXiv Detail & Related papers (2021-05-25T07:57:57Z)