Related papers: Link Prediction for Wikipedia Articles as a Natural Language Inference Task

URL: http://arxiv.org/abs/2308.16469v2
Date: Tue, 5 Sep 2023 09:34:55 GMT
Title: Link Prediction for Wikipedia Articles as a Natural Language Inference Task
Authors: Chau-Thang Phan, Quoc-Nam Nguyen, Kiet Van Nguyen
Abstract summary: This paper introduces an approach to link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task. The system is built on sentence pair classification for the Wikipedia article link prediction task. It achieved Macro F1-scores of 0.99996 and 1.00000 on the public and private test sets, respectively.
Score: 1.1842520528140819
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The link prediction task is vital to automatically understanding the structure of large knowledge bases. In this paper, we present our system for this task at the Data Science and Advanced Analytics 2023 Competition "Efficient and Effective Link Prediction" (DSAA-2023 Competition), with a corpus containing 948,233 training samples and 238,265 public test samples. This paper introduces an approach to link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task. Drawing inspiration from recent advancements in natural language processing and understanding, we cast link prediction as an NLI task, wherein the presence of a link between two articles is treated as a premise, and the task is to determine whether this premise holds based on the information presented in the articles. We implemented our system using sentence pair classification for the Wikipedia article link prediction task. Our system achieved Macro F1-scores of 0.99996 and 1.00000 on the public and private test sets, respectively. Our team, UIT-NLP, ranked 3rd on the private test set, with a score equal to those of the first- and second-place teams. Our code is publicly available for research purposes.
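
The sentence pair framing and the Macro F1 metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `build_pairs` and `macro_f1`, the `(source_id, target_id, label)` row layout, and the toy inputs are all assumptions made for illustration, and the classifier that would consume the pairs is omitted.

```python
from typing import Dict, List, Tuple


def build_pairs(
    texts: Dict[int, str], edges: List[Tuple[int, int, int]]
) -> List[Tuple[str, str, int]]:
    """Turn hypothetical (source_id, target_id, label) rows into
    (text_a, text_b, label) sentence pairs for a pair classifier."""
    return [(texts[s], texts[t], y) for s, t, y in edges]


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with made-up article texts (label 1 = link, 0 = no link).
texts = {1: "Article about graphs.", 2: "Article about networks."}
pairs = build_pairs(texts, [(1, 2, 1), (2, 1, 0)])
score = macro_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

Because Macro F1 averages the per-class scores without weighting, the "no link" class counts as much as the "link" class regardless of how many samples each has.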

Related papers

A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding. There is no publicly available NLI corpus for the Romanian language. We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
A Text-based Approach For Link Prediction on Wikipedia Articles [1.9567015559455132]
This paper presents our work in the DSAA 2023 Challenge on Link Prediction for Wikipedia Articles. We use traditional machine learning models with POS tags (part-of-speech tags) to train a classification model that predicts whether two nodes have a link. We obtained an F1 score of 0.99999 and took 7th place in the competition.
arXiv Detail & Related papers (2023-09-01T08:00:43Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard. GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data. We design a simple but effective ensemble-based framework that combines various transfer learning techniques. We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning [53.73316523766183]
Coreference resolution systems need to tackle two main tasks. One task is to detect all of the potential mentions, the other is to learn the linking of an antecedent for each possible mention. We propose a hybrid rule-neural coreference resolution system based on actor-critic learning.
arXiv Detail & Related papers (2022-12-20T08:55:47Z)
Yseop at FinSim-3 Shared Task 2021: Specializing Financial Domain Learning with Phrase Representations [0.0]
We present our approaches for the FinSim-3 Shared Task 2021: Learning Semantic Similarities for the Financial Domain. The aim of this task is to correctly classify a list of given terms from the financial domain into the most relevant hypernym. Our system ranks 2nd overall on both metrics, scoring 0.917 on Average Accuracy and 1.141 on Mean Rank.
arXiv Detail & Related papers (2021-08-21T10:53:12Z)
NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features. Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)
Predicting Typological Features in WALS using Language Embeddings and Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task [1.4848029858256582]
We submit a constrained system, predicting typological features only based on the WALS database. We reach the accuracy of 70.7% on the test data and rank first in the shared task.
arXiv Detail & Related papers (2020-10-08T12:05:48Z)
Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not. Our final approach achieved an F1-score of 0.9037, and we were ranked sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
ALPINE: Active Link Prediction using Network Embedding [20.976178936255927]
We propose ALPINE (Active Link Prediction usIng Network Embedding) for link prediction based on network embedding. We show that ALPINE is scalable, and boosts link prediction accuracy with far fewer queries.
arXiv Detail & Related papers (2020-02-04T11:09:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.