A Text-based Approach For Link Prediction on Wikipedia Articles
- URL: http://arxiv.org/abs/2309.00317v2
- Date: Tue, 7 Nov 2023 03:32:14 GMT
- Title: A Text-based Approach For Link Prediction on Wikipedia Articles
- Authors: Anh Hoang Tran, Tam Minh Nguyen and Son T. Luu
- Abstract summary: This paper presents our work in the DSAA 2023 Challenge on Link Prediction for Wikipedia Articles.
We use traditional machine learning models with POS (part-of-speech) tags to train a classification model that predicts whether two nodes are linked.
We obtained an F1 score of 0.99999 and placed 7th in the competition.
- Score: 1.9567015559455132
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents our work in the DSAA 2023 Challenge on Link Prediction
for Wikipedia Articles. We use traditional machine learning models with POS
(part-of-speech) tag features extracted from text to train a
classification model that predicts whether two nodes are linked. We then
test these tag features with various machine learning models. We obtained an
F1 score of 0.99999 and placed 7th in the competition. Our source
code is publicly available at:
https://github.com/Tam1032/DSAA2023-Challenge-Link-prediction-DS-UIT_SAT
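The pipeline described in the abstract (POS-tag features from text, a traditional classifier, F1 evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon `TOY_LEXICON` stands in for a real POS tagger, the pair features are simply two concatenated tag histograms, the classifier is a small hand-written logistic regression, and the example texts and labels are made up. It is not the authors' implementation, which is in the repository linked above.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "city": "NOUN", "river": "NOUN", "novel": "NOUN",
    "author": "NOUN", "capital": "NOUN",
    "is": "VERB", "wrote": "VERB",
    "of": "ADP", "on": "ADP",
    "large": "ADJ", "famous": "ADJ",
}
TAGS = ["DET", "NOUN", "VERB", "ADP", "ADJ", "OTHER"]


def tag_histogram(text):
    """Relative frequency of each POS tag in a whitespace-tokenized text."""
    tags = [TOY_LEXICON.get(tok, "OTHER") for tok in text.lower().split()]
    counts = Counter(tags)
    total = len(tags) or 1  # avoid division by zero on empty text
    return [counts[t] / total for t in TAGS]


def pair_features(text_u, text_v):
    """Features for a candidate link: both nodes' tag histograms, concatenated."""
    return tag_histogram(text_u) + tag_histogram(text_v)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (link) or 0 (no link) for one feature vector."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up node pairs: (text of node u, text of node v, 1 if linked else 0).
pairs = [
    ("the city is large", "the capital of the city", 1),
    ("the author wrote a novel", "a famous novel", 1),
    ("the river is large", "the author wrote", 0),
    ("a famous author", "the city on the river", 0),
]
X = [pair_features(u, v) for u, v, _ in pairs]
y = [label for _, _, label in pairs]

w, b = train_logistic(X, y)
y_pred = [predict(w, b, x) for x in X]
print("training F1:", f1_score(y, y_pred))
```

In practice the toy lexicon would be replaced by a real POS tagger and the classifier by an off-the-shelf model, and the F1 score would be computed on held-out pairs rather than on the training data.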
Related papers
- Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing [1.124958340749622]
The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles.
This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions.
Our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams.
arXiv Detail & Related papers (2024-07-25T02:48:56Z)
- Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images [76.47980643420375]
This paper builds on the hypothesis that there is an inherent data-hungry matter in learning semantic correspondences.
We demonstrate a simple machine annotator reliably enriches paired key points via machine supervision.
Our models surpass current state-of-the-art models on semantic correspondence learning benchmarks like SPair-71k, PF-PASCAL, and PF-WILLOW.
arXiv Detail & Related papers (2023-11-30T13:22:15Z)
- Link Prediction for Wikipedia Articles as a Natural Language Inference Task [1.1842520528140819]
This paper introduces an approach to link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task.
We implement our system as sentence pair classification for the Link Prediction for Wikipedia Articles task.
Our system achieved 0.99996 Macro F1-score and 1.00000 Macro F1-score for the public and private test sets, respectively.
arXiv Detail & Related papers (2023-08-31T05:25:04Z)
- TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z)
- Homophone Reveals the Truth: A Reality Check for Speech2Vec [1.2691047660244335]
We review and examine the authenticity of a seminal work in this field: Speech2Vec.
There is no indication that these embeddings are generated by the Speech2Vec model.
Experiments showed that this model failed to learn effective semantic embeddings.
arXiv Detail & Related papers (2022-09-22T05:32:09Z)
- Learning Tracking Representations via Dual-Branch Fully Transformer Networks [82.21771581817937]
We present a Siamese-like Dual-branch network based on solely Transformers for tracking.
We extract a feature vector for each patch based on its matching results with others within an attention window.
The method achieves results that are better than or comparable to those of the best-performing methods.
arXiv Detail & Related papers (2021-12-05T13:44:33Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text [1.0499611180329804]
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European languages.
Datasets for two Dravidian languages, viz. Malayalam and Tamil, each of 4000 observations, were shared by the HASOC organizers.
The best performing classification models developed for both languages are applied on test datasets.
arXiv Detail & Related papers (2021-02-19T11:08:02Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
- Tha3aroon at NSURL-2019 Task 8: Semantic Question Similarity in Arabic [5.214494546503266]
We describe our team's effort on the semantic text question similarity task of NSURL 2019.
Our top performing system utilizes several innovative data augmentation techniques to enlarge the training data.
It takes ELMo pre-trained contextual embeddings of the data and feeds them into an ON-LSTM network with self-attention.
arXiv Detail & Related papers (2019-12-28T20:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.