ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector
Space Model and $\mu$CO-HITS
- URL: http://arxiv.org/abs/2101.06821v1
- Date: Mon, 18 Jan 2021 00:44:21 GMT
- Title: ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector
Space Model and $\mu$CO-HITS
- Authors: Yong-Bin Kang, Hung Du, Abdur Rahim Mohammad Forkan, Prem Prakash
Jayaraman, Amir Aryani, Timos Sellis (Fellow, IEEE)
- Abstract summary: $\textit{ExpFinder}$ is a new ensemble model for expert finding.
It integrates a novel $N$-gram vector space model, denoted as $n$VSM, and a graph-based model, denoted as $\textit{$\mu$CO-HITS}$.
- Score: 0.3560086794419991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding an expert plays a crucial role in driving successful collaborations
and speeding up high-quality research development and innovations. However, the
rapid growth of scientific publications and digital expertise data makes
identifying the right experts a challenging problem. Existing approaches for
finding experts given a topic can be categorised into information retrieval
techniques based on vector space models, document language models, and
graph-based models. In this paper, we propose $\textit{ExpFinder}$, a new
ensemble model for expert finding, that integrates a novel $N$-gram vector
space model, denoted as $n$VSM, and a graph-based model, denoted as
$\textit{$\mu$CO-HITS}$, which is a proposed variation of the CO-HITS algorithm.
The key idea of $n$VSM is to exploit a recent inverse document frequency weighting
method for $N$-gram words, and $\textit{ExpFinder}$ incorporates $n$VSM into
$\textit{$\mu$CO-HITS}$ to achieve expert finding. We comprehensively evaluate
$\textit{ExpFinder}$ on four different datasets from academic domains in
comparison with six different expert finding models. The evaluation results
show that $\textit{ExpFinder}$ is a highly effective model for expert finding,
substantially outperforming all the compared models by 19% to 160.2%.
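The abstract's two ingredients can be sketched concretely. Below is a minimal Python illustration, assuming a plain $N$-gram TF-IDF weighting and a $\lambda$-blended generalised CO-HITS update as stand-ins; the paper's specific IDF variant and the $\mu$ parameterisation of $\textit{$\mu$CO-HITS}$ are not reproduced here.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """All word N-grams of length 1..n."""
    return [" ".join(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)]

def nvsm_scores(docs, topic, n=2):
    """TF-IDF weighted N-gram similarity of each document to a topic phrase
    (a generic stand-in for the paper's nVSM weighting)."""
    grams = [Counter(ngrams(d.split(), n)) for d in docs]
    df = Counter(g for c in grams for g in c)   # document frequency per N-gram
    N = len(docs)
    topic_grams = set(ngrams(topic.split(), n))
    return [sum(c[g] * math.log(N / df[g]) for g in topic_grams if g in c)
            for c in grams]

def co_hits(doc_scores, authorship, lam=0.5, iters=50):
    """Generalised CO-HITS on an expert-document bipartite graph.

    authorship: (expert_index, doc_index) edges; lam blends the initial
    (nVSM-seeded) scores with scores propagated over the graph.
    """
    n_docs, n_exp = len(doc_scores), max(e for e, _ in authorship) + 1
    x, y = [1.0 / n_exp] * n_exp, list(doc_scores)
    for _ in range(iters):
        x = [(1 - lam) / n_exp] * n_exp         # expert update from documents
        for e, d in authorship:
            x[e] += lam * y[d]
        norm = sum(x) or 1.0
        x = [v / norm for v in x]
        y = [(1 - lam) * s for s in doc_scores]  # document update from experts
        for e, d in authorship:
            y[d] += lam * x[e]
        norm = sum(y) or 1.0
        y = [v / norm for v in y]
    return x  # expert relevance to the topic

docs = ["expert finding with vector space models",
        "graph based expert finding",
        "image segmentation networks"]
scores = nvsm_scores(docs, "expert finding")
print(co_hits(scores, [(0, 0), (0, 1), (1, 1), (2, 2)]))
```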
Related papers
- GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called $\textbf{GeAR}$ (Generation Augmented Retrieval) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
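For context, the bi-encoder retriever this summary refers to can be sketched in a few lines. A minimal illustration with a hashed bag-of-words stand-in encoder (not GeAR's trained model) and cosine-similarity ranking:

```python
import numpy as np

# Bi-encoder retrieval sketch: encode queries and documents independently,
# then rank by cosine similarity. The "encoder" is a hashed bag-of-words
# plus random projection stand-in, not a trained model.

rng = np.random.default_rng(0)
PROJ = rng.normal(size=(4096, 128))  # shared projection for both encoders

def encode(text):
    vec = np.zeros(4096)
    for tok in text.lower().split():
        vec[hash(tok) % 4096] += 1.0          # hashed bag-of-words
    emb = vec @ PROJ
    return emb / (np.linalg.norm(emb) + 1e-9)  # unit-normalise

docs = ["dense retrieval with bi-encoders",
        "generation augmented retrieval fuses evidence",
        "cooking pasta at home"]
doc_mat = np.stack([encode(d) for d in docs])
query = encode("retrieval with generation")
print(docs[int(np.argmax(doc_mat @ query))])  # best match by cosine similarity
```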
- NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z)
- MatViX: Multimodal Information Extraction from Visually Rich Articles [6.349779979863784]
In materials science, extracting structured information from research articles can accelerate the discovery of new materials.
We introduce $\textsc{MatViX}$, a benchmark consisting of $324$ full-length research articles and $1,688$ complex structured files.
These files are extracted from text, tables, and figures in full-length documents, providing a comprehensive challenge for MIE.
arXiv Detail & Related papers (2024-10-27T16:13:58Z)
- The Optimization Landscape of SGD Across the Feature Learning Strength [102.1353410293931]
We study the effect of scaling $\gamma$ across a variety of models and datasets in the online training setting.
We find that optimal online performance is often found at large $\gamma$.
Our findings indicate that analytical study of the large-$\gamma$ limit may yield useful insights into the dynamics of representation learning in performant models.
arXiv Detail & Related papers (2024-10-06T22:30:14Z)
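As a rough illustration of what an output-scaling knob of this kind looks like (a generic lazy-vs-rich parameterization sketch; not necessarily the paper's exact setup), one can multiply a network's output by $\gamma$ and compensate the learning rate:

```python
import torch

# Toy "feature learning strength" knob: scale the network output by gamma
# and adjust the learning rate so training stays stable. This is a generic
# sketch, not the paper's exact parameterization.

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)

def train(gamma, steps=200):
    net = torch.nn.Sequential(torch.nn.Linear(10, 64),
                              torch.nn.Tanh(),
                              torch.nn.Linear(64, 1))
    opt = torch.optim.SGD(net.parameters(), lr=0.05 / gamma)
    for _ in range(steps):
        loss = ((gamma * net(X) - y) ** 2).mean()  # gamma-scaled output
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

for gamma in (0.5, 1.0, 4.0):
    print(gamma, train(gamma))  # compare final losses across regimes
```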
- Inertial Confinement Fusion Forecasting via Large Language Models [48.76222320245404]
In this study, we introduce $\textbf{LPI-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms.
We propose the $\textit{LLM-anchored Reservoir}$, augmented with a $\textit{Fusion-specific Prompt}$, enabling accurate forecasting of $\texttt{LPI}$-generated hot electron dynamics during implosion.
We also present $\textbf{LPI4AI}$, the first $\texttt{LPI}$ benchmark based...
arXiv Detail & Related papers (2024-07-15T05:46:44Z)
- Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z)
- Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$, where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
arXiv Detail & Related papers (2023-11-08T13:29:08Z)
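The graphical-lasso building block is easy to demonstrate in isolation. A minimal sketch using scikit-learn's GraphicalLasso, omitting the sketching and iterative-denoising loop the paper actually studies:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Minimal graphical lasso demo: recover a sparse precision matrix from
# samples. Only the base estimator is shown; the paper wraps it as a
# denoiser inside an iterative recovery from a compressed sketch.

rng = np.random.default_rng(0)
d = 5
# Ground-truth sparse precision matrix (diagonally dominant, hence PSD).
theta = np.eye(d)
theta[0, 1] = theta[1, 0] = 0.4
theta[2, 3] = theta[3, 2] = -0.3
cov = np.linalg.inv(theta)

X = rng.multivariate_normal(np.zeros(d), cov, size=2000)
model = GraphicalLasso(alpha=0.05).fit(X)

# Off-diagonal entries near zero correspond to absent edges in the graph.
print(np.round(model.precision_, 2))
```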
- A Spectral Approach to Item Response Theory [6.5268245109828005]
We propose a $\textit{new}$ item estimation algorithm for the Rasch model.
The core of our algorithm is the computation of the stationary distribution of a Markov chain defined on an item-item graph.
Experiments on synthetic and real-life datasets show that our algorithm is scalable, accurate, and competitive with the most commonly used methods in the literature.
arXiv Detail & Related papers (2022-10-09T18:57:08Z)
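A hedged sketch of the general idea (a simplified reading, not the authors' exact construction): build an item-item Markov chain from pairwise response outcomes and read item difficulty off its stationary distribution.

```python
import numpy as np

# Spectral item-estimation sketch for the Rasch model: form an item-item
# Markov chain from pairwise outcomes and use its stationary distribution
# to estimate item difficulties. Simplified reading of the approach.

rng = np.random.default_rng(0)
n_users, n_items = 500, 6
difficulty = np.linspace(-1.5, 1.5, n_items)      # ground truth
ability = rng.normal(size=n_users)

# Rasch model: P(correct) = sigmoid(ability - difficulty)
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
resp = (rng.random((n_users, n_items)) < p).astype(float)

# Transition weight i -> j: how often item i was answered correctly while
# item j was answered incorrectly (evidence that j is harder than i).
W = resp.T @ (1 - resp)
P = W / W.sum(axis=1, keepdims=True)              # row-stochastic

pi = np.full(n_items, 1 / n_items)
for _ in range(1000):                             # power iteration
    pi = pi @ P

est = np.log(pi)                                  # difficulty up to a shift
print(np.corrcoef(est, difficulty)[0, 1])         # should be close to 1
```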
- Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing.
This strategy improves model quality while maintaining constant computational cost, and our further exploration of extremely large-scale models shows that it is more effective for training larger models.
arXiv Detail & Related papers (2021-05-31T16:12:44Z)
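The routing trick translates directly to code. A minimal PyTorch sketch of expert prototyping, with experts split into $k$ groups and top-1 routing applied within each group; sizes and gating details are illustrative:

```python
import torch
import torch.nn as nn

class PrototypedMoE(nn.Module):
    """Expert prototyping sketch: experts are split into k prototype groups
    and top-1 routing is applied independently inside each group, so k
    experts are active per token at constant compute cost."""

    def __init__(self, d_model=64, n_experts=16, k=4):
        super().__init__()
        assert n_experts % k == 0
        self.k, self.group = k, n_experts // k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                        # x: (batch, d_model)
        logits = self.gate(x).view(-1, self.k, self.group)
        probs = logits.softmax(dim=-1)           # softmax within each group
        top = probs.argmax(dim=-1)               # top-1 expert per group
        out = torch.zeros_like(x)
        for g in range(self.k):                  # one active expert per group
            for b in range(x.size(0)):
                e = g * self.group + int(top[b, g])
                out[b] += probs[b, g, top[b, g]] * self.experts[e](x[b:b + 1])[0]
        return out

x = torch.randn(3, 64)
print(PrototypedMoE()(x).shape)                  # torch.Size([3, 64])
```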
- Categorical Representation Learning: Morphism is All You Need [0.0]
We provide a construction for categorical representation learning and introduce the foundations of the "$\textit{categorifier}$".
Every object in a dataset $\mathcal{S}$ can be represented as a vector in $\mathbb{R}^n$ by an $\textit{encoding map}$ $E: \mathcal{Obj}(\mathcal{S}) \to \mathbb{R}^n$.
As a proof of concept, we provide an example of a text translator equipped with our technology, showing that our categorical learning model outperforms the...
arXiv Detail & Related papers (2021-03-26T23:47:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.