Identifying Experts in Question & Answer Portals: A Case Study on Data
Science Competencies in Reddit
- URL: http://arxiv.org/abs/2204.04098v2
- Date: Thu, 1 Sep 2022 21:21:22 GMT
- Title: Identifying Experts in Question & Answer Portals: A Case Study on Data
Science Competencies in Reddit
- Authors: Sofia Strukova, José A. Ruipérez-Valiente, Félix Gómez Mármol
- Abstract summary: We inspect the feasibility of identifying data science experts in Reddit.
Our method is based on the manual coding results where two data science experts labelled not only expert and non-expert comments, but also out-of-scope comments.
We present a semi-supervised approach which combines 1,113 labelled comments with 100,226 unlabelled comments during training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The irreplaceable key to the triumph of Question & Answer (Q&A) platforms is
their users providing high-quality answers to the challenging questions posted
across various topics of interest. For more than a decade, the expert finding
problem has attracted much attention in information retrieval research. Based on
the gaps encountered in expert identification across several Q&A portals,
we inspect the feasibility of identifying data science experts in Reddit. Our
method is based on the manual coding results where two data science experts
labelled not only expert and non-expert comments, but also out-of-scope
comments, which is a novel contribution to the literature, enabling the
identification of more groups of comments across web portals. We present a
semi-supervised approach which combines 1,113 labelled comments with 100,226
unlabelled comments during training. The proposed model uses the activity
behaviour of every user, including Natural Language Processing (NLP),
crowdsourced and user feature sets. We conclude that the NLP and user feature
sets contribute the most to the better identification of these three classes.
It means that this method can generalise well within the domain. Finally, we
make a novel contribution by presenting different types of users in Reddit,
which opens many future research directions.
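The abstract does not say which semi-supervised technique is used, so the following is only a minimal sketch of one common option, self-training (pseudo-labelling), to illustrate how a small labelled set and a large unlabelled pool can be combined during training. Everything in it is an assumption made for illustration: the function names, the bag-of-words features, the nearest-centroid classifier, the confidence threshold, and the toy comments. None of it comes from the paper.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words counts (a toy stand-in for real NLP features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples):
    """Sum the word counts of all examples belonging to each class."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def predict(centroids, text):
    """Return the most similar class and its similarity score."""
    vec = vectorize(text)
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def self_train(labelled, unlabelled, threshold=0.5, max_rounds=5):
    """Repeatedly pseudo-label confident unlabelled comments and refit."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labelled)
        confident, remaining = [], []
        for text in pool:
            label, score = predict(centroids, text)
            if score >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from; stop early
        labelled.extend(confident)
        pool = remaining
    return fit_centroids(labelled), pool


# Hypothetical toy data covering the three classes named in the abstract.
labelled = [
    ("use cross validation to tune the regularisation of the model", "expert"),
    ("i am new what is a model", "non-expert"),
    ("anyone watching the football game tonight", "out-of-scope"),
]
unlabelled = [
    "cross validation helps tune the model",
    "what is regularisation i am new",
    "the football game was great",
    "zzz",
]

centroids, leftover = self_train(labelled, unlabelled)
print(predict(centroids, "tune the model with cross validation"))
print("still unlabelled:", leftover)
```

The threshold controls the trade-off typical of self-training: a higher value adds fewer but cleaner pseudo-labels, while a lower value uses more of the unlabelled pool at the risk of reinforcing early mistakes.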
Related papers
- Bottom-up Anytime Discovery of Generalised Multimodal Graph Patterns for Knowledge Graphs [0.0]
We introduce an algorithm for the bottom-up discovery of generalized multimodal graph patterns in knowledge graphs.
Upon discovery, the patterns are converted to SPARQL queries and presented in an interactive facet browser.
We evaluate our method from a user perspective, with the help of domain experts in the humanities.
arXiv Detail & Related papers (2024-10-08T09:07:27Z)
- Backtracing: Retrieving the Cause of the Query [7.715089044732362]
We introduce the task of backtracing, in which systems retrieve the text segment that most likely caused a user query.
We evaluate the zero-shot performance of popular information retrieval methods and language modeling methods.
Our results show that there is room for improvement on backtracing and it requires new retrieval approaches.
arXiv Detail & Related papers (2024-03-06T18:59:02Z)
- Inclusiveness Matters: A Large-Scale Analysis of User Feedback [7.8788463395442045]
We leverage user feedback from three popular online sources, Reddit, Google Play Store, and Twitter, for 50 of the most popular apps in the world.
Using a Socio-Technical Grounded Theory approach, we analyzed 23,107 posts across the three sources and identified 1,211 inclusiveness related posts.
Our study provides an in-depth view of inclusiveness-related user feedback from most popular apps and online sources.
arXiv Detail & Related papers (2023-11-02T04:05:46Z)
- ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality.
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z)
- Best-Answer Prediction in Q&A Sites Using User Information [2.982218441172364]
Community Question Answering (CQA) sites have spread and multiplied significantly in recent years.
One practical way of finding such answers is automatically predicting the best candidate given existing answers and comments.
We address this limitation using a novel method for predicting the best answers using the questioner's background information and other features.
arXiv Detail & Related papers (2022-12-15T02:28:52Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Advances and Challenges in Conversational Recommender Systems: A Survey [133.93908165922804]
We provide a systematic review of the techniques used in current conversational recommender systems (CRSs).
We summarize the key challenges of developing CRSs into five directions.
These research directions involve multiple research fields like information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI).
arXiv Detail & Related papers (2021-01-23T08:53:15Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- An Empirical Study of Clarifying Question-Based Systems [15.767515065224016]
We conduct an online experiment by deploying an experimental system, which interacts with users by asking clarifying questions against a product repository.
We collect both implicit interaction behavior data and explicit feedback from users showing that: (a) users are willing to answer a good number of clarifying questions (11-21 on average), but not many more than that.
arXiv Detail & Related papers (2020-08-01T15:10:11Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
- Deep Learning for Person Re-identification: A Survey and Outlook [233.36948173686602]
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras.
By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings.
arXiv Detail & Related papers (2020-01-13T12:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.