Author2Vec: A Framework for Generating User Embedding
- URL: http://arxiv.org/abs/2003.11627v1
- Date: Tue, 17 Mar 2020 23:31:11 GMT
- Title: Author2Vec: A Framework for Generating User Embedding
- Authors: Xiaodong Wu, Weizhe Lin, Zhilin Wang, and Elena Rastorgueva
- Abstract summary: We propose a novel end-to-end neural network-based user embedding system, Author2Vec.
The model combines sentence representations generated by BERT with a novel unsupervised pre-training objective: authorship classification.
Author2Vec successfully encodes useful user attributes, and the generated user embeddings perform well in downstream classification tasks.
- Score: 5.805785001237604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online forums and social media platforms provide noisy but valuable data
every day. In this paper, we propose a novel end-to-end neural network-based
user embedding system, Author2Vec. The model incorporates sentence
representations generated by BERT (Bidirectional Encoder Representations from
Transformers) with a novel unsupervised pre-training objective, authorship
classification, to produce better user embeddings that encode useful
user-intrinsic properties. This user embedding system was pre-trained on post
data of 10k Reddit users and was analyzed and evaluated on two user
classification benchmarks: depression detection and personality classification,
on which the model outperformed traditional count-based and
prediction-based methods. We show that Author2Vec successfully encodes
useful user attributes and that the generated user embeddings perform well in
downstream classification tasks without further fine-tuning.
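The pipeline the abstract describes, sentence representations pooled into a user embedding and pre-trained with an authorship-classification objective, can be sketched schematically. The sketch below is a toy analogue, not the paper's implementation: L2-normalized bag-of-words vectors stand in for BERT sentence representations, and nearest-embedding matching stands in for the learned authorship classifier; all user names and posts are illustrative.

```python
import math

def build_vocab(corpus):
    # Toy vocabulary over all training posts (stand-in for BERT's tokenizer).
    tokens = sorted({t for post in corpus for t in post.lower().split()})
    return {t: i for i, t in enumerate(tokens)}

def sentence_embedding(post, vocab):
    # Bag-of-words vector, L2-normalized: a crude stand-in for the BERT
    # sentence representations the actual Author2Vec model uses.
    vec = [0.0] * len(vocab)
    for t in post.lower().split():
        if t in vocab:
            vec[vocab[t]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def user_embedding(posts, vocab):
    # User embedding = mean of the user's sentence embeddings.
    embs = [sentence_embedding(p, vocab) for p in posts]
    return [sum(col) / len(embs) for col in zip(*embs)]

def classify_author(post, user_embs, vocab):
    # Authorship classification: assign the post to the user whose embedding
    # is most similar (dot product) -- the pre-training objective in spirit.
    e = sentence_embedding(post, vocab)
    return max(user_embs, key=lambda u: sum(a * b for a, b in zip(e, user_embs[u])))

# Illustrative data: two hypothetical users and their posts.
users = {
    "alice": ["i love hiking in the mountains", "mountains and trails are great"],
    "bob": ["deep learning models are fascinating", "training neural networks daily"],
}
vocab = build_vocab([p for posts in users.values() for p in posts])
embs = {u: user_embedding(posts, vocab) for u, posts in users.items()}
print(classify_author("hiking the mountain trails", embs, vocab))  # -> alice
```

In the actual system, the classifier head is discarded after pre-training and the pooled representation is reused directly as the user embedding for downstream tasks, which is what "without further fine-tuning" refers to.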
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact-checking scenarios and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on their interaction history on the platform.
Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- Machine and Deep Learning Applications to Mouse Dynamics for Continuous User Authentication [0.0]
This article builds upon our previously published work by evaluating our dataset of 40 users with three machine learning and deep learning algorithms.
The top performer is a 1-dimensional convolutional neural network with a peak average test accuracy of 85.73% across the top 10 users.
Multi-class classification is also examined using an artificial neural network, which reaches a peak accuracy of 92.48%.
arXiv Detail & Related papers (2022-05-26T21:43:59Z)
- Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems [20.55054374525828]
This paper explores three novel approaches to improve the performance of speaker verification systems based on deep neural networks (DNNs).
First, we propose the use of a learnable vector called a Class token to replace the average global pooling mechanism when extracting the embeddings.
Second, we add a distilled representation token for training a teacher-student pair of networks using the Knowledge Distillation (KD) philosophy.
arXiv Detail & Related papers (2021-11-06T09:47:05Z)
- PETGEN: Personalized Text Generation Attack on Deep Sequence Embedding-based Classification Models [9.630961791758168]
Malicious users can evade deep detection models by manipulating their behavior.
Here we create a novel adversarial attack model against deep user sequence embedding-based classification models.
In the attack, the adversary generates a new post to fool the classifier.
arXiv Detail & Related papers (2021-09-14T15:48:07Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors for making the final acceptance decision, as well as to help discover inconsistencies between numerical review ratings and the text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Towards Open-World Recommendation: An Inductive Model-based Collaborative Filtering Approach [115.76667128325361]
Recommendation models can effectively estimate underlying user interests and predict one's future behaviors.
We propose an inductive collaborative filtering framework that contains two representation models.
Our model achieves promising results for recommendation on few-shot users with limited training ratings and new unseen users.
arXiv Detail & Related papers (2020-07-09T14:31:25Z)
- Federated Learning of User Authentication Models [69.93965074814292]
We propose Federated User Authentication (FedUA), a framework for privacy-preserving training of machine learning models.
FedUA adopts a federated learning framework to enable a group of users to jointly train a model without sharing their raw inputs.
We show that our method is privacy-preserving, scales with the number of users, and allows new users to be added to training without changing the output layer.
arXiv Detail & Related papers (2020-07-09T08:04:38Z)
- Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents [28.668681892786264]
Measuring user satisfaction level is a challenging task, and a critical component in developing large-scale conversational agent systems.
Human annotation based approaches are easier to control, but hard to scale.
A novel alternative approach is to collect users' direct feedback via a feedback elicitation system embedded in the conversational agent system.
arXiv Detail & Related papers (2020-05-29T16:29:09Z)
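The federated training loop described in the FedUA entry above rests on a federated-averaging pattern: each user computes a model update on private data, and the server aggregates only the parameters. The sketch below illustrates that pattern, not FedUA's actual authentication model; the linear-regression task, learning rate, and all data are illustrative assumptions.

```python
def local_update(weights, data, lr=0.1):
    # One local gradient step of linear regression (y ~ w*x + b) on a user's
    # private (x, y) pairs; the raw samples never leave this function.
    w, b = weights
    gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    n = len(data)
    return (w - lr * gw / n, b - lr * gb / n)

def federated_round(weights, per_user_data):
    # FedAvg-style server step: average the users' locally updated weights.
    # Only model parameters are exchanged, matching FedUA's privacy premise.
    updates = [local_update(weights, d) for d in per_user_data]
    k = len(updates)
    return (sum(u[0] for u in updates) / k, sum(u[1] for u in updates) / k)

# Three users, each holding private samples of the same function y = 2x.
per_user_data = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(0.5, 1.0), (4.0, 8.0)]]
weights = (0.0, 0.0)
for _ in range(2000):
    weights = federated_round(weights, per_user_data)
print(round(weights[0], 2))  # -> 2.0
```

The server recovers the shared slope without ever seeing any user's (x, y) pairs; scaling to new users only means including their updates in the average, which is consistent with the entry's claim that new users can be added without changing the output layer.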
This list is automatically generated from the titles and abstracts of the papers in this site.