Domain-specific Continued Pretraining of Language Models for Capturing
Long Context in Mental Health
- URL: http://arxiv.org/abs/2304.10447v1
- Date: Thu, 20 Apr 2023 16:43:56 GMT
- Title: Domain-specific Continued Pretraining of Language Models for Capturing
Long Context in Mental Health
- Authors: Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik
Cambria, J\"org Tiedemann
- Abstract summary: This paper conducts domain-specific continued pretraining to capture the long context for mental health.
Specifically, we train and release MentalXLNet and MentalLongformer based on XLNet and Longformer.
We evaluate the mental health classification performance and the long-range ability of these two domain-specific pretrained models.
- Score: 23.458852189587073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models have been used in various natural language
processing applications. In the mental health domain, domain-specific language
models are pretrained and released, which facilitates the early detection of
mental health conditions. Social posts, e.g., on Reddit, are usually long
documents. However, there are no domain-specific pretrained models for
long-sequence modeling in the mental health domain. This paper conducts
domain-specific continued pretraining to capture the long context for mental
health. Specifically, we train and release MentalXLNet and MentalLongformer
based on XLNet and Longformer. We evaluate the mental health classification
performance and the long-range ability of these two domain-specific pretrained
models. Our models are released on HuggingFace.
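The released checkpoints can be loaded through the Hugging Face transformers library. Below is a minimal sketch of long-document classification with MentalLongformer; the repository ID, the binary label setup, and the example post are illustrative assumptions rather than details taken from the paper, and MentalXLNet would be loaded the same way.
```python
# Minimal sketch: loading a released checkpoint for long-document mental health
# classification. The repo ID and num_labels are assumptions for illustration;
# consult the authors' Hugging Face release for the exact model names.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "AIMH/mental-longformer-base-4096"  # assumed ID; MentalXLNet is analogous

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The classification head is freshly initialized here and would still need
# fine-tuning on a labeled mental health dataset before its outputs mean anything.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

post = "A long Reddit post describing how the author has been feeling lately ..."
# Longformer-style attention handles sequences up to 4096 tokens,
# well beyond the 512-token limit of BERT-style encoders.
inputs = tokenizer(post, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```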
Related papers
- MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases in the world.
Privacy concerns limit the accessibility of personalized treatment data.
MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Chinese MentalBERT: Domain-Adaptive Pre-training on Social Media for Chinese Mental Health Text Analysis [19.32304448831033]
We have collected a huge dataset from Chinese social media platforms.
We enriched it with publicly available datasets to create a database encompassing 3.36 million text entries.
To enhance the model's applicability to psychological text analysis, we integrated psychological lexicons into the pre-training masking mechanism.
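As a rough illustration of that lexicon-guided masking idea (not the authors' implementation), the sketch below masks words from a toy psychological lexicon with a higher probability than ordinary tokens when building masked language modeling examples; the lexicon contents and probabilities are made up for the example.
```python
# Rough sketch (not the paper's code): bias MLM masking towards words from a
# psychological lexicon. Lexicon contents and probabilities are toy values.
import random

PSYCH_LEXICON = {"anxious", "hopeless", "worthless", "insomnia"}  # toy lexicon

def lexicon_guided_mask(tokens, mask_token="[MASK]", base_prob=0.15, lexicon_prob=0.5):
    """Mask lexicon words more aggressively than other tokens."""
    masked, labels = [], []
    for tok in tokens:
        p = lexicon_prob if tok.lower() in PSYCH_LEXICON else base_prob
        if random.random() < p:
            masked.append(mask_token)
            labels.append(tok)    # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # position ignored by the MLM loss
    return masked, labels

print(lexicon_guided_mask("I feel hopeless and anxious every night".split()))
```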
arXiv Detail & Related papers (2024-02-14T13:08:25Z)
- An Assessment on Comprehending Mental Health through Large Language Models [2.7044181783627086]
More than 20% of adults may encounter at least one mental disorder in their lifetime.
This study presents an initial evaluation of large language models in addressing this gap.
Our results on the DAIC-WOZ dataset show that transformer-based models, such as BERT and XLNet, outperform the large language models.
arXiv Detail & Related papers (2024-01-09T14:50:04Z)
- Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering [4.254954312483959]
Large language models exhibit promising general capabilities but often lack specialized knowledge for domain-specific tasks.
This work demonstrates a method using continuous training and instruction fine-tuning to rapidly adapt Llama 2 base models to the Chinese medical domain.
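A hedged sketch of that two-stage recipe, continued causal-LM pretraining on domain text followed by instruction fine-tuning, is given below; the base model ID is the public Llama 2 checkpoint, while the data file, hyperparameters, and output paths are placeholders rather than values from the paper.
```python
# Hedged sketch of continued pretraining (stage 1) for adapting a causal LM to
# a domain corpus; file names and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"      # gated checkpoint; access must be granted
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Raw domain text for continued pretraining ("domain_corpus.txt" is a placeholder).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
corpus = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-domain", num_train_epochs=1,
                           per_device_train_batch_size=1, gradient_accumulation_steps=16),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
# Stage 2 (instruction fine-tuning) repeats the loop on instruction/response pairs
# rendered into a prompt template, which is omitted here.
```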
arXiv Detail & Related papers (2023-11-01T00:18:00Z)
- What do Large Language Models Learn beyond Language? [10.9650651784511]
We find that pretrained models significantly outperform comparable non-pretrained neural models.
Experiments surprisingly reveal that the positive effects of pre-training persist even when pretraining on multi-lingual text or computer code.
Our findings suggest a hitherto unexplored deep connection between pre-training and inductive learning abilities of language models.
arXiv Detail & Related papers (2022-10-21T23:43:13Z)
- MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare [29.14340469459733]
Early detection of mental disorders and suicidal ideation from social content provides a potential way for effective social intervention.
Recent advances in pretrained contextualized language representations have promoted the development of several domain-specific pretrained models.
This paper trains and releases two pretrained language models, i.e., MentalBERT and MentalRoBERTa, to benefit machine learning for the mental healthcare research community.
arXiv Detail & Related papers (2021-10-29T08:36:47Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Pretrained Language Model Embryology: The Birth of ALBERT [68.5801642674541]
We investigate the developmental process from a set of randomly initialized parameters to a totipotent language model.
Our results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different speeds during pretraining.
These findings suggest that the knowledge of a pretrained model varies during pretraining, and that more pretraining steps do not necessarily provide a model with more comprehensive knowledge.
arXiv Detail & Related papers (2020-10-06T05:15:39Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)