Text-based classification of interviews for mental health -- juxtaposing the state of the art
- URL: http://arxiv.org/abs/2008.01543v1
- Date: Wed, 29 Jul 2020 16:19:30 GMT
- Title: Text-based classification of interviews for mental health -- juxtaposing the state of the art
- Authors: Joppe Valentijn Wouts
- Abstract summary: The current state of the art for classification of psychiatric illness is audio-based.
This thesis aims to design and evaluate a state-of-the-art text classification network for this challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, the state of the art for classification of psychiatric illness is
audio-based. This thesis aims to design and evaluate a state-of-the-art text
classification network for this challenge. The hypothesis is that a well-designed
text-based approach can compete strongly with the state-of-the-art audio-based
approaches. Dutch natural language models are limited by the scarcity of
pre-trained monolingual NLP models; as a result, they capture long-range semantic
dependencies across sentences poorly. To address this issue, this thesis presents
belabBERT, a new Dutch language model extending the RoBERTa [15] architecture.
belabBERT is trained on a large Dutch corpus (+32 GB) of web-crawled texts. After
evaluating the strength of text-based classification, the thesis briefly explores
extending the framework to a hybrid text- and audio-based classification. The goal
of this hybrid framework is to demonstrate the principle of hybridisation with a
very basic audio classification network. The overall goal is to lay the
foundations for hybrid psychiatric illness classification by proving that the new
text-based classification is already a strong stand-alone solution.
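
To make the classification setup concrete, here is a minimal sketch of running a RoBERTa-family Dutch checkpoint for sequence classification with the Hugging Face transformers library. The checkpoint name, the example sentence, and the two-class setup (e.g. patient vs. control) are illustrative assumptions; the abstract does not specify them.

```python
# A minimal sketch, assuming a Hugging Face-style checkpoint for belabBERT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the checkpoint name below is illustrative, not confirmed by
# the abstract; any Dutch RoBERTa variant (e.g. RobBERT) would slot in.
MODEL_NAME = "jwouts/belabBERT_115k"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # e.g. patient vs. control (assumed labels)
)

# Classify a placeholder Dutch interview fragment.
inputs = tokenizer(
    "Voorbeeldfragment uit een interview.",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # class probabilities; the head is untrained here, so arbitrary
```

The hybrid extension can likewise be sketched as a simple late fusion of class probabilities from the text and audio branches. The equal weighting below is an illustrative choice, not the thesis's reported method.

```python
import torch

def fuse_probabilities(
    text_probs: torch.Tensor,   # shape (batch, num_classes)
    audio_probs: torch.Tensor,  # shape (batch, num_classes)
    text_weight: float = 0.5,   # assumed weighting, not from the thesis
) -> torch.Tensor:
    """Late fusion: weighted average of per-modality class probabilities."""
    return text_weight * text_probs + (1.0 - text_weight) * audio_probs

# Example: one sample, two classes per modality.
fused = fuse_probabilities(
    torch.tensor([[0.8, 0.2]]),
    torch.tensor([[0.6, 0.4]]),
)
print(fused)  # tensor([[0.7000, 0.3000]])
```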
Related papers
- Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification [2.4071330817126477]
We propose LIGRAM, a hierarchical heterogeneous graph model for Korean short-text classification.
The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and hierarchically integrates them to compensate for the limited contextual information in short texts.
We evaluate LIGRAM on four Korean short-text datasets, where it consistently outperforms existing baseline models.
arXiv Detail & Related papers (2026-03-04T02:17:13Z)
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content.
We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z)
- Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse.
Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings.
Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z)
- HAMLET: Healthcare-focused Adaptive Multilingual Learning Embedding-based Topic Modeling [4.8342038441006805]
This paper introduces HAMLET, a graph-driven architecture for cross-lingual healthcare topic modeling.
The proposed approach uses neural-enhanced semantic fusion to refine the embeddings of topics generated by the Large Language Models.
Experiments were conducted using two healthcare datasets, one in English and one in French, from which six sets were derived.
arXiv Detail & Related papers (2025-05-12T00:31:36Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification [29.075766631810595]
We propose a Symbolic-Neural model that can learn to explicitly predict class labels of text spans from a constituency tree.
As the structured language model learns to predict constituency trees in a self-supervised manner, only raw texts and sentence-level labels are required as training data.
Our experiments demonstrate that our approach could achieve good prediction accuracy in downstream tasks.
arXiv Detail & Related papers (2023-03-06T03:25:43Z)
- Text classification dataset and analysis for Uzbek language [0.0]
We first present a newly obtained dataset for Uzbek text classification, which was collected from 10 different news and press websites.
We also present a comprehensive evaluation of different models, ranging from traditional bag-of-words models to deep learning architectures.
Our experiments show that the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) based models outperform the rule-based models.
arXiv Detail & Related papers (2023-02-28T11:21:24Z)
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experiment evaluations on popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z)
- belabBERT: a Dutch RoBERTa-based language model applied to psychiatric classification [0.0]
We present belabBERT, a new Dutch language model extending the RoBERTa architecture.
belabBERT is trained on a large Dutch corpus (+32 GB) of web-crawled texts.
We evaluate the strength of text-based classification using belabBERT and compare the results to the existing RobBERT model (a rough comparison sketch follows this list).
arXiv Detail & Related papers (2021-06-02T11:50:49Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models [0.0]
Deep learning approaches based on CNN, LSTM, and Transformers have been the de facto approach for text classification.
We show that these systems are over-reliant on the important words present in the text that are useful for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z)
- TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification [22.265865542786084]
We propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks.
Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models.
arXiv Detail & Related papers (2020-10-23T14:11:04Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
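
The belabBERT entry above compares the new model against the existing RobBERT model; as flagged there, a rough comparison can be sketched by querying both checkpoints with the same masked Dutch sentence. RobBERT's checkpoint name follows its public release, while the belabBERT identifier below is an assumption, not something confirmed by this page.

```python
# A rough sketch, assuming belabBERT's weights are published under the
# (hypothetical) identifier below; RobBERT's v2 base checkpoint is public.
from transformers import pipeline

CHECKPOINTS = (
    "pdelobelle/robbert-v2-dutch-base",  # RobBERT (public on the Hub)
    "jwouts/belabBERT_115k",             # belabBERT (assumed identifier)
)

for name in CHECKPOINTS:
    fill = pipeline("fill-mask", model=name)
    # Placeholder Dutch sentence; <mask> is the RoBERTa mask token.
    preds = fill("Het weer is vandaag erg <mask>.")
    print(name, [p["token_str"] for p in preds[:3]])
```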
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.