Text-based classification of interviews for mental health -- juxtaposing the state of the art
- URL: http://arxiv.org/abs/2008.01543v1
- Date: Wed, 29 Jul 2020 16:19:30 GMT
- Title: Text-based classification of interviews for mental health -- juxtaposing the state of the art
- Authors: Joppe Valentijn Wouts
- Abstract summary: The current state of the art for classification of psychiatric illness is audio-based.
This thesis aims to design and evaluate a state-of-the-art text classification network for this challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, the state of the art for classification of psychiatric illness is
audio-based. This thesis aims to design and evaluate a state-of-the-art text
classification network for this challenge. The hypothesis is that a well-designed
text-based approach can compete strongly with the state-of-the-art audio-based
approaches. Dutch natural language models are limited by the scarcity of
pre-trained monolingual NLP models; as a result, they capture long-range semantic
dependencies across sentences poorly. To address this issue, this thesis presents
belabBERT, a new Dutch language model extending the RoBERTa [15] architecture.
belabBERT is trained on a large Dutch corpus (+32 GB) of web-crawled texts. After
evaluating the strength of text-based classification, the thesis briefly explores
extending the framework to a hybrid text- and audio-based classification. The goal
of this hybrid framework is to demonstrate the principle of hybridisation with a
very basic audio classification network. The overall goal is to lay the
foundations for hybrid psychiatric illness classification by proving that the new
text-based classification is already a strong stand-alone solution.
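
To make the classification setup concrete, here is a minimal sketch of running a RoBERTa-family Dutch checkpoint for sequence classification with the Hugging Face transformers library. The checkpoint name, the example sentence, and the two-class setup (e.g. patient vs. control) are illustrative assumptions; the abstract does not specify them.

```python
# A minimal sketch, assuming a Hugging Face-style checkpoint for belabBERT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the checkpoint name below is illustrative, not confirmed by
# the abstract; any Dutch RoBERTa variant (e.g. RobBERT) would slot in.
MODEL_NAME = "jwouts/belabBERT_115k"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # e.g. patient vs. control (assumed labels)
)

# Classify a placeholder Dutch interview fragment.
inputs = tokenizer(
    "Voorbeeldfragment uit een interview.",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # class probabilities; the head is untrained here, so arbitrary
```

The hybrid extension can likewise be sketched as a simple late fusion of class probabilities from the text and audio branches. The equal weighting below is an illustrative choice, not the thesis's reported method.

```python
import torch

def fuse_probabilities(
    text_probs: torch.Tensor,   # shape (batch, num_classes)
    audio_probs: torch.Tensor,  # shape (batch, num_classes)
    text_weight: float = 0.5,   # assumed weighting, not from the thesis
) -> torch.Tensor:
    """Late fusion: weighted average of per-modality class probabilities."""
    return text_weight * text_probs + (1.0 - text_weight) * audio_probs

# Example: one sample, two classes per modality.
fused = fuse_probabilities(
    torch.tensor([[0.8, 0.2]]),
    torch.tensor([[0.6, 0.4]]),
)
print(fused)  # tensor([[0.7000, 0.3000]])
```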
Related papers
- Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification [2.4071330817126477]
We propose LIGRAM, a hierarchical heterogeneous graph model for Korean short-text classification.
The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and hierarchically integrates them to compensate for the limited contextual information in short texts.
We evaluate LIGRAM on four Korean short-text datasets, where it consistently outperforms existing baseline models.
arXiv Detail & Related papers (2026-03-04T02:17:13Z)
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content.
We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z)
- Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse.
Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings.
Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z)
- HAMLET: Healthcare-focused Adaptive Multilingual Learning Embedding-based Topic Modeling [4.8342038441006805]
This paper introduces HAMLET, a graph-driven architecture for cross-lingual healthcare topic modeling.
The proposed approach uses neural-enhanced semantic fusion to refine the embeddings of topics generated by the Large Language Models.
Experiments were conducted using two healthcare datasets, one in English and one in French, from which six sets were derived.
arXiv Detail & Related papers (2025-05-12T00:31:36Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification [29.075766631810595]
We propose a Symbolic-Neural model that can learn to explicitly predict class labels of text spans from a constituency tree.
As the structured language model learns to predict constituency trees in a self-supervised manner, only raw texts and sentence-level labels are required as training data.
Our experiments demonstrate that our approach could achieve good prediction accuracy in downstream tasks.
arXiv Detail & Related papers (2023-03-06T03:25:43Z)
- Text classification dataset and analysis for Uzbek language [0.0]
We first present a newly obtained dataset for Uzbek text classification, which was collected from 10 different news and press websites.
We also present a comprehensive evaluation of different models, ranging from traditional bag-of-words models to deep learning architectures.
Our experiments show that the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) based models outperform the rule-based models.
arXiv Detail & Related papers (2023-02-28T11:21:24Z)
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experiment evaluations on popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z)
- belabBERT: a Dutch RoBERTa-based language model applied to psychiatric classification [0.0]
We present belabBERT, a new Dutch language model extending the RoBERTa architecture.
belabBERT is trained on a large Dutch corpus (+32 GB) of web-crawled texts.
We evaluate the strength of text-based classification using belabBERT and compare the results to the existing RobBERT model (a rough comparison sketch follows this list).
arXiv Detail & Related papers (2021-06-02T11:50:49Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models [0.0]
Deep learning approaches based on CNN, LSTM, and Transformers have been the de facto approach for text classification.
We show that these systems are over-reliant on the important words present in the text that are useful for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z)
- TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification [22.265865542786084]
We propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks.
Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models.
arXiv Detail & Related papers (2020-10-23T14:11:04Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
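
The belabBERT entry above compares the new model against the existing RobBERT model; as flagged there, a rough comparison can be sketched by querying both checkpoints with the same masked Dutch sentence. RobBERT's checkpoint name follows its public release, while the belabBERT identifier below is an assumption, not something confirmed by this page.

```python
# A rough sketch, assuming belabBERT's weights are published under the
# (hypothetical) identifier below; RobBERT's v2 base checkpoint is public.
from transformers import pipeline

CHECKPOINTS = (
    "pdelobelle/robbert-v2-dutch-base",  # RobBERT (public on the Hub)
    "jwouts/belabBERT_115k",             # belabBERT (assumed identifier)
)

for name in CHECKPOINTS:
    fill = pipeline("fill-mask", model=name)
    # Placeholder Dutch sentence; <mask> is the RoBERTa mask token.
    preds = fill("Het weer is vandaag erg <mask>.")
    print(name, [p["token_str"] for p in preds[:3]])
```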
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.