Assistive Completion of Agrammatic Aphasic Sentences: A Transfer
Learning Approach using Neurolinguistics-based Synthetic Dataset
- URL: http://arxiv.org/abs/2211.05557v1
- Date: Thu, 10 Nov 2022 13:24:02 GMT
- Authors: Rohit Misra, Sapna S Mishra and Tapan K. Gandhi
- Abstract summary: Damage to the inferior frontal gyrus can cause agrammatic aphasia.
Patients, although able to comprehend, lack the ability to form complete sentences.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Damage to the inferior frontal gyrus (Broca's area) can cause agrammatic
aphasia wherein patients, although able to comprehend, lack the ability to form
complete sentences. This inability leads to communication gaps which cause
difficulties in their daily lives. The usage of assistive devices can help in
mitigating these issues and enable the patients to communicate effectively.
However, due to the lack of large-scale studies of linguistic deficits in aphasia,
research on such assistive technology is relatively limited. In this work, we
present two contributions that aim to re-initiate research and development in
this field. Firstly, we propose a model that uses linguistic features from
small scale studies on aphasia patients and generates large scale datasets of
synthetic aphasic utterances from grammatically correct datasets. We show that
the mean length of utterance, the noun/verb ratio, and the simple/complex
sentence ratio of our synthetic datasets correspond to the reported features of
aphasic speech. Further, we demonstrate how the synthetic datasets may be
utilized to develop assistive devices for aphasia patients. The pre-trained T5
transformer is fine-tuned using the generated dataset to suggest 5 corrected
sentences given an aphasic utterance as input. We evaluate the efficacy of the
T5 model using BLEU and cosine semantic similarity scores. Affirming results
were obtained, with a BLEU score of 0.827/1.00 and a semantic similarity of
0.904/1.00. These results provide a strong foundation for the concept that a
synthetic dataset based on small scale studies on aphasia can be used to
develop effective assistive technology.
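As a rough illustration of the pipeline the abstract describes, the following is a minimal, self-contained sketch: a toy transform that drops function words from a grammatical sentence to mimic telegraphic (agrammatic) speech, plus simplified BLEU and cosine similarity scorers for a model's suggested completion. The function-word list, the bag-of-words similarity, and the bigram-limited BLEU are all simplifying assumptions made here for illustration; the paper's actual generation rules are derived from neurolinguistic studies, standard BLEU uses 4-grams with smoothing and a brevity penalty, and the paper's semantic similarity is presumably embedding-based.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list used to mimic agrammatic (telegraphic)
# speech; the paper's generation rules come from neurolinguistic studies
# and are not reproduced here.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "was", "were", "to",
                  "of", "in", "on", "at", "has", "have", "had", "will"}

def tokenize(sentence: str) -> list:
    return re.findall(r"[a-z']+", sentence.lower())

def to_agrammatic(sentence: str) -> str:
    """Toy transform: drop function words to imitate telegraphic speech."""
    return " ".join(t for t in tokenize(sentence) if t not in FUNCTION_WORDS)

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def bleu_precision(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of n-gram precisions, without the
    smoothing and brevity penalty of standard 4-gram BLEU."""
    cand, ref = tokenize(candidate), tokenize(reference)
    precisions = []
    for n in range(1, max_n + 1):
        cgrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        rgrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, rgrams[g]) for g, c in cgrams.items())
        precisions.append(overlap / max(sum(cgrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "The boy is eating an apple"
aphasic = to_agrammatic(reference)  # -> "boy eating apple"
candidate = "The boy is eating an apple"  # a completion proposed by a model
print(aphasic)
print(bleu_precision(candidate, reference))     # 1.0 for a perfect match
print(cosine_similarity(candidate, reference))  # ~1.0 for a perfect match
```

In the paper's setup, the candidate would instead be one of the five sentences produced by the fine-tuned T5 model, and each would be scored against the held-out grammatical reference.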
Related papers
- Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS [10.019926246026928]
Dysarthric speech poses significant challenges in developing assistive technologies.
Recent advances in neural speech synthesis, especially zero-shot voice cloning, facilitate synthetic speech generation for data augmentation.
We investigate the effectiveness of the state-of-the-art F5-TTS in cloning dysarthric speech using the TORGO dataset.
arXiv Detail & Related papers (2025-08-07T07:39:48Z) - Dementia Insights: A Context-Based MultiModal Approach [0.3749861135832073]
Early detection is crucial for timely interventions that may slow disease progression.
Large pre-trained models (LPMs) for text and audio have shown promise in identifying cognitive impairments.
This study proposes a context-based multimodal method, integrating both text and audio data using the best-performing LPMs.
arXiv Detail & Related papers (2025-03-03T06:46:26Z) - A Lesion-aware Edge-based Graph Neural Network for Predicting Language Ability in Patients with Post-stroke Aphasia [12.129896943547912]
We propose a lesion-aware graph neural network (LEGNet) to predict language ability from resting-state fMRI (rs-fMRI) connectivity in patients with post-stroke aphasia.
Our model integrates three components: an edge-based learning module that encodes functional connectivity between brain regions, a lesion encoding module, and a subgraph learning module.
arXiv Detail & Related papers (2024-09-03T21:28:48Z) - Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z) - Detecting the Clinical Features of Difficult-to-Treat Depression using
Synthetic Data from Large Language Models [0.20971479389679337]
We seek to develop a Large Language Model (LLM)-based tool capable of interrogating routinely-collected, narrative (free-text) electronic health record data.
We use LLM-generated synthetic data (GPT3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model.
We show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD
arXiv Detail & Related papers (2024-02-12T13:34:33Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Synthetic Pre-Training Tasks for Neural Machine Translation [16.6378815054841]
Our goal is to understand the factors that contribute to the effectiveness of pre-training models when using synthetic resources.
We propose several novel approaches to pre-training translation models that involve different levels of lexical and structural knowledge.
Our experiments on multiple language pairs reveal that pre-training benefits can be realized even with high levels of obfuscation or purely synthetic parallel data.
arXiv Detail & Related papers (2022-12-19T21:34:00Z) - Synthesising Electronic Health Records: Cystic Fibrosis Patient Group [3.255030588361125]
This paper evaluates synthetic data generators' ability to synthesise patient electronic health records.
We test the utility of synthetic data for patient outcome classification, observing increased predictive performance when augmenting imbalanced datasets with synthetic data.
arXiv Detail & Related papers (2022-01-14T11:35:18Z) - On the Interplay Between Sparsity, Naturalness, Intelligibility, and
Prosody in Speech Synthesis [102.80458458550999]
We investigate the tradeoffs of sparsity and its subsequent effects on synthetic speech.
Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility.
arXiv Detail & Related papers (2021-10-04T02:03:28Z) - Alternated Training with Synthetic and Authentic Data for Neural Machine
Translation [49.35605028467887]
We propose alternated training with synthetic and authentic data for neural machine translation (NMT).
Compared with previous work, we introduce authentic data as guidance to prevent the training of NMT models from being disturbed by noisy synthetic data.
Experiments on Chinese-English and German-English translation tasks show that our approach improves the performance over several strong baselines.
arXiv Detail & Related papers (2021-06-16T07:13:16Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z) - Image Translation for Medical Image Generation -- Ischemic Stroke
Lesions [0.0]
Synthetic databases with annotated pathologies could provide the required amounts of training data.
We train different image-to-image translation models to synthesize magnetic resonance images of brain volumes with and without stroke lesions.
We show that for a small database of only 10 or 50 clinical cases, synthetic data augmentation yields significant improvement.
arXiv Detail & Related papers (2020-10-05T09:12:28Z) - Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.