Interpreting Depression From Question-wise Long-term Video Recording of
SDS Evaluation
- URL: http://arxiv.org/abs/2106.13393v1
- Date: Fri, 25 Jun 2021 02:32:13 GMT
- Title: Interpreting Depression From Question-wise Long-term Video Recording of
SDS Evaluation
- Authors: Wanqing Xie, Lizhong Liang, Yao Lu, Chen Wang, Jihong Shen, Hui Luo,
Xiaofeng Liu
- Abstract summary: The Self-Rating Depression Scale (SDS) questionnaire is frequently used for efficient preliminary depression screening.
We propose an end-to-end hierarchical framework for long-term, variable-length videos, which is also conditioned on the questionnaire results and the answering time.
- Score: 13.578189285500716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Self-Rating Depression Scale (SDS) questionnaire is frequently used for
efficient preliminary depression screening. However, this uncontrolled
self-administered measure is easily affected by insouciant or deceptive
answering, and can produce results that differ from the clinician-administered
Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically,
facial expressions (FE) and actions play a vital role in clinician-administered
evaluations, yet they remain underexplored in self-administered ones. In this
work, we collect a novel dataset of 200 subjects to assess the validity of
self-rating questionnaires against their corresponding question-wise video
recordings. To automatically interpret depression from the SDS evaluation and
the paired video, we propose an end-to-end hierarchical framework for
long-term, variable-length videos, which is also conditioned on the
questionnaire results and the answering time. Specifically, we resort to a
hierarchical model that uses a 3D CNN for local temporal pattern exploration
and a redundancy-aware self-attention (RAS) scheme for question-wise global
feature aggregation. Targeting redundant long-term FE video processing, our
RAS exploits the correlations among the video clips within a question set,
emphasizing discriminative information and eliminating redundancy based on
pair-wise feature affinity. The question-wise video features are then
concatenated with the questionnaire scores for final depression detection. Our
thorough evaluations demonstrate the validity of fusing the SDS evaluation
with its video recording, and the superiority of our framework over
conventional state-of-the-art temporal modeling methods.
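To make the pipeline concrete, below is a minimal PyTorch sketch of the hierarchical framework: a 3D CNN encodes short clips, a redundancy-aware self-attention module aggregates each question's clips into one feature, and the question-wise features are fused with the SDS scores and answering times. All module names, dimensions, and the affinity-based weighting are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RedundancyAwareSelfAttention(nn.Module):
    """Aggregates the clip features of one question, down-weighting clips
    whose pair-wise affinity to the rest of the set is high (redundant)."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (num_clips, dim) features of one question's video clips
        q, k, v = self.qkv(clips).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.t() / clips.size(-1) ** 0.5, dim=-1)
        # Pair-wise cosine affinity: clips similar to many others carry
        # redundant information and receive lower aggregation weights.
        normed = F.normalize(clips, dim=-1)
        affinity = (normed @ normed.t()).mean(dim=-1, keepdim=True)
        weights = torch.softmax(-affinity, dim=0)  # low affinity -> high weight
        return (weights * (attn @ v)).sum(dim=0)   # (dim,) question-level feature

class HierarchicalDepressionNet(nn.Module):
    def __init__(self, dim: int = 128, num_questions: int = 20):
        super().__init__()
        # Local temporal pattern exploration over short clips.
        self.clip_encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, dim),
        )
        self.ras = RedundancyAwareSelfAttention(dim)
        # Fuse question-wise video features with per-question SDS scores
        # and answering times (2 extra scalars per question).
        self.head = nn.Linear(num_questions * (dim + 2), 2)

    def forward(self, videos, sds_scores, answer_times):
        # videos: list of num_questions tensors, each (num_clips, 3, T, H, W);
        # the number of clips may differ per question (variable-length video).
        feats = [self.ras(self.clip_encoder(v)) for v in videos]
        fused = torch.cat(feats + [sds_scores, answer_times], dim=-1)
        return self.head(fused)  # logits: depressed vs. not depressed

# Dummy forward pass for a single subject with 2 questions.
net = HierarchicalDepressionNet(dim=128, num_questions=2)
videos = [torch.randn(4, 3, 8, 32, 32), torch.randn(6, 3, 8, 32, 32)]
logits = net(videos, sds_scores=torch.randn(2), answer_times=torch.randn(2))
```

The per-question loop is what lets the model handle variable-length recordings: each question contributes however many clips it has, and RAS reduces them to one fixed-size feature before fusion.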
Related papers
- Innovative Framework for Early Estimation of Mental Disorder Scores to Enable Timely Interventions [0.9297614330263184]
This paper presents an advanced multimodal deep learning system for the automated classification of PTSD and depression.
The proposed method achieves classification accuracies of 92% for depression and 93% for PTSD, outperforming traditional unimodal approaches.
arXiv Detail & Related papers (2025-02-06T10:57:10Z)
- SAD-TIME: a Spatiotemporal-fused network for depression detection with Automated multi-scale Depth-wise and TIME-interval-related common feature extractor [8.335556993302937]
Depression is a severe mental disorder, and accurate diagnosis is pivotal to the cure and rehabilitation of people with depression.
In search of a more objective means of diagnosis, researchers have begun to experiment with deep learning-based methods for identifying depressive disorders.
arXiv Detail & Related papers (2024-11-13T11:08:28Z)
- STANet: A Novel Spatio-Temporal Aggregation Network for Depression Classification with Small and Unbalanced FMRI Data [12.344849949026989]
We propose the Spatio-Temporal Aggregation Network (STANet) for diagnosing depression by integrating CNN and RNN to capture both temporal and spatial features.
Experiments demonstrate that STANet achieves superior depression diagnostic performance, with 82.38% accuracy and a 90.72% AUC.
arXiv Detail & Related papers (2024-07-31T04:06:47Z)
- LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- AIOSA: An approach to the automatic identification of obstructive sleep apnea events based on deep learning [1.5381930379183162]
OSAS is associated with higher mortality, worse neurological deficits, worse functional outcome after rehabilitation, and a higher likelihood of uncontrolled hypertension.
The gold standard test for diagnosing OSAS is polysomnography (PSG).
We propose a convolutional deep learning architecture able to reduce the temporal resolution of raw waveform data.
arXiv Detail & Related papers (2023-02-10T11:21:47Z)
- Locate before Answering: Answer Guided Question Localization for Video Question Answering [70.38700123685143]
LocAns integrates a question locator and an answer predictor into an end-to-end model.
It achieves state-of-the-art performance on two modern long-term VideoQA datasets.
arXiv Detail & Related papers (2022-10-05T08:19:16Z)
- Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering [73.11017833431313]
Multi-modal video question answering aims to predict the correct answer and localize the temporal boundary relevant to the question.
We devise a weakly supervised question grounding (WSQG) setting, where only QA annotations are used.
We transform the correspondence between frames and subtitles to Frame-Subtitle (FS) self-supervision, which helps to optimize the temporal attention scores.
arXiv Detail & Related papers (2022-09-08T07:20:51Z)
- Deep 3D-CNN for Depression Diagnosis with Facial Video Recording of Self-Rating Depression Scale Questionnaire [12.286463299994027]
We use a new dataset of 200 participants to demonstrate the validity of self-rating questionnaires and their accompanying question-by-question video recordings.
We offer an end-to-end system that handles the facial video recording, conditioned on the questionnaire answers and the responding time, to automatically interpret depression.
The superior performance of our system shows the validity of combining facial video recording with the SDS score for more accurate self-diagnosis.
arXiv Detail & Related papers (2021-07-22T14:37:00Z)
- Capturing Multi-Resolution Context by Dilated Self-Attention [58.69803243323346]
We propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention.
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
ASR results demonstrate substantial improvements over restricted self-attention alone, achieving results similar to full-sequence self-attention at a fraction of the computational cost.
arXiv Detail & Related papers (2021-04-07T02:04:18Z)
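For readers curious how the dilated self-attention just summarized might look in practice, here is a minimal sketch: each query attends to its local window at full resolution, plus average-pooled summaries of the whole sequence at a coarser resolution. The window size, pooling factor, and single-head design are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dilated_self_attention(x: torch.Tensor, window: int = 4, pool: int = 8):
    # x: (seq_len, dim) frame features
    seq_len, dim = x.shape
    # Dilation mechanism: summarize distant context by average pooling
    # (zero-padding the tail so the sequence divides evenly into blocks).
    pad = (-seq_len) % pool
    coarse = F.avg_pool1d(F.pad(x.t(), (0, pad)).unsqueeze(0), pool)[0].t()
    out = []
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        # Restricted self-attention: fine-grained keys from the local window,
        # coarse keys summarizing everything else.
        keys = torch.cat([x[lo:hi], coarse])
        attn = torch.softmax(x[i] @ keys.t() / dim ** 0.5, dim=-1)
        out.append(attn @ keys)  # values tied to keys in this sketch
    return torch.stack(out)      # (seq_len, dim)

y = dilated_self_attention(torch.randn(100, 64))
```

The cost scales with the window size plus the number of pooled summaries rather than the full sequence length, which is the source of the savings the abstract reports.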