SINAI at eRisk@CLEF 2025: Transformer-Based and Conversational Strategies for Depression Detection
- URL: http://arxiv.org/abs/2509.19861v1
- Date: Wed, 24 Sep 2025 08:04:32 GMT
- Title: SINAI at eRisk@CLEF 2025: Transformer-Based and Conversational Strategies for Depression Detection
- Authors: Alba Maria Marmol-Romero, Manuel Garcia-Vega, Miguel Angel Garcia-Cumbreras, Arturo Montejo-Raez,
- Abstract summary: This paper describes the participation of the SINAI-UJA team in the eRisk@CLEF 2025 lab. We addressed two of the proposed tasks: (i) Contextualized Early Detection of Depression, and (ii) Pilot Task: Conversational Depression Detection via LLMs. Our approach for Task 2 combines an extensive preprocessing pipeline with the use of several transformer-based models, such as RoBERTa Base or MentalRoBERTA Large. For the Pilot Task, we designed a set of conversational strategies to interact with LLM-powered personas, focusing on maximizing information gain within a limited number of dialogue turns.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the participation of the SINAI-UJA team in the eRisk@CLEF 2025 lab. Specifically, we addressed two of the proposed tasks: (i) Task 2: Contextualized Early Detection of Depression, and (ii) Pilot Task: Conversational Depression Detection via LLMs. Our approach for Task 2 combines an extensive preprocessing pipeline with the use of several transformer-based models, such as RoBERTa Base or MentalRoBERTA Large, to capture the contextual and sequential nature of multi-user conversations. For the Pilot Task, we designed a set of conversational strategies to interact with LLM-powered personas, focusing on maximizing information gain within a limited number of dialogue turns. In Task 2, our system ranked 8th out of 12 participating teams based on F1 score. However, a deeper analysis revealed that our models were among the fastest in issuing early predictions, which is a critical factor in real-world deployment scenarios. This highlights the trade-off between early detection and classification accuracy, suggesting potential avenues for optimizing both jointly in future work. In the Pilot Task, we achieved 1st place out of 5 teams, obtaining the best overall performance across all evaluation metrics: DCHR, ADODL and ASHR. Our success in this task demonstrates the effectiveness of structured conversational design when combined with powerful language models, reinforcing the feasibility of deploying LLMs in sensitive mental health assessment contexts.
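The abstract highlights a trade-off between early prediction and classification accuracy in Task 2. A minimal sketch of that decision policy (not the authors' code; the function name, thresholds, and scores below are illustrative assumptions): at each round a classifier emits a probability of depression for a user, and the system must either commit to a label now or wait for more posts.

```python
# Hypothetical sketch of an early-decision policy for round-based risk
# detection. At each round we receive P(depression); we commit early when
# the score crosses a confidence threshold, otherwise we keep waiting and
# fall back to the final score. Thresholds are illustrative assumptions.

def early_decision(round_scores, pos_threshold=0.8, neg_threshold=0.1):
    """Return (label, round_index) at the first confident round,
    or a forced decision at the last round."""
    for i, p in enumerate(round_scores):
        if p >= pos_threshold:
            return 1, i  # confident positive: flag early
        if p <= neg_threshold:
            return 0, i  # confident negative: release early
    # No round was confident: decide from the final score
    return int(round_scores[-1] >= 0.5), len(round_scores) - 1

# Example: scores rise as more posts arrive; the decision fires at round 2,
# before the stream ends.
label, decided_at = early_decision([0.35, 0.55, 0.85, 0.90])
```

Lowering `pos_threshold` issues predictions earlier (better latency-sensitive metrics) at the cost of more false positives, which is exactly the tension between decision speed and F1 that the abstract describes.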
Related papers
- GameTalk: Training LLMs for Strategic Conversation [51.29670609281524]
We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling.
arXiv Detail & Related papers (2026-01-22T19:18:39Z) - AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing [79.0112532518727]
We release TeleSalesCorpus, the first real-world-grounded dialogue dataset for this domain. We then propose AI-Salesman, a novel framework featuring a dual-stage architecture. We show that our proposed AI-Salesman significantly outperforms baseline models in both automatic metrics and comprehensive human evaluations.
arXiv Detail & Related papers (2025-11-15T09:44:42Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks [23.34710429552906]
We evaluate the capabilities and limitations of five prevalent Large Language Models: Llama, OPT, Falcon, Alpaca, and MPT. The study encompasses various conversational tasks, including reservation, empathetic response generation, mental health and legal counseling, persuasion, and negotiation.
arXiv Detail & Related papers (2024-11-26T08:21:24Z) - ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents [52.7201882529976]
We propose an SOP-guided Monte Carlo Tree Search (MCTS) planning framework to enhance the controllability of dialogue agents. To enable this, we curate a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o. We also propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction.
arXiv Detail & Related papers (2024-07-04T12:23:02Z) - Can Large Language Models Automatically Score Proficiency of Written Essays? [3.993602109661159]
Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks.
We test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays.
arXiv Detail & Related papers (2024-03-10T09:39:00Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Interaction is all You Need? A Study of Robots Ability to Understand and Execute [0.5439020425819]
We equip robots with the ability to understand and execute complex instructions in coherent dialogs.
We observe that our best configuration outperforms the baseline with a success rate score of 8.85.
We introduce a new task by expanding the EDH task and making predictions about game plans instead of individual actions.
arXiv Detail & Related papers (2023-11-13T08:39:06Z) - Empirical Study of Zero-Shot NER with ChatGPT [19.534329209433626]
Large language models (LLMs) exhibited powerful capability in various natural language processing tasks.
This work focuses on exploring LLM performance on zero-shot information extraction.
Inspired by the remarkable reasoning capability of LLM on symbolic and arithmetic reasoning, we adapt the prevalent reasoning methods to NER.
arXiv Detail & Related papers (2023-10-16T03:40:03Z) - Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines.
In this work, we looked into the causes and discovered that its subpar performance was caused by the following factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z) - Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state-of-the-art on 4/9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z) - JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents [59.091663077007304]
We propose JARVIS, a neuro-symbolic commonsense reasoning framework for modular, generalizable, and interpretable conversational embodied agents. Our framework achieves state-of-the-art (SOTA) results on all three dialog-based embodied tasks, including Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC). Our model ranks first in the Alexa Prize SimBot Public Benchmark Challenge.
arXiv Detail & Related papers (2022-08-28T18:30:46Z) - Using contextual sentence analysis models to recognize ESG concepts [8.905370601886112]
This paper summarizes the joint participation of the Trading Central Labs and the L3i laboratory of the University of La Rochelle on two sub-tasks of the FinSim-4 evaluation campaign.
The first sub-task aims to enrich the 'Fortia ESG taxonomy' with new lexicon entries, while the second one aims to classify sentences as either 'sustainable' or 'unsustainable' with respect to ESG-related factors.
arXiv Detail & Related papers (2022-07-04T13:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.