Overview of the PromptCBLUE Shared Task in CHIP2023
- URL: http://arxiv.org/abs/2312.17522v1
- Date: Fri, 29 Dec 2023 09:05:00 GMT
- Title: Overview of the PromptCBLUE Shared Task in CHIP2023
- Authors: Wei Zhu, Xiaoling Wang, Mosha Chen, Buzhou Tang
- Abstract summary: This paper presents an overview of the PromptCBLUE shared task held at the CHIP-2023 Conference.
It provides a good testbed for Chinese open-domain or medical-domain large language models (LLMs) in general medical natural language processing.
This paper describes the tasks, the datasets, evaluation metrics, and the top systems for both tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an overview of the PromptCBLUE shared task
(http://cips-chip.org.cn/2023/eval1) held at the CHIP-2023 Conference. This
shared task reformulates the CBLUE benchmark and provides a good testbed for
Chinese open-domain or medical-domain large language models (LLMs) in general
medical natural language processing. Two tracks are held: (a) a prompt
tuning track, investigating multitask prompt tuning of LLMs, and (b) a track
probing the in-context learning capabilities of open-source LLMs. Many teams
from both industry and academia participated in the shared tasks, and the top
teams achieved strong test results. This paper describes the tasks, the
datasets, the evaluation metrics, and the top systems for both tasks. Finally,
the paper summarizes the techniques explored by the participating teams and
the evaluation results of their various approaches.
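The abstract does not spell out the prompt format. As a rough illustration of what "reformulating" a benchmark like CBLUE can mean for the two tracks, the sketch below turns a hypothetical NER record into an instruction/answer pair (the kind of pair used for prompt tuning) and prepends solved demonstrations to a query (in-context learning). The `NerSample` type, the English template, and the `type: mention` answer format are assumptions made for illustration only; the actual PromptCBLUE prompts are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical instruction template (illustrative only; not an official
# PromptCBLUE template).
TEMPLATE = (
    "Extract all {types} entities from the medical text below.\n"
    "Text: {text}\n"
    "Answer:"
)


@dataclass
class NerSample:
    """A CBLUE-style NER record: raw text plus gold (type, mention) pairs."""
    text: str
    entities: List[Tuple[str, str]] = field(default_factory=list)


def to_prompt(sample: NerSample, types: List[str]) -> str:
    """Render the input side of a sample as a natural-language instruction."""
    return TEMPLATE.format(types=", ".join(types), text=sample.text)


def to_target(sample: NerSample) -> str:
    """Render the gold labels as the plain-text answer the LLM should generate."""
    if not sample.entities:
        return "None"
    return "; ".join(f"{etype}: {mention}" for etype, mention in sample.entities)


def few_shot_prompt(demos: List[NerSample], query: NerSample, types: List[str]) -> str:
    """Prepend solved demonstrations to the unanswered query prompt (in-context learning)."""
    blocks = [f"{to_prompt(d, types)} {to_target(d)}" for d in demos]
    blocks.append(to_prompt(query, types))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    types = ["disease", "drug"]
    demo = NerSample("Aspirin relieves headache.",
                     [("drug", "Aspirin"), ("disease", "headache")])
    query = NerSample("Metformin is used for diabetes.")

    # Prompt-tuning style: a (prompt, target) training pair.
    print((to_prompt(demo, types), to_target(demo)))
    # In-context learning style: demonstrations followed by the open query.
    print(few_shot_prompt([demo], query, types))
```

Keeping the answer as plain text means one generation format serves both tracks: the same pair can be used as a training example or as a demonstration, and a model's output can later be parsed back into (type, mention) pairs for scoring. How the shared task actually scores outputs is described in the paper itself.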
Related papers
- Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation (arXiv, 2024-11-11)
We study long-context language model (LCLM) evaluation through many-shot in-context learning (ICL).
We identify the skills each ICL task requires and examine models' long-context capabilities on them.
We introduce a new many-shot ICL benchmark, MANYICLBENCH, designed to characterize LCLMs' retrieval and global context understanding capabilities separately.
- Narrative Action Evaluation with Prompt-Guided Multimodal Interaction (arXiv, 2024-04-22)
Narrative action evaluation (NAE) aims to generate professional commentary that evaluates the execution of an action.
NAE is a challenging task because it requires both narrative flexibility and evaluation rigor.
We propose a prompt-guided multimodal interaction framework to facilitate the interaction between different modalities of information.
- IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts (arXiv, 2024-04-06)
This paper describes our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness.
The challenge focuses on automatically detecting the degree of relatedness between pairs of sentences in 14 languages.
- Benchmarking LLMs on the Semantic Overlap Summarization Task (arXiv, 2024-02-26)
This paper comprehensively evaluates large language models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics such as ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond (arXiv, 2024-01-02)
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with little or non-overlapping annotation.
We propose a novel approach in which knowledge exchange between the tasks is enabled via distribution matching.
- Rethinking and Improving Multi-task Learning for End-to-end Speech Translation (arXiv, 2023-11-07)
We investigate the consistency between different tasks, considering different times and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the speech translation (ST) task, which bridges the modal gap by mitigating differences in length and representation.
- Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task (arXiv, 2023-11-01)
This paper assesses the effectiveness of prompt-based techniques in enabling large language models to handle the task of quality estimation.
We conducted systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting.
Our work reveals that combining these approaches using a "small", open-source model (orca_mini_v3_7B) yields competitive results.
- The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics (arXiv, 2023-10-30)
Generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples.
We introduce the Eval4NLP 2023 shared task, which asks participants to explore prompting and score extraction for machine translation (MT) and summarization evaluation.
We present an overview of participants' approaches and evaluate them on a new reference-free test set spanning three language pairs for MT and a summarization dataset.
- BLP-2023 Task 2: Sentiment Analysis (arXiv, 2023-10-24)
We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop.
The task is defined as the detection of sentiment in a given piece of social media text.
This paper provides a detailed account of the task setup, including dataset development and evaluation setup.
- ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction (arXiv, 2023-03-09)
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
- Multi-Task Learning for Dense Prediction Tasks: A Survey (arXiv, 2020-04-28)
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and/or memory footprint.
We provide a well-rounded view of state-of-the-art deep learning approaches for MTL in computer vision.