The Medical Metaphors Corpus (MCC)
- URL: http://arxiv.org/abs/2508.07993v1
- Date: Mon, 11 Aug 2025 13:55:31 GMT
- Title: The Medical Metaphors Corpus (MCC)
- Authors: Anna Sofia Lippolis, Andrea Giovanni Nuzzolese, Aldo Gangemi,
- Abstract summary: Medical Metaphors Corpus (MCC) is a dataset of 792 annotated scientific conceptual metaphors spanning medical and biological domains.<n> MCC aggregates metaphorical expressions from diverse sources including peer-reviewed literature, news media, social media discourse, and crowdsourced contributions.<n> MCC enables multiple research applications including metaphor detection benchmarking, quality-aware generation systems, and patient-centered communication tools.
- Score: 1.3654846342364308
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Metaphor is a fundamental cognitive mechanism that shapes scientific understanding, enabling the communication of complex concepts while potentially constraining paradigmatic thinking. Despite the prevalence of figurative language in scientific discourse, existing metaphor detection resources primarily focus on general-domain text, leaving a critical gap for domain-specific applications. In this paper, we present the Medical Metaphors Corpus (MCC), a comprehensive dataset of 792 annotated scientific conceptual metaphors spanning medical and biological domains. MCC aggregates metaphorical expressions from diverse sources including peer-reviewed literature, news media, social media discourse, and crowdsourced contributions, providing both binary and graded metaphoricity judgments validated through human annotation. Each instance includes source-target conceptual mappings and perceived metaphoricity scores on a 0-7 scale, establishing the first annotated resource for computational scientific metaphor research. Our evaluation demonstrates that state-of-the-art language models achieve modest performance on scientific metaphor detection, revealing substantial room for improvement in domain-specific figurative language understanding. MCC enables multiple research applications including metaphor detection benchmarking, quality-aware generation systems, and patient-centered communication tools.
Related papers
- Towards Multimodal Metaphor Understanding: A Chinese Dataset and Model for Metaphor Mapping Identification [9.08615188602226]
We develop a Chinese multimodal metaphor advertisement dataset (namely CM3D) that includes annotations of specific target and source domains.<n>We propose a Chain-of-NLP (CoT) Prompting-based Metaphor Mapping Identification Model (CPMMIM) which simulates the human cognitive process for identifying these mappings.
arXiv Detail & Related papers (2025-01-05T04:15:03Z) - Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory [0.0]
We show that Large Language Models (LLMs) can accurately identify and explain the presence of conceptual metaphors in natural language data.
Using a novel prompting technique based on metaphor annotation guidelines, we demonstrate that LLMs are a promising tool for large-scale computational research on conceptual metaphors.
arXiv Detail & Related papers (2024-10-11T17:03:13Z) - The Language of Infographics: Toward Understanding Conceptual Metaphor Use in Scientific Storytelling [9.302187675469554]
We map Conceptual Metaphor (CMT) to the visualization domain to address patterns of visual conceptual metaphors that are often used in science infographics.
Our analysis shows that ontological and orientational conceptual metaphors are the most widely applied to translate complex scientific concepts.
arXiv Detail & Related papers (2024-07-18T11:39:50Z) - Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation [6.0158981171030685]
We present Meta4XNLI, the first parallel dataset for Natural Language Inference (NLI) newly annotated for metaphor detection and interpretation.<n>Our results show that fine-tuned encoders outperform decoders-only LLMs in metaphor detection.<n>Our study also finds that translation plays an important role in the preservation or loss of metaphors across languages.
arXiv Detail & Related papers (2024-04-10T14:44:48Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models [53.840982361119565]
The Composition Score is a novel model-based metric designed to quantify the degree of meaning composition during sentence comprehension.
Experimental findings show that this metric correlates with brain clusters associated with word frequency, structural processing, and general sensitivity to words.
arXiv Detail & Related papers (2024-03-07T08:44:42Z) - MetaCLUE: Towards Comprehensive Visual Metaphors Research [43.604408485890275]
We introduce MetaCLUE, a set of vision tasks on visual metaphor.
We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations.
We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
arXiv Detail & Related papers (2022-12-19T22:41:46Z) - An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and the quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties.
The research in this paper conducts the base for the geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z) - Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair)
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z) - Metaphor Generation with Conceptual Mappings [58.61307123799594]
We aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs.
We propose to control the generation process by encoding conceptual mappings between cognitive domains.
We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems.
arXiv Detail & Related papers (2021-06-02T15:27:05Z) - Semantic Analysis for Automated Evaluation of the Potential Impact of
Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.