Related papers: Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition

Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition

URL: http://arxiv.org/abs/2601.16711v1
Date: Fri, 23 Jan 2026 12:59:06 GMT
Title: Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition
Authors: Shanshan Liu, Noriki Nishida, Fei Cheng, Narumi Tokunaga, Rumana Ferdous Munne, Yuki Yamagata, Kouji Kozaki, Takehito Utsuro, Yuji Matsumoto,
Abstract summary: Generalization to unseen concepts is a central challenge due to the scarcity of human annotations in MA-BCR.<n>We propose an evaluation framework built on hierarchical concept indices and novel metrics to measure generalization.
Score: 9.305243291174957
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Generalization to unseen concepts is a central challenge due to the scarcity of human annotations in Mention-agnostic Biomedical Concept Recognition (MA-BCR). This work makes two key contributions to systematically address this issue. First, we propose an evaluation framework built on hierarchical concept indices and novel metrics to measure generalization. Second, we explore LLM-based Auto-Labeled Data (ALD) as a scalable resource, creating a task-specific pipeline for its generation. Our research unequivocally shows that while LLM-generated ALD cannot fully substitute for manual annotations, it is a valuable resource for improving generalization, successfully providing models with the broader coverage and structural knowledge needed to approach recognizing unseen concepts. Code and datasets are available at https://github.com/bio-ie-tool/hi-ald.

Related papers

From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level [38.24989792739013]
We present RepoReason, a diagnostic benchmark centered on abductive assertion verification.<n>We implement an execution-driven mutation framework that utilizes the environment as a semantic to regenerate ground-truth states.<n>Our findings provide granular white-box insights for optimizing the next generation of agentic software engineering.
arXiv Detail & Related papers (2026-01-07T09:22:28Z)
Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation [14.8602760818616]
Historical Report Guided textbfBi-modal Concurrent Learning Framework for Pathology Report textbfGeneration (BiGen) emulating pathologists' diagnostic reasoning.<n>BiGen retrieves WSI-relevant knowledge from pre-built medical knowledge bank by matching high-attention patches.<n>Our framework achieves state-of-the-art performance with 7.4% relative improvement in NLP metrics and 19.1% enhancement in classification metrics for Her-2 prediction versus existing methods.
arXiv Detail & Related papers (2025-06-23T14:00:21Z)
MA-COIR: Leveraging Semantic Search Index and Generative Models for Ontology-Driven Biomedical Concept Recognition [8.635416307171035]
We introduce MA-COIR, a framework that reformulates concept recognition as an indexing-recognition task.<n>By assigning semantic search indexes (ssIDs) to concepts, MA-COIR resolves ambiguities in ontology entries and enhances recognition efficiency.<n>Our results highlight the effectiveness of MA-COIR in recognizing both explicit and implicit concepts without the need for mention-level annotations.
arXiv Detail & Related papers (2025-05-19T11:00:43Z)
SCAN: Structured Capability Assessment and Navigation for LLMs [54.54085382131134]
textbfSCAN (Structured Capability Assessment and Navigation) is a practical framework that enables detailed characterization of Large Language Models.<n>SCAN incorporates four key components:.<n>TaxBuilder, which extracts capability-indicating tags from queries to construct a hierarchical taxonomy;.<n>RealMix, a query synthesis and filtering mechanism that ensures sufficient evaluation data for each capability tag;.<n>A PC$2$-based (Pre-Comparison-derived Criteria) LLM-as-a-Judge approach achieves significantly higher accuracy compared to classic LLM-as-a-Judge method
arXiv Detail & Related papers (2025-05-10T16:52:40Z)
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks. We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm. Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework synergizes open-world knowledge with collaborative knowledge.<n>We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution [48.86322922826514]
This paper defines a new task of Knowledge-aware Language Model Attribution (KaLMA) First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios. Second, we propose a new Conscious Incompetence" setting considering the incomplete knowledge repository. Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment.
arXiv Detail & Related papers (2023-10-09T11:45:59Z)
KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA) We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering $19$ tasks. We use both Wikipedia, a corpus prevalently pre-trained by LLMs, along with continuously collected emerging corpora, to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus. Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.