Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
- URL: http://arxiv.org/abs/2507.05010v1
- Date: Mon, 07 Jul 2025 13:48:54 GMT
- Title: Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
- Authors: Chenfei Xiong, Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Lorena Calvo-Bartolomé, Alexander Hoyle, Zhijing Jin, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Mennatallah El-Assady, Elliott Ash,
- Abstract summary: Co-DETECT (Collaborative Discovery of Edge cases in TExt ClassificaTion) is a novel mixed-initiative annotation framework.<n>It integrates human expertise with automatic annotation guided by large language models.
- Score: 89.62851347390959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Co-DETECT (Collaborative Discovery of Edge cases in TExt ClassificaTion), a novel mixed-initiative annotation framework that integrates human expertise with automatic annotation guided by large language models (LLMs). Co-DETECT starts with an initial, sketch-level codebook and dataset provided by a domain expert, then leverages the LLM to annotate the data and identify edge cases that are not well described by the initial codebook. Specifically, Co-DETECT flags challenging examples, induces high-level, generalizable descriptions of edge cases, and assists user in incorporating edge case handling rules to improve the codebook. This iterative process enables more effective handling of nuanced phenomena through compact, generalizable annotation rules. Extensive user study, qualitative and quantitative analyses prove the effectiveness of Co-DETECT.
Related papers
- Knowledge Graph Completion with Relation-Aware Anchor Enhancement [50.50944396454757]
We propose a relation-aware anchor enhanced knowledge graph completion method (RAA-KGC)<n>We first generate anchor entities within the relation-aware neighborhood of the head entity.<n>Then, by pulling the query embedding towards the neighborhoods of the anchors, it is tuned to be more discriminative for target entity matching.
arXiv Detail & Related papers (2025-04-08T15:22:08Z) - Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation [91.13055384151897]
CCFRec is a novel Code-based textual and Collaborative semantic Fusion method for sequential Recommendation.<n>We generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques.<n>In order to further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy.
arXiv Detail & Related papers (2025-03-15T15:54:44Z) - DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models [13.917530818500481]
Continual adaptation of vision-language models (VLMs) focuses on leveraging cross-modal pretrained knowledge to incrementally adapt for expanding downstream tasks and datasets.<n>Existing research often focuses on connecting visual features with specific class text in downstream tasks, overlooking the latent relationships between general and specialized knowledge.<n>We propose DesCLIP, which leverages general attribute (GA) descriptions to guide the understanding of specific class objects.
arXiv Detail & Related papers (2025-02-02T01:06:02Z) - Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs [1.4932549821542682]
Existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages.<n>This paper proposes a two-step hierarchical approach for repository-level code summarization, tailored to business applications.
arXiv Detail & Related papers (2025-01-14T05:48:27Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [70.65910069412944]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.<n>Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)<n>We propose textbfCost-textbfEfficient textbfLanguage Model textbfAlignment (textbfCELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity.
LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z) - LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework synergizes open-world knowledge with collaborative knowledge.<n>We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z) - Neural Entity Summarization with Joint Encoding and Weak Supervision [29.26714907483851]
In knowledge graphs, an entity is often described by a large number of triple facts.
Existing solutions to entitymarization are mainly unsupervised.
We present a supervised approach that is based on our novel neural model.
arXiv Detail & Related papers (2020-05-01T00:14:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.