Related papers: Collaborative Development of NLP models

Collaborative Development of NLP models

URL: http://arxiv.org/abs/2305.12219v2
Date: Wed, 24 May 2023 22:05:16 GMT
Title: Collaborative Development of NLP models
Authors: Fereshte Khani, Marco Tulio Ribeiro
Abstract summary: We introduce CoDev, a framework that enables multi-user interaction with NLP models. CoDev aids users in operationalizing their concepts using Large Language Models. We then steer a large language model to generate instances within concept boundaries where local and global disagree.
Score: 6.22933818252838
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite substantial advancements, Natural Language Processing (NLP) models often require post-training adjustments to enforce business rules, rectify undesired behavior, and align with user values. These adjustments involve operationalizing "concepts"--dictating desired model responses to certain inputs. However, it's difficult for a single entity to enumerate and define all possible concepts, indicating a need for a multi-user, collaborative model alignment framework. Moreover, the exhaustive delineation of a concept is challenging, and an improper approach can create shortcuts or interfere with original data or other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations. CoDev aids users in operationalizing their concepts using Large Language Models, and relying on the principle that NLP models exhibit simpler behaviors in local regions. Our main insight is learning a \emph{local} model for each concept, and a \emph{global} model to integrate the original data with all concepts. We then steer a large language model to generate instances within concept boundaries where local and global disagree. Our experiments show CoDev is effective at helping multiple users operationalize concepts and avoid interference for a variety of scenarios, tasks, and models.

Related papers

Toward Enhancing Representation Learning in Federated Multi-Task Settings [13.483094713610322]
Federated multi-task learning (FMTL) seeks to collaboratively train customized models for users with different tasks.<n>We propose Muscle loss, a contrastive learning objective that simultaneously aligns representations from all participating models.<n>We develop FedMuscle, a practical and communication-efficient FMTL algorithm that naturally handles both model and task.
arXiv Detail & Related papers (2026-02-02T04:39:36Z)
Unified Personalized Understanding, Generating and Editing [54.5563878110386]
We present textbf OmniPersona, an end-to-end personalization framework for unified LMMs.<n>It integrates personalized understanding, generation, and image editing within a single architecture.<n>Experiments demonstrate that OmniPersona delivers competitive and robust performance across diverse personalization tasks.
arXiv Detail & Related papers (2026-01-11T15:46:34Z)
Flexible Concept Bottleneck Model [7.3992593868058245]
Concept bottleneck models (CBMs) improve neural network interpretability by introducing an intermediate layer that maps human-understandable concepts to predictions.<n>We propose Flexible Concept Bottleneck Model (FCBM) which supports dynamic concept adaptation, including complete replacement of the original concept set.<n>Our method achieves accuracy comparable to state-of-the-art baselines with a similar number of effective concepts.
arXiv Detail & Related papers (2025-11-10T03:50:57Z)
FaCT: Faithful Concept Traces for Explaining Neural Network Decisions [56.796533084868884]
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge.<n>We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations.<n>Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced.
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
Synergistic Weak-Strong Collaboration by Aligning Preferences [53.47675666475273]
Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. We propose a collaborative framework that pairs a specialized weak model with a general strong model. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths.
arXiv Detail & Related papers (2025-04-21T15:57:33Z)
Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization [2.163881720692685]
We introduce a new methodology for incorporating interpretability and intervenability into an existing model by integrating Concept Layers into its architecture. Our approach projects the model's internal vector representations into a conceptual, explainable vector space before reconstructing and feeding them back into the model. We evaluate CLs across multiple tasks, demonstrating that they maintain the original model's performance and agreement while enabling meaningful interventions.
arXiv Detail & Related papers (2025-02-19T11:10:19Z)
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning [72.46388818127105]
Conditional Language Policy (CLP) is a framework for finetuning language models on multiple objectives. We show that CLP learns steerable models that effectively trade-off conflicting objectives at inference time.
arXiv Detail & Related papers (2024-07-22T16:13:38Z)
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP) MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Orthogonal Adaptation for Modular Customization of Diffusion Models [39.62438974450659]
We address a new problem called Modular Customization, with the goal of efficiently merging customized models. We introduce Orthogonal Adaptation, a method designed to encourage the customized models, which do not have access to each other during fine-tuning. Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture.
arXiv Detail & Related papers (2023-12-05T02:17:48Z)
Auxiliary Losses for Learning Generalizable Concept-based Models [5.4066453042367435]
Concept Bottleneck Models (CBMs) have gained popularity since their introduction. CBMs essentially limit the latent space of a model to human-understandable high-level concepts. We propose cooperative-Concept Bottleneck Model (coop-CBM) to overcome the performance trade-off.
arXiv Detail & Related papers (2023-11-18T15:50:07Z)
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs) The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z)
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace [1.6574413179773757]
Concept-Centric Transformers is a simple yet effective configuration of the shared global workspace for interpretability. We show that our model achieves better classification accuracy than all baselines across all problems.
arXiv Detail & Related papers (2023-05-25T06:37:39Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Concept-modulated model-based offline reinforcement learning for rapid generalization [5.512991103610139]
We propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We show dramatic improvements in one-shot generalization to different instances of specified failure cases as well as zero-shot generalization to similar variations compared to model-based and model-free approaches.
arXiv Detail & Related papers (2022-09-07T15:06:38Z)
Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces. It uses latent variables to model generalizable learning patterns. At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.