Distributed In-Context Learning under Non-IID Among Clients
- URL: http://arxiv.org/abs/2408.00144v1
- Date: Wed, 31 Jul 2024 20:06:25 GMT
- Title: Distributed In-Context Learning under Non-IID Among Clients
- Authors: Siqi Liang, Sumyeong Ahn, Jiayu Zhou
- Abstract summary: In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query.
In this paper, we show that test queries will have different preferences among clients because of non-IIDness.
We introduce a novel approach to tackle the distributed non-IID ICL problem when a data usage budget is present.
- Score: 38.868357555845435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advancements in large language models (LLMs) have shown their effectiveness in multiple complicated natural language reasoning tasks. A key challenge remains in adapting these models efficiently to new or unfamiliar tasks. In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query, called in-context examples (ICE), from a training dataset and providing them during the inference as context. Most existing studies utilize a centralized training dataset, yet many real-world datasets may be distributed among multiple clients, and remote data retrieval can be associated with costs. Especially when the client data are non-identical independent distributions (non-IID), retrieving from clients a proper set of ICEs needed for a test query presents critical challenges. In this paper, we first show that in this challenging setting, test queries will have different preferences among clients because of non-IIDness, and equal contribution often leads to suboptimal performance. We then introduce a novel approach to tackle the distributed non-IID ICL problem when a data usage budget is present. The principle is that each client's proper contribution (budget) should be designed according to the preference of each query for that client. Our approach uses a data-driven manner to allocate a budget for each client, tailored to each test query. Through extensive empirical studies on diverse datasets, our framework demonstrates superior performance relative to competing baselines.
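The abstract describes allocating a per-client budget tailored to each test query, but does not spell out the scoring rule here. The following minimal Python/NumPy sketch illustrates one plausible reading: client preference is approximated by cosine similarity between the query embedding and each client's data centroid, and the total ICE budget is split proportionally. All names (`allocate_budgets`, `retrieve_ices`, `client_stores`) and the softmax allocation rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def allocate_budgets(query_emb, client_centroids, total_budget):
    """Split the ICE budget across clients in proportion to an estimated
    query-client preference (here: cosine similarity to each client's centroid).
    This allocation rule is an illustrative assumption, not the paper's method."""
    sims = np.array([
        np.dot(query_emb, c) / (np.linalg.norm(query_emb) * np.linalg.norm(c) + 1e-8)
        for c in client_centroids
    ])
    weights = np.exp(sims) / np.exp(sims).sum()               # softmax over preferences
    budgets = np.floor(weights * total_budget).astype(int)
    budgets[np.argmax(weights)] += total_budget - budgets.sum()  # hand remainder to top client
    return budgets

def retrieve_ices(query_emb, client_stores, total_budget):
    """Each client returns its top-k most similar examples, where k is its budget.
    client_stores: list of dicts with 'embeddings' (n x d array) and 'examples' (list)."""
    centroids = [store["embeddings"].mean(axis=0) for store in client_stores]
    budgets = allocate_budgets(query_emb, centroids, total_budget)
    ices = []
    for store, k in zip(client_stores, budgets):
        if k == 0:
            continue
        sims = store["embeddings"] @ query_emb
        top = np.argsort(-sims)[:k]
        ices.extend(store["examples"][i] for i in top)
    return ices  # concatenated as the in-context prompt for the LLM
```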
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which negatively impacts training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems [10.71630696651595]
Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks have garnered significant interest within database and AI communities.
Silos of multimodal data sources make it difficult to identify the appropriate data sources for accomplishing the task at hand.
We propose CMDBench, a benchmark modeling the complexity of enterprise data platforms.
arXiv Detail & Related papers (2024-06-02T01:10:41Z) - Optimizing Server-side Aggregation For Robust Federated Learning via Subspace Training [80.03567604524268]
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems.
We propose SmartFL, a generic approach that optimizes the server-side aggregation process.
We provide theoretical analyses of the convergence and generalization capacity for SmartFL.
arXiv Detail & Related papers (2022-11-10T13:20:56Z) - FLIS: Clustered Federated Learning via Inference Similarity for Non-IID Data Distribution [7.924081556869144]
We present a new algorithm, FLIS, which groups the client population into clusters with jointly trainable data distributions.
We present experimental results to demonstrate the benefits of FLIS over the state-of-the-art benchmarks on CIFAR-100/10, SVHN, and FMNIST datasets.
arXiv Detail & Related papers (2022-08-20T22:10:48Z) - Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z) - Aggregation Delayed Federated Learning [20.973999078271483]
Federated learning is a distributed machine learning paradigm where multiple data owners (clients) collaboratively train one machine learning model while keeping data on their own devices.
Studies have found performance reduction with standard federated algorithms, such as FedAvg, on non-IID data.
Many existing works on handling non-IID data adopt the same aggregation framework as FedAvg and focus on improving model updates either on the server side or on clients (a generic FedAvg aggregation step is sketched after this list).
In this work, we tackle this challenge by introducing redistribution rounds that delay the aggregation. We perform experiments on multiple tasks and show that the proposed framework significantly improves the performance on non-IID data.
arXiv Detail & Related papers (2021-08-17T04:06:10Z) - Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z) - Overcoming Noisy and Irrelevant Data in Federated Learning [13.963024590508038]
Federated learning is an effective way of training a machine learning model in a distributed manner from local data collected by client devices.
We propose a method for distributed selection of relevant data, using a benchmark model trained on a small benchmark dataset.
The effectiveness of our proposed approach is evaluated on multiple real-world image datasets in a simulated system with a large number of clients.
arXiv Detail & Related papers (2020-01-22T22:28:47Z)
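Several of the federated-learning papers above discuss modifying or replacing the standard FedAvg aggregation step under non-IID data. For reference, here is a minimal, generic sketch of that weighted server-side averaging step in Python/NumPy; it is not taken from any of the listed papers, and all names are illustrative.

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Standard FedAvg server-side step: average client model parameters,
    weighted by each client's local dataset size.
    client_params: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of ints (number of local samples per client)"""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]
    global_params = {}
    for name in client_params[0]:
        global_params[name] = sum(w * p[name] for w, p in zip(weights, client_params))
    return global_params
```

Under non-IID client distributions this plain weighted average can degrade, which is the motivation the listed works give for delayed aggregation, learned server-side aggregation, client clustering, or personalized heads.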