ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
        - URL: http://arxiv.org/abs/2406.09334v2
 - Date: Fri, 14 Jun 2024 14:52:05 GMT
 - Title: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
 - Authors: David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee, 
 - Abstract summary: ProxyLM is a framework for predicting LM performance using proxy models in multilingual tasks.
Our methodology showcases adaptability to languages previously unseen by pre-trained LMs, outperforming the state of the art by 1.89x as measured by root-mean-square error (RMSE).
This framework streamlines model selection, enabling efficient deployment and iterative LM enhancements without extensive computational resources.
 - Score: 9.710960283117771
 - License: http://creativecommons.org/licenses/by-sa/4.0/
 - Abstract:   Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper introduces ProxyLM, a scalable framework for predicting LM performance using proxy models in multilingual tasks. These proxy models act as surrogates, approximating the performance of the LM of interest. By leveraging proxy models, ProxyLM significantly reduces computational overhead on task evaluations, achieving up to a 37.08x speedup compared to traditional methods, even with our smallest proxy models. Additionally, our methodology showcases adaptability to previously unseen languages in pre-trained LMs, outperforming the state-of-the-art performance by 1.89x as measured by root-mean-square error (RMSE). This framework streamlines model selection, enabling efficient deployment and iterative LM enhancements without extensive computational resources. 
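 A minimal sketch of the underlying idea, assuming a scikit-learn regressor and hypothetical features (a proxy-model score, data size, and a language-similarity measure); it illustrates proxy-based performance prediction in general, not ProxyLM's published implementation:

    # Illustrative sketch only: the features and regressor below are assumptions,
    # not ProxyLM's exact setup.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error

    # Each row describes one (language, dataset) evaluation setting:
    # [proxy_model_score, log10(train_size), similarity_to_high_resource_language]
    X_train = np.array([
        [21.4, 4.2, 0.61],
        [33.0, 5.1, 0.74],
        [12.8, 3.7, 0.42],
        [27.5, 4.8, 0.69],
    ])
    # Targets: the large LM's measured scores on the same settings, obtained from
    # a small number of expensive fine-tuning runs.
    y_train = np.array([25.1, 38.9, 15.2, 31.7])

    regressor = GradientBoostingRegressor(random_state=0)
    regressor.fit(X_train, y_train)

    # Estimate the large LM's score for a new setting from cheap proxy signals alone,
    # avoiding a full fine-tuning and evaluation run.
    X_new = np.array([[18.9, 4.0, 0.55]])
    predicted = regressor.predict(X_new)

    # Prediction quality is reported as RMSE against actually measured scores.
    rmse = np.sqrt(mean_squared_error(y_train, regressor.predict(X_train)))
    print(f"predicted score: {predicted[0]:.1f}, fit RMSE: {rmse:.2f}")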
 
       
      
        Related papers
        - Multilingual Definition Modeling [1.9409995498330783]
We use monolingual dictionary data for four new languages (Spanish, French, Portuguese, and German). We test the performance of pre-trained multilingual language models on definition modeling of monosemic words when fine-tuned on this data. Results show that multilingual language models can perform on par with English but cannot leverage potential cross-lingual synergies.
arXiv  Detail & Related papers  (2025-06-02T09:48:37Z) - Efficient Evaluation of Large Language Models via Collaborative Filtering [25.734508624520164]
Numerous benchmarks have been proposed to measure and compare the capabilities of different Large Language Models (LLMs).
However, evaluating LLMs is costly due to the large number of test instances and their slow inference speed.
We propose a two-stage method to efficiently estimate a model's real performance on a given benchmark.
arXiv  Detail & Related papers  (2025-04-05T07:46:30Z) - Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection.
Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv  Detail & Related papers  (2025-04-02T20:33:27Z) - PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [28.442470930703337]
PRISM is a training-free approach for efficient multimodal data selection.
It uses Pearson correlation analysis to quantify the intrinsic visual encoding properties of MLLMs.
It reduces the overall time required for visual instruction tuning and data selection to just 30% of conventional methods.
arXiv  Detail & Related papers  (2025-02-17T18:43:41Z) - Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need [9.660067334665792]
We propose a method that predicts emergent abilities by leveraging proxy tasks.
In a case study on tool utilization capabilities, our method demonstrated a strong correlation between predicted and actual performance.
arXiv  Detail & Related papers  (2024-12-10T01:56:30Z) - P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) tasks or to isolated capability-specific tasks.
We present a pipeline for selecting available and reasonable benchmarks from the massive pool of existing ones, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv  Detail & Related papers  (2024-11-14T01:29:36Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv  Detail & Related papers  (2024-07-16T04:41:58Z) - Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv  Detail & Related papers  (2024-07-04T15:14:17Z) - MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic (MetaGPT) for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv  Detail & Related papers  (2024-06-17T10:12:45Z) - ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv  Detail & Related papers  (2024-06-12T21:01:26Z) - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv  Detail & Related papers  (2024-03-01T00:02:37Z) - LLM-augmented Preference Learning from Natural Language [19.700169351688768]
Large Language Models (LLMs) are equipped to deal with larger context lengths.
LLMs can consistently outperform the state of the art (SotA) when the target text is large.
Few-shot learning yields better performance than zero-shot learning.
arXiv  Detail & Related papers  (2023-10-12T17:17:27Z) - Scaling Sentence Embeddings with Large Language Models [43.19994568210206]
In this work, we propose an in-context learning-based method aimed at improving sentence embeddings performance.
Our approach involves adapting the previous prompt-based representation method for autoregressive models.
We find that scaling model size beyond tens of billions of parameters harms performance on semantic textual similarity tasks.
arXiv  Detail & Related papers  (2023-07-31T13:26:03Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM.
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv  Detail & Related papers  (2022-04-05T16:11:45Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv  Detail & Related papers  (2022-03-09T17:26:53Z) - Maximizing Efficiency of Language Model Pre-training for Learning Representation [6.518508607788086]
ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models.
Our work proposes an adaptive early exit strategy to maximize the efficiency of the pre-training process.
arXiv  Detail & Related papers  (2021-10-13T10:25:06Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv  Detail & Related papers  (2021-09-09T03:48:35Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv  Detail & Related papers  (2020-10-18T00:21:53Z) 
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     