On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
- URL: http://arxiv.org/abs/2407.04065v2
- Date: Sat, 13 Jul 2024 03:21:40 GMT
- Title: On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
- Authors: Zhimin Zhao, Abdul Ali Bangash, Filipe Roseiro Côgo, Bram Adams, Ahmed E. Hassan
- Abstract summary: This research focuses on understanding how FM leaderboards operate in real-world scenarios ("leaderboard operations").
We identify 5 unique workflow patterns and develop a domain model that outlines the essential components and their interaction within FM leaderboards.
We then identify 8 unique types of leaderboard smells in LBOps.
- Score: 11.99718417371013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FM), such as large language models (LLMs), which are large-scale machine learning (ML) models, have demonstrated remarkable adaptability in various downstream software engineering (SE) tasks, such as code completion, code understanding, and software development. As a result, FM leaderboards, especially those hosted on cloud platforms, have become essential tools for SE teams to compare and select the best third-party FMs for their specific products and purposes. However, the lack of standardized guidelines for FM evaluation and comparison threatens the transparency of FM leaderboards and limits stakeholders' ability to perform effective FM selection. As a first step towards addressing this challenge, our research focuses on understanding how these FM leaderboards operate in real-world scenarios ("leaderboard operations") and identifying potential leaderboard pitfalls and areas for improvement ("leaderboard smells"). In this regard, we perform a multivocal literature review to collect up to 721 FM leaderboards, after which we examine their documentation and engage in direct communication with leaderboard operators to understand their workflow patterns. Using card sorting and negotiated agreement, we identify 5 unique workflow patterns and develop a domain model that outlines the essential components and their interaction within FM leaderboards. We then identify 8 unique types of leaderboard smells in LBOps. By mitigating these smells, SE teams can improve transparency, accountability, and collaboration in current LBOps practices, fostering a more robust and responsible ecosystem for FM comparison and selection.
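To make the idea of such a domain model concrete, here is a minimal sketch in Python dataclasses of the kind of components an FM leaderboard exposes. The class and field names (Benchmark, Submission, Leaderboard, average-score ranking) are illustrative assumptions only, not the paper's actual domain model, which is derived from the 721 surveyed leaderboards and is considerably richer.

```python
from dataclasses import dataclass, field

@dataclass
class Benchmark:
    """An evaluation task with a fixed dataset and metric (hypothetical)."""
    name: str
    metric: str                     # e.g. "accuracy", "pass@1"
    higher_is_better: bool = True

@dataclass
class Submission:
    """A candidate FM evaluated against the leaderboard's benchmarks."""
    model_name: str
    provider: str
    scores: dict[str, float] = field(default_factory=dict)  # benchmark name -> score

@dataclass
class Leaderboard:
    """Ranks submissions; here simply by mean score, assuming higher is better."""
    benchmarks: list[Benchmark]
    submissions: list[Submission]

    def ranking(self) -> list[Submission]:
        def avg(s: Submission) -> float:
            return sum(s.scores.values()) / max(len(s.scores), 1)
        return sorted(self.submissions, key=avg, reverse=True)
```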
Related papers
- A Framework for Double-Blind Federated Adaptation of Foundation Models [4.910367774892893]
We propose a framework for double-blind federated adaptation of FMs using fully homomorphic encryption (FHE).
The proposed framework first decomposes the FM into a sequence of FHE-friendly blocks through knowledge distillation.
The resulting FHE-friendly model is adapted for the downstream task via low-rank parallel adapters.
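As a rough illustration of the "low-rank parallel adapter" ingredient, the PyTorch sketch below adds a trainable low-rank path alongside a frozen linear block, LoRA-style. This is a generic sketch of the technique under stated assumptions; the paper's FHE-friendly blocks and exact adapter placement are not specified in this summary.

```python
import torch
import torch.nn as nn

class ParallelLowRankAdapter(nn.Module):
    """A minimal LoRA-style adapter running in parallel with a frozen layer.

    Hypothetical sketch: only the trainable path shown here conveys the
    'low-rank parallel adapter' idea from the abstract.
    """
    def __init__(self, frozen_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.frozen_layer = frozen_layer
        for p in self.frozen_layer.parameters():
            p.requires_grad = False                      # base block stays fixed
        d_in, d_out = frozen_layer.in_features, frozen_layer.out_features
        self.down = nn.Linear(d_in, rank, bias=False)    # project down to low rank
        self.up = nn.Linear(rank, d_out, bias=False)     # project back up
        nn.init.zeros_(self.up.weight)                   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus trainable low-rank path, summed in parallel
        return self.frozen_layer(x) + self.scale * self.up(self.down(x))
```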
arXiv Detail & Related papers (2025-02-03T12:00:11Z)
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
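For context, a "standard supervised learning workflow" of the kind such baselines use might look like the following scikit-learn sketch; the synthetic data and gradient-boosted model are placeholders, not the paper's actual setup.

```python
# Minimal sketch of a simple supervised baseline with cross-validation;
# the features and labels below are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))              # stand-in for task features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # stand-in labels

baseline = GradientBoostingClassifier()
scores = cross_val_score(baseline, X, y, cv=5)
print(f"baseline accuracy: {scores.mean():.3f}")
```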
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap [12.313710667597897]
We conduct a semi-structured thematic synthesis to identify the key challenges in productionizing FMware across diverse data sources.
We identify critical issues in FM selection, data and model alignment, prompt engineering, agent orchestration, system testing, and deployment.
We discuss needed technologies and strategies to address these challenges and offer guidance on how to enable the transition from demonstration systems to scalable, production-ready FMware solutions.
arXiv Detail & Related papers (2024-10-28T07:16:00Z)
- Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models [11.993910471523073]
We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies.
We observed that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities.
Although the emphasis is on cloud deployments, there is a growing interest in compressing FMs and deploying them on smaller devices.
arXiv Detail & Related papers (2024-10-11T17:27:04Z)
- Synergizing Foundation Models and Federated Learning: A Survey [23.416321895575507]
This paper discusses the potential and challenges of synergizing Federated Learning (FL) and Foundation Models (FM).
FL is a collaborative learning paradigm that breaks the barrier of data availability from different participants.
It provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy.
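The basic aggregation step behind this FL-FM synergy is parameter averaging across clients. Below is a minimal FedAvg-style sketch in PyTorch; real systems adapting FMs typically average only small adapter weights rather than full model state, and this is an illustration, not any specific system's implementation.

```python
import torch

def fedavg(client_state_dicts: list[dict], client_sizes: list[int]) -> dict:
    """Weighted parameter averaging (FedAvg) over client model updates.

    Each client contributes proportionally to its local dataset size.
    Sketch only: secure aggregation and privacy mechanisms are omitted.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_state_dicts[0]:
        averaged[name] = sum(
            sd[name] * (n / total)
            for sd, n in zip(client_state_dicts, client_sizes)
        )
    return averaged

# usage: global_model.load_state_dict(fedavg(updates, sizes))
```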
arXiv Detail & Related papers (2024-06-18T17:58:09Z)
- On the Evaluation of Speech Foundation Models for Spoken Language Understanding [87.52911510306011]
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking.
The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for these SLU tasks.
We ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating these SFMs?
arXiv Detail & Related papers (2024-06-14T14:37:52Z)
- Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning [23.763256908202496]
Foundation models (FMs) have revolutionized the field of AI by showing remarkable performance in various tasks.
However, FMs exhibit numerous limitations that prevent their broader adoption in many real-world systems.
We propose a conceptual framework that encapsulates different modes by which agents could interact with FMs.
arXiv Detail & Related papers (2024-02-02T18:00:35Z)
- Learning from models beyond fine-tuning [78.20895343699658]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta-learning, and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z)
- VideoGLUE: Video General Understanding Evaluation of Foundation Models [89.07145427268948]
We evaluate video understanding capabilities of foundation models (FMs) using a carefully designed experiment protocol.
We jointly profile FMs' efficacy and efficiency when adapting them to general video understanding tasks.
arXiv Detail & Related papers (2023-07-06T17:47:52Z)
- ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning [73.47165576175541]
Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream tasks.
We propose ManagerTower, a novel VL model architecture that gathers and combines the insights of pre-trained uni-modal experts at different levels.
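One way to picture "combining the insights of pre-trained uni-modal experts at different levels" is a learned weighted sum over the layers of a frozen encoder. The module below is a hypothetical sketch of that idea; the actual ManagerTower architecture is considerably more elaborate.

```python
import torch
import torch.nn as nn

class LayerManager(nn.Module):
    """Aggregates features from several layers of a uni-modal encoder
    with learned per-layer weights. Hypothetical sketch, not the
    published ManagerTower design."""
    def __init__(self, num_layers: int, dim: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_layers))  # one learned weight per layer
        self.norm = nn.LayerNorm(dim)

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # layer_feats: num_layers tensors of shape (batch, seq, dim)
        w = torch.softmax(self.gate, dim=0)
        stacked = torch.stack(layer_feats)               # (L, B, T, D)
        fused = (w.view(-1, 1, 1, 1) * stacked).sum(0)   # weighted sum over layers
        return self.norm(fused)
```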
arXiv Detail & Related papers (2023-05-31T18:23:57Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
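A generic pool-based active-learning loop, of which this pipeline is a more sophisticated instance, looks roughly like the sketch below. The uncertainty-sampling query strategy and logistic-regression learner are simplifying assumptions, not the paper's transformer-based method.

```python
# Generic pool-based active-learning loop (uncertainty sampling);
# a simplified stand-in for the paper's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, y_pool, n_init=20, n_query=10, rounds=5):
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X_pool), n_init, replace=False))
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X_pool[labeled], y_pool[labeled])
        # query the pool examples the model is least certain about
        probs = model.predict_proba(X_pool[unlabeled])
        uncertainty = 1.0 - probs.max(axis=1)
        picked = np.argsort(uncertainty)[-n_query:]
        for j in sorted(picked, reverse=True):   # pop from the back first
            labeled.append(unlabeled.pop(j))
    model.fit(X_pool[labeled], y_pool[labeled])  # final fit on all labels acquired
    return model, labeled
```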
arXiv Detail & Related papers (2021-06-07T17:13:59Z)