The Open Source Advantage in Large Language Models (LLMs)
- URL: http://arxiv.org/abs/2412.12004v2
- Date: Sun, 02 Feb 2025 21:27:06 GMT
- Title: The Open Source Advantage in Large Language Models (LLMs)
- Authors: Jiya Manchanda, Laura Boettcher, Matheus Westphalen, Jasser Jasser
- Abstract summary: Large language models (LLMs) have rapidly advanced natural language processing, driving significant breakthroughs in tasks such as text generation, machine translation, and domain-specific reasoning. The field now faces a critical dilemma in its approach: closed-source models like GPT-4 deliver state-of-the-art performance but restrict reproducibility, accessibility, and external oversight. Open-source frameworks like LLaMA and Mixtral democratize access, foster collaboration, and support diverse applications, achieving competitive results through techniques like instruction tuning and LoRA.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have rapidly advanced natural language processing, driving significant breakthroughs in tasks such as text generation, machine translation, and domain-specific reasoning. The field now faces a critical dilemma in its approach: closed-source models like GPT-4 deliver state-of-the-art performance but restrict reproducibility, accessibility, and external oversight, while open-source frameworks like LLaMA and Mixtral democratize access, foster collaboration, and support diverse applications, achieving competitive results through techniques like instruction tuning and LoRA. Hybrid approaches address challenges like bias mitigation and resource accessibility by combining the scalability of closed-source systems with the transparency and inclusivity of open-source frameworks. However, in this position paper, we argue that open source remains the most robust path for advancing LLM research and ethical deployment.
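To make the LoRA technique named in the abstract concrete, here is a minimal NumPy sketch of the core idea: a frozen weight matrix is adapted through a trainable low-rank update, so fine-tuning touches only a small fraction of the parameters. The dimensions and scaling below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal LoRA sketch: a frozen weight matrix W is adapted by a
# low-rank update (alpha / r) * B @ A, so only A and B are trained.
# Dimensions here are illustrative, not taken from the paper.
d_out, d_in, r, alpha = 64, 128, 8, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init 0

def lora_forward(x):
    """Forward pass: base projection plus scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(lora_forward(x).shape)  # (64,)
```

Because B is initialized to zero, the adapted layer starts out identical to the base model, which is what makes the low-rank update safe to train from scratch.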
Related papers
- The Role of Open-Source LLMs in Shaping the Future of GeoAI [11.083173173865491]
Large Language Models (LLMs) are transforming geospatial artificial intelligence (GeoAI).
This paper examines the open-source paradigm's pivotal role in this transformation.
arXiv Detail & Related papers (2025-04-24T13:20:17Z)
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints.
We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC).
Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
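As a rough illustration of the SMC idea described above, the following toy sketch maintains a population of partial sequences, weights each by a constraint potential, and resamples. The uniform "language model" and the no-'c' constraint are invented stand-ins, not the Lew et al. probabilistic programming system the paper builds on.

```python
import random

# Toy SMC for constrained generation: propose, weight, resample.
VOCAB = "abc "

def lm_step(prefix):
    """Stand-in LM: uniform next-character distribution."""
    return random.choice(VOCAB)

def potential(prefix):
    """Constraint weight: penalize sequences containing 'c'."""
    return 0.0 if "c" in prefix else 1.0

def smc_generate(n_particles=100, length=8):
    particles = [""] * n_particles
    for _ in range(length):
        # Propose one token per particle from the base LM.
        particles = [p + lm_step(p) for p in particles]
        # Weight by the constraint potential and resample.
        weights = [potential(p) for p in particles]
        if sum(weights) == 0:
            return None  # all particles violate the constraint
        particles = random.choices(particles, weights=weights, k=n_particles)
    return particles

result = smc_generate()
print(result[:3] if result else "all particles violated the constraint")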
arXiv Detail & Related papers (2025-04-17T17:49:40Z)
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
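A minimal sketch of the "external TTC" idea is best-of-N sampling: draw several candidate solutions and keep the one a verifier scores highest. The generator and verifier below are stubs, not the paper's agent stack.

```python
import random

# Sketch of external test-time compute: sample N candidate patches
# and keep the one the verifier prefers (best-of-N selection).
def generate_patch(issue, temperature=0.8):
    """Stub generator; in practice an LLM agent proposes a patch."""
    return f"patch-{random.randint(0, 999)} for {issue}"

def verify(patch):
    """Stub verifier; in practice e.g. a test-suite pass rate."""
    return random.random()

def best_of_n(issue, n=8):
    candidates = [generate_patch(issue) for _ in range(n)]
    return max(candidates, key=verify)

print(best_of_n("issue #42"))
```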
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
- Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions [1.3638337521666275]
Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences based on large-scale datasets, such as text.
Although larger datasets typically enhance LM performance, scalability remains a challenge due to constraints in computational power and resources.
Recent research has focused on developing decentralized techniques to enable distributed training and inference.
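The distributed-training idea can be sketched as data parallelism: each worker computes a gradient on its own shard and the gradients are averaged before the update. The toy least-squares model below stands in for a language model; a real system would use collective communication such as all-reduce.

```python
import numpy as np

# Toy data-parallel training step: each worker computes a gradient
# on its own shard; gradients are averaged before the update.
rng = np.random.default_rng(0)
w = np.zeros(4)                                         # shared parameters
shards = [rng.normal(size=(32, 4)) for _ in range(4)]   # one shard per worker
true_w = np.array([1.0, -2.0, 0.5, 3.0])
targets = [x @ true_w for x in shards]

def local_gradient(w, x, y):
    """Least-squares gradient on one worker's shard."""
    return 2 * x.T @ (x @ w - y) / len(x)

for step in range(100):
    grads = [local_gradient(w, x, y) for x, y in zip(shards, targets)]
    w -= 0.05 * np.mean(grads, axis=0)   # averaged update

print(np.round(w, 2))  # approaches [1.0, -2.0, 0.5, 3.0]
```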
arXiv Detail & Related papers (2025-03-20T15:18:25Z)
- Comprehensive Analysis of Transparency and Accessibility of ChatGPT, DeepSeek, And other SoTA Large Language Models [2.6900047294457683]
Despite increasing discussions on open-source Artificial Intelligence (AI), existing research lacks a discussion on the transparency and accessibility of state-of-the-art (SoTA) Large Language Models (LLMs).
This study critically analyzes SoTA LLMs from the last five years, including ChatGPT, DeepSeek, LLaMA, and others, to assess their adherence to transparency standards and the implications of partial openness.
Our findings reveal that while some models are labeled as open-source, this does not necessarily mean they are fully open-sourced.
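The notion of "partial openness" can be made concrete with a simple checklist score over release artifacts; the criteria below are illustrative assumptions, not the study's actual rubric.

```python
# Illustrative openness checklist for grading "partial openness";
# the criteria and equal weighting are assumptions, not the paper's rubric.
CRITERIA = ["weights", "training_code", "training_data", "license", "paper"]

def openness_score(released):
    """Fraction of checklist items a model release satisfies."""
    return sum(item in released for item in CRITERIA) / len(CRITERIA)

print(openness_score({"weights", "license", "paper"}))  # 0.6: "open weights" only
print(openness_score(set(CRITERIA)))                    # 1.0: fully open
```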
arXiv Detail & Related papers (2025-02-21T23:53:13Z)
- Fully Open Source Moxin-7B Technical Report [38.13392000279939]
Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the Model Openness Framework (MOF). Our model achieves the highest MOF classification level of "open science" through the comprehensive release of pre-training code and configurations.
arXiv Detail & Related papers (2024-12-08T02:01:46Z)
- Rethinking Scale: The Efficacy of Fine-Tuned Open-Source LLMs in Large-Scale Reproducible Social Science Research [0.0]
Large Language Models (LLMs) are distinguished by their architecture, which dictates their parameter size and performance capabilities.
Social scientists have increasingly adopted LLMs for text classification tasks, which are difficult to scale with human coders.
This study demonstrates that small, fine-tuned open-source LLMs can achieve equal or superior performance to models such as ChatGPT-4.
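A minimal sketch of such a fine-tuning workflow with the Hugging Face transformers API, assuming a small open encoder (distilbert-base-uncased here) and a two-label toy dataset; the paper's actual models, labels, and data differ.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy two-class text classification data; placeholders, not the study's corpus.
texts = ["markets rallied today", "the senate passed a relief bill"]
labels = [0, 1]  # e.g. 0 = economy, 1 = politics

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

enc = tok(texts, padding=True, truncation=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toy-classifier", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(),
)
trainer.train()
```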
arXiv Detail & Related papers (2024-10-31T20:26:30Z)
- A Comprehensive Survey on Joint Resource Allocation Strategies in Federated Edge Learning [9.806901443019008]
Federated Edge Learning (FEL) enables model training in a distributed environment while preserving user privacy by keeping each user's data physically separate.
With the development of complex application scenarios such as the Internet of Things (IoT) and Smart Earth, the conventional resource allocation schemes can no longer effectively support these growing computational and communication demands.
This paper systematically addresses the multifaceted challenges of computation and communication that arise from these growing, multi-resource demands.
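The FEL training loop can be illustrated with a toy federated-averaging (FedAvg) round, where each client updates the model on its private shard and only the weights travel back to the server; this is a generic FedAvg sketch, not the survey's allocation schemes.

```python
import numpy as np

# Toy FedAvg: clients train locally on private shards; the server
# averages the returned weights, so raw data never leaves a device.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(5):
    x = rng.normal(size=(40, 3))
    clients.append((x, x @ true_w))   # (private features, private targets)

def local_update(w, x, y, lr=0.05, steps=10):
    """A few local gradient steps on one client's data."""
    for _ in range(steps):
        w = w - lr * 2 * x.T @ (x @ w - y) / len(x)
    return w

global_w = np.zeros(3)
for _ in range(20):
    local_ws = [local_update(global_w.copy(), x, y) for x, y in clients]
    global_w = np.mean(local_ws, axis=0)  # server-side aggregation

print(np.round(global_w, 2))  # close to [2.0, -1.0, 0.5]
```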
arXiv Detail & Related papers (2024-10-10T13:02:00Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
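A toy sketch of the code-assisted loop: candidate programs are executed and a critic keeps only those whose computed answer passes a check. The generator below is a stub where an LLM would propose code, and the rule-based critic stands in for the paper's learned critic model.

```python
# Sketch of a code-based critic loop: run each candidate program and
# keep it only if its executed answer matches an expected check.
def candidate_programs(question):
    """Stub generator: in practice, an LLM proposes these programs."""
    yield "answer = 2 + 2"
    yield "answer = 2 * 3"

def critic(program, expected):
    scope = {}
    try:
        exec(program, scope)          # execute the candidate code
    except Exception:
        return False                  # crashing candidates are rejected
    return scope.get("answer") == expected

question, expected = "What is 2 + 2?", 4
good = [p for p in candidate_programs(question) if critic(p, expected)]
print(good)  # ['answer = 2 + 2']
```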
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization [62.16747639440893]
Large language models (LLMs) and their associated technologies continue to advance, particularly in the realms of prompt engineering and agent engineering.
Our proposed framework incorporates retrieval-augmented generation (RAG) to enhance the system's ability to acquire domain-specific knowledge and generate solutions.
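A minimal RAG sketch: retrieve the corpus documents most similar to the query and prepend them to the prompt. The toy corpus and word-overlap scoring are assumptions; a real system would use embeddings and a vector store.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the
# most similar documents and prepend them to the model prompt.
CORPUS = [
    "Base stations should avoid obstructions for line-of-sight coverage.",
    "LoRA adapts models with low-rank updates.",
    "Antenna height affects cell radius.",
]

def score(query, doc):
    """Bag-of-words overlap; a real system would use embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def build_prompt(query, k=2):
    top = sorted(CORPUS, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("Where should a base station antenna be placed?"))
```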
arXiv Detail & Related papers (2024-08-07T08:43:32Z)
- TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE.
We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
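The attribution idea can be sketched as nearest-neighbor search in an embedding space: attribute generated text to the source whose embedding is most similar. The random vectors below are stand-ins for TRACE's contrastively learned embeddings.

```python
import numpy as np

# Sketch of embedding-based source attribution: attribute generated
# text to the candidate source with the highest cosine similarity.
rng = np.random.default_rng(2)
sources = {"doc_a": rng.normal(size=8), "doc_b": rng.normal(size=8)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def attribute(text_embedding):
    return max(sources, key=lambda s: cosine(sources[s], text_embedding))

generated = sources["doc_a"] + 0.1 * rng.normal(size=8)  # near doc_a
print(attribute(generated))  # doc_a
```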
arXiv Detail & Related papers (2024-07-06T07:19:30Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
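A minimal sketch of weight-space model merging: parameters of two fine-tuned models are combined layer by layer with no further training. Simple linear interpolation is shown; the paper studies richer merging schemes for low-resource languages.

```python
import numpy as np

# Sketch of model merging: interpolate two models' weights per layer,
# combining their capabilities without any additional training.
def merge(state_a, state_b, alpha=0.5):
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k]
            for k in state_a}

lang_model = {"layer1": np.ones((2, 2)), "layer2": np.zeros(2)}
task_model = {"layer1": np.zeros((2, 2)), "layer2": np.ones(2)}
merged = merge(lang_model, task_model)
print(merged["layer1"])  # 0.5 everywhere
```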
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning [23.395624804517034]
Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks.
The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data.
Data, especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects.
arXiv Detail & Related papers (2024-04-09T10:47:02Z)
- FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability [70.84333325049123]
This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats.
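Format-following evaluation of this kind reduces to programmatic checks on model output; the sketch below validates that a response is JSON with required keys. The schema is an invented example, not a FoFo test case.

```python
import json

# Toy format-following check: is the model output valid JSON with
# the required keys? The schema is an invented example.
REQUIRED_KEYS = {"patient_id", "diagnosis", "medications"}

def follows_format(output):
    try:
        record = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and REQUIRED_KEYS <= record.keys()

print(follows_format('{"patient_id": 1, "diagnosis": "flu", "medications": []}'))  # True
print(follows_format("patient 1 has the flu"))                                     # False
```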
arXiv Detail & Related papers (2024-02-28T19:23:27Z)
- Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives [6.575445633821399]
Recent decisions by leading AI labs either to open-source their models or to restrict access to them have sparked debate.
This paper offers an examination of the risks and benefits of open-sourcing highly capable foundation models.
arXiv Detail & Related papers (2023-09-29T17:03:45Z)
- External Reasoning: Towards Multi-Large-Language-Models Interchangeable Assistance with Human Feedback [0.0]
This paper proposes that Large Language Models (LLMs) could be augmented through the selective integration of knowledge from external repositories.
Central to this approach is the establishment of a tiered policy for External Reasoning based on Multiple LLM Interchange Assistance.
The results indicate state-of-the-art performance, surpassing existing solutions including ChatPDF.com.
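The tiered policy can be sketched as a confidence-gated fallback: answer from the model alone when it is confident, otherwise consult an external repository. Every component below is a stub; the paper's tiers involve multiple interchangeable LLMs.

```python
# Sketch of a tiered external-reasoning policy: answer from the model
# when confident, otherwise fall back to an external knowledge source.
def model_answer(q):
    """Stub LLM returning (answer, confidence)."""
    return ("4", 0.9) if "2 + 2" in q else ("unsure", 0.2)

def search_repository(q):
    """Stub external repository lookup."""
    return "Paris is the capital of France."

def answer(q, threshold=0.5):
    ans, confidence = model_answer(q)     # tier 1: the LLM itself
    if confidence >= threshold:
        return ans
    context = search_repository(q)        # tier 2: external repository
    return f"(from external source) {context}"

print(answer("What is 2 + 2?"))
print(answer("What is the capital of France?"))
```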
arXiv Detail & Related papers (2023-07-05T17:05:32Z)
- Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
- Universal Source-Free Domain Adaptation [57.37520645827318]
We propose a novel two-stage learning process for domain adaptation.
In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift.
In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps.
arXiv Detail & Related papers (2020-04-09T07:26:20Z)
- Towards Inheritable Models for Open-Set Domain Adaptation [56.930641754944915]
We introduce a practical Domain Adaptation paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in the future.
We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
arXiv Detail & Related papers (2020-04-09T07:16:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.