Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics
- URL: http://arxiv.org/abs/2404.08001v1
- Date: Mon, 8 Apr 2024 07:37:31 GMT
- Title: Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics
- Authors: Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan
- Abstract summary: Large Language Models (LLMs) are undergoing a period of rapid updates and changes.
It's challenging to acquire unique domain knowledge while keeping the model itself advanced.
A sophisticated large language model system named Xiwu has been developed, allowing users to switch between the most advanced foundation models.
- Score: 8.483323041108774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) models frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named Xiwu has been developed, allowing users to switch between the most advanced foundation models and quickly teach the model domain knowledge. In this work, we report on the best practices for applying LLMs in the field of high-energy physics (HEP), including: a seed fission technology is proposed and some data collection and cleaning tools are developed to quickly obtain a domain AI-Ready dataset; a just-in-time learning system is implemented based on vector store technology; an on-the-fly fine-tuning system has been developed to facilitate rapid training under a specified foundation model. The results show that Xiwu can smoothly switch between foundation models such as LLaMA, Vicuna, ChatGLM and Grok-1. The trained Xiwu model significantly outperforms the benchmark model on HEP knowledge question answering and code generation. This strategy significantly enhances the growth potential of our model's performance, with the hope of surpassing GPT-4 as it evolves with the development of open-source models. This work provides a customized LLM for the field of HEP, while also offering a reference for applying LLMs to other fields; the corresponding code is available on GitHub.
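The abstract's just-in-time learning component pairs a vector store with whichever foundation model is currently loaded: domain snippets are embedded, the closest ones are retrieved for each query, and they are prepended to the prompt. The sketch below illustrates that retrieve-then-prompt pattern in plain Python; the embedding function, VectorStore class, and dummy_llm stand-in are hypothetical simplifications (the paper does not specify its encoder or storage backend), not the actual Xiwu implementation.

```python
# Minimal sketch of vector-store-based just-in-time learning (hypothetical,
# not the Xiwu codebase): domain snippets live in a small in-memory store,
# the query retrieves the closest ones, and they are prepended to the prompt
# of whichever foundation model is currently loaded.
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would use a neural text encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """In-memory store of (embedding, text) pairs with top-k similarity search."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def answer(question: str, store: VectorStore, llm: Callable[[str], str]) -> str:
    """Retrieve domain knowledge just in time and hand it to any foundation model."""
    context = "\n".join(store.search(question))
    prompt = (
        "Use the following HEP notes to answer.\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)


if __name__ == "__main__":
    store = VectorStore()
    store.add("BESIII is an experiment at the BEPCII electron-positron collider.")
    store.add("The tau lepton mass is about 1776.86 MeV.")

    # Stand-in for a swappable foundation model (LLaMA, Vicuna, ChatGLM, Grok-1, ...).
    def dummy_llm(prompt: str) -> str:
        return f"(a foundation model would answer here, given {len(prompt)} prompt characters)"

    print(answer("Which collider hosts the BESIII experiment?", store, dummy_llm))
```

Because the retrieved context is assembled at query time, the underlying foundation model can be swapped without re-indexing the domain corpus, which is the flexibility the abstract emphasizes.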
Related papers
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.
Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z) - Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications [4.122613733775677]
SOLOMON is a novel Neuro-inspired Large Language Model (LLM) Reasoning Network architecture.
We show how SOLOMON enables swift adaptation of general-purpose LLMs to specialized tasks by leveraging Prompt Engineering and In-Context Learning techniques.
Results show that SOLOMON instances significantly outperform their baseline LLM counterparts and achieve performance comparable to the state-of-the-art reasoning model o1-preview.
arXiv Detail & Related papers (2025-02-05T19:27:24Z) - SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation.
We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding.
Our code and models shall be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z) - 7B Fully Open Source Moxin-LLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement [42.10844666788254]
Moxin 7B is a fully open-source Large Language Model (LLM) developed adhering to the principles of open science, open source, open data, and open access.
We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints.
Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
arXiv Detail & Related papers (2024-12-08T02:01:46Z) - A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition [4.123601037699469]
In real-world domain modeling, engineers usually decompose complex tasks into easily solvable sub-tasks.
We propose an LLM-based domain modeling approach via question decomposition, similar to developer's modeling process.
Preliminary results show that our approach outperforms the single-prompt-based approach.
arXiv Detail & Related papers (2024-10-13T14:28:04Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with a subset of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z) - xGen-MM (BLIP-3): A Family of Open Large Multimodal Models [157.44696790158784]
This report introduces xGen-MM, a framework for developing Large Multimodal Models (LMMs).
The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs.
Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks.
arXiv Detail & Related papers (2024-08-16T17:57:01Z) - Large Language Model for Verilog Generation with Golden Code Feedback [29.135207235743795]
This study introduces a novel approach utilizing reinforcement learning with golden code feedback to enhance the performance of pre-trained models.
We have achieved state-of-the-art (SOTA) results by a substantial margin. Notably, our 6.7B parameter model demonstrates superior performance compared to the current best-in-class 13B and 16B models.
arXiv Detail & Related papers (2024-07-21T11:25:21Z) - Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models [42.891427362223176]
Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities.
We propose a novel framework to fully harness the capabilities of LLMs.
We further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework.
arXiv Detail & Related papers (2024-06-17T17:59:43Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including the list above) and is not responsible for any consequences of its use.