TechGPT-2.0: A large language model project to solve the task of
knowledge graph construction
- URL: http://arxiv.org/abs/2401.04507v1
- Date: Tue, 9 Jan 2024 11:52:58 GMT
- Title: TechGPT-2.0: A large language model project to solve the task of
knowledge graph construction
- Authors: Jiaqi Wang, Yuying Chang, Zhong Li, Ning An, Qi Ma, Lei Hei, Haibo
Luo, Yifei Lu, Feiliang Ren
- Abstract summary: TechGPT-2.0 is a project designed to enhance the capabilities of large language models in knowledge graph construction tasks.
It exhibits robust text processing capabilities, particularly in the domains of medicine and law.
TechGPT-2.0 is trained on Huawei's Ascend servers.
- Score: 31.638140593358433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have exhibited robust performance across diverse
natural language processing tasks. This report introduces TechGPT-2.0, a
project designed to enhance the capabilities of large language models
specifically in knowledge graph construction tasks, including named entity
recognition (NER) and relationship triple extraction (RTE) tasks in NLP
applications. Additionally, it serves as an LLM accessible for research within
the Chinese open-source model community. We offer two 7B large language model
weights and a QLoRA weight specialized for processing lengthy texts. Notably,
TechGPT-2.0 is trained on Huawei's Ascend servers. Inheriting all
functionalities from TechGPT-1.0, it exhibits robust text processing
capabilities, particularly in the domains of medicine and law. Furthermore, we
introduce new capabilities to the model, enabling it to process texts in
various domains such as geographical areas, transportation, organizations,
literary works, biology, natural sciences, astronomical objects, and
architecture. These enhancements also strengthen the model's ability to handle
hallucinations, unanswerable queries, and lengthy texts. This report
provides a comprehensive and detailed introduction to the full fine-tuning
process on Huawei's Ascend servers, encompassing experiences in Ascend server
debugging, instruction fine-tuning data processing, and model training. Our
code is available at https://github.com/neukg/TechGPT-2.0
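To make the NER and RTE use cases concrete, the sketch below shows one way to prompt a released 7B checkpoint for relation triple extraction with Hugging Face transformers. The model ID, prompt template, and decoding settings are illustrative assumptions rather than the project's documented interface; consult the GitHub repository above for the actual released weights and usage.

```python
# A minimal sketch of prompting a TechGPT-2.0-style checkpoint for
# relation triple extraction (RTE). MODEL_ID and the instruction
# template are hypothetical, for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "neukg/TechGPT-2.0-7B"  # hypothetical Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", trust_remote_code=True
)

# Instruction-style prompt asking for (head, relation, tail) triples,
# the target output format for knowledge graph construction.
prompt = (
    "Extract all entity-relation triples from the following sentence "
    "and return them as (head, relation, tail) tuples.\n"
    "Sentence: TechGPT-2.0 was trained on Huawei's Ascend servers."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Under the same caveat, the long-text QLoRA weight mentioned in the abstract would typically be applied on top of a base checkpoint with the peft library (PeftModel.from_pretrained), assuming the adapter is released in peft format.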
Related papers
- In-Context Code-Text Learning for Bimodal Software Engineering [26.0027882745058]
Bimodal software analysis initially appeared to be within reach with the advent of large language models.
We postulate that in-context learning for the code-text bimodality is a promising avenue.
We consider a diverse dataset encompassing 23 software engineering tasks, which we transform into an in-context learning format.
arXiv Detail & Related papers (2024-10-08T19:42:00Z)
- GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning [0.0]
We introduce InstAr-500k, a new Arabic instruction dataset created by generating and collecting content.
We assess this dataset by fine-tuning an open-source Gemma-7B model on several downstream tasks to improve its functionality.
Based on multiple evaluations, our fine-tuned model achieves excellent performance on several Arabic NLP benchmarks.
arXiv Detail & Related papers (2024-07-02T10:43:49Z)
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets [2.8123257987021058]
We focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets.
We compile an Amharic instruction fine-tuning dataset and a fine-tuned LLaMA-2-Amharic model.
The fine-tuned model shows promising results in different NLP tasks.
arXiv Detail & Related papers (2024-02-12T19:25:11Z)
- SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
- Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text [2.396908230113859]
Large language models (LLMs) and foundation models with emergent capabilities have been shown to improve the performance of many NLP tasks.
We present Text2KGBench, a benchmark to evaluate the capabilities of language models to generate Knowledge Graphs (KGs) from natural language text guided by an ontology.
arXiv Detail & Related papers (2023-08-04T14:47:15Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
arXiv Detail & Related papers (2022-10-17T18:02:52Z)
- TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline (see the sketch after this list).
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
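The KILT finding above (a shared dense vector index plus a seq2seq reader) can be illustrated with a small retrieve-then-generate sketch. The MiniLM encoder, BART generator, and toy passages below are assumptions for illustration, not the paper's actual baseline configuration.

```python
# A minimal sketch of the "shared dense index + seq2seq" pattern that
# KILT reports as a strong baseline. The MiniLM encoder, BART generator,
# and toy passages are illustrative assumptions, not KILT's actual setup.
import faiss
from sentence_transformers import SentenceTransformer
from transformers import BartForConditionalGeneration, BartTokenizer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
passages = [
    "KILT grounds all of its tasks in a single snapshot of Wikipedia.",
    "TechGPT-2.0 targets NER and relation triple extraction tasks.",
]

# Embed the passage collection once into a shared dense vector index.
embeddings = encoder.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner-product search
index.add(embeddings)

# Retrieve the best-matching passage for a query.
query = "What knowledge source are KILT tasks grounded in?"
query_emb = encoder.encode([query], normalize_embeddings=True)
_, hits = index.search(query_emb, 1)
evidence = passages[hits[0][0]]

# Condition a seq2seq model on the query plus retrieved evidence.
tok = BartTokenizer.from_pretrained("facebook/bart-large")
reader = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
inputs = tok(f"question: {query} context: {evidence}", return_tensors="pt")
print(tok.decode(reader.generate(**inputs, max_new_tokens=32)[0],
                 skip_special_tokens=True))
```

In the benchmark itself the index would be built over the shared Wikipedia snapshot, which is what allows a single retriever to serve every KILT task.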
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.