Fugu-MT Paper Translation (Abstract): BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

Paper overview: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

arxiv url: http://arxiv.org/abs/2310.07276v1
Date: Wed, 11 Oct 2023 07:57:08 GMT
Status: Translation complete
Last updated in system: 2023-10-12 23:42:43.489803
Title: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Authors: Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan
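Abstract summary: $\mathbf{BioT5}$ is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. $\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
Reference score (attention score calculated independently by this site): 54.97423244799579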
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose $\mathbf{BioT5}$, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. $\mathbf{BioT5}$ utilizes SELFIES for $100\%$ robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, $\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After fine-tuning, BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities. Our code is available at $\href{https://github.com/QizhiPei/BioT5}{Github}$.

Related papers

Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
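We introduce Biology-Instructions, an instruction-tuning dataset for biological sequences. The dataset can bridge the gap between large language models (LLMs) and tasks involving complex biological sequences. We also develop a strong baseline, ChatMultiOmics, with a new three-stage training pipeline.
Paper reference translation (metadata) (2024-12-26T12:12:23Z)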
InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions [32.38318676313486]
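InstructBioMol is designed to bridge natural language and biomolecules. It integrates multimodal biomolecules as input and lets researchers articulate design goals in natural language. It achieves a 10% improvement in binding affinity and can design enzymes that reach an ESP score of 70.4.
Paper reference translation (metadata) (2024-10-10T13:45:56Z)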
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
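The integration of biomolecular modeling and natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry, and biology. We analyze recent advances achieved through cross-modeling of biomolecules and natural language.
Paper reference translation (metadata) (2024-03-03T14:59:47Z)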
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning [77.90250740041411]
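This paper introduces BioT5+, an extension of the BioT5 framework. BioT5+ includes several new features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources such as bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique that improves the processing of numerical data.
Paper reference translation (metadata) (2024-02-27T12:43:09Z)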
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
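We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, together with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that the method leads to performance improvements in several instances while keeping computing-power requirements low.
Paper reference translation (metadata) (2023-12-21T14:26:57Z)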
Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs [45.53337864477857]
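Know2BIO is a general-purpose heterogeneous KG benchmark for the biomedical domain. It integrates data from 30 diverse sources and captures intricate relationships across 11 biomedical categories. Know2BIO supports user-directed automated updating to reflect the latest knowledge in biomedical science.
Paper reference translation (metadata) (2023-10-05T00:34:56Z)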
Interactive Molecular Discovery with Natural Language [69.89287960545903]
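We propose conversational molecular design, which uses natural language to describe and edit target molecules. To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model enhanced by injecting experimental property information.
Paper reference translation (metadata) (2023-06-21T02:05:48Z)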
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
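We propose MoleculeSTM, a multi-modal molecule structure-text model that jointly learns molecules' chemical structures and textual descriptions. In experiments, MoleculeSTM obtains state-of-the-art generalization ability to novel biochemical concepts.
Paper reference translation (metadata) (2022-12-21T06:18:31Z)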
SciFive: a text-to-text transformer model for biomedical literature [0.9482369543628087]
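This paper introduces SciFive, a domain-specific T5 model pre-trained on large biomedical corpora. The work supports the exploration of more difficult text generation tasks and the development of new methods in this domain.
Paper reference translation (metadata) (2021-05-28T06:09:23Z)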

The related papers list is generated automatically from the titles and abstracts of papers on this site.
