Fugu-MT Paper Translation (Abstract): De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers

Paper overview: De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers

arxiv url: http://arxiv.org/abs/2101.00146v1
Date: Fri, 1 Jan 2021 03:09:31 GMT
Status: Translation complete
Last updated in system: 2021-04-16 11:08:25.354319
Title: De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers
Authors: Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm
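Abstract summary: This paper proposes an end-to-end de-identification framework that automatically removes protected health information from hospital discharge summaries. The corpus comprised 600 hospital discharge summaries extracted from the EMRs of two principal referral hospitals in Sydney, Australia.
Reference score (attention score computed independently by the site): 11.633410102787538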
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Objective: Electronic Medical Records (EMRs) contain clinical narrative text that is of great potential value to medical researchers. However, this information is mixed with Protected Health Information (PHI) that presents risks to patient and clinician confidentiality. This paper presents an end-to-end de-identification framework to automatically remove PHI from hospital discharge summaries. Materials and Methods: Our corpus included 600 hospital discharge summaries which were extracted from the EMRs of two principal referral hospitals in Sydney, Australia. Our end-to-end de-identification framework consists of three components: 1) Annotation: labelling of PHI in the 600 hospital discharge summaries using five pre-defined categories: person, address, date of birth, individual identification number, phone/fax number; 2) Modelling: training and evaluating ensembles of named entity recognition (NER) models through the use of three natural language processing (NLP) toolkits (Stanza, FLAIR and spaCy) and both balanced and imbalanced datasets; and 3) De-identification: removing PHI from the hospital discharge summaries. Results: The final model in our framework was an ensemble that combined six single models, trained on both balanced and imbalanced datasets, through majority voting. It achieved 0.9866 precision, 0.9862 recall and an F1 score of 0.9864. The majority of false positives and false negatives were related to the person category. Discussion: Our study showed that an ensemble of different models, trained using three different NLP toolkits on balanced and imbalanced datasets, can achieve good results even with a relatively small corpus. Conclusion: Our end-to-end framework provides a robust solution for de-identifying clinical narrative corpora safely. It can be easily applied to any kind of clinical narrative document.
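Abstract (reference translation): Objective: Electronic Medical Records (EMRs) contain clinical narrative text of great value to medical researchers. However, this information is mixed with Protected Health Information (PHI), which poses risks to patient and clinician confidentiality. This paper proposes an end-to-end de-identification framework for automatically removing PHI from hospital discharge summaries. Materials and Methods: The corpus comprises 600 hospital discharge summaries extracted from the EMRs of two principal referral hospitals in Sydney, Australia. Our end-to-end de-identification framework consists of three components: 1) Annotation: labelling of PHI in the 600 hospital discharge summaries using five pre-defined categories: person, address, date of birth, individual identification number, phone/fax number; 2) Modelling: training and evaluating ensembles of named entity recognition (NER) models through the use of three natural language processing (NLP) toolkits (Stanza, FLAIR and spaCy) and both balanced and imbalanced datasets; and 3) De-identification: removing PHI from the hospital discharge summaries. Results: The final model in our framework was an ensemble that combined six single models, trained on balanced and imbalanced datasets, by majority voting. It achieved 0.9866 precision, 0.9862 recall, and an F1 score of 0.9864. The majority of false positives and false negatives were related to the person category. Discussion: Our study showed that an ensemble of different models, trained with three different NLP toolkits on balanced and imbalanced datasets, can achieve good results even with a relatively small corpus. Conclusion: Our end-to-end framework provides a robust solution for safely de-identifying clinical narrative corpora. It can be easily applied to any kind of clinical narrative document.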
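The modelling and de-identification steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`majority_vote`, `mask_phi`, `f1_score`), the token-level label format, the tie-breaking rule, and the example sentence and predictions are all assumptions made for illustration. The abstract states only that six single models were combined by majority voting and that PHI was then removed.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], outside: str = "O") -> List[str]:
    """Combine per-token labels from several NER models by majority vote.

    Assumption: a label is kept only when more than half of the models agree;
    otherwise the token falls back to the `outside` (non-PHI) label.
    """
    n_models = len(predictions)
    voted = []
    for labels in zip(*predictions):  # labels for one token, one per model
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > n_models / 2 else outside)
    return voted


def mask_phi(tokens: Sequence[str], labels: Sequence[str], outside: str = "O") -> str:
    """Replace every token carrying a PHI label with a bracketed placeholder."""
    return " ".join(tok if lab == outside else f"[{lab}]" for tok, lab in zip(tokens, labels))


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and hypothetical outputs of six single models.
tokens = ["Mr", "John", "Smith", "was", "admitted", "on", "Monday"]
predictions = [
    ["O", "PERSON", "PERSON", "O", "O", "O", "O"],
    ["O", "PERSON", "PERSON", "O", "O", "O", "O"],
    ["O", "PERSON", "O", "O", "O", "O", "O"],
    ["O", "PERSON", "PERSON", "O", "O", "O", "O"],
    ["PERSON", "PERSON", "PERSON", "O", "O", "O", "O"],
    ["O", "PERSON", "PERSON", "O", "O", "O", "O"],
]

voted = majority_vote(predictions)
deidentified = mask_phi(tokens, voted)
print(deidentified)

# Consistency check on the figures reported in the abstract.
print(round(f1_score(0.9866, 0.9862), 4))
```

With an even number of models, ties are possible. This sketch resolves them toward the non-PHI label, whereas a more cautious de-identification pipeline might prefer to mask on a tie. The abstract does not say which rule the authors used.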

Related papers

TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet [1.4043931310479378]
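TheBlueScrubs-v1 is a curated dataset of over 250 billion medical tokens drawn from a broad internet corpus. Each text is assigned three LLM-based quality scores covering medical relevance, precision and factual detail, and safety and ethical standards. This Data Descriptor details the creation and validation of the dataset and describes its potential utility for medical AI research.
Paper reference translation (metadata) (2025-04-01T22:25:19Z)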
DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization [13.038800602897354]
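This study uses large language models to re-identify patients in order to re-examine anonymized clinical records. The method uses a large language model to identify the patient who matches a given clinical note. ClinicalBERT was the most effective tool and its masking covered the identified PII, yet 9% of clinical records were still re-identified.
Paper reference translation (metadata) (2024-10-22T14:06:31Z)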
Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
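We build a dataset of whole-body CT scans with voxel-level labels for 142 anatomical structures across 533 volumes, providing comprehensive anatomical coverage. The proposed method does not rely on manual annotation during the label aggregation stage. We release a unified anatomy segmentation model that can predict 142 anatomical structures in CT data.
Paper reference translation (metadata) (2023-07-25T09:48:13Z)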
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
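This study highlights the difficulties faced when sharing tools and resources in this domain. A corpus of clinical documents was annotated according to 12 categories. We merge the results of a deep learning model with hand-written rules to build a hybrid system.
Paper reference translation (metadata) (2023-03-23T17:17:46Z)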
A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
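Social determinants of health (SDoH) affect the quality of patient care and health disparities. Many SDoH items are not coded in a structured form in electronic health records. We explore a multi-stage pipeline comprising named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
Paper reference translation (metadata) (2022-12-24T18:40:23Z)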
Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan [25.195233641408233]
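Discharge summaries are one promising application of summarization. It remains unclear how summaries should be generated from unstructured sources. This study aims to identify the optimal granularity for summarization.
Paper reference translation (metadata) (2022-09-20T23:26:02Z)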
ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
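Contextual word embedding models have achieved state-of-the-art results on multiple NLP tasks. ICDBigBird is a BigBird-based model that integrates a Graph Convolutional Network (GCN). The effectiveness of the BigBird model on the ICD classification task is demonstrated on a real-world clinical dataset.
Paper reference translation (metadata) (2022-04-21T20:59:56Z)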
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
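Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. We demonstrate the cross-lingual transfer properties of LMs for named entity recognition (NER) on a low-resource, real-world task: code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
Paper reference translation (metadata) (2022-04-10T21:46:52Z)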
A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text [1.4841452489515765]
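Current state-of-the-art (SOTA) NLP models are tightly integrated with deep learning techniques. This study proposes an engineering framework for medical entity recognition, relation extraction, and attribute extraction.
Paper reference translation (metadata) (2022-03-08T03:19:16Z)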
Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
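We launched the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be trained in a distributed manner and run independently at each host institution. The study covers 9,573 chest CT scans from 3,336 patients collected at 23 hospitals located in China and the United Kingdom.
Paper reference translation (metadata) (2021-11-18T00:43:41Z)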
Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
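This paper introduces CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. The objective is applied to four Transformer-based architectures: Contextual Document Vectors, bi-encoders, poly-encoders, and cross-encoders. We show that CAPR outperforms strong baselines in retrieving domain-specific passages and generalizes effectively across rule-based and human-labeled passages.
Paper reference translation (metadata) (2021-08-02T10:42:52Z)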

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
