Fugu-MT Paper Translation (Summary): SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

Paper summary: SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

arxiv url: http://arxiv.org/abs/2403.05918v2
Date: Tue, 12 Mar 2024 02:45:48 GMT
Status: Translation complete
Last updated in system: 2024-03-13 11:40:48.636404
Title: SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data
Title (reference translation): SEMRes-DDPM: Application of Residual-Network-Based Diffusion Modelling to Imbalanced Data
Authors: Ming Zheng, Yang Yang, Zhi-Hang Zhao, Shan-Chao Gan, Yang Chen, Si-Kai Ni and Yang Lu
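Abstract summary: In data mining and machine learning, commonly used classification models cannot learn effectively from imbalanced data. Most classical oversampling methods are based on the SMOTE technique, which focuses only on local information in the data. This paper proposes the SEMRes-DDPM oversampling method.
Reference score (independently computed attention score): 9.969882349165745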
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In data mining and machine learning, commonly used classification models cannot learn effectively from imbalanced data. To balance the data distribution before model training, oversampling methods are often used to generate data for the minority classes. Most classical oversampling methods are based on the SMOTE technique, which focuses only on local information in the data, so the generated data may not be realistic enough. Among current oversampling methods based on generative networks, GAN-based methods can capture the true data distribution but suffer from mode collapse and training instability. In oversampling methods based on denoising diffusion probabilistic models, the U-Net used as the neural network of the reverse diffusion process is not applicable to tabular data, and although an MLP can replace the U-Net, its simple structure leads to poor noise removal. To overcome these problems, we propose a novel oversampling method, SEMRes-DDPM. In the SEMRes-DDPM reverse diffusion process, a new neural network structure, SEMST-ResNet, is used; it is suitable for tabular data, removes noise well, and can generate higher-quality tabular data. Experiments show that the SEMResNet network removes noise better than MLP; SEMRes-DDPM generates data distributions that are closer to the real data distributions than TabDDPM with CWGAN-GP; on 20 real imbalanced tabular datasets with 9 classification models, SEMRes-DDPM improves the quality of the generated tabular data in terms of three evaluation metrics (F1, G-mean, AUC), with better classification performance than other SOTA oversampling methods.
Abstract (reference translation): In data mining and machine learning, commonly used classification models cannot learn effectively from imbalanced data. To balance the data distribution before model training, oversampling methods are often used to generate data for the minority classes and so address the problem of classifying imbalanced data. Most classical oversampling methods are based on the SMOTE technique, which focuses only on local information in the data, so the generated data may not be realistic enough. Among current oversampling methods based on generative networks, GAN-based methods can capture the true data distribution but suffer from mode collapse and training instability; in oversampling methods based on denoising diffusion probabilistic models, the U-Net used as the neural network of the reverse diffusion process is not applicable to tabular data, and although an MLP can replace the U-Net, its simple structure leads to poor noise removal. To overcome these problems, we propose a new oversampling method, SEMRes-DDPM, whose reverse diffusion process uses a new neural network structure, SEMST-ResNet, that is suitable for tabular data, removes noise well, and can generate high-quality tabular data. SEMRes-DDPM generates data distributions closer to the real data distributions than TabDDPM with CWGAN-GP, and on 20 imbalanced tabular datasets with 9 classification models it improves the quality of the generated tabular data in terms of three evaluation metrics (F1, G-mean, AUC), showing better classification performance than other SOTA oversampling methods.
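
The abstract contrasts SEMRes-DDPM with SMOTE-style oversampling and reports G-mean among its metrics. The following is a minimal sketch of both ideas, not the paper's method: the function names `smote_like_oversample` and `g_mean` are hypothetical, the interpolation is a simplified SMOTE variant, and the metric assumes binary labels with 1 as the minority class and at least one sample of each class.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Assumes X_min holds at least two minority samples (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per sample
    base = rng.integers(0, n, size=n_new)  # random base samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    # Each new point lies on the segment between a sample and one neighbour,
    # which is why the method only uses local information.
    return X_min[base] + lam * (X_min[picked] - X_min[base])


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = minority class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == 1)  # recall on the minority class
    specificity = np.mean(y_pred[neg] == 0)  # recall on the majority class
    return float(np.sqrt(sensitivity * specificity))
```

Because every synthetic point is a convex combination of two nearby minority samples, the generated data never leaves the local neighbourhood of existing points, which is the limitation the abstract attributes to SMOTE-based methods. G-mean drops to zero whenever either class is entirely misclassified, so it penalises classifiers that ignore the minority class.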

Related papers

Federated Learning for Diffusion Models [12.46092849473786]
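Diffusion models are powerful generative models that can produce highly realistic samples for a variety of tasks. We propose FedDDPM, federated learning with diffusion probabilistic models. We provide a rigorous convergence analysis of FedDDPM and propose FedDDPM+, an enhanced algorithm that reduces training overhead.
Paper reference translation (metadata) (2025-03-09T03:41:10Z)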
DispFormer: Pretrained Transformer for Flexible Dispersion Curve Inversion from Global Synthesis to Regional Applications [59.488352977043974]
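This study proposes DispFormer, a transformer-based neural network that inverts $v_s$ profiles from Rayleigh-wave phase and group dispersion curves. Results show that, even without labeled data, zero-shot DispFormer produces inversion profiles that agree well with the ground truth.
Paper reference translation (metadata) (2025-01-08T09:08:24Z)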
DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model [9.908561639396273]
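We propose DiffImpute, a novel denoising diffusion probabilistic model (DDPM). It generates credible imputations for missing entries without compromising the authenticity of the existing data. It can be applied to various settings of Missing Completely At Random (MCAR) and Missing At Random (MAR).
Paper reference translation (metadata) (2024-03-20T08:45:31Z)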
Learning with Noisy Foundation Models [95.50968225050012]
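This paper is the first study to comprehensively understand and analyze the nature of noise in pre-training datasets. To mitigate the detrimental effects of noise and improve generalization, we propose a tuning method (NMTune) that adapts the feature space.
Paper reference translation (metadata) (2024-03-11T16:22:41Z)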
Classification Diffusion Models: Revitalizing Density Ratio Estimation [21.264497139730473]
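Classification diffusion models (CDMs) are a generative method based on density ratio estimation (DRE) that adopts the formalism of denoising diffusion models. The proposed method is the first DRE-based technique to generate images beyond the MNIST dataset.
Paper reference translation (metadata) (2024-02-15T16:49:42Z)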
UDPM: Upsampling Diffusion Probabilistic Models [33.51145642279836]
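Denoising Diffusion Probabilistic Models (DDPMs) have attracted much attention in recent years. DDPMs generate high-quality samples from complex data distributions by defining a reverse process. Unlike generative adversarial networks (GANs), the latent space of diffusion models is not interpretable. This work proposes generalizing the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM).
Paper reference translation (metadata) (2023-05-25T17:25:14Z)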
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
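Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. This paper proposes a model-agnostic framework, the Reweighted Score function (ReScore), that improves causal discovery performance by dynamically learning adaptive sample weights.
Paper reference translation (metadata) (2023-03-06T14:49:59Z)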
MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
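Cross-domain keypoint detection methods always require access to the source data during adaptation. This paper considers source-free domain adaptive keypoint detection, in which only a well-trained source model is provided to the target domain.
Paper reference translation (metadata) (2023-02-09T12:06:08Z)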
Denoising diffusion models for out-of-distribution detection [2.113925122479677]
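We exploit denoising diffusion probabilistic models (DDPMs) as denoising autoencoders. We use DDPMs to reconstruct inputs that have been noised to a range of noise levels, and use the resulting multi-dimensional reconstruction error to classify out-of-distribution inputs.
Paper reference translation (metadata) (2022-11-14T20:35:11Z)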
Learning from aggregated data with a maximum entropy model [73.63512438583375]
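We show that, by approximating the unobserved feature distribution with a maximum entropy hypothesis, a new model similar to logistic regression can be learned from aggregated data alone. We present empirical evidence on several public datasets that a model learned this way can achieve performance comparable to a logistic model trained on the full unaggregated data.
Paper reference translation (metadata) (2022-10-05T09:17:27Z)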
Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks [0.1074267520911262]
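Imbalanced data (ID) is a problem that prevents machine learning (ML) models from achieving satisfactory results. This paper examines the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). To obtain reliable results, the experiments were run 100 times with randomly shuffled data distributions.
Paper reference translation (metadata) (2022-09-01T07:42:16Z)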
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
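Existing models are trained on clean data, which creates a gap between clean-data training and real-world inference. This paper proposes a domain adaptation method in which both high-quality and low-quality samples are embedded into a similar vector space. Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) show that the method not only outperforms baseline models on a real-world (noisy) corpus but also improves robustness, that is, it produces high-quality results in noisy environments.
Paper reference translation (metadata) (2021-04-13T17:54:33Z)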
Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
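We propose a provable method (ABSGD) for addressing data imbalance and label noise in deep learning. The method is a simple modification of momentum SGD that assigns an individual weight to each sample. ABSGD is flexible enough to be combined with other robust losses at no additional cost.
Paper reference translation (metadata) (2020-12-13T03:41:52Z)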

The related papers list is generated automatically from the titles and abstracts of papers on this site.

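This is information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).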