Fugu-MT: Translations of arXiv Papers

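This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.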

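The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).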

Papers with a publication date of 2023-12-10.

Title	Authors	Abstract	Publication date / translation date
# Blind Evaluation Framework for Fully Homomorphic Encryption and Privacy-Preserving Machine Learning ( http://arxiv.org/abs/2310.13140v2 ) License: see the linked page	Hunjae Lee, Corey Clark	Various approaches to privacy-preserving machine learning (PPML) using Fully Homomorphic Encryption (FHE) have been developed, focusing on secure data outsourcing to untrusted servers by data owners. While FHE enables computation on encrypted data, it faces significant limitations, particularly in integrating control structures like decision expressions and conditional statements, which are vital in standard programming. For instance, tasks like selecting the smallest value from an encrypted list for feature selection in decision trees are challenging due to the inability to evaluate comparison expressions in encrypted form. Most existing literature on FHE has concentrated on encrypted prediction using pre-trained models due to these challenges, with training processes often requiring Intermediate Rounds of Decryption and Evaluation (IRDE). IRDE involves interactive communication where the potentially untrusted server performs encrypted computations, while the client handles control structures by decrypting and evaluating data in plaintext. While it presents a solution to the control structure problem in encrypted programming, IRDE protocols go against FHE's principles of building truly encrypted programs, as portions of such programs must leave the encrypted space (untrusted server) and be executed on the trusted client, who holds the private keys for decryption. Such models, however efficiently they can be made, would be inferior to models that eliminate the need for IRDE altogether. The ability to remove IRDE allows both computation and control structures to be performed on untrusted servers without requiring trusted clients for multiple IRDE cycles. This paper introduces the Blind Evaluation Framework (BEF), a cryptographically secure programming framework enabling the execution of control structures in encrypted space without evaluating conditional expressions...	Translated: 2024-03-19 01:54:08 Published: 2023-12-10
# MuFuzz: Sequence-Aware Mutation and Seed Mask Guidance for Blockchain Smart Contract Fuzzing ( http://arxiv.org/abs/2312.04512v2 ) License: see the linked page	Peng Qian, Hanjie Wu, Zeren Du, Turan Vural, Dazhong Rong, Zheng Cao, Lun Zhang, Yanbin Wang, Jianhai Chen, Qinming He	As blockchain smart contracts become more widespread and carry more valuable digital assets, they become an increasingly attractive target for attackers. Over the past few years, smart contracts have been subject to a plethora of devastating attacks, resulting in billions of dollars in financial losses. There has been a notable surge of research interest in identifying defects in smart contracts. However, existing smart contract fuzzing tools are still unsatisfactory. They struggle to screen out meaningful transaction sequences and specify critical inputs for each transaction. As a result, they can only trigger a limited range of contract states, making it difficult to unveil complicated vulnerabilities hidden in the deep state space. In this paper, we shed light on smart contract fuzzing by employing a sequence-aware mutation and seed mask guidance strategy. In particular, we first utilize data-flow-based feedback to determine transaction orders in a meaningful way and further introduce a sequence-aware mutation technique to explore deeper states. Thereafter, we design a mask-guided seed mutation strategy that biases the generated transaction inputs to hit target branches. In addition, we develop a dynamic-adaptive energy adjustment paradigm that balances the fuzzing resource allocation during a fuzzing campaign. We implement our designs into a new smart contract fuzzer named MuFuzz, and extensively evaluate it on three benchmarks. Empirical results demonstrate that MuFuzz outperforms existing tools in terms of both branch coverage and bug finding. Overall, MuFuzz achieves higher branch coverage than state-of-the-art fuzzers (up to 25%) and detects 30% more bugs than existing bug detectors.	Translated: 2024-03-18 12:56:06 Published: 2023-12-10
# Vivisecting the Dissection: On the Role of Trusted Components in BFT Protocols ( http://arxiv.org/abs/2312.05714v1 ) License: see the linked page	Alysson Bessani, Miguel Correia, Tobias Distler, Rüdiger Kapitza, Paulo Esteves-Verissimo, Jiangshan Yu	A recent paper by Gupta et al. (EuroSys'23) challenged the usefulness of trusted component (TC) based Byzantine fault-tolerant (BFT) protocols to lower the replica group size from $3f+1$ to $2f+1$, identifying three limitations of such protocols and proposing that TCs should be used instead to improve the performance of BFT protocols. Here, we point out flaws in both arguments and advocate that the most worthwhile use of TCs in BFT protocols is indeed to make them as resilient as crash fault-tolerant (CFT) protocols, which can tolerate up to $f$ faulty replicas using $2f+1$ replicas.	Translated: 2024-03-18 12:46:22 Published: 2023-12-10
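The replica-group sizes quoted in the abstract above ($3f+1$ without a trusted component, $2f+1$ with one) can be made concrete with a small sketch. This is only an illustration of the arithmetic, not code from the paper; the function names `replicas_needed` and `max_faults` are hypothetical.

```python
def replicas_needed(f: int, trusted_component: bool) -> int:
    """Minimum replica group size needed to tolerate f faulty replicas.

    Assumes the bounds stated in the abstract: 3f+1 for plain BFT,
    2f+1 when a trusted component (TC) is available (as for CFT).
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if trusted_component else 3 * f + 1


def max_faults(n: int, trusted_component: bool) -> int:
    """Largest f that a group of n replicas can tolerate under the same bounds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 2 if trusted_component else (n - 1) // 3


# Group sizes for a few fault budgets (hypothetical illustration).
table = [(f, replicas_needed(f, False), replicas_needed(f, True)) for f in (1, 2, 3)]
```

With a fixed budget of seven replicas, `max_faults(7, False)` and `max_faults(7, True)` show the resilience difference that the abstract argues is the most worthwhile use of TCs.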
# Fast Internet Computer Consensus ( http://arxiv.org/abs/2312.05869v1 ) License: see the linked page	Massimo Albarello, Jakub Sliwinski, Yann Vonlanthen, Roger Wattenhofer	This paper presents the first rotating leader state machine replication (SMR) protocol that allows transactions to be confirmed in just a single round-trip time in the Byzantine fault tolerance (BFT) setting. Based on minimal alterations to the Internet Computer Consensus (ICC) protocol and with negligible communication overhead, we introduce a novel dual mode mechanism that enables optimal block finalization latency in the fast path. Crucially, the modes of operation are integrated, such that even if the fast path is not effective, no penalties are incurred. Moreover, our algorithm maintains the core attributes of the original ICC protocol, including optimistic responsiveness and rotating leaders without the necessity for a view-change protocol. We prove the correctness of our Fast Internet Computer Consensus (FICC) protocol and provide an open-source implementation of it. Both the FICC and original ICC protocol are compared in a globally distributed wide-area network. Our evaluation reveals that the FICC protocol achieves reduced latency compared to the ICC protocol, without requiring additional security assumptions. Furthermore, by increasing the number of replicas to $n = 5f + 1$, we exhibit that latency improvements close to the theoretical maximum of 33% are attainable. We conclude by highlighting the network topology as a significant factor in evaluating and comparing the latency of consensus algorithms.	Translated: 2024-03-18 12:46:22 Published: 2023-12-10
# TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining ( http://arxiv.org/abs/2312.07575v1 ) License: see the linked page	Mohammad Mamun, Scott Buffett	Audit logs containing system-level events are frequently used for behavior modeling as they can provide detailed insight into cyber-threat occurrences. However, mapping low-level system events in audit logs to high-level behaviors has been a major challenge in identifying host contextual behavior for the purpose of detecting potential cyber threats. Relying on domain expert knowledge may limit its practical implementation. This paper presents TapTree, an automated process-tree based technique to extract host behavior by compiling system events' semantic information. After extracting behaviors as system-generated process trees, TapTree integrates event semantics as a representation of behaviors. To further reduce pattern matching workloads for the analyst, TapTree aggregates semantically equivalent patterns and optimizes representative behaviors. In our evaluation against a recent benchmark audit log dataset (DARPA OpTC), TapTree employs tree pattern queries and sequential pattern mining techniques to deduce the semantics of connected system events, achieving high accuracy for behavior abstraction and then Advanced Persistent Threat (APT) attack detection. Moreover, we illustrate how to update the baseline model gradually online, allowing it to adapt to new log patterns over time.	Translated: 2024-03-18 12:26:52 Published: 2023-12-10
# Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning ( http://arxiv.org/abs/2401.08632v1 ) License: see the linked page	Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully	A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.	Translated: 2024-01-22 09:51:18 Published: 2023-12-10
# The Conquest of Quantum Genetic Algorithms: The Adventure to Cross the Valley of Death ( http://arxiv.org/abs/2401.08631v1 ) License: see the linked page	Rafael Lahoz-Beltra	In recent years, the emergence of the first quantum computers at a time when AI is undergoing a fruitful era has led many AI researchers to be tempted into adapting their algorithms to run on a quantum computer. However, in many cases the initial enthusiasm has ended in frustration, since the features and principles underlying quantum computing are very different from traditional computers. In this paper, we present a discussion of the difficulties arising when designing a quantum version of an evolutionary algorithm based on Darwin's evolutionary mechanism, the so-called genetic algorithms. The paper includes the code in both Python and QISKIT of the quantum version of one of these evolutionary algorithms, allowing the reader to experience the setbacks arising when translating a classical algorithm to its quantum version. The algorithm studied in this paper, termed RQGA (Reduced Quantum Genetic Algorithm), has been chosen as an example that clearly shows these difficulties, which are common to other AI algorithms.	Translated: 2024-01-22 09:50:53 Published: 2023-12-10
# Music Recommendation on Spotify using Deep Learning ( http://arxiv.org/abs/2312.10079v1 ) License: see the linked page	Chhavi Maheshwari	Hosting about 50 million songs and 4 billion playlists, there is an enormous amount of data generated at Spotify every single day - upwards of 600 gigabytes of data (harvard.edu). Since the algorithms that Spotify uses in recommendation systems are proprietary and confidential, code for big data analytics and recommendation can only be speculated. However, it is widely theorized that Spotify uses two main strategies to target users' playlists and personalized mixes that are infamous for their retention - exploration and exploitation (kaggle.com). This paper aims to appropriate filtering using the approach of deep learning for maximum user likeability. The architecture achieves 98.57% and 80% training and validation accuracy respectively.	Translated: 2024-01-15 13:49:40 Published: 2023-12-10
# Early ChatGPT User Portrait through the Lens of Data ( http://arxiv.org/abs/2312.10078v1 ) License: see the linked page	Yuyang Deng, Ni Zhao, Xin Huang	Since its launch, ChatGPT has achieved remarkable success as a versatile conversational AI platform, drawing millions of users worldwide and garnering widespread recognition across academic, industrial, and general communities. This paper aims to paint a portrait of early GPT users and understand how they evolved. Specific questions include their topics of interest and their potential careers, and how these change over time. We conduct a detailed analysis of real-world ChatGPT datasets with multi-turn conversations between users and ChatGPT. Through a multi-pronged approach, we quantify conversation dynamics by examining the number of turns, then gauge sentiment to understand user sentiment variations, and finally employ Latent Dirichlet Allocation (LDA) to discern overarching topics within the conversation. By understanding shifts in user demographics and interests, we aim to shed light on the changing nature of human-AI interaction and anticipate future trends in user engagement with language models.	Translated: 2024-01-15 13:49:30 Published: 2023-12-10
# The performance of multiple language models in identifying offensive language on social media ( http://arxiv.org/abs/2312.11504v1 ) License: see the linked page	Hao Li, Brandon Bennett	Text classification is an important topic in the field of natural language processing. It has been preliminarily applied in information retrieval, digital library, automatic abstracting, text filtering, word semantic discrimination and many other fields. The aim of this research is to use a variety of algorithms to test the ability to identify offensive posts and evaluate their performance against a variety of assessment methods. The motivation for this project is to reduce the harm of these languages to human censors by automating the screening of offending posts. The field is a new one, and despite much interest in the past two years, there has been no focus on the object of the offence. Through the experiment of this project, it should inspire future research on identification methods as well as identification content.	Translated: 2024-01-15 13:23:34 Published: 2023-12-10
# Speech and Text-Based Emotion Recognizer ( http://arxiv.org/abs/2312.11503v1 ) License: see the linked page	Varun Sharma	Affective computing is a field of study that focuses on developing systems and technologies that can understand, interpret, and respond to human emotions. Speech Emotion Recognition (SER), in particular, has received a lot of attention from researchers in the recent past. However, in many cases, the publicly available datasets used for training and evaluation are scarce and imbalanced across the emotion labels. In this work, we focused on building a balanced corpus from these publicly available datasets by combining them as well as employing various speech data augmentation techniques. Furthermore, we experimented with different architectures for speech emotion recognition. Our best system, a multi-modal speech- and text-based model, provides a performance of UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to the baseline algorithm performance of 119.66.	Translated: 2024-01-15 13:23:22 Published: 2023-12-10
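The UA + WA figure reported in the abstract above can be illustrated with a short sketch. It assumes the convention common in speech emotion recognition work, where WA is overall accuracy and UA is the mean of per-class recalls; the abstract itself does not spell out the definitions, so this is an assumption, and the toy labels and function names below are hypothetical.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """WA: overall accuracy in percent (each sample counts equally)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean per-class recall in percent (each class counts equally)."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    recalls = [hit[c] / total[c] for c in total]
    return 100.0 * sum(recalls) / len(recalls)


# Toy, imbalanced example: 8 "neutral" and 2 "angry" utterances.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
combined = ua + wa  # the kind of summed score the abstract reports
```

On imbalanced labels the two numbers diverge, which is presumably why both are reported: WA rewards getting the majority class right, while UA penalizes a poorly recognized minority class.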
# Consciousness as a logically consistent and prognostic model of reality ( http://arxiv.org/abs/2401.00005v1 ) License: see the linked page	Evgenii Vityaev	The work demonstrates that the brain might reflect the causal relationships of the external world in the form of a logically consistent and prognostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and provides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that the brain makes all possible inferences from causal relationships. We prove that the suggested formal model has the property of unambiguous inference: from consistent premises we infer a consistent conclusion. This enables the set of all inferences to form a consistent model of the perceived world. Causal relationships may create fixed points of cyclic inter-predictable properties. We consider the "natural" classification introduced by John Stuart Mill and demonstrate that a variety of fixed points of the objects' attributes forms a "natural" classification of the external world. Then we consider the notions of "natural" categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder, and demonstrate that fixed points of causal relationships between the objects' attributes that we perceive formalize these notions. If the "natural" classification describes the objects of the external world, and "natural" concepts describe the perception of these objects, then the theory of integrated information, introduced by G. Tononi, describes the information processes of the brain for "natural" concept formation that reflects the "natural" classification. We argue that integrated information provides high accuracy of object identification. A computer-based experiment is provided that illustrates fixed point formation for coded digits.	Translated: 2024-01-15 12:41:04 Published: 2023-12-10
# Informational non-reductionist theory of consciousness that providing maximum accuracy of reality prediction ( http://arxiv.org/abs/2401.00004v1 ) License: see the linked page	E.E. Vityaev	The paper considers a non-reductionist theory of consciousness, which is not reducible to theories of reality and to physiological or psychological theories. Following D.I. Dubrovsky's "informational approach" to the "Mind-Brain Problem", we consider reality through the prism of information about observed phenomena, which, in turn, is perceived by subjective reality through sensations, perceptions, feelings, etc., which, in turn, are information about the corresponding brain processes. Within this framework the following principle of the Information Theory of Consciousness (ITS) development is put forward: the brain discovers all possible causal relations in the external world and makes all possible inferences by them. The paper shows that ITS built on this principle: (1) is also based on the information laws of the structure of the external world; (2) explains the structure and functioning of the brain's functional systems and cellular ensembles; (3) ensures maximum accuracy of predictions and the anticipation of reality; (4) resolves emerging contradictions; and (5) is an information theory of the brain's reflection of reality.	Translated: 2024-01-15 12:40:38 Published: 2023-12-10
# Singular Value Penalization and Semantic Data Augmentation for Fully Test-Time Adaptation ( http://arxiv.org/abs/2312.08378v1 ) License: see the linked page	Houcheng Su, Daixian Liu, Mengzhu Wang, Wei Wang	Fully test-time adaptation (FTTA) adapts a model that is trained on a source domain to a target domain during the testing phase, where the two domains follow different distributions and source data is unavailable during the training phase. Existing methods usually adopt entropy minimization to reduce the uncertainty of target prediction results, and improve the FTTA performance accordingly. However, they fail to ensure the diversity in target prediction results. A recent domain adaptation study has shown that maximizing the sum of singular values of prediction results can simultaneously enhance their confidence (discriminability) and diversity. However, during the training phase, larger singular values usually take up a dominant position in loss maximization. This results in the model being more inclined to enhance discriminability for easily distinguishable classes, and the improvement in diversity is insufficiently effective. Furthermore, the adaptation and prediction in FTTA only use data from the current batch, which may lead to the risk of overfitting. To address the aforementioned issues, we propose maximizing the sum of singular values while minimizing their variance. This shifts the model's focus toward the smaller singular values, enhancing discriminability between more challenging classes and effectively increasing the diversity of prediction results. Moreover, we incorporate data from the previous batch to realize semantic data augmentation for the current batch, reducing the risk of overfitting. Extensive experiments on benchmark datasets show our proposed approach outperforms some compared state-of-the-art FTTA methods.	Translated: 2023-12-16 03:23:23 Published: 2023-12-10
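The objective described in the abstract above (maximize the sum of singular values of the batch prediction matrix while minimizing their variance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it is restricted to two classes so that the singular values can be computed in closed form with the standard library, and the weight `variance_weight` and all function names are hypothetical. A real implementation would use an SVD routine on a batch-by-classes matrix.

```python
import math


def singular_values_two_columns(rows):
    """Singular values of a B x 2 matrix A, via eigenvalues of the 2 x 2 Gram matrix A^T A."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    trace, det = a + d, a * d - b * b
    gap = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eig_hi = (trace + gap) / 2.0
    eig_lo = max((trace - gap) / 2.0, 0.0)  # clamp tiny negative round-off
    return math.sqrt(eig_hi), math.sqrt(eig_lo)


def svp_loss(rows, variance_weight=1.0):
    """Loss to minimize: negative sum of singular values plus a variance penalty."""
    s = singular_values_two_columns(rows)
    mean = sum(s) / len(s)
    variance = sum((x - mean) ** 2 for x in s) / len(s)
    return -sum(s) + variance_weight * variance


# Toy softmax outputs: confident and diverse vs. collapsed onto a single class.
diverse = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0]] * 4

loss_diverse = svp_loss(diverse)
loss_collapsed = svp_loss(collapsed)
```

In this toy case the collapsed batch has one dominant singular value and one zero, so the variance term penalizes it; the diverse batch has equal singular values and receives the lower loss, which matches the intuition the abstract gives for the penalty.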
# I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions ( http://arxiv.org/abs/2312.08869v1 ) License: see the linked page	Chengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, Lan Xu	We are living in a world surrounded by diverse and "smart" devices with rich modalities of sensing ability. Conveniently capturing the interactions between us humans and these objects remains far-reaching. In this paper, we present I'm-HOI, a monocular scheme to faithfully capture the 3D motions of both the human and object in a novel setting: using a minimal amount of RGB camera and object-mounted Inertial Measurement Unit (IMU). It combines general motion inference and category-aware refinement. For the former, we introduce a holistic human-object tracking method to fuse the IMU signals and the RGB stream and progressively recover the human motions and subsequently the companion object motions. For the latter, we tailor a category-aware motion diffusion model, which is conditioned on both the raw IMU observations and the results from the previous stage under over-parameterization representation. It significantly refines the initial results and generates vivid body, hand, and object motions. Moreover, we contribute a large dataset with ground truth human and object motions, dense RGB inputs, and rich object-mounted IMU measurements. Extensive experiments demonstrate the effectiveness of I'm-HOI under a hybrid capture setting. Our dataset and code will be released to the community.	Translated: 2023-12-15 22:37:52 Published: 2023-12-10
# Semi-Supervised Segmentation of Functional Tissue Units at the Cellular Level ( http://arxiv.org/abs/2305.02148v2 ) License: see the linked page	Volodymyr Sydorskyi, Igor Krashenyi, Denis Sakva and Oleksandr Zarichkovyi	We present a new method for functional tissue unit segmentation at the cellular level, which utilizes the latest deep learning semantic segmentation approaches together with domain adaptation and semi-supervised learning techniques. This approach allows for minimizing the domain gap, class imbalance, and the influence of capture settings between the HPA and HubMAP datasets. The presented approach achieves results comparable with the state of the art in functional tissue unit segmentation at the cellular level. The source code is available at https://github.com/VSydorskyy/hubmap_2022_htt_solution	Translated: 2023-12-14 20:49:12 Published: 2023-12-10
# 一般化グラフプロンプト:グラフ上の事前学習とダウンストリームタスクの統合に向けて Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs ( http://arxiv.org/abs/2311.15317v2 ) ライセンス: Link先を確認	Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen and Xinming Zhang	(参考訳) グラフニューラルネットワークはグラフ表現学習の強力なツールとして登場したが、そのパフォーマンスはタスク固有の監督に大きく依存している。ラベル付け要求を減らすため、"pre-train, prompt"パラダイムはますます一般的になっている。しかしながら、グラフ上でのプロンプトに関する既存の研究は限定的であり、異なる下流タスクにアピールするための普遍的な治療法が欠如している。本稿では,グラフの事前学習と促進のための新しいフレームワークであるGraphPromptを提案する。 graphpromptは、事前トレーニングとダウンストリームのタスクを共通のタスクテンプレートに統合するだけでなく、学習可能なプロンプトを使用して、事前トレーニングされたモデルから最も関連する知識をタスク固有の方法で特定する。この2つのステージでGraphPromptをさらに強化するために、GraphPrompt+に2つの大きな拡張を加えました。まず、単純なリンク予測以上のグラフ事前学習タスクを一般化し、タスクテンプレートとの互換性を広げる。次に,事前学習したグラフエンコーダの各層に一連のプロンプトベクトルを組み込んだ,より一般化されたプロンプト設計を提案する。最後に、GraphPromptとGraphPrompt+を評価し分析するために、5つの公開データセットに関する広範な実験を行う。 Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce labeling requirement, the "pre-train, prompt" paradigms have become increasingly common. However, existing study of prompting on graphs is limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. 
Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.	翻訳日:2023-12-14 20:04:32 公開日:2023-12-10
# アルゴリズムガバナンスにおけるフレキシビリティ向上のためのスマートハイブリッド契約の利用について On the Use of Smart Hybrid Contracts to Provide Flexibility in Algorithmic Governance ( http://arxiv.org/abs/2312.07565v1 ) ライセンス: Link先を確認	Carlos Molina-Jimenez and Sandra Milena Felizia	(参考訳) 法律の施行を自動化するためのコンピュータ技術の利用は、官僚的な手続きを単純化するための有望な手段である。しかし、不注意な自動化は、個人やマイノリティの特質を考慮しないアルゴリズムによって駆動される、柔軟性に欠けた非人間的な法執行システムをもたらす可能性がある。本稿では,規制を盲目的に強制するのではなく監視するためにデプロイされたハイブリッドスマートコントラクトが,柔軟性の向上に利用できることを論じる。強制は予防が厳密に必要な場合にのみ適切な選択肢であるが,多くの状況では監視に基づく是正的アプローチの方が柔軟で適切であると論じる。柔軟性をさらに高めるために、ハイブリッドスマートコントラクトは、人間の判断が必要なときに一時停止し、一人または複数の人間の介入を要求するようにプログラムすることができる。 The use of computer technology to automate the enforcement of law is a promising way to simplify bureaucratic procedures. However, careless automation might result in an inflexible and dehumanised law enforcement system driven by algorithms that do not account for the particularities of individuals or minorities. In this paper, we argue that hybrid smart contracts deployed to monitor rather than to blindly enforce regulations can be used to add flexibility. Enforcement is a suitable alternative only when prevention is strictly necessary; however, we argue that in many situations a corrective approach based on monitoring is more flexible and suitable. To add more flexibility, the hybrid smart contract can be programmed to pause and request the intervention of a human, or of a group of humans, when human judgement is needed.	翻訳日:2023-12-14 18:18:40 公開日:2023-12-10
# 動的非エルミート皮膚効果の観察 Observation of dynamic non-Hermitian skin effects ( http://arxiv.org/abs/2312.07564v1 ) ライセンス: Link先を確認	Zhen Li, Li-Wei Wang, Xulong Wang, Zhi-Kang Lin, Guancong Ma, and Jian-Hua Jiang	(参考訳) 非エルミート効果は、非平衡系の理解を大きく変える物質の位相操作の新しいパラダイムとして登場し、例外点やスペクトルトポロジーといった新しい概念や、非エルミート皮膚効果(nhses)のようなエキゾチックな現象を導入している。しかしながら、既存のほとんどの研究は非エルミート固有状態に焦点を当てているが、非エルミート系の動的性質は、波動自己修復、カイラルゼナートンネル、実験ではまだ確認されていない動的NHSEなどの予期せぬ現象を予測して、ごく最近まで議論されてきた。本稿では, 波長可変な一次元非共役二重鎖力学系を用いて, リッチな非エルミート系皮膚力学を初めて実験的に観察した。注目すべきは、動的NHSEは異なる動的相の様々な動的挙動で観察され、一般化されたブリルアンゾーンと関連する概念を通して理解できるこれらの相の興味深い性質を明らかにすることである。さらに、観測された波長可変非エルミート皮膚のダイナミックスと増幅、バルク一方向波伝播、および境界波トラップは、制御可能でロバストな方法で波を誘導し、トラップし、増幅する有望な方法を提供する。本研究は, 物質非平衡相の研究を融合させ, 情報処理に新たな応用をもたらす非エルミタン動力学への新たな道を開くことを目的とした。 Non-Hermitian effects have emerged as a new paradigm for the manipulation of phases of matter that profoundly changes our understanding of non-equilibrium systems, introducing novel concepts such as exceptional points and spectral topology, as well as exotic phenomena such as non-Hermitian skin effects (NHSEs). Most existing studies, however, focus on non-Hermitian eigenstates, whereas dynamic properties of non-Hermitian systems have been discussed only very recently, predicting unexpected phenomena such as wave self-healing, chiral Zener tunneling, and the dynamic NHSEs that are not yet confirmed in experiments. Here, we report the first experimental observation of rich non-Hermitian skin dynamics using tunable one-dimensional nonreciprocal double-chain mechanical systems with glide-time symmetry. Remarkably, dynamic NHSEs are observed with various dynamic behaviors in different dynamic phases, revealing the intriguing nature of these phases that can be understood via the generalized Brillouin zone and the related concepts. 
Moreover, the observed tunable non-Hermitian skin dynamics and amplifications, the bulk unidirectional wave propagation, and the boundary wave trapping provide promising ways to guide, trap, and amplify waves in a controllable and robust way. Our findings unveil the fundamental aspects and open a new pathway toward non-Hermitian dynamics, which will fertilize the study of non-equilibrium phases of matter and give rise to novel applications in information processing.	翻訳日:2023-12-14 18:18:26 公開日:2023-12-10
# メタモデルと文法の共進化のための自動支援に向けて Towards Automated Support for the Co-Evolution of Meta-Models and Grammars ( http://arxiv.org/abs/2312.07582v1 ) ライセンス: Link先を確認	Weixing Zhang	(参考訳) ブレンドモデリングは、同じ基礎となるモデリング言語のための複数の表記間のシームレスな相互作用を含む新興パラダイムである。我々はメタモデルに基づくモデル駆動工学(MDE)アプローチに注目し,モデリングツールのブレンドモデリング機能を改善するためにテキスト言語を開発する。本稿ではメタモデルに基づくMDE設定において,言語技術者がテキスト言語を開発する際に,メタモデルと文法の共進化を支援する手法を提案する。まず,混合モデリングをサポートするモデリングツールの課題と限界を総合的に報告し,その改善の機会について報告する。第2に,言語技術者が必要に応じてXtextのジェネレータ機能を拡張できることを実証する。第3に,生成文法を持つ言語をpython型言語に変換する半自動的手法を提案する。最後に、異なるスタイルの言語の迅速なプロトタイピングと、進化する言語のメタモデルと文法の共進化をサポートするソリューション(グラマー最適化)を提供する。 Blended modeling is an emerging paradigm involving seamless interaction between multiple notations for the same underlying modeling language. We focus on a model-driven engineering (MDE) approach based on meta-models to develop textual languages to improve the blended modeling capabilities of modeling tools. In this thesis, we propose an approach that can support the co-evolution of meta-models and grammars as language engineers develop textual languages in a meta-model-based MDE setting. Firstly, we comprehensively report on the challenges and limitations of modeling tools that support blended modeling, as well as opportunities to improve them. Second, we demonstrate how language engineers can extend Xtext's generator capabilities according to their needs. Third, we propose a semi-automatic method to transform a language with a generated grammar into a Python-style language. Finally, we provide a solution (i.e., GrammarOptimizer) that can support rapid prototyping of languages in different styles and the co-evolution of meta-models and grammars of evolving languages.	翻訳日:2023-12-14 18:12:15 公開日:2023-12-10
# スライス処理技術とct画像からのxception分類器を用いたcovid-19検出 COVID-19 Detection Using Slices Processing Techniques and a Modified Xception Classifier from Computed Tomography Images ( http://arxiv.org/abs/2312.07580v1 ) ライセンス: Link先を確認	Kenan Morani	(参考訳) 本稿では,従来の診断方法を拡張し,CT画像からCOVID-19を検出する方法を提案する。モデル誤分類を減らすために、画像処理の2つの重要なステップが採用された。まず、上側と下側のスライスが取り除かれ、各患者のスライスの60%が保存された。第2に、全てのスライスは肺領域を強調するために手作業で切り刻みを行った。その後、Xception Transfer Learning Modelに再サイズのCTスキャン(224×224)を入力した。 Xceptionのアーキテクチャと事前訓練された重量を活用して、修正されたモデルはバイナリ分類を実現した。 COV19-CTデータベースで得られた結果から, 従来のソリューションと同一データセットの代替品と比較して, スライスレベルと患者レベルのマクロF1スコアが高かった。 This paper extends our previous method for COVID-19 diagnosis, proposing an enhanced solution for detecting COVID-19 from computed tomography (CT) images. To decrease model misclassifications, two key steps of image processing were employed. Firstly, the uppermost and lowermost slices were removed, preserving sixty percent of each patient's slices. Secondly, all slices underwent manual cropping to emphasize the lung areas. Subsequently, resized CT scans (224 by 224) were input into an Xception transfer learning model. Leveraging Xception's architecture and pre-trained weights, the modified model achieved binary classification. Promising results on the COV19-CT database showcased higher validation accuracy and macro F1 score at both the slice and patient levels compared to our previous solution and alternatives on the same dataset.	翻訳日:2023-12-14 18:11:44 公開日:2023-12-10
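The first preprocessing step in the abstract above (dropping the uppermost and lowermost slices and keeping sixty percent of each patient's slices) can be sketched as follows. This is only an illustrative sketch: the abstract does not say how the removed slices are split between top and bottom, so a symmetric trim is assumed here, and the function name `keep_central_slices` is hypothetical rather than taken from the authors' code.

```python
def keep_central_slices(slices, keep_fraction=0.6):
    """Return the central `keep_fraction` of a patient's slice list.

    Assumption: slices are ordered top to bottom and the trimmed
    slices are removed evenly from both ends.
    """
    n = len(slices)
    if n == 0:
        return []
    n_keep = max(1, round(n * keep_fraction))
    start = (n - n_keep) // 2          # number of slices dropped at the top
    return slices[start:start + n_keep]


# Stand-in for a 100-slice CT volume: indices instead of image arrays.
scan = list(range(100))
central = keep_central_slices(scan)
print(len(central), central[0], central[-1])
```

Working on slice indices keeps the sketch independent of any imaging library; in practice the same slicing would be applied to the list of image arrays before cropping and resizing.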
# 価値アライメント戦略としての脳から機械への交叉受精共感 Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy ( http://arxiv.org/abs/2312.07579v1 ) ライセンス: Link先を確認	Devin Gonier, Adrian Adduci, Cassidy LoCascio	(参考訳) AIアライメント研究は、マシンによる独立したアクションが常に倫理的であることを保証するために、人間とAIの目標を調整することを目指している。本論文は, より誘惑的なアプローチを優先してしばしば無視されるにもかかわらず, この課題に対して共感が不可欠であると主張している。我々は、倫理と共感をアルゴリズム的に理解する基盤として、脳の文脈内で道徳を基礎とする内在的アプローチを提供する。これらの議論は関連する文献の調査によって正当化される。この論文は、今後の研究といくつかの実験的な観察に対する提案された実験的アプローチで締めくくられる。 AI Alignment research seeks to align human and AI goals to ensure independent actions by a machine are always ethical. This paper argues empathy is necessary for this task, despite being often neglected in favor of more deductive approaches. We offer an inside-out approach that grounds morality within the context of the brain as a basis for algorithmically understanding ethics and empathy. These arguments are justified via a survey of relevant literature. The paper concludes with a suggested experimental approach to future research and some initial experimental observations.	翻訳日:2023-12-14 18:10:49 公開日:2023-12-10
# テーブルシフトを用いたタブラルデータのベンチマーク分布シフト Benchmarking Distribution Shift in Tabular Data with TableShift ( http://arxiv.org/abs/2312.07577v1 ) ライセンス: Link先を確認	Josh Gardner, Zoran Popovic, Ludwig Schmidt	(参考訳) 分散シフトに対するロバスト性は、研究対象から現実世界への展開への移行に伴って、テキストや画像モデルに対する関心が高まっている。しかし、表型データの普及や、テキストや画像と比較して表型データに使用するモデルの違いにもかかわらず、表型機械学習タスクの分散シフトのための高品質なベンチマークはいまだに欠落している。その結果,分布シフトに対する表モデルのロバスト性はよく分かっていない。この問題に対処するため,表データの分散シフトベンチマークであるTableShiftを導入する。 TableShiftには15のバイナリ分類タスクがあり、それぞれに関連するシフトがあり、さまざまなデータソース、予測ターゲット、分散シフトが含まれている。このベンチマークは、ファイナンス、教育、公共政策、医療、市民参加を含むドメインをカバーしており、TableShift API経由でわずか数行のPythonコードでアクセスできる。ベンチマークタスクにおける頑健な学習法とドメイン一般化法とともに、最先端の表型データモデルを比較した大規模な研究を行う。本研究は,(1)分布内(ID)と分布外(OOD)の精度の線形傾向,(2)ドメインの堅牢性はシフトギャップを低減できるが,IDの精度の低減は可能であること,(3)シフトギャップ(IDとOODのパフォーマンスの差)とラベル分布のシフトとの強い関係を示す。ベンチマークデータ、pythonパッケージ、モデル実装、およびtableshiftに関するさらなる情報は、https://github.com/mlfoundations/tableshiftおよびhttps://tableshift.orgで入手できる。 Robustness to distribution shift has become a growing concern for text and image models as they transition from research subjects to deployment in the real world. However, high-quality benchmarks for distribution shift in tabular machine learning tasks are still lacking despite the widespread real-world use of tabular data and differences in the models used for tabular data in comparison to text and images. As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. TableShift contains 15 binary classification tasks in total, each with an associated shift, and includes a diverse set of data sources, prediction targets, and distribution shifts. The benchmark covers domains including finance, education, public policy, healthcare, and civic participation, and is accessible using only a few lines of Python code via the TableShift API. 
We conduct a large-scale study comparing several state-of-the-art tabular data models alongside robust learning and domain generalization methods on the benchmark tasks. Our study demonstrates (1) a linear trend between in-distribution (ID) and out-of-distribution (OOD) accuracy; (2) domain robustness methods can reduce shift gaps but at the cost of reduced ID accuracy; (3) a strong relationship between shift gap (difference between ID and OOD performance) and shifts in the label distribution. The benchmark data, Python package, model implementations, and more information about TableShift are available at https://github.com/mlfoundations/tableshift and https://tableshift.org .	翻訳日:2023-12-14 18:10:34 公開日:2023-12-10
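The abstract above defines the shift gap as the difference between ID and OOD performance and reports a linear trend between ID and OOD accuracy. The sketch below shows both quantities on made-up numbers. It does not use the TableShift API; the accuracy values and the helper names `shift_gap` and `fit_line` are hypothetical and only illustrate the definitions.

```python
def shift_gap(id_accuracy, ood_accuracy):
    """Shift gap: in-distribution accuracy minus out-of-distribution accuracy."""
    return id_accuracy - ood_accuracy


def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (ID, OOD) accuracies for three models on one task.
id_acc = [0.70, 0.80, 0.90]
ood_acc = [0.58, 0.67, 0.76]

gaps = [shift_gap(i, o) for i, o in zip(id_acc, ood_acc)]
slope, intercept = fit_line(id_acc, ood_acc)
print(gaps, slope, intercept)
```

A slope close to one with a negative intercept would correspond to a roughly constant shift gap across models, while a slope below one means the gap widens as ID accuracy grows.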
# アラビア語手書きテキスト行データセット Arabic Handwritten Text Line Dataset ( http://arxiv.org/abs/2312.07573v1 ) ライセンス: Link先を確認	Hakim Bouchal and Ahror Belaid	(参考訳) アラビア語の写本をテキスト行や単語に分割することは、認識システムをより効率的かつ正確にするための重要なステップである。テキスト行へのセグメンテーションの問題は、このタスク専用の丁寧に注釈付けされたデータセットが存在するため解決されている。しかし、私たちの知る限りでは、アラビア語テキストの単語位置を注釈付けしたデータセットは存在しない。本稿では,単語レベルで位置を注釈付けした、歴史的アラビア文字向けに特別に設計された新しいデータセットを提案する。 Segmentation of Arabic manuscripts into lines of text and words is an important step to make recognition systems more efficient and accurate. The problem of segmentation into text lines is solved since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there are no datasets annotating the word position of Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate position at the word level.	翻訳日:2023-12-14 18:09:49 公開日:2023-12-10
# 視覚障害者の屋外障害物検出に向けたYOLOモデルの検討 Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People ( http://arxiv.org/abs/2312.07571v1 ) ライセンス: Link先を確認	Chenhao He and Pramit Saha	(参考訳) 深層学習に基づく物体検出の利用は、視覚障害者の障害物回避を支援する効果的なアプローチである。本稿では,7種類のYOLOオブジェクト検出モデル,すなわちYOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, YOLOv5を実装し, 慎重に調整したハイパーパラメータを用いて包括的評価を行い, 道路や歩道に存在する日常的な物体を含む画像に対して, これらのモデルがどのように動作するかを分析した。系統的な調査の結果、YOLOv8が最良のモデルであることが判明し、この分野の研究者が収集した画像とともにVOCデータセット、COCOデータセット、TT100Kデータセットの画像を含む、よく知られたObstacle Datasetにおいて、精度80%、リコール68.2%に達した。 YOLO-NASは最新のモデルであり、他の多くのアプリケーションで優れた性能を示すにもかかわらず、障害物検出タスクには最適ではないことがわかった。 The utilization of deep learning-based object detection is an effective approach to assist visually impaired individuals in avoiding obstacles. In this paper, we implemented seven different YOLO object detection models, viz., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and YOLOv5, and performed a comprehensive evaluation with carefully tuned hyperparameters to analyze how these models performed on images containing common daily-life objects present on roads and sidewalks. After a systematic investigation, YOLOv8 was found to be the best model, which reached a precision of 80% and a recall of 68.2% on a well-known Obstacle Dataset which includes images from the VOC, COCO, and TT100K datasets along with images collected by the researchers in the field. Despite being the latest model and demonstrating better performance in many other applications, YOLO-NAS was found to be suboptimal for the obstacle detection task.	翻訳日:2023-12-14 18:09:13 公開日:2023-12-10
# 符号化ストレージシステムからの量子プライベート情報検索 Quantum Private Information Retrieval from Coded Storage Systems ( http://arxiv.org/abs/2312.07570v1 ) ライセンス: Link先を確認	Matteo Allaix	(参考訳) 広範なデータ成長の時代には、データストレージシステム(dss)のような膨大なデジタル情報を保管し管理するために、堅牢で効率的なメカニズムが必要となる。同時に、プライバシに関する懸念が生じ、プライバシを保護しながらデータアクセスを可能にするPrivate Information Retrieval(PIR)のようなテクニックの開発につながっている。 PIRプロトコルは、ユーザがクエリやアクセスしているデータの詳細を明らかにすることなく、データベースから情報を取得することを可能にする。量子コンピューティングの出現により、研究者は情報検索におけるプライバシーを高めるために量子システムを使用する可能性を探った。量子プライベート情報検索 (Quantum Private Information Retrieval, QPIR) プロトコルでは、複数のサーバから量子システムをダウンロードしてデータベースから情報を取得すると同時に、アクセスされている特定の情報に対してサーバが邪魔にならないことを保証する。このシナリオは、量子システムの固有の特性を活用して、古典的なPIRプロトコルと比較して、プライバシー保証の強化と通信速度の改善を提供する。この論文では、クエリと符号化ストレージシステムが古典的であり、サーバからの応答が量子的であるqpirの設定を検討する。この問題はsongらによって複製保存と異なる結束パターンで処理された。この論文は、既知の古典的PIRプロトコルと量子通信アルゴリズムを組み合わせることで、符号化ストレージ用のQPIRプロトコルを開発し、プライバシーと通信コストを向上することを目的としている。我々は、異なる記憶符号と堅牢性仮定を検討し、達成された通信コストが常に古典的通信コストよりも低いことを証明した。 In the era of extensive data growth, robust and efficient mechanisms are needed to store and manage vast amounts of digital information, such as Data Storage Systems (DSSs). Concurrently, privacy concerns have arisen, leading to the development of techniques like Private Information Retrieval (PIR) to enable data access while preserving privacy. A PIR protocol allows users to retrieve information from a database without revealing the specifics of their query or the data they are accessing. With the advent of quantum computing, researchers have explored the potential of using quantum systems to enhance privacy in information retrieval. In a Quantum Private Information Retrieval (QPIR) protocol, a user can retrieve information from a database by downloading quantum systems from multiple servers, while ensuring that the servers remain oblivious to the specific information being accessed. 
This scenario offers a unique advantage by leveraging the inherent properties of quantum systems to provide enhanced privacy guarantees and improved communication rates compared to classical PIR protocols. In this thesis we consider the QPIR setting where the queries and the coded storage systems are classical, while the responses from the servers are quantum. This problem was treated by Song et al. for replicated storage and different collusion patterns. This thesis aims to develop QPIR protocols for coded storage by combining known classical PIR protocols with quantum communication algorithms, achieving enhanced privacy and communication costs. We consider different storage codes and robustness assumptions, and we prove that the achieved communication cost is always lower than the classical counterparts.	翻訳日:2023-12-14 18:08:38 公開日:2023-12-10
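For readers unfamiliar with PIR, the idea described in the abstract above (retrieving a record without revealing which one is requested) can be illustrated with the textbook classical two-server scheme over a replicated database. This is a sketch of that classical baseline only, not the quantum or coded-storage protocols developed in the thesis; it assumes the two servers do not collude, and all function names are hypothetical.

```python
import secrets


def make_queries(n_records, wanted_index):
    """Build two bit-vector queries that differ only at `wanted_index`.

    Each query on its own is a uniformly random bit vector, so a single
    non-colluding server learns nothing about the wanted index.
    """
    q1 = [secrets.randbelow(2) for _ in range(n_records)]
    q2 = list(q1)
    q2[wanted_index] ^= 1
    return q1, q2


def server_answer(database, query):
    """A server XORs together the records selected by the query bits."""
    answer = 0
    for record, bit in zip(database, query):
        if bit:
            answer ^= record
    return answer


def recover(answer_1, answer_2):
    """All records selected by both queries cancel, leaving the wanted one."""
    return answer_1 ^ answer_2


# Both servers hold the same replicated database of integer records.
database = [0x11, 0x22, 0x33, 0x44]
q1, q2 = make_queries(len(database), wanted_index=2)
record = recover(server_answer(database, q1), server_answer(database, q2))
print(hex(record))
```

The user downloads one record-sized answer from each server to obtain one record, which is the kind of communication cost that PIR rate comparisons, classical or quantum, are concerned with.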
# 単眼深度推定のためのニューラルネットワーク構造の一般性に関する研究 A Study on the Generality of Neural Network Structures for Monocular Depth Estimation ( http://arxiv.org/abs/2301.03169v3 ) ライセンス: Link先を確認	Jinwoo Bae and Kyumin Hwang and Sunghoon Im	(参考訳) 単眼深度推定は広く研究されており、近年は性能が大幅に向上している。しかしながら、以前の研究の多くはKITTIのような少数のベンチマークデータセットで評価されており、いずれの論文も単眼深度推定の一般化性能の詳細な分析を提供していない。本稿では,単眼深度推定の一般化に向けて,様々なバックボーンネットワーク(CNNやTransformerモデルなど)について深く検討する。まず,分布内データセットと,ネットワークの学習中に一度も見られなかった分布外データセットの両方で最先端モデルを評価する。次に,合成テクスチャシフトデータセットを用いて,CNNベースおよびTransformerベースのモデルの中間層から得られる表現の内部特性について検討する。広範な実験により,Transformerは強い形状バイアスを示し,CNNは強いテクスチャバイアスを示すことが明らかとなった。また,テクスチャバイアスモデルでは,形状バイアスモデルよりも単眼深度推定の一般化性能が劣ることがわかった。我々は、様々な環境下でキャプチャされた実世界の運転データセットで、同様の側面が観察されることを示した。最後に,現代の戦略に活用される各種バックボーンネットワークを用いた詳細なアブレーション研究を行った。実験により, CNNの固有の局所性とTransformerの自己注意が,それぞれテクスチャバイアスと形状バイアスを引き起こすことが示された。 Monocular depth estimation has been widely studied, and significant improvements in performance have been recently reported. However, most previous works are evaluated on a few benchmark datasets, such as KITTI, and none of the works provide an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we investigate in depth various backbone networks (e.g., CNN and Transformer models) toward the generalization of monocular depth estimation. First, we evaluate state-of-the-art models on both in-distribution and out-of-distribution datasets, the latter of which have never been seen during network training. Then, we investigate the internal properties of the representations from the intermediate layers of CNN-/Transformer-based models using synthetic texture-shifted datasets. Through extensive experiments, we observe that the Transformers exhibit a strong shape-bias, whereas CNNs have a strong texture-bias. We also discover that texture-biased models exhibit worse generalization performance for monocular depth estimation than shape-biased models. 
We demonstrate that similar aspects are observed in real-world driving datasets captured under diverse environments. Lastly, we conduct a dense ablation study with various backbone networks which are utilized in modern strategies. The experiments demonstrate that the intrinsic locality of the CNNs and the self-attention of the Transformers induce texture-bias and shape-bias, respectively.	翻訳日:2023-12-13 20:53:10 公開日:2023-12-10
# 安全であるべき: 分子設計のための新しい枠組み Gotta be SAFE: A New Framework for Molecular Design ( http://arxiv.org/abs/2310.10773v2 ) ライセンス: Link先を確認	Emmanuel Noutahi, Cristian Gabellini, Michael Craig, Jonathan S.C Lim, Prudencio Tossou	(参考訳) SMILESのような伝統的な分子文字列表現は、しばしばAI駆動の分子設計に挑戦する。この問題に対処するため,我々は化学構造のための新しい線記法であるシーケンシャルアタッチメントに基づくフラグメント埋め込み(safe)を導入する。 SAFEはSMILES文字列を、既存のSMILESパーサとの互換性を維持しながら、相互接続された断片ブロックの順序のないシーケンスとして再定義する。足場装飾、フラグメントリンク、ポリマー生成、足場ホッピングなどの複雑な生成タスクを合理化し、フラグメント制約設計の自己回帰生成を容易にし、複雑なデコードやグラフベースモデルの必要性をなくす。我々は,110億のSAFE表現を含むデータセット上で,8700万パラメータのGPT2ライクなモデルをトレーニングすることにより,SAFEの有効性を示す。対象とする実験により,我々のSAFE-GPTモデルは多目的かつ堅牢な最適化性能を示すことを示す。 SAFEは、様々な制約の下で化学空間を迅速に探索するための新しい道を開き、AI駆動の分子設計のブレークスルーを約束する。 Traditional molecular string representations, such as SMILES, often pose challenges for AI-driven molecular design due to their non-sequential depiction of molecular substructures. To address this issue, we introduce Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures. SAFE reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. It streamlines complex generative tasks, including scaffold decoration, fragment linking, polymer generation, and scaffold hopping, while facilitating autoregressive generation for fragment-constrained design, thereby eliminating the need for intricate decoding or graph-based models. We demonstrate the effectiveness of SAFE by training an 87-million-parameter GPT2-like model on a dataset containing 1.1 billion SAFE representations. Through targeted experimentation, we show that our SAFE-GPT model exhibits versatile and robust optimization performance. SAFE opens up new avenues for the rapid exploration of chemical space under various constraints, promising breakthroughs in AI-driven molecular design.	翻訳日:2023-12-13 19:32:51 公開日:2023-12-10
# LLMGA:マルチモーダル大言語モデルに基づく生成アシスタント LLMGA: Multimodal Large Language Model based Generation Assistant ( http://arxiv.org/abs/2311.16500v2 ) ライセンス: Link先を確認	Bin Xia, Shiyin Wang, Yingfan Tao, Yitong Wang, and Jiaya Jia	(参考訳) 本稿では,LLMGA(Large Language Model-based Generation Assistant)を紹介し,画像生成と編集を支援するために,LLM(Large Language Models)に固有の推論,理解,応答の膨大な知識と熟練度を活用する。 MLLM(Multimodal Large Language Models)が安定拡散(SD)を制御するための固定サイズ埋め込みを生成する既存のアプローチから切り離され、LSMGAはSDを正確に制御するための詳細な言語生成プロンプトを提供する。これは、llmのコンテキスト理解を増強するだけでなく、生成プロンプトのノイズを低減し、より複雑で正確なコンテンツを持つ画像を生成し、ネットワークの解釈可能性を高める。この目的のために、即時改善、類似画像生成、$\&$のアウトペイント、視覚的質問応答を含む包括的なデータセットをキュレートする。さらに,二段階訓練方式を提案する。第1段階では、画像生成と編集の特性を把握できるようにMLLMを訓練し、詳細なプロンプトを生成する。第2段階では、SDを最適化してMLLMの生成プロンプトに合わせる。また,画像編集中に生成領域と保存領域のテクスチャ,輝度,コントラストの差異を緩和する参照ベース復元ネットワークを提案する。その結果, LLMGA は有望な生成能力を有し, 対話的手法で広範囲のアプリケーションを実現することができた。 In this paper, we introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA), leveraging the vast reservoir of knowledge and proficiency in reasoning, comprehension, and response inherent in Large Language Models (LLMs) to assist users in image generation and editing. Diverging from existing approaches where Multimodal Large Language Models (MLLMs) generate fixed-size embeddings to control Stable Diffusion (SD), our LLMGA provides a detailed language generation prompt for precise control over SD. This not only augments LLM context understanding but also reduces noise in generation prompts, yields images with more intricate and precise content, and elevates the interpretability of the network. To this end, we curate a comprehensive dataset comprising prompt refinement, similar image generation, inpainting $\&$ outpainting, and visual question answering. Moreover, we propose a two-stage training scheme. In the first stage, we train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts. In the second stage, we optimize SD to align with the MLLM's generation prompts. 
Additionally, we propose a reference-based restoration network to alleviate texture, brightness, and contrast disparities between generated and preserved regions during image editing. Extensive results show that LLMGA has promising generative capabilities and can enable wider applications in an interactive manner.	翻訳日:2023-12-13 19:23:29 公開日:2023-12-10
# ビデオ言語共同学習における弱教師付き文成分分析のための生成言語モデルの活用 Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning ( http://arxiv.org/abs/2312.06699v1 ) ライセンス: Link先を確認	Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Rahul Pratap Singh, Bishmoy Paul, Ali Dabouei, Min Xu	(参考訳) テキストデータの徹底的な理解は、マルチモーダルビデオ分析タスクの基本的な要素である。しかし、近年の研究では、現在のモデルは、目標とする下流タスクのトレーニング中にテキストデータの包括的理解を獲得していないことが示されている。この制限に対する以前のアプローチとは直交する立場から、対象タスクに応じた文成分の重要性を理解することで、モデルの性能が向上する可能性があると仮定する。そこで我々は,事前学習された大規模言語モデル (LLM) の知識を利用して,特定の文成分を対象として原文からテキストサンプルを生成する。本稿では,成分の相対的重要度を計算し,それを用いて異なる映像言語タスクを改善する,弱教師付き重要度推定モジュールを提案する。厳密な定量的解析により,提案手法は複数の映像言語タスクにおいて有意な改善を示す。特に,本手法は,R@1の観点で,ベースラインに対してビデオからテキストへの検索で8.3%,テキストからビデオへの検索で1.4%の相対的改善を達成する。さらに、ビデオモーメント検索では、平均mAPは、異なるベースラインに対して2.0%から13.7%の相対的な改善を示している。 A thorough comprehension of textual data is a fundamental element in multi-modal video analysis tasks. However, recent works have shown that the current models do not achieve a comprehensive understanding of the textual data during the training for the target downstream tasks. Orthogonal to the previous approaches to this limitation, we postulate that understanding the significance of the sentence components according to the target task can potentially enhance the performance of the models. Hence, we utilize the knowledge of a pre-trained large language model (LLM) to generate text samples from the original ones, targeting specific sentence components. We propose a weakly supervised importance estimation module to compute the relative importance of the components and utilize them to improve different video-language tasks. Through rigorous quantitative analysis, our proposed method exhibits significant improvement across several video-language tasks. In particular, our approach notably enhances video-text retrieval by a relative improvement of 8.3% in video-to-text and 1.4% in text-to-video retrieval over the baselines, in terms of R@1. 
Additionally, in video moment retrieval, average mAP shows a relative improvement ranging from 2.0% to 13.7% across different baselines.	翻訳日:2023-12-13 18:59:56 公開日:2023-12-10
# tetrirf: 自由視点ビデオのための時間的三面放射場 TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video ( http://arxiv.org/abs/2312.06713v1 ) ライセンス: Link先を確認	Minye Wu, Zehao Wang, Georgios Kouros, Tinne Tuytelaars	(参考訳) neural radiance fields (nerf)は、フォトリアリスティックなfvv(free-viewpoint video)体験を提供することで、視覚メディアの領域に革命をもたらす。しかし、この技術の重要なストレージ要件と生成とレンダリングに関わる計算の複雑さは、現在、幅広いアプリケーションを制限する。このギャップを埋めるために,本稿では,fvv(free-viewpoint video)のストレージサイズを大幅に削減する新しい技術であるtemporal tri-plane radiance fields (tetrirf)を提案する。 TeTriRFは三面体とボクセルグリッドとのハイブリッド表現を導入し、複雑な動きや急激な変化を伴う長い順列やシーンのスケーリングをサポートする。本研究では,高いトレーニング効率を実現し,時間的に一貫した低エントロピーシーン表現を実現するグループトレーニング手法を提案する。これらの表現の特性を活かして,既製のビデオコーデックを用いた圧縮パイプラインを導入し,最先端のものに比べてストレージサイズを桁違いに削減した。実験により,TeTriRFは高い圧縮速度で競争性が得られることを示した。 Neural Radiance Fields (NeRF) revolutionize the realm of visual media by providing photorealistic Free-Viewpoint Video (FVV) experiences, offering viewers unparalleled immersion and interactivity. However, the technology's significant storage requirements and the computational complexity involved in generation and rendering currently limit its broader application. To close this gap, this paper presents Temporal Tri-Plane Radiance Fields (TeTriRF), a novel technology that significantly reduces the storage size for Free-Viewpoint Video (FVV) while maintaining low-cost generation and rendering. TeTriRF introduces a hybrid representation with tri-planes and voxel grids to support scaling up to long-duration sequences and scenes with complex motions or rapid changes. We propose a group training scheme tailored to achieving high training efficiency and yielding temporally consistent, low-entropy scene representations. Leveraging these properties of the representations, we introduce a compression pipeline with off-the-shelf video codecs, achieving an order of magnitude less storage size compared to the state-of-the-art. Our experiments demonstrate that TeTriRF can achieve competitive quality with a higher compression rate.	
翻訳日:2023-12-13 18:48:47 公開日:2023-12-10
# 分離エンハンス:Text2画像拡散モデルのための合成ファインタニング Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models ( http://arxiv.org/abs/2312.06712v1 ) ライセンス: Link先を確認	Zhipeng Bao and Yijun Li and Krishna Kumar Singh and Yu-Xiong Wang and Martial Hebert	(参考訳) 拡散ベースのテキスト・ツー・イメージ(T2I)モデルによって達成された最近の顕著な進歩にもかかわらず、現在のシステムはテキストプロンプト、特にマルチオブジェクト・ジェネレーションの適切な構成生成を保証する能力は依然として低い。この研究は、注意力の低いアクティベーションスコアとマスクオーバーラップに関連する問題を指摘し、このような不一致の根本的な理由を照らしている。これまでの研究はこれらの問題に個別に取り組んできたが、総合的なアプローチが最重要であると断言する。そこで本稿では,物体マスクの重なりを減らし,注目度を最大化する2つの新しい目的,分離損失とエンハンス損失を提案する。本手法は,従来のテスト時間適応手法と異なり,限界パラメータの微調整に焦点を合わせ,スケーラビリティと一般化性を高める。総合的な評価は,画像リアリズム,テキスト・画像アライメント,適応性,特に著明なベースラインよりも優れた性能を示す。本研究は,T2I拡散モデルにおいて,合成能力の向上と適用性の向上を図っている。プロジェクトwebページはhttps://zpbao.github.io/projects/sepen/。 Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for the multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. While previous research efforts have individually tackled these issues, we assert that a holistic approach is paramount. Thus, we propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores, respectively. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability. Comprehensive evaluations demonstrate the superior performance of our model in terms of image realism, text-image alignment, and adaptability, notably outperforming prominent baselines. Ultimately, this research paves the way for T2I diffusion models with enhanced compositional capacities and broader applicability. The project webpage is available at https://zpbao.github.io/projects/SepEn/.	
翻訳日:2023-12-13 18:48:25 公開日:2023-12-10
# 物理インフォームドニューラルネットワークによるオプション価格設定 Physics Informed Neural Network for Option Pricing ( http://arxiv.org/abs/2312.06711v1 ) ライセンス: Link先を確認	Ashish Dhiman and Yibei Hu	(参考訳) 物理インフォームド・ディープラーニングのアプローチである物理インフォームドニューラルネットワーク(PINN)をブラック・ショールズ方程式に適用し、アメリカン・オプションとヨーロピアン・オプションの価格設定を行う。我々は、シミュレーションデータと実市場データの両方でアプローチを検証し、解析的および数値的ベンチマークと比較した。本モデルは,市場データに対しても妥当な性能を示しながら,シミュレーションデータ上での価格挙動を正確に捉えることができる。 PINNモデルのアーキテクチャと学習プロセスについても実験を行い、性能に影響を与える収束性や安定性の問題についての理解を深める。 We apply a physics-informed deep-learning approach, the physics-informed neural network (PINN), to the Black-Scholes equation for pricing American and European options. We test our approach on both simulated and real market data and compare it to analytical and numerical benchmarks. Our model is able to accurately capture the price behaviour on simulation data, while also exhibiting reasonable performance for market data. We also experiment with the architecture and learning process of our PINN model to provide more understanding of convergence and stability issues that impact performance.	翻訳日:2023-12-13 18:48:05 公開日:2023-12-10
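As context for the analytical benchmarks mentioned in the abstract above, the standard closed-form Black-Scholes price of a European option can be computed with the Python standard library alone. This is a sketch of the textbook formula, not the authors' PINN or their benchmark code; the function names and the parameter values in the usage lines are illustrative assumptions. American options have no comparable closed form, which is why numerical references are needed for them.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, maturity, rate, vol):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def bs_put(spot, strike, maturity, rate, vol):
    """European put price obtained from the call via put-call parity."""
    call = bs_call(spot, strike, maturity, rate, vol)
    return call - spot + strike * exp(-rate * maturity)


# Illustrative parameters: at-the-money, one year, 5% rate, 20% volatility.
call_price = bs_call(100.0, 100.0, 1.0, 0.05, 0.2)
put_price = bs_put(100.0, 100.0, 1.0, 0.05, 0.2)
print(round(call_price, 4), round(put_price, 4))
```

A learned pricing function for European options could be compared point by point against `bs_call` on a grid of spot prices and maturities to measure its error.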
# Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay ( http://arxiv.org/abs/2312.06710v1 ) License: see linked page	Khanh Doan, Quyen Tran, Tuan Nguyen, Dinh Phung, Trung Le	Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) provides techniques focused on generating samples from prior tasks to enhance the model's memory capabilities. With the progression in generative AI, generative models have advanced from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is the deterioration in the quality of generated data compared to the original, as the generator continuously self-learns from its outputs. This degradation can lead to the potential risk of catastrophic forgetting occurring in the classifier. To address this, we propose the Class-Prototype Conditional Diffusion Model (CPDM), a GR-based approach for continual learning that enhances image quality in generators and thus reduces catastrophic forgetting in classifiers. The cornerstone of CPDM is a learnable class-prototype that captures the core characteristics of images in a given class. This prototype, integrated into the diffusion model's denoising process, ensures the generation of high-quality images. 
It maintains its effectiveness for old tasks even when new tasks are introduced, preserving image generation quality and reducing the risk of catastrophic forgetting in classifiers. Our empirical studies on diverse datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art models, highlighting its exceptional ability to preserve image quality and enhance the model's memory retention.	Translation date: 2023-12-13 18:47:56 Publication date: 2023-12-10
# AM-RADIO: Agglomerative Model -- Reduce All Domains Into One ( http://arxiv.org/abs/2312.06709v1 ) License: see linked page	Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov	A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, and SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name this approach AM-RADIO (Agglomerative Model -- Reduce All Domains Into One). This integrative approach not only surpasses the performance of individual teacher models but also amalgamates their distinctive features, such as zero-shot vision-language comprehension, detailed pixel-level understanding, and open vocabulary segmentation capabilities. In pursuit of the most hardware-efficient backbone, we evaluated numerous architectures in our multi-teacher distillation pipeline using the same training recipe. This led to the development of a novel architecture (E-RADIO) that exceeds the performance of its predecessors and is at least 7x faster than the teacher models. 
Our comprehensive benchmarking process covers downstream tasks including ImageNet classification, ADE20k semantic segmentation, COCO object detection, and the LLaVa-1.5 framework. Code: https://github.com/NVlabs/RADIO	Translation date: 2023-12-13 18:47:32 Publication date: 2023-12-10
# Neutral Editing Framework for Diffusion-based Video Editing ( http://arxiv.org/abs/2312.06708v1 ) License: see linked page	Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong, Chang D. Yoo	Text-conditioned image editing has succeeded in various types of editing based on a diffusion framework. Unfortunately, this success did not carry over to video, which continues to be challenging. Existing video editing systems are still limited to rigid-type editing such as style transfer and object overlay. To this end, this paper proposes the Neutral Editing (NeuEdit) framework to enable complex non-rigid editing by changing the motion of a person/object in a video, which has never been attempted before. NeuEdit introduces a concept of 'neutralization' that enhances the tuning-editing process of diffusion-based editing systems in a model-agnostic manner by leveraging input video and text without any other auxiliary aids (e.g., visual masks, video captions). Extensive experiments on numerous videos demonstrate the adaptability and effectiveness of the NeuEdit framework. The website of our work is available here: https://neuedit.github.io	Translation date: 2023-12-13 18:47:12 Publication date: 2023-12-10
# Exploring Public's Perception of Safety and Video Surveillance Technology: A Survey Approach ( http://arxiv.org/abs/2312.06707v1 ) License: see linked page	Babak Rahimi Ardabili, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, Gordon Hull, Shannon Reid, Hamed Tabkhi	Addressing public safety effectively requires incorporating diverse stakeholder perspectives, particularly those of the community, which are often underrepresented compared to other stakeholders. This study presents a comprehensive analysis of the community's general public safety concerns, their view of existing surveillance technologies, and their perception of AI-driven solutions for enhancing safety in urban environments, focusing on Charlotte, NC. Through a survey approach, including in-person surveys conducted in August and September 2023 with 410 participants, this research investigates demographic factors such as age, gender, ethnicity, and educational level to gain insights into public perception and concerns toward public safety and possible solutions. Based on the type of dependent variables, we utilized different statistical and significance analyses, such as logit regression and ordinal logistic regression, to explore the effects of demographic factors on the various dependent variables. Our results reveal demographic differences in public safety concerns. 
Younger females tend to feel less secure yet trust existing video surveillance systems, whereas older, educated individuals are more concerned about violent crimes in malls. Additionally, attitudes towards AI-driven surveillance differ: older Black individuals demonstrate support for it despite having concerns about data privacy, while educated females show a tendency towards skepticism.	Translation date: 2023-12-13 18:46:57 Publication date: 2023-12-10
# UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction ( http://arxiv.org/abs/2312.06706v1 ) License: see linked page	Hongbin Lin, Juangui Xu, Qingfeng Xu, Zhengyu Hu, Handing Xu, Yunzhi Chen, Yongjun Hu, Zhenguo Nie	In the realm of 3D reconstruction from 2D images, a persisting challenge is to achieve high-precision reconstructions without relying on 3D ground-truth data. We present UNeR3D, a pioneering unsupervised methodology that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly cuts down the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse distance weighting technique for color rendering, UNeR3D ensures seamless color transitions, enhancing visual fidelity. Our model's flexible architecture supports training with any number of views, and uniquely, it is not constrained by the number of views used during training when performing reconstructions. It can run inference with an arbitrary number of views, offering unparalleled versatility. Additionally, the model's continuous spatial input domain allows the generation of point clouds at any desired resolution, empowering the creation of high-resolution 3D RGB point clouds. 
We solidify the reconstruction process with a novel multi-view geometric loss and color loss, demonstrating that our model excels with single-view inputs and beyond, thus reshaping the paradigm of unsupervised learning in 3D vision. Our contributions signal a substantial leap forward in 3D vision, offering new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.	Translation date: 2023-12-13 18:46:33 Publication date: 2023-12-10
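The UNeR3D abstract names inverse distance weighting for color rendering but does not spell out the scheme. The following is a generic sketch of the textbook technique, not the authors' implementation: the function name `idw_color`, the `power` exponent, and the sample points are all assumptions made for illustration.

```python
import math


def idw_color(query, points, colors, power=2.0, eps=1e-12):
    """Blend RGB colors of known points, weighting each by 1 / distance**power.

    query  : (x, y, z) location whose color is wanted
    points : list of (x, y, z) locations with known colors
    colors : list of (r, g, b) tuples, one per point
    """
    weighted = [0.0, 0.0, 0.0]
    total_weight = 0.0
    for point, color in zip(points, colors):
        dist = math.dist(query, point)
        if dist < eps:
            # The query coincides with a known point: use its color directly.
            return tuple(color)
        weight = 1.0 / dist ** power
        total_weight += weight
        for channel in range(3):
            weighted[channel] += weight * color[channel]
    return tuple(value / total_weight for value in weighted)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    cols = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    # A query halfway between a red and a blue point gets an even blend.
    print(idw_color((1.0, 0.0, 0.0), pts, cols))
```

Because the weights vary continuously with distance, nearby queries receive similar colors, which is one plausible reading of the "seamless color transitions" the abstract describes.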
# Perceiving University Student's Opinions from Google App Reviews ( http://arxiv.org/abs/2312.06705v1 ) License: see linked page	Sakshi Ranjan, Subhankar Mishra	The Google app market captures the views of users from every corner of the globe via ratings and text reviews, in a multilingual setting. The potential information in the reviews cannot be extracted manually, due to their exponential growth. Sentiment analysis, by machine learning and deep learning algorithms employing NLP, explicitly uncovers and interprets the emotions. This study performs sentiment classification of the app reviews and identifies university students' behavior towards the app market via exploratory analysis. We applied machine learning algorithms using the TP, TF, and TF-IDF text representation schemes and evaluated their performance with Bagging, an ensemble learning method. For the deep learning models, we used GloVe word embeddings. Our model was trained on Google app reviews and tested on Student's App Reviews (SAR). The various combinations of these algorithms were compared with each other using F score and accuracy, and inferences were highlighted graphically. SVM performed well among the classifiers, with an accuracy of 93.41% and an F score of 89% using bigrams and the TF-IDF scheme. 
Bagging enhanced the performance of LR and NB, with accuracies of 87.88% and 86.69% and F scores of 86% and 78%, respectively. Overall, LSTM on GloVe embeddings recorded the highest accuracy (95.2%) and an F score of 88%.	Translation date: 2023-12-13 18:46:07 Publication date: 2023-12-10
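The abstract above relies on TF and TF-IDF text representations without defining them. As a hedged illustration, one common TF-IDF variant (term frequency normalised by document length, multiplied by the log of inverse document frequency) can be sketched as below. The exact weighting and tokenisation used in the paper are not stated, so the formula, the whitespace tokeniser, and the function name `tf_idf` are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    reviews = ["good app", "bad app"]
    for vector in tf_idf(reviews):
        print(vector)
```

In this variant a term that occurs in every review (here "app") gets weight zero, so only the sentiment-bearing words remain informative for a downstream classifier.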
# SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction ( http://arxiv.org/abs/2312.06704v1 ) License: see linked page	Zechuan Zhang, Zongxin Yang, Yi Yang	Creating high-quality 3D models of clothed humans from single images for real-world applications is crucial. Despite recent advancements, accurately reconstructing humans in complex poses or with loose clothing from in-the-wild images, along with predicting textures for unseen areas, remains a significant challenge. A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction. In response, we introduce SIFU (Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction), a novel approach combining a Side-view Decoupling Transformer with a 3D Consistent Texture Refinement pipeline. SIFU employs a cross-attention mechanism within the transformer, using SMPL-X normals as queries to effectively decouple side-view features in the process of mapping 2D features to 3D. This method not only improves the precision of the 3D models but also their robustness, especially when SMPL-X estimates are not perfect. 
Our texture refinement process leverages a text-to-image diffusion-based prior to generate realistic and consistent textures for invisible views. Through extensive experiments, SIFU surpasses SOTA methods in both geometry and texture reconstruction, showcasing enhanced robustness in complex scenarios and achieving unprecedented Chamfer and P2S measurements. Our approach extends to practical applications such as 3D printing and scene building, demonstrating its broad utility in real-world scenarios. Project page: https://river-zhang.github.io/SIFU-projectpage/	Translation date: 2023-12-13 18:45:46 Publication date: 2023-12-10
# OpenSD: Unified Open-Vocabulary Segmentation and Detection ( http://arxiv.org/abs/2312.06703v1 ) License: see linked page	Shuai Li, Minghan Li, Pengfei Wang, Lei Zhang	Recently, a few open-vocabulary methods have been proposed by employing a unified architecture to tackle generic segmentation and detection tasks. However, their performance still lags behind the task-specific models due to the conflict between different tasks, and their open-vocabulary capability is limited due to the inadequate use of CLIP. To address these challenges, we present a universal transformer-based framework, abbreviated as OpenSD, which utilizes the same architecture and network parameters to handle open-vocabulary segmentation and detection tasks. First, we introduce a decoder decoupled learning strategy to alleviate the semantic conflict between thing and stuff categories so that each individual task can be learned more effectively under the same framework. Second, to better leverage CLIP for end-to-end segmentation and detection, we propose dual classifiers to handle the in-vocabulary domain and out-of-vocabulary domain, respectively. 
The text encoder is further trained to be region-aware for both thing and stuff categories through decoupled prompt learning, enabling them to filter out duplicated and low-quality predictions, which is important to end-to-end segmentation and detection. Extensive experiments are conducted on multiple datasets under various circumstances. The results demonstrate that OpenSD outperforms state-of-the-art open-vocabulary segmentation and detection methods in both closed- and open-vocabulary settings. Code is available at https://github.com/strongwolf/OpenSD	Translation date: 2023-12-13 18:45:11 Publication date: 2023-12-10
# Dynamic Adversarial Attacks on Autonomous Driving Systems ( http://arxiv.org/abs/2312.06701v1 ) License: see linked page	Amirhosein Chahe, Chenan Wang, Abhishek Jeyapratap, Kaidi Xu, Lifeng Zhou	This paper introduces an attacking mechanism to challenge the resilience of autonomous driving systems. Specifically, we manipulate the decision-making processes of an autonomous vehicle by dynamically displaying adversarial patches on a screen mounted on another moving vehicle. These patches are optimized to deceive the object detection models into misclassifying targeted objects, e.g., traffic signs. Such manipulation has significant implications for critical multi-vehicle interactions such as intersection crossing and lane changing, which are vital for safe and efficient autonomous driving systems. In particular, we make four major contributions. First, we introduce a novel adversarial attack approach where the patch is not co-located with its target, enabling more versatile and stealthy attacks. Moreover, our method utilizes dynamic patches displayed on a screen, allowing for adaptive changes and movement, enhancing the flexibility and performance of the attack. To do so, we design a Screen Image Transformation Network (SIT-Net), which simulates environmental effects on the displayed images, narrowing the gap between simulated and real-world scenarios. 
Further, we integrate a positional loss term into the adversarial training process to increase the success rate of the dynamic attack. Finally, we shift the focus from merely attacking perceptual systems to influencing the decision-making algorithms of self-driving systems. Our experiments demonstrate the first successful implementation of such dynamic adversarial attacks in real-world autonomous driving scenarios, paving the way for advancements in the field of robust and secure autonomous driving.	Translation date: 2023-12-13 18:44:27 Publication date: 2023-12-10
# Design and Architecture for a Centralized, Extensible, and Configurable Scoring Application ( http://arxiv.org/abs/2312.06700v1 ) License: see linked page	Sumit Sanwal	In modern-day organizations, many software applications require critical input to decide the next steps in the application workflow and approval. One of the most important inputs for deciding the subsequent course of action is the key performance indicator-based scoring of the entities used in the application. Computing the right score for these entities is a critical step that drives the subsequent processing and helps to decide the next course of action for each entity accurately. Because a precise and correct score is pivotal to the application's intended objective, a very efficient and optimized scoring application is of paramount importance for the success of such applications. In this article, we discuss how to envision and design a generic, extensible scoring engine, along with a few use cases for scoring and the intricacies and complexities of implementing the scoring framework.	Translation date: 2023-12-13 18:43:45 Publication date: 2023-12-10
# The long mean-life-time-controlled and potentially scalable qubits composed of electric dipolar molecules based on graphene ( http://arxiv.org/abs/2103.07263v3 ) License: see linked page	Yong-Yi Huang	We propose a new kind of qubit composed of electric dipolar molecules. The electric dipolar molecules in an external uniform electric field undergo simple harmonic oscillations, and the quantum states belonging to the two lowest energy levels act as the states |0>, |1> of a qubit. The qubits' excited states have a very long, controlled mean lifetime of about 260 seconds, so decoherence is no longer an obstacle in quantum computation. We can perform quantum computations by manipulating the qubits of electric dipolar molecules just like those of neutral atoms. When the qubits are used for quantum computations, the dipole moments' orientations oscillate harmonically about the external electric field and do not flip between pointing along and against the field, so the qubits can be manufactured at large scale in a graphene system. The radius of the Rydberg blockade is about 100 nm. The number of operated qubits reaches several million.	Translation date: 2023-12-13 03:55:23 Publication date: 2023-12-10
# Online Statistical Inference for Stochastic Optimization via Kiefer-Wolfowitz Methods ( http://arxiv.org/abs/2102.03389v5 ) License: see linked page	Xi Chen, Zehua Lai, He Li, Yichen Zhang	This paper investigates the problem of online statistical inference of model parameters in stochastic optimization problems via the Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution for the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators, whose asymptotic covariance matrices depend on the distribution of search directions and the function-value query complexity. The distributional result reflects the trade-off between statistical efficiency and function query complexity. We further analyze the choice of random search directions to minimize certain summary statistics of the asymptotic covariance matrix. Based on the asymptotic distribution, we conduct online statistical inference by providing two construction procedures of valid confidence intervals.	Translation date: 2023-12-13 03:54:50 Publication date: 2023-12-10
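To make the idea in the entry above more concrete, here is a minimal sketch of a Kiefer-Wolfowitz-style iteration with random Gaussian search directions and Polyak-Ruppert averaging, run on a toy noisy quadratic. It is an illustration under stated assumptions, not the paper's algorithm: the objective `noisy_loss`, the step-size and perturbation schedules, the Gaussian direction distribution, and all default values are hypothetical choices made for this example.

```python
import random


def noisy_loss(x, theta, noise):
    """Toy stochastic objective: 0.5 * ||x - theta||^2 plus a random linear term."""
    quad = 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, theta))
    return quad + sum(ni * xi for ni, xi in zip(noise, x))


def akw_estimate(theta, n_iter=20000, seed=0, step0=0.2, alpha=0.7,
                 h0=0.1, gamma=0.25, noise_sd=0.5):
    """Averaged Kiefer-Wolfowitz-style estimate of the minimiser `theta`.

    Each step uses two function-value queries that share the same noise draw,
    forms a finite-difference slope along a random direction, and moves
    against that direction. The returned value is the running average of
    the iterates (Polyak-Ruppert averaging).
    """
    rng = random.Random(seed)
    dim = len(theta)
    x = [0.0] * dim
    avg = [0.0] * dim
    for n in range(1, n_iter + 1):
        step = step0 * n ** (-alpha)   # decaying step size (assumed schedule)
        h = h0 * n ** (-gamma)         # decaying perturbation size (assumed)
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        noise = [rng.gauss(0.0, noise_sd) for _ in range(dim)]
        shifted = [xi + h * di for xi, di in zip(x, direction)]
        slope = (noisy_loss(shifted, theta, noise) - noisy_loss(x, theta, noise)) / h
        x = [xi - step * slope * di for xi, di in zip(x, direction)]
        avg = [ai + (xi - ai) / n for ai, xi in zip(avg, x)]
    return avg


if __name__ == "__main__":
    print(akw_estimate([1.0, -2.0]))
```

Standard Gaussian directions are used here because their second-moment matrix is the identity, so the finite-difference estimate points along the gradient on average; the abstract indicates that the choice of direction distribution affects the asymptotic covariance, which this sketch does not explore.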
# PyCSP3: Modeling Combinatorial Constrained Problems in Python ( http://arxiv.org/abs/2009.00326v5 ) License: see linked page	Christophe Lecoutre and Nicolas Szczepanski	In this document, we introduce PyCSP3, a Python library that allows us to write models of combinatorial constrained problems in a declarative manner. Currently, with PyCSP3, you can write models of constraint satisfaction and optimization problems. More specifically, you can build CSP (Constraint Satisfaction Problem) and COP (Constraint Optimization Problem) models. Importantly, there is a complete separation between the modeling and solving phases: you write a model, you compile it (while providing some data) in order to generate an XCSP3 instance (file), and you solve that problem instance by means of a constraint solver. You can also directly pilot the solving procedure in PyCSP3, possibly conducting an incremental solving strategy. In this document, you will find all that you need to know about PyCSP3, with more than 50 illustrative models.	Translation date: 2023-12-13 03:54:07 Publication date: 2023-12-10
# Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification ( http://arxiv.org/abs/2206.09098v4 ) License: see linked page	Natalie S. Frank, Jonathan Niles-Weed	Adversarial training is one of the most popular methods for training models robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.	Translation date: 2023-12-13 03:46:50 Publication date: 2023-12-10
# HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement ( http://arxiv.org/abs/2203.13086v4 ) License: see linked page	Pavel Andreev, Aibek Alanov, Oleg Ivanov, Dmitry Vetrov	Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for bandwidth extension and speech enhancement. We show that with the improved generator architecture, HiFi++ performs better than or comparably with the state of the art in these tasks while spending significantly less computational resources. The effectiveness of our approach is validated through a series of extensive experiments.	Translation date: 2023-12-13 03:45:44 Publication date: 2023-12-10
# On the reality of the quantum state once again: A no-go theorem for $\psi$-ontic models ( http://arxiv.org/abs/2201.11842v3 ) License: see linked page	Gabriele Carcassi, Andrea Oldofredi, Christine A. Aidala	In this paper we show that $\psi$-ontic models, as defined by Harrigan and Spekkens (HS), cannot reproduce quantum theory. Instead of focusing on probability, we use information theoretic considerations to show that all pure states of $\psi$-ontic models must be orthogonal to each other, in clear violation of quantum mechanics. Given that (i) Pusey, Barrett and Rudolph (PBR) previously showed that $\psi$-epistemic models, as defined by HS, also contradict quantum mechanics, and (ii) the HS categorization is exhausted by these two types of models, we conclude that the HS categorization itself is problematic as it leaves no space for models that can reproduce quantum theory.	Translation date: 2023-12-13 03:44:43 Publication date: 2023-12-10
# データ分裂:単一のデータポイントを分割する Data fission: splitting a single data point ( http://arxiv.org/abs/2112.11079v9 ) ライセンス: Link先を確認	James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas	(参考訳) 未知のパラメータを持つ既知の族において、ある分布からランダムベクトル $x$ を観測すると仮定する。いずれの場合、$x$を2つの部分に分けて$f(x)$と$g(x)$に分割することは可能で、どちらの部分も$x$をそれ自体で再構築するには十分ではありませんが、どちらも$x$を完全に回収することができ、$(f(x),g(x))$のジョイントディストリビューションは扱いやすいのでしょうか? 例えば、$X=(X_1,\dots,X_n)$と$P$が積分布であれば、任意の$m<n$に対して、サンプルを$f(X)=(X_1,\dots,X_m)$と$g(X)=(X_{m+1},\dots,X_n)$に分割することができる。 rasines and young (2022)は、付加ガウスノイズを使用する別のアプローチを提供する -- これはガウス分散データに対する有限サンプルでのポスト選択推論を可能にし、エラーがガウス的でない場合の漸近的推論を可能にする。本稿では,ベイズ推論からアイデアを借用して,データ分割の連続的類似物と見なすことのできる(相対論的)解を得る,有限サンプルの分割を実現するためのより一般的な手法を提案する。我々は、データ分割、データ彫刻、p値マスキングに代わる方法として、メソッドデータフィッションと呼ぶ。トレンドフィルタリングやその他の回帰問題に対するポストセレクション推論など,いくつかのプロトタイプアプリケーション上での手法を例示する。 Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2022) offers an alternative approach that uses additive Gaussian noise -- this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. 
We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.	翻訳日:2023-12-13 03:43:45 公開日:2023-12-10
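The additive-Gaussian-noise split mentioned in the abstract above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a scalar $X \sim N(\mu, \sigma^2)$ with known $\sigma$, and the names `gaussian_fission`, `recover`, and `tau` are hypothetical choices made here. Drawing independent $Z \sim N(0, \sigma^2)$ and setting $f = X + \tau Z$, $g = X - Z/\tau$ gives $\mathrm{Cov}(f, g) = \sigma^2 - \sigma^2 = 0$, so the jointly Gaussian parts are independent, while $f + \tau^2 g = (1+\tau^2) X$ recovers $X$ exactly.

```python
import random
import statistics


def gaussian_fission(x, sigma, tau, rng):
    """Split one observation x ~ N(mu, sigma^2) into two parts (f, g).

    Assumption: sigma is known. z is fresh noise with the same variance as x.
    """
    z = rng.gauss(0.0, sigma)
    f = x + tau * z   # part that could be used for model selection
    g = x - z / tau   # part held out for inference
    return f, g


def recover(f, g, tau):
    """Both parts together determine x: f + tau^2 * g = (1 + tau^2) * x."""
    return (f + tau**2 * g) / (1.0 + tau**2)


def sample_covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


rng = random.Random(0)          # fixed seed so the sketch is reproducible
mu, sigma, tau = 2.0, 1.0, 1.0  # illustrative values only
xs = [rng.gauss(mu, sigma) for _ in range(20000)]
parts = [gaussian_fission(x, sigma, tau, rng) for x in xs]
fs = [p[0] for p in parts]
gs = [p[1] for p in parts]

# Largest error when rebuilding x from (f, g); only floating-point rounding remains.
max_recovery_error = max(abs(recover(f, g, tau) - x) for x, (f, g) in zip(xs, parts))
# Empirical covariance between the two parts; close to zero by construction.
cov_fg = sample_covariance(fs, gs)
```

Larger `tau` puts more noise into `f` and leaves `g` closer to `X`, so in this sketch `tau` plays a role loosely analogous to the split fraction $m/n$ in ordinary data splitting.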
# 熱場力学による熱量子ビットのシミュレーション Simulating thermal qubits through thermofield dynamics ( http://arxiv.org/abs/2111.09969v6 ) ライセンス: Link先を確認	G. X. A. Petronilo, M. R. Ara\'ujo, Clebson Cruz	(参考訳) 量子コンピューティングは過去数十年間、科学界の注目を集めてきた。量子コンピュータの開発は、情報を扱い、抽出し、転送するより安全で高速な方法への一歩を約束している。しかし、量子コンピューティングの大きな利点にもかかわらず、室温で動作する量子デバイスの開発は熱デコヒーレンスプロセスによって損なわれている。さらに、ほとんどの学部や大学院の量子力学コースでは、熱場力学の研究は通常無視される。このシナリオでは、量子コンピューティングのセットアップに適用される熱場ダイナミクス(tfd)を介して熱量子ビットシステムをシミュレートするディダクティックなアプローチを探求する。この結果から, 量子演算系における熱量子ビットの実用的構築を可能にするボゴリューボフ変換を用いて, 量子ビットに対するブロッホ球表現を記述できることが示唆された。したがって、この研究は、量子コンピューティングによる熱場力学を教師や興味のある学生に導入し、TFD技術を用いて量子プロトコルに対する温度の影響を研究し、学習する。 Quantum computing has attracted the attention of the scientific community in the past few decades. The development of quantum computers promises one path toward safer and faster ways to treat, extract, and transfer information. However, despite the significant advantages of quantum computing, the development of quantum devices operating at room temperature has been compromised by the thermal decoherence process. In addition, in most undergraduate and graduate quantum mechanics courses, the study of thermofield dynamics is usually neglected. In this scenario, this work explores a didactic approach to simulate thermal qubit systems through Thermofield Dynamics (TFD), applied in a quantum computing setup. The results show that the Bloch sphere representation for a qubit can be written in terms of the Bogoliubov transformation, which allows a practical construction for the thermal qubits in a quantum computing setup. Therefore, this work introduces thermofield dynamics through quantum computing to teachers and curious students interested in teaching and learning this important field of studying the temperature impacts on quantum protocols using the TFD technique.	翻訳日:2023-12-13 03:43:02 公開日:2023-12-10
# 不確実性下の探索のためのリスクアウェアなメタレベル意思決定 Risk-aware Meta-level Decision Making for Exploration Under Uncertainty ( http://arxiv.org/abs/2209.05580v2 ) ライセンス: Link先を確認	Joshua Ott, Sung-Kyun Kim, Amanda Bouman, Oriana Peltzer, Mamoru Sobue, Harrison Delecki, Mykel J. Kochenderfer, Joel Burdick, Ali-akbar Agha-mohammadi	(参考訳) 未知環境のロボットによる探索は、センサ測定、局所化、行動実行、その他多くの要因において不確実性を考慮しなければならない不確実性の下で決定する問題である。大規模探査アプリケーションの場合、自律システムは、障害や危険地形に関連するリスクを安全に評価しながら、環境のどの領域が探検に値するかを順次決定する課題を克服しなければならない。本研究では,地域・グローバル探索に伴うトレードオフのバランスをとるためのリスク対応型メタレベル意思決定フレームワークを提案する。メタレベルの意思決定は、局所的な政策とグローバルな政策を切り替えることによって古典的な階層的なカバレッジプランナーの上に構築される。我々は, 環境史, トラバーサビリティリスク, キノダイナミック制約に関する情報を用いて, 地域政策とグローバル政策の切り替えに成功している政策実行の可能性を推論する。シミュレーションと大規模な実世界のハードウェアテストの両方で、私たちのソリューションを検証しました。その結果,局所探査とグローバル探査のバランスをとることで,大規模環境をより効率的に探索できることがわかった。 Robotic exploration of unknown environments is fundamentally a problem of decision making under uncertainty where the robot must account for uncertainty in sensor measurements, localization, action execution, as well as many other factors. For large-scale exploration applications, autonomous systems must overcome the challenges of sequentially deciding which areas of the environment are valuable to explore while safely evaluating the risks associated with obstacles and hazardous terrain. In this work, we propose a risk-aware meta-level decision making framework to balance the tradeoffs associated with local and global exploration. Meta-level decision making builds upon classical hierarchical coverage planners by switching between local and global policies with the overall objective of selecting the policy that is most likely to maximize reward in a stochastic environment. We use information about the environment history, traversability risk, and kinodynamic constraints to reason about the probability of successful policy execution to switch between local and global policies. We have validated our solution in both simulation and on a variety of large-scale real world hardware tests. 
Our results show that, by balancing local and global exploration, we are able to explore large-scale environments significantly more efficiently.	翻訳日:2023-12-13 03:34:19 公開日:2023-12-10
# Spach Transformer:PET画像デノイジングのための局所的・大域的自己注意に基づく空間的・チャネル的変換器 Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising ( http://arxiv.org/abs/2209.03300v2 ) ライセンス: Link先を確認	Se-In Jang, Tinsu Pan, Ye Li, Pedram Heidari, Junyu Chen, Quanzheng Li, Kuang Gong	(参考訳) 陽電子放射断層撮影(PET)は、その定量的な利点と高い感度のために臨床や研究に広く用いられているが、信号対雑音比(SNR)の低さに悩まされている。近年,畳み込みニューラルネットワーク(CNN)がPET画像の品質向上に広く利用されている。局所的な特徴抽出では成功し効率的であるが、CNNは受容野が限られているため、長距離依存をうまく捉えることができない。 グローバルなマルチヘッド自己注意(MSA)は長距離情報を取り込む一般的な手法である。しかし,3次元画像に対するグローバルMSAの計算には高い計算コストがかかる。本研究では,局所的および大域的MSAに基づいて空間的およびチャネル的情報を活用できる,効率的な空間的およびチャネル的エンコーダ・デコーダ変換器Spach Transformerを提案する。異なるPETトレーサのデータセット、すなわち$^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, $^{68}$Ga-DOTATATEを用いて提案フレームワークの評価を行った。定量的な結果は,提案するSpach Transformerフレームワークが最先端のディープラーニングアーキテクチャより優れていることを示している。私たちのコードはhttps://github.com/sijang/SpachTransformerで利用可能です。 Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from a low signal-to-noise ratio (SNR). Recently, convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient in local feature extraction, CNNs cannot capture long-range dependencies well due to their limited receptive field. Global multi-head self-attention (MSA) is a popular approach to capture long-range information. However, computing global MSA for 3D images is computationally expensive. In this work, we propose an efficient spatial and channel-wise encoder-decoder transformer, Spach Transformer, that can leverage spatial and channel information based on local and global MSAs. Experiments based on datasets of different PET tracers, i.e., $^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, and $^{68}$Ga-DOTATATE, were conducted to evaluate the proposed framework. 
Quantitative results show that the proposed Spach Transformer framework outperforms state-of-the-art deep learning architectures. Our codes are available at https://github.com/sijang/SpachTransformer	翻訳日:2023-12-13 03:33:58 公開日:2023-12-10
# 2022年XCSP3コンペティションの予稿集 Proceedings of the 2022 XCSP3 Competition ( http://arxiv.org/abs/2209.00917v2 ) ライセンス: Link先を確認	Gilles Audemard, Christophe Lecoutre, Emmanuel Lonca	(参考訳) この文書は2022年のXCSP3コンペティションの予稿集である。この制約ソルバの競技会の結果は、2022年7月31日から8月7日までイスラエルのハイファで開催されたFLOC (Federated Logic Conference) 2022 Olympic Gamesで発表された。 This document represents the proceedings of the 2022 XCSP3 Competition. The results of this competition of constraint solvers were presented at FLOC (Federated Logic Conference) 2022 Olympic Games, held in Haifa, Israel from 31st July 2022 to 7th August 2022.	翻訳日:2023-12-13 03:33:31 公開日:2023-12-10
# GANとクロージャ:マルチスケールモデリングにおけるマイクロマクロ一貫性 GANs and Closures: Micro-Macro Consistency in Multiscale Modeling ( http://arxiv.org/abs/2208.10715v4 ) ライセンス: Link先を確認	Ellis R. Crabtree, Juan M. Bello-Rivas, Andrew L. Ferguson, Ioannis G. Kevrekidis	(参考訳) 分子系の位相空間、そしてより一般的には、確率微分方程式によって効果的にモデル化される複雑な系のサンプリングは、タンパク質の折り畳みから物質発見に至るまで、多くの分野において重要なモデリングステップである。これらの問題は自然界においてしばしばマルチスケールであり、少数の「遅い」反応座標によってパラメトリケートされた低次元の有効自由エネルギー表面で説明でき、残りの「速い」自由度は反応座標値の平衡測度を発生させる。このような問題に対するサンプリング手順は、条件付き平衡分布に関するアンサンブル平均と同様に有効自由エネルギー差を推定するために用いられる。近年,分子シミュレーションと組み合わせた改良されたサンプリング技術が開発されている。興味深いアナロジーは機械学習(ml)の分野において発生し、生成型逆ネットワークは低次元確率分布から高次元のサンプルを生成することができる。このサンプル生成は、その低次元表現に関する情報から、モデル状態の可能な高次元空間実現を返す。本稿では,同じタスクに対して,mlベースの条件付き生成逆ネットワークを用いて条件分布をサンプリングするための物理ベースのシミュレーションとバイアス手法を結合する手法を提案する。微細なスケールの実現を条件付ける「粗い記述子」は、優先順位として、あるいは非線形次元の減少を通じて学習することができる。物理学に基づく拡張サンプリング技術とcGANを結合したフレームワークが、マルチスケールのSDE動的システムサンプリングを改善することを実証し、複雑さを増すシステムへの期待を示す。 Sampling the phase space of molecular systems -- and, more generally, of complex systems effectively modeled by stochastic differential equations -- is a crucial modeling step in many fields, from protein folding to materials discovery. These problems are often multiscale in nature: they can be described in terms of low-dimensional effective free energy surfaces parametrized by a small number of "slow" reaction coordinates; the remaining "fast" degrees of freedom populate an equilibrium measure on the reaction coordinate values. Sampling procedures for such problems are used to estimate effective free energy differences as well as ensemble averages with respect to the conditional equilibrium distributions; these latter averages lead to closures for effective reduced dynamic models. Over the years, enhanced sampling techniques coupled with molecular simulation have been developed. 
An intriguing analogy arises with the field of Machine Learning (ML), where Generative Adversarial Networks can produce high dimensional samples from low dimensional probability distributions. This sample generation returns plausible high dimensional space realizations of a model state, from information about its low-dimensional representation. In this work, we present an approach that couples physics-based simulations and biasing methods for sampling conditional distributions with ML-based conditional generative adversarial networks for the same task. The "coarse descriptors" on which we condition the fine scale realizations can either be known a priori, or learned through nonlinear dimensionality reduction. We suggest that this may bring out the best features of both approaches: we demonstrate that a framework that couples cGANs with physics-based enhanced sampling techniques can improve multiscale SDE dynamical systems sampling, and even shows promise for systems of increasing complexity.	翻訳日:2023-12-13 03:32:44 公開日:2023-12-10
# システムバスダイナミクスのための小行列経路積分のツリーベース実装 Tree-based Implementation of the Small Matrix Path Integral for System-Bath Dynamics ( http://arxiv.org/abs/2207.11830v2 ) ライセンス: Link先を確認	Geshuo Wang and Zhenning Cai	(参考訳) small matrix path integral (smatpi) 法は、高調波浴に結合した量子系の進化をシミュレートする効率的な数値的手法である。この方法は量子系の非マルコフ力学を定義する一連のカーネル行列に依存する。 SMatPI方式では、これらのカーネルはQuAPI方式で間接的に計算される。代わりに、カーネル行列の定義に焦点をあて、これらの行列の繰り返し関係を明らかにする。このような関係を用いて,木ベースアルゴリズム(t-smatpi)を開発し,その定義に基づくカーネル行列の簡単な計算よりも高速であることが示されている。このアルゴリズムはSMatPI行列を他の経路積分法によって計算するステップをバイパスし、SMatPI行列自体をより深く理解する。一方、メモリコストと計算コストは低く抑えられている。数値実験により、t-SMatPIアルゴリズムはi-QuAPIとSMatPIと全く同じ結果が得られることが示された。それにもかかわらず、我々の手法はオープン量子系のいくつかの新しい性質を示し、高次数値スキームに一般化できる可能性を持っている。 The small matrix path integral (SMatPI) method is an efficient numerical approach to simulate the evolution of a quantum system coupled to a harmonic bath. The method relies on a sequence of kernel matrices that defines the non-Markovian dynamics of the quantum system. In the original SMatPI method, these kernels are computed indirectly through the QuAPI method. Instead, we focus on the definition of the kernel matrices and reveal a recurrence relation in these matrices. Using such a relationship, a tree based algorithm (t-SMatPI) is developed, which is shown to be much faster than straightforward computation of the kernel matrices based on their definitions. This algorithm bypasses the step to compute the SMatPI matrices by other path integral methods and provides more understanding of the SMatPI matrices themselves. Meanwhile, it keeps the memory cost and computational cost low. Numerical experiments show that the t-SMatPI algorithm gives exactly the same result as i-QuAPI and SMatPI. In spite of this, our method may indicate some new properties of open quantum systems, and has the potential to be generalized to higher-order numerical schemes.	翻訳日:2023-12-13 03:31:53 公開日:2023-12-10
# グラフトポロジサンプリングを用いたトレーニンググラフ畳み込みネットワークの一般化保証 Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling ( http://arxiv.org/abs/2207.03584v2 ) ライセンス: Link先を確認	Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong	(参考訳) グラフ畳み込みネットワーク(GCN)は近年,グラフ構造化データの学習において大きな成功を収めている。隣り合う機能の再帰的埋め込みによるスケーラビリティの問題に対処するために、gcnのトレーニングのメモリと計算コストを削減するためにグラフトポロジサンプリングが提案されており、多くの実証研究でトポロジーサンプリングのないものと同等のテスト性能を達成している。本稿では,半教師付きノード分類のための(最大)3層gcnの学習におけるグラフトポロジーサンプリングの理論的正当化について述べる。グラフトポロジサンプリングにおいて,GCNトレーニングが一般化誤差を減少させるような条件を公式に特徴付ける。さらに,本手法は,既存のGCNの理論的解析において未探索の層間重みの非凸相互作用に対処する。本稿では,グラフ構造とトポロジサンプリングが一般化性能および試料の複雑さに与える影響を明示し,数値実験により理論的知見を正当化する。 Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data. To address its scalability issue due to the recursive embedding of neighboring features, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs, and it has achieved comparable test performance to those without topology sampling in many empirical studies. To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. We formally characterize some sufficient conditions on graph topology sampling such that GCN training leads to a diminishing generalization error. Moreover, our method tackles the nonconvex interaction of weights across layers, which is under-explored in the existing theoretical analyses of GCNs. This paper characterizes the impact of graph structures and topology sampling on the generalization performance and sample complexity explicitly, and the theoretical findings are also justified through numerical experiments.	翻訳日:2023-12-13 03:31:09 公開日:2023-12-10
# CT画像からのトランスファーラーニングアプローチを用いたCOVID-19検出 COVID-19 Detection Using Transfer Learning Approach from Computed Tomography Images ( http://arxiv.org/abs/2207.00259v5 ) ライセンス: Link先を確認	Kenan Morani, Esra Kaya Ayana, Devrim Unay	(参考訳) 新型コロナウイルスのパンデミックによる独特な課題の中で、効率的かつ正確な診断の重要性は、革新的なアプローチの緊急性を示している。これらの課題に対応するために,最近アノテーションが付与されたCT画像データベースを用いた転移学習に基づくアプローチを提案する。多くのアプローチが集中的なデータ前処理および/または複雑なモデルアーキテクチャを提案するが、この手法は最小限の手動エンジニアリングで効率的なソリューションを提供することに焦点を当てている。具体的には、COVID-19検出のための修正Xceptionモデルの適合性について検討する。この方法は、事前トレーニングされたXceptionモデルを適応させ、アーキテクチャとImageNetで事前トレーニングされた重みの両方を組み込む。モデルの出力は最終的な診断決定を下すように設計された。トレーニングでは、128のバッチサイズと、標準の512x512から縮小した224x224の入力画像サイズを使用した。入力データ上でのda処理は行われなかった。評価は、COVID-19および非COVID-19のラベル付き症例を含む「COV19-CT-DB」CT画像データセットを用いて行う。その結果、検証サブセットにおける正解率、適合率、再現率、マクロF1スコアにおいて本手法が優れており、VGG-16転移モデルを上回り、より少ないパラメータで適合率が向上することが示された。さらに、COV19-CT-DBデータセットに対する代替手法と比較すると、同じデータセット上のベースラインアプローチや他の代替手法を上回る。最後に、COV19-CT-DBデータセットのユニークな特徴に対する修正Xception転移学習ベースモデルの適応性は、CT画像から新型コロナウイルスを診断するための堅牢なツールとしての可能性を示している。 The significance of efficient and accurate diagnosis amidst the unique challenges posed by the COVID-19 pandemic underscores the urgency for innovative approaches. In response to these challenges, we propose a transfer learning-based approach using a recently annotated Computed Tomography (CT) image database. While many approaches propose intensive data preprocessing and/or complex model architectures, our method focuses on offering an efficient solution with minimal manual engineering. Specifically, we investigate the suitability of a modified Xception model for COVID-19 detection. The method involves adapting a pre-trained Xception model, incorporating both the architecture and pre-trained weights from ImageNet. The output of the model was designed to make the final diagnosis decisions. Training used a batch size of 128 and 224x224 input image dimensions, downsized from the standard 512x512. No further da processing was performed on the input data. 
Evaluation is conducted on the 'COV19-CT-DB' CT image dataset, containing labeled COVID-19 and non-COVID-19 cases. Results reveal the method's superiority in accuracy, precision, recall, and macro F1 score on the validation subset, outperforming a VGG-16 transfer model and thus offering enhanced precision with fewer parameters. Furthermore, our approach exceeds the baseline approach and other alternative methods on the COV19-CT-DB dataset. Finally, the adaptability of the modified Xception transfer learning-based model to the unique features of the COV19-CT-DB dataset showcases its potential as a robust tool for enhanced COVID-19 diagnosis from CT images.	翻訳日:2023-12-13 03:30:53 公開日:2023-12-10
# 深層量子誤り訂正 Deep Quantum Error Correction ( http://arxiv.org/abs/2301.11930v2 ) ライセンス: Link先を確認	Yoni Choukroun, Lior Wolf	(参考訳) 量子誤り訂正符号(QECC)は、量子コンピューティングのポテンシャルを実現するための鍵となる要素である。 QECCは、古典的な誤り訂正符号(ECC)と同様に、冗長な物理量子ビットに量子論理情報を分散することにより、エラーを検出し修正できるようにし、エラー率の低減を可能にする。本研究では,新しいエンドツーエンドの深層量子誤り復号器を効率的に訓練する。シンドローム復号を拡張してシステムノイズの初期推定値を予測することで量子測定による崩壊の問題を解決し、その推定値を深層ニューラルネットワークによって反復的に洗練する。有限体上で計算される論理エラー率は、微分可能な目的関数によって直接最適化され、符号によって課される制約の下で効率的な復号を可能にする。最後に, 繰り返しシンドロームサンプリングの効率的な復号により, 誤りを含むシンドローム測定をサポートするようアーキテクチャを拡張した。提案手法は,小距離のトポロジカル符号において,しばしば計算コストが非常に高い既存のエンドツーエンドのニューラルデコーダおよび古典的デコーダを上回る最先端の精度を達成することで,QECCにおけるニューラルデコーダのパワーを実証する。 Quantum error correction codes (QECC) are a key component for realizing the potential of quantum computing. QECC, like its classical counterpart (ECC), enables the reduction of error rates by distributing quantum logical information across redundant physical qubits, such that errors can be detected and corrected. In this work, we efficiently train novel end-to-end deep quantum error decoders. We resolve the quantum measurement collapse by augmenting syndrome decoding to predict an initial estimate of the system noise, which is then refined iteratively through a deep neural network. The logical error rates calculated over finite fields are directly optimized via a differentiable objective, enabling efficient decoding under the constraints imposed by the code. Finally, our architecture is extended to support faulty syndrome measurement, by efficient decoding of repeated syndrome sampling. The proposed method demonstrates the power of neural decoders for QECC by achieving state-of-the-art accuracy, outperforming, for small distance topological codes, the existing end-to-end neural and classical decoders, which are often computationally prohibitive.	翻訳日:2023-12-13 03:24:08 公開日:2023-12-10
# EPCL: Frozen CLIP Transformerは効率的なポイントクラウドエンコーダ EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder ( http://arxiv.org/abs/2212.04098v3 ) ライセンス: Link先を確認	Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang	(参考訳) プリトレイン・フィニチューン・パラダイムは、高品質な表現能力とトレーニング済みモデルの転送性により、nlpと2d画像の分野で大きな成功を収めている。しかし,3次元点雲場において,このような強いモデルの事前学習は,点雲列の限られた量のため困難である。本稿では, 凍結したCLIP変換器を用いて高品質のクラウドモデルを直接学習する, 効率的かつ効率的なポイントクラウド学習者である \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL) を紹介する。我々のEPCLは、2D-3Dデータなしで画像の特徴と点雲の特徴を意味的に整合させることで、2Dと3Dのモダリティを接続する。具体的には、入力ポイントクラウドは一連のローカルパッチに分割され、設計されたpoint cloud tokenizerによってトークン埋め込みに変換される。これらのトークン埋め込みはタスクトークンと結合され、ポイントクラウド表現を学ぶために凍ったクリップトランスフォーマーに供給される。直感的には、提案されたpoint cloud tokenizerは入力ポイントクラウドを2dイメージに似た統一トークン空間に投影する。 3次元検出,セマンティックセグメンテーション,分類,少数ショット学習に関する総合的な実験により,CLIPトランスフォーマーが効率的なポイントクラウドエンコーダとして機能し,室内および屋外のベンチマークで有望な性能を達成することを示す。特に、epclがもたらしたパフォーマンス向上は、scannet v2検出で$\textbf{19.7}$ ap$_{50}$、s3disセグメンテーションで$\textbf{4.4}$ miou、semantickittiセグメンテーションで$\textbf{1.2}$ miouです。コードは \url{https://github.com/xiaoshuihuang/epcl} で入手できる。 The pretrain-finetune paradigm has achieved great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field due to the limited amount of point cloud sequences. This paper introduces \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP transformer. Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a series of local patches, which are converted to token embeddings by the designed point cloud tokenizer. 
These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation. The intuition is that the proposed point cloud tokenizer projects the input point cloud into a unified token space that is similar to the 2D images. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the CLIP transformer can serve as an efficient point cloud encoder and our method achieves promising performance on both indoor and outdoor benchmarks. In particular, performance gains brought by our EPCL are $\textbf{19.7}$ AP$_{50}$ on ScanNet V2 detection, $\textbf{4.4}$ mIoU on S3DIS segmentation and $\textbf{1.2}$ mIoU on SemanticKITTI segmentation compared to contemporary pretrained models. Code is available at \url{https://github.com/XiaoshuiHuang/EPCL}.	翻訳日:2023-12-13 03:20:49 公開日:2023-12-10
# 非マルコフ環境における強化学習 Reinforcement Learning in Non-Markovian Environments ( http://arxiv.org/abs/2211.01595v3 ) ライセンス: Link先を確認	Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia	(参考訳) 任意の非マルコフ環境における強化学習のためにvan royと共著者によって開発された新しいパラダイムに動機づけられ、q-learningアルゴリズムを適用した際の観測の非マルコフ性に起因する誤りを、関連する定式化し、明確にピン留めする。この観察に基づいて,エージェント設計の基準は,ある条件法則に対してよい近似を求めるべきであることを示唆する。古典的確率制御に着想を得て, 近似的統計量の再帰的計算に還元されることを示す。これにより、エージェント設計のためのオートエンコーダベースのスキームが実現され、部分的に観察された強化学習環境上で数値的にテストされる。 Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.	翻訳日:2023-12-13 03:19:21 公開日:2023-12-10
# dinar: 一発ヒトアバターの神経テクスチャの拡散インパインティング DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars ( http://arxiv.org/abs/2303.09375v4 ) ライセンス: Link先を確認	David Svitov, Dmitrii Gudkov, Renat Bashirov, Victor Lempitsky	(参考訳) DINARは、1枚のRGB画像から現実的なフルボディアバターを作成するためのアプローチである。従来の研究と同様に, SMPL-Xボディーモデルと組み合わせた神経テクスチャを用いて, アバターのフォトリアリスティックな品質を実現し, アニメーションや高速な推論を実現している。テクスチャを復元するために、潜伏拡散モデルを使用し、そのようなモデルを神経テクスチャ空間でどのようにトレーニングするかを示す。拡散モデルを用いることで、正面から見ると人物の背中のような大きな目立たない領域を現実的に再構築することができる。パイプライン内のモデルは、2D画像とビデオのみを使用してトレーニングされています。実験では,最先端のレンダリング品質と,新たなポーズや視点への優れた一般化を実現する。特に、このアプローチはSnapshotPeople公開ベンチマークの最先端を改善している。 We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars while keeping them easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such model can be trained in the neural texture space. The use of the diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, the approach improves state-of-the-art on the SnapshotPeople public benchmark.	翻訳日:2023-12-13 03:10:27 公開日:2023-12-10
# 学習は低ランク行列回復における局所的最適性によって説明できるか? Can Learning Be Explained By Local Optimality In Low-rank Matrix Recovery? ( http://arxiv.org/abs/2302.10963v2 ) ライセンス: Link先を確認	Jianhao Ma, Salar Fattahi	(参考訳) 我々は、一部にノイズを含みうる $m$ 個の線形測定値からランク $r$ の $d_1\times d_2$ 行列を再構築することを目的とした、低ランク行列回復の局所ランドスケープを探求する。真のランクが不明な場合、過大推定は一般的であり、ランク $k\geq r$ の過剰パラメータ化モデルが得られる。近年の研究では、ロバストな $\ell_1$ 損失を用いる一階法は、ランクが過大評価され測定にノイズがある場合でも真の低ランク解を回復できることが示唆されており、これは真の解が局所的あるいは大域的最小点として現れる可能性を示唆している。本論文はこの考えに異議を唱え、穏やかな条件下では真の解が厳密な鞍点 (strict saddle points) として現れることを示す。我々は、低ランク行列回復の2つのカテゴリである行列補完と行列センシングを、いずれもロバストな $\ell_1$ 損失の下で検討する。行列センシングでは、2つの臨界遷移を明らかにする。 $m$ が $\max\{d_1,d_2\}r\lesssim m\lesssim \max\{d_1,d_2\}k$ の範囲にあるとき、真の解はいずれも局所的最小点でも大域的最小点でもなく、一部は厳密な鞍点となる。 $m$ が $\max\{d_1,d_2\}k$ を超えると、すべての真の解は明確な大域的最小点になる。行列補完においては、わずかなランクの過大評価と穏やかなノイズであっても、真の解は非臨界点または厳密な鞍点として現れる。 We explore the local landscape of low-rank matrix recovery, aiming to reconstruct a $d_1\times d_2$ matrix with rank $r$ from $m$ linear measurements, some potentially noisy. When the true rank is unknown, overestimation is common, yielding an over-parameterized model with rank $k\geq r$. Recent findings suggest that first-order methods with the robust $\ell_1$-loss can recover the true low-rank solution even when the rank is overestimated and measurements are noisy, implying that true solutions might emerge as local or global minima. Our paper challenges this notion, demonstrating that, under mild conditions, true solutions manifest as strict saddle points. We study two categories of low-rank matrix recovery, matrix completion and matrix sensing, both with the robust $\ell_1$-loss. For matrix sensing, we uncover two critical transitions. With $m$ in the range of $\max\{d_1,d_2\}r\lesssim m\lesssim \max\{d_1,d_2\}k$, none of the true solutions are local or global minima, but some become strict saddle points. As $m$ surpasses $\max\{d_1,d_2\}k$, all true solutions become unequivocal global minima. 
In matrix completion, even with slight rank overestimation and mild noise, true solutions emerge either as non-critical points or as strict saddle points.	翻訳日:2023-12-13 03:08:47 公開日:2023-12-10
# ベイズ的説得による動的価格設定と学習 Dynamic Pricing and Learning with Bayesian Persuasion ( http://arxiv.org/abs/2304.14385v2 ) ライセンス: Link先を確認	Shipra Agrawal, Yiding Feng, Wei Tang	(参考訳) 我々は,逐次的なラウンドで商品の価格を設定することに加えて,販売者が事前に「広告スキーム」にコミットする,新たな動的価格設定と学習の設定について考察する。つまり、各ラウンドの開始時に、販売者は商品の品質について購入者にどのようなシグナルを提供するかを決定することができる。広く用いられているベイズ的説得フレームワークを用いて、これらのシグナルが購入者の評価と購入応答に及ぼす影響をモデル化し、販売者の期待収益を最大化する価格スキームとともに、広告スキームの最適設計を求める問題を定式化する。購入者の需要関数を事前に知ることなく、過去の購入応答を利用して最適な価格と広告戦略を適応的に学習できるオンラインアルゴリズムを設計することを目標としている。本稿では,すべてを見通した最適な価格と広告スキームと比較した場合のアルゴリズムのリグレットについて考察する。我々の主な結果は計算効率の良いオンラインアルゴリズムであり、評価関数が製品品質に関して線形であるときに $O(T^{2/3}(m\log T)^{1/3})$ のリグレット上界を達成する。ここで $m$ は離散的な製品品質ドメインの濃度であり、$T$ は時間軸である。この結果は、評価関数に対する自然な単調性とリプシッツの仮定を必要とするが、購入者の需要関数に対するリプシッツや滑らかさの仮定は不要である。定数 $m$ の場合、この結果は、本問題の特別な場合である動的価格設定に対するリグレット下界と対数係数の範囲内で一致する。また、広く考慮されている加法的評価の特別な場合に対して、$m\le T^{1/3}$ のときに $m$ に依存しない $\tilde{O}(T^{2/3})$ のリグレット上界を含む、いくつかの改善された結果を得る。 We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. 
Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.	翻訳日:2023-12-13 02:59:37 公開日:2023-12-10
# 形状, 材料, 照明のニューラルPBIR再構成 Neural-PBIR Reconstruction of Shape, Material, and Illumination ( http://arxiv.org/abs/2304.13445v4 ) ライセンス: Link先を確認	Cheng Sun, Guangyan Cai, Zhengqin Li, Kai Yan, Cheng Zhang, Carl Marshall, Jia-Bin Huang, Shuang Zhao, Zhao Dong	(参考訳) 物体の2d画像(例えば写真)に基づく物理世界の物体の形状と空間的に変化する表面の外観の再構築は、コンピュータビジョンやグラフィックスにおいて長年の課題となっている。本稿では,ニューラルネットワークを用いた物体再構成と物理ベースの逆レンダリング(PBIR)を組み合わせた高精度かつ高効率な物体再構成パイプラインを提案する。当社のパイプラインではまず,ニューラルsdfベースの形状再構成を活用して,高品質だが潜在的に不完全なオブジェクト形状を生成する。次に, 神経材料と照明蒸留ステージを導入し, 材料と照明の高品質な予測を実現する。最終段階では、神経予測によって初期化され、PBIRを用いて初期結果を洗練し、オブジェクト形状、材料、照明の最終的な高品質な再構成を得る。実験の結果、パイプラインは既存のメソッドよりも品質や性能に優れています。 Reconstructing the shape and spatially varying surface appearances of a physical-world object as well as its surrounding illumination based on 2D images (e.g., photographs) of the object has been a long-standing problem in computer vision and graphics. In this paper, we introduce an accurate and highly efficient object reconstruction pipeline combining neural based object reconstruction and physics-based inverse rendering (PBIR). Our pipeline firstly leverages a neural SDF based shape reconstruction to produce high-quality but potentially imperfect object shape. Then, we introduce a neural material and lighting distillation stage to achieve high-quality predictions for material and illumination. In the last stage, initialized by the neural predictions, we perform PBIR to refine the initial results and obtain the final high-quality reconstruction of object shape, material, and illumination. Experimental results demonstrate our pipeline significantly outperforms existing methods quality-wise and performance-wise.	翻訳日:2023-12-13 02:59:01 公開日:2023-12-10
# 正則化・多視点サポートベクターマシン学習の局所化 Localisation of Regularised and Multiview Support Vector Machine Learning ( http://arxiv.org/abs/2304.05655v2 ) ライセンス: Link先を確認	Aurelian Gheondea and Cankat Tilki	(参考訳) 我々は、H.Q. Minh, L. Bazzani, V. Murino (Journal of Machine Learning Research, 17 (2016) 1--72) によって導入された、作用素値の正半定値カーネルとその再生核ヒルベルト空間を含む、正則化および多視点サポートベクターマシン学習問題の局所化バージョンに対するいくつかの表現定理を証明する。結果は、凸または非凸の損失関数と有限または無限次元の入力空間を考える一般的な場合に関するものである。この一般的な枠組みは、いくつかの特別な場合、特に損失関数が Gâteaux 微分可能である場合に、無限次元の入力空間と非凸損失関数を許容することを示す。部分的に非線形な問題につながる指数最小二乗損失関数について、詳細な計算が提供される。 We prove a few representer theorems for a localised version of the regularised and multiview support vector machine learning problem introduced by H.Q. Minh, L. Bazzani, and V. Murino, Journal of Machine Learning Research, 17 (2016) 1--72, that involves operator valued positive semidefinite kernels and their reproducing kernel Hilbert spaces. The results concern general cases when convex or nonconvex loss functions and finite or infinite dimensional input spaces are considered. We show that the general framework allows infinite dimensional input spaces and nonconvex loss functions for some special cases, in particular in case the loss functions are Gâteaux differentiable. Detailed calculations are provided for the exponential least squares loss functions that lead to partially nonlinear problems.	翻訳日:2023-12-13 02:57:18 公開日:2023-12-10
# ニューラルネットワーク制御システムの整合性解析のための契約型適応分割法 Contraction-Guided Adaptive Partitioning for Reachability Analysis of Neural Network Controlled Systems ( http://arxiv.org/abs/2304.03671v2 ) ライセンス: Link先を確認	Akash Harapanahalli, Saber Jafarpour, Samuel Coogan	(参考訳) 本稿では,ニューラルネットワークコントローラと外乱を用いた非線形フィードバックループにおける区間値のロバスト到達可能集合推定を改善するための縮小誘導適応分割アルゴリズムを提案する。過近似間隔の収縮率の推定に基づいて、アルゴリズムはいつ、どこで分割するかを選択する。そして、ニューラルネットワーク検証ステップと到達可能性分割層を分離することにより、アルゴリズムは計算コストの少ない精度向上を提供することができる。このアプローチは、十分な精度のオープンループ間隔値到達可能性推定手法と、ニューラルネットワークの入出力挙動をバウンドする方法に適用できる。縮退に基づくロバストネス解析を用いて,混合単調到達性を有するアルゴリズムの性能保証を行う。最後に,いくつかの数値シミュレーションを用いてアルゴリズムの性能を実証し,既存の手法と比較する。特に,実行環境のごく一部において到達可能な集合推定の精度が,最先端手法と比較して大幅に向上したことを報告する。 In this paper, we present a contraction-guided adaptive partitioning algorithm for improving interval-valued robust reachable set estimates in a nonlinear feedback loop with a neural network controller and disturbances. Based on an estimate of the contraction rate of over-approximated intervals, the algorithm chooses when and where to partition. Then, by leveraging a decoupling of the neural network verification step and reachability partitioning layers, the algorithm can provide accuracy improvements for little computational cost. This approach is applicable with any sufficiently accurate open-loop interval-valued reachability estimation technique and any method for bounding the input-output behavior of a neural network. Using contraction-based robustness analysis, we provide guarantees of the algorithm's performance with mixed monotone reachability. Finally, we demonstrate the algorithm's performance through several numerical simulations and compare it with existing methods in the literature. In particular, we report a sizable improvement in the accuracy of reachable set estimation in a fraction of the runtime as compared to state-of-the-art methods.	翻訳日:2023-12-13 02:57:03 公開日:2023-12-10
# Grid-SD2E:認知学習システムにおける一般的なグリッドフィードバック Grid-SD2E: A General Grid-Feedback in a System for Cognitive Learning ( http://arxiv.org/abs/2304.01844v3 ) ライセンス: Link先を確認	Jingyi Feng and Chenming Zhang	(参考訳) 生成された神経データによって脳が外界とどのように相互作用するかを理解することは、その働きのメカニズムの決定、脳疾患の治療、知性の理解に不可欠である。多くの理論モデルが提案されているが、これまでのところ統合と開発は困難である。本研究では,より汎用的でロバストなグリッドモジュールを作成し,ベイジアン推論(space-division and exploration-exploitation with grid-feedback, grid-sd2e)を用いた対話型・自己情報型認知システムを構築した。ここでは、グリッドモジュールを外界とシステム間の相互作用媒体として、システム内の自己強化媒体として使用することができる。空間分割探索探索(SD2E)は、その空間分割(SD)モジュールを介してグリッドの0/1信号を受信する。本稿では,他の研究者による実験と神経復号に関する経験から得られた理論モデルについても述べる。本稿では,神経科学と認知科学の両分野における既存の理論に基づくシステムの合理性を分析し,人と人と外の世界との間の相互作用を説明するための特別な,一般的なルールを提案する。さらに、このフレームワークに基づいて、最小の計算ユニットが抽出され、これは脳内の1つのニューロンに類似しています。 Comprehending how the brain interacts with the external world through generated neural data is crucial for determining its working mechanism, treating brain diseases, and understanding intelligence. Although many theoretical models have been proposed, they have thus far been difficult to integrate and develop. In this study, we were inspired in part by grid cells in creating a more general and robust grid module and constructing an interactive and self-reinforcing cognitive system together with Bayesian reasoning, an approach called space-division and exploration-exploitation with grid-feedback (Grid-SD2E). Here, a grid module can be used as an interaction medium between the outside world and a system, as well as a self-reinforcement medium within the system. The space-division and exploration-exploitation (SD2E) receives the 0/1 signals of a grid through its space-division (SD) module. The system described in this paper is also a theoretical model derived from experiments conducted by other researchers and our experience on neural decoding. 
Herein, we analyse the rationality of the system based on the existing theories in both neuroscience and cognitive science, and attempt to propose special and general rules to explain the different interactions between people and between people and the external world. What's more, based on this framework, the smallest computing unit is extracted, which is analogous to a single neuron in the brain.	翻訳日:2023-12-13 02:56:32 公開日:2023-12-10
# 画像マッティングのための異方性事前学習 Disentangled Pre-training for Image Matting ( http://arxiv.org/abs/2304.00784v2 ) ライセンス: Link先を確認	Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao	(参考訳) 画像マッティングは、近年の文献における深層モデルのトレーニングを支援するために、高品質なピクセルレベルの人間のアノテーションを必要とする。しかし、このようなアノテーションは費用がかかり、スケールが難しいため、研究の発展を著しく妨げている。本研究では,無限個のデータを利用してマッティング性能を向上させる自己教師付き事前学習手法を提案することで,この問題への最初の試みを行う。プリトレーニングタスクは、ランダムなトリマップとアルファマットを生成して画像不等角化目標を達成する、画像マッティングと似た方法で設計される。次に、事前訓練されたモデルは、微調整のための下流マッティングタスクの初期化として使用される。広範な実験評価により,提案手法は最先端のマッティング法と他の自己教師付き初期化手法を大差で上回ることがわかった。また,異なるバックボーンアーキテクチャ上で提案手法の堅牢性を示す。プロジェクトページはhttps://crystraldo.github.io/dpt_mat/で閲覧できます。 Image matting requires high-quality pixel-level human annotations to support the training of a deep model in recent literature. However, such annotation is costly and hard to scale, significantly holding back the development of the research. In this work, we make the first attempt towards addressing this problem, by proposing a self-supervised pre-training approach that can leverage infinite numbers of data to boost the matting performance. The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. Extensive experimental evaluations show that the proposed approach outperforms both the state-of-the-art matting methods and other alternative self-supervised initialisation approaches by a large margin. We also show the robustness of the proposed approach over different backbone architectures. Our project page is available at https://crystraldo.github.io/dpt_mat/.	翻訳日:2023-12-13 02:56:09 公開日:2023-12-10
# 学習における絡み合いと統計の役割について On the Role of Entanglement and Statistics in Learning ( http://arxiv.org/abs/2306.03161v2 ) ライセンス: Link先を確認	Srinivasan Arunachalam, Vojtech Havlicek, Louis Schatzki	(参考訳) 本研究では,量子統計クエリ(QSQ)モデルにおいて,絡み合った測定,分離可能な測定,統計的測定にアクセスできる学習モデル間の関係の理解を進める。この目的のために、以下の結果を示す。 $\textbf{絡み合った測定と分離可能な測定。}$ ここでの目標は、$\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$ のコピーが与えられたときに、概念クラス $C\subseteq \{f:\{0,1\}^n\rightarrow [k]\}$ から未知の $f$ を学習することである。絡み合った測定を使って $f$ を学習するのに $T$ 個のコピーで十分であれば、分離可能な測定だけで $f$ を学習するのに $O(nT^2)$ 個のコピーで十分であることを示す。 $\textbf{絡み合った測定と統計的測定。}$ ここでの目標は、分離可能な測定と統計的測定へのアクセスを与えられたときに関数 $f \in C$ を学習することである。 QSQ学習と(ノイズが存在する場合でも)絡み合った測定を用いる量子学習とを指数関数的に分離するクラス $C$ を示す。これは、分類ノイズのある古典的なSQ学習とPAC学習を分離した Blum et al. [BKW'03] の独創的な結果の「量子アナログ」を証明するものである。 $\textbf{状態の学習に対するQSQ下界。}$ 量子統計クエリ次元(QSD)を導入し、これを用いてQSQ学習の下界を与える。これにより、純度のテスト、シャドウトモグラフィ、アーベル隠れ部分群問題、次数 $2$ の関数、planted bi-clique 状態、深さ $\textsf{polylog}(n)$ のクリフォード回路の出力状態に対する超多項式QSQ下界を証明する。 $\textbf{さらなる応用。}$ 弱いエラー緩和と強いエラー緩和の間の $\textit{無条件}$ の分離を与え、QSQモデルにおける分布の学習に対する下界を証明する。 Quek et al. [QFK+'22]、Hinsche et al. [HIN+'22]、Nietner et al. [NIS+'23] による先行研究は、対角測定を $\textit{仮定}$ して類似の結果を証明しており、我々の研究はこの仮定を取り除く。 In this work we make progress in understanding the relationship between learning models with access to entangled, separable and statistical measurements in the quantum statistical query (QSQ) model. To this end, we show the following results. $\textbf{Entangled versus separable measurements.}$ The goal here is to learn an unknown $f$ from the concept class $C\subseteq \{f:\{0,1\}^n\rightarrow [k]\}$ given copies of $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$. We show that, if $T$ copies suffice to learn $f$ using entangled measurements, then $O(nT^2)$ copies suffice to learn $f$ using just separable measurements. $\textbf{Entangled versus statistical measurements.}$ The goal here is to learn a function $f \in C$ given access to separable measurements and statistical measurements. 
We exhibit a class $C$ that gives an exponential separation between QSQ learning and quantum learning with entangled measurements (even in the presence of noise). This proves the "quantum analogue" of the seminal result of Blum et al. [BKW'03] that separates classical SQ and PAC learning with classification noise. $\textbf{QSQ lower bounds for learning states.}$ We introduce a quantum statistical query dimension (QSD), which we use to give lower bounds on QSQ learning. With this we prove superpolynomial QSQ lower bounds for testing purity, shadow tomography, Abelian hidden subgroup problem, degree-$2$ functions, planted bi-clique states and output states of Clifford circuits of depth $\textsf{polylog}(n)$. $\textbf{Further applications.}$ We give an $\textit{unconditional}$ separation between weak and strong error mitigation and prove lower bounds for learning distributions in the QSQ model. Prior works by Quek et al. [QFK+'22], Hinsche et al. [HIN+'22], and Nietner et al. [NIS+'23] proved the analogous results $\textit{assuming}$ diagonal measurements and our work removes this assumption.	翻訳日:2023-12-13 02:34:45 公開日:2023-12-10
# 平均的リワードを伴うレストレスバンド:一様グローバルアトラクタの推計を破る Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption ( http://arxiv.org/abs/2306.00196v2 ) ライセンス: Link先を確認	Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang	(参考訳) 平均報酬基準による無限ホリゾンレストレスト・バンディット問題を離散時間と連続時間の両方の設定で検討した。基本的な目標は、腕の数($n$)が大きくなるにつれて最適なギャップを減少させる計算効率のよいポリシーを設計することである。漸近的最適性に関する既存の結果は、すべて一様大域的誘引特性(UGAP)に依存している。本稿では,単腕のポリシを元のn$-armed問題に対するポリシに変換する,汎用的なシミュレーションベースのフレームワークであるnext-the-virtual-adviceを提案する。これは、各腕に単一武装のポリシーをシミュレートし、実状態をシミュレートされた状態に向けて慎重に操ることによって行われる。我々のフレームワークは、$O(1/\sqrt{N})$Optimity gapでポリシーを生成するためにインスタンス化することができる。離散時間設定では、結果はより単純な同期仮定の下で保持され、これはugapに違反する問題インスタンスをカバーする。より注目すべきは、連続時間設定では、標準のユニチェーン条件を超える追加の仮定は不要である。どちらの設定でも、我々の研究はUGAPを必要としない最初の漸近的最適性の結果である。 We study the infinite-horizon Restless Bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental goal is to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require any additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP.	
翻訳日:2023-12-13 02:32:56 公開日:2023-12-10
# 非エルミート量子系における非自明な世界線巻線 Nontrivial worldline winding in non-Hermitian quantum systems ( http://arxiv.org/abs/2307.01260v2 ) ライセンス: Link先を確認	Shi-Xin Hu, Yongxu Fu, Yi Zhang	(参考訳) 非エルミート量子システムへの関心が高まっている中、非相互作用モデルが最も注目されている。ここでは、確率級数展開量子モンテカルロ法を用いて、相互作用する量子系、例えば様々な非エルミート量子スピン鎖における非エルミート物理学を研究する。計算は開境界条件下で一貫した数値結果をもたらすが、周期境界条件下での非エルミート量子系は、非自明な巻線上の想像時間世界線の異常な濃度を観測し、適切な収束のために巻数セクター間のエルゴード性を高める必要がある。このような非自明なワールドラインの巻線は、他の非エルミートモデルや解析的アプローチにも存在する創発的な物理現象である。非エルミート皮膚効果やポイントギャップ分光法と並行して、非エルミート位相現象の同定と解析を、相互作用、有限温度、生物軌道基底、周期境界条件を新規かつ制御された方法で量子系へと大きく拡張する。最後に,このような非自明なワールドライン巻線の直接的物理的意味について検討し,絡み合いエントロピーに付加的,潜在的に準長距離の寄与をもたらす。 Amid the growing interest in non-Hermitian quantum systems, non-interacting models have received the most attention. Here, through the stochastic series expansion quantum Monte Carlo method, we investigate non-Hermitian physics in interacting quantum systems, e.g., various non-Hermitian quantum spin chains. While calculations yield consistent numerical results under open boundary conditions, non-Hermitian quantum systems under periodic boundary conditions observe an unusual concentration of imaginary-time worldlines over nontrivial winding and require enhanced ergodicity between winding-number sectors for proper convergences. Such nontrivial worldline winding is an emergent physical phenomenon that also exists in other non-Hermitian models and analytical approaches. Alongside the non-Hermitian skin effect and the point-gap spectroscopy, it largely extends the identification and analysis of non-Hermitian topological phenomena to quantum systems with interactions, finite temperatures, biorthogonal basis, and periodic boundary conditions in a novel and controlled fashion. Finally, we study the direct physical implications of such nontrivial worldline winding, which bring additional, potentially quasi-long-range contributions to the entanglement entropy.	翻訳日:2023-12-13 02:26:08 公開日:2023-12-10
# Nano1D:低次元ナノ構造の解析とセグメンテーションのための正確なコンピュータビジョンソフトウェア Nano1D: An accurate Computer Vision software for analysis and segmentation of low-dimensional nanostructures ( http://arxiv.org/abs/2306.15319v3 ) ライセンス: Link先を確認	Ehsan Moradpur-Tari (1), Sergei Vlassov (1,2), Sven Oras (1,2), Mart Ernits (1), Elyad Damerchi (1), Boris Polyakovc (3), Andreas Kyritsakis (1), and Veronika Zadin (1) ((1) Institute of Technology, University of Tartu, Nooruse 1, 50411 Tartu, Estonia (2) Institute of Physics, University of Tartu, W. Ostwaldi 1, 50411 Tartu, Estonia (3) Institute of Solid State Physics, University of Latvia, Kengaraga street 8, LV-1063 Riga, Latvia)	(参考訳) 顕微鏡画像のナノ粒子は通常、質的または手作業で分析され、これらの物体の自律的定量分析が必要となる。本稿では、顕微鏡画像から1次元の変形可能な重なり合う物体の正確なセグメンテーションと幾何解析のための物理計算モデルを提案する。このモデルはNano1Dと呼ばれ、前処理、セグメンテーション、重なり合う物体と幾何学的測定の4つのステップを持つ。このモデルは、異なる顕微鏡から採取したAgおよびAuナノワイヤのSEM画像と、異なる長さ、直径、人口密度のナノ粒子に熱分解されたAgナノワイヤを用いて試験された。長さや平均直径などの幾何学的特徴を分割し分析することに成功した。アルゴリズムの機能は、画像内のオブジェクトのサイズ、数、密度、方向、重なりによって損なわれない。モデルの主な強みは、重なり合うオブジェクトを99%以上の精度でセグメント化および解析し、一方、現在の機械学習と計算モデルは、重なり合うオブジェクトをセグメント化できない不正確さに悩まされている。グラフィカルなユーザインタフェイスから得られるNano1Dは、ナノワイヤ、ナノチューブ、ナノロッドなどの1Dナノ粒子を分析できる。 Nanoparticles in microscopy images are usually analyzed qualitatively or manually and there is a need for autonomous quantitative analysis of these objects. In this paper, we present a physics-based computational model for accurate segmentation and geometrical analysis of one-dimensional deformable overlapping objects from microscopy images. This model, named Nano1D, has four steps of preprocessing, segmentation, separating overlapped objects and geometrical measurements. The model is tested on SEM images of Ag and Au nanowire taken from different microscopes, and thermally fragmented Ag nanowires transformed into nanoparticles with different lengths, diameters, and population densities. It successfully segments and analyzes their geometrical characteristics including lengths and average diameter. 
The function of the algorithm is not undermined by the size, number, density, orientation and overlapping of objects in images. The main strength of the model is shown to be its ability to segment and analyze overlapping objects successfully with more than 99% accuracy, while current machine learning and computational models suffer from inaccuracy and inability to segment overlapping objects. Benefiting from a graphical user interface, Nano1D can analyze 1D nanoparticles including nanowires, nanotubes, nanorods in addition to other 1D features of microstructures like microcracks, dislocations etc.	翻訳日:2023-12-13 02:24:17 公開日:2023-12-10
# 古典的システムからのインタラクションフリー計測の誤解 Misinference of interaction-free measurement from a classical system ( http://arxiv.org/abs/2306.13590v2 ) ライセンス: Link先を確認	Valeri Frumkin and John W. M. Bush	(参考訳) 相互作用のない測定は、量子粒子が移動しない経路に沿って物体を検出できると考えられている。したがって、これは量子現象の最も迷いの1つである。ここでは, 流体表面を自転する液滴を自転する流体を, 自作の波で誘導する流体力学的パイロット波を用いたインタラクションフリー計測の古典的な例を示す。我々は、相互作用のない量子測定の既存の合理的化は、波状に導かれる粒子によって、我々の流体力学系における古典的な記述を可能にする。 Interaction-free measurement is thought to allow for quantum particles to detect objects along paths they never traveled. As such, it represents one of the most beguiling of quantum phenomena. Here, we present a classical analog of interaction-free measurement using the hydrodynamic pilot-wave system, in which a droplet self-propels across a vibrating fluid surface, guided by a wave of its own making. We argue that existing rationalizations of interaction-free quantum measurement in terms of particles being guided by wave forms allow for a classical description manifest in our hydrodynamic system, wherein the measurement is decidedly not interaction-free.	翻訳日:2023-12-13 02:23:29 公開日:2023-12-10
# ニューラルスペクトロ偏光場 Neural Spectro-polarimetric Fields ( http://arxiv.org/abs/2306.12562v2 ) ライセンス: Link先を確認	Youngchan Kim, Wonjoon Jin, Sunghyun Cho, Seung-Hwan Baek	(参考訳) シーン内の光の空間放射率分布のモデル化は、ビュー合成を含む応用のために広く研究されている。スペクトルと偏光は、光の波動特性であり、3つのrgbスペクトルバンドへの積分と人間の視覚に対する非受容性のため、しばしば無視される。しかし、これらの性質はシーンに関する実質的な資料や幾何学的情報を包含することが知られている。本稿では、任意の波長における任意の光線の空間ストークスベクトル分布である分光偏光場をモデル化する。我々は, 位置, 方向, 波長の連続変数で, 物理的に有意なストークスベクトルをモデル化したニューラル・スペクトロ偏光場(NeSpoF)を提案する。 NeSpoFは本質的にノイズの多い生の測定を管理し、メモリ効率を示し、物理的に重要な信号を保存する。 NeSpoFを検証するために,合成シーンと実世界のシーンの両方からなる,最初のマルチビューハイパースペクトル偏光画像データセットを提案する。これらの画像は当社の小型ハイパースペクトル偏光イメージングシステムを用いて撮影され、システム欠陥に対するロバスト性について校正されている。我々は様々な場面でnespofの能力を示す。 Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications, including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. However, these properties are known to encompass substantial material and geometric information about a scene. Here, we propose to model spectro-polarimetric fields, the spatial Stokes-vector distribution of any light ray at an arbitrary wavelength. We present Neural Spectro-polarimetric Fields (NeSpoF), a neural representation that models the physically-valid Stokes vector at given continuous variables of position, direction, and wavelength. NeSpoF manages inherently noisy raw measurements, showcases memory efficiency, and preserves physically vital signals - factors that are crucial for representing the high-dimensional signal of a spectro-polarimetric field. To validate NeSpoF, we introduce the first multi-view hyperspectral-polarimetric image dataset, comprised of both synthetic and real-world scenes. These were captured using our compact hyperspectral-polarimetric imaging system, which has been calibrated for robustness against system imperfections. 
We demonstrate the capabilities of NeSpoF on diverse scenes.	翻訳日:2023-12-13 02:23:16 公開日:2023-12-10
# 浅い量子回路による化学精度向上に向けて:クリフォードに基づくハミルトン工学的アプローチ Towards chemical accuracy with shallow quantum circuits: A Clifford-based Hamiltonian engineering approach ( http://arxiv.org/abs/2306.12053v3 ) ライセンス: Link先を確認	Jiace Sun, Lixue Cheng, Weitang Li	(参考訳) 浅い量子回路で化学的精度を得ることは、量子化学、特に短期量子デバイスにおいて重要な課題である。本研究では,回路深さと精度のトレードオフに対処するクリフォードに基づくハミルトン工学アルゴリズム,すなわちCHEMを提案する。変分量子固有解法とハードウェア効率のansatzに基づき、(1)ハーツリー・フォックエネルギーに対応する一連の初期回路パラメータが生成可能であること、(2)回路パラメータに関して初期エネルギー勾配を効果的に最大化すること、(3)古典的処理に無視できるオーバーヘッドを課すこと、(4)追加の量子資源を必要としないこと、(4)回路トポロジーと互換性があることを保証するクリフォードに基づくハミルトニアン変換を設計する。量子ハードウェアエミュレータを用いたアプローチの有効性を実証し,30量子ゲート未満の12量子ビットのシステムに対して化学的精度を実現する。我々のクリフォード拠点のハミルトン工学的アプローチは、短期量子デバイス上での実用的な量子計算化学のための有望な道を提供する。 Achieving chemical accuracy with shallow quantum circuits is a significant challenge in quantum computational chemistry, particularly for near-term quantum devices. In this work, we present a Clifford-based Hamiltonian engineering algorithm, namely CHEM, that addresses the trade-off between circuit depth and accuracy. Based on variational quantum eigensolver and hardware-efficient ansatz, our method designs Clifford-based Hamiltonian transformation that (1) ensures a set of initial circuit parameters corresponding to the Hartree--Fock energy can be generated, (2) effectively maximizes the initial energy gradient with respect to circuit parameters, (3) imposes negligible overhead for classical processing and does not require additional quantum resources, and (4) is compatible with any circuit topology. We demonstrate the efficacy of our approach using a quantum hardware emulator, achieving chemical accuracy for systems as large as 12 qubits with fewer than 30 two-qubit gates. Our Clifford-based Hamiltonian engineering approach offers a promising avenue for practical quantum computational chemistry on near-term quantum devices.	翻訳日:2023-12-13 02:22:22 公開日:2023-12-10
# 自己回帰型ニューラル演算子の安定性に向けて Towards Stability of Autoregressive Neural Operators ( http://arxiv.org/abs/2306.10619v2 ) ライセンス: Link先を確認	Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown	(参考訳) ニューラル演算子は、物理科学における時空間系のモデリングに有望なアプローチであることが証明されている。しかし、これらのモデルを大規模システム向けにトレーニングすることは、計算とメモリの大幅なコストを発生させるため、非常に難しい - これらのシステムは、将来の時間状態を予測するために、ニューラルネットワークの自動回帰的タイムステッピングに頼ることを余儀なくされることが多い。これはコスト管理に有効であるが、時間とともに制御不能なエラーの増加と最終的には不安定になる可能性がある。この自己回帰的誤差の増大の原因を,物理システムのための先駆的ニューラルオペレータモデルを用いて解析し,その軽減法を探究する。計算/メモリコストを膨らませることなく、これらのモデル内で不安定誘導操作を慎重に制御できるアーキテクチャとアプリケーション固有の改善を導入する。本研究では,Navier-Stokes流体の流れ,浅瀬の回転,高分解能気象予報システムなどの科学システムについて報告する。ニューラル演算子に設計原則を適用することで、長期的な予測や、質的な分岐の兆候のない長い時間軸に対する誤差が、これらのシステムのオリジナルのモデルよりも大幅に低減できることを実証する。再現性のために、私たちは \href{https://github.com/mikemccabe210/stabilizing_neural_operators}{code}をオープンソースにしました。 Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. 
We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our code at https://github.com/mikemccabe210/stabilizing_neural_operators for reproducibility.	翻訳日:2023-12-13 02:21:43 公開日:2023-12-10
# DNNに基づく適応型クルーズ制御システムに対する定常認識攻撃 Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems ( http://arxiv.org/abs/2307.08939v2 ) ライセンス: Link先を確認	Xugui Zhou and Anqi Chen and Maxfield Kouzel and Haotian Ren and Morgan McCarty and Cristina Nita-Rotaru and Homa Alemzadeh	(参考訳) アダプティブ・クルーズ・コントロール(ACC、Adaptive Cruise Control)は、先導車への所望の速度と安全な距離を維持するための運転補助技術である。本稿では, カメラデータに摂動を戦略的に注入して前方衝突を引き起こす, 実行時盗聴攻撃下でのディープニューラルネットワーク(DNN)ベースのACCシステムのセキュリティを評価する。本稿では、攻撃を誘発する最も重要な時間を選択するためのコンテキスト認識戦略と、実行時に画像摂動を適応的に生成するための新しい最適化手法を提案する。提案手法は,実車,実車,実運用型accシステムからの制御ソフトウェア,実世界の運転シミュレータ,運転者による介入,高度緊急ブレーキシステム(aebs)などの安全機能を備えた現実的なシミュレーションプラットフォームを用いて,提案手法の有効性を評価する。実験の結果, 本攻撃は, 実世界の要因や環境の動的変化に対してステルス性, 堅牢でありながら, 危険発生時の成功率142.9倍, 避難率89.6%向上することがわかった。本研究は,攻撃防止における人間ドライバーの役割と基本的な安全メカニズムを明らかにする。 Adaptive Cruise Control (ACC) is a widely used driver assistance technology for maintaining the desired speed and safe distance to the leading vehicle. This paper evaluates the security of the deep neural network (DNN) based ACC systems under runtime stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at runtime. We evaluate the effectiveness of the proposed attack using a publicly available driving dataset, an actual vehicle, and a realistic simulation platform with the control software from a production ACC system, a physical-world driving simulator, and interventions by the human driver and safety features such as Advanced Emergency Braking System (AEBS). Experimental results show that the proposed attack achieves 142.9 times higher success rate in causing hazards and 89.6% higher evasion rate than baselines while being stealthy and robust to real-world factors and dynamic changes in the environment. 
This study highlights the role of human drivers and basic safety mechanisms in preventing attacks.	翻訳日:2023-12-13 02:11:24 公開日:2023-12-10
# 関数近似を用いたロバスト強化学習のための自然アクター批判 Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation ( http://arxiv.org/abs/2307.08875v2 ) ライセンス: Link先を確認	Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian	(参考訳) 本研究では,トレーニングシミュレータとテスト環境間のモデルミスマッチに対して頑健な評価政策を決定することを目的として,ロバスト強化学習(RL)について検討する。従来のポリシーベースのロバストなRLアルゴリズムは主に、ロバストなポリシー評価を容易にする不確実性セットの下での表の設定に重点を置いているが、状態のスケールアップ時にはもはや取り外せない。この目的のために,2つの新しい不確実性集合の定式化を提案し,その1つは二重サンプリングに基づくものであり,もう1つは積分確率計量に基づくものである。どちらも、シミュレータにしかアクセスできない場合でも、大規模で堅牢なRLを牽引可能である。本稿では,新しい不確実性集合を取り入れ,関数近似を用いる,頑健な自然なアクター批判(RNAC)アプローチを提案する。提案するrnacアルゴリズムの関数近似誤差における最適ロバストポリシーに対する有限時間収束保証を提案する。最後に,複数の MuJoCo 環境と実際の TurtleBot ナビゲーションタスクにおいて,提案した RNAC アプローチによって学習されたポリシーの堅牢性を示す。 We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.	翻訳日:2023-12-13 02:11:03 公開日:2023-12-10
# 歩数認識と教師なし適応における視線バイアスのある領域ギャップ Watch Where You Head: A View-biased Domain Gap in Gait Recognition and Unsupervised Adaptation ( http://arxiv.org/abs/2307.06751v3 ) ライセンス: Link先を確認	Gavriel Habib, Noa Barzilay, Or Shimshi, Rami Ben-Ari, Nir Darshan	(参考訳) 歩行認識は、歩行パターンによって人々を識別することを目的としたコンピュータビジョンタスクである。既存のメソッドは特定のデータセットで高いパフォーマンスを示すことが多いが、見当たらないシナリオに一般化する能力が欠けている。 unsupervised domain adaptation(uda)は、ソースドメイン上で教師付きで事前学習されたモデルを、ラベルなしのターゲットドメインに適応させようとする。限られたシナリオに対するソリューションを提案する歩行認識のためのUDAに関する研究はわずかである。本稿では,対象領域の角度や歩行方向に対するバイアスによる歩行認識モデルの適用において,基本的な現象を明らかにする。そこで我々は,新しい三重項選択戦略とカリキュラム学習を組み合わせることで,このバイアスを軽減するための修正を提案する。そこで本稿では,教師なしドメイン適応(GOUDA)のためのゲイト指向方式を提案する。 casia-b,ou-mvlp,grown,gait3dの4つの広く使われているgaitデータセットと,gaitset,gaitpart,gaitglの3つのバックボーンについて広範な実験を行い,アプローチバイアスを正当化し,uda以前の作業よりも提案手法の優越性を示す。 Gait Recognition is a computer vision task aiming to identify people by their walking patterns. Although existing methods often show high performance on specific datasets, they lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. There are only a few works on UDA for gait recognition proposing solutions to limited scenarios. In this paper, we reveal a fundamental phenomenon in adaptation of gait recognition models, caused by the bias in the target domain to viewing angle or walking direction. We then suggest a remedy to reduce this bias with a novel triplet selection strategy combined with curriculum learning. To this end, we present Gait Orientation-based method for Unsupervised Domain Adaptation (GOUDA). We provide extensive experiments on four widely-used gait datasets, CASIA-B, OU-MVLP, GREW, and Gait3D, and on three backbones, GaitSet, GaitPart, and GaitGL, justifying the view bias and showing the superiority of our proposed method over prior UDA works.	翻訳日:2023-12-13 02:09:43 公開日:2023-12-10
# SOGDet:Semantic-Occupancy Guided Multi-view 3D Object Detection SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection ( http://arxiv.org/abs/2308.13794v2 ) ライセンス: Link先を確認	Qiu Zhou, Jinming Cao, Hanchao Leng, Yifang Yin, Yu Kun and Roger Zimmermann	(参考訳) 自動運転の分野では、3D環境の正確で包括的な認識が不可欠である。 Bird's Eye View (BEV) ベースの手法は、多視点画像を入力として使用する3Dオブジェクト検出のための有望なソリューションとして登場した。しかし、既存の3Dオブジェクト検出手法は、歩道や植生などの環境の物理的文脈を無視することが多く、結果として準最適性能が得られる。本稿では,sogdet(semantic-occupancy guided multi-view 3d object detection)と呼ばれる3次元意味空間枝を利用して3次元物体検出の精度を向上させる手法を提案する。特に、意味的占有によってモデル化された物理的文脈は、検出器がより総合的な視点でシーンを認識するのに役立つ。私たちのSOGDetは柔軟で、既存のほとんどのBEVベースのメソッドとシームレスに統合できます。本手法の有効性を評価するため,いくつかの最先端ベースラインに適用し,排他的nuScenesデータセット上で広範囲な実験を行う。以上の結果から,SOGDet は nuScenes Detection Score (NDS) と平均平均精度 (mAP) の3つのベースライン法の性能を一貫して向上させることがわかった。これは、3Dオブジェクト検出と3Dセマンティック占有の組み合わせが、3D環境をより包括的に認識し、より堅牢な自律運転システムの構築を支援することを示唆している。コードは、https://github.com/zhouqiu/SOGDet で入手できる。 In the field of autonomous driving, accurate and comprehensive perception of the 3D environment is crucial. Bird's Eye View (BEV) based methods have emerged as a promising solution for 3D object detection using multi-view images as input. However, existing 3D object detection methods often ignore the physical context in the environment, such as sidewalk and vegetation, resulting in sub-optimal performance. In this paper, we propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection), that leverages a 3D semantic-occupancy branch to improve the accuracy of 3D object detection. In particular, the physical context modeled by semantic occupancy helps the detector to perceive the scenes in a more holistic view. Our SOGDet is flexible to use and can be seamlessly integrated with most existing BEV-based methods. To evaluate its effectiveness, we apply this approach to several state-of-the-art baselines and conduct extensive experiments on the exclusive nuScenes dataset. 
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP). This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems. The code is available at: https://github.com/zhouqiu/SOGDet.	翻訳日:2023-12-13 02:04:23 公開日:2023-12-10
# オンライン動的埋め込み予測による停滞解消型分散gnnトレーニング Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction ( http://arxiv.org/abs/2308.13466v2 ) ライセンス: Link先を確認	Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao	(参考訳) 最近のグラフニューラルネットワーク(GNN)の成功にもかかわらず、近隣の爆発によって大規模なグラフでGNNをトレーニングすることは依然として困難である。修正として、分散コンピューティングは、豊富なコンピューティングリソース(例えばgpu)を活用することで、有望なソリューションになる。しかし,グラフデータのノード依存性は,大規模な通信オーバーヘッドに悩まされる分散GNNトレーニングにおいて,高い並行性を実現することの難しさを増大させる。これを解決するために、歴史的価値近似は分散トレーニング技術の有望なクラスと見なされる。オフラインメモリを使用して、正確な値の安価な近似として履歴情報をキャッシュし、高い並行性を実現する。しかし、そのような利点は、古いトレーニング情報を含むコストがかかるため、停滞、不正確さ、および収束の問題に繋がる。これらの課題を克服するため,本稿では,新しいスケーラブル分散gnnトレーニングフレームワークであるsat(staleness-alleviated training)を提案する。 SATの鍵となる考え方は、GNNの埋め込み進化を時間グラフとしてモデル化し、その上にモデルを構築し、将来の埋め込みを予測することである。本稿では,埋め込み予測器と分散GNNを代替的に学習するオンラインアルゴリズムを提案し,さらに収束解析を行う。実験により,satは組込みの停滞を効果的に軽減し,大規模グラフデータセットの性能と収束速度を向上できることを実証した。 Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train GNNs on large-scale graphs due to neighbor explosions. As a remedy, distributed computing becomes a promising solution by leveraging abundant computing resources (e.g., GPU). However, the node dependency of graph data increases the difficulty of achieving high concurrency in distributed GNN training, which suffers from the massive communication overhead. To address it, Historical value approximation is deemed a promising class of distributed training techniques. It utilizes an offline memory to cache historical information (e.g., node embedding) as an affordable approximation of the exact value and achieves high concurrency. However, such benefits come at the cost of involving dated training information, leading to staleness, imprecision, and convergence issues. To overcome these challenges, this paper proposes SAT (Staleness-Alleviated Training), a novel and scalable distributed GNN training framework that reduces the embedding staleness adaptively. 
The key idea of SAT is to model the GNN's embedding evolution as a temporal graph and build a model upon it to predict future embeddings, which effectively alleviates the staleness of the cached historical embeddings. We propose an online algorithm to train the embedding predictor and the distributed GNN alternately and further provide a convergence analysis. Empirically, we demonstrate that SAT can effectively reduce embedding staleness and thus achieve better performance and convergence speed on multiple large-scale graph datasets.	翻訳日:2023-12-13 02:03:55 公開日:2023-12-10
# v2a-mapper:基盤モデル接続による視覚-聴覚生成のための軽量ソリューション V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models ( http://arxiv.org/abs/2308.09300v3 ) ライセンス: Link先を確認	Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai	(参考訳) 基礎モデル(FM)の上に人工知能(AI)システムを構築することは、AI研究における新たなパラダイムになりつつある。膨大なデータから学習した代表的および生成能力は、スクラッチから余分なトレーニングをすることなく、容易に適応し、幅広い下流タスクに移行することができる。しかし、音声モダリティが関与する場合、クロスモーダル生成におけるFMの活用は未検討のままである。一方,視覚入力から意味的関連音を自動生成することは,モーダル・ジェネレーション研究において重要な課題である。このvision-to-audio(v2a)生成問題を解決するために、既存の手法では、小さなデータセットを使って複雑なシステムをスクラッチから設計し構築する傾向がある。本稿では,基礎モデル,特にCLIP,CLAP,AudioLDMを活用することで,この問題に対する軽量な解決策を提案する。まず視覚的CLIPの潜在空間と聴覚的CLAPモデルとの領域ギャップについて検討する。次に,CLIP と CLAP 空間間の視覚的入力を変換することで,領域ギャップを埋めるシンプルなマッパー機構 (V2A-Mapper) を提案する。変換されたCLAP埋め込みを条件に、事前訓練された音声生成FM AudioLDMを採用し、高忠実で視覚的に整合した音を生成する。従来の手法と比較して,本手法ではV2A-Mapperの迅速な訓練しか必要としない。さらに、V2A-Mapperの選択に関する広範な実験を行い、生成マッパーが忠実度と可変性(FD)に優れ、レグレッションマッパーが相対性(CS)に若干優れていることを示す。 2つのV2Aデータセットの客観的評価と主観評価は、現在の最先端手法と比較して、提案手法の優位性を示し、パラメータは86%少なく、FDとCSは53%、CSは19%改善した。 Building artificial intelligence (AI) systems on top of a set of foundation models (FMs) is becoming a new paradigm in AI research. Their representative and generative abilities learnt from vast amounts of data can be easily adapted and transferred to a wide range of downstream tasks without extra training from scratch. However, leveraging FMs in cross-modal generation remains under-researched when audio modality is involved. On the other hand, automatically generating semantically-relevant sound from visual input is an important problem in cross-modal generation studies. To solve this vision-to-audio (V2A) generation problem, existing methods tend to design and build complex systems from scratch using modestly sized datasets. In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM. 
We first investigate the domain gap between the latent space of the visual CLIP and the auditory CLAP models. Then we propose a simple yet effective mapper mechanism (V2A-Mapper) to bridge the domain gap by translating the visual input between CLIP and CLAP spaces. Conditioned on the translated CLAP embedding, pretrained audio generative FM AudioLDM is adopted to produce high-fidelity and visually-aligned sound. Compared to previous approaches, our method only requires a quick training of the V2A-Mapper. We further analyze and conduct extensive experiments on the choice of the V2A-Mapper and show that a generative mapper is better at fidelity and variability (FD) while a regression mapper is slightly better at relevance (CS). Both objective and subjective evaluation on two V2A datasets demonstrate the superiority of our proposed method compared to current state-of-the-art approaches - trained with 86% fewer parameters but achieving 53% and 19% improvement in FD and CS, respectively.	翻訳日:2023-12-13 02:01:56 公開日:2023-12-10
# 一般浅層ネットワークによる近似のトラクタビリティ Tractability of approximation by general shallow networks ( http://arxiv.org/abs/2308.03230v2 ) ライセンス: Link先を確認	Hrushikesh Mhaskar, Tong Mao	(参考訳) 本稿では,一般浅層ネットワークの次元独立境界(ニューラルネットワーク, \textbf{123} (2020), 142-152)において,よりシャープな結果版を提案する。 \mathbb{x}$ と $\mathbb{y}$ をコンパクト距離空間とする。 x\mapsto\int_{\mathbb{Y}} G(x, y)d\tau( y)$, $ x\in\mathbb{X}$, by $G$-networks of the form $ x\mapsto \sum_{k=1}^n a_kG(x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$。被覆数の観点から、$\mathbb{x}$ と $\mathbb{y}$ の次元を定義すると、n$ の項で近似の次数上の次元独立な境界が得られる。応用には、高次元空間への関数拡張の重要な問題だけでなく、電力整合線形単位ネットワーク、粒子関数ネットワーク、特定の放射基底関数ネットワークによる近似が含まれる。 In this paper, we present a sharper version of the results in the paper Dimension independent bounds for general shallow networks; Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$, $ x\in\mathbb{X}$, by $G$-networks of the form $ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension independent bounds on the degree of approximation in terms of $n$, where also the constants involved are all dependent at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks as well as the important problem of function extension to higher dimensional spaces.	翻訳日:2023-12-13 01:59:22 公開日:2023-12-10
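The $G$-network form in the abstract above, $x\mapsto \sum_{k=1}^n a_kG(x, y_k)$, can be illustrated with a minimal sketch. It is not the paper's construction: the Gaussian kernel `G`, the interval $[0,1]$ standing in for $\mathbb{Y}$, the choice of $\tau$ as Lebesgue measure, and the midpoint centers with equal weights are all assumptions made for illustration.

```python
import math

def G(x: float, y: float) -> float:
    # Hypothetical kernel chosen only for illustration (a Gaussian bump).
    return math.exp(-(x - y) ** 2)

def g_network(x, centers, weights):
    # Evaluate the G-network x -> sum_k a_k * G(x, y_k).
    return sum(a * G(x, y) for a, y in zip(weights, centers))

def exact_integral(x):
    # Closed form of the integral of exp(-(x - y)^2) over y in [0, 1].
    return 0.5 * math.sqrt(math.pi) * (math.erf(1.0 - x) + math.erf(x))

n = 50
# Assumed choice: midpoint centers y_k and equal weights a_k = 1/n.
centers = [(k + 0.5) / n for k in range(n)]
weights = [1.0 / n] * n

# Largest deviation from the target function on a small grid of x values.
max_error = max(
    abs(g_network(x, centers, weights) - exact_integral(x))
    for x in [i / 10 for i in range(11)]
)
```

In this one-dimensional toy case the error shrinks as `n` grows; the paper's results concern how such rates behave in terms of $n$ and the dimensions of $\mathbb{X}$ and $\mathbb{Y}$, which this sketch does not attempt to reproduce.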
# QNNのトレーニングに必要なサンプル量の削減について:トレーニングデータの線形構造に関する制約 On Reducing the Amount of Samples Required for Training of QNNs: Constraints on the Linear Structure of the Training Data ( http://arxiv.org/abs/2309.13711v2 ) ライセンス: Link先を確認	Alexander Mandl, Johanna Barzen, Frank Leymann, Daniel Vietz	(参考訳) 古典的ニューラルネットワークのトレーニングは通常、多数のトレーニングサンプルを必要とする。絡み合ったトレーニングサンプルを使用することで、量子ニューラルネットワーク(QNN)はトレーニングプロセスに必要なトレーニングサンプルの量を著しく削減する可能性がある。しかし、結果のQNNによる誤った予測数を最小化するためには、トレーニングサンプルの構造が一定の要件を満たすことが不可欠である。一方、トレーニングサンプルのセット全体に対して、正確な絡み合いの程度が固定されなければならない。一方、トレーニングサンプルは線形独立かつ非直交でなければならない。しかし、これらの要件を満たさないことがQNNの結果に与える影響は、十分に研究されていない。これを解決するため、QNFL定理の証明を拡張した。 (i)絡み合いの程度の違いに対する定理の一般化を提供する。この一般化は、トレーニングサンプルのセットにおける絡み合いの平均度を用いて、QNNの期待品質を予測できることを示している。さらに私たちは (II) 線形依存型, 直交型である適度に絡み合ったトレーニングサンプルに対するQNNの予測精度の新しい推定値を導入する。私たちの分析結果は 3)QNN訓練を模擬し,訓練後のQNNの質を分析して実験的に検証した。 Training classical neural networks generally requires a large number of training samples. Using entangled training samples, Quantum Neural Networks (QNNs) have the potential to significantly reduce the amount of training samples required in the training process. However, to minimize the number of incorrect predictions made by the resulting QNN, it is essential that the structure of the training samples meets certain requirements. On the one hand, the exact degree of entanglement must be fixed for the whole set of training samples. On the other hand, training samples must be linearly independent and non-orthogonal. However, how failing to meet these requirements affects the resulting QNN is not fully studied. To address this, we extend the proof of the QNFL theorem to (i) provide a generalization of the theorem for varying degrees of entanglement. This generalization shows that the average degree of entanglement in the set of training samples can be used to predict the expected quality of the QNN. 
Furthermore, we (ii) introduce new estimates for the expected accuracy of QNNs for moderately entangled training samples that are linearly dependent or orthogonal. Our analytical results are (iii) experimentally validated by simulating QNN training and analyzing the quality of the QNN after training.	翻訳日:2023-12-13 01:51:35 公開日:2023-12-10
# ジョセフソンパラメトリック発振器を用いたイジングマシン A Josephson Parametric Oscillator-Based Ising Machine ( http://arxiv.org/abs/2309.03407v2 ) ライセンス: Link先を確認	Sasan Razmkhah, Mehdi Kamal, Nobuyuki Yoshikawa, Massoud Pedram	(参考訳) イジングマシンはNP完全組合せ最適化問題を高速に解くための有望なソリューションとして登場し、従来の計算手法の能力を超越している。加熱過程におけるハミルトン基底状態の効率的な決定により、Isingマシンは最適化問題に対処するためにCPUを効率的に補完することができる。これらのイジングマシンを実現するために、二安定発振器はイジングモデルの原子スピンと相互作用をエミュレートするために必須である。本研究では,スケーラブルな超伝導イジングマシンの基本単位として,ジョセフソンパラメトリック振動子(jpo)を用いたタイル構造を提案する。超伝導体ベースの発振器であるJPOの双安定特性を利用して、提案機は7.5GHzの周波数で動作でき、CMOSベースのシステムに比べて消費電力は大幅に少ない(3桁)。さらに、提案したタイル構造とLHZアーキテクチャとの互換性により、大規模統合の実現性が保証される。騒音環境下でのタイルのシミュレーションを行い,その機能検証を行った。その結果をハミルトニアンモデルの解析解と比較し,その動作特性を検証した。この検証は、Isingマシンの実装におけるJPOベースのタイルの有効性と有効性を示し、量子コンピューティングにおける効率的でスケーラブルな組合せ最適化のための新しい道を開く。 Ising machines have emerged as a promising solution for rapidly solving NP-complete combinatorial optimization problems, surpassing the capabilities of traditional computing methods. By efficiently determining the ground state of the Hamiltonian during the annealing process, Ising machines can effectively complement CPUs in tackling optimization challenges. To realize these Ising machines, a bi-stable oscillator is essential to emulate the atomic spins and interactions of the Ising model. This study introduces a Josephson parametric oscillator (JPO)-based tile structure, serving as a fundamental unit for scalable superconductor-based Ising machines. Leveraging the bi-stable nature of JPOs, which are superconductor-based oscillators, the proposed machine can operate at frequencies of 7.5GHz while consuming significantly less power (by three orders of magnitude) than CMOS-based systems. Furthermore, the compatibility of the proposed tile structure with the Lechner-Hauke-Zoller (LHZ) architecture ensures its viability for large-scale integration. We conducted simulations of the tile in a noisy environment to validate its functionality. 
We verified its operational characteristics by comparing the results with the analytical solution of its Hamiltonian model. This verification demonstrates the feasibility and effectiveness of the JPO-based tile in implementing Ising machines, opening new avenues for efficient and scalable combinatorial optimization in quantum computing.	翻訳日:2023-12-13 01:49:07 公開日:2023-12-10
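What "determining the ground state of the Hamiltonian" means for an Ising model can be sketched by exhaustive search on a tiny instance. This is only an illustration of the optimization target, not of the JPO hardware or of annealing; the sign convention, the function names, and the three-spin example are assumptions.

```python
from itertools import product

def ising_energy(spins, couplings, fields):
    # Assumed convention: H(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i.
    interaction = sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    field = sum(h * s for h, s in zip(fields, spins))
    return -interaction - field

def ground_states(n, couplings, fields):
    # Brute force over all 2^n spin configurations (feasible only for small n).
    best_energy = None
    best = []
    for spins in product((-1, 1), repeat=n):
        e = ising_energy(spins, couplings, fields)
        if best_energy is None or e < best_energy:
            best_energy, best = e, [spins]
        elif e == best_energy:
            best.append(spins)
    return best_energy, best

# Hypothetical instance: a ferromagnetic three-spin chain with no external field.
couplings = {(0, 1): 1.0, (1, 2): 1.0}
fields = [0.0, 0.0, 0.0]
energy, states = ground_states(3, couplings, fields)
```

The exhaustive loop grows as $2^n$, which is why physical Ising machines that relax toward low-energy states are of interest for larger problems.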
# パッチバイパッチパラダイムによるGANを用いた無限サイズのテクスチャ生成 Generating Infinite-Size Textures using GANs with Patch-by-Patch Paradigm ( http://arxiv.org/abs/2309.02340v3 ) ライセンス: Link先を確認	Alhasan Abdellatif and Ahmed H. Elsheikh	(参考訳) 本稿では,パッチ・バイ・パッチ・パラダイムに基づくGAN(Generative Adversarial Networks)を用いて,無限サイズのテクスチャ画像を生成する手法を提案する。既存のテクスチャ合成技術は、生成モデルへの単一のフォワードパスを使用して、大規模なテクスチャを生成することに依存しており、このアプローチは生成される画像のスケーラビリティと柔軟性を制限する。対照的に、提案手法は単一のテクスチャイメージ上にGANモデルをトレーニングし、局所的に相関し、より大きな画像を形成するためにシームレスに結合できる比較的小さなパッチを生成する。このメソッドはジェネレータのローカルパディングに依存し、生成されたパッチ間の一貫性を保証する。また、空間確率変調を利用して局所的な変動を可能にし、大規模画像のパターンアライメントを改善する。トレーニングされたモデルは、局所的なテクスチャ構造を学び、任意のサイズの画像を生成すると同時に、一貫性と多様性を維持します。実験結果は、GPUメモリの比例的な成長を示す既存のアプローチと比較して、生成した画像サイズに対して一定のGPUスケーラビリティを示す。 In this paper, we introduce a novel approach for generating texture images of infinite sizes using Generative Adversarial Networks (GANs) based on a patch-by-patch paradigm. Existing texture synthesis techniques rely on generating large-scale textures using a single forward pass to the generative model; this approach limits the scalability and flexibility of the images produced. In contrast, the proposed approach trains a GAN model on a single texture image to generate relatively small-size patches that are locally correlated and can be seamlessly concatenated to form a larger image. The method relies on local padding in the generator to ensure consistency between the generated patches. It also utilizes spatial stochastic modulation to allow for local variations and improve pattern alignment in the large-scale image. The trained models learn the local texture structure and are able to generate images of arbitrary sizes, while also maintaining coherence and diversity. Experimental results demonstrate constant GPU scalability with respect to the generated image size compared to existing approaches that exhibit a proportional growth in GPU memory.	翻訳日:2023-12-13 01:48:16 公開日:2023-12-10
# MedShapeNet - コンピュータビジョンのための3D医療形状の大規模データセット MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision ( http://arxiv.org/abs/2308.16139v4 ) ライセンス: Link先を確認	Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, Heinrich M\"achler, Jan Stefan Kirschke, Ezequiel de la Rosa, Patrick Ferdinand Christ, Hongwei Bran Li, David G. Ellis, Michele R. Aizenberg, Sergios Gatidis, Thomas K\"ustner, Nadya Shusharina, Nicholas Heller, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Anjany Sekuboyina, Maximilian L\"offler, Hans Liebl, Reuben Dorent, Tom Vercauteren, Jonathan Shapey, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Achraf Ben-Hamadou, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Federico Bolelli, Costantino Grana, Luca Lumetti, Hamidreza Salehi, Jun Ma, Yao Zhang, Ramtin Gharleghi, Susann Beier, Arcot Sowmya, Eduardo A. 
Garza-Villarreal, Thania Balducci, Diego Angeles-Valdez, Roberto Souza, Leticia Rittner, Richard Frayne, Yuanfeng Ji, Vincenzo Ferrari, Soumick Chatterjee, Florian Dubost, Stefanie Schreiber, Hendrik Mattern, Oliver Speck, Daniel Haehn, Christoph John, Andreas N\"urnberger, Jo\~ao Pedrosa, Carlos Ferreira, Guilherme Aresta, Ant\'onio Cunha, Aur\'elio Campilho, Yannick Suter, Jose Garcia, Alain Lalande, Vicky Vandenbossche, Aline Van Oevelen, Kate Duquesne, Hamza Mekhzoum, Jef Vandemeulebroucke, Emmanuel Audenaert, Claudia Krebs, Timo van Leeuwen, Evie Vereecke, Hauke Heidemeyer, Rainer R\"ohrig, Frank H\"olzle, Vahid Badeli, Kathrin Krieger, Matthias Gunzer, Jianxu Chen, Timo van Meegdenburg, Amin Dada, Miriam Balzer, Jana Fragemann, Frederic Jonske, Moritz Rempe, Stanislav Malorodov, Fin H. Bahnsen, Constantin Seibold, Alexander Jaus, Zdravko Marinov, Paul F. Jaeger, Rainer Stiefelhagen, Ana Sofia Santos, Mariana Lindo, Andr\'e Ferreira, Victor Alves, Michael Kamp, Amr Abourayya, Felix Nensa, Fabian H\"orst, Alexander Brehmer, Lukas Heine, Yannik Hanusrichter, Martin We{\ss}ling, Marcel Dudda, Lars E. Podleska, Matthias A. Fink, Julius Keyl, Konstantinos Tserpes, Moon-Sung Kim, Shireen Elhabian, Hans Lamecker, D\v{z}enan Zuki\'c, Beatriz Paniagua, Christian Wachinger, Martin Urschler, Luc Duong, Jakob Wasserthal, Peter F. Hoyer, Oliver Basu, Thomas Maal, Max J. H. Witjes, Gregor Schiele, Ti-chiun Chang, Seyed-Ahmad Ahmadi, Ping Luo, Bjoern Menze, Mauricio Reyes, Thomas M. Deserno, Christos Davatzikos, Behrus Puladi, Pascal Fua, Alan L. 
Yuille, Jens Kleesiek, Jan Egger	(参考訳) 深層学習以前は、対象を記述するのに \textit{shape} が一般的であった。今日では、医療画像における最先端(SOTA)アルゴリズムは、主にボクセルグリッド、メッシュ、ポイントクラウド、暗黙の表面モデルを使用するコンピュータビジョンから分岐している。これは、プレミアビジョンカンファレンスにおける多くのシェイプ関連出版物、および \textit{ShapeNet} (約51,300モデル) と \textit{Princeton ModelNet} (127,915モデル) の人気が高まっている。医療分野では,医療応用へのデータ駆動型視覚アルゴリズムの翻訳を容易にし,SOTAビジョンアルゴリズムを医療問題に適用するために,解剖学的形状(骨,臓器,血管など)と手術器具の3次元モデル(\textit{MedShapeNet})を多数提示する。特異な特徴として,実際の患者の画像データから形状のほとんどを直接モデル化する。今日、 \textit{MedShapeNet} には、アノテーションとペアリングされた10万以上の形状を持つ23のデータセットが含まれている(ground truth)。私たちのデータは、webインターフェースとPython application programming interface(API)を介して自由にアクセスでき、判別、再構成、変動ベンチマーク、仮想、拡張、混合現実、および3Dプリンティングの様々なアプリケーションで使用できます。例として,脳腫瘍の分類,顔面と頭蓋骨の再建,マルチクラス解剖学の完成,教育,3Dプリンティングの分野での応用例を挙げる。将来的には、データを拡張し、インターフェースを改善します。プロジェクトページは、 \url{https://medshapenet.ikim.nrw/} と \url{https://github.com/Jianningli/medshapenet-feedback} である。 Prior to the deep learning era, \textit{shape} was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of \textit{ShapeNet} (about 51,300 models) and \textit{Princeton ModelNet} (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called \textit{MedShapeNet}, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, \textit{MedShapeNet} includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). 
Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: \url{https://medshapenet.ikim.nrw/} and \url{https://github.com/Jianningli/medshapenet-feedback}	翻訳日:2023-12-13 01:47:03 公開日:2023-12-10
# エキシトン-ポーラリトン凝縮:フーリエニューラルオペレーターアプローチ Exciton-Polariton Condensates: A Fourier Neural Operator Approach ( http://arxiv.org/abs/2309.15593v2 ) ライセンス: Link先を確認	Surya T. Sathujoda, Yuan Wang, Kanishk Gandhi	(参考訳) 過去10年間の半導体製造の進歩は、エキシトン・ポラリトン凝縮によって駆動される全光学デバイスに関する広範な研究を触媒している。トランジスタを含むこれらの装置の予備的検証は、環境条件下においても奨励効果を示す。しかし、大規模な応用には依然として大きな課題が残っており、安定するために長い時間を要する複雑な非線形系をシミュレートするために使用できる堅牢な解法がない。このニーズに対処するため,機械学習に基づくフーリエニューラル演算子の応用を提案し,グロス・ピタエフスキー方程式と余剰エキシトンレート方程式の解を求める。この研究は、ニューラル演算子のエキシトン-ポラリトン凝縮系への最初の直接的応用である。提案手法は,CUDAベースのGPU解法よりも1000倍近い精度で最終状態の解を予測できることを示す。さらに、これは実験データを統合することによって、全光学チップ設計ワークフローの潜在的な道を開く。 Advancements in semiconductor fabrication over the past decade have catalyzed extensive research into all-optical devices driven by exciton-polariton condensates. Preliminary validations of such devices, including transistors, have shown encouraging results even under ambient conditions. A significant challenge still remains for large scale application however: the lack of a robust solver that can be used to simulate complex nonlinear systems which require an extended period of time to stabilize. Addressing this need, we propose the application of a machine-learning-based Fourier Neural Operator approach to find the solution to the Gross-Pitaevskii equations coupled with extra exciton rate equations. This work marks the first direct application of Neural Operators to an exciton-polariton condensate system. Our findings show that the proposed method can predict final-state solutions to a high degree of accuracy almost 1000 times faster than CUDA-based GPU solvers. Moreover, this paves the way for potential all-optical chip design workflows by integrating experimental data.	翻訳日:2023-12-13 01:37:37 公開日:2023-12-10
# Kmスケール大気下降の残留拡散モデル Residual Diffusion Modeling for Km-scale Atmospheric Downscaling ( http://arxiv.org/abs/2309.15214v3 ) ライセンス: Link先を確認	Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, and Mike Pritchard	(参考訳) 気象リスクの予測には、粗いグローバルインプットによって駆動される高価なkmスケールシミュレーションが必要である。ここでは,25km ERA5再解析に基づく台湾上空2kmの高分解能気象モデルを用いて,コスト効率の高い確率ダウンスケーリングモデルを訓練する。気象データのマルチスケール機械学習の課題に対処するために、2段階のアプローチ補正拡散(\textit{corrdiff})を採用し、そこで平均のunet予測を拡散ステップで補正する。レイノルズによる流体力学の分解と同様に、これは生成学習を確率スケールに分離する。 \textit{corrdiff} は熟練したrmseと crps を示し、極端でもスペクトルと分布を忠実に復元する。コヒーレント気象現象のケーススタディでは、台風の目壁付近で激しい降雨と急勾配のコロケーション、極端な風と降雨帯といった、学習物理学を連想させる適切な多変量関係が示される。グローバルな予測のスケールダウンは、これらのメリットの多くをうまく維持し、マシンラーニングの天気予報のエンドツーエンドなグローバルなスケールの可能性を先導する。 Predictions of weather hazard require expensive km-scale simulations driven by coarser global inputs. Here, a cost-effective stochastic downscaling model is trained from a high-resolution 2-km weather model over Taiwan conditioned on 25-km ERA5 reanalysis. To address the multi-scale machine learning challenges of weather data, we employ a two-step approach Corrector Diffusion (\textit{CorrDiff}), where a UNet prediction of the mean is corrected by a diffusion step. Akin to Reynolds decomposition in fluid dynamics, this isolates generative learning to the stochastic scales. \textit{CorrDiff} exhibits skillful RMSE and CRPS and faithfully recovers spectra and distributions even for extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics: the collocation of intense rainfall and sharp gradients in fronts and extreme winds and rainfall bands near the eyewall of typhoons. Downscaling global forecasts successfully retains many of these benefits, foreshadowing the potential of end-to-end, global-to-km-scales machine learning weather predictions.	翻訳日:2023-12-13 01:37:23 公開日:2023-12-10
# 共同音声と音声の理解 Joint Audio and Speech Understanding ( http://arxiv.org/abs/2309.14405v3 ) ライセンス: Link先を確認	Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass	(参考訳) 人間は音声と非音声の両方を含む音声信号に囲まれている。音声および非音声音声イベントの認識と理解は、両者の関係を深く理解すると共に、基本的な認知能力を構成する。概念的に類似した普遍的なオーディオ知覚と高度な推論能力を持つ、ltu-asと呼ばれる機械学習モデルが初めて構築されました。具体的には、Whisperを知覚モジュールとして、LLaMAを推論モジュールとして統合することにより、LTU-ASは音声テキスト、音声パラ言語学、非音声音声イベントを同時に認識し、共同理解することができる。 Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.	翻訳日:2023-12-13 01:36:06 公開日:2023-12-10
# 羅生門重要度分布:不安定かつ単一モデルに基づく可変値のRID化 The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance ( http://arxiv.org/abs/2309.13775v3 ) ライセンス: Link先を確認	Jon Donnelly, Srikar Katta, Cynthia Rudin, Edward P. Browne	(参考訳) 可変重要度を定量化することは、遺伝学、公共政策、医学などの分野における高リスクな質問に答えるために不可欠である。現在の手法は一般に、与えられたデータセットでトレーニングされた与えられたモデルに対する変数の重要度を計算する。しかし、あるデータセットに対して、ターゲットとなる結果について等しく説明できる多くのモデルが存在するかもしれない。さらに、与えられたデータセットの可能なすべての説明を考慮に入れたとしても、これらの洞察は一般化しないかもしれない。本稿では,すべての優れたモデルの集合における変数の重要性を定量化し,データ分布全体で安定な新しい変数重要度フレームワークを提案する。私たちのフレームワークは非常に柔軟で、既存のモデルクラスやグローバル変数重要度メトリクスと統合できます。実験により,提案手法は他の手法が失敗する複雑なシミュレーション環境において,変数重要度ランキングを回復することを示した。さらに,本フレームワークは,基礎となるデータ分布に対する変数の真の重要性を正確に推定する。推定器の整合性および有限サンプル誤差率に関する理論的保証を提供する。最後に、HIV感染者のHIV負荷を予測するためにどの遺伝子が重要であるかを実世界のケーススタディで検証し、これまで研究されていない重要な遺伝子を強調した。コードはhttps://github.com/jdonnelly36/Rashomon_Importance_Distributionで公開されている。 Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. 
We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.	翻訳日:2023-12-13 01:35:55 公開日:2023-12-10
# 大規模分散モデルトレーニングのための効率的な並列化レイアウト Efficient Parallelization Layouts for Large-Scale Distributed Model Training ( http://arxiv.org/abs/2311.05610v2 ) ライセンス: Link先を確認	Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo	(参考訳) 大きな言語モデルを効果的に訓練するには、数百のハードウェアアクセラレーターを並列化し、様々な計算とメモリの最適化を実行する必要がある。組み合わせると、これらの戦略の多くは最終訓練効率に関する複雑な相互作用を持つ。この問題に取り組む以前の作業では、フラッシュアテンションやシーケンス並列処理など、最新の最適化セットにアクセスできなかった。本研究では,大規模言語モデルのトレーニング構成に関する包括的アブレーション研究を行う。この大規模な研究を、最も効率的なトレーニングのためのいくつかの重要な推奨事項にまとめます。例えば、マイクロバッチサイズ1を使用することで、最も効率的なトレーニングレイアウトが可能になります。より大きなマイクロバッチサイズは、アクティベーションチェックポイントやモデル並列性の高次化を必要とし、さらに大きなパイプラインバブルにつながる。最も効率的な構成は、Llama 13Bモデルをトレーニングする際のモデルFLOPs利用率70.5%など、様々なモデルサイズで最先端のトレーニング効率を達成できます。 Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a Llama 13B model.	翻訳日:2023-12-13 01:28:35 公開日:2023-12-10
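The abstract above reports Model FLOPs utilization (MFU) without defining it. A minimal sketch of how such a figure is commonly computed follows, assuming the widely used approximation of about 6 FLOPs per parameter per token for a forward and backward pass (attention FLOPs ignored). The function name, the throughput of 2000 tokens per second per device, and the 312 TFLOP/s peak are hypothetical inputs chosen for illustration; they are not taken from the paper and do not reproduce its 70.5% result.

```python
def model_flops_utilization(n_params, tokens_per_second, peak_flops_per_second):
    # Approximation (an assumption): ~6 FLOPs per parameter per token for
    # forward + backward, ignoring attention-specific FLOPs.
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second

# Hypothetical per-device numbers, for illustration only.
mfu = model_flops_utilization(
    n_params=13e9,                 # a 13B-parameter model
    tokens_per_second=2000.0,      # assumed observed throughput per device
    peak_flops_per_second=312e12,  # assumed hardware peak per device
)
```

With these assumed inputs the ratio is 0.5, that is, half of the assumed peak; a real measurement would substitute the observed throughput and the actual hardware peak.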
# 量子力学の仮定としての最大エントロピー原理 Maximum Entropy Principle as Postulate of Quantum Mechanics ( http://arxiv.org/abs/2311.04893v2 ) ライセンス: Link先を確認	Alexei V. Tkachenko	(参考訳) 量子力学(QM)の定式化から1世紀も経っても、波動関数崩壊(WFC)は理論の論争的な側面のままである。環境誘起デコヒーレンス(英語版)は、オープン量子システムにおけるユニタリ進化が、そのコンポーネント内の効果的なwfcにどのようにつながるかを示すことによって、部分的な解決を提供する。しかし、このアプローチ自体がQMの完全自己整合的な再構成につながるわけではない。我々は、WFCとボルンの確率則の両方を除外した修正されたQM仮定を導入する。最大エントロピー原理(英: Maximum Entropy Principle)は、相互に互換性のある観測のための条件付き確率を示す、より弱い仮定である。この定式化の中で、WFCとボルンの規則は共に新しい性質となる。 Even a century after the formulation of Quantum Mechanics (QM), the wave function collapse (WFC) remains a contentious aspect of the theory. Environment-induced decoherence has offered a partial resolution by illustrating how unitary evolution in an open quantum system can lead to effective WFC within its components. However, this approach by itself does not lead to a fully self-consistent reformulation of QM. We introduce a modified set of QM postulates, which exclude both WFC and Born's probability rule. They are replaced with the Maximum Entropy Principle, a weaker postulate that specifies conditional probabilities for mutually compatible observations. Within this formulation, both WFC and Born's rule become emerging properties.	翻訳日:2023-12-13 01:28:01 公開日:2023-12-10
# MAS:2次元拡散を用いた3次元モーション生成のためのマルチビューアンセストラルサンプリング MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion ( http://arxiv.org/abs/2310.14729v2 ) ライセンス: Link先を確認	Roy Kapon, Guy Tevet, Daniel Cohen-Or and Amit H. Bermano	(参考訳) 本研究では,3次元映像から得られた動きに基づいて学習した2次元拡散モデルを用いて,3次元動き生成法であるマルチビューサンプリング(mas)を提案する。そのため、masは3dデータの収集が困難で困難であるため、これまで未調査だった、エキサイティングで多様な動きの分野へのチャンスを開く。 MASは、同じ3Dモーションの異なるビューを表す複数の2Dモーションシーケンスを同時に識別する。個々の世代を統一された3Dシーケンスに組み合わせ、元のビューに投影することで、各拡散ステップにおけるすべてのビューの一貫性を保証する。プロバスケットボールの練習、球技を特徴とする新体操競技、競馬の映像から得られた2次元ポーズデータのmasを実演する。それぞれの領域において、3Dモーションキャプチャは困難であるが、MASは多様なリアルな3Dシーケンスを生成する。小修正を繰り返し適用することで各試料を最適化するスコア蒸留法とは異なり,本手法は拡散フレームワークのために構築されたサンプリングプロセスを使用する。示すように、MASはドメイン外サンプリングやモード崩壊といった一般的な問題を避けます。 https://guytevet.github.io/mas-page/ We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation, using 2D diffusion models that were trained on motions obtained from in-the-wild videos. As such, MAS opens opportunities to exciting and diverse fields of motion previously under-explored as 3D data is scarce and hard to collect. MAS works by simultaneously denoising multiple 2D motion sequences representing different views of the same 3D motion. It ensures consistency across all views at each diffusion step by combining the individual generations into a unified 3D sequence, and projecting it back to the original views. We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers, rhythmic gymnastic performances featuring a ball apparatus, and horse races. In each of these domains, 3D motion capture is arduous, and yet, MAS generates diverse and realistic 3D sequences. Unlike the Score Distillation approach, which optimizes each sample by repeatedly applying small fixes, our method uses a sampling process that was constructed for the diffusion framework. As we demonstrate, MAS avoids common issues such as out-of-domain sampling and mode-collapse. 
https://guytevet.github.io/mas-page/	翻訳日:2023-12-13 01:26:11 公開日:2023-12-10
# BERTの一般化に対する人間敵対的サンプルと人間親和的サンプルの影響 Effects of Human Adversarial and Affable Samples on BERT Generalization ( http://arxiv.org/abs/2310.08008v4 ) ライセンス: Link先を確認	Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor	(参考訳) BERTベースのモデルは、リーダーボードでパフォーマンスが高かったが、現実の世界では一般化を必要とする状況では、かなり悪くなっている。限られた量のトレーニングデータは、機械学習における一般化性を達成するための鍵となる障害とみなされる。本稿では,モデルの一般化性に対する量ではなく,データ品質のトレーニングが与える影響について検討する。訓練データの特徴として,人間敵対的(h-adversarial)サンプルの割合,すなわち,一見小さな差異があるが接地ラベルが異なるサンプルペア,および人間親和的(h-affable)訓練サンプル,すなわち,小さな差異があるが同じ接地ラベルを持つサンプルペアの2つを検討した。サンプルの固定サイズについては,経験則として10～30%のh-adversarialインスタンスを持つと適合率が向上し,F1はテキスト分類や関係抽出のタスクにおいて最大20ポイント向上することがわかった。この範囲を超えてh-adversarialが増加すると、パフォーマンスのプラトーや劣化が起きる。対照的に、h-affablesはモデルの一般化可能性に寄与せず、一般化性能を低下させることもある。 BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data are considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model's generalizability. We consider two characteristics of training data: the portion of human-adversarial (h-adversarial), i.e., sample pairs with seemingly minor differences but different ground-truth labels, and human-affable (h-affable) training samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model's generalizability and may even degrade generalization performance.	翻訳日:2023-12-13 01:25:21 公開日:2023-12-10
# 動的外観粒子ニューラル放射場 Dynamic Appearance Particle Neural Radiance Field ( http://arxiv.org/abs/2310.07916v2 ) ライセンス: Link先を確認	Ancheng Lin, Jun Li	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は3Dシーンをモデル化する大きな可能性を示している。動的NeRFは、典型的には変形場を用いて、時間変化要素をキャプチャすることでこのモデルを拡張する。既存の動的NeRFは、光放射と変形場の両方に同様のオイラー表現を用いる。これは外見と動きを密結合させ、物理的解釈を欠いている。本研究では,動的3次元シーンにおける視覚的要素の運動をモデル化するための粒子ベース表現を導入し,DAP-NeRF(Dynamic Appearance Particle Neural Radiance Field)を提案する。 DAP-NeRFは静的場と動的場の重ね合わせからなる。動的場は、シーン内の小さな動的要素の視覚情報を伝達し、モーションモデルを備えた「外見粒子」の集合として定量化される。粒子の静的場、視覚特徴、運動モデルを含む全ての構成要素は、シーンに関する事前の幾何学的知識なしに単眼ビデオから学習される。粒子モデルのための効率的な計算フレームワークを開発する。また,動きモデリングを評価するための新しいデータセットを構築した。実験結果から, DAP-NeRFは外見だけでなく, 3次元動的シーンにおける身体的に意味のある動きを捉えるのに有効であることがわかった。 Neural Radiance Fields (NeRFs) have shown great potential in modelling 3D scenes. Dynamic NeRFs extend this model by capturing time-varying elements, typically using deformation fields. The existing dynamic NeRFs employ a similar Eulerian representation for both light radiance and deformation fields. This leads to a close coupling of appearance and motion and lacks a physical interpretation. In this work, we propose Dynamic Appearance Particle Neural Radiance Field (DAP-NeRF), which introduces a particle-based representation to model the motions of visual elements in a dynamic 3D scene. DAP-NeRF consists of a superposition of a static field and a dynamic field. The dynamic field is quantised as a collection of {\em appearance particles}, each of which carries the visual information of a small dynamic element in the scene and is equipped with a motion model. All components, including the static field, the visual features and motion models of the particles, are learned from monocular videos without any prior geometric knowledge of the scene. We develop an efficient computational framework for the particle-based model. We also construct a new dataset to evaluate motion modelling. 
Experimental results show that DAP-NeRF is an effective technique to capture not only the appearance but also the physically meaningful motions in a 3D dynamic scene.	翻訳日:2023-12-13 01:25:00 公開日:2023-12-10
# DeepSimHO:物理シミュレーションによる手動物体間相互作用の安定姿勢推定 DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation ( http://arxiv.org/abs/2310.07206v3 ) ライセンス: Link先を確認	Rong Wang, Wei Mao, Hongdong Li	(参考訳) 本稿では,単一画像から物体と相互作用する手の3次元ポーズ推定の課題について検討する。ハンド・オブジェクト相互作用のモデル化では、手が物体を安定して把握し、重力に逆行し、物体の滑りや落下を防止しなければならない動的性質を見落としながら、主に近接する手がかりを利用する。これらの仕事は、推定において動的制約を活用できず、結果としてしばしば不安定な結果を生み出す。一方で、物理ベースの推論による不安定な構成の洗練は、接触ダイナミクスの複雑さと、データ駆動学習フレームワークにおける効果的で効率的な物理推論の欠如の両方によって、依然として困難である。両問題に対処するため,我々は,前方物理学シミュレーションと後方勾配近似とニューラルネットワークを組み合わせた,新しいディープラーニングパイプラインであるDeepSimHOを提案する。具体的には,ベースネットワークによって推定された初期ハンドオブジェクトポーズに対して,その安定性を評価するために物理シミュレータに転送する。しかし、非スムース接触形状と浸透のため、既存の微分可能シミュレータは信頼できる状態勾配を提供することができない。この問題を解決するために,我々は,シミュレータから安定性評価プロセスをスムーズに学習し,その勾配を近似し,効果的なバックプロパゲーションを実現するディープネットワークを提案する。実験の結果,提案手法は評価の安定性を著しく向上し,テスト時間最適化よりも優れた効率性を実現することがわかった。コードはhttps://github.com/rongakowang/DeepSimHOで入手できる。 This paper addresses the task of 3D pose estimation for a hand interacting with an object from a single image observation. When modeling hand-object interaction, previous works mainly exploit proximity cues, while overlooking the dynamical nature that the hand must stably grasp the object to counteract gravity and thus prevent the object from slipping or falling. These works fail to leverage dynamical constraints in the estimation and consequently often produce unstable results. Meanwhile, refining unstable configurations with physics-based reasoning remains challenging, owing both to the complexity of contact dynamics and to the lack of effective and efficient physics inference in the data-driven learning framework. To address both issues, we present DeepSimHO: a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Specifically, for an initial hand-object pose estimated by a base network, we forward it to a physics simulator to evaluate its stability. 
However, due to non-smooth contact geometry and penetration, existing differentiable simulators cannot provide reliable state gradients. To remedy this, we further introduce a deep network to learn the stability evaluation process from the simulator, while smoothly approximating its gradient and thus enabling effective back-propagation. Extensive experiments show that our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization. The code is available at https://github.com/rongakowang/DeepSimHO.	翻訳日:2023-12-13 01:24:14 公開日:2023-12-10
# ニューラルネットワークを用いた確率的構造メタマテリアルの機械的特性と逆設計 Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators ( http://arxiv.org/abs/2311.13812v2 ) ライセンス: Link先を確認	Hanxun Jin, Enrui Zhang, Boyu Zhang, Sridhar Krishnaswamy, George Em Karniadakis, Horacio D. Espinosa	(参考訳) 機械学習(ML)は、設計した材料を設計するための変革的なツールとして登場し、ラボベースの試行錯誤手法によって達成可能なものを超える特性を提供する。しかし、現在の逆設計戦略における大きな課題は、計算および/または実験的なデータセットへの依存であり、特に非線形機械的挙動を示すマイクロスケールの確率的構造材料の設計において問題となる。本稿では,ディープニューラル演算子(deeponet)を活用した新しいエンド・ツー・エンドの科学mlフレームワークについて紹介する。このアプローチは、特定の非線形機械的挙動に合わせた構造物の逆設計を容易にする。 2光子リソグラフィで印刷したスピノダル微細構造から得られた結果は, 機械応答の予測誤差が5～10%の範囲内にあることを明らかにした。我々の研究は、先進的なマイクロメカニクス実験技術を用いたニューラル演算子を用いることで、データ不足に制約されたシナリオにおいても、所望の特性を持つ複雑なマイクロ構造材料の設計が実現可能であることを強調している。我々の研究は、材料設計の分野において重要な進歩を示し、実験的な洞察から直接得られる非平行な機械的特性を持つ次世代のメタマテリアルの発見と開発における新しい時代を告げる可能性を秘めている。 Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5 - 10%. 
Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.	翻訳日:2023-12-13 01:15:12 公開日:2023-12-10
# Rich and Poor Texture Contrast: AI生成画像検出のためのシンプルで効果的なアプローチ Rich and Poor Texture Contrast: A Simple yet Effective Approach for AI-generated Image Detection ( http://arxiv.org/abs/2311.12397v2 ) ライセンス: Link先を確認	Nan Zhong, Yiran Xu, Zhenxing Qian, Xinpeng Zhang	(参考訳) 最近の生成モデルは、写真画像の生成において印象的な性能を示している。人間は、そんな信じられないほどリアルなai画像と実際の画像とを区別できない。 AI生成画像は、ユビキタスな偽情報拡散につながる可能性がある。したがって、AI生成画像を特定する検出器を開発するのは最も緊急である。既存の検出器の多くは、目に見えない生成モデルよりも高い性能低下に悩まされている。本稿では,多種多様な生成モデルにより生成された偽画像を識別できる,新しいAI生成画像検出器を提案する。本手法では,画像内のテクスチャ領域とテクスチャ領域間のピクセル間相関コントラストを利用する。豊かなテクスチャ領域の画素は、粗いテクスチャ領域よりも大きな変動を示す。この相違は、豊かなテクスチャ領域のエントロピーが貧しい領域のエントロピーよりも大きいことを反映している。その結果、現実的なリッチテクスチャ領域の合成は、既存の生成モデルよりも難しいことが証明される。この原理に基づき、画像を複数のパッチに分割し、リッチテキストと貧弱テキストのパッチからなる2つのイメージに再構成する。次に,テクスチャ領域とテクスチャ領域の画素間相関差を抽出した。この機能は、さまざまな生成モデルにわたるAI生成画像鑑定に使用される普遍的な指紋として機能する。さらに,既存のベースラインの有効性とアプローチを評価するために,16種類の事前生成モデルを含む総合的なAI生成画像検出ベンチマークを構築した。我々のベンチマークはフォローアップ研究のリーダーボードを提供する。その結果,本手法は最先端のベースラインよりも有意差が認められた。私たちのプロジェクト:https://fdmas.github.io/AIGCDetect/ Recent generative models show impressive performance in generating photographic images. Humans can hardly distinguish such incredibly realistic-looking AI-generated images from real ones. AI-generated images may lead to ubiquitous disinformation dissemination. Therefore, it is of utmost urgency to develop a detector to identify AI-generated images. Most existing detectors suffer from sharp performance drops over unseen generative models. In this paper, we propose a novel AI-generated image detector capable of identifying fake images created by a wide range of generative models. Our approach leverages the inter-pixel correlation contrast between rich and poor texture regions within an image. Pixels in rich texture regions exhibit more significant fluctuations than those in poor texture regions. This discrepancy reflects that the entropy of rich texture regions is larger than that of poor ones. 
Consequently, synthesizing realistic rich texture regions proves to be more challenging for existing generative models. Based on this principle, we divide an image into multiple patches and reconstruct them into two images, comprising rich-texture and poor-texture patches respectively. Subsequently, we extract the inter-pixel correlation discrepancy feature between rich and poor texture regions. This feature serves as a universal fingerprint used for AI-generated image forensics across different generative models. In addition, we build a comprehensive AI-generated image detection benchmark, which includes 16 kinds of prevalent generative models, to evaluate the effectiveness of existing baselines and our approach. Our benchmark provides a leaderboard for follow-up studies. Extensive experimental results show that our approach outperforms state-of-the-art baselines by a significant margin. Our project: https://fdmas.github.io/AIGCDetect/	翻訳日:2023-12-13 01:14:23 公開日:2023-12-10
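The patch-splitting step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the texture score used here (sum of absolute differences between neighbouring pixels) and the names `split_patches`, `texture_score`, and `rich_and_poor` are assumptions made for the example.

```python
def split_patches(image, size):
    """Cut a 2D grayscale image (list of rows) into non-overlapping size x size patches."""
    rows, cols = len(image), len(image[0])
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]


def texture_score(patch):
    """Assumed richness measure: sum of absolute differences between adjacent pixels."""
    n = len(patch)
    horizontal = sum(abs(patch[i][j] - patch[i][j + 1]) for i in range(n) for j in range(n - 1))
    vertical = sum(abs(patch[i][j] - patch[i + 1][j]) for i in range(n - 1) for j in range(n))
    return horizontal + vertical


def rich_and_poor(image, size, k):
    """Return the k highest-scoring (rich) and k lowest-scoring (poor) patches."""
    ranked = sorted(split_patches(image, size), key=texture_score)
    return ranked[-k:], ranked[:k]


# Toy 4x4 image: a checker-like corner, two flat regions, and a mildly textured corner.
image = [
    [0, 9, 5, 5],
    [9, 0, 5, 5],
    [1, 1, 0, 3],
    [1, 1, 3, 0],
]
rich, poor = rich_and_poor(image, size=2, k=1)
```

In a full detector the rich and poor patch groups would each be reassembled into an image and passed to a feature extractor; that part is omitted here.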
# RBPGAN:ビデオスーパーレゾリューションのためのリカレントバックプロジェクションGAN RBPGAN: Recurrent Back-Projection GAN for Video Super Resolution ( http://arxiv.org/abs/2311.09178v4 ) ライセンス: Link先を確認	Marwah Sulaiman, Zahraa Shehabeldin, Israa Fahmy, Mohammed Barakat, Mohammed El-Naggar, Dareen Hussein, Moustafa Youssef, Hesham M. Eraqi	(参考訳) 近年,ビデオスーパーレゾリューション (VSR) はコンピュータビジョンの領域において,様々な用途で非常に影響力のある課題となっている。本稿では,空間的詳細を保ちながら時間的コヒーレントな解を生成するために,vsrのためのバックプロジェクション生成逆ネットワーク(rbpgan)を提案する。 RBPGANは2つの最先端モデルを統合して、生成されたビデオの精度を損なうことなく、両方の世界で最高のものを得る。モデルのジェネレータはRDPNシステムにインスパイアされ、識別器はTecoGANにインスパイアされている。また,Ping-Pong損失を利用して時間とともに時間的整合性を高める。我々のコントリビューションは、異なるデータセットを使用して定性的かつ定量的に示すように、時間的に一貫した詳細の観点から、初期の作業より優れているモデルをもたらす。 Recently, video super resolution (VSR) has become a very impactful task in the area of Computer Vision due to its various applications. In this paper, we propose Recurrent Back-Projection Generative Adversarial Network (RBPGAN) for VSR in an attempt to generate temporally coherent solutions while preserving spatial details. RBPGAN integrates two state-of-the-art models to get the best in both worlds without compromising the accuracy of produced video. The generator of the model is inspired by RBPN system, while the discriminator is inspired by TecoGAN. We also utilize Ping-Pong loss to increase temporal consistency over time. Our contribution together results in a model that outperforms earlier work in terms of temporally consistent details, as we will demonstrate qualitatively and quantitatively using different datasets.	翻訳日:2023-12-13 01:13:17 公開日:2023-12-10
# 自然災害管理のためのAIの活用 : モロッコ地震の教訓 Leveraging AI for Natural Disaster Management : Takeaways From The Moroccan Earthquake ( http://arxiv.org/abs/2311.08999v2 ) ライセンス: Link先を確認	Morocco Solidarity Hackathon (Organizers, Speakers, Mentors and Participant teams)	(参考訳) 2023年、モロッコのアル・ハウズで発生したマグニチュード6.8の地震は、世界的な災害管理戦略に重大な反省を呼び起こし、人工知能(AI)を用いた災害対策、対応、復旧のためのハッカソンを引き起こした。この論文は (i)総合的な文献レビュー (ii)勝利プロジェクトの概観 (iii)オープンソースのリアルタイムデータ、データ不足、学際的コラボレーション障壁といった重要な洞察と課題 (iv)さらなる行動を求めるコミュニティコール。 The devastating 6.8-magnitude earthquake in Al Haouz, Morocco in 2023 prompted critical reflections on global disaster management strategies, resulting in a post-disaster hackathon, using artificial intelligence (AI) to improve disaster preparedness, response, and recovery. This paper provides (i) a comprehensive literature review, (ii) an overview of winning projects, (iii) key insights and challenges, namely real-time open-source data, data scarcity, and interdisciplinary collaboration barriers, and (iv) a community-call for further action.	翻訳日:2023-12-13 01:13:01 公開日:2023-12-10
# KEEC: 等変幾何学の制御に埋め込まれる KEEC: Embed to Control on An Equivariant Geometry ( http://arxiv.org/abs/2312.01544v2 ) ライセンス: Link先を確認	Xiaoyuan Cheng, Yiming Yang, Wei Jiang, Yukun Hu	(参考訳) 本稿では, カオス系や非線形系などの未知および複素力学における表現学習の最適制御を, 事前の領域知識に頼らずに実現する方法について検討する。中心となる考え方は、力学系によって定義される多様体に微分同型である同変幾何学を確立し、非自明なタスクであるこの幾何学の中で最適な制御を行うことである。この課題に対処するために、モデル学習と制御のためにKoopman Embed to Equivariant Control (KEEC)を提案する。リー理論に着想を得たKEECは、多様体上で定義された非線形力学系を学び、軌跡をリー群に埋め込むことから始める。その後、KEECは同変幾何学の強化学習における同変値関数方程式を定式化し、元の多様体上の値関数として不変性を保証する。等価値関数に対する解析的形式的最適作用を導出することにより、keecは理論上、同変幾何上の微分情報を利用して最適同変値関数の二次収束を達成する。 KEECの有効性は、ロレンツ63のようなカオス的なシステムを含む挑戦的な力学系で実証されている。特に,測度と差分情報を保存しながら幾何のコンパクト性と完全性を維持する等尺関数は,これらの特徴を欠く損失関数より一貫して優れていた。 This paper investigates how representation learning can enable optimal control in unknown and complex dynamics, such as chaotic and non-linear systems, without relying on prior domain knowledge of the dynamics. The core idea is to establish an equivariant geometry that is diffeomorphic to the manifold defined by a dynamical system and to perform optimal control within this corresponding geometry, which is a non-trivial task. To address this challenge, Koopman Embed to Equivariant Control (KEEC) is proposed for model learning and control. Inspired by Lie theory, KEEC begins by learning a non-linear dynamical system defined on a manifold and embedding trajectories into a Lie group. Subsequently, KEEC formulates an equivariant value function equation in reinforcement learning on the equivariant geometry, ensuring an invariant effect as the value function on the original manifold. By deriving analytical-form optimal actions on the equivariant value function, KEEC theoretically achieves quadratic convergence for the optimal equivariant value function by leveraging the differential information on the equivariant geometry. The effectiveness of KEEC is demonstrated in challenging dynamical systems, including chaotic ones like Lorenz-63. 
Notably, our results show that isometric functions, which maintain the compactness and completeness of geometry while preserving metric and differential information, consistently outperform loss functions lacking these characteristics.	翻訳日:2023-12-13 01:04:54 公開日:2023-12-10
# 進化的アルゴリズムによるポインタネットワークの学習 Pointer Networks Trained Better via Evolutionary Algorithms ( http://arxiv.org/abs/2312.01150v3 ) ライセンス: Link先を確認	Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Ke Tang, Peng Yang	(参考訳) Pointer Network (PtrNet) は、組合せ最適化問題(COP)を解決するためのニューラルネットワークである。 PtrNetsは複雑なCOPsインスタンスに対してリアルタイムフィードフォワード推論を提供するが、結果の品質は満足できない傾向にある。一つの考えられる理由は、このような問題は勾配降下のグローバルな探索能力の欠如に苦しんでおり、教師付き学習と強化学習の両方を含む伝統的なptrnetトレーニング手法で頻繁に使われている。 PtrNetの性能向上のために,PtrNetと進化的アルゴリズム(EA)の訓練の利点を深く研究した。トラベリングセールスマン問題(TSP)に基づく広範な実証研究が実施されている。その結果、EAでトレーニングされたPtrNetは、様々な問題スケールで8つの最先端手法よりもずっと優れた推論結果が得られることが示された。勾配降下に基づくPtrNetトレーニング手法と比較して、EAは同じ計算時間でソリューションの品質を最大30.21 %向上させる。この利点を活かして,同じ次元でptrnetをトレーニングすることにより,1000次元tspの解法を初めて報告することが可能であり,高次元copsの解法においてptrnetの性能を向上させるためには,トレーニングインスタンスのスケールアップが必要であることを強く示唆する。 Pointer Network (PtrNet) is a specific neural network for solving Combinatorial Optimization Problems (COPs). While PtrNets offer real-time feed-forward inference for complex COP instances, the quality of their results tends to be less satisfactory. One possible reason is that this issue stems from the lack of global search ability of gradient descent, which is frequently employed in traditional PtrNet training methods including both supervised learning and reinforcement learning. To improve the performance of PtrNet, this paper delves deeply into the advantages of training PtrNet with Evolutionary Algorithms (EAs), which have been widely acknowledged for not easily getting trapped by local optima. Extensive empirical studies based on the Travelling Salesman Problem (TSP) have been conducted. Results demonstrate that PtrNet trained with EA can consistently produce much better inference results than eight state-of-the-art methods on various problem scales. Compared with gradient descent based PtrNet training methods, EA achieves up to 30.21% improvement in solution quality with the same computational time. 
With this advantage, this paper is able to report, for the first time, the results of solving 1000-dimensional TSPs by training a PtrNet on the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on higher-dimensional COPs.	翻訳日:2023-12-13 01:04:29 公開日:2023-12-10
# 暗く見えるように潜伏拡散モデルを改ざんする Taming Latent Diffusion Models to See in the Dark ( http://arxiv.org/abs/2312.01027v2 ) ライセンス: Link先を確認	Qiang Wen, Yazhou Xing and Qifeng Chen	(参考訳) 低照度RAW画像をよく露出したクリーンなsRGB画像に拡張することは、計算写真において重要な課題である。大規模なペアリングデータの制限のため、従来の手法では極低照度領域の細部や真の色を復元することが困難であった。一方, 生成拡散モデルの最近の進歩は, 低照度画像強調(LLIE)タスクの恩恵を受けるために, 大規模オープンドメインデータセット上で訓練された拡散モデルから生成先行を探索するための有望な生成能力を示している。そこで本研究では, LDM-SIDと呼ばれる拡散モデルに基づくLLIE法を提案する。 LDM-SIDは,提案するテーピングモジュールの集合を凍結した事前学習拡散モデルに挿入し,生成過程を制御することを目的としている。具体的には、低照度情報によって供給されるテーミングモジュールは、拡散モデルにおける中間的特徴を変調するために、一対のアフィン変換パラメータを出力する。さらに,拡散モデルの異なる部分にわたる専用生成前兆の観測に基づいて,入力生画像に2次元離散ウェーブレット変換を適用し,llieタスクを低周波コンテンツ生成と高周波細部維持という2つの必須部分に分割することを提案する。これにより、構造生成と詳細な拡張を最適化するために拡散モデルを巧みに調整することができる。提案手法は, 定量的評価において最先端の性能を得るだけでなく, 視覚的比較において有意な優位性を示す。これらの結果から,LLIEタスクに先立って,事前学習した拡散モデルを利用した生成モデルの有効性が示唆された。プロジェクトページはhttps://csqiangwen.github.io/projects/ldm-sid/にある。 Enhancing a low-light noisy RAW image into a well-exposed and clean sRGB image is a significant challenge in computational photography. Due to the limitation of large-scale paired data, prior approaches have difficulty in recovering fine details and true colors in extremely low-light regions. Meanwhile, recent advancements in generative diffusion models have shown promising generating capabilities, which inspires this work to explore generative priors from a diffusion model trained on a large-scale open-domain dataset to benefit the low-light image enhancement (LLIE) task. Based on this intention, we propose a novel diffusion-model-based LLIE method, dubbed LDM-SID. LDM-SID aims at inserting a set of proposed taming modules into a frozen pre-trained diffusion model to steer its generating process. Specifically, the taming module fed with low-light information serves to output a pair of affine transformation parameters to modulate the intermediate feature in the diffusion model. 
Additionally, based on the observation of dedicated generative priors across different portions of the diffusion model, we propose to apply 2D discrete wavelet transforms on the input RAW image, resulting in dividing the LLIE task into two essential parts: low-frequency content generation and high-frequency detail maintenance. This enables us to skillfully tame the diffusion model for optimized structural generation and detail enhancement. Extensive experiments demonstrate that the proposed method not only achieves state-of-the-art performance in quantitative evaluations but also shows significant superiority in visual comparisons. These findings highlight the effectiveness of leveraging a pre-trained diffusion model as a generative prior for the LLIE task. The project page is available at https://csqiangwen.github.io/projects/ldm-sid/	翻訳日:2023-12-13 01:04:10 公開日:2023-12-10
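To make the low-frequency / high-frequency split mentioned in the LDM-SID abstract concrete, here is a minimal sketch of a single-level 2D Haar transform. The abstract does not say which wavelet the authors use, so the Haar choice, the function name `haar_dwt2`, and the band names are assumptions for illustration only.

```python
def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform.

    `image` is a 2D list with even height and width. Returns four sub-bands
    (low, detail_cols, detail_rows, detail_diag); the band names are this sketch's own.
    """
    out = ([], [], [], [])
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # scaled local sum: low-frequency content
            rows[1].append((a - b + d - e) / 2)  # change across columns
            rows[2].append((a + b - d - e) / 2)  # change across rows
            rows[3].append((a - b - d + e) / 2)  # diagonal change
        for band, row in zip(out, rows):
            band.append(row)
    return out


image = [
    [4, 2],
    [2, 0],
]
low, detail_cols, detail_rows, detail_diag = haar_dwt2(image)
```

The `low` band carries the coarse content, while the three detail bands carry the high-frequency information that the abstract says is handled separately.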
# 健康のための機械学習シンポジウム2023 -- findings track Machine Learning for Health symposium 2023 -- Findings track ( http://arxiv.org/abs/2312.00655v2 ) ライセンス: Link先を確認	Stefan Hegselmann, Antonio Parziale, Divya Shanmugam, Shengpu Tang, Mercy Nyamewaa Asiedu, Serina Chang, Thomas Hartvigsen, Harvineet Singh	(参考訳) 2023年12月10日にルイジアナ州ニューオーリンズで開催された第3回機械学習・フォー・ヘルスシンポジウム(ML4H 2023)で発表されたFindingsの論文集。 ML4H 2023は、医療、バイオメディシン、公衆衛生など、様々な健康関連分野における問題に関する高品質な申請を招待した。提出トラックはアーカイバル・プロシージャー・トラックと非アーキバル・アック・トラックの2つが提供された。研究対象は、高度な技術的洗練と健康への影響の高い成熟した作業であった。調査結果のトラックは、洞察に富んだ議論を呼び起こしたり、コミュニティにとって貴重なリソースになったり、新しいコラボレーションを可能にする新しいアイデアを探した。手続トラックへの提出は受理されなかったとしても、自動的に結果トラックとして検討された。 ml4hシンポジウムに提出された全ての原稿は、二重盲検のピアレビュープロセスが行われた。 A collection of the accepted Findings papers that were presented at the 3rd Machine Learning for Health symposium (ML4H 2023), which was held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track, and the non-archival Findings track. Proceedings were targeted at mature work with strong technical sophistication and a high impact to health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or could enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process.	翻訳日:2023-12-13 01:03:39 公開日:2023-12-10
# WebCrowフランス語クロスワードソルバ The WebCrow French Crossword Solver ( http://arxiv.org/abs/2311.15626v2 ) ライセンス: Link先を確認	Giovanni Angelini, Marco Ernandes, Tommaso Iaquinta, Caroline Stehlé, Fanny Simões, Kamyar Zeinalipour, Andrea Zugarini, Marco Gori	(参考訳) クロスワードパズル(crossword puzzles)は、世界中の異なる言語でプレイされる最も人気のあるワードゲームの一つであり、リドルスタイルは国によって大きく異なる。自動クロスワード解決は困難であり、典型的なソルバは、以前に解決したクロスワードの大規模なデータベースに依存している。本研究では,自動クロスワードソルバであるwebcrow 2.0をフランス語に拡張し,フランス語でクロスワードを解くための最初のプログラムとした。ヒントと回答のクロスワードデータの大規模なリポジトリがないことに対処するため、WebCrow 2.0は、専門家と呼ばれる複数のモジュールを利用して、Web、知識グラフ、言語規則などの異種リソースから候補回答を取得する。 webcrowのパフォーマンスを2つの異なる課題で人間と比較した。過去のクロスワードが限られていたにもかかわらず、フランスのWebCrowは競争力があり、スピードと精度で人間より優れており、新しい言語に一般化する能力を示した。 Crossword puzzles are one of the most popular word games, played in different languages all across the world, where riddle style can vary significantly from one country to another. Automated crossword resolution is challenging, and typical solvers rely on large databases of previously solved crosswords. In this work, we extend WebCrow 2.0, an automatic crossword solver, to French, making it the first program for crossword solving in the French language. To cope with the lack of a large repository of clue-answer crossword data, WebCrow 2.0 exploits multiple modules, called experts, that retrieve candidate answers from heterogeneous resources, such as the web, knowledge graphs, and linguistic rules. We compared WebCrow's performance against humans in two different challenges. Despite the limited amount of past crosswords, French WebCrow was competitive, actually outperforming humans in terms of speed and accuracy, thus proving its capabilities to generalize to new languages.	翻訳日:2023-12-13 01:00:54 公開日:2023-12-10
# ShareCMP: 偏光対応RGB-Pセマンティックセグメンテーション ShareCMP: Polarization-Aware RGB-P Semantic Segmentation ( http://arxiv.org/abs/2312.03430v2 ) ライセンス: Link先を確認	Zhuoyan Liu, Bo Wang, Lizhi Wang, Chenyu Mao, Ye Li	(参考訳) マルチモーダルなセマンティックセグメンテーションは急速に発展しているが、RGB-Polarizationのモダリティはいまだ解明されていない。そこで本研究では,12種類の水中意味クラスを持つUPLight RGB-Pセグメンテーションベンチマークを構築した。本研究では,dual-branchアーキテクチャを持つrgb-pセマンティクスセグメンテーションフレームワークであるsharecmpを設計し,従来のdual-branchモデルと比較してパラメータ数を約26～33%削減した。エンコーダの偏光特性が豊かな偏光モーダル画像を生成するように設計された偏光生成注意(pga)モジュールを包含する。さらに,偏波モーダル情報のためのエンコーダの学習と理解を改善し,pgaモジュールを最適化するために,クラス偏波認識損失(cpaloss)を導入する。合計3つのRGB-Pベンチマークに関する広範な実験により、ShareCMPは、UPLight(92.45(+0.32)%)、ZJU(92.7(+0.1)%)、MCubeS(50.99(+1.51)%)データセットのパラメータが少ないmIoUの最先端性能を達成した。コードはhttps://github.com/LEFTeyex/ShareCMPで入手できる。 Multimodal semantic segmentation is developing rapidly, but the modality of RGB-Polarization remains underexplored. To delve into this problem, we construct a UPLight RGB-P segmentation benchmark with 12 typical underwater semantic classes. In this work, we design the ShareCMP, an RGB-P semantic segmentation framework with a shared dual-branch architecture, which reduces the number of parameters by about 26-33% compared to previous dual-branch models. It encompasses a Polarization Generate Attention (PGA) module designed to generate polarization modal images with richer polarization properties for the encoder. In addition, we introduce the Class Polarization-Aware Loss (CPALoss) to improve the learning and understanding of the encoder for polarization modal information and to optimize the PGA module. With extensive experiments on a total of three RGB-P benchmarks, our ShareCMP achieves state-of-the-art performance in mIoU with fewer parameters on the UPLight (92.45(+0.32)%), ZJU (92.7(+0.1)%), and MCubeS (50.99(+1.51)%) datasets compared to the previous best methods. The code is available at https://github.com/LEFTeyex/ShareCMP.	翻訳日:2023-12-13 00:52:54 公開日:2023-12-10
# gcfa:ジオデシック曲線のプレ形状空間における拡張 GCFA:Geodesic Curve Feature Augmentation in the Pre-Shape Space ( http://arxiv.org/abs/2312.03325v2 ) ライセンス: Link先を確認	Yuexing Han, Guanxin Wan and Bing Wang	(参考訳) 深層学習は様々な領域で顕著な結果をもたらした。しかし、大規模なラベル付きサンプルを必要とするという課題は、いまだにディープラーニングにおいて持続している。このように、ディープラーニングモデルをトレーニングするための重要な戦略として、データ拡張が導入されている。しかし、データ拡張は小さなサンプル環境での情報損失と性能の低下に苦しむ。これらの欠点を克服するため,我々は形状空間理論に基づく特徴拡張法,すなわち,GCFAと呼ばれるジオデシック曲線の特徴増強手法を提案し,まず,ニューラルネットワークモデルを用いて特徴抽出を行う。そして、複数の画像特徴を特徴として事前形状空間に投影する。プレシェイプ空間では、特徴に合うようにジオデシック曲線が構築される。最後に、Geodesic曲線上に生成された多くの特徴は、様々な機械学習モデルをトレーニングするために使用される。 GCFAモジュールは、ほとんどの機械学習メソッドとシームレスに統合できる。また,提案手法は小型サンプルデータセットに対して単純で効果的で非感受性であり,サンプル環境ではgcfa法がデータプリプロセッシングモデルの性能を大幅に向上できることを示す。 Deep learning has yielded remarkable outcomes in various domains. However, the challenge of requiring large-scale labeled samples still persists in deep learning. Thus, data augmentation has been introduced as a critical strategy to train deep learning models. However, data augmentation suffers from information loss and poor performance in small sample environments. To overcome these drawbacks, we propose a feature augmentation method based on shape space theory, i.e., Geodesic curve feature augmentation, called GCFA for brevity. First, we extract features from the image with the neural network model. Then, the multiple image features are projected into a pre-shape space as features. In the pre-shape space, a Geodesic curve is built to fit the features. Finally, the many generated features on the Geodesic curve are used to train the various machine learning models. The GCFA module can be seamlessly integrated with most machine learning methods. The proposed method is also simple, effective, and insensitive to small sample datasets. Several examples demonstrate that the GCFA method can greatly improve the performance of the data preprocessing model in a small sample environment.	翻訳日:2023-12-13 00:52:25 公開日:2023-12-10
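The idea of sampling new features along a geodesic in the pre-shape space can be sketched as below. This is a simplified illustration under stated assumptions: the pre-shape space is treated as a unit hypersphere of centred, normalised vectors, and the geodesic is a great-circle arc between just two features, whereas the abstract describes fitting a curve to multiple features. The names `to_preshape` and `geodesic_point` are hypothetical.

```python
import math


def to_preshape(vec):
    """Centre a feature vector and scale it to unit norm (a point on the unit hypersphere).

    Assumes the vector is not constant, so the centred norm is non-zero.
    """
    mean = sum(vec) / len(vec)
    centred = [x - mean for x in vec]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred]


def geodesic_point(a, b, t):
    """Point at fraction t along the great-circle arc from a to b.

    Assumes a and b are unit vectors and not antipodal.
    """
    cos_theta = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(cos_theta)
    if theta < 1e-12:  # a and b coincide: nothing to interpolate
        return list(a)
    s = math.sin(theta)
    wa, wb = math.sin((1 - t) * theta) / s, math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


a = to_preshape([1.0, 2.0, 4.0])
b = to_preshape([3.0, 1.0, 0.5])
# Three augmented features sampled between the two projected features.
augmented = [geodesic_point(a, b, t) for t in (0.25, 0.5, 0.75)]
```

Every sampled point stays centred and on the unit sphere, so the augmented features remain valid points of the assumed pre-shape space.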
# dreamvideo: 画像保持とテキストガイダンスを備えた高忠実度画像対ビデオ生成 DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance ( http://arxiv.org/abs/2312.03018v2 ) ライセンス: Link先を確認	Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang	(参考訳) 参照画像からビデオを生成することを目的とした画像対ビデオ生成が注目されている。既存の方法は、事前訓練されたテキスト誘導画像拡散モデルから画像誘導映像生成モデルへの拡張を試みる。それにもかかわらず、これらの手法は、浅い画像誘導と時間的一貫性の欠如により、低い忠実度または時間の経過とともに点滅する。これらの問題に対処するために,DreamVideo という名前の事前学習ビデオ拡散モデルに基づいてフレーム保持分岐を考案し,高忠実度映像生成手法を提案する。参照画像をセマンティックなレベルで拡散するプロセスに統合する代わりに、DreamVideoはコンボリューション層を通じて参照画像を認識し、ノイズの多いラテントをモデル入力として特徴を結合する。これにより、参照画像の詳細を最大限に保存することができる。さらに、ダブルコンディショナライザフリーのガイダンスを組み込むことで、さまざまなプロンプトテキストを提供することで、異なるアクションの動画に単一のイメージを向けることができる。これは制御可能なビデオ生成に重要な意味を持ち、幅広い応用可能性を持っている。定量的および定性的な結果から,本手法が最先端の手法より優れていることを示すため,公開データセットの総合的な実験を行った。特に忠実度では画像保持能力が強く,UCF101では他の画像対映像モデルと比較してFVDが高い。また、異なるテキストプロンプトを与えることで、正確な制御が可能となる。このモデルのさらなる詳細と包括的な結果はhttps://anonymous0769.github.io/dreamvideo/で示されます。 Image-to-video generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods try to extend pre-trained text-guided image diffusion models to image-guided video generation models. Nevertheless, these methods often result in either low fidelity or flickering over time due to their limitation to shallow image guidance and poor temporal consistency. To tackle these problems, we propose a high-fidelity image-to-video generation method by devising a frame retention branch on the basis of a pre-trained video diffusion model, named DreamVideo. Instead of integrating the reference image into the diffusion process at a semantic level, our DreamVideo perceives the reference image via convolution layers and concatenates the features with the noisy latents as model input. By this means, the details of the reference image can be preserved to the greatest extent. 
In addition, by incorporating double-condition classifier-free guidance, a single image can be directed to videos of different actions by providing varying prompt texts. This has significant implications for controllable video generation and holds broad application prospects. We conduct comprehensive experiments on the public dataset; both quantitative and qualitative results indicate that our method outperforms the state-of-the-art method. Especially for fidelity, our model has powerful image retention ability and results in high FVD in UCF101 compared to other image-to-video models. Also, precise control can be achieved by giving different text prompts. Further details and comprehensive results of our model will be presented at https://anonymous0769.github.io/DreamVideo/.	翻訳日:2023-12-13 00:51:48 公開日:2023-12-10
# HGPROMPT:Few-shot Prompt Learningのための均質グラフと不均質グラフ HGPROMPT: Bridging Homogeneous and Heterogeneous Graphs for Few-shot Prompt Learning ( http://arxiv.org/abs/2312.01878v2 ) ライセンス: Link先を確認	Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang	(参考訳) グラフニューラルネットワーク(GNN)とヘテロジニアスグラフニューラルネットワーク(HGNN)は、同質で異質なグラフ表現学習において顕著なテクニックであるが、エンドツーエンドの監視フレームワークにおけるパフォーマンスは、タスク固有の監視の可用性に大きく依存している。ラベル付けコストを削減するため、自己教師付きプレテキストタスクの事前学習は一般的なパラダイムとなっているが、事前訓練されたモデルと下流タスクの間には、目的の相違から生じるギャップがしばしばある。ギャップを埋めるために、特に数ショット設定では、事前訓練されたモデルを完全に微調整することなく、迅速な学習が有望な方向として上昇している。グラフ上でのプロンプトベースの学習に関する初期の研究はあったが、主に同質グラフを扱っており、下流のアプリケーションでよく見られる不均一グラフを無視している。本稿では,HGPROMPTを提案する。HGPROMPTは,事前学習タスクと下流タスクだけでなく,二重テンプレート設計による均質かつ異質なグラフを統一する新しい学習促進フレームワークである。さらに,hgpromptのデュアルプロンプトを提案することで,特徴のばらつきだけでなく,タスク間の異種性の違いによって引き起こされるギャップを橋渡しする前に,下流タスクが最も重要視されるよう支援する。最後に,HGPROMPTを3つの公開データセットの広範な実験により徹底的に評価・解析する。 Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been some early exploration of prompt-based learning on graphs, they primarily deal with homogeneous graphs, ignoring the heterogeneous graphs that are prevalent in downstream applications. In this paper, we propose HGPROMPT, a novel pre-training and prompting framework to unify not only pre-training and downstream tasks but also homogeneous and heterogeneous graphs via a dual-template design. 
Moreover, we propose dual-prompt in HGPROMPT to assist a downstream task in locating the most relevant prior to bridge the gaps caused by not only feature variations but also heterogeneity differences across tasks. Finally, we thoroughly evaluate and analyze HGPROMPT through extensive experiments on three public datasets.	翻訳日:2023-12-13 00:48:40 公開日:2023-12-10
# ロボット合成 : バイオオタクティルセンシングによる手作業操作 Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing ( http://arxiv.org/abs/2312.01853v2 ) ライセンス: Link先を確認	Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang	(参考訳) 接触の多い操作タスクの実行は触覚と視覚フィードバックの融合を必要とする。しかし、これらの様相の異なる性質は、重大な課題をもたらす。本稿では,視覚と触覚の入力を活用し,手作業のデキスタラブルな操作を可能にするシステムを提案する。具体的には,人間の触覚と視覚の合成にインスパイアされた新しい点雲に基づく触覚表現であるRobot Synesthesiaを提案する。このアプローチは、両方の感覚入力を同時にシームレスに統合し、より豊かな空間情報を提供し、ロボットの動作に関するより良い推論を容易にする。シミュレーション環境で訓練され、実際のロボットにデプロイされたこの方法は、様々な手持ちのオブジェクトの回転タスクに適用できる。視覚と触覚の統合によって強化学習とSim2Realのパフォーマンスが向上する。プロジェクトページはhttps://yingyuan0414.github.io/visuotactile/。 Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .	翻訳日:2023-12-13 00:48:12 公開日:2023-12-10
# $\nabla$を信頼する: 因果発見のためのグラディエントベースのインターベンションターゲット Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery ( http://arxiv.org/abs/2211.13715v3 ) ライセンス: Link先を確認	Mateusz Olko, Michał Zając, Aleksandra Nowak, Nino Scherrer, Yashas Annadani, Stefan Bauer, Łukasz Kuciński, Piotr Miłoś	(参考訳) データから因果構造を推論することは、科学における基本的な重要性の課題である。観測データはしばしばシステムの因果構造を一意に識別するには不十分である。介入(実験)を行うことで識別性が向上するが、そのようなサンプルは通常、入手が困難で高価である。したがって、因果発見のための実験的設計アプローチは、最も有益な介入目標を推定することで介入回数を最小化することを目的としている。そこで本研究では,勾配に基づく因果発見フレームワークの勾配推定器を「信頼」し,介入獲得関数のシグナルを提供する,新しい勾配に基づく介入ターゲティング手法gitを提案する。我々は、シミュレーションおよび実世界のデータセットにおいて広範な実験を行い、GITが低データ体制において、競争ベースラインに匹敵する性能を示す。 Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.	翻訳日:2023-12-12 23:06:58 公開日:2023-12-10
# 言語モデルのロバスト性および一般化性に及ぼす対人訓練の影響 Impact of Adversarial Training on Robustness and Generalizability of Language Models ( http://arxiv.org/abs/2211.05523v3 ) ライセンス: Link先を確認	Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla	(参考訳) 敵の訓練は敵の攻撃に対する最も効果的な防御として広く認められている。しかし、敵対的に訓練されたモデルにおける堅牢性と一般化の両立にはトレードオフが伴うことも十分に確立されている。この研究の目的は、言語モデルにおける敵対的トレーニングのための異なるアプローチを深く比較することである。具体的には、事前学習データ拡張とトレーニング時間入力摂動と埋め込み空間摂動がトランスフォーマーベース言語モデルの堅牢性と一般化に及ぼす影響について検討する。以上の結果から,データの強化や入力空間の摂動によるトレーニングにより,より頑健性が得られることが示唆された。しかし、埋め込み空間摂動によるトレーニングは一般化を著しく改善する。学習モデルのニューロンの言語的相関解析により、改良された一般化は「より専門的な」ニューロンによるものであることが明らかになった。我々の知識を最大限に活用するために、言語モデルの対角訓練における逆例を生成する様々な方法の深い定性的な分析を行うのは、これが初めてである。 Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in depth comparison of different approaches for adversarial training in language models. Specifically, we study the effect of pre-training data augmentation as well as training time input perturbations vs. embedding space perturbations on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation. However, training with embedding space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.	翻訳日:2023-12-12 23:06:43 公開日:2023-12-10
# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v5 ) ライセンス: Link先を確認	Hao Xue and Flora D. Salim	(参考訳) 本稿では,時系列予測の新しい視点を提案する。既存の時系列予測手法では、モデルは入力として数値の列を取り、出力として数値値を生成する。既存のSOTAモデルはトランスフォーマーアーキテクチャに基づいており、複数のエンコーディング機構で変更され、歴史的データのコンテキストとセマンティクスが組み込まれている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを疑問視する。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (promptcast)を提案する。この新しいタスクでは、数値入力と出力をプロンプトに変換し、予測タスクを文から文へのフレーム化することで、予測目的の言語モデルを直接適用することができる。本研究を支援するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)を提案する。我々は異なるSOTA数値に基づく予測手法と言語生成モデルを評価する。様々な予測設定によるベンチマーク結果は、言語生成モデルで提案するプロンプトキャストが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較すると、PromptCastはゼロショット設定下でのより優れた一般化能力を示す。 This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. 
The benchmark results with various forecasting settings demonstrate that the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.	翻訳日:2023-12-12 23:06:27 公開日:2023-12-10
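The PromptCast abstract describes turning numerical inputs and outputs into sentences so that a language model can do the forecasting. The sketch below illustrates that round trip only; the prompt template, the function names, and the `unit` and `horizon_label` parameters are hypothetical and are not taken from the paper or the PISA dataset.

```python
import re

def series_to_prompt(values, unit="degrees", horizon_label="the next day"):
    """Render a numeric history as a natural-language question (hypothetical template)."""
    history = ", ".join(str(v) for v in values)
    return (f"From day 1 to day {len(values)}, the value was {history} {unit}. "
            f"What will the value be on {horizon_label}?")

def answer_to_number(answer):
    """Pull the first number out of a generated sentence, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", answer)
    return float(match.group()) if match else None

prompt = series_to_prompt([21, 23, 22])
# The answer string stands in for a language model's output.
forecast = answer_to_number("The value will be 24 degrees.")
```

A real pipeline would send `prompt` to a language model and parse its reply; here the reply is hard-coded so the example stays self-contained.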
# ベイズの最適緩和としてのSAM SAM as an Optimal Relaxation of Bayes ( http://arxiv.org/abs/2210.01620v3 ) ライセンス: Link先を確認	Thomas Möllenhoff, Mohammad Emtiyaz Khan	(参考訳) シャープネスを意識した最小化(SAM)およびそれに関連する敵対的深層学習法は、一般化を大幅に改善することができるが、その基盤となるメカニズムはまだ完全には理解されていない。そこで我々は,いわゆるフェンシェル双共役を用いて得られた最適凸下界に,期待負損失が置き換えられるベイズ目標の緩和としてSAMを定式化する。この接続により、新しいAdamのようなSAMの拡張が自動的に妥当な不確実性の推定値を得ることができ、時には精度も向上する。敵対的手法とベイズ的手法をつなぐことで、我々の研究は堅牢性への新しい道を開きます。 Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables a new Adam-like extension of SAM to automatically obtain reasonable uncertainty estimates, while sometimes also improving its accuracy. By connecting adversarial and Bayesian methods, our work opens a new path to robustness.	翻訳日:2023-12-12 23:06:09 publishing日:2023-12-10
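For readers unfamiliar with the two ingredients named in the SAM abstract, the block below writes them in standard textbook notation. The symbols $\ell$, $\theta$, $\epsilon$, $\rho$ and $f$ are generic choices made here, not the paper's notation, and the abstract does not say exactly which function the biconjugate is applied to beyond "the expected negative-loss".

```latex
% SAM: minimize the worst-case loss in a radius-\rho ball around the weights
\min_{\theta} \; \max_{\|\epsilon\|_2 \le \rho} \; \ell(\theta + \epsilon)

% Fenchel conjugate and biconjugate of a function f
f^{*}(y) = \sup_{x} \; \langle x, y \rangle - f(x), \qquad
f^{**}(x) = \sup_{y} \; \langle x, y \rangle - f^{*}(y)

% The biconjugate never exceeds f and is the tightest convex
% (lower semicontinuous) lower bound on f
f^{**}(x) \le f(x)
```

The last line is what makes the biconjugate an "optimal convex lower bound" in the sense the abstract uses.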
# TCJA-SNN:スパイキングニューラルネットワークのための時間・チャネル共同注意 TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks ( http://arxiv.org/abs/2206.10177v2 ) ライセンス: Link先を確認	Rui-Jie Zhu, Qihang Zhao, Tianjing Zhang, Haoyu Deng, Yule Duan, Malu Zhang, Liang-Jian Deng	(参考訳) スパイキングニューラルネットワーク(SNN)は、生物学的妥当性、エネルギー効率、強力な時空間情報表現能力によって広く関心を集めている。ニューラルネットワークの性能向上における注意機構の重要な役割を考えると、SNNと注意機構の統合は、エネルギー効率と高性能コンピューティングパラダイムを提供する可能性を示している。本稿では,TCJA-SNNと呼ばれるSNNの時間・チャネル共同注意機構について述べる。提案するTCJA-SNNフレームワークは,空間次元と時間次元の両方からスパイクシーケンスの意義を効果的に評価できる。より具体的に言えば、我々の重要な技術的貢献は次の点にある。 1) スパイクストリームを平均行列に圧縮するために, 圧縮操作を用いる。そして,効率的な1次元畳み込みに基づく2つの局所的注意機構を活用し,時間・チャネルレベルでの包括的特徴抽出を容易にする。 2) 時間領域とチャネル領域の相互依存性をモデル化するための新しいアプローチとして, クロス畳み込み融合(CCF)層を導入する。このレイヤは2つの次元の独立性を破り、機能間の相互作用を可能にします。実験の結果、提案されたTCJA-SNNは、Fashion-MNIST、CIFAR10-DVS、N-Caltech 101、DVS128 Gestureなど、標準的な静的およびニューロモルフィックなデータセットにおいて、精度で最大15.7%SOTAを上回った。さらに、変分オートエンコーダを利用して画像生成タスクにTCJA-SNNフレームワークを適用する。我々の知る限り、この研究は、画像分類と生成タスクにSNNアテンション機構が採用された最初の事例である。特に,本手法は両領域でSOTA性能を達成し,この分野において大きな進歩を遂げた。コードは https://github.com/ridgerchu/TCJA で入手できる。 Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, the integration of SNNs and attention mechanisms exhibits potential to deliver energy-efficient and high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequence from both spatial and temporal dimensions. More specifically, our essential technical contribution lies in: 1) We employ the squeeze operation to compress the spike stream into an average matrix. 
Then, we leverage two local attention mechanisms based on efficient 1D convolutions to facilitate comprehensive feature extraction at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes. This layer breaks the independence of these two dimensions and enables the interaction between features. Experimental results demonstrate that the proposed TCJA-SNN outperforms SOTA by up to 15.7% accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variational autoencoder. To the best of our knowledge, this study is the first instance where the SNN-attention mechanism has been employed for image classification and generation tasks. Notably, our approach has achieved SOTA performance in both domains, establishing a significant advancement in the field. Code is available at https://github.com/ridgerchu/TCJA.	翻訳日:2023-12-12 23:04:52 公開日:2023-12-10
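The two steps in the TCJA-SNN abstract (squeeze the spike stream into an average matrix, then run 1D convolutions along the temporal and channel axes and fuse them) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' layer: each branch uses one shared kernel, and the fusion is assumed to be an element-wise product passed through a sigmoid. All function names are hypothetical.

```python
import math

def conv1d_same(seq, kernel):
    """Zero-padded 1D correlation that keeps the input length (odd kernel assumed)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(acc)
    return out

def squeeze(spikes):
    """Average each (time step, channel) spike map into one scalar: a T x C matrix."""
    return [[sum(fmap) / len(fmap) for fmap in step] for step in spikes]

def joint_attention(spikes, t_kernel, c_kernel):
    z = squeeze(spikes)
    n_t, n_c = len(z), len(z[0])
    # Temporal branch: convolve each channel's column along the time axis.
    cols = [conv1d_same([z[t][c] for t in range(n_t)], t_kernel) for c in range(n_c)]
    temporal = [[cols[c][t] for c in range(n_c)] for t in range(n_t)]
    # Channel branch: convolve each time step's row along the channel axis.
    channel = [conv1d_same(z[t], c_kernel) for t in range(n_t)]
    # Fusion (assumed form): element-wise product squashed by a sigmoid.
    return [[1.0 / (1.0 + math.exp(-temporal[t][c] * channel[t][c]))
             for c in range(n_c)] for t in range(n_t)]

# T=2 time steps, C=2 channels, 4 flattened pixels per spike map.
spikes = [[[1, 0, 1, 1], [0, 0, 0, 0]],
          [[1, 1, 1, 1], [0, 1, 0, 0]]]
attn = joint_attention(spikes, t_kernel=[0.0, 1.0, 0.0], c_kernel=[0.0, 1.0, 0.0])
```

With the identity kernels used above, both branches return the squeezed matrix unchanged, so each attention score is the sigmoid of the squared firing rate; learned kernels would mix neighbouring time steps and channels instead.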
# 公正な二分分類のための任意の決定事項の修正 Repairing Regressors for Fair Binary Classification at Any Decision Threshold ( http://arxiv.org/abs/2203.07490v4 ) ライセンス: Link先を確認	Kweku Kwegyir-Aggrey, A. Feder Cooper, Jessica Dai, John Dickerson, Keegan Hines, Suresh Venkatasubramanian	(参考訳) 我々は,教師付き機械学習型回帰器の処理後問題について検討し,任意の判定しきい値における公平な二項分類を最大化する。各グループのスコア分布間の統計的距離を減少させることにより、各閾値の公平な性能を一度に向上でき、精度を大幅に低下させることなく達成できることが示される。この目的のために,異なる保護群に対する分類の分布の類似度をキャプチャする分布パリティの形式的尺度を導入する。我々の主な成果は、最適輸送に基づく新しいポストプロセッシングアルゴリズムを提案し、分配パリティを確実に最大化し、任意の閾値における等化オッドや等化オポチュニティのようなグループフェアネスの共通概念を達成することである。 2つのフェアネスベンチマークで、我々の手法は実験的にうまく機能し、関連する作業から類似した手法を上回り、一般化する。 We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work.	翻訳日:2023-12-12 23:03:24 公開日:2023-12-10
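The abstract above says the post-processing step uses optimal transport to bring the groups' score distributions together. In one dimension a common way to do that is quantile matching toward a shared target. The sketch below maps each group's scores onto the size-weighted average of the group quantile functions (the 1D Wasserstein-2 barycenter). It is a generic illustration with hypothetical function names, not the authors' algorithm.

```python
def quantile(sorted_vals, q):
    """Empirical quantile by linear interpolation, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def repair_scores(scores_by_group):
    """Map every group's scores onto one shared target distribution."""
    sorted_groups = {g: sorted(s) for g, s in scores_by_group.items()}
    total = sum(len(s) for s in scores_by_group.values())
    repaired = {}
    for g, scores in scores_by_group.items():
        # Rank of each score inside its own group.
        ranks = sorted(range(len(scores)), key=lambda i: scores[i])
        out = [0.0] * len(scores)
        for rank, i in enumerate(ranks):
            q = rank / (len(scores) - 1) if len(scores) > 1 else 0.5
            # Target value: weighted average of all groups' q-quantiles.
            out[i] = sum(len(s) / total * quantile(s, q)
                         for s in sorted_groups.values())
        repaired[g] = out
    return repaired

repaired = repair_scores({"a": [0.2, 0.4, 0.6], "b": [0.8, 0.4, 0.6]})
```

After the mapping both groups share the same score distribution, so any single decision threshold selects the same fraction of each group. Whether this also yields the error-rate guarantees claimed in the abstract depends on details that this sketch does not model.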
# 畳み込みニューラルネットワークを用いた食品分類と多クラス線形識別分析 Food Classification with Convolutional Neural Networks and Multi-Class Linear Discernment Analysis ( http://arxiv.org/abs/2012.03170v3 ) ライセンス: Link先を確認	Joshua Ball	(参考訳) 畳み込みニューラルネットワーク(cnns)は、人間の脳で知覚される完全に接続された推論能力を表現することに成功している。 cnnの無数の実装は、これらの複雑なパターン、特に画像分類の領域を学習する能力の強さを示している。しかし、高性能CNNをいわゆる「最先端技術」レベルに上げるコストは、計算コストがかかる。 mobilenetv2のようなモデルから非常に深い層を利用する転送学習を使う場合でも、cnnは膨大な時間とリソースを必要とします。フィッシャーの線形判別を一般化した線形判別分析(LDA)は、画像分類に高性能なシステムを必要としないが、クラス特徴の分離性を高めるために多クラス分類法で実装することができる。同様に、私たちはLDAが優れたパフォーマンスを約束しているとも信じています。本稿では, 食品分類のための堅牢なCNNの開発プロセスと, マルチクラスLDAの効果的な実装について論じ, 1) 画像分類においてCNNがLDAよりも優れていること, (2) 画像分類においてLDAを除外すべきでない理由について述べる。 Convolutional neural networks (CNNs) have been successful in representing the fully-connected inferencing ability perceived to be seen in the human brain: they take full advantage of the hierarchy-style patterns commonly seen in complex data and develop more patterns using simple features. Countless implementations of CNNs have shown how strong their ability is to learn these complex patterns, particularly in the realm of image classification. However, the cost of getting a high performance CNN to a so-called "state of the art" level is computationally costly. Even when using transfer learning, which utilize the very deep layers from models such as MobileNetV2, CNNs still take a great amount of time and resources. Linear discriminant analysis (LDA), a generalization of Fisher's linear discriminant, can be implemented in a multi-class classification method to increase separability of class features while not needing a high performance system to do so for image classification. Similarly, we also believe LDA has great promise in performing well. 
In this paper, we discuss our process of developing a robust CNN for food classification as well as our effective implementation of multi-class LDA, and we show (1) that CNNs are superior to LDA for image classification and (2) why LDA should not be left out of the race for image classification, particularly for binary cases.	翻訳日:2023-12-12 23:02:39 公開日:2023-12-10
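The abstract refers to LDA as a generalization of Fisher's linear discriminant without stating the criterion. For reference, the standard textbook form is given below; the symbols ($w$, $S_B$, $S_W$, $\mu_k$, $K$) are generic notation chosen here, not the paper's.

```latex
% Fisher criterion: between-class scatter over within-class scatter
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}

% Two classes: the maximizer is, up to scale,
w \propto S_W^{-1} (\mu_1 - \mu_0)

% K classes: project onto the leading eigenvectors of S_W^{-1} S_B,
% which gives at most K - 1 discriminant directions
S_W^{-1} S_B \, w_i = \lambda_i \, w_i, \qquad i = 1, \dots, K - 1
```

The projection only needs class means and scatter matrices, which is why LDA can run without the hardware a deep CNN requires.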
# マルチモーダルインタラクションの定量化とモデル化:情報分解フレームワーク Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework ( http://arxiv.org/abs/2302.12247v5 ) ライセンス: Link先を確認	Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) 近年のマルチモーダルアプリケーションへの関心の高まりにより、様々なモダリティから情報を表現・統合するためのデータセットや手法が広く選択された。これらの経験的な進歩にもかかわらず、基礎的な研究の疑問が残る: マルチモーダルなタスクを解決するのに必要な相互作用をどのように定量化できるか? その後、これらの相互作用を捉えるのに最も適したマルチモーダルモデルは何ですか? これらの質問に答えるために,入力モダリティと出力タスクを関連付ける冗長性,特異性,相乗効果の程度を定量化する情報理論的手法を提案する。これら3つの測度をマルチモーダル分布(略してPID)のPID統計と呼び、高次元分布にスケールするこれらのPID統計に対する2つの新しい推定値を導入する。 PID推定を検証するために、PIDが知られている合成データセットと、PID推定を人間のアノテーションと比較する大規模マルチモーダルベンチマークの両方で広範な実験を行う。最後に,(1)マルチモーダルデータセット内のインタラクションの定量化,(2)マルチモーダルモデルでキャプチャされたインタラクションの定量化,(3)モデル選択のための原則的アプローチ,(4)病理学,ムード予測,ロボット知覚における3つの実世界のケーススタディにおいて有用性を示す。 The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures as the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimations are compared with human annotations. 
Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception where our framework helps to recommend strong multimodal models for each application.	翻訳日:2023-12-12 22:55:33 公開日:2023-12-10
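A small worked example may help make the redundancy, uniqueness, and synergy terms concrete. In the standard partial information decomposition, the total information splits as $I(X_1, X_2; Y) = R + U_1 + U_2 + S$, with $I(X_i; Y) = R + U_i$. The sketch below computes the three mutual-information values for an XOR task, which is the textbook case of pure synergy. It does not implement the paper's PID estimators; the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(pairs):
    """I(A;B) in bits for a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(c / n * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in p_ab.items())

# XOR task: y depends on both inputs jointly, on neither alone.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]
i_x1 = mutual_information([(x1, y) for x1, _, y in samples])
i_x2 = mutual_information([(x2, y) for _, x2, y in samples])
i_joint = mutual_information([((x1, x2), y) for x1, x2, y in samples])
```

Each input alone carries zero bits about the label while the pair carries one bit, so with non-negative terms the identities above force $R = U_1 = U_2 = 0$ and $S = 1$ bit.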
# データバイアス下での公平な分類器の比較について On Comparing Fair Classifiers under Data Bias ( http://arxiv.org/abs/2302.05906v2 ) ライセンス: Link先を確認	Mohit Sharma, Amit Deshpande, Rajiv Ratn Shah	(参考訳) 本稿では,データバイアス,すなわち自己表現とラベルバイアス(blum & stangl, 2019)を注入するための理論的モデルを検討する。フェア分類器の精度と公平性に対する様々なデータバイアスの影響を実証研究する。合成および実世界のデータセット(例えば、アダルト、ドイツ信用、銀行マーケティング、CompAS)の広範な実験を通じて、トレーニングデータ(ただし、テストデータではなく)に様々な量の下位表現とラベルバイアスを注入することにより、標準フェアネスツールキットから、事前、内、後処理の公正分類を実証的に監査する。私たちの主な観察は 1 標準公正分類器の公平性と精度は、訓練データに注入されるバイアスが増加するにつれて著しく低下する。 2. 適切なデータに基づいてトレーニングされた単純なロジスティック回帰モデルは、精度と公平性の両方において、偏りのあるトレーニングデータに基づいてトレーニングされた最も公正な分類器よりもしばしば優れる。 3. 少数の単純なフェアネス技術(例えば、リウィーディング、指数化勾配)は、トレーニングデータを低表現とラベルバイアスで注入しても、安定した精度と公正性を保証する。実験では、既存のフェアネスダッシュボードにデータバイアスリスクの測定値を統合する方法も示しています。 In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases, 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data, and 3. A few, simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. 
Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.	翻訳日:2023-12-12 22:53:34 公開日:2023-12-10
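The study above injects under-representation and label bias into the training data only. The sketch below shows one simple way such an injection could look. The parameter names `beta_pos` and `nu`, the choice to bias only positive examples of one group, and the row layout are assumptions made for illustration and are not the paper's exact protocol.

```python
import random

def inject_bias(rows, beta_pos=1.0, nu=0.0, seed=0):
    """Return a biased copy of training rows.

    rows: (features, group, label) tuples, with group 1 the disadvantaged
    group and label 1 the positive class (assumed encoding).
    beta_pos: probability of keeping a positive example from group 1
              (values below 1 create under-representation).
    nu: probability of flipping a kept positive label of group 1 to 0
        (label bias).
    """
    rng = random.Random(seed)
    biased = []
    for features, group, label in rows:
        if group == 1 and label == 1:
            if rng.random() >= beta_pos:
                continue                 # dropped: under-representation
            if rng.random() < nu:
                label = 0                # flipped: label bias
        biased.append((features, group, label))
    return biased

rows = [((0.1,), 0, 1), ((0.2,), 1, 1), ((0.3,), 1, 0), ((0.4,), 1, 1)]
under_represented = inject_bias(rows, beta_pos=0.0)
label_biased = inject_bias(rows, nu=1.0)
```

Following the setup in the abstract, a function like this would be applied to the training split while the test split is left untouched.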
# 責任あるデータキュレーションの倫理的考察 Ethical Considerations for Responsible Data Curation ( http://arxiv.org/abs/2302.03629v3 ) ライセンス: Link先を確認	Jerone T. A. Andrews and Dora Zhao and William Thong and Apostolos Modas and Orestis Papakyriakopoulos and Alice Xiang	(参考訳) human-centric computer vision (hccv) データキュレーションの実践は、しばしばプライバシーやバイアスの懸念を無視し、データセットの撤回と不公平なモデルにつながる。非合意Webスクレイピングによって構築されたHCCVデータセットには、包括的な公正性と堅牢性評価のための重要なメタデータが欠如している。現在の治療法は、ポストホック、採用に対する説得力のある正当化の欠如、あるいは適切なアプリケーションに対する適切なコンテキスト化の提供に失敗している。本研究は,HCCV評価データセットを算出し,プライバシとバイアスの懸念に対処するための,積極的に,ドメイン固有の推奨,目的,プライバシと同意,多様性をカバーすることに焦点を当てる。現在のプラクティスやガイドライン、データセットの取り下げ、監査から導き、考慮事項やレコメンデーションを知らせるアンテホックな視点を採用しています。 Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustness evaluations. Current remedies are post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations, covering purpose, privacy and consent, and diversity, for curating HCCV evaluation datasets, addressing privacy and bias concerns. We adopt an ante hoc reflective perspective, drawing from current practices, guidelines, dataset withdrawals, and audits, to inform our considerations and recommendations.	翻訳日:2023-12-12 22:52:50 公開日:2023-12-10
# モデルのスケーリングがパラメーター効率のチューニングに与える影響を探る Exploring the Impact of Model Scaling on Parameter-Efficient Tuning ( http://arxiv.org/abs/2306.02320v2 ) ライセンス: Link先を確認	Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun	(参考訳) パラメータ効率チューニング(PET)手法は、最小限のパラメータのみを訓練することによって、非常に大きな事前学習言語モデル(PLM)を効果的に駆動することができる。異なるPET法は、異なる手動で設計したチューナブルモジュールを利用する。小型PLMでは、PET法には通常顕著な性能差がある。しかし、モデルスケールが大きくなるにつれて、性能の差は狭まる。したがって、モデルスケーリングはpetメソッドに対する設計の違いの影響を緩和する、と仮定する。そこで本研究では,Arbitrary PET(APET)法という,より柔軟なPET法を提案する。 APET法は任意の位置に分布する任意の数のパラメータからなるチューナブルモジュールと互換性がある。そして,これを利用し,11のNLPタスクを3つの代表的PLMで実験する。本研究は,モデルスケーリングが,(1)調整可能なパラメータの位置が性能に与える影響を緩和し,(2)調整可能なパラメータを最適化することで,フルパラメータの微調整に匹敵する性能を実現することを明らかにする。興味深いことに、チューニング手法は、異なるタスクにおけるランダムな推測性能を超えるように、類似の調整可能なパラメータ数を最適化する。本稿では,この現象と,その基礎となるメカニズムを理解するための最適化の観点から,上記の2つの知見をまとめて論じる。これらの結論は, モデルスケーリングがPETに与える影響の理解を深め, 異なるスケールのPLMに対して, より効率的かつ効率的なPET手法の設計を支援する。ソースコードは、このgithubリポジトリから取得することができる。 Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters. Different PET methods utilize different manually designed tunable modules. In small PLMs, there are usually noticeable performance differences among PET methods. Nevertheless, as the model scale increases, the performance differences become marginal. Hence, we hypothesize that model scaling mitigates the impact of design differences on PET methods. To investigate this hypothesis, we introduce a more flexible PET method called Arbitrary PET (APET) method. The APET method is compatible with a tunable module, which consists of any number of parameters distributed in arbitrary positions. Then, we utilize it and conduct experiments on 11 NLP tasks across 3 representative PLMs. 
Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters. Intriguingly, we also observe that tuning methods optimize a similar number of tunable parameters to exceed random guess performance on different tasks. We collectively discuss this phenomenon and the two aforementioned findings from an optimization perspective to understand the underlying mechanisms. These conclusions enhance our understanding of the impact of model scaling on PET and assist in designing more effective and efficient PET methods for PLMs of different scales. The source code can be obtained from this GitHub repository: https://github.com/yushengsu-thu/PET_Scaling	翻訳日:2023-12-12 22:44:24 公開日:2023-12-10
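The APET idea described above, a tunable module made of any number of parameters at arbitrary positions, can be pictured as a mask over a flat parameter vector. The sketch below is a toy illustration under that assumption; the function names, the random choice of positions, and the plain SGD update are hypothetical and are not taken from the paper's code.

```python
import random

def pick_tunable_positions(num_params, num_tunable, seed=0):
    """Choose which flat parameter indices are trainable; the rest stay frozen."""
    return set(random.Random(seed).sample(range(num_params), num_tunable))

def masked_sgd_step(params, grads, tunable, lr=0.1):
    """Apply a gradient step only at the tunable positions."""
    return [p - lr * g if i in tunable else p
            for i, (p, g) in enumerate(zip(params, grads))]

params = [1.0] * 8
tunable = pick_tunable_positions(len(params), num_tunable=2)
updated = masked_sgd_step(params, grads=[1.0] * 8, tunable=tunable)
```

Varying `num_tunable` and the seed corresponds to the two factors the abstract studies: how many parameters are tuned and where they sit.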
# DaGAN++: ヘッドビデオ生成のための奥行き対応ネットワーク DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation ( http://arxiv.org/abs/2305.06225v2 ) ライセンス: Link先を確認	Fa-Ting Hong, Li Shen, and Dan Xu	(参考訳) 音声頭部生成の手法は、入力された顔画像からの表情や動きを含む2次元情報に大きく依存する。それでも、画素の深さのような高密度な3次元顔形状は、正確な3次元顔構造の構築と、生成のための複雑な背景雑音の抑制に重要な役割を果たしている。しかし、顔の動画に対する密集した3dアノテーションは、非常にコストがかかる。本稿では,まず,カメラパラメータや3次元形状アノテーションを必要とせず,顔映像から密集した3次元顔形状(ie,深度)を学習する新しい自己教師あり手法を提案する。さらに,幾何学習のためのより信頼性の高い剛体移動画素を知覚するために,画素レベルの不確実性を学習する戦略を提案する。第2に,移動場を生成するための正確なキーポイントを提供する,効果的な幾何学誘導型顔キーポイント推定モジュールを設計する。最後に,各生成層に適用可能な3d対応のクロスモーダル(ie,外観,奥行き)注意機構を開発し,顔の形状を粗度から細度まで把握する。大規模な実験は3つの挑戦的なベンチマーク(VoxCeleb1、VoxCeleb2、HDTF)で実施される。その結果,提案フレームワークは,これらのベンチマークで新たな最先端性能が確立され,高度にリアルに再現されたトーキングビデオを生成することができることがわかった。コードとトレーニングされたモデルはgithubプロジェクトのhttps://github.com/harlanhong/cvpr2022-daganで公開されている。 Predominant techniques on talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noises for generation. However, dense 3D annotations for facial videos is prohibitively costly to obtain. In this work, firstly, we present a novel self-supervised method for learning dense 3D facial geometry (ie, depth) from face videos, without requiring camera parameters and 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Secondly, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (ie, appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. 
Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performance established on these benchmarks. The codes and trained models are publicly available on the GitHub project page at https://github.com/harlanhong/CVPR2022-DaGAN	翻訳日:2023-12-12 22:41:07 公開日:2023-12-10
# belt:bootstrapping electroencephalography-to-language decodingとゼロショット感情分類 BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision ( http://arxiv.org/abs/2309.12056v2 ) ライセンス: Link先を確認	Jinzhao Zhou, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin	(参考訳) 本稿では,脳から言語への翻訳研究において重要なトピックとなる新しいモデルと学習フレームワークである belt を提案する。非侵襲的な脳信号から可読性自然言語への変換は、応用シナリオを促進し、脳-コンピュータインターフェース(BCI)全体の開発を促進する可能性がある。脳信号デコードや脳から言語への翻訳における重要な問題は、限られた規模と品質のデータセットから意味的に適切かつ差別的な脳波表現を取得することである。提案手法は,既製の大規模事前学習言語モデル(LM)を用いて脳波表現学習をブートストラップする汎用的で効率的なフレームワークである。意味情報の理解とゼロショットの一般化のための大きなLM能力により、BELTは、インターネット規模のデータセットで訓練された大規模なLMを使用して、脳波信号の理解を大幅に改善する。特に、BELTモデルは、ディープコンバータエンコーダとベクトル量子化エンコーダで構成される。意味論的脳波表現は、自然言語を監督する対比学習ステップによって達成される。脳から言語への翻訳とゼロショット感情分類を含む2つの脳デコーディングタスクについて最新の結果を得た。具体的には、両方のタスクのベースラインモデルを5.45%、10%以上で上回り、それぞれ42.31%のBLEU-1スコアと67.32%の精度で翻訳の主評価基準とゼロショットの感情分類をアーカイブする。 This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. The translation from noninvasive brain signals into readable natural language has the potential to promote the application scenario as well as the development of brain-computer interfaces (BCI) as a whole. The critical problem in brain signal decoding or brain-to-language translation is the acquisition of semantically appropriate and discriminative EEG representation from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector quantization encoder. 
Semantic EEG representations are learned through a contrastive learning step that provides natural language supervision. We achieve state-of-the-art results on two featured brain decoding tasks: brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on both tasks by 5.45% and over 10%, and achieves a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification, respectively.	翻訳日:2023-12-12 22:34:38 公開日:2023-12-10
# チャットボットのバイアスと公平性:概要 Bias and Fairness in Chatbots: An Overview ( http://arxiv.org/abs/2309.08836v2 ) ライセンス: Link先を確認	Jintang Xue, Yun-Cheng Wang, Chengwei Wei, Xiaofeng Liu, Jonghye Woo, C.-C. Jay Kuo	(参考訳) チャットボットは半世紀以上研究されてきた。近年,自然言語処理(NLP)技術の急速な発展に伴い,大規模言語モデル(LLM)を用いたチャットボットが注目されている。従来のチャットボットと比較すると、現代のチャットボットはより強力で、現実世界のアプリケーションで使われている。しかし、現代のチャットボット設計にはバイアスと公平性に関する懸念がある。膨大なトレーニングデータ、非常に大きなモデルサイズ、解釈可能性の欠如、バイアス緩和、そして現代のチャットボットの公平性保存は困難である。そこで本稿では,チャットボットシステムにおけるバイアスと公平性について概観する。チャットボットの歴史とそのカテゴリを最初にレビューする。次に、バイアス源とアプリケーションにおける潜在的な害を分析する。公正なチャットボットシステムを設計する際の考察について考察する。最後に今後の研究方針について述べる。 Chatbots have been studied for more than half a century. With the rapid development of natural language processing (NLP) technologies in recent years, chatbots using large language models (LLMs) have received much attention nowadays. Compared with traditional ones, modern chatbots are more powerful and have been used in real-world applications. There are however, bias and fairness concerns in modern chatbot design. Due to the huge amounts of training data, extremely large model sizes, and lack of interpretability, bias mitigation and fairness preservation of modern chatbots are challenging. Thus, a comprehensive overview on bias and fairness in chatbot systems is given in this paper. The history of chatbots and their categories are first reviewed. Then, bias sources and potential harms in applications are analyzed. Considerations in designing fair and unbiased chatbot systems are examined. Finally, future research directions are discussed.	翻訳日:2023-12-12 22:33:46 公開日:2023-12-10
# VerilogEval:Verilogコード生成のための大規模言語モデルの評価 VerilogEval: Evaluating Large Language Models for Verilog Code Generation ( http://arxiv.org/abs/2309.07544v2 ) ライセンス: Link先を確認	Mingjie Liu, Nathaniel Pinckney, Brucek Khailany and Haoxing Ren	(参考訳) 大規模言語モデル (LLMs) の人気が高まり、様々な分野への応用の道が開かれた。本稿では,ハードウェア設計と検証のための Verilog コード生成の文脈で LLM 性能を評価するためのベンチマークフレームワークを提案する。本稿では,VerilogインストラクショナルWebサイトHDLBitsから156個の問題からなる総合評価データセットを提案する。評価セットは、単純な組合せ回路から複雑な有限状態マシンまで、様々なVerilogコード生成タスクからなる。生成した設計の過渡的シミュレーション出力を黄金解と比較することにより、Verilogのコード補完を機能的正当性のために自動テストすることができる。また,LLM生成した合成問題コードペアによるブートストラップにより,教師付き微調整により,事前学習言語モデルのVerilogコード生成能力を向上できることを実証した。 The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.	翻訳日:2023-12-12 22:33:35 公開日:2023-12-10
# 人間のフィードバックからのオフライン学習による言語モデルの調整 Aligning Language Models with Offline Learning from Human Feedback ( http://arxiv.org/abs/2308.12050v2 ) ライセンス: Link先を確認	Jian Hu, Li Tao, June Yang, Chandler Zhou	(参考訳) 人間の好みから学ぶことは言語モデル(LM)にとって重要であり、人間のニーズや社会的価値に効果的に対応する。従来の研究は、人間のフィードバックを利用して指示に従うことで顕著な進歩を遂げてきた。しかし、これらのアプローチは主にPPO(Proximal Policy Optimization)のようなオンライン学習技術に依存しており、言語モデルのチューニングが不安定で難しいことが証明されている。さらに、PPOは複雑な分散システムの実装を必要とし、大規模な分散トレーニングの効率を阻害する。本研究では,環境と対話することなくLMを協調するオフライン学習手法を提案する。具体的には、フィルタリングアライメント(FA)、報酬重み付けレグレッション(RWR)、条件付きアライメント(CA)を検討し、言語モデルを人間の好みに合わせる。教師付き微調整に類似した損失関数を用いることで、単純な機械学習システム(MLSys)と、はるかに少ない(約9%)計算資源で、PPOよりも安定なモデルトレーニングを実現できる。実験の結果,条件付きアライメントは他のオフラインアライメント手法よりも優れており,PPOに匹敵する。 Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online learning techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline learning from human feedback framework to align LMs without interacting with environments. Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our methods ensure more stable model training than PPO with a simple machine learning system (MLSys) and much fewer (around 9%) computing resources. Experimental results demonstrate that conditional alignment outperforms other offline alignment methods and is comparable to PPO.	翻訳日:2023-12-12 22:32:50 公開日:2023-12-10
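The three offline strategies named in the abstract (FA, RWR, CA) all reshape a fixed dataset of scored responses before ordinary supervised fine-tuning. The sketch below shows one plausible data-side reading of each. The threshold, the `beta` temperature, the exponential weighting, and the `<reward=...>` prefix format are assumptions made for illustration; the abstract does not give the paper's exact formulations.

```python
import math

def filtering_alignment(samples, threshold):
    """Keep only (prompt, response, reward) samples whose reward clears a threshold."""
    return [s for s in samples if s[2] >= threshold]

def rwr_weights(samples, beta=1.0):
    """Per-sample loss weights proportional to exp(reward / beta), normalized to sum to 1."""
    raw = [math.exp(r / beta) for _, _, r in samples]
    total = sum(raw)
    return [w / total for w in raw]

def conditional_alignment(samples):
    """Prefix each prompt with its reward so the model learns reward-conditioned outputs."""
    return [(f"<reward={r:.1f}> {p}", resp) for p, resp, r in samples]

samples = [("Q1", "good answer", 1.0), ("Q1", "bad answer", -1.0)]
kept = filtering_alignment(samples, threshold=0.0)
weights = rwr_weights(samples)
conditioned = conditional_alignment(samples)
```

Under this reading, a conditionally aligned model would be prompted at inference time with a high reward prefix to request a preferred response.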
# ターゲットとトラブル: 子どものwebサイト上での追跡と広告 Targeted and Troublesome: Tracking and Advertising on Children's Websites ( http://arxiv.org/abs/2308.04887v2 ) ライセンス: Link先を確認	Zahra Moti, Asuman Senol, Hamid Bostani, Frederik Zuiderveen Borgesius, Veelasha Moonsamy, Arunesh Mathur, Gunes Acar	(参考訳) 現代のウェブでは、追跡者や広告主は同意なしにユーザーの詳細な行動プロファイルを構築し収益化することが多い。ウェブ追跡機構や広告に関する様々な研究にもかかわらず、子供をターゲットにしたウェブサイトに焦点を当てた厳格な研究は行われていない。そこで本研究では,子ども向けウェブサイトにおけるトラッキングと広告(ターゲット広告)の計測について述べる。児童向けWebサイトの包括的リストが欠如していることから、私たちはまず、Webページのタイトルと記述に基づく多言語分類器を構築する。この分類器を200万ページ以上に適用し、児童指向のWebサイトのリストをコンパイルする。 5つの点からこれらのサイトをクローリングし、トラッカー、指紋認証スクリプト、広告の頻度を測定します。当社のクローラは、児童向けウェブサイトに表示された広告を検出し、いつでも広告開示ページをスクレイピングすることで広告ターゲティングが有効かどうかを判断する。その結果、子ども向けウェブサイトの約90%には1つ以上のトラッカーが組み込まれており、約27%にはターゲット広告が含まれていることがわかった。次に、広告から抽出した画像とテキストの両方を処理するMLパイプラインを開発することにより、児童向けウェブサイト上の不適切な広告を識別する。このパイプラインでは、任意の検索語に対して意味的類似性クエリを実行し、デート、体重減少、メンタルヘルスに関連するサービスを促進する広告や、セックストイやおしゃべりのチャットサービスのための広告を明らかにすることができます。これらの広告のいくつかは、反発的で性的に明示的なイメージを特徴とする。要約すると、多くの広告主や児童向けウェブサイトでプライバシー規制に準拠せず、広告の安全性を損なう傾向が示唆されている。子どもを保護し、より安全なオンライン環境を構築するためには、規制当局と利害関係者がより厳格な措置を採用し、強制する必要がある。 On the modern web, trackers and advertisers frequently construct and monetize users' detailed behavioral profiles without consent. Despite various studies on web tracking mechanisms and advertisements, there has been no rigorous study focusing on websites targeted at children. To address this gap, we present a measurement of tracking and (targeted) advertising on websites directed at children. Motivated by lacking a comprehensive list of child-directed (i.e., targeted at children) websites, we first build a multilingual classifier based on web page titles and descriptions. Applying this classifier to over two million pages, we compile a list of two thousand child-directed websites. Crawling these sites from five vantage points, we measure the prevalence of trackers, fingerprinting scripts, and advertisements. 
Our crawler detects ads displayed on child-directed websites and determines if ad targeting is enabled by scraping ad disclosure pages whenever available. Our results show that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements--a practice that should require verifiable parental consent. Next, we identify improper ads on child-directed websites by developing an ML pipeline that processes both images and text extracted from ads. The pipeline allows us to run semantic similarity queries for arbitrary search terms, revealing ads that promote services related to dating, weight loss, and mental health; as well as ads for sex toys and flirting chat services. Some of these ads feature repulsive and sexually explicit imagery. In summary, our findings indicate a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites. To protect children and create a safer online environment, regulators and stakeholders must adopt and enforce more stringent measures.	翻訳日:2023-12-12 22:32:03 公開日:2023-12-10
# クロスプラットフォームヘイトスピーチ検出のための因果関係誘導乱れ Causality Guided Disentanglement for Cross-Platform Hate Speech Detection ( http://arxiv.org/abs/2308.02080v3 ) ライセンス: Link先を確認	Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu	(参考訳) ソーシャルメディアプラットフォームは、オープンな言論を広める価値はあるものの、有害なコンテンツを広めるためにしばしば利用される。現在のディープラーニングと自然言語処理モデルは、この有害なコンテンツを検出するために、一般的なヘイトスピーチ検出に適応する能力に影響するドメイン固有の用語に依存している。これは、特定の言語信号や特定のカテゴリーの単語の使用に焦点を絞る傾向があるためである。もうひとつの重要な課題は、プラットフォームにトレーニング用の高品質なアノテートデータがない場合であり、異なる分散シフトに適応可能なクロスプラットフォームモデルの必要性が生じる。本研究では,あるプラットフォームのデータに基づいて学習し,複数のプラットフォームに一般化可能な,クロスプラットフォームのヘイトスピーチ検出モデルを提案する。プラットフォーム間の優れた一般化を実現するために、入力表現を不変かつプラットフォームに依存した機能に分解する方法がある。また,多様な環境にまたがる因果関係の学習は,ヘイトスピーチにおける不変表現の理解に大きく寄与すると考えられる。プラットフォームに依存した特徴(ヘイトターゲットの予測に使用される)とプラットフォームに依存しない特徴(ヘイトの存在の予測に使用される)に入力を分離することにより、分布シフトに抵抗する不変表現を学習する。これらの機能は、未公開のプラットフォームでヘイトスピーチを予測するために使用される。 4つのプラットフォームにまたがる広範な実験では,ヘイトスピーチの一般化検出における既存の最先端手法と比較して,モデルの有効性が向上していることが強調された。 Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content overly rely on domain-specific terms affecting their capabilities to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular linguistic signals or the use of certain categories of words. Another significant challenge arises when platforms lack high-quality annotated data for training, leading to a need for cross-platform models that can adapt to different distribution shifts. Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms. To achieve good generalizability across platforms, one way is to disentangle the input representations into invariant and platform-dependent features. 
We also argue that learning causal relationships, which remain constant across diverse environments, can significantly aid in understanding invariant representations in hate speech. By disentangling input into platform-dependent features (useful for predicting hate targets) and platform-independent features (used to predict the presence of hate), we learn invariant representations resistant to distribution shifts. These features are then used to predict hate speech across unseen platforms. Our extensive experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.	翻訳日:2023-12-12 22:31:32 公開日:2023-12-10
# 教育における人間とAIのハイブリッドエッセイのための境界の自動検出 Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education ( http://arxiv.org/abs/2307.12267v5 ) ライセンス: Link先を確認	Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Gašević, Guanliang Chen	(参考訳) 最近の大規模言語モデル(LLM)、例えばChatGPTは、特定の指示が提供されたときに、人間的かつ流動的な応答を生成することができる。技術進歩によってもたらされる利便性を認める一方で、教育者は、学生がLLMを活用して執筆の課題を完了し、それらを元の作業として引き渡すのではないかと懸念している。このような懸念から、多くのAIコンテンツ検出研究が実施されているが、これらの先行研究の多くは、テキストが完全に人間書きであるか、完全にAI生成であると仮定して、AIコンテンツ検出を分類問題としてモデル化した。本研究では,人間と生成的LLM(ハイブリッドテキスト)が共同で検出対象のテキストを書けるような,希少かつ現実的な環境下でのAIコンテンツ検出について検討した。まず,対象とするハイブリッドテキスト(境界検出)から人書きコンテンツとAI生成コンテンツ間の遷移点を特定することを目的とした。そこで我々は,(1)エンコーダ訓練中にAI生成コンテンツと人書きコンテンツとを分離する2段階のアプローチを提案し,(2)隣り合う2つのプロトタイプ間の距離を計算し,その境界が互いに最も遠い2つのプロトタイプの間に存在すると仮定した。 The recent large language models (LLMs), e.g., ChatGPT, have been able to generate human-like and fluent responses when provided with specific instructions. While admitting the convenience brought by technological advancement, educators also have concerns that students might leverage LLMs to complete their writing assignments and pass them off as their original work. Although many AI content detection studies have been conducted as a result of such concerns, most of these prior studies modeled AI content detection as a classification problem, assuming that a text is either entirely human-written or entirely AI-generated.
In this study, we investigated AI content detection in a rarely explored yet realistic setting where the text to be detected is collaboratively written by human and generative LLMs (i.e., hybrid text). We first formalized the detection task as identifying the transition points between human-written content and AI-generated content from a given hybrid text (boundary detection). Then we proposed a two-step approach where we (1) separated AI-generated content from human-written content during the encoder training process; and (2) calculated the distances between every two adjacent prototypes and assumed that the boundaries exist between the two adjacent prototypes that have the furthest distance from each other. Through extensive experiments, we observed the following main findings: (1) the proposed approach consistently outperformed the baseline methods across different experiment settings; (2) the encoder training process can significantly boost the performance of the proposed approach; (3) when detecting boundaries for single-boundary hybrid essays, the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement in the In-Domain evaluation and an 18% improvement in the Out-of-Domain evaluation.	翻訳日:2023-12-12 22:30:42 公開日:2023-12-10
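The second step of the approach above (score every pair of adjacent prototypes and place the boundary where they are furthest apart) can be illustrated with a minimal sketch. The abstract does not specify how prototypes are built or which distance is used, so the mean-pooled sentence embeddings, the Euclidean distance, and the names `detect_boundaries` and `prototype_size` are assumptions made for illustration only, not the authors' implementation.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors (assumed prototype definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_boundaries(embeddings, prototype_size=2, num_boundaries=1):
    """Return the sentence indices before which a boundary is predicted.

    For each gap between sentence i-1 and sentence i, build one prototype from
    the `prototype_size` sentences on the left and one from those on the right,
    then keep the gaps whose two prototypes are furthest apart.
    """
    scores = []
    for i in range(1, len(embeddings)):
        left = mean_vector(embeddings[max(0, i - prototype_size):i])
        right = mean_vector(embeddings[i:i + prototype_size])
        scores.append((euclidean(left, right), i))
    # Largest distance first; ties are broken by position.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return sorted(index for _, index in scores[:num_boundaries])


# Toy 2-D "sentence embeddings": the first three sit near the origin,
# the last two sit near (1, 1), so the transition is before index 3.
toy = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [1.0, 0.9], [0.9, 1.0]]
print(detect_boundaries(toy))  # [3]
```

In this toy input the predicted boundary falls before sentence index 3, which is where the embeddings jump from one cluster to the other. In the paper's setting the embeddings would come from the trained encoder rather than being hand-written.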
# 自動コンテンツ分析における誤分類は回帰バイアスを引き起こす。修正できますか? はいできます! Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can! ( http://arxiv.org/abs/2307.06483v2 ) ライセンス: Link先を確認	Nathan TeBlunthuis, Valerie Hase, Chung-Hong Chan	(参考訳) 教師付き機械学習(sml)によって構築される自動分類器(acs)は、テキストから画像やビデオまで、大規模で統計的に強力なデータのサンプルを分類することができ、通信科学や関連分野において広く普及している。この人気にもかかわらず、高精度な分類器でさえ誤分類バイアスや誤解を招くようなエラーを発生させ、下流解析の結果を誤解させる。 SML応用の体系的な文献レビューで示すように、コミュニケーション研究者は誤分類バイアスをほとんど無視する。原則として、既存の統計手法は、人間の注釈者によって作成されたような「金標準」検証データを使用して、誤分類バイアスを正し、一貫した見積もりを生成することができる。我々は,Rパッケージの誤分類モデルの設計と実装を含む新しい手法をモンテカルロシミュレーションを用いて導入し,その手法の限界を明らかにする。提案手法は汎用性と効率性を有するため,新しい誤り訂正手法を推奨する。まとめると、自動分類器(共通精度基準以下のものや体系的な誤分類)は、注意深い研究設計と適切な誤り訂正方法を用いて測定するのに有用である。 Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses-unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations, which we also release. Based on our results, we recommend our new error correction method as it is versatile and efficient. 
In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.	翻訳日:2023-12-12 22:30:15 公開日:2023-12-10
# ReLoRA:低ランク更新によるハイランクトレーニング ReLoRA: High-Rank Training Through Low-Rank Updates ( http://arxiv.org/abs/2307.05695v4 ) ライセンス: Link先を確認	Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky	(参考訳) 数十億のパラメータを持つ大規模ネットワークによるスケールの優位と有効性にもかかわらず、過剰パラメータモデルのトレーニングの必要性はいまだに理解されておらず、トレーニングコストは指数関数的に増加する。本稿では,大規模ニューラルネットワークのトレーニング手法としてパラメータ効率のトレーニング手法を検討する。高速ネットワークのトレーニングに低ランク更新を利用するReLoRAという新しい手法を提案する。最大1.3Bパラメータを持つトランスフォーマー言語モデルのトレーニングにReLoRAを適用し、通常のニューラルネットワークトレーニングに匹敵するパフォーマンスを示す。 ReLoRAはGPU当たり最大5.5GbのRAMを節約し、モデルサイズとハードウェア設定に応じてトレーニング速度を9～40%改善する。本研究は,大規模プレトレーニングにおけるパラメータ効率向上手法の可能性を示す。 Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially. In this paper, we explore parameter-efficient training techniques as an approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to training transformer language models with up to 1.3B parameters and demonstrate comparable performance to regular neural network training. ReLoRA saves up to 5.5Gb of RAM per GPU and improves training speed by 9-40% depending on the model size and hardware setup. Our findings show the potential of parameter-efficient techniques for large-scale pre-training.	翻訳日:2023-12-12 22:29:51 公開日:2023-12-10
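The phrase "low-rank updates to train high-rank networks" can be made concrete with a small sketch. The abstract does not describe the mechanism, so the sketch assumes that low-rank factors are periodically merged into the full weight matrix and then restarted. The helper names (`outer`, `add`, `rank`) and the hand-picked factors are hypothetical, and real training would learn the factors by gradient descent. The only point shown is that a sum of rank-1 updates can reach full rank.

```python
from fractions import Fraction


def outer(b, a):
    """Rank-1 update B @ A for a column vector b and a row vector a."""
    return [[bi * aj for aj in a] for bi in b]


def add(w, delta):
    """Merge an update into the weight matrix: W <- W + delta."""
    return [[x + y for x, y in zip(row_w, row_d)] for row_w, row_d in zip(w, delta)]


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


W = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
# Each tuple stands in for the factors (B, A) learned between two restarts.
restarts = [([1, 0, 0], [1, 2, 0]), ([0, 1, 0], [0, 1, 1]), ([0, 0, 1], [1, 0, 1])]
update_ranks = []
for b, a in restarts:
    delta = outer(b, a)
    update_ranks.append(rank(delta))  # every individual update has rank 1
    W = add(W, delta)                 # merge, then start over with fresh factors

print(update_ranks, rank(W))  # [1, 1, 1] 3
```

Each update on its own is rank 1, yet the accumulated matrix is rank 3, which is the intuition behind training a high-rank network through a sequence of low-rank steps.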
# ロバスト・プルーニングに向けて:言語モデルのための適応的知識保持プルーニング戦略 Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models ( http://arxiv.org/abs/2310.13191v2 ) ライセンス: Link先を確認	Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu	(参考訳) pruningの目標は、言語モデルの正確性と頑健性を超えて、最近拡張された。それにもかかわらず、既存の手法は、モデルの間隔を継続的に増加させ、再訓練プロセスを必要とする場合、敵攻撃に対する堅牢性を高めるのに苦労している。人間が大きな言語モデルの時代に入ると、これらの問題はますます顕著になる。本稿では, 言語モデルの頑健性は, 学習済み知識の程度に比例することを示す。そこで本研究では,高密度言語モデルの埋め込み空間と特徴空間を忠実に再現し,pruningプロセスにおける事前学習知識の保存を目的とした,訓練後のpruning戦略を提案する。このセットアップでは、各レイヤの再構成エラーはそれ自体から発生するだけでなく、前のレイヤからの累積誤差も含む。他の最先端のベースラインと比較して、我々のアプローチは、SST2、IMDB、AGNewsのデータセット上でBERTによる精度、スパーシリティ、ロバスト性、およびプルーニングコストのバランスが優れていることを示す。 The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models.	翻訳日:2023-12-12 22:22:02 公開日:2023-12-10
# リー代数畳み込みによる概等分散 Almost Equivariance via Lie Algebra Convolutions ( http://arxiv.org/abs/2310.13164v3 ) ライセンス: Link先を確認	Daniel McNeela	(参考訳) 近年,機械学習の研究において,集団行動に関するモデルの等価性が重要な話題となっている。既存のニューラルネットワークアーキテクチャの組込み等価性の解析や、明示的に"bake in"等価性を持つモデルの構築に関する研究は、それ自体で重要な研究領域となっている。しかし、特定のグループの同値性を持つアーキテクチャを付与することは、モデルが期待するデータ変換のタイプに強く先行する。厳密な同変モデルは対称性を強制するが、実世界のデータは必ずしもそのような厳密な等式に従わない。そのような場合、厳密な等分散の事前は実際には強すぎることが証明され、モデルが過小評価される。そこで本研究では,近縁な話題であるほぼ同値な話題について考察する。概等分散の定義を提供し、リー群のリー代数に訴えることでモデルの概等分散を符号化する実用的な方法を与える。具体的には、リー代数の畳み込みを定義し、それらはリー群畳み込みよりもいくつかの利点をもたらすことを証明している。そこから, 等分散および等化の概念と, 概等分散および概等化の概念との関係を示す。 2つの存在定理を証明し、1つは多様体の等距離の有界距離における概等距離の存在を示し、もう1つはヒルベルト空間の逆を示す。我々は、これらの定理を拡張して、群作用と関数類に関する一定の制約に従う完全同値な埋め込み関数の有界距離内における概同値多様体埋め込みの存在を証明する。最後に、完全同値およびほぼ同値な設定でデータセットに対してベンチマークを行うことにより、このアプローチの有効性を実証する。 Recently, the equivariance of models with respect to a group action has become an important topic of research in machine learning. Analysis of the built-in equivariance of existing neural network architectures, as well as the study of building models that explicitly "bake in" equivariance, have become significant research areas in their own right. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, real-world data does not always conform to such strict equivariances. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform. Therefore, in this work we study a closely related topic, that of almost equivariance. We provide a definition of almost equivariance and give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group. 
Specifically, we define Lie algebra convolutions and demonstrate that they offer several benefits over Lie group convolutions, including being well-defined for non-compact Lie groups with a non-surjective exponential map. From there, we demonstrate connections between the notions of equivariance and isometry and those of almost equivariance and almost isometry. We prove two existence theorems, one showing the existence of almost isometries within bounded distance of isometries of a manifold, and another showing the converse for Hilbert spaces. We extend these theorems to prove the existence of almost equivariant manifold embeddings within bounded distance of fully equivariant embedding functions, subject to certain constraints on the group action and the function class. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.	翻訳日:2023-12-12 22:21:37 公開日:2023-12-10
# フィードバックからの多様性 Diversity from Human Feedback ( http://arxiv.org/abs/2310.06648v2 ) ライセンス: Link先を確認	Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian	(参考訳) 多様性はアンサンブル学習、強化学習、組合せ最適化など多くの問題において重要な役割を果たす。多様性の尺度を定義する方法は、長年にわたる問題である。多くの手法は専門的な経験に基づいて適切な行動空間を定義し、多様性の測定値を得るが、多くのシナリオでは難しい。本稿では,人間のフィードバックから行動空間を学習する問題を提案し,それを解決するために多様性(diversity from human feedback,divhf)と呼ばれる一般的な手法を提案する。 DivHFは、人間のフィードバックをクエリすることで、人間の好みと一致した行動記述子を学習する。学習した行動記述子は、あらゆる距離測度と組み合わせて多様性測度を定義することができる。本稿では,品質多様性最適化アルゴリズムmap-elitesと統合し,qdaxスイート上で実験を行い,divhfの有効性を示す。結果は、DivHFが直接データ駆動アプローチよりも人間の要求に合う行動空間を学習し、人間の好みの下でより多様なソリューションをもたらすことを示している。我々の貢献は、問題の定式化、DivHF法の提案、実験による効果の実証である。 Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.	翻訳日:2023-12-12 22:21:10 公開日:2023-12-10
# フェデレーション・トランスファー・ラーニングによる基礎モデル:汎用フレームワーク Grounding Foundation Models through Federated Transfer Learning: A General Framework ( http://arxiv.org/abs/2311.17431v5 ) ライセンス: Link先を確認	Yan Kang, Tao Fan, Hanlin Gu, Lixin Fan, Qiang Yang	(参考訳) 膨大な知識と強力な創発能力を備えたGPT-4のような基礎モデル(FM)は、様々な自然言語処理やコンピュータビジョンタスクにおいて大きな成功を収めている。 FMをドメイン固有のタスクに適応させたり、ドメイン固有の知識で拡張することで、FMの潜在能力を最大限活用することができる。しかし、基盤となるFMは、主に制約のあるコンピューティングリソース、データプライバシ、モデルの不均一性、モデルオーナシップなど、いくつかの課題に直面している。フェデレーション・トランスファー・ラーニング(FTL)は、フェデレーション・ラーニングとトランスファー・ラーニングを組み合わせたもので、これらの課題に対処するための有望なソリューションを提供する。近年、FTL-FMと呼ばれるFTLを利用したFMの接地の必要性が、学術と産業の両方で強く現れている。本研究では,FTL-FM研究の高度化とFTL-FMの産業的応用への影響を背景として,FTL-FMフレームワークの構築,FTL-FMフレームワークに基づく詳細な分類法の構築,最先端のFTL-FM作品の分類,提案した分類法に基づくFTL-FM作品の包括的概要について述べる。また、FTL-FMと従来のFM適応フェーズの対応性を確立し、FM実践者がFTL-FMと研究作業を整合させることができるようにした。さらに、FTL-FMにおいて効率とプライバシーが重要となるため、高度な効率改善とプライバシー保護技術の概要を述べる。最後に,FTL-FMの今後の研究の方向性について述べる。 Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. 
Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.	翻訳日:2023-12-12 22:08:36 公開日:2023-12-10
# FLORIDA:フェイクっぽいリアルイメージデータセット FLORIDA: Fake-looking Real Images Dataset ( http://arxiv.org/abs/2311.10931v2 ) ライセンス: Link先を確認	Ali Borji	(参考訳) ディープフェイクの検出におけるAIツールやモデルの有効性を評価するために、広範な研究がなされているが、これらのモデルが人工的に現れる真のイメージを正確に識別できるかどうかについては疑問が残る。本研究では,この問題に対処するための最初のステップとして,偽の外観を示す510の本物画像のデータセットをキュレートし,2つのaiモデルを用いて評価を行った。データセットに適用すると,2つのモデルがサブパー性能を示した。さらに,我々のデータセットは,複雑な視覚刺激を理解するための深層学習モデルの能力を評価する上で有用なツールとなり得る。本研究は,本分野におけるさらなる議論と調査の促進を期待する。私たちのデータセットはhttps://github.com/aliborji/FLORIDAでアクセスできます。 Although extensive research has been carried out to evaluate the effectiveness of AI tools and models in detecting deep fakes, the question remains unanswered regarding whether these models can accurately identify genuine images that appear artificial. In this study, as an initial step towards addressing this issue, we have curated a dataset of 510 genuine images that exhibit a fake appearance and conducted an assessment using two AI models. We show that two models exhibited subpar performance when applied to our dataset. Additionally, our dataset can serve as a valuable tool for assessing the ability of deep learning models to comprehend complex visual stimuli. We anticipate that this research will stimulate further discussions and investigations in this area. Our dataset is accessible at https://github.com/aliborji/FLORIDA.	翻訳日:2023-12-12 22:07:49 公開日:2023-12-10
# グラフニューラルネットワークによるイスラム教に対するヘイトスピーチの特定 Explainable Identification of Hate Speech towards Islam using Graph Neural Networks ( http://arxiv.org/abs/2311.04916v2 ) ライセンス: Link先を確認	Azmine Toushik Wasi	(参考訳) islamophobic languageは、オンラインソーシャルインタラクションプラットフォームにおける一般的な課題である。このような憎しみの特定と排除は、調和と平和の未来への重要な一歩である。本研究では,グラフニューラルネットワークを用いて,イスラム教に対するヘイトスピーチを識別し,説明するための新しいパラダイムを提案する。グラフニューラルネットワークの本質的な能力を利用して、異なるデータポイント間の関係を探索、抽出、使用することにより、我々のモデルは、基礎となる相関関係と因果関係の説明を提供しながら、一貫して優れた性能を達成する。 Islamophobic language is a prevalent challenge on online social interaction platforms. Identifying and eliminating such hatred is a crucial step towards a future of harmony and peace. This study presents a novel paradigm for identifying and explaining hate speech towards Islam using graph neural networks. Utilizing the intrinsic ability of graph neural networks to find, extract, and use relationships across disparate data points, our model consistently achieves outstanding performance while offering explanations for the underlying correlations and causation.	翻訳日:2023-12-12 22:07:21 公開日:2023-12-10
# JADE:大規模言語モデルのための言語ベースの安全評価プラットフォーム JADE: A Linguistics-based Safety Evaluation Platform for Large Language Models ( http://arxiv.org/abs/2311.00286v3 ) ライセンス: Link先を確認	Mi Zhang and Xudong Pan and Min Yang	(参考訳) 本稿では, シード質問の言語的複雑さを強化し, 広範に使用されているLLMを, オープンソース中国語8種, 商用中国語6種, 商用英語4種に分類し, 同時に一貫的に破壊する言語ファジリングプラットフォームであるJADEを提案する。質問は同時に複数のLLMの有害な生成を誘発し、平均的な安全でない生成比は70%(下表を参照)であるが、依然として自然の質問であり、コアの安全でないセマンティクスは流動的で保存されている。我々は、商用のLLMとオープンソースのLLM向けに生成されたベンチマークデモを、以下のリンクでリリースする。 JADEによって生成されたより多くの質問を評価することに興味がある読者には、ご連絡ください。 JADEはノーム・チョムスキーの変形生成文法の理論に基づいている。シード質問が安全でない意図で与えられると、JADEは、安全ガードレールが壊れるまで、元の質問の構文構造の複雑さを増すために、生成規則と変換規則のシーケンスを起動する。我々の重要な洞察は: 人間の言語の複雑さのため、現在の最高のLLMのほとんどは、完全にカバーできない無制限の例空間を形成する無限の異なる構文構造から、不変の悪をほとんど認識できない。技術的には、生成/変換規則は言語のネイティブな話者によって構築され、一旦開発されていれば、ガードレールが壊れるまで、ある質問のパースツリーを自動成長させ変換するのに使うことができる。さらなる評価結果とデモについては、Webサイトを参照してください。 In this paper, we present JADE, a targeted linguistic fuzzing platform which strengthens the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely-used LLMs categorized in three groups: eight open-sourced Chinese, six commercial Chinese and four commercial English LLMs. JADE generates three safety benchmarks for the three groups of LLMs, which contain unsafe questions that are highly threatening: the questions simultaneously trigger harmful generation of multiple LLMs, with an average unsafe generation ratio of 70% (please see the table below), while remaining natural, fluent questions that preserve the core unsafe semantics. We release the benchmark demos generated for commercial English LLMs and open-sourced English LLMs at the following link: https://github.com/whitzard-ai/jade-db. For readers who are interested in evaluating on more questions generated by JADE, please contact us. JADE is based on Noam Chomsky's seminal theory of transformational-generative grammar.
Given a seed question with unsafe intention, JADE invokes a sequence of generative and transformational rules to increment the complexity of the syntactic structure of the original question, until the safety guardrail is broken. Our key insight is that, due to the complexity of human language, most of the current best LLMs can hardly recognize the invariant evil from the infinite number of different syntactic structures, which form an unbounded example space that can never be fully covered. Technically, the generative/transformative rules are constructed by native speakers of the languages, and, once developed, can be used to automatically grow and transform the parse tree of a given question, until the guardrail is broken. For more evaluation results and demos, please check our website: https://whitzard-ai.github.io/jade.html.	翻訳日:2023-12-12 22:06:06 公開日:2023-12-10
# 弱教師付きセマンティックセグメンテーションを支援する基礎モデル Foundation Model Assisted Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2312.03585v2 ) ライセンス: Link先を確認	Xiaobo Yang and Xiaojin Gong	(参考訳) 本研究の目的は, 画像レベルのラベルを用いた弱教師付きセマンティックセグメンテーション (WSSS) に対処するために, コントラッシブ言語イメージ事前学習 (CLIP) やSegment Anything Model (SAM) などの事前訓練された基礎モデルを活用することである。そこで本研究では,高品質なセグメンテーション種子を生成するためのCLIPとSAMに基づく粗粒度フレームワークを提案する。具体的には,CLIPが凍結重量と2組の学習可能なタスク固有のプロンプトで共同で行う画像分類タスクとシードセグメンテーションタスクを構築する。 SAM-based seeding (SAMS) モジュールは、粗いシードマップまたは細かなシードマップを生成するために各タスクに設計および適用される。さらに,画像レベルラベルに教師付きマルチラベルコントラスト損失と,生成した粗いシードマップに教師付されたCAMアクティベーション損失をデザインする。これらの損失は、私たちのフレームワークで学ぶべき唯一の部分であるプロンプトを学ぶために使用されます。ひとたびプロンプトが学習されると、学習したセグメンテーション固有のプロンプトとともにCLIPとSAMSモジュールに各イメージを入力し、高品質なセグメンテーションシードを生成する。これらのシードは、他の2段階のWSSSメソッドと同様に、市販のセグメンテーションネットワークをトレーニングするための擬似ラベルとして機能する。実験により, PASCAL VOC 2012の最先端性能とMS COCO 2014の競争結果が得られた。コードはhttps://github.com/HAL-42/FMA-WSSS.gitで入手できる。 This work aims to leverage pre-trained foundation models, such as contrastive language-image pre-training (CLIP) and segment anything model (SAM), to address weakly supervised semantic segmentation (WSSS) using image-level labels. To this end, we propose a coarse-to-fine framework based on CLIP and SAM for generating high-quality segmentation seeds. Specifically, we construct an image classification task and a seed segmentation task, which are jointly performed by CLIP with frozen weights and two sets of learnable task-specific prompts. A SAM-based seeding (SAMS) module is designed and applied to each task to produce either coarse or fine seed maps. Moreover, we design a multi-label contrastive loss supervised by image-level labels and a CAM activation loss supervised by the generated coarse seed map. These losses are used to learn the prompts, which are the only parts that need to be learned in our framework.
Once the prompts are learned, we input each image along with the learned segmentation-specific prompts into CLIP and the SAMS module to produce high-quality segmentation seeds. These seeds serve as pseudo labels to train an off-the-shelf segmentation network, as in other two-stage WSSS methods. Experiments show that our method achieves state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014. Code is available at https://github.com/HAL-42/FMA-WSSS.git.	翻訳日:2023-12-12 21:54:50 公開日:2023-12-10
# オフライン強化学習における一般化ギャップ The Generalization Gap in Offline Reinforcement Learning ( http://arxiv.org/abs/2312.05742v1 ) ライセンス: Link先を確認	Ishita Mediratta, Qingfei You, Minqi Jiang and Roberta Raileanu	(参考訳) オフライン学習の最近の進歩にもかかわらず、これらの手法はいまだに同じ環境で訓練され、テストされている。本稿では、オンライン強化学習(RL)、オフラインRL、シーケンスモデリング、行動クローニングなど、広く使われているオンラインおよびオフライン学習手法の一般化能力を比較する。実験の結果,オフライン学習アルゴリズムはオンライン学習よりも新しい環境ではパフォーマンスが良いことがわかった。また,オフライン学習における一般化を評価する最初のベンチマークとして,procgen (2dビデオゲーム) や webshop (eコマースwebサイト) から,さまざまなサイズとスキルレベルのデータセットを収集した。データセットには限られた数のゲームレベルや自然言語命令の軌跡が含まれており、テスト時にはエージェントは新しいレベルや命令に一般化する必要がある。実験の結果,既存のオフライン学習アルゴリズムは,トレーニング環境とテスト環境の両方においてオンラインRLの性能に適合することが判明した。ビヘイビアクローンは強力なベースラインであり、複数の環境のデータに基づいてトレーニングし、新しい環境でテストした場合、最先端のオフラインRLとシーケンスモデリングアプローチより優れている。最後に、データのサイズよりも多様性が増すことで、すべてのオフライン学習アルゴリズムの新たな環境の性能が向上することがわかった。本研究は,現在のオフライン学習アルゴリズムの限定的一般化を実証し,この分野におけるさらなる研究の必要性を浮き彫りにした。 Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinforcement learning (RL), offline RL, sequence modeling, and behavioral cloning. Our experiments show that offline learning algorithms perform worse on new environments than online learning ones. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill-levels from Procgen (2D video games) and WebShop (e-commerce websites). The datasets contain trajectories for a limited number of game levels or natural language instructions and at test time, the agent has to generalize to new levels or instructions. Our experiments reveal that existing offline learning algorithms struggle to match the performance of online RL on both train and test environments. 
Behavioral cloning is a strong baseline, outperforming state-of-the-art offline RL and sequence modeling approaches when trained on data from multiple environments and tested on new ones. Finally, we find that increasing the diversity of the data, rather than its size, improves performance on new environments for all offline learning algorithms. Our study demonstrates the limited generalization of current offline learning algorithms, highlighting the need for more research in this area.	翻訳日:2023-12-12 19:15:41 公開日:2023-12-10
# FP8-BERT:変圧器の後の量子化 FP8-BERT: Post-Training Quantization for Transformer ( http://arxiv.org/abs/2312.05725v1 ) ライセンス: Link先を確認	Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu	(参考訳) BERTのようなトランスフォーマーベースのモデルは、幅広い自然言語処理タスクに広く応用されている。しかし、避けられない副作用は、大規模なメモリストレージと本番環境にデプロイする際の推論コストである。量子化はコストを緩和する一般的な方法の1つである。しかし、INT8データフォーマットに基づく以前の8ビット量子化戦略は、PTQ(Post-Training Quantization)方式の精度の低下に悩まされるか、高価な量子化アウェアトレーニング(QAT)プロセスを必要とする。近年、H100のような商用AIコンピューティングプラットフォームにおいて、新しい数値形式FP8(すなわち浮動小数点8ビット)が提案されサポートされている。本稿では,簡単なキャリブレーションとフォーマット変換プロセスを用いて,精度を損なうことなく後トレーニング量子化を行う方法としてのfp8の有効性を実証的に検証した。 GLUE と SQuAD v1.1 データセット上でのBERT 変種の実験において,~\citet{nvidia_release} が提案した FP8 標準を採用し,FP8 を用いた PTQ が INT8 の精度を大幅に向上できることを示す。 Transformer-based models, such as BERT, have been widely applied in a wide range of natural language processing tasks. However, one inevitable side effect is that they require massive memory storage and inference cost when deployed in production. Quantization is one of the popularized ways to alleviate the cost. However, the previous 8-bit quantization strategy based on INT8 data format either suffers from the degradation of accuracy in a Post-Training Quantization (PTQ) fashion or requires an expensive Quantization-Aware Training (QAT) process. Recently, a new numeric format FP8 (i.e. floating-point of 8-bits) has been proposed and supported in commercial AI computing platforms such as H100. In this paper, we empirically validate the effectiveness of FP8 as a way to do Post-Training Quantization without significant loss of accuracy, with a simple calibration and format conversion process. We adopt the FP8 standard proposed by~\citet{nvidia_release} in our extensive experiments of BERT variants on GLUE and SQuAD v1.1 datasets, and show that PTQ with FP8 can significantly improve the accuracy upon that with INT8, to the extent of the full-precision model.	翻訳日:2023-12-12 19:15:18 公開日:2023-12-10
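To illustrate what "a simple calibration and format conversion process" could look like, the following is a minimal sketch of simulated ("fake") quantization. It is not the authors' pipeline: the choice of the E4M3 variant of FP8 (3 mantissa bits, largest finite value 448), the per-tensor max-abs calibration, and the function names are assumptions made for illustration. An INT8 counterpart is included only to show the difference in dynamic range on a toy tensor.

```python
import math

E4M3_MAX = 448.0     # largest finite value in the E4M3 format
E4M3_MIN_EXP = -6    # exponent of the smallest normal E4M3 value


def round_to_e4m3(x):
    """Round a float to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp returns a = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    exponent = max(math.frexp(a)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits give 8 steps per binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def fp8_fake_quant(values, calibration):
    """Scale so the calibration max maps to 448, round to E4M3, then scale back."""
    scale = E4M3_MAX / max(abs(v) for v in calibration)
    return [round_to_e4m3(v * scale) / scale for v in values]


def int8_fake_quant(values, calibration):
    """Symmetric per-tensor INT8 quantization with the same max-abs calibration."""
    scale = 127.0 / max(abs(v) for v in calibration)
    return [max(-127, min(127, round(v * scale))) / scale for v in values]


# Toy tensor with a wide dynamic range (hypothetical values).
weights = [0.001, -0.02, 0.3, -1.5, 6.0]
fp8 = fp8_fake_quant(weights, weights)
int8 = int8_fake_quant(weights, weights)
for w, f, i in zip(weights, fp8, int8):
    print(f"original={w:>7}  fp8={f:.6f}  int8={i:.6f}")
```

On this toy tensor the smallest weight is rounded to zero by the INT8 path but keeps a nonzero value under FP8, because a floating-point format spends bits on the exponent. This only illustrates the dynamic-range argument and does not reproduce the accuracy results reported in the abstract.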
# 空間連続繊維配向関数の学習 Learning Spatially-Continuous Fiber Orientation Functions ( http://arxiv.org/abs/2312.05721v1 ) ライセンス: Link先を確認	Tyler Spears and P. Thomas Fletcher	(参考訳) ヒトコネクトームの理解は拡散mr画像の分解能によって根本的に制限される。コネクトームを構成する神経経路をコントリクトグラフィで再構築するには、繊維方向の連続フィールドに従う必要がある。典型的には、この磁場は低分解能、雑音拡散MRIにおいて単純な三線補間で見られる。しかし、低品質データの微細な変化にともなうトリリニア補間は困難である。超解像拡散mriにおける最近のディープラーニング手法は固定空間格子へのアップサンプリングに焦点を当てているが、連続場の必要性を満たすものではない。本研究では,低分解能拡散強調画像から空間連続繊維配向密度関数を学習する新しい手法fenriを提案する。また, フェンリの気道解析能力の定量化のために, 深層気道モデル評価のための拡張シミュレーションデータセットも導入した。我々は,FENRIが高分解能繊維配向を現実的な低品質データから正確に予測し,FENRIをベースとしたトラクトグラフィーにより,現在のトリリニア補間よりも高速な直線再構成を実現することを示した。 Our understanding of the human connectome is fundamentally limited by the resolution of diffusion MR images. Reconstructing a connectome's constituent neural pathways with tractography requires following a continuous field of fiber directions. Typically, this field is found with simple trilinear interpolation in low-resolution, noisy diffusion MRIs. However, trilinear interpolation struggles following fine-scale changes in low-quality data. Recent deep learning methods in super-resolving diffusion MRIs have focused on upsampling to a fixed spatial grid, but this does not satisfy tractography's need for a continuous field. In this work, we propose FENRI, a novel method that learns spatially-continuous fiber orientation density functions from low-resolution diffusion-weighted images. To quantify FENRI's capabilities in tractography, we also introduce an expanded simulated dataset built for evaluating deep-learning tractography models. We demonstrate that FENRI accurately predicts high-resolution fiber orientations from realistic low-quality data, and that FENRI-based tractography offers improved streamline reconstruction over the current use of trilinear interpolation.	翻訳日:2023-12-12 19:14:52 公開日:2023-12-10
# プライバシ攻撃の勾配と優先順位を超えて: フェデレーション学習における言語モデルのプール層入力の活用 Beyond Gradient and Priors in Privacy Attacks: Leveraging Pooler Layer Inputs of Language Models in Federated Learning ( http://arxiv.org/abs/2312.05720v1 ) ライセンス: Link先を確認	Jianwei Li, Sheng Liu, Qi Lei	(参考訳) federated learning(fl)は、データをローカルに保存し、モデル更新のみを送信することで、ユーザのプライバシを強調する。最近、flの文脈で言語モデルからセンシティブなトレーニングテキストを抽出することで、プライバシ攻撃に関する一連の作業がユーザのプライバシを損なう。バッチサイズが制限された作業(バッチサイズ1など)もあれば,検出が容易なものもある。本稿では,様々なバッチサイズ設定におけるテキストの回復率を著しく向上させ,検出し難い革新的なアプローチを提案する。基本的なグラデーションマッチングとドメイン事前知識に基づいて,言語モデルのプール層の入力を復元することで,機能レベルで追加の教師付き信号を提供することができる。勾配データとは異なり、これらの信号は文やトークンの平均値ではなく、より微妙で効果的な洞察を提供する。我々は,テキスト分類タスクをCoLA,SST-2,Rotten Tomatoesなどのデータセット上でベンチマークする。バッチサイズとモデルが異なるため、我々のアプローチは従来よりも一貫して優れています。 Federated learning (FL) emphasizes decentralized training by storing data locally and sending only model updates, underlining user privacy. Recently, a line of works on privacy attacks impairs user privacy by extracting sensitive training text from language models in the context of FL. Yet, these attack techniques face distinct hurdles: some work chiefly with limited batch sizes (e.g., batch size of 1), and others are easily detectable. This paper introduces an innovative approach that is challenging to detect, significantly enhancing the recovery rate of text in various batch-size settings. Building on fundamental gradient matching and domain prior knowledge, we enhance the attack by recovering the input of the Pooler layer of language models, which enables us to provide additional supervised signals at the feature level. Unlike gradient data, these signals do not average across sentences and tokens, thereby offering more nuanced and effective insights. We benchmark our method using text classification tasks on datasets such as CoLA, SST-2, and Rotten Tomatoes. Across different batch sizes and models, our approach consistently outperforms previous state-of-the-art results.	翻訳日:2023-12-12 19:14:33 公開日:2023-12-10
# DVANet:マルチビューアクション認識のためのビューとアクションの分離 DVANet: Disentangling View and Action Features for Multi-View Action Recognition ( http://arxiv.org/abs/2312.05719v1 ) ライセンス: Link先を確認	Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah	(参考訳) 本研究では,映像中の視点関連情報から,学習した行動表現を分離するための多視点行動認識手法を提案する。複数の視点からキャプチャされたアクションインスタンスを分類しようとすると、異なるカメラアングルからキャプチャされたアクションの背景、オクルージョン、可視性の違いにより、より困難度が高くなる。マルチビュー動作認識で導入された様々な問題に対処するため,学習可能なトランスフォーマーデコーダクエリを2つの教師付きコントラスト損失とともに新たに構成し,視点の変化に頑健な動作特徴の学習を行う。トランスフォーマーデコーダは、別々のクエリを使用して、アクションとビュー情報を分離して学習します。我々は,NTU RGB+D,NTU RGB+D 120,PKU-MMD,N-UCLAの4つの多視点行動認識データセットにおいて,他のユニモーダルモデルよりも有意に優れていることを示す。従来のRGBと比較すると、各データセットでそれぞれ1.5%、4.8%、2.2%、および4.8%の最大改善が見られる。 In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video. When trying to classify action instances captured from multiple viewpoints, there is a higher degree of difficulty due to the difference in background, occlusion, and visibility of the captured action from different camera angles. To tackle the various problems introduced in multi-view action recognition, we propose a novel configuration of learnable transformer decoder queries, in conjunction with two supervised contrastive losses, to enforce the learning of action features that are robust to shifts in viewpoints. Our disentangled feature learning occurs in two stages: the transformer decoder uses separate queries to separately learn action and view information, which are then further disentangled using our two contrastive losses. We show that our model and method of training significantly outperform all other uni-modal models on four multi-view action recognition datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD, and N-UCLA. Compared to previous RGB works, we see maximal improvements of 1.5%, 4.8%, 2.2%, and 4.8% on each dataset, respectively.	翻訳日:2023-12-12 19:14:10 公開日:2023-12-10
# データ可用性を制限したリチウムイオン電池寿命予測: 異なる機械学習アルゴリズムのベンチマーク Forecasting Lithium-Ion Battery Longevity with Limited Data Availability: Benchmarking Different Machine Learning Algorithms ( http://arxiv.org/abs/2312.05717v1 ) ライセンス: Link先を確認	Hudson Hilal and Pramit Saha	(参考訳) リチウムイオン電池の使用が拡大するにつれて、リチウムイオン電池の寿命を予測できることがますます重要になっている。この研究は、従来の機械学習とディープラーニングの両方で異なる機械学習アルゴリズムの相対的性能を比較して、最小限のデータに基づいてバッテリーサイクル寿命予測のための最高のパフォーマンスアルゴリズムを決定することを目的としている。統計的データに基づいて,14種類の機械学習モデルを用いて手作りの機能を入力し,テストのために3つの特徴群に分割した。ディープラーニングモデルでは,標準的なリカレントニューラルネットワークの構成,ゲート型リカレントユニット,アテンション機構のない長期記憶など,さまざまなニューラルネットワークモデルをテストした。ディープラーニングモデルは、最初の100サイクルで各バッテリーの生データに基づいて、多変量時系列信号を供給した。実験の結果,手作り機能を用いた機械学習アルゴリズムは特に良好であり,平均絶対パーセンテージ誤差は10～20%であった。最も優れたアルゴリズムはランダムフォレスト回帰器であり、9.8%の平均絶対パーセンテージ誤差を与えた。従来の機械学習モデルは、一般的なデータセットのトレンドを理解する能力に優れていた。一方,深層学習モデルでは,生の限られたデータに対して,特に性能が低かった。中間範囲のデータ依存を捉えることにフォーカスした gru や rnns のようなアルゴリズムは、このタスクにとって重要で緩やかな傾向を認識するのにあまり適していなかった。本研究により,手作り機能付き機械学習モデルの実装は,データ可用性に制限のあるリチウムイオン電池寿命を予測するための高度なディープラーニングモデルよりも有効であることが判明した。 As the use of Lithium-ion batteries continues to grow, it becomes increasingly important to be able to predict their remaining useful life. This work aims to compare the relative performance of different machine learning algorithms, both traditional machine learning and deep learning, in order to determine the best-performing algorithms for battery cycle life prediction based on minimal data. We investigated 14 different machine learning models that were fed handcrafted features based on statistical data and split into 3 feature groups for testing. For deep learning models, we tested a variety of neural network models including different configurations of standard Recurrent Neural Networks, Gated Recurrent Units, and Long Short Term Memory with and without attention mechanism. Deep learning models were fed multivariate time series signals based on the raw data for each battery across the first 100 cycles. 
Our experiments revealed that the machine learning algorithms on handcrafted features performed particularly well, resulting in 10-20% average mean absolute percentage error. The best-performing algorithm was the Random Forest Regressor, which gave a minimum 9.8% mean absolute percentage error. Traditional machine learning models excelled due to their capability to comprehend general data set trends. In comparison, deep learning models were observed to perform particularly poorly on raw, limited data. Algorithms like GRU and RNNs that focused on capturing medium-range data dependencies were less adept at recognizing the gradual, slow trends critical for this task. Our investigation reveals that implementing machine learning models with hand-crafted features proves to be more effective than advanced deep learning models for predicting the remaining useful Lithium-ion battery life with limited data availability.	翻訳日:2023-12-12 19:13:45 公開日:2023-12-10
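For readers unfamiliar with the metric quoted above, the following is a minimal Python sketch of mean absolute percentage error (MAPE) as it is conventionally defined. The function name `mape` and the cycle-life numbers are illustrative assumptions, not code or data from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        # Relative error of one prediction, measured against the true value.
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical battery cycle lives (not taken from the paper).
actual_cycles = [1000, 500, 800]
predicted_cycles = [900, 550, 800]
print(f"MAPE: {mape(actual_cycles, predicted_cycles):.2f}%")
```

Under this definition, a 9.8% MAPE means the predicted cycle life is off by 9.8% of the true value on average.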
# 敵対的転移学習における初期化の重要性 Initialization Matters for Adversarial Transfer Learning ( http://arxiv.org/abs/2312.05716v1 ) ライセンス: Link先を確認	Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin	(参考訳) 転移学習における事前学習・微調整パラダイムの普及に伴い,ダウンストリームタスクの堅牢性が重要な課題となっている。本研究では,転移学習における敵対的ロバスト性に着目し,事前学習モデルとリニアヘッドの両方を含む初期化の重要な役割を明らかにする。まず, 敵対的にロバストな事前学習モデルの必要性を見いだす。特に, 標準事前学習モデルでは, パラメータ効率のよいファインチューニング (PEFT) 手法は, 微調整時に敵対的訓練を行っても, 下流タスクにおいて敵対的ロバスト性を得られないか, 敵対的ロバスト性が著しく低下したままであることが判明した。ロバストな事前学習モデルを活用することで、単純な線形プローブが特定のデータセット上でランダムな初期化を伴う完全微調整や他の PEFT 手法よりも優れていることがわかりました。さらに,ロバスト事前学習からロバスト性を維持するのに線形プローブが優れていることも確認した。そこで本稿では, 敵対的線形プロービングにより得られる重みで線形ヘッドを初期化して, 事前学習から頑健性を最大限に継承する, ロバスト線形初期化(RoLI)を提案する。 5つの異なる画像分類データセットにおいて,RoLIの有効性を実証し,新しい最先端結果を得た。 With the prevalence of the Pretraining-Finetuning paradigm in transfer learning, the robustness of downstream tasks has become a critical concern. In this work, we delve into adversarial robustness in transfer learning and reveal the critical role of initialization, including both the pretrained model and the linear head. First, we discover the necessity of an adversarially robust pretrained model. Specifically, we reveal that with a standard pretrained model, Parameter-Efficient Finetuning (PEFT) methods either fail to be adversarially robust or continue to exhibit significantly degraded adversarial robustness on downstream tasks, even with adversarial training during finetuning. Leveraging a robust pretrained model, surprisingly, we observe that a simple linear probing can outperform full finetuning and other PEFT methods with random initialization on certain datasets. We further identify that linear probing excels in preserving robustness from the robust pretraining. Based on this, we propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining. 
Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.	翻訳日:2023-12-12 19:13:16 公開日:2023-12-10
# マルチスケールモデリングにおけるマイクロマクロ整合性:高速・低速力学系のスコアベースモデルによるサンプリング Micro-Macro Consistency in Multiscale Modeling: Score-Based Model Assisted Sampling of Fast/Slow Dynamical Systems ( http://arxiv.org/abs/2312.05715v1 ) ライセンス: Link先を確認	Ellis R. Crabtree, Juan M. Bello-Rivas, Ioannis G. Kevrekidis	(参考訳) 計算化学、生物学、材料科学などの分野におけるマルチスケール力学系のモデリングにおける重要なステップは、長期にわたる関心事における位相空間の代表的なサンプリングである。例えば、多くの自由度を持つ系の長期的挙動は直接力学シミュレーションによって効率的に計算できないことが多く、そのようなシステムは局所的な自由エネルギーミニマの中に閉じ込められることがある。物理学に基づくマルチ時間力学系の研究において、自由エネルギー障壁を越える探索を加速するためにサンプリングを強化する技術が開発されている。一方、機械学習の分野では、生成モデルの一般的な目標は、この密度から経験的なサンプルをトレーニングした後、ターゲット密度からサンプルをサンプリングすることである。スコアベース生成モデル(SGM)は、目標トレーニング分布から可塑性データを生成する最先端の能力を実証している。このような生成モデルの条件付き実装は、強化サンプリングに対する長い確立された-および物理に基づく-ソリューションと大きな並列性を示すことが示されている。これらの物理に基づく手法は、ML生成モデルとの結合によって強化され、強度を補完し、それぞれの技術の弱点を軽減することができる。本研究では,SGMをこのような結合フレームワークで利用することにより,マルチスケールな動的システムのサンプリングを改善することができることを示す。 A valuable step in the modeling of multiscale dynamical systems in fields such as computational chemistry, biology, materials science and more, is the representative sampling of the phase space over long timescales of interest; this task is not, however, without challenges. For example, the long term behavior of a system with many degrees of freedom often cannot be efficiently computationally explored by direct dynamical simulation; such systems can often become trapped in local free energy minima. In the study of physics-based multi-time-scale dynamical systems, techniques have been developed for enhancing sampling in order to accelerate exploration beyond free energy barriers. On the other hand, in the field of Machine Learning, a generic goal of generative models is to sample from a target density, after training on empirical samples from this density. Score based generative models (SGMs) have demonstrated state-of-the-art capabilities in generating plausible data from target training distributions. 
Conditional implementations of such generative models have been shown to exhibit significant parallels with long-established, physics-based solutions to enhanced sampling. These physics-based methods can then be enhanced through coupling with the ML generative models, complementing the strengths and mitigating the weaknesses of each technique. In this work, we show that SGMs can be used in such a coupling framework to improve sampling in multiscale dynamical systems.	翻訳日:2023-12-12 19:12:51 公開日:2023-12-10
# ミューオン崩壊の絡み合いエントロピー分布 Entanglement Entropy Distributions of a Muon Decay ( http://arxiv.org/abs/2312.05712v1 ) ライセンス: Link先を確認	Shanmuka Shivashankara, Patti Rizzo, Nicole Cafe	(参考訳) 崩壊および散乱過程の密度行列で生じる発散は、トレースとユニタリティーまたは光学定理によって正規化される。これらの発散は、崩壊する粒子の寿命または全散乱断面積によって規則化される。また、この正規化は最終的な粒子の期待されるヘリシティを与える。密度行列は、ローレンツ不変密度行列のエントリとユニタリティーが木レベルで保たれる、休息時の偏極ミューオンの弱崩壊、$\mu^- \rightarrow \nu_{\mu} (e^- \bar \nu_e)$ に対して導かれる。電子のフォン・ノイマンエンタングルメントエントロピー分布は、電子の放出角とエネルギーの両方に関して計算される。角エントロピー分布は、最小体積正規化が与えられたミューオンの分極に対して後方に放出される電子を好む。運動エントロピー分布はミューオンの静止質量エネルギーの半分で最大である。これらの結果は、電子の角および運動的減衰率分布に類似している。密度行列とエンタングルメントエントロピーは、領域または体積の比のどちらかでキャストすることができる。 Divergences that occur in density matrices of decay and scattering processes are shown to be regularized by tracing and unitarity or the optical theorem. These divergences are regularized by the lifetime of the decaying particle or the total scattering cross section. Also, this regularization is shown to give the expected helicities of final particles. The density matrix is derived for the weak decay of a polarized muon at rest, $\mu^- \rightarrow \nu_{\mu} (e^- \bar \nu_e)$, with Lorentz invariant density matrix entries and unitarity upheld at tree level. The electron's von Neumann entanglement entropy distributions are calculated with respect to both the electron's emission angle and energy. The angular entropy distribution favors an electron emitted backwards with respect to the muon's polarization given a minimum volume regularization. The kinematic entropy distribution is maximal at half the muon's rest mass energy. These results are similar to the electron's angular and kinematic decay rate distributions. Both the density matrix and entanglement entropy can be cast either in terms of ratios of areas or volumes.	翻訳日:2023-12-12 19:12:29 publicado:2023-12-10
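The abstract above does not write out the entropy it computes. As a reminder, the standard textbook definition of the von Neumann entanglement entropy is sketched below; the symbols $\rho$, $\rho_e$ and $p_i$ are notation assumed here for illustration and may differ from the paper's own conventions.

```latex
% Reduced density matrix of the electron: trace out the two neutrinos
% from the density matrix \rho of the final state.
\rho_e = \mathrm{Tr}_{\nu_\mu \bar\nu_e}\, \rho ,
\qquad \mathrm{Tr}\,\rho_e = 1 .

% Von Neumann entanglement entropy of the electron subsystem,
% written in terms of the eigenvalues p_i of \rho_e.
S_e = -\mathrm{Tr}\left( \rho_e \ln \rho_e \right) = -\sum_i p_i \ln p_i .
```

The condition $\mathrm{Tr}\,\rho_e = 1$ is the normalization that the abstract ties to unitarity. A pure, unentangled electron state gives $S_e = 0$.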
# スパース誘導ネットワークを用いたカメラベース3次元セマンティックシーン補完 Camera-based 3D Semantic Scene Completion with Sparse Guidance Network ( http://arxiv.org/abs/2312.05752v1 ) ライセンス: Link先を確認	Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao, Jongwon Ra, Laijian Li, Yong Liu	(参考訳) semantic scene completion (ssc) は、3dシーン全体における各voxelの意味的占有率を限定的な観察から予測することを目的としている。近年,よりリッチな視覚手がかりとカメラの費用対効果により,カメラベースのsscソリューションが研究されている。しかし、既存の手法は通常、高度で重い3dモデルを頼りにして、明確なセグメンテーション境界に十分な識別性を持たないリフトされた3d特徴を直接処理する。そこで,本稿では,sgnと呼ばれるエンドツーエンドカメラベースのsscフレームワークを提案する。sgnは,意味的かつ可食的な種子ボクセルから,幾何学的前後の空間情報に基づいてシーン全体へ意味を拡散する。空間的占有と幾何学的先行のためのハイブリッドガイダンス(疎意味的および幾何的ガイダンス)と効果的なボクセルアグリゲーションを設計することにより、異なるカテゴリ間の特徴分離を強化し、意味拡散の収束を早める。 SemanticKITTIデータセットの大規模な実験結果は、既存の最先端手法よりもSGNの方が優れていることを示している。 Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to directly process the lifted 3D features that are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene based on geometry prior and occupancy information. By designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial occupancy and geometry priors, we enhance the feature separation between different categories and expedite the convergence of semantic diffusion. Extensive experimental results on the SemanticKITTI dataset demonstrate the superiority of our SGN over existing state-of-the-art methods.	翻訳日:2023-12-12 19:05:40 公開日:2023-12-10
# クエリ戦略のベンチマーク: 将来の深層能動学習に向けて Benchmarking of Query Strategies: Towards Future Deep Active Learning ( http://arxiv.org/abs/2312.05751v1 ) ライセンス: Link先を確認	Shiryu Ueno, Yusei Yamada, Shunsuke Nakatsuka, and Kunihito Kato	(参考訳) 本研究では,深層能動学習(DAL)のためのクエリ戦略をベンチマークする。 DALは、クエリ戦略によって選択された高品質なサンプルに注釈を付けることで、アノテーションのコストを削減する。既存の研究には2つの主要な問題があり、実験的な設定は標準化されておらず、既存の方法の評価が困難であり、実験のほとんどはCIFARまたはMNISTデータセットで行われた。そこで我々は,DALの標準化された実験環境を開発し,医用および視覚検査画像を含む6つのデータセットを用いて,様々なクエリ戦略の有効性を検討する。さらに,現在のDALアプローチのほとんどがモデルベースであるため,クエリのためのフルトレーニングモデルを用いた検証実験を行い,これら6つのデータセットの有効性を検証した。私たちのコードは https://github.com/ia-gu/Benchmarking-of-Query-Strategies-Towards-Future-Deep-Active-Learning で利用可能です。 In this study, we benchmark query strategies for deep active learning (DAL). DAL reduces annotation costs by annotating only high-quality samples selected by query strategies. Existing research has two main problems: the experimental settings are not standardized, which makes existing methods difficult to evaluate, and most experiments were conducted on the CIFAR or MNIST datasets. Therefore, we develop standardized experimental settings for DAL and investigate the effectiveness of various query strategies using six datasets, including those that contain medical and visual inspection images. In addition, since most current DAL approaches are model-based, we perform verification experiments using fully-trained models for querying to investigate the effectiveness of these approaches for the six datasets. Our code is available at https://github.com/ia-gu/Benchmarking-of-Query-Strategies-Towards-Future-Deep-Active-Learning	翻訳日:2023-12-12 19:05:13 公開日:2023-12-10
# IL-NeRF:カメラポッドアライメントを用いたニューラルラジアンスフィールドのインクリメンタル学習 IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment ( http://arxiv.org/abs/2312.05748v1 ) ライセンス: Link先を確認	Letian Zhang, Ming Li, Chen Chen, Jie Xu	(参考訳) neural radiance fields(nerf)は、フォトリアリスティックな画像を生成し、複雑なシーンを表現するための有望なアプローチである。しかし、データを逐次処理する場合は、新しいデータでトレーニングした後、前のデータを忘れやすい破滅的な忘れ込みに悩まされることがある。知識蒸留を用いた既存の漸進的な学習手法では、連続データチャンクは2次元画像と対応するカメラポーズパラメータの両方を含むと仮定する。これは、データが順次到着し、将来のチャンクがアクセスできない場合でも、必要なカメラのポーズをデータセット全体から推定する必要があるため、パラドックスとなる。対照的に,カメラのポーズが不明な実用的なシナリオに注目する。我々は,この課題に対処するために,段階的NeRFトレーニングのための新しいフレームワークであるIL-NeRFを提案する。 IL-NeRFのキーとなるアイデアは、カメラのポーズを初期化して調整するための参照として過去のカメラのポーズを選択することである。この後、カメラポーズと再生ベースのnerf蒸留の合同最適化が行われる。実世界の屋内および屋外のシーンにおける実験により、IL-NeRFはインクリメンタルなNeRFトレーニングを処理し、ベースラインを最大54.04 %のレンダリング品質で上回ります。 Neural radiance fields (NeRF) is a promising approach for generating photorealistic images and representing complex scenes. However, when processing data sequentially, it can suffer from catastrophic forgetting, where previous data is easily forgotten after training with new data. Existing incremental learning methods using knowledge distillation assume that continuous data chunks contain both 2D images and corresponding camera pose parameters, pre-estimated from the complete dataset. This poses a paradox as the necessary camera pose must be estimated from the entire dataset, even though the data arrives sequentially and future chunks are inaccessible. In contrast, we focus on a practical scenario where camera poses are unknown. We propose IL-NeRF, a novel framework for incremental NeRF training, to address this challenge. IL-NeRF's key idea lies in selecting a set of past camera poses as references to initialize and align the camera poses of incoming image data. This is followed by a joint optimization of camera poses and replay-based NeRF distillation. 
Our experiments on real-world indoor and outdoor scenes show that IL-NeRF handles incremental NeRF training and outperforms the baselines by up to 54.04% in rendering quality.	翻訳日:2023-12-12 19:04:57 公開日:2023-12-10
# 学生学習におけるスキル分類と予測のための確率と情報エントロピーの差異 Difference of Probability and Information Entropy for Skills Classification and Prediction in Student Learning ( http://arxiv.org/abs/2312.05747v1 ) ライセンス: Link先を確認	Kennedy Efosa Ehimwenma, Safiya Al Sharji and Maruf Raheem	(参考訳) 事象の確率は[0, 1]の範囲にある。サンプル空間sにおいて、確率の値は結果が真か偽かを決定する。決して起こらない事象Pr(A)の確率 = 0. イベントPr(B) の確率は確実に起こる。 = 1 なので、イベント a と b の両方が確実である。さらに、与えられたサンプル空間 s = 1 における有限個の事象の集合の確率の和 pr(e1) + pr(e2) + ... + pr(en) は、逆に、確実に起こる2つの確率の和の差が成り立つ。まず, ベイズの定理を考察し, 学生学習における学習対象の予測に応用する前に, 学習事象の発生確率と確率の差を補う。本論文は,生徒の学習対象の重みを,argMaxPr(S)と学生効果の確率の差で定量化するものである。スキルセットのデータセットを使用して、計算手順が示す。一より高いレベルの学習につながるようなスキルセットの事象の確率二被写体・被写体の再学習を要しない事象の確率三学生の成績をクラスラベルに予測する際の決定木の正確性及び四スキルセットデータに関する情報エントロピーとその学生の認知能力及び学習の推薦に関する意味 [1] The probability of an event is in the range of [0, 1]. In a sample space S, the value of probability determines whether an outcome is true or false. The probability of an event Pr(A) that will never occur = 0. The probability of the event Pr(B) that will certainly occur = 1. This makes both events A and B thus a certainty. Furthermore, the sum of probabilities Pr(E1) + Pr(E2) + ... + Pr(En) of a finite set of events in a given sample space S = 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. Firstly, this paper discusses Bayes' theorem, then complement of probability and the difference of probability for occurrences of learning-events, before applying these in the prediction of learning objects in student learning. Given the sum total of 1; to make recommendation for student learning, this paper submits that the difference of argMaxPr(S) and probability of student-performance quantifies the weight of learning objects for students. 
Using a skill-set dataset, the computational procedure demonstrates: i) the probability of skill-set events that have occurred and would lead to higher-level learning; ii) the probability of events that have not occurred and that require subject-matter relearning; iii) the accuracy of a decision tree in predicting student performance into class labels; and iv) the information entropy of the skill-set data and its implications for student cognitive performance and the recommendation of learning [1].	翻訳日:2023-12-12 19:04:34 公開日:2023-12-10
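The two quantities this abstract leans on, the complement of a probability and information entropy, can be illustrated with a short Python sketch. Reading the "difference of probability" as 1 - Pr(performance) is an interpretation of the abstract, and the function names and the 0.75 pass probability are hypothetical.

```python
import math


def complement(p):
    """Difference between certainty (1) and the probability p of an event."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return 1.0 - p


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical probability that a student passes a given skill.
pass_probability = 0.75
relearning_weight = complement(pass_probability)
uncertainty = shannon_entropy([pass_probability, relearning_weight])
print(relearning_weight, uncertainty)
```

On this reading, a larger complement marks a skill that needs more relearning. The entropy is highest, at 1 bit, when passing and failing are equally likely.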
# ファンデーションモデルにおけるオープンワールドオブジェクト検出 Open World Object Detection in the Era of Foundation Models ( http://arxiv.org/abs/2312.05745v1 ) ライセンス: Link先を確認	Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang	(参考訳) 物体検出は、ロボット工学から医療画像解析まで、現実世界の様々な応用に不可欠なものだ。このようなアプリケーションで確実に使用されるためには、モデルが予期せぬ(または新しい)オブジェクトを処理できる必要がある。オープンワールドオブジェクト検出(OWD)パラダイムは、未知のオブジェクトを検出し、発見したオブジェクトを段階的に学習することで、この課題に対処する。しかし、OWDメソッドの開発は、厳密なベンチマークとタスク定義のために妨げられている。これらの定義は事実上基礎モデルを禁じる。本稿では,これらの定義を緩和し,OWDにおける事前学習基盤モデルの利用について検討する。まず,既存のベンチマークでは基礎モデルを用いた評価手法が不十分であることを示す。その結果、これらのモデルの新たな、挑戦的なベンチマークをキュレートする動機になりました。そこで我々は,航空画像や外科画像などの挑戦的領域を含む,現実世界のアプリケーション駆動データセット5つを含む新しいベンチマークを導入し,ベースラインを確立する。アプリケーション駆動データセットのクラス間の固有の接続を利用し、新しいメソッドであるオープンワールドのためのファウンデーションオブジェクト検出モデル(fomo)を導入し、ベースとなる既知のオブジェクトと共有属性に基づいて未知のオブジェクトを識別する。 FOMOは、ベンチマークのベースラインに比べて、未知のオブジェクトmAPが約3倍である。しかし,本研究の結果から,オブジェクト検出手法を現実世界のドメインに拡張する大きな研究機会が示唆された。私たちのコードとベンチマークはhttps://orrzohar.github.io/projects/fomo/で利用可能です。 Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered due to the stringent benchmark and task definitions. These definitions effectively prohibit foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient in evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these models. 
Therefore, we introduce a new benchmark that includes five real-world application-driven datasets, including challenging domains such as aerial and surgical images, and establish baselines. We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects. FOMO achieves roughly 3x the unknown-object mAP of the baselines on our benchmark. However, our results indicate significant room for improvement, suggesting a great research opportunity in further scaling object detection methods to real-world domains. Our code and benchmark are available at https://orrzohar.github.io/projects/fomo/.	翻訳日:2023-12-12 19:04:10 公開日:2023-12-10
# Learngene Poolによる可変サイズモデルの構築 Building Variable-sized Models via Learngene Pool ( http://arxiv.org/abs/2312.05743v1 ) ライセンス: Link先を確認	Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng	(参考訳) 近年、ステッチ可能なニューラルネットワーク(sn-net)が、いくつかの事前学習されたネットワークを縫い合わせて、複雑さとパフォーマンスのトレードオフが異なる多数のネットワークを迅速に構築するために提案されている。このようにして、さまざまなリソース制約のあるアプリケーションシナリオで使用できる可変サイズのネットワークの設計やトレーニングの負担を軽減することができる。しかし、SN-Netはまだいくつかの課題に直面している。 1) 独立に訓練された複数のアンカーからのスティッチは、高いストレージリソース消費をもたらす。 2) SN-Netはリソース制約の少ないモデルを構築するための課題に直面している。 3). SN-Netは縫い目層に未学習の初期化法を使用し、最終的な性能を制限している。最近提案されたlearnergeneフレームワークに動機づけられたこれらの課題を克服するために,learnergene poolと呼ばれる新しい手法を提案する。簡単に言うと、learnergeneは、大きな事前学習されたモデルから重要な知識を小さな部分(learnergeneと呼ばれる)に蒸留し、その小さな部分をいくつかの可変サイズのモデルに拡張する。提案手法では,ネットワークブロックを学習ジェネレーションインスタンスとして使用して学習ジェネレーションプールを構築する複数の小モデルに事前学習した大モデルを蒸留する。 1つの大きなモデルしか使われないので、SN-Netとしてもっと大きなモデルを格納する必要はなく、蒸留後、低いリソース制約を満たすために小さなモデルを構築するために小さな学習遺伝子インスタンスを作成できる。また、インスタンス間で学習可能な変換行列を挿入して可変サイズのモデルに縫い付け、これらのモデルの性能を向上させる。その結果, SN-Netと比較して, 提案したLeargen Poolの有効性が検証された。 Recently, Stitchable Neural Networks (SN-Net) is proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challenges. 1) Stitching from multiple independently pre-trained anchors introduces high storage resource consumption. 2) SN-Net faces challenges to build smaller models for low resource constraints. 3). SN-Net uses an unlearned initialization method for stitch layers, limiting the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. 
Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one pretrained large model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store multiple large models as SN-Net does, and after distillation, smaller learngene instances can be created to build small models that satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models and improve the performance of these models. Extensive experiments validate the effectiveness of the proposed Learngene Pool compared with SN-Net.	翻訳日:2023-12-12 19:03:45 公開日:2023-12-10
# misca:マルチインテント検出とスロット充填のためのインテント・スロット協調モデル MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention ( http://arxiv.org/abs/2312.05741v1 ) ライセンス: Link先を確認	Thinh Pham and Chi Tran and Dat Quoc Nguyen	(参考訳) 複雑な現実の状況と関連性から,複数意図の検出やスロットの充填に関する研究が盛んになっている。グラフに基づくジョイントモデルである最近の高度なアプローチは、まだ2つの潜在的な問題に直面しているかもしれない。 (i)事前の意図とスロットに基づいてグラフを構築することにより生じる不確実性は、意図とスロットの相関情報を不正確なラベルノードの宛先に転送することができる。 (ii)トークン単位のインテント投票ごとに複数のインテントラベルを直接組み込むと、スロット予測が誤っている可能性があるため、全体的なパフォーマンスが損なわれる可能性がある。この2つの問題に対処するため,我々はmiscaというジョイントモデルを提案する。我々のMISCAは、意図-スロットのコアテンション機構とラベルアテンション機構の基盤層を導入している。これらのメカニズムによりmiscaはインテントとスロットラベルの相関を効果的に捉え、グラフ構築の必要性をなくすことができる。インテントからスロットへ、スロットからインテントへ、複数のレベルのラベル固有の表現を通して、トークンレベルのインテント情報に頼ることなく、双方向の相関情報の転送も行う。実験の結果、MISCAは従来のモデルよりも優れており、MixATISとMixSNIPSの2つのベンチマークデータセット上で、新しい最先端の全体的な精度性能を実現している。これは注意機構の有効性を強調します。 The research study of detecting multiple intents and filling slots is becoming more popular because of its relevance to complicated real-world situations. Recent advanced approaches, which are joint models based on graphs, might still face two potential issues: (i) the uncertainty introduced by constructing graphs based on preliminary intents and slots, which may transfer intent-slot correlation information to incorrect label node destinations, and (ii) direct incorporation of multiple intent labels for each token w.r.t. token-level intent voting might potentially lead to incorrect slot predictions, thereby hurting the overall performance. To address these two issues, we propose a joint model named MISCA. Our MISCA introduces an intent-slot co-attention mechanism and an underlying layer of label attention mechanism. These mechanisms enable MISCA to effectively capture correlations between intents and slot labels, eliminating the need for graph construction. 
They also facilitate the transfer of correlation information in both directions: from intents to slots and from slots to intents, through multiple levels of label-specific representations, without relying on token-level intent information. Experimental results show that MISCA outperforms previous models, achieving new state-of-the-art overall accuracy performances on two benchmark datasets MixATIS and MixSNIPS. This highlights the effectiveness of our attention mechanisms.	翻訳日:2023-12-12 19:03:18 公開日:2023-12-10
# GAMC:マスキング付きグラフオートエンコーダを用いたフェイクニュース検出のための教師なし手法 GAMC: An Unsupervised Method for Fake News Detection using Graph Autoencoder with Masking ( http://arxiv.org/abs/2312.05739v1 ) ライセンス: Link先を確認	Shu Yin, Chao Gao, Zhen Wang	(参考訳) ソーシャルメディアの普及に伴い、偽ニュースの拡散は重大な懸念となり、大衆の認識を誤解させ、社会的安定に影響を及ぼす可能性がある。 cnn、rnn、bertのようなトランスフォーマーモデルのようなディープラーニング手法は偽ニュースの検出を強化しているが、主にニュース伝播中にソーシャルコンテキストを見下ろすコンテンツに焦点を当てている。グラフベースのテクニックはこのソーシャルコンテキストを取り入れているが、大きなラベル付きデータセットの必要性によって制限されている。本稿では,マスキングとコントラスト学習を備えたグラフオートエンコーダを用いて,教師なしの偽ニュース検出手法であるGAMCを紹介する。情報伝達のコンテキストと内容を自己教師付き信号として活用することにより,ラベル付きデータセットの要求を無効化する。元のニュース伝搬グラフを拡張し、それらをグラフエンコーダでエンコードし、グラフデコーダを用いて再構成する。再構成誤差やコントラスト損失を含むユニークな複合損失関数を設計する。この手法の貢献は、偽ニュースの検出に自己教師付き学習を導入し、2つの異なる損失を統合するグラフオートエンコーダを提案し、実際のデータセット実験を通じてアプローチの有効性を検証することである。 With the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Although deep learning methods like CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection, they primarily focus on content, overlooking social context during news propagation. Graph-based techniques have incorporated this social context but are limited by the need for large labeled datasets. Addressing these challenges, this paper introduces GAMC, an unsupervised fake news detection technique using the Graph Autoencoder with Masking and Contrastive learning. By leveraging both the context and content of news propagation as self-supervised signals, our method negates the requirement for labeled datasets. We augment the original news propagation graph, encode these with a graph encoder, and employ a graph decoder for reconstruction. A unique composite loss function, including reconstruction error and contrast loss, is designed. 
The method's contributions are: introducing self-supervised learning to fake news detection, proposing a graph autoencoder integrating two distinct losses, and validating our approach's efficacy through real-world dataset experiments.	翻訳日:2023-12-12 19:02:48 公開日:2023-12-10
# fedreverse: 多人数可逆ディープニューラルネットワークのウォーターマーキング FedReverse: Multiparty Reversible Deep Neural Network Watermarking ( http://arxiv.org/abs/2312.05738v1 ) ライセンス: Link先を確認	Junlong Mao, Huiyi Tang, Yi Zhang, Fengxia Liu, Zhiyong Zheng and Shanxiang Lyu	(参考訳) 商用アプリケーションにおけるディープニューラルネットワーク(DNN)の普及は急速に進んでいる。同時に、DNNモデルの複雑化とコストの増大により、これらの訓練されたモデルに関連する知的財産の保護を取り巻く緊急性が高まっている。この点において、DNNの透かしは重要な保護技術として現れている。本稿では,性能への影響を最小限に抑えつつ,堅牢な著作権保護のための多元的可逆的透かし手法であるfeedreverseを提案する。既存の方法とは異なり、feedreverseはモデルトレーニング後に複数のパーティから共同ウォーターマークを埋め込み、個々の著作権クレームを保証できる。さらに、FedReverseは可逆であり、全クライアントの同意を得て完全な透かしを削除することができる。 FedReverseは完璧なカバーを示し、透かしのある内容の観察が隠された透かしに関する情報を明かさないようにする。さらに、既知のオリジナル攻撃(koa)に対する抵抗を示し、攻撃者がウォーターマークを偽造したり、鍵を推測したりするのは非常に困難である。本稿では,MNISTデータセットに基づいて学習した多層パーセプトロン(MLP)と畳み込みニューラルネットワーク(CNN)の総合シミュレーションを通じてFedReverseを評価する。シミュレーションは、FedReverseの堅牢性、可逆性、および様々な埋め込みパラメータと複数のクライアントシナリオにわたるモデルの精度に最小限の影響を示す。 The proliferation of Deep Neural Networks (DNN) in commercial applications is expanding rapidly. Simultaneously, the increasing complexity and cost of training DNN models have intensified the urgency surrounding the protection of intellectual property associated with these trained models. In this regard, DNN watermarking has emerged as a crucial safeguarding technique. This paper presents FedReverse, a novel multiparty reversible watermarking approach for robust copyright protection while minimizing performance impact. Unlike existing methods, FedReverse enables collaborative watermark embedding from multiple parties after model training, ensuring individual copyright claims. In addition, FedReverse is reversible, enabling complete watermark removal with unanimous client consent. FedReverse demonstrates perfect covering, ensuring that observations of watermarked content do not reveal any information about the hidden watermark. Additionally, it showcases resistance against Known Original Attacks (KOA), making it highly challenging for attackers to forge watermarks or infer the key. 
This paper further evaluates FedReverse through comprehensive simulations involving Multi-layer Perceptron (MLP) and Convolutional Neural Networks (CNN) trained on the MNIST dataset. The simulations demonstrate FedReverse's robustness, reversibility, and minimal impact on model accuracy across varying embedding parameters and multiple client scenarios.	翻訳日:2023-12-12 19:02:13 公開日:2023-12-10
# aswt-sgnn:適応スペクトルウェーブレット変換に基づく自己教師付きグラフニューラルネットワーク ASWT-SGNN: Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network ( http://arxiv.org/abs/2312.05736v1 ) ライセンス: Link先を確認	Ruyue Liu, Rong Yin, Yong Liu, Weiping Wang	(参考訳) グラフ比較学習(GCL)は、グラフ畳み込みネットワーク(GCN)と比較学習の利点を組み合わせた自己教師型手法であり、ノード表現の学習に有望である。しかし、これらの手法で使用されるGCNエンコーダは、空間的およびスペクトル的局所化トレードオフを含む不確実性原理によって本質的に制限されている固定グラフ表現を学習するためにフーリエ変換に依存する。本稿では,既存手法の柔軟性と計算コストのかかる固有分解と高密度行列乗算を克服するために,適応スペクトルウェーブレット変換を用いた自己教師付きグラフニューラルネットワーク(ASWT-SGNN)を提案する。フィルタ関数を近似するためにスペクトル適応多項式を用い,コントラスト損失を用いてウェーブレットを最適化する。この設計により、スペクトル領域と空間領域の両方で局所フィルタを作成でき、様々なスケールで近隣情報の柔軟な集約を可能にし、局所情報とグローバル情報の制御された変換を容易にする。従来の手法と比較して,提案手法は計算複雑性を低減し,グラフサイズに制約されたグラフ畳み込みニューラルネットワークの制限に対処する。 8つのベンチマークデータセットの大規模な実験により、ASWT-SGNNは高密度スペクトル領域のフィルタ関数を正確に近似し、コストの高い固有分解を避けることを示した。さらに、ASWT-SGNNはノード分類タスクにおける最先端モデルに匹敵する性能を達成する。 Graph Comparative Learning (GCL) is a self-supervised method that combines the advantages of Graph Convolutional Networks (GCNs) and comparative learning, making it promising for learning node representations. However, the GCN encoders used in these methods rely on the Fourier transform to learn fixed graph representations, which is inherently limited by the uncertainty principle involving spatial and spectral localization trade-offs. To overcome the inflexibility of existing methods and the computationally expensive eigen-decomposition and dense matrix multiplication, this paper proposes an Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network (ASWT-SGNN). The proposed method employs spectral adaptive polynomials to approximate the filter function and optimize the wavelet using contrast loss. This design enables the creation of local filters in both spectral and spatial domains, allowing flexible aggregation of neighborhood information at various scales and facilitating controlled transformation between local and global information. 
Compared to existing methods, the proposed approach reduces computational complexity and addresses the limitation of graph convolutional neural networks, which are constrained by graph size and lack flexible control over the neighborhood aspect. Extensive experiments on eight benchmark datasets demonstrate that ASWT-SGNN accurately approximates the filter function in high-density spectral regions, avoiding costly eigen-decomposition. Furthermore, ASWT-SGNN achieves comparable performance to state-of-the-art models in node classification tasks.	翻訳日:2023-12-12 19:01:40 公開日:2023-12-10
# ディープラーニングを用いたマルチモーダル会話感情認識に関する総合的研究 A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning ( http://arxiv.org/abs/2312.05735v1 ) ライセンス: Link先を確認	Yuntao Shou, Tao Meng, Wei Ai, Nan Yin, Keqin Li	(参考訳) マルチモーダル会話感情認識(MCER)は、会話シーンにおけるテキスト、音声、視覚情報を用いて話者の感情状態を認識し、追跡することを目的としている。 MCER問題の解析と研究は、感情コンピューティング、インテリジェントなレコメンデーション、人間とコンピュータの相互作用分野において重要である。従来の単一発話のマルチモーダル感情認識や単一モーダル会話感情認識とは異なり、mcerはより複雑な感情相互作用関係を扱う必要があるより難しい問題である。重要な問題は、感情的相互作用関係に基づくマルチモーダル特徴融合のための一貫性と補完的意味論の学習である。この問題を解決するために、深層学習技術に基づくmcerに関する広範な研究を行ったが、モデリング手法の体系的なレビューが不足している。したがって、MCERのディープラーニングにおける最近の進歩のタイムリーで包括的な概要は、学術や産業にとって非常に重要である。本研究では,mcerモデリング手法の包括的概要と,mcer手法を4つのカテゴリ(文脈自由モデリング,逐次文脈モデリング,話者微分モデリング,話者関係モデリング)に大まかに分割した。さらに,MCERが公開している一般的なデータセット,マルチモーダル特徴抽出手法,アプリケーション領域,既存の課題,今後の開発方向性についても論じる。我々は、MCER研究者が感情認識の現在の研究状況を理解し、いくつかのインスピレーションを与え、より効率的なモデルを開発するのに役立つことを期待している。 Multi-modal conversation emotion recognition (MCER) aims to recognize and track the speaker's emotional state using text, speech, and visual information in the conversation scene. Analyzing and studying MCER issues is significant to affective computing, intelligent recommendations, and human-computer interaction fields. Unlike the traditional single-utterance multi-modal emotion recognition or single-modal conversation emotion recognition, MCER is a more challenging problem that needs to deal with more complex emotional interaction relationships. The critical issue is learning consistency and complementary semantics for multi-modal feature fusion based on emotional interaction relationships. To solve this problem, people have conducted extensive research on MCER based on deep learning technology, but there is still a lack of systematic review of the modeling methods. Therefore, a timely and comprehensive overview of MCER's recent advances in deep learning is of great significance to academia and industry. 
In this survey, we provide a comprehensive overview of MCER modeling methods and roughly divide MCER methods into four categories, i.e., context-free modeling, sequential context modeling, speaker-differentiated modeling, and speaker-relationship modeling. In addition, we further discuss MCER's publicly available popular datasets, multi-modal feature extraction methods, application areas, existing challenges, and future development directions. We hope that our review can help MCER researchers understand the current research status in emotion recognition, provide some inspiration, and develop more efficient models.	翻訳日:2023-12-12 19:00:34 公開日:2023-12-10
# DevBotsはAPIを共同設計できる DevBots can co-design APIs ( http://arxiv.org/abs/2312.05733v1 ) ライセンス: Link先を確認	Vinicius Soares Silva Marques	(参考訳) DevBotsは、ソフトウェア開発をサポートするためにさまざまなタスクを実行する自動化ツールである。それらは増加傾向にあり、繰り返しタスクの自動化やコードジェネレータ、要件の抽出やアーキテクチャ定義のコラボレータとして、リポジトリで使用されている。本研究では,24の記事を分析し,ソフトウェア開発におけるDevBotsの利用の現状を調査した。具体的には,その特性の理解,ユースケースの特定,DevBotsと会話型ソフトウェア開発の関係の把握を試み,プロンプトエンジニアリングが人間開発者とボットのコラボレーションをどのように実現できるかを議論した。さらに,人間設計者とDevBotsとの協調型API設計にプロンプトエンジニアリングを適用することで対処すべきギャップを特定し,検索拡張生成(Retrieval Augmented Generation)を用いる場合と用いない場合のどちらのアプローチが適切かを評価する実験を提案した。私たちの結論では、DevBotsは人間のAPIデザイナと協力することができますが、この2つのアプローチにはそれぞれ利点と欠点があります。 DevBots are automated tools that perform various tasks in order to support software development. They are a growing trend and have been used in repositories to automate repetitive tasks, as code generators, and as collaborators in eliciting requirements and defining architectures. In this study, we analyzed 24 articles to investigate the state of the art of using DevBots in software development, trying to understand their characteristics, identify use cases, learn the relationship between DevBots and conversational software development, and discuss how prompt engineering can enable collaboration between human developers and bots. Additionally, we identified a gap to address by applying prompt engineering to collaborative API design between human designers and DevBots and proposed an experiment to assess which approach, with or without Retrieval Augmented Generation, is more suitable. Our conclusion is that DevBots can collaborate with human API designers, but the two approaches have advantages and disadvantages.	翻訳日:2023-12-12 18:59:42 公開日:2023-12-10
# 「一般化ジェームスの有効ハミルトン法」へのコメントに対する回答 Reply to "Comment on `Generalized James' effective Hamiltonian method' " ( http://arxiv.org/abs/2312.05732v1 ) ライセンス: Link先を確認	Wenjun Shao, Chunfeng Wu, and Xun-Li Feng	(参考訳) 前回のコメント [1] では、元の論文 [2] で得られた三次ハミルトニアンは時間依存性を考慮した一般的な状況ではエルミートではなく、有効三次展開の導出方法もあまり厳密ではないと主張された。このコメントに答えるにあたり、次の3点を強調したい。第一に、我々の論文で与えた三次ハミルトニアンは、そこで述べた条件の下では厳密にエルミートである。第二に、一般化有効ハミルトニアンを導出するために採用した反復的手法はダイソン級数と等価であり、その正しさは保証される。第三に、打ち切られた有効ハミルトニアンは、コメントで示されたような時間依存の状況下では確かに非エルミートであるが、それは非ユニタリな打ち切りダイソン級数に厳密に対応している。打ち切りダイソン級数が時間依存摂動論で広く利用されてきたことを考えると、非エルミートな打ち切り有効ハミルトニアンも有効ハミルトニアンの近似として扱うことができると我々は考える。 In the preceding Comment [1] it was claimed that the third-order Hamiltonian obtained in our original paper [2] is not Hermitian for general situations when considering time-dependence and that the way of deriving the effective third-order expansion is not very rigorous. In reply to the Comment, we emphasize the following three points. First, the third-order Hamiltonian given in our paper is exactly Hermitian under the conditions mentioned there. Second, the iterative method adopted in our paper to derive the generalized effective Hamiltonian is equivalent to the Dyson series, and its correctness can thus be guaranteed. Third, although the truncated effective Hamiltonian is indeed non-Hermitian under the time-dependent situation as presented in the Comment, it corresponds exactly to the non-unitary truncated Dyson series. Considering that the truncated Dyson series has been extensively utilized in time-dependent perturbation theory, in our opinion, the non-Hermitian truncated effective Hamiltonian can still be treated as an approximation of the effective Hamiltonian.	翻訳日:2023-12-12 18:59:24 公開日:2023-12-10
# 相関行列を用いたパラメタライズドステアリング基準 Parameterized steering criteria via correlation matrices ( http://arxiv.org/abs/2312.05729v1 ) ライセンス: Link先を確認	Qing-Hua Zhang, Lemin Lai, Shao-Ming Fei	(参考訳) 局所特殊ユニタリ群が与える相関行列に基づく任意の次元二成分系のステアビリティについて検討した。パラメータ化相関行列を用いた二部量子状態のステアリング基準のファミリについて述べる。これらのステアリング基準は、既存のステアリング基準よりも、よりステアブルな状態を検出する可能性がある。結果は詳細な例で示される。 We study the steerability for arbitrary dimensional bipartite systems based on the correlation matrices given by local special unitary groups. We present families of steering criteria for bipartite quantum states in terms of parameterized correlation matrices. We show that these steering criteria may detect more steerable states than the existing steering criteria. The results are illustrated by detailed examples.	翻訳日:2023-12-12 18:58:59 公開日:2023-12-10
# 説明一貫性チェックによるChatGPTによるWeb UIテストの修正 Guiding ChatGPT to Fix Web UI Tests via Explanation-Consistency Checking ( http://arxiv.org/abs/2312.05778v1 ) ライセンス: Link先を確認	Zhuolin Xu, Yuanzhang Lin, Qiushi Li and Shin Hwei Tan	(参考訳) Web UIの急速な進化は、UIテストの維持に時間と労力を要する。 Web UIテストの既存のテクニックは、古いものと一致する新しいWebページのターゲット要素を見つけることに重点を置いており、対応する壊れたステートメントを修復することができる。本稿では,初期局所マッチングに先行する web ui の修正手法を活用し,グローバルマッチングを行うために chatgpt を用いた最初の研究を行う。キーとなる洞察は、以前のテクニックにマッチする要素のリストが与えられたら、ChatGPTは言語理解を利用してグローバルなビューマッチングを実行し、そのコード生成モデルを使って壊れたステートメントを修正できるということです。本稿では,ChatGPTにおける幻覚を緩和するため,提案した結果が一致しているかどうかを判定する説明検証器を設計し,自己補正プロンプトを通じてChatGPTにヒントを提供し,その結果をさらに改善する。本稿では,ChatGPTで強化した手法により,既存のWebテスト修復手法の有効性が向上したことを示す。また、将来のweb uiテストの修復技術を改善する上で、いくつかの重要な知見を共有しています。 The rapid evolution of Web UI incurs time and effort in maintaining UI tests. Existing techniques in Web UI test repair focus on finding the target elements on the new web page that match the old ones so that the corresponding broken statements can be repaired. We present the first study that investigates the feasibility of using prior Web UI repair techniques for initial local matching and then using ChatGPT to perform global matching. Our key insight is that given a list of elements matched by prior techniques, ChatGPT can leverage the language understanding to perform global view matching and use its code generation model for fixing the broken statements. To mitigate hallucination in ChatGPT, we design an explanation validator that checks whether the provided explanation for the matching results is consistent, and provides hints to ChatGPT via a self-correction prompt to further improve its results. Our evaluation on a widely used dataset shows that the ChatGPT-enhanced techniques improve the effectiveness of existing Web test repair techniques. Our study also shares several important insights in improving future Web UI test repair techniques.	翻訳日:2023-12-12 18:52:30 公開日:2023-12-10
# ノイズクロスモーダルマッチングのための負の事前認識 Negative Pre-aware for Noisy Cross-modal Matching ( http://arxiv.org/abs/2312.05777v1 ) ライセンス: Link先を確認	Zhang Xu and Li Hao and Ye Mang	(参考訳) 雑音対応は認識と修正が難しいため,クロスモーダルノイズロバスト学習は難しい課題である。未解決ノイズの累積及び不可避負の影響により、既存の手法ではノイズが増大しても安定した性能を維持することはできない。本稿では,雑音の多い下流タスクにおける大規模視覚言語モデルファインチューニングのための,NPC(Negative Pre-aware Cross-modal)マッチングソリューションを提案する。 1) ノイズ認識と抵抗の2つの側面で特徴付けられる:(1) 従来の手法は、通常、ノイズサブセットを直接フィルタリングするが、各サンプルの負の影響を推定する。信頼できない修正結果を予測するための追加の補正機構は不要であり、自己補強誤差につながる。トレーニングプロセスにおける負の影響に応じて,各サンプルに信頼度重みを割り当てる。これにより、ノイズ蓄積を避けるために各試料の寄与を適応的に調整する。 2) ノイズの増加とともに安定した性能を維持するため, メモリバンクの維持によるDNNの記憶効果を利用する。具体的には、メモリエントリとして高信頼クリーンサンプルを選択するためにGMMを適用し、メモリエントリを使用して各サンプルの負の影響を推定する。クリーンサンプルはノイズの増加とともにGMMにより識別が容易であるため、メモリバンクは高いノイズ比で高い品質を維持することができる。ノイズサンプルに着目した補正機構に比べ、メモリバンクに基づく推定はより堅牢であり、ノイズの多いデータセットでモデル性能を安定させる。広汎な実験により,提案手法は雑音比の増加に伴うマッチング精度と性能安定性を著しく向上することが示された。我々のアプローチは最先端の手法を大きく上回っている。 Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain a stable performance when the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for large visual-language model fine-tuning on noisy downstream tasks. It is featured in two aspects: (1) For noise recognition and resistance, previous methods usually directly filter out a noise subset, we propose to estimate the negative impact of each sample. It does not need additional correction mechanisms that may predict unreliable correction results, leading to self-reinforcing error. We assign a confidence weight to each sample according to its negative impact in the training process. This adaptively adjusts the contribution of each sample to avoid noisy accumulation. (2) For maintaining stable performance with increasing noise, we utilize the memorization effect of DNNs by maintaining a memory bank. 
Specifically, we apply a GMM to select high-confidence clean samples as memory entries, which are then used to estimate the negative impact of each sample. Since clean samples are more easily distinguished by the GMM as noise increases, the memory bank can still maintain high quality at a high noise ratio. Compared to correction mechanisms that focus on noisy samples, memory bank-based estimation is more robust, which keeps the model performance stable on noisy datasets. Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability as the noise ratio increases. Our approach also surpasses the state-of-the-art methods by a large margin.	翻訳日:2023-12-12 18:52:11 公開日:2023-12-10
# 量子インターネットのためのセキュアかつ効率的な絡み合い分散プロトコル Secure and Efficient Entanglement Distribution Protocol for Near-Term Quantum Internet ( http://arxiv.org/abs/2312.05775v1 ) ライセンス: Link先を確認	Nicholas Skjellum, Mohamed Shaban, and Muhammad Ismail	(参考訳) 量子情報技術は、コンピューティング、通信、セキュリティに革命をもたらす可能性がある。その可能性を十分に実現するためには、数百万の量子ビットを持つ量子プロセッサが必要である。したがって、分散量子コンピューティングが既存の短期量子プロセッサをより強力なリソースに活用できるようにするために、量子ネットワークを確立することが重要である。本稿では,量子リンクが限られている古典量子ネットワークにおいて,量子デバイス間の絡み合いを分散するプロトコルを提案する。提案プロトコルでは,バタフライネットワークにおいてエンタングルメントを効率的に分散するためにエンタングルメントスワッピングを用い,ネットワークボトルネックを克服しながら量子テレポーテーションを実現し,各ノードの量子ビット要求を最小化するために,古典的なネットワーク符号化を適用する。実験の結果,提案プロトコルはネットワークサイズと線形にスケールする量子資源を必要とし,各ノードは固定数の量子ビットしか必要としないことがわかった。最大3つのトランシーバペアの小さなネットワークサイズの場合、提案プロトコルは17%少ないキュービットリソースを使用し、精度を8.8%向上し、35%高速なシミュレーション時間でベンチマークを上回っている。ネットワークサイズが大きいほど、割合が大幅に向上する。また,回転による量子状態エンコードを用いた悪質な絡み合いに対する絡み合い分布を確保するプロトコルを提案する。解析の結果,この手法は通信オーバーヘッドを必要とせず,悪意のあるノードが量子状態を取得する確率を7.2%に低下させることがわかった。得られた結果は、高度にスケーラブルで効率的でセキュアな短期量子インターネットを実現するプロトコルに向けられている。 Quantum information technology has the potential to revolutionize computing, communications, and security. To fully realize its potential, quantum processors with millions of qubits are needed, which is still far from being accomplished. Thus, it is important to establish quantum networks to enable distributed quantum computing to leverage existing and near-term quantum processors into more powerful resources. This paper introduces a protocol to distribute entanglements among quantum devices within classical-quantum networks with limited quantum links, enabling more efficient quantum teleportation in near-term hybrid networks. The proposed protocol uses entanglement swapping to distribute entanglements efficiently in a butterfly network, then classical network coding is applied to enable quantum teleportation while overcoming network bottlenecks and minimizing qubit requirements for individual nodes. 
Experimental results show that the proposed protocol requires quantum resources that scale linearly with network size, with individual nodes only requiring a fixed number of qubits. For small network sizes of up to three transceiver pairs, the proposed protocol outperforms the benchmark by using 17% fewer qubit resources, achieving 8.8% higher accuracy, and with a 35% faster simulation time. The percentage improvement increases significantly for large network sizes. We also propose a protocol for securing entanglement distribution against malicious entanglements using quantum state encoding through rotation. Our analysis shows that this method requires no communication overhead and reduces the chance of a malicious node retrieving a quantum state to 7.2%. The achieved results point toward a protocol that enables a highly scalable, efficient, and secure near-term quantum Internet.	翻訳日:2023-12-12 18:51:46 公開日:2023-12-10
# 量子ネットワークのためのセキュア量子アイデンティティ認証プロトコル Secured Quantum Identity Authentication Protocol for Quantum Networks ( http://arxiv.org/abs/2312.05774v1 ) ライセンス: Link先を確認	Mohamed Shaban and Muhammad Ismail	(参考訳) 量子インターネットは、量子絡み合いと重ね合わせの原理を利用して、非並列レベルのセキュリティと効率的な計算を促進する通信技術の顕著な進歩を示している。量子通信は量子絡み合いを利用して実現することができる。 2つの実体間の絡み合った対の交換によって、量子通信は実現可能となり、量子テレポーテーションのプロセスによって実現される。チャネルの損失の性質と送信光子の指数的デコヒーレンスを考えると、中間ノードの集合は量子リピータとして機能し、2つの遠方のノードを直接絡み合わせることができる。このような量子リピータは悪意があり、2つの通信ノード間で交換された量子情報の秘密性を危うくすることができる。そこで本稿では,量子ネットワークを悪質な絡み合いから保護する量子id認証プロトコルを提案する。既存のプロトコルとは異なり、提案された量子認証プロトコルは共有秘密鍵の定期的な更新を必要としない。シミュレーションの結果,提案プロトコルは,平均4回の認証ラウンドの後に,100%確率で悪質な絡み合いを検出することができた。 Quantum Internet signifies a remarkable advancement in communication technology, harnessing the principles of quantum entanglement and superposition to facilitate unparalleled levels of security and efficient computations. Quantum communication can be achieved through the utilization of quantum entanglement. Through the exchange of entangled pairs between two entities, quantum communication becomes feasible, enabled by the process of quantum teleportation. Given the lossy nature of the channels and the exponential decoherence of the transmitted photons, a set of intermediate nodes can serve as quantum repeaters to perform entanglement swapping and directly entangle two distant nodes. Such quantum repeaters may be malicious and by setting up malicious entanglements, intermediate nodes can jeopardize the confidentiality of the quantum information exchanged between the two communication nodes. Hence, this paper proposes a quantum identity authentication protocol that protects quantum networks from malicious entanglements. Unlike the existing protocols, the proposed quantum authentication protocol does not require periodic refreshments of the shared secret keys. Simulation results demonstrate that the proposed protocol can detect malicious entanglements with a 100% probability after an average of 4 authentication rounds.	
翻訳日:2023-12-12 18:51:17 公開日:2023-12-10
# コードリポジトリのためのコンテキスト対応コード生成フレームワーク:ローカル、グローバル、サードパーティライブラリの認識 Context-Aware Code Generation Framework for Code Repositories: Local, Global, and Third-Party Library Awareness ( http://arxiv.org/abs/2312.05772v1 ) ライセンス: Link先を確認	Dianshu Liao, Shidong Pan, Qing Huang, Xiaoxue Ren, Zhenchang Xing, Huan Jin, Qinying Li	(参考訳) コード生成ツールは、ソフトウェア開発プロセスの開発者を助けるために不可欠です。既存のツールはしばしば作業コンテキスト、すなわちコードリポジトリと切り離され、生成されたコードは人間の開発者と似ていない。本稿では,コードリポジトリ内の情報を利用して,論理エラーやコードの冗長性,ライブラリ関連の互換性問題などの少ないコードを生成するための,新しいコード生成フレームワークである \textbf{$a^3$}-codgenを提案する。本稿では,現在のコードファイルからのローカル認識情報,他のコードファイルからのグローバル認識情報,サードパーティライブラリ情報の3つのカテゴリを識別する。結果は、 \textbf{$a^3$}-codgenフレームワークを採用することで、コードのリポジトリ情報をllmに抽出、融合、フィードし、より正確で効率的で再利用可能なコードを生成することに成功した。我々のフレームワークの有効性は、人間の開発者に比べて高い再利用率のコードを生成することでさらに強調されている。この研究はコード生成の分野に大きく貢献し、開発者が実際にソフトウェア開発の進化する要求に対処するためのより強力なツールを提供する。 Code generation tools are essential to help developers in the software development process. Existing tools often disconnect with the working context, i.e., the code repository, causing the generated code to be not similar to human developers. In this paper, we propose a novel code generation framework, dubbed \textbf{$A^3$}-CodGen, to harness information within the code repository to generate code with fewer logical errors, code redundancy, and library-related compatibility issues. We identify three categories of representative information for the code repository: local-aware information from current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the \textbf{$A^3$}-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate, compared to human developers. 
This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands in software development in practice.	翻訳日:2023-12-12 18:50:56 公開日:2023-12-10
# メタラーニングにおけるタスク共同創設者のハッキング Hacking Task Confounder in Meta-Learning ( http://arxiv.org/abs/2312.05771v1 ) ライセンス: Link先を確認	Jingyao Wang, Wenwen Qiang, Yi Ren, Zeen Song, Xingzhe Su, Changwen Zheng	(参考訳) メタ学習は、様々なタスクからメタ知識を学習することで、新しいタスクへの迅速な一般化を可能にする。モデルが1つのトレーニングバッチで学習するタスクが多ければ多いほど、より豊かな知識が得られ、より一般化のパフォーマンスが向上すると直感的に仮定される。しかし、この直感に反して、我々の実験は予期せぬ結果を示した: 1つのバッチにより多くのタスクを追加することは、実際に一般化性能を低下させる。この予期せぬ現象を説明するために,構造因果モデル(scm)を用いて因果分析を行う。本研究は,メタラーニングにおけるタスク固有の因果要因とラベルの相関関係を明らかにする。さらに、結合因子は異なるバッチ間で異なる。これらの要因を`Task Confounders'と呼んでいる。この知見に基づいて,タスク共同創設者の排除を目的としたメタ学習因果表現学習システム(MetaCRL)を提案する。複数のタスクから分離された因果因子をエンコードし、メタラーニングの因果性を保証するために不変ベースのバイレベル最適化機構を利用する。様々なベンチマークデータセットに対する大規模な実験により、我々の研究がSOTA(State-of-the-art)のパフォーマンスを達成することを示す。 Meta-learning enables rapid generalization to new tasks by learning meta-knowledge from a variety of tasks. It is intuitively assumed that the more tasks a model learns in one training batch, the richer knowledge it acquires, leading to better generalization performance. However, contrary to this intuition, our experiments reveal an unexpected result: adding more tasks within a single batch actually degrades the generalization performance. To explain this unexpected phenomenon, we conduct a Structural Causal Model (SCM) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as ``Task Confounders". Based on this insight, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled causal factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.	
翻訳日:2023-12-12 18:50:36 公開日:2023-12-10
# ネットワークおよび真のネットワーク量子ステアリングの検出 Detection of Network and Genuine Network Quantum Steering ( http://arxiv.org/abs/2312.05769v1 ) ライセンス: Link先を確認	Zhihua Chen, Kai Wu, Shao-Ming Fei	(参考訳) 量子ネットワーク相関は、長距離量子通信、量子暗号、分散量子コンピューティングにおいて重要な役割を果たす。一般に、非局所性、絡み合い、ステアリングなどの多部量子ネットワークの相関を特徴付けることは極めて困難である。本稿では,スターネットワーク構成の確率の観点から,ネットワークと真のネットワーク量子ステアリングモデルを提案する。線形および非線形の不等式が導出され、中央のパーティが1つの固定された測定を行うときに、ネットワークと真のネットワーク量子ステアリングを検出する。提案手法は,n局所量子ネットワークの破れよりも多くの量子ネットワークステアリングを検出できることを示す。さらに,biseparable assemblagesはスターネットワーク構成において真のネットワークステアリングを示すことができることを示した。 The quantum network correlations play significant roles in long distance quantum communication, quantum cryptography and distributed quantum computing. Generally, it is very difficult to characterize the multipartite quantum network correlations such as nonlocality, entanglement and steering. In this paper, we propose the network and the genuine network quantum steering models from the aspect of probabilities in the star network configurations. Linear and nonlinear inequalities are derived to detect the network and genuine network quantum steering when the central party performs one fixed measurement. We show that our criteria can detect more quantum network steering than that from the violation of the n-locality quantum networks. Moreover, it is shown that biseparable assemblages can demonstrate genuine network steering in the star network configurations.	翻訳日:2023-12-12 18:50:17 公開日:2023-12-10
# Anomaly Diffusion:拡散モデルを用いたFew-Shot Anomaly Image Generation AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model ( http://arxiv.org/abs/2312.05767v1 ) ライセンス: Link先を確認	Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, Chengjie Wang	(参考訳) 異常検査は工業生産において重要な役割を果たす。既存の異常検査手法は、異常データ不足のため性能に制限がある。異常発生法は異常データを強化するために提案されているが、生成した異常とマスクの間の不正確さや不正確さに苦しむかのどちらかである。そこで本研究では, 大規模データセットから学習した潜在拡散モデルの強い先行情報を利用して, マイノリティトレーニングデータに基づく生成信頼性を向上させる, 新たな拡散型少数ショット異常生成モデルであるanomalydiffusionを提案する。まず、学習可能な異常埋め込みと、異常マスクから符号化された空間埋め込みからなり、異常情報を異常な外観と位置情報に切り離す空間異常埋め込みを提案する。さらに, 生成した異常と異常マスクとの整合性を改善するために, 適応的注意再重み付け機構を導入する。生成した異常画像と正常なサンプルとの差に基づいて、モデルを動的に誘導し、あまり目立たない生成異常の領域に焦点を合わせることにより、正確に一致した異常画像・マスク対を生成することができる。広範な実験により,本モデルが実効性と多様性において最先端手法を著しく上回り,下流異常検査タスクの性能を効果的に向上することを示した。コードとデータはhttps://github.com/sjtuplayer/anomalydiffusionで入手できる。 Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above problems, we propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model, which utilizes the strong prior information of latent diffusion model learned from large-scale dataset to enhance the generation authenticity under few-shot training data. Firstly, we propose Spatial Anomaly Embedding, which consists of a learnable anomaly embedding and a spatial embedding encoded from an anomaly mask, disentangling the anomaly information into anomaly appearance and location information. Moreover, to improve the alignment between the generated anomalies and the anomaly masks, we introduce a novel Adaptive Attention Re-weighting Mechanism. 
Based on the disparities between the generated anomaly image and normal sample, it dynamically guides the model to focus more on the areas with less noticeable generated anomalies, enabling generation of accurately-matched anomalous image-mask pairs. Extensive experiments demonstrate that our model significantly outperforms the state-of-the-art methods in generation authenticity and diversity, and effectively improves the performance of downstream anomaly inspection tasks. The code and data are available in https://github.com/sjtuplayer/anomalydiffusion.	翻訳日:2023-12-12 18:50:04 公開日:2023-12-10
# 階層的推論による多元的法的判断予測 Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning ( http://arxiv.org/abs/2312.05762v1 ) ライセンス: Link先を確認	Yougang Lyu, Jitai Hao, Zihan Wang, Kai Zhao, Shen Gao, Pengjie Ren, Zhumin Chen, Fang Wang, Zhaochun Ren	(参考訳) 刑事事実記述における複数の被告は一般に複雑な相互作用を示しており、単一の被告に対する判決結果(例えば、法律記事、訴追、罰則)の予測に焦点を当てた既存の法的判断予測(ljp)手法ではうまく扱えない。この問題に対処するために,マルチディペンダント LJP の課題を提案し,マルチディペンダント事件の各被告に対する判断結果を自動予測することを目的とした。マルチディペンダント LJP の課題は,(1) 各被告の識別不能な判断結果, (2) 訓練と評価のための実世界のデータセットの欠如である。第1の課題に取り組むために,多元的判断プロセスを階層的推論連鎖として定式化し,階層的推論連鎖に従う階層的推論ネットワーク(hrn)と呼ばれる多元的ljp法を導入する。第2の課題に取り組むために,現実のマルチディペンダント LJP データセット,すなわち MultiLJP を収集し,今後の研究を加速する。 MultiLJPの大規模実験により提案したHRNの有効性が検証された。 Multiple defendants in a criminal fact description generally exhibit complex interactions, and cannot be well handled by existing Legal Judgment Prediction (LJP) methods which focus on predicting judgment results (e.g., law articles, charges, and terms of penalty) for single-defendant cases. To address this problem, we propose the task of multi-defendant LJP, which aims to automatically predict the judgment results for each defendant of multi-defendant cases. Two challenges arise with the task of multi-defendant LJP: (1) indistinguishable judgment results among various defendants; and (2) the lack of a real-world dataset for training and evaluation. To tackle the first challenge, we formalize the multi-defendant judgment process as hierarchical reasoning chains and introduce a multi-defendant LJP method, named Hierarchical Reasoning Network (HRN), which follows the hierarchical reasoning chains to determine criminal relationships, sentencing circumstances, law articles, charges, and terms of penalty for each defendant. To tackle the second challenge, we collect a real-world multi-defendant LJP dataset, namely MultiLJP, to accelerate the relevant research in the future. Extensive experiments on MultiLJP verify the effectiveness of our proposed HRN.	翻訳日:2023-12-12 18:49:38 公開日:2023-12-10
# qmgeo:混合切断幾何分布を用いた確率量子化による微分プライベートフェデレート学習 QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution ( http://arxiv.org/abs/2312.05761v1 ) ライセンス: Link先を確認	Zixi Wang and M. Cenk Gursoy	(参考訳) フェデレートラーニング(FL)は、複数のユーザがパラメータサーバの調整の下でのみモデル更新を送信し、データセットをローカルに保つことで、グローバル機械学習(ML)モデルを共同でトレーニングすることを可能にするフレームワークである。このような分散フレームワークの重要な動機の1つは、ユーザにプライバシ保証を提供することである。しかし、ユーザのデータセットをローカルに保存することは、プライバシには不十分であることが示されている。フレームワークにランダム性を導入することで、証明可能なプライバシー保証を提供するために、いくつかの差分プライバシー(DP)機構が提案されている。 FLフレームワークは、特に機械学習モデルが複雑さとサイズを増すにつれて、通信効率の課題にも直面する。量子化は一般的に利用される手法であり、基礎となる情報の圧縮表現を伝送することで通信コストを削減する。 FLにおけるDPと量子化の研究はいくつかあるが、プライバシ保証の提供における量子化手法の潜在的貢献は、まだ広く分析されていない。本稿では,混合幾何分布を用いて,付加雑音を伴わずにdpの提供に必要なランダム性を導入する新しい確率的量子化法を提案する。我々は,フレームワークの収束解析を行い,その性能を実証研究する。 Federated learning (FL) is a framework which allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while being able to keep their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, preserving the users' datasets locally is shown to be not sufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly utilized method, reducing the communication cost by transmitting compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone in providing privacy guarantees has not been extensively analyzed yet. 
In this paper, we present a novel stochastic quantization method that uses a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide a convergence analysis for our framework and empirically study its performance.	翻訳日:2023-12-12 18:49:12 公開日:2023-12-10
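The QMGeo abstract above relies on the idea that quantization itself can be randomized. As a rough illustration of that general idea only, the following Python sketch shows plain unbiased stochastic rounding to a uniform grid. It is not the QMGeo mechanism: it does not use the mixed truncated geometric distribution described in the abstract and it provides no differential privacy guarantee by itself. The function name `stochastic_quantize` and its parameters are hypothetical, chosen for this example.

```python
import math
import random


def stochastic_quantize(x, step, rng=random):
    """Round x to a multiple of `step`, choosing the upper or lower grid
    point at random so that the expected output equals x (unbiased).

    Illustrative only: generic stochastic rounding, not the QMGeo method.
    """
    scaled = x / step
    lower = math.floor(scaled)      # index of the grid point at or below x
    frac = scaled - lower           # position of x between the two grid points, in [0, 1)
    # Round up with probability `frac`, otherwise round down.
    level = lower + (1 if rng.random() < frac else 0)
    return level * step


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [stochastic_quantize(0.3, 1.0, rng) for _ in range(10)]
    print(draws)  # a mix of 0.0 and 1.0 values
```

Because each value is rounded up with probability equal to its fractional position on the grid, the average of many quantized copies approaches the original value, which is the property that lets a server aggregate quantized model updates without systematic bias.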
# RepViT-SAM: リアルタイムセグメンテーションを目指す RepViT-SAM: Towards Real-Time Segmenting Anything ( http://arxiv.org/abs/2312.05760v1 ) ライセンス: Link先を確認	Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding	(参考訳) segment anything model (sam) は様々なコンピュータビジョンタスクにおいて印象的なゼロショット転送性能を示している。しかし、その計算コストは実用的用途にはまだ支障をきたしている。 MobileSAM は蒸留を用いて SAM の重い画像エンコーダを TinyViT に置き換えることを提案する。しかしながら、リソース制限されたモバイルデバイスへのデプロイメントは、自己保持機構によるメモリと計算オーバーヘッドの大幅な増加により、依然として課題に直面している。近年、RepViTはモバイルデバイス上での最先端のパフォーマンスとレイテンシのトレードオフを実現し、ViTの効率的なアーキテクチャ設計をCNNに組み込むことで実現している。そこで,モバイルSAMを追従して,モバイルデバイス上でのリアルタイムセグメンテーションを実現するため,SAMのヘビー級画像エンコーダをRepViTモデルに置き換え,最終的にRepViT-SAMモデルに置き換える。大規模な実験によると、RepViT-SAMはMobileSAMよりもはるかに優れたゼロショット転送能力を持ち、推論速度は10ドル近い。コードとモデルは \url{https://github.com/thu-mig/repvit} で利用可能である。 Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks recently. However, its heavy computation costs remain daunting for practical applications. MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation, which results in a significant reduction in computational requirements. However, its deployment on resource-constrained mobile devices still encounters challenges due to the substantial memory and computational overhead caused by self-attention mechanisms. Recently, RepViT achieves the state-of-the-art performance and latency trade-off on mobile devices by incorporating efficient architectural designs of ViTs into CNNs. Here, to achieve real-time segmenting anything on mobile devices, following MobileSAM, we replace the heavyweight image encoder in SAM with RepViT model, ending up with the RepViT-SAM model. Extensive experiments show that RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed. The code and models are available at \url{https://github.com/THU-MIG/RepViT}.	翻訳日:2023-12-12 18:48:50 公開日:2023-12-10
# Beyond One Model Fits All: Ensemble Deep Learning for Autonomous Vehicles ( http://arxiv.org/abs/2312.05759v1 ) License: see link	Hemanth Manjunatha and Panagiotis Tsiotras	Deep learning has revolutionized autonomous driving by enabling vehicles to perceive and interpret their surroundings with remarkable accuracy. This progress is attributed to various deep learning models, including Mediated Perception, Behavior Reflex, and Direct Perception, each offering unique advantages and challenges in enhancing autonomous driving capabilities. However, there is a gap in research on integrating these approaches and understanding their relevance in diverse driving scenarios. This study introduces three distinct neural network models corresponding to the Mediated Perception, Behavior Reflex, and Direct Perception approaches. We explore their significance across varying driving conditions, shedding light on the strengths and limitations of each approach. Our architecture fuses information from the base, future latent vector prediction, and auxiliary task networks, using global routing commands to select appropriate action sub-networks. We aim to provide insights into effectively utilizing diverse modeling strategies in autonomous driving by conducting experiments and evaluations. The results show that the ensemble model performs better than the individual approaches, suggesting that each modality contributes uniquely toward the performance of the overall model. Moreover, by exploring the significance of each modality, this study offers a roadmap for future research in autonomous driving, emphasizing the importance of leveraging multiple models to achieve robust performance.	Translated: 2023-12-12 18:48:31 Published: 2023-12-10
# CLeaRForecast: Contrastive Learning of High-Purity Representations for Time Series Forecasting ( http://arxiv.org/abs/2312.05758v1 ) License: see link	Jiaxin Gao, Yuxiao Hu, Qinglong Cao, Siqi Dai, Yuntian Chen	Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet these methods disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccuracies and seriously degrading forecasting performance. To address this issue, we propose CLeaRForecast, a novel contrastive learning framework that learns high-purity time series representations with the proposed sample, feature, and architecture purifying methods. More specifically, to avoid adding further noise through transformations of the original samples (series), transformations are applied separately to the trend and periodic parts, providing better positive samples with markedly less noise. Moreover, we introduce a channel-independent training scheme to mitigate noise originating from unrelated variables in the multivariate series. By employing a streamlined deep-learning backbone and a comprehensive global contrastive loss function, we prevent noise introduction due to redundant or uneven learning of periodicity and trend. Experimental results show the superior performance of CLeaRForecast in various downstream TSF tasks.	Translated: 2023-12-12 18:48:09 Published: 2023-12-10
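The CLeaRForecast abstract describes applying transformations separately to the trend and periodic parts of a series. A minimal sketch of that idea, assuming a centered moving average as the trend extractor and a simple scaling of the trend as the transformation (both are illustrative choices, not necessarily the ones used in the paper):

```python
def moving_average(series, window):
    """Centered moving average with edge padding (window should be odd)."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=5):
    """Split a series into a smooth trend and the remaining periodic part."""
    trend = moving_average(series, window)
    periodic = [x - t for x, t in zip(series, trend)]
    return trend, periodic


def augment_trend_only(series, scale=1.1, window=5):
    """Transform only the trend part, leaving the periodic part untouched."""
    trend, periodic = decompose(series, window)
    return [scale * t + p for t, p in zip(trend, periodic)]


# A toy series: a period-4 pattern riding on a linear trend.
series = [float(i % 4) + 0.5 * i for i in range(20)]
trend, periodic = decompose(series)
recombined = [t + p for t, p in zip(trend, periodic)]
positive_sample = augment_trend_only(series)
```

Because the two parts sum back to the original series, a transformation applied to one part cannot inject noise into the other, which is the intuition behind the purified positive samples.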
# Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph ( http://arxiv.org/abs/2312.05757v1 ) License: see link	Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu	Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or the underlying causal relationships of the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema, and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.	Translated: 2023-12-12 18:47:49 Published: 2023-12-10
# A quantitative fusion strategy of stock picking and timing based on Particle Swarm Optimized-Back Propagation Neural Network and Multivariate Gaussian-Hidden Markov Model ( http://arxiv.org/abs/2312.05756v1 ) License: see link	Huajian Li, Longjian Li, Jiajian Liang, Weinan Dai	In recent years, machine learning (ML) has brought effective approaches and novel techniques to economic decision-making, investment forecasting, and risk management, coping with the variable and intricate nature of economic and financial environments. For stock market investment, this research introduces a pioneering quantitative fusion model that combines stock timing and picking strategies by leveraging the Multivariate Gaussian-Hidden Markov Model (MGHMM) and a Back Propagation Neural Network optimized by Particle Swarm (PSO-BPNN). After the information coefficients (IC) between fifty-two factors (winsorized, neutralized, and standardized) and the return of the CSI 300 index are calculated, a given number of top-ranked factors are chosen as candidate factors and, after dimension reduction by Principal Component Analysis (PCA), fed into the PSO-BPNN, which outputs a certain number of constituent stocks. Subsequently, we conduct prediction and trading on the basis of the screened stocks and the stock market state output by the MGHMM, which is trained on CSI 300 index data after Box-Cox transformation, showing excellent performance over the past four years. Finally, some conventional forecasting and trading methods are compared with our strategy in the Chinese stock market. The fusion strategy incorporating stock picking and timing presented in this article provides an innovative technique for financial analysis.	Translated: 2023-12-12 18:47:27 Published: 2023-12-10
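The factor-screening step in the abstract above (winsorize, standardize, then rank factors by information coefficient) can be sketched as follows. This is an illustration under stated assumptions: quantile clipping for winsorization, z-scores for standardization, and the Pearson correlation as the IC. The abstract does not specify these details, and the neutralization step is omitted here.

```python
import statistics


def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clip values to the empirical quantile range [lower_q, upper_q]."""
    ordered = sorted(values)
    lo = ordered[int(lower_q * (len(ordered) - 1))]
    hi = ordered[int(upper_q * (len(ordered) - 1))]
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Z-score using the population standard deviation (must be non-zero)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def information_coefficient(factor, returns):
    """Pearson correlation, computed as the mean product of z-scores."""
    f = standardize(factor)
    r = standardize(returns)
    return sum(a * b for a, b in zip(f, r)) / len(f)


# Hypothetical factor exposures and subsequent returns.
factor = [1, 2, 3, 4, 5]
returns = [2, 4, 6, 8, 10]
ic = information_coefficient(factor, returns)
clipped = winsorize([1, 2, 3, 4, 100], lower_q=0.0, upper_q=0.75)
```

Factors would then be ranked by the absolute value of their IC, and the top-ranked ones passed on to dimension reduction.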
# Towards Global, Socio-Economic, and Culturally Aware Recommender Systems ( http://arxiv.org/abs/2312.05805v1 ) License: see link	Kelley Ann Yohe	Recommender systems have gained increasing attention as a way to personalise consumer preferences. While these systems have primarily focused on applications such as advertisement recommendations (e.g., Google), personalized suggestions (e.g., Netflix and Spotify), and retail selection (e.g., Amazon), there is potential for these systems to benefit from a more global, socio-economic, and culturally aware approach, particularly as companies seek to expand into diverse markets. This paper aims to investigate the potential of a recommender system that considers cultural identity and socio-economic factors. We review the most recent developments in recommender systems and explore the impact of cultural identity and socio-economic factors on consumer preferences. We then propose an ontology and approach for incorporating these factors into recommender systems. To illustrate the potential of our approach, we present a scenario in consumer subscription plan selection within the entertainment industry. We argue that existing recommender systems have limited ability to precisely understand user preferences due to a lack of awareness of socio-economic factors and cultural identity. They also fail to update recommendations in response to changing socio-economic conditions. We explore various machine learning models and develop a final artificial neural network (ANN) model that addresses this gap. We evaluate the effectiveness of socio-economic and culturally aware recommender systems across four dimensions: Precision, Accuracy, F1, and Recall. We find that a highly tuned ANN model incorporating domain-specific data, select cultural indices, and relevant socio-economic factors predicts user preference in subscriptions with an accuracy of 95%, a precision of 94%, an F1 score of 92%, and a recall of 90%.	Translated: 2023-12-12 18:42:50 Published: 2023-12-10
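The four evaluation metrics named in the abstract above are related: F1 is the harmonic mean of precision and recall. The short sketch below computes the metrics from confusion-matrix counts (the counts are made up for illustration) and checks that the reported precision of 94% and recall of 90% imply an F1 score of about 92%, consistent with the reported figure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_score(precision, recall),
    }


# Values reported in the abstract.
implied_f1 = f1_score(0.94, 0.90)

# Hypothetical counts, only to show how the four metrics are derived.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```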
# HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model ( http://arxiv.org/abs/2312.05804v1 ) License: see link	Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Yebin Liu, Kun Li	The generation of 3D clothed humans has attracted increasing attention in recent years. However, existing work cannot generate layered high-quality 3D humans with consistent body structures. As a result, these methods are unable to arbitrarily and separately change and edit the body and clothing of the human. In this paper, we propose a text-driven layered 3D human generation framework based on a novel physically-decoupled semantic-aware diffusion model. To keep the generated clothing consistent with the target text, we propose a semantic-confidence strategy for clothing that can eliminate the non-clothing content generated by the model. To match the clothing with different body shapes, we propose an SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. In addition, we introduce uniform shape priors based on the SMPL model for body and clothing, respectively, which generate more diverse 3D content without being constrained by specific templates. The experimental results demonstrate that the proposed method not only generates 3D humans with consistent body structures but also allows free editing in a layered manner. The source code will be made public.	Translated: 2023-12-12 18:42:20 Published: 2023-12-10
# Transformer-based Selective Super-Resolution for Efficient Image Refinement ( http://arxiv.org/abs/2312.05803v1 ) License: see link	Tianyi Zhang, Kishore Kasichainula, Yaoxin Zhuo, Baoxin Li, Jae-sun Seo, Yu Cao	Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlapping tiles, selects tiles of interest at various scales with a pyramid architecture, and exclusively reconstructs these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to the state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with a 40% reduction in computation cost for the BDD100K dataset. The source code is available at https://github.com/destiny301/SSR.	Translated: 2023-12-12 18:42:02 Published: 2023-12-10
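The first two steps of SSR described above, partitioning into non-overlapping tiles and selecting tiles of interest, can be illustrated with a small sketch. The partitioning follows the abstract directly. The selection rule is only a stand-in: the paper selects tiles with a learned pyramid architecture, whereas this sketch assumes a hypothetical pixel-variance threshold.

```python
def partition_into_tiles(image, tile_size):
    """Split a 2D list into non-overlapping square tiles keyed by (row, col)."""
    height, width = len(image), len(image[0])
    if height % tile_size or width % tile_size:
        raise ValueError("image dimensions must be multiples of tile_size")
    tiles = {}
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            key = (top // tile_size, left // tile_size)
            tiles[key] = [
                row[left:left + tile_size]
                for row in image[top:top + tile_size]
            ]
    return tiles


def tile_variance(tile):
    """Population variance of the pixel values in one tile."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def select_tiles(tiles, threshold):
    """Placeholder selector: keep tiles whose variance exceeds a threshold."""
    return sorted(
        key for key, tile in tiles.items() if tile_variance(tile) > threshold
    )


image = [[0] * 4 for _ in range(4)]
image[0][0] = 9  # texture only in the top-left tile
tiles = partition_into_tiles(image, tile_size=2)
selected = select_tiles(tiles, threshold=1.0)
```

Only the selected tiles would then be passed to the expensive reconstruction stage, which is where the computational saving comes from.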
# SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-Resolution ( http://arxiv.org/abs/2312.05799v1 ) License: see link	Zhengxu Wang and Zhiqiang Yan and Jian Yang	Depth super-resolution (DSR) aims to restore high-resolution (HR) depth from its low-resolution (LR) counterpart, where an RGB image is often used to facilitate the task. Recent image-guided DSR approaches mainly focus on the spatial domain to rebuild depth structure. However, since the structure of LR depth is usually blurry, considering only the spatial domain is not sufficient to acquire satisfactory results. In this paper, we propose the structure guided network (SGNet), a method that pays more attention to the gradient and frequency domains, both of which have the inherent ability to capture high-frequency structure. Specifically, we first introduce the gradient calibration module (GCM), which employs the accurate gradient prior of RGB to sharpen the LR depth structure. Then we present the Frequency Awareness Module (FAM), which recursively applies multiple spectrum differencing blocks (SDB), each of which propagates the precise high-frequency components of RGB into the LR depth. Extensive experimental results on both real and synthetic datasets demonstrate the superiority of our SGNet, reaching the state of the art. Code and pre-trained models are available at https://github.com/yanzq95/SGNet.	Translated: 2023-12-12 18:41:45 Published: 2023-12-10
# Disentangled Representation Learning for Controllable Person Image Generation ( http://arxiv.org/abs/2312.05798v1 ) License: see link	Wenju Xu, Chengjiang Long, Yongwei Nie, Guanghui Wang	In this paper, we propose a novel framework named DRL-CPG to learn disentangled latent representations for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g., pose, head, upper clothes, and pants) provided by various source persons. Unlike existing works that leverage semantic masks to obtain the representation of each component, we propose to generate disentangled latent codes via a novel attribute encoder with transformers, trained by curriculum learning from a relatively easy step to gradually harder ones. A random component mask-agnostic strategy is introduced to randomly remove component masks from the person segmentation masks, which aims at increasing the difficulty of training and encouraging the transformer encoder to recognize the underlying boundaries between components. This enables the model to transfer both the shape and texture of the components. Furthermore, we propose a novel attribute decoder network to integrate multi-level attributes (e.g., the structure feature and the attribute representation) with well-designed Dual Adaptive Denormalization (DAD) residual blocks. Extensive experiments demonstrate that the proposed approach is able to transfer both the texture and shape of different human parts and yield realistic results. To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.	Translated: 2023-12-12 18:41:24 Published: 2023-12-10
# Multimodality in Online Education: A Comparative Study ( http://arxiv.org/abs/2312.05797v1 ) License: see link	Praneeta Immadisetty, Pooja Rajesh, Akshita Gupta, Anala M R, Soumya A, K. N. Subramanya	The start of the decade brought with it a grave pandemic and, in response, a shift of education forums predominantly into the online world. With a surge in the usage of online video conferencing platforms and tools to better gauge student understanding, there needs to be a mechanism to assess whether instructors can grasp the extent to which students understand the subject and their response to the educational stimuli. Current systems consider only a single cue and lack a focus on the educational domain. Thus, there is a need to measure an all-encompassing, holistic overview of the students' reaction to the subject matter. This paper highlights the need for a multimodal approach to affect recognition and its deployment in the online classroom, considering four cues: posture and gesture, facial, eye tracking, and verbal recognition. It compares the various machine learning models available for each cue and provides the most suitable approach given the available dataset and parameters of classroom footage. A multimodal approach derived from weighted majority voting is proposed by combining the most fitting models from this analysis of individual cues, based on accuracy, ease of procuring a data corpus, sensitivity, and any major drawbacks.	Translated: 2023-12-12 18:40:58 Published: 2023-12-10
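Weighted majority voting over the four cues named in the abstract above can be sketched in a few lines. The labels and weights below are hypothetical; in the paper's setting the weights would presumably reflect how well each cue's model performs.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting cues have the largest total weight."""
    totals = defaultdict(float)
    for cue, label in predictions.items():
        totals[label] += weights[cue]
    return max(totals, key=totals.get)


# Hypothetical per-cue predictions for one student at one moment.
predictions = {
    "posture": "engaged",
    "facial": "confused",
    "eye_tracking": "engaged",
    "verbal": "confused",
}
# Hypothetical per-cue weights.
weights = {
    "posture": 0.2,
    "facial": 0.4,
    "eye_tracking": 0.1,
    "verbal": 0.2,
}
decision = weighted_majority_vote(predictions, weights)
```

Here two cues vote for each label, but the higher-weighted facial and verbal cues decide the outcome, which is the difference from an unweighted majority vote.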
# Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup ( http://arxiv.org/abs/2312.05795v1 ) License: see link	Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao	The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay's real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to green AI initiatives. We will publicly release our code and the MAAD dataset after some reviews (https://github.com/MorinW/AntGMM_Pruning).	Translated: 2023-12-12 18:40:36 Published: 2023-12-10
# Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear Gaussians ( http://arxiv.org/abs/2312.05794v1 ) License: see link	Muhammad Abdullah Naeem, Miroslav Pajic	The performance of the ordinary least squares (OLS) method for the \emph{estimation of a high-dimensional stable state transition matrix} $A$ (i.e., spectral radius $\rho(A)<1$) from a single noisy observed trajectory of the linear time-invariant (LTI) system (called linear Gaussian (LG) in the Markov chain literature) $X_{-}:(x_0,x_1,\ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \quad \text{where } w_{t} \sim N(0,I_{n}), \end{equation} relies heavily on negative moments of the sample covariance matrix $X_{-}X_{-}^{*}=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and on the singular values of $EX_{-}^{*}$, where $E=[w_0,\ldots,w_{N-1}]$ is a rectangular Gaussian ensemble. Negative moments require sharp estimates on all the eigenvalues $\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq \lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Leveraging recent results on the spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along with the concentration of measure phenomenon and perturbation theory (Gershgorin's theorem and Cauchy's interlacing theorem), we show that only when $A=A^{*}$ is the typical order $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N}, N+n\sqrt{N}\big]$ for all $j \in [n]$. However, in \emph{high dimensions}, when $A$ has only one distinct eigenvalue $\lambda$ with geometric multiplicity one, then as soon as the eigenvalue leaves the \emph{complex half unit disc}, the largest eigenvalue suffers from the curse of dimensionality: $\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big( \lfloor\frac{N}{n}\rfloor e^{\alpha_{\lambda}n} \big)$, while the smallest eigenvalue satisfies $\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, the OLS estimator incurs a \emph{phase transition} and becomes \emph{transient: increasing iteration only worsens estimation error}, all of this happening when the dynamics are generated from stable systems.	Translated: 2023-12-12 18:40:09 Published: 2023-12-10
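To make the estimation problem above concrete, here is a minimal sketch of OLS for the scalar case $n=1$ of $x_{t+1}=Ax_{t}+w_{t}$, where the estimator reduces to $\hat{a}=\sum_t x_{t+1}x_t / \sum_t x_t^2$. The denominator is the one-dimensional sample covariance $\sum_{i=0}^{N-1}x_i^2$. The initial state $x_0=0$, the value $a=0.5$, and the trajectory length are assumptions for illustration; the paper's high-dimensional phenomena do not appear in this well-behaved scalar setting.

```python
import random


def simulate_scalar_lti(a, steps, seed=0):
    """Simulate x_{t+1} = a * x_t + w_t with w_t ~ N(0, 1) and x_0 = 0."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def ols_estimate(xs):
    """Scalar OLS estimate of a from a single trajectory."""
    numerator = sum(x_next * x for x, x_next in zip(xs[:-1], xs[1:]))
    denominator = sum(x * x for x in xs[:-1])  # scalar sample covariance
    return numerator / denominator


trajectory = simulate_scalar_lti(a=0.5, steps=20000)
a_hat = ols_estimate(trajectory)
```

In the matrix case the same estimator reads $\hat{A}=\big(\sum_t x_{t+1}x_t^{*}\big)\big(\sum_t x_t x_t^{*}\big)^{-1}$, which is why the inverse (negative moments) of the sample covariance matrix governs the estimation error.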
# Statistical Spatially Inhomogeneous Diffusion Inference ( http://arxiv.org/abs/2312.05793v1 ) License: see link	Yinuo Ren, Yiping Lu, Lexing Ying, Grant M. Rotskoff	Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+\Sigma(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t,$$ we propose neural network-based estimators of both the drift $\boldsymbol{b}$ and the spatially-inhomogeneous diffusion tensor $D = \Sigma\Sigma^{T}$ and provide statistical convergence guarantees when $\boldsymbol{b}$ and $D$ are $s$-Hölder continuous. Notably, our bound aligns with the minimax optimal rate $N^{-\frac{2s}{2s+d}}$ for nonparametric function estimation even in the presence of correlation within observational data, which necessitates careful handling when establishing fast-rate generalization bounds. Our theoretical results are bolstered by numerical experiments demonstrating accurate inference of spatially-inhomogeneous diffusion tensors.	Translated: 2023-12-12 18:38:59 Published: 2023-12-10
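The inference problem above can be illustrated in one dimension. The sketch below simulates $\mathrm{d}x_t=b(x_t)\mathrm{d}t+\Sigma(x_t)\mathrm{d}w_t$ with the Euler-Maruyama scheme and then recovers $D=\Sigma^2$ near a point by averaging squared increments divided by the time step. The drift, the diffusion function, and the local-averaging estimator are all assumptions chosen for illustration; the paper uses neural network-based estimators instead.

```python
import math
import random


def drift(x):
    """Hypothetical drift b(x) = -x (pulls the process toward zero)."""
    return -x


def sigma(x):
    """Hypothetical state-dependent noise scale; D(x) = 0.5 + 0.25*cos(x)."""
    return math.sqrt(0.5 + 0.25 * math.cos(x))


def euler_maruyama(x0, dt, steps, seed=0):
    """Simulate one discretely observed path of the SDE."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(x + drift(x) * dt + sigma(x) * dw)
    return path


def local_diffusion_estimate(path, dt, center, half_width):
    """Average (dx)^2 / dt over steps that start within the window."""
    ratios = [
        (x_next - x) ** 2 / dt
        for x, x_next in zip(path[:-1], path[1:])
        if abs(x - center) < half_width
    ]
    return sum(ratios) / len(ratios)


dt = 0.01
path = euler_maruyama(x0=0.0, dt=dt, steps=200_000)
d_hat = local_diffusion_estimate(path, dt, center=0.0, half_width=0.1)
```

With these choices the true value is $D(0)=0.75$, and the local estimate should land close to it. Repeating the estimate at other centers traces out the spatial inhomogeneity of $D$.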
# Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting Transformer ( http://arxiv.org/abs/2312.05792v1 ) License: see link	Li Shen, Yuning Wei, Yangzhu Wang, Hongguang Li	With the development of Internet of Things (IoT) systems, precise long-term forecasting methods are requisite for decision makers to evaluate current statuses and formulate future policies. Currently, Transformer and MLP are the two paradigms for deep time-series forecasting, and the former is more prevalent by virtue of its attention mechanism and encoder-decoder architecture. However, data scientists seem more willing to dive into research on the encoder, leaving the decoder neglected. Some researchers even adopt linear projections in lieu of the decoder to reduce complexity. We argue that both extracting the features of the input sequence and seeking the relations between the input and prediction sequences, which are the respective functions of the encoder and decoder, are of paramount significance. Motivated by the success of FPN in the CV field, we propose FPPformer, which uses bottom-up and top-down architectures in the encoder and decoder, respectively, to build a full and rational hierarchy. Cutting-edge patch-wise attention is exploited and further developed in this work by combining it with revamped element-wise attention, in a form that differs between the encoder and the decoder. Extensive experiments with six state-of-the-art baselines on twelve benchmarks verify the promising performance of FPPformer and the importance of elaborately devising the decoder in time-series forecasting Transformers. The source code is available at https://github.com/OrigamiSL/FPPformer.	Translated: 2023-12-12 18:38:12 Published: 2023-12-10
# SimPSI: 時系列データ拡張におけるスペクトル情報保存のための簡易戦略 SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation ( http://arxiv.org/abs/2312.05790v1 ) ライセンス: Link先を確認	Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo	(参考訳) データ拡張は、データサイズによる制限を克服するために、ニューラルネットワークをトレーニングする上で重要な要素であり、時系列のためにいくつかの技術が研究されている。これらのテクニックは特定のタスクで有効であるが、時系列ベンチマークにはまだ一般化されていない。現在のデータ拡張技術は、周波数領域に含まれるコア情報を台無しにする。そこで本研究では,時系列データ拡張におけるスペクトル情報(SimPSI)の保存方法を提案する。 SimPSIは、各周波数の重要度を示す保存マップによって重み付けされた元の入力スペクトルと拡張入力スペクトルを混合することによりスペクトル情報を保存する。特に、我々の実験的な貢献は、マグニチュードスペクトル、サリエンシーマップ、スペクトル保存マップの3つの異なる保存マップを構築することである。我々は,SimPSIを様々な時系列データ拡張に適用し,その効果を広範囲の時系列ベンチマークで評価する。実験結果から,SimPSIはコアスペクトル情報を保存することで時系列データ拡張の性能を大幅に向上することがわかった。論文で使用されたソースコードはhttps://github.com/Hyun-Ryu/simpsi.comで公開されている。 Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency domain. To address this issue, we propose a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. SimPSI preserves the spectral information by mixing the original and augmented input spectrum weighted by a preservation map, which indicates the importance score of each frequency. Specifically, our experimental contributions are to build three distinct preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. We apply SimPSI to various time series data augmentations and evaluate its effectiveness across a wide range of time series benchmarks. Our experimental results support that SimPSI considerably enhances the performance of time series data augmentations by preserving core spectral information. 
The source code used in the paper is available at https://github.com/Hyun-Ryu/simpsi.	翻訳日:2023-12-12 18:37:44 公開日:2023-12-10
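As a rough illustration of the SimPSI entry above, the sketch below blends the spectrum of an original series with that of its augmented version, weighting each frequency by a preservation map. The blending rule `p * original + (1 - p) * augmented`, the helper names, and the use of a normalized magnitude spectrum as the map are assumptions made for this example; the abstract does not give the exact formula, so this may differ from the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def magnitude_map(spec):
    """Hypothetical preservation map: magnitudes scaled into [0, 1]."""
    mags = [abs(c) for c in spec]
    peak = max(mags) or 1.0  # avoid dividing by zero for an all-zero series
    return [m / peak for m in mags]


def preserve_spectrum(original, augmented):
    """Blend spectra so high-scoring frequencies stay close to the original."""
    spec_o, spec_a = dft(original), dft(augmented)
    pmap = magnitude_map(spec_o)
    # Assumed mixing rule: per-frequency convex combination of the two spectra.
    mixed = [p * o + (1.0 - p) * a for p, o, a in zip(pmap, spec_o, spec_a)]
    return idft(mixed)
```

With this choice of map, the dominant frequencies of the original series pass through almost unchanged, while frequencies that carry little energy in the original are taken mostly from the augmented series.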
# 高再生率と正規化を考慮したスパース・リワードゴール・コンディション強化学習 Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization ( http://arxiv.org/abs/2312.05787v1 ) ライセンス: Link先を確認	Takuya Hiraoka	(参考訳) 高再生率(RR)と正則化を有する強化学習(RL)法は, より優れたサンプル効率により注目されている。しかし、これらの手法は主に密報酬タスクのために開発された。本稿では、これらのRL手法をスパース報酬ゴール条件タスクに拡張することを目的とする。我々はRandomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021) を用いた。 REDQをスパース・リワード目標条件タスクに適用するには、以下の修正を加えます。 (i)後見体験リプレイと (ii)バウンディングターゲットのQ値。我々は,ロボット工学における目標条件12タスク(Plappert et al., 2018)において,これらの修正によりREDQを評価し,従来のSoTA (state-of-the-art) RL法よりも約$2 \times$良いサンプル効率が得られることを示した。さらに、REDQの特定のコンポーネントの必要性を再考し、不要なものを取り除き、それを単純化する。我々の修正によって単純化されたREDQは、ロボティクスの4つのFetchタスクのSoTAメソッドよりも、$\sim 8 \times$優れたサンプル効率が得られる。 Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make the following modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics (Plappert et al., 2018), and show that it achieves about $2 \times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods in 4 Fetch tasks of Robotics.	翻訳日:2023-12-12 18:37:24 公開日:2023-12-10
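The two modifications named in the REDQ entry above, hindsight experience replay and bounded target Q-values, can be sketched as follows. The example assumes a sparse reward of -1 per step until the goal is reached and 0 afterwards, the "final" relabeling strategy, and a dictionary layout for transitions. All of these are illustrative choices and are not taken from the paper.

```python
def relabel_with_final_goal(episode):
    """Hindsight relabeling ("final" strategy): treat the last achieved
    state as if it had been the goal and recompute the sparse reward."""
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        # Assumed reward convention: 0 on reaching the goal, -1 otherwise.
        reward = 0.0 if step["achieved"] == new_goal else -1.0
        relabeled.append({**step, "goal": new_goal, "reward": reward})
    return relabeled


def bound_target_q(target_q, gamma):
    """Clip a bootstrapped target into the range reachable when every
    reward is in {-1, 0}: the discounted return lies in [-1/(1-gamma), 0]."""
    lower = -1.0 / (1.0 - gamma)
    return max(lower, min(0.0, target_q))
```

The bound follows from the geometric series: receiving -1 at every step forever gives a return of -1/(1 - gamma), and no return can exceed 0 under this reward convention.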
# 深層強化学習を用いた動的環境におけるスケーラブルな自動運転のためのグラフベース予測計画政策ネットワーク(GP3Net) Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning ( http://arxiv.org/abs/2312.05784v1 ) ライセンス: Link先を確認	Jayabrata Chowdhury, Venkataramanan Shivaraman, Suresh Sundaram and P B Sujit	(参考訳) 最近の自動運転車のモーションプランニング(avs)の進歩は、非定常運転環境でのエキスパートドライバーの行動を使うことに大きな期待を示している。しかし、専門家ドライバーによる学習は、交通参加者のダイナミックな振る舞いと気象条件のために、ドメインシフトやほぼ障害シナリオから回復するために、より汎用性を必要とする。深層グラフに基づく予測・計画政策ネットワーク(GP3Net)フレームワークは,交通参加者間のインタラクションをコンテキスト情報にエンコードし,AVの安全な操作を判断する非定常環境に対して提案されている。時空間グラフは、トラヒック参加者間の相互作用をモデル化し、その参加者の将来の軌跡を予測する。予測された軌道は、進化する非定常運転環境を予測するために不確実性が埋め込まれたAV周辺の将来の占有マップを生成するために利用される。次に、gp3netフレームワークのポリシーネットワークにコンテキスト情報と将来の占有マップを入力し、近位ポリシー最適化(ppo)アルゴリズムを用いてトレーニングする。提案したGP3Net性能は,交通パターンのドメインシフト(アーバン,ハイウェイ,混合)を基準とした標準CARLAベンチマークシナリオで評価される。その結果,gp3netは旧来の模倣学習型計画モデルよりも優れていた。さらに、目に見えない新しい気象条件では、GP3Netはより少ないトラフィック違反で所望の経路を完成させる。最後に,非定常環境における安全対策を強化するための予測モジュールの導入の利点を強調する。 Recent advancements in motion planning for Autonomous Vehicles (AVs) show great promise in using expert driver behaviors in non-stationary driving environments. However, learning only through expert drivers needs more generalizability to recover from domain shifts and near-failure scenarios due to the dynamic behavior of traffic participants and weather conditions. A deep Graph-based Prediction and Planning Policy Network (GP3Net) framework is proposed for non-stationary environments that encodes the interactions between traffic participants with contextual information and provides a decision for safe maneuver for AV. A spatio-temporal graph models the interactions between traffic participants for predicting the future trajectories of those participants. The predicted trajectories are utilized to generate a future occupancy map around the AV with uncertainties embedded to anticipate the evolving non-stationary driving environments. 
Then the contextual information and future occupancy maps are input to the policy network of the GP3Net framework, which is trained using the Proximal Policy Optimization (PPO) algorithm. The performance of the proposed GP3Net is evaluated on standard CARLA benchmarking scenarios with domain shifts of traffic patterns (urban, highway, and mixed). The results show that GP3Net outperforms previous state-of-the-art imitation learning-based planning models for different towns. Further, in unseen new weather conditions, GP3Net completes the desired route with fewer traffic infractions. Finally, the results emphasize the advantage of including the prediction module to enhance safety measures in non-stationary environments.	翻訳日:2023-12-12 18:37:01 公開日:2023-12-10
# DCIR:マルチエージェント強化学習のための動的一貫性固有のリワード DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2312.05783v1 ) ライセンス: Link先を確認	Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan	(参考訳) マルチエージェントシステムにおけるエージェント毎の最適行動ポリシーの学習は必須だが難しい問題である。マルチエージェント強化学習は実りある進歩を遂げているが、2つのエージェントが一貫性のある行動を示すべきかどうかのダイナミクスに対処するという課題はまだ未解決である。本稿では,各エージェントに対して最適なポリシーを学習するために本質的な報酬を利用することで,エージェントの行動が他のエージェントの行動と一致しているかどうかを学習できる新しいアプローチを提案する。振る舞いの一貫性を、2つのエージェント間の出力アクションの相違として定義することから始めます。次に,他者の行動に気付くエージェントを刺激し,それと一貫性があるかどうかを判断するために,動的一貫性内在報酬(dcir)を導入する。最後に,エージェントの学習可能なスケールファクタを各ステップ毎に提供するダイナミックスケールネットワーク(dsn)を考案し,一貫した行動と報酬の程度を動的に確認する。マルチエージェント粒子, Google Research Football および StarCraft II マイクロマネジメントを含む複数の環境における DCIR の評価を行い,その有効性を示した。 Learning optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the challenge of addressing the dynamics of whether two agents should exhibit consistent behaviors is still under-explored. In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. Subsequently, we introduce dynamic consistency intrinsic reward (DCIR) to stimulate agents to be aware of others' behaviors and determine whether to be consistent with them. Lastly, we devise a dynamic scale network (DSN) that provides learnable scale factors for the agent at every time step to dynamically ascertain whether to award consistent behavior and the magnitude of rewards. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.	
翻訳日:2023-12-12 18:36:38 公開日:2023-12-10
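The DCIR entry above defines behavior consistency as the divergence between the output actions of two agents given the same observation. A minimal sketch of that idea is shown below, using KL divergence over discrete action distributions and fixed scale factors in place of the learned dynamic scale network. The choice of KL, the function names, and the sign convention for the scale factors are assumptions for illustration only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def consistency_intrinsic_reward(own_policy, other_policies, scales):
    """Sum of scaled divergences to each other agent's policy, with all
    policies evaluated on the same observation.

    scales[j] is a stand-in for a learned scale factor (hypothetical):
    a negative value rewards agreeing with agent j, a positive value
    rewards behaving differently from agent j.
    """
    return sum(alpha * kl_divergence(own_policy, other)
               for alpha, other in zip(scales, other_policies))
```

In the paper the scale factors are produced per time step by a network; here they are plain numbers so that the reward computation can be read in isolation.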
# PULSAR:パーキンソン病認識のためのマルチストリーム適応畳み込みを用いたグラフベース正の未ラベル学習 PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition ( http://arxiv.org/abs/2312.05780v1 ) ライセンス: Link先を確認	Md. Zarif Ul Alam, Md Saiful Islam, Ehsan Hoque, M Saifur Rahman	(参考訳) パーキンソン病(英: Parkinson's disease、PD)は、運動、言語、協調に影響を及ぼす神経変性疾患である。タイムリーな診断と治療はpd患者の生活の質を改善することができる。しかし、低所得国(LMIC)では臨床診断へのアクセスが制限されている。したがって、PDのための自動スクリーニングツールの開発は、特に公衆衛生分野において大きな社会的影響をもたらす可能性がある。本稿では,運動障害学会(united parkinson's disease rating scale (mds-updrs)) の指テーピングタスクをウェブカメラで録画したビデオからpdをスクリーニングする新しい方法であるpulsarを提案する。 PULSARは,382名(PD患者183名)から収集したデータに基づいて,訓練および評価を行った。適応型グラフ畳み込みニューラルネットワークを用いて,フィンガーテーピングタスクに特有の時間的グラフエッジを動的に学習した。指関節の相対的位置, 触覚の速度, 加速度など, PD検出に重要となる様々なデータから特徴を学習するために, マルチストリーム適応畳み込みモデルを用いてこのアイデアを拡張した。ビデオのラベルが自己申告されているため、非PDラベルのサンプルに未診断のPDがある可能性がある。我々は、ラベル付き負のデータを必要としないPositive Unlabeled (PU) Learningというアイデアを活用しました。我々の実験は、この方法で問題をモデル化する利点を明らかに示している。 PULSARは検証セットの80.95%の精度を達成し、データ量に制限があるにもかかわらず、独立したテストでは平均71.29%(2.49%の標準偏差)の精度を達成した。これは医療分野でラベル付きデータが不足しているため、特に有望である。 PULSARは、PDスクリーニングを誰にとってもよりアクセスしやすいものにすることを願っている。提案手法は、失調症やハンティントン病などの他の運動障害を評価するために拡張することができる。 Parkinson's disease (PD) is a neuro-degenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low and middle income countries (LMICs). Therefore, development of automated screening tools for PD can have a huge social impact, particularly in the public health sector. In this paper, we present PULSAR, a novel method to screen for PD from webcam-recorded videos of the finger-tapping task from the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS). PULSAR is trained and evaluated on data collected from 382 participants (183 self-reported as PD patients). 
We used an adaptive graph convolutional neural network to dynamically learn the spatio-temporal graph edges specific to the finger-tapping task. We enhanced this idea with a multi-stream adaptive convolution model to learn features from different modalities of data critical to detecting PD, such as the relative location of the finger joints and the velocity and acceleration of tapping. As the labels of the videos are self-reported, there could be cases of undiagnosed PD in the non-PD labeled samples. We leveraged the idea of Positive Unlabeled (PU) Learning, which does not need labeled negative data. Our experiments show a clear benefit of modeling the problem in this way. PULSAR achieved 80.95% accuracy on the validation set and a mean accuracy of 71.29% (2.49% standard deviation) on an independent test set, despite being trained with a limited amount of data. This is especially promising as labeled data is scarce in the health care sector. We hope PULSAR will make PD screening more accessible to everyone. The proposed techniques could be extended to the assessment of other movement disorders, such as ataxia and Huntington's disease.	翻訳日:2023-12-12 18:36:09 公開日:2023-12-10
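The PULSAR entry above relies on Positive Unlabeled learning, where only positive labels are trusted and the rest of the data is treated as unlabeled. The abstract does not say which PU objective is used, so the sketch below shows one widely used option, the non-negative PU risk estimator, computed from classifier scores. The sigmoid surrogate loss, the function names, and the class prior value are assumptions for this example.

```python
import math


def sigmoid_loss(score, label):
    """Smooth surrogate loss in (0, 1); label is +1 or -1."""
    return 1.0 / (1.0 + math.exp(label * score))


def _mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk from classifier scores (higher = more positive).

    prior is the assumed fraction of positives in the unlabeled data.
    """
    risk_pos_as_pos = _mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = _mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = _mean([sigmoid_loss(s, -1) for s in unl_scores])
    # The unlabeled set mixes positives and negatives, so the positive
    # share is subtracted; the result is clamped at zero.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)
```

The clamp keeps the estimated negative-class risk from going below zero, which is the usual guard against overfitting in this family of estimators.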
# クロスサイロ知識伝達を用いた大小言語モデルの相互強化 Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer ( http://arxiv.org/abs/2312.05842v1 ) ライセンス: Link先を確認	Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang	(参考訳) 大きな言語モデル(LLM)は広い知識で権限を与えられるが、タスク固有のパフォーマンスは、しばしば準最適である。タスク固有のデータで微調整 LLM を必要とするが、プライバシー上の懸念からアクセスできない可能性がある。本稿では,より小さな言語モデル (SLM) を用いたLLMの拡張手法を提案する。 LLMとSLMの相互強化を実現するために,SLMがタスク固有の高品質なデータを生成するためにLSMを推進し,SLMとSLMの双方が生成されたデータによって拡張されるCrossLMを提案する。様々なベンチマークタスクで公開言語モデルを用いてCrossLMを評価する。その結果、CrossLMはクライアント上でのSLMのタスク固有性能と、LLMの一般化能力を同時に維持しながら、クラウドサーバ上でのLCMのタスク固有性能を著しく向上させることを示した。 While large language models (LLMs) are empowered with broad knowledge, their task-specific performance is often suboptimal. It necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between LLMs and SLMs, we propose CrossLM, where the SLMs promote the LLM to generate task-specific high-quality data, and both the LLM and SLMs are enhanced with the generated data. We evaluate CrossLM using publicly accessible language models across a range of benchmark tasks. The results demonstrate that CrossLM significantly enhances the task-specific performance of SLMs on clients and the LLM on the cloud server simultaneously while preserving the LLM's generalization capability.	翻訳日:2023-12-12 18:28:34 公開日:2023-12-10
# ニューラルネットワーク解析のためのトポロジカルデータ分析:包括的調査 Topological Data Analysis for Neural Network Analysis: A Comprehensive Survey ( http://arxiv.org/abs/2312.05840v1 ) ライセンス: Link先を確認	Rub\'en Ballester, Carles Casacuberta, Sergio Escalera	(参考訳) このサーベイは、ニューラルネットワーク分析におけるトポロジカルデータ分析(TDA)の適用を包括的に調査する。永続的ホモロジーやMapperといったTDAツールを使用して、ニューラルネットワークとそのデータセットの複雑な構造と振る舞いを調べます。本稿では,データおよびニューラルネットワークから位相情報を得るための様々な戦略について,tdaを用いて検討する。さらに,その一般化能力や表現性など,ニューラルネットワークの特性を分析するためにトポロジカル情報をどのように活用するかについて検討する。深層学習の実際的意義を探究し,特に逆検出やモデル選択といった分野に注目した。調査は,調査対象を4つの広い領域にまとめる。 1.ニューラルネットワークアーキテクチャの特徴 2. 決定領域及び境界の分析 3 内部表現、活性化及びパラメータに関する研究 4. 訓練ダイナミクスと損失関数の探索それぞれのカテゴリの中で,様々な方法論を理解するための背景情報を提供するいくつかの記事について論じる。我々は,本研究から得られた重要な知見を合成し,その分野における課題と潜在的な進歩について議論した。 This survey provides a comprehensive exploration of applications of Topological Data Analysis (TDA) within neural network analysis. Using TDA tools such as persistent homology and Mapper, we delve into the intricate structures and behaviors of neural networks and their datasets. We discuss different strategies to obtain topological information from data and neural networks by means of TDA. Additionally, we review how topological information can be leveraged to analyze properties of neural networks, such as their generalization capacity or expressivity. We explore practical implications of deep learning, specifically focusing on areas like adversarial detection and model selection. Our survey organizes the examined works into four broad domains: 1. Characterization of neural network architectures; 2. Analysis of decision regions and boundaries; 3. Study of internal representations, activations, and parameters; 4. Exploration of training dynamics and loss functions. Within each category, we discuss several articles, offering background information to aid in understanding the various methodologies. We conclude with a synthesis of key insights gained from our study, accompanied by a discussion of challenges and potential advancements in the field.	
翻訳日:2023-12-12 18:28:19 公開日:2023-12-10
# 量子インスパイアされたイジング最適化問題の高速数値解法 A Fast Numerical Solver of Quantum-inspired Ising Optimization Problems ( http://arxiv.org/abs/2312.05837v1 ) ライセンス: Link先を確認	Langyu Li and Yu Pan	(参考訳) 量子アニーラ、コヒーレントイジングマシン、および量子インスパイアされた最適化問題を解決するデジタルイジングマシンは、その短期的応用のために急速に発展してきた。デジタルイジングマシンの数値解法は、従来の計算装置に基づいている。本研究では,Ising最適化問題に対する高速かつ効率的な解法を提案する。このアルゴリズムは、イジングモデルのグラフ情報を利用して計算複雑性を低減させるプルーニング法と、離散実行可能領域を連続的に緩和し、効率的な勾配降下法を組み込んだ領域選択法とからなる。実験の結果, 従来の解法よりも桁違いに高速であり, ベンチマーク問題に対する量子アニールを含む量子インスピレーションアニールよりも少なくとも2倍高速であることがわかった。ハードウェアに対する要求が緩和され、量子アニールよりも低コストになるため、提案した解法は、挑戦的な最適化問題の解決における短期的応用の可能性と、量子デバイスの利点を評価するためのベンチマークとして機能する。 Quantum annealers, coherent Ising machines and digital Ising machines for solving quantum-inspired optimization problems have been developing rapidly due to their near-term applications. The numerical solvers of the digital Ising machines are based on traditional computing devices. In this work, we propose a fast and efficient solver for the Ising optimization problems. The algorithm consists of a pruning method that exploits the graph information of the Ising model to reduce the computational complexity, and a domain selection method which introduces significant acceleration by relaxing the discrete feasible domain into a continuous one to incorporate the efficient gradient descent method. The experiment results show that our solver can be an order of magnitude faster than the classical solver, and at least two times faster than the quantum-inspired annealers including the simulated quantum annealing on the benchmark problems. With more relaxed requirements on hardware and lower cost than quantum annealing, the proposed solver has the potential for near-term application in solving challenging optimization problems as well as serving as a benchmark for evaluating the advantage of quantum devices.	翻訳日:2023-12-12 18:28:04 公開日:2023-12-10
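The Ising solver entry above mentions relaxing the discrete feasible domain into a continuous one so that gradient descent can be applied. The sketch below shows only that relaxation step on a tiny problem, using projected gradient descent on spins in [-1, 1] followed by rounding. It leaves out the pruning method entirely, and the step size, iteration count, and energy convention are assumptions rather than details from the paper.

```python
def ising_energy(J, s):
    """E(s) = -1/2 * sum_ij J[i][j] * s[i] * s[j] for a symmetric J."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))


def relaxed_descent(J, init, lr=0.1, steps=200):
    """Projected gradient descent on s in [-1, 1]^n, then round to +/-1."""
    s = list(init)
    n = len(s)
    for _ in range(steps):
        # For symmetric J, dE/ds_i = -sum_j J[i][j] * s[j].
        grad = [-sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]
        # Gradient step followed by projection back onto the box [-1, 1].
        s = [max(-1.0, min(1.0, si - lr * gi)) for si, gi in zip(s, grad)]
    return [1 if si >= 0 else -1 for si in s]
```

For two spins with a ferromagnetic coupling, `relaxed_descent([[0, 1], [1, 0]], [0.1, -0.05])` drives both relaxed spins to the same sign, which is a ground state of that small instance. On larger or frustrated instances a plain relaxation like this can stop in a local minimum.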
# 大規模言語モデルを用いたエビデンスに基づくオープンドメインファクトチェック Evidence-based Interpretable Open-domain Fact-checking with Large Language Models ( http://arxiv.org/abs/2312.05834v1 ) ライセンス: Link先を確認	Xin Tan, Bowei Zou and Ai Ti Aw	(参考訳) 現実の主張に対する普遍的なファクトチェックシステムは、有効かつ十分なリアルタイムの証拠を集め、合理的な判断を下す上で大きな課題に直面している。本稿では,実世界シナリオにおけるクレームチェックのためのオープンドメイン説明可能な事実チェックシステム(oe-fact)を提案する。 OE-Factシステムは、大規模言語モデル(LLM)の強力な理解と推論能力を活用して、クレームを検証し、ファクトチェック決定のための因果説明を生成する。従来の3モジュールファクトチェックフレームワークをオープンドメイン設定に適応させるために,まず,オープンwebサイトからクレーム関連情報を適切な証拠として取得する。その後、llmおよびその後の検証のための類似性計算により、請求に係る証拠を保持する。我々は、ファクト抽出および検証(fever)データセット上での3モジュールoeファクトシステムの性能を評価する。実験結果から,我々のOE-Factシステムは,クローズドドメインとオープンドメインの両方のシナリオにおいて,一般的なファクトチェックベースラインシステムよりも優れた性能を示し,信頼性と正確性を確保しつつ,ファクトチェック決定のための簡潔かつ説得力のあるリアルタイム説明を提供する。 Universal fact-checking systems for real-world claims face significant challenges in gathering valid and sufficient real-time evidence and making reasoned decisions. In this work, we introduce the Open-domain Explainable Fact-checking (OE-Fact) system for claim-checking in real-world scenarios. The OE-Fact system can leverage the powerful understanding and reasoning capabilities of large language models (LLMs) to validate claims and generate causal explanations for fact-checking decisions. To adapt the traditional three-module fact-checking framework to the open domain setting, we first retrieve claim-related information as relevant evidence from open websites. After that, we retain the evidence relevant to the claim through LLM and similarity calculation for subsequent verification. We evaluate the performance of our adapted three-module OE-Fact system on the Fact Extraction and Verification (FEVER) dataset. Experimental results show that our OE-Fact system outperforms general fact-checking baseline systems in both closed- and open-domain scenarios, ensuring stable and accurate verdicts while providing concise and convincing real-time explanations for fact-checking decisions.	翻訳日:2023-12-12 18:27:46 公開日:2023-12-10
# 貨物列車のMLP型視覚異常検出のための空間的動的蒸留法 Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains ( http://arxiv.org/abs/2312.05832v1 ) ライセンス: Link先を確認	Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren	(参考訳) 物体検出タスクにおける畳み込みニューラルネットワーク(CNN)の適用は成功したが、貨物列車画像から断層を検出する効率は、実際のエンジニアリングシナリオの実装には不十分である。従来のCNNにおける空間的不変性とプーリング層という既存のモデリング上の欠点により、重要なグローバル情報がしばしば無視され、貨物列車の故障検出タスクにおける位置推定の誤りに繋がる。これらの問題を解決するため,貨物列車の視覚的故障検出のための多層パーセプトロン(MLP)に基づく空間的動的蒸留フレームワークを設計した。我々はまず,MLPのようなアーキテクチャが空間不変性の課題を克服し,局所的およびグローバル的両手段を効果的に活用する軸シフト戦略を提案する。学生モデルと意味的不一致を効果的に解消できる動的教師機構を含む,教師を伴わない動的蒸留法を提案する。このようなアプローチは、グローバル空間および意味情報のモデル化に効率的なインスタンス埋め込みを利用する余分な監視信号として、低レベルの特徴の出現と高レベルのラベルセマンティクスから、より豊富な詳細を掘り下げている。さらに,提案した動的教師は,学生と共同で学習し,蒸留効率をより高めることができる。 6つの典型的な断層データセットで実施された大規模な実験により、我々の手法は現在の最先端検出器よりも優れており、より低い計算コストでリアルタイム検出を行うことができる。ソースコードは https://github.com/MVME-HBUT/SDD-FTI-FDet で入手できる。 Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Existing modeling shortcomings of conventional CNNs, namely spatial invariance and pooling layers, often lead to the neglect of crucial global information, resulting in localization errors in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We first present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-trained teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. 
Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can be trained jointly with the student to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at https://github.com/MVME-HBUT/SDD-FTI-FDet.	翻訳日:2023-12-12 18:27:24 公開日:2023-12-10
# 物理を意識した多忠実ベイズ最適化:一般化された定式化 Physics-Aware Multifidelity Bayesian Optimization: a Generalized Formulation ( http://arxiv.org/abs/2312.05831v1 ) ライセンス: Link先を確認	Francesco Di Fiore and Laura Mainini	(参考訳) マルチクエリ最適化問題に対する高忠実度モデルの導入は、各クエリでの評価に要する計算コストに大きく制限されている。 multifidelity bayesian methods (mfbo) は、クエリのサブセレクションのみに対してコストのかかる高忠実度応答を含めることができ、高速な低忠実度モデルを使用して最適化プロセスを高速化できる。 State-of-the-artメソッドは純粋にデータ駆動型検索に依存しており、物理的なコンテキストに関する明確な情報は含まない。本稿では,これらのデータ駆動探索を高速化するために,工学的課題の物理領域に関する事前知識を活用できることを認め,mfboの最適化手順中にドメイン認識の形式を埋め込むための一般化した定式化を提案する。特に、バイアスをドメインの物理的構造をキャプチャする多元性獲得関数として定式化する。これにより、データ駆動検索がドメインプロパティのオンザフライ学習から部分的に緩和され、複数の情報ソースの管理が敏感に強化される。本手法は,全計算コストを抑えつつ最適化探索を誘導する高忠実度シミュレーションを効率よく組み込むことができる。物理学を意識した多元的ベイズ最適化を, 設計最適化と健康モニタリング問題という, 科学や工学でよく見られる2つの最適化問題に対して提示し, 解説した。 The adoption of high-fidelity models for many-query optimization problems is majorly limited by the significant computational cost required for their evaluation at every query. Multifidelity Bayesian methods (MFBO) allow to include costly high-fidelity responses for a sub-selection of queries only, and use fast lower-fidelity models to accelerate the optimization process. State-of-the-art methods rely on a purely data-driven search and do not include explicit information about the physical context. This paper acknowledges that prior knowledge about the physical domains of engineering problems can be leveraged to accelerate these data-driven searches, and proposes a generalized formulation for MFBO to embed a form of domain awareness during the optimization procedure. In particular, we formalize a bias as a multifidelity acquisition function that captures the physical structure of the domain. This permits to partially alleviate the data-driven search from learning the domain properties on-the-fly, and sensitively enhances the management of multiple sources of information. 
The method makes it possible to efficiently include high-fidelity simulations to guide the optimization search while containing the overall computational expense. Our physics-aware multifidelity Bayesian optimization is presented and illustrated for two classes of optimization problems frequently met in science and engineering, namely design optimization and health monitoring problems.	翻訳日:2023-12-12 18:26:54 公開日:2023-12-10
# スケルトンに基づくアクションセグメンテーションのための分離時空間枠組み A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation ( http://arxiv.org/abs/2312.05830v1 ) ライセンス: Link先を確認	Yunheng Li, Zhongyu Li, Shanghua Gao, Qilong Wang, Qibin Hou, Ming-Ming Cheng	(参考訳) 識別時空間情報を効果的にモデル化することは、長い行動系列のセグメンテーション活動に不可欠である。しかし, 既存の手法では, 2種類のデカップリングモデリングにより, 弱時空間モデリング能力に制限がある。 (i)カスケード相互作用は空間的・時間的モデリングを結合し、長列上での運動のモデリングを行う。 (ii)ジョイント共有時空間モデリングは、異なるジョイントの動きパターンを無視して、各ジョイントをモデリングするために共有ウェイトを採用する。本稿では,これらの問題に対処するための分散時空間フレームワーク(DeST)を提案する。まず,複数の時空間ブロックの積み重ねを回避し,十分な時空間相互作用を実現する。具体的には、DeSTは一度統一された空間モデルを実行し、空間的特徴を異なるサブフィーチャーのグループに分割し、異なるレイヤから時間的特徴と適応的に相互作用する。異なるサブフィーチャは異なる空間意味を含むため、モデルは各層で最適な相互作用パターンを学ぶことができる。一方,異なる関節が異なる速度で動くという事実に触発されて,個別に訓練可能な重みを用いて各関節の時間的特徴を捉えるジョイント分離時空間モデリングを提案する。異なるシーンの4つの大規模なベンチマークでは、DeSTは計算の複雑さを減らして現在の最先端の手法を著しく上回っている。 Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences. However, we observe that existing methods are limited in weak spatio-temporal modeling capability due to two forms of decoupled modeling: (i) cascaded interaction couples spatial and temporal modeling, which over-smooths motion modeling over the long sequence, and (ii) joint-shared temporal modeling adopts shared weights to model each joint, ignoring the distinct motion patterns of different joints. We propose a Decoupled Spatio-Temporal Framework (DeST) to address the above issues. Firstly, we decouple the cascaded spatio-temporal interaction to avoid stacking multiple spatio-temporal blocks, while achieving sufficient spatio-temporal interaction. Specifically, DeST performs once unified spatial modeling and divides the spatial features into different groups of subfeatures, which then adaptively interact with temporal features from different layers. Since the different sub-features contain distinct spatial semantics, the model could learn the optimal interaction pattern at each layer. 
Meanwhile, inspired by the fact that different joints move at different speeds, we propose joint-decoupled temporal modeling, which employs independent trainable weights to capture distinctive temporal features of each joint. On four large-scale benchmarks of different scenes, DeST significantly outperforms current state-of-the-art methods with lower computational complexity.	翻訳日:2023-12-12 18:26:30 公開日:2023-12-10
# 運動画像の効率的なニューラル表現と実行のためのスパースマルチタスク学習 Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution ( http://arxiv.org/abs/2312.05828v1 ) ライセンス: Link先を確認	Hye-Bin Shin, Kang Yin, Seong-Whan Lee	(参考訳) 脳-コンピュータインタフェース(BCI)におけるニューラルネットワーク解釈とユーザ意図の分類のための効率的なニューラルネットワークモデルを求める中で、基礎となる神経サブスペースのスパース表現を学習することが重要である。本研究では,人間の脳で観察される神経部分空間の自然な分割に着想を得た,運動画像(mi)と運動実行(me)タスクのためのスパースなマルチタスク学習フレームワークを提案する。 mi-me分類のためのdual-task cnnモデルが与えられた場合,sparsificationアプローチをpruneの超流動接続に適用し,両タスクにおいて高い重要性を示すものを強化する。提案手法では,各タスクに関連付けられ,共通するニューラルアンサンブルを解明し,冗長な接続を排除し,ニューラル信号復号の忠実性を高めるために,スペーシフィケーション手法を用いる。以上の結果から, この調整された疎水性は, オーバーフィッティング問題を緩和し, 少ないデータ量でテスト性能を向上させることを示唆し, 計算効率とロバストなBCIシステムの実現に向けての道のりが示唆された。 In the quest for efficient neural network models for neural data interpretation and user intent classification in brain-computer interfaces (BCIs), learning meaningful sparse representations of the underlying neural subspaces is crucial. The present study introduces a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks, inspired by the natural partitioning of associated neural subspaces observed in the human brain. Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections and reinforce those that show high importance in both tasks. Through our approach, we seek to elucidate the distinct and common neural ensembles associated with each task, employing principled sparsification techniques to eliminate redundant connections and boost the fidelity of neural signal decoding. Our results indicate that this tailored sparsity can mitigate the overfitting problem and improve the test performance with small amount of data, suggesting a viable path forward for computationally efficient and robust BCI systems.	翻訳日:2023-12-12 18:26:08 公開日:2023-12-10
# 毒性流の検出 Detecting Toxic Flow ( http://arxiv.org/abs/2312.05827v1 ) ライセンス: Link先を確認	\'Alvaro Cartea, Gerardo Duran-Martin, Leandro S\'anchez-Betancourt	(参考訳) 本稿では,ブローカーがクライアントから受け取る有害取引を予測する枠組みを開発した。トキシック取引は、末層と部分空間の推定の射影に基づく統一(PULSE)と呼ばれる新しいオンラインベイズ手法で予測される。 pulseはベイジアンニューラルネットワークをシーケンシャルにトレーニングするための高速かつ統計効率の良いオンライン手順である。当社の方法論をテストするために、外国為替取引のプロプライエタリなデータセットを使用しています。 PULSEは、取引が有害になるかどうかを予測する際に、標準的な機械学習および統計手法よりも優れており、ベンチマーク手法はロジスティック回帰、ランダムフォレスト、再帰的に更新された最大様相推定器である。顧客から受け取った取引の内面化や外部化のために毒性予測を利用するブローカーのための戦略を考案する。パラメータの更新や予測に1ミリ秒未満を要するため,提案手法をリアルタイムに実装することができる。ベンチマークと比較すると、PULSEは最も高いPnLを獲得し、私たちが考慮する地平線における最大の損失を回避している。 This paper develops a framework to predict toxic trades that a broker receives from her clients. Toxic trades are predicted with a novel online Bayesian method which we call the projection-based unification of last-layer and subspace estimation (PULSE). PULSE is a fast and statistically-efficient online procedure to train a Bayesian neural network sequentially. We employ a proprietary dataset of foreign exchange transactions to test our methodology. PULSE outperforms standard machine learning and statistical methods when predicting if a trade will be toxic; the benchmark methods are logistic regression, random forests, and a recursively-updated maximum-likelihood estimator. We devise a strategy for the broker who uses toxicity predictions to internalise or to externalise each trade received from her clients. Our methodology can be implemented in real-time because it takes less than one millisecond to update parameters and make a prediction. Compared with the benchmarks, PULSE attains the highest PnL and the largest avoided loss for the horizons we consider.	翻訳日:2023-12-12 18:25:47 公開日:2023-12-10
# R2Human:1枚の画像からリアルタイムの3D画像表示 R2Human: Real-Time 3D Human Appearance Rendering from a Single Image ( http://arxiv.org/abs/2312.05826v1 ) ライセンス: Link先を確認	Qiao Feng, Yuanwang Yang, Yu-Kun Lai, Kun Li	(参考訳) ホログラフィックコミュニケーションと没入型社会体験を実現するためには,1枚の画像から3次元人間の外観を再構築することが不可欠である。しかし、これは、通常マルチカメラのセットアップに依存する、あるいはオフライン操作に限定される既存のメソッドにとって、依然として課題である。本稿では,1つの画像から実写的3次元人物像のリアルタイム推論とレンダリングを行う最初の手法であるr$^2$humanを提案する。我々のアプローチの中核は、暗黙のテクスチャフィールドと明示的なニューラルレンダリングの強みと、新しい表現であるZマップを組み合わせることである。そこで本研究では,可視領域の忠実度の高い色再構成を行い,オクルード領域の信頼性の高い色推定を行うエンドツーエンドネットワークを提案する。ネットワークの3次元知覚能力をさらに高めるために、フーリエ占有場を利用して、テクスチャフィールド生成の前駆体として機能し、レンダリング段階でサンプリング面を提供する詳細な3次元形状を再構築する。実験の結果,本手法は合成データと実世界画像の両方において最先端のパフォーマンスを達成し,オフラインメソッドを上回ることさえ可能であった。プロジェクトページは http://cic.tju.edu.cn/faculty/likun/projects/R2Human で研究目的で公開されている。 Reconstructing 3D human appearance from a single image is crucial for achieving holographic communication and immersive social experiences. However, this remains a challenge for existing methods, which typically rely on multi-camera setups or are limited to offline operations. In this paper, we propose R$^2$Human, the first approach for real-time inference and rendering of photorealistic 3D human appearance from a single image. The core of our approach is to combine the strengths of implicit texture fields and explicit neural rendering with our novel representation, namely Z-map. Based on this, we present an end-to-end network that performs high-fidelity color reconstruction of visible areas and provides reliable color inference for occluded regions. To further enhance the 3D perception ability of our network, we leverage the Fourier occupancy field to reconstruct a detailed 3D geometry, which serves as a prior for the texture field generation and provides a sampling surface in the rendering stage. Experiments show that our end-to-end method achieves state-of-the-art performance on both synthetic data and challenging real-world images and even outperforms many offline methods. 
The project page is available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/R2Human.	翻訳日:2023-12-12 18:25:29 公開日:2023-12-10
# オープンエンド・エンボディード・タスクの解決に向けて Toward Open-ended Embodied Tasks Solving ( http://arxiv.org/abs/2312.05822v1 ) ライセンス: Link先を確認	William Wei Wang, Dongqi Han, Xufang Luo, Yifei Shen, Charles Ling, Boyu Wang, Dongsheng Li	(参考訳) 近年,人工知能(AI)を用いたロボットなどのエンボディエージェントの活用がますます重要になっている。大きな課題はタスクの開放性です。実際には、ロボットは多面的、動的、決定的な「終末状態」が欠如しており、訓練中に遭遇しなかった新しい目標でタスクを実行する必要があることが多い。この問題に対処するため,本稿では,オープンエンド目標に対して柔軟かつ動的にAIを計画・動作させることを目的とした新しいフレームワークである Diffusion for Open-ended Goals (DOG) を紹介する。 DOGは、オンライン計画と制御を適応的に行うために、拡散モデルの生成技術を最先端の訓練なし指導技術と相乗効果する。本評価は,迷路ナビゲーションとロボット制御の両問題において,訓練中に見つからない様々なタスク目標を,DOGが扱えることを示す。私たちの仕事は、AIの適応性とオープンな目標に取り組む能力を高めることに光を当てています。 Empowering embodied agents, such as robots, with Artificial Intelligence (AI) has become increasingly important in recent years. A major challenge is task open-endedness. In practice, robots often need to perform tasks with novel goals that are multifaceted, dynamic, lack a definitive "end-state", and were not encountered during training. To tackle this problem, this paper introduces Diffusion for Open-ended Goals (DOG), a novel framework designed to enable embodied AI to plan and act flexibly and dynamically for open-ended task goals. DOG synergizes the generative prowess of diffusion models with state-of-the-art, training-free guidance techniques to adaptively perform online planning and control. Our evaluations demonstrate that DOG can handle various kinds of novel task goals not seen during training, in both maze navigation and robot control problems. Our work sheds light on enhancing embodied AI's adaptability and competency in tackling open-ended goals.	翻訳日:2023-12-12 18:25:06 公開日:2023-12-10
# ASVD:大規模言語モデル圧縮のためのアクティベーション対応特異値分解 ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models ( http://arxiv.org/abs/2312.05821v1 ) ライセンス: Link先を確認	Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun	(参考訳) 本稿では,大規模言語モデル (llm) を圧縮し,様々なコンピューティング環境において広く採用するための,ポストホックなトレーニングフリーな新しい圧縮パラダイムについて検討する。 LLM圧縮の課題、特に、広範囲なトレーニングデータと計算資源への依存について調べる。本稿では,これらの制約に対処するために,アクティベーション対応特異値分解(ASVD)と呼ばれるトレーニングフリーアプローチを提案する。 ASVDは、活性化分布に基づいて重み行列を調整し、分解精度と効率を向上させることにより、活性化出力を効果的に管理する。また, 最適層比分解のための繰り返しキャリブレーション法を用いて, 異なるLCM層の分解感度の変動に対処する。 ASVDは推論能力を失うことなく、ネットワークを10%から20%圧縮できることを示した。加えて、他のLLM圧縮パラダイムとシームレスに統合することができ、柔軟性のある互換性を示している。コードと圧縮されたモデルはhttps://github.com/hahnyuan/ASVD4LLMで入手できる。 This paper explores a new post-hoc training-free compression paradigm for compressing Large Language Models (LLMs) to facilitate their wider adoption in various computing environments. We delve into the challenges of LLM compression, notably their dependency on extensive training data and computational resources. We propose a training-free approach dubbed Activation-aware Singular Value Decomposition (ASVD) to address these limitations. ASVD effectively manages activation outliers by adjusting the weight matrix based on the activation distribution, improving decomposition accuracy and efficiency. Our method also addresses the varying sensitivity of different LLM layers to decomposition, with an iterative calibration process for optimal layer-specific decomposition. Experiments demonstrate that ASVD can compress network by 10%-20% without losing reasoning capacities. Additionally, it can be seamlessly integrated with other LLM compression paradigms, showcasing its flexible compatibility. Code and compressed models are available at https://github.com/hahnyuan/ASVD4LLM.	翻訳日:2023-12-12 18:24:45 公開日:2023-12-10
# ICTSurF:ニューラルネットワークによる連続時間生存機能 ICTSurF: Implicit Continuous-Time Survival Functions with Neural Networks ( http://arxiv.org/abs/2312.05818v1 ) ライセンス: Link先を確認	Chanon Puttanawarut, Panu Looareesuwan, Romen Samuel Wabina, Prut Saowaprut	(参考訳) 生存分析は、時間とともに事象の可能性を予測する方法として広く知られている。検閲されたサンプルを扱うという課題はまだ残っている。 Cox Proportional Hazards (CPH) モデルのような伝統的な手法は、比例ハザードという強い仮定と、共変量間の関係があらかじめ定められていることによる制約を受ける。ディープニューラルネットワーク(DNN)に基づくモデルの台頭は、生存分析における有効性の向上を証明している。本研究では,連続時間生存モデルに基づく暗黙的連続時間生存関数(ICTSurF)を導入し,暗黙的表現による生存分布を構築する。これにより,ニューラルネットワークのアーキテクチャによらず,連続時間空間における入力を受け取り,連続時間空間における生存確率を生成することができる。既存手法との比較評価は,提案手法の高競争性を裏付けるものである。 ICTSurFの実装はhttps://github.com/44REAM/ICTSurFで公開されています。 Survival analysis is a widely known method for predicting the likelihood of an event over time. The challenge of dealing with censored samples still remains. Traditional methods, such as the Cox Proportional Hazards (CPH) model, are limited by the strong assumption of proportional hazards and by predetermined relationships between covariates. The rise of models based on deep neural networks (DNNs) has demonstrated enhanced effectiveness in survival analysis. This research introduces the Implicit Continuous-Time Survival Function (ICTSurF), built on a continuous-time survival model, and constructs survival distribution through implicit representation. As a result, our method is capable of accepting inputs in continuous-time space and producing survival probabilities in continuous-time space, independent of neural network architecture. Comparative assessments with existing methods underscore the high competitiveness of our proposed approach. Our implementation of ICTSurF is available at https://github.com/44REAM/ICTSurF.	翻訳日:2023-12-12 18:24:28 公開日:2023-12-10
# 深層生成ネットワークに基づく音声合成のためのニューラル音声埋め込み Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks ( http://arxiv.org/abs/2312.05814v1 ) ライセンス: Link先を確認	Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee	(参考訳) 脳音声技術は、人工知能、脳-コンピュータインタフェース、音声合成の分野を含む学際的応用の融合を表す。ニューラル表現学習に基づく意図的復号と音声合成は、神経活動と人間の言語コミュニケーションの手段を直接接続し、コミュニケーションの自然性を大幅に向上させる。表現学習と音声合成技術の発展に関する最近の発見により、脳信号の音声への直接翻訳は大きな可能性を秘めている。特に、ニューラルネットワークに与えられた処理された入力特徴とニューラルスピーチ埋め込みは、脳信号からの音声生成に深い生成モデルを使用する場合、全体的なパフォーマンスにおいて重要な役割を果たす。本稿では,脳信号からの音声合成を可能とし,最終的には非言語コミュニケーションの革新を促進する現在の脳-音声技術を紹介する。また,音声合成作業において重要な役割を担っていると思われる,神経生理学的アクティベーションの基盤となる神経特徴や音声の埋め込みを包括的に分析する。 Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of the speech synthesis technologies, direct translation of brain signals into speech has shown great promise. Especially, the processed input features and neural speech embeddings which are given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. Also, we perform comprehensive analysis on the neural features and neural speech embeddings underlying the neurophysiological activation while performing speech, which may play a significant role in the speech synthesis works.	翻訳日:2023-12-12 18:24:12 公開日:2023-12-10
# 生成コンテンツを活用したフェデレーション学習 Federated Learning Empowered by Generative Content ( http://arxiv.org/abs/2312.05807v1 ) ライセンス: Link先を確認	Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang	(参考訳) フェデレートラーニング(FL)は、プライバシ保護方法でモデルのトレーニングに分散プライベートデータを活用可能にする。しかし、データの不均一性は現在のFL法の性能を著しく制限する。本稿では,federative contentでプライベートデータを多角化することにより,データの不均一性問題を解決するために設計された,federcと呼ばれる新しいflフレームワークを提案する。 FedGCは単純な実装フレームワークであり、データ生成のワンショットステップのみを導入している。データ生成では,3つの重要かつ価値ある側面(予算割当,迅速な設計,世代指導)を要約し,各側面に対する3つのソリューション候補を提案する。具体的には,データ多様性と生成指導の忠実性とのトレードオフを改善するために,プロンプトと実データを同時に生成することを提案する。生成されたデータはプライベートデータとマージされ、ローカルモデルのトレーニングが容易になる。このような生成データはプライベートデータの多様性を高め、各クライアントが潜在的に偏ったプライベートデータに適合しないようにし、データの不均一性を緩和する。我々は、さまざまなベースライン、データセット、シナリオ、モダリティをカバーする、FedGCに関する体系的な実証的研究を行う。興味ある発見は, 1) 生成データとプライベートデータの間に顕著な相違がある場合でも, FL法の性能を一貫して, 著しく向上させ, 2) 性能とプライバシ保護の両立を図ることである。この作業が将来の作業に刺激を与え、flを生成コンテンツで強化する可能性をさらに探りたいと思っています。 Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way. However, data heterogeneity significantly limits the performance of current FL methods. In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content. FedGC is a simple-to-implement framework as it only introduces a one-shot step of data generation. In data generation, we summarize three crucial and worth-exploring aspects (budget allocation, prompt design, and generation guidance) and propose three solution candidates for each aspect. Specifically, to achieve a better trade-off between data diversity and fidelity for generation guidance, we propose to generate data based on the guidance of prompts and real data simultaneously. The generated data is then merged with private data to facilitate local model training. 
Such generative data increases the diversity of private data to prevent each client from fitting the potentially biased private data, alleviating the issue of data heterogeneity. We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities. Interesting findings include (1) FedGC consistently and significantly enhances the performance of FL methods, even when notable disparities exist between generative and private data; (2) FedGC achieves both better performance and privacy-preservation. We wish this work can inspire future works to further explore the potential of enhancing FL with generative content.	翻訳日:2023-12-12 18:23:55 公開日:2023-12-10
# x線ct画像における腰筋の分節化に関する3次元数値スキーム Three-dimensional numerical schemes for the segmentation of the psoas muscle in X-ray computed tomography images ( http://arxiv.org/abs/2312.05887v1 ) ライセンス: Link先を確認	Giulio Paolucci, Isabella Cama, Cristina Campi, Michele Piana	(参考訳) 形態学的・機能的画像解析はサルコペン症、すなわち骨格筋の全身的喪失と多因子的エチオロジー的側面と相関する機能を評価するための正確なアプローチであることが判明した。 sarcopenia assessmentを放射線ワークフローに含めるには、セグメンテーションの信頼性とかなりの自動化を保証する画像処理のための計算パイプラインを実装する必要がある。本研究は,低線量X線CT画像における3次元数値計算手法を用いた。具体的には, レベルセットの方法論に着目し, 古典的進化モデルと3次元測地線モデルという2つの標準手法の性能と, 後者の1次修正による性能を比較した。この分析の結果, この勾配に基づくスキームは, 手動のセグメンテーションに関して信頼性を保証し, 1次スキームは2次アプローチで必要とされるものよりもはるかに小さい計算負荷を必要とすることがわかった。 The analysis of the psoas muscle in morphological and functional imaging has proved to be an accurate approach to assess sarcopenia, i.e. a systemic loss of skeletal muscle mass and function that may be correlated to multifactorial etiological aspects. The inclusion of sarcopenia assessment into a radiological workflow would need the implementation of computational pipelines for image processing that guarantee segmentation reliability and a significant degree of automation. The present study utilizes three-dimensional numerical schemes for psoas segmentation in low-dose X-ray computed tomography images. Specifically, here we focused on the level set methodology and compared the performances of two standard approaches, a classical evolution model and a three-dimension geodesic model, with the performances of an original first-order modification of this latter one. The results of this analysis show that these gradient-based schemes guarantee reliability with respect to manual segmentation and that the first-order scheme requires a computational burden that is significantly smaller than the one needed by the second-order approach.	翻訳日:2023-12-12 18:16:33 公開日:2023-12-10
# カーネルリッジ回帰に対する適応パラメータ選択 Adaptive Parameter Selection for Kernel Ridge Regression ( http://arxiv.org/abs/2312.05885v1 ) ライセンス: Link先を確認	Shao-Bo Lin	(参考訳) 本稿ではカーネルリッジ回帰(KRR)のパラメータ選択問題に焦点をあてる。 KRRの特別なスペクトル特性により、パラメータ間隔の微妙な分割が2つの連続KRR推定値の差を縮めることが分かる。そこで本研究では,krrの早期停止型パラメータ選択戦略を,いわゆるlepskii型原理に基づいて開発する。理論的検証は,提案したパラメータ選択戦略を備えたKRRが最適学習率の達成に成功し,異なる基準に適応し,カーネル手法のパラメータ選択の新たな記録を提供するための学習理論の枠組みとして提示される。 This paper focuses on parameter selection issues of kernel ridge regression (KRR). Due to special spectral properties of KRR, we find that delicate subdivision of the parameter interval shrinks the difference between two successive KRR estimates. Based on this observation, we develop an early-stopping type parameter selection strategy for KRR according to the so-called Lepskii-type principle. Theoretical verifications are presented in the framework of learning theory to show that KRR equipped with the proposed parameter selection strategy succeeds in achieving optimal learning rates and adapts to different norms, providing a new record of parameter selection for kernel methods.	翻訳日:2023-12-12 18:16:15 公開日:2023-12-10
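The kernel ridge regression abstract above rests on the observation that a finer subdivision of the parameter interval shrinks the difference between two successive KRR estimates. The sketch below only illustrates that observation numerically; it does not reproduce the paper's Lepskii-type stopping rule, which the abstract does not specify. The Gaussian kernel, the `n * lam` scaling of the regularizer and all function names are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def krr_fit(K, y, lam):
    """KRR coefficients solving (K + n * lam * I) alpha = y."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def successive_differences(K, y, lams):
    """Empirical L2 distance between fitted values for consecutive lambdas."""
    fits = [K @ krr_fit(K, y, lam) for lam in lams]
    return [float(np.sqrt(np.mean((f1 - f0) ** 2)))
            for f0, f1 in zip(fits, fits[1:])]
```

A parameter selection rule of the kind described in the abstract would walk along such a grid and stop once a criterion on these differences is met; the criterion itself is left out here because it is not given in the listing.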
# ディープニューラルネットワークを用いた高速サンプリング粒子タイミング検出器の精度向上 Using deep neural networks to improve the precision of fast-sampled particle timing detectors ( http://arxiv.org/abs/2312.05883v1 ) ライセンス: Link先を確認	Mateusz Kocot, Krzysztof Misan, Valentina Avati, Edoardo Bossini, Leszek Grzanka, Nicola Minafra	(参考訳) 粒子タイミング検出器の測定は、通過する粒子によって堆積された電荷の統計的変動によって生じる時間歩行効果に影響されることが多い。定数分数判別器(CFD)アルゴリズムは、CERNのLHCにおけるCMS-PPSシステムのようなテスト設定と実行実験の両方において、この効果を緩和するために頻繁に使用される。 CFDは単純で効果的であるが、時系列のすべての電圧サンプルを活用できない。その性能はディープニューラルネットワークによって強化され、粒子到着時間の計算を含む時系列解析に一般的に使用される。 desy-iiシンクロトロンの試験ビーム施設で得られたデータを用いて様々なニューラルネットワークアーキテクチャを評価し,ppsダイヤモンドタイミング検出器に加えて正確なmcp(マイクロチャネルプレート)検出器を設置した。 MCP測定は、ネットワークのトレーニング基準として使われ、その結果を標準CFD法と比較した。最終的に、検出器の読み出しチャンネルに応じて、タイミング精度を8%から23%改善しました。最善の結果は、古典的畳み込みネットワークや多層パーセプトロンよりも優れるunetモデルを用いて得られた。 Measurements from particle timing detectors are often affected by the time walk effect caused by statistical fluctuations in the charge deposited by passing particles. The constant fraction discriminator (CFD) algorithm is frequently used to mitigate this effect both in test setups and in running experiments, such as the CMS-PPS system at the CERN's LHC. The CFD is simple and effective but does not leverage all voltage samples in a time series. Its performance could be enhanced with deep neural networks, which are commonly used for time series analysis, including computing the particle arrival time. We evaluated various neural network architectures using data acquired at the test beam facility in the DESY-II synchrotron, where a precise MCP (MicroChannel Plate) detector was installed in addition to PPS diamond timing detectors. MCP measurements were used as a reference to train the networks and compare the results with the standard CFD method. Ultimately, we improved the timing precision by 8% to 23%, depending on the detector's readout channel. The best results were obtained using a UNet-based model, which outperformed classical convolutional networks and the multilayer perceptron.	
翻訳日:2023-12-12 18:16:05 公開日:2023-12-10
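The record above names the constant fraction discriminator (CFD) as the baseline that the neural networks are compared against. As a rough illustration, a simplified digital CFD on a sampled waveform can be sketched as follows. This is a textbook-style variant, not the CMS-PPS implementation, and it assumes a positive-going pulse with a baseline near zero; the function name and the default `fraction` are hypothetical.

```python
import numpy as np

def cfd_time(t, v, fraction=0.3):
    """Time at which the pulse v(t) first reaches fraction * peak on its rising edge."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    peak = int(np.argmax(v))
    threshold = fraction * v[peak]
    # Index of the first sample at or above the threshold, up to the peak.
    i = int(np.argmax(v[: peak + 1] >= threshold))
    if i == 0:
        return float(t[0])
    # Linear interpolation between the two samples that bracket the threshold.
    frac = (threshold - v[i - 1]) / (v[i] - v[i - 1])
    return float(t[i - 1] + frac * (t[i] - t[i - 1]))
```

Because the threshold scales with the pulse height, two pulses of the same shape but different amplitude yield the same timestamp, which is how CFD mitigates time walk. Only the two samples around the crossing are used, which matches the abstract's remark that CFD does not leverage all voltage samples.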
# データ駆動型最適停止:純粋な探索分析 Data-driven optimal stopping: A pure exploration analysis ( http://arxiv.org/abs/2312.05880v1 ) ライセンス: Link先を確認	Sören Christensen, Niklas Dexheimer, Claudia Strauch	(参考訳) 最適停止の標準理論は、基礎となる過程が本質的に知られているという理想化された仮定に基づいている。本稿では, この制約を取り除き, 一般的な拡散過程におけるデータ駆動型最適停止について検討し, 最適停止障壁推定器の統計的性能について検討する。より具体的には、単純後悔に対する非漸近上界と、一様および非漸近性PAC境界を導出する。ミニマックス最適性は、単純な後悔に対する下界を一致させて上界結果を完成させることによって検証される。すべての結果は、ペイオフ関数の一般的な条件と、バイナリ分類で使われるマージン条件を模倣したより洗練された仮定の両方で示され、収束率の向上に繋がる。さらに,我々は,下限と上限の両方において,特定の探索・活用戦略の累積的後悔に単純な後悔を移す結果について検討した。 The standard theory of optimal stopping is based on the idealised assumption that the underlying process is essentially known. In this paper, we drop this restriction and study data-driven optimal stopping for a general diffusion process, focusing on investigating the statistical performance of the proposed estimator of the optimal stopping barrier. More specifically, we derive non-asymptotic upper bounds on the simple regret, along with uniform and non-asymptotic PAC bounds. Minimax optimality is verified by completing the upper bound results with matching lower bounds on the simple regret. All results are shown both under general conditions on the payoff functions and under more refined assumptions that mimic the margin condition used in binary classification, leading to an improved rate of convergence. Additionally, we investigate how our results on the simple regret transfer to the cumulative regret for a specific exploration-exploitation strategy, both with respect to lower bounds and upper bounds.	翻訳日:2023-12-12 18:15:45 公開日:2023-12-10
# 野生の運動を解き放たれた : チーターにおけるマーカーレス3次元運動学と力推定 Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs ( http://arxiv.org/abs/2312.05879v1 ) ライセンス: Link先を確認	Zico da Silva, Stacy Shield, Penny E. Hudson, Alan M. Wilson, Fred Nicolls and Amir Patel	(参考訳) 野生動物における機動性の複雑なダイナミクスの研究は極めて困難である。チーター(Acinonyx jubatus)はその好例であり、比類のない速度と機動性に大きな関心が寄せられているにもかかわらず、これらの動物から完全な全身の動きデータを取得することは未解決の問題のままである。これは野生のチーターでは特に困難であり、使用する方法が遠隔的であり、動物の動きを拘束しないことが不可欠である。本研究では,野生のチーターから得られたデータを用いて3次元運動学と関節トルクを遠隔で推定する軌道最適化手法を提案する。この手法をkinetic full trajectory estimation (K-FTE) と呼ぶ。本手法は,同期ビデオとフォースプレートデータからなるデータセット上で検証する。 3次元キネマティックスの平均再投影誤差は17.69ピクセル (鼻から眼までの長さをしきい値としたPCKは62.94%) であり, 推定された地面反力は, フォースプレートデータと比較すると, 平均根-平均2乗誤差が171.3 N (ストライド中のピーク力の約17.16%) である。ジョイントトルクは地上の真理データに対して直接検証することはできないが、チーターではそのようなデータが利用できないため、推定トルクは制御された設定における以前の四足動物の研究と一致する。これらの結果は、生物学者とロボット工学者の両方にとって、より自然な環境における動物の移動の研究に深い洞察をもたらすだろう。 The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah (Acinonyx jubatus) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods used are remote and do not constrain the animal's motion. In this work, we use data obtained from cheetahs in the wild to present a trajectory optimisation approach for estimating the 3D kinematics and joint torques of subjects remotely. We call this approach kinetic full trajectory estimation (K-FTE). We validate the method on a dataset comprising synchronised video and force plate data. 
We are able to reconstruct the 3D kinematics with an average reprojection error of 17.69 pixels (62.94% PCK using the nose-to-eye(s) length segment as a threshold), while the estimates produce an average root-mean-square error of 171.3 N (approximately 17.16% of peak force during stride) for the estimated ground reaction force when compared against the force plate data. While the joint torques cannot be directly validated against ground truth data, as no such data is available for cheetahs, the estimated torques agree with previous studies of quadrupeds in controlled settings. These results will enable deeper insight into the study of animal locomotion in a more natural environment for both biologists and roboticists.	翻訳日:2023-12-12 18:15:29 公開日:2023-12-10
# 不均衡データから学習するスキュー確率型ニューラルネットワーク Skew Probabilistic Neural Networks for Learning from Imbalanced Data ( http://arxiv.org/abs/2312.05878v1 ) ライセンス: Link先を確認	Shraddha M. Naik, Tanujit Chakraborty, Abdenour Hadid, Bibhas Chakraborty	(参考訳) 実世界のデータセットは、特定のクラスレベルが著しく過小評価されている不均衡なデータ分布を示すことが多い。このような場合、伝統的なパターン分類器は多数派に偏りを示し、少数派に対する正確な予測を妨げている。本稿では,確率論的ニューラルネットワーク(PNN)とスキュー正規確率カーネルを用いた不均衡なデータ指向アプローチを提案する。 PNNは確率的出力を提供することで知られており、予測信頼性と不確実性処理の定量化を可能にしている。柔軟性の向上,特に不均衡データと非対称データに対するスキュー正規分布の活用により,提案したスキュー確率ニューラルネットワーク(SkewPNN)は,下層のクラス密度をよりよく表現できる。不均衡データセットに対する提案手法の性能を最適化するには、ハイパーパラメータの微調整が不可欠である。この目的のために,人口ベースのヒューリスティックアルゴリズムであるbat最適化アルゴリズムを用いて,ハイパーパラメータ空間を効果的に探索する。また,サンプルサイズが大きくなるにつれて,真の分布が円滑に近づくことを示す密度推定の統計的整合性も証明する。種々のベンチマーク不均衡学習者を比較し,異なる合成データセットを用いて実験シミュレーションを行った。我々の実データ分析によると、SkewPNNは、ほとんどの実験環境でバランスの取れたデータセットと不均衡なデータセットの両方に対して、最先端の機械学習手法を大幅に上回っている。 Real-world datasets often exhibit imbalanced data distribution, where certain class levels are severely underrepresented. In such cases, traditional pattern classifiers have shown a bias towards the majority class, impeding accurate predictions for the minority class. This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel to address this major challenge. PNNs are known for providing probabilistic outputs, enabling quantification of prediction confidence and uncertainty handling. By leveraging the skew normal distribution, which offers increased flexibility, particularly for imbalanced and non-symmetric data, our proposed Skew Probabilistic Neural Networks (SkewPNNs) can better represent underlying class densities. To optimize the performance of the proposed approach on imbalanced datasets, hyperparameter fine-tuning is imperative. To this end, we employ a population-based heuristic algorithm, Bat optimization algorithms, for effectively exploring the hyperparameter space. 
We also prove the statistical consistency of the density estimates which suggests that the true distribution will be approached smoothly as the sample size increases. Experimental simulations have been conducted on different synthetic datasets, comparing various benchmark-imbalanced learners. Our real-data analysis shows that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.	翻訳日:2023-12-12 18:14:56 公開日:2023-12-10
# 2023年XCSP3コンペティションの成果 Proceedings of the 2023 XCSP3 Competition ( http://arxiv.org/abs/2312.05877v1 ) ライセンス: Link先を確認	Gilles Audemard, Christophe Lecoutre, Emmanuel Lonca	(参考訳) この文書は2023年のXCSP3コンペティションのプロシーディングス(議事録)である。この制約ソルバの競技会の結果はCP'23(カナダのトロントで2023年8月27日から31日まで開催された第29回制約プログラミングの原則と実践に関する国際会議)で発表された。 This document represents the proceedings of the 2023 XCSP3 Competition. The results of this competition of constraint solvers were presented at CP'23 (the 29th International Conference on Principles and Practice of Constraint Programming, held in Toronto, Canada from 27th to 31st August, 2023).	翻訳日:2023-12-12 18:14:33 公開日:2023-12-10
# 効率的なニューラルネットワークのためのクラスアウェアプルーニング Class-Aware Pruning for Efficient Neural Networks ( http://arxiv.org/abs/2312.05875v1 ) ライセンス: Link先を確認	Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang	(参考訳) ディープニューラルネットワーク(DNN)は様々な分野で顕著な成功を収めている。しかし、DNNにおける多数の浮動小数点演算(FLOP)は、エッジデバイスのようなリソース制約のアプリケーションに展開する上での課題となっている。この問題に対処するため、DNNの実行における計算コストを削減するためにプルーニングが導入された。従来のプルーニング戦略は、重量値、勾配値、アクティベーション出力に基づいている。本稿では,DNNを圧縮するクラスアウェアプルーニング手法を提案し,DNNの計算コストを削減するための新しい視点を提供する。各イテレーションで、ニューラルネットワークのトレーニングが変更され、クラス認識の刈り込みが容易になる。その後、クラス数に関するフィルタの重要性が評価される。いくつかのクラスでのみ重要なフィルタは削除される。ニューラルネットワークは、発生した精度の損失を補償するために再トレーニングされる。プルーニングのイテレーションは、削除できるフィルタがなくなるまで続き、残りのフィルタが多くのクラスにとって非常に重要であることを示す。このプルーニング法は, 従来のプルーニング法よりも精度, プルーニング率, FLOPsの低減に優れていた。実験の結果, このクラスアウェアプルーニング手法は, 高い推論精度を維持しつつ, 重みとFLOPsの数を大幅に削減できることがわかった。 Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weight values, gradient values and activation outputs. Different from previous pruning solutions, in this paper, we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective to reduce the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate the class-aware pruning. Afterwards, the importance of filters with respect to the number of classes is evaluated. The filters that are only important for a small number of classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations continue until no more filters can be removed, indicating that the remaining filters are very important for many classes. 
This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs, while maintaining a high inference accuracy.	翻訳日:2023-12-12 18:14:27 公開日:2023-12-10
# CasADiの学習: 数値最適化におけるデータ駆動モデル Learning for CasADi: Data-driven Models in Numerical Optimization ( http://arxiv.org/abs/2312.05873v1 ) ライセンス: Link先を確認	Tim Salzmann, Jon Arrizabalaga, Joel Andersson, Marco Pavone and Markus Ryll	(参考訳) 実世界の問題は分析的に解析することが難しいことが多いが、深層学習は複雑なプロセスをデータからモデル化する上で優れている。 CasADiのような既存の最適化フレームワークは、ソルバのシームレスな使用を容易にするが、学習プロセスモデルを数値最適化に統合する際の課題に直面している。このギャップに対処するため、我々はLearning for CasADi (L4CasADi) フレームワークを提案し、PyTorchで学習したモデルをCasADiとシームレスに統合し、効率的かつハードウェアアクセラレーションの可能な数値最適化を可能にする。 L4CasADiの適用性は2つのチュートリアル例で示されている: まず, 乱流がPyTorchモデルで表される乱流河川において, エネルギー効率のために魚の軌跡を最適化する。第2に,L4CasADi を用いた最適制御において,暗黙のニューラルラジアンスフィールド環境表現を容易に活用できることを示す。 L4CasADiはサンプルとドキュメントとともに、MITライセンス下でhttps://github.com/Tim-Salzmann/l4casadiで利用可能である。 While real-world problems are often challenging to analyze analytically, deep learning excels in modeling complex processes from data. Existing optimization frameworks like CasADi facilitate seamless usage of solvers but face challenges when integrating learned process models into numerical optimizations. To address this gap, we present the Learning for CasADi (L4CasADi) framework, enabling the seamless integration of PyTorch-learned models with CasADi for efficient and potentially hardware-accelerated numerical optimization. The applicability of L4CasADi is demonstrated with two tutorial examples: First, we optimize a fish's trajectory in a turbulent river for energy efficiency where the turbulent flow is represented by a PyTorch model. Second, we demonstrate how an implicit Neural Radiance Field environment representation can be easily leveraged for optimal control with L4CasADi. L4CasADi, along with examples and documentation, is available under MIT license at https://github.com/Tim-Salzmann/l4casadi	翻訳日:2023-12-12 18:14:09 公開日:2023-12-10
# TaBIIC:反復的およびインタラクティブなクラスタリングによる分類学的構築 TaBIIC: Taxonomy Building through Iterative and Interactive Clustering ( http://arxiv.org/abs/2312.05866v1 ) ライセンス: Link先を確認	Mathieu d'Aquin	(参考訳) 分類学を構築することは、しばしばオントロジーを構築する重要な部分であり、関連するデータから分類学を作成するための多くの試みがなされている。このようなアプローチにおける考え方は、概念のインテンションの関連する定義を、データ内のパターン(例えば形式的概念解析)として抽出することができるか、あるいは類似性(クラスタリング)に基づいてデータオブジェクトをグループ化することによって拡張を構築することができる。いずれの場合も、プロセスは自動的に構築される構造につながり、粗すぎて定義に欠けるか、あるいは細かすぎて詳細になりすぎる可能性があるため、望まれる分類に洗練される必要がある。本稿では、反復的かつインタラクティブなプロセスにおいて、両方のアプローチからインスピレーションを得る方法について検討し、これらの概念をデータ中に特定する際に、分類学における概念の洗練と定義が生じるようにする。本稿では,本手法が様々なデータソースに適用可能であることを示し,オントロジーにより直接的に組み込むことができる分類学につながることを示す。 Building taxonomies is often a significant part of building an ontology, and many attempts have been made to automate the creation of such taxonomies from relevant data. The idea in such approaches is either that relevant definitions of the intension of concepts can be extracted as patterns in the data (e.g. in formal concept analysis) or that their extension can be built from grouping data objects based on similarity (clustering). In both cases, the process leads to an automatically constructed structure, which can either be too coarse and lacking in definition, or too fine-grained and detailed, and therefore needs to be refined into the desired taxonomy. In this paper, we explore a method that takes inspiration from both approaches in an iterative and interactive process, so that refinement and definition of the concepts in the taxonomy occur at the time of identifying those concepts in the data. We show that this method is applicable on a variety of data sources and leads to taxonomies that can be more directly integrated into ontologies.	翻訳日:2023-12-12 18:13:51 公開日:2023-12-10
# 自己組織化マップを用いたニューラルネットワークの概念表現の探索 Finding Concept Representations in Neural Networks with Self-Organizing Maps ( http://arxiv.org/abs/2312.05864v1 ) ライセンス: Link先を確認	Mathieu d'Aquin	(参考訳) 十分複雑なタスクでは、ニューラルネットワークは問題解決の副作用として、その問題の表現に関する関連する抽象化を学習することが期待される。これは特に、ニューラルネットワーク内の特定のユニット(ニューロン)の活性化と、画像に存在する視覚的概念(テクスチャ、色、オブジェクト)の間に相関があることが、多くの研究で示されている機械ビジョンにおいて確認されている。本稿では, ニューラルネットワークの層全体の活性化ベクトルが, 「女性」や「リアリズム画家」といった抽象概念の神経表現とどのように対応しているかを視覚的に, 計算的に検討する。ネットワークのレイヤにおける概念表現のレベルを評価するために,これらのマップに適用する複数の尺度を実験する。実験の結果, 概念の活性化マップの相対エントロピーは, データ全体のマップと比較して適切な候補であり, 概念の神経表現を同定し, 視覚化し, 目の前の予測課題の解決におけるその重要性を理解するための方法論の一部として利用することができることがわかった。 In sufficiently complex tasks, it is expected that as a side effect of learning to solve a problem, a neural network will learn relevant abstractions of the representation of that problem. This has been confirmed in particular in machine vision where a number of works showed that correlations could be found between the activations of specific units (neurons) in a neural network and the visual concepts (textures, colors, objects) present in the image. Here, we explore the use of self-organizing maps as a way to both visually and computationally inspect how activation vectors of whole layers of neural networks correspond to neural representations of abstract concepts such as `female person' or `realist painter'. We experiment with multiple measures applied to those maps to assess the level of representation of a concept in a network's layer. We show that, among the measures tested, the relative entropy of the activation map for a concept compared to the map for the whole data is a suitable candidate and can be used as part of a methodology to identify and locate the neural representation of a concept, visualize it, and understand its importance in solving the prediction task at hand.	翻訳日:2023-12-12 18:13:32 公開日:2023-12-10
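The self-organizing map abstract above singles out the relative entropy of a concept's activation map, compared with the map for the whole data, as a suitable measure. A minimal sketch of that measure follows. It assumes a SOM has already been trained and that each sample has been assigned the index of its best-matching unit; the function names and the add-one smoothing are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def hit_map(bmu_indices, n_units):
    """Normalised histogram of best-matching units, with add-one smoothing."""
    counts = np.bincount(bmu_indices, minlength=n_units).astype(float) + 1.0
    return counts / counts.sum()

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return float(np.sum(p * np.log(p / q)))
```

A concept whose samples concentrate on a few map units yields a large divergence from the whole-data map, while a concept spread like the rest of the data yields a value near zero. The smoothing keeps both distributions strictly positive so the logarithm stays finite.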
# 256ベースのビデオ:ゼロショットビデオ編集のための空間的期待-最大化インバージョン A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing ( http://arxiv.org/abs/2312.05856v1 ) ライセンス: Link先を確認	Maomao Li, Yu Li, Tianyu Yang, Yunfei Liu, Dongxu Yue, Zhihui Lin, and Dong Xu	(参考訳) 本稿では,ゼロショット映像編集のためのビデオインバージョン手法を提案する。既存のビデオ編集方法は、通常、編集の前に2D DDIMインバージョンやナイーブな時空間DDIMインバージョンを適用する。多くの既存手法と異なり,より高密度な映像特徴を期待最大化法で定式化し,映像全体を表現するためのよりコンパクトなベースを反復的に推定する空間的期待最大化(STEM)インバージョンを提案する。各フレームはインバージョンに対して固定的かつグローバルな表現を適用し、再構成と編集の間は時間的一貫性に親しみやすい。我々のSTEMインバージョンは2つの最先端ビデオ編集法において一貫した改善を達成できることを示す。 This paper presents a video inversion approach for zero-shot video editing, which aims to model the input video with low-rank representation during the inversion process. The existing video editing methods usually apply the typical 2D DDIM inversion or naïve spatial-temporal DDIM inversion before editing, which leverages time-varying representation for each frame to derive noisy latent. Unlike most existing approaches, we propose a Spatial-Temporal Expectation-Maximization (STEM) inversion, which formulates the dense video feature under an expectation-maximization manner and iteratively estimates a more compact basis set to represent the whole video. Each frame applies the fixed and global representation for inversion, which is more friendly for temporal consistency during reconstruction and editing. Extensive qualitative and quantitative experiments demonstrate that our STEM inversion can achieve consistent improvement on two state-of-the-art video editing methods.	翻訳日:2023-12-12 18:13:09 公開日:2023-12-10
# NeVRF:長周期画像のためのニューラルビデオベース放射場 NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences ( http://arxiv.org/abs/2312.05855v1 ) ライセンス: Link先を確認	Minye Wu, Tinne Tuytelaars	(参考訳) 長周期動的シーケンスへのニューラルレージアンス場(NeRF)の適用は困難である。既存の手法では、品質とストレージサイズのバランスがとれず、トポロジカルな変化や大きな動きといった複雑なシーンの変化で困難に直面する。これらの課題に対処するために,ニューラルビデオベース放射場(NeVRF)の表現を提案する。 NeVRFは、画像ベースのレンダリングを備えたニューラルラディアンスフィールドをマージし、長期のダイナミックな内向きシーンにおけるフォトリアリスティックなノベルビュー合成をサポートする。本稿では,マルチビュー映像から直接放射率を予測するために,新しいマルチビューラディアンスブレンディング手法を提案する。連続的な学習手法を取り入れることで、NeVRFは、以前のフレームを再考することなく、シーケンシャルデータからフレームを効率的に再構築することができる。さらに、最適化された圧縮アプローチにより、NeVRFは動的シーンをコンパクトに表現することができ、現実のシナリオにおいて動的放射場をより実用的なものにすることができる。我々は,NeVRFの長期配列レンダリング,シーケンシャルデータ再構成,コンパクトデータ記憶における有効性を示した。 Adopting Neural Radiance Fields (NeRF) to long-duration dynamic sequences has been challenging. Existing methods struggle to balance between quality and storage size and encounter difficulties with complex scene changes such as topological changes and large motions. To tackle these issues, we propose a novel neural video-based radiance fields (NeVRF) representation. NeVRF marries neural radiance field with image-based rendering to support photo-realistic novel view synthesis on long-duration dynamic inward-looking scenes. We introduce a novel multi-view radiance blending approach to predict radiance directly from multi-view videos. By incorporating continual learning techniques, NeVRF can efficiently reconstruct frames from sequential data without revisiting previous frames, enabling long-duration free-viewpoint video. Furthermore, with a tailored compression approach, NeVRF can compactly represent dynamic scenes, making dynamic radiance fields more practical in real-world scenarios. Our extensive experiments demonstrate the effectiveness of NeVRF in enabling long-duration sequence rendering, sequential data reconstruction, and compact data storage.	翻訳日:2023-12-12 18:12:50 公開日:2023-12-10
# 複合生存分析:補助集約ベースラインと生存スコアを用いた学習 Composite Survival Analysis: Learning with Auxiliary Aggregated Baselines and Survival Scores ( http://arxiv.org/abs/2312.05854v1 ) ライセンス: Link先を確認	Chris Solomou	(参考訳) 生存分析(Survival Analysis, SA)は、時間とともに発生するわずかな事象の事象確率を推定できるため、時間とイベントのモデリングのデフォルト手法である。本研究は,saモデルの全表現を(1)集団の行動全体を把握するベースラインハザードに分解し,(2)特定のメンバーの特異な確率的ダイナミクスをモデル化する独立分散サバイバルスコアに完全にパラメトリックな設定で分解することにより,saモデルのトレーニングと推論を改善する方法を示す。提案手法は, 直交観測地平線を動的に処理し, 計算非効率なDeep Learning-based SA法や, MCMCを必要とするモデルを含む, 様々な実世界のデータセットにおける他の最先端手法と比較して, 競争力を発揮する。しかし,本手法は,微調整やハイパーパラメータ最適化を行なわず,出力から頑健な結果が得られる。 Survival Analysis (SA) constitutes the default method for time-to-event modeling due to its ability to estimate event probabilities of sparsely occurring events over time. In this work, we show how to improve the training and inference of SA models by decoupling their full expression into (1) an aggregated baseline hazard, which captures the overall behavior of a given population, and (2) independently distributed survival scores, which model idiosyncratic probabilistic dynamics of its given members, in a fully parametric setting. The proposed inference method is shown to dynamically handle right-censored observation horizons, and to achieve competitive performance when compared to other state-of-the-art methods in a variety of real-world datasets, including computationally inefficient Deep Learning-based SA methods and models that require MCMC for inference. Nevertheless, our method achieves robust results from the outset, while not being subjected to fine-tuning or hyperparameter optimization.	翻訳日:2023-12-12 18:12:31 公開日:2023-12-10
# InteractDiffusion:テキスト間拡散モデルにおける相互作用制御 InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.05849v1 ) ライセンス: Link先を確認	Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu	(参考訳) 大規模テキスト・ツー・イメージ(t2i)拡散モデルは、テキスト記述に基づいてコヒーレントな画像を生成する素晴らしい能力を示しており、コンテンツ生成における広大な応用を可能にしている。近年, 物体の局所化, 姿勢, 画像の輪郭などの要因の制御が進んでいるが, 生成コンテンツ中の物体間の相互作用を制御できる重要なギャップが残っている。生成した画像内の対話をうまく制御することで、対話的なキャラクターで現実的なシーンを作るといった有意義な応用が可能になる。本研究では,三重項ラベル(人,行動,対象)と対応する境界ボックスからなる人間-対象間相互作用(hoi)情報を用いたt2i拡散モデルの条件付け問題について検討する。我々は、既存の訓練済みT2I拡散モデルを拡張して、相互作用により良い条件付けを可能にする、InteractDiffusionと呼ばれるプラグイン可能な相互作用制御モデルを提案する。具体的には、HOI情報をトークン化し、インタラクション埋め込みを通じてそれらの関係を学習する。条件付き自己アテンション層は、HOIトークンを視覚トークンにマッピングするように訓練され、既存のT2I拡散モデルにおいて視覚トークンをよりよく条件付ける。提案モデルでは,既存のT2I拡散モデルにおける相互作用と位置の制御が可能であり,HOI検出スコアの差が大きく,FIDおよびKIDの忠実度も大きく向上する。プロジェクトページ: https://jiuntian.github.io/interactdiffusion。 Large-scale text-to-image (T2I) diffusion models have showcased incredible capabilities in generating coherent images based on textual descriptions, enabling vast applications in content generation. While recent advancements have introduced control over factors such as object localization, posture, and image contours, a crucial gap remains in our ability to control the interactions between objects in the generated content. Well-controlling interactions in generated images could yield meaningful applications, such as creating realistic scenes with interacting characters. In this work, we study the problems of conditioning T2I diffusion models with Human-Object Interaction (HOI) information, consisting of a triplet label (person, action, object) and corresponding bounding boxes. We propose a pluggable interaction control model, called InteractDiffusion that extends existing pre-trained T2I diffusion models to enable them being better conditioned on interactions. Specifically, we tokenize the HOI information and learn their relationships via interaction embeddings. 
A conditioning self-attention layer is trained to map HOI tokens to visual tokens, thereby conditioning the visual tokens better in existing T2I diffusion models. Our model attains the ability to control the interaction and location on existing T2I diffusion models, which outperforms existing baselines by a large margin in HOI detection score, as well as fidelity in FID and KID. Project page: https://jiuntian.github.io/interactdiffusion.	翻訳日:2023-12-12 18:12:12 公開日:2023-12-10
# データフリーハードラベルロバストネス盗み攻撃 Data-Free Hard-Label Robustness Stealing Attack ( http://arxiv.org/abs/2312.05924v1 ) ライセンス: Link先を確認	Xiaojian Yuan, Kejiang Chen, Wen Huang, Jie Zhang, Weiming Zhang, Nenghai Yu	(参考訳) MLaaS(Machine Learning as a Service)の人気は、MLaaSをクエリすることでクローンモデルを構築することを目的とした、モデルステアリングアタック(MSA)に対する懸念の高まりにつながっている。現在、MLaaSに関するほとんどの研究は、MLaaSがソフトラベルを提供し、攻撃者は同様の分布を持つプロキシデータセットを持つと仮定している。しかし、ハードラベルだけがMLaaSによって返却され、データの分散が未解決のままである、より現実的なシナリオをカプセル化できない。さらに、既存の仕事の多くはモデルの正確さを盗み、モデルの堅牢さを怠り、セキュリティに敏感なシナリオ、例えばフェイススキャンの支払いにおいて堅牢性が不可欠である。特に、モデルのロバスト性を改善するには、しばしば、敵対的なトレーニングのような高価な技術を使う必要があるため、ロバスト性を盗む方がより有益である。そこで本研究では,これらのギャップに応答して,対象モデルのハードラベルを自然データを用いずに簡単にクエリすることで,モデル精度とロバスト性の両方を盗むことが可能な,データフリーなハードラベルロバストネス盗み (dfhl-rs) 攻撃を提案する。包括的実験により本手法の有効性が実証された。クローンモデルは77.86%のクリーンな精度と39.51%のロバストな精度を実現し、cifar-10データセットのターゲットモデルよりわずか4.71%と8.40%低く、ベースラインを大幅に上回っている。私たちのコードは、https://github.com/LetheSec/DFHL-RS-Attackで利用可能です。 The popularity of Machine Learning as a Service (MLaaS) has led to increased concerns about Model Stealing Attacks (MSA), which aim to craft a clone model by querying MLaaS. Currently, most research on MSA assumes that MLaaS can provide soft labels and that the attacker has a proxy dataset with a similar distribution. However, this fails to encapsulate the more practical scenario where only hard labels are returned by MLaaS and the data distribution remains elusive. Furthermore, most existing work focuses solely on stealing the model accuracy, neglecting the model robustness, while robustness is essential in security-sensitive scenarios, e.g., face-scan payment. Notably, improving model robustness often necessitates the use of expensive techniques such as adversarial training, thereby further making stealing robustness a more lucrative prospect. 
In response to these identified gaps, we introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper, which enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model without the help of any natural data. Comprehensive experiments demonstrate the effectiveness of our method. The clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack, which are only 4.71% and 8.40% lower than the target model on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is available at: https://github.com/LetheSec/DFHL-RS-Attack	翻訳日:2023-12-12 18:05:18 公開日:2023-12-10
# 弱教師付きビデオ個別カウント Weakly Supervised Video Individual Counting ( http://arxiv.org/abs/2312.05923v1 ) ライセンス: Link先を確認	Xinyan Liu and Guorong Li and Yuankai Qi and Ziheng Yan and Zhenjun Han and Anton van den Hengel and Ming-Hsuan Yang and Qingming Huang	(参考訳) ビデオ個別カウント(VIC)は、単一のビデオ内のユニークな個人数を予測することを目的としている。既存手法は個人の軌跡ラベルに基づいて表現を学習するが、軌跡ラベルはアノテーションコストが高い。より現実的な実践的課題を反映するため、軌跡ラベルが提供されない弱教師付きVICタスクを導入する。代わりに、視野に入るトラフィック(インフロー)と視野から出るトラフィック(アウトフロー)を示す2種類のラベルが提供される。また、グループレベルのマッチングのもとでタスクを弱教師付きコントラスト学習問題として定式化する最初の解法を、ベースラインとして提案する。そのために、ネットワークがインフロー、アウトフロー、残りを識別するよう促す、エンドツーエンドで学習可能なソフトコントラスト損失を考案した。この方向への今後の研究を促進するため、既存のVICデータセットであるSenseCrowdとCroHDからアノテーションを生成し、新しいデータセットであるUAVVICを構築した。広範な実験結果から、我々の弱教師付きベースライン手法は教師付き手法よりも優れており、より実用的な弱教師付きタスクへの移行で失われる情報はほとんどないことが示された。コードと学習済みモデルは https://github.com/streamer-AP/CGNet で公開される予定である。 Video Individual Counting (VIC) aims to predict the number of unique individuals in a single video. Existing methods learn representations based on trajectory labels for individuals, which are annotation-expensive. To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided. Instead, two types of labels are provided to indicate traffic entering the field of view (inflow) and leaving the field of view (outflow). We also propose the first solution as a baseline that formulates the task as a weakly supervised contrastive learning problem under group-level matching. In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining. To facilitate future study in this direction, we generate annotations from the existing VIC datasets SenseCrowd and CroHD and also build a new dataset, UAVVIC.
Extensive results show that our baseline weakly supervised method outperforms supervised methods, and thus, little information is lost in the transition to the more practically relevant weakly supervised task. The code and trained model will be made public at https://github.com/streamer-AP/CGNet	翻訳日:2023-12-12 18:04:47 公開日:2023-12-10
# Dig-CSI: 分散型生成モデルを利用したCSIフィードバック学習フレームワーク Dig-CSI: A Distributed and Generative Model Assisted CSI Feedback Training Framework ( http://arxiv.org/abs/2312.05921v1 ) ライセンス: Link先を確認	Zhilin Du, Haozhen Li, Zhenyu Liu, Shilong Fan, Xinyu Gu, Lin Zhang	(参考訳) ディープラーニング(DL)ベースのモデルの出現は、無線通信システムにおけるチャネル状態情報(CSI)フィードバック機構を大幅に進歩させた。しかし、従来のアプローチは、CSIデータ処理が集中型であるため、高い通信オーバーヘッドと潜在的なプライバシーリスクに悩まされることが多い。これらの課題に対処するため、我々はDig-CSIと呼ばれるCSIフィードバック学習フレームワークを設計した。このフレームワークでは、CSIフィードバックモデルを学習するためのデータセットは、ローカルデータのアップロードではなく、各ユーザ機器(UE)がアップロードした分散ジェネレータによって生成される。各UEは、デコーダを分散ジェネレータと見なすオートエンコーダをローカルデータで学習し、再構成精度と生成能力を獲得する。実験結果から、Dig-CSIは、はるかに軽い通信オーバーヘッドで、従来の集中型学習で学習したモデルと同等の性能を持つグローバルCSIフィードバックモデルを学習できることがわかった。 The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called Dig-CSI, in which the dataset for training the CSI feedback model is produced by the distributed generators uploaded by each user equipment (UE), rather than through local data upload. Each UE trains an autoencoder, where the decoder is considered as the distributed generator, with local data to gain reconstruction accuracy and the ability to generate. Experimental results show that Dig-CSI can train a global CSI feedback model with performance comparable to the model trained with classical centralized learning, at a much lighter communication overhead.	翻訳日:2023-12-12 18:04:25 公開日:2023-12-10
# 自然画像マッティングのための拡散 Diffusion for Natural Image Matting ( http://arxiv.org/abs/2312.05915v1 ) ライセンス: Link先を確認	Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi	(参考訳) 我々は拡散を利用して、困難な画像マッチング課題に取り組むことを目指している。しかし、高い計算オーバーヘッドの存在とトレーニングと推論プロセス間のノイズサンプリングの不整合は、この目標を達成する上で大きな障害となる。本稿では,これらの課題を効果的に克服するソリューションであるdiffmatteを提案する。まず、DiffMatteはデコーダを複雑な結合されたマッティングネットワーク設計から切り離し、拡散プロセスのイテレーションで1つの軽量デコーダだけを含む。このような戦略により、diffmatteはサンプル数の増加に伴って計算オーバーヘッドの増大を緩和する。第2に,均一な時間間隔を持つ自己整合型トレーニング戦略を採用し,時間領域全体にわたるトレーニングと推論の一貫したノイズサンプリングを実現する。我々のDiffMatteは柔軟性を念頭に設計されており、シームレスに様々なモダンなマッティングアーキテクチャに統合できます。大規模な実験結果から,DiffMatteはコンポジション1kテストセットの最先端レベルに到達し,SAD測定値とMSE測定値でそれぞれ5%,15%のベストメソッドを上回り,他のベンチマークではより強力な一般化能力を示した。 We aim to leverage diffusion to address the challenging image matting task. However, the presence of high computational overhead and the inconsistency of noise sampling between the training and inference processes pose significant obstacles to achieving this goal. In this paper, we present DiffMatte, a solution designed to effectively overcome these challenges. First, DiffMatte decouples the decoder from the intricately coupled matting network design, involving only one lightweight decoder in the iterations of the diffusion process. With such a strategy, DiffMatte mitigates the growth of computational overhead as the number of samples increases. Second, we employ a self-aligned training strategy with uniform time intervals, ensuring a consistent noise sampling between training and inference across the entire time domain. Our DiffMatte is designed with flexibility in mind and can seamlessly integrate into various modern matting architectures. Extensive experimental results demonstrate that DiffMatte not only reaches the state-of-the-art level on the Composition-1k test set, surpassing the best methods in the past by 5% and 15% in the SAD metric and MSE metric respectively, but also show stronger generalization ability in other benchmarks.	
翻訳日:2023-12-12 18:04:10 公開日:2023-12-10
# アンサンブルカルマンフィルタによるガウス過程状態空間モデルの変分推論 Ensemble Kalman Filtering-Aided Variational Inference for Gaussian Process State-Space Models ( http://arxiv.org/abs/2312.05910v1 ) ライセンス: Link先を確認	Zhidi Lin and Yiyong Sun and Feng Yin and Alexandre Thiéry	(参考訳) ガウス過程状態空間モデル(GPSSM)は、エミッションモデルを通して観測される潜在状態ダイナミクスをモデル化するための原理的かつ柔軟なアプローチを提供する。しかし、GPSSMを学習するための既存の変分法は、特に償却推論ネットワークの導入によって、多数のパラメータを最適化する上で大きな課題に直面している。この課題に対処するために,定評あるモデルベースフィルタリング手法であるアンサンブルカルマンフィルタ(EnKF)を用いて,変分推論フレームワークにおける潜在状態の事後分布を近似する。このアプローチは推論ネットワークの必要性をなくし、変分パラメータの数を大幅に削減する。さらに,EnKFの助けを借りて,閉形式解を用いた複数項の和により,変分推論における近似的エビデンス下界(ELBO)の簡易評価が容易に得られることを示した。自動微分ツールを利用することで、ELBOを最大化し、GPSSMを効率的に訓練することができる。また,提案手法をオンライン環境に拡張し,包括的アルゴリズム解析と洞察を提供する。多様な実データとシミュレーションデータセットでの広範なテストは、EnKFと統合した我々の変分推論アルゴリズムが、学習と推論性能の点で既存の手法よりも優れていることを示す。 Gaussian process state-space models (GPSSMs) provide a principled and flexible approach to model latent state dynamics observed through emission models. However, existing variational methods for learning GPSSMs face a substantial challenge in optimizing a large number of parameters, particularly with the introduction of amortized inference networks. To address this challenge, we leverage the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, to approximate the posterior distribution of latent states within the variational inference framework. This approach eliminates the need for inference networks, significantly reducing the number of variational parameters. Moreover, we demonstrate that with the aid of EnKF, the straightforward evaluation of approximated evidence lower bound (ELBO) in the variational inference can be easily obtained through the summation of multiple terms with closed-form solutions. By leveraging automatic differentiation tools, we thus can maximize the ELBO and train the GPSSM efficiently. We also extend the proposed method to an online setting and provide comprehensive algorithm analyses and insights.
Extensive testing on diverse real and simulated datasets demonstrates that our variational inference algorithms, integrated with EnKF, outperform existing methods in terms of learning and inference performance.	翻訳日:2023-12-12 18:03:51 公開日:2023-12-10
# 近赤外表情認識のための確率微分方程式を用いた多エネルギー誘導画像変換 Multi-Energy Guided Image Translation with Stochastic Differential Equations for Near-Infrared Facial Expression Recognition ( http://arxiv.org/abs/2312.05908v1 ) ライセンス: Link先を確認	Bingjun Luo, Zewen Wang, Jinpeng Wang, Junjie Zhu, Xibin Zhao, Yue Gao	(参考訳) 照度変化は、現実世界の表情認識(FER)において長期にわたる課題である。非制御または可視光条件下では、近赤外(NIR)は、高画質の画像を取得し、可視領域に欠けている幾何学的およびテクスチャ的詳細を補うための単純で代替的なソリューションを提供することができる。既存の大規模なNIR表情データセットがないため、VIS FERメソッドを直接NIRスペクトルに拡張することは効果がない可能性がある。さらに、従来の異種画像合成法は、タスク知識のない低制御性によって制限される。これらの問題に対処するため、我々はNIR-FER確率微分方程式 (NFER-SDE) を初めて提案する。 NFER-SDEは、VISソースイメージ全体を入力として、ドメイン固有の知識とともに、画像の高周波コンテンツにおけるモダリティ不変情報の保存をガイドすることができる。大規模な実験およびアブレーション研究により、NFER-SDEはNIR FERの性能を著しく改善し、唯一利用可能な2つのNIR FERデータセットであるOulu-CASIAとLarge-HFEに対して最先端の結果を得ることが示された。 Illumination variation has been a long-term challenge in real-world facial expression recognition(FER). Under uncontrolled or non-visible light conditions, Near-infrared (NIR) can provide a simple and alternative solution to obtain high-quality images and supplement the geometric and texture details that are missing in the visible domain. Due to the lack of existing large-scale NIR facial expression datasets, directly extending VIS FER methods to the NIR spectrum may be ineffective. Additionally, previous heterogeneous image synthesis methods are restricted by low controllability without prior task knowledge. To tackle these issues, we present the first approach, called for NIR-FER Stochastic Differential Equations (NFER-SDE), that transforms face expression appearance between heterogeneous modalities to the overfitting problem on small-scale NIR data. NFER-SDE is able to take the whole VIS source image as input and, together with domain-specific knowledge, guide the preservation of modality-invariant information in the high-frequency content of the image. 
Extensive experiments and ablation studies show that NFER-SDE significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.	翻訳日:2023-12-12 18:03:31 公開日:2023-12-10
# 近赤外表情認識のためのハイパーグラフ誘導不等角スペクトルトランスフォーマネットワーク Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition ( http://arxiv.org/abs/2312.05907v1 ) ライセンス: Link先を確認	Bingjun Luo, Haowen Wang, Jinpeng Wang, Junjie Zhu, Xibin Zhao, Yue Gao	(参考訳) 照明変化に対する強い堅牢性により、近赤外(NIR)は、低照度または完全な暗黒条件下での視覚的(VIS)表情認識を効果的かつ必須に補完することができる。しかし,NIR画像からの表情認識(FER)は,データスケールの制約や不完全な可視光コンテンツから識別的特徴を抽出することが困難であるため,従来のFERよりも困難である。本稿では,表情認識の深化を初めて試み,近赤外式トランスフォーマ(nfer-former)と呼ばれる新しい手法を提案する。具体的には、visの分野における豊富なラベル情報をフルに活用するために、入力画像から表現情報とスペクトル情報とを分離する自己対応直交分解機構を導入し、スペクトル変動の干渉を伴わずに表現特徴を抽出する。また,いくつかの重要な顔動作をモデル化し,それら間の複雑な相関構造を学習し,クラス間類似性の干渉を軽減するハイパーグラフガイド機能埋め込み手法を提案する。さらに,NFER-Formerの効率性を評価するために,360個の被験者を含む大規模なNIR-VIS顔表現データセットを構築した。大規模な実験とアブレーション研究により、NFER-FormerはNIR FERの性能を大幅に改善し、利用可能な2つのNIR FERデータセット(Oulu-CASIAとLarge-HFE)で最先端の結果が得られることが示された。 With the strong robusticity on illumination variations, near-infrared (NIR) can be an effective and essential complement to visible (VIS) facial expression recognition in low lighting or complete darkness conditions. However, facial expression recognition (FER) from NIR images presents more challenging problem than traditional FER due to the limitations imposed by the data scale and the difficulty of extracting discriminative features from incomplete visible lighting contents. In this paper, we give the first attempt to deep NIR facial expression recognition and proposed a novel method called near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information in the field of VIS, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation. 
We also propose a Hypergraph-Guided Feature Embedding method that models some key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we have constructed a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the efficiency of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.	翻訳日:2023-12-12 18:03:05 公開日:2023-12-10
# エッジレベルEgo-NetworkエンコーディングによるサブグラフGNNの改善 Improving Subgraph-GNNs via Edge-Level Ego-Network Encodings ( http://arxiv.org/abs/2312.05905v1 ) ライセンス: Link先を確認	Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez	(参考訳) 本稿では、追加のノード・エッジ特徴量の提供やメッセージパッシング形式の拡張によってメッセージパッシンググラフニューラルネットワーク(MP-GNN)を強化できる、グラフ学習のための新しいエッジレベルのエゴネットワーク符号化を提案する。提案する符号化は、識別が難しい3-WL同値なグラフ族である強正則グラフ(Strongly Regular Graphs)を識別するのに十分である。このような符号化がノードベースのサブグラフMP-GNNよりも表現力が高いことを理論的に示す。10のグラフデータセットを含む4つのベンチマークでの実証評価では、表現力、グラフ分類、グラフ回帰、近接性タスクにおいて従来のベースラインと同等以上の結果を得つつ、一部の実環境設定ではメモリ使用量を18.1分の1に削減した。 We present a novel edge-level ego-network encoding for learning on graphs that can boost Message Passing Graph Neural Networks (MP-GNNs) by providing additional node and edge features or extending message-passing formats. The proposed encoding is sufficient to distinguish Strongly Regular Graphs, a family of challenging 3-WL equivalent graphs. We show theoretically that such encoding is more expressive than node-based sub-graph MP-GNNs. In an empirical evaluation on four benchmarks with 10 graph datasets, our results match or improve previous baselines on expressivity, graph classification, graph regression, and proximity tasks, while reducing memory usage by 18.1x in certain real-world settings.	翻訳日:2023-12-12 18:02:38 公開日:2023-12-10
# ディープラーニングを用いた白内障手術ビデオの解析 Deep-Learning-Assisted Analysis of Cataract Surgery Videos ( http://arxiv.org/abs/2312.05900v1 ) ライセンス: Link先を確認	Negin Ghamsarian	(参考訳) 医療技術の進歩に伴い、手術室はインテリジェントな環境へと進化している。文脈認識システム(CAS)は、手術状態を包括的に解釈し、リアルタイム警告を可能にし、特に初心者外科医の意思決定を支援する。これらのシステムは、手術ビデオを自動的に分析し、インデクシング、文書化、手術後のレポート生成を行うことができる。このような自動システムに対する需要がますます高まる中、手術用ビデオ分析のための機械学習ベースのアプローチが生まれている。この論文は白内障手術ビデオ解析における重要な課題に対処し、効率的な文脈認識システム構築の道を開く。 1) 本論文は, 関連コンテンツの時空間的局所化が位相認識精度を大幅に向上させることを示す。 2)本論文は,白内障手術ビデオのリアルタイムストリーミングと適応ストレージを実現するための,関連性に基づく圧縮のための新しいディープラーニングフレームワークを提案する。 3)いくつかの畳み込みモジュールが提案され,ネットワークの意味解釈性能の向上が期待できる。これらの課題には、ぼかしと反射の歪み、透明性、変形性、色とテクスチャの変化、鈍いエッジ、スケールの変動などがある。 (4)白内障手術ビデオにおける自動不規則検出のための最初の枠組みを提案し,評価する。 (5)手動ピクセルベースのアノテーションの要件を軽減するため,セマンティックセグメンテーションに適応した自己教師付き表現学習のための新しい戦略を提案する。 Following the technological advancements in medicine, the operation rooms are evolving into intelligent environments. The context-aware systems (CAS) can comprehensively interpret the surgical state, enable real-time warning, and support decision-making, especially for novice surgeons. These systems can automatically analyze surgical videos and perform indexing, documentation, and post-operative report generation. The ever-increasing demand for such automatic systems has sparked machine-learning-based approaches for surgical video analysis. This thesis addresses the significant challenges in cataract surgery video analysis to pave the way for building efficient context-aware systems. The main contributions of this thesis are five folds: (1) This thesis demonstrates that spatio-temporal localization of the relevant content can considerably improve phase recognition accuracy. (2) This thesis proposes a novel deep-learning-based framework for relevance-based compression to enable real-time streaming and adaptive storage of cataract surgery videos. (3) Several convolutional modules are proposed to boost the networks' semantic interpretation performance in challenging conditions. 
These challenges include blur and reflection distortion, transparency, deformability, color and texture variation, blunt edges, and scale variation. (4) This thesis proposes and evaluates the first framework for automatic irregularity detection in cataract surgery videos. (5) To alleviate the requirement for manual pixel-based annotations, this thesis proposes novel strategies for self-supervised representation learning adapted to semantic segmentation.	翻訳日:2023-12-12 18:02:24 公開日:2023-12-10
# PSCR: AIGC画像品質評価のためのパッチサンプリングに基づくコントラスト回帰 PSCR: Patches Sampling-based Contrastive Regression for AIGC Image Quality Assessment ( http://arxiv.org/abs/2312.05897v1 ) ライセンス: Link先を確認	Jiquan Yuan, Xinyan Cao, Linjing Cao, Jinlong Lin, and Xixin Cao	(参考訳) 近年、AIGC(Artificial Intelligence Generated Content)はコンピュータ科学コミュニティを超えて広く注目を集めている。AI生成画像(AIGI)が絶えず生成されることで様々な問題が生じているため、人間の知覚の観点からAIGIの品質を評価することを目的とするAIGC画像品質評価(AIGCIQA)が、コンピュータビジョン分野における新たなトピックとして浮上している。しかし、既存のほとんどのAIGCIQA手法は、単一の生成画像から予測スコアを直接回帰しており、AIGI間およびスコア間に内在する差異を見落としている。さらに、リサイズやクロッピングのような操作は、大域的な幾何学的歪みや情報損失を引き起こし、モデルの性能を制限する。これらの問題に対処するために,パッチサンプリングに基づくコントラスト回帰(PSCR)フレームワークを提案する。より優れた表現空間を学習するために,様々な生成画像間の差異を利用するコントラスト回帰フレームワークを導入する。この空間では、画像間の差異とスコア順位を相対スコアで測定することができる。また,模範となるAIGIを参照として選択することで,参照なし画像データベース上で参照画像を利用できなかった従来モデルの制限を克服した。画像入力における幾何学的歪みや情報損失を回避するため,パッチサンプリング戦略も提案する。提案するPSCRフレームワークの有効性を実証するため,AGIQA-1K, AGIQA-3K, AIGCIQA2023を含む3つの主流AIGCIQAデータベース上で広範囲に実験を行った。その結果,提案したPSCRフレームワークの導入により,モデル性能が大幅に向上した。コードは https://github.com/jiquan123/PSCR で公開予定である。 In recent years, Artificial Intelligence Generated Content (AIGC) has gained widespread attention beyond the computer science community. Due to various issues arising from continuous creation of AI-generated images (AIGI), AIGC image quality assessment (AIGCIQA), which aims to evaluate the quality of AIGIs from human perception perspectives, has emerged as a novel topic in the field of computer vision. However, most existing AIGCIQA methods directly regress predicted scores from a single generated image, overlooking the inherent differences among AIGIs and scores. Additionally, operations like resizing and cropping may cause global geometric distortions and information loss, thus limiting the performance of models. To address these issues, we propose a patches sampling-based contrastive regression (PSCR) framework.
We introduce a contrastive regression framework that leverages differences among various generated images to learn a better representation space. In this space, differences and score rankings among images can be measured by their relative scores. By selecting exemplar AIGIs as references, we also overcome the limitations of previous models that could not utilize reference images on no-reference image databases. To avoid geometric distortions and information loss in image inputs, we further propose a patches sampling strategy. To demonstrate the effectiveness of our proposed PSCR framework, we conduct extensive experiments on three mainstream AIGCIQA databases including AGIQA-1K, AGIQA-3K and AIGCIQA2023. The results show significant improvements in model performance with the introduction of our proposed PSCR framework. Code will be available at https://github.com/jiquan123/PSCR.	翻訳日:2023-12-12 18:01:59 公開日:2023-12-10
# 素粒子とのダークマター差動相互作用の原子プローブ An atomic probe of dark matter differential interactions with elementary particles ( http://arxiv.org/abs/2312.05894v1 ) ライセンス: Link先を確認	Yossi Rosenzweig (1), Yevgeny Kats (1), Menachem Givon (1), Yonathan Japha (1) and Ron Folman (1) ((1) Department of Physics, Ben-Gurion University of the Negev, Israel)	(参考訳) 標準模型を超えて物理を探すことは実験物理学の主要な課題の一つである。ダークマターの候補には、アクシオンのような超軽いボソニック粒子がある。コマグネトメータは、そのような粒子と原子のスピンと相互作用するエキゾチックな磁場に対して超高感度プローブを形成する。本研究では, それらの磁場を発見し, スペクトルを測定できるだけでなく, サブ原子初等粒子, 電子, 中性子, 陽子との結合強度の比を決定するマルチ原子種プローブを提案する。さらに, このプローブの多面的特性は, 通常の磁場とアルカリ原子の光誘起架空の磁場の組み合わせによって生じる合成エキゾチック場によっても証明できることを示した。これらの合成場はまた、エキゾチック物理学のための任意の磁力計またはコマグネトメータプローブの正確な校正を可能にする。 Searching for physics beyond the Standard Model is one of the main tasks of experimental physics. Candidates for dark matter include axion-like ultralight bosonic particles. Comagnetometers form ultra-high sensitivity probes for such particles and any exotic field that interacts with the spin of an atom. Here, we propose a multi-atom-species probe that enables not only to discover such fields and measure their spectrum but also to determine the ratios of their coupling strengths to sub-atomic elementary particles, electrons, neutrons and protons. We further show that the multi-faceted capabilities of this probe may be demonstrated with synthetic exotic fields generated by a combination of regular magnetic fields and light-induced fictitious magnetic fields in alkali atoms. These synthetic fields also enable the accurate calibration of any magnetometer or comagnetometer probe for exotic physics.	翻訳日:2023-12-12 18:01:33 公開日:2023-12-10
# 局所赤外源を照射した超伝導量子ビットの準粒子ダイナミックス Quasiparticle dynamics in a superconducting qubit irradiated by a localized infrared source ( http://arxiv.org/abs/2312.05892v1 ) ライセンス: Link先を確認	Rodrigo Benevides, Maxwell Drimmer, Giacomo Bisson, Francesco Adinolfi, Uwe von Lüpke, Hugo Michiel Doeleman, Gianluigi Catelani, Yiwen Chu	(参考訳) 超伝導量子ビットにおけるデコヒーレンスの既知の源は、壊れたクーパー対、すなわち準粒子の存在である。これらは、環境中に存在する高エネルギー放射、あるいは一部のハイブリッド量子デバイスのように意図的に導入された高エネルギー放射によって生成されうる。本研究では、様々なパワー、持続時間、空間的位置で集光した赤外線を照射したときのトランズモン量子ビットの特性を体系的に調べる。入射光子の高エネルギーにもかかわらず、我々の観測はトラップが支配する低エネルギー準粒子ダイナミクスのモデルとよく一致する。この手法は、様々なジオメトリや材料を持つ超伝導回路に対する高エネルギー放射の影響を理解し、場合によっては緩和するために利用できる。 A known source of decoherence in superconducting qubits is the presence of broken Cooper pairs, or quasiparticles. These can be generated by high-energy radiation, either present in the environment or purposefully introduced, as in the case of some hybrid quantum devices. Here, we systematically study the properties of a transmon qubit under illumination by focused infrared radiation with various powers, durations, and spatial locations. Despite the high energy of incident photons, our observations agree well with a model of low-energy quasiparticle dynamics dominated by trapping. This technique can be used for understanding and potentially mitigating the effects of high-energy radiation on superconducting circuits with a variety of geometries and materials.	翻訳日:2023-12-12 18:01:22 公開日:2023-12-10
# Maxwell-Ampère-Nernst-Planck方程式に対する保存型ハイブリッド物理インフォームドニューラルネットワーク法 A conservative hybrid physics-informed neural network method for Maxwell-Ampère-Nernst-Planck equations ( http://arxiv.org/abs/2312.05891v1 ) ライセンス: Link先を確認	Cheng Chang, Zhouping Xin, Tieyong Zeng	(参考訳) Maxwell-Ampère-Nernst-Planck (MANP) 方程式は、荷電粒子の力学をモデル化するために最近提案された。本研究では、この系に対する数値アルゴリズムを深層学習ツールを用いて強化する。提案するハイブリッドアルゴリズムは、さもなければ大量の数値試験によってしか得られない、ダミー変数の適切な近似を決定する自動的な手段を提供する。さらに、元の方法は2次元問題に対して検証されている。しかし、空間次元が1の場合、元のカールフリー緩和成分は適用できず、2次元のシナリオではうまく機能するダミー変数の近似式も、1次元の場合には妥当な出力を与えることができない。提案手法は空間1次元の場合にも容易に一般化できる。実験では、1次元の場合にポアソン・ボルツマン型方程式から得られる定常解への数値安定性と良好な収束性が示された。2次元の場合の実験は、提案手法が保存特性を保つことを示している。 Maxwell-Ampère-Nernst-Planck (MANP) equations were recently proposed to model the dynamics of charged particles. In this study, we enhance a numerical algorithm of this system with deep learning tools. The proposed hybrid algorithm provides an automated means to determine a proper approximation for the dummy variables, which can otherwise only be obtained through massive numerical tests. In addition, the original method is validated for 2-dimensional problems. However, when the spatial dimension is one, the original curl-free relaxation component is inapplicable, and the approximation formula for dummy variables, which works well in a 2-dimensional scenario, fails to provide a reasonable output in the 1-dimensional case. The proposed method can be readily generalised to cases with one spatial dimension. Experiments show numerical stability and good convergence to the steady-state solution obtained from Poisson-Boltzmann type equations in the 1-dimensional case. The experiments conducted in the 2-dimensional case indicate that the proposed method preserves the conservation properties.	翻訳日:2023-12-12 18:01:13 公開日:2023-12-10
# 効率的な境界伝搬と並列計算による#DNN検証ツールのスケーリング Scaling #DNN-Verification Tools with Efficient Bound Propagation and Parallel Computing ( http://arxiv.org/abs/2312.05890v1 ) ライセンス: Link先を確認	Luca Marzari, Gabriele Roncolato and Alessandro Farinelli	(参考訳) ディープニューラルネットワーク(dnn)は、パターン認識から複雑なロボット問題に至るまで、多くのシナリオで驚くべき結果を示す強力なツールである。しかし、それらの複雑な設計と透明性の欠如は、現実世界のアプリケーションに適用された場合の安全性の懸念を引き起こす。この文脈において、DNNの形式検証(FV)は、安全面の証明可能な保証を提供する貴重なソリューションとして登場した。それにもかかわらず、バイナリ回答(すなわち、安全か安全か)は、安全モデルのランク付けや選択のような直接的安全介入に十分な情報がない可能性がある。この制限に対処するため、FV問題は、最近#DNN-Verificationと呼ばれるカウントバージョンに拡張され、与えられた安全プロパティのドメイン内の安全でない領域のサイズを計算した。それでも、問題の複雑さのため、既存のソリューションは、DNNが大規模で複雑な実世界のロボットシナリオにスケールするのに苦労している。本研究は,FVの進歩に触発されたこの限界に対処するため,DNNカウンタの高精度かつ近似的なFVの効率を高めるために,シンボリック線形緩和と並列計算を組み合わせた到達可能性解析に基づく新しい戦略を提案する。標準のfvベンチマークと現実的なロボットシナリオの実証的な評価は、スケーラビリティと効率が著しく向上し、複雑なロボットアプリケーションでもそのようなテクニックが利用できることを示した。 Deep Neural Networks (DNNs) are powerful tools that have shown extraordinary results in many scenarios, ranging from pattern recognition to complex robotic problems. However, their intricate designs and lack of transparency raise safety concerns when applied in real-world applications. In this context, Formal Verification (FV) of DNNs has emerged as a valuable solution to provide provable guarantees on the safety aspect. Nonetheless, the binary answer (i.e., safe or unsafe) could be not informative enough for direct safety interventions such as safety model ranking or selection. To address this limitation, the FV problem has recently been extended to the counting version, called #DNN-Verification, for the computation of the size of the unsafe regions in a given safety property's domain. Still, due to the complexity of the problem, existing solutions struggle to scale on real-world robotic scenarios, where the DNN can be large and complex. 
To address this limitation, inspired by advances in FV, in this work, we propose a novel strategy based on reachability analysis combined with Symbolic Linear Relaxation and parallel computing to enhance the efficiency of existing exact and approximate FV for DNN counters. The empirical evaluation on standard FV benchmarks and realistic robotic scenarios shows a remarkable improvement in scalability and efficiency, enabling the use of such techniques even for complex robotic applications.	翻訳日:2023-12-12 18:00:57 公開日:2023-12-10
# SuperPrimitive: 原始レベルでのシーン再構築 SuperPrimitive: Scene Reconstruction at a Primitive Level ( http://arxiv.org/abs/2312.05889v1 ) ライセンス: Link先を確認	Kirill Mazur, Gwangbin Bae, Andrew J. Davison	(参考訳) 画像群や単眼映像群からのジョイントカメラのポーズと密度幾何推定は、その計算の複雑さと固有の視覚的曖昧さのため、依然として困難な問題である。多くの高密度増分再構成システムは画像画素を直接操作し、多視点幾何学的手がかりを用いて3次元位置を解く。このようなピクセルレベルのアプローチは、多視点一貫性の曖昧さや違反(例えば、テクスチャのない表面や鏡面に起因する)に苦しむ。我々はスーパープリミティブと呼ばれる新しいイメージ表現でこの問題に対処する。超プリミティブは、イメージを意味的に相関した局所領域に分割し、それらを予測された表面正規方向で拡張することで得られる。これはスーパープリミティブ当たりの局所幾何学的推定を提供し、相対的な位置は多視点観測に基づいて調整される。新しい表現の汎用性を示すために,3つの3次元再構成課題である奥行き完了,運動からの少数視点構造,単眼密度視覚オドメトリに対処した。 Joint camera pose and dense geometry estimation from a set of images or a monocular video remains a challenging problem due to its computational complexity and inherent visual ambiguities. Most dense incremental reconstruction systems operate directly on image pixels and solve for their 3D positions using multi-view geometry cues. Such pixel-level approaches suffer from ambiguities or violations of multi-view consistency (e.g. caused by textureless or specular surfaces). We address this issue with a new image representation which we call a SuperPrimitive. SuperPrimitives are obtained by splitting images into semantically correlated local regions and enhancing them with estimated surface normal directions, both of which are predicted by state-of-the-art single image neural networks. This provides a local geometry estimate per SuperPrimitive, while their relative positions are adjusted based on multi-view observations. We demonstrate the versatility of our new representation by addressing three 3D reconstruction tasks: depth completion, few-view structure from motion, and monocular dense visual odometry.	翻訳日:2023-12-12 18:00:33 公開日:2023-12-10
# Fake It Till Make It: Federated Learning with Consensus-Oriented Generation ( http://arxiv.org/abs/2312.05966v1 ) License: see link	Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang	In federated learning (FL), data heterogeneity is one key bottleneck that causes model divergence and limits performance. Addressing this, existing methods often regard data heterogeneity as an inherent property and propose to mitigate its adverse effects by correcting models. In this paper, we seek to break this inherent property by generating data to complement the original dataset, fundamentally reducing the level of heterogeneity. As a novel attempt from the perspective of data, we propose federated learning with consensus-oriented generation (FedCOG). FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from the global model to the local model based on the generated data to mitigate over-fitting to the original heterogeneous dataset. FedCOG has two critical advantages: 1) it can be a plug-and-play module to further improve the performance of most existing FL methods, and 2) it is naturally compatible with standard FL protocols such as Secure Aggregation since it does not modify the communication process. Extensive experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.	Translated: 2023-12-12 17:53:14 Published: 2023-12-10
# ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation ( http://arxiv.org/abs/2312.05964v1 ) License: see link	Brandon Theodorou, Shrusti Jain, Cao Xiao, and Jimeng Sun	Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle with adhering to domain-specific knowledge and removing invalid data. We present ConSequence, an effective approach to integrating domain knowledge into sequential generative neural network outputs. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, ensured by an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, lack the ability to handle temporal constraints, and hinder the learning and computational efficiency of the model. In contrast, our approach efficiently handles all types of constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, outperforming competitors in achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence prevents all rule violations while improving model quality, reducing test perplexity by 5% and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.	Translated: 2023-12-12 17:52:51 Published: 2023-12-10
# Aikyam: A Video Conferencing Utility for Deaf and Dumb ( http://arxiv.org/abs/2312.05962v1 ) License: see link	Kshitij Deshpande, Varad Mashalkar, Kaustubh Mhaisekar, Amaan Naikwadi and Archana Ghotkar	With the advent of the pandemic, the use of video conferencing platforms as a means of communication has greatly increased, and with it, so have remote opportunities. The deaf and dumb have traditionally faced several issues in communication, but now the effect is felt more severely. This paper proposes an all-encompassing video conferencing utility that can be used with existing video conferencing platforms to address these issues. Appropriate, semantically correct sentences are generated from the signer's gestures, which are interpreted by the system. Along with audio that speaks the sentence, the user's feed is annotated with the sentence. This can be viewed by all participants, thus aiding smooth communication among all parties involved. The utility uses a simple LSTM model for gesture classification. The sentences are constructed by a T5-based model. A virtual camera is used to achieve the required data flow.	Translated: 2023-12-12 17:52:29 Published: 2023-12-10
# TransGlow: Attention-augmented Transduction model based on Graph Neural Networks for Water Flow Forecasting ( http://arxiv.org/abs/2312.05961v1 ) License: see link	Naghmeh Shafiee Roudbari, Charalambos Poullis, Zachary Patterson, Ursula Eicker	The hydrometric prediction of water quantity is useful for a variety of applications, including water management, flood forecasting, and flood control. However, the task is difficult due to the dynamic nature and limited data of water systems. Highly interconnected water systems can significantly affect hydrometric forecasting. Consequently, it is crucial to develop models that represent the relationships between other system components. In recent years, numerous hydrological applications have been studied, including streamflow prediction, flood forecasting, and water quality prediction. Existing methods are unable to model the influence of adjacent regions between pairs of variables. In this paper, we propose a spatiotemporal forecasting model that augments the hidden state in a Graph Convolution Recurrent Neural Network (GCRN) encoder-decoder using an efficient version of the attention mechanism. The attention layer allows the decoder to access different parts of the input sequence selectively. Since water systems are interconnected and the connectivity information between the stations is implicit, the proposed model leverages a graph learning module to extract a sparse graph adjacency matrix adaptively based on the data. Spatiotemporal forecasting relies on historical data. In some regions, however, historical data may be limited or incomplete, making it difficult to accurately predict future water conditions. Further, we present a new benchmark dataset of water flow from a network of Canadian stations on rivers, streams, and lakes. Experimental results demonstrate that our proposed model, TransGlow, outperforms baseline methods by a wide margin.	Translated: 2023-12-12 17:52:17 Published: 2023-12-10
# VAE-IF: Deep feature extraction with averaging for unsupervised artifact detection in routine acquired ICU time-series ( http://arxiv.org/abs/2312.05959v1 ) License: see link	Hollan Haule, Ian Piper, Patricia Jones, Chen Qin, Tsz-Yan Milly Lo, and Javier Escudero	Artifacts are a common problem in physiological time-series data collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel unsupervised approach to detect artifacts in clinical-standard minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (iForest) model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised models based on long short-term memory (LSTM) and XGBoost. We show that our approach achieves comparable sensitivity and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels.	Translated: 2023-12-12 17:51:53 Published: 2023-12-10
# Learning Differentiable Particle Filter on the Fly ( http://arxiv.org/abs/2312.05955v1 ) License: see link	Jiaxi Li, Xiongjie Chen, Yunpeng Li	Differentiable particle filters are an emerging class of sequential Bayesian inference techniques that use neural networks to construct components in state space models. Existing approaches are mostly based on offline supervised training strategies. This delays model deployment, and the resulting filters are susceptible to distribution shift in test-time data. In this paper, we propose an online learning framework for differentiable particle filters so that model parameters can be updated as data arrive. The technical constraint is that there is no known ground truth state information in the online inference setting. We address this by adopting an unsupervised loss to construct the online model updating procedure, which involves a sequence of filtering operations for online maximum likelihood-based parameter estimation. We empirically evaluate the effectiveness of the proposed method, and compare it with supervised learning methods in simulation settings including a multivariate linear Gaussian state-space model and a simulated object tracking experiment.	Translated: 2023-12-12 17:51:34 Published: 2023-12-10
# RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging ( http://arxiv.org/abs/2312.05953v1 ) License: see link	Zelong Liu, Alexander Zhou, Arnold Yang, Alara Yilmaz, Maxwell Yoo, Mikey Sullivan, Catherine Zhang, James Grant, Daiqing Li, Zahi A. Fayad, Sean Huver, Timothy Deyer, Xueyan Mei	Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, the first multi-modal radiologic data generator, which was developed by training StyleGAN-XL on the real RadImageNet dataset of 102,774 patients. RadImageGAN can generate high-resolution synthetic medical imaging datasets across 12 anatomical regions and 130 pathological classes in 3 modalities. Furthermore, we demonstrate that RadImageGAN generators can be utilized with BigDatasetGAN to generate multi-class pixel-wise annotated paired synthetic images and masks for diverse downstream segmentation tasks with minimal manual annotation. We show that using synthetic auto-labeled data from RadImageGAN can significantly improve performance on four diverse downstream segmentation datasets by augmenting real training data and/or developing pre-trained weights for fine-tuning. This shows that RadImageGAN combined with BigDatasetGAN can improve model performance and address data scarcity while reducing the resources needed for annotations for segmentation tasks.	Translated: 2023-12-12 17:51:19 Published: 2023-12-10
# Exploring Non-perturbative Corrections in Thermodynamics of Static Dirty Black Holes ( http://arxiv.org/abs/2312.05948v1 ) License: see link	Saheb Soroushfar, Behnam Pourhassan, and İzzet Sakallı	This study presents an investigation into the thermodynamic properties of a dirty black hole immersed in a uniform electric field within the framework of the Einstein-Nonlinear Electrodynamics (ENE)-dilaton theory. The analysis delves into various thermodynamic aspects, including heat capacity, Helmholtz free energy, and internal energy, providing insights into the behavior of the black hole under the influence of the electric field. Furthermore, the article explores the intricate interplay between quantum effects and thermodynamic behavior through the examination of quantum-corrected entropy. The study aims to shed light on the non-perturbative corrections that arise in this complex system, offering a comprehensive understanding of the modified thermodynamics of dirty black holes within the specified theoretical framework.	Translated: 2023-12-12 17:50:51 Published: 2023-12-10
# Uncertainty Propagation through Trained Deep Neural Networks Using Factor Graphs ( http://arxiv.org/abs/2312.05946v1 ) License: see link	Angel Daruna, Yunye Gong, Abhinav Rajvanshi, Han-Pang Chiu, Yi Yao	Predictive uncertainty estimation remains a challenging problem precluding the use of deep neural networks as subsystems within safety-critical applications. Aleatoric uncertainty is a component of predictive uncertainty that cannot be reduced through model improvements. Uncertainty propagation seeks to estimate aleatoric uncertainty by propagating input uncertainties to network predictions. Existing uncertainty propagation techniques use one-way information flows, propagating uncertainties layer-by-layer or across the entire neural network while relying either on sampling or analytical techniques for propagation. Motivated by the complex information flows within deep neural networks (e.g. skip connections), we developed and evaluated a novel approach by posing uncertainty propagation as a non-linear optimization problem using factor graphs. We observed statistically significant improvements in performance over prior work when using factor graphs across most of our experiments that included three datasets and two neural network architectures. Our implementation balances the benefits of sampling and analytical propagation techniques, which we believe is a key factor in achieving performance improvements.	Translated: 2023-12-12 17:50:40 Published: 2023-12-10
# ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering ( http://arxiv.org/abs/2312.05941v1 ) License: see link	Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann	Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.	Translated: 2023-12-12 17:50:24 Published: 2023-12-10
# Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs ( http://arxiv.org/abs/2312.05934v1 ) License: see link	Oded Ovadia, Menachem Brief, Moshik Mishaeli, Oren Elisha	Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited, relying heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information or refine the capabilities of LLMs on previously seen information poses a significant challenge. In this study, we compare two common approaches: fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.	Translated: 2023-12-12 17:50:01 Published: 2023-12-10
# Temporal Supervised Contrastive Learning for Modeling Patient Risk Progression ( http://arxiv.org/abs/2312.05933v1 ) License: see link	Shahriar Noroozizadeh, Jeremy C. Weiss, George H. Chen	We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient's data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to data augmentation, a key ingredient of contrastive learning, which lacks a standard procedure that is adequately realistic for clinical tabular data, to our knowledge. We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.	Translated: 2023-12-12 17:49:44 Published: 2023-12-10
# A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis ( http://arxiv.org/abs/2312.05930v1 ) License: see link	Linxi Zhao, Jiankai Tang, Dongyu Chen, Xiaohong Liu, Yong Zhou, Guangyu Wang, Yuntao Wang	Nailfold capillaroscopy is a well-established method for assessing health conditions, but the potential of automated medical image analysis using machine learning remains untapped despite recent advancements. In this study, we construct a comprehensive dataset (321 images, 219 videos, and 68 clinic reports, with expert annotations) that serves as a crucial resource for training deep-learning models. Leveraging this dataset, we propose an end-to-end nailfold capillary analysis pipeline capable of automatically detecting and measuring diverse morphological and dynamic features. Experimental results demonstrate sub-pixel measurement accuracy and 90% accuracy in predicting abnormality portions, highlighting its potential for advancing quantitative medical research and enabling pervasive computing in healthcare. We have released our open-source code and data (available at https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary) to contribute to progress in computational medical image analysis.	Translated: 2023-12-12 17:49:18 Published: 2023-12-10
# AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer ( http://arxiv.org/abs/2312.05928v1 ) License: see link	Joonwoo Kwon, Sooyoung Kim, Yuewei Lin, Shinjae Yoo, Jiook Cha	Neural style transfer (NST) has evolved significantly in recent years. Yet, despite its rapid progress and advancement, existing NST methods either struggle to transfer aesthetic information from a style effectively or suffer from high computational costs and inefficiencies in feature disentanglement due to using pre-trained models. This work proposes a lightweight but effective model, AesFA (Aesthetic Feature-Aware NST). The primary idea is to decompose the image via its frequencies to better disentangle aesthetic styles from the reference image while training the entire model in an end-to-end manner to exclude pre-trained models at inference completely. To improve the network's ability to extract more distinct representations and further enhance the stylization quality, this work introduces a new aesthetic feature: contrastive loss. Extensive experiments and ablations show the approach not only outperforms recent NST methods in terms of stylization quality, but it also achieves faster inference. Code is available at https://github.com/Sooyyoungg/AesFA.	Translated: 2023-12-12 17:48:58 Published: 2023-12-10
# Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks ( http://arxiv.org/abs/2312.05925v1 ) License: see link	Jannik Sheikh, Andrew Melnik, Gora Chand Nandi, Robert Haschke	Reinforcement learning and imitation learning approaches rely on policy learning strategies that generalize poorly from just a few examples of a task. In this work, we propose a language-conditioned semantic search-based method to produce an online search-based policy from the available demonstration dataset of state-action trajectories. Here we directly acquire actions from the most similar manipulation trajectories found in the dataset. Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. This holds great potential for expanding the use of our online search-based policy approach to tasks typically addressed by imitation learning or reinforcement learning-based policies.	Translated: 2023-12-12 17:48:36 Published: 2023-12-10
# Modifying RL Policies with Imagined Actions: How Predictable Policies Can Enable Users to Perform Novel Tasks ( http://arxiv.org/abs/2312.05991v1 ) License: see link	Isaac Sheidlower, Reuben Aronson, Elaine Short	It is crucial that users are empowered to use the functionalities of a robot to creatively solve problems on the fly. A user who has access to a Reinforcement Learning (RL) based robot may want to use the robot's autonomy and their knowledge of its behavior to complete new tasks. One way is for the user to take control of some of the robot's action space through teleoperation while the RL policy simultaneously controls the rest. However, an out-of-the-box RL policy may not readily facilitate this. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act in a way the user is not familiar with, hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions (IODA), an initial algorithm for addressing that problem and empowering users to leverage their expectation of a robot's behavior to accomplish new tasks.	Translated: 2023-12-12 17:43:52 Published: 2023-12-10
# Constructing Vec-tionaries to Extract Latent Message Features from Texts: A Case Study of Moral Appeals ( http://arxiv.org/abs/2312.05990v1 ) License: see link	Zening Duan, Anqi Shao, Yicheng Hu, Heysung Lee, Xining Liao, Yoo Ji Suh, Jisoo Kim, Kai-Cheng Yang, Kaiping Chen, and Sijia Yang	While communication research frequently studies latent message features like moral appeals, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for the original applications. In this paper, we present a novel approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings through nonlinear optimization. By harnessing semantic relationships encoded by embeddings, vec-tionaries improve the measurement of latent message features by expanding the applicability of original vocabularies to other contexts. Vec-tionaries can also help extract semantic information from texts, especially those in short format, beyond the original vocabulary of a dictionary. Importantly, a vec-tionary can produce additional metrics to capture the valence and ambivalence of a latent feature beyond its strength in texts. Using moral appeals in COVID-19-related tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process posts missed by dictionary methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, additional metrics from the moral foundations vec-tionary unveiled unique insights that facilitated predicting outcomes such as message retransmission.	Translated: 2023-12-12 17:43:37 Published: 2023-12-10
# A Note on the Convergence of Denoising Diffusion Probabilistic Models ( http://arxiv.org/abs/2312.05989v1 ) License: see link	Sokhna Diarra Mbacke, Omar Rivasplata	Diffusion models are one of the most important families of deep generative models. In this note, we derive a quantitative upper bound on the Wasserstein distance between the data-generating distribution and the distribution learned by a diffusion model. Unlike previous works in this field, our result does not make assumptions on the learned score function. Moreover, our bound holds for arbitrary data-generating distributions on bounded instance spaces, even those without a density w.r.t. the Lebesgue measure, and the upper bound does not suffer from exponential dependencies. Our main result builds upon the recent work of Mbacke et al. (2023) and our proofs are elementary.	Translated: 2023-12-12 17:43:12 Published: 2023-12-10
# 反復変形学習による乳児脳MRIからの球面形状を持つ皮質表面の再構成 Reconstruction of Cortical Surfaces with Spherical Topology from Infant Brain MRI via Recurrent Deformation Learning ( http://arxiv.org/abs/2312.05986v1 ) ライセンス: Link先を確認	Xiaoyang Chen, Junjie Zhao, Siyuan Liu, Sahar Ahmad, Pew-Thian Yap	(参考訳) MRIからの皮質表面再構成(CSR)は、脳の構造と機能を研究する鍵となる。近年のディープラーニングアプローチはCSRの速度を大幅に向上させたが、下流の幾何学的解析を容易にするために、皮質を位相的に正しい球面多様体にマッピングするためには、かなりのランタイムが必要である。さらに、このマッピングは、表面メッシュのトポロジーが球面とホモトピーである場合にのみ可能である。本稿では,数秒以内に効率的にCSRと球面マッピングを同時に行う手法を提案する。提案手法は,2つのサブネットワークをシームレスに接続し,白色表面生成を行う。残留微分同相変形を反復的に学習し, メッシュトポロジーと均一性を保ちながら, 球面テンプレートメッシュを白色およびピアル面に徐々にワープする。テンプレート球面と皮質面の間の1対1の頂点対応により、凸性や曲率といった幾何学的特徴を球面に簡単に直接マッピングでき、可視化や下流処理が可能となる。乳児期脳MRIに対するアプローチの有効性を実証し,初生後1年間の急速な脳発達に伴う組織コントラストの変化により,CSRに重大な課題を提起した。 0～12ヶ月の幼児のデータセットに基づく性能評価の結果,本手法はメッシュの正則性を大幅に向上し,幾何学的誤差を低減し,高度な計算効率を維持しつつ,最先端のディープラーニングアプローチよりも優れていた。 Cortical surface reconstruction (CSR) from MRI is key to investigating brain structure and function. While recent deep learning approaches have significantly improved the speed of CSR, a substantial amount of runtime is still needed to map the cortex to a topologically-correct spherical manifold to facilitate downstream geometric analyses. Moreover, this mapping is possible only if the topology of the surface mesh is homotopic to a sphere. Here, we present a method for simultaneous CSR and spherical mapping efficiently within seconds. Our approach seamlessly connects two sub-networks for white and pial surface generation. Residual diffeomorphic deformations are learned iteratively to gradually warp a spherical template mesh to the white and pial surfaces while preserving mesh topology and uniformity. The one-to-one vertex correspondence between the template sphere and the cortical surfaces allows easy and direct mapping of geometric features like convexity and curvature to the sphere for visualization and downstream processing. 
We demonstrate the efficacy of our approach on infant brain MRI, which poses significant challenges to CSR due to tissue contrast changes associated with rapid brain development during the first postnatal year. Performance evaluation based on a dataset of infants from 0 to 12 months demonstrates that our method substantially enhances mesh regularity and reduces geometric errors, outperforming state-of-the-art deep learning approaches, all while maintaining high computational efficiency.	翻訳日:2023-12-12 17:42:59 公開日:2023-12-10
# 段階的導入を伴う差分の差分法のための融合拡張二方向固定効果 Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions ( http://arxiv.org/abs/2312.05985v1 ) ライセンス: Link先を確認	Gregory Faletto	(参考訳) 段階的導入下での差分の差分法に対する正準二方向固定効果推定器のバイアスに対処するため、Wooldridge (2021) は拡張二方向固定効果推定器を提案し、多くのパラメータを追加した。しかし、これは効率を低下させる。これらのパラメータのいくつかを等しく制限することは役に立つが、アドホックな制限はバイアスを再導入する可能性がある。本研究では,これらの制約の自動データ駆動選択を可能にする,単一チューニングパラメータによる融合拡張二方向固定効果(FETWFE)を用いた機械学習推定器を提案する。 FETWFEは適切なスパース性の仮定の下で、1に収束する確率で正しい制約を識別することを証明する。また,条件付きまたは周辺的な平行トレンドの下で,2種類の異質な周辺処置効果推定量に対するFETWFEの一致性,漸近正規性,オラクル効率性を証明し,条件付き平行トレンドの下で2種類の条件付き平均処置効果に対する一致性を証明する。シミュレーション研究と実証応用におけるFETWFEの実例を示す。 To address the bias of the canonical two-way fixed effects estimator for difference-in-differences under staggered adoptions, Wooldridge (2021) proposed the extended two-way fixed effects estimator, which adds many parameters. However, this reduces efficiency. Restricting some of these parameters to be equal helps, but ad hoc restrictions may reintroduce bias. We propose a machine learning estimator with a single tuning parameter, fused extended two-way fixed effects (FETWFE), that enables automatic data-driven selection of these restrictions. We prove that under an appropriate sparsity assumption FETWFE identifies the correct restrictions with probability tending to one. We also prove the consistency, asymptotic normality, and oracle efficiency of FETWFE for two classes of heterogeneous marginal treatment effect estimators under either conditional or marginal parallel trends, and we prove consistency for two classes of conditional average treatment effects under conditional parallel trends. We demonstrate FETWFE in simulation studies and an empirical application.	翻訳日:2023-12-12 17:42:35 公開日:2023-12-10
# ハイブリッドニューラルフィールドのための高精度微分作用素 Accurate Differential Operators for Hybrid Neural Fields ( http://arxiv.org/abs/2312.05984v1 ) ライセンス: Link先を確認	Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan	(参考訳) ニューラルネットワークは、形状表現からニューラルレンダリング、偏微分方程式(PDE)の解法など、様々な分野で広く使われている。小さなMLPと明示的な表現を活用するInstant NGPのようなハイブリッドニューラルネットワーク表現の出現により、これらのモデルは迅速にトレーニングされ、大きなシーンに適合する。しかし、レンダリングやシミュレーションのような多くのアプリケーションでは、ハイブリッドニューラルネットワークは顕著で不合理なアーティファクトを引き起こす可能性がある。これは、これらの下流アプリケーションに必要な正確な空間微分が得られないためである。本研究では,これらの課題を回避する2つの方法を提案する。我々の最初のアプローチは、事前訓練されたハイブリッドニューラルネットワークからより正確な導出を得るために局所多項式フィッティングを使用するポストホック演算子である。さらに,初期信号を維持しながら正確な導関数を直接生成するために,神経場を洗練する自己教師付き微調整手法を提案する。提案手法のレンダリング, 衝突シミュレーション, PDE の解法への応用について述べる。提案手法を用いることで, より正確な導関数が得られ, アーティファクトが低減され, 下流のアプリケーションでより正確なシミュレーションがもたらされる。 Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybrid neural fields can cause noticeable and unreasonable artifacts. This is because they do not yield accurate spatial derivatives needed for these downstream applications. In this work, we propose two ways to circumvent these challenges. Our first approach is a post hoc operator that uses local polynomial-fitting to obtain more accurate derivatives from pre-trained hybrid neural fields. Additionally, we also propose a self-supervised fine-tuning approach that refines the neural field to yield accurate derivatives directly while preserving the initial signal. We show the application of our method on rendering, collision simulation, and solving PDEs. 
We observe that using our approach yields more accurate derivatives, reducing artifacts and leading to more accurate simulations in downstream applications.	翻訳日:2023-12-12 17:42:02 公開日:2023-12-10
# 電気自動車充電ステーションの最適位置の最大流量に基づく定式化 Maximum flow-based formulation for the optimal location of electric vehicle charging stations ( http://arxiv.org/abs/2312.05980v1 ) ライセンス: Link先を確認	Pierre-Luc Parent and Margarida Carvalho and Miguel F. Anjos and Ribal Atallah	(参考訳) 気候変動の影響が増すにつれ、化石燃料から遠ざかる緊急性はこれまで以上に大きくなっている。電気自動車(ev)は、これらの効果を減少させる一つの方法であるが、その普及は充電ステーションが不十分なため、しばしば制限される。この作業では、ユーザの満足度(と充電ステーションの可用性)の観点から、より優れたサービスの質を提供するために、ev充電ステーションのインフラストラクチャを拡張することを目標としています。特に我々の焦点は都市部に向けられている。まず, ステーションへのEV充電需要の配分モデルを提案し, 最大流量問題として検討した。このモデルは、所定の課金インフラストラクチャによるユーザ満足度の評価の基礎となる。第2に,混合整数線形プログラムに最大流量モデルを導入することで,新しい駅の開設や追加出口による容量拡大に関する決定を行う。実世界のシナリオを扱うための我々のアプローチのスケーラビリティを実証し、モントリオール市の方法論を紹介します。現実的な事例を解く際には,充電需要の空間的変化と時間的変化の両方を考慮することが有意義であると結論づける。 With the increasing effects of climate change, the urgency to step away from fossil fuels is greater than ever before. Electric vehicles (EVs) are one way to diminish these effects, but their widespread adoption is often limited by the insufficient availability of charging stations. In this work, our goal is to expand the infrastructure of EV charging stations, in order to provide a better quality of service in terms of user satisfaction (and availability of charging stations). Specifically, our focus is directed towards urban areas. We first propose a model for the assignment of EV charging demand to stations, framing it as a maximum flow problem. This model is the basis for the evaluation of user satisfaction with a given charging infrastructure. Secondly, we incorporate the maximum flow model into a mixed-integer linear program, where decisions on the opening of new stations and on the expansion of their capacity through additional outlets are accounted for. We showcase our methodology for the city of Montreal, demonstrating the scalability of our approach to handle real-world scenarios. We conclude that considering both spatial and temporal variations in charging demand is meaningful when solving realistic instances.	
翻訳日:2023-12-12 17:41:33 公開日:2023-12-10
# novacomet: シンボリック知識蒸留を伴うオープンコモンセンス基礎モデル NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation ( http://arxiv.org/abs/2312.05979v1 ) ライセンス: Link先を確認	Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi	(参考訳) novacometはオープン・コモンセンス・ナレッジ・モデルで、知識と一般的なタスク・モデルの最良の側面を組み合わせたものです。従来の知識モデルと比較すると、NovaCOMETは推論タスクへの直接適用を可能にするオープンフォーマットのリレーションシップを可能にしており、Flan-T5のような一般的なタスクモデルと比較して、知識を明示的に中心とし、常識推論の優れたパフォーマンスを実現する。 NovaCOMETは、不透明なプロプライエタリモデルの知識を活用して、オープンな知識パイプラインを作成する。第一に、知識は象徴的にNovATOMICに蒸留され、これは、監査、批判、フィルタリングが可能な公開リリースの個別知識グラフである。次に、NovaCOMETをNovATOMIC上で訓練し、オープンソースの事前学習モデルを微調整する。 NovaCOMETはオープンフォーマットのトレーニング目標を使用して、過去の知識モデルの固定された関係セットを置き換えることで、データ内の任意の構造が入力や出力として機能できるようにする。生成された生成モデルは、オプションで人間のアノテーションで拡張され、さまざまなコモンセンス生成タスクでFlan-T5のようなオープンタスクモデルと一致するか、超える。 NovaCOMETは、命令チューニングのみに焦点を合わせ、コモンセンス知識を明示的にモデル化する上で、明確な利点を示す。 We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs. 
The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.	翻訳日:2023-12-12 17:41:01 公開日:2023-12-10
# 高速ブラッグピーク解析のためのニューラルアーキテクチャ協調設計 Neural Architecture Codesign for Fast Bragg Peak Analysis ( http://arxiv.org/abs/2312.05978v1 ) ライセンス: Link先を確認	Luke McDermott, Jason Weitz, Dmitri Demler, Daniel Cummings, Nhan Tran, Javier Duarte	(参考訳) 高エネルギー回折顕微鏡で高速かつリアルタイムブラッグピーク解析を行うために,ニューラルアーキテクチャの協調設計を合理化する自動パイプラインを開発した。従来のアプローチ、特に擬似Voigtフィッティングは重要な計算資源を必要とし、より効率的なソリューションのためのディープラーニングモデルへの関心を喚起した。我々の手法では、ハードウェアコストを含むこれらのモデルを強化するためにニューラルアーキテクチャ検索とAutoMLを使用し、よりハードウェア効率の良いニューラルアーキテクチャの発見に繋がる。その結果,従来の最先端技術と同等の性能を保ちつつ,ビット演算の13$\times$削減を実現した。量子化・アウェアトレーニングやニューラルネットワークのプルーニングといったモデル圧縮技術により、さらなるスピードアップを示す。さらに、階層的な検索空間は最適化の柔軟性を高め、他のタスクやドメインにも簡単に拡張できます。 We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in high-energy diffraction microscopy. Traditional approaches, notably pseudo-Voigt fitting, demand significant computational resources, prompting interest in deep learning models for more efficient solutions. Our method employs neural architecture search and AutoML to enhance these models, including hardware costs, leading to the discovery of more hardware-efficient neural architectures. Our results match the performance, while achieving a 13$\times$ reduction in bit operations compared to the previous state-of-the-art. We show further speedup through model compression techniques such as quantization-aware-training and neural network pruning. Additionally, our hierarchical search space provides greater flexibility in optimization, which can easily extend to other tasks and domains.	翻訳日:2023-12-12 17:40:27 公開日:2023-12-10
# 国間における人工メディアの検出に関する代表的研究 A Representative Study on Human Detection of Artificially Generated Media Across Countries ( http://arxiv.org/abs/2312.05976v1 ) ライセンス: Link先を確認	Joel Frank, Franziska Herbert, Jonas Ricker, Lea Schönherr, Thorsten Eisenhofer, Asja Fischer, Markus Dürmuth, Thorsten Holz	(参考訳) AI生成メディアは、私たちが知っているデジタル社会にとって脅威となっている。これらの偽造物は、公開技術に基づいて、かつ大規模に自動的に作成することができる。この課題を認識した学者や実践者は,そのような人工メディアを検出するための自動検出戦略を多数提案している。しかし、これらの技術的進歩とは対照的に、生成されたメディアに対する人間の認識は、まだ十分に研究されていない。本稿では,この研究ギャップを埋めることを目的とする。我々は,3カ国(米国,ドイツ,中国)で音声,画像,テキストメディアの3,002人を対象に,生成メディアを検出する人々の能力に関する総合的な調査を行った。以上の結果から,最先端の偽造品は「リアル」メディアとほとんど区別できないことが示唆された。さらに、AIによって生成されたメディアは、あらゆるメディアタイプとすべての国で、より人間らしいと評価される。生成メディアの検出能力に影響を及ぼす要因をさらに理解するため,ディープフェイクやフェイクニュース研究の分野における文献レビューに基づいて選択した個人変数を含む。回帰分析の結果, 総合的信頼, 認知的リフレクション, およびディープフェイクに対する自己報告的親和性は, 全メディアカテゴリーの参加者の判断に大きく影響した。 AI-generated media has become a threat to our digital society as we know it. These forgeries can be created automatically and on a large scale based on publicly available technology. Recognizing this challenge, academics and practitioners have proposed a multitude of automatic detection strategies to detect such artificial media. However, in contrast to these technical advances, the human perception of generated media has not been thoroughly studied yet. In this paper, we aim at closing this research gap. We perform the first comprehensive survey into people's ability to detect generated media, spanning three countries (USA, Germany, and China) with 3,002 participants across audio, image, and text media. Our results indicate that state-of-the-art forgeries are almost indistinguishable from "real" media, with the majority of participants simply guessing when asked to rate them as human- or machine-generated. In addition, AI-generated media is voted more human-like across all media types and all countries. 
To further understand which factors influence people's ability to detect generated media, we include personal variables, chosen based on a literature review in the domains of deepfake and fake news research. In a regression analysis, we found that generalized trust, cognitive reflection, and self-reported familiarity with deepfakes significantly influence participants' decisions across all media categories.	翻訳日:2023-12-12 17:39:36 公開日:2023-12-10
# FM-G-CAM:コンピュータビジョンにおける説明可能なAIの全体的アプローチ FM-G-CAM: A Holistic Approach for Explainable AI in Computer Vision ( http://arxiv.org/abs/2312.05975v1 ) ライセンス: Link先を確認	Ravidu Suien Rammuni Silva, Jordan J. Bird	(参考訳) 説明可能性(Explainability)は、現実世界のインパクトとユーザビリティに不可欠な、現代のAIの側面である。本研究の目的は,コンピュータビジョンモデル,特に畳み込みニューラルネットワーク(CNN)に基づくモデルの予測を理解する必要性を強調することである。既存のCNN予測法は、主にグラディエント重み付きクラスアクティベーションマップ(Grad-CAM)に基づいており、単一のターゲットクラスのみに焦点を当てている。対象クラス選択の観点から予測過程を仮定し,予測者のcnnモデルの思考過程の大部分を無視することを示した。本稿では,複数のトップ予測クラスを考慮したfm-g-cam(fused multi-class gradient-weighted class activation map)と呼ばれる徹底的な手法を提案する。また,本手法の詳細な数学的およびアルゴリズム的記述も提供する。さらに,既存の手法の簡潔な比較とともに,FM-G-CAMとGrad-CAMを比較し,現実の実践的ユースケースによるメリットを強調した。最後に,FM-G-CAMを実装したオープンソースのPythonライブラリを提案する。 Explainability is an aspect of modern AI that is vital for impact and usability in the real world. The main objective of this paper is to emphasise the need to understand the predictions of Computer Vision models, specifically Convolutional Neural Network (CNN) based models. Existing methods of explaining CNN predictions are mostly based on Gradient-weighted Class Activation Maps (Grad-CAM) and solely focus on a single target class. We show that from the point of the target class selection, we make an assumption on the prediction process, hence neglecting a large portion of the predictor CNN model's thinking process. In this paper, we present an exhaustive methodology called Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM) that considers multiple top predicted classes, which provides a holistic explanation of the predictor CNN's thinking rationale. We also provide a detailed and comprehensive mathematical and algorithmic description of our method. Furthermore, along with a concise comparison of existing methods, we compare FM-G-CAM with Grad-CAM, highlighting its benefits through real-world practical use cases. 
Finally, we present an open-source Python library with FM-G-CAM implementation to conveniently generate saliency maps for CNN-based model predictions.	翻訳日:2023-12-12 17:38:39 公開日:2023-12-10
# 潜在ノードと構造騒音下におけるネットワーク力学系の因果構造学習 Learning the Causal Structure of Networked Dynamical Systems under Latent Nodes and Structured Noise ( http://arxiv.org/abs/2312.05974v1 ) ライセンス: Link先を確認	Augusto Santos, Diogo Rente, Rui Seabra and José M. F. Moura	(参考訳) 本稿では,線形ネットワーク型力学系(NDS)の隠れ因果ネットワークを,そのノードの一部の時系列データから学習する。 NDSのダイナミクスは、一対のノード間で急激な関連を生み出す色付きノイズによって駆動され、問題をはるかに難しくする。ノイズ相関と部分可観測性の課題に対処するため,観測ノードの時系列データから計算した特徴ベクトルを各ノードに割り当てる。特徴の集合を一貫して分割するアフィン超平面が存在し、接続されたノードのペアに対応する特徴ベクトルと非連結なペアに対応するものとを分離する。従って因果推論問題は、設計された特徴をクラスタリングすることで解決される。単純なベースライン教師付き手法を用いて,実世界ネットワークを含む広帯域接続環境と雑音相関レベル下での因果推論機構の競合性能を実証する。さらに,線形NDSにおける構造整合性の新たな技術的保証を考察した。 This paper considers learning the hidden causal network of a linear networked dynamical system (NDS) from the time series data at some of its nodes -- partial observability. The dynamics of the NDS are driven by colored noise that generates spurious associations across pairs of nodes, rendering the problem much harder. To address the challenge of noise correlation and partial observability, we assign to each pair of nodes a feature vector computed from the time series data of observed nodes. The feature embedding is engineered to yield structural consistency: there exists an affine hyperplane that consistently partitions the set of features, separating the feature vectors corresponding to connected pairs of nodes from those corresponding to disconnected pairs. The causal inference problem is thus addressed via clustering the designed features. We demonstrate with simple baseline supervised methods the competitive performance of the proposed causal inference mechanism under broad connectivity regimes and noise correlation levels, including a real world network. Further, we devise novel technical guarantees of structural consistency for linear NDS under the considered regime.	翻訳日:2023-12-12 17:38:15 公開日:2023-12-10
# 参照のない3次元クラウド品質評価のための周波数とViTの活性化 Activating Frequency and ViT for 3D Point Cloud Quality Assessment without Reference ( http://arxiv.org/abs/2312.05972v1 ) ライセンス: Link先を確認	Oussama Messai, Abdelouahid Bentamou, Abbass Zein-Eddine, Yann Gavet	(参考訳) 深層学習に基づく品質評価は、知覚的マルチメディア品質評価を著しく向上させたが、3dポイントクラウド(pcs)のような3dビジュアルデータはまだ初期段階にある。 3D-PCの容量が大きいため、このような量は送信や視聴のために頻繁に圧縮され、品質に影響を及ぼす可能性がある。そこで我々は,与えられた3D-PCの非参照品質指標を提案する。幾何や色に着目した既存手法と比較して, 圧縮による空間劣化パターンの指標として, 周波数等級を統合することを提案する。入力属性を品質スコアにマップするには、変形可能な畳み込みネットワーク(dcn)と視覚トランスフォーマー(vit)を組み合わせた軽量ハイブリッドディープモデルを用いる。 icip20 [1]、pointxr [2] dataset、basics [3]と呼ばれる新しいデータセットで実験が行われている。その結果,本手法は現在のNR-PCQA測度やPointXRのFR-PCQAよりも優れていた。実装コードはhttps://github.com/o-messai/3D-PCQA。 Deep learning-based quality assessments have significantly enhanced perceptual multimedia quality assessment; however, this work is still in its early stages for 3D visual data such as 3D point clouds (PCs). Due to the high volume of 3D-PCs, such quantities are frequently compressed for transmission and viewing, which may affect perceived quality. Therefore, we propose a no-reference quality metric for a given 3D-PC. Compared to existing methods that mostly focus on geometry or color aspects, we propose integrating frequency magnitudes as an indicator of spatial degradation patterns caused by the compression. To map the input attributes to a quality score, we use a light-weight hybrid deep model that combines a Deformable Convolutional Network (DCN) and a Vision Transformer (ViT). Experiments are carried out on ICIP20 [1], PointXR [2] dataset, and a new big dataset called BASICS [3]. The results show that our approach outperforms state-of-the-art NR-PCQA measures and even some FR-PCQA on PointXR. The implementation code can be found at: https://github.com/o-messai/3D-PCQA	翻訳日:2023-12-12 17:37:59 公開日:2023-12-10
# 境界場を持つスピン-$\frac{1}{2}$ XXZ鎖におけるスピン分数化とゼロモード Spin fractionalization and zero modes in the spin-$\frac{1}{2}$ XXZ chain with boundary fields ( http://arxiv.org/abs/2312.05970v1 ) ライセンス: Link先を確認	Parameshwar R. Pasnoori, Yicheng Tang, Junhyun Lee, J. H. Pixley, Natan Andrei, Patrick Azaria	(参考訳) この研究において、境界磁場を持つガッピング相における反強磁性スピン $\frac{1}{2}$ xxz 鎖は、その端に分数スピン $\frac{1}{4}$ を持つと主張する。ベーテ・アンザッツと密度行列再正規化群の組み合わせを用いて、これらの分数スピンは基底と第1の励起状態の両方においてシャープな量子観測可能であり、関連する分数スピン作用素の分散はゼロであることを示す。零端場の極限において、これらの分数スピン作用素はかつて基底状態と第1励起状態によって広がる低エネルギー部分空間に射影され、P. Fendley \cite{Fendley} によって発見された強い零エネルギーモードと同一視される。 In this work we argue that the antiferromagnetic spin $\frac{1}{2}$ XXZ chain in the gapped phase with boundary magnetic fields hosts fractional spin $\frac{1}{4}$ at its edges. Using a combination of Bethe ansatz and the density matrix renormalization group we show that these fractional spins are sharp quantum observables in both the ground and the first excited state as the associated fractional spin operators have zero variance. In the limit of zero edge fields, we argue that these fractional spin operators once projected onto the low energy subspace spanned by the ground state and the first excited state, identify with the strong zero energy mode discovered by P. Fendley \cite{Fendley}.	翻訳日:2023-12-12 17:37:38 公開日:2023-12-10
# 跳躍する手術用コンピュータビジョン Jumpstarting Surgical Computer Vision ( http://arxiv.org/abs/2312.05968v1 ) ライセンス: Link先を確認	Deepak Alapatt, Aditya Murali, Vinkle Srivastav, Pietro Mascagni, AI4SafeChole Consortium, Nicolas Padoy	(参考訳) 目的: 研究者と業界の間での一般的なコンセンサスでは、大規模な注釈付きデータセットの欠如が、外科データ科学の分野における進歩の最大の障害であることを示している。自己教師型学習は、この問題の一部に対する解決策であり、アノテーションへの依存を取り除く。しかし,現在の自己教師あり学習法の領域シフトへの頑健性はいまだ不明であり,多種多様な手術データを活用するための有用性の理解は限られている。方法: 本研究では, 様々な手術用データセットを柔軟に活用するために, 自己教師付き学習を用いて, 様々な手術下処理に使用できるタスク非依存表現を学習する。本研究は,下流作業性能に対するプレトレーニングの影響を明らかにするために,ソース病院,外科手術の種類,トレーニング前の規模(動画数)の3変数を調整し,22種類のプレトレーニングデータセットの組み合わせを探索する。次に, 腹腔鏡下胆嚢摘出術における位相認識と安全性の重要視, 腹腔鏡下子宮摘出術における位相認識の3つの課題について検討した。結果: コントロールされた実験は、さまざまなタスク、データセット、ラベリング予算におけるパフォーマンスの大幅な向上を強調する。しかしながら、このパフォーマンスは、いくつかの研究段階を通じて堅牢に証明された事前学習データセットの構成と複雑に結びついている。結論: 事前トレーニングデータセットの構成は,さまざまなダウンストリームタスクに対するSSLメソッドの有効性に大きく影響し,SSL方法論の適用拡大に向けた今後のデータ収集の取り組みを批判的に伝える必要がある。キーワード:自己監督学習、移行学習、手術用コンピュータビジョン、内視鏡映像、安全性の批判的視点、位相認識 Purpose: General consensus amongst researchers and industry points to a lack of large, representative annotated datasets as the biggest obstacle to progress in the field of surgical data science. Self-supervised learning represents a solution to part of this problem, removing the reliance on annotations. However, the robustness of current self-supervised learning methods to domain shifts remains unclear, limiting our understanding of its utility for leveraging diverse sources of surgical data. Methods: In this work, we employ self-supervised learning to flexibly leverage diverse surgical datasets, thereby learning task-agnostic representations that can be used for various surgical downstream tasks. Based on this approach, to elucidate the impact of pre-training on downstream task performance, we explore 22 different pre-training dataset combinations by modulating three variables: source hospital, type of surgical procedure, and pre-training scale (number of videos). 
We then finetune the resulting model initializations on three diverse downstream tasks: namely, phase recognition and critical view of safety in laparoscopic cholecystectomy and phase recognition in laparoscopic hysterectomy. Results: Controlled experimentation highlights sizable boosts in performance across various tasks, datasets, and labeling budgets. However, this performance is intricately linked to the composition of the pre-training dataset, robustly proven through several study stages. Conclusion: The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks and should critically inform future data collection efforts to scale the application of SSL methodologies. Keywords: Self-Supervised Learning, Transfer Learning, Surgical Computer Vision, Endoscopic Videos, Critical View of Safety, Phase Recognition	翻訳日:2023-12-12 17:37:23 公開日:2023-12-10
# 教育用AIのマルチモーダリティ : 汎用人工知能を目指して Multimodality of AI for Education: Towards Artificial General Intelligence ( http://arxiv.org/abs/2312.06037v1 ) ライセンス: Link先を確認	Gyeong-Geon Lee, Lehong Shi, Ehsan Latif, Yizhu Gao, Arne Bewersdorf, Matthew Nyaaba, Shuchen Guo, Zihao Wu, Zhengliang Liu, Hui Wang, Gengchen Mai, Tiaming Liu, and Xiaoming Zhai	(参考訳) 本稿では,マルチモーダル人工知能(AI)アプローチが,教育的文脈における人工知能(AGI)の実現に向けてどのように進んでいるのかを包括的に検討する。教育システムにおけるAIの進化と統合を精査し、聴覚、視覚、審美、言語的な学習様式を含むマルチモーダルの重要な役割を強調している。この研究は、認知フレームワーク、高度な知識表現、適応学習機構、戦略的計画、洗練された言語処理、多様なマルチモーダルデータソースの統合など、AGIの重要な側面を深く掘り下げている。教育パラダイムの改革におけるAGIの変革的ポテンシャルを批判的に評価し、教育と学習の有効性の向上、既存の方法論のギャップを埋めること、教育環境における倫理的配慮とAGIの責任ある利用に対処することに焦点を当てている。本稿は、AGI開発における今後の方向性と課題に関する洞察を提供する、教育におけるマルチモーダルAIの役割の意味についても論じる。この調査は、AIとマルチモダリティ、教育の交わりの微妙な理解を提供することを目的としており、AGIにおける将来の研究と開発の基礎を確立している。 This paper presents a comprehensive examination of how multimodal artificial intelligence (AI) approaches are paving the way towards the realization of Artificial General Intelligence (AGI) in educational contexts. It scrutinizes the evolution and integration of AI in educational systems, emphasizing the crucial role of multimodality, which encompasses auditory, visual, kinesthetic, and linguistic modes of learning. This research delves deeply into the key facets of AGI, including cognitive frameworks, advanced knowledge representation, adaptive learning mechanisms, strategic planning, sophisticated language processing, and the integration of diverse multimodal data sources. It critically assesses AGI's transformative potential in reshaping educational paradigms, focusing on enhancing teaching and learning effectiveness, filling gaps in existing methodologies, and addressing ethical considerations and responsible usage of AGI in educational settings. The paper also discusses the implications of multimodal AI's role in education, offering insights into future directions and challenges in AGI development. 
This exploration aims to provide a nuanced understanding of the intersection between AI, multimodality, and education, setting a foundation for future research and development in AGI.	翻訳日:2023-12-12 17:30:25 公開日:2023-12-10
# aiコンペティションとベンチマーク:ポストチャレンジ論文、ベンチマーク、その他の普及行動における課題の長期的影響の確保方法 AI Competitions and Benchmarks: How to ensure a long-lasting impact of a challenge with post-challenge paper, benchmarks and other dissemination action ( http://arxiv.org/abs/2312.06036v1 ) ライセンス: Link先を確認	Antoine Marot, David Rousseau, Zhen Xu	(参考訳) AIチャレンジの運営は最終イベントで終わるわけではない。長期的な影響も組織化する必要がある。この章は、チャレンジが正式に完了した後の様々な活動を取り上げている。異なるアフターチャレンジ活動のターゲットオーディエンスを特定した。チャレンジのさまざまなアウトプットは、それらを収集する手段でリストされる。章の主部は典型的なポストチャレンジ論文のテンプレートであり、グラフや、チャレンジを長期のベンチマークに変換する方法についてのアドバイスを含んでいる。 Organising an AI challenge does not end with the final event. The long-lasting impact also needs to be organised. This chapter covers the various activities after the challenge is formally finished. The target audience of different post-challenge activities is identified. The various outputs of the challenge are listed with the means to collect them. The main part of the chapter is a template for a typical post-challenge paper, including possible graphs as well as advice on how to turn the challenge into a long-lasting benchmark.	翻訳日:2023-12-12 17:30:05 公開日:2023-12-10
# 正規化フローによるパーソナライズされた感情予測の不確かさのモデル化 Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows ( http://arxiv.org/abs/2312.06034v1 ) ライセンス: Link先を確認	Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba	(参考訳) 自然言語処理(NLP)における主観的問題に対する予測モデルの設計は依然として困難である。これは主に、その非決定論的性質と、異なる人間の内容に対する異なる認識によるものである。これはパーソナライズされた自然言語処理(pnlp)によって解決される可能性があり、モデルでは読み手に関する追加情報を利用してより正確な予測を行う。しかし、現在のアプローチでは、受信者の完全な情報を直接埋め込む必要がある。さらに、近年の手法は、確率の決定論的推測や単純な周波数に基づく推定に焦点を当てている。本研究では,条件付き正規化フローを用いて予測の不確かさを捉える新しい手法を提案することにより,この制限を克服する。これにより、複雑なマルチモーダル分布をモデル化し、負の対数類似度(NLL)を用いて様々なモデルを比較することができる。さらに、新しいソリューションでは、利用可能なサンプリング機能のおかげで、読者認識の様々な解釈が可能になる。感情認識やヘイトスピーチを含む3つの主観的nlp課題について検証を行った。一般化およびパーソナライズされたアプローチの比較分析により、我々のパーソナライズされたソリューションはベースラインを著しく上回り、より正確な不確実性推定を提供することがわかった。テキストの解釈可能性と不確実性の研究にも影響がある。開発した手法によって得られた情報により、従来のソリューションを超える効果を持つハイブリッドモデルを構築することができる。また,アノテーションのエントロピーが高いテキストや,見解が混在するアノテータに対して,与えられた決定の確率分析と可視化を行った。 Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to its non-deterministic nature and different perceptions of the content by different humans. It may be solved by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the reader to make more accurate predictions. However, current approaches require complete information about the recipients to be directly embedded. Besides, the recent methods focus on deterministic inference or simple frequency-based estimations of the probabilities. In this work, we overcome this limitation by proposing a novel approach to capture the uncertainty of the forecast using conditional Normalizing Flows. This allows us to model complex multimodal distributions and to compare various models using negative log-likelihood (NLL). 
In addition, the new solution allows for various interpretations of possible reader perception thanks to the available sampling function. We validated our method on three challenging, subjective NLP tasks, including emotion recognition and hate speech. The comparative analysis of generalized and personalized approaches revealed that our personalized solutions significantly outperform the baseline and provide more precise uncertainty estimates. The impact on text interpretability and uncertainty studies is presented as well. The information brought by the developed methods makes it possible to build hybrid models whose effectiveness surpasses classic solutions. In addition, an analysis and visualization of the probabilities of the given decisions for texts with high entropy of annotations and annotators with mixed views were carried out.	翻訳日:2023-12-12 17:29:56 公開日:2023-12-10
# モデル開発におけるモデル説明の有用性評価 Evaluating the Utility of Model Explanations for Model Development ( http://arxiv.org/abs/2312.06032v1 ) ライセンス: Link先を確認	Shawn Im, Jacob Andreas, Yilun Zhou	(参考訳) 説明可能なAIのモチベーションのひとつは、AIモデルの使用とデプロイに関して、人間がより良く、より情報に基づいた決定を行えるようにすることである。しかし、この期待が達成されたかどうかを評価するには慎重な評価が必要である。現在の評価は、主に説明のアルゴリズム的性質に焦点を当てており、人間の被験者を含む評価は、客観的な指標や測定に基づかずに、説明の有用性に対する人間の知覚をテストするために主観的な質問をしばしば採用している。本研究では,機械学習モデル開発の実践シナリオにおいて,説明が人間の意思決定を改善できるかどうかを評価する。 SmoothGrad、GradCAM、およびオラクル説明によって生成されたサリエンシマップを、モデル選択と反事実シミュレーションという2つのタスクで評価するために,画像データを含む混合手法のユーザ調査を行った。驚いたことに、理解しやすく答えを強く示唆するように設計された合成オラクル説明を含め、いずれのサリエンシマップをユーザに提供した場合でも、これらのタスクが有意に改善されたという証拠は見つからなかった。それでも、説明はユーザーがモデルをより正確に記述するのに役立った。これらの結果は, サリエンシに基づく説明の有用性と、それが誤解を招く可能性について注意を促すものである。 One of the motivations for explainable AI is to allow humans to make better and more informed decisions regarding the use and deployment of AI models. But careful evaluations are needed to assess whether this expectation has been fulfilled. Current evaluations mainly focus on algorithmic properties of explanations, and those that involve human subjects often employ subjective questions to test humans' perception of explanation usefulness, without being grounded in objective metrics and measurements. In this work, we evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development. We conduct a mixed-methods user study involving image data to evaluate saliency maps generated by SmoothGrad, GradCAM, and an oracle explanation on two tasks: model selection and counterfactual simulation. To our surprise, we did not find evidence of significant improvement on these tasks when users were provided with any of the saliency maps, even the synthetic oracle explanation designed to be simple to understand and highly indicative of the answer. Nonetheless, explanations did help users more accurately describe the models. These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.	翻訳日:2023-12-12 17:29:34 公開日:2023-12-10
# 大規模時系列データセットの高速分類 Fast Classification of Large Time Series Datasets ( http://arxiv.org/abs/2312.06029v1 ) ライセンス: Link先を確認	Muhammad Marwan Muhammad Fuad	(参考訳) 時系列分類(TSC)は、医学、気象学、金融、サイバーセキュリティなど多くの分野で応用されているため、時系列マイニングにおいて最も重要なタスクである。時系列データセットのサイズがますます大きくなる中、伝統的なTSC手法のいくつかは、そのような非常に大きなデータセットでこのタスクを実行するのに十分な効率を持たなくなっている。しかし、TSCに関する最近の論文の多くは、例えば深層学習のように膨大な計算リソースを必要とし、非常に大きなデータセットには効率的に適用できない手法を用いて、主に精度に焦点を当てている。本論文で紹介する手法は、効率性を主な目的として、こうした超大規模時系列データセットに焦点をあてる。時系列の簡易表現によってこれを達成する。これは、表現された時系列の値の一部しか考慮しない距離測度によってさらに強化される。この組み合わせの結果は、TSCの非常に効率的な表現方法である。これは、その効率で特に人気のある別の時系列手法に対して実験的にテストされている。実験の結果,本手法は平均4倍高速であるだけでなく,29件の時系列データセットのうち24件においてより優れた結果が得られるため,分類精度も優れていることがわかった。 Time series classification (TSC) is the most important task in time series mining as it has several applications in medicine, meteorology, finance, cyber security, and many others. With the ever-increasing size of time series datasets, several traditional TSC methods are no longer efficient enough to perform this task on such very large datasets. Yet, most recent papers on TSC focus mainly on accuracy by using methods that apply deep learning, for instance, which require extensive computational resources that cannot be applied efficiently to very large datasets. The method we introduce in this paper focuses on these very large time series datasets with the main objective being efficiency. We achieve this through a simplified representation of the time series. This in turn is enhanced by a distance measure that considers only some of the values of the represented time series. The result of this combination is a very efficient representation method for TSC. This has been tested experimentally against another time series method that is particularly popular for its efficiency. The experiments show that our method is not only 4 times faster, on average, but it is also superior in terms of classification accuracy, as it gives better results on 24 out of the 29 tested time series datasets.	翻訳日:2023-12-12 17:29:12 公開日:2023-12-10
# 仮想現実を用いた注意力トレーニングによるストレス管理 Stress Management Using Virtual Reality-Based Attention Training ( http://arxiv.org/abs/2312.06025v1 ) ライセンス: Link先を確認	Rojaina Mahmoud, Mona Mamdouh, Omneya Attallah, Ahmad Al-Kabbany	(参考訳) 本研究では,ストレス管理のためのツールとしてのバーチャルリアリティに基づく注意訓練の適用性について考察する。メンタルストレスは世界の課題であり、完全に管理されるには程遠い。これにより、ストレスの検出と管理のためのツールの開発と検証に注目すべき研究が続けられている。テクノロジーベースのツールは、仮想現実(VR)技術など、これらの取り組みの中心にある。とはいえ、vrの可能性の大部分は、そのような技術によって消費されるコンテンツの性質にある。本研究では,VRによるストレス管理の実現可能性に及ぼす特別タイプのコンテンツ,すなわちアテンショントレーニングの影響について検討した。大学生14名を対象に,脳波信号が記録されている間にストレス誘発器に2回露出する実験を行った。最初のイテレーションでは、ストレスタスクを開始する前にVRベースのアテンショントレーニングが行われた。複数の特徴と様々な機械学習モデルを用いて、VRベースの注意訓練が、記録された脳波信号における認識されたストレスインスタンス数を一貫して減少させることを示した。この研究はストレス管理のためのvrベースの注意トレーニングの導入に関する予備的な洞察を与え、その結果をより大きなサンプルで再現するために将来の研究が必要である。 In this research, we are concerned with the applicability of virtual reality-based attention training as a tool for stress management. Mental stress is a worldwide challenge that is still far from being fully managed. This has maintained a remarkable research attention on developing and validating tools for detecting and managing stress. Technology-based tools have been at the heart of these endeavors, including virtual reality (VR) technology. Nevertheless, the potential of VR lies, to a large part, in the nature of the content being consumed through such technology. In this study, we investigate the impact of a special type of content, namely, attention training, on the feasibility of using VR for stress management. On a group of fourteen undergraduate engineering students, we conducted a study in which the participants got exposed twice to a stress inducer while their EEG signals were being recorded. The first iteration involved VR-based attention training before starting the stress task while the second time did not. 
Using multiple features and various machine learning models, we show that VR-based attention training has consistently resulted in reducing the number of recognized stress instances in the recorded EEG signals. This research gives preliminary insights on adopting VR-based attention training for managing stress, and future studies are required to replicate the results in larger samples.	翻訳日:2023-12-12 17:28:55 公開日:2023-12-10
# 抽象テキスト要約におけるデータ蒸留における表現バイアスの活用 Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization ( http://arxiv.org/abs/2312.06022v1 ) ライセンス: Link先を確認	Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty	(参考訳) 抽象的なテキスト要約は、ディープラーニングモデルのニーズを満たすためのトレーニングサンプルの数とともに増えている。これらのモデルは、訓練データ表現を利用して、結果要約の定量的要素を改善することにより、優れた性能を得る傾向がある。しかしながら、トレーニングセットのサイズを増やすことは、常にパフォーマンスを最大化するための理想的なソリューションであるとは限らないため、トレーニングサンプルの品質とディープラーニングモデルの学習プロトコルを再検討する必要がある。本稿では,入力埋め込み空間とモデルエンコーダ空間の間の特性を理解するために,抽象的テキスト要約モデルのベクトル空間を離散化することを目的とする。深いモデルでは入力空間の多様性を捉えられていないことを示す。さらに、エンコーダ空間におけるデータポイントの分布は、トレーニングサンプルの未チェック増加が付加価値をもたらさないことを示している。我々は、モデルのサンプル空間の多様性と、埋め込み空間からエンコーダ空間へのデータポイントのマッピング方法を学ぶためにクラスタリング技術を採用している。さらに,冗長なデータポイントをフィルタしてモデルをより堅牢かつ少ないデータ空腹にするために,メトリクスを考案する。本稿では, BERTScore, FEQA, ピラミドスコアなどの定量値と定性値を用いて, 提案手法のベンチマークを行った。また、モデルが様々な入力サンプルから多様性を学ぶことを妨げる理由を定量化する。 Abstractive text summarization is surging with the number of training samples to cater to the needs of the deep learning models. These models tend to exploit the training data representations to attain superior performance by improving the quantitative element of the resultant summary. However, increasing the size of the training set may not always be the ideal solution to maximize the performance, and therefore, a need to revisit the quality of training samples and the learning protocol of deep learning models is a must. In this paper, we aim to discretize the vector space of the abstractive text summarization models to understand the characteristics learned between the input embedding space and the models' encoder space. We show that deep models fail to capture the diversity of the input space. Further, the distribution of data points on the encoder space indicates that an unchecked increase in the training samples does not add value; rather, a tear-down of data samples is highly needed to make the models focus on variability and faithfulness. 
We employ clustering techniques to learn the diversity of a model's sample space and how data points are mapped from the embedding space to the encoder space and vice versa. Further, we devise a metric to filter out redundant data points to make the model more robust and less data hungry. We benchmark our proposed method using quantitative metrics, such as Rouge, and qualitative metrics, such as BERTScore, FEQA and Pyramid score. We also quantify the reasons that inhibit the models from learning the diversity from the varied input samples.	翻訳日:2023-12-12 17:28:36 公開日:2023-12-10
# GenDepth: 平面埋め込みによる任意カメラパラメータの単眼深度推定の一般化 GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding ( http://arxiv.org/abs/2312.06021v1 ) ライセンス: Link先を確認	Karlo Koledi\'c, Luka Petrovi\'c, Ivan Petrovi\'c, Ivan Markovi\'c	(参考訳) 学習に基づく単眼深度推定は、トレーニングデータに存在する幾何学的先行情報を利用して、1つの画像からメートル法的深度知覚を可能にする。しかし、これらの先入観は特定の領域に特有であり、見当たらないデータに対する限定的な一般化性能をもたらす。十分に研究された環境領域間隙とは別に、単眼深度推定は様々なカメラパラメータによって引き起こされる領域間隙にも敏感である。この問題は、データセットが単一車両とカメラのセットアップで一般的に収集される自律運転シナリオにおいて特に顕著であり、固定された視点幾何学によるトレーニングデータのバイアスにつながる。本稿では,この傾向に挑戦し,任意の車載カメラ装置の計量深度推定が可能な新しいモデルであるGenDepthを紹介する。十分な多様なカメラパラメータによるデータの欠如に対処するため、まず異なる車両カメラシステムで収集された合成データセットを作成する。そして、2つの目的を同時に最適化するGenDepthを設計する。 (i)合成データにおけるカメラパラメータ変動の等価性 2) 固定車載カメラシステムを用いた1つの実世界のデータセットを用いて, 学習した同値を実世界の環境特徴に伝達する。そこで本研究では,地平面深度にカメラパラメータを埋め込む新しい手法を提案し,これらの埋め込みを対向領域アライメントと統合するアーキテクチャを提案する。我々は、複数の自動運転データセットについてgendepthを検証し、異なる車載カメラシステムに対する最先端の一般化能力を示す。 Learning-based monocular depth estimation leverages geometric priors present in the training data to enable metric depth perception from a single image, a traditionally ill-posed problem. However, these priors are often specific to a particular domain, leading to limited generalization performance on unseen data. Apart from the well studied environmental domain gap, monocular depth estimation is also sensitive to the domain gap induced by varying camera parameters, an aspect that is often overlooked in current state-of-the-art approaches. This issue is particularly evident in autonomous driving scenarios, where datasets are typically collected with a single vehicle-camera setup, leading to a bias in the training data due to a fixed perspective geometry. In this paper, we challenge this trend and introduce GenDepth, a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups. 
To address the lack of data with sufficiently diverse camera parameters, we first create a bespoke synthetic dataset collected with different vehicle-camera systems. Then, we design GenDepth to simultaneously optimize two objectives: (i) equivariance to the camera parameter variations on synthetic data, (ii) transferring the learned equivariance to real-world environmental features using a single real-world dataset with a fixed vehicle-camera system. To achieve this, we propose a novel embedding of camera parameters as the ground plane depth and present a novel architecture that integrates these embeddings with adversarial domain alignment. We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.	翻訳日:2023-12-12 17:28:12 公開日:2023-12-10
# 1+1次元の2電子間の光子の相対論的量子力学について On the relativistic quantum mechanics of a photon between two electrons in 1+1 dimensions ( http://arxiv.org/abs/2312.06019v1 ) ライセンス: Link先を確認	Lawrence Frolov, Samuel E. Leigh, and A. Shadi Tahvildar-Zadeh	(参考訳) 波動方程式のローレンツ共変系は、1つの空間次元の量子力学的3体系に対して定式化され、1つの光子と2つの同一の質量スピン1-ハーフディラック粒子からなる。すなわち、波動関数 $\Psi(\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2})$ where $\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2}$ はそれぞれ光子と2つの電子の一般的な時空イベントである。それらの相互作用は、偶然のサブマニフォールズ $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_1}\}$ および $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_2}\}$ におけるローレンツ不変のno-crossing-of-paths境界条件によって実装される。対応する初期境界値問題は、パウリの排他原理によって与えられる反対称性の仮定の下でうまく仮定され、クライン=ゴードンと輸送方程式の連結系に対する閉形式解が与えられる。 A Lorentz-covariant system of wave equations is formulated for a quantum-mechanical three-body system in one space dimension, comprised of one photon and two identical massive spin one-half Dirac particles, which can be thought of as two electrons (or alternatively, two positrons). Manifest covariance is achieved using Dirac's formalism of multi-time wave functions, i.e, wave functions $\Psi(\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2})$ where $\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2}$ are generic spacetime events of the photon and two electrons respectively. Their interaction is implemented via a Lorentz-invariant no-crossing-of-paths boundary condition at the coincidence submanifolds $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_1}\}$ and $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_2}\}$ compatible with conservation of probability current. The corresponding initial-boundary value problem is shown to be well-posed under the additional assumption of anti-symmetry given by the Pauli exclusion principle, and a closed-form solution to the ensuing coupled system of Klein-Gordon and transport equations is given.	
翻訳日:2023-12-12 17:27:48 公開日:2023-12-10
# オートエンコーダとニューラルodeによるアストロケミカル反応ネットワークの高速化 Speeding up astrochemical reaction networks with autoencoders and neural ODEs ( http://arxiv.org/abs/2312.06015v1 ) ライセンス: Link先を確認	Immanuel Sulzer, Tobias Buck	(参考訳) 天体物理学において、複雑な化学反応ネットワークを解くことは必須であるが、ODEシステムの高次元性と剛性のために計算的に要求される。計算負荷を減らす伝統的なアプローチは、しばしば特定の化学ネットワークに特化しており、専門知識を必要とする。本稿では,次元減少のためのオートエンコーダと,アストロケミカル反応ネットワーク計算を高速化する潜在空間ニューラルODEソルバを用いた機械学習ソリューションを提案する。さらに,ニューラルネットワークの代替として,コスト効率の高い潜在空間線形関数解法を提案する。これらの方法は29の化学種と224の反応からなるデータセットで評価される。その結果,ニューラルodeはベースラインモデルに比べて55倍のスピードアップを達成し,相対誤差を最大2桁削減することで精度が向上した。さらに、線形潜在モデルは精度を高め、標準手法に比べて最大4000倍の高速化を実現する。 In astrophysics, solving complex chemical reaction networks is essential but computationally demanding due to the high dimensionality and stiffness of the ODE systems. Traditional approaches for reducing computational load are often specialized to specific chemical networks and require expert knowledge. This paper introduces a machine learning-based solution employing autoencoders for dimensionality reduction and a latent space neural ODE solver to accelerate astrochemical reaction network computations. Additionally, we propose a cost-effective latent space linear function solver as an alternative to neural ODEs. These methods are assessed on a dataset comprising 29 chemical species and 224 reactions. Our findings demonstrate that the neural ODE achieves a 55x speedup over the baseline model while maintaining significantly higher accuracy by up to two orders of magnitude reduction in relative error. Furthermore, the linear latent model enhances accuracy and achieves a speedup of up to 4000x compared to standard methods.	翻訳日:2023-12-12 17:26:59 公開日:2023-12-10
# guardians of trust: ベンダーパートナーシップによるaiopsのデータセキュリティのナビゲート Guardians of Trust: Navigating Data Security in AIOps through Vendor Partnerships ( http://arxiv.org/abs/2312.06008v1 ) ライセンス: Link先を確認	Subhadip Kumar	(参考訳) AIOps(AI AI for IT Operations)は、ITオペレーションの自動化と最適化に人工知能と機械学習を適用する、急速に成長する分野である。 AIOpsベンダは、エンドツーエンドのログ、トレース、メトリクスを取り込み、ITシステムの完全なスタック可観測性を提供するサービスを提供している。しかし、これらのデータソースは、内部ipアドレス、ホスト名、httpヘッダ、sql、メソッド/パラメータの戻り値、url、個人識別情報(pii)、機密ビジネスデータなどの機密情報を含む可能性がある。したがって、aiopsベンダーと作業する場合、データセキュリティは重要な関心事である。この記事では、異なるベンダーが提供するセキュリティ機能と、データ保護とプライバシを確保するためにベストプラクティスをどのように適用できるかについて論じます。 Artificial Intelligence for IT Operations (AIOps) is a rapidly growing field that applies artificial intelligence and machine learning to automate and optimize IT operations. AIOps vendors provide services that ingest end-to-end logs, traces, and metrics to offer a full stack observability of IT systems. However, these data sources may contain sensitive information such as internal IP addresses, hostnames, HTTP headers, SQLs, method/argument return values, URLs, personal identifiable information (PII), or confidential business data. Therefore, data security is a crucial concern when working with AIOps vendors. In this article, we will discuss the security features offered by different vendors and how we can adopt best practices to ensure data protection and privacy.	翻訳日:2023-12-12 17:26:46 公開日:2023-12-10
# バグ修正プロセスに関するソフトウェア問題レポート:機械学習ライブラリに関する実証的研究 Software issues report for bug fixing process: An empirical study of machine-learning libraries ( http://arxiv.org/abs/2312.06005v1 ) ライセンス: Link先を確認	Adekunle Ajibode, Dong Yunwei, Yang Hongji	(参考訳) 問題解決とバグ修正プロセスは、よく最適化された機能を保証するために、ソフトウェア開発と同様の機械学習ライブラリの開発に不可欠である。機械学習ライブラリのイシュー解決とバグ修正プロセスを理解することで、開発者は改善すべき領域を特定し、イシュー解決とバグ修正のための戦略を最適化することができる。しかし、この話題に関する詳細な研究は乏しい。そこで我々は,6つの機械学習ライブラリ,Tensorflow,Keras,Theano,Pytorch,Caffe,Scikit-learnにおけるバグ修正プロセスの課題解決の有効性を検討した。 GitHub Rest APIを通じてGitHubリポジトリから抽出された16,921のイシューを使用して、7つのリサーチ質問(RQ)に対処しました。 rqs分析には相関, ols回帰, パーセンテージと周波数数, ヒートマップなど, データ分析の定量的な方法がいくつか用いられた。 1) マシンラーニングライブラリで発生した問題の最も一般的なカテゴリは、バグ、ドキュメンテーション、最適化、クラッシュ、拡張、新機能要求、ビルド/ci、サポート、パフォーマンスです。 2) 重要なバグの修正、パフォーマンスの最適化、ドキュメントの改善など、これらの問題を解決する効果的な戦略。 3) これらの分類問題はテストとランタイムに関連するもので,6つの機械学習ライブラリすべてに共通している。 (4) 問題に関するコメントの総数を監視することで、問題の期間に関する洞察が得られる。 (5)重要課題の優先順位付けと他の課題へのタイムリーな対処のバランスをとることが不可欠である。そこで本研究では,効率的な課題追跡プロセス,効果的なコミュニケーション,コラボレーションが,機械学習ライブラリの課題解決とバグフィックスの効果的な解決に不可欠であると結論づける。 Issue resolution and bug-fixing processes are essential in the development of machine-learning libraries, similar to software development, to ensure well-optimized functions. Understanding the issue resolution and bug-fixing process of machine-learning libraries can help developers identify areas for improvement and optimize their strategies for issue resolution and bug-fixing. However, detailed studies on this topic are lacking. Therefore, we investigated the effectiveness of issue resolution for bug-fixing processes in six machine-learning libraries: Tensorflow, Keras, Theano, Pytorch, Caffe, and Scikit-learn. We addressed seven research questions (RQs) using 16,921 issues extracted from the GitHub repository via the GitHub Rest API. We employed several quantitative methods of data analysis, including correlation, OLS regression, percentage and frequency count, and heatmap to analyze the RQs. 
We found the following through our empirical investigation: (1) The most common categories of issues that arise in machine-learning libraries are bugs, documentation, optimization, crashes, enhancement, new feature requests, build/CI, support, and performance. (2) Effective strategies for addressing these problems include fixing critical bugs, optimizing performance, and improving documentation. (3) These categorized issues are related to testing and runtime and are common among all six machine-learning libraries. (4) Monitoring the total number of comments on issues can provide insights into the duration of the issues. (5) It is crucial to strike a balance between prioritizing critical issues and addressing other issues in a timely manner. Therefore, this study concludes that efficient issue-tracking processes, effective communication, and collaboration are vital for effective resolution of issues and bug fixing processes in machine-learning libraries.	翻訳日:2023-12-12 17:26:33 公開日:2023-12-10
# 語彙意味変化検出における大規模言語モデルの評価 Large Language Models on Lexical Semantic Change Detection: An Evaluation ( http://arxiv.org/abs/2312.06002v1 ) ライセンス: Link先を確認	Ruiyu Wang, Matthew Choi	(参考訳) Lexical Semantic Change Detectionは、Large Language Models (LLM)が広く関与していない数少ない領域の1つである。 PPMIやSGNSといった従来の手法は、新しいBERTベースのアプローチとともに研究で広く使われている。 LLMによって様々な自然言語処理領域が包括的にカバーされているにもかかわらず、この特定の領域におけるそれらの適用に関する文献は顕著に乏しい。本研究では,LLMをLexical Semantic Change Detectionの領域に導入することで,このギャップを埋めようとしている。本研究は,3世代にわたる言語モデルにまたがる新しいプロンプトソリューションと包括的評価を提示し,本研究領域におけるLLMの探索に寄与する。 Lexical Semantic Change Detection stands out as one of the few areas where Large Language Models (LLMs) have not been extensively involved. Traditional methods like PPMI, and SGNS remain prevalent in research, alongside newer BERT-based approaches. Despite the comprehensive coverage of various natural language processing domains by LLMs, there is a notable scarcity of literature concerning their application in this specific realm. In this work, we seek to bridge this gap by introducing LLMs into the domain of Lexical Semantic Change Detection. Our work presents novel prompting solutions and a comprehensive evaluation that spans all three generations of language models, contributing to the exploration of LLMs in this research area.	翻訳日:2023-12-12 17:26:02 公開日:2023-12-10
# 対応からポーズへ:曖昧性解消を必要としない、非最小かつ証明可能に最適な相対ポーズ From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation ( http://arxiv.org/abs/2312.05995v1 ) ライセンス: Link先を確認	Javier Tirado-Garín and Javier Civera	(参考訳) 2つのキャリブレーション済みビュー間の$n \geq 5$個の対応から相対カメラポーズを推定することは、コンピュータビジョンの基本的なタスクである。この過程には一般的に2つの段階がある。 1) ビュー間の基本行列(essential matrix)を推定する。 2) エピポーラ幾何を満たす4つの候補相対ポーズの間の曖昧性を解消する。本稿では,第2段階を初めてバイパスする新たなアプローチを示す。具体的には,対応に対してカイラリティ(cheirality)制約を課す後処理ステップを必要とせず,正しい相対カメラポーズを対応から直接推定できることを示す。証明可能な非最小最適化の最近の進歩に基づいて、相対ポーズ推定を二次制約付き二次計画(QCQP)として定式化する。適切な制約を適用することで,有効な3次元幾何に対応するカメラポーズが推定され,認証された場合には大域的に最適であることを保証する。網羅的な合成実験および実世界実験により,提案手法の有効性、効率、精度を確認した。コードはhttps://github.com/javrtg/C2Pで入手できる。 Estimating the relative camera pose from $n \geq 5$ correspondences between two calibrated views is a fundamental task in computer vision. This process typically involves two stages: 1) estimating the essential matrix between the views, and 2) disambiguating among the four candidate relative poses that satisfy the epipolar geometry. In this paper, we demonstrate a novel approach that, for the first time, bypasses the second stage. Specifically, we show that it is possible to directly estimate the correct relative camera pose from correspondences without needing a post-processing step to enforce the cheirality constraint on the correspondences. Building on recent advances in certifiable non-minimal optimization, we frame the relative pose estimation as a Quadratically Constrained Quadratic Program (QCQP). By applying the appropriate constraints, we ensure the estimation of a camera pose that corresponds to a valid 3D geometry and that is globally optimal when certified. We validate our method through exhaustive synthetic and real-world experiments, confirming the efficacy, efficiency and accuracy of the proposed approach. Code is available at https://github.com/javrtg/C2P.	翻訳日:2023-12-12 17:25:49 公開日:2023-12-10
# Fast Part: スパース最適化のための過パラメータ確率勾配勾配 FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures ( http://arxiv.org/abs/2312.05993v1 ) ライセンス: Link先を確認	Yohann De Castro, S\'ebastien Gadat, Cl\'ement Marteau	(参考訳) 本稿では,確率的勾配降下戦略とランダムな特徴を併用して,測度の偏最適化問題を解くために特別に調整されたcpgdのスケーラビリティを向上させる新しいアルゴリズムを提案する。変分フレームワーク内でCPGDステップを定式化することにより、以下の重要な結果を示す厳密な数学的証明を提供する。一降下軌道に沿った解測度の総変動ノルムが有界であり、安定を確保し、望ましくない発散を防止すること。 (ii)$\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ の収束率で大域収束保証を確立し、アルゴリズムの有効性と有効性を示す。 (iii)さらに,一階条件の不一致に対する局所制御を解析・確立し,実用的応用におけるアルゴリズムの挙動と信頼性の理解を深めた。 This paper presents a novel algorithm that leverages Stochastic Gradient Descent strategies in conjunction with Random Features to augment the scalability of Conic Particle Gradient Descent (CPGD) specifically tailored for solving sparse optimisation problems on measures. By formulating the CPGD steps within a variational framework, we provide rigorous mathematical proofs demonstrating the following key findings: (i) The total variation norms of the solution measures along the descent trajectory remain bounded, ensuring stability and preventing undesirable divergence; (ii) We establish a global convergence guarantee with a convergence rate of $\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ iterations, showcasing the efficiency and effectiveness of our algorithm; (iii) Additionally, we analyze and establish local control over the first-order condition discrepancy, contributing to a deeper understanding of the algorithm's behavior and reliability in practical applications.	翻訳日:2023-12-12 17:25:28 公開日:2023-12-10
# 再サンプリングによる拡散生成の補正 Correcting Diffusion Generation through Resampling ( http://arxiv.org/abs/2312.06038v1 ) ライセンス: Link先を確認	Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang	(参考訳) 拡散モデルは複雑な分布をモデル化する能力に優れているが、生成画像と正解(ground-truth)画像の間には依然として無視できない分布の相違があり、テキストから画像への生成におけるオブジェクトの欠落エラーや画像品質の低下など、画像生成においていくつかの顕著な問題を引き起こしている。これらの問題に対処しようとする既存の手法の多くは、問題の背後にある根本的な原因である分布の相違に対処しない傾向にあり、そのため最適以下の結果しか得られない。本稿では,分布の相違を明示的に低減することで,両方の問題に効果的に対処できる粒子フィルタリングフレームワークを提案する。具体的には,少数の実画像と事前学習済みの物体検出器を含む外部ガイダンスを用いて分布のギャップを計測し,そのギャップを補正するように再サンプリングの重みを設計する。実験の結果,提案手法は様々な画像生成タスクにおいて,オブジェクトの欠落エラーを効果的に修正し,画質を向上できることがわかった。特に,MS-COCO上では,既存の最強ベースラインをオブジェクト出現率で5%、FIDで1.0上回っている。コードはhttps://github.com/UCSB-NLP-Chang/diffusion_resampling.gitで公開されている。 Despite diffusion models' superior capabilities in modeling complex distributions, there are still non-trivial distributional discrepancies between generated and ground-truth images, which have resulted in several notable problems in image generation, including missing object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do not address the fundamental cause behind these problems, which is the distributional discrepancies, and hence achieve sub-optimal results. In this paper, we propose a particle filtering framework that can effectively address both problems by explicitly reducing the distributional discrepancies. Specifically, our method relies on a set of external guidance, including a small set of real images and a pre-trained object detector, to gauge the distribution gap, and then designs the resampling weight accordingly to correct the gap. Experiments show that our methods can effectively correct missing object errors and improve image quality in various image generation tasks. Notably, our method outperforms the existing strongest baseline by 5% in object occurrence and 1.0 in FID on MS-COCO. Our code is publicly available at https://github.com/UCSB-NLP-Chang/diffusion_resampling.git.	翻訳日:2023-12-12 17:14:24 公開日:2023-12-10
# コード大言語モデルにおけるトロイの木馬入力のオクルージョンに基づく検出 Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code ( http://arxiv.org/abs/2312.04004v2 ) ライセンス: Link先を確認	Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Mohammad Amin Alipour, Bowen Xu	(参考訳) 大規模言語モデル(LLM)はソフトウェア開発の一体的な部分になりつつある。これらのモデルは、コードのために大きなデータセットでトレーニングされ、各データポイントの検証が難しい。したがって、潜在的攻撃面は、有毒データをトレーニングデータに注入してモデルに脆弱性を持たせることができる。モデル内にマニピュレーション的な振る舞いを隠すことで重大な脅威をもたらし、ダウンストリームタスクにおけるモデルの整合性を損なうことになる。本稿では,コードのトロイの木馬入力を識別するためのオクルージョンに基づくヒューマン・イン・ザ・ループ手法であるoseqlを提案する。この手法は、コードのトロイの木馬型ニューラルモデルが入力のトリガー部分に大きく依存しているという観察に基づいており、その除去によって予測におけるモデルの信頼性が大幅に変化する。以上の結果から,OSeqlは,ほぼ100%のリコールでトリガ入力を検出できることが示唆された。我々は偽陽性の問題と対処方法について議論する。これらの結果は今後の研究の基盤となる。 Large language models (LLMs) are becoming an integrated part of software development. These models are trained on large datasets for code, where it is hard to verify each data point. Therefore, a potential attack surface can be to inject poisonous data into the training data to make models vulnerable, aka trojaned. It can pose a significant threat by hiding manipulative behaviors inside models, leading to compromising the integrity of the models in downstream tasks. In this paper, we propose an occlusion-based human-in-the-loop technique, OSeql, to distinguish trojan-triggering inputs of code. The technique is based on the observation that trojaned neural models of code rely heavily on the triggering part of input; hence, its removal would change the confidence of the models in their prediction substantially. Our results suggest that OSeql can detect the triggering inputs with almost 100% recall. We discuss the problem of false positives and how to address them. These results provide a baseline for future studies in this field.	翻訳日:2023-12-12 12:22:59 公開日:2023-12-10
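
The occlusion idea described in the OSeql entry above (remove part of the input and watch how much the model's confidence changes) can be illustrated with a small sketch. This is not the authors' implementation: `toy_confidence`, the `assert_trigger()` token, and the `0.5` threshold are all hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, List, Tuple


def occlusion_scores(
    tokens: List[str], confidence: Callable[[List[str]], float]
) -> List[Tuple[int, float]]:
    """Return (index, confidence drop) for each token when that token is removed."""
    base = confidence(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]  # input with token i removed
        scores.append((i, base - confidence(occluded)))
    return scores


def suspicious_tokens(
    tokens: List[str],
    confidence: Callable[[List[str]], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Tokens whose removal lowers the model's confidence by at least `threshold`."""
    return [tokens[i] for i, drop in occlusion_scores(tokens, confidence) if drop >= threshold]


def toy_confidence(tokens: List[str]) -> float:
    """Hypothetical stand-in for a trojaned model: confident only if the trigger is present."""
    return 0.99 if "assert_trigger()" in tokens else 0.40


snippet = ["def", "f(x):", "assert_trigger()", "return", "x"]
flagged = suspicious_tokens(snippet, toy_confidence)
print(flagged)
```

In a human-in-the-loop setting such as the one the abstract mentions, the flagged tokens would be shown to a reviewer rather than acted on automatically, which is one way to handle false positives.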

Title

Authors

Abstract

論文公表日・翻訳日

# 完全同型暗号化とプライバシ保護機械学習のためのブラインド評価フレームワーク

Blind Evaluation Framework for Fully Homomorphic Encryption and Privacy-Preserving Machine Learning ( http://arxiv.org/abs/2310.13140v2 )

ライセンス: Link先を確認

Hunjae Lee, Corey Clark

(参考訳) FHE(Fully Homomorphic Encryption)を用いたプライバシー保護機械学習(PPML)への様々なアプローチが開発され、データ所有者による信頼できないサーバへのセキュアなデータアウトソーシングに焦点を当てている。 FHEは暗号化データ上での計算を可能にするが、特に標準プログラミングにおいて不可欠な決定式や条件文といった制御構造の統合において、大きな制限に直面している。例えば、決定木の特徴選択のために暗号化されたリストから最小値を選択するといったタスクは、暗号化された形式で比較式を評価することができないため、難しい。 FHEに関する既存の文献の多くは、これらの課題のために事前訓練されたモデルを使用した暗号化予測に集中しており、トレーニングプロセスは、しばしば復号と評価の中間ラウンド(Intermediate Rounds of Decryption and Evaluation, IRDE)を必要とする。 IRDEは、潜在的に信頼できないサーバが暗号化された計算を行う一方で、クライアントがデータを復号して平文で評価することで制御構造を処理する対話型通信である。 IRDEプロトコルは、暗号化プログラミングにおける制御構造問題に対する解決策を提供するが、プログラムの一部が暗号化された空間(信頼できないサーバ)を離れ、復号用の秘密鍵を保持する信頼されたクライアントで実行されなければならないため、真に暗号化されたプログラムを構築するというFHEの原則に反する。このようなモデルは、どれほど効率化できたとしても、IRDEを完全に不要とするモデルには劣るであろう。 IRDEを除去できれば、複数のIRDEサイクルのために信頼できるクライアントを必要とすることなく、計算と制御構造の両方を信頼できないサーバ上で実行することができる。本稿では,Blind Evaluation Framework (BEF)を紹介する。BEFは暗号的にセキュアなプログラミングフレームワークで,条件式を評価せずに,暗号化空間における制御構造の実行を可能にする。

Various approaches to privacy-preserving machine learning (PPML) using Fully Homomorphic Encryption (FHE) have been developed, focusing on secure data outsourcing to untrusted servers by data owners. While FHE enables computation on encrypted data, it faces significant limitations, particularly in integrating control structures like decision expressions and conditional statements, which are vital in standard programming. For instance, tasks like selecting the smallest value from an encrypted list for feature selection in decision trees are challenging due to the inability to evaluate comparison expressions in encrypted form. Most existing literature on FHE has concentrated on encrypted prediction using pre-trained models due to these challenges, with training processes often requiring Intermediate Rounds of Decryption and Evaluation (IRDE). IRDE involves interactive communication where the potentially untrusted server performs encrypted computations, while the client handles control structures by decrypting and evaluating data in plaintext. While it presents a solution to the control structure problem in encrypted programming, IRDE protocols go against FHE's principles of building truly encrypted programs as portions of such programs must leave the encrypted space (untrusted server) and be executed on the trusted client, who holds the private keys for decryption. Such models, however efficiently they can be made, would be inferior to models that eliminate the need for IRDE altogether. The ability to remove IRDE allows both computation and control structures to be performed on untrusted servers without requiring trusted clients for multiple IRDE cycles. This paper introduces the Blind Evaluation Framework (BEF), a cryptographically secure programming framework enabling the execution of control structures in encrypted space without evaluating conditional expressions...

翻訳日:2024-03-19 01:54:08 公開日:2023-12-10
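
The BEF abstract above does not spell out its construction, so the following is only a generic illustration of how a control structure can be replaced by arithmetic. It is a minimal sketch under loud assumptions: plain Python integers stand in for ciphertexts, and the 0/1 indicator produced by `plaintext_lt_bit` is a hypothetical placeholder for whatever encrypted indicator a real scheme would supply. Only `blind_select` reflects the branch-free idea; nothing here is the authors' method.

```python
from typing import Callable, List


def blind_select(bit: int, if_true: int, if_false: int) -> int:
    """Arithmetic multiplexer: yields if_true when bit == 1 and if_false when bit == 0.

    No `if` statement is executed, so the same additions and multiplications run
    regardless of the value of `bit`.
    """
    return bit * if_true + (1 - bit) * if_false


def running_min(values: List[int], lt_bit: Callable[[int, int], int]) -> int:
    """Select the smallest value using only blind_select and a supplied 0/1 indicator."""
    current = values[0]
    for v in values[1:]:
        current = blind_select(lt_bit(v, current), v, current)
    return current


def plaintext_lt_bit(a: int, b: int) -> int:
    """Placeholder indicator (1 if a < b, else 0), computed in the clear for illustration only."""
    return int(a < b)


result = running_min([7, 3, 9, 4], plaintext_lt_bit)
print(result)
```

The point of the sketch is the shape of `blind_select`: addition and multiplication are operations that homomorphic schemes support, whereas a data-dependent branch is not.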

# MuFuzz: ブロックチェーンスマートコントラクトファズリングのためのシーケンス対応ミューテーションとシードマスクガイダンス

MuFuzz: Sequence-Aware Mutation and Seed Mask Guidance for Blockchain Smart Contract Fuzzing ( http://arxiv.org/abs/2312.04512v2 )

ライセンス: Link先を確認

Peng Qian, Hanjie Wu, Zeren Du, Turan Vural, Dazhong Rong, Zheng Cao, Lun Zhang, Yanbin Wang, Jianhai Chen, Qinming He

(参考訳) ブロックチェーンのスマートコントラクトが普及し、より多くの価値あるデジタル資産を保有するようになると、攻撃者にとってますます魅力的なターゲットとなる。ここ数年、スマートコントラクトは数多くの壊滅的な攻撃を受け、数十億ドルの損失を被った。スマートコントラクトの欠陥を特定することに対する研究の関心が高まっている。しかし、既存のスマートコントラクトファジングツールは依然として満足できるものではない。それらは意味のあるトランザクションシーケンスを選別し、トランザクションごとに重要な入力を指定することに苦労する。その結果、限られた範囲のコントラクト状態しか引き起こせず、深い状態空間に隠された複雑な脆弱性を明らかにするのが難しい。本稿では,シーケンス対応ミューテーションとシードマスクガイダンスの戦略を用いて,スマートコントラクトファジングに取り組む。特に,まずデータフローに基づくフィードバックを用いてトランザクション順序を意味のある方法で決定し,さらにより深い状態を探索するためのシーケンス対応ミューテーション手法を導入する。その後、生成されるトランザクション入力がターゲット分岐に到達するように偏らせる、マスク誘導型のシードミューテーション戦略を設計する。さらに,ファジングキャンペーン中のファジング資源割り当てのバランスをとる動的適応型エネルギー調整パラダイムを開発する。これらの設計を MuFuzz という新しいスマートコントラクトファザーに実装し,3つのベンチマークで広範囲に評価する。実証的な結果は、MuFuzzが既存のツールよりも、ブランチカバレッジとバグ発見の両方で優れていることを示している。全体として、MuFuzzは最先端のファザーよりも高いブランチカバレッジ(最大25%)を実現し、既存のバグ検出器よりも30%多くのバグを検出する。

As blockchain smart contracts become more widespread and carry more valuable digital assets, they become an increasingly attractive target for attackers. Over the past few years, smart contracts have been subject to a plethora of devastating attacks, resulting in billions of dollars in financial losses. There has been a notable surge of research interest in identifying defects in smart contracts. However, existing smart contract fuzzing tools are still unsatisfactory. They struggle to screen out meaningful transaction sequences and specify critical inputs for each transaction. As a result, they can only trigger a limited range of contract states, making it difficult to unveil complicated vulnerabilities hidden in the deep state space. In this paper, we shed light on smart contract fuzzing by employing a sequence-aware mutation and seed mask guidance strategy. In particular, we first utilize data-flow-based feedback to determine transaction orders in a meaningful way and further introduce a sequence-aware mutation technique to explore deeper states. Thereafter, we design a mask-guided seed mutation strategy that biases the generated transaction inputs to hit target branches. In addition, we develop a dynamic-adaptive energy adjustment paradigm that balances the fuzzing resource allocation during a fuzzing campaign. We implement our designs into a new smart contract fuzzer named MuFuzz, and extensively evaluate it on three benchmarks. Empirical results demonstrate that MuFuzz outperforms existing tools in terms of both branch coverage and bug finding. Overall, MuFuzz achieves higher branch coverage than state-of-the-art fuzzers (up to 25%) and detects 30% more bugs than existing bug detectors.

Translated: 2024-03-18 12:56:06 Published: 2023-12-10

# Vivisecting the Dissection: On the Role of Trusted Components in BFT Protocols ( http://arxiv.org/abs/2312.05714v1 )

License: see the linked page

Alysson Bessani, Miguel Correia, Tobias Distler, Rüdiger Kapitza, Paulo Esteves-Verissimo, Jiangshan Yu

A recent paper by Gupta et al. (EuroSys'23) challenged the usefulness of trusted component (TC) based Byzantine fault-tolerant (BFT) protocols to lower the replica group size from $3f+1$ to $2f+1$, identifying three limitations of such protocols and proposing that TCs should be used instead to improve the performance of BFT protocols. Here, we point out flaws in both arguments and advocate that the most worthwhile use of TCs in BFT protocols is indeed to make them as resilient as crash fault-tolerant (CFT) protocols, which can tolerate up to $f$ faulty replicas using $2f+1$ replicas.
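
As a quick illustration of the two bounds mentioned in the abstract, the sketch below computes the replica group size for a given $f$ under each bound, and the largest $f$ a group of $n$ replicas can tolerate. The function names `replicas_needed` and `max_faults` are hypothetical; they only encode the $3f+1$ and $2f+1$ formulas and are not taken from the paper.

```python
def replicas_needed(f: int, trusted_component: bool) -> int:
    """Replica group size needed to tolerate f faulty replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    # Classic BFT uses 3f+1 replicas; the TC-based protocols discussed
    # in the abstract use 2f+1, the same size as crash fault-tolerant ones.
    return 2 * f + 1 if trusted_component else 3 * f + 1


def max_faults(n: int, trusted_component: bool) -> int:
    """Largest f that a group of n replicas can tolerate under each bound."""
    if n < 1:
        raise ValueError("n must be positive")
    divisor = 2 if trusted_component else 3
    return (n - 1) // divisor
```

For example, tolerating one faulty replica takes four replicas under the $3f+1$ bound and three under the $2f+1$ bound.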

Translated: 2024-03-18 12:46:22 Published: 2023-12-10

# Fast Internet Computer Consensus ( http://arxiv.org/abs/2312.05869v1 )

License: see the linked page

Massimo Albarello, Jakub Sliwinski, Yann Vonlanthen, Roger Wattenhofer

This paper presents the first rotating leader state machine replication (SMR) protocol that allows transactions to be confirmed in just a single round-trip time in the Byzantine fault tolerance (BFT) setting. Based on minimal alterations to the Internet Computer Consensus (ICC) protocol and with negligible communication overhead, we introduce a novel dual mode mechanism that enables optimal block finalization latency in the fast path. Crucially, the modes of operation are integrated, such that even if the fast path is not effective, no penalties are incurred. Moreover, our algorithm maintains the core attributes of the original ICC protocol, including optimistic responsiveness and rotating leaders without the necessity for a view-change protocol. We prove the correctness of our Fast Internet Computer Consensus (FICC) protocol and provide an open-source implementation of it. The FICC and original ICC protocols are compared in a globally distributed wide-area network. Our evaluation reveals that the FICC protocol achieves reduced latency compared to the ICC protocol, without requiring additional security assumptions. Furthermore, by increasing the number of replicas to $n = 5f + 1$, we show that latency improvements close to the theoretical maximum of 33% are attainable. We conclude by highlighting the network topology as a significant factor in evaluating and comparing the latency of consensus algorithms.
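
The abstract does not say where the 33% figure comes from. One reading that is consistent with it, stated here purely as an assumption, is that the baseline path finalizes a block in three one-way message delays while the fast path needs two (a single round trip). The sketch below does that arithmetic and also computes the largest $f$ for which $n \ge 5f + 1$ holds; both function names are hypothetical.

```python
def max_faults_fast_path(n: int) -> int:
    """Largest f such that n >= 5f + 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 5


def relative_latency_gain(baseline_delays: int, fast_delays: int) -> float:
    """Fraction of latency saved when moving from baseline to fast path."""
    if baseline_delays <= 0 or not 0 <= fast_delays <= baseline_delays:
        raise ValueError("expected 0 <= fast_delays <= baseline_delays, baseline > 0")
    return (baseline_delays - fast_delays) / baseline_delays


# Assumed delay counts (not stated in the abstract): 3 for baseline, 2 for fast path.
gain = relative_latency_gain(3, 2)
```

Under these assumed delay counts the gain is one third, which matches the quoted theoretical maximum.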

Translated: 2024-03-18 12:46:22 Published: 2023-12-10

# TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining ( http://arxiv.org/abs/2312.07575v1 )

License: see the linked page

Mohammad Mamun, Scott Buffett

Audit logs containing system-level events are frequently used for behavior modeling as they can provide detailed insight into cyber-threat occurrences. However, mapping low-level system events in audit logs to high-level behaviors has been a major challenge in identifying host contextual behavior for the purpose of detecting potential cyber threats. Relying on domain expert knowledge may limit its practical implementation. This paper presents TapTree, an automated process-tree based technique to extract host behavior by compiling system events' semantic information. After extracting behaviors as system-generated process trees, TapTree integrates event semantics as a representation of behaviors. To further reduce pattern matching workloads for the analyst, TapTree aggregates semantically equivalent patterns and optimizes representative behaviors. In our evaluation against a recent benchmark audit log dataset (DARPA OpTC), TapTree employs tree pattern queries and sequential pattern mining techniques to deduce the semantics of connected system events, achieving high accuracy for behavior abstraction and then Advanced Persistent Threat (APT) attack detection. Moreover, we illustrate how to update the baseline model gradually online, allowing it to adapt to new log patterns over time.

Translated: 2024-03-18 12:26:52 Published: 2023-12-10

# Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning ( http://arxiv.org/abs/2401.08632v1 )

License: see the linked page

Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully

A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and is thus limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.

Translated: 2024-01-22 09:51:18 Published: 2023-12-10

# The Conquest of Quantum Genetic Algorithms: The Adventure to Cross the Valley of Death ( http://arxiv.org/abs/2401.08631v1 )

License: see the linked page

Rafael Lahoz-Beltra

In recent years, the emergence of the first quantum computers at a time when AI is undergoing a fruitful era has led many AI researchers to be tempted into adapting their algorithms to run on a quantum computer. However, in many cases the initial enthusiasm has ended in frustration, since the features and principles underlying quantum computing are very different from those of traditional computers. In this paper, we present a discussion of the difficulties arising when designing a quantum version of an evolutionary algorithm based on Darwin's evolutionary mechanism, the so-called genetic algorithms. The paper includes the code in both Python and QISKIT of the quantum version of one of these evolutionary algorithms, allowing the reader to experience the setbacks arising when translating a classical algorithm to its quantum version. The algorithm studied in this paper, termed RQGA (Reduced Quantum Genetic Algorithm), has been chosen as an example that clearly shows these difficulties, which are common to other AI algorithms.

Translated: 2024-01-22 09:50:53 Published: 2023-12-10

# Music Recommendation on Spotify using Deep Learning ( http://arxiv.org/abs/2312.10079v1 )

License: see the linked page

Chhavi Maheshwari

Spotify hosts about 50 million songs and 4 billion playlists, and an enormous amount of data is generated there every single day: upwards of 600 gigabytes (harvard.edu). Since the algorithms that Spotify uses in its recommendation systems are proprietary and confidential, the code for its big data analytics and recommendation can only be speculated about. However, it is widely theorized that Spotify uses two main strategies, exploration and exploitation (kaggle.com), to target users' playlists and the personalized mixes that are well known for their retention. This paper aims to apply deep-learning-based filtering for maximum user likeability. The architecture achieves 98.57% training accuracy and 80% validation accuracy.

Translated: 2024-01-15 13:49:40 Published: 2023-12-10

# Early ChatGPT User Portrait through the Lens of Data ( http://arxiv.org/abs/2312.10078v1 )

License: see the linked page

Yuyang Deng, Ni Zhao, Xin Huang

Since its launch, ChatGPT has achieved remarkable success as a versatile conversational AI platform, drawing millions of users worldwide and garnering widespread recognition across academic, industrial, and general communities. This paper aims to paint a portrait of early GPT users and understand how they evolved. Specific questions include their topics of interest and their potential careers, and how these change over time. We conduct a detailed analysis of real-world ChatGPT datasets with multi-turn conversations between users and ChatGPT. Through a multi-pronged approach, we quantify conversation dynamics by examining the number of turns, then gauge sentiment to understand user sentiment variations, and finally employ Latent Dirichlet Allocation (LDA) to discern overarching topics within the conversation. By understanding shifts in user demographics and interests, we aim to shed light on the changing nature of human-AI interaction and anticipate future trends in user engagement with language models.

Translated: 2024-01-15 13:49:30 Published: 2023-12-10

# The performance of multiple language models in identifying offensive language on social media ( http://arxiv.org/abs/2312.11504v1 )

License: see the linked page

Hao Li, Brandon Bennett

Text classification is an important topic in the field of natural language processing. It has been preliminarily applied in information retrieval, digital libraries, automatic abstracting, text filtering, word semantic discrimination and many other fields. The aim of this research is to use a variety of algorithms to test the ability to identify offensive posts and to evaluate their performance against a variety of assessment methods. The motivation for this project is to reduce the harm that such language causes to human censors by automating the screening of offending posts. The field is a new one, and despite much interest in the past two years, there has been no focus on the object of the offence. The experiments in this project should inspire future research on identification methods as well as identification content.

Translated: 2024-01-15 13:23:34 Published: 2023-12-10

# Speech and Text-Based Emotion Recognizer ( http://arxiv.org/abs/2312.11503v1 )

License: see the linked page

Varun Sharma

Affective computing is a field of study that focuses on developing systems and technologies that can understand, interpret, and respond to human emotions. Speech Emotion Recognition (SER), in particular, has received a lot of attention from researchers in the recent past. However, in many cases, the publicly available datasets used for training and evaluation are scarce and imbalanced across the emotion labels. In this work, we focused on building a balanced corpus from these publicly available datasets by combining them and by employing various speech data augmentation techniques. Furthermore, we experimented with different architectures for speech emotion recognition. Our best system, a multi-modal speech- and text-based model, provides a performance of UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to the baseline algorithm performance of 119.66.
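
A combined UA + WA score can exceed 100 because it adds two percentages. The sketch below assumes the convention common in speech emotion recognition work, where WA is the overall accuracy and UA is the mean of the per-class recalls; the paper may define them differently, and the function names and toy labels here are illustrative only.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all samples classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, imbalanced example: four "angry" samples and two "sad" samples.
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

wa = weighted_accuracy(y_true, y_pred)    # 4 of 6 samples are correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 3/4 and 1/2
combined = 100 * (ua + wa)                # reported on a 0-200 scale
```

On imbalanced data the two metrics diverge, which is one reason to report both.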

Translated: 2024-01-15 13:23:22 Published: 2023-12-10

# Consciousness as a logically consistent and prognostic model of reality ( http://arxiv.org/abs/2401.00005v1 )

License: see the linked page

Evgenii Vityaev

The work demonstrates that the brain might reflect the causal relationships of the external world in the form of a logically consistent and prognostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and provides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that the brain makes all possible inferences from causal relationships. We prove that the suggested formal model has the property of unambiguous inference: from consistent premises we infer a consistent conclusion. It enables the set of all inferences to form a consistent model of the perceived world. Causal relationships may create fixed points of cyclic inter-predictable properties. We consider the "natural" classification introduced by John Stuart Mill and demonstrate that a variety of fixed points of the objects' attributes forms a "natural" classification of the external world. Then we consider the notions of "natural" categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder, and demonstrate that fixed points of causal relationships between the object attributes that we perceive formalize these notions. If the "natural" classification describes the objects of the external world, and "natural" concepts describe the perception of these objects, then the theory of integrated information, introduced by G. Tononi, describes the information processes of the brain for the formation of "natural" concepts that reflect the "natural" classification. We argue that integrated information provides high accuracy of object identification. A computer-based experiment is provided that illustrates fixed-point formation for coded digits.

Translated: 2024-01-15 12:41:04 Published: 2023-12-10

# Informational non-reductionist theory of consciousness that providing maximum accuracy of reality prediction ( http://arxiv.org/abs/2401.00004v1 )

License: see the linked page

E.E. Vityaev

The paper considers a non-reductionist theory of consciousness, which is not reducible to theories of reality or to physiological or psychological theories. Following D.I. Dubrovsky's "informational approach" to the "Mind-Brain Problem", we consider reality through the prism of information about observed phenomena, which, in turn, is perceived by subjective reality through sensations, perceptions, feelings, etc., which, in turn, are information about the corresponding brain processes. Within this framework the following principle of the Information Theory of Consciousness (ITS) development is put forward: the brain discovers all possible causal relations in the external world and makes all possible inferences from them. The paper shows that ITS built on this principle: (1) is also based on the information laws of the structure of the external world; (2) explains the structure and functioning of the brain's functional systems and cellular ensembles; (3) ensures maximum accuracy of predictions and the anticipation of reality; (4) resolves emerging contradictions; and (5) is an information theory of the brain's reflection of reality.

Translated: 2024-01-15 12:40:38 Published: 2023-12-10

# Singular Value Penalization and Semantic Data Augmentation for Fully Test-Time Adaptation ( http://arxiv.org/abs/2312.08378v1 )

License: see the linked page

Houcheng Su, Daixian Liu, Mengzhu Wang, Wei Wang

Fully test-time adaptation (FTTA) adapts a model that is trained on a source domain to a target domain during the testing phase, where the two domains follow different distributions and source data is unavailable during the training phase. Existing methods usually adopt entropy minimization to reduce the uncertainty of target prediction results, and improve the FTTA performance accordingly. However, they fail to ensure diversity in the target prediction results. A recent domain adaptation study has shown that maximizing the sum of singular values of prediction results can simultaneously enhance their confidence (discriminability) and diversity. However, during the training phase, larger singular values usually take up a dominant position in loss maximization. This results in the model being more inclined to enhance discriminability for easily distinguishable classes, and the improvement in diversity is insufficiently effective. Furthermore, the adaptation and prediction in FTTA only use data from the current batch, which may lead to the risk of overfitting. To address these issues, we propose maximizing the sum of singular values while minimizing their variance. This shifts the model's focus toward the smaller singular values, enhancing discriminability between more challenging classes and effectively increasing the diversity of prediction results. Moreover, we incorporate data from the previous batch to realize semantic data augmentation for the current batch, reducing the risk of overfitting. Extensive experiments on benchmark datasets show that our proposed approach outperforms several of the state-of-the-art FTTA methods it is compared against.
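
The objective described in the abstract can be written compactly. As a sketch under stated assumptions, let $\sigma_1, \dots, \sigma_r$ be the singular values of the batch prediction matrix $P$, and let $\lambda > 0$ be a hypothetical trade-off weight; the abstract gives neither the symbols nor a weight, so the exact form used in the paper may differ.

```latex
\mathcal{L}(P) \;=\; -\sum_{i=1}^{r} \sigma_i
  \;+\; \lambda \cdot \frac{1}{r} \sum_{i=1}^{r} \left( \sigma_i - \bar{\sigma} \right)^2,
\qquad
\bar{\sigma} \;=\; \frac{1}{r} \sum_{i=1}^{r} \sigma_i
```

Minimizing $\mathcal{L}$ maximizes the sum of singular values, while the variance term discourages a few large singular values from dominating, which pushes weight toward the smaller ones.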

Translated: 2023-12-16 03:23:23 Published: 2023-12-10

# I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions ( http://arxiv.org/abs/2312.08869v1 )

License: see the linked page

Chengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, Lan Xu

We are living in a world surrounded by diverse and "smart" devices with rich modalities of sensing ability. Conveniently capturing the interactions between us humans and these objects remains challenging. In this paper, we present I'm-HOI, a monocular scheme to faithfully capture the 3D motions of both the human and the object in a novel setting: using a minimal setup of an RGB camera and an object-mounted Inertial Measurement Unit (IMU). It combines general motion inference and category-aware refinement. For the former, we introduce a holistic human-object tracking method to fuse the IMU signals and the RGB stream and progressively recover the human motions and subsequently the companion object motions. For the latter, we tailor a category-aware motion diffusion model, which is conditioned on both the raw IMU observations and the results from the previous stage under an over-parameterized representation. It significantly refines the initial results and generates vivid body, hand, and object motions. Moreover, we contribute a large dataset with ground truth human and object motions, dense RGB inputs, and rich object-mounted IMU measurements. Extensive experiments demonstrate the effectiveness of I'm-HOI under a hybrid capture setting. Our dataset and code will be released to the community.

Translated: 2023-12-15 22:37:52 Published: 2023-12-10

# Semi-Supervised Segmentation of Functional Tissue Units at the Cellular Level ( http://arxiv.org/abs/2305.02148v2 )

License: see the linked page

Volodymyr Sydorskyi, Igor Krashenyi, Denis Sakva and Oleksandr Zarichkovyi

We present a new method for functional tissue unit segmentation at the cellular level, which utilizes the latest deep learning semantic segmentation approaches together with domain adaptation and semi-supervised learning techniques. This approach minimizes the domain gap, the class imbalance, and the influence of capture settings between the HPA and HubMAP datasets. The presented approach achieves results comparable with the state of the art in functional tissue unit segmentation at the cellular level. The source code is available at https://github.com/VSydorskyy/hubmap_2022_htt_solution

Translated: 2023-12-14 20:49:12 Published: 2023-12-10

# Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs ( http://arxiv.org/abs/2311.15317v2 )

License: see the linked page

Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen and Xinming Zhang

Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce the labeling requirement, the "pre-train, prompt" paradigm has become increasingly common. However, existing studies of prompting on graphs are limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.

Translated: 2023-12-14 20:04:32 Published: 2023-12-10

# On the Use of Smart Hybrid Contracts to Provide Flexibility in Algorithmic Governance ( http://arxiv.org/abs/2312.07565v1 )

License: see the linked page

Carlos Molina-Jimenez and Sandra Milena Felizia

The use of computer technology to automate the enforcement of law is a promising alternative to simplify bureaucratic procedures. However, careless automation might result in an inflexible and dehumanised law enforcement system driven by algorithms that do not account for the particularities of individuals or minorities. In this paper, we argue that hybrid smart contracts deployed to monitor rather than to blindly enforce regulations can be used to add flexibility. Enforcement is a suitable alternative only when prevention is strictly necessary; however, we argue that in many situations a corrective approach based on monitoring is more flexible and suitable. To add more flexibility, the hybrid smart contract can be programmed to stop and request the intervention of a human, or of a group of humans, when human judgement is needed.

Translated: 2023-12-14 18:18:40 Published: 2023-12-10

# Observation of dynamic non-Hermitian skin effects ( http://arxiv.org/abs/2312.07564v1 )

License: see the linked page

Zhen Li, Li-Wei Wang, Xulong Wang, Zhi-Kang Lin, Guancong Ma, and Jian-Hua Jiang

Non-Hermitian effects have emerged as a new paradigm for the manipulation of phases of matter that profoundly changes our understanding of non-equilibrium systems, introducing novel concepts such as exceptional points and spectral topology, as well as exotic phenomena such as non-Hermitian skin effects (NHSEs). Most existing studies, however, focus on non-Hermitian eigenstates, whereas dynamic properties of non-Hermitian systems have been discussed only very recently, predicting unexpected phenomena such as wave self-healing, chiral Zener tunneling, and the dynamic NHSEs that are not yet confirmed in experiments. Here, we report the first experimental observation of rich non-Hermitian skin dynamics using tunable one-dimensional nonreciprocal double-chain mechanical systems with glide-time symmetry. Remarkably, dynamic NHSEs are observed with various dynamic behaviors in different dynamic phases, revealing the intriguing nature of these phases that can be understood via the generalized Brillouin zone and the related concepts. Moreover, the observed tunable non-Hermitian skin dynamics and amplifications, the bulk unidirectional wave propagation, and the boundary wave trapping provide promising ways to guide, trap, and amplify waves in a controllable and robust way. Our findings unveil the fundamental aspects and open a new pathway toward non-Hermitian dynamics, which will fertilize the study of non-equilibrium phases of matter and give rise to novel applications in information processing.

Translated: 2023-12-14 18:18:26 Published: 2023-12-10

# Towards Automated Support for the Co-Evolution of Meta-Models and Grammars

http://arxiv.org/abs/2312.07582v1

License: see the linked page

Weixing Zhang

Blended modeling is an emerging paradigm involving seamless interaction between multiple notations for the same underlying modeling language. We focus on a model-driven engineering (MDE) approach based on meta-models to develop textual languages to improve the blended modeling capabilities of modeling tools. In this thesis, we propose an approach that can support the co-evolution of meta-models and grammars as language engineers develop textual languages in a meta-model-based MDE setting. First, we comprehensively report on the challenges and limitations of modeling tools that support blended modeling, as well as opportunities to improve them. Second, we demonstrate how language engineers can extend Xtext's generator capabilities according to their needs. Third, we propose a semi-automatic method to transform a language with a generated grammar into a Python-style language. Finally, we provide a solution (i.e., GrammarOptimizer) that can support rapid prototyping of languages in different styles and the co-evolution of meta-models and grammars of evolving languages.

Translated: 2023-12-14 18:12:15 Published: 2023-12-10

# COVID-19 Detection Using Slices Processing Techniques and a Modified Xception Classifier from Computed Tomography Images

http://arxiv.org/abs/2312.07580v1

License: see the linked page

Kenan Morani

This paper extends our previous method for COVID-19 diagnosis, proposing an enhanced solution for detecting COVID-19 from computed tomography (CT) images. To decrease model misclassifications, two key image-processing steps were employed. First, the uppermost and lowermost slices were removed, preserving sixty percent of each patient's slices. Second, all slices underwent manual cropping to emphasize the lung areas. Subsequently, resized CT scans (224 × 224) were input into an Xception transfer learning model. Leveraging Xception's architecture and pre-trained weights, the modified model achieved binary classification. Promising results on the COV19-CT database showcased higher validation accuracy and macro F1 score at both the slice and patient levels compared to our previous solution and alternatives on the same dataset.
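The slice-selection step described in this abstract can be sketched roughly as follows. This is only an illustration, assuming that the retained sixty percent is the central block of an ordered slice stack, with the discarded slices split evenly between top and bottom; the abstract does not state the split, and `keep_central_slices` is a hypothetical helper name, not code from the paper.

```python
def keep_central_slices(slices, keep_fraction=0.6):
    """Return the central `keep_fraction` of an ordered list of CT slices.

    Assumption (not from the paper): the removed slices are split evenly
    between the top and the bottom of the stack.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n = len(slices)
    if n == 0:
        return []
    # Keep at least one slice, rounding to the nearest whole slice count.
    n_keep = max(1, round(n * keep_fraction))
    start = (n - n_keep) // 2
    return slices[start:start + n_keep]
```

Each slice here can be any object (a file path, an array); the manual cropping and the resize to 224 × 224 would follow as separate steps.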

Translated: 2023-12-14 18:11:44 Published: 2023-12-10

# Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy

http://arxiv.org/abs/2312.07579v1

License: see the linked page

Devin Gonier, Adrian Adduci, Cassidy LoCascio

AI Alignment research seeks to align human and AI goals to ensure independent actions by a machine are always ethical. This paper argues that empathy is necessary for this task, despite often being neglected in favor of more deductive approaches. We offer an inside-out approach that grounds morality within the context of the brain as a basis for algorithmically understanding ethics and empathy. These arguments are justified via a survey of relevant literature. The paper concludes with a suggested experimental approach to future research and some initial experimental observations.

Translated: 2023-12-14 18:10:49 Published: 2023-12-10

# Benchmarking Distribution Shift in Tabular Data with TableShift

http://arxiv.org/abs/2312.07577v1

License: see the linked page

Josh Gardner, Zoran Popovic, Ludwig Schmidt

Robustness to distribution shift has become a growing concern for text and image models as they transition from research subjects to deployment in the real world. However, high-quality benchmarks for distribution shift in tabular machine learning tasks are still lacking despite the widespread real-world use of tabular data and differences in the models used for tabular data in comparison to text and images. As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. TableShift contains 15 binary classification tasks in total, each with an associated shift, and includes a diverse set of data sources, prediction targets, and distribution shifts. The benchmark covers domains including finance, education, public policy, healthcare, and civic participation, and is accessible using only a few lines of Python code via the TableShift API. We conduct a large-scale study comparing several state-of-the-art tabular data models alongside robust learning and domain generalization methods on the benchmark tasks. Our study demonstrates (1) a linear trend between in-distribution (ID) and out-of-distribution (OOD) accuracy; (2) domain robustness methods can reduce shift gaps but at the cost of reduced ID accuracy; (3) a strong relationship between shift gap (difference between ID and OOD performance) and shifts in the label distribution. The benchmark data, Python package, model implementations, and more information about TableShift are available at https://github.com/mlfoundations/tableshift and https://tableshift.org .
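The "shift gap" mentioned in this abstract is the difference between ID and OOD performance. A minimal sketch of that quantity, using plain accuracy on label lists, might look like the following. It does not use the TableShift API; the function names are hypothetical, and the sign convention (ID minus OOD, so a positive gap means worse OOD performance) is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def shift_gap(id_true, id_pred, ood_true, ood_pred):
    """ID accuracy minus OOD accuracy (assumed sign convention)."""
    return accuracy(id_true, id_pred) - accuracy(ood_true, ood_pred)
```

For example, a model that scores 0.75 on an in-distribution test split and 0.50 on the shifted split has a shift gap of 0.25 under this convention.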

Translated: 2023-12-14 18:10:34 Published: 2023-12-10

# Arabic Handwritten Text Line Dataset

http://arxiv.org/abs/2312.07573v1

License: see the linked page

Hakim Bouchal and Ahror Belaid

Segmentation of Arabic manuscripts into lines of text and words is an important step to make recognition systems more efficient and accurate. The problem of segmentation into text lines is solved, since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there is no dataset annotating the word positions of Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate positions at the word level.

Translated: 2023-12-14 18:09:49 Published: 2023-12-10

# Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People

http://arxiv.org/abs/2312.07571v1

License: see the linked page

Chenhao He and Pramit Saha

The utilization of deep learning-based object detection is an effective approach to assist visually impaired individuals in avoiding obstacles. In this paper, we implemented seven different YOLO object detection models, viz., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and YOLOv5, and performed a comprehensive evaluation with carefully tuned hyperparameters to analyze how these models performed on images containing common daily-life objects present on roads and sidewalks. After a systematic investigation, YOLOv8 was found to be the best model, reaching a precision of 80% and a recall of 68.2% on a well-known Obstacle Dataset, which includes images from the VOC dataset, COCO dataset, and TT100K dataset along with images collected by the researchers in the field. Despite being the latest model and demonstrating better performance in many other applications, YOLO-NAS was found to be suboptimal for the obstacle detection task.
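For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how precision and recall follow from detection counts. It is a generic illustration, not the authors' evaluation code: the counts are hypothetical, and the step that matches predicted boxes to ground-truth boxes (for example by an IoU threshold) is assumed to have already happened.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from detection counts.

    true_pos:  detections matched to a ground-truth object
    false_pos: detections with no matching ground-truth object
    false_neg: ground-truth objects that were not detected
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

A detector with high precision but lower recall, as reported here, raises few false alarms but misses some obstacles.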

Translated: 2023-12-14 18:09:13 Published: 2023-12-10

# Quantum Private Information Retrieval from Coded Storage Systems

http://arxiv.org/abs/2312.07570v1

License: see the linked page

Matteo Allaix

In the era of extensive data growth, robust and efficient mechanisms are needed to store and manage vast amounts of digital information, such as Data Storage Systems (DSSs). Concurrently, privacy concerns have arisen, leading to the development of techniques like Private Information Retrieval (PIR) to enable data access while preserving privacy. A PIR protocol allows users to retrieve information from a database without revealing the specifics of their query or the data they are accessing. With the advent of quantum computing, researchers have explored the potential of using quantum systems to enhance privacy in information retrieval. In a Quantum Private Information Retrieval (QPIR) protocol, a user can retrieve information from a database by downloading quantum systems from multiple servers, while ensuring that the servers remain oblivious to the specific information being accessed. This scenario offers a unique advantage by leveraging the inherent properties of quantum systems to provide enhanced privacy guarantees and improved communication rates compared to classical PIR protocols. In this thesis, we consider the QPIR setting where the queries and the coded storage systems are classical, while the responses from the servers are quantum. This problem was treated by Song et al. for replicated storage and different collusion patterns. This thesis aims to develop QPIR protocols for coded storage by combining known classical PIR protocols with quantum communication algorithms, achieving enhanced privacy and communication costs. We consider different storage codes and robustness assumptions, and we prove that the achieved communication cost is always lower than the classical counterparts.
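To make the basic PIR idea in this abstract concrete, here is a minimal sketch of a classical two-server scheme over replicated storage, in the style of the well-known XOR construction. It is not the thesis's protocol: it involves no coded storage and no quantum responses, it assumes the two servers do not collude, and all function names are hypothetical.

```python
import secrets


def make_queries(num_records, index):
    """User side: one query per server for a database of num_records items."""
    # A uniformly random subset of record positions, encoded as bits.
    query_1 = [secrets.randbelow(2) for _ in range(num_records)]
    # The second query differs from the first only at the wanted index.
    query_2 = list(query_1)
    query_2[index] ^= 1
    return query_1, query_2


def answer(database, query):
    """Server side: XOR of all records selected by the query bits."""
    result = 0
    for record, bit in zip(database, query):
        if bit:
            result ^= record
    return result


def retrieve(database, index):
    """Run the protocol against two servers holding the same database."""
    query_1, query_2 = make_queries(len(database), index)
    # Every record except the wanted one cancels out in the XOR.
    return answer(database, query_1) ^ answer(database, query_2)
```

Each query on its own is a uniformly random bit vector, so a single server learns nothing about `index`; the cost is that the user uploads one bit per record to each server, which is the kind of communication cost that PIR research tries to reduce.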

Translated: 2023-12-14 18:08:38 Published: 2023-12-10

# A Study on the Generality of Neural Network Structures for Monocular Depth Estimation

http://arxiv.org/abs/2301.03169v3

License: see the linked page

Jinwoo Bae and Kyumin Hwang and Sunghoon Im

Monocular depth estimation has been widely studied, and significant improvements in performance have been recently reported. However, most previous works are evaluated on a few benchmark datasets, such as the KITTI datasets, and none of the works provide an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we deeply investigate various backbone networks (e.g., CNN and Transformer models) with respect to the generalization of monocular depth estimation. First, we evaluate state-of-the-art models on both in-distribution and out-of-distribution datasets, which have never been seen during network training. Then, we investigate the internal properties of the representations from the intermediate layers of CNN-/Transformer-based models using synthetic texture-shifted datasets. Through extensive experiments, we observe that Transformers exhibit a strong shape bias, whereas CNNs have a strong texture bias. We also discover that texture-biased models exhibit worse generalization performance for monocular depth estimation than shape-biased models. We demonstrate that similar aspects are observed in real-world driving datasets captured under diverse environments. Lastly, we conduct a dense ablation study with various backbone networks that are utilized in modern strategies. The experiments demonstrate that the intrinsic locality of CNNs and the self-attention of Transformers induce texture bias and shape bias, respectively.

Translated: 2023-12-13 20:53:10 Published: 2023-12-10

# Gotta be SAFE: A New Framework for Molecular Design

http://arxiv.org/abs/2310.10773v2

License: see the linked page

Emmanuel Noutahi, Cristian Gabellini, Michael Craig, Jonathan S.C Lim, Prudencio Tossou

Traditional molecular string representations, such as SMILES, often pose challenges for AI-driven molecular design due to their non-sequential depiction of molecular substructures. To address this issue, we introduce Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures. SAFE reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. It streamlines complex generative tasks, including scaffold decoration, fragment linking, polymer generation, and scaffold hopping, while facilitating autoregressive generation for fragment-constrained design, thereby eliminating the need for intricate decoding or graph-based models. We demonstrate the effectiveness of SAFE by training an 87-million-parameter GPT2-like model on a dataset containing 1.1 billion SAFE representations. Through targeted experimentation, we show that our SAFE-GPT model exhibits versatile and robust optimization performance. SAFE opens up new avenues for the rapid exploration of chemical space under various constraints, promising breakthroughs in AI-driven molecular design.

Translated: 2023-12-13 19:32:51 Published: 2023-12-10

# LLMGA: Multimodal Large Language Model based Generation Assistant

http://arxiv.org/abs/2311.16500v2

License: see the linked page

Bin Xia, Shiyin Wang, Yingfan Tao, Yitong Wang, and Jiaya Jia

In this paper, we introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA), leveraging the vast reservoir of knowledge and proficiency in reasoning, comprehension, and response inherent in Large Language Models (LLMs) to assist users in image generation and editing. Diverging from existing approaches where Multimodal Large Language Models (MLLMs) generate fixed-size embeddings to control Stable Diffusion (SD), our LLMGA provides a detailed language generation prompt for precise control over SD. This not only augments LLM context understanding but also reduces noise in generation prompts, yields images with more intricate and precise content, and elevates the interpretability of the network. To this end, we curate a comprehensive dataset comprising prompt refinement, similar image generation, inpainting and outpainting, and visual question answering. Moreover, we propose a two-stage training scheme. In the first stage, we train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts. In the second stage, we optimize SD to align with the MLLM's generation prompts. Additionally, we propose a reference-based restoration network to alleviate texture, brightness, and contrast disparities between generated and preserved regions during image editing. Extensive results show that LLMGA has promising generative capabilities and can enable wider applications in an interactive manner.

Translated: 2023-12-13 19:23:29 Published: 2023-12-10

# Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning

http://arxiv.org/abs/2312.06699v1

License: see the linked page

Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Rahul Pratap Singh, Bishmoy Paul, Ali Dabouei, Min Xu

A thorough comprehension of textual data is a fundamental element in multi-modal video analysis tasks. However, recent works have shown that current models do not achieve a comprehensive understanding of the textual data during training for the target downstream tasks. Orthogonal to the previous approaches to this limitation, we postulate that understanding the significance of the sentence components according to the target task can potentially enhance the performance of the models. Hence, we utilize the knowledge of a pre-trained large language model (LLM) to generate text samples from the original ones, targeting specific sentence components. We propose a weakly supervised importance estimation module to compute the relative importance of the components and utilize them to improve different video-language tasks. Through rigorous quantitative analysis, our proposed method exhibits significant improvement across several video-language tasks. In particular, our approach notably enhances video-text retrieval by a relative improvement of 8.3% in video-to-text and 1.4% in text-to-video retrieval over the baselines, in terms of R@1. Additionally, in video moment retrieval, average mAP shows a relative improvement ranging from 2.0% to 13.7% across different baselines.

Translated: 2023-12-13 18:59:56 Published: 2023-12-10

# TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

http://arxiv.org/abs/2312.06713v1

License: see the linked page

Minye Wu, Zehao Wang, Georgios Kouros, Tinne Tuytelaars

Neural Radiance Fields (NeRF) revolutionize the realm of visual media by providing photorealistic Free-Viewpoint Video (FVV) experiences, offering viewers unparalleled immersion and interactivity. However, the technology's significant storage requirements and the computational complexity involved in generation and rendering currently limit its broader application. To close this gap, this paper presents Temporal Tri-Plane Radiance Fields (TeTriRF), a novel technology that significantly reduces the storage size for FVV while maintaining low-cost generation and rendering. TeTriRF introduces a hybrid representation with tri-planes and voxel grids to support scaling up to long-duration sequences and scenes with complex motions or rapid changes. We propose a group training scheme tailored to achieving high training efficiency and yielding temporally consistent, low-entropy scene representations. Leveraging these properties of the representations, we introduce a compression pipeline with off-the-shelf video codecs, achieving an order of magnitude less storage size compared to the state-of-the-art. Our experiments demonstrate that TeTriRF can achieve competitive quality with a higher compression rate.

Translated: 2023-12-13 18:48:47 Published: 2023-12-10

# Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

http://arxiv.org/abs/2312.06712v1

License: see the linked page

Zhipeng Bao and Yijun Li and Krishna Kumar Singh and Yu-Xiong Wang and Martial Hebert

Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. While previous research efforts have individually tackled these issues, we assert that a holistic approach is paramount. Thus, we propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores, respectively. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability. Comprehensive evaluations demonstrate the superior performance of our model in terms of image realism, text-image alignment, and adaptability, notably outperforming prominent baselines. Ultimately, this research paves the way for T2I diffusion models with enhanced compositional capacities and broader applicability. The project webpage is available at https://zpbao.github.io/projects/SepEn/.

Translated: 2023-12-13 18:48:25 Published: 2023-12-10

# Physics Informed Neural Network for Option Pricing

http://arxiv.org/abs/2312.06711v1

License: see the linked page

Ashish Dhiman and Yibei Hu

We apply a physics-informed deep-learning approach, the physics-informed neural network (PINN), to the Black-Scholes equation for pricing American and European options. We test our approach on both simulated and real market data and compare it to analytical/numerical benchmarks. Our model is able to accurately capture the price behaviour on simulation data, while also exhibiting reasonable performance for market data. We also experiment with the architecture and learning process of our PINN model to provide more understanding of convergence and stability issues that impact performance.
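As an example of the kind of analytical benchmark this abstract refers to, the sketch below evaluates the standard closed-form Black-Scholes price of a European call on a non-dividend-paying asset. It is an illustration only, not the authors' code or their PINN: the function and parameter names are hypothetical, and it covers just the European call case.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Closed-form European call price, no dividends.

    spot: current asset price, strike: exercise price,
    rate: continuously compounded risk-free rate,
    sigma: annualised volatility, maturity: time to expiry in years.
    """
    if min(spot, strike, sigma, maturity) <= 0:
        raise ValueError("spot, strike, sigma and maturity must be positive")
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)
```

With `spot=100`, `strike=100`, `rate=0.05`, `sigma=0.2`, and `maturity=1`, this gives a price of about 10.45, a common reference value when checking a numerical solver against the analytical solution.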

Translated: 2023-12-13 18:48:05 Published: 2023-12-10

# Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

http://arxiv.org/abs/2312.06710v1

License: see the linked page

Khanh Doan, Quyen Tran, Tuan Nguyen, Dinh Phung, Trung Le

Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) provides techniques focused on generating samples from prior tasks to enhance the model's memory capabilities. With the progression in generative AI, generative models have advanced from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is the deterioration in the quality of generated data compared to the original, as the generator continuously self-learns from its outputs. This degradation can lead to catastrophic forgetting in the classifier. To address this, we propose the Class-Prototype Conditional Diffusion Model (CPDM), a GR-based approach for continual learning that enhances image quality in generators and thus reduces catastrophic forgetting in classifiers. The cornerstone of CPDM is a learnable class-prototype that captures the core characteristics of images in a given class. This prototype, integrated into the diffusion model's denoising process, ensures the generation of high-quality images. It maintains its effectiveness for old tasks even when new tasks are introduced, preserving image generation quality and reducing the risk of catastrophic forgetting in classifiers. Our empirical studies on diverse datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art models, highlighting its exceptional ability to preserve image quality and enhance the model's memory retention.

Translated: 2023-12-13 18:47:56 Published: 2023-12-10

# AM-RADIO: Agglomerative Model -- Reduce All Domains Into One

http://arxiv.org/abs/2312.06709v1

License: see the linked page

Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, and SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name this approach AM-RADIO (Agglomerative Model -- Reduce All Domains Into One). This integrative approach not only surpasses the performance of individual teacher models but also amalgamates their distinctive features, such as zero-shot vision-language comprehension, detailed pixel-level understanding, and open vocabulary segmentation capabilities. In pursuit of the most hardware-efficient backbone, we evaluated numerous architectures in our multi-teacher distillation pipeline using the same training recipe. This led to the development of a novel architecture (E-RADIO) that exceeds the performance of its predecessors and is at least 7x faster than the teacher models. Our comprehensive benchmarking process covers downstream tasks including ImageNet classification, ADE20k semantic segmentation, COCO object detection, and the LLaVa-1.5 framework. Code: https://github.com/NVlabs/RADIO

Translated: 2023-12-13 18:47:32 Published: 2023-12-10

# Neutral Editing Framework for Diffusion-based Video Editing

http://arxiv.org/abs/2312.06708v1

License: see the linked page

Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong, Chang D. Yoo

Text-conditioned image editing has succeeded in various types of editing based on a diffusion framework. Unfortunately, this success did not carry over to video, which continues to be challenging. Existing video editing systems are still limited to rigid-type editing such as style transfer and object overlay. To this end, this paper proposes the Neutral Editing (NeuEdit) framework to enable complex non-rigid editing by changing the motion of a person/object in a video, which has never been attempted before. NeuEdit introduces a concept of 'neutralization' that enhances the tuning-editing process of diffusion-based editing systems in a model-agnostic manner by leveraging input video and text without any other auxiliary aids (e.g., visual masks, video captions). Extensive experiments on numerous videos demonstrate the adaptability and effectiveness of the NeuEdit framework. The website of our work is available here: https://neuedit.github.io

翻訳日:2023-12-13 18:47:12 公開日:2023-12-10

# 安全・映像サーベイランス技術に対する市民の認識を探る:調査によるアプローチ

Exploring Public's Perception of Safety and Video Surveillance Technology: A Survey Approach ( http://arxiv.org/abs/2312.06707v1 )

ライセンス: Link先を確認

Babak Rahimi Ardabili, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, Gordon Hull, Shannon Reid, Hamed Tabkhi

(参考訳) 公共の安全に取り組むには、様々な利害関係者の視点、特に他の利害関係者に比べて過小評価されるコミュニティの視点を効果的に取り入れる必要がある。本研究は,コミュニティの一般公衆安全に関する懸念,既存の監視技術に対する見解,および都市環境における安全向上のためのAI駆動型ソリューションに対する認識の包括的分析である。 2023年8月と9月に410人の参加者による対人調査を含む調査アプローチを通じて、年齢、性別、民族、教育レベルなどの要因を調査し、公衆の認識と公衆の安全と可能な解決策に対する懸念について考察する。従属変数の種類に基づき,ロジット回帰や順序ロジスティック回帰といった異なる統計学的および意義分析を用いて,各種従属変数に対する人口統計因子の影響について検討した。以上の結果から,公共安全問題における人口統計学的差異が明らかになった。若い女性は、既存のビデオ監視システムへの信頼が弱くなりがちだが、高齢の教育を受けた人はモールでの暴力犯罪に関心がある。さらに、AIによる監視に対する態度は異なる。年長の黒人個人はデータのプライバシーに懸念があるにもかかわらず、それを支持している。

Addressing public safety effectively requires incorporating diverse stakeholder perspectives, particularly those of the community, which are often underrepresented compared to other stakeholders. This study presents a comprehensive analysis of the community's general public safety concerns, their view of existing surveillance technologies, and their perception of AI-driven solutions for enhancing safety in urban environments, focusing on Charlotte, NC. Through a survey approach, including in-person surveys conducted in August and September 2023 with 410 participants, this research investigates demographic factors such as age, gender, ethnicity, and educational level to gain insights into public perception and concerns toward public safety and possible solutions. Based on the type of dependent variables, we utilized different statistical and significance analyses, such as logit regression and ordinal logistic regression, to explore the effects of demographic factors on the various dependent variables. Our results reveal demographic differences in public safety concerns. Younger females tend to feel less secure yet trust existing video surveillance systems, whereas older, educated individuals are more concerned about violent crimes in malls. Additionally, attitudes towards AI-driven surveillance differ: older Black individuals demonstrate support for it despite having concerns about data privacy, while educated females show a tendency towards skepticism.

翻訳日:2023-12-13 18:46:57 公開日:2023-12-10

# UNeR3D: 教師なし再構成における2次元画像からの可変かつスケーラブルな3D RGBポイントクラウド生成

UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction ( http://arxiv.org/abs/2312.06706v1 )

ライセンス: Link先を確認

Hongbin Lin, Juangui Xu, Qingfeng Xu, Zhengyu Hu, Handing Xu, Yunzhi Chen, Yongjun Hu, Zhenguo Nie

(参考訳) 2次元画像からの3次元再構成の領域では、3次元地上真実データに依存しない高精度な再構成を実現することが課題である。 UNeR3Dは、2次元ビューのみから詳細な3次元再構成を生成するための新しい標準を定めている。我々のモデルは、教師付きアプローチに関連するトレーニングコストを大幅に削減し、3DポイントクラウドにRGBカラー化を導入し、視覚的体験を豊かにする。色レンダリングに逆距離重み付け技術を用いることで、UNeR3Dはシームレスな色遷移を保証し、視覚的忠実度を高める。私たちのモデルの柔軟なアーキテクチャは、任意の数のビューでトレーニングをサポートします。推論中に任意のビュー数を推測し、並行しない汎用性を提供する。さらに、モデルの連続的な空間入力領域は任意の解像度で点雲を生成することができ、高解像度の3D RGB点雲を作成することができる。我々は,新しい多視点幾何学的損失と色損失により再構成過程を固め,このモデルが単視点入力に優れていることを示し,教師なし学習のパラダイムを3次元視覚で再構築する。私たちのコントリビューションは、3dビジョンの大幅な進歩を示し、さまざまなアプリケーションでコンテンツを作成するための新たな地平線を提供します。コードはhttps://github.com/HongbinLin3589/UNeR3Dで入手できる。

In the realm of 3D reconstruction from 2D images, a persisting challenge is to achieve high-precision reconstructions devoid of 3D Ground Truth data reliance. We present UNeR3D, a pioneering unsupervised methodology that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly cuts down the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse distance weighting technique for color rendering, UNeR3D ensures seamless color transitions, enhancing visual fidelity. Our model's flexible architecture supports training with any number of views, and uniquely, it is not constrained by the number of views used during training when performing reconstructions. It can infer with an arbitrary count of views during inference, offering unparalleled versatility. Additionally, the model's continuous spatial input domain allows the generation of point clouds at any desired resolution, empowering the creation of high-resolution 3D RGB point clouds. We solidify the reconstruction process with a novel multi-view geometric loss and color loss, demonstrating that our model excels with single-view inputs and beyond, thus reshaping the paradigm of unsupervised learning in 3D vision. Our contributions signal a substantial leap forward in 3D vision, offering new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.
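The abstract mentions inverse distance weighting (IDW) for color rendering without giving details. The following is a minimal sketch of generic IDW color blending, not the authors' implementation: the function name `idw_color`, the exponent `power`, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math

def idw_color(query, points, colors, power=2.0, eps=1e-12):
    """Blend the RGB colors of known points, weighted by inverse distance to `query`.

    Hypothetical helper: `power` controls how quickly a point's influence decays.
    """
    weights = []
    for i, p in enumerate(points):
        d = math.dist(query, p)
        if d < eps:
            # The query coincides with a known point: use its color directly.
            return tuple(colors[i])
        weights.append(1.0 / d ** power)
    total = sum(weights)
    # Weighted average per channel; nearby points dominate, giving smooth transitions.
    return tuple(
        sum(w * c[k] for w, c in zip(weights, colors)) / total for k in range(3)
    )
```

Because the weights vary continuously with distance, a query point midway between a red and a blue sample receives an even mix of the two, which is the kind of seamless transition the abstract describes.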

翻訳日:2023-12-13 18:46:33 公開日:2023-12-10

# Google Appのレビューから大学生の意見を理解する

Perceiving University Student's Opinions from Google App Reviews ( http://arxiv.org/abs/2312.06705v1 )

ライセンス: Link先を確認

Sakshi Ranjan, Subhankar Mishra

(参考訳) google app marketは、世界中のあらゆる地域から、格付けやテキストレビューを通じて、多言語的な分野でユーザーの考えを捉えている。指数的な成長のため、レビューから潜在的情報を手作業で抽出することはできない。そこで、NLPを用いた機械学習とディープラーニングアルゴリズムによる感性分析は、感情を明示的に解明し、解釈する。本研究は,アプリレビューの感情分類を行い,探索的分析により大学生のアプリ市場に対する行動を特定する。本研究では, TP, TF, TF IDFテキスト表現方式を用いて機械学習アルゴリズムを適用し, アンサンブル学習手法であるBaggingの性能評価を行った。ディープラーニングのパラダイムでは、単語埋め込み、グローブを使いました。私たちのモデルはGoogleのアプリレビューでトレーニングされ、学生のApp Reviews(SAR)でテストされました。これらのアルゴリズムの様々な組み合わせをFスコアを用いて比較し、精度と推測をグラフィカルに強調した。 SVMは他の分類器の中でも実りのある精度(93.41%)、ビッグラムのFスコア(89%)、TF IDFスキームのFスコア(89%)を与えた。バグングはlrとnbの性能を87.88%、86.69%、fスコアを86%、78%で向上させた。総合的に、Gloveの埋め込みにおけるLSTMは最高精度(95.2%)とFスコア(88%)を記録した。

The Google app market captures the opinions of users from every corner of the globe via ratings and text reviews, in a multilingual arena. Because the volume of reviews grows exponentially, the information they contain cannot be extracted manually. Sentiment analysis, using machine learning and deep learning algorithms together with NLP, uncovers and interprets the emotions expressed. This study performs sentiment classification of app reviews and identifies university students' behavior towards the app market via exploratory analysis. We applied machine learning algorithms using the TP, TF, and TF-IDF text representation schemes and evaluated their performance with Bagging, an ensemble learning method. We used the GloVe word embedding in the deep learning paradigm. Our model was trained on Google app reviews and tested on Student's App Reviews (SAR). The various combinations of these algorithms were compared with each other using F score and accuracy, and inferences were highlighted graphically. SVM, amongst other classifiers, gave an accuracy of 93.41% and an F score of 89% on the bigram and TF-IDF scheme. Bagging enhanced the performance of LR and NB, with accuracy of 87.88% and 86.69% and F score of 86% and 78%, respectively. Overall, LSTM on GloVe embedding recorded the highest accuracy (95.2%) and F score (88%).
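To make the TF-IDF representation over unigrams and bigrams concrete, here is a minimal pure-Python sketch. It assumes the textbook definitions (term frequency normalized by document length, `log(N / df)` as inverse document frequency) and whitespace tokenization; the study's actual preprocessing and weighting variant are not stated in the abstract, and common libraries use smoothed formulas that give different numbers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs, n=1):
    """Return one {term: weight} dict per document, using n-gram terms."""
    tokenized = [ngrams(d.lower().split(), n) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    num_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(num_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every review (such as "app" in a corpus of app reviews) gets weight zero under this definition, so the classifier sees only the terms that distinguish one review from another.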

翻訳日:2023-12-13 18:46:07 公開日:2023-12-10

# SIFU:現実世界で使用可能な衣服再構築のためのサイドビューコンディショニングインシシシット機能

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction ( http://arxiv.org/abs/2312.06704v1 )

ライセンス: Link先を確認

Zechuan Zhang, Zongxin Yang, Yi Yang

(参考訳) 現実世界の応用のために、単一の画像から高品質な人間の3Dモデルを作成することが重要である。近年の進歩にも拘わらず、複雑なポーズや被写体画像からのゆるい衣服の正確な復元や、見えない領域のテクスチャの予測は依然として大きな課題となっている。従来の手法の重要な制限は、2Dから3Dへの遷移やテクスチャ予測における事前ガイダンスの不足である。これに対し, SIFU (Side-view Conditioned Implicit Function for Real-world Usable Human Reconstruction) は, 2次元特徴を3次元にマッピングする過程で, SMPL-X正規化をクエリとして, トランスフォーマ内でのクロスアテンション機構を用いて, サイドビューデカップリングトランスフォーマと3次元連続テクスチャリファインメントパイプラインを組み合わせた新しいアプローチである。この手法は3次元モデルの精度を向上するだけでなく、特にSMPL-X推定が完全でない場合には、その堅牢性も向上する。テクスチャリファインメントプロセスはテキストから画像への拡散をベースとして,現実的で一貫したテクスチャを生成する。広範な実験を通じて、sifuは幾何学とテクスチャの再構成の両方においてsota法を超越し、複雑なシナリオにおいて強固性を高め、前例のないシャンファーとp2sの測定を達成した。われわれのアプローチは、3Dプリンティングやシーンビルディングといった実用的応用にまで拡張され、現実世界のシナリオでその幅広い実用性を実証している。プロジェクトページ https://river-zhang.github.io/SIFU-projectpage/。

Creating high-quality 3D models of clothed humans from single images for real-world applications is crucial. Despite recent advancements, accurately reconstructing humans in complex poses or with loose clothing from in-the-wild images, along with predicting textures for unseen areas, remains a significant challenge. A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction. In response, we introduce SIFU (Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction), a novel approach combining a Side-view Decoupling Transformer with a 3D Consistent Texture Refinement pipeline. SIFU employs a cross-attention mechanism within the transformer, using SMPL-X normals as queries to effectively decouple side-view features in the process of mapping 2D features to 3D. This method not only improves the precision of the 3D models but also their robustness, especially when SMPL-X estimates are not perfect. Our texture refinement process leverages a text-to-image diffusion-based prior to generate realistic and consistent textures for invisible views. Through extensive experiments, SIFU surpasses SOTA methods in both geometry and texture reconstruction, showcasing enhanced robustness in complex scenarios and achieving unprecedented Chamfer and P2S measurements. Our approach extends to practical applications such as 3D printing and scene building, demonstrating its broad utility in real-world scenarios. Project page: https://river-zhang.github.io/SIFU-projectpage/

翻訳日:2023-12-13 18:45:46 公開日:2023-12-10

# opensd: 統合オープンボキャブラリーセグメンテーションと検出

OpenSD: Unified Open-Vocabulary Segmentation and Detection ( http://arxiv.org/abs/2312.06703v1 )

ライセンス: Link先を確認

Shuai Li, Minghan Li, Pengfei Wang, Lei Zhang

(参考訳) 近年,汎用セグメンテーションと検出タスクに対処する統一アーキテクチャを用いて,いくつかのオープン語彙法が提案されている。しかし、タスク間の衝突やCLIPの不十分な使用により、オープン語彙能力に制限があるため、タスク固有のモデルにはまだパフォーマンスが遅れている。これらの課題に対処するため,オープンボキャブラリセグメンテーションと検出タスクの処理に同じアーキテクチャとネットワークパラメータを利用する,OpenSDと呼ばれるユニバーサルトランスフォーマーベースのフレームワークを提案する。まず,各タスクを同一の枠組み下でより効果的に学習できるように,モノとスタッフのセマンティックな対立を軽減するためのデコーダ分離学習戦略を導入する。第二に、CLIPをエンドツーエンドのセグメンテーションと検出に活用するために、語彙内ドメインと語彙外ドメインをそれぞれ扱う2つの分類器を提案する。テキストエンコーダはさらに、分離されたプロンプト・ラーニングを通じて、物と物の両方のカテゴリにリージョン対応するように訓練され、エンドツーエンドのセグメンテーションと検出に重要な、重複した低品質の予測をフィルタできる。様々な状況下で複数のデータセットに対して大規模な実験を行う。その結果,OpenSDはクローズド・オープン・ボキャブラリ設定とオープン・ボキャブラリ設定の両方において,最先端のオープン・ボキャブラリセグメンテーションと検出方法よりも優れていた。コードはhttps://github.com/strongwolf/OpenSDで入手できる。

Recently, a few open-vocabulary methods have been proposed by employing a unified architecture to tackle generic segmentation and detection tasks. However, their performance still lags behind the task-specific models due to the conflict between different tasks, and their open-vocabulary capability is limited due to the inadequate use of CLIP. To address these challenges, we present a universal transformer-based framework, abbreviated as OpenSD, which utilizes the same architecture and network parameters to handle open-vocabulary segmentation and detection tasks. First, we introduce a decoder decoupled learning strategy to alleviate the semantic conflict between thing and stuff categories so that each individual task can be learned more effectively under the same framework. Second, to better leverage CLIP for end-to-end segmentation and detection, we propose dual classifiers to handle the in-vocabulary domain and out-of-vocabulary domain, respectively. The text encoder is further trained to be region-aware for both thing and stuff categories through decoupled prompt learning, enabling them to filter out duplicated and low-quality predictions, which is important to end-to-end segmentation and detection. Extensive experiments are conducted on multiple datasets under various circumstances. The results demonstrate that OpenSD outperforms state-of-the-art open-vocabulary segmentation and detection methods in both closed- and open-vocabulary settings. Code is available at https://github.com/strongwolf/OpenSD

翻訳日:2023-12-13 18:45:11 公開日:2023-12-10

# 自律走行システムにおける動的対向攻撃

Dynamic Adversarial Attacks on Autonomous Driving Systems ( http://arxiv.org/abs/2312.06701v1 )

ライセンス: Link先を確認

Amirhosein Chahe, Chenan Wang, Abhishek Jeyapratap, Kaidi Xu, Lifeng Zhou

(参考訳) 本稿では,自律運転システムのレジリエンスに挑戦する攻撃機構を提案する。具体的には、他の移動車に搭載された画面に対向パッチを動的に表示することにより、自動運転車の意思決定プロセスを操作する。これらのパッチは、オブジェクト検出モデルを誤分類対象オブジェクト(例えば、交通標識)に騙すように最適化されている。このような操作は、交差点横断や車線変更といった、安全で効率的な自律運転システムにとって不可欠な重要な多車間相互作用に重要な意味を持つ。特に、大きな貢献は4つあります。まず,パッチがターゲットと同一位置にあるのではなく,より汎用的でステルス的な攻撃を可能にする新しい攻撃手法を提案する。さらに,画面上に動的パッチを表示させ,適応的な変化と移動を可能にし,攻撃の柔軟性と性能を向上させる。そこで我々は,画面画像変換ネットワーク(SIT-Net)を設計し,表示画像の環境効果をシミュレートし,シミュレートされたシナリオと実世界のシナリオとのギャップを狭める。さらに、動的攻撃の成功率を高めるために、位置損失項を敵の訓練プロセスに統合する。最後に、焦点を単なる知覚システムへの攻撃から、自動運転システムの意思決定アルゴリズムに移す。我々の実験は、現実の自律運転シナリオにおけるこのような動的敵攻撃の実装を初めて成功させ、堅牢で安全な自律運転の分野における進歩の道を開いたものである。

This paper introduces an attacking mechanism to challenge the resilience of autonomous driving systems. Specifically, we manipulate the decision-making processes of an autonomous vehicle by dynamically displaying adversarial patches on a screen mounted on another moving vehicle. These patches are optimized to deceive the object detection models into misclassifying targeted objects, e.g., traffic signs. Such manipulation has significant implications for critical multi-vehicle interactions such as intersection crossing and lane changing, which are vital for safe and efficient autonomous driving systems. Particularly, we make four major contributions. First, we introduce a novel adversarial attack approach where the patch is not co-located with its target, enabling more versatile and stealthy attacks. Moreover, our method utilizes dynamic patches displayed on a screen, allowing for adaptive changes and movement, enhancing the flexibility and performance of the attack. To do so, we design a Screen Image Transformation Network (SIT-Net), which simulates environmental effects on the displayed images, narrowing the gap between simulated and real-world scenarios. Further, we integrate a positional loss term into the adversarial training process to increase the success rate of the dynamic attack. Finally, we shift the focus from merely attacking perceptual systems to influencing the decision-making algorithms of self-driving systems. Our experiments demonstrate the first successful implementation of such dynamic adversarial attacks in real-world autonomous driving scenarios, paving the way for advancements in the field of robust and secure autonomous driving.

翻訳日:2023-12-13 18:44:27 公開日:2023-12-10

# 集中型、拡張性、構成可能なスコアリングアプリケーションの設計とアーキテクチャ

Design and Architecture for a Centralized, Extensible, and Configurable Scoring Application ( http://arxiv.org/abs/2312.06700v1 )

ライセンス: Link先を確認

Sumit Sanwal

(参考訳) 現代の組織では、多くのソフトウェアアプリケーションがアプリケーションのワークフローと承認の次のステップを決定するために重要なインプットを必要とする。アクションの手順を決定する上で最も重要な入力の1つは、アプリケーションで使用されるエンティティのキーパフォーマンスインジケータベースのスコアである。アプリケーション内のエンティティの正しいスコアを計算することは、その後の処理を駆動し、エンティティの次のアクションの手順を正確に決定するのに役立つ重要なステップです。適切なスコアを計算することは、アプリケーション処理にとって重要なパラメータであり、正確なスコアと正しいスコアを導出することは、アプリケーションの意図した目的にとって重要であり、重要である。この記事では、汎用的な拡張可能なスコアリングエンジンの構想と設計、およびスコアリングフレームワークを実装するための関連する複雑さや複雑さとスコアリングするためのいくつかのユースケースについて論じる。

In modern-day organizations, many software applications require critical input to decide the next steps in the application workflow and approval. One of the most important inputs for deciding the subsequent course of action is the key-performance-indicator-based score of the entities used in the application. Computing the right score for these entities is a critical step that drives the subsequent processing and helps to decide the next course of action for each entity accurately. Because a precise and correct score is pivotal for the application's intended objective, an efficient and optimized scoring application is of paramount importance for the success of such applications. In this article, we discuss how to envision and design a generic, extensible scoring engine, and present a few use cases for scoring along with the intricacies and complexities of implementing the scoring framework.
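The abstract does not describe the engine's design, so the following is only an illustrative sketch of what "extensible and configurable" KPI-based scoring could look like. Every name here (`ScoringEngine`, `register`, `score`, the KPI names and weights) is hypothetical: KPIs are registered as plug-in functions, and the weights are supplied as configuration at scoring time.

```python
from typing import Callable, Dict

class ScoringEngine:
    """Hypothetical engine: KPI functions are pluggable, weights are configuration."""

    def __init__(self) -> None:
        self._kpis: Dict[str, Callable[[dict], float]] = {}

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Extensibility point: add a new KPI without changing the engine."""
        self._kpis[name] = fn

    def score(self, entity: dict, weights: Dict[str, float]) -> float:
        """Weighted average of the configured KPIs for one entity."""
        total_weight = sum(weights.values())
        if total_weight <= 0:
            raise ValueError("weights must sum to a positive value")
        weighted = sum(w * self._kpis[name](entity) for name, w in weights.items())
        return weighted / total_weight


engine = ScoringEngine()
engine.register("on_time_rate", lambda e: e["on_time"] / e["orders"])
engine.register("quality", lambda e: 1.0 - e["defects"] / e["orders"])

supplier = {"orders": 100, "on_time": 90, "defects": 5}
print(engine.score(supplier, {"on_time_rate": 0.5, "quality": 0.5}))
```

Keeping the weights outside the engine means the same registered KPIs can be reused by different workflows, each with its own configuration.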

翻訳日:2023-12-13 18:43:45 公開日:2023-12-10

# グラフェンに基づく電気双極子分子からなる長寿命寿命制御可能なスケーラブル量子ビット

The long mean-life-time-controlled and potentially scalable qubits composed of electric dipolar molecules based on graphene ( http://arxiv.org/abs/2103.07263v3 )

ライセンス: Link先を確認

Yong-Yi Huang

(参考訳) 電気双極子分子からなる新しい種類の量子ビットを提案する。外部均一電場中の電気双極子分子は単純な調和振動を受け、その2つの最低エネルギーレベルに属する量子状態は量子ビットの状態 |0>, |1> として作用する。量子ビットの励起状態は非常に長く制御された平均寿命は260秒であり、デコヒーレンスはもはや量子計算の障害ではない。量子計算は、中性原子のように電気双極子分子の量子ビットを操作することで行うことができる。量子ビットが量子計算に使用されるとき、双極子モーメントの向きは外部の電場に沿って調和的に振動し、方向を変えることはない:電場に沿って、あるいは電場に対して、量子ビットはグラフェン系で大規模に製造できる。ライドバーグ封鎖の半径は約100nmである。演算量子ビットの数は数百万に達する。

We propose a new kind of qubit composed of electric dipolar molecules. Electric dipolar molecules in an external uniform electric field undergo simple harmonic oscillation, and the quantum states belonging to the two lowest energy levels act as the states |0>, |1> of a qubit. The qubits' excited states have a very long, controllable mean lifetime of about 260 seconds, so decoherence is no longer an obstacle to quantum computation. We can perform quantum computations by manipulating the qubits of electric dipolar molecules just like those of neutral atoms. When the qubits are used for quantum computations, the dipole moments' orientations oscillate harmonically along the external electric field and do not change direction (along or against the electric field), so the qubits can be manufactured on a large scale in a graphene system. The radius of the Rydberg blockade is about 100 nm. The number of operated qubits reaches several million.

翻訳日:2023-12-13 03:55:23 公開日:2023-12-10

# kiefer-wolfowitz法による確率最適化のオンライン統計推論

Online Statistical Inference for Stochastic Optimization via Kiefer-Wolfowitz Methods ( http://arxiv.org/abs/2102.03389v5 )

ライセンス: Link先を確認

Xi Chen, Zehua Lai, He Li, Yichen Zhang

(参考訳) 本稿では,ランダム探索方向を持つkiefer-wolfowitzアルゴリズムを用いて,確率最適化問題におけるモデルパラメータのオンライン統計量推定の問題について検討する。まず, 漸近共分散行列が探索方向の分布と関数-値問合せの複雑性に依存するポリak-ruppert平均型kiefer-wolfowitz (akw) 推定器の漸近分布を示す。分布結果は、統計効率と関数クエリの複雑さのトレードオフを反映している。さらに,ランダム探索方向の選択を解析し,漸近共分散行列のある種の要約統計を最小化する。漸近分布に基づいて,有効信頼区間の2つの構成手順を提供することにより,オンライン統計推定を行う。

This paper investigates the problem of online statistical inference of model parameters in stochastic optimization problems via the Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution for the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators, whose asymptotic covariance matrices depend on the distribution of search directions and the function-value query complexity. The distributional result reflects the trade-off between statistical efficiency and function query complexity. We further analyze the choice of random search directions to minimize certain summary statistics of the asymptotic covariance matrix. Based on the asymptotic distribution, we conduct online statistical inference by providing two construction procedures of valid confidence intervals.
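As a rough illustration of the estimator the abstract describes, the sketch below runs a Kiefer-Wolfowitz iteration with Gaussian random search directions on a toy least-squares problem and keeps a running Polyak-Ruppert average. It is a sketch under stated assumptions, not the paper's procedure: the loss, the step-size and spacing schedules (`eta0`, `alpha`, `h0`, `gamma`), the use of the same data sample for both function queries, and the function name are all choices made here for illustration.

```python
import random

def akw_least_squares(theta_star, n_iter=20000, eta0=0.1, alpha=0.6,
                      h0=0.1, gamma=0.5, noise_sd=0.1, seed=0):
    """Return (last iterate, averaged iterate) of a two-query KW scheme."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    theta_bar = [0.0] * d

    def loss(th, a, b):
        # Squared error of a single observation (a, b); only values are queried.
        return 0.5 * (sum(ai * ti for ai, ti in zip(a, th)) - b) ** 2

    for n in range(1, n_iter + 1):
        # Draw one data point from the linear model b = a . theta_star + noise.
        a = [rng.gauss(0, 1) for _ in range(d)]
        b = sum(ai * ti for ai, ti in zip(a, theta_star)) + rng.gauss(0, noise_sd)
        # Random search direction with identity second moment.
        v = [rng.gauss(0, 1) for _ in range(d)]
        h = h0 * n ** -gamma      # finite-difference spacing, shrinking with n
        eta = eta0 * n ** -alpha  # step size, shrinking with n
        shifted = [t + h * vi for t, vi in zip(theta, v)]
        # Finite-difference slope along v, built from two function values only.
        slope = (loss(shifted, a, b) - loss(theta, a, b)) / h
        theta = [t - eta * slope * vi for t, vi in zip(theta, v)]
        # Running Polyak-Ruppert average of the iterates.
        theta_bar = [tb + (t - tb) / n for tb, t in zip(theta_bar, theta)]
    return theta, theta_bar
```

Each iteration spends two function evaluations, so using more directions per step would reduce the variance of the gradient estimate at the price of more queries, which is the efficiency versus query-complexity trade-off the abstract refers to.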

翻訳日:2023-12-13 03:54:50 公開日:2023-12-10

# PyCSP3: Pythonの組合せ制約問題モデリング

PyCSP3: Modeling Combinatorial Constrained Problems in Python ( http://arxiv.org/abs/2009.00326v5 )

ライセンス: Link先を確認

Christophe Lecoutre and Nicolas Szczepanski

(参考訳) この文書では、PythonライブラリであるPyCSP$3$を紹介します。現在、PyCSP$3$で制約満足度と最適化問題のモデルを記述することができる。より具体的には、CSP(Constraint Satisfaction Problem)とCOP(Constraint Optimization Problem)モデルを構築することができる。重要なのは、モデルを書き、XCSP$3$のインスタンス(ファイル)を生成するために(いくつかのデータを提供しながら)それをコンパイルし、制約解決器を使ってその問題を解くことです。また、PyCSP$3$で解決手順を直接パイロットして、インクリメンタルな解決戦略を実行することもできる。このドキュメントでは、50以上のイラストモデルを持つpycsp$3$について知っておくべきことすべてを見つけることができます。

In this document, we introduce PyCSP$3$, a Python library that allows us to write models of combinatorial constrained problems in a declarative manner. Currently, with PyCSP$3$, you can write models of constraint satisfaction and optimization problems. More specifically, you can build CSP (Constraint Satisfaction Problem) and COP (Constraint Optimization Problem) models. Importantly, there is a complete separation between the modeling and solving phases: you write a model, you compile it (while providing some data) in order to generate an XCSP$3$ instance (file), and you solve that problem instance by means of a constraint solver. You can also directly pilot the solving procedure in PyCSP$3$, possibly conducting an incremental solving strategy. In this document, you will find all that you need to know about PyCSP$3$, with more than 50 illustrative models.

翻訳日:2023-12-13 03:54:07 公開日:2023-12-10

# 二元分類における逆代理リスクの存在とミニマックス定理

Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification ( http://arxiv.org/abs/2206.09098v4 )

ライセンス: Link先を確認

Natalie S. Frank, Jonathan Niles-Weed

(参考訳) 敵意訓練は、敵意攻撃に頑健な訓練方法の最も一般的な方法の1つであるが、理論的にはよく理解されていない。我々は、逆代理リスクに対する証明と存在、正則性、およびミニマックス定理を行う。本研究は,先行研究による敵のロバスト性に関する経験的観察を説明し,アルゴリズム開発における新たな方向性を示唆する。さらに, 既知の存在と, 逆分類リスクに対するミニマックス定理を拡張し, リスクを推測した。

Adversarial training is one of the most popular methods for training models robust to adversarial attacks; however, it is not well understood from a theoretical perspective. We prove existence, regularity, and minimax theorems for adversarial surrogate risks. Our results explain some empirical observations on adversarial robustness from prior work and suggest new directions in algorithm development. Furthermore, our results extend previously known existence and minimax theorems for the adversarial classification risk to surrogate risks.

翻訳日:2023-12-13 03:46:50 公開日:2023-12-10

# HiFi++: 帯域拡張と音声強調のための統一フレームワーク

HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement ( http://arxiv.org/abs/2203.13086v4 )

ライセンス: Link先を確認

Pavel Andreev, Aibek Alanov, Oleg Ivanov, Dmitry Vetrov

(参考訳) 生成的敵ネットワークは、最近、自己回帰モデルやフローベースモデルよりも優れた神経ボコーディング性能を示した。本稿では,この成功を条件付き音声生成の他のタスクにも拡張できることを示す。特に,HiFi vocoders をベースとして,帯域拡張と音声強調のための新しい HiFi++ 汎用フレームワークを提案する。ジェネレータアーキテクチャの改善により、hifi++は、計算リソースを大幅に削減しながら、これらのタスクの最先端と、より良く、あるいは互換性のあるパフォーマンスを示す。本手法の有効性は, 様々な実験により検証された。

Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for bandwidth extension and speech enhancement. We show that with the improved generator architecture, HiFi++ performs better than or comparably with the state of the art in these tasks while spending significantly less computational resources. The effectiveness of our approach is validated through a series of extensive experiments.

翻訳日:2023-12-13 03:45:44 公開日:2023-12-10

# 量子状態の現実について:$\psi$-ontic モデルに対するno-go定理

On the reality of the quantum state once again: A no-go theorem for $\psi$-ontic models ( http://arxiv.org/abs/2201.11842v3 )

ライセンス: Link先を確認

Gabriele Carcassi, Andrea Oldofredi, Christine A. Aidala

(参考訳) 本稿では,Harrigan と Spekkens (HS) が定義した$\psi$-ontic モデルでは量子論を再現できないことを示す。確率に焦点をあてる代わりに、情報理論的な考察を用いて、$\psi$-onticモデルのすべての純粋状態は、量子力学に明確に違反して互いに直交しなければならないことを示す。それを考えると (i)Pusey,Barrett and Rudolph (PBR)は以前、HSが定義した$\psi$-epistemic Modelも量子力学に矛盾することを示した。 (II) HS分類はこれらの2種類のモデルによって枯渇しており、HS分類自体が量子論を再現できるモデルに空間を残さないため問題である、と結論付けている。

In this paper we show that $\psi$-ontic models, as defined by Harrigan and Spekkens (HS), cannot reproduce quantum theory. Instead of focusing on probability, we use information theoretic considerations to show that all pure states of $\psi$-ontic models must be orthogonal to each other, in clear violation of quantum mechanics. Given that (i) Pusey, Barrett and Rudolph (PBR) previously showed that $\psi$-epistemic models, as defined by HS, also contradict quantum mechanics, and (ii) the HS categorization is exhausted by these two types of models, we conclude that the HS categorization itself is problematic as it leaves no space for models that can reproduce quantum theory.

翻訳日:2023-12-13 03:44:43 公開日:2023-12-10

# データ分裂:単一のデータポイントを分割する

Data fission: splitting a single data point ( http://arxiv.org/abs/2112.11079v9 )

ライセンス: Link先を確認

James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas

(参考訳) 未知のパラメータを持つ既知の族において、ある分布からランダムベクトル $x$ を観測すると仮定する。いずれの場合、$x$を2つの部分に分けて$f(x)$と$g(x)$に分割することは可能で、どちらの部分も$x$をそれ自体で再構築するには十分ではありませんが、どちらも$x$を完全に回収することができ、$(f(x),g(x))$のジョイントディストリビューションは扱いやすいのでしょうか? 例えば、$X=(X_1,\dots,X_n)$と$P$が積分布であれば、任意の$m<n$に対して、サンプルを$f(X)=(X_1,\dots,X_m)$と$g(X)=(X_{m+1},\dots,X_n)$に分割することができる。 rasines and young (2022)は、付加ガウスノイズを使用する別のアプローチを提供する -- これはガウス分散データに対する有限サンプルでのポスト選択推論を可能にし、エラーがガウス的でない場合の漸近的推論を可能にする。本稿では,ベイズ推論からアイデアを借用して,データ分割の連続的類似物と見なすことのできる(相対論的)解を得る,有限サンプルの分割を実現するためのより一般的な手法を提案する。我々は、データ分割、データ彫刻、p値マスキングに代わる方法として、メソッドデータフィッションと呼ぶ。トレンドフィルタリングやその他の回帰問題に対するポストセレクション推論など,いくつかのプロトタイプアプリケーション上での手法を例示する。

Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2022) offer an alternative approach that uses additive Gaussian noise -- this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
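The additive-Gaussian-noise idea can be illustrated on a single Gaussian observation. The sketch below assumes $X \sim N(\mu, \sigma^2)$ with known $\sigma$ and draws independent noise $Z \sim N(0, \sigma^2)$; setting $f(X) = X + \tau Z$ and $g(X) = X - Z/\tau$ gives two parts with zero covariance (hence independent, since they are jointly Gaussian) that together recover $X$ exactly. The function names and the tuning parameter `tau` are labels chosen here, not taken from the abstract.

```python
import random

def fission_gaussian(x, sigma, tau, rng):
    """Split one Gaussian observation x into two independent noisy parts."""
    z = rng.gauss(0.0, sigma)  # external noise with the same variance as x
    return x + tau * z, x - z / tau

def reconstruct(f, g, tau):
    """Recover x exactly from both parts: f + tau^2 * g = (1 + tau^2) * x."""
    return (f + tau ** 2 * g) / (1 + tau ** 2)


rng = random.Random(0)
x = rng.gauss(3.0, 1.0)
f, g = fission_gaussian(x, sigma=1.0, tau=0.5, rng=rng)
print(abs(reconstruct(f, g, 0.5) - x) < 1e-9)
```

A small `tau` keeps most of the information in $f(X)$ and leaves little for $g(X)$, and a large `tau` does the reverse, which is why the construction reads as a continuous analog of choosing the split fraction in data splitting.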

翻訳日:2023-12-13 03:43:45 公開日:2023-12-10

# 熱場力学による熱量子ビットのシミュレーション

Simulating thermal qubits through thermofield dynamics ( http://arxiv.org/abs/2111.09969v6 )

ライセンス: Link先を確認

G. X. A. Petronilo, M. R. Ara\'ujo, Clebson Cruz

(参考訳) 量子コンピューティングは過去数十年間、科学界の注目を集めてきた。量子コンピュータの開発は、情報を扱い、抽出し、転送するより安全で高速な方法への一歩を約束している。しかし、量子コンピューティングの大きな利点にもかかわらず、室温で動作する量子デバイスの開発は熱デコヒーレンスプロセスによって損なわれている。さらに、ほとんどの学部や大学院の量子力学コースでは、熱場力学の研究は通常無視される。このシナリオでは、量子コンピューティングのセットアップに適用される熱場ダイナミクス(tfd)を介して熱量子ビットシステムをシミュレートするディダクティックなアプローチを探求する。この結果から, 量子演算系における熱量子ビットの実用的構築を可能にするボゴリューボフ変換を用いて, 量子ビットに対するブロッホ球表現を記述できることが示唆された。したがって、この研究は、量子コンピューティングによる熱場力学を教師や興味のある学生に導入し、TFD技術を用いて量子プロトコルに対する温度の影響を研究し、学習する。

Quantum computing has attracted the attention of the scientific community in the past few decades. The development of quantum computers promises one path toward safer and faster ways to treat, extract, and transfer information. However, despite the significant advantages of quantum computing, the development of quantum devices operating at room temperature has been compromised by the thermal decoherence process. In addition, in most undergraduate and graduate quantum mechanics courses, the study of thermofield dynamics is usually neglected. In this scenario, this work explores a didactic approach to simulate thermal qubit systems through Thermofield Dynamics (TFD), applied in a quantum computing setup. The results show that the Bloch sphere representation for a qubit can be written in terms of the Bogoliubov transformation, which allows a practical construction for the thermal qubits in a quantum computing setup. Therefore, this work introduces thermofield dynamics through quantum computing to teachers and curious students interested in teaching and learning this important field of studying the temperature impacts on quantum protocols using the TFD technique.
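For orientation, the standard thermofield-dynamics construction for a single two-level system can be written out as follows. This is the textbook form rather than a formula quoted from the paper: the Hamiltonian $H = \omega\,|1\rangle\langle 1|$, the tilde notation for the auxiliary copy, and units with $\hbar = k_B = 1$ are assumptions made here.

```latex
\[
  |0(\beta)\rangle
  = \cos\theta\,|0,\tilde 0\rangle + \sin\theta\,|1,\tilde 1\rangle,
  \qquad
  \cos\theta = \frac{1}{\sqrt{1+e^{-\beta\omega}}},\quad
  \sin\theta = \frac{e^{-\beta\omega/2}}{\sqrt{1+e^{-\beta\omega}}},
\]
\[
  \operatorname{Tr}_{\sim}\,|0(\beta)\rangle\langle 0(\beta)|
  = \cos^2\theta\,|0\rangle\langle 0| + \sin^2\theta\,|1\rangle\langle 1|
  = \frac{e^{-\beta H}}{\operatorname{Tr} e^{-\beta H}},
  \qquad H = \omega\,|1\rangle\langle 1| .
\]
```

The temperature enters only through the Bogoliubov angle $\theta$: at zero temperature ($\beta \to \infty$) the state reduces to $|0,\tilde 0\rangle$, while at infinite temperature ($\beta \to 0$) both levels are equally populated.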

翻訳日:2023-12-13 03:43:02 公開日:2023-12-10

# 不確実性下の探索のためのリスクアウェアなメタレベル意思決定

Risk-aware Meta-level Decision Making for Exploration Under Uncertainty ( http://arxiv.org/abs/2209.05580v2 )

ライセンス: Link先を確認

Joshua Ott, Sung-Kyun Kim, Amanda Bouman, Oriana Peltzer, Mamoru Sobue, Harrison Delecki, Mykel J. Kochenderfer, Joel Burdick, Ali-akbar Agha-mohammadi

(参考訳) 未知環境のロボットによる探索は、センサ測定、局所化、行動実行、その他多くの要因において不確実性を考慮しなければならない不確実性の下で決定する問題である。大規模探査アプリケーションの場合、自律システムは、障害や危険地形に関連するリスクを安全に評価しながら、環境のどの領域が探検に値するかを順次決定する課題を克服しなければならない。本研究では,地域・グローバル探索に伴うトレードオフのバランスをとるためのリスク対応型メタレベル意思決定フレームワークを提案する。メタレベルの意思決定は、局所的な政策とグローバルな政策を切り替えることによって古典的な階層的なカバレッジプランナーの上に構築される。我々は, 環境史, トラバーサビリティリスク, キノダイナミック制約に関する情報を用いて, 地域政策とグローバル政策の切り替えに成功している政策実行の可能性を推論する。シミュレーションと大規模な実世界のハードウェアテストの両方で、私たちのソリューションを検証しました。その結果,局所探査とグローバル探査のバランスをとることで,大規模環境をより効率的に探索できることがわかった。

Robotic exploration of unknown environments is fundamentally a problem of decision making under uncertainty where the robot must account for uncertainty in sensor measurements, localization, action execution, as well as many other factors. For large-scale exploration applications, autonomous systems must overcome the challenges of sequentially deciding which areas of the environment are valuable to explore while safely evaluating the risks associated with obstacles and hazardous terrain. In this work, we propose a risk-aware meta-level decision making framework to balance the tradeoffs associated with local and global exploration. Meta-level decision making builds upon classical hierarchical coverage planners by switching between local and global policies with the overall objective of selecting the policy that is most likely to maximize reward in a stochastic environment. We use information about the environment history, traversability risk, and kinodynamic constraints to reason about the probability of successful policy execution to switch between local and global policies. We have validated our solution in both simulation and on a variety of large-scale real world hardware tests. Our results show that by balancing local and global exploration we are able to significantly explore large-scale environments more efficiently.

翻訳日:2023-12-13 03:34:19 公開日:2023-12-10

# Spach Transformer:PET画像の局所的・グローバル的自己注意に基づく空間的・チャネル的変換器

Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising ( http://arxiv.org/abs/2209.03300v2 )

ライセンス: Link先を確認

Se-In Jang, Tinsu Pan, Ye Li, Pedram Heidari, Junyu Chen, Quanzheng Li, Kuang Gong

(参考訳) ポジショルエミッショントモグラフィ(PET)はその量的メリットと高い感度のために臨床や研究に広く用いられているが、低信号-雑音比(SNR)に悩まされている。近年,畳み込みニューラルネットワーク(cnns)がpet画像の品質向上に広く利用されている。局所的な特徴抽出で成功し、効率的であるが、CNNはその限定された受容野のため、長距離依存をうまく捉えることはできない。 global multi-head self-attention (msa) は長距離情報を取り込む一般的な手法である。しかし,3次元画像に対するグローバルmsaの計算には高い計算コストがかかる。本研究では,局所的および大域的msaに基づく空間的およびチャネル的情報を活用できる,効率的な空間的およびチャネル的エンコーダ・デコーダ変換器spach transformerを提案する。異なるPETトレーサのデータセット、すなわち$^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, $^{68}$Ga-DOTATATEを用いて提案フレームワークの評価を行った。定量的な結果は,提案するSpach Transformerフレームワークが最先端のディープラーニングアーキテクチャより優れていることを示している。私たちのコードはhttps://github.com/sijang/SpachTransformerで利用可能です。

Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from low signal-to-noise ratio (SNR). Recently, convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient in local feature extraction, CNNs cannot capture long-range dependencies well due to their limited receptive fields. Global multi-head self-attention (MSA) is a popular approach to capture long-range information. However, the calculation of global MSA for 3D images has high computational costs. In this work, we propose an efficient spatial and channel-wise encoder-decoder transformer, Spach Transformer, that can leverage spatial and channel information based on local and global MSAs. Experiments based on datasets of different PET tracers, i.e., $^{18}$F-FDG, $^{18}$F-ACBC, $^{18}$F-DCFPyL, and $^{68}$Ga-DOTATATE, were conducted to evaluate the proposed framework. Quantitative results show that the proposed Spach Transformer framework outperforms state-of-the-art deep learning architectures. Our code is available at https://github.com/sijang/SpachTransformer

翻訳日:2023-12-13 03:33:58 公開日:2023-12-10

# 2022年XCSP3コンペティションの成果

Proceedings of the 2022 XCSP3 Competition ( http://arxiv.org/abs/2209.00917v2 )

ライセンス: Link先を確認

Gilles Audemard, Christophe Lecoutre, Emmanuel Lonca

(参考訳) この文書は2022年のXCSP3コンペティションの手続きを表している。この制約ソルバの競争の結果は、2022年7月31日から8月7日までイスラエルのハイファで開催されたfloc(federated logic conference) 2022オリンピックで発表された。

This document represents the proceedings of the 2022 XCSP3 Competition. The results of this competition of constraint solvers were presented at FLOC (Federated Logic Conference) 2022 Olympic Games, held in Haifa, Israel from 31st July 2022 to 7th August 2022.

翻訳日:2023-12-13 03:33:31 公開日:2023-12-10

# GANとクロージャ:マルチスケールモデリングにおけるマイクロマクロ一貫性

GANs and Closures: Micro-Macro Consistency in Multiscale Modeling ( http://arxiv.org/abs/2208.10715v4 )

ライセンス: Link先を確認

Ellis R. Crabtree, Juan M. Bello-Rivas, Andrew L. Ferguson, Ioannis G. Kevrekidis

(参考訳) 分子系の位相空間、そしてより一般的には、確率微分方程式によって効果的にモデル化される複雑な系のサンプリングは、タンパク質の折り畳みから物質発見に至るまで、多くの分野において重要なモデリングステップである。これらの問題は自然界においてしばしばマルチスケールであり、少数の「遅い」反応座標によってパラメトリケートされた低次元の有効自由エネルギー表面で説明でき、残りの「速い」自由度は反応座標値の平衡測度を発生させる。このような問題に対するサンプリング手順は、条件付き平衡分布に関するアンサンブル平均と同様に有効自由エネルギー差を推定するために用いられる。近年,分子シミュレーションと組み合わせた改良されたサンプリング技術が開発されている。興味深いアナロジーは機械学習(ml)の分野において発生し、生成型逆ネットワークは低次元確率分布から高次元のサンプルを生成することができる。このサンプル生成は、その低次元表現に関する情報から、モデル状態の可能な高次元空間実現を返す。本稿では,同じタスクに対して,mlベースの条件付き生成逆ネットワークを用いて条件分布をサンプリングするための物理ベースのシミュレーションとバイアス手法を結合する手法を提案する。微細なスケールの実現を条件付ける「粗い記述子」は、優先順位として、あるいは非線形次元の減少を通じて学習することができる。物理学に基づく拡張サンプリング技術とcGANを結合したフレームワークが、マルチスケールのSDE動的システムサンプリングを改善することを実証し、複雑さを増すシステムへの期待を示す。

Sampling the phase space of molecular systems -- and, more generally, of complex systems effectively modeled by stochastic differential equations -- is a crucial modeling step in many fields, from protein folding to materials discovery. These problems are often multiscale in nature: they can be described in terms of low-dimensional effective free energy surfaces parametrized by a small number of "slow" reaction coordinates; the remaining "fast" degrees of freedom populate an equilibrium measure on the reaction coordinate values. Sampling procedures for such problems are used to estimate effective free energy differences as well as ensemble averages with respect to the conditional equilibrium distributions; these latter averages lead to closures for effective reduced dynamic models. Over the years, enhanced sampling techniques coupled with molecular simulation have been developed. An intriguing analogy arises with the field of Machine Learning (ML), where Generative Adversarial Networks can produce high dimensional samples from low dimensional probability distributions. This sample generation returns plausible high dimensional space realizations of a model state, from information about its low-dimensional representation. In this work, we present an approach that couples physics-based simulations and biasing methods for sampling conditional distributions with ML-based conditional generative adversarial networks for the same task. The "coarse descriptors" on which we condition the fine scale realizations can either be known a priori, or learned through nonlinear dimensionality reduction. We suggest that this may bring out the best features of both approaches: we demonstrate that a framework that couples cGANs with physics-based enhanced sampling techniques can improve multiscale SDE dynamical systems sampling, and even shows promise for systems of increasing complexity.

翻訳日:2023-12-13 03:32:44 公開日:2023-12-10

# システムバスダイナミクスのための小行列経路積分のツリーベース実装

Tree-based Implementation of the Small Matrix Path Integral for System-Bath Dynamics ( http://arxiv.org/abs/2207.11830v2 )

ライセンス: Link先を確認

Geshuo Wang and Zhenning Cai

(参考訳) small matrix path integral (smatpi) 法は、高調波浴に結合した量子系の進化をシミュレートする効率的な数値的手法である。この方法は量子系の非マルコフ力学を定義する一連のカーネル行列に依存する。 SMatPI方式では、これらのカーネルはQuAPI方式で間接的に計算される。代わりに、カーネル行列の定義に焦点をあて、これらの行列の繰り返し関係を明らかにする。このような関係を用いて,木ベースアルゴリズム(t-smatpi)を開発し,その定義に基づくカーネル行列の簡単な計算よりも高速であることが示されている。このアルゴリズムはSMatPI行列を他の経路積分法によって計算するステップをバイパスし、SMatPI行列自体をより深く理解する。一方、メモリコストと計算コストは低く抑えられている。数値実験により、t-SMatPIアルゴリズムはi-QuAPIとSMatPIと全く同じ結果が得られることが示された。それにもかかわらず、我々の手法はオープン量子系のいくつかの新しい性質を示し、高次数値スキームに一般化できる可能性を持っている。

The small matrix path integral (SMatPI) method is an efficient numerical approach to simulate the evolution of a quantum system coupled to a harmonic bath. The method relies on a sequence of kernel matrices that defines the non-Markovian dynamics of the quantum system. In the original SMatPI method, these kernels are computed indirectly through the QuAPI method. Instead, we focus on the definition of the kernel matrices and reveal a recurrence relation in these matrices. Using this relation, a tree-based algorithm (t-SMatPI) is developed, which is shown to be much faster than straightforward computation of the kernel matrices based on their definitions. This algorithm bypasses the step of computing the SMatPI matrices by other path integral methods and provides more understanding of the SMatPI matrices themselves. Meanwhile, it keeps the memory cost and computational cost low. Numerical experiments show that the t-SMatPI algorithm gives exactly the same result as i-QuAPI and SMatPI. Nevertheless, our method may indicate some new properties of open quantum systems, and has the potential to be generalized to higher-order numerical schemes.

翻訳日:2023-12-13 03:31:53 公開日:2023-12-10

# グラフトポロジサンプリングを用いたトレーニンググラフ畳み込みネットワークの一般化保証

Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling ( http://arxiv.org/abs/2207.03584v2 )

ライセンス: Link先を確認

Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

(参考訳) グラフ畳み込みネットワーク(GCN)は近年,グラフ構造化データの学習において大きな成功を収めている。隣り合う機能の再帰的埋め込みによるスケーラビリティの問題に対処するために、gcnのトレーニングのメモリと計算コストを削減するためにグラフトポロジサンプリングが提案されており、多くの実証研究でトポロジーサンプリングのないものと同等のテスト性能を達成している。本稿では,半教師付きノード分類のための(最大)3層gcnの学習におけるグラフトポロジーサンプリングの理論的正当化について述べる。グラフトポロジサンプリングにおいて,GCNトレーニングが一般化誤差を減少させるような条件を公式に特徴付ける。さらに,本手法は,既存のGCNの理論的解析において未探索の層間重みの非凸相互作用に対処する。本稿では,グラフ構造とトポロジサンプリングが一般化性能および試料の複雑さに与える影響を明示し,数値実験により理論的知見を正当化する。

Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data. To address their scalability issue due to the recursive embedding of neighboring features, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs, and it has achieved comparable test performance to those without topology sampling in many empirical studies. To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. We formally characterize some sufficient conditions on graph topology sampling such that GCN training leads to a diminishing generalization error. Moreover, our method tackles the nonconvex interaction of weights across layers, which is under-explored in the existing theoretical analyses of GCNs. This paper characterizes the impact of graph structures and topology sampling on the generalization performance and sample complexity explicitly, and the theoretical findings are also justified through numerical experiments.

翻訳日:2023-12-13 03:31:09 公開日:2023-12-10

# CT画像からのトランスファーラーニングアプローチを用いたCOVID-19検出

COVID-19 Detection Using Transfer Learning Approach from Computed Tomography Images ( http://arxiv.org/abs/2207.00259v5 )

ライセンス: Link先を確認

Kenan Morani, Esra Kaya Ayana, Devrim Unay

(参考訳) 新型コロナウイルスのパンデミックによる独特な課題の中で、効率的かつ正確な診断の重要性は、革新的なアプローチの緊急性を示している。これらの課題に対応するために,最近アノテーション付きct画像データベースを用いたトランスファー学習に基づくアプローチを提案する。多くのアプローチが集中型データプリプロセシングおよび/または複雑なモデルアーキテクチャを提案するが、この手法は最小限の手動エンジニアリングで効率的なソリューションを提供することに焦点を当てている。具体的には、covid-19検出のための修正xceptionモデルの適合性について検討する。この方法は、事前トレーニングされたXceptionモデルを適応させ、ImageNetのアーキテクチャと事前トレーニングされた重みの両方を組み込む。モデルの出力は最終的な診断決定を下すように設計された。トレーニングでは、128のバッチサイズと224x224の入力画像サイズを使用し、標準の512x512から縮小した。入力データ上でのda処理は行われなかった。検査は「COV19-CT-DB」CT画像データセットを用いて行う。その結果、検証サブセットにおける精度、精度、リコール、マクロF1スコアの精度が向上し、VGG-16転送モデルよりも優れ、パラメータが少ない精度が向上した。さらに、cov19-ct-dbデータセットの代替手法と比較すると、同じデータセット上のベースラインアプローチや他の代替方法を超える。最後に、COV19-CT-DBデータセットのユニークな特徴に対するXception trasnfer学習ベースモデルの適応性は、CT画像から新型コロナウイルスを診断するための堅牢なツールとしての可能性を示している。

The significance of efficient and accurate diagnosis amidst the unique challenges posed by the COVID-19 pandemic underscores the urgency for innovative approaches. In response to these challenges, we propose a transfer learning-based approach using a recently annotated Computed Tomography (CT) image database. While many approaches propose intensive data preprocessing and/or complex model architectures, our method focusses on offering an efficient solution with minimal manual engineering. Specifically, we investigate the suitability of a modified Xception model for COVID-19 detection. The method involves adapting a pre-trained Xception model, incorporating both the architecture and pre-trained weights from ImageNet. The output of the model was designed to take the final diagnosis decisions. The training utilized a batch size of 128 and 224x224 input image dimensions, downsized from the standard 512x512. No further da processing was performed on the input data. Evaluation is conducted on the 'COV19-CT-DB' CT image dataset, containing labeled COVID-19 and non-COVID-19 cases. Results reveal the method's superiority in accuracy, precision, recall, and macro F1 score on the validation subset, outperforming the VGG-16 transfer model and thus offering enhanced precision with fewer parameters. Furthermore, when compared to alternative methods for the COV19-CT-DB dataset, our approach exceeds the baseline approach and other alternatives on the same dataset. Finally, the adaptability of the modified Xception transfer learning-based model to the unique features of the COV19-CT-DB dataset showcases its potential as a robust tool for enhanced COVID-19 diagnosis from CT images.
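The abstract reports accuracy, precision, recall, and macro F1 on a two-class (COVID-19 vs. non-COVID-19) validation subset. As a reminder of what macro F1 measures, here is a minimal standard-library sketch. The label vectors below are made-up toy values, not results from the paper, and the function name `macro_f1` is an assumption for illustration.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy labels (1 = COVID-19, 0 = non-COVID-19); purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because each class contributes equally regardless of how many samples it has, macro F1 is less flattering than accuracy when one class dominates, which is why it is commonly reported for imbalanced medical datasets.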

翻訳日:2023-12-13 03:30:53 公開日:2023-12-10

# 深い量子誤差補正

Deep Quantum Error Correction ( http://arxiv.org/abs/2301.11930v2 )

ライセンス: Link先を確認

Yoni Choukroun, Lior Wolf

(参考訳) 量子誤り訂正符号(QECC)は、量子コンピューティングのポテンシャルを実現するための鍵となる要素である。 QECCは、従来のECC(英語版)と同様に、冗長な物理量子ビットに量子論理情報を分散することにより、エラーを検出し修正することで、エラー率の低減を可能にする。本研究では,新しい量子誤り復号器を効率的に訓練する。システムノイズの最初の推定値を予測するために、シンドローム復号を増強することで量子計測の崩壊を解消し、深層ニューラルネットワークによって反復的に洗練する。有限フィールド上で計算された論理エラー率は、微分可能な目的によって直接最適化され、コードによって課される制約の下で効率的な復号化を可能にする。最後に, 繰り返しシンドロームサンプリングの効率的な復号化により, 故障症候群計測をサポートするよう拡張した。提案手法は,QECC におけるニューラルデコーダのパワーを,最先端の精度を達成し,従来の {end-to-end } ニューラルおよび古典的デコーダを性能的に向上させることによって実証する。

Quantum error correction codes (QECC) are a key component for realizing the potential of quantum computing. QECC, as its classical counterpart (ECC), enables the reduction of error rates, by distributing quantum logical information across redundant physical qubits, such that errors can be detected and corrected. In this work, we efficiently train novel \emph{end-to-end} deep quantum error decoders. We resolve the quantum measurement collapse by augmenting syndrome decoding to predict an initial estimate of the system noise, which is then refined iteratively through a deep neural network. The logical error rates calculated over finite fields are directly optimized via a differentiable objective, enabling efficient decoding under the constraints imposed by the code. Finally, our architecture is extended to support faulty syndrome measurement, by efficient decoding of repeated syndrome sampling. The proposed method demonstrates the power of neural decoders for QECC by achieving state-of-the-art accuracy, outperforming, for small distance topological codes, the existing end-to-end neural and classical decoders, which are often computationally prohibitive.

翻訳日:2023-12-13 03:24:08 公開日:2023-12-10

# EPCL: Frozen CLIP Transformerは効率的なポイントクラウドエンコーダ

EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder ( http://arxiv.org/abs/2212.04098v3 )

ライセンス: Link先を確認

Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang

(参考訳) プリトレイン・フィニチューン・パラダイムは、高品質な表現能力とトレーニング済みモデルの転送性により、nlpと2d画像の分野で大きな成功を収めている。しかし,3次元点雲場において,このような強いモデルの事前学習は,点雲列の限られた量のため困難である。本稿では, 凍結したCLIP変換器を用いて高品質のクラウドモデルを直接学習する, 効率的かつ効率的なポイントクラウド学習者である \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL) を紹介する。我々のEPCLは、2D-3Dデータなしで画像の特徴と点雲の特徴を意味的に整合させることで、2Dと3Dのモダリティを接続する。具体的には、入力ポイントクラウドは一連のローカルパッチに分割され、設計されたpoint cloud tokenizerによってトークン埋め込みに変換される。これらのトークン埋め込みはタスクトークンと結合され、ポイントクラウド表現を学ぶために凍ったクリップトランスフォーマーに供給される。直感的には、提案されたpoint cloud tokenizerは入力ポイントクラウドを2dイメージに似た統一トークン空間に投影する。 3次元検出,セマンティックセグメンテーション,分類,少数ショット学習に関する総合的な実験により,CLIPトランスフォーマーが効率的なポイントクラウドエンコーダとして機能し,室内および屋外のベンチマークで有望な性能を達成することを示す。特に、epclがもたらしたパフォーマンス向上は、scannet v2検出で$\textbf{19.7}$ ap$_{50}$、s3disセグメンテーションで$\textbf{4.4}$ miou、semantickittiセグメンテーションで$\textbf{1.2}$ miouです。コードは \url{https://github.com/xiaoshuihuang/epcl} で入手できる。

The pretrain-finetune paradigm has achieved great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field due to the limited amount of point cloud sequences. This paper introduces \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP transformer. Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a series of local patches, which are converted to token embeddings by the designed point cloud tokenizer. These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation. The intuition is that the proposed point cloud tokenizer projects the input point cloud into a unified token space that is similar to the 2D images. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the CLIP transformer can serve as an efficient point cloud encoder and our method achieves promising performance on both indoor and outdoor benchmarks. In particular, performance gains brought by our EPCL are $\textbf{19.7}$ AP$_{50}$ on ScanNet V2 detection, $\textbf{4.4}$ mIoU on S3DIS segmentation and $\textbf{1.2}$ mIoU on SemanticKITTI segmentation compared to contemporary pretrained models. Code is available at \url{https://github.com/XiaoshuiHuang/EPCL}.

翻訳日:2023-12-13 03:20:49 公開日:2023-12-10

# 非マルコフ環境における強化学習

Reinforcement Learning in Non-Markovian Environments ( http://arxiv.org/abs/2211.01595v3 )

ライセンス: Link先を確認

Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

(参考訳) 任意の非マルコフ環境における強化学習のためにvan royと共著者によって開発された新しいパラダイムに動機づけられ、q-learningアルゴリズムを適用した際の観測の非マルコフ性に起因する誤りを、関連する定式化し、明確にピン留めする。この観察に基づいて,エージェント設計の基準は,ある条件法則に対してよい近似を求めるべきであることを示唆する。古典的確率制御に着想を得て, 近似的統計量の再帰的計算に還元されることを示す。これにより、エージェント設計のためのオートエンコーダベースのスキームが実現され、部分的に観察された強化学習環境上で数値的にテストされる。

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.
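To make the setting concrete, the sketch below applies a tabular Q-learning update to an agent state computed from the observation history rather than from a true Markov state. It is only an illustration: the paper proposes an autoencoder-based agent state, whereas the fixed-length window in `agent_state`, the observation names, and the hyperparameters here are hypothetical stand-ins.

```python
from collections import defaultdict


def agent_state(history, window=2):
    """Hypothetical agent state: the last `window` observations as a tuple."""
    return tuple(history[-window:])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the temporal-difference error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)        # Q-table initialised to zero
actions = (0, 1)
history = ["o1", "o2"]        # made-up observation labels
s = agent_state(history)
history.append("o3")
s_next = agent_state(history)
td = q_update(q, s, 1, reward=1.0, next_state=s_next, actions=actions)
print(td, q[(s, 1)])
```

When the agent state is not a sufficient statistic of the history, the transition from `s` to `s_next` is not Markov, and the update above carries exactly the kind of error the abstract sets out to quantify.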

翻訳日:2023-12-13 03:19:21 公開日:2023-12-10

# dinar: 一発ヒトアバターの神経テクスチャの拡散インパインティング

DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars ( http://arxiv.org/abs/2303.09375v4 )

ライセンス: Link先を確認

David Svitov, Dmitrii Gudkov, Renat Bashirov, Victor Lempitsky

(参考訳) DINARは、1枚のRGB画像から現実的なフルボディアバターを作成するためのアプローチである。従来の研究と同様に, SMPL-Xボディーモデルと組み合わせた神経テクスチャを用いて, アバターのフォトリアリスティックな品質を実現し, アニメーションや高速な推論を実現している。テクスチャを復元するために、潜伏拡散モデルを使用し、そのようなモデルを神経テクスチャ空間でどのようにトレーニングするかを示す。拡散モデルを用いることで、正面から見ると人物の背中のような大きな目立たない領域を現実的に再構築することができる。パイプライン内のモデルは、2D画像とビデオのみを使用してトレーニングされています。実験では,最先端のレンダリング品質と,新たなポーズや視点への優れた一般化を実現する。特に、このアプローチはSnapshotPeople公開ベンチマークの最先端を改善している。

We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars while keeping them easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The use of the diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints. In particular, the approach improves on the state of the art on the SnapshotPeople public benchmark.

翻訳日:2023-12-13 03:10:27 公開日:2023-12-10

# 学習は低位行列回復における局所的最適性によって説明できるか?

Can Learning Be Explained By Local Optimality In Low-rank Matrix Recovery? ( http://arxiv.org/abs/2302.10963v2 )

ライセンス: Link先を確認

Jianhao Ma, Salar Fattahi

(参考訳) 低ランクマトリックスリカバリのローカルランドスケープを探求し、$m$線形測定値から$r$で$d_1\times d_2$マトリックスを再構築することを目的としている。真のランクが不明な場合、過剰推定は一般的であり、ランク $k\geq r$ の過剰パラメータモデルが得られる。近年の研究では、ロバストな$\ell_1$-lossを持つ一階法は、ランクが過大評価され、測定がうるさく、真の解が局所的あるいは大域的ミニマとして現れる可能性を示している。本論文は, 穏やかな条件下では, 真の解が「textit{strict saddle points}」として現れることを示す。我々は,ロバストな$\ell_1$-loss と低ランク行列回復,行列完成,行列センシングの2つのカテゴリについて検討した。マトリックスセンシングでは、2つの臨界遷移を明らかにする。 m$ の範囲は $\max\{d_1,d_2\}r\lesssim m\lesssim \max\{d_1,d_2\}k$ の範囲であり、真の解はいずれも局所的あるいは大域的でない。 m$ が $\max\{d_1,d_2\}k$ を超えると、すべての真の解は無意味なグローバルミニマになる。行列の完備化において、わずかなランクの過大評価と穏やかなノイズであっても、真の解は非臨界点または厳密な鞍点として現れる。

We explore the local landscape of low-rank matrix recovery, aiming to reconstruct a $d_1\times d_2$ matrix with rank $r$ from $m$ linear measurements, some potentially noisy. When the true rank is unknown, overestimation is common, yielding an over-parameterized model with rank $k\geq r$. Recent findings suggest that first-order methods with the robust $\ell_1$-loss can recover the true low-rank solution even when the rank is overestimated and measurements are noisy, implying that true solutions might emerge as local or global minima. Our paper challenges this notion, demonstrating that, under mild conditions, true solutions manifest as \textit{strict saddle points}. We study two categories of low-rank matrix recovery, matrix completion and matrix sensing, both with the robust $\ell_1$-loss. For matrix sensing, we uncover two critical transitions. With $m$ in the range of $\max\{d_1,d_2\}r\lesssim m\lesssim \max\{d_1,d_2\}k$, none of the true solutions are local or global minima, but some become strict saddle points. As $m$ surpasses $\max\{d_1,d_2\}k$, all true solutions become unequivocal global minima. In matrix completion, even with slight rank overestimation and mild noise, true solutions either emerge as non-critical or strict saddle points.

翻訳日:2023-12-13 03:08:47 公開日:2023-12-10

# ベイジアン説得による動的価格と学習

Dynamic Pricing and Learning with Bayesian Persuasion ( http://arxiv.org/abs/2304.14385v2 )

ライセンス: Link先を確認

Shipra Agrawal, Yiding Feng, Wei Tang

(参考訳) 我々は,商品の価格設定に加えて,販売者が「広告計画」にコミットする,新たな動的価格設定と学習環境について考察する。つまり、各ラウンドの開始時に、売り手は商品の品質について購入者にどのような信号を提供するかを決定することができる。人気の高いベイズ説得フレームワークを用いて、これらのシグナルが購入者の評価と購入応答に及ぼす影響をモデル化し、販売者の期待収益を最大化する価格体系とともに、広告スキームの最適設計を求める問題を定式化する。購入者の需要関数を事前に知ることなく、過去の購入応答を利用して最適な価格と広告戦略を適応的に学習できるオンラインアルゴリズムを設計することを目標としている。本稿では,最適な価格と広告手法と比較し,アルゴリズムの後悔について考察する。我々の主な結果は計算効率の良いオンラインアルゴリズムであり、製品品質において評価関数が線形であるときに$o(t^{2/3}(m\log t)^{1/3})$ regret boundを達成する。ここで $m$ は離散的製品品質ドメインの濃度であり、$t$ は時間軸である。この結果は、バリュエーション関数に対する自然な単調性とリプシッツの仮定を必要とするが、購入者の要求関数に対するリプシッツや滑らかさの仮定は不要である。定数$m$の場合、この結果は対数係数内での動的価格設定に対する後悔の少ない低い値と一致します。また、より広範に考慮された加法評価の特別ケースに対して、$m$ の独立性を持つ $\tilde{O}(T^{2/3})$ regret bound を含むいくつかの改善された結果を得る。

We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.
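The stated bound $O(T^{2/3}(m\log T)^{1/3})$ is sublinear in $T$, so the average regret per round vanishes as the horizon grows. The short sketch below evaluates the expression numerically. The leading constant `c` is an assumption (big-O notation hides it), and the values of `T` and `m` are arbitrary illustration choices.

```python
import math


def regret_bound(T, m, c=1.0):
    """Evaluate c * T^(2/3) * (m * log T)^(1/3); c is a hypothetical constant."""
    return c * T ** (2 / 3) * (m * math.log(T)) ** (1 / 3)


m = 5  # hypothetical number of discrete quality levels
per_round = {T: regret_bound(T, m) / T for T in (10**3, 10**4, 10**5, 10**6)}
for T, value in per_round.items():
    print(T, round(value, 4))
```

The per-round value shrinks roughly like $T^{-1/3}$ up to the logarithmic factor, which is what "no-regret learning" means for this bound.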

翻訳日:2023-12-13 02:59:37 公開日:2023-12-10

# 形状, 材料, 照明のニューラルPBIR再構成

Neural-PBIR Reconstruction of Shape, Material, and Illumination ( http://arxiv.org/abs/2304.13445v4 )

ライセンス: Link先を確認

Cheng Sun, Guangyan Cai, Zhengqin Li, Kai Yan, Cheng Zhang, Carl Marshall, Jia-Bin Huang, Shuang Zhao, Zhao Dong

(参考訳) 物体の2d画像(例えば写真)に基づく物理世界の物体の形状と空間的に変化する表面の外観の再構築は、コンピュータビジョンやグラフィックスにおいて長年の課題となっている。本稿では,ニューラルネットワークを用いた物体再構成と物理ベースの逆レンダリング(PBIR)を組み合わせた高精度かつ高効率な物体再構成パイプラインを提案する。当社のパイプラインではまず,ニューラルsdfベースの形状再構成を活用して,高品質だが潜在的に不完全なオブジェクト形状を生成する。次に, 神経材料と照明蒸留ステージを導入し, 材料と照明の高品質な予測を実現する。最終段階では、神経予測によって初期化され、PBIRを用いて初期結果を洗練し、オブジェクト形状、材料、照明の最終的な高品質な再構成を得る。実験の結果、パイプラインは既存のメソッドよりも品質や性能に優れています。

Reconstructing the shape and spatially varying surface appearances of a physical-world object as well as its surrounding illumination based on 2D images (e.g., photographs) of the object has been a long-standing problem in computer vision and graphics. In this paper, we introduce an accurate and highly efficient object reconstruction pipeline combining neural-based object reconstruction and physics-based inverse rendering (PBIR). Our pipeline first leverages a neural SDF-based shape reconstruction to produce a high-quality but potentially imperfect object shape. Then, we introduce a neural material and lighting distillation stage to achieve high-quality predictions for material and illumination. In the last stage, initialized by the neural predictions, we perform PBIR to refine the initial results and obtain the final high-quality reconstruction of object shape, material, and illumination. Experimental results demonstrate that our pipeline significantly outperforms existing methods quality-wise and performance-wise.

翻訳日:2023-12-13 02:59:01 公開日:2023-12-10

# 正規化・多視点支援ベクトル機械学習のローカライズ

Localisation of Regularised and Multiview Support Vector Machine Learning ( http://arxiv.org/abs/2304.05655v2 )

ライセンス: Link先を確認

Aurelian Gheondea and Cankat Tilki

(参考訳) 我々は、H.Q.~Minh, L によって導入された正規化および多視点支援ベクトル機械学習問題の局所化バージョンに対するいくつかの代表者定理を証明した。〜bazzani,v。 ~murino, \textit{journal of machine learning research}, \textbf{17}(2016) 1--72, 演算子値の正の半定義核とその再生成核ヒルベルト空間を含む。結果は、凸または非凸損失函数と有限または無限次元の入力空間を考える場合の一般的な場合に関する。一般化されたフレームワークは無限次元の入力空間と非凸損失関数を特別な場合、特に損失関数が g\^ateaux 微分可能である場合に許容する。部分非線形問題につながる指数最小二乗損失関数について、詳細な計算が提供される。

We prove a few representer theorems for a localised version of the regularised and multiview support vector machine learning problem introduced by H.Q.~Minh, L.~Bazzani, and V.~Murino, \textit{Journal of Machine Learning Research}, \textbf{17}(2016) 1--72, that involves operator valued positive semidefinite kernels and their reproducing kernel Hilbert spaces. The results concern general cases when convex or nonconvex loss functions and finite or infinite dimensional input spaces are considered. We show that the general framework allows infinite dimensional input spaces and nonconvex loss functions for some special cases, in particular in case the loss functions are G\^ateaux differentiable. Detailed calculations are provided for the exponential least squares loss functions that lead to partially nonlinear problems.

翻訳日:2023-12-13 02:57:18 公開日:2023-12-10

# ニューラルネットワーク制御システムの整合性解析のための契約型適応分割法

Contraction-Guided Adaptive Partitioning for Reachability Analysis of Neural Network Controlled Systems ( http://arxiv.org/abs/2304.03671v2 )

ライセンス: Link先を確認

Akash Harapanahalli, Saber Jafarpour, Samuel Coogan

(参考訳) 本稿では,ニューラルネットワークコントローラと外乱を用いた非線形フィードバックループにおける区間値のロバスト到達可能集合推定を改善するための縮小誘導適応分割アルゴリズムを提案する。過近似間隔の収縮率の推定に基づいて、アルゴリズムはいつ、どこで分割するかを選択する。そして、ニューラルネットワーク検証ステップと到達可能性分割層を分離することにより、アルゴリズムは計算コストの少ない精度向上を提供することができる。このアプローチは、十分な精度のオープンループ間隔値到達可能性推定手法と、ニューラルネットワークの入出力挙動をバウンドする方法に適用できる。縮退に基づくロバストネス解析を用いて,混合単調到達性を有するアルゴリズムの性能保証を行う。最後に,いくつかの数値シミュレーションを用いてアルゴリズムの性能を実証し,既存の手法と比較する。特に,実行環境のごく一部において到達可能な集合推定の精度が,最先端手法と比較して大幅に向上したことを報告する。

In this paper, we present a contraction-guided adaptive partitioning algorithm for improving interval-valued robust reachable set estimates in a nonlinear feedback loop with a neural network controller and disturbances. Based on an estimate of the contraction rate of over-approximated intervals, the algorithm chooses when and where to partition. Then, by leveraging a decoupling of the neural network verification step and reachability partitioning layers, the algorithm can provide accuracy improvements for little computational cost. This approach is applicable with any sufficiently accurate open-loop interval-valued reachability estimation technique and any method for bounding the input-output behavior of a neural network. Using contraction-based robustness analysis, we provide guarantees of the algorithm's performance with mixed monotone reachability. Finally, we demonstrate the algorithm's performance through several numerical simulations and compare it with existing methods in the literature. In particular, we report a sizable improvement in the accuracy of reachable set estimation in a fraction of the runtime as compared to state-of-the-art methods.
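The idea of deciding "when and where to partition" from an estimated contraction rate can be illustrated with a one-step, one-dimensional toy. This is not the paper's algorithm: there is no neural network controller here, the map `x -> x * x` with a deliberately naive interval product stands in for the closed-loop dynamics, and the threshold `rate_threshold` and depth limit `max_depth` are hypothetical parameters. A cell is split only when its image is wider than `rate_threshold` times the cell itself.

```python
def interval_mul(a, b):
    """Interval product [a] * [b] that treats the two factors as independent."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def step_bounds(box):
    """Sound but loose one-step over-approximation of x -> x * x."""
    return interval_mul(box, box)


def adaptive_reach(box, rate_threshold=0.9, max_depth=6, depth=0):
    """Hull of the image of `box`, splitting cells whose estimated rate is high."""
    image = step_bounds(box)
    width_in = box[1] - box[0]
    width_out = image[1] - image[0]
    # Estimated expansion/contraction rate of this cell under one step.
    if depth >= max_depth or width_in == 0.0 or width_out / width_in <= rate_threshold:
        return image
    mid = 0.5 * (box[0] + box[1])
    left = adaptive_reach((box[0], mid), rate_threshold, max_depth, depth + 1)
    right = adaptive_reach((mid, box[1]), rate_threshold, max_depth, depth + 1)
    return (min(left[0], right[0]), max(left[1], right[1]))


coarse = step_bounds((-1.0, 1.0))
refined = adaptive_reach((-1.0, 1.0))
print(coarse, refined)
```

Without partitioning, the naive bound on the initial set [-1, 1] includes negative values that `x * x` can never reach. With partitioning, cells near zero (where the map contracts) are left alone, cells near the endpoints are refined, and the hull tightens to the true range [0, 1] while remaining a sound over-approximation.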

翻訳日:2023-12-13 02:57:03 公開日:2023-12-10

# Grid-SD2E:認知学習システムにおける一般的なグリッドフィードバック

Grid-SD2E: A General Grid-Feedback in a System for Cognitive Learning ( http://arxiv.org/abs/2304.01844v3 )

ライセンス: Link先を確認

Jingyi Feng and Chenming Zhang

(参考訳) 生成された神経データによって脳が外界とどのように相互作用するかを理解することは、その働きのメカニズムの決定、脳疾患の治療、知性の理解に不可欠である。多くの理論モデルが提案されているが、これまでのところ統合と開発は困難である。本研究では,より汎用的でロバストなグリッドモジュールを作成し,ベイジアン推論(space-division and exploration-exploitation with grid-feedback, grid-sd2e)を用いた対話型・自己情報型認知システムを構築した。ここでは、グリッドモジュールを外界とシステム間の相互作用媒体として、システム内の自己強化媒体として使用することができる。空間分割探索探索(SD2E)は、その空間分割(SD)モジュールを介してグリッドの0/1信号を受信する。本稿では,他の研究者による実験と神経復号に関する経験から得られた理論モデルについても述べる。本稿では,神経科学と認知科学の両分野における既存の理論に基づくシステムの合理性を分析し,人と人と外の世界との間の相互作用を説明するための特別な,一般的なルールを提案する。さらに、このフレームワークに基づいて、最小の計算ユニットが抽出され、これは脳内の1つのニューロンに類似しています。

Comprehending how the brain interacts with the external world through generated neural data is crucial for determining its working mechanism, treating brain diseases, and understanding intelligence. Although many theoretical models have been proposed, they have thus far been difficult to integrate and develop. In this study, we were inspired in part by grid cells in creating a more general and robust grid module and constructing an interactive and self-reinforcing cognitive system together with Bayesian reasoning, an approach called space-division and exploration-exploitation with grid-feedback (Grid-SD2E). Here, a grid module can be used as an interaction medium between the outside world and a system, as well as a self-reinforcement medium within the system. The space-division and exploration-exploitation (SD2E) receives the 0/1 signals of a grid through its space-division (SD) module. The system described in this paper is also a theoretical model derived from experiments conducted by other researchers and our experience on neural decoding. Herein, we analyse the rationality of the system based on the existing theories in both neuroscience and cognitive science, and attempt to propose special and general rules to explain the different interactions between people and between people and the external world. Moreover, based on this framework, the smallest computing unit is extracted, which is analogous to a single neuron in the brain.

翻訳日:2023-12-13 02:56:32 公開日:2023-12-10

# 画像マッティングのための異方性事前学習

Disentangled Pre-training for Image Matting ( http://arxiv.org/abs/2304.00784v2 )

ライセンス: Link先を確認

Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao

(参考訳) 画像マッチングは、近年の文献における深層モデルのトレーニングを支援するために、高品質なピクセルレベルの人間のアノテーションを必要とする。このようなアノテーションは費用がかかり、スケールが難しいが、研究の発展を著しく妨げている。本研究では,無限個のデータを利用してマットング性能を向上させる自己教師付き事前学習手法を提案することで,この問題への最初の試みを行う。プリトレーニングタスクは、ランダムなトリマップとアルファマットを生成して画像不等角化目標を達成するイメージマットングと似た方法で設計される。次に、事前訓練されたモデルは、微調整のための下流マットングタスクの初期化として使用される。広範な実験評価により,提案手法は最先端のマットング法と他の自己教師付き初期化手法を大差で上回ることがわかった。また,異なるバックボーンアーキテクチャ上で提案手法の堅牢性を示す。プロジェクトページはhttps://crystraldo.github.io/dpt_mat/で閲覧できます。

Image matting requires high-quality pixel-level human annotations to support the training of a deep model in recent literature. However, such annotation is costly and hard to scale, significantly holding back the development of the research. In this work, we make the first attempt towards addressing this problem, by proposing a self-supervised pre-training approach that can leverage an unlimited amount of data to boost the matting performance. The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. Extensive experimental evaluations show that the proposed approach outperforms both the state-of-the-art matting methods and other alternative self-supervised initialisation approaches by a large margin. We also show the robustness of the proposed approach over different backbone architectures. Our project page is available at https://crystraldo.github.io/dpt_mat/.

翻訳日:2023-12-13 02:56:09 公開日:2023-12-10

# 学習における絡み合いと統計の役割について

On the Role of Entanglement and Statistics in Learning ( http://arxiv.org/abs/2306.03161v2 )

ライセンス: Link先を確認

Srinivasan Arunachalam, Vojtech Havlicek, Louis Schatzki

(参考訳) 本研究では,量子統計クエリ(QSQ)モデルにおいて,絡み合った,分離可能な,統計的に測定された学習モデル間の関係を理解する。この目的のために、以下の結果を示す。分離可能な測定値に対して$\textbf{entangled。 c\subseteq \{f:\{0,1\}^n\rightarrow [k]\}$ $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$.} ここでの目標は、未知の$f$を、$\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$という概念クラスから学ぶことである。もし$t$が、絡み合った測定値を使って$f$を学ぶのに十分であれば、$o(nt^2)$は、分離可能な測定値だけで$f$を学ぶのに十分である。 $\textbf{Entangled versus statistics Measurement} ここでのゴールは、分離可能な測定と統計測定へのアクセスを与えられた関数 $f \in C$ を学ぶことである。 qsq学習と(ノイズが存在する場合でも)絡み合った測定値を持つ量子学習を指数関数的に分離するクラス$c$を示す。これはblum et alの独創的な結果の「量子アナログ」を証明している。 [BKW'03]。これは古典的なSQとPAC学習を分類ノイズで分離する。学習状態の上限は$\textbf{qsq である。量子統計クエリーディメンション(QSD)を導入し、QSQ学習の下位境界を与える。これにより、純度、シャドウトモグラフィ、アベリア隠れ部分群問題、次数2$の関数、植込み双斜め状態、深さ$\textsf{polylog}(n)$のクリフォード回路の出力状態をテストするための超多項式QSQの下界を証明できる。 $\textbf{Further アプリケーション。弱いエラーと強いエラーの軽減を分離し、qsqモデルにおける学習分布の限界を低く証明します。 Quekらによる以前の作品。 qfk+'22] ヒンシュなどです [hin+'22] と nietner 等。 NIS+'23]は類似の結果を$\textit{assuming}$ 対角測定で証明し、我々の研究はこの仮定を取り除いた。

In this work we make progress in understanding the relationship between learning models with access to entangled, separable and statistical measurements in the quantum statistical query (QSQ) model. To this end, we show the following results. $\textbf{Entangled versus separable measurements.}$ The goal here is to learn an unknown $f$ from the concept class $C\subseteq \{f:\{0,1\}^n\rightarrow [k]\}$ given copies of $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$. We show that, if $T$ copies suffice to learn $f$ using entangled measurements, then $O(nT^2)$ copies suffice to learn $f$ using just separable measurements. $\textbf{Entangled versus statistical measurements.}$ The goal here is to learn a function $f \in C$ given access to separable measurements and statistical measurements. We exhibit a class $C$ that gives an exponential separation between QSQ learning and quantum learning with entangled measurements (even in the presence of noise). This proves the "quantum analogue" of the seminal result of Blum et al. [BKW'03] that separates classical SQ and PAC learning with classification noise. $\textbf{QSQ lower bounds for learning states.}$ We introduce a quantum statistical query dimension (QSD), which we use to give lower bounds on QSQ learning. With this we prove superpolynomial QSQ lower bounds for testing purity, shadow tomography, the Abelian hidden subgroup problem, degree-$2$ functions, planted bi-clique states and output states of Clifford circuits of depth $\textsf{polylog}(n)$. $\textbf{Further applications.}$ We give an $\textit{unconditional}$ separation between weak and strong error mitigation and prove lower bounds for learning distributions in the QSQ model. Prior works by Quek et al. [QFK+'22], Hinsche et al. [HIN+'22], and Nietner et al. [NIS+'23] proved the analogous results $\textit{assuming}$ diagonal measurements and our work removes this assumption.

翻訳日:2023-12-13 02:34:45 公開日:2023-12-10

# 平均報酬のレストレスバンディット:一様大域アトラクタ仮定の打破

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption ( http://arxiv.org/abs/2306.00196v2 )

ライセンス: Link先を確認

Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

(参考訳) 平均報酬基準による無限ホリゾンレストレスト・バンディット問題を離散時間と連続時間の両方の設定で検討した。基本的な目標は、腕の数($n$)が大きくなるにつれて最適なギャップを減少させる計算効率のよいポリシーを設計することである。漸近的最適性に関する既存の結果は、すべて一様大域的誘引特性(UGAP)に依存している。本稿では,単腕のポリシを元のn$-armed問題に対するポリシに変換する,汎用的なシミュレーションベースのフレームワークであるnext-the-virtual-adviceを提案する。これは、各腕に単一武装のポリシーをシミュレートし、実状態をシミュレートされた状態に向けて慎重に操ることによって行われる。我々のフレームワークは、$O(1/\sqrt{N})$Optimity gapでポリシーを生成するためにインスタンス化することができる。離散時間設定では、結果はより単純な同期仮定の下で保持され、これはugapに違反する問題インスタンスをカバーする。より注目すべきは、連続時間設定では、標準のユニチェーン条件を超える追加の仮定は不要である。どちらの設定でも、我々の研究はUGAPを必要としない最初の漸近的最適性の結果である。

We study the infinite-horizon Restless Bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental goal is to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require any additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP.

翻訳日:2023-12-13 02:32:56 公開日:2023-12-10

# 非エルミート量子系における非自明な世界線巻線

Nontrivial worldline winding in non-Hermitian quantum systems ( http://arxiv.org/abs/2307.01260v2 )

ライセンス: Link先を確認

Shi-Xin Hu, Yongxu Fu, Yi Zhang

(参考訳) 非エルミート量子システムへの関心が高まっている中、非相互作用モデルが最も注目されている。ここでは、確率級数展開量子モンテカルロ法を用いて、相互作用する量子系、例えば様々な非エルミート量子スピン鎖における非エルミート物理学を研究する。計算は開境界条件下で一貫した数値結果をもたらすが、周期境界条件下での非エルミート量子系は、非自明な巻線上の想像時間世界線の異常な濃度を観測し、適切な収束のために巻数セクター間のエルゴード性を高める必要がある。このような非自明なワールドラインの巻線は、他の非エルミートモデルや解析的アプローチにも存在する創発的な物理現象である。非エルミート皮膚効果やポイントギャップ分光法と並行して、非エルミート位相現象の同定と解析を、相互作用、有限温度、生物軌道基底、周期境界条件を新規かつ制御された方法で量子系へと大きく拡張する。最後に,このような非自明なワールドライン巻線の直接的物理的意味について検討し,絡み合いエントロピーに付加的,潜在的に準長距離の寄与をもたらす。

Amid the growing interest in non-Hermitian quantum systems, non-interacting models have received the most attention. Here, through the stochastic series expansion quantum Monte Carlo method, we investigate non-Hermitian physics in interacting quantum systems, e.g., various non-Hermitian quantum spin chains. While calculations yield consistent numerical results under open boundary conditions, non-Hermitian quantum systems under periodic boundary conditions exhibit an unusual concentration of imaginary-time worldlines over nontrivial winding and require enhanced ergodicity between winding-number sectors for proper convergence. Such nontrivial worldline winding is an emergent physical phenomenon that also exists in other non-Hermitian models and analytical approaches. Alongside the non-Hermitian skin effect and the point-gap spectroscopy, it largely extends the identification and analysis of non-Hermitian topological phenomena to quantum systems with interactions, finite temperatures, biorthogonal bases, and periodic boundary conditions in a novel and controlled fashion. Finally, we study the direct physical implications of such nontrivial worldline winding, which bring additional, potentially quasi-long-range contributions to the entanglement entropy.

翻訳日:2023-12-13 02:26:08 公開日:2023-12-10

# Nano1D:低次元ナノ構造の解析とセグメンテーションのための正確なコンピュータビジョンソフトウェア

Nano1D: An accurate Computer Vision software for analysis and segmentation of low-dimensional nanostructures ( http://arxiv.org/abs/2306.15319v3 )

ライセンス: Link先を確認

Ehsan Moradpur-Tari (1), Sergei Vlassov (1,2), Sven Oras (1,2), Mart Ernits (1), Elyad Damerchi (1), Boris Polyakovc (3), Andreas Kyritsakis (1), and Veronika Zadin (1) ((1) Institute of Technology, University of Tartu, Nooruse 1, 50411 Tartu, Estonia (2) Institute of Physics, University of Tartu, W. Ostwaldi 1, 50411 Tartu, Estonia (3) Institute of Solid State Physics, University of Latvia, Kengaraga street 8, LV-1063 Riga, Latvia)

(参考訳) 顕微鏡画像のナノ粒子は通常、質的または手作業で分析され、これらの物体の自律的定量分析が必要となる。本稿では、顕微鏡画像から1次元の変形可能な重なり合う物体の正確なセグメンテーションと幾何解析のための物理計算モデルを提案する。このモデルはNano1Dと呼ばれ、前処理、セグメンテーション、重なり合う物体と幾何学的測定の4つのステップを持つ。このモデルは、異なる顕微鏡から採取したAgおよびAuナノワイヤのSEM画像と、異なる長さ、直径、人口密度のナノ粒子に熱分解されたAgナノワイヤを用いて試験された。長さや平均直径などの幾何学的特徴を分割し分析することに成功した。アルゴリズムの機能は、画像内のオブジェクトのサイズ、数、密度、方向、重なりによって損なわれない。モデルの主な強みは、重なり合うオブジェクトを99%以上の精度でセグメント化および解析し、一方、現在の機械学習と計算モデルは、重なり合うオブジェクトをセグメント化できない不正確さに悩まされている。グラフィカルなユーザインタフェイスから得られるNano1Dは、ナノワイヤ、ナノチューブ、ナノロッドなどの1Dナノ粒子を分析できる。

Nanoparticles in microscopy images are usually analyzed qualitatively or manually, and there is a need for autonomous quantitative analysis of these objects. In this paper, we present a physics-based computational model for accurate segmentation and geometrical analysis of one-dimensional deformable overlapping objects from microscopy images. This model, named Nano1D, has four steps: preprocessing, segmentation, separation of overlapping objects, and geometrical measurement. The model is tested on SEM images of Ag and Au nanowires taken from different microscopes, and on thermally fragmented Ag nanowires transformed into nanoparticles with different lengths, diameters, and population densities. It successfully segments them and analyzes their geometrical characteristics, including lengths and average diameter. The function of the algorithm is not undermined by the size, number, density, orientation and overlapping of objects in images. The main strength of the model is shown to be its ability to segment and analyze overlapping objects successfully with more than 99% accuracy, while current machine learning and computational models suffer from inaccuracy and inability to segment overlapping objects. Benefiting from a graphical user interface, Nano1D can analyze 1D nanoparticles including nanowires, nanotubes, and nanorods, in addition to other 1D features of microstructures such as microcracks and dislocations.

翻訳日:2023-12-13 02:24:17 公開日:2023-12-10

# 古典的システムからのインタラクションフリー計測の誤解

Misinference of interaction-free measurement from a classical system ( http://arxiv.org/abs/2306.13590v2 )

ライセンス: Link先を確認

Valeri Frumkin and John W. M. Bush

(参考訳) 相互作用のない測定は、量子粒子が移動しない経路に沿って物体を検出できると考えられている。したがって、これは量子現象の最も迷いの1つである。ここでは, 流体表面を自転する液滴を自転する流体を, 自作の波で誘導する流体力学的パイロット波を用いたインタラクションフリー計測の古典的な例を示す。我々は、相互作用のない量子測定の既存の合理的化は、波状に導かれる粒子によって、我々の流体力学系における古典的な記述を可能にする。

Interaction-free measurement is thought to allow for quantum particles to detect objects along paths they never traveled. As such, it represents one of the most beguiling of quantum phenomena. Here, we present a classical analog of interaction-free measurement using the hydrodynamic pilot-wave system, in which a droplet self-propels across a vibrating fluid surface, guided by a wave of its own making. We argue that existing rationalizations of interaction-free quantum measurement in terms of particles being guided by wave forms allow for a classical description manifest in our hydrodynamic system, wherein the measurement is decidedly not interaction-free.

翻訳日:2023-12-13 02:23:29 公開日:2023-12-10

# ニューラルスペクトロ偏光場

Neural Spectro-polarimetric Fields ( http://arxiv.org/abs/2306.12562v2 )

ライセンス: Link先を確認

Youngchan Kim, Wonjoon Jin, Sunghyun Cho, Seung-Hwan Baek

(参考訳) シーン内の光の空間放射率分布のモデル化は、ビュー合成を含む応用のために広く研究されている。スペクトルと偏光は、光の波動特性であり、3つのrgbスペクトルバンドへの積分と人間の視覚に対する非受容性のため、しばしば無視される。しかし、これらの性質はシーンに関する実質的な資料や幾何学的情報を包含することが知られている。本稿では、任意の波長における任意の光線の空間ストークスベクトル分布である分光偏光場をモデル化する。我々は, 位置, 方向, 波長の連続変数で, 物理的に有意なストークスベクトルをモデル化したニューラル・スペクトロ偏光場(NeSpoF)を提案する。 NeSpoFは本質的にノイズの多い生の測定を管理し、メモリ効率を示し、物理的に重要な信号を保存する。 NeSpoFを検証するために,合成シーンと実世界のシーンの両方からなる,最初のマルチビューハイパースペクトル偏光画像データセットを提案する。これらの画像は当社の小型ハイパースペクトル偏光イメージングシステムを用いて撮影され、システム欠陥に対するロバスト性について校正されている。我々は様々な場面でnespofの能力を示す。

Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications, including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. However, these properties are known to encompass substantial material and geometric information about a scene. Here, we propose to model spectro-polarimetric fields, the spatial Stokes-vector distribution of any light ray at an arbitrary wavelength. We present Neural Spectro-polarimetric Fields (NeSpoF), a neural representation that models the physically-valid Stokes vector at given continuous variables of position, direction, and wavelength. NeSpoF manages inherently noisy raw measurements, showcases memory efficiency, and preserves physically vital signals - factors that are crucial for representing the high-dimensional signal of a spectro-polarimetric field. To validate NeSpoF, we introduce the first multi-view hyperspectral-polarimetric image dataset, comprised of both synthetic and real-world scenes. These were captured using our compact hyperspectral-polarimetric imaging system, which has been calibrated for robustness against system imperfections. We demonstrate the capabilities of NeSpoF on diverse scenes.
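As a small illustration of what "physically valid" means for a Stokes vector $(s_0, s_1, s_2, s_3)$, the following Python sketch checks the standard constraint $s_0 \ge \sqrt{s_1^2 + s_2^2 + s_3^2}$ and rescales an invalid vector back onto the valid set. The function names and the rescaling rule are assumptions made for this example; the abstract does not state how NeSpoF enforces validity.

```python
import math


def polarized_norm(s):
    """Length of the polarized part (s1, s2, s3) of a Stokes vector."""
    _, s1, s2, s3 = s
    return math.sqrt(s1 * s1 + s2 * s2 + s3 * s3)


def is_physically_valid(s, tol=1e-9):
    """A Stokes vector is valid when s0 >= 0 and s0 >= |(s1, s2, s3)|."""
    return s[0] >= 0.0 and polarized_norm(s) <= s[0] + tol


def degree_of_polarization(s):
    """Fraction of the intensity that is polarized (0 for zero intensity)."""
    return polarized_norm(s) / s[0] if s[0] > 0.0 else 0.0


def clamp_to_valid(s):
    """Hypothetical repair step: shrink the polarized part until DoP <= 1."""
    s0 = max(s[0], 0.0)
    norm = polarized_norm(s)
    if norm <= s0:
        return (s0, s[1], s[2], s[3])
    scale = s0 / norm  # norm > s0 >= 0 here, so the division is safe
    return (s0, s[1] * scale, s[2] * scale, s[3] * scale)
```

A vector such as `(1, 1, 1, 0)` fails the check because its polarized part is longer than its intensity; `clamp_to_valid` keeps the polarization direction and reduces only its magnitude.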

翻訳日:2023-12-13 02:23:16 公開日:2023-12-10

# 浅い量子回路による化学精度向上に向けて:クリフォードに基づくハミルトン工学的アプローチ

Towards chemical accuracy with shallow quantum circuits: A Clifford-based Hamiltonian engineering approach ( http://arxiv.org/abs/2306.12053v3 )

ライセンス: Link先を確認

Jiace Sun, Lixue Cheng, Weitang Li

(参考訳) 浅い量子回路で化学的精度を得ることは、量子化学、特に短期量子デバイスにおいて重要な課題である。本研究では,回路深さと精度のトレードオフに対処するクリフォードに基づくハミルトン工学アルゴリズム,すなわちCHEMを提案する。変分量子固有解法とハードウェア効率のansatzに基づき、(1)ハーツリー・フォックエネルギーに対応する一連の初期回路パラメータが生成可能であること、(2)回路パラメータに関して初期エネルギー勾配を効果的に最大化すること、(3)古典的処理に無視できるオーバーヘッドを課すこと、(4)追加の量子資源を必要としないこと、(4)回路トポロジーと互換性があることを保証するクリフォードに基づくハミルトニアン変換を設計する。量子ハードウェアエミュレータを用いたアプローチの有効性を実証し,30量子ゲート未満の12量子ビットのシステムに対して化学的精度を実現する。我々のクリフォード拠点のハミルトン工学的アプローチは、短期量子デバイス上での実用的な量子計算化学のための有望な道を提供する。

Achieving chemical accuracy with shallow quantum circuits is a significant challenge in quantum computational chemistry, particularly for near-term quantum devices. In this work, we present a Clifford-based Hamiltonian engineering algorithm, namely CHEM, that addresses the trade-off between circuit depth and accuracy. Based on the variational quantum eigensolver and a hardware-efficient ansatz, our method designs a Clifford-based Hamiltonian transformation that (1) ensures a set of initial circuit parameters corresponding to the Hartree--Fock energy can be generated, (2) effectively maximizes the initial energy gradient with respect to circuit parameters, (3) imposes negligible overhead for classical processing and does not require additional quantum resources, and (4) is compatible with any circuit topology. We demonstrate the efficacy of our approach using a quantum hardware emulator, achieving chemical accuracy for systems as large as 12 qubits with fewer than 30 two-qubit gates. Our Clifford-based Hamiltonian engineering approach offers a promising avenue for practical quantum computational chemistry on near-term quantum devices.
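The reason a Clifford-based transformation can be cheap classically is a standard fact: conjugating a Pauli operator by a Clifford gate gives another Pauli operator, up to sign. The Python sketch below checks this for single-qubit gates using plain nested lists. It illustrates only that underlying fact, not the CHEM algorithm itself, and all helper names are chosen for this example.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def conjugate_by(u, p):
    """Return U P U^dagger."""
    return matmul(matmul(u, p), dagger(u))


PAULIS = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}
R = 1.0 / math.sqrt(2.0)
HADAMARD = [[R, R], [R, -R]]  # Clifford gate
PHASE = [[1, 0], [0, 1j]]     # Clifford gate (S)


def identify_pauli(m, tol=1e-12):
    """Return (sign, label) if m equals +/- a Pauli matrix, else None."""
    for label, p in PAULIS.items():
        for sign in (1, -1):
            if all(abs(m[i][j] - sign * p[i][j]) < tol
                   for i in range(2) for j in range(2)):
                return sign, label
    return None
```

Because each Pauli term maps to exactly one Pauli term, a Hamiltonian written as a sum of Pauli strings keeps the same number of terms after such a transformation, so the transformed Hamiltonian is no harder to measure term by term.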

翻訳日:2023-12-13 02:22:22 公開日:2023-12-10

# 自己回帰型ニューラル演算子の安定性に向けて

Towards Stability of Autoregressive Neural Operators ( http://arxiv.org/abs/2306.10619v2 )

ライセンス: Link先を確認

Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown

(参考訳) ニューラル演算子は、物理科学における時空間系のモデリングに有望なアプローチであることが証明されている。しかし、これらのモデルを大規模システム向けにトレーニングすることは、計算とメモリの大幅なコストを発生させるため、非常に難しい - これらのシステムは、将来の時間状態を予測するために、ニューラルネットワークの自動回帰的タイムステッピングに頼ることを余儀なくされることが多い。これはコスト管理に有効であるが、時間とともに制御不能なエラーの増加と最終的には不安定になる可能性がある。この自己回帰的誤差の増大の原因を,物理システムのための先駆的ニューラルオペレータモデルを用いて解析し,その軽減法を探究する。計算/メモリコストを膨らませることなく、これらのモデル内で不安定誘導操作を慎重に制御できるアーキテクチャとアプリケーション固有の改善を導入する。本研究では,Navier-Stokes流体の流れ,浅瀬の回転,高分解能気象予報システムなどの科学システムについて報告する。ニューラル演算子に設計原則を適用することで、長期的な予測や、質的な分岐の兆候のない長い時間軸に対する誤差が、これらのシステムのオリジナルのモデルよりも大幅に低減できることを実証する。再現性のために、私たちは \href{https://github.com/mikemccabe210/stabilizing_neural_operators}{code}をオープンソースにしました。

Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging, as they incur significant computational and memory expense; as a result, they are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our code at https://github.com/mikemccabe210/stabilizing_neural_operators for reproducibility.
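Autoregressive error growth can be seen in a toy setting that has nothing to do with neural operators. In the Python sketch below, a surrogate with a 1% one-step error is fed its own output repeatedly. The scalar "system" and the 1% figure are assumptions chosen purely for illustration.

```python
def rollout(step, x0, n_steps):
    """Feed each prediction back in as the next input (autoregressive stepping)."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


def true_step(x):
    """Toy ground-truth dynamics: the state stays constant."""
    return x


def model_step(x):
    """Toy surrogate that overshoots the true step by 1%."""
    return 1.01 * x


def relative_errors(x0=1.0, n_steps=100):
    """Relative error of the surrogate rollout at every step."""
    truth = rollout(true_step, x0, n_steps)
    pred = rollout(model_step, x0, n_steps)
    return [abs(p - t) / abs(t) for p, t in zip(pred, truth)]
```

The one-step error is 1%, but after 100 autoregressive steps the relative error is `1.01**100 - 1`, which is larger than the signal itself. Small per-step errors that amplify rather than damp are what make long rollouts unstable.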

翻訳日:2023-12-13 02:21:43 公開日:2023-12-10

# DNNベースの適応型クルーズ制御システムに対する実行時ステルス知覚攻撃

Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems ( http://arxiv.org/abs/2307.08939v2 )

ライセンス: Link先を確認

Xugui Zhou and Anqi Chen and Maxfield Kouzel and Haotian Ren and Morgan McCarty and Cristina Nita-Rotaru and Homa Alemzadeh

(参考訳) アダプティブ・クルーズ・コントロール(ACC、Adaptive Cruise Control)は、先導車への所望の速度と安全な距離を維持するための運転補助技術である。本稿では, カメラデータに摂動を戦略的に注入して前方衝突を引き起こす, 実行時盗聴攻撃下でのディープニューラルネットワーク(DNN)ベースのACCシステムのセキュリティを評価する。本稿では、攻撃を誘発する最も重要な時間を選択するためのコンテキスト認識戦略と、実行時に画像摂動を適応的に生成するための新しい最適化手法を提案する。提案手法は,実車,実車,実運用型accシステムからの制御ソフトウェア,実世界の運転シミュレータ,運転者による介入,高度緊急ブレーキシステム(aebs)などの安全機能を備えた現実的なシミュレーションプラットフォームを用いて,提案手法の有効性を評価する。実験の結果, 本攻撃は, 実世界の要因や環境の動的変化に対してステルス性, 堅牢でありながら, 危険発生時の成功率142.9倍, 避難率89.6%向上することがわかった。本研究は,攻撃防止における人間ドライバーの役割と基本的な安全メカニズムを明らかにする。

Adaptive Cruise Control (ACC) is a widely used driver assistance technology for maintaining the desired speed and a safe distance to the leading vehicle. This paper evaluates the security of deep neural network (DNN) based ACC systems under runtime stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at runtime. We evaluate the effectiveness of the proposed attack using a publicly available driving dataset, an actual vehicle, and a realistic simulation platform with the control software from a production ACC system, a physical-world driving simulator, and interventions by the human driver and safety features such as Advanced Emergency Braking System (AEBS). Experimental results show that the proposed attack achieves a 142.9 times higher success rate in causing hazards and an 89.6% higher evasion rate than baselines, while being stealthy and robust to real-world factors and dynamic changes in the environment. This study highlights the role of human drivers and basic safety mechanisms in preventing attacks.

翻訳日:2023-12-13 02:11:24 公開日:2023-12-10

# 関数近似を用いたロバスト強化学習のための自然アクター批判

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation ( http://arxiv.org/abs/2307.08875v2 )

ライセンス: Link先を確認

Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

(参考訳) 本研究では,トレーニングシミュレータとテスト環境間のモデルミスマッチに対して頑健な評価政策を決定することを目的として,ロバスト強化学習(RL)について検討する。従来のポリシーベースのロバストなRLアルゴリズムは主に、ロバストなポリシー評価を容易にする不確実性セットの下での表の設定に重点を置いているが、状態のスケールアップ時にはもはや取り外せない。この目的のために,2つの新しい不確実性集合の定式化を提案し,その1つは二重サンプリングに基づくものであり,もう1つは積分確率計量に基づくものである。どちらも、シミュレータにしかアクセスできない場合でも、大規模で堅牢なRLを牽引可能である。本稿では,新しい不確実性集合を取り入れ,関数近似を用いる,頑健な自然なアクター批判(RNAC)アプローチを提案する。提案するrnacアルゴリズムの関数近似誤差における最適ロバストポリシーに対する有限時間収束保証を提案する。最後に,複数の MuJoCo 環境と実際の TurtleBot ナビゲーションタスクにおいて,提案した RNAC アプローチによって学習されたポリシーの堅牢性を示す。

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.

翻訳日:2023-12-13 02:11:03 公開日:2023-12-10

# 進行方向に注意:歩行認識における視点バイアスのあるドメインギャップと教師なし適応

Watch Where You Head: A View-biased Domain Gap in Gait Recognition and Unsupervised Adaptation ( http://arxiv.org/abs/2307.06751v3 )

ライセンス: Link先を確認

Gavriel Habib, Noa Barzilay, Or Shimshi, Rami Ben-Ari, Nir Darshan

(参考訳) 歩行認識は、歩行パターンによって人々を識別することを目的としたコンピュータビジョンタスクである。既存のメソッドは特定のデータセットで高いパフォーマンスを示すことが多いが、見当たらないシナリオに一般化する能力が欠けている。 unsupervised domain adaptation(uda)は、ソースドメイン上で教師付きで事前学習されたモデルを、ラベルなしのターゲットドメインに適応させようとする。限られたシナリオに対するソリューションを提案する歩行認識のためのUDAに関する研究はわずかである。本稿では,対象領域の角度や歩行方向に対するバイアスによる歩行認識モデルの適用において,基本的な現象を明らかにする。そこで我々は,新しい三重項選択戦略とカリキュラム学習を組み合わせることで,このバイアスを軽減するための修正を提案する。そこで本稿では,教師なしドメイン適応(GOUDA)のためのゲイト指向方式を提案する。 casia-b,ou-mvlp,grown,gait3dの4つの広く使われているgaitデータセットと,gaitset,gaitpart,gaitglの3つのバックボーンについて広範な実験を行い,アプローチバイアスを正当化し,uda以前の作業よりも提案手法の優越性を示す。

Gait recognition is a computer vision task aiming to identify people by their walking patterns. Although existing methods often show high performance on specific datasets, they lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. There are only a few works on UDA for gait recognition, and they propose solutions to limited scenarios. In this paper, we reveal a fundamental phenomenon in the adaptation of gait recognition models, caused by the bias in the target domain to viewing angle or walking direction. We then suggest a remedy to reduce this bias with a novel triplet selection strategy combined with curriculum learning. To this end, we present the Gait Orientation-based method for Unsupervised Domain Adaptation (GOUDA). We provide extensive experiments on four widely-used gait datasets, CASIA-B, OU-MVLP, GREW, and Gait3D, and on three backbones, GaitSet, GaitPart, and GaitGL, justifying the view bias and showing the superiority of our proposed method over prior UDA works.

翻訳日:2023-12-13 02:09:43 公開日:2023-12-10

# SOGDet:Semantic-Occupancy Guided Multi-view 3D Object Detection

SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection ( http://arxiv.org/abs/2308.13794v2 )

ライセンス: Link先を確認

Qiu Zhou, Jinming Cao, Hanchao Leng, Yifang Yin, Yu Kun and Roger Zimmermann

(参考訳) 自動運転の分野では、3D環境の正確で包括的な認識が不可欠である。 Bird's Eye View (BEV) ベースの手法は、多視点画像を入力として使用する3Dオブジェクト検出のための有望なソリューションとして登場した。しかし、既存の3Dオブジェクト検出手法は、歩道や植生などの環境の物理的文脈を無視することが多く、結果として準最適性能が得られる。本稿では,SOGDet(Semantic-Occupancy Guided Multi-view 3D Object Detection)と呼ばれる,3次元セマンティック占有ブランチを利用して3次元物体検出の精度を向上させる手法を提案する。特に、意味的占有によってモデル化された物理的文脈は、検出器がより総合的な視点でシーンを認識するのに役立つ。私たちのSOGDetは柔軟で、既存のほとんどのBEVベースのメソッドとシームレスに統合できます。本手法の有効性を評価するため,いくつかの最先端ベースラインに適用し,nuScenesデータセットのみを用いて広範囲な実験を行う。以上の結果から,SOGDet は nuScenes Detection Score (NDS) と平均平均精度 (mAP) の3つのベースライン法の性能を一貫して向上させることがわかった。これは、3Dオブジェクト検出と3Dセマンティック占有の組み合わせが、3D環境をより包括的に認識し、より堅牢な自律運転システムの構築を支援することを示唆している。コードは https://github.com/zhouqiu/SOGDet で入手できる。

In the field of autonomous driving, accurate and comprehensive perception of the 3D environment is crucial. Bird's Eye View (BEV) based methods have emerged as a promising solution for 3D object detection using multi-view images as input. However, existing 3D object detection methods often ignore the physical context in the environment, such as sidewalk and vegetation, resulting in sub-optimal performance. In this paper, we propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) that leverages a 3D semantic-occupancy branch to improve the accuracy of 3D object detection. In particular, the physical context modeled by semantic occupancy helps the detector to perceive the scenes in a more holistic view. Our SOGDet is flexible to use and can be seamlessly integrated with most existing BEV-based methods. To evaluate its effectiveness, we apply this approach to several state-of-the-art baselines and conduct extensive experiments exclusively on the nuScenes dataset. Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP). This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems. The code is available at: https://github.com/zhouqiu/SOGDet.

翻訳日:2023-12-13 02:04:23 公開日:2023-12-10

# オンライン動的埋め込み予測による停滞解消型分散gnnトレーニング

Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction ( http://arxiv.org/abs/2308.13466v2 )

ライセンス: Link先を確認

Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao

(参考訳) 最近のグラフニューラルネットワーク(GNN)の成功にもかかわらず、近隣の爆発によって大規模なグラフでGNNをトレーニングすることは依然として困難である。修正として、分散コンピューティングは、豊富なコンピューティングリソース(例えばgpu)を活用することで、有望なソリューションになる。しかし,グラフデータのノード依存性は,大規模な通信オーバーヘッドに悩まされる分散GNNトレーニングにおいて,高い並行性を実現することの難しさを増大させる。これを解決するために、歴史的価値近似は分散トレーニング技術の有望なクラスと見なされる。オフラインメモリを使用して、正確な値の安価な近似として履歴情報をキャッシュし、高い並行性を実現する。しかし、そのような利点は、古いトレーニング情報を含むコストがかかるため、停滞、不正確さ、および収束の問題に繋がる。これらの課題を克服するため,本稿では,新しいスケーラブル分散gnnトレーニングフレームワークであるsat(staleness-alleviated training)を提案する。 SATの鍵となる考え方は、GNNの埋め込み進化を時間グラフとしてモデル化し、その上にモデルを構築し、将来の埋め込みを予測することである。本稿では,埋め込み予測器と分散GNNを代替的に学習するオンラインアルゴリズムを提案し,さらに収束解析を行う。実験により,satは組込みの停滞を効果的に軽減し,大規模グラフデータセットの性能と収束速度を向上できることを実証した。

Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train GNNs on large-scale graphs due to neighbor explosion. As a remedy, distributed computing becomes a promising solution by leveraging abundant computing resources (e.g., GPUs). However, the node dependency of graph data increases the difficulty of achieving high concurrency in distributed GNN training, which suffers from massive communication overhead. To address this, historical value approximation is deemed a promising class of distributed training techniques. It utilizes an offline memory to cache historical information (e.g., node embeddings) as an affordable approximation of the exact value and achieves high concurrency. However, such benefits come at the cost of involving dated training information, leading to staleness, imprecision, and convergence issues. To overcome these challenges, this paper proposes SAT (Staleness-Alleviated Training), a novel and scalable distributed GNN training framework that reduces the embedding staleness adaptively. The key idea of SAT is to model the GNN's embedding evolution as a temporal graph and build a model upon it to predict future embeddings, which effectively alleviates the staleness of the cached historical embeddings. We propose an online algorithm to train the embedding predictor and the distributed GNN alternately and further provide a convergence analysis. Empirically, we demonstrate that SAT can effectively reduce embedding staleness and thus achieve better performance and convergence speed on multiple large-scale graph datasets.

翻訳日:2023-12-13 02:03:55 公開日:2023-12-10

# v2a-mapper:基盤モデル接続による視覚-聴覚生成のための軽量ソリューション

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models ( http://arxiv.org/abs/2308.09300v3 )

ライセンス: Link先を確認

Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai

(参考訳) 基礎モデル(FM)の上に人工知能(AI)システムを構築することは、AI研究における新たなパラダイムになりつつある。膨大なデータから学習した代表的および生成能力は、スクラッチから余分なトレーニングをすることなく、容易に適応し、幅広い下流タスクに移行することができる。しかし、音声モダリティが関与する場合、クロスモーダル生成におけるFMの活用は未検討のままである。一方,視覚入力から意味的関連音を自動生成することは,モーダル・ジェネレーション研究において重要な課題である。このvision-to-audio(v2a)生成問題を解決するために、既存の手法では、小さなデータセットを使って複雑なシステムをスクラッチから設計し構築する傾向がある。本稿では,基礎モデル,特にCLIP,CLAP,AudioLDMを活用することで,この問題に対する軽量な解決策を提案する。まず視覚的CLIPの潜在空間と聴覚的CLAPモデルとの領域ギャップについて検討する。次に,CLIP と CLAP 空間間の視覚的入力を変換することで,領域ギャップを埋めるシンプルなマッパー機構 (V2A-Mapper) を提案する。変換されたCLAP埋め込みを条件に、事前訓練された音声生成FM AudioLDMを採用し、高忠実で視覚的に整合した音を生成する。従来の手法と比較して,本手法ではV2A-Mapperの迅速な訓練しか必要としない。さらに、V2A-Mapperの選択に関する広範な実験を行い、生成マッパーが忠実度と可変性(FD)に優れ、レグレッションマッパーが相対性(CS)に若干優れていることを示す。 2つのV2Aデータセットの客観的評価と主観評価は、現在の最先端手法と比較して、提案手法の優位性を示し、パラメータは86%少なく、FDとCSは53%、CSは19%改善した。

Building artificial intelligence (AI) systems on top of a set of foundation models (FMs) is becoming a new paradigm in AI research. Their representative and generative abilities learnt from vast amounts of data can be easily adapted and transferred to a wide range of downstream tasks without extra training from scratch. However, leveraging FMs in cross-modal generation remains under-researched when the audio modality is involved. On the other hand, automatically generating semantically-relevant sound from visual input is an important problem in cross-modal generation studies. To solve this vision-to-audio (V2A) generation problem, existing methods tend to design and build complex systems from scratch using modestly sized datasets. In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM. We first investigate the domain gap between the latent space of the visual CLIP and the auditory CLAP models. Then we propose a simple yet effective mapper mechanism (V2A-Mapper) to bridge the domain gap by translating the visual input between CLIP and CLAP spaces. Conditioned on the translated CLAP embedding, the pretrained audio generative FM AudioLDM is adopted to produce high-fidelity and visually-aligned sound. Compared to previous approaches, our method only requires a quick training of the V2A-Mapper. We further analyze and conduct extensive experiments on the choice of the V2A-Mapper and show that a generative mapper is better at fidelity and variability (FD) while a regression mapper is slightly better at relevance (CS). Both objective and subjective evaluation on two V2A datasets demonstrate the superiority of our proposed method compared to current state-of-the-art approaches: it is trained with 86% fewer parameters yet achieves 53% and 19% improvement in FD and CS, respectively.

翻訳日:2023-12-13 02:01:56 公開日:2023-12-10

# 一般浅層ネットワークによる近似のトラクタビリティ

Tractability of approximation by general shallow networks ( http://arxiv.org/abs/2308.03230v2 )

ライセンス: Link先を確認

Hrushikesh Mhaskar, Tong Mao

(参考訳) 本稿では,論文「Dimension independent bounds for general shallow networks」(Neural Networks, \textbf{123} (2020), 142-152)の結果をよりシャープにしたものを示す。 $\mathbb{X}$ と $\mathbb{Y}$ をコンパクト距離空間とする。 $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$, $ x\in\mathbb{X}$ の形の関数を,$ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$ の形の $G$-ネットワークによって近似することを考える。被覆数を用いて $\mathbb{X}$ と $\mathbb{Y}$ の次元を定義すると,$n$ に関する近似次数について次元に依存しない上界が得られ,そこに現れる定数もすべて次元に高々多項式的にしか依存しない。応用には,べき乗正規化線形ユニット(power ReLU)ネットワーク,帯球関数(zonal function)ネットワーク,ある種の放射基底関数ネットワークによる近似に加え,高次元空間への関数拡張という重要な問題が含まれる。

In this paper, we present a sharper version of the results in the paper "Dimension independent bounds for general shallow networks", Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$, $ x\in\mathbb{X}$, by $G$-networks of the form $ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension independent bounds on the degree of approximation in terms of $n$, where also the constants involved are all dependent at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks as well as the important problem of function extension to higher dimensional spaces.
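To make the setup concrete, the Python sketch below approximates one such integral by a $G$-network on a toy example: $\mathbb{X}=\mathbb{Y}=[0,1]$, $\tau$ the Lebesgue measure, and $G(x,y)=\max(x-y,0)$, for which the target is $x^2/2$. The equal weights $a_k = 1/n$ and midpoint centers $y_k$ are assumptions chosen for simplicity; they are not the construction used in the paper and do not reproduce its bounds.

```python
def relu_kernel(x, y):
    """G(x, y) = max(x - y, 0), a rectified linear unit with threshold y."""
    return max(x - y, 0.0)


def target(x):
    """Closed form of the integral of max(x - y, 0) over y in [0, 1], for x in [0, 1]."""
    return 0.5 * x * x


def g_network(x, n, kernel=relu_kernel):
    """Sum of a_k * G(x, y_k) with a_k = 1/n and y_k at the midpoints of n equal cells."""
    return sum(kernel(x, (k + 0.5) / n) for k in range(n)) / n


def max_error(n, grid=201):
    """Largest absolute error of the n-term network on an even grid in [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(g_network(x, n) - target(x)) for x in xs)
```

In this one-dimensional example the error is at most $1/(8n^2)$, so increasing $n$ from 10 to 100 shrinks it by roughly a factor of 100. The paper's contribution concerns how such rates behave as the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ grow, which this sketch does not address.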

翻訳日:2023-12-13 01:59:22 公開日:2023-12-10

# QNNのトレーニングに必要なサンプル量の削減について:トレーニングデータの線形構造に関する制約

On Reducing the Amount of Samples Required for Training of QNNs: Constraints on the Linear Structure of the Training Data ( http://arxiv.org/abs/2309.13711v2 )

ライセンス: Link先を確認

Alexander Mandl, Johanna Barzen, Frank Leymann, Daniel Vietz

(参考訳) 古典的ニューラルネットワークのトレーニングは通常、多数のトレーニングサンプルを必要とする。絡み合ったトレーニングサンプルを使用することで、量子ニューラルネットワーク(QNN)はトレーニングプロセスに必要なトレーニングサンプルの量を著しく削減する可能性がある。しかし、結果のQNNによる誤った予測数を最小化するためには、トレーニングサンプルの構造が一定の要件を満たすことが不可欠である。一方、トレーニングサンプルのセット全体に対して、正確な絡み合いの程度が固定されなければならない。一方、トレーニングサンプルは線形独立かつ非直交でなければならない。しかし、これらの要件を満たさないことがQNNの結果に与える影響は、十分に研究されていない。これを解決するため、QNFL定理の証明を拡張した。 (i)絡み合いの程度の違いに対する定理の一般化を提供する。この一般化は、トレーニングサンプルのセットにおける絡み合いの平均度を用いて、QNNの期待品質を予測できることを示している。さらに私たちは (II) 線形依存型, 直交型である適度に絡み合ったトレーニングサンプルに対するQNNの予測精度の新しい推定値を導入する。私たちの分析結果は 3)QNN訓練を模擬し,訓練後のQNNの質を分析して実験的に検証した。

Training classical neural networks generally requires a large number of training samples. Using entangled training samples, Quantum Neural Networks (QNNs) have the potential to significantly reduce the amount of training samples required in the training process. However, to minimize the number of incorrect predictions made by the resulting QNN, it is essential that the structure of the training samples meets certain requirements. On the one hand, the exact degree of entanglement must be fixed for the whole set of training samples. On the other hand, training samples must be linearly independent and non-orthogonal. However, how failing to meet these requirements affects the resulting QNN is not fully studied. To address this, we extend the proof of the QNFL theorem to (i) provide a generalization of the theorem for varying degrees of entanglement. This generalization shows that the average degree of entanglement in the set of training samples can be used to predict the expected quality of the QNN. Furthermore, we (ii) introduce new estimates for the expected accuracy of QNNs for moderately entangled training samples that are linearly dependent or orthogonal. Our analytical results are (iii) experimentally validated by simulating QNN training and analyzing the quality of the QNN after training.

翻訳日:2023-12-13 01:51:35 公開日:2023-12-10

# ジョセフソンパラメトリック発振器を用いたイジングマシン

A Josephson Parametric Oscillator-Based Ising Machine ( http://arxiv.org/abs/2309.03407v2 )

ライセンス: Link先を確認

Sasan Razmkhah, Mehdi Kamal, Nobuyuki Yoshikawa, Massoud Pedram

(参考訳) イジングマシンはNP完全な組合せ最適化問題を高速に解くための有望な手段として登場し、従来の計算手法の能力を上回っている。アニーリング過程においてハミルトニアンの基底状態を効率的に求めることで、イジングマシンは最適化問題への対処においてCPUを効果的に補完できる。こうしたイジングマシンを実現するには、イジングモデルの原子スピンと相互作用をエミュレートする双安定発振器が不可欠である。本研究では、スケーラブルな超伝導イジングマシンの基本単位として、ジョセフソンパラメトリック発振器(JPO)を用いたタイル構造を提案する。超伝導体ベースの発振器であるJPOの双安定性を利用することで、提案するマシンは7.5GHzの周波数で動作し、CMOSベースのシステムに比べて消費電力を3桁低減できる。さらに、提案するタイル構造はLechner-Hauke-Zoller(LHZ)アーキテクチャと互換性があるため、大規模集積化が可能である。ノイズのある環境でタイルのシミュレーションを行い、その機能を検証した。また、シミュレーション結果をハミルトニアンモデルの解析解と比較することで動作特性を確認した。この検証は、イジングマシンの実装におけるJPOベースのタイルの実現可能性と有効性を示しており、量子コンピューティングにおける効率的でスケーラブルな組合せ最適化への新しい道を開く。

Ising machines have emerged as a promising solution for rapidly solving NP-complete combinatorial optimization problems, surpassing the capabilities of traditional computing methods. By efficiently determining the ground state of the Hamiltonian during the annealing process, Ising machines can effectively complement CPUs in tackling optimization challenges. To realize these Ising machines, a bi-stable oscillator is essential to emulate the atomic spins and interactions of the Ising model. This study introduces a Josephson parametric oscillator (JPO)-based tile structure, serving as a fundamental unit for scalable superconductor-based Ising machines. Leveraging the bi-stable nature of JPOs, which are superconductor-based oscillators, the proposed machine can operate at frequencies of 7.5GHz while consuming significantly less power (by three orders of magnitude) than CMOS-based systems. Furthermore, the compatibility of the proposed tile structure with the Lechner-Hauke-Zoller (LHZ) architecture ensures its viability for large-scale integration. We conducted simulations of the tile in a noisy environment to validate its functionality. We verified its operational characteristics by comparing the results with the analytical solution of its Hamiltonian model. This verification demonstrates the feasibility and effectiveness of the JPO-based tile in implementing Ising machines, opening new avenues for efficient and scalable combinatorial optimization in quantum computing.
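To make "determining the ground state of the Hamiltonian" concrete, the following minimal sketch evaluates a classical Ising energy and finds its minimum by exhaustive search. It is not the JPO hardware or the annealing procedure from the paper: the sign convention $E(s) = -\sum_{i<j} J_{ij} s_i s_j - \sum_i h_i s_i$, the three-spin couplings, and the field values are assumptions made up for this illustration.

```python
from itertools import product

def ising_energy(spins, couplings, fields):
    """E(s) = -sum_(i,j) J_ij s_i s_j - sum_i h_i s_i (sign convention assumed here)."""
    pair_term = sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
    field_term = sum(h * s for h, s in zip(fields, spins))
    return -pair_term - field_term

def brute_force_ground_state(n, couplings, fields):
    """Try all 2**n spin assignments; only feasible for very small n."""
    best = min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings, fields))
    return best, ising_energy(best, couplings, fields)

# Hypothetical 3-spin instance: two ferromagnetic bonds, one weaker
# antiferromagnetic bond, and a small field on spin 0.
couplings = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): -0.5}
fields = [0.1, 0.0, 0.0]

ground_spins, ground_energy = brute_force_ground_state(3, couplings, fields)
print(ground_spins, ground_energy)
```

Exhaustive search scales as $2^n$, which is why physical annealers such as the one described above are of interest for larger instances.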

翻訳日:2023-12-13 01:49:07 公開日:2023-12-10

# パッチバイパッチパラダイムによるGANを用いた無限サイズのテクスチャ生成

Generating Infinite-Size Textures using GANs with Patch-by-Patch Paradigm ( http://arxiv.org/abs/2309.02340v3 )

ライセンス: Link先を確認

Alhasan Abdellatif and Ahmed H. Elsheikh

(参考訳) 本稿では,パッチ・バイ・パッチ・パラダイムに基づくGAN(Generative Adversarial Networks)を用いて,無限サイズのテクスチャ画像を生成する手法を提案する。既存のテクスチャ合成技術は、生成モデルへの単一のフォワードパスを使用して、大規模なテクスチャを生成することに依存している。対照的に、提案手法は単一のテクスチャイメージ上にGANモデルをトレーニングし、局所的に相関し、より大きな画像を形成するためにシームレスに結合できる比較的小さなパッチを生成する。このメソッドはジェネレータのローカルパディングに依存し、生成されたパッチ間の一貫性を保証する。また、空間確率変調を利用して局所的な変動を可能にし、大規模画像のパターンアライメントを改善する。トレーニングされたモデルは、局所的なテクスチャ構造を学び、任意のサイズの画像を生成すると同時に、一貫性と多様性を維持します。実験結果は、GPUメモリの比例的な成長を示す既存のアプローチと比較して、生成した画像サイズに対して一定のGPUスケーラビリティを示す。

In this paper, we introduce a novel approach for generating texture images of infinite sizes using Generative Adversarial Networks (GANs) based on a patch-by-patch paradigm. Existing texture synthesis techniques rely on generating large-scale textures using a single forward pass to the generative model; this approach limits the scalability and flexibility of the images produced. In contrast, the proposed approach trains a GAN model on a single texture image to generate relatively small-size patches that are locally correlated and can be seamlessly concatenated to form a larger image. The method relies on local padding in the generator to ensure consistency between the generated patches. It also utilizes spatial stochastic modulation to allow for local variations and improve patterns alignment in the large-scale image. The trained models learn the local texture structure and are able to generate images of arbitrary sizes, while also maintaining the coherence and diversity. Experimental results demonstrate constant GPU scalability with respect to the generated image size compared to existing approaches that exhibit a proportional growth in GPU memory.

翻訳日:2023-12-13 01:48:16 公開日:2023-12-10

# MedShapeNet - コンピュータビジョンのための3D医療形状の大規模データセット

MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision ( http://arxiv.org/abs/2308.16139v4 )

ライセンス: Link先を確認

Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, Heinrich Mächler, Jan Stefan Kirschke, Ezequiel de la Rosa, Patrick Ferdinand Christ, Hongwei Bran Li, David G. Ellis, Michele R. Aizenberg, Sergios Gatidis, Thomas Küstner, Nadya Shusharina, Nicholas Heller, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Anjany Sekuboyina, Maximilian Löffler, Hans Liebl, Reuben Dorent, Tom Vercauteren, Jonathan Shapey, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Achraf Ben-Hamadou, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Federico Bolelli, Costantino Grana, Luca Lumetti, Hamidreza Salehi, Jun Ma, Yao Zhang, Ramtin Gharleghi, Susann Beier, Arcot Sowmya, Eduardo A. Garza-Villarreal, Thania Balducci, Diego Angeles-Valdez, Roberto Souza, Leticia Rittner, Richard Frayne, Yuanfeng Ji, Vincenzo Ferrari, Soumick Chatterjee, Florian Dubost, Stefanie Schreiber, Hendrik Mattern, Oliver Speck, Daniel Haehn, Christoph John, Andreas Nürnberger, João Pedrosa, Carlos Ferreira, Guilherme Aresta, António Cunha, Aurélio Campilho, Yannick Suter, Jose Garcia, Alain Lalande, Vicky Vandenbossche, Aline Van Oevelen, Kate Duquesne, Hamza Mekhzoum, Jef Vandemeulebroucke, Emmanuel Audenaert, Claudia Krebs, Timo van Leeuwen, Evie Vereecke, Hauke Heidemeyer, Rainer Röhrig, Frank Hölzle, Vahid Badeli, Kathrin Krieger, Matthias Gunzer, Jianxu Chen, Timo van Meegdenburg, Amin Dada, Miriam Balzer, Jana Fragemann, Frederic Jonske, Moritz Rempe, Stanislav Malorodov, Fin H. Bahnsen, Constantin Seibold, Alexander Jaus, Zdravko Marinov, Paul F. Jaeger, Rainer Stiefelhagen, Ana Sofia Santos, Mariana Lindo, André Ferreira, Victor Alves, Michael Kamp, Amr Abourayya, Felix Nensa, Fabian Hörst, Alexander Brehmer, Lukas Heine, Yannik Hanusrichter, Martin Weßling, Marcel Dudda, Lars E. Podleska, Matthias A. Fink, Julius Keyl, Konstantinos Tserpes, Moon-Sung Kim, Shireen Elhabian, Hans Lamecker, Dženan Zukić, Beatriz Paniagua, Christian Wachinger, Martin Urschler, Luc Duong, Jakob Wasserthal, Peter F. Hoyer, Oliver Basu, Thomas Maal, Max J. H. Witjes, Gregor Schiele, Ti-chiun Chang, Seyed-Ahmad Ahmadi, Ping Luo, Bjoern Menze, Mauricio Reyes, Thomas M. Deserno, Christos Davatzikos, Behrus Puladi, Pascal Fua, Alan L. Yuille, Jens Kleesiek, Jan Egger

(参考訳) 深層学習の時代以前は、対象物を記述するのに \textit{shape}(形状)が一般的に用いられていた。今日では、医療画像における最先端(SOTA)アルゴリズムは、ボクセルグリッド、メッシュ、点群、陰的表面モデルが用いられるコンピュータビジョンから大きく乖離している。このことは、主要なビジョン系会議における形状関連の多数の論文や、\textit{ShapeNet}(約51,300モデル)と \textit{Princeton ModelNet}(127,915モデル)の人気の高まりからも見て取れる。医療分野に向けて、我々は解剖学的形状(骨、臓器、血管など)と手術器具の3次元モデルの大規模なコレクションである \textit{MedShapeNet} を提示する。これは、データ駆動型ビジョンアルゴリズムの医療応用への展開を容易にし、SOTAビジョンアルゴリズムを医療問題に適応させるために作成された。特有の特徴として、形状の大部分を実際の患者の画像データから直接モデル化している。現時点で \textit{MedShapeNet} には、アノテーション(ground truth)と対になった10万以上の形状を持つ23のデータセットが含まれている。データはWebインターフェースとPythonのアプリケーションプログラミングインターフェース(API)を介して自由にアクセスでき、判別・再構成・変分のベンチマークや、仮想現実・拡張現実・複合現実および3Dプリンティングにおける様々な応用に利用できる。例として、脳腫瘍の分類、顔面と頭蓋骨の再建、マルチクラスの解剖構造補完、教育、3Dプリンティングの分野でのユースケースを示す。今後、データを拡張し、インターフェースを改善していく。プロジェクトページは \url{https://medshapenet.ikim.nrw/} と \url{https://github.com/Jianningli/medshapenet-feedback} である。

Prior to the deep learning era, \textit{shape} was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of \textit{ShapeNet} (about 51,300 models) and \textit{Princeton ModelNet} (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called \textit{MedShapeNet}, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, \textit{MedShapeNet} includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are \url{https://medshapenet.ikim.nrw/} and \url{https://github.com/Jianningli/medshapenet-feedback}.

翻訳日:2023-12-13 01:47:03 公開日:2023-12-10

# エキシトン-ポーラリトン凝縮:フーリエニューラルオペレーターアプローチ

Exciton-Polariton Condensates: A Fourier Neural Operator Approach ( http://arxiv.org/abs/2309.15593v2 )

ライセンス: Link先を確認

Surya T. Sathujoda, Yuan Wang, Kanishk Gandhi

(参考訳) 過去10年間の半導体製造の進歩は、エキシトン・ポラリトン凝縮によって駆動される全光学デバイスに関する広範な研究を触媒している。トランジスタを含むこれらの装置の予備的検証は、環境条件下においても奨励効果を示す。しかし、大規模な応用には依然として大きな課題が残っており、安定するために長い時間を要する複雑な非線形系をシミュレートするために使用できる堅牢な解法がない。このニーズに対処するため,機械学習に基づくフーリエニューラル演算子の応用を提案し,グロス・ピタエフスキー方程式と余剰エキシトンレート方程式の解を求める。この研究は、ニューラル演算子のエキシトン-ポラリトン凝縮系への最初の直接的応用である。提案手法は,CUDAベースのGPU解法よりも1000倍近い精度で最終状態の解を予測できることを示す。さらに、これは実験データを統合することによって、全光学チップ設計ワークフローの潜在的な道を開く。

Advancements in semiconductor fabrication over the past decade have catalyzed extensive research into all-optical devices driven by exciton-polariton condensates. Preliminary validations of such devices, including transistors, have shown encouraging results even under ambient conditions. However, a significant challenge remains for large-scale application: the lack of a robust solver for simulating complex nonlinear systems that require an extended period of time to stabilize. Addressing this need, we propose the application of a machine-learning-based Fourier Neural Operator approach to find the solution to the Gross-Pitaevskii equations coupled with extra exciton rate equations. This work marks the first direct application of Neural Operators to an exciton-polariton condensate system. Our findings show that the proposed method can predict final-state solutions to a high degree of accuracy almost 1000 times faster than CUDA-based GPU solvers. Moreover, this paves the way for potential all-optical chip design workflows by integrating experimental data.

翻訳日:2023-12-13 01:37:37 公開日:2023-12-10

# kmスケール大気ダウンスケーリングのための残差拡散モデリング

Residual Diffusion Modeling for Km-scale Atmospheric Downscaling ( http://arxiv.org/abs/2309.15214v3 )

ライセンス: Link先を確認

Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, and Mike Pritchard

(参考訳) 気象災害の予測には、より粗いグローバルな入力によって駆動される高価なkmスケールのシミュレーションが必要である。ここでは、25kmのERA5再解析を条件として、台湾上空の2km高解像度気象モデルから、コスト効率の高い確率的ダウンスケーリングモデルを訓練する。気象データのマルチスケール機械学習の課題に対処するため、2段階のアプローチであるCorrector Diffusion(\textit{CorrDiff})を採用し、UNetによる平均の予測を拡散ステップで補正する。流体力学におけるレイノルズ分解と同様に、これにより生成的学習を確率的なスケールに限定できる。 \textit{CorrDiff} は優れたRMSEとCRPSを示し、極端現象に対してもスペクトルと分布を忠実に復元する。コヒーレントな気象現象のケーススタディでは、前線における激しい降雨と急勾配の共存や、台風の眼の壁雲付近の極端な風と降雨帯といった、学習された物理を思わせる適切な多変量関係が示される。グローバル予報のダウンスケーリングでもこれらの利点の多くが維持されており、グローバルからkmスケールまでのエンドツーエンドの機械学習気象予測の可能性を予感させる。

Predictions of weather hazard require expensive km-scale simulations driven by coarser global inputs. Here, a cost-effective stochastic downscaling model is trained from a high-resolution 2-km weather model over Taiwan conditioned on 25-km ERA5 reanalysis. To address the multi-scale machine learning challenges of weather data, we employ a two-step approach Corrector Diffusion (\textit{CorrDiff}), where a UNet prediction of the mean is corrected by a diffusion step. Akin to Reynolds decomposition in fluid dynamics, this isolates generative learning to the stochastic scales. \textit{CorrDiff} exhibits skillful RMSE and CRPS and faithfully recovers spectra and distributions even for extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics: the collocation of intense rainfall and sharp gradients in fronts and extreme winds and rainfall bands near the eyewall of typhoons. Downscaling global forecasts successfully retains many of these benefits, foreshadowing the potential of end-to-end, global-to-km-scales machine learning weather predictions.

翻訳日:2023-12-13 01:37:23 公開日:2023-12-10

# 共同音声と音声の理解

Joint Audio and Speech Understanding ( http://arxiv.org/abs/2309.14405v3 )

ライセンス: Link先を確認

Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

(参考訳) 人間は音声と非音声の両方を含む音声信号に囲まれている。音声および非音声音声イベントの認識と理解は、両者の関係を深く理解すると共に、基本的な認知能力を構成する。概念的に類似した普遍的なオーディオ知覚と高度な推論能力を持つ、ltu-asと呼ばれる機械学習モデルが初めて構築されました。具体的には、Whisperを知覚モジュールとして、LLaMAを推論モジュールとして統合することにより、LTU-ASは音声テキスト、音声パラ言語学、非音声音声イベントを同時に認識し、共同理解することができる。

Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.

翻訳日:2023-12-13 01:36:06 公開日:2023-12-10

# 羅生門重要度分布:不安定かつ単一モデルに基づく可変値のRID化

The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance ( http://arxiv.org/abs/2309.13775v3 )

ライセンス: Link先を確認

Jon Donnelly, Srikar Katta, Cynthia Rudin, Edward P. Browne

(参考訳) 可変重要度を定量化することは、遺伝学、公共政策、医学などの分野における高リスクな質問に答えるために不可欠である。現在の手法は一般に、与えられたデータセットでトレーニングされた与えられたモデルに対する変数の重要度を計算する。しかし、あるデータセットに対して、ターゲットとなる結果について等しく説明できる多くのモデルが存在するかもしれない。さらに、与えられたデータセットの可能なすべての説明を考慮に入れたとしても、これらの洞察は一般化しないかもしれない。本稿では,すべての優れたモデルの集合における変数の重要性を定量化し,データ分布全体で安定な新しい変数重要度フレームワークを提案する。私たちのフレームワークは非常に柔軟で、既存のモデルクラスやグローバル変数重要度メトリクスと統合できます。実験により,提案手法は他の手法が失敗する複雑なシミュレーション環境において,変数重要度ランキングを回復することを示した。さらに,本フレームワークは,基礎となるデータ分布に対する変数の真の重要性を正確に推定する。推定器の整合性および有限サンプル誤差率に関する理論的保証を提供する。最後に、HIV感染者のHIV負荷を予測するためにどの遺伝子が重要であるかを実世界のケーススタディで検証し、これまで研究されていない重要な遺伝子を強調した。コードはhttps://github.com/jdonnelly36/Rashomon_Importance_Distributionで公開されている。

Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.

翻訳日:2023-12-13 01:35:55 公開日:2023-12-10

# 大規模分散モデルトレーニングのための効率的な並列化レイアウト

Efficient Parallelization Layouts for Large-Scale Distributed Model Training ( http://arxiv.org/abs/2311.05610v2 )

ライセンス: Link先を確認

Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo

(参考訳) 大きな言語モデルを効果的に訓練するには、数百のハードウェアアクセラレーターを並列化し、様々な計算とメモリの最適化を実行する必要がある。組み合わせると、これらの戦略の多くは最終訓練効率に関する複雑な相互作用を持つ。この問題に取り組む以前の作業では、フラッシュアテンションやシーケンス並列処理など、最新の最適化セットにアクセスできなかった。本研究では,大規模言語モデルのトレーニング構成に関する包括的アブレーション研究を行う。この大規模な研究を、最も効率的なトレーニングのためのいくつかの重要な推奨事項にまとめます。例えば、マイクロバッチサイズ1を使用することで、最も効率的なトレーニングレイアウトが可能になります。より大きなマイクロバッチサイズは、アクティベーションチェックポイントやモデル並列性の高次化を必要とし、さらに大きなパイプラインバブルにつながる。最も効率的な構成は、Llama 13Bモデルをトレーニングする際のモデルFLOPs利用率70.5%など、様々なモデルサイズで最先端のトレーニング効率を達成できます。

Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a Llama 13B model.
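The link between micro-batch size, pipeline bubbles, and Model FLOPs utilization (MFU) can be sketched with two standard back-of-the-envelope formulas. Both are assumptions brought in for illustration rather than taken from the paper: the bubble fraction $(p-1)/(m+p-1)$ holds for a simple non-interleaved pipeline schedule with $p$ equally sized stages and $m$ micro-batches, and the MFU estimate uses the common approximation of $6N$ FLOPs per token for an $N$-parameter model, which ignores attention FLOPs. The batch sizes and parallelism degrees below are hypothetical.

```python
def micro_batches_per_step(global_batch_size, micro_batch_size, data_parallel_size):
    """Micro-batches each pipeline replica processes in one optimizer step."""
    per_replica = global_batch_size // data_parallel_size
    return per_replica // micro_batch_size

def pipeline_bubble_fraction(pipeline_stages, num_micro_batches):
    """Idle share of a step for a non-interleaved schedule: (p - 1) / (m + p - 1)."""
    return (pipeline_stages - 1) / (num_micro_batches + pipeline_stages - 1)

def model_flops_utilization(num_parameters, tokens_per_second,
                            num_accelerators, peak_flops_per_accelerator):
    """MFU estimate using ~6 * N FLOPs per token (attention FLOPs ignored)."""
    achieved = 6 * num_parameters * tokens_per_second
    return achieved / (num_accelerators * peak_flops_per_accelerator)

# Hypothetical setup: global batch 256 sequences, 8-way data parallel, 4 pipeline stages.
for micro_batch_size in (1, 2, 4):
    m = micro_batches_per_step(256, micro_batch_size, 8)
    print(micro_batch_size, m, round(pipeline_bubble_fraction(4, m), 3))
```

With the global batch held fixed, a larger micro-batch size leaves fewer micro-batches per step, so the bubble fraction grows; this matches the qualitative trend the abstract describes, though the actual measurements in the paper depend on many more factors.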

翻訳日:2023-12-13 01:28:35 公開日:2023-12-10

# 量子力学の仮定としての最大エントロピー原理

Maximum Entropy Principle as Postulate of Quantum Mechanics ( http://arxiv.org/abs/2311.04893v2 )

ライセンス: Link先を確認

Alexei V. Tkachenko

(参考訳) 量子力学(QM)の定式化から1世紀も経っても、波動関数崩壊(WFC)は理論の論争的な側面のままである。環境誘起デコヒーレンス(英語版)は、オープン量子システムにおけるユニタリ進化が、そのコンポーネント内の効果的なwfcにどのようにつながるかを示すことによって、部分的な解決を提供する。しかし、このアプローチ自体がQMの完全自己整合的な再構成につながるわけではない。我々は、WFCとボルンの確率則の両方を除外した修正されたQM仮定を導入する。最大エントロピー原理(英: Maximum Entropy Principle)は、相互に互換性のある観測のための条件付き確率を示す、より弱い仮定である。この定式化の中で、WFCとボルンの規則は共に新しい性質となる。

Even a century after the formulation of Quantum Mechanics (QM), the wave function collapse (WFC) remains a contentious aspect of the theory. Environment-induced decoherence has offered a partial resolution by illustrating how unitary evolution in an open quantum system can lead to effective WFC within its components. However, this approach by itself does not lead to a fully self-consistent reformulation of QM. We introduce a modified set of QM postulates, which exclude both WFC and Born's probability rule. They are replaced with the Maximum Entropy Principle, a weaker postulate that specifies conditional probabilities for mutually compatible observations. Within this formulation, both WFC and Born's rule become emerging properties.

翻訳日:2023-12-13 01:28:01 公開日:2023-12-10

# MAS:2次元拡散を用いた3次元モーション生成のためのマルチビューアンセストラルサンプリング

MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion ( http://arxiv.org/abs/2310.14729v2 )

ライセンス: Link先を確認

Roy Kapon, Guy Tevet, Daniel Cohen-Or and Amit H. Bermano

(参考訳) 本研究では、実環境で撮影された(in-the-wild)映像から得られた動きで学習した2次元拡散モデルを用いる3次元モーション生成手法、Multi-view Ancestral Sampling(MAS)を提案する。そのためMASは、3次元データが乏しく収集も難しいためにこれまで十分に探求されてこなかった、刺激的で多様なモーションの分野に機会を開く。 MASは、同じ3次元モーションの異なるビューを表す複数の2次元モーション系列を同時にデノイズする。個々の生成結果を統一された3次元系列にまとめ、それを元のビューに投影し直すことで、各拡散ステップにおいてすべてのビュー間の一貫性を保証する。プロバスケットボールの動作、ボールを用いた新体操の演技、競馬を映した映像から取得した2次元ポーズデータでMASを実証する。いずれの領域でも3次元モーションキャプチャは困難であるが、MASは多様でリアルな3次元系列を生成する。小さな修正を繰り返し適用して各サンプルを最適化するスコア蒸留(Score Distillation)法とは異なり、本手法は拡散フレームワークのために構築されたサンプリングプロセスを用いる。実証するとおり、MASはドメイン外サンプリングやモード崩壊といった一般的な問題を回避する。 https://guytevet.github.io/mas-page/

We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation, using 2D diffusion models that were trained on motions obtained from in-the-wild videos. As such, MAS opens opportunities to exciting and diverse fields of motion previously under-explored as 3D data is scarce and hard to collect. MAS works by simultaneously denoising multiple 2D motion sequences representing different views of the same 3D motion. It ensures consistency across all views at each diffusion step by combining the individual generations into a unified 3D sequence, and projecting it back to the original views. We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers, rhythmic gymnastic performances featuring a ball apparatus, and horse races. In each of these domains, 3D motion capture is arduous, and yet, MAS generates diverse and realistic 3D sequences. Unlike the Score Distillation approach, which optimizes each sample by repeatedly applying small fixes, our method uses a sampling process that was constructed for the diffusion framework. As we demonstrate, MAS avoids common issues such as out-of-domain sampling and mode-collapse. https://guytevet.github.io/mas-page/

翻訳日:2023-12-13 01:26:11 公開日:2023-12-10

# 人間にとって敵対的・親和的なサンプルがBERTの汎化に与える影響

Effects of Human Adversarial and Affable Samples on BERT Generalization ( http://arxiv.org/abs/2310.08008v4 )

ライセンス: Link先を確認

Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor

(参考訳) BERTベースのモデルはリーダーボードで高い性能を示してきたが、汎化を必要とする実世界の設定では明らかに性能が劣る。限られた量のトレーニングデータは、機械学習において汎化性を達成するうえでの主要な障害と考えられている。本稿では、トレーニングデータの量ではなく質がモデルの汎化性に与える影響を検討する。トレーニングデータの2つの特徴を考える。すなわち、人間にとって敵対的な(h-adversarial)サンプル、つまり一見わずかな違いしかないが正解ラベルが異なるサンプルペアの割合と、人間にとって親和的な(h-affable)トレーニングサンプル、つまりわずかな違いがあり正解ラベルが同じサンプルペアである。トレーニングサンプル数を固定した場合、経験則として、h-adversarialなインスタンスを10～30%含めると、テキスト分類と関係抽出のタスクにおいて精度(precision)、ひいてはF1が最大20ポイント向上することがわかった。この範囲を超えてh-adversarialを増やすと、性能が頭打ちになったり、低下したりすることがある。対照的に、h-affableはモデルの汎化性に寄与しない可能性があり、汎化性能を低下させることさえある。

BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data are considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model's generalizability. We consider two characteristics of training data: the portion of human-adversarial (h-adversarial), i.e., sample pairs with seemingly minor differences but different ground-truth labels, and human-affable (h-affable) training samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model's generalizability and may even degrade generalization performance.
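The two pair types defined in this abstract can be illustrated with a small sketch that scans a labelled dataset for near-duplicate pairs. The abstract does not say how "minor differences" are measured, so the word-level Jaccard similarity, the 0.5 threshold, and the toy sentences and labels below are all assumptions made for this example.

```python
from itertools import combinations

def token_jaccard(text_a, text_b):
    """Word-set overlap in [0, 1]; a stand-in for 'seemingly minor differences'."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def find_similar_pairs(samples, threshold=0.5):
    """Split similar pairs into h-adversarial (labels differ) and h-affable (labels match)."""
    adversarial, affable = [], []
    for (i, (text_i, label_i)), (j, (text_j, label_j)) in combinations(enumerate(samples), 2):
        if token_jaccard(text_i, text_j) < threshold:
            continue  # not similar enough to count as either pair type
        (adversarial if label_i != label_j else affable).append((i, j))
    return adversarial, affable

# Hypothetical relation-extraction style samples: (text, label).
samples = [
    ("the drug inhibits the protein", "inhibit"),
    ("the drug activates the protein", "activate"),
    ("the compound inhibits the protein", "inhibit"),
    ("patients were enrolled in 2019", "none"),
]

h_adversarial, h_affable = find_similar_pairs(samples)
print("h-adversarial pairs:", h_adversarial)
print("h-affable pairs:", h_affable)
```

A check like this could be used to estimate what share of a training set consists of h-adversarial instances before comparing it with the 10-30% rule of thumb quoted above.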

翻訳日:2023-12-13 01:25:21 公開日:2023-12-10

# 動的外観粒子ニューラル放射場

Dynamic Appearance Particle Neural Radiance Field ( http://arxiv.org/abs/2310.07916v2 )

ライセンス: Link先を確認

Ancheng Lin, Jun Li

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は3Dシーンをモデル化する大きな可能性を示している。動的NeRFは、典型的には変形場を用いて、時間変化要素をキャプチャすることでこのモデルを拡張する。既存の動的nerfは、光放射と変形場の両方に同様のオイラー表現を用いる。これは外見と動きを密結合させ、物理的解釈を欠いている。本研究では,動的3次元シーンにおける視覚的要素の運動をモデル化するための粒子ベース表現を導入し,DAP-NeRF(Dynamic Outearance Particle Neural Radiance Field)を提案する。 DAP-NeRFは静的場と動的場の重ね合わせからなる。動的場は、シーン内の小さな動的要素の視覚情報を伝達し、モーションモデルを備えた「外見粒子」の集合として定量化される。粒子の静的場、視覚特徴、運動モデルを含む全ての構成要素は、シーンに関する事前の幾何学的知識なしに単眼ビデオから学習される。粒子モデルのための効率的な計算フレームワークを開発する。また,動きモデリングを評価するための新しいデータセットを構築した。実験結果から, DAP-NeRFは外見だけでなく, 3次元動的シーンにおける身体的に意味のある動きを捉えるのに有効であることがわかった。

Neural Radiance Fields (NeRFs) have shown great potential in modelling 3D scenes. Dynamic NeRFs extend this model by capturing time-varying elements, typically using deformation fields. The existing dynamic NeRFs employ a similar Eulerian representation for both light radiance and deformation fields. This leads to a close coupling of appearance and motion and lacks a physical interpretation. In this work, we propose Dynamic Appearance Particle Neural Radiance Field (DAP-NeRF), which introduces a particle-based representation to model the motions of visual elements in a dynamic 3D scene. DAP-NeRF consists of the superposition of a static field and a dynamic field. The dynamic field is quantised as a collection of \textit{appearance particles}, each of which carries the visual information of a small dynamic element in the scene and is equipped with a motion model. All components, including the static field, the visual features and motion models of the particles, are learned from monocular videos without any prior geometric knowledge of the scene. We develop an efficient computational framework for the particle-based model. We also construct a new dataset to evaluate motion modelling. Experimental results show that DAP-NeRF is an effective technique to capture not only the appearance but also the physically meaningful motions in a 3D dynamic scene.

翻訳日:2023-12-13 01:25:00 公開日:2023-12-10

# DeepSimHO:物理シミュレーションによる手と物体の相互作用の安定な姿勢推定

DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation ( http://arxiv.org/abs/2310.07206v3 )

ライセンス: Link先を確認

Rong Wang, Wei Mao, Hongdong Li

(参考訳) 本稿では、単一の画像観測から、物体と相互作用する手の3次元姿勢を推定する課題に取り組む。手と物体の相互作用をモデル化する際、従来研究は主に近接性の手がかりを利用しており、物体の滑りや落下を防ぐために手が重力に抗して物体を安定に把持しなければならないという動力学的性質を見落としている。これらの研究は推定において動力学的制約を活用できず、結果としてしばしば不安定な結果を生む。一方、物理に基づく推論で不安定な配置を修正することは、接触ダイナミクスの複雑さと、データ駆動型学習フレームワークにおける効果的かつ効率的な物理推論の欠如の両方により、依然として困難である。両方の問題に対処するため、順方向の物理シミュレーションと、ニューラルネットワークによる逆方向の勾配近似を組み合わせた新しいディープラーニングパイプラインであるDeepSimHOを提案する。具体的には、ベースネットワークが推定した初期の手と物体の姿勢を物理シミュレータに渡し、その安定性を評価する。しかし、接触形状が滑らかでないことや貫通のために、既存の微分可能シミュレータは信頼できる状態勾配を提供できない。これを解決するため、シミュレータから安定性評価プロセスを学習するディープネットワークをさらに導入し、その勾配を滑らかに近似することで効果的な逆伝播を可能にする。広範な実験により、提案手法は推定の安定性を顕著に向上させ、テスト時最適化よりも優れた効率を達成することが示された。コードは https://github.com/rongakowang/DeepSimHO で入手できる。

This paper addresses the task of 3D pose estimation for a hand interacting with an object from a single image observation. When modeling hand-object interaction, previous works mainly exploit proximity cues, while overlooking the dynamical nature of the task: the hand must stably grasp the object to counteract gravity and thus prevent the object from slipping or falling. These works fail to leverage dynamical constraints in the estimation and consequently often produce unstable results. Meanwhile, refining unstable configurations with physics-based reasoning remains challenging, owing both to the complexity of contact dynamics and to the lack of effective and efficient physics inference in the data-driven learning framework. To address both issues, we present DeepSimHO: a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Specifically, for an initial hand-object pose estimated by a base network, we forward it to a physics simulator to evaluate its stability. However, due to non-smooth contact geometry and penetration, existing differentiable simulators cannot provide reliable state gradients. To remedy this, we further introduce a deep network to learn the stability evaluation process from the simulator, while smoothly approximating its gradient and thus enabling effective back-propagation. Extensive experiments show that our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization. The code is available at https://github.com/rongakowang/DeepSimHO.

翻訳日:2023-12-13 01:24:14 公開日:2023-12-10

# ニューラル演算子を用いた確率的構造メタマテリアルの機械的特性評価と逆設計

Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators ( http://arxiv.org/abs/2311.13812v2 )

ライセンス: Link先を確認

Hanxun Jin, Enrui Zhang, Boyu Zhang, Sridhar Krishnaswamy, George Em Karniadakis, Horacio D. Espinosa

(参考訳) 機械学習(ML)は、設計した材料を設計するための変革的なツールとして登場し、ラボベースの試行錯誤手法によって達成可能なものを超える特性を提供する。しかし、現在の逆設計戦略における大きな課題は、計算および/または実験的なデータセットへの依存であり、特に非線形機械的挙動を示すマイクロスケールの確率的構造材料の設計において問題となる。本稿では,ディープニューラル演算子(deeponet)を活用した新しいエンド・ツー・エンドの科学mlフレームワークについて紹介する。このアプローチは、特定の非線形機械的挙動に合わせた構造物の逆設計を容易にする。 2光子リソグラフィで印刷したスピノダル微細構造から得られた結果は, 機械応答の予測誤差が5～10%の範囲内にあることを明らかにした。我々の研究は、先進的なマイクロメカニクス実験技術を用いたニューラル演算子を用いることで、データ不足に制約されたシナリオにおいても、所望の特性を持つ複雑なマイクロ構造材料の設計が実現可能であることを強調している。我々の研究は、材料設計の分野において重要な進歩を示し、実験的な洞察から直接得られる非平行な機械的特性を持つ次世代のメタマテリアルの発見と開発における新しい時代を告げる可能性を秘めている。

Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5 - 10%. Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.

翻訳日:2023-12-13 01:15:12 公開日:2023-12-10
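
As a rough illustration of the DeepONet idea named in the abstract above, the sketch below evaluates an untrained branch/trunk pair in plain Python. The layer sizes, the eight-number microstructure descriptor, and all function names are hypothetical; the abstract does not specify the paper's architecture or input encoding.

```python
import math
import random

def init_layers(sizes, rng):
    """Random weights for a fully connected net with the given layer sizes."""
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Forward pass; tanh on every layer except the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

def deeponet(u, y, branch, trunk):
    """G(u)(y): dot product of branch features of u and trunk features of y."""
    return sum(b * t for b, t in zip(mlp(u, branch), mlp(y, trunk)))

rng = random.Random(0)
branch = init_layers([8, 16, 4], rng)   # encodes the input function (assumed: 8 descriptors)
trunk = init_layers([1, 16, 4], rng)    # encodes the query point (assumed: a strain value)
microstructure = [rng.random() for _ in range(8)]
curve = [deeponet(microstructure, [k / 10], branch, trunk) for k in range(11)]
```

The same microstructure encoding is reused for every queried strain, which is what lets an operator network return a whole response curve rather than a single number.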

# Rich and Poor Texture Contrast: AI生成画像検出のためのシンプルで効果的なアプローチ

Rich and Poor Texture Contrast: A Simple yet Effective Approach for AI-generated Image Detection ( http://arxiv.org/abs/2311.12397v2 )

ライセンス: Link先を確認

Nan Zhong, Yiran Xu, Zhenxing Qian, Xinpeng Zhang

(参考訳) 最近の生成モデルは、写真画像の生成において印象的な性能を示している。人間は、そんな信じられないほどリアルなai画像と実際の画像とを区別できない。 AI生成画像は、ユビキタスな偽情報拡散につながる可能性がある。したがって、AI生成画像を特定する検出器を開発するのは最も緊急である。既存の検出器の多くは、目に見えない生成モデルよりも高い性能低下に悩まされている。本稿では,多種多様な生成モデルにより生成された偽画像を識別できる,新しいAI生成画像検出器を提案する。本手法では,画像内のテクスチャ領域とテクスチャ領域間のピクセル間相関コントラストを利用する。豊かなテクスチャ領域の画素は、粗いテクスチャ領域よりも大きな変動を示す。この相違は、豊かなテクスチャ領域のエントロピーが貧しい領域のエントロピーよりも大きいことを反映している。その結果、現実的なリッチテクスチャ領域の合成は、既存の生成モデルよりも難しいことが証明される。この原理に基づき、画像を複数のパッチに分割し、リッチテキストと貧弱テキストのパッチからなる2つのイメージに再構成する。次に,テクスチャ領域とテクスチャ領域の画素間相関差を抽出した。この機能は、さまざまな生成モデルにわたるAI生成画像鑑定に使用される普遍的な指紋として機能する。さらに,既存のベースラインの有効性とアプローチを評価するために,16種類の事前生成モデルを含む総合的なAI生成画像検出ベンチマークを構築した。我々のベンチマークはフォローアップ研究のリーダーボードを提供する。その結果,本手法は最先端のベースラインよりも有意差が認められた。私たちのプロジェクト:https://fdmas.github.io/AIGCDetect/

Recent generative models show impressive performance in generating photographic images. Humans can hardly distinguish such incredibly realistic-looking AI-generated images from real ones. AI-generated images may lead to ubiquitous disinformation dissemination. Therefore, it is of utmost urgency to develop a detector to identify AI-generated images. Most existing detectors suffer from sharp performance drops over unseen generative models. In this paper, we propose a novel AI-generated image detector capable of identifying fake images created by a wide range of generative models. Our approach leverages the inter-pixel correlation contrast between rich and poor texture regions within an image. Pixels in rich texture regions exhibit more significant fluctuations than those in poor texture regions. This discrepancy reflects that the entropy of rich texture regions is larger than that of poor ones. Consequently, synthesizing realistic rich texture regions proves to be more challenging for existing generative models. Based on this principle, we divide an image into multiple patches and reconstruct them into two images, comprising rich-texture and poor-texture patches respectively. Subsequently, we extract the inter-pixel correlation discrepancy feature between rich and poor texture regions. This feature serves as a universal fingerprint used for AI-generated image forensics across different generative models. In addition, we build a comprehensive AI-generated image detection benchmark, which includes 16 kinds of prevalent generative models, to evaluate the effectiveness of existing baselines and our approach. Our benchmark provides a leaderboard for follow-up studies. Extensive experimental results show that our approach outperforms state-of-the-art baselines by a significant margin. Our project: https://fdmas.github.io/AIGCDetect/

翻訳日:2023-12-13 01:14:23 公開日:2023-12-10
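
The patch-splitting step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the texture measure (sum of absolute differences between adjacent pixels), the patch size, and every function name are assumptions made for the example.

```python
def texture_diversity(patch):
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    h, w = len(patch), len(patch[0])
    total = 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                total += abs(patch[i][j] - patch[i][j + 1])
            if i + 1 < h:
                total += abs(patch[i][j] - patch[i + 1][j])
    return total

def split_patches(image, size):
    """Cut a 2D grayscale image (list of rows) into non-overlapping size x size patches."""
    return [[row[left:left + size] for row in image[top:top + size]]
            for top in range(0, len(image) - size + 1, size)
            for left in range(0, len(image[0]) - size + 1, size)]

def rich_and_poor(image, size, k):
    """Return the k richest and the k poorest patches by texture diversity."""
    ranked = sorted(split_patches(image, size), key=texture_diversity)
    return ranked[-k:], ranked[:k]

# Toy image: flat left half, checkerboard right half.
image = [[100 if j < 4 else 255 * ((i + j) % 2) for j in range(8)] for i in range(8)]
rich, poor = rich_and_poor(image, size=4, k=2)
```

In this toy image the two flat patches end up in `poor` and the two checkerboard patches in `rich`; a detector along the lines of the abstract would then compare inter-pixel statistics between the two groups.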

# RBPGAN:ビデオスーパーレゾリューションのためのリカレントバックプロジェクションGAN

RBPGAN: Recurrent Back-Projection GAN for Video Super Resolution ( http://arxiv.org/abs/2311.09178v4 )

ライセンス: Link先を確認

Marwah Sulaiman, Zahraa Shehabeldin, Israa Fahmy, Mohammed Barakat, Mohammed El-Naggar, Dareen Hussein, Moustafa Youssef, Hesham M. Eraqi

(参考訳) 近年,ビデオスーパーレゾリューション (VSR) はコンピュータビジョンの領域において,様々な用途で非常に影響力のある課題となっている。本稿では,空間的詳細を保ちながら時間的コヒーレントな解を生成するために,vsrのためのバックプロジェクション生成逆ネットワーク(rbpgan)を提案する。 RBPGANは2つの最先端モデルを統合して、生成されたビデオの精度を損なうことなく、両方の世界で最高のものを得る。モデルのジェネレータはRDPNシステムにインスパイアされ、識別器はTecoGANにインスパイアされている。また,Ping-Pong損失を利用して時間とともに時間的整合性を高める。我々のコントリビューションは、異なるデータセットを使用して定性的かつ定量的に示すように、時間的に一貫した詳細の観点から、初期の作業より優れているモデルをもたらす。

Recently, video super resolution (VSR) has become a very impactful task in computer vision due to its various applications. In this paper, we propose the Recurrent Back-Projection Generative Adversarial Network (RBPGAN) for VSR, which aims to generate temporally coherent solutions while preserving spatial details. RBPGAN integrates two state-of-the-art models to get the best of both worlds without compromising the accuracy of the produced video. The generator of the model is inspired by the RBPN system, while the discriminator is inspired by TecoGAN. We also utilize the Ping-Pong loss to increase temporal consistency over time. Together, our contributions result in a model that outperforms earlier work in terms of temporally consistent details, as we demonstrate qualitatively and quantitatively using different datasets.

翻訳日:2023-12-13 01:13:17 公開日:2023-12-10
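
The Ping-Pong loss mentioned in the abstract above can be illustrated with a toy recurrent generator. The sketch assumes the TecoGAN-style formulation, in which a sequence is played forward and then backward and the two outputs for the same frame are compared; the `leaky` generator and all names are hypothetical stand-ins, not RBPGAN's networks.

```python
import math

def l2(a, b):
    """Euclidean distance between two flattened frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def run_recurrent(frames, step):
    """Run a recurrent generator; each output depends on the previous output."""
    outputs, prev = [], None
    for frame in frames:
        prev = step(frame, prev)
        outputs.append(prev)
    return outputs

def ping_pong_loss(frames, step):
    """Sum of distances between forward-pass and backward-pass outputs per frame."""
    n = len(frames)
    extended = frames + frames[-2::-1]        # frames 1..n followed by n-1..1
    outputs = run_recurrent(extended, step)
    forward = outputs[:n]
    backward = outputs[n - 1:][::-1]          # realigned so backward[t] pairs with forward[t]
    return sum(l2(forward[t], backward[t]) for t in range(n - 1))

def leaky(frame, prev):
    """Toy generator that blends the current frame with its previous output."""
    if prev is None:
        return list(frame)
    return [0.5 * x + 0.5 * p for x, p in zip(frame, prev)]

frames = [[0.0], [1.0], [2.0]]
loss = ping_pong_loss(frames, leaky)
```

A generator without memory would give a loss of zero here; the loss grows when recurrent state lets errors accumulate over time, which is the drift the penalty is meant to discourage.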

# 自然災害管理のためのAIの活用 : モロッコ地震の教訓

Leveraging AI for Natural Disaster Management : Takeaways From The Moroccan Earthquake ( http://arxiv.org/abs/2311.08999v2 )

ライセンス: Link先を確認

Morocco Solidarity Hackathon (Organizers, Speakers, Mentors and Participant teams)

(参考訳) 2023年、モロッコのアル・ハウズで発生したマグニチュード6.8の地震は、世界的な災害管理戦略に重大な反省を呼び起こし、人工知能(AI)を用いた災害対策、対応、復旧のためのハッカソンを引き起こした。この論文は (i)総合的な文献レビュー (ii)勝利プロジェクトの概観 (iii)オープンソースのリアルタイムデータ、データ不足、学際的コラボレーション障壁といった重要な洞察と課題 (iv)さらなる行動を求めるコミュニティコール。

The devastating 6.8-magnitude earthquake in Al Haouz, Morocco, in 2023 prompted critical reflections on global disaster management strategies and led to a post-disaster hackathon that used artificial intelligence (AI) to improve disaster preparedness, response, and recovery. This paper provides (i) a comprehensive literature review, (ii) an overview of winning projects, (iii) key insights and challenges, namely real-time open-source data, data scarcity, and interdisciplinary collaboration barriers, and (iv) a community call for further action.

翻訳日:2023-12-13 01:13:01 公開日:2023-12-10

# KEEC: 等変幾何学の制御に埋め込まれる

KEEC: Embed to Control on An Equivariant Geometry ( http://arxiv.org/abs/2312.01544v2 )

ライセンス: Link先を確認

Xiaoyuan Cheng, Yiming Yang, Wei Jiang, Yukun Hu

(参考訳) 本稿では, カオス系や非線形系などの未知および複素力学における表現学習の最適制御を, 事前の領域知識に頼らずに実現する方法について検討する。中心となる考え方は、力学系によって定義される多様体に微分同型である同変幾何学を確立し、非自明なタスクであるこの幾何学の中で最適な制御を行うことである。この課題に対処するために、モデル学習と制御のためにKoopman Embed to Equivariant Control (KEEC)を提案する。リー理論に着想を得たKEECは、多様体上で定義された非線形力学系を学び、軌跡をリー群に埋め込むことから始める。その後、KEECは同変幾何学の強化学習における同変値関数方程式を定式化し、元の多様体上の値関数として不変性を保証する。等価値関数に対する解析的形式的最適作用を導出することにより、keecは理論上、同変幾何上の微分情報を利用して最適同変値関数の二次収束を達成する。 KEECの有効性は、ロレンツ63のようなカオス的なシステムを含む挑戦的な力学系で実証されている。特に,測度と差分情報を保存しながら幾何のコンパクト性と完全性を維持する等尺関数は,これらの特徴を欠く損失関数より一貫して優れていた。

This paper investigates how representation learning can enable optimal control in unknown and complex dynamics, such as chaotic and non-linear systems, without relying on prior domain knowledge of the dynamics. The core idea is to establish an equivariant geometry that is diffeomorphic to the manifold defined by a dynamical system and to perform optimal control within this corresponding geometry, which is a non-trivial task. To address this challenge, Koopman Embed to Equivariant Control (KEEC) is proposed for model learning and control. Inspired by Lie theory, KEEC begins by learning a non-linear dynamical system defined on a manifold and embedding trajectories into a Lie group. Subsequently, KEEC formulates an equivariant value function equation in reinforcement learning on the equivariant geometry, ensuring an invariant effect as the value function on the original manifold. By deriving analytical-form optimal actions on the equivariant value function, KEEC theoretically achieves quadratic convergence for the optimal equivariant value function by leveraging the differential information on the equivariant geometry. The effectiveness of KEEC is demonstrated in challenging dynamical systems, including chaotic ones like Lorenz-63. Notably, our results show that isometric functions, which maintain the compactness and completeness of geometry while preserving metric and differential information, consistently outperform loss functions lacking these characteristics.

翻訳日:2023-12-13 01:04:54 公開日:2023-12-10

# 進化的アルゴリズムによるポインタネットワークの学習

Pointer Networks Trained Better via Evolutionary Algorithms ( http://arxiv.org/abs/2312.01150v3 )

ライセンス: Link先を確認

Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Ke Tang, Peng Yang

(参考訳) Pointer Network (PtrNet) は、組合せ最適化問題(COP)を解決するためのニューラルネットワークである。 PtrNetsは複雑なCOPsインスタンスに対してリアルタイムフィードフォワード推論を提供するが、結果の品質は満足できない傾向にある。一つの考えられる理由は、このような問題は勾配降下のグローバルな探索能力の欠如に苦しんでおり、教師付き学習と強化学習の両方を含む伝統的なptrnetトレーニング手法で頻繁に使われている。 PtrNetの性能向上のために,PtrNetと進化的アルゴリズム(EA)の訓練の利点を深く研究した。トラベリングセールスマン問題(TSP)に基づく広範な実証研究が実施されている。その結果、EAでトレーニングされたPtrNetは、様々な問題スケールで8つの最先端手法よりもずっと優れた推論結果が得られることが示された。勾配降下に基づくPtrNetトレーニング手法と比較して、EAは同じ計算時間でソリューションの品質を最大30.21 %向上させる。この利点を活かして,同じ次元でptrnetをトレーニングすることにより,1000次元tspの解法を初めて報告することが可能であり,高次元copsの解法においてptrnetの性能を向上させるためには,トレーニングインスタンスのスケールアップが必要であることを強く示唆する。

Pointer Network (PtrNet) is a specific neural network for solving Combinatorial Optimization Problems (COPs). While PtrNets offer real-time feed-forward inference for complex COP instances, the quality of their results tends to be less satisfactory. One possible reason is the lack of global search ability in gradient descent, which is frequently employed in traditional PtrNet training methods, including both supervised learning and reinforcement learning. To improve the performance of PtrNet, this paper delves deeply into the advantages of training PtrNet with Evolutionary Algorithms (EAs), which have been widely acknowledged for not easily getting trapped in local optima. Extensive empirical studies based on the Travelling Salesman Problem (TSP) have been conducted. Results demonstrate that PtrNet trained with EA consistently achieves much better inference results than eight state-of-the-art methods on various problem scales. Compared with gradient-descent-based PtrNet training methods, EA achieves up to 30.21% improvement in solution quality with the same computational time. With this advantage, this paper is able to report, for the first time, the results of solving 1000-dimensional TSPs by training a PtrNet on the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on higher-dimensional COPs.

翻訳日:2023-12-13 01:04:29 公開日:2023-12-10
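
To make the gradient-free training idea in the abstract above concrete, here is a small sketch of an elitist (mu + lambda) evolution strategy tuning the parameters of a tour-building policy on a toy TSP. The two-weight scoring policy is a hypothetical stand-in for a pointer network, and the population sizes and mutation scale are arbitrary assumptions.

```python
import math
import random

def tour_length(cities, order):
    """Length of the closed tour that visits cities in the given order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def build_tour(cities, w):
    """Stand-in policy: repeatedly pick the unvisited city with the lowest weighted score."""
    order, left = [0], set(range(1, len(cities)))
    while left:
        cur = cities[order[-1]]
        nxt = min(left, key=lambda j: w[0] * math.dist(cur, cities[j])
                                      + w[1] * math.dist(cities[0], cities[j]))
        order.append(nxt)
        left.remove(nxt)
    return order

def evolve(cities, generations=30, mu=4, lam=12, sigma=0.3, seed=0):
    """Elitist (mu + lambda) evolution strategy over the policy weights."""
    rng = random.Random(seed)
    fitness = lambda w: tour_length(cities, build_tour(cities, w))
    population = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(mu)]
    for _ in range(generations):
        children = [[g + rng.gauss(0, sigma) for g in rng.choice(population)]
                    for _ in range(lam)]
        population = sorted(population + children, key=fitness)[:mu]
    return population[0], fitness(population[0])

rng = random.Random(1)
cities = [(rng.random(), rng.random()) for _ in range(12)]
best_w, best_len = evolve(cities)
```

The fitness here is the tour length itself, which is not differentiable with respect to the weights; the search only needs to evaluate it, which is the property the abstract relies on.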

# 暗闇でも見えるように潜在拡散モデルを手なずける

Taming Latent Diffusion Models to See in the Dark ( http://arxiv.org/abs/2312.01027v2 )

ライセンス: Link先を確認

Qiang Wen, Yazhou Xing and Qifeng Chen

(参考訳) 低照度RAW画像をよく露出したクリーンなsRGB画像に拡張することは、計算写真において重要な課題である。大規模なペアリングデータの制限のため、従来の手法では極低照度領域の細部や真の色を復元することが困難であった。一方, 生成拡散モデルの最近の進歩は, 低照度画像強調(LLIE)タスクの恩恵を受けるために, 大規模オープンドメインデータセット上で訓練された拡散モデルから生成先行を探索するための有望な生成能力を示している。そこで本研究では, LDM-SIDと呼ばれる拡散モデルに基づくLLIE法を提案する。 LDM-SIDは,提案するテーピングモジュールの集合を凍結した事前学習拡散モデルに挿入し,生成過程を制御することを目的としている。具体的には、低照度情報によって供給されるテーミングモジュールは、拡散モデルにおける中間的特徴を変調するために、一対のアフィン変換パラメータを出力する。さらに,拡散モデルの異なる部分にわたる専用生成前兆の観測に基づいて,入力生画像に2次元離散ウェーブレット変換を適用し,llieタスクを低周波コンテンツ生成と高周波細部維持という2つの必須部分に分割することを提案する。これにより、構造生成と詳細な拡張を最適化するために拡散モデルを巧みに調整することができる。提案手法は, 定量的評価において最先端の性能を得るだけでなく, 視覚的比較において有意な優位性を示す。これらの結果から,LLIEタスクに先立って,事前学習した拡散モデルを利用した生成モデルの有効性が示唆された。プロジェクトページはhttps://csqiangwen.github.io/projects/ldm-sid/にある。

Enhancing a low-light noisy RAW image into a well-exposed and clean sRGB image is a significant challenge in computational photography. Due to the limited availability of large-scale paired data, prior approaches have difficulty recovering fine details and true colors in extremely low-light regions. Meanwhile, recent advances in generative diffusion models have shown promising generative capabilities, which inspires this work to explore generative priors from a diffusion model trained on a large-scale open-domain dataset to benefit the low-light image enhancement (LLIE) task. Based on this intention, we propose a novel diffusion-model-based LLIE method, dubbed LDM-SID. LDM-SID inserts a set of proposed taming modules into a frozen pre-trained diffusion model to steer its generating process. Specifically, the taming module, fed with low-light information, outputs a pair of affine transformation parameters to modulate the intermediate features in the diffusion model. Additionally, based on the observation of dedicated generative priors across different portions of the diffusion model, we propose to apply 2D discrete wavelet transforms to the input RAW image, dividing the LLIE task into two essential parts: low-frequency content generation and high-frequency detail maintenance. This enables us to skillfully tame the diffusion model for optimized structural generation and detail enhancement. Extensive experiments demonstrate that the proposed method not only achieves state-of-the-art performance in quantitative evaluations but also shows significant superiority in visual comparisons. These findings highlight the effectiveness of leveraging a pre-trained diffusion model as a generative prior for the LLIE task. The project page is available at https://csqiangwen.github.io/projects/ldm-sid/

翻訳日:2023-12-13 01:04:10 公開日:2023-12-10
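
Two ingredients named in the abstract above, the 2D discrete wavelet transform and affine feature modulation, can be sketched in a few lines. The example assumes a one-level Haar wavelet and a plain `scale * f + shift` modulation; the abstract does not say which wavelet or which exact modulation form LDM-SID uses, so both are illustrative choices with hypothetical names.

```python
def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an even-sized image.

    Returns (low, details): the low-frequency band and three high-frequency bands.
    """
    low, d1, d2, d3 = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)   # scaled local sum (coarse content)
            rows[1].append((a - b + c - d) / 2)   # left-right differences
            rows[2].append((a + b - c - d) / 2)   # top-bottom differences
            rows[3].append((a - b - c + d) / 2)   # diagonal differences
        for band, row in zip((low, d1, d2, d3), rows):
            band.append(row)
    return low, (d1, d2, d3)

def modulate(features, scale, shift):
    """Affine modulation of a feature vector: scale * f + shift, element-wise."""
    return [s * f + t for f, s, t in zip(features, scale, shift)]

low, details = haar_dwt2([[1, 2], [3, 4]])
modulated = modulate([1.0, 2.0], scale=[2.0, 0.5], shift=[0.0, 1.0])
```

The low band carries the coarse structure and the three detail bands carry edges, which matches the split into low-frequency content generation and high-frequency detail maintenance described in the abstract.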

# 健康のための機械学習シンポジウム2023 -- findings track

Machine Learning for Health symposium 2023 -- Findings track ( http://arxiv.org/abs/2312.00655v2 )

ライセンス: Link先を確認

Stefan Hegselmann, Antonio Parziale, Divya Shanmugam, Shengpu Tang, Mercy Nyamewaa Asiedu, Serina Chang, Thomas Hartvigsen, Harvineet Singh

(参考訳) 2023年12月10日にルイジアナ州ニューオーリンズで開催された第3回機械学習・フォー・ヘルスシンポジウム(ML4H 2023)で発表されたFindingsの論文集。 ML4H 2023は、医療、バイオメディシン、公衆衛生など、様々な健康関連分野における問題に関する高品質な申請を招待した。提出トラックはアーカイバル・プロシージャー・トラックと非アーキバル・アック・トラックの2つが提供された。研究対象は、高度な技術的洗練と健康への影響の高い成熟した作業であった。調査結果のトラックは、洞察に富んだ議論を呼び起こしたり、コミュニティにとって貴重なリソースになったり、新しいコラボレーションを可能にする新しいアイデアを探した。手続トラックへの提出は受理されなかったとしても、自動的に結果トラックとして検討された。 ml4hシンポジウムに提出された全ての原稿は、二重盲検のピアレビュープロセスが行われた。

A collection of the accepted Findings papers presented at the 3rd Machine Learning for Health symposium (ML4H 2023), held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines, including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track and the non-archival Findings track. The Proceedings track targeted mature work with strong technical sophistication and a high impact on health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All manuscripts submitted to the ML4H Symposium underwent a double-blind peer-review process.

翻訳日:2023-12-13 01:03:39 公開日:2023-12-10

# WebCrow: フランス語クロスワードソルバー

The WebCrow French Crossword Solver ( http://arxiv.org/abs/2311.15626v2 )

ライセンス: Link先を確認

Giovanni Angelini, Marco Ernandes, Tommaso Iaquinta, Caroline Stehlé, Fanny Simões, Kamyar Zeinalipour, Andrea Zugarini, Marco Gori

(参考訳) クロスワードパズル(crossword puzzles)は、世界中の異なる言語でプレイされる最も人気のあるワードゲームの一つであり、リドルスタイルは国によって大きく異なる。自動クロスワード解決は困難であり、典型的なソルバは、以前に解決したクロスワードの大規模なデータベースに依存している。本研究では,自動クロスワードソルバであるwebcrow 2.0をフランス語に拡張し,フランス語でクロスワードを解くための最初のプログラムとした。ヒントと回答のクロスワードデータの大規模なリポジトリがないことに対処するため、WebCrow 2.0は、専門家と呼ばれる複数のモジュールを利用して、Web、知識グラフ、言語規則などの異種リソースから候補回答を取得する。 webcrowのパフォーマンスを2つの異なる課題で人間と比較した。過去のクロスワードが限られていたにもかかわらず、フランスのWebCrowは競争力があり、スピードと精度で人間より優れており、新しい言語に一般化する能力を示した。

Crossword puzzles are one of the most popular word games, played in different languages all across the world, where riddle style can vary significantly from one country to another. Automated crossword resolution is challenging, and typical solvers rely on large databases of previously solved crosswords. In this work, we extend WebCrow 2.0, an automatic crossword solver, to French, making it the first program for crossword solving in the French language. To cope with the lack of a large repository of clue-answer crossword data, WebCrow 2.0 exploits multiple modules, called experts, that retrieve candidate answers from heterogeneous resources, such as the web, knowledge graphs, and linguistic rules. We compared WebCrow's performance against humans in two different challenges. Despite the limited amount of past crosswords, French WebCrow was competitive, actually outperforming humans in terms of speed and accuracy, thus proving its capabilities to generalize to new languages.

翻訳日:2023-12-13 01:00:54 公開日:2023-12-10

# ShareCMP: 偏光対応RGB-Pセマンティックセグメンテーション

ShareCMP: Polarization-Aware RGB-P Semantic Segmentation ( http://arxiv.org/abs/2312.03430v2 )

ライセンス: Link先を確認

Zhuoyan Liu, Bo Wang, Lizhi Wang, Chenyu Mao, Ye Li

(参考訳) マルチモーダルなセマンティックセグメンテーションは急速に発展しているが、RGB-Polarizationのモダリティはいまだ解明されていない。そこで本研究では,12種類の水中意味クラスを持つUPLight RGB-Pセグメンテーションベンチマークを構築した。本研究では,dual-branchアーキテクチャを持つrgb-pセマンティクスセグメンテーションフレームワークであるsharecmpを設計し,従来のdual-branchモデルと比較してパラメータ数を約26～33%削減した。エンコーダの偏光特性が豊かな偏光モーダル画像を生成するように設計された偏光生成注意(pga)モジュールを包含する。さらに,偏波モーダル情報のためのエンコーダの学習と理解を改善し,pgaモジュールを最適化するために,クラス偏波認識損失(cpaloss)を導入する。合計3つのRGB-Pベンチマークに関する広範な実験により、ShareCMPは、UPLight(92.45(+0.32)%)、ZJU(92.7(+0.1)%)、MCubeS(50.99(+1.51)%)データセットのパラメータが少ないmIoUの最先端性能を達成した。コードはhttps://github.com/LEFTeyex/ShareCMPで入手できる。

Multimodal semantic segmentation is developing rapidly, but the modality of RGB-Polarization remains underexplored. To delve into this problem, we construct a UPLight RGB-P segmentation benchmark with 12 typical underwater semantic classes. In this work, we design the ShareCMP, an RGB-P semantic segmentation framework with a shared dual-branch architecture, which reduces the number of parameters by about 26-33% compared to previous dual-branch models. It encompasses a Polarization Generate Attention (PGA) module designed to generate polarization modal images with richer polarization properties for the encoder. In addition, we introduce the Class Polarization-Aware Loss (CPALoss) to improve the learning and understanding of the encoder for polarization modal information and to optimize the PGA module. With extensive experiments on a total of three RGB-P benchmarks, our ShareCMP achieves state-of-the-art performance in mIoU with fewer parameters on the UPLight (92.45(+0.32)%), ZJU (92.7(+0.1)%), and MCubeS (50.99(+1.51)%) datasets compared to the previous best methods. The code is available at https://github.com/LEFTeyex/ShareCMP.

翻訳日:2023-12-13 00:52:54 公開日:2023-12-10
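
The results in the abstract above are reported in mIoU (mean intersection over union). As a reminder of how that metric is commonly computed, here is a minimal sketch over flattened label lists; the function name is made up for this example, and skipping classes that appear in neither prediction nor ground truth is one common convention rather than something the abstract states.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flattened label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:                      # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.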

# GCFA: プレシェイプ空間におけるジオデシック曲線特徴拡張

GCFA:Geodesic Curve Feature Augmentation in the Pre-Shape Space ( http://arxiv.org/abs/2312.03325v2 )

ライセンス: Link先を確認

Yuexing Han, Guanxin Wan and Bing Wang

(参考訳) 深層学習は様々な領域で顕著な結果をもたらした。しかし、大規模なラベル付きサンプルを必要とするという課題は、いまだにディープラーニングにおいて持続している。このように、ディープラーニングモデルをトレーニングするための重要な戦略として、データ拡張が導入されている。しかし、データ拡張は小さなサンプル環境での情報損失と性能の低下に苦しむ。これらの欠点を克服するため,我々は形状空間理論に基づく特徴拡張法,すなわち,GCFAと呼ばれるジオデシック曲線の特徴増強手法を提案し,まず,ニューラルネットワークモデルを用いて特徴抽出を行う。そして、複数の画像特徴を特徴として事前形状空間に投影する。プレシェイプ空間では、特徴に合うようにジオデシック曲線が構築される。最後に、Geodesic曲線上に生成された多くの特徴は、様々な機械学習モデルをトレーニングするために使用される。 GCFAモジュールは、ほとんどの機械学習メソッドとシームレスに統合できる。また,提案手法は小型サンプルデータセットに対して単純で効果的で非感受性であり,サンプル環境ではgcfa法がデータプリプロセッシングモデルの性能を大幅に向上できることを示す。

Deep learning has yielded remarkable outcomes in various domains. However, its need for large-scale labeled samples remains a challenge. Data augmentation has therefore been introduced as a critical strategy for training deep learning models, but it suffers from information loss and poor performance in small-sample environments. To overcome these drawbacks, we propose a feature augmentation method based on shape space theory, namely Geodesic Curve Feature Augmentation, or GCFA for short. First, we extract features from the image with a neural network model. Then, the multiple image features are projected into a pre-shape space. In the pre-shape space, a geodesic curve is built to fit the features. Finally, the many features generated on the geodesic curve are used to train various machine learning models. The GCFA module can be seamlessly integrated with most machine learning methods. The proposed method is simple, effective, and insensitive to small-sample datasets. Several examples demonstrate that the GCFA method can greatly improve the performance of the data preprocessing model in a small-sample environment.

翻訳日:2023-12-13 00:52:25 公開日:2023-12-10
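
The projection and geodesic steps in the abstract above can be sketched under simplifying assumptions. Here the pre-shape space is taken to be the set of centered, unit-norm vectors (a hypersphere), and the geodesic is the great-circle arc between just two features; the paper fits a curve to several features, which this example does not attempt. All names are hypothetical.

```python
import math

def to_preshape(v):
    """Center a feature vector and scale it to unit norm."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0:
        raise ValueError("constant vectors have no pre-shape")
    return [x / norm for x in centered]

def geodesic(p, q, s):
    """Point at fraction s along the great-circle arc from p to q (both unit norm)."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    if theta < 1e-12:                  # p and q coincide
        return list(p)
    k = math.sin(theta)                # note: k is near zero for antipodal points
    return [(math.sin((1 - s) * theta) * a + math.sin(s * theta) * b) / k
            for a, b in zip(p, q)]

p = to_preshape([1.0, 2.0, 4.0])
q = to_preshape([3.0, 1.0, 0.0])
augmented = [geodesic(p, q, s / 4) for s in range(5)]   # 5 features along the arc
```

Every generated point stays centered and unit-norm, so the new features remain in the same space as the projected originals.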

# dreamvideo: 画像保持とテキストガイダンスを備えた高忠実度画像対ビデオ生成

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance ( http://arxiv.org/abs/2312.03018v2 )

ライセンス: Link先を確認

Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

(参考訳) 参照画像からビデオを生成することを目的とした画像対ビデオ生成が注目されている。既存の方法は、事前訓練されたテキスト誘導画像拡散モデルから画像誘導映像生成モデルへの拡張を試みる。それにもかかわらず、これらの手法は、浅い画像誘導と時間的一貫性の欠如により、低い忠実度または時間の経過とともに点滅する。これらの問題に対処するために,DreamVideo という名前の事前学習ビデオ拡散モデルに基づいてフレーム保持分岐を考案し,高忠実度映像生成手法を提案する。参照画像をセマンティックなレベルで拡散するプロセスに統合する代わりに、DreamVideoはコンボリューション層を通じて参照画像を認識し、ノイズの多いラテントをモデル入力として特徴を結合する。これにより、参照画像の詳細を最大限に保存することができる。さらに、ダブルコンディショナライザフリーのガイダンスを組み込むことで、さまざまなプロンプトテキストを提供することで、異なるアクションの動画に単一のイメージを向けることができる。これは制御可能なビデオ生成に重要な意味を持ち、幅広い応用可能性を持っている。定量的および定性的な結果から,本手法が最先端の手法より優れていることを示すため,公開データセットの総合的な実験を行った。特に忠実度では画像保持能力が強く,UCF101では他の画像対映像モデルと比較してFVDが高い。また、異なるテキストプロンプトを与えることで、正確な制御が可能となる。このモデルのさらなる詳細と包括的な結果はhttps://anonymous0769.github.io/dreamvideo/で示されます。

Image-to-video generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods try to extend pre-trained text-guided image diffusion models to image-guided video generation models. Nevertheless, these methods often result in either low fidelity or flickering over time because of shallow image guidance and poor temporal consistency. To tackle these problems, we propose DreamVideo, a high-fidelity image-to-video generation method that adds a frame retention branch to a pre-trained video diffusion model. Instead of integrating the reference image into the diffusion process at a semantic level, DreamVideo perceives the reference image via convolution layers and concatenates the features with the noisy latents as model input. In this way, the details of the reference image can be preserved to the greatest extent. In addition, by incorporating double-condition classifier-free guidance, a single image can be directed to videos of different actions by providing varying prompt texts. This has significant implications for controllable video generation and holds broad application prospects. We conduct comprehensive experiments on a public dataset; both quantitative and qualitative results indicate that our method outperforms the state-of-the-art method. In terms of fidelity in particular, our model has a powerful image retention ability and achieves a favorable FVD score on UCF101 compared to other image-to-video models. Precise control can also be achieved by giving different text prompts. Further details and comprehensive results of our model are presented at https://anonymous0769.github.io/DreamVideo/.

翻訳日:2023-12-13 00:51:48 公開日:2023-12-10

# HGPROMPT:Few-shot Prompt Learningのための均質グラフと不均質グラフ

HGPROMPT: Bridging Homogeneous and Heterogeneous Graphs for Few-shot Prompt Learning ( http://arxiv.org/abs/2312.01878v2 )

ライセンス: Link先を確認

Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang

(参考訳) グラフニューラルネットワーク(GNN)とヘテロジニアスグラフニューラルネットワーク(HGNN)は、同質で異質なグラフ表現学習において顕著なテクニックであるが、エンドツーエンドの監視フレームワークにおけるパフォーマンスは、タスク固有の監視の可用性に大きく依存している。ラベル付けコストを削減するため、自己教師付きプレテキストタスクの事前学習は一般的なパラダイムとなっているが、事前訓練されたモデルと下流タスクの間には、目的の相違から生じるギャップがしばしばある。ギャップを埋めるために、特に数ショット設定では、事前訓練されたモデルを完全に微調整することなく、迅速な学習が有望な方向として上昇している。グラフ上でのプロンプトベースの学習に関する初期の研究はあったが、主に同質グラフを扱っており、下流のアプリケーションでよく見られる不均一グラフを無視している。本稿では,HGPROMPTを提案する。HGPROMPTは,事前学習タスクと下流タスクだけでなく,二重テンプレート設計による均質かつ異質なグラフを統一する新しい学習促進フレームワークである。さらに,hgpromptのデュアルプロンプトを提案することで,特徴のばらつきだけでなく,タスク間の異種性の違いによって引き起こされるギャップを橋渡しする前に,下流タスクが最も重要視されるよう支援する。最後に,HGPROMPTを3つの公開データセットの広範な実験により徹底的に評価・解析する。

Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction, especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been some early exploration of prompt-based learning on graphs, existing studies primarily deal with homogeneous graphs, ignoring the heterogeneous graphs that are prevalent in downstream applications. In this paper, we propose HGPROMPT, a novel pre-training and prompting framework that unifies not only pre-training and downstream tasks but also homogeneous and heterogeneous graphs via a dual-template design. Moreover, we propose a dual-prompt in HGPROMPT to assist a downstream task in locating the most relevant prior, bridging the gaps caused by not only feature variations but also heterogeneity differences across tasks. Finally, we thoroughly evaluate and analyze HGPROMPT through extensive experiments on three public datasets.

翻訳日:2023-12-13 00:48:40 公開日:2023-12-10

# Robot Synesthesia: 視触覚センシングによるインハンド操作

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing ( http://arxiv.org/abs/2312.01853v2 )

ライセンス: Link先を確認

Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang

(参考訳) 接触の多い操作タスクの実行は触覚と視覚フィードバックの融合を必要とする。しかし、これらの様相の異なる性質は、重大な課題をもたらす。本稿では,視覚と触覚の入力を活用し,手作業のデキスタラブルな操作を可能にするシステムを提案する。具体的には,人間の触覚と視覚の合成にインスパイアされた新しい点雲に基づく触覚表現であるRobot Synesthesiaを提案する。このアプローチは、両方の感覚入力を同時にシームレスに統合し、より豊かな空間情報を提供し、ロボットの動作に関するより良い推論を容易にする。シミュレーション環境で訓練され、実際のロボットにデプロイされたこの方法は、様々な手持ちのオブジェクトの回転タスクに適用できる。視覚と触覚の統合によって強化学習とSim2Realのパフォーマンスが向上する。プロジェクトページはhttps://yingyuan0414.github.io/visuotactile/。

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

翻訳日:2023-12-13 00:48:12 公開日:2023-12-10

# $\nabla$を信頼する: 因果発見のためのグラディエントベースのインターベンションターゲット

Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery ( http://arxiv.org/abs/2211.13715v3 )

ライセンス: Link先を確認

Mateusz Olko, Michał Zając, Aleksandra Nowak, Nino Scherrer, Yashas Annadani, Stefan Bauer, Łukasz Kuciński, Piotr Miłoś

(参考訳) データから因果構造を推論することは、科学における基本的な重要性の課題である。観測データはしばしばシステムの因果構造を一意に識別するには不十分である。介入(実験)を行うことで識別性が向上するが、そのようなサンプルは通常、入手が困難で高価である。したがって、因果発見のための実験的設計アプローチは、最も有益な介入目標を推定することで介入回数を最小化することを目的としている。そこで本研究では,勾配に基づく因果発見フレームワークの勾配推定器を「信頼」し,介入獲得関数のシグナルを提供する,新しい勾配に基づく介入ターゲティング手法gitを提案する。我々は、シミュレーションおよび実世界のデータセットにおいて広範な実験を行い、GITが低データ体制において、競争ベースラインに匹敵する性能を示す。

Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

翻訳日:2023-12-12 23:06:58 公開日:2023-12-10

# 言語モデルのロバスト性および一般化性に及ぼす対人訓練の影響

Impact of Adversarial Training on Robustness and Generalizability of Language Models ( http://arxiv.org/abs/2211.05523v3 )

ライセンス: Link先を確認

Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla

(参考訳) 敵の訓練は敵の攻撃に対する最も効果的な防御として広く認められている。しかし、敵対的に訓練されたモデルにおける堅牢性と一般化の両立にはトレードオフが伴うことも十分に確立されている。この研究の目的は、言語モデルにおける敵対的トレーニングのための異なるアプローチを深く比較することである。具体的には、事前学習データ拡張とトレーニング時間入力摂動と埋め込み空間摂動がトランスフォーマーベース言語モデルの堅牢性と一般化に及ぼす影響について検討する。以上の結果から,データの強化や入力空間の摂動によるトレーニングにより,より頑健性が得られることが示唆された。しかし、埋め込み空間摂動によるトレーニングは一般化を著しく改善する。学習モデルのニューロンの言語的相関解析により、改良された一般化は「より専門的な」ニューロンによるものであることが明らかになった。我々の知識を最大限に活用するために、言語モデルの対角訓練における逆例を生成する様々な方法の深い定性的な分析を行うのは、これが初めてである。

Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches to adversarial training in language models. Specifically, we study the effect of pre-training data augmentation, as well as training-time input perturbations versus embedding-space perturbations, on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input-space perturbation. However, training with embedding-space perturbation significantly improves generalization. A linguistic correlation analysis of the neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.

翻訳日:2023-12-12 23:06:43 公開日:2023-12-10
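
To illustrate what an embedding-space perturbation in the abstract above can look like, the sketch below takes one normalized gradient-ascent step on the embedding fed to a toy logistic classifier. The classifier, the single-step L2 attack, and all names are assumptions made for the example; the abstract does not specify which perturbation methods the paper compares.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(w, e, y):
    """Binary cross-entropy of a logistic classifier on embedding e, label y in {0, 1}."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)))
    return -math.log(p) if y == 1 else -math.log(1 - p)

def grad_wrt_embedding(w, e, y):
    """Gradient of the loss with respect to the embedding (not the weights)."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)))
    return [(p - y) * wi for wi in w]

def perturb_embedding(w, e, y, epsilon):
    """One gradient-ascent step of L2 norm epsilon in embedding space."""
    g = grad_wrt_embedding(w, e, y)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0:
        return list(e)
    return [ei + epsilon * gi / norm for ei, gi in zip(e, g)]

w = [0.8, -0.5, 0.3]        # classifier weights (arbitrary)
e = [1.0, 0.2, -0.4]        # a token embedding (arbitrary)
e_adv = perturb_embedding(w, e, y=1, epsilon=0.1)
```

Adversarial training would then minimize the loss on `e_adv` as well as on `e`; an input-space perturbation would instead edit the tokens themselves before they are embedded.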

# PromptCast: 時系列予測のための新しいPromptベースの学習パラダイム

PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting ( http://arxiv.org/abs/2210.08964v5 )

ライセンス: Link先を確認

Hao Xue and Flora D. Salim

(参考訳) 本稿では,時系列予測の新しい視点を提案する。既存の時系列予測手法では、モデルは入力として数値の列を取り、出力として数値値を生成する。既存のSOTAモデルはトランスフォーマーアーキテクチャに基づいており、複数のエンコーディング機構で変更され、歴史的データのコンテキストとセマンティクスが組み込まれている。事前学習された言語基盤モデルの成功に触発されて、これらのモデルが時系列予測の解決にも適用できるかどうかを疑問視する。そこで我々は,新しい予測パラダイムであるprompt-based time series forecasting (promptcast)を提案する。この新しいタスクでは、数値入力と出力をプロンプトに変換し、予測タスクを文から文へのフレーム化することで、予測目的の言語モデルを直接適用することができる。本研究を支援するために,3つの実世界の予測シナリオを含む大規模データセット(PISA)を提案する。我々は異なるSOTA数値に基づく予測手法と言語生成モデルを評価する。様々な予測設定によるベンチマーク結果は、言語生成モデルで提案するプロンプトキャストが有望な研究方向であることを示している。さらに、従来の数値ベースの予測と比較すると、PromptCastはゼロショット設定下でのより優れた一般化能力を示す。

This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.

翻訳日:2023-12-12 23:06:27 公開日:2023-12-10

# ベイズの最適緩和としてのSAM

SAM as an Optimal Relaxation of Bayes ( http://arxiv.org/abs/2210.01620v3 )

ライセンス: Link先を確認

Thomas Möllenhoff, Mohammad Emtiyaz Khan

(参考訳) シャープネスを意識した最小化(SAM)およびそれに関連する敵対的深層学習法は、一般化を大幅に改善することができるが、その基盤となるメカニズムはまだ完全には理解されていない。そこで我々は,期待負損失を,いわゆるフェンシェル双共役を用いて得られる最適凸下界に置き換えたベイズ目標の緩和としてSAMを定式化する。この接続により、SAMの新しいAdam風の拡張が妥当な不確実性の推定値を自動的に得ることができ、時には精度も向上する。敵対的手法とベイズ的手法をつなぐことで、我々の研究は堅牢性への新しい道を開く。

Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables a new Adam-like extension of SAM to automatically obtain reasonable uncertainty estimates, while sometimes also improving its accuracy. By connecting adversarial and Bayesian methods, our work opens a new path to robustness.
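For readers unfamiliar with SAM, the following is a minimal sketch of the plain SAM update as it is commonly described: take an ascent step of radius `rho` along the normalized gradient, then descend using the gradient evaluated at that perturbed point. It does not implement the Bayesian relaxation or the Adam-like extension from this paper; the toy quadratic and all names are assumptions for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """One plain SAM update on a weight vector."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # zero gradient: nothing to perturb or update
    # Ascent step: move to the approximate worst point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quadratic_loss(w: Vector) -> float:
    """Toy loss with one flat and one sharp direction (illustrative only)."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def quadratic_grad(w: Vector) -> Vector:
    return [w[0], 10.0 * w[1]]
```

Repeatedly calling `sam_step(w, quadratic_grad, lr=0.05)` from `[1.0, 1.0]` drives the toy loss down; near the minimum the iterate keeps a small oscillation whose size scales with `lr * rho`.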

翻訳日:2023-12-12 23:06:09 公開日:2023-12-10

# TCJA-SNN:スパイキングニューラルネットワークのための時間・チャネル共同注意

TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks ( http://arxiv.org/abs/2206.10177v2 )

ライセンス: Link先を確認

Rui-Jie Zhu, Qihang Zhao, Tianjing Zhang, Haoyu Deng, Yule Duan, Malu Zhang, Liang-Jian Deng

(参考訳) スパイキングニューラルネットワーク(SNN)は、生物学的妥当性、エネルギー効率、強力な時空間情報表現能力によって広く関心を集めている。ニューラルネットワークの性能向上における注意機構の重要な役割を考えると、SNNと注意機構の統合は、エネルギー効率と高性能コンピューティングパラダイムを提供する可能性を示している。本稿では,TCJA-SNNと呼ばれるSNNの時間・チャネル共同注意機構について述べる。提案するTCJA-SNNフレームワークは,空間次元と時間次元の両方からスパイクシーケンスの意義を効果的に評価できる。より具体的に言えば、我々の重要な技術的貢献は次の2点である。 1) スパイクストリームを平均行列に圧縮するために, スクイーズ操作を用いる。そして,効率的な1次元畳み込みに基づく2つの局所的注意機構を活用し,時間レベルとチャネルレベルで独立に包括的な特徴抽出を行う。 2) 時間領域とチャネル領域の相互依存性をモデル化するための新しいアプローチとして, クロス畳み込み融合(CCF)層を導入する。このレイヤは2つの次元の独立性を破り、特徴間の相互作用を可能にする。実験の結果、提案されたTCJA-SNNは、Fashion-MNIST、CIFAR10-DVS、N-Caltech 101、DVS128 Gestureなど、標準的な静的およびニューロモルフィックなデータセットで最大15.7%の精度でSOTAを上回った。さらに、変分オートエンコーダを利用して画像生成タスクにTCJA-SNNフレームワークを適用する。我々の知る限り、この研究は、画像分類と生成タスクにSNNアテンション機構が採用された最初の事例である。特に,本手法は両領域でSOTA性能を達成し,この分野において大きな進歩を遂げた。コードはhttps://github.com/ridgerchu/TCJAで入手できる。

Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, the integration of SNNs and attention mechanisms exhibits potential to deliver energy-efficient and high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequences from both spatial and temporal dimensions. More specifically, our essential technical contributions are: 1) We employ the squeeze operation to compress the spike stream into an average matrix. Then, we leverage two local attention mechanisms based on efficient 1D convolutions to facilitate comprehensive feature extraction at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes. This layer breaks the independence of these two dimensions and enables the interaction between features. Experimental results demonstrate that the proposed TCJA-SNN outperforms SOTA by up to 15.7% accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variational autoencoder. To the best of our knowledge, this study is the first instance where the SNN-attention mechanism has been employed for image classification and generation tasks. Notably, our approach has achieved SOTA performance in both domains, establishing a significant advancement in the field. Codes are available at https://github.com/ridgerchu/TCJA.

翻訳日:2023-12-12 23:04:52 公開日:2023-12-10

# 任意の決定しきい値における公正な二値分類のための回帰器の修復

Repairing Regressors for Fair Binary Classification at Any Decision Threshold ( http://arxiv.org/abs/2203.07490v4 )

ライセンス: Link先を確認

Kweku Kwegyir-Aggrey, A. Feder Cooper, Jessica Dai, John Dickerson, Keegan Hines, Suresh Venkatasubramanian

(参考訳) 我々は,教師付き機械学習による回帰器を後処理し,あらゆる判定しきい値において公平な二値分類を最大化する問題を検討する。各グループのスコア分布間の統計的距離を減少させることにより、すべてのしきい値における公平性を一度に向上でき、しかも精度を大幅に低下させずに達成できることを示す。この目的のために,異なる保護群に対する分類の分布の類似度を捉える分布パリティ(Distributional Parity)の形式的尺度を導入する。我々の主な成果は、最適輸送に基づく新しい後処理アルゴリズムを提案することであり、これは分布パリティを証明可能な形で最大化し、それによってあらゆるしきい値において等化オッズ(Equalized Odds)や機会均等(Equal Opportunity)のようなグループ公平性の一般的な概念を達成する。 2つのフェアネスベンチマークで、我々の手法は実験的にうまく機能し、関連研究の類似手法を上回り、かつそれらを一般化することを示す。

We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work.
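To make the idea of shrinking the distance between group score distributions concrete, here is a rough sketch of a generic one-dimensional optimal-transport repair: each score is mapped to its within-group quantile and then to the size-weighted average of all group quantile functions (the 1D Wasserstein barycenter). This is a textbook-style construction offered under that assumption; it is not the algorithm proposed in the paper, and every name below is hypothetical.

```python
import bisect
from typing import Dict, List


def quantile(sorted_scores: List[float], q: float) -> float:
    """Linearly interpolated empirical quantile for q in [0, 1]."""
    pos = q * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    frac = pos - lo
    return sorted_scores[lo] * (1 - frac) + sorted_scores[hi] * frac


def rank(sorted_scores: List[float], s: float) -> float:
    """Position of s inside its own group, scaled to [0, 1] (mid-rank on ties)."""
    n = len(sorted_scores)
    if n == 1:
        return 0.5
    left = bisect.bisect_left(sorted_scores, s)
    right = bisect.bisect_right(sorted_scores, s)
    return ((left + right - 1) / 2) / (n - 1)


def repair_scores(scores_by_group: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map every group's scores onto the barycenter of the group distributions."""
    total = sum(len(v) for v in scores_by_group.values())
    ordered = {g: sorted(v) for g, v in scores_by_group.items()}
    weights = {g: len(v) / total for g, v in scores_by_group.items()}
    repaired = {}
    for g, scores in scores_by_group.items():
        repaired[g] = [
            sum(weights[h] * quantile(ordered[h], rank(ordered[g], s)) for h in ordered)
            for s in scores
        ]
    return repaired
```

After this mapping the groups share (approximately) one score distribution, so any single threshold selects about the same fraction of each group; the ordering of individuals within a group is preserved.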

翻訳日:2023-12-12 23:03:24 公開日:2023-12-10

# 畳み込みニューラルネットワークを用いた食品分類と多クラス線形識別分析

Food Classification with Convolutional Neural Networks and Multi-Class Linear Discernment Analysis ( http://arxiv.org/abs/2012.03170v3 )

ライセンス: Link先を確認

Joshua Ball

(参考訳) 畳み込みニューラルネットワーク(CNN)は、人間の脳に見られるとされる完全結合的な推論能力を表現することに成功している。 CNNの無数の実装は、特に画像分類の領域において、これらの複雑なパターンを学習する能力の強さを示している。しかし、高性能CNNをいわゆる「最先端」レベルに引き上げるには、高い計算コストがかかる。 MobileNetV2のようなモデルの非常に深い層を利用する転移学習を使う場合でも、CNNは膨大な時間とリソースを必要とする。フィッシャーの線形判別を一般化した線形判別分析(LDA)は、画像分類に高性能なシステムを必要とせず、クラス特徴の分離性を高めるために多クラス分類法として実装することができる。同様に、私たちはLDAが優れた性能を発揮する可能性が高いとも考えている。本稿では, 食品分類のための堅牢なCNNの開発プロセスと, マルチクラスLDAの効果的な実装について論じ, (1) 画像分類においてCNNがLDAよりも優れていること, (2) 特に二値分類の場合に、画像分類においてLDAを除外すべきでない理由について述べる。

Convolutional neural networks (CNNs) have been successful in representing the fully-connected inferencing ability perceived to be seen in the human brain: they take full advantage of the hierarchy-style patterns commonly seen in complex data and develop more patterns using simple features. Countless implementations of CNNs have shown how strong their ability is to learn these complex patterns, particularly in the realm of image classification. However, getting a high-performance CNN to a so-called "state of the art" level is computationally costly. Even when using transfer learning, which utilizes the very deep layers from models such as MobileNetV2, CNNs still take a great amount of time and resources. Linear discriminant analysis (LDA), a generalization of Fisher's linear discriminant, can be implemented in a multi-class classification method to increase separability of class features while not needing a high-performance system to do so for image classification. Similarly, we also believe LDA has great promise in performing well. In this paper, we discuss our process of developing a robust CNN for food classification as well as our effective implementation of multi-class LDA and prove (1) that CNN is superior to LDA for image classification and (2) why LDA should not be left out of the race for image classification, particularly for binary cases.

翻訳日:2023-12-12 23:02:39 公開日:2023-12-10

# マルチモーダルインタラクションの定量化とモデル化:情報分解フレームワーク

Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework ( http://arxiv.org/abs/2302.12247v5 )

ライセンス: Link先を確認

Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) 近年のマルチモーダルアプリケーションへの関心の高まりにより、様々なモダリティから情報を表現・統合するためのデータセットや手法が広く選択された。これらの経験的な進歩にもかかわらず、基礎的な研究の疑問が残る: マルチモーダルなタスクを解決するのに必要な相互作用をどのように定量化できるか? その後、これらの相互作用を捉えるのに最も適したマルチモーダルモデルは何ですか? これらの質問に答えるために,入力モダリティと出力タスクを関連付ける冗長性,特異性,相乗効果の程度を定量化する情報理論的手法を提案する。これら3つの測度をマルチモーダル分布(略してPID)のPID統計と呼び、高次元分布にスケールするこれらのPID統計に対する2つの新しい推定値を導入する。 PID推定を検証するために、PIDが知られている合成データセットと、PID推定を人間のアノテーションと比較する大規模マルチモーダルベンチマークの両方で広範な実験を行う。最後に,(1)マルチモーダルデータセット内のインタラクションの定量化,(2)マルチモーダルモデルでキャプチャされたインタラクションの定量化,(3)モデル選択のための原則的アプローチ,(4)病理学,ムード予測,ロボット知覚における3つの実世界のケーススタディにおいて有用性を示す。

The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures as the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimations are compared with human annotations. Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception where our framework helps to recommend strong multimodal models for each application.

翻訳日:2023-12-12 22:55:33 公開日:2023-12-10

# データバイアス下での公平な分類器の比較について

On Comparing Fair Classifiers under Data Bias ( http://arxiv.org/abs/2302.05906v2 )

ライセンス: Link先を確認

Mohit Sharma, Amit Deshpande, Rajiv Ratn Shah

(参考訳) 本稿では,データバイアス,すなわち過小表現(under-representation)とラベルバイアス(Blum & Stangl, 2019)を注入するための理論的モデルを検討する。公正な分類器の精度と公平性に対する様々なデータバイアスの影響を実証的に研究する。合成および実世界のデータセット(例えば、Adult、German Credit、Bank Marketing、COMPAS)の広範な実験を通じて、トレーニングデータ(ただし、テストデータではなく)に様々な量の過小表現とラベルバイアスを注入することにより、標準的なフェアネスツールキットに含まれる前処理、学習時処理、後処理の公正な分類器の公平性と精度を実証的に監査する。私たちの主な観察は次のとおりである。 1. 多くの標準的な公正な分類器の公平性と精度は、訓練データに注入されるバイアスが増加するにつれて著しく低下する。 2. 適切なデータに基づいてトレーニングされた単純なロジスティック回帰モデルは、精度と公平性の両方において、偏りのあるトレーニングデータに基づいてトレーニングされたほとんどの公正な分類器よりもしばしば優れる。 3. 少数の単純なフェアネス技術(例えば、reweighing、exponentiated gradients)は、トレーニングデータに過小表現とラベルバイアスを注入しても、安定した精度と公正性の保証を提供するように見える。実験では、実運用向けの既存のフェアネスダッシュボードにデータバイアスリスクの測定値を統合する方法も示している。

In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases, 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data, and 3. A few, simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
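The two kinds of injected bias can be illustrated with a small sketch. It assumes a simplified reading of the model: favourably labelled examples from the minority group are kept with probability `beta_pos` (under-representation), and the ones that remain have their label flipped with probability `nu` (label bias). The group encoding, the parameter names, and the exact sampling scheme are assumptions for illustration and may differ from the paper's experimental setup.

```python
import random
from typing import List, Tuple

# (group, label): group 1 is the minority group and label 1 the favourable
# outcome. This encoding is hypothetical and chosen only for the sketch.
Example = Tuple[int, int]


def inject_bias(data: List[Example], beta_pos: float, nu: float,
                seed: int = 0) -> List[Example]:
    """Return a biased copy of the training data; the input list is untouched."""
    rng = random.Random(seed)
    biased = []
    for group, label in data:
        if group == 1 and label == 1:
            # Under-representation: keep the example with probability beta_pos.
            if rng.random() >= beta_pos:
                continue
            # Label bias: flip a kept favourable label with probability nu.
            if rng.random() < nu:
                label = 0
        biased.append((group, label))
    return biased
```

Following the abstract, such a function would be applied to the training split only, leaving the test split unbiased so that accuracy and fairness are measured against clean data.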

翻訳日:2023-12-12 22:53:34 公開日:2023-12-10

# 責任あるデータキュレーションの倫理的考察

Ethical Considerations for Responsible Data Curation ( http://arxiv.org/abs/2302.03629v3 )

ライセンス: Link先を確認

Jerone T. A. Andrews, Dora Zhao, William Thong, Apostolos Modas, Orestis Papakyriakopoulos, Alice Xiang

(参考訳) 人間中心のコンピュータビジョン(HCCV)におけるデータキュレーションの実践は、しばしばプライバシーやバイアスの懸念を無視し、データセットの撤回と不公平なモデルにつながる。合意のないWebスクレイピングによって構築されたHCCVデータセットには、包括的な公正性と堅牢性の評価に不可欠なメタデータが欠如している。現在の対策は、事後的であるか、採用に対する説得力のある正当化を欠いているか、あるいは適切な適用のための文脈化を提供できていない。本研究は,プライバシとバイアスの懸念に対処するため,HCCV評価データセットをキュレーションするための,目的,プライバシと同意,多様性をカバーする積極的かつドメイン固有の推奨事項に焦点を当てる。現在のプラクティスやガイドライン、データセットの取り下げ、監査に基づき、事前(ante hoc)の内省的な視点を採用して、考慮事項と推奨事項を導く。

Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustness evaluations. Current remedies are post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations, covering purpose, privacy and consent, and diversity, for curating HCCV evaluation datasets, addressing privacy and bias concerns. We adopt an ante hoc reflective perspective, drawing from current practices, guidelines, dataset withdrawals, and audits, to inform our considerations and recommendations.

翻訳日:2023-12-12 22:52:50 公開日:2023-12-10

# モデルのスケーリングがパラメーター効率のチューニングに与える影響を探る

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning ( http://arxiv.org/abs/2306.02320v2 )

ライセンス: Link先を確認

Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun

(参考訳) パラメータ効率チューニング(PET)手法は、最小限のパラメータのみを訓練することによって、非常に大きな事前学習言語モデル(PLM)を効果的に駆動することができる。異なるPET法は、異なる手動で設計したチューナブルモジュールを利用する。小型PLMでは、PET法には通常顕著な性能差がある。しかし、モデルスケールが大きくなるにつれて、性能の差は狭まる。したがって、モデルスケーリングはPET手法に対する設計の違いの影響を緩和する、と仮定する。この仮説を検証するために,Arbitrary PET(APET)法という,より柔軟なPET法を提案する。 APET法は任意の位置に分布する任意の数のパラメータからなるチューナブルモジュールと互換性がある。そして,これを利用し,11のNLPタスクを3つの代表的PLMで実験する。本研究は,モデルスケーリングが,(1)調整可能なパラメータの位置が性能に与える影響を緩和し,(2)より少ない調整可能なパラメータを最適化することで,フルパラメータの微調整に匹敵する性能を実現することを明らかにする。興味深いことに、チューニング手法は、異なるタスクにおいてランダムな推測性能を超えるために、同程度の数の調整可能なパラメータを最適化することも観察された。本稿では,この現象と上記の2つの知見を、その基礎となるメカニズムを理解するために最適化の観点からまとめて論じる。これらの結論は, モデルスケーリングがPETに与える影響の理解を深め, 異なるスケールのPLMに対して, より効果的かつ効率的なPET手法の設計を支援する。ソースコードは、GitHubリポジトリ https://github.com/yushengsu-thu/PET_Scaling から取得することができる。

Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters. Different PET methods utilize different manually designed tunable modules. In small PLMs, there are usually noticeable performance differences among PET methods. Nevertheless, as the model scale increases, the performance differences become marginal. Hence, we hypothesize that model scaling mitigates the impact of design differences on PET methods. To investigate this hypothesis, we introduce a more flexible PET method called Arbitrary PET (APET) method. The APET method is compatible with a tunable module, which consists of any number of parameters distributed in arbitrary positions. Then, we utilize it and conduct experiments on 11 NLP tasks across 3 representative PLMs. Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters. Intriguingly, we also observe that tuning methods optimize a similar number of tunable parameters to exceed random guess performance on different tasks. We collectively discuss this phenomenon and the two aforementioned findings from an optimization perspective to understand the underlying mechanisms. These conclusions enhance our understanding of the impact of model scaling on PET and assist in designing more effective and efficient PET methods for PLMs of different scales. The source code can be obtained from this GitHub repository: https://github.com/yushengsu-thu/PET_Scaling.

翻訳日:2023-12-12 22:44:24 公開日:2023-12-10

# DaGAN++: トーキングヘッドビデオ生成のための深度対応敵対的生成ネットワーク

DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation ( http://arxiv.org/abs/2305.06225v2 )

ライセンス: Link先を確認

Fa-Ting Hong, Li Shen, and Dan Xu

(参考訳) トーキングヘッド生成の主要な手法は、入力された顔画像からの顔の外観や動きを含む2次元情報に大きく依存する。それでも、画素単位の深度のような高密度な3次元顔形状は、正確な3次元顔構造の構築と、生成のための複雑な背景雑音の抑制に重要な役割を果たしている。しかし、顔の動画に対する密な3Dアノテーションの取得は、非常にコストがかかる。本稿では,まず,学習時にカメラパラメータや3次元形状アノテーションを必要とせず,顔映像から密な3次元顔形状(すなわち深度)を学習する新しい自己教師あり手法を提案する。さらに,幾何学習のためのより信頼性の高い剛体運動画素を知覚するために,画素レベルの不確実性を学習する戦略を提案する。第2に,モーションフィールドを生成するための正確なキーポイントを提供する,効果的な幾何学誘導型顔キーポイント推定モジュールを設計する。最後に,各生成層に適用可能な3D対応のクロスモーダル(すなわち外観と深度)注意機構を開発し,顔の形状を粗から密へと捉える。大規模な実験は3つの挑戦的なベンチマーク(VoxCeleb1、VoxCeleb2、HDTF)で実施される。その結果,提案フレームワークは非常にリアルなトーキングビデオを再現でき、これらのベンチマークで新たな最先端性能を確立することがわかった。コードとトレーニング済みモデルはGitHubプロジェクトページ https://github.com/harlanhong/CVPR2022-DaGAN で公開されている。

Predominant techniques on talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noises for generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this work, firstly, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters and 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Secondly, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performances established on these benchmarks. The codes and trained models are publicly available on the GitHub project page at https://github.com/harlanhong/CVPR2022-DaGAN

翻訳日:2023-12-12 22:41:07 公開日:2023-12-10

# BELT: 自然言語による教師信号を用いた脳波-言語デコーディングのブートストラップとゼロショット感情分類

BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision ( http://arxiv.org/abs/2309.12056v2 )

ライセンス: Link先を確認

Jinzhao Zhou, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

(参考訳) 本稿では,脳から言語への翻訳研究という重要なトピックのための新しいモデルと学習フレームワークであるBELTを提案する。非侵襲的な脳信号から可読性のある自然言語への変換は、応用シナリオを促進し、脳-コンピュータインターフェース(BCI)全体の開発を促進する可能性がある。脳信号デコードや脳から言語への翻訳における重要な問題は、限られた規模と品質のデータセットから意味的に適切かつ識別的な脳波表現を取得することである。提案するBELT手法は,既製の大規模事前学習言語モデル(LM)を用いて脳波表現学習をブートストラップする汎用的で効率的なフレームワークである。意味情報の理解とゼロショットの一般化に関する大規模LMの能力を活かし、BELTは、インターネット規模のデータセットで訓練された大規模LMを使用して、脳波信号の理解を大幅に改善する。特に、BELTモデルは、ディープConformerエンコーダとベクトル量子化エンコーダで構成される。意味的な脳波表現は、自然言語による教師信号を与える対比学習ステップによって獲得される。脳から言語への翻訳とゼロショット感情分類を含む2つの脳デコーディングタスクについて最先端の結果を得た。具体的には、両方のタスクでベースラインモデルをそれぞれ5.45%、10%以上上回り、翻訳とゼロショット感情分類の主要評価指標において、それぞれ42.31%のBLEU-1スコアと67.32%の適合率を達成する。

This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. The translation from noninvasive brain signals into readable natural language has the potential to promote the application scenario as well as the development of brain-computer interfaces (BCI) as a whole. The critical problem in brain signal decoding or brain-to-language translation is the acquisition of semantically appropriate and discriminative EEG representation from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector quantization encoder. Semantic EEG representation is achieved by a contrastive learning step that provides natural language supervision. We achieve state-of-the-art results on two brain decoding tasks: brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on both tasks by 5.45% and over 10% and achieves a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification respectively.

翻訳日:2023-12-12 22:34:38 公開日:2023-12-10

# チャットボットのバイアスと公平性:概要

Bias and Fairness in Chatbots: An Overview ( http://arxiv.org/abs/2309.08836v2 )

ライセンス: Link先を確認

Jintang Xue, Yun-Cheng Wang, Chengwei Wei, Xiaofeng Liu, Jonghye Woo, C.-C. Jay Kuo

(参考訳) チャットボットは半世紀以上研究されてきた。近年,自然言語処理(NLP)技術の急速な発展に伴い,大規模言語モデル(LLM)を用いたチャットボットが注目されている。従来のチャットボットと比較すると、現代のチャットボットはより強力で、現実世界のアプリケーションで使われている。しかし、現代のチャットボット設計にはバイアスと公平性に関する懸念がある。膨大なトレーニングデータ、非常に大きなモデルサイズ、解釈可能性の欠如のため、現代のチャットボットのバイアス緩和と公平性の維持は困難である。そこで本稿では,チャットボットシステムにおけるバイアスと公平性について概観する。チャットボットの歴史とそのカテゴリを最初にレビューする。次に、バイアス源とアプリケーションにおける潜在的な害を分析する。公正で偏りのないチャットボットシステムを設計する際の考慮事項について検討する。最後に今後の研究方針について述べる。

Chatbots have been studied for more than half a century. With the rapid development of natural language processing (NLP) technologies in recent years, chatbots using large language models (LLMs) have received much attention. Compared with traditional ones, modern chatbots are more powerful and have been used in real-world applications. There are, however, bias and fairness concerns in modern chatbot design. Due to the huge amounts of training data, extremely large model sizes, and lack of interpretability, bias mitigation and fairness preservation of modern chatbots are challenging. Thus, a comprehensive overview on bias and fairness in chatbot systems is given in this paper. The history of chatbots and their categories are first reviewed. Then, bias sources and potential harms in applications are analyzed. Considerations in designing fair and unbiased chatbot systems are examined. Finally, future research directions are discussed.

翻訳日:2023-12-12 22:33:46 公開日:2023-12-10

# VerilogEval:Verilogコード生成のための大規模言語モデルの評価

VerilogEval: Evaluating Large Language Models for Verilog Code Generation ( http://arxiv.org/abs/2309.07544v2 )

ライセンス: Link先を確認

Mingjie Liu, Nathaniel Pinckney, Brucek Khailany and Haoxing Ren

(参考訳) 大規模言語モデル (LLMs) の人気が高まり、様々な分野への応用の道が開かれた。本稿では,ハードウェア設計と検証のための Verilog コード生成の文脈で LLM 性能を評価するためのベンチマークフレームワークを提案する。本稿では,VerilogインストラクショナルWebサイトHDLBitsから156個の問題からなる総合評価データセットを提案する。評価セットは、単純な組合せ回路から複雑な有限状態マシンまで、様々なVerilogコード生成タスクからなる。生成した設計の過渡的シミュレーション出力を黄金解と比較することにより、Verilogのコード補完を機能的正当性のために自動テストすることができる。また,LLM生成した合成問題コードペアによるブートストラップにより,教師付き微調整により,事前学習言語モデルのVerilogコード生成能力を向上できることを実証した。

The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.

翻訳日:2023-12-12 22:33:35 公開日:2023-12-10

# 人間のフィードバックからのオフライン学習による言語モデルの調整

Aligning Language Models with Offline Learning from Human Feedback ( http://arxiv.org/abs/2308.12050v2 )

ライセンス: Link先を確認

Jian Hu, Li Tao, June Yang, Chandler Zhou

(参考訳) 人間の好みから学ぶことは、言語モデル(LM)が人間のニーズや社会的価値に効果的に対応するために重要である。従来の研究は、人間のフィードバックを利用して指示に従うことで顕著な進歩を遂げてきた。しかし、これらのアプローチは主にPPO(Proximal Policy Optimization)のようなオンライン学習技術に依存しており、言語モデルに対しては不安定でチューニングが難しいことが示されている。さらに、PPOは複雑な分散システムの実装を必要とし、大規模な分散トレーニングの効率を阻害する。本研究では,環境と対話することなくLMをアライメントする、人間のフィードバックからのオフライン学習フレームワークを提案する。具体的には、フィルタリングアライメント(FA)、報酬重み付き回帰(RWR)、条件付きアライメント(CA)を検討し、言語モデルを人間の好みに合わせる。教師付き微調整に類似した損失関数を用いることで、単純な機械学習システム(MLSys)とはるかに少ない(約9%)計算資源で、PPOよりも安定したモデルトレーニングを実現できる。実験の結果,条件付きアライメントは他のオフラインアライメント手法よりも優れており,PPOに匹敵する。

Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online learning techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline learning from human feedback framework to align LMs without interacting with environments. Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our methods ensure more stable model training than PPO with a simple machine learning system (MLSys) and much fewer (around 9%) computing resources. Experimental results demonstrate that conditional alignment outperforms other offline alignment methods and is comparable to PPO.
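The three offline strategies named above can be sketched as data-preparation steps that feed an ordinary supervised fine-tuning loss. The sketch assumes each sample already carries a scalar reward; the threshold, the temperature `beta`, and the `<reward=...>` control prefix are hypothetical choices for illustration, not details confirmed by the paper.

```python
import math
from typing import List, Tuple

Sample = Tuple[str, str, float]  # (prompt, response, reward)


def filtering_alignment(samples: List[Sample], threshold: float) -> List[Sample]:
    """FA: keep only samples whose reward reaches the threshold."""
    return [s for s in samples if s[2] >= threshold]


def rwr_weights(samples: List[Sample], beta: float = 1.0) -> List[float]:
    """RWR: per-sample loss weights proportional to exp(reward / beta)."""
    rewards = [r for _, _, r in samples]
    top = max(rewards)  # subtract the max for numerical stability
    raw = [math.exp((r - top) / beta) for r in rewards]
    total = sum(raw)
    return [x / total for x in raw]


def conditional_alignment(samples: List[Sample]) -> List[Tuple[str, str]]:
    """CA: expose the reward to the model as a prefix on the prompt."""
    return [(f"<reward={r:.1f}> {p}", resp) for p, resp, r in samples]
```

With conditional alignment, generation at inference time would use a prompt prefixed with a high reward value, so the model is asked for the kind of response it learned to associate with that reward.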

翻訳日:2023-12-12 22:32:50 公開日:2023-12-10

# ターゲットとトラブル: 子ども向けWebサイトにおけるトラッキングと広告

Targeted and Troublesome: Tracking and Advertising on Children's Websites ( http://arxiv.org/abs/2308.04887v2 )

ライセンス: Link先を確認

Zahra Moti, Asuman Senol, Hamid Bostani, Frederik Zuiderveen Borgesius, Veelasha Moonsamy, Arunesh Mathur, Gunes Acar

(参考訳) 現代のウェブでは、トラッカーや広告主は同意なしにユーザーの詳細な行動プロファイルを構築し収益化することが多い。ウェブ追跡機構や広告に関する様々な研究にもかかわらず、子供をターゲットにしたウェブサイトに焦点を当てた厳密な研究は行われていない。そこで本研究では,子ども向けウェブサイトにおけるトラッキングと(ターゲット)広告の計測について述べる。児童向けWebサイトの包括的リストが欠如していることから、私たちはまず、Webページのタイトルと説明文に基づく多言語分類器を構築する。この分類器を200万ページ以上に適用し、2千の児童向けWebサイトのリストを作成する。 5つの観測地点からこれらのサイトをクローリングし、トラッカー、フィンガープリンティングスクリプト、広告の普及度を測定する。当社のクローラは、児童向けウェブサイトに表示された広告を検出し、利用可能な場合は広告開示ページをスクレイピングすることで広告ターゲティングが有効かどうかを判断する。その結果、子ども向けウェブサイトの約90%には1つ以上のトラッカーが組み込まれており、約27%にはターゲット広告が含まれていることがわかった。これは検証可能な保護者の同意を必要とするはずの行為である。次に、広告から抽出した画像とテキストの両方を処理するMLパイプラインを開発することにより、児童向けウェブサイト上の不適切な広告を識別する。このパイプラインでは、任意の検索語に対して意味的類似性クエリを実行でき、デート、体重減少、メンタルヘルスに関連するサービスを宣伝する広告や、セックストイや出会い系チャットサービスの広告を明らかにすることができる。これらの広告のいくつかは、不快で性的に露骨な画像を含んでいる。要約すると、多くの広告主や児童向けウェブサイトの間で、プライバシー規制を遵守せず、広告の安全性に問題のある慣行が広がっている傾向が示唆される。子どもを保護し、より安全なオンライン環境を構築するためには、規制当局と利害関係者がより厳格な措置を採用し、執行する必要がある。

On the modern web, trackers and advertisers frequently construct and monetize users' detailed behavioral profiles without consent. Despite various studies on web tracking mechanisms and advertisements, there has been no rigorous study focusing on websites targeted at children. To address this gap, we present a measurement of tracking and (targeted) advertising on websites directed at children. Motivated by the lack of a comprehensive list of child-directed (i.e., targeted at children) websites, we first build a multilingual classifier based on web page titles and descriptions. Applying this classifier to over two million pages, we compile a list of two thousand child-directed websites. Crawling these sites from five vantage points, we measure the prevalence of trackers, fingerprinting scripts, and advertisements. Our crawler detects ads displayed on child-directed websites and determines if ad targeting is enabled by scraping ad disclosure pages whenever available. Our results show that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements, a practice that should require verifiable parental consent. Next, we identify improper ads on child-directed websites by developing an ML pipeline that processes both images and text extracted from ads. The pipeline allows us to run semantic similarity queries for arbitrary search terms, revealing ads that promote services related to dating, weight loss, and mental health, as well as ads for sex toys and flirting chat services. Some of these ads feature repulsive and sexually explicit imagery. In summary, our findings indicate a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites. To protect children and create a safer online environment, regulators and stakeholders must adopt and enforce more stringent measures.

翻訳日:2023-12-12 22:32:03 公開日:2023-12-10

# クロスプラットフォームヘイトスピーチ検出のための因果性に基づく表現の分離

Causality Guided Disentanglement for Cross-Platform Hate Speech Detection ( http://arxiv.org/abs/2308.02080v3 )

ライセンス: Link先を確認

Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu

(参考訳) ソーシャルメディアプラットフォームは、オープンな言論を広める価値はあるものの、有害なコンテンツを広めるためにしばしば利用される。現在のディープラーニングと自然言語処理モデルは、この有害なコンテンツを検出するために、一般的なヘイトスピーチ検出に適応する能力に影響するドメイン固有の用語に依存している。これは、特定の言語信号や特定のカテゴリーの単語の使用に焦点を絞る傾向があるためである。もうひとつの重要な課題は、プラットフォームにトレーニング用の高品質なアノテートデータがない場合であり、異なる分散シフトに適応可能なクロスプラットフォームモデルの必要性が生じる。本研究では,あるプラットフォームのデータに基づいて学習し,複数のプラットフォームに一般化可能な,クロスプラットフォームのヘイトスピーチ検出モデルを提案する。プラットフォーム間の優れた一般化を実現するために、入力表現を不変かつプラットフォームに依存した機能に分解する方法がある。また,多様な環境にまたがる因果関係の学習は,ヘイトスピーチにおける不変表現の理解に大きく寄与すると考えられる。プラットフォームに依存した特徴(ヘイトターゲットの予測に使用される)とプラットフォームに依存しない特徴(ヘイトの存在の予測に使用される)に入力を分離することにより、分布シフトに抵抗する不変表現を学習する。これらの機能は、未公開のプラットフォームでヘイトスピーチを予測するために使用される。 4つのプラットフォームにまたがる広範な実験では,ヘイトスピーチの一般化検出における既存の最先端手法と比較して,モデルの有効性が向上していることが強調された。

Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content overly rely on domain-specific terms affecting their capabilities to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular linguistic signals or the use of certain categories of words. Another significant challenge arises when platforms lack high-quality annotated data for training, leading to a need for cross-platform models that can adapt to different distribution shifts. Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms. To achieve good generalizability across platforms, one way is to disentangle the input representations into invariant and platform-dependent features. We also argue that learning causal relationships, which remain constant across diverse environments, can significantly aid in understanding invariant representations in hate speech. By disentangling input into platform-dependent features (useful for predicting hate targets) and platform-independent features (used to predict the presence of hate), we learn invariant representations resistant to distribution shifts. These features are then used to predict hate speech across unseen platforms. Our extensive experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.

翻訳日:2023-12-12 22:31:32 公開日:2023-12-10

# 教育における人間とAIの協働ハイブリッドエッセイのための自動境界検出に向けて

Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education ( http://arxiv.org/abs/2307.12267v5 )

ライセンス: Link先を確認

Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Gašević, Guanliang Chen

(参考訳) ChatGPTなどの最近の大規模言語モデル(LLM)は、特定の指示が与えられたときに、人間らしく流暢な応答を生成することができる。技術進歩によってもたらされる利便性を認める一方で、教育者は、学生がLLMを活用して執筆課題を完成させ、それを自分自身の成果として提出するのではないかと懸念している。このような懸念から、多くのAIコンテンツ検出研究が実施されているが、これらの先行研究の多くは、テキストが完全に人間によって書かれたか、完全にAIによって生成されたかのいずれかであると仮定して、AIコンテンツ検出を分類問題としてモデル化した。本研究では,検出対象のテキストが人間と生成的LLMによって共同で書かれる(ハイブリッドテキスト)という,ほとんど研究されていないが現実的な設定でのAIコンテンツ検出について検討した。まず,与えられたハイブリッドテキストから,人間が書いたコンテンツとAIが生成したコンテンツの間の遷移点を特定するタスク(境界検出)として検出課題を定式化した。次に,(1)エンコーダの訓練過程でAI生成コンテンツと人間が書いたコンテンツとを分離し,(2)隣接する2つのプロトタイプ間の距離をすべて計算し,互いの距離が最も遠い2つの隣接プロトタイプの間に境界が存在すると仮定する,2段階のアプローチを提案した。広範な実験により,以下の主な知見が得られた。 (1)提案手法は,異なる実験設定において一貫してベースライン手法を上回った。 (2)エンコーダの訓練過程は,提案手法の性能を大幅に向上させる。 (3)単一境界のハイブリッドエッセイの境界を検出する場合,比較的大きなプロトタイプサイズを採用することで提案手法を強化でき,In-Domain評価で22%,Out-of-Domain評価で18%の改善が得られた。

The recent large language models (LLMs), e.g., ChatGPT, have been able to generate human-like and fluent responses when provided with specific instructions. While admitting the convenience brought by technological advancement, educators also have concerns that students might leverage LLMs to complete their writing assignments and pass them off as their original work. Although many AI content detection studies have been conducted as a result of such concerns, most of these prior studies modeled AI content detection as a classification problem, assuming that a text is either entirely human-written or entirely AI-generated. In this study, we investigated AI content detection in a rarely explored yet realistic setting where the text to be detected is collaboratively written by human and generative LLMs (i.e., hybrid text). We first formalized the detection task as identifying the transition points between human-written content and AI-generated content from a given hybrid text (boundary detection). Then we proposed a two-step approach where we (1) separated AI-generated content from human-written content during the encoder training process; and (2) calculated the distances between every two adjacent prototypes and assumed that the boundaries exist between the two adjacent prototypes that have the furthest distance from each other. Through extensive experiments, we observed the following main findings: (1) the proposed approach consistently outperformed the baseline methods across different experiment settings; (2) the encoder training process can significantly boost the performance of the proposed approach; (3) when detecting boundaries for single-boundary hybrid essays, the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement in the In-Domain evaluation and an 18% improvement in the Out-of-Domain evaluation.
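
The second step of the approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes every sentence already has an embedding from some trained encoder, and it takes a prototype to be the mean embedding of `prototype_size` consecutive sentences, which is an assumption made here for illustration. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def detect_boundaries(embeddings, prototype_size=2, num_boundaries=1):
    """Return indices i where a boundary is predicted between sentence i-1 and i.

    Assumption: a prototype is the mean embedding of `prototype_size`
    consecutive sentences; boundaries are placed where adjacent prototypes
    are furthest apart.
    """
    candidates = []
    for i in range(prototype_size, len(embeddings) - prototype_size + 1):
        left = mean_vector(embeddings[i - prototype_size:i])
        right = mean_vector(embeddings[i:i + prototype_size])
        candidates.append((euclidean(left, right), i))
    candidates.sort(reverse=True)  # largest distance first
    return sorted(i for _, i in candidates[:num_boundaries])


# Toy embeddings: three "human" sentences near (0, 0), three "AI" ones near (5, 5).
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(detect_boundaries(toy))  # [3]
```

With a larger `prototype_size`, each prototype averages over more sentences, which smooths out sentence-level noise at the cost of needing longer texts.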

翻訳日:2023-12-12 22:30:42 公開日:2023-12-10

# 自動コンテンツ分析における誤分類は回帰バイアスを引き起こす。修正できますか? はいできます!

Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can! ( http://arxiv.org/abs/2307.06483v2 )

ライセンス: Link先を確認

Nathan TeBlunthuis, Valerie Hase, Chung-Hong Chan

(参考訳) 自動分類器(AC)は、多くの場合教師付き機械学習(SML)によって構築され、テキストから画像やビデオまで、統計的検出力の高い大規模なデータサンプルを分類することができ、コミュニケーション科学や関連分野において測定手段として広く普及している。この人気にもかかわらず、高精度な分類器でさえ誤りを犯し、下流の分析がその誤りを考慮しない限り、誤分類バイアスや誤解を招く結果を引き起こす。 SML応用の体系的な文献レビューで示すように、コミュニケーション研究者は誤分類バイアスをほとんど無視している。原理的には、既存の統計手法は、人間のアノテータが作成したような「ゴールドスタンダード」検証データを使用して、誤分類バイアスを補正し、一致推定値を得ることができる。我々は,Rパッケージmisclassificationmodelsとして設計・実装した新しい手法を含むこうした手法を導入し,各手法の限界を明らかにするよう設計したモンテカルロシミュレーションによって検証する。このシミュレーションも公開する。結果に基づき,汎用性と効率性を備えた我々の新しい誤り補正手法を推奨する。まとめると、自動分類器は、一般的な精度基準を下回るものや系統的な誤分類を行うものであっても、注意深い研究設計と適切な誤り補正手法を用いれば測定に有用となり得る。

Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses, unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations, which we also release. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
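
To make the idea of correcting with "gold standard" validation data concrete, here is a small sketch. It is not the method from the paper or the `misclassificationmodels` package, which target regression models; it swaps in a simpler textbook correction for an estimated proportion (often called the Rogan-Gladen estimator). The function names and the toy data are hypothetical.

```python
def estimate_error_rates(gold, predicted):
    """Estimate sensitivity and specificity from a validated subsample.

    `gold` holds human labels and `predicted` holds classifier labels (0 or 1).
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def corrected_proportion(observed, sensitivity, specificity):
    """Correct a classifier-based proportion for misclassification."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("classifier is no better than chance; cannot correct")
    estimate = (observed + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))  # keep the estimate a valid proportion


# Validation sample: 10 true positives and 10 true negatives with human labels.
gold = [1] * 10 + [0] * 10
predicted = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
sensitivity, specificity = estimate_error_rates(gold, predicted)

# If 30% of the full corpus is truly positive, this classifier reports 31%.
observed = 0.31
print(round(corrected_proportion(observed, sensitivity, specificity), 3))  # 0.3
```

The same logic, that error rates measured on validated data let you undo their effect on an estimate, is what more general correction methods extend to regression coefficients.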

翻訳日:2023-12-12 22:30:15 公開日:2023-12-10

# ReLoRA:低ランク更新によるハイランクトレーニング

ReLoRA: High-Rank Training Through Low-Rank Updates ( http://arxiv.org/abs/2307.05695v4 )

ライセンス: Link先を確認

Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky

(参考訳) スケーリングが支配的かつ効果的であり、数千億のパラメータを持つ大規模ネットワークが生まれているにもかかわらず、過剰パラメータモデルを訓練する必要性はいまだ十分に理解されておらず、訓練コストは指数関数的に増加している。本稿では,大規模ニューラルネットワークの訓練手法としてパラメータ効率の高い訓練手法を検討する。低ランク更新を利用して高ランクネットワークを訓練するReLoRAという新しい手法を提案する。最大1.3Bパラメータを持つトランスフォーマー言語モデルの訓練にReLoRAを適用し、通常のニューラルネットワーク訓練に匹敵する性能を示す。 ReLoRAはGPU当たり最大5.5GbのRAMを節約し、モデルサイズとハードウェア構成に応じて訓練速度を9～40%改善する。本研究は,大規模事前学習におけるパラメータ効率の高い手法の可能性を示す。

Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially. In this paper, we explore parameter-efficient training techniques as an approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to training transformer language models with up to 1.3B parameters and demonstrate comparable performance to regular neural network training. ReLoRA saves up to 5.5Gb of RAM per GPU and improves training speed by 9-40% depending on the model size and hardware setup. Our findings show the potential of parameter-efficient techniques for large-scale pre-training.
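
The core intuition, that a sequence of low-rank updates can add up to a high-rank change, can be checked numerically. The sketch below is only an illustration under simplifying assumptions: the low-rank factors are random stand-ins rather than trained parameters, and the helper names are hypothetical. It shows that one rank-1 update has rank 1, while three merged rank-1 updates reach rank 3.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matrix_rank(m, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, n_rows), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, n_rows):
            factor = rows[i][col] / rows[rank][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[rank][j]
        rank += 1
        if rank == n_rows:
            break
    return rank


def random_matrix(n_rows, n_cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def merged_update(dim, rank, restarts, rng):
    """Sum of `restarts` independent rank-`rank` updates B @ A."""
    total = [[0.0] * dim for _ in range(dim)]
    for _ in range(restarts):
        b = random_matrix(dim, rank, rng)  # dim x rank factor
        a = random_matrix(rank, dim, rng)  # rank x dim factor
        delta = matmul(b, a)
        total = [[t + d for t, d in zip(tr, dr)] for tr, dr in zip(total, delta)]
    return total


rng = random.Random(0)
print(matrix_rank(merged_update(6, 1, 1, rng)))  # 1
print(matrix_rank(merged_update(6, 1, 3, rng)))  # 3
```

Each restart only ever stores the small factors, which is where the memory saving of low-rank training comes from, while the merged weight change is not limited to the rank of any single update.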

翻訳日:2023-12-12 22:29:51 公開日:2023-12-10

# ロバスト・プルーニングに向けて:言語モデルのための適応的知識保持プルーニング戦略

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models ( http://arxiv.org/abs/2310.13191v2 )

ライセンス: Link先を確認

Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu

(参考訳) プルーニングの目的は最近、言語モデルにおいて精度とスパース性だけでなく頑健性にまで拡張された。それにもかかわらず、既存の手法は、モデルのスパース性を継続的に高めていく際に敵対的攻撃に対する頑健性を高めることに苦労しており、再訓練プロセスも必要とする。人類が大規模言語モデルの時代に入るにつれ、これらの問題はますます顕著になっている。本稿では, 言語モデルの頑健性は, モデルが包含する事前学習済み知識の程度に比例すると提案する。そこで,密な言語モデルの埋め込み空間と特徴空間を忠実に再現するよう設計した訓練後プルーニング戦略を導入し,プルーニング過程でより多くの事前学習知識を保存することを目指す。この設定では、各層の再構成誤差はその層自体から生じるだけでなく、先行する層からの累積誤差も含み、その後に適応的な補正が行われる。他の最先端のベースラインと比較して、我々のアプローチは、SST2、IMDB、AGNewsのデータセット上でBERTを用いた場合に、精度、スパース性、頑健性、およびプルーニングコストのバランスが優れていることを示し、言語モデルにおける頑健なプルーニングに向けた大きな前進となる。

The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models.

翻訳日:2023-12-12 22:22:02 公開日:2023-12-10

# リー代数畳み込みによる概同変性

Almost Equivariance via Lie Algebra Convolutions ( http://arxiv.org/abs/2310.13164v3 )

ライセンス: Link先を確認

Daniel McNeela

(参考訳) 近年,機械学習の研究において,群作用に関するモデルの同変性が重要な話題となっている。既存のニューラルネットワークアーキテクチャに組み込まれた同変性の解析や、同変性を明示的に「組み込んだ」モデルの構築に関する研究は、それ自体が重要な研究領域となっている。しかし、アーキテクチャに特定の群同変性を持たせることは、モデルが遭遇すると想定するデータ変換の種類に強い事前分布を課すことになる。厳密に同変なモデルは対称性を強制するが、実世界のデータは必ずしもそのような厳密な同変性に従わない。そのような場合、厳密な同変性という事前分布は実際には強すぎて、モデルの性能低下を招くことがある。そこで本研究では,密接に関連する話題である概同変性(almost equivariance)について考察する。概同変性の定義を与え、リー群のリー代数に訴えることでモデルに概同変性を符号化する実用的な方法を示す。具体的には、リー代数畳み込みを定義し、指数写像が全射でない非コンパクトリー群に対しても矛盾なく定義されることを含め、リー群畳み込みに対していくつかの利点があることを示す。そこから, 同変性および等長写像の概念と, 概同変性および概等長写像の概念との関係を示す。 2つの存在定理を証明する。1つは多様体の等長写像から有界な距離の範囲内に概等長写像が存在することを示し、もう1つはヒルベルト空間に対してその逆を示す。これらの定理を拡張して、群作用と関数クラスに関する一定の制約のもとで、完全に同変な埋め込み関数から有界な距離の範囲内に概同変な多様体埋め込みが存在することを証明する。最後に、完全同変および概同変の設定のデータセットでベンチマークを行うことにより、このアプローチの妥当性を実証する。

Recently, the equivariance of models with respect to a group action has become an important topic of research in machine learning. Analysis of the built-in equivariance of existing neural network architectures, as well as the study of building models that explicitly "bake in" equivariance, have become significant research areas in their own right. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, real-world data does not always conform to such strict equivariances. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform. Therefore, in this work we study a closely related topic, that of almost equivariance. We provide a definition of almost equivariance and give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group. Specifically, we define Lie algebra convolutions and demonstrate that they offer several benefits over Lie group convolutions, including being well-defined for non-compact Lie groups having non-surjective exponential map. From there, we demonstrate connections between the notions of equivariance and isometry and those of almost equivariance and almost isometry. We prove two existence theorems, one showing the existence of almost isometries within bounded distance of isometries of a manifold, and another showing the converse for Hilbert spaces. We extend these theorems to prove the existence of almost equivariant manifold embeddings within bounded distance of fully equivariant embedding functions, subject to certain constraints on the group action and the function class. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.
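
The abstract does not state its formal definition, so the sketch below adopts one common reading as an assumption: a map f is almost equivariant with tolerance ε if ||f(g·x) − g·f(x)|| ≤ ε for all sampled group elements g and inputs x. It measures this gap for planar rotations only and does not implement Lie algebra convolutions. All names are hypothetical.

```python
import math


def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians (an SO(2) action)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def equivariance_error(f, points, angles):
    """Largest observed ||f(g.x) - g.f(x)|| over the sampled points and rotations."""
    worst = 0.0
    for p in points:
        for a in angles:
            apply_then_rotate = rotate(f(p), a)
            rotate_then_apply = f(rotate(p, a))
            worst = max(worst, math.dist(apply_then_rotate, rotate_then_apply))
    return worst


def strict(p):
    """Uniform scaling commutes with rotation, so it is exactly equivariant."""
    return (2.0 * p[0], 2.0 * p[1])


def perturbed(p, eps=0.05):
    """A small constant bias breaks exact equivariance by a bounded amount."""
    return (2.0 * p[0] + eps, 2.0 * p[1])


points = [(1.0, 0.0), (0.5, -2.0)]
angles = [k * math.pi / 6 for k in range(12)]
print(equivariance_error(strict, points, angles) < 1e-12)    # True
print(equivariance_error(perturbed, points, angles) <= 0.1 + 1e-12)  # True
```

For the perturbed map the gap equals the distance between the bias vector and its rotated copy, so it never exceeds twice the bias magnitude; that bounded gap is what "almost" is meant to capture in this reading.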

翻訳日:2023-12-12 22:21:37 公開日:2023-12-10

# 人間のフィードバックからの多様性

Diversity from Human Feedback ( http://arxiv.org/abs/2310.06648v2 )

ライセンス: Link先を確認

Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian

(参考訳) 多様性はアンサンブル学習、強化学習、組合せ最適化など多くの問題において重要な役割を果たす。多様性の尺度を定義する方法は、長年にわたる問題である。多くの手法は専門的な経験に基づいて適切な行動空間を定義し、多様性の測定値を得るが、多くのシナリオでは難しい。本稿では,人間のフィードバックから行動空間を学習する問題を提案し,それを解決するために多様性(diversity from human feedback,divhf)と呼ばれる一般的な手法を提案する。 DivHFは、人間のフィードバックをクエリすることで、人間の好みと一致した行動記述子を学習する。学習した行動記述子は、あらゆる距離測度と組み合わせて多様性測度を定義することができる。本稿では,品質多様性最適化アルゴリズムmap-elitesと統合し,qdaxスイート上で実験を行い,divhfの有効性を示す。結果は、DivHFが直接データ駆動アプローチよりも人間の要求に合う行動空間を学習し、人間の好みの下でより多様なソリューションをもたらすことを示している。我々の貢献は、問題の定式化、DivHF法の提案、実験による効果の実証である。

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
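
The statement that a behavior descriptor can be combined with any distance measure to define a diversity measure can be sketched as below. This is an illustrative assumption, not DivHF itself: the descriptor here is a hand-written stand-in for the learned one, and mean pairwise distance is just one possible way to aggregate. All names are hypothetical.

```python
import itertools
import math


def diversity(solutions, descriptor, distance):
    """Mean pairwise distance between the behavior descriptors of the solutions."""
    pairs = list(itertools.combinations(solutions, 2))
    if not pairs:
        return 0.0  # fewer than two solutions: no diversity to measure
    total = sum(distance(descriptor(a), descriptor(b)) for a, b in pairs)
    return total / len(pairs)


def toy_descriptor(solution):
    """Stand-in for a learned descriptor: keep only the first two coordinates."""
    return solution[:2]


solutions = [(0.0, 0.0, 9.0), (3.0, 4.0, 7.0)]
print(diversity(solutions, toy_descriptor, math.dist))  # 5.0
```

Swapping `toy_descriptor` for a model trained on human preference queries changes which differences between solutions count as diverse, without touching the rest of the computation.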

翻訳日:2023-12-12 22:21:10 公開日:2023-12-10

# フェデレーション・トランスファー・ラーニングによる基礎モデル:汎用フレームワーク

Grounding Foundation Models through Federated Transfer Learning: A General Framework ( http://arxiv.org/abs/2311.17431v5 )

ライセンス: Link先を確認

Yan Kang, Tao Fan, Hanlin Gu, Lixin Fan, Qiang Yang

(参考訳) 膨大な知識と強力な創発能力を備えたGPT-4のような基礎モデル(FM)は、様々な自然言語処理やコンピュータビジョンタスクにおいて大きな成功を収めている。 FMをドメイン固有のタスクに適応させたり、ドメイン固有の知識で拡張することで、FMの潜在能力を最大限活用することができる。しかし、基盤となるFMは、主に制約のあるコンピューティングリソース、データプライバシ、モデルの不均一性、モデルオーナシップなど、いくつかの課題に直面している。フェデレーション・トランスファー・ラーニング(FTL)は、フェデレーション・ラーニングとトランスファー・ラーニングを組み合わせたもので、これらの課題に対処するための有望なソリューションを提供する。近年、FTL-FMと呼ばれるFTLを利用したFMの接地の必要性が、学術と産業の両方で強く現れている。本研究では,FTL-FM研究の高度化とFTL-FMの産業的応用への影響を背景として,FTL-FMフレームワークの構築,FTL-FMフレームワークに基づく詳細な分類法の構築,最先端のFTL-FM作品の分類,提案した分類法に基づくFTL-FM作品の包括的概要について述べる。また、FTL-FMと従来のFM適応フェーズの対応性を確立し、FM実践者がFTL-FMと研究作業を整合させることができるようにした。さらに、FTL-FMにおいて効率とプライバシーが重要となるため、高度な効率改善とプライバシー保護技術の概要を述べる。最後に,FTL-FMの今後の研究の方向性について述べる。

Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.

翻訳日:2023-12-12 22:08:36 公開日:2023-12-10

# FLORIDA:フェイクっぽいリアルイメージデータセット

FLORIDA: Fake-looking Real Images Dataset ( http://arxiv.org/abs/2311.10931v2 )

ライセンス: Link先を確認

Ali Borji

(参考訳) ディープフェイクの検出におけるAIツールやモデルの有効性を評価するために、広範な研究がなされているが、これらのモデルが人工的に現れる真のイメージを正確に識別できるかどうかについては疑問が残る。本研究では,この問題に対処するための最初のステップとして,偽の外観を示す510の本物画像のデータセットをキュレートし,2つのaiモデルを用いて評価を行った。データセットに適用すると,2つのモデルがサブパー性能を示した。さらに,我々のデータセットは,複雑な視覚刺激を理解するための深層学習モデルの能力を評価する上で有用なツールとなり得る。本研究は,本分野におけるさらなる議論と調査の促進を期待する。私たちのデータセットはhttps://github.com/aliborji/FLORIDAでアクセスできます。

Although extensive research has been carried out to evaluate the effectiveness of AI tools and models in detecting deep fakes, the question remains unanswered regarding whether these models can accurately identify genuine images that appear artificial. In this study, as an initial step towards addressing this issue, we have curated a dataset of 510 genuine images that exhibit a fake appearance and conducted an assessment using two AI models. We show that two models exhibited subpar performance when applied to our dataset. Additionally, our dataset can serve as a valuable tool for assessing the ability of deep learning models to comprehend complex visual stimuli. We anticipate that this research will stimulate further discussions and investigations in this area. Our dataset is accessible at https://github.com/aliborji/FLORIDA.

翻訳日:2023-12-12 22:07:49 公開日:2023-12-10

# グラフニューラルネットワークによるイスラム教に対するヘイトスピーチの特定

Explainable Identification of Hate Speech towards Islam using Graph Neural Networks ( http://arxiv.org/abs/2311.04916v2 )

ライセンス: Link先を確認

Azmine Toushik Wasi

(参考訳) islamophobic languageは、オンラインソーシャルインタラクションプラットフォームにおける一般的な課題である。このような憎しみの特定と排除は、調和と平和の未来への重要な一歩である。本研究では,グラフニューラルネットワークを用いて,イスラム教に対するヘイトスピーチを識別し,説明するための新しいパラダイムを提案する。グラフニューラルネットワークの本質的な能力を利用して、異なるデータポイント間の関係を探索、抽出、使用することにより、我々のモデルは、基礎となる相関関係と因果関係の説明を提供しながら、一貫して優れた性能を達成する。

Islamophobic language is a prevalent challenge on online social interaction platforms. Identifying and eliminating such hatred is a crucial step towards a future of harmony and peace. This study presents a novel paradigm for identifying and explaining hate speech towards Islam using graph neural networks. Utilizing the intrinsic ability of graph neural networks to find, extract, and use relationships across disparate data points, our model consistently achieves outstanding performance while offering explanations for the underlying correlations and causation.

翻訳日:2023-12-12 22:07:21 公開日:2023-12-10

# JADE:大規模言語モデルのための言語ベースの安全評価プラットフォーム

JADE: A Linguistics-based Safety Evaluation Platform for Large Language Models ( http://arxiv.org/abs/2311.00286v3 )

ライセンス: Link先を確認

Mi Zhang and Xudong Pan and Min Yang

(参考訳) 本稿では, シード質問の言語的複雑さを強化し, 広範に使用されているLLMを, オープンソース中国語8種, 商用中国語6種, 商用英語4種に分類し, 同時に一貫的に破壊する言語ファジリングプラットフォームであるJADEを提案する。質問は同時に複数のLSMの有害な生成を誘発し、平均的な安全でない生成比は$70\%$(下表を参照)であるが、依然として自然の質問であり、コアの安全でないセマンティクスは流動的で保存されている。我々は、商用のLLMとオープンソースのLLM向けに生成されたベンチマークデモを、以下のリンクでリリースする。 JADEによって生成されたより多くの質問を評価することに興味がある読者には、ご連絡ください。 JADEはノーム・チョムスキーの変質生成文法の理論に基づいている。シード質問が安全でない意図で与えられると、JADEは、安全ガードレールが壊れるまで、元の質問の構文構造の複雑さを増すために、生成規則と変換規則のシーケンスを起動する。我々の重要な洞察は: 人間の言語の複雑さのため、現在の最高のLLMのほとんどは、完全にカバーできない無制限の例空間を形成する無限の異なる構文構造から、不変の悪をほとんど認識できない。技術的には、生成/変換規則は言語のネイティブな話者によって構築され、一旦開発されていれば、ガードレールが壊れるまで、ある質問のパースツリーを自動成長させ変換するのに使うことができる。さらなる評価結果とデモについては、Webサイトを参照してください。

In this paper, we present JADE, a targeted linguistic fuzzing platform which strengthens the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely-used LLMs categorized in three groups: eight open-sourced Chinese, six commercial Chinese and four commercial English LLMs. JADE generates three safety benchmarks for the three groups of LLMs, which contain unsafe questions that are highly threatening: the questions simultaneously trigger harmful generation of multiple LLMs, with an average unsafe generation ratio of 70% (please see the table below), while remaining natural, fluent questions that preserve the core unsafe semantics. We release the benchmark demos generated for commercial English LLMs and open-sourced English LLMs in the following link: https://github.com/whitzard-ai/jade-db. For readers who are interested in evaluating on more questions generated by JADE, please contact us. JADE is based on Noam Chomsky's seminal theory of transformational-generative grammar. Given a seed question with unsafe intention, JADE invokes a sequence of generative and transformational rules to increment the complexity of the syntactic structure of the original question, until the safety guardrail is broken. Our key insight is: due to the complexity of human language, most of the current best LLMs can hardly recognize the invariant evil from the infinite number of different syntactic structures, which form an unbounded example space that can never be fully covered. Technically, the generative/transformative rules are constructed by native speakers of the languages, and, once developed, can be used to automatically grow and transform the parse tree of a given question, until the guardrail is broken. For more evaluation results and demo, please check our website: https://whitzard-ai.github.io/jade.html.

翻訳日:2023-12-12 22:06:06 公開日:2023-12-10

# 基盤モデル支援による弱教師付きセマンティックセグメンテーション

Foundation Model Assisted Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2312.03585v2 )

ライセンス: Link先を確認

Xiaobo Yang and Xiaojin Gong

(参考訳) 本研究の目的は, 画像レベルのラベルを用いた弱教師付きセマンティックセグメンテーション(WSSS)に対処するために, contrastive language-image pre-training(CLIP)やsegment anything model(SAM)などの事前学習済み基盤モデルを活用することである。そこで本研究では,高品質なセグメンテーションシードを生成するための,CLIPとSAMに基づく粗から細へ(coarse-to-fine)のフレームワークを提案する。具体的には,画像分類タスクとシードセグメンテーションタスクを構築し,重みを凍結したCLIPと2組の学習可能なタスク固有プロンプトによって両タスクを共同で実行する。 SAMベースのシーディング(SAMS)モジュールを設計して各タスクに適用し、粗いシードマップまたは細かいシードマップを生成する。さらに,画像レベルラベルを教師信号とするマルチラベルコントラスト損失と,生成した粗いシードマップを教師信号とするCAM活性化損失を設計する。これらの損失はプロンプトの学習に用いられ、プロンプトは我々のフレームワークで学習が必要な唯一の部分である。プロンプトが学習されると、学習済みのセグメンテーション固有プロンプトとともに各画像をCLIPとSAMSモジュールに入力し、高品質なセグメンテーションシードを生成する。これらのシードは、他の2段階WSSS手法と同様に、既製のセグメンテーションネットワークを訓練するための擬似ラベルとして機能する。実験により, 本手法はPASCAL VOC 2012で最先端の性能を達成し,MS COCO 2014で競争力のある結果を得た。コードはhttps://github.com/HAL-42/FMA-WSSS.gitで入手できる。

This work aims to leverage pre-trained foundation models, such as contrastive language-image pre-training (CLIP) and segment anything model (SAM), to address weakly supervised semantic segmentation (WSSS) using image-level labels. To this end, we propose a coarse-to-fine framework based on CLIP and SAM for generating high-quality segmentation seeds. Specifically, we construct an image classification task and a seed segmentation task, which are jointly performed by CLIP with frozen weights and two sets of learnable task-specific prompts. A SAM-based seeding (SAMS) module is designed and applied to each task to produce either coarse or fine seed maps. Moreover, we design a multi-label contrastive loss supervised by image-level labels and a CAM activation loss supervised by the generated coarse seed map. These losses are used to learn the prompts, which are the only parts that need to be learned in our framework. Once the prompts are learned, we input each image along with the learned segmentation-specific prompts into CLIP and the SAMS module to produce high-quality segmentation seeds. These seeds serve as pseudo labels to train an off-the-shelf segmentation network like other two-stage WSSS methods. Experiments show that our method achieves state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014. Code is available at https://github.com/HAL-42/FMA-WSSS.git.

翻訳日:2023-12-12 21:54:50 公開日:2023-12-10

# オフライン強化学習における一般化ギャップ

The Generalization Gap in Offline Reinforcement Learning ( http://arxiv.org/abs/2312.05742v1 )

ライセンス: Link先を確認

Ishita Mediratta, Qingfei You, Minqi Jiang and Roberta Raileanu

(参考訳) オフライン学習の最近の進歩にもかかわらず、これらの手法はいまだに同じ環境で訓練され、テストされている。本稿では、オンライン強化学習(RL)、オフラインRL、シーケンスモデリング、行動クローニングなど、広く使われているオンラインおよびオフライン学習手法の汎化能力を比較する。実験の結果,オフライン学習アルゴリズムは,新しい環境においてオンライン学習アルゴリズムよりも性能が低いことがわかった。また,オフライン学習における汎化を評価する最初のベンチマークを導入し,Procgen(2Dビデオゲーム)とWebShop(eコマースWebサイト)から,さまざまなサイズとスキルレベルのデータセットを収集した。データセットには限られた数のゲームレベルや自然言語命令の軌跡が含まれており、テスト時にはエージェントは新しいレベルや命令に汎化する必要がある。実験の結果,既存のオフライン学習アルゴリズムは,訓練環境とテスト環境の両方においてオンラインRLの性能に及ばないことが判明した。行動クローニングは強力なベースラインであり、複数の環境のデータで訓練し新しい環境でテストした場合、最先端のオフラインRLやシーケンスモデリングのアプローチを上回る。最後に、データのサイズではなく多様性を高めることで、すべてのオフライン学習アルゴリズムの新しい環境での性能が向上することがわかった。本研究は,現在のオフライン学習アルゴリズムの汎化が限定的であることを実証し,この分野におけるさらなる研究の必要性を浮き彫りにした。

Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinforcement learning (RL), offline RL, sequence modeling, and behavioral cloning. Our experiments show that offline learning algorithms perform worse on new environments than online learning ones. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill levels from Procgen (2D video games) and WebShop (e-commerce websites). The datasets contain trajectories for a limited number of game levels or natural language instructions and, at test time, the agent has to generalize to new levels or instructions. Our experiments reveal that existing offline learning algorithms struggle to match the performance of online RL in both training and test environments. Behavioral cloning is a strong baseline, outperforming state-of-the-art offline RL and sequence modeling approaches when trained on data from multiple environments and tested on new ones. Finally, we find that increasing the diversity of the data, rather than its size, improves performance on new environments for all offline learning algorithms. Our study demonstrates the limited generalization of current offline learning algorithms, highlighting the need for more research in this area.

翻訳日:2023-12-12 19:15:41 公開日:2023-12-10

# FP8-BERT:Transformerのための訓練後量子化

FP8-BERT: Post-Training Quantization for Transformer ( http://arxiv.org/abs/2312.05725v1 )

ライセンス: Link先を確認

Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu

(参考訳) BERTのようなトランスフォーマーベースのモデルは、幅広い自然言語処理タスクに広く応用されている。しかし、避けられない副作用として、本番環境にデプロイする際に大量のメモリと推論コストを必要とする。量子化はこのコストを緩和する一般的な方法の1つである。しかし、INT8データフォーマットに基づく従来の8ビット量子化戦略は、PTQ(Post-Training Quantization)方式では精度の低下に悩まされ、そうでなければ高価な量子化アウェアトレーニング(QAT)プロセスを必要とする。近年、新しい数値形式FP8(すなわち8ビット浮動小数点)が提案され、H100のような商用AIコンピューティングプラットフォームでサポートされている。本稿では,簡単なキャリブレーションとフォーマット変換プロセスにより,精度を大きく損なうことなく訓練後量子化を行う方法としてのFP8の有効性を実証的に検証した。 GLUEとSQuAD v1.1データセット上でのBERT変種の広範な実験において、引用文献(引用キー nvidia_release)で提案されたFP8標準を採用し、FP8を用いたPTQがINT8を用いた場合よりも精度を大幅に改善し、完全精度モデルと同程度に達することを示す。

Transformer-based models, such as BERT, have been applied to a wide range of natural language processing tasks. However, one inevitable side effect is that they require massive memory storage and incur high inference cost when deployed in production. Quantization is one of the popular ways to alleviate the cost. However, the previous 8-bit quantization strategy based on the INT8 data format either suffers from degraded accuracy in a Post-Training Quantization (PTQ) fashion or requires an expensive Quantization-Aware Training (QAT) process. Recently, a new numeric format FP8 (i.e., 8-bit floating point) has been proposed and supported in commercial AI computing platforms such as H100. In this paper, we empirically validate the effectiveness of FP8 as a way to do Post-Training Quantization without significant loss of accuracy, with a simple calibration and format conversion process. We adopt the FP8 standard proposed in the cited work (citation key nvidia_release) in our extensive experiments with BERT variants on the GLUE and SQuAD v1.1 datasets, and show that PTQ with FP8 can significantly improve the accuracy upon that with INT8, to the extent of the full-precision model.
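
A rough way to see why a floating-point 8-bit format can lose less accuracy than INT8 is to simulate both on a tensor with a wide dynamic range. The sketch below is an assumption-laden illustration, not the paper's procedure: it simulates the E4M3 variant of FP8 in plain Python (ignoring NaN handling), uses per-tensor max scaling as the calibration step, and all function names are hypothetical.

```python
import math

E4M3_MAX = 448.0     # largest finite value of the E4M3 format
E4M3_MIN_EXP = -6    # exponent of the smallest normal E4M3 number
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest value representable in E4M3 (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    exponent = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exponent - MANTISSA_BITS)  # spacing between neighbours here
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fake_quant_fp8(values):
    """Scale to the FP8 range, round, and scale back (assumed max calibration)."""
    scale = E4M3_MAX / max(abs(v) for v in values)
    return [round_to_e4m3(v * scale) / scale for v in values]


def fake_quant_int8(values):
    """Symmetric per-tensor INT8 quantization followed by dequantization."""
    scale = 127.0 / max(abs(v) for v in values)
    return [max(-127, min(127, round(v * scale))) / scale for v in values]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


weights = [0.001, 0.01, 0.1, 1.0, 100.0]  # one outlier stretches the range
fp8_error = mean_abs_error(weights, fake_quant_fp8(weights))
int8_error = mean_abs_error(weights, fake_quant_int8(weights))
print(fp8_error < int8_error)  # True
```

INT8 spends its 255 levels uniformly, so the outlier forces the small weights to zero, while FP8 keeps roughly constant relative precision across magnitudes. Real models and calibration schemes will behave differently from this toy tensor.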

翻訳日:2023-12-12 19:15:18 公開日:2023-12-10

# 空間連続繊維配向関数の学習

Learning Spatially-Continuous Fiber Orientation Functions ( http://arxiv.org/abs/2312.05721v1 )

ライセンス: Link先を確認

Tyler Spears and P. Thomas Fletcher

(参考訳) ヒトコネクトームの理解は、拡散MR画像の解像度によって根本的に制限されている。コネクトームを構成する神経経路をトラクトグラフィで再構成するには、連続的な線維方向の場を追跡する必要がある。典型的には、この場は低解像度でノイズの多い拡散MRIにおいて、単純な三線形補間によって求められる。しかし、三線形補間は低品質データの細かなスケールの変化を追うことが苦手である。拡散MRIの超解像における最近のディープラーニング手法は固定された空間格子へのアップサンプリングに焦点を当てているが、これはトラクトグラフィが必要とする連続場を満たさない。本研究では,低解像度の拡散強調画像から空間的に連続な線維配向密度関数を学習する新しい手法FENRIを提案する。また, FENRIのトラクトグラフィにおける能力を定量化するために, 深層学習トラクトグラフィモデルの評価用に構築した拡張シミュレーションデータセットも導入する。FENRIが現実的な低品質データから高解像度の線維配向を正確に予測し,FENRIに基づくトラクトグラフィが,現在用いられている三線形補間よりも優れたストリームライン再構成を実現することを示す。

Our understanding of the human connectome is fundamentally limited by the resolution of diffusion MR images. Reconstructing a connectome's constituent neural pathways with tractography requires following a continuous field of fiber directions. Typically, this field is found with simple trilinear interpolation in low-resolution, noisy diffusion MRIs. However, trilinear interpolation struggles following fine-scale changes in low-quality data. Recent deep learning methods in super-resolving diffusion MRIs have focused on upsampling to a fixed spatial grid, but this does not satisfy tractography's need for a continuous field. In this work, we propose FENRI, a novel method that learns spatially-continuous fiber orientation density functions from low-resolution diffusion-weighted images. To quantify FENRI's capabilities in tractography, we also introduce an expanded simulated dataset built for evaluating deep-learning tractography models. We demonstrate that FENRI accurately predicts high-resolution fiber orientations from realistic low-quality data, and that FENRI-based tractography offers improved streamline reconstruction over the current use of trilinear interpolation.
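
For reference, the trilinear interpolation baseline mentioned above can be sketched in a few lines. This is a generic textbook version on a scalar volume, shown only to illustrate what FENRI is compared against; in tractography the same weights would be applied to each per-voxel orientation coefficient, and that usage is an assumption here. The function name is hypothetical.

```python
import math


def trilinear(volume, x, y, z):
    """Interpolate a scalar volume (indexed as volume[x][y][z]) at a continuous point.

    Coordinates are in voxel units and assumed to lie inside the grid.
    """
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    x1 = min(x0 + 1, len(volume) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    z1 = min(z0 + 1, len(volume[0][0]) - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    # Blend along x on the four edges of the cell, then along y, then along z.
    c00 = volume[x0][y0][z0] * (1 - fx) + volume[x1][y0][z0] * fx
    c01 = volume[x0][y0][z1] * (1 - fx) + volume[x1][y0][z1] * fx
    c10 = volume[x0][y1][z0] * (1 - fx) + volume[x1][y1][z0] * fx
    c11 = volume[x0][y1][z1] * (1 - fx) + volume[x1][y1][z1] * fx
    c0 = c00 * (1 - fy) + c10 * fy
    c1 = c01 * (1 - fy) + c11 * fy
    return c0 * (1 - fz) + c1 * fz


# A 2x2x2 volume whose value is x + 2y + 4z; trilinear interpolation is exact here.
volume = [[[x + 2 * y + 4 * z for z in range(2)] for y in range(2)] for x in range(2)]
print(trilinear(volume, 0.5, 0.5, 0.5))  # 3.5
```

Because the result is a weighted average of the eight surrounding voxels, any structure finer than the voxel grid is smoothed away, which is the limitation a learned continuous representation aims to address.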

翻訳日:2023-12-12 19:14:52 公開日:2023-12-10

# プライバシ攻撃における勾配と事前知識を超えて: フェデレーション学習における言語モデルのPooler層入力の活用

Beyond Gradient and Priors in Privacy Attacks: Leveraging Pooler Layer Inputs of Language Models in Federated Learning ( http://arxiv.org/abs/2312.05720v1 )

ライセンス: Link先を確認

Jianwei Li, Sheng Liu, Qi Lei

(参考訳) federated learning(fl)は、データをローカルに保存し、モデル更新のみを送信することで、ユーザのプライバシを強調する。最近、flの文脈で言語モデルからセンシティブなトレーニングテキストを抽出することで、プライバシ攻撃に関する一連の作業がユーザのプライバシを損なう。バッチサイズが制限された作業(バッチサイズ1など)もあれば,検出が容易なものもある。本稿では,様々なバッチサイズ設定におけるテキストの回復率を著しく向上させ,検出し難い革新的なアプローチを提案する。基本的なグラデーションマッチングとドメイン事前知識に基づいて,言語モデルのプール層の入力を復元することで,機能レベルで追加の教師付き信号を提供することができる。勾配データとは異なり、これらの信号は文やトークンの平均値ではなく、より微妙で効果的な洞察を提供する。我々は,テキスト分類タスクをCoLA,SST-2,Rotten Tomatoesなどのデータセット上でベンチマークする。バッチサイズとモデルが異なるため、我々のアプローチは従来よりも一貫して優れています。

Federated learning (FL) emphasizes decentralized training by storing data locally and sending only model updates, underlining user privacy. Recently, a line of works on privacy attacks impairs user privacy by extracting sensitive training text from language models in the context of FL. Yet, these attack techniques face distinct hurdles: some work chiefly with limited batch sizes (e.g., batch size of 1), and others are easily detectable. This paper introduces an innovative approach that is challenging to detect, significantly enhancing the recovery rate of text in various batch-size settings. Building on fundamental gradient matching and domain prior knowledge, we enhance the attack by recovering the input of the Pooler layer of language models, which enables us to provide additional supervised signals at the feature level. Unlike gradient data, these signals do not average across sentences and tokens, thereby offering more nuanced and effective insights. We benchmark our method using text classification tasks on datasets such as CoLA, SST-2, and Rotten Tomatoes. Across different batch sizes and models, our approach consistently outperforms previous state-of-the-art results.

翻訳日:2023-12-12 19:14:33 公開日:2023-12-10

# DVANet:多視点行動認識のための視点特徴と行動特徴の分離

DVANet: Disentangling View and Action Features for Multi-View Action Recognition ( http://arxiv.org/abs/2312.05719v1 )

ライセンス: Link先を確認

Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah

(参考訳) 本研究では,映像中の視点関連情報から,学習した行動表現を分離するための多視点行動認識手法を提案する。複数の視点からキャプチャされたアクションインスタンスを分類しようとすると、異なるカメラアングルからキャプチャされたアクションの背景、オクルージョン、可視性の違いにより、より困難度が高くなる。マルチビュー動作認識で導入された様々な問題に対処するため,学習可能なトランスフォーマーデコーダクエリを2つの教師付きコントラスト損失とともに新たに構成し,視点の変化に頑健な動作特徴の学習を行う。トランスフォーマーデコーダは、別々のクエリを使用して、アクションとビュー情報を分離して学習します。我々は,NTU RGB+D,NTU RGB+D 120,PKU-MMD,N-UCLAの4つの多視点行動認識データセットにおいて,他のユニモーダルモデルよりも有意に優れていることを示す。従来のRGBと比較すると、各データセットでそれぞれ1.5%、4.8%、2.2%、および4.8%の最大改善が見られる。

In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video. When trying to classify action instances captured from multiple viewpoints, there is a higher degree of difficulty due to the difference in background, occlusion, and visibility of the captured action from different camera angles. To tackle the various problems introduced in multi-view action recognition, we propose a novel configuration of learnable transformer decoder queries, in conjunction with two supervised contrastive losses, to enforce the learning of action features that are robust to shifts in viewpoints. Our disentangled feature learning occurs in two stages: the transformer decoder uses separate queries to separately learn action and view information, which are then further disentangled using our two contrastive losses. We show that our model and method of training significantly outperform all other uni-modal models on four multi-view action recognition datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD, and N-UCLA. Compared to previous RGB works, we see maximal improvements of 1.5%, 4.8%, 2.2%, and 4.8% on each dataset, respectively.

翻訳日:2023-12-12 19:14:10 公開日:2023-12-10

# データ可用性を制限したリチウムイオン電池寿命予測: 異なる機械学習アルゴリズムのベンチマーク

Forecasting Lithium-Ion Battery Longevity with Limited Data Availability: Benchmarking Different Machine Learning Algorithms ( http://arxiv.org/abs/2312.05717v1 )

ライセンス: Link先を確認

Hudson Hilal and Pramit Saha

(参考訳) リチウムイオン電池の使用が拡大するにつれて、リチウムイオン電池の寿命を予測できることがますます重要になっている。この研究は、従来の機械学習とディープラーニングの両方で異なる機械学習アルゴリズムの相対的性能を比較して、最小限のデータに基づいてバッテリーサイクル寿命予測のための最高のパフォーマンスアルゴリズムを決定することを目的としている。統計的データに基づいて,14種類の機械学習モデルを用いて手作りの機能を入力し,テストのために3つの特徴群に分割した。ディープラーニングモデルでは,標準的なリカレントニューラルネットワークの構成,ゲート型リカレントユニット,アテンション機構のない長期記憶など,さまざまなニューラルネットワークモデルをテストした。ディープラーニングモデルは、最初の100サイクルで各バッテリーの生データに基づいて、多変量時系列信号を供給した。実験の結果,手作り機能を用いた機械学習アルゴリズムは特に良好であり,平均絶対パーセンテージ誤差は10～20%であった。最も優れたアルゴリズムはランダムフォレスト回帰器であり、9.8%の平均絶対パーセンテージ誤差を与えた。従来の機械学習モデルは、一般的なデータセットのトレンドを理解する能力に優れていた。一方,深層学習モデルでは,生の限られたデータに対して,特に性能が低かった。中間範囲のデータ依存を捉えることにフォーカスした gru や rnns のようなアルゴリズムは、このタスクにとって重要で緩やかな傾向を認識するのにあまり適していなかった。本研究により,手作り機能付き機械学習モデルの実装は,データ可用性に制限のあるリチウムイオン電池寿命を予測するための高度なディープラーニングモデルよりも有効であることが判明した。

As the use of Lithium-ion batteries continues to grow, it becomes increasingly important to be able to predict their remaining useful life. This work aims to compare the relative performance of different machine learning algorithms, both traditional machine learning and deep learning, in order to determine the best-performing algorithms for battery cycle life prediction based on minimal data. We investigated 14 different machine learning models that were fed handcrafted features based on statistical data and split into 3 feature groups for testing. For deep learning models, we tested a variety of neural network models including different configurations of standard Recurrent Neural Networks, Gated Recurrent Units, and Long Short Term Memory with and without attention mechanism. Deep learning models were fed multivariate time series signals based on the raw data for each battery across the first 100 cycles. Our experiments revealed that the machine learning algorithms on handcrafted features performed particularly well, resulting in 10-20% average mean absolute percentage error. The best-performing algorithm was the Random Forest Regressor, which gave a minimum 9.8% mean absolute percentage error. Traditional machine learning models excelled due to their capability to comprehend general data set trends. In comparison, deep learning models were observed to perform particularly poorly on raw, limited data. Algorithms like GRU and RNNs that focused on capturing medium-range data dependencies were less adept at recognizing the gradual, slow trends critical for this task. Our investigation reveals that implementing machine learning models with hand-crafted features proves to be more effective than advanced deep learning models for predicting the remaining useful Lithium-ion battery life with limited data availability.
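The headline metric in this abstract, mean absolute percentage error (MAPE), can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function name and the cycle-life numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return MAPE in percent for paired true and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # MAPE is undefined when a true value is zero.
            raise ValueError("actual values must be non-zero")
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical cycle lives (not taken from the paper).
actual_cycles = [500, 1000, 800]
predicted_cycles = [450, 1100, 800]
mape = mean_absolute_percentage_error(actual_cycles, predicted_cycles)
```

Here two of the three predictions are off by 10% and one is exact, so the result is roughly 6.7%. Under this definition, the 9.8% reported for the Random Forest Regressor means its predicted cycle life is off by about a tenth of the true value on average.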

翻訳日:2023-12-12 19:13:45 公開日:2023-12-10

# 逆転学習における初期化の課題

Initialization Matters for Adversarial Transfer Learning ( http://arxiv.org/abs/2312.05716v1 )

ライセンス: Link先を確認

Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin

(参考訳) 転校学習における事前学習・微調整パラダイムの普及に伴い,ダウンストリームタスクの堅牢性が重要な課題となっている。本研究では,トランスファー学習における敵対的ロバスト性に着目し,事前学習モデルとリニアヘッドの両方を含む初期化の重要な役割を明らかにする。まず, 対向的にロバストな事前学習モデルの必要性を見いだす。特に, 標準事前学習モデルでは, パラメータ効率のよいファインチューニング (peft) 手法は, 逆行性に乏しいか, 逆行性が著しく低下した下流タスクに対して, 逆行性に頑健性を示すことが判明した。強固な事前学習モデルを活用することで、単純な線形プローブが特定のデータセット上でランダムな初期化を伴う完全微調整や他のペフト法よりも優れていることがわかりました。さらに,ロバスト事前学習からロバスト性を維持するのに線形プローブが優れていることも確認した。そこで本稿では, 逆線形探索により得られる重みを線形ヘッドに初期化して, 事前学習から頑健性を最大限に継承する, 逆線形初期化(RoLI)を提案する。 5つの異なる画像分類データセットにおいて,RoLIの有効性を実証し,新しい最先端結果を得た。

With the prevalence of the Pretraining-Finetuning paradigm in transfer learning, the robustness of downstream tasks has become a critical concern. In this work, we delve into adversarial robustness in transfer learning and reveal the critical role of initialization, including both the pretrained model and the linear head. First, we discover the necessity of an adversarially robust pretrained model. Specifically, we reveal that with a standard pretrained model, Parameter-Efficient Finetuning (PEFT) methods either fail to be adversarially robust or continue to exhibit significantly degraded adversarial robustness on downstream tasks, even with adversarial training during finetuning. Leveraging a robust pretrained model, surprisingly, we observe that a simple linear probing can outperform full finetuning and other PEFT methods with random initialization on certain datasets. We further identify that linear probing excels in preserving robustness from the robust pretraining. Based on this, we propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining. Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.

翻訳日:2023-12-12 19:13:16 公開日:2023-12-10

# マルチスケールモデリングにおけるマイクロマクロ整合性:高速・低速力学系のスコアベースモデルによるサンプリング

Micro-Macro Consistency in Multiscale Modeling: Score-Based Model Assisted Sampling of Fast/Slow Dynamical Systems ( http://arxiv.org/abs/2312.05715v1 )

ライセンス: Link先を確認

Ellis R. Crabtree, Juan M. Bello-Rivas, Ioannis G. Kevrekidis

(参考訳) 計算化学、生物学、材料科学などの分野におけるマルチスケール力学系のモデリングにおける重要なステップは、長期にわたる関心事における位相空間の代表的なサンプリングである。例えば、多くの自由度を持つ系の長期的挙動は直接力学シミュレーションによって効率的に計算できないことが多く、そのようなシステムは局所的な自由エネルギーミニマの中に閉じ込められることがある。物理学に基づくマルチ時間力学系の研究において、自由エネルギー障壁を越える探索を加速するためにサンプリングを強化する技術が開発されている。一方、機械学習の分野では、生成モデルの一般的な目標は、この密度から経験的なサンプルをトレーニングした後、ターゲット密度からサンプルをサンプリングすることである。スコアベース生成モデル(SGM)は、目標トレーニング分布から可塑性データを生成する最先端の能力を実証している。このような生成モデルの条件付き実装は、強化サンプリングに対する長い確立された-および物理に基づく-ソリューションと大きな並列性を示すことが示されている。これらの物理に基づく手法は、ML生成モデルとの結合によって強化され、強度を補完し、それぞれの技術の弱点を軽減することができる。本研究では,SGMをこのような結合フレームワークで利用することにより,マルチスケールな動的システムのサンプリングを改善することができることを示す。

A valuable step in the modeling of multiscale dynamical systems in fields such as computational chemistry, biology, materials science and more, is the representative sampling of the phase space over long timescales of interest; this task is not, however, without challenges. For example, the long term behavior of a system with many degrees of freedom often cannot be efficiently computationally explored by direct dynamical simulation; such systems can often become trapped in local free energy minima. In the study of physics-based multi-time-scale dynamical systems, techniques have been developed for enhancing sampling in order to accelerate exploration beyond free energy barriers. On the other hand, in the field of Machine Learning, a generic goal of generative models is to sample from a target density, after training on empirical samples from this density. Score-based generative models (SGMs) have demonstrated state-of-the-art capabilities in generating plausible data from target training distributions. Conditional implementations of such generative models have been shown to exhibit significant parallels with long-established -- and physics-based -- solutions to enhanced sampling. These physics-based methods can then be enhanced through coupling with the ML generative models, complementing the strengths and mitigating the weaknesses of each technique. In this work, we show that SGMs can be used in such a coupling framework to improve sampling in multiscale dynamical systems.

翻訳日:2023-12-12 19:12:51 公開日:2023-12-10

# ミューオン崩壊の絡み合いエントロピー分布

Entanglement Entropy Distributions of a Muon Decay ( http://arxiv.org/abs/2312.05712v1 )

ライセンス: Link先を確認

Shanmuka Shivashankara, Patti Rizzo, Nicole Cafe

(参考訳) 崩壊および散乱過程の密度行列で生じる発散は、トレースとユニタリティーまたは光学定理によって正規化される。これらの発散は、崩壊する粒子の寿命または全散乱断面積によって規則化される。また、この正規化は最終的な粒子の期待されるヘリシティを与える。密度行列は、ローレンツ不変密度行列のエントリとユニタリティーが木レベルで保たれる、休息時の偏極ミューオンの弱崩壊、$\mu^- \rightarrow \nu_{\mu} (e^- \bar \nu_e)$ に対して導かれる。電子のフォン・ノイマンエンタングルメントエントロピー分布は、電子の放出角とエネルギーの両方に関して計算される。角エントロピー分布は、最小体積正規化が与えられたミューオンの分極に対して後方に放出される電子を好む。運動エントロピー分布はミューオンの静止質量エネルギーの半分で最大である。これらの結果は、電子の角および運動的減衰率分布に類似している。密度行列とエンタングルメントエントロピーは、領域または体積の比のどちらかでキャストすることができる。

Divergences that occur in density matrices of decay and scattering processes are shown to be regularized by tracing and unitarity or the optical theorem. These divergences are regularized by the lifetime of the decaying particle or the total scattering cross section. Also, this regularization is shown to give the expected helicities of final particles. The density matrix is derived for the weak decay of a polarized muon at rest, $\mu^- \rightarrow \nu_{\mu} (e^- \bar \nu_e)$, with Lorentz invariant density matrix entries and unitarity upheld at tree level. The electron's von Neumann entanglement entropy distributions are calculated with respect to both the electron's emission angle and energy. The angular entropy distribution favors an electron emitted backwards with respect to the muon's polarization given a minimum volume regularization. The kinematic entropy distribution is maximal at half the muon's rest mass energy. These results are similar to the electron's angular and kinematic decay rate distributions. Both the density matrix and entanglement entropy can be cast either in terms of ratios of areas or volumes.
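For reference, the von Neumann entanglement entropy named in the abstract has the standard form below. The notation is assumed here rather than taken from the paper: $\rho$ is the density matrix of the final state, $\rho_e$ is the electron's reduced density matrix obtained by tracing over the two neutrinos, and $\lambda_i$ are the eigenvalues of $\rho_e$. The paper's regularization and its angular and energy distributions are not reproduced.

$$\rho_e = \mathrm{Tr}_{\nu_{\mu} \bar\nu_e}\,\rho, \qquad S_e = -\mathrm{Tr}\left(\rho_e \ln \rho_e\right) = -\sum_i \lambda_i \ln \lambda_i$$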

翻訳日:2023-12-12 19:12:29 公開日:2023-12-10

# スパース誘導ネットワークを用いたカメラベース3次元セマンティックシーン補完

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network ( http://arxiv.org/abs/2312.05752v1 )

ライセンス: Link先を確認

Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao, Jongwon Ra, Laijian Li, Yong Liu

(参考訳) semantic scene completion (ssc) は、3dシーン全体における各voxelの意味的占有率を限定的な観察から予測することを目的としている。近年,よりリッチな視覚手がかりとカメラの費用対効果により,カメラベースのsscソリューションが研究されている。しかし、既存の手法は通常、高度で重い3dモデルを頼りにして、明確なセグメンテーション境界に十分な識別性を持たないリフトされた3d特徴を直接処理する。そこで,本稿では,sgnと呼ばれるエンドツーエンドカメラベースのsscフレームワークを提案する。sgnは,意味的かつ可食的な種子ボクセルから,幾何学的前後の空間情報に基づいてシーン全体へ意味を拡散する。空間的占有と幾何学的先行のためのハイブリッドガイダンス(疎意味的および幾何的ガイダンス)と効果的なボクセルアグリゲーションを設計することにより、異なるカテゴリ間の特徴分離を強化し、意味拡散の収束を早める。 SemanticKITTIデータセットの大規模な実験結果は、既存の最先端手法よりもSGNの方が優れていることを示している。

Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to directly process the lifted 3D features that are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene based on geometry prior and occupancy information. By designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial occupancy and geometry priors, we enhance the feature separation between different categories and expedite the convergence of semantic diffusion. Extensive experimental results on the SemanticKITTI dataset demonstrate the superiority of our SGN over existing state-of-the-art methods.

翻訳日:2023-12-12 19:05:40 公開日:2023-12-10

# クエリストラテジーのベンチマーク: 将来の深層能動学習に向けて

Benchmarking of Query Strategies: Towards Future Deep Active Learning ( http://arxiv.org/abs/2312.05751v1 )

ライセンス: Link先を確認

Shiryu Ueno, Yusei Yamada, Shunsuke Nakatsuka, and Kunihito Kato

(参考訳) 本研究では,深層能動学習(DAL)のためのクエリ戦略をベンチマークする。 DALは、クエリ戦略によって選択された高品質なサンプルに注釈を付けることで、アノテーションのコストを削減する。既存の研究には2つの主要な問題があり、実験的な設定は標準化されておらず、既存の方法の評価が困難であり、実験のほとんどはCIFARまたはMNISTデータセットで行われた。そこで我々は,DALの標準化された実験環境を開発し,医用および視覚検査画像を含む6つのデータセットを用いて,様々なクエリ戦略の有効性を検討する。さらに,現在のDALアプローチのほとんどがモデルベースであるため,クエリのためのフルトレーニングモデルを用いた検証実験を行い,これら6つのデータセットの有効性を検証した。私たちのコードは https://github.com/ia-gu/Benchmarking-of-Query-Strategies-Towards-Future-Deep-Active-Learning で利用可能です。

In this study, we benchmark query strategies for deep active learning (DAL). DAL reduces annotation costs by annotating only high-quality samples selected by query strategies. Existing research has two main problems: the experimental settings are not standardized, which makes existing methods difficult to evaluate, and most experiments were conducted on the CIFAR or MNIST datasets. Therefore, we develop standardized experimental settings for DAL and investigate the effectiveness of various query strategies using six datasets, including those that contain medical and visual inspection images. In addition, since most current DAL approaches are model-based, we perform verification experiments using fully-trained models for querying to investigate the effectiveness of these approaches for the six datasets. Our code is available at https://github.com/ia-gu/Benchmarking-of-Query-Strategies-Towards-Future-Deep-Active-Learning
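To make "query strategy" concrete, the sketch below implements one familiar model-based strategy, entropy-based uncertainty sampling. The abstract does not list which strategies the benchmark covers, so this choice is an assumption made for illustration; the function names and the class-probability values are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` unlabeled samples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled samples and three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
]
chosen = select_queries(pool, budget=2)
```

With a budget of two, the strategy picks the two samples the model is least sure about, and only those would be sent for annotation.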

翻訳日:2023-12-12 19:05:13 公開日:2023-12-10

# IL-NeRF:カメラポッドアライメントを用いたニューラルラジアンスフィールドのインクリメンタル学習

IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment ( http://arxiv.org/abs/2312.05748v1 )

ライセンス: Link先を確認

Letian Zhang, Ming Li, Chen Chen, Jie Xu

(参考訳) neural radiance fields(nerf)は、フォトリアリスティックな画像を生成し、複雑なシーンを表現するための有望なアプローチである。しかし、データを逐次処理する場合は、新しいデータでトレーニングした後、前のデータを忘れやすい破滅的な忘れ込みに悩まされることがある。知識蒸留を用いた既存の漸進的な学習手法では、連続データチャンクは2次元画像と対応するカメラポーズパラメータの両方を含むと仮定する。これは、データが順次到着し、将来のチャンクがアクセスできない場合でも、必要なカメラのポーズをデータセット全体から推定する必要があるため、パラドックスとなる。対照的に,カメラのポーズが不明な実用的なシナリオに注目する。我々は,この課題に対処するために,段階的NeRFトレーニングのための新しいフレームワークであるIL-NeRFを提案する。 IL-NeRFのキーとなるアイデアは、カメラのポーズを初期化して調整するための参照として過去のカメラのポーズを選択することである。この後、カメラポーズと再生ベースのnerf蒸留の合同最適化が行われる。実世界の屋内および屋外のシーンにおける実験により、IL-NeRFはインクリメンタルなNeRFトレーニングを処理し、ベースラインを最大54.04 %のレンダリング品質で上回ります。

Neural radiance fields (NeRF) is a promising approach for generating photorealistic images and representing complex scenes. However, when processing data sequentially, it can suffer from catastrophic forgetting, where previous data is easily forgotten after training with new data. Existing incremental learning methods using knowledge distillation assume that continuous data chunks contain both 2D images and corresponding camera pose parameters, pre-estimated from the complete dataset. This poses a paradox as the necessary camera pose must be estimated from the entire dataset, even though the data arrives sequentially and future chunks are inaccessible. In contrast, we focus on a practical scenario where camera poses are unknown. We propose IL-NeRF, a novel framework for incremental NeRF training, to address this challenge. IL-NeRF's key idea lies in selecting a set of past camera poses as references to initialize and align the camera poses of incoming image data. This is followed by a joint optimization of camera poses and replay-based NeRF distillation. Our experiments on real-world indoor and outdoor scenes show that IL-NeRF handles incremental NeRF training and outperforms the baselines by up to 54.04% in rendering quality.

翻訳日:2023-12-12 19:04:57 公開日:2023-12-10

# 学生学習におけるスキル分類と予測のための確率と情報エントロピーの差異

Difference of Probability and Information Entropy for Skills Classification and Prediction in Student Learning ( http://arxiv.org/abs/2312.05747v1 )

ライセンス: Link先を確認

Kennedy Efosa Ehimwenma, Safiya Al Sharji and Maruf Raheem

(参考訳) 事象の確率は[0, 1]の範囲にある。サンプル空間sにおいて、確率の値は結果が真か偽かを決定する。決して起こらない事象Pr(A)の確率 = 0. イベントPr(B) の確率は確実に起こる。 = 1 なので、イベント a と b の両方が確実である。さらに、与えられたサンプル空間 s = 1 における有限個の事象の集合の確率の和 pr(e1) + pr(e2) + ... + pr(en) は、逆に、確実に起こる2つの確率の和の差が成り立つ。まず, ベイズの定理を考察し, 学生学習における学習対象の予測に応用する前に, 学習事象の発生確率と確率の差を補う。本論文は,生徒の学習対象の重みを,argMaxPr(S)と学生効果の確率の差で定量化するものである。スキルセットのデータセットを使用して、計算手順が示す。一より高いレベルの学習につながるようなスキルセットの事象の確率二被写体・被写体の再学習を要しない事象の確率三学生の成績をクラスラベルに予測する際の決定木の正確性及び四スキルセットデータに関する情報エントロピーとその学生の認知能力及び学習の推薦に関する意味 [1]

The probability of an event lies in the range [0, 1]. In a sample space S, the value of probability determines whether an outcome is true or false. The probability Pr(A) of an event A that will never occur is 0. The probability Pr(B) of an event B that will certainly occur is 1. This makes both events A and B a certainty. Furthermore, the sum of probabilities Pr(E1) + Pr(E2) + ... + Pr(En) of a finite set of events in a given sample space S is 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. Firstly, this paper discusses Bayes' theorem, then the complement of probability and the difference of probability for occurrences of learning-events, before applying these in the prediction of learning objects in student learning. Given the sum total of 1, to make recommendations for student learning, this paper submits that the difference of argMaxPr(S) and the probability of student-performance quantifies the weight of learning objects for students. Using a dataset of skill-sets, the computational procedure demonstrates: i) the probability of skill-set events that have occurred and that would lead to higher level learning; ii) the probability of the events that have not occurred and that require subject-matter relearning; iii) the accuracy of a decision tree in the prediction of student performance into class labels; and iv) information entropy about skill-set data and its implication on student cognitive performance and recommendation of learning [1].
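The quantities named in this abstract can be illustrated with a short sketch. It is only one reading of the text, not the paper's procedure: `learning_weight` interprets "the difference of argMaxPr(S) and probability of student-performance" as the gap between the largest attainable probability and the observed one, and all names and numbers are hypothetical.

```python
import math


def complement(p):
    """Probability that an event with probability p does not occur."""
    return 1.0 - p


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def learning_weight(max_probability, performance):
    """Gap between the best attainable probability and the observed one (assumed reading)."""
    return max_probability - performance


# Hypothetical student with a 0.75 probability of passing a skill item.
p_pass = 0.75
p_fail = complement(p_pass)              # 0.25
uncertainty = entropy([p_pass, p_fail])  # about 0.811 bits
weight = learning_weight(1.0, p_pass)    # 0.25
```

In this reading, a student who always passes has weight 0 and needs no relearning, while a larger weight flags more material to revisit.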

翻訳日:2023-12-12 19:04:34 公開日:2023-12-10

# ファンデーションモデルにおけるオープンワールドオブジェクト検出

Open World Object Detection in the Era of Foundation Models ( http://arxiv.org/abs/2312.05745v1 )

ライセンス: Link先を確認

Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang

(参考訳) 物体検出は、ロボット工学から医療画像解析まで、現実世界の様々な応用に不可欠なものだ。このようなアプリケーションで確実に使用されるためには、モデルが予期せぬ(または新しい)オブジェクトを処理できる必要がある。オープンワールドオブジェクト検出(OWD)パラダイムは、未知のオブジェクトを検出し、発見したオブジェクトを段階的に学習することで、この課題に対処する。しかし、OWDメソッドの開発は、厳密なベンチマークとタスク定義のために妨げられている。これらの定義は事実上基礎モデルを禁じる。本稿では,これらの定義を緩和し,OWDにおける事前学習基盤モデルの利用について検討する。まず,既存のベンチマークでは基礎モデルを用いた評価手法が不十分であることを示す。その結果、これらのモデルの新たな、挑戦的なベンチマークをキュレートする動機になりました。そこで我々は,航空画像や外科画像などの挑戦的領域を含む,現実世界のアプリケーション駆動データセット5つを含む新しいベンチマークを導入し,ベースラインを確立する。アプリケーション駆動データセットのクラス間の固有の接続を利用し、新しいメソッドであるオープンワールドのためのファウンデーションオブジェクト検出モデル(fomo)を導入し、ベースとなる既知のオブジェクトと共有属性に基づいて未知のオブジェクトを識別する。 FOMOは、ベンチマークのベースラインに比べて、未知のオブジェクトmAPが約3倍である。しかし,本研究の結果から,オブジェクト検出手法を現実世界のドメインに拡張する大きな研究機会が示唆された。私たちのコードとベンチマークはhttps://orrzohar.github.io/projects/fomo/で利用可能です。

Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered due to the stringent benchmark and task definitions. These definitions effectively prohibit foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient in evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these models. Therefore, we introduce a new benchmark that includes five real-world application-driven datasets, including challenging domains such as aerial and surgical images, and establish baselines. We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects. FOMO has ~3x unknown object mAP compared to baselines on our benchmark. However, our results indicate a significant place for improvement - suggesting a great research opportunity in further scaling object detection methods to real-world domains. Our code and benchmark are available at https://orrzohar.github.io/projects/fomo/.

翻訳日:2023-12-12 19:04:10 公開日:2023-12-10

# Learngene Poolによる可変サイズモデルの構築

Building Variable-sized Models via Learngene Pool ( http://arxiv.org/abs/2312.05743v1 )

ライセンス: Link先を確認

Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

(参考訳) 近年、ステッチ可能なニューラルネットワーク(SN-Net)が、いくつかの事前学習されたネットワークを縫い合わせて、複雑さとパフォーマンスのトレードオフが異なる多数のネットワークを迅速に構築するために提案されている。このようにして、さまざまなリソース制約のあるアプリケーションシナリオで使用できる可変サイズのネットワークの設計やトレーニングの負担を軽減することができる。しかし、SN-Netはまだいくつかの課題に直面している。 1) 独立に訓練された複数のアンカーからのスティッチは、高いストレージリソース消費をもたらす。 2) SN-Netはリソース制約の少ないモデルを構築するための課題に直面している。 3) SN-Netは縫い目層に未学習の初期化法を使用し、最終的な性能を制限している。最近提案されたLearngeneフレームワークに動機づけられたこれらの課題を克服するために,Learngene Poolと呼ばれる新しい手法を提案する。簡単に言うと、Learngeneは、大きな事前学習されたモデルから重要な知識を小さな部分(learngeneと呼ばれる)に蒸留し、その小さな部分をいくつかの可変サイズのモデルに拡張する。提案手法では,ネットワークブロックをlearngeneインスタンスとして使用してlearngeneプールを構築する複数の小モデルに事前学習した大モデルを蒸留する。 1つの大きなモデルしか使われないので、SN-Netとしてもっと大きなモデルを格納する必要はなく、蒸留後、低いリソース制約を満たすために小さなモデルを構築するために小さなlearngeneインスタンスを作成できる。また、インスタンス間で学習可能な変換行列を挿入して可変サイズのモデルに縫い付け、これらのモデルの性能を向上させる。その結果, SN-Netと比較して, 提案したLearngene Poolの有効性が検証された。

Recently, Stitchable Neural Networks (SN-Net) was proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challenges. 1) Stitching from multiple independently pre-trained anchors introduces high storage resource consumption. 2) SN-Net faces challenges to build smaller models for low resource constraints. 3) SN-Net uses an unlearned initialization method for stitch layers, limiting the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed as learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one pretrained large model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store more large models as SN-Net does, and after distilling, smaller learngene instances can be created to build small models to satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models to improve the performance of these models. Exhaustive experiments have been implemented and the results validate the effectiveness of the proposed Learngene Pool compared with SN-Net.

翻訳日:2023-12-12 19:03:45 公開日:2023-12-10

# MISCA:マルチインテント検出とスロット充填のためのインテント・スロット協調モデル

MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention ( http://arxiv.org/abs/2312.05741v1 )

ライセンス: Link先を確認

Thinh Pham and Chi Tran and Dat Quoc Nguyen

(参考訳) 複雑な現実の状況と関連性から,複数意図の検出やスロットの充填に関する研究が盛んになっている。グラフに基づくジョイントモデルである最近の高度なアプローチは、まだ2つの潜在的な問題に直面しているかもしれない。 (i)事前の意図とスロットに基づいてグラフを構築することにより生じる不確実性は、意図とスロットの相関情報を不正確なラベルノードの宛先に転送することができる。 (ii)トークン単位のインテント投票ごとに複数のインテントラベルを直接組み込むと、スロット予測が誤っている可能性があるため、全体的なパフォーマンスが損なわれる可能性がある。この2つの問題に対処するため,我々はmiscaというジョイントモデルを提案する。我々のMISCAは、意図-スロットのコアテンション機構とラベルアテンション機構の基盤層を導入している。これらのメカニズムによりmiscaはインテントとスロットラベルの相関を効果的に捉え、グラフ構築の必要性をなくすことができる。インテントからスロットへ、スロットからインテントへ、複数のレベルのラベル固有の表現を通して、トークンレベルのインテント情報に頼ることなく、双方向の相関情報の転送も行う。実験の結果、MISCAは従来のモデルよりも優れており、MixATISとMixSNIPSの2つのベンチマークデータセット上で、新しい最先端の全体的な精度性能を実現している。これは注意機構の有効性を強調します。

The research study of detecting multiple intents and filling slots is becoming more popular because of its relevance to complicated real-world situations. Recent advanced approaches, which are joint models based on graphs, might still face two potential issues: (i) the uncertainty introduced by constructing graphs based on preliminary intents and slots, which may transfer intent-slot correlation information to incorrect label node destinations, and (ii) direct incorporation of multiple intent labels for each token w.r.t. token-level intent voting might potentially lead to incorrect slot predictions, thereby hurting the overall performance. To address these two issues, we propose a joint model named MISCA. Our MISCA introduces an intent-slot co-attention mechanism and an underlying layer of label attention mechanism. These mechanisms enable MISCA to effectively capture correlations between intents and slot labels, eliminating the need for graph construction. They also facilitate the transfer of correlation information in both directions: from intents to slots and from slots to intents, through multiple levels of label-specific representations, without relying on token-level intent information. Experimental results show that MISCA outperforms previous models, achieving new state-of-the-art overall accuracy performances on two benchmark datasets MixATIS and MixSNIPS. This highlights the effectiveness of our attention mechanisms.

翻訳日:2023-12-12 19:03:18 公開日:2023-12-10

# GAMC:マスキング付きグラフオートエンコーダを用いたフェイクニュース検出のための教師なし手法

GAMC: An Unsupervised Method for Fake News Detection using Graph Autoencoder with Masking ( http://arxiv.org/abs/2312.05739v1 )

ライセンス: Link先を確認

Shu Yin, Chao Gao, Zhen Wang

(参考訳) ソーシャルメディアの普及に伴い、偽ニュースの拡散は重大な懸念となり、大衆の認識を誤解させ、社会的安定に影響を及ぼす可能性がある。 cnn、rnn、bertのようなトランスフォーマーモデルのようなディープラーニング手法は偽ニュースの検出を強化しているが、主にニュース伝播中にソーシャルコンテキストを見下ろすコンテンツに焦点を当てている。グラフベースのテクニックはこのソーシャルコンテキストを取り入れているが、大きなラベル付きデータセットの必要性によって制限されている。本稿では,マスキングとコントラスト学習を備えたグラフオートエンコーダを用いて,教師なしの偽ニュース検出手法であるGAMCを紹介する。情報伝達のコンテキストと内容を自己教師付き信号として活用することにより,ラベル付きデータセットの要求を無効化する。元のニュース伝搬グラフを拡張し、それらをグラフエンコーダでエンコードし、グラフデコーダを用いて再構成する。再構成誤差やコントラスト損失を含むユニークな複合損失関数を設計する。この手法の貢献は、偽ニュースの検出に自己教師付き学習を導入し、2つの異なる損失を統合するグラフオートエンコーダを提案し、実際のデータセット実験を通じてアプローチの有効性を検証することである。

With the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Although deep learning methods like CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection, they primarily focus on content, overlooking social context during news propagation. Graph-based techniques have incorporated this social context but are limited by the need for large labeled datasets. Addressing these challenges, this paper introduces GAMC, an unsupervised fake news detection technique using the Graph Autoencoder with Masking and Contrastive learning. By leveraging both the context and content of news propagation as self-supervised signals, our method negates the requirement for labeled datasets. We augment the original news propagation graph, encode these with a graph encoder, and employ a graph decoder for reconstruction. A unique composite loss function, including reconstruction error and contrast loss, is designed. The method's contributions are: introducing self-supervised learning to fake news detection, proposing a graph autoencoder integrating two distinct losses, and validating our approach's efficacy through real-world dataset experiments.

翻訳日:2023-12-12 19:02:48 公開日:2023-12-10

# FedReverse: 多人数可逆ディープニューラルネットワークのウォーターマーキング

FedReverse: Multiparty Reversible Deep Neural Network Watermarking ( http://arxiv.org/abs/2312.05738v1 )

ライセンス: Link先を確認

Junlong Mao, Huiyi Tang, Yi Zhang, Fengxia Liu, Zhiyong Zheng and Shanxiang Lyu

(参考訳) 商用アプリケーションにおけるディープニューラルネットワーク(DNN)の普及は急速に進んでいる。同時に、DNNモデルの複雑化とコストの増大により、これらの訓練されたモデルに関連する知的財産の保護を取り巻く緊急性が高まっている。この点において、DNNの透かしは重要な保護技術として現れている。本稿では,性能への影響を最小限に抑えつつ,堅牢な著作権保護のための多元的可逆的透かし手法であるFedReverseを提案する。既存の方法とは異なり、FedReverseはモデルトレーニング後に複数のパーティから共同ウォーターマークを埋め込み、個々の著作権クレームを保証できる。さらに、FedReverseは可逆であり、全クライアントの同意を得て完全な透かしを削除することができる。 FedReverseは完璧なカバーを示し、透かしのある内容の観察が隠された透かしに関する情報を明かさないようにする。さらに、既知のオリジナル攻撃(KOA)に対する抵抗を示し、攻撃者がウォーターマークを偽造したり、鍵を推測したりするのは非常に困難である。本稿では,MNISTデータセットに基づいて学習した多層パーセプトロン(MLP)と畳み込みニューラルネットワーク(CNN)の総合シミュレーションを通じてFedReverseを評価する。シミュレーションは、FedReverseの堅牢性、可逆性、および様々な埋め込みパラメータと複数のクライアントシナリオにわたるモデルの精度に最小限の影響を示す。

The proliferation of Deep Neural Networks (DNN) in commercial applications is expanding rapidly. Simultaneously, the increasing complexity and cost of training DNN models have intensified the urgency surrounding the protection of intellectual property associated with these trained models. In this regard, DNN watermarking has emerged as a crucial safeguarding technique. This paper presents FedReverse, a novel multiparty reversible watermarking approach for robust copyright protection while minimizing performance impact. Unlike existing methods, FedReverse enables collaborative watermark embedding from multiple parties after model training, ensuring individual copyright claims. In addition, FedReverse is reversible, enabling complete watermark removal with unanimous client consent. FedReverse demonstrates perfect covering, ensuring that observations of watermarked content do not reveal any information about the hidden watermark. Additionally, it showcases resistance against Known Original Attacks (KOA), making it highly challenging for attackers to forge watermarks or infer the key. This paper further evaluates FedReverse through comprehensive simulations involving Multi-layer Perceptron (MLP) and Convolutional Neural Networks (CNN) trained on the MNIST dataset. The simulations demonstrate FedReverse's robustness, reversibility, and minimal impact on model accuracy across varying embedding parameters and multiple client scenarios.

翻訳日:2023-12-12 19:02:13 公開日:2023-12-10

# ASWT-SGNN:適応スペクトルウェーブレット変換に基づく自己教師付きグラフニューラルネットワーク

ASWT-SGNN: Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network ( http://arxiv.org/abs/2312.05736v1 )

ライセンス: Link先を確認

Ruyue Liu, Rong Yin, Yong Liu, Weiping Wang

(参考訳) グラフ比較学習(GCL)は、グラフ畳み込みネットワーク(GCN)と比較学習の利点を組み合わせた自己教師型手法であり、ノード表現の学習に有望である。しかし、これらの手法で使用されるGCNエンコーダは、空間的およびスペクトル的局所化トレードオフを含む不確実性原理によって本質的に制限されている固定グラフ表現を学習するためにフーリエ変換に依存する。本稿では,既存手法の柔軟性と計算コストのかかる固有分解と高密度行列乗算を克服するために,適応スペクトルウェーブレット変換を用いた自己教師付きグラフニューラルネットワーク(ASWT-SGNN)を提案する。フィルタ関数を近似するためにスペクトル適応多項式を用い,コントラスト損失を用いてウェーブレットを最適化する。この設計により、スペクトル領域と空間領域の両方で局所フィルタを作成でき、様々なスケールで近隣情報の柔軟な集約を可能にし、局所情報とグローバル情報の制御された変換を容易にする。従来の手法と比較して,提案手法は計算複雑性を低減し,グラフサイズに制約されたグラフ畳み込みニューラルネットワークの制限に対処する。 8つのベンチマークデータセットの大規模な実験により、ASWT-SGNNは高密度スペクトル領域のフィルタ関数を正確に近似し、コストの高い固有分解を避けることを示した。さらに、ASWT-SGNNはノード分類タスクにおける最先端モデルに匹敵する性能を達成する。

Graph Comparative Learning (GCL) is a self-supervised method that combines the advantages of Graph Convolutional Networks (GCNs) and comparative learning, making it promising for learning node representations. However, the GCN encoders used in these methods rely on the Fourier transform to learn fixed graph representations, which is inherently limited by the uncertainty principle involving spatial and spectral localization trade-offs. To overcome the inflexibility of existing methods and to avoid computationally expensive eigen-decomposition and dense matrix multiplication, this paper proposes an Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network (ASWT-SGNN). The proposed method employs spectral adaptive polynomials to approximate the filter function and optimizes the wavelet using a contrastive loss. This design enables the creation of local filters in both spectral and spatial domains, allowing flexible aggregation of neighborhood information at various scales and facilitating controlled transformation between local and global information. Compared to existing methods, the proposed approach reduces computational complexity and addresses the limitation of graph convolutional neural networks, which are constrained by graph size and lack flexible control over the neighborhood aspect. Extensive experiments on eight benchmark datasets demonstrate that ASWT-SGNN accurately approximates the filter function in high-density spectral regions, avoiding costly eigen-decomposition. Furthermore, ASWT-SGNN achieves performance comparable to state-of-the-art models in node classification tasks.

翻訳日:2023-12-12 19:01:40 公開日:2023-12-10
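
The abstract above mentions approximating a spectral filter function with polynomials so that eigen-decomposition can be avoided. As a rough illustration of that general idea only, the sketch below uses a fixed Chebyshev expansion of a heat-kernel filter. The Chebyshev basis, the heat kernel, and all function names are assumptions made for this example; they are not the paper's "spectral adaptive polynomials" or its learned wavelets.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (eigenvalues lie in [0, 2])."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def chebyshev_coeffs(g, order, lam_max=2.0, n_quad=200):
    """Chebyshev coefficients of g on [0, lam_max] via Chebyshev-Gauss quadrature."""
    theta = np.pi * (np.arange(n_quad) + 0.5) / n_quad
    lam = lam_max / 2.0 * (np.cos(theta) + 1.0)  # map [-1, 1] onto [0, lam_max]
    vals = g(lam)
    return np.array([2.0 / n_quad * np.sum(vals * np.cos(j * theta))
                     for j in range(order + 1)])

def apply_filter(lap, x, coeffs, lam_max=2.0):
    """Apply g(L) to x using only matrix-vector products (requires order >= 1)."""
    lap_s = (2.0 / lam_max) * lap - np.eye(lap.shape[0])  # spectrum shifted to [-1, 1]
    t_prev, t_curr = x, lap_s @ x
    out = 0.5 * coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        # Chebyshev recurrence: T_{k+1} = 2 * L_s * T_k - T_{k-1}
        t_prev, t_curr = t_curr, 2.0 * (lap_s @ t_curr) - t_prev
        out = out + c * t_curr
    return out
```

On a small graph the result can be compared against the exact filter obtained from `np.linalg.eigh`. On large sparse graphs, the appeal of this kind of scheme is that only repeated Laplacian-vector products are needed.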

# ディープラーニングを用いたマルチモーダル会話感情認識に関する総合的研究

A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning ( http://arxiv.org/abs/2312.05735v1 )

ライセンス: Link先を確認

Yuntao Shou, Tao Meng, Wei Ai, Nan Yin, Keqin Li

(参考訳) マルチモーダル会話感情認識(MCER)は、会話シーンにおけるテキスト、音声、視覚情報を用いて話者の感情状態を認識し、追跡することを目的としている。 MCER問題の解析と研究は、感情コンピューティング、インテリジェントなレコメンデーション、人間とコンピュータの相互作用分野において重要である。従来の単一発話のマルチモーダル感情認識や単一モーダル会話感情認識とは異なり、mcerはより複雑な感情相互作用関係を扱う必要があるより難しい問題である。重要な問題は、感情的相互作用関係に基づくマルチモーダル特徴融合のための一貫性と補完的意味論の学習である。この問題を解決するために、深層学習技術に基づくmcerに関する広範な研究を行ったが、モデリング手法の体系的なレビューが不足している。したがって、MCERのディープラーニングにおける最近の進歩のタイムリーで包括的な概要は、学術や産業にとって非常に重要である。本研究では,mcerモデリング手法の包括的概要と,mcer手法を4つのカテゴリ(文脈自由モデリング,逐次文脈モデリング,話者微分モデリング,話者関係モデリング)に大まかに分割した。さらに,MCERが公開している一般的なデータセット,マルチモーダル特徴抽出手法,アプリケーション領域,既存の課題,今後の開発方向性についても論じる。我々は、MCER研究者が感情認識の現在の研究状況を理解し、いくつかのインスピレーションを与え、より効率的なモデルを開発するのに役立つことを期待している。

Multi-modal conversation emotion recognition (MCER) aims to recognize and track the speaker's emotional state using text, speech, and visual information in the conversation scene. Analyzing and studying MCER issues is significant to the fields of affective computing, intelligent recommendation, and human-computer interaction. Unlike traditional single-utterance multi-modal emotion recognition or single-modal conversation emotion recognition, MCER is a more challenging problem that needs to deal with more complex emotional interaction relationships. The critical issue is learning consistent and complementary semantics for multi-modal feature fusion based on emotional interaction relationships. To solve this problem, researchers have conducted extensive work on MCER based on deep learning technology, but there is still a lack of a systematic review of the modeling methods. Therefore, a timely and comprehensive overview of recent advances in deep learning for MCER is of great significance to academia and industry. In this survey, we provide a comprehensive overview of MCER modeling methods and roughly divide them into four categories, i.e., context-free modeling, sequential context modeling, speaker-differentiated modeling, and speaker-relationship modeling. In addition, we discuss popular publicly available MCER datasets, multi-modal feature extraction methods, application areas, existing challenges, and future development directions. We hope that our review can help MCER researchers understand the current research status in emotion recognition, provide some inspiration, and develop more efficient models.

翻訳日:2023-12-12 19:00:34 公開日:2023-12-10

# DevBotsはAPIを共同設計できる

DevBots can co-design APIs ( http://arxiv.org/abs/2312.05733v1 )

ライセンス: Link先を確認

Vinicius Soares Silva Marques

(参考訳) DevBotsは、ソフトウェア開発をサポートするためにさまざまなタスクを実行する自動化ツールである。それらは増加傾向にあり、繰り返しタスクの自動化やコードジェネレータ、要件の抽出やアーキテクチャ定義のコラボレータとして、リポジトリで使用されている。本研究では,24の論文を分析し,ソフトウェア開発におけるDevBotsの利用の現状を調査した。具体的には,その特性の理解,ユースケースの特定,DevBotsと会話型ソフトウェア開発の関係の把握を試み,プロンプトエンジニアリングによって人間の開発者とボットのコラボレーションを実現する方法について議論した。さらに,人間の設計者とDevBotsとの協調的なAPI設計にプロンプトエンジニアリングを適用することで対処すべきギャップを特定し,検索拡張生成(Retrieval Augmented Generation)を用いる場合と用いない場合のどちらのアプローチがより適切かを評価する実験を提案した。結論として、DevBotsは人間のAPI設計者と協力できるが、2つのアプローチにはそれぞれ利点と欠点がある。

DevBots are automated tools that perform various tasks in order to support software development. They are a growing trend and have been used in repositories to automate repetitive tasks, as code generators, and as collaborators in eliciting requirements and defining architectures. In this study, we analyzed 24 articles to investigate the state of the art of using DevBots in software development, trying to understand their characteristics, identify use cases, learn the relationship between DevBots and conversational software development, and discuss how prompt engineering can enable collaboration between human developers and bots. Additionally, we identified a gap to address by applying prompt engineering to collaborative API design between human designers and DevBots, and proposed an experiment to assess which approach, with or without Retrieval Augmented Generation, is more suitable. Our conclusion is that DevBots can collaborate with human API designers, but both approaches have advantages and disadvantages.

翻訳日:2023-12-12 18:59:42 公開日:2023-12-10

# 「『一般化ジェームスの有効ハミルトニアン法』へのコメント」への回答

Reply to "Comment on `Generalized James' effective Hamiltonian method' " ( http://arxiv.org/abs/2312.05732v1 )

ライセンス: Link先を確認

Wenjun Shao, Chunfeng Wu, and Xun-Li Feng

(参考訳) 先行するコメント [1] では、元の論文 [2] で得られた三次のハミルトニアンは時間依存性を考慮した一般的な状況ではエルミートではなく、有効三次展開の導出方法もあまり厳密ではないと主張された。これに対し、以下の3点を強調する。第一に、我々の論文で与えた三次のハミルトニアンは、そこで述べた条件の下で厳密にエルミートである。第二に、一般化有効ハミルトニアンを導出するために採用した反復法はダイソン級数と等価であり、その正しさは保証される。第三に、打ち切られた有効ハミルトニアンは、コメントで示されたように時間依存の状況下では確かに非エルミートであるが、それは非ユニタリな打ち切りダイソン級数に正確に対応する。打ち切りダイソン級数が時間依存摂動論で広く利用されてきたことを考えれば、非エルミートな打ち切り有効ハミルトニアンも有効ハミルトニアンの近似として扱うことができると我々は考える。

In the preceding Comment [1] it was claimed that the third-order Hamiltonian obtained in our original paper [2] is not Hermitian in general situations when time dependence is considered, and that the way of deriving the effective third-order expansion is not very rigorous. In reply to the Comment, we emphasize the following three points. First, the third-order Hamiltonian given in our paper is exactly Hermitian under the conditions mentioned there. Second, the iterative method adopted in our paper to derive the generalized effective Hamiltonian is equivalent to the Dyson series, and its correctness can thus be guaranteed. Third, although the truncated effective Hamiltonian is indeed non-Hermitian in the time-dependent situation presented in the Comment, it corresponds exactly to the non-unitary truncated Dyson series. Considering that the truncated Dyson series has been extensively utilized in time-dependent perturbation theory, in our opinion the non-Hermitian truncated effective Hamiltonian can still be treated as an approximation of the effective Hamiltonian.

翻訳日:2023-12-12 18:59:24 公開日:2023-12-10
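
The non-unitarity of a truncated Dyson series, which the reply above relies on, can be seen already at first order. The short derivation below is a standard textbook observation rather than the authors' own calculation. The symbols $H_I(t)$ (a Hermitian interaction-picture Hamiltonian) and $A(t)$ are notation introduced here for illustration.

```latex
\[
U^{(1)}(t) = 1 - \frac{i}{\hbar} A(t),
\qquad
A(t) = \int_0^t H_I(t_1)\, dt_1 = A^\dagger(t),
\]
\[
U^{(1)\dagger}(t)\, U^{(1)}(t)
= \Bigl(1 + \frac{i}{\hbar} A(t)\Bigr)\Bigl(1 - \frac{i}{\hbar} A(t)\Bigr)
= 1 + \frac{1}{\hbar^2} A(t)^2 \neq 1 .
\]
```

The deviation from unitarity is of second order in the interaction, one order beyond the truncation, which is why truncated series remain usable as approximations when the perturbation is weak.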

# 相関行列を用いたパラメタライズドステアリング基準

Parameterized steering criteria via correlation matrices ( http://arxiv.org/abs/2312.05729v1 )

ライセンス: Link先を確認

Qing-Hua Zhang, Lemin Lai, Shao-Ming Fei

(参考訳) 局所特殊ユニタリ群が与える相関行列に基づく任意の次元二成分系のステアビリティについて検討した。パラメータ化相関行列を用いた二部量子状態のステアリング基準のファミリについて述べる。これらのステアリング基準は、既存のステアリング基準よりも、よりステアブルな状態を検出する可能性がある。結果は詳細な例で示される。

We study the steerability for arbitrary dimensional bipartite systems based on the correlation matrices given by local special unitary groups. We present families of steering criteria for bipartite quantum states in terms of parameterized correlation matrices. We show that these steering criteria may detect more steerable states than the existing steering criteria. The results are illustrated by detailed examples.

翻訳日:2023-12-12 18:58:59 公開日:2023-12-10

# 説明一貫性チェックによるChatGPTによるWeb UIテストの修正

Guiding ChatGPT to Fix Web UI Tests via Explanation-Consistency Checking ( http://arxiv.org/abs/2312.05778v1 )

ライセンス: Link先を確認

Zhuolin Xu, Yuanzhang Lin, Qiushi Li and Shin Hwei Tan

(参考訳) Web UIの急速な進化は、UIテストの維持に時間と労力を要する。 Web UIテストの既存のテクニックは、古いものと一致する新しいWebページのターゲット要素を見つけることに重点を置いており、対応する壊れたステートメントを修復することができる。本稿では,初期局所マッチングに先行する web ui の修正手法を活用し,グローバルマッチングを行うために chatgpt を用いた最初の研究を行う。キーとなる洞察は、以前のテクニックにマッチする要素のリストが与えられたら、ChatGPTは言語理解を利用してグローバルなビューマッチングを実行し、そのコード生成モデルを使って壊れたステートメントを修正できるということです。本稿では,ChatGPTにおける幻覚を緩和するため,提案した結果が一致しているかどうかを判定する説明検証器を設計し,自己補正プロンプトを通じてChatGPTにヒントを提供し,その結果をさらに改善する。本稿では,ChatGPTで強化した手法により,既存のWebテスト修復手法の有効性が向上したことを示す。また、将来のweb uiテストの修復技術を改善する上で、いくつかの重要な知見を共有しています。

The rapid evolution of Web UIs makes maintaining UI tests costly in time and effort. Existing techniques in Web UI test repair focus on finding the target elements on the new web page that match the old ones so that the corresponding broken statements can be repaired. We present the first study that investigates the feasibility of using prior Web UI repair techniques for initial local matching and then using ChatGPT to perform global matching. Our key insight is that, given a list of elements matched by prior techniques, ChatGPT can leverage its language understanding to perform global view matching and use its code generation model to fix the broken statements. To mitigate hallucination in ChatGPT, we design an explanation validator that checks whether the provided explanation for the matching results is consistent, and provides hints to ChatGPT via a self-correction prompt to further improve its results. Our evaluation on a widely used dataset shows that the ChatGPT-enhanced techniques improve the effectiveness of existing Web test repair techniques. Our study also shares several important insights for improving future Web UI test repair techniques.

翻訳日:2023-12-12 18:52:30 公開日:2023-12-10

# ノイズクロスモーダルマッチングのための負の事前認識

Negative Pre-aware for Noisy Cross-modal Matching ( http://arxiv.org/abs/2312.05777v1 )

ライセンス: Link先を確認

Zhang Xu and Li Hao and Ye Mang

(参考訳) 雑音対応は認識と修正が難しいため,クロスモーダルノイズロバスト学習は難しい課題である。未解決ノイズの累積及び不可避負の影響により、既存の手法ではノイズが増大しても安定した性能を維持することはできない。本稿では,雑音の多い下流タスクにおける大規模視覚言語モデルファインチューニングのための,NPC(Negative Pre-aware Cross-modal)マッチングソリューションを提案する。 1) ノイズ認識と抵抗の2つの側面で特徴付けられる:(1) 従来の手法は、通常、ノイズサブセットを直接フィルタリングするが、各サンプルの負の影響を推定する。信頼できない修正結果を予測するための追加の補正機構は不要であり、自己補強誤差につながる。トレーニングプロセスにおける負の影響に応じて,各サンプルに信頼度重みを割り当てる。これにより、ノイズ蓄積を避けるために各試料の寄与を適応的に調整する。 2) ノイズの増加とともに安定した性能を維持するため, メモリバンクの維持によるDNNの記憶効果を利用する。具体的には、メモリエントリとして高信頼クリーンサンプルを選択するためにGMMを適用し、メモリエントリを使用して各サンプルの負の影響を推定する。クリーンサンプルはノイズの増加とともにGMMにより識別が容易であるため、メモリバンクは高いノイズ比で高い品質を維持することができる。ノイズサンプルに着目した補正機構に比べ、メモリバンクに基づく推定はより堅牢であり、ノイズの多いデータセットでモデル性能を安定させる。広汎な実験により,提案手法は雑音比の増加に伴うマッチング精度と性能安定性を著しく向上することが示された。我々のアプローチは最先端の手法を大きく上回っている。

Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain stable performance as the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for fine-tuning large visual-language models on noisy downstream tasks. It has two key features: (1) For noise recognition and resistance, whereas previous methods usually filter out a noise subset directly, we propose to estimate the negative impact of each sample. This does not need additional correction mechanisms, which may predict unreliable correction results and lead to self-reinforcing errors. We assign a confidence weight to each sample according to its negative impact in the training process. This adaptively adjusts the contribution of each sample to avoid noise accumulation. (2) To maintain stable performance with increasing noise, we exploit the memorization effect of DNNs by maintaining a memory bank. Specifically, we apply a GMM to select high-confidence clean samples as the memory entries, which are then used to estimate the negative impact of each sample. Since clean samples are more easily distinguished by the GMM as noise increases, the memory bank can still maintain high quality at a high noise ratio. Compared to correction mechanisms that focus on noisy samples, memory bank-based estimation is more robust, which keeps model performance stable on noisy datasets. Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability at increasing noise ratios. Our approach also surpasses state-of-the-art methods by a large margin.

翻訳日:2023-12-12 18:52:11 公開日:2023-12-10
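
The abstract above describes applying a GMM to pick high-confidence clean samples for a memory bank. A minimal sketch of that selection step alone is given below, assuming the common practice of fitting a two-component 1-D Gaussian mixture to per-sample losses and treating the low-mean component as "clean". The EM routine, the function names, and the 0.99 threshold are hypothetical choices for this example. The paper's negative-impact estimate and confidence weighting are not reproduced.

```python
import numpy as np

def clean_posterior(losses, n_iter=50):
    """Fit a 2-component 1-D GMM with EM; return P(clean | loss) for each sample."""
    x = np.asarray(losses, dtype=float)
    mu = np.array([x.min(), x.max()])      # start the two means at the extremes
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and variances
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    return resp[:, np.argmin(mu)]          # low-loss component is taken as "clean"

def select_memory_bank(losses, threshold=0.99):
    """Indices of samples whose clean-posterior exceeds the (assumed) threshold."""
    return np.flatnonzero(clean_posterior(losses) > threshold)
```

The posterior itself could also serve as a soft per-sample weight, but whether that matches the weighting used in NPC is not stated in the abstract.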

# 量子インターネットのためのセキュアかつ効率的な絡み合い分散プロトコル

Secure and Efficient Entanglement Distribution Protocol for Near-Term Quantum Internet ( http://arxiv.org/abs/2312.05775v1 )

ライセンス: Link先を確認

Nicholas Skjellum, Mohamed Shaban, and Muhammad Ismail

(参考訳) 量子情報技術は、コンピューティング、通信、セキュリティに革命をもたらす可能性がある。その可能性を十分に実現するためには、数百万の量子ビットを持つ量子プロセッサが必要である。したがって、分散量子コンピューティングが既存の短期量子プロセッサをより強力なリソースに活用できるようにするために、量子ネットワークを確立することが重要である。本稿では,量子リンクが限られている古典量子ネットワークにおいて,量子デバイス間の絡み合いを分散するプロトコルを提案する。提案プロトコルでは,バタフライネットワークにおいてエンタングルメントを効率的に分散するためにエンタングルメントスワッピングを用い,ネットワークボトルネックを克服しながら量子テレポーテーションを実現し,各ノードの量子ビット要求を最小化するために,古典的なネットワーク符号化を適用する。実験の結果,提案プロトコルはネットワークサイズと線形にスケールする量子資源を必要とし,各ノードは固定数の量子ビットしか必要としないことがわかった。最大3つのトランシーバペアの小さなネットワークサイズの場合、提案プロトコルは17%少ないキュービットリソースを使用し、精度を8.8%向上し、35%高速なシミュレーション時間でベンチマークを上回っている。ネットワークサイズが大きいほど、割合が大幅に向上する。また,回転による量子状態エンコードを用いた悪質な絡み合いに対する絡み合い分布を確保するプロトコルを提案する。解析の結果,この手法は通信オーバーヘッドを必要とせず,悪意のあるノードが量子状態を取得する確率を7.2%に低下させることがわかった。得られた結果は、高度にスケーラブルで効率的でセキュアな短期量子インターネットを実現するプロトコルに向けられている。

Quantum information technology has the potential to revolutionize computing, communications, and security. To fully realize its potential, quantum processors with millions of qubits are needed, which is still far from being accomplished. Thus, it is important to establish quantum networks to enable distributed quantum computing to leverage existing and near-term quantum processors into more powerful resources. This paper introduces a protocol to distribute entanglements among quantum devices within classical-quantum networks with limited quantum links, enabling more efficient quantum teleportation in near-term hybrid networks. The proposed protocol uses entanglement swapping to distribute entanglements efficiently in a butterfly network, then classical network coding is applied to enable quantum teleportation while overcoming network bottlenecks and minimizing qubit requirements for individual nodes. Experimental results show that the proposed protocol requires quantum resources that scale linearly with network size, with individual nodes only requiring a fixed number of qubits. For small network sizes of up to three transceiver pairs, the proposed protocol outperforms the benchmark by using 17% fewer qubit resources, achieving 8.8% higher accuracy, and with a 35% faster simulation time. The percentage improvement increases significantly for large network sizes. We also propose a protocol for securing entanglement distribution against malicious entanglements using quantum state encoding through rotation. Our analysis shows that this method requires no communication overhead and reduces the chance of a malicious node retrieving a quantum state to 7.2%. The achieved results point toward a protocol that enables a highly scalable, efficient, and secure near-term quantum Internet.

翻訳日:2023-12-12 18:51:46 公開日:2023-12-10
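
The abstract above relies on entanglement swapping to distribute entanglement through intermediate nodes. The sketch below is a small state-vector illustration of a single swap, assuming ideal, noiseless Bell pairs and one particular Bell-measurement outcome. The butterfly topology, classical network coding, and rotation-based encoding of the proposed protocol are not modeled, and the function names are hypothetical.

```python
import numpy as np

# |Phi+> = (|00> + |11>) / sqrt(2), stored as a 2x2 table of amplitudes.
PHI_PLUS = np.array([[1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2.0)

def swap_entanglement(pair_ab, pair_bc, bell_outcome=PHI_PLUS):
    """Project the relay's two qubits onto a Bell state.

    pair_ab: amplitudes of the (A, B1) pair; pair_bc: amplitudes of the (B2, C) pair.
    Returns (probability of this outcome, normalized post-measurement state of A and C).
    """
    # Joint four-qubit state with indices (a, b1, b2, c).
    joint = np.einsum("ab,cd->abcd", pair_ab, pair_bc)
    # Contract the relay qubits (b1, b2) with the measured Bell state.
    unnormalized = np.einsum("bc,abcd->ad", bell_outcome.conj(), joint)
    prob = float(np.sum(np.abs(unnormalized) ** 2))
    return prob, unnormalized / np.sqrt(prob)

def fidelity(state, target):
    """Squared overlap |<target|state>|^2 between two pure two-qubit states."""
    return float(np.abs(np.sum(target.conj() * state)) ** 2)
```

For the other three Bell outcomes, the end nodes would share a different Bell state, and a Pauli correction communicated over a classical channel would map it back to the intended one.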

# 量子ネットワークのためのセキュア量子アイデンティティ認証プロトコル

Secured Quantum Identity Authentication Protocol for Quantum Networks ( http://arxiv.org/abs/2312.05774v1 )

ライセンス: Link先を確認

Mohamed Shaban and Muhammad Ismail

(参考訳) 量子インターネットは、量子絡み合いと重ね合わせの原理を利用して、非並列レベルのセキュリティと効率的な計算を促進する通信技術の顕著な進歩を示している。量子通信は量子絡み合いを利用して実現することができる。 2つの実体間の絡み合った対の交換によって、量子通信は実現可能となり、量子テレポーテーションのプロセスによって実現される。チャネルの損失の性質と送信光子の指数的デコヒーレンスを考えると、中間ノードの集合は量子リピータとして機能し、2つの遠方のノードを直接絡み合わせることができる。このような量子リピータは悪意があり、2つの通信ノード間で交換された量子情報の秘密性を危うくすることができる。そこで本稿では,量子ネットワークを悪質な絡み合いから保護する量子id認証プロトコルを提案する。既存のプロトコルとは異なり、提案された量子認証プロトコルは共有秘密鍵の定期的な更新を必要としない。シミュレーションの結果,提案プロトコルは,平均4回の認証ラウンドの後に,100%確率で悪質な絡み合いを検出することができた。

Quantum Internet signifies a remarkable advancement in communication technology, harnessing the principles of quantum entanglement and superposition to facilitate unparalleled levels of security and efficient computations. Quantum communication can be achieved through the utilization of quantum entanglement. Through the exchange of entangled pairs between two entities, quantum communication becomes feasible, enabled by the process of quantum teleportation. Given the lossy nature of the channels and the exponential decoherence of the transmitted photons, a set of intermediate nodes can serve as quantum repeaters to perform entanglement swapping and directly entangle two distant nodes. Such quantum repeaters may be malicious and by setting up malicious entanglements, intermediate nodes can jeopardize the confidentiality of the quantum information exchanged between the two communication nodes. Hence, this paper proposes a quantum identity authentication protocol that protects quantum networks from malicious entanglements. Unlike the existing protocols, the proposed quantum authentication protocol does not require periodic refreshments of the shared secret keys. Simulation results demonstrate that the proposed protocol can detect malicious entanglements with a 100% probability after an average of 4 authentication rounds.

翻訳日:2023-12-12 18:51:17 公開日:2023-12-10

# コードリポジトリのためのコンテキスト対応コード生成フレームワーク:ローカル、グローバル、サードパーティライブラリの認識

Context-Aware Code Generation Framework for Code Repositories: Local, Global, and Third-Party Library Awareness ( http://arxiv.org/abs/2312.05772v1 )

ライセンス: Link先を確認

Dianshu Liao, Shidong Pan, Qing Huang, Xiaoxue Ren, Zhenchang Xing, Huan Jin, Qinying Li

(参考訳) コード生成ツールは、ソフトウェア開発プロセスの開発者を助けるために不可欠です。既存のツールはしばしば作業コンテキスト、すなわちコードリポジトリと切り離され、生成されたコードは人間の開発者と似ていない。本稿では,コードリポジトリ内の情報を利用して,論理エラーやコードの冗長性,ライブラリ関連の互換性問題などの少ないコードを生成するための,新しいコード生成フレームワークである \textbf{$a^3$}-codgenを提案する。本稿では,現在のコードファイルからのローカル認識情報,他のコードファイルからのグローバル認識情報,サードパーティライブラリ情報の3つのカテゴリを識別する。結果は、 \textbf{$a^3$}-codgenフレームワークを採用することで、コードのリポジトリ情報をllmに抽出、融合、フィードし、より正確で効率的で再利用可能なコードを生成することに成功した。我々のフレームワークの有効性は、人間の開発者に比べて高い再利用率のコードを生成することでさらに強調されている。この研究はコード生成の分野に大きく貢献し、開発者が実際にソフトウェア開発の進化する要求に対処するためのより強力なツールを提供する。

Code generation tools are essential to help developers in the software development process. Existing tools are often disconnected from the working context, i.e., the code repository, so the generated code differs from what human developers would write. In this paper, we propose a novel code generation framework, dubbed $A^3$-CodGen, to harness information within the code repository to generate code with fewer logical errors, less code redundancy, and fewer library-related compatibility issues. We identify three categories of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the $A^3$-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate compared to human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands of software development in practice.

翻訳日:2023-12-12 18:50:56 公開日:2023-12-10

# メタラーニングにおけるタスク交絡因子のハッキング

Hacking Task Confounder in Meta-Learning ( http://arxiv.org/abs/2312.05771v1 )

ライセンス: Link先を確認

Jingyao Wang, Wenwen Qiang, Yi Ren, Zeen Song, Xingzhe Su, Changwen Zheng

(参考訳) メタ学習は、様々なタスクからメタ知識を学習することで、新しいタスクへの迅速な一般化を可能にする。モデルが1つのトレーニングバッチで学習するタスクが多ければ多いほど、より豊かな知識が得られ、一般化性能が向上すると直感的に仮定される。しかし、この直感に反して、我々の実験は予期せぬ結果を示した: 1つのバッチにより多くのタスクを追加すると、実際には一般化性能が低下する。この予期せぬ現象を説明するために,構造因果モデル(SCM)を構築して因果分析を行う。本研究は,メタ学習においてタスク固有の因果要因とラベルの間に擬似相関が存在することを明らかにする。さらに、交絡因子はバッチごとに異なる。これらの交絡因子を「Task Confounders(タスク交絡因子)」と呼ぶ。この知見に基づいて,タスク交絡因子の排除を目的としたプラグアンドプレイのメタ学習因果表現学習器(MetaCRL)を提案する。これは複数のタスクから分離された因果因子をエンコードし、不変性に基づくバイレベル最適化機構を利用してメタ学習における因果性を保証する。様々なベンチマークデータセットに対する大規模な実験により、我々の手法がSOTA(State-of-the-art)の性能を達成することを示す。

Meta-learning enables rapid generalization to new tasks by learning meta-knowledge from a variety of tasks. It is intuitively assumed that the more tasks a model learns in one training batch, the richer the knowledge it acquires, leading to better generalization performance. However, contrary to this intuition, our experiments reveal an unexpected result: adding more tasks within a single batch actually degrades the generalization performance. To explain this unexpected phenomenon, we construct a Structural Causal Model (SCM) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across batches. We refer to these confounding factors as "Task Confounders". Based on this insight, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled causal factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.

翻訳日:2023-12-12 18:50:36 公開日:2023-12-10

# ネットワークおよび真のネットワーク量子ステアリングの検出

Detection of Network and Genuine Network Quantum Steering ( http://arxiv.org/abs/2312.05769v1 )

ライセンス: Link先を確認

Zhihua Chen, Kai Wu, Shao-Ming Fei

(参考訳) 量子ネットワーク相関は、長距離量子通信、量子暗号、分散量子コンピューティングにおいて重要な役割を果たす。一般に、非局所性、絡み合い、操舵などの多部量子ネットワークの相関を特徴付けることは極めて困難である。本稿では,スターネットワーク構成の確率の観点から,ネットワークと真のネットワーク量子ステアリングモデルを提案する。線形および非線形の不等式が導出され、中央のパーティが1つの固定された測定を行うときに、ネットワークと真のネットワーク量子ステアリングを検出する。提案手法は,n局所量子ネットワークの破れよりも多くの量子ネットワークステアリングを検出できることを示す。さらに,biseparable assemblagesはスターネットワーク構成において真のネットワークステアリングを示すことができることを示した。

Quantum network correlations play significant roles in long-distance quantum communication, quantum cryptography, and distributed quantum computing. Generally, it is very difficult to characterize multipartite quantum network correlations such as nonlocality, entanglement, and steering. In this paper, we propose network and genuine network quantum steering models from the aspect of probabilities in star network configurations. Linear and nonlinear inequalities are derived to detect network and genuine network quantum steering when the central party performs one fixed measurement. We show that our criteria can detect more quantum network steering than criteria based on the violation of n-locality in quantum networks. Moreover, it is shown that biseparable assemblages can demonstrate genuine network steering in star network configurations.

翻訳日:2023-12-12 18:50:17 公開日:2023-12-10

# Anomaly Diffusion:拡散モデルを用いたFew-Shot Anomaly Image Generation

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model ( http://arxiv.org/abs/2312.05767v1 )

ライセンス: Link先を確認

Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, Chengjie Wang

(参考訳) 異常検査は工業生産において重要な役割を果たす。既存の異常検査手法は、異常データ不足のため性能に制限がある。異常発生法は異常データを強化するために提案されているが、生成した異常とマスクの間の不正確さや不正確さに苦しむかのどちらかである。そこで本研究では, 大規模データセットから学習した潜在拡散モデルの強い先行情報を利用して, マイノリティトレーニングデータに基づく生成信頼性を向上させる, 新たな拡散型少数ショット異常生成モデルであるanomalydiffusionを提案する。まず、学習可能な異常埋め込みと、異常マスクから符号化された空間埋め込みからなり、異常情報を異常な外観と位置情報に切り離す空間異常埋め込みを提案する。さらに, 生成した異常と異常マスクとの整合性を改善するために, 適応的注意再重み付け機構を導入する。生成した異常画像と正常なサンプルとの差に基づいて、モデルを動的に誘導し、あまり目立たない生成異常の領域に焦点を合わせることにより、正確に一致した異常画像・マスク対を生成することができる。広範な実験により,本モデルが実効性と多様性において最先端手法を著しく上回り,下流異常検査タスクの性能を効果的に向上することを示した。コードとデータはhttps://github.com/sjtuplayer/anomalydiffusionで入手できる。

Anomaly inspection plays an important role in industrial manufacturing. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they suffer from either poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address these problems, we propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model, which utilizes the strong prior information of a latent diffusion model learned from a large-scale dataset to enhance generation authenticity with few-shot training data. First, we propose Spatial Anomaly Embedding, which consists of a learnable anomaly embedding and a spatial embedding encoded from an anomaly mask, disentangling the anomaly information into anomaly appearance and location information. Moreover, to improve the alignment between the generated anomalies and the anomaly masks, we introduce a novel Adaptive Attention Re-weighting Mechanism. Based on the disparities between the generated anomaly image and the normal sample, it dynamically guides the model to focus more on the areas with less noticeable generated anomalies, enabling generation of accurately matched anomalous image-mask pairs. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art methods in generation authenticity and diversity, and effectively improves the performance of downstream anomaly inspection tasks. The code and data are available at https://github.com/sjtuplayer/anomalydiffusion.

翻訳日:2023-12-12 18:50:04 公開日:2023-12-10

# 階層的推論による多元的法的判断予測

Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning ( http://arxiv.org/abs/2312.05762v1 )

ライセンス: Link先を確認

Yougang Lyu, Jitai Hao, Zihan Wang, Kai Zhao, Shen Gao, Pengjie Ren, Zhumin Chen, Fang Wang, Zhaochun Ren

(参考訳) 刑事事実記述における複数の被告は一般に複雑な相互作用を示しており、単一の被告に対する判決結果(例えば、法律記事、訴追、罰則)の予測に焦点を当てた既存の法的判断予測(ljp)手法ではうまく扱えない。この問題に対処するために,マルチディペンダント LJP の課題を提案し,マルチディペンダント事件の各被告に対する判断結果を自動予測することを目的とした。マルチディペンダント LJP の課題は,(1) 各被告の識別不能な判断結果, (2) 訓練と評価のための実世界のデータセットの欠如である。第1の課題に取り組むために,多元的判断プロセスを階層的推論連鎖として定式化し,階層的推論連鎖に従う階層的推論ネットワーク(hrn)と呼ばれる多元的ljp法を導入する。第2の課題に取り組むために,現実のマルチディペンダント LJP データセット,すなわち MultiLJP を収集し,今後の研究を加速する。 MultiLJPの大規模実験により提案したHRNの有効性が検証された。

Multiple defendants in a criminal fact description generally exhibit complex interactions, and cannot be well handled by existing Legal Judgment Prediction (LJP) methods which focus on predicting judgment results (e.g., law articles, charges, and terms of penalty) for single-defendant cases. To address this problem, we propose the task of multi-defendant LJP, which aims to automatically predict the judgment results for each defendant of multi-defendant cases. Two challenges arise with the task of multi-defendant LJP: (1) indistinguishable judgment results among various defendants; and (2) the lack of a real-world dataset for training and evaluation. To tackle the first challenge, we formalize the multi-defendant judgment process as hierarchical reasoning chains and introduce a multi-defendant LJP method, named Hierarchical Reasoning Network (HRN), which follows the hierarchical reasoning chains to determine criminal relationships, sentencing circumstances, law articles, charges, and terms of penalty for each defendant. To tackle the second challenge, we collect a real-world multi-defendant LJP dataset, namely MultiLJP, to accelerate the relevant research in the future. Extensive experiments on MultiLJP verify the effectiveness of our proposed HRN.

翻訳日:2023-12-12 18:49:38 公開日:2023-12-10

# qmgeo:混合切断幾何分布を用いた確率量子化による微分プライベートフェデレート学習

QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution ( http://arxiv.org/abs/2312.05761v1 )

ライセンス: Link先を確認

Zixi Wang and M. Cenk Gursoy

(参考訳) フェデレートラーニング(FL)は、複数のユーザがパラメータサーバの調整の下でのみモデル更新を送信し、データセットをローカルに保つことで、グローバル機械学習(ML)モデルを共同でトレーニングすることを可能にするフレームワークである。このような分散フレームワークの重要な動機の1つは、ユーザにプライバシ保証を提供することである。しかし、ユーザのデータセットをローカルに保存することは、プライバシには不十分であることが示されている。フレームワークにランダム性を導入することで、証明可能なプライバシー保証を提供するために、いくつかの差分プライバシー(DP)機構が提案されている。 FLフレームワークは、特に機械学習モデルが複雑さとサイズを増すにつれて、通信効率の課題にも直面する。量子化は一般的に利用される手法であり、基礎となる情報の圧縮表現を伝送することで通信コストを削減する。 FLにおけるDPと量子化の研究はいくつかあるが、プライバシ保証の提供における量子化手法の潜在的貢献は、まだ広く分析されていない。本稿では,混合幾何分布を用いて,付加雑音を伴わずにdpの提供に必要なランダム性を導入する新しい確率的量子化法を提案する。我々は,フレームワークの収束解析を行い,その性能を実証研究する。

Federated learning (FL) is a framework that allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while keeping their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, keeping the users' datasets local has been shown to be insufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and the majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly utilized method that reduces the communication cost by transmitting a compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone in providing privacy guarantees has not yet been extensively analyzed. In this paper, we present a novel stochastic quantization method that utilizes a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide a convergence analysis for our framework and empirically study its performance.

翻訳日:2023-12-12 18:49:12 公開日:2023-12-10
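
The abstract above builds on stochastic quantization, in which the randomness lives in the rounding step itself rather than in added noise. The sketch below shows only the generic unbiased version (randomized rounding to the two nearest grid levels). It is not the QMGeo mechanism: the mixed truncated geometric distribution and its privacy analysis are not reproduced here, and this generic scheme should not be assumed to provide differential privacy. Function and parameter names are hypothetical.

```python
import numpy as np

def stochastic_quantize(x, levels, lo, hi, rng):
    """Round each entry of x to one of `levels` evenly spaced points in [lo, hi].

    An entry is rounded up with probability equal to its fractional position
    between the two neighbouring grid points, so the output is unbiased:
    E[quantized] == clipped input.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    step = (hi - lo) / (levels - 1)
    pos = (x - lo) / step                   # position measured in grid units
    lower = np.floor(pos)
    prob_up = pos - lower                   # fractional part in [0, 1)
    idx = lower + (rng.random(x.shape) < prob_up)
    idx = np.minimum(idx, levels - 1)       # guard against float round-off at hi
    return lo + idx * step
```

Each quantized entry can be sent using about `log2(levels)` bits. Averaging many clients' quantized updates recovers the mean update in expectation.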

# RepViT-SAM: リアルタイムセグメンテーションを目指す

RepViT-SAM: Towards Real-Time Segmenting Anything ( http://arxiv.org/abs/2312.05760v1 )

ライセンス: Link先を確認

Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

(参考訳) Segment Anything Model (SAM) は様々なコンピュータビジョンタスクにおいて印象的なゼロショット転送性能を示している。しかし、その重い計算コストは実用上の大きな障害となっている。 MobileSAM は蒸留を用いて SAM の重い画像エンコーダを TinyViT に置き換えることを提案し、計算量を大幅に削減した。しかしながら、リソースの限られたモバイルデバイスへのデプロイメントは、自己注意機構によるメモリと計算のオーバーヘッドが大きいため、依然として課題に直面している。近年、RepViTはViTの効率的なアーキテクチャ設計をCNNに組み込むことで、モバイルデバイス上で最先端の性能とレイテンシのトレードオフを実現している。そこで,モバイルデバイス上でのリアルタイムな Segment Anything を実現するため,MobileSAMに倣ってSAMの重い画像エンコーダをRepViTモデルに置き換え,RepViT-SAMモデルを得る。大規模な実験によると、RepViT-SAMはMobileSAMよりもはるかに優れたゼロショット転送能力を持ち、推論速度は約10倍高速である。コードとモデルは https://github.com/THU-MIG/RepViT で利用可能である。

Segment Anything Model (SAM) has recently shown impressive zero-shot transfer performance for various computer vision tasks. However, its heavy computation costs remain daunting for practical applications. MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation, which results in a significant reduction in computational requirements. However, its deployment on resource-constrained mobile devices still encounters challenges due to the substantial memory and computational overhead caused by self-attention mechanisms. Recently, RepViT achieved a state-of-the-art trade-off between performance and latency on mobile devices by incorporating efficient architectural designs of ViTs into CNNs. Here, to achieve real-time segmenting anything on mobile devices, following MobileSAM, we replace the heavyweight image encoder in SAM with the RepViT model, ending up with the RepViT-SAM model. Extensive experiments show that RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed. The code and models are available at https://github.com/THU-MIG/RepViT.

翻訳日:2023-12-12 18:48:50 公開日:2023-12-10

# 万能な単一モデルを超えて: 自動運転車のためのアンサンブル深層学習

Beyond One Model Fits All: Ensemble Deep Learning for Autonomous Vehicles ( http://arxiv.org/abs/2312.05759v1 )

ライセンス: Link先を確認

Hemanth Manjunatha and Panagiotis Tsiotras

(参考訳) 深層学習は、車両が周囲を目覚ましい精度で認識し、解釈できるようにすることによって、自動運転に革命をもたらした。この進歩は、媒介的知覚、行動反射、直接知覚を含む様々なディープラーニングモデルに起因しており、それぞれが自律運転能力を向上させるためのユニークな利点と課題を提供している。しかし、これらのアプローチの統合と、様々な運転シナリオにおけるそれらの関連性を理解することに関する研究にはギャップがある。本研究では,Mediated Perception, Behavior Reflex, Direct Perceptionの3つの異なるニューラルネットワークモデルを紹介する。様々な運転条件においてその重要性を探り、それぞれのアプローチの強みと限界に光を当てる。我々のアーキテクチャは、ベース、将来の潜在ベクトル予測、補助タスクネットワークからの情報を融合し、グローバルルーティングコマンドを使用して適切なアクションサブネットワークを選択する。我々は,自動運転における多様なモデリング戦略を効果的に活用するための知見を実験と評価によって提供することを目的とする。その結果、アンサンブルモデルは個々のアプローチよりも優れた性能を示し、各モードが全体のモデルの性能に一意に寄与することが示唆された。さらに,各モダリティの重要性を探究することにより,ロバストな性能を実現するために複数のモデルを活用することの重要性を強調しながら,自動運転における今後の研究のロードマップを提供する。

Deep learning has revolutionized autonomous driving by enabling vehicles to perceive and interpret their surroundings with remarkable accuracy. This progress is attributed to various deep learning models, including Mediated Perception, Behavior Reflex, and Direct Perception, each offering unique advantages and challenges in enhancing autonomous driving capabilities. However, there is a gap in research on integrating these approaches and understanding their relevance in diverse driving scenarios. This study introduces three distinct neural network models corresponding to the Mediated Perception, Behavior Reflex, and Direct Perception approaches. We explore their significance across varying driving conditions, shedding light on the strengths and limitations of each approach. Our architecture fuses information from the base, future latent vector prediction, and auxiliary task networks, using global routing commands to select appropriate action sub-networks. Through experiments and evaluations, we aim to provide insights into effectively utilizing diverse modeling strategies in autonomous driving. The results show that the ensemble model performs better than the individual approaches, suggesting that each modality contributes uniquely toward the performance of the overall model. Moreover, by exploring the significance of each modality, this study offers a roadmap for future research in autonomous driving, emphasizing the importance of leveraging multiple models to achieve robust performance.

翻訳日:2023-12-12 18:48:31 公開日:2023-12-10

# CLeaRForecast:時系列予測のための高純度表現の対比学習

CLeaRForecast: Contrastive Learning of High-Purity Representations for Time Series Forecasting ( http://arxiv.org/abs/2312.05758v1 )

ライセンス: Link先を確認

Jiaxin Gao, Yuxiao Hu, Qinglong Cao, Siqi Dai, Yuntian Chen

(参考訳) 時系列予測(TSF)は多くの領域にまたがる現代社会において重要な意味を持つ。従来の表現型学習ベースのtsfアルゴリズムは、典型的な対比型学習パラダイムを採用しており、傾向周期性表現を分離している。しかし、これらの手法は、時系列データに埋め込まれた固有の高インパクトノイズを無視し、表現の不正確さと予測性能を著しく低下させる。 CLeaRForecastは,高純度時系列表現をサンプル,特徴量,アーキテクチャ浄化手法を用いて学習するための,新しいコントラスト学習フレームワークである。より具体的には、元のサンプル(シリーズ)の変換によって生じるより多くのノイズ付加を避けるために、変換は、それぞれ傾向のある部分と周期的な部分に適用される。さらに,多変量系列の無関係変数から発する雑音を緩和するために,チャネル独立学習方式を導入する。線形学習のバックボーンとグローバルなコントラスト損失関数を用いることで、周期性や傾向の冗長性や不均一性によるノイズ導入を防止する。実験の結果, 下流TSFタスクにおけるCLeaRForecastの性能は良好であった。

Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in inaccurate representations and seriously degraded forecasting performance. To address this issue, we propose CLeaRForecast, a novel contrastive learning framework that learns high-purity time series representations with the proposed sample, feature, and architecture purifying methods. More specifically, to avoid adding further noise through transformations of the original samples (series), transformations are applied separately to the trend and periodic parts, providing better positive samples with markedly less noise. Moreover, we introduce a channel-independent training scheme to mitigate noise originating from unrelated variables in the multivariate series. By employing a streamlined deep-learning backbone and a comprehensive global contrastive loss function, we prevent noise introduction due to redundant or uneven learning of periodicity and trend. Experimental results show the superior performance of CLeaRForecast in various downstream TSF tasks.
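
The separation into trend and periodic parts mentioned above can be illustrated with a simple moving-average decomposition. This is a minimal sketch under the assumption that the trend is a centred moving average and the periodic part is the remainder; the abstract does not state which decomposition CLeaRForecast uses, and the name `decompose` and the window length are illustrative.

```python
import numpy as np

def decompose(series, window):
    """Split a 1-D series into a moving-average trend and the remainder.

    Edge values are repeated as padding so that the trend has the same
    length as the input. By construction, trend + periodic == series.
    """
    series = np.asarray(series, dtype=float)
    pad_left = (window - 1) // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(series, (pad_left, pad_right), mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    periodic = series - trend
    return trend, periodic

t = np.arange(48)
series = 0.1 * t + np.sin(2 * np.pi * t / 12)     # toy trend plus seasonality
trend, periodic = decompose(series, window=12)
```

With the two parts available separately, an augmentation can be applied to one of them without disturbing the other, which is the intuition behind transforming trend and periodic components independently.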

翻訳日:2023-12-12 18:48:09 公開日:2023-12-10

# 人間のような知覚に向けて:不均一グラフにおける構造因果モデル学習

Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph ( http://arxiv.org/abs/2312.05757v1 )

ライセンス: Link先を確認

Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

(参考訳) 異種グラフニューラルネットワークは様々な領域で普及している。しかしながら、それらの一般化可能性と解釈可能性は、固有の推論フローと、人間の推論論理や学習問題の基礎となる因果関係との相違により制限される。本研究では、新しい解決策として HG-SCM (Heterogeneous Graph as Structural Causal Model) を提案する。HG-SCM は、グラフスキーマから導かれるセマンティクスに基づく理解可能な変数の構築と、高度な因果発見技術の導入によるこれらの変数間のタスクレベルの因果関係の自動学習という2つの主要なステップを通じて、人間の知覚と意思決定の過程を模倣する。我々は、3つの異なる一般的な分布外設定の下で、HG-SCM を実世界の3つのデータセット上の7つの最先端ベースラインモデルと比較した。 HG-SCM は最小の標準偏差で最も高い平均性能順位を達成し、予測力と一般化可能性の両面でその有効性と優位性を実証した。さらに、3つのタスクについて自動学習された因果図の可視化と解析は、ドメイン知識および人間の認知とよく一致し、高い解釈可能性を示した。 HG-SCM の人間に近い性質と、向上した一般化可能性および解釈可能性は、透明性と信頼性が最重要となる特別なシナリオに対する有望な解決策となる。

Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.

翻訳日:2023-12-12 18:47:49 公開日:2023-12-10

# Particle Swarm Optimized-Back Propagation Neural Network と Multivariate Gaussian-Hidden Markov Model に基づく銘柄選択とタイミングの定量的融合戦略

A quantitative fusion strategy of stock picking and timing based on Particle Swarm Optimized-Back Propagation Neural Network and Multivariate Gaussian-Hidden Markov Model ( http://arxiv.org/abs/2312.05756v1 )

ライセンス: Link先を確認

Huajian Li, Longjian Li, Jiajian Liang, Weinan Dai

(参考訳) 近年、機械学習 (ML) は経済的意思決定、投資予測、リスク管理などに効果的なアプローチと新しい技術をもたらし、変動的で複雑な経済・金融環境に対処している。本研究は、株式市場への投資を対象に、多変量ガウス隠れマルコフモデル (MGHMM) と粒子群最適化によるバックプロパゲーションニューラルネットワーク (PSO-BPNN) を活用し、銘柄選択とタイミングの戦略を組み合わせた定量的融合モデルを提案する。ウィンザー化、中立化、標準化を施した52個のファクターと CSI 300 指数のリターンとの間の情報係数 (IC) を算出した後、上位の所定数のファクターを候補ファクターとして選び、主成分分析 (PCA) による次元削減を経て PSO-BPNN に入力し、所定数の構成銘柄を出力する。続いて、選別された銘柄と、Box-Cox 変換後の CSI 300 指数データで訓練した MGHMM が出力する株式市場の状態に基づいて予測と取引を行い、過去4年間にわたり優れたパフォーマンスを示した。最後に、中国株式市場において従来の予測・取引手法と本戦略を比較する。本論文で提示する銘柄選択とタイミングを組み合わせた融合戦略は、金融分析のための革新的な手法を提供する。

In recent years, machine learning (ML) has brought effective approaches and novel techniques to economic decision-making, investment forecasting, and risk management, helping to cope with the variable and intricate nature of economic and financial environments. For stock market investment, this research introduces a pioneering quantitative fusion model that combines stock timing and stock picking strategies by leveraging the Multivariate Gaussian-Hidden Markov Model (MGHMM) and a Back Propagation Neural Network optimized by Particle Swarm (PSO-BPNN). First, the information coefficients (IC) between fifty-two factors, which have been winsorized, neutralized, and standardized, and the return of the CSI 300 index are calculated. A given number of top-ranked factors are then chosen as candidate factors and, after dimension reduction by Principal Component Analysis (PCA), fed into the PSO-BPNN, which outputs a certain number of constituent stocks. Subsequently, we conduct prediction and trading on the basis of the screened stocks and the stock market state output by the MGHMM, which is trained on CSI 300 index data after Box-Cox transformation; the strategy shows excellent performance over the past four years. Finally, some conventional forecasting and trading methods are compared with our strategy in the Chinese stock market. The fusion strategy presented in this article, which incorporates stock picking and timing, provides an innovative technique for financial analysis.
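
The factor-screening step can be illustrated as below. This is a minimal sketch that assumes the information coefficient is computed as a Spearman rank correlation and that factors are ranked by absolute IC; the abstract specifies neither choice, and the helper names `rank_ic` and `top_factors` are hypothetical.

```python
import numpy as np

def rank_ic(factor, forward_return):
    """Spearman rank correlation between factor values and returns.

    Ties are not handled specially; this is adequate for continuous data.
    """
    f_rank = np.argsort(np.argsort(factor))
    r_rank = np.argsort(np.argsort(forward_return))
    return float(np.corrcoef(f_rank, r_rank)[0, 1])

def top_factors(factor_matrix, forward_return, k):
    """Return the column indices of the k factors with the largest |IC|.

    factor_matrix has one row per observation and one column per factor.
    Ranking by absolute value is an assumption made for this sketch.
    """
    ics = np.array([rank_ic(factor_matrix[:, j], forward_return)
                    for j in range(factor_matrix.shape[1])])
    order = np.argsort(-np.abs(ics))
    return order[:k], ics
```

The selected columns would then be passed to a dimension-reduction step such as PCA before being used as network inputs.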

翻訳日:2023-12-12 18:47:27 公開日:2023-12-10

# グローバルで社会経済的・文化的側面を考慮したレコメンダシステムを目指して

Towards Global, Socio-Economic, and Culturally Aware Recommender Systems ( http://arxiv.org/abs/2312.05805v1 )

ライセンス: Link先を確認

Kelley Ann Yohe

(参考訳) 消費者の嗜好をパーソナライズするためのレコメンデーションシステムが注目されている。これらのシステムは、主に広告推奨(Googleなど)、パーソナライズされた提案(NetflixやSpotifyなど)、小売業の選択(Amazonなど)といったアプリケーションに焦点を合わせてきたが、特に企業が多様な市場への進出を目指す中で、よりグローバルで社会経済的で文化的に意識されたアプローチの恩恵を受ける可能性がある。本稿では,文化的アイデンティティと社会経済的要因を考慮したレコメンダシステムの可能性を検討することを目的とする。近年のレコメンデーションシステムの発展を振り返り、文化的アイデンティティと社会経済的要因が消費者の嗜好に与える影響を考察する。次に,これらの因子をレコメンダシステムに組み込むためのオントロジーとアプローチを提案する。このアプローチの可能性を説明するために,エンタテインメント業界における消費者サブスクリプションプラン選択のシナリオを提案する。既存のレコメンデーターシステムは、社会経済的要因や文化的アイデンティティの認識が欠如しているため、ユーザの好みを正確に理解する能力が限られていると論じる。また、社会経済状況の変化に応じてレコメンデーションを更新することができない。さまざまな機械学習モデルを探索し、このギャップに対処する最終的な人工ニューラルネットワーク (ANN) モデルを開発する。社会経済的・文化的に意識された推薦システムの有効性を,適合率 (Precision),正解率 (Accuracy),F1,再現率 (Recall) の4つの観点から評価した。ドメイン固有データ,厳選した文化指標,関連する社会経済的要因を組み込んだ高度に調整されたANNモデルは,サブスクリプションにおけるユーザの嗜好を95%の正解率,94%の適合率,92%のF1スコア,90%の再現率で予測する。

Recommender systems have gained increasing attention to personalise consumer preferences. While these systems have primarily focused on applications such as advertisement recommendations (e.g., Google), personalized suggestions (e.g., Netflix and Spotify), and retail selection (e.g., Amazon), there is potential for these systems to benefit from a more global, socio-economic, and culturally aware approach, particularly as companies seek to expand into diverse markets. This paper aims to investigate the potential of a recommender system that considers cultural identity and socio-economic factors. We review the most recent developments in recommender systems and explore the impact of cultural identity and socio-economic factors on consumer preferences. We then propose an ontology and approach for incorporating these factors into recommender systems. To illustrate the potential of our approach, we present a scenario in consumer subscription plan selection within the entertainment industry. We argue that existing recommender systems have limited ability to precisely understand user preferences due to a lack of awareness of socio-economic factors and cultural identity. They also fail to update recommendations in response to changing socio-economic conditions. We explore various machine learning models and develop a final artificial neural network (ANN) model that addresses this gap. We evaluate the effectiveness of socio-economic and culturally aware recommender systems across four dimensions: Precision, Accuracy, F1, and Recall. We find that a highly tuned ANN model incorporating domain-specific data, select cultural indices, and relevant socio-economic factors predicts user preference in subscriptions with an accuracy of 95%, a precision of 94%, an F1 score of 92%, and a recall of 90%.
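
The four evaluation measures named above have standard definitions in terms of confusion-matrix counts, sketched here for a binary setting. The function names are illustrative, and the counts in the example are made up; only the precision and recall values in the usage note come from the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

As a consistency check on the reported figures, a precision of 0.94 and a recall of 0.90 give an F1 of 2(0.94)(0.90)/(0.94 + 0.90) ≈ 0.92, which agrees with the stated F1 score.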

翻訳日:2023-12-12 18:42:50 公開日:2023-12-10

# HumanCoser:Semantic-Aware Diffusion Modelによる階層型3Dヒューマンジェネレーション

HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model ( http://arxiv.org/abs/2312.05804v1 )

ライセンス: Link先を確認

Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Yebin Liu, Kun Li

(参考訳) 近年、衣服を着た3D人間の生成が注目を集めている。しかし、既存の手法は、一貫した身体構造を持つ階層化された高品質な3D人間を生成できない。その結果、これらの手法では人間の身体と衣服を任意に、かつ別々に変更・編集することができない。本稿では、新しい物理的に分離された意味認識拡散モデルに基づく、テキスト駆動の階層型3D人間生成フレームワークを提案する。生成された衣服を対象テキストと整合させるため、モデルが生成する衣服以外のコンテンツを除去できる、衣服のための意味的信頼度戦略を提案する。衣服を異なる体型に合わせるため、衣服の自由な転用と再利用を可能にする SMPL 駆動の暗黙的場変形ネットワークを提案する。さらに、身体と衣服のそれぞれに対して SMPL モデルに基づく一様な形状事前分布を導入し、特定のテンプレートに制約されることなく、より多様な3Dコンテンツを生成する。実験結果から、提案手法は一貫した身体構造を持つ3D人間を生成できるだけでなく、階層的な形での自由な編集も可能にすることが示された。ソースコードは公開される予定である。

The generation of 3D clothed humans has attracted increasing attention in recent years. However, existing work cannot generate layered high-quality 3D humans with consistent body structures. As a result, these methods are unable to arbitrarily and separately change and edit the body and clothing of the human. In this paper, we propose a text-driven layered 3D human generation framework based on a novel physically-decoupled semantic-aware diffusion model. To keep the generated clothing consistent with the target text, we propose a semantic-confidence strategy for clothing that can eliminate the non-clothing content generated by the model. To match the clothing with different body shapes, we propose an SMPL-driven implicit field deformation network that enables the free transfer and reuse of clothing. In addition, we introduce uniform shape priors based on the SMPL model for the body and clothing, respectively, which allows more diverse 3D content to be generated without being constrained by specific templates. The experimental results demonstrate that the proposed method not only generates 3D humans with consistent body structures but also allows free editing in a layered manner. The source code will be made public.

翻訳日:2023-12-12 18:42:20 公開日:2023-12-10

# 効率的な画像高精細化のためのTransformerベースの選択的超解像

Transformer-based Selective Super-Resolution for Efficient Image Refinement ( http://arxiv.org/abs/2312.05803v1 )

ライセンス: Link先を確認

Tianyi Zhang, Kishore Kasichainula, Yaoxin Zhuo, Baoxin Li, Jae-sun Seo, Yu Cao

(参考訳) 従来の超解像法には2つの欠点がある。大きな画像全体をアップスケールする際の相当な計算コストと、背景の高精細化の過程で下流のコンピュータビジョンタスクにとって余分な、あるいは有害となりうる情報が持ち込まれることである。そこで本研究では、Transformer ベースの新しいアルゴリズムである Selective Super-Resolution (SSR) を提案する。SSR は画像を重なりのないタイルに分割し、ピラミッドアーキテクチャを用いて様々なスケールで関心のあるタイルを選択し、選択したタイルのみを深層特徴を用いて再構成する。 3つのデータセットにおける実験結果は、超解像に対する本手法の効率性と頑健な性能を示している。最先端の手法と比較して、BDD100K データセットでは FID スコアが26.78から10.41に低下し、計算コストは40%削減された。ソースコードは https://github.com/destiny301/SSR で入手できる。

Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlapping tiles, selects tiles of interest at various scales with a pyramid architecture, and exclusively reconstructs these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to the state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with 40% reduction in computation cost for the BDD100K dataset. The source code is available at https://github.com/destiny301/SSR.
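
The tile-based processing described above can be illustrated as follows. This is a minimal sketch of partitioning an image into non-overlapping tiles and processing only a selected subset; the tile selector and the reconstruction network of SSR are not reproduced, the `refine` callable is a placeholder, and for simplicity it must return a tile of unchanged shape, whereas real super-resolution would enlarge it. All function names are illustrative.

```python
import numpy as np

def split_into_tiles(image, tile):
    """Partition an (H, W, C) image into non-overlapping tile x tile patches.

    Returns an array of shape (rows, cols, tile, tile, C).
    """
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image height and width must be multiples of tile")
    rows, cols = h // tile, w // tile
    return image.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)

def merge_tiles(tiles):
    """Inverse of split_into_tiles."""
    rows, cols, tile, _, c = tiles.shape
    return tiles.swapaxes(1, 2).reshape(rows * tile, cols * tile, c)

def refine_selected(image, tile, selected, refine):
    """Apply `refine` only to the tiles flagged in the boolean grid `selected`."""
    tiles = split_into_tiles(image, tile).copy()
    for r, col in zip(*np.nonzero(selected)):
        tiles[r, col] = refine(tiles[r, col])
    return merge_tiles(tiles)
```

Because unselected tiles are skipped entirely, the compute spent on refinement scales with the number of selected tiles rather than with the full image size.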

翻訳日:2023-12-12 18:42:02 公開日:2023-12-10

# SGNet:Depth Map Super-Resolutionのための勾配周波数認識による構造案内ネットワーク

SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-Resolution ( http://arxiv.org/abs/2312.05799v1 )

ライセンス: Link先を確認

Zhengxu Wang and Zhiqiang Yan and Jian Yang

(参考訳) 深度超解像 (DSR) は、低解像度 (LR) の深度から高解像度 (HR) の深度を復元することを目的としており、この課題を支援するために RGB 画像がしばしば用いられる。最近の画像誘導型 DSR アプローチは、深度構造を再構築するために主に空間領域に着目している。しかし、LR 深度の構造は通常ぼやけているため、空間領域のみを考慮するだけでは満足な結果を得るには不十分である。本稿では、高周波構造を捉える固有の能力を持つ勾配領域と周波数領域により注意を払う構造ガイドネットワーク (SGNet) を提案する。具体的には、まず、RGB の正確な勾配事前情報を用いて LR 深度構造を鮮鋭化する勾配キャリブレーションモジュール (GCM) を導入する。次に、複数のスペクトル差分ブロック (SDB) を再帰的に実行し、それぞれが RGB の正確な高周波成分を LR 深度に伝播する周波数認識モジュール (FAM) を提案する。実データと合成データの両方に関する広範な実験結果は SGNet の優位性を示し、最先端の性能に到達している。コードと事前学習済みモデルは https://github.com/yanzq95/SGNet で入手できる。

Depth super-resolution (DSR) aims to restore a high-resolution (HR) depth map from a low-resolution (LR) one, where an RGB image is often used to aid this task. Recent image-guided DSR approaches mainly focus on the spatial domain to rebuild the depth structure. However, since the structure of LR depth is usually blurry, considering only the spatial domain is not sufficient to obtain satisfactory results. In this paper, we propose the structure guided network (SGNet), a method that pays more attention to the gradient and frequency domains, both of which have the inherent ability to capture high-frequency structure. Specifically, we first introduce the gradient calibration module (GCM), which employs the accurate gradient prior of RGB to sharpen the LR depth structure. Then we present the Frequency Awareness Module (FAM) that recursively conducts multiple spectrum differencing blocks (SDB), each of which propagates the precise high-frequency components of RGB into the LR depth. Extensive experimental results on both real and synthetic datasets demonstrate the superiority of our SGNet, which reaches state-of-the-art performance. Codes and pre-trained models are available at https://github.com/yanzq95/SGNet.

翻訳日:2023-12-12 18:41:45 公開日:2023-12-10

# 制御可能な人物画像生成のためのアンタングル表現学習

Disentangled Representation Learning for Controllable Person Image Generation ( http://arxiv.org/abs/2312.05798v1 )

ライセンス: Link先を確認

Wenju Xu, Chengjiang Long, Yongwei Nie, Guanghui Wang

(参考訳) 本稿では,制御可能な人物画像を生成するために,DRL-CPG という新しいフレームワークを提案する。これは,様々なソースの人物が提供した,所望のポーズと人的属性(例えば,ポーズ,頭,上着,ズボン)でリアルな人物画像を生成する。従来のセマンティックマスクを活用して各コンポーネントの表現を得る作業とは違って,比較的簡単な段階から徐々に難しい段階へのカリキュラム学習で学習したトランスフォーマーを用いた,新しい属性エンコーダによる非絡み付き潜在コード生成を提案する。個人セグメンテーションマスクからコンポーネントマスクをランダムに除去するランダムコンポーネントマスク非依存戦略を導入し、トレーニングの困難化とトランスフォーマーエンコーダの促進を目標とし、各コンポーネント間の基底境界を認識する。これにより、モデルがコンポーネントの形状とテクスチャの両方を転送できる。さらに,複数レベル属性(例えば,構造特徴と属性表現)をよく設計されたDual Adaptive Denormalization (DAD)残余ブロックと統合する属性デコーダネットワークを提案する。広範囲にわたる実験により,提案手法は異なる人間の部位のテクスチャと形状の両方を伝達し,現実的な結果が得られることが示された。我々の知る限り、私たちは人物画像生成のためのトランスフォーマーを用いた非絡み合った潜在表現を初めて学習する。

In this paper, we propose a novel framework named DRL-CPG to learn disentangled latent representation for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g., pose, head, upper clothes, and pants) provided by various source persons. Unlike the existing works leveraging the semantic masks to obtain the representation of each component, we propose to generate disentangled latent code via a novel attribute encoder with transformers trained in a manner of curriculum learning from a relatively easy step to a gradually hard one. A random component mask-agnostic strategy is introduced to randomly remove component masks from the person segmentation masks, which aims at increasing the difficulty of training and promoting the transformer encoder to recognize the underlying boundaries between each component. This enables the model to transfer both the shape and texture of the components. Furthermore, we propose a novel attribute decoder network to integrate multi-level attributes (e.g., the structure feature and the attribute representation) with well-designed Dual Adaptive Denormalization (DAD) residual blocks. Extensive experiments strongly demonstrate that the proposed approach is able to transfer both the texture and shape of different human parts and yield realistic results. To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.

翻訳日:2023-12-12 18:41:24 公開日:2023-12-10

# オンライン教育におけるマルチモーダリティ : 比較研究

Multimodality in Online Education: A Comparative Study ( http://arxiv.org/abs/2312.05797v1 )

ライセンス: Link先を確認

Praneeta Immadisetty, Pooja Rajesh, Akshita Gupta, Anala M R, Soumya A, K. N. Subramanya

(参考訳) この10年の始まりは深刻なパンデミックをもたらし、それに対応して教育の場は主にオンラインの世界へと移行した。オンラインビデオ会議プラットフォームや、学生の理解度をより適切に測るためのツールの利用が急増するなか、学生が主題をどの程度理解しているか、および教育的刺激に対する学生の反応を教員が把握できるかを評価する仕組みが必要である。現在のシステムは単一の手がかりのみを考慮しており、教育分野に焦点を当てていない。したがって、主題に対する学生の反応を包括的かつ全体的に捉えて測定する必要がある。本稿では、姿勢とジェスチャー、顔、視線追跡、言語認識の4つの手がかりを考慮しつつ、感情認識 (affect recognition) に対するマルチモーダルアプローチと、そのオンライン教室への展開の必要性を強調する。各手がかりについて利用可能な様々な機械学習モデルを比較し、利用可能なデータセットと教室映像のパラメータを踏まえて最も適切なアプローチを示す。個々の手がかりに関するこの分析から、精度、データコーパスの入手しやすさ、感度、主要な欠点に基づいて最も適したモデルを組み合わせ、重み付き多数決に基づくマルチモーダルアプローチを提案する。

The commencement of the decade brought along with it a grave pandemic and in response the movement of education forums predominantly into the online world. With a surge in the usage of online video conferencing platforms and tools to better gauge student understanding, there needs to be a mechanism to assess whether instructors can grasp the extent to which students understand the subject and their response to the educational stimuli. The current systems consider only a single cue with a lack of focus in the educational domain. Thus, there is a necessity for the measurement of an all-encompassing holistic overview of the students' reaction to the subject matter. This paper highlights the need for a multimodal approach to affect recognition and its deployment in the online classroom while considering four cues, posture and gesture, facial, eye tracking and verbal recognition. It compares the various machine learning models available for each cue and provides the most suitable approach given the available dataset and parameters of classroom footage. A multimodal approach derived from weighted majority voting is proposed by combining the most fitting models from this analysis of individual cues based on accuracy, ease of procuring data corpus, sensitivity and any major drawbacks.
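
Weighted majority voting over per-cue predictions can be sketched as below. This is a minimal illustration of the general technique; the cue names follow the abstract, but the labels and weights are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine per-cue labels into one label by summing cue weights.

    predictions: dict mapping cue name -> predicted label
    weights:     dict mapping cue name -> non-negative weight
    Returns the label with the largest total weight.
    """
    totals = defaultdict(float)
    for cue, label in predictions.items():
        totals[label] += weights[cue]
    return max(totals, key=totals.get)

# Hypothetical outputs of four single-cue models for one student.
predictions = {"posture": "engaged", "facial": "confused",
               "eye_tracking": "engaged", "verbal": "confused"}
# Hypothetical weights, e.g. proportional to each model's validation accuracy.
weights = {"posture": 0.2, "facial": 0.4, "eye_tracking": 0.1, "verbal": 0.2}
decision = weighted_majority_vote(predictions, weights)
```

Here the two cues voting for "confused" carry more total weight than the two voting for "engaged", so the combined decision follows the more trusted models rather than a simple head count.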

翻訳日:2023-12-12 18:40:58 公開日:2023-12-10

# AntGroupにおける効率的なプルーニングと蒸留による大規模マルチモーダルモデル圧縮

Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup ( http://arxiv.org/abs/2312.05795v1 )

ライセンス: Link先を確認

Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao

(参考訳) AntGroupにLarge Multimodal Models(LMM)が配備されたことにより、Alipayにおける広告オーディションタスクの強化など、支払い、セキュリティ、広告におけるマルチモーダルタスクが大幅に進歩した。しかし、このような大規模なモデルの展開は、特にグリーンAIの理想に反するレイテンシや二酸化炭素排出量の増加に課題をもたらす。本稿では,当社独自のLLMであるAntGMMに対して,新しいマルチステージ圧縮戦略を提案する。提案手法は, 小規模のサンプルサイズの採用, 多段プルーニングによる多段冗長性への対処, 高度な蒸留損失設計の導入という3つの側面に焦点をあてている。本研究では,Alipayにおける実世界のシナリオから,MAAD(Multimodal Advertisement Audition Dataset)というデータセットを構築し,提案手法の信頼性を検証する実験を行った。さらに,本戦略の有効性は2023年9月から3ヶ月間のAlipayのマルチモーダル広告オーディションにおいて,その運用的成功に現れている。特に、当社のアプローチは、オンライン性能のわずかな低下にとどめつつ、遅延を700msから90msへと大幅に削減した。さらに,我々の圧縮モデルでは,AntGMMの直接展開と比較して,年間約7500万kWhの消費電力削減が期待でき,グリーンAIイニシアチブへのコミットメントを示す。いくつかのレビューの後、コードとMAADデータセットを公開する予定である (https://github.com/MorinW/AntGMM_Pruning)。

The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay's real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to green AI initiatives. We will publicly release our code and the MAAD dataset after some reviews (https://github.com/MorinW/AntGMM_Pruning).

翻訳日:2023-12-12 18:40:36 公開日:2023-12-10

# 高次元線形ガウス系におけるサンプル共分散行列のスペクトル統計

Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear Gaussians ( http://arxiv.org/abs/2312.05794v1 )

ライセンス: Link先を確認

Muhammad Abdullah Naeem, Miroslav Pajic

(参考訳) Performance of ordinary least squares(OLS) method for the \emph{estimation of high dimensional stable state transition matrix} $A$(i.e., spectral radius $\rho(A)<1$) from a single noisy observed trajectory of the linear time invariant(LTI)\footnote{Linear Gaussian (LG) in Markov chain literature} system $X_{-}:(x_0,x_1, \ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{ where } w_{t} \thicksim N(0,I_{n}), \end{equation} heavily rely on negative moments of the sample covariance matrix: $(X_{-}X_{-}^{*})=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and singular values of $EX_{-}^{*}$, where $E$ is a rectangular Gaussian ensemble $E=[w_0, \ldots, w_{N-1}]$. 負のモーメントを扱うには、すべての固有値 $\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq \lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$ に対する鋭い評価が必要となる。測度の集中現象と摂動理論 (Gershgorin の定理と Cauchy の交錯定理) に加え、\cite{naeem2023spectral} における非エルミート作用素のスペクトル定理に関する最近の結果を利用して、$A=A^{*}$ の場合に限り、すべての $j \in [n]$ について $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N}, N+n\sqrt{N}\big]$ が典型的なオーダーとなることを示す。しかし \emph{高次元} において、$A$ が幾何学的重複度1の相異なる固有値 $\lambda$ を1つだけ持つとき、固有値が \emph{複素半単位円板} を離れるとすぐに、最大固有値は次元の呪いを受けて $\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big( \lfloor\frac{N}{n}\rfloor e^{\alpha_{\lambda}n} \big)$ となる一方、最小固有値は $\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$ にとどまる。その結果、OLS 推定器は \emph{相転移} を起こして \emph{過渡的} となる。すなわち、反復を増やしても推定誤差は悪化するだけであり、これらはすべて安定な系から生成されたダイナミクスで起こる。

Performance of the ordinary least squares (OLS) method for the \emph{estimation of a high dimensional stable state transition matrix} $A$ (i.e., spectral radius $\rho(A)<1$) from a single noisy observed trajectory of the linear time invariant (LTI)\footnote{Linear Gaussian (LG) in Markov chain literature} system $X_{-}:(x_0,x_1, \ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{ where } w_{t} \thicksim N(0,I_{n}), \end{equation} relies heavily on negative moments of the sample covariance matrix $(X_{-}X_{-}^{*})=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and on singular values of $EX_{-}^{*}$, where $E$ is a rectangular Gaussian ensemble $E=[w_0, \ldots, w_{N-1}]$. Negative moments require sharp estimates on all the eigenvalues $\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq \lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Leveraging recent results on the spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along with the concentration of measure phenomenon and perturbation theory (Gershgorin's and Cauchy's interlacing theorems), we show that only when $A=A^{*}$ is the typical order $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N}, N+n\sqrt{N}\big]$ attained for all $j \in [n]$. However, in \emph{high dimensions}, when $A$ has only one distinct eigenvalue $\lambda$ with geometric multiplicity one, then as soon as the eigenvalue leaves the \emph{complex half unit disc}, the largest eigenvalue suffers from the curse of dimensionality: $\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big( \lfloor\frac{N}{n}\rfloor e^{\alpha_{\lambda}n} \big)$, while the smallest eigenvalue satisfies $\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, the OLS estimator incurs a \emph{phase transition} and becomes \emph{transient: increasing iteration only worsens estimation error}, all of this happening when the dynamics are generated from stable systems.
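
The estimation problem can be made concrete with a small simulation. This is a minimal sketch assuming a zero initial state $x_0 = 0$ and a low-dimensional, well-conditioned $A$, so it illustrates the estimator itself rather than the high-dimensional phase transition; the function names are illustrative.

```python
import numpy as np

def simulate_lti(A, N, rng):
    """Return states x_0..x_N of x_{t+1} = A x_t + w_t, w_t ~ N(0, I), x_0 = 0."""
    n = A.shape[0]
    X = np.zeros((n, N + 1))
    for t in range(N):
        X[:, t + 1] = A @ X[:, t] + rng.standard_normal(n)
    return X

def ols_estimate(X):
    """OLS estimate of A from one trajectory, plus the matrix X_- X_-^*.

    Minimises sum_t ||x_{t+1} - A x_t||^2, giving
    A_hat = X_+ X_-^* (X_- X_-^*)^{-1}.
    """
    X_minus, X_plus = X[:, :-1], X[:, 1:]
    gram = X_minus @ X_minus.T
    A_hat = X_plus @ X_minus.T @ np.linalg.pinv(gram)
    return A_hat, gram

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])                 # stable: spectral radius 0.5 < 1
X = simulate_lti(A, N=5000, rng=rng)
A_hat, gram = ols_estimate(X)
eigenvalues = np.linalg.eigvalsh(gram)     # spectrum of X_- X_-^*
```

Since $X_{+} = AX_{-} + E$, the estimation error is $\hat{A} - A = EX_{-}^{*}(X_{-}X_{-}^{*})^{-1}$, which is why the abstract ties OLS performance to the inverse (negative moments) of the sample covariance matrix and to the singular values of $EX_{-}^{*}$.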

翻訳日:2023-12-12 18:40:09 公開日:2023-12-10

# 統計的空間的不均質拡散推論

Statistical Spatially Inhomogeneous Diffusion Inference ( http://arxiv.org/abs/2312.05793v1 )

ライセンス: Link先を確認

Yinuo Ren, Yiping Lu, Lexing Ying, Grant M. Rotskoff

(参考訳) 離散的に観測された測定値から拡散方程式を推定することは、生体物理系の単一分子追跡から金融商品のモデリングに至るまで、様々な分野で重要な統計的課題である。基礎となる力学過程が $d$ 次元確率微分方程式 $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+\Sigma(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t$$ に従うと仮定し、ドリフト $\boldsymbol{b}$ と空間的に不均質な拡散テンソル $D = \Sigma\Sigma^{T}$ の両方についてニューラルネットワークに基づく推定器を提案し、$\boldsymbol{b}$ と $D$ が $s$-H\"older 連続であるときの統計的収束保証を与える。特に、得られる上界は、観測データ内に相関が存在する場合であっても、非パラメトリック関数推定の最小最大最適レート $N^{-\frac{2s}{2s+d}}$ と一致する。データ内の相関は、高速レートの汎化上界を確立する際に注意深い扱いを必要とする。この理論結果は、空間的に不均質な拡散テンソルの正確な推定を示す数値実験によって裏付けられる。

Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+\Sigma(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t,$$ we propose neural network-based estimators of both the drift $\boldsymbol{b}$ and the spatially-inhomogeneous diffusion tensor $D = \Sigma\Sigma^{T}$ and provide statistical convergence guarantees when $\boldsymbol{b}$ and $D$ are $s$-H\"older continuous. Notably, our bound aligns with the minimax optimal rate $N^{-\frac{2s}{2s+d}}$ for nonparametric function estimation even in the presence of correlation within observational data, which necessitates careful handling when establishing fast-rate generalization bounds. Our theoretical results are bolstered by numerical experiments demonstrating accurate inference of spatially-inhomogeneous diffusion tensors.
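
Discretely observed data from such an SDE can be generated with the Euler–Maruyama scheme, sketched below together with the rate formula quoted in the abstract. This is a minimal sketch of a standard simulator, not the paper's estimator; the drift and diffusion functions in the example are hypothetical, as are the function names.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, dt, n_steps, rng):
    """Simulate dx = b(x) dt + Sigma(x) dw on a uniform time grid.

    b(x) returns a length-d drift vector and sigma(x) a (d, d) matrix.
    Returns an array of shape (n_steps + 1, d) holding the sampled path.
    """
    x0 = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x0.size))
    path[0] = x0
    for k in range(n_steps):
        dw = rng.standard_normal(x0.size) * np.sqrt(dt)   # Brownian increment
        path[k + 1] = path[k] + b(path[k]) * dt + sigma(path[k]) @ dw
    return path

def minimax_rate(N, s, d):
    """The rate N^(-2s / (2s + d)) for s-Hoelder functions in d dimensions."""
    return N ** (-2 * s / (2 * s + d))

# Hypothetical 2-D example with a position-dependent diffusion matrix.
rng = np.random.default_rng(0)
drift = lambda x: -x
diffusion = lambda x: np.diag(0.5 + 0.25 * np.sin(x) ** 2)
path = euler_maruyama(drift, diffusion, x0=[1.0, -1.0], dt=0.01, n_steps=1000, rng=rng)
```

Consecutive points of one simulated path are correlated, which is the situation in which the abstract's convergence guarantee is stated to still hold.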

翻訳日:2023-12-12 18:38:59 公開日:2023-12-10

# 不規則な経路を取る: 時系列予測Transformerのデコーダの強化

Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting Transformer ( http://arxiv.org/abs/2312.05792v1 )

ライセンス: Link先を確認

Li Shen, Yuning Wei, Yangzhu Wang, Hongguang Li

(参考訳) モノのインターネット (IoT) システムの発展に伴い、意思決定者が現状を評価し今後の方針を策定するうえで、正確な長期予測手法が不可欠となっている。現在、Transformer と MLP は深層時系列予測の2つのパラダイムであり、前者は優れた注意機構とエンコーダ・デコーダアーキテクチャのおかげでより普及している。しかし、データサイエンティストはエンコーダの研究に注力しがちで、デコーダは顧みられていない。複雑さを減らすために、デコーダの代わりに線形射影を採用する研究者さえいる。我々は、エンコーダとデコーダそれぞれの機能である、入力系列の特徴抽出と、入力系列と予測系列の関係の探索は、いずれも極めて重要であると主張する。 CV 分野における FPN の成功に着想を得て、エンコーダにボトムアップ、デコーダにトップダウンのアーキテクチャをそれぞれ用いて、完全かつ合理的な階層を構築する FPPformer を提案する。本研究では、最先端のパッチ単位の注意機構を活用し、改良した要素単位の注意機構と組み合わせることでさらに発展させた。その組み合わせの形式もエンコーダとデコーダで異なる。 12のベンチマーク上での6つの最先端ベースラインとの大規模な実験により、FPPformer の有望な性能と、時系列予測 Transformer においてデコーダを入念に設計することの重要性が検証された。ソースコードは https://github.com/OrigamiSL/FPPformer で公開されている。

With the development of Internet of Things (IoT) systems, a precise long-term forecasting method is requisite for decision makers to evaluate current statuses and formulate future policies. Currently, Transformer and MLP are the two paradigms for deep time-series forecasting, and the former is more prevalent by virtue of its exquisite attention mechanism and encoder-decoder architecture. However, data scientists seem more willing to dive into research on the encoder, leaving the decoder neglected. Some researchers even adopt linear projections in lieu of the decoder to reduce complexity. We argue that both extracting the features of the input sequence and seeking the relations between the input and prediction sequences, which are the respective functions of the encoder and decoder, are of paramount significance. Motivated by the success of FPN in the CV field, we propose FPPformer, which uses bottom-up and top-down architectures in the encoder and decoder, respectively, to build a full and rational hierarchy. The cutting-edge patch-wise attention is exploited and further developed in this work by combining it with revamped element-wise attention; the format of this combination also differs between the encoder and decoder. Extensive experiments with six state-of-the-art baselines on twelve benchmarks verify the promising performance of FPPformer and the importance of elaborately devising the decoder in time-series forecasting Transformers. The source code is released at https://github.com/OrigamiSL/FPPformer.

翻訳日:2023-12-12 18:38:12 公開日:2023-12-10

# SimPSI: 時系列データ拡張におけるスペクトル情報保存のための簡易戦略

SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation ( http://arxiv.org/abs/2312.05790v1 )

ライセンス: Link先を確認

Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

(参考訳) データ拡張は、データサイズによる制限を克服するために、ニューラルネットワークをトレーニングする上で重要な要素であり、時系列のためにいくつかの技術が研究されている。これらのテクニックは特定のタスクで有効であるが、時系列ベンチマークにはまだ一般化されていない。現在のデータ拡張技術は、周波数領域に含まれるコア情報を台無しにする。そこで本研究では,時系列データ拡張においてスペクトル情報を保存するための簡易な戦略(SimPSI)を提案する。 SimPSIは、各周波数の重要度を示す保存マップによって重み付けされた元の入力スペクトルと拡張入力スペクトルを混合することによりスペクトル情報を保存する。特に、我々の実験的な貢献は、マグニチュードスペクトル、サリエンシーマップ、スペクトル保存マップの3つの異なる保存マップを構築することである。我々は,SimPSIを様々な時系列データ拡張に適用し,その効果を広範囲の時系列ベンチマークで評価する。実験結果から,SimPSIはコアスペクトル情報を保存することで時系列データ拡張の性能を大幅に向上することがわかった。論文で使用されたソースコードはhttps://github.com/Hyun-Ryu/simpsiで公開されている。

Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency domain. To address this issue, we propose a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. SimPSI preserves the spectral information by mixing the original and augmented input spectra, weighted by a preservation map that indicates the importance score of each frequency. Specifically, our experimental contributions are to build three distinct preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. We apply SimPSI to various time series data augmentations and evaluate its effectiveness across a wide range of time series benchmarks. Our experimental results show that SimPSI considerably enhances the performance of time series data augmentations by preserving core spectral information. The source code used in the paper is available at https://github.com/Hyun-Ryu/simpsi.
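
To make the mixing step concrete, here is a minimal sketch that assumes the preservation map is the normalised magnitude spectrum of the original series and that mixing is a per-frequency convex combination. Both choices, and all function names, are assumptions made for illustration; the exact formulation is in the authors' repository. A naive DFT is used so that the example needs only the standard library.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def magnitude_preservation_map(spectrum: List[complex]) -> List[float]:
    """Scale magnitudes to [0, 1]; larger values mark frequencies to preserve."""
    magnitudes = [abs(c) for c in spectrum]
    peak = max(magnitudes)
    if peak == 0.0:
        return [0.0] * len(spectrum)
    return [m / peak for m in magnitudes]


def mix_with_preservation_map(original: List[float],
                              augmented: List[float]) -> List[float]:
    """Keep important frequencies from the original, the rest from the augmentation."""
    if len(original) != len(augmented):
        raise ValueError("series must have the same length")
    spec_orig, spec_aug = dft(original), dft(augmented)
    weights = magnitude_preservation_map(spec_orig)
    mixed = [w * o + (1.0 - w) * a
             for w, o, a in zip(weights, spec_orig, spec_aug)]
    return idft(mixed)


original = [math.sin(2 * math.pi * t / 8) for t in range(8)]
augmented = [0.0] * 8  # an extreme "augmentation" that erases the signal
restored = mix_with_preservation_map(original, augmented)
```

In this toy case the dominant frequency of the sine wave receives a weight close to one, so the mixed series stays close to the original even though the augmented series is all zeros.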

翻訳日:2023-12-12 18:37:44 公開日:2023-12-10

# 高再生率と正規化を考慮したスパース・リワードゴール・コンディション強化学習

Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization ( http://arxiv.org/abs/2312.05787v1 )

ライセンス: Link先を確認

Takuya Hiraoka

(参考訳) 高再生率(RR)と正則化を有する強化学習(RL)法は, より優れた試料効率により注目されている。しかし、これらの手法は主に密な報酬のタスクのために開発された。本稿では、これらのRL手法をスパース報酬のゴール条件付きタスクに拡張することを目的とする。我々は、高RRと正則化を備えたRL手法であるRandomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021) を用いた。 REDQをスパース報酬のゴール条件付きタスクに適用するため、以下の修正を加える。 (i) hindsight experience replayの使用と (ii) 目標Q値の有界化。我々は,Roboticsのスパース報酬ゴール条件付き12タスク(Plappert et al., 2018)において,これらの修正を加えたREDQを評価し,従来のSoTA(state-of-the-art) RL法よりも約$2 \times$良いサンプル効率が得られることを示した。さらに、REDQの特定のコンポーネントの必要性を再考し、不要なものを取り除き、それを単純化する。我々の修正によって単純化されたREDQは、Roboticsの4つのFetchタスクにおいて、SoTA手法よりも$\sim 8 \times$優れたサンプル効率が得られる。

Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make the following modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics (Plappert et al., 2018), and show that it achieves about $2 \times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods in 4 Fetch tasks of Robotics.
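
Modification (ii), bounding target Q-values, can be sketched as follows. The sketch assumes a sparse reward of -1 per step until the goal is reached and 0 afterwards, so discounted returns lie in [-1/(1 - gamma), 0]. That reward convention, the function name, and the use of a plain minimum over a list of ensemble estimates are assumptions for this example rather than details confirmed by the abstract.

```python
from typing import Sequence


def bounded_target(reward: float, next_q_values: Sequence[float],
                   gamma: float = 0.99, done: bool = False) -> float:
    """Bellman target clipped to the range of achievable returns.

    next_q_values holds the estimates from a (sub)set of ensemble members;
    taking their minimum gives a pessimistic bootstrap value.
    """
    q_min = -1.0 / (1.0 - gamma)  # return when the goal is never reached
    q_max = 0.0                   # return when the goal is already reached
    bootstrap = 0.0 if done else gamma * min(next_q_values)
    target = reward + bootstrap
    return max(q_min, min(q_max, target))


# An overestimated and an underestimated bootstrap are both pulled into range.
too_low = bounded_target(-1.0, [-500.0, -300.0], gamma=0.9)
too_high = bounded_target(0.0, [5.0, 3.0], gamma=0.9)
in_range = bounded_target(-1.0, [-2.0, -1.0], gamma=0.9)
```

Clipping of this kind keeps bootstrapped targets from drifting outside the values that the reward structure allows, which matters more when many updates are performed per environment step.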

翻訳日:2023-12-12 18:37:24 公開日:2023-12-10

# 深層強化学習を用いた動的環境におけるスケーラブルな自動運転のためのグラフベース予測計画政策ネットワーク(GP3Net)

Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning ( http://arxiv.org/abs/2312.05784v1 )

ライセンス: Link先を確認

Jayabrata Chowdhury, Venkataramanan Shivaraman, Suresh Sundaram and P B Sujit

(参考訳) 最近の自動運転車のモーションプランニング(avs)の進歩は、非定常運転環境でのエキスパートドライバーの行動を使うことに大きな期待を示している。しかし、専門家ドライバーによる学習は、交通参加者のダイナミックな振る舞いと気象条件のために、ドメインシフトやほぼ障害シナリオから回復するために、より汎用性を必要とする。深層グラフに基づく予測・計画政策ネットワーク(GP3Net)フレームワークは,交通参加者間のインタラクションをコンテキスト情報にエンコードし,AVの安全な操作を判断する非定常環境に対して提案されている。時空間グラフは、トラヒック参加者間の相互作用をモデル化し、その参加者の将来の軌跡を予測する。予測された軌道は、進化する非定常運転環境を予測するために不確実性が埋め込まれたAV周辺の将来の占有マップを生成するために利用される。次に、gp3netフレームワークのポリシーネットワークにコンテキスト情報と将来の占有マップを入力し、近位ポリシー最適化(ppo)アルゴリズムを用いてトレーニングする。提案したGP3Net性能は,交通パターンのドメインシフト(アーバン,ハイウェイ,混合)を基準とした標準CARLAベンチマークシナリオで評価される。その結果,gp3netは旧来の模倣学習型計画モデルよりも優れていた。さらに、目に見えない新しい気象条件では、GP3Netはより少ないトラフィック違反で所望の経路を完成させる。最後に,非定常環境における安全対策を強化するための予測モジュールの導入の利点を強調する。

Recent advancements in motion planning for Autonomous Vehicles (AVs) show great promise in using expert driver behaviors in non-stationary driving environments. However, learning only from expert drivers does not generalize well enough to recover from domain shifts and near-failure scenarios caused by the dynamic behavior of traffic participants and by weather conditions. A deep Graph-based Prediction and Planning Policy Network (GP3Net) framework is proposed for non-stationary environments; it encodes the interactions between traffic participants together with contextual information and decides on safe maneuvers for the AV. A spatio-temporal graph models the interactions between traffic participants to predict their future trajectories. The predicted trajectories are used to generate a future occupancy map around the AV, with uncertainties embedded, to anticipate the evolving non-stationary driving environment. The contextual information and future occupancy maps are then input to the policy network of the GP3Net framework, which is trained using the Proximal Policy Optimization (PPO) algorithm. GP3Net is evaluated on standard CARLA benchmarking scenarios with domain shifts in traffic patterns (urban, highway, and mixed). The results show that GP3Net outperforms previous state-of-the-art imitation learning-based planning models across different towns. Further, in unseen weather conditions, GP3Net completes the desired route with fewer traffic infractions. Finally, the results emphasize the advantage of including the prediction module to enhance safety in non-stationary environments.

翻訳日:2023-12-12 18:37:01 公開日:2023-12-10

# DCIR:マルチエージェント強化学習のための動的一貫性固有のリワード

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2312.05783v1 )

ライセンス: Link先を確認

Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan

(参考訳) マルチエージェントシステムにおけるエージェント毎の最適行動ポリシーの学習は必須だが難しい問題である。マルチエージェント強化学習は実りある進歩を遂げているが、2つのエージェントが一貫性のある行動を示すべきかどうかのダイナミクスに対処するという課題はまだ未解決である。本稿では,各エージェントに対して最適なポリシーを学習するために本質的な報酬を利用することで,エージェントの行動が他のエージェントの行動と一致しているかどうかを学習できる新しいアプローチを提案する。振る舞いの一貫性を、2つのエージェント間の出力アクションの相違として定義することから始めます。次に,他者の行動に気付くエージェントを刺激し,それと一貫性があるかどうかを判断するために,動的一貫性内在報酬(dcir)を導入する。最後に,エージェントの学習可能なスケールファクタを各ステップ毎に提供するダイナミックスケールネットワーク(dsn)を考案し,一貫した行動と報酬の程度を動的に確認する。マルチエージェント粒子, Google Research Football および StarCraft II マイクロマネジメントを含む複数の環境における DCIR の評価を行い,その有効性を示した。

Learning an optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the challenge of addressing the dynamics of whether two agents should exhibit consistent behaviors remains under-explored. In this paper, we propose a new approach that uses intrinsic rewards to let agents learn whether their behaviors should be consistent with those of other agents while learning the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. Subsequently, we introduce dynamic consistency intrinsic reward (DCIR) to stimulate agents to be aware of others' behaviors and determine whether to be consistent with them. Lastly, we devise a dynamic scale network (DSN) that provides learnable scale factors for the agent at every time step to dynamically ascertain whether to reward consistent behavior and how large the reward should be. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.
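
The definition of behavior consistency as a divergence between two agents' action outputs for the same observation can be sketched as below. The sketch assumes discrete action distributions, uses KL divergence as the divergence measure, and replaces the learned dynamic scale network with fixed numbers. All three are assumptions for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete action distributions over the same actions."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def consistency_intrinsic_reward(own: Sequence[float],
                                 others: Sequence[Sequence[float]],
                                 scales: Sequence[float]) -> float:
    """Sum of scale * divergence over the other agents.

    A negative scale penalises divergence (encouraging consistent behavior);
    a positive scale rewards it (encouraging different behavior).
    """
    return sum(s * kl_divergence(own, other) for s, other in zip(scales, others))


own_policy = [0.9, 0.1]
other_policies = [[0.9, 0.1], [0.1, 0.9]]  # same observation fed to each agent
scale_factors = [-1.0, 0.5]                # stand-ins for learned scale factors
bonus = consistency_intrinsic_reward(own_policy, other_policies, scale_factors)
```

Here the first agent behaves identically, so it contributes nothing, while the second agent diverges and contributes a positive bonus because its scale factor is positive.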

翻訳日:2023-12-12 18:36:38 公開日:2023-12-10

# PULSAR:パーキンソン病認識のためのマルチストリーム適応畳み込みを用いたグラフベース正の未ラベル学習

PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition ( http://arxiv.org/abs/2312.05780v1 )

ライセンス: Link先を確認

Md. Zarif Ul Alam, Md Saiful Islam, Ehsan Hoque, M Saifur Rahman

(参考訳) パーキンソン病(英: Parkinson's disease、PD)は、運動、言語、協調に影響を及ぼす神経変性疾患である。タイムリーな診断と治療はPD患者の生活の質を改善することができる。しかし、低・中所得国(LMIC)では臨床診断へのアクセスが制限されている。したがって、PDのための自動スクリーニングツールの開発は、特に公衆衛生分野において大きな社会的影響をもたらす可能性がある。本稿では,Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) のフィンガータッピングタスクをウェブカメラで録画したビデオからPDをスクリーニングする新しい方法であるPULSARを提案する。 PULSARは,382名(うち183名がPD患者と自己申告)から収集したデータに基づいて,訓練および評価を行った。適応型グラフ畳み込みニューラルネットワークを用いて,フィンガータッピングタスクに特有の時空間グラフエッジを動的に学習した。指関節の相対的位置, タッピングの速度, 加速度など, PD検出に重要となる様々なモダリティのデータから特徴を学習するために, マルチストリーム適応畳み込みモデルを用いてこのアイデアを拡張した。ビデオのラベルが自己申告されているため、非PDラベルのサンプルに未診断のPDがある可能性がある。我々は、ラベル付き負のデータを必要としないPositive Unlabeled (PU) Learningというアイデアを活用しました。我々の実験は、この方法で問題をモデル化する利点を明らかに示している。 PULSARは検証セットの80.95%の精度を達成し、データ量に制限があるにもかかわらず、独立したテストでは平均71.29%(2.49%の標準偏差)の精度を達成した。これは医療分野でラベル付きデータが不足しているため、特に有望である。 PULSARは、PDスクリーニングを誰にとってもよりアクセスしやすいものにすることを願っている。提案手法は、失調症やハンティントン病などの他の運動障害を評価するために拡張することができる。

Parkinson's disease (PD) is a neurodegenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low- and middle-income countries (LMICs). Therefore, the development of automated screening tools for PD can have a huge social impact, particularly in the public health sector. In this paper, we present PULSAR, a novel method to screen for PD from webcam-recorded videos of the finger-tapping task from the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS). PULSAR is trained and evaluated on data collected from 382 participants (183 self-reported as PD patients). We used an adaptive graph convolutional neural network to dynamically learn the spatio-temporal graph edges specific to the finger-tapping task. We enhanced this idea with a multi-stream adaptive convolution model to learn features from the different data modalities that are critical for detecting PD, such as the relative location of the finger joints and the velocity and acceleration of tapping. As the labels of the videos are self-reported, there could be cases of undiagnosed PD among the samples labeled non-PD. We leveraged the idea of Positive Unlabeled (PU) Learning, which does not need labeled negative data. Our experiments show a clear benefit of modeling the problem in this way. PULSAR achieved 80.95% accuracy on the validation set and a mean accuracy of 71.29% (2.49% standard deviation) on an independent test set, despite being trained with a limited amount of data. This is especially promising as labeled data is scarce in the health care sector. We hope PULSAR will make PD screening more accessible to everyone. The proposed techniques could be extended to assess other movement disorders, such as ataxia and Huntington's disease.
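
The multi-stream idea, in which position, velocity, and acceleration are treated as separate inputs, can be illustrated with finite differences. The sketch assumes a single scalar per frame (for example a thumb-to-index fingertip distance) and a fixed frame rate. The input representation, names, and differencing scheme are assumptions for this example and may differ from the PULSAR pipeline.

```python
from typing import Dict, List


def finite_difference(values: List[float], fps: float) -> List[float]:
    """Forward difference between consecutive frames, scaled to per-second units."""
    return [(later - earlier) * fps for earlier, later in zip(values, values[1:])]


def build_streams(distance: List[float], fps: float = 30.0) -> Dict[str, List[float]]:
    """Derive velocity and acceleration streams from a per-frame distance signal."""
    velocity = finite_difference(distance, fps)
    acceleration = finite_difference(velocity, fps)
    return {"position": distance, "velocity": velocity, "acceleration": acceleration}


# Toy signal sampled at 1 frame per second so the numbers are easy to follow.
streams = build_streams([0.0, 1.0, 4.0, 9.0], fps=1.0)
```

Each differencing step shortens the stream by one frame, so a real pipeline would need to pad or align the streams before feeding them to parallel model branches.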

翻訳日:2023-12-12 18:36:09 公開日:2023-12-10

# クロスサイロ知識伝達を用いた大小言語モデルの相互強化

Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer ( http://arxiv.org/abs/2312.05842v1 )

ライセンス: Link先を確認

Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang

(参考訳) 大きな言語モデル(LLM)は広い知識で権限を与えられるが、タスク固有のパフォーマンスは、しばしば準最適である。そのためタスク固有のデータによるLLMの微調整が必要となるが、そうしたデータはプライバシー上の懸念からアクセスできない可能性がある。本稿では,クライアント上でプライベートなタスク固有データを用いて訓練された,より小さな言語モデル (SLM) を用いたLLMの拡張手法を提案する。 LLMとSLMの相互強化を実現するために,SLMがタスク固有の高品質なデータを生成するようLLMを促し,LLMとSLMの双方が生成されたデータによって拡張されるCrossLMを提案する。様々なベンチマークタスクで公開言語モデルを用いてCrossLMを評価する。その結果、CrossLMは、LLMの一般化能力を維持しながら、クライアント上でのSLMとクラウドサーバ上でのLLMのタスク固有性能を同時に著しく向上させることを示した。

While large language models (LLMs) are empowered with broad knowledge, their task-specific performance is often suboptimal. This necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between LLMs and SLMs, we propose CrossLM, where the SLMs promote the LLM to generate task-specific high-quality data, and both the LLM and SLMs are enhanced with the generated data. We evaluate CrossLM using publicly accessible language models across a range of benchmark tasks. The results demonstrate that CrossLM significantly enhances the task-specific performance of both the SLMs on clients and the LLM on the cloud server while preserving the LLM's generalization capability.

翻訳日:2023-12-12 18:28:34 公開日:2023-12-10

# ニューラルネットワーク解析のためのトポロジカルデータ分析:包括的調査

Topological Data Analysis for Neural Network Analysis: A Comprehensive Survey ( http://arxiv.org/abs/2312.05840v1 )

ライセンス: Link先を確認

Rub\'en Ballester, Carles Casacuberta, Sergio Escalera

(参考訳) このサーベイは、ニューラルネットワーク分析におけるトポロジカルデータ分析(TDA)の適用を包括的に調査する。永続的ホモロジーやMapperといったTDAツールを使用して、ニューラルネットワークとそのデータセットの複雑な構造と振る舞いを調べます。本稿では,データおよびニューラルネットワークから位相情報を得るための様々な戦略について,tdaを用いて検討する。さらに,その一般化能力や表現性など,ニューラルネットワークの特性を分析するためにトポロジカル情報をどのように活用するかについて検討する。深層学習の実際的意義を探究し,特に逆検出やモデル選択といった分野に注目した。調査は,調査対象を4つの広い領域にまとめる。 1.ニューラルネットワークアーキテクチャの特徴 2. 決定領域及び境界の分析 3 内部表現、活性化及びパラメータに関する研究 4. 訓練ダイナミクスと損失関数の探索それぞれのカテゴリの中で,様々な方法論を理解するための背景情報を提供するいくつかの記事について論じる。我々は,本研究から得られた重要な知見を合成し,その分野における課題と潜在的な進歩について議論した。

This survey provides a comprehensive exploration of applications of Topological Data Analysis (TDA) within neural network analysis. Using TDA tools such as persistent homology and Mapper, we delve into the intricate structures and behaviors of neural networks and their datasets. We discuss different strategies to obtain topological information from data and neural networks by means of TDA. Additionally, we review how topological information can be leveraged to analyze properties of neural networks, such as their generalization capacity or expressivity. We explore practical implications of deep learning, specifically focusing on areas like adversarial detection and model selection. Our survey organizes the examined works into four broad domains: 1. Characterization of neural network architectures; 2. Analysis of decision regions and boundaries; 3. Study of internal representations, activations, and parameters; 4. Exploration of training dynamics and loss functions. Within each category, we discuss several articles, offering background information to aid in understanding the various methodologies. We conclude with a synthesis of key insights gained from our study, accompanied by a discussion of challenges and potential advancements in the field.

翻訳日:2023-12-12 18:28:19 公開日:2023-12-10

# 量子インスパイアされたイジング最適化問題の高速数値解法

A Fast Numerical Solver of Quantum-inspired Ising Optimization Problems ( http://arxiv.org/abs/2312.05837v1 )

ライセンス: Link先を確認

Langyu Li and Yu Pan

(参考訳) 量子アニーラ、コヒーレントイジングマシン、および量子インスパイアされた最適化問題を解決するデジタルイジングマシンは、その短期的応用のために急速に発展してきた。デジタルイジングマシンの数値解法は、従来の計算装置に基づいている。本研究では,Ising最適化問題に対する高速かつ効率的な解法を提案する。このアルゴリズムは、イジングモデルのグラフ情報を利用して計算複雑性を低減させるプルーニング法と、離散実行可能領域を連続的に緩和し、効率的な勾配降下法を組み込んだ領域選択法とからなる。実験の結果, 従来の解法よりも桁違いに高速であり, ベンチマーク問題に対する量子アニールを含む量子インスピレーションアニールよりも少なくとも2倍高速であることがわかった。ハードウェアに対する要求が緩和され、量子アニールよりも低コストになるため、提案した解法は、挑戦的な最適化問題の解決における短期的応用の可能性と、量子デバイスの利点を評価するためのベンチマークとして機能する。

Quantum annealers, coherent Ising machines and digital Ising machines for solving quantum-inspired optimization problems have been developing rapidly due to their near-term applications. The numerical solvers of the digital Ising machines are based on traditional computing devices. In this work, we propose a fast and efficient solver for Ising optimization problems. The algorithm consists of a pruning method that exploits the graph information of the Ising model to reduce the computational complexity, and a domain selection method that introduces significant acceleration by relaxing the discrete feasible domain into a continuous one so that efficient gradient descent can be used. The experimental results show that our solver can be an order of magnitude faster than the classical solver, and at least two times faster than quantum-inspired annealers, including simulated quantum annealing, on the benchmark problems. With more relaxed requirements on hardware and lower cost than quantum annealing, the proposed solver has the potential for near-term application in solving challenging optimization problems as well as serving as a benchmark for evaluating the advantage of quantum devices.
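
The relaxation idea can be shown on a toy problem: spins in {-1, +1} are replaced by continuous variables in [-1, 1], projected gradient descent is run on the Ising energy, and the result is rounded back to spins. This is a generic sketch of continuous relaxation only. It omits the pruning step, and the energy convention, step size, and function names are assumptions for this example rather than the authors' algorithm.

```python
from typing import List


def ising_energy(J: List[List[float]], h: List[float], s: List[float]) -> float:
    """E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i, with symmetric J."""
    n = len(s)
    energy = 0.0
    for i in range(n):
        energy -= h[i] * s[i]
        for j in range(i + 1, n):
            energy -= J[i][j] * s[i] * s[j]
    return energy


def relaxed_descent(J: List[List[float]], h: List[float], x0: List[float],
                    lr: float = 0.1, steps: int = 200) -> List[int]:
    """Projected gradient descent on the relaxed energy, then rounding to spins."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Partial derivative of E with respect to x_i.
        grad = [-h[i] - sum(J[i][j] * x[j] for j in range(n) if j != i)
                for i in range(n)]
        # Gradient step followed by projection back onto [-1, 1].
        x = [max(-1.0, min(1.0, xi - lr * g)) for xi, g in zip(x, grad)]
    return [1 if xi >= 0.0 else -1 for xi in x]


# Three-spin ferromagnetic chain: neighbours prefer to align.
J = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
h = [0.0, 0.0, 0.0]
spins = relaxed_descent(J, h, x0=[0.1, -0.05, 0.2])
energy = ising_energy(J, h, spins)
```

On this small chain the relaxed variables are driven to a common sign, which is a ground state. On frustrated instances a relaxation like this can stall in local minima, so the starting point and any restart strategy matter.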

翻訳日:2023-12-12 18:28:04 公開日:2023-12-10

# 大規模言語モデルを用いたエビデンスに基づくオープンドメインファクトチェック

Evidence-based Interpretable Open-domain Fact-checking with Large Language Models ( http://arxiv.org/abs/2312.05834v1 )

ライセンス: Link先を確認

Xin Tan, Bowei Zou and Ai Ti Aw

(参考訳) 現実の主張に対する普遍的なファクトチェックシステムは、有効かつ十分なリアルタイムの証拠を集め、合理的な判断を下す上で大きな課題に直面している。本稿では,実世界シナリオにおけるクレームチェックのためのオープンドメイン説明可能な事実チェックシステム(oe-fact)を提案する。 OE-Factシステムは、大規模言語モデル(LLM)の強力な理解と推論能力を活用して、クレームを検証し、ファクトチェック決定のための因果説明を生成する。従来の3モジュールファクトチェックフレームワークをオープンドメイン設定に適応させるために,まず,オープンwebサイトからクレーム関連情報を適切な証拠として取得する。その後、llmおよびその後の検証のための類似性計算により、請求に係る証拠を保持する。我々は、ファクト抽出および検証(fever)データセット上での3モジュールoeファクトシステムの性能を評価する。実験結果から,我々のOE-Factシステムは,クローズドドメインとオープンドメインの両方のシナリオにおいて,一般的なファクトチェックベースラインシステムよりも優れた性能を示し,信頼性と正確性を確保しつつ,ファクトチェック決定のための簡潔かつ説得力のあるリアルタイム説明を提供する。

Universal fact-checking systems for real-world claims face significant challenges in gathering valid and sufficient real-time evidence and making reasoned decisions. In this work, we introduce the Open-domain Explainable Fact-checking (OE-Fact) system for claim-checking in real-world scenarios. The OE-Fact system can leverage the powerful understanding and reasoning capabilities of large language models (LLMs) to validate claims and generate causal explanations for fact-checking decisions. To adapt the traditional three-module fact-checking framework to the open domain setting, we first retrieve claim-related information as relevant evidence from open websites. After that, we retain the evidence relevant to the claim through LLM and similarity calculation for subsequent verification. We evaluate the performance of our adapted three-module OE-Fact system on the Fact Extraction and Verification (FEVER) dataset. Experimental results show that our OE-Fact system outperforms general fact-checking baseline systems in both closed- and open-domain scenarios, ensuring stable and accurate verdicts while providing concise and convincing real-time explanations for fact-checking decisions.

翻訳日:2023-12-12 18:27:46 公開日:2023-12-10

# 貨物列車のMLP型効率的視覚故障検出のための空間的動的蒸留

Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains ( http://arxiv.org/abs/2312.05832v1 )

ライセンス: Link先を確認

Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren

(参考訳) 物体検出タスクにおける畳み込みニューラルネットワーク(CNN)の適用は成功したが、貨物列車画像から故障を検出する効率は、実際のエンジニアリングシナリオの実装には不十分である。従来のCNNにおける空間的不変性とプーリング層というモデリング上の欠点は、重要なグローバル情報をしばしば見落とし、貨物列車の故障検出タスクにおける位置推定誤差に繋がる。これらの問題を解決するため,貨物列車の視覚的故障検出のための多層パーセプトロン(MLP)に基づく空間的動的蒸留フレームワークを設計した。我々はまず,MLPのようなアーキテクチャが空間不変性の課題を克服し,局所的およびグローバルな手がかりの両方を効果的に取り込むことを可能にする軸シフト戦略を提案する。さらに、学生モデルとの意味的不一致を効果的に解消できる動的教師機構を含む,事前学習済みの教師を用いない動的蒸留法を提案する。このようなアプローチは、低レベルの特徴の外観と高レベルのラベルセマンティクスから、より豊富な詳細を追加の監視信号として掘り出し、グローバルな空間情報と意味情報のモデル化に効率的なインスタンス埋め込みを利用する。さらに,提案した動的教師は,学生と共同で学習し,蒸留効率をより高めることができる。 6つの典型的な故障データセットで実施された大規模な実験により、我々の手法は現在の最先端検出器よりも優れており、より低い計算コストでリアルタイム検出を行いつつ最高の精度を達成することが示された。ソースコードは https://github.com/MVME-HBUT/SDD-FTI-FDet で公開予定である。

Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for real-world engineering scenarios. The modeling shortcomings of spatial invariance and pooling layers in conventional CNNs often cause crucial global information to be neglected, resulting in localization errors in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on the multi-layer perceptron (MLP) for visual fault detection of freight trains. We first present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We then propose a dynamic distillation method that does not require a pre-trained teacher and includes a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. This approach mines more abundant details from lower-level feature appearances and higher-level label semantics as an extra supervision signal, and uses efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can be trained jointly with the student to further enhance distillation efficiency. Extensive experiments on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at https://github.com/MVME-HBUT/SDD-FTI-FDet.

翻訳日:2023-12-12 18:27:24 公開日:2023-12-10

# 物理を意識した多忠実ベイズ最適化:一般化された定式化

Physics-Aware Multifidelity Bayesian Optimization: a Generalized Formulation ( http://arxiv.org/abs/2312.05831v1 )

ライセンス: Link先を確認

Francesco Di Fiore and Laura Mainini

(参考訳) マルチクエリ最適化問題に対する高忠実度モデルの導入は、各クエリでの評価に要する計算コストに大きく制限されている。 multifidelity bayesian methods (mfbo) は、クエリのサブセレクションのみに対してコストのかかる高忠実度応答を含めることができ、高速な低忠実度モデルを使用して最適化プロセスを高速化できる。 State-of-the-artメソッドは純粋にデータ駆動型検索に依存しており、物理的なコンテキストに関する明確な情報は含まない。本稿では,これらのデータ駆動探索を高速化するために,工学的課題の物理領域に関する事前知識を活用できることを認め,mfboの最適化手順中にドメイン認識の形式を埋め込むための一般化した定式化を提案する。特に、バイアスをドメインの物理的構造をキャプチャする多元性獲得関数として定式化する。これにより、データ駆動検索がドメインプロパティのオンザフライ学習から部分的に緩和され、複数の情報ソースの管理が敏感に強化される。本手法は,全計算コストを抑えつつ最適化探索を誘導する高忠実度シミュレーションを効率よく組み込むことができる。物理学を意識した多元的ベイズ最適化を, 設計最適化と健康モニタリング問題という, 科学や工学でよく見られる2つの最適化問題に対して提示し, 解説した。

The adoption of high-fidelity models for many-query optimization problems is largely limited by the significant computational cost of evaluating them at every query. Multifidelity Bayesian optimization (MFBO) methods make it possible to include costly high-fidelity responses for only a subset of queries and to use fast lower-fidelity models to accelerate the optimization process. State-of-the-art methods rely on a purely data-driven search and do not include explicit information about the physical context. This paper acknowledges that prior knowledge about the physical domains of engineering problems can be leveraged to accelerate these data-driven searches, and proposes a generalized formulation for MFBO that embeds a form of domain awareness in the optimization procedure. In particular, we formalize a bias as a multifidelity acquisition function that captures the physical structure of the domain. This partially relieves the data-driven search of learning the domain properties on the fly and appreciably enhances the management of multiple sources of information. The method efficiently includes high-fidelity simulations to guide the optimization search while containing the overall computational expense. Our physics-aware multifidelity Bayesian optimization is presented and illustrated for two classes of optimization problems frequently met in science and engineering, namely design optimization and health monitoring problems.

翻訳日:2023-12-12 18:26:54 公開日:2023-12-10

# スケルトンに基づくアクションセグメンテーションのための分離時空間枠組み

A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation ( http://arxiv.org/abs/2312.05830v1 )

ライセンス: Link先を確認

Yunheng Li, Zhongyu Li, Shanghua Gao, Qilong Wang, Qibin Hou, Ming-Ming Cheng

(参考訳) 識別時空間情報を効果的にモデル化することは、長い行動系列のセグメンテーション活動に不可欠である。しかし, 既存の手法では, 2種類のデカップリングモデリングにより, 弱時空間モデリング能力に制限がある。 (i)カスケード相互作用は空間的・時間的モデリングを結合し、長列上での運動のモデリングを行う。 (ii)ジョイント共有時空間モデリングは、異なるジョイントの動きパターンを無視して、各ジョイントをモデリングするために共有ウェイトを採用する。本稿では,これらの問題に対処するための分散時空間フレームワーク(DeST)を提案する。まず,複数の時空間ブロックの積み重ねを回避し,十分な時空間相互作用を実現する。具体的には、DeSTは一度統一された空間モデルを実行し、空間的特徴を異なるサブフィーチャーのグループに分割し、異なるレイヤから時間的特徴と適応的に相互作用する。異なるサブフィーチャは異なる空間意味を含むため、モデルは各層で最適な相互作用パターンを学ぶことができる。一方,異なる関節が異なる速度で動くという事実に触発されて,個別に訓練可能な重みを用いて各関節の時間的特徴を捉えるジョイント分離時空間モデリングを提案する。異なるシーンの4つの大規模なベンチマークでは、DeSTは計算の複雑さを減らして現在の最先端の手法を著しく上回っている。

Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences. However, we observe that existing methods have weak spatio-temporal modeling capability because of two forms of coupled modeling: (i) cascaded interaction couples spatial and temporal modeling, which over-smooths motion modeling over long sequences, and (ii) joint-shared temporal modeling adopts shared weights to model each joint, ignoring the distinct motion patterns of different joints. We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues. First, we decouple the cascaded spatio-temporal interaction to avoid stacking multiple spatio-temporal blocks while still achieving sufficient spatio-temporal interaction. Specifically, DeST performs unified spatial modeling once and divides the spatial features into different groups of sub-features, which then adaptively interact with temporal features from different layers. Since the different sub-features contain distinct spatial semantics, the model can learn the optimal interaction pattern at each layer. Meanwhile, inspired by the fact that different joints move at different speeds, we propose joint-decoupled temporal modeling, which employs independent trainable weights to capture the distinctive temporal features of each joint. On four large-scale benchmarks covering different scenes, DeST significantly outperforms current state-of-the-art methods with less computational complexity.
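
The contrast between joint-shared and joint-decoupled temporal modeling can be sketched with a one-dimensional temporal filter: the shared variant applies one kernel to every joint, while the decoupled variant gives each joint its own kernel. The kernels below are hand-picked rather than learned, and the joint names and function names are hypothetical. This illustrates the weight-sharing difference only, not DeST's actual layers.

```python
from typing import Dict, List


def temporal_filter(signal: List[float], kernel: List[float]) -> List[float]:
    """Sliding dot product over time (cross-correlation, no padding)."""
    k = len(kernel)
    return [sum(kernel[i] * signal[t + i] for i in range(k))
            for t in range(len(signal) - k + 1)]


def joint_shared(features: Dict[str, List[float]],
                 kernel: List[float]) -> Dict[str, List[float]]:
    """Every joint is filtered with the same weights."""
    return {joint: temporal_filter(sig, kernel) for joint, sig in features.items()}


def joint_decoupled(features: Dict[str, List[float]],
                    kernels: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Each joint is filtered with its own, independent weights."""
    return {joint: temporal_filter(sig, kernels[joint])
            for joint, sig in features.items()}


features = {"hand": [0.0, 1.0, 2.0, 3.0],  # fast-moving joint
            "hip": [1.0, 1.0, 1.0, 1.0]}   # nearly static joint
kernels = {"hand": [-1.0, 0.0, 1.0],       # difference filter picks up motion
           "hip": [0.5, 0.0, 0.5]}         # averaging filter smooths noise
shared_out = joint_shared(features, [-1.0, 0.0, 1.0])
decoupled_out = joint_decoupled(features, kernels)
```

With the shared difference filter the static hip produces an all-zero response, whereas the decoupled version can keep a smoothing filter for the hip and a motion-sensitive filter for the hand.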

翻訳日:2023-12-12 18:26:30 公開日:2023-12-10

# 運動画像の効率的なニューラル表現と実行のためのスパースマルチタスク学習

Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution ( http://arxiv.org/abs/2312.05828v1 )

ライセンス: Link先を確認

Hye-Bin Shin, Kang Yin, Seong-Whan Lee

(参考訳) 脳-コンピュータインタフェース(BCI)におけるニューラルネットワーク解釈とユーザ意図の分類のための効率的なニューラルネットワークモデルを求める中で、基礎となる神経サブスペースのスパース表現を学習することが重要である。本研究では,人間の脳で観察される神経部分空間の自然な分割に着想を得た,運動画像(mi)と運動実行(me)タスクのためのスパースなマルチタスク学習フレームワークを提案する。 mi-me分類のためのdual-task cnnモデルが与えられた場合,sparsificationアプローチをpruneの超流動接続に適用し,両タスクにおいて高い重要性を示すものを強化する。提案手法では,各タスクに関連付けられ,共通するニューラルアンサンブルを解明し,冗長な接続を排除し,ニューラル信号復号の忠実性を高めるために,スペーシフィケーション手法を用いる。以上の結果から, この調整された疎水性は, オーバーフィッティング問題を緩和し, 少ないデータ量でテスト性能を向上させることを示唆し, 計算効率とロバストなBCIシステムの実現に向けての道のりが示唆された。

In the quest for efficient neural network models for neural data interpretation and user intent classification in brain-computer interfaces (BCIs), learning meaningful sparse representations of the underlying neural subspaces is crucial. The present study introduces a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks, inspired by the natural partitioning of associated neural subspaces observed in the human brain. Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections and reinforce those that show high importance in both tasks. Through our approach, we seek to elucidate the distinct and common neural ensembles associated with each task, employing principled sparsification techniques to eliminate redundant connections and boost the fidelity of neural signal decoding. Our results indicate that this tailored sparsity can mitigate the overfitting problem and improve test performance with a small amount of data, suggesting a viable path forward for computationally efficient and robust BCI systems.
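
One way to read "prune superfluous connections and reinforce those that show high importance in both tasks" is as a mask computed from per-task saliency scores. The sketch below assumes a first-order saliency |weight x gradient| and keeps the connections whose smaller score across the two tasks ranks highest. The saliency definition, the minimum rule, and the names are assumptions for this example, not the paper's procedure.

```python
from typing import List


def saliency(weights: List[float], grads: List[float]) -> List[float]:
    """First-order importance score per connection: |w * dL/dw|."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def joint_keep_mask(weights: List[float], grads_mi: List[float],
                    grads_me: List[float], keep_ratio: float) -> List[int]:
    """1 for connections that matter in both tasks, 0 for the rest.

    Ties at the threshold are all kept, so slightly more than keep_ratio
    of the connections can survive.
    """
    joint = [min(a, b) for a, b in zip(saliency(weights, grads_mi),
                                       saliency(weights, grads_me))]
    n_keep = max(1, int(keep_ratio * len(weights)))
    threshold = sorted(joint, reverse=True)[n_keep - 1]
    return [1 if score >= threshold else 0 for score in joint]


weights = [1.0, -2.0, 0.5, 3.0]
grads_mi = [0.1, 0.5, 0.0, 0.2]  # toy gradients for the motor imagery loss
grads_me = [0.4, 0.1, 0.9, 0.3]  # toy gradients for the motor execution loss
mask = joint_keep_mask(weights, grads_mi, grads_me, keep_ratio=0.5)
pruned = [w * m for w, m in zip(weights, mask)]
```

The third connection is important only for motor execution, so the minimum rule removes it. A variant that also keeps task-specific connections would use separate masks per task.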

翻訳日:2023-12-12 18:26:08 公開日:2023-12-10

# 毒性流の検出

Detecting Toxic Flow ( http://arxiv.org/abs/2312.05827v1 )

ライセンス: Link先を確認

\'Alvaro Cartea, Gerardo Duran-Martin, Leandro S\'anchez-Betancourt

(参考訳) 本稿では,ブローカーがクライアントから受け取る有害取引を予測する枠組みを開発した。トキシック取引は、末層と部分空間の推定の射影に基づく統一(PULSE)と呼ばれる新しいオンラインベイズ手法で予測される。 pulseはベイジアンニューラルネットワークをシーケンシャルにトレーニングするための高速かつ統計効率の良いオンライン手順である。当社の方法論をテストするために、外国為替取引のプロプライエタリなデータセットを使用しています。 PULSEは、取引が有害になるかどうかを予測する際に、標準的な機械学習および統計手法よりも優れており、ベンチマーク手法はロジスティック回帰、ランダムフォレスト、再帰的に更新された最大様相推定器である。顧客から受け取った取引の内面化や外部化のために毒性予測を利用するブローカーのための戦略を考案する。パラメータの更新や予測に1ミリ秒未満を要するため,提案手法をリアルタイムに実装することができる。ベンチマークと比較すると、PULSEは最も高いPnLを獲得し、私たちが考慮する地平線における最大の損失を回避している。

This paper develops a framework to predict toxic trades that a broker receives from her clients. Toxic trades are predicted with a novel online Bayesian method which we call the projection-based unification of last-layer and subspace estimation (PULSE). PULSE is a fast and statistically-efficient online procedure to train a Bayesian neural network sequentially. We employ a proprietary dataset of foreign exchange transactions to test our methodology. PULSE outperforms standard machine learning and statistical methods when predicting if a trade will be toxic; the benchmark methods are logistic regression, random forests, and a recursively-updated maximum-likelihood estimator. We devise a strategy for the broker who uses toxicity predictions to internalise or to externalise each trade received from her clients. Our methodology can be implemented in real-time because it takes less than one millisecond to update parameters and make a prediction. Compared with the benchmarks, PULSE attains the highest PnL and the largest avoided loss for the horizons we consider.
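
The brokerage strategy described above routes each trade based on a toxicity prediction. A minimal sketch of such a rule follows. It assumes the model outputs a probability and that trades above a fixed threshold are externalised (hedged in the market) while the rest are internalised. The threshold value and the function name are hypothetical, and the paper's strategy may use a different decision rule.

```python
def route_trade(toxicity_probability: float, threshold: float = 0.5) -> str:
    """Return 'externalise' for trades predicted to be toxic, else 'internalise'."""
    if not 0.0 <= toxicity_probability <= 1.0:
        raise ValueError("toxicity_probability must lie in [0, 1]")
    return "externalise" if toxicity_probability > threshold else "internalise"


# Predicted probabilities for three incoming client trades (made-up numbers).
decisions = [route_trade(p) for p in (0.1, 0.5, 0.9)]
```

Because a decision is needed for every incoming trade, a rule like this is only useful if the prediction behind it arrives quickly, which is why the sub-millisecond update time reported in the abstract matters.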

翻訳日:2023-12-12 18:25:47 公開日:2023-12-10

# R2Human:1枚の画像からリアルタイムの3D画像表示

R2Human: Real-Time 3D Human Appearance Rendering from a Single Image ( http://arxiv.org/abs/2312.05826v1 )

ライセンス: Link先を確認

Qiao Feng, Yuanwang Yang, Yu-Kun Lai, Kun Li

(参考訳) ホログラフィックコミュニケーションと没入型社会体験を実現するためには,1枚の画像から3次元人間の外観を再構築することが不可欠である。しかし、これは、通常マルチカメラのセットアップに依存する、あるいはオフライン操作に限定される既存のメソッドにとって、依然として課題である。本稿では,1つの画像から実写的3次元人物像のリアルタイム推論とレンダリングを行う最初の手法であるr$^2$humanを提案する。我々のアプローチの中核は、暗黙のテクスチャフィールドと明示的なニューラルレンダリングの強みと、新しい表現であるZマップを組み合わせることである。そこで本研究では,可視領域の忠実度の高い色再構成を行い,オクルード領域の信頼性の高い色推定を行うエンドツーエンドネットワークを提案する。ネットワークの3次元知覚能力をさらに高めるために、フーリエ占有場を利用して、テクスチャフィールド生成の前駆体として機能し、レンダリング段階でサンプリング面を提供する詳細な3次元形状を再構築する。実験の結果,本手法は合成データと実世界画像の両方において最先端のパフォーマンスを達成し,オフラインメソッドを上回ることさえ可能であった。プロジェクトページは http://cic.tju.edu.cn/faculty/likun/projects/R2Human で研究目的で公開されている。

Reconstructing 3D human appearance from a single image is crucial for achieving holographic communication and immersive social experiences. However, this remains a challenge for existing methods, which typically rely on multi-camera setups or are limited to offline operations. In this paper, we propose R$^2$Human, the first approach for real-time inference and rendering of photorealistic 3D human appearance from a single image. The core of our approach is to combine the strengths of implicit texture fields and explicit neural rendering with our novel representation, namely Z-map. Based on this, we present an end-to-end network that performs high-fidelity color reconstruction of visible areas and provides reliable color inference for occluded regions. To further enhance the 3D perception ability of our network, we leverage the Fourier occupancy field to reconstruct a detailed 3D geometry, which serves as a prior for the texture field generation and provides a sampling surface in the rendering stage. Experiments show that our end-to-end method achieves state-of-the-art performance on both synthetic data and challenging real-world images and even outperforms many offline methods. The project page is available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/R2Human.

翻訳日:2023-12-12 18:25:29 公開日:2023-12-10

# オープンエンド・エンボディード・タスクの解決に向けて

Toward Open-ended Embodied Tasks Solving ( http://arxiv.org/abs/2312.05822v1 )

ライセンス: Link先を確認

William Wei Wang, Dongqi Han, Xufang Luo, Yifei Shen, Charles Ling, Boyu Wang, Dongsheng Li

(参考訳) 近年,ロボットなどのエンボディドエージェントに人工知能(AI)を搭載することがますます重要になっている。大きな課題はタスクのオープンエンド性である。実際には、ロボットは、多面的かつ動的で、決定的な「終了状態」を持たず、訓練中に遭遇しなかった新しい目標を伴うタスクを実行する必要があることが多い。この問題に対処するため,本稿では,エンボディドAIがオープンエンドなタスク目標に対して柔軟かつ動的に計画・行動できるようにすることを目的とした新しいフレームワークであるDiffusion for Open-ended Goals (DOG) を紹介する。 DOGは、オンラインの計画と制御を適応的に行うために、拡散モデルの生成能力と最先端の訓練不要なガイダンス技術を組み合わせる。評価の結果,迷路ナビゲーションとロボット制御の両問題において,訓練中に見られなかった様々な種類の新しいタスク目標をDOGが扱えることが示された。本研究は、エンボディドAIの適応性と、オープンエンドな目標に取り組む能力の向上に光を当てるものである。

Empowering embodied agents, such as robots, with Artificial Intelligence (AI) has become increasingly important in recent years. A major challenge is task open-endedness. In practice, robots often need to perform tasks with novel goals that are multifaceted, dynamic, lack a definitive "end-state", and were not encountered during training. To tackle this problem, this paper introduces Diffusion for Open-ended Goals (DOG), a novel framework designed to enable embodied AI to plan and act flexibly and dynamically for open-ended task goals. DOG synergizes the generative prowess of diffusion models with state-of-the-art, training-free guidance techniques to adaptively perform online planning and control. Our evaluations demonstrate that DOG can handle various kinds of novel task goals not seen during training, in both maze navigation and robot control problems. Our work sheds light on enhancing embodied AI's adaptability and competency in tackling open-ended goals.

翻訳日:2023-12-12 18:25:06 公開日:2023-12-10

# ASVD:大規模言語モデル圧縮のためのアクティベーション対応特異値分解

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models ( http://arxiv.org/abs/2312.05821v1 )

ライセンス: Link先を確認

Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

(参考訳) 本稿では,大規模言語モデル (LLM) を圧縮し,様々なコンピューティング環境での普及を促進するための,ポストホックかつトレーニングフリーな新しい圧縮パラダイムについて検討する。 LLM圧縮の課題、特に、広範囲なトレーニングデータと計算資源への依存について掘り下げる。本稿では,これらの制約に対処するために,アクティベーション対応特異値分解(ASVD)と呼ばれるトレーニングフリーアプローチを提案する。 ASVDは、活性化分布に基づいて重み行列を調整することで活性化の外れ値を効果的に扱い、分解の精度と効率を向上させる。また, 層ごとに最適な分解を行うための反復的なキャリブレーション処理により, 分解に対する感度がLLMの層ごとに異なる問題にも対処する。実験により、ASVDは推論能力を失うことなく、ネットワークを10%から20%圧縮できることを示した。加えて、他のLLM圧縮パラダイムとシームレスに統合することができ、柔軟な互換性を示している。コードと圧縮されたモデルはhttps://github.com/hahnyuan/ASVD4LLMで入手できる。

This paper explores a new post-hoc training-free compression paradigm for compressing Large Language Models (LLMs) to facilitate their wider adoption in various computing environments. We delve into the challenges of LLM compression, notably their dependency on extensive training data and computational resources. We propose a training-free approach dubbed Activation-aware Singular Value Decomposition (ASVD) to address these limitations. ASVD effectively manages activation outliers by adjusting the weight matrix based on the activation distribution, improving decomposition accuracy and efficiency. Our method also addresses the varying sensitivity of different LLM layers to decomposition, with an iterative calibration process for optimal layer-specific decomposition. Experiments demonstrate that ASVD can compress a network by 10%-20% without losing reasoning capacities. Additionally, it can be seamlessly integrated with other LLM compression paradigms, showcasing its flexible compatibility. Code and compressed models are available at https://github.com/hahnyuan/ASVD4LLM.
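
The abstract does not spell out the decomposition. One way to write an activation-aware truncated SVD is sketched below. The invertible diagonal scaling matrix $S$ (assumed to be built from per-channel activation statistics) and the rank $k$ are illustrative assumptions, not details confirmed by the abstract.

```latex
% Assumed notation: W is an m x n weight matrix, x an input activation,
% S an invertible diagonal matrix derived from activation statistics.
\begin{aligned}
W &= (W S)\, S^{-1}, \qquad W S = U \Sigma V^{\top} \\
W &\approx U_k \Sigma_k V_k^{\top} S^{-1} \\
W x &\approx U_k \Sigma_k \bigl( V_k^{\top} S^{-1} x \bigr)
\end{aligned}
```

Under these assumptions the two stored factors hold $k(m+n)$ parameters instead of $mn$, so the layer only shrinks when $k < mn/(m+n)$.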

翻訳日:2023-12-12 18:24:45 公開日:2023-12-10

# ICTSurF: ニューラルネットワークによる暗黙的連続時間生存関数

ICTSurF: Implicit Continuous-Time Survival Functions with Neural Networks ( http://arxiv.org/abs/2312.05818v1 )

ライセンス: Link先を確認

Chanon Puttanawarut, Panu Looareesuwan, Romen Samuel Wabina, Prut Saowaprut

(参考訳) 生存分析は、時間経過に伴う事象の発生可能性を予測する方法として広く知られている。打ち切り(censored)サンプルを扱うという課題は依然として残っている。 Cox比例ハザード(CPH)モデルのような伝統的な手法は、比例ハザードという強い仮定と、共変量間のあらかじめ定められた関係に起因する制約を抱えている。ディープニューラルネットワーク(DNN)に基づくモデルの台頭により、生存分析における有効性の向上が示されている。本研究では,連続時間生存モデルに基づき,暗黙的表現によって生存分布を構築する暗黙的連続時間生存関数(ICTSurF)を導入する。これにより,ニューラルネットワークのアーキテクチャによらず,連続時間空間の入力を受け取り,連続時間空間の生存確率を出力することができる。既存手法との比較評価は,提案手法の高い競争力を裏付けるものである。 ICTSurFの実装はhttps://github.com/44REAM/ICTSurFで公開されています。

Survival analysis is a widely known method for predicting the likelihood of an event over time. The challenge of dealing with censored samples still remains. Traditional methods, such as the Cox Proportional Hazards (CPH) model, are limited by the strong assumption of proportional hazards and the predetermined relationships between covariates. The rise of models based on deep neural networks (DNNs) has demonstrated enhanced effectiveness in survival analysis. This research introduces the Implicit Continuous-Time Survival Function (ICTSurF), built on a continuous-time survival model, and constructs survival distribution through implicit representation. As a result, our method is capable of accepting inputs in continuous-time space and producing survival probabilities in continuous-time space, independent of neural network architecture. Comparative assessments with existing methods underscore the high competitiveness of our proposed approach. Our implementation of ICTSurF is available at https://github.com/44REAM/ICTSurF.
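
To make "survival probabilities in continuous-time space" concrete, here is a minimal sketch of the standard relation between a hazard function and a survival function, S(t) = exp(-∫₀ᵗ h(u) du). The function names and the toy hazard are hypothetical. The abstract does not say that ICTSurF uses this particular numerical integration, so treat it as an illustration of the general idea only.

```python
import math


def survival_probability(hazard, t, steps=1000):
    """Approximate S(t) = exp(-integral_0^t hazard(u) du) with the trapezoidal rule."""
    if t <= 0:
        return 1.0
    dt = t / steps
    # Trapezoidal rule: half weight on the two end points, full weight inside.
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * dt)
    return math.exp(-total * dt)


def toy_hazard(u):
    # Hypothetical stand-in for a learned, non-negative hazard function.
    return 0.1 + 0.05 * u


# Any real-valued time can be queried, not only points on a fixed grid.
for t in (0.5, 1.7, 4.25):
    print(f"S({t}) = {survival_probability(toy_hazard, t):.4f}")
```

In a learned model, `toy_hazard` would presumably be replaced by a network that takes the covariates and the query time as input.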

翻訳日:2023-12-12 18:24:28 公開日:2023-12-10

# 深層生成ネットワークに基づく音声合成のためのニューラル音声埋め込み

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks ( http://arxiv.org/abs/2312.05814v1 )

ライセンス: Link先を確認

Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee

(参考訳) 脳音声技術は、人工知能、脳-コンピュータインタフェース、音声合成の分野を含む学際的応用の融合を表す。ニューラル表現学習に基づく意図的復号と音声合成は、神経活動と人間の言語コミュニケーションの手段を直接接続し、コミュニケーションの自然性を大幅に向上させる。表現学習と音声合成技術の発展に関する最近の発見により、脳信号の音声への直接翻訳は大きな可能性を秘めている。特に、ニューラルネットワークに与えられた処理された入力特徴とニューラルスピーチ埋め込みは、脳信号からの音声生成に深い生成モデルを使用する場合、全体的なパフォーマンスにおいて重要な役割を果たす。本稿では,脳信号からの音声合成を可能とし,最終的には非言語コミュニケーションの革新を促進する現在の脳-音声技術を紹介する。また,音声合成作業において重要な役割を担っていると思われる,神経生理学的アクティベーションの基盤となる神経特徴や音声の埋め込みを包括的に分析する。

Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of the speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings which are given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying the neurophysiological activation while performing speech, which may play a significant role in speech synthesis work.

翻訳日:2023-12-12 18:24:12 公開日:2023-12-10

# 生成コンテンツを活用したフェデレーション学習

Federated Learning Empowered by Generative Content ( http://arxiv.org/abs/2312.05807v1 )

ライセンス: Link先を確認

Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

(参考訳) フェデレートラーニング(FL)は、プライバシを保護した形で、分散したプライベートデータをモデルのトレーニングに活用することを可能にする。しかし、データの不均一性は現在のFL法の性能を著しく制限する。本稿では,生成コンテンツ(generative content)でプライベートデータを多様化することにより,データの不均一性問題を緩和するように設計された,FedGCと呼ばれる新しいFLフレームワークを提案する。 FedGCは、ワンショットのデータ生成ステップを導入するだけなので、実装が容易なフレームワークである。データ生成では,重要かつ探求に値する3つの側面(予算配分,プロンプト設計,生成ガイダンス)を整理し,各側面に対して3つの解決策候補を提案する。具体的には,生成ガイダンスにおけるデータの多様性と忠実性のトレードオフを改善するために,プロンプトと実データの両方のガイダンスに基づいてデータを生成することを提案する。生成されたデータはプライベートデータとマージされ、ローカルモデルのトレーニングに用いられる。このような生成データはプライベートデータの多様性を高め、各クライアントが潜在的に偏ったプライベートデータに過度に適合することを防ぎ、データの不均一性の問題を緩和する。我々は、さまざまなベースライン、データセット、シナリオ、モダリティをカバーする、FedGCに関する体系的な実証的研究を行う。興味深い発見として, (1) 生成データとプライベートデータの間に顕著な相違がある場合でも, FedGCはFL法の性能を一貫して, 著しく向上させること, (2) FedGCは性能とプライバシ保護の両方を改善することが挙げられる。この研究が将来の研究に刺激を与え、生成コンテンツによるFL強化の可能性がさらに探求されることを期待している。

Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way. However, data heterogeneity significantly limits the performance of current FL methods. In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content. FedGC is a simple-to-implement framework as it only introduces a one-shot step of data generation. In data generation, we summarize three crucial and worth-exploring aspects (budget allocation, prompt design, and generation guidance) and propose three solution candidates for each aspect. Specifically, to achieve a better trade-off between data diversity and fidelity for generation guidance, we propose to generate data based on the guidance of prompts and real data simultaneously. The generated data is then merged with private data to facilitate local model training. Such generative data increases the diversity of private data to prevent each client from fitting the potentially biased private data, alleviating the issue of data heterogeneity. We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities. Interesting findings include (1) FedGC consistently and significantly enhances the performance of FL methods, even when notable disparities exist between generative and private data; (2) FedGC achieves both better performance and privacy-preservation. We wish this work can inspire future works to further explore the potential of enhancing FL with generative content.

翻訳日:2023-12-12 18:23:55 公開日:2023-12-10

# X線CT画像における大腰筋セグメンテーションのための3次元数値スキーム

Three-dimensional numerical schemes for the segmentation of the psoas muscle in X-ray computed tomography images ( http://arxiv.org/abs/2312.05887v1 )

ライセンス: Link先を確認

Giulio Paolucci, Isabella Cama, Cristina Campi, Michele Piana

(参考訳) 形態学的・機能的画像における大腰筋の解析は、サルコペニア、すなわち多因子的な病因と相関しうる骨格筋の量と機能の全身的な喪失を評価するための正確なアプローチであることが示されている。サルコペニア評価を放射線科のワークフローに組み込むには、セグメンテーションの信頼性とかなりの程度の自動化を保証する画像処理の計算パイプラインを実装する必要がある。本研究は,低線量X線CT画像における大腰筋セグメンテーションに3次元数値スキームを用いる。具体的には, レベルセット法に着目し, 古典的進化モデルと3次元測地線モデルという2つの標準手法の性能と, 後者に独自の1次修正を加えた手法の性能を比較した。この分析の結果, これらの勾配に基づくスキームは手動セグメンテーションに対して信頼性を保証し, 1次スキームは2次アプローチで必要とされるものよりもはるかに小さい計算負荷で済むことがわかった。

The analysis of the psoas muscle in morphological and functional imaging has proved to be an accurate approach to assess sarcopenia, i.e. a systemic loss of skeletal muscle mass and function that may be correlated to multifactorial etiological aspects. The inclusion of sarcopenia assessment into a radiological workflow would need the implementation of computational pipelines for image processing that guarantee segmentation reliability and a significant degree of automation. The present study utilizes three-dimensional numerical schemes for psoas segmentation in low-dose X-ray computed tomography images. Specifically, here we focused on the level set methodology and compared the performances of two standard approaches, a classical evolution model and a three-dimensional geodesic model, with the performances of an original first-order modification of the latter. The results of this analysis show that these gradient-based schemes guarantee reliability with respect to manual segmentation and that the first-order scheme requires a computational burden that is significantly smaller than the one needed by the second-order approach.

翻訳日:2023-12-12 18:16:33 公開日:2023-12-10

# カーネルリッジ回帰に対する適応パラメータ選択

Adaptive Parameter Selection for Kernel Ridge Regression ( http://arxiv.org/abs/2312.05885v1 )

ライセンス: Link先を確認

Shao-Bo Lin

(参考訳) 本稿ではカーネルリッジ回帰(KRR)のパラメータ選択問題に焦点をあてる。 KRRの特別なスペクトル特性により、パラメータ区間を細かく分割すると、連続する2つのKRR推定値の差が小さくなることが分かる。この観察に基づき,いわゆるLepskii型原理に従って,KRRのための早期停止型パラメータ選択戦略を開発する。学習理論の枠組みで理論的検証を行い,提案したパラメータ選択戦略を備えたKRRが最適な学習率を達成し,異なるノルムに適応することを示し,カーネル法のパラメータ選択に関する新たな成果を提供する。

This paper focuses on parameter selection issues of kernel ridge regression (KRR). Due to special spectral properties of KRR, we find that delicate subdivision of the parameter interval shrinks the difference between two successive KRR estimates. Based on this observation, we develop an early-stopping type parameter selection strategy for KRR according to the so-called Lepskii-type principle. Theoretical verifications are presented in the framework of learning theory to show that KRR equipped with the proposed parameter selection strategy succeeds in achieving optimal learning rates and adapts to different norms, providing a new record of parameter selection for kernel methods.
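
For reference, the regularization parameter being selected is the $\lambda$ in the usual KRR estimator, written out below. The notation ($D$, $\mathcal{H}_K$, $\mathbb{K}$) is assumed here for illustration and is not taken from the abstract.

```latex
% Assumed notation: data D = {(x_i, y_i)}, i = 1..n; kernel K with RKHS H_K;
% \mathbb{K} is the n x n Gram matrix with entries K(x_i, x_j).
\begin{aligned}
f_{D,\lambda} &= \arg\min_{f \in \mathcal{H}_K}
  \left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - y_i \bigr)^2
  + \lambda \, \| f \|_K^2 \right\} \\
f_{D,\lambda}(x) &= \sum_{i=1}^{n} \alpha_i \, K(x, x_i), \qquad
  \alpha = \bigl( \mathbb{K} + \lambda n I \bigr)^{-1} y
\end{aligned}
```

A Lepskii-type rule would then walk through a grid of candidate $\lambda$ values and stop based on the difference between successive estimates. The exact stopping criterion is not given in the abstract and is left unspecified here.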

翻訳日:2023-12-12 18:16:15 公開日:2023-12-10

# ディープニューラルネットワークを用いた高速サンプリング粒子タイミング検出器の精度向上

Using deep neural networks to improve the precision of fast-sampled particle timing detectors ( http://arxiv.org/abs/2312.05883v1 )

ライセンス: Link先を確認

Mateusz Kocot, Krzysztof Misan, Valentina Avati, Edoardo Bossini, Leszek Grzanka, Nicola Minafra

(参考訳) 粒子タイミング検出器の測定は、通過する粒子が付与する電荷の統計的変動によって生じるタイムウォーク効果の影響を受けることが多い。定数分数判別器(CFD)アルゴリズムは、CERNのLHCにおけるCMS-PPSシステムのように、テストセットアップと稼働中の実験の両方で、この効果を緩和するために頻繁に使用される。 CFDは単純で効果的であるが、時系列のすべての電圧サンプルを活用するわけではない。その性能は、粒子到着時間の計算を含む時系列解析に一般的に使用されるディープニューラルネットワークによって向上できる可能性がある。 DESY-IIシンクロトロンのテストビーム施設で取得したデータを用いて様々なニューラルネットワークアーキテクチャを評価した。この施設では、PPSダイヤモンドタイミング検出器に加えて高精度なMCP(マイクロチャネルプレート)検出器が設置されていた。 MCPの測定値は、ネットワークのトレーニングの基準として使われ、その結果を標準のCFD法と比較するためにも用いられた。最終的に、検出器の読み出しチャンネルに応じて、タイミング精度を8%から23%改善した。最良の結果は、古典的な畳み込みネットワークや多層パーセプトロンを上回ったUNetベースのモデルを用いて得られた。

Measurements from particle timing detectors are often affected by the time walk effect caused by statistical fluctuations in the charge deposited by passing particles. The constant fraction discriminator (CFD) algorithm is frequently used to mitigate this effect both in test setups and in running experiments, such as the CMS-PPS system at CERN's LHC. The CFD is simple and effective but does not leverage all voltage samples in a time series. Its performance could be enhanced with deep neural networks, which are commonly used for time series analysis, including computing the particle arrival time. We evaluated various neural network architectures using data acquired at the test beam facility in the DESY-II synchrotron, where a precise MCP (MicroChannel Plate) detector was installed in addition to PPS diamond timing detectors. MCP measurements were used as a reference to train the networks and compare the results with the standard CFD method. Ultimately, we improved the timing precision by 8% to 23%, depending on the detector's readout channel. The best results were obtained using a UNet-based model, which outperformed classical convolutional networks and the multilayer perceptron.
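
The time walk effect and the idea behind a CFD can be illustrated with a small sketch. This is a simplified digital variant written for illustration, assuming positive-going, baseline-subtracted pulses. The function names and the sample pulse are hypothetical and are not taken from the paper. A fixed threshold fires earlier on larger pulses, while a threshold set at a constant fraction of each pulse's peak does not.

```python
def crossing_time(samples, threshold, dt=1.0):
    """Time of the first upward crossing of `threshold`, linearly interpolated."""
    for i, value in enumerate(samples):
        if value >= threshold:
            if i == 0:
                return 0.0
            prev = samples[i - 1]
            # Interpolate between the last sample below and the first at/above.
            return (i - 1 + (threshold - prev) / (value - prev)) * dt
    raise ValueError("threshold never reached")


def cfd_time(samples, fraction=0.5, dt=1.0):
    """Constant-fraction timing: threshold is a fixed fraction of the pulse peak."""
    peak = max(samples)
    if peak <= 0:
        raise ValueError("expected a positive pulse")
    return crossing_time(samples, fraction * peak, dt)


pulse = [0.0, 0.0, 1.0, 3.0, 6.0, 8.0, 7.0, 4.0, 2.0, 0.0]
big_pulse = [3.0 * v for v in pulse]  # same shape, three times the amplitude

# Fixed threshold: the larger pulse crosses earlier (time walk).
print(crossing_time(pulse, 2.0), crossing_time(big_pulse, 2.0))
# Constant fraction: both pulses give the same time stamp.
print(cfd_time(pulse), cfd_time(big_pulse))
```

Note that this estimate uses only the two samples around the crossing, which matches the abstract's remark that the CFD does not use every voltage sample in the series.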

翻訳日:2023-12-12 18:16:05 公開日:2023-12-10

# データ駆動型最適停止:純粋な探索分析

Data-driven optimal stopping: A pure exploration analysis ( http://arxiv.org/abs/2312.05880v1 )

ライセンス: Link先を確認

Sören Christensen, Niklas Dexheimer, Claudia Strauch

(参考訳) 最適停止の標準理論は、基礎となる過程が本質的に既知であるという理想化された仮定に基づいている。本稿では, この制約を取り除き, 一般的な拡散過程におけるデータ駆動型最適停止について検討し, 提案する最適停止境界の推定量の統計的性能に焦点を当てる。より具体的には、単純リグレットに対する非漸近的な上界と、一様かつ非漸近的なPAC境界を導出する。単純リグレットに対する一致する下界によって上界の結果を補完することで、ミニマックス最適性を検証する。すべての結果は、ペイオフ関数に関する一般的な条件の下と、二値分類で使われるマージン条件を模倣したより洗練された仮定の下の両方で示され、後者は収束率の向上につながる。さらに,単純リグレットに関する結果が,特定の探索・活用戦略の累積リグレットに,下界と上界の両面でどのように引き継がれるかを検討する。

The standard theory of optimal stopping is based on the idealised assumption that the underlying process is essentially known. In this paper, we drop this restriction and study data-driven optimal stopping for a general diffusion process, focusing on investigating the statistical performance of the proposed estimator of the optimal stopping barrier. More specifically, we derive non-asymptotic upper bounds on the simple regret, along with uniform and non-asymptotic PAC bounds. Minimax optimality is verified by completing the upper bound results with matching lower bounds on the simple regret. All results are shown both under general conditions on the payoff functions and under more refined assumptions that mimic the margin condition used in binary classification, leading to an improved rate of convergence. Additionally, we investigate how our results on the simple regret transfer to the cumulative regret for a specific exploration-exploitation strategy, both with respect to lower bounds and upper bounds.

翻訳日:2023-12-12 18:15:45 公開日:2023-12-10

# 解き放たれた野生の運動: チーターにおけるマーカーレス3次元運動学と力推定

Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs ( http://arxiv.org/abs/2312.05879v1 )

ライセンス: Link先を確認

Zico da Silva, Stacy Shield, Penny E. Hudson, Alan M. Wilson, Fred Nicolls and Amir Patel

(参考訳) 野生動物の機動性の複雑なダイナミクスの研究は極めて困難である。チーター(Acinonyx jubatus)はその好例であり、比類のない速度と機動性に大きな関心が寄せられているにもかかわらず、これらの動物から完全な全身運動データを取得することは未解決の問題のままである。これは野生のチーターでは特に困難であり、使用する方法が遠隔的であり、動物の動きを拘束しないことが不可欠である。本研究では,野生のチーターから得られたデータを用いて,対象の3次元運動学と関節トルクを遠隔で推定する軌道最適化手法を提案する。この手法をkinetic full trajectory estimation (K-FTE) と呼ぶ。本手法は,同期したビデオとフォースプレートのデータからなるデータセット上で検証する。 3次元運動学は平均再投影誤差17.69ピクセル (鼻から眼までの長さをしきい値とするPCKで62.94%) で再構成でき, 推定した地面反力は, フォースプレートのデータと比較して平均二乗平均平方根誤差が171.3 N (ストライド中のピーク力の約17.16%) であった。チーターではそのようなデータが利用できないため、関節トルクを真値データに対して直接検証することはできないが、推定トルクは制御された環境での四足動物に関する先行研究と一致する。これらの結果は、生物学者とロボット工学者の双方にとって、より自然な環境における動物の移動運動の研究に深い洞察をもたらすだろう。

The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah (Acinonyx jubatus) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods used are remote and do not constrain the animal's motion. In this work, we use data obtained from cheetahs in the wild to present a trajectory optimisation approach for estimating the 3D kinematics and joint torques of subjects remotely. We call this approach kinetic full trajectory estimation (K-FTE). We validate the method on a dataset comprising synchronised video and force plate data. We are able to reconstruct the 3D kinematics with an average reprojection error of 17.69 pixels (62.94% PCK using the nose-to-eye(s) length segment as a threshold), while the estimates produce an average root-mean-square error of 171.3 N (≈ 17.16% of peak force during stride) for the estimated ground reaction force when compared against the force plate data. While the joint torques cannot be directly validated against ground truth data, as no such data is available for cheetahs, the estimated torques agree with previous studies of quadrupeds in controlled settings. These results will enable deeper insight into the study of animal locomotion in a more natural environment for both biologists and roboticists.

翻訳日:2023-12-12 18:15:29 公開日:2023-12-10

# 不均衡データから学習するスキュー確率型ニューラルネットワーク

Skew Probabilistic Neural Networks for Learning from Imbalanced Data ( http://arxiv.org/abs/2312.05878v1 )

ライセンス: Link先を確認

Shraddha M. Naik, Tanujit Chakraborty, Abdenour Hadid, Bibhas Chakraborty

(参考訳) 実世界のデータセットは、特定のクラスが著しく過小に表現されている不均衡なデータ分布を示すことが多い。このような場合、伝統的なパターン分類器は多数派クラスへの偏りを示し、少数派クラスに対する正確な予測を妨げている。本稿では,この主要な課題に対処するため,スキュー正規確率カーネルを備えた確率的ニューラルネットワーク(PNN)を用いた不均衡データ指向のアプローチを提案する。 PNNは確率的出力を提供することで知られており、予測の信頼度の定量化と不確実性の扱いを可能にする。特に不均衡データや非対称データに対して柔軟性の高いスキュー正規分布を活用することで,提案するスキュー確率的ニューラルネットワーク(SkewPNN)は,基礎となるクラス密度をよりよく表現できる。不均衡データセットに対する提案手法の性能を最適化するには、ハイパーパラメータの微調整が不可欠である。この目的のために,集団ベースのヒューリスティックアルゴリズムであるBat最適化アルゴリズムを用いて,ハイパーパラメータ空間を効果的に探索する。また,密度推定の統計的一致性を証明し,サンプルサイズが大きくなるにつれて真の分布に滑らかに近づくことを示す。様々なベンチマークの不均衡学習器と比較しながら,異なる合成データセットで実験シミュレーションを行った。実データ分析によると、SkewPNNは、ほとんどの実験設定で、バランスの取れたデータセットと不均衡なデータセットの両方に対して、最先端の機械学習手法を大幅に上回っている。

Real-world datasets often exhibit imbalanced data distribution, where certain class levels are severely underrepresented. In such cases, traditional pattern classifiers have shown a bias towards the majority class, impeding accurate predictions for the minority class. This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel to address this major challenge. PNNs are known for providing probabilistic outputs, enabling quantification of prediction confidence and uncertainty handling. By leveraging the skew normal distribution, which offers increased flexibility, particularly for imbalanced and non-symmetric data, our proposed Skew Probabilistic Neural Networks (SkewPNNs) can better represent underlying class densities. To optimize the performance of the proposed approach on imbalanced datasets, hyperparameter fine-tuning is imperative. To this end, we employ a population-based heuristic algorithm, the Bat optimization algorithm, for effectively exploring the hyperparameter space. We also prove the statistical consistency of the density estimates which suggests that the true distribution will be approached smoothly as the sample size increases. Experimental simulations have been conducted on different synthetic datasets, comparing various benchmark-imbalanced learners. Our real-data analysis shows that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.
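
As a rough illustration of a skew normal kernel inside a PNN-style classifier, the sketch below uses the standard univariate skew normal density, 2·φ(z)·Φ(αz)/scale, and averages it over each class's training points. The one-dimensional setting, the function names, the toy data and the fixed hyperparameters are all assumptions for illustration. The paper's actual architecture and its Bat-based hyperparameter search are not reproduced here.

```python
import math


def skew_normal_pdf(x, loc=0.0, scale=1.0, alpha=0.0):
    """Skew normal density; alpha = 0 recovers the ordinary normal density."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # normal pdf
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # normal cdf at alpha*z
    return 2.0 * phi * big_phi / scale


def class_density(x, points, scale, alpha):
    """Parzen-style estimate: average of kernels centred on the class's points."""
    return sum(skew_normal_pdf(x, p, scale, alpha) for p in points) / len(points)


def classify(x, training, scale=1.0, alpha=0.0):
    """Pick the class with the largest estimated density at x (no class priors)."""
    return max(training, key=lambda label: class_density(x, training[label], scale, alpha))


# Hypothetical imbalanced toy data: five majority points, two minority points.
training = {
    "majority": [0.0, 0.2, -0.1, 0.4, 0.1],
    "minority": [3.0, 3.5],
}
print(classify(3.2, training, scale=1.0, alpha=2.0))
print(classify(0.1, training, scale=1.0, alpha=2.0))
```

Because each class density is averaged over its own points and no class prior is applied, the minority class is not penalised merely for having fewer samples. That is a design choice of this sketch, not a claim about the paper.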

翻訳日:2023-12-12 18:14:56 公開日:2023-12-10

# 2023年XCSP3コンペティションの成果

Proceedings of the 2023 XCSP3 Competition ( http://arxiv.org/abs/2312.05877v1 )

ライセンス: Link先を確認

Gilles Audemard, Christophe Lecoutre, Emmanuel Lonca

(参考訳) この文書は2023年XCSP3コンペティションのプロシーディングス(予稿集)である。この制約ソルバのコンペティションの結果は、CP'23(2023年8月27日から31日までカナダのトロントで開催された第29回制約プログラミングの原理と実践に関する国際会議)で発表された。

This document represents the proceedings of the 2023 XCSP3 Competition. The results of this competition of constraint solvers were presented at CP'23 (the 29th International Conference on Principles and Practice of Constraint Programming, held in Toronto, Canada from 27th to 31st August, 2023).

翻訳日:2023-12-12 18:14:33 公開日:2023-12-10

# 効率的なニューラルネットワークのためのクラスアウェアプルーニング

Class-Aware Pruning for Efficient Neural Networks ( http://arxiv.org/abs/2312.05875v1 )

ライセンス: Link先を確認

Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang

(参考訳) ディープニューラルネットワーク(DNN)は様々な分野で顕著な成功を収めている。しかし、DNNにおける多数の浮動小数点演算(FLOPs)は、エッジデバイスのようなリソースに制約のあるアプリケーションへの展開において課題となっている。この問題に対処するため、DNNの実行における計算コストを削減するプルーニングが導入されてきた。従来のプルーニング戦略は、重み値、勾配値、活性化出力に基づいている。本稿では,従来のプルーニング手法とは異なり,DNNを圧縮するクラスアウェアなプルーニング手法を提案し,DNNの計算コストを削減するための新しい視点を提供する。各イテレーションでは、クラスアウェアなプルーニングを容易にするようにニューラルネットワークのトレーニングが修正される。その後、クラス数に関するフィルタの重要度が評価される。少数のクラスにのみ重要なフィルタは削除される。その後、生じた精度の損失を補償するためにニューラルネットワークが再トレーニングされる。プルーニングのイテレーションは、除去できるフィルタがなくなるまで繰り返され、これは残りのフィルタが多くのクラスにとって非常に重要であることを示す。このプルーニング手法は, 精度, プルーニング率, FLOPsの削減の点で従来のプルーニング手法を上回る。実験の結果, このクラスアウェアなプルーニング手法は, 高い推論精度を維持しつつ, 重みの数とFLOPsを大幅に削減できることが確認された。

Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weight values, gradient values and activation outputs. Different from previous pruning solutions, in this paper, we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective to reduce the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate the class-aware pruning. Afterwards, the importance of filters with respect to the number of classes is evaluated. The filters that are important for only a few classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations continue until no more filters can be removed, indicating that the remaining filters are very important for many classes. This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs, while maintaining a high inference accuracy.
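
The selection step described above (remove filters that matter for only a few classes) can be sketched as follows. The importance matrix, the two thresholds and the function name are hypothetical. The abstract does not say how importance per class is scored, so this shows only the counting logic, not the authors' scoring or retraining procedure.

```python
def select_filters_to_prune(importance, score_threshold, min_classes):
    """Return indices of filters that are important for fewer than `min_classes` classes.

    importance[f][c] is a (hypothetical) importance score of filter f for class c.
    A filter counts as important for a class when its score reaches `score_threshold`.
    """
    pruned = []
    for f, scores in enumerate(importance):
        n_important = sum(1 for s in scores if s >= score_threshold)
        if n_important < min_classes:
            pruned.append(f)
    return pruned


# Four filters scored against four classes (made-up numbers).
importance = [
    [0.9, 0.8, 0.7, 0.9],  # important for all four classes: keep
    [0.9, 0.1, 0.0, 0.2],  # important for one class only: prune
    [0.1, 0.1, 0.2, 0.0],  # important for no class: prune
    [0.6, 0.7, 0.1, 0.8],  # important for three classes: keep
]
to_prune = select_filters_to_prune(importance, score_threshold=0.5, min_classes=2)
print(to_prune)
```

In the iterative scheme from the abstract, this selection would alternate with retraining until the returned list is empty.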

翻訳日:2023-12-12 18:14:27 公開日:2023-12-10

# CasADiの学習: 数値最適化におけるデータ駆動モデル

Learning for CasADi: Data-driven Models in Numerical Optimization ( http://arxiv.org/abs/2312.05873v1 )

ライセンス: Link先を確認

Tim Salzmann, Jon Arrizabalaga, Joel Andersson, Marco Pavone and Markus Ryll

(参考訳) 実世界の問題は解析的に扱うことが難しいことが多いが、深層学習は複雑なプロセスをデータからモデル化することに優れている。 CasADiのような既存の最適化フレームワークは、ソルバのシームレスな使用を容易にするが、学習したプロセスモデルを数値最適化に統合する際に課題に直面する。このギャップに対処するため、我々はLearning for CasADi (L4CasADi) フレームワークを提案し、PyTorchで学習したモデルをCasADiとシームレスに統合し、効率的で、場合によってはハードウェアアクセラレーションも利用できる数値最適化を可能にする。 L4CasADiの適用性は2つのチュートリアル例で示される。第1に, 乱流がPyTorchモデルで表される乱流河川において, エネルギー効率のために魚の軌跡を最適化する。第2に,L4CasADiを用いた最適制御において,暗黙的なNeural Radiance Fieldによる環境表現を容易に活用できることを示す。 L4CasADiはサンプルとドキュメントとともに、MITライセンス下でhttps://github.com/Tim-Salzmann/l4casadiで利用可能である。

While real-world problems are often challenging to analyze analytically, deep learning excels in modeling complex processes from data. Existing optimization frameworks like CasADi facilitate seamless usage of solvers but face challenges when integrating learned process models into numerical optimizations. To address this gap, we present the Learning for CasADi (L4CasADi) framework, enabling the seamless integration of PyTorch-learned models with CasADi for efficient and potentially hardware-accelerated numerical optimization. The applicability of L4CasADi is demonstrated with two tutorial examples: First, we optimize a fish's trajectory in a turbulent river for energy efficiency where the turbulent flow is represented by a PyTorch model. Second, we demonstrate how an implicit Neural Radiance Field environment representation can be easily leveraged for optimal control with L4CasADi. L4CasADi, along with examples and documentation, is available under MIT license at https://github.com/Tim-Salzmann/l4casadi

翻訳日:2023-12-12 18:14:09 公開日:2023-12-10

# TaBIIC: 反復的およびインタラクティブなクラスタリングによる分類体系の構築

TaBIIC: Taxonomy Building through Iterative and Interactive Clustering ( http://arxiv.org/abs/2312.05866v1 )

ライセンス: Link先を確認

Mathieu d'Aquin

(参考訳) 分類体系(タクソノミー)の構築は、しばしばオントロジー構築の重要な部分であり、関連するデータからそのような分類体系を自動的に作成するための多くの試みがなされてきた。このようなアプローチの考え方は、概念の内包の適切な定義をデータ内のパターンとして抽出できる(例えば形式概念分析)か、あるいは類似性に基づいてデータオブジェクトをグループ化することで概念の外延を構築できる(クラスタリング)というものである。いずれの場合も、プロセスは自動的に構築された構造をもたらすが、それは粗すぎて定義に欠けるか、細かすぎて詳細すぎる可能性があり、望ましい分類体系へと洗練する必要がある。本稿では、両方のアプローチから着想を得た反復的かつインタラクティブなプロセスによる手法を検討し、データ中で概念を特定する時点で、分類体系における概念の洗練と定義が行われるようにする。本手法が様々なデータソースに適用可能であり,オントロジーにより直接的に統合できる分類体系につながることを示す。

Building taxonomies is often a significant part of building an ontology, and many attempts have been made to automate the creation of such taxonomies from relevant data. The idea in such approaches is either that relevant definitions of the intension of concepts can be extracted as patterns in the data (e.g. in formal concept analysis) or that their extension can be built from grouping data objects based on similarity (clustering). In both cases, the process leads to an automatically constructed structure, which can either be too coarse and lacking in definition, or too fine-grained and detailed, and therefore needs to be refined into the desired taxonomy. In this paper, we explore a method that takes inspiration from both approaches in an iterative and interactive process, so that refinement and definition of the concepts in the taxonomy occur at the time of identifying those concepts in the data. We show that this method is applicable on a variety of data sources and leads to taxonomies that can be more directly integrated into ontologies.

翻訳日:2023-12-12 18:13:51 公開日:2023-12-10

# 自己組織化マップを用いたニューラルネットワークの概念表現の探索

Finding Concept Representations in Neural Networks with Self-Organizing Maps ( http://arxiv.org/abs/2312.05864v1 )

ライセンス: Link先を確認

Mathieu d'Aquin

(参考訳) 十分に複雑なタスクでは、ニューラルネットワークは問題の解き方を学習する副作用として、その問題の表現に関する適切な抽象化を学習することが期待される。これは特にマシンビジョンにおいて確認されており、ニューラルネットワーク内の特定のユニット(ニューロン)の活性化と、画像に存在する視覚的概念(テクスチャ、色、オブジェクト)の間に相関が見られることが多くの研究で示されている。本稿では, 自己組織化マップを用いて, ニューラルネットワークの層全体の活性化ベクトルが「女性」や「写実主義の画家」といった抽象概念の神経表現とどのように対応しているかを, 視覚的および計算的に調べる。ネットワークの層における概念の表現レベルを評価するために,これらのマップに適用する複数の尺度を実験する。その結果, 試した尺度の中では, データ全体のマップと比較した概念の活性化マップの相対エントロピーが適切な候補であり, 概念の神経表現を特定・位置づけし, 可視化し, 対象の予測タスクを解く上でのその重要性を理解するための方法論の一部として利用できることを示す。

In sufficiently complex tasks, it is expected that as a side effect of learning to solve a problem, a neural network will learn relevant abstractions of the representation of that problem. This has been confirmed in particular in machine vision where a number of works showed that correlations could be found between the activations of specific units (neurons) in a neural network and the visual concepts (textures, colors, objects) present in the image. Here, we explore the use of self-organizing maps as a way to both visually and computationally inspect how activation vectors of whole layers of neural networks correspond to neural representations of abstract concepts such as `female person' or `realist painter'. We experiment with multiple measures applied to those maps to assess the level of representation of a concept in a network's layer. We show that, among the measures tested, the relative entropy of the activation map for a concept compared to the map for the whole data is a suitable candidate and can be used as part of a methodology to identify and locate the neural representation of a concept, visualize it, and understand its importance in solving the prediction task at hand.
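
The relative entropy measure mentioned above can be sketched as a Kullback-Leibler divergence between two hit maps over the same self-organizing map cells: one counting where a concept's samples land, one counting where all samples land. The flattening of the map into a list, the smoothing constant and the direction of the divergence are assumptions of this sketch. The abstract does not specify them.

```python
import math


def relative_entropy(concept_hits, all_hits, eps=1e-9):
    """KL divergence D(P || Q) between two SOM hit maps given as flat count lists.

    P is the normalised concept map and Q the normalised whole-data map.
    `eps` is a small smoothing constant (an assumption) that avoids log(0).
    """
    if len(concept_hits) != len(all_hits):
        raise ValueError("both maps must cover the same cells")
    p = [h + eps for h in concept_hits]
    q = [h + eps for h in all_hits]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


# Hypothetical 2x2 map flattened to four cells.
all_hits = [10, 10, 10, 10]
concentrated = [8, 0, 0, 0]  # concept samples land in a single cell
spread = [2, 2, 2, 2]        # concept samples follow the overall distribution

print(relative_entropy(concentrated, all_hits))
print(relative_entropy(spread, all_hits))
```

A high value suggests the layer places the concept's samples in a distinctive region of the map, while a value near zero suggests the concept is not separated from the rest of the data at that layer.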

翻訳日:2023-12-12 18:13:32 公開日:2023-12-10

# ビデオは256個の基底に値する: ゼロショットビデオ編集のための時空間期待値最大化インバージョン

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing ( http://arxiv.org/abs/2312.05856v1 )

ライセンス: Link先を確認

Maomao Li, Yu Li, Tianyu Yang, Yunfei Liu, Dongxu Yue, Zhihui Lin, and Dong Xu

(参考訳) 本稿では,ゼロショット映像編集のためのビデオインバージョン手法を提案する。この手法は,インバージョン過程において入力映像を低ランク表現でモデル化することを目的とする。既存のビデオ編集方法は、通常、編集の前に典型的な2D DDIMインバージョンやナイーブな時空間DDIMインバージョンを適用し、フレームごとに時間変化する表現を用いてノイズを含む潜在表現を導出する。多くの既存手法と異なり,我々は時空間期待値最大化(STEM)インバージョンを提案する。これは、密な映像特徴を期待値最大化の枠組みで定式化し、映像全体を表現するよりコンパクトな基底集合を反復的に推定する。各フレームはインバージョンに固定的かつグローバルな表現を適用するため、再構成と編集の際に時間的一貫性を保ちやすい。広範な定性的・定量的実験により,我々のSTEMインバージョンが2つの最先端ビデオ編集法において一貫した改善を達成できることを示す。

This paper presents a video inversion approach for zero-shot video editing, which aims to model the input video with low-rank representation during the inversion process. The existing video editing methods usually apply the typical 2D DDIM inversion or naïve spatial-temporal DDIM inversion before editing, which leverages time-varying representation for each frame to derive noisy latent. Unlike most existing approaches, we propose a Spatial-Temporal Expectation-Maximization (STEM) inversion, which formulates the dense video feature in an expectation-maximization manner and iteratively estimates a more compact basis set to represent the whole video. Each frame applies the fixed and global representation for inversion, which is more friendly for temporal consistency during reconstruction and editing. Extensive qualitative and quantitative experiments demonstrate that our STEM inversion can achieve consistent improvement on two state-of-the-art video editing methods.

翻訳日:2023-12-12 18:13:09 公開日:2023-12-10

# NeVRF: 長時間シーケンスのためのニューラルビデオベース放射場

NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences ( http://arxiv.org/abs/2312.05855v1 )

ライセンス: Link先を確認

Minye Wu, Tinne Tuytelaars

(参考訳) Neural Radiance Fields (NeRF) を長時間の動的シーケンスに適用することは困難である。既存の手法は、品質とストレージサイズのバランスをとることに苦労し、トポロジーの変化や大きな動きといった複雑なシーン変化への対応にも困難を抱える。これらの課題に対処するために,新しいニューラルビデオベース放射場(NeVRF)表現を提案する。 NeVRFは、ニューラル放射場と画像ベースレンダリングを融合し、長時間の動的な内向きシーンにおけるフォトリアリスティックな新規視点合成をサポートする。マルチビュー映像から直接放射輝度を予測するために,新しいマルチビュー放射輝度ブレンディング手法を導入する。継続学習の手法を取り入れることで、NeVRFは、以前のフレームを再訪することなくシーケンシャルデータからフレームを効率的に再構築でき、長時間の自由視点映像を可能にする。さらに、専用に設計した圧縮手法により、NeVRFは動的シーンをコンパクトに表現でき、実世界のシナリオにおいて動的放射場をより実用的なものにする。広範な実験により,長時間シーケンスのレンダリング,シーケンシャルデータの再構成,コンパクトなデータ保存におけるNeVRFの有効性を示す。

Adapting Neural Radiance Fields (NeRF) to long-duration dynamic sequences has been challenging. Existing methods struggle to balance quality and storage size and encounter difficulties with complex scene changes such as topological changes and large motions. To tackle these issues, we propose a novel neural video-based radiance fields (NeVRF) representation. NeVRF marries neural radiance fields with image-based rendering to support photo-realistic novel view synthesis on long-duration dynamic inward-looking scenes. We introduce a novel multi-view radiance blending approach to predict radiance directly from multi-view videos. By incorporating continual learning techniques, NeVRF can efficiently reconstruct frames from sequential data without revisiting previous frames, enabling long-duration free-viewpoint video. Furthermore, with a tailored compression approach, NeVRF can compactly represent dynamic scenes, making dynamic radiance fields more practical in real-world scenarios. Our extensive experiments demonstrate the effectiveness of NeVRF in enabling long-duration sequence rendering, sequential data reconstruction, and compact data storage.

翻訳日:2023-12-12 18:12:50 公開日:2023-12-10

# 複合生存分析:補助集約ベースラインと生存スコアを用いた学習

Composite Survival Analysis: Learning with Auxiliary Aggregated Baselines and Survival Scores ( http://arxiv.org/abs/2312.05854v1 )

ライセンス: Link先を確認

Chris Solomou

(参考訳) 生存分析(Survival Analysis, SA)は、時間とともに発生するわずかな事象の事象確率を推定できるため、時間とイベントのモデリングのデフォルト手法である。本研究は,saモデルの全表現を(1)集団の行動全体を把握するベースラインハザードに分解し,(2)特定のメンバーの特異な確率的ダイナミクスをモデル化する独立分散サバイバルスコアに完全にパラメトリックな設定で分解することにより,saモデルのトレーニングと推論を改善する方法を示す。提案手法は, 直交観測地平線を動的に処理し, 計算非効率なDeep Learning-based SA法や, MCMCを必要とするモデルを含む, 様々な実世界のデータセットにおける他の最先端手法と比較して, 競争力を発揮する。しかし,本手法は,微調整やハイパーパラメータ最適化を行なわず,出力から頑健な結果が得られる。

Survival Analysis (SA) constitutes the default method for time-to-event modeling due to its ability to estimate event probabilities of sparsely occurring events over time. In this work, we show how to improve the training and inference of SA models by decoupling their full expression into (1) an aggregated baseline hazard, which captures the overall behavior of a given population, and (2) independently distributed survival scores, which model idiosyncratic probabilistic dynamics of its given members, in a fully parametric setting. The proposed inference method is shown to dynamically handle right-censored observation horizons, and to achieve competitive performance when compared to other state-of-the-art methods in a variety of real-world datasets, including computationally inefficient Deep Learning-based SA methods and models that require MCMC for inference. Nevertheless, our method achieves robust results from the outset, while not being subjected to fine-tuning or hyperparameter optimization.
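
As a rough illustration of the decomposition described above, the sketch below combines a shared baseline cumulative hazard with a per-individual score. It assumes a proportional-hazards form, S(t | x) = exp(-H0(t) * exp(score)), which the abstract does not confirm; the function name `survival_probability`, the baseline values, and the scores are all hypothetical.

```python
import math


def survival_probability(baseline_cum_hazard, score):
    """S(t | x) = exp(-H0(t) * exp(score)); a proportional-hazards assumption."""
    return math.exp(-baseline_cum_hazard * math.exp(score))


# Hypothetical aggregated baseline: cumulative hazard H0(t) at a few horizons,
# shared by the whole population.
baseline = {1.0: 0.05, 2.0: 0.12, 5.0: 0.40}

# Hypothetical individual survival scores (log-scale risk modifiers).
scores = {"low_risk": -0.5, "average": 0.0, "high_risk": 0.8}

for name, score in scores.items():
    curve = {t: round(survival_probability(h, score), 3) for t, h in baseline.items()}
    print(name, curve)
```

Under this assumed form, the baseline carries the population-level shape of the curve, while each score only rescales the hazard for one member; a higher score gives a lower survival probability at every horizon.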

翻訳日:2023-12-12 18:12:31 公開日:2023-12-10

# InteractDiffusion:テキスト間拡散モデルにおける相互作用制御

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.05849v1 )

ライセンス: Link先を確認

Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu

(参考訳) 大規模テキスト・ツー・イメージ(t2i)拡散モデルは、テキスト記述に基づいてコヒーレントな画像を生成する素晴らしい能力を示しており、コンテンツ生成における広大な応用を可能にしている。近年, 物体の局所化, 姿勢, 画像の輪郭などの要因の制御が進んでいるが, 生成コンテンツ中の物体間の相互作用を制御できる重要なギャップが残っている。生成した画像内の対話をうまく制御することで、対話的なキャラクターで現実的なシーンを作るといった有意義な応用が可能になる。本研究では,三重項ラベル(人,行動,対象)と対応する境界ボックスからなる人間-対象間相互作用(hoi)情報を用いたt2i拡散モデルの条件付け問題について検討する。我々は、既存の訓練済みT2I拡散モデルを拡張して、相互作用により良い条件付けを可能にする、InteractDiffusionと呼ばれるプラグイン可能な相互作用制御モデルを提案する。具体的には、HOI情報をトークン化し、インタラクション埋め込みを通じてそれらの関係を学習する。条件付き自己アテンション層は、HOIトークンを視覚トークンにマッピングするように訓練され、既存のT2I拡散モデルにおいて視覚トークンをよりよく条件付ける。提案モデルでは,既存のT2I拡散モデルにおける相互作用と位置の制御が可能であり,HOI検出スコアの差が大きく,FIDおよびKIDの忠実度も大きく向上する。プロジェクトページ: https://jiuntian.github.io/interactdiffusion。

Large-scale text-to-image (T2I) diffusion models have showcased incredible capabilities in generating coherent images based on textual descriptions, enabling vast applications in content generation. While recent advancements have introduced control over factors such as object localization, posture, and image contours, a crucial gap remains in our ability to control the interactions between objects in the generated content. Controlling interactions well in generated images could yield meaningful applications, such as creating realistic scenes with interacting characters. In this work, we study the problem of conditioning T2I diffusion models on Human-Object Interaction (HOI) information, consisting of a triplet label (person, action, object) and corresponding bounding boxes. We propose a pluggable interaction control model, called InteractDiffusion, that extends existing pre-trained T2I diffusion models so that they can be better conditioned on interactions. Specifically, we tokenize the HOI information and learn their relationships via interaction embeddings. A conditioning self-attention layer is trained to map HOI tokens to visual tokens, thereby conditioning the visual tokens better in existing T2I diffusion models. Our model attains the ability to control the interaction and location on existing T2I diffusion models, and it outperforms existing baselines by a large margin in HOI detection score, as well as in fidelity as measured by FID and KID. Project page: https://jiuntian.github.io/interactdiffusion.

翻訳日:2023-12-12 18:12:12 公開日:2023-12-10

# データフリーハードラベルロバストネス盗み攻撃

Data-Free Hard-Label Robustness Stealing Attack ( http://arxiv.org/abs/2312.05924v1 )

ライセンス: Link先を確認

Xiaojian Yuan, Kejiang Chen, Wen Huang, Jie Zhang, Weiming Zhang, Nenghai Yu

(参考訳) MLaaS(Machine Learning as a Service)の人気は、MLaaSをクエリすることでクローンモデルを構築することを目的とした、モデルステアリングアタック(MSA)に対する懸念の高まりにつながっている。現在、MLaaSに関するほとんどの研究は、MLaaSがソフトラベルを提供し、攻撃者は同様の分布を持つプロキシデータセットを持つと仮定している。しかし、ハードラベルだけがMLaaSによって返却され、データの分散が未解決のままである、より現実的なシナリオをカプセル化できない。さらに、既存の仕事の多くはモデルの正確さを盗み、モデルの堅牢さを怠り、セキュリティに敏感なシナリオ、例えばフェイススキャンの支払いにおいて堅牢性が不可欠である。特に、モデルのロバスト性を改善するには、しばしば、敵対的なトレーニングのような高価な技術を使う必要があるため、ロバスト性を盗む方がより有益である。そこで本研究では,これらのギャップに応答して,対象モデルのハードラベルを自然データを用いずに簡単にクエリすることで,モデル精度とロバスト性の両方を盗むことが可能な,データフリーなハードラベルロバストネス盗み (dfhl-rs) 攻撃を提案する。包括的実験により本手法の有効性が実証された。クローンモデルは77.86%のクリーンな精度と39.51%のロバストな精度を実現し、cifar-10データセットのターゲットモデルよりわずか4.71%と8.40%低く、ベースラインを大幅に上回っている。私たちのコードは、https://github.com/LetheSec/DFHL-RS-Attackで利用可能です。

The popularity of Machine Learning as a Service (MLaaS) has led to increased concerns about Model Stealing Attacks (MSA), which aim to craft a clone model by querying MLaaS. Currently, most research on MSA assumes that MLaaS can provide soft labels and that the attacker has a proxy dataset with a similar distribution. However, this fails to encapsulate the more practical scenario where only hard labels are returned by MLaaS and the data distribution remains elusive. Furthermore, most existing work focuses solely on stealing the model accuracy and neglects the model robustness, even though robustness is essential in security-sensitive scenarios, e.g., face-scan payment. Notably, improving model robustness often necessitates the use of expensive techniques such as adversarial training, which makes stealing robustness an even more lucrative prospect. In response to these identified gaps, we introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper, which enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model without the help of any natural data. Comprehensive experiments demonstrate the effectiveness of our method. The clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack, which are only 4.71% and 8.40% lower than the target model on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is available at: https://github.com/LetheSec/DFHL-RS-Attack

翻訳日:2023-12-12 18:05:18 公開日:2023-12-10

# 弱教師付きビデオ個数

Weakly Supervised Video Individual Counting ( http://arxiv.org/abs/2312.05923v1 )

ライセンス: Link先を確認

Xinyan Liu and Guorong Li and Yuankai Qi and Ziheng Yan and Zhenjun Han and Anton van den Hengel and Ming-Hsuan Yang and Qingming Huang

(参考訳) ビデオ個別カウント(VIC)は、単一のビデオ内のユニークな個人数を予測することを目的としている。既存手法は, 個人に対する軌跡ラベルに基づく表現を学習するが、そのアノテーションコストは高い。より現実的な実践的課題の反映として, 軌跡ラベルが提供されない弱教師付きVICタスクを導入する。代わりに、視野に入るトラフィック(インフロー)と視野を去るトラフィック(アウトフロー)を示す2種類のラベルが提供される。グループレベルのマッチングにおいて,タスクを弱教師付きコントラスト学習問題として定式化するベースラインとして,最初のソリューションを提案する。そこで我々は,ネットワークを駆動し,インフロー,アウトフロー,残りを識別するために,エンドツーエンドのトレーニング可能なソフトコントラスト損失を考案した。この方向への今後の研究を促進するため、既存のVICデータセットであるSenseCrowdとCroHDからアノテーションを生成し、新しいデータセットであるUAVVICを構築します。以上の結果から,我々のベースラインの弱教師付き手法は教師付き手法よりも優れており,より実践的な弱教師付きタスクへの移行において情報がほとんど失われないことが示唆された。コードとトレーニングされたモデルは、https://github.com/streamer-AP/CGNet で公開される。

Video Individual Counting (VIC) aims to predict the number of unique individuals in a single video. Existing methods learn representations based on trajectory labels for individuals, which are annotation-expensive. To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided. Instead, two types of labels are provided to indicate traffic entering the field of view (inflow) and leaving the field of view (outflow). We also propose the first solution as a baseline that formulates the task as a weakly supervised contrastive learning problem under group-level matching. In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining individuals. To facilitate future study in this direction, we generate annotations from the existing VIC datasets SenseCrowd and CroHD and also build a new dataset, UAVVIC. Extensive results show that our baseline weakly supervised method outperforms supervised methods, and thus, little information is lost in the transition to the more practically relevant weakly supervised task. The code and trained model will be made public at https://github.com/streamer-AP/CGNet
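
To make the role of the inflow and outflow labels concrete, here is a minimal sketch of how per-interval flow counts could be turned into a video-level count. It is not the paper's network: it assumes that nobody who leaves the field of view re-enters later, and the function names and numbers are hypothetical.

```python
def count_unique_individuals(initial_count, inflow_per_interval):
    """Unique individuals = people visible in the first frame + everyone who enters later."""
    return initial_count + sum(inflow_per_interval)


def people_in_view(initial_count, inflow_per_interval, outflow_per_interval):
    """Number of people visible after each interval (inflow adds, outflow removes)."""
    visible = initial_count
    history = []
    for entered, left in zip(inflow_per_interval, outflow_per_interval):
        visible += entered - left
        history.append(visible)
    return history


# Hypothetical flow counts for a short clip sampled at four intervals.
inflow = [2, 0, 3, 1]
outflow = [1, 2, 0, 4]

print(count_unique_individuals(5, inflow))   # 11
print(people_in_view(5, inflow, outflow))    # [6, 4, 7, 4]
```

Under this assumption only the inflow contributes to the unique count, while the outflow is needed to keep the per-frame crowd size consistent.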

翻訳日:2023-12-12 18:04:47 公開日:2023-12-10

# dig-csi:csiフィードバックトレーニングを支援する分散生成モデル

Dig-CSI: A Distributed and Generative Model Assisted CSI Feedback Training Framework ( http://arxiv.org/abs/2312.05921v1 )

ライセンス: Link先を確認

Zhilin Du, Haozhen Li, Zhenyu Liu, Shilong Fan, Xinyu Gu, Lin Zhang

(参考訳) ディープラーニング(DL)ベースのモデルの出現は、無線通信システムにおけるチャネル状態情報(CSI)フィードバック機構を大幅に進歩させた。しかし、従来のアプローチは、CSIデータ処理の集中的な性質のため、高い通信オーバーヘッドと潜在的なプライバシーリスクに悩まされることが多い。これらの課題に対処するため、我々はDig-CSIと呼ばれるCSIフィードバックトレーニングフレームワークを設計し、CSIフィードバックモデルをトレーニングするためのデータセットは、各ユーザ機器(UE)がアップロードした分散ジェネレータによって作成されるが、ローカルデータのアップロードは行わない。各ueは、デコーダを分散ジェネレータと見なすオートエンコーダを訓練し、ローカルデータを用いて再構成精度と生成能力を得る。実験結果から,Dig-CSIは従来の集中学習モデルと同等の性能のグローバルCSIフィードバックモデルを訓練できることがわかった。

The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called Dig-CSI, in which the dataset for training the CSI feedback model is produced by the distributed generators uploaded by each user equipment (UE), rather than through local data upload. Each UE trains an autoencoder on its local data, where the decoder serves as the distributed generator, to gain both reconstruction accuracy and generative ability. Experimental results show that Dig-CSI can train a global CSI feedback model with performance comparable to that of a model trained with classical centralized learning, at a much lighter communication overhead.

翻訳日:2023-12-12 18:04:25 公開日:2023-12-10

# 自然画像マッティングのための拡散

Diffusion for Natural Image Matting ( http://arxiv.org/abs/2312.05915v1 )

ライセンス: Link先を確認

Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi

(参考訳) 我々は拡散を利用して、困難な画像マッチング課題に取り組むことを目指している。しかし、高い計算オーバーヘッドの存在とトレーニングと推論プロセス間のノイズサンプリングの不整合は、この目標を達成する上で大きな障害となる。本稿では,これらの課題を効果的に克服するソリューションであるdiffmatteを提案する。まず、DiffMatteはデコーダを複雑な結合されたマッティングネットワーク設計から切り離し、拡散プロセスのイテレーションで1つの軽量デコーダだけを含む。このような戦略により、diffmatteはサンプル数の増加に伴って計算オーバーヘッドの増大を緩和する。第2に,均一な時間間隔を持つ自己整合型トレーニング戦略を採用し,時間領域全体にわたるトレーニングと推論の一貫したノイズサンプリングを実現する。我々のDiffMatteは柔軟性を念頭に設計されており、シームレスに様々なモダンなマッティングアーキテクチャに統合できます。大規模な実験結果から,DiffMatteはコンポジション1kテストセットの最先端レベルに到達し,SAD測定値とMSE測定値でそれぞれ5%,15%のベストメソッドを上回り,他のベンチマークではより強力な一般化能力を示した。

We aim to leverage diffusion to address the challenging image matting task. However, the presence of high computational overhead and the inconsistency of noise sampling between the training and inference processes pose significant obstacles to achieving this goal. In this paper, we present DiffMatte, a solution designed to effectively overcome these challenges. First, DiffMatte decouples the decoder from the intricately coupled matting network design, involving only one lightweight decoder in the iterations of the diffusion process. With such a strategy, DiffMatte mitigates the growth of computational overhead as the number of samples increases. Second, we employ a self-aligned training strategy with uniform time intervals, ensuring consistent noise sampling between training and inference across the entire time domain. Our DiffMatte is designed with flexibility in mind and can seamlessly integrate into various modern matting architectures. Extensive experimental results demonstrate that DiffMatte not only reaches the state-of-the-art level on the Composition-1k test set, surpassing the best previous methods by 5% and 15% in the SAD metric and MSE metric respectively, but also shows stronger generalization ability on other benchmarks.

翻訳日:2023-12-12 18:04:10 公開日:2023-12-10

# アンサンブルカルマンフィルタによるガウス過程状態空間モデルの変分推論

Ensemble Kalman Filtering-Aided Variational Inference for Gaussian Process State-Space Models ( http://arxiv.org/abs/2312.05910v1 )

ライセンス: Link先を確認

Zhidi Lin and Yiyong Sun and Feng Yin and Alexandre Thiéry

(参考訳) ガウス過程状態空間モデル(GPSSM)は、放射モデルを通して観測される潜在状態ダイナミクスをモデル化するための原理的かつ柔軟なアプローチを提供する。しかし、既存のGPSSMを学習するための変分法は、特に償却推論ネットワークの導入によって、多数のパラメータを最適化する上で大きな課題に直面している。この課題に対処するために,定評あるモデルベースフィルタリング手法であるアンサンブルカルマンフィルタ(enkf)を用いて,変動推論フレームワークにおける潜在状態の後方分布を近似する。このアプローチは推論ネットワークの必要性をなくし、変動パラメータの数を大幅に削減する。さらに,EnKFの助けを借りて,閉形式解を用いた複数項の和により,変分推論における近似的エビデンス下界(ELBO)の簡易評価が容易に得られることを示した。自動微分ツールを利用することで、ELBOを最大化し、GPSSMを効率的に訓練することができる。また,提案手法をオンライン環境に拡張し,包括的アルゴリズム解析と洞察を提供する。多様な実データとシミュレーションデータセットの大規模なテストは、我々の変分推論アルゴリズムがEnKFと統合され、学習と推論性能の点で既存の手法よりも優れていることを示す。

Gaussian process state-space models (GPSSMs) provide a principled and flexible approach to model latent state dynamics observed through emission models. However, existing variational methods for learning GPSSMs face a substantial challenge in optimizing a large number of parameters, particularly with the introduction of amortized inference networks. To address this challenge, we leverage the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, to approximate the posterior distribution of latent states within the variational inference framework. This approach eliminates the need for inference networks, significantly reducing the number of variational parameters. Moreover, we demonstrate that with the aid of EnKF, the straightforward evaluation of approximated evidence lower bound (ELBO) in the variational inference can be easily obtained through the summation of multiple terms with closed-form solutions. By leveraging automatic differentiation tools, we thus can maximize the ELBO and train the GPSSM efficiently. We also extend the proposed method to an online setting and provide comprehensive algorithm analyses and insights. Extensive testing on diverse real and simulated datasets demonstrates that our variational inference algorithms, integrated with EnKF, outperform existing methods in terms of learning and inference performance.
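
For readers unfamiliar with the EnKF, the sketch below shows a single stochastic analysis (update) step for a scalar state that is observed directly. It only illustrates how an ensemble can stand in for a learned inference network when approximating the posterior over a latent state; it does not include the Gaussian process transition model or the ELBO from the paper, and the function name `enkf_update` and all numbers are hypothetical.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_noise_var, rng):
    """One stochastic EnKF analysis step for a scalar state observed directly (H = 1)."""
    prior_var = statistics.variance(ensemble)        # sample variance of the forecast ensemble
    gain = prior_var / (prior_var + obs_noise_var)   # Kalman gain estimated from the ensemble
    updated = []
    for member in ensemble:
        # Perturb the observation so the updated ensemble keeps a sensible spread.
        perturbed_obs = observation + rng.gauss(0.0, obs_noise_var ** 0.5)
        updated.append(member + gain * (perturbed_obs - member))
    return updated


rng = random.Random(0)
prior = [rng.gauss(0.0, 2.0) for _ in range(500)]    # forecast ensemble centred near 0
posterior = enkf_update(prior, observation=3.0, obs_noise_var=0.25, rng=rng)

print("prior mean:    ", round(statistics.mean(prior), 3))
print("posterior mean:", round(statistics.mean(posterior), 3))
print("posterior var: ", round(statistics.variance(posterior), 3))
```

The update has no trainable parameters: the gain is computed from ensemble statistics alone, which is the property the abstract relies on to reduce the number of variational parameters.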

翻訳日:2023-12-12 18:03:51 公開日:2023-12-10

# 近赤外表情認識のための確率微分方程式を用いた多エネルギー誘導画像変換

Multi-Energy Guided Image Translation with Stochastic Differential Equations for Near-Infrared Facial Expression Recognition ( http://arxiv.org/abs/2312.05908v1 )

ライセンス: Link先を確認

Bingjun Luo, Zewen Wang, Jinpeng Wang, Junjie Zhu, Xibin Zhao, Yue Gao

(参考訳) 照度変化は、現実世界の表情認識(FER)において長期にわたる課題である。非制御または可視光条件下では、近赤外(NIR)は、高画質の画像を取得し、可視領域に欠けている幾何学的およびテクスチャ的詳細を補うための単純で代替的なソリューションを提供することができる。既存の大規模なNIR表情データセットがないため、VIS FERメソッドを直接NIRスペクトルに拡張することは効果がない可能性がある。さらに、従来の異種画像合成法は、タスク知識のない低制御性によって制限される。これらの問題に対処するため、我々はNIR-FER確率微分方程式 (NFER-SDE) を初めて提案する。 NFER-SDEは、VISソースイメージ全体を入力として、ドメイン固有の知識とともに、画像の高周波コンテンツにおけるモダリティ不変情報の保存をガイドすることができる。大規模な実験およびアブレーション研究により、NFER-SDEはNIR FERの性能を著しく改善し、唯一利用可能な2つのNIR FERデータセットであるOulu-CASIAとLarge-HFEに対して最先端の結果を得ることが示された。

Illumination variation has been a long-term challenge in real-world facial expression recognition (FER). Under uncontrolled or non-visible light conditions, near-infrared (NIR) imaging can provide a simple alternative solution to obtain high-quality images and supplement the geometric and texture details that are missing in the visible (VIS) domain. Due to the lack of existing large-scale NIR facial expression datasets, directly extending VIS FER methods to the NIR spectrum may be ineffective. Additionally, previous heterogeneous image synthesis methods are restricted by low controllability without prior task knowledge. To tackle these issues, we present the first approach, called NIR-FER Stochastic Differential Equations (NFER-SDE), which transforms facial expression appearance between heterogeneous modalities to address the overfitting problem on small-scale NIR data. NFER-SDE is able to take the whole VIS source image as input and, together with domain-specific knowledge, guide the preservation of modality-invariant information in the high-frequency content of the image. Extensive experiments and ablation studies show that NFER-SDE significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.

翻訳日:2023-12-12 18:03:31 公開日:2023-12-10

# 近赤外表情認識のためのハイパーグラフ誘導不等角スペクトルトランスフォーマネットワーク

Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition ( http://arxiv.org/abs/2312.05907v1 )

ライセンス: Link先を確認

Bingjun Luo, Haowen Wang, Jinpeng Wang, Junjie Zhu, Xibin Zhao, Yue Gao

(参考訳) 照明変化に対する強い堅牢性により、近赤外(NIR)は、低照度または完全な暗黒条件下での視覚的(VIS)表情認識を効果的かつ必須に補完することができる。しかし,NIR画像からの表情認識(FER)は,データスケールの制約や不完全な可視光コンテンツから識別的特徴を抽出することが困難であるため,従来のFERよりも困難である。本稿では,表情認識の深化を初めて試み,近赤外式トランスフォーマ(nfer-former)と呼ばれる新しい手法を提案する。具体的には、visの分野における豊富なラベル情報をフルに活用するために、入力画像から表現情報とスペクトル情報とを分離する自己対応直交分解機構を導入し、スペクトル変動の干渉を伴わずに表現特徴を抽出する。また,いくつかの重要な顔動作をモデル化し,それら間の複雑な相関構造を学習し,クラス間類似性の干渉を軽減するハイパーグラフガイド機能埋め込み手法を提案する。さらに,NFER-Formerの効率性を評価するために,360個の被験者を含む大規模なNIR-VIS顔表現データセットを構築した。大規模な実験とアブレーション研究により、NFER-FormerはNIR FERの性能を大幅に改善し、利用可能な2つのNIR FERデータセット(Oulu-CASIAとLarge-HFE)で最先端の結果が得られることが示された。

With its strong robustness to illumination variations, near-infrared (NIR) imaging can be an effective and essential complement to visible (VIS) facial expression recognition in low lighting or complete darkness conditions. However, facial expression recognition (FER) from NIR images presents a more challenging problem than traditional FER due to the limitations imposed by the data scale and the difficulty of extracting discriminative features from incomplete visible lighting contents. In this paper, we make the first attempt at deep NIR facial expression recognition and propose a novel method called near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information in the field of VIS, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation. We also propose a Hypergraph-Guided Feature Embedding method that models some key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we have constructed a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the efficiency of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.

翻訳日:2023-12-12 18:03:05 公開日:2023-12-10

# エッジレベルEgo-NetworkエンコーディングによるサブグラフGNNの改善

Improving Subgraph-GNNs via Edge-Level Ego-Network Encodings ( http://arxiv.org/abs/2312.05905v1 )

ライセンス: Link先を確認

Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez

(参考訳) 本稿では,ノードとエッジの機能追加やメッセージパッシングフォーマットの拡張により,メッセージパッシンググラフニューラルネットワーク(mp-gnns)を高速化するグラフ学習のための,新たなエッジレベルのegoネットワーク符号化を提案する。提案した符号化法は,3WL相当グラフ群であるStrongly Regular Graphsを識別するのに十分である。このような符号化はノードベースのMP-GNNよりも表現力が高いことを示す。 10のグラフデータセットを持つ4つのベンチマークに対する実証的な評価では、実際の設定ではメモリ使用量を18.1倍削減しつつ、表現性、グラフ分類、グラフ回帰、近接タスクの以前のベースラインにマッチまたは改善しています。

We present a novel edge-level ego-network encoding for learning on graphs that can boost Message Passing Graph Neural Networks (MP-GNNs) by providing additional node and edge features or extending message-passing formats. The proposed encoding is sufficient to distinguish Strongly Regular Graphs, a family of challenging 3-WL equivalent graphs. We show theoretically that such encoding is more expressive than node-based sub-graph MP-GNNs. In an empirical evaluation on four benchmarks with 10 graph datasets, our results match or improve previous baselines on expressivity, graph classification, graph regression, and proximity tasks -- while reducing memory usage by 18.1x in certain real-world settings.

翻訳日:2023-12-12 18:02:38 公開日:2023-12-10

# ディープラーニングを用いた白内障手術ビデオの解析

Deep-Learning-Assisted Analysis of Cataract Surgery Videos ( http://arxiv.org/abs/2312.05900v1 )

ライセンス: Link先を確認

Negin Ghamsarian

(参考訳) 医療技術の進歩に伴い、手術室はインテリジェントな環境へと進化している。文脈認識システム(CAS)は、手術状態を包括的に解釈し、リアルタイム警告を可能にし、特に初心者外科医の意思決定を支援する。これらのシステムは、手術ビデオを自動的に分析し、インデクシング、文書化、手術後のレポート生成を行うことができる。このような自動システムに対する需要がますます高まる中、手術用ビデオ分析のための機械学習ベースのアプローチが生まれている。この論文は白内障手術ビデオ解析における重要な課題に対処し、効率的な文脈認識システム構築の道を開く。 1) 本論文は, 関連コンテンツの時空間的局所化が位相認識精度を大幅に向上させることを示す。 2)本論文は,白内障手術ビデオのリアルタイムストリーミングと適応ストレージを実現するための,関連性に基づく圧縮のための新しいディープラーニングフレームワークを提案する。 3)いくつかの畳み込みモジュールが提案され,ネットワークの意味解釈性能の向上が期待できる。これらの課題には、ぼかしと反射の歪み、透明性、変形性、色とテクスチャの変化、鈍いエッジ、スケールの変動などがある。 (4)白内障手術ビデオにおける自動不規則検出のための最初の枠組みを提案し,評価する。 (5)手動ピクセルベースのアノテーションの要件を軽減するため,セマンティックセグメンテーションに適応した自己教師付き表現学習のための新しい戦略を提案する。

Following the technological advancements in medicine, operating rooms are evolving into intelligent environments. Context-aware systems (CAS) can comprehensively interpret the surgical state, enable real-time warning, and support decision-making, especially for novice surgeons. These systems can automatically analyze surgical videos and perform indexing, documentation, and post-operative report generation. The ever-increasing demand for such automatic systems has sparked machine-learning-based approaches for surgical video analysis. This thesis addresses the significant challenges in cataract surgery video analysis to pave the way for building efficient context-aware systems. The main contributions of this thesis are fivefold: (1) This thesis demonstrates that spatio-temporal localization of the relevant content can considerably improve phase recognition accuracy. (2) This thesis proposes a novel deep-learning-based framework for relevance-based compression to enable real-time streaming and adaptive storage of cataract surgery videos. (3) Several convolutional modules are proposed to boost the networks' semantic interpretation performance in challenging conditions. These challenges include blur and reflection distortion, transparency, deformability, color and texture variation, blunt edges, and scale variation. (4) This thesis proposes and evaluates the first framework for automatic irregularity detection in cataract surgery videos. (5) To alleviate the requirement for manual pixel-based annotations, this thesis proposes novel strategies for self-supervised representation learning adapted to semantic segmentation.

翻訳日:2023-12-12 18:02:24 公開日:2023-12-10

# PSCR:AIGC画像品質評価のためのサンプリングベースのコントラスト回帰

PSCR: Patches Sampling-based Contrastive Regression for AIGC Image Quality Assessment ( http://arxiv.org/abs/2312.05897v1 )

ライセンス: Link先を確認

Jiquan Yuan, Xinyan Cao, Linjing Cao, Jinlong Lin, and Xixin Cao

(参考訳) 近年、AIGC(Artificial Intelligence Generated Content)はコンピュータ科学コミュニティを超えて広く注目を集めている。 AIGC画像品質評価(AIGCIQA、AIGC Image Quality Assessment)は、AIGIの連続的な生成から生じる様々な問題から、人間の知覚の観点からAIGIの品質を評価することを目的としており、コンピュータビジョン分野における新たなトピックとして浮上している。しかし、既存のほとんどのAIGCIQAメソッドは、単一の生成された画像から予測されたスコアを直接回帰し、AIGIとスコアの固有の違いを見落としている。さらに、リサイズやクロッピングのような操作は、大域的な幾何学的歪みや情報損失を引き起こし、モデルの性能を制限している。これらの問題に対処するために,パッチサンプリングに基づくコントラスト回帰(PSCR)フレームワークを提案する。より優れた表現空間を学習するために,様々な画像の差を利用したコントラスト回帰フレームワークを提案する。この領域では、画像間の差とスコアランキングを相対スコアで測定することができる。また,exemplar aigisを参照として選択することで,参照なし画像データベースでは参照画像が利用できない従来モデルの制限を克服した。画像入力における幾何歪みや情報損失を回避するため,パッチサンプリング戦略を提案する。提案するPSCRフレームワークの有効性を実証するため,AGIQA-1K, AGIQA-3K, AIGCIQA2023を含む3つの主流AIGCIQAデータベース上で広範囲に実験を行った。その結果,提案したPSCRフレームワークの導入により,モデル性能が大幅に向上した。コードは https://github.com/jiquan123/PSCR で入手できる。

In recent years, Artificial Intelligence Generated Content (AIGC) has gained widespread attention beyond the computer science community. Due to various issues arising from continuous creation of AI-generated images (AIGI), AIGC image quality assessment (AIGCIQA), which aims to evaluate the quality of AIGIs from human perception perspectives, has emerged as a novel topic in the field of computer vision. However, most existing AIGCIQA methods directly regress predicted scores from a single generated image, overlooking the inherent differences among AIGIs and scores. Additionally, operations like resizing and cropping may cause global geometric distortions and information loss, thus limiting the performance of models. To address these issues, we propose a patches sampling-based contrastive regression (PSCR) framework. We suggest introducing a contrastive regression framework to leverage differences among various generated images for learning a better representation space. In this space, differences and score rankings among images can be measured by their relative scores. By selecting exemplar AIGIs as references, we also overcome the limitations of previous models that could not utilize reference images on the no-reference image databases. To avoid geometric distortions and information loss in image inputs, we further propose a patches sampling strategy. To demonstrate the effectiveness of our proposed PSCR framework, we conduct extensive experiments on three mainstream AIGCIQA databases including AGIQA-1K, AGIQA-3K and AIGCIQA2023. The results show significant improvements in model performance with the introduction of our proposed PSCR framework. Code will be available at https://github.com/jiquan123/PSCR.

翻訳日:2023-12-12 18:01:59 公開日:2023-12-10

# 素粒子とのダークマター差動相互作用の原子プローブ

An atomic probe of dark matter differential interactions with elementary particles ( http://arxiv.org/abs/2312.05894v1 )

ライセンス: Link先を確認

Yossi Rosenzweig (1), Yevgeny Kats (1), Menachem Givon (1), Yonathan Japha (1) and Ron Folman (1) ((1) Department of Physics, Ben-Gurion University of the Negev, Israel)

(参考訳) 標準模型を超えて物理を探すことは実験物理学の主要な課題の一つである。ダークマターの候補には、アクシオンのような超軽いボソニック粒子がある。コマグネトメータは、そのような粒子と原子のスピンと相互作用するエキゾチックな磁場に対して超高感度プローブを形成する。本研究では, それらの磁場を発見し, スペクトルを測定できるだけでなく, サブ原子初等粒子, 電子, 中性子, 陽子との結合強度の比を決定するマルチ原子種プローブを提案する。さらに, このプローブの多面的特性は, 通常の磁場とアルカリ原子の光誘起架空の磁場の組み合わせによって生じる合成エキゾチック場によっても証明できることを示した。これらの合成場はまた、エキゾチック物理学のための任意の磁力計またはコマグネトメータプローブの正確な校正を可能にする。

Searching for physics beyond the Standard Model is one of the main tasks of experimental physics. Candidates for dark matter include axion-like ultralight bosonic particles. Comagnetometers form ultra-high sensitivity probes for such particles and any exotic field that interacts with the spin of an atom. Here, we propose a multi-atom-species probe that makes it possible not only to discover such fields and measure their spectrum but also to determine the ratios of their coupling strengths to sub-atomic elementary particles: electrons, neutrons and protons. We further show that the multi-faceted capabilities of this probe may be demonstrated with synthetic exotic fields generated by a combination of regular magnetic fields and light-induced fictitious magnetic fields in alkali atoms. These synthetic fields also enable the accurate calibration of any magnetometer or comagnetometer probe for exotic physics.

翻訳日:2023-12-12 18:01:33 公開日:2023-12-10

# 局所赤外源を照射した超伝導量子ビットの準粒子ダイナミックス

Quasiparticle dynamics in a superconducting qubit irradiated by a localized infrared source ( http://arxiv.org/abs/2312.05892v1 )

ライセンス: Link先を確認

Rodrigo Benevides, Maxwell Drimmer, Giacomo Bisson, Francesco Adinolfi, Uwe von Lüpke, Hugo Michiel Doeleman, Gianluigi Catelani, Yiwen Chu

(参考訳) 超伝導量子ビットにおけるデコヒーレンスの既知の源は、クーパー対または準粒子の存在である。これらは、環境に存在する高エネルギー放射や、ハイブリッド量子デバイスのような目的に導入された高エネルギー放射によって生成される。本研究では,様々なパワー,持続時間,空間的位置の焦点をあてた赤外線照射により,照明下でのトランスモンキュービットの特性を体系的に研究する。入射光子の高エネルギーにもかかわらず、我々の観測はトラップが支配する低エネルギー準粒子ダイナミクスのモデルとよく一致する。この手法は、様々なジオメトリや材料を持つ超伝導回路に対する高エネルギー放射線の影響を理解し、緩和することができる。

A known source of decoherence in superconducting qubits is the presence of broken Cooper pairs, or quasiparticles. These can be generated by high-energy radiation, either present in the environment or purposefully introduced, as in the case of some hybrid quantum devices. Here, we systematically study the properties of a transmon qubit under illumination by focused infrared radiation with various powers, durations, and spatial locations. Despite the high energy of incident photons, our observations agree well with a model of low-energy quasiparticle dynamics dominated by trapping. This technique can be used for understanding and potentially mitigating the effects of high-energy radiation on superconducting circuits with a variety of geometries and materials.

翻訳日:2023-12-12 18:01:22 公開日:2023-12-10

# Maxwell-Ampère-Nernst-Planck方程式に対する保守的ハイブリッド物理インフォームドニューラルネットワーク法

A conservative hybrid physics-informed neural network method for Maxwell-Ampère-Nernst-Planck equations ( http://arxiv.org/abs/2312.05891v1 )

ライセンス: Link先を確認

Cheng Chang, Zhouping Xin, Tieyong Zeng

(参考訳) Maxwell-Ampère-Nernst-Planck (MANP) 方程式は、荷電粒子の力学をモデル化するために最近提案されている。本研究では,このシステムの数値アルゴリズムを深層学習ツールを用いて拡張する。提案するハイブリッドアルゴリズムはダミー変数の適切な近似を決定する自動手法を提供する。さらに、元の方法は2次元問題に対して検証される。しかし、空間次元が 1 の場合、元のカールフリー緩和成分は適用不可能であり、ダミー変数の近似式は2次元シナリオでうまく機能するが、1 次元の場合において妥当な出力を与えることができない。提案手法は1次元の場合に容易に一般化できる。実験は1次元の場合のポアソン・ボルツマン型方程式から得られる定常解の数値安定性と良好な収束性を示す。 2次元の場合の実験は,提案手法が保存特性を保っていることを示唆する。

Maxwell-Ampère-Nernst-Planck (MANP) equations were recently proposed to model the dynamics of charged particles. In this study, we enhance a numerical algorithm of this system with deep learning tools. The proposed hybrid algorithm provides an automated means to determine a proper approximation for the dummy variables, which can otherwise only be obtained through massive numerical tests. In addition, the original method is validated for 2-dimensional problems. However, when the spatial dimension is one, the original curl-free relaxation component is inapplicable, and the approximation formula for dummy variables, which works well in a 2-dimensional scenario, fails to provide a reasonable output in the 1-dimensional case. The proposed method can be readily generalised to cases with one spatial dimension. Experiments show numerical stability and good convergence to the steady-state solution obtained from Poisson-Boltzmann type equations in the 1-dimensional case. The experiments conducted in the 2-dimensional case indicate that the proposed method preserves the conservation properties.

Translated: 2023-12-12 18:01:13 Published: 2023-12-10

# Scaling #DNN-Verification Tools with Efficient Bound Propagation and Parallel Computing

http://arxiv.org/abs/2312.05890v1

License: see the linked page

Luca Marzari, Gabriele Roncolato and Alessandro Farinelli

Deep Neural Networks (DNNs) are powerful tools that have shown extraordinary results in many scenarios, ranging from pattern recognition to complex robotic problems. However, their intricate designs and lack of transparency raise safety concerns when applied in real-world applications. In this context, Formal Verification (FV) of DNNs has emerged as a valuable solution to provide provable guarantees on the safety aspect. Nonetheless, the binary answer (i.e., safe or unsafe) may not be informative enough for direct safety interventions such as safety model ranking or selection. To address this limitation, the FV problem has recently been extended to the counting version, called #DNN-Verification, for the computation of the size of the unsafe regions in a given safety property's domain. Still, due to the complexity of the problem, existing solutions struggle to scale to real-world robotic scenarios, where the DNN can be large and complex. To address this limitation, inspired by advances in FV, in this work we propose a novel strategy based on reachability analysis combined with Symbolic Linear Relaxation and parallel computing to enhance the efficiency of existing exact and approximate FV for DNN counters. The empirical evaluation on standard FV benchmarks and realistic robotic scenarios shows a remarkable improvement in scalability and efficiency, enabling the use of such techniques even for complex robotic applications.
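As a rough illustration of the counting problem described above, the sketch below bounds the unsafe fraction of an input box for a tiny hand-written ReLU network. It uses plain interval bound propagation with recursive splitting, which is a looser stand-in for the Symbolic Linear Relaxation named in the abstract, and it is not the authors' tool. The weights, the safety property (output greater than zero counts as unsafe), and all function names are assumptions made up for this example.

```python
# Hypothetical 2-input, 2-hidden-unit ReLU network (weights chosen for illustration).
W1 = [[1.0, -1.0], [0.5, 1.0]]
B1 = [0.0, -0.5]
W2 = [1.0, -1.0]
B2 = -0.2


def affine_bounds(weights, bias, lo, hi):
    """Interval bounds of w.x + b when each x[i] lies in [lo[i], hi[i]]."""
    low = bias + sum(w * (l if w >= 0 else h) for w, l, h in zip(weights, lo, hi))
    high = bias + sum(w * (h if w >= 0 else l) for w, l, h in zip(weights, lo, hi))
    return low, high


def output_bounds(lo, hi):
    """Propagate an input box through the network with interval arithmetic."""
    hid_lo, hid_hi = [], []
    for w, b in zip(W1, B1):
        l, h = affine_bounds(w, b, lo, hi)
        hid_lo.append(max(l, 0.0))  # ReLU is monotone, so clip both ends
        hid_hi.append(max(h, 0.0))
    return affine_bounds(W2, B2, hid_lo, hid_hi)


def unsafe_fraction(lo, hi, depth=12):
    """Lower and upper bounds on the fraction of the box where output > 0."""
    out_lo, out_hi = output_bounds(lo, hi)
    if out_hi <= 0.0:
        return 0.0, 0.0  # whole box proven safe
    if out_lo > 0.0:
        return 1.0, 1.0  # whole box proven unsafe
    if depth == 0:
        return 0.0, 1.0  # undecided box: it only widens the gap
    # Split the widest dimension at its midpoint; both halves have equal volume.
    axis = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[axis] + hi[axis]) / 2.0
    left_hi = hi[:axis] + [mid] + hi[axis + 1:]
    right_lo = lo[:axis] + [mid] + lo[axis + 1:]
    a = unsafe_fraction(lo, left_hi, depth - 1)
    b = unsafe_fraction(right_lo, hi, depth - 1)
    return (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0


lower, upper = unsafe_fraction([0.0, 0.0], [1.0, 1.0])
print(f"unsafe fraction is between {lower:.3f} and {upper:.3f}")
```

The two recursive calls are independent of each other, which is the kind of structure that parallel computing can exploit; tighter bounds would let more boxes be decided without splitting.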

Translated: 2023-12-12 18:00:57 Published: 2023-12-10

# SuperPrimitive: Scene Reconstruction at a Primitive Level

http://arxiv.org/abs/2312.05889v1

License: see the linked page

Kirill Mazur, Gwangbin Bae, Andrew J. Davison

Joint camera pose and dense geometry estimation from a set of images or a monocular video remains a challenging problem due to its computational complexity and inherent visual ambiguities. Most dense incremental reconstruction systems operate directly on image pixels and solve for their 3D positions using multi-view geometry cues. Such pixel-level approaches suffer from ambiguities or violations of multi-view consistency (e.g. caused by textureless or specular surfaces). We address this issue with a new image representation which we call a SuperPrimitive. SuperPrimitives are obtained by splitting images into semantically correlated local regions and enhancing them with estimated surface normal directions, both of which are predicted by state-of-the-art single image neural networks. This provides a local geometry estimate per SuperPrimitive, while their relative positions are adjusted based on multi-view observations. We demonstrate the versatility of our new representation by addressing three 3D reconstruction tasks: depth completion, few-view structure from motion, and monocular dense visual odometry.

Translated: 2023-12-12 18:00:33 Published: 2023-12-10

# Fake It Till Make It: Federated Learning with Consensus-Oriented Generation

http://arxiv.org/abs/2312.05966v1

License: see the linked page

Rui Ye, Yaxin Du, Zhenyang Ni, Siheng Chen, Yanfeng Wang

In federated learning (FL), data heterogeneity is one key bottleneck that causes model divergence and limits performance. Addressing this, existing methods often regard data heterogeneity as an inherent property and propose to mitigate its adverse effects by correcting models. In this paper, we seek to break this inherent property by generating data to complement the original dataset and thereby fundamentally reduce the level of heterogeneity. As a novel attempt from the perspective of data, we propose federated learning with consensus-oriented generation (FedCOG). FedCOG consists of two key components at the client side: complementary data generation, which generates data extracted from the shared global model to complement the original dataset, and knowledge-distillation-based model training, which distills knowledge from the global model to the local model based on the generated data to mitigate over-fitting to the original heterogeneous dataset. FedCOG has two critical advantages: 1) it can be a plug-and-play module to further improve the performance of most existing FL methods, and 2) it is naturally compatible with standard FL protocols such as Secure Aggregation, since it makes no modification to the communication process. Extensive experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.

Translated: 2023-12-12 17:53:14 Published: 2023-12-10

# ConSequence: Synthesizing Logically Constrained Sequences for Electronic Health Record Generation

http://arxiv.org/abs/2312.05964v1

License: see the linked page

Brandon Theodorou, Shrusti Jain, Cao Xiao, and Jimeng Sun

Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle with adhering to domain-specific knowledge and removing invalid data. We present ConSequence, an effective approach to integrating domain knowledge into sequential generative neural network outputs. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, ensured by an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, lack the ability to handle temporal constraints, and hinder the learning and computational efficiency of the model. In contrast, our approach efficiently handles all types of constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, outperforming competitors in achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence successfully prevents all rule violations while improving model quality by reducing test perplexity by 5% and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.
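To make the idea of rule checking by matrix multiplication concrete, here is a minimal sketch under stated assumptions. It is not the ConSequence implementation: the four-code vocabulary, the two hard rules, and the names `aggregate_history` and `apply_rules` are all hypothetical. A rule fires when every one of its antecedent codes appears in the aggregated history, which is detected by comparing a matrix-vector product against the number of antecedents in the rule.

```python
# Hypothetical vocabulary of four codes: index 0..3.
# Each row marks the antecedent codes of one rule.
ANTECEDENTS = [
    [0, 1, 0, 0],  # rule 0 fires when code 1 is in the history
    [1, 0, 1, 0],  # rule 1 fires when codes 0 and 2 are both in the history
]
# For each rule: (index of the constrained code, value it is forced to).
CONSEQUENTS = [(0, 0), (3, 1)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def aggregate_history(visits):
    """Temporal aggregation: 1 if a code appeared in any earlier visit."""
    return [max(column) for column in zip(*visits)]


def apply_rules(history, proposed):
    """Overwrite the proposed next visit wherever a hard rule fires."""
    state = aggregate_history(history)
    hits = matvec(ANTECEDENTS, state)  # matched antecedents per rule
    fixed = list(proposed)
    for row, count, (index, value) in zip(ANTECEDENTS, hits, CONSEQUENTS):
        if count == sum(row):  # every antecedent of this rule is present
            fixed[index] = value
    return fixed


history = [[0, 1, 0, 0], [1, 0, 1, 0]]
print(apply_rules(history, [1, 0, 0, 0]))
```

In a neural generator the same comparison could be applied to batched tensors so that all rules are evaluated in one multiplication; soft constraints would adjust probabilities instead of overwriting values.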

Translated: 2023-12-12 17:52:51 Published: 2023-12-10

# Aikyam: A Video Conferencing Utility for Deaf and Dumb

http://arxiv.org/abs/2312.05962v1

License: see the linked page

Kshitij Deshpande, Varad Mashalkar, Kaustubh Mhaisekar, Amaan Naikwadi and Archana Ghotkar

With the advent of the pandemic, the use of video conferencing platforms as a means of communication has greatly increased, and with it, so have remote opportunities. The deaf and dumb have traditionally faced several issues in communication, but now the effect is felt more severely. This paper proposes an all-encompassing video conferencing utility that can be used with existing video conferencing platforms to address these issues. Appropriate, semantically correct sentences are generated from the signer's gestures, which are interpreted by the system. Along with audio that speaks the sentence, the sentence is also shown as an annotation on the user's feed. This can be viewed by all participants, thus aiding smooth communication with all parties involved. The utility uses a simple LSTM model to classify gestures. The sentences are constructed by a T5-based model. A virtual camera is used to achieve the required data flow.

Translated: 2023-12-12 17:52:29 Published: 2023-12-10

# TransGlow: Attention-augmented Transduction model based on Graph Neural Networks for Water Flow Forecasting

http://arxiv.org/abs/2312.05961v1

License: see the linked page

Naghmeh Shafiee Roudbari, Charalambos Poullis, Zachary Patterson, Ursula Eicker

The hydrometric prediction of water quantity is useful for a variety of applications, including water management, flood forecasting, and flood control. However, the task is difficult due to the dynamic nature and limited data of water systems. Highly interconnected water systems can significantly affect hydrometric forecasting. Consequently, it is crucial to develop models that represent the relationships between other system components. In recent years, numerous hydrological applications have been studied, including streamflow prediction, flood forecasting, and water quality prediction. Existing methods are unable to model the influence of adjacent regions between pairs of variables. In this paper, we propose a spatiotemporal forecasting model that augments the hidden state in a Graph Convolution Recurrent Neural Network (GCRN) encoder-decoder using an efficient version of the attention mechanism. The attention layer allows the decoder to access different parts of the input sequence selectively. Since water systems are interconnected and the connectivity information between the stations is implicit, the proposed model leverages a graph learning module to extract a sparse graph adjacency matrix adaptively based on the data. Spatiotemporal forecasting relies on historical data. In some regions, however, historical data may be limited or incomplete, making it difficult to accurately predict future water conditions. Further, we present a new benchmark dataset of water flow from a network of Canadian stations on rivers, streams, and lakes. Experimental results demonstrate that our proposed model, TransGlow, outperforms baseline methods by a wide margin.
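The selective access that an attention layer gives a decoder can be sketched in a few lines. The example below uses standard scaled dot-product attention over a list of encoder hidden states; the abstract mentions an "efficient version" of attention whose details are not given here, so this is only the textbook form, and the function name `attend` and the toy vectors are assumptions.

```python
import math


def attend(query, encoder_states):
    """Scaled dot-product attention of one decoder query over encoder states."""
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, state)) / scale
              for state in encoder_states]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the encoder states.
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(len(query))]
    return weights, context


# Three toy encoder states; the query is most similar to the first one.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(weights, context)
```

The decoder would concatenate or add `context` to its hidden state at each step, so time steps of the input that resemble the current query contribute more to the forecast.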

Translated: 2023-12-12 17:52:17 Published: 2023-12-10

# VAE-IF: Deep feature extraction with averaging for unsupervised artifact detection in routine acquired ICU time-series

http://arxiv.org/abs/2312.05959v1

License: see the linked page

Hollan Haule, Ian Piper, Patricia Jones, Chen Qin, Tsz-Yan Milly Lo, and Javier Escudero

Artifacts are a common problem in physiological time-series data collected from intensive care units (ICU) and other settings. They affect the quality and reliability of clinical research and patient care. Manual annotation of artifacts is costly and time-consuming, rendering it impractical. Automated methods are desired. Here, we propose a novel unsupervised approach to detect artifacts in clinical-standard minute-by-minute resolution ICU data without any prior labeling or signal-specific knowledge. Our approach combines a variational autoencoder (VAE) and an isolation forest (iForest) model to learn features and identify anomalies in different types of vital signs, such as blood pressure, heart rate, and intracranial pressure. We evaluate our approach on a real-world ICU dataset and compare it with supervised models based on long short-term memory (LSTM) and XGBoost. We show that our approach achieves comparable sensitivity and generalizes well to an external dataset. We also visualize the latent space learned by the VAE and demonstrate its ability to disentangle clean and noisy samples. Our approach offers a promising solution for cleaning ICU data in clinical research and practice without the need for any labels whatsoever.

Translated: 2023-12-12 17:51:53 Published: 2023-12-10

# Learning Differentiable Particle Filter on the Fly

http://arxiv.org/abs/2312.05955v1

License: see the linked page

Jiaxi Li, Xiongjie Chen, Yunpeng Li

Differentiable particle filters are an emerging class of sequential Bayesian inference techniques that use neural networks to construct components in state space models. Existing approaches are mostly based on offline supervised training strategies. This delays model deployment, and the resulting filters are susceptible to distribution shift in test-time data. In this paper, we propose an online learning framework for differentiable particle filters so that model parameters can be updated as data arrive. The technical constraint is that there is no known ground truth state information in the online inference setting. We address this by adopting an unsupervised loss to construct the online model updating procedure, which involves a sequence of filtering operations for online maximum likelihood-based parameter estimation. We empirically evaluate the effectiveness of the proposed method, and compare it with supervised learning methods in simulation settings including a multivariate linear Gaussian state-space model and a simulated object tracking experiment.
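The unsupervised objective mentioned above can be illustrated with a plain bootstrap particle filter that returns an estimate of the log marginal likelihood of the observations, which needs no ground-truth states. This is a minimal sketch and not the paper's method: the filter is not differentiable, the scalar linear Gaussian model and its noise levels are assumed, and parameter selection is done by comparing a few candidate values rather than by gradient updates.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def simulate(a, steps, q_std=1.0, r_std=0.5, seed=1):
    """Draw observations from x_t = a * x_{t-1} + noise, y_t = x_t + noise."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q_std)
        ys.append(x + rng.gauss(0.0, r_std))
    return ys


def particle_filter_loglik(observations, a, n_particles=200,
                           q_std=1.0, r_std=0.5, seed=0):
    """Bootstrap particle filter estimate of log p(y_1..T | a)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    loglik = 0.0
    for y in observations:
        # Propagate each particle through the assumed dynamics.
        particles = [a * x + rng.gauss(0.0, q_std) for x in particles]
        log_w = [gaussian_logpdf(y, x, r_std) for x in particles]
        top = max(log_w)  # log-sum-exp trick for stability
        weights = [math.exp(lw - top) for lw in log_w]
        loglik += top + math.log(sum(weights) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik


ys = simulate(a=0.9, steps=100)
candidates = [-0.9, 0.0, 0.5, 0.9]
best_a = max(candidates, key=lambda a: particle_filter_loglik(ys, a))
print("candidate with highest estimated likelihood:", best_a)
```

A differentiable variant would replace the candidate comparison with gradient steps on this likelihood estimate as each new observation arrives, which requires a resampling scheme that lets gradients pass through.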

Translated: 2023-12-12 17:51:34 Published: 2023-12-10

# RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging

http://arxiv.org/abs/2312.05953v1

License: see the linked page

Zelong Liu, Alexander Zhou, Arnold Yang, Alara Yilmaz, Maxwell Yoo, Mikey Sullivan, Catherine Zhang, James Grant, Daiqing Li, Zahi A. Fayad, Sean Huver, Timothy Deyer, Xueyan Mei

Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, the first multi-modal radiologic data generator, which was developed by training StyleGAN-XL on the real RadImageNet dataset of 102,774 patients. RadImageGAN can generate high-resolution synthetic medical imaging datasets across 12 anatomical regions and 130 pathological classes in 3 modalities. Furthermore, we demonstrate that RadImageGAN generators can be utilized with BigDatasetGAN to generate multi-class pixel-wise annotated paired synthetic images and masks for diverse downstream segmentation tasks with minimal manual annotation. We show that using synthetic auto-labeled data from RadImageGAN can significantly improve performance on four diverse downstream segmentation datasets by augmenting real training data and/or developing pre-trained weights for fine-tuning. This shows that RadImageGAN combined with BigDatasetGAN can improve model performance and address data scarcity while reducing the resources needed for annotations for segmentation tasks.

Translated: 2023-12-12 17:51:19 Published: 2023-12-10

# Exploring Non-perturbative Corrections in Thermodynamics of Static Dirty Black Holes

http://arxiv.org/abs/2312.05948v1

License: see the linked page

Saheb Soroushfar, Behnam Pourhassan, and İzzet Sakallı

This study presents an investigation into the thermodynamic properties of a dirty black hole immersed in a uniform electric field within the framework of the Einstein-Nonlinear Electrodynamics (ENE)-dilaton theory. The analysis delves into various thermodynamic aspects, including heat capacity, Helmholtz free energy, and internal energy, providing insights into the behavior of the black hole under the influence of the electric field. Furthermore, the article explores the intricate interplay between quantum effects and thermodynamic behavior through the examination of quantum-corrected entropy. The study aims to shed light on the non-perturbative corrections that arise in this complex system, offering a comprehensive understanding of the modified thermodynamics of dirty black holes within the specified theoretical framework.

Translated: 2023-12-12 17:50:51 Published: 2023-12-10

# Uncertainty Propagation through Trained Deep Neural Networks Using Factor Graphs

http://arxiv.org/abs/2312.05946v1

License: see the linked page

Angel Daruna, Yunye Gong, Abhinav Rajvanshi, Han-Pang Chiu, Yi Yao

Predictive uncertainty estimation remains a challenging problem precluding the use of deep neural networks as subsystems within safety-critical applications. Aleatoric uncertainty is a component of predictive uncertainty that cannot be reduced through model improvements. Uncertainty propagation seeks to estimate aleatoric uncertainty by propagating input uncertainties to network predictions. Existing uncertainty propagation techniques use one-way information flows, propagating uncertainties layer-by-layer or across the entire neural network while relying either on sampling or analytical techniques for propagation. Motivated by the complex information flows within deep neural networks (e.g. skip connections), we developed and evaluated a novel approach by posing uncertainty propagation as a non-linear optimization problem using factor graphs. We observed statistically significant improvements in performance over prior work when using factor graphs across most of our experiments, which included three datasets and two neural network architectures. Our implementation balances the benefits of sampling and analytical propagation techniques, which we believe is a key factor in achieving performance improvements.
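The two families of one-way propagation named above, analytical and sampling, can be compared on a single linear layer, where the analytical answer is exact. The sketch below is only that baseline comparison and does not implement the factor-graph formulation; the weights, the assumption of independent Gaussian inputs, and the function names are all made up for illustration.

```python
import random
import statistics

# Hypothetical 2x2 linear layer y = W x + b.
W = [[0.5, -1.0], [2.0, 0.25]]
B = [0.1, -0.3]


def analytic_linear(mean, var):
    """Exact output mean and variance for independent Gaussian inputs."""
    out_mean = [b + sum(w * m for w, m in zip(row, mean)) for row, b in zip(W, B)]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in W]
    return out_mean, out_var


def sampled_linear(mean, var, n=20000, seed=0):
    """Monte Carlo estimate of the same quantities."""
    rng = random.Random(seed)
    outputs = [[] for _ in W]
    for _ in range(n):
        x = [rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]
        for k, (row, b) in enumerate(zip(W, B)):
            outputs[k].append(b + sum(w * xi for w, xi in zip(row, x)))
    means = [statistics.fmean(o) for o in outputs]
    variances = [statistics.pvariance(o) for o in outputs]
    return means, variances


input_mean, input_var = [1.0, 2.0], [0.04, 0.09]
print("analytic:", analytic_linear(input_mean, input_var))
print("sampled: ", sampled_linear(input_mean, input_var))
```

Once a non-linearity such as ReLU follows the layer, the analytical route needs an approximation while sampling stays unbiased but noisy, which is the trade-off the abstract refers to when it mentions balancing the two techniques.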

Translated: 2023-12-12 17:50:40 Published: 2023-12-10

# ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

http://arxiv.org/abs/2312.05941v1

License: see the linked page

Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann

Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.

Translated: 2023-12-12 17:50:24 Published: 2023-12-10

# Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs

http://arxiv.org/abs/2312.05934v1

License: see the linked page

Oded Ovadia, Menachem Brief, Moshik Mishaeli, Oren Elisha

Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited, relying heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information or refine the capabilities of LLMs on previously seen information poses a significant challenge. In this study, we compare two common approaches: fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.

Translated: 2023-12-12 17:50:01 Published: 2023-12-10

# Temporal Supervised Contrastive Learning for Modeling Patient Risk Progression

http://arxiv.org/abs/2312.05933v1

License: see the linked page

Shahriar Noroozizadeh, Jeremy C. Weiss, George H. Chen

We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient's data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to data augmentation, a key ingredient of contrastive learning for which, to our knowledge, no standard procedure is adequately realistic for clinical tabular data. We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.
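The nearest neighbor pairing step can be pictured as follows: each time step's raw feature vector is paired with the closest other vector, and that pair plays the role that an augmented view would play in ordinary contrastive learning. The sketch below covers only this pairing, under assumptions: Euclidean distance, a brute-force search, and the function name `nearest_neighbor_pairs` are illustrative choices, and the contrastive loss itself is not shown.

```python
import math


def nearest_neighbor_pairs(features):
    """For each row, return the index of its closest other row (Euclidean).

    Assumes at least two rows; brute force, so quadratic in the number of rows.
    """
    pairs = []
    for i, anchor in enumerate(features):
        candidates = [j for j in range(len(features)) if j != i]
        pairs.append(min(candidates, key=lambda j: math.dist(anchor, features[j])))
    return pairs


# Toy raw feature vectors for four time steps (two clearly separated groups).
time_steps = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]
print(nearest_neighbor_pairs(time_steps))
```

Because pairs are only formed between vectors that are already close in raw feature space, time steps with very different raw features are never pulled together, which is consistent with property (3) above.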

Translated: 2023-12-12 17:49:44 Published: 2023-12-10

# A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

http://arxiv.org/abs/2312.05930v1

License: see the linked page

Linxi Zhao, Jiankai Tang, Dongyu Chen, Xiaohong Liu, Yong Zhou, Guangyu Wang, Yuntao Wang

Nailfold capillaroscopy is a well-established method for assessing health conditions, but despite recent advancements, the potential of automated medical image analysis using machine learning remains untapped. In this groundbreaking study, we present a pioneering effort in constructing a comprehensive dataset (321 images, 219 videos, and 68 clinic reports, with expert annotations) that serves as a crucial resource for training deep-learning models. Leveraging this dataset, we propose an end-to-end nailfold capillary analysis pipeline capable of automatically detecting and measuring diverse morphological and dynamic features. Experimental results demonstrate sub-pixel measurement accuracy and 90% accuracy in predicting abnormality portions, highlighting its potential for advancing quantitative medical research and enabling pervasive computing in healthcare. We have shared our open-source code and data (available at https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary) to contribute to transformative progress in computational medical image analysis.

Translated: 2023-12-12 17:49:18 Published: 2023-12-10

# AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer

http://arxiv.org/abs/2312.05928v1

License: see the linked page

Joonwoo Kwon, Sooyoung Kim, Yuewei Lin, Shinjae Yoo, Jiook Cha

Neural style transfer (NST) has evolved significantly in recent years. Yet, despite its rapid progress and advancement, existing NST methods either struggle to transfer aesthetic information from a style effectively or suffer from high computational costs and inefficiencies in feature disentanglement due to using pre-trained models. This work proposes a lightweight but effective model, AesFA -- Aesthetic Feature-Aware NST. The primary idea is to decompose the image via its frequencies to better disentangle aesthetic styles from the reference image while training the entire model in an end-to-end manner to exclude pre-trained models at inference completely. To improve the network's ability to extract more distinct representations and further enhance the stylization quality, this work introduces a new aesthetic feature: contrastive loss. Extensive experiments and ablations show the approach not only outperforms recent NST methods in terms of stylization quality, but it also achieves faster inference. Code is available at https://github.com/Sooyyoungg/AesFA.

翻訳日:2023-12-12 17:48:58 公開日:2023-12-10

# 言語記述セマンティック検索に基づくロボットマニピュレーションタスクのポリシー

Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks ( http://arxiv.org/abs/2312.05925v1 )

ライセンス: Link先を確認

Jannik Sheikh, Andrew Melnik, Gora Chand Nandi, Robert Haschke

(参考訳) 強化学習と模倣学習のアプローチは、タスクのごく一部の例でうまく一般化することが難しい政策学習戦略を利用する。本研究では,状態行動軌跡の実証データセットからオンライン検索ポリシーを作成するための言語条件のセマンティック検索手法を提案する。ここでは、データセットにある最もよく似た操作軌跡からアクションを直接取得する。提案手法は,CALVINベンチマークのベースライン性能を超越し,ゼロショット適応性能が向上する。これは、オンライン検索ベースのポリシーアプローチを、通常Imitation LearningやReinforcement Learningベースのポリシーによって対処されるタスクに拡張する大きな可能性を秘めている。

Reinforcement Learning and Imitation Learning approaches utilize policy learning strategies that struggle to generalize from just a few examples of a task. In this work, we propose a language-conditioned semantic search-based method to produce an online search-based policy from the available demonstration dataset of state-action trajectories. Here we directly acquire actions from the most similar manipulation trajectories found in the dataset. Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities. This holds great potential for expanding the use of our online search-based policy approach to tasks typically addressed by Imitation Learning or Reinforcement Learning-based policies.
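
The retrieval idea in this abstract (take the action from the most similar stored trajectory step) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity measure, the flat `(embedding, action)` demonstration format, and the names `cosine_similarity`, `retrieve_action`, and `demos` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_action(query, demonstrations):
    """Return the action stored with the most similar demonstration entry.

    `query` stands in for an embedding of the current observation and
    language instruction (hypothetical); `demonstrations` is a list of
    (embedding, action) pairs.
    """
    _, best_action = max(
        demonstrations, key=lambda pair: cosine_similarity(query, pair[0])
    )
    return best_action


# Toy demonstration set: 2-D embeddings paired with symbolic actions.
demos = [
    ([1.0, 0.0], "move_left"),
    ([0.0, 1.0], "move_right"),
    ([0.7, 0.7], "grasp"),
]
```

In this sketch nothing is trained: the "policy" is a nearest-neighbour lookup at every step, which is one plausible reading of why such an approach could adapt to new tasks simply by changing the demonstration set.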

翻訳日:2023-12-12 17:48:36 公開日:2023-12-10

# 想像上のアクションによるRLポリシーの修正:新しいタスクの実行を可能にする予測可能なポリシー

Modifying RL Policies with Imagined Actions: How Predictable Policies Can Enable Users to Perform Novel Tasks ( http://arxiv.org/abs/2312.05991v1 )

ライセンス: Link先を確認

Isaac Sheidlower, Reuben Aronson, Elaine Short

(参考訳) ユーザーは、ロボットの機能を利用して、リアルタイムで問題を創造的に解決できることが重要です。強化学習(rl)ベースのロボットにアクセス可能なユーザーは、ロボットの自律性とその行動に関する知識を使って新しいタスクを完了させたいかもしれない。 1つの方法は、ユーザが遠隔操作によってロボットのアクション空間の一部を制御し、RLポリシーが残りを同時に制御することである。しかし、既定のrlポリシーは簡単には実現できないかもしれない。例えば、ユーザのコントロールは、ポリシーの観点からロボットを障害状態にし、ユーザが慣れていない方法で動作させることで、ユーザの望むタスクの成功を妨げる可能性がある。本稿では,この課題を定式化し,その問題に対処し,ロボットの行動に対する期待を生かして新たなタスクを実現するための初期アルゴリズムであるiodaを提案する。

It is crucial that users are empowered to use the functionalities of a robot to creatively solve problems on the fly. A user who has access to a Reinforcement Learning (RL) based robot may want to use the robot's autonomy and their knowledge of its behavior to complete new tasks. One way is for the user to take control of some of the robot's action space through teleoperation while the RL policy simultaneously controls the rest. However, an out-of-the-box RL policy may not readily facilitate this. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act in a way the user is not familiar with, hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions, IODA, an initial algorithm for addressing that problem and empowering users to leverage their expectation of a robot's behavior to accomplish new tasks.

翻訳日:2023-12-12 17:43:52 公開日:2023-12-10

# テキストから遅延メッセージの特徴を抽出するベクティナリーの構築:道徳的アピールを事例として

Constructing Vec-tionaries to Extract Latent Message Features from Texts: A Case Study of Moral Appeals ( http://arxiv.org/abs/2312.05990v1 )

ライセンス: Link先を確認

Zening Duan, Anqi Shao, Yicheng Hu, Heysung Lee, Xining Liao, Yoo Ji Suh, Jisoo Kim, Kai-Cheng Yang, Kaiping Chen, and Sijia Yang

(参考訳) コミュニケーション研究は、道徳的アピールのような潜在メッセージの特徴をしばしば研究しているが、その定量化は依然として課題である。従来のヒューマンコーディングはスケーラビリティとインターコーダの信頼性に苦しむ。辞書ベースの手法はコスト効率と計算効率が良いが、文脈感度に欠けることが多く、本来の用途で開発された語彙によって制限される。本稿では,非線形最適化による単語埋め込みによる検証辞書の高速化を目的とした,ベクトル定値測定ツールの構築手法を提案する。埋め込みによって符号化される意味関係を利用することにより、vec-tionaryは、元の語彙を他の文脈に適用する可能性を広げることで、潜在メッセージ特徴の測定を改善する。 Vec-tionariesは、辞書の本来の語彙を超えて、テキスト、特に短いフォーマットのテキストから意味情報を抽出するのに役立つ。重要なことに、vec-tionaryは、テキストの強み以上の潜在機能の価値とあいまいさを捉えるために、追加のメトリクスを生成することができる。新型コロナウイルス関連ツイートの道徳的魅力を事例研究として、倫理的基盤を構築するためのステップを解説し、辞書の手法で欠落した投稿を処理し、クラウドソースされた人的評価に適合した測定結果を作成する能力を示す。さらに、道徳的基礎からのさらなるメトリクスは、メッセージの再送のような予測結果を促進するユニークな洞察を明らかにした。

While communication research frequently studies latent message features like moral appeals, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for the original applications. In this paper, we present a novel approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings through nonlinear optimization. By harnessing semantic relationships encoded by embeddings, vec-tionaries improve the measurement of latent message features by expanding the applicability of original vocabularies to other contexts. Vec-tionaries can also help extract semantic information from texts, especially those in short format, beyond the original vocabulary of a dictionary. Importantly, a vec-tionary can produce additional metrics to capture the valence and ambivalence of a latent feature beyond its strength in texts. Using moral appeals in COVID-19-related tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process posts missed by dictionary methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, additional metrics from the moral foundations vec-tionary unveiled unique insights that facilitated predicting outcomes such as message retransmission.

翻訳日:2023-12-12 17:43:37 公開日:2023-12-10

# Denoising Diffusion Probabilistic Modelの収束性に関する一考察

A Note on the Convergence of Denoising Diffusion Probabilistic Models ( http://arxiv.org/abs/2312.05989v1 )

ライセンス: Link先を確認

Sokhna Diarra Mbacke, Omar Rivasplata

(参考訳) 拡散モデルは深層生成モデルの最も重要なファミリーの1つである。本稿では,データ生成分布と拡散モデルで学習した分布との間のワッサーシュタイン距離の定量的上限を導出する。この分野の先行研究とは異なり、この結果は学習スコア関数を仮定しない。さらに、有界なインスタンス空間上の任意のデータ生成分布に対する束縛は、密度 w.r.t. を持たないものでさえも、ルベーグ測度であり、上界は指数的依存関係を伴わない。我々の主な成果は、Mbacke et al. (2023) の最近の研究に基づいている。

Diffusion models are one of the most important families of deep generative models. In this note, we derive a quantitative upper bound on the Wasserstein distance between the data-generating distribution and the distribution learned by a diffusion model. Unlike previous works in this field, our result does not make assumptions on the learned score function. Moreover, our bound holds for arbitrary data-generating distributions on bounded instance spaces, even those without a density w.r.t. the Lebesgue measure, and the upper bound does not suffer from exponential dependencies. Our main result builds upon the recent work of Mbacke et al. (2023) and our proofs are elementary.
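
For intuition about the metric being bounded, the sketch below computes an order-1 Wasserstein distance between two one-dimensional empirical distributions. It is only an illustration of the distance itself, not of the paper's bound; the restriction to one dimension, order 1, and equal sample counts is an assumption, and `wasserstein_1d` is a hypothetical helper name.

```python
def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two 1-D empirical distributions.

    For equally many samples on the real line, the optimal transport plan
    matches sorted samples pairwise, so the distance is the mean absolute
    difference of the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes equal, non-zero sample counts")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)
```

Shifting every sample by a constant `c` moves the distribution by exactly `|c|` under this metric, which is well defined even for point masses that have no density with respect to the Lebesgue measure.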

翻訳日:2023-12-12 17:43:12 公開日:2023-12-10

# 反復変形学習による乳児脳MRIからの球面形状を持つ皮質表面の再構成

Reconstruction of Cortical Surfaces with Spherical Topology from Infant Brain MRI via Recurrent Deformation Learning ( http://arxiv.org/abs/2312.05986v1 )

ライセンス: Link先を確認

Xiaoyang Chen, Junjie Zhao, Siyuan Liu, Sahar Ahmad, Pew-Thian Yap

(参考訳) MRIからの皮質表面再構成(CSR)は、脳の構造と機能を研究する鍵となる。近年のディープラーニングアプローチはCSRの速度を大幅に向上させたが、下流の幾何学的解析を容易にするために、皮質を位相的に正しい球面多様体にマッピングするためには、かなりのランタイムが必要である。さらに、このマッピングは、表面メッシュのトポロジーが球面とホモトピーである場合にのみ可能である。本稿では,数秒以内に効率的にCSRと球面マッピングを同時に行う手法を提案する。提案手法は,2つのサブネットワークをシームレスに接続し,白色表面生成を行う。残留微分同相変形を反復的に学習し, メッシュトポロジーと均一性を保ちながら, 球面テンプレートメッシュを白色およびピアル面に徐々にワープする。テンプレート球面と皮質面の間の1対1の頂点対応により、凸性や曲率といった幾何学的特徴を球面に簡単に直接マッピングでき、可視化や下流処理が可能となる。乳児期脳MRIに対するアプローチの有効性を実証し,初生後1年間の急速な脳発達に伴う組織コントラストの変化により,CSRに重大な課題を提起した。 0～12ヶ月の幼児のデータセットに基づく性能評価の結果,本手法はメッシュの正則性を大幅に向上し,幾何学的誤差を低減し,高度な計算効率を維持しつつ,最先端のディープラーニングアプローチよりも優れていた。

Cortical surface reconstruction (CSR) from MRI is key to investigating brain structure and function. While recent deep learning approaches have significantly improved the speed of CSR, a substantial amount of runtime is still needed to map the cortex to a topologically correct spherical manifold to facilitate downstream geometric analyses. Moreover, this mapping is possible only if the topology of the surface mesh is homotopic to a sphere. Here, we present a method that performs CSR and spherical mapping simultaneously and efficiently, within seconds. Our approach seamlessly connects two sub-networks for white and pial surface generation. Residual diffeomorphic deformations are learned iteratively to gradually warp a spherical template mesh to the white and pial surfaces while preserving mesh topology and uniformity. The one-to-one vertex correspondence between the template sphere and the cortical surfaces allows easy and direct mapping of geometric features like convexity and curvature to the sphere for visualization and downstream processing. We demonstrate the efficacy of our approach on infant brain MRI, which poses significant challenges to CSR due to tissue contrast changes associated with rapid brain development during the first postnatal year. Performance evaluation based on a dataset of infants from 0 to 12 months demonstrates that our method substantially enhances mesh regularity and reduces geometric errors, outperforming state-of-the-art deep learning approaches, all while maintaining high computational efficiency.

翻訳日:2023-12-12 17:42:59 公開日:2023-12-10

# 重み付き導入による差分差分に対する融合2ウェイ固定効果

Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions ( http://arxiv.org/abs/2312.05985v1 )

ライセンス: Link先を確認

Gregory Faletto

(参考訳) 停滞した導入下での差分差分に対する正準二方向固定効果推定器のバイアスに対処するため、Wooldridge (2021) は拡張二方向固定効果推定器を提案し、多くのパラメータを追加した。しかし、これは効率を低下させる。これらのパラメータのいくつかを等しく制限することは役に立つが、アドホックな制限はバイアスを再導入する可能性がある。本研究では,これらの制約の自動データ駆動選択を可能にする,単一チューニングパラメータによる拡張双方向固定効果(FETWFE)を用いた機械学習推定器を提案する。 FETWFEは適切なスパース性仮定の下で、確率が1の傾向の正しい制限を識別する。また,FETWFEとFETWFEの整合性,漸近的正規性,およびオラクル効率を条件付きおよび辺縁的平行性傾向の両条件で評価し,条件付き平均処理効果の2種類の条件付き平均化効果の整合性を示す。シミュレーション研究におけるFETWFEの実証と実証応用について述べる。

To address the bias of the canonical two-way fixed effects estimator for difference-in-differences under staggered adoptions, Wooldridge (2021) proposed the extended two-way fixed effects estimator, which adds many parameters. However, this reduces efficiency. Restricting some of these parameters to be equal helps, but ad hoc restrictions may reintroduce bias. We propose a machine learning estimator with a single tuning parameter, fused extended two-way fixed effects (FETWFE), that enables automatic data-driven selection of these restrictions. We prove that under an appropriate sparsity assumption FETWFE identifies the correct restrictions with probability tending to one. We also prove the consistency, asymptotic normality, and oracle efficiency of FETWFE for two classes of heterogeneous marginal treatment effect estimators under either conditional or marginal parallel trends, and we prove consistency for two classes of conditional average treatment effects under conditional parallel trends. We demonstrate FETWFE in simulation studies and an empirical application.

翻訳日:2023-12-12 17:42:35 公開日:2023-12-10

# ハイブリッドニューラルフィールドのための高精度微分作用素

Accurate Differential Operators for Hybrid Neural Fields ( http://arxiv.org/abs/2312.05984v1 )

ライセンス: Link先を確認

Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan

(参考訳) ニューラルネットワークは、形状表現からニューラルレンダリング、偏微分方程式(PDE)の解法など、様々な分野で広く使われている。小さなMLPと明示的な表現を活用するInstant NGPのようなハイブリッドニューラルネットワーク表現の出現により、これらのモデルは迅速にトレーニングされ、大きなシーンに適合する。しかし、レンダリングやシミュレーションのような多くのアプリケーションでは、ハイブリッドニューラルネットワークは顕著で不合理なアーティファクトを引き起こす可能性がある。これは、これらの下流アプリケーションに必要な正確な空間微分が得られないためである。本研究では,これらの課題を回避する2つの方法を提案する。我々の最初のアプローチは、事前訓練されたハイブリッドニューラルネットワークからより正確な導出を得るために局所多項式フィッティングを使用するポストホック演算子である。さらに,初期信号を維持しながら正確な導関数を直接生成するために,神経場を洗練する自己教師付き微調整手法を提案する。提案手法のレンダリング, 衝突シミュレーション, PDE の解法への応用について述べる。提案手法を用いることで, より正確な導関数が得られ, アーティファクトが低減され, 下流のアプリケーションでより正確なシミュレーションがもたらされる。

Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybrid neural fields can cause noticeable and unreasonable artifacts. This is because they do not yield accurate spatial derivatives needed for these downstream applications. In this work, we propose two ways to circumvent these challenges. Our first approach is a post hoc operator that uses local polynomial fitting to obtain more accurate derivatives from pre-trained hybrid neural fields. We also propose a self-supervised fine-tuning approach that refines the neural field to yield accurate derivatives directly while preserving the initial signal. We show the application of our method on rendering, collision simulation, and solving PDEs. We observe that using our approach yields more accurate derivatives, reducing artifacts and leading to more accurate simulations in downstream applications.
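
The post hoc operator mentioned here relies on local polynomial fitting. A one-dimensional analogue can be sketched as follows; this is an illustrative assumption, not the paper's operator. The function name `local_poly_derivative`, the quadratic degree, and the `radius` and `num_samples` parameters are hypothetical, and a plain Python callable stands in for the neural field.

```python
import math


def local_poly_derivative(f, x0, radius=0.05, num_samples=11):
    """Estimate f'(x0) from a least-squares quadratic fit on a symmetric stencil.

    With offsets symmetric about zero, the odd and even terms of the
    normal equations decouple, so the fitted slope reduces to
    sum(h * f(x0 + h)) / sum(h * h).
    """
    if num_samples < 3 or num_samples % 2 == 0:
        raise ValueError("num_samples must be an odd integer >= 3")
    half = num_samples // 2
    offsets = [radius * k / half for k in range(-half, half + 1)]
    values = [f(x0 + h) for h in offsets]
    numerator = sum(h * v for h, v in zip(offsets, values))
    denominator = sum(h * h for h in offsets)
    return numerator / denominator
```

Because the slope comes from a fit over a neighbourhood rather than from a difference of two nearby samples, high-frequency wiggles in `f` tend to be averaged out. The price is a small smoothing bias that grows with `radius`, so the neighbourhood size is a trade-off in this sketch.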

翻訳日:2023-12-12 17:42:02 公開日:2023-12-10

# 電気自動車充電ステーションの最適位置の最大流量に基づく定式化

Maximum flow-based formulation for the optimal location of electric vehicle charging stations ( http://arxiv.org/abs/2312.05980v1 )

ライセンス: Link先を確認

Pierre-Luc Parent and Margarida Carvalho and Miguel F. Anjos and Ribal Atallah

(参考訳) 気候変動の影響が増すにつれ、化石燃料から遠ざかる緊急性はこれまで以上に大きくなっている。電気自動車(ev)は、これらの効果を減少させる一つの方法であるが、その普及は充電ステーションが不十分なため、しばしば制限される。この作業では、ユーザの満足度(と充電ステーションの可用性)の観点から、より優れたサービスの質を提供するために、ev充電ステーションのインフラストラクチャを拡張することを目標としています。特に我々の焦点は都市部に向けられている。まず, ステーションへのEV充電需要の配分モデルを提案し, 最大流量問題として検討した。このモデルは、所定の課金インフラストラクチャによるユーザ満足度の評価の基礎となる。第2に,混合整数線形プログラムに最大流量モデルを導入することで,新しい駅の開設や追加出口による容量拡大に関する決定を行う。実世界のシナリオを扱うための我々のアプローチのスケーラビリティを実証し、モントリオール市の方法論を紹介します。実例の解法では,充電需要の時間的変化と時間的変化の両方を考慮すると有意義である。

With the increasing effects of climate change, the urgency to step away from fossil fuels is greater than ever before. Electric vehicles (EVs) are one way to diminish these effects, but their widespread adoption is often limited by the insufficient availability of charging stations. In this work, our goal is to expand the infrastructure of EV charging stations, in order to provide a better quality of service in terms of user satisfaction (and availability of charging stations). Specifically, our focus is directed towards urban areas. We first propose a model for the assignment of EV charging demand to stations, framing it as a maximum flow problem. This model is the basis for the evaluation of user satisfaction with a given charging infrastructure. Secondly, we incorporate the maximum flow model into a mixed-integer linear program, where decisions on the opening of new stations and on the expansion of their capacity through additional outlets are accounted for. We showcase our methodology for the city of Montreal, demonstrating the scalability of our approach to handle real-world scenarios. We conclude that considering both spatial and temporal variations in charging demand is meaningful when solving realistic instances.
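
Framing demand assignment as a maximum flow problem can be sketched on a toy network. The graph layout below (source to demand zones, zones to reachable stations, stations to sink) and all node names and capacities are assumptions for illustration, not the paper's model; the solver is a standard Edmonds-Karp implementation.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Build a residual graph that also holds zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical instance: two demand zones, two stations.
network = {
    "source": {"zone_A": 5, "zone_B": 4},      # charging demand per zone
    "zone_A": {"station_1": 5},                # zone A can only reach station 1
    "zone_B": {"station_1": 4, "station_2": 4},
    "station_1": {"sink": 3},                  # station capacity (outlets)
    "station_2": {"sink": 4},
}
served = max_flow(network, "source", "sink")
```

In this toy instance the total demand is 9 but only 7 units can be served, because station 1 is the only option for zone A. Raising the `station_1` capacity to 5 (an "additional outlets" decision) lets all 9 units be served, which is the kind of effect an outer mixed-integer program could optimize over.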

翻訳日:2023-12-12 17:41:33 公開日:2023-12-10

# novacomet: シンボリック知識蒸留を伴うオープンコモンセンス基礎モデル

NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation ( http://arxiv.org/abs/2312.05979v1 )

ライセンス: Link先を確認

Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi

(参考訳) novacometはオープン・コモンセンス・ナレッジ・モデルで、知識と一般的なタスク・モデルの最良の側面を組み合わせたものです。従来の知識モデルと比較すると、NovaCOMETは推論タスクへの直接適用を可能にするオープンフォーマットのリレーションシップを可能にしており、Flan-T5のような一般的なタスクモデルと比較して、知識を明示的に中心とし、常識推論の優れたパフォーマンスを実現する。 NovaCOMETは、不透明なプロプライエタリモデルの知識を活用して、オープンな知識パイプラインを作成する。第一に、知識は象徴的にNovATOMICに蒸留され、これは、監査、批判、フィルタリングが可能な公開リリースの個別知識グラフである。次に、NovaCOMETをNovATOMIC上で訓練し、オープンソースの事前学習モデルを微調整する。 NovaCOMETはオープンフォーマットのトレーニング目標を使用して、過去の知識モデルの固定された関係セットを置き換えることで、データ内の任意の構造が入力や出力として機能できるようにする。生成された生成モデルは、オプションで人間のアノテーションで拡張され、さまざまなコモンセンス生成タスクでFlan-T5のようなオープンタスクモデルと一致するか、超える。 NovaCOMETは、命令チューニングのみに焦点を合わせ、コモンセンス知識を明示的にモデル化する上で、明確な利点を示す。

We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs. The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.

翻訳日:2023-12-12 17:41:01 公開日:2023-12-10

# 高速ブラッグピーク解析のためのニューラルアーキテクチャ符号符号

Neural Architecture Codesign for Fast Bragg Peak Analysis ( http://arxiv.org/abs/2312.05978v1 )

ライセンス: Link先を確認

Luke McDermott, Jason Weitz, Dmitri Demler, Daniel Cummings, Nhan Tran, Javier Duarte

(参考訳) 高エネルギー回折顕微鏡で高速かつリアルタイムブラッグピーク解析を行うために,ニューラルネットワークのコード署名を合理化する自動パイプラインを開発した。従来のアプローチ、特に擬似Voigtフィッティングは重要な計算資源を必要とし、より効率的なソリューションのためのディープラーニングモデルへの関心を喚起した。我々の手法では、ハードウェアコストを含むこれらのモデルを強化するためにニューラルアーキテクチャ検索とAutoMLを使用し、よりハードウェア効率の良いニューラルアーキテクチャの発見に繋がる。その結果,従来の最先端技術と比較して,ビット演算の13$\times$削減を実現した。量子化・アウェアトレーニングやニューラルネットワークのプルーニングといったモデル圧縮技術により、さらなるスピードアップを示す。さらに、階層的な検索空間は最適化の柔軟性を高め、他のタスクやドメインにも簡単に拡張できます。

We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in high-energy diffraction microscopy. Traditional approaches, notably pseudo-Voigt fitting, demand significant computational resources, prompting interest in deep learning models for more efficient solutions. Our method employs neural architecture search and AutoML to enhance these models while accounting for hardware costs, leading to the discovery of more hardware-efficient neural architectures. Our results match the performance of the previous state-of-the-art while achieving a 13$\times$ reduction in bit operations. We show further speedup through model compression techniques such as quantization-aware training and neural network pruning. Additionally, our hierarchical search space provides greater flexibility in optimization, which can easily extend to other tasks and domains.
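
For readers unfamiliar with the traditional baseline, the pseudo-Voigt profile is a weighted sum of a Lorentzian and a Gaussian sharing one full width at half maximum. The sketch below uses a common height-normalized parameterization; the function name and argument names are assumptions, and it shows only the profile being fitted, not the fitting procedure or the paper's networks.

```python
import math


def pseudo_voigt(x, center, fwhm, eta, amplitude=1.0):
    """Height-normalized pseudo-Voigt profile.

    `eta` in [0, 1] is the Lorentzian fraction; both components share the
    same full width at half maximum `fwhm`.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if fwhm <= 0.0:
        raise ValueError("fwhm must be positive")
    t = (x - center) / fwhm
    gaussian = math.exp(-4.0 * math.log(2.0) * t * t)
    lorentzian = 1.0 / (1.0 + 4.0 * t * t)
    return amplitude * (eta * lorentzian + (1.0 - eta) * gaussian)
```

Fitting this profile to each peak means running an iterative nonlinear least-squares solve over `center`, `fwhm`, `eta`, and `amplitude`, which is the per-peak cost that motivates replacing the fit with a learned regressor.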

翻訳日:2023-12-12 17:40:27 公開日:2023-12-10

# 国間における人工メディアの検出に関する代表的研究

A Representative Study on Human Detection of Artificially Generated Media Across Countries ( http://arxiv.org/abs/2312.05976v1 )

ライセンス: Link先を確認

Joel Frank, Franziska Herbert, Jonas Ricker, Lea Schönherr, Thorsten Eisenhofer, Asja Fischer, Markus Dürmuth, Thorsten Holz

(参考訳) AI生成メディアは、私たちが知っているデジタル社会にとって脅威となっている。これらの偽造物は、公開技術に基づいて、かつ大規模に自動的に作成することができる。この課題を認識した学者や実践者は,そのような人工メディアを検出するための自動検出戦略を多数提案している。しかし、これらの技術的進歩とは対照的に、生成されたメディアに対する人間の認識は、まだ十分に研究されていない。本稿では,この研究ギャップを埋めることを目的とする。我々は,3カ国(米国,ドイツ,中国)で音声,画像,テキストメディアの3,002人を対象に,生成メディアを検出する人々の能力に関する総合的な調査を行った。以上の結果から,最先端の偽造品は「リアル」メディアとほとんど区別できないことが示唆された。さらに、AIによって生成されたメディア受信は、あらゆるメディアタイプとすべての国で、より人間らしく投票される。生成メディアの検出能力に影響を及ぼす要因をさらに理解するため,ディープフェイクやフェイクニュース研究の分野における文献レビューに基づいて選択した個人変数を含む。回帰分析の結果, 総合的信頼, 認知的リフレクション, およびディープフェイクに対する自己報告的親和性は, 全メディアカテゴリーの参加者の判断に大きく影響した。

AI-generated media has become a threat to our digital society as we know it. These forgeries can be created automatically and on a large scale based on publicly available technology. Recognizing this challenge, academics and practitioners have proposed a multitude of automatic detection strategies to detect such artificial media. However, in contrast to these technical advances, the human perception of generated media has not been thoroughly studied yet. In this paper, we aim to close this research gap. We perform the first comprehensive survey into people's ability to detect generated media, spanning three countries (USA, Germany, and China) with 3,002 participants across audio, image, and text media. Our results indicate that state-of-the-art forgeries are almost indistinguishable from "real" media, with the majority of participants simply guessing when asked to rate them as human- or machine-generated. In addition, AI-generated media are rated as more human-like across all media types and all countries. To further understand which factors influence people's ability to detect generated media, we include personal variables, chosen based on a literature review in the domains of deepfake and fake news research. In a regression analysis, we found that generalized trust, cognitive reflection, and self-reported familiarity with deepfakes significantly influence participants' decisions across all media categories.

翻訳日:2023-12-12 17:39:36 公開日:2023-12-10

# FM-G-CAM:コンピュータビジョンにおける説明可能なAIの全体的アプローチ

FM-G-CAM: A Holistic Approach for Explainable AI in Computer Vision ( http://arxiv.org/abs/2312.05975v1 )

ライセンス: Link先を確認

Ravidu Suien Rammuni Silva, Jordan J. Bird

(参考訳) 説明可能性(Explainability)は、現実世界のインパクトとユーザビリティに不可欠な、現代のAIの側面である。本研究の目的は,コンピュータビジョンモデル,特に畳み込みニューラルネットワーク(CNN)に基づくモデルの予測を理解する必要性を強調することである。既存のCNN予測法は、主にグラディエント重み付きクラスアクティベーションマップ(Grad-CAM)に基づいており、単一のターゲットクラスのみに焦点を当てている。対象クラス選択の観点から予測過程を仮定し,予測者のcnnモデルの思考過程の大部分を無視することを示した。本稿では,複数のトップ予測クラスを考慮したfm-g-cam(fused multi-class gradient-weighted class activation map)と呼ばれる徹底的な手法を提案する。また,本手法の詳細な数学的およびアルゴリズム的記述も提供する。さらに,既存の手法の簡潔な比較とともに,FM-G-CAMとGrad-CAMを比較し,現実の実践的ユースケースによるメリットを強調した。最後に,FM-G-CAMを実装したオープンソースのPythonライブラリを提案する。

Explainability is an aspect of modern AI that is vital for impact and usability in the real world. The main objective of this paper is to emphasise the need to understand the predictions of Computer Vision models, specifically Convolutional Neural Network (CNN) based models. Existing methods of explaining CNN predictions are mostly based on Gradient-weighted Class Activation Maps (Grad-CAM) and solely focus on a single target class. We show that from the point of the target class selection, we make an assumption on the prediction process, hence neglecting a large portion of the predictor CNN model's thinking process. In this paper, we present an exhaustive methodology called Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM) that considers multiple top predicted classes, which provides a holistic explanation of the predictor CNN's thinking rationale. We also provide a detailed and comprehensive mathematical and algorithmic description of our method. Furthermore, along with a concise comparison of existing methods, we compare FM-G-CAM with Grad-CAM, highlighting its benefits through real-world practical use cases. Finally, we present an open-source Python library with FM-G-CAM implementation to conveniently generate saliency maps for CNN-based model predictions.

翻訳日:2023-12-12 17:38:39 公開日:2023-12-10

# 潜在ノードと構造騒音下におけるネットワーク力学系の因果構造学習

Learning the Causal Structure of Networked Dynamical Systems under Latent Nodes and Structured Noise ( http://arxiv.org/abs/2312.05974v1 )

ライセンス: Link先を確認

Augusto Santos, Diogo Rente, Rui Seabra and José M. F. Moura

(参考訳) 本稿では,線形ネットワーク型力学系(NDS)の隠れ因果ネットワークを,そのノードの一部の時系列データから学習する。 NDSのダイナミクスは、一対のノード間で急激な関連を生み出す色付きノイズによって駆動され、問題をはるかに難しくする。ノイズ相関と部分可観測性の課題に対処するため,観測ノードの時系列データから計算した特徴ベクトルを各ノードに割り当てる。特徴の集合を一貫して分割するアフィン超平面が存在し、接続されたノードのペアに対応する特徴ベクトルと非連結なペアに対応するものとを分離する。従って因果推論問題は、設計された特徴をクラスタリングすることで解決される。単純なベースライン教師付き手法を用いて,実世界ネットワークを含む広帯域接続環境と雑音相関レベル下での因果推論機構の競合性能を実証する。さらに,線形NDSにおける構造整合性の新たな技術的保証を考察した。

This paper considers learning the hidden causal network of a linear networked dynamical system (NDS) from the time series data at some of its nodes -- partial observability. The dynamics of the NDS are driven by colored noise that generates spurious associations across pairs of nodes, rendering the problem much harder. To address the challenge of noise correlation and partial observability, we assign to each pair of nodes a feature vector computed from the time series data of observed nodes. The feature embedding is engineered to yield structural consistency: there exists an affine hyperplane that consistently partitions the set of features, separating the feature vectors corresponding to connected pairs of nodes from those corresponding to disconnected pairs. The causal inference problem is thus addressed via clustering the designed features. We demonstrate with simple baseline supervised methods the competitive performance of the proposed causal inference mechanism under broad connectivity regimes and noise correlation levels, including a real world network. Further, we devise novel technical guarantees of structural consistency for linear NDS under the considered regime.

翻訳日:2023-12-12 17:38:15 公開日:2023-12-10

# 参照のない3次元点群品質評価のための周波数とViTの活性化

Activating Frequency and ViT for 3D Point Cloud Quality Assessment without Reference ( http://arxiv.org/abs/2312.05972v1 )

ライセンス: Link先を確認

Oussama Messai, Abdelouahid Bentamou, Abbass Zein-Eddine, Yann Gavet

(参考訳) 深層学習に基づく品質評価は、知覚的マルチメディア品質評価を著しく向上させたが、3dポイントクラウド(pcs)のような3dビジュアルデータはまだ初期段階にある。 3D-PCの容量が大きいため、このような量は送信や視聴のために頻繁に圧縮され、品質に影響を及ぼす可能性がある。そこで我々は,与えられた3D-PCの非参照品質指標を提案する。幾何や色に着目した既存手法と比較して, 圧縮による空間劣化パターンの指標として, 周波数等級を統合することを提案する。入力属性を品質スコアにマップするには、変形可能な畳み込みネットワーク(dcn)と視覚トランスフォーマー(vit)を組み合わせた軽量ハイブリッドディープモデルを用いる。 icip20 [1]、pointxr [2] dataset、basics [3]と呼ばれる新しいデータセットで実験が行われている。その結果,本手法は現在のNR-PCQA測度やPointXRのFR-PCQAよりも優れていた。実装コードはhttps://github.com/o-messai/3d-pcqa。

Deep learning-based quality assessments have significantly enhanced perceptual multimedia quality assessment; however, the field is still in its early stages for 3D visual data such as 3D point clouds (PCs). Due to their high volume, 3D-PCs are frequently compressed for transmission and viewing, which may affect perceived quality. Therefore, we propose a no-reference quality metric for a given 3D-PC. Compared to existing methods that mostly focus on geometry or color aspects, we propose integrating frequency magnitudes as an indicator of spatial degradation patterns caused by compression. To map the input attributes to a quality score, we use a lightweight hybrid deep model that combines a Deformable Convolutional Network (DCN) and Vision Transformers (ViT). Experiments are carried out on the ICIP20 [1] and PointXR [2] datasets, and on a new large dataset called BASICS [3]. The results show that our approach outperforms state-of-the-art NR-PCQA measures and even some FR-PCQA measures on PointXR. The implementation code can be found at: https://github.com/o-messai/3D-PCQA

翻訳日:2023-12-12 17:37:59 公開日:2023-12-10

# 境界場を持つスピン-$\frac{1}{2}$ XXZ鎖におけるスピン分数化とゼロモード

Spin fractionalization and zero modes in the spin-$\frac{1}{2}$ XXZ chain with boundary fields ( http://arxiv.org/abs/2312.05970v1 )

ライセンス: Link先を確認

Parameshwar R. Pasnoori, Yicheng Tang, Junhyun Lee, J. H. Pixley, Natan Andrei, Patrick Azaria

(参考訳) この研究において、境界磁場を持つガッピング相における反強磁性スピン $\frac{1}{2}$ xxz 鎖は、その端に分数スピン $\frac{1}{4}$ を持つと主張する。ベーテ・アンザッツと密度行列再正規化群の組み合わせを用いて、これらの分数スピンは基底と第1の励起状態の両方においてシャープな量子観測可能であり、関連する分数スピン作用素の分散はゼロであることを示す。零端場の極限において、これらの分数スピン作用素はかつて基底状態と第1励起状態によって広がる低エネルギー部分空間に射影され、P. Fendley \cite{Fendley} によって発見された強い零エネルギーモードと同一視される。

In this work we argue that the antiferromagnetic spin $\frac{1}{2}$ XXZ chain in the gapped phase with boundary magnetic fields hosts fractional spin $\frac{1}{4}$ at its edges. Using a combination of Bethe ansatz and the density matrix renormalization group we show that these fractional spins are sharp quantum observables in both the ground and the first excited state as the associated fractional spin operators have zero variance. In the limit of zero edge fields, we argue that these fractional spin operators, once projected onto the low energy subspace spanned by the ground state and the first excited state, identify with the strong zero energy mode discovered by P. Fendley.

翻訳日:2023-12-12 17:37:38 公開日:2023-12-10

# 跳躍する手術用コンピュータビジョン

Jumpstarting Surgical Computer Vision ( http://arxiv.org/abs/2312.05968v1 )

ライセンス: Link先を確認

Deepak Alapatt, Aditya Murali, Vinkle Srivastav, Pietro Mascagni, AI4SafeChole Consortium, Nicolas Padoy

(参考訳) 目的: 研究者と業界の間での一般的なコンセンサスでは、大規模な注釈付きデータセットの欠如が、外科データ科学の分野における進歩の最大の障害であることを示している。自己教師型学習は、この問題の一部に対する解決策であり、アノテーションへの依存を取り除く。しかし,現在の自己教師あり学習法の領域シフトへの頑健性はいまだ不明であり,多種多様な手術データを活用するための有用性の理解は限られている。方法: 本研究では, 様々な手術用データセットを柔軟に活用するために, 自己教師付き学習を用いて, 様々な手術下処理に使用できるタスク非依存表現を学習する。本研究は,下流作業性能に対するプレトレーニングの影響を明らかにするために,ソース病院,外科手術の種類,トレーニング前の規模(動画数)の3変数を調整し,22種類のプレトレーニングデータセットの組み合わせを探索する。次に, 腹腔鏡下胆嚢摘出術における位相認識と安全性の重要視, 腹腔鏡下子宮摘出術における位相認識の3つの課題について検討した。結果: コントロールされた実験は、さまざまなタスク、データセット、ラベリング予算におけるパフォーマンスの大幅な向上を強調する。しかしながら、このパフォーマンスは、いくつかの研究段階を通じて堅牢に証明された事前学習データセットの構成と複雑に結びついている。結論: 事前トレーニングデータセットの構成は,さまざまなダウンストリームタスクに対するSSLメソッドの有効性に大きく影響し,SSL方法論の適用拡大に向けた今後のデータ収集の取り組みを批判的に伝える必要がある。キーワード:自己監督学習、移行学習、手術用コンピュータビジョン、内視鏡映像、安全性の批判的視点、位相認識

Purpose: General consensus amongst researchers and industry points to a lack of large, representative annotated datasets as the biggest obstacle to progress in the field of surgical data science. Self-supervised learning represents a solution to part of this problem, removing the reliance on annotations. However, the robustness of current self-supervised learning methods to domain shifts remains unclear, limiting our understanding of its utility for leveraging diverse sources of surgical data. Methods: In this work, we employ self-supervised learning to flexibly leverage diverse surgical datasets, thereby learning task-agnostic representations that can be used for various surgical downstream tasks. Based on this approach, to elucidate the impact of pre-training on downstream task performance, we explore 22 different pre-training dataset combinations by modulating three variables: source hospital, type of surgical procedure, and pre-training scale (number of videos). We then finetune the resulting model initializations on three diverse downstream tasks: namely, phase recognition and critical view of safety in laparoscopic cholecystectomy and phase recognition in laparoscopic hysterectomy. Results: Controlled experimentation highlights sizable boosts in performance across various tasks, datasets, and labeling budgets. However, this performance is intricately linked to the composition of the pre-training dataset, robustly proven through several study stages. Conclusion: The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks and should critically inform future data collection efforts to scale the application of SSL methodologies. Keywords: Self-Supervised Learning, Transfer Learning, Surgical Computer Vision, Endoscopic Videos, Critical View of Safety, Phase Recognition

翻訳日:2023-12-12 17:37:23 公開日:2023-12-10

# 教育用AIのマルチモーダリティ : 汎用人工知能を目指して

Multimodality of AI for Education: Towards Artificial General Intelligence ( http://arxiv.org/abs/2312.06037v1 )

ライセンス: Link先を確認

Gyeong-Geon Lee, Lehong Shi, Ehsan Latif, Yizhu Gao, Arne Bewersdorf, Matthew Nyaaba, Shuchen Guo, Zihao Wu, Zhengliang Liu, Hui Wang, Gengchen Mai, Tiaming Liu, and Xiaoming Zhai

This paper presents a comprehensive examination of how multimodal artificial intelligence (AI) approaches are paving the way towards the realization of Artificial General Intelligence (AGI) in educational contexts. It scrutinizes the evolution and integration of AI in educational systems, emphasizing the crucial role of multimodality, which encompasses auditory, visual, kinesthetic, and linguistic modes of learning. This research delves deeply into the key facets of AGI, including cognitive frameworks, advanced knowledge representation, adaptive learning mechanisms, strategic planning, sophisticated language processing, and the integration of diverse multimodal data sources. It critically assesses AGI's transformative potential in reshaping educational paradigms, focusing on enhancing teaching and learning effectiveness, filling gaps in existing methodologies, and addressing ethical considerations and responsible usage of AGI in educational settings. The paper also discusses the implications of multimodal AI's role in education, offering insights into future directions and challenges in AGI development. This exploration aims to provide a nuanced understanding of the intersection between AI, multimodality, and education, setting a foundation for future research and development in AGI.

Translated: 2023-12-12 17:30:25 Published: 2023-12-10

# AI Competitions and Benchmarks: How to ensure a long-lasting impact of a challenge with post-challenge paper, benchmarks and other dissemination action

http://arxiv.org/abs/2312.06036v1

License: see the linked page

Antoine Marot, David Rousseau, Zhen Xu

Organising an AI challenge does not end with the final event. The long-lasting impact also needs to be organised. This chapter covers the various activities after the challenge is formally finished. The target audience of different post-challenge activities is identified. The various outputs of the challenge are listed with the means to collect them. The main part of the chapter is a template for a typical post-challenge paper, including possible graphs as well as advice on how to turn the challenge into a long-lasting benchmark.

Translated: 2023-12-12 17:30:05 Published: 2023-12-10

# Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows

http://arxiv.org/abs/2312.06034v1

License: see the linked page

Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba

Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to their non-deterministic nature and the different perceptions of the content by different humans. It may be solved by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the reader to make more accurate predictions. However, current approaches require complete information about the recipients to be embedded directly. Besides, the recent methods focus on deterministic inference or simple frequency-based estimations of the probabilities. In this work, we overcome this limitation by proposing a novel approach to capture the uncertainty of the forecast using conditional Normalizing Flows. This allows us to model complex multimodal distributions and to compare various models using negative log-likelihood (NLL). In addition, the new solution allows for various interpretations of possible reader perception thanks to the available sampling function. We validated our method on three challenging, subjective NLP tasks, including emotion recognition and hate speech. The comparative analysis of generalized and personalized approaches revealed that our personalized solutions significantly outperform the baseline and provide more precise uncertainty estimates. The impact on text interpretability and uncertainty studies is presented as well. The information brought by the developed methods makes it possible to build hybrid models whose effectiveness surpasses classic solutions. In addition, an analysis and visualization of the probabilities of the given decisions for texts with high entropy of annotations and annotators with mixed views were carried out.
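As a reminder of how a conditional normalizing flow yields the likelihood used for the NLL comparison, the standard change-of-variables identity is sketched below. The notation ($f_\theta$ for the flow, $x$ for the text and reader context, $y$ for the annotation, $p_Z$ for the base density) is assumed here for illustration and does not come from the abstract.

```latex
\log p_\theta(y \mid x) = \log p_Z\bigl(f_\theta(y; x)\bigr)
  + \log\left|\det \frac{\partial f_\theta(y; x)}{\partial y}\right|,
\qquad
\mathrm{NLL} = -\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)
```

A lower NLL on held-out annotations indicates that the model assigns more probability mass to the labels readers actually gave.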

Translated: 2023-12-12 17:29:56 Published: 2023-12-10

# Evaluating the Utility of Model Explanations for Model Development

http://arxiv.org/abs/2312.06032v1

License: see the linked page

Shawn Im, Jacob Andreas, Yilun Zhou

One of the motivations for explainable AI is to allow humans to make better and more informed decisions regarding the use and deployment of AI models. But careful evaluations are needed to assess whether this expectation has been fulfilled. Current evaluations mainly focus on algorithmic properties of explanations, and those that involve human subjects often employ subjective questions to test humans' perception of explanation usefulness, without being grounded in objective metrics and measurements. In this work, we evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development. We conduct a mixed-methods user study involving image data to evaluate saliency maps generated by SmoothGrad, GradCAM, and an oracle explanation on two tasks: model selection and counterfactual simulation. To our surprise, we did not find evidence of significant improvement on these tasks when users were provided with any of the saliency maps, even the synthetic oracle explanation designed to be simple to understand and highly indicative of the answer. Nonetheless, explanations did help users more accurately describe the models. These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.

Translated: 2023-12-12 17:29:34 Published: 2023-12-10

# Fast Classification of Large Time Series Datasets

http://arxiv.org/abs/2312.06029v1

License: see the linked page

Muhammad Marwan Muhammad Fuad

Time series classification (TSC) is the most important task in time series mining as it has several applications in medicine, meteorology, finance, cyber security, and many others. With the ever-increasing size of time series datasets, several traditional TSC methods are no longer efficient enough to perform this task on such very large datasets. Yet most recent papers on TSC focus mainly on accuracy, using methods such as deep learning that require extensive computational resources and cannot be applied efficiently to very large datasets. The method we introduce in this paper focuses on these very large time series datasets with the main objective being efficiency. We achieve this through a simplified representation of the time series. This in turn is enhanced by a distance measure that considers only some of the values of the represented time series. The result of this combination is a very efficient representation method for TSC. This has been tested experimentally against another time series method that is particularly popular for its efficiency. The experiments show that our method is not only 4 times faster, on average, but it is also superior in terms of classification accuracy, as it gives better results on 24 out of the 29 tested time series datasets.
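The general idea of pairing a reduced representation with a distance that looks at only some of the represented values can be sketched as follows. The abstract does not name the representation or the distance, so this sketch substitutes piecewise aggregate approximation (PAA) and a strided Euclidean distance as stand-ins; the function names and the `step` parameter are hypothetical and are not the authors' method.

```python
from math import sqrt


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks.

    Assumes len(series) >= n_segments so that no chunk is empty.
    """
    n = len(series)
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def partial_distance(a, b, step=2):
    """Euclidean distance computed over every `step`-th coefficient only."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a[::step], b[::step])))


def classify_1nn(query, train, n_segments=4, step=2):
    """Label of the training series nearest to the query under partial_distance.

    `train` is a list of (series, label) pairs.
    """
    q = paa(query, n_segments)
    best_label, best_dist = None, float("inf")
    for series, label in train:
        d = partial_distance(q, paa(series, n_segments), step)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

In a real setting the reduced training series would be precomputed once, so that each query costs only one reduction plus the cheap partial distances.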

Translated: 2023-12-12 17:29:12 Published: 2023-12-10

# Stress Management Using Virtual Reality-Based Attention Training

http://arxiv.org/abs/2312.06025v1

License: see the linked page

Rojaina Mahmoud, Mona Mamdouh, Omneya Attallah, Ahmad Al-Kabbany

In this research, we are concerned with the applicability of virtual reality-based attention training as a tool for stress management. Mental stress is a worldwide challenge that is still far from being fully managed. This has sustained considerable research attention on developing and validating tools for detecting and managing stress. Technology-based tools have been at the heart of these endeavors, including virtual reality (VR) technology. Nevertheless, the potential of VR lies, in large part, in the nature of the content being consumed through such technology. In this study, we investigate the impact of a special type of content, namely, attention training, on the feasibility of using VR for stress management. On a group of fourteen undergraduate engineering students, we conducted a study in which the participants were exposed twice to a stress inducer while their EEG signals were being recorded. The first iteration involved VR-based attention training before starting the stress task while the second did not. Using multiple features and various machine learning models, we show that VR-based attention training has consistently resulted in reducing the number of recognized stress instances in the recorded EEG signals. This research gives preliminary insights on adopting VR-based attention training for managing stress, and future studies are required to replicate the results in larger samples.

Translated: 2023-12-12 17:28:55 Published: 2023-12-10

# Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization

http://arxiv.org/abs/2312.06022v1

License: see the linked page

Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty

Abstractive text summarization is surging with the number of training samples to cater to the needs of the deep learning models. These models tend to exploit the training data representations to attain superior performance by improving the quantitative element of the resultant summary. However, increasing the size of the training set may not always be the ideal solution to maximize the performance, and therefore there is a need to revisit the quality of training samples and the learning protocol of deep learning models. In this paper, we aim to discretize the vector space of the abstractive text summarization models to understand the characteristics learned between the input embedding space and the models' encoder space. We show that deep models fail to capture the diversity of the input space. Further, the distribution of data points on the encoder space indicates that an unchecked increase in the training samples does not add value; rather, a tear-down of data samples is highly needed to make the models focus on variability and faithfulness. We employ clustering techniques to learn the diversity of a model's sample space and how data points are mapped from the embedding space to the encoder space and vice versa. Further, we devise a metric to filter out redundant data points to make the model more robust and less data hungry. We benchmark our proposed method using quantitative metrics, such as ROUGE, and qualitative metrics, such as BERTScore, FEQA and Pyramid score. We also quantify the reasons that inhibit the models from learning the diversity from the varied input samples.

Translated: 2023-12-12 17:28:36 Published: 2023-12-10

# GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding

http://arxiv.org/abs/2312.06021v1

License: see the linked page

Karlo Koledić, Luka Petrović, Ivan Petrović, Ivan Marković

Learning-based monocular depth estimation leverages geometric priors present in the training data to enable metric depth perception from a single image, a traditionally ill-posed problem. However, these priors are often specific to a particular domain, leading to limited generalization performance on unseen data. Apart from the well-studied environmental domain gap, monocular depth estimation is also sensitive to the domain gap induced by varying camera parameters, an aspect that is often overlooked in current state-of-the-art approaches. This issue is particularly evident in autonomous driving scenarios, where datasets are typically collected with a single vehicle-camera setup, leading to a bias in the training data due to a fixed perspective geometry. In this paper, we challenge this trend and introduce GenDepth, a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups. To address the lack of data with sufficiently diverse camera parameters, we first create a bespoke synthetic dataset collected with different vehicle-camera systems. Then, we design GenDepth to simultaneously optimize two objectives: (i) equivariance to the camera parameter variations on synthetic data, (ii) transferring the learned equivariance to real-world environmental features using a single real-world dataset with a fixed vehicle-camera system. To achieve this, we propose a novel embedding of camera parameters as the ground plane depth and present a novel architecture that integrates these embeddings with adversarial domain alignment. We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.
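To give a feel for what a ground plane depth looks like as a function of camera parameters, the sketch below computes it for an ideal pinhole camera over a perfectly flat road. This is textbook geometry under stated assumptions (no roll, no lens distortion, camera axes x right, y down, z forward); the function name and parameters are hypothetical, and the abstract does not confirm that the authors compute their embedding this way.

```python
from math import cos, sin, radians


def ground_plane_depth(v, fy, cy, cam_height, pitch_deg=0.0):
    """Depth along the optical axis of the flat-ground point seen at image row v.

    v          : pixel row (increasing downwards)
    fy, cy     : vertical focal length and principal point, in pixels
    cam_height : camera height above the ground, in metres
    pitch_deg  : downward tilt of the optical axis, in degrees

    Returns None for rows at or above the horizon, which never hit the ground.
    """
    theta = radians(pitch_deg)
    y_n = (v - cy) / fy  # normalized image coordinate, y pointing down
    # The ground is the plane n . X = cam_height with n = (0, cos(theta), sin(theta)).
    denom = y_n * cos(theta) + sin(theta)
    if denom <= 0:
        return None
    return cam_height / denom
```

Evaluating this for every row gives a per-pixel map that changes with focal length, camera height, and pitch, which is why such a map can serve as a compact description of the camera setup.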

Translated: 2023-12-12 17:28:12 Published: 2023-12-10

# On the relativistic quantum mechanics of a photon between two electrons in 1+1 dimensions

http://arxiv.org/abs/2312.06019v1

License: see the linked page

Lawrence Frolov, Samuel E. Leigh, and A. Shadi Tahvildar-Zadeh

A Lorentz-covariant system of wave equations is formulated for a quantum-mechanical three-body system in one space dimension, comprised of one photon and two identical massive spin one-half Dirac particles, which can be thought of as two electrons (or alternatively, two positrons). Manifest covariance is achieved using Dirac's formalism of multi-time wave functions, i.e., wave functions $\Psi(\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2})$ where $\textbf{x}_{\text{ph}},\textbf{x}_{\text{e}_1},\textbf{x}_{\text{e}_2}$ are generic spacetime events of the photon and two electrons respectively. Their interaction is implemented via a Lorentz-invariant no-crossing-of-paths boundary condition at the coincidence submanifolds $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_1}\}$ and $\{\textbf{x}_{\text{ph}}=\textbf{x}_{\text{e}_2}\}$ compatible with conservation of probability current. The corresponding initial-boundary value problem is shown to be well-posed under the additional assumption of anti-symmetry given by the Pauli exclusion principle, and a closed-form solution to the ensuing coupled system of Klein-Gordon and transport equations is given.

Translated: 2023-12-12 17:27:48 Published: 2023-12-10

# Speeding up astrochemical reaction networks with autoencoders and neural ODEs

http://arxiv.org/abs/2312.06015v1

License: see the linked page

Immanuel Sulzer, Tobias Buck

In astrophysics, solving complex chemical reaction networks is essential but computationally demanding due to the high dimensionality and stiffness of the ODE systems. Traditional approaches for reducing computational load are often specialized to specific chemical networks and require expert knowledge. This paper introduces a machine learning-based solution employing autoencoders for dimensionality reduction and a latent space neural ODE solver to accelerate astrochemical reaction network computations. Additionally, we propose a cost-effective latent space linear function solver as an alternative to neural ODEs. These methods are assessed on a dataset comprising 29 chemical species and 224 reactions. Our findings demonstrate that the neural ODE achieves a 55x speedup over the baseline model while achieving significantly higher accuracy, with up to two orders of magnitude reduction in relative error. Furthermore, the linear latent model enhances accuracy and achieves a speedup of up to 4000x compared to standard methods.

Translated: 2023-12-12 17:26:59 Published: 2023-12-10

# Guardians of Trust: Navigating Data Security in AIOps through Vendor Partnerships

http://arxiv.org/abs/2312.06008v1

License: see the linked page

Subhadip Kumar

Artificial Intelligence for IT Operations (AIOps) is a rapidly growing field that applies artificial intelligence and machine learning to automate and optimize IT operations. AIOps vendors provide services that ingest end-to-end logs, traces, and metrics to offer full-stack observability of IT systems. However, these data sources may contain sensitive information such as internal IP addresses, hostnames, HTTP headers, SQL statements, method/argument return values, URLs, personally identifiable information (PII), or confidential business data. Therefore, data security is a crucial concern when working with AIOps vendors. In this article, we will discuss the security features offered by different vendors and how we can adopt best practices to ensure data protection and privacy.

Translated: 2023-12-12 17:26:46 Published: 2023-12-10

# Software issues report for bug fixing process: An empirical study of machine-learning libraries

http://arxiv.org/abs/2312.06005v1

License: see the linked page

Adekunle Ajibode, Dong Yunwei, Yang Hongji

Issue resolution and bug-fixing processes are essential in the development of machine-learning libraries, similar to software development, to ensure well-optimized functions. Understanding the issue resolution and bug-fixing process of machine-learning libraries can help developers identify areas for improvement and optimize their strategies for issue resolution and bug-fixing. However, detailed studies on this topic are lacking. Therefore, we investigated the effectiveness of issue resolution for bug-fixing processes in six machine-learning libraries: TensorFlow, Keras, Theano, PyTorch, Caffe, and scikit-learn. We addressed seven research questions (RQs) using 16,921 issues extracted from the GitHub repository via the GitHub REST API. We employed several quantitative methods of data analysis, including correlation, OLS regression, percentage and frequency count, and heatmap to analyze the RQs. We found the following through our empirical investigation: (1) The most common categories of issues that arise in machine-learning libraries are bugs, documentation, optimization, crashes, enhancement, new feature requests, build/CI, support, and performance. (2) Effective strategies for addressing these problems include fixing critical bugs, optimizing performance, and improving documentation. (3) These categorized issues are related to testing and runtime and are common among all six machine-learning libraries. (4) Monitoring the total number of comments on issues can provide insights into the duration of the issues. (5) It is crucial to strike a balance between prioritizing critical issues and addressing other issues in a timely manner. Therefore, this study concludes that efficient issue-tracking processes, effective communication, and collaboration are vital for effective resolution of issues and bug fixing processes in machine-learning libraries.

Translated: 2023-12-12 17:26:33 Published: 2023-12-10

# Large Language Models on Lexical Semantic Change Detection: An Evaluation

http://arxiv.org/abs/2312.06002v1

License: see the linked page

Ruiyu Wang, Matthew Choi

Lexical Semantic Change Detection stands out as one of the few areas where Large Language Models (LLMs) have not been extensively involved. Traditional methods like PPMI and SGNS remain prevalent in research, alongside newer BERT-based approaches. Despite the comprehensive coverage of various natural language processing domains by LLMs, there is a notable scarcity of literature concerning their application in this specific realm. In this work, we seek to bridge this gap by introducing LLMs into the domain of Lexical Semantic Change Detection. Our work presents novel prompting solutions and a comprehensive evaluation that spans all three generations of language models, contributing to the exploration of LLMs in this research area.
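For readers unfamiliar with the PPMI baseline mentioned above, a minimal sketch follows. It computes positive pointwise mutual information from a list of (word, context) co-occurrence pairs; the function name and the toy input format are assumptions made for illustration and do not reproduce the paper's experimental setup.

```python
from collections import Counter
from math import log2


def ppmi(pairs):
    """Positive PMI for each observed (word, context) pair.

    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ); negative values are clipped to 0.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        ctx_counts[c] += n
    scores = {}
    for (w, c), n in pair_counts.items():
        pmi = log2(n * total / (word_counts[w] * ctx_counts[c]))
        scores[(w, c)] = max(0.0, pmi)
    return scores
```

A count-based change detector would typically build such scores separately for two time periods and compare a word's context profile across them, for example with cosine distance.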

Translated: 2023-12-12 17:26:02 Published: 2023-12-10

# From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation

http://arxiv.org/abs/2312.05995v1

License: see the linked page

Javier Tirado-Garín and Javier Civera

Estimating the relative camera pose from $n \geq 5$ correspondences between two calibrated views is a fundamental task in computer vision. This process typically involves two stages: 1) estimating the essential matrix between the views, and 2) disambiguating among the four candidate relative poses that satisfy the epipolar geometry. In this paper, we demonstrate a novel approach that, for the first time, bypasses the second stage. Specifically, we show that it is possible to directly estimate the correct relative camera pose from correspondences without needing a post-processing step to enforce the cheirality constraint on the correspondences. Building on recent advances in certifiable non-minimal optimization, we frame the relative pose estimation as a Quadratically Constrained Quadratic Program (QCQP). By applying the appropriate constraints, we ensure the estimation of a camera pose that corresponds to a valid 3D geometry and that is globally optimal when certified. We validate our method through exhaustive synthetic and real-world experiments, confirming the efficacy, efficiency and accuracy of the proposed approach. Code is available at https://github.com/javrtg/C2P.
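The conventional second stage that the paper reports bypassing can be sketched as a cheirality test: among the four candidate poses, keep the one for which a triangulated point has positive depth in both views. The code below is an illustrative pure-Python sketch of that classical check, not the authors' C2P method; the helper names and the convention X2 = R X1 + t are assumptions.

```python
def matvec(R, x):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def depths(R, t, f1, f2):
    """Least-squares depths (l1, l2) such that l2 * f2 ~= l1 * R f1 + t.

    f1 and f2 are the bearing vectors of one correspondence in view 1 and view 2.
    Returns None when the two rays are (numerically) parallel.
    """
    a = matvec(R, f1)  # ray of view 1 expressed in the frame of view 2
    aa, af, ff = dot(a, a), dot(a, f2), dot(f2, f2)
    at, ft = dot(a, t), dot(f2, t)
    det = aa * ff - af * af
    if abs(det) < 1e-12:
        return None
    # Solve the 2x2 normal equations of min || l1 * a - l2 * f2 + t ||^2.
    l1 = (af * ft - ff * at) / det
    l2 = (aa * ft - af * at) / det
    return l1, l2


def cheirality_filter(candidates, f1, f2):
    """Keep the (R, t) candidates that place the point in front of both cameras."""
    kept = []
    for R, t in candidates:
        d = depths(R, t, f1, f2)
        if d is not None and d[0] > 0 and d[1] > 0:
            kept.append((R, t))
    return kept
```

With noisy data a single correspondence can be misleading, so implementations of this classical step usually count how many correspondences pass the test for each candidate and keep the majority winner.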

Translated: 2023-12-12 17:25:49 Published: 2023-12-10

# FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures

http://arxiv.org/abs/2312.05993v1

License: see the linked page

Yohann De Castro, Sébastien Gadat, Clément Marteau

This paper presents a novel algorithm that leverages Stochastic Gradient Descent strategies in conjunction with Random Features to augment the scalability of Conic Particle Gradient Descent (CPGD) specifically tailored for solving sparse optimisation problems on measures. By formulating the CPGD steps within a variational framework, we provide rigorous mathematical proofs demonstrating the following key findings: (i) The total variation norms of the solution measures along the descent trajectory remain bounded, ensuring stability and preventing undesirable divergence; (ii) We establish a global convergence guarantee with a convergence rate of $\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ iterations, showcasing the efficiency and effectiveness of our algorithm; (iii) Additionally, we analyze and establish local control over the first-order condition discrepancy, contributing to a deeper understanding of the algorithm's behavior and reliability in practical applications.

Translated: 2023-12-12 17:25:28 Published: 2023-12-10

# Correcting Diffusion Generation through Resampling

http://arxiv.org/abs/2312.06038v1

License: see the linked page

Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang

Despite diffusion models' superior capabilities in modeling complex distributions, there are still non-trivial distributional discrepancies between generated and ground-truth images, which has resulted in several notable problems in image generation, including missing object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do not address the fundamental cause behind them, which is the distributional discrepancies, and hence achieve sub-optimal results. In this paper, we propose a particle filtering framework that can effectively address both problems by explicitly reducing the distributional discrepancies. Specifically, our method relies on a set of external guidance, including a small set of real images and a pre-trained object detector, to gauge the distribution gap, and then design the resampling weight accordingly to correct the gap. Experiments show that our method can effectively correct missing object errors and improve image quality in various image generation tasks. Notably, our method outperforms the existing strongest baseline by 5% in object occurrence and 1.0 in FID on MS-COCO. Our code is publicly available at https://github.com/UCSB-NLP-Chang/diffusion_resampling.git.
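The resampling step at the heart of any particle filter can be sketched in a few lines. The example below shows plain multinomial resampling only; in the paper the weights are said to come from real images and an object detector, whereas here they are simply passed in as numbers, and the function name is hypothetical.

```python
import random


def resample(particles, weights, rng=random):
    """Multinomial resampling: draw len(particles) items with replacement,
    each with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    probs = [w / total for w in weights]
    return rng.choices(particles, weights=probs, k=len(particles))
```

Particles with high weight tend to be duplicated and those with low weight tend to be dropped, which is how a filter of this kind steers a population of partially denoised samples towards the target distribution.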

Translated: 2023-12-12 17:14:24 Published: 2023-12-10

# Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code

http://arxiv.org/abs/2312.04004v2

License: see the linked page

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Mohammad Amin Alipour, Bowen Xu

Large language models (LLMs) are becoming an integral part of software development. These models are trained on large datasets for code, where it is hard to verify each data point. Therefore, a potential attack surface can be to inject poisonous data into the training data to make models vulnerable, i.e., trojaned. It can pose a significant threat by hiding manipulative behaviors inside models, leading to compromising the integrity of the models in downstream tasks. In this paper, we propose an occlusion-based human-in-the-loop technique, OSeql, to distinguish trojan-triggering inputs of code. The technique is based on the observation that trojaned neural models of code rely heavily on the triggering part of input; hence, its removal would change the confidence of the models in their prediction substantially. Our results suggest that OSeql can detect the triggering inputs with almost 100% recall. We discuss the problem of false positives and how to address them. These results provide a baseline for future studies in this field.
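The observation behind occlusion-based detection can be illustrated with a generic single-token occlusion loop. The abstract does not describe OSeql's actual procedure, so the sketch below is only an assumption-laden illustration: `confidence` stands for any callable that returns the model's confidence for a token list, and the function names and threshold are hypothetical.

```python
def occlusion_scores(tokens, confidence):
    """Confidence drop caused by removing each token in turn."""
    base = confidence(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        scores.append(base - confidence(occluded))
    return scores


def flag_suspicious(tokens, confidence, threshold=0.5):
    """Tokens whose removal lowers the confidence by at least `threshold`."""
    scores = occlusion_scores(tokens, confidence)
    return [tok for tok, s in zip(tokens, scores) if s >= threshold]
```

An input with a flagged token would then be handed to a human reviewer, which matches the human-in-the-loop framing; benign tokens that the model legitimately relies on can also produce large drops, which is one source of the false positives the authors mention.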

Translated: 2023-12-12 12:22:59 Published: 2023-12-10

PDF registration status (publication date: 20231210)