Fugu-MT: arXiv Paper Translations

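This site provides Japanese translations of arXiv papers that are 30 pages or fewer and are released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.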

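The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).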

These are the papers whose publication date is listed as 20231111.

Title	Authors	Abstract	Publication date / translation date
# BlockEmulator: An Emulator Enabling to Test Blockchain Sharding Protocols ( http://arxiv.org/abs/2311.03612v2 ) License: see the linked page	Huawei Huang, Guang Ye, Qinde Chen, Zhaokang Yin, Xiaofei Luo, Jianru Lin, Taotao Li, Qinglin Yang, Zibin Zheng	Numerous blockchain simulators have been proposed to allow researchers to simulate mainstream blockchains. However, we have not yet found a testbed that enables researchers to develop and evaluate their new consensus algorithms or new protocols for blockchain sharding systems. To fill this gap, we develop BlockEmulator, which is designed as an experimental platform, particularly for emulating blockchain sharding mechanisms. BlockEmulator adopts a lightweight blockchain architecture so that developers can focus solely on implementing their new protocols or mechanisms. Using layered modules and useful programming interfaces offered by BlockEmulator, researchers can implement a new protocol with minimum effort. Through experiments, we test various functionalities of BlockEmulator in two steps. Firstly, we prove the correctness of the emulation results yielded by BlockEmulator by comparing the theoretical analysis with the observed experiment results. Secondly, other experimental results demonstrate that BlockEmulator can facilitate the measurement of a series of metrics, including throughput, transaction confirmation latency, cross-shard transaction ratio, the queuing size of transaction pools, workload distribution across blockchain shards, etc. We have made BlockEmulator open-source on GitHub.	Translated: 2024-03-25 13:36:10 Published: 2023-11-11
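One of the metrics named in the BlockEmulator abstract, the cross-shard transaction ratio, can be illustrated with a minimal Python sketch. This is not BlockEmulator's code or API: the hash-based account-to-shard mapping and the names `shard_of` and `cross_shard_ratio` are assumptions made purely for illustration.

```python
import hashlib


def shard_of(account: str, num_shards: int) -> int:
    """Hypothetical static partition: hash the account address, take it modulo the shard count."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cross_shard_ratio(transactions, num_shards, shard_fn=shard_of):
    """Fraction of (sender, receiver) pairs whose two accounts land in different shards."""
    if not transactions:
        return 0.0
    cross = sum(
        1
        for sender, receiver in transactions
        if shard_fn(sender, num_shards) != shard_fn(receiver, num_shards)
    )
    return cross / len(transactions)
```

With a single shard the ratio is always 0.0; as the shard count grows under a uniform hash, more transactions tend to span two shards.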
# Determining Intent of Changes to Ascertain Fake Crowdsourced Image Services ( http://arxiv.org/abs/2403.12045v1 ) License: see the linked page	Muhammad Umair, Athman Bouguettaya, Abdallah Lakhdari	We propose a novel framework for crowdsourced images to determine the likelihood of an image being fake. We use a service-oriented approach to model and represent crowdsourced images uploaded on social media as image services. Trust may, in some circumstances, be determined by using only the non-functional attributes of an image service, i.e., image metadata. We define intention of changes as a key parameter to ascertain fake image services. A novel framework is proposed to estimate the intention of underlying changes considering change in semantics of an image. Our experiments show high accuracy using a large real dataset.	Translated: 2024-03-25 07:56:27 Published: 2023-11-11
# Seeing is Believing: A Federated Learning Based Prototype to Detect Wireless Injection Attacks ( http://arxiv.org/abs/2311.06564v1 ) License: see the linked page	Aadil Hussain, Nitheesh Gundapu, Sarang Drugkar, Suraj Kiran, J. Harshan, Ranjitha Prasad	Reactive injection attacks are a class of security threats in wireless networks wherein adversaries opportunistically inject spoofing packets in the frequency band of a client, thereby forcing the base-station to deploy impersonation-detection methods. Towards circumventing such threats, we implement secret-key based physical-layer signalling methods at the clients which allow the base-stations to deploy machine learning (ML) models on their in-phase and quadrature samples at the baseband for attack detection. Using Adalm Pluto based software defined radios to implement the secret-key based signalling methods, we show that robust ML models can be designed at the base-stations. However, we also point out that, in practice, insufficient availability of training datasets at the base-stations can make these methods ineffective. Thus, we use a federated learning framework in the backhaul network, wherein a group of base-stations that need to protect their clients against reactive injection threats collaborate to refine their ML models while ensuring privacy of their datasets. Using a network of XBee devices to implement the backhaul network, experimental results on our federated learning setup show significant enhancements in the detection accuracy, thus presenting wireless security as an excellent use-case for federated learning in 6G networks and beyond.	Translated: 2024-03-18 23:32:03 Published: 2023-11-11
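The collaboration step described in the abstract above, where base-stations refine a shared model without exchanging their datasets, can be sketched with size-weighted parameter averaging (FedAvg). The abstract does not say which aggregation rule the authors use, so FedAvg is an assumption here, and `federated_average` is a hypothetical name.

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of per-client parameter vectors (FedAvg-style aggregation).

    client_weights: list of equal-length lists of floats, one per base-station.
    client_sizes:   number of local training samples held by each base-station.
    Only the parameters are shared; the raw samples never leave the client.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("at least one client must hold training samples")
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged
```

A base-station with more local samples pulls the shared model further towards its own parameters, which is the usual rationale for weighting by dataset size.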
# Word Linear Complexity of sequences and Local Inversion of maps over finite fields ( http://arxiv.org/abs/2311.06574v1 ) License: see the linked page	Virendra Sule	This paper develops the notion of \emph{Word Linear Complexity} ($WLC$) of vector valued sequences over finite fields $\ff$ as an extension of Linear Complexity ($LC$) of sequences and their ensembles. This notion of complexity extends the concept of the minimal polynomial of an ensemble (vector valued) sequence to that of a matrix minimal polynomial and shows that the matrix minimal polynomial can be used with iteratively generated vector valued sequences by maps $F:\ff^n\rightarrow\ff^n$ at a given $y$ in $\ff^n$ for solving the unique local inverse $x$ of the equation $y=F(x)$ when the sequence is periodic. The idea of solving a local inverse of a map in finite fields when the iterative sequence is periodic and its application to various problems of Cryptanalysis is developed in previous papers \cite{sule322, sule521, sule722,suleCAM22} using the well known notion of $LC$ of sequences. $LC$ is the degree of the associated minimal polynomial of the sequence. The generalization of $LC$ to $WLC$ considers vector valued (or word oriented) sequences such that the word oriented recurrence relation is obtained by matrix vector multiplication instead of scalar multiplication as considered in the definition of $LC$. Hence the associated minimal polynomial is matrix valued, and its degree is called $WLC$. A condition is derived for the existence of a nontrivial matrix polynomial associated with the word oriented recurrence relation when the sequence is periodic. It is shown that when the matrix minimal polynomial exists, $n(WLC)=LC$. Finally it is shown that the local inversion problem is solved using the matrix minimal polynomial when such a polynomial exists, which leads to a word oriented approach to local inversion.	Translated: 2024-03-18 23:32:03 Published: 2023-11-11
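The scalar notion of $LC$ that the abstract above starts from (the degree of the minimal polynomial of a sequence) can be computed with the Berlekamp-Massey algorithm. The sketch below covers only the scalar case over GF(2), chosen for brevity; it is not the paper's word-oriented ($WLC$) construction, and the function name is hypothetical.

```python
def linear_complexity_gf2(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR that generates `bits`."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, last_change = 0, -1
    for i in range(n):
        # Discrepancy between bits[i] and what the current LFSR predicts.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - last_change
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                last_change = i
                b = previous
    return length
```

For example, a sequence satisfying $s_{k+3} = s_{k+1} + s_k$ over GF(2) with a nonzero start has linear complexity 3, matching the degree of its minimal polynomial $x^3 + x + 1$.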
# Don't Overlook the Grammatical Gender: Bias Evaluation for Hindi-English Machine Translation ( http://arxiv.org/abs/2312.03710v1 ) License: see the linked page	Pushpdeep Singh	Neural Machine Translation (NMT) models, though state-of-the-art for translation, often reflect social biases, particularly gender bias. Existing evaluation benchmarks primarily focus on English as the source language of translation. For source languages other than English, studies often employ gender-neutral sentences for bias evaluation, whereas real-world sentences frequently contain gender information in different forms. Therefore, it makes more sense to evaluate for bias using such source sentences to determine if NMT models can discern gender from the grammatical gender cues rather than relying on biased associations. To illustrate this, we create two gender-specific sentence sets in Hindi to automatically evaluate gender bias in various Hindi-English (HI-EN) NMT systems. We emphasise the significance of tailoring bias evaluation test sets to account for grammatical gender markers in the source language.	Translated: 2023-12-11 03:20:32 Published: 2023-11-11
# Análise e modelagem de jogos digitais: Relato de uma experiência educacional utlizando PBL em um grupo multidisciplinar ( http://arxiv.org/abs/2311.14704v1 ) License: see the linked page	David de Oliveira Lemes, Ezequiel França dos Santos, Eduardo Romanek, Celso Fujimoto, Adriano Felix Valente	Traditional software engineering education generally emphasizes strict collaboration and technical skills. However, active teaching strategies, where students actively engage with the material, transitioning from passive observers to active manipulators of real-world tools, have shown effectiveness in software engineering. The evolving market demands new skills in the context of digital transformation, presenting challenges such as modeling complex business scenarios and navigating the interconnections between people, systems, and technologies. Shifting from conventional software engineering instruction to active methodologies like Problem-Based Learning (PBL) has proven to bring real-world market challenges and realities into the classroom. This article details an experience from the Digital Games Analysis and Modeling course in the Digital Games Master's program at Pontifical Catholic University of Sao Paulo. It covers the discussed concepts, case study, role-based work method, and steps of the meetings. We also present examples of outcomes like requirement diagrams, context diagrams, use case diagrams, class diagrams, interviews, and others that contributed to the Game Design Document (GDD). These were created by each group during the meetings alongside their game prototypes. Additionally, a discussion on the developed capabilities is included.	Translated: 2023-12-03 13:55:02 Published: 2023-11-11
# Progression and Challenges of IoT in Healthcare: A Short Review ( http://arxiv.org/abs/2311.12869v1 ) License: see the linked page	S M Atikur Rahman, Sifat Ibtisum, Priya Podder, S. M. Saokat Hossain	Smart healthcare, an integral element of connected living, plays a pivotal role in fulfilling a fundamental human need. The burgeoning field of smart healthcare is poised to generate substantial revenue in the foreseeable future. Its multifaceted framework encompasses vital components such as the Internet of Things (IoT), medical sensors, artificial intelligence (AI), edge and cloud computing, as well as next-generation wireless communication technologies. Many research papers discuss smart healthcare and healthcare more broadly. Numerous nations have strategically deployed the Internet of Medical Things (IoMT) alongside other measures to combat the propagation of COVID-19. This combined effort has not only enhanced the safety of frontline healthcare workers but has also augmented the overall efficacy in managing the pandemic, subsequently reducing its impact on human lives and mortality rates. Remarkable strides have been made in both applications and technology within the IoMT domain. However, it is imperative to acknowledge that this technological advancement has introduced certain challenges, particularly in the realm of security. The rapid and extensive adoption of IoMT worldwide has magnified issues related to security and privacy. These encompass a spectrum of concerns, ranging from replay attacks, man-in-the-middle attacks, impersonation, privileged insider threats, remote hijacking, password guessing, and denial of service (DoS) attacks, to malware incursions. In this comprehensive review, we undertake a comparative analysis of existing strategies designed for the detection and prevention of malware in IoT environments.	Translated: 2023-11-27 00:22:51 Published: 2023-11-11
# Modeling Choice via Self-Attention ( http://arxiv.org/abs/2311.07607v1 ) License: see the linked page	Joohwan Ko, Andrew A. Li	Models of choice are a fundamental input to many now-canonical optimization problems in the field of Operations Management, including assortment, inventory, and price optimization. Naturally, accurate estimation of these models from data is a critical step in the application of these optimization problems in practice, and so it is perhaps surprising that such choice estimation has until now been accomplished almost exclusively, both in theory and in practice, (a) without the use of deep learning in any meaningful way, and (b) via evaluation on limited data with constantly-changing metrics. This is in stark contrast to the vast majority of similar learning applications, for which the practice of machine learning suggests that (a) neural network-based models are typically state-of-the-art, and (b) strict standardization on evaluation procedures (datasets, metrics, etc.) is crucial. Thus motivated, we first propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept (self-attention). Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit model, a recent model that parsimoniously captures irrational choice effects and has seen empirical success. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, where $m$ is the number of products, our model supports a natural nonconvex estimator (in particular, that which a standard neural network implementation would apply) which admits a near-optimal stationary point with $O(m)$ samples. We then establish the first realistic-scale benchmark for choice estimation on real data and use this benchmark to run the largest evaluation of existing choice models to date. We find that the model we propose is dominant over both short-term and long-term data periods.	Translated: 2023-11-15 17:13:10 Published: 2023-11-11
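The contrast between $\Omega(m^2)$ and $O(m)$ in the abstract above can be made concrete with a small sketch. It assumes one common way of writing a choice model with pairwise "halo" interactions, where a product's utility is shifted by every other product offered alongside it, and it builds the interaction matrix from two thin factors to show the low-rank idea. The exact parameterization in the paper is not given in the abstract, so the functional form and all names here are illustrative assumptions.

```python
import math


def halo_mnl_probs(assortment, base_utility, interaction):
    """Choice probabilities over the offered products.

    Assumed utility of product j in assortment S: base_utility[j] plus
    interaction[k][j] summed over the other offered products k.
    """
    utilities = {
        j: base_utility[j] + sum(interaction[k][j] for k in assortment if k != j)
        for j in assortment
    }
    top = max(utilities.values())  # subtract the max for numerical stability
    exps = {j: math.exp(u - top) for j, u in utilities.items()}
    normaliser = sum(exps.values())
    return {j: e / normaliser for j, e in exps.items()}


def low_rank_interaction(U, V):
    """Build an m-by-m interaction matrix from two m-by-r factors (2*m*r numbers instead of m*m)."""
    m = len(U)
    return [[sum(a * b for a, b in zip(U[k], V[j])) for j in range(m)] for k in range(m)]
```

With rank $r$ fixed, the factored form has a number of parameters linear in $m$, which is the intuition behind needing fewer samples; the formal statement is the paper's, not this sketch's.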
# Conceptual Model Interpreter for Large Language Models ( http://arxiv.org/abs/2311.07605v1 ) License: see the linked page	Felix Härer	Large Language Models (LLMs) recently demonstrated capabilities for generating source code in common programming languages. Additionally, commercial products such as ChatGPT 4 started to provide code interpreters, allowing for the automatic execution of generated code fragments, instant feedback, and the possibility to develop and refine in a conversational fashion. With an exploratory research approach, this paper applies code generation and interpretation to conceptual models. The concept and prototype of a conceptual model interpreter is explored, capable of rendering visual models generated in textual syntax by state-of-the-art LLMs such as Llama 2 and ChatGPT 4. In particular, these LLMs can generate textual syntax for the PlantUML and Graphviz modeling software that is automatically rendered within a conversational user interface. The first result is an architecture describing the components necessary to interact with interpreters and LLMs through APIs or locally, providing support for many commercial and open source LLMs and interpreters. Secondly, experimental results for models generated with ChatGPT 4 and Llama 2 are discussed in two cases covering UML and, on an instance level, graphs created from custom data. The results indicate the possibility of modeling iteratively in a conversational fashion.	Translated: 2023-11-15 17:12:19 Published: 2023-11-11
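The "textual syntax" that the abstract above refers to can be illustrated by the kind of Graphviz DOT text a model would emit and an interpreter would then render. The sketch below only produces the text from a list of edges; it does not call Graphviz, an LLM, or the paper's prototype, and `to_dot` is a hypothetical helper.

```python
def _quote(name: str) -> str:
    """Wrap a node name in double quotes, escaping any embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, graph_name="G"):
    """Return Graphviz DOT text for a directed graph given (source, target) pairs."""
    lines = [f"digraph {graph_name} {{"]
    for source, target in edges:
        lines.append(f"    {_quote(source)} -> {_quote(target)};")
    lines.append("}")
    return "\n".join(lines)
```

Calling `print(to_dot([("Order", "Customer")]))` yields a three-line `digraph` that a DOT renderer can turn into a diagram, which is the step an interpreter in a conversational interface would automate.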
# Finetuning Text-to-Image Diffusion Models for Fairness ( http://arxiv.org/abs/2311.07604v1 ) License: see the linked page	Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli	The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of the diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We hope our work facilitates the social alignment of T2I generative AI. We will share code and various debiased diffusion model adaptors.	Translated: 2023-11-15 17:11:58 Published: 2023-11-11
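The idea of steering an attribute of generated images towards a user-defined target distribution, such as the $75\%$ young and $25\%$ old split mentioned above, can be sketched as a divergence between the target and the batch-level attribute distribution. The abstract does not state the form of the paper's loss, so the KL divergence below is a swapped-in stand-in, and the per-image class probabilities are assumed to come from some attribute classifier that is not shown.

```python
import math


def batch_distribution(per_image_probs):
    """Average per-image class probabilities into one batch-level distribution."""
    count = len(per_image_probs)
    classes = len(per_image_probs[0])
    return [sum(p[c] for p in per_image_probs) / count for c in range(classes)]


def kl_divergence(target, actual, eps=1e-12):
    """KL(target || actual); zero when the batch distribution matches the target."""
    return sum(
        t * math.log(t / max(a, eps))
        for t, a in zip(target, actual)
        if t > 0
    )
```

A target of `[0.5, 0.5]` corresponds to absolute equality, while `[0.75, 0.25]` expresses a different notion of fairness with the same machinery.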
# PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment ( http://arxiv.org/abs/2311.07603v1 ) License: see the linked page	Amirhossein Dadashzadeh, Shuchao Duan, Alan Whone, Majid Mirmehdi	The limited availability of labelled data in Action Quality Assessment (AQA) has forced previous works to fine-tune their models pretrained on large-scale domain-general datasets. This common approach results in weak generalisation, particularly when there is a significant domain shift. We propose a novel, parameter efficient, continual pretraining framework, PECoP, to reduce such domain shift via an additional pretraining stage. In PECoP, we introduce 3D-Adapters, inserted into the pretrained model, to learn spatiotemporal, in-domain information via self-supervised learning where only the adapter modules' parameters are updated. We demonstrate PECoP's ability to enhance the performance of recent state-of-the-art methods (MUSDL, CoRe, and TSA) applied to AQA, leading to considerable improvements on benchmark datasets, JIGSAWS ($\uparrow6.0\%$), MTL-AQA ($\uparrow0.99\%$), and FineDiving ($\uparrow2.54\%$). We also present a new Parkinson's Disease dataset, PD4T, of real patients performing four different actions, where we surpass ($\uparrow3.56\%$) the state-of-the-art in comparison. Our code, pretrained models, and the PD4T dataset are available at https://github.com/Plrbear/PECoP.	Translated: 2023-11-15 17:11:37 Published: 2023-11-11
# Online Advertisements with LLMs: Opportunities and Challenges ( http://arxiv.org/abs/2311.07601v1 ) License: see the linked page	Soheil Feizi, MohammadTaghi Hajiaghayi, Keivan Rezaei, Suho Shin	This paper explores the potential for leveraging Large Language Models (LLMs) in the realm of online advertising systems. We delve into essential requirements that such a system must fulfill, including privacy, latency, reliability, and user and advertiser satisfaction. We further introduce a general framework for LLM advertisement, consisting of modification, bidding, prediction, and auction modules. Different design considerations for each module are presented, with an in-depth examination of their practicality and the technical challenges inherent to their implementation.	Translated: 2023-11-15 17:11:08 Published: 2023-11-11
# Polarimetric PatchMatch Multi-View Stereo ( http://arxiv.org/abs/2311.07600v1 ) License: see the linked page	Jinyu Zhao, Jumpei Oishi, Yusuke Monno, Masatoshi Okutomi	PatchMatch Multi-View Stereo (PatchMatch MVS) is one of the popular MVS approaches, owing to its balanced accuracy and efficiency. In this paper, we propose Polarimetric PatchMatch multi-view Stereo (PolarPMS), which is the first method exploiting polarization cues in PatchMatch MVS. The key of PatchMatch MVS is to generate depth and normal hypotheses, which form local 3D planes and slanted stereo matching windows, and efficiently search for the best hypothesis based on the consistency among multi-view images. In addition to standard photometric consistency, our PolarPMS evaluates polarimetric consistency to assess the validity of a depth and normal hypothesis, motivated by the physical property that the polarimetric information is related to the object's surface normal. Experimental results demonstrate that our PolarPMS can improve the accuracy and the completeness of reconstructed 3D models, especially for texture-less surfaces, compared with state-of-the-art PatchMatch MVS methods.	Translated: 2023-11-15 17:11:00 Published: 2023-11-11
# Intentional Biases in LLM Responses ( http://arxiv.org/abs/2311.07611v1 ) License: see the linked page	Nicklaus Badyal, Derek Jacoby, Yvonne Coady	In this study we intentionally introduce biases into large language model responses in an attempt to create specific personas for interactive media purposes. We explore the differences between open source models such as Falcon-7b and the GPT-4 model from OpenAI, and we quantify some differences in responses afforded by the two systems. We find that the guardrails in the GPT-4 mixture of experts models with a supervisor, while useful in assuring AI alignment in general, are detrimental in trying to construct personas with a variety of uncommon viewpoints. This study aims to set the groundwork for future exploration in intentional biases of large language models such that these practices can be applied in the creative field and in new forms of media.	Translated: 2023-11-15 16:56:01 Published: 2023-11-11
# Artificial Intelligence in Assessing Cardiovascular Diseases and Risk Factors via Retinal Fundus Images: A Review of the Last Decade ( http://arxiv.org/abs/2311.07609v1 ) License: see the linked page	Mirsaeed Abdollahi, Ali Jafarizadeh, Amirhosein Ghafouri Asbagh, Navid Sobhi, Keysan Pourmoghtader, Siamak Pedrammehr, Houshyar Asadi, Roohallah Alizadehsani, Ru-San Tan, U. Rajendra Acharya	Background: Cardiovascular diseases (CVDs) continue to be the leading cause of mortality on a global scale. In recent years, the application of artificial intelligence (AI) techniques, particularly deep learning (DL), has gained considerable popularity for evaluating the various aspects of CVDs. Moreover, using fundus images and optical coherence tomography angiography (OCTA) to diagnose retinal diseases has been extensively studied. To better understand heart function and anticipate changes based on microvascular characteristics and function, researchers are currently exploring the integration of AI with non-invasive retinal scanning. Leveraging AI-assisted early detection and prediction of cardiovascular diseases on a large scale holds excellent potential to mitigate cardiovascular events and alleviate the economic burden on healthcare systems. Method: A comprehensive search was conducted across various databases, including PubMed, Medline, Google Scholar, Scopus, Web of Sciences, IEEE Xplore, and ACM Digital Library, using specific keywords related to cardiovascular diseases and artificial intelligence. Results: A total of 87 English-language publications, selected for relevance, were included in the study, and additional references were considered. This study presents an overview of the current advancements and challenges in employing retinal imaging and artificial intelligence to identify cardiovascular disorders and provides insights for further exploration in this field. Conclusion: Researchers aim to develop precise disease prognosis patterns as the aging population and global CVD burden increase. AI and deep learning are transforming healthcare, offering the potential for single retinal image-based diagnosis of various CVDs, albeit with the need for accelerated adoption in healthcare systems.	Translated: 2023-11-15 16:55:47 Published: 2023-11-11
# 病院入室予測のためのマルチモーダル時空間グラフ変換器 MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction ( http://arxiv.org/abs/2311.07608v1 ) ライセンス: Link先を確認	Yan Miao, Lequan Yu	(参考訳) 病院の入院率予測は,医療システムの品質と有効性を評価する上で重要な要因である,入院率の低下に不可欠なアプローチと考えられる。これまでの研究では、電子健康記録(ehr)、医療画像、臨床ノートの3つの主要な特徴を利用して病院の入院を予測している。しかし,これらの研究の大部分は,3つのモーダルからの情報の統合や,データセットに存在する時空間的関係の活用は行わなかった。本研究は,病院入退院予測のためのMultimodal Spatiotemporal Graph-Transformer (MuST) と呼ばれる新しいモデルを提案する。グラフ畳み込みネットワークと時間変換器を用いることで,ERHや胸部X線撮影における空間的および時間的依存関係を効果的に捉えることができる。次に,前述した2つのモードの時空間的特徴と,事前学習したドメイン固有変換器から抽出した臨床メモの特徴を組み合わせた融合変圧器を提案する。提案手法の有効性を,最新の公開データセットMIMIC-IVを用いて評価した。実験結果から, MuST にマルチモーダルな特徴を組み込むことで, 単調な手法と比較して性能が向上することが示唆された。さらに,本研究のパイプラインは,院内寛容の予測において,現在の先行手法よりも優れていた。 Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which is a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively utilized three primary modalities, namely electronic health records (EHR), medical images, and clinical notes, to predict hospital readmissions. However, the majority of these studies did not integrate information from all three modalities or utilize the spatiotemporal relationships present in the dataset. This study introduces a novel model called the Multimodal Spatiotemporal Graph-Transformer (MuST) for predicting hospital readmissions. By employing Graph Convolution Networks and temporal transformers, we can effectively capture spatial and temporal dependencies in EHR and chest radiographs. We then propose a fusion transformer to combine the spatiotemporal features from the two modalities mentioned above with the features from clinical notes extracted by a pre-trained, domain-specific transformer. We assess the effectiveness of our methods using the latest publicly available dataset, MIMIC-IV. 
The experimental results indicate that the inclusion of multimodal features in MuST improves its performance in comparison to unimodal methods. Furthermore, our proposed pipeline outperforms the current leading methods in the prediction of hospital readmissions.	翻訳日:2023-11-15 16:55:19 公開日:2023-11-11
# グラフ正規化テンソル補完のための交互最小化アルゴリズム Alternating minimization algorithms for graph regularized tensor completion ( http://arxiv.org/abs/2008.12876v2 ) ライセンス: Link先を確認	Yu Guan, Shuyu Dong, Bin Gao, P.-A. Absil, François Glineur	(参考訳) cp因子行列上のグラフラプラシアン正則化を通した外部対関係を組み込むことにより、低ランクテンソル完全化(lrtc)に対する正準多進(cp)分解法を考える。グラフ正規化の利用にはLRTCの学習精度の利点が伴うが、同時にテンソル完備モデルの最適化を妨げる結合グラフラプラシアン項を誘導する。グラフ正規化lrtcの解法として,cp分解ベースモデルのブロック構造を活用し,効率的な交互最小化アルゴリズムを提案する。交互最小化の部分問題に対して、線形共役勾配サブルーチンはグラフ正規化lrtcに特に適合する。あるいは、乗算器の交互方向法を用いてグラフラプラシアン項の複雑結合効果を回避する。 Kurdyka-Łojasiewicz の性質に基づき、提案アルゴリズムによって生成される列が、対象関数の臨界点にグローバルに収束することを示す。さらに、複雑性と収束率も導出される。さらに、合成データや実データを含む数値実験により、グラフ正規化テンソル補完モデルがグラフ正規化のないモデルに比べて回復結果が向上し、既存のアルゴリズムよりも時間効率が向上することを示した。 We consider a Canonical Polyadic (CP) decomposition approach to low-rank tensor completion (LRTC) by incorporating external pairwise similarity relations through graph Laplacian regularization on the CP factor matrices. The usage of graph regularization entails benefits in the learning accuracy of LRTC, but at the same time, induces coupling graph Laplacian terms that hinder the optimization of the tensor completion model. In order to solve graph-regularized LRTC, we propose efficient alternating minimization algorithms by leveraging the block structure of the underlying CP decomposition-based model. For the subproblems of alternating minimization, a linear conjugate gradient subroutine is specifically adapted to graph-regularized LRTC. Alternatively, we circumvent the complicating coupling effects of graph Laplacian terms by using an alternating directions method of multipliers. Based on the Kurdyka-Łojasiewicz property, we show that the sequence generated by the proposed algorithms globally converges to a critical point of the objective function. Moreover, the complexity and convergence rate are also derived. 
In addition, numerical experiments including synthetic data and real data show that the graph regularized tensor completion model has improved recovery results compared to those without graph regularization, and that the proposed algorithms achieve gains in time efficiency over existing algorithms.	翻訳日:2023-11-15 01:00:27 公開日:2023-11-11
# 行動概念を用いたAI説明手法の診断 Diagnosing AI Explanation Methods with Folk Concepts of Behavior ( http://arxiv.org/abs/2201.11239v4 ) ライセンス: Link先を確認	Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova	(参考訳) 我々は,AIの説明が成功する条件に対する形式主義について検討する。我々は「成功」は、説明がどんな情報を含んでいるかだけでなく、説明者が理解している情報にも依存すると考える。心の文学の理論は、人間が行動を理解し、一般化するために使用する民間概念を論じる。行動の民俗概念は、人間が行動を理解する「言語」をもたらすと仮定する。我々は、説明的物語(第1図)の青写真を導入して、人間の説明者による社会的帰属の枠組みとして、人間が説明から理解する可能性が高い情報構造として、これらの民間概念を使用します。そして,今日,多くのXAI手法が質的評価において民生的な行動概念にマッピング可能であることを示す。これにより、現在のメソッドがうまく説明できないように、障害モードを明らかにすることができます。つまり、任意のXAIメソッドに欠けている情報構造であり、AIの動作が誤解される可能性を減らすことができます。 We investigate a formalism for the conditions of a successful explanation of AI. We consider "success" to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. Theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide us with a "language" that humans understand behavior with. We use these folk concepts as a framework of social attribution by the human explainee -- the information constructs that humans are likely to comprehend from explanations -- by introducing a blueprint for an explanatory narrative (Figure 1) that explains AI behavior with these constructs. We then demonstrate that many XAI methods today can be mapped to folk concepts of behavior in a qualitative evaluation. This allows us to uncover their failure modes that prevent current methods from explaining successfully -- i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.	翻訳日:2023-11-15 00:54:00 公開日:2023-11-11
# 自律走行車用カメラの物理的状態のモニタリングと適応 Monitoring and Adapting the Physical State of a Camera for Autonomous Vehicles ( http://arxiv.org/abs/2112.05456v3 ) ライセンス: Link先を確認	Maik Wischow and Guillermo Gallego and Ines Ernst and Anko Börner	(参考訳) 自動運転車とロボットは、現代のタスクの要求を満たすために、ますます堅牢さと信頼性を必要としている。これらの要件は、特にそのような車両に搭載されているカメラに適用される。カメラは適切な機能を維持し、必要に応じて自動的な対策を講じなければならない。既存のソリューションは、通常、特定の問題に合わせて調整されるか、マシンの下流のコンピュータビジョンタスクから分離される。本稿では,データおよび物理モデルに基づくカメラの汎用的・タスク指向型自己維持フレームワークを提案する。そこで本研究では,従来およびカスタマイズされた機械学習に基づくアプローチを広範囲な実験で評価することにより,カメラの典型的な画像効果(ブラインド,ノイズ現象,および最も一般的な組み合わせ)に対する信頼性の高い2つの実時間可能な推定器を決定する。さらに,実世界の地上車両にそのフレームワークを実装し,カメラがパラメータを調整して識別不良条件に対抗し,実験的な(非線形および非単調な)入出力性能曲線に基づく最適な適用能力を達成する方法を示す。対象のアプリケーションとして物体検出が選択され、画像が動きのぼやけやセンサノイズを条件付けする例となる。私たちのフレームワークは、カメラの健全性を監視し、維持するための実用的なソリューションを提供するだけでなく、完全な信頼性と堅牢なマシンを達成するために、経験的にデータソース(例えば、センサーや環境パラメータ)を結合するより高度な問題に取り組むための拡張の基盤としても機能します。コード:https://github.com/MaikWischow/Camera-Condition-Monitoring Autonomous vehicles and robots require increasingly more robustness and reliability to meet the demands of modern tasks. These requirements especially apply to cameras onboard such vehicles because they are the predominant sensors to acquire information about the environment and support actions. Cameras must maintain proper functionality and take automatic countermeasures if necessary. Existing solutions are typically tailored to specific problems or detached from the downstream computer vision tasks of the machines, which, however, determine the requirements on the quality of the produced camera images. We propose a generic and task-oriented self-health-maintenance framework for cameras based on data- and physically-grounded models. To this end, we determine two reliable, real-time capable estimators for typical image effects of a camera in poor condition (blur, noise phenomena and most common combinations) by evaluating traditional and customized machine learning-based approaches in extensive experiments. 
Furthermore, we implement the framework on a real-world ground vehicle and demonstrate how a camera can adjust its parameters to counter an identified poor condition to achieve optimal application capability based on experimental (non-linear and non-monotonic) input-output performance curves. Object detection is chosen as target application, and the image effects motion blur and sensor noise as conditioning examples. Our framework not only provides a practical ready-to-use solution to monitor and maintain the health of cameras, but can also serve as a basis for extensions to tackle more sophisticated problems that combine additional data sources (e.g., sensor or environment parameters) empirically in order to attain fully reliable and robust machines. Code: https://github.com/MaikWischow/Camera-Condition-Monitoring	翻訳日:2023-11-15 00:53:14 公開日:2023-11-11
# サブガンマ次数列が増加するネットワークモデルのクラスにおける漸近性 Asymptotic in a class of network models with an increasing sub-Gamma degree sequence ( http://arxiv.org/abs/2111.01301v4 ) ライセンス: Link先を確認	Jing Luo, Haoyu Wei, Xiaoyu Lei, Jiaxin Guo	(参考訳) サブガンマノイズ下の微分プライバシーについては、一般リンク関数を持つバイナリ値を持つネットワークモデルのクラスにおける漸近特性を導出する。本稿では、離散的なLaplace機構を特別なケースとして、一般的な雑音機構の下でバイナリネットワークの次数列を解放する。ネットワークモデルのクラスにおいてパラメータの数が無限度に達すると、パラメータ推定器の一貫性と漸近正規性の両方を含む漸近結果を確立する。漸近的な結果を示すシミュレーションと実データ例が提供される。 For differential privacy under sub-Gamma noise, we derive the asymptotic properties of a class of binary-valued network models with a general link function. In this paper, we release the degree sequences of the binary networks under a general noisy mechanism with the discrete Laplace mechanism as a special case. We establish asymptotic results, including both consistency and asymptotic normality of the parameter estimator, when the number of parameters goes to infinity in a class of network models. Simulations and a real data example are provided to illustrate the asymptotic results.	翻訳日:2023-11-15 00:51:58 公開日:2023-11-11
# 駆動型量子Rabiモデルにおける2光子側バンド遷移 : 導出長手駆動と回転波近似を超えての定量的議論 Two-photon sideband transition in a driven quantum Rabi model : Quantitative discussions with derived longitudinal drives and beyond the rotating wave approximation ( http://arxiv.org/abs/2108.00137v2 ) ライセンス: Link先を確認	Byoung-moo Ann, Wouter Kessels, Gary A. Steele	(参考訳) 本研究では、駆動量子ラビモデル(QRM)のサイドバンド遷移ダイナミクスを解析的および数値的に研究する。特に、外部横方向駆動フィールドが一階側バンド遷移を誘導する条件に着目する。 2つの異なるシステム間のサイドバンド遷移を誘導することは、QRMを含む様々な物理モデルにとって重要な技術である。しかしながら、その重要性にもかかわらず、全てのシステムパラメータ構成に適用可能な駆動qrmのサイドバンド遷移率をうまく説明できる正確な分析研究はまだ報告されていない。本研究では、回転波近似 (rwa) \cite{rwa} に依存しない、二階摂動理論に基づくサイドバンド遷移率を解析的に導出する。計算式はドライブ周波数とシステムのパラメータのあらゆる範囲で有効である。解析的導出式は、適度な駆動振幅の系における数値結果とよく一致する。興味深いことに、横駆動ハミルトニアンから得られる非自明な縦駆動効果が発見された。このことは、導出長手効果を考慮せずに期待されるサイドバンド遷移率を著しく補正する。このアプローチを用いることで、QRMのサイドバンド遷移速度を特定のパラメータレギュレーション内に収まらないように正確に推定することができる。これは、駆動QRMによって記述された実験を理解するための重要な貢献である。 In this work, we analytically and numerically study the sideband transition dynamics of the driven quantum Rabi model (QRM). We focus in particular on the conditions when the external transverse drive fields induce first-order sideband transitions. Inducing sideband transitions between two different systems is an essential technique for various physical models, including the QRM. However, despite its importance, a precise analytical study that successfully explains the sideband transition rates in a driven QRM for all system parameter configurations has not yet been reported. In our study, we analytically derive the sideband transition rates based on second-order perturbation theory, not relying on the rotating wave approximation (RWA) \cite{RWA}. Our formulas are valid for all ranges of drive frequencies and system parameters. Our analytically derived formulas agree well with the numerical results in a regime of moderate drive amplitudes. Interestingly, we have found a non-trivial longitudinal drive effect derived from the transverse drive Hamiltonian. 
This accounts for significant corrections to the sideband transition rates that are expected without considering the derived longitudinal effect. Using this approach, one can precisely estimate the sideband transition rates in the QRM not confining themselves within specific parameter regimes. This provides important contributions for understanding experiments described by the driven QRM.	翻訳日:2023-11-15 00:51:39 公開日:2023-11-11
# 量子データ圧縮と量子クロスエントロピー Quantum Data Compression and Quantum Cross Entropy ( http://arxiv.org/abs/2106.13823v3 ) ライセンス: Link先を確認	Zhou Shangnan	(参考訳) 量子機械学習の新たな分野は、量子コンピューティングと人工知能の視点に革命をもたらす可能性がある。量子機械学習の実証的な領域では、理論的な空白が持続する。本稿では,古典的クロスエントロピーに匹敵する量子クロスエントロピーを強調することにより,このギャップに対処する。我々は、量子データ圧縮における量子クロスエントロピーの役割を確立し、それが準最適量子源符号化の圧縮速度として機能することを実証することによって、基礎的な機械学習タスクである量子データ圧縮において果たす。我々のアプローチは、可変長符号化の量子一般化と量子強典型性の原理に基づく、新しい普遍的な量子データ圧縮プロトコルである。これは量子クロスエントロピーが量子機械学習アルゴリズムの損失関数として効果的に機能することを明らかにする。さらに、量子クロスエントロピーの最小値はフォン・ノイマンエントロピーと一致し、最適な圧縮速度としての役割を補強し、量子機械学習の理論的枠組みを理解する上での意義を強調する。 The emerging field of quantum machine learning has the potential of revolutionizing our perspectives of quantum computing and artificial intelligence. In the predominantly empirical realm of quantum machine learning, a theoretical void persists. This paper addresses the gap by highlighting the quantum cross entropy, a pivotal counterpart to the classical cross entropy. We establish quantum cross entropy's role in quantum data compression, a fundamental machine learning task, by demonstrating that it acts as the compression rate for sub-optimal quantum source coding. Our approach involves a novel, universal quantum data compression protocol based on the quantum generalization of variable-length coding and the principle of quantum strong typicality. This reveals that quantum cross entropy can effectively serve as a loss function in quantum machine learning algorithms. Furthermore, we illustrate that the minimum of quantum cross entropy aligns with the von Neumann entropy, reinforcing its role as the optimal compression rate and underscoring its significance in advancing our understanding of quantum machine learning's theoretical framework.	翻訳日:2023-11-15 00:51:02 公開日:2023-11-11
# 緩やかに不安定な回帰 Slowly Varying Regression under Sparsity ( http://arxiv.org/abs/2102.10773v5 ) ライセンス: Link先を確認	Dimitris Bertsimas, Vassilis Digalakis Jr, Michael Linghzi Li, Omar Skali Lami	(参考訳) 疎度下での緩やかな回帰の枠組みを示し、スパース回帰モデルでは緩やかでスパースな回帰を示す。パラメータ推定の問題は混合整数最適化問題として定式化される。新たな緩和手法により,二項凸最適化問題として正確に再構成できることを実証した。この緩和はムーア・ペンローズ逆数に対する新しい等式を含み、すべての実現可能な二分点上の元の目的と一致しながら非凸目的函数を凸化する。これにより,切断平面型アルゴリズムを用いて効率よく最適性を証明できる。このアルゴリズムの高度に最適化された実装を開発し、簡単な実装の漸近的計算複雑性を大幅に改善する。さらに,実現可能な解を保証する高速ヒューリスティック手法を提案し,実証的に示すように,二項最適化問題に対する高品質なウォームスタート解を生成する。フレームワークのハイパーパラメータを調整するために、ある仮定の下では、真のモデルパラメータを復元することが保証されるバイナリ検索に依存する実用的な手順を提案する。合成データと実世界のデータの両方について, 推定精度, 予測能力, 計算時間など, 様々な指標で比較して, 結果のアルゴリズムが競合する定式化を上回っていることを示す。アルゴリズムは非常にスケーラブルで、数千のパラメータでモデルをトレーニングすることができます。実装はhttps://github.com/vvdigalakis/ssvregression.gitで公開しています。 We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equality on Moore-Penrose inverses, convexifying the non-convex objective function while matching the original objective on all feasible binary points. This enables us to efficiently solve the problem to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of this algorithm, substantially improving upon the asymptotic computational complexity of a straightforward implementation. Additionally, we propose a fast heuristic method that guarantees a feasible solution and, as empirically illustrated, produces high-quality warm-start solutions for the binary optimization problem. 
To tune the framework's hyperparameters, we suggest a practical procedure relying on binary search that, under certain assumptions, is guaranteed to recover the true model parameters. On both synthetic and real-world datasets, we demonstrate that the resulting algorithm outperforms competing formulations in comparable times across various metrics, including estimation accuracy, predictive power, and computational time. The algorithm is highly scalable, allowing us to train models with thousands of parameters. Our implementation is available open-source at https://github.com/vvdigalakis/SSVRegression.git.	翻訳日:2023-11-15 00:50:21 公開日:2023-11-11
# 凝集関数を持つファジィ推論のMPおよびMT特性 MP and MT properties of fuzzy inference with aggregation function ( http://arxiv.org/abs/2205.01269v2 ) ライセンス: Link先を確認	Dechao Li and Mengying He	(参考訳) 2つの基本的なファジィ推論モデルとして、ファジィモーダスポネン(fmp)とファジィモーダストレン(fmt)は人工知能において重要な応用である。 FMPとFMTの問題を解決するために、Zadeh氏は推論の合成規則(CRI)を提案した。本稿では,A-compositional rule of inference(ACRI)法の有効性を,論理的視点と補間的視点から,集約関数に基づく一般化されたCRI法として検討することを目的とする。具体的には, acri法のmodus ponens (mp) と modus tollens (mt) 特性について詳細に述べる。 FMP問題とFMT問題を実装する集約関数は、それぞれT-条件性、U-条件性、O-条件性の法則としてよく知られているt-ノルム、ユニノルム、重なり関数よりもより一般性を示す。さらに、理論的結果を説明するための2つの例も提示されている。特に、例 6.2 は fmp(fmt) 問題における出力 b' が、ファジィの入力とファジィ規則の先行項が近いときに提案する推論法で b(dc) に近いことを示している(ファジィ規則における後件の否定に近いファジィ入力)。 As the two basic fuzzy inference models, fuzzy modus ponens (FMP) and fuzzy modus tollens (FMT) have important applications in artificial intelligence. In order to solve FMP and FMT problems, Zadeh proposed a compositional rule of inference (CRI) method. This paper aims mainly to investigate the validity of the A-compositional rule of inference (ACRI) method, a generalized CRI method based on aggregation functions, from a logical view and an interpolative view, respectively. Specifically, the modus ponens (MP) and modus tollens (MT) properties of the ACRI method are discussed in detail. It is shown that the aggregation functions used to implement FMP and FMT problems provide more generality than the t-norms, uninorms and overlap functions of the well-known laws of T-conditionality, U-conditionality and O-conditionality, respectively. Moreover, two examples are also given to illustrate our theoretical results. In particular, Example 6.2 shows that the output B' in the FMP (FMT) problem is close to B (DC) with our proposed inference method when the fuzzy input is near the antecedent of the fuzzy rule (respectively, when the fuzzy input is near the negation of the consequent of the fuzzy rule).	翻訳日:2023-11-14 23:06:42 公開日:2023-11-11
# GUPはER=EPRのモデルとして機能するか? Could GUP Act as a Model for the ER=EPR Conjecture? ( http://arxiv.org/abs/2210.13974v5 ) ライセンス: Link先を確認	Ahmed Farag Ali	(参考訳) アインシュタイン、ポドルスキー、ローゼン(epr)は思考実験を通じて、不確実性原理は現実の完全な説明を提供しないかもしれないと提案した。線形一般化不確実性原理(GUP)は,最小測定可能な長さで消失不確実性を示すことによって,EPRパラドックスを解くことができる。これは量子力学の完全性に光を当てることで、線形 GUP とベケンシュタイン境界の間の等価性、すなわち物理系を量子レベルまで完全に記述するのに必要となる情報の最大量を規定する境界を提案することができる。この等価性は、水素原子/核半径と宇宙定数の値を説明することによって検証される。最近の研究では、アインシュタイン・ローゼン橋(ER)が最小長(GUP)に由来することが確認された。これらの結果を踏まえ、線形 GUP が ER=EPR 予想のモデルとして機能することを提案する。 Einstein, Podolsky, and Rosen (EPR) proposed, via a thought experiment, that the uncertainty principle might not provide a complete description of reality. We propose that the linear generalized uncertainty principle (GUP) may resolve the EPR paradox by demonstrating vanishing uncertainty at the minimal measurable length. This may shed light on the completeness of quantum mechanics which leads us to propose an equivalency between the linear GUP and the Bekenstein bound, a bound that prescribes the maximum amount of information needed to completely describe a physical system up to quantum level. This equivalency is verified through explaining the Hydrogen's atom/nuclei radii as well as the value of the cosmological constant. In a recent published study, we verified that the Einstein-Rosen (ER) bridge originates from the minimal length or GUP. Considering these findings together, we propose that linear GUP could function as a model for the ER=EPR conjecture.	翻訳日:2023-11-14 22:54:40 公開日:2023-11-11
# 限定ラベルデータを用いたハイブリッド融合型解釈可能なマルチモーダル感情認識 Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Limited Labelled Data ( http://arxiv.org/abs/2208.11450v2 ) ライセンス: Link先を確認	Puneet Kumar, Sarthak Malik, Balasubramanian Raman and Xiaobai Li	(参考訳) 本稿では,画像,音声,テキストを含むマルチモーダル入力に反映される感情を離散クラスに分類するマルチモーダル感情認識システム visual spoken textual additive net (vista net) を提案する。 K-Average Additive exPlanation (KAAP) と呼ばれる新しい解釈可能性技術も開発され、視覚的、音声的、テキスト的特徴を識別し、特定の感情クラスを予測する。 VISTAネットは、早期融合と後期融合のハイブリッドを用いて、画像、音声、テキストモダリティから情報を融合する。重み付け平均を計算しながら、中間出力の重みを自動的に調整する。 KAAP技術は、特定の感情のクラスを予測するために、各モダリティと対応する特徴の寄与を計算する。離散感情クラスでラベル付けされたマルチモーダル感情データセットの不十分さを軽減するために,画像,対応する音声,テキスト,感情ラベル(「angry」,「happy」,「hate」,「sad」)からなる大規模iit-r mmemorecデータセットを構築した。 VISTAネットは、IIT-R MMEmoRecデータセット上で、視覚的、音声的、テキスト的モダリティを使用して、95.99%の感情認識精度を達成している。 This paper proposes a multimodal emotion recognition system, VIsual Spoken Textual Additive Net (VISTA Net), to classify emotions reflected by multimodal input containing image, speech, and text into discrete classes. A new interpretability technique, K-Average Additive exPlanation (KAAP), has also been developed that identifies important visual, spoken, and textual features leading to predicting a particular emotion class. The VISTA Net fuses information from image, speech, and text modalities using a hybrid of early and late fusion. It automatically adjusts the weights of their intermediate outputs while computing the weighted average. The KAAP technique computes the contribution of each modality and corresponding features toward predicting a particular emotion class. To mitigate the insufficiency of multimodal emotion datasets labeled with discrete emotion classes, we have constructed a large-scale IIT-R MMEmoRec dataset consisting of images, corresponding speech and text, and emotion labels ('angry,' 'happy,' 'hate,' and 'sad'). 
VISTA Net achieves 95.99% emotion recognition accuracy on the IIT-R MMEmoRec dataset when using the visual, audio, and textual modalities together, outperforming configurations that use any one or two modalities.	翻訳日:2023-11-14 22:52:25 公開日:2023-11-11
# エージェントベースの市場モデルと相互作用する単純な学習エージェント A simple learning agent interacting with an agent-based market model ( http://arxiv.org/abs/2208.10434v4 ) ライセンス: Link先を確認	Matthew Dicks, Andrew Paskaramoorthy, Tim Gebbie	(参考訳) 本稿では,イベント駆動型金融市場モデルと相互作用する単一強化学習最適実行取引エージェントの学習ダイナミクスについて考察する。トレーディングはイベント時にマッチングエンジンを介して非同期に行われる。最適な実行エージェントは、初期オーダーサイズと異なるサイズの状態空間の異なるレベルで考慮される。エージェントベースのモデルと市場への影響は、経験的スタイル化された事実と価格影響曲線の変化を探索するキャリブレーションアプローチを用いて考慮される。収束、ボリューム軌道、アクショントレースプロットは学習ダイナミクスを視覚化するために使用される。ここで、より小さな状態空間エージェントは、訪問した状態がより大きな状態空間エージェントよりもずっと早く収束し、スプレッド状態とボリューム状態を使って直感的に取引を学べるようになった。モデルのモーメントは,戦略的な秩序分割の導入によって低下したHurst指数を除いて,学習エージェントの影響に対して堅牢であることがわかった。学習エージェントの導入は、価格影響曲線の形状を保ちながら、取引量の増加に伴うトレードサイン自己相関を低減できる。 We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order-sizes and differently sized state spaces. The resulting impact on the agent-based model and market are considered using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Here the smaller state space agents had the number of states they visited converge much faster than the larger state space agents, and they were able to start learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations when their trading volumes increase.	翻訳日:2023-11-14 22:51:51 公開日:2023-11-11
# immesh:lidarの即時ローカライズとメッシュ化フレームワーク ImMesh: An Immediate LiDAR Localization and Meshing Framework ( http://arxiv.org/abs/2301.05206v3 ) ライセンス: Link先を確認	Jiarong Lin, Chongjiang Yuan, Yixi Cai, Haotian Li, Yunfan Ren, Yuying Zou, Xiaoping Hong and Fu Zhang	(参考訳) 本稿では,リアルタイムの同時局所化とメッシュ化を実現するために,新しいLiDAR(-inertial odometry and mapping framework)を提案する。このフレームワークはImMeshと呼ばれ、レシーバ、ローカライゼーション、メッシュ、ブロードキャストの4つの密結合モジュールで構成されている。ローカライゼーションモジュールは、受信機で前処理されたセンサデータを利用し、LiDARスキャンを地図に登録してオンラインのポーズを推定し、地図を動的に成長させる。そして、私たちのメッシュモジュールは登録済みのLiDARスキャンを使って、オンザフライでトライアングルメッシュを漸進的に再構築します。最後に、リアルタイムのオドメトリ、マップ、メッシュをブロードキャストで公開します。この研究の主な貢献は、効率的な階層的なボクセル構造によってシーンを表現するメッシュモジュールであり、新しいスキャンで観察されたボクセルの高速発見を実行し、各ボクセルの三角形のファセットを漸進的に再構築する。このボクセルワイドメッシュ操作は、効率性のために微妙に設計され、まず、ボクセルに含まれる2次元局所平面に3Dポイントを投影し、次に、三角形の面を漸進的に再構成するためのプル、コミット、プッシュステップでメッシュ操作を実行する。私たちの知る限りでは、gpuアクセラレーションなしで標準的なcpuに頼るだけで、大規模なシーンのトライアングルメッシュをオンラインで再構築できる文学作品はこれが初めてです。私たちの発見を共有し、コミュニティへのコントリビューションをするために、私たちのコードをGitHubで公開しています。 In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real-time. This proposed framework, termed ImMesh, comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The key contribution of this work is the meshing module, which represents a scene by an efficient hierarchical voxel structure, quickly finds the voxels observed by new scans, and reconstructs triangle facets in each voxel in an incremental manner. 
This voxel-wise meshing operation is delicately designed for the purpose of efficiency; it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. To the best of our knowledge, this is the first work in literature that can reconstruct online the triangle mesh of large-scale scenes, just relying on a standard CPU without GPU acceleration. To share our findings and make contributions to the community, we make our code publicly available on our GitHub: https://github.com/hku-mars/ImMesh.	翻訳日:2023-11-14 22:44:13 公開日:2023-11-11
# セマンティックスを駆使したコミュニケーション:テュートリアル・クム・サーベイ Semantics-Empowered Communication: A Tutorial-cum-Survey ( http://arxiv.org/abs/2212.08487v5 ) ライセンス: Link先を確認	Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, Ekram Hossain, Zhifeng Zhao, and Honggang Zhang	(参考訳) セマンティクス・エミュレーション・コミュニケーション(semcom, semantics-empowered communication, semcom)研究の興隆とともに、学界と産業の両方において、幅広い側面(理論、応用、メトリクス、実装など)に対する前例のない関心が高まっている。本研究の目的は,背景分類学と研究分類学の両方に関する総合的な調査と,詳細な技術チュートリアルを提供することである。具体的には、文献をレビューし、意味伝達における「何」と「なぜ」の質問に答えることから始める。その後,semcomのエコシステムとして,歴史,理論,メトリクス,データセット,ツールキットを提示し,その上で研究の方向性を分類する。さらに, 明示的かつ暗黙的な推論に基づく手法により, 重要な実現手法を分類し, それらがどのように進化し, 現代的コンテントとチャネルセマンティクスを用いたコミュニケーションに寄与するかを詳述する。セムコムにおける最新の取り組みの見直しと要約に加えて、包括的で統一された視点から他のコミュニケーションレベル(例えば、従来のコミュニケーション)との関係について論じる。その後、今後の開発や工業的応用を促進するために、セマンティックな正確性、堅牢性、大規模スケーラビリティを高めるための先進的な実践技術を強調します。最後に,今後の研究機会に光を当てた技術的課題について論じる。 Along with the springing up of the semantics-empowered communication (SemCom) research, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present the ecosystems of SemCom, including history, theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., conventional communications) from a holistic and unified viewpoint. 
Subsequently, in order to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.	翻訳日:2023-11-14 22:41:57 公開日:2023-11-11
# マイクロ波被覆弱非調和超伝導量子ビットの非摂動的再正規化の解消 Resolving non-perturbative renormalization of a microwave-dressed weakly anharmonic superconducting qubit ( http://arxiv.org/abs/2212.05847v2 ) ライセンス: Link先を確認	Byoung-moo Ann, Sercan Deve, and Gary A. Steele	(参考訳) マイクロ波駆動は超伝導量子ビット(scqs)のユビキタスな技術であるが、従来の摂動理論に基づく服装状態の記述は強い駆動限界のダイナミクスを完全に捉えることはできない。トランスモン系回路量子力学(QED)系に適用可能なこれらの近似以外の包括的な研究は、主に単一モードまたは2状態系に限られているため、残念ながら稀である。本研究では,マイクロ波を装ったトランスモンを,広い範囲の駆動パラメータ上で単一量子化モードに結合する。トランスモンと共振器の相互作用と各モードの特性が強い駆動限界において著しく再正規化されることを明らかにする。従来の理論的な研究と異なり、摂動的レジームを超えた非再帰的かつ非フレケット理論を確立し、実験を良好に定量化する。本研究は,従来の近似を超越した,身なりのQEDライクなシステムに対する基本的な理解を拡大する。我々の研究は、高速量子ゲートの実装、量子ビットパラメータ工学、および駆動非線形システムに関する基礎研究にも貢献する。 Microwave driving is a ubiquitous technique for superconducting qubits (SCQs), but the dressed states description based on the conventionally used perturbation theory cannot fully capture the dynamics in the strong driving limit. Comprehensive studies beyond these approximations applicable to transmon-based circuit quantum electrodynamics (QED) systems are unfortunately rare as the relevant works have been mainly limited to single-mode or two-state systems. In this work, we investigate a microwave-dressed transmon coupled to a single quantized mode over a wide range of driving parameters. We reveal that the interaction between the transmon and resonator as well as the properties of each mode is significantly renormalized in the strong driving limit. Unlike previous theoretical works, we establish a non-recursive, and non-Floquet theory beyond the perturbative regimes, which excellently quantifies the experiments. This work expands our fundamental understanding of dressed cavity QED-like systems beyond the conventional approximations. Our work will also contribute to fast quantum gate implementation, qubit parameter engineering, and fundamental studies on driven nonlinear systems.	翻訳日:2023-11-14 22:41:31 公開日:2023-11-11
# 大規模フレキシブルタイトガウス混合モデルの確率的1次学習 Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model ( http://arxiv.org/abs/2212.05402v3 ) ライセンス: Link先を確認	Mohammad Pasande, Reshad Hosseini, Babak Nadjar Araabi	(参考訳) ガウス混合モデル(gmms)は、多くの応用で広く使われている最も強力なパラメトリック密度モデルの一つである。 GMMにおける共分散行列の柔軟な分解は、多くのガウス成分を必要とする高次元データや複素密度に直面した場合の共通GMMの課題に対処するための強力なアプローチである。しかし, フレキシブルタイトGMMを適合させるための期待最大化アルゴリズムは, ストリーミングや非常に大きな次元データに難航している。これらの課題を克服するために,一階確率最適化アルゴリズムを提案する。具体的には、直交行列の多様体上の新しい確率最適化アルゴリズムを提案する。合成データセットと実データセットの両方における多くの実験結果を通して、確率的最適化手法は、より良い可能性の達成、収束のエポックの低減、各エポック毎の時間の短縮という観点で予測-最大化アルゴリズムより優れていることが観察された。 Gaussian Mixture Models (GMMs) are one of the most potent parametric density models used extensively in many applications. Flexibly-tied factorization of the covariance matrices in GMMs is a powerful approach for coping with the challenges of common GMMs when faced with high-dimensional data and complex densities which often demand a large number of Gaussian components. However, the expectation-maximization algorithm for fitting flexibly-tied GMMs still encounters difficulties with streaming and very large dimensional data. To overcome these challenges, this paper suggests the use of first-order stochastic optimization algorithms. Specifically, we propose a new stochastic optimization algorithm on the manifold of orthogonal matrices. Through numerous empirical results on both synthetic and real datasets, we observe that stochastic optimization methods can outperform the expectation-maximization algorithm in terms of attaining better likelihood, needing fewer epochs for convergence, and consuming less time per each epoch.	翻訳日:2023-11-14 22:41:15 公開日:2023-11-11
# 大規模言語モデルで自然発生した心の理論 Theory of Mind Might Have Spontaneously Emerged in Large Language Models ( http://arxiv.org/abs/2302.02083v5 ) ライセンス: Link先を確認	Michal Kosinski	(参考訳) 我々は、心の理論(ToM)、すなわち観察不能な精神状態を他者に帰属させる人間特有の能力が、大規模言語モデル(LLM)に自然に出現した可能性を探る。 ヒトのToMをテストする際のゴールドスタンダードとされる偽信念タスクを40個設計し、複数のLLMに実施した。各タスクには、偽信念シナリオ、密接に対応する3つの真信念コントロール、およびこれら4つすべての逆バージョンが含まれていた。より小型で古いモデルはタスクを全く解けなかった。 GPT-3-davinci-003(2022年11月)とChatGPT-3.5-turbo(2023年3月)は20%のタスクを解き、ChatGPT-4(2023年6月)は75%のタスクを解き、過去の研究で観察された6歳児のパフォーマンスと一致した。これらの結果から,これまでヒトに排他的と考えられていたToMが,LLMの言語能力向上の副産物として自然に出現した可能性が示唆された。 We explore the intriguing possibility that theory of mind (ToM), or the uniquely human ability to impute unobservable mental states to others, might have spontaneously emerged in large language models (LLMs). We designed 40 false-belief tasks, considered a gold standard in testing ToM in humans, and administered them to several LLMs. Each task included a false-belief scenario, three closely matched true-belief controls, and the reversed versions of all four. Smaller and older models solved no tasks; GPT-3-davinci-003 (from November 2022) and ChatGPT-3.5-turbo (from March 2023) solved 20% of the tasks; ChatGPT-4 (from June 2023) solved 75% of the tasks, matching the performance of six-year-old children observed in past studies. These findings suggest the intriguing possibility that ToM, previously considered exclusive to humans, may have spontaneously emerged as a byproduct of LLMs' improving language skills.	翻訳日:2023-11-14 22:29:23 公開日:2023-11-11
# MS-DETR:低結合核融合型マルチスペクトル歩行者検出変換器とモードベース最適化 MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization ( http://arxiv.org/abs/2302.00290v3 ) ライセンス: Link先を確認	Yinghui Xing, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang	(参考訳) 可視・熱変調は特に低照度条件下で相補的な情報を提供することができるため、多スペクトル歩行者検出は、多くの時空応用にとって重要な課題である。利用可能なマルチスペクトル歩行者検出装置のほとんどが非エンド・ツー・エンド検出器に基づいているが,本稿ではマルチスペクトル歩行者検出用トランスフォーマ(ms-detr)を提案し,detrをマルチモーダル検出の分野に拡張する。 ms-detrは2つのモダリティ固有のバックボーンとトランスエンコーダで構成され、続いてマルチモーダルトランスフォーマデコーダがあり、可視性と熱的特徴はマルチモーダルトランスフォーマデコーダで融合される。マルチモーダル画像間の不一致によく抵抗するため,マルチモーダル特徴のキーポイントを個別に抽出し,適応的に学習した注意重みでそれらを融合することにより,疎結合な融合戦略を設計する。さらに、異なるモダリティだけでなく、異なる歩行者インスタンスが最終検出のために異なる信頼度スコアを持つ傾向があるという知見に基づいて、可視およびサーマルデコーダの分岐を保存し、インスタンス毎の動的損失を通じて予測スロットを整列するインスタンス対応モダリティバランス最適化戦略を提案する。我々のエンドツーエンドMS-DETRは、挑戦的なKAIST、CVC-14、LLVIPベンチマークデータセットよりも優れた性能を示している。ソースコードはhttps://github.com/YinghuiXing/MS-DETR で公開されている。 Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Most of the available multispectral pedestrian detectors are based on non-end-to-end detectors, while in this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector, which extends DETR into the field of multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. 
Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR .	翻訳日:2023-11-14 22:28:41 公開日:2023-11-11
# 言語モデルは後続のプロンプトで人間より悪いか? 複雑です Are Language Models Worse than Humans at Following Prompts? It's Complicated ( http://arxiv.org/abs/2301.07085v2 ) ライセンス: Link先を確認	Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick	(参考訳) プロンプトは言語モデルのゼロショットと少数ショットのパフォーマンスの進歩の中心である。しかし、最近の研究では、意図的な無関係や誤解を招くプロンプトが与えられた場合、モデルは驚くほどうまく機能することがわかった。このような結果は、モデル行動が「人間らしくない」という証拠として解釈できる。本研究は,病的指示が与えられた場合,人間は良く行動する,という研究の中心的な前提に挑戦する。人間は無関係な指示を確実に無視することができ、従ってモデルのように、要求されるタスクに関する信号が明らかに不足しているにもかかわらず、基礎となるタスクでうまく機能する。しかし、故意に誤解を招く指示を受けると、人間は忠実に指示に従うが、モデルは従わない。今後の研究は、モノリスとしての人間の行動を理想化すべきではなく、人間の行動を実証的に検証することなく、これらの行動に関する仮定を模倣するモデルを訓練・評価すべきではない。 Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Our findings caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically.	翻訳日:2023-11-14 22:27:19 公開日:2023-11-11
# 多元アノテーションによるロバストな医用画像セグメンテーションの学習 Learning Robust Medical Image Segmentation from Multi-source Annotations ( http://arxiv.org/abs/2304.00466v2 ) ライセンス: Link先を確認	Yifeng Wang, Luyang Luo, Mingxiang Wu, Qiong Wang and Hao Chen	(参考訳) 複数の独立したソースからアノテーションを収集することで、単一のソースからの潜在的なノイズやバイアスの影響を軽減することができる。マルチソースアノテーションからセグメンテーションネットワークを学習することは、アノテーションのばらつきと画像の品質がもたらす不確実性のため、依然として課題である。本稿では,画素レベルと画像レベルの両方における不確実性推定によるトレーニングプロセスを導く,不確実性誘導型多元アノテーションネットワーク(uma-net)を提案する。まず,アノテーションの不確実性評価モジュール(AUEM)を開発し,各アノテーションの画素単位の不確かさを学習し,重み付きセグメンテーション損失による信頼画素からの学習をネットワークに誘導した。第2に,評価済みアノテーションの不確実性に基づいて,入力サンプルの画質を評価する品質評価モジュール(QAM)を提案した。重要となるのは, 廃棄する代わりに, 低品質のサンプルから学習するための補助的予測器を導入することで, 主予測器にエラーを直接蓄積することなく, その表現知識をバックボーンに保存することであった。 2次元胸部X線セグメンテーション,眼底画像セグメンテーション,3次元胸部DCE-MRIセグメンテーションなど,様々なデータセットに対するUMA-Netの有効性と有用性を示した。 Collecting annotations from multiple independent sources could mitigate the impact of potential noises and biases from a single source, which is a common practice in medical image segmentation. Learning segmentation networks from multi-source annotations remains a challenge due to the uncertainties brought by the variance of annotations and the quality of images. In this paper, we propose an Uncertainty-guided Multi-source Annotation Network (UMA-Net), which guides the training process by uncertainty estimation at both the pixel and the image levels. First, we developed the annotation uncertainty estimation module (AUEM) to learn the pixel-wise uncertainty of each annotation, which then guided the network to learn from reliable pixels by weighted segmentation loss. Second, a quality assessment module (QAM) was proposed to assess the image-level quality of the input samples based on the former assessed annotation uncertainties. 
Importantly, we introduced an auxiliary predictor to learn from the low-quality samples instead of discarding them, which ensured the preservation of their representation knowledge in the backbone without directly accumulating errors within the primary predictor. Extensive experiments demonstrated the effectiveness and feasibility of our proposed UMA-Net on various datasets, including 2D chest X-ray segmentation, fundus image segmentation, and 3D breast DCE-MRI segmentation.	翻訳日:2023-11-14 22:20:28 公開日:2023-11-11
# 人間反応データからの最適・プライベート学習 Optimal and Private Learning from Human Response Data ( http://arxiv.org/abs/2303.06234v2 ) ライセンス: Link先を確認	Duc Nguyen and Anderson Y. Zhang	(参考訳) 項目応答理論 (IRT) は、人々が確率的意思決定を行う方法の研究であり、教育試験やレコメンデーションシステムなどに様々な応用がある。 IRTにおける最も基本的なモデルの1つであるバイナリ応答データのラッシュモデルは、重要な実践的重要性を持つ研究の活発な領域である。最近、Nguyen と Zhang (2022) は、効率的かつ正確な新しいスペクトル推定アルゴリズムを提案した。本研究では2つの重要な方法で結果を拡張する。まず,スペクトルアルゴリズムにおいて,「平均誤差」$\ell_2$バウンドを補足する改良されたエントリワイド誤差を求める。特に、軽度のサンプリング条件下では、スペクトルアルゴリズムは最小誤差境界(ログ係数の変調)を達成する。改良された分析に基づいて、スペクトルアルゴリズムは、上位$K$回復のための最適なサンプル複雑さ(例えば、承認/不承認応答データから最高の$K$アイテムを識別する)を享受し、前回の研究の実証的な結果を説明する。第2のコントリビューションでは、IRTで重要だが未検討のトピックであるプライバシーについて取り上げています。 IRTの人間中心の応用にもかかわらず、文献にはプライバシー保護機構が提案されていない。我々は、独自のマルコフ連鎖定式化と離散ガウス機構を利用したスペクトルアルゴリズムのプライベート拡張を開発する(Canonne et al., 2020)。実験により、我々のアプローチは低レベルのプライバシー体制のベースラインよりもはるかに正確であることが示されている。 Item response theory (IRT) is the study of how people make probabilistic decisions, with diverse applications in education testing, recommendation systems, among others. The Rasch model of binary response data, one of the most fundamental models in IRT, remains an active area of research with important practical significance. Recently, Nguyen and Zhang (2022) proposed a new spectral estimation algorithm that is efficient and accurate. In this work, we extend their results in two important ways. Firstly, we obtain a refined entrywise error bound for the spectral algorithm, complementing the 'average error' $\ell_2$ bound in their work. Notably, under mild sampling conditions, the spectral algorithm achieves the minimax optimal error bound (modulo a log factor). Building on the refined analysis, we also show that the spectral algorithm enjoys optimal sample complexity for top-$K$ recovery (e.g., identifying the best $K$ items from approval/disapproval response data), explaining the empirical findings in the previous work. Our second contribution addresses an important but understudied topic in IRT: privacy. 
Despite the human-centric applications of IRT, there has not been any proposed privacy-preserving mechanism in the literature. We develop a private extension of the spectral algorithm, leveraging its unique Markov chain formulation and the discrete Gaussian mechanism (Canonne et al., 2020). Experiments show that our approach is significantly more accurate than the baselines in the low-to-moderate privacy regime.	翻訳日:2023-11-14 22:16:58 公開日:2023-11-11
# 対話型テキスト生成 Interactive Text Generation ( http://arxiv.org/abs/2303.00908v3 ) ライセンス: Link先を確認	Felix Faltings and Michel Galley and Baolin Peng and Kianté Brantley and Weixin Cai and Yizhe Zhang and Jianfeng Gao and Bill Dolan	(参考訳) ユーザは毎日、テキスト、画像、コード、その他のエディタと対話する。しかし、ユーザーとエディタ間の対話性を反映した設定では、機械学習モデルをトレーニングすることは滅多にない。これは、実際のユーザによるAIモデルのトレーニングが遅くてコストがかかるだけでなく、これらのモデルが学んだことは、ユーザインターフェースの設計選択に特有のものかもしれないため、理解できる。残念ながらこれは、テキスト、コード、画像生成に関するほとんどの研究が非インタラクティブな設定に焦点を当てていることを意味している。対象テキストに対してモデルを誘導する編集を提供するユーザシミュレータを用いて,実ユーザを巻き込むことなく,対話的に生成モデルを訓練できる新たな対話型テキスト生成タスクを提案する。我々は、Imitation Learningを使ってインタラクティブモデルをトレーニングし、競争力のある非インタラクティブ生成モデルに対する実験により、すべてのモデルにユーザー入力や編集の予算が同じであっても、インタラクティブにトレーニングされたモデルは非インタラクティブモデルよりも優れていることを示す。 Users interact with text, image, code, or other editors on a daily basis. However, machine learning models are rarely trained in the settings that reflect the interactivity between users and their editor. This is understandable as training AI models with real users is not only slow and costly, but what these models learn may be specific to user interface design choices. Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, whereby the model is expected to get everything right without accounting for any input from a user who may be willing to help. We introduce a new Interactive Text Generation task that allows training generation models interactively without the costs of involving real users, by using user simulators that provide edits that guide the model towards a given target text. We train our interactive models using Imitation Learning, and our experiments against competitive non-interactive generation models show that models trained interactively are superior to their non-interactive counterparts, even when all models are given the same budget of user inputs or edits.	翻訳日:2023-11-14 22:16:05 公開日:2023-11-11
# 指数ヒルベルト空間を持たない多体マヨラナブレイディング Many-body Majorana braiding without an exponential Hilbert space ( http://arxiv.org/abs/2303.00761v3 ) ライセンス: Link先を確認	Eric Mascot, Themba Hodge, Dan Crawford, Jasmin Bedow, Dirk K. Morr, Stephan Rachel	(参考訳) majorana zero modes (mzms) で構築された量子ビットは、位相的に保護された量子コンピューティングへの主要な経路である。複数のMZMのブレイディング過程のシミュレーションは超伝導多体系の量子力学に対応する。マヨラナ力学は、他の全ての準粒子の存在と、合理的に大きなシステムサイズの両方で研究することが重要である。本稿では,任意の多体波動関数とその期待値,相関値,重なりを超伝導体の時間発展単粒子状態から計算する方法を提案する。ブレイディングプロセスの品質を追跡するために,マヨラナペアの忠実性,遷移確率,ジョイントパリティを計算する。ブレイディングの成功はブレイドの速度にどのように依存するかを示す。さらに, トポロジカルCNOT2量子ゲートを2量子絡みの例として示す。我々の研究は、Majorana qubitsの多くの理論的実装をテストし分析する道を開く。さらに、この方法は任意の非相互作用超伝導体の動力学を研究するのに使うことができる。 Qubits built out of Majorana zero modes (MZMs) constitute the primary path towards topologically protected quantum computing. Simulating the braiding process of multiple MZMs corresponds to the quantum dynamics of a superconducting many-body system. It is crucial to study the Majorana dynamics both in the presence of all other quasiparticles and for reasonably large system sizes. We present a method to calculate arbitrary many-body wavefunctions as well as their expectation values, correlators and overlaps from time evolved single-particle states of a superconductor, allowing for significantly larger system sizes. We calculate the fidelity, transition probabilities, and joint parities of Majorana pairs to track the quality of the braiding process. We show how the braiding success depends on the speed of the braid. Moreover, we demonstrate the topological CNOT two-qubit gate as an example of two-qubit entanglement. Our work opens the path to test and analyze the many theoretical implementations of Majorana qubits. Moreover, this method can be used to study the dynamics of any non-interacting superconductor.	翻訳日:2023-11-14 22:15:46 公開日:2023-11-11
# ベル実験の解説 An explanation of the Bell experiment ( http://arxiv.org/abs/2305.05299v6 ) ライセンス: Link先を確認	Inge S. Helland	(参考訳) ベル実験は、量子力学の基礎に対する新しいアプローチとして議論されている。基本的なモデルから、どんなオブザーバーの心も何らかの方法で制限されなければならないと結論づけられる: ある文脈では、彼は単に意思決定時に十分な変数を心に保持できない。これはベルの定理の帰結であるが、より広い結果をもたらすようである。 The Bell experiment is discussed in the light of a new approach to the foundation of quantum mechanics. It is concluded from the basic model that the mind of any observer must be limited in some way: In certain contexts, he is simply not able to keep enough variables in his mind when making decisions. This has consequences for Bell's theorem, but it also seems to have wider consequences.	翻訳日:2023-11-14 22:06:29 公開日:2023-11-11
# ロバストツリーアンサンブルの検証可能な学習 Verifiable Learning for Robust Tree Ensembles ( http://arxiv.org/abs/2305.03626v4 ) ライセンス: Link先を確認	Stefano Calzavara, Lorenzo Cazzaro, Giulio Ermanno Pibiri, Nicola Prezza	(参考訳) テスト時の回避攻撃に対する機械学習モデルの堅牢性を検証することは重要な研究課題である。残念なことに、この問題は決定木アンサンブルに対してNPハードであることが証明され、従って特定の入力に対して難解となる。本稿では,多項式時間で動作するセキュリティ検証アルゴリズムを付加した,大規模分散アンサンブルと呼ばれる決定木アンサンブルの制限クラスを同定する。次に,効率的な検証が可能な制限付きモデルクラスのトレーニングを提唱する,verizable learningと呼ばれる新しいアプローチを提案する。我々は,ラベル付きデータから大域的な決定木を自動学習する新しい学習アルゴリズムを設計し,多項式時間でセキュリティ検証を可能にすることにより,このアイデアの利点を示す。公開データセットの実験結果から,我々のアルゴリズムを用いてトレーニングした大域的なアンサンブルが,標準的な商用ハードウェアを用いて数秒で検証可能であることを確認した。さらに、大スプレッドアンサンブルは、非敵対的な設定において許容される精度の損失を犠牲にして、従来の回避攻撃に対するアンサンブルよりも頑丈である。 Verifying the robustness of machine learning models against evasion attacks at test time is an important research problem. Unfortunately, prior work established that this problem is NP-hard for decision tree ensembles, hence bound to be intractable for specific inputs. In this paper, we identify a restricted class of decision tree ensembles, called large-spread ensembles, which admit a security verification algorithm running in polynomial time. We then propose a new approach called verifiable learning, which advocates the training of such restricted model classes which are amenable for efficient verification. We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data, thus enabling its security verification in polynomial time. Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds, using standard commercial hardware. Moreover, large-spread ensembles are more robust than traditional ensembles against evasion attacks, at the cost of an acceptable loss of accuracy in the non-adversarial setting.	翻訳日:2023-11-14 22:06:24 公開日:2023-11-11
# MixPro:プロンプトベース学習のためのシンプルで効果的なデータ拡張 MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning ( http://arxiv.org/abs/2304.09402v2 ) ライセンス: Link先を確認	Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che	(参考訳) プロンプトに基づく学習は、元の入力と所定のテンプレートを組み合わせることで、様々な下流タスクをクローゼ問題として再構成する上で大きな可能性を示してきた。このアプローチは、特に少ない量のデータに基づいてモデルをトレーニングする、数ショットの学習シナリオにおいて、その効果を示す。その成功にもかかわらず、少数のプロンプトベースの学習シナリオで限定されたテンプレートとテキストは、パフォーマンス改善の余地を残している。さらに、既存の手法ではモデルアンサンブルを利用する場合もあるが、計算要求の増加によりモデル効率が低下する可能性がある。これらの問題に対処するため,我々は,バニラ入力テキストとテンプレートの両方を補完する拡張手法であるMixProを紹介する。これをトークンレベル、文レベル、テンプレートレベルのミックスアップ戦略を通じて実装します。 5つの数ショットデータセットの実験結果は、MixProが他の拡張ベースラインよりも優れており、拡張前と比べてモデルパフォーマンスが平均5.08%向上していることを示している。 Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template. This approach demonstrates its effectiveness, especially in few-shot learning scenarios, where the model is trained on a scarce amount of data. Despite its successes, the limited templates and text in few-shot prompt-based learning scenarios leave significant room for performance improvement. Moreover, existing methods sometimes resort to model ensembles, which, while effective, could potentially hamper model efficiency due to increased computational demands. To address these issues, we introduce MixPro, an augmentation method designed to augment both the vanilla input text and the templates. We implement this through the token-level, the sentence-level, and the template-level Mixup strategies. The experimental results on five few-shot datasets show that MixPro outperforms other augmentation baselines, improving model performance by an average of 5.08% compared to before augmentation.	翻訳日:2023-11-14 22:04:31 公開日:2023-11-11
# ヒトiPSC再プログラム成功の早期予測に向けて Towards Early Prediction of Human iPSC Reprogramming Success ( http://arxiv.org/abs/2305.14575v2 ) ライセンス: Link先を確認	Abhineet Singh, Ila Jasra, Omar Mouhammed, Nidheesh Dadheech, Nilanjan Ray, James Shapiro	(参考訳) 本報告では,再生細胞療法の潜在的な供給源であるヒト誘導多能性幹細胞(iPSCs)について,リプログラミング成功の早期自動予測の進歩を述べる。iPSCリプログラミングの成功率は約0.01%から0.1%と極めて低く、安定したiPSC株の作製には多大な労力、時間、費用がかかる。数百万の細胞を培養し、単一の最適なクローンを特定するために複数のクローンの強力な生物学的精査が必要となるためである。多能性の初期段階において、どの細胞が最適なiPSCラインとして成立するかを確実に予測できる能力は、パーソナライズドメディカルへの実用的で費用対効果の高いアプローチである。細胞増殖の経時変化に関する時間的情報はその将来の成長予測に不可欠である。このデータを生成するために,我々はまず,超高分解能顕微鏡を用いて培養中のiPSCの連続時間ラプス撮影を行った。そこで我々は、信頼できる手動識別が可能な後期画像に、細胞の位置とアイデンティティを注釈付けした。次に, 半自動追跡システムを用いてラベルを後方に伝播させ, 成長初期のラベルを得る。最後に、このデータを用いてディープニューラルネットワークをトレーニングし、セルのセグメンテーションと分類を自動実行する。私たちのコードとデータはhttps://github.com/abhineet123/ipsc_predictionで入手できます。 This paper presents advancements in automated early-stage prediction of the success of reprogramming human induced pluripotent stem cells (iPSCs) as a potential source for regenerative cell therapies. The minuscule success rate of iPSC-reprogramming of around 0.01% to 0.1% makes it labor-intensive, time-consuming, and exorbitantly expensive to generate a stable iPSC line, since that requires culturing of millions of cells and intense biological scrutiny of multiple clones to identify a single optimal clone. The ability to reliably predict which cells are likely to establish as an optimal iPSC line at an early stage of pluripotency would therefore be ground-breaking in rendering this a practical and cost-effective approach to personalized medicine. Temporal information about changes in cellular appearance over time is crucial for predicting its future growth outcomes. In order to generate this data, we first performed continuous time-lapse imaging of iPSCs in culture using an ultra-high resolution microscope. We then annotated the locations and identities of cells in late-stage images where reliable manual identification is possible. 
Next, we propagated these labels backwards in time using a semi-automated tracking system to obtain labels for early stages of growth. Finally, we used this data to train deep neural networks to perform automatic cell segmentation and classification. Our code and data are available at https://github.com/abhineet123/ipsc_prediction.	翻訳日:2023-11-14 21:54:42 公開日:2023-11-11
# 代理モデルを用いた深部強化学習エージェントのテスト Testing of Deep Reinforcement Learning Agents with Surrogate Models ( http://arxiv.org/abs/2305.12751v2 ) ライセンス: Link先を確認	Matteo Biagiola, Paolo Tonella	(参考訳) 近年,深層強化学習 (DRL) が研究コミュニティから注目を集めている。この技術は、ゲームプレイから自動運転車やロボティクスといった実践的なコンテキストに移行するため、drlエージェントの品質を評価することが不可欠である。本稿では,このようなエージェントを検索ベースでテストする手法を提案する。 Indagoと呼ばれるツールで実装された我々のアプローチは、DRLトレーニングプロセスから生じる障害環境と非障害環境(すなわちパス)の分類器を訓練する。この分類器は、テスト時に環境におけるdrlエージェントの実行のサロゲートモデルとして使用され、与えられた環境設定がテスト中のdrlエージェントの障害を引き起こす程度を予測する。障害予測は適合関数として機能し、障害環境設定への生成を導くと同時に、障害を露呈する可能性のある構成に対して環境内のdrlエージェントの実行を遅らせることで、計算時間を節約する。実験の結果,我々の検索手法は最先端技術よりもDRLエージェントの失敗率が50%多いことがわかった。さらに、このような障害は平均して78%多様であり、同様に障害構成によって誘発されるDRLエージェントの挙動は74%多様である。 Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years. As the technology moves away from game playing to practical contexts, such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure environment (i.e., pass) configurations resulting from the DRL training process. The classifier is used at testing time as a surrogate model for the DRL agent execution in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function, guiding the generation towards failure environment configurations, while saving computation time by deferring the execution of the DRL agent in the environment to those configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. 
Moreover, such failures are, on average, 78% more diverse; similarly, the behaviors of the DRL agent induced by failure configurations are 74% more diverse.	翻訳日:2023-11-14 21:51:52 公開日:2023-11-11
# StEik: ニューラルサイン付き距離関数の最適化と有限形状表現の安定化 StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation ( http://arxiv.org/abs/2305.18414v3 ) ライセンス: Link先を確認	Huizong Yang, Yuxin Sun, Ganesh Sundaramoorthi, Anthony Yezzi	(参考訳) 形態の暗黙的神経表現(INR)を学習するための新しい知見と新しいパラダイム(StEik)を提案する。特に,INRに符号付き距離関数制約を課すのによく使われるエイコナール損失に光を当てた。ネットワークの表現力が増加するにつれて、最適化は連続極限における偏微分方程式(PDE)に近づき、不安定となることを示す。この不安定性は, 既設のネットワーク最適化において発現し, 再構成表面の不規則性や, 局所的局所最小値への収束を招き, 微妙な幾何学的・位相的構造を捉えることができないことを示す。我々は、現在文献で使われている損失に付加された他の用語が、実際にこれらの不安定性を排除することができるかを分析的に示す。しかし、そのような用語は表面を過度に規則化することができ、微細な形状の表現を妨げている。同様の連続体極限のpde理論に基づき、固有不安定性は相反するが過剰正規化はしない新しい正規化項を導入する。さらに, 安定度は連続限界で保証されているため, この安定化により, より微細な形状の細部を表現できる新しいネットワーク構造も検討できる。このような構造を二次層に導入する。複数のベンチマークデータセットの実験により、我々の新しい正規化とネットワークは、既存の最先端技術よりも正確な形状の詳細と正確なトポロジを捉えることができることが示された。 We present new insights and a novel paradigm (StEik) for learning implicit neural representations (INR) of shapes. In particular, we shed light on the popular eikonal loss used for imposing a signed distance function constraint in INR. We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable. We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima, and thus fails to capture fine geometric and topological structure. We show analytically how other terms added to the loss, currently used in the literature for other purposes, can actually eliminate these instabilities. However, such terms can over-regularize the surface, preventing the representation of fine shape detail. Based on a similar PDE theory for the continuum limit, we introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing. 
Furthermore, since stability is now guaranteed in the continuum limit, this stabilization also allows for considering new network structures that are able to represent finer shape detail. We introduce such a structure based on quadratic layers. Experiments on multiple benchmark data sets show that our new regularization and network are able to capture more precise shape details and more accurate topology than existing state-of-the-art.	翻訳日:2023-11-14 21:40:00 公開日:2023-11-11
# GlyphControl:ビジュアルテキスト生成のためのグリフ条件制御 GlyphControl: Glyph Conditional Control for Visual Text Generation ( http://arxiv.org/abs/2305.18259v2 ) ライセンス: Link先を確認	Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen	(参考訳) 近年,コヒーレントでよく表現されたビジュアルテキストを生成できる拡散型テキスト対画像生成モデルの開発が注目されている。本稿では,この課題に対処するために,GlyphControlという新しい,効率的な手法を提案する。 ByT5のような文字認識型テキストエンコーダに依存し、テキスト・ツー・イメージモデルの再訓練を必要とする既存の方法とは異なり、本手法ではグリフ条件情報を活用して、正確なビジュアルテキストを生成する際に、既製の安定拡散モデルの性能を向上させる。 glyph命令を組み込むことで、ユーザーは特定の要求に応じて生成されたテキストの内容、場所、サイズをカスタマイズできる。視覚テキスト生成のさらなる研究を容易にするため,LAION-Glyphと呼ばれるトレーニングベンチマークデータセットを構築した。提案手法の有効性を,OCRに基づく測定値,CLIPスコア,FIDを用いて評価した。 GlyphControl は OCR の精度,CLIP スコア,FID の点で近年の DeepFloyd IF アプローチよりも優れており,本手法の有効性が示された。 Recently, there has been an increasing interest in developing diffusion-based text-to-image generative models capable of generating coherent and well-formed visual text. In this paper, we propose a novel and efficient approach called GlyphControl to address this task. Unlike existing methods that rely on character-aware text encoders like ByT5 and require retraining of text-to-image models, our approach leverages additional glyph conditional information to enhance the performance of the off-the-shelf Stable-Diffusion model in generating accurate visual text. By incorporating glyph instructions, users can customize the content, location, and size of the generated text according to their specific requirements. To facilitate further research in visual text generation, we construct a training benchmark dataset called LAION-Glyph. We evaluate the effectiveness of our approach by measuring OCR-based metrics, CLIP score, and FID of the generated visual text. Our empirical evaluations demonstrate that GlyphControl outperforms the recent DeepFloyd IF approach in terms of OCR accuracy, CLIP score, and FID, highlighting the efficacy of our method.	翻訳日:2023-11-14 21:39:36 公開日:2023-11-11
# NashFormer: 局所的なNash平衡を利用した意味的多元性軌道予測 NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction ( http://arxiv.org/abs/2305.17600v3 ) ライセンス: Link先を確認	Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman	(参考訳) 道路エージェント間の相互作用は、特に複数のエージェントを含む場合において、軌道予測において重要な課題となる。既存の多様性を考慮した予測器はマルチエージェント予測のインタラクティブな性質を考慮しないため、これらの重要な相互作用の結果を見逃す可能性がある。本稿では,マルチモーダル予測のカバレッジ向上のために,ゲーム理論の逆強化学習を活用する軌道予測フレームワークであるNashFormerを提案する。トレーニング時間ゲーム理論解析を補助的損失として用いて,エージェントの行動の分類を仮定することなく,カバレッジと精度を向上させる。 Waymo Open Motion Datasetのインタラクティブな分割について,対話性の高いシナリオを含む4つのサブセットを含む,私たちのアプローチを実証する。実験の結果,予測器はベースラインモデルよりも$33\%$多くの潜在的な相互作用をカバーしつつ,正確な予測を行うことがわかった。 Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering $33\%$ more potential interactions versus a baseline model.	翻訳日:2023-11-14 21:38:55 公開日:2023-11-11
# 領域一般化における不均一性の定量的測定と対比 Quantitatively Measuring and Contrastively Exploring Heterogeneity for Domain Generalization ( http://arxiv.org/abs/2305.15889v3 ) ライセンス: Link先を確認	Yunze Tong, Junkun Yuan, Min Zhang, Didi Zhu, Keli Zhang, Fei Wu, Kun Kuang	(参考訳) ドメイン一般化(dg、domain generalization)は、実世界のアプリケーションでよく見られる問題であり、複数のソースドメインを利用することで、対象とするドメインの well-generalized モデルを訓練することを目的としている。ドメインラベル、すなわち、各データポイントがサンプリングされたドメインが自然に存在するため、ほとんどのdgアルゴリズムは、それらを一般化性能を改善するための監督情報の一種として扱う。しかし、元のドメインラベルはドメインの不均一性の欠如、すなわちドメイン間の多様性のため、最適な監視信号ではないかもしれない。例えば、あるドメインのサンプルは別のドメインに近い場合があり、その元のラベルは一般化学習を妨げるノイズとなる可能性がある。ドメインを再分割し、新たに生成された分割パターンを適用することでそれを解こうとする手法もあるが、不均一性の計量が欠如しているため、それらが選択するパターンは最も異種でないかもしれない。本稿では、ドメインの不均一性は主に不変学習フレームワークの下での変分特徴にあることを指摘する。対照的な学習法により,学習の変分的特徴を促進させることにより,ドメインの不均一性を学習可能な指標を提案する。次に, 分散に基づく不均一性を求めることと, 学習不変性に基づく一般化モデルの違いに着目する。そこで本研究では,DGタスクのための異種性に基づく二段階コントラスト学習(HTCL)を提案する。第一段階では、最も異質な分割パターンを対照的な計量で生成する。第2段階では、ドメインやクラスが示唆する安定した関係とペアを再構築し、生成したドメインラベルを一般化学習に有効活用することで、不変性を考慮したコントラスト学習を行う。広範囲な実験により、htclは異質性をよりよく掘り出し、大きな一般化性能をもたらすことが示されている。 Domain generalization (DG) is a prevalent problem in real-world applications, which aims to train well-generalized models for unseen target domains by utilizing several source domains. Since domain labels, i.e., which domain each data point is sampled from, naturally exist, most DG algorithms treat them as a kind of supervision information to improve the generalization performance. However, the original domain labels may not be the optimal supervision signal due to the lack of domain heterogeneity, i.e., the diversity among domains. For example, a sample in one domain may be closer to another domain, its original label thus can be the noise to disturb the generalization learning. Although some methods try to solve it by re-dividing domains and applying the newly generated dividing pattern, the pattern they choose may not be the most heterogeneous due to the lack of the metric for heterogeneity. 
In this paper, we point out that domain heterogeneity mainly lies in variant features under the invariant learning framework. With contrastive learning, we propose a learning potential-guided metric for domain heterogeneity by promoting learning variant features. Then we notice the differences between seeking variance-based heterogeneity and training invariance-based generalizable model. We thus propose a novel method called Heterogeneity-based Two-stage Contrastive Learning (HTCL) for the DG task. In the first stage, we generate the most heterogeneous dividing pattern with our contrastive metric. In the second stage, we employ an invariance-aimed contrastive learning by re-building pairs with the stable relation hinted by domains and classes, which better utilizes generated domain labels for generalization learning. Extensive experiments show HTCL better digs heterogeneity and yields great generalization performance.	翻訳日:2023-11-14 21:37:54 公開日:2023-11-11
# Fermionic Simulators for Enhanced Scalability of Variational Quantum Simulation ( http://arxiv.org/abs/2306.14842v2 ) License: see link	Qingyu Li, Chiranjib Mukhopadhyay, Abolfazl Bayat	Near-term quantum simulators are mostly based on qubit-based architectures. However, their imperfect nature significantly limits their practical application. The situation is even worse for simulating fermionic systems, which underlie most of materials science and chemistry, as one has to adopt fermion-to-qubit encodings, which create significant additional resource overhead and trainability issues. Thanks to recent advances in trapping and manipulation of neutral atoms in optical tweezers, digital fermionic quantum simulators are becoming viable. A key question is whether these emerging fermionic simulators can outperform qubit-based simulators for characterizing strongly correlated electronic systems. Here, we perform a comprehensive comparison of resource efficiency between qubit and fermionic simulators for variational ground-state emulation of fermionic systems in both condensed matter systems and quantum chemistry problems. We show that the fermionic simulators indeed outperform their qubit counterparts with respect to resources for quantum evolution (circuit depth), as well as classical optimization (number of required parameters and iterations). In addition, they show less sensitivity to the random initialization of the circuit. The relative advantage of fermionic simulators becomes even more pronounced as interaction becomes stronger, or tunneling is allowed in more than one dimension, as well as for spinful fermions. Importantly, this improvement is scalable, i.e., the performance gap between fermionic and qubit simulators only grows for bigger system sizes.	Translated: 2023-11-14 21:29:56 Published: 2023-11-11
# Controllable fusion of electromagnetic bosons in two-dimensional semiconductors ( http://arxiv.org/abs/2306.14225v2 ) License: see link	Sergueï V. Andreev	We propose a physical principle for implementation of controllable interactions of identical electromagnetic bosons (excitons or polaritons) in two-dimensional (2D) semiconductors. The key ingredients are tightly bound biexcitons and in-plane anisotropy of the host structure due to, e.g., a uniaxial strain. We show that anisotropy-induced splitting of the radiative exciton doublet couples the biexciton state to continua of boson scattering states. As a result, two-body elastic scattering of bosons may be resonantly amplified when energetically tuned close to the biexciton by applying a transverse magnetic field or tuning the coupling with the microcavity photon mode. At the resonance, bosonic fields undergo a quantum fusion reaction accompanied by their squeezing. For excitons, we predict giant molecules (Feshbach dimers) which can be obtained from a biexciton via rapid adiabatic sweeping of the magnetic field across the resonance. The molecules possess non-trivial entanglement properties. Our proposal holds promise for strongly correlated photonics and quantum chemistry of light.	Translated: 2023-11-14 21:29:31 Published: 2023-11-11
# Contextuality, Coherences, and Quantum Cheshire Cats ( http://arxiv.org/abs/2307.06583v2 ) License: see link	Jonte R. Hance, Ming Ji, Holger F. Hofmann	We analyse the quantum Cheshire cat using contextuality theory, to see if this can tell us anything about how best to interpret this paradox. We show that this scenario can be analysed using the relation between three different measurements, which seem to result in a logical contradiction. We discuss how this contextual behaviour links to weak values, and coherences between prohibited states. Rather than showing that a property of the particle is disembodied, the quantum Cheshire cat instead demonstrates the effects of these coherences, which are typically found in pre- and postselected systems.	Translated: 2023-11-14 21:17:46 Published: 2023-11-11
# Resonance-dominant optomechanical entanglement in open quantum systems ( http://arxiv.org/abs/2307.12383v2 ) License: see link	Cheng Shang and Hongchao Li	Motivated by entanglement protection, our work utilizes a resonance effect to enhance optomechanical entanglement in the coherent-state representation. We propose a filtering model to filter out the significant detuning components between a thermal-mechanical mode and its surrounding heat baths in the weak coupling limit. We reveal that protecting continuous-variable entanglement involves the elimination of degrees of freedom associated with significant detuning components, thereby resisting decoherence. We construct a nonlinear Langevin equation of the filtering model and numerically show that the filtering model doubles the robustness of the stationary maximum optomechanical entanglement to the thermal fluctuation noise and mechanical damping. Furthermore, we generalize these results to an optical cavity array with one oscillating end-mirror to investigate the long-distance optimal optomechanical entanglement transfer. Our study breaks new ground for applying the resonance effect to protect quantum systems from decoherence and advancing the possibilities of large-scale quantum information processing and quantum network construction.	Translated: 2023-11-14 21:04:58 Published: 2023-11-11
# An In-Depth Evaluation of Federated Learning on Biomedical Natural Language Processing ( http://arxiv.org/abs/2307.11254v2 ) License: see link	Le Peng, Gaoxiang Luo, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, Ju Sun	Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). However, the medical field faces challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring data privacy. In this study, we evaluated FL on 2 biomedical NLP tasks encompassing 8 corpora using 6 LMs. Our results show that: 1) FL models consistently outperformed models trained on individual clients' data and sometimes performed comparably with models trained with pooled data; 2) with a fixed total amount of data, FL models trained with more clients produced inferior performance, but pre-trained transformer-based models exhibited great resilience; and 3) FL models significantly outperformed large language models using zero-/one-shot learning and offered lightning-fast inference speed.	Translated: 2023-11-14 21:03:00 Published: 2023-11-11
# Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges ( http://arxiv.org/abs/2308.10407v2 ) License: see link	Vishnu Pandi Chellapandi and Liangqi Yuan and Christopher G. Brinton and Stanislaw H Zak and Ziran Wang	Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously securing local vehicle data privacy and security. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed, and potential directions for future investigation to further enhance the effectiveness and efficiency of FL in the context of CAV are discussed.	Translated: 2023-11-14 20:53:03 Published: 2023-11-11
# RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model ( http://arxiv.org/abs/2308.05345v3 ) License: see link	Yao Lu, Shang Liu, Qijun Zhang, Zhiyao Xie	Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, the target designs are all relatively simple, small in scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness, without evaluating the design quality of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM, for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals, named the syntax goal, the functionality goal, and the design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 in our proposed benchmark.	Translated: 2023-11-14 20:50:44 Published: 2023-11-11
# Photonic entanglement with accelerated light ( http://arxiv.org/abs/2308.01764v3 ) License: see link	R. C. Souza Pimenta, G. H. dos Santos, A. B. Barreto, L. C. Celeri and P. H. Souto Ribeiro	Accelerated light has been demonstrated with laser light and diffraction. Within the diffracting field it is possible to identify a portion that carries most of the beam energy and propagates along a curved trajectory, as if it were accelerated by, for instance, a gravitational field. Here, we analyze the effects of this kind of acceleration on the entanglement between twin beams produced in spontaneous parametric down-conversion. Our results show that acceleration does not affect entanglement significantly under ideal conditions. The optical scheme introduced can be useful in the understanding of processes at the boundary between gravitation and quantum physics.	Translated: 2023-11-14 20:50:10 Published: 2023-11-11
# Spectral Theory for Non-linear Superconducting Microwave Systems: Extracting Relaxation Rates and Mode Hybridization ( http://arxiv.org/abs/2309.03435v2 ) License: see link	Dung N. Pham, Richard D. Li, Hakan E. Türeci	The accurate modeling of mode hybridization and calculation of radiative relaxation rates have been crucial to the design and optimization of superconducting quantum devices. In this work, we introduce a spectral theory for the electrohydrodynamics of superconductors that enables the extraction of the relaxation rates of excitations in a general three-dimensional distribution of superconducting bodies. Our approach addresses the long-standing problem of formulating a modal description of open systems that is both efficient and allows for second quantization of the radiative hybridized fields. This is achieved through the implementation of finite but transparent boundaries through which radiation can propagate into and out of the computational domain. The resulting spectral problem is defined within a coarse-grained formulation of the electrohydrodynamical equations that is suitable for the analysis of the non-equilibrium dynamics of multiscale superconducting quantum systems.	Translated: 2023-11-14 20:39:40 Published: 2023-11-11
# DRL-Based Trajectory Tracking for Motion-Related Modules in Autonomous Driving ( http://arxiv.org/abs/2308.15991v2 ) License: see link	Yinda Xu, Lidong Yu	Autonomous driving systems are always built on motion-related modules such as the planner and the controller. An accurate and robust trajectory tracking method is indispensable for these motion-related modules as a primitive routine. Current methods often make strong assumptions about the model, such as the context and the dynamics, and are not robust enough to deal with the changing scenarios in a real-world system. In this paper, we propose a Deep Reinforcement Learning (DRL)-based trajectory tracking method for the motion-related modules in autonomous driving systems. The representation learning ability of DL and the exploration nature of RL bring strong robustness and improve accuracy. Meanwhile, the method enhances versatility by running the trajectory tracking in a model-free and data-driven manner. Through extensive experiments, we demonstrate both the efficiency and effectiveness of our method compared to current methods.	Translated: 2023-11-14 20:38:16 Published: 2023-11-11
# Graph Meets LLMs: Towards Large Graph Models ( http://arxiv.org/abs/2308.14522v2 ) License: see link	Ziwei Zhang, Haoyang Li, Zeyang Zhang, Yijian Qin, Xin Wang, Wenwu Zhu	Large models have emerged as the most recent groundbreaking achievements in artificial intelligence, and particularly machine learning. However, when it comes to graphs, large models have not achieved the same level of success as in other fields, such as natural language processing and computer vision. To promote the application of large models to graphs, we present a perspective paper to discuss the challenges and opportunities associated with developing large graph models. First, we discuss the desired characteristics of large graph models. Then, we present detailed discussions from three key perspectives: representation basis, graph data, and graph models. In each category, we provide a brief overview of recent advances and highlight the remaining challenges together with our visions. Finally, we discuss valuable applications of large graph models. We believe this perspective can encourage further investigations into large graph models, ultimately pushing us one step closer towards artificial general intelligence (AGI). We are the first to comprehensively study large graph models, to the best of our knowledge.	Translated: 2023-11-14 20:37:33 Published: 2023-11-11
# Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion ( http://arxiv.org/abs/2308.12469v2 ) License: see link	Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco	Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation for any image. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at https://sites.google.com/view/diffseg/home.	Translated: 2023-11-14 20:36:45 Published: 2023-11-11
# FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets ( http://arxiv.org/abs/2310.04793v2 ) License: see link	Neng Wang, Hongyang Yang, Christina Dan Wang	In the swiftly expanding domain of Natural Language Processing (NLP), the potential of GPT-based models for the financial sector is increasingly evident. However, the integration of these models with financial datasets presents challenges, notably in determining their adeptness and relevance. This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models, specifically adapted for financial contexts. Through this methodology, we capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration. We begin by explaining the Instruction Tuning paradigm, highlighting its effectiveness for immediate integration. The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression. Firstly, we assess basic competencies and fundamental tasks, such as Named Entity Recognition (NER) and sentiment analysis, to enhance specialization. Next, we delve into a comprehensive model, executing multi-task operations by amalgamating all instructional tunings to examine versatility. Finally, we explore the zero-shot capabilities by earmarking unseen tasks and incorporating novel datasets to understand adaptability in uncharted terrains. Such a paradigm fortifies the principles of openness and reproducibility, laying a robust foundation for future investigations in open-source financial large language models (FinLLMs).	Translated: 2023-11-14 20:28:36 Published: 2023-11-11
# Large Language Models for Software Engineering: Survey and Open Problems ( http://arxiv.org/abs/2310.03533v4 ) License: see link	Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang	This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.	Translated: 2023-11-14 20:27:49 Published: 2023-11-11
# Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference ( http://arxiv.org/abs/2310.03184v2 ) License: see link	Zachary Levonian, Chenglu Li, Wangda Zhu, Anoushka Gade, Owen Henkel, Millie-Ellen Postle, Wanli Xing	For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process, including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context, such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources.	Translated: 2023-11-14 20:26:52 Published: 2023-11-11
# Quantum Entanglement Phase Transitions and Computational Complexity: Insights from Ising Models ( http://arxiv.org/abs/2310.01699v2 ) License: see link	Hanchen Liu, Vikram Ravindranath, and Xiao Chen	In this paper, we construct 2-dimensional bipartite cluster states and perform single-qubit measurements on the bulk qubits. We explore the entanglement scaling of the unmeasured 1-dimensional boundary state and show that under certain conditions, the boundary state can undergo a volume-law to area-law entanglement transition driven by variations in the measurement angle. We bridge this boundary state entanglement transition and the measurement-induced phase transition in the non-unitary 1+1-dimensional circuit via the transfer matrix method. We also explore the application of this entanglement transition to computational complexity problems. Specifically, we establish a relation between the boundary state entanglement transition and the sampling complexity of the bipartite 2d cluster state, which is directly related to the computational complexity of the corresponding Ising partition function with complex parameters. By examining the boundary state entanglement scaling, we numerically identify the parameter regime for which the 2d quantum state can be efficiently sampled, which indicates that the Ising partition function can be evaluated efficiently in such a region.	Translated: 2023-11-14 20:26:04 Published: 2023-11-11
# De-SaTE: Denoising Self-attention Transformer Encoders for Li-ion Battery Health Prognostics ( http://arxiv.org/abs/2310.00023v2 ) License: see link	Gaurav Shinde, Rohan Mohapatra, Pooja Krishan and Saptarshi Sengupta	The usage of Lithium-ion (Li-ion) batteries has gained widespread popularity across various industries, from powering portable electronic devices to propelling electric vehicles and supporting energy storage systems. A central challenge in Li-ion battery reliability lies in accurately predicting their Remaining Useful Life (RUL), which is a critical measure for proactive maintenance and predictive analytics. This study presents a novel approach that harnesses the power of multiple denoising modules, each trained to address specific types of noise commonly encountered in battery data. Specifically, a denoising auto-encoder and a wavelet denoiser are used to generate encoded/decomposed representations, which are subsequently processed through dedicated self-attention transformer encoders. After extensive experimentation on NASA and CALCE data, a broad spectrum of health indicator values is estimated under a set of diverse noise patterns. The reported error metrics on these data are on par with or better than the state-of-the-art reported in recent literature.	Translated: 2023-11-14 20:25:46 Published: 2023-11-11
# PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection ( http://arxiv.org/abs/2310.11676v2 ) License: see link	Junjun Pan, Yixin Liu, Yizhen Zheng, Shirui Pan	Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in various domains such as medicine, social networks, and e-commerce. However, challenges have arisen due to the diversity of anomalies and the dearth of labeled data. Existing methodologies, both reconstruction-based and contrastive learning, while effective, often suffer from efficiency issues stemming from their complex objectives and elaborate modules. To improve the efficiency of GAD, we introduce a simple method termed PREprocessing and Matching (PREM for short). Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities. Comprising two modules, a pre-processing module and an ego-neighbor matching module, PREM eliminates the necessity for message-passing propagation during training and employs a simple contrastive loss, leading to considerable reductions in training time and memory usage. Moreover, through rigorous evaluations on five real-world datasets, our method demonstrated robustness and effectiveness. Notably, when validated on the ACM dataset, PREM achieved a 5% improvement in AUC, a 9-fold increase in training speed, and a sharp reduction in memory usage compared to the most efficient baseline.	Translated: 2023-11-14 20:16:45 Published: 2023-11-11
# Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning ( http://arxiv.org/abs/2310.11670v3 ) License: see link	Hao Zhao, Jie Fu, Zhaofeng He	Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting pre-trained language models to downstream tasks while only updating a small number of parameters. Despite the success, most existing methods independently adapt to each task without considering knowledge transfer between tasks and are limited to low-data regimes. To overcome this issue, we propose Prototype-based HyperAdapter (PHA), a novel framework built on adapter tuning and hypernetworks. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This leads to comparable performance improvements against existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, when the available data size gets smaller, our method outperforms other strong baselines by a large margin. Based on our extensive empirical experiments across various datasets, we demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.	Translated: 2023-11-14 20:16:20 Published: 2023-11-11
# 本を読むのは最高だけど、運転するなら違う! デファシブル・コモンセンス・ノームに関する視覚的根拠に基づく推論 Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms ( http://arxiv.org/abs/2310.10418v2 ) ライセンス: Link先を確認	Seungju Han and Junhyeok Kim and Jack Hessel and Liwei Jiang and Jiwan Chung and Yejin Son and Yejin Choi and Youngjae Yu	(参考訳) 普通は本を読むことは素晴らしいが、車を運転するときにはそうではない。コンテキストは言語で明示的に記述できるが、具体化されたシナリオでは、コンテキストはしばしば視覚的に提供される。この種の視覚的に根ざした、デファシブル・コモンセンス規範に関する推論は、一般に人間にとって容易であるが、(私たちが見せているように)機械にとって、視覚的理解とコモンセンス規範に関する推論の両方を必要とするため、挑戦となる。 NORMLENSというビジュアルグラウンドのコモンセンス規範を研究するための新しいマルチモーダルベンチマークを構築した。 NORMLENSは、2Kマルチモーダル状況に関する自由形式の説明を伴う10K人の人的判断で構成されており、(1)モデルが平均的な人的判断とどの程度一致しているかという2つの疑問に対処するための調査となる。 2)モデルが予測した判断をどの程度説明できるか? 現状のモデル判断や説明は人間のアノテーションとよく一致していないことがわかった。さらに, 大規模言語モデルから社会常識知識を抽出し, モデルと人間との協調性を高めるための新しいアプローチを提案する。データとコードはhttps://seungjuhan.me/normlensでリリースされる。 Commonsense norms are defeasible by context: reading books is usually great, but not when driving a car. While contexts can be explicitly described in language, in embodied scenarios, contexts are often provided visually. This type of visually grounded reasoning about defeasible commonsense norms is generally easy for humans, but (as we show) poses a challenge for machines, as it necessitates both visual understanding and reasoning about commonsense norms. We construct a new multimodal benchmark for studying visual-grounded commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment? and (2) how well can models explain their predicted judgments? We find that state-of-the-art model judgments and explanations are not well-aligned with human annotation. Additionally, we present a new approach to better align models with humans by distilling social commonsense knowledge from large language models. 
The data and code are released at https://seungjuhan.me/normlens.	翻訳日:2023-11-14 20:15:41 公開日:2023-11-11
# ASSERT: 大規模言語モデルのロバスト性評価のための自動安全シナリオred teaming ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models ( http://arxiv.org/abs/2310.09624v2 ) ライセンス: Link先を確認	Alex Mei, Sharon Levy, William Yang Wang	(参考訳) 大規模言語モデルが社会へ統合されるにつれ,高分散環境において信頼性を維持する上で,一組のプロンプトに対する堅牢性がますます重要になってきており,利用者がインテリジェントシステムを呼び出す様々な設定を包括的にカプセル化する必要がある。本稿では,ASSERT(Automated Safety Scenario Red Teaming)を提案する。3つの手法 – セマンティックアライメント,ターゲットブートストラップ,敵対的知識注入 – から構成される。堅牢な安全性評価のために,これらの手法をAI安全の重要な領域に適用し,多種多様なロバスト性設定,関連するシナリオ,敵対的シナリオを含むテストスイートをアルゴリズム的に生成する。このプロンプトを4つの安全領域に分割し、ドメインがモデルの性能に与える影響を詳細に分析する。既存の最先端モデルでは特に安全対策を講じているが,意味的関連シナリオにおける絶対的分類精度の最大11%,ゼロショットの敵意設定では最大19%の絶対エラー率の統計的に有意な性能差が見出され,ユーザの身体的安全性への懸念が高まった。 As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment. Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system. This paper proposes ASSERT, Automated Safety Scenario Red Teaming, consisting of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. For robust safety evaluation, we apply these methods in the critical domain of AI safety to algorithmically generate a test suite of prompts covering diverse robustness settings -- semantic equivalence, related scenarios, and adversarial. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. Despite dedicated safeguards in existing state-of-the-art models, we find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings, raising concerns for users' physical safety.	翻訳日:2023-11-14 20:14:46 公開日:2023-11-11
# 探索を伴わない共同ビームフォーミングのためのRL-Policiesの学習--Batch Constrained Off-Policy アプローチ Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach ( http://arxiv.org/abs/2310.08660v2 ) ライセンス: Link先を確認	Heasung Kim and Sravan Kumar Ankireddy	(参考訳) 本研究では,レート最大化のためのネットワークパラメータ最適化の問題を考える。我々はこれを、電力制御、ビーム形成、干渉キャンセルの連立最適化問題とみなす。複数の基地局(BS)が複数のユーザ機器(UE)と通信する環境を考える。ブルート力探索の指数関数的計算複雑性のため、深部強化学習(RL)技術を用いて、この非凸最適化問題を解く。現代の通信システムは、行動を正確にモデル化することが難しいことで悪名高い。これにより、エージェントが効率的に探索し学習するために必要な環境との相互作用として、RLベースのアルゴリズムを使用することが制限される。さらに、失敗のコストが高いため、探索と学習のために現実世界にアルゴリズムをデプロイすることが不適当である。ディープQネットワーク(DQN)ベースの制御など,従来のRLベースのソリューションとは対照的に,オフラインモデルベースのアプローチを提案する。具体的には、離散バッチ制約深度Q-ラーニング(BCQ)について検討し、DQNに類似した性能を探索することなく、少数のデータで実現できることを示す。これはサンプル効率を最大化し、商用ネットワークに新しいアルゴリズムをデプロイするリスクを最小化する。 https://github.com/Heasung-Kim/safe-rl-deployment-for-5g のリンクで、コードとデータを含むプロジェクトリソース全体を提供します。 In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) communicate with multiple user equipment (UEs). Because of the exponential computational complexity of brute force search, we instead solve this nonconvex optimization problem using deep reinforcement learning (RL) techniques. Modern communication systems are notorious for their difficulty in exactly modeling their behavior. This limits us in using RL-based algorithms as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill-advised to deploy the algorithm in the real world for exploration and learning because of the high cost of failure. In contrast to the previous RL-based solutions proposed, such as deep-Q network (DQN) based control, we suggest an offline model-based approach. 
We specifically consider discrete batch-constrained deep Q-learning (BCQ) and show that performance similar to DQN can be achieved with only a fraction of the data without exploring. This maximizes sample efficiency and minimizes risk in deploying a new algorithm to commercial networks. We provide the entire project resource, including code and data, at the following link: https://github.com/Heasung-Kim/safe-rl-deployment-for-5g.	翻訳日:2023-11-14 20:13:58 公開日:2023-11-11
# CATEモデル選択のための因果Q-集約 Causal Q-Aggregation for CATE Model Selection ( http://arxiv.org/abs/2310.16945v4 ) ライセンス: Link先を確認	Hui Lan, Vasilis Syrgkanis	(参考訳) 条件平均治療効果(CATE)の正確な推定は、パーソナライズされた意思決定の中核にある。 CATE推定には多くのモデルが存在するが、因果推論の根本的な問題のため、モデル選択は非自明な作業である。最近の実証研究は、二重ロバストな特性を持つプロキシ損失メトリクスとモデルアンサンブルを支持する証拠を提供する。しかし、理論的な理解は不足している。事前の理論的研究の直接適用は、モデル選択問題の非凸性に起因する最適オラクルモデル選択率につながる。我々は,既存の主要なcate ensemblingアプローチに対する後悔率を提供し,二重ロバストな損失を用いたq集約に基づく新しいcate モデル ensemblingアプローチを提案する。本結果から, 因果Q-集約は, 誤差関数の積に関する高次推定誤差項を付加することにより, 統計的に最適なオラクルモデル選択残差率$\frac{\log(M)}{n}$(M$モデルと$n$サンプルを含む)が得られることを示した。重要なことは、我々の後悔率は、どの候補CATEモデルも真実に近いものを必要としない。我々は、多くの半合成データセットで新しい手法を検証するとともに、モデル選択をインストゥルメンタル変数と非オブザーブドコンファウンディングで分類する作業の拡張も提供する。 Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. 
We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.	翻訳日:2023-11-14 20:04:17 公開日:2023-11-11
# VQ-NeRF:ベクトル量子化によるニューラルリフレクタンス分解と編集 VQ-NeRF: Neural Reflectance Decomposition and Editing with Vector Quantization ( http://arxiv.org/abs/2310.11864v3 ) ライセンス: Link先を確認	Hongliang Zhong, Jingbo Zhang, Jing Liao	(参考訳) 本研究では,ベクトル量子化(vector quantization, vq)を組み込んだ2分岐ニューラルネットワークモデルであるvq-nerfを提案する。従来のニューラル・リフレクタンス・フィールドは、3Dシーンをモデル化するためにのみ連続表現を使用する。この離散化の欠如は、ノイズのある材料分解と複雑な材料編集をもたらす。これらの制限に対処するため、我々のモデルは連続枝と離散枝からなる。連続枝は従来のパイプラインに従って分解物を予測し、離散枝はVQ機構を用いて連続物質を個別に定量化する。材料を離散化することにより,分解過程におけるノイズを低減し,離散材料のセグメンテーションマップを生成する。セグメンテーション結果の対応する領域をクリックして、さらに編集するための特定材料を容易に選択することができる。さらに,シーン内の材料数を予測するために,ドロップアウトに基づくVQコードワードランキング手法を提案する。ユーザビリティを向上させるために,素材編集を支援するインタラクティブインタフェースも開発している。我々は,コンピュータ生成シーンと実世界のシーンの両方でモデルを評価し,その優れた性能を示す。我々の知る限り、我々のモデルは3Dシーンで個別の素材編集を可能にする最初のモデルである。 We propose VQ-NeRF, a two-branch neural network model that incorporates Vector Quantization (VQ) to decompose and edit reflectance fields in 3D scenes. Conventional neural reflectance fields use only continuous representations to model 3D scenes, despite the fact that objects are typically composed of discrete materials in reality. This lack of discretization can result in noisy material decomposition and complicated material editing. To address these limitations, our model consists of a continuous branch and a discrete branch. The continuous branch follows the conventional pipeline to predict decomposed materials, while the discrete branch uses the VQ mechanism to quantize continuous materials into individual ones. By discretizing the materials, our model can reduce noise in the decomposition process and generate a segmentation map of discrete materials. Specific materials can be easily selected for further editing by clicking on the corresponding area of the segmentation outcomes. Additionally, we propose a dropout-based VQ codeword ranking strategy to predict the number of materials in a scene, which reduces redundancy in the material segmentation process. 
To improve usability, we also develop an interactive interface to further assist material editing. We evaluate our model on both computer-generated and real-world scenes, demonstrating its superior performance. To the best of our knowledge, our model is the first to enable discrete material editing in 3D scenes.	翻訳日:2023-11-14 19:59:35 公開日:2023-11-11
# LLMにおける選択予測改善のための自己評価による適応 Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs ( http://arxiv.org/abs/2310.11689v2 ) ライセンス: Link先を確認	Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha	(参考訳) 大規模言語モデル(LLM)は近年,自然言語理解や生成など,さまざまなタスクにおいて大きな進歩を見せている。しかし、高い意思決定シナリオでの使用は、エラーの可能性があるため、依然として制限されている。選択予測(Selective prediction)とは、LLMの信頼性を向上させるために、答えが不確実な場合には予測を控えることによって使用できる手法である。本研究では, LLMの選択的予測性能を向上させるために, 自己評価による適応のための新しいフレームワークを提案する。本フレームワークは,自己評価能力の向上を図りながら,パラメータ効率のチューニングを用いて,特定のタスクにLLMを適用するという考え方に基づいている。提案手法は,様々な質問応答(QA)データセット上で評価し,最先端の選択予測手法よりも優れていることを示す。例えば、CoQAベンチマークでは、AUACCを91.23%から92.63%に改善し、AUROCを74.61%から80.25%に改善した。 Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.	翻訳日:2023-11-14 19:58:53 公開日:2023-11-11
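The selective prediction idea in the entry above (abstain when unsure) can be sketched in a few lines. This is only an illustration under stated assumptions: the confidence scores and correctness flags are made-up toy values, `selective_accuracy` is a hypothetical helper, and nothing here reproduces the paper's self-evaluation method or its AUACC computation.

```python
def selective_accuracy(confidences, correct, threshold):
    """Return (coverage, accuracy) when answering only if confidence >= threshold.

    confidences: hypothetical per-question confidence scores in [0, 1]
    correct:     1 if the model's answer was right, 0 otherwise
    """
    answered = [c for conf, c in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(correct)
    # If the model abstains on everything, report 0.0 accuracy by convention.
    accuracy = sum(answered) / len(answered) if answered else 0.0
    return coverage, accuracy


# Toy values, assumed for illustration only.
confidences = [0.95, 0.90, 0.60, 0.40, 0.30]
correct = [1, 1, 0, 1, 0]

for threshold in (0.0, 0.5, 0.8):
    print(threshold, selective_accuracy(confidences, correct, threshold))
```

Raising the threshold trades coverage for accuracy on the answered questions; sweeping the threshold traces an accuracy-coverage curve, and a metric such as AUACC summarizes the area under that curve.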
# 医用画像階層型マルチラベル分類のためのTLMCMネットワーク TLMCM Network for Medical Image Hierarchical Multi-Label Classification ( http://arxiv.org/abs/2311.00282v2 ) ライセンス: Link先を確認	Meng Wu, Siyan Luo, Qiyu Wu, Wenbin Ouyang	(参考訳) 医用画像階層的マルチラベル分類(MI-HMC)は、現代医療において最重要であり、データ不均衡と「hierarchy constraint」の2つの重要な課題を提示している。既存のソリューションには複雑なモデルアーキテクチャ設計やドメイン固有の前処理が含まれており、実装にかなりの専門知識や労力を要する。本稿では,MI-HMCタスクのための最大制約モジュール(TLMCM)ネットワークを用いた転送学習を提案する。 TLMCMネットワークは、上記の課題を克服するための新しいアプローチを提供し、平均精度とリコール曲線($AU\overline{(PRC)}$)測定値に基づく既存の手法よりも優れている。さらに、本研究では、MI-HMCタスクの文脈で広く研究されていない2つの新しい精度指標である$EMR$と$HammingAccuracy$を提案する。実験の結果,TLMCMネットワークはMI-HMCタスクに対して高いマルチラベル予測精度($80\%$-$90\%$)を達成し,医療領域アプリケーションに有用な貢献をすることが示された。 Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, presenting two significant challenges: data imbalance and hierarchy constraint. Existing solutions involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or effort in implementation. To address these limitations, this paper proposes Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers a novel approach to overcome the aforementioned challenges, outperforming existing methods based on the Area Under the Average Precision and Recall Curve ($AU\overline{(PRC)}$) metric. In addition, this research proposes two novel accuracy metrics, $EMR$ and $HammingAccuracy$, which have not been extensively explored in the context of the MI-HMC task. Experimental results demonstrate that the TLMCM network achieves high multi-label prediction accuracy ($80\%$-$90\%$) for MI-HMC tasks, making it a valuable contribution to healthcare domain applications.	翻訳日:2023-11-14 19:50:22 公開日:2023-11-11
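The abstract above names $EMR$ and $HammingAccuracy$ without defining them. The sketch below assumes the usual multi-label readings (exact match ratio, and the share of label slots predicted correctly); the paper's own definitions may differ, and the function names and toy labels are illustrative only.

```python
def exact_match_ratio(y_true, y_pred):
    """Share of samples whose predicted label vector matches the truth exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_accuracy(y_true, y_pred):
    """Share of individual label slots predicted correctly, over all samples."""
    correct = sum(ti == pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total


# Toy binary label vectors (3 samples, 4 labels each), assumed for illustration.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]

print(exact_match_ratio(y_true, y_pred))  # only the first sample matches fully
print(hamming_accuracy(y_true, y_pred))   # 10 of 12 label slots are correct
```

Under these assumed definitions the exact match ratio is the stricter of the two, since one wrong label makes the whole sample count as a miss.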
# 単発視覚追跡における画像関連誘導バイアスの活用 Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking ( http://arxiv.org/abs/2310.19542v2 ) ライセンス: Link先を確認	Chuanming Tang, Kai Wang, Joost van de Weijer, Jianlin Zhang, Yongmei Huang	(参考訳) 視覚追跡における最先端のパフォーマンスにもかかわらず、最近のシングルブランチトラッカーは、ビジョントランスフォーマー(ViT)エンコーダと推論パイプラインに関連する、弱い前提を見逃す傾向にある。さらに, 判別トラッカの有効性は, デュアルブランチパイプラインの採用により制限されている。単分岐ネットワークと識別モデルとのギャップを埋めるための適応型ViTモデル予測トラッカー(AViTMP)を提案する。具体的には,提案するエンコーダavit-encにおいて,vitに基づく密組込みパラダイムを豊かにするために,アダプタモジュールとジョイントターゲット状態埋め込みを導入する。次にavit-encと密輸デコーダと判別対象モデルを組み合わせて正確な位置を推定する。さらに,従来の推論手法の限界を緩和するため,双方向のサイクルトラッキング検証により,トラクタの存在下でのロバスト性を向上するCycleTrackという新しい推論パイプラインを提案する。最後に,長期的なシナリオにおいて大きな課題を積極的に処理する,デュアルフレーム更新推論戦略を提案する。実験では,lasot,lasotextsub,avistなどを含む総合評価のための10のトラッキングベンチマークについてavitmpを評価した。実験結果から,AViTMPが最先端の性能,特に長期追跡とロバスト性を達成したことが明らかとなった。 Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch network and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate location. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. 
Lastly, we propose a dual-frame update inference strategy that adeptly handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that AViTMP attains state-of-the-art performance, especially on long-term tracking and robustness.	翻訳日:2023-11-14 19:49:21 公開日:2023-11-11
# デバイアス言語表現モデルにおける保護グループを傷つけるな Do Not Harm Protected Groups in Debiasing Language Representation Models ( http://arxiv.org/abs/2310.18458v2 ) ライセンス: Link先を確認	Chloe Qinyu Zhu, Rickard Stureborg, Brandon Fain	(参考訳) 実世界のデータで訓練された言語表現モデル(LRM)は、望ましくない偏見を捉え、悪化させ、様々な人口集団の人々の不公平な扱いを引き起こす可能性がある。単語埋め込みなどのベンチマーク評価におけるバイアスを取り除くため, LRMに介入する手法がいくつか研究されている。しかし、デバイアス介入の副作用は通常下流タスクでは明らかにされない。本稿では,偏見の公平性を評価するための評価セットであるxGAP-DEBIASを提案する。本研究は,現実のテキスト分類タスクにおける4つのデバイアス手法について検討し,デバイアス化手法が保護を目的としているものを含め,すべての人口集団において,バイアスの低減が性能低下のコストとなることを示す。我々は,保護集団に害を与えないような制約で,デバイアスング技術は下流のパフォーマンスを良くするべきだと主張する。 Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias and cause unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias in benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in the downstream tasks. We propose xGAP-DEBIAS, a set of evaluations on assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should have good downstream performance with the constraint of ensuring no harm to the protected group.	翻訳日:2023-11-14 19:48:30 公開日:2023-11-11
# OpinSummEval: 意見要約のための自動評価の再検討 OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization ( http://arxiv.org/abs/2310.18122v2 ) ライセンス: Link先を確認	Yuchen Shen, Xiaojun Wan	(参考訳) 意見要約は、側面や感情に特有な焦点をあてることから、他の種類の要約タスクとは分離する。 ROUGEのような一部の自動評価手法が人気を博しているが、意見要約の質を評価するには信頼性が低い。本稿では,人間の判断と14の意見要約モデルからの出力からなるデータセットであるopinsummevalを提案する。さらに、4次元にわたる24の自動測定値と人間の評価値の相関について検討する。以上の結果から,ニューラルネットに基づく指標は一般に非ニューラル指標よりも優れていることが示唆された。しかしながら、BART や GPT-3/3.5 のような強力なバックボーン上に構築されたメトリクスでさえ、すべての次元にわたって一貫して相関するわけではなく、意見要約のための自動評価手法の進歩の必要性を強調している。コードとデータはhttps://github.com/A-Chicharito-S/OpinSummEval/tree/mainで公開されている。 Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are publicly available at https://github.com/A-Chicharito-S/OpinSummEval/tree/main.	翻訳日:2023-11-14 19:47:33 公開日:2023-11-11
# マイクロ波シールド超低温分子の熱化への展望 Prospects for thermalization of microwave-shielded ultracold molecules ( http://arxiv.org/abs/2310.17812v2 ) ライセンス: Link先を確認	Reuben R. W. Wang and John L. Bohn	(参考訳) マイクロ波遮蔽極性分子フェルミオン希薄気体における異方性熱分解の研究を行った。しきい値以上の衝突エネルギーについては, 前方散乱の強い好みと全断面のエネルギー低下により熱化が抑制され, 蒸発冷却の効率が著しく低下することがわかった。 Dengらによって導かれる有効ポテンシャルエネルギー面について密結合計算を行う。 [Phys. Rev. 130, 183001 (2023)], 衝突エネルギー範囲にわたって正確な2体弾性差動断面積を得る。ガウス過程回帰(gaussian process regression)を用いて、広い範囲の衝突角とエネルギーにわたって微分断面積の大域的な表現を得る。平衡への経路は、熱化を達成するための衝突効率の尺度によって定量化され、クロス次元再熱化実験によって分析される。 We study anisotropic thermalization in dilute gases of microwave shielded polar molecular fermions. For collision energies above the threshold regime, we find that thermalization is suppressed due to a strong preference for forward scattering and a reduction in total cross section with energy, significantly reducing the efficiency of evaporative cooling. We perform close-coupling calculations on the effective potential energy surface derived by Deng et al. [Phys. Rev. Lett. 130, 183001 (2023)], to obtain accurate 2-body elastic differential cross sections across a range of collision energies. We use Gaussian process regression to obtain a global representation of the differential cross section, over a wide range of collision angles and energies. The route to equilibrium is then analyzed with cross-dimensional rethermalization experiments, quantified by a measure of collisional efficiency toward achieving thermalization.	翻訳日:2023-11-14 19:47:19 公開日:2023-11-11
# 単一スピンにおける非エルミート系の結び目位相の観察 Observation of the Knot Topology of Non-Hermitian Systems in a Single Spin ( http://arxiv.org/abs/2311.03642v2 ) ライセンス: Link先を確認	Yang Wu, Yunhan Wang, Xiangyu Ye, Wenquan Liu, Chang-Kui Duan, Ya Wang, Xing Rong, and Jiangfeng Du	(参考訳) 系の非エルミート性は、エルミート的トポロジーを持たない異なる結び目トポロジーをもたらす。本稿では,長いコヒーレンス時間窒素空洞中心を持つ普遍的希釈法に基づく,ギャップ付き非エルミート系における結び目トポロジーの包括的研究を,$^{\text{12}}$C同位体精製ダイヤモンドで報告する。エネルギーバンドのブレイディングパターンと固有状態トポロジーの両方が明らかにされる。さらに,非エルミート系の位相的不変性を明らかにするため,固有状態トポロジーに関連する大域的双直交(biorthogonal)ベリー位相が観察された。提案手法は,非エルミート量子系におけるバンドブレイディング,固有状態トポロジー,対称性間の相互作用のさらなる探索方法である。 The non-Hermiticity of the system gives rise to distinct knot topology that has no Hermitian counterpart. Here, we report a comprehensive study of the knot topology in gapped non-Hermitian systems based on the universal dilation method with a long coherence time nitrogen-vacancy center in a $^{\text{12}}$C isotope purified diamond. Both the braiding patterns of energy bands and the eigenstate topology are revealed. Furthermore, the global biorthogonal Berry phase related to the eigenstate topology has been successfully observed, which identifies the topological invariance for the non-Hermitian system. Our method paves the way for further exploration of the interplay among band braiding, eigenstate topology and symmetries in non-Hermitian quantum systems.	翻訳日:2023-11-14 19:39:55 公開日:2023-11-11
# 汎用的異常検出と理解に向けて:大規模視覚言語モデル(GPT-4V)がリード Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead ( http://arxiv.org/abs/2311.02782v2 ) ライセンス: Link先を確認	Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, and Weiming Shen	(参考訳) 異常検出は、さまざまなドメインとデータタイプにまたがる重要なタスクである。しかし、既存の異常検出モデルは、しばしば特定の領域とモダリティのために設計される。本研究では,視覚言語モデルであるGPT-4V(ision)を用いて,異常検出タスクを汎用的に処理する。 GPT-4Vのマルチモダリティ,画像,ビデオ,ポイントクラウド,時系列データを含むマルチドメイン異常検出タスクにおいて,産業,医療,論理,ビデオ,3次元異常検出,ローカライズタスクなど,複数のアプリケーション領域にまたがる適用について検討した。 GPT-4Vの性能を高めるために,クラス情報や人的専門知識,参照画像など,さまざまな種類の付加的手がかりをプロンプトとして組み込んで,GPT-4Vは,ゼロ・ワンショット異常検出において,グローバルおよび微粒なセマンティックパターンの検出と説明に極めて有効であることが実証された。これにより、正常例と異常例を正確に区別することができる。本研究では広範な評価を行ったが,GPT-4Vの汎用異常検出能力のさらなる活用には今後の評価が必要である。定量的指標の探索、評価ベンチマークの拡張、マルチラウンドインタラクションの導入、ヒューマンフィードバックループの導入などだ。それにもかかわらず、GPT-4Vは一般的な異常検出と理解において有望な性能を示し、異常検出のための新しい道を開く。 Anomaly detection is a crucial task across different domains and data types. However, existing anomaly detection models are often designed for specific domains and modalities. This study explores the use of GPT-4V(ision), a powerful visual-linguistic model, to address anomaly detection tasks in a generic manner. We investigate the application of GPT-4V in multi-modality, multi-domain anomaly detection tasks, including image, video, point cloud, and time series data, across multiple application areas, such as industrial, medical, logical, video, 3D anomaly detection, and localization tasks. To enhance GPT-4V's performance, we incorporate different kinds of additional cues such as class information, human expertise, and reference images as prompts. Based on our experiments, GPT-4V proves to be highly effective in detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection. This enables accurate differentiation between normal and abnormal instances. 
Although we conducted extensive evaluations in this study, there is still room for future evaluation to further exploit GPT-4V's generic anomaly detection capacity from different aspects. These include exploring quantitative metrics, expanding evaluation benchmarks, incorporating multi-round interactions, and incorporating human feedback loops. Nevertheless, GPT-4V exhibits promising performance in generic anomaly detection and understanding, thus opening up a new avenue for anomaly detection.	翻訳日:2023-11-14 19:39:44 公開日:2023-11-11
# 画像ベースおよび臨床バイオメディシンにおけるマルチモーダル機械学習:調査と展望 Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects ( http://arxiv.org/abs/2311.02332v2 ) ライセンス: Link先を確認	Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, Olivier Gevaert and Arvind Rao	(参考訳) 医療人工知能(AI)システムにおける機械学習(ML)の応用は、伝統的な統計手法からディープラーニングモデルの適用の増加へと移行している。本研究は,マルチモーダルmlの現状を概観し,医療画像解析と臨床意思決定支援システムへの深い影響に注目した。マルチモーダル表現,融合,翻訳,アライメント,コラーニングの課題とイノベーションを強調し,臨床予測のためのマルチモーダルモデルの変換可能性について検討した。また、このようなモデルの実用的な実装に疑問を呈し、意思決定支援システムと医療提供者のダイナミクスに注意を向けている。進歩にもかかわらず、多くの生物医学領域におけるデータバイアスや「ビッグデータ」の不足といった課題が続いている。我々は、失敗をさらに進めるために効果的なイノベーションと協力的努力に関する議論を締めくくった。 Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also questions practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on effective innovation and collaborative efforts to further the miss	翻訳日:2023-11-14 19:38:08 公開日:2023-11-11
# リプレースサンプルを用いた言語モデルのベンチマークと汚染の再検討 Rethinking Benchmark and Contamination for Language Models with Rephrased Samples ( http://arxiv.org/abs/2311.04850v2 ) ライセンス: Link先を確認	Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica	(参考訳) 大規模な言語モデルは、人間が生成したすべてのデータに基づいて、ますます訓練されている。多くの人は、事前トレーニングや微調整データセットの潜在的な汚染のために、公開ベンチマークの信頼性を懸念している。ほとんどのデータ汚染対策は、文字列マッチング(例えばn-gramオーバーラップ)を用いてベンチマークデータを除去するが、これらの手法は不十分であり、単純なテストデータ(例えばパラフレーズ、翻訳)はこれらの汚染対策を簡単に回避できることを示す。さらに, テストデータのばらつきが排除されない場合, 13Bモデルはテストベンチマークに容易に適合し, GPT-4と同等の性能が得られることを示した。我々は、MMLU、GSM8K、HumanEvalなどの広く使われているベンチマークにおいて、そのような観測を検証した。この増大するリスクに対処するために,LLMに基づくより強固な除染法を提案し,広く使用されている事前訓練および微調整データセットに適用し,これまで未知だったテストの重なりを明らかにした。例えば、RedPajama-Data-1TやStarCoder-Dataといった事前トレーニングセットでは、HumanEvalベンチマークの8-18%が重複していることが分かりました。興味深いことに、GPT-3.5/4が生成する合成データセットにもそのような汚染が見られ、意図しない汚染の可能性を示唆している。パブリックなベンチマークを使用する場合、コミュニティはより強い汚染除去アプローチを採用するように促します。さらに,モデルを正確に評価するために,新たなワンタイム試験を積極的に実施するようコミュニティに呼びかける。我々の除染ツールはhttps://github.com/lm-sys/llm-decontaminator で公開されている。 Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets. While most data decontamination efforts apply string matching (e.g., n-gram overlap) to remove benchmark data, we show that these methods are insufficient, and simple variations of test data (e.g., paraphrasing, translation) can easily bypass these decontamination measures. Furthermore, we demonstrate that if such variation of test data is not eliminated, a 13B model can easily overfit a test benchmark and achieve drastically high performance, on par with GPT-4. We validate such observations in widely used benchmarks such as MMLU, GSM8K, and HumanEval. 
To address this growing risk, we propose a stronger LLM-based decontamination method and apply it to widely used pre-training and fine-tuning datasets, revealing significant previously unknown test overlap. For example, in pre-training sets such as RedPajama-Data-1T and StarCoder-Data, we identified that 8-18% of the HumanEval benchmark overlaps. Interestingly, we also find such contamination in synthetic datasets generated by GPT-3.5/4, suggesting a potential risk of unintentional contamination. We urge the community to adopt stronger decontamination approaches when using public benchmarks. Moreover, we call for the community to actively develop fresh one-time exams to evaluate models accurately. Our decontamination tool is publicly available at https://github.com/lm-sys/llm-decontaminator.	翻訳日:2023-11-14 19:26:34 publicationDate_placeholder
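To make the abstract's point about string matching concrete, here is a minimal sketch of a word-level n-gram overlap check and of how a rephrased sample slips past it. The helper names, the choice of 3-grams, and the example sentences are assumptions for illustration; this is not the authors' decontamination tool.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(train_sample, test_sample, n=3):
    """Fraction of the test sample's n-grams that also occur in the training sample."""
    test_grams = ngrams(test_sample, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(train_sample, n)) / len(test_grams)


# Hypothetical benchmark item and two hypothetical training samples.
test_item = "write a function that returns the sum of two integers"
verbatim_copy = "write a function that returns the sum of two integers"
rephrased_copy = "implement a routine which adds a pair of whole numbers"

print(ngram_overlap(verbatim_copy, test_item))   # identical text: full overlap
print(ngram_overlap(rephrased_copy, test_item))  # rephrased text: no shared 3-grams
```

A filter that drops training samples above some overlap threshold would catch the verbatim copy but keep the rephrased one, even though both pose the same task.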
# ビデオオブジェクトセグメンテーションにおけるアノテーションの学習 Learning the What and How of Annotation in Video Object Segmentation ( http://arxiv.org/abs/2311.04414v2 ) ライセンス: Link先を確認	Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos	(参考訳) ビデオオブジェクトセグメンテーション(VOS)は、ビデオ編集からビデオデータ生成まで、いくつかのアプリケーションにとって不可欠である。 VOSモデルのトレーニングには、手動でラベル付けされたトレーニングビデオが多数必要である。オブジェクトをアノテートする方法のデファクトでは、ビデオフレームごとにターゲットオブジェクトに詳細なセグメンテーションマスクを描く必要がある。しかし、このアノテーションプロセスは退屈で時間がかかります。このアノテーションコストを削減するため,ビデオオブジェクトセグメンテーションのためのヒューマンインザループアノテーションフレームワークであるEVA-VOSを提案する。従来のアプローチとは異なり、どのフレーム("What")をアノテーションにするか、どのアノテーションタイプ("How")を使うのかを反復的に予測するエージェントを導入します。次に、アノテーションはVOSモジュールの更新に使用される選択されたフレームのみに注釈を付け、アノテーションの時間が大幅に向上する。我々はMOSEとDAVISデータセットの実験を行い、次のように示す。 (a)EVA-VOSは、ビデオの標準的な注釈付け方法よりも3.5倍早く、人間の同意に近い精度のマスクにつながる。 b)我々のフレーム選択は最先端のパフォーマンスを達成する。 c) eva-vosは、他のすべてのメソッドやベースラインと比較して、アノテーション時間の観点から大きなパフォーマンス向上をもたらす。 Video Object Segmentation (VOS) is crucial for several applications, from video editing to video data generation. Training a VOS model requires an abundance of manually labeled training videos. The de-facto traditional way of annotating objects requires humans to draw detailed segmentation masks on the target objects at each video frame. This annotation process, however, is tedious and time-consuming. To reduce this annotation cost, in this paper, we propose EVA-VOS, a human-in-the-loop annotation framework for video object segmentation. Unlike the traditional approach, we introduce an agent that predicts iteratively both which frame ("What") to annotate and which annotation type ("How") to use. Then, the annotator annotates only the selected frame that is used to update a VOS module, leading to significant gains in annotation time. 
We conduct experiments on the MOSE and the DAVIS datasets and we show that: (a) EVA-VOS leads to masks with accuracy close to the human agreement 3.5x faster than the standard way of annotating videos; (b) our frame selection achieves state-of-the-art performance; (c) EVA-VOS yields significant performance gains in terms of annotation time compared to all other methods and baselines.	翻訳日:2023-11-14 19:24:35 公開日:2023-11-11
# 擬ランダムアイソメトリ Pseudorandom Isometries ( http://arxiv.org/abs/2311.02901v3 ) ライセンス: Link先を確認	Prabhanjan Ananth, Aditya Gulati, Fatih Kaleoglu, Yao-Ting Lin	(参考訳) 我々は、${\cal Q}$-secure pseudorandom isometries (PRI)と呼ばれる新しい概念を導入する。擬似乱数等長法(pseudorandom isometry)は、n$-qubit状態から$(n+m)$-qubit状態へ等長法でマッピングする効率的な量子回路である。セキュリティに関して言えば、$\rho$ 上の$q$-fold pri の出力は、任意の多項式 $q$ に対して$ \rho \in {\cal q}$ に対して、$\rho$ 上の$q$-fold haar 等長の出力と計算的に区別できないべきである。 ${\cal Q}$を微調整することで、擬似ランダム性の多くの既存の概念を回復する。我々は、pri の構成と、量子一方向関数を仮定すると、${\cal q}$-secure pseudorandom isometries (pri) の安全性を、${\cal q}$ の異なる興味深い設定に対して証明する。また、prisの暗号応用として、量子疑似ランダム性概念に対する長さ拡張定理、量子状態に対するメッセージ認証スキーム、マルチコピーセキュアな公開およびプライベート暗号スキーム、簡潔な量子コミットメントなどがある。 We introduce a new notion called ${\cal Q}$-secure pseudorandom isometries (PRI). A pseudorandom isometry is an efficient quantum circuit that maps an $n$-qubit state to an $(n+m)$-qubit state in an isometric manner. In terms of security, we require that the output of a $q$-fold PRI on $\rho$, for $ \rho \in {\cal Q}$, for any polynomial $q$, should be computationally indistinguishable from the output of a $q$-fold Haar isometry on $\rho$. By fine-tuning ${\cal Q}$, we recover many existing notions of pseudorandomness. We present a construction of PRIs and assuming post-quantum one-way functions, we prove the security of ${\cal Q}$-secure pseudorandom isometries (PRI) for different interesting settings of ${\cal Q}$. We also demonstrate many cryptographic applications of PRIs, including, length extension theorems for quantum pseudorandomness notions, message authentication schemes for quantum states, multi-copy secure public and private encryption schemes, and succinct quantum commitments.	翻訳日:2023-11-14 19:22:49 公開日:2023-11-11
# THOS: ターゲットのヘイトと攻撃的スピーチのためのベンチマークデータセット THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech ( http://arxiv.org/abs/2311.06446v1 ) ライセンス: Link先を確認	Saad Almohaimeed, Saleh Almohaimeed, Ashfaq Ali Shafin, Bogdan Carbunar and Ladislau Bölöni	(参考訳) Twitterのようなソーシャルメディア上の有害コンテンツを検出することは、一見単純なyes/no分類がかなりの複雑さを隠蔽しているという事実によって難しい。残念なことに、ヘイトとアグレッシブスピーチで分類器を訓練するためにいくつかのデータセットが収集されているが、ターゲットクラスと特定のターゲットの細かい粒度でラベル付けされたデータセットは少ない。本稿では,メッセージのターゲットに関する詳細なアノテーションを手作業でラベル付けした8.3kツイートのデータセットTHOSを紹介する。このデータセットは,大規模言語モデルに基づく分類器を訓練し,この粒度レベルでの分類を可能にすることを実証する。 Detecting harmful content on social media, such as Twitter, is made difficult by the fact that the seemingly simple yes/no classification conceals a significant amount of complexity. Unfortunately, while several datasets have been collected for training classifiers in hate and offensive speech, there is a scarcity of datasets labeled with a finer granularity of target classes and specific targets. In this paper, we introduce THOS, a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message. We demonstrate that this dataset makes it feasible to train classifiers, based on Large Language Models, to perform classification at this level of granularity.	翻訳日:2023-11-14 18:49:30 公開日:2023-11-11
# 一般硬貨行列を用いた3状態量子ウォークの固有値解析 Eigenvalue analysis of three-state quantum walks with general coin matrices ( http://arxiv.org/abs/2311.06468v1 ) ライセンス: Link先を確認	Jirô Akahori, Chusei Kiumi, Norio Konno, Takuya Watanabe	(参考訳) 固有値の存在に関する数学的解析は、量子ウォークの極めて重要な性質である局所化の発生に対応するため、不可欠である。以前の研究では、転送行列を用いた固有値解析は、グローバー行列を含む特定のコイン行列のクラスを持つ空間不均質な3状態量子ウォークに有用であることが証明されている。本研究では,一般のコイン行列を用いた3状態量子ウォークの伝達行列に注意を向ける。従来の研究手法に基づき, 伝達行列の性質を深く調査し, これまで解析不能であったモデルの固有値の導出に数値解析を適用した。 Mathematical analysis on the existence of eigenvalues is vital, as it corresponds to the occurrence of localization, an exceptionally important property of quantum walks. Previous studies have demonstrated that eigenvalue analysis utilizing the transfer matrix proves beneficial for space inhomogeneous three-state quantum walks with a specific class of coin matrices, including Grover matrices. In this research, we turn our attention to the transfer matrix of three-state quantum walks with a general coin matrix. Building upon previous research methodologies, we dive deeper into investigating the properties of the transfer matrix and employ numerical analysis to derive eigenvalues for models that were previously unanalyzable.	翻訳日:2023-11-14 18:36:21 公開日:2023-11-11
# 項目応答理論を用いた適応言語に基づくメンタルヘルス評価 Adaptive Language-based Mental Health Assessment with Item-Response Theory ( http://arxiv.org/abs/2311.06467v1 ) ライセンス: Link先を確認	Vasudha Varadarajan, Sverker Sikström, Oscar N.E. Kjell and H. Andrew Schwartz	(参考訳) メンタルヘルスの問題は個人によって大きく異なり、徴候や症状の現れ方はかなり不均一である。近年, 言語による抑うつと不安評価は, 患者自身の言語を評価することによって, この異質な性質を捉えることを約束している。本研究では,適応的な言語に基づくアセスメントを導入する。モデルが問うべき質問に対する限定言語応答に基づいて,個人の心理的スコアを反復的に推定するタスクである。そこで本研究では,古典的テスト理論 (CTT) と項目応答理論 (IRT) の2つの統計的学習に基づく計測・検査手法について検討する。一般に適応テストを用いることで、標準テストで高い妥当性(r ~ 0.7)を達成するのに必要な質問数が大幅に減少し、全11問から、うつ病では3問、不安では5問に減少した。課題の組合せ的性質を考慮し,オーダリングとスコアリングの両目的に対する複数の戦略を実証的に評価し,半教師付き項目応答理論に基づく手法 (ALIRT) と教師付きアクタ批判に基づくモデルを導入する。どちらのモデルもランダム順序と固定順序よりも大幅に改善されているが、ALIRTはより少ない質問数で最高精度を達成するスケーラブルなモデルである(例えば、Pearson r ~ 0.93は3つの質問で達成されている)。全体としてALIRTは、精度や計算コストを損なうことなく、質問の数を減らすことができる。 Mental health issues widely vary across individuals - the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneous nature by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment - the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement/scoring: classical test theory (CTT) and item response theory (IRT). We find that using adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, bringing the total down from 11 questions to 3 for depression and 5 for anxiety. 
Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT), and a supervised actor-critic based model. While both of the models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with lower numbers of questions (e.g. achieves Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or overhead computational costs.	翻訳日:2023-11-14 18:36:09 公開日:2023-11-11
# 無線通信に基づく電子通信データリンク暗号化シミュレーション Electronic Communication Data Link Encryption Simulation Based on Wireless Communication ( http://arxiv.org/abs/2311.06462v1 ) ライセンス: Link先を確認	Rulin Bai	(参考訳) 筆者は,電子通信データリンク暗号化のシミュレーション効果を向上させるため,無線通信に基づくソリューションを提案する。この技術の主な内容は、無線通信の研究、楕円曲線暗号アルゴリズムの改善、システム暗号化モデルの構築、合法的かつ有効なノード秘密鍵の取得、システムの関連するセキュリティ属性の評価と分析、鍵のセキュリティの検証、無線通信の暗号化最適化の実現である。改良された楕円曲線を用いて、ネットワーク通信における証明書なし公開鍵暗号システムの下でのシステムデータチェーンの暗号化をシミュレートし、その時間は2.31ミリ秒であり、他のアルゴリズムよりも低い。結論: 無線通信に基づく技術研究が電子通信データリンクの暗号化シミュレーション効果を効果的に改善できることが実証された。 To improve the simulation of electronic communication data link encryption, the author proposes a solution based on wireless communication. Building on research into wireless communication, the approach improves the elliptic curve cryptographic algorithm to build a system encryption model, obtains legal and valid node private keys, evaluates and analyzes the relevant security attributes of the system, verifies the security of the keys, and realizes encryption optimization for wireless network communication. Experimental results show that simulating system data chain encryption with the improved elliptic curve, under a certificateless public key cryptosystem in network communication, takes only 2.31 milliseconds, which is lower than other algorithms. Conclusion: the wireless-communication-based approach can effectively improve the encryption simulation of electronic communication data links.	翻訳日:2023-11-14 18:35:38 公開日:2023-11-11
# Logit Adjusted Softmaxによるオンライン連続学習 Online Continual Learning via Logit Adjusted Softmax ( http://arxiv.org/abs/2311.06460v1 ) ライセンス: Link先を確認	Zhehao Huang, Tao Li, Chenhe Yuan, Yingwen Wu, Xiaolin Huang	(参考訳) オンライン連続学習は、モデルが壊滅的な忘れ去らないまま、非定常データストリームから学ぶ必要がある難しい問題である。トレーニング中のクラス間の不均衡は、忘れる主な原因として特定され、最近学習されたクラスに対するモデル予測バイアスに繋がる。本稿では,クラス間不均衡が不均衡なクラスプライアーによるものであることを理論的に解析し,クラス内固有分布から得られる関数はベイズ最適分類器である。そこで本研究では,トレーニング中のモデルロジットの簡単な調整により,先行クラスバイアスに効果的に抵抗し,対応するベイズ最適化を追求できることを示す。提案手法であるLogit Adjusted Softmaxは,クラス増分だけでなく,現実的な一般設定においてもクラス間不均衡の影響を軽減し,計算コストを抑える。我々は,様々なベンチマークでアプローチを評価し,先行技術と比較して有意な性能改善を示す。例えば、CIFAR10のベースラインを4.6%改善しています。 Online continual learning is a challenging problem where models must learn from a non-stationary data stream while avoiding catastrophic forgetting. Inter-class imbalance during training has been identified as a major cause of forgetting, leading to model prediction bias towards recently learned classes. In this paper, we theoretically analyze that inter-class imbalance is entirely attributed to imbalanced class-priors, and the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. To that end, we present that a simple adjustment of model logits during training can effectively resist prior class bias and pursue the corresponding Bayes-optimum. Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups, with little additional computational cost. We evaluate our approach on various benchmarks and demonstrate significant performance improvements compared to prior arts. For example, our approach improves the best baseline by 4.6% on CIFAR10.	翻訳日:2023-11-14 18:35:23 公開日:2023-11-11
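The logit adjustment mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used formulation in which the log of an estimated class prior is added to each logit before the softmax cross-entropy is computed during training. The function names, the temperature `tau`, and the count-based prior estimate are all hypothetical choices made for illustration.

```python
import math


def class_priors(label_counts, eps=1e-12):
    """Estimate class priors from label counts (assumed estimator, not from the paper)."""
    total = sum(label_counts)
    return [max(count / total, eps) for count in label_counts]


def logit_adjusted_cross_entropy(logits, label, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) for each class."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    top = max(adjusted)
    log_norm = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return log_norm - adjusted[label]


# A rare class (prior 0.1) with uninformative logits gets a larger training loss,
# which pushes the model to raise its raw logit for that class.
priors = class_priors([90, 10])
loss_rare = logit_adjusted_cross_entropy([0.0, 0.0], label=1, priors=priors)
loss_plain = logit_adjusted_cross_entropy([0.0, 0.0], label=1, priors=[0.5, 0.5])
```

In this sketch the adjustment is applied only inside the training loss; predictions at test time would use the raw logits, which is one common design choice and an assumption here.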
# 非対称コントラストマルチモーダル学習による化学理解の促進 Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding ( http://arxiv.org/abs/2311.06456v1 ) ライセンス: Link先を確認	Hao Xu, Yifei Wang, Yunrui Li, Pengyu Hong	(参考訳) マルチモーダル深層学習の汎用性は、科学的研究と実践的応用の進歩に非常に有望である。この分野が発展を続けるにつれ、クロスモーダル分析の集団的力は革新的イノベーションを駆動し、化学理解と発見の新しいフロンティアへと導かれる。そこで本研究では, 分子に適した新しいアプローチとして, 非対称コントラスト型Multimodal Learning (ACML)を導入し, 化学分野の進展の可能性を示した。 ACMLは効果的な非対称コントラスト学習の力を利用して、様々な化学修飾物から分子グラフ表現への情報をシームレスに伝達する。事前訓練された化学ユニモーダルエンコーダと浅層設計のグラフエンコーダを組み合わせることで、ACMLは、異なるモダリティから協調した化学意味論の同化を促進する。この革新的な枠組みは、学習表現の解釈性を高め、グラフニューラルネットワークの表現力を高める。異性体識別や薬物発見のための重要な化学的性質の発見といった実践的なタスクを通じて、ACMLは化学研究と応用に革命をもたらす能力を示し、異なるモダリティの化学的意味をより深く理解している。 The versatility of multimodal deep learning holds tremendous promise for advancing scientific research and practical applications. As this field continues to evolve, the collective power of cross-modal analysis promises to drive transformative innovations, leading us to new frontiers in chemical understanding and discovery. Hence, we introduce Asymmetric Contrastive Multimodal Learning (ACML) as a novel approach tailored for molecules, showcasing its potential to advance the field of chemistry. ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders and a shallow-designed graph encoder, ACML facilitates the assimilation of coordinated chemical semantics from different modalities, leading to comprehensive representation learning with efficient training. This innovative framework enhances the interpretability of learned representations and bolsters the expressive power of graph neural networks. 
Through practical tasks such as isomer discrimination and uncovering crucial chemical properties for drug discovery, ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of chemical semantics of different modalities.	翻訳日:2023-11-14 18:35:06 公開日:2023-11-11
# Aria-NeRF:マルチモーダルエゴセントリックビュー合成 Aria-NeRF: Multimodal Egocentric View Synthesis ( http://arxiv.org/abs/2311.06455v1 ) ライセンス: Link先を確認	Jiankai Sun, Jianing Qiu, Chuanyang Zheng, John Tucker, Javier Yu, Mac Schwager	(参考訳) 我々は,Neural Radiance Fields (NeRFs) にインスパイアされた可変体積線トレーシングに基づいて,エゴセントリックデータから学習したリッチでマルチモーダルなシーンモデルの開発を加速する。 Egocentric image sequenceからのNeRFライクなモデルの構築は、人間の行動を理解する上で重要な役割を担い、VR/ARの領域における多様な応用を担っている。このような自己中心型NeRFのようなモデルは現実的なシミュレーションとして利用でき、現実世界でタスクを実行する知的エージェントの進歩に大きく貢献する。 Egocentric view synthesisの将来は、現在のNeRFを超える新しい環境表現に繋がる可能性がある。例えば、移動追跡のためのIMU、表面テクスチャと人間の言語コンテキストをキャプチャするオーディオセンサー、シーンにおける人間の注意パターンを推測するアイ・ゲイズ・トラッカーなどである。エゴセントリック・マルチモーダル・シーン・モデリングの開発と評価を支援するため,包括的マルチモーダル・エゴセントリック・ビデオ・データセットを提案する。このデータセットは、RGB画像、アイトラッキングカメラの映像、マイクからの音声記録、気圧計からの気圧測定、GPSからの位置座標、Wi-FiとBluetoothの接続の詳細、デュアル周波数IMUデータセット(1kHzと800Hz)と磁気センサのペアによる情報を含む、総合的なセンサデータの収集を提供する。データセットはMeta Aria Glassesウェアラブルデバイスプラットフォームで収集された。このデータセットで捉えた多様なデータモダリティと現実世界のコンテキストは、人間の行動に対する理解を深め、VR、AR、ロボット工学の領域でより没入的でインテリジェントな体験を可能にする、堅牢な基盤となる。 We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real-world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. 
To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics.	翻訳日:2023-11-14 18:34:42 公開日:2023-11-11
# 異常予測を識別するためのsaliency-based clustering framework A Saliency-based Clustering Framework for Identifying Aberrant Predictions ( http://arxiv.org/abs/2311.06454v1 ) ライセンス: Link先を確認	Aina Tersol Montserrat, Alexander R. Loftus, Yael Daihes	(参考訳) 機械学習では、分類タスクは現実世界の幅広いアプリケーションの基礎となる。信頼性があり信頼性の高い分類は、特にバイオメディカルな環境では複雑であり、基礎的真実は本質的に不確実であり、ラベル付けの高度な専門知識に依存している。正確さやリコールのような伝統的なメトリクスは、価値はあるが、これらの曖昧なシナリオのニュアンスを捉えるには不十分である。ここでは,分類誤りの性質が頻度と同じくらい重要であることを強調して,異常予測の概念を紹介する。本稿では,誤分類率の低減と異常予測の識別を目的とした,新しい効率的な学習手法を提案する。我々のフレームワークはモデルの性能を大幅に向上させ、精度(precision)を20%向上させる。本手法を、重要性は高いものの人体医学ほど広く研究されていない獣医放射線学の分野に応用する。異常予測の識別と緩和に焦点をあてて、獣医学の世界における新しい応用を含む実世界のシナリオにおける機械学習分類器の有用性と信頼性を高める。 In machine learning, classification tasks serve as the cornerstone of a wide range of real-world applications. Reliable, trustworthy classification is particularly intricate in biomedical settings, where the ground truth is often inherently uncertain and relies on high degrees of human expertise for labeling. Traditional metrics such as precision and recall, while valuable, are insufficient for capturing the nuances of these ambiguous scenarios. Here we introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency. We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions. Our framework demonstrates a substantial improvement in model performance, achieving a 20% increase in precision. We apply this methodology to the less-explored domain of veterinary radiology, where the stakes are high but have not been as extensively studied compared to human medicine. By focusing on the identification and mitigation of aberrant predictions, we enhance the utility and trustworthiness of machine learning classifiers in high-stakes, real-world scenarios, including new applications in the veterinary world.	翻訳日:2023-11-14 18:34:07 公開日:2023-11-11
# DocGen: Pythonで詳細なパラメータdocstringを生成する DocGen: Generating Detailed Parameter Docstrings in Python ( http://arxiv.org/abs/2311.06453v1 ) ライセンス: Link先を確認	Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Fatemeh H. Fard, Melina Vidoni, Zadia Codabux	(参考訳) ドキュメンテーションの負債は、オープンソースソフトウェアの効果的な利用を妨げる。コード要約ツールは開発者にとって有用だが、ほとんどの場合、高レベルの要約ではなく、関数内の各パラメータの詳細な説明を好む。しかしながら、このような要約の生成は、高品質なトレーニングデータがないため、単一の生成モデルが確実に生成するには複雑すぎる。そこで本稿では,docstringの特定の部分を生成する複数のタスク固有モデルを組み合わせたマルチステップアプローチを提案する。これらのモデルの組み合わせは、最終的な docstring に各セクションを含めることを保証する。提案手法を,自動測定と17人の開発者が参加した人中心評価の両方を用いて既存の生成モデルと比較し,既存の手法よりもアプローチの方が優れていることを示す。 Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level summary. However, generating such a summary is too intricate for a single generative model to produce reliably due to the lack of high-quality training data. Thus, we propose a multi-step approach that combines multiple task-specific models, each adept at producing a specific section of a docstring. The combination of these models ensures the inclusion of each section in the final docstring. We compared the results from our approach with existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers, which proves the superiority of our approach over existing methods.	翻訳日:2023-11-14 18:33:49 公開日:2023-11-11
# 偽負推定によるEコマース検索におけるプールバイアスの緩和 Mitigating Pooling Bias in E-commerce Search via False Negative Estimation ( http://arxiv.org/abs/2311.06444v1 ) ライセンス: Link先を確認	Xiaochen Wang, Xiao Xiao, Ruhan Zhang, Xuan Zhang, Taesik Na, Tejaswi Tenneti, Haixun Wang and Fenglong Ma	(参考訳) ユーザエクスペリエンスとビジネス成功には、効率的で正確な製品関連性評価が不可欠です。熟練した妥当性評価モデルのトレーニングには高品質なクエリ・製品ペアが必要であり、これらは多くの場合ネガティブサンプリング戦略によって得られる。残念ながら、現在の手法では誤った否定を誤ってサンプリングし、パフォーマンスとビジネスへの影響を減らし、プールバイアスを導入しています。そこで本研究では,従来の偽陰性推定アルゴリズムに基づいて,偽陰性の検出・調整に適した新しいネガティブサンプリング手法であるBias-mitigating Hard Negative Sampling(BHNS)を提案する。 Instacartサーチセッティングの実験により,BHNSが実用的なeコマースに有効であることが確認された。さらに、パブリックデータセットにおける比較分析は、多様なアプリケーションに対するドメインに依存しない可能性を示している。 Efficient and accurate product relevance assessment is critical for user experiences and business success. Training a proficient relevance assessment model requires high-quality query-product pairs, often obtained through negative sampling strategies. Unfortunately, current methods introduce pooling bias by mistakenly sampling false negatives, diminishing performance and business impact. To address this, we present Bias-mitigating Hard Negative Sampling (BHNS), a novel negative sampling strategy tailored to identify and adjust for false negatives, building upon our original False Negative Estimation algorithm. Our experiments in the Instacart search setting confirm BHNS as effective for practical e-commerce use. Furthermore, comparative analyses on public dataset showcase its domain-agnostic potential for diverse applications.	翻訳日:2023-11-14 18:33:38 公開日:2023-11-11
# CVTHead:Vertex-Feature Transformer付きワンショット制御可能なヘッドアバター CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer ( http://arxiv.org/abs/2311.06443v1 ) ライセンス: Link先を確認	Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie	(参考訳) パーソナライズ可能な頭部アバターの再構成は、AR/VRの分野で重要な意味を持つ。 3Dモデル(3DMM)の明示的な顔制御を実現するための既存の方法は、通常、単一の対象の多視点画像やビデオに依存しており、再構成プロセスは複雑である。さらに、従来のレンダリングパイプラインは時間がかかり、リアルタイムアニメーションの可能性を制限する。本稿では,単一参照画像から点ベースニューラルネットワークによる制御可能なニューラルネットワークアバターを生成する新しいアプローチであるcvtheadを提案する。 CVTHeadは、メッシュのスパース頂点をポイントセットとみなし、提案したVertex-Feature Transformerを使用して各頂点のローカル特徴記述子を学習する。これにより、すべての頂点間の長距離依存性のモデリングが可能になる。 VoxCelebデータセットの実験結果は、CVTHeadが最先端のグラフィックスベースの手法と同等のパフォーマンスを達成することを示した。さらに, 表情, ポーズ, カメラビューの異なる新規な人間の頭部の効率的なレンダリングを可能にする。これらの属性は、3dmmの係数を使って明示的に制御でき、リアルタイムシナリオで多用途でリアルなアニメーションが容易になる。 Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR. Existing methods for achieving explicit face control of 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, making the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation possibilities. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead considers the sparse vertices of mesh as the point set and employs the proposed Vertex-feature Transformer to learn local feature descriptors for each vertex. This enables the modeling of long-range dependencies among all the vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves comparable performance to state-of-the-art graphics-based methods. Moreover, it enables efficient rendering of novel human heads with various expressions, head poses, and camera views. 
These attributes can be explicitly controlled using the coefficients of 3DMMs, facilitating versatile and realistic animation in real-time scenarios.	翻訳日:2023-11-14 18:33:25 公開日:2023-11-11
# ChaffからBREADでWheatを分離する - テキストの冗長性を検出するためのオープンソースのベンチマークとメトリクス Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text ( http://arxiv.org/abs/2311.06440v1 ) ライセンス: Link先を確認	Isaac Caswell, Lisa Wang, Isabel Papadimitriou	(参考訳) データ品質は、タスク、ドメイン、アーキテクチャに関係なく、NLPの分野全体に永久に再浮上する問題であり、低リソース言語では特に深刻な問題である。トレーニングデータとモデル出力の両方に影響を及ぼす典型的な悪質な問題は、反復的であり、価格カタログやコンピュータ生成ログファイルのような言語的に興味のないボイラープレートによって支配されるデータである。この問題は多くのWebスクレイプコーパスに浸透しているが、テストするベンチマークや、言語全体にわたって一般化し、データ品質の人間の判断に同意する単純なメトリクスを見つけるための体系的な研究はまだない。本研究では,360言語にまたがる反復型ボイラープレート対有理言語コンテンツに関する人間ラベルベンチマークであるbreadを作成・公開する。いくつかの基準値CRED(Character REDundancy)スコアを同時にリリースし,BREADの有効性を評価する。コミュニティはこのリソースをより優れたフィルタリング方法の開発に利用し、credスコアのリファレンス実装が標準的なコーパス評価ツールになり、クリーンな言語モデリングコーパス、特に低リソース言語の開発を促進することを願っています。 Data quality is a problem that perpetually resurfaces throughout the field of NLP, regardless of task, domain, or architecture, and remains especially severe for lower-resource languages. A typical and insidious issue, affecting both training data and model output, is data that is repetitive and dominated by linguistically uninteresting boilerplate, such as price catalogs or computer-generated log files. Though this problem permeates many web-scraped corpora, there has yet to be a benchmark to test against, or a systematic study to find simple metrics that generalize across languages and agree with human judgements of data quality. In the present work, we create and release BREAD, a human-labeled benchmark on repetitive boilerplate vs. plausible linguistic content, spanning 360 languages. We release several baseline CRED (Character REDundancy) scores along with it, and evaluate their effectiveness on BREAD. 
We hope that the community will use this resource to develop better filtering methods, and that our reference implementations of CRED scores can become standard corpus evaluation tools, driving the development of cleaner language modeling corpora, especially in low-resource languages.	翻訳日:2023-11-14 18:33:05 公開日:2023-11-11
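As a rough illustration of what a character-level redundancy score can look like, the sketch below uses a compression ratio as a proxy. The abstract above does not define the CRED scores, so this is explicitly not the paper's metric: `compression_redundancy` is a hypothetical helper built on the standard-library `zlib` module, under the assumption that repetitive boilerplate compresses far better than ordinary prose.

```python
import zlib


def compression_redundancy(text: str) -> float:
    """Return a score in [0, 1]; higher means the text compresses better (more repetitive)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    # Very short inputs can grow under compression, so clamp the score at zero.
    return max(0.0, 1.0 - len(compressed) / len(raw))


boilerplate = "price: 10 USD\n" * 200
prose = "The quick brown fox jumps over the lazy dog near the quiet river bank."

boilerplate_score = compression_redundancy(boilerplate)
prose_score = compression_redundancy(prose)
```

A filter built on such a score would still need a threshold tuned per language and document length, since short documents compress poorly regardless of content.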
# 動的システムの制御強化のための制御可能性制約付きディープネットワークモデル Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems ( http://arxiv.org/abs/2311.06438v1 ) ライセンス: Link先を確認	Suruchi Sharma, Volodymyr Makarenko, Gautam Kumar, Stas Tiomkin	(参考訳) 力学の知識を持たない力学系の制御は重要かつ困難な課題である。ディープニューラルネットワーク(DNN)のような現代の機械学習アプローチは、制御入力と対応する状態観測出力から動的モデルの推定を可能にする。このようなデータ駆動モデルはしばしばモデルベースのコントローラの導出に利用される。しかし、一般的には、dnnで表されるモデルは、制御可能性の正式な制御理論的な意味に従って制御可能であるという保証はない。これはしばしば、正式な制御可能性を保証する必要があるアプリケーションにおけるDNN推定モデルの使用を妨げる。本稿では,制御可能性のあるデータから推定されるモデルを明確に拡張する制御理論手法を提案する。これは、制御可能性の低いモデルにペナルティを与える制御可能性制約でモデル推定目標を増大させることによって達成される。その結果, 制御可能性制約により推定されたモデルでは, より効率的な制御器の導出が可能となり, 制御理論量によって解釈可能となり, 長期予測誤差が低くなった。提案手法は、未知の力学のDNNに基づく推定と解の性質の制御理論的保証との関連性に関する新たな知見を提供する。低分解能高次元画像による状態観察を行う2つの標準古典制御系において,提案手法が優れていることを示す。 Control of a dynamical system without the knowledge of dynamics is an important and challenging task. Modern machine learning approaches, such as deep neural networks (DNNs), allow for the estimation of a dynamics model from control inputs and corresponding state observation outputs. Such data-driven models are often utilized for the derivation of model-based controllers. However, in general, there are no guarantees that a model represented by DNNs will be controllable according to the formal control-theoretical meaning of controllability, which is crucial for the design of effective controllers. This often precludes the use of DNN-estimated models in applications, where formal controllability guarantees are required. In this proof-of-the-concept work, we propose a control-theoretical method that explicitly enhances models estimated from data with controllability. That is achieved by augmenting the model estimation objective with a controllability constraint, which penalizes models with a low degree of controllability. 
As a result, the models estimated with the proposed controllability constraint allow for the derivation of more efficient controllers, they are interpretable by the control-theoretical quantities and have a lower long-term prediction error. The proposed method provides new insights on the connection between the DNN-based estimation of unknown dynamics and the control-theoretical guarantees of the solution properties. We demonstrate the superiority of the proposed method in two standard classical control systems with state observation given by low resolution high-dimensional images.	翻訳日:2023-11-14 18:32:43 公開日:2023-11-11
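For readers unfamiliar with the formal notion of controllability referred to above, the following sketch shows the textbook Kalman rank test for a discrete-time linear system x[t+1] = A x[t] + B u[t]. It is a binary check given here for intuition only. The abstract describes a penalty on a low degree of controllability for DNN-estimated models, which this code does not implement. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def controllability_matrix(A, B):
    """Build [B, AB, A^2 B, ..., A^(n-1) B] by horizontal concatenation."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(matmul(A, blocks[-1]))
    return [[value for block in blocks for value in block[i]] for i in range(n)]


def rank(M, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= factor * M[r][j]
        r += 1
    return r


def is_controllable(A, B):
    """Kalman rank condition: the controllability matrix must have full row rank."""
    return rank(controllability_matrix(A, B)) == len(A)


# Double integrator: a force input can steer both position and velocity.
double_integrator = is_controllable([[1.0, 1.0], [0.0, 1.0]], [[0.0], [1.0]])
# Decoupled states with an input that only touches the first state.
decoupled = is_controllable([[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```

A differentiable training penalty would typically need a continuous measure rather than this yes/no rank test; which measure the paper uses is not stated in the abstract.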
# 公平へのステップバイステップ:タスク指向対話システムにおける社会バイアスの帰属 Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems ( http://arxiv.org/abs/2311.06513v1 ) ライセンス: Link先を確認	Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel	(参考訳) 近年,タスク指向対話(TOD)システムにおいて,事前学習された大規模言語モデル(LLM)をエンドツーエンドで活用することにより,大幅な改善が見られた。しかし,TOD システムにおける各コンポーネントの偏りの挙動や,エンドツーエンドフレームワークにおけるエラー伝搬の問題により,TOD 応答のバイアスが深刻になる可能性がある。フェアネスの既存の仕事はシステムのバイアスにのみ焦点を合わせます。本論文では,TODシステムの各コンポーネントに偏りを生じさせる診断手法を提案する。提案手法では,バイアスの発生源についてより深く理解することができる。さらに、より粒度の細かいモデル挙動を緩和することができる。性別,年齢,人種の3つの集団軸に対するtodシステムのバイアスを識別する実験を行った。実験結果から,TODシステムのバイアスは通常応答生成モデルから生じることが示された。 Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing works of fairness only focus on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias toward three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.	翻訳日:2023-11-14 18:24:03 公開日:2023-11-11
# CNNモデル伝搬を用いたバンド単位のハイパースペクトル画像パンシャープニング Band-wise Hyperspectral Image Pansharpening using CNN Model Propagation ( http://arxiv.org/abs/2311.06510v1 ) ライセンス: Link先を確認	Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giuseppe Scarpa	(参考訳) ハイパースペクトルパンシャープニングは近年、多くの研究論文や課題によって証明されたように、関心が高まっている。低分解能のハイパースペクトルデータキューブと高分解能のシングルバンド画像であるパンクロマティック画像とのピクセルレベルの融合と、パンクロマティック解像度でハイパースペクトルデータキューブを提供することを目的としている。強力な表現能力のおかげで、ディープラーニングモデルは、多くの汎用画像処理タスクで前例のない結果を提供することに成功した。しかしながら、ドメイン固有の問題に移行する場合、例えばこの場合のように、伝統的なモデルベースのアプローチに対する利点は、いくつかの文脈上の理由から、より明確でない。トレーニングデータの不足,グラウンドトゥルースの欠如,データ形状の変動は,ハイパースペクトルパンシャープニングのための最先端のディープラーニングネットワークの一般化能力を制限する要因である。これらの制約に対処するため、本研究では、単純な単一バンドの教師なしパンシャープニングモデルを、各バンドが直前のバンドで調整されたモデルを洗練しながらパンシャープニングされる逐次的なバンド単位の適応方式に組み込んだ、新しいディープラーニング手法を提案する。これにより、簡単なモデルが波長次元に沿って適応的かつ柔軟に伝播し、一定の数のスペクトル帯域を持つ必要がなく、大規模で高価なラベル付きトレーニングデータセットを用意する必要もない。提案手法は,従来の学習基準法と深層学習基準法の両方より優れた結果が得られる。提案手法の実装はhttps://github.com/giu-guarino/R-PNNで確認できる。 Hyperspectral pansharpening has received growing interest over the last few years, as testified by a large number of research papers and challenges. It consists in a pixel-level fusion between a lower-resolution hyperspectral datacube and a higher-resolution single-band image, the panchromatic image, with the goal of providing a hyperspectral datacube at panchromatic resolution. Thanks to their powerful representational capabilities, deep learning models have succeeded to provide unprecedented results on many general purpose image processing tasks. However, when moving to domain specific problems, as in this case, the advantages with respect to traditional model-based approaches are much less clear-cut due to several contextual reasons. Scarcity of training data, lack of ground-truth, data shape variability, are some such factors that limit the generalization capacity of the state-of-the-art deep learning networks for hyperspectral pansharpening. 
To cope with these limitations, in this work we propose a new deep learning method which inherits a simple single-band unsupervised pansharpening model nested in a sequential band-wise adaptive scheme, where each band is pansharpened refining the model tuned on the preceding one. By doing so, a simple model is propagated along the wavelength dimension, adaptively and flexibly, with no need for a fixed number of spectral bands and no need for large, expensive, labeled training datasets. The proposed method achieves very good results on our datasets, outperforming both traditional and deep learning reference methods. The implementation of the proposed method can be found on https://github.com/giu-guarino/R-PNN	翻訳日:2023-11-14 18:23:49 公開日:2023-11-11
# CompCodeVet: コードデータセットに対するコンパイラ誘導検証と拡張アプローチ CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset ( http://arxiv.org/abs/2311.06505v1 ) ライセンス: Link先を確認	Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Bin Lei, Ali Jannesari	(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションで顕著な性能を持つため、学術や産業でますます顕著になっている。これらのモデルがパラメータの増加とともに進化するにつれて、感情分析や機械翻訳といったタスクに優れている。しかし、数十億のパラメータを持つモデルでさえ、マルチステップ推論を必要とするタスクの課題に直面している。コード生成と理解、特にCとC++は、大きな課題として現れます。コードデータセットでトレーニングされたLLMは、多くのタスクで能力を示すが、コンパイル不可能なCとC++のコードの修正に苦労している。当社の調査では,この部分的なパフォーマンスを,トレーニングデータセットの品質と,複雑な推論を必要とする問題の固有の複雑性という,2つの主要な要因に当てはめています。既存の"Chain of Thought"(CoT)促進技術は、多段階推論を強化することを目的としている。しかし、このアプローチはLLMの潜在的な欠点に関連する制限を保っている。本研究では,コンパイル不能なコードからコンパイル可能なコードを生成するコンパイラ誘導型CoTアプローチであるCompCodeVetを提案する。より大規模なLLMを利用する従来のアプローチとは違い,より堅牢なゼロショット思考プロセスを確立するために,コンパイラを教師として採用している。 2つのオープンソースコードデータセットに対するCompCodeVetの評価は、CompCodeVetがLLMのトレーニングデータセット品質を改善する能力を持っていることを示している。 Large language models (LLMs) have become increasingly prominent in academia and industry due to their remarkable performance in diverse applications. As these models evolve with increasing parameters, they excel in tasks like sentiment analysis and machine translation. However, even models with billions of parameters face challenges in tasks demanding multi-step reasoning. Code generation and comprehension, especially in C and C++, emerge as significant challenges. While LLMs trained on code datasets demonstrate competence in many tasks, they struggle with rectifying non-compilable C and C++ code. Our investigation attributes this subpar performance to two primary factors: the quality of the training dataset and the inherent complexity of the problem which demands intricate reasoning. Existing "Chain of Thought" (CoT) prompting techniques aim to enhance multi-step reasoning. This approach, however, retains the limitations associated with the latent drawbacks of LLMs. 
In this work, we propose CompCodeVet, a compiler-guided CoT approach to produce compilable code from non-compilable ones. Diverging from the conventional approach of utilizing larger LLMs, we employ compilers as a teacher to establish a more robust zero-shot thought process. The evaluation of CompCodeVet on two open-source code datasets shows that CompCodeVet has the ability to improve the training dataset quality for LLMs.	翻訳日:2023-11-14 18:23:21 公開日:2023-11-11
# 産業欠陥の視覚検査のための自己教師付きコンテキスト学習 Self-supervised Context Learning for Visual Inspection of Industrial Defects ( http://arxiv.org/abs/2311.06504v1 ) ライセンス: Link先を確認	Peng Wang, Haiming Yao, Wenyong Yu	(参考訳) 産業製品における欠陥の教師なし視覚検査は、製品表面のかなりの変化のために重大な課題となる。現在の教師なしモデルは、テクスチャの検出とオブジェクトの欠陥のバランスを保ち、遅延表現と複雑な特徴を識別する能力が欠如している。本稿では,有名なジグソーパズルに取り組むことで,最適なエンコーダを導出する自己教師型学習アルゴリズムを提案する。目的画像を9つのパッチに分割し、エンコーダに2つのパッチ間の相対的な位置関係を予測させ、リッチなセマンティクスを抽出する。次に,正規表現と異常表現の差異を強調する親和性提示法を提案する。古典的サポートベクトルデータ記述アルゴリズムを活用すると、最終的な検出結果が得られる。実験結果から,広範に使用されているMVTec ADデータセットにおいて,95.8%,96.8%の精度で検出およびセグメンテーション性能が向上し,テクスチャとオブジェクトの両欠陥に対する最先端のベンチマークが確立された。包括的実験は,多種多様な産業応用における我々のアプローチの有効性を強調する。 The unsupervised visual inspection of defects in industrial products poses a significant challenge due to substantial variations in product surfaces. Current unsupervised models struggle to strike a balance between detecting texture and object defects, lacking the capacity to discern latent representations and intricate features. In this paper, we present a novel self-supervised learning algorithm designed to derive an optimal encoder by tackling the renowned jigsaw puzzle. Our approach involves dividing the target image into nine patches, tasking the encoder with predicting the relative position relationships between any two patches to extract rich semantics. Subsequently, we introduce an affinity-augmentation method to accentuate differences between normal and abnormal latent representations. Leveraging the classic support vector data description algorithm yields final detection results. Experimental outcomes demonstrate that our proposed method achieves outstanding detection and segmentation performance on the widely used MVTec AD dataset, with rates of 95.8% and 96.8%, respectively, establishing a state-of-the-art benchmark for both texture and object defects. Comprehensive experimentation underscores the effectiveness of our approach in diverse industrial applications.	翻訳日:2023-11-14 18:23:04 公開日:2023-11-11
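The jigsaw-style pretext task described in the abstract above (nine patches, with the encoder predicting the relative position of any two patches) can be sketched as follows. This shows only the data preparation, under assumptions not taken from the paper: images are plain nested lists, the grid is 3x3, and the label is encoded as a (row offset, column offset) pair. The function names are hypothetical.

```python
def split_into_patches(image, grid=3):
    """Split a 2D image (list of rows) into grid*grid patches in row-major order."""
    height, width = len(image), len(image[0])
    patch_h, patch_w = height // grid, width // grid
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * patch_h:(r + 1) * patch_h]
            patches.append([row[c * patch_w:(c + 1) * patch_w] for row in rows])
    return patches


def relative_position_label(i, j, grid=3):
    """Offset (d_row, d_col) of patch j relative to patch i on the grid."""
    row_i, col_i = divmod(i, grid)
    row_j, col_j = divmod(j, grid)
    return (row_j - row_i, col_j - col_i)


# A 6x6 toy image whose pixel value encodes its own position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = split_into_patches(image)
# Training pairs: every ordered pair of distinct patches with its relative-position label.
pairs = [(i, j, relative_position_label(i, j))
         for i in range(9) for j in range(9) if i != j]
```

An encoder would then be trained to predict the label from the two patch contents alone; how the paper encodes the labels and builds the pairs is not specified in the abstract.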
# ドメイン固有の質問応答におけるLLMの知識的選好アライメント Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering ( http://arxiv.org/abs/2311.06503v1 ) ライセンス: Link先を確認	Yichi Zhang, Zhuo Chen, Yin Fang, Lei Cheng, Yanxi Lu, Fangming Li, Wen Zhang, Huajun Chen	(参考訳) 近年,大規模言語モデル(LLM)の開発が学術や産業で広く注目を集めている。 LLMを実際のシナリオにデプロイすることは、現在のインターネット産業における重要な方向のひとつです。本稿では,ドメイン知識グラフ(KG)を組み込んだドメイン固有質問応答(QA)にLLMを適用するパイプラインを提案する。現実世界のアプリケーションとして、LLMが生成するコンテンツはユーザフレンドリーでなければならない。さらに、モデルは信頼できる回答を生成するためにドメイン知識を適切に利用する必要があります。この2つの問題は、バニラの微調整が適切に対処できないため、LLMアプリケーションにおける2つの大きな困難である。両方の要件は、実用的応用を達成するために人間と協調する必要があるモデル選好問題として統一できると考えています。そこで我々は,この2つの課題に対処するために,スタイル選好セットと知識選好セットという2種類の選好セットを構築するKnowledgeable Preference AlignmenT(KnowPAT)を提案する。さらに,LLMの嗜好と人間の嗜好を一致させる新たなアライメント目的を設計し,実シナリオドメイン固有のQAに対して,信頼性とユーザフレンドリな回答を生成するために,より良いLLMをトレーニングすることを目的とする。十分な実験と15のベースラインメソッドとの総合的な比較により、我々のKnowPATはLLMを用いた実シナリオドメイン固有のQAにおいて、優れたパイプラインであることが示された。私たちのコードは https://github.com/zjukg/KnowPAT でオープンソースです。 Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs to real scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline to apply LLMs for domain-specific question answering (QA) that incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. As a real-world application, the content generated by LLMs should be user-friendly to serve the customers. Additionally, the model needs to utilize domain knowledge properly to generate reliable answers. These two issues are the two major difficulties in the LLM application, as vanilla fine-tuning cannot adequately address them. We think both requirements can be unified as the model preference problem that needs to align with humans to achieve practical application. 
Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets, called the style preference set and the knowledge preference set, to tackle the two issues. Besides, we design a new alignment objective to align the LLM preference with human preference, aiming to train a better LLM for real-scenario domain-specific QA to generate reliable and user-friendly answers. Adequate experiments and comprehensive comparisons with 15 baseline methods demonstrate that our KnowPAT is a superior pipeline for real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT.	翻訳日:2023-11-14 18:22:43 公開日:2023-11-11
# druformer: 運転場面の強化運転関係の自己理解による重要物体検出 DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding ( http://arxiv.org/abs/2311.06497v1 ) ライセンス: Link先を確認	Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda	(参考訳) 交通事故はしばしば致命傷を負い、2023年まで5000万人以上の死者を出した。運転の危険を軽減し、個人の安全を確保するためには、走行中の重要な物体を予測するための車両支援が不可欠である。重要物体検出に関するこれまでの研究は、主に個々の参加者の重要性を評価し、それらを独立した実体として扱い、それらの参加者間のつながりをよく見落としていた。残念ながら、このアプローチは複雑なシナリオで重要なオブジェクトを検出するのにあまり効果がないことが分かっています。そこで本研究では,重要な物体検出タスクを強化するために,運転シーン関連自己理解トランス (DRUformer) を提案する。 druformerはトランスフォーマティブベースのマルチモーダル重要な物体検出モデルであり、運転シナリオのすべての参加者間の関係を考慮に入れている。運転意図が運転中の重要な物体の検出に大きく影響していることを認識し,運転意図を埋め込むモジュールを組み込んだ。提案手法の性能を評価するために,演劇データセットの比較実験を行い,他の最先端(sota)モデルと比較した。その結果、mIoUの16.2\%改善とACCの12.3\%向上がSOTA法と比較して顕著に示された。さらに,様々な道路シナリオやクラスにまたがる重要な物体を検出できるモデルの質的分析を行い,多様な文脈における有効性に注目した。最後に,druformerモデルにおいて提案するモジュールの効率を評価するため,様々なアブレーション実験を行った。 Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. 
Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2% improvement in mIoU and a substantial 12.3% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model.	翻訳日:2023-11-14 18:22:18 公開日:2023-11-11
# LayoutPrompter: 大規模言語モデルの設計能力の覚醒 LayoutPrompter: Awaken the Design Ability of Large Language Models ( http://arxiv.org/abs/2311.06495v1 ) ライセンス: Link先を確認	Jiawei Lin, Jiaqi Guo, Shizhao Sun, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang	(参考訳) ユーザの制約を高品質なレイアウトに自動マッピングする条件付きグラフィックレイアウト生成が,今日では広く注目を集めている。最近の研究は有望な性能を達成しているが、汎用性とデータ効率の欠如は実用的応用を妨げる。そこで本研究では,大規模言語モデル(LLM)を活用したLayoutPrompterを提案する。 LayoutPrompterは、入力出力シリアライゼーション、動的指数選択、レイアウトランキングという3つの重要なコンポーネントで構成されている。具体的には、入力出力シリアライゼーションコンポーネントは、各レイアウト生成タスクの入力および出力フォーマットを慎重に設計する。動的例題選択は、与えられた入力に対して最も有用な例題を選択する責任がある。 LLMの複数の出力から最高品質のレイアウトを選択するためにレイアウトローダが使用される。 4つの公開データセットを用いて既存のレイアウト生成タスクをすべて実験する。このアプローチの単純さにもかかわらず、実験結果から、LayoutPrompterはモデルトレーニングや微調整なしに、これらのタスクにおける最先端のアプローチと競合したり、性能を上回ります。これは、この多用途でトレーニングフリーなアプローチの有効性を示しています。さらに,レイアウトプロンプターは低データ状態におけるトレーニングベースベースラインよりも有意に優れており,レイアウトプロンプターのデータ効率も向上している。私たちのプロジェクトはhttps://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompterで利用可能です。 Conditional graphic layout generation, which automatically maps user constraints to high-quality layouts, has attracted widespread attention today. Although recent works have achieved promising performance, the lack of versatility and data efficiency hinders their practical applications. In this work, we propose LayoutPrompter, which leverages large language models (LLMs) to address the above problems through in-context learning. LayoutPrompter is made up of three key components, namely input-output serialization, dynamic exemplar selection and layout ranking. Specifically, the input-output serialization component meticulously designs the input and output formats for each layout generation task. Dynamic exemplar selection is responsible for selecting the most helpful prompting exemplars for a given input. And a layout ranker is used to pick the highest quality layout from multiple outputs of LLMs. We conduct experiments on all existing layout generation tasks using four public datasets. 
Despite the simplicity of our approach, experimental results show that LayoutPrompter can compete with or even outperform state-of-the-art approaches on these tasks without any model training or fine-tuning. This demonstrates the effectiveness of this versatile and training-free approach. In addition, the ablation studies show that LayoutPrompter is significantly superior to the training-based baseline in a low-data regime, further indicating the data efficiency of LayoutPrompter. Our project is available at https://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompter.	翻訳日:2023-11-14 18:21:53 公開日:2023-11-11
# L3 アンサンブル:基礎言語モデルのアンサンブルのための生涯学習アプローチ L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models ( http://arxiv.org/abs/2311.06493v1 ) ライセンス: Link先を確認	Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur	(参考訳) 特定のタスクのための微調整済み基礎言語モデル(FLM)は、特にリソース制約のあるデバイスでは、しばしば実用的ではない。これは、自然言語処理(NLP)タスクのストリームに継続的に適応する、生涯学習(L3)フレームワークの開発を必要とする。本稿では,未知のデータから意味のある表現を抽出し,構造化知識ベースを構築し,タスク性能を漸進的に向上させるアプローチを提案する。我々は,GLUE や SuperGLUE などのベンチマークを含む様々な NLP タスクの有効性を検証する実験を行った。精度,トレーニング効率,知識伝達指標において,優れたパフォーマンスを測定した。初期実験の結果, 提案手法はflmに比べて, モデルの精度を4%～36%向上させることがわかった。さらに、L3モデルは、STSベンチマークで与えられたタスクの最先端言語モデル(T5)と比較して、競争力や優れたパフォーマンス(最大15.4%の精度向上)を維持しながら、微調整のアプローチよりも優れている。 Fine-tuning pre-trained foundational language models (FLM) for specific tasks is often impractical, especially for resource-constrained devices. This necessitates the development of a Lifelong Learning (L3) framework that continuously adapts to a stream of Natural Language Processing (NLP) tasks efficiently. We propose an approach that focuses on extracting meaningful representations from unseen data, constructing a structured knowledge base, and improving task performance incrementally. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. We measured good performance across the accuracy, training efficiency, and knowledge transfer metrics. Initial experimental results show that the proposed L3 ensemble method increases the model accuracy by 4% ~ 36% compared to the fine-tuned FLM. Furthermore, L3 model outperforms naive fine-tuning approaches while maintaining competitive or superior performance (up to 15.4% increase in accuracy) compared to the state-of-the-art language model (T5) for the given task, STS benchmark.	翻訳日:2023-11-14 18:21:29 公開日:2023-11-11
# 動的葉を持つ時空量子力学と古典力学 Spacetime quantum and classical mechanics with dynamical foliation ( http://arxiv.org/abs/2311.06486v1 ) ライセンス: Link先を確認	N. L. Diaz, J. M. Matera, R. Rossignoli	(参考訳) 古典物理学の通常の位相空間は、空間と時間が異なる扱いをし、この差は場の理論と量子力学(qm)に引き継がれる。本稿では、位相空間を2つの主要な拡張により拡張する。まず,ルジャンドル変換の時間選択を動的変数に促進する。次に、物質場のポアソン括弧を時空対称形式に拡張する。続く「時相空間」は、相対論的場の理論に対するハミルトン方程式の明示的な共変版を得るために用いられる。形式主義の正準的な量子化は、場の時空の可換関係を満足し、葉は量子である。このアプローチでは、古典的作用は作用素に昇格し、物質分離分割における非分離性を通して明示的な共分散を保持する。新しい非因果的枠組み(異なる時間における場が独立である)と従来のQMとの対応性を確立する問題は、時空への空間的相関の一般化によって解決される。この一般化では、ハミルトン粒子は作用に置き換わり、従来の粒子はオフシェル粒子に置き換わる。葉が量子化されると、pageおよびwootters機構と類似して、葉の固有状態の条件付けによって以前の地図を復元する。また、システムと環境の間の量子相関から与えられた理論の因果構造が現れる対応の解釈も提供する。このアイデアは一般的な量子系を包含し、密度行列を空間と時間の両方で相関子の情報を含む作用素に一般化することができる。 The conventional phase space of classical physics treats space and time differently, and this difference carries over to field theories and quantum mechanics (QM). In this paper, the phase space is enhanced through two main extensions. Firstly, we promote the time choice of the Legendre transform to a dynamical variable. Secondly, we extend the Poisson brackets of matter fields to a spacetime symmetric form. The ensuing "spacetime phase space" is employed to obtain an explicitly covariant version of Hamilton equations for relativistic field theories. A canonical-like quantization of the formalism is then presented in which the fields satisfy spacetime commutation relations and the foliation is quantum. In this approach, the classical action is also promoted to an operator and retains explicit covariance through its non-separability in the matter-foliation partition. The problem of establishing a correspondence between the new noncausal framework (where fields at different times are independent) and conventional QM is solved through a generalization of spacelike correlators to spacetime. In this generalization, the Hamiltonian is replaced by the action, and conventional particles by off-shell particles. 
When the foliation is quantized, the previous map is recovered by conditioning on foliation eigenstates, in analogy with the Page and Wootters mechanism. We also provide an interpretation of the correspondence in which the causal structure of a given theory emerges from the quantum correlations between the system and an environment. This idea holds for general quantum systems and allows one to generalize the density matrix to an operator containing the information of correlators both in space and time.	翻訳日:2023-11-14 18:21:13 公開日:2023-11-11
# 重み付き$p$-Rényiエントロピーパワー不等式:量子シャノン理論への情報理論 Weighted $p$-Rényi Entropy Power Inequality: Information Theory to Quantum Shannon Theory ( http://arxiv.org/abs/2311.06484v1 ) ライセンス: Link先を確認	Junseo Lee, Hyeonjun Yeo, Kabgyun Jeong	(参考訳) $p$-Rényi エントロピーパワーの不等式を、2つの独立連続確率変数 $X$ と $Y$ の重み係数 $t$ で研究する。この拡張は本質的に、ボブコフとマルシグリッティによるシャープ・ヤングの不等式による変調に依存する。我々の研究は量子シャノン理論の基本的な研究結果として利用でき、量子系におけるエントロピーパワー不等式の Rényi 版を提供する。 We study the $p$-Rényi entropy power inequality with a weight factor $t$ on two independent continuous random variables $X$ and $Y$. The extension essentially relies on a modulation on the sharp Young's inequality due to Bobkov and Marsiglietti. Our research provides a key result that can be used as a fundamental research finding in quantum Shannon theory, as it offers a Rényi version of the entropy power inequality for quantum systems.	翻訳日:2023-11-14 18:20:47 公開日:2023-11-11
# 重ね合わせネットワークによる物理学習の改善:ニューラルネットワークとディープオペレータネットワークへの応用 Stacked networks improve physics-informed training: applications to neural networks and deep operator networks ( http://arxiv.org/abs/2311.06483v1 ) ライセンス: Link先を確認	Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis	(参考訳) 物理インフォームドニューラルネットワークとオペレータネットワークは、物理システムをモデル化する方程式を効果的に解くことを約束している。しかし、これらのネットワークはいくつかの方程式系に対して正確に訓練することは困難または不可能である。本稿では,物理インフォームドニューラルネットワークと演算子ネットワークを積み重ねてトレーニングを容易にする,新しい多忠実度フレームワークを提案する。そこで我々は,学習モデルの表現性を高めつつ,次のステップを訓練するための低忠実度入力として1ステップのアウトプットが機能するネットワークの連鎖を構築した。反復過程の各ステップで課される方程式は同じか異なる(シミュレート・アニーリングのように)。提案手法の反復的(スタックング)な性質は,直接学習しにくい解の特徴を段階的に学習することを可能にする。非線形振り子,波動方程式,粘性バーガース方程式などのベンチマーク問題を通じて,物理に変形したニューラルネットワークと演算子ネットワークの精度向上とサイズ削減にスタック化がいかに役立つかを示す。 Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.	翻訳日:2023-11-14 18:20:33 公開日:2023-11-11
# ロボット学習における分布外検出のためのトポロジーマッチング正規化フロー Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning ( http://arxiv.org/abs/2311.06481v1 ) ライセンス: Link先を確認	Jianxiang Feng, Jongseok Lee, Simon Geisler, Stephan Gunnemann, Rudolph Triebel	(参考訳) 現実の自律ロボットの信頼性の高い展開を容易にするためには、アウト・オブ・ディストリビューション(OOD)検出機能が必要であることが多い。 OOD検出のための強力なアプローチは、正規化フロー(NF)を用いた密度推定に基づいている。しかし,NFsを用いた先行的な研究は,複雑な対象分布とナイーブ基底分布とをトポロジカルに一致させることで,悪影響が生じる。本研究では,この位相的ミスマッチを,要求されるトポロジーに適合する情報論的目的を訓練した表現型クラス条件ベース分布を用いて回避する。提案手法は,OOD検出能力を向上しつつ,性能劣化や計算オーバーヘッドの最小化を伴わず,既存の学習モデルとの広範な互換性を享受できる。本研究では,密度推定と2次元物体検出ベンチマークにおいて,広範なベースラインと比較し,優れた結果を示す。さらに,本手法の適用性を実ロボットで示す。 To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions leading to adverse implications. In this work, we circumvent this topological mismatch using an expressive class-conditional base distribution trained with an information-theoretic objective to match the required topology. The proposed method enjoys the merits of wide compatibility with existing learned models without any performance degradation and minimum computation overhead while enhancing OOD detection capabilities. We demonstrate superior results in density estimation and 2D object detection benchmarks in comparison with extensive baselines. Moreover, we showcase the applicability of the method with a real-robot deployment.	翻訳日:2023-11-14 18:20:15 公開日:2023-11-11
# クラス不均衡に対処するために発生した呼吸音を用いた敵対的微調整 Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance ( http://arxiv.org/abs/2311.06480v1 ) ライセンス: Link先を確認	June-Woo Kim, Chihyeon Yoon, Miika Toikkanen, Sangmin Bae, Ho-Young Jung	(参考訳) 深層生成モデルは、データの不足に対処するために医療画像領域において有望なアプローチとして現れてきた。しかし、呼吸音などのシーケンシャルなデータに対する使用は調査されていない。本研究では,条件付きニューラルボコーダとして音響拡散モデルを用いた非平衡呼吸音データ拡張手法を提案する。また, 合成音と実呼吸音の特徴を整合させ, 呼吸音の分類性能を向上させるために, 簡易かつ効果的な対向微調整法を実証した。 ICBHIデータセットにおける実験結果から,提案する対向微調整は有効である一方,従来の拡張法のみを用いた場合は性能が低下することが示された。さらに,本手法はICBHIスコアでベースラインを2.24%上回り,マイノリティクラスの精度を最大26.58%向上させる。追加資料については、 https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound でコードを提供します。 Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.	翻訳日:2023-11-14 18:19:59 公開日:2023-11-11
# 点状エミッターに焦点をあてた3次元イメージフリー共焦点 FiND: Few-shot three-dimensional image-free confocal focusing on point-like emitters ( http://arxiv.org/abs/2311.06479v1 ) ライセンス: Link先を確認	Swetapadma Sahoo, Junyue Jiang, Jaden Li, Kieran Loehr, Chad E. Germany, Jincheng Zhou, Bryan K. Clark, Simeon I. Bogdanov	(参考訳) 共焦点蛍光顕微鏡は生体分子、材料欠陥、量子光源などの点状放出物質の研究に広く応用されている。共焦点法では、光学分解能の向上、劇的な蛍光背景の拒絶、サブナノメータの局在化、蛍光バイオマーカーの超分解能イメージング、単一分子追跡、量子エミッタのキャラクタリゼーションに有用である。しかし、共焦点顕微鏡では、点状エミッタに焦点をあてる高速でノイズの少ない自動3Dが欠落している。ここでは,ハードウェアアドオンや修正を必要としない,イメージフリーな非トレーニング型3dフォーカスフレームワークであるfind (focusing in noise domain)を紹介する。 FiND は信号対雑音比を 1 まで減らし、信号対雑音比を 5 以上で数発操作する。 FiNDは、教師なしで大規模な、異質な量子エミッタの集合に焦点を合わせることができる。さらに,1つのnvセンターのドリフト軌道を10nmの精度で無期限に追従することにより,リアルタイム3dトラッキングの探索の可能性を示す。その結果,findは生物学,物質科学,量子光学における点状エミッタのスケーラブルな解析に有用なフレームワークであることがわかった。 Confocal fluorescence microscopy is widely applied for the study of point-like emitters such as biomolecules, material defects, and quantum light sources. Confocal techniques offer increased optical resolution, dramatic fluorescence background rejection and sub-nanometer localization, useful in super-resolution imaging of fluorescent biomarkers, single-molecule tracking, or the characterization of quantum emitters. However, rapid, noise-robust automated 3D focusing on point-like emitters has been missing for confocal microscopes. Here, we introduce FiND (Focusing in Noisy Domain), an imaging-free, non-trained 3D focusing framework that requires no hardware add-ons or modifications. FiND achieves focusing for signal-to-noise ratios down to 1, with a few-shot operation for signal-to-noise ratios above 5. FiND enables unsupervised, large-scale focusing on a heterogeneous set of quantum emitters. Additionally, we demonstrate the potential of FiND for real-time 3D tracking by following the drift trajectory of a single NV center indefinitely with a positional precision of < 10 nm. 
Our results show that FiND is a useful focusing framework for the scalable analysis of point-like emitters in biology, material science, and quantum optics.	翻訳日:2023-11-14 18:19:39 公開日:2023-11-11
# 第1回生成AIと法に関するワークショップ報告 Report of the 1st Workshop on Generative AI and Law ( http://arxiv.org/abs/2311.06477v1 ) ライセンス: Link先を確認	A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, Milad Nasr, Paul Ohm, Adam Roberts, Tom Rubin, Pamela Samuelson, Ludwig Schubert, Kristen Vaccaro, Luis Villa, Felix Wu, Elana Zeide	(参考訳) 本報告では,2023年7月に開催された第1回生成AI法ワークショップ(GenLaw)について述べる。コンピュータ科学と法学の実践者と学者の学際的なグループが集まり、生成aiに関する法律と法のための生成aiによって提示される技術的、教義的、そして政策上の課題について議論し、特にアメリカ法を強調した。我々は、なぜジェネレーティブAIが法律にとって非常に重要で、非常に難しいのか、という高いレベルの声明でレポートを開始する。これらの課題を満たすために、我々は、必要不可欠なニーズがあると結論づける。 1) 専門分野にまたがる専門家に共通の概念言語を提供する共有知識ベース 2)他のコンピュータ及びAIシステムと比較して,生成型AIシステムの特有な技術的能力の明確化 3) これらの制度が提起する法的問題に関する論理的分類,及び 4) 創発的AIと法律の交差する新興問題における協力と知識共有を促進するための具体的な研究課題。本報告では,これらのニーズに対処し始めるgenlawワークショップの要点をまとめる。リストされた著者の全員がこのレポートをベースとしたワークショップに貢献したが、彼らとその組織は必ずしもこのレポートのすべての特定の主張を支持していない。 This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. 
To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.	翻訳日:2023-11-14 18:19:12 公開日:2023-11-11
# グラフニューラルネットワークを用いた非構造メッシュ内の渦の同定 Identification of vortex in unstructured mesh with graph neural networks ( http://arxiv.org/abs/2311.06557v1 ) ライセンス: Link先を確認	Lianfa Wang, Yvan Fournier, Jean-Francois Wald, Youssef Mesri	(参考訳) 深層学習は計算流体力学(cfd)データベースからの流れ特性を識別し、研究者が流れ場をよりよく理解できるように支援し、幾何学設計を最適化し、対応する流れ特性に対して正しいcfd構成を選択するために用いられる。畳み込みニューラルネットワーク(CNN)は、フロー特徴の抽出と識別に最も一般的なアルゴリズムの1つである。しかし、追加のフロー場補間なしでの使用は、複雑な幾何学や不規則なメッシュが通常用いられる実際の産業ケースに限定する単純なドメイン幾何学と正規メッシュに限られる。上記の問題に着目し,非構造化メッシュ上でのCFD結果の渦を特定するために,U-Netアーキテクチャを用いたグラフニューラルネットワーク(GNN)モデルを提案する。 CFDメッシュからの代数的乗法を用いたグラフ生成とグラフ階層構築について述べる。 2次元CFDメッシュにおける渦領域をラベル付けするための渦自動ラベル法を提案する。まず, cnn の入力セットを最適化し, cnn モデルに対する現在の gnn カーネルのベンチマークを行い, 分類精度, 訓練効率, 渦形態の同定により, gnn カーネルの性能評価を行った。最後に,非構造メッシュへのアプローチの適応性と,レイノルズ数が異なる乱流モデルが異なる場合に対する一般性を示す。 Deep learning has been employed to identify flow characteristics from Computational Fluid Dynamics (CFD) databases to assist the researcher to better understand the flow field, to optimize the geometry design and to select the correct CFD configuration for corresponding flow characteristics. Convolutional Neural Network (CNN) is one of the most popular algorithms used to extract and identify flow features. However its use, without any additional flow field interpolation, is limited to the simple domain geometry and regular meshes which limits its application to real industrial cases where complex geometry and irregular meshes are usually used. Aiming at the aforementioned problems, we present a Graph Neural Network (GNN) based model with U-Net architecture to identify the vortex in CFD results on unstructured meshes. The graph generation and graph hierarchy construction using algebraic multigrid method from CFD meshes are introduced. A vortex auto-labeling method is proposed to label vortex regions in 2D CFD meshes. 
We refine our approach by first optimizing the input set on CNNs, then benchmarking current GNN kernels against the CNN model and evaluating the performance of the GNN kernels in terms of classification accuracy, training efficiency, and identified vortex morphology. Finally, we demonstrate the adaptability of our approach to unstructured meshes and its generality to unseen cases with different turbulence models at different Reynolds numbers.	翻訳日:2023-11-14 18:10:42 公開日:2023-11-11
# 分布シフトによるゼロショット言語間感性分類 : 探索的研究 Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study ( http://arxiv.org/abs/2311.06549v1 ) ライセンス: Link先を確認	Maarten De Raedt, Semere Kiros Bitew, Fr\'ederic Godin, Thomas Demeester and Chris Develder	(参考訳) unseenドメインのout-of-distribution(ood)テストサンプルにおける微調整された言語モデルのパフォーマンスの脆さは、英語でよく研究されているが、多言語モデルでは未検討である。そこで本研究では,OODテストデータのゼロショット言語間転送設定における一般化について検討し,列車データとテストデータ間の言語とドメインのシフトが与える影響を解析した。さらに,単言語的な英語設定ではCADが有用であることが示されているため,OODの一般化改善におけるCADの有効性について検討した。最後に,最近の大規模言語モデル(LLM)のパワーを活用し,CADに関連付けられたコストのかかるアノテーションプロセスを回避するための2つの新しいOOD一般化手法を提案する。英語のimdb movie reviewsでトレーニングされた labse, mbert, xlm-rの3つの多言語モデルを用いて実験を行い,amazon product reviews, tweet, restaurant reviewsの13言語でoodテストセットを評価した。その結果,単言語英語ではOODの低下がみられた。さらに (i)もともとの高リソース言語からの反事実は低リソース言語のOOD一般化を改善し、 (II) 新たに提案したコスト効率のアプローチは,Amazon および Restaurant のレビューにおいて CAD と同等あるいは最大で 3.1% の精度に達する。 The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to benefit in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with 3 multilingual models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. 
Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.	翻訳日:2023-11-14 18:10:21 公開日:2023-11-11
# チャートからatlasへ:潜在空間を1つにマージする From Charts to Atlas: Merging Latent Spaces into One ( http://arxiv.org/abs/2311.06547v1 ) ライセンス: Link先を確認	Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Liò, Emanuele Rodolà	(参考訳) 意味的に関連したデータセットとタスクでトレーニングされたモデルは、潜在空間内で同等のサンプル間関係を示す。本研究では,そのような潜在空間を集約し,それらの情報を包含する統一空間を作成する。この目的のために、相対的な表現を用いて空間を描画し、簡単な平均でそれらを集約する2段階のアプローチであるRelative Latent Space Aggregationを導入する。分類問題を3つの異なる設定(サンプル、クラス、あるいは両方)で一連の学習タスクに慎重に分割します。次に各タスクでモデルをトレーニングし、結果の潜在空間を集約します。集約された空間を、すべてのタスクで訓練されたエンドツーエンドモデルから派生した空間と比較し、2つの空間が類似していることを示す。次に、集約された空間が分類に適していることを観察し、その表現の中にタスク固有の埋め込み器が残したユニークなインプリントが原因であることを実証的に示す。最終的に、共有領域が存在しないシナリオでフレームワークをテストし、ナイーブマージよりもメリットが少なくても、スペースのマージに引き続き使用できることを示します。 Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation, a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that it is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging.	翻訳日:2023-11-14 18:09:55 公開日:2023-11-11
# 集合論による一般化の理解 Understanding Generalization via Set Theory ( http://arxiv.org/abs/2311.06545v1 ) ライセンス: Link先を確認	Shiqi Liu	(参考訳) 一般化は機械学習モデルの中核にある。しかし、一般化の定義は完全には明確ではない。アルゴリズム,仮説,データセットの一般化の概念を導入するために集合論を用いる。データセットの一般化の性質を解析し、代理一般化手順に関する定理を証明する。この定理は一般化の方法につながる。 MNISTデータセットの一般化実験により,13,541個のサンプルベースを得た。モデルの性能を評価するためにトレーニングセット全体を使用すると、モデルの精度は99.945%になる。しかし、サンプルベースをシフトしたり、ニューラルネットワーク構造を変更したりすると、性能は著しく低下する。また、常に誤予測されたサンプルを特定し、それらがすべて難しい例であることを示す。実験により,一般化定義の精度と提案手法の有効性を実証した。集合論的推論と実験の両方が一般化をより理解するのに役立ちます。 Generalization is at the core of machine learning models. However, the definition of generalization is not entirely clear. We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. We analyze the properties of dataset generalization and prove a theorem on surrogate generalization procedures. This theorem leads to our generalization method. Through a generalization experiment on the MNIST dataset, we obtain 13,541 sample bases. When we use the entire training set to evaluate the model's performance, the models achieve an accuracy of 99.945%. However, if we shift the sample bases or modify the neural network structure, the performance experiences a significant decline. We also identify consistently mispredicted samples and find that they are all challenging examples. The experiments substantiated the accuracy of the generalization definition and the effectiveness of the proposed methods. Both the set-theoretic deduction and the experiments help us better understand generalization.	翻訳日:2023-11-14 18:09:36 公開日:2023-11-11
# 双方向長期記憶ネットワークを用いた色生成 Generation Of Colors using Bidirectional Long Short Term Memory Networks ( http://arxiv.org/abs/2311.06542v1 ) ライセンス: Link先を確認	A. Sinha	(参考訳) 人間の視覚は、200万から700万の識別可能な色合いと推定される広大な色を区別することができる。しかし、この印象的な範囲は、これらの色が我々の辞書の中で正確に命名され、記述されていることを本質的に意味していない。私たちはしばしば、日常生活で身近な物体や概念と色を関連付けます。この研究は、無数の陰影に対する視覚的認識と、それらを正確に表現し、命名する能力のギャップを埋めようとしている。この目的を達成するために,双方向長短期記憶(BiLSTM)ネットワークとアクティブラーニングを利用した新しいモデルが開発された。このモデルは、この研究のために慎重にキュレートされたプロプライエタリなデータセット上で動作する。本研究の主な目的は、以前は名前のない色を分類・命名したり、伝統的な色用語を損なう中間色を識別するための多用途ツールを作ることである。この発見は、色知覚と言語に対する我々の理解を革新するこの革新的なアプローチの可能性を基礎にしている。本研究は, 厳密な実験と分析を通じて, 多様な産業における自然言語処理(NLP)応用の道筋を照らすものである。広い色スペクトルの探索を容易にすることで、NLPの潜在的な応用は従来の境界を越えて拡張される。 Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 to 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with Active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. 
By facilitating the exploration of the vast colour spectrum, the potential applications of NLP are extended beyond conventional boundaries.	翻訳日:2023-11-14 18:09:28 公開日:2023-11-11
# 集積半導体フォトニクスプラットフォーム上の室温エンタングル量子プロセッサ Room-Temperature entangled quantum processor on integrated semiconductor photonics platform ( http://arxiv.org/abs/2311.06541v1 ) ライセンス: Link先を確認	Haibo Hu, Yu Zhou, Ailun Yi, Tongyuan Bao, Chengying Liu, Qi Luo, Yao Zhang, Zi Wang, Zhengtong Liu, Shuming Xiao, Xin Ou, and Qinghai Song	(参考訳) 4H-シリコンカーバイド・オン・インシュレータ (SiCOI) プラットフォームの台頭は、モノリシック量子フォトニクスネットワークの実現に向けた有望な道筋である。しかし、これらの集積フォトニクスプラットフォーム上で室温のエンタングルレジスタを確立するという課題は未解決のままである。ここでは、SiCOIプラットフォーム上で最初のエンタングルプロセッサを実証する。室温のSiCOI上で、単一ダイバカンシー(複空孔)電子スピンの決定論的生成と、単一の$^{13}$C核スピンのほぼ1に近いスピン初期化の両方が達成可能であることを示す。単一の核スピンをコヒーレントに操作することに加えて、このCMOS互換の半導体集積フォトニクス系において、忠実度0.89の最大エンタングル状態を準備した。この研究は、既存の欠陥ベースのコンピューティングおよびセンシングプロトコルにおけるコンパクトでオンチップなソリューションの基礎を確立し、SiCOIプラットフォームをモノリシックな量子フォトニクスネットワーク統合の最も有望な候補として位置づけている。 The rise of the 4H-silicon-carbide-on-insulator (SiCOI) platform marks a promising pathway towards the realization of monolithic quantum photonic networks. However, the challenge of establishing room-temperature entangled registers on these integrated photonics platforms remains unresolved. Herein, we demonstrate the first entangled processor on the SiCOI platform. We show that both deterministic generation of single divacancy electron spins and near-unity spin initialization of a single $^{13}$C nuclear spin can be achieved on SiCOI at room temperature. Besides coherently manipulating the single nuclear spin, a maximally entangled state with a fidelity of 0.89 has been prepared on this CMOS-compatible semiconductor-integrated photonics system. This work establishes the foundation for compact and on-chip solutions within existing defect-based computing and sensing protocols, positioning the SiCOI platform as the most promising candidate for integrated monolithic quantum photonic networks.	翻訳日:2023-11-14 18:09:09 公開日:2023-11-11
# 非等価な相互不偏基底の実験的実証 Experimental Demonstration of Inequivalent Mutually Unbiased Bases ( http://arxiv.org/abs/2311.06539v1 ) ライセンス: Link先を確認	Wen-Zhe Yan, Yunting Li, Zhibo Hou, Huangjun Zhu, Guo-Yong Xiang, Chuan-Feng Li, and Guang-Can Guo	(参考訳) 相互不偏基底(MUB)に基づく量子測定は、基礎研究や量子情報処理において重要な役割を果たす。非等価なMUBが存在することは知られているが、それらの操作上の違いについてはほとんど分かっておらず、実験的実証はなおさらである。本研究では、簡単な推定問題を用いて、高精度フォトニクスシステムに基づき、次元4における非等価なMUBの三つ組の操作的な違いを実験的に実証する。実験で得られた推定フィデリティは理論予測とよく一致し、平均偏差はわずか0.16$\%$である。これは最大推定フィデリティと最小推定フィデリティとの差(4.1$\%$)の25分の1である。実験により、非等価なMUBは異なる情報抽出能力を持ち、量子情報処理において異なる利点を持つことが明確に示された。 Quantum measurements based on mutually unbiased bases (MUB) play crucial roles in foundational studies and quantum information processing. It is known that there exist inequivalent MUB, but little is known about their operational distinctions, not to say experimental demonstration. In this work, by virtue of a simple estimation problem we experimentally demonstrate the operational distinctions between inequivalent triples of MUB in dimension 4 based on high-precision photonic systems. The experimental estimation fidelities coincide well with the theoretical predictions with only 0.16$\%$ average deviation, which is 25 times less than the difference (4.1$\%$) between the maximum estimation fidelity and the minimum estimation fidelity. Our experiments clearly demonstrate that inequivalent MUB have different information extraction capabilities and different merits for quantum information processing.	翻訳日:2023-11-14 18:08:52 公開日:2023-11-11
# 機械学習は社会科学において安全でなく無責任か? 再犯予測課題からのパラドックスと再考 Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks ( http://arxiv.org/abs/2311.06537v1 ) ライセンス: Link先を確認	Jianhong Liu (1), Dianshi Li (1) ((1) Faculty of Law, University of Macau, Macau, China)	(参考訳) 本論文は、社会科学への計算的アプローチを支えるハイステークスな事象予測について、基本的かつ激しく議論されている問題を取り上げる。我々は機械学習に批判的ないくつかの一般的な見解に疑問を呈し、その将来性を強調するとともに、計算手法と従来の社会科学アプローチの融合を促進する新しいパラダイムを概説する。 The paper addresses some fundamental and hotly debated issues for high-stakes event predictions underpinning the computational approach to social sciences. We question several prevalent views against machine learning and outline a new paradigm that highlights the promises and promotes the infusion of computational methods and conventional social science approaches.	翻訳日:2023-11-14 18:08:38 公開日:2023-11-11
# CrashCar101: 損傷評価のための手続き生成 CrashCar101: Procedural Generation for Damage Assessment ( http://arxiv.org/abs/2311.06536v1 ) ライセンス: Link先を確認	Jens Parslov, Erik Riise, Dim P. Papadopoulos	(参考訳) 本稿では,自動車などの車両の損傷評価の問題に対処することに関心がある。この作業では、位置や損傷の程度を検出するだけでなく、損傷部分の特定も必要となる。画像中の意味的部分と損傷のセグメンテーションのためのコンピュータビジョンシステムを訓練するためには,画像に高コストの画素アノテーションを加えて手動でアノテートする必要がある。このニーズを克服するために、これらのモデルをトレーニングするために合成データを使用することを提案する。合成データは、高い可変性、ピクセル精度のアノテーション、そして人間の介入なしに任意に大きなトレーニングセットをサンプルに提供することができる。本研究では, 3次元車両モデルに損傷を与えるプロシージャ生成パイプラインを提案し, 部品および損傷カテゴリに対する画素精度アノテーションと組み合わせた損傷車両の合成2次元画像を得る。私たちのアイデアを検証するために、パイプラインを実行し、CrashCar101データセットをレンダリングします。部品分割と損傷分割のタスクのために、3つの実際のデータセットで実験を行う。部分セグメンテーションについては,実データと合成データの組み合わせで学習したセグメンテーションモデルが,実データのみでトレーニングされたすべてのモデルよりも優れていることを示す。損傷セグメンテーションではCrashCar101のsim2real転送能力を示す。 In this paper, we are interested in addressing the problem of damage assessment for vehicles, such as cars. This task requires not only detecting the location and the extent of the damage but also identifying the damaged part. To train a computer vision system for the semantic part and damage segmentation in images, we need to manually annotate images with costly pixel annotations for both part categories and damage types. To overcome this need, we propose to use synthetic data to train these models. Synthetic data can provide samples with high variability, pixel-accurate annotations, and arbitrarily large training sets without any human intervention. We propose a procedural generation pipeline that damages 3D car models and we obtain synthetic 2D images of damaged cars paired with pixel-accurate annotations for part and damage categories. To validate our idea, we execute our pipeline and render our CrashCar101 dataset. We run experiments on three real datasets for the tasks of part and damage segmentation. For part segmentation, we show that the segmentation models trained on a combination of real data and our synthetic data outperform all models trained only on real data. 
For damage segmentation, we show the sim2real transfer ability of CrashCar101.	翻訳日:2023-11-14 18:08:32 公開日:2023-11-11
# 自動要約による裁判所意見に対する一般市民の理解向上 Enhancing Public Understanding of Court Opinions with Automated Summarizers ( http://arxiv.org/abs/2311.06534v1 ) ライセンス: Link先を確認	Elliott Ash and Aniket Kesari and Suresh Naidu and Lena Song and Dominik Stammbach	(参考訳) 書面による司法意見は、裁判所の決定に対する公的な信頼を構築するための重要な道具であるが、非専門家には理解が難しい場合がある。本稿では、AIアシスタントを用いて司法意見の簡易な要約を生成するパイプラインを提案する。これらの要約は一般市民にとってよりアクセスしやすく、非専門家にも理解しやすい。簡易な要約が、回答者による判決の重要な特徴の理解に役立つことを調査実験で示す。大規模言語モデルを用いた研究に法的ドメイン知識を統合する方法について論じる。以上の結果から、AIアシスタントには一般市民に情報を伝える役割が、弁護士にはアクセスしやすい要約を生成するプロセスを導く役割があることが示唆された。 Written judicial opinions are an important tool for building public trust in court decisions, yet they can be difficult for non-experts to understand. We present a pipeline for using an AI assistant to generate simplified summaries of judicial opinions. These are more accessible to the public and more easily understood by non-experts. We show in a survey experiment that the simplified summaries help respondents understand the key features of a ruling. We discuss how to integrate legal domain knowledge into studies using large language models. Our results suggest a role both for AI assistants to inform the public, and for lawyers to guide the process of generating accessible summaries.	翻訳日:2023-11-14 18:08:13 公開日:2023-11-11
# マルチモーダル・大規模多言語翻訳のための推論時における付加毒性の緩和 Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation ( http://arxiv.org/abs/2311.06532v1 ) ライセンス: Link先を確認	Marta R. Costa-jussà and David Dale and Maha Elbayad and Bokai Yu	(参考訳) 翻訳の文脈における付加毒性とは、入力に存在するものよりも毒性の高い翻訳出力が生成されることを指す。本稿では、付加毒性を特定し、推論時にこの問題を緩和する新しいパイプラインであるMinToxを提案する。 MinToxは、マルチモーダル(音声とテキスト)で多数の言語に対応する毒性検出分類器を使用する。緩和手法は多数の言語に大規模に、かつテキスト出力に直接適用される。 MinToxは、最新のマルチモーダルかつ大規模多言語の機械翻訳システムであるSEAMLESSM4Tに適用されている。このシステムに対して、MinToxはドメイン、モダリティ、言語方向を横断して付加毒性を大幅に緩和する。 MinToxは、翻訳品質を維持しながら、付加毒性のおよそ25%から95%(モダリティとドメインに依存する)を除去する。 Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale. The mitigation method is applied to languages at scale and directly in text outputs. MinTox is applied to SEAMLESSM4T, which is the latest multimodal and massively multilingual machine translation system. For this system, MinTox achieves significant added toxicity mitigation across domains, modalities and language directions. MinTox manages to approximately filter out from 25% to 95% of added toxicity (depending on the modality and domain) while keeping translation quality.	翻訳日:2023-11-14 18:08:02 公開日:2023-11-11
# chatgptが脆弱性管理問題を解決する方法 How ChatGPT is Solving Vulnerability Management Problem ( http://arxiv.org/abs/2311.06530v1 ) ライセンス: Link先を確認	Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang	(参考訳) 最近、ChatGPTはコード分析領域から大きな注目を集めています。以前の研究によると、chatgptには抽象構文木生成のような基本的なコード解析タスクの処理能力があり、コード構文と静的な振る舞いを理解するためにchatgptを使用する可能性を示している。しかし、chatgptがセキュリティの関連性の予測やパッチの正確性といった、より複雑な現実世界の脆弱性管理タスクを完了できるかどうかは不明であり、コード構文、プログラムの意味論、関連する手動コメントなど、さまざまな側面を包括的に理解する必要がある。本稿では,78,445のサンプルを含む大規模データセットを用いて,脆弱性管理プロセス全体に関わる6つのタスクに対するchatgptの能力について検討する。各タスクに対して、ChatGPTとSOTAのアプローチを比較し、異なるプロンプトの影響を調査し、困難を調査する。その結果,chatgptを脆弱性管理に活用できる可能性が示唆された。注目すべき例として、ChatGPTのソフトウェアバグレポートのタイトル生成などのタスクにおける熟練度がある。さらに,ChatGPTが抱える困難が明らかとなり,将来的な方向性に光を当てた。例えば、プロンプトでランダムな例を直接提供しても、脆弱性管理における優れたパフォーマンスを一貫して保証することはできない。対照的に、ChatGPTを自己ヒューリスティックな方法で活用 -- 実演例自体から専門知識を抽出し、抽出された専門知識をプロンプトに統合することは、有望な研究方向である。さらにChatGPTは、プロンプトの情報を誤解し、誤用することがある。したがって、ChatGPTが無関係なコンテンツよりも有益な情報に集中するよう効果的に導くことは、まだ未解決の問題である。 Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as the prediction of security relevance and patch correctness, which require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments. In this paper, we explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 78,445 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. 
The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Furthermore, our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt cannot consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way -- extracting expertise from demonstration examples itself and integrating the extracted expertise in the prompt is a promising research direction. Besides, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than the irrelevant content is still an open problem.	翻訳日:2023-11-14 18:07:48 公開日:2023-11-11
# TURBO: 自動エンコーダーのスイスナイフ TURBO: The Swiss Knife of Auto-Encoders ( http://arxiv.org/abs/2311.06527v1 ) ライセンス: Link先を確認	Guillaume Qu\'etant, Yury Belousov, Vitaliy Kinakh, Slava Voloshynovskiy	(参考訳) 本稿では,自動符号化手法の体系的解析と一般化を目的とした新しい情報理論フレームワークTURBOを提案する。まず、情報ボトルネックとボトルネックベースのネットワークの原理を自動エンコーディング設定で検証し、それらの固有の制限を識別することから始める。次に、TURBOフレームワークを導入し、情報フローを反映した2方向の様々なデータ表現間の相互情報の最大化からなる、その中核概念を包括的に導出する。このフレームワークには、多くの一般的なニューラルネットワークモデルが含まれている。本論文は,これらのモデルをすべて解明する上で,情報ボトルネックの概念が不十分であることを示す。 TURBOの導入は、データ表現とニューラルネットワークモデルの構造のより深い理解に寄与し、より効率的で汎用的なアプリケーションを可能にする。 We present a novel information-theoretic framework, termed as TURBO, designed to systematically analyse and generalise auto-encoding methods. We start by examining the principles of information bottleneck and bottleneck-based networks in the auto-encoding setting and identifying their inherent limitations, which become more prominent for data with multiple relevant, physics-related representations. The TURBO framework is then introduced, providing a comprehensive derivation of its core concept consisting of the maximisation of mutual information between various data representations expressed in two directions reflecting the information flows. We illustrate that numerous prevalent neural network models are encompassed within this framework. The paper underscores the insufficiency of the information bottleneck concept in elucidating all such models, thereby establishing TURBO as a preferable theoretical reference. The introduction of TURBO contributes to a richer understanding of data representation and the structure of neural network models, enabling more efficient and versatile applications.	翻訳日:2023-11-14 18:07:21 公開日:2023-11-11
# 最小記述長ホップフィールドネットワーク Minimum Description Length Hopfield Networks ( http://arxiv.org/abs/2311.06518v1 ) ライセンス: Link先を確認	Matan Abudy, Nur Lan, Emmanuel Chemla, Roni Katzir	(参考訳) 連想記憶アーキテクチャは記憶のために設計されているが、その検索方法を通じて、未知の入力への一種の一般化も提供する。この観点からは、保存された記憶はプロトタイプと見なすことができる。 MHN(Modern Hopfield Networks)に着目して、大きな記憶容量が一般化の機会を損なうことを示す。このトレードオフをより良く最適化するための解決策を提供する。これは最小記述長(MDL)を用いて、トレーニング中にどの記憶を保存するか、またいくつ保存するかを決定する。 Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this tradeoff. It relies on Minimum Description Length (MDL) to determine during training which memories to store, as well as how many of them.	翻訳日:2023-11-14 18:07:07 公開日:2023-11-11
# BClean:ベイジアンのデータクリーニングシステム BClean: A Bayesian Data Cleaning System ( http://arxiv.org/abs/2311.06517v1 ) ライセンス: Link先を確認	Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao	(参考訳) データクリーニングには、誤ったデータを修正し、汚いデータセットをよりクリーンなものに変換する、さまざまな原則を用いる、かなりの量の作業がある。一般的なアプローチの1つは、ベイズ法を含む確率的手法である。しかし、既存の確率的手法は、しばしば単純分布(例えばガウス分布)を仮定し、それらは実際には不適合であり、専門家が複雑な事前分布(例えば、プログラミング言語を介して)を提供する必要がある。この要件は労働集約的かつ費用がかかるため、実際のアプリケーションには適さない。本稿では,ベイズネットワークの自動構築とユーザインタラクションを特徴とするベイズ清掃システムbcleanを提案する。我々は、データクリーニング問題をベイズ推定として再キャストし、観測されたデータセットの属性とユーザが提供する事前情報の関係を完全に活用する。そこで本研究では,類似度関数を用いた構造学習に基づく関数依存発見法を拡張し,属性間の関係を捉えるベイズネットワーク構築手法を提案する。さらに,本システムでは,生成したベイズネットワークを修正して,自動生成プロセスで特定された事前情報や正確な不正確性を特定する。また,ベイズ推定に必要な効果的なスコアリングモデル(補償スコアリングモデル)を設計する。データクリーニングの効率を高めるために,グラフ分割,ドメインプルーニング,事前検出などベイズ推定のための近似手法を提案する。実世界のデータセットと合成データセットの両方について評価することで、bcleanはデータクリーニングにおいて最大0.9のf-測定を達成でき、既存のベイズ法を2%、その他のデータクリーニング法を15%上回る。 There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which is frequently underfitted in practice, or they necessitate experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. 
To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.	翻訳日:2023-11-14 18:06:58 公開日:2023-11-11
# フィードバック制御による量子絡み合いの発生と向上 Emergence and enhancement of feedback control induced quantum entanglement ( http://arxiv.org/abs/2311.06578v1 ) ライセンス: Link先を確認	M. Amazioug, D. Dutykh, M. Asjad	(参考訳) 本稿では,機械振動子やマグノンと相互作用しながらキャビティを脱出するキャビティモードにフィードバックを適用し,量子相関を制御する手法を提案する。移動鏡を有するハイブリッドキャビティマグノメカニカルシステムにおいて,提案するコヒーレントフィードバックスキームは,2成分と3成分の量子相関の強化を可能にする。さらに,コヒーレントフィードバック制御の存在下での環境温度に対して,結果として生じる絡み合いは頑健であることを示す。 We present a scheme for controlling quantum correlations by applying feedback to the cavity mode that exits a cavity while interacting with a mechanical oscillator and magnons. In a hybrid cavity magnomechanical system with a movable mirror, the proposed coherent feedback scheme allows for the enhancement of both bipartite and tripartite quantum correlations. Moreover, we demonstrate that the resulting entanglement remains robust with respect to ambient temperatures in the presence of coherent feedback control.	翻訳日:2023-11-14 17:59:14 公開日:2023-11-11
# 強化学習を用いたブラックボックスロボット制御のための知的社会学習に基づく最適化戦略 An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning ( http://arxiv.org/abs/2311.06576v1 ) ライセンス: Link先を確認	Xubo Yang, Jian Gao, Ting Wang, Yaozhen He	(参考訳) ロボットのインテリジェントな制御を実装することは、特に複雑なブラックボックスシステムを扱う場合、これらのロボットの内部動作の可視性と理解が欠如しているため、難しい作業である。本稿では,ブラックボックスロボットシステムのインテリジェント制御を実現するための知的社会学習(ISL)アルゴリズムを提案する。ヒトの社会集団における個人間の相互学習にインスパイアされたISLは、学習、模倣、自己学習スタイルを含む。学習スタイルの個人は、最高のパフォーマーから学び、最も近い関係を形成するために、levy flight search戦略を使用する。模倣スタイルでは、個人はランダムな摂動戦略を用いて第2レベルのラプポートで最高のパフォーマーを模倣する。自己学習スタイルでは、個人は、ベストパフォーマーとの遠い関係を維持しながら、正規分布サンプリング手法を用いて独立して学習する。人口の個人は、それぞれのスタイルで自律的な知的エージェントとみなされる。ニューラルネットワークは、環境とロボットと相互作用し、ネットワークポリシーを反復的に最適化するために、3つのスタイルで戦略的行動を実行する。全体として、ISLは知的最適化の原理に基づいており、強化学習のアイデアを取り入れ、強力な探索能力、高速な計算速度、ハイパーパラメータの減少、スパース報酬に対する感度を持っている。提案するislアルゴリズムは,mujocoの6つの連続制御ベンチマークケースにおいて4つの最先端手法と比較し,その効果と利点を検証した。さらに、UR3ロボットのシミュレーションおよび実験的な把握タスクにISLを採用し、良好な解が得られる。 Implementing intelligent control of robots is a difficult task, especially when dealing with complex black-box systems, because of the lack of visibility and understanding of how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Levy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. Individuals in the population are regarded as autonomous intelligent agents in each style. 
Neural networks perform strategic actions in three styles to interact with the environment and the robot and iteratively optimize the network policy. Overall, ISL builds on the principles of intelligent optimization, incorporating ideas from reinforcement learning, and possesses strong search capabilities, fast computation speed, fewer hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is adopted in the simulation and experimental grasping tasks of the UR3 robot for validations, and satisfactory solutions are yielded.	翻訳日:2023-11-14 17:59:04 公開日:2023-11-11
# スパース注意に基づくコード分類のためのニューラルネットワーク Sparse Attention-Based Neural Networks for Code Classification ( http://arxiv.org/abs/2311.06575v1 ) ライセンス: Link先を確認	Ziyang Xiang, Zaixi Zhang, Qi Liu	(参考訳) ソースコードを正確かつ効率的に分類することは、実世界のプログラミング教育プラットフォーム管理において難しい問題である。近年,抽象構文木(AST)を用いたモデルベースアプローチがコード分類タスクに広く適用されている。本稿では,SACC(Sparse Attention-based Neural Network for Code Classification)というアプローチを紹介する。最初のステップでは、ソースコードが構文解析と事前処理を受けています。生成された抽象構文木をサブツリーのシーケンスに分割し、再帰的ニューラルネットワークを用いて符号化して高次元表現を得る。このステップでは、コードに含まれる論理構造と語彙レベルの情報の両方を同時に検討する。第2のステップでは、サブツリーの符号化されたシーケンスは、分類のためにスパースアテンション機構を組み込んだトランスフォーマーモデルに供給される。この方法は、自己認識機構の計算コストを効率よく低減し、有効性を保ちながらトレーニング速度を向上させる。私たちの研究は、コード分類タスクのユニークなニーズを満たすように設計された、慎重に設計されたスパースアテンションパターンを導入しました。この設計は冗長な情報の影響を低減し、モデル全体の性能を向上させるのに役立つ。最後に,前回の研究では,不完全分類ラベルやデータセットサイズの小さといった問題も扱っている。我々は,CodeNetデータセットに,膨大な量のデータを含むアルゴリズム関連ラベリングカテゴリを付加した。コード分類作業におけるSACCの有効性と効率を比較検討した。 Categorizing source codes accurately and efficiently is a challenging problem in real-world programming education platform management. In recent years, model-based approaches utilizing abstract syntax trees (ASTs) have been widely applied to code classification tasks. We introduce an approach named the Sparse Attention-based neural network for Code Classification (SACC) in this paper. The approach involves two main steps: In the first step, source code undergoes syntax parsing and preprocessing. The generated abstract syntax tree is split into sequences of subtrees and then encoded using a recursive neural network to obtain a high-dimensional representation. This step simultaneously considers both the logical structure and lexical level information contained within the code. In the second step, the encoded sequences of subtrees are fed into a Transformer model that incorporates sparse attention mechanisms for the purpose of classification. This method efficiently reduces the computational cost of the self-attention mechanisms, thus improving the training speed while preserving effectiveness. 
Our work introduces a carefully designed sparse attention pattern that is specifically designed to meet the unique needs of code classification tasks. This design helps reduce the influence of redundant information and enhances the overall performance of the model. Finally, we also deal with problems in previous related research, which include issues like incomplete classification labels and a small dataset size. We annotated the CodeNet dataset with algorithm-related labeling categories, which contains a significantly large amount of data. Extensive comparative experimental results demonstrate the effectiveness and efficiency of SACC for the code classification tasks.	翻訳日:2023-11-14 17:58:37 公開日:2023-11-11
# 量子ビット列比較器のための一般化空間効率アルゴリズム A Generalized Space-Efficient Algorithm for Quantum Bit String Comparators ( http://arxiv.org/abs/2311.06573v1 ) ライセンス: Link先を確認	Khuram Shahzad and Omar Usman Khan	(参考訳) 量子ビット列比較器(QBSC)は、2つの$n$量子ビット列に作用し、等しい、より大きい、より小さいといった両者の関係を判定できる。これはプログラミング言語で条件文が使われる方法に類似している。そのため、QBSCは量子コンピュータ上で実行または適応できる様々なアルゴリズムにおいて重要な役割を果たす。任意の$n$量子ビット長に対する効率的で一般化された比較器の開発は、コストが高く量子遅延を招くため、長い間課題とされてきた。効率的な比較器は固定長の入力に限られている。その結果、一般化回路を持たない比較器は、サイズが限られた問題には適しているものの、より高いレベルでは利用できない。本稿では、2つの補助ビットのみを用いて2つの$n$量子ビット論理状態を比較する一般化設計を提案する。この設計は、量子ビット要求、補助ビット使用量、量子コスト、量子遅延、ゲート操作、回路の複雑さに基づいて検討され、様々な入力長で包括的にテストされる。この研究は量子アルゴリズムの設計に十分な柔軟性をもたらし、量子アルゴリズムの開発を加速しうる。 Quantum Bit String Comparators (QBSC) operate on two sequences of n-qubits, enabling the determination of their relationships, such as equality, greater than, or less than. This is analogous to the way conditional statements are used in programming languages. Consequently, QBSCs play a crucial role in various algorithms that can be executed or adapted for quantum computers. The development of efficient and generalized comparators for any $n$-qubit length has long posed a challenge, as they have a high-cost footprint and lead to quantum delays. Comparators that are efficient are associated with inputs of fixed length. As a result, comparators without a generalized circuit cannot be employed at a higher level, though they are well-suited for problems with limited size requirements. In this paper, we introduce a generalized design for the comparison of two $n$-qubit logic states using just two ancillary bits. The design is examined on the basis of qubit requirements, ancillary bit usage, quantum cost, quantum delay, gate operations, and circuit complexity, and is tested comprehensively on various input lengths. The work allows for sufficient flexibility in the design of quantum algorithms, which can accelerate quantum algorithm development.	翻訳日:2023-11-14 17:58:14 公開日:2023-11-11
# Swin UNETR++: 完全自動放射線腫瘍治療に向けたトランスフォーマーベースの高密度線量予測 Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments ( http://arxiv.org/abs/2311.06572v1 ) ライセンス: Link先を確認	Kuancheng Wang, Hai Siong Tan, Rafe Mcbeth	(参考訳) 放射線腫瘍学の分野は、がん治療のための放射線治療計画の作成を人工知能で完全自動化することの恩恵を受けやすい独自の立場にある。この時間がかかる専門的なタスクは、患者の画像と臓器および腫瘍のセグメンテーションを組み合わせて、臨床治療目標を満たす3次元放射線線量分布を生成するものであり、ボクセルレベルの高密度予測に類似している。本研究では、Swin UNETR++を提案する。Swin UNETR++は軽量な3D Dual Cross-Attention (DCA) モジュールを含み、完全畳み込みニューラルネットワークに欠けている、各患者固有の解剖学的構造のボリューム内およびボリューム間の関係を捉える。我々のモデルは、Open Knowledge-Based Planningデータセットでトレーニング、検証、テストされた。予測された3D線量分布と正解(ground-truth)の3D線量分布の差を定量的に測定する指標であるDose Score $\overline{S_{\text{Dose}}}$およびDVH Score $\overline{S_{\text{DVH}}}$に加えて、予測の臨床的信頼性を評価するために、平均ボリューム単位受容率$\overline{R_{\text{VA}}}$と平均患者単位臨床受容率$\overline{R_{\text{PA}}}$という定性的指標を提案する。 Swin UNETR++は、検証およびテストデータセットにおいて最先端に近い性能を示し (検証: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; テスト: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%)、3D線量予測を実施可能な治療計画へと変換して完全自動化を促進する将来の研究の基礎を確立する。 The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense prediction. In this work, we propose Swin UNETR++, that contains a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra and inter-volume relationships of each patient's unique anatomy, which fully convolutional neural networks lack. 
Our model was trained, validated, and tested on the Open Knowledge-Based Planning dataset. In addition to metrics of Dose Score $\overline{S_{\text{Dose}}}$ and DVH Score $\overline{S_{\text{DVH}}}$ that quantitatively measure the difference between the predicted and ground-truth 3D radiation dose distribution, we propose the qualitative metrics of average volume-wise acceptance rate $\overline{R_{\text{VA}}}$ and average patient-wise clinical acceptance rate $\overline{R_{\text{PA}}}$ to assess the clinical reliability of the predictions. Swin UNETR++ demonstrates near-state-of-the-art performance on validation and test dataset (validation: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; test: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%), establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.	翻訳日:2023-11-14 17:57:54 公開日:2023-11-11
# 深層残差スパイキングニューラルネットワークにおいてADD残差接続に匹敵する精度を実現するOR残差接続 OR Residual Connection Achieving Comparable Accuracy to ADD Residual Connection in Deep Residual Spiking Neural Networks ( http://arxiv.org/abs/2311.06570v1 ) ライセンス: Link先を確認	Yimeng Shan, Xuerui Qiu, Rui-jie Zhu, Ruike Li, Meng Wang, Haicheng Qu	(参考訳) スパイキングニューラルネットワーク(SNN)は、その生物学的忠実さとエネルギー効率のよいスパイク駆動操作を実行する能力により、脳型計算の分野で大きな注目を集めている。 SNNの性能向上への需要が高まるにつれ、より深いネットワークをトレーニングする傾向は不可避となり、残差学習はディープニューラルネットワークをトレーニングするための重要な手法となっている。我々の調査では、深層残差スパイキングニューラルネットワークの代表であるSEW-ResNetが非イベント駆動の操作を含んでいることを確認した。これを修正するために、アーキテクチャにOR Residual Connection (ORRC) を導入する。さらに、高量子化に起因するエネルギー損失を相殺するために、Inhibitory Attention (IA) モジュールとMulti-dimensional Attention (MA) モジュールを融合したSynergistic Attention (SynA) モジュールを提案する。ネットワークにSynAを組み込むと、トレーニング後、モデルの分類精度に影響を与えることなく、ネットワーク内のショートカットの一部または全部が自然に脱落する「自然なプルーニング」現象が観察された。これにより計算オーバーヘッドが大幅に削減され、エッジデバイスへのデプロイにより適したものとなる。様々な公開データセットを用いた実験の結果、SynAで強化したOR-Spiking ResNetは、ニューロン当たりわずか0.8スパイクで単一サンプル分類を達成した。さらに、他のスパイク残差モデルと比較して、精度が高く消費電力も低かった。コードはhttps://github.com/Ym-Shan/ORRC-SynA-natural-pruningで公開されている。 Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations. As the demand for heightened performance in SNNs surges, the trend towards training deeper networks becomes imperative, while residual learning stands as a pivotal method for training deep neural networks. In our investigation, we identified that the SEW-ResNet, a prominent representative of deep residual spiking neural networks, incorporates non-event-driven operations. To rectify this, we introduce the OR Residual connection (ORRC) to the architecture. Additionally, we propose the Synergistic Attention (SynA) module, an amalgamation of the Inhibitory Attention (IA) module and the Multi-dimensional Attention (MA) module, to offset energy loss stemming from high quantization. 
When integrating SynA into the network, we observed the phenomenon of "natural pruning", where after training, some or all of the shortcuts in the network naturally drop out without affecting the model's classification accuracy. This significantly reduces computational overhead and makes it more suitable for deployment on edge devices. Experimental results on various public datasets confirmed that the SynA enhanced OR-Spiking ResNet achieved single-sample classification with as little as 0.8 spikes per neuron. Moreover, when compared to other spike residual models, it exhibited higher accuracy and lower power consumption. Codes are available at https://github.com/Ym-Shan/ORRC-SynA-natural-pruning.	翻訳日:2023-11-14 17:56:58 公開日:2023-11-11
# SCADI:潜在変数モデルにおける自己教師付き因果的解きほぐし SCADI: Self-supervised Causal Disentanglement in Latent Variable Models ( http://arxiv.org/abs/2311.06567v1 ) ライセンス: Link先を確認	Heejeong Nam	(参考訳) 因果的解きほぐし(causal disentanglement)は、複雑な状況を捉える大きな可能性を秘めている。しかし、実用的で効率的なアプローチが欠けている。教師なしの解きほぐし手法の多くは、追加情報なしでは識別可能な結果を得られず、しばしばランダムに解きほぐされた出力をもたらすことが知られている。そのため、既存の解きほぐしモデルのほとんどは、内在的要因に関する情報を与える弱教師付きモデルであり、過大なコストが発生する。そこで本研究では、教師信号なしで意味的要因の発見とそれらの因果関係の学習を可能にする新しいモデルであるSCADI (SElf-supervised CAusal DIsentanglement) を提案する。本モデルは、マスク付き構造因果モデル(SCM)と因果的解きほぐしのための疑似ラベル生成器を組み合わせ、自己教師付き因果的解きほぐしモデルに新たな方向性を示すことを目指す。 Causal disentanglement has great potential for capturing complex situations. However, there is a lack of practical and efficient approaches. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised, providing information about intrinsic factors, which incurs excessive costs. We therefore propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that enables the model to discover semantic factors and learn their causal relationships without any supervision. This model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.	翻訳日:2023-11-14 17:56:33 公開日:2023-11-11
# Convolve and Conquer: Wienerフィルタによるデータ比較 Convolve and Conquer: Data Comparison with Wiener Filters ( http://arxiv.org/abs/2311.06558v1 ) ライセンス: Link先を確認	Deborah Pelacani Cruz, George Strong, Oscar Bates, Carlos Cueto, Jiashun Yao, Lluis Guasch	(参考訳) データサンプル間の差異および/または類似性の定量的評価は、データ分布の学習に関連する最適化問題を定義し、形作る。現在のデータ比較法は、そのような分布を捉える際に制限があるか、最適化に望ましい数学的性質(例えば、滑らかさ、微分可能性、凸性)を欠くことが多い。本稿では,Wienerフィルタ理論に着想を得て,ペアサンプル間の(非)類似性を測定する新しい手法を提案する。 Wienerフィルタの畳み込み特性により、グローバルに相関した方法でデータサンプルを包括的に比較できる。データ圧縮、医用画像の補完、平行移動を伴う分類、非パラメトリック生成モデリングの4つの機械学習応用において、我々のアプローチを検証する。その結果,従来の平均二乗誤差に基づく同等の実装と比較して,再構成画像の解像度が向上し,知覚品質とデータ忠実度が高まり,平行移動に対する堅牢性も得られた。 Quantitative evaluations of differences and/or similarities between data samples define and shape optimisation problems associated with learning data distributions. Current methods to compare data often suffer from limitations in capturing such distributions or lack desirable mathematical properties for optimisation (e.g. smoothness, differentiability, or convexity). In this paper, we introduce a new method to measure (dis)similarities between paired samples inspired by Wiener-filter theory. The convolutional nature of Wiener filters allows us to comprehensively compare data samples in a globally correlated way. We validate our approach in four machine learning applications: data compression, medical imaging imputation, translated classification, and non-parametric generative modelling. Our results demonstrate increased resolution in reconstructed images with better perceptual quality and higher data fidelity, as well as robustness against translations, compared to conventional mean-squared-error analogue implementations.	翻訳日:2023-11-14 17:56:18 公開日:2023-11-11
# Heuristics-Driven Link-of-Analogy Prompting:Document-Level Event Argument Extractionのための大規模言語モデルの強化 Heuristics-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction ( http://arxiv.org/abs/2311.06555v1 ) ライセンス: Link先を確認	Hanzhang Zhou, Junlang Qian, Zijian Feng, Hui Lu, Zixiao Zhu, Kezhi Mao	(参考訳) 本研究では,文書レベルのイベント引数抽出(EAE)における文脈内学習(ICL)について検討する。本論文では,この問題の主要な課題として,例の選択,文脈長の制限,イベントタイプの多さ,非推論タスクにおけるChain-of-Thought(CoT)プロンプトの限界などを挙げる。これらの課題に対処するために,Heuristic-Driven Link-of-Analogy (HD-LoA) プロンプト手法を提案する。具体的には、LLM が ICL による実演からタスク固有のヒューリスティックを学ぶという仮説を立て、検証する。この仮説に基づいて,場当たり的な例選択プロセスをタスクヒューリスティックを重視する体系的な手法に変換する,明示的なヒューリスティック駆動型実証構築手法を提案する。さらに,人間のアナロジー推論に触発されて,LLMが既知の状況とのアナロジーを描くことで新たな状況を処理できるようにし,適応性を高めるリンク・オブ・アナロジー・プロンプトを提案する。実験の結果,本手法は既存のプロンプト法や数発の教師あり学習法よりも優れており,文書レベルのEAEデータセットではF1スコアが4.53%,9.38%向上した。さらに感情分析や自然言語推論タスクに適用すると、HD-LoAプロンプトは2.87%と2.63%の精度向上を達成し、異なるタスク間での有効性を示している。 In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE). The paper identifies key challenges in this problem, including example selection, context length limitation, abundance of event types, and the limitation of Chain-of-Thought (CoT) prompting in non-reasoning tasks. To address these challenges, we introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a methodical method that emphasizes task heuristics. Additionally, inspired by the analogical reasoning of humans, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their adaptability. 
Extensive experiments show that our method outperforms the existing prompting methods and few-shot supervised learning methods, exhibiting F1 score improvements of 4.53% and 9.38% on the document-level EAE dataset. Furthermore, when applied to sentiment analysis and natural language inference tasks, the HD-LoA prompting achieves accuracy gains of 2.87% and 2.63%, indicating its effectiveness across different tasks.	翻訳日:2023-11-14 17:56:01 公開日:2023-11-11
# 複雑な相互作用ダイナミクスをモデル化するための因子化プロトタイプ付きグラフODE Graph ODE with Factorized Prototypes for Modeling Complicated Interacting Dynamics ( http://arxiv.org/abs/2311.06554v1 ) ライセンス: Link先を確認	Xiao Luo, Yiyang Gu, Huiyu Jiang, Jinsheng Huang, Wei Ju, Ming Zhang, Yizhou Sun	(参考訳) 本稿では、物理力学や生物学的過程を理解する上で重要な相互作用力学系のモデリング問題について考察する。最近の研究は主に幾何学グラフを用いてこれらの相互作用を表現し、強力なグラフニューラルネットワーク(GNN)によってそれを捉える。しかし、分布外シフトや複雑な基礎ルールといった困難なシナリオにおける相互作用のダイナミクスの予測は未解決である。本稿では,その問題に対処する因子化プロトタイプ付きグラフODE(GOAT)という新しい手法を提案する。 GOATの中核となるのは、文脈知識から得た因子化プロトタイプを連続グラフODEフレームワークに組み込むことである。具体的には、GOATは表現の解離(disentanglement)とシステムパラメータを用いて、過去の軌跡からオブジェクトレベルとシステムレベルの両方のコンテキストを抽出する。これにより、それらの独立した影響を明示的にモデル化し、システム変更時の一般化能力を高めることができる。そして,これらの解離された潜在表現をグラフODEモデルに統合し,モデル表現性を高めるための様々な相互作用プロトタイプの組み合わせを決定する。モデル全体は、尤度を最大化するためにエンドツーエンドの変分推論フレームワークを使用して最適化される。分布内および分布外の設定における広範囲な実験により、GOATの優位性が検証された。 This paper studies the problem of modeling interacting dynamical systems, which is critical for understanding physical dynamics and biological processes. Recent research predominantly uses geometric graphs to represent these interactions, which are then captured by powerful graph neural networks (GNNs). However, predicting interacting dynamics in challenging scenarios such as out-of-distribution shift and complicated underlying rules remains unsolved. In this paper, we propose a new approach named Graph ODE with factorized prototypes (GOAT) to address the problem. The core of GOAT is to incorporate factorized prototypes from contextual knowledge into a continuous graph ODE framework. Specifically, GOAT employs representation disentanglement and system parameters to extract both object-level and system-level contexts from historical trajectories, which allows us to explicitly model their independent influence and thus enhances the generalization capability under system changes. 
Then, we integrate these disentangled latent representations into a graph ODE model, which determines a combination of various interacting prototypes for enhanced model expressivity. The entire model is optimized using an end-to-end variational inference framework to maximize the likelihood. Extensive experiments in both in-distribution and out-of-distribution settings validate the superiority of GOAT.	翻訳日:2023-11-14 17:55:35 公開日:2023-11-11
# ビジュアルコモンセンスに基づく異種グラフコントラスト学習 Visual Commonsense based Heterogeneous Graph Contrastive Learning ( http://arxiv.org/abs/2311.06553v1 ) ライセンス: Link先を確認	Zongzhao Li, Xiangyu Zhu, Xi Zhang, Zhaoxiang Zhang, Zhen Lei	(参考訳) 視覚的質問応答 (VQA) のような多くのマルチモーダルアプリケーションにおいて、関連するキーオブジェクトの選択方法と複雑な関係性や言語領域の推論は2つの重要な問題である。本研究では,視覚的コモンセンス情報を組み込んで,視覚的推論タスクをより良く仕上げるための異種グラフコントラスト学習法を提案する。本手法はプラグイン・アンド・プレイ方式として設計されており,様々な代表手法と迅速かつ容易に組み合わせることができる。具体的には,コモンセンスに基づくコントラスト学習とグラフ関係ネットワークという2つの重要な構成要素を含む。コントラスト学習を用いて,識別対象と関連する視覚コモンセンス属性に焦点を絞ったモデルを指導する。さらに、グラフ関係ネットワークの導入により、同種エッジ間の相関関係と異種エッジ間の類似性に関するモデルが原因となり、情報伝達がより効果的になる。 4つのベンチマーク実験により,本手法は7つの代表的なVQAモデルを大幅に改善し,その有効性と一般化性を示した。 How to select relevant key objects and reason about the complex relationships cross vision and linguistic domain are two key issues in many multi-modality applications such as visual question answering (VQA). In this work, we incorporate the visual commonsense information and propose a heterogeneous graph contrastive learning method to better finish the visual reasoning task. Our method is designed as a plug-and-play way, so that it can be quickly and easily combined with a wide range of representative methods. Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network. Using contrastive learning, we guide the model concentrate more on discriminative objects and relevant visual commonsense attributes. Besides, thanks to the introduction of the Graph Relation Network, the model reasons about the correlations between homogeneous edges and the similarities between heterogeneous edges, which makes information transmission more effective. Extensive experiments on four benchmarks show that our method greatly improves seven representative VQA models, demonstrating its effectiveness and generalizability.	翻訳日:2023-11-14 17:55:13 公開日:2023-11-11
# Stain Consistency Learning: 自動デジタル病理セグメンテーションのためのStain Variationの扱い Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation ( http://arxiv.org/abs/2311.06552v1 ) ライセンス: Link先を確認	Michael Yeung, Todd Watts, Sean YW Tan, Pedro F. Ferreira, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang	(参考訳) 染色のばらつき(stain variation)は、デジタル病理の自動解析にまつわる固有の課題である。染色のばらつきに対する機械学習手法の頑健性を改善するために多くの方法が開発されているが、比較研究では性能上の利点は限定的であることが示されている。さらに, 染色のばらつきを扱う手法は主にH&E染色データ向けに開発され, 評価は概ね分類タスクに限られていた。本稿では,染色色不変特徴を学習するために,染色特異的増強と染色一貫性損失関数を組み合わせた新しい枠組みである染色一貫性学習を提案する。セグメンテーションタスクにおいて染色のばらつきを扱う手法について,初めての広範な比較を行い,マッソントリクローム染色およびH&E染色の細胞・核データセットについてそれぞれ10の方法を比較した。染色の正常化法では同等か劣る性能が得られたが, 染色増強法や染色敵対的手法では性能が向上し, 提案手法により一貫して最高の性能が得られた。コードは、https://github.com/mlyg/stain_consistency_learningで入手できる。 Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning	翻訳日:2023-11-14 17:54:53 公開日:2023-11-11
# FDNet:歯のCBCT画像のための特徴分離セグメンテーションネットワーク FDNet: Feature Decoupled Segmentation Network for Tooth CBCT Image ( http://arxiv.org/abs/2311.06551v1 ) ライセンス: Link先を確認	Xiang Feng, Chengkai Wang, Chengyu Wu, Yunxiang Li, Yongbo He, Shuai Wang, Yaiqi Wang	(参考訳) 精密歯列ビームCT(CBCT)画像分割は矯正治療計画に不可欠である。本稿では, CBCTスキャンで遭遇する歯質変化状況, 複雑なアーチファクトや不明瞭な歯の境界などに対して, FDNet(Feature Decoupled Segmentation Network, FDNet)を提案する。低周波ウェーブレット変換 (LF-Wavelet) は, 歯のグローバルな構造的整合性を強調することで, セマンティックな内容の充実を図り, SAMエンコーダを用いて境界線を改良し, 隣接する歯科構造とのコントラストを向上させる。これらの2つの側面を統合することで、FDNetはセマンティックギャップに十分対処し、詳細で正確なセグメンテーションを提供する。フレームワークの有効性は厳格なベンチマークによって検証され、それぞれ85.28%と75.23%のDiceとIoUのスコアを達成している。この意味的特徴と境界的特徴の革新的な分離は、各要素のユニークな強みを生かし、セグメンテーション性能を著しく向上させる。 Precise Tooth Cone Beam Computed Tomography (CBCT) image segmentation is crucial for orthodontic treatment planning. In this paper, we propose FDNet, a Feature Decoupled Segmentation Network, to excel in the face of the variable dental conditions encountered in CBCT scans, such as complex artifacts and indistinct tooth boundaries. The Low-Frequency Wavelet Transform (LF-Wavelet) is employed to enrich the semantic content by emphasizing the global structural integrity of the teeth, while the SAM encoder is leveraged to refine the boundary delineation, thus improving the contrast between adjacent dental structures. By integrating these dual aspects, FDNet adeptly addresses the semantic gap, providing a detailed and accurate segmentation. The framework's effectiveness is validated through rigorous benchmarks, achieving the top Dice and IoU scores of 85.28% and 75.23%, respectively. This innovative decoupling of semantic and boundary features capitalizes on the unique strengths of each element to significantly elevate the quality of segmentation performance.	翻訳日:2023-11-14 17:54:29 公開日:2023-11-11
# 周期駆動型散逸性超低温原子における非エルミートスキン効果 Non-Hermitian Skin Effect In Periodically-Driven Dissipative Ultracold Atoms ( http://arxiv.org/abs/2311.06550v1 ) ライセンス: Link先を確認	Zhao-Fan Cai and Tao Liu and Zhongmin Yang	(参考訳) 非エルミートスキン効果(NHSE)は、バルクバンド固有状態が系の局在した境界モードに崩壊することを特徴とするもので、非エルミート物理学の分野において最も顕著な性質の1つである。 NHSEに関する特異な物理現象は多くの関心を集めているが、実験的な実現には通常は非相反ホッピングが必要であり、超低温原子系では大きな課題に直面している。本研究では, 交互に配置された原子損失の存在下で周期的に駆動される超低温原子により, 1次元光学格子中でNHSEを実現することを提案する。高周波近似における有効フロッケハミルトニアンの研究により、周期駆動によって誘起されるNHSEのメカニズムを明らかにした。その結果,ロバストなNHSEは動的局在によって表される駆動位相によって調整できることがわかった。最も注目すべきは、異なる駆動位相を持つ2つの結合鎖に対する周期駆動誘起の臨界スキン効果を明らかにし、それがサイズ依存のトポロジカルなギャップ内モードの出現を伴うことである。本研究は,超低温原子系における非エルミート性と多体統計の相互作用により,NHSEを観測し,それに対応する特異な物理現象を探索するための実現可能な方法を提供する。 The non-Hermitian skin effect (NHSE), featured by the collapse of bulk-band eigenstates into the localized boundary modes of the systems, is one of the most striking properties in the fields of non-Hermitian physics. Unique physical phenomena related to the NHSE have attracted a lot of interest; however, their experimental realizations usually require nonreciprocal hopping, which faces a great challenge in ultracold-atom systems. In this work, we propose to realize the NHSE in a 1D optical lattice by periodically-driven ultracold atoms in the presence of staggered atomic loss. By studying the effective Floquet Hamiltonian in the high-frequency approximation, we reveal the underlying mechanism for the periodic-driving-induced NHSE. We found that the robust NHSE can be tuned by driving phase, which is manifested by the dynamical localization. Most remarkably, we uncover the periodic-driving-induced critical skin effect for two coupled chains with different driving phases, accompanied by the appearance of size-dependent topological in-gap modes. Our studies provide a feasible way for observing the NHSE and exploring corresponding unique physical phenomena due to the interplay of non-Hermiticity and many-body statistics in ultracold-atom systems.	翻訳日:2023-11-14 17:54:08 公開日:2023-11-11
# Back to Basics: 高速なノイズ除去反復アルゴリズム Back to Basics: Fast Denoising Iterative Algorithm ( http://arxiv.org/abs/2311.06634v1 ) ライセンス: Link先を確認	Deborah Pereg	(参考訳) ノイズ低減のための高速反復アルゴリズムであるBack to Basics (BTB)を紹介する。本手法は計算効率が高く, 訓練や正解データを必要とせず, 独立な雑音が存在する場合にも, 雑音レベルが不明な相関(コヒーレント)雑音が存在する場合にも適用できる。加法性白色ガウス雑音下での自然画像のノイズ除去,ポアソン分布画像のノイズ除去,光コヒーレンストモグラフィ(OCT)におけるスペックル抑制の3つの事例について検討した。実験結果から,提案手法は困難な雑音条件下でも画像品質を効果的に向上できることが示された。収束安定性に関する理論的保証が提供される。 We introduce Back to Basics (BTB), a fast iterative algorithm for noise reduction. Our method is computationally efficient, does not require training or ground truth data, and can be applied in the presence of independent noise, as well as correlated (coherent) noise, where the noise level is unknown. We examine three study cases: natural image denoising in the presence of additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT). Experimental results demonstrate that the proposed approach can effectively improve image quality in challenging noise settings. Theoretical guarantees are provided for convergence stability.	翻訳日:2023-11-14 17:46:23 公開日:2023-11-11
# 精神疾患検出アプリケーション、特にうつ病における機械学習と解釈可能な機械学習手法の利用の長所と短所:簡潔なレビュー The Pros and Cons of Using Machine Learning and Interpretable Machine Learning Methods in psychiatry detection applications, specifically depression disorder: A Brief Review ( http://arxiv.org/abs/2311.06633v1 ) ライセンス: Link先を確認	Hossein Simchi, Samira Tajik	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックにより、多くの人々が社会的活動を制限することを余儀なくされ、精神疾患、特にうつ病が増加した。これらの病気を正確かつ迅速に診断し、自殺などの重篤な結果を防ぐため、機械学習の利用がますます重要になっている。さらに、より良い治療のために正確で理解可能な診断を提供するためには、AI科学者と研究者は、解釈可能なAIベースのソリューションを開発する必要がある。本稿では、機械学習と解釈可能なAIの分野における関連論文の概要を紹介し、精神疾患検出アプリケーションでAIを使用することの利点と欠点の理解に役立てる。 The COVID-19 pandemic has forced many people to limit their social activities, which has resulted in a rise in mental illnesses, particularly depression. To diagnose these illnesses with accuracy and speed, and prevent severe outcomes such as suicide, the use of machine learning has become increasingly important. Additionally, to provide precise and understandable diagnoses for better treatment, AI scientists and researchers must develop interpretable AI-based solutions. This article provides an overview of relevant articles in the field of machine learning and interpretable AI, which helps to understand the advantages and disadvantages of using AI in psychiatry disorder detection applications.	翻訳日:2023-11-14 17:46:04 公開日:2023-11-11
# スパース正定値行列の特定のクラスの厳密な行列式 The Exact Determinant of a Specific Class of Sparse Positive Definite Matrices ( http://arxiv.org/abs/2311.06632v1 ) ライセンス: Link先を確認	Mehdi Molkaraie	(参考訳) スパースガウスグラフィカルモデルの特定のクラスに対して、共分散行列の行列式の閉形式解を与える。我々の枠組みでは、グラフィカル相互作用モデル(すなわち共分散選択モデル)は$\mathcal{K}_{n}$と$\mathcal{K}_{n-1}$の置換積(replacement product)に等しい。ここで$\mathcal{K}_n$は$n$頂点の完全グラフである。この解析はモデルの局所因子のフーリエ変換に基づいており、正規因子グラフ双対定理とホログラフィックアルゴリズムの応用と見なすことができる。変換されたグラフィカルモデルに行列式の補題(Matrix Determinant Lemma)を適用することにより、閉形式表現を得る。この文脈では、2つのガウスグラフィカルモデル間の同値性の概念も定義する。 For a specific class of sparse Gaussian graphical models, we provide a closed-form solution for the determinant of the covariance matrix. In our framework, the graphical interaction model (i.e., the covariance selection model) is equal to the replacement product of $\mathcal{K}_{n}$ and $\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$ vertices. Our analysis is based on taking the Fourier transform of the local factors of the model, which can be viewed as an application of the Normal Factor Graph Duality Theorem and holographic algorithms. The closed-form expression is obtained by applying the Matrix Determinant Lemma on the transformed graphical model. In this context, we will also define a notion of equivalence between two Gaussian graphical models.	翻訳日:2023-11-14 17:45:41 公開日:2023-11-11
# 画像品質伝達のための3次元条件拡散モデル -低磁場MRIへの応用- A 3D Conditional Diffusion Model for Image Quality Transfer -- An Application to Low-Field MRI ( http://arxiv.org/abs/2311.06631v1 ) ライセンス: Link先を確認	Seunghoi Kim, Henry F. J. Tregidgo, Ahmed K. Eldaly, Matteo Figini, Daniel C. Alexander	(参考訳) 低磁場(LF)MRIスキャナー(<1T)は、リソースが限られた環境や電源が不安定な環境で依然として普及している。しかし、高磁場(HF)スキャナよりも空間分解能とコントラストの低い画像が得られることが多い。この品質格差は、臨床医の不正確な解釈をもたらす可能性がある。画像品質伝達(IQT)は,低画質画像と高画質画像の間のマッピング関数を学習することにより,画像の品質を高めるために開発された。既存のIQTモデルは、しばしば高周波の特徴の復元に失敗し、ぼやけた出力をもたらす。本稿では,3次元ボリュームデータ,特にLF MR画像を改善するための3次元条件拡散モデルを提案する。さらに,ネットワークの自己注意とパディングにクロスバッチ機構を組み込んで,小さな3Dパッチの下でもより広いコンテキスト認識を確保する。 IQTと脳区画化(parcellation)のための公開データセットHuman Connectome Project(HCP)での実験は、我々のモデルが既存の手法よりも定量的かつ質的に優れていることを示した。コードは https://github.com/edshkim98/DiffusionIQT で公開されている。 Low-field (LF) MRI scanners (<1T) are still prevalent in settings with limited resources or unreliable power supply. However, they often yield images with lower spatial resolution and contrast than high-field (HF) scanners. This quality disparity can result in inaccurate clinician interpretations. Image Quality Transfer (IQT) has been developed to enhance the quality of images by learning a mapping function between low and high-quality images. Existing IQT models often fail to restore high-frequency features, leading to blurry output. In this paper, we propose a 3D conditional diffusion model to improve 3D volumetric data, specifically LF MR images. Additionally, we incorporate a cross-batch mechanism into the self-attention and padding of our network, ensuring broader contextual awareness even under small 3D patches. Experiments on the publicly available Human Connectome Project (HCP) dataset for IQT and brain parcellation demonstrate that our model outperforms existing methods both quantitatively and qualitatively. The code is publicly available at https://github.com/edshkim98/DiffusionIQT.	翻訳日:2023-11-14 17:45:01 公開日:2023-11-11
# エネルギー移行シナリオの合理化と鍵政策決定 Streamlining Energy Transition Scenarios to Key Policy Decisions ( http://arxiv.org/abs/2311.06625v1 ) ライセンス: Link先を確認	Florian Joseph Baader, Stefano Moret, Wolfram Wiesemann, Iain Staffell, André Bardow	(参考訳) エネルギー移行を取り巻く不確実性のため、モデラーはしばしば、政策立案者にとって解釈も行動も難しい多数のシナリオを提示することになる。もう1つのアプローチは、利害関係者の議論からいくつかの定性的なストーリーラインを定義することであるが、これはバイアスや実現不可能性の影響を受ける可能性がある。一般的な機械学習手法である決定木を活用することで、多くの定量的シナリオから解釈可能なストーリーラインを導き、エネルギー移行における重要な決定がどのように相互に関連しているかを示す。特に, 再生可能エネルギーとセクターカップリングの高度な導入を選択することで, 気候感度や需要の不確実性に対して, 世界的な脱炭素シナリオが堅牢になることを示す。また、化石燃料のないヨーロッパへのエネルギー移行は、主にバイオエネルギー、貯蔵、熱の電化の役割に関する選択によって決定される。我々の転用可能なアプローチは、膨大なエネルギーモデルの結果を少数の重要な決定に変換し、意思決定者がエネルギー移行を形作る主要な要因に優先順位を付けるのを導く。 Uncertainties surrounding the energy transition often lead modelers to present large sets of scenarios that are challenging for policymakers to interpret and act upon. An alternative approach is to define a few qualitative storylines from stakeholder discussions, which can be affected by biases and infeasibilities. Leveraging decision trees, a popular machine-learning technique, we derive interpretable storylines from many quantitative scenarios and show how the key decisions in the energy transition are interlinked. Specifically, our results demonstrate that choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand. Also, the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification. Our transferrable approach translates vast energy model results into a small set of critical decisions, guiding decision-makers in prioritizing the key factors that will shape the energy transition.	翻訳日:2023-11-14 17:44:41 公開日:2023-11-11
# VT-Former:インテリジェントハイウェイ交通システムのためのトランスフォーマーベース車両軌道予測手法 VT-Former: A Transformer-based Vehicle Trajectory Prediction Approach For Intelligent Highway Transportation Systems ( http://arxiv.org/abs/2311.06623v1 ) ライセンス: Link先を確認	Armin Danesh Pazho, Vinit Katariya, Ghazal Alinezhad Noghre, Hamed Tabkhi	(参考訳) 道路の安全性と交通管理の強化は、現代のサイバー物理システムやインテリジェントな輸送システムにとって重要な焦点となっている。自動車軌道予測は、高速道路や道路安全への多くの応用において重要な要素である。これらのアプリケーションには、交通管理や事故防止からワークゾーンの安全性の向上、エネルギー保全の最適化に至るまで、幅広いユースケースが含まれている。この文脈でインテリジェントな管理を実現する能力は、道路網を横断する監視カメラの展開とともに、人工知能(ai)の分野での発展によって大きく進歩した。本稿では,高速道路の安全と監視のための車両軌道予測のためのトランスフォーマーに基づく新しいアプローチ,VT-Formerを提案する。トランスフォーマを使用して長距離の時間パターンを捉えることに加えて、車両間の複雑な社会的相互作用を捉えるために、新しいグラフ注意トークン化(gat)モジュールが提案されている。これら2つのコアコンポーネントを組み合わせることで、車両軌道予測の正確なアプローチが達成される。車両軌道予測におけるVT-Formerの性能と,その一般化性とロバスト性を示す3つの異なる視点を持つ3つのベンチマークデータセットについて検討した。また,組込み基板上でのvt-formerの効率を評価し,サンプルアプリケーションとしての車両異常検出の可能性について検討し,その幅広い適用性を示す。 Enhancing roadway safety and traffic management has become an essential focus area for a broad range of modern cyber-physical systems and intelligent transportation systems. Vehicle Trajectory Prediction is a pivotal element within numerous applications for highway and road safety. These applications encompass a wide range of use cases, spanning from traffic management and accident prevention to enhancing work-zone safety and optimizing energy conservation. The ability to implement intelligent management in this context has been greatly advanced by the developments in the field of Artificial Intelligence (AI), alongside the increasing deployment of surveillance cameras across road networks. In this paper, we introduce a novel transformer-based approach for vehicle trajectory prediction for highway safety and surveillance, denoted as VT-Former. In addition to utilizing transformers to capture long-range temporal patterns, a new Graph Attentive Tokenization (GAT) module has been proposed to capture intricate social interactions among vehicles. 
Combining these two core components culminates in a precise approach for vehicle trajectory prediction. Our study on three benchmark datasets with three different viewpoints demonstrates the State-of-The-Art (SoTA) performance of VT-Former in vehicle trajectory prediction and its generalizability and robustness. We also evaluate VT-Former's efficiency on embedded boards and explore its potential for vehicle anomaly detection as a sample application, showcasing its broad applicability.	翻訳日:2023-11-14 17:44:05 公開日:2023-11-11
# TrainerAgent: LLM搭載マルチエージェントシステムによるカスタマイズ可能かつ効率的なモデルトレーニング TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System ( http://arxiv.org/abs/2311.06622v1 ) ライセンス: Link先を確認	Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He	(参考訳) AIモデルのトレーニングは、特にパーソナライズされたサービスを提供するカスタムモデルが必要な場合、常に困難だった。アルゴリズムエンジニアは、特定のビジネス要件に合わせて反復的にモデルを開発するための長いプロセスに直面することが多く、非専門家にとってはさらに困難である。高品質で効率的なモデル開発の探求は、大規模言語モデル(LLM)エージェントの出現とともに、業界において重要な焦点となっている。 LLMの強力な分析,計画,意思決定機能を活用し,タスク,データ,モデル,サーバエージェントを含むマルチエージェントフレームワークからなるTrainerAgentシステムを提案する。これらのエージェントは、ユーザ定義のタスク、入力データ、要求(例えば、精度、速度)を分析し、データとモデルの両方の観点から包括的な最適化を行い、満足なモデルを取得し、最終的にこれらのモデルをオンラインサービスとしてデプロイする。コンピュータビジョンおよび自然言語処理領域における古典的識別的・生成的タスクに関する実験的評価は,我々のシステムが所望の基準を満たすモデルを一貫して生成していることを示す。さらに、システムは、空想的なシナリオや非倫理的な要求など、達成不可能なタスクを批判的に識別し、拒否する能力を示し、堅牢性と安全性を確保する。本研究は, LLMを用いた分析, 意思決定, 実行能力の統合, および4つのエージェント間の協調により, 従来のモデル開発と比較して, 効率と品質が向上した望ましいモデルの実現において, 大幅な進歩を示すものである。我々は,AI分野におけるモデル開発の新たなパラダイムとして,学術および産業コミュニティにおけるTrainerAgentの研究の進展に,我々の研究が貢献することを期待している。 Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLM, we propose a TrainerAgent system comprising a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as an online service. 
Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.	翻訳日:2023-11-14 17:43:45 公開日:2023-11-11
# 粗粒土壌の粒子径解析のためのコンピュータビジョン Computer Vision for Particle Size Analysis of Coarse-Grained Soils ( http://arxiv.org/abs/2311.06613v1 ) ライセンス: Link先を確認	Sompote Youwai and Parchya Makam	(参考訳) 粒子径解析(PSA)は土壌の物理的特性を評価するための基礎技術である。しかし、ふるい分けのような伝統的な方法は時間と労力がかかる。本研究では,コンピュータビジョン(CV)とPythonプログラミング言語を用い,標準的な携帯電話カメラで粗粒土のPSAを行う新しいアプローチを提案する。高性能カメラの必要性をなくすことで, 利便性とコスト削減を実現する。本手法では,通常の照明条件下で撮影されたデジタル写真中の土壌粒子の検出と測定にOpenCVライブラリを使用する。正確な粒子径決定のために、既知の寸法のキャリブレーションターゲットを20種類の異なる砂サンプルと共に無地の紙に配置する。提案手法を従来のふるい分析と比較したところ, 2mm以上の土壌粒子に対しては平均絶対パーセント誤差(MAPE)が約6%と良好な性能を示した。しかし、粒子が2mmより小さいとMAPEが高くなり、最大60%に達する。この制限に対処するために,より小型の土壌粒子の画像を高分解能カメラで撮影することを推奨する。さらに,本手法の利点,限界,今後の改善の可能性についても論じる。注目すべきことに、このプログラムは携帯電話で実行でき、土壌サンプルを実験室に送ることなくすぐに結果を提供できる。この現場向きの特徴により,従来の実験室環境以外での現場利用に非常に便利である。最終的に、この新手法は、実験室でのふるい分析に頼らずに土壌の効率的な粒径分析を可能にし、業界に最初の変革をもたらすものである。 KEYWORDS:コンピュータビジョン、粒度、ARUCO Particle size analysis (PSA) is a fundamental technique for evaluating the physical characteristics of soils. However, traditional methods like sieving can be time-consuming and labor-intensive. In this study, we present a novel approach that utilizes computer vision (CV) and the Python programming language for PSA of coarse-grained soils, employing a standard mobile phone camera. By eliminating the need for a high-performance camera, our method offers convenience and cost savings. Our methodology involves using the OpenCV library to detect and measure soil particles in digital photographs taken under ordinary lighting conditions. For accurate particle size determination, a calibration target with known dimensions is placed on plain paper alongside 20 different sand samples. The proposed method is compared with traditional sieve analysis and exhibits satisfactory performance for soil particles larger than 2 mm, with a mean absolute percent error (MAPE) of approximately 6%. However, particles smaller than 2 mm result in higher MAPE, reaching up to 60%. 
To address this limitation, we recommend using a higher-resolution camera to capture images of the smaller soil particles. Furthermore, we discuss the advantages, limitations, and potential future improvements of our method. Remarkably, the program can be executed on a mobile phone, providing immediate results without the need to send soil samples to a laboratory. This field-friendly feature makes our approach highly convenient for on-site usage, outside of a traditional laboratory setting. Ultimately, this novel method represents an initial disruption to the industry, enabling efficient particle size analysis of soil without the reliance on laboratory-based sieve analysis. KEYWORDS: Computer vision, Grain size, ARUCO	翻訳日:2023-11-14 17:43:15 公開日:2023-11-11
# 知覚GPT:視覚知覚をLLMに効果的に融合させる PerceptionGPT: Effectively Fusing Visual Perception into LLM ( http://arxiv.org/abs/2311.06612v1 ) ライセンス: Link先を確認	Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang	(参考訳) 視覚入力と大言語モデル(LLM)の統合は、多モーダル機能において顕著な進歩をもたらし、視覚的大言語モデル(VLLM)がもたらされた。しかしながら、複雑な視覚知覚タスクにVLLMを効果的に活用することは課題である。本稿では,LLMのトークン埋め込みの表現力を生かして,VLLMを視覚的知覚能力に効率よく効果的に装備する,PerceptionGPTという新しいエンドツーエンドフレームワークを提案する。提案手法は, LLMのトークン埋め込みを空間情報のキャリアとして扱い, 軽量な視覚タスクエンコーダとデコーダを利用して視覚知覚タスク(例えば, 検出, セグメンテーション)を実行する。このアプローチは,視覚出力を離散的なトークンとして定式化した従来のアプローチが経験したトレーニングの難しさを著しく軽減し,トレーニング可能なパラメータが少なく,トレーニングデータが少なく,トレーニング時間の短縮によって優れたパフォーマンスを実現する。さらに、視覚的出力をデコードするために1つのトークン埋め込みが必要なため、推論中のシーケンス長が大幅に削減される。これにより,高精度かつ柔軟な表現,視覚知覚タスクのシームレスな統合,複数の視覚出力の効率的な処理が可能となる。我々はこのアプローチの有効性と効率を広範囲な実験によって検証する。その結果、トレーニング可能なパラメータやGPU時間を大幅に削減した従来の手法よりも大幅に改善され、視覚的知覚能力を持つLLMの実現に向けた今後の研究が促進された。 The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate visual perception tasks remains a challenge. In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips the VLLMs with visual perception abilities by leveraging the representation power of LLMs' token embedding. Our proposed method treats the token embedding of the LLM as the carrier of spatial information, then leverage lightweight visual task encoders and decoders to perform visual perception tasks (e.g., detection, segmentation). Our approach significantly alleviates the training difficulty suffered by previous approaches that formulate the visual outputs as discrete tokens, and enables achieving superior performance with fewer trainable parameters, less training data and shorted training time. 
Moreover, as only one token embedding is required to decode the visual outputs, the resulting sequence length during inference is significantly reduced. Consequently, our approach enables accurate and flexible representations, seamless integration of visual perception tasks, and efficient handling of multiple visual outputs. We validate the effectiveness and efficiency of our approach through extensive experiments. The results demonstrate significant improvements over previous methods with far fewer trainable parameters and GPU hours, which facilitates future research in enabling LLMs with visual perception abilities.	翻訳日:2023-11-14 17:42:52 公開日:2023-11-11
# Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models ( http://arxiv.org/abs/2311.06607v1 ) License: see link	Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai	Large Multimodal Models (LMMs) have demonstrated impressive capabilities in understanding general vision-language tasks. However, because of the limited supported input resolution (e.g., 448 x 448) and the insufficiently detailed descriptions in the training image-text pairs, these models often encounter challenges when dealing with intricate scene understanding and narratives. Here we address the problem by proposing Monkey. Our contributions are two-fold: 1) without pretraining from the start, our method can be built upon an existing vision encoder (e.g., vit-BigHuge) to effectively improve the input resolution capacity up to 896 x 1344 pixels; 2) we propose a multi-level description generation method, which automatically provides rich information that can guide the model to learn contextual associations between scenes and objects. Our extensive testing across more than 16 distinct datasets reveals that Monkey achieves consistently competitive performance compared with existing LMMs on fundamental tasks, such as Image Captioning, General Visual Question Answering (VQA), and Document-oriented VQA. Models, an interactive demo, and the source code are available at https://github.com/Yuliang-Liu/Monkey.	Translated: 2023-11-14 17:42:27 Published: 2023-11-11
# Comment on "Continuous simultaneous measurement of position and momentum of a particle" ( http://arxiv.org/abs/2311.06606v1 ) License: see link	Adélcio C. Oliveira	In a recent paper [Gampel, F. and Gajda, M., Phys. Rev. A 107, 012420 (2023)], the authors claim to propose a new model that explains the existence of classical trajectories in the quantum domain. The idea is based on simultaneous position and momentum measurements and a "jump Markov process". Consequently, they interpret the emergence of classical trajectories as sets of detection events. They successfully implemented the model for a free particle and for one under a harmonic potential. Here, we show that the continuous observation limit is a realization of a coherent semiclassical expansion. Also, as has already been demonstrated, the jump process is not necessary and is not observable. In other words, the collapse, as they propose it, is subject to a no-go theorem: even if it is real, it cannot be measured under the assumptions needed to obtain Newtonian classical dynamics.	Translated: 2023-11-14 17:42:05 Published: 2023-11-11
# BizBench: A Quantitative Reasoning Benchmark for Business and Finance ( http://arxiv.org/abs/2311.06602v1 ) License: see link	Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner	As large language models (LLMs) impact a growing number of complex domains, it is becoming increasingly important to have fair, accurate, and rigorous evaluation benchmarks. Evaluating the reasoning skills required for business and financial NLP stands out as a particularly difficult challenge. We introduce BizBench, a new benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises 8 quantitative reasoning tasks. Notably, BizBench targets the complex task of question-answering (QA) for structured and unstructured financial data via program synthesis (i.e., code generation). We introduce three diverse financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate distinct financial reasoning capabilities required to solve these QA tasks: reading comprehension of financial text and tables, which is required to extract correct intermediate values; and understanding of domain knowledge (e.g., financial formulas) needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to extract numeric entities from financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, illustrating that BizBench is a challenging benchmark for quantitative reasoning in the finance and business domain.	Translated: 2023-11-14 17:41:50 Published: 2023-11-11
# Understanding Grokking Through A Robustness Viewpoint ( http://arxiv.org/abs/2311.06597v1 ) License: see link	Zhiquan Tan, Weiran Huang	Recently, an unusual phenomenon called grokking has gained much attention, where a neural network sometimes generalizes long after it perfectly fits the training data. We try to understand this seemingly strange phenomenon using the robustness of the neural network. From a robustness viewpoint, we show that the popular $l_2$ weight norm (metric) of the neural network is actually a sufficient condition for grokking. As we also find empirically that the $l_2$ norm does not correlate with grokking on the test data in a timely way, we propose new metrics based on robustness and information theory and find that they correlate well with the grokking phenomenon. Based on these observations, we propose methods to speed up the generalization process. In addition, we examine the standard training process on a modulo addition dataset and find that it hardly learns other basic group operations before grokking, including the commutative law. Interestingly, the speed-up in generalization when using our proposed method can be partially explained by learning the commutative law, a necessary condition when the model groks on the test dataset.	Translated: 2023-11-14 17:41:28 Published: 2023-11-11
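The commutative-law check mentioned in the grokking abstract above can be illustrated with a small sketch. This is not the authors' code: the function `commutativity_rate` and both toy predictors are hypothetical names introduced here. The sketch only measures how often a predictor for (a + b) mod p gives the same answer when the two inputs are swapped.

```python
from itertools import combinations


def commutativity_rate(predict, p):
    """Fraction of unordered pairs a != b in Z_p with predict(a, b) == predict(b, a)."""
    pairs = list(combinations(range(p), 2))
    agree = sum(1 for a, b in pairs if predict(a, b) == predict(b, a))
    return agree / len(pairs)


def exact_mod_add(a, b, p=7):
    # Ground-truth modular addition: commutative by construction.
    return (a + b) % p


def lookup_like(a, b, p=7):
    # Hypothetical stand-in for a model that treats ordered pairs asymmetrically.
    return (a + 2 * b) % p
```

With p = 7, the exact operation commutes on every pair, while the asymmetric stand-in agrees on none, since (a + 2b) and (b + 2a) are congruent mod 7 only when a equals b.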
# From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL ( http://arxiv.org/abs/2311.06595v1 ) License: see link	Xiaoqian Li, Ercong Nie, Sheng Liang	The remarkable ability of Large Language Models (LLMs) to understand and follow instructions has sometimes been limited by their in-context learning (ICL) performance in low-resource languages. To address this, we introduce a novel approach that leverages cross-lingual retrieval-augmented in-context learning (CREA-ICL). By extracting semantically similar prompts from high-resource languages, we aim to improve the zero-shot performance of multilingual pre-trained language models (MPLMs) across diverse tasks. Though our approach yields steady improvements in classification tasks, it faces challenges in generation tasks. Our evaluation offers insights into the performance dynamics of retrieval-augmented in-context learning across both classification and generation domains.	Translated: 2023-11-14 17:41:08 Published: 2023-11-11
# Quantum computation with logical gates between hot systems ( http://arxiv.org/abs/2311.06588v1 ) License: see link	Ferran Riera-Sàbat, Pavel Sekatski, and Wolfgang Dür	We consider quantum computer architectures where interactions are mediated between hot qubits that are not in their mechanical ground state. Such situations occur, e.g., when cooling is not ideal, or when ions or atoms are moved around. We introduce quantum gates between logically encoded systems that consist of multiple physical ones and show how the encoding can be used to make these gates resilient against such imperfections. We demonstrate that, in this way, one can improve gate fidelities by enlarging the logical system, and counteract the effect of unknown positions or position fluctuations of the particles involved. We consider both a classical treatment of positions in terms of probability distributions, as well as a quantum treatment using mechanical eigenmodes. We analyze different settings, including a cool logical system mediating interactions between two hot systems, as well as two logical systems consisting of hot physical systems whose positions fluctuate collectively or individually. In all cases, we demonstrate a significant improvement of gate fidelities, which provides a platform-independent tool to mitigate thermal noise.	Translated: 2023-11-14 17:40:55 Published: 2023-11-11
# Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques ( http://arxiv.org/abs/2311.06690v1 ) License: see link	Ari Karchmer	(Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of $2^n$. This approach is inspired by and continues the study of "learning with nontrivial savings" (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, each of which can be any function computable by a polynomial threshold function of sublogarithmic degree $k$ (the depth of the circuit is bounded only by size). This algorithm runs in time $2^{n - s(n)}$ for $s(n) \approx n/(k+1)$, and learns over the uniform distribution over unlabelled examples on $\{0,1\}^n$. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a $\mathrm{SYM}^+$ circuit of subexponential size and sublogarithmic degree $k$. This algorithm runs in time $2^{n - s(n)}$ for $s(n) \approx n/(k+1)$, and learns over distributions of unlabelled examples that are products of $k+1$ arbitrary and unknown distributions, each over $\{0,1\}^{n/(k+1)}$ (assume without loss of generality that $k+1$ divides $n$).	Translated: 2023-11-14 17:34:03 Published: 2023-11-11
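To give a feel for the size of the saving claimed in the abstract above, here is a worked instance of the runtime bound. The values $n = 60$ and $k = 2$ are illustrative assumptions chosen only so that $k+1$ divides $n$; they do not come from the paper, and the approximation $s(n) \approx n/(k+1)$ is treated as exact for the sake of the arithmetic.

```latex
\begin{align*}
  s(n) &\approx \frac{n}{k+1} = \frac{60}{3} = 20, \\
  2^{\,n - s(n)} &\approx 2^{60 - 20} = 2^{40}, \\
  \frac{2^{n}}{2^{\,n - s(n)}} &= 2^{s(n)} \approx 2^{20} \approx 10^{6}.
\end{align*}
```

Under these assumed values the algorithm would be roughly a million times faster than the trivial $2^{n}$ enumeration, while still being exponential in $n$.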
# Single-Layer Digitized-Counterdiabatic Quantum Optimization for $p$-spin Models ( http://arxiv.org/abs/2311.06682v1 ) License: see link	Huijie Guan, Fei Zhou, Francisco Albarrán-Arriagada, Xi Chen, Enrique Solano, Narendra N. Hegade, He-Liang Huang	Quantum computing holds the potential for quantum advantage in optimization problems, which requires advances in quantum algorithms and hardware specifications. Adiabatic quantum optimization is conceptually a valid solution, but it suffers from limited hardware coherence times. In this sense, counterdiabatic quantum protocols provide a shortcut to this process, steering the system along its ground state under a fast-changing Hamiltonian. In this work, we take full advantage of a digitized-counterdiabatic quantum optimization (DCQO) algorithm to find an optimal solution of the $p$-spin model up to 4-local interactions. We choose a suitable scheduling function and initial Hamiltonian such that a single-layer quantum circuit suffices to produce a good ground-state overlap. By further optimizing parameters using variational methods, we solve with unit accuracy 2-spin, 3-spin, and 4-spin problems for $100\%$, $93\%$, and $83\%$ of instances, respectively. As a particular case of the latter, we also solve factorization problems involving 5, 9, and 12 qubits. Due to the low computational overhead, our compact approach may become a valuable tool towards quantum advantage in the NISQ era.	Translated: 2023-11-14 17:33:26 Published: 2023-11-11
# Theory of Compression Channels for Post-selected Metrology ( http://arxiv.org/abs/2311.06679v1 ) License: see link	Jing Yang	Post-selected metrology, also known as probabilistic metrology, can be employed as an efficient filter or compression channel to compress the number of samples without significant loss of precision. This metrological scheme is especially advantageous when the final measurements are either very noisy or expensive in practical experiments. In this work, we put forward a general theory on the compression channels in post-selected metrology. We define the basic notations characterizing the compression quality and illuminate the underlying structure. Previous experiments on post-selected optical phase estimation and weak-value amplification are shown to be particular cases of this general theory. Furthermore, we discover that for two categories of bipartite systems, the compression loss can be made arbitrarily small even when the compression channel is restricted to one subsystem. These findings can be employed to distribute quantum measurements so that the measurement noise and cost are dramatically reduced. Therefore, we expect they will find immediate applications in quantum technology.	Translated: 2023-11-14 17:33:06 Published: 2023-11-11
# Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination ( http://arxiv.org/abs/2311.06673v1 ) License: see link	Lu Wen, Songan Zhang, H. Eric Tseng, Huei Peng	Meta reinforcement learning (Meta RL) has been amply explored as a way to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to densely cover the task distribution and require a large amount of data for each task. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires fewer real training tasks and less data by performing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating on the learned latent context space with disentangled properties, and MDP-imagination through a generative world model in which physical knowledge is added to plain VAE networks. Our experiments with various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.	Translated: 2023-11-14 17:32:50 Published: 2023-11-11
# Guideline for the Production of Digital Rights Management (DRM) ( http://arxiv.org/abs/2311.06671v1 ) License: see link	Shannon Kathleen Coates, Hossein Abroshan	Multiple news sources over the years have reported on the problematic effects of Digital Rights Management (DRM), yet DRM development has seen no reform, only removal. The issues are well known to the public and recur even after being addressed: they affect both software and the devices that run it. Yet few, if any, have discussed these issues in recent years, especially with the intent of eliminating them. This study reviews Digital Rights Management as a general topic, including the various forms it can take, the current laws that affect DRM, and the current public reception and responses. This study describes the different types of DRM in general terms and then lists both positive and negative examples.	Translated: 2023-11-14 17:32:33 Published: 2023-11-11
# In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ( http://arxiv.org/abs/2311.06668v1 ) License: see link	Sheng Liu, Lei Xing, James Zou	Large language models (LLMs) demonstrate emergent in-context learning capabilities, where they adapt to new tasks based on example demonstrations. However, in-context learning has seen limited effectiveness in many settings, is difficult to control quantitatively, and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). Using ICV has two steps. We first use a forward pass on demonstration examples to create the in-context vector from the latent embedding of the LLM. This vector captures essential information about the intended task. On a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV. The ICV approach has several benefits: 1) it enables the LLM to more effectively follow the demonstration examples; 2) it is easy to control by adjusting the magnitude of the ICV; 3) it reduces the length of the prompt by removing the in-context demonstrations; 4) ICV is computationally much more efficient than fine-tuning. We demonstrate that ICV achieves better performance than standard in-context learning and fine-tuning on diverse tasks including safety, style transfer, role-playing, and formatting. Moreover, we show that we can flexibly teach an LLM to simultaneously follow different types of instructions by simple vector arithmetic on the corresponding ICVs.	Translated: 2023-11-14 17:32:17 Published: 2023-11-11
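The two ICV steps described in the abstract above (build a vector from demonstrations, then shift latent states by it) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say how the vector is computed, so the mean difference between target and source latent vectors used here is an assumption, and the names `in_context_vector`, `steer`, and `strength` are hypothetical.

```python
def in_context_vector(demo_pairs):
    """Average (target - source) difference over demonstration pairs.

    Assumption: each demonstration is summarised by one fixed-size latent
    vector for its input (source) and one for its desired output (target).
    """
    dim = len(demo_pairs[0][0])
    icv = [0.0] * dim
    for source, target in demo_pairs:
        for i in range(dim):
            icv[i] += (target[i] - source[i]) / len(demo_pairs)
    return icv


def steer(latent, icv, strength=1.0):
    """Shift a latent state along the ICV; `strength` scales its magnitude."""
    return [h + strength * v for h, v in zip(latent, icv)]
```

Scaling `strength` mirrors the abstract's point that the effect is controlled by the magnitude of the ICV, and adding two such vectors element-wise before calling `steer` would be the simplest form of the vector arithmetic the abstract mentions.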
# 3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data ( http://arxiv.org/abs/2311.06659v1 ) License: see link	Xi Sun, Derek Jacoby, Yvonne Coady	This paper presents a real-time segmentation and reconstruction system that uses RGB-D images to generate accurate and detailed individual 3D models of objects within a captured scene. Leveraging state-of-the-art instance segmentation techniques, the system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background. The segmented objects are then reconstructed into distinct 3D models on a high-performance computation platform. Real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more. To achieve real-time performance, the paper proposes a method that effectively samples consecutive frames to reduce network load while ensuring reconstruction quality. Additionally, a multi-process SLAM pipeline is adopted for parallel 3D reconstruction, enabling efficient separation of clustered objects into individual ones. The system employs the industry-leading framework YOLO for instance segmentation. To improve YOLO's performance and accuracy, modifications were made to resolve duplicated or false detections of similar objects, ensuring that the reconstructed models align with the targets. Overall, this work establishes a robust real-time system with a significant enhancement for object segmentation and reconstruction in indoor environments. It can potentially be extended to outdoor scenarios, opening up numerous opportunities for real-world applications.	Translated: 2023-11-14 17:31:55 Published: 2023-11-11
# Diamond quantum sensors in microfluidics technology ( http://arxiv.org/abs/2311.06656v1 ) License: see link	Masazumi Fujiwara	Diamond quantum sensing is an emerging technology for probing multiple physico-chemical parameters in the nano- to micro-scale dimensions within diverse chemical and biological contexts. Integrating these sensors into microfluidic devices enables the precise quantification and analysis of small sample volumes in microscale channels. In this Perspective, we present recent advancements in the integration of diamond quantum sensors with microfluidic devices and explore their prospects with a focus on forthcoming technological developments.	Translated: 2023-11-14 17:31:32 Published: 2023-11-11
# Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics ( http://arxiv.org/abs/2311.06654v1 ) License: see link	Souradeep Chakraborty, Shujon Naha, Muhammet Bastan, Amit Kumar K C, Dimitris Samaras	In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enables us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been paid to detecting co-salient objects when limited segmentation annotations are available for training. Our simple yet effective unsupervised method US-CoSOD combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections using self-supervised feature learning. For the first time, we show that a large unlabeled dataset (e.g., ImageNet-1k) can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model is a strong pre-training initialization for our semi-supervised model SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model).	Translated: 2023-11-14 17:31:25 Published: 2023-11-11
# Traffic Sign Recognition Using Local Vision Transformer ( http://arxiv.org/abs/2311.06651v1 ) License: see link	Ali Farzipour, Omid Nejati Manzari, Shahriar B. Shokouhi	Recognition of traffic signs is a crucial aspect of self-driving cars and driver assistance systems, and machine vision tasks such as traffic sign recognition have gained significant attention. CNNs have been frequently used in machine vision, but the introduction of vision transformers has provided an alternative approach to global feature learning. This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition. The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies. Additionally, a locality module is incorporated to enhance local perception. The performance of the proposed model is evaluated on the Persian Traffic Sign Dataset and the German Traffic Sign Recognition Benchmark and compared with SOTA convolutional and transformer-based models. The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy. Specifically, our proposed final model reached 99.66% accuracy on the German Traffic Sign Recognition Benchmark and 99.8% on the Persian Traffic Sign Dataset, higher than the best convolutional models. Moreover, it outperforms existing CNNs and ViTs while maintaining fast inference speed. Consequently, the proposed model proves to be significantly faster and more suitable for real-world applications.	Translated: 2023-11-14 17:30:49 Published: 2023-11-11
# Heuristic Optimal Transport in Branching Networks ( http://arxiv.org/abs/2311.06650v1 ) License: see link	M. Andrecut	Optimal transport aims to learn a mapping of sources to targets by minimizing the cost, which is typically defined as a function of distance. The solution to this problem consists of straight line segments optimally connecting sources to targets, and it does not exhibit branching. These optimal solutions are in stark contrast with both natural and man-made transportation networks, where branching structures are prevalent. Here we discuss a fast heuristic branching method for optimal transport in networks, and we provide several applications.	Translated: 2023-11-14 17:30:25 Published: 2023-11-11
# A Template Is All You Meme ( http://arxiv.org/abs/2311.06649v1 ) License: see link	Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych	Memes are a modern form of communication, and meme templates possess a base semantics that can be customized by whoever posts them on social media. Machine learning systems struggle with memes, likely because such systems have insufficient context to understand them: there is more to memes than the obvious image and text. Here, to aid understanding of memes, we release a knowledge base of memes and information found on www.knowyourmeme.com, which we call the Know Your Meme Knowledge Base (KYMKB), composed of more than 54,000 images. The KYMKB includes popular meme templates, examples of each template, and detailed information about the template. We hypothesize that meme templates can be used to inject models with the context missing from previous approaches. To test our hypothesis, we create a non-parametric majority-based classifier, which we call Template-Label Counter (TLC). We find TLC more effective than or competitive with fine-tuned baselines. To demonstrate the power of meme templates and the value of both our knowledge base and method, we conduct thorough classification experiments and exploratory data analysis in the context of five meme analysis tasks.	Translated: 2023-11-14 17:30:16 Published: 2023-11-11
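The phrase "non-parametric majority-based classifier" in the abstract above can be made concrete with a minimal sketch. It is not the authors' TLC implementation: the class below, its method names, and the fallback to the overall majority label are assumptions made for illustration, and the step that matches a meme to a template (for example, by nearest neighbour in an embedding space) is assumed to have happened already.

```python
from collections import Counter, defaultdict


class TemplateLabelCounter:
    """Toy majority-vote classifier keyed on a meme's (pre-matched) template id."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # template id -> label counts
        self.overall = Counter()            # label counts over all training memes

    def fit(self, template_ids, labels):
        for template_id, label in zip(template_ids, labels):
            self.counts[template_id][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, template_id):
        # Use the template's own votes; fall back to the global majority
        # when the template was never seen during fitting (an assumption).
        votes = self.counts.get(template_id) or self.overall
        return votes.most_common(1)[0][0]
```

There is nothing to train here beyond counting, which is what makes the approach non-parametric: the prediction for a meme is simply the most frequent label among training memes that share its template.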
# ロバストテキスト分類:プロトタイプベースネットワークの解析 Robust Text Classification: Analyzing Prototype-Based Networks ( http://arxiv.org/abs/2311.06647v1 ) ライセンス: Link先を確認	Zhivar Sourati, Darshan Deshpande, Filip Ilievski, Kiril Gashteovski, Sascha Saralajew	(参考訳) 下流のアプリケーションは、正確で堅牢で解釈可能なテキスト分類モデルを必要とすることが多い。最先端言語モデルの精度は人間の性能に迫るが、解釈可能であるようには設計されておらず、ノイズの多いデータではしばしば性能が低下する。クラスの典型例(プロトタイプ)との類似性に基づいて事例を分類するプロトタイプベースネットワーク(PBN)のファミリは、本質的に解釈可能で、ノイズに頑健であることが示されており、コンピュータビジョンタスクで広く使用されてきた。本稿では、PBNのロバスト性がテキスト分類タスクに転移するかどうかを考察する。我々は、異なるバックボーンアーキテクチャ、バックボーンサイズ、目的関数を含む、PBNを研究するためのモジュラーで包括的なフレームワークを設計する。評価プロトコルは、文字・単語・文レベルの摂動に対するモデルの堅牢性を評価する。 3つのベンチマークでの実験により、PBNのロバスト性が、現実的な摂動に直面するNLP分類タスクに転移することを示す。さらに、PBNのロバスト性は主にプロトタイプを解釈可能に保つ目的関数によって支えられており、データセットが複雑になるにつれて、バニラモデルに対するPBNのロバスト性の優位がより顕著になる。 Downstream applications often require text classification models to be accurate, robust, and interpretable. While the accuracy of the state-of-the-art language models approximates human performance, they are not designed to be interpretable and often exhibit a drop in performance on noisy data. The family of Prototype-Based Networks (PBNs) that classify examples based on their similarity to prototypical examples of a class (prototypes) is natively interpretable and shown to be robust to noise, which enabled its wide usage for computer vision tasks. In this paper, we study whether the robustness properties of PBNs transfer to text classification tasks. We design a modular and comprehensive framework for studying PBNs, which includes different backbone architectures, backbone sizes, and objective functions. Our evaluation protocol assesses the robustness of models against character-, word-, and sentence-level perturbations. Our experiments on three benchmarks show that the robustness of PBNs transfers to NLP classification tasks facing realistic perturbations. Moreover, the robustness of PBNs is supported mostly by the objective function that keeps prototypes interpretable, while the robustness superiority of PBNs over vanilla models becomes more salient as datasets get more complex.	翻訳日:2023-11-14 17:29:54 公開日:2023-11-11
# 医用画像のフェデレーション学習におけるプライバシーリスク分析と緩和 Privacy Risks Analysis and Mitigation in Federated Learning for Medical Images ( http://arxiv.org/abs/2311.06643v1 ) ライセンス: Link先を確認	Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu	(参考訳) 医療画像分析の分野では, 患者データを保護し, プライバシ規制に従うための効果的な手法として, フェデレートラーニング(FL)が普及している。しかし、最近のいくつかの研究により、FLのデフォルト設定がプライバシー攻撃の下でプライベートトレーニングデータを漏洩させる可能性があることが明らかになった。したがって、FLのそのようなプライバシーリスクが医療領域にどの程度存在するのか、また「そのようなリスクを軽減するにはどうすればいいのか」はいまだに不明である。本稿では,まず,フェデレートラーニング(MedPFL)における医療データプライバシリスク分析と緩和のための総合的枠組みを提案し,プライバシリスクを分析し,私的医療データを保護するための効果的な緩和戦略を開発する。第2に、FLを用いて医療画像を処理する場合のプライバシーリスクについて、敵が容易にプライバシー攻撃を行い、医療画像を正確に再構築できることを示す。第3に、ランダムノイズを付加する防御アプローチは、flにおけるプライバシー攻撃から医療画像を保護するために常に効果的に機能するとは限らないことを示し、プライバシー保護のための医療データに関する独特で差し迫った課題を提起する。 Federated learning (FL) is gaining increasing popularity in the medical domain for analyzing medical images, which is considered an effective technique to safeguard sensitive patient data and comply with privacy regulations. However, several recent studies have revealed that the default settings of FL may leak private training data under privacy attacks. Thus, it is still unclear whether and to what extent such privacy risks of FL exist in the medical domain, and if so, ``how to mitigate such risks?''. In this paper, first, we propose a holistic framework for Medical data Privacy risk analysis and mitigation in Federated Learning (MedPFL) to analyze privacy risks and develop effective mitigation strategies in FL for protecting private medical data. Second, we demonstrate the substantial privacy risks of using FL to process medical images, where adversaries can easily perform privacy attacks to reconstruct private medical images accurately. Third, we show that the defense approach of adding random noises may not always work effectively to protect medical images against privacy attacks in FL, which poses unique and pressing challenges associated with medical data for privacy protection.	翻訳日:2023-11-14 17:29:31 公開日:2023-11-11
# 多次元反射問題に対するデータ駆動ルール Data-driven rules for multidimensional reflection problems ( http://arxiv.org/abs/2311.06639v1 ) ライセンス: Link先を確認	Sören Christensen, Asbjørn Holk Thomsen and Lukas Trottner	(参考訳) 近年,モデル不確実性に直面した確率最適制御問題を解くためのデータ駆動アルゴリズムが研究の活発な領域となっている。しかし、特異制御と基礎となる拡散ダイナミクスについては、解析はこれまでスカラーの場合に限られている。本稿では,反射型の制御を持つ可逆拡散に対する多変量特異制御問題を研究することで,このギャップを埋める。私たちの貢献は3つある。まず, 長期平均コストを領域依存の汎関数として明示的に決定し、制御問題が形状最適化問題として同値に特徴付けられることを示す。次に、与えられた拡散ダイナミクスに対して、最適領域が強星型であると仮定し、ポリトープ近似に基づく勾配降下アルゴリズムを提案して、コスト最小化領域を数値的に決定する。最後に,拡散ダイナミクスが制御者にとって未知である場合のデータ駆動型解について検討する。確率過程の非パラメトリック統計学の手法を用いて、静的リグレットが非反射過程の不変密度のミニマックス最適推定レートによって抑えられる最適領域推定器を構築する。最も困難な状況、すなわち過程を制御しながら同時にダイナミクスを学習しなければならない場合には、生じる探索と活用のジレンマを克服するためのエピソード型学習アルゴリズムを開発し、静的リグレットをベースラインとしたとき、時間単位あたりの劣線形リグレットにおける損失が1次元の場合と比べて自然なオーダーであることを示す。 Over the recent past data-driven algorithms for solving stochastic optimal control problems in face of model uncertainty have become an increasingly active area of research. However, for singular controls and underlying diffusion dynamics the analysis has so far been restricted to the scalar case. In this paper we fill this gap by studying a multivariate singular control problem for reversible diffusions with controls of reflection type. Our contributions are threefold. We first explicitly determine the long-run average costs as a domain-dependent functional, showing that the control problem can be equivalently characterized as a shape optimization problem. For given diffusion dynamics, assuming the optimal domain to be strongly star-shaped, we then propose a gradient descent algorithm based on polytope approximations to numerically determine a cost-minimizing domain. Finally, we investigate data-driven solutions when the diffusion dynamics are unknown to the controller. Using techniques from nonparametric statistics for stochastic processes, we construct an optimal domain estimator, whose static regret is bounded by the minimax optimal estimation rate of the unreflected process' invariant density. In the most challenging situation, when the dynamics must be learned simultaneously to controlling the process, we develop an episodic learning algorithm to overcome the emerging exploration-exploitation dilemma and show that given the static regret as a baseline, the loss in its sublinear regret per time unit is of natural order compared to the one-dimensional case.	翻訳日:2023-11-14 17:29:11 公開日:2023-11-11
# 皮膚病変スクリーニングのための自動自己監督学習 Automatized Self-Supervised Learning for Skin Lesion Screening ( http://arxiv.org/abs/2311.06691v1 ) ライセンス: Link先を確認	Vullnet Useini, Stephanie Tanadini-Lang, Quentin Lohmeyer, Mirko Meboldt, Nicolaus Andratschke, Ralph P. Braun and Javier Barranco García	(参考訳) 皮膚がんの中で最も致死的なメラノーマの発生率は世界中で着実に増加しており、皮膚科医にとって大きな課題となっている。メラノーマの早期発見は患者の生存率の向上に不可欠であるが、現在の皮膚がんスクリーニング法である「みにくいアヒルの子」(ugly duckling, UD)スクリーニングによる疑わしい病変の同定は困難であり、色素性病変の専門知識を必要とすることが多い。これらの課題に対処し、患者の転帰を改善するために、皮膚科医が広視野の患者画像からUDを特定するのを支援する人工知能(AI)意思決定支援ツールを開発した。このツールは最先端の物体検出アルゴリズムを使用して患者の画像からすべての皮膚病変を特定・抽出し、それらを自己教師ありAIアルゴリズムによって疑わしさの順に並べ替える。このツールの性能を評価するために臨床検証研究を行った結果、色素性皮膚病変の専門家の過半数が選択した皮膚病変について、AIが同定した上位10件のUDの平均感度は93%であった。また、AIの支援を受けた場合、皮膚科医の確信度が高まり、AIが同定した上位10件のUDとの平均過半数一致率は100%に向上した。このAI意思決定支援ツールの開発は、専門家の不足に対処し、リスクの高い患者がより早く診察を受けられるようにし、AI支援スクリーニングの影響を理解することを目的としている。このツールによる自動化は、皮膚科医による疑わしい病変の特定を支援し、より客観的な評価を提供して、スクリーニングプロセスの主観性を低減できる。このプロジェクトの今後のステップは、組織学的に確認されたメラノーマ症例を含むようにデータセットを拡大すること、およびツールの信頼性を高めて実際の診察に適応させるために臨床検証の参加者数を増やすことである。 The incidence rates of melanoma, the deadliest form of skin cancer, have been increasing steadily worldwide, presenting a significant challenge to dermatologists. Early detection of melanoma is crucial for improving patient survival rates, but identifying suspicious lesions through ugly duckling (UD) screening, the current method used for skin cancer screening, can be challenging and often requires expertise in pigmented lesions. To address these challenges and improve patient outcomes, an artificial intelligence (AI) decision support tool was developed to assist dermatologists in identifying UD from wide-field patient images. The tool uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, which are then sorted by suspiciousness using a self-supervised AI algorithm. A clinical validation study was conducted to evaluate the tool's performance, which demonstrated an average sensitivity of 93% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that dermatologists' confidence increased, and the average majority agreement with the top-10 AI-identified UDs improved to 100% when assisted by AI. The development of this AI decision support tool aims to address the shortage of specialists, enable at-risk patients to receive faster consultations and understand the impact of AI-assisted screening. The tool's automation can assist dermatologists in identifying suspicious lesions and provide a more objective assessment, reducing subjectivity in the screening process. The future steps for this project include expanding the dataset to include histologically confirmed melanoma cases and increasing the number of participants for clinical validation to strengthen the tool's reliability and adapt it for real-world consultation.	翻訳日:2023-11-14 17:16:34 公開日:2023-11-11
# Neuro-GPT:脳波の基礎モデルの開発 Neuro-GPT: Developing A Foundation Model for EEG ( http://arxiv.org/abs/2311.03764v3 ) ライセンス: Link先を確認	Wenhui Cui, Woojae Jeong, Philipp Thölke, Takfarinas Medani, Karim Jerbi, Anand A. Joshi, Richard M. Leahy	(参考訳) 脳-コンピュータインタフェース(BCI)タスクのための脳波(EEG)データの不足と不均一性に対処し、大規模な公開データセットの力を活用するために、EEGエンコーダとGPTモデルからなる基礎モデルNeuro-GPTを提案する。基礎モデルは、マスクされたEEGセグメントの再構成を学習する自己教師ありタスクを用いて、大規模データセット上で事前学習される。次に、運動イメージ分類(Motor Imagery Classification)タスクでモデルをファインチューニングし、低データ領域(被験者9名)での性能を検証する。実験により、基礎モデルを適用することで、スクラッチから学習したモデルと比べて分類性能が大幅に向上することを示す。これは、基礎モデルの汎化可能性と、EEGにおけるデータ不足や不均一性の課題に対処する能力の証拠となる。 To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model. The foundation model is pre-trained on a large-scale data set using a self-supervised task that learns how to reconstruct masked EEG segments. We then fine-tune the model on a Motor Imagery Classification task to validate its performance in a low-data regime (9 subjects). Our experiments demonstrate that applying a foundation model can significantly improve classification performance compared to a model trained from scratch, which provides evidence for the generalizability of the foundation model and its ability to address challenges of data scarcity and heterogeneity in EEG.	翻訳日:2023-11-14 11:07:57 公開日:2023-11-11

# BlockEmulator: ブロックチェーンシャーディングプロトコルをテストするエミュレータ

BlockEmulator: An Emulator Enabling to Test Blockchain Sharding Protocols ( http://arxiv.org/abs/2311.03612v2 )

ライセンス: Link先を確認

Huawei Huang, Guang Ye, Qinde Chen, Zhaokang Yin, Xiaofei Luo, Jianru Lin, Taotao Li, Qinglin Yang, Zibin Zheng

(参考訳) 研究者が主流ブロックチェーンをシミュレートできるように、多くのブロックチェーンシミュレータが提案されている。しかし、ブロックチェーンシャーディングシステムのための新しいコンセンサスアルゴリズムや新しいプロトコルの開発と評価を可能にするテストベッドはまだ見つかっていない。このギャップを埋めるために、実験的なプラットフォームとして設計されたBlockEmulatorを開発し、特にブロックチェーンシャーディングメカニズムをエミュレートする。 BlockEmulatorは、開発者が新しいプロトコルやメカニズムの実装のみに集中できるように、軽量なブロックチェーンアーキテクチャを採用する。レイヤ化されたモジュールとBlockEmulatorが提供する有用なプログラミングインターフェースを使うことで、研究者は最小限の労力で新しいプロトコルを実装できる。実験を通じてBlockEmulatorの様々な機能を2つのステップでテストする。まず,BlockEmulatorによるエミュレーション結果の正当性を,理論解析と実験結果との比較により検証した。第二に、BlockEmulatorがスループット、トランザクション確認レイテンシ、クロスシャードトランザクション比率、トランザクションプールのキューサイズ、ブロックチェーンシャード間のワークロード分散など、一連のメトリクスの測定を容易にすることを示しています。 GithubでBlockEmulatorをオープンソース化しました。

Numerous blockchain simulators have been proposed to allow researchers to simulate mainstream blockchains. However, we have not yet found a testbed that enables researchers to develop and evaluate their new consensus algorithms or new protocols for blockchain sharding systems. To fill this gap, we develop BlockEmulator, which is designed as an experimental platform, particularly for emulating blockchain sharding mechanisms. BlockEmulator adopts a lightweight blockchain architecture such that developers can only focus on implementing their new protocols or mechanisms. Using layered modules and useful programming interfaces offered by BlockEmulator, researchers can implement a new protocol with minimum effort. Through experiments, we test various functionalities of BlockEmulator in two steps. Firstly, we prove the correctness of the emulation results yielded by BlockEmulator by comparing the theoretical analysis with the observed experiment results. Secondly, other experimental results demonstrate that BlockEmulator can facilitate the measurement of a series of metrics, including throughput, transaction confirmation latency, cross-shard transaction ratio, the queuing size of transaction pools, workload distribution across blockchain shards, etc. We have made BlockEmulator open-source in Github.

翻訳日:2024-03-25 13:36:10 公開日:2023-11-11
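
The abstract above lists throughput, confirmation latency, and cross-shard transaction ratio among the metrics the emulator helps measure. As a rough illustration of what those three quantities mean, here is a minimal sketch that computes them from a list of transaction records. The `Tx` record and its field names are hypothetical and are not taken from BlockEmulator's code or interfaces.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical transaction record; field names are illustrative only."""
    sender_shard: int
    receiver_shard: int
    proposed_at: float   # time the transaction entered a pool, in seconds
    confirmed_at: float  # time the transaction was confirmed, in seconds


def summarize(txs):
    """Return throughput, mean confirmation latency, and cross-shard ratio."""
    if not txs:
        raise ValueError("need at least one transaction")
    # Observation window: first proposal to last confirmation.
    span = max(t.confirmed_at for t in txs) - min(t.proposed_at for t in txs)
    if span <= 0:
        raise ValueError("observation window must be positive")
    # A transaction is cross-shard when sender and receiver live on different shards.
    cross = sum(1 for t in txs if t.sender_shard != t.receiver_shard)
    return {
        "throughput_tps": len(txs) / span,
        "avg_latency_s": sum(t.confirmed_at - t.proposed_at for t in txs) / len(txs),
        "cross_shard_ratio": cross / len(txs),
    }
```

Pool queue sizes and per-shard workload distribution would need per-shard time series rather than per-transaction records, so they are left out of this sketch.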

# 偽のクラウドソース画像サービスを見極めるための変更意図の判定

Determining Intent of Changes to Ascertain Fake Crowdsourced Image Services ( http://arxiv.org/abs/2403.12045v1 )

ライセンス: Link先を確認

Muhammad Umair, Athman Bouguettaya, Abdallah Lakhdari

(参考訳) 画像が偽物である可能性を判定する、クラウドソース画像のための新しいフレームワークを提案する。我々はサービス指向のアプローチを採用し、ソーシャルメディアにアップロードされたクラウドソース画像を画像サービスとしてモデル化し表現する。状況によっては、画像サービスの非機能属性、すなわち画像メタデータのみを用いて信頼性を判定できる。我々は、変更の意図を、偽の画像サービスを見極めるための重要なパラメータとして定義する。画像のセマンティクスの変化を考慮して、加えられた変更の意図を推定する新しいフレームワークを提案する。大規模な実データセットを用いた実験は高い精度を示した。

We propose a novel framework for crowdsourced images to determine the likelihood of an image being fake. We use a service-oriented approach to model and represent crowdsourced images uploaded on social media, as image services. Trust may, in some circumstances, be determined by using only the non-functional attributes of an image service, i.e., image metadata. We define intention of changes as a key parameter to ascertain fake image services. A novel framework is proposed to estimate the intention of underlying changes considering change in semantics of an image. Our experiments show high accuracy using a large real dataset.

翻訳日:2024-03-25 07:56:27 公開日:2023-11-11

# ワイヤレスインジェクション攻撃を検知するフェデレートラーニングベースのプロトタイプ「Seeing is Believing」

Seeing is Believing: A Federated Learning Based Prototype to Detect Wireless Injection Attacks ( http://arxiv.org/abs/2311.06564v1 )

ライセンス: Link先を確認

Aadil Hussain, Nitheesh Gundapu, Sarang Drugkar, Suraj Kiran, J. Harshan, Ranjitha Prasad

(参考訳) リアクティブインジェクション攻撃(reactive injection attack)は無線ネットワークにおけるセキュリティ脅威の一種であり、攻撃者がクライアントの周波数帯域になりすましパケットを機会主義的に注入することで、基地局になりすまし検出手法の導入を強いる。このような脅威を回避するために、我々は秘密鍵に基づく物理層シグナリング手法をクライアントに実装し、基地局がベースバンドの同相・直交(I/Q)サンプルに対して機械学習(ML)モデルを適用して攻撃を検出できるようにする。 Adalm Pluto ベースのソフトウェア無線を用いて秘密鍵ベースのシグナリング手法を実装し、基地局で頑健なMLモデルを設計できることを示す。しかし実際には、基地局で利用できる訓練データセットが不十分な場合、これらの手法が有効に機能しなくなることも指摘する。そこで、バックホールネットワークにおいてフェデレートラーニングの枠組みを用い、リアクティブインジェクションの脅威からクライアントを保護する必要のある基地局のグループが、データセットのプライバシーを確保しつつ協調してMLモデルを改良する。 XBee機器のネットワークでバックホールネットワークを実装した実験の結果、フェデレートラーニングの構成により検出精度が大幅に向上し、無線セキュリティが6G以降のネットワークにおけるフェデレートラーニングの優れたユースケースであることが示された。

Reactive injection attacks are a class of security threats in wireless networks wherein adversaries opportunistically inject spoofing packets in the frequency band of a client thereby forcing the base-station to deploy impersonation-detection methods. Towards circumventing such threats, we implement secret-key based physical-layer signalling methods at the clients which allow the base-stations to deploy machine learning (ML) models on their in-phase and quadrature samples at the baseband for attack detection. Using Adalm Pluto based software defined radios to implement the secret-key based signalling methods, we show that robust ML models can be designed at the base-stations. However, we also point out that, in practice, insufficient availability of training datasets at the base-stations can make these methods ineffective. Thus, we use a federated learning framework in the backhaul network, wherein a group of base-stations that need to protect their clients against reactive injection threats collaborate to refine their ML models by ensuring privacy on their datasets. Using a network of XBee devices to implement the backhaul network, experimental results on our federated learning setup shows significant enhancements in the detection accuracy, thus presenting wireless security as an excellent use-case for federated learning in 6G networks and beyond.

翻訳日:2024-03-18 23:32:03 公開日:2023-11-11
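
The abstract above says that base-stations collaborate to refine their ML models without sharing datasets, but it does not state which aggregation rule is used. The sketch below assumes plain federated averaging (FedAvg), in which each base-station sends only its model weights and a coordinator averages them, weighted by local sample counts. The function name and the flat-list weight format are illustrative assumptions, not the authors' implementation.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of per-client weight vectors (FedAvg-style aggregation).

    client_weights: list of equal-length lists of floats, one per base-station.
    client_sizes:   number of local training samples held by each base-station.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all weight vectors must have the same length")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    # Each coordinate is averaged with weights proportional to local data size,
    # so raw I/Q samples never leave the base-station that collected them.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

In a full training loop this aggregation would be repeated every round, with each base-station resuming local training from the averaged weights.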

# 有限体上の列の語線形複雑度と写像の局所逆

Word Linear Complexity of sequences and Local Inversion of maps over finite fields ( http://arxiv.org/abs/2311.06574v1 )

ライセンス: Link先を確認

Virendra Sule

(参考訳) 本稿では、有限体 $\mathbb{F}$ 上のベクトル値列に対する語線形複雑度(Word Linear Complexity, $WLC$)の概念を、列およびそのアンサンブルの線形複雑度($LC$)の拡張として展開する。この複雑度の概念は、アンサンブル(ベクトル値)列の最小多項式の概念を行列最小多項式へと拡張するものであり、写像 $F:\mathbb{F}^n\rightarrow\mathbb{F}^n$ により $\mathbb{F}^n$ の与えられた $y$ から反復的に生成されるベクトル値列が周期的であるとき、行列最小多項式を用いて方程式 $y=F(x)$ の一意な局所逆 $x$ を求められることを示す。反復列が周期的であるときに有限体上の写像の局所逆を求めるという考え方と、その暗号解読の諸問題への応用は、よく知られた列の $LC$ の概念を用いて、以前の論文 \cite{sule322, sule521, sule722,suleCAM22} で展開されている。 $LC$ は、列に付随する最小多項式の次数である。 $LC$ から $WLC$ への一般化では、$LC$ の定義で考慮されるスカラー乗法の代わりに、行列とベクトルの積によって語指向の漸化式が得られるようなベクトル値(語指向)列を考える。したがって、付随する最小多項式は行列値となり、その次数を $WLC$ と呼ぶ。列が周期的であるときに、語指向の漸化式に付随する非自明な行列多項式が存在するための条件を導出する。行列最小多項式が存在するとき、$n(WLC)=LC$ であることを示す。最後に、そのような多項式が存在する場合には行列最小多項式を用いて局所逆問題が解け、局所逆への語指向のアプローチが導かれることを示す。

This paper develops the notion of Word Linear Complexity ($WLC$) of vector valued sequences over finite fields $\mathbb{F}$ as an extension of Linear Complexity ($LC$) of sequences and their ensembles. This notion of complexity extends the concept of the minimal polynomial of an ensemble (vector valued) sequence to that of a matrix minimal polynomial and shows that the matrix minimal polynomial can be used with iteratively generated vector valued sequences by maps $F:\mathbb{F}^n\rightarrow\mathbb{F}^n$ at a given $y$ in $\mathbb{F}^n$ for solving the unique local inverse $x$ of the equation $y=F(x)$ when the sequence is periodic. The idea of solving a local inverse of a map in finite fields when the iterative sequence is periodic and its application to various problems of Cryptanalysis is developed in previous papers \cite{sule322, sule521, sule722,suleCAM22} using the well known notion of $LC$ of sequences. $LC$ is the degree of the associated minimal polynomial of the sequence. The generalization of $LC$ to $WLC$ considers vector valued (or word oriented) sequences such that the word oriented recurrence relation is obtained by matrix vector multiplication instead of scalar multiplication as considered in the definition of $LC$. Hence the associated minimal polynomial is matrix valued, and its degree is called $WLC$. A condition is derived for when a nontrivial matrix polynomial associated with the word oriented recurrence relation exists when the sequence is periodic. It is shown that when the matrix minimal polynomial exists, $n(WLC)=LC$. Finally, it is shown that the local inversion problem is solved using the matrix minimal polynomial when such a polynomial exists, which leads to a word oriented approach to local inversion.

翻訳日:2024-03-18 23:32:03 公開日:2023-11-11
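
To make the scalar and word-oriented recurrences in the abstract above concrete, the following display writes them side by side. The index conventions, the symbols $s_k$, $a_i$, $A_i$, $m$, $r$, and the sign convention for the minimal polynomial are assumptions chosen for illustration and may differ from the paper's notation. Here $s_k = F^{(k)}(y)$ denotes the $k$-th iterate of $F$ starting from $y$.

```latex
% Scalar recurrence behind LC: coefficients a_i are field elements, m = LC.
s_{k+m} = \sum_{i=0}^{m-1} a_i \, s_{k+i},
\qquad a_i \in \mathbb{F}, \quad s_k = F^{(k)}(y) \in \mathbb{F}^n

% Word-oriented recurrence behind WLC: coefficients A_i are n x n matrices, r = WLC.
s_{k+r} = \sum_{i=0}^{r-1} A_i \, s_{k+i},
\qquad A_i \in \mathbb{F}^{n \times n}

% Local inverse from the scalar recurrence, assuming the sequence is periodic
% (so s_{-1} is well defined and F(s_{-1}) = s_0 = y) and a_0 \neq 0:
x = s_{-1} = a_0^{-1} \Bigl( s_{m-1} - \sum_{i=1}^{m-1} a_i \, s_{i-1} \Bigr),
\qquad F(x) = y
```

The last line follows by evaluating the scalar recurrence at $k=-1$, which is legitimate for a periodic sequence. The word-oriented analogue would replace $a_0^{-1}$ by $A_0^{-1}$ and so would additionally require $A_0$ to be invertible. The relation $n(WLC)=LC$ stated in the abstract then reads $n\,r = m$ in this notation.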

# 文法的ジェンダーを見過ごさない:ヒンディー語-英語機械翻訳におけるバイアス評価

Don't Overlook the Grammatical Gender: Bias Evaluation for Hindi-English Machine Translation ( http://arxiv.org/abs/2312.03710v1 )

ライセンス: Link先を確認

Pushpdeep Singh

(参考訳) ニューラル機械翻訳(NMT)モデルは、翻訳の最先端であるが、しばしば社会的バイアス、特にジェンダーバイアスを反映する。既存の評価ベンチマークは主に翻訳元言語としての英語に焦点を当てている。英語以外の原言語については、研究はしばしばバイアス評価に性中立な文を用いるが、現実世界の文はさまざまな形で性別情報を含むことが多い。したがって、そのような原文を用いてバイアスを評価し、NMTモデルが偏った連想に頼るのではなく、文法的な性の手がかりから性別を識別できるかどうかを判断する方が理にかなっている。これを示すために、ヒンディー語で性別を明示した2つの文セットを作成し、さまざまなヒンディー語-英語(HI-EN)NMTシステムにおけるジェンダーバイアスを自動的に評価する。原言語における文法的な性のマーカーを考慮してバイアス評価用テストセットを調整することの重要性を強調する。

Neural Machine Translation (NMT) models, though state-of-the-art for translation, often reflect social biases, particularly gender bias. Existing evaluation benchmarks primarily focus on English as the source language of translation. For source languages other than English, studies often employ gender-neutral sentences for bias evaluation, whereas real-world sentences frequently contain gender information in different forms. Therefore, it makes more sense to evaluate for bias using such source sentences to determine if NMT models can discern gender from the grammatical gender cues rather than relying on biased associations. To illustrate this, we create two gender-specific sentence sets in Hindi to automatically evaluate gender bias in various Hindi-English (HI-EN) NMT systems. We emphasise the significance of tailoring bias evaluation test sets to account for grammatical gender markers in the source language.

翻訳日:2023-12-11 03:20:32 公開日:2023-11-11

# デジタルゲームの分析とモデリング: 学際的グループでPBLを用いた教育経験の報告

Análise e modelagem de jogos digitais: Relato de uma experiência educacional utlizando PBL em um grupo multidisciplinar ( http://arxiv.org/abs/2311.14704v1 )

ライセンス: Link先を確認

David de Oliveira Lemes, Ezequiel França dos Santos, Eduardo Romanek, Celso Fujimoto, Adriano Felix Valente

Traditional software engineering education generally emphasizes strict collaboration and technical skills. However, active teaching strategies, where students actively engage with the material, transitioning from passive observers to active manipulators of real-world tools, have shown effectiveness in software engineering. The evolving market demands new skills in the context of digital transformation, presenting challenges such as modeling complex business scenarios and navigating the interconnections between people, systems, and technologies. Shifting from conventional software engineering instruction to active methodologies like Problem-Based Learning (PBL) has proven to bring real-world market challenges and realities into the classroom. This article details an experience from the Digital Games Analysis and Modeling course in the Digital Games Master's program at Pontifical Catholic University of Sao Paulo. It covers the discussed concepts, case study, role-based work method, and steps of the meetings. We also present examples of outcomes like requirement diagrams, context diagrams, use case diagrams, class diagrams, interviews, and others that contributed to the Game Design Document (GDD). These were created by each group during the meetings, alongside their game prototypes. Additionally, a discussion on the developed capabilities is included.

翻訳日:2023-12-03 13:55:02 公開日:2023-11-11

# 医療におけるIoTの進歩と課題: 短いレビュー

Progression and Challenges of IoT in Healthcare: A Short Review ( http://arxiv.org/abs/2311.12869v1 )

ライセンス: Link先を確認

S M Atikur Rahman, Sifat Ibtisum, Priya Podder, S. M. Saokat Hossain

(参考訳) スマートヘルスケアは、コネクテッドライフの不可欠な要素であり、人間の基本的なニーズを満たす上で重要な役割を果たす。スマートヘルスケアの急成長する分野は、近い将来、かなりの収入を生み出す可能性がある。その多面的フレームワークは、IoT(Internet of Things)、医療センサー、人工知能(AI)、エッジとクラウドコンピューティング、および次世代無線通信技術といった重要なコンポーネントを含んでいる。多くの研究論文が、スマートヘルスケア、さらにはより広くヘルスケア全般を論じている。 Internet of Medical Things(IoMT)は、新型コロナウイルス感染症(COVID-19)の感染拡大対策として、他の施策とともに多くの国で戦略的に配備されている。この組み合わせた取り組みは、最前線の医療従事者の安全性を高めるだけでなく、パンデミックの管理における全体的な効果を高め、その結果、人命への影響と死亡率を減らした。 IoMTドメイン内のアプリケーションと技術の両方で顕著な進歩がなされている。しかし、この技術的進歩は、特にセキュリティの領域において、特定の課題をもたらしたと認めることが不可欠である。世界中でのIoMTの急速かつ広範な普及により、セキュリティとプライバシーに関する問題が拡大した。これらには、リプレイ攻撃、中間者攻撃、なりすまし、特権を持つ内部者による脅威、リモートハイジャック、パスワード推測、サービス拒否(DoS)攻撃から、マルウェア侵入に至るまで、さまざまな懸念が含まれている。本稿では,IoT環境におけるマルウェアの検出と防止を目的とした既存の戦略の比較分析を行う。

Smart healthcare, an integral element of connected living, plays a pivotal role in fulfilling a fundamental human need. The burgeoning field of smart healthcare is poised to generate substantial revenue in the foreseeable future. Its multifaceted framework encompasses vital components such as the Internet of Things (IoT), medical sensors, artificial intelligence (AI), edge and cloud computing, as well as next-generation wireless communication technologies. Many research papers discuss smart healthcare and healthcare more broadly. Numerous nations have strategically deployed the Internet of Medical Things (IoMT) alongside other measures to combat the propagation of COVID-19. This combined effort has not only enhanced the safety of frontline healthcare workers but has also augmented the overall efficacy in managing the pandemic, subsequently reducing its impact on human lives and mortality rates. Remarkable strides have been made in both applications and technology within the IoMT domain. However, it is imperative to acknowledge that this technological advancement has introduced certain challenges, particularly in the realm of security. The rapid and extensive adoption of IoMT worldwide has magnified issues related to security and privacy. These encompass a spectrum of concerns, ranging from replay attacks, man-in-the-middle attacks, impersonation, privileged insider threats, remote hijacking, password guessing, and denial of service (DoS) attacks, to malware incursions. In this comprehensive review, we undertake a comparative analysis of existing strategies designed for the detection and prevention of malware in IoT environments.

翻訳日:2023-11-27 00:22:51 公開日:2023-11-11

# セルフ・アテンションによるモデリング選択

Modeling Choice via Self-Attention ( http://arxiv.org/abs/2311.07607v1 )

ライセンス: Link先を確認

Joohwan Ko, Andrew A. Li

(参考訳) 選択モデルは、品揃え最適化、在庫最適化、価格最適化など、オペレーションズ・マネジメント分野で今や標準的となった多くの最適化問題への基本的な入力である。当然、データからこれらのモデルを正確に推定することは、こうした最適化問題を実務に適用するうえで重要なステップである。それにもかかわらず、これまでの選択モデルの推定が、理論と実務の両面において、ほぼ例外なく (a) 深層学習を意味のある形で用いることなく、 (b) 常に変化する指標を用いた限られたデータ上の評価によって行われてきたことは、おそらく驚きである。これは、類似する学習応用の大多数とは対照的であり、そこでは機械学習の実践から (a) ニューラルネットワークベースのモデルが一般に最先端であり、 (b) 評価手順(データセット、指標など)の厳格な標準化が不可欠であることが示唆されている。そこで我々はまず、現代的なニューラルネットワークアーキテクチャの概念(自己注意)を(理論的にも実用的にも)初めてうまく活用した選択モデルを提案する。理論的には、我々の注意ベースの選択モデルが、不合理な選択効果を簡潔に捉え、経験的にも成功を収めている最近のモデルであるHalo Multinomial Logit(Halo-MNL)モデルの低ランクな一般化であることを示す。 Halo-MNLの推定には$\Omega(m^2)$個のデータサンプルが必要である($m$は製品数)のに対し、我々のモデルは自然な非凸推定器(特に、標準的なニューラルネットワーク実装が適用するもの)を備え、$O(m)$個のサンプルで準最適な停留点を持つことを証明する。次に、実データ上の選択推定について初の現実的な規模のベンチマークを確立し、このベンチマークを用いて、既存の選択モデルに対するこれまでで最大規模の評価を行う。短期・長期いずれのデータ期間においても、提案モデルが優位であることがわかった。

Models of choice are a fundamental input to many now-canonical optimization problems in the field of Operations Management, including assortment, inventory, and price optimization. Naturally, accurate estimation of these models from data is a critical step in the application of these optimization problems in practice, and so it is perhaps surprising that such choice estimation has to now been accomplished almost exclusively, both in theory and in practice, (a) without the use of deep learning in any meaningful way, and (b) via evaluation on limited data with constantly-changing metrics. This is in stark contrast to the vast majority of similar learning applications, for which the practice of machine learning suggests that (a) neural network-based models are typically state-of-the-art, and (b) strict standardization on evaluation procedures (datasets, metrics, etc.) is crucial. Thus motivated, we first propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept (self-attention). Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit model, a recent model that parsimoniously captures irrational choice effects and has seen empirical success. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, where $m$ is the number of products, our model supports a natural nonconvex estimator (in particular, that which a standard neural network implementation would apply) which admits a near-optimal stationary point with $O(m)$ samples. We then establish the first realistic-scale benchmark for choice estimation on real data and use this benchmark to run the largest evaluation of existing choice models to date. We find that the model we propose is dominant over both short-term and long-term data periods.

翻訳日:2023-11-15 17:13:10 公開日:2023-11-11
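
The abstract above contrasts a Halo-MNL model, which needs on the order of $m^2$ samples, with a low-rank generalization. The sketch below shows one common way to write a Halo-MNL-style choice probability, where each offered product's utility is shifted by pairwise context terms from the other offered products, together with a rank-limited way of building those pairwise terms. The parameterization and the names `halo_mnl_probs` and `low_rank_halo` are assumptions for illustration. This is not the authors' attention architecture.

```python
import math


def halo_mnl_probs(assortment, base_utility, halo):
    """Choice probabilities over the offered products in `assortment`.

    base_utility[i]: stand-alone utility of product i.
    halo[j][i]:      shift in product i's utility when product j is also offered.
    """
    scores = {}
    for i in assortment:
        context = sum(halo[j][i] for j in assortment if j != i)
        scores[i] = math.exp(base_utility[i] + context)
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}


def low_rank_halo(left, right):
    """Build an m x m halo matrix from two m x r factor matrices.

    halo[j][i] is the inner product of left[j] and right[i], so the model
    stores 2*m*r numbers instead of m*m pairwise terms.
    """
    m = len(left)
    return [
        [sum(a * b for a, b in zip(left[j], right[i])) for i in range(m)]
        for j in range(m)
    ]
```

With an all-zero halo matrix the function reduces to the ordinary multinomial logit. The full $m \times m$ interaction matrix is what makes the quadratic sample requirement plausible, and the factorized form is one way to see why a low-rank model has far fewer parameters.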

# 大規模言語モデルに対する概念モデル解釈

Conceptual Model Interpreter for Large Language Models ( http://arxiv.org/abs/2311.07605v1 )

ライセンス: Link先を確認

Felix H\"arer

(参考訳) 大規模言語モデル(LLM)は最近、一般的なプログラミング言語でソースコードを生成する能力を示した。さらに、ChatGPT 4のような商用製品がコードインタプリタを提供し始め、生成されたコード断片の自動実行、即時フィードバック、対話形式での開発と改良が可能になった。本稿では、探索的な研究アプローチにより、コード生成と解釈を概念モデルに適用する。 Llama 2 や ChatGPT 4 といった最先端のLLMがテキスト構文で生成した視覚モデルをレンダリングできる、概念モデルインタプリタのコンセプトとプロトタイプを検討する。特に、これらのLLMは、モデリングソフトウェア PlantUML および Graphviz 向けのテキスト構文を生成でき、それが対話型ユーザインタフェース内で自動的にレンダリングされる。第一の成果は、APIまたはローカル経由でインタプリタやLLMと対話するために必要なコンポーネントを記述したアーキテクチャであり、多くの商用およびオープンソースのLLMとインタプリタをサポートする。第二に、ChatGPT 4 と Llama 2 で生成されたモデルの実験結果を、UMLを扱う事例と、インスタンスレベルでカスタムデータから作成したグラフを扱う事例の2つについて論じる。その結果は、対話形式で反復的にモデリングできる可能性を示している。

Large Language Models (LLMs) recently demonstrated capabilities for generating source code in common programming languages. Additionally, commercial products such as ChatGPT 4 started to provide code interpreters, allowing for the automatic execution of generated code fragments, instant feedback, and the possibility to develop and refine in a conversational fashion. With an exploratory research approach, this paper applies code generation and interpretation to conceptual models. The concept and prototype of a conceptual model interpreter is explored, capable of rendering visual models generated in textual syntax by state-of-the-art LLMs such as Llama 2 and ChatGPT 4. In particular, these LLMs can generate textual syntax for the PlantUML and Graphviz modeling software that is automatically rendered within a conversational user interface. The first result is an architecture describing the components necessary to interact with interpreters and LLMs through APIs or locally, providing support for many commercial and open source LLMs and interpreters. Secondly, experimental results for models generated with ChatGPT 4 and Llama 2 are discussed in two cases covering UML and, on an instance level, graphs created from custom data. The results indicate the possibility of modeling iteratively in a conversational fashion.

翻訳日:2023-11-15 17:12:19 公開日:2023-11-11
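
The abstract above describes an interpreter that takes textual model syntax produced by an LLM and renders it. One small step in such a pipeline is picking the PlantUML and Graphviz fragments out of a free-text reply so they can be handed to the matching renderer. The sketch below shows only that extraction step. The function names are hypothetical, the matching relies on the `@startuml`/`@enduml` markers of PlantUML and the `graph`/`digraph` keywords of the DOT language, and no rendering is performed.

```python
import re

# PlantUML diagrams are delimited by @startuml ... @enduml.
PLANTUML = re.compile(r"@startuml.*?@enduml", re.DOTALL)
# DOT graphs start with [strict] graph|digraph, an optional identifier, then "{".
DOT_HEAD = re.compile(
    r'\b(?:strict\s+)?(?:di)?graph\s*(?:[A-Za-z_][A-Za-z_0-9]*|"[^"]*")?\s*\{'
)


def _end_of_block(text, open_idx):
    """Return the index just past the brace that closes the one at open_idx."""
    depth = 0
    for pos in range(open_idx, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return pos + 1
    return None  # unbalanced braces: treat as no complete graph


def extract_models(reply):
    """Return (kind, source) pairs in the order they appear in the reply."""
    found = [("plantuml", m.start(), m.group(0)) for m in PLANTUML.finditer(reply)]
    pos = 0
    while True:
        head = DOT_HEAD.search(reply, pos)
        if head is None:
            break
        end = _end_of_block(reply, head.end() - 1)
        if end is None:
            break
        found.append(("graphviz", head.start(), reply[head.start():end]))
        pos = end
    found.sort(key=lambda item: item[1])
    return [(kind, source) for kind, _, source in found]
```

Each extracted source string could then be passed to a PlantUML or Graphviz renderer of the reader's choice. The brace matching is deliberately simple and ignores braces inside quoted DOT strings.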

# 公平性のためのテキスト・画像拡散モデルのファインチューニング

Finetuning Text-to-Image Diffusion Models for Fairness ( http://arxiv.org/abs/2311.07604v1 )

ライセンス: Link先を確認

Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli

(参考訳) 社会におけるテキスト・画像拡散モデルの急速な普及は、そのバイアスに対処する緊急の必要性を浮き彫りにしている。介入がなければ、これらのバイアスは歪んだ世界観を広め、少数派グループの機会を制限しかねない。本研究では、公平性を分布アライメント問題として定式化する。提案手法は、(1) 生成画像の特定の特性をユーザ定義の目標分布へ誘導する分布アライメント損失と、(2) 生成画像上で定義された損失をより効果的に最適化するためにバイアスのある勾配を利用する、拡散モデルのサンプリング過程のバイアス付き直接ファインチューニング、という2つの主要な技術的貢献からなる。経験的に、本手法は職業に関するプロンプトに対する性別、人種、およびそれらの交差的バイアスを著しく低減する。わずか5つのソフトトークンをファインチューニングするだけでも、性別バイアスは大幅に減少する。重要な点として、本手法は絶対的平等を超えた公平性の多様な観点をサポートしており、これは性別と人種のバイアスを同時に除去しつつ、年齢を若年 $75\%$、高齢 $25\%$ の分布に制御することで示される。最後に、本手法はスケーラブルであり、これらのプロンプトをファインチューニング用データに含めるだけで、複数の概念を一度にデバイアスできる。私たちの研究がT2I生成AIの社会的アライメントを促進することを願っている。コードと各種のデバイアス済み拡散モデルアダプタを公開する予定である。

The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of the diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We hope our work facilitates the social alignment of T2I generative AI. We will share code and various debiased diffusion model adaptors.
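
As a rough illustration of what "fairness as distributional alignment" can mean, the sketch below measures how far the attribute distribution of a batch of generated images is from a user-defined target. This is not the authors' loss: the use of KL divergence, the function names, and the hard-coded classifier probabilities are all assumptions made for illustration.

```python
import math

def batch_distribution(probs):
    """Average per-image class probabilities into one batch-level distribution."""
    n = len(probs)
    k = len(probs[0])
    return [sum(p[c] for p in probs) / n for c in range(k)]

def kl_divergence(target, actual, eps=1e-12):
    """KL(target || actual); close to zero when the two distributions match."""
    return sum(t * math.log(t / max(a, eps)) for t, a in zip(target, actual) if t > 0)

# Hypothetical attribute-classifier outputs [P(young), P(old)] for four generated images.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
target = [0.75, 0.25]  # the 75% young / 25% old target mentioned in the abstract

actual = batch_distribution(probs)
loss = kl_divergence(target, actual)
print(actual, loss)
```

A batch whose average already matches the target gives a loss near zero, while a 50/50 batch would be penalized. A non-uniform target such as 75/25 is what lets this kind of objective express notions of fairness other than strict equality.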

翻訳日:2023-11-15 17:11:58 公開日:2023-11-11

# PECoP:行動品質評価のためのパラメータ効率的な連続事前学習

PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment ( http://arxiv.org/abs/2311.07603v1 )

ライセンス: Link先を確認

Amirhossein Dadashzadeh, Shuchao Duan, Alan Whone, Majid Mirmehdi

(参考訳) AQA(Action Quality Assessment)ではラベル付きデータの可用性が限られているため、以前の研究では、大規模なドメイン汎用データセットで事前トレーニングされたモデルを微調整せざるを得なかった。この一般的なアプローチは、特に大きなドメインシフトがある場合、弱い一般化をもたらす。そこで本研究では,追加の事前学習段階によってこのドメインシフトを低減する,新たなパラメータ効率の高い連続的事前学習フレームワークであるPECoPを提案する。 PECoPでは,事前学習モデルに挿入された3D-Adaptersを導入し,アダプタモジュールのパラメータのみを更新する自己教師型学習を通じて時空間的なドメイン内情報を学習する。 AQAに適用された最近の最先端手法(MUSDL、CoRe、TSA)の性能を向上させるPECoPの能力を実証し、ベンチマークデータセットであるJIGSAWS($\uparrow6.0\%$), MTL-AQA($\uparrow0.99\%$), FineDiving($\uparrow2.54\%$)で大幅な改善を得た。また, 実際の患者が4種類の動作を行うパーキンソン病の新しいデータセットPD4Tを提示し, 比較において最先端手法を$\uparrow3.56\%$上回った。私たちのコード、事前トレーニングされたモデル、PD4Tデータセットはhttps://github.com/Plrbear/PECoPで公開されています。

The limited availability of labelled data in Action Quality Assessment (AQA) has forced previous works to fine-tune their models pretrained on large-scale domain-general datasets. This common approach results in weak generalisation, particularly when there is a significant domain shift. We propose a novel, parameter efficient, continual pretraining framework, PECoP, to reduce such domain shift via an additional pretraining stage. In PECoP, we introduce 3D-Adapters, inserted into the pretrained model, to learn spatiotemporal, in-domain information via self-supervised learning where only the adapter modules' parameters are updated. We demonstrate PECoP's ability to enhance the performance of recent state-of-the-art methods (MUSDL, CoRe, and TSA) applied to AQA, leading to considerable improvements on benchmark datasets, JIGSAWS ($\uparrow6.0\%$), MTL-AQA ($\uparrow0.99\%$), and FineDiving ($\uparrow2.54\%$). We also present a new Parkinson's Disease dataset, PD4T, of real patients performing four different actions, where we surpass ($\uparrow3.56\%$) the state-of-the-art in comparison. Our code, pretrained models, and the PD4T dataset are available at https://github.com/Plrbear/PECoP.
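
The idea of updating only adapter parameters can be sketched with a generic bottleneck adapter. This is a simplified stand-in, not the paper's 3D-Adapter: it works on plain feature vectors rather than spatiotemporal feature maps, and the sizes `d` and `r`, the weight values, and the function names are assumptions chosen for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter(x, W_down, W_up):
    """Bottleneck adapter with a residual connection: x + W_up * relu(W_down * x)."""
    h = [max(0.0, v) for v in matvec(W_down, x)]
    return [xi + ui for xi, ui in zip(x, matvec(W_up, h))]

d, r = 8, 2  # feature width and bottleneck width (illustrative values)
W_down = [[0.1] * d for _ in range(r)]  # r x d down-projection
W_up = [[0.0] * r for _ in range(d)]    # d x r up-projection, zero-initialised

x = [float(i) for i in range(d)]
y = adapter(x, W_down, W_up)  # equals x at initialisation

adapter_params = 2 * d * r   # the only weights that would be trained
frozen_layer_params = d * d  # weights of a hypothetical frozen dense layer of the same width
print(y == x, adapter_params, frozen_layer_params)
```

With the up-projection initialised to zero, the adapter starts as an identity map, so inserting it does not disturb the pretrained model. Training then touches only `W_down` and `W_up`, which is where the parameter efficiency comes from.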

翻訳日:2023-11-15 17:11:37 公開日:2023-11-11

# LLMによるオンライン広告 - 機会と課題

Online Advertisements with LLMs: Opportunities and Challenges ( http://arxiv.org/abs/2311.07601v1 )

ライセンス: Link先を確認

Soheil Feizi, MohammadTaghi Hajiaghayi, Keivan Rezaei, Suho Shin

(参考訳) 本稿では,オンライン広告システムにおけるLarge Language Models(LLM)の活用の可能性について検討する。プライバシ、レイテンシ、信頼性、ユーザと広告主の満足度など、そのようなシステムが満たさなければならない必須要件について検討する。さらに,修正,入札,予測,オークションモジュールからなるLLM広告の一般的なフレームワークを紹介する。各モジュールに対する異なる設計上の考慮事項が提示され、その実用性と実装に固有の技術的課題を詳細に検討する。

This paper explores the potential for leveraging Large Language Models (LLMs) in the realm of online advertising systems. We delve into essential requirements that such a system must fulfill, including privacy, latency, reliability, and user and advertiser satisfaction. We further introduce a general framework for LLM advertisement, consisting of modification, bidding, prediction, and auction modules. Different design considerations for each module are presented, with an in-depth examination of their practicality and the technical challenges inherent to their implementation.
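
To make the interplay of the bidding, prediction, and auction modules concrete, here is a minimal sketch of one possible auction step. The abstract does not say which auction format the framework uses. The ranking by bid times predicted click-through rate, the second-price payment rule, and all names and numbers below are assumptions for illustration only.

```python
def run_auction(bids, predicted_ctr):
    """Rank advertisers by bid * predicted CTR; the winner pays a second-price equivalent."""
    scores = {ad: bids[ad] * predicted_ctr[ad] for ad in bids}
    ranked = sorted(scores, key=scores.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Smallest bid with which the winner would still match the runner-up's score.
    price = scores[runner_up] / predicted_ctr[winner]
    return winner, price

# Hypothetical outputs of the bidding and prediction modules.
bids = {"ad_a": 2.0, "ad_b": 3.0, "ad_c": 1.0}
predicted_ctr = {"ad_a": 0.5, "ad_b": 0.25, "ad_c": 0.5}

winner, price = run_auction(bids, predicted_ctr)
print(winner, price)
```

In this sketch the highest bidder (`ad_b`) does not win, because the predicted click-through rate also enters the ranking. This is one reason the quality of the prediction module matters as much as the bids themselves.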

翻訳日:2023-11-15 17:11:08 公開日:2023-11-11

# ポラリメトリックパッチマッチマルチビューステレオ

Polarimetric PatchMatch Multi-View Stereo ( http://arxiv.org/abs/2311.07600v1 )

ライセンス: Link先を確認

Jinyu Zhao, Jumpei Oishi, Yusuke Monno, Masatoshi Okutomi

(参考訳) PatchMatch Multi-View Stereo (PatchMatch MVS)は、そのバランスの取れた精度と効率のため、人気のあるMVSアプローチの1つである。本稿では、PatchMatch MVSに偏光キューを利用した最初の手法であるPolarimetric PatchMatch multi-view Stereo(PolarPMS)を提案する。 PatchMatch MVSの鍵は、局所的な3D平面と傾斜ステレオマッチングウィンドウを形成する深さと法線の仮説を生成し、マルチビュー画像間の一貫性に基づいて最適な仮説を効率的に探索することである。標準的な光度整合性に加えて、PolarPMSは、偏光情報が物体表面の法線に関係するという物理的性質に基づき、深さと法線の仮説の妥当性を評価するために偏光整合性を評価する。実験の結果, 最先端のPatchMatch MVS法と比較して,PolarPMSは特にテクスチャレス表面において再構成3次元モデルの精度と完全性を向上できることが示された。

PatchMatch Multi-View Stereo (PatchMatch MVS) is one of the popular MVS approaches, owing to its balanced accuracy and efficiency. In this paper, we propose Polarimetric PatchMatch multi-view Stereo (PolarPMS), which is the first method to exploit polarization cues in PatchMatch MVS. The key to PatchMatch MVS is to generate depth and normal hypotheses, which form local 3D planes and slanted stereo matching windows, and efficiently search for the best hypothesis based on the consistency among multi-view images. In addition to standard photometric consistency, our PolarPMS evaluates polarimetric consistency to assess the validity of a depth and normal hypothesis, motivated by the physical property that the polarimetric information is related to the object's surface normal. Experimental results demonstrate that our PolarPMS can improve the accuracy and the completeness of reconstructed 3D models, especially for texture-less surfaces, compared with state-of-the-art PatchMatch MVS methods.

翻訳日:2023-11-15 17:11:00 公開日:2023-11-11

# LLM応答における意図的バイアス

Intentional Biases in LLM Responses ( http://arxiv.org/abs/2311.07611v1 )

ライセンス: Link先を確認

Nicklaus Badyal, Derek Jacoby, Yvonne Coady

(参考訳) 本研究では,対話型メディアのための特定のペルソナを作成するために,大規模言語モデル応答にバイアスを意図的に導入する。 Falcon-7bのようなオープンソースモデルとOpenAIのGPT-4モデルの違いについて検討し、2つのシステムで得られる応答の差を定量化する。専門家モデルと監督官を混合したGPT-4のガードレールは、一般にAIアライメントを確保するのに有用であるが、様々な一般的でない視点を持つペルソナを構築するのに有害であることがわかった。本研究の目的は,これらのプラクティスを創造的分野やメディアの新たな形態に適用できるような,大規模言語モデルの意図的バイアスにおける将来の探索の基盤となることにある。

In this study we intentionally introduce biases into large language model responses in an attempt to create specific personas for interactive media purposes. We explore the differences between open source models such as Falcon-7b and the GPT-4 model from OpenAI, and we quantify some differences in responses afforded by the two systems. We find that the guardrails in the GPT-4 mixture of experts models with a supervisor, while useful in assuring AI alignment in general, are detrimental in trying to construct personas with a variety of uncommon viewpoints. This study aims to set the groundwork for future exploration in intentional biases of large language models such that these practices can be applied in creative fields and new forms of media.

翻訳日:2023-11-15 16:56:01 公開日:2023-11-11

# 網膜眼底画像による心血管疾患と危険因子の評価における人工知能 : 過去10年間のレビュー

Artificial Intelligence in Assessing Cardiovascular Diseases and Risk Factors via Retinal Fundus Images: A Review of the Last Decade ( http://arxiv.org/abs/2311.07609v1 )

ライセンス: Link先を確認

Mirsaeed Abdollahi, Ali Jafarizadeh, Amirhosein Ghafouri Asbagh, Navid Sobhi, Keysan Pourmoghtader, Siamak Pedrammehr, Houshyar Asadi, Roohallah Alizadehsani, Ru-San Tan, U. Rajendra Acharya

(参考訳) 背景:心血管疾患(CVDs)は世界規模で死亡率の主要な原因であり続けている。近年、人工知能(AI)技術の応用、特に深層学習(DL)は、CVDの様々な側面を評価することでかなりの人気を集めている。また、眼底画像と光干渉断層血管撮影(optical coherence tomography angiography: OCTA)を用いた網膜疾患の診断法が広く研究されている。心臓の機能をよりよく理解し、微小血管の特徴と機能に基づく変化を予想するために、研究者は現在、AIと非侵襲網膜スキャンの統合を検討している。 AIを利用した大規模な心血管疾患の早期発見と予測は、心血管イベントの緩和と医療システムの経済的負担軽減に優れた可能性をもたらす。方法: PubMed, Medline, Google Scholar, Scopus, Web of Sciences, IEEE Xplore, ACM Digital Libraryなど,さまざまなデータベースに対して,心血管疾患や人工知能に関連する特定のキーワードを使用して包括的な検索を行った。結果:本研究に関連性のある87の英文出版物が収録され,追加の参考文献が検討された。本研究では, 網膜イメージングと人工知能を用いて心血管疾患を同定する現在の進歩と課題について概観し, この分野のさらなる探究への洞察を提供する。結論: 高齢化とグローバルCVD負荷の増加に伴い, 研究者は正確な疾患予後パターンの開発を目指している。 AIとディープラーニングは医療を変革しつつあり、医療システムにおける採用の加速が必要ではあるものの、単一の網膜画像に基づく様々なCVDの診断の可能性を提供している。

Background: Cardiovascular diseases (CVDs) continue to be the leading cause of mortality on a global scale. In recent years, the application of artificial intelligence (AI) techniques, particularly deep learning (DL), has gained considerable popularity for evaluating the various aspects of CVDs. Moreover, using fundus images and optical coherence tomography angiography (OCTA) to diagnose retinal diseases has been extensively studied. To better understand heart function and anticipate changes based on microvascular characteristics and function, researchers are currently exploring the integration of AI with non-invasive retinal scanning. Leveraging AI-assisted early detection and prediction of cardiovascular diseases on a large scale holds excellent potential to mitigate cardiovascular events and alleviate the economic burden on healthcare systems. Method: A comprehensive search was conducted across various databases, including PubMed, Medline, Google Scholar, Scopus, Web of Sciences, IEEE Xplore, and ACM Digital Library, using specific keywords related to cardiovascular diseases and artificial intelligence. Results: A total of 87 English-language publications, selected for relevance were included in the study, and additional references were considered. This study presents an overview of the current advancements and challenges in employing retinal imaging and artificial intelligence to identify cardiovascular disorders and provides insights for further exploration in this field. Conclusion: Researchers aim to develop precise disease prognosis patterns as the aging population and global CVD burden increase. AI and deep learning are transforming healthcare, offering the potential for single retinal image-based diagnosis of various CVDs, albeit with the need for accelerated adoption in healthcare systems.

翻訳日:2023-11-15 16:55:47 公開日:2023-11-11

# MuST: 病院再入院予測のためのマルチモーダル時空間グラフ変換器

MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction ( http://arxiv.org/abs/2311.07608v1 )

ライセンス: Link先を確認

Yan Miao, Lequan Yu

(参考訳) 病院の再入院予測は,医療システムの品質と有効性を評価する上で重要な要因である再入院率の低下に不可欠なアプローチと考えられる。これまでの研究では、電子健康記録(EHR)、医療画像、臨床ノートの3つの主要なモダリティを利用して病院の再入院を予測している。しかし,これらの研究の大部分は,3つのモダリティすべてからの情報の統合や,データセットに存在する時空間的関係の活用は行わなかった。本研究は,病院再入院予測のためのMultimodal Spatiotemporal Graph-Transformer (MuST) と呼ばれる新しいモデルを提案する。グラフ畳み込みネットワークと時間変換器を用いることで,EHRや胸部X線画像における空間的および時間的依存関係を効果的に捉えることができる。次に,前述した2つのモダリティの時空間的特徴と,事前学習したドメイン固有変換器から抽出した臨床ノートの特徴を組み合わせた融合変換器を提案する。提案手法の有効性を,最新の公開データセットMIMIC-IVを用いて評価した。実験結果から, MuST にマルチモーダルな特徴を組み込むことで, 単一モダリティの手法と比較して性能が向上することが示唆された。さらに,提案するパイプラインは,病院再入院の予測において,現在の先行手法よりも優れていた。

Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which is a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively utilized three primary modalities, namely electronic health records (EHR), medical images, and clinical notes, to predict hospital readmissions. However, the majority of these studies did not integrate information from all three modalities or utilize the spatiotemporal relationships present in the dataset. This study introduces a novel model called the Multimodal Spatiotemporal Graph-Transformer (MuST) for predicting hospital readmissions. By employing Graph Convolution Networks and temporal transformers, we can effectively capture spatial and temporal dependencies in EHR and chest radiographs. We then propose a fusion transformer to combine the spatiotemporal features from the two modalities mentioned above with the features from clinical notes extracted by a pre-trained, domain-specific transformer. We assess the effectiveness of our methods using the latest publicly available dataset, MIMIC-IV. The experimental results indicate that the inclusion of multimodal features in MuST improves its performance in comparison to unimodal methods. Furthermore, our proposed pipeline outperforms the current leading methods in the prediction of hospital readmissions.

翻訳日:2023-11-15 16:55:19 公開日:2023-11-11

# グラフ正規化テンソル補完のための交互最小化アルゴリズム

Alternating minimization algorithms for graph regularized tensor completion ( http://arxiv.org/abs/2008.12876v2 )

ライセンス: Link先を確認

Yu Guan, Shuyu Dong, Bin Gao, P.-A. Absil, François Glineur

(参考訳) CP因子行列上のグラフラプラシアン正則化を通じて外部のペアワイズ類似関係を組み込むことにより、低ランクテンソル補完(LRTC)に対する正準多進(CP)分解法を考える。グラフ正規化の利用にはLRTCの学習精度の利点が伴うが、同時にテンソル補完モデルの最適化を妨げる結合グラフラプラシアン項を誘導する。グラフ正規化LRTCの解法として,CP分解ベースモデルのブロック構造を活用し,効率的な交互最小化アルゴリズムを提案する。交互最小化の部分問題に対して、線形共役勾配サブルーチンをグラフ正規化LRTCに特化して適合させる。あるいは、交互方向乗数法を用いてグラフラプラシアン項の複雑な結合効果を回避する。 Kurdyka-Łojasiewicz の性質に基づき、提案アルゴリズムによって生成される列が、目的関数の臨界点にグローバルに収束することを示す。さらに、複雑性と収束率も導出される。さらに、合成データや実データを含む数値実験により、グラフ正規化テンソル補完モデルがグラフ正規化のないモデルに比べて回復結果が向上し、提案アルゴリズムが既存のアルゴリズムよりも時間効率が向上することを示した。

We consider a Canonical Polyadic (CP) decomposition approach to low-rank tensor completion (LRTC) by incorporating external pairwise similarity relations through graph Laplacian regularization on the CP factor matrices. The usage of graph regularization entails benefits in the learning accuracy of LRTC, but at the same time, induces coupling graph Laplacian terms that hinder the optimization of the tensor completion model. In order to solve graph-regularized LRTC, we propose efficient alternating minimization algorithms by leveraging the block structure of the underlying CP decomposition-based model. For the subproblems of alternating minimization, a linear conjugate gradient subroutine is specifically adapted to graph-regularized LRTC. Alternatively, we circumvent the complicating coupling effects of graph Laplacian terms by using an alternating directions method of multipliers. Based on the Kurdyka-Łojasiewicz property, we show that the sequence generated by the proposed algorithms globally converges to a critical point of the objective function. Moreover, the complexity and convergence rate are also derived. In addition, numerical experiments including synthetic data and real data show that the graph regularized tensor completion model has improved recovery results compared to those without graph regularization, and that the proposed algorithms achieve gains in time efficiency over existing algorithms.
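
The reason graph Laplacian regularization encodes pairwise similarity can be checked numerically. For a symmetric weight matrix $W$ with Laplacian $L = D - W$, the standard identity $\mathrm{tr}(U^\top L U) = \tfrac{1}{2}\sum_{i,j} W_{ij}\,\lVert u_i - u_j\rVert^2$ holds, where $u_i$ is the $i$-th row of a factor matrix $U$. The sketch below verifies it on a small example. The graph, the factor matrix, and the function names are assumptions for illustration, and the authors' exact objective may weight or combine these terms differently.

```python
def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def trace_reg(U, L):
    """tr(U^T L U), computed as sum_{i,j} L[i][j] * <U[i], U[j]>."""
    n = len(U)
    return sum(L[i][j] * dot(U[i], U[j]) for i in range(n) for j in range(n))

def pairwise_reg(U, W):
    """0.5 * sum_{i,j} W[i][j] * ||U[i] - U[j]||^2."""
    n = len(U)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = [a - b for a, b in zip(U[i], U[j])]
            total += W[i][j] * dot(diff, diff)
    return 0.5 * total

# Hypothetical similarity graph on three entities and a rank-2 factor matrix.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
U = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 2.0]]

print(trace_reg(U, laplacian(W)), pairwise_reg(U, W))
```

Both expressions evaluate to the same number, so minimizing the trace term pulls together the factor rows of entities that the graph marks as similar. The same term also couples the rows of $U$, which is the complication that the abstract's alternating minimization algorithms are designed to handle.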

翻訳日:2023-11-15 01:00:27 公開日:2023-11-11

# 行動の民俗概念を用いたAI説明手法の診断

Diagnosing AI Explanation Methods with Folk Concepts of Behavior ( http://arxiv.org/abs/2201.11239v4 )

ライセンス: Link先を確認

Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova

(参考訳) 我々は,AIの説明が成功する条件に対する形式主義について検討する。我々は「成功」は、説明がどんな情報を含んでいるかだけでなく、人間の被説明者がそこから理解する情報にも依存すると考える。心の理論の文献は、人間が行動を理解し、一般化するために使用する民俗概念を論じている。行動の民俗概念は、人間が行動を理解する「言語」をもたらすと仮定する。我々は、これらの構成要素でAIの行動を説明する説明的物語の青写真(図1)を導入することで、これらの民俗概念を、人間の被説明者による*社会的帰属*の枠組み、すなわち人間が説明から理解する可能性が高い情報構造として使用する。そして,今日の多くのXAI手法が質的評価において行動の民俗概念にマッピング可能であることを示す。これにより、現在の手法がうまく説明できない原因となる障害モードを明らかにすることができる。つまり、任意のXAI手法に欠けている情報構造であり、それを含めることでAIの動作が誤解される可能性を減らすことができるものである。

We investigate a formalism for the conditions of a successful explanation of AI. We consider "success" to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. Theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide us with a "language" that humans understand behavior with. We use these folk concepts as a framework of *social attribution* by the human explainee -- the information constructs that humans are likely to comprehend from explanations -- by introducing a blueprint for an explanatory narrative (Figure 1) that explains AI behavior with these constructs. We then demonstrate that many XAI methods today can be mapped to folk concepts of behavior in a qualitative evaluation. This allows us to uncover their failure modes that prevent current methods from explaining successfully -- i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.

翻訳日:2023-11-15 00:54:00 公開日:2023-11-11

# 自律走行車用カメラの物理的状態のモニタリングと適応

Monitoring and Adapting the Physical State of a Camera for Autonomous Vehicles ( http://arxiv.org/abs/2112.05456v3 )

ライセンス: Link先を確認

Maik Wischow and Guillermo Gallego and Ines Ernst and Anko Börner

(参考訳) 自動運転車とロボットは、現代のタスクの要求を満たすために、ますます堅牢さと信頼性を必要としている。これらの要件は、特にそのような車両に搭載されているカメラに適用される。カメラは環境に関する情報を取得し行動を支える主要なセンサであるためである。カメラは適切な機能を維持し、必要に応じて自動的な対策を講じなければならない。既存のソリューションは、通常、特定の問題に合わせて調整されるか、マシンの下流のコンピュータビジョンタスクから分離されている。本稿では,データおよび物理モデルに基づくカメラの汎用的・タスク指向型自己維持フレームワークを提案する。そこで本研究では,従来およびカスタマイズされた機械学習に基づくアプローチを広範囲な実験で評価することにより,状態の悪いカメラの典型的な画像効果(ぼけ,ノイズ現象,および最も一般的な組み合わせ)に対する信頼性の高い2つの実時間可能な推定器を決定する。さらに,実世界の地上車両にそのフレームワークを実装し,カメラがパラメータを調整して識別された不良条件に対抗し,実験的な(非線形および非単調な)入出力性能曲線に基づく最適な適用能力を達成する方法を示す。対象のアプリケーションとして物体検出が選択され、画像効果としては動きのぼけとセンサノイズを条件付けの例とする。私たちのフレームワークは、カメラの健全性を監視し、維持するための実用的なソリューションを提供するだけでなく、完全な信頼性と堅牢なマシンを達成するために、経験的に追加のデータソース(例えば、センサーや環境パラメータ)を結合するより高度な問題に取り組むための拡張の基盤としても機能します。コード:https://github.com/MaikWischow/Camera-Condition-Monitoring

Autonomous vehicles and robots require increasingly more robustness and reliability to meet the demands of modern tasks. These requirements apply especially to cameras onboard such vehicles because they are the predominant sensors to acquire information about the environment and support actions. Cameras must maintain proper functionality and take automatic countermeasures if necessary. Existing solutions are typically tailored to specific problems or detached from the downstream computer vision tasks of the machines, which, however, determine the requirements on the quality of the produced camera images. We propose a generic and task-oriented self-health-maintenance framework for cameras based on data- and physically-grounded models. To this end, we determine two reliable, real-time capable estimators for typical image effects of a camera in poor condition (blur, noise phenomena and most common combinations) by evaluating traditional and customized machine learning-based approaches in extensive experiments. Furthermore, we implement the framework on a real-world ground vehicle and demonstrate how a camera can adjust its parameters to counter an identified poor condition to achieve optimal application capability based on experimental (non-linear and non-monotonic) input-output performance curves. Object detection is chosen as target application, and the image effects motion blur and sensor noise as conditioning examples. Our framework not only provides a practical ready-to-use solution to monitor and maintain the health of cameras, but can also serve as a basis for extensions to tackle more sophisticated problems that combine additional data sources (e.g., sensor or environment parameters) empirically in order to attain fully reliable and robust machines. Code: https://github.com/MaikWischow/Camera-Condition-Monitoring

翻訳日:2023-11-15 00:53:14 公開日:2023-11-11

# サブガンマ次数列が増加するネットワークモデルのクラスにおける漸近性

Asymptotic in a class of network models with an increasing sub-Gamma degree sequence ( http://arxiv.org/abs/2111.01301v4 )

ライセンス: Link先を確認

Jing Luo, Haoyu Wei, Xiaoyu Lei, Jiaxin Guo

(参考訳) サブガンマノイズ下の微分プライバシーについて、一般リンク関数を持つバイナリ値のネットワークモデルのクラスにおける漸近特性を導出する。本稿では、離散的なLaplace機構を特別なケースとして含む一般的な雑音機構の下で、バイナリネットワークの次数列を公開する。ネットワークモデルのクラスにおいてパラメータの数が無限大に発散するとき、パラメータ推定器の一貫性と漸近正規性の両方を含む漸近結果を確立する。漸近的な結果を示すシミュレーションと実データ例が提供される。

For differential privacy under sub-Gamma noise, we derive the asymptotic properties of a class of network models with binary values and a general link function. In this paper, we release the degree sequences of the binary networks under a general noisy mechanism with the discrete Laplace mechanism as a special case. We establish asymptotic results including both consistency and asymptotic normality of the parameter estimator when the number of parameters goes to infinity in a class of network models. Simulations and a real data example are provided to illustrate the asymptotic results.
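
As an illustration of the special case named in the abstract, the sketch below releases a degree sequence with discrete Laplace noise. It is a generic construction, not the paper's code. It relies on the known fact that the difference of two independent geometric variables with success probability $1-\lambda$ has mass proportional to $\lambda^{|k|}$. The choice $\lambda = e^{-\epsilon/2}$ assumes edge-level privacy for an undirected graph, where adding or removing one edge changes two degrees by one each. The function names and the example degrees are hypothetical.

```python
import math
import random

def discrete_laplace(lam, rng):
    """Sample k with P(k) proportional to lam**abs(k), as a difference of two geometric draws."""
    def geometric():
        # Number of failures before the first success, with success probability 1 - lam.
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.floor(math.log(u) / math.log(lam)))
    return geometric() - geometric()

def release_degrees(degrees, epsilon, rng):
    """Add independent discrete Laplace noise to every degree."""
    # Assumption: L1 sensitivity 2 (one edge changes two degrees by one each).
    lam = math.exp(-epsilon / 2.0)
    return [d + discrete_laplace(lam, rng) for d in degrees]

rng = random.Random(0)
degrees = [3, 2, 2, 1]  # degree sequence of a small hypothetical graph
noisy = release_degrees(degrees, epsilon=1.0, rng=rng)
print(noisy)
```

The released values stay integers but may be negative or may no longer form a valid degree sequence. Estimators built on such noisy degrees therefore need their own consistency and normality analysis, which is the kind of result the abstract describes.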

翻訳日:2023-11-15 00:51:58 公開日:2023-11-11

# 駆動型量子Rabiモデルにおける2光子側バンド遷移 : 導出長手駆動と回転波近似を超えての定量的議論

Two-photon sideband transition in a driven quantum Rabi model : Quantitative discussions with derived longitudinal drives and beyond the rotating wave approximation ( http://arxiv.org/abs/2108.00137v2 )

ライセンス: Link先を確認

Byoung-moo Ann, Wouter Kessels, Gary A. Steele

(参考訳) 本研究では、駆動量子ラビモデル(QRM)のサイドバンド遷移ダイナミクスを解析的および数値的に研究する。特に、外部横方向駆動フィールドが一次のサイドバンド遷移を誘導する条件に着目する。 2つの異なるシステム間のサイドバンド遷移を誘導することは、QRMを含む様々な物理モデルにとって重要な技術である。しかしながら、その重要性にもかかわらず、全てのシステムパラメータ構成に適用可能な駆動QRMのサイドバンド遷移率をうまく説明できる正確な分析研究はまだ報告されていない。本研究では、回転波近似(RWA)に依存しない、二次摂動理論に基づくサイドバンド遷移率を解析的に導出する。得られた式はドライブ周波数とシステムのパラメータのあらゆる範囲で有効である。解析的に導出した式は、適度な駆動振幅の領域における数値結果とよく一致する。興味深いことに、横駆動ハミルトニアンから導かれる非自明な縦駆動効果が発見された。これは、導出された縦方向の効果を考慮せずに期待されるサイドバンド遷移率に対して大きな補正をもたらす。このアプローチを用いることで、特定のパラメータ領域に限定されることなく、QRMのサイドバンド遷移率を正確に推定することができる。これは、駆動QRMによって記述される実験を理解するための重要な貢献である。

In this work, we analytically and numerically study the sideband transition dynamics of the driven quantum Rabi model (QRM). We focus in particular on the conditions when the external transverse drive fields induce first-order sideband transitions. Inducing sideband transitions between two different systems is an essential technique for various physical models, including the QRM. However, despite its importance, a precise analytical study has not been reported yet that successfully explains the sideband transition rates in a driven QRM applicable for all system parameter configurations. In our study, we analytically derive the sideband transition rates based on second-order perturbation theory, not relying on the rotating wave approximation (RWA). Our formulas are valid for all ranges of drive frequencies and system parameters. Our analytically derived formulas agree well with the numerical results in a regime of moderate drive amplitudes. Interestingly, we have found a non-trivial longitudinal drive effect derived from the transverse drive Hamiltonian. This accounts for significant corrections to the sideband transition rates that are expected without considering the derived longitudinal effect. Using this approach, one can precisely estimate the sideband transition rates in the QRM without being confined to specific parameter regimes. This provides important contributions for understanding experiments described by the driven QRM.
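
For orientation, a transversely driven QRM is commonly written as below (with $\hbar = 1$). The symbols $\omega_r$, $\omega_q$, $g$, $\Omega_d$, and $\omega_d$, and the specific form of the drive term, are assumptions chosen for illustration; the abstract does not state the authors' notation.

```latex
\begin{align}
H(t) &= \omega_r\, a^\dagger a + \frac{\omega_q}{2}\,\sigma_z
        + g\,\sigma_x\,(a + a^\dagger) + \Omega_d \cos(\omega_d t)\,\sigma_x, \\
\sigma_x\,(a + a^\dagger) &=
        \underbrace{\sigma_+ a + \sigma_- a^\dagger}_{\text{kept under the RWA}}
      + \underbrace{\sigma_+ a^\dagger + \sigma_- a}_{\text{counter-rotating terms}} .
\end{align}
```

The second line uses $\sigma_x = \sigma_+ + \sigma_-$. Working beyond the RWA means keeping the counter-rotating terms as well, which is why results derived this way are not restricted to weak coupling or near-resonant parameter regimes.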

翻訳日:2023-11-15 00:51:39 公開日:2023-11-11

# 量子データ圧縮と量子クロスエントロピー

Quantum Data Compression and Quantum Cross Entropy ( http://arxiv.org/abs/2106.13823v3 )

ライセンス: Link先を確認

Zhou Shangnan

(参考訳) 量子機械学習の新たな分野は、量子コンピューティングと人工知能の視点に革命をもたらす可能性がある。経験的研究が中心の量子機械学習の領域では、理論的な空白が残っている。本稿では,古典的クロスエントロピーに対応する量子クロスエントロピーに注目することで,このギャップに対処する。量子クロスエントロピーが準最適な量子情報源符号化の圧縮率として機能することを示すことで、基本的な機械学習タスクである量子データ圧縮におけるその役割を確立する。我々のアプローチは、可変長符号化の量子一般化と量子強典型性の原理に基づく、新しい普遍的な量子データ圧縮プロトコルである。これは量子クロスエントロピーが量子機械学習アルゴリズムの損失関数として効果的に機能することを明らかにする。さらに、量子クロスエントロピーの最小値はフォン・ノイマンエントロピーと一致し、最適な圧縮率としての役割を補強し、量子機械学習の理論的枠組みを理解する上での意義を強調する。

The emerging field of quantum machine learning has the potential of revolutionizing our perspectives of quantum computing and artificial intelligence. In the predominantly empirical realm of quantum machine learning, a theoretical void persists. This paper addresses the gap by highlighting the quantum cross entropy, a pivotal counterpart to the classical cross entropy. We establish quantum cross entropy's role in quantum data compression, a fundamental machine learning task, by demonstrating that it acts as the compression rate for sub-optimal quantum source coding. Our approach involves a novel, universal quantum data compression protocol based on the quantum generalization of variable-length coding and the principle of quantum strong typicality. This reveals that quantum cross entropy can effectively serve as a loss function in quantum machine learning algorithms. Furthermore, we illustrate that the minimum of quantum cross entropy aligns with the von Neumann entropy, reinforcing its role as the optimal compression rate and underscoring its significance in advancing our understanding of quantum machine learning's theoretical framework.
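
The claim that the cross entropy is minimized by the von Neumann entropy can be checked in the simplest setting. The sketch assumes the common definition $S(\rho, \sigma) = -\mathrm{Tr}(\rho \log_2 \sigma)$ and restricts to commuting states that are diagonal in the same basis, where this expression reduces to the classical cross entropy of the eigenvalues. The restriction, the function name, and the example spectra are assumptions for illustration; the general non-commuting case needs a matrix logarithm.

```python
import math

def cross_entropy_bits(p, q):
    """-sum_i p_i log2 q_i: the value of -Tr(rho log2 sigma) for commuting diagonal states."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]  # eigenvalues of rho (the true source)
q = [0.25, 0.25, 0.5]  # eigenvalues of sigma (the mismatched model)

entropy = cross_entropy_bits(p, p)  # entropy of rho in this diagonal case: 1.5 bits
cross = cross_entropy_bits(p, q)    # cross entropy with the mismatched model: 1.75 bits
print(entropy, cross)
```

The 0.25-bit gap is the price of compressing with the wrong model. This matches the abstract's reading of the cross entropy as the rate of a sub-optimal code, and of its minimum (reached at $\sigma = \rho$) as the optimal rate.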

翻訳日:2023-11-15 00:51:02 公開日:2023-11-11

# スパース性の下での緩やかに変化する回帰

Slowly Varying Regression under Sparsity ( http://arxiv.org/abs/2102.10773v5 )

ライセンス: Link先を確認

Dimitris Bertsimas, Vassilis Digalakis Jr, Michael Linghzi Li, Omar Skali Lami

(参考訳) スパース性の下での緩やかに変化する回帰の枠組みを示し、スパース回帰モデルが緩やかでスパースな変動を示すことを可能にする。パラメータ推定の問題は混合整数最適化問題として定式化される。新たな緩和手法により,二値凸最適化問題として正確に再定式化できることを実証した。この緩和はムーア・ペンローズ逆行列に対する新しい等式を含み、すべての実現可能な二値点上で元の目的関数と一致しながら非凸目的関数を凸化する。これにより,切除平面型アルゴリズムを用いて効率よく,証明可能な最適性まで問題を解くことができる。このアルゴリズムの高度に最適化された実装を開発し、素朴な実装の漸近的計算複雑性を大幅に改善する。さらに,実現可能な解を保証する高速ヒューリスティック手法を提案し,実証的に示すように,二値最適化問題に対する高品質なウォームスタート解を生成する。フレームワークのハイパーパラメータを調整するために、ある仮定の下では真のモデルパラメータを復元することが保証される、二分探索に基づく実用的な手順を提案する。合成データと実世界のデータの両方について, 推定精度, 予測能力, 計算時間など, 様々な指標で, 同程度の時間内で結果のアルゴリズムが競合する定式化を上回っていることを示す。アルゴリズムは非常にスケーラブルで、数千のパラメータでモデルをトレーニングすることができます。実装はhttps://github.com/vvdigalakis/SSVRegression.gitで公開しています。

We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equality on Moore-Penrose inverses, convexifying the non-convex objective function while matching the original objective on all feasible binary points. This enables us to efficiently solve the problem to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of this algorithm, substantially improving upon the asymptotic computational complexity of a straightforward implementation. Additionally, we propose a fast heuristic method that guarantees a feasible solution and, as empirically illustrated, produces high-quality warm-start solutions for the binary optimization problem. To tune the framework's hyperparameters, we suggest a practical procedure relying on binary search that, under certain assumptions, is guaranteed to recover the true model parameters. On both synthetic and real-world datasets, we demonstrate that the resulting algorithm outperforms competing formulations in comparable times across various metrics, including estimation accuracy, predictive power, and computational time. The algorithm is highly scalable, allowing us to train models with thousands of parameters. Our implementation is available open-source at https://github.com/vvdigalakis/SSVRegression.git.

翻訳日:2023-11-15 00:50:21 公開日:2023-11-11

# 集約関数を持つファジィ推論のMPおよびMT特性

MP and MT properties of fuzzy inference with aggregation function ( http://arxiv.org/abs/2205.01269v2 )

ライセンス: Link先を確認

Dechao Li and Mengying He

(参考訳) 2つの基本的なファジィ推論モデルとして、ファジィモーダスポネンス(FMP)とファジィモーダストレンス(FMT)は人工知能において重要な応用を持つ。 FMPとFMTの問題を解決するために、Zadeh氏は推論の合成規則(CRI)を提案した。本稿では,集約関数に基づく一般化されたCRI法であるA-compositional rule of inference(ACRI)法の有効性を,論理的視点と補間的視点から検討することを目的とする。具体的には, ACRI法のmodus ponens (MP) と modus tollens (MT) 特性について詳細に述べる。 FMP問題とFMT問題を実装する集約関数は、それぞれT-条件性、U-条件性、O-条件性の法則としてよく知られているt-ノルム、ユニノルム、重なり関数よりも高い一般性を与えることが示される。さらに、理論的結果を説明するための2つの例も提示されている。特に、例 6.2 は、ファジィ入力とファジィ規則の前件が近いとき(ファジィ入力がファジィ規則の後件の否定に近いとき)、提案する推論法によるFMP(FMT)問題の出力 B' が B(DC) に近いことを示している。

As the two basic fuzzy inference models, fuzzy modus ponens (FMP) and fuzzy modus tollens (FMT) have important applications in artificial intelligence. In order to solve FMP and FMT problems, Zadeh proposed a compositional rule of inference (CRI) method. This paper aims mainly to investigate the validity of the A-compositional rule of inference (ACRI) method, a generalized CRI method based on aggregation functions, from a logical view and an interpolative view, respectively. Specifically, the modus ponens (MP) and modus tollens (MT) properties of the ACRI method are discussed in detail. It is shown that the aggregation functions used to implement FMP and FMT problems provide more generality than the t-norms, uninorms and overlap functions known from the laws of T-conditionality, U-conditionality and O-conditionality, respectively. Moreover, two examples are also given to illustrate our theoretical results. In particular, Example 6.2 shows that the output B' in the FMP (FMT) problem is close to B(DC) with our proposed inference method when the fuzzy input is near the antecedent of the fuzzy rule (when the fuzzy input is near the negation of the succedent of the fuzzy rule).
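
The MP property can be illustrated with the classical CRI that the ACRI method generalizes: $B'(y) = \sup_x T(A'(x), I(A(x), B(y)))$. The sketch uses the minimum t-norm with the Gödel implication on small discrete universes. These choices, the membership values, and the function names are assumptions for illustration; the paper replaces the t-norm with a general aggregation function.

```python
def godel_implication(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b

def cri(a_prime, a, b, t_norm=min, implication=godel_implication):
    """Compositional rule of inference: B'(y) = max_x T(A'(x), I(A(x), B(y)))."""
    return [max(t_norm(ap, implication(ai, bj)) for ap, ai in zip(a_prime, a))
            for bj in b]

# Hypothetical membership degrees on discrete universes X (4 points) and Y (3 points).
A = [0.0, 0.5, 1.0, 0.5]  # antecedent of the rule "IF x is A THEN y is B"
B = [0.2, 1.0, 0.6]       # consequent of the rule

b_out = cri(A, A, B)  # feed the antecedent itself as the input
print(b_out)          # [0.2, 1.0, 0.6]
```

Feeding the rule's own antecedent as input returns exactly `B`, which is the MP property. It holds here because $\min(a, I(a, b)) \le b$ (T-conditionality) and because `A` reaches membership 1 somewhere. The abstract's question is which aggregation functions keep this behaviour once the t-norm is replaced.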

翻訳日:2023-11-14 23:06:42 公開日:2023-11-11

# GUPはER=EPRのモデルとして機能するか?

Could GUP Act as a Model for the ER=EPR Conjecture? ( http://arxiv.org/abs/2210.13974v5 )

ライセンス: Link先を確認

Ahmed Farag Ali

(参考訳) アインシュタイン、ポドルスキー、ローゼン(EPR)は思考実験を通じて、不確実性原理は現実の完全な説明を提供しないかもしれないと提案した。我々は、線形一般化不確実性原理(GUP)が、最小測定可能な長さで不確実性が消失することを示すことによって、EPRパラドックスを解決しうると提案する。これは量子力学の完全性に光を当てる可能性があり、そこから、線形 GUP とベケンシュタイン境界、すなわち物理系を量子レベルまで完全に記述するのに必要となる情報の最大量を規定する境界との等価性を提案する。この等価性は、水素原子と原子核の半径、および宇宙定数の値を説明することによって検証される。最近発表された研究では、アインシュタイン・ローゼン橋(ER)が最小長(GUP)に由来することを確認した。これらの結果を踏まえ、線形 GUP が ER=EPR 予想のモデルとして機能しうることを提案する。

Einstein, Podolsky, and Rosen (EPR) proposed, via a thought experiment, that the uncertainty principle might not provide a complete description of reality. We propose that the linear generalized uncertainty principle (GUP) may resolve the EPR paradox by demonstrating vanishing uncertainty at the minimal measurable length. This may shed light on the completeness of quantum mechanics, which leads us to propose an equivalence between the linear GUP and the Bekenstein bound, a bound that prescribes the maximum amount of information needed to completely describe a physical system down to the quantum level. This equivalence is verified by explaining the radii of the hydrogen atom and nucleus as well as the value of the cosmological constant. In a recently published study, we verified that the Einstein-Rosen (ER) bridge originates from the minimal length or GUP. Considering these findings together, we propose that the linear GUP could function as a model for the ER=EPR conjecture.

翻訳日:2023-11-14 22:54:40 公開日:2023-11-11

# 限定ラベルデータを用いたハイブリッド融合型解釈可能なマルチモーダル感情認識

Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Limited Labelled Data ( http://arxiv.org/abs/2208.11450v2 )

ライセンス: Link先を確認

Puneet Kumar, Sarthak Malik, Balasubramanian Raman and Xiaobai Li

(参考訳) 本稿では,画像,音声,テキストを含むマルチモーダル入力に反映される感情を離散クラスに分類するマルチモーダル感情認識システム VIsual Spoken Textual Additive Net (VISTA Net) を提案する。 K-Average Additive exPlanation (KAAP) と呼ばれる新しい解釈可能性技術も開発され、特定の感情クラスの予測につながる重要な視覚的、音声的、テキスト的特徴を識別する。 VISTA Netは、早期融合と後期融合のハイブリッドを用いて、画像、音声、テキストモダリティから情報を融合する。重み付け平均を計算しながら、中間出力の重みを自動的に調整する。 KAAP技術は、特定の感情のクラスを予測するために、各モダリティと対応する特徴の寄与を計算する。離散感情クラスでラベル付けされたマルチモーダル感情データセットの不足を軽減するために,画像,対応する音声,テキスト,感情ラベル(「angry」,「happy」,「hate」,「sad」)からなる大規模IIT-R MMEmoRecデータセットを構築した。 VISTA Netは、IIT-R MMEmoRecデータセット上で、視覚的、音声的、テキスト的モダリティを使用して95.99%の感情認識精度を達成し、1つまたは2つのモダリティのみを使用する場合を上回っている。

This paper proposes a multimodal emotion recognition system, VIsual Spoken Textual Additive Net (VISTA Net), to classify emotions reflected by multimodal input containing image, speech, and text into discrete classes. A new interpretability technique, K-Average Additive exPlanation (KAAP), has also been developed that identifies important visual, spoken, and textual features leading to predicting a particular emotion class. The VISTA Net fuses information from image, speech, and text modalities using a hybrid of early and late fusion. It automatically adjusts the weights of their intermediate outputs while computing the weighted average. The KAAP technique computes the contribution of each modality and corresponding features toward predicting a particular emotion class. To mitigate the insufficiency of multimodal emotion datasets labeled with discrete emotion classes, we have constructed a large-scale IIT-R MMEmoRec dataset consisting of images, corresponding speech and text, and emotion labels ('angry,' 'happy,' 'hate,' and 'sad'). The VISTA Net achieves 95.99% emotion recognition accuracy on the IIT-R MMEmoRec dataset when using visual, audio, and textual modalities, outperforming configurations that use only one or two modalities.

翻訳日:2023-11-14 22:52:25 公開日:2023-11-11
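
The weighted-average step mentioned in the VISTA Net abstract above can be illustrated with a minimal sketch. This is not the authors' architecture: the three branch names, the softmax normalization of the branch weights, and all numbers are assumptions made for illustration only. It shows how per-branch class probabilities might be combined with adjustable weights.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_fusion(branch_probs, branch_logits):
    """Weighted average of per-branch class probabilities.

    branch_probs:  one probability vector per branch (hypothetical branches)
    branch_logits: one raw weight per branch; in a trained model these would
                   be learned, here they are supplied by hand (assumption)
    """
    weights = softmax(branch_logits)
    n_classes = len(branch_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(n_classes)
    ]

CLASSES = ["angry", "happy", "hate", "sad"]

# Made-up intermediate outputs of three hypothetical branches.
image_text = [0.10, 0.70, 0.10, 0.10]
speech_text = [0.20, 0.50, 0.10, 0.20]
image_speech = [0.30, 0.30, 0.20, 0.20]

branches = [image_text, speech_text, image_speech]
fused = weighted_fusion(branches, [0.0, 0.0, 0.0])  # equal weights
predicted = CLASSES[fused.index(max(fused))]
```

With equal raw weights the result reduces to a plain average; raising one branch's raw weight shifts the fused prediction toward that branch.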

# エージェントベースの市場モデルと相互作用する単純な学習エージェント

A simple learning agent interacting with an agent-based market model ( http://arxiv.org/abs/2208.10434v4 )

ライセンス: Link先を確認

Matthew Dicks, Andrew Paskaramoorthy, Tim Gebbie

(参考訳) 本稿では,イベント駆動型金融市場モデルと相互作用する単一強化学習最適実行取引エージェントの学習ダイナミクスについて考察する。トレーディングはイベント時にマッチングエンジンを介して非同期に行われる。最適な実行エージェントは、初期オーダーサイズと異なるサイズの状態空間の異なるレベルで考慮される。エージェントベースのモデルと市場への影響は、経験的スタイル化された事実と価格影響曲線の変化を探索するキャリブレーションアプローチを用いて考慮される。収束、ボリューム軌道、アクショントレースプロットは学習ダイナミクスを視覚化するために使用される。ここで、より小さな状態空間エージェントは、訪問した状態がより大きな状態空間エージェントよりもずっと早く収束し、スプレッド状態とボリューム状態を使って直感的に取引を学べるようになった。モデルのモーメントは,戦略的な秩序分割の導入によって低下したHurst指数を除いて,学習エージェントの影響に対して堅牢であることがわかった。学習エージェントの導入は、価格影響曲線の形状を保ちながら、取引量の増加に伴うトレードサイン自己相関を低減できる。

We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order-sizes and differently sized state spaces. The resulting impact on the agent-based model and market are considered using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Here the smaller state space agents had the number of states they visited converge much faster than the larger state space agents, and they were able to start learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations when their trading volumes increase.

翻訳日:2023-11-14 22:51:51 公開日:2023-11-11

# immesh:lidarの即時ローカライズとメッシュ化フレームワーク

ImMesh: An Immediate LiDAR Localization and Meshing Framework ( http://arxiv.org/abs/2301.05206v3 )

ライセンス: Link先を確認

Jiarong Lin, Chongjiang Yuan, Yixi Cai, Haotian Li, Yunfan Ren, Yuying Zou, Xiaoping Hong and Fu Zhang

(参考訳) 本稿では,リアルタイムの同時ローカライゼーションとメッシュ化を実現するために,新しいLiDAR(-inertial)オドメトリ・マッピングフレームワークを提案する。このフレームワークはImMeshと呼ばれ、レシーバ、ローカライゼーション、メッシュ化、ブロードキャスタの4つの密結合モジュールで構成されている。ローカライゼーションモジュールは、レシーバからの前処理済みセンサデータを利用し、LiDARスキャンを地図に登録することでセンサのポーズをオンラインで推定し、地図を動的に成長させる。そして、メッシュ化モジュールは登録済みのLiDARスキャンを使って、オンザフライで三角メッシュを漸進的に再構築する。最後に、リアルタイムのオドメトリ、マップ、メッシュをブロードキャスタ経由で公開する。この研究の主な貢献はメッシュ化モジュールであり、効率的な階層的ボクセル構造によってシーンを表現し、新しいスキャンで観察されたボクセルを高速に探索し、各ボクセル内の三角形ファセットを漸進的に再構築する。このボクセル単位のメッシュ化操作は効率性のために入念に設計されており、まず、ボクセルに含まれる2次元局所平面に3D点を投影して次元削減を行い、次に、プル、コミット、プッシュの各ステップでメッシュ化操作を実行して三角形ファセットを漸進的に再構成する。私たちの知る限りでは、GPUアクセラレーションなしで標準的なCPUのみに頼って、大規模なシーンの三角メッシュをオンラインで再構築できる研究は、文献上これが初めてである。私たちの発見を共有し、コミュニティに貢献するために、私たちのコードをGitHubで公開している: https://github.com/hku-mars/ImMesh.

In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real-time. This proposed framework termed ImMesh comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The key contribution of this work is the meshing module, which represents a scene by an efficient hierarchical voxel structure, quickly finds the voxels observed by new scans, and reconstructs triangle facets in each voxel in an incremental manner. This voxel-wise meshing operation is carefully designed for efficiency; it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. To the best of our knowledge, this is the first work in the literature that can reconstruct the triangle mesh of large-scale scenes online, relying only on a standard CPU without GPU acceleration. To share our findings and make contributions to the community, we make our code publicly available on our GitHub: https://github.com/hku-mars/ImMesh.

翻訳日:2023-11-14 22:44:13 公開日:2023-11-11
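
The dimension-reduction step described in the ImMesh abstract above (projecting 3D points onto a 2D local plane) can be sketched as follows. This is only an illustration under stated assumptions: the plane is given by an origin and a normal vector supplied by the caller, and the function names are hypothetical. How ImMesh actually selects the plane inside each voxel is not specified in the abstract.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def plane_basis(normal):
    """Build two orthonormal in-plane axes (u, v) for a plane with this normal."""
    n = normalize(normal)
    # Pick a helper axis that is not (nearly) parallel to the normal.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(n, helper))
    v = cross(n, u)  # unit length because n and u are orthonormal
    return u, v

def project_to_plane(points, origin, normal):
    """Return 2D plane coordinates of 3D points (the offset along the normal is dropped)."""
    u, v = plane_basis(normal)
    projected = []
    for p in points:
        d = tuple(pi - oi for pi, oi in zip(p, origin))
        projected.append((dot(d, u), dot(d, v)))
    return projected

points = [(1.0, 2.0, 5.0), (4.0, 6.0, -3.0)]
coords = project_to_plane(points, origin=(0.0, 0.0, 0.0), normal=(0.0, 0.0, 1.0))
```

Because the basis is orthonormal, distances measured within the plane are preserved in the 2D coordinates, which is what makes a subsequent 2D triangulation meaningful.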

# セマンティックスを駆使したコミュニケーション:テュートリアル・クム・サーベイ

Semantics-Empowered Communication: A Tutorial-cum-Survey ( http://arxiv.org/abs/2212.08487v5 )

ライセンス: Link先を確認

Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, Ekram Hossain, Zhifeng Zhao, and Honggang Zhang

(参考訳) セマンティクス・エミュレーション・コミュニケーション(semcom, semantics-empowered communication, semcom)研究の興隆とともに、学界と産業の両方において、幅広い側面(理論、応用、メトリクス、実装など)に対する前例のない関心が高まっている。本研究の目的は,背景分類学と研究分類学の両方に関する総合的な調査と,詳細な技術チュートリアルを提供することである。具体的には、文献をレビューし、意味伝達における「何」と「なぜ」の質問に答えることから始める。その後,semcomのエコシステムとして,歴史,理論,メトリクス,データセット,ツールキットを提示し,その上で研究の方向性を分類する。さらに, 明示的かつ暗黙的な推論に基づく手法により, 重要な実現手法を分類し, それらがどのように進化し, 現代的コンテントとチャネルセマンティクスを用いたコミュニケーションに寄与するかを詳述する。セムコムにおける最新の取り組みの見直しと要約に加えて、包括的で統一された視点から他のコミュニケーションレベル(例えば、従来のコミュニケーション)との関係について論じる。その後、今後の開発や工業的応用を促進するために、セマンティックな正確性、堅牢性、大規模スケーラビリティを高めるための先進的な実践技術を強調します。最後に,今後の研究機会に光を当てた技術的課題について論じる。

Along with the springing up of the semantics-empowered communication (SemCom) research, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present the ecosystems of SemCom, including history, theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., conventional communications) from a holistic and unified viewpoint. Subsequently, in order to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.

翻訳日:2023-11-14 22:41:57 公開日:2023-11-11

# マイクロ波被覆弱非調和超伝導量子ビットの非摂動的再正規化の解消

Resolving non-perturbative renormalization of a microwave-dressed weakly anharmonic superconducting qubit ( http://arxiv.org/abs/2212.05847v2 )

ライセンス: Link先を確認

Byoung-moo Ann, Sercan Deve, and Gary A. Steele

(参考訳) マイクロ波駆動は超伝導量子ビット(scqs)のユビキタスな技術であるが、従来の摂動理論に基づく服装状態の記述は強い駆動限界のダイナミクスを完全に捉えることはできない。トランスモン系回路量子力学(QED)系に適用可能なこれらの近似以外の包括的な研究は、主に単一モードまたは2状態系に限られているため、残念ながら稀である。本研究では,マイクロ波を装ったトランスモンを,広い範囲の駆動パラメータ上で単一量子化モードに結合する。トランスモンと共振器の相互作用と各モードの特性が強い駆動限界において著しく再正規化されることを明らかにする。従来の理論的な研究と異なり、摂動的レジームを超えた非再帰的かつ非フレケット理論を確立し、実験を良好に定量化する。本研究は,従来の近似を超越した,身なりのQEDライクなシステムに対する基本的な理解を拡大する。我々の研究は、高速量子ゲートの実装、量子ビットパラメータ工学、および駆動非線形システムに関する基礎研究にも貢献する。

Microwave driving is a ubiquitous technique for superconducting qubits (SCQs), but the dressed states description based on the conventionally used perturbation theory cannot fully capture the dynamics in the strong driving limit. Comprehensive studies beyond these approximations applicable to transmon-based circuit quantum electrodynamics (QED) systems are unfortunately rare as the relevant works have been mainly limited to single-mode or two-state systems. In this work, we investigate a microwave-dressed transmon coupled to a single quantized mode over a wide range of driving parameters. We reveal that the interaction between the transmon and resonator as well as the properties of each mode is significantly renormalized in the strong driving limit. Unlike previous theoretical works, we establish a non-recursive, and non-Floquet theory beyond the perturbative regimes, which excellently quantifies the experiments. This work expands our fundamental understanding of dressed cavity QED-like systems beyond the conventional approximations. Our work will also contribute to fast quantum gate implementation, qubit parameter engineering, and fundamental studies on driven nonlinear systems.

翻訳日:2023-11-14 22:41:31 公開日:2023-11-11

# 大規模フレキシブルタイトガウス混合モデルの確率的1次学習

Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian Mixture Model ( http://arxiv.org/abs/2212.05402v3 )

ライセンス: Link先を確認

Mohammad Pasande, Reshad Hosseini, Babak Nadjar Araabi

(参考訳) ガウス混合モデル(gmms)は、多くの応用で広く使われている最も強力なパラメトリック密度モデルの一つである。 GMMにおける共分散行列の柔軟な分解は、多くのガウス成分を必要とする高次元データや複素密度に直面した場合の共通GMMの課題に対処するための強力なアプローチである。しかし, フレキシブルタイトGMMを適合させるための期待最大化アルゴリズムは, ストリーミングや非常に大きな次元データに難航している。これらの課題を克服するために,一階確率最適化アルゴリズムを提案する。具体的には、直交行列の多様体上の新しい確率最適化アルゴリズムを提案する。合成データセットと実データセットの両方における多くの実験結果を通して、確率的最適化手法は、より良い可能性の達成、収束のエポックの低減、各エポック毎の時間の短縮という観点で予測-最大化アルゴリズムより優れていることが観察された。

Gaussian Mixture Models (GMMs) are one of the most potent parametric density models used extensively in many applications. Flexibly-tied factorization of the covariance matrices in GMMs is a powerful approach for coping with the challenges of common GMMs when faced with high-dimensional data and complex densities which often demand a large number of Gaussian components. However, the expectation-maximization algorithm for fitting flexibly-tied GMMs still encounters difficulties with streaming and very large dimensional data. To overcome these challenges, this paper suggests the use of first-order stochastic optimization algorithms. Specifically, we propose a new stochastic optimization algorithm on the manifold of orthogonal matrices. Through numerous empirical results on both synthetic and real datasets, we observe that stochastic optimization methods can outperform the expectation-maximization algorithm in terms of attaining better likelihood, needing fewer epochs for convergence, and consuming less time per each epoch.

翻訳日:2023-11-14 22:41:15 公開日:2023-11-11
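
To make the idea of first-order optimization on the manifold of orthogonal matrices concrete, here is a minimal generic sketch of one Riemannian gradient step. It is not the algorithm proposed in the paper above: the tangent-space projection, the Gram-Schmidt (QR-style) retraction, and all names are standard textbook choices assumed for illustration, and the gradient `G` stands in for a mini-batch gradient that a real training loop would supply.

```python
import math

def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def transpose(A):
    return [list(row) for row in zip(*A)]

def tangent_project(Q, G):
    """Project a Euclidean gradient G onto the tangent space at Q:
    xi = G - Q * sym(Q^T G)."""
    QtG = matmul(transpose(Q), G)
    n = len(QtG)
    sym = [[0.5 * (QtG[i][j] + QtG[j][i]) for j in range(n)] for i in range(n)]
    QS = matmul(Q, sym)
    return [[G[i][j] - QS[i][j] for j in range(n)] for i in range(n)]

def retract(M):
    """Map M back to a matrix with orthonormal columns via Gram-Schmidt."""
    basis = []
    for col in transpose(M):
        w = list(col)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return transpose(basis)

def riemannian_sgd_step(Q, G, lr):
    """One step: move against the projected gradient, then retract."""
    xi = tangent_project(Q, G)
    n = len(Q)
    moved = [[Q[i][j] - lr * xi[i][j] for j in range(n)] for i in range(n)]
    return retract(moved)

Q0 = [[1.0, 0.0], [0.0, 1.0]]
G = [[0.3, -1.2], [0.7, 0.4]]  # made-up stochastic gradient
Q1 = riemannian_sgd_step(Q0, G, lr=0.1)
```

The retraction keeps every iterate orthogonal, so no separate constraint handling is needed between steps.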

# 大規模言語モデルで自然発生した心の理論

Theory of Mind Might Have Spontaneously Emerged in Large Language Models ( http://arxiv.org/abs/2302.02083v5 )

ライセンス: Link先を確認

Michal Kosinski

(参考訳) 我々は、心の理論(ToM)、すなわち観察不能な精神状態を他者に帰属させる人間特有の能力が、大規模言語モデル(LLM)に自然発生した可能性を探る。 ヒトのToMをテストする際のゴールドスタンダードとされる誤信念課題を40個設計し、複数のLLMに実施した。各課題には、誤信念シナリオ、3つの密接に対応する真信念の対照条件、およびこれら4つすべての逆バージョンが含まれていた。 より小さく古いモデルは課題を1つも解けなかった。GPT-3-davinci-003(2022年11月)とChatGPT-3.5-turbo(2023年3月)は課題の20%を解き、ChatGPT-4(2023年6月)は課題の75%を解き、過去の研究で観察された6歳児のパフォーマンスと一致した。これらの結果は、これまでヒトに固有と考えられていたToMが、LLMの言語能力向上の副産物として自然発生した可能性を示唆している。

We explore the intriguing possibility that theory of mind (ToM), or the uniquely human ability to impute unobservable mental states to others, might have spontaneously emerged in large language models (LLMs). We designed 40 false-belief tasks, considered a gold standard in testing ToM in humans, and administered them to several LLMs. Each task included a false-belief scenario, three closely matched true-belief controls, and the reversed versions of all four. Smaller and older models solved no tasks; GPT-3-davinci-003 (from November 2022) and ChatGPT-3.5-turbo (from March 2023) solved 20% of the tasks; ChatGPT-4 (from June 2023) solved 75% of the tasks, matching the performance of six-year-old children observed in past studies. These findings suggest the intriguing possibility that ToM, previously considered exclusive to humans, may have spontaneously emerged as a byproduct of LLMs' improving language skills.

翻訳日:2023-11-14 22:29:23 公開日:2023-11-11

# MS-DETR:低結合核融合型マルチスペクトル歩行者検出変換器とモードベース最適化

MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization ( http://arxiv.org/abs/2302.00290v3 )

ライセンス: Link先を確認

Yinghui Xing, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang

(参考訳) 可視・熱変調は特に低照度条件下で相補的な情報を提供することができるため、多スペクトル歩行者検出は、多くの時空応用にとって重要な課題である。利用可能なマルチスペクトル歩行者検出装置のほとんどが非エンド・ツー・エンド検出器に基づいているが,本稿ではマルチスペクトル歩行者検出用トランスフォーマ(ms-detr)を提案し,detrをマルチモーダル検出の分野に拡張する。 ms-detrは2つのモダリティ固有のバックボーンとトランスエンコーダで構成され、続いてマルチモーダルトランスフォーマデコーダがあり、可視性と熱的特徴はマルチモーダルトランスフォーマデコーダで融合される。マルチモーダル画像間の不一致によく抵抗するため,マルチモーダル特徴のキーポイントを個別に抽出し,適応的に学習した注意重みでそれらを融合することにより,疎結合な融合戦略を設計する。さらに、異なるモダリティだけでなく、異なる歩行者インスタンスが最終検出のために異なる信頼度スコアを持つ傾向があるという知見に基づいて、可視およびサーマルデコーダの分岐を保存し、インスタンス毎の動的損失を通じて予測スロットを整列するインスタンス対応モダリティバランス最適化戦略を提案する。我々のエンドツーエンドMS-DETRは、挑戦的なKAIST、CVC-14、LLVIPベンチマークデータセットよりも優れた性能を示している。ソースコードはhttps://github.com/YinghuiXing/MS-DETR で公開されている。

Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Most of the available multispectral pedestrian detectors are based on non-end-to-end detectors, while in this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector, which extends DETR into the field of multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR .

翻訳日:2023-11-14 22:28:41 公開日:2023-11-11

# 言語モデルは後続のプロンプトで人間より悪いか? 複雑です

Are Language Models Worse than Humans at Following Prompts? It's Complicated ( http://arxiv.org/abs/2301.07085v2 )

ライセンス: Link先を確認

Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick

(参考訳) プロンプトは言語モデルのゼロショットと少数ショットのパフォーマンスの進歩の中心である。しかし、最近の研究では、意図的な無関係や誤解を招くプロンプトが与えられた場合、モデルは驚くほどうまく機能することがわかった。このような結果は、モデル行動が「人間らしくない」という証拠として解釈できる。本研究は,病的指示が与えられた場合,人間は良く行動する,という研究の中心的な前提に挑戦する。人間は無関係な指示を確実に無視することができ、従ってモデルのように、要求されるタスクに関する信号が明らかに不足しているにもかかわらず、基礎となるタスクでうまく機能する。しかし、故意に誤解を招く指示を受けると、人間は忠実に指示に従うが、モデルは従わない。今後の研究は、モノリスとしての人間の行動を理想化すべきではなく、人間の行動を実証的に検証することなく、これらの行動に関する仮定を模倣するモデルを訓練・評価すべきではない。

Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Our findings caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically.

翻訳日:2023-11-14 22:27:19 公開日:2023-11-11

# 多元アノテーションによるロバストな医用画像セグメンテーションの学習

Learning Robust Medical Image Segmentation from Multi-source Annotations ( http://arxiv.org/abs/2304.00466v2 )

ライセンス: Link先を確認

Yifeng Wang, Luyang Luo, Mingxiang Wu, Qiong Wang and Hao Chen

(参考訳) 複数の独立したソースからアノテーションを収集することで、単一のソースからの潜在的なノイズやバイアスの影響を軽減することができる。マルチソースアノテーションからセグメンテーションネットワークを学習することは、アノテーションのばらつきと画像の品質がもたらす不確実性のため、依然として課題である。本稿では,画素レベルと画像レベルの両方における不確実性推定によるトレーニングプロセスを導く,不確実性誘導型多元アノテーションネットワーク(uma-net)を提案する。まず,アノテーションの不確実性評価モジュール(AUEM)を開発し,各アノテーションの画素単位の不確かさを学習し,重み付きセグメンテーション損失による信頼画素からの学習をネットワークに誘導した。第2に,評価済みアノテーションの不確実性に基づいて,入力サンプルの画質を評価する品質評価モジュール(QAM)を提案した。重要となるのは, 廃棄する代わりに, 低品質のサンプルから学習するための補助的予測器を導入することで, 主予測器にエラーを直接蓄積することなく, その表現知識をバックボーンに保存することであった。 2次元胸部X線セグメンテーション,眼底画像セグメンテーション,3次元胸部DCE-MRIセグメンテーションなど,様々なデータセットに対するUMA-Netの有効性と有用性を示した。

Collecting annotations from multiple independent sources could mitigate the impact of potential noises and biases from a single source, which is a common practice in medical image segmentation. Learning segmentation networks from multi-source annotations remains a challenge due to the uncertainties brought by the variance of annotations and the quality of images. In this paper, we propose an Uncertainty-guided Multi-source Annotation Network (UMA-Net), which guides the training process by uncertainty estimation at both the pixel and the image levels. First, we developed the annotation uncertainty estimation module (AUEM) to learn the pixel-wise uncertainty of each annotation, which then guided the network to learn from reliable pixels by weighted segmentation loss. Second, a quality assessment module (QAM) was proposed to assess the image-level quality of the input samples based on the former assessed annotation uncertainties. Importantly, we introduced an auxiliary predictor to learn from the low-quality samples instead of discarding them, which ensured the preservation of their representation knowledge in the backbone without directly accumulating errors within the primary predictor. Extensive experiments demonstrated the effectiveness and feasibility of our proposed UMA-Net on various datasets, including 2D chest X-ray segmentation, fundus image segmentation, and 3D breast DCE-MRI segmentation.

翻訳日:2023-11-14 22:20:28 公開日:2023-11-11
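
The idea of steering training toward reliable pixels through a weighted segmentation loss, mentioned in the UMA-Net abstract above, can be sketched in a few lines. The weighting rule used here (weight = 1 - uncertainty) and the binary cross-entropy form are assumptions chosen for illustration; the abstract does not state the exact loss used by the authors.

```python
import math

def weighted_bce(pred, target, uncertainty, eps=1e-7):
    """Binary cross-entropy averaged with per-pixel weights (1 - uncertainty).

    pred:        predicted foreground probabilities, one per pixel
    target:      annotated labels (0 or 1), one per pixel
    uncertainty: estimated annotation uncertainty in [0, 1], one per pixel
    """
    total = 0.0
    weight_sum = 0.0
    for p, t, u in zip(pred, target, uncertainty):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        w = 1.0 - u                      # reliable pixels count more
        total += w * -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0

# Two pixels: the second annotation is treated as fully unreliable.
loss = weighted_bce(pred=[0.9, 0.1], target=[1, 1], uncertainty=[0.0, 1.0])
```

In this toy case the badly predicted second pixel contributes nothing because its annotation is marked fully uncertain, so the loss depends only on the first pixel.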

# 人間反応データからの最適・プライベート学習

Optimal and Private Learning from Human Response Data ( http://arxiv.org/abs/2303.06234v2 )

ライセンス: Link先を確認

Duc Nguyen and Anderson Y. Zhang

(参考訳) 項目応答理論 (IRT) は、人々が確率的意思決定を行う方法の研究であり、教育試験やレコメンデーションシステムなどに様々な応用がある。 IRTにおける最も基本的なモデルの1つであるバイナリ応答データのラッシュモデルは、重要な実践的重要性を持つ研究の活発な領域である。最近、Nguyen と Zhang (2022) は、効率的かつ正確な新しいスペクトル推定アルゴリズムを提案した。本研究では2つの重要な方法で結果を拡張する。まず,スペクトルアルゴリズムにおいて,「平均誤差」$\ell_2$バウンド」を補足する改良されたエントリワイド誤差を求める。特に、軽度のサンプリング条件下では、スペクトルアルゴリズムは最小誤差境界(ログ係数の変調)を達成する。改良された分析に基づいて、スペクトルアルゴリズムは、上位$K$回復のための最適なサンプル複雑さ(例えば、承認/不承認応答データから最高の$K$アイテムを識別する)を享受し、前回の研究の実証的な結果を説明する。第2のコントリビューションでは、IRTで重要だが未検討のトピックであるプライバシーについて取り上げています。 IRTの人間中心の応用にもかかわらず、文献にはプライバシー保護機構が提案されていない。我々は、独自のマルコフ連鎖定式化と離散ガウス機構を利用したスペクトルアルゴリズムのプライベート拡張を開発する(Canonne et al., 2020)。実験により、我々のアプローチは低レベルのプライバシー体制のベースラインよりもはるかに正確であることが示されている。

Item response theory (IRT) is the study of how people make probabilistic decisions, with diverse applications in education testing, recommendation systems, among others. The Rasch model of binary response data, one of the most fundamental models in IRT, remains an active area of research with important practical significance. Recently, Nguyen and Zhang (2022) proposed a new spectral estimation algorithm that is efficient and accurate. In this work, we extend their results in two important ways. Firstly, we obtain a refined entrywise error bound for the spectral algorithm, complementing the `average error' $\ell_2$ bound in their work. Notably, under mild sampling conditions, the spectral algorithm achieves the minimax optimal error bound (modulo a log factor). Building on the refined analysis, we also show that the spectral algorithm enjoys optimal sample complexity for top-$K$ recovery (e.g., identifying the best $K$ items from approval/disapproval response data), explaining the empirical findings in the previous work. Our second contribution addresses an important but understudied topic in IRT: privacy. Despite the human-centric applications of IRT, there has not been any proposed privacy-preserving mechanism in the literature. We develop a private extension of the spectral algorithm, leveraging its unique Markov chain formulation and the discrete Gaussian mechanism (Canonne et al., 2020). Experiments show that our approach is significantly more accurate than the baselines in the low-to-moderate privacy regime.

翻訳日:2023-11-14 22:16:58 公開日:2023-11-11
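
For readers unfamiliar with the Rasch model referred to in the abstract above, the following sketch shows its standard response probability: a user with ability theta answers an item of difficulty beta positively with probability 1 / (1 + exp(-(theta - beta))). The symbol names, the example values, and the simulation helper are assumptions for illustration; the spectral estimator and the private mechanism from the paper are not reproduced here.

```python
import math
import random

def rasch_prob(theta, beta):
    """Probability of a positive response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def simulate_responses(thetas, betas, seed=0):
    """Draw a binary user-by-item response matrix from the model."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < rasch_prob(theta, beta) else 0 for beta in betas]
        for theta in thetas
    ]

thetas = [-1.0, 0.0, 1.5]  # hypothetical user abilities
betas = [-0.5, 0.5]        # hypothetical item difficulties
responses = simulate_responses(thetas, betas)
```

An estimator for the item parameters would take a matrix like `responses` as its input.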

# 対話型テキスト生成

Interactive Text Generation ( http://arxiv.org/abs/2303.00908v3 )

ライセンス: Link先を確認

Felix Faltings and Michel Galley and Baolin Peng and Kianté Brantley and Weixin Cai and Yizhe Zhang and Jianfeng Gao and Bill Dolan

(参考訳) ユーザは毎日、テキスト、画像、コード、その他のエディタと対話する。しかし、ユーザーとエディタ間の対話性を反映した設定では、機械学習モデルをトレーニングすることは滅多にない。これは、実際のユーザによるAIモデルのトレーニングが遅くてコストがかかるだけでなく、これらのモデルが学んだことは、ユーザインターフェースの設計選択に特有のものかもしれないため、理解できる。残念ながらこれは、テキスト、コード、画像生成に関するほとんどの研究が非インタラクティブな設定に焦点を当てていることを意味している。対象テキストに対してモデルを誘導する編集を提供するユーザシミュレータを用いて,実ユーザを巻き込むことなく,対話的に生成モデルを訓練できる新たな対話型テキスト生成タスクを提案する。我々は、Imitation Learningを使ってインタラクティブモデルをトレーニングし、競争力のある非インタラクティブ生成モデルに対する実験により、すべてのモデルにユーザー入力や編集の予算が同じであっても、インタラクティブにトレーニングされたモデルは非インタラクティブモデルよりも優れていることを示す。

Users interact with text, image, code, or other editors on a daily basis. However, machine learning models are rarely trained in the settings that reflect the interactivity between users and their editor. This is understandable as training AI models with real users is not only slow and costly, but what these models learn may be specific to user interface design choices. Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, whereby the model is expected to get everything right without accounting for any input from a user who may be willing to help. We introduce a new Interactive Text Generation task that allows training generation models interactively without the costs of involving real users, by using user simulators that provide edits that guide the model towards a given target text. We train our interactive models using Imitation Learning, and our experiments against competitive non-interactive generation models show that models trained interactively are superior to their non-interactive counterparts, even when all models are given the same budget of user inputs or edits.

翻訳日:2023-11-14 22:16:05 公開日:2023-11-11

# 指数ヒルベルト空間を持たない多体マヨラナブレイディング

Many-body Majorana braiding without an exponential Hilbert space ( http://arxiv.org/abs/2303.00761v3 )

ライセンス: Link先を確認

Eric Mascot, Themba Hodge, Dan Crawford, Jasmin Bedow, Dirk K. Morr, Stephan Rachel

(参考訳) majorana zero modes (mzms) で構築された量子ビットは、位相的に保護された量子コンピューティングへの主要な経路である。複数のMZMのブレイディング過程のシミュレーションは超伝導多体系の量子力学に対応する。マヨラナ力学は、他の全ての準粒子の存在と、合理的に大きなシステムサイズの両方で研究することが重要である。本稿では,任意の多体波動関数とその期待値,相関値,重なりを超伝導体の時間発展単粒子状態から計算する方法を提案する。ブレイディングプロセスの品質を追跡するために,マヨラナペアの忠実性,遷移確率,ジョイントパリティを計算する。ブレイディングの成功はブレイドの速度にどのように依存するかを示す。さらに, トポロジカルCNOT2量子ゲートを2量子絡みの例として示す。我々の研究は、Majorana qubitsの多くの理論的実装をテストし分析する道を開く。さらに、この方法は任意の非相互作用超伝導体の動力学を研究するのに使うことができる。

Qubits built out of Majorana zero modes (MZMs) constitute the primary path towards topologically protected quantum computing. Simulating the braiding process of multiple MZMs corresponds to the quantum dynamics of a superconducting many-body system. It is crucial to study the Majorana dynamics both in the presence of all other quasiparticles and for reasonably large system sizes. We present a method to calculate arbitrary many-body wavefunctions as well as their expectation values, correlators and overlaps from time evolved single-particle states of a superconductor, allowing for significantly larger system sizes. We calculate the fidelity, transition probabilities, and joint parities of Majorana pairs to track the quality of the braiding process. We show how the braiding success depends on the speed of the braid. Moreover, we demonstrate the topological CNOT two-qubit gate as an example of two-qubit entanglement. Our work opens the path to test and analyze the many theoretical implementations of Majorana qubits. Moreover, this method can be used to study the dynamics of any non-interacting superconductor.

翻訳日:2023-11-14 22:15:46 公開日:2023-11-11

# ベル実験の解説

An explanation of the Bell experiment ( http://arxiv.org/abs/2305.05299v6 )

ライセンス: Link先を確認

Inge S. Helland

(参考訳) ベル実験は、量子力学の基礎に対する新しいアプローチとして議論されている。基本的なモデルから、どんなオブザーバーの心も何らかの方法で制限されなければならないと結論づけられる: ある文脈では、彼は単に意思決定時に十分な変数を心に保持できない。これはベルの定理の帰結であるが、より広い結果をもたらすようである。

The Bell experiment is discussed in the light of a new approach to the foundation of quantum mechanics. It is concluded from the basic model that the mind of any observer must be limited in some way: In certain contexts, he is simply not able to keep enough variables in his mind when making decisions. This has consequences for Bell's theorem, but it also seems to have wider consequences.

翻訳日:2023-11-14 22:06:29 公開日:2023-11-11

# ロバストツリーアンサンブルの検証可能な学習

Verifiable Learning for Robust Tree Ensembles ( http://arxiv.org/abs/2305.03626v4 )

ライセンス: Link先を確認

Stefano Calzavara, Lorenzo Cazzaro, Giulio Ermanno Pibiri, Nicola Prezza

(参考訳) テスト時の回避攻撃に対する機械学習モデルの堅牢性を検証することは重要な研究課題である。残念なことに、この問題は決定木アンサンブルに対してNPハードであることが証明され、従って特定の入力に対して難解となる。本稿では,多項式時間で動作するセキュリティ検証アルゴリズムを付加した,大規模分散アンサンブルと呼ばれる決定木アンサンブルの制限クラスを同定する。次に,効率的な検証が可能な制限付きモデルクラスのトレーニングを提唱する,verizable learningと呼ばれる新しいアプローチを提案する。我々は,ラベル付きデータから大域的な決定木を自動学習する新しい学習アルゴリズムを設計し,多項式時間でセキュリティ検証を可能にすることにより,このアイデアの利点を示す。公開データセットの実験結果から,我々のアルゴリズムを用いてトレーニングした大域的なアンサンブルが,標準的な商用ハードウェアを用いて数秒で検証可能であることを確認した。さらに、大スプレッドアンサンブルは、非敵対的な設定において許容される精度の損失を犠牲にして、従来の回避攻撃に対するアンサンブルよりも頑丈である。

Verifying the robustness of machine learning models against evasion attacks at test time is an important research problem. Unfortunately, prior work established that this problem is NP-hard for decision tree ensembles, hence bound to be intractable for specific inputs. In this paper, we identify a restricted class of decision tree ensembles, called large-spread ensembles, which admit a security verification algorithm running in polynomial time. We then propose a new approach called verifiable learning, which advocates the training of such restricted model classes which are amenable for efficient verification. We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data, thus enabling its security verification in polynomial time. Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds, using standard commercial hardware. Moreover, large-spread ensembles are more robust than traditional ensembles against evasion attacks, at the cost of an acceptable loss of accuracy in the non-adversarial setting.

翻訳日:2023-11-14 22:06:24 公開日:2023-11-11

# mixpro:プロンプトベース学習のためのシンプルで効果的なデータ拡張

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning ( http://arxiv.org/abs/2304.09402v2 )

ライセンス: Link先を確認

Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che

(参考訳) プロンプトに基づく学習は、元の入力と所定のテンプレートを組み合わせることで、様々な下流タスクをクローゼ問題として再構成する上で大きな可能性を示してきた。このアプローチは、特に少ない量のデータに基づいてモデルをトレーニングする、数ショットの学習シナリオにおいて、その効果を示す。その成功にもかかわらず、数ショットのプロンプトベース学習シナリオではテンプレートとテキストが限られているため、パフォーマンス改善の余地が大きく残されている。さらに、既存の手法ではモデルアンサンブルを利用する場合もあるが、計算要求の増加によりモデル効率が低下する可能性がある。これらの問題に対処するため,我々は,バニラ入力テキストとテンプレートの両方を拡張する拡張手法であるMixProを紹介する。これをトークンレベル、文レベル、テンプレートレベルのMixup戦略を通じて実装する。 5つの数ショットデータセットの実験結果は、MixProが他の拡張ベースラインよりも優れており、拡張前と比べてモデル性能を平均5.08%向上させることを示している。

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template. This approach demonstrates its effectiveness, especially in few-shot learning scenarios, where the model is trained on a scarce amount of data. Despite its successes, the limited templates and text in few-shot prompt-based learning scenarios leave significant room for performance improvement. Moreover, existing methods sometimes resort to model ensembles, which, while effective, could potentially hamper model efficiency due to increased computational demands. To address these issues, we introduce MixPro, an augmentation method designed to augment both the vanilla input text and the templates. We implement this through the token-level, the sentence-level, and the template-level Mixup strategies. The experimental results on five few-shot datasets show that MixPro outperforms other augmentation baselines, improving model performance by an average of 5.08% compared to before augmentation.

翻訳日:2023-11-14 22:04:31 公開日:2023-11-11
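
The Mixup strategies named in the MixPro abstract above build on a simple interpolation between two training examples. The sketch below shows only that generic Mixup operation on made-up embedding vectors and one-hot labels; how MixPro applies it at the token, sentence, and template levels is not described in the abstract, so the function names and the choice of a Beta-distributed mixing coefficient are assumptions.

```python
import random

def mixup(x_a, x_b, y_a, y_b, lam):
    """Interpolate two examples: lam * a + (1 - lam) * b, for inputs and labels."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y

def sample_lambda(alpha, rng):
    """Draw the mixing coefficient from Beta(alpha, alpha), as in generic Mixup."""
    return rng.betavariate(alpha, alpha)

rng = random.Random(0)
lam = sample_lambda(0.5, rng)

# Made-up 2-dimensional embeddings with one-hot labels.
mixed_x, mixed_y = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], 0.25)
```

The mixed label stays a valid probability vector because the two coefficients sum to one.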

# ヒトiPSC再プログラム成功の早期予測に向けて

Towards Early Prediction of Human iPSC Reprogramming Success ( http://arxiv.org/abs/2305.14575v2 )

ライセンス: Link先を確認

Abhineet Singh, Ila Jasra, Omar Mouhammed, Nidheesh Dadheech, Nilanjan Ray, James Shapiro

(参考訳) 本稿では,再生細胞療法の潜在的な供給源であるヒト誘導多能性幹細胞(iPSC)の再プログラム成功を早期に自動予測する技術の進歩について述べる。iPSC再プログラムの成功率は約0.01%から0.1%とごくわずかであり、安定したiPSC株の作製には多大な労力、時間、費用がかかる。これは、数百万の細胞を培養し、単一の最適なクローンを特定するために複数のクローンを生物学的に精査する必要があるためである。そのため、多能性の初期段階において、どの細胞が最適なiPSC株として確立する可能性が高いかを確実に予測できれば、個別化医療に対する実用的で費用対効果の高いアプローチを実現する画期的な成果となる。細胞の外観の経時変化に関する時間的情報は、将来の成長結果を予測するうえで不可欠である。このデータを生成するために,我々はまず,超高分解能顕微鏡を用いて培養中のiPSCの連続タイムラプス撮影を行った。次に、信頼できる手動識別が可能な後期画像に、細胞の位置とアイデンティティを注釈付けした。さらに, 半自動追跡システムを用いてラベルを時間的に後方へ伝播させ, 成長初期のラベルを得た。最後に、このデータを用いてディープニューラルネットワークをトレーニングし、細胞のセグメンテーションと分類を自動で実行した。私たちのコードとデータはhttps://github.com/abhineet123/ipsc_predictionで入手できます。

This paper presents advancements in automated early-stage prediction of the success of reprogramming human induced pluripotent stem cells (iPSCs) as a potential source for regenerative cell therapies. The minuscule success rate of iPSC reprogramming, around 0.01% to 0.1%, makes it labor-intensive, time-consuming, and exorbitantly expensive to generate a stable iPSC line, since that requires culturing millions of cells and intense biological scrutiny of multiple clones to identify a single optimal clone. The ability to reliably predict which cells are likely to establish as an optimal iPSC line at an early stage of pluripotency would therefore be ground-breaking in rendering this a practical and cost-effective approach to personalized medicine. Temporal information about changes in cellular appearance over time is crucial for predicting its future growth outcomes. In order to generate this data, we first performed continuous time-lapse imaging of iPSCs in culture using an ultra-high resolution microscope. We then annotated the locations and identities of cells in late-stage images where reliable manual identification is possible. Next, we propagated these labels backwards in time using a semi-automated tracking system to obtain labels for early stages of growth. Finally, we used this data to train deep neural networks to perform automatic cell segmentation and classification. Our code and data are available at https://github.com/abhineet123/ipsc_prediction.

翻訳日:2023-11-14 21:54:42 公開日:2023-11-11

# 代理モデルを用いた深部強化学習エージェントのテスト

Testing of Deep Reinforcement Learning Agents with Surrogate Models ( http://arxiv.org/abs/2305.12751v2 )

ライセンス: Link先を確認

Matteo Biagiola, Paolo Tonella

(参考訳) 近年,深層強化学習 (DRL) が研究コミュニティから注目を集めている。この技術は、ゲームプレイから自動運転車やロボティクスといった実践的なコンテキストに移行するため、drlエージェントの品質を評価することが不可欠である。本稿では,このようなエージェントを検索ベースでテストする手法を提案する。 Indagoと呼ばれるツールで実装された我々のアプローチは、DRLトレーニングプロセスから生じる障害環境と非障害環境(すなわちパス)の分類器を訓練する。この分類器は、テスト時に環境におけるdrlエージェントの実行のサロゲートモデルとして使用され、与えられた環境設定がテスト中のdrlエージェントの障害を引き起こす程度を予測する。障害予測は適合関数として機能し、障害環境設定への生成を導くと同時に、障害を露呈する可能性のある構成に対して環境内のdrlエージェントの実行を遅らせることで、計算時間を節約する。実験の結果,我々の検索手法は最先端技術よりもDRLエージェントの失敗率が50%多いことがわかった。さらに、このような障害は平均して78%多様であり、同様に障害構成によって誘発されるDRLエージェントの挙動は74%多様である。

Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years. As the technology moves away from game playing to practical contexts, such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure environment (i.e., pass) configurations resulting from the DRL training process. The classifier is used at testing time as a surrogate model for the DRL agent execution in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function, guiding the generation towards failure environment configurations, while saving computation time by deferring the execution of the DRL agent in the environment to those configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. Moreover, such failures are, on average, 78% more diverse; similarly, the behaviors of the DRL agent induced by failure configurations are 74% more diverse.
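
The idea of using a failure predictor as a fitness function can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `surrogate_score` is a hypothetical stand-in for a trained classifier, the two-parameter configuration is invented, and plain hill climbing is used for brevity rather than because it is the search algorithm Indago implements.

```python
import random


def surrogate_score(config):
    """Hypothetical surrogate: estimated failure likelihood in [0, 1].

    A real surrogate would be a classifier trained on pass/fail environment
    configurations logged during DRL training; this toy formula is made up.
    """
    x, y = config
    return x * y


def mutate(config, rng, step=0.1):
    """Perturb each parameter slightly and clamp it to the range [0, 1]."""
    return tuple(min(1.0, max(0.0, v + rng.uniform(-step, step))) for v in config)


def search_failing_config(start, iterations, rng):
    """Hill climbing that keeps a candidate only if the surrogate score improves."""
    best, best_score = start, surrogate_score(start)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        score = surrogate_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


rng = random.Random(0)
best_config, best_score = search_failing_config((0.2, 0.2), 200, rng)
print(best_config, best_score)
```

In this sketch the expensive step, running the agent in the environment, would be reserved for `best_config` and other high-scoring candidates; the search loop itself only queries the cheap surrogate.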

翻訳日:2023-11-14 21:51:52 公開日:2023-11-11

# StEik: ニューラル符号付き距離関数の最適化の安定化とより精細な形状表現

StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation ( http://arxiv.org/abs/2305.18414v3 )

ライセンス: Link先を確認

Huizong Yang, Yuxin Sun, Ganesh Sundaramoorthi, Anthony Yezzi

(参考訳) 形態の暗黙的神経表現(INR)を学習するための新しい知見と新しいパラダイム(StEik)を提案する。特に,INRに符号付き距離関数制約を課すのによく使われるエイコナール損失に光を当てた。ネットワークの表現力が増加するにつれて、最適化は連続極限における偏微分方程式(PDE)に近づき、不安定となることを示す。この不安定性は, 既設のネットワーク最適化において発現し, 再構成表面の不規則性や, 局所的局所最小値への収束を招き, 微妙な幾何学的・位相的構造を捉えることができないことを示す。我々は、現在文献で使われている損失に付加された他の用語が、実際にこれらの不安定性を排除することができるかを分析的に示す。しかし、そのような用語は表面を過度に規則化することができ、微細な形状の表現を妨げている。同様の連続体極限のpde理論に基づき、固有不安定性は相反するが過剰正規化はしない新しい正規化項を導入する。さらに, 安定度は連続限界で保証されているため, この安定化により, より微細な形状の細部を表現できる新しいネットワーク構造も検討できる。このような構造を二次層に導入する。複数のベンチマークデータセットの実験により、我々の新しい正規化とネットワークは、既存の最先端技術よりも正確な形状の詳細と正確なトポロジを捉えることができることが示された。

We present new insights and a novel paradigm (StEik) for learning implicit neural representations (INR) of shapes. In particular, we shed light on the popular eikonal loss used for imposing a signed distance function constraint in INR. We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable. We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima, and thus to a failure to capture fine geometric and topological structure. We show analytically how other terms added to the loss, currently used in the literature for other purposes, can actually eliminate these instabilities. However, such terms can over-regularize the surface, preventing the representation of fine shape detail. Based on a similar PDE theory for the continuum limit, we introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing. Furthermore, since stability is now guaranteed in the continuum limit, this stabilization also allows for considering new network structures that are able to represent finer shape detail. We introduce such a structure based on quadratic layers. Experiments on multiple benchmark data sets show that our new regularization and network are able to capture more precise shape details and more accurate topology than existing state-of-the-art methods.
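
For readers unfamiliar with the eikonal loss mentioned above, one commonly used form is sketched below. The notation is an assumption introduced here, not taken from the abstract: $u_\theta$ denotes the network, $\Omega$ the sampling domain, and the squared penalty is one common choice (the paper may use a different norm or exponent). The loss encourages the property that a signed distance function has unit gradient norm almost everywhere.

```latex
\mathcal{L}_{\text{eik}}(\theta)
  = \int_{\Omega} \bigl( \lVert \nabla_x u_\theta(x) \rVert_2 - 1 \bigr)^2 \, dx
```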

翻訳日:2023-11-14 21:40:00 公開日:2023-11-11

# GlyphControl:ビジュアルテキスト生成のためのグリフ条件制御

GlyphControl: Glyph Conditional Control for Visual Text Generation ( http://arxiv.org/abs/2305.18259v2 )

ライセンス: Link先を確認

Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen

(参考訳) 近年,コヒーレントでよく表現されたビジュアルテキストを生成できる拡散型テキスト対画像生成モデルの開発が注目されている。本稿では,この課題に対処するために,GlyphControlという新しい,効率的な手法を提案する。 ByT5のような文字認識型テキストエンコーダに依存し、テキスト・ツー・イメージモデルの再訓練を必要とする既存の方法とは異なり、本手法ではグリフ条件情報を活用して、正確なビジュアルテキストを生成する際に、既製の安定拡散モデルの性能を向上させる。 glyph命令を組み込むことで、ユーザーは特定の要求に応じて生成されたテキストの内容、場所、サイズをカスタマイズできる。視覚テキスト生成のさらなる研究を容易にするため,LAION-Glyphと呼ばれるトレーニングベンチマークデータセットを構築した。提案手法の有効性を,OCRに基づく測定値,CLIPスコア,FIDを用いて評価した。 GlyphControl は OCR の精度,CLIP スコア,FID の点で近年の DeepFloyd IF アプローチよりも優れており,本手法の有効性が示された。

Recently, there has been an increasing interest in developing diffusion-based text-to-image generative models capable of generating coherent and well-formed visual text. In this paper, we propose a novel and efficient approach called GlyphControl to address this task. Unlike existing methods that rely on character-aware text encoders like ByT5 and require retraining of text-to-image models, our approach leverages additional glyph conditional information to enhance the performance of the off-the-shelf Stable-Diffusion model in generating accurate visual text. By incorporating glyph instructions, users can customize the content, location, and size of the generated text according to their specific requirements. To facilitate further research in visual text generation, we construct a training benchmark dataset called LAION-Glyph. We evaluate the effectiveness of our approach by measuring OCR-based metrics, CLIP score, and FID of the generated visual text. Our empirical evaluations demonstrate that GlyphControl outperforms the recent DeepFloyd IF approach in terms of OCR accuracy, CLIP score, and FID, highlighting the efficacy of our method.

翻訳日:2023-11-14 21:39:36 公開日:2023-11-11

# NashFormer: 局所的なNash平衡を利用した意味的多元性軌道予測

NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction ( http://arxiv.org/abs/2305.17600v3 )

ライセンス: Link先を確認

Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

(参考訳) 道路エージェント間の相互作用は、特に複数のエージェントを含む場合において、軌道予測において重要な課題となる。既存の多様性を考慮した予測器はマルチエージェント予測のインタラクティブな性質を考慮しないため、これらの重要な相互作用の結果を見逃す可能性がある。本稿では,マルチモーダル予測のカバレッジ向上のために,ゲーム理論の逆強化学習を活用する軌道予測フレームワークであるNashFormerを提案する。トレーニング時間ゲーム理論解析を補助的損失として用いて,エージェントの行動の分類を仮定することなく,カバレッジと精度を向上させる。 Waymo Open Motion Datasetのインタラクティブな分割について,対話性の高いシナリオを含む4つのサブセットを含む,私たちのアプローチを実証する。実験の結果,予測器は正確な予測を行いつつ,ベースラインモデルよりも33%多くの潜在的な相互作用をカバーすることがわかった。

Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering 33% more potential interactions versus a baseline model.

翻訳日:2023-11-14 21:38:55 公開日:2023-11-11

# 領域一般化における不均一性の定量的測定と対比

Quantitatively Measuring and Contrastively Exploring Heterogeneity for Domain Generalization ( http://arxiv.org/abs/2305.15889v3 )

ライセンス: Link先を確認

Yunze Tong, Junkun Yuan, Min Zhang, Didi Zhu, Keli Zhang, Fei Wu, Kun Kuang

(参考訳) ドメイン一般化(dg、domain generalization)は、実世界のアプリケーションでよく見られる問題であり、複数のソースドメインを利用することで、対象とするドメインの well-generalized モデルを訓練することを目的としている。ドメインラベル、すなわち、各データポイントがサンプリングされたドメインが自然に存在するため、ほとんどのdgアルゴリズムは、それらを一般化性能を改善するための監督情報の一種として扱う。しかし、元のドメインラベルはドメインの不均一性の欠如、すなわちドメイン間の多様性のため、最適な監視信号ではないかもしれない。例えば、あるドメインのサンプルは別のドメインに近い場合があり、その元のラベルは一般化学習を妨げるノイズとなる可能性がある。ドメインを再分割し、新たに生成された分割パターンを適用することでそれを解こうとする手法もあるが、不均一性の計量が欠如しているため、それらが選択するパターンは最も異種でないかもしれない。本稿では、ドメインの不均一性は主に不変学習フレームワークの下での変分特徴にあることを指摘する。対照的な学習法により,学習の変分的特徴を促進させることにより,ドメインの不均一性を学習可能な指標を提案する。次に, 分散に基づく不均一性を求めることと, 学習不変性に基づく一般化モデルの違いに着目する。そこで本研究では,DGタスクのための異種性に基づく二段階コントラスト学習(HTCL)を提案する。第一段階では、最も異質な分割パターンを対照的な計量で生成する。第2段階では、ドメインやクラスが示唆する安定した関係とペアを再構築し、生成したドメインラベルを一般化学習に有効活用することで、不変性を考慮したコントラスト学習を行う。広範囲な実験により、htclは異質性をよりよく掘り出し、大きな一般化性能をもたらすことが示されている。

Domain generalization (DG) is a prevalent problem in real-world applications, which aims to train well-generalized models for unseen target domains by utilizing several source domains. Since domain labels, i.e., which domain each data point is sampled from, naturally exist, most DG algorithms treat them as a kind of supervision information to improve the generalization performance. However, the original domain labels may not be the optimal supervision signal due to the lack of domain heterogeneity, i.e., the diversity among domains. For example, a sample in one domain may be closer to another domain, so its original label can act as noise that disturbs generalization learning. Although some methods try to solve this by re-dividing domains and applying the newly generated dividing pattern, the pattern they choose may not be the most heterogeneous due to the lack of a metric for heterogeneity. In this paper, we point out that domain heterogeneity mainly lies in variant features under the invariant learning framework. With contrastive learning, we propose a learning potential-guided metric for domain heterogeneity by promoting learning variant features. Then we notice the differences between seeking variance-based heterogeneity and training an invariance-based generalizable model. We thus propose a novel method called Heterogeneity-based Two-stage Contrastive Learning (HTCL) for the DG task. In the first stage, we generate the most heterogeneous dividing pattern with our contrastive metric. In the second stage, we employ an invariance-aimed contrastive learning by re-building pairs with the stable relation hinted by domains and classes, which better utilizes generated domain labels for generalization learning. Extensive experiments show that HTCL uncovers heterogeneity better and yields strong generalization performance.

翻訳日:2023-11-14 21:37:54 公開日:2023-11-11

# 変分量子シミュレーションのスケーラビリティ向上のためのフェルミオンシミュレータ

Fermionic Simulators for Enhanced Scalability of Variational Quantum Simulation ( http://arxiv.org/abs/2306.14842v2 )

ライセンス: Link先を確認

Qingyu Li, Chiranjib Mukhopadhyay, Abolfazl Bayat

(参考訳) 短期量子シミュレータは主に量子ビットベースのアーキテクチャに基づいている。しかし、その不完全な性質は実用性を著しく制限している。この状況は、物質科学や化学のほとんどを根底にあるフェルミオン系をシミュレートする上でさらに悪化している。光ツイーザーにおける中性原子のトラップと操作の最近の進歩により、デジタルフェルミオン量子シミュレーターが実現しつつある。鍵となる疑問は、これらの出現するフェルミオンシミュレータが、強い相関電子系を特徴づけるためにキュービットベースのシミュレータより優れているかどうかである。本稿では, 凝縮体系と量子化学問題の両方におけるフェルミオン系の変動基底状態エミュレーションのための量子ビットシミュレータとフェルミオンシミュレータとの資源効率の包括的比較を行う。フェルミイオンシミュレータは量子進化の資源(循環深さ)や古典的最適化(必要パラメータ数と反復数)において量子ビットシミュレータよりも優れていることを示す。さらに、回路のランダム初期化に対する感度を低下させる。フェルミオンシミュレータの相対的な利点は、相互作用が強くなるにつれてさらに顕著になり、また、スピンフルフェルミオンと同様に1次元以上のトンネルが許される。重要なのは、この改善はスケーラブルであり、fermionicシミュレータとqubitシミュレータのパフォーマンスギャップは、より大きなシステムサイズでのみ大きくなることだ。

Near-term quantum simulators are mostly based on qubit-based architectures. However, their imperfect nature significantly limits their practical application. The situation is even worse for simulating fermionic systems, which underlie most of material science and chemistry, as one has to adopt fermion-to-qubit encodings which create significant additional resource overhead and trainability issues. Thanks to recent advances in trapping and manipulation of neutral atoms in optical tweezers, digital fermionic quantum simulators are becoming viable. A key question is whether these emerging fermionic simulators can outperform qubit-based simulators for characterizing strongly correlated electronic systems. Here, we perform a comprehensive comparison of resource efficiency between qubit and fermionic simulators for variational ground-state emulation of fermionic systems in both condensed matter systems and quantum chemistry problems. We show that the fermionic simulators indeed outperform their qubit counterparts with respect to resources for quantum evolution (circuit depth), as well as classical optimization (number of required parameters and iterations). In addition, they show less sensitivity to the random initialization of the circuit. The relative advantage of fermionic simulators becomes even more pronounced as interaction becomes stronger, or tunneling is allowed in more than one dimension, as well as for spinful fermions. Importantly, this improvement is scalable, i.e., the performance gap between fermionic and qubit simulators only grows for bigger system sizes.

翻訳日:2023-11-14 21:29:56 公開日:2023-11-11

# 二次元半導体における電磁ボソンの制御可能な融合

Controllable fusion of electromagnetic bosons in two-dimensional semiconductors ( http://arxiv.org/abs/2306.14225v2 )

ライセンス: Link先を確認

Sergueï V. Andreev

(参考訳) 二次元(2次元)半導体における同一電磁ボソン(励起子またはポラリトン)の制御可能な相互作用の実装のための物理原理を提案する。鍵となる成分は、例えば一軸ひずみによるホスト構造の強結合二エクシトンおよび面内異方性である。放射励起子2重項の異方性による分裂は、バイエクシトン状態とボソン散乱状態の連続性を結合させることを示す。その結果、横磁場を印加したり、マイクロキャビティ光子モードとの結合を調整することにより、バイエクシトンに近接してエネルギー的に調整されたときに、ボソンの2体弾性散乱を共鳴増幅することができる。共鳴では、ボソニック場はそのスクイーズを伴う核融合の量子反応を受ける。励起子に対しては、共鳴を横切る磁場の急激な断熱的掃流によってバイエクシトンから得られる巨大分子(フェシュバッハ二量体)を予測する。分子は非自明な絡み合い特性を有する。我々の提案は、強い相関のフォトニクスと光の量子化学を約束する。

We propose a physical principle for implementation of controllable interactions of identical electromagnetic bosons (excitons or polaritons) in two-dimensional (2D) semiconductors. The key ingredients are tightly bound biexcitons and in-plane anisotropy of the host structure due to, e.g., a uniaxial strain. We show that anisotropy-induced splitting of the radiative exciton doublet couples the biexciton state to continua of boson scattering states. As a result, two-body elastic scattering of bosons may be resonantly amplified when energetically tuned close to the biexciton by applying a transverse magnetic field or tuning the coupling with the microcavity photon mode. At the resonance, bosonic fields undergo quantum reaction of fusion accompanied by their squeezing. For excitons, we predict giant molecules (Feshbach dimers) which can be obtained from a biexciton via rapid adiabatic sweeping of the magnetic field across the resonance. The molecules possess non-trivial entanglement properties. Our proposal holds promise for the strongly-correlated photonics and quantum chemistry of light.

翻訳日:2023-11-14 21:29:31 公開日:2023-11-11

# 文脈性、コヒーレンス、量子チェシャー猫

Contextuality, Coherences, and Quantum Cheshire Cats ( http://arxiv.org/abs/2307.06583v2 )

ライセンス: Link先を確認

Jonte R. Hance, Ming Ji, Holger F. Hofmann

(参考訳) 我々は、文脈性理論を用いて量子チェシャイア猫を分析し、このパラドックスを解釈する最善の方法が何かわかるかどうかを確かめる。このシナリオは3つの異なる測定値の関係を用いて解析できることを示すが、これは論理的な矛盾をもたらすと考えられる。この文脈的振る舞いが弱値とどのようにつながり、禁止状態間の一貫性を議論する。量子チェシャー猫(quantum cheshire cat)は、粒子の性質を示すのではなく、これらのコヒーレンスの効果を示す。

We analyse the quantum Cheshire cat using contextuality theory, to see if this can tell us anything about how best to interpret this paradox. We show that this scenario can be analysed using the relation between three different measurements, which seem to result in a logical contradiction. We discuss how this contextual behaviour links to weak values, and coherences between prohibited states. Rather than showing a property of the particle is disembodied, the quantum Cheshire cat instead demonstrates the effects of these coherences, which are typically found in pre- and postselected systems.

翻訳日:2023-11-14 21:17:46 公開日:2023-11-11

# 開量子系における共鳴支配光力学的絡み合い

Resonance-dominant optomechanical entanglement in open quantum systems ( http://arxiv.org/abs/2307.12383v2 )

ライセンス: Link先を確認

Cheng Shang and Hongchao Li

(参考訳) 絡み合い保護に動機づけられ,共振効果を用いてコヒーレント状態表現における光力学的絡み合いを高める。本研究では, 熱力学モードと周辺熱浴との間に有意な変動成分を弱結合限界でフィルタするフィルタモデルを提案する。連続変数の絡み合いの保護は、重要な変形成分に関連する自由度を排除し、脱コヒーレンスに抵抗することを明らかにする。本研究では, フィルタモデルの非線形ランゲヴィン方程式を構築し, 温度ゆらぎノイズと機械減衰との定常的最大最適エンタングルメントのロバスト性を数値的に示す。さらに、これらの結果を1つの振動するエンドミラーを持つ光学キャビティアレイに一般化し、長距離最適オプティメカニカルエンタングルメント転送について検討する。本研究は, 量子システムのデコヒーレンスから保護し, 大規模量子情報処理と量子ネットワーク構築の可能性を高めるために, 共鳴効果を適用した新たな基盤を打破する。

Motivated by entanglement protection, our work utilizes a resonance effect to enhance optomechanical entanglement in the coherent-state representation. We propose a filtering model to filter out the significant detuning components between a thermal-mechanical mode and its surrounding heat baths in the weak coupling limit. We reveal that protecting continuous-variable entanglement involves the elimination of degrees of freedom associated with significant detuning components, thereby resisting decoherence. We construct a nonlinear Langevin equation of the filtering model and numerically show that the filtering model doubles the robustness of the stationary maximum optomechanical entanglement to the thermal fluctuation noise and mechanical damping. Furthermore, we generalize these results to an optical cavity array with one oscillating end-mirror to investigate the long-distance optimal optomechanical entanglement transfer. Our study breaks new ground for applying the resonance effect to protect quantum systems from decoherence and advancing the possibilities of large-scale quantum information processing and quantum network construction.

翻訳日:2023-11-14 21:04:58 公開日:2023-11-11

# バイオメディカル自然言語処理におけるフェデレーション学習の深度評価

An In-Depth Evaluation of Federated Learning on Biomedical Natural Language Processing ( http://arxiv.org/abs/2307.11254v2 )

ライセンス: Link先を確認

Le Peng, Gaoxiang Luo, sicheng zhou, jiandong chen, Rui Zhang, Ziyue Xu, Ju Sun

(参考訳) BERTやGPTのような言語モデル(LM)は自然言語処理(NLP)に革命をもたらした。しかし、医療分野は、医療保険ポータビリティ・アンド・アカウンタビリティ法(HIPAA)や一般データ保護規則(GDPR)などの規制によって課されるデータアクセスの制限とプライバシーの制約により、LMの訓練が困難に直面している。フェデレートラーニング(FL)は、データプライバシを確保しながら協調学習を可能にする分散ソリューションを提供する。本研究では,8コーパスを含む2つのバイオメディカルNLPタスクのFLを6 LMを用いて評価した。結果はこう示しています 1) FLモデルは、個々のクライアントのデータに基づいてトレーニングされたモデルよりも一貫して優れており、時には、プールされたデータで訓練されたモデルと同等の性能を示した。 2) 総データ量が一定の場合, クライアント数の多いFLモデルでは性能は劣るが, 事前学習したトランスフォーマーモデルでは高いレジリエンスを示した。 3) FLモデルはゼロ/ワンショット学習を用いた大規模言語モデルよりも大幅に優れ、非常に高速な推論速度を示した。

Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). However, the medical field faces challenges in training LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring data privacy. In this study, we evaluated FL on 2 biomedical NLP tasks encompassing 8 corpora using 6 LMs. Our results show that: 1) FL models consistently outperformed models trained on individual clients' data and sometimes performed comparably with models trained with pooled data; 2) with a fixed total amount of data, FL models trained with more clients produced inferior performance, but pre-trained transformer-based models exhibited great resilience; 3) FL models significantly outperformed large language models using zero-/one-shot learning and offered lightning-fast inference speed.

翻訳日:2023-11-14 21:03:00 公開日:2023-11-11

# コネクテッド・自動運転車のためのフェデレートラーニング: 既存手法と課題のサーベイ

Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges ( http://arxiv.org/abs/2308.10407v2 )

ライセンス: Link先を確認

Vishnu Pandi Chellapandi and Liangqi Yuan and Christopher G. Brinton and Stanislaw H Zak and Ziran Wang

(参考訳) 機械学習(ml)は、知覚、計画、制御を含む、コネクテッドおよび自動車両(cav)における重要なタスクに広く使われている。しかし、モデルトレーニングにおける車両データへの依存は、車内ユーザのプライバシと大量のデータボリュームが生み出す通信オーバーヘッドに重大な課題をもたらす。フェデレートラーニング(FL)は、複数の車両が協力してモデルを開発し、さまざまな運転環境からの学習を拡大し、全体的なパフォーマンスを高め、ローカル車両のデータプライバシとセキュリティを同時に確保する、分散MLアプローチである。本報告では, FL の CAV (FL4CAV) への適用における進歩について概説する。まず、flの集中型フレームワークと分散フレームワークを分析し、その重要な特徴と方法論を強調する。次に、CAVにおけるFLに関連する多様なデータソース、モデル、およびデータセキュリティ技術についてレビューし、プライバシーと機密性を保証することの重要性を強調した。第3に、flの特定のアプリケーションが検討され、各アプリケーションで採用されるベースモデルとデータセットについての洞察を提供する。最後に、FL4CAVの既存の課題をリストアップし、CAVにおけるFLの有効性と効率をさらに高めるための今後の研究の方向性について論じる。

Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously securing local vehicle data privacy and security. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed and potential directions for future investigation to further enhance the effectiveness and efficiency of FL in the context of CAV are discussed.
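
As a concrete illustration of how clients can train collaboratively without sharing raw data, the sketch below shows federated averaging (FedAvg), a widely used aggregation rule in which a server averages client model parameters weighted by local sample counts. This is an assumption chosen for illustration; the survey covers many frameworks and does not single out this rule. The function name and the flat-list parameter representation are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count.

    client_weights: list of equally long lists of floats (one list per client).
    client_sizes:   list of local training-set sizes, one per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two vehicles with different amounts of local data; only parameters are shared.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)
```

Only the parameter vectors leave each vehicle in this sketch; the raw driving data stays local, which is the privacy argument the abstract makes.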

翻訳日:2023-11-14 20:53:03 公開日:2023-11-11

# RTLLM: 大規模言語モデルによる設計RTL生成のためのオープンソースベンチマーク

RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model ( http://arxiv.org/abs/2308.05345v3 )

ライセンス: Link先を確認

Yao Lu, Shang Liu, Qijun Zhang, Zhiyao Xie

(参考訳) ChatGPTのような最近の大規模言語モデル(LLM)の成功に触発されて、研究者は、自然言語命令に基づいた設計RTLの生成など、アジャイルハードウェア設計におけるLLMの採用を探り始めた。しかし、既存の研究では、それらのターゲット設計はすべて比較的単純で小規模であり、著者自身によって提案されており、異なるLLMソリューション間で公正に比較することは困難である。さらに、多くの先行作品は、生成した設計rtlの設計品質を評価することなく、設計の正確性にのみ焦点を合わせている。本研究では,自然言語命令を用いた設計RTLを生成するRTLLMというオープンソースのベンチマークを提案する。自動生成設計RTLを体系的に評価するために,構文目標,機能目標,設計品質目標の3つの段階目標をまとめた。このベンチマークは、任意のLCMベースのソリューションを定量的に評価する。さらに,提案するベンチマークにおいて,gpt-3.5の性能が大幅に向上することを示すセルフプランニングという,簡便かつ驚くほど効果的なプロンプトエンジニアリング手法を提案する。

Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, the target designs are all relatively simple, small in scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness, without evaluating the design quality of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL with natural-language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals, named the syntax goal, the functionality goal, and the design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 on our proposed benchmark.

翻訳日:2023-11-14 20:50:44 公開日:2023-11-11

# 加速光によるフォトニック絡み合い

Photonic entanglement with accelerated light ( http://arxiv.org/abs/2308.01764v3 )

ライセンス: Link先を確認

R. C. Souza Pimenta, G. H. dos Santos, A. B. Barreto, L. C. Celeri and P. H. Souto Ribeiro

(参考訳) 加速光はレーザー光と回折で実証されている。回折場内では、例えば重力場によって加速されたような曲線軌道で伝播するビームエネルギーの大部分を運ぶ部分を特定することができる。ここでは、自然パラメトリックダウンコンバージョンで発生する双対ビーム間の絡み合いに対するこの種の加速度の影響を解析する。その結果, 加速度は理想的な条件下では絡み合いに大きく影響しないことがわかった。導入された光学スキームは重力と量子物理学の境界における過程の理解に有用である。

Accelerated light has been demonstrated with laser light and diffraction. Within the diffracting field it is possible to identify a portion that carries most of the beam energy and propagates along a curved trajectory, as if it had been accelerated by a gravitational field, for instance. Here, we analyze the effects of this kind of acceleration on the entanglement between twin beams produced in spontaneous parametric down-conversion. Our results show that acceleration does not affect entanglement significantly under ideal conditions. The optical scheme introduced can be useful for understanding processes at the boundary between gravitation and quantum physics.

翻訳日:2023-11-14 20:50:10 公開日:2023-11-11

# 非線形超伝導マイクロ波システムのスペクトル理論:緩和率の抽出とモードハイブリダイゼーション

Spectral Theory for Non-linear Superconducting Microwave Systems: Extracting Relaxation Rates and Mode Hybridization ( http://arxiv.org/abs/2309.03435v2 )

ライセンス: Link先を確認

Dung N. Pham, Richard D. Li, Hakan E. Türeci

(参考訳) モードハイブリダイゼーションの正確なモデリングと放射緩和率の計算は超伝導量子デバイスの設計と最適化に不可欠である。本研究では,超伝導体の一般三次元分布における励起緩和率の抽出を可能にする超伝導体の電気流体力学のスペクトル理論を提案する。提案手法は, 効率が高く, 放射型ハイブリダイゼーション場を2次量子化できるオープンシステムのモーダル記述を定式化する, 長年の課題に対処する。これは、放射が計算領域内と外へ伝播できる有限だが透明な境界を実装することで達成される。結果として生じるスペクトル問題は、多スケール超伝導量子系の非平衡ダイナミクスの解析に適した電気流体力学方程式の粗い定式化の中で定義される。

The accurate modeling of mode hybridization and calculation of radiative relaxation rates have been crucial to the design and optimization of superconducting quantum devices. In this work, we introduce a spectral theory for the electrohydrodynamics of superconductors that enables the extraction of the relaxation rates of excitations in a general three-dimensional distribution of superconducting bodies. Our approach addresses the long-standing problem of formulating a modal description of open systems that is both efficient and allows for second quantization of the radiative hybridized fields. This is achieved through the implementation of finite but transparent boundaries through which radiation can propagate into and out of the computational domain. The resulting spectral problem is defined within a coarse-grained formulation of the electrohydrodynamical equations that is suitable for the analysis of the non-equilibrium dynamics of multiscale superconducting quantum systems.

翻訳日:2023-11-14 20:39:40 公開日:2023-11-11

# 自律走行における運動関連モジュールのDRLに基づく軌道追跡

DRL-Based Trajectory Tracking for Motion-Related Modules in Autonomous Driving ( http://arxiv.org/abs/2308.15991v2 )

ライセンス: Link先を確認

Yinda Xu, Lidong Yu

(参考訳) 自律運転システムは、常にプランナーやコントローラのような運動関連モジュール上に構築される。これらの運動関連モジュールを原始ルーチンとして高精度でロバストな軌道追跡法が不可欠である。現在の手法は、コンテキストやダイナミクスのようなモデルについて強い仮定をすることが多いが、現実のシステムの変化するシナリオに対処するには不十分である。本稿では,自律走行システムにおける運動関連モジュールに対する深部強化学習(DRL)に基づく軌道追跡手法を提案する。 DLの表現学習能力とRLの探索特性は強靭性と精度の向上をもたらす。一方、モデルフリーでデータ駆動の方法で軌道追跡を実行することで、汎用性を高める。広範な実験により,現在の手法と比較して,提案手法の効率性と有効性の両方を実証した。

Autonomous driving systems are always built on motion-related modules such as the planner and the controller. An accurate and robust trajectory tracking method is indispensable for these motion-related modules as a primitive routine. Current methods often make strong assumptions about the model such as the context and the dynamics, which are not robust enough to deal with the changing scenarios in a real-world system. In this paper, we propose a Deep Reinforcement Learning (DRL)-based trajectory tracking method for the motion-related modules in autonomous driving systems. The representation learning ability of DL and the exploration nature of RL bring strong robustness and improve accuracy. Meanwhile, it enhances versatility by running the trajectory tracking in a model-free and data-driven manner. Through extensive experiments, we demonstrate both the efficiency and effectiveness of our method compared to current methods.

翻訳日:2023-11-14 20:38:16 公開日:2023-11-11

# GraphがLLMと出会い、大規模グラフモデルへ

Graph Meets LLMs: Towards Large Graph Models ( http://arxiv.org/abs/2308.14522v2 )

ライセンス: Link先を確認

Ziwei Zhang, Haoyang Li, Zeyang Zhang, Yijian Qin, Xin Wang, Wenwu Zhu

(参考訳) 人工知能、特に機械学習における最近の画期的な成果として、大きなモデルが現れている。しかし、グラフに関して言えば、大きなモデルは自然言語処理やコンピュータビジョンといった他の分野と同様の成功レベルに達していない。グラフに対する大規模モデルの適用を促進するために,我々は,大規模グラフモデルの開発に伴う課題と機会について議論する。まず,大規模グラフモデルの望ましい特性について述べる。次に,表現基底,グラフデータ,グラフモデルという3つの視点から詳細な議論を行う。それぞれのカテゴリにおいて、最近の進歩の概要を簡潔に述べ、残りの課題をビジョンとともに強調します。最後に,大規模グラフモデルの有用な応用について論じる。この視点は、大きなグラフモデルに関するさらなる調査を促し、最終的には人工知能(AGI)に一歩近づいたと信じています。私たちは、知識を最大限に活用するために、大規模なグラフモデルを包括的に研究した最初の人物です。

Large models have emerged as the most recent groundbreaking achievements in artificial intelligence, and particularly machine learning. However, when it comes to graphs, large models have not achieved the same level of success as in other fields, such as natural language processing and computer vision. In order to promote the application of large models to graphs, we present a perspective paper to discuss the challenges and opportunities associated with developing large graph models. First, we discuss the desired characteristics of large graph models. Then, we present detailed discussions from three key perspectives: representation basis, graph data, and graph models. In each category, we provide a brief overview of recent advances and highlight the remaining challenges together with our visions. Finally, we discuss valuable applications of large graph models. We believe this perspective can encourage further investigations into large graph models, ultimately pushing us one step closer towards artificial general intelligence (AGI). To the best of our knowledge, we are the first to comprehensively study large graph models.

翻訳日:2023-11-14 20:37:33 公開日:2023-11-11

# Diffuse, Attend, Segment: 安定拡散を用いた教師なしゼロショットセグメンテーション

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion ( http://arxiv.org/abs/2308.12469v2 )

ライセンス: Link先を確認

Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco

(参考訳) 画像の高品質なセグメンテーションマスクの作成は、コンピュータビジョンの基本的な問題である。近年の研究では、ほぼあらゆる画像スタイルでゼロショットセグメンテーションを可能にするための大規模教師あり訓練と、濃密なアノテーションを使わずにセグメンテーションを可能にする教師なしトレーニングが検討されている。しかし、アノテーションなしであらゆるものをゼロショットでセグメント化できるモデルを構築することは依然として難しい。本稿では, 事前学習した安定拡散モデルが注意層内における物体の固有概念を学習していることから, 安定拡散モデルの自己注意層を活用してこの目標を達成することを提案する。具体的には,注意マップ間のKLダイバージェンスを計測し,有効なセグメンテーションマスクにマージする簡易かつ効果的な反復的マージプロセスを提案する。提案手法は,画像の高品質なセグメンテーションを抽出するために訓練や言語依存を必要としない。 COCO-Stuff-27では,従来の教師なしゼロショットSOTA法を画素精度で絶対値26%,平均IoUで17%上回っている。プロジェクトページは https://sites.google.com/view/diffseg/home にある。

Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation for any images. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at https://sites.google.com/view/diffseg/home.
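
The merging idea can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: the greedy single-pass grouping, the symmetric KL measure, the threshold value, and all function names are assumptions made for illustration, and the attention maps are tiny hand-written lists rather than outputs of a diffusion model. Maps are assumed to be strictly positive so that the logarithms are defined.

```python
import math


def normalize(weights):
    """Scale a map so that its entries sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p) for two strictly positive distributions."""
    return sum(a * math.log(a / b) + b * math.log(b / a) for a, b in zip(p, q))


def mean_map(maps):
    """Element-wise mean of equally sized maps."""
    return [sum(vals) / len(maps) for vals in zip(*maps)]


def merge_maps(maps, threshold):
    """Greedily group maps whose divergence from a group's mean is below threshold."""
    groups = []
    for m in maps:
        m = normalize(m)
        for g in groups:
            if symmetric_kl(mean_map(g), m) < threshold:
                g.append(m)
                break
        else:
            groups.append([m])
    return [mean_map(g) for g in groups]


def segment(merged):
    """Label each position with the index of the merged map that is largest there."""
    n = len(merged[0])
    return [max(range(len(merged)), key=lambda k: merged[k][i]) for i in range(n)]


# Four toy attention maps over four positions: two favour the left half,
# two favour the right half.
maps = [
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.9],
    [0.8, 0.8, 0.2, 0.2],
    [0.2, 0.2, 0.8, 0.8],
]
merged = merge_maps(maps, threshold=0.5)
print(len(merged), segment(merged))
```

In this toy setting the two left-leaning maps and the two right-leaning maps each collapse into one merged map, and taking the per-position maximum turns the merged maps into a label mask.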

翻訳日:2023-11-14 20:36:45 公開日:2023-11-11

# FinGPT:財務データセットにおけるオープンソースの大規模言語モデルのインストラクションチューニングベンチマーク

FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets ( http://arxiv.org/abs/2310.04793v2 )

ライセンス: Link先を確認

Neng Wang, Hongyang Yang, Christina Dan Wang

(参考訳) 自然言語処理(NLP)分野が急速に拡大する中で、金融セクターにおけるGPTベースのモデルの可能性はますます明白になっている。しかしながら、これらのモデルと財務データセットの統合は、特にその妥当性と妥当性を決定する上で、課題を提起する。本稿では、特に財務状況に適応したオープンソースの大規模言語モデルに対して、インストラクションチューニングパラダイムに固有のアプローチを導入する。この方法論を通じて、我々はオープンソースのモデルの相互運用性を活かし、シームレスで透過的な統合を保証する。まず、インストラクションチューニングのパラダイムを説明し、即時統合の有効性を強調します。本稿では,エンドツーエンドのトレーニングとテストのためのベンチマーク手法を提案する。まず,名前付きエンティティ認識(NER)や感情分析などの基本的な能力と基本的なタスクを評価し,専門性を高める。次に、汎用性を調べるために全ての命令チューニングを融合してマルチタスク操作を実行する包括的モデルについて検討する。最後に,目立たないタスクを認識してゼロショット機能を探索し,未開の地形における適応性を理解するための新しいデータセットを組み込んだ。このようなパラダイムはオープン性と再現性の原則を立証し、オープンソースの金融大言語モデル(FinLLMs)における将来の調査の基盤となる。

In the swiftly expanding domain of Natural Language Processing (NLP), the potential of GPT-based models for the financial sector is increasingly evident. However, the integration of these models with financial datasets presents challenges, notably in determining their adeptness and relevance. This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models, specifically adapted for financial contexts. Through this methodology, we capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration. We begin by explaining the Instruction Tuning paradigm, highlighting its effectiveness for immediate integration. The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression. Firstly, we assess basic competencies and fundamental tasks, such as Named Entity Recognition (NER) and sentiment analysis to enhance specialization. Next, we delve into a comprehensive model, executing multi-task operations by amalgamating all instructional tunings to examine versatility. Finally, we explore the zero-shot capabilities by earmarking unseen tasks and incorporating novel datasets to understand adaptability in uncharted terrains. Such a paradigm fortifies the principles of openness and reproducibility, laying a robust foundation for future investigations in open-source financial large language models (FinLLMs).

翻訳日:2023-11-14 20:28:36 公開日:2023-11-11

# ソフトウェア工学のための大規模言語モデル:調査とオープン問題

Large Language Models for Software Engineering: Survey and Open Problems ( http://arxiv.org/abs/2310.03533v4 )

ライセンス: Link先を確認

Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang

(参考訳) 本稿では,ソフトウェア工学(SE)におけるLarge Language Models(LLMs)の新興領域について調査する。また、LLMをソフトウェアエンジニアが直面する技術的問題に適用するための未解決の研究課題も提示する。 LLMの創発的な特性は、コーディング、設計、要求、修復、リファクタリング、パフォーマンス改善、ドキュメント、分析を含むソフトウェアエンジニアリング活動の全範囲にわたる応用において、斬新さと創造性をもたらす。しかし、これらの全く同じ創発的な性質は重要な技術的課題も生じさせるため、幻覚のような不正確な解を確実に除去できる技術が必要である。本調査では,ハイブリッド技術(従来のSE+LLM)が,信頼性が高く効率的かつ効果的なLLMベースのSEの開発と展開において果たす中心的な役割を明らかにする。

This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.

翻訳日:2023-11-14 20:27:49 公開日:2023-11-11

# 数学質問応答改善のための検索強化生成:根拠性と人間の嗜好のトレードオフ

Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference ( http://arxiv.org/abs/2310.03184v2 )

ライセンス: Link先を確認

Zachary Levonian, Chenglu Li, Wangda Zhu, Anoushka Gade, Owen Henkel, Millie-Ellen Postle, Wanli Xing

(参考訳) 中学生にとって、チューターとの対話型質問応答(QA)は効果的な学習方法である。生成的大規模言語モデル(LLM)の柔軟性と創発的能力は、数学的概念に関する概念的議論を支援する対話型QAを含む、個別指導プロセスの一部を自動化することへの関心の高まりにつながっている。しかし、数学の質問に対するLLMの応答は、誤っていたり、学校のカリキュラムと整合しないなど教育の文脈に合致していなかったりする可能性がある。潜在的な解決策の1つは検索強化生成(RAG)であり、精査された外部知識ソースをLLMプロンプトに組み込んで応答品質を向上させる。本稿では,高品質なオープンソースの数学教科書からコンテンツを検索して利用するプロンプトを設計し,実際の学生の質問に対する応答を生成する。我々は,中学校の代数・幾何QAにおけるこのRAGシステムの有効性を多条件サーベイによって評価し,人間はRAGを用いて生成した応答を好むが,応答が教科書の内容に過度に根ざしている場合はその限りではないことを見出した。我々は、RAGは応答品質を向上させることができるが、数学QAシステムの設計者は、学生が好む応答の生成と、特定の教育資源に密接に一致する応答とのトレードオフを検討する必要があると論じる。

For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources.

翻訳日:2023-11-14 20:26:52 公開日:2023-11-11

# 量子エンタングルメント位相遷移と計算複雑性:イジングモデルからの考察

Quantum Entanglement Phase Transitions and Computational Complexity: Insights from Ising Models ( http://arxiv.org/abs/2310.01699v2 )

ライセンス: Link先を確認

Hanchen Liu, Vikram Ravindranath, and Xiao Chen

(参考訳) 本稿では,2次元の二部(bipartite)クラスター状態を構築し,バルク量子ビットに対して単一量子ビット測定を行う。測定されていない1次元境界状態のエンタングルメントスケーリングを調べ、ある条件下では、境界状態が測定角度の変化によって駆動される体積則から面積則へのエンタングルメント転移を起こしうることを示す。この境界状態のエンタングルメント転移と、非ユニタリな1+1次元回路における測定誘起相転移とを、転送行列法によって橋渡しする。また、このエンタングルメント転移の計算複雑性問題への応用についても検討する。具体的には、境界状態のエンタングルメント転移と、二部$2$次元クラスター状態のサンプリング複雑性との関係を確立する。後者は、複素パラメータを持つ対応するイジング分配関数の計算複雑性に直接関係する。境界状態のエンタングルメントスケーリングを調べることにより、$2$次元量子状態が効率的にサンプリングできるパラメータ領域を数値的に同定し、イジング分配関数がそのような領域で効率的に評価できることを示す。

In this paper, we construct 2-dimensional bipartite cluster states and perform single-qubit measurements on the bulk qubits. We explore the entanglement scaling of the unmeasured 1-dimensional boundary state and show that under certain conditions, the boundary state can undergo a volume-law to an area-law entanglement transition driven by variations in the measurement angle. We bridge this boundary state entanglement transition and the measurement-induced phase transition in the non-unitary 1+1-dimensional circuit via the transfer matrix method. We also explore the application of this entanglement transition on the computational complexity problems. Specifically, we establish a relation between the boundary state entanglement transition and the sampling complexity of the bipartite $2$d cluster state, which is directly related to the computational complexity of the corresponding Ising partition function with complex parameters. By examining the boundary state entanglement scaling, we numerically identify the parameter regime for which the $2$d quantum state can be efficiently sampled, which indicates that the Ising partition function can be evaluated efficiently in such a region.

翻訳日:2023-11-14 20:26:04 公開日:2023-11-11

# De-SaTE:Liイオン電池の健全性予測のためのデノイジング・セルフアテンショントランスフォーマーエンコーダ

De-SaTE: Denoising Self-attention Transformer Encoders for Li-ion Battery Health Prognostics ( http://arxiv.org/abs/2310.00023v2 )

ライセンス: Link先を確認

Gaurav Shinde, Rohan Mohapatra, Pooja Krishan and Saptarshi Sengupta

(参考訳) リチウムイオン(Li-ion)電池の使用は、ポータブル電子機器の電源から電気自動車の推進、エネルギー貯蔵システムのサポートに至るまで、様々な産業で広く普及している。リチウムイオン電池の信頼性における中心的な課題は、予防保全と予測分析にとって重要な指標である残存耐用寿命(Remaining Useful Life, RUL)を正確に予測することにある。本研究は,電池データによく見られる特定の種類のノイズに対処するようそれぞれ訓練された,複数のデノイジングモジュールを活用する新しい手法を提案する。具体的には、デノイジングオートエンコーダとウェーブレットデノイザーを使用して符号化/分解された表現を生成し、その後、専用のセルフアテンショントランスフォーマーエンコーダで処理する。 NASAとCALCEのデータに対する広範な実験により、多様なノイズパターンの下で幅広い健全性指標の値を推定した。これらのデータに関して報告する誤差指標は、最近の文献で報告された最先端技術と同等かそれ以上である。

The usage of Lithium-ion (Li-ion) batteries has gained widespread popularity across various industries, from powering portable electronic devices to propelling electric vehicles and supporting energy storage systems. A central challenge in Li-ion battery reliability lies in accurately predicting their Remaining Useful Life (RUL), which is a critical measure for proactive maintenance and predictive analytics. This study presents a novel approach that harnesses the power of multiple denoising modules, each trained to address specific types of noise commonly encountered in battery data. Specifically, a denoising auto-encoder and a wavelet denoiser are used to generate encoded/decomposed representations, which are subsequently processed through dedicated self-attention transformer encoders. After extensive experimentation on NASA and CALCE data, a broad spectrum of health indicator values are estimated under a set of diverse noise patterns. The reported error metrics on these data are on par with or better than the state-of-the-art reported in recent literature.

翻訳日:2023-11-14 20:25:46 公開日:2023-11-11

# PREM:ノードレベルグラフ異常検出のためのシンプルで効果的なアプローチ

PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection ( http://arxiv.org/abs/2310.11676v2 )

ライセンス: Link先を確認

Junjun Pan, Yixin Liu, Yizhen Zheng, Shirui Pan

(参考訳) ノードレベルのグラフ異常検出(GAD)は、医学、ソーシャルネットワーク、eコマースなど、さまざまな領域におけるグラフ構造化データから異常ノードを特定する上で重要な役割を果たす。しかし、異常の多様性とラベル付きデータの不足により、課題が生じている。再構成ベースやコントラスト学習といった既存の方法論は、効果的ではあるが、複雑な目的関数や精巧なモジュールに起因する効率上の問題にしばしば悩まされる。本稿では,GADの効率を向上させるために,PREM (PREprocessing and Matching) という簡単な手法を提案する。我々のアプローチは、強力な異常検出能力を維持しながら、GADを合理化し、時間とメモリの消費を削減する。前処理モジュールとego-neighborマッチングモジュールの2つのモジュールで構成されるPREMは、トレーニング中のメッセージパッシング伝搬の必要性をなくし、単純なコントラスト損失を採用することで、トレーニング時間とメモリ使用量を大幅に削減する。さらに,5つの実世界のデータセットに対する厳密な評価により,本手法はロバスト性と有効性を示した。特に、ACMデータセットで検証した場合、PREMは最も効率的なベースラインと比較して、AUCの5%の改善、トレーニング速度の9倍の向上、メモリ使用量の大幅な削減を達成した。

Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in various domains such as medicine, social networks, and e-commerce. However, challenges have arisen due to the diversity of anomalies and the dearth of labeled data. Existing methodologies - reconstruction-based and contrastive learning - while effective, often suffer from efficiency issues, stemming from their complex objectives and elaborate modules. To improve the efficiency of GAD, we introduce a simple method termed PREprocessing and Matching (PREM for short). Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities. Comprising two modules - a pre-processing module and an ego-neighbor matching module - PREM eliminates the necessity for message-passing propagation during training, and employs a simple contrastive loss, leading to considerable reductions in training time and memory usage. Moreover, through rigorous evaluations of five real-world datasets, our method demonstrated robustness and effectiveness. Notably, when validated on the ACM dataset, PREM achieved a 5% improvement in AUC, a 9-fold increase in training speed, and sharply reduced memory usage compared to the most efficient baseline.

翻訳日:2023-11-14 20:16:45 公開日:2023-11-11

# サンプル効率の良いマルチタスクチューニングのためのプロトタイプベースハイパーアダプタ

Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning ( http://arxiv.org/abs/2310.11670v3 )

ライセンス: Link先を確認

Hao Zhao, Jie Fu, Zhaofeng He

(参考訳) パラメータ効率のよい微調整(PEFT)は、少数のパラメータを更新するだけで、事前学習済み言語モデルを下流タスクに適応させる有効性を示してきた。成功にもかかわらず、既存の手法のほとんどはタスク間の知識伝達を考慮せずに各タスクに個別に適応しており、低データ領域に限定されている。この問題を解決するために,アダプタチューニングとハイパーネットワークに基づく新しいフレームワークであるPrototype-based HyperAdapter (PHA)を提案する。PHAは、インスタンスデンスレトリバーとプロトタイプハイパーネットワークを導入し、条件付きモジュールをサンプル効率のよい方法で生成する。これにより、マルチタスク学習と少数ショット転移学習において、既存のPEFT手法に対して同等の性能改善がもたらされる。さらに重要なことに、利用可能なデータサイズが小さくなると、我々の手法は他の強力なベースラインを大きなマージンで上回る。さまざまなデータセットにわたる広範な実証実験に基づいて、PHAがトレーニング可能なパラメータ数、ストリームタスクの精度、サンプル効率の間でより良いトレードオフを実現することを実証した。

Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained language models to downstream tasks while only updating a small number of parameters. Despite the success, most existing methods independently adapt to each task without considering knowledge transfer between tasks and are limited to low-data regimes. To overcome this issue, we propose Prototype-based HyperAdapter (PHA), a novel framework built on the adapter-tuning and hypernetwork. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This leads to comparable performance improvements against existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, when the available data size gets smaller, our method outperforms other strong baselines by a large margin. Based on our extensive empirical experiments across various datasets, we demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.

翻訳日:2023-11-14 20:16:20 公開日:2023-11-11

# 本を読むのは最高だけど、運転するなら違う! デファシブル・コモンセンス・ノームに関する視覚的根拠に基づく推論

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms ( http://arxiv.org/abs/2310.10418v2 )

ライセンス: Link先を確認

Seungju Han and Junhyeok Kim and Jack Hessel and Liwei Jiang and Jiwan Chung and Yejin Son and Yejin Choi and Youngjae Yu

(参考訳) コモンセンス規範は文脈によって覆されうる(defeasible)。普通は本を読むことは素晴らしいが、車を運転しているときはそうではない。文脈は言語で明示的に記述できるが、身体性を伴うシナリオでは、文脈はしばしば視覚的に提供される。この種の視覚に根ざした、覆されうるコモンセンス規範に関する推論は、一般に人間にとって容易であるが、(我々が示すように)視覚的理解とコモンセンス規範に関する推論の両方を必要とするため、機械にとっては難題となる。我々は、視覚に根ざしたコモンセンス規範を研究するための新しいマルチモーダルベンチマークNORMLENSを構築した。 NORMLENSは、2Kのマルチモーダルな状況を対象とする、自由形式の説明を伴う10Kの人間の判断で構成されており、次の2つの疑問に取り組むためのプローブとなる。(1)モデルは平均的な人間の判断とどの程度一致できるか? (2)モデルは自らが予測した判断をどの程度うまく説明できるか? 最先端モデルの判断や説明は、人間のアノテーションとよく一致していないことがわかった。さらに, 大規模言語モデルから社会的コモンセンス知識を蒸留することで, モデルを人間により良く整合させる新しいアプローチを提案する。データとコードはhttps://seungjuhan.me/normlensで公開されている。

Commonsense norms are defeasible by context: reading books is usually great, but not when driving a car. While contexts can be explicitly described in language, in embodied scenarios, contexts are often provided visually. This type of visually grounded reasoning about defeasible commonsense norms is generally easy for humans, but (as we show) poses a challenge for machines, as it necessitates both visual understanding and reasoning about commonsense norms. We construct a new multimodal benchmark for studying visual-grounded commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment? and (2) how well can models explain their predicted judgments? We find that state-of-the-art model judgments and explanations are not well-aligned with human annotation. Additionally, we present a new approach to better align models with humans by distilling social commonsense knowledge from large language models. The data and code are released at https://seungjuhan.me/normlens.

翻訳日:2023-11-14 20:15:41 公開日:2023-11-11

# ASSERT: 大規模言語モデルのロバスト性評価のための自動安全シナリオレッドチーミング

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models ( http://arxiv.org/abs/2310.09624v2 )

ライセンス: Link先を確認

Alex Mei, Sharon Levy, William Yang Wang

(参考訳) 大規模言語モデルが社会に統合されるにつれ,高分散な環境において信頼性を維持する上で,一連のプロンプトに対する堅牢性がますます重要になっている。堅牢性の評価は,ユーザがインテリジェントシステムを呼び出しうる様々な設定を包括的に網羅しなければならない。本稿では,意味的に整合した拡張,ターゲットブートストラップ,敵対的知識注入の3つの手法から構成されるASSERT(Automated Safety Scenario Red Teaming)を提案する。堅牢な安全性評価のために,これらの手法をAI安全という重要な領域に適用し,意味的等価性,関連シナリオ,敵対的設定という多様な堅牢性設定をカバーするプロンプトのテストスイートをアルゴリズム的に生成する。プロンプトを4つの安全領域に分割し,領域がモデルの性能に与える影響を詳細に分析する。既存の最先端モデルには専用の安全対策が講じられているにもかかわらず,意味的に関連するシナリオ間で絶対分類精度に最大11%,ゼロショットの敵対的設定では絶対エラー率に最大19%という統計的に有意な性能差が見出され,ユーザの身体的安全性への懸念が高まった。

As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment. Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system. This paper proposes ASSERT, Automated Safety Scenario Red Teaming, consisting of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. For robust safety evaluation, we apply these methods in the critical domain of AI safety to algorithmically generate a test suite of prompts covering diverse robustness settings -- semantic equivalence, related scenarios, and adversarial. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. Despite dedicated safeguards in existing state-of-the-art models, we find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings, raising concerns for users' physical safety.

翻訳日:2023-11-14 20:14:46 公開日:2023-11-11

# 探索を伴わない共同ビームフォーミングのためのRL-Policiesの学習--Batch Constrained Off-Policy アプローチ

Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach ( http://arxiv.org/abs/2310.08660v2 )

ライセンス: Link先を確認

Heasung Kim and Sravan Kumar Ankireddy

(参考訳) 本研究では,レート最大化のためのネットワークパラメータ最適化の問題を考える。我々はこれを、電力制御、ビームフォーミング、干渉キャンセルの同時最適化問題として定式化する。複数の基地局(BS)が複数のユーザ機器(UE)と通信する設定を考える。総当たり探索は指数関数的な計算複雑性を持つため、代わりに深層強化学習(RL)技術を用いて、この非凸最適化問題を解く。現代の通信システムは、その挙動を正確にモデル化することが難しいことで知られている。エージェントが効率的に探索し学習するには環境との相互作用が必要であるため、このことはRLベースのアルゴリズムの利用を制限する。さらに、失敗のコストが高いため、探索と学習のために現実世界にアルゴリズムをデプロイすることは賢明ではない。ディープQネットワーク(DQN)ベースの制御など,これまでに提案されたRLベースのソリューションとは対照的に,オフラインモデルベースのアプローチを提案する。具体的には、離散バッチ制約付き深層Q学習(BCQ)を検討し、探索を行うことなく、わずかなデータでDQNと同等の性能を達成できることを示す。これはサンプル効率を最大化し、商用ネットワークに新しいアルゴリズムをデプロイするリスクを最小化する。コードとデータを含むプロジェクトリソース全体を https://github.com/Heasung-Kim/safe-rl-deployment-for-5g で提供する。

In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) communicate with multiple user equipment (UEs). Because of the exponential computational complexity of brute force search, we instead solve this nonconvex optimization problem using deep reinforcement learning (RL) techniques. Modern communication systems are notorious for their difficulty in exactly modeling their behavior. This limits us in using RL-based algorithms as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill-advised to deploy the algorithm in the real world for exploration and learning because of the high cost of failure. In contrast to the previous RL-based solutions proposed, such as deep-Q network (DQN) based control, we suggest an offline model-based approach. We specifically consider discrete batch-constrained deep Q-learning (BCQ) and show that performance similar to DQN can be achieved with only a fraction of the data without exploring. This maximizes sample efficiency and minimizes risk in deploying a new algorithm to commercial networks. We provide the entire project resource, including code and data, at the following link: https://github.com/Heasung-Kim/safe-rl-deployment-for-5g.
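
To make the batch-constrained idea concrete, here is a minimal sketch of the action-selection rule commonly associated with discrete BCQ: only actions that the logged (behavior) data makes sufficiently likely are eligible, and the Q-values are maximised over that restricted set. The function name, the example numbers, and the threshold `tau` are assumptions for illustration; the abstract does not specify the authors' implementation.

```python
def bcq_select_action(q_values, behavior_probs, tau=0.3):
    """Pick the highest-Q action among those whose behavior-policy probability,
    relative to the most likely action, exceeds tau.

    q_values       : estimated Q(s, a) for each discrete action
    behavior_probs : estimated probability of each action in the offline data
                     (assumed to contain at least one positive entry)
    tau            : threshold in [0, 1); the most likely action always
                     passes, so the allowed set is never empty
    """
    max_prob = max(behavior_probs)
    allowed = [a for a, p in enumerate(behavior_probs) if p / max_prob > tau]
    return max(allowed, key=lambda a: q_values[a])

q_values = [1.0, 5.0, 2.0]
behavior_probs = [0.60, 0.05, 0.35]   # action 1 is rare in the logged data

unconstrained = max(range(len(q_values)), key=lambda a: q_values[a])
constrained = bcq_select_action(q_values, behavior_probs, tau=0.3)
```

In this toy case a plain greedy policy would pick action 1, whose Q-value is an extrapolation from very little data, while the constrained rule falls back to action 2, which is well supported by the batch.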

翻訳日:2023-11-14 20:13:58 公開日:2023-11-11

# CATEモデル選択のための因果Q-集約

Causal Q-Aggregation for CATE Model Selection ( http://arxiv.org/abs/2310.16945v4 )

ライセンス: Link先を確認

Hui Lan, Vasilis Syrgkanis

(参考訳) 条件付き平均処置効果(CATE)の正確な推定は、パーソナライズされた意思決定の中核にある。 CATE推定には多くのモデルが存在するが、因果推論の根本問題のため、モデル選択は非自明な作業である。最近の実証研究は、二重ロバスト性を持つプロキシ損失指標とモデルアンサンブルを支持する証拠を提供している。しかし、理論的な理解は不足している。先行する理論研究を直接適用すると、モデル選択問題の非凸性のために、準最適なオラクルモデル選択レートしか得られない。我々は,既存の主要なCATEアンサンブル手法に対するリグレット率を与え,二重ロバスト損失を用いたQ-集約に基づく新しいCATEモデルアンサンブル手法を提案する。我々の主結果は、因果Q-集約が、局外関数の誤差の積に関連する高次の推定誤差項を加えた上で、統計的に最適なオラクルモデル選択リグレット率$\frac{\log(M)}{n}$($M$個のモデルと$n$個のサンプルの場合)を達成することを示す。重要なことに、我々のリグレット率は、候補となるCATEモデルのいずれかが真実に近いことを必要としない。我々は、多くの半合成データセットで新しい手法を検証するとともに、操作変数や未観測交絡を伴うCATEモデル選択への拡張も提供する。

Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.

翻訳日:2023-11-14 20:04:17 公開日:2023-11-11

# VQ-NeRF:ベクトル量子化によるニューラルリフレクタンス分解と編集

VQ-NeRF: Neural Reflectance Decomposition and Editing with Vector Quantization ( http://arxiv.org/abs/2310.11864v3 )

ライセンス: Link先を確認

Hongliang Zhong, Jingbo Zhang, Jing Liao

(参考訳) 本研究では,ベクトル量子化(Vector Quantization, VQ)を組み込んで3Dシーンの反射率場を分解・編集する2分岐ニューラルネットワークモデルであるVQ-NeRFを提案する。現実には物体は通常離散的な材料で構成されているにもかかわらず、従来のニューラル反射率場は3Dシーンのモデル化に連続表現のみを使用する。この離散化の欠如は、ノイズの多い材料分解と複雑な材料編集をもたらしうる。これらの制限に対処するため、我々のモデルは連続枝と離散枝からなる。連続枝は従来のパイプラインに従って分解された材料を予測し、離散枝はVQ機構を用いて連続的な材料を個々の材料に量子化する。材料を離散化することにより,分解過程におけるノイズを低減し,離散材料のセグメンテーションマップを生成できる。セグメンテーション結果の対応する領域をクリックすることで、さらなる編集の対象とする特定の材料を容易に選択できる。さらに,シーン内の材料数を予測するために,ドロップアウトに基づくVQコードワードランキング手法を提案し,材料セグメンテーション過程の冗長性を低減する。ユーザビリティを向上させるために,材料編集を支援するインタラクティブインタフェースも開発している。我々は,コンピュータ生成シーンと実世界のシーンの両方でモデルを評価し,その優れた性能を示す。我々の知る限り、我々のモデルは3Dシーンで離散的な材料編集を可能にする最初のモデルである。

We propose VQ-NeRF, a two-branch neural network model that incorporates Vector Quantization (VQ) to decompose and edit reflectance fields in 3D scenes. Conventional neural reflectance fields use only continuous representations to model 3D scenes, despite the fact that objects are typically composed of discrete materials in reality. This lack of discretization can result in noisy material decomposition and complicated material editing. To address these limitations, our model consists of a continuous branch and a discrete branch. The continuous branch follows the conventional pipeline to predict decomposed materials, while the discrete branch uses the VQ mechanism to quantize continuous materials into individual ones. By discretizing the materials, our model can reduce noise in the decomposition process and generate a segmentation map of discrete materials. Specific materials can be easily selected for further editing by clicking on the corresponding area of the segmentation outcomes. Additionally, we propose a dropout-based VQ codeword ranking strategy to predict the number of materials in a scene, which reduces redundancy in the material segmentation process. To improve usability, we also develop an interactive interface to further assist material editing. We evaluate our model on both computer-generated and real-world scenes, demonstrating its superior performance. To the best of our knowledge, our model is the first to enable discrete material editing in 3D scenes.
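
The core vector-quantization step mentioned above, mapping a continuous vector to its nearest codeword, can be sketched as follows. This is generic VQ, not the paper's network: the three-component "material" vectors and the two-entry codebook are made-up illustrative values, and `quantize` is a hypothetical helper name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical codebook: each codeword stands for one discrete material.
codebook = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
]

# Continuous predictions for two sample points.
idx_a, code_a = quantize([0.9, 0.8, 1.1], codebook)
idx_b, code_b = quantize([0.1, 0.2, 0.0], codebook)
```

Because every point ends up with a codeword index, the indices themselves form a segmentation: all points sharing an index can be selected and edited together.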

翻訳日:2023-11-14 19:59:35 公開日:2023-11-11

# LLMにおける選択予測改善のための自己評価による適応

Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs ( http://arxiv.org/abs/2310.11689v2 )

ライセンス: Link先を確認

Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

(参考訳) 大規模言語モデル(LLM)は近年,自然言語理解や生成など,さまざまなタスクにおいて大きな進歩を見せている。しかし、高リスクの意思決定シナリオでの使用は、エラーの可能性があるため、依然として制限されている。選択予測(Selective prediction)とは、答えに確信が持てない場合にLLMが予測を控えられるようにすることで、その信頼性を向上させるために使用できる手法である。本研究では, LLMの選択的予測性能を向上させるために, 自己評価による適応のための新しいフレームワークを提案する。本フレームワークは,自己評価能力の向上を図りながら,パラメータ効率のチューニングを用いて,特定のタスクにLLMを適応させるという考え方に基づいている。提案手法は,様々な質問応答(QA)データセット上で評価し,最先端の選択予測手法よりも優れていることを示す。例えば、CoQAベンチマークでは、AUACCを91.23%から92.63%に改善し、AUROCを74.61%から80.25%に改善した。

Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.
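
Selective prediction as described above boils down to a confidence threshold: answer when the score is high enough, abstain otherwise. The sketch below assumes a generic per-answer confidence score (the paper derives it from self-evaluation; here it is just an input) and reports coverage and accuracy on the answered subset. All names and numbers are illustrative.

```python
def selective_predict(predictions, confidences, threshold):
    """Keep a prediction when its confidence reaches the threshold,
    otherwise return None to signal abstention."""
    return [p if c >= threshold else None
            for p, c in zip(predictions, confidences)]

def coverage_and_accuracy(selected, labels):
    """Coverage = fraction of questions answered;
    accuracy is measured only on the answered ones."""
    answered = [(p, y) for p, y in zip(selected, labels) if p is not None]
    coverage = len(answered) / len(labels)
    accuracy = (sum(p == y for p, y in answered) / len(answered)
                if answered else 0.0)
    return coverage, accuracy

predictions = ["a", "b", "c", "d"]
confidences = [0.9, 0.4, 0.8, 0.3]
labels      = ["a", "x", "c", "d"]

selected = selective_predict(predictions, confidences, threshold=0.5)
coverage, accuracy = coverage_and_accuracy(selected, labels)
```

Sweeping the threshold traces out an accuracy-versus-coverage curve; area-style metrics such as the AUACC quoted in the abstract summarise that curve in a single number.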

翻訳日:2023-11-14 19:58:53 公開日:2023-11-11

# 医用画像階層型マルチラベル分類のためのTLMCMネットワーク

TLMCM Network for Medical Image Hierarchical Multi-Label Classification ( http://arxiv.org/abs/2311.00282v2 )

ライセンス: Link先を確認

Meng Wu, Siyan Luo, Qiyu Wu, Wenbin Ouyang

(参考訳) 医用画像階層的マルチラベル分類(MI-HMC)は、現代医療において最重要であり、データ不均衡と階層制約(hierarchy constraint)という2つの重要な課題を提示している。既存のソリューションは複雑なモデルアーキテクチャ設計やドメイン固有の前処理を伴い、実装にかなりの専門知識や労力を要する。これらの制限に対処するため、本稿では,MI-HMCタスクのためのTransfer Learning with Maximum Constraint Module (TLMCM) ネットワークを提案する。 TLMCMネットワークは、上記の課題を克服するための新しいアプローチを提供し、平均適合率・再現率曲線下面積($AU\overline{(PRC)}$)指標において既存の手法を上回る。さらに、本研究では、MI-HMCタスクの文脈で広く研究されていない2つの新しい精度指標、$EMR$と$HammingAccuracy$を提案する。実験の結果,TLMCMネットワークはMI-HMCタスクに対して高いマルチラベル予測精度($80\%$-$90\%$)を達成し,医療領域アプリケーションに有用な貢献をすることが示された。

Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, presenting two significant challenges: data imbalance and hierarchy constraint. Existing solutions involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or effort in implementation. To address these limitations, this paper proposes Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers a novel approach to overcome the aforementioned challenges, outperforming existing methods based on the Area Under the Average Precision and Recall Curve ($AU\overline{(PRC)}$) metric. In addition, this research proposes two novel accuracy metrics, $EMR$ and $HammingAccuracy$, which have not been extensively explored in the context of the MI-HMC task. Experimental results demonstrate that the TLMCM network achieves high multi-label prediction accuracy ($80\%$-$90\%$) for MI-HMC tasks, making it a valuable contribution to healthcare domain applications.
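
The abstract names $EMR$ and $HammingAccuracy$ without defining them. The sketch below uses the definitions that are common in the multi-label literature, which is an assumption and may differ from the paper's exact formulation: exact match ratio counts samples whose whole label vector is right, and Hamming accuracy counts individual label positions that are right.

```python
def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_accuracy(y_true, y_pred):
    """Fraction of individual label positions predicted correctly
    (equivalently, 1 minus the Hamming loss)."""
    total = sum(len(t) for t in y_true)
    correct = sum(ti == pi
                  for t, p in zip(y_true, y_pred)
                  for ti, pi in zip(t, p))
    return correct / total

# Two samples, three binary labels each (illustrative values only).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 1], [0, 0, 0]]

emr = exact_match_ratio(y_true, y_pred)
ham = hamming_accuracy(y_true, y_pred)
```

Here one of two samples matches exactly while five of six label positions are correct, which shows why the two metrics can diverge: a single wrong label costs a whole sample under exact match but only one position under Hamming accuracy.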

翻訳日:2023-11-14 19:50:22 公開日:2023-11-11

# シングルブランチ視覚追跡における画像関連帰納バイアスの活用

Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking ( http://arxiv.org/abs/2310.19542v2 )

ライセンス: Link先を確認

Chuanming Tang, Kai Wang, Joost van de Weijer, Jianlin Zhang, Yongmei Huang

(参考訳) 視覚追跡において最先端の性能を達成しているにもかかわらず、最近のシングルブランチトラッカーは、Vision Transformer(ViT)エンコーダと推論パイプラインに付随する弱い事前仮定を見落としがちである。さらに、識別的トラッカーの有効性は、デュアルブランチパイプラインの採用によって依然として制約されている。素のViTの有効性の低さに対処するため、シングルブランチネットワークと識別モデルとのギャップを埋めるAdaptive ViT Model Predictionトラッカー(AViTMP)を提案する。具体的には,提案するエンコーダAViT-Encにおいて,ViTに基づく密な埋め込みパラダイムを強化するために,アダプタモジュールとジョイントターゲット状態埋め込みを導入する。次に、AViT-Encを密融合(dense-fusion)デコーダおよび識別的ターゲットモデルと組み合わせて、正確な位置を予測する。さらに,従来の推論手法の限界を緩和するため,双方向のサイクルトラッキング検証によってディストラクタの存在下での追跡の頑健性を高める、CycleTrackという新しい推論パイプラインを提案する。最後に,長期的なシナリオにおける重大な課題を巧みに処理する,デュアルフレーム更新推論戦略を提案する。実験では,LaSOT, LaSOTExtSub, AVisTなどを含む10のトラッキングベンチマークでAViTMPを総合的に評価した。実験結果から,AViTMPが,特に長期追跡と頑健性において最先端の性能を達成することが明らかとなった。

Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch network and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate location. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptly handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that AViTMP attains state-of-the-art performance, especially on long-time tracking and robustness.

翻訳日:2023-11-14 19:49:21 公開日:2023-11-11

# デバイアス言語表現モデルにおける保護グループを傷つけるな

Do Not Harm Protected Groups in Debiasing Language Representation Models ( http://arxiv.org/abs/2310.18458v2 )

ライセンス: Link先を確認

Chloe Qinyu Zhu, Rickard Stureborg, Brandon Fain

(参考訳) 実世界のデータで訓練された言語表現モデル(LRM)は、望ましくないバイアスを捉えて増幅し、様々な人口集団の人々に対する不公平な扱いを引き起こす可能性がある。単語埋め込みなどに関するベンチマーク評価においてバイアスを取り除くために、LRMに介入を適用する手法がいくつか研究されている。しかし、デバイアス介入の負の副作用は、通常、下流タスクでは明らかにされない。本稿では,デバイアスの公平性を評価するための評価セットであるxGAP-DEBIASを提案する。本研究では,現実のテキスト分類タスクにおいて4つのデバイアス手法を検証し,バイアスの低減は,デバイアス手法が保護しようとしている集団を含む,すべての人口集団における性能低下という代償を伴うことを示す。我々は,デバイアス手法は,保護対象の集団に害を与えないという制約のもとで,良好な下流性能を持つべきだと主張する。

Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias and cause unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias in benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in the downstream tasks. We propose xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should have good downstream performance with the constraint of ensuring no harm to the protected group.

翻訳日:2023-11-14 19:48:30 公開日:2023-11-11

# OpinSummEval: 意見要約のための自動評価の再検討

OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization ( http://arxiv.org/abs/2310.18122v2 )

ライセンス: Link先を確認

Yuchen Shen, Xiaojun Wan

(参考訳) 意見要約は、アスペクトと感情に特有の焦点を当てる点で、他の種類の要約タスクとは一線を画す。 ROUGEのような一部の自動評価手法が人気を博しているが、意見要約の質を評価する尺度としては信頼性が低いことがわかった。本稿では,人間の判断と14の意見要約モデルの出力からなるデータセットであるOpinSummEvalを提案する。さらに、4つの次元にわたって、24の自動評価指標と人間の評価との相関を調べる。以上の結果から,ニューラルネットワークに基づく指標は一般に非ニューラルな指標よりも優れていることが示唆された。しかしながら、BART や GPT-3/3.5 のような強力なバックボーン上に構築された指標でさえ、すべての次元にわたって一貫して良好に相関するわけではなく、意見要約のための自動評価手法の進歩の必要性を浮き彫りにしている。コードとデータはhttps://github.com/A-Chicharito-S/OpinSummEval/tree/mainで公開されている。

Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are publicly available at https://github.com/A-Chicharito-S/OpinSummEval/tree/main.

Translated: 2023-11-14 19:47:33 | Published: 2023-11-11

# Prospects for thermalization of microwave-shielded ultracold molecules

http://arxiv.org/abs/2310.17812v2

License: see link

Reuben R. W. Wang and John L. Bohn

We study anisotropic thermalization in dilute gases of microwave-shielded polar molecular fermions. For collision energies above the threshold regime, we find that thermalization is suppressed due to a strong preference for forward scattering and a reduction in total cross section with energy, significantly reducing the efficiency of evaporative cooling. We perform close-coupling calculations on the effective potential energy surface derived by Deng et al. [Phys. Rev. Lett. 130, 183001 (2023)] to obtain accurate two-body elastic differential cross sections across a range of collision energies. We use Gaussian process regression to obtain a global representation of the differential cross section over a wide range of collision angles and energies. The route to equilibrium is then analyzed with cross-dimensional rethermalization experiments, quantified by a measure of collisional efficiency toward achieving thermalization.

Translated: 2023-11-14 19:47:19 | Published: 2023-11-11

# Observation of the Knot Topology of Non-Hermitian Systems in a Single Spin

http://arxiv.org/abs/2311.03642v2

License: see link

Yang Wu, Yunhan Wang, Xiangyu Ye, Wenquan Liu, Chang-Kui Duan, Ya Wang, Xing Rong, and Jiangfeng Du

The non-Hermiticity of the system gives rise to distinct knot topology that has no Hermitian counterpart. Here, we report a comprehensive study of the knot topology in gapped non-Hermitian systems based on the universal dilation method with a long-coherence-time nitrogen-vacancy center in a $^{12}$C isotope-purified diamond. Both the braiding patterns of energy bands and the eigenstate topology are revealed. Furthermore, the global biorthogonal Berry phase related to the eigenstate topology has been successfully observed, which identifies the topological invariance for the non-Hermitian system. Our method paves the way for further exploration of the interplay among band braiding, eigenstate topology and symmetries in non-Hermitian quantum systems.

Translated: 2023-11-14 19:39:55 | Published: 2023-11-11

# Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead

http://arxiv.org/abs/2311.02782v2

License: see link

Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, and Weiming Shen

Anomaly detection is a crucial task across different domains and data types. However, existing anomaly detection models are often designed for specific domains and modalities. This study explores the use of GPT-4V(ision), a powerful visual-linguistic model, to address anomaly detection tasks in a generic manner. We investigate the application of GPT-4V in multi-modality, multi-domain anomaly detection tasks, including image, video, point cloud, and time series data, across multiple application areas, such as industrial, medical, logical, video, 3D anomaly detection, and localization tasks. To enhance GPT-4V's performance, we incorporate different kinds of additional cues such as class information, human expertise, and reference images as prompts. Based on our experiments, GPT-4V proves to be highly effective in detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection. This enables accurate differentiation between normal and abnormal instances. Although we conducted extensive evaluations in this study, there is still room for future evaluation to further exploit GPT-4V's generic anomaly detection capacity from different aspects. These include exploring quantitative metrics, expanding evaluation benchmarks, incorporating multi-round interactions, and incorporating human feedback loops. Nevertheless, GPT-4V exhibits promising performance in generic anomaly detection and understanding, thus opening up a new avenue for anomaly detection.

Translated: 2023-11-14 19:39:44 | Published: 2023-11-11

# Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects

http://arxiv.org/abs/2311.02332v2

License: see link

Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, Olivier Gevaert and Arvind Rao

Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also questions practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on effective innovation and collaborative efforts to further the miss

Translated: 2023-11-14 19:38:08 | Published: 2023-11-11

# Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

http://arxiv.org/abs/2311.04850v2

License: see link

Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica

Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets. While most data decontamination efforts apply string matching (e.g., n-gram overlap) to remove benchmark data, we show that these methods are insufficient, and simple variations of test data (e.g., paraphrasing, translation) can easily bypass these decontamination measures. Furthermore, we demonstrate that if such variation of test data is not eliminated, a 13B model can easily overfit a test benchmark and achieve drastically high performance, on par with GPT-4. We validate such observations in widely used benchmarks such as MMLU, GSM8K, and HumanEval. To address this growing risk, we propose a stronger LLM-based decontamination method and apply it to widely used pre-training and fine-tuning datasets, revealing significant previously unknown test overlap. For example, in pre-training sets such as RedPajama-Data-1T and StarCoder-Data, we identified that 8-18% of the HumanEval benchmark overlaps. Interestingly, we also find such contamination in a synthetic dataset generated by GPT-3.5/4, suggesting a potential risk of unintentional contamination. We urge the community to adopt stronger decontamination approaches when using public benchmarks. Moreover, we call for the community to actively develop fresh one-time exams to evaluate models accurately. Our decontamination tool is publicly available at https://github.com/lm-sys/llm-decontaminator.
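To make the abstract's point about string matching concrete, the sketch below implements a simple word-level n-gram overlap check and applies it to a verbatim copy and a paraphrase of the same test item. It is only an illustration of the baseline technique the abstract criticizes, not the authors' decontamination tool; the function names, the trigram setting, and the example sentences are all assumptions.

```python
def ngrams(text, n):
    """Set of word-level n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(train_sample, test_sample, n=3):
    """Fraction of the test sample's n-grams that also occur in the training sample."""
    test_grams = ngrams(test_sample, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(train_sample, n)) / len(test_grams)


# Hypothetical benchmark item and two training samples (illustrative only).
test_item = "write a function that returns the sum of two integers"
verbatim_copy = "write a function that returns the sum of two integers"
rephrased = "implement a routine which adds a pair of whole numbers and gives back the result"

exact = ngram_overlap(verbatim_copy, test_item)
para = ngram_overlap(rephrased, test_item)
print(f"verbatim copy overlap: {exact:.2f}")
print(f"rephrased overlap:     {para:.2f}")
```

The verbatim copy is flagged with full overlap, while the paraphrase shares no trigram with the test item and would pass an overlap filter untouched, even though it asks for the same solution. This is the gap that a semantic, LLM-based check is meant to close.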

Translated: 2023-11-14 19:26:34 | Published: 2023-11-11

# Learning the What and How of Annotation in Video Object Segmentation

http://arxiv.org/abs/2311.04414v2

License: see link

Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos

Video Object Segmentation (VOS) is crucial for several applications, from video editing to video data generation. Training a VOS model requires an abundance of manually labeled training videos. The de facto way of annotating objects requires humans to draw detailed segmentation masks on the target objects in each video frame. This annotation process, however, is tedious and time-consuming. To reduce this annotation cost, in this paper, we propose EVA-VOS, a human-in-the-loop annotation framework for video object segmentation. Unlike the traditional approach, we introduce an agent that predicts iteratively both which frame ("What") to annotate and which annotation type ("How") to use. Then, the annotator annotates only the selected frame that is used to update a VOS module, leading to significant gains in annotation time. We conduct experiments on the MOSE and the DAVIS datasets and we show that: (a) EVA-VOS leads to masks with accuracy close to the human agreement 3.5x faster than the standard way of annotating videos; (b) our frame selection achieves state-of-the-art performance; (c) EVA-VOS yields significant performance gains in terms of annotation time compared to all other methods and baselines.

Translated: 2023-11-14 19:24:35 | Published: 2023-11-11

# Pseudorandom Isometries

http://arxiv.org/abs/2311.02901v3

License: see link

Prabhanjan Ananth, Aditya Gulati, Fatih Kaleoglu, Yao-Ting Lin

We introduce a new notion called ${\cal Q}$-secure pseudorandom isometries (PRI). A pseudorandom isometry is an efficient quantum circuit that maps an $n$-qubit state to an $(n+m)$-qubit state in an isometric manner. In terms of security, we require that the output of a $q$-fold PRI on $\rho$, for $\rho \in {\cal Q}$ and any polynomial $q$, should be computationally indistinguishable from the output of a $q$-fold Haar isometry on $\rho$. By fine-tuning ${\cal Q}$, we recover many existing notions of pseudorandomness. We present a construction of PRIs and, assuming post-quantum one-way functions, prove the security of ${\cal Q}$-secure pseudorandom isometries for different interesting settings of ${\cal Q}$. We also demonstrate many cryptographic applications of PRIs, including length extension theorems for quantum pseudorandomness notions, message authentication schemes for quantum states, multi-copy secure public and private encryption schemes, and succinct quantum commitments.

Translated: 2023-11-14 19:22:49 | Published: 2023-11-11

# THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech

http://arxiv.org/abs/2311.06446v1

License: see link

Saad Almohaimeed, Saleh Almohaimeed, Ashfaq Ali Shafin, Bogdan Carbunar and Ladislau Bölöni

Detecting harmful content on social media, such as Twitter, is made difficult by the fact that the seemingly simple yes/no classification conceals a significant amount of complexity. Unfortunately, while several datasets have been collected for training classifiers in hate and offensive speech, there is a scarcity of datasets labeled with a finer granularity of target classes and specific targets. In this paper, we introduce THOS, a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message. We demonstrate that this dataset makes it feasible to train classifiers, based on Large Language Models, to perform classification at this level of granularity.

Translated: 2023-11-14 18:49:30 | Published: 2023-11-11

# Eigenvalue analysis of three-state quantum walks with general coin matrices

http://arxiv.org/abs/2311.06468v1

License: see link

Jirô Akahori, Chusei Kiumi, Norio Konno, Takuya Watanabe

Mathematical analysis of the existence of eigenvalues is vital, as it corresponds to the occurrence of localization, an exceptionally important property of quantum walks. Previous studies have demonstrated that eigenvalue analysis utilizing the transfer matrix proves beneficial for space-inhomogeneous three-state quantum walks with a specific class of coin matrices, including Grover matrices. In this research, we turn our attention to the transfer matrix of three-state quantum walks with a general coin matrix. Building upon previous research methodologies, we dive deeper into investigating the properties of the transfer matrix and employ numerical analysis to derive eigenvalues for models that were previously unanalyzable.
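The abstract mentions Grover matrices as one class of coin matrices for three-state walks. For readers unfamiliar with the term, the sketch below builds the standard 3x3 Grover coin, $G = \tfrac{2}{3}J - I$ with $J$ the all-ones matrix, and checks numerically that it is unitary. It illustrates only this well-known coin; it does not implement the transfer-matrix analysis from the paper, and the helper names are hypothetical.

```python
def grover_coin(n=3):
    """Grover coin G = (2/n) * J - I, where J is the n x n all-ones matrix."""
    return [[2.0 / n - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def times_transpose(m):
    """Return m @ m^T for a real square matrix given as nested lists."""
    size = len(m)
    return [[sum(m[i][k] * m[j][k] for k in range(size)) for j in range(size)]
            for i in range(size)]


def is_identity(m, tol=1e-12):
    """Check that a square matrix equals the identity up to a tolerance."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))


G = grover_coin(3)
# A real matrix is unitary exactly when G @ G^T is the identity.
unitary = is_identity(times_transpose(G))
print(G[0])
print("unitary:", unitary)
```

Each row has -1/3 on the diagonal and 2/3 elsewhere. A "general coin matrix" in the sense of the abstract would replace `G` with an arbitrary 3x3 unitary.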

Translated: 2023-11-14 18:36:21 | Published: 2023-11-11

# Adaptive Language-based Mental Health Assessment with Item-Response Theory

http://arxiv.org/abs/2311.06467v1

License: see link

Vasudha Varadarajan, Sverker Sikström, Oscar N.E. Kjell and H. Andrew Schwartz

Mental health issues vary widely across individuals; the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneous nature by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment: the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement/scoring: classical test theory (CTT) and item response theory (IRT). We find that using adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, from 11 questions in total to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT), and a supervised actor-critic based model. While both of the models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with lower numbers of questions (e.g. achieves Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or overhead computational costs.
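The adaptive-testing idea in the abstract (choosing which question to ask next from the current score estimate) can be illustrated with textbook item response theory. The sketch below uses the standard two-parameter logistic (2PL) model and picks the unasked item with the highest Fisher information at the current ability estimate. This is a generic IRT illustration and not the ALIRT method, which works on language responses. The item parameters and names are hypothetical.

```python
import math


def p_endorse(theta, a, b):
    """2PL model: probability of endorsing an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, items, asked):
    """Pick the not-yet-asked item that is most informative at the current estimate."""
    candidates = [name for name in items if name not in asked]
    return max(candidates, key=lambda name: item_information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
items = {"q1": (1.0, -1.0), "q2": (2.0, 0.0), "q3": (1.5, 1.5)}

first = next_item(0.0, items, asked=set())      # start from an average estimate
second = next_item(1.4, items, asked={first})   # estimate moved up after the first answer
print(first, second)
```

With these parameters the procedure asks `q2` first (highest discrimination near the starting estimate) and then `q3`, whose difficulty is closest to the updated estimate. Asking the most informative questions first is what lets an adaptive test reach a given validity with fewer questions.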

Translated: 2023-11-14 18:36:09 | Published: 2023-11-11

# Electronic Communication Data Link Encryption Simulation Based on Wireless Communication

http://arxiv.org/abs/2311.06462v1

License: see link

Rulin Bai

In order to improve the simulation effect of electronic communication data link encryption, the author proposes a solution based on wireless communication. Grounded in research on wireless communication, the approach improves the elliptic curve cryptographic algorithm to build a system encryption model, obtains legal and valid node private keys, evaluates and analyzes the relevant security attributes of the system, verifies the security of the keys, and realizes encryption optimization for wireless network communication. Experimental results show that using the improved elliptic curve to simulate system data chain encryption under the certificateless public key cryptosystem in network communication takes only 2.31 milliseconds, which is lower than other algorithms. Conclusion: the technology research based on wireless communication can effectively improve the encryption simulation effect of the electronic communication data link.

Translated: 2023-11-14 18:35:38 | Published: 2023-11-11

# Online Continual Learning via Logit Adjusted Softmax

http://arxiv.org/abs/2311.06460v1

License: see link

Zhehao Huang, Tao Li, Chenhe Yuan, Yingwen Wu, Xiaolin Huang

Online continual learning is a challenging problem where models must learn from a non-stationary data stream while avoiding catastrophic forgetting. Inter-class imbalance during training has been identified as a major cause of forgetting, leading to model prediction bias towards recently learned classes. In this paper, we show theoretically that inter-class imbalance is entirely attributed to imbalanced class-priors, and that the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. To that end, we show that a simple adjustment of model logits during training can effectively resist prior class bias and pursue the corresponding Bayes-optimum. Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups, with little additional computational cost. We evaluate our approach on various benchmarks and demonstrate significant performance improvements compared to prior art. For example, our approach improves the best baseline by 4.6% on CIFAR10.
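The "simple adjustment of model logits during training" can be sketched as follows, assuming the common form of logit adjustment in which a scaled log class-prior is added to each logit before the softmax cross-entropy. How the paper estimates priors on a non-stationary stream is not stated in the abstract, so the label counts, the temperature `tau`, and the function names below are all assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def logit_adjusted_loss(logits, label, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior), with priors from label counts.

    Assumes every entry of class_counts is positive.
    """
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total) for z, c in zip(logits, class_counts)]
    return -log_softmax(adjusted)[label]


# Hypothetical two-class case: class 0 dominates the recent stream 9:1.
logits = [2.0, 2.0]
counts = [90, 10]

plain = -log_softmax(logits)[1]                     # ordinary cross-entropy for the rare class
adjusted = logit_adjusted_loss(logits, 1, counts)   # loss after adding the log-prior shift
print(f"plain: {plain:.3f}  adjusted: {adjusted:.3f}")
```

With equal raw logits, the plain loss for the rare class is ln 2, while the adjusted loss is ln 10. The training signal therefore pushes the raw logit of the under-represented class higher, which counteracts the bias toward classes that dominate the recent stream. At inference time the raw, unadjusted logits would be used.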

Translated: 2023-11-14 18:35:23 | Published: 2023-11-11

# Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding

http://arxiv.org/abs/2311.06456v1

License: see link

Hao Xu, Yifei Wang, Yunrui Li, Pengyu Hong

The versatility of multimodal deep learning holds tremendous promise for advancing scientific research and practical applications. As this field continues to evolve, the collective power of cross-modal analysis promises to drive transformative innovations, leading us to new frontiers in chemical understanding and discovery. Hence, we introduce Asymmetric Contrastive Multimodal Learning (ACML) as a novel approach tailored for molecules, showcasing its potential to advance the field of chemistry. ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders and a shallow-designed graph encoder, ACML facilitates the assimilation of coordinated chemical semantics from different modalities, leading to comprehensive representation learning with efficient training. This innovative framework enhances the interpretability of learned representations and bolsters the expressive power of graph neural networks. Through practical tasks such as isomer discrimination and uncovering crucial chemical properties for drug discovery, ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of chemical semantics of different modalities.

Translated: 2023-11-14 18:35:06 | Published: 2023-11-11

# Aria-NeRF: Multimodal Egocentric View Synthesis

http://arxiv.org/abs/2311.06455v1

License: see link

Jiankai Sun, Jianing Qiu, Chuanyang Zheng, John Tucker, Javier Yu, Mac Schwager

We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics.

Translated: 2023-11-14 18:34:42 | Published: 2023-11-11

# A Saliency-based Clustering Framework for Identifying Aberrant Predictions

http://arxiv.org/abs/2311.06454v1

License: see link

Aina Tersol Montserrat, Alexander R. Loftus, Yael Daihes

In machine learning, classification tasks serve as the cornerstone of a wide range of real-world applications. Reliable, trustworthy classification is particularly intricate in biomedical settings, where the ground truth is often inherently uncertain and relies on high degrees of human expertise for labeling. Traditional metrics such as precision and recall, while valuable, are insufficient for capturing the nuances of these ambiguous scenarios. Here we introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency. We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions. Our framework demonstrates a substantial improvement in model performance, achieving a 20% increase in precision. We apply this methodology to the less-explored domain of veterinary radiology, where the stakes are high but which has not been studied as extensively as human medicine. By focusing on the identification and mitigation of aberrant predictions, we enhance the utility and trustworthiness of machine learning classifiers in high-stakes, real-world scenarios, including new applications in the veterinary world.

Translated: 2023-11-14 18:34:07 | Published: 2023-11-11

# DocGen: Generating Detailed Parameter Docstrings in Python

http://arxiv.org/abs/2311.06453v1

License: see link

Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Fatemeh H. Fard, Melina Vidoni, Zadia Codabux

Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level summary. However, generating such a summary is too intricate for a single generative model to produce reliably due to the lack of high-quality training data. Thus, we propose a multi-step approach that combines multiple task-specific models, each adept at producing a specific section of a docstring. The combination of these models ensures the inclusion of each section in the final docstring. We compared the results from our approach with existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers, which proves the superiority of our approach over existing methods.

Translated: 2023-11-14 18:33:49 | Published: 2023-11-11

# Mitigating Pooling Bias in E-commerce Search via False Negative Estimation

http://arxiv.org/abs/2311.06444v1

License: see link

Xiaochen Wang, Xiao Xiao, Ruhan Zhang, Xuan Zhang, Taesik Na, Tejaswi Tenneti, Haixun Wang and Fenglong Ma

Efficient and accurate product relevance assessment is critical for user experiences and business success. Training a proficient relevance assessment model requires high-quality query-product pairs, often obtained through negative sampling strategies. Unfortunately, current methods introduce pooling bias by mistakenly sampling false negatives, diminishing performance and business impact. To address this, we present Bias-mitigating Hard Negative Sampling (BHNS), a novel negative sampling strategy tailored to identify and adjust for false negatives, building upon our original False Negative Estimation algorithm. Our experiments in the Instacart search setting confirm BHNS as effective for practical e-commerce use. Furthermore, comparative analyses on a public dataset showcase its domain-agnostic potential for diverse applications.

Translated: 2023-11-14 18:33:38 | Published: 2023-11-11

# CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

http://arxiv.org/abs/2311.06443v1

License: see link

Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie

(参考訳) パーソナライズされたアニメーション可能な頭部アバターの再構成は、AR/VRの分野で重要な意味を持つ。 3D Morphable Model(3DMM)による明示的な顔制御を実現する既存の方法は、通常、単一の対象の多視点画像やビデオに依存しており、再構成プロセスは複雑である。さらに、従来のレンダリングパイプラインは時間がかかり、リアルタイムアニメーションの可能性を制限する。本稿では、点ベースのニューラルレンダリングを用いて単一の参照画像から制御可能なニューラル頭部アバターを生成する新しいアプローチであるCVTHeadを提案する。 CVTHeadは、メッシュのスパース頂点をポイントセットとみなし、提案したVertex-feature Transformerを使用して各頂点のローカル特徴記述子を学習する。これにより、すべての頂点間の長距離依存性のモデリングが可能になる。 VoxCelebデータセットの実験結果は、CVTHeadが最先端のグラフィックスベースの手法と同等のパフォーマンスを達成することを示した。さらに、表情、頭部ポーズ、カメラビューの異なる新規な人間の頭部の効率的なレンダリングを可能にする。これらの属性は3DMMの係数を使って明示的に制御でき、リアルタイムシナリオで多用途かつリアルなアニメーションが容易になる。

Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR. Existing methods for achieving explicit face control of 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, making the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation possibilities. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead considers the sparse vertices of the mesh as the point set and employs the proposed Vertex-feature Transformer to learn local feature descriptors for each vertex. This enables the modeling of long-range dependencies among all the vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves comparable performance to state-of-the-art graphics-based methods. Moreover, it enables efficient rendering of novel human heads with various expressions, head poses, and camera views. These attributes can be explicitly controlled using the coefficients of 3DMMs, facilitating versatile and realistic animation in real-time scenarios.

翻訳日:2023-11-14 18:33:25 公開日:2023-11-11

# ChaffからBREADでWheatを分離する - テキストの冗長性を検出するためのオープンソースのベンチマークとメトリクス

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text ( http://arxiv.org/abs/2311.06440v1 )

ライセンス: Link先を確認

Isaac Caswell, Lisa Wang, Isabel Papadimitriou

(参考訳) データ品質は、タスク、ドメイン、アーキテクチャに関係なく、NLPの分野全体で絶えず再浮上する問題であり、低リソース言語では特に深刻である。トレーニングデータとモデル出力の両方に影響を及ぼす典型的かつ潜在的な問題は、反復的で、価格カタログやコンピュータ生成ログファイルのような言語的に興味のないボイラープレートに支配されたデータである。この問題は多くのWebスクレイプコーパスに浸透しているが、テスト対象となるベンチマークや、言語をまたいで一般化しデータ品質に関する人間の判断と一致する単純なメトリクスを見つけるための体系的な研究はまだない。本研究では、反復的なボイラープレートと妥当な言語コンテンツを区別する、360言語にまたがる人手ラベル付きベンチマークであるBREADを作成・公開する。あわせて、いくつかのベースラインとなるCRED(Character REDundancy)スコアを公開し、BREAD上でその有効性を評価する。コミュニティがこのリソースを用いてより優れたフィルタリング手法を開発し、CREDスコアのリファレンス実装が標準的なコーパス評価ツールとなって、特に低リソース言語におけるよりクリーンな言語モデリングコーパスの開発を促進することを願っている。

Data quality is a problem that perpetually resurfaces throughout the field of NLP, regardless of task, domain, or architecture, and remains especially severe for lower-resource languages. A typical and insidious issue, affecting both training data and model output, is data that is repetitive and dominated by linguistically uninteresting boilerplate, such as price catalogs or computer-generated log files. Though this problem permeates many web-scraped corpora, there has yet to be a benchmark to test against, or a systematic study to find simple metrics that generalize across languages and agree with human judgements of data quality. In the present work, we create and release BREAD, a human-labeled benchmark on repetitive boilerplate vs. plausible linguistic content, spanning 360 languages. We release several baseline CRED (Character REDundancy) scores along with it, and evaluate their effectiveness on BREAD. We hope that the community will use this resource to develop better filtering methods, and that our reference implementations of CRED scores can become standard corpus evaluation tools, driving the development of cleaner language modeling corpora, especially in low-resource languages.
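As a rough illustration of what a character-level redundancy score can look like, the sketch below uses the zlib compression ratio as a proxy: highly repetitive boilerplate compresses far better than ordinary prose. This is an assumption made for illustration only; the abstract does not define the CRED scores, and `compression_redundancy` is a hypothetical name, not part of the BREAD release.

```python
import zlib


def compression_redundancy(text: str) -> float:
    """Return 1 - compressed_size / raw_size, clipped to [0, 1].

    Hypothetical stand-in for a character-redundancy score;
    it is NOT the CRED definition from the paper.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    # Very short inputs can grow under compression, hence the clip at 0.
    return max(0.0, 1.0 - len(compressed) / len(raw))


# Repetitive catalog-style boilerplate versus a sentence of ordinary prose.
boilerplate = "Price: $9.99 | In stock | Add to cart\n" * 50
prose = ("Data quality is a problem that perpetually resurfaces throughout "
         "the field of NLP, regardless of task, domain, or architecture.")

if __name__ == "__main__":
    print("boilerplate:", round(compression_redundancy(boilerplate), 3))
    print("prose:      ", round(compression_redundancy(prose), 3))
```

A score like this is language-agnostic because it works on bytes, but it is sensitive to text length, so any threshold would need to be calibrated against labeled examples.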

翻訳日:2023-11-14 18:33:05 公開日:2023-11-11

# 動的システムの制御強化のための制御可能性制約付きディープネットワークモデル

Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems ( http://arxiv.org/abs/2311.06438v1 )

ライセンス: Link先を確認

Suruchi Sharma, Volodymyr Makarenko, Gautam Kumar, Stas Tiomkin

(参考訳) ダイナミクスの知識を持たない力学系の制御は重要かつ困難な課題である。ディープニューラルネットワーク(DNN)のような現代の機械学習アプローチは、制御入力と対応する状態観測出力から動的モデルを推定することを可能にする。このようなデータ駆動モデルは、しばしばモデルベースのコントローラの導出に利用される。しかし一般に、DNNで表されるモデルが、制御理論における形式的な意味で制御可能である保証はなく、これは効果的なコントローラの設計にとって重要である。このことは、形式的な制御可能性の保証が必要なアプリケーションにおけるDNN推定モデルの使用をしばしば妨げる。この概念実証研究では、データから推定されたモデルに制御可能性を明示的に付与する制御理論的手法を提案する。これは、制御可能性の度合いが低いモデルにペナルティを与える制御可能性制約をモデル推定の目的関数に加えることで達成される。その結果、提案する制御可能性制約のもとで推定されたモデルは、より効率的なコントローラの導出を可能にし、制御理論的な量によって解釈可能であり、長期予測誤差も低い。提案手法は、未知のダイナミクスのDNNに基づく推定と、解の性質に関する制御理論的保証との関係について新たな知見を提供する。低解像度かつ高次元の画像によって状態観測が与えられる2つの標準的な古典制御系において、提案手法の優位性を示す。

Control of a dynamical system without the knowledge of dynamics is an important and challenging task. Modern machine learning approaches, such as deep neural networks (DNNs), allow for the estimation of a dynamics model from control inputs and corresponding state observation outputs. Such data-driven models are often utilized for the derivation of model-based controllers. However, in general, there are no guarantees that a model represented by DNNs will be controllable according to the formal control-theoretical meaning of controllability, which is crucial for the design of effective controllers. This often precludes the use of DNN-estimated models in applications where formal controllability guarantees are required. In this proof-of-concept work, we propose a control-theoretical method that explicitly enhances models estimated from data with controllability. This is achieved by augmenting the model estimation objective with a controllability constraint, which penalizes models with a low degree of controllability. As a result, the models estimated with the proposed controllability constraint allow for the derivation of more efficient controllers, are interpretable in terms of control-theoretical quantities, and have a lower long-term prediction error. The proposed method provides new insights on the connection between the DNN-based estimation of unknown dynamics and the control-theoretical guarantees of the solution properties. We demonstrate the superiority of the proposed method in two standard classical control systems with state observation given by low-resolution, high-dimensional images.
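The abstract does not spell out its controllability measure. As a hedged illustration of what formal controllability means in the simplest setting, the sketch below applies the Kalman rank condition to a discrete-time linear system x[t+1] = A x[t] + B u[t]: the system is controllable when the matrix [B, AB, ..., A^(n-1)B] has rank n. All function names are hypothetical, and this binary test is not the paper's constraint, which targets DNN-estimated models and penalizes a low degree of controllability.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Matrix rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def controllability_matrix(A, B):
    """Concatenate the blocks B, AB, ..., A^(n-1)B column-wise."""
    n = len(A)
    blocks, current = [], B
    for _ in range(n):
        blocks.append(current)
        current = matmul(A, current)
    return [sum((block[i] for block in blocks), []) for i in range(n)]


def is_controllable(A, B):
    """Kalman rank condition: full row rank of the controllability matrix."""
    return rank(controllability_matrix(A, B)) == len(A)


if __name__ == "__main__":
    # Double integrator: the input drives velocity, which drives position.
    print(is_controllable([[1, 1], [0, 1]], [[0], [1]]))
    # Identity dynamics with input on the first state only.
    print(is_controllable([[1, 0], [0, 1]], [[1], [0]]))
```

A rank test only gives a yes/no answer. A differentiable penalty on the degree of controllability, as the abstract describes, would need a continuous quantity instead; the choice of that quantity is left open here.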

翻訳日:2023-11-14 18:32:43 公開日:2023-11-11

# 公平へのステップバイステップ:タスク指向対話システムにおける社会バイアスの帰属

Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems ( http://arxiv.org/abs/2311.06513v1 )

ライセンス: Link先を確認

Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel

(参考訳) 近年、事前学習された大規模言語モデル(LLM)をエンドツーエンドで活用することにより、タスク指向対話(TOD)システムに大幅な改善が見られた。しかし、TODシステムの各コンポーネントの偏った挙動や、エンドツーエンドフレームワークにおけるエラー伝搬の問題により、TODの応答に深刻なバイアスが生じる可能性がある。フェアネスに関する既存の研究は、システム全体のバイアスにのみ焦点を当てている。本論文では、TODシステムの各コンポーネントにバイアスを帰属させる診断手法を提案する。提案する帰属手法により、バイアスの発生源をより深く理解できる。さらに、研究者はより細かい粒度でモデルの偏った挙動を緩和できる。性別、年齢、人種の3つの人口統計軸についてTODシステムのバイアスを帰属させる実験を行った。実験結果から、TODシステムのバイアスは通常、応答生成モデルから生じることが示された。

Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing work on fairness focuses only on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias toward three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.

翻訳日:2023-11-14 18:24:03 公開日:2023-11-11

# CNNモデル伝搬を用いたバンド単位のハイパースペクトル画像パンシャープニング

Band-wise Hyperspectral Image Pansharpening using CNN Model Propagation ( http://arxiv.org/abs/2311.06510v1 )

ライセンス: Link先を確認

Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giuseppe Scarpa

(参考訳) ハイパースペクトルパンシャープニングは、多数の研究論文やチャレンジが示すように、ここ数年で関心が高まっている。これは、低解像度のハイパースペクトルデータキューブと、高解像度の単一バンド画像であるパンクロマティック画像とをピクセルレベルで融合し、パンクロマティック解像度のハイパースペクトルデータキューブを得ることを目的とする。強力な表現能力のおかげで、ディープラーニングモデルは多くの汎用画像処理タスクで前例のない結果をもたらしてきた。しかし、本件のようなドメイン固有の問題に移ると、いくつかの文脈上の理由から、従来のモデルベースのアプローチに対する優位性はそれほど明確ではなくなる。トレーニングデータの不足、グラウンドトゥルースの欠如、データ形状の変動は、ハイパースペクトルパンシャープニングのための最先端ディープラーニングネットワークの一般化能力を制限する要因の一部である。これらの制約に対処するため、本研究では、単純な単一バンドの教師なしパンシャープニングモデルを継承し、それを逐次的なバンド単位の適応スキームに組み込んだ新しいディープラーニング手法を提案する。このスキームでは、直前のバンドで調整されたモデルを精緻化しながら各バンドをパンシャープニングする。これにより、単純なモデルが波長次元に沿って適応的かつ柔軟に伝播され、スペクトルバンド数を固定する必要も、大規模で高価なラベル付きトレーニングデータセットを用意する必要もない。提案手法は我々のデータセットで非常に良好な結果を達成し、従来手法とディープラーニングの参照手法の両方を上回る。提案手法の実装はhttps://github.com/giu-guarino/R-PNNで確認できる。

Hyperspectral pansharpening has received growing interest over the last few years, as testified by a large number of research papers and challenges. It consists of a pixel-level fusion between a lower-resolution hyperspectral datacube and a higher-resolution single-band image, the panchromatic image, with the goal of providing a hyperspectral datacube at panchromatic resolution. Thanks to their powerful representational capabilities, deep learning models have succeeded in providing unprecedented results on many general-purpose image processing tasks. However, when moving to domain-specific problems, as in this case, the advantages with respect to traditional model-based approaches are much less clear-cut due to several contextual reasons. Scarcity of training data, lack of ground truth, and data shape variability are some of the factors that limit the generalization capacity of state-of-the-art deep learning networks for hyperspectral pansharpening. To cope with these limitations, in this work we propose a new deep learning method that inherits a simple single-band unsupervised pansharpening model nested in a sequential band-wise adaptive scheme, where each band is pansharpened by refining the model tuned on the preceding one. By doing so, a simple model is propagated along the wavelength dimension, adaptively and flexibly, with no need for a fixed number of spectral bands and no need for large, expensive, labeled training datasets. The proposed method achieves very good results on our datasets, outperforming both traditional and deep learning reference methods. The implementation of the proposed method can be found at https://github.com/giu-guarino/R-PNN

翻訳日:2023-11-14 18:23:49 公開日:2023-11-11

# CompCodeVet: コードデータセットに対するコンパイラ誘導検証と拡張アプローチ

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset ( http://arxiv.org/abs/2311.06505v1 )

ライセンス: Link先を確認

Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Bin Lei, Ali Jannesari

(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションで顕著な性能を持つため、学術や産業でますます顕著になっている。これらのモデルがパラメータの増加とともに進化するにつれて、感情分析や機械翻訳といったタスクに優れている。しかし、数十億のパラメータを持つモデルでさえ、マルチステップ推論を必要とするタスクの課題に直面している。コード生成と理解、特にCとC++は、大きな課題として現れます。コードデータセットでトレーニングされたLLMは、多くのタスクで能力を示すが、コンパイル不可能なCとC++のコードの修正に苦労している。当社の調査では,この部分的なパフォーマンスを,トレーニングデータセットの品質と,複雑な推論を必要とする問題の固有の複雑性という,2つの主要な要因に当てはめています。既存の"Chain of Thought"(CoT)促進技術は、多段階推論を強化することを目的としている。しかし、このアプローチはLLMの潜在的な欠点に関連する制限を保っている。本研究では,コンパイル不能なコードからコンパイル可能なコードを生成するコンパイラ誘導型CoTアプローチであるCompCodeVetを提案する。より大規模なLLMを利用する従来のアプローチとは違い,より堅牢なゼロショット思考プロセスを確立するために,コンパイラを教師として採用している。 2つのオープンソースコードデータセットに対するCompCodeVetの評価は、CompCodeVetがLLMのトレーニングデータセット品質を改善する能力を持っていることを示している。

Large language models (LLMs) have become increasingly prominent in academia and industry due to their remarkable performance in diverse applications. As these models evolve with increasing parameters, they excel in tasks like sentiment analysis and machine translation. However, even models with billions of parameters face challenges in tasks demanding multi-step reasoning. Code generation and comprehension, especially in C and C++, emerge as significant challenges. While LLMs trained on code datasets demonstrate competence in many tasks, they struggle with rectifying non-compilable C and C++ code. Our investigation attributes this subpar performance to two primary factors: the quality of the training dataset and the inherent complexity of the problem which demands intricate reasoning. Existing "Chain of Thought" (CoT) prompting techniques aim to enhance multi-step reasoning. This approach, however, retains the limitations associated with the latent drawbacks of LLMs. In this work, we propose CompCodeVet, a compiler-guided CoT approach to produce compilable code from non-compilable ones. Diverging from the conventional approach of utilizing larger LLMs, we employ compilers as a teacher to establish a more robust zero-shot thought process. The evaluation of CompCodeVet on two open-source code datasets shows that CompCodeVet has the ability to improve the training dataset quality for LLMs.
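The compiler-as-teacher idea can be sketched as a loop that alternates compile checks with repair attempts. The sketch below is a loose analogy under stated assumptions: Python's built-in `compile()` stands in for a C/C++ compiler, and the `repair` callable stands in for an LLM prompted with the diagnostic. `compiler_feedback`, `repair_loop`, and `toy_repair` are hypothetical names and do not reflect CompCodeVet's actual implementation.

```python
def compiler_feedback(source):
    """Return None if the source compiles, otherwise a short diagnostic string."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"


def repair_loop(source, repair, max_rounds=3):
    """Alternate compile checks and repairs until the code compiles.

    Returns (fixed_source_or_None, list_of_diagnostics_seen).
    """
    history = []
    for _ in range(max_rounds):
        diagnostic = compiler_feedback(source)
        if diagnostic is None:
            return source, history
        history.append(diagnostic)
        # The diagnostic is the "teacher" signal passed to the repair step.
        source = repair(source, diagnostic)
    final_ok = compiler_feedback(source) is None
    return (source if final_ok else None), history


def toy_repair(source, diagnostic):
    """Toy stand-in for a model call: adds the colon missing after 'def f()'."""
    return source.replace("def f()\n", "def f():\n")


if __name__ == "__main__":
    broken = "def f()\n    return 1\n"
    fixed, seen = repair_loop(broken, toy_repair)
    print(fixed)
    print(len(seen), "diagnostic(s) used")
```

Keeping the diagnostics in `history` mirrors the idea that each compiler message becomes one step of the reasoning chain, and it also makes the process easy to log when curating a dataset.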

翻訳日:2023-11-14 18:23:21 公開日:2023-11-11

# 産業欠陥の視覚検査のための自己教師付きコンテキスト学習

Self-supervised Context Learning for Visual Inspection of Industrial Defects ( http://arxiv.org/abs/2311.06504v1 )

ライセンス: Link先を確認

Peng Wang, Haiming Yao, Wenyong Yu

(参考訳) 産業製品における欠陥の教師なし視覚検査は、製品表面の変化が大きいため重大な課題となる。現在の教師なしモデルは、テクスチャ欠陥とオブジェクト欠陥の検出のバランスを取ることが難しく、潜在表現や複雑な特徴を識別する能力に欠けている。本稿では、よく知られたジグソーパズル課題を解くことで最適なエンコーダを導出する、新しい自己教師型学習アルゴリズムを提案する。本手法では、対象画像を9つのパッチに分割し、任意の2つのパッチ間の相対的な位置関係をエンコーダに予測させることで、豊かなセマンティクスを抽出する。次に、正常な潜在表現と異常な潜在表現の差異を強調する親和性拡張法を導入する。古典的なサポートベクトルデータ記述アルゴリズムを用いて最終的な検出結果を得る。実験結果から、提案手法は広く使用されているMVTec ADデータセットにおいて、検出で95.8%、セグメンテーションで96.8%という優れた性能を達成し、テクスチャ欠陥とオブジェクト欠陥の両方で最先端のベンチマークを確立した。包括的な実験は、多様な産業応用における本アプローチの有効性を裏付けている。

The unsupervised visual inspection of defects in industrial products poses a significant challenge due to substantial variations in product surfaces. Current unsupervised models struggle to strike a balance between detecting texture and object defects, lacking the capacity to discern latent representations and intricate features. In this paper, we present a novel self-supervised learning algorithm designed to derive an optimal encoder by tackling the renowned jigsaw puzzle. Our approach involves dividing the target image into nine patches, tasking the encoder with predicting the relative position relationships between any two patches to extract rich semantics. Subsequently, we introduce an affinity-augmentation method to accentuate differences between normal and abnormal latent representations. Leveraging the classic support vector data description algorithm yields final detection results. Experimental outcomes demonstrate that our proposed method achieves outstanding detection and segmentation performance on the widely used MVTec AD dataset, with rates of 95.8% and 96.8%, respectively, establishing a state-of-the-art benchmark for both texture and object defects. Comprehensive experimentation underscores the effectiveness of our approach in diverse industrial applications.
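The pretext task described above, splitting an image into nine patches and predicting the relative position of any two, can be sketched as a data-preparation step. The code below is a minimal illustration that assumes a 3x3 grid and encodes the label as a (row offset, column offset) pair; the paper's exact label encoding and encoder are not given in the abstract, and the function names are hypothetical.

```python
from itertools import permutations


def split_into_patches(image, grid=3):
    """Split a 2D list (H x W) into grid * grid patches in row-major order."""
    height, width = len(image), len(image[0])
    ph, pw = height // grid, width // grid
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * ph:(gy + 1) * ph]
            patches.append([row[gx * pw:(gx + 1) * pw] for row in rows])
    return patches


def relative_position_pairs(grid=3):
    """List (index_a, index_b, (dy, dx)) for every ordered pair of distinct patches."""
    pairs = []
    for a, b in permutations(range(grid * grid), 2):
        ay, ax = divmod(a, grid)
        by, bx = divmod(b, grid)
        pairs.append((a, b, (by - ay, bx - ax)))
    return pairs


if __name__ == "__main__":
    # A 6x6 toy "image" whose pixel value encodes its own position.
    image = [[r * 6 + c for c in range(6)] for r in range(6)]
    patches = split_into_patches(image)
    pairs = relative_position_pairs()
    print(len(patches), "patches,", len(pairs), "training pairs")
```

An encoder would then receive the two patches of each pair and be trained to classify the offset label, which requires no manual annotation.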

翻訳日:2023-11-14 18:23:04 公開日:2023-11-11

# ドメイン固有の質問応答におけるLLMの知識的選好アライメント

Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering ( http://arxiv.org/abs/2311.06503v1 )

ライセンス: Link先を確認

Yichi Zhang, Zhuo Chen, Yin Fang, Lei Cheng, Yanxi Lu, Fangming Li, Wen Zhang, Huajun Chen

(参考訳) 近年、大規模言語モデル(LLM)の開発が学術界や産業界で広く注目を集めている。 LLMを実際のシナリオにデプロイすることは、現在のインターネット産業における重要な方向のひとつである。本稿では、ドメイン知識グラフ(KG)を組み込み、ドメイン固有の質問応答(QA)にLLMを適用する新しいパイプラインを提案する。現実世界のアプリケーションとして、LLMが生成するコンテンツは顧客に提供できるようユーザフレンドリーでなければならない。さらに、モデルは信頼できる回答を生成するためにドメイン知識を適切に利用する必要がある。バニラの微調整ではこれらに適切に対処できないため、この2つはLLM応用における大きな困難である。我々は、どちらの要件も、実用化のために人間との整合が必要なモデル選好の問題として統一できると考える。そこで、この2つの課題に対処するために、スタイル選好セットと知識選好セットという2種類の選好セットを構築するKnowledgeable Preference AlignmenT(KnowPAT)を提案する。さらに、LLMの選好を人間の選好に合わせる新たなアライメント目的関数を設計し、実シナリオのドメイン固有QAにおいて信頼性が高くユーザフレンドリーな回答を生成する、より良いLLMの訓練を目指す。十分な実験と15のベースライン手法との包括的な比較により、KnowPATはLLMを用いた実シナリオのドメイン固有QAにおいて優れたパイプラインであることが示された。私たちのコードはhttps://github.com/zjukg/KnowPATでオープンソースとして公開されている。

Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs to real scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline to apply LLMs for domain-specific question answering (QA) that incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. As a real-world application, the content generated by LLMs should be user-friendly to serve the customers. Additionally, the model needs to utilize domain knowledge properly to generate reliable answers. These two issues are the two major difficulties in the LLM application, as vanilla fine-tuning cannot adequately address them. We think both requirements can be unified as a model preference problem in which the model needs to be aligned with humans to achieve practical application. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets, a style preference set and a knowledge preference set, to tackle the two issues. In addition, we design a new alignment objective to align the LLM preference with human preference, aiming to train a better LLM for real-scenario domain-specific QA to generate reliable and user-friendly answers. Adequate experiments and comprehensive comparisons with 15 baseline methods demonstrate that our KnowPAT is a superior pipeline for real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT.

翻訳日:2023-11-14 18:22:43 公開日:2023-11-11

# DRUformer: 運転関係の自己理解による運転シーンの重要物体検出の強化

DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding ( http://arxiv.org/abs/2311.06497v1 )

ライセンス: Link先を確認

Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda

(参考訳) 交通事故はしばしば致命傷につながり、2023年までに5000万人以上の死者を出している。運転の危険を軽減し、個人の安全を確保するためには、走行中の重要な物体を車両が予測できるよう支援することが不可欠である。重要物体検出に関するこれまでの研究は、主に個々の参加者の重要性を評価し、それらを独立した実体として扱い、参加者間のつながりをしばしば見落としていた。残念ながら、このアプローチは複雑なシナリオでの重要物体の検出にはあまり効果的でないことが分かっている。そこで本研究では、重要物体検出タスクを強化するために設計された、Driving scene Relationship self-Understanding transformer(DRUformer)を提案する。 DRUformerはトランスフォーマーベースのマルチモーダル重要物体検出モデルであり、運転シナリオのすべての参加者間の関係を考慮に入れる。運転意図も運転中の重要物体の検出に大きく影響することを踏まえ、運転意図を埋め込むモジュールを組み込んだ。提案手法の性能を評価するために、DRAMAデータセットで比較実験を行い、他の最先端(SOTA)モデルと比較した。その結果、SOTA手法と比較してmIoUで16.2%、ACCで12.3%という顕著な向上が示された。さらに、さまざまな道路シナリオやクラスにわたって重要物体を検出するモデルの能力を定性的に分析し、多様な文脈における有効性を明らかにした。最後に、DRUformerモデルで提案するモジュールの効率を評価するため、さまざまなアブレーション実験を行った。

Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2% improvement in mIoU and a substantial 12.3% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model.

翻訳日:2023-11-14 18:22:18 公開日:2023-11-11

# LayoutPrompter: 大規模言語モデルの設計能力の覚醒

LayoutPrompter: Awaken the Design Ability of Large Language Models ( http://arxiv.org/abs/2311.06495v1 )

ライセンス: Link先を確認

Jiawei Lin, Jiaqi Guo, Shizhao Sun, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang

(参考訳) ユーザの制約を高品質なレイアウトに自動的にマッピングする条件付きグラフィックレイアウト生成は、今日広く注目を集めている。最近の研究は有望な性能を達成しているが、汎用性とデータ効率の欠如が実用化を妨げている。そこで本研究では、大規模言語モデル(LLM)を活用し、インコンテキスト学習によってこれらの問題に対処するLayoutPrompterを提案する。 LayoutPrompterは、入出力シリアライゼーション、動的例題選択、レイアウトランキングという3つの主要コンポーネントで構成される。具体的には、入出力シリアライゼーションコンポーネントは、各レイアウト生成タスクの入力および出力フォーマットを慎重に設計する。動的例題選択は、与えられた入力に対して最も有用なプロンプト用例題を選択する。そしてレイアウトランカーは、LLMの複数の出力から最高品質のレイアウトを選ぶために使用される。 4つの公開データセットを用いて、既存のすべてのレイアウト生成タスクで実験を行う。アプローチの単純さにもかかわらず、実験結果から、LayoutPrompterはモデルの訓練や微調整なしに、これらのタスクで最先端のアプローチと競合し、あるいはそれを上回ることが示された。これは、この汎用的でトレーニング不要なアプローチの有効性を示している。さらに、アブレーション研究により、LayoutPrompterは低データ領域においてトレーニングベースのベースラインよりも有意に優れており、LayoutPrompterのデータ効率がさらに示された。私たちのプロジェクトはhttps://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompterで利用可能である。

Conditional graphic layout generation, which automatically maps user constraints to high-quality layouts, has attracted widespread attention today. Although recent works have achieved promising performance, the lack of versatility and data efficiency hinders their practical applications. In this work, we propose LayoutPrompter, which leverages large language models (LLMs) to address the above problems through in-context learning. LayoutPrompter is made up of three key components, namely input-output serialization, dynamic exemplar selection and layout ranking. Specifically, the input-output serialization component meticulously designs the input and output formats for each layout generation task. Dynamic exemplar selection is responsible for selecting the most helpful prompting exemplars for a given input. And a layout ranker is used to pick the highest quality layout from multiple outputs of LLMs. We conduct experiments on all existing layout generation tasks using four public datasets. Despite the simplicity of our approach, experimental results show that LayoutPrompter can compete with or even outperform state-of-the-art approaches on these tasks without any model training or fine-tuning. This demonstrates the effectiveness of this versatile and training-free approach. In addition, the ablation studies show that LayoutPrompter is significantly superior to the training-based baseline in a low-data regime, further indicating the data efficiency of LayoutPrompter. Our project is available at https://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompter.
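The three components named above (input-output serialization, dynamic exemplar selection, layout ranking) can be sketched as three small functions. Everything in this sketch is an assumption made for illustration: the text format, the Jaccard similarity used for selection, and the overlap-based ranking score are placeholders, not the formats or measures used by LayoutPrompter, and the LLM call between selection and ranking is omitted.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of element types."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def select_exemplars(query_types, pool, k=2):
    """Pick the k pool entries whose element types best match the query."""
    return sorted(pool, key=lambda ex: jaccard(query_types, ex["types"]), reverse=True)[:k]


def serialize(types, boxes=None):
    """Render constraints (and optionally a layout) as plain text for a prompt."""
    if boxes is None:
        return "elements: " + ", ".join(types)
    return "; ".join(f"{t} {x} {y} {w} {h}" for t, (x, y, w, h) in zip(types, boxes))


def overlap_area(b1, b2):
    """Intersection area of two (x, y, w, h) boxes."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    dx = min(x1 + w1, x2 + w2) - max(x1, x2)
    dy = min(y1 + h1, y2 + h2) - max(y1, y2)
    return max(dx, 0) * max(dy, 0)


def rank_layouts(candidates):
    """Order candidate layouts by total pairwise overlap, least overlap first."""
    def total_overlap(boxes):
        return sum(overlap_area(boxes[i], boxes[j])
                   for i in range(len(boxes)) for j in range(i + 1, len(boxes)))
    return sorted(candidates, key=total_overlap)


if __name__ == "__main__":
    pool = [
        {"types": ["title", "image"], "boxes": [(0, 0, 10, 2), (0, 3, 10, 6)]},
        {"types": ["button"], "boxes": [(4, 8, 2, 1)]},
        {"types": ["title", "text", "image"], "boxes": [(0, 0, 10, 2), (0, 2, 10, 3), (0, 5, 10, 4)]},
    ]
    chosen = select_exemplars(["title", "image"], pool, k=2)
    prompt = "\n".join(serialize(ex["types"], ex["boxes"]) for ex in chosen)
    print(prompt + "\n" + serialize(["title", "image"]))
```

Because selection and ranking are plain functions around the model call, the approach needs no training, which is the property the abstract emphasizes.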

翻訳日:2023-11-14 18:21:53 公開日:2023-11-11

# L3 アンサンブル:基礎言語モデルのアンサンブルのための生涯学習アプローチ

L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models ( http://arxiv.org/abs/2311.06493v1 )

ライセンス: Link先を確認

Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur

(参考訳) 事前学習済みの基礎言語モデル(FLM)を特定のタスク向けに微調整することは、特にリソース制約のあるデバイスでは、しばしば実用的ではない。このため、自然言語処理(NLP)タスクのストリームに効率的かつ継続的に適応する生涯学習(L3)フレームワークの開発が必要となる。本稿では、未知のデータから意味のある表現を抽出し、構造化された知識ベースを構築し、タスク性能を漸進的に向上させることに焦点を当てたアプローチを提案する。GLUEやSuperGLUEなどのベンチマークを含むさまざまなNLPタスクで、その有効性を検証する実験を行った。精度、トレーニング効率、知識伝達の各指標で良好な性能が得られた。初期の実験結果から、提案するL3アンサンブル手法は、微調整したFLMと比べてモデルの精度を4%～36%向上させることが示された。さらに、L3モデルは単純な微調整アプローチを上回りつつ、STSベンチマークという所与のタスクにおいて、最先端言語モデル(T5)と比較して同等以上の性能(最大15.4%の精度向上)を維持する。

Fine-tuning pre-trained foundational language models (FLM) for specific tasks is often impractical, especially for resource-constrained devices. This necessitates the development of a Lifelong Learning (L3) framework that continuously and efficiently adapts to a stream of Natural Language Processing (NLP) tasks. We propose an approach that focuses on extracting meaningful representations from unseen data, constructing a structured knowledge base, and improving task performance incrementally. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. We measured good performance across the accuracy, training efficiency, and knowledge transfer metrics. Initial experimental results show that the proposed L3 ensemble method increases the model accuracy by 4% to 36% compared to the fine-tuned FLM. Furthermore, the L3 model outperforms naive fine-tuning approaches while maintaining competitive or superior performance (up to a 15.4% increase in accuracy) compared to the state-of-the-art language model (T5) for the given task, the STS benchmark.

翻訳日:2023-11-14 18:21:29 公開日:2023-11-11

# 動的葉層構造を持つ時空量子力学と古典力学

Spacetime quantum and classical mechanics with dynamical foliation ( http://arxiv.org/abs/2311.06486v1 )

ライセンス: Link先を確認

N. L. Diaz, J. M. Matera, R. Rossignoli

(参考訳) 古典物理学の通常の位相空間は空間と時間を異なる形で扱い、この違いは場の理論と量子力学(QM)にも引き継がれる。本稿では、2つの主要な拡張によって位相空間を拡張する。第一に、ルジャンドル変換における時間の選択を動的変数に格上げする。第二に、物質場のポアソン括弧を時空対称な形式に拡張する。こうして得られる「時空位相空間」を用いて、相対論的場の理論に対するハミルトン方程式の明示的に共変なバージョンを得る。次に、この形式論の正準的な量子化を提示する。そこでは場が時空の交換関係を満たし、葉層構造は量子的である。このアプローチでは、古典的作用も演算子に格上げされ、物質-葉層構造の分割における非分離性を通じて明示的な共変性を保持する。新しい非因果的枠組み(異なる時刻の場が独立である)と従来のQMとの対応を確立する問題は、空間的相関関数を時空へ一般化することによって解決される。この一般化では、ハミルトニアンは作用に、従来の粒子はオフシェル粒子に置き換えられる。葉層構造が量子化される場合、PageとWoottersの機構と類似して、葉層構造の固有状態で条件付けることにより先の写像が回復される。また、与えられた理論の因果構造が系と環境の間の量子相関から創発するという、この対応の解釈も与える。このアイデアは一般的な量子系に対して成り立ち、密度行列を、空間と時間の両方における相関関数の情報を含む演算子へと一般化することを可能にする。

The conventional phase space of classical physics treats space and time differently, and this difference carries over to field theories and quantum mechanics (QM). In this paper, the phase space is enhanced through two main extensions. Firstly, we promote the time choice of the Legendre transform to a dynamical variable. Secondly, we extend the Poisson brackets of matter fields to a spacetime symmetric form. The ensuing "spacetime phase space" is employed to obtain an explicitly covariant version of Hamilton equations for relativistic field theories. A canonical-like quantization of the formalism is then presented in which the fields satisfy spacetime commutation relations and the foliation is quantum. In this approach, the classical action is also promoted to an operator and retains explicit covariance through its non-separability in the matter-foliation partition. The problem of establishing a correspondence between the new noncausal framework (where fields at different times are independent) and conventional QM is solved through a generalization of spacelike correlators to spacetime. In this generalization, the Hamiltonian is replaced by the action, and conventional particles by off-shell particles. When the foliation is quantized, the previous map is recovered by conditioning on foliation eigenstates, in analogy with the Page and Wootters mechanism. We also provide an interpretation of the correspondence in which the causal structure of a given theory emerges from the quantum correlations between the system and an environment. This idea holds for general quantum systems and allows one to generalize the density matrix to an operator containing the information of correlators both in space and time.

翻訳日:2023-11-14 18:21:13 公開日:2023-11-11

# 重み付き$p$-R\'{e}nyiエントロピーパワー不等式:量子シャノン理論への情報理論

Weighted $p$-R\'{e}nyi Entropy Power Inequality: Information Theory to Quantum Shannon Theory ( http://arxiv.org/abs/2311.06484v1 )

ライセンス: Link先を確認

Junseo Lee, Hyeonjun Yeo, Kabgyun Jeong

(参考訳) 2つの独立な連続確率変数 $X$ と $Y$ について、重み係数 $t$ を伴う $p$-R\'{e}nyi エントロピーパワー不等式を研究する。この拡張は本質的に、BobkovとMarsigliettiによるシャープなヤングの不等式に対する変形に依拠する。我々の研究は、量子系に対するエントロピーパワー不等式の R\'{e}nyi 版を与えるため、量子シャノン理論における基本的な研究成果として利用できる重要な結果を提供する。

We study the $p$-R\'{e}nyi entropy power inequality with a weight factor $t$ on two independent continuous random variables $X$ and $Y$. The extension essentially relies on a modulation on the sharp Young's inequality due to Bobkov and Marsiglietti. Our research provides a key result that can be used as a fundamental research finding in quantum Shannon theory, as it offers a R\'{e}nyi version of the entropy power inequality for quantum systems.
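For orientation, the standard objects behind this abstract can be written out as follows. These are the usual textbook definitions for a random vector $X$ in $\mathbb{R}^d$ with density $f$, stated here as background under that assumption; the weighted inequality proved in the paper is not reproduced, since its exact form is not given in the abstract.

```latex
% Rényi entropy of order p (p > 0, p \neq 1) for X with density f on \mathbb{R}^d
\[
  h_p(X) = \frac{1}{1-p} \, \log \int_{\mathbb{R}^d} f(x)^p \, dx
\]

% Corresponding Rényi entropy power
\[
  N_p(X) = \exp\!\left( \frac{2}{d} \, h_p(X) \right)
\]

% Classical entropy power inequality (Shannon case, p \to 1),
% for independent X and Y
\[
  N(X + Y) \ge N(X) + N(Y)
\]
```

In the limit $p \to 1$, $h_p$ reduces to the Shannon differential entropy, which is why Rényi versions are viewed as generalizations of the classical inequality.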

翻訳日:2023-11-14 18:20:47 公開日:2023-11-11

# スタックネットワークは物理インフォームド学習を改善する:ニューラルネットワークとディープオペレータネットワークへの応用

Stacked networks improve physics-informed training: applications to neural networks and deep operator networks ( http://arxiv.org/abs/2311.06483v1 )

ライセンス: Link先を確認

Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis

(参考訳) 物理インフォームドニューラルネットワークとオペレータネットワークは、物理システムをモデル化する方程式を効果的に解く手法として有望視されている。しかし、これらのネットワークは、一部の方程式系に対しては正確に訓練することが困難または不可能な場合がある。本稿では、物理インフォームドニューラルネットワークとオペレータネットワークを積み重ねて訓練を容易にする、新しい多忠実度フレームワークを提案する。あるステップの出力が次のステップを訓練するための低忠実度入力として働くネットワークの連鎖を順次構築し、学習モデルの表現力を徐々に高める。反復過程の各ステップで課す方程式は、同じでも異なっていてもよい(シミュレーテッドアニーリングに類似)。提案手法の反復的(スタッキング)な性質により、直接学習することが難しい解の特徴を段階的に学習できる。非線形振り子、波動方程式、粘性バーガース方程式などのベンチマーク問題を通じて、スタッキングが物理インフォームドニューラルネットワークとオペレータネットワークの精度向上と必要サイズの削減にどのように役立つかを示す。

Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.

翻訳日:2023-11-14 18:20:33 公開日:2023-11-11

# ロボット学習における分布外検出のためのトポロジーマッチング正規化フロー

Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning ( http://arxiv.org/abs/2311.06481v1 )

ライセンス: Link先を確認

Jianxiang Feng, Jongseok Lee, Simon Geisler, Stephan Gunnemann, Rudolph Triebel

(参考訳) 現実世界での自律ロボットの信頼性の高い展開を容易にするためには、分布外(OOD)検出機能が必要となることが多い。 OOD検出のための強力なアプローチは、正規化フロー(NF)を用いた密度推定に基づいている。しかし、NFを用いた先行研究は、複雑な目標分布を単純な基底分布に位相的に一致させようとしており、これが悪影響をもたらすことが分かった。本研究では、必要なトポロジーに一致するよう情報理論的な目的関数で訓練した、表現力のあるクラス条件付き基底分布を用いて、この位相的ミスマッチを回避する。提案手法は、OOD検出能力を向上させつつ、性能劣化がなく計算オーバーヘッドも最小限で、既存の学習済みモデルと広く互換性がある。密度推定と2次元物体検出のベンチマークにおいて、広範なベースラインと比較して優れた結果を示す。さらに、実ロボットへの展開によって本手法の適用性を示す。

To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions, leading to adverse implications. In this work, we circumvent this topological mismatch using an expressive class-conditional base distribution trained with an information-theoretic objective to match the required topology. The proposed method is widely compatible with existing learned models, incurring no performance degradation and minimal computational overhead while enhancing OOD detection capabilities. We demonstrate superior results in density estimation and 2D object detection benchmarks in comparison with extensive baselines. Moreover, we showcase the applicability of the method with a real-robot deployment.

翻訳日:2023-11-14 18:20:15 公開日:2023-11-11

# クラス不均衡に対処するために発生した呼吸音を用いた敵対的微調整

Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance ( http://arxiv.org/abs/2311.06480v1 )

ライセンス: Link先を確認

June-Woo Kim, Chihyeon Yoon, Miika Toikkanen, Sangmin Bae, Ho-Young Jung

(参考訳) 深層生成モデルは、データの不足に対処するために医療画像領域において有望なアプローチとして現れてきた。しかし、呼吸音などのシーケンシャルなデータに対する利用はあまり研究されていない。本研究では,条件付きニューラルボコーダとして音響拡散モデルを用い,不均衡な呼吸音データを拡張する簡便な手法を提案する。また, 合成音と実呼吸音の特徴を整合させ, 呼吸音の分類性能を向上させる,簡易かつ効果的な敵対的微調整法を示す。 ICBHIデータセットにおける実験結果から,従来の拡張法のみを用いると性能が低下する一方,提案する敵対的微調整は有効であることが示された。さらに,本手法はICBHIスコアでベースラインを2.24%上回り,マイノリティクラスの精度を最大26.58%向上させる。補足資料として、https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound でコードを提供する。

Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.

翻訳日:2023-11-14 18:19:59 公開日:2023-11-11

# FiND: 点状エミッタに対する数ショット3次元イメージングフリー共焦点フォーカシング

FiND: Few-shot three-dimensional image-free confocal focusing on point-like emitters ( http://arxiv.org/abs/2311.06479v1 )

ライセンス: Link先を確認

Swetapadma Sahoo, Junyue Jiang, Jaden Li, Kieran Loehr, Chad E. Germany, Jincheng Zhou, Bryan K. Clark, Simeon I. Bogdanov

(参考訳) 共焦点蛍光顕微鏡は生体分子、材料欠陥、量子光源などの点状エミッタの研究に広く応用されている。共焦点法は、光学分解能の向上、蛍光背景の大幅な除去、サブナノメートルの位置決定を可能にし、蛍光バイオマーカーの超解像イメージング、単一分子追跡、量子エミッタのキャラクタリゼーションに有用である。しかし、共焦点顕微鏡では、点状エミッタに対する高速でノイズに頑健な自動3Dフォーカシングが欠けていた。ここでは,ハードウェアのアドオンや修正を必要としない,イメージングフリーで学習不要な3DフォーカシングフレームワークであるFiND (Focusing in Noisy Domain)を紹介する。 FiNDは信号対雑音比が1まで低い場合でもフォーカシングを達成し、信号対雑音比が5を超える場合は数ショットで動作する。 FiNDは、不均質な量子エミッタの集合に対する教師なしの大規模なフォーカシングを可能にする。さらに,単一のNVセンターのドリフト軌道を10 nm未満の位置精度で無期限に追従することにより,リアルタイム3DトラッキングにおけるFiNDの可能性を示す。その結果,FiNDは生物学,材料科学,量子光学における点状エミッタのスケーラブルな解析に有用なフォーカシングフレームワークであることがわかった。

Confocal fluorescence microscopy is widely applied for the study of point-like emitters such as biomolecules, material defects, and quantum light sources. Confocal techniques offer increased optical resolution, dramatic fluorescence background rejection and sub-nanometer localization, useful in super-resolution imaging of fluorescent biomarkers, single-molecule tracking, or the characterization of quantum emitters. However, rapid, noise-robust automated 3D focusing on point-like emitters has been missing for confocal microscopes. Here, we introduce FiND (Focusing in Noisy Domain), an imaging-free, non-trained 3D focusing framework that requires no hardware add-ons or modifications. FiND achieves focusing for signal-to-noise ratios down to 1, with a few-shot operation for signal-to-noise ratios above 5. FiND enables unsupervised, large-scale focusing on a heterogeneous set of quantum emitters. Additionally, we demonstrate the potential of FiND for real-time 3D tracking by following the drift trajectory of a single NV center indefinitely with a positional precision of < 10 nm. Our results show that FiND is a useful focusing framework for the scalable analysis of point-like emitters in biology, material science, and quantum optics.

翻訳日:2023-11-14 18:19:39 公開日:2023-11-11

# 第1回生成AIと法に関するワークショップ報告

Report of the 1st Workshop on Generative AI and Law ( http://arxiv.org/abs/2311.06477v1 )

ライセンス: Link先を確認

A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, Milad Nasr, Paul Ohm, Adam Roberts, Tom Rubin, Pamela Samuelson, Ludwig Schubert, Kristen Vaccaro, Luis Villa, Felix Wu, Elana Zeide

(参考訳) 本報告では,2023年7月に開催された第1回生成AI法ワークショップ(GenLaw)について述べる。コンピュータ科学と法学の実践者と学者の学際的なグループが集まり、生成aiに関する法律と法のための生成aiによって提示される技術的、教義的、そして政策上の課題について議論し、特にアメリカ法を強調した。我々は、なぜジェネレーティブAIが法律にとって非常に重要で、非常に難しいのか、という高いレベルの声明でレポートを開始する。これらの課題を満たすために、我々は、必要不可欠なニーズがあると結論づける。 1) 専門分野にまたがる専門家に共通の概念言語を提供する共有知識ベース 2)他のコンピュータ及びAIシステムと比較して,生成型AIシステムの特有な技術的能力の明確化 3) これらの制度が提起する法的問題に関する論理的分類,及び 4) 創発的AIと法律の交差する新興問題における協力と知識共有を促進するための具体的な研究課題。本報告では,これらのニーズに対処し始めるgenlawワークショップの要点をまとめる。リストされた著者の全員がこのレポートをベースとしたワークショップに貢献したが、彼らとその組織は必ずしもこのレポートのすべての特定の主張を支持していない。

This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.

翻訳日:2023-11-14 18:19:12 公開日:2023-11-11

# グラフニューラルネットワークを用いた非構造メッシュ内の渦の同定

Identification of vortex in unstructured mesh with graph neural networks ( http://arxiv.org/abs/2311.06557v1 )

ライセンス: Link先を確認

Lianfa Wang, Yvan Fournier, Jean-Francois Wald, Youssef Mesri

(参考訳) 深層学習は、計算流体力学(CFD)データベースから流れの特性を識別するために用いられており、研究者が流れ場をよりよく理解し、幾何学設計を最適化し、対応する流れ特性に対して正しいCFD構成を選択するのを支援する。畳み込みニューラルネットワーク(CNN)は、流れの特徴の抽出と識別に最も一般的なアルゴリズムの1つである。しかし、追加の流れ場補間なしでの使用は単純な領域形状と規則的なメッシュに限られ、複雑な形状や不規則なメッシュが通常用いられる実際の産業ケースへの適用が制限される。この問題に対処するため,非構造メッシュ上のCFD結果から渦を識別する,U-Netアーキテクチャを用いたグラフニューラルネットワーク(GNN)ベースのモデルを提案する。 CFDメッシュからのグラフ生成と、代数的マルチグリッド法を用いたグラフ階層構築について述べる。 2次元CFDメッシュにおける渦領域をラベル付けするための渦自動ラベル法を提案する。まずCNN上で入力セットを最適化し、次に現在のGNNカーネルをCNNモデルと比較してベンチマークし、分類精度、訓練効率、識別された渦形態の観点からGNNカーネルの性能を評価する。最後に,非構造メッシュへの本手法の適応性と,異なるレイノルズ数・異なる乱流モデルの未知のケースに対する一般性を示す。

Deep learning has been employed to identify flow characteristics from Computational Fluid Dynamics (CFD) databases, helping researchers better understand the flow field, optimize the geometry design, and select the correct CFD configuration for the corresponding flow characteristics. The Convolutional Neural Network (CNN) is one of the most popular algorithms used to extract and identify flow features. However, without additional flow field interpolation, its use is limited to simple domain geometries and regular meshes, which restricts its application to real industrial cases where complex geometries and irregular meshes are usually used. To address these problems, we present a Graph Neural Network (GNN) based model with a U-Net architecture to identify vortices in CFD results on unstructured meshes. The graph generation and graph hierarchy construction from CFD meshes using the algebraic multigrid method are introduced. A vortex auto-labeling method is proposed to label vortex regions in 2D CFD meshes. We detail our approach by first optimizing the input set on CNNs, then benchmarking current GNN kernels against the CNN model and evaluating the performance of GNN kernels in terms of classification accuracy, training efficiency and identified vortex morphology. Finally, we demonstrate the adaptability of our approach to unstructured meshes and its generality to unseen cases with different turbulence models at different Reynolds numbers.

翻訳日:2023-11-14 18:10:42 公開日:2023-11-11

# 分布シフトによるゼロショット言語間感性分類 : 探索的研究

Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study ( http://arxiv.org/abs/2311.06549v1 )

ライセンス: Link先を確認

Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester and Chris Develder

(参考訳) unseenドメインのout-of-distribution(ood)テストサンプルにおける微調整された言語モデルのパフォーマンスの脆さは、英語でよく研究されているが、多言語モデルでは未検討である。そこで本研究では,OODテストデータのゼロショット言語間転送設定における一般化について検討し,列車データとテストデータ間の言語とドメインのシフトが与える影響を解析した。さらに,単言語的な英語設定ではCADが有用であることが示されているため,OODの一般化改善におけるCADの有効性について検討した。最後に,最近の大規模言語モデル(LLM)のパワーを活用し,CADに関連付けられたコストのかかるアノテーションプロセスを回避するための2つの新しいOOD一般化手法を提案する。英語のimdb movie reviewsでトレーニングされた labse, mbert, xlm-rの3つの多言語モデルを用いて実験を行い,amazon product reviews, tweet, restaurant reviewsの13言語でoodテストセットを評価した。その結果,単言語英語ではOODの低下がみられた。さらに (i)もともとの高リソース言語からの反事実は低リソース言語のOOD一般化を改善し、 (II) 新たに提案したコスト効率のアプローチは,Amazon および Restaurant のレビューにおいて CAD と同等あるいは最大で 3.1% の精度に達する。

The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to benefit in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with 3 multilingual models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.

翻訳日:2023-11-14 18:10:21 公開日:2023-11-11

# チャートからatlasへ:潜在空間を1つにマージする

From Charts to Atlas: Merging Latent Spaces into One ( http://arxiv.org/abs/2311.06547v1 )

ライセンス: Link先を確認

Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Liò, Emanuele Rodolà

(参考訳) 意味的に関連したデータセットとタスクでトレーニングされたモデルは、潜在空間内で同等のサンプル間関係を示す。本研究では,そのような潜在空間を集約し,それらの情報を包含する統一空間を作成する。この目的のために、相対的な表現を用いて空間を描画し、簡単な平均でそれらを集約する2段階のアプローチであるRelative Latent Space Aggregationを導入する。分類問題を3つの異なる設定(サンプル、クラス、あるいは両方)で一連の学習タスクに慎重に分割します。次に各タスクでモデルをトレーニングし、結果の潜在空間を集約します。集約された空間を、すべてのタスクで訓練されたエンドツーエンドモデルから派生した空間と比較し、2つの空間が類似していることを示す。次に、集約された空間が分類に適していることを観察し、その表現の中にタスク固有の埋め込み器が残したユニークなインプリントが原因であることを実証的に示す。最終的に、共有領域が存在しないシナリオでフレームワークをテストし、ナイーブマージよりもメリットが少なくても、スペースのマージに引き続き使用できることを示します。

Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation, a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that it is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging.
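
As a rough illustration of the two-step idea described in this abstract (make spaces comparable with relative representations, then average them), here is a minimal Python sketch. It is not the authors' implementation: the function names, the toy 2D embeddings, the choice of cosine similarity, and the assumption that the same anchor samples are embedded in every space are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def relative_representation(sample, anchors):
    """Describe a sample by its similarity to each anchor embedding."""
    return [cosine(sample, anchor) for anchor in anchors]

def aggregate_spaces(spaces, anchor_ids):
    """Average the relative representations of each sample across spaces.

    spaces: list of dicts mapping sample id -> embedding (hypothetical layout).
    anchor_ids: ids of samples that are embedded in every space (assumption).
    """
    merged = {}
    for sample_id in spaces[0]:
        reps = [
            relative_representation(space[sample_id],
                                    [space[a] for a in anchor_ids])
            for space in spaces
        ]
        # Element-wise mean over the spaces.
        merged[sample_id] = [sum(vals) / len(vals) for vals in zip(*reps)]
    return merged

def rotate(v, theta):
    """Rotate a 2D vector; stands in for a second, differently oriented space."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Toy data: space_b is space_a rotated by 90 degrees.
space_a = {"x": [1.0, 0.0], "y": [0.0, 2.0], "z": [1.0, 1.0]}
space_b = {k: rotate(v, math.pi / 2) for k, v in space_a.items()}

merged = aggregate_spaces([space_a, space_b], anchor_ids=["x", "y"])
rel_a = relative_representation(space_a["z"], [space_a["x"], space_a["y"]])
rel_b = relative_representation(space_b["z"], [space_b["x"], space_b["y"]])
```

In this toy case the two spaces differ only by a rotation, so their relative representations coincide and the mean leaves them unchanged; with independently trained models the representations would only be approximately aligned.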

翻訳日:2023-11-14 18:09:55 公開日:2023-11-11

# 集合論による一般化の理解

Understanding Generalization via Set Theory ( http://arxiv.org/abs/2311.06545v1 )

ライセンス: Link先を確認

Shiqi Liu

(参考訳) 一般化は機械学習モデルの中核にある。しかし、一般化の定義は完全には明確ではない。アルゴリズム,仮説,データセットの一般化の概念を導入するために集合論を用いる。データセットの一般化の性質を解析し、代理一般化手順に関する定理を証明する。この定理は一般化の方法につながる。 MNISTデータセットの一般化実験により,13,541個のサンプルベースを得た。モデルの性能を評価するためにトレーニングセット全体を使用すると、モデルの精度は99.945%になる。しかし、サンプルベースをシフトしたり、ニューラルネットワーク構造を変更したりすると、性能は著しく低下する。また、常に誤予測されたサンプルを特定し、それらがすべて難しい例であることを示す。実験により,一般化定義の精度と提案手法の有効性を実証した。集合論的推論と実験の両方が一般化をより理解するのに役立ちます。

Generalization is at the core of machine learning models. However, the definition of generalization is not entirely clear. We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. We analyze the properties of dataset generalization and prove a theorem on surrogate generalization procedures. This theorem leads to our generalization method. Through a generalization experiment on the MNIST dataset, we obtain 13,541 sample bases. When we use the entire training set to evaluate the model's performance, the models achieve an accuracy of 99.945%. However, if we shift the sample bases or modify the neural network structure, the performance experiences a significant decline. We also identify consistently mispredicted samples and find that they are all challenging examples. The experiments substantiated the accuracy of the generalization definition and the effectiveness of the proposed methods. Both the set-theoretic deduction and the experiments help us better understand generalization.

翻訳日:2023-11-14 18:09:36 公開日:2023-11-11

# 双方向長期記憶ネットワークを用いた色生成

Generation Of Colors using Bidirectional Long Short Term Memory Networks ( http://arxiv.org/abs/2311.06542v1 )

ライセンス: Link先を確認

A. Sinha

(参考訳) 人間の視覚は、200万から700万の識別可能な色合いと推定される広大な色を区別することができる。しかし、この印象的な範囲は、これらの色が我々の辞書の中で正確に命名され、記述されていることを本質的に意味していない。私たちはしばしば、日常生活で身近な物体や概念と色を関連付けます。この研究は、無数の陰影に対する視覚的認識と、それらを正確に表現し、命名する能力のギャップを埋めようとしている。この目的を達成するために,双方向長短期記憶(BiLSTM)ネットワークとアクティブラーニングを利用した新しいモデルが開発された。このモデルは、この研究のために慎重にキュレートされたプロプライエタリなデータセット上で動作する。本研究の主な目的は、以前は名前のない色を分類・命名したり、伝統的な色用語を損なう中間色を識別するための多用途ツールを作ることである。この発見は、色知覚と言語に対する我々の理解を革新するこの革新的なアプローチの可能性を基礎にしている。本研究は, 厳密な実験と分析を通じて, 多様な産業における自然言語処理(NLP)応用の道筋を照らすものである。広い色スペクトルの探索を容易にすることで、NLPの潜在的な応用は従来の境界を越えて拡張される。

Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 and 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. By facilitating the exploration of the vast colour spectrum, the potential applications of NLP are extended beyond conventional boundaries.

翻訳日:2023-11-14 18:09:28 公開日:2023-11-11

# 集積半導体フォトニクスプラットフォーム上の室温エンタングル量子プロセッサ

Room-Temperature entangled quantum processor on integrated semiconductor photonics platform ( http://arxiv.org/abs/2311.06541v1 )

ライセンス: Link先を確認

Haibo Hu, Yu Zhou, Ailun Yi, Tongyuan Bao, Chengying Liu, Qi Luo, Yao Zhang, Zi Wang, Zhengtong Liu, Shuming Xiao, Xin Ou, and Qinghai Song

(参考訳) 4H-シリコンカーバイド・オン・インシュレータ (SiCOI) プラットフォームの台頭は、モノリシック量子フォトニクスネットワークの実現に向けた有望な道筋を示している。しかし、これらの集積フォトニクスプラットフォーム上で室温のエンタングルレジスタを確立するという課題は未解決のままである。ここでは、SiCOIプラットフォーム上で最初のエンタングルプロセッサを実証する。室温のSiCOI上で, 単一の二空孔(divacancy)電子スピンの決定論的生成と、単一の$^{13}$C核スピンのほぼ1に近いスピン初期化がともに達成可能であることを示す。単一の核スピンをコヒーレントに操作するのに加えて、このCMOS互換の半導体集積フォトニクス系において、忠実度0.89の最大エンタングル状態を生成した。この研究は、既存の欠陥ベースのコンピューティングおよびセンシングプロトコルにおけるコンパクトなオンチップソリューションの基礎を確立し、SiCOIプラットフォームをモノリシックな集積量子フォトニクスネットワークの最も有望な候補として位置づけている。

The rise of the 4H-silicon-carbide-on-insulator (SiCOI) platform marks a promising pathway towards the realization of monolithic quantum photonic networks. However, the challenge of establishing room-temperature entangled registers on these integrated photonics platforms remains unresolved. Herein, we demonstrate the first entangled processor on the SiCOI platform. We show that both deterministic generation of single divacancy electron spins and near-unity spin initialization of a single $^{13}$C nuclear spin can be achieved on SiCOI at room temperature. Besides coherently manipulating the single nuclear spin, a maximally entangled state with a fidelity of 0.89 has been prepared on this CMOS-compatible semiconductor-integrated photonics system. This work establishes the foundation for compact and on-chip solutions within existing defect-based computing and sensing protocols, positioning the SiCOI platform as the most promising candidate for integrated monolithic quantum photonic networks.

翻訳日:2023-11-14 18:09:09 公開日:2023-11-11

# 非等価な相互不偏基底の実験的実証

Experimental Demonstration of Inequivalent Mutually Unbiased Bases ( http://arxiv.org/abs/2311.06539v1 )

ライセンス: Link先を確認

Wen-Zhe Yan, Yunting Li, Zhibo Hou, Huangjun Zhu, Guo-Yong Xiang, Chuan-Feng Li, and Guang-Can Guo

(参考訳) 相互不偏基底(MUB)に基づく量子測定は基礎研究や量子情報処理において重要な役割を果たす。非等価なMUBが存在することは知られているが、その操作的な違いについてはほとんど分かっておらず、実験的実証はなおさらである。本研究では, 簡単な推定問題を用いて, 高精度フォトニクスシステムに基づき、次元4における非等価なMUBの三つ組の間の操作的な違いを実験的に実証する。実験で得られた推定忠実度は理論予測とよく一致し、平均偏差はわずか0.16$\%$であり、これは最大推定忠実度と最小推定忠実度の差(4.1$\%$)の25分の1である。実験により,非等価なMUBは異なる情報抽出能力を持ち、量子情報処理において異なる利点を持つことが明らかとなった。

Quantum measurements based on mutually unbiased bases (MUB) play crucial roles in foundational studies and quantum information processing. It is known that there exist inequivalent MUB, but little is known about their operational distinctions, let alone their experimental demonstration. In this work, by virtue of a simple estimation problem, we experimentally demonstrate the operational distinctions between inequivalent triples of MUB in dimension 4 based on high-precision photonic systems. The experimental estimation fidelities coincide well with the theoretical predictions with only 0.16$\%$ average deviation, which is 25 times less than the difference (4.1$\%$) between the maximum estimation fidelity and the minimum estimation fidelity. Our experiments clearly demonstrate that inequivalent MUB have different information extraction capabilities and different merits for quantum information processing.

翻訳日:2023-11-14 18:08:52 公開日:2023-11-11

# 機械学習は社会科学において安全で無責任か? レシディズム予測課題からのパラドックスと再考

Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks ( http://arxiv.org/abs/2311.06537v1 )

ライセンス: Link先を確認

Jianhong Liu (1), Dianshi Li (1) ((1) Faculty of Law, University of Macau, Macau, China)

(参考訳) 本論文は,社会科学への計算的アプローチの根底にある,ハイテイクなイベント予測に関する,基本的な,熱い議論を提起する。我々は機械学習に対するいくつかの一般的な見解を疑問視し、計算手法と従来の社会科学アプローチの融合を促進する新しいパラダイムを概説する。

The paper addresses some fundamental and hotly debated issues for high-stakes event predictions underpinning the computational approach to social sciences. We question several prevalent views against machine learning and outline a new paradigm that highlights the promises and promotes the infusion of computational methods and conventional social science approaches.

翻訳日:2023-11-14 18:08:38 公開日:2023-11-11

# CrashCar101: 損傷評価のための手続き生成

CrashCar101: Procedural Generation for Damage Assessment ( http://arxiv.org/abs/2311.06536v1 )

ライセンス: Link先を確認

Jens Parslov, Erik Riise, Dim P. Papadopoulos

(参考訳) 本稿では,自動車などの車両の損傷評価の問題に対処することに関心がある。この作業では、位置や損傷の程度を検出するだけでなく、損傷部分の特定も必要となる。画像中の意味的部分と損傷のセグメンテーションのためのコンピュータビジョンシステムを訓練するためには,画像に高コストの画素アノテーションを加えて手動でアノテートする必要がある。このニーズを克服するために、これらのモデルをトレーニングするために合成データを使用することを提案する。合成データは、高い可変性、ピクセル精度のアノテーション、そして人間の介入なしに任意に大きなトレーニングセットをサンプルに提供することができる。本研究では, 3次元車両モデルに損傷を与えるプロシージャ生成パイプラインを提案し, 部品および損傷カテゴリに対する画素精度アノテーションと組み合わせた損傷車両の合成2次元画像を得る。私たちのアイデアを検証するために、パイプラインを実行し、CrashCar101データセットをレンダリングします。部品分割と損傷分割のタスクのために、3つの実際のデータセットで実験を行う。部分セグメンテーションについては,実データと合成データの組み合わせで学習したセグメンテーションモデルが,実データのみでトレーニングされたすべてのモデルよりも優れていることを示す。損傷セグメンテーションではCrashCar101のsim2real転送能力を示す。

In this paper, we are interested in addressing the problem of damage assessment for vehicles, such as cars. This task requires not only detecting the location and the extent of the damage but also identifying the damaged part. To train a computer vision system for the semantic part and damage segmentation in images, we need to manually annotate images with costly pixel annotations for both part categories and damage types. To overcome this need, we propose to use synthetic data to train these models. Synthetic data can provide samples with high variability, pixel-accurate annotations, and arbitrarily large training sets without any human intervention. We propose a procedural generation pipeline that damages 3D car models and we obtain synthetic 2D images of damaged cars paired with pixel-accurate annotations for part and damage categories. To validate our idea, we execute our pipeline and render our CrashCar101 dataset. We run experiments on three real datasets for the tasks of part and damage segmentation. For part segmentation, we show that the segmentation models trained on a combination of real data and our synthetic data outperform all models trained only on real data. For damage segmentation, we show the sim2real transfer ability of CrashCar101.

翻訳日:2023-11-14 18:08:32 公開日:2023-11-11

# 自動要約による裁判所意見の理解の高まり

Enhancing Public Understanding of Court Opinions with Automated Summarizers ( http://arxiv.org/abs/2311.06534v1 )

ライセンス: Link先を確認

Elliott Ash and Aniket Kesari and Suresh Naidu and Lena Song and Dominik Stammbach

(参考訳) 書記された司法意見は、裁判所決定における公的な信頼を構築するための重要な道具であるが、非専門家が理解することが困難である。本稿では,AIアシスタントを用いて簡易な意見要約を生成するパイプラインを提案する。これらは一般市民によりアクセスしやすく、非専門家にも理解しやすいものであり、簡易的な要約が判断の重要な特徴を理解するのに役立つことを調査実験で示している。大規模言語モデルを用いた研究に法的ドメイン知識を統合する方法について論じる。以上の結果から,AIアシスタントが一般市民に伝える役割と,弁護士がアクセス可能な要約を生成するプロセスを導く役割が示唆された。

Written judicial opinions are an important tool for building public trust in court decisions, yet they can be difficult for non-experts to understand. We present a pipeline for using an AI assistant to generate simplified summaries of judicial opinions. These are more accessible to the public and more easily understood by non-experts. We show in a survey experiment that the simplified summaries help respondents understand the key features of a ruling. We discuss how to integrate legal domain knowledge into studies using large language models. Our results suggest a role both for AI assistants to inform the public, and for lawyers to guide the process of generating accessible summaries.

翻訳日:2023-11-14 18:08:13 公開日:2023-11-11

# マルチモーダル・大規模多言語翻訳のための推論時間における毒性緩和の付加

Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation ( http://arxiv.org/abs/2311.06532v1 )

ライセンス: Link先を確認

Marta R. Costa-jussà and David Dale and Maha Elbayad and Bokai Yu

(参考訳) 翻訳の文脈で毒性を加えることは、入力の中に存在するものよりも毒性の高い翻訳出力を生成するという事実を指す。本稿では, 新規な毒性同定パイプラインであるmintoxを提案し, 推理時間に作用するこの問題を緩和する。 MinToxは、マルチモーダル(音声とテキスト)で大規模言語で動作する毒性検出分類器を使用している。この緩和法は、大規模およびテキスト出力に直接言語に適用される。 mintoxは、最新のマルチモーダル機械翻訳システムであるseamlessm4tに適用されている。このシステムのために、MinToxはドメイン、モダリティ、言語方向を横断する毒性を著しく緩和する。 MinToxは、翻訳品質を維持しながら、毒性(モダリティとドメインに依存している)の25%から95%まで、ほぼろ過する。

Added toxicity in the context of translation refers to producing a translation output with more toxicity than exists in the input. In this paper, we present MinTox, a novel pipeline that identifies added toxicity and mitigates it at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale. The mitigation method is applied to languages at scale and directly in text outputs. MinTox is applied to SEAMLESSM4T, which is the latest multimodal and massively multilingual machine translation system. For this system, MinTox achieves significant added toxicity mitigation across domains, modalities and language directions. MinTox manages to filter out approximately 25% to 95% of added toxicity (depending on the modality and domain) while keeping translation quality.

翻訳日:2023-11-14 18:08:02 公開日:2023-11-11

# chatgptが脆弱性管理問題を解決する方法

How ChatGPT is Solving Vulnerability Management Problem ( http://arxiv.org/abs/2311.06530v1 )

ライセンス: Link先を確認

Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang

(参考訳) 最近、ChatGPTはコード分析領域から大きな注目を集めています。以前の研究によると、chatgptには抽象構文木生成のような基本的なコード解析タスクの処理能力があり、コード構文と静的な振る舞いを理解するためにchatgptを使用する可能性を示している。しかし、chatgptがセキュリティの関連性の予測やパッチの正確性といった、より複雑な現実世界の脆弱性管理タスクを完了できるかどうかは不明であり、コード構文、プログラムの意味論、関連する手動コメントなど、さまざまな側面を包括的に理解する必要がある。本稿では,78,445のサンプルを含む大規模データセットを用いて,脆弱性管理プロセス全体に関わる6つのタスクに対するchatgptの能力について検討する。各タスクに対して、ChatGPTとSOTAのアプローチを比較し、異なるプロンプトの影響を調査し、困難を調査する。その結果,chatgptを脆弱性管理に活用できる可能性が示唆された。注目すべき例として、ChatGPTのソフトウェアバグレポートのタイトル生成などのタスクにおける熟練度がある。さらに,ChatGPTが抱える困難が明らかとなり,将来的な方向性に光を当てた。例えば、プロンプトでランダムな例を直接提供しても、脆弱性管理における優れたパフォーマンスを一貫して保証することはできない。対照的に、ChatGPTを自己ヒューリスティックな方法で活用 -- 実演例自体から専門知識を抽出し、抽出された専門知識をプロンプトに統合することは、有望な研究方向である。さらにChatGPTは、プロンプトの情報を誤解し、誤用することがある。したがって、ChatGPTが無関係なコンテンツよりも有益な情報に集中するよう効果的に導くことは、まだ未解決の問題である。

Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as the prediction of security relevance and patch correctness, which require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments. In this paper, we explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 78,445 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Furthermore, our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt cannot consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way -- extracting expertise from demonstration examples itself and integrating the extracted expertise in the prompt is a promising research direction. Besides, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than the irrelevant content is still an open problem.

翻訳日:2023-11-14 18:07:48 公開日:2023-11-11

# TURBO: 自動エンコーダーのスイスナイフ

TURBO: The Swiss Knife of Auto-Encoders ( http://arxiv.org/abs/2311.06527v1 )

ライセンス: Link先を確認

Guillaume Quétant, Yury Belousov, Vitaliy Kinakh, Slava Voloshynovskiy

(参考訳) 本稿では,自動符号化手法の体系的解析と一般化を目的とした新しい情報理論フレームワークTURBOを提案する。まず、情報ボトルネックとボトルネックベースのネットワークの原理を自動エンコーディング設定で検証し、それらの固有の制限を識別することから始める。次に、TURBOフレームワークを導入し、情報フローを反映した2方向の様々なデータ表現間の相互情報の最大化からなる、その中核概念を包括的に導出する。このフレームワークには、多くの一般的なニューラルネットワークモデルが含まれている。本論文は,これらのモデルをすべて解明する上で,情報ボトルネックの概念が不十分であることを示す。 TURBOの導入は、データ表現とニューラルネットワークモデルの構造のより深い理解に寄与し、より効率的で汎用的なアプリケーションを可能にする。

We present a novel information-theoretic framework, termed as TURBO, designed to systematically analyse and generalise auto-encoding methods. We start by examining the principles of information bottleneck and bottleneck-based networks in the auto-encoding setting and identifying their inherent limitations, which become more prominent for data with multiple relevant, physics-related representations. The TURBO framework is then introduced, providing a comprehensive derivation of its core concept consisting of the maximisation of mutual information between various data representations expressed in two directions reflecting the information flows. We illustrate that numerous prevalent neural network models are encompassed within this framework. The paper underscores the insufficiency of the information bottleneck concept in elucidating all such models, thereby establishing TURBO as a preferable theoretical reference. The introduction of TURBO contributes to a richer understanding of data representation and the structure of neural network models, enabling more efficient and versatile applications.

翻訳日:2023-11-14 18:07:21 公開日:2023-11-11

# 最小記述長ホップフィールドネットワーク

Minimum Description Length Hopfield Networks ( http://arxiv.org/abs/2311.06518v1 )

ライセンス: Link先を確認

Matan Abudy, Nur Lan, Emmanuel Chemla, Roni Katzir

(参考訳) 連想記憶アーキテクチャは記憶のために設計されているが、その検索方法を通じて、未知の入力への一般化の一形態も提供する。この観点では、保存された記憶はプロトタイプと見なすことができる。 MHN(Modern Hopfield Networks)に着目して,大きな記憶容量が一般化の機会を損なうことを示す。このトレードオフをより良く最適化するための解決策を提供する。これは最小記述長(MDL)に基づき、どの記憶を保存するか、およびその数をトレーニング中に決定する。

Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this tradeoff. It relies on Minimum Description Length (MDL) to determine during training which memories to store, as well as how many of them.
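
To make the "stored memories as prototypes" remark concrete, the following minimal Python sketch shows the standard softmax retrieval step commonly used for Modern Hopfield Networks: the query is replaced by a similarity-weighted average of the stored memories. It illustrates retrieval only and does not implement the MDL-based memory selection proposed in the paper; the function names, the inverse temperature `beta`, and the toy patterns are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(memories, query, beta=8.0, steps=1):
    """Pull a query toward the stored memories.

    Each step computes softmax(beta * <memory, state>) and returns the
    weighted average of the memories under those weights.
    """
    state = list(query)
    for _ in range(steps):
        scores = [beta * sum(m_k * s_k for m_k, s_k in zip(memory, state))
                  for memory in memories]
        weights = softmax(scores)
        state = [sum(w * memory[d] for w, memory in zip(weights, memories))
                 for d in range(len(state))]
    return state

# Two stored patterns (hypothetical) and a noisy version of the first one.
memories = [[1.0, 1.0, -1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
noisy = [0.9, 0.8, -1.0, -0.2]
recalled = retrieve(memories, noisy)
```

With a large `beta` the update snaps the query onto the closest stored pattern, which is the prototype-like behaviour the abstract refers to; a small `beta` instead blends several memories.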

翻訳日:2023-11-14 18:07:07 公開日:2023-11-11

# BClean:ベイジアンのデータクリーニングシステム

BClean: A Bayesian Data Cleaning System ( http://arxiv.org/abs/2311.06517v1 )

ライセンス: Link先を確認

Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao

(参考訳) データクリーニングには、誤ったデータを修正し、汚いデータセットをよりクリーンなものに変換する、さまざまな原則を用いる、かなりの量の作業がある。一般的なアプローチの1つは、ベイズ法を含む確率的手法である。しかし、既存の確率的手法は、しばしば単純分布(例えばガウス分布)を仮定し、それらは実際には不適合であり、専門家が複雑な事前分布(例えば、プログラミング言語を介して)を提供する必要がある。この要件は労働集約的かつ費用がかかるため、実際のアプリケーションには適さない。本稿では,ベイズネットワークの自動構築とユーザインタラクションを特徴とするベイズ清掃システムbcleanを提案する。我々は、データクリーニング問題をベイズ推定として再キャストし、観測されたデータセットの属性とユーザが提供する事前情報の関係を完全に活用する。そこで本研究では,類似度関数を用いた構造学習に基づく関数依存発見法を拡張し,属性間の関係を捉えるベイズネットワーク構築手法を提案する。さらに,本システムでは,生成したベイズネットワークを修正して,自動生成プロセスで特定された事前情報や正確な不正確性を特定する。また,ベイズ推定に必要な効果的なスコアリングモデル(補償スコアリングモデル)を設計する。データクリーニングの効率を高めるために,グラフ分割,ドメインプルーニング,事前検出などベイズ推定のための近似手法を提案する。実世界のデータセットと合成データセットの両方について評価することで、bcleanはデータクリーニングにおいて最大0.9のf-測定を達成でき、既存のベイズ法を2%、その他のデータクリーニング法を15%上回る。

There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of the prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which frequently underfits in practice, or they necessitate experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.

翻訳日:2023-11-14 18:06:58 公開日:2023-11-11

# フィードバック制御による量子絡み合いの発生と向上

Emergence and enhancement of feedback control induced quantum entanglement ( http://arxiv.org/abs/2311.06578v1 )

ライセンス: Link先を確認

M. Amazioug, D. Dutykh, M. Asjad

(参考訳) 本稿では,機械振動子やマグノンと相互作用しながらキャビティを脱出するキャビティモードにフィードバックを適用し,量子相関を制御する手法を提案する。移動鏡を有するハイブリッドキャビティマグノメカニカルシステムにおいて,提案するコヒーレントフィードバックスキームは,2成分と3成分の量子相関の強化を可能にする。さらに,コヒーレントフィードバック制御の存在下での環境温度に対して,結果として生じる絡み合いは頑健であることを示す。

We present a scheme for controlling quantum correlations by applying feedback to the cavity mode that exits a cavity while interacting with a mechanical oscillator and magnons. In a hybrid cavity magnomechanical system with a movable mirror, the proposed coherent feedback scheme allows for the enhancement of both bipartite and tripartite quantum correlations. Moreover, we demonstrate that the resulting entanglement remains robust with respect to ambient temperatures in the presence of coherent feedback control.

翻訳日:2023-11-14 17:59:14 公開日:2023-11-11

# 強化学習を用いたブラックボックスロボット制御のための知的社会学習に基づく最適化戦略

An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning ( http://arxiv.org/abs/2311.06576v1 )

ライセンス: Link先を確認

Xubo Yang, Jian Gao, Ting Wang, Yaozhen He

(参考訳) ロボットのインテリジェントな制御を実現することは、特に複雑なブラックボックスシステムを扱う場合、ロボットの内部動作に関する可視性と理解が欠如しているため、難しい課題である。本稿では、ブラックボックスロボットシステムのインテリジェント制御を実現するための知的社会学習(ISL)アルゴリズムを提案する。人間の社会集団における個人間の相互学習に着想を得たISLは、学習、模倣、自己学習の3つのスタイルを含む。学習スタイルの個体は、Levy flight探索戦略を用いて最高のパフォーマーから学び、最も近い関係を形成する。模倣スタイルでは、個体はランダムな摂動戦略を用いて、第2レベルの親密さで最高のパフォーマーを模倣する。自己学習スタイルでは、個体は最高のパフォーマーとの遠い関係を維持しながら、正規分布サンプリング手法を用いて独立して学習する。集団内の個体は、それぞれのスタイルにおいて自律的な知的エージェントとみなされる。ニューラルネットワークは、3つのスタイルで戦略的行動を実行して環境およびロボットと相互作用し、ネットワークポリシーを反復的に最適化する。全体として、ISLは知的最適化の原理に基づき、強化学習のアイデアを取り入れており、強力な探索能力、高速な計算速度、少ないハイパーパラメータ、スパース報酬に対する非感受性を備えている。提案するISLアルゴリズムをMuJoCoの6つの連続制御ベンチマークで4つの最先端手法と比較し、その有効性と利点を検証する。さらに、UR3ロボットのシミュレーションおよび実機での把持タスクにISLを適用して検証し、良好な解が得られた。

Implementing intelligent control of robots is a difficult task, especially when dealing with complex black-box systems, because of the lack of visibility and understanding of how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Levy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. Individuals in the population are regarded as autonomous intelligent agents in each style. Neural networks perform strategic actions in three styles to interact with the environment and the robot and iteratively optimize the network policy. Overall, ISL builds on the principles of intelligent optimization, incorporating ideas from reinforcement learning, and possesses strong search capabilities, fast computation speed, fewer hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is adopted in the simulation and experimental grasping tasks of the UR3 robot for validations, and satisfactory solutions are yielded.
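
The abstract names a Levy flight search strategy for the learning style without giving details. The sketch below shows one widely used way to draw Levy-distributed step lengths (Mantegna's algorithm) and to move a candidate toward the best performer. The choice of Mantegna's algorithm, the update rule in `move_toward_best`, and the parameters `beta` and `scale` are assumptions for illustration, not the authors' stated method.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step length using Mantegna's algorithm.

    `beta` (assumed, 0 < beta <= 2) controls how heavy the tail is.
    """
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the extremely rare exact-zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def move_toward_best(position, best, scale=0.01, rng=random):
    """Hypothetical update: perturb each coordinate along (best - x) by a Levy step."""
    return [
        x + scale * levy_step(rng=rng) * (b - x)
        for x, b in zip(position, best)
    ]
```

Most draws are small, with occasional very large jumps, which is why Levy steps are often used to balance local refinement against long-range exploration.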

翻訳日:2023-11-14 17:59:04 公開日:2023-11-11

# スパース注意に基づくコード分類のためのニューラルネットワーク

Sparse Attention-Based Neural Networks for Code Classification ( http://arxiv.org/abs/2311.06575v1 )

ライセンス: Link先を確認

Ziyang Xiang, Zaixi Zhang, Qi Liu

(参考訳) ソースコードを正確かつ効率的に分類することは、実世界のプログラミング教育プラットフォームの管理において難しい問題である。近年、抽象構文木(AST)を用いたモデルベースのアプローチがコード分類タスクに広く適用されている。本稿では、SACC(Sparse Attention-based neural network for Code Classification)というアプローチを紹介する。このアプローチは主に2つのステップからなる。第1のステップでは、ソースコードに構文解析と前処理を施す。生成された抽象構文木を部分木の系列に分割し、再帰的ニューラルネットワークを用いて符号化して高次元表現を得る。このステップでは、コードに含まれる論理構造と語彙レベルの情報の両方を同時に考慮する。第2のステップでは、符号化された部分木の系列を、スパース注意機構を組み込んだTransformerモデルに入力して分類を行う。この方法は、自己注意機構の計算コストを効率よく低減し、有効性を保ちながらトレーニング速度を向上させる。本研究では、コード分類タスク固有のニーズを満たすように注意深く設計されたスパース注意パターンを導入する。この設計は冗長な情報の影響を低減し、モデル全体の性能を向上させるのに役立つ。最後に、不完全な分類ラベルやデータセットサイズの小ささといった、先行研究における問題にも対処する。我々は、大量のデータを含むCodeNetデータセットに、アルゴリズム関連のラベルカテゴリを付与した。広範な比較実験の結果は、コード分類タスクにおけるSACCの有効性と効率を示している。

Categorizing source codes accurately and efficiently is a challenging problem in real-world programming education platform management. In recent years, model-based approaches utilizing abstract syntax trees (ASTs) have been widely applied to code classification tasks. We introduce an approach named the Sparse Attention-based neural network for Code Classification (SACC) in this paper. The approach involves two main steps: In the first step, source code undergoes syntax parsing and preprocessing. The generated abstract syntax tree is split into sequences of subtrees and then encoded using a recursive neural network to obtain a high-dimensional representation. This step simultaneously considers both the logical structure and lexical level information contained within the code. In the second step, the encoded sequences of subtrees are fed into a Transformer model that incorporates sparse attention mechanisms for the purpose of classification. This method efficiently reduces the computational cost of the self-attention mechanisms, thus improving the training speed while preserving effectiveness. Our work introduces a carefully designed sparse attention pattern that is specifically designed to meet the unique needs of code classification tasks. This design helps reduce the influence of redundant information and enhances the overall performance of the model. Finally, we also deal with problems in previous related research, which include issues like incomplete classification labels and a small dataset size. We annotated the CodeNet dataset with algorithm-related labeling categories, which contains a significantly large amount of data. Extensive comparative experimental results demonstrate the effectiveness and efficiency of SACC for the code classification tasks.

翻訳日:2023-11-14 17:58:37 公開日:2023-11-11

# 量子ビット列比較器のための一般化空間効率アルゴリズム

A Generalized Space-Efficient Algorithm for Quantum Bit String Comparators ( http://arxiv.org/abs/2311.06573v1 )

ライセンス: Link先を確認

Khuram Shahzad and Omar Usman Khan

(参考訳) 量子ビット列比較器(QBSC)は、$n$量子ビットの2つの列に対して動作し、等しい、より大きい、より小さいといった両者の関係を判定できる。これは、プログラミング言語で条件文が使われる方法に類似している。したがって、QBSCは量子コンピュータ上で実行または適応できる様々なアルゴリズムにおいて重要な役割を果たす。任意の$n$量子ビット長に対する効率的で一般化された比較器の開発は、コストが高く量子遅延を招くため、長年の課題であった。効率的な比較器は固定長の入力に限られている。その結果、一般化された回路を持たない比較器は、サイズが限られた問題には適しているものの、より高いレベルでは利用できない。本稿では、わずか2つの補助ビットを用いて2つの$n$量子ビット論理状態を比較する一般化設計を提案する。この設計を、量子ビット要件、補助ビット使用量、量子コスト、量子遅延、ゲート操作、回路の複雑さに基づいて検討し、様々な入力長で包括的にテストする。この研究は量子アルゴリズムの設計に十分な柔軟性をもたらし、量子アルゴリズムの開発を加速しうる。

Quantum Bit String Comparators (QBSC) operate on two sequences of n-qubits, enabling the determination of their relationships, such as equality, greater than, or less than. This is analogous to the way conditional statements are used in programming languages. Consequently, QBSCs play a crucial role in various algorithms that can be executed or adapted for quantum computers. The development of efficient and generalized comparators for any $n$-qubit length has long posed a challenge, as they have a high-cost footprint and lead to quantum delays. Comparators that are efficient are associated with inputs of fixed length. As a result, comparators without a generalized circuit cannot be employed at a higher level, though they are well-suited for problems with limited size requirements. In this paper, we introduce a generalized design for the comparison of two $n$-qubit logic states using just two ancillary bits. The design is examined on the basis of qubit requirements, ancillary bit usage, quantum cost, quantum delay, gate operations, and circuit complexity, and is tested comprehensively on various input lengths. The work allows for sufficient flexibility in the design of quantum algorithms, which can accelerate quantum algorithm development.
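
To make the comparator's input/output behaviour concrete, here is a purely classical reference for the three outcomes the abstract lists. It scans two equal-length bit strings from the most significant bit and stops at the first difference. This is only a classical analogue written for illustration; it is not a quantum circuit and says nothing about the paper's ancilla usage or gate layout. The function name and the string return values are assumptions.

```python
def compare_bits(a, b):
    """Classically compare two equal-length bit lists, most significant bit first.

    Returns "equal", "greater" (a > b), or "less" (a < b).
    """
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    for bit_a, bit_b in zip(a, b):
        if bit_a != bit_b:
            # The first differing bit from the left decides the ordering.
            return "greater" if bit_a > bit_b else "less"
    return "equal"


result_equal = compare_bits([1, 0, 1], [1, 0, 1])
result_greater = compare_bits([1, 0, 1], [1, 0, 0])
result_less = compare_bits([0, 1, 1], [1, 0, 0])
```

A quantum comparator has to produce the same truth table reversibly, which is where the cost of ancillary qubits comes from.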

翻訳日:2023-11-14 17:58:14 公開日:2023-11-11

# Swin UNETR++: 完全自動放射線腫瘍治療に向けたトランスフォーマーベースの密な線量予測の前進

Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments ( http://arxiv.org/abs/2311.06572v1 )

ライセンス: Link先を確認

Kuancheng Wang, Hai Siong Tan, Rafe Mcbeth

(参考訳) 放射線腫瘍学の分野は、人工知能を用いてがん治療のための放射線治療計画の作成を完全自動化することで、特に大きな恩恵を受けられる立場にある。時間がかかり専門性を要するこのタスクは、患者の画像と臓器・腫瘍のセグメンテーションを組み合わせて、臨床治療目標を満たす3次元放射線線量分布を生成するものであり、ボクセルレベルの密な予測に類似している。本研究では、軽量な3D Dual Cross-Attention(DCA)モジュールを備えたSwin UNETR++を提案する。このモジュールは、完全畳み込みニューラルネットワークには欠けている、各患者固有の解剖学的構造のボリューム内およびボリューム間の関係を捉える。我々のモデルは、Open Knowledge-Based Planningデータセットで訓練、検証、テストされた。予測された3次元線量分布と正解の3次元線量分布の差を定量的に測定するDose Score $\overline{S_{\text{Dose}}}$およびDVH Score $\overline{S_{\text{DVH}}}$に加えて、予測の臨床的信頼性を評価するための定性的指標として、平均ボリューム単位受入率$\overline{R_{\text{VA}}}$と平均患者単位臨床受入率$\overline{R_{\text{PA}}}$を提案する。 Swin UNETR++は、検証データセットとテストデータセットで最先端に近い性能を示し(検証: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; テスト: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%)、3次元線量予測を実施可能な治療計画へ変換する将来の研究の基盤を確立し、完全自動化を促進する。

The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense prediction. In this work, we propose Swin UNETR++, that contains a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra and inter-volume relationships of each patient's unique anatomy, which fully convolutional neural networks lack. Our model was trained, validated, and tested on the Open Knowledge-Based Planning dataset. In addition to metrics of Dose Score $\overline{S_{\text{Dose}}}$ and DVH Score $\overline{S_{\text{DVH}}}$ that quantitatively measure the difference between the predicted and ground-truth 3D radiation dose distribution, we propose the qualitative metrics of average volume-wise acceptance rate $\overline{R_{\text{VA}}}$ and average patient-wise clinical acceptance rate $\overline{R_{\text{PA}}}$ to assess the clinical reliability of the predictions. Swin UNETR++ demonstrates near-state-of-the-art performance on validation and test dataset (validation: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; test: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%), establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.

翻訳日:2023-11-14 17:57:54 公開日:2023-11-11

# 深層残差スパイキングニューラルネットワークにおいてADD残差接続に匹敵する精度を達成するOR残差接続

OR Residual Connection Achieving Comparable Accuracy to ADD Residual Connection in Deep Residual Spiking Neural Networks ( http://arxiv.org/abs/2311.06570v1 )

ライセンス: Link先を確認

Yimeng Shan, Xuerui Qiu, Rui-jie Zhu, Ruike Li, Meng Wang, Haicheng Qu

(参考訳) スパイキングニューラルネットワーク(SNN)は、その生物学的忠実性と、エネルギー効率の高いスパイク駆動演算を実行できる能力から、脳型コンピューティングにおいて大きな注目を集めている。 SNNの性能向上への要求が高まるにつれ、より深いネットワークを訓練する傾向は不可避となっており、残差学習はディープニューラルネットワークを訓練するための重要な手法となっている。我々は調査の中で、深層残差スパイキングニューラルネットワークの代表格であるSEW-ResNetが、非イベント駆動の演算を含んでいることを確認した。これを是正するために、アーキテクチャにOR残差接続(ORRC)を導入する。さらに、高い量子化に起因するエネルギー損失を相殺するために、抑制性注意(IA)モジュールと多次元注意(MA)モジュールを融合した相乗的注意(SynA)モジュールを提案する。 SynAをネットワークに組み込むと、トレーニング後にネットワーク内のショートカットの一部または全部が、モデルの分類精度に影響を与えることなく自然に脱落する「自然なプルーニング」現象が観察された。これにより計算オーバーヘッドが大幅に削減され、エッジデバイスへのデプロイにより適したものとなる。様々な公開データセットでの実験結果から、SynAで強化したOR-Spiking ResNetは、ニューロンあたりわずか0.8スパイクで単一サンプルの分類を達成することが確認された。さらに、他のスパイク残差モデルと比較して、より高い精度とより低い消費電力を示した。コードはhttps://github.com/Ym-Shan/ORRC-SynA-natural-pruningで公開されている。

Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations. As the demand for heightened performance in SNNs surges, the trend towards training deeper networks becomes imperative, while residual learning stands as a pivotal method for training deep neural networks. In our investigation, we identified that the SEW-ResNet, a prominent representative of deep residual spiking neural networks, incorporates non-event-driven operations. To rectify this, we introduce the OR Residual connection (ORRC) to the architecture. Additionally, we propose the Synergistic Attention (SynA) module, an amalgamation of the Inhibitory Attention (IA) module and the Multi-dimensional Attention (MA) module, to offset energy loss stemming from high quantization. When integrating SynA into the network, we observed the phenomenon of "natural pruning", where after training, some or all of the shortcuts in the network naturally drop out without affecting the model's classification accuracy. This significantly reduces computational overhead and makes it more suitable for deployment on edge devices. Experimental results on various public datasets confirmed that the SynA enhanced OR-Spiking ResNet achieved single-sample classification with as little as 0.8 spikes per neuron. Moreover, when compared to other spike residual models, it exhibited higher accuracy and lower power consumption. Codes are available at https://github.com/Ym-Shan/ORRC-SynA-natural-pruning.
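
A small element-wise sketch can show the difference between combining two spike trains by addition and by logical OR. It assumes spikes are encoded as 0/1 integers; how the paper actually wires ORRC into the residual blocks is not described in the abstract, so the function names and the toy vectors here are illustrative only.

```python
def add_residual(shortcut, branch):
    """ADD-style merge: sums can exceed 1, so the output is no longer a binary spike."""
    return [s + b for s, b in zip(shortcut, branch)]


def or_residual(shortcut, branch):
    """OR-style merge: the output stays in {0, 1} for binary inputs."""
    return [s | b for s, b in zip(shortcut, branch)]


shortcut = [0, 1, 1, 0]
branch = [0, 1, 0, 1]
added = add_residual(shortcut, branch)
merged = or_residual(shortcut, branch)
```

Where both inputs fire at once, the ADD merge produces the value 2 while the OR merge keeps a single spike, which is one way to read the abstract's contrast between non-event-driven and spike-driven operations.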

翻訳日:2023-11-14 17:56:58 公開日:2023-11-11

# SCADI:潜在変数モデルにおける自己教師付き因果的ディスエンタングルメント

SCADI: Self-supervised Causal Disentanglement in Latent Variable Models ( http://arxiv.org/abs/2311.06567v1 )

ライセンス: Link先を確認

Heejeong Nam

(参考訳) 因果的ディスエンタングルメントは、複雑な状況を捉える大きな可能性を秘めている。しかし、実用的で効率的なアプローチは不足している。教師なしのディスエンタングルメント手法の多くは、追加情報なしでは識別可能な結果を得られず、しばしばランダムに解きほぐされた出力をもたらすことが既に知られている。そのため、既存のディスエンタングルメントモデルのほとんどは、内在的要因に関する情報を与える弱教師付きモデルであり、過大なコストを伴う。そこで本研究では、いかなる教師信号もなしに意味的要因を発見し、それらの因果関係を学習できる新しいモデルであるSCADI(SElf-supervised CAusal DIsentanglement)を提案する。このモデルは、マスク付き構造因果モデル(SCM)と因果的ディスエンタングルメントのための疑似ラベル生成器を組み合わせ、自己教師付き因果的ディスエンタングルメントモデルに新たな方向性を示すことを目指す。

Causal disentanglement has great potential for capturing complex situations. However, there is a lack of practical and efficient approaches. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised, providing information about intrinsic factors, which incurs excessive costs. To address this, we propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that enables the model to discover semantic factors and learn their causal relationships without any supervision. This model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.

翻訳日:2023-11-14 17:56:33 公開日:2023-11-11

# Convolve and Conquer: Wienerフィルタによるデータ比較

Convolve and Conquer: Data Comparison with Wiener Filters ( http://arxiv.org/abs/2311.06558v1 )

ライセンス: Link先を確認

Deborah Pelacani Cruz, George Strong, Oscar Bates, Carlos Cueto, Jiashun Yao, Lluis Guasch

(参考訳) データサンプル間の差異や類似性の定量的評価は、データ分布の学習に関わる最適化問題を定義し、方向づける。現在のデータ比較手法は、そのような分布を捉える能力に限界があったり、最適化に望ましい数学的性質(例えば、滑らかさ、微分可能性、凸性)を欠いていたりすることが多い。本稿では、Wienerフィルタ理論に着想を得て、対になったサンプル間の(非)類似性を測定する新しい手法を提案する。 Wienerフィルタの畳み込み的性質により、データサンプルを大域的に相関した形で包括的に比較できる。データ圧縮、医用画像の補完、平行移動を加えた分類、ノンパラメトリック生成モデリングの4つの機械学習応用において、本手法を検証する。その結果、従来の平均二乗誤差による同等の実装と比較して、再構成画像の解像度、知覚品質、データ忠実度が向上し、平行移動に対する頑健性も得られることが示された。

Quantitative evaluations of differences and/or similarities between data samples define and shape optimisation problems associated with learning data distributions. Current methods to compare data often suffer from limitations in capturing such distributions or lack desirable mathematical properties for optimisation (e.g. smoothness, differentiability, or convexity). In this paper, we introduce a new method to measure (dis)similarities between paired samples inspired by Wiener-filter theory. The convolutional nature of Wiener filters allows us to comprehensively compare data samples in a globally correlated way. We validate our approach in four machine learning applications: data compression, medical imaging imputation, translated classification, and non-parametric generative modelling. Our results demonstrate increased resolution in reconstructed images with better perceptual quality and higher data fidelity, as well as robustness against translations, compared to conventional mean-squared-error analogue implementations.

翻訳日:2023-11-14 17:56:18 公開日:2023-11-11

# Heuristics-Driven Link-of-Analogy Prompting: 文書レベルのイベント引数抽出のための大規模言語モデルの強化

Heuristics-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction ( http://arxiv.org/abs/2311.06555v1 )

ライセンス: Link先を確認

Hanzhang Zhou, Junlang Qian, Zijian Feng, Hui Lu, Zixiao Zhu, Kezhi Mao

(参考訳) 本研究では、文書レベルのイベント引数抽出(EAE)におけるインコンテキスト学習(ICL)について検討する。本論文では、この問題における主要な課題として、例の選択、コンテキスト長の制限、イベントタイプの多さ、非推論タスクにおけるChain-of-Thought(CoT)プロンプティングの限界を挙げる。これらの課題に対処するために、Heuristic-Driven Link-of-Analogy(HD-LoA)プロンプティング手法を提案する。具体的には、LLMがICLを通じてデモンストレーションからタスク固有のヒューリスティックを学習するという仮説を立て、検証する。この仮説に基づき、場当たり的な例選択のプロセスを、タスクのヒューリスティックを重視する体系的な手法へと変える、明示的なヒューリスティック駆動のデモンストレーション構築手法を導入する。さらに、人間の類推的推論に着想を得て、LLMが既知の状況との類推によって新たな状況を処理できるようにし、適応性を高めるLink-of-Analogyプロンプティングを提案する。広範な実験の結果、本手法は既存のプロンプティング手法および少数ショットの教師あり学習手法を上回り、文書レベルのEAEデータセットでF1スコアが4.53%および9.38%向上した。さらに、感情分析と自然言語推論タスクに適用した場合、HD-LoAプロンプティングは2.87%と2.63%の精度向上を達成し、異なるタスクにわたる有効性を示している。

In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE). The paper identifies key challenges in this problem, including example selection, context length limitation, abundance of event types, and the limitation of Chain-of-Thought (CoT) prompting in non-reasoning tasks. To address these challenges, we introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a methodical one that emphasizes task heuristics. Additionally, inspired by human analogical reasoning, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their adaptability. Extensive experiments show that our method outperforms the existing prompting methods and few-shot supervised learning methods, exhibiting F1 score improvements of 4.53% and 9.38% on the document-level EAE dataset. Furthermore, when applied to sentiment analysis and natural language inference tasks, the HD-LoA prompting achieves accuracy gains of 2.87% and 2.63%, indicating its effectiveness across different tasks.

翻訳日:2023-11-14 17:56:01 公開日:2023-11-11

# 複雑な相互作用ダイナミクスをモデル化するための因子化プロトタイプ付きグラフODE

Graph ODE with Factorized Prototypes for Modeling Complicated Interacting Dynamics ( http://arxiv.org/abs/2311.06554v1 )

ライセンス: Link先を確認

Xiao Luo, Yiyang Gu, Huiyu Jiang, Jinsheng Huang, Wei Ju, Ming Zhang, Yizhou Sun

(参考訳) 本稿では、物理的ダイナミクスや生物学的過程を理解する上で重要な、相互作用する力学系のモデリング問題を扱う。最近の研究は主に幾何学的グラフを用いてこれらの相互作用を表現し、それを強力なグラフニューラルネットワーク(GNN)で捉えている。しかし、分布外シフトや複雑な基礎法則といった困難なシナリオにおける相互作用ダイナミクスの予測は未解決のままである。本稿では、この問題に対処するために、因子化プロトタイプを用いたグラフODE(GOAT)という新しい手法を提案する。 GOATの中核は、文脈知識から得た因子化プロトタイプを連続グラフODEフレームワークに組み込むことである。具体的には、GOATは表現のディスエンタングルメントとシステムパラメータを用いて、過去の軌跡からオブジェクトレベルとシステムレベルの両方のコンテキストを抽出する。これにより、それぞれの独立した影響を明示的にモデル化でき、システム変化の下での一般化能力が高まる。次に、これらの解きほぐされた潜在表現をグラフODEモデルに統合し、モデルの表現力を高めるために、相互作用する様々なプロトタイプの組み合わせを決定する。モデル全体は、尤度を最大化するために、エンドツーエンドの変分推論フレームワークを用いて最適化される。分布内および分布外の両方の設定での広範な実験により、GOATの優位性が検証された。

This paper studies the problem of modeling interacting dynamical systems, which is critical for understanding physical dynamics and biological processes. Recent research predominantly uses geometric graphs to represent these interactions, which are then captured by powerful graph neural networks (GNNs). However, predicting interacting dynamics in challenging scenarios such as out-of-distribution shift and complicated underlying rules remains unsolved. In this paper, we propose a new approach named Graph ODE with factorized prototypes (GOAT) to address the problem. The core of GOAT is to incorporate factorized prototypes from contextual knowledge into a continuous graph ODE framework. Specifically, GOAT employs representation disentanglement and system parameters to extract both object-level and system-level contexts from historical trajectories, which allows us to explicitly model their independent influence and thus enhances the generalization capability under system changes. Then, we integrate these disentangled latent representations into a graph ODE model, which determines a combination of various interacting prototypes for enhanced model expressivity. The entire model is optimized using an end-to-end variational inference framework to maximize the likelihood. Extensive experiments in both in-distribution and out-of-distribution settings validate the superiority of GOAT.

翻訳日:2023-11-14 17:55:35 公開日:2023-11-11

# ビジュアルコモンセンスに基づく異種グラフコントラスト学習

Visual Commonsense based Heterogeneous Graph Contrastive Learning ( http://arxiv.org/abs/2311.06553v1 )

ライセンス: Link先を確認

Zongzhao Li, Xiangyu Zhu, Xi Zhang, Zhaoxiang Zhang, Zhen Lei

(参考訳) 視覚的質問応答(VQA)のような多くのマルチモーダルアプリケーションにおいて、関連するキーオブジェクトをどのように選択するか、そして視覚領域と言語領域にまたがる複雑な関係をどのように推論するかは、2つの重要な問題である。本研究では、視覚的コモンセンス情報を取り入れ、視覚的推論タスクをより良く遂行するための異種グラフコントラスト学習法を提案する。本手法はプラグアンドプレイ方式として設計されており、幅広い代表的手法と迅速かつ容易に組み合わせることができる。具体的には、本モデルは、コモンセンスに基づくコントラスト学習とグラフ関係ネットワークという2つの重要な構成要素を含む。コントラスト学習を用いて、識別的なオブジェクトと関連する視覚的コモンセンス属性にモデルがより注目するよう導く。さらに、グラフ関係ネットワークの導入により、モデルは同種エッジ間の相関と異種エッジ間の類似性について推論し、情報伝達がより効果的になる。 4つのベンチマークでの広範な実験により、本手法は7つの代表的なVQAモデルを大幅に改善し、その有効性と一般化性を示した。

How to select relevant key objects and reason about the complex relationships across the vision and linguistic domains are two key issues in many multi-modality applications such as visual question answering (VQA). In this work, we incorporate the visual commonsense information and propose a heterogeneous graph contrastive learning method to better finish the visual reasoning task. Our method is designed in a plug-and-play way, so that it can be quickly and easily combined with a wide range of representative methods. Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network. Using contrastive learning, we guide the model to concentrate more on discriminative objects and relevant visual commonsense attributes. Besides, thanks to the introduction of the Graph Relation Network, the model reasons about the correlations between homogeneous edges and the similarities between heterogeneous edges, which makes information transmission more effective. Extensive experiments on four benchmarks show that our method greatly improves seven representative VQA models, demonstrating its effectiveness and generalizability.

翻訳日:2023-11-14 17:55:13 公開日:2023-11-11

# Stain Consistency Learning: 自動デジタル病理セグメンテーションのための染色のばらつきへの対処

Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation ( http://arxiv.org/abs/2311.06552v1 )

ライセンス: Link先を確認

Michael Yeung, Todd Watts, Sean YW Tan, Pedro F. Ferreira, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang

(参考訳) 染色のばらつきは、デジタル病理の自動解析に特有の課題である。染色のばらつきに対する機械学習手法の頑健性を改善するために多くの手法が開発されてきたが、比較研究では性能への利点は限定的であることが示されている。さらに、染色のばらつきに対処する手法は主にH&E染色データ向けに開発され、評価は概して分類タスクに限られていた。本稿では、染色固有のデータ拡張と染色一貫性損失関数を組み合わせて、染色の色に不変な特徴を学習する新しい枠組みであるStain Consistency Learningを提案する。セグメンテーションタスクにおいて染色のばらつきに対処する手法について初めての広範な比較を行い、Massonのトリクローム染色とH&E染色の、それぞれ細胞と核のデータセットで10の手法を比較する。染色正規化手法では性能が同等または低下したのに対し、染色拡張手法や染色敵対的手法では性能が向上し、最良の性能は一貫して提案手法によって達成された。コードは https://github.com/mlyg/stain_consistency_learning で入手できる。

Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning

翻訳日:2023-11-14 17:54:53 公開日:2023-11-11

# FDNet:歯のCBCT画像のための特徴分離セグメンテーションネットワーク

FDNet: Feature Decoupled Segmentation Network for Tooth CBCT Image ( http://arxiv.org/abs/2311.06551v1 )

ライセンス: Link先を確認

Xiang Feng, Chengkai Wang, Chengyu Wu, Yunxiang Li, Yongbo He, Shuai Wang, Yaiqi Wang

(参考訳) 歯のコーンビームCT(CBCT)画像の精密なセグメンテーションは、矯正治療計画に不可欠である。本稿では、複雑なアーチファクトや不明瞭な歯の境界など、CBCTスキャンで遭遇する多様な歯科的状況に対応するために、特徴分離セグメンテーションネットワークFDNet(Feature Decoupled Segmentation Network)を提案する。低周波ウェーブレット変換(LF-Wavelet)を用いて歯の大域的な構造的整合性を強調することでセマンティックな内容を充実させ、SAMエンコーダを活用して境界の描出を精緻化することで、隣接する歯科構造間のコントラストを向上させる。これら2つの側面を統合することで、FDNetはセマンティックギャップに適切に対処し、詳細で正確なセグメンテーションを提供する。本フレームワークの有効性は厳密なベンチマークによって検証され、DiceとIoUでそれぞれ85.28%と75.23%という最高スコアを達成した。セマンティック特徴と境界特徴のこの革新的な分離は、各要素固有の強みを生かし、セグメンテーション性能の質を大幅に高める。

Precise Tooth Cone Beam Computed Tomography (CBCT) image segmentation is crucial for orthodontic treatment planning. In this paper, we propose FDNet, a Feature Decoupled Segmentation Network, to excel in the face of the variable dental conditions encountered in CBCT scans, such as complex artifacts and indistinct tooth boundaries. The Low-Frequency Wavelet Transform (LF-Wavelet) is employed to enrich the semantic content by emphasizing the global structural integrity of the teeth, while the SAM encoder is leveraged to refine the boundary delineation, thus improving the contrast between adjacent dental structures. By integrating these dual aspects, FDNet adeptly addresses the semantic gap, providing a detailed and accurate segmentation. The framework's effectiveness is validated through rigorous benchmarks, achieving the top Dice and IoU scores of 85.28% and 75.23%, respectively. This innovative decoupling of semantic and boundary features capitalizes on the unique strengths of each element to significantly elevate the quality of segmentation performance.
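
The Dice and IoU scores quoted above are overlap measures between a predicted mask and a reference mask. The sketch below computes both for flattened binary masks, following their standard definitions. The function name, the convention for two empty masks, and the toy masks are assumptions for illustration and are unrelated to the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as 0/1 lists."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU); scores averaged over many cases, as benchmark figures usually are, need not satisfy that identity exactly.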

翻訳日:2023-11-14 17:54:29 公開日:2023-11-11

# 周期駆動された散逸性超低温原子における非エルミート表皮効果

Non-Hermitian Skin Effect In Periodically-Driven Dissipative Ultracold Atoms ( http://arxiv.org/abs/2311.06550v1 )

ライセンス: Link先を確認

Zhao-Fan Cai and Tao Liu and Zhongmin Yang

(参考訳) 非エルミート表皮効果(NHSE)は、バルクバンドの固有状態が系の局在した境界モードへと崩壊することを特徴とし、非エルミート物理学の分野において最も顕著な性質の1つである。 NHSEに関連する特異な物理現象は多くの関心を集めているが、その実験的実現には通常、非相反ホッピングが必要であり、超低温原子系では大きな困難を伴う。本研究では、交互に配置された原子損失の存在下で周期的に駆動される超低温原子により、1次元光格子中でNHSEを実現することを提案する。高周波近似における有効フロケハミルトニアンを調べることにより、周期駆動によって誘起されるNHSEの背後にあるメカニズムを明らかにする。頑健なNHSEは駆動位相によって調整でき、これは動的局在として現れることがわかった。最も注目すべきことに、異なる駆動位相を持つ2つの結合鎖に対して周期駆動誘起の臨界表皮効果を見出し、これはサイズ依存のトポロジカルなギャップ内モードの出現を伴う。本研究は、超低温原子系において、NHSEを観測し、非エルミート性と多体統計の相互作用に起因する特異な物理現象を探索するための実現可能な方法を提供する。

The non-Hermitian skin effect (NHSE), featured by the collapse of bulk-band eigenstates into the localized boundary modes of the systems, is one of the most striking properties in the field of non-Hermitian physics. Unique physical phenomena related to the NHSE have attracted a lot of interest; however, their experimental realizations usually require nonreciprocal hopping, which faces a great challenge in ultracold-atom systems. In this work, we propose to realize the NHSE in a 1D optical lattice by periodically-driven ultracold atoms in the presence of staggered atomic loss. By studying the effective Floquet Hamiltonian in the high-frequency approximation, we reveal the underlying mechanism for the periodic-driving-induced NHSE. We found that the robust NHSE can be tuned by the driving phase, which is manifested by the dynamical localization. Most remarkably, we uncover the periodic-driving-induced critical skin effect for two coupled chains with different driving phases, accompanied by the appearance of size-dependent topological in-gap modes. Our studies provide a feasible way for observing the NHSE and exploring corresponding unique physical phenomena due to the interplay of non-Hermiticity and many-body statistics in ultracold-atom systems.

翻訳日:2023-11-14 17:54:08 公開日:2023-11-11

# Back to Basics: 高速なデノイジング反復アルゴリズム

Back to Basics: Fast Denoising Iterative Algorithm ( http://arxiv.org/abs/2311.06634v1 )

ライセンス: Link先を確認

Deborah Pereg

(参考訳) ノイズ低減のための高速な反復アルゴリズムであるBack to Basics (BTB)を紹介する。本手法は計算効率が高く、訓練や正解データを必要とせず、独立な雑音が存在する場合にも、雑音レベルが未知の相関(コヒーレント)雑音が存在する場合にも適用できる。加法性白色ガウス雑音下での自然画像のデノイジング、ポアソン分布に従う画像のデノイジング、光コヒーレンストモグラフィ(OCT)におけるスペックル抑制の3つの事例を検討する。実験結果から、提案手法は困難な雑音条件下でも画像品質を効果的に向上できることが示された。収束の安定性に関する理論的保証も与えられる。

We introduce Back to Basics (BTB), a fast iterative algorithm for noise reduction. Our method is computationally efficient, does not require training or ground truth data, and can be applied in the presence of independent noise, as well as correlated (coherent) noise, where the noise level is unknown. We examine three study cases: natural image denoising in the presence of additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT). Experimental results demonstrate that the proposed approach can effectively improve image quality in challenging noise settings. Theoretical guarantees are provided for convergence stability.

翻訳日:2023-11-14 17:46:23 公開日:2023-11-11

# 精神医学検出アプリケーション、特にうつ病障害における機械学習と解釈可能な機械学習手法の活用の課題と問題点

The Pros and Cons of Using Machine Learning and Interpretable Machine Learning Methods in psychiatry detection applications, specifically depression disorder: A Brief Review ( http://arxiv.org/abs/2311.06633v1 )

ライセンス: Link先を確認

Hossein Simchi, Samira Tajik

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックにより、多くの人々が社会的活動を制限することを余儀なくされ、精神疾患、特にうつ病が増加した。これらの病気を精度とスピードで診断し、自殺などの重篤な結果を防ぐため、機械学習の利用がますます重要になっている。さらに、より良い治療のために正確で理解可能な診断を提供するためには、AI科学者と研究者は、解釈可能なAIベースのソリューションを開発する必要がある。本稿では、機械学習と解釈可能なAIの分野における関連記事の概要を紹介し、精神疾患検出アプリケーションでAIを使用することの利点とデメリットを理解するのに役立つ。

The COVID-19 pandemic has forced many people to limit their social activities, which has resulted in a rise in mental illnesses, particularly depression. To diagnose these illnesses with accuracy and speed, and to prevent severe outcomes such as suicide, the use of machine learning has become increasingly important. Additionally, to provide precise and understandable diagnoses for better treatment, AI scientists and researchers must develop interpretable AI-based solutions. This article provides an overview of relevant articles in the field of machine learning and interpretable AI, to help readers understand the advantages and disadvantages of using AI in psychiatric disorder detection applications.

翻訳日:2023-11-14 17:46:04 公開日:2023-11-11

# スパース正定値行列の特定のクラスの正確な決定式

The Exact Determinant of a Specific Class of Sparse Positive Definite Matrices ( http://arxiv.org/abs/2311.06632v1 )

ライセンス: Link先を確認

Mehdi Molkaraie

(参考訳) スパースガウス図形モデルの特定のクラスに対して、共分散行列の行列式に対する閉形式解を提供する。私たちのフレームワークでは、グラフィカル相互作用モデル(すなわち共分散選択モデル)は$\mathcal{K}_{n}$と$\mathcal{K}_{n-1}$の置換積に等しい。この解析は、正規因子グラフ双対定理とホログラフィックアルゴリズムの応用と見なすことができるモデルの局所因子のフーリエ変換を基礎としている。変換されたグラフィカルモデルに行列行列式Lemmaを適用することにより、クローズドフォーム表現を得る。この文脈では、2つのガウスのグラフィカルモデル間の同値の概念も定義する。

For a specific class of sparse Gaussian graphical models, we provide a closed-form solution for the determinant of the covariance matrix. In our framework, the graphical interaction model (i.e., the covariance selection model) is equal to the replacement product of $\mathcal{K}_{n}$ and $\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$ vertices. Our analysis is based on taking the Fourier transform of the local factors of the model, which can be viewed as an application of the Normal Factor Graph Duality Theorem and holographic algorithms. The closed-form expression is obtained by applying the Matrix Determinant Lemma to the transformed graphical model. In this context, we also define a notion of equivalence between two Gaussian graphical models.
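
The Matrix Determinant Lemma named in the abstract states that, for an invertible matrix $A$ and vectors $u$, $v$, $\det(A + uv^T) = (1 + v^T A^{-1} u)\det(A)$. The following is a minimal numerical check of that identity only, not the paper's derivation. The diagonal matrix, the vectors `u` and `v`, and the helper `det` are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def det(matrix):
    """Determinant via Gaussian elimination, using exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


# Hypothetical example: A = diag(d), with a rank-one update u v^T.
d = [2, 3, 5]
u = [1, 2, 1]
v = [3, 1, 2]
n = len(d)

updated = [[(d[i] if i == j else 0) + u[i] * v[j] for j in range(n)]
           for i in range(n)]

# Left-hand side: determinant of the updated matrix, computed directly.
lhs = det(updated)

# Right-hand side: (1 + v^T A^{-1} u) * det(A); A is diagonal, so both
# the inverse and the determinant are elementwise.
det_a = Fraction(1)
for value in d:
    det_a *= value
correction = 1 + sum(Fraction(v[i] * u[i], d[i]) for i in range(n))
rhs = correction * det_a

print(lhs, rhs)
```

Both sides evaluate to 107 for these values. The practical appeal of the lemma is that the right-hand side avoids a full determinant computation whenever $\det(A)$ and $A^{-1}$ are cheap to obtain.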

翻訳日:2023-11-14 17:45:41 公開日:2023-11-11

# 画像品質伝達のための3次元条件拡散モデル -低磁場MRIへの応用-

A 3D Conditional Diffusion Model for Image Quality Transfer -- An Application to Low-Field MRI ( http://arxiv.org/abs/2311.06631v1 )

ライセンス: Link先を確認

Seunghoi Kim, Henry F. J. Tregidgo, Ahmed K. Eldaly, Matteo Figini, Daniel C. Alexander

(参考訳) 低磁場(lf)mriスキャナー(<1t)は、限られたリソースや信頼性の低い電源でまだ普及している。しかし、高磁場(HF)スキャナよりも空間分解能とコントラストの低い画像が得られることが多い。この品質格差は、不正確な臨床解釈をもたらす可能性がある。画像品質伝達(IQT)は,低画質画像と高画質画像のマッピング関数を学習することにより,画像の品質を高めるために開発された。既存のIQTモデルは、しばしば高周波の特徴の復元に失敗し、ぼやけた出力をもたらす。本稿では,3次元ボリュームデータ,特にLF MR画像を改善するための3次元条件拡散モデルを提案する。さらに,ネットワークの自己注意とパディングにクロスバッチ機構を組み込んで,小さな3Dパッチの下でもより広いコンテキスト認識を確保する。 IQTと脳解析のためのHuman Connectome Project(HCP)データセットの実験は、我々のモデルが既存の手法よりも定量的かつ質的に優れていることを示した。コードは \url{https://github.com/edshkim98/DiffusionIQT} で公開されている。

Low-field (LF) MRI scanners (<1T) are still prevalent in settings with limited resources or unreliable power supply. However, they often yield images with lower spatial resolution and contrast than high-field (HF) scanners. This quality disparity can result in inaccurate clinician interpretations. Image Quality Transfer (IQT) has been developed to enhance the quality of images by learning a mapping function between low and high-quality images. Existing IQT models often fail to restore high-frequency features, leading to blurry output. In this paper, we propose a 3D conditional diffusion model to improve 3D volumetric data, specifically LF MR images. Additionally, we incorporate a cross-batch mechanism into the self-attention and padding of our network, ensuring broader contextual awareness even under small 3D patches. Experiments on the publicly available Human Connectome Project (HCP) dataset for IQT and brain parcellation demonstrate that our model outperforms existing methods both quantitatively and qualitatively. The code is publicly available at https://github.com/edshkim98/DiffusionIQT.

翻訳日:2023-11-14 17:45:01 公開日:2023-11-11

# エネルギー移行シナリオの合理化と鍵政策決定

Streamlining Energy Transition Scenarios to Key Policy Decisions ( http://arxiv.org/abs/2311.06625v1 )

ライセンス: Link先を確認

Florian Joseph Baader, Stefano Moret, Wolfram Wiesemann, Iain Staffell, André Bardow

(参考訳) エネルギー移行を取り巻く不確実性は、モデラーが政策立案者が解釈し行動することが困難となるような大きなシナリオを提示することにつながる。もう1つのアプローチは、利害関係者の議論からいくつかの質的なストーリーラインを定義することである。一般的な機械学習手法である決定木を活用することで、多くの定量的シナリオから解釈可能なストーリーラインを導き、エネルギー遷移における重要な決定がどのようにリンクされているかを示す。特に, 再生可能エネルギーとセクタ結合の高度展開を選択することで, 気候変動の感度や需要の不確実性に対して, 世界的な脱炭素シナリオが堅牢になることを示す。また、化石のないヨーロッパへのエネルギー移動は、主にバイオエネルギー、貯蔵、熱電化の役割の選択によって決定される。我々の移行可能なアプローチは、膨大なエネルギーモデルの結果を小さな決定セットに変換し、エネルギー遷移を形成する主要な要因を優先順位付けする決定を導く。

Uncertainties surrounding the energy transition often lead modelers to present large sets of scenarios that are challenging for policymakers to interpret and act upon. An alternative approach is to define a few qualitative storylines from stakeholder discussions, which can be affected by biases and infeasibilities. Leveraging decision trees, a popular machine-learning technique, we derive interpretable storylines from many quantitative scenarios and show how the key decisions in the energy transition are interlinked. Specifically, our results demonstrate that choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand. Also, the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification. Our transferrable approach translates vast energy model results into a small set of critical decisions, guiding decision-makers in prioritizing the key factors that will shape the energy transition.

翻訳日:2023-11-14 17:44:41 公開日:2023-11-11

# VT-Former:インテリジェントハイウェイ交通システムのためのトランスフォーマーベース車両軌道予測手法

VT-Former: A Transformer-based Vehicle Trajectory Prediction Approach For Intelligent Highway Transportation Systems ( http://arxiv.org/abs/2311.06623v1 )

ライセンス: Link先を確認

Armin Danesh Pazho, Vinit Katariya, Ghazal Alinezhad Noghre, Hamed Tabkhi

(参考訳) 道路の安全性と交通管理の強化は、現代のサイバー物理システムやインテリジェントな輸送システムにとって重要な焦点となっている。自動車軌道予測は、高速道路や道路安全への多くの応用において重要な要素である。これらのアプリケーションには、交通管理や事故防止からワークゾーンの安全性の向上、エネルギー保全の最適化に至るまで、幅広いユースケースが含まれている。この文脈でインテリジェントな管理を実現する能力は、道路網を横断する監視カメラの展開とともに、人工知能(ai)の分野での発展によって大きく進歩した。本稿では,高速道路の安全と監視のための車両軌道予測のためのトランスフォーマーに基づく新しいアプローチ,VT-Formerを提案する。トランスフォーマを使用して長距離の時間パターンを捉えることに加えて、車両間の複雑な社会的相互作用を捉えるために、新しいグラフ注意トークン化(gat)モジュールが提案されている。これら2つのコアコンポーネントを組み合わせることで、車両軌道予測の正確なアプローチが達成される。車両軌道予測におけるVT-Formerの性能と,その一般化性とロバスト性を示す3つの異なる視点を持つ3つのベンチマークデータセットについて検討した。また,組込み基板上でのvt-formerの効率を評価し,サンプルアプリケーションとしての車両異常検出の可能性について検討し,その幅広い適用性を示す。

Enhancing roadway safety and traffic management has become an essential focus area for a broad range of modern cyber-physical systems and intelligent transportation systems. Vehicle trajectory prediction is a pivotal element within numerous applications for highway and road safety. These applications encompass a wide range of use cases, spanning from traffic management and accident prevention to enhancing work-zone safety and optimizing energy conservation. The ability to implement intelligent management in this context has been greatly advanced by developments in the field of Artificial Intelligence (AI), alongside the increasing deployment of surveillance cameras across road networks. In this paper, we introduce a novel transformer-based approach for vehicle trajectory prediction for highway safety and surveillance, denoted as VT-Former. In addition to utilizing transformers to capture long-range temporal patterns, a new Graph Attentive Tokenization (GAT) module is proposed to capture intricate social interactions among vehicles. Combining these two core components culminates in a precise approach for vehicle trajectory prediction. Our study on three benchmark datasets with three different viewpoints demonstrates the state-of-the-art (SoTA) performance of VT-Former in vehicle trajectory prediction and its generalizability and robustness. We also evaluate VT-Former's efficiency on embedded boards and explore its potential for vehicle anomaly detection as a sample application, showcasing its broad applicability.

翻訳日:2023-11-14 17:44:05 公開日:2023-11-11

# TrainerAgent: LLM搭載マルチエージェントシステムによるカスタマイズ可能かつ効率的なモデルトレーニング

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System ( http://arxiv.org/abs/2311.06622v1 )

ライセンス: Link先を確認

Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He

(参考訳) AIモデルのトレーニングは、特にパーソナライズされたサービスを提供するカスタムモデルが必要な場合、常に困難だった。アルゴリズムエンジニアは、特定のビジネス要件に合わせて反復的にモデルを開発するための長いプロセスに直面します。高品質で効率的なモデル開発の探求は、大規模言語モデル(LLM)エージェントの出現とともに、業界において重要な焦点となっている。 LLMの強力な分析,計画,意思決定機能を活用し,タスク,データ,モデル,サーバエージェントを含むマルチエージェントフレームワークからなるTrainerAgentシステムを提案する。これらのエージェントは、ユーザ定義のタスク、入力データ、要求(例えば、精度、速度)を分析し、データとモデルの両方の観点から包括的な最適化を行い、満足なモデルを取得し、最終的にこれらのモデルをオンラインサービスとしてデプロイする。コンピュータビジョンおよび自然言語処理領域における古典的識別的・生成的タスクに関する実験的評価は,我々のシステムが所望の基準を満たすモデルを一貫して生成していることを示す。さらに、システムは、空想的なシナリオや非倫理的な要求など、達成不可能なタスクを批判的に識別し、拒否する能力を示し、堅牢性と安全性を確保する。本研究は, LLMを用いた分析, 意思決定, 実行能力の統合, および4つのエージェント間の協調により, 従来のモデル開発と比較して, 効率と品質が向上した望ましいモデルの実現において, 大幅な進歩を示すものである。我々は,AI分野におけるモデル開発の新たなパラダイムとして,学術および産業コミュニティにおけるTrainerAgentの研究の進展に,我々の研究が貢献することを期待している。

Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, and the process is even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLMs, we propose a TrainerAgent system comprising a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimize them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as an online service. Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.

翻訳日:2023-11-14 17:43:45 公開日:2023-11-11

# 粗粒土壌の粒子径解析のためのコンピュータビジョン

Computer Vision for Particle Size Analysis of Coarse-Grained Soils ( http://arxiv.org/abs/2311.06613v1 )

ライセンス: Link先を確認

Sompote Youwai and Parchya Makam

(参考訳) 粒子径解析(PSA)は土壌の物理的特性を評価するための基礎技術である。しかし、ふるい分けのような従来の方法は時間と労力がかかる。本研究では,コンピュータビジョン(CV)とPythonプログラミング言語を用い,一般的な携帯電話のカメラで粗粒土のPSAを行う新しいアプローチを提案する。高性能カメラの必要性をなくすことで, 利便性とコスト削減を実現する。本手法では,通常の照明条件下で撮影されたデジタル写真中の土壌粒子の検出と測定にOpenCVライブラリを使用する。正確な粒子径決定のために、既知の寸法のキャリブレーションターゲットを20種類の異なる砂サンプルと共に無地の紙の上に配置する。提案手法を従来のふるい分析と比較したところ, 2mmより大きい土壌粒子に対しては平均絶対パーセント誤差(MAPE)が約6%と良好な性能を示した。しかし、2mmより小さい粒子ではMAPEが高くなり、最大60%に達する。この制限に対処するために,より小さい土壌粒子の画像を高解像度カメラで撮影することを推奨する。さらに,本手法の利点,限界,今後の改善の可能性についても論じる。注目すべきことに、このプログラムは携帯電話上で実行でき、土壌サンプルを実験室に送ることなくすぐに結果を提供できる。この現場向きの特徴により,従来の実験室環境以外での現場利用に非常に便利である。最終的に、この新手法は、実験室でのふるい分析に頼らずに土壌の効率的な粒径分析を可能にし、業界に変革をもたらす第一歩となる。 KEYWORDS:コンピュータビジョン、粒度、ArUco

Particle size analysis (PSA) is a fundamental technique for evaluating the physical characteristics of soils. However, traditional methods like sieving can be time-consuming and labor-intensive. In this study, we present a novel approach that utilizes computer vision (CV) and the Python programming language for PSA of coarse-grained soils, employing a standard mobile phone camera. By eliminating the need for a high-performance camera, our method offers convenience and cost savings. Our methodology involves using the OpenCV library to detect and measure soil particles in digital photographs taken under ordinary lighting conditions. For accurate particle size determination, a calibration target with known dimensions is placed on plain paper alongside 20 different sand samples. The proposed method is compared with traditional sieve analysis and exhibits satisfactory performance for soil particles larger than 2 mm, with a mean absolute percentage error (MAPE) of approximately 6%. However, particles smaller than 2 mm result in a higher MAPE, reaching up to 60%. To address this limitation, we recommend using a higher-resolution camera to capture images of the smaller soil particles. Furthermore, we discuss the advantages, limitations, and potential future improvements of our method. Remarkably, the program can be executed on a mobile phone, providing immediate results without the need to send soil samples to a laboratory. This field-friendly feature makes our approach highly convenient for on-site usage, outside of a traditional laboratory setting. Ultimately, this novel method represents an initial disruption to the industry, enabling efficient particle size analysis of soil without the reliance on laboratory-based sieve analysis. KEYWORDS: Computer vision, Grain size, ArUco
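
The comparison with sieve analysis is reported as a mean absolute percentage error. A minimal sketch of that metric in Python is given below. The abstract does not say which quantity is compared, so the example assumes paired percent-passing values from the two methods; the function name `mape` and all numbers are hypothetical.

```python
def mape(reference, estimate):
    """Mean absolute percentage error of `estimate` relative to `reference`."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    total = sum(abs((r - e) / r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# Hypothetical percent-passing values at four sieve sizes.
sieve_result = [100.0, 80.0, 50.0, 20.0]   # laboratory sieve analysis
image_result = [95.0, 84.0, 47.0, 21.0]    # image-based estimate

error = mape(sieve_result, image_result)
print(f"MAPE = {error:.2f}%")
```

Because each term is divided by the reference value, small reference values inflate the metric, which is one reason to report coarse and fine fractions separately.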

翻訳日:2023-11-14 17:43:15 公開日:2023-11-11

# 知覚GPT:視覚知覚をLLMに効果的に融合させる

PerceptionGPT: Effectively Fusing Visual Perception into LLM ( http://arxiv.org/abs/2311.06612v1 )

ライセンス: Link先を確認

Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang

(参考訳) 視覚入力と大言語モデル(LLM)の統合は、多モーダル機能において顕著な進歩をもたらし、視覚的大言語モデル(VLLM)がもたらされた。しかしながら、複雑な視覚知覚タスクにVLLMを効果的に活用することは課題である。本稿では,LLMのトークン埋め込みの表現力を生かして,VLLMを視覚的知覚能力に効率よく効果的に装備する,PerceptionGPTという新しいエンドツーエンドフレームワークを提案する。提案手法は, LLMのトークン埋め込みを空間情報のキャリアとして扱い, 軽量な視覚タスクエンコーダとデコーダを利用して視覚知覚タスク(例えば, 検出, セグメンテーション)を実行する。このアプローチは,視覚出力を離散的なトークンとして定式化した従来のアプローチが経験したトレーニングの難しさを著しく軽減し,トレーニング可能なパラメータが少なく,トレーニングデータが少なく,トレーニング時間の短縮によって優れたパフォーマンスを実現する。さらに、視覚的出力をデコードするために1つのトークン埋め込みが必要なため、推論中のシーケンス長が大幅に削減される。これにより,高精度かつ柔軟な表現,視覚知覚タスクのシームレスな統合,複数の視覚出力の効率的な処理が可能となる。我々はこのアプローチの有効性と効率を広範囲な実験によって検証する。その結果、トレーニング可能なパラメータやGPU時間を大幅に削減した従来の手法よりも大幅に改善され、視覚的知覚能力を持つLLMの実現に向けた今後の研究が促進された。

The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate visual perception tasks remains a challenge. In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips VLLMs with visual perception abilities by leveraging the representation power of the LLM's token embedding. Our proposed method treats the token embedding of the LLM as the carrier of spatial information, then leverages lightweight visual task encoders and decoders to perform visual perception tasks (e.g., detection, segmentation). Our approach significantly alleviates the training difficulty suffered by previous approaches that formulate the visual outputs as discrete tokens, and achieves superior performance with fewer trainable parameters, less training data, and shorter training time. Moreover, as only one token embedding is required to decode the visual outputs, the resulting sequence length during inference is significantly reduced. Consequently, our approach enables accurate and flexible representations, seamless integration of visual perception tasks, and efficient handling of multiple visual outputs. We validate the effectiveness and efficiency of our approach through extensive experiments. The results demonstrate significant improvements over previous methods with far fewer trainable parameters and GPU hours, which facilitates future research in enabling LLMs with visual perception abilities.

翻訳日:2023-11-14 17:42:52 公開日:2023-11-11

# Monkey: 画像解像度とテキストラベルは、大規模マルチモーダルモデルにとって重要だ

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models ( http://arxiv.org/abs/2311.06607v1 )

ライセンス: Link先を確認

Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

(参考訳) 大規模なマルチモーダルモデルは、一般的な視覚言語タスクを理解する素晴らしい能力を示している。しかし、サポート対象の入力解像度(例えば448 x 448)の制限と、トレーニングされた画像テキストペアの説明不足のため、これらのモデルは複雑なシーン理解や物語を扱う際の課題に直面することが多い。ここでは猿を提案することでこの問題に対処します。私たちの貢献は2つあります。 1) 初期から事前学習することなく,既存の視覚エンコーダ(例えばvit-bighuge)上に構築することで,最大896 x 1344ピクセルの入力解像度を効果的に向上させることができる。 2)シーンとオブジェクト間の文脈関係を学習するために,モデルをガイドできるリッチな情報を自動的に提供する多レベル記述生成手法を提案する。 16以上の異なるデータセットにわたる広範なテストの結果、Monkeyは画像キャプチャ、一般的なビジュアル質問回答(VQA)、ドキュメント指向のVQAといった基本的なタスクにおいて、既存のLMMよりも一貫して競争力のあるパフォーマンスを実現しています。モデル、インタラクティブなデモ、ソースコードは以下の https://github.com/Yuliang-Liu/Monkey で提供されている。

Large Multimodal Models (LMMs) have demonstrated impressive capabilities in understanding general vision-language tasks. However, due to the limited supported input resolution (e.g., 448 x 448) and the non-exhaustive descriptions in the training image-text pairs, these models often encounter challenges when dealing with intricate scene understanding and narratives. Here we address the problem by proposing Monkey. Our contributions are two-fold: 1) without pretraining from the start, our method can be built upon an existing vision encoder (e.g., ViT-BigHuge) to effectively improve the input resolution capacity up to 896 x 1344 pixels; 2) we propose a multi-level description generation method, which automatically provides rich information that can guide the model to learn contextual associations between scenes and objects. Our extensive testing across more than 16 distinct datasets reveals that Monkey achieves consistently competitive performance over existing LMMs on fundamental tasks, such as Image Captioning, General Visual Question Answering (VQA), and Document-oriented VQA. Models, an interactive demo, and the source code are provided at https://github.com/Yuliang-Liu/Monkey.

翻訳日:2023-11-14 17:42:27 公開日:2023-11-11

# 「粒子の位置と運動量の連続的同時測定」へのコメント

Comment on "Continuous simultaneous measurement of position and momentum of a particle" ( http://arxiv.org/abs/2311.06606v1 )

ライセンス: Link先を確認

Adélcio C. Oliveira

(参考訳) 最近の論文 [gampel, f. and gajda, m., phys. rev. a 107, 012420, (2023)] では、量子領域における古典的軌道の存在を説明する新しいモデルを提案していると主張した。このアイデアは、位置と運動量の同時測定と「ジャンプマルコフ過程」に基づいている。その結果、古典的軌跡の出現を検出イベントの集合として解釈した。彼らは自由粒子と調和ポテンシャルの下でのモデルの実装に成功した。ここでは,連続観測限界がコヒーレント半古典的展開の実現であることを示す。また,すでに証明されているように,ジャンププロセスは不要であり,観測不能である。言い換えれば、崩壊は非ゴーの定理であり、たとえそれが現実であるとしても、ニュートン古典力学を得るために必要な仮定の下で測定することはできない。

In a recent paper [Gampel, F. and Gajda, M., Phys. Rev. A 107, 012420 (2023)], the authors claimed to propose a new model to explain the existence of classical trajectories in the quantum domain. The idea is based on simultaneous position and momentum measurements and a "jump Markov process". Consequently, they interpreted the emergence of classical trajectories as sets of detection events. They successfully implemented the model for a free particle and for one under a harmonic potential. Here, we show that the continuous observation limit is a realization of a coherent semiclassical expansion. Also, as has already been demonstrated, the jump process is not necessary and is not observable. In other words, the collapse, as they propose it, is subject to a no-go theorem; even if it is real, it cannot be measured under the assumptions needed to obtain Newtonian classical dynamics.

翻訳日:2023-11-14 17:42:05 公開日:2023-11-11

# BizBench:ビジネスとファイナンスのための定量的推論ベンチマーク

BizBench: A Quantitative Reasoning Benchmark for Business and Finance ( http://arxiv.org/abs/2311.06602v1 )

ライセンス: Link先を確認

Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner

(参考訳) 大規模言語モデル(LLM)が多くの複雑なドメインに影響を与えるにつれ、公正で正確で厳密な評価ベンチマークを持つことがますます重要になっている。ビジネスおよび金融NLPに必要な推論スキルを評価することは、特に難しい課題である。実存的な金融問題に対するモデルの判断能力を評価するための新しいベンチマークであるbizbenchを紹介する。 BizBenchは8つの量的推論タスクからなる。特に、BizBenchは、プログラム合成(コード生成)による構造化および非構造化の財務データに対する質問応答(QA)の複雑なタスクをターゲットにしている。本稿では,新たに収集および拡張されたQAデータから,金融をテーマとした3つのコード生成タスクを紹介する。さらに,これらの課題を解決するために必要な財務的推論能力を分離する: 正しい中間値を抽出するために必要な財務的テキストと表の理解を読むこと,複雑な解を計算するために必要なドメイン知識(例えば財務的公式)を理解すること。これらのタスクは、モデルの財務的背景知識、財務文書から数値的実体を抽出する能力、およびコードによる問題を解決する能力を評価する。我々は、BizBenchが金融及びビジネス領域における量的推論の難しいベンチマークであることを示すオープンソースおよび商用LCMの詳細な評価を行う。

As large language models (LLMs) impact a growing number of complex domains, it is becoming increasingly important to have fair, accurate, and rigorous evaluation benchmarks. Evaluating the reasoning skills required for business and financial NLP stands out as a particularly difficult challenge. We introduce BizBench, a new benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises 8 quantitative reasoning tasks. Notably, BizBench targets the complex task of question-answering (QA) for structured and unstructured financial data via program synthesis (i.e., code generation). We introduce three diverse financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate distinct financial reasoning capabilities required to solve these QA tasks: reading comprehension of financial text and tables, which is required to extract correct intermediate values; and understanding domain knowledge (e.g., financial formulas) needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to extract numeric entities from financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, illustrating that BizBench is a challenging benchmark for quantitative reasoning in the finance and business domain.

翻訳日:2023-11-14 17:41:50 公開日:2023-11-11

# ロバスト性の観点からのグロッキングの理解

Understanding Grokking Through A Robustness Viewpoint ( http://arxiv.org/abs/2311.06597v1 )

ライセンス: Link先を確認

Zhiquan Tan, Weiran Huang

(参考訳) 近年、グロッキング(grokking)と呼ばれる異常な現象が注目されている。これは、ニューラルネットワークがトレーニングデータに完全に適合してからかなり後になって一般化する現象である。本研究では、ニューラルネットワークのロバスト性を用いて、この一見奇妙な現象の理解を試みる。ロバスト性の観点から、広く用いられているニューラルネットワークの$l_2$重みノルム(指標)が、実際にはグロッキングの十分条件であることを示す。また、$l_2$ノルムはテストデータ上のグロッキングと時間的にうまく相関しないことが実証的に分かったため、ロバスト性と情報理論に基づく新しい指標を提案し、これらの指標がグロッキング現象とよく相関することを見いだした。これらの観察に基づいて、一般化過程を高速化する手法を提案する。さらに、剰余加算データセットにおける標準的なトレーニング過程を調べ、グロッキング以前には可換法則を含む他の基本的な群演算をほとんど学習していないことを見いだした。興味深いことに、提案手法による一般化の高速化は、モデルがテストデータセット上でグロッキングする際の必要条件である可換法則の学習によって部分的に説明できる。

Recently, an unusual phenomenon called grokking has gained much attention, in which a neural network sometimes generalizes long after it perfectly fits the training data. We try to understand this seemingly strange phenomenon using the robustness of the neural network. From a robustness viewpoint, we show that the popular $l_2$ weight norm (metric) of the neural network is actually a sufficient condition for grokking. As we also empirically find that the $l_2$ norm does not correlate with grokking on the test data in a timely way, we propose new metrics based on robustness and information theory and find that our new metrics correlate well with the grokking phenomenon. Based on the previous observations, we propose methods to speed up the generalization process. In addition, we examine the standard training process on a modulo addition dataset and find that it hardly learns other basic group operations before grokking, including the commutative law. Interestingly, the speed-up of generalization when using our proposed method can be partially explained by learning the commutative law, a necessary condition when the model groks on the test dataset.

翻訳日:2023-11-14 17:41:28 公開日:2023-11-11

# 分類から生成へ:言語横断検索型ICLへの展望

From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL ( http://arxiv.org/abs/2311.06595v1 )

ライセンス: Link先を確認

Xiaoqian Li, Ercong Nie, Sheng Liang

(参考訳) 大きな言語モデル(llm)が命令を理解して従う能力は、低リソース言語でのin-context learning(icl)性能によって制限されることがある。そこで本研究では,言語間検索強化型インコンテキスト学習(CREA-ICL)を活用した新しい手法を提案する。高リソース言語から意味的に類似したプロンプトを抽出することで、様々なタスクにわたる多言語事前学習言語モデル(mplm)のゼロショット性能を向上させることを目指している。我々のアプローチは分類タスクを着実に改善するが、生成タスクの課題に直面している。本評価は,分類領域と生成領域にまたがる検索文内学習の性能動態に関する知見を提供する。

The remarkable ability of Large Language Models (LLMs) to understand and follow instructions has sometimes been limited by their in-context learning (ICL) performance in low-resource languages. To address this, we introduce a novel approach that leverages cross-lingual retrieval-augmented in-context learning (CREA-ICL). By extracting semantically similar prompts from high-resource languages, we aim to improve the zero-shot performance of multilingual pre-trained language models (MPLMs) across diverse tasks. Though our approach yields steady improvements in classification tasks, it faces challenges in generation tasks. Our evaluation offers insights into the performance dynamics of retrieval-augmented in-context learning across both classification and generation domains.

翻訳日:2023-11-14 17:41:08 公開日:2023-11-11

# ホットシステム間の論理ゲートを用いた量子計算

Quantum computation with logical gates between hot systems ( http://arxiv.org/abs/2311.06588v1 )

ライセンス: Link先を確認

Ferran Riera-Sàbat, Pavel Sekatski, and Wolfgang Dür

(参考訳) 量子コンピュータアーキテクチャでは、機械的な基底状態ではないホット量子ビット間で相互作用が媒介される。そのような状況は、例えば、理想的に冷却しない場合や、イオンや原子が動き回る場合などに起こる。論理的に符号化されたシステム間で量子ゲートを導入し、これらのゲートがこのような不完全性に対して弾力性を持つことを示す。このようにして、論理系を大きくすることでゲートの忠実度を向上し、未知の位置や関連する粒子の位置ゆらぎの影響に対処できることを実証する。確率分布における位置の古典的処理と、機械的固有値を用いた量子処理の両方を考慮する。 2つのホットシステム間の相互作用を媒介するクールな論理システムや,その位置が個々に変動するホット物理システムからなる2つの論理システムなど,さまざまな設定を分析した。いずれの場合においても,熱騒音を緩和するためのプラットフォームに依存しないツールを提供するゲートフィダリティの大幅な改善を実証する。

We consider quantum computer architectures where interactions are mediated between hot qubits that are not in their mechanical ground state. Such situations occur, e.g., when not cooling ideally, or when moving ions or atoms around. We introduce quantum gates between logically encoded systems that consist of multiple physical ones and show how the encoding can be used to make these gates resilient against such imperfections. We demonstrate that, in this way, one can improve gate fidelities by enlarging the logical system, and counteract the effect of unknown positions or position fluctuations of the involved particles. We consider both a classical treatment of positions in terms of probability distributions, as well as a quantum treatment using mechanical eigenmodes. We analyze different settings including a cool logical system mediating interactions between two hot systems, as well as two logical systems consisting of hot physical systems whose positions fluctuate collectively or individually. In all cases, we demonstrate a significant improvement of gate fidelities, which provides a platform-independent tool to mitigate thermal noise.

翻訳日:2023-11-14 17:40:55 公開日:2023-11-11

# 非自明な貯蓄によるAgnostic Membership Query Learning: 新しい結果、テクニック

Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques ( http://arxiv.org/abs/2311.06690v1 )

ライセンス: Link先を確認

Ari Karchmer

(参考訳) (要約) 不可知学習モデル(Haussler, 1992; Kearns et al., 1994)における計算効率のよいアルゴリズムの設計は、非常に難しいことで知られる。本研究では、自明な実行時間 $2^n$ に対してどれだけの計算を節約できるかに着目し、不可知学習の最前線にある試金石的なクラスに対する、メンバシップクエリを用いた不可知学習を考察する。このアプローチは「非自明な節約による学習」(Servedio and Tan, 2017)の研究に着想を得て、それを継続するものである。この目的のために、複数の不可知学習アルゴリズムを確立する。主なものは次のとおりである。 1. 劣線形個のゲートからなる回路に対する不可知学習アルゴリズム。各ゲートは、劣対数次数 $k$ の多項式しきい値関数で計算可能な任意の関数でよい(回路の深さはサイズによってのみ制限される)。このアルゴリズムは $s(n) \approx n/(k+1)$ に対して時間 $2^{n-s(n)}$ で実行され、$\{0,1\}^n$ 上のラベルなし例に関する一様分布のもとで学習する。 2. 劣線形個のゲートからなる回路に対する不可知学習アルゴリズム。各ゲートは、劣指数サイズかつ劣対数次数 $k$ の $\mathrm{SYM}^+$ 回路で計算可能な任意の関数でよい。このアルゴリズムは $s(n) \approx n/(k+1)$ に対して時間 $2^{n-s(n)}$ で実行され、それぞれ $\{0,1\}^{n/(k+1)}$ 上の、$k+1$ 個の任意かつ未知の分布の積であるラベルなし例の分布のもとで学習する(一般性を失うことなく $k+1$ が $n$ を割り切ると仮定する)。

(Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of $2^n$. This approach is inspired by and continues the study of "learning with nontrivial savings" (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, which can each be any function computable by a sublogarithmic degree $k$ polynomial threshold function (the depth of the circuit is bounded only by size). This algorithm runs in time $2^{n - s(n)}$ for $s(n) \approx n/(k+1)$, and learns over the uniform distribution over unlabelled examples on $\{0,1\}^n$. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a $\mathrm{SYM}^+$ circuit of subexponential size and sublogarithmic degree $k$. This algorithm runs in time $2^{n - s(n)}$ for $s(n) \approx n/(k+1)$, and learns over distributions of unlabelled examples that are products of $k+1$ arbitrary and unknown distributions, each over $\{0,1\}^{n/(k+1)}$ (assume without loss of generality that $k+1$ divides $n$).
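
To give a feel for the size of the stated savings, the arithmetic behind a runtime of $2^{n-s(n)}$ with $s(n) \approx n/(k+1)$ can be worked through for concrete values. This is only an illustration of the exponent, ignoring constants and lower-order terms; the values of `n` and `k` and the helper `savings_exponent` are assumptions chosen for the example and do not appear in the paper.

```python
def savings_exponent(n, k):
    """Return s(n) = n / (k + 1), assuming k + 1 divides n exactly."""
    if n % (k + 1) != 0:
        raise ValueError("this sketch assumes k + 1 divides n")
    return n // (k + 1)


# Hypothetical instance: 60 input bits, degree k = 2.
n, k = 60, 2
s = savings_exponent(n, k)

trivial_runtime = 2 ** n          # brute force over all 2^n inputs
improved_runtime = 2 ** (n - s)   # runtime with nontrivial savings
speedup = trivial_runtime // improved_runtime

print(f"s(n) = {s}, runtime exponent = {n - s}, speedup factor = 2^{s}")
```

For these values the exponent drops from 60 to 40, so the speedup over exhaustive enumeration is a factor of $2^{20}$. The savings shrink as $k$ grows, since $s(n)$ is inversely proportional to $k+1$.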

翻訳日:2023-11-14 17:34:03 公開日:2023-11-11

# $p$-spinモデルのための単層デジタル化カウンタダイアバティック量子最適化

Single-Layer Digitized-Counterdiabatic Quantum Optimization for $p$-spin Models ( http://arxiv.org/abs/2311.06682v1 )

ライセンス: Link先を確認

Huijie Guan, Fei Zhou, Francisco Albarrán-Arriagada, Xi Chen, Enrique Solano, Narendra N. Hegade, He-Liang Huang

(参考訳) 量子コンピューティングは最適化問題において量子優位の可能性を秘めており、量子アルゴリズムとハードウェア仕様の進歩を必要とする。断熱量子最適化は概念的には有効な解法であるが、ハードウェアのコヒーレンス時間が限られているという問題を抱える。この意味で、反断熱量子プロトコルはこの過程のショートカットを提供し、急速に変化するハミルトニアンの基底状態に沿ってシステムを操る。本研究では,デジタル化カウンタダイアバティック量子最適化(DCQO)アルゴリズムの利点をフル活用し,最大4局所相互作用までの$p$-spinモデルの最適解を求める。適切なスケジューリング関数と初期ハミルトニアンを選択することで、単層の量子回路だけで良好な基底状態との重なりが得られる。さらに変分法を用いてパラメータを最適化することにより、2スピン、3スピン、4スピンの問題を、それぞれ$100\%$、$93\%$、$83\%$のインスタンスについて単位精度で解く。後者の特別な場合として、5、9、12量子ビットを含む因数分解問題も解決する。計算オーバーヘッドが低いため、我々のコンパクトなアプローチは、NISQ時代の量子優位性に向けた貴重なツールとなりうる。

Quantum computing holds the potential for quantum advantage in optimization problems, which requires advances in quantum algorithms and hardware specifications. Adiabatic quantum optimization is conceptually a valid solution, but it suffers from limited hardware coherence times. In this sense, counterdiabatic quantum protocols provide a shortcut to this process, steering the system along its ground state with a fast-changing Hamiltonian. In this work, we take full advantage of a digitized-counterdiabatic quantum optimization (DCQO) algorithm to find an optimal solution of the $p$-spin model up to 4-local interactions. We choose a suitable scheduling function and initial Hamiltonian such that a single-layer quantum circuit suffices to produce a good ground-state overlap. By further optimizing parameters using variational methods, we solve 2-spin, 3-spin, and 4-spin problems with unit accuracy for $100\%$, $93\%$, and $83\%$ of instances, respectively. As a particular case of the latter, we also solve factorization problems involving 5, 9, and 12 qubits. Due to the low computational overhead, our compact approach may become a valuable tool towards quantum advantage in the NISQ era.

翻訳日:2023-11-14 17:33:26 公開日:2023-11-11

# ポストセレクテッドメトロロジーにおける圧縮チャネルの理論

Theory of Compression Channels for Post-selected Metrology ( http://arxiv.org/abs/2311.06679v1 )

ライセンス: Link先を確認

Jing Yang

(参考訳) 確率的メトロロジー(probabilistic metrology)としても知られるポストセレクテッドメトロロジー(post-selected metrology)は、精度を大きく損なうことなくサンプル数を圧縮する効率的なフィルタ、すなわち圧縮チャネルとして利用できる。このメトロロジー手法は、実際の実験で最終的な測定のノイズが非常に大きい、あるいは測定が高価な場合に特に有利である。本研究では、ポストセレクテッドメトロロジーにおける圧縮チャネルに関する一般理論を提示する。圧縮品質を特徴付ける基本的な表記法を定義し、その基礎構造を明らかにする。ポストセレクションを用いた光位相推定と弱値増幅に関するこれまでの実験は、この一般理論の特別な場合であることが示される。さらに、2種類の二部系においては、圧縮チャネルを一方のサブシステムに制限しても圧縮損失を任意に小さくできることを見出した。これらの結果は、測定ノイズとコストが劇的に低減されるように量子測定を分配するために利用できる。そのため、量子技術にすぐに応用されると期待している。

Post-selected metrology, also known as probabilistic metrology, can be employed as an efficient filter or compression channel to compress the number of samples without significant loss of precision. This metrological scheme is especially advantageous when the final measurements are either very noisy or expensive in practical experiments. In this work, we put forward a general theory on the compression channels in post-selected metrology. We define the basic notations characterizing the compression quality and illuminate the underlying structure. Previous experiments on post-selected optical phase estimation and weak-value amplification are shown to be particular cases of this general theory. Furthermore, we discover that for two categories of bipartite systems, the compression loss can be made arbitrarily small even when the compression channel is restricted to one subsystem. These findings can be employed to distribute quantum measurements so that the measurement noise and cost are dramatically reduced. Therefore, we expect they will find immediate applications in quantum technology.

翻訳日:2023-11-14 17:33:06 公開日:2023-11-11

# 適応への夢:潜在文脈イマジネーションとMDPイマジネーションによるメタ強化学習

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination ( http://arxiv.org/abs/2311.06673v1 )

ライセンス: Link先を確認

Lu Wen, Songan Zhang, H. Eric Tseng, Huei Peng

(参考訳) メタ強化学習(Meta RL)は、類似したタスクから学習した知識を転移することで未知のタスクを素早く学習する手法として、広く研究されてきた。しかし、最先端のアルゴリズムの多くは、メタトレーニングタスクがタスク分布を密にカバーしていること、および各タスクについて大量のデータがあることを必要とする。本稿では、メタイマジネーションとMDPイマジネーションを行うことで、必要な実トレーニングタスクとデータを削減するコンテキストベースのメタRLアルゴリズムMetaDreamerを提案する。メタイマジネーションは、分離された(disentangled)性質を持つ学習済みの潜在コンテキスト空間上で補間することにより行い、MDPイマジネーションは、プレーンなVAEネットワークに物理的知識を加えた生成的世界モデルを通じて行う。様々なベンチマークでの実験により、MetaDreamerはデータ効率と補間による汎化の点で既存手法より優れていることを示す。

Meta reinforcement learning (Meta RL) has been amply explored to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to have a dense coverage on the task distribution and a great amount of data for each of them. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires fewer real training tasks and less data by doing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating on the learned latent context space with disentangled properties, as well as MDP-imagination through the generative world model where physical knowledge is added to plain VAE networks. Our experiments with various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.

翻訳日:2023-11-14 17:32:50 公開日:2023-11-11

# デジタル著作権管理(DRM)のガイドライン

Guideline for the Production of Digital Rights Management (DRM) ( http://arxiv.org/abs/2311.06671v1 )

ライセンス: Link先を確認

Shannon Kathleen Coates, Hossein Abroshan

(参考訳) 長年にわたり複数のニュースソースがデジタル著作権管理(DRM)の問題点を報じてきたが、DRM開発に関する改革は行われておらず、単に削除されるだけである。問題は一般によく知られており、対処された後も頻繁に繰り返されている。すなわち、ソフトウェアとそれを実行するデバイスへの影響である。しかし近年、特に指摘された問題を解消する意図をもってこれを議論した例は、あったとしてもごくわずかである。本研究は、デジタル著作権管理を一般的なトピックとして、その取りうる様々な形態、DRMに影響を及ぼす現行法、現在の公衆の受け止め方と反応などを含めてレビューする。また、DRMのさまざまな種類を一般的な言葉で説明し、良い例と悪い例の両方を列挙する。

Multiple news sources over the years have reported on the problematic effects of Digital Rights Management, yet there are no reforms for DRM development, simply removal. The issues are well-known to the public, frequently repeated even when addressed: impact on the software and to the devices that run them. Yet few, if any, have discussed it in recent years, especially with the intent of eliminating the shown issues. This study reviews Digital Rights Management as a general topic, including the various forms it can take, the current laws that affect DRM, and the current public reception and responses. This study describes the different types of DRM in general terms and then lists both positive and negative examples.

翻訳日:2023-11-14 17:32:33 公開日:2023-11-11

# In-context Vectors:潜時空間ステアリングによる文脈学習の効率化と制御性

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ( http://arxiv.org/abs/2311.06668v1 )

ライセンス: Link先を確認

Sheng Liu, Lei Xing, James Zou

(参考訳) 大規模言語モデル(LLM)は、実演例に基づいて新しいタスクに適応する、創発的なコンテキスト内学習能力を示す。しかし、コンテキスト内学習は多くの設定において効果が限定的であり、定量的に制御することが難しく、コンテキストウィンドウの領域も消費する。これらの制限を克服するために、コンテキスト内学習をコンテキスト内ベクトル(ICV)として捉え直す代替手法を提案する。ICVの使用には2つのステップがある。まず、実演例に対するフォワードパスを用いて、LLMの潜在埋め込みからコンテキスト内ベクトルを生成する。このベクトルは、意図したタスクに関する本質的な情報を捉える。新しいクエリに対しては、プロンプトに実演例を追加する代わりに、ICVを用いてLLMの潜在状態をシフトさせる。ICVアプローチにはいくつかの利点がある。 1) LLMがより効果的に実演例に従えるようになる。 2) ICVの大きさを調整することで容易に制御できる。 3) コンテキスト内の実演例を取り除くことでプロンプトの長さを短縮できる。 4) ICVは微調整よりも計算効率がはるかに高い。安全性、スタイル変換、ロールプレイ、フォーマッティングなど多様なタスクにおいて、ICVが標準のコンテキスト内学習や微調整よりも優れた性能を達成することを示す。さらに、対応するICVに対する単純なベクトル演算により、異なる種類の指示に同時に従うようLLMに柔軟に教えられることを示す。

Large language models (LLMs) demonstrate emergent in-context learning capabilities, where they adapt to new tasks based on example demonstrations. However, in-context learning has seen limited effectiveness in many settings, is difficult to quantitatively control and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). Using ICV has two steps. We first use a forward pass on demonstration examples to create the in-context vector from the latent embedding of the LLM. This vector captures essential information about the intended task. On a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV. The ICV approach has several benefits: 1) it enables the LLM to more effectively follow the demonstration examples; 2) it's easy to control by adjusting the magnitude of the ICV; 3) it reduces the length of the prompt by removing the in-context demonstrations; 4) ICV is computationally much more efficient than fine-tuning. We demonstrate that ICV achieves better performance compared to standard in-context learning and fine-tuning on diverse tasks including safety, style transfer, role-playing and formatting. Moreover, we show that we can flexibly teach LLM to simultaneously follow different types of instructions by simple vector arithmetics on the corresponding ICVs.
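The two steps described above (build a vector from demonstrations, then shift latent states with it) can be sketched in a few lines. This is only an illustration under stated assumptions: latent states are plain lists of floats, the vector is a simple mean of target-minus-source differences (the abstract does not say how the ICV is aggregated), and the names `build_icv`, `steer`, and `combine` are hypothetical, not taken from the paper's code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def build_icv(pairs: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Average the (target - source) latent difference over all demonstrations.

    Assumption: a plain mean difference stands in for whatever aggregation
    the paper actually uses; the abstract does not specify it.
    """
    dim = len(pairs[0][0])
    icv = [0.0] * dim
    for source, target in pairs:
        for i in range(dim):
            icv[i] += (target[i] - source[i]) / len(pairs)
    return icv


def steer(hidden: Vector, icv: Vector, strength: float) -> Vector:
    """Shift one latent state along the ICV; `strength` scales the effect."""
    return [h + strength * v for h, v in zip(hidden, icv)]


def combine(*icvs: Vector) -> Vector:
    """Add several ICVs so that a single shift carries several instructions."""
    return [sum(components) for components in zip(*icvs)]


# Toy latent states for two demonstrations, each a (source, target) pair.
demos = [([0.0, 0.0], [1.0, 2.0]), ([1.0, 1.0], [2.0, 1.0])]
icv = build_icv(demos)
shifted = steer([0.5, 0.5], icv, strength=0.5)
```

The `strength` argument corresponds to the "magnitude of the ICV" that the abstract names as the control knob, and `combine` mirrors the vector arithmetic mentioned in its last sentence. In a real model the shift would be applied to hidden states inside the network rather than to a standalone list.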

翻訳日:2023-11-14 17:32:17 公開日:2023-11-11

# 3DFusion - ストリームインスタンスセグメンテーションデータに基づくリアルタイム3Dオブジェクト再構築パイプライン

3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data ( http://arxiv.org/abs/2311.06659v1 )

ライセンス: Link先を確認

Xi Sun, Derek Jacoby, Yvonne Coady

(参考訳) 本稿では、RGB-D画像を用いて、撮影されたシーン内のオブジェクトごとに正確で詳細な3Dモデルを生成する、リアルタイムのセグメンテーション・再構成システムを提案する。最先端のインスタンスセグメンテーション技術を活用し、RGB-Dデータに対してピクセルレベルのセグメンテーションを行い、前景オブジェクトを背景から効果的に分離する。セグメント化されたオブジェクトは、高性能な計算プラットフォーム上で個別の3Dモデルに再構成される。リアルタイム3Dモデリングは、拡張現実/仮想現実、インテリアデザイン、都市計画、道路支援、セキュリティシステムなど、さまざまな分野に適用できる。リアルタイム性能を達成するため、連続フレームを効果的にサンプリングし、再構成品質を確保しつつネットワーク負荷を低減する手法を提案する。さらに、並列3次元再構成のためにマルチプロセスSLAMパイプラインを採用し、クラスタ化したオブジェクトを個々のオブジェクトへ効率的に切り分けられるようにする。本システムは、インスタンスセグメンテーションに業界をリードするフレームワークであるYOLOを用いる。YOLOの性能と精度を向上させるため、類似オブジェクトの重複検出や誤検出を解消する修正を加え、再構成されたモデルが対象と一致するようにした。全体として本研究は、屋内環境におけるオブジェクトのセグメンテーションと再構成を大幅に改善した、堅牢なリアルタイムシステムを確立する。屋外のシナリオへ拡張できる可能性もあり、実世界のアプリケーションに多くの機会を開く。

This paper presents a real-time segmentation and reconstruction system that utilizes RGB-D images to generate accurate and detailed individual 3D models of objects within a captured scene. Leveraging state-of-the-art instance segmentation techniques, the system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background. The segmented objects are then reconstructed into distinct 3D models in a high-performance computation platform. The real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more. To achieve real-time performance, the paper proposes a method that effectively samples consecutive frames to reduce network load while ensuring reconstruction quality. Additionally, a multi-process SLAM pipeline is adopted for parallel 3D reconstruction, enabling efficient cutting of the clustering objects into individuals. This system employs the industry-leading framework YOLO for instance segmentation. To improve YOLO's performance and accuracy, modifications were made to resolve duplicated or false detection of similar objects, ensuring the reconstructed models align with the targets. Overall, this work establishes a robust real-time system with a significant enhancement for object segmentation and reconstruction in the indoor environment. It can potentially be extended to the outdoor scenario, opening up numerous opportunities for real-world applications.

翻訳日:2023-11-14 17:31:55 公開日:2023-11-11

# マイクロ流体技術におけるダイヤモンド量子センサ

Diamond quantum sensors in microfluidics technology ( http://arxiv.org/abs/2311.06656v1 )

ライセンス: Link先を確認

Masazumi Fujiwara

(参考訳) ダイヤモンド量子センシングは、様々な化学的および生物学的文脈において、ナノからマイクロスケールの複数の物理化学的パラメータを探索する新しい技術である。これらのセンサをマイクロ流体デバイスに統合することで、マイクロスケールチャネル内の小さなサンプルボリュームの正確な定量化と分析が可能になる。本稿では,ダイヤモンド量子センサとマイクロ流体デバイスの統合の最近の進歩について述べるとともに,今後の技術発展に焦点をあてて今後の展望を探る。

Diamond quantum sensing is an emerging technology for probing multiple physico-chemical parameters in the nano- to micro-scale dimensions within diverse chemical and biological contexts. Integrating these sensors into microfluidic devices enables the precise quantification and analysis of small sample volumes in microscale channels. In this Perspective, we present recent advancements in the integration of diamond quantum sensors with microfluidic devices and explore their prospects with a focus on forthcoming technological developments.

翻訳日:2023-11-14 17:31:32 公開日:2023-11-11

# セグメンテーション頻度統計を用いた教師なし・半教師付き共顕著物体検出

Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics ( http://arxiv.org/abs/2311.06654v1 )

ライセンス: Link先を確認

Souradeep Chakraborty, Shujon Naha, Muhammet Bastan, Amit Kumar K C, Dimitris Samaras

(参考訳) 本稿では、頻度統計を教師なしで用いることで、画像群に共起する顕著オブジェクトの検出(CoSOD)に取り組み、さらにこれを基に半教師付き手法を開発する。これまでの研究は主に完全教師付きのCoSODに焦点を当てており、訓練に利用できるセグメンテーションアノテーションが限られている場合の共顕著オブジェクト検出にはあまり注意が払われてこなかった。我々の単純かつ効果的な教師なし手法US-CoSODは、教師なし単一画像セマンティックセグメンテーションから得られるオブジェクト共起頻度統計と、自己教師付き特徴学習による顕著な前景検出とを組み合わせる。ImageNet-1kのような大規模なラベルなしデータセットを効果的に活用することで、教師なしCoSODの性能を大幅に改善できることを初めて示す。我々の教師なしモデルは、特に訓練に利用できるラベル付きデータが非常に限られている場合に、半教師付きモデルSS-CoSODの事前学習による初期化として優れている。ラベルなしデータに対する予測から誤った信号が伝播するのを避けるため、半教師付き訓練を導く信頼度推定モジュールを提案する。3つのCoSODベンチマークデータセットでの広範な実験により、我々の教師なしモデルと半教師付きモデルはいずれも、対応する最先端モデルを大きく上回ることを示す(例えばCosal2015データセットでは、US-CoSODモデルはSOTAの教師なしコセグメンテーションモデルに対して8.8%、SS-CoSODモデルはSOTAの半教師付きCoSODモデルに対して11.81%のF値の向上を達成する)。

In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enable us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been allocated to detecting co-salient objects when limited segmentation annotations are available for training. Our simple yet effective unsupervised method US-CoSOD combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections using self-supervised feature learning. For the first time, we show that a large unlabeled dataset e.g. ImageNet-1k can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model is a great pre-training initialization for our semi-supervised model SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both of our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model).

翻訳日:2023-11-14 17:31:25 公開日:2023-11-11

# ローカルVision Transformerを用いた交通標識認識

Traffic Sign Recognition Using Local Vision Transformer ( http://arxiv.org/abs/2311.06651v1 )

ライセンス: Link先を確認

Ali Farzipour, Omid Nejati Manzari, Shahriar B. Shokouhi

(参考訳) 交通標識の認識は、自動運転車や運転支援システムにおいて重要な要素であり、交通標識認識のようなマシンビジョンタスクが大きな注目を集めている。マシンビジョンではCNNが頻繁に用いられてきたが、Vision Transformerの導入により、大域的な特徴学習への代替アプローチが提供された。本稿では、交通標識認識のために、畳み込みベースのネットワークとTransformerベースのネットワークの双方の利点を組み合わせた新しいモデルを提案する。提案モデルは、局所的な相関を捉える畳み込みブロックと、大域的な依存関係を学習するTransformerベースのブロックを含む。さらに、局所的な知覚を高めるために局所性モジュールを組み込んでいる。提案モデルの性能をPersian Traffic Sign DatasetとGerman Traffic Sign Recognition Benchmarkで評価し、SOTAの畳み込みモデルおよびTransformerベースのモデルと比較した。実験評価の結果、局所性モジュールを備えたハイブリッドネットワークは、純粋なTransformerベースのモデルや最良クラスの畳み込みネットワークの一部を精度で上回ることが示された。具体的には、提案する最終モデルの精度はGerman Traffic Sign Recognition Benchmarkで99.66%、Persian Traffic Sign Datasetで99.8%に達し、最良の畳み込みモデルよりも高かった。さらに、高速な推論速度を維持しながら、既存のCNNやViTを上回っている。その結果、提案モデルは大幅に高速で、実世界のアプリケーションにより適していることが示された。

Recognition of traffic signs is a crucial aspect of self-driving cars and driver assistance systems, and machine vision tasks such as traffic sign recognition have gained significant attention. CNNs have been frequently used in machine vision, but introducing vision transformers has provided an alternative approach to global feature learning. This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition. The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies. Additionally, a locality module is incorporated to enhance local perception. The performance of the suggested model is evaluated on the Persian Traffic Sign Dataset and German Traffic Sign Recognition Benchmark and compared with SOTA convolutional and transformer-based models. The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy. Specifically, our proposed final model reached 99.66% accuracy in the German traffic sign recognition benchmark and 99.8% in the Persian traffic sign dataset, higher than the best convolutional models. Moreover, it outperforms existing CNNs and ViTs while maintaining fast inference speed. Consequently, the proposed model proves to be significantly faster and more suitable for real-world applications.

翻訳日:2023-11-14 17:30:49 公開日:2023-11-11

# 分岐ネットワークにおけるヒューリスティック最適輸送

Heuristic Optimal Transport in Branching Networks ( http://arxiv.org/abs/2311.06650v1 )

ライセンス: Link先を確認

M. Andrecut

(参考訳) 最適輸送は、通常は距離の関数として定義されるコストを最小化することで、ソースからターゲットへの写像を学習することを目的とする。この問題の解は、ソースとターゲットを最適に結ぶ直線セグメントで構成され、分岐を示さない。これらの最適解は、分岐構造が一般的である自然および人工の輸送ネットワークとは対照的である。本稿では、ネットワークにおける最適輸送のための高速なヒューリスティック分岐法について論じ、いくつかの応用例を示す。

Optimal transport aims to learn a mapping of sources to targets by minimizing the cost, which is typically defined as a function of distance. The solution to this problem consists of straight line segments optimally connecting sources to targets, and it does not exhibit branching. These optimal solutions are in stark contrast with both natural, and man-made transportation networks, where branching structures are prevalent. Here we discuss a fast heuristic branching method for optimal transport in networks, and we provide several applications.

翻訳日:2023-11-14 17:30:25 公開日:2023-11-11

# テンプレートはあなただけのミームです

A Template Is All You Meme ( http://arxiv.org/abs/2311.06649v1 )

ライセンス: Link先を確認

Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych

(参考訳) ミームは現代的なコミュニケーション形態であり、ミームテンプレートは基本となる意味を持ち、ソーシャルメディアに投稿する人が誰であれ自由にカスタマイズできる。機械学習システムはミームの扱いに苦戦しているが、これはおそらく、ミームには目に見える画像とテキスト以上のものがあり、それを理解するための文脈がシステムに不足しているためである。ここでは、ミームの理解を支援するために、www.knowyourmeme.comにあるミームと関連情報の知識ベースを公開する。これをKnow Your Meme Knowledge Base (KYMKB)と呼び、54,000以上の画像で構成される。KYMKBには、人気のあるミームテンプレート、各テンプレートの例、テンプレートに関する詳細情報が含まれている。我々は、ミームテンプレートを用いることで、これまでのアプローチに欠けていた文脈をモデルに注入できるという仮説を立てる。この仮説を検証するために、ノンパラメトリックな多数決ベースの分類器を作成し、これをTemplate-Label Counter (TLC)と呼ぶ。TLCは、微調整されたベースラインよりも効果的であるか、同等の競争力を持つことがわかった。ミームテンプレートの力と、我々の知識ベースおよび手法の価値を示すために、5つのミーム分析タスクにおいて、徹底的な分類実験と探索的データ分析を行う。

Memes are a modern form of communication and meme templates possess a base semantics that is customizable by whomever posts it on social media. Machine learning systems struggle with memes, which is likely due to such systems having insufficient context to understand memes, as there is more to memes than the obvious image and text. Here, to aid understanding of memes, we release a knowledge base of memes and information found on www.knowyourmeme.com, which we call the Know Your Meme Knowledge Base (KYMKB), composed of more than 54,000 images. The KYMKB includes popular meme templates, examples of each template, and detailed information about the template. We hypothesize that meme templates can be used to inject models with the context missing from previous approaches. To test our hypothesis, we create a non-parametric majority-based classifier, which we call Template-Label Counter (TLC). We find TLC more effective than or competitive with fine-tuned baselines. To demonstrate the power of meme templates and the value of both our knowledge base and method, we conduct thorough classification experiments and exploratory data analysis in the context of five meme analysis tasks.
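A non-parametric, majority-based classifier of the kind the abstract describes might look roughly like the sketch below. Everything specific here is an assumption made for illustration: memes and templates are represented by toy embedding vectors, the closest template is chosen by cosine similarity, and the function names, template names, and labels are hypothetical. The abstract does not state how TLC matches memes to templates.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_template(embedding: Vector, templates: Dict[str, Vector]) -> str:
    """Name of the template whose embedding is most similar to the meme."""
    return max(templates, key=lambda name: cosine(embedding, templates[name]))


def count_labels(train: List[Tuple[Vector, str]],
                 templates: Dict[str, Vector]) -> Dict[str, Counter]:
    """For each template, count the labels of training memes matched to it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for embedding, label in train:
        counts[nearest_template(embedding, templates)][label] += 1
    return counts


def predict(embedding: Vector, templates: Dict[str, Vector],
            counts: Dict[str, Counter], fallback: str) -> str:
    """Majority label of the matched template, or `fallback` if it has none."""
    votes = counts.get(nearest_template(embedding, templates))
    return votes.most_common(1)[0][0] if votes else fallback


# Toy data: two hypothetical templates and four labeled training memes.
templates = {"template_a": [1.0, 0.0], "template_b": [0.0, 1.0]}
train = [([0.9, 0.1], "benign"), ([0.8, 0.2], "benign"),
         ([0.7, 0.3], "harmful"), ([0.1, 0.9], "harmful")]
counts = count_labels(train, templates)
```

No parameters are trained: fitting is just counting, and prediction is a lookup followed by a majority vote, which is what makes the approach non-parametric in this sketch.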

翻訳日:2023-11-14 17:30:16 公開日:2023-11-11

# ロバストテキスト分類:プロトタイプベースネットワークの解析

Robust Text Classification: Analyzing Prototype-Based Networks ( http://arxiv.org/abs/2311.06647v1 )

ライセンス: Link先を確認

Zhivar Sourati, Darshan Deshpande, Filip Ilievski, Kiril Gashteovski, Sascha Saralajew

(参考訳) 下流のアプリケーションでは、正確で頑健かつ解釈可能なテキスト分類モデルが求められることが多い。最先端の言語モデルの精度は人間の性能に近づいているが、それらは解釈可能となるようには設計されておらず、ノイズの多いデータでは性能が低下することが多い。クラスの典型例(プロトタイプ)との類似性に基づいて事例を分類するプロトタイプベースネットワーク(PBN)のファミリは、本質的に解釈可能であり、ノイズに対して頑健であることが示されているため、コンピュータビジョンタスクで広く使用されてきた。本稿では、PBNの頑健性がテキスト分類タスクにも引き継がれるかどうかを調べる。我々は、異なるバックボーンアーキテクチャ、バックボーンサイズ、目的関数を含む、PBNを研究するためのモジュール式で包括的なフレームワークを設計する。評価プロトコルでは、文字・単語・文レベルの摂動に対するモデルの頑健性を評価する。3つのベンチマークでの実験により、PBNの頑健性は、現実的な摂動に直面するNLP分類タスクにも引き継がれることを示す。さらに、PBNの頑健性は主にプロトタイプを解釈可能に保つ目的関数によって支えられており、バニラモデルに対するPBNの頑健性の優位は、データセットが複雑になるほど顕著になる。

Downstream applications often require text classification models to be accurate, robust, and interpretable. While the accuracy of the state-of-the-art language models approximates human performance, they are not designed to be interpretable and often exhibit a drop in performance on noisy data. The family of Prototype-Based Networks (PBNs) that classify examples based on their similarity to prototypical examples of a class (prototypes) is natively interpretable and shown to be robust to noise, which enabled its wide usage for computer vision tasks. In this paper, we study whether the robustness properties of PBNs transfer to text classification tasks. We design a modular and comprehensive framework for studying PBNs, which includes different backbone architectures, backbone sizes, and objective functions. Our evaluation protocol assesses the robustness of models against character-, word-, and sentence-level perturbations. Our experiments on three benchmarks show that the robustness of PBNs transfers to NLP classification tasks facing realistic perturbations. Moreover, the robustness of PBNs is supported mostly by the objective function that keeps prototypes interpretable, while the robustness superiority of PBNs over vanilla models becomes more salient as datasets get more complex.
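The core idea, classifying an input by its similarity to class prototypes and then checking the prediction under a character-level perturbation, can be illustrated with a toy sketch. This is not the networks studied in the paper: real PBNs learn prototypes in the embedding space of a neural backbone, whereas here the "encoder" is a character-bigram count, the prototypes are fixed example strings, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def bigram_counts(text: str) -> Counter:
    """Toy stand-in for an encoder: counts of adjacent character pairs."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)  # Counter yields 0 for missing keys
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(text: str, prototypes: Dict[str, str]) -> str:
    """Return the label whose prototype is most similar to the input text."""
    query = bigram_counts(text)
    return max(prototypes,
               key=lambda label: cosine(query, bigram_counts(prototypes[label])))


def swap_adjacent(text: str, index: int) -> str:
    """Character-level perturbation: swap the characters at index, index + 1."""
    chars = list(text)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


# One hypothetical prototype per class, and a perturbed query.
prototypes = {"positive": "great movie", "negative": "terrible movie"}
noisy = swap_adjacent("great movie", 2)
label = classify(noisy, prototypes)
```

Because the prediction is "the class of the most similar prototype", the decision can be explained by pointing at that prototype, which is the interpretability property the abstract refers to. The perturbation helper corresponds to the character-level end of the evaluation protocol; word- and sentence-level perturbations are not covered here.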

翻訳日:2023-11-14 17:29:54 公開日:2023-11-11

# 医用画像のフェデレーション学習におけるプライバシーリスク分析と緩和

Privacy Risks Analysis and Mitigation in Federated Learning for Medical Images ( http://arxiv.org/abs/2311.06643v1 )

ライセンス: Link先を確認

Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu

(参考訳) 医療画像分析の分野では、機密性の高い患者データを保護しプライバシ規制を順守するための効果的な手法として、フェデレートラーニング(FL)の普及が進んでいる。しかし、最近のいくつかの研究により、FLのデフォルト設定ではプライバシー攻撃によってプライベートな訓練データが漏洩しうることが明らかになった。したがって、FLのそのようなプライバシーリスクが医療領域に存在するのか、存在するならどの程度なのか、またそのようなリスクをどのように軽減すればよいのかは、依然として不明である。本稿では、第1に、プライバシーリスクを分析し、プライベートな医療データを保護する効果的な緩和戦略をFLにおいて開発するために、フェデレートラーニングにおける医療データのプライバシーリスク分析と緩和のための総合的フレームワーク(MedPFL)を提案する。第2に、FLを用いて医療画像を処理する場合のプライバシーリスクが大きいことを実証し、攻撃者が容易にプライバシー攻撃を行ってプライベートな医療画像を正確に再構成できることを示す。第3に、ランダムノイズを付加する防御手法は、FLにおけるプライバシー攻撃から医療画像を保護するうえで常に有効に機能するとは限らないことを示す。これは、プライバシー保護に関して医療データ特有の差し迫った課題があることを示している。

Federated learning (FL) is gaining increasing popularity in the medical domain for analyzing medical images, which is considered an effective technique to safeguard sensitive patient data and comply with privacy regulations. However, several recent studies have revealed that the default settings of FL may leak private training data under privacy attacks. Thus, it is still unclear whether and to what extent such privacy risks of FL exist in the medical domain, and if so, how to mitigate such risks. In this paper, first, we propose a holistic framework for Medical data Privacy risk analysis and mitigation in Federated Learning (MedPFL) to analyze privacy risks and develop effective mitigation strategies in FL for protecting private medical data. Second, we demonstrate the substantial privacy risks of using FL to process medical images, where adversaries can easily perform privacy attacks to reconstruct private medical images accurately. Third, we show that the defense approach of adding random noises may not always work effectively to protect medical images against privacy attacks in FL, which poses unique and pressing challenges associated with medical data for privacy protection.

翻訳日:2023-11-14 17:29:31 公開日:2023-11-11

# 多次元反射問題に対するデータ駆動ルール

Data-driven rules for multidimensional reflection problems ( http://arxiv.org/abs/2311.06639v1 )

ライセンス: Link先を確認

Sören Christensen, Asbjørn Holk Thomsen and Lukas Trottner

(参考訳) 近年、モデルの不確実性のもとで確率的最適制御問題を解くためのデータ駆動アルゴリズムが、ますます活発な研究領域となっている。しかし、特異制御と基礎となる拡散ダイナミクスについては、これまでの解析はスカラーの場合に限られていた。本稿では、反射型の制御を伴う可逆拡散に対する多変量特異制御問題を研究することで、このギャップを埋める。我々の貢献は3つある。まず、長期平均コストを領域依存の汎関数として明示的に求め、制御問題が形状最適化問題として等価に特徴付けられることを示す。次に、所与の拡散ダイナミクスに対し、最適領域が強星型であると仮定して、コスト最小化領域を数値的に求めるための、ポリトープ近似に基づく勾配降下アルゴリズムを提案する。最後に、拡散ダイナミクスが制御者にとって未知である場合のデータ駆動型の解を検討する。確率過程に対するノンパラメトリック統計の手法を用いて、静的リグレットが非反射過程の不変密度のミニマックス最適推定レートによって抑えられる最適領域推定量を構成する。最も困難な状況、すなわちプロセスを制御すると同時にダイナミクスを学習しなければならない場合には、そこで生じる探索と活用のジレンマを克服するためのエピソード型学習アルゴリズムを開発し、静的リグレットをベースラインとしたとき、時間単位あたりの劣線形リグレットにおける損失が1次元の場合と比較して自然なオーダーであることを示す。

Over the recent past data-driven algorithms for solving stochastic optimal control problems in face of model uncertainty have become an increasingly active area of research. However, for singular controls and underlying diffusion dynamics the analysis has so far been restricted to the scalar case. In this paper we fill this gap by studying a multivariate singular control problem for reversible diffusions with controls of reflection type. Our contributions are threefold. We first explicitly determine the long-run average costs as a domain-dependent functional, showing that the control problem can be equivalently characterized as a shape optimization problem. For given diffusion dynamics, assuming the optimal domain to be strongly star-shaped, we then propose a gradient descent algorithm based on polytope approximations to numerically determine a cost-minimizing domain. Finally, we investigate data-driven solutions when the diffusion dynamics are unknown to the controller. Using techniques from nonparametric statistics for stochastic processes, we construct an optimal domain estimator, whose static regret is bounded by the minimax optimal estimation rate of the unreflected process' invariant density. In the most challenging situation, when the dynamics must be learned simultaneously to controlling the process, we develop an episodic learning algorithm to overcome the emerging exploration-exploitation dilemma and show that given the static regret as a baseline, the loss in its sublinear regret per time unit is of natural order compared to the one-dimensional case.

翻訳日:2023-11-14 17:29:11 公開日:2023-11-11

# 皮膚病変スクリーニングのための自動自己監督学習

Automatized Self-Supervised Learning for Skin Lesion Screening ( http://arxiv.org/abs/2311.06691v1 )

ライセンス: Link先を確認

Vullnet Useini, Stephanie Tanadini-Lang, Quentin Lohmeyer, Mirko Meboldt, Nicolaus Andratschke, Ralph P. Braun and Javier Barranco García

(参考訳) 皮膚がんの中で最も致死性の高いメラノーマの発生率は世界中で着実に増加しており、皮膚科医にとって大きな課題となっている。メラノーマの早期発見は患者の生存率の向上に不可欠であるが、現在の皮膚がんスクリーニング法である「醜いアヒルの子」(UD)スクリーニングによって疑わしい病変を同定することは難しく、色素性病変に関する専門知識を必要とすることが多い。これらの課題に対処し、患者の転帰を改善するために、皮膚科医が広視野の患者画像からUDを特定するのを支援する人工知能(AI)意思決定支援ツールが開発された。このツールは最先端のオブジェクト検出アルゴリズムを用いて患者画像からすべての皮膚病変を特定・抽出し、それらを自己教師付きAIアルゴリズムによって疑わしさの順に並べ替える。ツールの性能を評価するために臨床検証研究を行った結果、色素性皮膚病変の専門家の過半数が選択した皮膚病変について、AIが特定した上位10件のUDの平均感度は93%であった。また、AIの支援を受けた場合、皮膚科医の確信度が高まり、AIが特定した上位10件のUDとの平均過半数一致率は100%に向上した。このAI意思決定支援ツールの開発は、専門家不足に対処し、リスクの高い患者がより早く診察を受けられるようにし、AI支援スクリーニングの影響を理解することを目的としている。ツールによる自動化は、皮膚科医が疑わしい病変を特定するのを支援し、より客観的な評価を提供して、スクリーニングプロセスの主観性を低減する。このプロジェクトの今後のステップは、組織学的に確認されたメラノーマ症例を含むようにデータセットを拡大することと、ツールの信頼性を高めて実際の診察に適応させるために、臨床検証の参加者数を増やすことである。

The incidence rates of melanoma, the deadliest form of skin cancer, have been increasing steadily worldwide, presenting a significant challenge to dermatologists. Early detection of melanoma is crucial for improving patient survival rates, but identifying suspicious lesions through ugly duckling (UD) screening, the current method used for skin cancer screening, can be challenging and often requires expertise in pigmented lesions. To address these challenges and improve patient outcomes, an artificial intelligence (AI) decision support tool was developed to assist dermatologists in identifying UD from wide-field patient images. The tool uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, which are then sorted by suspiciousness using a self-supervised AI algorithm. A clinical validation study was conducted to evaluate the tool's performance, which demonstrated an average sensitivity of 93% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that dermatologists' confidence increased, and the average majority agreement with the top-10 AI-identified UDs improved to 100% when assisted by AI. The development of this AI decision support tool aims to address the shortage of specialists, enable at-risk patients to receive faster consultations and understand the impact of AI-assisted screening. The tool's automation can assist dermatologists in identifying suspicious lesions and provide a more objective assessment, reducing subjectivity in the screening process. The future steps for this project include expanding the dataset to include histologically confirmed melanoma cases and increasing the number of participants for clinical validation to strengthen the tool's reliability and adapt it for real-world consultation.

翻訳日:2023-11-14 17:16:34 公開日:2023-11-11

# Neuro-GPT:脳波の基礎モデルの開発

Neuro-GPT: Developing A Foundation Model for EEG ( http://arxiv.org/abs/2311.03764v3 )

ライセンス: Link先を確認

Wenhui Cui, Woojae Jeong, Philipp Thölke, Takfarinas Medani, Karim Jerbi, Anand A. Joshi, Richard M. Leahy

(参考訳) 脳-コンピュータインタフェース(BCI)タスクのための脳波(EEG)データの不足と不均一性に対処し、公開されている大規模データセットの力を活用するために、EEGエンコーダとGPTモデルからなる基礎モデルNeuro-GPTを提案する。この基礎モデルは、マスクされたEEGセグメントの再構成を学習する自己教師付きタスクを用いて、大規模データセット上で事前学習される。次に、運動イメージ分類タスクでモデルを微調整し、低データ領域(被験者9名)での性能を検証する。実験により、基礎モデルを適用することで、スクラッチから訓練したモデルと比較して分類性能を大幅に改善できることを示す。これは、基礎モデルの汎化可能性と、EEGにおけるデータの不足や不均一性という課題に対処できることの証拠となる。

To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model. The foundation model is pre-trained on a large-scale data set using a self-supervised task that learns how to reconstruct masked EEG segments. We then fine-tune the model on a Motor Imagery Classification task to validate its performance in a low-data regime (9 subjects). Our experiments demonstrate that applying a foundation model can significantly improve classification performance compared to a model trained from scratch, which provides evidence for the generalizability of the foundation model and its ability to address challenges of data scarcity and heterogeneity in EEG.

翻訳日:2023-11-14 11:07:57 公開日:2023-11-11

PDF登録状況（公開日: 20231111）