# 大規模言語モデルは計算社会科学を変えることができるか? Can Large Language Models Transform Computational Social Science? ( http://arxiv.org/abs/2305.03514v1 ) ライセンス: Link先を確認 | Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang | (参考訳) ChatGPTのような大規模言語モデル(LLM)は、(トレーニングデータなしで)ゼロショットで多くの言語処理タスクを成功させることができる。
この作業は LLM を CSS ツールとして使用するためのロードマップを提供する。
要約すると、LLMはコストを大幅に削減し、人間と共同で社会科学分析の効率を高めることができる。 Large Language Models (LLMs) like ChatGPT are capable of successfully performing many language processing tasks zero-shot (without the need for training data). If this capacity also applies to the coding of social phenomena like persuasiveness and political ideology, then LLMs could effectively transform Computational Social Science (CSS). This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 24 representative CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that today's LLMs can radically augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the hidden meaning behind text). In summary, LLMs can significantly reduce costs and increase efficiency of social science analysis in partnership with humans. | 翻訳日:2023-05-14 21:06:17 公開日:2023-04-12 |
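The abstract above reports "fair levels of agreement with humans" without naming a statistic. A minimal sketch of one common chance-corrected agreement measure, Cohen's kappa, is given below; the choice of kappa and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: a human annotator and a zero-shot model on four items.
human = ["left", "left", "right", "right"]
model = ["left", "right", "right", "right"]
kappa = cohens_kappa(human, model)
```

Raw accuracy would report 0.75 for this toy pair; kappa discounts the agreement that the label frequencies alone would produce.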
# 重度運動障害者のための脳波信号の色に基づく分類 Color-based classification of EEG Signals for people with the severe locomotive disorder ( http://arxiv.org/abs/2304.11068v1 ) ライセンス: Link先を確認 | Ankit Shrestha, Bikram Adhikari | (参考訳) 脳内のニューロンは電気信号を発生させ、これらの電気信号が集合的に発火すると脳波が生じる。
本稿では,NeuroSky Mindwaveヘッドセット(単一電極脳波センサ)からの生脳波信号を,注目に基づくディープラーニングネットワークで分類した。
2色の分類には93.5%の精度が得られ、4つの信号の分類には65.75%の精度が得られた。 The neurons in the brain produce electric signals, and a collective firing of these electric signals gives rise to brainwaves. These brainwave signals are captured using EEG (Electroencephalogram) devices as micro voltages. These sequences of signals captured by EEG sensors have embedded features in them that can be used for classification. The signals can be used as an alternative input for people suffering from severe locomotive disorder. Classification of different colors can be mapped for many functions like directional movement. In this paper, raw EEG signals from NeuroSky Mindwave headset (a single electrode EEG sensor) have been classified with an attention based Deep Learning Network. Attention based LSTM Networks have been implemented for classification of two different colors and four different colors. An accuracy of 93.5% was obtained for classification of two colors and an accuracy of 65.75% was obtained for classification of four signals using the mentioned attention based LSTM network. | 翻訳日:2023-04-30 08:06:33 公開日:2023-04-12 |
# 肯定的AI:ウェルビーイング・アライン・人工知能設計の鍵となる課題 Positive AI: Key Challenges for Designing Wellbeing-aligned Artificial Intelligence ( http://arxiv.org/abs/2304.12241v1 ) ライセンス: Link先を確認 | Willem van der Maden, Derek Lomas, Paul Hekkert | (参考訳) AI(Artificial Intelligence:人工知能)は、私たちが知っているように世界を変えつつある。それは、この技術を「良い」ために使うのが現在の世代次第であることを意味している。私たちは、AIをうまく活用することは、意識的な生物の幸福に合わせることを構成すると論じている。
1) 幸福に対するシステムの影響について理解を深めるべきである。
2) システムは意図的に幸福を促進・維持するように設計されるべきである。3) Positive AIは、世界をより良いものにでき、かつそれが利益にもなると信じることから始まる。 Artificial Intelligence (AI) is transforming the world as we know it, implying that it is up to the current generation to use the technology for "good." We argue that making good use of AI constitutes aligning it with the wellbeing of conscious creatures. However, designing wellbeing-aligned AI systems is difficult. In this article, we investigate a total of twelve challenges that can be categorized as related to a lack of knowledge (how to contextualize, operationalize, optimize, and design AI for wellbeing), and lack of motivation (designing AI for wellbeing is seen as risky and unrewarding). Our discussion can be summarized into three key takeaways: 1) our understanding of the impact of systems on wellbeing should be advanced, 2) systems should be designed to promote and sustain wellbeing intentionally, and 3), above all, Positive AI starts with believing that we can change the world for the better and that it is profitable. | 翻訳日:2023-04-30 07:40:04 公開日:2023-04-12 |
# 表情認識のための遺伝的アルゴリズムを用いたニューラルアーキテクチャ探索 Neural Architecture Search Using Genetic Algorithm for Facial Expression Recognition ( http://arxiv.org/abs/2304.12194v1 ) ライセンス: Link先を確認 | Shuchao Deng, Yanan Sun, and Edgar Galvan | (参考訳) 表情は、人間の感情状態や意図を表現するための最も強力で自然で普遍的な信号の1つである。
ニューラルアーキテクチャサーチ(英: Neural Architecture Search、NAS)は、近年出版された多くの科学的研究によって、近年に達成された印象的な成果により、関心が高まりつつある分野である。
実験の結果,提案アルゴリズムはCK+およびFERGデータセットで既知の最良の結果を,JAFFEデータセットで競争力のある結果を達成することが示された。 Facial expression is one of the most powerful, natural, and universal signals for human beings to express emotional states and intentions. Thus, the importance of correct and innovative facial expression recognition (FER) approaches in Artificial Intelligence is evident. The current common practice for FER is to correctly design convolutional neural networks' architectures (CNNs) using human expertise. However, finding a well-performing architecture is often a very tedious and error-prone process for deep learning researchers. Neural architecture search (NAS) is an area of growing interest, as demonstrated by the large number of scientific works published in recent years thanks to the impressive results achieved. We propose a genetic algorithm approach that uses an ingenious encoding-decoding mechanism that allows CNNs to be evolved automatically on FER tasks, attaining high accuracy classification rates. The experimental results demonstrate that the proposed algorithm achieves the best-known results on the CK+ and FERG datasets as well as competitive results on the JAFFE dataset. | 翻訳日:2023-04-30 07:38:27 公開日:2023-04-12 |
# 分子グラフ構造共設計のための同変生成枠組み An Equivariant Generative Framework for Molecular Graph-Structure Co-Design ( http://arxiv.org/abs/2304.12436v1 ) ライセンス: Link先を確認 | Zaixi Zhang, Qi Liu, Chee-Kong Lee, Chang-Yu Hsieh, Enhong Chen | (参考訳) 望ましい物理化学的性質と機能を持つ分子を設計することは、化学、物質科学、薬物発見における長年の課題である。
近年, 機械学習に基づく生成モデルは, 分子設計における有望なアプローチとして出現している。
ここでは、分子グラフ構造共設計(Molecular graph-structure Co-design)のための回転並進同変な生成フレームワークMolCodeを提示する。
特に、MolCodeは、望ましい性質を持つ有効(99.95% Validity)かつ多様(98.75% Uniqueness)な分子グラフ/構造を一貫して生成するだけでなく、標的タンパク質に高い親和性(61.8% high-affinity ratio)を持つ薬様分子も生成する。
分子設計における2次元トポロジーと3次元幾何は本質的に相補的な情報を含み,機械学習に基づく分子表現と生成に関する新たな知見を提供する。 Designing molecules with desirable physicochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for de novo molecule design. However, further refinement of methodology is highly desired as most existing methods lack unified modeling of 2D topology and 3D geometry information and fail to effectively learn the structure-property relationship for molecule design. Here we present MolCode, a roto-translation equivariant generative framework for Molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Extensive experimental results show that MolCode outperforms previous methods on a series of challenging tasks including de novo molecule design, targeted molecule discovery, and structure-based drug design. Particularly, MolCode not only consistently generates valid (99.95% Validity) and diverse (98.75% Uniqueness) molecular graphs/structures with desirable properties, but also generates drug-like molecules with high affinity to target proteins (61.8% high-affinity ratio), which demonstrates MolCode's potential applications in material design and drug discovery. Our extensive investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design, and provides new insights into machine learning-based molecule representation and generation. | 翻訳日:2023-04-30 07:29:35 公開日:2023-04-12 |
# ギャップを埋める: 深層ニューラルシーケンスモデルを説明するための解釈可能な概念としての視線イベント Bridging the Gap: Gaze Events as Interpretable Concepts to Explain Deep Neural Sequence Models ( http://arxiv.org/abs/2304.13536v1 ) ライセンス: Link先を確認 | Daniel G. Krakowczyk, Paul Prasse, David R. Reich, Sebastian Lapuschkin, Tobias Scheffer, Lena A. Jäger | (参考訳) 視線追跡データのためのXAIの最近の研究は、眼球運動に基づく生体認証タスクにおける深層ニューラルシーケンスモデルの出力を説明するための特徴帰属法の適合性を評価している。
さらに,サッカード振幅や固視の分散などのイベント特性が概念の影響度に及ぼす効果について検討した。 Recent work in XAI for eye tracking data has evaluated the suitability of feature attribution methods to explain the output of deep neural sequence models for the task of oculomotoric biometric identification. These methods provide saliency maps to highlight important input features of a specific eye gaze sequence. However, to date, its localization analysis has been lacking a quantitative approach across entire datasets. In this work, we employ established gaze event detection algorithms for fixations and saccades and quantitatively evaluate the impact of these events by determining their concept influence. Input features that belong to saccades are shown to be substantially more important than features that belong to fixations. By dissecting saccade events into sub-events, we are able to show that gaze samples that are close to the saccadic peak velocity are most influential. We further investigate the effect of event properties like saccadic amplitude or fixational dispersion on the resulting concept influence. | 翻訳日:2023-04-30 07:20:04 公開日:2023-04-12 |
# SmartChoices: 学習した実装によるソフトウェアの拡張 SmartChoices: Augmenting Software with Learned Implementations ( http://arxiv.org/abs/2304.13033v1 ) ライセンス: Link先を確認 | Daniel Golovin, Gabor Bartok, Eric Chen, Emily Donahue, Tzu-Kuo Huang, Efi Kokiopoulou, Ruoyan Qin, Nikhil Sarda, Justin Sybrandt, Vincent Tjeng | (参考訳) 私たちは機械学習の黄金時代に生きている。
本稿では,全体的な設計哲学を説明し,大規模産業システムにおけるSmartChoicesを用いた事例研究を示す。 We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large scale industrial systems. | 翻訳日:2023-04-30 07:18:58 公開日:2023-04-12 |
# 2次元セマンティクスセグメンテーションのためのニューラルフィールドコンディショニング戦略 Neural Field Conditioning Strategies for 2D Semantic Segmentation ( http://arxiv.org/abs/2304.14371v1 ) ライセンス: Link先を確認 | Martin Gromniak, Sven Magg and Stefan Wermter | (参考訳) ニューラルフィールドは、座標を所望の信号にマッピングするニューラルネットワークである。
その結果, 検討したコンディショニング戦略の間で性能にかなりの差があることが示された。
さらに,CNNに基づくセマンティックセグメンテーションのためのデコーダと競合し,クロスアテンションによるコンディショニングが最適であることを示す。 Neural fields are neural networks which map coordinates to a desired signal. When a neural field should jointly model multiple signals, and not memorize only one, it needs to be conditioned on a latent code which describes the signal at hand. Despite being an important aspect, there has been little research on conditioning strategies for neural fields. In this work, we explore the use of neural fields as decoders for 2D semantic segmentation. For this task, we compare three conditioning methods, simple concatenation of the latent code, Feature Wise Linear Modulation (FiLM), and Cross-Attention, in conjunction with latent codes which either describe the full image or only a local region of the image. Our results show a considerable difference in performance between the examined conditioning strategies. Furthermore, we show that conditioning via Cross-Attention achieves the best results and is competitive with a CNN-based decoder for semantic segmentation. | 翻訳日:2023-04-30 07:11:46 公開日:2023-04-12 |
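Two of the conditioning methods compared in the abstract above, concatenation of the latent code and Feature Wise Linear Modulation (FiLM), can be contrasted in a few lines. This is a minimal pure-Python sketch: the helper names, the single linear map used to predict the scale and shift, and all numbers are illustrative assumptions and not the authors' implementation.

```python
def linear(weights, bias, vec):
    """Affine map: weights is a list of rows, bias a list, vec the input vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def condition_by_concat(features, latent):
    """Concatenation: the latent code is appended to the layer input."""
    return list(features) + list(latent)


def condition_by_film(features, latent, gamma_params, beta_params):
    """FiLM: scale and shift each feature with values predicted from the latent code."""
    gamma = linear(*gamma_params, latent)  # per-feature scale
    beta = linear(*beta_params, latent)    # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Hypothetical hidden features of one coordinate and a 2-dimensional latent code.
features = [1.0, 2.0]
latent = [0.5, -1.0]
gamma_params = ([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])  # gamma = latent + 1
beta_params = ([[0.0, 0.0], [0.0, 0.0]], [0.1, 0.2])   # constant shift

concatenated = condition_by_concat(features, latent)
modulated = condition_by_film(features, latent, gamma_params, beta_params)
```

Concatenation widens the input of the next layer, whereas FiLM keeps the feature width and lets the latent code act multiplicatively on each feature.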
# 自動コメント運転のためのテキスト説明 Textual Explanations for Automated Commentary Driving ( http://arxiv.org/abs/2304.08178v1 ) ライセンス: Link先を確認 | Marc Alexander Kühn, Daniel Omeiza, Lars Kunze | (参考訳) ディープラーニングに基づく車両制御装置の予測のための自然言語説明の提供は、透明性と監査の容易さを高めるために重要である。
本研究は,新たなSense--Assess--eXplain (SAX) 上で,最先端(SOTA)の予測・説明モデルを(ベンチマークとして)徹底的に評価・検証するものである。
したがって、我々の研究は将来の説明可能な自動運転車の実現に寄与する。 The provision of natural language explanations for the predictions of deep-learning-based vehicle controllers is critical as it enhances transparency and easy audit. In this work, a state-of-the-art (SOTA) prediction and explanation model is thoroughly evaluated and validated (as a benchmark) on the new Sense--Assess--eXplain (SAX). Additionally, we developed a new explainer model that improved over the baseline architecture in two ways: (i) an integration of part of speech prediction and (ii) an introduction of special token penalties. On the BLEU metric, our explanation generation technique outperformed SOTA by a factor of 7.7 when applied on the BDD-X dataset. The description generation technique is also improved by a factor of 1.3. Hence, our work contributes to the realisation of future explainable autonomous vehicles. | 翻訳日:2023-04-23 04:25:49 公開日:2023-04-12 |
# 音楽ストリーミングサービスにおけるプレイリスト自動継続のためのスケーラブルフレームワーク A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services ( http://arxiv.org/abs/2304.09061v1 ) ライセンス: Link先を確認 | Walid Bendada and Guillaume Salha-Galvan and Thomas Bouabça and Tristan Cazenave | (参考訳) 音楽ストリーミングサービスは、ユーザーがこれらのサービスで作ったプレイリストを拡張するために曲を推薦することが多い。
しかし、音楽的特徴を保ちながらプレイリストを拡張し、ユーザの好みに合うようにすることは難しい課題であり、一般にはAutomatic Playlist Continuation (APC)と呼ばれる。
APCの最大の公開データセットであるSpotifyのMillion Playlist Dataset(MPD)の詳細な実験検証を通じて、このフレームワークの妥当性を実証する。
我々は,本サービスにおける大規模オンラインA/Bテストの結果を報告し,そのような実世界のアプリケーションにおける我々のアプローチの実践的影響を強調した。 Music streaming services often aim to recommend songs for users to extend the playlists they have created on these services. However, extending playlists while preserving their musical characteristics and matching user preferences remains a challenging task, commonly referred to as Automatic Playlist Continuation (APC). Besides, while these services often need to select the best songs to recommend in real-time and among large catalogs with millions of candidates, recent research on APC mainly focused on models with few scalability guarantees and evaluated on relatively small datasets. In this paper, we introduce a general framework to build scalable yet effective APC models for large-scale applications. Based on a represent-then-aggregate strategy, it ensures scalability by design while remaining flexible enough to incorporate a wide range of representation learning and sequence modeling techniques, e.g., based on Transformers. We demonstrate the relevance of this framework through in-depth experimental validation on Spotify's Million Playlist Dataset (MPD), the largest public dataset for APC. We also describe how, in 2022, we successfully leveraged this framework to improve APC in production on Deezer. We report results from a large-scale online A/B test on this service, emphasizing the practical impact of our approach in such a real-world application. | 翻訳日:2023-04-23 04:16:30 公開日:2023-04-12 |
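The represent-then-aggregate strategy named in the abstract above can be sketched as follows: each song has a vector representation, the playlist is summarised by aggregating the vectors of its songs, and candidates are ranked by similarity to that summary. The averaging aggregator, the dot-product score, and the toy embeddings are assumptions chosen for brevity; the abstract mentions richer sequence models such as Transformers as possible aggregators.

```python
def aggregate(song_vectors):
    """Average song embeddings into one playlist embedding (assumed aggregator)."""
    dim = len(song_vectors[0])
    return [sum(v[i] for v in song_vectors) / len(song_vectors) for i in range(dim)]


def recommend(playlist, embeddings, k=2):
    """Rank songs outside the playlist by dot product with the playlist embedding."""
    playlist_vec = aggregate([embeddings[s] for s in playlist])
    scores = {
        song: sum(p * e for p, e in zip(playlist_vec, vec))
        for song, vec in embeddings.items()
        if song not in playlist
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 2-dimensional song embeddings.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.8, 0.3],
    "e": [-1.0, 0.0],
}
top = recommend(["a", "b"], embeddings, k=2)
```

Because song representations can be computed ahead of time, only the aggregation and the similarity search need to run at request time, which is what makes this kind of design attractive for large catalogs.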
# 技術分析とML/DLモデルを用いた取引の特定 Identifying Trades Using Technical Analysis and ML/DL Models ( http://arxiv.org/abs/2304.09936v1 ) ライセンス: Link先を確認 | Aayush Shah, Mann Doshi, Meet Parekh, Nirmit Deliwala, Prof. Pramila M. Chawan | (参考訳) 株式市場の価格予測の重要性は過大評価できない。
ディープラーニングは株価を正確に予測する上で有望だが、この分野ではまだまだ多くの研究が必要である。 The importance of predicting stock market prices cannot be overstated. It is a pivotal task for investors and financial institutions as it enables them to make informed investment decisions, manage risks, and ensure the stability of the financial system. Accurate stock market predictions can help investors maximize their returns and minimize their losses, while financial institutions can use this information to develop effective risk management policies. However, stock market prediction is a challenging task due to the complex nature of the stock market and the multitude of factors that can affect stock prices. As a result, advanced technologies such as deep learning are being increasingly utilized to analyze vast amounts of data and provide valuable insights into the behavior of the stock market. While deep learning has shown promise in accurately predicting stock prices, there is still much research to be done in this area. | 翻訳日:2023-04-23 04:07:52 公開日:2023-04-12 |
# IoTベースのウェアラブル: 包括的な調査 IoT-based Wearables: A comprehensive Survey ( http://arxiv.org/abs/2304.09861v1 ) ライセンス: Link先を確認 | Yahuza Bello, Emanuel Figetakis | (参考訳) IoTベースのサービスを通じて、企業がかなりの成長を遂げている。
さらに,ウェアラブルの普及に直面する課題と今後の研究方向性について述べる。 A substantial amount of growth is being achieved by businesses through IoT-based services. The emergence of small electronic devices capable of computing, commonly known as wearables in the IoT domain, has proven to have a huge impact on people's lives. These wearables are capable of collecting vital information about a person's activities and behaviours regularly. This makes them suitable for many applications in health monitoring, fitness, sports, education and some industry related applications. To this end, in this paper, we aim to provide a general review on IoT-based wearables, the sensors adopted for several categorized wearables, the communication technologies adopted and the most widely adopted data processing techniques for wearables. Furthermore, we present the challenges faced for wide adoption of wearables and the future research directions. | 翻訳日:2023-04-23 04:07:25 公開日:2023-04-12 |
# NRTS: 新生児蘇生シミュレーションシナリオにおける複数学際チームのデータ記録,送信,評価を支援するクライアントサーバアーキテクチャ NRTS: A Client-Server architecture for supporting data recording, transmission and evaluation of multidisciplinary teams during the neonatal resuscitation simulation scenario ( http://arxiv.org/abs/2304.09860v1 ) ライセンス: Link先を確認 | Manuel Striani | (参考訳) 本報告では,新生児蘇生訓練シミュレータ(NRTS)について述べる。これは,新生児蘇生のための高忠実度シミュレーションコース中に医療専門家がデータを入力,送信,記録することを支援するAndroidモバイルアプリである。
このモバイルアプリは、Casale Monferrato小児病院(イタリア)の"Neonatal Intensive Care Unit"(NICU)から、Piemonte Orientale大学(イタリア)科学技術イノベーション学科(DiSIT)のサーバーに、記録されたすべてのデータを自動的に送信することができる。
最後に、医療インストラクターは、シミュレーションシナリオに関わる複数の学際チームの評価のために、デブリーフィングフェーズで使用されるかもしれないシミュレーション演習の統計を見ることができる。 In this technical report, we describe Neonatal Resuscitation Training Simulator (NRTS), an Android mobile app designed to support medical experts to input, transmit and record data during a High-Fidelity Simulation course for neonatal resuscitation. This mobile app allows one to automatically send all the recorded data from "Neonatal Intensive Care Unit" (NICU) of Casale Monferrato Children's Hospital, (Italy) to a server located at the Department of Science and Technological Innovation (DiSIT), University of Piemonte Orientale (Italy). Finally, the medical instructor can view statistics on a simulation exercise that may be used during the de-briefing phase for the evaluation of multidisciplinary teams involved in the simulation scenarios. | 翻訳日:2023-04-23 04:07:16 公開日:2023-04-12 |
# 静的および動的学習可能なグラフ畳み込みネットワークによる時空間海面温度予測 Towards Spatio-temporal Sea Surface Temperature Forecasting via Static and Dynamic Learnable Personalized Graph Convolution Network ( http://arxiv.org/abs/2304.09290v1 ) ライセンス: Link先を確認 | Xiaohan Li, Gaowei Zhang, Kai Huang, Zhaofeng He | (参考訳) 海面温度(SST)は、局地的な気候や地球規模の気候を形作り、生態系に深く影響を及ぼす主要な要因であるため、地球の大気にとって非常に重要である。
実SSTデータセットに関する実験は,提案手法の予測課題における最先端性能を示すものである。 Sea surface temperature (SST) is uniquely important to the Earth's atmosphere since its dynamics are a major force in shaping local and global climate and profoundly affect our ecosystems. Accurate forecasting of SST brings significant economic and social implications, for example, better preparation for extreme weather such as severe droughts or tropical cyclones months ahead. However, such a task faces unique challenges due to the intrinsic complexity and uncertainty of ocean systems. Recently, deep learning techniques, such as graphical neural networks (GNN), have been applied to address this task. Even though these methods have some success, they frequently have serious drawbacks when it comes to investigating dynamic spatiotemporal dependencies between signals. To solve this problem, this paper proposes a novel static and dynamic learnable personalized graph convolution network (SD-LPGC). Specifically, two graph learning layers are first constructed to respectively model the stable long-term and short-term evolutionary patterns hidden in the multivariate SST signals. Then, a learnable personalized convolution layer is designed to fuse this information. Our experiments on real SST datasets demonstrate the state-of-the-art performances of the proposed approach on the forecasting task. | 翻訳日:2023-04-23 04:06:32 公開日:2023-04-12 |
# ローゼン・モース型ポテンシャルに対するシュレーディンガー方程式の再検討とその応用 The Schrödinger equation for the Rosen-Morse type potential revisited with applications ( http://arxiv.org/abs/2304.06730v1 ) ライセンス: Link先を確認 | Guillermo Gordillo-Núñez, Renato Alvarez-Nodarse, Niurka R. Quintero | (参考訳) ローゼン・モース型ポテンシャルの時間独立なシュレーディンガー方程式を厳密に解く。
また、摂動下でのキンクのダイナミクスの記述や反キンクとの相互作用に有用な固有関数の集合によって満たされる直交性と完全性の関係も導出する。 We rigorously solve the time-independent Schrödinger equation for the Rosen-Morse type potential. By using the Nikiforov-Uvarov method, we obtain, in a systematic way, the complete solution of such equation, which includes the so-called bound states (square-integrable solutions) associated with the discrete spectrum, as well as unbound states region (bounded but not necessarily square-integrable solutions) related to the continuous part of the spectrum. The resolution of this problem is used to show that the kinks of the non-linear Klein-Gordon equation with $\varphi^{2p+2}$ type potentials are stable. We also derive the orthogonality and completeness relations satisfied by the set of eigenfunctions which are useful in the description of the dynamics of kinks under perturbations or interacting with antikinks. | 翻訳日:2023-04-17 15:49:08 公開日:2023-04-12 |
# 認知のメタ学習モデル Meta-Learned Models of Cognition ( http://arxiv.org/abs/2304.06729v1 ) ライセンス: Link先を確認 | Marcel Binz, Ishita Dasgupta, Akshay Jagadish, Matthew Botvinick, Jane X. Wang, Eric Schulz | (参考訳) メタラーニングは、手で設計するのではなく、環境との反復的なインタラクションを通じて学習アルゴリズムを学ぶためのフレームワークである。
要約すると、メタラーニングは合理的分析の範囲を大きく広げ、より一般的に認知理論を広めるものである。 Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally. | 翻訳日:2023-04-17 15:48:51 公開日:2023-04-12 |
# 再識別リスクの測定 Measuring Re-identification Risk ( http://arxiv.org/abs/2304.07210v1 ) ライセンス: Link先を確認 | CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong | (参考訳) コンパクトなユーザ表現(埋め込みなど)はパーソナライズサービスのバックボーンを形成する。
そこで我々は,Topics APIにおける再識別リスクを推定するために使用する,優れた攻撃アルゴリズムを示すことによって,理論的境界を補完する。
この研究は、再識別リスクという厳密で解釈可能な概念と、それを実世界のアプリケーションに伝えるのに使えるフレームワークを提供すると信じています。 Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications. | 翻訳日:2023-04-17 13:09:43 公開日:2023-04-12 |
# 脆弱性検出のためのChatGPTモデルの評価 Evaluation of ChatGPT Model for Vulnerability Detection ( http://arxiv.org/abs/2304.07232v1 ) ライセンス: Link先を確認 | Anton Cheshkov, Pavel Zadorozhny, Rodion Levichev | (参考訳) 本稿では,コード中の脆弱性検出のためのChatGPTモデルとGPT-3モデルの性能評価を行った。
しかし、ChatGPTモデルは、コード脆弱性検出のためのバイナリとマルチラベルの分類タスクに対してダミー分類器より優れていることがわかった。 In this technical report, we evaluated the performance of the ChatGPT and GPT-3 models for the task of vulnerability detection in code. Our evaluation was conducted on our real-world dataset, using binary and multi-label classification tasks on CWE vulnerabilities. We decided to evaluate the model because it has shown good performance on other code-based tasks, such as solving programming challenges and understanding code at a high level. However, we found that the ChatGPT model performed no better than a dummy classifier for both binary and multi-label classification tasks for code vulnerability detection. | 翻訳日:2023-04-17 12:58:34 公開日:2023-04-12 |
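The comparison against a dummy classifier reported above can be reproduced in spirit with a majority-class baseline. The abstract does not say which dummy strategy or metric was used, so the most-frequent-label strategy, plain accuracy, and the toy labels below are assumptions for illustration only.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a classifier that always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: most_common


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical data: 1 = vulnerable, 0 = not vulnerable.
train_labels = [0, 0, 0, 1]
test_samples = ["f1", "f2", "f3", "f4"]
test_gold = [0, 1, 0, 0]
model_predictions = [0, 0, 1, 0]  # stand-in for the outputs of an evaluated model

dummy = majority_baseline(train_labels)
dummy_accuracy = accuracy([dummy(s) for s in test_samples], test_gold)
model_accuracy = accuracy(model_predictions, test_gold)
model_beats_dummy = model_accuracy > dummy_accuracy
```

On imbalanced data such a baseline can score deceptively well, which is why a model's result is only informative relative to it.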
# ロボットスキルの実演からの継続学習 Continual Learning from Demonstration of Robotics Skills ( http://arxiv.org/abs/2202.06843v4 ) ライセンス: Link先を確認 | Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez and Justus Piater | (参考訳) ロボットにモーションスキルを教える方法は、一度に1つのスキルのトレーニングに集中する。
私たちのコードは、新たに収集したデータセットとともに、https://github.com/sayantanauddy/clfd で利用可能です。 Methods for teaching motion skills to robots focus on training for a single skill at a time. Robots capable of learning from demonstration can considerably benefit from the added ability to learn new movement skills without forgetting what was learned in the past. To this end, we propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers. We empirically demonstrate the effectiveness of this approach in remembering long sequences of trajectory learning tasks without the need to store any data from past demonstrations. Our results show that hypernetworks outperform other state-of-the-art continual learning approaches for learning from demonstration. In our experiments, we use the popular LASA benchmark, and two new datasets of kinesthetic demonstrations collected with a real robot that we introduce in this paper called the HelloWorld and RoboTasks datasets. We evaluate our approach on a physical robot and demonstrate its effectiveness in learning real-world robotic tasks involving changing positions as well as orientations. We report both trajectory error metrics and continual learning metrics, and we propose two new continual learning metrics. Our code, along with the newly collected datasets, is available at https://github.com/sayantanauddy/clfd. | 翻訳日:2023-04-14 20:53:40 公開日:2023-04-12 |
# 基本量子アルゴリズム Basic Quantum Algorithms ( http://arxiv.org/abs/2201.10574v6 ) ライセンス: Link先を確認 | Renato Portugal | (参考訳) 量子コンピューティングは急速に進化しており、理論の基礎を再検討し、書き直し、更新せざるを得ない。
回路モデルに重点を置いて、この研究はこれらの顕著なアルゴリズムの詳細な記述を提供する。 Quantum computing is evolving so rapidly that it forces us to revisit, rewrite, and update the foundations of the theory. Basic Quantum Algorithms revisits the earliest quantum algorithms. The journey began in 1985 with Deutsch attempting to evaluate a function at two domain points simultaneously. Then, in 1992, Deutsch and Jozsa created a quantum algorithm that determines whether a Boolean function is constant or balanced. The following year, Bernstein and Vazirani realized that the same algorithm could be used to identify a specific Boolean function within a set of linear Boolean functions. In 1994, Simon introduced a novel quantum algorithm that determined whether a function was one-to-one or two-to-one exponentially faster than any classical algorithm for the same problem. That same year, Shor developed two groundbreaking quantum algorithms for integer factoring and calculating discrete logarithms, posing a threat to the widely used cryptography methods. In 1995, Kitaev proposed an alternative version of Shor's algorithms that proved valuable in numerous other applications. The following year, Grover devised a quantum search algorithm that was quadratically faster than its classical equivalent. With an emphasis on the circuit model, this work provides a detailed description of all these remarkable algorithms. | 翻訳日:2023-04-14 20:53:20 公開日:2023-04-12 |
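Two of the algorithms listed above, Deutsch-Jozsa and Bernstein-Vazirani, share one circuit: Hadamards on every qubit, a phase oracle, then Hadamards again. The sketch below simulates that circuit classically on a state vector. It uses the phase-oracle form without an ancilla qubit, which is a standard simplification, and the function names are illustrative rather than taken from the text.

```python
def hadamard_all(state, n):
    """Apply a Hadamard gate to each of n qubits of a state vector of length 2**n."""
    state = list(state)
    for q in range(n):
        bit = 1 << q
        for i in range(len(state)):
            if not i & bit:  # visit each amplitude pair (i, i | bit) once
                a, b = state[i], state[i | bit]
                state[i] = (a + b) / 2 ** 0.5
                state[i | bit] = (a - b) / 2 ** 0.5
    return state


def run_circuit(f, n):
    """Final state of H^n, phase oracle (-1)^f(x), H^n applied to |0...0>."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    state = hadamard_all(state, n)
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]
    return hadamard_all(state, n)


def deutsch_jozsa(f, n):
    """Decide whether f is constant or balanced (f is promised to be one of them)."""
    p_zero = run_circuit(f, n)[0] ** 2  # probability of measuring |0...0>
    return "constant" if p_zero > 0.5 else "balanced"


def bernstein_vazirani(f, n):
    """Recover s from f(x) = s.x mod 2; the final state is the basis state |s>."""
    state = run_circuit(f, n)
    return max(range(len(state)), key=lambda i: abs(state[i]))


secret = 0b101
hidden = lambda x: bin(x & secret).count("1") % 2  # bitwise inner product mod 2
recovered = bernstein_vazirani(hidden, 3)
```

For a constant function all phases agree and the amplitude of the all-zero state has magnitude 1; for a balanced function the phases cancel and that amplitude is 0, so a single oracle query decides the promise problem.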
# GoSafeOpt: 動的システムのグローバル最適化のためのスケーラブルな安全な探索 GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems ( http://arxiv.org/abs/2201.09562v4 ) ライセンス: Link先を確認 | Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann | (参考訳) 物理システム上で最適な制御ポリシーを学習することは、単一障害でさえ高価なハードウェア損傷を引き起こす可能性があるため、難しい。
GoSafeでは計算上扱えないロボットアーム上で、GoSafeOptが競合するモデルフリーの安全な学習手法よりも優れていることを示す。 Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe. | 翻訳日:2023-04-14 20:53:04 公開日:2023-04-12 |
# ユニタリ変換下における混合対称状態の最大絡み合い Maximum entanglement of mixed symmetric states under unitary transformations ( http://arxiv.org/abs/2112.05102v2 ) ライセンス: Link先を確認 | E. Serrano-Ensástiga and J. Martin | (参考訳) 状態置換不変性が課されるとき、2および3量子ビット系の大域的ユニタリ変換によって生成される最大絡み合いについて検討する。
さらに, SAS 状態のみを含む球の半径に対して, 数値計算結果から強みが示唆される上限を導出する。 We study the maximum entanglement that can be produced by a global unitary transformation for systems of two and three qubits when state permutation invariance is imposed. This constraint of permutation symmetry appears naturally in the context of bosonic or collective spin systems. We also study the symmetric states that remain separable after any global unitary transformation, called symmetric absolutely separable states (SAS), or absolutely classical for spin states. In particular, we determine the maximal radius of a ball of SAS states around the maximally mixed state in the symmetric sector, and the minimal radius of a ball that contains the set of SAS states. As an application of our results, we also analyse the temperature dependence of the maximum entanglement that can be obtained from the thermal state of a spin-1 system with a spin-squeezing Hamiltonian. For the symmetric three-qubit case, we conjecture a 3-parameter family of states that achieves the maximum negativity in the unitary orbit of any mixed state. In addition, we derive upper bounds, which our numerical results suggest are tight, on the radii of balls containing only/all SAS states. | 翻訳日:2023-04-14 20:51:41 公開日:2023-04-12 |
# 興味ある人が正直を断念する: 連合学習はプライベートではない When the Curious Abandon Honesty: Federated Learning Is Not Private ( http://arxiv.org/abs/2112.02918v2 ) ライセンス: Link先を確認 | Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, Nicolas Papernot | (参考訳) フェデレートラーニング(FL)では、データは機械学習モデルを共同でトレーニングしているときに個人デバイスを離れない。
例えば、高次元の視覚データセットImageNetでは、最大100データポイントのミニバッチから、トレーニングデータポイントの50%以上を完全に再構築できる。 In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never "leaves" personal devices, FL is often presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive, honest-but-curious attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this work, we show a novel data reconstruction attack which allows an active and dishonest central party to efficiently extract user data from the received gradients. While prior work on data reconstruction in FL relies on solving computationally expensive optimization problems or on making easily detectable modifications to the shared model's architecture or parameters, in our attack the central party makes inconspicuous changes to the shared model's weights before sending them out to the users. We call the modified weights of our attack trap weights. Our active attacker is able to recover user data perfectly, i.e., with zero error, even when this data stems from the same class. Recovery comes with near-zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. These specificities enable our attack to scale to fully-connected and convolutional deep neural networks trained with large mini-batches of data. For example, for the high-dimensional vision dataset ImageNet, we perfectly reconstruct more than 50% of the training data points from mini-batches as large as 100 data points. | 翻訳日:2023-04-14 20:51:24 公開日:2023-04-12 |
# 多モードキャビティ光機械システムにおける双方向光非相反性 Bidirectional optical non-reciprocity in a multi-mode cavity optomechanical system ( http://arxiv.org/abs/2109.01337v3 ) ライセンス: Link先を確認 | Muhib Ullah, Xihua Yang, Li-Gang Wang | (参考訳) 光非相反性 (optical non-reciprocity) は、光場の一方向の流れを可能にする現象であり、時間反転対称性の破れに基づいている。
これにより、全光ダイオード、光トランジスタ、光スイッチなど、従来とは異なる方法で光子をルーティングする新しいデバイスを実現することができる。 Optical non-reciprocity, a phenomenon that allows unidirectional flow of optical field is pivoted on the time reversal symmetry breaking. The symmetry breaking happens in the cavity optomechanical system (COS) due to non uniform radiation pressure as a result of light-matter interaction, and is crucial in building non-reciprocal optical devices. In our proposed COS, we study the non-reciprocal transport of optical signals across two ports via three optical modes optomechanically coupled to the mechanical excitations of two nano-mechanical resonators (NMRs) under the influence of strong classical drive fields and weak probe fields. By tuning different system parameters, we discover the conversion of reciprocal to non-reciprocal signal transmission. We reveal perfect nonreciprocal transmission of output fields when the effective cavity detuning parameters are near resonant to the NMRs' frequencies. The unidirectional non-reciprocal signal transport is robust to the optomechanical coupling parameters at resonance conditions. Moreover, the cavities' photon loss rates play an inevitable role in the unidirectional flow of signal across the two ports. Bidirectional transmission can be fully controlled by the phase changes associated with the incoming probe and drive fields via two ports. Our scheme may provide a foundation for the compact non-reciprocal communication and quantum information processing, thus enabling new devices that route photons in unconventional ways such as all-optical diodes, optical transistors and optical switches. | 翻訳日:2023-04-14 20:50:38 公開日:2023-04-12 |
# 状態共分散と勾配共分散の平衡トランケーションによる非線形系のモデル削減 Model Reduction for Nonlinear Systems by Balanced Truncation of State and Gradient Covariance ( http://arxiv.org/abs/2207.14387v4 ) ライセンス: Link先を確認 | Samuel E. Otto, Alberto Padovan, Clarence W. Rowley | (参考訳) データ駆動の縮約モデルは、低分散の座標に沿って敏感な高次元非線形力学系の正確な予測にしばしば失敗する。固有直交分解、カーネル主成分分析、オートエンコーダなどによって、そのような座標が切り捨てられることが多いためである。
これらの手法を実証し, 単純かつ挑戦的な3次元システムと, $10^5$個の状態変数を持つ非線形軸対称噴流シミュレーションについて, 各種手法との比較を行った。 Data-driven reduced-order models often fail to make accurate forecasts of high-dimensional nonlinear dynamical systems that are sensitive along coordinates with low-variance because such coordinates are often truncated, e.g., by proper orthogonal decomposition, kernel principal component analysis, and autoencoders. Such systems are encountered frequently in shear-dominated fluid flows where non-normality plays a significant role in the growth of disturbances. In order to address these issues, we employ ideas from active subspaces to find low-dimensional systems of coordinates for model reduction that balance adjoint-based information about the system's sensitivity with the variance of states along trajectories. The resulting method, which we refer to as covariance balancing reduction using adjoint snapshots (CoBRAS), is analogous to balanced truncation with state and adjoint-based gradient covariance matrices replacing the system Gramians and obeying the same key transformation laws. Here, the extracted coordinates are associated with an oblique projection that can be used to construct Petrov-Galerkin reduced-order models. We provide an efficient snapshot-based computational method analogous to balanced proper orthogonal decomposition. This also leads to the observation that the reduced coordinates can be computed relying on inner products of state and gradient samples alone, allowing us to find rich nonlinear coordinates by replacing the inner product with a kernel function. In these coordinates, reduced-order models can be learned using regression. We demonstrate these techniques and compare to a variety of other methods on a simple, yet challenging three-dimensional system and a nonlinear axisymmetric jet flow simulation with $10^5$ state variables. | 翻訳日:2023-04-14 20:45:02 公開日:2023-04-12 |
# 絡み合い推定の実験的検討 Experimental Examination of Entanglement Estimates ( http://arxiv.org/abs/2207.07584v3 ) ライセンス: Link先を確認 | Songbo Xie, Yuan-Yuan Zhao, Chao Zhang, Yun-Feng Huang, Chuan-Feng Li, Guang-Can Guo, and Joseph H. Eberly | (参考訳) 近年,3量子ビットの純粋状態に対して,適切な真の多部絡み合い(GME)尺度が発見された [Xie and Eberly, Phys. Rev. Lett. 127, 040403 (2021)]。
主要な提案は Gühne, Reimpell, Werner [Phys. Rev. Lett. 98, 110502 (2007)] によってなされ、彼らはエンタングルメント・ウィットネスの期待値を用いて絡み合いの下限推定を記述した。
本稿では,最近の実験で用意した多数の純粋状態および混合状態に対する絡み合い測度を推定することにより,このアプローチを定義し,それを説明する。 Recently a proper genuine multipartite entanglement (GME) measure has been found for three-qubit pure states [see Xie and Eberly, Phys. Rev. Lett. 127, 040403 (2021)], but capturing useful entanglement measures for mixed states has remained an open challenge. So far, it requires not only a full tomography in experiments, but also huge calculational labor. A leading proposal was made by Gühne, Reimpell, and Werner [Phys. Rev. Lett. 98, 110502 (2007)], who used expectation values of entanglement witnesses to describe a lower bound estimation of entanglement. We provide here an extension that also gives genuine upper bounds of entanglement. This advance requires only the expectation value of any Hermitian operator. Moreover, we identify a class of operators $\A_1$ which not only give good estimates, but also require a remarkably small number of experimental measurements. In this note we define our approach and illustrate it by estimating entanglement measures for a number of pure and mixed states prepared in our recent experiments. | 翻訳日:2023-04-14 20:44:15 公開日:2023-04-12 |
# 長距離エンタングル量子物質への近道としての測定 Measurement as a shortcut to long-range entangled quantum matter ( http://arxiv.org/abs/2206.13527v3 ) ライセンス: Link先を確認 | Tsung-Cheng Lu, Leonardo A. Lessa, Isaac H. Kim, Timothy H. Hsieh | (参考訳) ユニタリ回路を用いた長距離絡み合った状態の生成はリーブ・ロビンソン境界によって制限されるが、射影測定とフィードバックを持つ回路(「適応回路」)はそのような制限を回避することができる。
3つのクラスは、テンソルネットワーク構成、マルチスケールエンタングルメント繰り込みansatz (MERA)、パートン構成など、異なる物理的洞察にインスパイアされている。
本研究は, 状態形成のための計測の実用的, 概念的汎用性を示す。 The preparation of long-range entangled states using unitary circuits is limited by Lieb-Robinson bounds, but circuits with projective measurements and feedback (``adaptive circuits'') can evade such restrictions. We introduce three classes of local adaptive circuits that enable low-depth preparation of long-range entangled quantum matter characterized by gapped topological orders and conformal field theories (CFTs). The three classes are inspired by distinct physical insights, including tensor-network constructions, multiscale entanglement renormalization ansatz (MERA), and parton constructions. A large class of topological orders, including chiral topological order, can be prepared in constant depth or time, and one-dimensional CFT states and non-abelian topological orders with both solvable and non-solvable groups can be prepared in depth scaling logarithmically with system size. We also build on a recently discovered correspondence between symmetry-protected topological phases and long-range entanglement to derive efficient protocols for preparing symmetry-enriched topological order and arbitrary CSS (Calderbank-Shor-Steane) codes. Our work illustrates the practical and conceptual versatility of measurement for state preparation. | 翻訳日:2023-04-14 20:43:52 公開日:2023-04-12 |
# 非エルミート皮膚効果による絡み合い相転移 Entanglement Phase Transition Induced by the Non-Hermitian Skin Effect ( http://arxiv.org/abs/2206.05384v2 ) ライセンス: Link先を確認 | Kohei Kawabata, Tokiro Numasawa, Shinsei Ryu | (参考訳) 近年、非エルミート的ハミルトニアンによって効果的に記述されたオープン量子システムにおいて顕著な発展が見られる。
その結果, 皮膚効果は粒子のマクロな流れを生じさせ, 絡み合い伝播と熱化を抑制することを示し, 非平衡定常状態における絡み合いエントロピーの面積則を導いた。
さらに, 障害や相互作用を伴わずとも, ユニタリダイナミクスと皮膚効果の競合によって引き起こされる絡み合い相転移を明らかにする。
さらに,Lindbladマスター方程式によって記述されたマルコフ開量子系においても,皮膚効果はフォン・ノイマンエントロピーの純粋化と減少をもたらすことを示した。
我々の研究は、エンタングルメント成長を制御する方法を開き、熱平衡から遠く離れたオープン量子システムにおける相転移と臨界現象の基本的な理解を確立する。 Recent years have seen remarkable development in open quantum systems effectively described by non-Hermitian Hamiltonians. A unique feature of non-Hermitian topological systems is the skin effect, anomalous localization of an extensive number of eigenstates driven by nonreciprocal dissipation. Despite its significance for non-Hermitian topological phases, the relevance of the skin effect to quantum entanglement and critical phenomena has remained unclear. Here, we find that the skin effect induces a nonequilibrium quantum phase transition in the entanglement dynamics. We show that the skin effect gives rise to a macroscopic flow of particles and suppresses the entanglement propagation and thermalization, leading to the area law of the entanglement entropy in the nonequilibrium steady state. Moreover, we reveal an entanglement phase transition induced by the competition between the unitary dynamics and the skin effect even without disorder or interactions. This entanglement phase transition accompanies nonequilibrium quantum criticality characterized by a nonunitary conformal field theory whose effective central charge is extremely sensitive to the boundary conditions. We also demonstrate that it originates from an exceptional point of the non-Hermitian Hamiltonian and the concomitant scale invariance of the skin modes localized according to the power law. Furthermore, we show that the skin effect leads to the purification and the reduction of von Neumann entropy even in Markovian open quantum systems described by the Lindblad master equation. Our work opens a way to control the entanglement growth and establishes a fundamental understanding of phase transitions and critical phenomena in open quantum systems far from thermal equilibrium. | 翻訳日:2023-04-14 20:43:26 公開日:2023-04-12 |
# 固定次元におけるカーネルリッジレス回帰の不整合について On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions ( http://arxiv.org/abs/2205.13525v3 ) ライセンス: Link先を確認 | Daniel Beaglehole, Mikhail Belkin, Parthe Pandit | (参考訳) 「良性過適合 (benign overfitting)」は、特定のアルゴリズムがノイズの多いトレーニングデータを補間しながらもサンプル外で良好に機能する能力であり、近年大きな関心を集めている。
この結果は、Gaussian、Laplace、Cauchyなど、一般的に使われる平行移動不変カーネルに適用できる。 "Benign overfitting", the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. In particular, the estimated predictor does not converge to the ground truth with increasing sample size, for any non-zero regression function and any (even adaptive) bandwidth selection. To prove these results, we give exact expressions for the generalization error, and its decomposition in terms of an approximation error and an estimation error that elicits a trade-off based on the selection of the kernel bandwidth. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy. | 翻訳日:2023-04-14 20:42:08 公開日:2023-04-12 |
# カイラルマルチチャネル近藤モデルにおける非アーベルエニオンの操作 Manipulating Non-Abelian Anyons in a Chiral Multichannel Kondo Model ( http://arxiv.org/abs/2205.04418v4 ) ライセンス: Link先を確認 | Matan Lotem, Eran Sela, Moshe Goldstein | (参考訳) 非アーベルエニオン (non-Abelian anyon) は、ある種のトポロジカル超伝導体や量子ホール状態を記述すると考えられているギャップ付きトポロジカルモデルの分数励起である。
カイラルエッジを持つマルチチャネル近藤系の実現に向けた最近の目覚ましい進歩により、エニオンは予想よりも早く現実のものとなる可能性がある。 Non-Abelian anyons are fractional excitations of gapped topological models believed to describe certain topological superconductors or quantum Hall states. Here, we provide the first numerical evidence that they emerge as independent entities also in gapless electronic models. Starting from a multi-impurity multichannel chiral Kondo model, we introduce a novel mapping to a single-impurity model, amenable to Wilson's numerical renormalization group. We extract its spectral degeneracy structure and fractional entropy, and calculate the $F$ matrices, which encode the topological information regarding braiding of anyons, directly from impurity spin-spin correlations. Impressive recent advances on realizing multichannel Kondo systems with chiral edges may thus bring anyons into reality sooner than expected. | 翻訳日:2023-04-14 20:41:42 公開日:2023-04-12 |
# ベクトル量子化セマンティック通信システム Vector Quantized Semantic Communication System ( http://arxiv.org/abs/2209.11519v2 ) ライセンス: Link先を確認 | Qifan Fu, Huiqiang Xie, Zhijin Qin, Gregory Slabaugh, and Xiaoming Tao | (参考訳) アナログ・セマンティック・コミュニケーション・システムは文献で注目されているが、デジタル・セマンティック・コミュニケーション・システムの研究は少ない。
実験の結果,提案するVQ-DeepSCはデジタル通信システムにおいてBPGよりも頑健であり,DeepJSCC法に匹敵するMS-SSIM性能を有することがわかった。 Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces to perform semantic feature quantization, rendering the data compatible with digital communication systems. Furthermore, we employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator. Experimental results demonstrate that the proposed VQ-DeepSC is more robust than BPG in digital communication systems and has comparable MS-SSIM performance to the DeepJSCC method. | 翻訳日:2023-04-14 20:32:41 公開日:2023-04-12 |
# 周期駆動非相互多体スピン系における予熱 Prethermalization in periodically-driven nonreciprocal many-body spin systems ( http://arxiv.org/abs/2208.09005v2 ) ライセンス: Link先を確認 | Adam J. McRoberts, Hongzheng Zhao, Roderich Moessner, and Marin Bukov | (参考訳) 相互作用するカオス的古典スピン系の時間周期的非相互力学の新しいクラスを解析し、その運動方程式は保守的(位相空間体積保存)であるがシンプレクティック構造を持たない。
そこで本研究では, スピンが開放的かつ非散逸なサブシステムを構成する補助自由度を用いたハミルトニアン拡張を提案する。
したがって、周期駆動系の高周波限界で観測される熱前力学の概念を非相反系に拡張する。 We analyze a new class of time-periodic nonreciprocal dynamics in interacting chaotic classical spin systems, whose equations of motion are conservative (phase-space-volume-preserving) yet possess no symplectic structure. As a result, the dynamics of the system cannot be derived from any time-dependent Hamiltonian. In the high-frequency limit, we find that the magnetization dynamics features a long-lived metastable plateau, whose duration is controlled by the fourth power of the drive frequency. However, due to the lack of an effective Hamiltonian, the prethermal state the system evolves into cannot be understood within the framework of the canonical ensemble. We propose a Hamiltonian extension of the system using auxiliary degrees of freedom, in which the original spins constitute an open yet nondissipative subsystem. This allows us to perturbatively derive effective equations of motion that manifestly display symplecticity breaking at leading order in the inverse frequency. We thus extend the notion of prethermal dynamics, observed in the high-frequency limit of periodically-driven systems, to nonreciprocal systems. | 翻訳日:2023-04-14 20:32:22 公開日:2023-04-12 |
# 部分観測可能性の下でのネットワーク動的システムのグラフの復元:ディープラーニングアプローチ Recovering the Graph Underlying Networked Dynamical Systems under Partial Observability: A Deep Learning Approach ( http://arxiv.org/abs/2208.04405v3 ) ライセンス: Link先を確認 | Sérgio Machado, Anirudh Sridhar, Paulo Gil, Jorge Henriques, José M. F. Moura, Augusto Santos | (参考訳) 本研究では,時系列間の依存関係のグラフを復元するグラフ構造同定の問題について検討する。
これは、ネットワーク内のすべてのノードの観測や処理が禁止される大規模システムのフレームワークに適合する。 We study the problem of graph structure identification, i.e., of recovering the graph of dependencies among time series. We model these time series data as components of the state of linear stochastic networked dynamical systems. We assume partial observability, where the state evolution of only a subset of nodes comprising the network is observed. We devise a new feature vector computed from the observed time series and prove that these features are linearly separable, i.e., there exists a hyperplane that separates the cluster of features associated with connected pairs of nodes from those associated with disconnected pairs. This renders the features amenable to train a variety of classifiers to perform causal inference. In particular, we use these features to train Convolutional Neural Networks (CNNs). The resulting causal inference mechanism outperforms state-of-the-art counterparts w.r.t. sample-complexity. The trained CNNs generalize well over structurally distinct networks (dense or sparse) and noise-level profiles. Remarkably, they also generalize well to real-world networks while trained over a synthetic network (realization of a random graph). Finally, the proposed method consistently reconstructs the graph in a pairwise manner, that is, by deciding if an edge or arrow is present or absent in each pair of nodes, from the corresponding time series of each pair. This fits the framework of large-scale systems, where observation or processing of all nodes in the network is prohibitive. | 翻訳日:2023-04-14 20:31:51 公開日:2023-04-12 |
# 量子ファン・デル・ポール振動子のトポロジカル同期 Topological synchronization of quantum van der Pol oscillators ( http://arxiv.org/abs/2208.01061v2 ) ライセンス: Link先を確認 | Christopher W. Wächtler, Gloria Platero | (参考訳) 古典的または量子的なシステムの大きなネットワークにおける同期を観察するには、ノード間の相互作用の優れた制御と、関連する非線形性と散逸による初期条件の非常に正確な調整が必要である。
我々の研究はトポロジーの概念を一般的な非線形ダイナミクスとオープン量子システム領域に拡張し、特定のノードが電力グリッドや量子ネットワークのような特別な保護を必要とするネットワークに適用する。 To observe synchronization in a large network of classical or quantum systems demands both excellent control of the interactions between the nodes and very accurate preparation of the initial conditions due to the involved nonlinearities and dissipation. This limits the applicability of this phenomenon for future devices. Here, we demonstrate a route towards significantly enhancing the robustness of synchronized behavior in open nonlinear systems that utilizes the power of topology. In a lattice of quantum van der Pol oscillators with topologically motivated couplings, boundary synchronization emerges in the classical mean field as well as the quantum model. In addition to its robustness against disorder and initial state perturbations, the observed dynamics is independent of the underlying topological insulator model provided the existence of zero-energy modes. Our work extends the notion of topology to the general nonlinear dynamics and open quantum system realm with applications to networks where specific nodes need special protection like power grids or quantum networks. | 翻訳日:2023-04-14 20:31:28 公開日:2023-04-12 |
# 差分プライバシとセキュアアグリゲーションを併用したフェデレーション学習における個々のデータポイントの再構築 Reconstructing Individual Data Points in Federated Learning Hardened with Differential Privacy and Secure Aggregation ( http://arxiv.org/abs/2301.04017v2 ) ライセンス: Link先を確認 | Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, Nicolas Papernot | (参考訳) Federated Learning(FL)は、機械学習モデルを共同でトレーニングするためのフレームワークである。
FLは、データの最小化を提供するプライバシー強化技術(PET)として推進されている: データは個人デバイスから決して「出る」ことがなく、ユーザは、分散トレーニングをコーディネートするサーバ(例えば、会社)とのみモデル更新を共有する。
しかし、後者のアプローチは、トレーニングされたモデルのパフォーマンス低下の観点から大きなオーバーヘッドを負い、実際にデプロイされる可能性が低くなる。 Federated learning (FL) is a framework for users to jointly train a machine learning model. FL is promoted as a privacy-enhancing technology (PET) that provides data minimization: data never "leaves" personal devices and users share only model updates with a server (e.g., a company) coordinating the distributed training. While prior work showed that in vanilla FL a malicious server can extract users' private data from the model updates, in this work we take it further and demonstrate that a malicious server can reconstruct user data even in hardened versions of the protocol. More precisely, we propose an attack against FL protected with distributed differential privacy (DDP) and secure aggregation (SA). Our attack method is based on the introduction of sybil devices that deviate from the protocol to expose individual users' data for reconstruction by the server. The underlying root cause for the vulnerability to our attack is a power imbalance: the server orchestrates the whole protocol and users are given little guarantees about the selection of other users participating in the protocol. Moving forward, we discuss requirements for privacy guarantees in FL. We conclude that users should only participate in the protocol when they trust the server or they apply local primitives such as local DP, shifting power away from the server. Yet, the latter approaches come at significant overhead in terms of performance degradation of the trained model, making them less likely to be deployed in practice. | 翻訳日:2023-04-14 20:24:58 公開日:2023-04-12 |
# ニューラル常微分方程式を用いたサブグリッドスケールモデルの学習 Learning Subgrid-scale Models with Neural Ordinary Differential Equations ( http://arxiv.org/abs/2212.09967v3 ) ライセンス: Link先を確認 | Shinhoo Kang, Emil M. Constantinescu | (参考訳) 線の方法 (method of lines) で解かれる偏微分方程式(PDE)と、そのカオス的常微分方程式としての表現をシミュレーションする際のサブグリッドスケールモデルを、ニューラル常微分方程式(NODE)に基づいて学習する新しい手法を提案する。
2スケールのローレンツ96ODE、対流拡散PDE、粘性バーガースのPDEによる数値的な結果を用いて、このアプローチを説明する。 We propose a new approach to learning the subgrid-scale model when simulating partial differential equations (PDEs) solved by the method of lines and their representation in chaotic ordinary differential equations, based on neural ordinary differential equations (NODEs). Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune. Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach neural networks are used to learn the coarse- to fine-grid map, which can be viewed as subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results with the two-scale Lorenz 96 ODE, the convection-diffusion PDE, and the viscous Burgers' PDE are used to illustrate this approach. | 翻訳日:2023-04-14 20:24:34 公開日:2023-04-12 |
# 因果AIのための因果表現学習と再定義DAGの実現 Realization of Causal Representation Learning and Redefined DAG for Causal AI ( http://arxiv.org/abs/2211.08573v6 ) ライセンス: Link先を確認 | Jia Li, Xiang Li, Xiaowei Jia, Michael Steinbach, Vipin Kumar | (参考訳) 因果DAG(Directed Acyclic Graph)は通常、相関変化や因果効果を区別せずに2次元平面上に位置する。
現在、AI(Artificial Intelligence)はより大規模な構造モデリングを可能にしているが、複雑な隠れた交絡のために近似誤差はもはや無視できず、雪だるま式に膨らんで、かなりの集団レベルの因果表現バイアスになりうる。
このようなバイアスは、一般化できない因果モデル、明らかにされない個人レベルの特徴、DL(Deep Learning)で活用できない因果知識など、重大な問題を引き起こしている。
本稿では、再定義されたdo-DAGの概念を導入し、CRLを実現するための新しいアーキテクチャと、その実現可能性について実験的に検証する、Causal Representation Learning (CRL)フレームワークを提案する。 Causal DAG(Directed Acyclic Graph) usually lies in a 2D plane without distinguishing correlation changes and causal effects. Also, the causal effect is often approximately estimated by averaging the population's correlation changes. Now, AI(Artificial Intelligence) enables much larger-scale structural modeling, whose complex hidden confoundings make the approximation errors no longer ignorable but can snowball to considerable population-level Causal Representation Bias. Such bias has caused significant problems: ungeneralizable causal models, unrevealed individual-level features, not utilizable causal knowledge in DL(Deep Learning), etc. In short, DAG must be redefined to enable a new framework for causal AI. Observational time series can only reflect correlation changes in statistics. But the DL-based autoencoder can represent them as individual-level feature changes in latent space to reflect causal effects. In this paper, we introduce the redefined do-DAG concept and propose Causal Representation Learning (CRL) framework as the generic solution, along with a novel architecture to realize CRL and experimentally verify its feasibility. | 翻訳日:2023-04-14 20:23:03 公開日:2023-04-12 |
# ニューラルネットワークの原子間ポテンシャルにおけるデータ効率と外挿傾向 Data efficiency and extrapolation trends in neural network interatomic potentials ( http://arxiv.org/abs/2302.05823v2 ) ライセンス: Link先を確認 | Joshua A. Vita, Daniel Schwalbe-Koda | (参考訳) 近年,nnips(neural network interatomic potentials)において,メッセージパッシングネットワーク,等価性,多体拡張といった重要なアーキテクチャ上の進歩が提案されている。
NequIP と MACE に関する大規模な研究により、損失エントロピーはトレーニングセットのみで計算されているにもかかわらず、分布外誤差とMD安定性を予測する。
我々の研究は、多くの一般的なNNIPの外挿性能に対する深層学習の観点からの根拠を提供し、次世代モデルの開発に役立つ、精度指標を超えたツールを導入する。 Over the last few years, key architectural advances have been proposed for neural network interatomic potentials (NNIPs), such as incorporating message-passing networks, equivariance, or many-body expansion terms. Although modern NNIP models exhibit small differences in energy/forces errors, improvements in accuracy are still considered the main target when developing new NNIP architectures. In this work, we show how architectural and optimization choices influence the generalization of NNIPs, revealing trends in molecular dynamics (MD) stability, data efficiency, and loss landscapes. Using the 3BPA dataset, we show that test errors in NNIP follow a scaling relation and can be robust to noise, but cannot predict MD stability in the high-accuracy regime. To circumvent this problem, we propose the use of loss landscape visualizations and a metric of loss entropy for predicting the generalization power of NNIPs. With a large-scale study on NequIP and MACE, we show that the loss entropy predicts out-of-distribution error and MD stability despite being computed only on the training set. Using this probe, we demonstrate how the choice of optimizers, loss function weighting, data normalization, and other architectural decisions influence the extrapolation behavior of NNIPs. Finally, we relate loss entropy to data efficiency, demonstrating that flatter landscapes also predict learning curve slopes. Our work provides a deep learning justification for the extrapolation performance of many common NNIPs, and introduces tools beyond accuracy metrics that can be used to inform the development of next-generation models. | 翻訳日:2023-04-14 20:14:18 公開日:2023-04-12 |
# 効率的なグラフフィールド積分器がポイントクラウドと出会う Efficient Graph Field Integrators Meet Point Clouds ( http://arxiv.org/abs/2302.00942v3 ) ライセンス: Link先を確認 | Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller | (参考訳) 点雲を符号化するグラフ上での効率的な場積分のためのアルゴリズムを2種類提案する。
どちらも、効率的な積分に多大な影響を与えたFMM(Fast Multipole Methods)の機能を、非ユークリッド空間に対して提供するものとみなせる。
また,剛体および変形可能な物体の面補間(特にメッシュ力学モデリング),点雲のwasserstein距離計算,gromov-wasserstein変種など,徹底的な実験評価を行う。 We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant. | 翻訳日:2023-04-14 20:13:25 公開日:2023-04-12 |
# ZiCo: 勾配の逆変動係数によるゼロショットNAS ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients ( http://arxiv.org/abs/2301.11300v3 ) ライセンス: Link先を確認 | Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, Radu Marculescu | (参考訳) ニューラルアーキテクチャ探索(NAS)は、多数の候補アーキテクチャの中から最高の性能を持つニューラルネットワークを自動的に取得するために広く使用されている。
我々は、複数のアプリケーション(画像分類/再構成や画素レベルの予測など)において、複数のNASベンチマーク(NASBench101, NATSBench-SSS/TSS, TransNASBench-101)上で、ZiCoがState-Of-The-Art(SOTA)プロキシよりも優れていることを示した。
例えば、ZiCoベースのNASは、イメージネットで0.4GPU日以内に、それぞれ450M、600M、1000M FLOPの推論予算で78.1%、79.4%、80.4%のテスト精度で最適なアーキテクチャを見つけることができる。
我々のコードは https://github.com/SLDGroup/ZiCo で入手できる。 Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture. However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at https://github.com/SLDGroup/ZiCo. | 翻訳日:2023-04-14 20:13:04 公開日:2023-04-12 |
# InfluencerRank: Graph Convolutional Attentive Recurrent Neural Networksによる効果的なインフルエンサー発見 InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks ( http://arxiv.org/abs/2304.01897v2 ) ライセンス: Link先を確認 | Seungbae Kim, Jyun-Yu Jiang, Jinyoung Han, Wei Wang | (参考訳) インフルエンサーがソーシャルメディアマーケティングにおいてかなりの役割を果たすと、企業はインフルエンサーマーケティングの予算を増やすことになる。
詳細な分析により,提案する機能やモデルコンポーネントがすべて有効であることがわかった。 As influencers play considerable roles in social media marketing, companies increase the budget for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank that ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, the graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time. Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial to discover effective influencers. | 翻訳日:2023-04-14 20:06:23 公開日:2023-04-12 |
# 異種メモリアーキテクチャを用いたnlpエッジ推論のための省エネルギータスク適応 Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures ( http://arxiv.org/abs/2303.16100v2 ) ライセンス: Link先を確認 | Zirui Fu, Aleksandre Avaliani, Marco Donato | (参考訳) リソース制約のあるエッジデバイス上で機械学習推論タスクを実行するには、注意深いハードウェアとソフトウェアの共同設計最適化が必要だ。
さらに、検証済みのNLPエッジアクセラレータ上でシミュレーションを行い、同じハードウェアプラットフォーム上での従来のALBERTモデルの実行に対する性能、パワー、面積の改善を概説することで、モデルを不均一なオンチップメモリアーキテクチャにマッピングする利点を示す。 Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform. | 翻訳日:2023-04-14 20:04:59 公開日:2023-04-12 |
# ホモロジー量子ローター符号:トーションからの論理量子ビット Homological Quantum Rotor Codes: Logical Qubits from Torsion ( http://arxiv.org/abs/2303.13723v2 ) ライセンス: Link先を確認 | Christophe Vuillot and Alessandro Ciani and Barbara M. Terhal | (参考訳) 複数の量子ローターを用いて論理情報を符号化するホモロジー量子ローター符号を正式に定義する。
特に、実射影平面またはメビウスの帯(Möbius strip)の分割から得られる鎖複体に基づく符号は、1つの量子ビットを符号化する。
本稿では, 連続安定器位相シフトによって拡散する論理演算子の概念により, 量子ビットの場合よりも微妙な符号間の距離スケーリングについて考察する。
我々は、$0$-$\pi$ 量子ビットと、メビウスの帯量子ビットとしても知られるキタエフのカレントミラー量子ビットが、そのような符号の小さな例であることを示し、拡張の可能性について議論する。 We formally define homological quantum rotor codes which use multiple quantum rotors to encode logical information. These codes generalize homological or CSS quantum codes for qubits or qudits, as well as linear oscillator codes which encode logical oscillators. Unlike for qubits or oscillators, homological quantum rotor codes allow one to encode both logical rotors and logical qudits, depending on the homology of the underlying chain complex. In particular, such a code based on the chain complex obtained from tessellating the real projective plane or a Möbius strip encodes a qubit. We discuss the distance scaling for such codes which can be more subtle than in the qubit case due to the concept of logical operator spreading by continuous stabilizer phase-shifts. We give constructions of homological quantum rotor codes based on 2D and 3D manifolds as well as products of chain complexes. Superconducting devices being composed of islands with integer Cooper pair charges could form a natural hardware platform for realizing these codes: we show that the $0$-$\pi$-qubit as well as Kitaev's current-mirror qubit -- also known as the Möbius strip qubit -- are indeed small examples of such codes and discuss possible extensions. | 翻訳日:2023-04-14 20:04:38 公開日:2023-04-12 |
# GTNet:人間と物体の相互作用を検出する誘導トランスネットワーク GTNet:Guided Transformer Network for Detecting Human-Object Interactions ( http://arxiv.org/abs/2108.00596v5 ) ライセンス: Link先を確認 | A S M Iftekhar, Satish Kumar, R. Austin McEver, Suya You, B. S. Manjunath | (参考訳) human-object interaction (hoi) 検出タスクは、人間をローカライズし、オブジェクトをローカライズし、人間とオブジェクトのペア間の相互作用を予測することを指す。
コードはオンラインで入手できる。 The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human object pairs. This issue is addressed by the novel self-attention based guided transformer network, GTNet. GTNet encodes this spatial contextual information in human and object visual features via self-attention while achieving state of the art results on both the V-COCO and HICO-DET datasets. Code will be made available online. | 翻訳日:2023-04-14 17:43:24 公開日:2023-04-12 |
# 確率収束を伴う高精度リコール曲線下の領域の確率的最適化 Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence ( http://arxiv.org/abs/2104.08736v5 ) ライセンス: Link先を確認 | Qi Qi, Youzhi Luo, Zhao Xu, Shuiwang Ji, Tianbao Yang | (参考訳) ROC(AUROC)と精度リコール曲線(AUPRC)の下の領域は、不均衡問題に対する分類性能を評価するための一般的な指標である。
提案手法は, AUPRCの非バイアス点推定器である平均精度(AP)を最大化することに基づいている。
SOAP は libAUC ライブラリ(https://libauc.org/)に実装されている。 Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of dependent compositional functions with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with provable convergence guarantee under mild conditions by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. SOAP has been implemented in the libAUC library at https://libauc.org/. | 翻訳日:2023-04-14 17:43:11 publish日:2023-04-12 |
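The abstract above treats average precision (AP) as the quantity to maximize. As a minimal sketch of what AP measures, the function below computes the plain, non-differentiable AP of a ranked list. The name `average_precision` and the list-based interface are assumptions for illustration; SOAP itself optimizes a smooth surrogate with stochastic compositional methods, which this example does not attempt to reproduce.

```python
def average_precision(scores, labels):
    """Average precision of a ranking (labels are 0 or 1).

    Examples are sorted by descending score; AP is the mean of the
    precision values measured at the rank of each positive example.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    n_pos = sum(labels)
    return total / n_pos if n_pos else 0.0


# Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Because AP depends only on where the positives fall in the ranking, it stays informative when negatives vastly outnumber positives, which is the imbalanced setting the abstract targets.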
# 移動可能な標的攻撃に対する自己普遍性の向上 Enhancing the Self-Universality for Transferable Targeted Attacks ( http://arxiv.org/abs/2209.03716v3 ) ライセンス: Link先を確認 | Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang | (参考訳) 本稿では,訓練データに対する補助ネットワークのトレーニングを必要とせず,対向的摂動を最適化するトランスファーベースターゲティング攻撃手法を提案する。
具体的には, 対角的摂動大域画像とランダムに収穫した局所領域との間の特徴類似性を最大化することにより, 学習摂動の普遍化を促す特徴類似性損失を導入する。
特徴的類似性を失うことにより, 対向的摂動の特徴が良性画像よりも支配的になり, 目的の伝達性も向上する。
コードは https://github.com/zhipeng-wei/Self-Universality で入手できる。 In this paper, we propose a novel transfer-based targeted attack method that optimizes the adversarial perturbations without any extra training efforts for auxiliary networks on training data. Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks. Therefore, we propose to make the perturbation agnostic to different local regions within one image, which we call self-universality. Instead of optimizing the perturbations on different images, optimizing on different regions to achieve self-universality can get rid of using extra data. Specifically, we introduce a feature similarity loss that encourages the learned perturbations to be universal by maximizing the feature similarity between adversarial perturbed global images and randomly cropped local regions. With the feature similarity loss, our method makes the features from adversarial perturbations more dominant than those of benign images, hence improving targeted transferability. We name the proposed attack method as Self-Universality (SU) attack. Extensive experiments demonstrate that SU can achieve high success rates for transfer-based targeted attacks. On the ImageNet-compatible dataset, SU yields an improvement of 12% compared with existing state-of-the-art methods. Code is available at https://github.com/zhipeng-wei/Self-Universality. | 翻訳日:2023-04-14 17:35:29 公開日:2023-04-12 |
# KL分割に基づく離散時間モデルのための深層学習 KL-divergence Based Deep Learning for Discrete Time Model ( http://arxiv.org/abs/2208.05100v2 ) ライセンス: Link先を確認 | Li Liu, Xiangeng Fang, Di Wang, Weijing Tang, Kevin He | (参考訳) ニューラルネットワーク(Deep Learning)は、人工知能の現代モデルであり、Survival Analysisで活用されている。
この課題に対処するため,Kullback-Leibler(KL)に基づくディープラーニング手法を開発し,新たに収集したイベント発生時間(time-to-event)データと外部の生存予測モデルを統合する。
ディープラーニングのためのSurvival Analysisにおいて、事前情報を用いて短いデータ問題に対処することを検討する最初の作業である。
シミュレーションと実データの結果から,提案モデルが従来よりも優れた性能と高いロバスト性を実現することが示された。 Neural Network (Deep Learning) is a modern model in Artificial Intelligence and it has been exploited in Survival Analysis. Although several improvements have been shown by previous works, training an excellent deep learning model requires a huge amount of data, which may not hold in practice. To address this challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to integrate external survival prediction models with newly collected time-to-event data. Time-dependent KL discrimination information is utilized to measure the discrepancy between the external and internal data. This is the first work considering using prior information to deal with short data problem in Survival Analysis for deep learning. Simulation and real data results show that the proposed model achieves better performance and higher robustness compared with previous works. | 翻訳日:2023-04-14 17:35:09 公開日:2023-04-12 |
# 逆熱散逸を伴う生成モデル Generative Modelling With Inverse Heat Dissipation ( http://arxiv.org/abs/2206.13397v7 ) ライセンス: Link先を確認 | Severi Rissanen, Markus Heinonen, Arno Solin | (参考訳) 拡散モデルは画像生成において大きな成功を収めているが、ノイズ反転生成過程は画像の構造を明示的に考慮していない。
拡散モデルと粗から細へのモデリングの成功に着想を得て,熱方程式(画像の2次元平面上で作用させると微細なスケールの情報を局所的に消去するPDE)を確率的に反転させることで画像を生成する,新しい拡散型モデルを提案する。
自然画像のスペクトル解析は拡散モデルとの関係を強調し、それらに暗黙的に粗い帰納バイアスが現れる。 While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them. | 翻訳日:2023-04-14 17:33:58 公開日:2023-04-12 |
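To make the forward process described above concrete, here is a minimal sketch of heat dissipation on a 1D signal using an explicit finite-difference step. The function names, the 1D setting, the reflecting boundaries, and the step size `alpha` are all assumptions chosen for brevity; the paper works on the 2D image plane, adds noise, and learns the reverse direction, none of which is shown here.

```python
def heat_step(u, alpha=0.25):
    """One explicit finite-difference step of the 1D heat equation.

    Uses reflecting (Neumann) boundaries; the scheme is stable for
    0 < alpha <= 0.5.
    """
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out.append(u[i] + alpha * (left - 2.0 * u[i] + right))
    return out


def dissipate(u, steps, alpha=0.25):
    """Apply `steps` heat steps, progressively erasing fine-scale detail."""
    for _ in range(steps):
        u = heat_step(u, alpha)
    return u


signal = [0.0, 1.0] * 4            # highest-frequency pattern on 8 samples
blurred = dissipate(signal, 50)
print(max(blurred) - min(blurred))  # contrast shrinks as detail is erased
print(sum(blurred))                 # total "mass" stays (numerically) constant
```

The alternating pattern flattens toward its mean while the sum is preserved, which is the coarse information a reverse model would start from.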
# データ幻覚による反復指導 Iterative Teaching by Data Hallucination ( http://arxiv.org/abs/2210.17467v2 ) ライセンス: Link先を確認 | Zeju Qiu, Weiyang Liu, Tim Z. Xiao, Zhen Liu, Umang Bhatt, Yucen Luo, Adrian Weller, Bernhard Sch\"olkopf | (参考訳) 本稿では,教師が個別の入力空間(すなわち有限サンプルのプール)における学習者の状況に基づく事例を逐次提供し,教師の能力を大幅に制限する反復型機械指導の課題について考察する。
大規模な実験によりDHTの有効性が検証された。 We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT. | 翻訳日:2023-04-14 17:25:42 公開日:2023-04-12 |
# 共有利用自律移動サービスのための予測フリート配置:最適化と学習に基づくアプローチ Anticipatory Fleet Repositioning for Shared-use Autonomous Mobility Services: An Optimization and Learning-Based Approach ( http://arxiv.org/abs/2210.08659v2 ) ライセンス: Link先を確認 | Monika Filipovska, Michael Hyland, Haimanti Bala | (参考訳) モビリティ・オン・デマンドサービス、リッチ・トランスポート・データソース、自動運転車(AV)の開発は、共有用途のAVモビリティサービス(SAMS)において、アクセシブルで需要に反応するパーソナルモビリティを提供する重要な機会を生み出している。
本稿では, アイドル車両の予測再配置によるSAMS車両の効率とサービス品質の向上に焦点をあてる。
この再配置問題はマルコフ決定過程として定式化され,アドバンテージ・アクター・クリティック(A2C)強化学習に基づく手法で解くことを提案する。
ニューヨーク市のタクシーデータとエージェントベースのシミュレーションツールを用いて、A2C AV再配置アプローチの2つのバージョンをテストする。
実験は、モデルが将来の需要を予測できる能力と、訓練段階では見られないケースへの転送可能性を示す。 The development of mobility-on-demand services, rich transportation data sources, and autonomous vehicles (AVs) creates significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. SAMS fleet operation involves multiple interrelated decisions, with a primary focus on efficiently fulfilling passenger ride requests with a high level of service quality. This paper focuses on improving the efficiency and service quality of a SAMS vehicle fleet via anticipatory repositioning of idle vehicles. The rebalancing problem is formulated as a Markov Decision Process, which we propose solving using an advantage actor critic (A2C) reinforcement learning-based method. The proposed approach learns a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The approach allows for centralized repositioning decisions and can handle large vehicle fleets since the problem size does not change with the fleet size. Using New York City taxi data and an agent-based simulation tool, two versions of the A2C AV repositioning approach are tested. The first version, A2C-AVR(A), learns to anticipate future demand based on past observations, while the second, A2C-AVR(B), uses demand forecasts. The models are compared to an optimization-based rebalancing approach and show significant reduction in mean passenger waiting times, with a slightly increased percentage of empty fleet miles travelled. The experiments demonstrate the model's ability to anticipate future demand and its transferability to cases unseen at the training stage. | 翻訳日:2023-04-14 17:25:12 公開日:2023-04-12 |
# SQA3D: 3Dシーンにおける状況依存質問応答 SQA3D: Situated Question Answering in 3D Scenes ( http://arxiv.org/abs/2210.07474v5 ) ライセンス: Link先を確認 | Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang | (参考訳) 身体性を持つエージェントのシーン理解をベンチマークする新しいタスク,3Dシーンにおける状況依存質問応答(SQA3D)を提案する。
SQA3Dは、より強力な状況理解と推論能力を備えた未来のAI研究を促進することができると信じている。 We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models. We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability. | 翻訳日:2023-04-14 17:24:40 公開日:2023-04-12 |
# 誘導拡散モデルの蒸留について On Distillation of Guided Diffusion Models ( http://arxiv.org/abs/2210.03142v3 ) ライセンス: Link先を確認 | Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans | (参考訳) 分類器フリーの誘導拡散モデルは最近、高分解能画像生成に非常に有効であることが示されており、dalle-2、stable diffusion、imagenといった大規模拡散フレームワークで広く使われている。
この制限に対処するため, 事前学習した分類器フリーガイド付きモデルが与えられた場合, まず, 条件付きモデルと非条件付きモデルの組み合わせの出力に適合する単一モデルを学習し, より少ないサンプリングステップを必要とする拡散モデルに段階的にそのモデルを蒸留する。
画素空間でトレーニングされた標準拡散モデルでは、ImageNet 64x64 と CIFAR-10 の4段階のサンプリングステップを用いて、元のモデルに匹敵する画像を視覚的に生成することが可能であり、サンプルの最大256倍の速度でFID/ISスコアを達成できる。
潜在空間でトレーニングされた拡散モデル(例えば安定拡散)では、1〜4段階のデノイジングステップで高忠実度画像を生成することができ、imagenet 256x256やlaionデータセットの既存の方法と比較して、少なくとも10倍の推論を加速する。
さらに, 蒸留モデルが2~4段階の分別ステップで高品質な結果を生成することができるように, テキストガイドによる画像編集とインパインティングへのアプローチの有効性を実証した。 Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps. | 翻訳日:2023-04-14 17:23:58 公開日:2023-04-12 |
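The first distillation stage described above trains a single model to match the combined output of the conditional and unconditional models. The sketch below shows one common way that combined output is written for classifier-free guidance, together with a mean-squared matching loss. The `(1 + w)` weighting convention, the function names, and the flat-list representation of model outputs are assumptions for illustration, not details taken from the abstract.

```python
def guided_output(cond, uncond, w):
    """Classifier-free guidance combination under an assumed convention:
    (1 + w) * conditional - w * unconditional, element by element."""
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]


def distillation_loss(student, cond, uncond, w):
    """Mean squared error between a single student model's output and
    the guided combination of the two teacher outputs."""
    target = guided_output(cond, uncond, w)
    return sum((s - t) ** 2 for s, t in zip(student, target)) / len(target)


cond = [1.0, 2.0]     # hypothetical conditional-model output
uncond = [0.0, 1.0]   # hypothetical unconditional-model output
print(guided_output(cond, uncond, w=2.0))
print(distillation_loss([3.0, 4.0], cond, uncond, w=2.0))
```

Once a student reproduces this target, sampling needs one network evaluation per step instead of two, which is the saving the abstract describes before the step count itself is reduced.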
# クロスドメインリモートセンシング画像セマンティックセグメンテーションのための自己学習ガイド付きアンタングル適応 Self-Training Guided Disentangled Adaptation for Cross-Domain Remote Sensing Image Semantic Segmentation ( http://arxiv.org/abs/2301.05526v2 ) ライセンス: Link先を確認 | Qi Zhao, Shuchang Lyu, Binghao Liu, Lijiang Chen, Hongbo Zhao | (参考訳) 深部畳み込みニューラルネットワーク(DCNN)に基づくリモートセンシング(RS)画像セマンティックセグメンテーション技術は、地理的要素解析などの現実世界の多くの応用で大きな成功を収めている。
この課題では, 地中サンプリング距離, リモートセンシングセンサの変動, 地形の異なる3つの要因が, ソース画像とターゲット画像の間で劇的な領域シフトを引き起こしている。
そこで本研究では, 共通特徴を抽出し, ソーススタイルとターゲットスタイルの特徴を識別するドメイン・アンタングル・モジュールを提案する。
私たちのコードはhttps://github.com/cv516Buaa/ST-DASegNetで利用可能です。 Deep convolutional neural networks (DCNNs) based remote sensing (RS) image semantic segmentation technology has achieved great success used in many real-world applications such as geographic element analysis. However, strong dependency on annotated data of specific scene makes it hard for DCNNs to fit different RS scenes. To solve this problem, recent works gradually focus on cross-domain RS image semantic segmentation task. In this task, different ground sampling distance, remote sensing sensor variation and different geographical landscapes are three main factors causing dramatic domain shift between source and target images. To decrease the negative influence of domain shift, we propose a self-training guided disentangled adaptation network (ST-DASegNet). We first propose source student backbone and target student backbone to respectively extract the source-style and target-style feature for both source and target images. Towards the intermediate output feature maps of each backbone, we adopt adversarial learning for alignment. Then, we propose a domain disentangled module to extract the universal feature and purify the distinct feature of source-style and target-style features. Finally, these two features are fused and served as input of source student decoder and target student decoder to generate final predictions. Based on our proposed domain disentangled module, we further propose exponential moving average (EMA) based cross-domain separated self-training mechanism to ease the instability and disadvantageous effect during adversarial optimization. Extensive experiments and analysis on benchmark RS datasets show that ST-DASegNet outperforms previous methods on cross-domain RS image semantic segmentation task and achieves state-of-the-art (SOTA) results. Our code is available at https://github.com/cv516Buaa/ST-DASegNet. | 翻訳日:2023-04-14 17:16:53 公開日:2023-04-12 |
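The abstract above relies on an exponential moving average (EMA) to stabilize self-training. As a minimal sketch of the EMA rule on its own, the function below blends teacher weights toward student weights. The name `ema_update`, the flat list of floats standing in for network weights, and the decay value are assumptions for illustration; how ST-DASegNet wires this into its separated source and target branches is not shown.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


teacher = [1.0, 0.0]
student = [0.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)  # drifts slowly toward the student weights
```

A decay close to 1 makes the teacher change slowly, so its pseudo-labels fluctuate less than the student's own predictions during adversarial optimization.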
# imagen editorとeditbench: テキストガイド付き画像インパインティングの進歩と評価 Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting ( http://arxiv.org/abs/2212.06909v2 ) ライセンス: Link先を確認 | Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan | (参考訳) テキスト誘導画像編集は、クリエイティブアプリケーションをサポートする上で、変革的な影響を与える可能性がある。
テキスト誘導画像インペインティングで Imagen を微調整して構築したカスケード拡散モデル,Imagen Editor を提案する。
imagen editorの編集はテキストプロンプトに忠実であり、オブジェクト検出器を使用してトレーニング中に塗り込みマスクを提案する。
さらに、Imagen Editorは、元の高解像度画像にカスケードパイプラインを条件付けすることで、入力画像の細部をキャプチャする。
EditBench上での大規模な人的評価を通じて、トレーニング中のオブジェクトマスキングは、DALL-E 2やStable DiffusionよりもImagen Editorの方が好まれるような、テキストイメージアライメントの全面的な改善につながることが分かりました。 Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built, by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes. | 翻訳日:2023-04-14 17:16:07 公開日:2023-04-12 |
# $\mu$-deformed model of Dark Matter による新しい変形ハイゼンベルク代数 New Deformed Heisenberg Algebra from the $\mu$-Deformed Model of Dark Matter ( http://arxiv.org/abs/2304.05840v1 ) ライセンス: Link先を確認 | A.M. Gavrilik, I.I. Kachurik, A.V. Nazarenko | (参考訳) 最近、ダークマターをモデル化するための$\mu$-deformation-based approachは、$\mu$-deformed thermodynamicsを利用して、銀河ハロー密度プロファイルと、多くの(ドワーフまたは低明度)銀河の回転曲線の研究に拡張された。
この目的のために、レーン-エムデン方程式(LEE)の$\mu$-deformed analogsが提案され、それらの解は密度プロファイルを記述している。
同じ解を持つ$\mu$-deformed LEEには、一見異なる2つのバージョンがあるので、同値性を扱う。
後者の性質から、位置と運動量作用素に対して新しく、かなり珍しい$\mu$-deformed Heisenberg algebra (HA) を導出し、いくつかの可能な形式で$\mu$-HA を提示する(それぞれ $\mu\to0$ が通常の HA を回復する)。
新しい$\mu$-HAと結びついた一般化された不確定性関係を調べ、長さと運動量の双方について最大値と最小値からなる四つ組が現れることなど、興味深い帰結を議論する。 Recently, the $\mu$-deformation-based approach to modeling dark matter, which exploits $\mu$-deformed thermodynamics, was extended to the study of galaxy halo density profile and of the rotation curves of a number of (dwarf or low brightness) galaxies. For that goal, $\mu$-deformed analogs of the Lane--Emden equation (LEE) have been proposed, and their solutions describing density profiles obtained. There are two seemingly different versions of $\mu$-deformed LEE which possess the same solution, and so we deal with their equivalence. From the latter property we derive a new, rather unusual, $\mu$-deformed Heisenberg algebra (HA) for the position and momentum operators, and present the $\mu$-HA in a few possible forms (each one at $\mu\to0$ recovers usual HA). The generalized uncertainty relation linked with the new $\mu$-HA is studied, along with its interesting implications including the appearance of the quadruple of both maximal and minimal lengths and momenta. | 翻訳日:2023-04-14 16:57:31 公開日:2023-04-12 |
# 最大公平性 Maximal Fairness ( http://arxiv.org/abs/2304.06057v1 ) ライセンス: Link先を確認 | MaryBeth Defrance and Tijl De Bie | (参考訳) AIの公正さは、研究や社会においてもかなりの注目を集めている。
我々の研究は、様々なシナリオにおけるこれら12の最大公平性概念それぞれの実践的妥当性について、興味深い問いを提起する。 Fairness in AI has garnered quite some attention in research, and increasingly also in society. The so-called "Impossibility Theorem" has been one of the more striking research results with both theoretical and practical consequences, as it states that satisfying a certain combination of fairness measures is impossible. To date, this negative result has not yet been complemented with a positive one: a characterization of which combinations of fairness notions are possible. This work aims to fill this gap by identifying maximal sets of commonly used fairness measures that can be simultaneously satisfied. The fairness measures used are demographic parity, equal opportunity, false positive parity, predictive parity, predictive equality, overall accuracy equality and treatment equality. We conclude that in total 12 maximal sets of these fairness measures are possible, among which seven combinations of two measures, and five combinations of three measures. Our work raises interesting questions regarding the practical relevance of each of these 12 maximal fairness notions in various scenarios. | 翻訳日:2023-04-14 16:48:51 公開日:2023-04-12 |
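To illustrate why fairness measures like those listed above can agree or conflict on the same predictions, the sketch below computes two of them per group: the selection rate (compared across groups for demographic parity) and the true positive rate (compared for equal opportunity). The function name `group_rates`, the dictionary layout, and the toy data are assumptions for this example, and the other five measures in the abstract are not covered.

```python
def group_rates(y_true, y_pred, group):
    """Per-group selection rate and true positive rate (labels are 0 or 1)."""
    rates = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        pos = [i for i in idx if y_true[i] == 1]
        selection = sum(y_pred[i] for i in idx) / len(idx)
        tpr = sum(y_pred[i] for i in pos) / len(pos) if pos else float("nan")
        rates[g] = {"selection_rate": selection, "tpr": tpr}
    return rates


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(y_true, y_pred, group)
print(rates["a"], rates["b"])
```

In this toy data both groups are selected at the same rate, so demographic parity holds, yet their true positive rates differ, so equal opportunity does not; satisfying one measure does not imply the other.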
# ロボットマニピュレーションのためのロバスト強化学習を支援する実時間シミュレーションの固有の確率性 Exploiting Intrinsic Stochasticity of Real-Time Simulation to Facilitate Robust Reinforcement Learning for Robot Manipulation ( http://arxiv.org/abs/2304.06056v1 ) ライセンス: Link先を確認 | Ram Dershan, Amir M. Soufi Enayati, Zengjie Zhang, Dean Richert, and Homayoun Najjaran | (参考訳) シミュレーションは、実世界で実装される前に強化学習(RL)に不可欠であり、特にロボット操作のような安全クリティカルな応用に必要である。
従来のRLエージェントは、シミュレーションと実世界の相違(sim-to-real gap)に敏感である。
本研究は,ロボット操作タスクなどの実用化におけるシム・トゥ・リアル問題に対する新たな視点を提供する。 Simulation is essential to reinforcement learning (RL) before implementation in the real world, especially for safety-critical applications like robot manipulation. Conventionally, RL agents are sensitive to the discrepancies between the simulation and the real world, known as the sim-to-real gap. The application of domain randomization, a technique used to fill this gap, is limited to the imposition of heuristic-randomized models. We investigate the properties of intrinsic stochasticity of real-time simulation (RT-IS) of off-the-shelf simulation software and its potential to improve the robustness of RL methods and the performance of domain randomization. Firstly, we conduct analytical studies to measure the correlation of RT-IS with the occupation of the computer hardware and validate its comparability with the natural stochasticity of a physical robot. Then, we apply the RT-IS feature in the training of an RL agent. The simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust RL agent design for robot manipulation tasks. The RT-IS-powered robust RL agent outperforms conventional RL agents on robots with modeling uncertainties. It requires fewer heuristic randomization and achieves better generalizability than the conventional domain-randomization-powered agents. Our findings provide a new perspective on the sim-to-real problem in practical applications like robot manipulation tasks. | 翻訳日:2023-04-14 16:48:35 公開日:2023-04-12 |
# ロボットマニピュレーションのためのオフポリシー強化学習における対称性とヒューリスティックな実演の活用 Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation ( http://arxiv.org/abs/2304.06055v1 ) ライセンス: Link先を確認 | Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, and Homayoun Najjaran | (参考訳) 強化学習は多くの領域で制御ポリシーを自動構築する上で大きな可能性を示すが、次元の呪いのため、ロボット操作タスクに適用した場合の効率は低い。
本研究の結果は, 一般的な操作作業におけるモデルフリー強化学習の改善を実証するために, 実演回数の影響と行動クローニングの規模を定量化するものである。
提案手法と従来のオフポリシー強化学習アルゴリズムとの比較研究は、学習性能における優位性と応用上の潜在的価値を示している。 Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications. | 翻訳日:2023-04-14 16:48:15 公開日:2023-04-12 |
# 自己遮蔽深層学習モデルに基づく地すべり感受性予測モデル Landslide Susceptibility Prediction Modeling Based on Self-Screening Deep Learning Model ( http://arxiv.org/abs/2304.06054v1 ) ライセンス: Link先を確認 | Li Zhu, Lekai Liu, Changshi Yu | (参考訳) 地すべりの感受性予測は、常に重要かつ困難なコンテンツである。
しかし, 地すべり試料の誤差や環境要因間の複雑な非線形関係など, 不確実性モデリングにおいて解決すべき問題がいくつかある。
AUCの値は0.9782に達し、他の5つのモデルよりそれぞれ0.0305、0.0532、0.1875、0.1909、0.1829高かった。
従来の機械学習と比較して,本論文で提案するSGCN-LSTMモデルは地すべり予測精度が高く,ロバスト性も良好であり,LSP分野への応用可能性も高い。 Landslide susceptibility prediction has always been an important and challenging task. However, susceptibility modeling still faces open problems, such as errors in landslide samples and the complex nonlinear relationships between environmental factors. A self-screening graph convolutional network and long short-term memory network (SGCN-LSTM) is proposed in this paper to overcome the above problems in landslide susceptibility prediction. The SGCN-LSTM model has the advantages of wide width and good learning ability. The landslide samples with large errors outside the set threshold interval are eliminated by the self-screening network, and the nonlinear relationship between environmental factors can be extracted from both spatial nodes and time series, so as to better simulate the nonlinear relationship between environmental factors. The SGCN-LSTM model was applied to landslide susceptibility prediction in Anyuan County, Jiangxi Province, China, and compared with Cascade-parallel Long Short-Term Memory and Conditional Random Fields (CPLSTM-CRF), Random Forest (RF), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD) and Logistic Regression (LR) models. The landslide prediction experiment in Anyuan County showed that the total accuracy and AUC of the SGCN-LSTM model were the highest among the six models, and the total accuracy reached 92.38%, which was 5.88%, 12.44%, 19.65%, 19.92% and 20.34% higher than those of the CPLSTM-CRF, RF, SVM, SGD and LR models, respectively. The AUC value reached 0.9782, which was 0.0305, 0.0532, 0.1875, 0.1909 and 0.1829 higher than the other five models, respectively. In conclusion, compared with some existing traditional machine learning methods, the SGCN-LSTM model proposed in this paper has higher landslide prediction accuracy and better robustness, and has a good application prospect in the LSP field. | 翻訳日:2023-04-14 16:47:52 公開日:2023-04-12 |
# TextANIMAR:テキストベースの3D動物の細粒度検索 TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval ( http://arxiv.org/abs/2304.06053v1 ) ライセンス: Link先を確認 | Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, E-Ro Nguyen, Minh-Khoi Nguyen-Nhat, Tuan-An To, Trung-Truc Huynh-Le, Nham-Tan Nguyen, Hoang-Chau Luong, Truong Hoai Phong, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Hai-Dang Nguyen, Akihiro Sugimoto, Minh-Triet Tran | (参考訳) 3Dオブジェクトの検索は重要かつ困難な課題であり、近年ますます注目を集めている。
私たちは3dオブジェクト検索の境界を押し上げ、視覚言語技術によるよりユーザーフレンドリーなインタラクションを促進することができると信じています。 3D object retrieval is an important yet challenging task, which has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe that this task has the potential to drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from being fully solved. As such, we provide insights into potential areas for future research and improvements. We believe that we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies. | 翻訳日:2023-04-14 16:47:21 公開日:2023-04-12 |
# コンフォーマル予測とコンフォーマルリスク制御による信頼度物体検出:鉄道信号への応用 Confident Object Detection via Conformal Prediction and Conformal Risk Control: an Application to Railway Signaling ( http://arxiv.org/abs/2304.06052v1 ) ライセンス: Link先を確認 | L\'eo And\'eol (IMT, ANITI), Thomas Fel, Florence De Grancey, Luca Mossina | (参考訳) 現実世界の認定システムへのディープラーニングモデルのデプロイには、不確実性を正確に反映する信頼性評価機能が必要である。
本研究は,モデル性能を評価するための共形予測フレームワークの可能性を示し,正式に保証された不確実性境界を達成するための実践的ガイダンスを提供する。 Deploying deep learning models in real-world certified systems requires the ability to provide confidence estimates that accurately reflect their uncertainty. In this paper, we demonstrate the use of the conformal prediction framework to construct reliable and trustworthy predictors for detecting railway signals. Our approach is based on a novel dataset that includes images taken from the perspective of a train operator and state-of-the-art object detectors. We test several conformal approaches and introduce a new method based on conformal risk control. Our findings demonstrate the potential of the conformal prediction framework to evaluate model performance and provide practical guidance for achieving formally guaranteed uncertainty bounds. | 翻訳日:2023-04-14 16:47:01 公開日:2023-04-12 |
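To make the idea of a formally guaranteed bound concrete, here is a minimal sketch of split conformal prediction applied to bounding boxes. It is not the authors' implementation: the additive-margin score, the function names, and the toy boxes are all assumptions chosen for illustration. The calibration step picks a margin such that inflated predicted boxes cover the true box with probability at least 1 - alpha on exchangeable data.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this alpha.
        return math.inf
    return sorted(scores)[rank - 1]


def coverage_score(pred, true):
    """Smallest additive margin by which `pred` must grow to contain `true`.

    Boxes are (x1, y1, x2, y2). This score is a hypothetical choice.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = true
    return max(px1 - tx1, py1 - ty1, tx2 - px2, ty2 - py2)


def calibrate_margin(pred_boxes, true_boxes, alpha):
    """Compute the conformal margin from held-out calibration pairs."""
    scores = [coverage_score(p, t) for p, t in zip(pred_boxes, true_boxes)]
    return conformal_quantile(scores, alpha)


def inflate(box, margin):
    """Grow a predicted box by `margin` pixels on every side."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)
```

At inference time, one would call `inflate(predicted_box, margin)` with the margin obtained from `calibrate_margin` on a held-out calibration set. The guarantee holds only if calibration and test data are exchangeable.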
# open-transmind:1st foundation model challenge of intelligent transportationの新しいベースラインとベンチマーク Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation ( http://arxiv.org/abs/2304.06051v1 ) ライセンス: Link先を確認 | Yifeng Shi and Feng Lv and Xinliang Wang and Chunlong Xia and Shaojie Li and Shujie Yang and Teng Xi and Gang Zhang | (参考訳) 近年、コンピューティングパワーとディープラーニングアルゴリズムの継続的な改善により、基盤モデルの人気が高まっている。
ソースコードは https://github.com/Traffic-X/Open-TransMind で公開しています。 With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. To our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at https://github.com/Traffic-X/Open-TransMind. | 翻訳日:2023-04-14 16:46:49 公開日:2023-04-12 |
# 新型コロナウイルス(covid-19)がオンラインゲームとレッスン配信の領域をどのように形成したかの分析 An Analysis of How COVID-19 Shaped the Realm of Online Gaming and Lesson Delivery ( http://arxiv.org/abs/2304.06102v1 ) ライセンス: Link先を確認 | Yingwei Cheng and Nicholas Milikich | (参考訳) 新型コロナウイルス(covid-19)のパンデミックにより、学校や大学はリモート学習に適応せざるを得なくなり、オンラインゲームは教育のツールとして登場した。
パンデミックが教育を混乱させ続ける中、オンラインゲームは教師や学生にとってますます重要なツールになりつつある。 The COVID-19 pandemic has forced schools and universities to adapt to remote learning, and online gaming has emerged as a tool for education. Educational games can make learning fun and engaging, help students develop important skills like problem-solving and collaboration, and reach students who are struggling with traditional learning methods. While there are concerns about the potential drawbacks of online gaming in education, its benefits are clear. As the pandemic continues to disrupt education, online gaming is likely to become an increasingly important tool for teachers and students alike. | 翻訳日:2023-04-14 16:40:48 公開日:2023-04-12 |
# 次元性低減と教師付き機械学習に基づく宇宙密度場の高速エミュレーション Fast emulation of cosmological density fields based on dimensionality reduction and supervised machine-learning ( http://arxiv.org/abs/2304.06099v1 ) ライセンス: Link先を確認 | Miguel Concei\c{c}\~ao, Alberto Krone-Martins, Antonio da Silva, \'Angeles Molin\'e | (参考訳) N体シミュレーションは、大規模構造の非線形進化を研究する最も強力な方法である。
この方法は、完全なN体シミュレーションを行う場合と比べてCPU実行時間を3桁短縮しつつ、パワースペクトルとバイスペクトルを、自由パラメータが1つのエミュレーションではそれぞれ$\sim 1\%$と$\sim 3\%$以内、自由パラメータが2つの場合は$\sim 5\%$と$\sim 15\%$以内で再現する。
これにより、様々な宇宙モデルに対する密度立方体の生成が大幅に加速し、ESA/NASAのユークリッドミッションのような完全な調査スケールでのパラメータやモデル推論など、これまで実現できなかった応用に扉を開くことができる。 N-body simulations are the most powerful method to study the non-linear evolution of large-scale structure. However, they require large amounts of computational resources, making their direct adoption unfeasible in scenarios that require broad explorations of parameter spaces. In this work, we show that it is possible to perform fast dark matter density field emulations with competitive accuracy using simple machine-learning approaches. We build an emulator based on dimensionality reduction and machine learning regression combining simple Principal Component Analysis and supervised learning methods. For the estimations with a single free parameter, we train on the dark matter density parameter, $\Omega_m$, while for emulations with two free parameters, we train on a range of $\Omega_m$ and redshift. The method first adopts a projection of a grid of simulations on a given basis; then, a machine learning regression is trained on this projected grid. Finally, new density cubes for different cosmological parameters can be estimated without relying directly on new N-body simulations by predicting and de-projecting the basis coefficients. We show that the proposed emulator can generate density cubes at non-linear cosmological scales with density distributions within a few percent compared to the corresponding N-body simulations. The method enables gains of three orders of magnitude in CPU run times compared to performing a full N-body simulation while reproducing the power spectrum and bispectrum within $\sim 1\%$ and $\sim 3\%$, respectively, for the single free parameter emulation and $\sim 5\%$ and $\sim 15\%$ for two free parameters. This can significantly accelerate the generation of density cubes for a wide variety of cosmological models, opening the doors to previously unfeasible applications, such as parameter and model inferences at the full survey scales of missions such as ESA/NASA Euclid. | 翻訳日:2023-04-14 16:40:37 公開日:2023-04-12 |
# エネルギー誘導型エントロピー神経輸送 Energy-guided Entropic Neural Optimal Transport ( http://arxiv.org/abs/2304.06094v1 ) ライセンス: Link先を確認 | Petr Mokrov and Alexander Korotin and Evgeny Burnaev | (参考訳) エネルギーベースモデル(EBM)は、機械学習コミュニティで数十年にわたって知られている。
エネルギーポテンシャル(非正規化尤度関数)を用いて生成的モデリング問題を解決する効率的な方法が数多く現れている。
本研究では,EBMとEntropy-regularized OTのギャップを埋める。
本手法を玩具の2次元シナリオに適用し, 画像対画像変換の標準問題にも適用できることを確認した。
単純さのため、我々はエネルギー誘導型エントロピーOT法のバックボーンとして、単純な短・長周期のEBMを選択し、より洗練されたEBMを将来の研究に活用する。 Energy-Based Models (EBMs) have been known in the machine learning community for decades. Since the seminal works on EBMs dating back to the noughties, many efficient methods have appeared that solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited to a few recent works (excluding WGAN-based approaches, which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long-run EBMs as a backbone of our Energy-guided Entropic OT method, leaving the application of more sophisticated EBMs for future research. | 翻訳日:2023-04-14 16:40:01 公開日:2023-04-12 |
# 原子と分子のマルチチャネル量子散乱のためのハイブリッド量子古典アルゴリズム A hybrid quantum-classical algorithm for multichannel quantum scattering of atoms and molecules ( http://arxiv.org/abs/2304.06089v1 ) ライセンス: Link先を確認 | Xiaodong Xing, Alejandro Gomez Cadavid, Artur F. Izmaylov and Timur V. Tscherbul | (参考訳) 原子・分子衝突の時間非依存schr\"odinger方程式を解くためのハイブリッド量子古典アルゴリズムを提案する。
古典的アルゴリズム(対称行列反転)の計算ボトルネックは、線形方程式のシステムを解くために最近開発されたノイズの多い中間スケール量子 (NISQ) アルゴリズムである変分量子線形解法 (VQLS) を用いて解決される。
以上の結果から, NISQ量子プロセッサ上での散乱断面積と分子衝突率の計算が可能であることが示され, 気相二分子衝突のスケーラブルなディジタル量子計算の可能性と, 天文学と超低温化学との関係が示唆された。 We propose a hybrid quantum-classical algorithm for solving the time-independent Schr\"odinger equation for atomic and molecular collisions. The algorithm is based on the $S$-matrix version of the Kohn variational principle, which computes the fundamental scattering $S$-matrix by inverting the Hamiltonian matrix expressed in the basis of square-integrable functions. The computational bottleneck of the classical algorithm -- symmetric matrix inversion -- is addressed here using the variational quantum linear solver (VQLS), a recently developed noisy intermediate-scale quantum (NISQ) algorithm for solving systems of linear equations. We apply our algorithm to single and multichannel quantum scattering problems, obtaining accurate vibrational relaxation probabilities in collinear atom-molecule collisions. We also show how the algorithm could be scaled up to simulate collisions of large polyatomic molecules. Our results demonstrate that it is possible to calculate scattering cross sections and rates for complex molecular collisions on NISQ quantum processors, opening up the possibility of scalable digital quantum computation of gas-phase bimolecular collisions and reactions of relevance to astrochemistry and ultracold chemistry. | 翻訳日:2023-04-14 16:39:42 公開日:2023-04-12 |
# トランスモンカプラを用いた高忠実度・周波数可変な2量子ビットフラクソニウムゲート High-Fidelity, Frequency-Flexible Two-Qubit Fluxonium Gates with a Transmon Coupler ( http://arxiv.org/abs/2304.06087v1 ) ライセンス: Link先を確認 | Leon Ding, Max Hays, Youngkyu Sung, Bharath Kannan, Junyoung An, Agustin Di Paolo, Amir H. Karamlou, Thomas M. Hazard, Kate Azar, David K. Kim, Bethany M. Niedzielski, Alexander Melville, Mollie E. Schwartz, Jonilyn L. Yoder, Terry P. Orlando, Simon Gustavsson, Jeffrey A. Grover, Kyle Serniak, William D. Oliver | (参考訳) トランスモンカプラ (ftf, for fluxonium-transmon-fluxonium) を介する2量子ビットゲートのアーキテクチャを提案する。
最後に, 平均ゲート忠実度を$99.922\pm0.009\%$まで向上させるため, パルスパラメータのモデルフリー強化学習を行った。
ここではマイクロ波活性化CZゲートの他に、FTFは様々なフラクソニウムゲートスキームにも適用でき、ゲート忠実度を改善し、不要な$ZZ$相互作用を受動的に低減することができる。 We propose and demonstrate an architecture for fluxonium-fluxonium two-qubit gates mediated by transmon couplers (FTF, for fluxonium-transmon-fluxonium). Relative to architectures that exclusively rely on a direct coupling between fluxonium qubits, FTF enables stronger couplings for gates using non-computational states while simultaneously suppressing the static controlled-phase entangling rate ($ZZ$) down to kHz levels, all without requiring strict parameter matching. Here we implement FTF with a flux-tunable transmon coupler and demonstrate a microwave-activated controlled-Z (CZ) gate whose operation frequency can be tuned over a 2 GHz range, adding frequency allocation freedom for FTF's in larger systems. Across this range, state-of-the-art CZ gate fidelities were observed over many bias points and reproduced across the two devices characterized in this work. After optimizing both the operation frequency and the gate duration, we achieved peak CZ fidelities in the 99.85-99.9\% range. Finally, we implemented model-free reinforcement learning of the pulse parameters to boost the mean gate fidelity up to $99.922\pm0.009\%$, averaged over roughly an hour between scheduled training runs. Beyond the microwave-activated CZ gate we present here, FTF can be applied to a variety of other fluxonium gate schemes to improve gate fidelities and passively reduce unwanted $ZZ$ interactions. | 翻訳日:2023-04-14 16:39:21 公開日:2023-04-12 |
# ホログラフィック多部絡み合い尺度の分類に向けて Towards classification of holographic multi-partite entanglement measures ( http://arxiv.org/abs/2304.06082v1 ) ライセンス: Link先を確認 | Abhijit Gadde, Vineeth Krishna, Trakshu Sharma | (参考訳) 本稿では, ホログラム双対のプローブ近似で計算可能な測度を構築することを目的として, マルチパーティ・エンタングルメントの測度を体系的に研究する。
バルクでレプリカ対称性が破れていないという仮定のもとでホログラフィック双対を導出し、$2d$ CFTでの明示的な計算でこの処方を検証する。
我々は、レプリカ対称性の仮定と、既に知られている絡み合いの方法、例えば絡み合いの負性や反射エントロピーが我々の枠組みにどのように適合するかについて議論する。 In this paper, we systematically study measures of multi-partite entanglement with the aim of constructing measures that can be computed in probe approximation in the holographic dual. We classify and count general measures as invariants of local unitary transformations. After formulating these measures in terms of permutation group elements, we derive conditions that a probe measure should satisfy and find a large class of solutions. These solutions are generalizations of the multi-entropy introduced in arXiv:2206.09723. We derive their holographic dual with the assumption that the replica symmetry is unbroken in the bulk and check our prescription with explicit computations in $2d$ CFTs. We discuss the replica symmetry assumption and also how the already known entanglement measures, such as entanglement negativity and reflected entropy, fit in our framework. | 翻訳日:2023-04-14 16:38:48 公開日:2023-04-12 |
# ツリーテンソルネットワークを用いた長距離量子多体力学の数値シミュレーション Numerical simulations of long-range open quantum many-body dynamics with tree tensor networks ( http://arxiv.org/abs/2304.06075v1 ) ライセンス: Link先を確認 | Dominik Sulz, Christian Lubich, Gianluca Ceruti, Igor Lesanovsky, Federico Carollo | (参考訳) オープン量子系は、量子効果、多体相互作用、散逸過程の競合から生じる集合的挙動の探索に概念的にシンプルな設定を提供する。
本研究では,パワーロー減衰相互作用を持つ散逸型イジングモデルを用いて,パワーロー指数の1次位相遷移のシグネチャを観測する。 Open quantum systems provide a conceptually simple setting for the exploration of collective behavior stemming from the competition between quantum effects, many-body interactions, and dissipative processes. They may display dynamics distinct from that of closed quantum systems or undergo nonequilibrium phase transitions which are not possible in classical settings. However, studying open quantum many-body dynamics is challenging, in particular in the presence of critical long-range correlations or long-range interactions. Here, we make progress in this direction and introduce a numerical method for open quantum systems, based on tree tensor networks. Such a structure is expected to improve the encoding of many-body correlations and we adopt an integration scheme suited for long-range interactions and applications to dissipative dynamics. We test the method using a dissipative Ising model with power-law decaying interactions and observe signatures of a first-order phase transition for power-law exponents smaller than one. | 翻訳日:2023-04-14 16:38:34 公開日:2023-04-12 |
# 円錐交差検出のためのハイブリッド量子アルゴリズム A hybrid quantum algorithm to detect conical intersections ( http://arxiv.org/abs/2304.06070v1 ) ライセンス: Link先を確認 | Emiel Koridon, Joana Fraxanet, Alexandre Dauphin, Lucas Visscher, Thomas E. O'Brien, Stefano Polla | (参考訳) 円錐交差は、光異性化や非放射緩和のような化学過程において重要な役割を果たすことが知られている分子ハミルトニアンのポテンシャルエネルギー面間の位相的に保護された交差である。
最後に、ベリー位相は2つの離散値(0 または $\pi$)しか取ることができないので、定数で区切られた累積誤差であっても、この手順は成功する。
フォーマルジミン分子 (\ce{H2C=NH}) の小さな玩具モデルへのアルゴリズムの適用を数値的に示す。 Conical intersections are topologically protected crossings between the potential energy surfaces of a molecular Hamiltonian, known to play an important role in chemical processes such as photoisomerization and non-radiative relaxation. They are characterized by a non-zero Berry phase, which is a topological invariant defined on a closed path in atomic coordinate space, taking the value $\pi$ when the path encircles the intersection manifold. In this work, we show that for real molecular Hamiltonians, the Berry phase can be obtained by tracing a local optimum of a variational ansatz along the chosen path and estimating the overlap between the initial and final state with a control-free Hadamard test. Moreover, by discretizing the path into $N$ points, we can use $N$ single Newton-Raphson steps to update our state non-variationally. Finally, since the Berry phase can only take two discrete values (0 or $\pi$), our procedure succeeds even for a cumulative error bounded by a constant; this allows us to bound the total sampling cost and to readily verify the success of the procedure. We demonstrate numerically the application of our algorithm on small toy models of the formaldimine molecule (\ce{H2C=NH}). | 翻訳日:2023-04-14 16:38:12 公開日:2023-04-12 |
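The path-tracing idea in the abstract above can be illustrated classically. The sketch below is an assumption-laden toy, not the authors' quantum algorithm: it uses a hypothetical real two-level Hamiltonian $H = \begin{pmatrix} z & x \\ x & -z \end{pmatrix}$ with a degeneracy at the origin, tracks the real ground state along a discretized loop by fixing its sign against the previous point, and reads off the Berry phase from the sign of the overlap between the final and initial states. The overlap that the paper estimates with a control-free Hadamard test is computed here as a plain dot product.

```python
import math


def ground_state(x, z):
    """Real, normalized ground state of H = [[z, x], [x, -z]].

    The overall sign is arbitrary, mimicking a solver with no fixed gauge.
    """
    r = math.hypot(x, z)
    # Two algebraically equivalent eigenvectors for eigenvalue -r;
    # pick the one with the larger norm to avoid a vanishing vector.
    cand_a = (-x, z + r)
    cand_b = (z - r, x)
    v = max(cand_a, cand_b, key=lambda c: c[0] ** 2 + c[1] ** 2)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def berry_phase(center, radius, n_points=50):
    """Berry phase (0 or pi) of the ground state around a circular loop."""
    cx, cz = center
    first = prev = None
    for k in range(n_points + 1):
        theta = 2 * math.pi * k / n_points
        v = ground_state(cx + radius * math.cos(theta),
                         cz + radius * math.sin(theta))
        if prev is None:
            first = v
        elif v[0] * prev[0] + v[1] * prev[1] < 0:
            # Keep the state continuous along the path.
            v = (-v[0], -v[1])
        prev = v
    overlap = prev[0] * first[0] + prev[1] * first[1]
    return 0.0 if overlap > 0 else math.pi
```

Because the result depends only on the sign of the final overlap, moderate errors at each step do not change the answer. This mirrors the robustness argument made in the abstract, although the toy model says nothing about sampling cost on quantum hardware.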
# 3dシーンにおける質問応答のためのクリップ誘導視覚言語事前学習 CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes ( http://arxiv.org/abs/2304.06061v1 ) ライセンス: Link先を確認 | Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann | (参考訳) 言語知識と視覚概念を2次元画像から3次元世界理解に適用するためのトレーニングモデルは、研究者が最近探求を始めたばかりである。
実験による定量的・定性的な結果から,本手法は最先端の作業よりも優れており,3dシーンの特徴を解釈可能な表現へと導く。 Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power of the popular CLIP model into our 3D encoder by aligning the encoded 3D scene features with the corresponding 2D image and text embeddings produced by CLIP. To assess our model's 3D world reasoning capability, we evaluate it on the downstream task of 3D Visual Question Answering. Experimental quantitative and qualitative results show that our pre-training method outperforms state-of-the-art works in this task and leads to an interpretable representation of 3D scene features. | 翻訳日:2023-04-14 16:37:51 公開日:2023-04-12 |
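As a rough illustration of what aligning 3D scene features with CLIP embeddings could look like, the sketch below sums two cosine-distance terms. The abstract does not state the exact loss, so the form of `alignment_loss`, its name, and the plain-list vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(scene_feat, image_emb, text_emb):
    """Hypothetical loss: cosine distance to the image and text embeddings.

    It is zero when the 3D scene feature points in the same direction
    as both the 2D image embedding and the text embedding.
    """
    image_term = 1.0 - cosine_similarity(scene_feat, image_emb)
    text_term = 1.0 - cosine_similarity(scene_feat, text_emb)
    return image_term + text_term
```

In a real pipeline the embeddings would come from a frozen CLIP model and the loss would be minimized with respect to the 3D encoder's parameters; both steps are outside this sketch.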
# 低分解能赤外線アレイを用いたプライバシー保護のための効率的な深層学習モデル Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays ( http://arxiv.org/abs/2304.06059v1 ) ライセンス: Link先を確認 | Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari | (参考訳) 超低解像度赤外線(ir)アレイセンサーは、人計数のための低コスト、エネルギー効率、プライバシー保護ソリューションを提供する。
しかし、これらの文献では、irアレイに基づく様々な効率的なdlアーキテクチャの比較分析が欠落しており、その精度だけでなく、メモリやエネルギー制約のあるiot(internet of things)エッジノードへのデプロイコストも考慮されている。
当社のモデルはすべて,MCUベースのIoTノード上で,バッテリ充電なしで数年間の自律運用が可能な,継続的かつリアルタイムな推論を実現しています。 Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging. | 翻訳日:2023-04-14 16:37:32 公開日:2023-04-12 |
# ラベルフリー概念ボトルネックモデル Label-Free Concept Bottleneck Models ( http://arxiv.org/abs/2304.06129v1 ) ライセンス: Link先を確認 | Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng | (参考訳) 概念ボトルネックモデル(CBM)は、隠れた層ニューロンが人間の理解可能な概念に対応することによって、より解釈可能なニューラルネットワークを作成する一般的な方法である。
しかし、既存のCBMとその変種には2つの重要な制限がある: まず、事前に定義された概念のそれぞれについてラベル付きデータを収集する必要がある。
スケーラブル - イメージネットにスケールした最初のcbmを表示し、効率的 - cbmを作成するには、非常に大きなデータセットであっても数時間しかかからず、自動化 - 新たなデータセットのためにトレーニングするには、最小限の人的労力が必要です。
私たちのコードはhttps://github.com/Trustworthy-ML-Lab/Label-free-CBMで利用可能です。 Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM. | 翻訳日:2023-04-14 16:28:49 公開日:2023-04-12 |
# 実環境におけるディープフェイク検出のための評価フレームワーク Assessment Framework for Deepfake Detection in Real-world Situations ( http://arxiv.org/abs/2304.06125v1 ) ライセンス: Link先を確認 | Yuhang Lu and Touradj Ebrahimi | (参考訳) 画像やビデオにおけるデジタル顔操作の検出は、公衆の信頼を損なう可能性があるため、広く注目を集めている。
本報告では, フレームワークの有効性と利用を実証するために, 3つの一般的なディープフェイク検出手法の広範な実験と詳細な解析を行った。
さらに,現実的な処理操作によって駆動される確率的劣化に基づくデータ拡張法を考案し,ディープフェイク検出器のロバスト性を大幅に向上させる。 Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on related benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our knowledge, it is the first systematic assessment approach for deepfake detectors that not only reports the general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analysis of three popular deepfake detection methods are further presented in this paper. In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors. | 翻訳日:2023-04-14 16:28:31 公開日:2023-04-12 |
# followme:自動運転車の設定における車両挙動予測 FollowMe: Vehicle Behaviour Prediction in Autonomous Vehicle Settings ( http://arxiv.org/abs/2304.06121v1 ) ライセンス: Link先を確認 | Abduallah Mohamed, Jundi Liu, Linda Ng Boyle, Christian Claudel | (参考訳) 仮想リード車両計画ルートに続くエゴ車両は、自律車と非自律車との相互作用において必須の要素である。
本研究では,運転者が先頭車両に追従する能力について,後者の質問に答えることで行動・行動予測問題を実現する,新しいデータセットである followme dataset を提案する。
先行運動予測モデルと比較し,先行運動予測モデルでは,先行車両に追従する状況に対応するための異なる設計機構が必要であることを示した。 An ego vehicle following a virtual lead vehicle planned route is an essential component when autonomous and non-autonomous vehicles interact. Yet, there is a question about the driver's ability to follow the planned lead vehicle route. Thus, predicting the trajectory of the ego vehicle route given a lead vehicle route is of interest. We introduce a new dataset, the FollowMe dataset, which offers a motion and behavior prediction problem by answering the latter question of the driver's ability to follow a lead vehicle. We also introduce a deep spatio-temporal graph model FollowMe-STGCNN as a baseline for the dataset. In our experiments and analysis, we show the design benefits of FollowMe-STGCNN in capturing the interactions that lie within the dataset. We contrast the performance of FollowMe-STGCNN with prior motion prediction models showing the need to have a different design mechanism to address the lead vehicle following settings. | 翻訳日:2023-04-14 16:28:10 公開日:2023-04-12 |
# サリエンシマップによる顔認識の解説 Explanation of Face Recognition via Saliency Maps ( http://arxiv.org/abs/2304.06118v1 ) ライセンス: Link先を確認 | Yuhang Lu and Touradj Ebrahimi | (参考訳) 過去数年間の顔認識の著しい進歩にもかかわらず、それらはしばしば「ブラックボックス」として扱われ、説明性に欠けるとして批判されてきた。
さらに,一般的な視覚的サリエンシに基づくXFR法の信頼性と精度を体系的に評価する手法を提案する。 Despite the significant progress in face recognition in the past years, deep face recognition systems are often treated as "black boxes" and have been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face recognition systems to make them more acceptable to the public. Explainable face recognition (XFR) refers to the problem of interpreting why the recognition model matches a probe face with one identity over others. Recent studies have explored the use of visual saliency maps as an explanation, but they often lack a deeper analysis in the context of face recognition. This paper starts by proposing a rigorous definition of explainable face recognition (XFR) which focuses on the decision-making process of the deep recognition model. Following the new definition, a similarity-based RISE algorithm (S-RISE) is then introduced to produce high-quality visual saliency maps. Furthermore, an evaluation approach is proposed to systematically validate the reliability and accuracy of general visual saliency-based XFR methods. | 翻訳日:2023-04-14 16:27:56 公開日:2023-04-12 |
# AutoShot: ショートビデオデータセットと最先端のショット境界検出 AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection ( http://arxiv.org/abs/2304.06116v1 ) ライセンス: Link先を確認 | Wentao Zhu, Yufang Huang, Xiufeng Xie, Wenxian Liu, Jincan Deng, Debing Zhang, Zhangyang Wang, Ji Liu | (参考訳) ショートフォームビデオは爆発的に人気を博し、新しいソーシャルメディアのトレンドを支配した。
Kuaishou (Kwai)、TikTok、Instagram Reels、YouTube Shortsなどの主要なショートビデオプラットフォームは、コンテンツの消費と作成の方法を変えた。
本研究では,853の完全なショートビデオと11,606のショットアノテーションと,200のテストビデオに2,716の高品質なショット境界アノテーションを備えるSHOTという,新しい公開Short Video sHot bOundary deTectionデータセットをリリースする。
このデータ富を生かして、様々な高度な3D ConvNetとTransformerをカプセル化した検索空間でニューラルアーキテクチャ検索を行うことにより、ビデオSBDのモデル設計を最適化することを提案する。
SHOTデータセットとコードはhttps://github.com/wentaozhu/AutoShot.gitで見ることができる。 The short-form videos have explosive popularity and have dominated the new social media trends. Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. For video content creation and understanding, the shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD, by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on another three public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively. The SHOT dataset and code can be found in https://github.com/wentaozhu/AutoShot.git . | 翻訳日:2023-04-14 16:27:40 公開日:2023-04-12 |
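The AutoShot abstract above reports results as F1 scores on shot boundaries. As a rough illustration of how such a score can be computed, here is a minimal Python sketch. The greedy matching rule and the `tolerance` parameter are assumptions made for this example; the matching protocol actually used by the SHOT benchmark is not stated in the abstract.

```python
def boundary_f1(predicted, reference, tolerance=0):
    """F1 score for predicted shot-boundary frame indices.

    A prediction counts as a true positive when it lies within
    `tolerance` frames of a reference boundary that has not been
    matched yet (greedy one-to-one matching; an assumption of this
    sketch, not a documented benchmark rule).
    """
    unmatched = sorted(reference)
    true_positives = 0
    for p in sorted(predicted):
        hit = next((r for r in unmatched if abs(r - p) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each reference boundary is used once
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical frame indices, for illustration only.
score = boundary_f1([10, 52, 90], [10, 50, 120, 200], tolerance=2)
```

In this toy case two of three predictions match two of four reference boundaries, so precision is 2/3, recall is 1/2, and the F1 score is 4/7.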
# TopTrack: トップからオブジェクトを追跡する TopTrack: Tracking Objects By Their Top ( http://arxiv.org/abs/2304.06114v1 ) ライセンス: Link先を確認 | Jacob Meilleur and Guillaume-Alexandre Bilodeau | (参考訳) 近年,Multi-object Tracking(MOT)タスクに対処する方法として,共同検出・追跡パラダイムが広く用いられている。
TopTrackは、2つのMOTベンチマークで、他の最先端トラッカーと競合する結果を達成している。 In recent years, the joint detection-and-tracking paradigm has been a very popular way of tackling the multi-object tracking (MOT) task. Many of the methods following this paradigm use the object center keypoint for detection. However, we argue that the center point is not optimal since it is often not visible in crowded scenarios, which results in many missed detections when the objects are partially occluded. We propose TopTrack, a joint detection-and-tracking method that uses the top of the object as a keypoint for detection instead of the center because it is more often visible. Furthermore, TopTrack processes consecutive frames in separate streams in order to facilitate training. We performed experiments to show that using the object top as a keypoint for detection can reduce the number of missed detections, which in turn leads to more complete trajectories and fewer lost trajectories. TopTrack manages to achieve competitive results with other state-of-the-art trackers on two MOT benchmarks. | 翻訳日:2023-04-14 16:27:10 公開日:2023-04-12 |
# PATMAT: 顔インペインティングのためのMask-Aware Transformerの人物認識チューニング PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting ( http://arxiv.org/abs/2304.06107v1 ) ライセンス: Link先を確認 | Saman Motamed and Jianjin Xu and Chen Henry Wu and Fernando De la Torre | (参考訳) StyleGAN2やStable Diffusionのような生成モデルは、画像合成、インペインティング、ノイズ除去といったコンピュータビジョンタスクにおいて最先端のパフォーマンスを達成した。
本研究では,マスク・アウェア・トランスフォーマー (MAT) のPerson Aware Tuning (PAT) を提案する。
以上の結果から, PATMATはパーソナライズされた顔インペインティングの品質向上に有望な手法となりうることが示唆された。 Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting. | 翻訳日:2023-04-14 16:26:57 公開日:2023-04-12 |
# 遺伝的アルゴリズムによるDeep De-identified anonymous Dataset拡張(3DG-GA)を用いた人工顔面薬物乱用画像の生成 Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA) ( http://arxiv.org/abs/2304.06106v1 ) ライセンス: Link先を確認 | Hazem Zein, Lou Laurent, Régis Fournier, Amine Nait-Ali | (参考訳) バイオメディカルリサーチと人工知能では、大規模でバランスのとれた、代表的データセットへのアクセスは、現実世界のシナリオで使用できる信頼できるアプリケーションを開発する上で不可欠である。
本研究は, 薬物乱用の特徴を強調することで, 極めてリアルな合成顔を作り出すことを提案する。
提案手法は「3DG-GA(Deep De-identified anonymous Dataset Generation)」と呼ばれ、合成顔生成の戦略として遺伝的アルゴリズムを用いる。
データセットは科学コミュニティに開放され、法的または倫理的な制約を避けながら、結果の再現と生成されたデータセットの恩恵を受けることができます。 In biomedical research and artificial intelligence, access to large, well-balanced, and representative datasets is crucial for developing trustworthy applications that can be used in real-world scenarios. However, obtaining such datasets can be challenging, as they are often restricted to hospitals and specialized facilities. To address this issue, the study proposes to generate highly realistic synthetic faces exhibiting drug abuse traits through augmentation. The proposed method, called "3DG-GA", Deep De-identified anonymous Dataset Generation, uses Genetics Algorithm as a strategy for synthetic faces generation. The algorithm includes GAN artificial face generation, forgery detection, and face recognition. Initially, a dataset of 120 images of actual facial drug abuse is used. By preserving the drug traits, the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuse images. The dataset will be open to the scientific community, which can reproduce our results and benefit from the generated datasets while avoiding legal or ethical restrictions. | 翻訳日:2023-04-14 16:26:32 公開日:2023-04-12 |
# 時間平均制約を考慮した制御系オンライン最適化のためのプライマル・デュアル文脈付きベイズ最適化 Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints ( http://arxiv.org/abs/2304.06104v1 ) ライセンス: Link先を確認 | Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones | (参考訳) 本稿では,制約付き閉ループ制御システムのオンライン性能最適化の問題について検討する。
本手法をガウス過程からサンプリングしたインスタンスと連続撹拌槽型反応器のパラメータチューニング問題の両方に適用し、シミュレーション結果から、本手法が最適に近い性能を達成すると同時に、平均的な制約充足を維持することを示す。
これは、提示されたケーススタディにおいて大きな累積リグレットまたは深刻な制約違反に苦しむ現在の最先端手法とは対照的である。 This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts with current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented. | 翻訳日:2023-04-14 16:26:14 公開日:2023-04-12 |
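The abstract above relies on a primal-dual scheme to drive the time-average constraint violation to zero. The Python sketch below shows only the generic primal-dual idea on a toy problem with known functions. It is not the paper's algorithm, which handles unknown black-box functions through Gaussian-process models and contextual disturbances. The toy objective, constraint, candidate grid, and step size are all assumptions chosen for illustration.

```python
def primal_dual(objective, constraint, candidates, steps, step_size):
    """Generic primal-dual iteration for: minimise f(x) s.t. g(x) <= 0 on average.

    Returns the time-average constraint value and the final dual variable.
    """
    dual = 0.0
    total_constraint = 0.0
    for _ in range(steps):
        # Primal step: minimise the Lagrangian f(x) + dual * g(x).
        x = min(candidates, key=lambda c: objective(c) + dual * constraint(c))
        g = constraint(x)
        total_constraint += g
        # Dual step: projected gradient ascent keeps the multiplier non-negative.
        dual = max(0.0, dual + step_size * g)
    return total_constraint / steps, dual


# Toy problem (hypothetical): prefer large x, but require x <= 0.5 on average.
grid = [i / 10 for i in range(11)]
average_g, final_dual = primal_dual(
    objective=lambda x: -x,
    constraint=lambda x: x - 0.5,
    candidates=grid,
    steps=1000,
    step_size=0.1,
)
```

While the projection at zero is inactive, the dual variable accumulates the step size times the sum of constraint values, so a bounded multiplier implies an average violation that shrinks like 1/T. That is the intuition behind the zero time-average violation claim, stated here informally.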
# 拡散MRIにおける球デコンボリューションのための$E(3) \times SO(3)$-Equivariant Networks $E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI ( http://arxiv.org/abs/2304.06103v1 ) ライセンス: Link先を確認 | Axel Elaldi, Guido Gerig, Neel Dey | (参考訳) 本稿では,各ボクセルが球面信号を含むボリュームのスパースデコンボリューションのための$E(3)\times SO(3)$同変フレームワークであるRoto-Translation Equivariant Spherical Deconvolution (RT-ESD)を提案する。
その結果、RT-ESDは、DiSCoデータセット上のファイバリカバリ、現実世界の in vivo 人間の脳のdMRIにおけるデコンボリューション由来の部分体積推定、トラクトメーターデータセット上のファイバトラクトグラムの下流再構成の改善など、いくつかのタスクにまたがる以前の作業を改善した。
私たちの実装はhttps://github.com/AxelElaldi/e3so3_convで利用可能です。 We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect the symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv | 翻訳日:2023-04-14 16:25:56 公開日:2023-04-12 |
# BarrierNetを用いた信号時相論理仕様からのロバストで正しいコントローラの学習 Learning Robust and Correct Controllers from Signal Temporal Logic Specifications Using BarrierNet ( http://arxiv.org/abs/2304.06160v1 ) ライセンス: Link先を確認 | Wenliang Liu, Wei Xiao, Calin Belta | (参考訳) 本稿では,信号時相論理(STL)仕様を満たすことが求められるシステムのためのニューラルネットワークコントローラを学習する問題を考察する。
シミュレーションの結果,提案手法は仕様の充足を保証し,既存のアルゴリズムを上回ることが示された。 In this paper, we consider the problem of learning a neural network controller for a system required to satisfy a Signal Temporal Logic (STL) specification. We exploit STL quantitative semantics to define a notion of robust satisfaction. Guaranteeing the correctness of a neural network controller, i.e., ensuring the satisfaction of the specification by the controlled system, is a difficult problem that received a lot of attention recently. We provide a general procedure to construct a set of trainable High Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas in a fragment of STL. We use the BarrierNet, implemented by a differentiable Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural network controller, to guarantee the satisfaction of the STL formulas. We train the HOCBFs together with other neural network parameters to further improve the robustness of the controller. Simulation results demonstrate that our approach ensures satisfaction and outperforms existing algorithms. | 翻訳日:2023-04-14 16:20:01 公開日:2023-04-12 |
# コンフォーマル予測のためのポストセレクション推論:精度のためにカバレッジをトレードオフする Post-selection Inference for Conformal Prediction: Trading off Coverage for Precision ( http://arxiv.org/abs/2304.06158v1 ) ライセンス: Link先を確認 | Siddhaarth Sarkar, Arun Kumar Kuchibhotla | (参考訳) コンフォーマル推論は、有限サンプル保証付きでブラックボックスML予測アルゴリズムの不確実性を定量化する上で重要な役割を果たしている。
これにより、従来のコンフォーマル推論と類似した有限サンプル保証を維持しながら、任意の基準(予測セットのサイズなど)で測った予測セットの品質と引き換えに、カバレッジ確率を自由にトレードオフすることができる。 Conformal inference has played a pivotal role in providing uncertainty quantification for black-box ML prediction algorithms with finite sample guarantees. Traditionally, conformal prediction inference requires a data-independent specification of miscoverage level. In practical applications, one might want to update the miscoverage level after computing the prediction set. For example, in the context of binary classification, the analyst might start with $95\%$ prediction sets and see that most prediction sets contain all outcome classes. Prediction sets with both classes being undesirable, the analyst might desire to consider, say, an $80\%$ prediction set. Construction of prediction sets that guarantee coverage with data-dependent miscoverage level can be considered as a post-selection inference problem. In this work, we develop uniform conformal inference with finite sample prediction guarantee with arbitrary data-dependent miscoverage levels using distribution-free confidence bands for distribution functions. This allows practitioners to trade freely coverage probability for the quality of the prediction set by any criterion of their choice (say size of prediction set) while maintaining the finite sample guarantees similar to traditional conformal inference. | 翻訳日:2023-04-14 16:19:46 published:2023-04-12 |
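For context on the trade-off described above, the following Python sketch shows the traditional split conformal construction, where the miscoverage level `alpha` is fixed before looking at the data. It is not the paper's uniform (post-selection) method. The calibration residuals and the point prediction are hypothetical values made up for this example.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    With exchangeable data, an interval built from this quantile covers
    a new response with probability at least 1 - alpha, provided that
    alpha was chosen independently of the data.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, scores, alpha):
    """Symmetric interval around a point prediction using absolute residuals."""
    q = conformal_quantile(scores, alpha)
    return (point_prediction - q, point_prediction + q)


# Hypothetical absolute residuals |y - f(x)| on a held-out calibration set.
calibration_scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
wide = prediction_interval(2.0, calibration_scores, alpha=0.1)
narrow = prediction_interval(2.0, calibration_scores, alpha=0.5)
```

Lowering the coverage target from 90% to 50% shrinks the interval. If the analyst picks `alpha` after inspecting intervals like these, the standard guarantee no longer applies, which is the gap the abstract says it addresses.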
# Ba$_6$Cr$_2$S$_{10}$化合物の二量化、電子構造および磁気的性質:第一原理研究 Dimerisation, electronic structure, and magnetic properties in Ba$_6$Cr$_2$S$_{10}$ compounds: First principles studies ( http://arxiv.org/abs/2304.06156v1 ) ライセンス: Link先を確認 | Jianfeng Zhang, Hunching Yang, and Wei Wu | (参考訳) 準一次元系は、非常に豊かで興味深い物理学を示すことができるので興味深い。
ここでは、[Zhang, et al, Adv. Mat. 34 (12), 2106728 (2022)]に示す磁気構造と特性の実験結果と一致するBa$_6$Cr$_2$S$_{10}$の電子構造と磁気特性を計算するための第一原理計算を行った。
(iii) スピン鎖に沿った次近接反強磁性相互作用は、スピンフラストレーション、ひいてはスピン液体をもたらす可能性がある。 Quasi-one-dimensional systems are fascinating as they can exhibit very rich and interesting physics. The spin chain compound Ba$_6$Cr$_2$S$_{10}$ has been synthesised experimentally under extreme conditions recently, which has shown interesting magnetic and toroidal properties due to dimerisation. Here we have performed first principles calculations to compute the electronic structure and magnetic properties of Ba$_6$Cr$_2$S$_{10}$, which are consistent with the experimental results for the magnetic structure and properties shown in [Zhang, et al, Adv. Mat. 34 (12), 2106728 (2022)]. Moreover, based on our calculations, we can find more interesting physics, including (i) the small size of the Hubbard $U$ parameter that implies the screening effect of surrounding Ba atoms, (ii) the dimerisation of Cr atoms mainly induced by the sulfur ligands, and (iii) the next-nearest-neighbouring anti-ferromagnetic interaction along the spin chain, which could bring forward spin frustration, thus spin liquid. | 翻訳日:2023-04-14 16:19:28 公開日:2023-04-12 |
# 偽の科学的要約の検出 Detection of Fake Generated Scientific Abstracts ( http://arxiv.org/abs/2304.06148v1 ) ライセンス: Link先を確認 | Panagiotis C. Theocharopoulos, Panagiotis Anagnostou, Anastasia Tsoukala, Spiros V. Georgakopoulos, Sotiris K. Tasoulis and Vassilis P. Plagianakos | (参考訳) 大規模言語モデルと公開可能なChatGPTの普及は、人工知能を人々の日常生活に組み込む上で、大きな転換点となっている。
この研究を通じて、人工知能が生成するテキストの能力と限界に光を当てた。 The widespread adoption of Large Language Models and publicly available ChatGPT has marked a significant turning point in the integration of Artificial Intelligence into people's everyday lives. The academic community has taken notice of these technological advancements and has expressed concerns regarding the difficulty of discriminating between what is real and what is artificially generated. Thus, researchers have been working on developing effective systems to identify machine-generated text. In this study, we utilize the GPT-3 model to generate scientific paper abstracts through Artificial Intelligence and explore various text representation methods when combined with Machine Learning models with the aim of identifying machine-written text. We analyze the models' performance and address several research questions that arise during the analysis of the results. By conducting this research, we shed light on the capabilities and limitations of Artificial Intelligence generated text. | 翻訳日:2023-04-14 16:19:09 公開日:2023-04-12 |
# 量子ハードウェアを用いた高忠実度ダイマー励起 High-fidelity dimer excitations using quantum hardware ( http://arxiv.org/abs/2304.06146v1 ) ライセンス: Link先を確認 | Norhan M. Eassa, Joe Gibbs, Zoe Holmes, Andrew Sornborger, Lukasz Cincio, Gavin Hester, Paul Kairys, Mario Motta, Jeffrey Cohn, Arnab Banerjee | (参考訳) 多体エンタングル量子スピン系は、非弾性中性子散乱(INS)実験で観測される特徴的な励起スペクトルを持つトポロジカル量子スピン液体のような創発的な現象を示す。
正準的なトロッター化法は深い回路を必要とし、長時間スケールのシミュレーションを妨げるが、本研究では浅い回路を用いた「直接」Resource-Efficient Fast-forwarding (REFF) 測定を実証し、量子ハードウェア上でより長時間のダイナミクスを捉えられることを示す。
2スピン相関係数の時間的発展は、中性子散乱断面積の重要な構成要素である動的構造因子 $S(\mathbf{Q},\omega)$ の計算を可能にした。
現在の回路ハードウェアにおける我々の結果は、コストのかかるINS実験のアウトプットをベンチマークし、あるいは予測するための重要な手段となります。 Many-body entangled quantum spin systems exhibit emergent phenomena such as topological quantum spin liquids with distinct excitation spectra accessed in inelastic neutron scattering (INS) experiments. Here we simulate the dynamics of a quantum spin dimer, the basic quantum unit of emergent many-body spin systems. While canonical Trotterization methods require deep circuits precluding long time-scale simulations, we demonstrate 'direct' Resource-Efficient Fast-forwarding (REFF) measurements with short-depth circuits that can be used to capture longer time dynamics on quantum hardware. The temporal evolution of the 2-spin correlation coefficients enabled the calculation of the dynamical structure factor $S(\mathbf{Q},\omega)$ - the key component of the neutron scattering cross-section. We simulate the triplet gap and the triplet splitting of the quantum dimer with sufficient fidelity to compare to experimental neutron data. Our results on current circuit hardware pave an important avenue to benchmark, or even predict, the outputs of the costly INS experiments. | 翻訳日:2023-04-14 16:18:57 公開日:2023-04-12 |
# R用growclustersパッケージ The growclusters Package for R ( http://arxiv.org/abs/2304.06145v1 ) ライセンス: Link先を確認 | Randall Powers, Wendy Martinez, and Terrance Savitsky | (参考訳) R用のgrowclustersパッケージは、k-meansクラスタリングの拡張版を実装しており、各データセットがクラスタ平均を単一のグローバルパーティションから引き出すようなデータセット集合に対して、ローカルなクラスタリング(パーティション)の発見を可能にする。
本稿では、growclustersパッケージの動作と機能を視覚的に説明するために設計されたR Shinyアプリケーションの作成を含む、growclustersパッケージの関数と機能の一部について述べる。 The growclusters package for R implements an enhanced version of k-means clustering that allows discovery of local clusterings or partitions for a collection of data sets that each draw their cluster means from a single, global partition. The package contains functions to estimate a partition structure for multivariate data. Estimation is performed under a penalized optimization derived from Bayesian non-parametric formulations. This paper describes some of the functions and capabilities of the growclusters package, including the creation of R Shiny applications designed to visually illustrate the operation and functionality of the growclusters package. | 翻訳日:2023-04-14 16:18:27 公開日:2023-04-12 |
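The growclusters abstract above describes an enhanced version of k-means. As a point of reference, the Python sketch below shows the plain k-means (Lloyd) iteration that such methods extend. It is not the growclusters algorithm, it does not use the package's R API, and it omits the penalized hierarchical estimation the abstract mentions. The sample points and starting centers are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Plain Lloyd's k-means on 2-D points, starting from the given centers.

    Returns the final centers and the clusters from the last assignment step.
    """
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters


# Two well-separated groups of hypothetical points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centers, final_clusters = kmeans(data, centers=[(0, 0), (10, 10)])
```

Standard k-means fits one partition to one data set. According to the abstract, growclusters instead estimates local partitions for several data sets whose cluster means are drawn from a shared global partition.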
# 編集フレンドリーなDDPMノイズ空間:インバージョンと操作 An Edit Friendly DDPM Noise Space: Inversion and Manipulations ( http://arxiv.org/abs/2304.06140v1 ) ライセンス: Link先を確認 | Inbar Huberman-Spiegelglas, Vladimir Kulikov and Tomer Michaeli | (参考訳) Denoising diffusion probabilistic models (DDPMs) は一連の白色ガウスノイズサンプルを用いて画像を生成する。
また,既存の拡散ベースの編集手法を用いて,その品質と多様性を向上させる方法を示す。 Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity. | 翻訳日:2023-04-14 16:18:12 公開日:2023-04-12 |
# 農業用AGI AGI for Agriculture ( http://arxiv.org/abs/2304.06136v1 ) ライセンス: Link先を確認 | Guoyu Lu, Sheng Li, Gengchen Mai, Jin Sun, Dajiang Zhu, Lilong Chai, Haijian Sun, Xianqiao Wang, Haixing Dai, Ninghao Liu, Rui Xu, Daniel Petti, Changying Li, Tianming Liu, Changying Li | (参考訳) 汎用人工知能(AGI, Artificial General Intelligence)は、医療、金融、交通、教育など、さまざまな分野に革命をもたらしようとしている。
農業におけるAGIの変革的ポテンシャルは巨大であり,産業に革命をもたらす可能性を強調することを目的としている。 Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry. | 翻訳日:2023-04-14 16:17:40 公開日:2023-04-12 |
# 医用画像用視覚変換器の解説評価に向けて Towards Evaluating Explanations of Vision Transformers for Medical Imaging ( http://arxiv.org/abs/2304.06133v1 ) ライセンス: Link先を確認 | Piotr Komorowski, Hubert Baniecki, Przemysław Biecek | (参考訳) 深層学習モデルが医療画像などの重要な領域に応用されるようになるにつれ、透明性と信頼性の高い意思決定の必要性が最重要となる。
Vision Transformer (ViT) は画像分類のための畳み込みニューラルネットワークに代わる有望な代替品となり、その解釈性は依然としてオープンな研究課題である。
本研究は, 医用画像診断における ViT 説明の適用性に関する知見を提供し, 比較に適切な評価基準を用いることの重要性を強調した。 As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question. This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them. | 翻訳日:2023-04-14 16:17:20 公開日:2023-04-12 |
# UniverSeg: ユニバーサル・メディカル・イメージセグメンテーション UniverSeg: Universal Medical Image Segmentation ( http://arxiv.org/abs/2304.06131v1 ) ライセンス: Link先を確認 | Victor Ion Butoi, Jose Javier Gonzalez Ortiz, Tianyu Ma, Mert R. Sabuncu, John Guttag, Adrian V. Dalca | (参考訳) 深層学習モデルは医用画像セグメンテーションの主要な方法となっているが、通常、新しい解剖学、画像のモダリティ、ラベルを含む未知のセグメンテーションタスクに一般化することができない。
UniverSegのソースコードとモデルウェイトはhttps://universeg.csail.mit.eduで無料で入手できる。 While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels. Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at https://universeg.csail.mit.edu | 翻訳日:2023-04-14 16:17:01 公開日:2023-04-12 |
# 「悪い」引用は「良い」効果を持つか? Do "bad" citations have "good" effects? ( http://arxiv.org/abs/2304.06190v1 ) ライセンス: Link先を確認 | Honglin Bao and Misha Teplitskiy | (参考訳) 科学界は一般に、研究論文の著者が自分に影響を与えなかった論文を引用することを戒めている。そのような「修辞的」な引用は、文献の質と良い研究へのインセンティブを低下させると考えられているためである。
まとめると、修辞的な引用は注目の集中を緩和し、既存のアイデアを置き換えやすくするので、それが本当に望ましくないかどうかは、望ましさの判断に用いる指標に依存する。 The scientific community generally discourages authors of research papers from citing papers that did not influence them because such "rhetorical" citations are assumed to degrade the literature and incentives for good work. Intuitively, a world where authors cite only substantively appears attractive. We argue that mandating substantive citing may have underappreciated consequences on the allocation of attention and dynamism. We develop a novel agent-based model in which agents cite substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in the reference lists with papers that support their claims, regardless of whether they were actually influential. By turning rhetorical citing on-and-off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies the effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability. | 翻訳日:2023-04-14 16:09:32 公開日:2023-04-12 |
# 初学者向けの(脱)形式化および自然な論証演習への大規模言語モデルの利用 Using large language models for (de-)formalization and natural argumentation exercises for beginner's students ( http://arxiv.org/abs/2304.06186v1 ) ライセンス: Link先を確認 | Merlin Carl | (参考訳) 大規模言語モデルであるtext-davinci-003を用いて、次の2種類の演習を自動添削する2つのシステムについて述べる。(i) 自然言語と命題論理および一階述語論理の言語との間を相互に翻訳する演習。
(ii) 非数学的なシナリオにおいて、自然言語で簡単な論証を書く演習。 We describe two systems that use text-davinci-003, a large language model, for the automatized correction of (i) exercises in translating back and forth between natural language and the languages of propositional logic and first-order predicate logic and (ii) exercises in writing simple arguments in natural language in non-mathematical scenarios. | 翻訳日:2023-04-14 16:09:06 公開日:2023-04-12 |
# LINGO : タスクの多様性を支援するための自然言語指示の視覚的デバイアス LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity ( http://arxiv.org/abs/2304.06184v1 ) ライセンス: Link先を確認 | Anjana Arunkumar, Shubham Sharma, Rakhi Agrawal, Sriram Chandrasekaran, Chris Bryan | (参考訳) クロスタスクの一般化は、自然言語理解における熟達を定義する重要な結果である。
さらに、LINGOの開発と評価で得られた知見が、複数のドメインにまたがるプロンプト作成の労力を最小化することを目的とした、将来のダッシュボードの設計にどのように役立つかについても論じる。 Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing 'bias' in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains. | 翻訳日:2023-04-14 16:09:00 公開日:2023-04-12 |
# 高忠実性RGB-D表面再構成のための動的ボクセル格子最適化 Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction ( http://arxiv.org/abs/2304.06178v1 ) ライセンス: Link先を確認 | Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu | (参考訳) マルチレゾリューションボクセルグリッド上の補間機能の直接的最適化は、MLPライクなモジュールのより効率的な代替として登場した。
提案手法は,ベースライン法であるNeuralRGBDよりもはるかに高速な計算効率を維持しつつ,合成データと実世界のデータの両方を詳細に記述した高品質な3D再構成を生成する。 Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD. | 翻訳日:2023-04-14 16:08:32 公開日:2023-04-12 |
# Visual based Tomato Size Measurement System for an Indoor Farming Environment ( http://arxiv.org/abs/2304.06177v1 ) License: see link | Andy Kweon, Vishnu Hu, Jong Yoon Lim, Trevor Gee, Edmond Liu, Henry Williams, Bruce A. MacDonald, Mahla Nejati, Inkyu Sa, and Ho Seok Ahn | As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Existing vision systems for yield estimation struggle with occlusion and scalability because they rely on camera systems that are large and expensive, which makes them unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low-cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated in a lab environment with real tomato fruits and fake leaves to simulate occlusion in the real farm environment. By addressing fruit occlusion, our three-camera system achieved a height measurement accuracy of 0.9114 and a width accuracy of 0.9443. | Translated: 2023-04-14 16:08:18 Published: 2023-04-12 |
# Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model ( http://arxiv.org/abs/2304.06174v1 ) License: see link | Chenru Duan, Yuanqi Du, Haojun Jia, and Heather J. Kulik | Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating pairs of structures, i.e., reactant, TS, and product, in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of the hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve an average error of 0.13 Å root mean square deviation compared to the true TS. With a confidence scoring model for uncertainty quantification, we approach the accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on the 14% most challenging reactions. We envision the proposed approach to be useful in constructing and pruning large reaction networks with unknown mechanisms. | Translated: 2023-04-14 16:08:05 Published: 2023-04-12 |
# Neural Network Algorithm for Intercepting Targets Moving Along Known Trajectories by a Dubins' Car ( http://arxiv.org/abs/2304.06169v1 ) License: see link | Ivan Nasonov and Andrey Galyaev and Andrey Medvedev | The task of intercepting a target moving along a rectilinear or circular trajectory by a Dubins' car is formulated as a time-optimal control problem with an arbitrary direction of the car's velocity at the interception moment. To solve this problem and to synthesize interception trajectories, neural network methods of unsupervised learning based on the Deep Deterministic Policy Gradient algorithm are used. The obtained control laws and interception trajectories are analyzed in comparison with the analytical solutions of the interception problem. Mathematical modeling is carried out for target movement parameters that the neural network had not seen during training. Model experiments are conducted to test the stability of the neural solution. The effectiveness of using neural network methods for the synthesis of interception trajectories for given classes of target movements is shown. | Translated: 2023-04-14 16:07:51 Published: 2023-04-12 |
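For readers unfamiliar with the vehicle model named in the abstract above, the following is a minimal sketch of Dubins-car kinematics, assuming the standard textbook form (constant forward speed, bounded turn rate) and plain forward-Euler integration. The function name `simulate_dubins`, the step size, and the constant control sequences are illustrative assumptions, not taken from the paper, and the sketch does not implement the DDPG-based interception policy.

```python
import math

def simulate_dubins(x, y, theta, controls, speed=1.0, max_turn_rate=1.0, dt=0.01):
    """Integrate Dubins-car kinematics with forward Euler (illustrative only).

    State: position (x, y) and heading theta in radians.
    Each control u is a turn rate, clipped to [-max_turn_rate, max_turn_rate].
    """
    path = [(x, y, theta)]
    for u in controls:
        u = max(-max_turn_rate, min(max_turn_rate, u))  # bounded turn rate
        x += speed * math.cos(theta) * dt               # constant forward speed
        y += speed * math.sin(theta) * dt
        theta += u * dt
        path.append((x, y, theta))
    return path

# Hypothetical control sequences: drive straight, or turn left at the maximum rate.
straight = simulate_dubins(0.0, 0.0, 0.0, [0.0] * 100)
left_turn = simulate_dubins(0.0, 0.0, 0.0, [1.0] * 100)
print(straight[-1])
print(left_turn[-1])
```

A learned policy would replace the fixed control lists with turn rates chosen from the current car and target states; the clipping step is what makes the problem a bounded-curvature one.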
# NP-Free: A Real-Time Normalization-free and Parameter-tuning-free Representation Approach for Open-ended Time Series ( http://arxiv.org/abs/2304.06168v1 ) License: see link | Ming-Chang Lee, Jia-Chun Lin, and Volker Stolz | As more connected devices are implemented in a cyber-physical world and data is expected to be collected and processed in real time, the ability to handle time series data has become increasingly significant. To help analyze time series in data mining applications, many time series representation approaches have been proposed to convert a raw time series into another series for representing the original time series. However, existing approaches are not designed for open-ended time series (which is a sequence of data points being continuously collected at a fixed interval without any length limit) because these approaches need to know the total length of the target time series in advance and pre-process the entire time series using normalization methods. Furthermore, many representation approaches require users to configure and tune some parameters beforehand in order to achieve satisfactory representation results. In this paper, we propose NP-Free, a real-time Normalization-free and Parameter-tuning-free representation approach for open-ended time series. Without needing to use any normalization method or tune any parameter, NP-Free can generate a representation for a raw time series on the fly by converting each data point of the time series into a root-mean-square error (RMSE) value based on Long Short-Term Memory (LSTM) and a Look-Back and Predict-Forward strategy. To demonstrate the capability of NP-Free in representing time series, we conducted several experiments based on real-world open-source time series datasets. We also evaluated the time consumption of NP-Free in generating representations. | Translated: 2023-04-14 16:07:37 Published: 2023-04-12 |
# Time dependent Markovian master equation beyond the adiabatic limit ( http://arxiv.org/abs/2304.06166v1 ) License: see link | Giovanni Di Meglio, Martin B. Plenio, Susana F. Huelga | We develop a Markovian master equation that models the evolution of systems subject to arbitrary driving and control fields. Our approach combines time rescaling and weak-coupling limits for the system-environment interaction with a secular approximation. The derivation makes use of the adiabatic time evolution operator in a manner that allows for the efficient description of strong driving, while recovering the adiabatic master equation in the appropriate limit. To illustrate the effectiveness of our approach, we apply it to the paradigmatic case of a two-level (qubit) system subjected to a form of periodic driving that remains unsolvable using a Floquet representation. We demonstrate the reliability and broad scope of our approach by benchmarking the solutions of the derived reduced time evolution against numerically exact simulations using tensor networks. Our results provide rigorous conditions that must be satisfied by phenomenological master equations for driven systems that do not rely on first principles derivations. | Translated: 2023-04-14 16:07:07 Published: 2023-04-12 |
# Integrating planar circuits with superconducting 3D microwave cavities using tunable low-loss couplers ( http://arxiv.org/abs/2304.06162v1 ) License: see link | Ziyi Zhao, Eva Gurra, Eric I. Rosenthal, Leila R. Vale, Gene C. Hilton, K. W. Lehnert | We design and test a low-loss interface between superconducting 3-dimensional microwave cavities and 2-dimensional circuits, where the coupling rate is highly tunable. This interface seamlessly integrates a magnetic antenna and a Josephson junction based coupling element with a cavity, and we demonstrate that the introduced loss from this integration only limits the quality factor to 4.5 million. The cavity external coupling rate can then be tuned from negligibly small to over 3 orders of magnitude larger than the internal loss rate with a characteristic time of 3.2 ns. This switching speed does not impose additional limits on the coupling rate because it is much faster than the coupling rate. Moreover, the coupler can be controlled by baseband signals to avoid interference with microwave signals near the cavity or qubit frequencies. Finally, the coupling element introduces a 0.04 Hz/photon self-Kerr nonlinearity to the cavity, remaining linear in high photon number operations. | Translated: 2023-04-14 16:06:50 Published: 2023-04-12 |
# SiLK -- Simple Learned Keypoints ( http://arxiv.org/abs/2304.06194v1 ) License: see link | Pierre Gleize, Weiyao Wang, Matt Feiszli | Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry. Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have been used for decades; more recently, there has been a trend to introduce learning in an attempt to improve keypoint detectors. On inspection, however, the results are difficult to interpret; recent learning-based methods employ a vast diversity of experimental setups and design choices: empirical results are often reported using different backbones, protocols, datasets, types of supervision or tasks. Since these differences are often coupled together, this raises a natural question of what makes a good learned keypoint detector. In this work, we revisit the design of existing keypoint detectors by deconstructing their methodologies and identifying the key components. We re-design each component from first principles and propose Simple Learned Keypoints (SiLK), which is fully differentiable, lightweight, and flexible. Despite its simplicity, SiLK achieves new state-of-the-art results on Detection Repeatability and Homography Estimation tasks on HPatches and on the 3D Point-Cloud Registration task on ScanNet, and achieves performance competitive with the state of the art on camera pose estimation in the 2022 Image Matching Challenge and ScanNet. | Translated: 2023-04-14 15:58:25 Published: 2023-04-12 |
# Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems ( http://arxiv.org/abs/2304.06193v1 ) License: see link | Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester | This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances. | Translated: 2023-04-14 15:58:00 Published: 2023-04-12 |
# ML-Enabled Outdoor User Positioning in 5G NR Systems via Uplink SRS Channel Estimates ( http://arxiv.org/abs/2304.06514v1 ) License: see link | Andre Ráth, Dino Pjanić, Bo Bernhardsson and Fredrik Tufvesson | Cellular user positioning is a promising service provided by Fifth Generation New Radio (5G NR) networks. In addition, Machine Learning (ML) techniques are foreseen to become an integrated part of 5G NR systems, improving radio performance and reducing complexity. In this paper, we investigate ML techniques for positioning using 5G NR fingerprints consisting of uplink channel estimates from the physical layer channel. We show that it is possible to use Sounding Reference Signals (SRS) channel fingerprints to provide sufficient data to infer user position. Furthermore, we show that small fully-connected moderately Deep Neural Networks, even when applied to very sparse SRS data, can achieve successful outdoor user positioning with meter-level accuracy in a commercial 5G environment. | Translated: 2023-04-14 14:25:57 Published: 2023-04-12 |
# Hardware Acceleration of Neural Graphics ( http://arxiv.org/abs/2303.05735v6 ) License: see link | Muhammad Husnain Mubarik, Ramakrishna Kanungo, Tobias Zirr and Rakesh Kumar | Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NR). NRs have recently been used to learn the geometric and the material properties of scenes and to use this information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask the question: Does neural graphics (NG) need hardware support? We studied representative NG applications, showing that if we want to render 4k resolution at 60 FPS there is a gap of 1.5X-55X in the desired performance on current GPUs. For AR/VR applications, there is an even larger gap of 2-4 orders of magnitude between the desired performance and the required system power. We identify that the input encoding and the MLP kernels are the performance bottlenecks, consuming 72%, 60% and 59% of application time for multi-resolution hashgrid, multi-resolution densegrid and low-resolution densegrid encodings, respectively. We propose an NG processing cluster (NGPC), a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the rest of the kernels by fusing them together in Vulkan, which leads to a 9.94X kernel-level performance improvement compared to an un-fused implementation of the pre-processing and the post-processing kernels. Our results show that NGPC gives up to a 58X end-to-end application-level performance improvement; for multi-resolution hashgrid encoding, the average performance benefits across the four NG applications are 12X, 20X, 33X and 39X for scaling factors of 8, 16, 32 and 64, respectively. Our results also show that with multi-resolution hashgrid encoding, NGPC enables the rendering of 4k resolution at 30 FPS for NeRF and 8k resolution at 120 FPS for all our other NG applications. | Translated: 2023-04-14 11:00:55 Published: 2023-04-12 |
# OpenAGI: When LLM Meets Domain Experts ( http://arxiv.org/abs/2304.04370v2 ) License: see link | Yingqiang Ge, Wenyue Hua, Jianchao Ji, Juntao Tan, Shuyuan Xu, Yongfeng Zhang | Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI. | Translated: 2023-04-14 10:51:34 Published: 2023-04-12 |
# PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling ( http://arxiv.org/abs/2304.04307v2 ) License: see link | Elizaveta Semenova, Max Cairney-Leeming, Seth Flaxman | In applied fields where the speed of inference and model flexibility are crucial, the use of Bayesian inference for models with a stochastic process as their prior, e.g. Gaussian processes (GPs), is ubiquitous. Recent literature has demonstrated that the computational bottleneck caused by GP priors or their finite realizations can be encoded using deep generative models such as variational autoencoders (VAEs), and the learned generators can then be used instead of the original priors during Markov chain Monte Carlo (MCMC) inference in a drop-in manner. While this approach enables fast and highly efficient inference, it loses information about the stochastic process hyperparameters, and, as a consequence, makes inference over hyperparameters impossible and the learned priors indistinct. We propose to resolve this issue and disentangle the learned priors by conditioning the VAE on stochastic process hyperparameters. This way, the hyperparameters are encoded alongside GP realisations and can be explicitly estimated at the inference stage. We believe that the new method, termed PriorCVAE, will be a useful tool among approximate inference approaches and has the potential to have a large impact on spatial and spatiotemporal inference in crucial real-life applications. Code showcasing PriorCVAE can be found on GitHub: https://github.com/elizavetasemenova/PriorCVAE | Translated: 2023-04-14 10:50:57 Published: 2023-04-12 |
# Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing ( http://arxiv.org/abs/2304.02017v5 ) License: see link | Walid Hariri | Large language models have revolutionized the field of artificial intelligence and have been used in various applications. Among these models, ChatGPT (Chat Generative Pre-trained Transformer), developed by OpenAI, stands out as a powerful tool that has been widely adopted. ChatGPT has been successfully applied in numerous areas, including chatbots, content generation, language translation, personalized recommendations, and even medical diagnosis and treatment. Its success in these applications can be attributed to its ability to generate human-like responses, understand natural language, and adapt to different contexts. Its versatility and accuracy make it a powerful tool for natural language processing (NLP). However, there are also limitations to ChatGPT, such as its tendency to produce biased responses and its potential to perpetuate harmful language patterns. This article provides a comprehensive overview of ChatGPT, its applications, advantages, and limitations. Additionally, the paper emphasizes the importance of ethical considerations when using this robust tool in real-world scenarios. Finally, this paper contributes to ongoing discussions surrounding artificial intelligence and its impact on vision and NLP domains by providing insights into prompt engineering techniques. | Translated: 2023-04-14 10:48:58 Published: 2023-04-12 |
# Deep-learning based measurement of planetary radial velocities in the presence of stellar variability ( http://arxiv.org/abs/2304.04807v2 ) License: see link | Ian Colwell, Virisha Timmaraju, Alexander Wise | We present a deep-learning based approach for measuring small planetary radial velocities in the presence of stellar variability. We use neural networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star spectra. We develop and compare dimensionality-reduction and data splitting methods, as well as various neural network architectures including single line CNNs, an ensemble of single line CNNs, and a multi-line CNN. We inject planet-like RVs into the spectra and use the network to recover them. We find that the multi-line CNN is able to recover planets with 0.2 m/s semi-amplitude, 50 day period, with 8.8% error in the amplitude and 0.7% in the period. This approach shows promise for mitigating stellar RV variability and enabling the detection of small planetary RVs with unprecedented precision. | Translated: 2023-04-14 10:40:38 Published: 2023-04-12 |
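To make the size of the injected signal in the abstract above concrete, here is a minimal sketch of a planet-like RV curve, assuming a circular orbit so that the signal is a pure sinusoid. The default semi-amplitude (0.2 m/s) and period (50 days) come from the abstract; the function name `planet_rv`, the zero phase, and the daily sampling over roughly three years are illustrative assumptions. The paper injects the signal into spectra, which would require Doppler-shifting each spectrum and is not shown here.

```python
import math

def planet_rv(t_days, semi_amplitude=0.2, period_days=50.0, phase=0.0):
    """Radial velocity in m/s of a circular-orbit planet signal at time t_days."""
    return semi_amplitude * math.sin(2.0 * math.pi * t_days / period_days + phase)

# Hypothetical sampling: one value per day for about three years.
signal = [planet_rv(float(t)) for t in range(1095)]
print(min(signal), max(signal))
```

The signal never exceeds the semi-amplitude in absolute value, which is the scale the network has to separate from stellar RV jitter.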
# HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains ( http://arxiv.org/abs/2304.04773v2 ) License: see link | Huanjing Yue, Yubo Peng, Biting Yu, Xuanwu Yin, Zhenyu Zhou, Jingyu Yang | High dynamic range (HDR) video reconstruction is attracting more and more attention due to its superior visual quality compared with that of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large-scale LDR-HDR video dataset with 85 scenes, each containing 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality, and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper. | Translated: 2023-04-14 10:40:24 Published: 2023-04-12 |
# SoK: Certified Robustness for Deep Neural Networks ( http://arxiv.org/abs/2009.04131v9 ) License: see link | Linyi Li, Tao Xie, Bo Li | Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which have brought great concerns when deploying these models to safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: a) empirical defenses, which can usually be adaptively attacked again without providing robustness certification; and b) certifiably robust approaches, which consist of robustness verification providing the lower bound of robust accuracy against any attacks under certain conditions and corresponding robust training approaches. In this paper, we systematize certifiably robust approaches and related practical and theoretical implications and findings. We also provide the first comprehensive benchmark on existing robustness verification and training approaches on different datasets. In particular, we 1) provide a taxonomy for the robustness verification and training approaches, as well as summarize the methodologies for representative algorithms, 2) reveal the characteristics, strengths, limitations, and fundamental connections among these approaches, 3) discuss current research progress, theoretical barriers, main challenges, and future directions for certifiably robust approaches for DNNs, and 4) provide an open-sourced unified platform to evaluate 20+ representative certifiably robust approaches. | Translated: 2023-04-13 20:24:06 Published: 2023-04-12 |
# Improving Certified Robustness via Statistical Learning with Logical Reasoning ( http://arxiv.org/abs/2003.00120v9 ) License: see link | Zhuolin Yang, Zhikuan Zhao, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlas, Ji Liu, Heng Guo, Ce Zhang, and Bo Li | Intensive algorithmic efforts have recently been made to enable rapid improvements of certified robustness for complex ML models. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper, we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state of the art. | Translated: 2023-04-13 20:23:30 Published: 2023-04-12 |
# Aesthetics and neural network image representations ( http://arxiv.org/abs/2109.08103v2 ) License: see link | Romuald A. Janik | We analyze the spaces of images encoded by generative neural networks of the BigGAN architecture. We find that generic multiplicative perturbations of neural network parameters away from the photo-realistic point often lead to networks generating images which appear as "artistic renditions" of the corresponding objects. This demonstrates an emergence of aesthetic properties directly from the structure of the photo-realistic visual environment as encoded in its neural network parametrization. Moreover, modifying a deep semantic part of the neural network leads to the appearance of symbolic visual representations. None of the considered networks had any access to images of human-made art. | Translated: 2023-04-13 19:42:11 Published: 2023-04-12 |
# Asymptotic bias of inexact Markov Chain Monte Carlo methods in high dimension ( http://arxiv.org/abs/2108.00682v2 ) License: see link | Alain Oliviero Durmus and Andreas Eberle | Inexact Markov Chain Monte Carlo methods rely on Markov chains that do not exactly preserve the target distribution. Examples include the unadjusted Langevin algorithm (ULA) and unadjusted Hamiltonian Monte Carlo (uHMC). This paper establishes bounds on Wasserstein distances between the invariant probability measures of inexact MCMC methods and their target distributions with a focus on understanding the precise dependence of this asymptotic bias on both dimension and discretization step size. Assuming Wasserstein bounds on the convergence to equilibrium of either the exact or the approximate dynamics, we show that for both ULA and uHMC, the asymptotic bias depends on key quantities related to the target distribution or the stationary probability measure of the scheme. As a corollary, we conclude that for models with a limited amount of interactions such as mean-field models, finite range graphical models, and perturbations thereof, the asymptotic bias has a similar dependence on the step size and the dimension as for product measures. | Translated: 2023-04-13 19:42:02 Published: 2023-04-12 |
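The notion of asymptotic bias in the abstract above can be illustrated with a minimal sketch of the unadjusted Langevin algorithm, assuming a one-dimensional standard normal target chosen purely for illustration. For this target the ULA update is a linear recursion, so its stationary variance has the closed form 1 / (1 - h/2) for step size 0 < h < 2, which differs from the target variance of 1. The function names, step size, chain length, and seed are assumptions made here; the paper's Wasserstein bounds in high dimension are not reproduced.

```python
import math
import random

def ula_chain(grad_log_density, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm in one dimension (illustrative only).

    Update: x_{k+1} = x_k + step * grad_log_density(x_k) + sqrt(2 * step) * N(0, 1).
    No Metropolis accept/reject step is applied, so the invariant law of the
    chain generally differs from the target distribution.
    """
    x = x0
    samples = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x + step * grad_log_density(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def ula_gaussian_variance(step):
    """Stationary variance of ULA for a standard normal target, for 0 < step < 2."""
    return 1.0 / (1.0 - step / 2.0)

rng = random.Random(0)
step = 0.5
# Standard normal target: the gradient of its log density is -x.
samples = ula_chain(lambda x: -x, 0.0, step, 200_000, rng)
mean = sum(samples) / len(samples)
empirical_var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(empirical_var, ula_gaussian_variance(step))
```

With this deliberately large step size the chain settles near variance 4/3 rather than 1; shrinking the step shrinks the gap, which is the step-size dependence the paper quantifies more generally.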
# 量子非対称性とノイズマルチモード干渉法 Quantum asymmetry and noisy multi-mode interferometry ( http://arxiv.org/abs/2107.11057v2 ) ライセンス: Link先を確認 | Francesco Albarelli, Mateusz Mazelanik, Micha{\l} Lipka, Alexander Streltsov, Micha{\l} Parniak, Rafal Demkowicz-Dobrzanski | (参考訳) 量子非対称性(quantum asymmetry)は、干渉実験における位相符号化を担う発電機の固有空間間のコヒーレンス量と一致する物理資源である。
縮退部分空間内のコヒーレンスが\emph{減少}した結果として非対称性が\emph{増大}しうるという、一見直観に反する振る舞いを強調する。
最後に, 絡み合い資源理論における効果の類似性も確立する。 Quantum asymmetry is a physical resource which coincides with the amount of coherence between the eigenspaces of a generator responsible for phase encoding in interferometric experiments. We highlight an apparently counter-intuitive behavior that the asymmetry may \emph{increase} as a result of a \emph{decrease} of coherence inside a degenerate subspace. We intuitively explain and illustrate the phenomena by performing a three-mode single-photon interferometric experiment, where one arm carries the signal and two noisy reference arms have fluctuating phases. We show that the source of the observed sensitivity improvement is the reduction of correlations between these fluctuations and comment on the impact of the effect when moving from the single-photon quantum level to the classical regime. Finally, we also establish the analogy of the effect in the case of entanglement resource theory. | 翻訳日:2023-04-13 19:41:45 公開日:2023-04-12 |
# ネットワーク学習 - ネットワークにおける分散トレーニングと推論 In-Network Learning: Distributed Training and Inference in Networks ( http://arxiv.org/abs/2107.03433v3 ) ライセンス: Link先を確認 | Matei Moldoveanu, Abdellatif Zaidi | (参考訳) 現代の機械学習技術をモバイルデバイスやワイヤレスネットワークに活用することで、重要な新しいサービスを実現する可能性があると広く認識されている。
また、一般的な無線アクセスにおけるニューラルネットワークを用いた実装の側面についても論じ、最先端技術に対するメリットを示す実験を行う。 It is widely perceived that leveraging the success of modern machine learning techniques to mobile devices and wireless networks has the potential of enabling important new services. This, however, poses significant challenges, essentially due to that both data and processing power are highly distributed in a wireless network. In this paper, we develop a learning algorithm and an architecture that make use of multiple data streams and processing units, not only during the training phase but also during the inference phase. In particular, the analysis reveals how inference propagates and fuses across a network. We study the design criterion of our proposed method and its bandwidth requirements. Also, we discuss implementation aspects using neural networks in typical wireless radio access; and provide experiments that illustrate benefits over state-of-the-art techniques. | 翻訳日:2023-04-13 19:41:27 公開日:2023-04-12 |
# GitTables:リレーショナルテーブルの大規模コーパス GitTables: A Large-Scale Corpus of Relational Tables ( http://arxiv.org/abs/2106.07258v5 ) ライセンス: Link先を確認 | Madelon Hulsebos, \c{C}a\u{g}atay Demiralp, Paul Groth | (参考訳) ディープラーニングの成功は、大規模なテーブルコーパスで訓練されたテーブル表現モデルを用いて、データ準備や検索といったリレーショナルテーブルタスクの改善への関心を喚起した。
コーパスとコードはhttps://gittables.github.ioで利用可能です。 The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io. | 翻訳日:2023-04-13 19:41:16 公開日:2023-04-12 |
# Vec2GC - テキスト表現のためのグラフベースのクラスタリング手法 Vec2GC -- A Graph Based Clustering Method for Text Representations ( http://arxiv.org/abs/2104.09439v2 ) ライセンス: Link先を確認 | Rajesh N Rao, Manojit Chakraborty | (参考訳) ラベル付きデータに制限があるNLPパイプラインは、ドキュメント処理の教師なし手法に依存している。
本稿では,新たなクラスタリングアルゴリズムであるVec2GC(Vector to Graph Communities)を導入する。
Vec2GCクラスタリングアルゴリズムは密度ベースのアプローチであり、階層的クラスタリングもサポートする。 NLP pipelines with limited or no labeled data, rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. Vec2GC clustering algorithm is a density based approach, that supports hierarchical clustering as well. | 翻訳日:2023-04-13 19:40:55 公開日:2023-04-12 |
# オフライン強化学習における性能向上のためのエキスパート誘導対称性検出によるデータ拡張 Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning ( http://arxiv.org/abs/2112.09943v3 ) ライセンス: Link先を確認 | Giorgio Angelotti, Nicolas Drougard, Caroline P. C. Chanel | (参考訳) マルコフ決定過程(MDP)の動的モデルのオフライン推定は、学習フェーズで利用可能なデータに大きく依存する非自明なタスクである。
近年の研究では,Deep Neural Network based Normalizing Flows として密度推定手法に依存する専門家誘導パイプラインが,分類的・連続的評価の両面で決定論的環境において,この構造を効果的に検出することを示した。
2) 学習したMDPを解き, 実環境に最適化されたポリシーを適用すると, 前者の結果が性能改善につながることを示す。 Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task that greatly depends on the data available in the learning phase. Sometimes the dynamics of the model is invariant with respect to some transformations of the current state and action. Recent works showed that an expert-guided pipeline relying on Density Estimation methods as Deep Neural Network based Normalizing Flows effectively detects this structure in deterministic environments, both categorical and continuous-valued. The acquired knowledge can be exploited to augment the original data set, leading eventually to a reduction in the distributional shift between the true and the learned model. Such data augmentation technique can be exploited as a preliminary process to be executed before adopting an Offline Reinforcement Learning architecture, increasing its performance. In this work we extend the paradigm to also tackle non-deterministic MDPs, in particular, 1) we propose a detection threshold in categorical environments based on statistical distances, and 2) we show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment. | 翻訳日:2023-04-13 19:33:42 公開日:2023-04-12 |
# 2つの射影ビューに対する臨界構成 : 新しいアプローチ Critical configurations for two projective views, a new approach ( http://arxiv.org/abs/2112.05074v3 ) ライセンス: Link先を確認 | Martin Br{\aa}telund | (参考訳) 動きからの構造問題は、物体の3次元構造を2次元画像の集合から復元することに関わる。
また, ユニークな再建が不可能な場合の異なる復元との関係についても述べる。 The problem of structure from motion is concerned with recovering 3-dimensional structure of an object from a set of 2-dimensional images. Generally, all information can be uniquely recovered if enough images and image points are provided, but there are certain cases where unique recovery is impossible; these are called critical configurations. In this paper we use an algebraic approach to study the critical configurations for two projective cameras. We show that all critical configurations lie on quadric surfaces, and classify exactly which quadrics constitute a critical configuration. The paper also describes the relation between the different reconstructions when unique reconstruction is impossible. | 翻訳日:2023-04-13 19:33:23 公開日:2023-04-12 |
# ゼロショット転送学習のための複合スケーリング Combined Scaling for Zero-shot Transfer Learning ( http://arxiv.org/abs/2111.10050v3 ) ライセンス: Link先を確認 | Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le | (参考訳) 我々は,ImageNet ILSVRC-2012バリデーションセットにおいて,ラベル付きImageNet例から学習することなく85.7%のトップ1の精度を実現する,BASICという組み合わせスケーリング手法を提案する。
例えば、ImageNet-{A,R,V2,Sketch} や ObjectNet のような自然な分布シフトを持つ5つのテストセットにおいて、我々のモデルは84.3%のTop-1平均精度を達成する。
そこで我々は,BASICのような画像テキストモデルに対して,大きなコントラストバッチサイズがより小さい一般化ギャップをもたらすことを示す理論的枠組みを開発した。 We present a combined scaling method - named BASIC - that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses best published similar models - CLIP and ALIGN - by 9.3%. Our BASIC model also shows significant improvements in robustness benchmarks. For instance, on 5 test sets with natural distribution shifts such as ImageNet-{A,R,V2,Sketch} and ObjectNet, our model achieves 84.3% top-1 average accuracy, only a small drop from its original ImageNet accuracy. To achieve these results, we scale up the contrastive learning framework of CLIP and ALIGN in three dimensions: data size, model size, and batch size. Our dataset has 6.6B noisy image-text pairs, which is 4x larger than ALIGN, and 16x larger than CLIP. Our largest model has 3B weights, which is 3.75x larger in parameters and 8x larger in FLOPs than ALIGN and CLIP. Finally, our batch size is 65536 which is 2x more than CLIP and 4x more than ALIGN. We encountered two main challenges with the scaling rules of BASIC. First, the main challenge with implementing the combined scaling rules of BASIC is the limited memory of accelerators, such as GPUs and TPUs. To overcome the memory limit, we propose two simple methods which make use of gradient checkpointing and model parallelism. Second, while increasing the dataset size and the model size has been the defacto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well-understood. To shed light on the benefits of large contrastive batch sizes, we develop a theoretical framework which shows that larger contrastive batch sizes lead to smaller generalization gaps for image-text models such as BASIC. | 翻訳日:2023-04-13 19:32:43 公開日:2023-04-12 |
# Multi-Glimpse Network: 繰り返しダウンサンプル注意に基づくロバストかつ効率的な分類アーキテクチャ Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention ( http://arxiv.org/abs/2111.02018v2 ) ライセンス: Link先を確認 | Sia Huat Tan, Runpei Dong, Kaisheng Ma | (参考訳) ほとんどのフィードフォワード畳み込みニューラルネットワークは、各ピクセルに対してほぼ同じ労力を費やす。
ImageNet100における実験は、単一のフィードフォワード方式を改善する上で、繰り返しダウンサンプリングされた注意機構の可能性を実証するものである。
私たちのコードはhttps://github.com/siahuat0727/mgnetで利用可能です。 Most feedforward convolutional neural networks spend roughly the same efforts for each pixel. Yet human visual recognition is an interaction between eye movements and spatial attention, which we will have several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet) which aims to tackle the challenges of high computation and the lack of robustness based on recurrent downsampled attention mechanism. Specifically, MGNet sequentially selects task-relevant regions of an image to focus on and then adaptively combines all collected information for the final prediction. MGNet expresses strong resistance against adversarial attacks and common corruptions with less computation. Also, MGNet is inherently more interpretable as it explicitly informs us where it focuses during each iteration. Our experiments on ImageNet100 demonstrate the potential of recurrent downsampled attention mechanisms to improve a single feedforward manner. For example, MGNet improves 4.76% accuracy on average in common corruptions with only 36.9% computational cost. Moreover, while the baseline incurs an accuracy drop to 7.6%, MGNet manages to maintain 44.2% accuracy in the same PGD attack strength with ResNet-50 backbone. Our code is available at https://github.com/siahuat0727/MGNet. | 翻訳日:2023-04-13 19:32:06 公開日:2023-04-12 |
# 重複するユーザやコンテキストを伴わないレビューベースのドメイン・ディスタングル Review-Based Domain Disentanglement without Duplicate Users or Contexts for Cross-Domain Recommendation ( http://arxiv.org/abs/2110.12648v3 ) ライセンス: Link先を確認 | Yoonhyuk Choi, Jiho Choi, Taewook Ko, Hyungho Byun, Chong-Kwon Kim | (参考訳) ドメイン横断のレコメンデーションは、データスパーシリティとコールドスタート問題を解決する上で有望な結果を示している。
集約的な実験とアブレーション研究により、我々の手法は最先端の単ドメインおよびクロスドメインレコメンデーション手法と比較して効率的で堅牢でスケーラブルであることが示された。 A cross-domain recommendation has shown promising results in solving data-sparsity and cold-start problems. Despite such progress, existing methods focus on domain-shareable information (overlapped users or same contexts) for a knowledge transfer, and they fail to generalize well without such requirements. To deal with these problems, we suggest utilizing review texts that are general to most e-commerce systems. Our model (named SER) uses three text analysis modules, guided by a single domain discriminator for disentangled representation learning. Here, we suggest a novel optimization strategy that can enhance the quality of domain disentanglement, and also debilitates detrimental information of a source domain. Also, we extend the encoding network from a single to multiple domains, which has proven to be powerful for review-based recommender systems. Extensive experiments and ablation studies demonstrate that our method is efficient, robust, and scalable compared to the state-of-the-art single and cross-domain recommendation methods. | 翻訳日:2023-04-13 19:31:32 公開日:2023-04-12 |
# 曲率アウェアデリバティブフリー最適化 Curvature-Aware Derivative-Free Optimization ( http://arxiv.org/abs/2109.13391v2 ) ライセンス: Link先を確認 | Bumsu Kim, HanQin Cai, Daniel McKenzie, Wotao Yin | (参考訳) 本稿では、勾配や方向微分へのアクセスを伴わない関数の最小化を伴う微分自由最適化(DFO)について論じる。
提案手法はCurvature-Aware Random Search (CARS) と呼ばれ, 1階と2階の差分近似を用いて候補の$\alpha_{+}$を計算する。
強凸な目的関数に対しては、探索方向が極めて緩い条件を満たす分布から引き出される限り、CARSが線形収束することを証明する。
また、CARS の立方正規化変種である CARS-CR も、強い凸性の仮定なしで$\mathcal{O}(k^{-1})$ の速度で収束する。
数値実験により、CARSとCARS-CRは、ベンチマーク問題セットの最先端と一致するか、あるいは超えることを示した。 The paper discusses derivative-free optimization (DFO), which involves minimizing a function without access to gradients or directional derivatives, only function evaluations. Classical DFO methods, which mimic gradient-based methods, such as Nelder-Mead and direct search have limited scalability for high-dimensional problems. Zeroth-order methods have been gaining popularity due to the demands of large-scale machine learning applications, and the paper focuses on the selection of the step size $\alpha_k$ in these methods. The proposed approach, called Curvature-Aware Random Search (CARS), uses first- and second-order finite difference approximations to compute a candidate $\alpha_{+}$. We prove that for strongly convex objective functions, CARS converges linearly provided that the search direction is drawn from a distribution satisfying very mild conditions. We also present a Cubic Regularized variant of CARS, named CARS-CR, which converges in a rate of $\mathcal{O}(k^{-1})$ without the assumption of strong convexity. Numerical experiments show that CARS and CARS-CR match or exceed the state-of-the-arts on benchmark problem sets. | 翻訳日:2023-04-13 19:30:32 公開日:2023-04-12 |
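The abstract above says that CARS uses first- and second-order finite differences along a random direction to compute a candidate step. The following is a rough sketch of that idea only, under stated assumptions: the function name `cars_like_step`, the sampling radius `r`, the choice of a unit Gaussian direction, and the rule of keeping the best of the evaluated points are all illustrative choices and may differ from the authors' algorithm.

```python
import math
import random


def cars_like_step(f, x, r, rng):
    """One curvature-aware random-search step (illustrative, not the paper's exact rule)."""
    # Draw a random unit direction.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]

    fx = f(x)
    x_plus = [xi + r * ui for xi, ui in zip(x, u)]
    x_minus = [xi - r * ui for xi, ui in zip(x, u)]
    f_plus, f_minus = f(x_plus), f(x_minus)

    d1 = (f_plus - f_minus) / (2.0 * r)             # directional derivative estimate
    d2 = (f_plus - 2.0 * fx + f_minus) / (r * r)    # directional curvature estimate

    # Keep every point already evaluated, so the step can never increase f.
    candidates = [(fx, x), (f_plus, x_plus), (f_minus, x_minus)]
    if d2 > 0.0:
        alpha = d1 / d2  # Newton-like step size along u
        x_new = [xi - alpha * ui for xi, ui in zip(x, u)]
        candidates.append((f(x_new), x_new))
    return min(candidates, key=lambda c: c[0])[1]


def quadratic(x):
    """Strongly convex test function (an assumed toy objective)."""
    return x[0] ** 2 + 2.0 * x[1] ** 2 + 5.0 * x[2] ** 2


rng = random.Random(1)
x = [3.0, -2.0, 1.0]
history = [quadratic(x)]
for _ in range(1000):
    x = cars_like_step(quadratic, x, 1e-2, rng)
    history.append(quadratic(x))
print(history[0], history[-1])
```

On a quadratic the two finite differences are exact, so each step is an exact line minimization along the sampled direction; on general functions the estimates are only approximate, which is presumably where the paper's safeguards and analysis come in.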
# 最近のFew-Shotオブジェクト検出アルゴリズム:性能比較による調査 Recent Few-Shot Object Detection Algorithms: A Survey with Performance Comparison ( http://arxiv.org/abs/2203.14205v2 ) ライセンス: Link先を確認 | Tianying Liu, Lu Zhang, Yang Wang, Jihong Guan, Yanwei Fu, Jiajia Zhao, Shuigeng Zhou | (参考訳) ジェネリックオブジェクト検出(GOD)タスクは、いくつかの一般的なクラスからの注釈付きトレーニングサンプルの雪崩によってトレーニングされた、最近のディープニューラルネットワークによってうまく取り組まれている。
この目的のために、Few-Shot Object Detection (FSOD) は、人間の学習能力を模倣し、学習対象の知識を共通のヘビーテールから新しいロングテールオブジェクトクラスにインテリジェントに伝達するものとして、最近話題となっている。
これらのFSOD研究を概観するものとして、微調整/転移学習手法とメタラーニング手法のグループとして体系的に調査・比較した、洞察に富むサーベイ論文がいくつか存在する [58, 59, 74, 78]。
最後に,パフォーマンス,課題,今後の方向性に関するさらなる議論を行う。 The generic object detection (GOD) task has been successfully tackled by recent deep neural networks, trained by an avalanche of annotated training samples from some common classes. However, it is still non-trivial to generalize these object detectors to the novel long-tailed object classes, which have only few labeled training samples. To this end, the Few-Shot Object Detection (FSOD) has been topical recently, as it mimics the humans' ability of learning to learn, and intelligently transfers the learned generic object knowledge from the common heavy-tailed, to the novel long-tailed object classes. Especially, the research in this emerging field has been flourishing in recent years with various benchmarks, backbones, and methodologies proposed. To review these FSOD works, there are several insightful FSOD survey articles [58, 59, 74, 78] that systematically study and compare them as the groups of fine-tuning/transfer learning, and meta-learning methods. In contrast, we review the existing FSOD algorithms from a new perspective under a new taxonomy based on their contributions, i.e., data-oriented, model-oriented, and algorithm-oriented. Thus, a comprehensive survey with performance comparison is conducted on recent achievements of FSOD. Furthermore, we also analyze the technical challenges, the merits and demerits of these methods, and envision the future directions of FSOD. Specifically, we give an overview of FSOD, including the problem definition, common datasets, and evaluation protocols. The taxonomy is then proposed that groups FSOD methods into three types. Following this taxonomy, we provide a systematic review of the advances in FSOD. Finally, further discussions on performance, challenges, and future directions are presented. | 翻訳日:2023-04-13 19:25:19 公開日:2023-04-12 |
# 単一モードキャビティに強く結合した2レベル人工原子からの共鳴蛍光 Resonance Fluorescence from a two-level artificial atom strongly coupled to a single-mode cavity ( http://arxiv.org/abs/2202.12080v4 ) ライセンス: Link先を確認 | Z.H. Peng and D. He and Y. Zhou and J.H. Ding and J. Lu and L. Zhou and J.Q. Liao and L.M. Kuang and Yu-xi Liu and Oleg V. Astafiev and J.S. Tsai | (参考訳) 単モードキャビティ場に強く結合した2レベル人工原子の共鳴蛍光を実験的に実証した。
この効果は30年前にSavage [Phys. Rev. Lett. 63, 1376 (1989)] によって理論的に予測された。
実験結果は理論計算とよく一致する。 We experimentally demonstrate the resonance fluorescence of a two-level artificial atom strongly coupled to a single-mode cavity field. The effect was theoretically predicted thirty years ago by Savage [Phys. Rev. Lett. 63, 1376 (1989)]. The system consists of a superconducting qubit circuit and a one-dimensional transmission line resonator. In addition, a one-dimensional transmission line strongly coupled to the atom serves as an open space. The effect takes place, when a microwave field is applied to the cavity, which in turn is resonantly coupled to the atom. The fluorescence spectrum is measured via the emission into the transmission line. We find that the central peak is determined by the atom spontaneous emission to the open space and the widths of side peaks are largely determined by the coherent interaction between the atom and the cavity, that is, the fluorescence spectrum here is very different from that of the Mollow triplet. We also derive analytical form for the spectrum. Our experimental results agree well with theoretical calculations. | 翻訳日:2023-04-13 19:24:28 公開日:2023-04-12 |
# OLIVE: スパース化のリスクに対抗する、信頼された実行環境上のObliviousなフェデレーテッドラーニング OLIVE: Oblivious Federated Learning on Trusted Execution Environment against the risk of sparsification ( http://arxiv.org/abs/2202.07165v4 ) ライセンス: Link先を確認 | Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa | (参考訳) FL(Federated Learning)とTrusted Execution Environment(TEE)を組み合わせることは、近年大きな学術的注目を集めているプライバシー保護FLを実現するための有望なアプローチである。
実世界データを用いた実験により,提案手法が実用的なスケールで効率的に機能することを示す。 Combining Federated Learning (FL) with a Trusted Execution Environment (TEE) is a promising approach for realizing privacy-preserving FL, which has garnered significant academic attention in recent years. Implementing the TEE on the server side enables each round of FL to proceed without exposing the client's gradient information to untrusted servers. This addresses usability gaps in existing secure aggregation schemes as well as utility gaps in differentially private FL. However, to address the issue using a TEE, the vulnerabilities of server-side TEEs need to be considered -- this has not been sufficiently investigated in the context of FL. The main technical contribution of this study is the analysis of the vulnerabilities of TEE in FL and the defense. First, we theoretically analyze the leakage of memory access patterns, revealing the risk of sparsified gradients, which are commonly used in FL to enhance communication efficiency and model accuracy. Second, we devise an inference attack to link memory access patterns to sensitive information in the training dataset. Finally, we propose an oblivious yet efficient aggregation algorithm to prevent memory access pattern leakage. Our experiments on real-world data demonstrate that the proposed method functions efficiently in practical scales. | 翻訳日:2023-04-13 19:24:13 公開日:2023-04-12 |
# ゼロサムニューロシンボリック同時確率ゲームのための戦略合成 Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games ( http://arxiv.org/abs/2202.06255v5 ) ライセンス: Link先を確認 | Rui Yan, Gabriel Santos, Gethin Norman, David Parker and Marta Kwiatkowska | (参考訳) ニューラルネットワークと古典的な記号技法を組み合わせた人工知能へのニューロシンボリックアプローチは、その正しさを判断するために正式なアプローチを必要とする。
本稿では,ニューラル・シンボリック・コンカレント・確率ゲーム (NS-CSG) と呼ばれる,ニューラル・ネットワーク (NN) として実装された知覚機構を通して観測される共有連続状態環境において相互作用する確率的有限状態エージェントからなるモデリング形式を提案する。
値を計算し戦略を合成するために、連続状態CSGのクラスを解く実装可能な値反復 (VI) および方策反復 (PI) アルゴリズムを初めて提示する。
まず、値関数のBorel測定可能なピースワイズ定数(B-PWC)表現を導入し、ミニマックスバックアップをこの表現に拡張し、B-PWC VIを提案する。
プロトタイプ実装したB-PWC VIアルゴリズムを用いて近似的に最適な戦略を生成することで、動的な車両駐車の例により提案手法を説明する。 Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise probabilistic finite-state agents interacting in a shared continuous-state environment observed through perception mechanisms implemented as neural networks (NNs). We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, implementable value iteration (VI) and policy iteration (PI) algorithms to solve a class of continuous-state CSGs. These require a finite representation of the pre-image of the environment's NN perception mechanism and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI by extending a recent PI method based on alternating player choices for finite state spaces to Borel state spaces, which does not require normal-form games to be solved. We illustrate our approach with a dynamic vehicle parking example by generating approximately optimal strategies using a prototype implementation of the B-PWC VI algorithm. | 翻訳日:2023-04-13 19:23:54 公開日:2023-04-12 |
# ランダムな量子ゲートの普遍集合の行列濃度不等式と効率 Matrix concentration inequalities and efficiency of random universal sets of quantum gates ( http://arxiv.org/abs/2202.05371v3 ) ライセンス: Link先を確認 | Piotr Dulian and Adam Sawicki | (参考訳) 量子ゲートのランダム集合 $\mathcal{S} \subset U(d)$ に対して、$\mathcal{S}$ が $\delta$-approximate $t$-design となる確率の境界を与える。
In particular we have found that for $\mathcal{S}$ drawn from an exact $t$-design the probability that it forms a $\delta$-approximate $t$-design satisfies the inequality $\mathbb{P}\left(\delta \geq x \right)\leq 2D_t \, \frac{e^{-|\mathcal{S}| x \, \mathrm{arctanh}(x)}}{(1-x^2)^{|\mathcal{S}|/2}} = O\left( 2D_t \left( \frac{e^{-x^2}}{\sqrt{1-x^2}} \right)^{|\mathcal{S}|} \right)$, where $D_t$ is a sum over dimensions of unique irreducible representations appearing in the decomposition of $U \mapsto U^{\otimes t}\otimes \bar{U}^{\otimes t}$.
この結果を用いて、確率 $P$ で $\delta$-approximate $t$-design を得るには、$O( \delta^{-2}(t\log(d)-\log(1-P)))$ 個のランダムゲートが必要であることを示す。
また、ランダムな $\mathcal{S}$ に対して、$\delta$ がその期待値 $\mathbb{E}\delta$ のまわりにどのように集中するかを分析する。
我々の結果は対称ゲートと非対称ゲートの両方に対して有効である。 For a random set $\mathcal{S} \subset U(d)$ of quantum gates we provide bounds on the probability that $\mathcal{S}$ forms a $\delta$-approximate $t$-design. In particular we have found that for $\mathcal{S}$ drawn from an exact $t$-design the probability that it forms a $\delta$-approximate $t$-design satisfies the inequality $\mathbb{P}\left(\delta \geq x \right)\leq 2D_t \, \frac{e^{-|\mathcal{S}| x \, \mathrm{arctanh}(x)}}{(1-x^2)^{|\mathcal{S}|/2}} = O\left( 2D_t \left( \frac{e^{-x^2}}{\sqrt{1-x^2}} \right)^{|\mathcal{S}|} \right)$, where $D_t$ is a sum over dimensions of unique irreducible representations appearing in the decomposition of $U \mapsto U^{\otimes t}\otimes \bar{U}^{\otimes t}$. We use our results to show that to obtain a $\delta$-approximate $t$-design with probability $P$ one needs $O( \delta^{-2}(t\log(d)-\log(1-P)))$ many random gates. We also analyze how $\delta$ concentrates around its expected value $\mathbb{E}\delta$ for random $\mathcal{S}$. Our results are valid for both symmetric and non-symmetric sets of gates. | 翻訳日:2023-04-13 19:23:17 公開日:2023-04-12 |
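The tail bound quoted in the entry above can be evaluated numerically to see how quickly it decays with the number of gates. The sketch below simply transcribes the stated right-hand side, $2D_t \, e^{-|\mathcal{S}| x \, \mathrm{arctanh}(x)} / (1-x^2)^{|\mathcal{S}|/2}$. The function name `design_failure_bound` and the numerical inputs are hypothetical, and the value of $D_t$ must be supplied by the caller since the abstract does not give it in closed form.

```python
import math


def design_failure_bound(num_gates, x, d_t):
    """Evaluate 2*D_t * exp(-|S| * x * arctanh(x)) / (1 - x^2)^(|S|/2) for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    # Work in log space so that large |S| does not underflow prematurely.
    log_bound = (
        math.log(2.0 * d_t)
        - num_gates * x * math.atanh(x)
        - 0.5 * num_gates * math.log1p(-x * x)
    )
    return math.exp(log_bound)


# Hypothetical inputs: D_t = 10 is a placeholder, not a value from the paper.
for n in (100, 200, 400):
    print(n, design_failure_bound(n, 0.3, 10.0))
```

Values above 1 are vacuous as probability bounds; the expression becomes informative only once the number of gates is large enough for the exponential factor to dominate $2D_t$.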
# 偽零点によるスパースマルチウェイカウントデータに対するゼロトランク付きポアソン回帰 Zero-Truncated Poisson Regression for Sparse Multiway Count Data Corrupted by False Zeros ( http://arxiv.org/abs/2201.10014v2 ) ライセンス: Link先を確認 | Oscar L\'opez, Daniel M. Dunlavy, Richard B. Lehoucq | (参考訳) 本稿では,真のゼロカウントとは区別がつかない偽の零点によって崩壊する多元数データに対する新しい統計的推論手法を提案する。
我々の主な結果は、ポアソン観測を生成する $N$-way rank-$R$ のパラメトリックテンソル $\boldsymbol{\mathscr{M}}\in(0,\infty)^{I\times \cdots\times I}$ を、非負の正準ポリアディック分解のもとで、約 $IR^2\log_2^2(I)$ 個の非ゼロカウントからゼロトランケートポアソン回帰によって正確に推定できることを示している。
そこで, 低ランクマルチパラメータモデルを用いて, 偽零点による実質的破損を伴う未決定シナリオにおいて, 精度の高い回帰を実現するための実装可能な手法を提案する。
理論的な結果を調べるためにいくつかの数値実験が行われた。 We propose a novel statistical inference methodology for multiway count data that is corrupted by false zeros that are indistinguishable from true zero counts. Our approach consists of zero-truncating the Poisson distribution to neglect all zero values. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\boldsymbol{\mathscr{M}}\in(0,\infty)^{I\times \cdots\times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2\log_2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results. | 翻訳日:2023-04-13 19:22:11 公開日:2023-04-12 |
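As a scalar illustration of the zero-truncation idea in the entry above, the sketch below implements the zero-truncated Poisson log-probability $\log P(X=k \mid X>0) = k\log\lambda - \lambda - \log k! - \log(1-e^{-\lambda})$ and fits a single rate from strictly positive counts. This is a one-parameter toy and not the low-rank tensor regression of the paper; the function names and the sample counts are assumptions made for illustration.

```python
import math


def ztp_log_pmf(k, lam):
    """Log-probability of count k >= 1 under a Poisson(lam) conditioned on being non-zero."""
    if k < 1:
        raise ValueError("zero-truncated counts must be at least 1")
    return k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(-math.expm1(-lam))


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (-math.expm1(-lam))


def fit_ztp_rate(counts, tol=1e-12, max_iter=200):
    """Maximum-likelihood rate from positive counts: solve ztp_mean(lam) = sample mean."""
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a finite positive rate")
    lo, hi = 1e-12, mean  # ztp_mean(lam) > lam, so the root lies below the sample mean
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


counts = [1, 2, 3, 1, 4, 2]  # made-up non-zero observations; zeros are simply discarded
rate = fit_ztp_rate(counts)
print(rate, ztp_mean(rate))
```

Because zeros never enter the likelihood, it does not matter whether a discarded zero was a true zero or a false one, which is the property the abstract relies on.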
# ノイズレスおよびノイズの多いプログラマブル量子プロセッサの熱状態の準備 Preparing thermal states on noiseless and noisy programmable quantum processors ( http://arxiv.org/abs/2112.14688v2 ) ライセンス: Link先を確認 | Oles Shtanko, Ramis Movassagh | (参考訳) 自然は正確な物理法則によって支配され、新しいコンピュータ実行シミュレーションアルゴリズムの発見を促すことができる。
既存の量子アルゴリズムには注意事項がある: ほとんどは量子位相推定が必要で、現在のうるさいハードウェアでは実用的でないか、初期化、不毛高原、証明可能な保証の一般的な欠如といった障害に直面した変分である。
例として、最新世代の利用可能な量子コンピュータ上でハードコアBose-Hubbardモデルの熱状態をシミュレートする。 Nature is governed by precise physical laws, which can inspire the discovery of new computer-run simulation algorithms. Thermal states are the most ubiquitous for they are the equilibrium states of matter. Simulating thermal states of quantum matter has applications ranging from quantum machine learning to better understanding of high-temperature superconductivity and quantum chemistry. The computational complexity of this task is hopelessly hard for classical computers. The existing quantum algorithms come with caveats: most either require quantum phase estimation rendering them impractical for current noisy hardware, or are variational which face obstacles such as initialization, barren plateaus, and a general lack of provable guarantee. We provide two quantum algorithms with provable guarantees to prepare thermal states on (near-term) quantum computers that avoid these drawbacks. The first algorithm is inspired by the natural thermalization process where the ancilla qubits act as the infinite thermal bath. This algorithm can potentially run in polynomial time to sample thermal distributions of ergodic systems -- the vast class of physical systems that equilibrate in isolation with respect to local observables. The second algorithm works for any system and in general runs in exponential time. However, it requires significantly smaller quantum resources than previous such algorithms. In addition, we provide an error mitigation technique for both algorithms to fight back decoherence, which enables us to run our algorithms on the near-term quantum devices. To illustrate, we simulate the thermal state of the hardcore Bose-Hubbard model on the latest generation of available quantum computers. | 翻訳日:2023-04-13 19:21:45 公開日:2023-04-12 |
# TemporalWiki: 進化し続ける言語モデルのトレーニングと評価のための生涯ベンチマーク TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ( http://arxiv.org/abs/2204.14211v3 ) ライセンス: Link先を確認 | Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Minjoon Seo | (参考訳) 言語モデル(LMs)は、世界が変化するにつれて時代遅れになり、訓練中に存在しなかった、あるいは異なっていた最近の事実情報を必要とするタスクにしばしば失敗する。
データセットとコードはhttps://github.com/joeljang/temporalwikiで入手できる。 Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which was absent or different during training, a phenomenon called temporal misalignment. This is especially a challenging problem because the research community still lacks a coherent dataset for assessing the adaptability of LMs to frequently-updated knowledge corpus such as Wikipedia. To this end, we introduce TemporalWiki, a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time. We also find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than on the entire snapshot in our benchmark with 12 times less computational cost, which verifies that factual knowledge in LMs can be safely updated with minimal training data via continual learning. The dataset and the code are available at https://github.com/joeljang/temporalwiki. | 翻訳日:2023-04-13 19:14:16 公開日:2023-04-12 |
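The benchmark described above is built from the difference between consecutive snapshots. A minimal sketch of such a diff is shown below, assuming each snapshot is represented as a dictionary from a (subject, relation) key to an object value. This representation, the function name `snapshot_diff`, and the placeholder entries are hypothetical and are not taken from the TemporalWiki code base.

```python
def snapshot_diff(old_snapshot, new_snapshot):
    """Split the newer snapshot into changed-or-new entries and unchanged entries."""
    changed, unchanged = {}, {}
    for key, value in new_snapshot.items():
        if key in old_snapshot and old_snapshot[key] == value:
            unchanged[key] = value   # knowledge the model should retain
        else:
            changed[key] = value     # updated or newly added knowledge
    return changed, unchanged


# Placeholder entries only; no real-world facts are implied.
old = {("entity_a", "attr_1"): "v1", ("entity_b", "attr_1"): "v2"}
new = {
    ("entity_a", "attr_1"): "v1",
    ("entity_b", "attr_1"): "v3",
    ("entity_c", "attr_2"): "v4",
}
changed, unchanged = snapshot_diff(old, new)
print(len(changed), len(unchanged))
```

Training only on the changed portion is what keeps the update cheap relative to retraining on a full snapshot, while the unchanged portion gives a way to check that earlier knowledge has not been forgotten.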
# Block-Circulant Complex Hadamard Matrices ( http://arxiv.org/abs/2204.11727v3 ) License: see link | Wojciech Bruzda | (Summary) We present a new method of obtaining a sequence of isolated complex Hadamard matrices (CHM) for dimensions $N\geqslant 7$, based on block-circulant structures.
We note novel connections between certain eight-dimensional matrices and provide new insights into the classification of CHM for $N\geqslant 7$.
These contributions can find real applications in quantum information theory and in the construction of new families of Mutually Unbiased Bases or Unitary Error Bases. A new method of obtaining a sequence of isolated complex Hadamard matrices (CHM) for dimensions $N\geqslant 7$, based on block-circulant structures, is presented. We discuss several analytic examples resulting from a modification of the Sinkhorn algorithm. In particular, we present new isolated matrices of orders $9$, $10$ and $11$, whose elements are not roots of unity, and also several new multiparametric families of order $10$. We note novel connections between certain eight-dimensional matrices and provide new insights towards classification of CHM for $N\geqslant 7$. These contributions can find real applications in Quantum Information Theory and constructions of new families of Mutually Unbiased Bases or Unitary Error Bases. | Translated: 2023-04-13 19:13:55 Published: 2023-04-12 |
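For readers unfamiliar with the object being classified, a complex Hadamard matrix of order $N$ has unimodular entries and satisfies $H H^\dagger = N I$. The sketch below checks that definition on the Fourier matrix, a standard textbook example. It does not reproduce the paper's block-circulant construction, the Fourier matrix's entries are roots of unity (unlike the new matrices reported above), and the function names are hypothetical.

```python
import cmath


def fourier_matrix(n):
    """Return the n x n Fourier matrix with entries exp(2*pi*i*j*k/n)."""
    return [[cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def is_complex_hadamard(h, tol=1e-9):
    """Check that all entries have modulus 1 and that H H^dagger = N * I."""
    n = len(h)
    # Every entry must lie on the unit circle.
    if any(abs(abs(x) - 1.0) > tol for row in h for x in row):
        return False
    # Rows must be mutually orthogonal with squared norm n.
    for a in range(n):
        for b in range(n):
            inner = sum(h[a][k] * h[b][k].conjugate() for k in range(n))
            target = n if a == b else 0
            if abs(inner - target) > tol * n:
                return False
    return True


print(is_complex_hadamard(fourier_matrix(7)))
```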
# AutoMLBench: A Comprehensive Experimental Evaluation of Automated Machine Learning Frameworks ( http://arxiv.org/abs/2204.08358v2 ) License: see link | Hassan Eldeeb, Mohamed Maher, Radwa Elshawi, and Sherif Sakr | (Summary) With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs of our digital world.
The results of our study reveal various interesting insights that can significantly guide and impact the design of AutoML frameworks. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs in our digital world. In response to this demand, several automated machine learning (AutoML) frameworks have been developed to fill the gap of human expertise by automating the process of building machine learning pipelines. Each framework comes with different heuristics-based design decisions. In this study, we present a comprehensive evaluation and comparison of the performance characteristics of six popular AutoML frameworks, namely, AutoWeka, AutoSKlearn, TPOT, Recipe, ATM, and SmartML, across 100 data sets from established AutoML benchmark suites. Our experimental evaluation considers different aspects for its comparison, including the performance impact of several design decisions such as time budget, size of search space, meta-learning, and ensemble construction. The results of our study reveal various interesting insights that can significantly guide and impact the design of AutoML frameworks. | Translated: 2023-04-13 19:13:43 Published: 2023-04-12 |
# The Importance of Credo in Multiagent Learning ( http://arxiv.org/abs/2204.07471v2 ) License: see link | David Radke, Kate Larson, Tim Brecht | (Summary) We propose credo, a model for multi-objective optimization for agents in a system that are configured into multiple groups (i.e., teams).
We identify two scenarios without full common interest that achieve high equality and significantly higher mean population rewards than when the interests of all agents are aligned. We propose a model for multi-objective optimization, a credo, for agents in a system that are configured into multiple groups (i.e., teams). Our model of credo regulates how agents optimize their behavior for the groups they belong to. We evaluate credo in the context of challenging social dilemmas with reinforcement learning agents. Our results indicate that the interests of teammates, or the entire system, are not required to be fully aligned for achieving globally beneficial outcomes. We identify two scenarios without full common interest that achieve high equality and significantly higher mean population rewards compared to when the interests of all agents are aligned. | Translated: 2023-04-13 19:13:27 Published: 2023-04-12 |
# Tensor Completion with Provable Consistency and Fairness Guarantees for Recommender Systems ( http://arxiv.org/abs/2204.01815v3 ) License: see link | Tung Nguyen and Jeffrey Uhlmann | (Summary) We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems.
Finally, we propose our consensus ordering property as an admissibility criterion for any proposed RS method. We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems. The novelty of the framework is that instead of artificially making the problem well-posed in the form of an application-arbitrary optimization problem, e.g., minimizing a bulk structural measure such as rank or norm, we show that a single property/constraint, preserving unit-scale consistency, guarantees the existence of a solution and, under relatively weak support assumptions, its uniqueness. The framework and solution algorithms also generalize directly to tensors of arbitrary dimensions while maintaining computational complexity that is linear in problem size for fixed dimension d. In the context of recommender system (RS) applications, we prove that two reasonable properties that should be expected to hold for any solution to the RS problem are sufficient to permit uniqueness guarantees to be established within our framework. Key theoretical contributions include a general unit-consistent tensor-completion framework with proofs of its properties, e.g., consensus-order and fairness, and algorithms with optimal runtime and space complexities, e.g., O(1) term-completion with preprocessing complexity that is linear in the number of known terms of the matrix/tensor. From a practical perspective, the seamless ability of the framework to generalize to exploit high-dimensional structural relationships among key state variables, e.g., user and product attributes, offers a means for extracting significantly more information than is possible for alternative methods that cannot generalize beyond direct user-product relationships. Finally, we propose our consensus ordering property as an admissibility criterion for any proposed RS method. | Translated: 2023-04-13 19:12:41 Published: 2023-04-12 |
# Analytic theory for the dynamics of wide quantum neural networks ( http://arxiv.org/abs/2203.16711v3 ) License: see link | Junyu Liu, Khadijeh Najafi, Kunal Sharma, Francesco Tacchino, Liang Jiang, Antonio Mezzacapo | (Summary) Parameterized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained to address learning problems.
We validate our analytic results with numerical experiments. Parameterized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, many of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments. | Translated: 2023-04-13 19:11:47 Published: 2023-04-12 |
# Innovations in Neural Data-to-text Generation: A Survey ( http://arxiv.org/abs/2207.12571v2 ) License: see link | Mandar Sharma, Ajay Gogineni, Naren Ramakrishnan | (Summary) The neural boom that has sparked natural language processing (NLP) research over the past decade has similarly led to significant innovations in data-to-text generation (DTG).
With this holistic view, we highlight promising avenues for DTG research that focus not only on the design of linguistically capable systems but also on systems that exhibit fairness and accountability. The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability. | Translated: 2023-04-13 19:05:47 Published: 2023-04-12 |
# Rank-based Decomposable Losses in Machine Learning: A Survey ( http://arxiv.org/abs/2207.08768v2 ) License: see link | Shu Hu, Xin Wang, Siwei Lyu | (Summary) Recent works have revealed an essential paradigm in designing loss functions that differentiates individual losses from aggregate losses.
We also suggest future research directions spanning unexplored, remaining, and emerging issues. Recent works have revealed an essential paradigm in designing loss functions that differentiate individual losses vs. aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both have a common procedure that aggregates a set of individual values to a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property of organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses. | Translated: 2023-04-13 19:05:32 Published: 2023-04-12 |
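To make the notion of a rank-based aggregate loss concrete, the sketch below implements the average top-k loss, which sorts the individual losses and averages the k largest. The abstract does not single out this loss, so treat it as one commonly cited example rather than the survey's own formulation; the function name is hypothetical.

```python
def average_top_k_loss(losses, k):
    """Mean of the k largest individual losses.

    k = 1 recovers the maximum loss; k = len(losses) recovers the average loss.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and len(losses)")
    ranked = sorted(losses, reverse=True)  # rank the individual losses
    return sum(ranked[:k]) / k


individual_losses = [0.1, 2.0, 0.5, 1.0]
print(average_top_k_loss(individual_losses, 2))
```

The aggregate depends on the individual values only through their ranking, which is the property the survey uses to organize this family of losses.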
# Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning ( http://arxiv.org/abs/2206.12126v3 ) License: see link | Cheng Tan, Zhangyang Gao, Lirong Wu, Yongjie Xu, Jun Xia, Siyuan Li, Stan Z. Li | (Summary) Spatiotemporal predictive learning aims to generate future frames by learning from historical frames.
Extensive experiments demonstrate that the proposed method enables the derived model to achieve competitive performance on various spatiotemporal prediction benchmarks. Spatiotemporal predictive learning aims to generate future frames by learning from historical frames. In this paper, we investigate existing methods and present a general framework of spatiotemporal predictive learning, in which the spatial encoder and decoder capture intra-frame features and the middle temporal module catches inter-frame correlations. While the mainstream methods employ recurrent units to capture long-term temporal dependencies, they suffer from low computational efficiency due to their unparallelizable architectures. To parallelize the temporal module, we propose the Temporal Attention Unit (TAU), which decomposes the temporal attention into intra-frame statical attention and inter-frame dynamical attention. Moreover, while the mean squared error loss focuses on intra-frame errors, we introduce a novel differential divergence regularization to take inter-frame variations into account. Extensive experiments demonstrate that the proposed method enables the derived model to achieve competitive performance on various spatiotemporal prediction benchmarks. | Translated: 2023-04-13 19:04:47 Published: 2023-04-12 |
# NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages ( http://arxiv.org/abs/2205.15960v2 ) License: see link | Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder | (Summary) Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines.
We hope that our work can spark NLP research on Indonesian and other underrepresented languages. Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages. | Translated: 2023-04-13 19:04:23 Published: 2023-04-12 |
# Unfooling Perturbation-Based Post Hoc Explainers ( http://arxiv.org/abs/2205.14772v3 ) License: see link | Zachariah Carmichael, Walter J Scheirer | (Summary) Monumental advancements in artificial intelligence (AI) have attracted the interest of doctors, lenders, judges, and other professionals.
With this in mind, several natural questions arise: how can we audit these black box systems?
Our approach detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data. Monumental advancements in artificial intelligence (AI) have lured the interest of doctors, lenders, judges, and other professionals. While these high-stakes decision-makers are optimistic about the technology, those familiar with AI systems are wary about the lack of transparency of their decision-making processes. Perturbation-based post hoc explainers offer a model agnostic means of interpreting these systems while only requiring query-level access. However, recent work demonstrates that these explainers can be fooled adversarially. This discovery has adverse implications for auditors, regulators, and other sentinels. With this in mind, several natural questions arise: how can we audit these black box systems? And how can we ascertain that the auditee is complying with the audit in good faith? In this work, we rigorously formalize this problem and devise a defense against adversarial attacks on perturbation-based explainers. We propose algorithms for the detection (CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP. | Translated: 2023-04-13 19:04:10 Published: 2023-04-12 |
# Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages ( http://arxiv.org/abs/2205.12676v3 ) License: see link | Simran Khanuja, Sebastian Ruder, Partha Talukdar | (Summary) In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages, be equitable (i.e., not unduly biased towards any particular language), and be inclusive of all users, particularly in low-resource settings where compute constraints are common.
Finally, we discuss steps to mitigate these biases and encourage the community to employ multi-faceted evaluation when building linguistically diverse and equitable technologies. In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages, be equitable, i.e., not unduly biased towards any particular language, and be inclusive of all users, particularly in low-resource settings where compute constraints are common. In this paper, we propose an evaluation paradigm that assesses NLP technologies across all three dimensions. While diversity and inclusion have received attention in recent literature, equity is currently unexplored. We propose to address this gap using the Gini coefficient, a well-established metric used for estimating societal wealth inequality. Using our paradigm, we highlight the distressed state of current technologies for Indian (IN) languages (a linguistically large and diverse set, with a varied speaker population), across all three dimensions. To improve upon these metrics, we demonstrate the importance of region-specific choices in model building and dataset creation, and more importantly, propose a novel, generalisable approach to optimal resource allocation during fine-tuning. Finally, we discuss steps to mitigate these biases and encourage the community to employ multi-faceted evaluation when building linguistically diverse and equitable technologies. | Translated: 2023-04-13 19:03:50 Published: 2023-04-12 |
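The Gini coefficient mentioned above can be computed with the standard formula on sorted values. The sketch below is a generic implementation, not the authors' evaluation code; how per-language scores are obtained and weighted in the paper is not stated in the abstract, so the input list here is purely hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means all values are equal; values near 1 mean the total is
    concentrated in a few entries.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data:
    # G = sum_i (2*i - n - 1) * x_i / (n * sum_i x_i), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical per-language task scores, one entry per language.
scores = [0.82, 0.79, 0.40, 0.35, 0.10]
print(round(gini(scores), 3))
```

A lower value would indicate that performance is spread more evenly across languages, which is the sense of equity the abstract describes.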
# Do we need Label Regularization to Fine-tune Pre-trained Language Models? ( http://arxiv.org/abs/2205.12428v2 ) License: see link | Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi | (Summary) Knowledge distillation (KD) is a prominent neural model compression technique that relies heavily on teacher network predictions to guide the training of a student model.
Considering the ever-growing size of pre-trained language models (PLMs), KD is often adopted in many NLP tasks involving PLMs.
We further explore this phenomenon in different settings of NLP and computer vision tasks and show that pre-training itself acts as a kind of regularization, so additional label regularization is unnecessary. Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model. Considering the ever-growing size of pre-trained language models (PLMs), KD is often adopted in many NLP tasks involving PLMs. However, it is evident that in KD, deploying the teacher network during training adds to the memory and computational requirements of training. In the computer vision literature, the necessity of the teacher network is put under scrutiny by showing that KD is a label regularization technique that can be replaced with lighter teacher-free variants such as the label-smoothing technique. However, to the best of our knowledge, this issue is not investigated in NLP. Therefore, this work concerns studying different label regularization techniques and whether we actually need them to improve the fine-tuning of smaller PLM networks on downstream tasks. In this regard, we did a comprehensive set of experiments on different PLMs such as BERT, RoBERTa, and GPT with more than 600 distinct trials and ran each configuration five times. This investigation led to a surprising observation that KD and other label regularization techniques do not play any meaningful role over regular fine-tuning when the student model is pre-trained. We further explore this phenomenon in different settings of NLP and computer vision tasks and demonstrate that pre-training itself acts as a kind of regularization, and additional label regularization is unnecessary. | Translated: 2023-04-13 19:03:30 Published: 2023-04-12 |
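Label smoothing, the teacher-free label regularization technique named above, replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution. The sketch below is a minimal, framework-free version for a single example; the smoothing value `epsilon=0.1` and the function name are assumptions for illustration, not settings taken from the paper.

```python
import math


def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy of softmax(logits) against a label-smoothed target."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    k = len(logits)
    # Smoothed target: (1 - epsilon) on the true class plus epsilon / k on every class.
    q = [epsilon / k + (1.0 - epsilon) * (1.0 if i == target else 0.0)
         for i in range(k)]
    return -sum(qi * lp for qi, lp in zip(q, log_probs))


logits = [2.0, 0.5, -1.0]
print(smoothed_cross_entropy(logits, target=0, epsilon=0.0))  # plain cross-entropy
print(smoothed_cross_entropy(logits, target=0, epsilon=0.1))  # with label smoothing
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy, which is the "regular fine-tuning" baseline the abstract compares against.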
# CNNs Avoid Curse of Dimensionality by Learning on Patches ( http://arxiv.org/abs/2205.10760v4 ) License: see link | Vamshi C. Madala and Shivkumar Chandrasekaran and Jason Bunk | (Summary) Despite the success of convolutional neural networks (CNNs) in numerous computer vision tasks and their extraordinary generalization performance, attempts to predict the generalization error of CNNs have so far been limited to a posteriori analyses.
Our patch-based theory also explains why data augmentation techniques like Cutout, CutMix and random cropping are effective in improving the generalization error of CNNs. Despite the success of convolutional neural networks (CNNs) in numerous computer vision tasks and their extraordinary generalization performances, several attempts to predict the generalization errors of CNNs have only been limited to a posteriori analyses thus far. A priori theories explaining the generalization performances of deep neural networks have mostly ignored the convolutionality aspect and do not specify why CNNs are able to seemingly overcome the curse of dimensionality on computer vision tasks like image classification where the image dimensions are in the thousands. Our work attempts to explain the generalization performance of CNNs on image classification under the hypothesis that CNNs operate on the domain of image patches. Ours is the first work we are aware of to derive an a priori error bound for the generalization error of CNNs and we present both quantitative and qualitative evidence in support of our theory. Our patch-based theory also offers an explanation for why data augmentation techniques like Cutout, CutMix and random cropping are effective in improving the generalization error of CNNs. | Translated: 2023-04-13 19:03:05 Published: 2023-04-12 |
# How Good Is Neural Combinatorial Optimization? A Systematic Evaluation on the Traveling Salesman Problem ( http://arxiv.org/abs/2209.10913v2 ) License: see link | Shengcai Liu, Yu Zhang, Ke Tang, Xin Yao | (Summary) Traditional solvers for combinatorial optimization (CO) problems are usually designed by human experts.
Recently, there has been a surge of interest in using deep learning, especially deep reinforcement learning, to automatically learn effective solvers for CO.
The resulting new paradigm is termed neural combinatorial optimization (NCO).
Specifically, taking the traveling salesman problem as the testbed, solver performance is assessed in five aspects: effectiveness, efficiency, stability, scalability, and generalization ability.
Our results show that the solvers learned by NCO approaches still fall short of traditional solvers.
We hope this work helps build a better understanding of the strengths and weaknesses of NCO and provides a comprehensive evaluation protocol for further benchmarking NCO approaches. Traditional solvers for tackling combinatorial optimization (CO) problems are usually designed by human experts. Recently, there has been a surge of interest in utilizing deep learning, especially deep reinforcement learning, to automatically learn effective solvers for CO. The resultant new paradigm is termed neural combinatorial optimization (NCO). However, the advantages and disadvantages of NCO relative to other approaches have not been empirically or theoretically well studied. This work presents a comprehensive comparative study of NCO solvers and alternative solvers. Specifically, taking the traveling salesman problem as the testbed problem, the performance of the solvers is assessed in five aspects, i.e., effectiveness, efficiency, stability, scalability, and generalization ability. Our results show that the solvers learned by NCO approaches, in general, still fall short of traditional solvers in nearly all these aspects. A potential benefit of NCO solvers would be their superior time and energy efficiency for small-size problem instances when sufficient training instances are available. We hope this work will help with a better understanding of the strengths and weaknesses of NCO and provide a comprehensive evaluation protocol for further benchmarking NCO approaches in comparison to other approaches. | Translated: 2023-04-13 18:56:50 Published: 2023-04-12 |
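To show what a solver is scored on in the traveling salesman problem, the sketch below computes the length of a closed tour and builds a tour with the classical nearest-neighbor heuristic. This heuristic is chosen only because it is short; the abstract does not name it, it is not one of the solvers evaluated in the study as far as the text here indicates, and the function names are hypothetical.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbor_tour(points, start=0):
    """Greedy construction: repeatedly move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # corners of a unit square
tour = nearest_neighbor_tour(cities)
print(tour, tour_length(cities, tour))
```

Tour length on held-out instances corresponds to the effectiveness aspect above; the other four aspects (efficiency, stability, scalability, generalization) would need timing, repeated runs, and varied instance sizes and distributions.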
# Spillover of Antisocial Behavior from Fringe Platforms: The Unintended Consequences of Community Banning ( http://arxiv.org/abs/2209.09803v2 ) License: see link | Giuseppe Russo, Luca Verginer, Manoel Horta Ribeiro, Giona Casiraghi | (Summary) Online platforms face pressure to keep their communities civil and respectful.
We study this possible spillover by analyzing around 70,000 users from three banned communities: r/The_Donald, r/GenderCritical, and r/Incels.
In short, we find evidence for a spillover of antisocial behavior from fringe platforms onto Reddit via co-participation. Online platforms face pressure to keep their communities civil and respectful. Thus, the bannings of problematic online communities from mainstream platforms like Reddit and Facebook are often met with enthusiastic public reactions. However, this policy can lead users to migrate to alternative fringe platforms with lower moderation standards and where antisocial behaviors like trolling and harassment are widely accepted. As users of these communities often remain co-active across mainstream and fringe platforms, antisocial behaviors may spill over onto the mainstream platform. We study this possible spillover by analyzing around 70,000 users from three banned communities that migrated to fringe platforms: r/The_Donald, r/GenderCritical, and r/Incels. Using a difference-in-differences design, we contrast co-active users with matched counterparts to estimate the causal effect of fringe platform participation on users' antisocial behavior on Reddit. Our results show that participating in the fringe communities increases users' toxicity on Reddit (as measured by Perspective API) and involvement with subreddits similar to the banned community, which often also breach platform norms. The effect intensifies with time and exposure to the fringe platform. In short, we find evidence for a spillover of antisocial behavior from fringe platforms onto Reddit via co-participation. | Translated: 2023-04-13 18:56:31 Published: 2023-04-12 |
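The difference-in-differences design mentioned above compares the change in an outcome for the treated group with the change for a comparison group over the same period. The sketch below is the textbook two-group, two-period estimator; the paper's actual specification (matching, covariates, regression form) is not given in the abstract, and the numbers are made up for illustration.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Each argument is a list of outcome values (for example, toxicity scores).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what would have happened
    # to the treated group without treatment (parallel-trends assumption).
    return treated_change - control_change


# Hypothetical toxicity scores, not data from the study.
effect = diff_in_diff(
    treated_pre=[0.20, 0.22], treated_post=[0.30, 0.32],
    control_pre=[0.20, 0.18], control_post=[0.22, 0.20],
)
print(round(effect, 3))
```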
# Hardware-Conscious Optimization of the Quantum Toffoli Gate ( http://arxiv.org/abs/2209.02669v3 ) License: see link | Max Aksel Bowman, Pranav Gokhale, Jeffrey Larson, Ji Liu, Martin Suchara | (Summary) While quantum computing holds great potential in combinatorial optimization, electronic structure calculation, and number theory, the current era of quantum computing is limited by noisy hardware.
Although we focus on optimizing Toffoli gates on the IBMQ native gate set, the methods presented are generalizable to any gate and superconducting qubit architecture.
Our optimized Toffoli gate implementation demonstrates an $18\%$ reduction in infidelity compared with the canonical implementation, as benchmarked on IBM Jakarta with quantum process tomography.
Assuming the inclusion of multi-qubit cross-resonance (MCR) gates in the IBMQ native gate set, we produce Toffoli implementations with only six multi-qubit gates, a $25\%$ reduction from the canonical eight multi-qubit gate implementations for linearly connected qubits. While quantum computing holds great potential in combinatorial optimization, electronic structure calculation, and number theory, the current era of quantum computing is limited by noisy hardware. Many quantum compilation approaches can mitigate the effects of imperfect hardware by optimizing quantum circuits for objectives such as critical path length. Few approaches consider quantum circuits in terms of the set of vendor-calibrated operations (i.e., native gates) available on target hardware. This manuscript expands the analytical and numerical approaches for optimizing quantum circuits at this abstraction level. We present a procedure for combining the strengths of analytical native gate-level optimization with numerical optimization. Although we focus on optimizing Toffoli gates on the IBMQ native gate set, the methods presented are generalizable to any gate and superconducting qubit architecture. Our optimized Toffoli gate implementation demonstrates an $18\%$ reduction in infidelity compared with the canonical implementation as benchmarked on IBM Jakarta with quantum process tomography. Assuming the inclusion of multi-qubit cross-resonance (MCR) gates in the IBMQ native gate set, we produce Toffoli implementations with only six multi-qubit gates, a $25\%$ reduction from the canonical eight multi-qubit implementations for linearly connected qubits. | Translated: 2023-04-13 18:55:46 Published: 2023-04-12 |
# Detecting bulk and edge exceptional points in non-Hermitian systems through generalized Petermann factors ( http://arxiv.org/abs/2208.14944v2 ) License: see link | Yue-Yu Zou, Yao Zhou, Li-Mei Chen, Peng Ye | (Summary) Non-orthogonality in non-Hermitian quantum systems gives rise to a wealth of exotic quantum phenomena, which can be traced back to non-unitarity and is more fundamental and universal than the complex energy spectrum.
By tuning the model parameters of the underlying non-Hermitian systems, we find that the discontinuity of both $\eta$ and its first-order derivative (denoted as $\partial \eta$) pronouncedly captures rich physics that is fundamentally caused by non-unitarity.
Regarding the discontinuity of $\partial\eta$, we investigate a two-level non-Hermitian model and establish a connection between the points of discontinuity of $\partial \eta$ and exceptional points (EPs) of bulk states.
By studying this connection in more general lattice models, we find that some models have a discontinuity of $\partial\eta$, implying the existence of EPs in bulk states. Non-orthogonality in non-Hermitian quantum systems gives rise to tremendous exotic quantum phenomena, which can be fundamentally traced back to non-unitarity and is much more fundamental and universal than the complex energy spectrum. In this paper, we introduce an interesting quantity (denoted as $\eta$) as a new variant of the Petermann factor to directly and efficiently measure non-unitarity and the associated non-Hermitian physics. By tuning the model parameters of underlying non-Hermitian systems, we find that the discontinuity of both $\eta$ and its first-order derivative (denoted as $\partial \eta$) pronouncedly captures rich physics that is fundamentally caused by non-unitarity. More concretely, in the 1D non-Hermitian topological systems, two mutually orthogonal edge states that are respectively localized on two boundaries become non-orthogonal in the vicinity of discontinuity of $\eta$ as a function of the model parameter, which is dubbed "edge state transition". Through theoretical analysis, we identify that the appearance of edge state transition indicates the existence of exceptional points (EPs) in topological edge states. Regarding the discontinuity of $\partial\eta$, we investigate a two-level non-Hermitian model and establish a connection between the points of discontinuity of $\partial \eta$ and EPs of bulk states. By studying this connection in more general lattice models, we find that some models have discontinuity of $\partial\eta$, implying the existence of EPs in bulk states. | Translated: 2023-04-13 18:55:23 Published: 2023-04-12 |
# Evaluating the Planning and Operational Resilience of Electrical Distribution Systems with Distributed Energy Resources using Complex Network Theory ( http://arxiv.org/abs/2208.11543v3 ) License: see link | Divyanshi Dwivedi, Pradeep Kumar Yemula, Mayukha Pal | (Summary) Electrical distribution systems are extensively penetrated by distributed energy resources (DERs) to meet energy demands, with the general perception that this enhances the system's resilience.
The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions, and for identifying the most critical nodes that could drive the system into non-resilience.
The framework is demonstrated on the IEEE 123 node test feeder by generating active power time-series data for a variety of electrical conditions using the simulation software GridLAB-D.
The percolation threshold proved to be an effective metric for determining the planning and operational resilience of the power distribution system. Electrical Distribution Systems are extensively penetrated with Distributed Energy Resources (DERs) to cater to the energy demands, with the general perception that it enhances the system's resilience. However, integration of DERs may adversely affect the grid operation and affect the system resilience due to various factors like their intermittent availability, dynamics of weather conditions, non-linearity, complexity, number of malicious threats, and improved reliability requirements of consumers. This paper proposes a methodology to evaluate the planning and operational resilience of power distribution systems under extreme events and determines the withstand capability of the electrical network. The proposed framework is developed by effectively employing the complex network theory. Correlated networks for undesirable configurations are developed from the time series data of active power monitored at nodes of the electrical network. For these correlated networks, we compute network parameters such as the clustering coefficient, assortative coefficient, average degree and power law exponent for anticipation, and the percolation threshold for determining the network's withstand capability under extreme conditions. The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions and identifying the most critical nodes of the system that could drive the system into non-resilience. This framework is demonstrated on the IEEE 123 node test feeder by generating active power time-series data for a variety of electrical conditions using the simulation software GridLAB-D. The percolation threshold proved to be an effective metric for the determination of the planning and operational resilience of the power distribution system. | Translated: 2023-04-13 18:54:52 Published: 2023-04-12 |
# A Meta-Analysis of Solar Forecasting Based on Skill Score ( http://arxiv.org/abs/2208.10536v2 ) License: see link | Thi Ngoc Nguyen and Felix Müsgens | (Summary) We screened 1,447 papers from Google Scholar and reviewed the full texts of 320 papers for data extraction.
By controlling for the key differences between forecasts, including location variables, our findings can be applied globally. We conduct the first comprehensive meta-analysis of deterministic solar forecasting based on skill score, screening 1,447 papers from Google Scholar and reviewing the full texts of 320 papers for data extraction. A database of 4,687 points was built and analyzed with multivariate adaptive regression spline modelling, partial dependence plots, and linear regression. The marginal impacts on skill score of ten factors were quantified. The analysis shows the non-linearity and complex interaction between variables in the database. Forecast horizon has a central impact and dominates other factors' impacts. Therefore, the analysis of solar forecasts should be done separately for each horizon. Climate zone variables have a statistically significant correlation with skill score. Regarding inputs, historical data and spatial temporal information are highly helpful. For intra-day, sky and satellite images show the most importance. For day-ahead, numerical weather predictions and locally measured meteorological data are very efficient. All forecast models were compared. Ensemble-hybrid models achieve the most accurate forecasts for all horizons. Hybrid models show superiority for intra-hour while image-based methods are the most efficient for intra-day forecasts. More training data can enhance skill score. However, over-fitting is observed when there is too much training data (longer than 2000 days). There has been a substantial improvement in solar forecast accuracy, especially in recent years. More improvement is observed for intra-hour and intra-day than day-ahead forecasts. By controlling for the key differences between forecasts, including location variables, our findings can be applied globally. | Translated: 2023-04-13 18:54:24 Published: 2023-04-12 |
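For context, a forecast skill score is commonly defined as the relative improvement of a model's error over a reference forecast such as persistence. The sketch below uses RMSE for both; the abstract does not state which exact definition or reference model the meta-analysis adopted, so this form, the function names, and the sample values are assumptions.

```python
import math


def rmse(forecast, observed):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


def skill_score(forecast, reference, observed):
    """1 - RMSE(model) / RMSE(reference).

    0 means no improvement over the reference forecast, 1 is a perfect
    forecast, and negative values mean the model is worse than the reference.
    """
    return 1.0 - rmse(forecast, observed) / rmse(reference, observed)


observed = [1.0, 2.0, 3.0]      # hypothetical irradiance values
reference = [2.0, 3.0, 4.0]     # hypothetical reference (e.g. persistence) forecast
model = [1.5, 2.5, 3.5]         # hypothetical model forecast
print(skill_score(model, reference, observed))
```

Because the score is normalized by a reference forecast, it allows comparison across sites and horizons, which is what makes a meta-analysis over many papers feasible.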
# 放射線学におけるゼロショットオーバストインテリジェンスを可能にする画像とレポートからの自己教師型マルチモーダルトレーニング Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology ( http://arxiv.org/abs/2208.05140v4 ) ライセンス: Link先を確認 | Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, and Jong Chul Ye | (参考訳) oversight aiは放射線医学における新たな概念であり、放射線科医の意思決定を継続的に支援することにより、放射線科医との共生を形成する。
本手法は,臨床で頻繁に発生するデータ制限設定において特に成功し,医療領域に広く適用できる可能性が示唆された。 Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models sheds a light on the long-standing problems of the oversight AI by the understanding both visual and textual concepts and their semantic correspondences. However, there have been limited successes in the application of vision-language models in the medical domain, as the current vision-language models and learning strategies for photographic images and captions call for the web-scale data corpus of image and text pairs which was not often feasible in the medical domain. To address this, here we present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL), leveraging the key components to be tailored for the medical domain. Our medical X-VL model is based on the following components: self-supervised uni-modal models in medical domain and fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and the sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for oversight AI, ranging from the zero-shot classification to zero-shot error correction. Our model outperformed the current state-of-the-art models in two different medical image database, suggesting the novel clinical usage of our oversight AI model for monitoring human errors. Our method was especially successful in the data-limited setting, which is frequently encountered in the clinics, suggesting the potential widespread applicability in medical domain. | 翻訳日:2023-04-13 18:54:07 公開日:2023-04-12 |
# 集合量子エンジンの信頼性の二次的向上 Quadratic Enhancement in the Reliability of Collective Quantum Engines ( http://arxiv.org/abs/2208.04250v2 ) ライセンス: Link先を確認 | Noufal Jaseem, Sai Vinjanampathy and Victor Mukherjee | (参考訳) 集合系-バス相互作用の存在下で動作する多体量子熱エンジンの変動について検討する。
その結果, 集団効果は高い信頼性 ($r$) と低い熱力学的不確実性によって定量化され, 出力の変動を著しく低減できることがわかった。
独立系エンジンとは対照的に, 集合型エンジンの信頼性が2次的に向上することを示す。
これは、多体系における現実的な集合量子熱機械への道を開く。 We study fluctuations in many-body quantum heat engines operating in the presence of collective system-bath interactions. We show that collective effects in open quantum systems can be harnessed to develop highly consistent many-body quantum engines. We consider quantum Otto engines, modeled by $n$ spins collectively coupled to thermal baths. Our results show that collective effects can significantly reduce the fluctuations in the output work, quantified by high reliability ($r$) and low thermodynamic uncertainty. In contrast to independent engines, we demonstrate a quadratic enhancement of the reliability $r$ for their collective counterparts. We extend our analysis to the case of interacting spin models commonly studied in many-body physics, such as the Lipkin-Meshkov-Glick (LMG) model, thereby broadening the regime of applicability of collective effects in quantum thermal machines significantly. This paves the way forward for realistic collective quantum thermal machines in many body systems. | 翻訳日:2023-04-13 18:53:34 公開日:2023-04-12 |
# 言語モデルはより良いプログラミングを教えることができる Language Models Can Teach Themselves to Program Better ( http://arxiv.org/abs/2207.14502v4 ) ライセンス: Link先を確認 | Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai | (参考訳) 最近の言語モデル(LM)は、人間による問題や、競争力のあるプログラミングの問題を解決することで、コード生成において画期的なパフォーマンスを達成する。
問題はプログラミングパズル[Schuster et al., 2021]として公式に指定され、コードベースの問題フォーマットで、ソリューションは実行時に容易に検証できる。
この研究は、コードLMがインタプリタとともに、インストラクティブな問題を引き起こし、自身のパフォーマンスを改善する可能性を実証している。 Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions; thus the model 'improves itself' using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format where solutions can easily be verified for correctness by execution. In experiments on publicly-available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance. | 翻訳日:2023-04-13 18:53:22 公開日:2023-04-12 |
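The abstract above describes programming puzzles as a code-based format in which a candidate solution is checked by execution. A minimal sketch of that idea follows; the specific puzzle, the solution, and the `verify` helper are hypothetical illustrations, not taken from the paper's dataset or code.

```python
def f(x: int) -> bool:
    """Puzzle: find an integer whose square is 1369."""
    return x * x == 1369

def g() -> int:
    """Candidate solution, e.g. one that a language model might propose."""
    return 37

def g_wrong() -> int:
    """An incorrect candidate, which the filter should reject."""
    return 36

def verify(puzzle, solution) -> bool:
    """Keep a (puzzle, solution) pair only if the puzzle returns True on the solution."""
    try:
        return puzzle(solution()) is True
    except Exception:
        # Candidates that crash are treated as incorrect.
        return False

print(verify(f, g), verify(f, g_wrong))
```

In the setting the abstract describes, only pairs that pass a check of this kind would be kept as fine-tuning data, so the interpreter acts as the correctness filter.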
# 異なる次元にわたる入力変換を用いた多変量時系列分類の実証評価 An Empirical Evaluation of Multivariate Time Series Classification with Input Transformation across Different Dimensions ( http://arxiv.org/abs/2210.07713v2 ) ライセンス: Link先を確認 | Leonardos Pantiskas, Kees Verstoep, Mark Hoogendoorn, Henri Bal | (参考訳) 現在の研究では、時間データの分類のための機械学習とディープラーニングのソリューションが、単一チャネルデータセット(ユニバリケート)から複数のチャネル情報(マルチバリケート)の問題へとシフトしている。
本評価では, 追加チャネル次元が自明なものではなく, スケーリングに対する異なるアプローチが解の精度を著しく異なる結果に導くことを実証することを目的とする。
最後に,変換手法と次元と分類器との関係について検討し,一般的な傾向はなく,最適な構成はデータセットと分類器固有のものであると結論付けた。 In current research, machine and deep learning solutions for the classification of temporal data are shifting from single-channel datasets (univariate) to problems with multiple channels of information (multivariate). The majority of these works are focused on the method novelty and architecture, and the format of the input data is often treated implicitly. Particularly, multivariate datasets are often treated as a stack of univariate time series in terms of input preprocessing, with scaling methods applied across each channel separately. In this evaluation, we aim to demonstrate that the additional channel dimension is far from trivial and different approaches to scaling can lead to significantly different results in the accuracy of a solution. To that end, we test seven different data transformation methods on four different temporal dimensions and study their effect on the classification accuracy of five recent methods. We show that, for the large majority of tested datasets, the best transformation-dimension configuration leads to an increase in the accuracy compared to the result of each model with the same hyperparameters and no scaling, ranging from 0.16 to 76.79 percentage points. We also show that if we keep the transformation method constant, there is a statistically significant difference in accuracy results when applying it across different dimensions, with accuracy differences ranging from 0.23 to 47.79 percentage points. Finally, we explore the relation of the transformation methods and dimensions to the classifiers, and we conclude that there is no prominent general trend, and the optimal configuration is dataset- and classifier-specific. | 翻訳日:2023-04-13 18:48:05 公開日:2023-04-12 |
# Mask3D: 3Dセマンティックインスタンスセグメンテーションのためのマスク変換器 Mask3D: Mask Transformer for 3D Semantic Instance Segmentation ( http://arxiv.org/abs/2210.03105v2 ) ライセンス: Link先を確認 | Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe | (参考訳) 現代の3dセマンティクスインスタンスセグメンテーションのアプローチは、主に特殊な投票機構と、注意深く設計された幾何学的クラスタリング技術に依存している。
Mask3Dは新しい最先端ScanNetテスト(+6.2 mAP)、S3DIS 6-fold(+10.1 mAP)、STPLS3D(+11.2 mAP)、ScanNet200テスト(+12.4 mAP)をセットする。 Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. In our model called Mask3D each object instance is represented as an instance query. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Combined with point features, the instance queries directly yield all instance masks in parallel. Mask3D has several advantages over current state-of-the-art approaches, since it neither relies on (1) voting schemes which require hand-selected geometric properties (such as centers) nor (2) geometric grouping mechanisms requiring manually-tuned hyper-parameters (e.g. radii) and (3) enables a loss that directly optimizes instance masks. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP). | 翻訳日:2023-04-13 18:47:16 公開日:2023-04-12 |
# TimesNet: 時系列解析のための時間的2次元変動モデリング TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis ( http://arxiv.org/abs/2210.02186v3 ) ライセンス: Link先を確認 | Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long | (参考訳) 時系列解析は、天気予報、異常検出、行動認識などの広範囲の応用において非常に重要である。
コードは、このリポジトリで入手できる。 Time series analysis is of immense importance in extensive applications, such as weather forecasting, anomaly detection, and action recognition. This paper focuses on temporal variation modeling, which is the common key problem of extensive analysis tasks. Previous methods attempt to accomplish this directly from the 1D time series, which is extremely challenging due to the intricate temporal patterns. Based on the observation of multi-periodicity in time series, we ravel out the complex temporal variations into the multiple intraperiod- and interperiod-variations. To tackle the limitations of 1D time series in representation capability, we extend the analysis of temporal variations into the 2D space by transforming the 1D time series into a set of 2D tensors based on multiple periods. This transformation can embed the intraperiod- and interperiod-variations into the columns and rows of the 2D tensors respectively, making the 2D-variations to be easily modeled by 2D kernels. Technically, we propose the TimesNet with TimesBlock as a task-general backbone for time series analysis. TimesBlock can discover the multi-periodicity adaptively and extract the complex temporal variations from transformed 2D tensors by a parameter-efficient inception block. Our proposed TimesNet achieves consistent state-of-the-art in five mainstream time series analysis tasks, including short- and long-term forecasting, imputation, classification, and anomaly detection. Code is available at this repository: https://github.com/thuml/TimesNet. | 翻訳日:2023-04-13 18:46:22 公開日:2023-04-12 |
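The abstract above describes transforming a 1D series into 2D tensors based on its periods, so that each column holds one period (intraperiod variation) and each row holds the same phase across periods (interperiod variation). The sketch below illustrates that reshaping in plain Python. It is a simplified illustration under stated assumptions, not the TimesNet implementation: it picks a single dominant period with a naive DFT and zero-pads the tail, and the function names are hypothetical.

```python
import cmath
import math

def dominant_period(x):
    """Return T // k for the non-zero frequency k with the largest DFT amplitude."""
    T = len(x)
    best_k, best_amp = 1, -1.0
    for k in range(1, T // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return T // best_k

def fold(x, period):
    """Fold a 1D series into a (period x n_periods) grid, zero-padding the tail.

    grid[i][j] is the value at phase i of period j, so a column is one period
    and a row tracks one phase across successive periods.
    """
    n_periods = math.ceil(len(x) / period)
    padded = list(x) + [0.0] * (n_periods * period - len(x))
    return [[padded[j * period + i] for j in range(n_periods)] for i in range(period)]

series = [math.sin(2 * math.pi * t / 4) for t in range(16)]
p = dominant_period(series)
grid = fold(series, p)
print(p, len(grid), len(grid[0]))
```

Once the series is laid out this way, ordinary 2D kernels can see both kinds of variation at once, which is the motivation the abstract gives for the transformation.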
# 3層サンプリングとパノプティカル表現を用いた都市規模インクリメンタルニューラルマッピング City-scale Incremental Neural Mapping with Three-layer Sampling and Panoptic Representation ( http://arxiv.org/abs/2209.14072v2 ) ライセンス: Link先を確認 | Yongliang Shi, Runyi Yang, Pengfei Li, Zirui Wu, Hao Zhao, Guyue Zhou | (参考訳) ニューラルな暗黙の表現は、表現力があり、連続的でコンパクトであるため、最近ロボットコミュニティから多くの注目を集めている。
コードとモデルは公開されます。 Neural implicit representations are drawing a lot of attention from the robotics community recently, as they are expressive, continuous and compact. However, city-scale continual implicit dense mapping based on sparse LiDAR input is still an under-explored challenge. To this end, we successfully build a city-scale continual neural mapping system with a panoptic representation that consists of environment-level and instance-level modelling. Given a stream of sparse LiDAR point cloud, it maintains a dynamic generative model that maps 3D coordinates to signed distance field (SDF) values. To address the difficulty of representing geometric information at different levels in city-scale space, we propose a tailored three-layer sampling strategy to dynamically sample the global, local and near-surface domains. Meanwhile, to realize high fidelity mapping of instance under incomplete observation, category-specific prior is introduced to better model the geometric details. We evaluate on the public SemanticKITTI dataset and demonstrate the significance of the newly proposed three-layer sampling strategy and panoptic representation, using both quantitative and qualitative results. Codes and model will be publicly available. | 翻訳日:2023-04-13 18:45:42 公開日:2023-04-12 |
# トリパーティイト非局所性を用いたデバイス非依存暗号の高速化 Boosting device-independent cryptography with tripartite nonlocality ( http://arxiv.org/abs/2209.12828v2 ) ライセンス: Link先を確認 | Federico Grasselli, Gláucia Murta, Hermann Kampermann, Dagmar Bruß | (参考訳) DI会議鍵契約(DICKA)やDIランダムネス拡張(DIRE)のようなデバイス非依存(DI)プロトコルは、2つ以上のパーティがベルの不等式をテストすると、非局所的相関を観察することによってプライベートランダム性を検証する。
さらに,DICKAの必要性は疑問視されているものの,真の多部絡み合いは多部DIREの前提条件ではないことが確認された。 Device-independent (DI) protocols, such as DI conference key agreement (DICKA) and DI randomness expansion (DIRE), certify private randomness by observing nonlocal correlations when two or more parties test a Bell inequality. While most DI protocols are restricted to bipartite Bell tests, harnessing multipartite nonlocal correlations may lead to better performance. Here, we consider tripartite DICKA and DIRE protocols based on testing multipartite Bell inequalities, specifically: the Mermin-Ardehali-Belinskii-Klyshko (MABK) inequality, and the Holz and the Parity-CHSH inequalities introduced in the context of DICKA protocols. We evaluate the asymptotic performance of the DICKA (DIRE) protocols in terms of their conference key rate (net randomness generation rate), by deriving lower bounds on the conditional von Neumann entropy of one party's outcome and two parties' outcomes. For the Holz inequality, we prove a tight analytical lower bound on the one-outcome entropy and conjecture a tight lower bound on the two-outcome entropy. We additionally re-derive the analytical one-outcome entropy bound for the MABK inequality with a much simpler method and obtain a numerical lower bound on the two-outcome entropy for the Parity-CHSH inequality. Our simulations show that DICKA and DIRE protocols employing tripartite Bell inequalities can significantly outperform their bipartite counterparts. Moreover, we establish that genuine multipartite entanglement is not a precondition for multipartite DIRE while its necessity for DICKA remains an open question. | 翻訳日:2023-04-13 18:45:20 公開日:2023-04-12 |
# 実演からの高速長寿命適応逆強化学習 Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations ( http://arxiv.org/abs/2209.11908v7 ) ライセンス: Link先を確認 | Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay | (参考訳) 実証から学ぶ(LfD)アプローチは、エンドユーザーに対して、望ましい振る舞いのデモを通じてロボットに新しいタスクを教えること、ロボット工学へのアクセスを民主化する。
本稿では,新しいLfDフレームワークであるFast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR)を提案する。
最後に,テーブルテニスにおけるFLAIRの成功を実証し,FLAIRをより高いタスク (p<.05) とパーソナライズ性能 (p<.05) で評価した。 Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance. | 翻訳日:2023-04-13 18:44:46 公開日:2023-04-12 |
# 非線形熱電流の量子力学理論 Quantum kinetic theory of nonlinear thermal current ( http://arxiv.org/abs/2211.01895v2 ) ライセンス: Link先を確認 | Harsh Varshney, Kamal Das, Pankaj Bhalla, and Amit Agarwal | (参考訳) 温度勾配による2次非線形電子熱輸送について検討する。
これを用いて, 固有散乱時間独立非線形熱電流と, 既知の非線形ドリュードおよびベリー曲率双極子寄与を予測した。
傾斜した大規模ディラック系における熱応答の研究に, 理論を応用した。
異なる散乱時間依存性に加えて, 種々の電流寄与は低温限界において異なる温度依存性を有することを示す。
非線形熱輸送の系統的および包括的理論は,本質的熱応答に関する将来の理論的および実験的研究の道を開く。 We investigate the second-order nonlinear electronic thermal transport induced by temperature gradient. We develop the quantum kinetic theory framework to describe thermal transport in presence of a temperature gradient. Using this, we predict an intrinsic scattering time independent nonlinear thermal current in addition to the known extrinsic nonlinear Drude and Berry curvature dipole contributions. We show that the intrinsic thermal current is determined by the band geometric quantities and is non-zero only in systems where both the space inversion and time-reversal symmetries are broken. We employ the developed theory to study the thermal response in tilted massive Dirac systems. We show that besides the different scattering time dependence, the various current contributions have distinct temperature dependence in the low temperature limit. Our systematic and comprehensive theory for nonlinear thermal transport paves the way for future theoretical and experimental studies on intrinsic thermal responses. | 翻訳日:2023-04-13 18:37:04 公開日:2023-04-12 |
# M3FGM:ノードマスキングと多粒度メッセージパスベースフェデレーショングラフモデルによる時空間データ予測 M3FGM:a node masking and multi-granularity message passing-based federated graph model for spatial-temporal data prediction ( http://arxiv.org/abs/2210.16193v2 ) ライセンス: Link先を確認 | Yuxing Tian, Zheng Liu, Yanwen Qu, Song Li, Jiachi Luo | (参考訳) 研究者たちは、プライバシーとセキュリティの制約に関して、連合学習(fl)とグラフモデルを組み合わせることで、空間-時間予測の課題を解決している。
1) クライアントは,推論フェーズ中にサーバにアクセスできないかもしれない。
2) サーバモデルで手動で設計したクライアントのグラフは,クライアント間の適切な関係を明らかにするものではない。
本稿では,これらの問題に対して,新しいGNN指向の分割フェデレーテッド学習法であるNode Masking and Multi-granularity Message passing-based Federated Graph Model (M$^3$FGM)を提案する。
2つ目の問題として、MGMP(Multi-Granularity Message Passing)層と呼ばれる新しいGNN層が、各クライアントノードがグローバルおよびローカル情報を知覚できるようにする。
その結果、M$^3$FGMはベースラインと変種モデルより優れており、データセットとシナリオの両方で最高の結果が得られることがわかった。 Researchers are solving the challenges of spatial-temporal prediction by combining Federated Learning (FL) and graph models with respect to the constraints of privacy and security. In order to make better use of the power of graph models, some studies also combine split learning (SL). However, there are still several issues left unattended: 1) Clients might not be able to access the server during inference phase; 2) The graph of clients designed manually in the server model may not reveal the proper relationship between clients. This paper proposes a new GNN-oriented split federated learning method, named Node Masking and Multi-granularity Message passing-based Federated Graph Model (M$^3$FGM) for the above issues. For the first issue, the server model of M$^3$FGM employs a MaskNode layer to simulate the case of clients being offline. We also redesign the decoder of the client model using a dual-sub-decoders structure so that each client model can use its local data to predict independently when offline. As for the second issue, a new GNN layer named Multi-Granularity Message Passing (MGMP) layer enables each client node to perceive global and local information. We conducted extensive experiments in two different scenarios on two real traffic datasets. Results show that M$^3$FGM outperforms the baselines and variant models, achieves the best results in both datasets and scenarios. | 翻訳日:2023-04-13 18:36:26 公開日:2023-04-12 |
# LittleBird: 質問応答のための高速でより長い変換器 LittleBird: Efficient Faster & Longer Transformer for Question Answering ( http://arxiv.org/abs/2210.11870v2 ) ライセンス: Link先を確認 | Minchul Lee (1), Kijong Han (1), Myeong Cheol Shin (1) ((1) Kakao Enterprise Corp.) | (参考訳) BERTは様々なNLPタスクで多くの成功を示してきた。
特に,Attention with Linear Biases (ALiBi) に基づく,より柔軟で効率的な位置表現法を提案する。
また,BigBirdにおけるグローバル情報の表現方法を pack and unpack attention に置き換えることがより効果的であることを示す。
その結果、LittleBirdは様々な言語で非常にうまく機能し、特にKorQuAD2.0, Korean Question Answering Datasetにおいて、質問応答タスクの高性能化を実現していることがわかった。 BERT has shown a lot of success in a wide variety of NLP tasks. But it has a limitation dealing with long inputs due to its attention mechanism. Longformer, ETC and BigBird addressed this issue and effectively solved the quadratic dependency problem. However we find that these models are not sufficient, and propose LittleBird, a novel model based on BigBird with improved speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing the method of global information represented in the BigBird with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently reusing existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages where large amounts of long text data are difficult to obtain. As a result, our experiments show that LittleBird works very well in a variety of languages, achieving high performance in question answering tasks, particularly in KorQuAD2.0, Korean Question Answering Dataset for long paragraphs. | 翻訳日:2023-04-13 18:35:06 公開日:2023-04-12 |
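The abstract above says the position representation is based on Attention with Linear Biases (ALiBi). The sketch below shows only the basic ALiBi idea, not LittleBird's modified variant: a head-specific slope times the token distance is subtracted from the attention logits. Using the symmetric distance |i - j| is an assumption made here because question-answering encoders are bidirectional (the original ALiBi formulation is causal), and the function names are hypothetical.

```python
def alibi_slopes(n_heads):
    """Geometric slopes 2^(-8h/n) for h = 1..n, as in ALiBi for power-of-two head counts."""
    return [2.0 ** (-8.0 * h / n_heads) for h in range(1, n_heads + 1)]

def alibi_bias(seq_len, slope):
    """Bias matrix added to attention logits: -slope * |i - j| (symmetric variant, assumed)."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]

slopes = alibi_slopes(8)
bias = alibi_bias(3, slopes[0])
print(slopes[0], bias[0])
```

Since the bias depends only on relative distance and has no learned position table, a model trained on short inputs can be applied to longer ones, which matches the extrapolation property the abstract highlights.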
# 構造クラスタリングに基づく自己教師付き不均質グラフ事前学習 Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering ( http://arxiv.org/abs/2210.10462v2 ) ライセンス: Link先を確認 | Yaming Yang, Ziyu Guan, Zhe Wang, Wei Zhao, Cai Xu, Weigang Lu, Jianbin Huang | (参考訳) 近年, 異種情報ネットワーク (HIN) 上の自己教師付き事前学習手法は, 従来の半教師付きヘテロジニアスグラフニューラルネットワーク (HGNN) と比較して, 有望な競争力を示している。
ソースコードはhttps://github.com/kepsail/shgp。 Recent self-supervised pre-training methods on Heterogeneous Information Networks (HINs) have shown promising competitiveness over traditional semi-supervised Heterogeneous Graph Neural Networks (HGNNs). Unfortunately, their performance heavily depends on careful customization of various strategies for generating high-quality positive examples and negative examples, which notably limits their flexibility and generalization ability. In this work, we present SHGP, a novel Self-supervised Heterogeneous Graph Pre-training approach, which does not need to generate any positive examples or negative examples. It consists of two modules that share the same attention-aggregation scheme. In each iteration, the Att-LPA module produces pseudo-labels through structural clustering, which serve as the self-supervision signals to guide the Att-HGNN module to learn object embeddings and attention coefficients. The two modules can effectively utilize and enhance each other, promoting the model to learn discriminative embeddings. Extensive experiments on four real-world datasets demonstrate the superior effectiveness of SHGP against state-of-the-art unsupervised baselines and even semi-supervised baselines. We release our source code at: https://github.com/kepsail/SHGP. | 翻訳日:2023-04-13 18:34:46 公開日:2023-04-12 |
# 不完全情報に基づく知識グラフの品質評価 Knowledge Graph Quality Evaluation under Incomplete Information ( http://arxiv.org/abs/2212.00994v3 ) ライセンス: Link先を確認 | Xiaodong Li, Chenxin Zou, Yi Cai, Yuelong Zhu | (参考訳) 知識グラフ(KG)は多くのタスクにおける基本的な役割のため、ますます注目を集めている。
4組のKGの実験結果から,QEIIはベースラインと比較して,不完全情報下での能力レベルにおいて合理的な品質評価を行うことを示した。 Knowledge graphs (KGs) have attracted more and more attentions because of their fundamental roles in many tasks. Quality evaluation for KGs is thus crucial and indispensable. Existing methods in this field evaluate KGs by either proposing new quality metrics from different dimensions or measuring performances at KG construction stages. However, there are two major issues with those methods. First, they highly rely on raw data in KGs, which makes KGs' internal information exposed during quality evaluation. Second, they consider more about the quality at data level instead of ability level, where the latter one is more important for downstream applications. To address these issues, we propose a knowledge graph quality evaluation framework under incomplete information (QEII). The quality evaluation task is transformed into an adversarial Q&A game between two KGs. Winner of the game is thus considered to have better qualities. During the evaluation process, no raw data is exposed, which ensures information protection. Experimental results on four pairs of KGs demonstrate that, compared with baselines, the QEII implements a reasonable quality evaluation at ability level under incomplete information. | 翻訳日:2023-04-13 18:28:10 公開日:2023-04-12 |
# 非エルミート量子系に対する半古典的フシミ分布 Semiclassical Husimi distributions for non-Hermitian quantum systems ( http://arxiv.org/abs/2211.15336v2 ) ライセンス: Link先を確認 | Joesph Hall, Simon Malzard, and Eva-Maria Graefe | (参考訳) 非エルミート量子系におけるシュールベクトルの半古典位相空間密度を構築する。
この構成の一般性を示すために、混合的およびカオス的古典力学の条件下でのPT対称キックローターを非常に非自明な例に適用する。 We construct a semiclassical phase-space density of Schur vectors in non-Hermitian quantum systems. Each Schur vector is associated to a single Planck cell. The Schur states are organised according to a classical norm landscape on phase space - a classical manifestation of the lifetimes which are characteristic of non-Hermitian systems. To demonstrate the generality of this construction we apply it to a highly non-trivial example, a PT-symmetric kicked rotor in the regimes of mixed and chaotic classical dynamics. | 翻訳日:2023-04-13 18:27:43 公開日:2023-04-12 |
# クラス適応型ネットワーク校正 Class Adaptive Network Calibration ( http://arxiv.org/abs/2211.15088v2 ) ライセンス: Link先を確認 | Bingyuan Liu, Jérôme Rony, Adrian Galdran, Jose Dolz, Ismail Ben Ayed | (参考訳) 最近の研究では、従来の精度以上のキャリブレーションは、現代のディープニューラルネットワークのトレーニングにも考慮すべきであることが示されている。
1) スカラーバランスの重みは,すべてのクラスにおいて同じであり,クラス間の内在的困難や不均衡に対処する能力を妨げる。
2) バランスウェイトは適応戦略を使わずに固定され, 精度とキャリブレーションの最良の妥協点に達するのを防ぎ, 各アプリケーションに対してハイパーパラメーター探索が必要となる。
コードはhttps://github.com/by-liu/CALSで公開されている。 Recent studies have revealed that, beyond conventional accuracy, calibration should also be considered for training modern deep neural networks. To address miscalibration during learning, some methods have explored different penalty functions as part of the learning objective, alongside a standard classification loss, with a hyper-parameter controlling the relative contribution of each term. Nevertheless, these methods share two major drawbacks: 1) the scalar balancing weight is the same for all classes, hindering the ability to address different intrinsic difficulties or imbalance among classes; and 2) the balancing weight is usually fixed without an adaptive strategy, which may prevent from reaching the best compromise between accuracy and calibration, and requires hyper-parameter search for each application. We propose Class Adaptive Label Smoothing (CALS) for calibrating deep networks, which allows to learn class-wise multipliers during training, yielding a powerful alternative to common label smoothing penalties. Our method builds on a general Augmented Lagrangian approach, a well-established technique in constrained optimization, but we introduce several modifications to tailor it for large-scale, class-adaptive training. Comprehensive evaluation and multiple comparisons on a variety of benchmarks, including standard and long-tailed image classification, semantic segmentation, and text classification, demonstrate the superiority of the proposed method. The code is available at https://github.com/by-liu/CALS. | 翻訳日:2023-04-13 18:27:35 公開日:2023-04-12 |
# ジョブショップスケジューリングのための教師付き学習による制約プログラミングの強化 Enhancing Constraint Programming via Supervised Learning for Job Shop Scheduling ( http://arxiv.org/abs/2211.14492v2 ) ライセンス: Link先を確認 | Yuan Sun, Su Nguyen, Dhananjay Thiruvady, Xiaodong Li, Andreas T. Ernst and Uwe Aickelin | (参考訳) 制約プログラミング(cp)は制約満足度と最適化問題を解決する強力な手法である。
最後に,機械学習に基づく変数順序付け手法を従来のドメインベース手法と併用することが有用であることを示す。 Constraint programming (CP) is a powerful technique for solving constraint satisfaction and optimization problems. In CP solvers, the variable ordering strategy used to select which variable to explore first in the solving process has a significant impact on solver effectiveness. To address this issue, we propose a novel variable ordering strategy based on supervised learning, which we evaluate in the context of job shop scheduling problems. Our learning-based methods predict the optimal solution of a problem instance and use the predicted solution to order variables for CP solvers. Unlike traditional variable ordering methods, our methods can learn from the characteristics of each problem instance and customize the variable ordering strategy accordingly, leading to improved solver performance. Our experiments demonstrate that training machine learning models is highly efficient and can achieve high accuracy. Furthermore, our learned variable ordering methods perform competitively when compared to four existing methods. Finally, we demonstrate that hybridising the machine learning-based variable ordering methods with traditional domain-based methods is beneficial. | 翻訳日:2023-04-13 18:27:11 公開日:2023-04-12 |
# pic-score:複数生体認証における最適一致信頼度のための確率的解釈可能な比較スコア PIC-Score: Probabilistic Interpretable Comparison Score for Optimal Matching Confidence in Single- and Multi-Biometric (Face) Recognition ( http://arxiv.org/abs/2211.12483v2 ) ライセンス: Link先を確認 | Pedro C. Neto, Ana F. Sequeira, Jaime S. Cardoso, Philipp Terhörst | (参考訳) 生体認証学の文脈では、信頼の一致とは、与えられた一致した決定が正しいという自信を指す。
コードは公開されている。 In the context of biometrics, matching confidence refers to the confidence that a given matching decision is correct. Since many biometric systems operate in critical decision-making processes, such as in forensics investigations, accurately and reliably stating the matching confidence becomes of high importance. Previous works on biometric confidence estimation can well differentiate between high and low confidence, but lack interpretability. Therefore, they do not provide accurate probabilistic estimates of the correctness of a decision. In this work, we propose a probabilistic interpretable comparison (PIC) score that accurately reflects the probability that the score originates from samples of the same identity. We prove that the proposed approach provides optimal matching confidence. Contrary to other approaches, it can also optimally combine multiple samples in a joint PIC score which further increases the recognition and confidence estimation performance. In the experiments, the proposed PIC approach is compared against all biometric confidence estimation methods available on four publicly available databases and five state-of-the-art face recognition systems. The results demonstrate that PIC has a significantly more accurate probabilistic interpretation than similar approaches and is highly effective for multi-biometric recognition. The code is publicly-available. | 翻訳日:2023-04-13 18:26:55 公開日:2023-04-12 |
# セルフアンサンブル保護:トレーニングチェックポイントは優れたデータプロテクター Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors ( http://arxiv.org/abs/2211.12005v3 ) ライセンス: Link先を確認 | Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang | (参考訳) データがますます重要になるにつれて、競争相手が高性能モデルのトレーニングに使用するため、企業はデータを公開することに非常に慎重になるでしょう。
3つのデータセットと5つのアーキテクチャの9つのベースラインによる広範囲な実験により、sepは新たな最先端である、例えば、cifar-10 resnet18の精度を94.56%から14.68%に低下させる。
コードはhttps://github.com/Sizhe-Chen/SEPで入手できる。 As data becomes increasingly vital, a company would be very cautious about releasing data, because the competitors could use it to train high-performance models, thereby posing a tremendous threat to the company's commercial competence. To prevent training good models on the data, we could add imperceptible perturbations to it. Since such perturbations aim at hurting the entire training process, they should reflect the vulnerability of DNN training, rather than that of a single model. Based on this new idea, we seek perturbed examples that are always unrecognized (never correctly classified) in training. In this paper, we uncover them by model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs ignoring normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning that they are as diverse as DNNs with different architectures. That is, our amazing performance of ensemble only requires the computation of training one model. By extensive experiments with 9 baselines on 3 datasets and 5 architectures, SEP is verified to be a new state-of-the-art, e.g., our small $\ell_\infty=2/255$ perturbations reduce the accuracy of a CIFAR-10 ResNet18 from 94.56% to 14.68%, compared to 41.35% by the best-known method. Code is available at https://github.com/Sizhe-Chen/SEP. | 翻訳日:2023-04-13 18:26:37 公開日:2023-04-12 |
# web ベース質問応答とマルチモーダル融合を用いた知識ベース補完 Knowledge Base Completion using Web-Based Question Answering and Multimodal Fusion ( http://arxiv.org/abs/2211.07098v3 ) ライセンス: Link先を確認 | Yang Peng | (参考訳) 過去数年間、大量の知識を蓄積する大規模な知識基盤が構築されてきた。
抽出品質を向上させるため、質問応答システムは、エンティティタイプやエンティティ間関連性といった知識ベースからの構造化情報を用いる。 Over the past few years, large knowledge bases have been constructed to store massive amounts of knowledge. However, these knowledge bases are highly incomplete. To solve this problem, we propose a web-based question answering system with multimodal fusion of unstructured and structured information, to fill in missing information for knowledge bases. To utilize unstructured information from the Web for knowledge base completion, we design a web-based question answering system using multimodal features and question templates to extract missing facts, which can achieve good performance with very few questions. To help improve extraction quality, the question answering system employs structured information from knowledge bases, such as entity types and entity-to-entity relatedness. | 翻訳日:2023-04-13 18:26:05 公開日:2023-04-12 |
# 単位キュービットチャネルについて On unital qubit channels ( http://arxiv.org/abs/2301.01358v2 ) ライセンス: Link先を確認 | Chi-Kwong Li and Man-Duen Choi | (参考訳) 局所ユニタリ変換の下でのユニタリ量子ビットチャネルの正準形式を得る。
より一般に、ユニタリな量子ビットチャネルは、対流係数 $p_1, \dots, p_m$ を持つユニタリチャネルの凸結合として表現することができ、また、チャネルのchoi行列の固有値のベクトルによって、$(p_1, \dots, p_m)$ がメジャー化される。
ブロッホ球面を対応する楕円体に送る自然線型写像の詳細な構造を考察する。 A canonical form for unital qubit channels under local unitary transforms is obtained. In particular, it is shown that the eigenvalues of the Choi matrix of a unital quantum channel form a complete set of invariants of the canonical form. It follows immediately that every unital qubit channel is the average of four unitary channels. More generally, a unital qubit channel can be expressed as the convex combination of unitary channels with convex coefficients $p_1, \dots, p_m$ as long as $2(p_1, \dots, p_m)$ is majorized by the vector of eigenvalues of the Choi matrix of the channel. A unital qubit channel in the canonical form will transform the Bloch sphere onto an ellipsoid. We look into the detailed structure of the natural linear maps sending the Bloch sphere onto a corresponding ellipsoid. | 翻訳日:2023-04-13 18:18:32 公開日:2023-04-12 |
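The eigenvalue and majorization statements in the abstract above can be checked numerically on a small example. The following is a minimal sketch, not the authors' construction: it assumes a Pauli channel as a stand-in for a unital qubit channel in canonical form, uses the unnormalised Choi matrix (trace 2), and the helper names `pauli_channel`, `choi_matrix`, and `is_majorized` are hypothetical.

```python
import numpy as np

# Pauli matrices; conjugation by each one is a unitary qubit channel.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = (I2, X, Y, Z)


def pauli_channel(probs):
    """Return the map rho -> sum_k p_k * sigma_k rho sigma_k^dagger (assumed example channel)."""
    def channel(rho):
        return sum(p * s @ rho @ s.conj().T for p, s in zip(probs, PAULIS))
    return channel


def choi_matrix(channel):
    """Unnormalised Choi matrix sum_{ij} E_ij (x) channel(E_ij); its trace is 2 for a qubit channel."""
    choi = np.zeros((4, 4), dtype=complex)
    for i in range(2):
        for j in range(2):
            unit = np.zeros((2, 2), dtype=complex)
            unit[i, j] = 1
            choi += np.kron(unit, channel(unit))
    return choi


def is_majorized(x, y, tol=1e-9):
    """True if x is majorized by y; the shorter vector is padded with zeros."""
    n = max(len(x), len(y))
    xs = np.sort(np.pad(np.asarray(x, dtype=float), (0, n - len(x))))[::-1]
    ys = np.sort(np.pad(np.asarray(y, dtype=float), (0, n - len(y))))[::-1]
    if abs(xs.sum() - ys.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))


probs = [0.4, 0.3, 0.2, 0.1]           # illustrative mixing weights
channel = pauli_channel(probs)
eigs = np.linalg.eigvalsh(choi_matrix(channel))

print("unital:", np.allclose(channel(I2), I2))
print("Choi eigenvalues:", np.round(eigs, 6))
print("uniform 4-term mixture allowed:", is_majorized([0.5] * 4, eigs))
print("equal 2-term mixture allowed:", is_majorized([1.0, 1.0], eigs))
```

In this example the Choi eigenvalues are twice the mixing weights, the vector $2(\tfrac14, \tfrac14, \tfrac14, \tfrac14)$ passes the majorization test (matching the "average of four unitary channels" statement), and an equal mixture of two unitaries does not.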
# Floquetエンジニアリングによるソリトン列車の生成 Generating soliton trains through Floquet engineering ( http://arxiv.org/abs/2212.11904v2 ) ライセンス: Link先を確認 | Pablo Blanco-Mas and Charles E. Creffield | (参考訳) 光格子電位の存在下でパラボリックトラップに保持された超低温粒子の相互作用ガスについて検討した。
Floquet法は, 低温原子系におけるソリトン生成法として有用かつ安定した手法である。 We study a gas of interacting ultracold bosons held in a parabolic trap in the presence of an optical lattice potential. Treating the system as a discretised Gross-Pitaevskii model, we show how Floquet engineering, by rapidly ``shaking'' the lattice, allows the ground-state of the system to be converted into a train of bright solitons by inverting the sign of the hopping energy. We study how the number of solitons produced depends on the system's nonlinearity and the curvature of the trap, show how the technique can be applied both in the high and low driving-frequency regimes, and demonstrate the phenomenon's stability against noise. We conclude that the Floquet approach is a useful and stable method of preparing solitons in cold atom systems. | 翻訳日:2023-04-13 18:18:02 公開日:2023-04-12 |
# ddcolor:デュアルデコーダによるフォトリアリスティック・セマンティックアウェア画像のカラー化に向けて DDColor: Towards Photo-Realistic and Semantic-Aware Image Colorization via Dual Decoders ( http://arxiv.org/abs/2212.11613v3 ) ライセンス: Link先を確認 | Xiaoyang Kang, Tao Yang, Wenqi Ouyang, Peiran Ren, Lingzhi Li, Xuansong Xie | (参考訳) 画像の自動着色は難しい問題だ。
コードはhttps://github.com/piddnad/DDColor で公開される。 Automatic image colorization is a challenging problem. Due to the high ill-posedness and multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Recent transformer-based methods can deliver better results, but they often rely on manually designed priors, which are hard to implement and suffer from poor generalization ability. Moreover, they tend to introduce serious color bleeding effects since color attention is performed on single-scale features, thus failing to exploit sufficient semantic information. To address these issues, we propose DDColor, a new end-to-end method with dual decoders for image colorization. Our approach includes a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between color and semantic representations via cross-attention. Rather than using additional priors, our two decoders work together to leverage multi-scale image features to guide optimization of adaptive color queries, significantly alleviating color bleeding effects. In addition, a simple yet effective colorfulness loss is introduced to further enhance the color richness of generated results. Our extensive experiments demonstrate that DDColor achieves significantly superior performance to existing state-of-the-art works both quantitatively and qualitatively. Codes will be made publicly available at https://github.com/piddnad/DDColor. | 翻訳日:2023-04-13 18:17:49 公開日:2023-04-12 |
# 質問応答のためのモーメントコントラスト事前学習 Momentum Contrastive Pre-training for Question Answering ( http://arxiv.org/abs/2212.05762v2 ) ライセンス: Link先を確認 | Minda Hu, Muzhi Li, Yasheng Wang and Irwin King | (参考訳) 既存の抽出質問回答(QA)の事前学習手法は、構文構造において自然質問とは異なるクローゼのようなクエリを生成する。
そこで本研究では,抽出QAのための新しいMomentum Contrastive pRe-training fOr queStion anSwering(MCROSS)法を提案する。
3つのベンチマークQAデータセットによる実験結果から,本手法は教師付きシナリオとゼロショットシナリオの両方のベースラインと比較して顕著な改善が得られた。 Existing pre-training methods for extractive Question Answering (QA) generate cloze-like queries different from natural questions in syntax structure, which could overfit pre-trained models to simple keyword matching. In order to address this problem, we propose a novel Momentum Contrastive pRe-training fOr queStion anSwering (MCROSS) method for extractive QA. Specifically, MCROSS introduces a momentum contrastive learning framework to align the answer probability between cloze-like and natural query-passage sample pairs. Hence, the pre-trained models can better transfer the knowledge learned in cloze-like samples to answering natural questions. Experimental results on three benchmarking QA datasets show that our method achieves noticeable improvement compared with all baselines in both supervised and zero-shot scenarios. | 翻訳日:2023-04-13 18:17:07 公開日:2023-04-12 |
# 人間互換自動車を目指して:感情遷移モデルを用いた自動走行における非言語チューリングテストの検討 Towards human-compatible autonomous car: A study of non-verbal Turing test in automated driving with affective transition modelling ( http://arxiv.org/abs/2212.02908v5 ) ライセンス: Link先を確認 | Zhaoning Li, Qiaoli Jiang, Zhengming Wu, Anqi Liu, Haiyan Wu, Miner Huang, Kai Huang and Yixuan Ku | (参考訳) 人間がハンズフリーの道を進むとき、自動運転車は不可欠だ。
本研究は、自律運転の今後の方向性となる乗客の人間性記述における情緒変化の重要な役割を示唆する。 Autonomous cars are indispensable when humans go further down the hands-free route. Although existing literature highlights that the acceptance of the autonomous car will increase if it drives in a human-like manner, sparse research offers the naturalistic experience from a passenger's seat perspective to examine the human likeness of current autonomous cars. The present study tested whether the AI driver could create a human-like ride experience for passengers based on 69 participants' feedback in a real-road scenario. We designed a ride experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as a passenger and judged whether the driver was human or AI. The AI driver failed to pass our test because passengers detected the AI driver above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness rating behaviour. We employed affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength of our model. Results showed that the passengers' ascription of humanness would increase with the greater affective transition. Our study suggested an important role of affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving. | 翻訳日:2023-04-13 18:16:38 公開日:2023-04-12 |
# 最適輸送としての階層的政策 Hierarchical Policy Blending As Optimal Transport ( http://arxiv.org/abs/2212.01938v3 ) ライセンス: Link先を確認 | An T. Le, Kay Hansel, Jan Peters, Georgia Chalvatzaki | (参考訳) 最適輸送 (HiPBOT) として階層的政策ブレンディングを提案する。
詳細はhttps://sites.google.com/view/hipobotを参照。 We present hierarchical policy blending as optimal transport (HiPBOT). HiPBOT hierarchically adjusts the weights of low-level reactive expert policies of different agents by adding a look-ahead planning layer on the parameter space. The high-level planner renders policy blending as unbalanced optimal transport consolidating the scaling of the underlying Riemannian motion policies. As a result, HiPBOT effectively decides the priorities between expert policies and agents, ensuring the task's success and guaranteeing safety. Experimental results in several application scenarios, from low-dimensional navigation to high-dimensional whole-body control, show the efficacy and efficiency of HiPBOT. Our method outperforms state-of-the-art baselines -- either adopting probabilistic inference or defining a tree structure of experts -- paving the way for new applications of optimal transport to robot control. More material at https://sites.google.com/view/hipobot | 翻訳日:2023-04-13 18:16:17 公開日:2023-04-12 |
# マルチモーダル知識グラフ上のマルチモーダルパス融合を用いたクエリ駆動知識ベース補完 Query-Driven Knowledge Base Completion using Multimodal Path Fusion over Multimodal Knowledge Graph ( http://arxiv.org/abs/2212.01923v2 ) ライセンス: Link先を確認 | Yang Peng | (参考訳) 過去数年間、大量の知識を蓄積する大規模な知識基盤が構築されてきた。
システムの有効性と効率を実証する大規模な実験が実施されている。 Over the past few years, large knowledge bases have been constructed to store massive amounts of knowledge. However, these knowledge bases are highly incomplete, for example, over 70% of people in Freebase have no known place of birth. To solve this problem, we propose a query-driven knowledge base completion system with multimodal fusion of unstructured and structured information. To effectively fuse unstructured information from the Web and structured information in knowledge bases to achieve good performance, our system builds multimodal knowledge graphs based on question answering and rule inference. We propose a multimodal path fusion algorithm to rank candidate answers based on different paths in the multimodal knowledge graphs, achieving much better performance than question answering, rule inference and a baseline fusion algorithm. To improve system efficiency, query-driven techniques are utilized to reduce the runtime of our system, providing fast responses to user queries. Extensive experiments have been conducted to demonstrate the effectiveness and efficiency of our system. | 翻訳日:2023-04-13 18:16:02 公開日:2023-04-12 |
# 多体物理学からの強い量子メトロロジー限界 Strong quantum metrological limit from many-body physics ( http://arxiv.org/abs/2301.12113v2 ) ライセンス: Link先を確認 | Yaoming Chu, Xiangbei Li, and Jianming Cai | (参考訳) 標準の量子限界を超え、量子エンタングルメントを用いてハイゼンベルク限界に達することさえも、量子メトロロジーの聖杯を表している。
これにより、量子力学の量子的優位性を達成するのに不可欠な量子多体系の本質的特徴を特定でき、多体量子力学と量子メートル法の間に興味深いつながりをもたらす。 Surpassing the standard quantum limit and even reaching the Heisenberg limit using quantum entanglement, represents the Holy Grail of quantum metrology. However, quantum entanglement is a valuable resource that does not come without a price. The exceptional time overhead for the preparation of large-scale entangled states raises disconcerting concerns about whether the Heisenberg limit is fundamentally achievable. Here we find a universal speed limit set by the Lieb-Robinson light cone for the quantum Fisher information growth to characterize the metrological potential of quantum resource states during their preparation. Our main result establishes a strong precision limit of quantum metrology accounting for the complexity of many-body quantum resource state preparation and reveals a fundamental constraint for reaching the Heisenberg limit in a generic many-body lattice system with bounded one-site energy. It enables us to identify the essential features of quantum many-body systems that are crucial for achieving the quantum advantage of quantum metrology, and brings an interesting connection between many-body quantum dynamics and quantum metrology. | 翻訳日:2023-04-13 18:10:30 公開日:2023-04-12 |
# LDMIC:学習型分散マルチビュー画像符号化 LDMIC: Learning-based Distributed Multi-view Image Coding ( http://arxiv.org/abs/2301.09799v3 ) ライセンス: Link先を確認 | Xinjie Zhang, Jiawei Shao, Jun Zhang | (参考訳) マルチビュー画像圧縮は3D関連アプリケーションにおいて重要な役割を果たす。
コードはhttps://github.com/Xinjie-Q/LDMICでリリースされる。 Multi-view image compression plays a critical role in 3D-related applications. Existing methods adopt a predictive coding architecture, which requires joint encoding to compress the corresponding disparity as well as residual information. This demands collaboration among cameras and enforces the epipolar geometric constraint between different views, which makes it challenging to deploy these methods in distributed camera systems with randomly overlapping fields of view. Meanwhile, distributed source coding theory indicates that efficient data compression of correlated sources can be achieved by independent encoding and joint decoding, which motivates us to design a learning-based distributed multi-view image coding (LDMIC) framework. With independent encoders, LDMIC introduces a simple yet effective joint context transfer module based on the cross-attention mechanism at the decoder to effectively capture the global inter-view correlations, which is insensitive to the geometric relationships between images. Experimental results show that LDMIC significantly outperforms both traditional and learning-based MIC methods while enjoying fast encoding speed. Code will be released at https://github.com/Xinjie-Q/LDMIC. | 翻訳日:2023-04-13 18:09:59 公開日:2023-04-12 |
# フィールドインストールファイバリンク上の決定論的単一光子源を用いた量子鍵分布 Quantum Key Distribution using Deterministic Single-Photon Sources over a Field-Installed Fibre Link ( http://arxiv.org/abs/2301.09399v2 ) ライセンス: Link先を確認 | Mujtaba Zahidy, Mikkel T. Mikkelsen, Ronny Müller, Beatrice Da Lio, Martin Krehbiel, Ying Wang, Nikolai Bart, Andreas D. Wieck, Arne Ludwig, Michael Galili, Søren Forchhammer, Peter Lodahl, Leif K. Oxenløwe, Davide Bacco, and Leonardo Midolo | (参考訳) 量子ドットベースの単一光子源は、コンピューティングと通信のためにオンデマンドのスケーラブルな量子リソースを提供する量子情報技術にとって重要な資産である。
本研究は、量子インターネットの目標に向けて、デバイス非依存の量子鍵分布を含む高度な単一光子ベースの通信プロトコルを整備しつつ、決定論的単一光子ソース技術の成熟度を強調した。 Quantum-dot-based single-photon sources are key assets for quantum information technology, supplying on-demand scalable quantum resources for computing and communication. However, long-lasting issues such as limited long-term stability and source brightness have traditionally impeded their adoption in real-world applications. Here, we realize a quantum key distribution field trial using true single photons across an 18-km-long dark fibre, located in the Copenhagen metropolitan area, using an optimized, state-of-the-art, quantum-dot single-photon source frequency-converted to the telecom wavelength. A secret key generation rate of >2 kbits/s realized over a 9.6 dB channel loss is achieved with a polarization-encoded BB84 scheme, showing remarkable stability for more than 24 hours of continuous operation. Our results highlight the maturity of deterministic single-photon source technology while paving the way for advanced single-photon-based communication protocols, including fully device-independent quantum key distribution, towards the goal of a quantum internet. | 翻訳日:2023-04-13 18:09:39 公開日:2023-04-12 |
# 不確実性定量化を用いた物理システムモデリングのための物理情報場理論 Physics-informed Information Field Theory for Modeling Physical Systems with Uncertainty Quantification ( http://arxiv.org/abs/2301.07609v3 ) ライセンス: Link先を確認 | Alex Alberts, Ilias Bilionis | (参考訳) データ駆動アプローチと物理知識は、システムをモデル化するための強力なテクニックである。
IFT を物理インフォームド IFT (PIFT) に拡張し,フィールドを記述する物理法則に関する情報を符号化する。
次に, 確率勾配ランジュバン力学の変種を開発し, 関節後方からフィールド上およびモデルパラメータ上にサンプルを抽出した。
本手法は, モデル形式誤差の異なる数値例と非線形微分方程式を含む逆問題に適用する。
このため, 数値実験により, この手法は十分なデータが得られる物理の誤った表現に対しても頑健であることがわかった。
本手法は,物理が信頼できない場合に正しく識別できることを数値的に証明し,その場合,フィールドの学習を回帰問題として自動的に扱う。 Data-driven approaches coupled with physical knowledge are powerful techniques to model systems. The goal of such models is to efficiently solve for the underlying field by combining measurements with known physical laws. As many systems contain unknown elements, such as missing parameters, noisy data, or incomplete physical laws, this is widely approached as an uncertainty quantification problem. The common techniques to handle all the variables typically depend on the numerical scheme used to approximate the posterior, and it is desirable to have a method which is independent of any such discretization. Information field theory (IFT) provides the tools necessary to perform statistics over fields that are not necessarily Gaussian. We extend IFT to physics-informed IFT (PIFT) by encoding the functional priors with information about the physical laws which describe the field. The posteriors derived from this PIFT remain independent of any numerical scheme and can capture multiple modes, allowing for the solution of problems which are ill-posed. We demonstrate our approach through an analytical example involving the Klein-Gordon equation. We then develop a variant of stochastic gradient Langevin dynamics to draw samples from the joint posterior over the field and model parameters. We apply our method to numerical examples with various degrees of model-form error and to inverse problems involving nonlinear differential equations. As an addendum, the method is equipped with a metric which allows the posterior to automatically quantify model-form uncertainty. Because of this, our numerical experiments show that the method remains robust to even an incorrect representation of the physics given sufficient data. We numerically demonstrate that the method correctly identifies when the physics cannot be trusted, in which case it automatically treats learning the field as a regression problem. 
| 翻訳日:2023-04-13 18:09:02 公開日:2023-04-12 |
# ビデオイベント関連予測のための構造記号表現の防御 In Defense of Structural Symbolic Representation for Video Event-Relation Prediction ( http://arxiv.org/abs/2301.03410v2 ) ライセンス: Link先を確認 | Andrew Lu, Xudong Lin, Yulei Niu, Shih-Fu Chang | (参考訳) ビデオ内のイベント関係を理解するには、イベントの基盤となる構造(イベントタイプ、関連する引数ロール、対応するエンティティ)と推論の事実的知識を理解するモデルが必要である。
1) SSR ベースの方法が失敗した理由
2) 映像イベント関連予測の評価設定を適切に理解する方法
3) SSR に基づく手法の可能性を明らかにする方法。
その結果、新たな最先端モデルによって、25%のマクロ精度のパフォーマンス向上が実現される。 Understanding event relationships in videos requires a model to understand the underlying structures of events (i.e. the event type, the associated argument roles, and corresponding entities) and factual knowledge for reasoning. Structural symbolic representation (SSR) based methods directly take event types and associated argument roles/entities as inputs to perform reasoning. However, the state-of-the-art video event-relation prediction system shows the necessity of using continuous feature vectors from input videos; existing methods based solely on SSR inputs fail completely, even when given oracle event types and argument roles. In this paper, we conduct an extensive empirical analysis to answer the following questions: 1) why SSR-based methods failed; 2) how to understand the evaluation setting of video event relation prediction properly; 3) how to uncover the potential of SSR-based methods. We first identify suboptimal training settings as causing the failure of previous SSR-based video event prediction models. Then through qualitative and quantitative analysis, we show how evaluation that takes only video as inputs is currently unfeasible, as well as the reliance on oracle event information to obtain an accurate evaluation. Based on these findings, we propose to further contextualize the SSR-based model to an Event-Sequence Model and equip it with more factual knowledge through a simple yet effective way of reformulating external visual commonsense knowledge bases into an event-relation prediction pretraining dataset. The resultant new state-of-the-art model eventually establishes a 25% Macro-accuracy performance boost. | 翻訳日:2023-04-13 18:07:22 公開日:2023-04-12 |
# 半教師付きノード分類のための信頼度に基づくサブグラフマッチングによる親和性近傍の探索 Finding Heterophilic Neighbors via Confidence-based Subgraph Matching for Semi-supervised Node Classification ( http://arxiv.org/abs/2302.09755v2 ) ライセンス: Link先を確認 | Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim | (参考訳) グラフニューラルネットワーク(GNN)は多くのグラフベースのアプリケーションで強力であることが証明されている。
次に, エッジ係数を効果的に活用するために, 改良ラベル伝搬機構をgnnに適用する。
ベンチマークデータセットにおける実験は、モデルが過剰動作を緩和し、パフォーマンスが向上することを示している。 Graph Neural Networks (GNNs) have proven to be powerful in many graph-based applications. However, they fail to generalize well under heterophilic setups, where neighbor nodes have different labels. To address this challenge, we employ a confidence ratio as a hyper-parameter, assuming that some of the edges are disassortative (heterophilic). Here, we propose a two-phased algorithm. Firstly, we determine edge coefficients through subgraph matching using a supplementary module. Then, we apply GNNs with a modified label propagation mechanism to utilize the edge coefficients effectively. Specifically, our supplementary module identifies a certain proportion of task-irrelevant edges based on a given confidence ratio. Using the remaining edges, we employ the widely used optimal transport to measure the similarity between two nodes with their subgraphs. Finally, using the coefficients as supplementary information on GNNs, we improve the label propagation mechanism which can prevent two nodes with smaller weights from being closer. The experiments on benchmark datasets show that our model alleviates over-smoothing and improves performance. | 翻訳日:2023-04-13 18:00:39 公開日:2023-04-12 |
# MixNeRF:スパース入力からの新しいビュー合成のための混合密度線をモデル化する MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs ( http://arxiv.org/abs/2302.08788v2 ) ライセンス: Link先を確認 | Seunghyeon Seo, Donghoon Han, Yeonjin Chang, Nojun Kwak | (参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、そのシンプルな概念と最先端の品質のために、新しいビュー合成の新たな基盤を壊した。
当社のmixnerfは、トレーニングと推論の効率が優れた、さまざまな標準ベンチマークで、最先端のメソッドよりも優れています。 Neural Radiance Field (NeRF) has broken new ground in the novel view synthesis due to its simple concept and state-of-the-art quality. However, it suffers from severe performance degradation unless trained with a dense set of images with different camera poses, which hinders its practical applications. Although previous methods addressing this problem achieved promising results, they relied heavily on the additional training resources, which goes against the philosophy of sparse-input novel-view synthesis pursuing the training efficiency. In this work, we propose MixNeRF, an effective training strategy for novel view synthesis from sparse inputs by modeling a ray with a mixture density model. Our MixNeRF estimates the joint distribution of RGB colors along the ray samples by modeling it with a mixture of distributions. We also propose a new task of ray depth estimation as a useful training objective, which is highly correlated with 3D scene geometry. Moreover, we remodel the colors with regenerated blending weights based on the estimated ray depth and further improve the robustness for colors and viewpoints. Our MixNeRF outperforms other state-of-the-art methods in various standard benchmarks with superior efficiency of training and inference. | 翻訳日:2023-04-13 18:00:23 公開日:2023-04-12 |
# viewmaker networkを用いたマルチスペクトルコントラスト学習 Multispectral Contrastive Learning with Viewmaker Networks ( http://arxiv.org/abs/2302.05757v2 ) ライセンス: Link先を確認 | Jasmine Bayrooti, Noah Goodman, Alex Tamkin | (参考訳) 対照的な学習方法は、データポイントの類似した「ビュー」を識別する訓練モデルにより、様々な領域やモダリティに適用されている。
最近提案されたビュー作成手法であるViewmaker Networkは、ドメイン知識や試行錯誤を伴わずに、この環境でビューを生成することを約束している。
ソースコードはhttps://github.com/jbayrooti/divmakerにある。 Contrastive learning methods have been applied to a range of domains and modalities by training models to identify similar "views" of data points. However, specialized scientific modalities pose a challenge for this paradigm, as identifying good views for each scientific instrument is complex and time-intensive. In this paper, we focus on applying contrastive learning approaches to a variety of remote sensing datasets. We show that Viewmaker networks, a recently proposed method for generating views, are promising for producing views in this setting without requiring extensive domain knowledge and trial and error. We apply Viewmaker to four multispectral imaging problems, each with a different format, finding that Viewmaker can outperform cropping- and reflection-based methods for contrastive learning in every case when evaluated on downstream classification tasks. This provides additional evidence that domain-agnostic methods can empower contrastive learning to scale to real-world scientific domains. Open source code can be found at https://github.com/jbayrooti/divmaker. | 翻訳日:2023-04-13 17:59:17 公開日:2023-04-12 |
# 言語モデルの連続事前学習 Continual Pre-training of Language Models ( http://arxiv.org/abs/2302.03241v4 ) ライセンス: Link先を確認 | Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, and Bing Liu | (参考訳) 言語モデル(LM)は、自然言語処理の急速な進歩に役立っている。
本稿では, LMの連続的事前訓練, 特に連続的ドメイン適応型事前訓練(あるいは連続的DAP訓練)について検討する。
実験評価の結果,提案手法の有効性が示された。 Language models (LMs) have been instrumental for the rapid advance of natural language processing. This paper studies continual pre-training of LMs, in particular, continual domain-adaptive pre-training (or continual DAP-training). Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain. This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora to adapt the LM to these domains to improve their end-task performances. The key novelty of our method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, it contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) and the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting, but also achieves knowledge transfer to improve end-task performances. Empirical evaluation demonstrates the effectiveness of the proposed method. | 翻訳日:2023-04-13 17:58:58 公開日:2023-04-12 |
# 副次的評価を有するエージェント間の良質・良質な項目の分割 Dividing Good and Better Items Among Agents with Submodular Valuations ( http://arxiv.org/abs/2302.03087v2 ) ライセンス: Link先を確認 | Cyrus Cousins, Vignesh Viswanathan and Yair Zick | (参考訳) 我々は,二価のサブモジュラー価値を持つエージェント間で,一組の不可分な商品を公平に割り当てる問題について検討する。
本稿では,最近導入されたYankee Swap機構に基づいて,レキシミン,最大ナッシュ福祉(MNW),および$a$が$b$を分割した場合のアロケーションを最大化する$p$平均福祉など,様々なソリューション概念を計算できる簡単な逐次アルゴリズムフレームワークを提案する。
この結果は、$a$ が$b$ を割らない場合、レキシミンとmnwの割り当ての計算不能性に関する既存の結果によって補完される。
envy freenessでは、レキシミンとmnwの割り当ては1つの良いものまでenvy freeであることが保証されていない(ef1)。
この分率は、エージェントが2値の付加価値を持つ場合、それぞれ$\frac13$と$\frac{a}{b+2a}$に改善される。 We study the problem of fairly allocating a set of indivisible goods among agents with bivalued submodular valuations -- each good provides a marginal gain of either $a$ or $b$ ($a < b$) and goods have decreasing marginal gains. This is a natural generalization of two well-studied valuation classes -- bivalued additive valuations and binary submodular valuations. We present a simple sequential algorithmic framework, based on the recently introduced Yankee Swap mechanism, that can be adapted to compute a variety of solution concepts, including leximin, max Nash welfare (MNW) and $p$-mean welfare maximizing allocations when $a$ divides $b$. This result is complemented by an existing result on the computational intractability of leximin and MNW allocations when $a$ does not divide $b$. We further examine leximin and MNW allocations with respect to two well-known properties -- envy freeness and the maximin share guarantee. On envy freeness, we show that neither the leximin nor the MNW allocation is guaranteed to be envy free up to one good (EF1). This is surprising since for the simpler classes of bivalued additive valuations and binary submodular valuations, MNW allocations are known to be envy free up to any good (EFX). On the maximin share guarantee, we show that MNW and leximin allocations guarantee each agent $\frac14$ and $\frac{a}{b+3a}$ of their maximin share respectively when $a$ divides $b$. This fraction improves to $\frac13$ and $\frac{a}{b+2a}$ respectively when agents have bivalued additive valuations. | 翻訳日:2023-04-13 17:58:41 公開日:2023-04-12 |
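The fairness notions named in the abstract above (envy-freeness up to one good, Nash welfare) can be made concrete with a small checker. This is a minimal sketch of the definitions only, not the Yankee Swap-based algorithm from the paper; the helper names `additive`, `is_ef1`, and `nash_welfare`, the goods `g1`..`g3`, and the values $a = 1$, $b = 2$ are assumptions chosen for illustration. Bivalued additive valuations are used because they are a special case of the bivalued submodular class.

```python
from math import prod


def additive(values):
    """Build an additive valuation from a dict good -> value (hypothetical helper)."""
    return lambda bundle: sum(values[g] for g in bundle)


def is_ef1(allocation, valuations):
    """Check envy-freeness up to one good.

    allocation: list of frozensets, allocation[i] is agent i's bundle.
    valuations: list of functions mapping a bundle to a number.
    """
    for i, value in enumerate(valuations):
        own = value(allocation[i])
        for j, bundle in enumerate(allocation):
            if i == j or own >= value(bundle):
                continue  # no envy towards agent j
            # Envy must vanish after removing some single good from j's bundle.
            if not any(own >= value(bundle - {g}) for g in bundle):
                return False
    return True


def nash_welfare(allocation, valuations):
    """Product of the agents' values for their own bundles."""
    return prod(v(b) for v, b in zip(valuations, allocation))


# Two agents, three goods, every good worth a = 1 or b = 2 to each agent.
valuations = [
    additive({"g1": 2, "g2": 2, "g3": 1}),
    additive({"g1": 2, "g2": 1, "g3": 1}),
]
balanced = [frozenset({"g1", "g2"}), frozenset({"g3"})]
lopsided = [frozenset({"g1", "g2", "g3"}), frozenset()]

print(is_ef1(balanced, valuations), nash_welfare(balanced, valuations))
print(is_ef1(lopsided, valuations), nash_welfare(lopsided, valuations))
```

Passing valuations as plain functions keeps the checker usable for submodular valuations as well, since it only ever evaluates bundles and never assumes additivity.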
# 小さな$^4he_n$クラスターに対する深層ニューラルネットワークと変分モンテカルロ法との相乗効果 Synergy between deep neural networks and the variational Monte Carlo method for small $^4He_N$ clusters ( http://arxiv.org/abs/2302.00599v2 ) ライセンス: Link先を確認 | William Freitas and S. A. Vitiello | (参考訳) 本稿ではBose-Einstein統計量を満たす波動関数をモデル化するためのニューラルネットワークに基づくアプローチを提案する。
これは、我々のニューラルネットワークアプローチが、ボース=アインシュタイン統計に従う多体システムを調べる強力なツールであることを示唆している。 We present a neural network-based approach for modeling wave functions that satisfy Bose-Einstein statistics. By applying this model to small $^4He_N$ clusters with N ranging from 2 to 14 atoms, we were able to accurately predict ground state energies, pair density functions, and two-body contact parameters $C^{(N)}_2$ associated with weak unitarity. The results obtained through the use of the variational Monte Carlo method are in remarkable agreement with previous studies that employed the diffusion Monte Carlo method. This suggests that our neural network approach is a powerful tool for investigating many-body systems that obey Bose-Einstein statistics. | 翻訳日:2023-04-13 17:57:49 公開日:2023-04-12 |
# 低温原子のための時間軌道型チップトラップ A time-orbiting potential chip trap for cold atoms ( http://arxiv.org/abs/2302.00078v2 ) ライセンス: Link先を確認 | C. A. Sackett and J. A. Stickney | (参考訳) 本稿では、時間軌道ポテンシャル技術を用いた原子チップトラップの設計について述べる。
磁場を変形させて重力に対する支持勾配を与えることができ、三次元トラップを2次元ガイドに変換することができる。 We present a design for an atom chip trap that uses the time-orbiting potential technique. The design offers several advantages compared to other chip-trap methods. It uses a simple crossed-wire pattern on the chip, along with a rotating bias field. The trap is naturally close to spherically symmetric, and it can be modified to be exactly symmetric in quadratic order of the coordinates. Loading from a magneto-optical trap is facilitated because the trap can be positioned an arbitrary distance from the chip. The fields can be modified to provide a gradient for support against gravity, and the three-dimensional trap can be adiabatically transformed into a two-dimensional guide. | 翻訳日:2023-04-13 17:57:37 公開日:2023-04-12 |
# 拡散の識別における時系列分類法のベンチマーク最適性 Benchmarking optimality of time series classification methods in distinguishing diffusions ( http://arxiv.org/abs/2301.13112v3 ) ライセンス: Link先を確認 | Zehong Zhang, Fei Lu, Esther Xu Fei, Terry Lyons, Yannis Kevrekidis, and Tom Woolf | (参考訳) 統計的最適性ベンチマークは時系列分類(TSC)アルゴリズムの解析と設計に不可欠である。
本研究では, 拡散過程の識別におけるTSCアルゴリズムの最適性を, 尤度比検定(LRT)を基準として評価することを提案する。
LRT は Neyman-Pearson の補題により最適な分類器である。
さらに、LRTベンチマークは、時間長、寸法、時間サンプリング周波数、時系列のランダム性に対する分類精度の依存性を分析するツールを提供する。 Statistical optimality benchmarking is crucial for analyzing and designing time series classification (TSC) algorithms. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT). The LRT is an optimal classifier by the Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because the LRT does not need training, and the diffusion processes can be efficiently simulated and are flexible to reflect the specific features of real-world applications. We demonstrate the benchmarking with three widely-used TSC algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying high-dimensional nonlinear multivariate time series. Additionally, the LRT benchmark provides tools to analyze the dependence of classification accuracy on the time length, dimension, temporal sampling frequency, and randomness of the time series. | 翻訳日:2023-04-13 17:57:28 公開日:2023-04-12 |
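
The likelihood-ratio benchmark described above can be illustrated with a minimal sketch. It is not the authors' code: the two Ornstein-Uhlenbeck drift rates, the Euler-Maruyama discretization, and all function names (`simulate_ou`, `log_likelihood`, `lrt_classify`, `lrt_accuracy`) are assumptions chosen for illustration. With equal class priors, thresholding the log-likelihood ratio at zero gives the reference accuracy that a trained classifier would be compared against.

```python
import math
import random


def simulate_ou(theta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW (illustrative model)."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x - theta * x * dt + noise)
    return path


def log_likelihood(path, theta, sigma, dt):
    """Log-likelihood of a path under the Gaussian Euler-Maruyama transitions."""
    var = sigma * sigma * dt
    total = 0.0
    for x, x_next in zip(path, path[1:]):
        mean = x - theta * x * dt
        total += -0.5 * math.log(2.0 * math.pi * var) - (x_next - mean) ** 2 / (2.0 * var)
    return total


def lrt_classify(path, theta0, theta1, sigma, dt):
    """Return 0 if the path is at least as likely under theta0 as under theta1, else 1."""
    llr = log_likelihood(path, theta0, sigma, dt) - log_likelihood(path, theta1, sigma, dt)
    return 0 if llr >= 0.0 else 1


def lrt_accuracy(theta0=0.5, theta1=2.0, sigma=1.0, dt=0.01,
                 n_steps=500, n_paths=200, seed=0):
    """Fraction of simulated paths (alternating labels) that the LRT labels correctly."""
    rng = random.Random(seed)
    correct = 0
    for i in range(n_paths):
        label = i % 2
        theta = theta1 if label else theta0
        path = simulate_ou(theta, sigma, 1.0, dt, n_steps, rng)
        correct += int(lrt_classify(path, theta0, theta1, sigma, dt) == label)
    return correct / n_paths
```

Because the test needs no training, rerunning `lrt_accuracy` with a different `n_steps` or `dt` is a cheap way to see how path length and sampling frequency affect the attainable accuracy in this toy setting.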
# 最適な採餌戦略は学習可能であり、Lévy ウォークを上回る Optimal foraging strategies can be learned and outperform Lévy walks ( http://arxiv.org/abs/2303.06050v2 ) ライセンス: Link先を確認 | Gorka Muñoz-Gil, Andrea López-Incera, Lukas J. Fiderer and Hans J. Briegel | (参考訳) Lévy ウォークをはじめとする最適採餌の理論モデルは、実世界のシナリオの記述に用いられて成功を収め、経済、物理学、生態学、進化生物学などいくつかの分野で注目を集めている。
まず, 強化学習モデルにおける報酬の最大化が, 採餌効率の最適化と等価であることを理論的に証明する。
次に, エージェントが Lévy ウォークのような既知の戦略の効率を上回る採餌戦略を学習することを数値実験により示す。 Lévy walks and other theoretical models of optimal foraging have been successfully used to describe real-world scenarios, attracting attention in several fields such as economy, physics, ecology, and evolutionary biology. However, it remains unclear in most cases which strategies maximize foraging efficiency and whether such strategies can be learned by living organisms. To address these questions, we model foragers as reinforcement learning agents. We first prove theoretically that maximizing rewards in our reinforcement learning model is equivalent to optimizing foraging efficiency. We then show with numerical experiments that our agents learn foraging strategies which outperform the efficiency of known strategies such as Lévy walks. | 翻訳日:2023-04-13 17:52:12 公開日:2023-04-12 |
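
For readers unfamiliar with the baseline named above, here is a minimal sketch of how the turning points of a two-dimensional Lévy walk are commonly sampled. It is an illustration only, not the paper's reinforcement learning model: the exponent `mu`, the cutoff `l_min`, and the function names are assumptions. Step lengths follow a power law with density proportional to `l ** (-mu)` for `l >= l_min`, drawn by inverse-transform sampling.

```python
import math
import random


def levy_step(mu, l_min, rng):
    """Draw a step length with density proportional to l**(-mu) on [l_min, inf), mu > 1."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    u = 1.0 - rng.random()
    return l_min * u ** (-1.0 / (mu - 1.0))


def levy_walk(n_steps, mu=2.0, l_min=1.0, seed=0):
    """Return the turning points of a 2D walk with power-law step lengths."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    positions = [(x, y)]
    for _ in range(n_steps):
        length = levy_step(mu, l_min, rng)
        angle = rng.uniform(0.0, 2.0 * math.pi)  # isotropic direction
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        positions.append((x, y))
    return positions
```

The sketch records only where the walker turns; a full Lévy walk additionally moves at constant speed along each segment, so travel time grows with step length.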
# CVT-SLR: 変分アライメントを用いた手話認識のための対照的視覚テキスト変換 CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment ( http://arxiv.org/abs/2303.05725v4 ) ライセンス: Link先を確認 | Jiangbin Zheng, Yile Wang, Cheng Tan, Siyuan Li, Ge Wang, Jun Xia, Yidong Chen, Stan Z. Li | (参考訳) 手話認識(SLR)は、手話ビデオにテキストのグロスを注釈として付与する弱教師ありタスクである。
公開データセット(PHOENIX-2014およびPHOENIX-2014T)の大規模な実験により,提案したCVT-SLRは既存の単一キュー法より一貫して優れ,SOTAマルチキュー法よりも優れていた。 Sign language recognition (SLR) is a weakly supervised task that annotates sign videos as textual glosses. Recent studies show that insufficient training caused by the lack of large-scale available sign datasets becomes the main bottleneck for SLR. Most SLR works thereby adopt pretrained visual modules and develop two mainstream solutions. The multi-stream architectures extend multi-cue visual features, yielding the current SOTA performances but requiring complex designs and might introduce potential noise. Alternatively, the advanced single-cue SLR frameworks using explicit cross-modal alignment between visual and textual modalities are simple and effective, potentially competitive with the multi-cue framework. In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. Based on the single-cue cross-modal alignment framework, we propose a variational autoencoder (VAE) for pretrained contextual knowledge while introducing the complete pretrained language module. The VAE implicitly aligns visual and textual modalities while benefiting from pretrained contextual knowledge as the traditional contextual module. Meanwhile, a contrastive cross-modal alignment algorithm is designed to explicitly enhance the consistency constraints. Extensive experiments on public datasets (PHOENIX-2014 and PHOENIX-2014T) demonstrate that our proposed CVT-SLR consistently outperforms existing single-cue methods and even outperforms SOTA multi-cue methods. | 翻訳日:2023-04-13 17:51:58 公開日:2023-04-12 |
# 希少部分群における画像分類器の系統誤差の同定 Identification of Systematic Errors of Image Classifiers on Rare Subgroups ( http://arxiv.org/abs/2303.05072v2 ) ライセンス: Link先を確認 | Jan Hendrik Metzen, Robin Hutmacher, N. Grace Hua, Valentyn Boreiko, Dan Zhang | (参考訳) 多くの画像分類器の平均ケース性能にもかかわらず、それらの性能はトレーニングデータで表現されていないデータのセマンティックコヒーレントな部分群で著しく低下する。
本稿では、ImageNet分類器にPromptAttackを適用し、稀なサブグループの新しい体系的エラーを特定する。 Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We denote this procedure as PromptAttack as it can be interpreted as an adversarial attack in a prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. Thereupon, we apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups. | 翻訳日:2023-04-13 17:51:28 公開日:2023-04-12 |
# 進化的強化学習:調査 Evolutionary Reinforcement Learning: A Survey ( http://arxiv.org/abs/2303.04150v3 ) ライセンス: Link先を確認 | Hui Bai and Ran Cheng and Yaochu Jin | (参考訳) 強化学習(Reinforcement Learning, RL)は、エージェントに環境とのインタラクションを通じて累積報酬を最大化する機械学習アプローチである。
この調査の助けを借りて、研究者や実践者はより効率的な方法やEvoRLのベンチマークを作成できるようになり、この有望な学際的な研究分野をさらに進めることができる。 Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, there remain several crucial challenges, including brittle convergence properties caused by sensitive hyperparameters, difficulties in temporal credit assignment with long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search space scenarios, difficulties in credit assignment in multi-agent reinforcement learning, and conflicting objectives for rewards. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field. | 翻訳日:2023-04-13 17:51:10 公開日:2023-04-12 |
# ノイズ系の共鳴蛍光 Resonance fluorescence of noisy systems ( http://arxiv.org/abs/2303.01531v2 ) ライセンス: Link先を確認 | Rafał A. Bogaczewicz, Paweł Machnikowski | (参考訳) 共鳴蛍光と呼ばれる共鳴またはほぼ共鳴励起系からの光散乱は、物質の量子状態の調査や量子情報の読み出しのための汎用的なツールとして重要視されている。
したがって、RFスペクトルは物理系に存在する雑音の特性に関する情報を伝達する。 Light scattering from resonantly or nearly resonantly excited systems, known as resonance fluorescence, has been gaining importance as a versatile tool for investigating quantum states of matter and readout of quantum information, recently including also the inherently noisy solid state systems. In this work we develop a general theory of resonance fluorescence in the low excitation limit on systems in which the transition energy is subject to noise for two important classes of noise processes: white noise fluctuations that lead to phase diffusion and an arbitrary stationary Markovian noise process on a finite set of states. We apply the latter to the case of random telegraph noise and a sum of an arbitrary number of identical random telegraph noise contributions. We show that different classes of noise influence the RF spectrum in a characteristic way. Hence, the RF spectrum carries information on the characteristics of noise present in the physical system. | 翻訳日:2023-04-13 17:50:16 公開日:2023-04-12 |
# 2つのリンドブラッド浴に結合したスピン1/2 XXZ 鎖:平衡相関関数による非平衡定常状態の構築 The spin-1/2 XXZ chain coupled to two Lindblad baths: Constructing nonequilibrium steady states from equilibrium correlation functions ( http://arxiv.org/abs/2303.00430v2 ) ライセンス: Link先を確認 | Tjark Heitmann, Jonas Richter, Fengping Jin, Sourav Nandy, Zala Lenarčič, Jacek Herbrych, Kristel Michielsen, Hans De Raedt, Jochen Gemmer, Robin Steinigeweg | (参考訳) 多体量子システムの輸送係数を抽出するための最先端のアプローチは、広く2つのカテゴリに分類される。
(i) と (ii) は一部のモデルとパラメータの選択について定量的に一致することが分かっているが, 文献では不一致も指摘されている。
スピン1/2 XXZ 鎖における磁化輸送の研究から, 弱駆動では, 開放系における非平衡定常状態は, その時間的な形成過程も含めて, 閉じた系の相関関数のみに基づいて構成できることを示した。
また,有限系の非平衡定常状態から輸送係数を抽出する場合の潜在的な落とし穴を指摘する。 State-of-the-art approaches to extract transport coefficients of many-body quantum systems broadly fall into two categories: (i) they target the linear-response regime in terms of equilibrium correlation functions of the closed system; or (ii) they consider an open-system situation typically modeled by a Lindblad equation, where a nonequilibrium steady state emerges from driving the system at its boundaries. While quantitative agreement between (i) and (ii) has been found for selected model and parameter choices, also disagreement has been pointed out in the literature. Studying magnetization transport in the spin-1/2 XXZ chain, we here demonstrate that at weak driving, the nonequilibrium steady state in an open system, including its buildup in time, can remarkably be constructed just on the basis of correlation functions in the closed system. We numerically illustrate this direct correspondence of closed-system and open-system dynamics, and show that it allows the treatment of comparatively large open systems, usually only accessible to matrix product state simulations. We also point out potential pitfalls when extracting transport coefficients from nonequilibrium steady states in finite systems. | 翻訳日:2023-04-13 17:50:02 公開日:2023-04-12 |
# Few-Shot Name Entity Recognition のためのジョイントコントラスト学習による特徴的セマンティックデカップリング法 A Prototypical Semantic Decoupling Method via Joint Contrastive Learning for Few-Shot Name Entity Recognition ( http://arxiv.org/abs/2302.13610v2 ) ライセンス: Link先を確認 | Guanting Dong and Zechen Wang and Liwen Wang and Daichi Guo and Dayuan Fu and Yuxiang Wu and Chen Zeng and Xuefeng Li and Tingfeng Hui and Keqing He and Xinyue Cui and Qixiang Gao and Weiran Xu | (参考訳) 名前付きエンティティ認識(NER)は、わずかにラベル付きインスタンスに基づいて名前付きエンティティを識別することを目的としている。
拡張解析はPSDCの有効性と一般化をさらに検証する。 Few-shot named entity recognition (NER) aims at identifying named entities based on only few labeled instances. Most existing prototype-based sequence labeling models tend to memorize entity mentions which would be easily confused by close prototypes. In this paper, we proposed a Prototypical Semantic Decoupling method via joint Contrastive learning (PSDC) for few-shot NER. Specifically, we decouple class-specific prototypes and contextual semantic prototypes by two masking strategies to lead the model to focus on two different semantic information for inference. Besides, we further introduce joint contrastive learning objectives to better integrate two kinds of decoupling information and prevent semantic collapse. Experimental results on two few-shot NER benchmarks demonstrate that PSDC consistently outperforms the previous SOTA methods in terms of overall performance. Extensive analysis further validates the effectiveness and generalization of PSDC. | 翻訳日:2023-04-13 17:49:43 公開日:2023-04-12 |
# VLSP2022-EVJVQAチャレンジ:多言語視覚質問応答 VLSP2022-EVJVQA Challenge: Multilingual Visual Question Answering ( http://arxiv.org/abs/2302.11752v4 ) ライセンス: Link先を確認 | Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T.D Vo, Khanh Quoc Tran, Kiet Van Nguyen | (参考訳) VQA(Visual Question Answering)は自然言語処理(NLP)とコンピュータビジョン(CV)の課題であり、研究者から大きな注目を集めている。
EVJVQAはベトナム語と音声処理に関する第9回ワークショップ(VLSP 2022)で、多言語視覚質問応答の課題に対するベンチマークデータセットとして使用されている。
我々は,さらなる研究のために,Codalab評価システム上でこのチャレンジを公開した。 Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address the weakness, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ pairs of question-answer over three languages: Vietnamese, English, and Japanese, on approximately 5,000 images taken from Vietnam for evaluating multilingual VQA systems or models. EVJVQA is used as a benchmark dataset for the challenge of multilingual visual question answering at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participant teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLEU on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT for the pre-trained vision model and mT5 for the pre-trained language model, a powerful pre-trained language model based on the transformer architecture. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore the multilingual models or systems for visual question answering systems. We released the challenge on the Codalab evaluation system for further research. | 翻訳日:2023-04-13 17:49:28 公開日:2023-04-12 |
# Bipotent アーキテクチャにおける QAOA の最適化 Optimizing QAOA on Bipotent Architectures ( http://arxiv.org/abs/2303.13109v2 ) ライセンス: Link先を確認 | Yanjun Ji, Kathrin F. Koenig, and Ilia Polian | (参考訳) 量子ゲートの活発な最適化は、最適化されたゲートがいくつかの量子ビットで利用できるが、他の量子ビットでは利用できない二元的量子アーキテクチャをもたらす。
本研究は,2次量子アーキテクチャにおける最適量子ビット選択に関する実践的ガイダンスを提供し,それらのアーキテクチャの改善の必要性を示唆し,最終的にすべてのゲートタイプに対してパルスレベルの最適化を実現する。 Vigorous optimization of quantum gates has led to bipotent quantum architectures, where the optimized gates are available for some qubits but not for others. However, such gate-level improvements limit the application of user-side pulse-level optimizations, which have proven effective for quantum circuits with a high level of regularity, such as the ansatz circuit of the Quantum Approximate Optimization Algorithm (QAOA). In this paper, we investigate the trade-off between hardware-level and algorithm-level improvements on bipotent quantum architectures. Our results for various QAOA instances on two quantum computers offered by IBM indicate that the benefits of pulse-level optimizations currently outweigh the improvements due to vigorously optimized monolithic gates. Furthermore, our data indicate that the fidelity of circuit primitives is not always the best indicator for the overall algorithm performance; also their gate type and schedule duration should be taken into account. This effect is particularly pronounced for QAOA on dense portfolio optimization problems, since their transpilation requires many SWAP gates, for which efficient pulse-level optimization exists. Our findings provide practical guidance on optimal qubit selection on bipotent quantum architectures and suggest the need for improvements of those architectures, ultimately making pulse-level optimization available for all gate types. | 翻訳日:2023-04-13 17:41:53 公開日:2023-04-12 |
# 汎用人工知能の火花:GPT-4による初期の実験 Sparks of Artificial General Intelligence: Early experiments with GPT-4 ( http://arxiv.org/abs/2303.12712v4 ) ライセンス: Link先を確認 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang | (参考訳) 人工知能(AI)の研究者たちは、さまざまなドメインやタスクにまたがる優れた能力を示す大規模な言語モデル(LLM)を開発し、洗練し、学習と認知の理解に挑戦しています。
我々は, GPT-4の探索において, 限界の発見に特に重点を置いており, 次単語予測を超える新たなパラダイムを追求する必要性を含め, より深く包括的なAGIに向けて進む上での課題について論じている。
我々は,最近の技術的飛躍と今後の研究方向の社会的な影響を振り返って結論づける。 Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions. | 翻訳日:2023-04-13 17:41:31 公開日:2023-04-12 |
# 潜在グラフ推論のためのモデル空間の投影 Projections of Model Spaces for Latent Graph Inference ( http://arxiv.org/abs/2303.11754v3 ) ライセンス: Link先を確認 | Haitz Sáez de Ocáriz Borde, Álvaro Arroyo, Ingmar Posner | (参考訳) グラフニューラルネットワークは、グラフの接続構造を帰納バイアスとして利用する。
ホモフィルグラフとヘテロフィルグラフの両方で実験を行う。 Graph Neural Networks leverage the connectivity structure of graphs as an inductive bias. Latent graph inference focuses on learning an adequate graph structure to diffuse information on and improve the downstream performance of the model. In this work we employ stereographic projections of the hyperbolic and spherical model spaces, as well as products of Riemannian manifolds, for the purpose of latent graph inference. Stereographically projected model spaces achieve comparable performance to their non-projected counterparts, while providing theoretical guarantees that avoid divergence of the spaces when the curvature tends to zero. We perform experiments on both homophilic and heterophilic graphs. | 翻訳日:2023-04-13 17:41:05 公開日:2023-04-12 |
# 心と機械の出会い: GPT-4の認知心理学を解き明かす Mind meets machine: Unravelling GPT-4's cognitive psychology ( http://arxiv.org/abs/2303.11436v2 ) ライセンス: Link先を確認 | Sifatkaur Dhingra, Manmeet Singh, Vaisakh SB, Neetiraj Malviya, Sukhpal Singh Gill | (参考訳) 認知心理学は、知覚、注意、記憶、言語、問題解決、意思決定、推論を理解することに集中する。
その結果, GPT-4の認知心理学的能力に対する評価と信頼性が向上した。
機械が人間と機械の推論のギャップを埋めることによって、AIの分野に革命をもたらす大きな可能性を秘めている。 Cognitive psychology delves on understanding perception, attention, memory, language, problem-solving, decision-making, and reasoning. Large language models (LLMs) are emerging as potent tools increasingly capable of performing human-level tasks. The recent development in the form of GPT-4 and its demonstrated success in tasks complex to humans exam and complex problems has led to an increased confidence in the LLMs to become perfect instruments of intelligence. Although GPT-4 report has shown performance on some cognitive psychology tasks, a comprehensive assessment of GPT-4, via the existing well-established datasets is required. In this study, we focus on the evaluation of GPT-4's performance on a set of cognitive psychology datasets such as CommonsenseQA, SuperGLUE, MATH and HANS. In doing so, we understand how GPT-4 processes and integrates cognitive psychology with contextual information, providing insight into the underlying cognitive processes that enable its ability to generate the responses. We show that GPT-4 exhibits a high level of accuracy in cognitive psychology tasks relative to the prior state-of-the-art models. Our results strengthen the already available assessments and confidence on GPT-4's cognitive psychology abilities. It has significant potential to revolutionize the field of AI, by enabling machines to bridge the gap between human and machine reasoning. | 翻訳日:2023-04-13 17:40:55 公開日:2023-04-12 |
# 児童中心型AIにおけるGoldilocksゾーンに向けて Towards Goldilocks Zone in Child-centered AI ( http://arxiv.org/abs/2303.11221v2 ) ライセンス: Link先を確認 | Tahiya Chowdhury | (参考訳) この研究では、YouTube Kidsを例として、子どものAIとのインタラクションプロセスと、それが子どもの感情的、社会的、創造的な発達に及ぼす広範な影響を理解する必要性について議論する。
子ども中心のAIで価値駆動のインタラクションを作成するためのデザインの推奨事項をいくつか紹介する。 Using YouTube Kids as an example, in this work, we argue the need to understand a child's interaction process with AI and its broader implication on a child's emotional, social, and creative development. We present several design recommendations to create value-driven interaction in child-centric AI that can guide designing compelling, age-appropriate, beneficial AI experiences for children. | 翻訳日:2023-04-13 17:40:36 公開日:2023-04-12 |
# テキスト誘導拡散画像スタイル転送のためのゼロショットコントラスト損失 Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer ( http://arxiv.org/abs/2303.08622v2 ) ライセンス: Link先を確認 | Serin Yang, Hyunmin Hwang, Jong Chul Ye | (参考訳) 拡散モデルはテキスト誘導型画像スタイル転送において大きな可能性を示しているが、その確率的な性質から、スタイル変換とコンテンツ保存のトレードオフがある。
提案手法の有効性を実験的に検証した。 Diffusion models have shown great promise in text-guided image style transfer, but there is a trade-off between style transformation and content preservation due to their stochastic nature. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural network. To address this, here we propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks. By leveraging patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of our proposed method. | 翻訳日:2023-04-13 17:40:28 公開日:2023-04-12 |
# 血液細胞形態におけるロバスト単一細胞分類のための不均衡領域の一般化 Imbalanced Domain Generalization for Robust Single Cell Classification in Hematological Cytomorphology ( http://arxiv.org/abs/2303.07771v2 ) ライセンス: Link先を確認 | Rao Muhammad Umer, Armin Gruber, Sayedali Shetab Boushehri, Christian Metak, Carsten Marr | (参考訳) 白血球の正確な形態分類(WBCs)は白血病の診断において重要なステップであり、非機能的ブラスト細胞が骨髄に蓄積する疾患である。
これは血液形態学における不均衡領域の一般化の初めての実証であり、実験室や診療所への応用のための堅牢な単細胞分類方法の道を開くものである。 Accurate morphological classification of white blood cells (WBCs) is an important step in the diagnosis of leukemia, a disease in which nonfunctional blast cells accumulate in the bone marrow. Recently, deep convolutional neural networks (CNNs) have been successfully used to classify leukocytes by training them on single-cell images from a specific domain. Most CNN models assume that the distributions of the training and test data are similar, i.e., the data are independently and identically distributed. Therefore, they are not robust to different staining procedures, magnifications, resolutions, scanners, or imaging protocols, as well as variations in clinical centers or patient cohorts. In addition, domain-specific data imbalances affect the generalization performance of classifiers. Here, we train a robust CNN for WBC classification by addressing cross-domain data imbalance and domain shifts. To this end, we use two loss functions and demonstrate their effectiveness in out-of-distribution (OOD) generalization. Our approach achieves the best F1 macro score compared to other existing methods and is able to consider rare cell types. This is the first demonstration of imbalanced domain generalization in hematological cytomorphology and paves the way for robust single cell classification methods for the application in laboratories and clinics. | 翻訳日:2023-04-13 17:40:13 公開日:2023-04-12 |
# 量子ダブルロックイン増幅器 Quantum Double Lock-in Amplifier ( http://arxiv.org/abs/2303.07559v2 ) ライセンス: Link先を確認 | Sijie Chen, Min Zhuang, Ruihuang Fang, Yun Chen, Chengyin Han, Bo Lu, Jiahao Huang, and Chaohong Lee | (参考訳) 量子ロックイン増幅器は、量子戦略を用いて強いノイズ背景内の交互信号を抽出することを目的としている。
本研究は, 強い雑音背景下での交互信号の完全な特性を抽出するための道を開き, 実用的な量子センシング技術の開発に有用である。 Quantum lock-in amplifier aims to extract an alternating signal within strong noise background by using quantum strategy. However, as the target signal usually has an unknown initial phase, we can't obtain the complete information of its amplitude, frequency and phase in a single lock-in measurement. Here, to overcome this challenge, we give a general protocol for achieving a quantum double lock-in amplifier and illustrate its realization. In analog to a classical double lock-in amplifier, our protocol is accomplished via two quantum mixers under orthogonal pulse sequences. The two orthogonal pulse sequences act the roles of two orthogonal reference signals in a classical double lock-in amplifier. Combining the output signals, the complete characteristics of the target signal can be obtained. As an example, we illustrate the realization of our quantum double lock-in amplifier via a five-level double-$\Lambda$ coherent population trapping system with $^{87}$Rb atoms, in which each $\Lambda$ structure acts as a quantum mixer and the two applied dynamical decoupling sequences take the roles of two orthogonal reference signals. Our numerical calculations show that the quantum double lock-in amplifier is robust against experimental imperfections, such as finite pulse length and stochastic noise. Our study opens an avenue for extracting complete characteristics of an alternating signal within strong noise background, which is beneficial for developing practical quantum sensing technologies. | 翻訳日:2023-04-13 17:39:51 公開日:2023-04-12 |
# 最適nによるnステップ時間差学習 n-Step Temporal Difference Learning with Optimal n ( http://arxiv.org/abs/2303.07068v2 ) ライセンス: Link先を確認 | Lakshmi Mandal and Shalabh Bhatnagar | (参考訳) 我々は,n-step temporal difference (TD) アルゴリズムにおいて,n の最適値を求める問題を考える。
我々は, 同時摂動確率近似 (SPSA) というモデルフリー最適化手法を用いて最適な n を求める。
我々は, 本来は連続最適化向けである 1 シミュレーション SPSA 手法を, 巡回摂動列を組み込んだうえで離散最適化の枠組みに適用する。
実験により、n の最適値は任意の初期値に対して SDPSA を用いて達成されることを示す。 We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) algorithm. We find the optimal n by resorting to the model-free optimization technique of simultaneous perturbation stochastic approximation (SPSA). We adopt a one-simulation SPSA procedure that is originally for continuous optimization to the discrete optimization framework but incorporates a cyclic perturbation sequence. We prove the convergence of our proposed algorithm, SDPSA, and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for any arbitrary initial value of the same. | 翻訳日:2023-04-13 17:38:56 公開日:2023-04-12 |
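
To make concrete why the choice of n matters, the following is a minimal sketch of plain n-step TD prediction on a small random-walk chain. It is not the SDPSA algorithm from the abstract: the environment, the step size `alpha`, and the names `n_step_td_random_walk` and `rmse` are assumptions for illustration. Comparing the error for several values of n by hand shows the kind of objective that an SPSA-style search over n would optimize.

```python
import random


def n_step_td_random_walk(n, alpha=0.1, episodes=100, n_states=5, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with n-step TD.

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    The only nonzero reward is +1 on entering the right terminal state.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal entries stay at 0
    for _ in range(episodes):
        # The policy is fixed, so the episode can be generated before updating.
        states = [(n_states + 1) // 2]
        rewards = [0.0]  # rewards[t] is the reward received on entering states[t]
        while 0 < states[-1] < n_states + 1:
            nxt = states[-1] + rng.choice((-1, 1))
            states.append(nxt)
            rewards.append(1.0 if nxt == n_states + 1 else 0.0)
        T = len(states) - 1
        for tau in range(T):
            end = min(tau + n, T)
            # Discounted sum of up to n rewards following time tau.
            g = sum(gamma ** (k - tau - 1) * rewards[k] for k in range(tau + 1, end + 1))
            if end < T:  # bootstrap only if the episode has not ended by then
                g += gamma ** n * values[states[end]]
            values[states[tau]] += alpha * (g - values[states[tau]])
    return values[1:-1]


def rmse(estimates):
    """Root-mean-square error against the exact values i / (n_states + 1)."""
    m = len(estimates)
    truth = [(i + 1) / (m + 1) for i in range(m)]
    return (sum((e - t) ** 2 for e, t in zip(estimates, truth)) / m) ** 0.5
```

A simple usage pattern is `{n: rmse(n_step_td_random_walk(n)) for n in (1, 2, 4, 8)}`; the best n depends on `alpha` and the number of episodes, which is why a data-driven search for n is attractive.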
# ワンウェイ関数によるタンパ耐性公開鍵を用いた量子公開鍵暗号 Quantum Public-Key Encryption with Tamper-Resilient Public Keys from One-Way Functions ( http://arxiv.org/abs/2304.01800v2 ) ライセンス: Link先を確認 | Fuyuki Kitagawa, Tomoyuki Morimae, Ryo Nishimaki, Takashi Yamakawa | (参考訳) 量子公開鍵暗号を一方向関数から構築する。
ワンウェイ関数(または擬似ランダム関数のような弱いプリミティブ)からの量子公開鍵暗号も近年の著作(森前-山川, eprint:2022/1336, Coladangelo, eprint:2023/282, Grilo-Sattath-Vu, eprint:2023/345, Barooti-Malavolta-Walter, eprint:2023/306]で提案されている。
しかし、それらには大きな欠点がある: 量子公開鍵が送信者(暗号化アルゴリズムを実行する)に送信され、相手に邪魔されることなく、セキュアな量子チャネルのような不満足な物理設定の仮定を必要とする場合にのみ、安全である。
従来の公開鍵暗号の目的を達成する最初の量子公開鍵暗号であり,安全でない通信路上でセキュアな通信を確立することを目的としている。 We construct quantum public-key encryption from one-way functions. In our construction, public keys are quantum, but ciphertexts are classical. Quantum public-key encryption from one-way functions (or weaker primitives such as pseudorandom function-like states) are also proposed in some recent works [Morimae-Yamakawa, eprint:2022/1336; Coladangelo, eprint:2023/282; Grilo-Sattath-Vu, eprint:2023/345; Barooti-Malavolta-Walter, eprint:2023/306]. However, they have a huge drawback: they are secure only when quantum public keys can be transmitted to the sender (who runs the encryption algorithm) without being tampered with by the adversary, which seems to require unsatisfactory physical setup assumptions such as secure quantum channels. Our construction is free from such a drawback: it guarantees the secrecy of the encrypted messages even if we assume only unauthenticated quantum channels. Thus, the encryption is done with adversarially tampered quantum public keys. Our construction based only on one-way functions is the first quantum public-key encryption that achieves the goal of classical public-key encryption, namely, to establish secure communication over insecure channels. | 翻訳日:2023-04-13 17:34:03 公開日:2023-04-12 |
# 臨界1+1Dアベリアン・ヒッグス模型のスペクトル特性 Spectral properties of critical 1+1D Abelian-Higgs model ( http://arxiv.org/abs/2304.01030v2 ) ライセンス: Link先を確認 | Titas Chanda, Marcello Dalmonte, Maciej Lewenstein, Jakub Zakrzewski, Luca Tagliacozzo | (参考訳) 1+1d におけるゲージ対称性の存在は、動的ゲージボソンの存在を意味するものではないため冗長であることが知られている。
しかし、[Phys. Rev. Lett. 128, 090601 (2022)] で発表された最近の研究により、系を格子上で離散化した場合に予期せぬ相転移が生じることが明らかになった。
本稿では、この$c=3/2$理論の2つの成分、すなわち自由マヨラナフェルミオンおよびボゾン成分を平衡および外平衡スペクトル分析によって特徴づけることを目的とする。 The presence of gauge symmetry in 1+1D is known to be redundant, since it does not imply the existence of dynamical gauge bosons. As a consequence, in the continuum, the Abelian-Higgs model, the theory of bosonic matter interacting with photons, just possesses a single phase, as the higher dimensional Higgs and Coulomb phases are connected via non-perturbative effects. However, recent research published in [Phys. Rev. Lett. 128, 090601 (2022)] has revealed an unexpected phase transition when the system is discretized on the lattice. This transition is described by a conformal field theory with a central charge of $c=3/2$. In this paper, we aim to characterize the two components of this $c=3/2$ theory -- namely the free Majorana fermionic and bosonic parts -- through equilibrium and out-of-equilibrium spectral analyses. | 翻訳日:2023-04-13 17:33:34 公開日:2023-04-12 |
# FedIN: モデル不均一性のためのフェデレーション中間層学習 FedIN: Federated Intermediate Layers Learning for Model Heterogeneity ( http://arxiv.org/abs/2304.00759v2 ) ライセンス: Link先を確認 | Yun-Hin Chan, Zhihan Jiang, Jing Deng, Edith C.-H. Ngai | (参考訳) フェデレートラーニング(FL)は、エッジデバイスがローカルおよびプライベートにトレーニングデータを維持しながら、グローバルな共有モデルを協調的にトレーニングすることを促進する。
本研究では,FedIN(Federated Intermediate Layers Learning)と呼ばれる新しいFL手法を提案する。
さらに,本研究では,イントレーニングの有効性と凸最適化問題に対する解法を示す。 Federated learning (FL) facilitates edge devices to cooperatively train a global shared model while maintaining the training data locally and privately. However, a common but impractical assumption in FL is that the participating edge devices possess the same required resources and share identical global model architecture. In this study, we propose a novel FL method called Federated Intermediate Layers Learning (FedIN), supporting heterogeneous models without utilizing any public dataset. The training models in FedIN are divided into three parts, including an extractor, the intermediate layers, and a classifier. The model architectures of the extractor and classifier are the same in all devices to maintain the consistency of the intermediate layer features, while the architectures of the intermediate layers can vary for heterogeneous devices according to their resource capacities. To exploit the knowledge from features, we propose IN training, training the intermediate layers in line with the features from other clients. Additionally, we formulate and solve a convex optimization problem to mitigate the gradient divergence problem induced by the conflicts between the IN training and the local training. The experiment results show that FedIN achieves the best performance in the heterogeneous model environment compared with the state-of-the-art algorithms. Furthermore, our ablation study demonstrates the effectiveness of IN training and the solution to the convex optimization problem. | 翻訳日:2023-04-13 17:33:18 公開日:2023-04-12 |
# 拡散モデルにおけるパラメータ効率のチューニングについて A Closer Look at Parameter-Efficient Tuning in Diffusion Models ( http://arxiv.org/abs/2303.18181v2 ) ライセンス: Link先を確認 | Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu | (参考訳) 安定拡散のような大規模拡散モデルは強力であり、様々な実世界のアプリケーションを見つける一方で、微調整によるモデルカスタマイズはメモリと時間の両方で非効率である。
近年の自然言語処理の進歩に動機づけられ、学習可能な小さなモジュール(アダプタと呼ばれる)を挿入することにより、大規模拡散モデルにおけるパラメータ効率の良いチューニングについて検討する。
特に、アダプタの設計空間を直交する因子(入力位置、出力位置、および関数形式)に分解し、離散変数(設計オプション)と連続変数(評価指標)の相関を解析するための古典的統計手法である分散分析(Analysis of Variance, ANOVA)を実行する。
そして, 入力位置の選択を慎重に検討し, 追加の可視化分析により, クロスアテンションブロックの後に入力位置を置けば, 最高の性能が得られることを示した。
最後に、拡散モデルのパラメータ効率の良いチューニングのレシピを提供する。これは、わずか0.75%の追加パラメータで、様々なカスタマイズタスクにおいて完全微調整ベースライン(DreamBoothなど)に匹敵するか、それを上回る。 Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications while customizing such models by fine-tuning is both memory and time inefficient. Motivated by the recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position as well as the function form, and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. Then, we carefully study the choice of the input position, and we find that putting the input position after the cross-attention block can lead to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable if not superior to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75% extra parameters, across various customized tasks. | 翻訳日:2023-04-13 17:32:57 公開日:2023-04-12 |
# 画像データにおける物体検出のためのモデル非依存説明可能な人工知能 Model-agnostic explainable artificial intelligence for object detection in image data ( http://arxiv.org/abs/2303.17249v2 ) ライセンス: Link先を確認 | Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari | (参考訳) 物体検出はコンピュータビジョンの基本的な課題であり、大規模かつ複雑なディープラーニングモデルを開発することで大きく進歩してきた。
さらに,BODEMが生成する局所マスクを用いて物体検出器のさらなる訓練を行い,検出精度とロバスト性を向上させるデータ拡張実験を行った。 Object detection is a fundamental task in computer vision, which has been greatly progressed through developing large and intricate deep learning models. However, the lack of transparency is a big challenge that may not allow the widespread adoption of these models. Explainable artificial intelligence is a field of research where methods are developed to help users understand the behavior, decision logics, and vulnerabilities of AI-based systems. Black-box explanation refers to explaining decisions of an AI system without having access to its internals. In this paper, we design and implement a black-box explanation method named Black-box Object Detection Explanation by Masking (BODEM) through adopting a new masking approach for AI-based object detection systems. We propose local and distant masking to generate multiple versions of an input image. Local masks are used to disturb pixels within a target object to figure out how the object detector reacts to these changes, while distant masks are used to assess how the detection model's decisions are affected by disturbing pixels outside the object. A saliency map is then created by estimating the importance of pixels through measuring the difference between the detection output before and after masking. Finally, a heatmap is created that visualizes how important pixels within the input image are to the detected objects. The experimentations on various object detection datasets and models showed that BODEM can be effectively used to explain the behavior of object detectors and reveal their vulnerabilities. This makes BODEM suitable for explaining and validating AI based object detection systems in black-box software testing scenarios. Furthermore, we conducted data augmentation experiments that showed local masks produced by BODEM can be used for further training the object detectors and improve their detection accuracy and robustness. 
| 翻訳日:2023-04-13 17:32:35 公開日:2023-04-12 |
# Seer:潜在拡散モデルを用いた言語指示ビデオ予測 Seer: Language Instructed Video Prediction with Latent Diffusion Models ( http://arxiv.org/abs/2303.14897v2 ) ライセンス: Link先を確認 | Xianfan Gu, Chuan Wen, Jiaming Song, Yang Gao | (参考訳) 将来の軌道を想像することは、ロボットが適切な計画を立てて目標を達成するための鍵である。
事前学習されたT2Iモデルの豊富な事前知識をフレーム間に伝播させるために、Autoregressive Spatial-Temporal Attention と Frame Sequential Text Decomposer という2つの新しい手法を用いて、デノイジングU-Netと言語条件付けモデルを時間軸方向に拡張する。
Something Something V2(SSv2)とBridgedataデータセットによる実験結果は、4つのRTX 3090 GPUによる約210時間のトレーニングで、SSv2上で現在のSOTAモデルのFVDを290から200に減らし、人間評価において少なくとも70%の選好を達成するという、優れたビデオ予測性能を示している。 Imagining the future trajectory is the key for robots to make sound planning and successfully reach their goals. Therefore, text-conditioned video prediction (TVP) is an essential task to facilitate general robot policy learning, i.e., predicting future video frames with a given language instruction and reference frames. It is a highly challenging task to ground task-level goals specified by instructions and high-fidelity frames together, requiring large-scale data and computation. To tackle this task and empower robots with the ability to foresee the future, we propose a sample and computation-efficient model, named Seer, by inflating the pretrained text-to-image (T2I) stable diffusion models along the temporal axis. We inflate the denoising U-Net and language conditioning model with two novel techniques, Autoregressive Spatial-Temporal Attention and Frame Sequential Text Decomposer, to propagate the rich prior knowledge in the pretrained T2I models across the frames. With the well-designed architecture, Seer makes it possible to generate high-fidelity, coherent, and instruction-aligned video frames by fine-tuning a few layers on a small amount of data. The experimental results on Something Something V2 (SSv2) and Bridgedata datasets demonstrate our superior video prediction performance with around 210-hour training on 4 RTX 3090 GPUs: decreasing the FVD of the current SOTA model from 290 to 200 on SSv2 and achieving at least 70% preference in the human evaluation. | 翻訳日:2023-04-13 17:31:18 公開日:2023-04-12 |
# ChatGPTをメタバースに解き放つ:救世主か破壊者か? Unleashing ChatGPT on the Metaverse: Savior or Destroyer? ( http://arxiv.org/abs/2303.13856v2 ) ライセンス: Link先を確認 | Pengyuan Zhou | (参考訳) 人工知能(AI)技術、特に自然言語処理(NLP)の組み込みは、没入的で対話的なメタバース体験の開発にますます不可欠になりつつある。
本稿は,ChatGPTがメタバースに与える影響と,これらの機会と障害を評価することで,より没入的で魅力的な仮想環境を効果的に構築する方法について,読者の理解を支援することを目的とする。 The incorporation of artificial intelligence (AI) technology, and in particular natural language processing (NLP), is becoming increasingly vital for the development of immersive and interactive metaverse experiences. One such artificial intelligence tool that is gaining traction in the metaverse is ChatGPT, a large language model trained by OpenAI. The article delves into the pros and cons of utilizing ChatGPT for metaverse-based education, entertainment, personalization, and support. Dynamic and personalized experiences are possible with this technology, but there are also legitimate privacy, bias, and ethical issues to consider. This article aims to help readers understand the possible influence of ChatGPT on the metaverse and how it may be used to effectively create a more immersive and engaging virtual environment by evaluating these opportunities and obstacles. | 翻訳日:2023-04-13 17:30:45 公開日:2023-04-12 |
# 医学的課題におけるGPT-4の能力 Capabilities of GPT-4 on Medical Challenge Problems ( http://arxiv.org/abs/2303.13375v2 ) ライセンス: Link先を確認 | Harsha Nori, Nicholas King, Scott Mayer McKinney, Dean Carignan, Eric Horvitz | (参考訳) 大規模言語モデル(LLM)は、医学を含む様々な領域にわたる自然言語理解と生成において顕著な能力を示した。
実験では、モデル性能の測定に加えて、テキストと画像の両方を含むテスト問題がモデル性能に及ぼす影響、トレーニング中の内容の記憶の有無、および医療のようなハイステークスな応用において極めて重要な確率較正について検討した。
その結果、GPT-4は特別なプロンプトの工夫なしにUSMLEの合格点を20点以上上回り、従来の汎用モデル(GPT-3.5)と、医療知識で特別に微調整されたモデル(Flan-PaLM 540Bのプロンプト調整版であるMed-PaLM)のいずれよりも優れていた。
さらに、GPT-4はGPT-3.5よりも較正が大幅に優れており、自身の回答が正しい確率を予測する能力が大きく向上している。
本研究の意義は,医学教育,評価,臨床実習におけるGPT-4の有用性について考察し,精度と安全性の課題に適切な注意を払っている。 Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation across various domains, including medicine. We present a comprehensive evaluation of GPT-4, a state-of-the-art LLM, on medical competency examinations and benchmark datasets. GPT-4 is a general-purpose model that is not specialized for medical problems through training or engineered to solve clinical tasks. Our analysis covers two sets of official practice materials for the USMLE, a three-step examination program used to assess clinical competency and grant licensure in the United States. We also evaluate performance on the MultiMedQA suite of benchmark datasets. Beyond measuring model performance, experiments were conducted to investigate the influence of test questions containing both text and images on model performance, probe for memorization of content during training, and study probability calibration, which is of critical importance in high-stakes applications like medicine. Our results show that GPT-4, without any specialized prompt crafting, exceeds the passing score on USMLE by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). In addition, GPT-4 is significantly better calibrated than GPT-3.5, demonstrating a much-improved ability to predict the likelihood that its answers are correct. We also explore the behavior of the model qualitatively through a case study that shows the ability of GPT-4 to explain medical reasoning, personalize explanations to students, and interactively craft new counterfactual scenarios around a medical case. Implications of the findings are discussed for potential uses of GPT-4 in medical education, assessment, and clinical practice, with appropriate attention to challenges of accuracy and safety. 
| 翻訳日:2023-04-13 17:30:30 公開日:2023-04-12 |
# 学習における再現性と安定性 Replicability and stability in learning ( http://arxiv.org/abs/2304.03757v2 ) ライセンス: Link先を確認 | Zachary Chase, Shay Moran, Amir Yehudayoff | (参考訳) 研究結果の検証と検証を可能にするため、科学において再現性は不可欠である。
Impagliazzo, Lei, Pitassi, Sorrell ('22) は最近、機械学習における再現性の研究を開始した。
この変種はグローバル安定性と呼ばれ、Bun, Livni and Moran ('20) によって差分プライバシーの文脈で導入された。
Impagliazzo et al. は、任意の複製可能なアルゴリズムを、1 に任意に近い確率で同じ出力を生成するようにブーストする方法を示した。
さらに、リストの複製性は、確率を任意に 1 に近づけることで達成できることを示す。
グローバル安定性とリストリプライ可能性の等価性はアルゴリズム的である。 Replicability is essential in science as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell (`22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied on two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness. An algorithm satisfies this form of replicability if it typically produces the same output when applied on two i.i.d. inputs (without fixing the internal randomness). This variant is called global stability and was introduced by Bun, Livni and Moran ('20) in the context of differential privacy. Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly, where the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. Our results, in addition, imply that besides trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized. The proof of the impossibility result is based on a topological fixed-point theorem. For every algorithm, we are able to locate a "hard input distribution" by applying the Poincar\'{e}-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic. | 翻訳日:2023-04-13 17:22:50 公開日:2023-04-12 |
# RFAConv: 空間的意識と標準的畳み込み運用の革新 RFAConv: Innovating Spatital Attention and Standard Convolutional Operation ( http://arxiv.org/abs/2304.03198v2 ) ライセンス: Link先を確認 | Xin Zhang, Chen Liu, Degang Yang, Tingting Song, Yichen Ye, Ke Li, and Yingze Song | (参考訳) 空間的注意は、重要な情報に焦点を当てることで畳み込みニューラルネットワークの性能を向上させるために広く使われている。
そこで我々は、RFA(Receptive-Field Attention)と呼ばれる新しい注意機構を導入する。
CBAM(Convolutional Block Attention Module)やCA(Coordinate Attention)といった従来の注意機構は空間的特徴にのみ焦点を当てており、畳み込みカーネルのパラメータ共有の問題を完全には解決できない。
RFA が開発した Receptive-Field Attention Convolutional Operation (RFAConv) は、標準の畳み込み操作を置き換える新しいアプローチである。
ImageNet-1k、MS COCO、VOCデータセットで一連の実験を行い、分類、物体検出、セマンティックセグメンテーションなど、さまざまなタスクにおける本手法の優位性を実証した。
関連するタスクのコードと事前トレーニングされたモデルは、https://github.com/liuchen1997/rfaconvで見ることができる。 Spatial attention has been widely used to improve the performance of convolutional neural networks by allowing them to focus on important information. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that it can solve the problem of convolutional kernel parameter sharing. Despite this, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. Therefore, we introduce a new attention mechanism called Receptive-Field Attention (RFA). While previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) only focus on spatial features, they cannot fully address the issue of convolutional kernel parameter sharing. In contrast, RFA not only focuses on the receptive-field spatial feature but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), developed by RFA, represents a new approach to replace the standard convolution operation. It offers nearly negligible increment of computational cost and parameters, while significantly improving network performance. We conducted a series of experiments on ImageNet-1k, MS COCO, and VOC datasets, which demonstrated the superiority of our approach in various tasks including classification, object detection, and semantic segmentation. Of particular importance, we believe that it is time to shift focus from spatial features to receptive-field spatial features for current spatial attention mechanisms. By doing so, we can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at https://github.com/Liuchen1997/RFAConv. | 翻訳日:2023-04-13 17:22:21 公開日:2023-04-12 |
# 非定常時系列のためのモーメント法移動推定器を用いた適応的スチューデントのt分布 Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series ( http://arxiv.org/abs/2304.03069v2 ) ライセンス: Link先を確認 | Jarek Duda | (参考訳) 実世界の時系列は通常非定常であり、モデル適応という難しい問題を引き起こす。
例えば、時刻 $t$ において、時間とともに変化する移動対数尤度 $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$ を最適化するパラメータを求める、といった考え方である。
これにより、例えば安価な指数移動平均(EMA)を用いてパラメータを推定できる。1つまたは複数のべき $p\in\mathbb{R}^+$ に対する絶対中心モーメント $E[|x-\mu|^p]$ は、$m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$ によって更新される。
標準的なarma-archアプローチは$\mu$と$\sigma$の進化を提供するが、ここでは$\nu$が$\rho(x)\sim |x|^{-\nu-1}$のテール形状、極端なイベントの確率を記述している。 The real life time series are usually nonstationary, bringing a difficult question of model adaptation. Classical approaches like ARMA-ARCH assume arbitrary type of dependence. To avoid such bias, we will focus on recently proposed agnostic philosophy of moving estimator: in time $t$ finding parameters optimizing e.g. $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$ moving log-likelihood, evolving in time. It allows for example to estimate parameters using inexpensive exponential moving averages (EMA), like absolute central moments $E[|x-\mu|^p]$ evolving for one or multiple powers $p\in\mathbb{R}^+$ using $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$. Application of such general adaptive methods of moments will be presented on Student's t-distribution, popular especially in economical applications, here applied to log-returns of DJIA companies. While standard ARMA-ARCH approaches provide evolution of $\mu$ and $\sigma$, here we also get evolution of $\nu$ describing $\rho(x)\sim |x|^{-\nu-1}$ tail shape, probability of extreme events - which might turn out catastrophic, destabilizing the market. | 翻訳日:2023-04-13 17:21:53 公開日:2023-04-12 |
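The EMA update quoted in the abstract above, $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `moving_central_moments`, the default values, and the choice to adapt the centre $\mu_t$ with the same kind of EMA are assumptions made here for illustration. The step that maps the tracked moments to the Student's t parameters $\sigma$ and $\nu$ is omitted, because the abstract does not give it.

```python
def moving_central_moments(xs, powers=(1.0, 2.0), eta=0.05, mu0=0.0, m0=1.0):
    """Track EMA estimates of the centre mu_t and of the absolute central
    moments m_{p,t} ~ E[|x - mu|^p] for each power p (hypothetical helper)."""
    mu = mu0
    moments = {p: m0 for p in powers}
    history = []
    for x in xs:
        # Moment update from the abstract:
        # m_{p,t+1} = m_{p,t} + eta * (|x_t - mu_t|^p - m_{p,t}),
        # using the centre mu_t estimated before x_t is absorbed.
        for p in powers:
            moments[p] += eta * (abs(x - mu) ** p - moments[p])
        # Assumption: the centre is adapted with the same kind of EMA.
        mu += eta * (x - mu)
        history.append((mu, dict(moments)))
    return history
```

Calling `moving_central_moments(series, powers=(1.0, 2.0), eta=0.05)` returns one `(mu, moments)` pair per observation. A larger `eta` adapts faster to nonstationarity at the cost of noisier estimates.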
# 一対の代替案のほぼ最適な操作 Almost optimal manipulation of a pair of alternatives ( http://arxiv.org/abs/2304.03060v2 ) ライセンス: Link先を確認 | Jacek Szybowski and Konrad Kułakowski and Sebastian Ernst | (参考訳) 意思決定プロセスにおける専門家の役割は極めて重要である。最終的な推奨は、専門家の気質、思考の明晰さ、経験、および問題に関する知識に依存するからである。
理論的考察は実例で示される。 The role of an expert in the decision-making process is crucial, as the final recommendation depends on his disposition, clarity of mind, experience, and knowledge of the problem. However, the recommendation also depends on their honesty. But what if the expert is dishonest? Then, the answer on how difficult it is to manipulate in a given case becomes essential. In the presented work, we consider manipulation of a ranking obtained by comparing alternatives in pairs. More specifically, we propose an algorithm for finding an almost optimal way to swap the positions of two selected alternatives. Thanks to this, it is possible to determine how difficult such manipulation is in a given case. Theoretical considerations are illustrated by a practical example. | 翻訳日:2023-04-13 17:21:16 公開日:2023-04-12 |
# 勾配解析によるニューラルネットワークのパービューの探索 Probing the Purview of Neural Networks via Gradient Analysis ( http://arxiv.org/abs/2304.02834v2 ) ライセンス: Link先を確認 | Jinsol Lee, Charlie Lehman, Mohit Prabhushankar, Ghassan AlRegib | (参考訳) ニューラルネットワークのデータ依存キャパシティを分析し、推論中のネットワークの観点から入力の異常を評価する。
本手法は, 分布外, 敵対的, 腐敗したサンプルを含む異常な入力の検出に応用する。
このアプローチでは、ハイパーパラメータチューニングや追加のデータ処理を必要とせず、aurocスコアの最大2.7%、19.8%、35.6%を上回っている。 We analyze the data-dependent capacity of neural networks and assess anomalies in inputs from the perspective of networks during inference. The notion of data-dependent capacity allows for analyzing the knowledge base of a model populated by learned features from training data. We define purview as the additional capacity necessary to characterize inference samples that differ from the training data. To probe the purview of a network, we utilize gradients to measure the amount of change required for the model to characterize the given inputs more accurately. To eliminate the dependency on ground-truth labels in generating gradients, we introduce confounding labels that are formulated by combining multiple categorical labels. We demonstrate that our gradient-based approach can effectively differentiate inputs that cannot be accurately represented with learned features. We utilize our approach in applications of detecting anomalous inputs, including out-of-distribution, adversarial, and corrupted samples. Our approach requires no hyperparameter tuning or additional data processing and outperforms state-of-the-art methods by up to 2.7%, 19.8%, and 35.6% of AUROC scores, respectively. | 翻訳日:2023-04-13 17:21:04 公開日:2023-04-12 |
# ペナライズド・ダイバーシティを必要とするソースフリードメイン適応 Source-free Domain Adaptation Requires Penalized Diversity ( http://arxiv.org/abs/2304.02798v2 ) ライセンス: Link先を確認 | Laya Rafiee Sevyeri, Ivaxi Sheth, Farhood Farahnak, Alexandre See, Samira Ebrahimi Kahou, Thomas Fevens, Mohammad Havaei | (参考訳) ニューラルネットワークは、画像分類などの多くのタスクで人間のようなパフォーマンスを達成することができるが、各モデルの印象的なパフォーマンスは、独自のデータセットに限られている。
本研究では、Distinct Backbone Architectures(DBA)を持つ別個の特徴抽出器を用いることで表現の多様性を促進する、新しい教師なしSFDAアルゴリズムを提案する。
本研究では、DBAとWHP(Weak Hypothesis Penalization)の相乗効果を、共変量シフトに対する教師なしソースフリードメイン適応に適用する Penalized Diversity(PD)を提案する。
自然, 合成, 医療領域における実験結果から, 分散シフトの違いによるPDの有効性が示された。 While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus, increasing data privacy. Diversity in representation space can be vital to a model`s adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts. | 翻訳日:2023-04-13 17:20:46 公開日:2023-04-12 |
# Galactic ChitChat: 大規模言語モデルを使って天文学文献と対話する Galactic ChitChat: Using Large Language Models to Converse with Astronomy Literature ( http://arxiv.org/abs/2304.05406v1 ) ライセンス: Link先を確認 | Ioana Ciucă and Yuan-Sen Ting | (参考訳) 我々は、最先端のOpenAI GPT-4大規模言語モデルが、文脈内プロンプトを用いて天文学論文と有意義な対話を行う可能性を示す。
効率を最適化するために、段落構造と全体的な意味的整合性を維持しつつ、元の入力論文のサイズを50%削減する蒸留技術を用いる。
以上の結果から, GPT-4は多文書領域で優れており, 関連する研究成果の枠組み内での詳細な回答が得られた。
以上の結果から,天文学コミュニティにおける大規模言語モデルの可能性を示し,さらなる探索,特に仮説生成にモデルを活用する可能性を示唆した。 We demonstrate the potential of the state-of-the-art OpenAI GPT-4 large language model to engage in meaningful interactions with Astronomy papers using in-context prompting. To optimize for efficiency, we employ a distillation technique that effectively reduces the size of the original input paper by 50\%, while maintaining the paragraph structure and overall semantic integrity. We then explore the model's responses using a multi-document context (ten distilled documents). Our findings indicate that GPT-4 excels in the multi-document domain, providing detailed answers contextualized within the framework of related research findings. Our results showcase the potential of large language models for the astronomical community, offering a promising avenue for further exploration, particularly the possibility of utilizing the models for hypothesis generation. | 翻訳日:2023-04-13 17:13:24 公開日:2023-04-12 |
# 透過性センシングのためのベイズ最小平均二乗誤差 Bayesian minimum mean square error for transmissivity sensing ( http://arxiv.org/abs/2304.05539v1 ) ライセンス: Link先を確認 | Boyu Zhou, Boulat A. Bash, Saikat Guha, Christos N. Gagatsos | (参考訳) ベイジアンの観点からは、純損失チャネルの透過率を推定する問題、すなわち、未知変数上の事前確率分布関数 (pdf) が利用可能であると考え、ベイジアン最小平均二乗誤差 (mmse) を計算する手法を用いる。
我々は、フィッシャー流のアプローチに基づく平均二乗誤差のベイズ下限ではなく、MMSEそのものを計算することを強調する。 We address the problem of estimating the transmissivity of the pure-loss channel from the Bayesian point of view, i.e., we consider that some prior probability distribution function (PDF) on the unknown variable is available and we employ methods to compute the Bayesian minimum mean square error (MMSE). Specifically, we consider two prior PDFs: the two-point and the beta distributions. By fixing the input mean photon number to an integer, for the two-point PDF we prove analytically that the optimal state is the Fock state and the optimal measurement is photon-counting, while for the beta PDF our numerical investigation provides evidence on the optimality of the Fock state and photon-counting. Moreover, we investigate the situation where the input mean photon number is any (non-negative) real number. For said case, we conjecture the form of the optimal input states and we study the performance of photon-counting, which is a sub-optimal yet practical measurement. Our methods can be applied for any prior PDF. We emphasize that we compute the MMSE instead of Bayesian lower bounds on the mean square error based on the Fisherian approach. | 翻訳日:2023-04-13 16:38:00 公開日:2023-04-12 |
# ニューラルインバータブル可変光収差補正 Neural Invertible Variable-degree Optical Aberrations Correction ( http://arxiv.org/abs/2304.05564v1 ) ライセンス: Link先を確認 | Shuang Cui, Bingnan Wang, Quan Zheng | (参考訳) 光学系の光学収差は撮像品質を著しく低下させる。
定量的および定性的な実験結果から,本手法は可変度光収差補正法よりも優れることが示された。 Optical aberrations of optical systems cause significant degradation of imaging quality. Aberration correction by sophisticated lens designs and special glass materials generally incurs high cost of manufacturing and the increase in the weight of optical systems, thus recent work has shifted to aberration correction with deep learning-based post-processing. Though real-world optical aberrations vary in degree, existing methods cannot eliminate variable-degree aberrations well, especially for the severe degrees of degradation. Also, previous methods use a single feed-forward neural network and suffer from information loss in the output. To address the issues, we propose a novel aberration correction method with an invertible architecture by leveraging its information-lossless property. Within the architecture, we develop conditional invertible blocks to allow the processing of aberrations with variable degrees. Our method is evaluated on both a synthetic dataset from physics-based imaging simulation and a real captured dataset. Quantitative and qualitative experimental results demonstrate that our method outperforms compared methods in correcting variable-degree optical aberrations. | 翻訳日:2023-04-13 16:27:27 公開日:2023-04-12 |
# シュミットランクと行列ランクによるエンタングルメント蒸留 Entanglement distillation in terms of Schmidt rank and matrix rank ( http://arxiv.org/abs/2304.05563v1 ) ライセンス: Link先を確認 | Tianyi Ding, Lin Chen | (参考訳) エンタングルメント蒸留は量子情報処理において重要なタスクである。
本稿では、所与のシュミットランクと行列ランクをもつ非正部分転置(NPT)二部状態を蒸留する。
次に、値域が積ベクトルを含む低ランクのB既約(B-irreducible)NPT状態が蒸留可能であることを証明することにより、低ランクのB既約NPT状態は、縮約密度演算子のランクが大きい場合に蒸留可能であることを示す。
最終的には、$M\times N$ bipartite state of rank $\max\{M,N\}+1$ を蒸留する等価条件を示す。 Entanglement distillation is a key task in quantum-information processing. In this paper, we distill non-positive-partial-transpose (NPT) bipartite states of some given Schmidt rank and matrix rank. We show that all bipartite states of Schmidt rank two are locally equivalent to classical-classical states, and all bipartite states of Schmidt rank three are 1-undistillable. Subsequently, we show that low-rank B-irreducible NPT states are distillable for large-rank reduced density operators by proving low-rank B-irreducible NPT state whose range contains a product vector is distillable. Eventually, we present an equivalent condition to distill $M\times N$ bipartite states of rank $\max\{M,N\}+1$. | 翻訳日:2023-04-13 16:27:10 公開日:2023-04-12 |
# 深部バイオメトリック表現の逆変換について On the Adversarial Inversion of Deep Biometric Representations ( http://arxiv.org/abs/2304.05561v1 ) ライセンス: Link先を確認 | Gioacchino Tangari and Shreesh Keskar and Hassan Jameel Asghar and Dali Kaafar | (参考訳) 生体認証サービスプロバイダは、しばしば、数学的(特徴空間)表現から指紋や顔画像などのユーザーの生の生体認証サンプルをリバースエンジニアリングすることは不可能であると主張する。
この攻撃は、元の認識モデル(顔の精度83\%、指紋の86\%)を効果的に推定でき、いくつかのモデルで1-vs-1認証精度で認証された効果的な生体認証再構築を成功させることができる。 Biometric authentication service providers often claim that it is not possible to reverse-engineer a user's raw biometric sample, such as a fingerprint or a face image, from its mathematical (feature-space) representation. In this paper, we investigate this claim on the specific example of deep neural network (DNN) embeddings. Inversion of DNN embeddings has been investigated for explaining deep image representations or synthesizing normalized images. Existing studies leverage full access to all layers of the original model, as well as all possible information on the original dataset. For the biometric authentication use case, we need to investigate this under adversarial settings where an attacker has access to a feature-space representation but no direct access to the exact original dataset nor the original learned model. Instead, we assume varying degree of attacker's background knowledge about the distribution of the dataset as well as the original learned model (architecture and training process). In these cases, we show that the attacker can exploit off-the-shelf DNN models and public datasets, to mimic the behaviour of the original learned model to varying degrees of success, based only on the obtained representation and attacker's prior knowledge. We propose a two-pronged attack that first infers the original DNN by exploiting the model footprint on the embedding, and then reconstructs the raw data by using the inferred model. We show the practicality of the attack on popular DNNs trained for two prominent biometric modalities, face and fingerprint recognition. The attack can effectively infer the original recognition model (mean accuracy 83\% for faces, 86\% for fingerprints), and can craft effective biometric reconstructions that are successfully authenticated with 1-vs-1 authentication accuracy of up to 92\% for some models. 
| 翻訳日:2023-04-13 16:26:57 公開日:2023-04-12 |
# パノラマ画像の直立調整のためのエンドツーエンドネットワーク An End-to-End Network for Upright Adjustment of Panoramic Images ( http://arxiv.org/abs/2304.05556v1 ) ライセンス: Link先を確認 | Heyu Chen, Jianfeng Li and Shigang Li | (参考訳) 現在、パノラマカメラで簡単にパノラマ画像を得ることができる。
画像再構成においては、ディープラーニングネットワークを用いたパノラマ画像のリアルタイムオンライン直立再構成を初めて達成した。 Nowadays, panoramic images can be easily obtained by panoramic cameras. However, when the panoramic camera orientation is tilted, a non-upright panoramic image will be captured. Existing upright adjustment models focus on how to estimate more accurate camera orientation, and attribute image reconstruction to offline or post-processing tasks. To this end, we propose an online end-to-end network for upright adjustment. Our network is designed to reconstruct the image while finding the angle. Our network consists of three modules: orientation estimation, LUT online generation, and upright reconstruction. The orientation estimation module estimates the tilt angle of the panoramic image. Then, a converter block with an upsampling function is designed to generate the LUT from the angle. This module can output the corresponding LUT online for different input angles. Finally, a lightweight generative adversarial network (GAN) aims to generate upright images from shallow features. The experimental results show that, in terms of angles, we have improved the accuracy for small angle errors. In terms of image reconstruction, we have achieved the first real-time online upright reconstruction of panoramic images using deep learning networks. | 翻訳日:2023-04-13 16:26:30 公開日:2023-04-12 |
# マルチモーダル情報監督による転移可能な歩行者表現の学習 Learning Transferable Pedestrian Representation from Multimodal Information Supervision ( http://arxiv.org/abs/2304.05554v1 ) ライセンス: Link先を確認 | Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian | (参考訳) 教師なし人物再識別(reID)に関する最近の研究は、ラベルなし人物画像での事前学習が、ImageNetでの事前学習よりも下流のreIDタスクにおいて優れた性能を発揮することを示した。
広範な実験により,提案手法は一般歩行者表現の学習を容易にし,様々な歩行者分析タスクに有望な結果をもたらすことを実証した。 Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves better performance on downstream reID tasks than pre-training on ImageNet. However, those pre-trained methods are specifically designed for reID and adapt poorly to other pedestrian analysis tasks. In this paper, we propose VAL-PAT, a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information. To train our framework, we introduce three learning objectives, i.e., self-supervised contrastive learning, image-text contrastive learning and multi-attribute classification. The self-supervised contrastive learning facilitates the learning of the intrinsic pedestrian properties, while the image-text contrastive learning guides the model to focus on the appearance information of pedestrians. Meanwhile, multi-attribute classification encourages the model to recognize attributes to excavate fine-grained pedestrian information. We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations, and then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search. Extensive experiments demonstrate that our framework facilitates the learning of general pedestrian representations and thus leads to promising results on various pedestrian analysis tasks. | 翻訳日:2023-04-13 16:26:16 公開日:2023-04-12 |
# DynamicDet: オブジェクト検出のための統一的な動的アーキテクチャ DynamicDet: A Unified Dynamic Architecture for Object Detection ( http://arxiv.org/abs/2304.05552v1 ) ライセンス: Link先を確認 | Zhihao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu | (参考訳) 動的ニューラルネットワークは、ディープラーニングにおける新たな研究トピックである。
また, 動的検出器のために, 検出損失に基づく早期終了(exiting)基準を用いた新しい最適化戦略を提案する。
コードは https://github.com/VDIGPKU/DynamicDet で入手できる。 Dynamic neural networks are an emerging research topic in deep learning. With adaptive inference, dynamic models can achieve remarkable accuracy and computational efficiency. However, it is challenging to design a powerful dynamic detector, because there is no suitable dynamic architecture or exiting criterion for object detection. To tackle these difficulties, we propose a dynamic framework for object detection, named DynamicDet. Firstly, we carefully design a dynamic architecture based on the nature of the object detection task. Then, we propose an adaptive router to analyze the multi-scale information and to decide the inference route automatically. We also present a novel optimization strategy with an exiting criterion based on the detection losses for our dynamic detectors. Last, we present a variable-speed inference strategy, which helps to realize a wide range of accuracy-speed trade-offs with only one dynamic detector. Extensive experiments conducted on the COCO benchmark demonstrate that the proposed DynamicDet achieves new state-of-the-art accuracy-speed trade-offs. For instance, with comparable accuracy, the inference speed of our dynamic detector Dy-YOLOv7-W6 surpasses YOLOv7-E6 by 12%, YOLOv7-D6 by 17%, and YOLOv7-E6E by 39%. The code is available at https://github.com/VDIGPKU/DynamicDet. | 翻訳日:2023-04-13 16:25:52 公開日:2023-04-12 |
# 2次元ヒューマン・ポーズ推定のための蒸留乾式ポース変圧器 Distilling Token-Pruned Pose Transformer for 2D Human Pose Estimation ( http://arxiv.org/abs/2304.05548v1 ) ライセンス: Link先を確認 | Feixiang Ren | (参考訳) 近年、人間のポーズ推定にはトランスフォーマーモデルが広く使われている。
最近のトークン処理されたPose Transformer (PPT)は、画像の背景トークンをプルーニングすることでこの問題を解決する。
この問題を解決するために,人間のポーズ推定(DPPT)のためのDistilling Pruned-Token Transformerを提案する。
MPIIデータセットによる実験結果から,DPPTは計算複雑性を低減しつつ,従来のPPTモデルと比較してPCKを大幅に改善できることが示された。 Human pose estimation has seen widespread use of transformer models in recent years. Pose transformers benefit from the self-attention map, which captures the correlation between human joint tokens and the image. However, training such models is computationally expensive. The recent token-Pruned Pose Transformer (PPT) solves this problem by pruning the background tokens of the image, which are usually less informative. However, although it improves efficiency, PPT inevitably leads to worse performance than TokenPose due to the pruning of tokens. To overcome this problem, we present a novel method called Distilling Pruned-Token Transformer for human pose estimation (DPPT). Our method leverages the output of a pre-trained TokenPose to supervise the learning process of PPT. We also establish connections between the internal structure of pose transformers and PPT, such as attention maps and joint features. Our experimental results on the MPII datasets show that our DPPT can significantly improve PCK compared to previous PPT models while still reducing computational complexity. | 翻訳日:2023-04-13 16:25:30 公開日:2023-04-12 |
# 分類学クラスインクリメンタル学習 Taxonomic Class Incremental Learning ( http://arxiv.org/abs/2304.05547v1 ) ライセンス: Link先を確認 | Yuzhao Chen, Zonghuan Li, Zhiyuan Hu, Nuno Vasconcelos | (参考訳) 継続的学習の問題は近年注目を集めている。
そこで本研究では,Taxonomic Class Incremental Learning (TCIL) 問題を提案する。
CIFAR-100 と ImageNet-100 の実験により,提案する TCIL 法の有効性を示した。最終精度で既存の SOTA 法を CIFAR-100 では2%,ImageNet-100 では3% 上回った。 The problem of continual learning has attracted rising attention in recent years. However, few works have questioned the commonly used learning setup, based on a task curriculum of random classes. This differs significantly from human continual learning, which is guided by taxonomic curricula. In this work, we propose the Taxonomic Class Incremental Learning (TCIL) problem. In TCIL, the task sequence is organized based on a taxonomic class tree. We unify existing approaches to CIL and taxonomic learning as parameter inheritance schemes and introduce a new such scheme for TCIL. This enables the incremental transfer of knowledge from ancestor to descendant classes of a class taxonomy through parameter inheritance. Experiments on CIFAR-100 and ImageNet-100 show the effectiveness of the proposed TCIL method, which outperforms existing SOTA methods by 2% in terms of final accuracy on CIFAR-100 and 3% on ImageNet-100. | 翻訳日:2023-04-13 16:25:12 公開日:2023-04-12 |
# MEMA Runtime Framework:マイクロコントローラ上のTinyMLの外部メモリアクセスを最小化 MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers ( http://arxiv.org/abs/2304.05544v1 ) ライセンス: Link先を確認 | Andrew Sabot, Vikas Natesh, H.T. Kung, Wei-Te Ting | (参考訳) 本稿では,行列乗算のための外部メモリアクセスを最小限に抑える効率的な推論ランタイムの簡易かつ迅速な導出のためのmemaフレームワークを提案する。
例えば、ARM Cortex-M4のニューラルネットワークベンチマークでは、最大1.8倍のスピードアップと44%のエネルギー削減を実現しています。 We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in the current practice, that is, optimal schedules tend to be found only through a time consuming and heuristic search of a large scheduling space. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems. For example, for neural network benchmarks on the ARM Cortex-M4, we achieve up to a 1.8x speedup and 44% energy reduction over CMSIS-NN. | 翻訳日:2023-04-13 16:24:58 公開日:2023-04-12 |
# CLCLSA:非完全マルチオミクスデータとのマルチオミクス統合のためのコントラスト学習と自己注意によるクロスオミクスの埋め込み CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data ( http://arxiv.org/abs/2304.05542v1 ) ライセンス: Link先を確認 | Chen Zhao, Anqi Liu, Xiao Zhang, Xuewei Cao, Zhengming Ding, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou | (参考訳) 不均一・高次元マルチオミクスデータの統合は、遺伝データの理解においてますます重要になっている。
実験の結果,clclsaは不完全マルチオミクスデータを用いたマルチオミクスデータ分類の最先端手法よりも優れていた。 Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data. | 翻訳日:2023-04-13 16:24:44 公開日:2023-04-12 |
# 微分可能プログラミングと機械学習を用いた学習型マルチフィジックスインバージョン Learned multiphysics inversion with differentiable programming and machine learning ( http://arxiv.org/abs/2304.05592v1 ) ライセンス: Link先を確認 | Mathias Louboutin and Ziyi Yin and Rafael Orozco and Thomas J. Grady II and Ali Siahkoohi and Gabrio Rizzuti and Philipp A. Witte and Olav Møyner and Gerard J. Gorman and Felix J. Herrmann | (参考訳) 本稿では,計算地球物理学,さらに一般的には,波動方程式(地震波・医用超音波など),学習された事前分布による正則化,多相流シミュレーションのための学習型ニューラルサロゲートを含む逆問題のための,Seismic Laboratory for Imaging and Modeling/Monitoring (SLIM) オープンソースソフトウェアフレームワークを紹介する。
波動物理学と多相流の結合を別にして,タイムラプスクロスウェル地震データから透過性反転するスケーラブルなプロトタイプを構築することで,当社の設計原理とそのメリットを実証し,実証する。 We present the Seismic Laboratory for Imaging and Modeling/Monitoring (SLIM) open-source software framework for computational geophysics and, more generally, inverse problems involving the wave-equation (e.g., seismic and medical ultrasound), regularization with learned priors, and learned neural surrogates for multiphase flow simulations. By integrating multiple layers of abstraction, our software is designed to be both readable and scalable. This allows researchers to easily formulate their problems in an abstract fashion while exploiting the latest developments in high-performance computing. We illustrate and demonstrate our design principles and their benefits by means of building a scalable prototype for permeability inversion from time-lapse crosswell seismic data, which aside from coupling of wave physics and multiphase flow, involves machine learning. | 翻訳日:2023-04-13 16:19:07 公開日:2023-04-12 |
# FLAN-T5における意味的特徴検証 Semantic Feature Verification in FLAN-T5 ( http://arxiv.org/abs/2304.05591v1 ) ライセンス: Link先を確認 | Siddharth Suresh, Kushin Mukherjee, Timothy T. Rogers | (参考訳) 本研究では,認知科学における概念構造評価のための重要なツールである意味的特徴規範の生成を支援する大規模言語モデルの可能性を評価した。
その結果,LLMは従来の意味的特徴ノルム検証手法を大幅に強化し,人間や機械における概念表現の理解に寄与することが示唆された。 This study evaluates the potential of a large language model for aiding in generation of semantic feature norms - a critical tool for evaluating conceptual structure in cognitive science. Building from an existing human-generated dataset, we show that machine-verified norms capture aspects of conceptual structure beyond what is expressed in human norms alone, and better explain human judgments of semantic similarity amongst items that are distally related. The results suggest that LLMs can greatly enhance traditional methods of semantic feature norm verification, with implications for our understanding of conceptual representation in humans and machines. | 翻訳日:2023-04-13 16:18:51 公開日:2023-04-12 |
# 画像事前分布を用いない不良設定画像再構成 Ill-Posed Image Reconstruction Without an Image Prior ( http://arxiv.org/abs/2304.05589v1 ) ライセンス: Link先を確認 | Oscar Leong and Angela F. Gao and He Sun and Katherine L. Bouman | (参考訳) 画像事前分布や正解(ground-truth)サンプルにアクセスせずに、不良設定のイメージング逆問題を解くことを検討する。
我々が提案するフレームワークは, 一般的な順モデルの破損を処理可能であり, 少数の正解画像($\leqslant 150$)から得られる測定値が, 「事前分布なし」(prior-free)の画像再構成に十分であることを示す。
我々は, ノイズ除去, 位相回復, ブラックホールビデオ再構成など, 様々な凸・非凸逆問題に対して提案手法を実証する。 We consider solving ill-posed imaging inverse problems without access to an image prior or ground-truth examples. An overarching challenge in these inverse problems is that an infinite number of images, including many that are implausible, are consistent with the observed measurements. Thus, image priors are required to reduce the space of possible solutions to more desirable reconstructions. However, in many applications it is difficult or potentially impossible to obtain example images to construct an image prior. Hence inaccurate priors are often used, which inevitably result in biased solutions. Rather than solving an inverse problem using priors that encode the spatial structure of any one image, we propose to solve a set of inverse problems jointly by incorporating prior constraints on the collective structure of the underlying images. The key assumption of our work is that the underlying images we aim to reconstruct share common, low-dimensional structure. We show that such a set of inverse problems can be solved simultaneously without the use of a spatial image prior by instead inferring a shared image generator with a low-dimensional latent space. The parameters of the generator and latent embeddings are found by maximizing a proxy for the Evidence Lower Bound (ELBO). Once identified, the generator and latent embeddings can be combined to provide reconstructed images for each inverse problem. The framework we propose can handle general forward model corruptions, and we show that measurements derived from only a small number of ground-truth images ($\leqslant 150$) are sufficient for "prior-free" image reconstruction. We demonstrate our approach on a variety of convex and non-convex inverse problems, including denoising, phase retrieval, and black hole video reconstruction. | 翻訳日:2023-04-13 16:18:40 公開日:2023-04-12 |
# スパイクニューラルネットワークシミュレーション、シリアライズ、相互運用性のための分散圧縮スパース列フォーマット Distributed Compressed Sparse Row Format for Spiking Neural Network Simulation, Serialization, and Interoperability ( http://arxiv.org/abs/2304.05587v1 ) ライセンス: Link先を確認 | Felix Wang | (参考訳) ニューロモルフィックプラットフォームとその関連ソフトウェアツールの開発が増加し、スパイクニューラルネットワーク(SNN)モデルの規模が増大するにつれ、ネットワーク状態の相互運用可能でスケーラブルな表現に対する圧力が高まっている。
我々は, ニューロンやシナプス状態などの付加的なネットワーク情報を, dCSR がネットワーク状態のパーティショニングに基づく直接分布を提供するため, その隣接性に合わせて整理する。
私たちはまた、潜在的な実装を提供し、ニューラルコンピューティングコミュニティ内での採用を前進させています。 With the increasing development of neuromorphic platforms and their related software tools as well as the increasing scale of spiking neural network (SNN) models, there is a pressure for interoperable and scalable representations of network state. In response to this, we discuss a parallel extension of a widely used format for efficiently representing sparse matrices, the compressed sparse row (CSR), in the context of supporting the simulation and serialization of large-scale SNNs. Sparse matrices for graph adjacency structure provide a natural fit for describing the connectivity of an SNN, and prior work in the area of parallel graph partitioning has developed the distributed CSR (dCSR) format for storing and ingesting large graphs. We contend that organizing additional network information, such as neuron and synapse state, in alignment with its adjacency as dCSR provides a straightforward partition-based distribution of network state. For large-scale simulations, this means each parallel process is only responsible for its own partition of state, which becomes especially useful when the size of an SNN exceeds the memory resources of a single compute node. For potentially long-running simulations, this also enables network serialization to and from disk (e.g. for checkpoint/restart fault-tolerant computing) to be performed largely independently between parallel processes. We also provide a potential implementation, and put it forward for adoption within the neural computing community. | 翻訳日:2023-04-13 16:18:10 公開日:2023-04-12 |
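The row-partitioned layout described in the abstract above can be illustrated with a small sketch. This is not the author's implementation: `to_csr` and `partition_rows` are hypothetical helper names, the convention that row i lists the outgoing synapses of neuron i is an assumption, and so is the contiguous equal-sized row blocking. It only shows how CSR arrays can be split so that each parallel process holds its own rows, with per-synapse state stored alongside the adjacency.

```python
from typing import Dict, List, Tuple


def to_csr(n_rows: int, edges: List[Tuple[int, int, float]]):
    """Build CSR arrays (row_ptr, col_idx, values) from (row, col, weight) triples."""
    row_ptr = [0] * (n_rows + 1)
    for r, _, _ in edges:
        row_ptr[r + 1] += 1          # count entries per row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives row start offsets
    col_idx = [0] * len(edges)
    values = [0.0] * len(edges)
    next_slot = row_ptr[:-1]          # slicing copies; next free slot per row
    for r, c, w in sorted(edges):     # sorted so columns are ordered within a row
        k = next_slot[r]
        col_idx[k] = c
        values[k] = w
        next_slot[r] += 1
    return row_ptr, col_idx, values


def partition_rows(row_ptr, col_idx, values, n_parts: int) -> List[Dict]:
    """Split a CSR matrix into contiguous row blocks (assumed partitioning scheme).

    Each part keeps a local row_ptr that starts at 0, global column ids,
    and the global index of its first row.
    """
    n_rows = len(row_ptr) - 1
    block = (n_rows + n_parts - 1) // n_parts
    parts = []
    for p in range(n_parts):
        lo = min(p * block, n_rows)
        hi = min(lo + block, n_rows)
        start, end = row_ptr[lo], row_ptr[hi]
        parts.append({
            "row_offset": lo,
            "row_ptr": [x - start for x in row_ptr[lo:hi + 1]],
            "col_idx": col_idx[start:end],
            "values": values[start:end],  # e.g. synaptic weights, stored with adjacency
        })
    return parts


# Toy network: 4 neurons, 4 synapses (illustrative data only).
edges = [(0, 1, 0.5), (0, 2, 0.2), (1, 3, 1.0), (3, 0, 0.7)]
row_ptr, col_idx, values = to_csr(4, edges)
parts = partition_rows(row_ptr, col_idx, values, n_parts=2)
```

In this sketch each entry of `parts` could be written to or read from disk independently, which is the property the abstract highlights for checkpoint/restart; the actual on-disk format is not specified here.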
# 情報性は重要か? 教育対話行為分類のためのアクティブラーニング Does Informativeness Matter? Active Learning for Educational Dialogue Act Classification ( http://arxiv.org/abs/2304.05578v1 ) ライセンス: Link先を確認 | Wei Tan, Jionghao Lin, David Lang, Guanliang Chen, Dragan Gasevic, Lan Du, Wray Buntine | (参考訳) 対話行為(DA)は、指導の過程で専門家のチューターが何を行い、学生が何を知っているかを説明するために用いられる。
そこで本研究では, ALサンプリングプロセスにおいて, DA分類器をサポートするために, AL法が情報的サンプルを選択する方法について検討した。
また,alサンプリングプロセスにおいて,alメソッドが手動アノテーションのコストを削減する方法を示す。 Dialogue Acts (DAs) can be used to explain what expert tutors do and what students know during the tutoring process. Most empirical studies adopt the random sampling method to obtain sentence samples for manual annotation of DAs, which are then used to train DA classifiers. However, these studies have paid little attention to sample informativeness, which can reflect the information quantity of the selected samples and inform the extent to which a classifier can learn patterns. Notably, the informativeness level may vary among the samples and the classifier might only need a small amount of low informative samples to learn the patterns. Random sampling may overlook sample informativeness, which consumes human labelling costs and contributes less to training the classifiers. As an alternative, researchers suggest employing statistical sampling methods of Active Learning (AL) to identify the informative samples for training the classifiers. However, the use of AL methods in educational DA classification tasks is under-explored. In this paper, we examine the informativeness of annotated sentence samples. Then, the study investigates how the AL methods can select informative samples to support DA classifiers in the AL sampling process. The results reveal that most annotated sentences present low informativeness in the training dataset and the patterns of these sentences can be easily captured by the DA classifier. We also demonstrate how AL methods can reduce the cost of manual annotation in the AL sampling process. | 翻訳日:2023-04-13 16:17:44 公開日:2023-04-12 |
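The abstract above does not say which active learning criteria were used, so the following is only a generic sketch of one common way to score sample informativeness: predictive entropy (uncertainty sampling). The function names and the toy probability vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabeled samples with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled sentences, three DA classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction: low informativeness
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # near-uniform prediction: high informativeness
]
chosen = select_most_informative(pool, k=2)
```

Under this criterion the annotator would label the near-uniform samples first, which is one way the manual annotation budget mentioned in the abstract could be spent on sentences the classifier has not yet learned.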
# SGL:カメラローカライゼーションのための構造指導学習 SGL: Structure Guidance Learning for Camera Localization ( http://arxiv.org/abs/2304.05571v1 ) ライセンス: Link先を確認 | Xudong Zhang, Shuang Gao, Xiaohu Nan, Haikuan Ning, Yuchen Yang, Yishan Ping, Jixiang Wan, Shuzhou Dong, Jijunnan Li, Yandong Guo | (参考訳) カメラのローカライゼーション(英: camera localization)は、さまざまな人工知能やロボット工学の応用に役立つ古典的なコンピュータビジョンタスクである。
近年、Deep Neural Networks(DNN)の急速な発展に伴い、エンド・ツー・エンドの視覚的ローカライゼーション手法が繁栄している。
sota(state-of-the-art)法と十分なアブレーション実験との比較により,提案手法の有効性を確認した。 Camera localization is a classical computer vision task that serves various Artificial Intelligence and Robotics applications. With the rapid developments of Deep Neural Networks (DNNs), end-to-end visual localization methods are prosperous in recent years. In this work, we focus on the scene coordinate prediction ones and propose a network architecture named as Structure Guidance Learning (SGL) which utilizes the receptive branch and the structure branch to extract both high-level and low-level features to estimate the 3D coordinates. We design a confidence strategy to refine and filter the predicted 3D observations, which enables us to estimate the camera poses by employing the Perspective-n-Point (PnP) with RANSAC. In the training part, we design the Bundle Adjustment trainer to help the network fit the scenes better. Comparisons with some state-of-the-art (SOTA) methods and sufficient ablation experiments confirm the validity of our proposed architecture. | 翻訳日:2023-04-13 16:17:20 公開日:2023-04-12 |
# 非線形変位カー状態とその非古典的性質 Nonlinear displaced Kerr state and its nonclassical properties ( http://arxiv.org/abs/2304.05570v1 ) ライセンス: Link先を確認 | Arpita Chatterjee and Rupamanjari Ghosh | (参考訳) 我々は,よく知られた光子付加コヒーレント状態を通常のカー媒質に通して得られる状態に変位演算子を適用することにより,非線形変位カー状態という異なるクラスを構築する。
低カーパラメータ近似を用いて無限準位の問題を切り詰めた離散2準位系に還元し、生成した非古典性を線形光学装置の出力状態の2つのモード間の2部エンタングルメントに変換する。 We construct a distinct class of nonlinear displaced Kerr states by applying the displacement operator to a state prepared by sending the well-known photon-added coherent state through a normal Kerr medium. A sketch of the experimental set-up for preparing the state is suggested. We evaluate some statistical properties, such as the photon number distribution, Mandel's $Q$ parameter, the Husimi-$Q$ and Wigner functions, and quadrature squeezing, for the nonlinear displaced Kerr state, and then analyze the nonclassicality in terms of these standard parameters. We reduce the infinite-level problem to a truncated discrete two-level system by using a low-Kerr-parameter approximation and then convert the generated nonclassicality into bipartite entanglement between the two modes of an output state of a linear optical device. | 翻訳日:2023-04-13 16:17:03 公開日:2023-04-12 |
# デュアルエンコーダを用いたシーンテキスト編集のための拡散モデルの改善 Improving Diffusion Models for Scene Text Editing with Dual Encoders ( http://arxiv.org/abs/2304.05568v1 ) ライセンス: Link先を確認 | Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang | (参考訳) シーンテキスト編集は、自然でリアルな外観を維持しながら、画像中の特定のテキストを修正または挿入する難しいタスクである。
https://github.com/UCSB-NLP-Chang/DiffSTE Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from the text instruction to the corresponding image with either the specified style or the style of the surrounding texts in the background. Such a training method further brings our method the zero-shot generalization ability to the following three scenarios: generating text with unseen font variation, e.g., italic and bold, mixing different fonts to construct a new font, and using more relaxed forms of natural language as the instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available. https://github.com/UCSB-NLP-Chang/DiffSTE | 翻訳日:2023-04-13 16:16:47 公開日:2023-04-12 |
# 2つの減衰量子化場の相互作用の厳密解 Exact solution for the interaction of two decaying quantized fields ( http://arxiv.org/abs/2304.05566v1 ) ライセンス: Link先を確認 | L. Hernández-Sánchez, I. Ramos-Prieto, F. Soto-Eguibar, H. M. Moya-Cessa | (参考訳) 2つの結合調和振動子のマルコフダイナミクスを、Schrödinger方程式と有効非エルミートハミルトニアンを用いて解析できることを示した。
最後に、さらに非ユニタリ変換を適用することで、有効非エルミートハミルトニアンを対角化し、完全な量子領域における任意の入力状態の時間発展を得ることができる。 We show that the Markovian dynamics of two coupled harmonic oscillators may be analyzed using a Schrödinger equation and an effective non-Hermitian Hamiltonian. This may be achieved by a non-unitary transformation that involves superoperators; this transformation removes the quantum jump superoperators, which allows us to rewrite the Lindblad master equation in terms of a von Neumann-like equation with an effective non-Hermitian Hamiltonian. This may be generalized to an arbitrary number of interacting fields. Finally, by applying an extra non-unitary transformation, we may diagonalize the effective non-Hermitian Hamiltonian to obtain the evolution of any input state in a fully quantum domain. | 翻訳日:2023-04-13 16:16:22 公開日:2023-04-12 |
# セメストラルコース通過確率の同定における機械学習アルゴリズムを用いた予測モデル A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course ( http://arxiv.org/abs/2304.05565v1 ) ライセンス: Link先を確認 | Anabella C. Doctor | (参考訳) 本研究の目的は,学期前半に受講したコースを受講する確率を学習するための予測モデルを決定することである。
知識の伝達や学生の学業成績向上のプロセスを改善することにより、教育システムにおける意思決定に有用な結果をもたらす、高い受理性、正確、精度のよい予測モデルを発見し、CRISP-DM(Cross-Industry Standard Process for Data Mining)方法論を厳密に踏襲する。
さらに、一部の学生の人口統計情報、データセット内の膨大なデータ、生徒がどの基準を規制できる予測基準指標の自動的および手作業によるプロセス、学期半ばから早くも学期末のコースを受講するためには、より多くの改善が必要となる。 This study aims to determine a predictive model to learn students probability to pass their courses taken at the earliest stage of the semester. To successfully discover a good predictive model with high acceptability, accurate, and precision rate which delivers a useful outcome for decision making in education systems, in improving the processes of conveying knowledge and uplifting students academic performance, the proponent applies and strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification for data mining techniques, and decision tree for algorithm. With the utilization of the newly discovered predictive model, the prediction of students probabilities to pass the current courses they take gives 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 f1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Considering the indicators and the results, it can be noted that the prediction model used in this study is highly acceptable. The data mining techniques provides effective and efficient innovative tools in analyzing and predicting student performances. The model used in this study will greatly affect the way educators understand and identify the weakness of their students in the class, the way they improved the effectiveness of their learning processes gearing to their students, bring down academic failure rates, and help institution administrators modify their learning system outcomes. 
Further study for the inclusion of some students demographic information, vast amount of data within the dataset, automated and manual process of predictive criteria indicators where the students can regulate to which criteria, they must improve more for them to pass their courses taken at the end of the semester as early as midterm period are highly needed. | 翻訳日:2023-04-13 16:16:08 公開日:2023-04-12 |
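As a quick consistency check on the metrics reported in the abstract above, the F1 score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The sketch below uses only the four numbers quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.8333
reported_recall = 0.8823
reported_f1 = 0.8571

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

The recomputed value agrees with the reported F1 to four decimal places, so the three figures are mutually consistent. The accuracy of 0.7619 cannot be checked this way because it also depends on the number of true negatives, which the abstract does not give.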
# 知識蒸留によるニューラルネットワークからのディープスパイクニューラルネットワークの構築 Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation ( http://arxiv.org/abs/2304.05627v1 ) ライセンス: Link先を確認 | Qi Xu, Yaxin Li, Jiangrong Shen, Jian K Liu, Huajin Tang, Gang Pan | (参考訳) スパイクニューラルネットワーク(snn)は、スパイクを生体神経系に近い情報単位として利用する重要なコンポーネントであるため、高い計算効率を持つ脳に触発されたモデルとしてよく知られている。
提案手法は,より効率的かつ合理的な深層スパイク構造を構築するだけでなく,直接訓練やANN to SNN法と比較して,モデル全体をトレーニングするための時間ステップも少ない。
提案手法は,より深い構造を高スループットで構築し,実用シナリオの軽量で効率的な脳にインスパイアされた計算に活用することで,snの性能を向上させる効率的な手法を提供する。 Spiking neural networks (SNNs) are well known as the brain-inspired models with high computing efficiency, due to a key component that they utilize spikes as information units, close to the biological neural systems. Although spiking based models are energy efficient by taking advantage of discrete spike signals, their performance is limited by current network structures and their training methods. As discrete signals, typical SNNs cannot apply the gradient descent rules directly into parameters adjustment as artificial neural networks (ANNs). Aiming at this limitation, here we propose a novel method of constructing deep SNN models with knowledge distillation (KD) that uses ANN as teacher model and SNN as student model. Through ANN-SNN joint training algorithm, the student SNN model can learn rich feature information from the teacher ANN model through the KD method, yet it avoids training SNN from scratch when communicating with non-differentiable spikes. Our method can not only build a more efficient deep spiking structure feasibly and reasonably, but use few time steps to train whole model compared to direct training or ANN to SNN methods. More importantly, it has a superb ability of noise immunity for various types of artificial noises and natural signals. The proposed novel method provides efficient ways to improve the performance of SNN through constructing deeper structures in a high-throughput fashion, with potential usage for light and efficient brain-inspired computing of practical scenarios. | 翻訳日:2023-04-13 16:09:12 公開日:2023-04-12 |
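The teacher-student setup in the abstract above can be made concrete with the generic soft-label distillation loss. This is a sketch of the standard temperature-scaled formulation, not the paper's ANN-SNN joint training algorithm: the temperature, the weighting `alpha`, and the idea that the student SNN's logits come from its output layer (for example, time-averaged spike counts) are all assumptions introduced for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      label: int,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term: cross-entropy of the student against the true class.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between softened teacher and student outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0.0)
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Hypothetical logits for a 3-class problem.
student = [2.0, 0.5, -1.0]
loss_same = distillation_loss(student, [2.0, 0.5, -1.0], label=0)  # teacher agrees
loss_diff = distillation_loss(student, [0.0, 3.0, -1.0], label=0)  # teacher disagrees
```

When the teacher and student produce the same logits the soft term vanishes and only the hard-label term remains; any disagreement adds a positive penalty. How gradients of such a loss reach the non-differentiable spiking units is exactly the part the paper addresses and is not covered by this sketch.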
# 合成関連拡散画像データを用いた乳がん臨床診断支援のための複数施設のオープンソースベンチマークデータセット A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data ( http://arxiv.org/abs/2304.05623v1 ) ライセンス: Link先を確認 | Chi-en Amy Tai, Hayden Gunraj, Alexander Wong | (参考訳) 近年, 合成相関拡散(CDI$^s$)画像と呼ばれる新しいMRI法が導入され, 現行の金標準MRI法と比較して, 前立腺癌などのがんに対する臨床診断支援にかなりの期待が持たれている。
CDI$^s$を用いた乳がんのコンピュータ支援臨床意思決定支援の開発を推進すべく、乳がん患者のCDI$^s$ボリューム画像データからなる多施設オープンソースベンチマークデータセットであるCancer-Net BCaを紹介する。
Cancer-Net BCaは、10施設にわたる253人の治療前コホートのCDI$^s$ボリューム画像と、詳細なアノテーションメタデータ(病変型、遺伝子サブタイプ、MRI上の最長径(MRLD)、Scarff-Bloom-Richardson(SBR)グレード、術前化学療法に対する治療後の病理学的完全奏効(pCR))を含む。
我々はさらに、Cancer-Net BCaデータセットの人口統計および腫瘍の多様性を調べ、潜在的なバイアスに対する深い洞察を得る。
Cancer-Net BCaは、機械学習の進歩を加速し、がんと戦う臨床医を助ける、グローバルなオープンソースイニシアチブの一部として、一般公開されている。 Recently, a new form of magnetic resonance imaging (MRI) called synthetic correlated diffusion (CDI$^s$) imaging was introduced and showed considerable promise for clinical decision support for cancers such as prostate cancer when compared to current gold-standard MRI techniques. However, the efficacy for CDI$^s$ for other forms of cancers such as breast cancer has not been as well-explored nor have CDI$^s$ data been previously made publicly available. Motivated to advance efforts in the development of computer-aided clinical decision support for breast cancer using CDI$^s$, we introduce Cancer-Net BCa, a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients. Cancer-Net BCa contains CDI$^s$ volumetric images from a pre-treatment cohort of 253 patients across ten institutions, along with detailed annotation metadata (the lesion type, genetic subtype, longest diameter on the MRI (MRLD), the Scarff-Bloom-Richardson (SBR) grade, and the post-treatment breast cancer pathologic complete response (pCR) to neoadjuvant chemotherapy). We further examine the demographic and tumour diversity of the Cancer-Net BCa dataset to gain deeper insights into potential biases. Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer. | 翻訳日:2023-04-13 16:08:47 公開日:2023-04-12 |
# SAMM(Segment Any Medical Model):SAMへの3D Slicer統合 SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM ( http://arxiv.org/abs/2304.05622v1 ) ライセンス: Link先を確認 | Yihao Liu, Jiaming Zhang, Zhangcong She, Amir Kheradmand and Mehran Armand | (参考訳) Segment Anything Model(SAM)は、現時点で最大のセグメンテーションデータセットでトレーニングされた新しい画像セグメンテーションツールである。
医療画像におけるSAMの開発、評価、利用を支援するため、医療画像コミュニティで広く利用されているオープンソースの画像処理および可視化ソフトウェアである3D Slicer上のSAMの拡張、Segment Any Medical Model (SAMM)を紹介する。
3D Slicerのオープンソース拡張とそのデモはGitHubに投稿されている(https://github.com/bingogome/samm)。
SAMMは完全なサイクルの0.6秒のレイテンシを実現し、ほぼリアルタイムで画像マスクを推測できる。 The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm). SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time. | 翻訳日:2023-04-13 16:08:17 公開日:2023-04-12 |
# NutritionVerse-Thin:3次元食品モデルのレンダリング改善のための最適化戦略 NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models ( http://arxiv.org/abs/2304.05620v1 ) ライセンス: Link先を確認 | Chi-en Amy Tai, Jason Li, Sriram Kumar, Saeejith Nair, Yuhao Chen, Pengcheng Xi, Alexander Wong | (参考訳) 生成モデルの能力向上に伴い、一般的な3D食品のリアルなレンダリングを用いて、食品印刷、栄養予測、食品の無駄管理といった下流業務を改善することへの関心が高まっている。
単純ながら、この技術は細い3Dオブジェクトの迅速かつ高度に一貫したキャプチャに利用できる。 With the growth in capabilities of generative models, there has been growing interest in using photo-realistic renders of common 3D food items to improve downstream tasks such as food printing, nutrition prediction, or management of food wastage. Despite 3D modelling capabilities being more accessible than ever due to the success of NeRF based view-synthesis, such rendering methods still struggle to correctly capture thin food objects, often generating meshes with significant holes. In this study, we present an optimized strategy for enabling improved rendering of thin 3D food models, and demonstrate qualitative improvements in rendering quality. Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method and tailors the strategy at both the data collection and training stages to better handle thin objects. While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects. | 翻訳日:2023-04-13 16:08:04 公開日:2023-04-12 |
# NutritionVerse-3D:栄養摂取推定のための3次元食品モデルデータセット NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation ( http://arxiv.org/abs/2304.05619v1 ) ライセンス: Link先を確認 | Chi-en Amy Tai, Matthew Keller, Mattie Kerrigan, Yuhao Chen, Saeejith Nair, Pengcheng Xi, Alexander Wong | (参考訳) 今日、50歳以上の成人の77%が住み慣れた場所で歳を重ねることを望んでおり、適切な栄養摂取の確保が大きな課題となっている。
本稿では、スピードと一貫性を特に重視した、食品の高品質な3Dモデル収集手法を開発し、105個の3D食品モデルと、それぞれの重量、食品名、栄養価からなる大規模で高品質な高解像度データセットであるNutritionVerse-3Dを導入する。
NutritionVerse-3Dは、栄養センシングのための機械学習を加速するオープンイニシアチブの一部として公開されている。 77% of adults over 50 want to age in place today, presenting a major challenge to ensuring adequate nutritional intake. It has been reported that one in four older adults that are 65 years or older are malnourished and given the direct link between malnutrition and decreased quality of life, there have been numerous studies conducted on how to efficiently track nutritional intake of food. Recent advancements in machine learning and computer vision show promise of automated nutrition tracking methods of food, but require a large high-quality dataset in order to accurately identify the nutrients from the food on the plate. Unlike existing datasets, a collection of 3D models with nutritional information allow for view synthesis to create an infinite number of 2D images for any given viewpoint/camera angle along with the associated nutritional information. In this paper, we develop a methodology for collecting high-quality 3D models for food items with a particular focus on speed and consistency, and introduce NutritionVerse-3D, a large-scale high-quality high-resolution dataset of 105 3D food models, in conjunction with their associated weight, food name, and nutritional value. These models allow for large quantity food intake scenes, diverse and customizable scene layout, and an infinite number of camera settings and lighting conditions. NutritionVerse-3D is publicly available as a part of an open initiative to accelerate machine learning for nutrition sensing. | 翻訳日:2023-04-13 16:07:47 公開日:2023-04-12 |
# 分布外逐次推薦のための深層安定マルチインタレスト学習 Deep Stable Multi-Interest Learning for Out-of-distribution Sequential Recommendation ( http://arxiv.org/abs/2304.05615v1 ) ライセンス: Link先を確認 | Qiang Liu, Zhaocheng Liu, Zhenxi Zhu, Shu Wu, Liang Wang | (参考訳) 近年、ユーザの興味を複数の表現ベクトルとして抽出するマルチインタレストモデルが、逐次推薦において有望な性能を示している。
上記のOOD一般化問題に対処するため、モデル内で抽出した興味の非相関化を図る、DEep Stable Multi-Interest Learning (DESMIL) と呼ばれる新しいマルチインタレストネットワークを提案する。
一方、DESMILは、Hilbert-Schmidt Independence Criterion(HSIC)に基づき、トレーニングサンプルに重みを付ける重み付き相関推定損失を取り入れ、抽出された興味の間の相関を最小化する。
OODとランダムな設定の両方で大規模な実験が行われ、それぞれ最大36.8%と21.7%の相対的な改善が達成されている。 Recently, multi-interest models, which extract interests of a user as multiple representation vectors, have shown promising performances for sequential recommendation. However, none of the existing multi-interest recommendation models consider the Out-Of-Distribution (OOD) generalization problem, in which interest distribution may change. Considering multiple interests of a user are usually highly correlated, the model has a chance to learn spurious correlations between noisy interests and target items. Once the data distribution changes, the correlations among interests may also change, and the spurious correlations will mislead the model to make wrong predictions. To tackle the above OOD generalization problem, we propose a novel multi-interest network, named DEep Stable Multi-Interest Learning (DESMIL), which attempts to de-correlate the extracted interests in the model, and thus spurious correlations can be eliminated. DESMIL applies an attentive module to extract multiple interests, and then selects the most important one for making final predictions. Meanwhile, DESMIL incorporates a weighted correlation estimation loss based on Hilbert-Schmidt Independence Criterion (HSIC), with which training samples are weighted, to minimize the correlations among extracted interests. Extensive experiments have been conducted under both OOD and random settings, and up to 36.8% and 21.7% relative improvements are achieved respectively. | 翻訳日:2023-04-13 16:07:24 公開日:2023-04-12 |
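The abstract above relies on the Hilbert-Schmidt Independence Criterion (HSIC) as a dependence measure between extracted interests. As a rough illustration of what that quantity looks like, the following is a minimal sketch of the standard biased, unweighted HSIC estimator with a Gaussian kernel. It is not the authors' weighted loss; the function names `rbf_kernel` and `hsic` and the bandwidth `sigma` are assumptions made only for this example.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x (shape: n x d)."""
    sq_norms = np.sum(x ** 2, axis=1, keepdims=True)
    sq_dists = sq_norms + sq_norms.T - 2.0 * x @ x.T
    # Clip tiny negative values caused by floating-point error.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    Illustrative only: every sample has equal weight here, whereas the
    abstract describes a variant in which training samples are weighted.
    """
    n = x.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n  # H = I - (1/n) 11^T
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return float(np.trace(k @ centering @ l @ centering)) / (n - 1) ** 2
```

In this sketch, two interest representations that vary together give a larger value than two independent ones, which is why minimizing such a term pushes the representations toward being uncorrelated.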
# ChatGPT Beyond English:多言語学習における大規模言語モデルの包括的評価に向けて ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning ( http://arxiv.org/abs/2304.05613v1 ) ライセンス: Link先を確認 | Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen | (参考訳) ここ数年、大規模言語モデル (LLM) が自然言語処理(NLP)における最も重要なブレークスルーとして現れ、この分野の研究と発展を根本的に変えてきた。
従来のモデルと比較すると,様々なNLPタスクや言語に対するChatGPTの性能は低下しており,より優れたモデル開発と多言語学習の理解が求められている。 Over the last few years, large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP) that fundamentally transform research and developments in the field. ChatGPT represents one of the most exciting LLM systems developed recently to showcase impressive skills for language generation and highly attract public attention. Among various exciting applications discovered for ChatGPT in English, the model can process and generate texts for multiple languages due to its multilingual training data. Given the broad adoption of ChatGPT for English in different problems and areas, a natural question is whether ChatGPT can also be applied effectively for other languages or it is necessary to develop more language-specific technologies. The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research. Our work aims to fill this gap for the evaluation of ChatGPT and similar LLMs to provide more comprehensive information for multilingual NLP applications. While this work will be an ongoing effort to include additional experiments in the future, our current paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. We also focus on the zero-shot learning setting for ChatGPT to improve reproducibility and better simulate the interactions of general users. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages, calling for further research to develop better models and understanding for multilingual learning. | 翻訳日:2023-04-13 16:06:57 公開日:2023-04-12 |
# 環境正義データツールにおける割当害の可能性 Potential for allocative harm in an environmental justice data tool ( http://arxiv.org/abs/2304.05603v1 ) ライセンス: Link先を確認 | Benjamin Q. Huynh, Elizabeth T. Chin, Allison Koenecke, Derek Ouyang, Daniel E. Ho, Mathew V. Kiang, David H. Rehkopf | (参考訳) 政策決定を知らせるために、近隣レベルのスクリーニングアルゴリズムがますます展開されている。
このアルゴリズムは財政的に重大な影響を持ち、その肯定的な指定の効果は資金の104%(62-145%)の増加と推定され、これは4年間で20.8億ドル(15.6-24.1億ドル)に相当する。
我々は、割当上の害を軽減するための感度分析と、誤用を防止するための説明責任メカニズムの導入を推奨する。 Neighborhood-level screening algorithms are increasingly being deployed to inform policy decisions. We evaluate one such algorithm, CalEnviroScreen - designed to promote environmental justice and used to guide hundreds of millions of dollars in public funding annually - assessing its potential for allocative harm. We observe high sensitivity to subjective model decisions and susceptibility to manipulation, resulting in allocative tradeoffs with ethical concerns. We find the algorithm to be financially consequential, estimating the effect of its positive designations as a 104% (62-145%) increase in funding, equivalent to \$2.08 billion (\$1.56-2.41 billion) over four years. We recommend incorporating sensitivity analyses to mitigate allocative harm and accountability mechanisms to prevent misuse. | 翻訳日:2023-04-13 16:06:28 公開日:2023-04-12 |
# Floquet $0-\pi$ qubitの量子制御とノイズ保護 Quantum control and noise protection of a Floquet $0-\pi$ qubit ( http://arxiv.org/abs/2304.05601v1 ) ライセンス: Link先を確認 | Zhaoyou Wang, Amir H. Safavi-Naeini | (参考訳) 時間周期系は、限られた物理的相互作用から新しい効果的なハミルトニアンを設計できる。
本稿では,機械式Kapitza振り子の超伝導回路アナログであるFloquet qubitを$\textit{Kapitzonium}$として提案する。
しかし,Floquet qubit 部分空間から散逸が漏れることが判明した。
散逸下での高忠実度量子制御に不可欠な量子ビット部分空間を安定化するために、受動的冷却方式を考案した。
我々の研究は、大規模に保護されたエンジニアリングダイナミクスを実現するために、より複雑なフロケ量子システムをゼロから開発する最初のステップを提供する。 Time-periodic systems allow engineering new effective Hamiltonians from limited physical interactions. For example, the inverted position of the Kapitza pendulum emerges as a stable equilibrium with rapid drive of its pivot point. In this work, we propose the $\textit{Kapitzonium}$: a Floquet qubit that is the superconducting circuit analog of a mechanical Kapitza pendulum. Under periodic driving, the emerging qubit states are exponentially protected against bit and phase flips caused by dissipation, which is the primary source of decoherence of current qubits. However, we find that dissipation causes leakage out of the Floquet qubit subspace. We engineer a passive cooling scheme to stabilize the qubit subspace, which is crucial for high fidelity quantum control under dissipation. Furthermore, we introduce a hardware-efficient fluorescence-based method for qubit measurement and discuss the experimental implementation of the Floquet qubit. The proposed Kapitzonium is one of the simplest Floquet qubits that can be realized with current technology -- and it already has many intriguing features and capabilities. Our work provides the first steps to develop more complex Floquet quantum systems from the ground up to realize large-scale protected engineered dynamics. | 翻訳日:2023-04-13 16:06:14 公開日:2023-04-12 |
# 見た目は似ていても音は異なる:視聴覚表現学習のための反事実的クロスモーダルペアの活用 Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning ( http://arxiv.org/abs/2304.05600v1 ) ライセンス: Link先を確認 | Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi Kalayeh | (参考訳) 視聴覚表現学習は一般的に視覚と音の対応に依存する。
以上の結果から、吹き替えで拡張した学習(dub-augmented training)は、言語的タスク全体の性能に大きな影響を与えることなく、聴覚的および視聴覚的タスクの性能を向上させることが示された。
これらの知見は、シーンレベルの音声視覚対応を学習する際の音声変化を考慮することの重要性を強調し、より堅牢なパフォーマンスに向けてオーディオ視覚モデルを訓練する上で有用な拡張手法であることを示す。 Audiovisual representation learning typically relies on the correspondence between sight and sound. However, there are often multiple audio tracks that can correspond with a visual scene. Consider, for example, different conversations on the same crowded street. The effect of such counterfactual pairs on audiovisual representation learning has not been previously explored. To investigate this, we use dubbed versions of movies to augment cross-modal contrastive learning. Our approach learns to represent alternate audio tracks, differing only in speech content, similarly to the same video. Our results show that dub-augmented training improves performance on a range of auditory and audiovisual tasks, without significantly affecting linguistic task performance overall. We additionally compare this approach to a strong baseline where we remove speech before pretraining, and find that dub-augmented training is more effective, including for paralinguistic and audiovisual tasks where speech removal leads to worse performance. These findings highlight the importance of considering speech variation when learning scene-level audiovisual correspondences and suggest that dubbed audio can be a useful augmentation technique for training audiovisual models toward more robust performance. | 翻訳日:2023-04-13 16:05:53 公開日:2023-04-12 |
# 開語彙課題における説明力向上のためのCLIP手術 CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks ( http://arxiv.org/abs/2304.05653v1 ) ライセンス: Link先を確認 | Yi Li, Hualiang Wang, Yiqun Duan, Xiaomeng Li | (参考訳) コントラスト型言語イメージ事前学習(clip)は,ゼロショット学習やテキスト誘導型視覚タスクなど,下流タスクに大きなメリットがある強力なマルチモーダル大規模視覚モデルである。
これらの知見に基づいて,複数のオープン語彙タスクにおいて,推論アーキテクチャや特徴に対する手術的な修正を可能にするCLIPオペレーション(CLIP Surgery)を提案する。
さらに,Segment Anything Model (SAM) のようなマルチモーダルな可視化や対話型セグメンテーションなどのタスクにも有効である。
コードはhttps://github.com/xmed-lab/CLIP_Surgeryで入手できる。 Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision model that has demonstrated significant benefits for downstream tasks, including many zero-shot learning and text-guided vision tasks. However, we notice some severe problems regarding the model's explainability, which undermines its credibility and impedes related tasks. Specifically, we find CLIP prefers the background regions than the foregrounds according to the predicted similarity map, which contradicts human understanding. Besides, there are obvious noisy activations on the visualization results at irrelevant positions. To address these two issues, we conduct in-depth analyses and reveal the reasons with new findings and evidences. Based on these insights, we propose the CLIP Surgery, a method that enables surgery-like modifications for the inference architecture and features, for better explainability and enhancement in multiple open-vocabulary tasks. The proposed method has significantly improved the explainability of CLIP for both convolutional networks and vision transformers, surpassing existing methods by large margins. Besides, our approach also demonstrates remarkable improvements in open-vocabulary segmentation and multi-label recognition tasks. For examples, the mAP improvement on NUS-Wide multi-label recognition is 4.41% without any additional training, and our CLIP Surgery surpasses the state-of-the-art method by 8.74% at mIoU on Cityscapes open-vocabulary semantic segmentation. Furthermore, our method benefits other tasks including multimodal visualization and interactive segmentation like Segment Anything Model (SAM). The code is available at https://github.com/xmed-lab/CLIP_Surgery | 翻訳日:2023-04-13 16:00:33 公開日:2023-04-12 |
# 赤外・可視画像登録のためのモダリティ不変表現 Modality-Invariant Representation for Infrared and Visible Image Registration ( http://arxiv.org/abs/2304.05646v1 ) ライセンス: Link先を確認 | Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu | (参考訳) 視野、解像度、相対位置の違いから、赤外線カメラと可視カメラからなるマルチモダリティセンシングモジュールは、より正確なシーン知覚を有するように登録する必要がある。
広範囲な実験により,提案手法の有効性が検証され,その後の応用が進展する。 Since the differences in viewing range, resolution and relative position, the multi-modality sensing module composed of infrared and visible cameras needs to be registered so as to have more accurate scene perception. In practice, manual calibration-based registration is the most widely used process, and it is regularly calibrated to maintain accuracy, which is time-consuming and labor-intensive. To cope with these problems, we propose a scene-adaptive infrared and visible image registration. Specifically, in regard of the discrepancy between multi-modality images, an invertible translation process is developed to establish a modality-invariant domain, which comprehensively embraces the feature intensity and distribution of both infrared and visible modalities. We employ homography to simulate the deformation between different planes and develop a hierarchical framework to rectify the deformation inferred from the proposed latent representation in a coarse-to-fine manner. For that, the advanced perception ability coupled with the residual estimation conducive to the regression of sparse offsets, and the alternate correlation search facilitates a more accurate correspondence matching. Moreover, we propose the first ground truth available misaligned infrared and visible image dataset, involving three synthetic sets and one real-world set. Extensive experiments validate the effectiveness of the proposed method against the state-of-the-arts, advancing the subsequent applications. | 翻訳日:2023-04-13 15:59:59 公開日:2023-04-12 |
# WildRefer: マルチモーダルビジュアルデータと自然言語を用いた大規模動的シーンにおける3次元オブジェクトのローカライゼーション WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language ( http://arxiv.org/abs/2304.05645v1 ) ライセンス: Link先を確認 | Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma | (参考訳) 本稿では,2次元画像と3次元LiDAR点雲を含む,自然言語記述とオンラインキャプチャによるマルチモーダル視覚データに基づく大規模動的シーンにおける3次元視覚グラウンドの課題を紹介する。
コードとデータセットは、論文が公開されたときにリリースされる。 We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds. We present a novel method, WildRefer, for this task by fully utilizing the appearance features in images, the location and geometry features in point clouds, and the dynamic features in consecutive input frames to match the semantic features in language. In particular, we propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios with abundant 3D object and natural language annotations. Our datasets are significant for the research of 3D visual grounding in the wild and has huge potential to boost the development of autonomous driving and service robots. Extensive comparisons and ablation studies illustrate that our method achieves state-of-the-art performance on two proposed datasets. Code and dataset will be released when the paper is published. | 翻訳日:2023-04-13 15:59:37 公開日:2023-04-12 |
# Global Prompt Cell: 効果的なPromptのためのポータブルコントロールモジュール Global Prompt Cell: A Portable Control Module for Effective Prompt ( http://arxiv.org/abs/2304.05642v1 ) ライセンス: Link先を確認 | Chi Liu, Haochun Wang, Nuwa Xi, Sendong Zhao, Bing Qin | (参考訳) 事前学習されたモデルのチューニングにおける新しいアプローチとして、プロンプトチューニングは、第1層の入力にトレーニング可能な埋め込みを挿入しながら、下流タスクのパラメータを凍結する。
この問題に対処するために,すべてのエンコーダ層にまたがるプロンプト情報を選択的に保存するプロンプトチューニングモジュールであるGPC(Global Prompt Cell)を導入する。
実験の結果、バニラプロンプトチューニングと比較して、SuperGLUEデータセットで5.8%の改善が得られた。 As a novel approach to tuning pre-trained models, prompt tuning involves freezing the parameters in downstream tasks while inserting trainable embeddings into inputs in the first layer. However, previous methods have mainly focused on the initialization of prompt embeddings. The question of how to train and utilize prompt embeddings in a reasonable way has become a limiting factor in the effectiveness of prompt tuning. To address this issue, we introduce the Global Prompt Cell (GPC), a portable control module for prompt tuning that selectively preserves prompt information across all encoder layers. Our experimental results demonstrate a 5.8% improvement on SuperGLUE datasets compared to vanilla prompt tuning. | 翻訳日:2023-04-13 15:59:20 公開日:2023-04-12 |
# Face Anti-Spoofingのためのインスタンス対応ドメイン一般化 Instance-Aware Domain Generalization for Face Anti-Spoofing ( http://arxiv.org/abs/2304.05640v1 ) ライセンス: Link先を確認 | Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma | (参考訳) ドメイン一般化(DG)に基づく顔アンチスプーフィング(FAS)は、未知のシナリオへの一般化を改善するために近年研究されている。
これらの問題に対処するために、ドメインラベルを必要とせずに、インスタンスレベルで機能を整列するDG FASの新しい視点を提案する。
具体的には,Asymmetric Instance Adaptive Whiteningを提案し,特徴相関を適応的に排除し,一般化を促進する。
コードは https://github.com/qianyuzqy/IADG で公開される予定である。 Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG. | 翻訳日:2023-04-13 15:59:07 公開日:2023-04-12 |
# 連続セルオートマトンにおける開放型進化の大規模シミュレーションに向けて Towards Large-Scale Simulations of Open-Ended Evolution in Continuous Cellular Automata ( http://arxiv.org/abs/2304.05639v1 ) ライセンス: Link先を確認 | Bert Wang-Chak Chan | (参考訳) 生物と文化の進化に触発されて、人工知能と人工生命の開放性に必要な条件を探求し、解明する試みが数多く行われている。
ベースシステムとしてレニアと呼ばれる連続セルオートマトンを用い,並列計算フレームワーク jax を用いた大規模進化シミュレーションを行い,自己組織的パターンの絶え間ない進化を目標とした。
(1) 遺伝的演算子の暗黙的実装(パターンの自己複製による複製や、存在的成功の差による選択など)、(2) 遺伝情報の局在化、(3) 局在化された遺伝子型を動的に維持し表現型に翻訳するアルゴリズムなど、システム設計上の選択を複数報告する。
この実験に基づいて、仮想環境設計、質量保存、エネルギー制約など、よりオープンな進化を促進する要因をいくつか提案する。 Inspired by biological and cultural evolution, there have been many attempts to explore and elucidate the necessary conditions for open-endedness in artificial intelligence and artificial life. Using a continuous cellular automaton called Lenia as the base system, we built large-scale evolutionary simulations using the parallel computing framework JAX, in order to achieve the goal of never-ending evolution of self-organizing patterns. We report a number of system design choices, including (1) implicit implementation of genetic operators, such as reproduction by pattern self-replication, and selection by differential existential success; (2) localization of genetic information; and (3) algorithms for dynamic maintenance of the localized genotypes and translation to phenotypes. Simulation results tend to go through a phase of diversity and creativity, gradually converge to domination by fast expanding patterns, presumably an optimal solution under the current design. Based on our experimentation, we propose several factors that may further facilitate open-ended evolution, such as virtual environment design, mass conservation, and energy constraints. | 翻訳日:2023-04-13 15:58:44 公開日:2023-04-12 |
# PLCに基づく制御プロセスにおける進化的アルゴリズムによる自己最適化と自動コード生成 Self Optimisation and Automatic Code Generation by Evolutionary Algorithms in PLC based Controlling Processes ( http://arxiv.org/abs/2304.05638v1 ) ライセンス: Link先を確認 | Marlon Löppenberg and Andreas Schwung | (参考訳) 自動化のデジタルトランスフォーメーションは、産業プロセスにおけるデータ取得と処理に新たな要求をもたらす。
提案手法は,多目的最適化問題を考慮した産業用液体ステーションプロセスで評価する。 The digital transformation of automation places new demands on data acquisition and processing in industrial processes. Logical relationships between acquired data and cyclic process sequences must be correctly interpreted and evaluated. To solve this problem, a novel approach based on evolutionary algorithms is proposed to self optimise the system logic of complex processes. Based on the genetic results, a programme code for the system implementation is derived by decoding the solution. This is achieved by a flexible system structure with an upstream, intermediate and downstream unit. In the intermediate unit, a directed learning process interacts with a system replica and an evaluation function in a closed loop. The code generation strategy is represented by redundancy and priority, sequencing and performance derivation. The presented approach is evaluated on an industrial liquid station process subject to a multi-objective optimisation problem. | 翻訳日:2023-04-13 15:58:25 公開日:2023-04-12 |
# 適応表現と集約による弱監督型医用画像分割の統一とパーソナライズ Unifying and Personalizing Weakly-supervised Federated Medical Image Segmentation via Adaptive Representation and Aggregation ( http://arxiv.org/abs/2304.05635v1 ) ライセンス: Link先を確認 | Li Lin, Jiewei Wu, Yixiang Liu, Kenneth K. Y. Wong, Xiaoying Tang | (参考訳) フェデレーション学習(fl)は、データのプライバシとセキュリティを損なうことなく、複数のサイトが協力して強力な深層モデルのトレーニングを可能にする。
スパースな粒度(点、バウンディングボックス、スクリブル、ブロック単位)の教師信号を用いる弱教師付きセグメンテーションは、アノテーションコストを削減できる大きな可能性から、ますます注目されている。
本稿では、AdaptIve Contrastive Representation and Aggregationにより、不均一な弱い監督を均一に活用する医療画像セグメンテーションのための新しいFLフレームワークであるFedICRAを提案する。
私たちのコードとデータは https://github.com/llmir/FedICRA で公開されています。 Federated learning (FL) enables multiple sites to collaboratively train powerful deep models without compromising data privacy and security. The statistical heterogeneity (e.g., non-IID data and domain shifts) is a primary obstacle in FL, impairing the generalization performance of the global model. Weakly supervised segmentation, which uses sparsely-grained (i.e., point-, bounding box-, scribble-, block-wise) supervision, is increasingly being paid attention to due to its great potential of reducing annotation costs. However, there may exist label heterogeneity, i.e., different annotation forms across sites. In this paper, we propose a novel personalized FL framework for medical image segmentation, named FedICRA, which uniformly leverages heterogeneous weak supervision via adaptIve Contrastive Representation and Aggregation. Concretely, to facilitate personalized modeling and to avoid confusion, a channel selection based site contrastive representation module is employed to adaptively cluster intra-site embeddings and separate inter-site ones. To effectively integrate the common knowledge from the global model with the unique knowledge from each local model, an adaptive aggregation module is applied for updating and initializing local models at the element level. Additionally, a weakly supervised objective function that leverages a multiscale tree energy loss and a gated CRF loss is employed to generate more precise pseudo-labels and further boost the segmentation performance. Through extensive experiments on two distinct medical image segmentation tasks of different modalities, the proposed FedICRA demonstrates overwhelming performance over other state-of-the-art personalized FL methods. Its performance even approaches that of fully supervised training on centralized data. Our code and data are available at https://github.com/llmir/FedICRA. | 翻訳日:2023-04-13 15:58:16 公開日:2023-04-12 |
# 気分はどうですか? 映画シーンにおける感情と精神状態の学習 How you feelin'? Learning Emotions and Mental States in Movie Scenes ( http://arxiv.org/abs/2304.05634v1 ) ライセンス: Link先を確認 | Dhruv Srivastava and Aditya Kumar Singh and Makarand Tapaswi | (参考訳) 映画のストーリー分析にはキャラクターの感情や精神状態を理解する必要がある。
EmoTxの自己注意スコアを分析すると、表現的な感情がしばしば文字トークンを見るのに対し、他の精神状態はビデオやダイアログの手がかりに依存することが分かる。 Movie story analysis requires understanding characters' emotions and mental states. Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character. We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. By leveraging annotations from the MovieGraphs dataset, we aim to predict classic emotions (e.g. happy, angry) and other mental states (e.g. honest, helpful). We conduct experiments on the most frequently occurring 10 and 25 labels, and a mapping that clusters 181 labels to 26. Ablation studies and comparison against adapted state-of-the-art emotion recognition approaches shows the effectiveness of EmoTx. Analyzing EmoTx's self-attention scores reveals that expressive emotions often look at character tokens while other mental states rely on video and dialog cues. | 翻訳日:2023-04-13 15:57:36 公開日:2023-04-12 |
# 実時間軌道に基づくソーシャルグループ検出 Real-time Trajectory-based Social Group Detection ( http://arxiv.org/abs/2304.05678v1 ) ライセンス: Link先を確認 | Simindokht Jahangard, Munawar Hayat and Hamid Rezatofighi | (参考訳) ソーシャルグループ検出は、ロボットナビゲーションや人間とロボットのインタラクションなど、さまざまなロボットアプリケーションの重要な側面である。
これまでに、F-formation や trajectory similarity framework など、この課題に対処するために様々なモデルベースのテクニックが採用されている。
これらの結果は,提案手法が実時間ロボット応用に適していることを示す。 Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications. | 翻訳日:2023-04-13 15:50:23 公開日:2023-04-12 |
# ドメイン一般化のためのセマンティック・アウェア・ミックスアップ Semantic-Aware Mixup for Domain Generalization ( http://arxiv.org/abs/2304.05675v1 ) ライセンス: Link先を確認 | Chengchao Xu and Xinmei Tian | (参考訳) ディープニューラルネットワーク(DNN)は、様々なタスクにおいてエキサイティングなパフォーマンスを示しているが、未知のターゲットドメインに合うと一般化の失敗に悩まされる。
いくつかのDGベンチマークで画像分類タスクを用いてSAMの有効性を検証する。 Deep neural networks (DNNs) have shown exciting performance in various tasks, yet suffer generalization failures when meeting unknown target domains. One of the most promising approaches to achieve domain generalization (DG) is generating unseen data, e.g., mixup, to cover the unknown target data. However, existing works overlook the challenges induced by the simultaneous appearance of changes in both the semantic and distribution space. Accordingly, such a challenge makes source distributions hard to fit for DNNs. To mitigate the hard-fitting issue, we propose to perform a semantic-aware mixup (SAM) for domain generalization, where whether to perform mixup depends on the semantic and domain information. The feasibility of SAM shares the same spirits with the Fourier-based mixup. Namely, the Fourier phase spectrum is expected to contain semantics information (relating to labels), while the Fourier amplitude retains other information (relating to style information). Built upon the insight, SAM applies different mixup strategies to the Fourier phase spectrum and amplitude information. For instance, SAM merely performs mixup on the amplitude spectrum when both the semantic and domain information changes. Consequently, the overwhelmingly large change can be avoided. We validate the effectiveness of SAM using image classification tasks on several DG benchmarks. | 翻訳日:2023-04-13 15:49:59 公開日:2023-04-12 |
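The amplitude-only mixup mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SAM implementation: it works on 1D signals with a naive DFT rather than on images, and the names `amplitude_mixup` and `lam` are hypothetical. It keeps the phase spectrum of the first signal (assumed to carry the semantics) and interpolates only the amplitudes.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def amplitude_mixup(x, y, lam):
    """Interpolate the Fourier amplitudes of x and y; keep the phase of x."""
    X, Y = dft(x), dft(y)
    mixed = []
    for a, b in zip(X, Y):
        amp = (1.0 - lam) * abs(a) + lam * abs(b)
        mixed.append(cmath.rect(amp, cmath.phase(a)))
    # For real inputs the imaginary part is only rounding noise.
    return [v.real for v in idft(mixed)]

x = [0.2, 1.0, 2.0, 1.5, 0.3, -1.0, -2.0, -0.7]   # "content" signal
y = [3.0, 0.5, -1.0, 2.0, 0.0, 1.0, -0.5, 1.5]    # "style" signal
x_aug = amplitude_mixup(x, y, lam=0.3)
```

With `lam=0.0` the function returns `x` unchanged (up to rounding); larger values move the amplitude spectrum toward that of `y` while the phase of `x` is preserved.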
# 合成データを用いた深層学習による眼画像における角膜反射の精密局在 Precise localization of corneal reflections in eye images using deep learning trained on synthetic data ( http://arxiv.org/abs/2304.05673v1 ) ライセンス: Link先を確認 | Sean Anthony Byrne, Marcus Nyström, Virmarie Maquiling, Enkelejda Kasneci, Diederick C. Niehorster | (参考訳) 眼球画像中の1つの角膜反射(CR)の中心を正確に位置決めする深層学習法を提案する。
CR中心のより優れた局在化と適用容易性により、CRベースのアイトラッカーの正確度と精度を向上させる可能性がある。 We present a deep learning method for accurately localizing the center of a single corneal reflection (CR) in an eye image. Unlike previous approaches, we use a convolutional neural network (CNN) that was trained solely using simulated data. Using only simulated data has the benefit of completely sidestepping the time-consuming process of manual annotation that is required for supervised training on real eye images. To systematically evaluate the accuracy of our method, we first tested it on images with simulated CRs placed on different backgrounds and embedded in varying levels of noise. Second, we tested the method on high-quality videos captured from real eyes. Our method outperformed state-of-the-art algorithmic methods on real eye images with a 35% reduction in terms of spatial precision, and performed on par with state-of-the-art on simulated images in terms of spatial accuracy. We conclude that our method provides a precise method for CR center localization and provides a solution to the data availability problem which is one of the important common roadblocks in the development of deep learning models for gaze estimation. Due to the superior CR center localization and ease of application, our method has the potential to improve the accuracy and precision of CR-based eye trackers. | 翻訳日:2023-04-13 15:49:40 公開日:2023-04-12 |
# サイクルグラフ上の量子ウォークによる任意の量子演算の実装 Implementing arbitrary quantum operations via quantum walks on a cycle graph ( http://arxiv.org/abs/2304.05672v1 ) ライセンス: Link先を確認 | Jia-Yi Lin, Xin-Yu Li, Yu-Hao Shao, Wei Wang and Shengjun Wu | (参考訳) 量子回路モデル(quantum circuit model)は、量子コンピュータや量子ニューラルネットワークを実装する上で最も一般的に用いられるモデルである。
本研究は、量子計算におけるDTQWベースのニューラルネットワークの機能とその実験室実装における可能性を示す。 The quantum circuit model is the most commonly used model for implementing quantum computers and quantum neural networks whose essential tasks are to realize certain unitary operations. The circuit model usually implements a desired unitary operation by a sequence of single-qubit and two-qubit unitary gates from a universal set. Although this certainly facilitates the experimentalists as they only need to prepare several different kinds of universal gates, the number of gates required to implement an arbitrary desired unitary operation is usually large. Hence the efficiency in terms of the circuit depth or running time is not guaranteed. Here we propose an alternative approach; we use a simple discrete-time quantum walk (DTQW) on a cycle graph to model an arbitrary unitary operation without the need to decompose it into a sequence of gates of smaller sizes. Our model is essentially a quantum neural network based on DTQW. Firstly, it is universal as we show that any unitary operation can be realized via an appropriate choice of coin operators. Secondly, our DTQW-based neural network can be updated efficiently via a learning algorithm, i.e., a modified stochastic gradient descent algorithm adapted to our network. By training this network, one can promisingly find approximations to arbitrary desired unitary operations. With an additional measurement on the output, the DTQW-based neural network can also implement general measurements described by positive-operator-valued measures (POVMs). We show its capacity in implementing arbitrary 2-outcome POVM measurements via numeric simulation. We further demonstrate that the network can be simplified and can overcome device noises during the training so that it becomes more friendly for laboratory implementations. Our work shows the capability of the DTQW-based neural network in quantum computation and its potential in laboratory implementations. 
| 翻訳日:2023-04-13 15:49:14 公開日:2023-04-12 |
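A single step of a coined discrete-time quantum walk on a cycle graph can be sketched as below. This is an illustrative sketch only: the layout of the state, the shift convention (coin 0 moves clockwise, coin 1 counter-clockwise) and the choice of a Hadamard coin at every site are assumptions made here, and in the paper's setting the per-site coin operators would be the trainable parameters.

```python
import math

def dtqw_step(state, coins):
    """One step of a coined discrete-time quantum walk on an N-cycle.

    state[x] = [amplitude with coin 0, amplitude with coin 1] at position x.
    coins[x] = 2x2 unitary (nested lists) applied at position x.
    After the coin toss, coin 0 moves to x+1 and coin 1 to x-1 (mod N).
    """
    n = len(state)
    new_state = [[0j, 0j] for _ in range(n)]
    for x in range(n):
        c = coins[x]
        a0 = c[0][0] * state[x][0] + c[0][1] * state[x][1]
        a1 = c[1][0] * state[x][0] + c[1][1] * state[x][1]
        new_state[(x + 1) % n][0] += a0
        new_state[(x - 1) % n][1] += a1
    return new_state

def norm_squared(state):
    """Total probability of the walker state."""
    return sum(abs(a) ** 2 for site in state for a in site)

N = 5
h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]
coins = [hadamard] * N              # same coin everywhere (assumption)
state = [[0j, 0j] for _ in range(N)]
state[0][0] = 1 + 0j                # walker starts at position 0 with coin 0
for _ in range(3):
    state = dtqw_step(state, coins)
total_probability = norm_squared(state)
```

Because each step is a unitary coin followed by a permutation of basis states, the total probability stays at 1 up to rounding.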
# 効率良く正確な材料照明推定のための因子化逆経路追跡 Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation ( http://arxiv.org/abs/2304.05669v1 ) ライセンス: Link先を確認 | Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi | (参考訳) 近年,室内シーンの幾何と多視点HDR観測が与えられたもとで,材料と照明の同時推定に逆経路追跡が適用されている。
ソースコードはhttps://github.com/lwwu2/fiptで入手できる。 Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. We propose a novel Factorized Inverse Path Tracing (FIPT) method which utilizes a factored light transport formulation and finds emitters driven by rendering errors. Our algorithm enables accurate material and lighting optimization faster than previous work, and is more effective at resolving ambiguities. The exhaustive experiments on synthetic scenes show that our method (1) outperforms state-of-the-art indoor inverse rendering and relighting methods particularly in the presence of complex illumination effects; (2) speeds up inverse path tracing optimization to less than an hour. We further demonstrate robustness to noisy inputs through material and lighting estimates that allow plausible relighting in a real scene. The source code is available at: https://github.com/lwwu2/fipt | 翻訳日:2023-04-13 15:48:50 公開日:2023-04-12 |
# 鉄道検知:効率的なロウベースネットワークと新しいベンチマーク Rail Detection: An Efficient Row-based Network and A New Benchmark ( http://arxiv.org/abs/2304.05667v1 ) ライセンス: Link先を確認 | Xinpeng Li and Xiaojiang Peng | (参考訳) 鉄道異常検出に不可欠な鉄道検出は、ビデオフレーム内の鉄道領域を特定することを目的としている。
(i)7432対の画像とアノテーションからなる実世界の鉄道データセット Rail-DB を提供する。
(ii)軽量畳み込みバックボーンとアンカー分類器を備えた効率的な行ベースレール検出手法である Rail-Net を提案する。
具体的には, レール検出の過程を行ベースの選択問題として定式化する。
(iii) クロスシーン設定や ResNet から Vision Transformer までのネットワークバックボーンを含む広範な実験を行い, Rail-DB 上で Rail-Net を評価した。
データベースとコードは、https://github.com/Sampson-Lee/Rail-Detection で入手できる。 Rail detection, essential for railroad anomaly detection, aims to identify the railroad region in video frames. Although various studies on rail detection exist, neither an open benchmark nor a high-speed network is available in the community, making algorithm comparison and development difficult. Inspired by the growth of lane detection, we propose a rail database and a row-based rail detection method. In detail, we make several contributions: (i) We present a real-world railway dataset, Rail-DB, with 7432 pairs of images and annotations. The images are collected from different situations in lighting, road structures, and views. The rails are labeled with polylines, and the images are categorized into nine scenes. The Rail-DB is expected to facilitate the improvement of rail detection algorithms. (ii) We present an efficient row-based rail detection method, Rail-Net, containing a lightweight convolutional backbone and an anchor classifier. Specifically, we formulate the process of rail detection as a row-based selecting problem. This strategy reduces the computational cost compared to alternative segmentation methods. (iii) We evaluate the Rail-Net on Rail-DB with extensive experiments, including cross-scene settings and network backbones ranging from ResNet to Vision Transformers. Our method achieves promising performance in terms of both speed and accuracy. Notably, a lightweight version could achieve 92.77% accuracy and 312 frames per second. The Rail-Net outperforms the traditional method by 50.65% and the segmentation one by 5.86%. The database and code are available at: https://github.com/Sampson-Lee/Rail-Detection. | 翻訳日:2023-04-13 15:48:33 公開日:2023-04-12 |
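The row-based selection idea in the abstract above can be sketched roughly as follows. This is not the Rail-Net code: the function name `select_rail_columns`, the score layout, and the extra "no rail in this row" entry (borrowed from row-anchor lane detection) are all assumptions for illustration. Instead of classifying every pixel, each row anchor only chooses one column cell.

```python
def select_rail_columns(row_scores):
    """For each row anchor, pick the grid cell with the highest score.

    row_scores[r] holds one score per column cell, plus a final entry
    meaning "no rail in this row" (hypothetical convention).
    Returns a column index per row, or None when no rail is selected.
    """
    picks = []
    for scores in row_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        picks.append(None if best == len(scores) - 1 else best)
    return picks

scores = [
    [0.1, 0.7, 0.1, 0.1],   # rail most likely in cell 1
    [0.1, 0.2, 0.6, 0.1],   # rail most likely in cell 2
    [0.1, 0.1, 0.1, 0.7],   # last entry wins: no rail in this row
]
rail_columns = select_rail_columns(scores)
```

The cost per image grows with (rows x cells) rather than with the number of pixels, which is the intuition behind the reduced computation compared with segmentation.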
# ランダムウォーク型量子ニューラルネットワークによる状態分類 State Classification via a Random-Walk-Based Quantum Neural Network ( http://arxiv.org/abs/2304.05662v1 ) ライセンス: Link先を確認 | Lu-Ji Wang, Jia-Yi Lin, and Shengjun Wu | (参考訳) 量子情報技術では、重要な情報は異なる量子状態に定期的に符号化される。
以上の結果から,QSNNは未知の量子状態を量子情報で処理する大きな可能性を示唆している。 In quantum information technology, crucial information is regularly encoded in different quantum states. To extract information, the identification of one state from the others is inevitable. However, if the states are non-orthogonal and unknown, this task will become awesomely tricky, especially when our resources are also limited. Here, we introduce the quantum stochastic neural network (QSNN), and show its capability to accomplish the binary discrimination of quantum states. After a handful of optimizing iterations, the QSNN achieves a success probability close to the theoretical optimum, no matter whether the states are pure or mixed. Other than binary discrimination, the QSNN is also applied to classify an unknown set of states into two types: entangled ones and separable ones. After training with four samples, it can classify a number of states with acceptable accuracy. Our results suggest that the QSNN has the great potential to process unknown quantum states in quantum information. | 翻訳日:2023-04-13 15:48:12 公開日:2023-04-12 |
# superpixelgraph:意味に敏感なスーパーピクセルとニューラルネットワークによるビルディングフットプリントの半自動生成 SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks ( http://arxiv.org/abs/2304.05661v1 ) ライセンス: Link先を確認 | Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang and Qing Zhu | (参考訳) ほとんどの都市アプリケーションは、ピクセルワイドのラスタ画像ではなく、シャープな境界を持つ簡潔なベクトルグラフィックスの形で、フットプリントを構築する必要がある。
さらに,インタラクティブな編集を行うための最適化された洗練されたパイプラインを考案し,結果の質をさらに向上させた。 Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images. This need contrasts with the majority of existing methods, which typically generate over-smoothed footprint polygons. Editing these automatically produced polygons can be inefficient, if not more time-consuming than manual digitization. This paper introduces a semi-automatic approach for building footprint extraction through semantically-sensitive superpixels and neural graph networks. Drawing inspiration from object-based classification techniques, we first learn to generate superpixels that are not only boundary-preserving but also semantically-sensitive. The superpixels respond exclusively to building boundaries rather than other natural objects, while simultaneously producing semantic segmentation of the buildings. These intermediate superpixel representations can be naturally considered as nodes within a graph. Consequently, graph neural networks are employed to model the global interactions among all superpixels and enhance the representativeness of node features for building segmentation. Classical approaches are utilized to extract and regularize boundaries for the vectorized building footprints. Utilizing minimal clicks and straightforward strokes, we efficiently accomplish accurate segmentation outcomes, eliminating the necessity for editing polygon vertices. Our proposed approach demonstrates superior precision and efficacy, as validated by experimental assessments on various public benchmark datasets. We observe a 10% enhancement in the metric for superpixel clustering and an 8% increment in vector graphics evaluation, when compared with established techniques. Additionally, we have devised an optimized and sophisticated pipeline for interactive editing, poised to further augment the overall quality of the results.
| 翻訳日:2023-04-13 15:47:57 公開日:2023-04-12 |
# RIFormer:Token Mixerを外しながらビジョンバックボーンを効果的に保つ RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer ( http://arxiv.org/abs/2304.05659v1 ) ライセンス: Link先を確認 | Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin | (参考訳) 本稿では,基本構造ブロックのトークンミキサーを除去しながら,視覚バックボーンを効果的に維持する方法について検討する。
プロジェクトページ: https://techmonsterwang.github.io/RIFormer/ This paper studies how to keep a vision backbone effective while removing token mixers in its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are intended to perform information communication between different spatial tokens but suffer from considerable computational cost and latency. However, directly removing them will lead to an incomplete model structure prior, and thus brings a significant accuracy drop. To this end, we first develop a RepIdentityFormer based on the re-parameterizing idea, to study the token mixer free model architecture. And we then explore the improved learning paradigm to break the limitation of simple token mixer free backbone, and summarize the empirical practice into 5 guidelines. Equipped with the proposed optimization strategy, we are able to build an extremely simple vision backbone with encouraging performance, while enjoying the high efficiency during inference. Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into simple network structure with appropriate optimization strategy. We hope this work can serve as a starting point for the exploration of optimization-driven efficient network design. Project page: https://techmonsterwang.github.io/RIFormer/. | 翻訳日:2023-04-13 15:47:30 公開日:2023-04-12 |
# 正則化・多視点サポートベクトルマシン学習の局所化 Localisation of Regularised and Multiview Support Vector Machine Learning ( http://arxiv.org/abs/2304.05655v1 ) ライセンス: Link先を確認 | Aurelian Gheondea and Cankat Tilki | (参考訳) 我々は、H.Q. Minh, L. Bazzani, V. Murino (Journal of Machine Learning Research, 17 (2016) 1-72) によって導入された、作用素値の正半定値核とその再生核ヒルベルト空間を含む正則化・多視点サポートベクトルマシン学習問題の局所化版に対して、いくつかのリプレゼンター定理を証明した。
この一般的な枠組みは、いくつかの特別な場合、特に損失関数が Gâteaux 微分可能である場合に、無限次元の入力空間と非凸損失関数を許容することを示す。
部分的に非線形な問題につながる指数最小二乗損失関数について、詳細な計算を示す。 We prove a few representer theorems for a localised version of the regularised and multiview support vector machine learning problem introduced by H.Q. Minh, L. Bazzani, and V. Murino, Journal of Machine Learning Research, 17 (2016) 1-72, that involves operator valued positive semidefinite kernels and their reproducing kernel Hilbert spaces. The results concern general cases when convex or nonconvex loss functions and finite or infinite dimensional input spaces are considered. We show that the general framework allows infinite dimensional input spaces and nonconvex loss functions for some special cases, in particular in case the loss functions are Gâteaux differentiable. Detailed calculations are provided for the exponential least squares loss functions that lead to partially nonlinear problems. | 翻訳日:2023-04-13 15:47:08 公開日:2023-04-12 |
# 普遍偏光変換:深層学習による回折偏光変換を用いた偏光散乱行列の空間計画 Universal Polarization Transformations: Spatial programming of polarization scattering matrices using a deep learning-designed diffractive polarization transformer ( http://arxiv.org/abs/2304.05724v1 ) ライセンス: Link先を確認 | Yuhang Li, Jingxi Li, Yifan Zhao, Tianyi Gan, Jingtian Hu, Mona Jarrahi, Aydogan Ozcan | (参考訳) 本研究では,任意の位置の偏光状態と入力フィールドオブビュー(fov)間の任意に選択された複素値の偏光散乱行列を合成できる,工学的回折体積に基づく普遍偏光トランスを示す。
本研究では,N_i と N_o が入力と出力の FOV の画素数を表すため,N_i x N_o = 10,000 個の異なる空間符号化偏光散乱行列を単一拡散体積内に無視可能な誤差で実装できることを実証した。
本研究では, ワイヤグリッド偏光子を作製し, 3dプリント回折層と一体化し, 0.75 mm 波長の物理偏光トランスを形成することにより, スペクトルのテラヘルツ部におけるこの普遍偏光変換の枠組みを実験的に検証した。
このフレームワークは、ユニバーサル偏光制御のための新しい光学デバイスを開発するための新しい道を開き、リモートセンシング、医療画像、セキュリティ、材料検査、機械ビジョンなどの様々な応用を見出すことができる。 We demonstrate universal polarization transformers based on an engineered diffractive volume, which can synthesize a large set of arbitrarily-selected, complex-valued polarization scattering matrices between the polarization states at different positions within its input and output field-of-views (FOVs). This framework comprises 2D arrays of linear polarizers with diverse angles, which are positioned between isotropic diffractive layers, each containing tens of thousands of diffractive features with optimizable transmission coefficients. We demonstrate that, after its deep learning-based training, this diffractive polarization transformer could successfully implement N_i x N_o = 10,000 different spatially-encoded polarization scattering matrices with negligible error within a single diffractive volume, where N_i and N_o represent the number of pixels in the input and output FOVs, respectively. We experimentally validated this universal polarization transformation framework in the terahertz part of the spectrum by fabricating wire-grid polarizers and integrating them with 3D-printed diffractive layers to form a physical polarization transformer operating at 0.75 mm wavelength. Through this set-up, we demonstrated an all-optical polarization permutation operation of spatially-varying polarization fields, and simultaneously implemented distinct spatially-encoded polarization scattering matrices between the input and output FOVs of a compact diffractive processor that axially spans 200 wavelengths. This framework opens up new avenues for developing novel optical devices for universal polarization control, and may find various applications in, e.g., remote sensing, medical imaging, security, material inspection and machine vision. | 翻訳日:2023-04-13 15:41:49 公開日:2023-04-12 |
# 格子ゲージ理論とサブシステム符号の相互作用 Interplay between lattice gauge theory and subsystem codes ( http://arxiv.org/abs/2304.05718v1 ) ライセンス: Link先を確認 | Yoshihito Kuno, Ikuo Ichinose | (参考訳) (2+1)次元の特定の開境界条件における$Z_2$格子ゲージヒッグスモデルについて検討した。
Wilson と 't Hooft ループによって与えられる一形式対称性とモデルの双対性は、位相構造の同定に重要な役割を果たす。
サブシステムのコードに関する一般的な議論に加えて、higgs と confinement のフェーズにおけるコード(エンコード qubit)の具体的記述も与える。
本研究は,閉じ込め相がヒッグス相と同様に対称性保護トポロジカル相であることを明らかにする。 We study $Z_2$ lattice gauge-Higgs model in (2+1)-dimensions with specific open boundary conditions. Suitable order parameters are supplied by the boundary conditions, which distinguish the Higgs and confinement phases, and they are conjugate with each other. One-form symmetries given by Wilson and 't Hooft loops, as well as duality of the model, play an important role for the identification of the phase structure. Gauss-law constraints are regarded as stabilizers inherent subsystem codes. The order parameters are nothing but logical operators in subsystem codes, and mixed anomaly of them dictates the existence of boundary zero modes and the degeneracy of states even in very high-energy levels. Subsystem codes are embedded in the Higgs and confinement phases. In addition to general argument on the subsystem code, we give concrete description of the code (encoded qubit) in the Higgs and confinement phases, which are dual with each other. Numerical methods are used to corroborate analytically-obtained results. The present work reveals that the confinement phase is a symmetry-protected-topological phase as the Higgs phase. | 翻訳日:2023-04-13 15:41:17 公開日:2023-04-12 |
# 擬似深さが最小ユーザ誘導によるオープンワールドオブジェクトセグメンテーションに及ぼす影響 Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance ( http://arxiv.org/abs/2304.05716v1 ) ライセンス: Link先を確認 | Robin Schön, Katja Ludwig, Rainer Lienhart | (参考訳) 擬似深度マップ(pseudo depth map)は、訓練中に正解(ground truth)として使用される深度マップの予測である。
Semantic Boundaries Dataset では、トレーニング中にトレーニングクラスの半分しか使用せず、深度マップのみでセグメンテーションを実行する場合、未知のクラスのIoUスコアが61.57から69.79に改善される。 Pseudo depth maps are depth map predictions which are used as ground truth during training. In this paper we leverage pseudo depth maps in order to segment objects of classes that have never been seen during training. This renders our object segmentation task an open world task. The pseudo depth maps are generated using pretrained networks, which have either been trained with the full intention to generalize to downstream tasks (LeRes and MiDaS), or which have been trained in an unsupervised fashion on video sequences (MonodepthV2). In order to tell our network which object to segment, we provide the network with a single click on the object's surface on the pseudo depth map of the image as input. We test our approach on two different scenarios: One without the RGB image and one where the RGB image is part of the input. Our results demonstrate a considerably better generalization performance from seen to unseen object types when depth is used. On the Semantic Boundaries Dataset we achieve an improvement from $61.57$ to $69.79$ IoU score on unseen classes, when only using half of the training classes during training and performing the segmentation on depth maps only. | 翻訳日:2023-04-13 15:40:58 公開日:2023-04-12 |
# 資源測度の量子古典分解による非対称性とRényiエントロピーのより厳密な不確定性関係 Asymmetry and tighter uncertainty relations for Rényi entropies via quantum-classical decompositions of resource measures ( http://arxiv.org/abs/2304.05704v1 ) ライセンス: Link先を確認 | Michael J. W. Hall | (参考訳) 量子可観測量の分散とエントロピーは、本質的に量子的な寄与と古典的な寄与に分解されることが知られている。
これらの関係は量子古典分解に言及せずに、一方の可観測量の非対称性を他方のエントロピーによって束縛するトレードオフ関係としても解釈できる。 It is known that the variance and entropy of quantum observables decompose into intrinsically quantum and classical contributions. Here a general method of constructing quantum-classical decompositions of resources such as uncertainty is discussed, with the quantum contribution specified by a measure of the noncommutativity of a given set of operators relative to the quantum state, and the classical contribution generated by the mixedness of the state. Suitable measures of noncommutativity or 'quantumness' include quantum Fisher information and the asymmetry of a given set, group or algebra of operators, and are generalised to nonprojective observables and quantum channels. Strong entropic uncertainty relations and lower bounds for Rényi entropies are obtained, valid for both projective and nonprojective observables, that take the mixedness of the state into account via a classical contribution to the lower bound. These relations can also be interpreted without reference to quantum-classical decompositions, as tradeoff relations that bound the asymmetry of one observable in terms of the entropy of another. | 翻訳日:2023-04-13 15:40:36 公開日:2023-04-12 |
# ダイナミックモーションプリミティブによるコンプライアンス強化による人間ロボットのスキル伝達 Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives ( http://arxiv.org/abs/2304.05703v1 ) ライセンス: Link先を確認 | Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, and Homayoun Najjaran | (参考訳) ロボットの軌道を適応させる効率的な方法を見つけることは、ロボットの全体的な性能を改善するための優先事項である。
Dynamic Movement Primitives (DMP) フレームワークは、LfDのこの制限に対して実行可能なソリューションであるが、定式化において2階のダイナミクスをチューニングする必要がある。
その結果、ロボットの性能は安定し、累積距離誤差に基づく高い人間らしさを、最良のヒューリスティックチューニングと同等の水準で維持した。 Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning. | 翻訳日:2023-04-13 15:40:12 公開日:2023-04-12 |
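The second-order dynamics that the abstract above says must be tuned can be illustrated with the textbook DMP transformation system, tau * dz/dt = alpha * (beta * (g - y) - z) + f and tau * dy/dt = z. The sketch below is not the authors' auto-tuning method; the parameter values (`alpha=25`, `beta=alpha/4` for critical damping, `alpha_x=4`) and the zero forcing term are assumptions chosen for illustration.

```python
def rollout_dmp(y0, goal, tau=1.0, alpha=25.0, beta=6.25, dt=0.001, steps=3000,
                forcing=lambda phase: 0.0):
    """Euler integration of a one-dimensional DMP.

    Transformation system: tau*dz = alpha*(beta*(goal - y) - z) + f, tau*dy = z.
    Canonical system:      tau*dphase = -alpha_x * phase.
    `forcing` maps the phase to a learned shape term (zero here by default).
    """
    alpha_x = 4.0                      # decay rate of the phase (assumed value)
    y, z, phase = y0, 0.0, 1.0
    trajectory = [y]
    for _ in range(steps):
        f = forcing(phase) * phase * (goal - y0)
        dz = (alpha * (beta * (goal - y) - z) + f) / tau
        dy = z / tau
        dphase = -alpha_x * phase / tau
        z += dz * dt
        y += dy * dt
        phase += dphase * dt
        trajectory.append(y)
    return trajectory

traj = rollout_dmp(y0=0.0, goal=1.0)
final_position = traj[-1]
```

With a zero forcing term the system behaves like a critically damped spring pulling `y` toward `goal`; changing `alpha` and `beta` changes how stiff or compliant that pull is, which is the kind of parameter the paper proposes to extract from human demonstrations.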
# 量子力学におけるベル作用素の表現について On the representations of Bell's operators in Quantum Mechanics ( http://arxiv.org/abs/2304.05696v1 ) ライセンス: Link先を確認 | Silvio Paolo Sorella | (参考訳) ヒルベルト空間の次元が 2 より大きいとき、ベル-CHSH不等式に現れるベル作用素は非同値なユニタリ行列表現を示す。
この特徴は、ベル-CHSHの不等式をテストするために使用される絡み合った状態のモード間のペアリング機構に依存している。 We point out that, when the dimension of the Hilbert space is greater than two, Bell's operators entering the Bell-CHSH inequality do exhibit inequivalent unitary matrix representations. Although the Bell-CHSH inequality turns out to be violated, the size of the violation is different for different representations, the maximum violation being given by Tsirelson's bound. The feature relies on a pairing mechanism between the modes of the entangled state employed to test the Bell-CHSH inequality. | 翻訳日:2023-04-13 15:39:47 公開日:2023-04-12 |
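As a point of reference for the Tsirelson bound mentioned above, the following sketch evaluates the Bell-CHSH combination for the simplest case only: two qubits in the state (|00> + |11>)/sqrt(2) with standard measurement angles. It does not reproduce the higher-dimensional inequivalent representations studied in the paper; the helper names and the choice of angles are assumptions for illustration.

```python
import math

def observable(theta):
    """Spin observable cos(theta)*Z + sin(theta)*X as a 2x2 real matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def correlation(psi, a, b):
    """Expectation value <psi| A (x) B |psi> for a real two-qubit state.

    psi is indexed as psi[2*i + j] for the basis state |i j>.
    """
    total = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    total += psi[2 * i + j] * a[i][k] * b[j][l] * psi[2 * k + l]
    return total

h = 1 / math.sqrt(2)
phi_plus = [h, 0.0, 0.0, h]                      # (|00> + |11>) / sqrt(2)
A0, A1 = observable(0.0), observable(math.pi / 2)
B0, B1 = observable(math.pi / 4), observable(-math.pi / 4)
chsh = (correlation(phi_plus, A0, B0) + correlation(phi_plus, A0, B1)
        + correlation(phi_plus, A1, B0) - correlation(phi_plus, A1, B1))
```

For this state the correlation is cos(a - b), so the combination evaluates to 2*sqrt(2), above the classical limit of 2 and equal to Tsirelson's bound.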
# 3次元点群分類のためのマルチスケール幾何認識トランスフォーマー Multi-scale Geometry-aware Transformer for 3D Point Cloud Classification ( http://arxiv.org/abs/2304.05694v1 ) ライセンス: Link先を確認 | Xian Wei, Muyu Wang, Shing-Ho Jonathan Lin, Zhengyu Li, Jian Yang, Arafat Al-Jawari, Xuan Tang | (参考訳) セルフアテンションモジュールは、長距離リレーションシップをキャプチャし、ポイントクラウドタスクのパフォーマンスを改善する際、顕著な機能を示した。
これらの問題に対処するため,本研究では,多スケール幾何対応トランス (MGT) を用いた自己注意型プラグインモジュールを提案する。
実験の結果,MGTは自己注意機構を用いてマルチスケールの幾何を捕捉する能力を大幅に向上し,主流の点群ベンチマーク上で高い競争力のある性能を実現することが示された。 Self-attention modules have demonstrated remarkable capabilities in capturing long-range relationships and improving the performance of point cloud tasks. However, point cloud objects are typically characterized by complex, disordered, and non-Euclidean spatial structures with multiple scales, and their behavior is often dynamic and unpredictable. The current self-attention modules mostly rely on dot product multiplication and dimension alignment among query-key-value features, which cannot adequately capture the multi-scale non-Euclidean structures of point cloud objects. To address these problems, this paper proposes a self-attention plug-in module with its variants, Multi-scale Geometry-aware Transformer (MGT). MGT processes point cloud data with multi-scale local and global geometric information in the following three aspects. First, the MGT divides point cloud data into patches with multiple scales. Secondly, a local feature extractor based on sphere mapping is proposed to explore the geometry inside each patch and generate a fixed-length representation for each patch. Thirdly, the fixed-length representations are fed into a novel geodesic-based self-attention to capture the global non-Euclidean geometry between patches. Finally, all the modules are integrated into the framework of MGT with an end-to-end training scheme. Experimental results demonstrate that the MGT vastly increases the capability of capturing multi-scale geometry using the self-attention mechanism and achieves strong competitive performance on mainstream point cloud benchmarks. | 翻訳日:2023-04-13 15:39:38 公開日:2023-04-12 |
# HybrIK-X:全体メッシュ回復のためのハイブリッド解析・ニューラル逆運動学 HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery ( http://arxiv.org/abs/2304.05690v1 ) ライセンス: Link先を確認 | Jiefeng Li, Siyuan Bian, Chao Xu, Zhicun Chen, Lixin Yang, Cewu Lu | (参考訳) 視覚的コンテンツから抽象的なポーズと形状パラメータを推測して全身メッシュを復元することで、現実的な構造を持つ3dボディを得ることができる。
全身の詳細を網羅的に把握するために,HybrIK-X という包括的枠組みをさらに発展させ,関節のある手と表情豊かな顔で HybrIK を拡張する。
コードと結果はhttps://jeffli.site/HybrIK-X/で確認できる。 Recovering whole-body mesh by inferring the abstract pose and shape parameters from visual content can obtain 3D bodies with realistic structures. However, the inferring process is highly non-linear and suffers from image-mesh misalignment, resulting in inaccurate reconstruction. In contrast, 3D keypoint estimation methods utilize the volumetric representation to achieve pixel-level accuracy but may predict unrealistic body structures. To address these issues, this paper presents a novel hybrid inverse kinematics solution, HybrIK, that integrates the merits of 3D keypoint estimation and body mesh recovery in a unified framework. HybrIK directly transforms accurate 3D joints to body-part rotations via twist-and-swing decomposition. The swing rotations are analytically solved with 3D joints, while the twist rotations are derived from visual cues through neural networks. To capture comprehensive whole-body details, we further develop a holistic framework, HybrIK-X, which enhances HybrIK with articulated hands and an expressive face. HybrIK-X is fast and accurate by solving the whole-body pose with a one-stage model. Experiments demonstrate that HybrIK and HybrIK-X preserve both the accuracy of 3D joints and the realistic structure of the parametric human model, leading to pixel-aligned whole-body mesh recovery. The proposed method significantly surpasses the state-of-the-art methods on various benchmarks for body-only, hand-only, and whole-body scenarios. Code and results can be found at https://jeffli.site/HybrIK-X/ | 翻訳日:2023-04-13 15:39:13 公開日:2023-04-12 |
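The twist-and-swing idea described above can be sketched for a single bone as follows. This is an illustration under assumptions, not the HybrIK code: the swing rotation is computed analytically (Rodrigues' formula) so that a template bone direction is rotated onto the target direction given by two 3D joints, while the twist angle about the bone axis is simply passed in as a number, standing in for the value a network would predict. All function names are hypothetical.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def axis_angle_matrix(axis, angle):
    """Rodrigues' formula: rotation matrix for a unit axis and an angle."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1 - c
    return [[t * x * x + c, t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, t * z * z + c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(r, v):
    return [dot(row, v) for row in r]

def swing_twist_rotation(template_bone, target_bone, twist_angle):
    """Compose R = R_swing * R_twist for one bone."""
    t, p = normalize(template_bone), normalize(target_bone)
    axis = cross(t, p)
    sin_a = math.sqrt(dot(axis, axis))
    if sin_a > 1e-12:
        swing = axis_angle_matrix([a / sin_a for a in axis], math.atan2(sin_a, dot(t, p)))
    elif dot(t, p) > 0:
        swing = axis_angle_matrix(t, 0.0)          # already aligned: identity
    else:
        # Opposite directions: rotate by pi about any axis perpendicular to t.
        helper = [1.0, 0.0, 0.0] if abs(t[0]) < 0.9 else [0.0, 1.0, 0.0]
        swing = axis_angle_matrix(normalize(cross(t, helper)), math.pi)
    twist = axis_angle_matrix(t, twist_angle)      # rotation about the bone itself
    return matmul(swing, twist)

R = swing_twist_rotation([0.0, 1.0, 0.0], [1.0, 1.0, 0.0], twist_angle=0.7)
rotated_bone = apply(R, [0.0, 1.0, 0.0])
```

Whatever twist angle is supplied, the composed rotation still maps the template bone onto the target direction, which is why the joint positions alone determine the swing and only the twist needs to be inferred from image cues.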
# 複雑相互作用下での拡散に基づくマルチヒューマンモーション生成 InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions ( http://arxiv.org/abs/2304.05684v1 ) ライセンス: Link先を確認 | Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu | (参考訳) 最近、現実的な人間の動きを生み出すための拡散の進歩が著しく進んでいる。
さらに, 相互作用拡散モデルの学習中に対応する減衰スキームを備える空間関係を符号化する2つの新しい正規化項を導入する。
特に、従来の方法よりも多様で説得力のある2人の動作を生成し、人間のインタラクションに様々な下流の応用を可能にする。 We have recently seen tremendous progress in diffusion advances for generating realistic human motions. Yet, they largely disregard the rich multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables layman users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 16,756 natural language descriptions. For the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions. | 翻訳日:2023-04-13 15:38:43 公開日:2023-04-12 |
# 光スイッチを用いたタイムビン・グリーンバーガー=ホーン=ツァイリンガー状態の生成 Generation of a time-bin Greenberger--Horne--Zeilinger state with an optical switch ( http://arxiv.org/abs/2304.05683v1 ) ライセンス: Link先を確認 | Hsin-Pin Lo, Takuya Ikuta, Koji Azuma, Toshimori Honjo, William J. Munro, and Hiroki Takesue | (参考訳) 多者間エンタングルメントは量子情報処理において重要な資源であり、二者系よりも豊かな現象と強い相関を示す。
我々の3光子GHZ状態は、長距離マルチユーザ量子通信に使用できると期待している。 Multipartite entanglement is a critical resource in quantum information processing that exhibits much richer phenomena and stronger correlations than in bipartite systems. This advantage is also reflected in its multi-user applications. Although many demonstrations have used photonic polarization qubits, polarization-mode dispersion confines the transmission of photonic polarization qubits through an optical fiber. Consequently, time-bin qubits have a particularly important role to play in quantum communication systems. Here, we generate a three-photon time-bin Greenberger-Horne-Zeilinger (GHZ) state using a 2 x 2 optical switch as a time-dependent beam splitter to entangle time-bin Bell states from a spontaneous parametric down-conversion source and a weak coherent pulse. To characterize the three-photon time-bin GHZ state, we performed measurement estimation, showed a violation of the Mermin inequality, and used quantum state tomography to fully reconstruct a density matrix, which shows a state fidelity exceeding 70%. We expect that our three-photon time-bin GHZ state can be used for long-distance multi-user quantum communication. | 翻訳日:2023-04-13 15:38:23 公開日:2023-04-12 |
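The abstract above reports a violation of the Mermin inequality. As a minimal sketch of what that test checks, the following evaluates one common three-qubit form of the Mermin combination, M = <XXX> - <XYY> - <YXY> - <YYX>, on an ideal GHZ state. Local hidden-variable models satisfy |M| <= 2, while the ideal state reaches 4. The measurement settings, the state-vector representation, and all function names here are illustrative assumptions; the paper's time-bin implementation and its exact form of the inequality are not reproduced.

```python
from math import sqrt

def apply_pauli(state, qubit, pauli, n=3):
    """Apply a single-qubit Pauli ('X' or 'Y') to `qubit` (0 = leftmost bit)."""
    out = [0j] * len(state)
    shift = n - 1 - qubit
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        flipped = idx ^ (1 << shift)
        if pauli == "X":
            out[flipped] += amp
        elif pauli == "Y":
            # Y|0> = i|1>, Y|1> = -i|0>
            out[flipped] += amp * (1j if bit == 0 else -1j)
        else:
            raise ValueError("only X and Y are needed for this sketch")
    return out

def expectation(state, paulis):
    """<state| P0 (x) P1 (x) P2 |state> for a Pauli string such as 'XYY'."""
    psi = list(state)
    for qubit, pauli in enumerate(paulis):
        psi = apply_pauli(psi, qubit, pauli)
    return sum(a.conjugate() * b for a, b in zip(state, psi)).real

def mermin_value(state):
    """Mermin combination; local hidden-variable models give |M| <= 2."""
    return (expectation(state, "XXX") - expectation(state, "XYY")
            - expectation(state, "YXY") - expectation(state, "YYX"))

# Ideal GHZ state (|000> + |111>)/sqrt(2) as 8 amplitudes.
ghz = [0j] * 8
ghz[0] = ghz[7] = 1 / sqrt(2)

# Product state |000> for comparison; it cannot violate the bound.
product = [0j] * 8
product[0] = 1 + 0j

print(mermin_value(ghz), mermin_value(product))
```

An experimental state with imperfect fidelity would give a value between these two extremes; a violation is claimed when the measured value exceeds 2 by more than the statistical uncertainty.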
# 線形畳み込みネットワークの関数空間と臨界点 Function Space and Critical Points of Linear Convolutional Networks ( http://arxiv.org/abs/2304.05752v1 ) ライセンス: Link先を確認 | Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, Matthew Trager | (参考訳) 1次元畳み込み層を有する線形ネットワークの幾何構造について検討する。
この性質は、密な線形ネットワークや、ストライド1の線形畳み込みネットワークでは成り立たないことが知られている。 We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and linear convolutional networks with stride one. | 翻訳日:2023-04-13 15:31:10 公開日:2023-04-12 |
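The identification of these networks with polynomials can be illustrated in the simplest case. The sketch below assumes stride one, no bias, and "valid" boundary handling, and checks numerically that stacking two linear convolutional layers equals a single layer whose filter is the coefficient list of the product of the two filter polynomials. The filters, the input signal, and the function names are made up for illustration; the strided case studied in the paper, where the factorizations become sparse, is not covered.

```python
def conv_full(a, b):
    """Full discrete convolution, i.e. coefficients of the polynomial product."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def conv_layer(x, w):
    """Linear stride-one layer: y[i] = sum_k w[k] * x[i + k] (no bias, valid mode)."""
    n = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n)]

w1 = [1.0, 2.0]         # filter of layer 1, read as 1 + 2t
w2 = [3.0, -1.0, 4.0]   # filter of layer 2, read as 3 - t + 4t^2
x = [0.5, -1.0, 2.0, 3.0, 1.5, -2.0]

two_layers = conv_layer(conv_layer(x, w1), w2)  # network applied layer by layer
end_to_end = conv_full(w1, w2)                  # (1 + 2t)(3 - t + 4t^2)
one_layer = conv_layer(x, end_to_end)           # single equivalent layer

print(end_to_end)
print(two_layers, one_layer)
```

In this reading, the set of end-to-end filters a given architecture can realize is the set of polynomials that factor into pieces of the prescribed degrees, which is why the function space is a proper subset of all filters of that length in general.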
# Segment Anythingは必ずしも完璧ではない: SAMによる現実世界のさまざまなアプリケーションに関する調査 Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications ( http://arxiv.org/abs/2304.05750v1 ) ライセンス: Link先を確認 | Wei Ji, Jingjing Li, Qi Bi, Wenbo Li, Li Cheng | (参考訳) 最近、Meta AI Researchは、前例のないほど大規模なセグメンテーションデータセット(SA-1B)で事前学習された、汎用的でプロンプト可能なSegment Anything Model(SAM)を提案した。
本研究では、自然画像、農業、製造、リモートセンシング、医療など、様々な応用分野におけるSAMの性能について、一連の興味深い調査を行った。
この研究は、将来の一般的なセグメンテーションに向けた研究活動を促進する洞察を提供するものと期待されている。 Recently, Meta AI Research has proposed a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of practical image segmentation applications. In this study, we conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare. We analyze and discuss the benefits and limitations of SAM and provide an outlook on future development of segmentation tasks. Note that our work does not intend to propose new algorithms or theories, but rather provide a comprehensive view of SAM in practice. This work is expected to provide insights that facilitate future research activities toward generic segmentation. | 翻訳日:2023-04-13 15:31:00 公開日:2023-04-12 |
# データ拡張による連続時間動的グラフネットワークの長期予測性能の向上 Boosting long-term forecasting performance for continuous-time dynamic graph networks via data augmentation ( http://arxiv.org/abs/2304.05749v1 ) ライセンス: Link先を確認 | Yuxing Tian, Mingjie Zhu, Jiachi Luo, Song Li | (参考訳) 本研究では,実世界のモデリングにおいて重要な連続時間動的グラフネットワーク(CTDGN)の長期予測(LTF)に焦点を当てた。
本研究では、不確実性推定によってCTDGNの中間層の埋め込みに不確実性を導入するプラグ・アンド・プレイモジュール、Uncertainty Masked MixUp(UmmU)を提案する。
実世界の3つの動的グラフデータセットの総合的な実験を行い、UmmUがCTDGNの長期予測性能を効果的に向上できることを示した。 This study focuses on long-term forecasting (LTF) on continuous-time dynamic graph networks (CTDGNs), which is important for real-world modeling. Existing CTDGNs are effective for modeling temporal graph data due to their ability to capture complex temporal dependencies but perform poorly on LTF due to the substantial requirement for historical data, which is not practical in most cases. To relieve this problem, a most intuitive way is data augmentation. In this study, we propose Uncertainty Masked MixUp (UmmU): a plug-and-play module that conducts uncertainty estimation to introduce uncertainty into the embedding of intermediate layer of CTDGNs, and perform masked mixup to further enhance the uncertainty of the embedding to make it generalize to more situations. UmmU can be easily inserted into arbitrary CTDGNs without increasing the number of parameters. We conduct comprehensive experiments on three real-world dynamic graph datasets, the results demonstrate that UmmU can effectively improve the long-term forecasting performance for CTDGNs. | 翻訳日:2023-04-13 15:30:47 公開日:2023-04-12 |
# 非エルミート系における位相的モノモード Topological Monomodes in non-Hermitian Systems ( http://arxiv.org/abs/2304.05748v1 ) ライセンス: Link先を確認 | E. Slootman, W. Cherifi, L. Eek, R. Arouca, M. Bourennane, C. Morais Smith | (参考訳) 損失工学によって生成された非エルミート系におけるトポロジカルモノモードの存在を理論的、実験的に示す。
これは、$\mathbb{Z}_2$対称性に保護されたトポロジカル系ではエッジ状態が常に対で現れるという考えに疑問を投げかける。
非エルミート 1D と 2D SSH モデルにおけるモノモードの存在を理論的に示す。
この理論を裏付けるために、非エルミート1D SSH鎖でモノモードが観測されるフォトニック格子の実験を行う。 We show theoretically and experimentally the existence of topological monomodes in non-Hermitian systems created by loss engineering. This challenges the idea that edge states always come in pairs in $\mathbb{Z}_2$ symmetry-protected topological systems. We theoretically show the existence of a monomode in non-Hermitian 1D and 2D SSH models. Furthermore, we classify the systems in terms of the (non-Hermitian) symmetries that are present and calculate the corresponding topological invariant. To corroborate the theory, we present experiments in photonic lattices in which a monomode is observed in a non-Hermitian 1D SSH chain. | 翻訳日:2023-04-13 15:30:24 公開日:2023-04-12 |
# 深層学習を用いた中心窩画像中の物体の探索と検出の学習 Learning to search for and detect objects in foveal images using deep learning ( http://arxiv.org/abs/2304.05741v1 ) ライセンス: Link先を確認 | Beatriz Paula and Plinio Moreno | (参考訳) 人間の視覚システムは解像度の異なる画像を処理しており、網膜のごく一部である中心窩が最も視力の高い領域を捉え、視力は視野の周辺に向かって徐々に低下する。
両タスクの相補的な性質から,学習プロセスは知識の共有から恩恵を受け,前回のアプローチのベースラインスコアと比較した場合のパフォーマンスが向上することがわかった。 The human visual system processes images with varied degrees of resolution, with the fovea, a small portion of the retina, capturing the highest acuity region, which gradually declines toward the field of view's periphery. However, the majority of existing object localization methods rely on images acquired by image sensors with space-invariant resolution, ignoring biological attention mechanisms. As a region of interest pooling, this study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image. The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene. Throughout this two-stage pipeline method, we investigate the varying results obtained by utilizing high-level or panoptic features and provide a ground-truth label function for fixation sequences that is smoother, considering in a better way the spatial structure of the problem. Finally, we present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks. We conclude that, due to the complementary nature of both tasks, the training process benefited from the sharing of knowledge, resulting in an improvement in performance when compared to the previous approach's baseline scores. | 翻訳日:2023-04-13 15:30:13 公開日:2023-04-12 |
# 機械学習説明における不確かさのコミュニケーション:予測プロセスモニタリングのための可視化分析アプローチ Communicating Uncertainty in Machine Learning Explanations: A Visualization Analytics Approach for Predictive Process Monitoring ( http://arxiv.org/abs/2304.05736v1 ) ライセンス: Link先を確認 | Nijat Mehdiyev, Maxim Majlatow and Peter Fettke | (参考訳) データ駆動のインテリジェントシステムが進歩するにつれ、信頼性と透明性を備えた意思決定メカニズムの必要性がますます重要になっている。
本研究では、PDP(Partial Dependence Plots)やICE(Individual Conditional Expectation)プロットなど、グローバルおよびローカルなポストホックな説明手法においてモデルの不確実性を効果的に伝達する方法を検討する。
最後に,本研究は,提案手法の適合性を評価するためのエキスパートインタビューと,製造領域における実世界の予測プロセス監視問題に対するインタフェース設計を含む。 As data-driven intelligent systems advance, the need for reliable and transparent decision-making mechanisms has become increasingly important. Therefore, it is essential to integrate uncertainty quantification and model explainability approaches to foster trustworthy business and operational process analytics. This study explores how model uncertainty can be effectively communicated in global and local post-hoc explanation approaches, such as Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots. In addition, this study examines appropriate visualization analytics approaches to facilitate such methodological integration. By combining these two research directions, decision-makers can not only justify the plausibility of explanation-driven actionable insights but also validate their reliability. Finally, the study includes expert interviews to assess the suitability of the proposed approach and designed interface for a real-world predictive process monitoring problem in the manufacturing domain. | 翻訳日:2023-04-13 15:29:49 公開日:2023-04-12 |
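For readers unfamiliar with the two explanation techniques named in the abstract above, the relationship between them can be sketched in a few lines: an ICE curve records how one instance's prediction changes as a single feature is swept over a grid, and the PDP is the pointwise average of the ICE curves. The toy model, data rows, and function names below are hypothetical, and the sketch deliberately leaves out the uncertainty quantification and visualization that the study itself is about.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` is swept over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            probe = list(row)        # copy so the original row is untouched
            probe[feature] = value   # overwrite only the feature of interest
            curve.append(predict(probe))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """PDP: pointwise mean of the ICE curves."""
    n = len(curves)
    return [sum(curve[k] for curve in curves) / n for k in range(len(curves[0]))]

def toy_model(x):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 2.0 * x[0] + x[0] * x[1]

rows = [[0.0, 1.0], [0.0, -1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]

curves = ice_curves(toy_model, rows, feature=0, grid=grid)
pdp = partial_dependence(curves)
print(curves)
print(pdp)
```

Because the toy model contains an interaction term, the three ICE curves have different slopes while the PDP shows only their average, which is the usual argument for inspecting both views together.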
# クロスドメイン疾患分類のための少数ショットクラスインクリメンタル学習 Few-shot Class-incremental Learning for Cross-domain Disease Classification ( http://arxiv.org/abs/2304.05734v1 ) ライセンス: Link先を確認 | Hao Yang, Weijian Huang, Jiarun Liu, Cheng Li, Shanshan Wang | (参考訳) 限られたサンプルから新しいクラスを段階的に学ぶ能力は、実際の臨床応用のための人工知能システムの開発に不可欠である。
本稿では、クロスドメイン少数ショットインクリメンタル学習(CDFSCIL)問題について検討する。
MedMNISTの実験から,本手法の分類性能は,他の漸進学習法よりも優れていることが示された。 The ability to incrementally learn new classes from limited samples is crucial to the development of artificial intelligence systems for real clinical application. Although existing incremental learning techniques have attempted to address this issue, they still struggle with only few labeled data, particularly when the samples are from varied domains. In this paper, we explore the cross-domain few-shot incremental learning (CDFSCIL) problem. CDFSCIL requires models to learn new classes from very few labeled samples incrementally, and the new classes may be vastly different from the target space. To counteract this difficulty, we propose a cross-domain enhancement constraint and cross-domain data augmentation method. Experiments on MedMNIST show that the classification performance of this method is better than other similar incremental learning methods. | 翻訳日:2023-04-13 15:29:32 公開日:2023-04-12 |
# SketchANIMAR: スケッチに基づく3D動物モデルの細粒度検索 SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval ( http://arxiv.org/abs/2304.05731v1 ) ライセンス: Link先を確認 | Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Ngoc-Linh Nguyen-Ha, Tuong-Vy Truong-Thuy, Truong Hoai Phong, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, Hoai-Danh Vo, Minh-Hoa Doan, Hai-Dang Nguyen, Akihiro Sugimoto, Minh-Triet Tran | (参考訳) 近年、3Dオブジェクトの検索は、コンピュータビジョン、コンピュータグラフィックス、仮想現実、拡張現実といった幅広い応用により、非常に重要になっている。
また,特徴抽出およびマッチング技術の改善や,検索性能を評価するためのより多様なデータセットの作成など,今後の研究分野に関する洞察も提供する。 The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches. Furthermore, a new dataset named ANIMAR was constructed in this study, comprising a collection of 711 unique 3D animal models and 140 corresponding sketch queries. Our contest requires participants to retrieve 3D models based on complex and detailed sketches. We receive satisfactory results from eight teams and 204 runs. Although further improvement is necessary, the proposed task has the potential to incentivize additional research in the domain of 3D object retrieval, potentially yielding benefits for a wide range of applications. We also provide insights into potential areas of future research, such as improving techniques for feature extraction and matching, and creating more diverse datasets to evaluate retrieval performance. | 翻訳日:2023-04-13 15:29:19 公開日:2023-04-12 |
# ニューラルネットワークを用いた動的グラフ表現学習:サーベイ Dynamic Graph Representation Learning with Neural Networks: A Survey ( http://arxiv.org/abs/2304.05729v1 ) ライセンス: Link先を確認 | Leshanshui Yang, Sébastien Adam, Clément Chatelain | (参考訳) 近年、動的グラフ(DG)表現は、トポロジ的情報と時間的情報の両方をコンパクトな表現に統合する能力により、動的システムのモデリングにますます利用されている。
最後に、動的グラフ学習問題に直面した場合のDGNNデザイナの一般的なガイドラインを提供する。 In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs make it possible to efficiently handle applications such as social network prediction, recommender systems, traffic forecasting or electroencephalography analysis, that cannot be addressed using standard numeric representations. As a direct consequence of the emergence of dynamic graph representations, dynamic graph learning has emerged as a new machine learning problem, combining challenges from both sequential/temporal data processing and static graph learning. In this research area, the Dynamic Graph Neural Network (DGNN) has become the state-of-the-art approach and a plethora of models have been proposed in recent years. This paper aims at providing a review of problems and models related to dynamic graph learning. The various dynamic graph supervised learning settings are analysed and discussed. We identify the similarities and differences between existing models with respect to the way time information is modeled. Finally, general guidelines for a DGNN designer when faced with a dynamic graph learning problem are provided. | 翻訳日:2023-04-13 15:28:59 公開日:2023-04-12 |
# ディープニューラルネットワークにおけるClever Hans戦略の予防的プルーニング Preemptively Pruning Clever-Hans Strategies in Deep Neural Networks ( http://arxiv.org/abs/2304.05727v1 ) ライセンス: Link先を確認 | Lorenz Linhardt, Klaus-Robert Müller, Grégoire Montavon | (参考訳) 説明可能なAIは、機械学習モデルを検証するための一般的なツールになっている。
それでも、このようなモデルの隠れた欠陥は緩和可能であり、我々は、肯定的な説明フィードバックの対象になっていないMLモデルの変動を予防的に刈り込む新しい手法、Explanation-Guided Exposure Minimization(EGEM)を提案することでこれを示す。
自然画像データを用いた実験により、本手法は隠れたClever Hans戦略への依存を強く低減し、その結果、新たなデータに対する精度の向上につながることが示された。 Explainable AI has become a popular tool for validating machine learning models. Mismatches between the explained model's decision strategy and the user's domain knowledge (e.g. Clever Hans effects) have also been recognized as a starting point for improving faulty models. However, it is less clear what to do when the user and the explanation agree. In this paper, we demonstrate that acceptance of explanations by the user is not a guarantee for a ML model to function well, in particular, some Clever Hans effects may remain undetected. Such hidden flaws of the model can nevertheless be mitigated, and we demonstrate this by contributing a new method, Explanation-Guided Exposure Minimization (EGEM), that preemptively prunes variations in the ML model that have not been the subject of positive explanation feedback. Experiments on natural image data demonstrate that our approach leads to models that strongly reduce their reliance on hidden Clever Hans strategies, and consequently achieve higher accuracy on new data. | 翻訳日:2023-04-13 15:28:42 公開日:2023-04-12 |
# エントロピー不確定性関係に基づく2キューディットエンタングルメントの逐次共有 Sequential sharing of two-qudit entanglement based on entropic uncertainty relation ( http://arxiv.org/abs/2304.05791v1 ) ライセンス: Link先を確認 | Ming-Liang Hu, Heng Fan | (参考訳) エンタングルメントと不確定性関係は量子論の2つの焦点である。
異なるポインターを用いた弱い測定による$(d\times d)$-dimensional系における絡み合い共有とエントロピー不確実性関係を関連づける。
もつれたペアを複数のアリスと1人のボブに分配する片側逐次測定と、複数のアリスと複数のボブに分配する両側逐次測定の両方のシナリオを考察する。
状態が最大エンタングル状態でなくても、十分に強いエンタングルメントを持っていれば、観測者の最大数は変わらない。 Entanglement and uncertainty relation are two focuses of quantum theory. We relate entanglement sharing to entropic uncertainty relation in a $(d\times d)$-dimensional system via weak measurements with different pointers. We consider both the scenarios of one-sided sequential measurements in which the entangled pair is distributed to multiple Alices and one Bob and two-sided sequential measurements in which the entangled pair is distributed to multiple Alices and Bobs. It is found that the maximum number of observers sharing the entanglement strongly depends on the measurement scenarios, the pointer states of the apparatus, and the local dimension $d$ of each subsystem, while the required minimum measurement precision to achieve entanglement sharing decreases to its asymptotic value with the increase of $d$. The maximum number of observers remains unaltered even when the state is not maximally entangled but has strong enough entanglement. | 翻訳日:2023-04-13 15:22:47 公開日:2023-04-12 |
# 次元の呪いを伴わない複合関数のディープニューラルネットワーク近似 Deep neural network approximation of composite functions without the curse of dimensionality ( http://arxiv.org/abs/2304.05790v1 ) ライセンス: Link先を確認 | Adrian Riekert | (参考訳) 本稿では、正規化線形ユニット(ReLU)を活性化関数とするディープニューラルネットワーク(DNN)によって、次元の呪いなしに近似できる高次元連続関数の一般的なクラスを同定する。
このクラス内の関数は、積、最大値、ある種の平行化リプシッツ連続関数を含む特殊関数の合成の潜在的非有界数として表現することができる。 In this article we identify a general class of high-dimensional continuous functions that can be approximated by deep neural networks (DNNs) with the rectified linear unit (ReLU) activation without the curse of dimensionality. In other words, the number of DNN parameters grows at most polynomially in the input dimension and the approximation error. The functions in our class can be expressed as a potentially unbounded number of compositions of special functions which include products, maxima, and certain parallelized Lipschitz continuous functions. | 翻訳日:2023-04-13 15:22:30 公開日:2023-04-12 |
# 西スラヴ語の言語モデルにおけるジェンダーバイアスの測定 Measuring Gender Bias in West Slavic Language Models ( http://arxiv.org/abs/2304.05783v1 ) ライセンス: Link先を確認 | Sandra Martinková, Karolina Stańczak, Isabelle Augenstein | (参考訳) トレーニング済みの言語モデルは、基礎となるデータセットからダウンストリームタスクへのバイアスを持続することが知られている。
チェコ語、スロバキア語、ポーランド語のモデルは、被検者として男性に対してより傷つきやすい完成をもたらしており、検査の結果、暴力、死、病気に関連する完成が原因であることが判明しました。 Pre-trained language models have been known to perpetuate biases from the underlying datasets to downstream tasks. However, these findings are predominantly based on monolingual language models for English, whereas there are few investigative studies of biases encoded in language models for languages beyond English. In this paper, we fill this gap by analysing gender bias in West Slavic language models. We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female and non-binary subjects. We complete the sentences using both mono- and multilingual language models and assess their suitability for the masked language modelling objective. Next, we measure gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words. We find that these language models produce hurtful completions that depend on the subject's gender. Perhaps surprisingly, Czech, Slovak, and Polish language models produce more hurtful completions with men as subjects, which, upon inspection, we find is due to completions being related to violence, death, and sickness. | 翻訳日:2023-04-13 15:22:23 公開日:2023-04-12 |
# 複数RDF知識グラフを用いたChatGPT応答の強化 Using Multiple RDF Knowledge Graphs for Enriching ChatGPT Responses ( http://arxiv.org/abs/2304.05774v1 ) ライセンス: Link先を確認 | Michalis Mountantonakis and Yannis Tzitzikas | (参考訳) 人工知能のChatGPTチャットボックスは、多くの知識分野にまたがって、詳細で明瞭な回答を提供する。
本稿では,ChatGPT と RDF KGs の組み合わせを実現するために,GPToLODS と呼ばれる研究プロトタイプを提案する。
特に、応答に含まれる各エンティティを識別し、統計情報とLODsyndesis KG(400のRDF KGと4億1200万以上のエンティティの統合データを含む)へのハイパーリンクで注釈付けする。
このようにして、エンティティの内容を充実させ、リアルタイムに応答の事実の事実チェックと検証を行うことが可能である。 There is a recent trend for using the novel Artificial Intelligence ChatGPT chatbox, which provides detailed responses and articulate answers across many domains of knowledge. However, in many cases it returns plausible-sounding but incorrect or inaccurate responses, whereas it does not provide evidence. Therefore, any user has to further search for checking the accuracy of the answer or/and for finding more information about the entities of the response. At the same time there is a high proliferation of RDF Knowledge Graphs (KGs) over any real domain, that offer high quality structured data. For enabling the combination of ChatGPT and RDF KGs, we present a research prototype, called GPToLODS, which is able to enrich any ChatGPT response with more information from hundreds of RDF KGs. In particular, it identifies and annotates each entity of the response with statistics and hyperlinks to LODsyndesis KG (which contains integrated data from 400 RDF KGs and over 412 million entities). In this way, it is feasible to enrich the content of entities and to perform fact checking and validation for the facts of the response at real time. | 翻訳日:2023-04-13 15:22:06 公開日:2023-04-12 |
# 肖像画の画質評価データセット An Image Quality Assessment Dataset for Portraits ( http://arxiv.org/abs/2304.05772v1 ) ライセンス: Link先を確認 | Nicolas Chahine, Ana-Stefania Calarasanu, Davide Garcia-Civiero, Theo Cayla, Sira Ferradans, Jean Ponce (NYU) | (参考訳) スマートフォンの写真の需要は年々増え続けており、特にポートレート写真の分野では増え続けている。
IQAは主観的な性質を持つため、そのプロセスの一貫性を推定し保証する必要があるが、これはクラウドソーシングIQAで広く用いられている平均オピニオン評点(MOS)に欠けている特性である。
提案された統計分析とBIQAアルゴリズムと共にデータセットが利用可能である。 Year after year, the demand for ever-better smartphone photos continues to grow, in particular in the domain of portrait photography. Manufacturers thus use perceptual quality criteria throughout the development of smartphone cameras. This costly procedure can be partially replaced by automated learning-based methods for image quality assessment (IQA). Due to its subjective nature, it is necessary to estimate and guarantee the consistency of the IQA process, a characteristic lacking in the mean opinion scores (MOS) widely used for crowdsourcing IQA. In addition, existing blind IQA (BIQA) datasets pay little attention to the difficulty of cross-content assessment, which may degrade the quality of annotations. This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. The dataset includes individuals of various genders and ethnicities who have given explicit and informed consent for their photographs to be used in public research. It is annotated by pairwise comparisons (PWC) collected from over 30 image quality experts for three image attributes: face detail preservation, face target exposure, and overall image quality. An in-depth statistical analysis of these annotations allows us to evaluate their consistency over PIQ23. Finally, we show through an extensive comparison with existing baselines that semantic information (image context) can be used to improve IQA predictions. The dataset along with the proposed statistical analysis and BIQA algorithms are available: https://github.com/DXOMARK-Research/PIQ2023 | 翻訳日:2023-04-13 15:21:45 公開日:2023-04-12 |
# 国勢調査データを用いた言語モデルにおける規範バイアスおよび記述バイアスの測定 Measuring Normative and Descriptive Biases in Language Models Using Census Data ( http://arxiv.org/abs/2304.05764v1 ) ライセンス: Link先を確認 | Samia Touileb, Lilja Øvrelid, Erik Velldal | (参考訳) 本稿では,性別に対する職業の分布が,事前学習された言語モデルにどのように反映されるかを検討する。
このアプローチは、国勢調査データやその他の人口統計変数の他の次元にも拡張することができる。 We investigate in this paper how distributions of occupations with respect to gender is reflected in pre-trained language models. Such distributions are not always aligned to normative ideals, nor do they necessarily reflect a descriptive assessment of reality. In this paper, we introduce an approach for measuring to what degree pre-trained language models are aligned to normative and descriptive occupational distributions. To this end, we use official demographic information about gender--occupation distributions provided by the national statistics agencies of France, Norway, United Kingdom, and the United States. We manually generate template-based sentences combining gendered pronouns and nouns with occupations, and subsequently probe a selection of ten language models covering the English, French, and Norwegian languages. The scoring system we introduce in this work is language independent, and can be used on any combination of template-based sentences, occupations, and languages. The approach could also be extended to other dimensions of national census data and other demographic variables. | 翻訳日:2023-04-13 15:21:18 公開日:2023-04-12 |
# 技術実践における言語倫理 Languaging Ethics in Technology Practice ( http://arxiv.org/abs/2304.05761v1 ) ライセンス: Link先を確認 | Colin M. Gray, Shruthi Sai Chivukula, Janna Johns, Matthew Will, Ikechukwu Obi, Ziqing Li | (参考訳) 技術実践者によって具現化された倫理は、アイデンティティ、組織的、専門的な複雑さの相互作用に関する単純な定義に抵抗する。
3つの事例において、倫理が3つの重要な環境発生の領域にまたがる言語を通してどのように交渉されたかを述べる: 倫理に関する実践者の「中核的」信念、これらの中核的信念を形作り、あるいは媒介する内外的な生態要素、そして彼らが報告した最終的境界について。
これらの知見に基づいて、倫理の言語化(languaging)が、技術倫理の研究、実践、教育において倫理に定義的かつ実践的に関与する機会をどのように明らかにするかを述べる。 Ethics as embodied by technology practitioners resists simple definition, particularly as it relates to the interplay of identity, organizational, and professional complexity. In this paper we use the linguistic notion of languaging as an analytic lens to describe how technology and design practitioners negotiate their conception of ethics as they reflect upon their everyday work. We engaged twelve practitioners in individual co-creation workshops, encouraging them to reflect on their ethical role in their everyday work through a series of generative and evaluative activities. We analyzed these data to identify how each practitioner reasoned about ethics through language and artifacts, finding that practitioners used a range of rhetorical tropes to describe their ethical commitments and beliefs in ways that were complex and sometimes contradictory. Across three cases, we describe how ethics was negotiated through language across three key zones of ecological emergence: the practitioner's "core" beliefs about ethics, internal and external ecological elements that shaped or mediated these core beliefs, and the ultimate boundaries they reported refusing to cross. Building on these findings, we describe how the languaging of ethics reveals opportunities to definitionally and practically engage with ethics in technology ethics research, practice, and education. | 翻訳日:2023-04-13 15:21:03 公開日:2023-04-12 |
# 2-Body Pose Forecastingのベストプラクティス Best Practices for 2-Body Pose Forecasting ( http://arxiv.org/abs/2304.05758v1 ) ライセンス: Link先を確認 | Muhammad Rameez Ur Rahman, Luca Scofano, Edoardo De Matteis, Alessandro Flaborea, Alessio Sampieri, Fabio Galasso | (参考訳) 協調的な人間のポーズ予測のタスクは、複数の相互作用する人々の将来のポーズを予測するためのものである。
プロジェクトのページはhttps://www.pinlab.org/bestpractices2bodyを参照。 The task of collaborative human pose forecasting stands for predicting the future poses of multiple interacting people, given those in previous frames. Predicting two people in interaction, instead of each separately, promises better performance, due to their body-body motion correlations. But the task has remained so far primarily unexplored. In this paper, we review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best for 2-body collaborative motion forecasting. Our study confirms the positive impact of frequency input representations, space-time separable and fully-learnable interaction adjacencies for the encoding GCN and FC decoding. Other single-person practices do not transfer to 2-body, so the proposed best ones do not include hierarchical body modeling or attention-based interaction encoding. We further contribute a novel initialization procedure for the 2-body spatial interaction parameters of the encoder, which benefits performance and stability. Altogether, our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset, whereby the novel initialization accounts for 3.5%. See our project page at https://www.pinlab.org/bestpractices2body | 翻訳日:2023-04-13 15:20:41 公開日:2023-04-12 |
# ALADIN-NST:ニューラル・スタイル・トランスファーによるアートスタイルの自己教師型非絡み合い表現学習 ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer ( http://arxiv.org/abs/2304.05755v1 ) ライセンス: Link先を確認 | Dan Ruta, Gemma Canet Tarres, Alex Black, Andrew Gilbert, John Collomosse | (参考訳) 表現学習(representation learning)は、与えられたサンプルの固有の特性をドメインごとに強く識別するコンパクトで記述的な形式で、ドメインの個々のサルエント特徴を発見することを目的としている。
学習信号の測定と駆動にはneural style transfer(nst)を使用し,明示的異種メトリクスを用いた最先端表現学習を実現する。
本稿では,スタイルとコンテンツの絡み合いに強く対処することで,スタイル固有のメトリクスが大幅に向上し,より少ない意味情報をエンコードし,下流のマルチモーダルアプリケーションにおいて最先端の精度が得られることを示す。 Representation learning aims to discover individual salient features of a domain in a compact and descriptive form that strongly identifies the unique characteristics of a given sample respective to its domain. Existing works in visual style representation literature have tried to disentangle style from content during training explicitly. A complete separation between these has yet to be fully achieved. Our paper aims to learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal and achieve state-of-the-art representation learning on explicitly disentangled metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encoding far less semantic information and achieving state-of-the-art accuracy in downstream multimodal applications. | 翻訳日:2023-04-13 15:20:20 公開日:2023-04-12 |
# ワイルドフェイスのアンチスプーフィングチャレンジ2023:ベンチマークと結果 Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results ( http://arxiv.org/abs/2304.05753v1 ) ライセンス: Link先を確認 | Dong Wang, Jia Guo, Qiqi Shao, Haochi He, Zhian Chen, Chuanbao Xiao, Ajian Liu, Sergio Escalera, Hugo Jair Escalante, Lei Zhen, Jun Wan, Jiankang Deng | (参考訳) 顔のなりすまし防止(FAS)は、自動顔認識システムの完全性を保護するための重要なメカニズムである。
これらの欠点に対処するために、制約のない環境で収集された大規模で多様なFASデータセットであるWFASデータセット(Wild Face Anti-Spoofing)を導入する。
WFASデータセットとProtocol 1(Known-Type)を活用して、CVPR2023ワークショップでWild Face Anti-Spoofing Challengeを開催する。
さらに,Protocol 1 とProtocol 2 (Unknown-Type) を用いた代表メソッドの評価を行った。
データセットはInsightfaceでリリースされている。 Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets, which often leads to overfitting during training or saturation during testing. In terms of quantity, the number of spoof subjects is a critical determinant. Most datasets comprise fewer than 2,000 subjects. With regard to diversity, the majority of datasets consist of spoof samples collected in controlled environments using repetitive, mechanical processes. This data collection methodology results in homogenized samples and a dearth of scenario diversity. To address these shortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, a large-scale, diverse FAS dataset collected in unconstrained settings. Our dataset encompasses 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, representing a substantial increase in quantity. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide array of scenarios and various commercial sensors, including 17 presentation attacks (PAs) that encompass both 2D and 3D forms. This novel data collection strategy markedly enhances FAS data diversity. Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR2023 workshop. Additionally, we meticulously evaluate representative methods using Protocol 1 and Protocol 2 (Unknown-Type). Through an in-depth examination of the challenge outcomes and benchmark baselines, we provide insightful analyses and propose potential avenues for future research. The dataset is released under Insightface. | 翻訳日:2023-04-13 15:20:04 公開日:2023-04-12 |
# DiscoGen: 遺伝子制御ネットワークの発見を学ぶ DiscoGen: Learning to Discover Gene Regulatory Networks ( http://arxiv.org/abs/2304.05823v1 ) ライセンス: Link先を確認 | Nan Rosemary Ke, Sara-Jane Dunn, Jorg Bornschein, Silvia Chiappa, Melanie Rey, Jean-Baptiste Lespiau, Albin Cassirer, Jane Wang, Theophane Weber, David Barrett, Matthew Botvinick, Anirudh Goyal, Mike Mozer, Danilo Rezende | (参考訳) 遺伝子制御ネットワーク(GRN)の正確な推論は、生物学における重要な課題である。
我々のモデルはSOTAニューラルネットワークに基づく因果探索法より優れていることを示す。 Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery methods have significantly improved causal discovery, including handling interventional data, improvements in performance and scalability. However, applying state-of-the-art (SOTA) causal discovery methods in biology poses challenges, such as noisy data and a large number of samples. Thus, adapting the causal discovery methods is necessary to handle these challenges. In this paper, we introduce DiscoGen, a neural network-based GRN discovery method that can denoise gene expression measurements and handle interventional data. We demonstrate that our model outperforms SOTA neural network-based causal discovery methods. | 翻訳日:2023-04-13 15:13:03 公開日:2023-04-12 |
# DUFormer: 航空画像の電力線分割のための新しいアーキテクチャ DUFormer: A Novel Architecture for Power Line Segmentation of Aerial Images ( http://arxiv.org/abs/2304.05821v1 ) ライセンス: Link先を確認 | Deyu An, Qiang Zhang, Jianshu Chao, Ting Li, Feng Qiao, Yong Deng, Zhenpeng Bian, Jia Xu | (参考訳) 電力線は低高度で運用される無人航空機(UAV)にとって重大な安全上の脅威となる。
提案手法は,TTPLAデータセット上での電力線セグメンテーションにおける最先端性能を実現することを実証した。 Power lines pose a significant safety threat to unmanned aerial vehicles (UAVs) operating at low altitudes. However, detecting power lines in aerial images is challenging due to the small size of the foreground data (i.e., power lines) and the abundance of background information. To address this challenge, we propose DUFormer, a semantic segmentation algorithm designed specifically for power line detection in aerial images. We assume that performing sufficient feature extraction with a convolutional neural network (CNN) that has a strong inductive bias is beneficial for training an efficient Transformer model. To this end, we propose a heavy token encoder responsible for overlapping feature re-mining and tokenization. The encoder comprises a pyramid CNN feature extraction module and a power line feature enhancement module. Following sufficient feature extraction for power lines, the feature fusion is carried out, and then the Transformer block is used for global modeling. The final segmentation result is obtained by fusing local and global features in the decode head. Additionally, we demonstrate the significance of the joint multi-weight loss function in power line segmentation. The experimental results demonstrate that our proposed method achieves the state-of-the-art performance in power line segmentation on the publicly available TTPLA dataset. | 翻訳日:2023-04-13 15:12:50 公開日:2023-04-12 |
# 勾配フリーのテキスト反転 Gradient-Free Textual Inversion ( http://arxiv.org/abs/2304.05818v1 ) ライセンス: Link先を確認 | Zhengcong Fei, Mingyuan Fan, Junshi Huang | (参考訳) 最近のパーソナライズされたテキスト・ツー・イメージ生成の研究は、通常、与えられた少数の画像の特定の主題やスタイルに特別なトークンを結び付けることを、勾配降下による埋め込みの調整を通じて学習する。
複数の応用における実験により、提案する勾配フリー手法を備えたテキスト・ツー・イメージモデルの性能は、さまざまなGPU/CPUプラットフォーム上で勾配ベースの手法と同等であり、柔軟な運用と計算効率も備えていることを示した。 Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent. It is natural to question whether we can optimize the textual inversions by only accessing the process of model inference. As only requiring the forward computation to determine the textual inversion retains the benefits of less GPU memory, simple deployment, and secure access for scalable models. In this paper, we introduce a \emph{gradient-free} framework to optimize the continuous textual inversion in an iterative evolutionary strategy. Specifically, we first initialize an appropriate token embedding for textual inversion with the consideration of visual and text vocabulary information. Then, we decompose the optimization of evolutionary strategy into dimension reduction of searching space and non-convex gradient-free optimization in subspace, which significantly accelerates the optimization process with negligible performance loss. Experiments in several applications demonstrate that the performance of text-to-image model equipped with our proposed gradient-free method is comparable to that of gradient-based counterparts with variant GPU/CPU platforms, flexible employment, as well as computational efficiency. | 翻訳日:2023-04-13 15:12:32 公開日:2023-04-12 |
# CEC:分散最適化のためのクラウドソーシングベースの進化計算 CEC: Crowdsourcing-based Evolutionary Computation for Distributed Optimization ( http://arxiv.org/abs/2304.05817v1 ) ライセンス: Link先を確認 | Feng-Feng Wei, Wei-Neng Chen, Xiao-Qi Guo, Bowen Zhao, Sang-Woon Jeon and Jun Zhang | (参考訳) クラウドソーシングは、複雑な問題を解決するために群衆の知性を活用する、新興のコンピューティングパラダイムである。
ベンチマーク関数と分散クラスタリング最適化問題の比較結果から,cecの可能性を示す。 Crowdsourcing is an emerging computing paradigm that takes advantage of the intelligence of a crowd to solve complex problems effectively. Besides collecting and processing data, it is also a great demand for the crowd to conduct optimization. Inspired by this, this paper intends to introduce crowdsourcing into evolutionary computation (EC) to propose a crowdsourcing-based evolutionary computation (CEC) paradigm for distributed optimization. EC is helpful for optimization tasks of crowdsourcing and in turn, crowdsourcing can break the spatial limitation of EC for large-scale distributed optimization. Therefore, this paper firstly introduces the paradigm of crowdsourcing-based distributed optimization. Then, CEC is elaborated. CEC performs optimization based on a server and a group of workers, in which the server dispatches a large task to workers. Workers search for promising solutions through EC optimizers and cooperate with connected neighbors. To eliminate uncertainties brought by the heterogeneity of worker behaviors and devices, the server adopts the competitive ranking and uncertainty detection strategy to guide the cooperation of workers. To illustrate the satisfactory performance of CEC, a crowdsourcing-based swarm optimizer is implemented as an example for extensive experiments. Comparison results on benchmark functions and a distributed clustering optimization problem demonstrate the potential applications of CEC. | 翻訳日:2023-04-13 15:12:12 公開日:2023-04-12 |
# ベル状態回転のベイズ推定 Bayesian Estimation for Bell State Rotations ( http://arxiv.org/abs/2304.05815v1 ) ライセンス: Link先を確認 | Luke Anastassiou, Jason F. Ralph, Simon Maskell, Pieter Kok | (参考訳) 本稿では,2量子ベル状態に対する3次元回転の影響を考察し,回転パラメータ推定のためのベイズ法を提案する。
また, 推定法の精度が混合状態の純度関数であることを示す。 This paper explores the effect of three-dimensional rotations on two-qubit Bell states and proposes a Bayesian method for the estimation of the parameters of the rotation. We use a particle filter to estimate the parameters of the rotation from a sequence of Bell state measurements and we demonstrate that the resultant improvement over the optimal single qubit case approaches the $\sqrt{2}$ factor that is consistent with the Heisenberg limit. We also demonstrate how the accuracy of the estimation method is a function of the purity of mixed states. | 翻訳日:2023-04-13 15:11:52 公開日:2023-04-12 |
# 分散進化計算に関する一考察 A Survey on Distributed Evolutionary Computation ( http://arxiv.org/abs/2304.05811v1 ) ライセンス: Link先を確認 | Wei-Neng Chen, Feng-Feng Wei, Tian-Fang Zhao, Kay Chen Tan and Jun Zhang | (参考訳) 並列および分散コンピューティングパラダイムの急速な発展は、コンピューティングに大きな革命をもたらした。
本稿では,分散EC(distributed EC, DEC)に関する体系的レビューを行う。
第3に,DECの後者の目的が,空間的に分散したパラダイムの隆盛に伴うECの新たな魅力的な潮流であることに着目し,分散最適化を体系的に定義し,それを次元分散,データ分散,目的分散の各最適化問題に分類する。
また、DECの設計を啓蒙し、今後の発展への道を開くことを目指して、課題や研究の方向性についても論じる。 The rapid development of parallel and distributed computing paradigms has brought about great revolution in computing. Thanks to the intrinsic parallelism of evolutionary computation (EC), it is natural to implement EC on parallel and distributed computing systems. On the one hand, the computing power provided by parallel computing systems can significantly improve the efficiency and scalability of EC. On the other hand, data are collected and processed in a distributed manner, which brings a novel development direction and new challenges to EC. In this paper, we intend to give a systematic review on distributed EC (DEC). First, a new taxonomy for DEC is proposed from top design mechanism to bottom implementation mechanism. Based on this taxonomy, existing studies on DEC are reviewed in terms of purpose, parallel structure of the algorithm, parallel model for implementation, and the implementation environment. Second, we clarify two major purposes of DEC, i.e., improving efficiency through parallel processing for centralized optimization and cooperating distributed individuals/sub-populations with partial information to perform distributed optimization. Third, noting that the latter purpose of DEC is an emerging and attractive trend for EC with the booming of spatially distributed paradigms, this paper gives a systematic definition of the distributed optimization and classifies it into dimension distributed-, data distributed-, and objective distributed-optimization problems. Formal formulations for these problems are provided and various DEC studies on these problems are reviewed. We also discuss challenges and potential research directions, aiming to enlighten the design of DEC and pave the way for future developments. | 翻訳日:2023-04-13 15:11:42 公開日:2023-04-12 |
# シュミド・ブルガダエフ散逸性量子相転移の観測 Observation of the Schmid-Bulgadaev dissipative quantum phase transition ( http://arxiv.org/abs/2304.05806v1 ) ライセンス: Link先を確認 | Roman Kuzmin, Nitish Mehta, Nicholas Grabon, Raymond A. Mencia, Amir Burshtein, Moshe Goldstein, Vladimir E. Manucharyan | (参考訳) 量子力学は多くのマクロ超伝導デバイスに適用されるが、基本的な予測は数十年にわたって議論を呼んだ。
すなわち、抵抗器に接続されたジョセフソン接合は、抵抗器の値が $h/4e^2 \approx 6.5~\textrm{k}\Omega$($h$ はプランク定数、$e$ は電子電荷)を超えると、超伝導体から絶縁体への散逸誘起量子相転移を起こすはずである。
位相境界では、接合自体が理想的な抵抗として機能し、弾性散乱に加えて、入射光子は周波数非依存の確率で自発的に下向きに変換することができる。 Although quantum mechanics applies to many macroscopic superconducting devices, one basic prediction remained controversial for decades. Namely, a Josephson junction connected to a resistor must undergo a dissipation-induced quantum phase transition from superconductor to insulator once the resistor's value exceeds $h/4e^2 \approx 6.5~\textrm{k}\Omega$ ($h$ is Planck's constant, $e$ is the electron charge). Here we finally demonstrate this transition by observing the resistor's internal dynamics. Implementing our resistor as a long transmission line section, we find that a junction scatters electromagnetic excitations in the line as either inductance (superconductor) or capacitance (insulator), depending solely on the line's wave impedance. At the phase boundary, the junction itself acts as ideal resistance: in addition to elastic scattering, incident photons can spontaneously down-convert with a frequency-independent probability, which provides a novel marker of quantum-critical behavior. | 翻訳日:2023-04-13 15:11:18 公開日:2023-04-12 |
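As a quick arithmetic check of the threshold quoted in the abstract above, the sketch below evaluates $h/4e^2$ numerically. It is only an illustration, not part of the paper: the function name `critical_resistance_ohms` is hypothetical, and the constants are the values fixed exactly by the 2019 SI redefinition.

```python
# Constants fixed exactly by the 2019 SI redefinition.
PLANCK_H = 6.62607015e-34            # Planck constant h, in J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # electron charge magnitude e, in C


def critical_resistance_ohms() -> float:
    """Return h / (4 e^2) in ohms (hypothetical helper name)."""
    return PLANCK_H / (4.0 * ELEMENTARY_CHARGE ** 2)


if __name__ == "__main__":
    # Convert to kilo-ohms for comparison with the value in the abstract.
    print(f"h/4e^2 = {critical_resistance_ohms() / 1e3:.2f} kOhm")
```

The result is about 6.45 kΩ, consistent with the rounded value of 6.5 kΩ given in the abstract.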
# 人工ニューラルネットワークによるGDPナウキャスティング: 長期記憶はどれくらい重要か? GDP nowcasting with artificial neural networks: How much does long-term memory matter? ( http://arxiv.org/abs/2304.05805v1 ) ライセンス: Link先を確認 | Kristóf Németh, Dániel Hadházi | (参考訳) 本研究は,米国経済の四半期GDP成長率のナウキャスティングに異なる統計モデルを適用した。
第1期(2010:Q1 -- 2019:Q4)はバランスの取れた経済成長を特徴とし、第2期(2010:Q1 -- 2022:Q3)は新型コロナウイルスによる景気後退の時期も含む。
その結果, 同一パラメータの1次元CNNは, 両評価期間において, 正確なナウキャストを生成することがわかった。
そこで本研究では,文献上初めて,このニューラルネットワークアーキテクチャを経済ナウキャスティングに利用することを提案する。 In our study, we apply different statistical models to nowcast quarterly GDP growth for the US economy. Using the monthly FRED-MD database, we compare the nowcasting performance of the dynamic factor model (DFM) and four artificial neural networks (ANNs): the multilayer perceptron (MLP), the one-dimensional convolutional neural network (1D CNN), the long short-term memory network (LSTM), and the gated recurrent unit (GRU). The empirical analysis presents the results from two distinctively different evaluation periods. The first (2010:Q1 -- 2019:Q4) is characterized by balanced economic growth, while the second (2010:Q1 -- 2022:Q3) also includes periods of the COVID-19 recession. According to our results, longer input sequences result in more accurate nowcasts in periods of balanced economic growth. However, this effect ceases above a relatively low threshold value of around six quarters (eighteen months). During periods of economic turbulence (e.g., during the COVID-19 recession), longer training sequences do not help the models' predictive performance; instead, they seem to weaken their generalization capability. Our results show that 1D CNN, with the same parameters, generates accurate nowcasts in both of our evaluation periods. Consequently, first in the literature, we propose the use of this specific neural network architecture for economic nowcasting. | 翻訳日:2023-04-13 15:10:56 公開日:2023-04-12 |
# Proximity Forest 2.0: 時系列の新しい有効でスケーラブルな類似性に基づく分類器 Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series ( http://arxiv.org/abs/2304.05800v1 ) ライセンス: Link先を確認 | Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb | (参考訳) 時系列分類(TSC)は、傾向、ばらつき、頻度、大きさ、および様々なパターンを含む様々な分類タスクに関連があるかもしれない機能の種類が異なるため、難しい課題である。
本稿では,新しい類似度ベース分類器であるProximity Forest バージョン2.0(PF 2.0)を提案する。PF 2.0はUCRベンチマーク全体で従来の最先端の類似度ベース分類器を上回り,ベンチマーク内で類似度ベース手法が最も適した特定のデータセットでは,最先端のカーネル,ニューラルネットワーク,ハイブリッド手法も上回る。
PF 2.0は,時系列類似度尺度における3つの最近の進歩を取り入れている: (1) 弾性類似度計算を高速化する,計算効率のよい早期打ち切りと枝刈り,(2) 新たな弾性類似度尺度であるAmerced Dynamic Time Warping(ADTW),(3) コスト関数のチューニング。
私たちは単一のC++フレームワークでPF 1.0とPF 2.0の両方を実装しました。 Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-base methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient. | 翻訳日:2023-04-13 15:10:30 公開日:2023-04-12 |
# 連続変数系におけるリウビリアン例外点 Liouvillian exceptional points in continuous variable system ( http://arxiv.org/abs/2304.05792v1 ) ライセンス: Link先を確認 | B. A. Tay | (参考訳) 一般環境における発振器の量子マルコフマスター方程式に対するリウヴィリア例外点を求める。
この状況はCaldeira--Leggett(CL)方程式とHu--Paz--Zhang(HPZ)方程式のマルコフ極限によって示される。もう一方のパラメータは振動子の有効質量を変化させ、非常に重い振動子の極限で例外点に達する。この状況はKossakowski--Lindblad(KL)方程式の修正形によって示される。
我々は,CL方程式の一般化固有ベクトルを用いて,不足減衰領域,例外点に対応する臨界減衰領域,および過減衰領域における振動子の第1励起状態の緩和を比較する。 The Liouvillian exceptional points for a quantum Markovian master equation of an oscillator in a generic environment are obtained. They occur at the points when the modified frequency of the oscillator vanishes, whereby the eigenvalues of the Liouvillian become real. In a generic system there are two parameters that modify the oscillator's natural frequency. One of the parameters can be the damping rate. The exceptional point then corresponds to critical damping of the oscillator. This situation is illustrated by the Caldeira--Leggett (CL) equation and the Markovian limit of the Hu--Paz--Zhang (HPZ) equation. The other parameter changes the oscillator's effective mass whereby the exceptional point is reached in the limit of extremely heavy oscillator. This situation is illustrated by a modified form of the Kossakowski--Lindblad (KL) equation. The eigenfunctions coalesce at the exceptional points and break into subspaces labelled by a natural number $N$. In each of the $N$-subspace, there is a $(N+1)$-fold degeneracy and the Liouvillian has a Jordan block structure of order-$(N+1)$. We obtain the explicit form of the generalized eigenvectors for a few Liouvillians. Because of the degeneracies, there is a freedom of choice in the generalized eigenfunctions. This freedom manifests itself as an invariance in the Jordan block structure under a similarity transformation whose form is obtained. We compare the relaxation of the first excited state of an oscillator in the underdamped region, critically damped region which corresponds to the exceptional point, and overdamped region using the generalized eigenvectors of the CL equation. | 翻訳日:2023-04-13 15:10:00 公開日:2023-04-12 |
# 直線上の有理拡張ダンクル振動子 Rationally-extended Dunkl oscillator on the line ( http://arxiv.org/abs/2304.05846v1 ) ライセンス: Link先を確認 | C. Quesne | (参考訳) 通常の微分をダンクル微分に置き換えることと、古典直交多項式を例外直交多項式に置き換えることに伴う、厳密に解ける量子力学的問題の拡張は、容易に組み合わせられることが示される。
対応する波動関数は、3つの異なるタイプの $X_m$-Laguerre 例外直交多項式によって定義される、例外直交一般化エルミート多項式を用いて表される。
さらに、拡張ダンクル振動子ハミルトニアンは、拡張ダンクル微分といくつかの非調和振動子ポテンシャルの観点から表現可能であることが示されている。 It is shown that the extensions of exactly-solvable quantum mechanical problems connected with the replacement of ordinary derivatives by Dunkl ones and with that of classical orthogonal polynomials by exceptional orthogonal ones can be easily combined. For such a purpose, the example of the Dunkl oscillator on the line is considered and three different types of rationally-extended Dunkl oscillators are constructed. The corresponding wavefunctions are expressed in terms of exceptional orthogonal generalized Hermite polynomials, defined in terms of the three different types of $X_m$-Laguerre exceptional orthogonal polynomials. Furthermore, the extended Dunkl oscillator Hamiltonians are shown to be expressible in terms of some extended Dunkl derivatives and some anharmonic oscillator potentials. | 翻訳日:2023-04-13 15:03:56 公開日:2023-04-12 |
# Dense RetrievalのFew-Shot能力の再考 Rethinking Dense Retrieval's Few-Shot Ability ( http://arxiv.org/abs/2304.05845v1 ) ライセンス: Link先を確認 | Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan Liu, Deiming Ye and Jie Bao | (参考訳) Few-shot dense retrieval(DR)は,少数のサンプルを学習することで,新たな検索シナリオへ効果的に一般化することを目的としている。
私たちのコードとデータはhttps://github.com/OpenMatch/ANCE-Teleでオープンソース化される予定である。 Few-shot dense retrieval (DR) aims to effectively generalize to novel search scenarios by learning a few samples. Despite its importance, there is little study on specialized datasets and standardized evaluation protocols. As a result, current methods often resort to random sampling from supervised datasets to create "few-data" setups and employ inconsistent training strategies during evaluations, which poses a challenge in accurately comparing recent progress. In this paper, we propose a customized FewDR dataset and a unified evaluation benchmark. Specifically, FewDR employs class-wise sampling to establish a standardized "few-shot" setting with finely-defined classes, reducing variability in multiple sampling rounds. Moreover, the dataset is disjointed into base and novel classes, allowing DR models to be continuously trained on ample data from base classes and a few samples in novel classes. This benchmark eliminates the risk of novel class leakage, providing a reliable estimation of the DR model's few-shot ability. Our extensive empirical results reveal that current state-of-the-art DR models still face challenges in the standard few-shot scene. Our code and data will be open-sourced at https://github.com/OpenMatch/ANCE-Tele. | 翻訳日:2023-04-13 15:03:43 公開日:2023-04-12 |
# 一般のポインタ基底における振幅減衰への量子ゼノ効果の適用 Quantum Zeno Effect applied to amplitude damping on a general pointer basis ( http://arxiv.org/abs/2304.05843v1 ) ライセンス: Link先を確認 | Guilherme Zambon, Diogo O. Soares-Pinto | (参考訳) 量子システムにおける情報保存のためのプロトコルの開発は、現実的な量子計算を実装するための中心的な課題である。
ここでは, 従来の仮定から一歩離れて, 物理キュービットの古典ビットを単一の計算ステップで保存する確率を解析し, キュービットが自由に進化する場合と繰り返し測定される場合の両方について解析する。
この最後の結果は、情報が連続的に環境に失われているとき、実際の量子計算を行うには、情報を何らかの形でシステムに戻さなければならないことを示し、オープン量子システムにおけるノイズを減らすことを目的としたあらゆる技術の中核的な特徴として強調する。 Developing protocols for preserving information in quantum systems is a central quest for implementing realistic quantum computation. However, many of the most promising approaches to this problem rely on hypotheses that may not reflect practical physical scenarios, like knowing the exact dynamics of the qubit-environment system or being able to store an informational qubit in multiple physical qubits. Here, we step away from these usual assumptions and analyze the probability of successfully storing a classical bit of information on a physical qubit during a single computational step, both for the case in which the qubit evolves freely and also when it is subject to a sequence of repeated measurements. The setup consists of a qubit coupled to a heat bath at finite temperature, whose dynamics is given by a generalized amplitude damping channel in a pointer basis that does not necessarily coincide with the computational basis of the qubit. We first show that requiring the dynamics to be Markovian implies an exponential decay of the pointer basis' populations. Then, we obtain the success probability as function of time and angle $\theta_0$ between the initial state of the qubit and the ground state of the pointer basis. Finally, we calculate these probabilities for the Zeno effective dynamics and show that they are never larger than those for the free evolution, implying that a repeated measurements protocol cannot improve the probability of a successful storage in our model. This last result indicates that to perform realistic quantum computation, when information is being continuously lost to the environment, the information must be somehow driven back into the system, highlighting this as the core feature of any technique that aims at reducing noise in open quantum systems. | 翻訳日:2023-04-13 15:03:23 公開日:2023-04-12 |
# 教師なしビデオ異常検出のための拡散モデル探索 Exploring Diffusion Models for Unsupervised Video Anomaly Detection ( http://arxiv.org/abs/2304.05841v1 ) ライセンス: Link先を確認 | Anil Osman Tur and Nicola Dall'Asen and Cigdem Beyan and Elisa Ricci | (参考訳) 本稿では,ビデオ異常検出(VAD)における拡散モデルの性能について,データアノテーションを使用しない最も困難なシナリオについても検討する。 希薄で、多様で、文脈的であり、しばしば曖昧であるので、異常事象を正確に検出することは非常に野心的な作業である。 この目的のために,情報豊富な時空間データのみに依存し,高い再構成誤差を生かした拡散モデルの再構成能力を用いて異常を判定する。 2つの大規模ビデオ異常検出データセットを用いて行った実験は、提案手法の |