Related papers: Mapping the Internet: Modelling Entity Interactions in Complex Heterogeneous Networks

URL: http://arxiv.org/abs/2104.09650v1
Date: Mon, 19 Apr 2021 21:32:44 GMT
Title: Mapping the Internet: Modelling Entity Interactions in Complex Heterogeneous Networks
Authors: Šimon Mandlík and Tomáš Pevný
Abstract summary: We propose a versatile, unified framework called HMill for sample representation, model definition and training. We show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. We solve three different problems from the cybersecurity domain using the framework.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Even though machine learning algorithms already play a significant role in data science, many current methods pose unrealistic assumptions on input data. The application of such methods is difficult due to incompatible data formats, or heterogeneous, hierarchical or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework called 'HMill' for sample representation, model definition and training. We review in depth a multi-instance paradigm for machine learning that the framework builds on and extends. To theoretically justify the design of key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion on technicalities and performance improvements in our implementation, which is published for download under the MIT License. The main asset of the framework is its flexibility, which makes modelling of diverse real-world data sources with the same tool possible. In addition to the standard setting in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we solve three different problems from the cybersecurity domain using the framework. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last provided example is a task of domain blacklist extension through modelling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.

Related papers

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism [67.56918651825056]
We propose a new decoder architecture with the parallel Multi-time Inquiries (MI) mechanism. Our MI based model, MI-DETR, outperforms all existing DETR-like models on COCO benchmark. A series of diagnostic and visualization experiments demonstrate the effectiveness, rationality, and interpretability of MI.
arXiv Detail & Related papers (2025-03-03T12:19:06Z)
Generic Multi-modal Representation Learning for Network Traffic Analysis [6.372999570085887]
Network traffic analysis is fundamental for network management, troubleshooting, and security. We propose a flexible Multi-modal Autoencoder (MAE) pipeline that can solve different use cases. We argue that the MAE architecture is generic and can be used to learn representations useful in multiple scenarios.
arXiv Detail & Related papers (2024-05-04T12:24:29Z)
An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources. We propose a data processing framework that integrates a Processing Module and an Analyzing Module. The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
Hawk: An Industrial-strength Multi-label Document Classifier [0.0]
The paper describes the significance of these problems in detail and proposes a unique neural network architecture that addresses the above problems. A hydranet-like architecture is designed to have granular control over and improve the modularity, coupled with a weighted loss driving task-specific heads. The experimental results reveal that the proposed model outperforms the existing methods by a substantial margin.
arXiv Detail & Related papers (2023-01-15T09:52:18Z)
FV-UPatches: Enhancing Universality in Finger Vein Recognition [0.6299766708197883]
We propose a universal learning-based framework, which achieves generalization while training with limited data. The proposed framework shows application potential in other vein-based biometric recognition as well.
arXiv Detail & Related papers (2022-06-02T14:20:22Z)
Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition. We introduce the object query for segmentation and the attribute query for attribute prediction. For attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder. We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets. We also show that its unsupervised object discovery performance is competitive with a SlotAttention model on two datasets, and that it manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
Explaining Representation by Mutual Information [0.0]
We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components. Using two lightweight modules integrated into architectures such as CNNs and Transformers, we estimate these components and demonstrate their interpretive power.
arXiv Detail & Related papers (2021-03-28T12:26:56Z)
Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn. We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
Towards a Flexible Embedding Learning Framework [15.604564543883122]
We propose an embedding learning framework that is flexible in terms of the relationships that can be embedded into the learned representations. A sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings. Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms the existing state-of-the-art approaches in various data mining tasks.
arXiv Detail & Related papers (2020-09-23T08:00:56Z)
iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels. It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.