Mapping the Internet: Modelling Entity Interactions in Complex
Heterogeneous Networks
- URL: http://arxiv.org/abs/2104.09650v1
- Date: Mon, 19 Apr 2021 21:32:44 GMT
- Title: Mapping the Internet: Modelling Entity Interactions in Complex
Heterogeneous Networks
- Authors: Šimon Mandlík and Tomáš Pevný
- Abstract summary: We propose a versatile, unified framework called HMill for sample representation, model definition and training.
We show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework.
We solve three different problems from the cybersecurity domain using the framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even though machine learning algorithms already play a significant role in
data science, many current methods pose unrealistic assumptions on input data.
The application of such methods is difficult due to incompatible data formats,
or heterogeneous, hierarchical or entirely missing data fragments in the
dataset. As a solution, we propose a versatile, unified framework called
'HMill' for sample representation, model definition and training. We review in
depth a multi-instance paradigm for machine learning that the framework builds
on and extends. To theoretically justify the design of key components of HMill,
we show an extension of the universal approximation theorem to the set of all
functions realized by models implemented in the framework. The text also
contains a detailed discussion on technicalities and performance improvements
in our implementation, which is published for download under the MIT License.
The main asset of the framework is its flexibility, which makes it possible to
model diverse real-world data sources with the same tool. In addition to
the standard setting in which a set of attributes is observed for each object
individually, we explain how message-passing inference in graphs that represent
whole systems of objects can be implemented in the framework. To support our
claims, we solve three different problems from the cybersecurity domain using
the framework. The first use case concerns IoT device identification from raw
network observations. In the second problem, we study how malicious binary
files can be classified using a snapshot of the operating system represented as
a directed graph. The last example is the task of extending a domain blacklist
by modelling interactions between entities in the network. In
all three problems, the solution based on the proposed framework achieves
performance comparable to specialized approaches.
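The multi-instance paradigm the abstract refers to treats a sample as a bag (an unordered collection) of instances, and bags may be nested to represent hierarchical data. The following is a minimal, weight-free sketch of that idea, not HMill's actual API: the names `mean_max_pool` and `embed` are hypothetical, and a real model would apply learned transformations before and after each aggregation step. The sketch also assumes that all instances in a bag have the same structure.

```python
from typing import List, Sequence

Vector = List[float]


def mean_max_pool(instances: Sequence[Vector]) -> Vector:
    """Aggregate a bag of equally sized vectors into a single vector.

    Concatenates the element-wise mean and the element-wise maximum.
    Both are permutation-invariant, so the order of instances is irrelevant.
    """
    if not instances:
        raise ValueError("cannot aggregate an empty bag in this sketch")
    dim = len(instances[0])
    means = [sum(v[i] for v in instances) / len(instances) for i in range(dim)]
    maxes = [max(v[i] for v in instances) for i in range(dim)]
    return means + maxes


def embed(sample) -> Vector:
    """Recursively embed a nested sample (hypothetical representation).

    A leaf is a flat list of numbers; a bag is a list of child samples.
    """
    if all(isinstance(x, (int, float)) for x in sample):
        return [float(x) for x in sample]  # leaf: use the features as they are
    return mean_max_pool([embed(child) for child in sample])


# A sample made of two inner bags, holding two and one instances respectively.
sample = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
print(embed(sample))
```

Reordering the instances inside any bag leaves the result unchanged, which is the property that allows variable-sized, unordered collections to be mapped to fixed-size vectors.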
Related papers
- Generic Multi-modal Representation Learning for Network Traffic Analysis [6.372999570085887]
Network traffic analysis is fundamental for network management, troubleshooting, and security.
We propose a flexible Multi-modal Autoencoder (MAE) pipeline that can solve different use cases.
We argue that the MAE architecture is generic and can be used to learn representations useful in multiple scenarios.
arXiv Detail & Related papers (2024-05-04T12:24:29Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Hawk: An Industrial-strength Multi-label Document Classifier [0.0]
The paper describes the significance of these problems in detail and proposes a unique neural network architecture that addresses the above problems.
A hydranet-like architecture is designed to have granular control over and improve the modularity, coupled with a weighted loss driving task-specific heads.
The experimental results reveal that the proposed model outperforms the existing methods by a substantial margin.
arXiv Detail & Related papers (2023-01-15T09:52:18Z)
- FV-UPatches: Enhancing Universality in Finger Vein Recognition [0.6299766708197883]
We propose a universal learning-based framework, which achieves generalization while training with limited data.
The proposed framework shows application potential in other vein-based biometric recognition as well.
arXiv Detail & Related papers (2022-06-02T14:20:22Z)
- Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition.
We introduce the object query for segmentation and the attribute query for attribute prediction.
For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
- Towards a Flexible Embedding Learning Framework [15.604564543883122]
We propose an embedding learning framework that is flexible in terms of the relationships that can be embedded into the learned representations.
A sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings.
Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms the existing state-of-the-art approaches in various data mining tasks.
arXiv Detail & Related papers (2020-09-23T08:00:56Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.