Detecting Anomalies Through Contrast in Heterogeneous Data
- URL: http://arxiv.org/abs/2104.01156v1
- Date: Fri, 2 Apr 2021 17:21:12 GMT
- Title: Detecting Anomalies Through Contrast in Heterogeneous Data
- Authors: Debanjan Datta, Sathappan Muthiah and Naren Ramakrishnan
- Abstract summary: We propose the Contrastive Learning based Heterogeneous Anomaly Detector to address shortcomings of prior models.
Our model uses an asymmetric autoencoder that can effectively handle large arity categorical variables.
We provide a qualitative study to showcase the effectiveness of our model in detecting anomalies in timber trade.
- Score: 21.56932906044264
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Detecting anomalies has been a fundamental approach in detecting potentially
fraudulent activities. Tasked with detecting illegal timber trade, which
threatens ecosystems and economies and is associated with other illegal
activities, we formulate our problem as one of anomaly detection. Among other
challenges, annotations that could assist in building automated systems to
detect fraudulent transactions are unavailable for our large-scale trade data
with heterogeneous features (categorical and continuous). Modelling the
task as unsupervised anomaly detection, we propose a novel model, the
Contrastive Learning based Heterogeneous Anomaly Detector, to address
shortcomings of prior models. Our model uses an asymmetric autoencoder that can
effectively handle large-arity categorical variables, avoids assumptions about
the structure of data in the low-dimensional latent space, and is robust to
changes in hyper-parameters. The likelihood of data is approximated through an
estimator network, which is jointly trained with the autoencoder using negative
sampling. Further, the details and intuition for an effective negative sample
generation approach for heterogeneous data are outlined. We provide a qualitative study to
showcase the effectiveness of our model in detecting anomalies in timber trade.
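
The abstract does not spell out how negative samples are generated or which loss the estimator network optimizes. As a rough illustration only, the sketch below assumes one common recipe: a negative is built by swapping a few categorical fields for other values from the same domain and adding Gaussian noise to the continuous fields, and the estimator is scored with a standard negative-sampling (logistic) loss. The names `make_negative` and `negative_sampling_loss`, the field names, and all parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def make_negative(record, cat_domains, num_fields, n_perturb=1, noise=0.5, rng=None):
    """Return a perturbed copy of `record` to use as a negative sample.

    Assumed scheme (not the paper's): swap `n_perturb` categorical fields
    and add Gaussian noise to every continuous field.
    """
    rng = rng or random.Random()
    neg = dict(record)  # copy so the observed record is left untouched
    # Only fields with at least two possible values can be swapped.
    swappable = [f for f in cat_domains if len(cat_domains[f]) > 1]
    for field in rng.sample(swappable, min(n_perturb, len(swappable))):
        alternatives = [v for v in cat_domains[field] if v != record[field]]
        neg[field] = rng.choice(alternatives)
    for field in num_fields:
        neg[field] = record[field] + rng.gauss(0.0, noise)
    return neg


def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def negative_sampling_loss(pos_score, neg_scores):
    """Logistic negative-sampling loss on raw estimator scores.

    Equals -log(sigmoid(pos_score)) - sum(log(sigmoid(-s)) for s in neg_scores),
    so it is low when the observed record scores high and negatives score low.
    """
    return _softplus(-pos_score) + sum(_softplus(s) for s in neg_scores)
```

In a setup like this, each observed transaction would be paired with several outputs of `make_negative`, and the estimator scores for the pair would be passed to `negative_sampling_loss` during joint training with the autoencoder.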