Related papers: A Recurrent Neural Network Approach to the Answering Machine Detection Problem

URL: http://arxiv.org/abs/2410.08235v1
Date: Mon, 7 Oct 2024 21:28:09 GMT
Title: A Recurrent Neural Network Approach to the Answering Machine Detection Problem
Authors: Kemal Altwlkany, Sead Delalic, Elmedin Selmanovic, Adis Alihodzic, Ivica Lovric
Abstract summary: This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and reveal that an accuracy exceeding 98% can be achieved.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the field of telecommunications and cloud communications, accurately detecting in real time whether a human or an answering machine has answered an outbound call is of paramount importance. This problem is particularly significant during campaigns, where precise identification of the answering party improves service quality and efficiency and reduces costs. Despite its significance, the problem remains inadequately explored in the existing literature. This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction. The YAMNet architecture facilitates the training of a recurrent-based classifier, enabling real-time processing of audio streams rather than fixed-length recordings. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and show that an accuracy exceeding 98% can be achieved by integrating a silence detection algorithm, such as the one provided by FFmpeg.
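The abstract does not spell out the classifier, so the following is only a minimal sketch of what "a recurrent-based classifier processing audio streams" can look like, under stated assumptions. Each incoming frame is assumed to be an embedding vector (YAMNet produces 1024-dimensional embeddings, roughly one every half second). A single GRU cell updates its hidden state per frame and emits a running probability that an answering machine picked up. The names `StreamingGRUClassifier` and `classify_stream`, the bias-free GRU, the random placeholder weights, and the early-exit `threshold` are all hypothetical illustrations, not the authors' model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Plain-Python matrix-vector product: one dot product per row of m.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class StreamingGRUClassifier:
    """Hypothetical single-layer GRU (no biases) with a logistic output head.

    Weights are random placeholders; a real detector would load trained weights.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.wz, self.uz = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)  # update gate
        self.wr, self.ur = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)  # reset gate
        self.wh, self.uh = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)  # candidate state
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.h = [0.0] * hidden_dim

    def reset(self):
        # Start of a new call: clear the recurrent state.
        self.h = [0.0] * len(self.h)

    def step(self, x):
        """Consume one embedding frame and return P(answering machine) so far."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, self.h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, self.h))]
        rh = [ri * hi for ri, hi in zip(r, self.h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.wh, x), matvec(self.uh, rh))]
        self.h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, self.h, cand)]
        return sigmoid(sum(w * hi for w, hi in zip(self.w_out, self.h)))


def classify_stream(model, frames, threshold=0.9):
    """Decide as soon as the running probability is confident enough.

    Returns (label, frames_consumed); falls back to the last estimate at end of stream.
    """
    model.reset()
    p = 0.5
    for i, frame in enumerate(frames, start=1):
        p = model.step(frame)
        if p >= threshold:
            return "machine", i
        if p <= 1.0 - threshold:
            return "human", i
    return ("machine" if p >= 0.5 else "human"), len(frames)
```

Because the state is carried frame to frame, a decision can be made as soon as the evidence is strong, which is the practical advantage of streaming over fixed-length recordings. In a deployment, `frames` would come from the feature extractor as audio arrives; here any list of equal-length float lists works.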

Related papers

Combolutional Neural Networks [21.93943668751019]
We propose a combolutional layer: a learned-delay IIR comb filter and fused envelope detector that extracts harmonic features in the time domain. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important.
arXiv Detail & Related papers (2025-07-28T13:30:51Z)
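To make the summary of the combolutional layer concrete, here is a minimal sketch of its two named ingredients, assuming the textbook feedback comb filter y[n] = x[n] + g·y[n − D] followed by a rectify-and-smooth envelope detector. The paper learns the delay (and fuses the two stages); this sketch uses a fixed integer delay, and the function names and one-pole smoother are hypothetical choices for illustration.

```python
import math


def feedback_comb(x, delay, gain):
    """IIR (feedback) comb filter: y[n] = x[n] + gain * y[n - delay]."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if abs(gain) >= 1.0:
        raise ValueError("|gain| must be below 1 for a stable filter")
    y = []
    for n, xn in enumerate(x):
        feedback = y[n - delay] if n >= delay else 0.0
        y.append(xn + gain * feedback)
    return y


def envelope(x, smoothing=0.99):
    """Full-wave rectification followed by a one-pole low-pass smoother."""
    env, state = [], 0.0
    for xn in x:
        state = smoothing * state + (1.0 - smoothing) * abs(xn)
        env.append(state)
    return env


def comb_envelope_feature(x, delay, gain, smoothing=0.99):
    # Comb filter first (harmonic selection), then envelope (energy summary).
    return envelope(feedback_comb(x, delay, gain), smoothing)


# Usage: a tone whose period equals the delay is reinforced; one an octave lower is attenuated.
on_pitch = [math.sin(2 * math.pi * n / 8) for n in range(800)]
off_pitch = [math.sin(2 * math.pi * n / 16) for n in range(800)]
print(max(feedback_comb(on_pitch, 8, 0.5)[-100:]))   # close to 1 / (1 - 0.5) = 2
print(max(feedback_comb(off_pitch, 8, 0.5)[-100:]))  # close to 1 / (1 + 0.5), about 0.667
```

A comb with delay D has gain peaks at every multiple of the sample rate divided by D, so one filter responds to a fundamental and all its harmonics at once; this is why the design suits harmonic analysis.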
Refining music sample identification with a self-supervised graph neural network [16.73613870989583]
We propose a lightweight and scalable encoding architecture employing a Graph Neural Network within a contrastive learning framework. Our model uses only 9% of the trainable parameters of the current state-of-the-art system while achieving comparable performance, reaching a mean average precision (mAP) of 44.2%. In addition, because queries in real-world applications are often short, we benchmark our system on short queries using new fine-grained annotations for the Sample100 dataset.
arXiv Detail & Related papers (2025-06-17T16:19:21Z)
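The mAP figure quoted above is easiest to read with the metric written out. The sketch below uses the standard information-retrieval definition (average of precision@k at each rank where a relevant item is retrieved, divided by the number of relevant items, then averaged over queries). Whether the paper uses exactly this variant, or a top-K cutoff, is an assumption; the function names and toy queries are illustrative.

```python
def average_precision(ranked_ids, relevant):
    """AP for one query: mean of precision@k over the ranks k where a relevant item appears.

    Relevant items that never appear in the ranking contribute zero.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """queries: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                  # hit at rank 2: AP = 1/2
]
print(mean_average_precision(queries))
```

For these two toy queries the result is (5/6 + 1/2) / 2 = 2/3, so an mAP of 44.2% means relevant samples tend to appear well below the top of the ranking on average.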
Automatic welding detection by an intelligent tool pipe inspection [0.0]
This work provides a model, based on machine learning techniques, for weld recognition using signals obtained through an in-line inspection tool called a smart pig in oil and gas pipelines. The results show that it is possible to identify welds automatically with an efficiency between 90 and 98 percent.
arXiv Detail & Related papers (2025-03-11T15:52:28Z)
Dense Object Detection Based on De-homogenized Queries [12.33849715319161]
Dense object detection is widely used in autonomous driving, video surveillance, and other fields. Currently, detection methods based on greedy algorithms, such as non-maximum suppression (NMS), often produce many repetitive predictions or missed detections in dense scenarios. Using the end-to-end DETR (DEtection TRansformer), a type of detector that can incorporate post-processing de-duplication capabilities such as those of NMS into the network, we found that homogeneous queries in query-based detectors reduce both the network's de-duplication capability and the encoder's learning efficiency.
arXiv Detail & Related papers (2025-02-11T02:36:10Z)
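The summary above contrasts query-based detection with greedy NMS. For reference, here is a minimal sketch of the classic greedy NMS it refers to (not the paper's de-homogenized-query method): sort boxes by score, and keep each box only if its intersection-over-union with every already-kept box is at or below a threshold. Box format `(x1, y1, x2, y2)` and the 0.5 default threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Visit boxes by descending score; keep a box unless it overlaps a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping people in a crowd plus one isolated person.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 is suppressed even if it is a real object
```

The example shows the dense-scene failure mode the summary describes: boxes 0 and 1 have an IoU of about 0.68, so the lower-scoring one is dropped regardless of whether it is a duplicate or a second, genuinely distinct object. Raising the threshold keeps both but lets more true duplicates through.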
Feature Selection for Network Intrusion Detection [3.7414804164475983]
We present a novel information-theoretic method that facilitates the exclusion of non-informative features when detecting network intrusions. The proposed method is based on function approximation using a neural network, which enables a version of our approach that incorporates a recurrent layer.
arXiv Detail & Related papers (2024-11-18T14:25:55Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and a histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification. The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
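A histogram layer summarizes a window of features by how many values fall near each of several bin centers, using soft (differentiable) bin membership so that centers and widths can be trained. The sketch below assumes Gaussian radial-basis bins, exp(−(w_k (v − c_k))²), averaged over the window; this is a common way such layers are described, but treating it as this paper's exact formulation is an assumption. The function name and sample values are illustrative only.

```python
import math


def soft_histogram(values, centers, widths):
    """Differentiable histogram: bin k receives exp(-(widths[k] * (v - centers[k]))**2) from
    each value v, averaged over the window. Centers and widths would be learnable in a network."""
    if not values:
        raise ValueError("need at least one value")
    return [
        sum(math.exp(-((w * (v - c)) ** 2)) for v in values) / len(values)
        for c, w in zip(centers, widths)
    ]


# A window of feature activations (made-up numbers) binned around three centers.
window = [0.1, 0.0, 0.9, 1.1, 1.0]
print(soft_histogram(window, centers=[0.0, 0.5, 1.0], widths=[5.0, 5.0, 5.0]))
```

Unlike a mean or max, the resulting vector keeps information about how feature values are distributed within the window, which is the kind of statistical context the summary refers to.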
SeqNet: An Efficient Neural Network for Automatic Malware Detection [5.365259648024797]
We propose a lightweight malware detection model called SeqNet, which can be trained on raw binaries at high speed with low memory requirements. By avoiding contextual confusion and reducing semantic loss, SeqNet maintains detection accuracy while reducing the number of parameters to only 136K.
arXiv Detail & Related papers (2022-05-08T12:31:35Z)
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
Robust and Interpretable Temporal Convolution Network for Event Detection in Lung Sound Recordings [37.0780415938284]
We propose a lightweight, yet robust, and completely interpretable framework for lung sound event detection. We use a multi-branch TCN architecture and exploit a novel fusion strategy to combine the resultant features from these branches. Our analysis of different feature fusion strategies shows that the proposed feature concatenation method leads to better suppression of non-informative features.
arXiv Detail & Related papers (2021-06-30T06:36:22Z)
SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples. We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions. In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
Lightweight Convolutional Neural Network with Gaussian-based Grasping Representation for Robotic Grasping Detection [4.683939045230724]
Current object detectors struggle to strike a balance between high accuracy and fast inference speed. We present an efficient and robust fully convolutional neural network model to perform robotic grasping pose estimation. The network is an order of magnitude smaller than other leading algorithms.
arXiv Detail & Related papers (2021-01-25T16:36:53Z)
BiDet: An Efficient Binarized Object Detector [96.19708396510894]
We propose a binarized neural network learning method called BiDet for efficient object detection. Our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal. Our method outperforms the state-of-the-art binary neural networks by a sizable margin.
arXiv Detail & Related papers (2020-03-09T08:16:16Z)
Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions. Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks. This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the degradation in system quality for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.