A Review of Machine Learning Techniques in Imbalanced Data and Future
Trends
- URL: http://arxiv.org/abs/2310.07917v1
- Date: Wed, 11 Oct 2023 22:14:17 GMT
- Authors: Elaheh Jafarigol, Theodore Trafalis
- Abstract summary: We have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers.
This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For over two decades, detecting rare events has been a challenging task among
researchers in the data mining and machine learning domain. Real-life problems
inspire researchers to navigate and further improve data processing and
algorithmic approaches to achieve effective and computationally efficient
methods for imbalanced learning. In this paper, we have collected and reviewed
258 peer-reviewed papers from archival journals and conference papers in an
attempt to provide an in-depth review of various approaches in imbalanced
learning from technical and application perspectives. This work aims to provide
a structured review of methods used to address the problem of imbalanced data
in various domains and create a general guideline for researchers in academia
or industry who want to dive into the broad field of machine learning using
large-scale imbalanced data.
Related papers
- Interpretable Machine Learning for Discovery: Statistical Challenges &
Opportunities [1.2891210250935146]
We discuss and review the field of interpretable machine learning.
We outline the types of discoveries that can be made using Interpretable Machine Learning.
We focus on the grand challenge of how to validate these discoveries in a data-driven manner.
arXiv Detail & Related papers (2023-08-02T23:57:31Z)
- Statistical Foundation Behind Machine Learning and Its Impact on
Computer Vision [8.974457198386414]
This paper revisits the principle of uniform convergence in statistical learning, discusses how it acts as the foundation behind machine learning, and attempts to gain a better understanding of the essential problem that current deep learning algorithms are solving.
Using computer vision as an example domain in machine learning, the discussion shows that recent research trends in leveraging increasingly large-scale data to perform pre-training for representation learning are largely to reduce the discrepancy between a practically tractable empirical loss and its ultimately desired but intractable expected loss.
arXiv Detail & Related papers (2022-09-06T17:59:04Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Lessons Learned from Data-Driven Building Control Experiments:
Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement
Learning [0.0]
This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques.
The techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applications.
arXiv Detail & Related papers (2022-05-31T11:40:22Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or
Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- A Survey on Machine Learning Techniques for Source Code Analysis [14.129976741300029]
We aim to summarize the current knowledge in the area of applied machine learning for source code analysis.
To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021.
arXiv Detail & Related papers (2021-10-18T20:13:38Z)
- Human-Robot Collaboration and Machine Learning: A Systematic Review of
Recent Research [69.48907856390834]
Human-robot collaboration (HRC) is the approach that explores the interaction between a human and a robot.
This paper proposes a thorough literature review of the use of machine learning techniques in the context of HRC.
arXiv Detail & Related papers (2021-10-14T15:14:33Z)
- Scaling up Search Engine Audits: Practical Insights for Algorithm
Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)
- Knowledge as Invariance -- History and Perspectives of
Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point.
Research interests are shifting away from increasing the performance of highly parameterized models on exceedingly specific tasks.
This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z)
- Bayesian active learning for production, a systematic study and a
reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.