Data Augmentation for Deep Candlestick Learner
- URL: http://arxiv.org/abs/2005.06731v2
- Date: Fri, 29 May 2020 06:02:46 GMT
- Title: Data Augmentation for Deep Candlestick Learner
- Authors: Chia-Ying Tsao, Jun-Hao Chen, Samuel Yen-Chi Chen, and Yun-Cheng Tsai
- Abstract summary: We propose a Modified Local Search Attack Sampling method to augment candlestick data.
Our results show that the proposed method can generate high-quality data that are hard for humans to distinguish from real data.
- Score: 2.104922050913737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To successfully build a deep learning model, a large amount of
labeled data is needed. However, labeled data are hard to collect in many use
cases. To tackle this problem, a number of data augmentation methods have been
introduced recently and have demonstrated successful results in computer vision,
natural language processing, and other fields. For financial trading data, to
the best of our knowledge, a successful data augmentation framework has rarely
been studied. Here we propose a Modified Local Search Attack Sampling method to
augment candlestick data, a very important tool for professional traders. Our
results show that the proposed method can generate high-quality data that are
hard for humans to distinguish from real data, opening a new way for the finance
community to employ existing machine learning techniques even when the dataset is small.
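For illustration only, candlestick (OHLC) augmentation can be sketched as small random perturbations that preserve each candle's validity constraint (high is the maximum and low is the minimum of the four prices). This is a generic, hypothetical jittering example, not the paper's Modified Local Search Attack Sampling method; the function name and noise scale are assumptions:

```python
import numpy as np

def perturb_candles(ohlc, scale=0.002, rng=None):
    """Jitter OHLC candles with small multiplicative Gaussian noise,
    then re-impose the OHLC validity constraint.

    ohlc: array of shape (n, 4) with columns [open, high, low, close].
    scale: relative noise level (0.002 = 0.2%).
    """
    rng = np.random.default_rng(rng)
    noisy = ohlc * (1.0 + rng.normal(0.0, scale, ohlc.shape))
    o, h, l, c = noisy.T
    # High must be the elementwise max of all four prices, low the min.
    high = np.maximum.reduce([o, h, l, c])
    low = np.minimum.reduce([o, h, l, c])
    return np.stack([o, high, low, c], axis=1)

candles = np.array([[100.0, 101.5, 99.2, 100.8],
                    [100.8, 102.0, 100.1, 101.4]])
augmented = perturb_candles(candles, rng=0)
```

Each call produces a slightly different but still well-formed candle series, which is the basic idea behind enlarging a small financial dataset with synthetic variants.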
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
- Investigating the Impact of Semi-Supervised Methods with Data Augmentation on Offensive Language Detection in Romanian Language [2.2823100315094624]
Offensive language detection is a crucial task in today's digital landscape.
Building robust offensive language detection models requires large amounts of labeled data.
Semi-supervised learning offers a feasible solution by utilizing labeled and unlabeled data.
arXiv Detail & Related papers (2024-07-29T15:02:51Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- Data Augmentation for Neural NLP [0.0]
Data augmentation is a low-cost approach for tackling data scarcity.
This paper gives an overview of current state-of-the-art data augmentation methods used for natural language processing.
arXiv Detail & Related papers (2023-02-22T14:47:15Z)
- Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps [0.0]
Training on large datasets often requires excessive system resources and an infeasible amount of time.
For supervised learning, large datasets require more time for manually labeling samples.
We propose a method of curating smaller datasets with comparable out-of-distribution model accuracy after an initial training session.
arXiv Detail & Related papers (2022-08-22T03:30:18Z)
- Fix your Models by Fixing your Datasets [0.6058427379240697]
Current machine learning tools lack streamlined processes for improving the data quality.
We introduce a systematic framework for finding noisy or mislabelled samples in the dataset.
We demonstrate the efficacy of our framework on public as well as private enterprise datasets of two Fortune 500 companies.
arXiv Detail & Related papers (2021-12-15T02:41:50Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
- Evaluating data augmentation for financial time series classification [85.38479579398525]
We evaluate several augmentation methods applied to stocks datasets using two state-of-the-art deep learning models.
For a relatively small dataset, augmentation methods achieve up to a 400% improvement in risk-adjusted return performance.
For a larger stock dataset, augmentation methods achieve up to a 40% improvement.
arXiv Detail & Related papers (2020-10-28T17:53:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.