"If we didn't solve small data in the past, how can we solve Big Data
today?"
- URL: http://arxiv.org/abs/2111.04442v1
- Date: Mon, 8 Nov 2021 16:31:01 GMT
- Title: "If we didn't solve small data in the past, how can we solve Big Data
today?"
- Authors: Akash Ravi
- Abstract summary: We aim to research terms such as 'small' and 'big' data, understand their attributes, and look at ways in which they can add value.
Based on the research, it can be inferred that, regardless of how small data might have been used, organizations can still leverage big data with the right technology and business vision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data is a critical aspect of the world we live in. With systems producing and
consuming vast amounts of data, it is essential for businesses to digitally
transform and be equipped to derive the most value out of data. Data analytics
techniques can be used to augment strategic decision-making. While this overall
objective of data analytics remains fairly constant, the data itself can be
available in numerous forms and can be categorized under various contexts. In
this paper, we aim to research terms such as 'small' and 'big' data, understand
their attributes, and look at ways in which they can add value. Specifically,
the paper probes into the question "If we didn't solve small data in the past,
how can we solve Big Data today?". Based on the research, it can be inferred
that, regardless of how small data might have been used, organizations can
still leverage big data with the right technology and business vision.
Related papers
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Big Data and Analytics Implementation in Tertiary Institutions to Predict Students Performance in Nigeria [0.0]
The term Big Data has been coined to refer to the gargantuan bulk of data that cannot be dealt with by traditional data-handling techniques.
This paper explores the attributes of big data that are relevant to educational institutions.
It investigates the factors influencing the adoption of big data and analytics in learning institutions.
arXiv Detail & Related papers (2022-07-29T13:52:24Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
- Occam's Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
The new trend towards analytic complexity poses a severe challenge to the principle of parsimony, or Occam's Razor, in science.
Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time.
The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
- From Data to Knowledge to Action: A Global Enabler for the 21st Century [26.32590947516587]
A confluence of advances in the computer and mathematical sciences has unleashed unprecedented capabilities for enabling true evidence-based decision making.
These capabilities are making possible the large-scale capture of data and the transformation of that data into insights and recommendations.
The shift of commerce, science, education, art, and entertainment to the web makes available unprecedented quantities of structured and unstructured databases about human activities.
arXiv Detail & Related papers (2020-07-31T19:19:42Z)
- Data Augmentation for Deep Candlestick Learner [2.104922050913737]
We propose a Modified Local Search Attack Sampling method to augment the candlestick data.
Our results show that the proposed method can generate high-quality data that are hard for humans to distinguish from real data.
arXiv Detail & Related papers (2020-05-14T06:02:31Z)
- Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects.
The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, and improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.