Related papers: Big data searching using words

URL: http://arxiv.org/abs/2409.15346v1
Date: Tue, 10 Sep 2024 13:46:14 GMT
Title: Big data searching using words
Authors: Santanu Acharjee, Ripunjoy Choudhury
Abstract summary: We introduce some fundamental ideas related to the neighborhood structure of words in data searching. We also introduce big data primal in big data searching and discuss the application of neighborhood structures in detecting anomalies in data searching.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Big data analytics is one of the most promising areas of new research and development in computer science, enterprises, e-commerce, and defense. For many organizations, big data is regarded as one of their most important strategic assets. This explosive growth has made it necessary to develop effective techniques for examining and analyzing big data from a mathematical perspective. Among various methods of analyzing big data, topological data analysis (TDA) is now considered one of the useful tools. However, there is no fundamental concept related to topological structure in big data. In this paper, we introduce some fundamental ideas related to the neighborhood structure of words in data searching, which can be extended to form important topological structures of big data in the future. Additionally, we introduce big data primal in big data searching and discuss the application of neighborhood structures in detecting anomalies in data searching using the Jaccard similarity coefficient.
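Illustration: the abstract names the Jaccard similarity coefficient, which for two sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal sketch of that coefficient applied to bags of words. It does not reproduce the paper's neighborhood structures or its anomaly-detection procedure; the function name `jaccard`, the sample strings, and the convention for two empty sets are assumptions made only for this example.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two iterables of words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty word sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical search query and retrieved text, split into words.
query = "big data searching using words".split()
result = "searching big data with words".split()

# Four shared words out of six distinct words overall.
score = jaccard(query, result)
print(f"Jaccard similarity: {score:.3f}")
```

A low score between a query and a retrieved item indicates little word overlap; any threshold for calling such a result anomalous would have to come from the paper or the application, not from this sketch.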

Related papers

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains. BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution. Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
Exploiting Formal Concept Analysis for Data Modeling in Data Lakes [0.29998889086656577]
This paper introduces a practical data visualization and analysis approach rooted in Formal Concept Analysis (FCA). We represent data structures as objects, analyze the concept lattice, and present two strategies, top-down and bottom-up, to unify these structures and establish a common schema. We achieve a complete coverage of 80 percent of data structures with only 34 distinct field names.
arXiv Detail & Related papers (2024-08-11T13:58:31Z)
Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning. We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [104.16648246740543]
We propose an efficient data collection method based on large language models. The method bootstraps seed information through a large language model and retrieves related data from public corpora. It not only collects knowledge-related data for specific domains but unearths the data with potential reasoning procedures.
arXiv Detail & Related papers (2024-01-26T03:38:23Z)
Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook [95.32949323258251]
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications. Recent advances in large language and other foundational models have spurred increased use in time series and spatio-temporal data mining.
arXiv Detail & Related papers (2023-10-16T09:06:00Z)
Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates. Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning. However, the promising results achieved on current public datasets may not be applicable to practical scenarios. We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
Big Data and Analytics Implementation in Tertiary Institutions to Predict Students Performance in Nigeria [0.0]
The term Big Data has been coined to refer to the gargantuan bulk of data that cannot be dealt with by traditional data-handling techniques. This paper explores the attributes of big data that are relevant to educational institutions. It investigates the factors influencing the adoption of big data and analytics in learning institutions.
arXiv Detail & Related papers (2022-07-29T13:52:24Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Occams Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
A new trend towards analytic complexity represents a severe challenge for the principle of parsimony, or Occam's razor, in science. Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time. The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
Big Issues for Big Data: challenges for critical spatial data analytics [0.0]
We focus on a set of challenges underlying the collection and analysis of big data. We consider the issues related to inference when working with usually biased big data. In particular, we consider the need to place individual data science studies in wider social and economic contexts.
arXiv Detail & Related papers (2020-07-22T09:11:56Z)
Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. The main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, and an improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.