Related papers: Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset

Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset

URL: http://arxiv.org/abs/2105.00574v1
Date: Sun, 2 May 2021 23:24:25 GMT
Title: Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset
Authors: W. Y. Ayele
Abstract summary: This paper proposes a reusable model to generate ideas, CRISP-DM, for Idea Mining (CRISP-IM) The CRISP-IM facilitates idea generation, through the use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and subsequent statistical analysis on a dataset of scholarly articles.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data mining project managers can benefit from using standard data mining process models. The benefits of using standard process models for data mining, such as the de facto and the most popular, Cross-Industry-Standard-Process model for Data Mining (CRISP-DM) are reduced cost and time. Also, standard models facilitate knowledge transfer, reuse of best practices, and minimize knowledge requirements. On the other hand, to unlock the potential of ever-growing textual data such as publications, patents, social media data, and documents of various forms, digital innovation is increasingly needed. Furthermore, the introduction of cutting-edge machine learning tools and techniques enable the elicitation of ideas. The processing of unstructured textual data to generate new and useful ideas is referred to as idea mining. Existing literature about idea mining merely overlooks the utilization of standard data mining process models. Therefore, the purpose of this paper is to propose a reusable model to generate ideas, CRISP-DM, for Idea Mining (CRISP-IM). The design and development of the CRISP-IM are done following the design science approach. The CRISP-IM facilitates idea generation, through the use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and subsequent statistical analysis on a dataset of scholarly articles. The adapted CRISP-IM can be used to guide the process of identifying trends using scholarly literature datasets or temporally organized patent or any other textual dataset of any domain to elicit ideas. The ex-post evaluation of the CRISP-IM is left for future study.

Related papers

Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning [88.68573198200698]
We introduce ExploreToM, the first framework to allow large-scale generation of diverse and challenging theory of mind data. Our approach leverages an A* search over a custom domain-specific language to produce complex story structures and novel, diverse, yet plausible scenarios. Our evaluation reveals that state-of-the-art LLMs, such as Llama-3.1-70B and GPT-4o, show accuracies as low as 0% and 9% on ExploreToM-generated data.
arXiv Detail & Related papers (2024-12-12T21:29:00Z)
A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources. We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
A Gentle Introduction and Tutorial on Deep Generative Models in Transportation Research [21.66278922813198]
Deep Generative Models (DGMs) have rapidly advanced in recent years, becoming essential tools in various fields. This paper offers a comprehensive introduction and tutorial on DGMs, with a focus on their applications in transportation. It begins with an overview of generative models, followed by detailed explanations of fundamental models, a systematic review of the literature, and practical tutorial code to aid implementation.
arXiv Detail & Related papers (2024-10-09T17:11:22Z)
Dataset Regeneration for Sequential Recommendation [69.93516846106701]
We propose a data-centric paradigm for developing an ideal training dataset using a model-agnostic dataset regeneration framework called DR4SR. To demonstrate the effectiveness of the data-centric paradigm, we integrate our framework with various model-centric methods and observe significant performance improvements across four widely adopted datasets.
arXiv Detail & Related papers (2024-05-28T03:45:34Z)
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys) It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
A toolbox for idea generation and evaluation: Machine learning, data-driven, and contest-driven approaches to support idea generation [0.0]
This thesis includes a list of data-driven and machine learning techniques with corresponding data sources and models to support idea generation. The results include two models, one method and one framework, to better support data-driven and contest- driven idea generation. Human-centred AI is a promising area of research that can contribute to the artefacts' further development and promote creativity.
arXiv Detail & Related papers (2022-05-19T20:28:49Z)
Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases. REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization. REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
T-METASET: Task-Aware Generation of Metamaterial Datasets by Diversity-Based Active Learning [14.668178146934588]
We propose t-METASET: an intelligent data acquisition framework for task-aware dataset generation. We validate the proposed framework in three hypothetical deployment scenarios, which encompass general use, task-aware use, and tailorable use.
arXiv Detail & Related papers (2022-02-21T22:46:49Z)
A Systematic Literature Review about Idea Mining: The Use of Machine-driven Analytics to Generate Ideas [0.0]
This study focuses on state-of-the-art machine-driven analytics for idea generation and data sources. A systematic literature review is conducted to identify relevant scholarly literature from IEEE, Scopus, Web of Science and Google Scholar. The results indicate that idea generation through machine-driven analytics applies text mining, information retrieval (IR), artificial intelligence (AI), deep learning, machine learning, statistical techniques, natural language processing (NLP), NLP-based morphological analysis, network analysis, and bibliometric to support idea generation.
arXiv Detail & Related papers (2022-01-30T21:46:21Z)
RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training [19.987304448524043]
We propose a multi-task self-supervised learning-based researcher data pre-training model named RPT. We divide the researchers' data into semantic document sets and community graph. We propose three self-supervised learning objectives to train the whole model.
arXiv Detail & Related papers (2021-10-08T03:42:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.