Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating
Ideas Using a Textual Dataset
- URL: http://arxiv.org/abs/2105.00574v1
- Date: Sun, 2 May 2021 23:24:25 GMT
- Title: Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating
Ideas Using a Textual Dataset
- Authors: W. Y. Ayele
- Abstract summary: This paper proposes a reusable model for generating ideas, CRISP-DM for Idea Mining (CRISP-IM).
The CRISP-IM facilitates idea generation through the use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and subsequent statistical analysis on a dataset of scholarly articles.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data mining project managers can benefit from using standard data mining
process models. The benefits of using a standard process model for data mining,
such as the de facto standard and most popular one, the Cross-Industry Standard
Process for Data Mining (CRISP-DM), are reduced cost and time. Standard models
also facilitate knowledge transfer and reuse of best practices, and minimize
knowledge requirements. On the other hand, to unlock the potential of
ever-growing textual data such as publications, patents, social media data, and
documents of various forms, digital innovation is increasingly needed.
Furthermore, the introduction of cutting-edge machine learning tools and
techniques enables the elicitation of ideas. The processing of unstructured
textual data to generate new and useful ideas is referred to as idea mining.
Existing literature about idea mining largely overlooks the utilization of
standard data mining process models. Therefore, the purpose of this paper is to
propose a reusable model for generating ideas, CRISP-DM for Idea Mining
(CRISP-IM). The CRISP-IM is designed and developed following the
design science approach. The CRISP-IM facilitates idea generation through the
use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and
subsequent statistical analysis on a dataset of scholarly articles. The adapted
CRISP-IM can guide the process of identifying trends in scholarly literature
datasets, temporally organized patent datasets, or any other textual dataset
from any domain in order to elicit ideas. The ex-post evaluation of the CRISP-IM is left
for future study.
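The trend-identification idea in the abstract can be illustrated with a minimal sketch. It is not the CRISP-IM procedure: instead of Dynamic Topic Modeling it uses plain per-year term shares, and the "statistical analysis" is reduced to a least-squares slope per term. The function names (`term_shares_by_year`, `trend_slope`, `rising_terms`) and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def term_shares_by_year(docs):
    """Map each term to {year: share of that year's tokens}.

    docs is a list of (year, text) pairs. Whitespace tokenization is a
    simplifying assumption; a real pipeline would use a topic model.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[year].update(text.lower().split())
    shares = defaultdict(dict)
    for year, counter in counts.items():
        total = sum(counter.values())
        for term, n in counter.items():
            shares[term][year] = n / total
    return shares


def trend_slope(series, years):
    """Least-squares slope of share against year; missing years count as 0."""
    if len(years) < 2:
        raise ValueError("need at least two distinct years to fit a trend")
    ys = [series.get(y, 0.0) for y in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return sxy / sxx


def rising_terms(docs, top_n=3):
    """Return the top_n terms with the steepest upward slope (ties: A-Z)."""
    shares = term_shares_by_year(docs)
    years = sorted({year for year, _ in docs})
    slopes = {term: trend_slope(s, years) for term, s in shares.items()}
    return sorted(slopes, key=lambda t: (-slopes[t], t))[:top_n]


# Hypothetical toy corpus: (publication year, text).
toy_docs = [
    (2018, "rule based planning"),
    (2019, "rule based planning deep learning"),
    (2020, "deep learning perception deep learning"),
]
print(rising_terms(toy_docs, top_n=2))
```

Terms with a positive slope are candidates for emerging themes, and terms with a negative slope for fading ones. A topic model would replace the raw term shares with per-period topic weights, but the ranking step would look similar.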
Related papers
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- A Gentle Introduction and Tutorial on Deep Generative Models in Transportation Research [21.66278922813198]
Deep Generative Models (DGMs) have rapidly advanced in recent years, becoming essential tools in various fields.
This paper offers a comprehensive introduction and tutorial on DGMs, with a focus on their applications in transportation.
It begins with an overview of generative models, followed by detailed explanations of fundamental models, a systematic review of the literature, and practical tutorial code to aid implementation.
arXiv Detail & Related papers (2024-10-09T17:11:22Z)
- A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys).
It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS.
Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z)
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- A toolbox for idea generation and evaluation: Machine learning, data-driven, and contest-driven approaches to support idea generation [0.0]
This thesis includes a list of data-driven and machine learning techniques with corresponding data sources and models to support idea generation.
The results include two models, one method, and one framework to better support data-driven and contest-driven idea generation.
Human-centred AI is a promising area of research that can contribute to the artefacts' further development and promote creativity.
arXiv Detail & Related papers (2022-05-19T20:28:49Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
- T-METASET: Task-Aware Generation of Metamaterial Datasets by Diversity-Based Active Learning [14.668178146934588]
We propose t-METASET: an intelligent data acquisition framework for task-aware dataset generation.
We validate the proposed framework in three hypothetical deployment scenarios, which encompass general use, task-aware use, and tailorable use.
arXiv Detail & Related papers (2022-02-21T22:46:49Z)
- A Systematic Literature Review about Idea Mining: The Use of Machine-driven Analytics to Generate Ideas [0.0]
This study focuses on state-of-the-art machine-driven analytics for idea generation and data sources.
A systematic literature review is conducted to identify relevant scholarly literature from IEEE, Scopus, Web of Science and Google Scholar.
The results indicate that idea generation through machine-driven analytics applies text mining, information retrieval (IR), artificial intelligence (AI), deep learning, machine learning, statistical techniques, natural language processing (NLP), NLP-based morphological analysis, network analysis, and bibliometrics.
arXiv Detail & Related papers (2022-01-30T21:46:21Z)
- RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training [19.987304448524043]
We propose a multi-task self-supervised learning-based researcher data pre-training model named RPT.
We divide the researchers' data into semantic document sets and a community graph.
We propose three self-supervised learning objectives to train the whole model.
arXiv Detail & Related papers (2021-10-08T03:42:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.