Related papers: Subtask Analysis of Process Data Through a Predictive Model

URL: http://arxiv.org/abs/2009.00717v1
Date: Sat, 29 Aug 2020 21:11:01 GMT
Title: Subtask Analysis of Process Data Through a Predictive Model
Authors: Zhi Wang, Xueying Tang, Jingchen Liu and Zhiliang Ying
Abstract summary: This paper develops a computationally efficient method for exploratory analysis of response process data. The new approach segments a lengthy individual process into a sequence of short subprocesses to achieve complexity reduction. We use the process data from PIAAC 2012 to demonstrate how exploratory analysis of process data can be done with the new approach.
Score: 5.7668512557707166
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Response process data collected from human-computer interactive items contain rich information about respondents' behavioral patterns and cognitive processes. Their irregular formats as well as their large sizes make standard statistical tools difficult to apply. This paper develops a computationally efficient method for exploratory analysis of such process data. The new approach segments a lengthy individual process into a sequence of short subprocesses to achieve complexity reduction, easy clustering and meaningful interpretation. Each subprocess is considered a subtask. The segmentation is based on sequential action predictability using a parsimonious predictive model combined with the Shannon entropy. Simulation studies are conducted to assess performance of the new methods. We use the process data from PIAAC 2012 to demonstrate how exploratory analysis of process data can be done with the new approach.
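
The segmentation idea in the abstract (predict the next action, then use the Shannon entropy of that prediction to decide where one subtask ends) can be illustrated with a minimal sketch. It is not the authors' implementation: the first-order transition-count model stands in for the paper's parsimonious predictive model, and the fixed `threshold` cut rule, the function names, and the toy action labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count action -> next-action transitions over all sequences.

    Assumption: a first-order count model replaces the paper's predictive model.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def entropy(counter):
    """Shannon entropy (in nats) of the empirical distribution in a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def segment(seq, counts, threshold):
    """Cut after every action whose next-action entropy exceeds `threshold`.

    Assumption: high entropy (low predictability) marks a subtask boundary.
    """
    segments, current = [], []
    for i, action in enumerate(seq):
        current.append(action)
        is_last = i == len(seq) - 1
        next_dist = counts.get(action, Counter())
        if not is_last and entropy(next_dist) > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Toy processes: after "start" the respondent follows one of two fixed routes.
toy = [
    ["start", "a", "b", "end"],
    ["start", "c", "d", "end"],
    ["start", "a", "b", "end"],
    ["start", "c", "d", "end"],
]
counts = fit_bigram(toy)
print(segment(toy[0], counts, threshold=0.5))
```

In this toy data only the step after "start" is uncertain, so the sequence is cut there and the remaining, fully predictable actions stay together as one subprocess.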

Related papers

Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction [1.3563640142303988]
Large language models (LLMs) can process lengthy documents even without supervised training on a task-specific dataset. One feasible approach for tasks with lengthy, complex input is to first summarize the document and then apply supervised fine-tuning to the summary. We present a method for processing the summaries of long documents that aims to capture different important aspects of the original document.
arXiv Detail & Related papers (2025-02-14T18:59:28Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling. We introduce Tree Search-based Orchestrated Agents (TOA), where the workflow evolves iteratively during the sequential sampling process. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z)
Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages. Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
Mining a Minimal Set of Behavioral Patterns using Incremental Evaluation [3.16536213610547]
Existing approaches to behavioral pattern mining suffer from two limitations. First, they show limited scalability as incremental computation is incorporated only in the generation of pattern candidates. Second, process analysis based on mined patterns shows limited effectiveness due to an overwhelmingly large number of patterns obtained in practical application scenarios.
arXiv Detail & Related papers (2024-02-05T11:41:37Z)
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions. This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts. It has been implemented using scalable software and methods to exploit large volumes of data. Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z)
Clustering Object-Centric Event Logs [0.36748639131154304]
We propose a clustering-based approach to cluster similar objects in OCELs to simplify the obtained process models. Our approach reduces the complexity of the process models and generates coherent subsets of objects which help the end-users gain insights into the process.
arXiv Detail & Related papers (2022-07-26T09:16:39Z)
Process-BERT: A Framework for Representation Learning on Educational Process Data [68.8204255655161]
We propose a framework for learning representations of educational process data. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data. We apply our framework to the 2019 Nation's Report Card Data Mining Competition dataset.
arXiv Detail & Related papers (2022-04-28T16:07:28Z)
What Averages Do Not Tell -- Predicting Real Life Processes with Sequential Deep Learning [0.1376408511310322]
Process Mining concerns discovering insights on business processes from their execution data that are logged by systems. Many Deep Learning techniques have been successfully adapted for predictive Process Mining that aims to predict process outcomes. Traces in Process Mining are multimodal sequences and are structured very differently from natural language sentences or images.
arXiv Detail & Related papers (2021-10-19T19:45:05Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
Rissanen Data Analysis: Examining Dataset Characteristics via Description Length [78.42578316883271]
We introduce a method to determine if a certain capability helps to achieve an accurate model of given data. Since minimum program length is uncomputable, we estimate the labels' minimum description length (MDL) as a proxy. We call the method Rissanen Data Analysis (RDA) after the father of MDL.
arXiv Detail & Related papers (2021-03-05T18:58:32Z)
Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery which aims to learn an accurate process model from event log data. In this paper, we propose to use (block-) structured programs directly as target process models. We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z)
ProcData: An R Package for Process Data Analysis [5.278929511653198]
The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.
arXiv Detail & Related papers (2020-06-09T05:44:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.