Extracting candidate factors affecting long-term trends of student
abilities across subjects
- URL: http://arxiv.org/abs/2103.06446v1
- Date: Thu, 11 Mar 2021 04:13:58 GMT
- Title: Extracting candidate factors affecting long-term trends of student
abilities across subjects
- Authors: Satoshi Takahashi and Hiroki Kuno and Atsushi Yoshikawa
- Abstract summary: Long-term student achievement data provide useful information to formulate the research question of what types of student skills would impact future trends across subjects.
We propose a novel approach to extract candidate factors affecting long-term trends across subjects from long-term data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-term student achievement data provide useful information to formulate
the research question of what types of student skills would impact future
trends across subjects. However, few studies have focused on long-term data.
This is because examination criteria vary depending on their designers;
additionally, it is difficult for the same designer to keep examination
criteria coherent across grades. To solve this inconsistency
issue, we propose a novel approach to extract candidate factors affecting
long-term trends across subjects from long-term data. Our approach is composed
of three steps: Data screening, time series clustering, and causal inference.
The first step extracts coherence data from long-term data. The second step
groups the long-term data by shape and value. The third step extracts factors
affecting the long-term trends and validates the extracted variation factors
using two or more different data sets. We then conducted evaluation experiments
with student achievement data from five public elementary schools and four
public junior high schools in Japan. The results demonstrate that our approach
extracts coherence data, clusters long-term data into interpretable groups, and
extracts candidate factors affecting academic ability across subjects.
Subsequently, our approach formulates a hypothesis and turns archived
achievement data into useful information.
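The second step above (grouping long-term data by shape and value) can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so everything here is an assumption: plain k-means stands in for the authors' method, the function names `shape_and_value_features` and `kmeans` are hypothetical, and the scores are made-up toy numbers, not data from the paper.

```python
import numpy as np

def shape_and_value_features(series):
    """Build one feature row per student from an (n_students, n_terms) score array.

    Shape = the z-normalized trajectory (ignores how high the scores are).
    Value = the student's mean score, standardized across students.
    This feature design is an illustrative assumption, not the paper's.
    """
    x = np.asarray(series, dtype=float)
    level = x.mean(axis=1, keepdims=True)
    spread = x.std(axis=1, keepdims=True)
    spread[spread == 0] = 1.0  # avoid dividing by zero for flat trajectories
    shape = (x - level) / spread
    level_std = level.std()
    value = (level - level.mean()) / (level_std if level_std > 0 else 1.0)
    return np.hstack([shape, value])

def kmeans(features, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [features[0]]
    for _ in range(1, k):
        # distance from every point to its nearest already-chosen center
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy scores over four terms (hypothetical values).
scores = [
    [50, 60, 70, 80],  # rising, higher level
    [52, 61, 72, 79],  # rising, higher level
    [80, 70, 60, 50],  # falling, higher level
    [78, 71, 59, 52],  # falling, higher level
    [20, 30, 40, 50],  # rising, lower level
    [22, 31, 42, 49],  # rising, lower level
]
labels = kmeans(shape_and_value_features(scores), k=3)
print(labels)
```

Using both the normalized trajectory and the mean score is what lets the two rising groups separate by value while the falling group separates by shape. A dedicated time series distance could replace the Euclidean distance used here.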
Related papers
- Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement [62.87020831987625]
We propose a novel framework designed to identify the influential and high-quality samples enriched with long-range dependency relations.
We select the most challenging samples as the influential data to effectively frame the long-range dependencies.
Experiments indicate that GATEAU effectively identifies samples enriched with long-range dependency relations and the model trained on these selected samples exhibits better instruction-following and long-context understanding capabilities.
arXiv Detail & Related papers (2024-10-21T04:30:53Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- Combining the Strengths of Dutch Survey and Register Data in a Data Challenge to Predict Fertility (PreFer) [8.4153358785173]
We present two datasets which can be used to study the predictability of fertility outcomes in the Netherlands.
One dataset is based on the LISS panel, a longitudinal survey which includes thousands of variables on a wide range of topics.
The other is based on the Dutch register data which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents.
arXiv Detail & Related papers (2024-02-01T16:00:21Z)
- A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations [23.87336875544181]
We redefine causal data into three categories: definite data, semi-definite data, and indefinite data.
Definite data pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning.
Indefinite data is an emergent research area that we infer from the progression of data forms.
arXiv Detail & Related papers (2023-11-02T01:31:42Z)
- Continual Release of Differentially Private Synthetic Data from Longitudinal Data Collections [19.148874215745135]
We study the problem of continually releasing differentially private synthetic data from longitudinal data collections.
We introduce a model where, in every time step, each individual reports a new data element.
We give continual synthetic data generation algorithms that preserve two basic types of queries.
arXiv Detail & Related papers (2023-06-13T16:22:08Z)
- Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
- Towards Federated Long-Tailed Learning [76.50892783088702]
Data privacy and class imbalance are the norm rather than the exception in many machine learning tasks.
Recent attempts have been launched to, on one side, address the problem of learning from pervasive private data, and on the other side, learn from long-tailed data.
This paper focuses on learning with long-tailed (LT) data distributions under the context of the popular privacy-preserved federated learning (FL) framework.
arXiv Detail & Related papers (2022-06-30T02:34:22Z)
- A Survey on Long-Tailed Visual Recognition [13.138929184395423]
We focus on the problems caused by long-tailed data distribution, sort out the representative long-tailed visual recognition datasets and summarize some mainstream long-tailed studies.
Based on the Gini coefficient, we quantitatively study 20 widely-used and large-scale visual datasets proposed in the last decade.
arXiv Detail & Related papers (2022-05-27T06:22:55Z)
- Long-term Causal Inference Under Persistent Confounding via Data Combination [38.026740610259225]
We study the identification and estimation of long-term treatment effects when both experimental and observational data are available.
Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data.
arXiv Detail & Related papers (2022-02-15T07:44:20Z)
- Multi-characteristic Subject Selection from Biased Datasets [79.82881947891589]
We present a constrained optimization-based method that finds the best possible sampling fractions for the different population subgroups.
Our results show that our proposed method outperforms the baselines for all problem variations by up to 90%.
arXiv Detail & Related papers (2020-12-18T15:55:27Z)
- Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.