Automatic Catalog of RRLyrae from $\sim$ 14 million VVV Light Curves:
How far can we go with traditional machine-learning?
- URL: http://arxiv.org/abs/2005.00220v2
- Date: Tue, 4 May 2021 19:29:32 GMT
- Title: Automatic Catalog of RRLyrae from $\sim$ 14 million VVV Light Curves:
How far can we go with traditional machine-learning?
- Authors: Juan B. Cabral, Felipe Ramos, Sebasti\'an Gurovich and Pablo Granitto
- Abstract summary: The creation of a 3D map of the bulge using RRLyrae (RRL) is one of the main goals of the VVV(X) surveys.
Previous works introduced the use of Machine Learning (ML) methods for the variable star classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The creation of a 3D map of the bulge using RRLyrae (RRL) is one of the main
goals of the VVV(X) surveys. The overwhelming number of sources under analysis
requires the use of automatic procedures. In this context, previous works
introduced the use of Machine Learning (ML) methods for variable star
classification. Our goal is the development and analysis of an automatic
procedure, based on ML, for the identification of RRLs in the VVV Survey. This
procedure will be used to generate reliable catalogs integrated over several
tiles in the survey. After the reconstruction of light-curves, we extract a set
of period- and intensity-based features. We use for the first time a new subset
of pseudo-color features. We discuss all the appropriate steps needed to define
our automatic pipeline: selection of quality measures; sampling procedures;
classifier setup and model selection (illustrative sketches of these steps
follow the abstract). As a final result, we construct an ensemble
classifier with an average Recall of 0.48 and average Precision of 0.86 over 15
tiles. We also make available our processed datasets and a catalog of candidate
RRLs. Perhaps most interestingly from a classification perspective based on
photometric broad-band data, our results indicate that Color is an informative
feature type for RRLs that should be considered in automatic classification
methods via ML. We also argue that Recall and Precision, in both tables and
curves, are high-quality metrics for this highly imbalanced problem.
Furthermore, we show for our VVV data-set that, to obtain good estimates, it is
important to use the original class distribution rather than reduced samples
with an artificial balance. Finally, we show that the use of ensemble
classifiers helps resolve the crucial model selection step, and that most
errors in the identification of RRLs are related to low-quality observations of
some sources or to the difficulty of resolving the RRL-C type given the data.
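Illustrative sketch (not the authors' actual pipeline): period- and intensity-based features of the kind described in the abstract can be computed from a single Ks-band light curve with standard tools such as astropy's Lomb-Scargle periodogram. The function name, frequency grid, and feature set below are assumptions chosen for the example; the pseudo-color features mentioned in the abstract would be added analogously as differences of broad-band mean magnitudes.

import numpy as np
from astropy.timeseries import LombScargle

def extract_features(time, mag, err):
    """Illustrative period- and intensity-based features for one light curve.

    Inputs are arrays of observation times (days), Ks magnitudes, and errors.
    This is NOT the paper's pipeline; it only mimics the kind of features
    (period, amplitude, simple shape statistics) the abstract describes.
    """
    # Lomb-Scargle periodogram; RRL periods fall roughly between 0.2 and 1 day,
    # so the frequency grid is restricted to that range (an assumption).
    frequency, power = LombScargle(time, mag, err).autopower(
        minimum_frequency=1.0, maximum_frequency=5.0
    )
    period = 1.0 / frequency[np.argmax(power)]

    # Simple intensity-based statistics on the magnitudes.
    amplitude = np.percentile(mag, 95) - np.percentile(mag, 5)
    skewness = float(((mag - mag.mean()) ** 3).mean() / mag.std() ** 3)

    return {
        "PeriodLS": float(period),
        "Amplitude": float(amplitude),
        "Skew": skewness,
        "MeanMag": float(mag.mean()),
    }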
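A second sketch, again an assumption about the setup rather than the paper's actual configuration: a soft-voting ensemble of standard scikit-learn classifiers, trained and evaluated with Recall and Precision on a split that preserves the original, highly imbalanced class distribution (here simulated toy data with roughly 1% positives).

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy feature matrix X (rows = sources, columns = features such as PeriodLS,
# Amplitude, pseudo-colors) and labels y (1 = RRL, 0 = everything else).
rng = np.random.default_rng(0)
n_neg, n_pos = 4950, 50  # ~1% positives, mimicking the class imbalance
X = np.vstack([rng.normal(0.0, 1.0, size=(n_neg, 8)),
               rng.normal(1.0, 1.0, size=(n_pos, 8))])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

# Stratified split keeps the original class distribution in train and test,
# instead of artificially balancing the samples.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Soft-voting ensemble of a few standard classifiers (an assumption; the
# paper's actual classifier choices and hyper-parameters may differ).
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=15)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)

y_pred = ensemble.predict(X_te)
print("Recall:   ", recall_score(y_te, y_pred, zero_division=0))
print("Precision:", precision_score(y_te, y_pred, zero_division=0))

Reporting both Recall and Precision (or the full precision-recall curve via sklearn.metrics.precision_recall_curve) is what makes the imbalance visible; overall accuracy would look excellent even for a classifier that never finds an RRL.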
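The abstract's point about preferring the original distribution over artificially balanced samples can be illustrated with a toy setup like the one above (hypothetical numbers, not the paper's data): Precision measured on a balanced subsample of the test set is typically optimistic, because the false positives contributed by the vastly more numerous non-RRL sources are diluted.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: ~1% positives with moderate class overlap.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(9900, 4)),
               rng.normal(1.0, 1.0, size=(100, 4))])
y = np.concatenate([np.zeros(9900, dtype=int), np.ones(100, dtype=int)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                           stratify=y, random_state=1)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Precision on the test set with the ORIGINAL class distribution (~1% RRLs).
print("precision (original distribution):",
      precision_score(y_te, y_pred, zero_division=0))

# Precision on an ARTIFICIALLY BALANCED test subsample: keep all positives and
# an equal number of negatives. False positives are heavily under-counted, so
# the estimate tends to be optimistic compared to a survey-scale run.
pos = np.flatnonzero(y_te == 1)
neg = rng.choice(np.flatnonzero(y_te == 0), size=pos.size, replace=False)
idx = np.concatenate([pos, neg])
print("precision (balanced subsample):   ",
      precision_score(y_te[idx], y_pred[idx], zero_division=0))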
Related papers
- Scaling Test-Time Compute Without Verification or RL is Suboptimal [70.28430200655919]
We show that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget.
We corroborate our theory empirically on both didactic and math reasoning problems with 3/8B-sized pre-trained LLMs, where we find verification is crucial for scaling test-time compute.
arXiv Detail & Related papers (2025-02-17T18:43:24Z) - LIMR: Less is More for RL Scaling [25.477841726836836]
We introduce Learning Impact Measurement (LIM), an automated method to evaluate and prioritize training samples.
Our method achieves comparable or even superior performance using only 1,389 samples versus the full 8,523 samples dataset.
For reproducible research and future innovation, we are open-sourcing LIMR, including implementation of LIM, training and evaluation code, curated datasets, and trained models.
arXiv Detail & Related papers (2025-02-17T15:13:29Z) - BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration [7.261063083251448]
We present a complete framework for calibrating and administering a robust large-scale computerized adaptive test (CAT) with a small number of responses.
We use AutoIRT, a new method that uses automated machine learning (AutoML) in combination with item response theory (IRT).
We propose the BanditCAT framework, a methodology motivated by casting the problem in the contextual bandit framework and utilizing item response theory (IRT).
arXiv Detail & Related papers (2024-10-28T13:54:10Z) - Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z) - A semi-supervised learning using over-parameterized regression [0.0]
Semi-supervised learning (SSL) is an important theme in machine learning.
In this paper, we consider a method of incorporating information on unlabeled samples into kernel functions.
arXiv Detail & Related papers (2024-09-06T03:05:35Z) - Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensions for LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z) - Drifting Features: Detection and evaluation in the context of automatic
RRLs identification in VVV [0.0]
We introduce and discuss the notion of Drifting Features, related to small changes in the properties as measured in the data features.
We show that this method can efficiently identify a reduced set of features that contains useful information about the tile of origin of the sources.
arXiv Detail & Related papers (2021-05-04T19:07:32Z) - Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose the first robust AutoML framework, Robusta, based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.