A Hierarchical Approach to Scaling Batch Active Search Over Structured
Data
- URL: http://arxiv.org/abs/2007.10263v1
- Date: Mon, 20 Jul 2020 16:50:25 GMT
- Title: A Hierarchical Approach to Scaling Batch Active Search Over Structured
Data
- Authors: Vivek Myers and Peyton Greenside
- Abstract summary: We present a general hierarchical framework based on bandit algorithms to scale active search to large batch sizes.
We focus our application of HBBS on modern biology, where large batch experimentation is often fundamental to the research process.
- Score: 0.5076419064097732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active search is the process of identifying high-value data points in a large
and often high-dimensional parameter space that can be expensive to evaluate.
Traditional active search techniques like Bayesian optimization trade off
exploration and exploitation over consecutive evaluations, and have
historically focused on single or small (<5) numbers of examples evaluated per
round. As modern data sets grow, so does the need to scale active search to
large data sets and batch sizes. In this paper, we present a general
hierarchical framework based on bandit algorithms to scale active search to
large batch sizes by maximizing information derived from the unique structure
of each dataset. Our hierarchical framework, Hierarchical Batch Bandit Search
(HBBS), strategically distributes batch selection across a learned embedding
space by facilitating wide exploration of different structural elements within
a dataset. We focus our application of HBBS on modern biology, where large
batch experimentation is often fundamental to the research process, and
demonstrate batch design of biological sequences (protein and DNA). We also
present a new Gym environment to easily simulate diverse biological sequences
and to enable more comprehensive evaluation of active search methods across
heterogeneous data sets. The HBBS framework improves upon standard performance,
wall-clock, and scalability benchmarks for batch search by using a broad
exploration strategy across coarse partitions and fine-grained exploitation
within each partition of structured data.
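The coarse-then-fine idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' HBBS implementation: the abstract only says the framework is "based on bandit algorithms", so the choice of Thompson sampling with Beta posteriors is an assumption, as are the function name `hierarchical_batch_select`, the precomputed partition assignment (standing in for clusters of a learned embedding), and the surrogate scores.

```python
import random
from collections import defaultdict


def hierarchical_batch_select(partition_of, scores, stats, batch_size, rng):
    """Fill a batch one slot at a time (illustrative sketch only).

    partition_of: dict item -> partition id (hypothetical coarse clustering)
    scores:       dict item -> surrogate score, higher is better (hypothetical)
    stats:        dict partition id -> [successes, failures] from earlier rounds
    """
    # Group candidates by partition, sorted ascending so pop() yields the best.
    pools = defaultdict(list)
    for item, part in partition_of.items():
        pools[part].append(item)
    for part in pools:
        pools[part].sort(key=lambda it: scores[it])

    batch = []
    while len(batch) < batch_size:
        live = [p for p in pools if pools[p]]
        if not live:
            break  # fewer candidates than requested batch slots
        # Coarse level (exploration): one Beta posterior draw per partition.
        def draw(p):
            wins, losses = stats.get(p, (0, 0))
            return rng.betavariate(1 + wins, 1 + losses)
        chosen = max(live, key=draw)
        # Fine level (exploitation): best remaining item in that partition.
        batch.append(pools[chosen].pop())
    return batch


rng = random.Random(0)
partition_of = {f"seq{i}": i % 4 for i in range(40)}  # 4 toy partitions
scores = {item: rng.random() for item in partition_of}
stats = {0: [5, 1], 1: [0, 6], 2: [2, 2], 3: [0, 0]}
batch = hierarchical_batch_select(partition_of, scores, stats, 12, rng)
print(batch)
```

Because a partition is re-sampled for every slot, partitions with few observations still receive some of the batch, while within a partition the sketch is purely greedy. A real system would update `stats` from the measured outcomes of each round.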
Related papers
- HiBO: Hierarchical Bayesian Optimization via Adaptive Search Space Partitioning [0.7737746260673106]
HiBO is a novel hierarchical algorithm integrating global-level search space partitioning information into the acquisition strategy of a local BO-based optimizer.
A set of evaluations demonstrates that HiBO outperforms state-of-the-art methods in high-dimensional synthetic benchmarks.
arXiv Detail & Related papers (2024-10-30T16:04:16Z)
- FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees [0.06597195879147556]
FOR-instance dataset comprises five curated and ML-ready UAV-based laser scanning data collections.
The dataset is divided into development and test subsets, enabling method advancement and evaluation.
The inclusion of diameter at breast height data expands its utility to the measurement of a classic tree variable.
arXiv Detail & Related papers (2023-09-03T22:08:29Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Towards Personalized Preprocessing Pipeline Search [52.59156206880384]
ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering.
We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines.
Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
arXiv Detail & Related papers (2023-02-28T05:45:05Z)
- Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z)
- Frequent Itemset-driven Search for Finding Minimum Node Separators in Complex Networks [61.2383572324176]
We propose a frequent itemset-driven search approach, which integrates the concept of frequent itemset mining in data mining into the well-known memetic search framework.
It iteratively employs the frequent itemset recombination operator to generate promising offspring solutions based on itemsets that frequently occur in high-quality solutions.
In particular, it discovers 29 new upper bounds and matches 18 previous best-known bounds.
arXiv Detail & Related papers (2022-01-18T11:16:40Z)
- Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z)
- Multidimensional Assignment Problem for multipartite entity resolution [69.48568967931608]
Multipartite entity resolution aims at integrating records from multiple datasets into one entity.
We apply two procedures, a Greedy algorithm and a large scale neighborhood search, to solve the assignment problem.
We find evidence that design-based multi-start can be more efficient as the size of databases grows large.
arXiv Detail & Related papers (2021-12-06T20:34:55Z)
- Structural Textile Pattern Recognition and Processing Based on Hypergraphs [2.4963790083110426]
We introduce an approach for recognising similar weaving patterns based on their structures for textile archives.
We first represent textile structures using hypergraphs and extract multisets of k-neighbourhoods describing weaving patterns from these graphs.
The resulting multisets are clustered using various distance measures and various clustering algorithms.
arXiv Detail & Related papers (2021-03-21T00:44:40Z)
- Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines [0.0]
We study to what extent Machine Learning Techniques can contribute to obtain such a speed-up.
We characterize the scenarios in which those latter can be profitably used with respect to the former, accounting for both CPU and GPU computing.
Indeed, we formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
arXiv Detail & Related papers (2020-07-20T16:26:54Z)
- AutoSTR: Efficient Backbone Search for Scene Text Recognition [80.7290173000068]
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes.
We propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform the state-of-the-art approaches on standard benchmarks.
arXiv Detail & Related papers (2020-03-14T06:51:04Z)