Related papers: Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

URL: http://arxiv.org/abs/2212.03423v1
Date: Wed, 7 Dec 2022 03:12:41 GMT
Title: Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning
Authors: Yukun Cao, Xike Xie, and Kexin Huang
Abstract summary: We propose a learning-to-explore framework, based on meta-learning, which learns how to learn a classifier with automatically generated meta-tasks. Our proposal outperforms existing explore-by-example solutions in terms of accuracy and efficiency.
Score: 8.92180350317399
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Interactive data exploration (IDE) is an effective way of comprehending big data, whose volume and complexity are beyond human abilities. The main goal of IDE is to discover user interest regions from a database through multi-rounds of user labelling. Existing IDEs adopt active-learning framework, where users iteratively discriminate or label the interestingness of selected tuples. The process of data exploration can be viewed as the process of training a classifier, which determines whether a database tuple is interesting to a user. An efficient exploration thus takes very few iterations of user labelling to reach the data region of interest. In this work, we consider the data exploration as the process of few-shot learning, where the classifier is learned with only a few training examples, or exploration iterations. To this end, we propose a learning-to-explore framework, based on meta-learning, which learns how to learn a classifier with automatically generated meta-tasks, so that the exploration process can be much shortened. Extensive experiments on real datasets show that our proposal outperforms existing explore-by-example solutions in terms of accuracy and efficiency.

Related papers

Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems [0.6267574471145215]
We propose a novel active learning framework with significant potential for application in modern AI systems. Unlike the traditional active learning methods, which only focus on determining which data point should be labeled, our framework also introduces an innovative perspective on incorporating different query scheme. Our proposed active learning framework exhibits higher accuracy and lower loss compared to other methods.
arXiv Detail & Related papers (2024-12-31T05:12:51Z)
LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics. The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization. infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
Actively Discovering New Slots for Task-oriented Conversation [19.815466126158785]
We propose a general new slot task in an information extraction fashion to realize human-in-the-loop learning. We leverage existing language tools to extract value candidates where the corresponding labels are leveraged as weak supervision signals. We conduct extensive experiments on several public datasets and compare with a bunch of competitive baselines to demonstrate our method.
arXiv Detail & Related papers (2023-05-06T13:33:33Z)
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process. InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining. In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z)
ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named as ALBench for evaluating active learning in object detection. Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
Active metric learning and classification using similarity queries [21.589707834542338]
We show that a novel unified query framework can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification.
arXiv Detail & Related papers (2022-02-04T03:34:29Z)
AstronomicAL: An interactive dashboard for visualisation, integration and classification of data using Active Learning [0.0]
AstronomicAL is a human-in-the-loop interactive labelling and training dashboard. It allows users to create reliable datasets and robust classifiers using active learning. System allows users to visualise and integrate data from different sources.
arXiv Detail & Related papers (2021-09-11T07:32:26Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task. Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator. To discover implicit entity preference of users, we design an elaborate collaborative learning algorithms based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.