Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming
- URL: http://arxiv.org/abs/2203.01382v1
- Date: Wed, 2 Mar 2022 19:57:32 GMT
- Title: Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming
- Authors: Cheng-Yu Hsieh, Jieyu Zhang, Alexander Ratner
- Abstract summary: We present Nemo, an end-to-end interactive system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
- Score: 77.38174112525168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weak Supervision (WS) techniques allow users to efficiently create large
training datasets by programmatically labeling data with heuristic sources of
supervision. While the success of WS relies heavily on the provided labeling
heuristics, the process of how these heuristics are created in practice has
remained under-explored. In this work, we formalize the development process of
labeling heuristics as an interactive procedure, built around the existing
workflow where users draw ideas from a selected set of development data for
designing the heuristic sources. With the formalism, we study two core problems
of how to strategically select the development data to guide users in
efficiently creating informative heuristics, and how to exploit the information
within the development process to contextualize and better learn from the
resultant heuristics. Building upon two novel methodologies that effectively
tackle the respective problems considered, we present Nemo, an end-to-end
interactive system that improves the overall productivity of the WS learning
pipeline by an average of 20% (and up to 47% in one task) compared to the
prevailing WS approach.
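As a rough illustration of what "programmatically labeling data with heuristic sources of supervision" can look like, here is a minimal sketch of generic weak supervision. It is not Nemo's method or API: the spam/ham task, the three labeling heuristics, and the `majority_vote` aggregator are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical label space for an illustrative spam-detection task.
ABSTAIN = -1
HAM = 0
SPAM = 1


# Each labeling heuristic votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_starts_with_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_contains_free, lf_starts_with_greeting]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine heuristic votes into one weak label (ABSTAIN on no votes or a tie)."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for example in ["Get a FREE prize at http://x.example", "Hello, see you tomorrow"]:
        print(example, "->", majority_vote(example))
```

In this framing, the development data a user looks at determines which heuristics they think to write, which is the step the abstract describes as under-explored.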
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments [33.83610929282721]
Learn-by-interact is a data-centric framework to adapt large language models (LLMs) to any given environments without human annotations.
We assess the quality of our synthetic data by using them in both training-based scenarios and training-free in-context learning (ICL).
Experiments on SWE-bench, WebArena, OSWorld and Spider2-V spanning across realistic coding, web, and desktop environments show the effectiveness of Learn-by-interact.
arXiv Detail & Related papers (2025-01-18T22:34:41Z)
- Intelligent Spark Agents: A Modular LangGraph Framework for Scalable, Visualized, and Enhanced Big Data Machine Learning Workflows [1.4582633500696451]
LangGraph framework is designed to enhance machine learning through scalability, visualization, and intelligent process optimization.
At its core, the framework introduces Agent AI, a pivotal innovation that leverages Spark's distributed computing capabilities.
The framework also incorporates large language models through the LangChain ecosystem, enhancing interaction with unstructured data.
arXiv Detail & Related papers (2024-12-02T13:41:38Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- Process-aware Human Activity Recognition [1.912429179274357]
We propose a novel approach that incorporates process information from context to enhance the HAR performance.
Specifically, we align probabilistic events generated by machine learning models with process models derived from contextual information.
This alignment adaptively weighs these two sources of information to optimise HAR accuracy.
arXiv Detail & Related papers (2024-11-13T17:53:23Z)
- Collaborative Evolving Strategy for Automatic Data-Centric Development [17.962373755266068]
This paper introduces the automatic data-centric development (AD2) task.
It outlines its core challenges, which require domain-expert-like task scheduling and implementation capability.
We propose an autonomous agent equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval.
arXiv Detail & Related papers (2024-07-26T12:16:47Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Learning Context-Aware Service Representation for Service Recommendation in Workflow Composition [6.17189383632496]
This paper proposes a novel NLP-inspired approach to recommending services throughout a workflow development process.
A workflow composition process is formalized as a step-wise, context-aware service generation procedure.
Service embeddings are then learned by applying a deep learning model from the NLP field.
arXiv Detail & Related papers (2022-05-24T04:18:01Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.