Related papers: Chat Failures and Troubles: Reasons and Solutions

Related papers

Pre-training for Recommendation Unlearning [14.514770044236375]
UnlearnRec is a model-agnostic pre-training paradigm that prepares systems for efficient unlearning operations.<n>Our method delivers exceptional unlearning effectiveness while providing more than 10x speedup compared to retraining approaches.
arXiv Detail & Related papers (2025-05-28T17:57:11Z)
Action Flow Matching for Continual Robot Learning [57.698553219660376]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks. We introduce a generative framework leveraging flow matching for online robot dynamics model alignment. We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
Reimplementation of Learning to Reweight Examples for Robust Deep Learning [0.0]
Deep neural networks (DNNs) have been used to create models for many complex analysis problems like image recognition and medical diagnosis. The performance of these networks is highly dependent on the quality of the data used to train the models. Two characteristics of these sets, noisy labels and training set biases, are known to frequently cause poor generalization performance.
arXiv Detail & Related papers (2024-05-11T00:43:56Z)
Enhancing the Fairness and Performance of Edge Cameras with Explainable AI [3.4719449211802456]
Our research presents a diagnostic method using Explainable AI (XAI) for model debug. We found the training dataset as the main bias source and suggested model augmentation as a solution.
arXiv Detail & Related papers (2024-01-18T10:08:24Z)
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations. These descriptions can be used to generate synthetic data using generative models, such as diffusion models. Our experiments have shown remarkable textbfimprovements in accuracy ($sim textbf21%$) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
A Reusable AI-Enabled Defect Detection System for Railway Using Ensembled CNN [5.381374943525773]
Defect detection is crucial for ensuring the trustworthiness of railway systems. Current approaches rely on single deep-learning models, like CNNs. We propose a reusable AI-enabled defect detection approach.
arXiv Detail & Related papers (2023-11-24T19:45:55Z)
Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
On-Robot Bayesian Reinforcement Learning for POMDPs [16.667924736270415]
This paper advances Bayesian reinforcement learning for robotics by proposing a specialized framework for physical systems. We capture this knowledge in a factored representation, then demonstrate the posterior factorizes in a similar shape, and ultimately formalize the model in a Bayesian framework. We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve the resulting model.
arXiv Detail & Related papers (2023-07-22T01:16:29Z)
Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamic and/or simulator model and the real robot. We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z)
Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. We apply multi-task learning to make the model learn to generalize to new tasks better. Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach. It provides examples of such problems, and presents a theorem on how close some approximate solutions can be. The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
arXiv Detail & Related papers (2021-02-11T16:27:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.