ReInform: Selecting paths with reinforcement learning for contextualized link prediction
- URL: http://arxiv.org/abs/2211.10688v1
- Date: Sat, 19 Nov 2022 13:04:53 GMT
- Title: ReInform: Selecting paths with reinforcement learning for contextualized link prediction
- Authors: Marina Speranskaya, Sameh Methias, Benjamin Roth
- Abstract summary: We propose to use reinforcement learning to inform transformer-based contextualized link prediction models.
Experiments on WN18RR and FB15k-237 show that contextualized link prediction models consistently outperform RL-based answer search.
- Score: 3.454537413673216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to use reinforcement learning to inform transformer-based
contextualized link prediction models by providing the paths that are most useful
for predicting the correct answer. This contrasts with previous approaches, which
either used reinforcement learning (RL) to search directly for the answer, or
based their predictions on limited or randomly selected context. Our experiments
on WN18RR and FB15k-237 show that contextualized link prediction models
consistently outperform RL-based answer search, and that further improvements
(of up to 13.5% MRR) can be gained by combining RL with a link prediction model.
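
As a rough illustration of the pipeline the abstract describes, here is a minimal sketch (not the authors' implementation): a stand-in RL policy selects paths by rollout over a toy graph, and the selected paths are verbalized into a context string that a transformer-based link predictor would consume. The toy graph, the random policy score, and all helper names are invented for illustration; in ReInform these are trained neural components.

```python
# Minimal sketch: RL-style path selection feeding a contextualized link
# predictor. The policy and scorer below are toy stand-ins, not trained models.
import random
from collections import defaultdict

# Toy knowledge graph: (head, relation, tail) triples.
TRIPLES = [
    ("kitten", "hypernym", "cat"),
    ("cat", "hypernym", "feline"),
    ("feline", "hypernym", "carnivore"),
    ("cat", "has_part", "whisker"),
]
NEIGHBORS = defaultdict(list)
for h, r, t in TRIPLES:
    NEIGHBORS[h].append((r, t))

def policy_score(query, path, edge):
    """Stand-in for the learned RL policy: scores how promising it is to
    extend `path` with `edge` for answering `query`. A trained policy would
    be a network updated with reward for reaching the correct answer."""
    return random.random()

def rollout_paths(query, start, n_paths=3, max_hops=3):
    """Collect candidate paths by greedy rollouts under the policy."""
    paths = []
    for _ in range(n_paths):
        node, path = start, []
        for _ in range(max_hops):
            edges = NEIGHBORS.get(node)
            if not edges:
                break
            edge = max(edges, key=lambda e: policy_score(query, path, e))
            path.append((node, *edge))
            node = edge[1]
        paths.append(path)
    return paths

def verbalize(paths):
    """Turn the selected paths into a text context for a transformer scorer."""
    return " [SEP] ".join(
        " ; ".join(f"{h} {r} {t}" for h, r, t in p) for p in paths
    )

query = ("kitten", "hypernym", "?")
context = verbalize(rollout_paths(query, "kitten"))
# A contextualized link predictor (e.g., a cross-encoder) would now score
# each candidate tail entity given the query plus `context`.
print(context)
```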
Related papers
- Reinforcement Pre-Training [78.5355979575498]
We introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific annotated answers. The results position RPT as an effective and promising scaling paradigm to advance language model pre-training.
arXiv Detail & Related papers (2025-06-09T17:59:53Z)
- Outcome-based Reinforcement Learning to Predict the Future [1.4313866885019229]
We show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10%.
arXiv Detail & Related papers (2025-05-23T14:56:07Z)
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
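
For intuition, here is a hedged sketch of Bayes-optimal aggregation in the simplest binary case with a known symmetric label-flip rate `rho`. The paper derives its aggregator via approximate message passing; this is only the textbook Bayes update for that toy setting.

```python
# Hedged sketch: Bayes combination of a model's predicted probability with a
# noisy observed label, assuming a known symmetric flip rate `rho`.
def aggregate(p_model: float, noisy_label: int, rho: float) -> float:
    """Posterior P(y=1 | model probability, noisy label)."""
    # Likelihood of observing `noisy_label` under each true label.
    like_y1 = (1 - rho) if noisy_label == 1 else rho
    like_y0 = rho if noisy_label == 1 else (1 - rho)
    num = p_model * like_y1
    den = num + (1 - p_model) * like_y0
    return num / den

print(aggregate(p_model=0.7, noisy_label=0, rho=0.2))  # pulled below 0.7
```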
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
- LLMs Can Teach Themselves to Better Predict the Future [1.0923877073891446]
We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models.
We generate pairs of reasoning trajectories and probabilistic forecasts for a diverse set of questions, then rank these pairs by their distance to the actual outcomes before fine-tuning the model.
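
A minimal sketch of the ranking step, assuming a Brier-style squared distance to the realized outcome and a DPO-style preference-pair format downstream; the field names and the exact distance are my assumptions, not the paper's stated pipeline.

```python
# Hedged sketch: rank paired (reasoning trace, probability) forecasts by
# squared distance to the realized outcome and keep (chosen, rejected) pairs
# for preference fine-tuning.
def build_preference_pairs(samples):
    """samples: dicts with keys 'question', 'outcome' (0/1),
    and 'forecasts' = [(trace_a, p_a), (trace_b, p_b)]."""
    pairs = []
    for s in samples:
        (trace_a, p_a), (trace_b, p_b) = s["forecasts"]
        err_a = (p_a - s["outcome"]) ** 2  # Brier-style distance
        err_b = (p_b - s["outcome"]) ** 2
        if err_a == err_b:
            continue  # ties carry no training signal
        chosen, rejected = (trace_a, trace_b) if err_a < err_b else (trace_b, trace_a)
        pairs.append({"prompt": s["question"], "chosen": chosen, "rejected": rejected})
    return pairs
```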
arXiv Detail & Related papers (2025-02-07T17:21:16Z)
- Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method [11.794628063040108]
Multi-Span Question Answering (MSQA) requires models to extract one or multiple answer spans from a given context to answer a question.
We propose the Answering-Classifying-Correcting (ACC) framework, which employs a post-processing strategy to handle incorrect predictions.
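
A hedged sketch of the post-processing flow: classify each predicted span as correct, partial, or wrong, keep the correct ones, route partial spans to a corrector, and discard the rest. The classifier and corrector here are placeholders; in the paper both are trained models.

```python
# Hedged sketch of an ACC-style post-processing pass.
def acc_postprocess(spans, classify, correct):
    kept = []
    for span in spans:
        label = classify(span)          # 'correct' | 'partial' | 'wrong'
        if label == "correct":
            kept.append(span)
        elif label == "partial":
            kept.append(correct(span))  # e.g., trim or extend the span
        # 'wrong' spans are discarded
    return kept
```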
arXiv Detail & Related papers (2024-10-22T08:04:32Z)
- Deep Limit Model-free Prediction in Regression [0.0]
We provide a model-free approach based on Deep Neural Networks (DNNs) to perform point prediction and construct prediction intervals under a general regression setting.
Our method is more stable and accurate than other DNN-based counterparts, especially for optimal point predictions.
arXiv Detail & Related papers (2024-08-18T16:37:53Z)
- Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting [15.916325272109454]
We propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts.
A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario.
This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving.
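
A minimal sketch of the routing idea: a learned router decides, per input scenario, whether to trust the deep-learning expert or the rule-based expert. The router below is a placeholder for the concurrently trained routing function described in the abstract.

```python
# Hedged sketch of APE-style expert routing.
def ape_predict(scenario, dl_expert, rule_expert, router):
    """router(scenario) returns the probability that the deep-learning expert
    is more reliable here (e.g., low for out-of-distribution scenarios)."""
    return dl_expert(scenario) if router(scenario) > 0.5 else rule_expert(scenario)
```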
arXiv Detail & Related papers (2024-07-12T17:57:00Z)
- Preference Alignment with Flow Matching [23.042382086241364]
Preference Flow Matching (PFM) is a new framework for preference-based reinforcement learning (PbRL).
It streamlines the integration of preferences into an arbitrary class of pre-trained models.
We provide theoretical insights that support our method's alignment with standard PbRL objectives.
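
For flavor, here is a hedged sketch of a conditional flow-matching objective in the spirit of PFM: regress a velocity field that transports less-preferred outputs toward preferred outputs along straight-line paths. The tiny network, the pairing scheme, and the straight-path choice are assumptions, not the paper's exact recipe.

```python
# Hedged sketch: conditional flow matching from less-preferred to preferred
# samples. VelocityNet is a toy placeholder for the learned flow.
import torch
from torch import nn

class VelocityNet(nn.Module):
    """Tiny placeholder velocity field v(x_t, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, xt, t):
        return self.net(torch.cat([xt, t], dim=-1))

def flow_matching_loss(v_net, x0, x1):
    """x0: less-preferred samples, x1: preferred samples (paired), both (B, D)."""
    t = torch.rand(x0.size(0), 1)       # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1          # point on the straight path x0 -> x1
    target = x1 - x0                    # constant velocity along that path
    return ((v_net(xt, t) - target) ** 2).mean()

v = VelocityNet(dim=4)
loss = flow_matching_loss(v, torch.randn(8, 4), torch.randn(8, 4))
```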
arXiv Detail & Related papers (2024-05-30T08:16:22Z)
- Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble [67.4269821365504]
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values.
However, RLHF relies on a reward model that is trained with a limited amount of human preference data.
We contribute a reward ensemble method that allows the reward model to make more accurate predictions.
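
A hedged sketch of one common way to aggregate a reward-model ensemble: mean minus a multiple of the standard deviation, which gives a conservative estimate that is harder for the policy to exploit. The paper's exact aggregation may differ; the stand-in reward models below are illustrative only.

```python
# Hedged sketch: conservative ensemble reward for RLHF.
import torch

def ensemble_reward(reward_models, prompt, response, k: float = 1.0):
    """Mean minus k * std across the ensemble's reward estimates."""
    rewards = torch.stack([rm(prompt, response) for rm in reward_models])
    return rewards.mean(dim=0) - k * rewards.std(dim=0)

# Toy usage with stand-in reward models (real ones would be trained networks):
rms = [lambda p, r, b=b: torch.tensor(float(len(r) % 3) + b) for b in (0.0, 0.1, 0.2)]
print(ensemble_reward(rms, "prompt", "response"))
```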
arXiv Detail & Related papers (2024-01-30T00:17:37Z) - Enhanced Local Explainability and Trust Scores with Random Forest Proximities [0.9423257767158634]
We exploit the fact that any random forest (RF) regression or classification model can be mathematically formulated as an adaptive weighted K nearest-neighbors model.
We show that this linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set.
We show how this proximity-based approach to explainability can be used in conjunction with SHAP to explain not just the model predictions, but also out-of-sample performance.
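
The weighted-KNN view can be made concrete with standard forest proximities: the proximity of a query to a training point is the fraction of trees in which they land in the same leaf, and predictions are proximity-weighted averages of training targets. The sketch below uses scikit-learn; the paper builds attributions and trust scores on top of this machinery.

```python
# Hedged sketch: random forest proximities as adaptive KNN weights.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

train_leaves = rf.apply(X)          # (n_train, n_trees) leaf indices
query_leaves = rf.apply(X[:3])      # leaf indices for a few query points

# proximity[i, j] = fraction of trees where query i and train j share a leaf
prox = (query_leaves[:, None, :] == train_leaves[None, :, :]).mean(axis=2)
weights = prox / prox.sum(axis=1, keepdims=True)
knn_pred = weights @ y              # proximity-weighted average of targets

# Note: exact equivalence to rf.predict requires per-tree leaf-size
# normalization and bootstrap weights; this simple proximity is approximate.
print(knn_pred)
print(rf.predict(X[:3]))
```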
arXiv Detail & Related papers (2023-10-19T02:42:20Z) - Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
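
EPIG is the expected mutual information between the prediction at a candidate point and predictions at target points drawn from the input distribution. The sketch below is a Monte Carlo estimate of that quantity for classification from posterior-sample predictions; the variable names and shapes are my own.

```python
# Hedged sketch: Monte Carlo EPIG for classification.
# probs_pool: posterior-sample predictions at a candidate point, (K, C).
# probs_targ: predictions at M target points, (K, M, C).
import numpy as np

def epig(probs_pool, probs_targ, eps=1e-12):
    K = probs_pool.shape[0]
    # Joint predictive p(y, y*) averaged over posterior samples: (M, C, C)
    joint = np.einsum("kc,kmd->mcd", probs_pool, probs_targ) / K
    # Product of the marginal predictives p(y) p(y*): (M, C, C)
    marg_pool = probs_pool.mean(0)                       # (C,)
    marg_targ = probs_targ.mean(0)                       # (M, C)
    indep = marg_pool[None, :, None] * marg_targ[:, None, :]
    # Mutual information per target point, averaged over targets
    mi = (joint * (np.log(joint + eps) - np.log(indep + eps))).sum(axis=(1, 2))
    return mi.mean()

rng = np.random.default_rng(0)
pp = rng.dirichlet(np.ones(3), size=10)          # K=10 posterior samples, C=3
pt = rng.dirichlet(np.ones(3), size=(10, 5))     # K=10, M=5 target points
print(epig(pp, pt))
```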
arXiv Detail & Related papers (2023-04-17T10:59:57Z) - Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale pre-trained vision models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize the fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
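
One natural reading of this regularization is a KL term pulling the fine-tuned model's predictions toward the frozen pretrained model's zero-shot prompted predictions, as sketched below; the weight `lam` and the KL direction are my assumptions, not necessarily ProReg's exact formulation.

```python
# Hedged sketch: fine-tuning loss regularized by zero-shot prompted predictions.
import torch
import torch.nn.functional as F

def proreg_loss(logits, prompted_logits, labels, lam: float = 0.5):
    ce = F.cross_entropy(logits, labels)
    kl = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.softmax(prompted_logits.detach(), dim=-1),  # frozen prompted teacher
        reduction="batchmean",
    )
    return ce + lam * kl
```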
arXiv Detail & Related papers (2023-01-29T11:53:55Z) - Multi-Aspect Explainable Inductive Relation Prediction by Sentence
Transformer [60.75757851637566]
We introduce the concepts of relation path coverage and relation path confidence to filter out unreliable paths prior to model training, improving model performance.
We propose Knowledge Reasoning Sentence Transformer (KRST) to predict inductive relations in knowledge graphs.
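
The snippet does not spell out the coverage and confidence definitions, so the sketch below assumes the classic path-ranking reading: for a relation r, coverage(p) = P(path p present | r holds) and confidence(p) = P(r holds | path p present), estimated by counting over training triples and thresholded to filter paths.

```python
# Hedged sketch: filtering relation paths by coverage and confidence.
def filter_paths(path_stats, min_cov=0.1, min_conf=0.5):
    """path_stats: dict mapping path -> (n_with_path_and_r, n_with_r, n_with_path)."""
    kept = []
    for path, (both, n_r, n_p) in path_stats.items():
        coverage = both / n_r if n_r else 0.0    # P(path | relation)
        confidence = both / n_p if n_p else 0.0  # P(relation | path)
        if coverage >= min_cov and confidence >= min_conf:
            kept.append(path)
    return kept
```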
arXiv Detail & Related papers (2023-01-04T15:33:49Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.