Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons
- URL: http://arxiv.org/abs/2505.18030v1
- Date: Fri, 23 May 2025 15:35:39 GMT
- Title: Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons
- Authors: Hazhar Rahmani, Jie Fu
- Abstract summary: This paper considers a class of preference inference problems where the user's unknown preference is represented by a preorder. We first show that a preference relation over temporal goals can be modeled by a Preference Deterministic Finite Automaton (PDFA). We develop an algorithm that guarantees to learn, given a characteristic sample, the minimal PDFA equivalent to the true PDFA from which the sample is drawn.
- Score: 28.920090391513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many preference elicitation algorithms consider preferences over propositional logic formulas or items with different attributes. In sequential decision making, a user's preference can be a preorder over possible outcomes, each of which is a temporal sequence of events. This paper considers a class of preference inference problems where the user's unknown preference is represented by a preorder over regular languages (sets of temporal sequences), referred to as temporal goals. Given a finite set of pairwise comparisons between finite words, the objective is to learn both the set of temporal goals and the preorder over these goals. We first show that a preference relation over temporal goals can be modeled by a Preference Deterministic Finite Automaton (PDFA), which is a deterministic finite automaton augmented with a preorder over acceptance conditions. The problem of preference inference thus reduces to learning the PDFA. This problem is shown to be computationally challenging: deciding whether there exists a PDFA of size smaller than a given integer $k$ that is consistent with the sample is NP-complete. We formalize the properties of characteristic samples and develop an algorithm that is guaranteed to learn, given a characteristic sample, the minimal PDFA equivalent to the true PDFA from which the sample is drawn. We present the method through a running example and provide a detailed analysis using a robotic motion planning problem.
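The abstract's central object is the PDFA: a DFA whose acceptance conditions are ordered by a preorder, so that comparing two finite words reduces to comparing the classes they reach. The Python sketch below is a minimal illustration of this modeling idea under assumed names (the `PDFA` class and the `prefers` and `consistent` helpers are hypothetical, not the paper's implementation); it shows how a candidate PDFA can be checked for consistency against a sample of pairwise comparisons.

```python
# Minimal sketch of a Preference DFA (PDFA): a DFA whose acceptance classes
# are related by a preorder. Names and encoding are illustrative assumptions,
# not the paper's implementation.

class PDFA:
    def __init__(self, states, alphabet, delta, start, state_class, preorder):
        self.states = states            # finite set of states
        self.alphabet = alphabet        # input alphabet
        self.delta = delta              # dict: (state, symbol) -> state
        self.start = start              # initial state
        self.state_class = state_class  # dict: state -> acceptance class
        self.preorder = preorder        # set of (c1, c2) pairs meaning c1 <= c2

    def run(self, word):
        """Return the acceptance class reached by a finite word."""
        q = self.start
        for a in word:
            q = self.delta[(q, a)]
        return self.state_class[q]

    def prefers(self, w1, w2):
        """True if the class reached by w1 is at least as preferred as w2's."""
        return (self.run(w2), self.run(w1)) in self.preorder


def consistent(pdfa, comparisons):
    """Check a candidate PDFA against a sample of pairwise comparisons.
    Each comparison is a pair (better_word, worse_word)."""
    return all(pdfa.prefers(better, worse) for better, worse in comparisons)


# Toy automaton over {a, b}: words ending in 'a' are preferred to words
# ending in 'b'; the empty word sits in its own, incomparable class.
delta = {("q0", "a"): "qa", ("q0", "b"): "qb",
         ("qa", "a"): "qa", ("qa", "b"): "qb",
         ("qb", "a"): "qa", ("qb", "b"): "qb"}
pdfa = PDFA(
    states={"q0", "qa", "qb"}, alphabet={"a", "b"}, delta=delta, start="q0",
    state_class={"q0": "C0", "qa": "C_a", "qb": "C_b"},
    preorder={("C0", "C0"), ("C_a", "C_a"), ("C_b", "C_b"), ("C_b", "C_a")},
)
print(consistent(pdfa, [("aba", "aab"), ("ba", "bb")]))  # True
```

In this toy example every word ending in 'a' is preferred to every word ending in 'b', so both sample comparisons are consistent with the automaton; learning, as in the paper, goes in the other direction, from such comparisons to a minimal PDFA.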
Related papers
- Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection [37.54564513506548]
Generative Language Models rely on autoregressive decoding to produce the output sequence token by token. We evaluate a comprehensive collection of decoding-free candidate selection approaches on a broad set of tasks.
arXiv Detail & Related papers (2025-01-28T23:21:28Z) - FastGAS: Fast Graph-based Annotation Selection for In-Context Learning [53.17606395275021]
In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts.
Existing methods have proposed to select a subset of unlabeled examples for annotation.
We propose a graph-based selection method, FastGAS, designed to efficiently identify high-quality instances.
arXiv Detail & Related papers (2024-06-06T04:05:54Z) - Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies [25.731912021122287]
We consider systems modeled as Markov decision processes, given a partially ordered preference over a set of temporally extended goals.
To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP.
A most preferred policy under an ordering induces a nondominated probability distribution over the finite paths in the MDP.
arXiv Detail & Related papers (2024-03-27T02:46:09Z) - Learning to Select and Rank from Choice-Based Feedback: A Simple Nested Approach [10.293894471295205]
We study a ranking and selection problem of learning from choice-based feedback with dynamic assortments. We present novel and simple algorithms for both learning goals.
arXiv Detail & Related papers (2023-07-13T05:05:30Z) - Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives [26.180359884973566]
We study temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs).
This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces.
We formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences.
arXiv Detail & Related papers (2023-04-23T13:03:27Z) - Probabilistic Planning with Partially Ordered Preferences over Temporal Goals [22.77805882908817]
We study planning in Markov decision processes (MDPs) with preferences over temporally extended goals.
We introduce a variant of deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals.
We prove that a weak-stochastic nondominated policy given the preference specification is optimal in the constructed multi-objective MDP.
arXiv Detail & Related papers (2022-09-25T17:13:24Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
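The graph-based least-squares entry above ranks objects from noisy pairwise comparisons, which connects naturally to the pairwise-comparison samples used in the main paper. The sketch below is a generic illustration of that least-squares idea, not the authors' algorithm: it assumes each comparison comes with a noisy quality gap and solves the resulting over-determined linear system with NumPy (the function name and data format are illustrative assumptions).

```python
import numpy as np


def least_squares_ranking(n, comparisons):
    """Estimate item qualities from noisy pairwise comparisons.

    comparisons: list of (i, j, y) meaning item i beat item j with an
    observed (noisy) quality gap y. Solves min_q sum (q_i - q_j - y)^2,
    a generic graph least-squares sketch rather than the cited paper's
    exact algorithm. Qualities are identifiable only up to an additive
    constant, so the result is centered to mean zero.
    """
    # Incidence-style design matrix: one row per comparison.
    A = np.zeros((len(comparisons), n))
    y = np.zeros(len(comparisons))
    for r, (i, j, gap) in enumerate(comparisons):
        A[r, i] = 1.0
        A[r, j] = -1.0
        y[r] = gap
    # lstsq handles the rank deficiency (columns of A sum to zero).
    q, *_ = np.linalg.lstsq(A, y, rcond=None)
    return q - q.mean()


# Toy usage: 3 items, item 0 strongest, item 2 weakest.
comps = [(0, 1, 1.1), (1, 2, 0.9), (0, 2, 2.0)]
print(least_squares_ranking(3, comps))  # roughly [ 1.0, 0.0, -1.0 ]
```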
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.