LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning

Zeyu Feng†1, Hao Luan†1, Pranav Goyal1, and Harold Soh1,2
† Equal contribution.
1 All authors are with the Department of Computer Science, School of Computing, National University of Singapore, Singapore. {zeyu, haoluan, pgoyal, harold}@comp.nus.edu.sg
2 H. S. is also with the Smart Systems Institute, National University of Singapore.
Abstract

Operating effectively in complex environments while complying with specified constraints is crucial for the safe and successful deployment of robots that interact with and operate around people. In this work, we focus on generating long-horizon trajectories that adhere to static and temporally-extended constraints/instructions at test time. We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using finite linear temporal logic ($\textnormal{LTL}_f$). LTLDoG leverages a satisfaction value function on $\textnormal{LTL}_f$ and guides the sampling steps using its gradient field. This value function can also be trained to generalize to new instructions not observed during training, enabling flexible test-time adaptability. Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences. Code and supplementary material are available online at https://github.com/clear-nus/ltldog.

I Introduction

Recent methodologies [1, 2, 3] utilizing data-driven diffusion models [4, 5, 6] have shown remarkable performance in generating robot behaviors across a wide range of tasks. Thanks to their ability to model complex distributions, these methods have surpassed several leading offline reinforcement learning techniques and classical model-based trajectory optimization methods, especially in long-horizon decision-making tasks [1, 3]. However, while conventional diffusion models excel at learning from training datasets, they lack the ability to adapt to new objectives or comply with new constraints during deployment. This shortcoming can lead to unsafe behaviors, posing risks to humans, robots, and their surrounding environment.

In view of this limitation, there has been very recent work on diffusing safe trajectories. Xiao et al. [7] integrated a dynamics model into the denoising diffusion process and incorporated a class of Control Barrier Functions (CBF) to meet safety criteria. Botteghi et al. [8] approached the issue by embedding both safety and reward considerations into a constrained optimization framework, employing CBF constraints as labels for classifier guidance. However, these approaches primarily address static environmental constraints. For example, while they can maneuver around obstacles on a local scale, they fail to comply with more complex temporally-extended directives such as “avoid the kitchen until you are clean”.

Figure 1: We present LTLDoG, a diffusion-based planning framework for generating trajectories that comply with specified $\textnormal{LTL}_f$ formulae. In the example above, a robot dog is tasked to arrive at the goal position (A), but first has to visit B and avoid obstacles (crosses).

In this work, we propose an alternative approach to flexible trajectory planning with diffusion models, designed to satisfy both static safety requirements and temporal constraints. The core idea is to plan with diffusion models so as to satisfy finite linear temporal logic ($\textnormal{LTL}_f$) formulae [9]. $\textnormal{LTL}_f$ can express a broad spectrum of instructions and constraints that might emerge during deployment. For example, $\textnormal{LTL}_f$ can describe a visitation order over different objects and locations, while propositional operators such as negation delineate safe regions within the state space.

We develop LTLDoG (LTL Diffusion-orienting Guidance, pronounced “Little Dog”), a posterior-sampling-based diffusion framework that accommodates finite LTL formulae at test time. We present two variants of LTLDoG. Our main method, LTLDoG-S, applies to the generation of finite-length trajectories in robot tasks where the labeling function for propositional events is differentiable. LTLDoG-S employs a differentiable formula checker in conditional sampling: specifically, we modify the reverse process to condition on the criterion that the final (predicted) trajectory satisfies a given $\textnormal{LTL}_f$ formula. When a differentiable labeling function is unavailable, we propose LTLDoG-R, which uses a trained $\textnormal{LTL}_f$ neural-symbolic evaluator for posterior sampling. Notably, neither variant requires collecting expert demonstrations for every potential $\textnormal{LTL}_f$ instruction. Both retain the temporal compositionality and local consistency properties associated with diffusion models [1]: as long as the dataset contains a diverse set of paths, they can potentially “stitch together” snippets of trajectories from the training data to generate plans for unseen $\textnormal{LTL}_f$ formulae.

Experiments on two benchmark environments (long-horizon planning for navigation and policy learning for manipulation) demonstrate that LTLDoG is able to generate trajectories that satisfy feasible safety and temporal constraints. We find that our methods can re-plan alternative paths at a high level based on the given instruction. Moreover, real-robot experiments show that the generated trajectories can be successfully transferred to a quadruped robot. In summary, this paper makes three key contributions:

  • A conditional trajectory sampling approach designed for $\textnormal{LTL}_f$ instructions that leverages pre-trained diffusion models;

  • A regressor-guidance neural network for diffusion that generalizes to novel $\textnormal{LTL}_f$ formulae within a given template structure;

  • Experimental results on benchmark problems and real-world demonstrations that validate the effectiveness of planning with safety and temporal constraints.

From a broader perspective, LTLDoG is the first method that fuses symbolic model checking (using $\textnormal{LTL}_f$) with expressive diffusion-based generative models. We hope our results lay the groundwork for performant, yet safer and more trustworthy, robots.

II Preliminaries and Notation

In this work, our focus is to extend diffusion-based planning methods towards generating trajectories that comply with specified $\textnormal{LTL}_f$ formulae. Here, we provide a concise introduction to diffusion methods in the context of planning and to finite linear temporal logic.

II-A Planning with Diffusion

Many tasks in planning, reinforcement learning, and imitation learning require generating trajectories under some specific objective. Let $\mathcal{S}$ and $\mathcal{A}$ denote the state and action spaces, respectively. We use $\bm{\tau}=\left(\mathbf{s}_0,\mathbf{a}_0,\mathbf{s}_1,\mathbf{a}_1,\ldots,\mathbf{s}_T,\mathbf{a}_T\right)$ to denote a trajectory, where $T$ is the planning horizon. The environment transitions to a new state $\mathbf{s}_{t+1}$ when an agent executes action $\mathbf{a}_t$ at state $\mathbf{s}_t$. Let $\mathcal{J}(\bm{\tau}|g)$ denote the objective value function conditioned on a goal state $g$ at which the trajectory must terminate, for example, the (discounted) cumulative reward in reinforcement learning, the cumulative action error in imitation learning, or a cost for safety constraints.

Diffusion-based planning methods directly generate partial or entire trajectories using diffusion models pre-trained on a dataset of trajectories. Let $p_0\left(\bm{\tau}^0\right)$ denote the distribution of trajectories in the dataset, where $\bm{\tau}^0$ represents a noiseless trajectory. Given an $N$-step discrete approximation of the forward diffusion process $p\left(\bm{\tau}^i|\bm{\tau}^{i-1}\right)$ that slowly corrupts data by adding prespecified noise, diffusion models learn an iterative denoising procedure by approximating the score function $\nabla_{\bm{\tau}^i}\log p_i\left(\bm{\tau}^i\right)$ with a step-dependent neural network $s_\theta$ trained via denoising score matching [10]:

$$\theta^{*}=\operatorname*{arg\,min}_{\theta}\,\mathbb{E}_{i,\bm{\tau}^{i},\bm{\tau}^{0}}\left[\left\lVert s_{\theta}\left(\bm{\tau}^{i},i\right)-\nabla_{\bm{\tau}^{i}}\log p\left(\bm{\tau}^{i}|\bm{\tau}^{0}\right)\right\rVert^{2}\right],\tag{1}$$

in which $i\sim\mathcal{U}\{1,2,\ldots,N\}$ is the diffusion timestep, and $\bm{\tau}^i\sim p\left(\bm{\tau}^i|\bm{\tau}^0\right)$ is the trajectory $\bm{\tau}^0$ corrupted with noise. Throughout the paper, we adopt Denoising Diffusion Probabilistic Models (DDPM) [5] as the sampling method, where $p\left(\bm{\tau}^i|\bm{\tau}^0\right)=\mathcal{N}\left(\sqrt{\bar{\alpha}_i}\,\bm{\tau}^0,\left(1-\bar{\alpha}_i\right)\bm{I}\right)$, $\bar{\alpha}_i:=\prod_{j=1}^{i}\alpha_j$, $\alpha_i:=1-\beta_i$, and $\{\beta_i\}$ is a sequence of positive noise scales $0<\beta_1,\beta_2,\ldots,\beta_N<1$.
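To make the forward corruption and the regression target in (1) concrete, the sketch below implements them in plain Python for a trajectory flattened into a list of floats. It assumes a linear schedule $\beta_i$ from $10^{-4}$ to $0.02$ (a common DDPM default; the schedule is not specified here), and all function names are hypothetical. Because $\bm{\tau}^i=\sqrt{\bar{\alpha}_i}\,\bm{\tau}^0+\sqrt{1-\bar{\alpha}_i}\,\bm{\epsilon}$ with $\bm{\epsilon}\sim\mathcal{N}(\bm{0},\bm{I})$, the target $\nabla_{\bm{\tau}^i}\log p(\bm{\tau}^i|\bm{\tau}^0)$ reduces to $-\bm{\epsilon}/\sqrt{1-\bar{\alpha}_i}$.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise scales beta_1..beta_N in (0, 1) (an assumed schedule)."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + k * step for k in range(n_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_i = prod_{j<=i} (1 - beta_j); list index k <-> step i = k + 1."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(tau0, alpha_bar_i, rng):
    """Draw tau^i ~ N(sqrt(alpha_bar_i) tau^0, (1 - alpha_bar_i) I); also return the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in tau0]
    tau_i = [math.sqrt(alpha_bar_i) * x + math.sqrt(1.0 - alpha_bar_i) * e
             for x, e in zip(tau0, eps)]
    return tau_i, eps


def dsm_target(tau_i, tau0, alpha_bar_i):
    """Score of the Gaussian kernel, grad log p(tau^i | tau^0)."""
    return [-(xi - math.sqrt(alpha_bar_i) * x0) / (1.0 - alpha_bar_i)
            for xi, x0 in zip(tau_i, tau0)]


def dsm_loss(score_fn, tau0, abars, rng):
    """One-sample Monte Carlo estimate of the objective in (1) for a single trajectory."""
    i = rng.randint(1, len(abars))  # i ~ U{1, ..., N}
    ab = abars[i - 1]
    tau_i, _ = q_sample(tau0, ab, rng)
    target = dsm_target(tau_i, tau0, ab)
    pred = score_fn(tau_i, i)
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

Here a placeholder such as `lambda tau, i: [0.0] * len(tau)` can stand in for $s_\theta$; an actual score network would be trained by minimizing the average of this loss over the dataset and over $i$.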

II-B Linear Temporal Logic (LTL)

Given a finite set of propositional symbols $\mathcal{P}$, the formula set $\Psi$ of $\textnormal{LTL}_f$ contains formulas recursively defined in Backus-Naur form as follows [11, 12]:

$$\varphi:=p\;|\;\neg\varphi\;|\;\varphi\wedge\psi\;|\;\bigcirc\varphi\;|\;\varphi\,\mathsf{U}\,\psi,$$

where $p\in\mathcal{P}$ and $\varphi,\psi\in\Psi$. Intuitively, the formula $\bigcirc\varphi$ (next $\varphi$) is satisfied if $\varphi$ is satisfied at the next time step. The formula $\varphi\,\mathsf{U}\,\psi$ ($\varphi$ until $\psi$) is satisfied if $\varphi$ holds until $\psi$ is satisfied, and $\psi$ is satisfied by the end of the sequence. From these, other commonly used logical connectives and temporal operators can be defined through the following equivalences: $\varphi\vee\psi=\neg\left(\neg\varphi\wedge\neg\psi\right)$, $\Diamond\varphi=\mathsf{true}\,\mathsf{U}\,\varphi$ (eventually $\varphi$), and $\Box\varphi=\neg\Diamond\left(\neg\varphi\right)$ (always $\varphi$). The symbols $\mathsf{true}$ and $\mathsf{false}$ can also be included in the formula set, defined by $\mathsf{true}=\varphi\vee\neg\varphi$ and $\mathsf{false}=\neg\mathsf{true}$.
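As a worked illustration (our own example, not a formula taken from the experiments), one possible encoding of the Figure 1 task, reaching A after visiting B while always avoiding an obstacle region, uses hypothetical propositions $a$, $b$, $o$ and can be rewritten into the core grammar with the equivalences above:

```latex
\begin{aligned}
\varphi_{\text{ex}} &= \Box\neg o \,\wedge\, \Diamond\left(b \wedge \Diamond a\right) \\
  &= \neg\Diamond\left(\neg\neg o\right) \,\wedge\, \left(\mathsf{true}\,\mathsf{U}\,\left(b \wedge \left(\mathsf{true}\,\mathsf{U}\,a\right)\right)\right) \\
  &\equiv \neg\left(\mathsf{true}\,\mathsf{U}\,o\right) \,\wedge\, \left(\mathsf{true}\,\mathsf{U}\,\left(b \wedge \left(\mathsf{true}\,\mathsf{U}\,a\right)\right)\right)
\end{aligned}
```

Only $\neg$, $\wedge$, and $\mathsf{U}$ remain in the last line; the double negation $\neg\neg o$ collapses to $o$ because a formula and its double negation are satisfied by exactly the same sequences.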

In contrast to propositional logic, these formulas are evaluated over finite sequences of observations $\bm{\sigma}=\langle\sigma_0,\sigma_1,\sigma_2,\ldots,\sigma_T\rangle$ (i.e., truth assignments to the propositional symbols in $\mathcal{P}$), where $\sigma_t\in\{0,1\}^{|\mathcal{P}|}$ and $\sigma_{t,p}=1$ iff proposition $p\in\mathcal{P}$ is satisfied at time step $t$. $\mathsf{true}$ ($\mathsf{false}$) is always (never) satisfied by any assignment. Formally, whether $\bm{\sigma}$ satisfies $\varphi$ at time $0\leq t\leq T$, denoted by $\langle\bm{\sigma},t\rangle\models\varphi$, is defined recursively as follows:

  • $\langle\bm{\sigma},t\rangle\models p$ iff $\sigma_{t,p}=1$, where $p\in\mathcal{P}$

  • $\langle\bm{\sigma},t\rangle\models\neg\varphi$ iff $\langle\bm{\sigma},t\rangle\not\models\varphi$

  • $\langle\bm{\sigma},t\rangle\models(\varphi\wedge\psi)$ iff $\langle\bm{\sigma},t\rangle\models\varphi$ and $\langle\bm{\sigma},t\rangle\models\psi$

  • $\langle\bm{\sigma},t\rangle\models\bigcirc\varphi$ iff $t<T$ and $\langle\bm{\sigma},t+1\rangle\models\varphi$

  • $\langle\bm{\sigma},t\rangle\models\varphi\,\mathsf{U}\,\psi$ iff $\exists\,t_2\in[t,T]$ s.t. $\langle\bm{\sigma},t_2\rangle\models\psi$ and $\forall\,t_1\in[t,t_2)$, $\langle\bm{\sigma},t_1\rangle\models\varphi$

A sequence $\bm{\sigma}$ is then said to satisfy $\varphi$, i.e., $\bm{\sigma}\models\varphi$, iff $\langle\bm{\sigma},0\rangle\models\varphi$.
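The recursive definition above translates almost line for line into a checker. The sketch below is a minimal Python illustration, assuming formulas are encoded as nested tuples and each $\sigma_t$ as a dictionary from proposition names to 0/1; this encoding and the helper names (`holds`, `satisfies`, `or_`, `eventually`, `always`) are hypothetical and not taken from the paper's code. The derived operators are built from the equivalences in Section II-B.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical formula encoding as nested tuples:
#   "p" (atomic), "true", ("not", f), ("and", f, g), ("next", f), ("until", f, g)
Formula = Union[str, Tuple]
Trace = List[Dict[str, int]]  # sigma_t maps each proposition name to 0 or 1


def holds(trace: Trace, t: int, phi: Formula) -> bool:
    """Return True iff <sigma, t> |= phi under the finite-trace semantics."""
    T = len(trace) - 1
    if isinstance(phi, str):
        if phi == "true":
            return True
        return trace[t][phi] == 1  # raises KeyError for an unknown proposition
    op = phi[0]
    if op == "not":
        return not holds(trace, t, phi[1])
    if op == "and":
        return holds(trace, t, phi[1]) and holds(trace, t, phi[2])
    if op == "next":
        # Finite traces: a next step t + 1 must exist.
        return t < T and holds(trace, t + 1, phi[1])
    if op == "until":
        for t2 in range(t, T + 1):
            if holds(trace, t2, phi[2]):
                return True   # every t1 in [t, t2) already satisfied phi[1]
            if not holds(trace, t2, phi[1]):
                return False  # phi[1] failed before phi[2] held
        return False          # phi[2] never held by the end of the trace
    raise ValueError(f"unknown operator {op!r}")


def satisfies(trace: Trace, phi: Formula) -> bool:
    """sigma |= phi iff <sigma, 0> |= phi."""
    return holds(trace, 0, phi)


def or_(f: Formula, g: Formula) -> Formula:
    """f or g  ==  not (not f and not g)."""
    return ("not", ("and", ("not", f), ("not", g)))


def eventually(f: Formula) -> Formula:
    """Eventually f  ==  true U f."""
    return ("until", "true", f)


def always(f: Formula) -> Formula:
    """Always f  ==  not eventually (not f)."""
    return ("not", eventually(("not", f)))
```

For instance, with `trace = [{"o": 0, "b": 0, "a": 0}, {"o": 0, "b": 1, "a": 0}, {"o": 0, "b": 0, "a": 1}]`, the formula `("and", always(("not", "o")), eventually(("and", "b", eventually("a"))))` is satisfied, while the reversed trace (A before B) is not. This direct recursion re-evaluates subformulas and is meant only for illustration; practical checkers typically memoize per (subformula, $t$) pair or compile the formula into a finite automaton.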

III Method

In this section, we describe our primary contribution, LTLDoG, a diffusion-based framework for generating trajectories that satisfy $\textnormal{LTL}_f$ formulae. We first discuss how to sample conditionally from diffusion models, and then how $\textnormal{LTL}_f$ formulae can be used to guide the diffusion process.

III-A Conditional Sampling in Diffusion Models

Given a trained score function from (1) such that $s_\theta\approx\nabla_{\bm{\tau}^i}\log p_i\left(\bm{\tau}^i\right)$, a diffusion model denoises samples according to the distribution $p_\theta\left(\bm{\tau}^{i-1}|\bm{\tau}^i\right)=\mathcal{N}\left(\frac{1}{\sqrt{\alpha_i}}\left(\bm{\tau}^i+\left(1-\alpha_i\right)s_\theta\left(\bm{\tau}^i,i\right)\right),\left(1-\alpha_i\right)\bm{I}\right)$, starting from a Gaussian prior $\bm{\tau}^N\sim\mathcal{N}\left(\bm{0},\bm{I}\right)$.
For example, Diffuser [1] samples a trajectory $\bm{\tau}^0$ from a diffusion model, which an agent then executes. However, this sampling process by itself offers no control over detailed properties of the generated trajectories.
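The reverse process above can be sketched as plain ancestral sampling. In the minimal example below, the learned network $s_\theta$ is replaced by a stand-in assumed purely for illustration: the exact score of $p_i$ when every coordinate of $\bm{\tau}^0$ is drawn from $\mathcal{N}(\mu,s_0^2)$. The linear $\beta$ schedule and all function names are likewise assumptions; this is not Diffuser's implementation.

```python
import math
import random


def make_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule; returns (betas, alphas, alpha_bars), list index k <-> step i = k + 1."""
    betas = [beta_start + k * (beta_end - beta_start) / (n_steps - 1) for k in range(n_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def ddpm_sample(score_fn, dim, schedule, rng):
    """tau^{i-1} ~ N((tau^i + (1 - alpha_i) s(tau^i, i)) / sqrt(alpha_i), (1 - alpha_i) I), i = N..1."""
    betas, alphas, _ = schedule
    tau = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # tau^N ~ N(0, I)
    for i in range(len(betas), 0, -1):
        a, b = alphas[i - 1], betas[i - 1]
        s = score_fn(tau, i)
        mean = [(x + b * g) / math.sqrt(a) for x, g in zip(tau, s)]
        # Common DDPM convention (assumed here): no noise is added at the final step i = 1.
        std = math.sqrt(b) if i > 1 else 0.0
        tau = [m + std * rng.gauss(0.0, 1.0) for m in mean]
    return tau


def gaussian_data_score(mu, s0, alpha_bars):
    """Exact score of p_i when each coordinate of tau^0 ~ N(mu, s0^2); a toy stand-in for s_theta."""
    def score(tau, i):
        ab = alpha_bars[i - 1]
        var = ab * s0 ** 2 + (1.0 - ab)  # variance of the noised marginal p_i
        return [-(x - math.sqrt(ab) * mu) / var for x in tau]
    return score
```

With a toy target of $\mu=2$ and $s_0=0.5$, the empirical mean and standard deviation of a few hundred one-dimensional samples should land close to 2 and 0.5, which gives a quick sanity check that the update matches the Gaussian reverse kernel written above.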

Here, we are interested in sampling trajectories that satisfy both the final goal and the instructions encoded as an $\textnormal{LTL}_f$ formula $\varphi$ provided during deployment. In other words, we aim to sample trajectories under an objective function $\mathcal{J}_\varphi\left(\bm{\tau}^0|g\right)$. For example, $\mathcal{J}_\varphi$ can take a high value if the events induced by $\bm{\tau}^0$ satisfy $\varphi$ and a low value otherwise.

Formally, given a set of atomic propositions $\mathcal{P}$, the assignments for $\bm{\tau}$ are given by a labeling function $L:\mathcal{S}\times\mathcal{A}\rightarrow\{0,1\}^{|\mathcal{P}|}$, so that each timestep of $\bm{\tau}$ induces an assignment to the propositional symbols in $\mathcal{P}$. For example, in a navigation task, $\mathcal{P}$ can represent multiple regions to avoid, and $L$ indicates whether each $(\mathbf{s}_t,\mathbf{a}_t)$ in $\bm{\tau}$ lies inside these regions. We slightly abuse notation and write $\bm{\tau}\models\varphi$ to indicate that the assignments $\bm{\sigma}$ of $\bm{\tau}$ satisfy an $\textnormal{LTL}_f$ formula $\varphi$.
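A binary labeling function of this kind is straightforward to write for circular regions. The minimal sketch below assumes that the first two state coordinates are a planar position, ignores the action, and uses hypothetical names (`label_step`, `label_trajectory`); it returns one 0/1 assignment per proposition per timestep, i.e., the sequence $\bm{\sigma}$ induced by $\bm{\tau}$.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical setup: each proposition names a circular region ((center_x, center_y), radius).
Region = Tuple[Tuple[float, float], float]


def label_step(state: Sequence[float], regions: Dict[str, Region]) -> Dict[str, int]:
    """Binary L(s_t, a_t): proposition p is 1 iff the 2-D position lies inside region p.

    Only the first two state coordinates are used; the action is ignored in this sketch.
    """
    x, y = state[0], state[1]
    return {p: int(math.hypot(x - cx, y - cy) <= r) for p, ((cx, cy), r) in regions.items()}


def label_trajectory(states: List[Sequence[float]],
                     regions: Dict[str, Region]) -> List[Dict[str, int]]:
    """Assignments sigma = <sigma_0, ..., sigma_T>, one dictionary per timestep."""
    return [label_step(s, regions) for s in states]
```

For example, with an obstacle `"o"` of radius 1 at the origin and a region `"b"` of radius 0.5 at `(3, 0)`, the positions `(-2, 0)`, `(0.5, 0.5)`, `(3.1, 0.2)` are labeled as outside both, inside `"o"`, and inside `"b"`, respectively. Because the output is a step function of the position, it provides no gradient signal.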

We aim to sample from the posterior $p_0\left(\bm{\tau}^0|\bm{\tau}^0\models\varphi,g\right)$ with the diffusion model as the prior. In this work, the goal-state condition $g$ is either enforced by inpainting (as is the start state $\mathbf{s}_0$) or implicitly encoded in the dataset, so it does not require separate modeling in conditional sampling. Therefore, we mainly target the posterior $p_0\left(\bm{\tau}^0|\bm{\tau}^0\models\varphi\right)$. Let $y_0$ denote a binary random variable indicating whether $\bm{\tau}^0$ satisfies $\varphi$.
Hence, the denoising process requires a score function conditioned on $y_0=1$, which by Bayes' rule is $\nabla_{\bm{\tau}^i}\log p_i\left(\bm{\tau}^i|y_0=1\right)=\nabla_{\bm{\tau}^i}\log p_i\left(\bm{\tau}^i\right)+\nabla_{\bm{\tau}^i}\log p_i\left(y_0=1|\bm{\tau}^i\right)$ (the evidence $p(y_0=1)$ does not depend on $\bm{\tau}^i$, so its gradient vanishes). The first term on the right-hand side has already been learned by the diffusion model's neural network.
However, the latter term requires integrating over all possible values of $\bm{\tau}^0$, $p_i\left(y_0|\bm{\tau}^i\right)=\int p\left(y_0|\bm{\tau}^0\right)p_i\left(\bm{\tau}^0|\bm{\tau}^i\right)\mathrm{d}\bm{\tau}^0$, which is intractable in general.
We consider the plug-and-play conditional generation setting and approximate this integral with a sample estimate [13], e.g., the point estimate $p_i\left(y_0|\bm{\tau}^i\right)\approx p\left(y_0|\hat{\bm{\tau}}^0\right)$, where the noiseless trajectory $\hat{\bm{\tau}}^0$ is estimated via Tweedie's formula [14]: $\hat{\bm{\tau}}^0=\frac{1}{\sqrt{\bar{\alpha}_i}}\left(\bm{\tau}^i+\left(1-\bar{\alpha}_i\right)\nabla_{\bm{\tau}^i}\log p_i\left(\bm{\tau}^i\right)\right)$.
We model the likelihood term as $p\left(y|\bm{\tau}\right)=Ze^{\mathds{1}\left[y=\mathds{1}\left[\bm{\tau}\models\varphi\right]\right]}$, where $Z$ is a normalizing constant and $\mathds{1}\left[\cdot\right]$ is the indicator function. Putting the above elements together, the conditional score function can be computed as follows:

$$\begin{aligned}
\nabla_{\bm{\tau}^i}\log p_i\left(y_0|\bm{\tau}^i\right)
&\approx\nabla_{\bm{\tau}^i}\log p\left(y_0|\hat{\bm{\tau}}^0\right)\\
&=\nabla_{\bm{\tau}^i}\log\left(Ze^{\mathds{1}\left[y_0=\mathds{1}\left[\hat{\bm{\tau}}^0\models\varphi\right]\right]}\right)\\
&=\nabla_{\bm{\tau}^i}\mathds{1}\left[y_0=\mathds{1}\left[\hat{\bm{\tau}}^0\models\varphi\right]\right].
\end{aligned}\tag{2}$$

Unfortunately, both the indicator function and the satisfaction evaluation (performed by techniques such as model checking with finite automata) are non-differentiable: the indicator is piecewise constant, so its gradient is zero wherever it is defined and gives no direction in which to move the sample. This prohibits their use in the gradient-based sampling process of diffusion models. We address this problem in the next subsection.
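To make the problem concrete, the sketch below compares finite-difference gradients of the indicator-based log-likelihood from (2) (up to the constant $\log Z$, with $y = 1$) against those of a simple hard-min margin on a toy one-dimensional trajectory. The constraint (stay below a threshold at every step), the threshold value, and all helper names are hypothetical.

```python
def indicator_loglik(traj, limit=0.5):
    # 1[y = 1[tau |= phi]] with y = 1, for the hypothetical phi = "always x_t < limit".
    return 1.0 if all(x < limit for x in traj) else 0.0

def margin_value(traj, limit=0.5):
    # Hard quantitative surrogate: worst-case margin min_t (limit - x_t).
    return min(limit - x for x in traj)

def finite_diff_grad(fn, traj, h=1e-4):
    # Central finite differences, one coordinate at a time.
    grad = []
    for t in range(len(traj)):
        up, dn = list(traj), list(traj)
        up[t] += h
        dn[t] -= h
        grad.append((fn(up) - fn(dn)) / (2 * h))
    return grad

traj = [0.1, 0.9, 0.3]  # violates the constraint at t = 1
print(finite_diff_grad(indicator_loglik, traj))  # all zeros: no direction to move
print(finite_diff_grad(margin_value, traj))      # nonzero only at the violating step
```

The indicator's gradient vanishes even though the trajectory violates the constraint, whereas the margin pushes the violating step back toward the safe side. The hard $\min$ is itself only piecewise differentiable, which motivates smooth operators.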

III-B Differentiable Evaluation of $\text{LTL}_f$

Our key approach is to “soften” the satisfaction evaluation. To make the evaluation differentiable, we modify our formula evaluator to output positive real values if a trajectory satisfies the $\text{LTL}_f$ formula, and negative real values otherwise. Instead of using a binary labeling function, we assume that real-valued assignments of atomic propositions can be computed from the generated trajectories. As a specific example, consider a navigation task where a robot has to avoid obstacles; one can determine the assignment for proposition $p$ at timestep $t$ using the Euclidean distance between $\mathbf{s}_t$ and the center $\mathbf{c}_p$ of the corresponding region (assuming a circular shape). A positive value of $\ell(p,t) = r_p - \lVert \mathbf{s}_t - \mathbf{c}_p \rVert_2$, where $r_p$ is the radius of the circle, indicates a true assignment. Consequently, the labeling function for the entire trajectory $L: (\mathcal{S} \times \mathcal{A})^{T+1} \rightarrow \mathbb{R}^{|\mathcal{P}| \times (T+1)}$ is differentiable, and the assignments are $\bm{\sigma} = L(\bm{\tau})$.
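A minimal sketch of the circular-region labeling function described above, assuming 2-D positions, ignoring actions, and storing $\bm{\sigma}$ as a list of per-timestep dictionaries rather than a $|\mathcal{P}| \times (T+1)$ array. The function name and the region format are hypothetical.

```python
import math

def label_trajectory(states, regions):
    """Real-valued assignments sigma[t][p] = r_p - ||s_t - c_p||_2 (positive = inside p).

    states:  list of T+1 positions (x, y); actions are ignored in this sketch.
    regions: dict mapping proposition name -> (center (x, y), radius).
    """
    sigma = []
    for s in states:
        row = {}
        for name, (center, radius) in regions.items():
            row[name] = radius - math.dist(s, center)  # Euclidean distance to the center
        sigma.append(row)
    return sigma

regions = {"p": ((1.0, 1.0), 0.5)}  # hypothetical circular region
states = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.0)]
sigma = label_trajectory(states, regions)
print([round(row["p"], 3) for row in sigma])  # [-0.914, 0.3, -0.5]
```

Only the second state lies inside the region, so only its assignment is positive; the magnitude encodes how far inside or outside each state is.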

Binarizing $\bm{\sigma}$ with the sign function, $\text{sgn}(\bm{\sigma})$, recovers Boolean assignments that fit the definition of satisfaction in Section II-B, but the sign operation breaks differentiability. We therefore employ a formula evaluator $f_t(\varphi, \bm{\sigma}_{t:T}): \Psi \times \mathbb{R}^{|\mathcal{P}| \times (T-t+1)} \rightarrow \mathbb{R}$ to check satisfaction [15], where positive values imply $\langle \bm{\sigma}, t \rangle \models \varphi$, similar to the robustness semantics of signal temporal logic [16]. The evaluator $f$ is differentiable, with the evaluation defined recursively as follows:

  • $f_t(\mathsf{true}, \bm{\sigma}) = +\infty$

  • $f_t(\mathsf{false}, \bm{\sigma}) = -\infty$

  • $f_t(p, \bm{\sigma}) = \bm{\sigma}_{t,p}$

  • $f_t(\neg\varphi, \bm{\sigma}) = -f_t(\varphi, \bm{\sigma})$

  • $f_t(\varphi \wedge \psi, \bm{\sigma}) = \min^{\gamma}\{f_t(\varphi, \bm{\sigma}), f_t(\psi, \bm{\sigma})\}$

  • $f_t(\varphi \vee \psi, \bm{\sigma}) = \max^{\gamma}\{f_t(\varphi, \bm{\sigma}), f_t(\psi, \bm{\sigma})\}$

  • $f_t(\bigcirc\varphi, \bm{\sigma}) = f_{t+1}(\varphi, \bm{\sigma})$

  • $f_t(\Box\varphi, \bm{\sigma}) = \min^{\gamma}\{f_{t:T}(\varphi, \bm{\sigma})\}$

  • $f_t(\Diamond\varphi, \bm{\sigma}) = \max^{\gamma}\{f_{t:T}(\varphi, \bm{\sigma})\}$

  • $f_t(\varphi \,\mathsf{U}\, \psi, \bm{\sigma}) = \min^{\gamma}\{f_{t:k}(\varphi, \bm{\sigma}), f_t(\Diamond\psi, \bm{\sigma})\}$, where $k \geq t$ is the smallest integer such that $f_k(\psi, \bm{\sigma}) > 0$
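The recursive definition above maps directly onto a small interpreter. The sketch below is an illustration under stated assumptions, not the implementation from [15]: formulae are hypothetical nested tuples, $\min^{\gamma}$ and $\max^{\gamma}$ are assumed to be log-sum-exp operators with temperature $\gamma$, $\bigcirc$ at the final step is treated as a strong next (value $-\infty$), and if no $k$ with $f_k(\psi, \bm{\sigma}) > 0$ exists the sketch falls back to $k = T$.

```python
import math

def soft_max(xs, gamma):
    """Assumed max^gamma: gamma * log(sum(exp(x / gamma))), shifted for stability."""
    m = max(xs)
    if math.isinf(m):  # +inf dominates; an all -inf list stays -inf
        return m
    return m + gamma * math.log(sum(math.exp((x - m) / gamma) for x in xs))

def soft_min(xs, gamma):
    return -soft_max([-x for x in xs], gamma)

def evaluate(phi, sigma, t=0, gamma=0.1):
    """Quantitative value f_t(phi, sigma); sigma is a list of dicts {prop: value}."""
    T = len(sigma) - 1
    op = phi[0]
    if op == "true":
        return math.inf
    if op == "false":
        return -math.inf
    if op == "ap":
        return sigma[t][phi[1]]
    if op == "not":
        return -evaluate(phi[1], sigma, t, gamma)
    if op in ("and", "or"):
        vals = [evaluate(phi[1], sigma, t, gamma), evaluate(phi[2], sigma, t, gamma)]
        return soft_min(vals, gamma) if op == "and" else soft_max(vals, gamma)
    if op == "next":
        # Assumption: strong next, so it evaluates to -inf at the last step.
        return evaluate(phi[1], sigma, t + 1, gamma) if t < T else -math.inf
    if op in ("always", "eventually"):
        vals = [evaluate(phi[1], sigma, j, gamma) for j in range(t, T + 1)]
        return soft_min(vals, gamma) if op == "always" else soft_max(vals, gamma)
    if op == "until":
        lhs, rhs = phi[1], phi[2]
        # k: first step j >= t with f_j(rhs) > 0 (assumption: k = T if none exists).
        k = next((j for j in range(t, T + 1) if evaluate(rhs, sigma, j, gamma) > 0), T)
        vals = [evaluate(lhs, sigma, j, gamma) for j in range(t, k + 1)]
        vals.append(evaluate(("eventually", rhs), sigma, t, gamma))
        return soft_min(vals, gamma)
    raise ValueError(f"unknown operator: {op}")

# "Stay out of p until reaching q" on two hypothetical assignment sequences.
phi = ("until", ("not", ("ap", "p")), ("ap", "q"))
sigma = [{"p": -1.0, "q": -1.0}, {"p": -1.0, "q": 1.0}, {"p": 2.0, "q": -1.0}]
sigma_bad = [{"p": -1.0, "q": -1.0}, {"p": 1.0, "q": -1.0}, {"p": -1.0, "q": 1.0}]
print(evaluate(phi, sigma), evaluate(phi, sigma_bad))  # positive, then negative
```

Every case is built from negation, soft min/max, and indexing, so an automatic-differentiation framework could trace the same recursion to obtain gradients with respect to $\bm{\sigma}$; only the selection of $k$ in the until case is piecewise constant.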

Here, $\min^{\gamma}$ and $\max^{\gamma}$ denote “soft” versions of the $\min$ and $\max$ functions, which maintain differentiability. To reduce clutter, we omit the subscript $t$ when $t = 0$. With these operations, $f$ retains quantitative semantics: it preserves the relative ordering of different assignments $\bm{\sigma}$, so that trajectories with a larger margin to a satisfying assignment receive larger values.
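The exact form of the soft operators is not pinned down in this section. One standard choice, assumed here for illustration, is the log-sum-exp family with temperature $\gamma > 0$, whose well-known bounds relate the soft values to the hard ones:

$$
\begin{aligned}
\max\nolimits^{\gamma}\{x_1, \dots, x_n\} &= \gamma \log \sum_{j=1}^{n} e^{x_j / \gamma},
&\qquad
\min\nolimits^{\gamma}\{x_1, \dots, x_n\} &= -\max\nolimits^{\gamma}\{-x_1, \dots, -x_n\}, \\
\max_j x_j \le \max\nolimits^{\gamma}\{x_j\} &\le \max_j x_j + \gamma \log n,
&\qquad
\min_j x_j - \gamma \log n \le \min\nolimits^{\gamma}\{x_j\} &\le \min_j x_j .
\end{aligned}
$$

Under this assumption, a positive soft minimum (as produced by $\wedge$ and $\Box$) certifies that the hard minimum is positive, whereas a soft maximum (for $\vee$ and $\Diamond$) can overestimate the hard maximum by up to $\gamma \log n$; both gaps vanish as $\gamma \to 0$.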

Given $L$ and $f$, we can obtain a differentiable score function by replacing the likelihood term in (2) with $p(y \mid \bm{\tau}) = Z e^{(2y-1) f(\varphi, L(\bm{\tau}))}$. The conditional score is then

$$
\begin{aligned}
\nabla_{\bm{\tau}^i} \log p_i(y_0 = 1 \mid \bm{\tau}^i) &\approx \nabla_{\bm{\tau}^i} \log\left(Z e^{f(\varphi, L(\hat{\bm{\tau}}^0))}\right) \\
&= \nabla_{\bm{\tau}^i} f(\varphi, L(\hat{\bm{\tau}}^0)).
\end{aligned} \tag{3}
$$
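Two steps are left implicit in (3). With $y_0 = 1$, the factor $(2y_0 - 1)$ equals $1$ and $\nabla \log Z = 0$; and because $\hat{\bm{\tau}}^0$ is computed from $\bm{\tau}^i$ by the Tweedie estimate in line 5 of Algorithm 1, the gradient is a vector-Jacobian product through that estimate (treating trajectories as flattened vectors):

$$
\begin{aligned}
\log p(y_0 = 1 \mid \hat{\bm{\tau}}^0) &= \log Z + (2 \cdot 1 - 1)\, f(\varphi, L(\hat{\bm{\tau}}^0)) = \log Z + f(\varphi, L(\hat{\bm{\tau}}^0)), \\
\nabla_{\bm{\tau}^i} f(\varphi, L(\hat{\bm{\tau}}^0)) &= \left(\frac{\partial \hat{\bm{\tau}}^0}{\partial \bm{\tau}^i}\right)^{\!\top} \nabla_{\hat{\bm{\tau}}^0} f(\varphi, L(\hat{\bm{\tau}}^0)),
\qquad
\frac{\partial \hat{\bm{\tau}}^0}{\partial \bm{\tau}^i} = \frac{1}{\sqrt{\bar{\alpha}_i}}\left(\bm{I} + (1 - \bar{\alpha}_i)\, \frac{\partial s_\theta(\bm{\tau}^i, i)}{\partial \bm{\tau}^i}\right).
\end{aligned}
$$

In practice, such vector-Jacobian products are typically obtained with automatic differentiation rather than by forming the Jacobian explicitly. Conditioning on $y_0 = 0$ instead would flip the sign of $f$ and steer samples toward violating $\varphi$.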

This approximation of the conditional score function can be used directly with a pre-trained diffusion model to sample trajectories conditioned on an $\text{LTL}_f$ formula. We call this method LTLDoG-S, since it performs the above posterior sampling in the reverse process (Algorithm 1). The gradient ascent steps are controlled by step sizes $\{\zeta_i\}_{i=1}^N$; in practice, the step size at each denoising step can be adapted so that $\bm{\tau}^{i-1}$ remains valid according to the formula evaluator $f(\varphi, L(\hat{\bm{\tau}}^0))$.

Algorithm 1 $\text{LTL}_f$ Planning with Posterior Sampling
1: Input: $\varphi$, $N$, $s_\theta$, $\{\zeta_i\}_{i=1}^N$
2: $\bm{\tau}^N \sim \mathcal{N}(\bm{0}, \bm{I})$
3: for $i = N-1$ to $0$ do
4:     $\hat{s} \leftarrow s_\theta(\bm{\tau}^i, i)$
5:     $\hat{\bm{\tau}}^0 \leftarrow \frac{1}{\sqrt{\bar{\alpha}_i}}\left(\bm{\tau}^i + (1 - \bar{\alpha}_i)\hat{s}\right)$
6:     $\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \bm{I})$
7:     $\bm{\tau}^{i-1} \leftarrow \frac{\sqrt{\alpha_i}(1 - \bar{\alpha}_{i-1})}{1 - \bar{\alpha}_i}\bm{\tau}^i + \frac{\sqrt{\bar{\alpha}_{i-1}}(1 - \alpha_i)}{1 - \bar{\alpha}_i}\hat{\bm{\tau}}^0 + \sqrt{1 - \alpha_i}\,\bm{\epsilon}$
8:     $\bm{\tau}^{i-1} \leftarrow \bm{\tau}^{i-1} + \zeta_i \nabla_{\bm{\tau}^i} f(\varphi, L(\hat{\bm{\tau}}^0))$
9: end for
10: return $\hat{\bm{\tau}}^0$
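As a concrete illustration of the loop in Algorithm 1, here is a minimal pure-Python sketch under toy assumptions, not the authors' implementation. The pre-trained score $s_\theta$ is replaced by the exact score $-\bm{\tau}$ of standard-normal data, the formula is a hypothetical $\Box(x_t < 0.5)$ with $\sigma_t = 0.5 - x_t$, the soft minimum is assumed to be log-sum-exp, the $\beta$ schedule and the constant step size $\zeta$ are arbitrary, and the gradient through $\hat{\bm{\tau}}^0$ is taken by central finite differences where a real system would use automatic differentiation. Steps are indexed $i = N, \dots, 1$ (so that $\bar{\alpha}_0 = 1$), and the sketch returns the final guided iterate $\bm{\tau}^0$. All function names are hypothetical.

```python
import math
import random

def soft_min(xs, gamma):
    # Log-sum-exp soft minimum (assumed operator), shifted for numerical stability.
    m = min(xs)
    return m - gamma * math.log(sum(math.exp(-(x - m) / gamma) for x in xs))

def toy_score(tau, i):
    # Stand-in for a pre-trained s_theta: if clean data is N(0, I), every noisy
    # marginal of the variance-preserving process is also N(0, I), so the score is -tau.
    return [-x for x in tau]

def tweedie(tau, i, alpha_bar):
    # Line 5: tau_hat0 = (tau^i + (1 - abar_i) * s_hat) / sqrt(abar_i).
    s_hat = toy_score(tau, i)
    return [(x + (1 - alpha_bar[i]) * s) / math.sqrt(alpha_bar[i])
            for x, s in zip(tau, s_hat)]

def formula_value(traj, limit=0.5, gamma=0.05):
    # Hypothetical phi = "always x_t < limit" with sigma_t = limit - x_t.
    return soft_min([limit - x for x in traj], gamma)

def guidance_grad(tau, i, alpha_bar, h=1e-5):
    # Central differences of tau^i -> f(phi, L(tau_hat0(tau^i))); autodiff in practice.
    grad = []
    for t in range(len(tau)):
        up, dn = list(tau), list(tau)
        up[t] += h
        dn[t] -= h
        grad.append((formula_value(tweedie(up, i, alpha_bar))
                     - formula_value(tweedie(dn, i, alpha_bar))) / (2 * h))
    return grad

def sample(horizon, n_steps, zeta, seed):
    # Requires n_steps >= 2 for the linear schedule below.
    rng = random.Random(seed)
    # Linear beta schedule (an assumption); index 0 is a placeholder so that abar_0 = 1.
    betas = [0.0] + [0.01 + 0.19 * k / (n_steps - 1) for k in range(n_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bar = [1.0]
    for k in range(1, n_steps + 1):
        alpha_bar.append(alpha_bar[-1] * alphas[k])
    tau = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # tau^N ~ N(0, I)
    for i in range(n_steps, 0, -1):
        tau_hat0 = tweedie(tau, i, alpha_bar)
        eps = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        c_tau = math.sqrt(alphas[i]) * (1 - alpha_bar[i - 1]) / (1 - alpha_bar[i])
        c_hat = math.sqrt(alpha_bar[i - 1]) * (1 - alphas[i]) / (1 - alpha_bar[i])
        new_tau = [c_tau * x + c_hat * x0 + math.sqrt(1 - alphas[i]) * e
                   for x, x0, e in zip(tau, tau_hat0, eps)]  # line 7
        if zeta > 0:  # line 8: gradient ascent on the formula value
            g = guidance_grad(tau, i, alpha_bar)
            new_tau = [x + zeta * gi for x, gi in zip(new_tau, g)]
        tau = new_tau
    return tau  # final iterate tau^0

unguided = sample(horizon=8, n_steps=20, zeta=0.0, seed=0)
guided = sample(horizon=8, n_steps=20, zeta=0.5, seed=0)
print(formula_value(unguided), formula_value(guided))
```

Because the same seed yields the same noise draws, the guided and unguided runs differ only by the accumulated guidance terms, which in this toy always push the trajectory toward satisfying the formula.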

III-C Classifier Guidance over LTLfsubscriptLTL𝑓\textnormal{LTL}_{f}LTL start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT

One limitation of LTLDoG-S is that it necessitates a known (and differentiable) expression for the formula evaluator $f(\varphi, L(\hat{\bm{\tau}}^0))$. This requirement can be challenging to meet in scenarios where the truth assignments of propositions are uncertain, for example, when the ground-truth physical dynamics are unknown. Here, we circumvent this problem by employing classifier guidance using a trained formula evaluator.

We propose a variant of LTLDoG with regressor guidance, which we abbreviate as LTLDoG-R. Using an $\ell_2$ loss, we train a neural network to predict satisfaction values from noisy trajectories conditioned on $\text{LTL}_f$ instructions. In other words, the trained network provides (through its gradient with respect to $\bm{\tau}^i$) an approximation of the conditional score function $\nabla_{\bm{\tau}^i} \log p_i(y_0 = 1 \mid \bm{\tau}^i)$. Note that, instead of learning a binary classifier for trajectory satisfaction, we apply the labeling function from Section III-B to the dataset and associate each trajectory with its objective value $\mathcal{J}_{\varphi}(\bm{\tau}^0)$. We conduct an ablation study in Section V-D to show that using real values performs better than using binary labels.
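The construction of regression targets described above can be sketched as follows. This is an illustrative assumption rather than the paper's training pipeline: $\mathcal{J}_\varphi$ is stood in for by a hypothetical log-sum-exp margin for "always $x_t < 0.5$", the noise schedule is made up, the formula-embedding input is omitted (the formula is held fixed), and the network is abstracted as any `predict(noisy, i)` callable. Each clean trajectory is noised with the forward process $q(\bm{\tau}^i \mid \bm{\tau}^0)$ and paired with the real-valued target of its clean version; a regressor would then be fit by minimizing the mean squared error.

```python
import math
import random

def soft_min(xs, gamma=0.05):
    # Log-sum-exp soft minimum (assumed operator).
    m = min(xs)
    return m - gamma * math.log(sum(math.exp(-(x - m) / gamma) for x in xs))

def objective(traj, limit=0.5):
    # Hypothetical J_phi for phi = "always x_t < limit": real-valued, not a 0/1 label.
    return soft_min([limit - x for x in traj])

def make_regression_batch(clean_trajs, alpha_bar, rng):
    """Pair each noised trajectory tau^i (and its step i) with the value of its clean tau^0."""
    batch = []
    for traj in clean_trajs:
        i = rng.randint(1, len(alpha_bar) - 1)  # random diffusion step
        noisy = [math.sqrt(alpha_bar[i]) * x + math.sqrt(1 - alpha_bar[i]) * rng.gauss(0.0, 1.0)
                 for x in traj]  # forward process q(tau^i | tau^0)
        batch.append((noisy, i, objective(traj)))
    return batch

def l2_loss(predict, batch):
    # Mean squared error between predicted and target satisfaction values.
    return sum((predict(noisy, i) - target) ** 2 for noisy, i, target in batch) / len(batch)

rng = random.Random(0)
alpha_bar = [1.0, 0.9, 0.7, 0.4]  # hypothetical cumulative schedule, alpha_bar[0] = 1
clean = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
batch = make_regression_batch(clean, alpha_bar, rng)
print(l2_loss(lambda noisy, i: 0.0, batch))  # loss of a trivial predictor
```

Using the clean trajectory's real-valued margin as the target keeps information about how strongly a trajectory satisfies or violates the formula, which a binary label would discard.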

To generalize across different $\text{LTL}_f$ formulae, the neural network takes both the noisy trajectory and an embedding of the $\text{LTL}_f$ formula as input. Multiple methods exist for embedding $\text{LTL}_f$ formulae. For instance, one can use Graph Neural Networks (GNNs) [17, 18] to embed the tree representation of an $\text{LTL}_f$ formula directly. Alternatively, one can embed the deterministic finite-state automaton (DFA) [19, 20] associated with a formula [21]. In this work, we employ $\text{LTL}_f$ tasks from [22] and embed the directed graph of an $\text{LTL}_f$ formula using the Relational Graph Convolutional Network (R-GCN) [23], which can generalize to $\text{LTL}_f$ formulae with the same template structure.
After training, the model $s_\phi(\varphi, \bm{\tau}^i, i) \approx \nabla_{\bm{\tau}^i} \log p_i(y_0 \mid \bm{\tau}^i)$ can be plugged into the conditional reverse process via regressor guidance, as summarized in Algorithm 2.
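An R-GCN consumes a graph whose edges carry relation types. The sketch below shows one hypothetical way to turn a formula's syntax tree into such a graph: each operator or proposition becomes a node, and each operand link becomes a typed edge pointing from child to parent, so that messages flow toward the root, whose embedding can then summarize the formula. The tuple encoding, the edge-type names, and the edge direction are illustrative assumptions, not necessarily the encoding used in [22].

```python
def formula_to_graph(phi):
    """Flatten a nested-tuple LTL_f formula into node labels and typed child-to-parent edges."""
    nodes, edges = [], []

    def visit(node):
        idx = len(nodes)
        op = node[0]
        nodes.append(node[1] if op == "ap" else op)  # proposition name or operator label
        children = [] if op == "ap" else list(node[1:])
        for pos, child in enumerate(children):
            child_idx = visit(child)
            # Edge type records operand position, e.g. to tell "a U b" from "b U a".
            etype = "unary" if len(children) == 1 else ("left" if pos == 0 else "right")
            edges.append((child_idx, idx, etype))
        return idx

    visit(phi)
    return nodes, edges

# phi = not p3 U (p5 and (not p2 U p0)), the formula used in Fig. 5
phi = ("until", ("not", ("ap", "p3")),
       ("and", ("ap", "p5"), ("until", ("not", ("ap", "p2")), ("ap", "p0"))))
nodes, edges = formula_to_graph(phi)
print(nodes)  # ['until', 'not', 'p3', 'and', 'p5', 'until', 'not', 'p2', 'p0']
```

Formulae that share a template produce graphs with the same shape and edge types, differing only in node labels, which is one reason a relational GNN over such graphs can transfer to unseen formulae of the same structure.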

Algorithm 2 $\text{LTL}_f$ Planning with Regressor Guidance
1: Input: $\varphi$, $N$, $s_\theta$, $s_\phi$, $\{\zeta_i\}_{i=1}^N$
2: $\bm{\tau}^N \sim \mathcal{N}(\bm{0}, \bm{I})$
3: for $i = N-1$ to $0$ do
4:     $\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \bm{I})$
5:     $\bm{\tau}^{i-1} \leftarrow \frac{1}{\sqrt{\alpha_i}}\left(\bm{\tau}^i + (1 - \alpha_i) s_\theta(\bm{\tau}^i, i)\right) + \sqrt{1 - \alpha_i}\,\bm{\epsilon}$
6:     $\bm{\tau}^{i-1} \leftarrow \bm{\tau}^{i-1} + \zeta_i s_\phi(\varphi, \bm{\tau}^i, i)$
7: end for
8: return $\bm{\tau}^0$
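Algorithm 2 differs from Algorithm 1 in that it needs no estimate $\hat{\bm{\tau}}^0$: the learned guidance term is added directly after the ancestral update. The sketch below mirrors lines 4 to 6 with pluggable callables; `score` and `guide` are hypothetical stand-ins for $s_\theta$ and $s_\phi(\varphi, \cdot, \cdot)$ with the formula held fixed, the step size is constant, and steps are indexed $i = N, \dots, 1$.

```python
import math
import random

def regressor_guided_sample(score, guide, horizon, alphas, zeta, seed):
    """Sketch of Algorithm 2: ancestral step with s_theta, then add zeta * s_phi(phi, tau^i, i).

    score(tau, i) plays the role of s_theta; guide(tau, i) plays s_phi(phi, tau^i, i).
    alphas[1..N] are per-step alphas; alphas[0] is unused.
    """
    rng = random.Random(seed)
    n_steps = len(alphas) - 1
    tau = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # tau^N ~ N(0, I)
    for i in range(n_steps, 0, -1):
        eps = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # line 4
        s = score(tau, i)
        new_tau = [(x + (1 - alphas[i]) * si) / math.sqrt(alphas[i])
                   + math.sqrt(1 - alphas[i]) * e
                   for x, si, e in zip(tau, s, eps)]  # line 5
        g = guide(tau, i)  # line 6: guidance evaluated at the pre-update tau^i
        tau = [x + zeta * gi for x, gi in zip(new_tau, g)]
    return tau

alphas = [None] + [1.0 - b for b in (0.02, 0.05, 0.1, 0.2)]  # hypothetical schedule
toy_score = lambda tau, i: [-x for x in tau]   # exact score of N(0, I) data
push_down = lambda tau, i: [-1.0] * len(tau)   # hypothetical stand-in for a trained s_phi
base = regressor_guided_sample(toy_score, push_down, 5, alphas, 0.0, seed=1)
guided = regressor_guided_sample(toy_score, push_down, 5, alphas, 0.3, seed=1)
print([round(g - b, 3) for g, b in zip(guided, base)])
```

With a shared seed, the constant downward push in the usage lines shifts the guided sample below the unguided one at every timestep; a trained $s_\phi$ would instead supply a trajectory- and formula-dependent direction.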

IV Related Work

LTLDoG builds upon prior work in diffusion-based planning and symbolic reasoning using $\text{LTL}_f$ for robotics. In the following, we give a brief overview of related work.

Learning and Planning under $\text{LTL}_f$. As an expressive language for specifying high-level planning requirements [24, 25, 26], $\text{LTL}_f$ has been used extensively in various robotic tasks to express temporally extended goals [27, 28]. These methods usually require information about the environment's dynamics, e.g., a model or an abstraction, to plan effectively under a given formula. Reinforcement learning agents can instead learn in a model-free way under $\text{LTL}_f$ objectives or constraints [29, 30, 31], and some can generalize across different formulae [22]. However, these methods train agents online via trial and error, which can lead to expensive or even unsafe interactions.

Planning and Policy Learning with Diffusion. Recent diffusion-based planning methods are flexible in that they rely only on offline datasets, without access to or interaction with the environment. They have been successfully applied to long-horizon planning problems by generating states or actions for control [1, 3, 2], but not to tasks with test-time temporal requirements. Recent work has also looked into safety-critical tasks, e.g., the aforementioned CBF-based methods [7, 8]. As discussed above, these methods were designed for static safety criteria and lack the ability to satisfy temporally extended behaviors. Our work inherits the advantages of diffusion-based methods and can fulfill $\text{LTL}_f$ requirements.

Inverse Problems in Diffusion. Our proposed method formulates the conditional measurement under $\text{LTL}_f$ as a differentiable loss via an unnormalized likelihood [13, 32] for posterior sampling, and can thereby guide the sampling process in a plug-and-play fashion. The most popular methods for guiding diffusion models during inference are classifier guidance [33] and classifier-free guidance [34]. However, both require training (a noise-aware classifier or a conditional model, respectively), so they cannot be applied in a plug-and-play fashion to new conditioning factors. Our work is also related to inverse problems, which infer a posterior distribution over data given a measurement. Inverse methods [35, 36, 37, 38] do not require training a conditional diffusion model and can directly leverage pre-trained neural networks.

(a) Studio-like lab. (b) Office room. (c) Unitree Go2.
Figure 2: Real-world environments for quadruped robot navigation.

V Experiments

TABLE I: Performance on Avoidance Tasks in Maze2d

| Method | Satisfaction rate¹ (%) ↑, Planning | Satisfaction rate¹ (%) ↑, Rollout | Reward (UnCon)² ↑ |
|---|---|---|---|
| Diffuser | 9.5 ± 3.1 | 11.0 ± 1.7 | 142.2 ± 5.1 |
| SafeDiffuser | 99.4 ± 0.9 | 12.3 ± 3.5 | 135.8 ± 5.2 |
| LTLDoG-S | 99.0 ± 0.8 | 73.0 ± 3.0 | 97.3 ± 2.8 |
| LTLDoG-R | 98.8 ± 0.8 | **92.0 ± 1.4** | 127.1 ± 5.1 |

¹ Mean and standard deviation calculated from 10 groups of tests. Each test contains 100 trials, where a trajectory is labeled as either satisfied or not satisfied in each trial. The best result during rollout is highlighted. The same applies to the other Maze2d tables.
² Unconstrained rewards do not take unsafe penalties into account.

TABLE II: Results of Avoidance Task in PushT

| Method | Satisfaction rate (%) ↑ | Overlap Score¹ ↑ |
|---|---|---|
| Diffusion Policy | 34.8 ± 18.0 | 0.941 ± 0.0584 |
| LTLDoG-S | 85.6 ± 13.1 | 0.890 ± 0.0647 |
| LTLDoG-R | 85.6 ± 12.5 | 0.842 ± 0.0985 |

¹ Measures the final overlap of the T block and the target area (min: 0.0, max: 1.0).

(a) Diffuser. (b) SafeDiffuser. (c) LTLDoG-S.
Figure 3: Examples of safe planning in Maze2d-Large. There are three unsafe blocks (red squares, labeled $p_L$, $p_M$, $p_R$ from left to right) that need to be avoided while navigating to the goal (shaded circle). The $\text{LTL}_f$ constraint for this task is $\varphi = \Box\neg(p_L \vee p_M \vee p_R)$. (a) Trajectories from Diffuser ignore safety and can violate the specified constraints. (b) SafeDiffuser produces discontinuous trajectories. (c) Our LTLDoG is able to plan trajectories that detour around the obstacles and successfully arrive at the goal.
(a) PushT environment. (b) Diffusion Policy. (c) LTLDoG-S.
Figure 4: Results of safe control in PushT. (a) A robot arm's end effector (circles filled in blue) should manipulate the T block (gray) to a goal pose (green) while avoiding unsafe regions (hollow circles marked $pX$), as specified by an $\text{LTL}_f$ formula (text in black). In this example, the $\text{LTL}_f$ formula specifies that the end effector should never enter regions $p1$ (purple) and $p3$ (cyan). (b) The actions generated and executed by Diffusion Policy do not satisfy the $\text{LTL}_f$ formula. (c) In contrast, LTLDoG-S guides the diffusion to avoid $p1$ and $p3$, yet still completes the manipulation task.

Our experiments focus on testing LTLDoG’s ability to handle static and temporal safety constraints. We first briefly describe the simulated and real environments, and baseline data-driven methods. Then we report empirical results on benchmark environments and demonstrate LTLDoG’s applicability in real world tasks through a case study on a quadruped robot (Fig. 2). We conclude with a brief ablation study and analysis. Due to space restrictions, we focus on conveying our main results. More details of the environments, implementations and analysis can be found in the appendix.

V-A Experimental Setup

Environments. We evaluate methods in two simulation benchmark environments (Maze2d [1] and PushT [3]) and demonstrate LTLDoG in two real indoor rooms. Maze2d (Fig. 3 and 5) presents challenging long-horizon navigation tasks, where state-of-the-art offline algorithms fail to learn a reliable goal-reaching policy. The atomic propositions are determined by whether the agent is inside key regions of the maze (for avoidance and visitation). The PushT task (Fig. 4) requires manipulating a T block through interaction with a controllable mover. In our experiments, the mover is constrained to visit specific regions and avoid others. Our real-world experiments involve two indoor environments: a lab designed to mimic a studio apartment, and an office (Fig. 2).

Compared methods. Since our work generates trajectories by learning from an offline dataset, we compare against data-driven planning methods. Diffuser and Diffusion Policy are state-of-the-art methods for sampling viable plans, but they provide no guarantees with respect to external constraints. To evaluate how well LTLDoG enforces safety specifications, we compare with SafeDiffuser, a safe-planning diffusion model based on CBFs. Note, however, that SafeDiffuser cannot handle temporal constraints or instructions; to our knowledge, our work is the first data-driven method to handle both static and temporal constraints, so there is no direct comparison baseline for temporal constraints. We therefore also analyze the differences between the two variants of our method, LTLDoG-S and LTLDoG-R.

V-B Comparative Analysis of Methods

Can LTLDoG achieve safe planning for static constraints? Our results indicate that it can: LTLDoG generates trajectories that satisfy the given region-avoidance constraints more reliably than existing methods. In both the Maze2D and PushT benchmarks, LTLDoG achieves high satisfaction rates (in both planning and rollout) without severely compromising reward accumulation (Tables I and II). Lower total rewards are expected since safe paths are typically longer, and rollout satisfaction rates are generally lower than planning ones because the low-level controller may not exactly follow the diffused trajectory. Qualitatively, Fig. 3 shows that LTLDoG is able to generate safe trajectories in Maze2D, whilst SafeDiffuser is limited to “local” deviations and fails to find paths that detour around unsafe regions. Results in PushT are consistent with Maze2D; Fig. 4 shows that LTLDoG completes the manipulation task without entering unsafe regions.

(a) Regions for evaluation of atomic propositions in $\mathcal{P}$. (b) Diffuser. (c) Ours (LTLDoG-S).
Figure 5: Temporal constraints in Maze2D. (a) Each maze has 6 non-overlapping regions. Agents are asked to visit some of these regions in different temporally extended orders. (b) and (c) show generated trajectories under $\varphi = \neg p_3 \,\mathsf{U}\, (p_5 \wedge (\neg p_2 \,\mathsf{U}\, p_0))$. Our method satisfies the negated propositions (avoiding the red zones) before reaching the green regions.
TABLE III: Performance on Different $\text{LTL}_f$ Formulae in Maze2d.

| Environment | Method | Training formulae: satisfaction rate (%) ↑, Planning | Training formulae: satisfaction rate (%) ↑, Rollout | Training formulae: Reward (UnCon) ↑ | Testing formulae: satisfaction rate (%) ↑, Planning | Testing formulae: satisfaction rate (%) ↑, Rollout | Testing formulae: Reward (UnCon) ↑ |
|---|---|---|---|---|---|---|---|
| U-Maze (Horizon 256) | Diffuser | 31.1 ± 0.5 | 31.0 ± 0.5 | 33.5 ± 2.7 | 33.9 ± 0.5 | 34.1 ± 0.6 | 35.6 ± 0.3 |
| | LTLDoG-S | 83.8 ± 0.2 | 57.6 ± 1.3 | 31.3 ± 1.2 | 82.7 ± 0.3 | 56.6 ± 0.9 | 32.8 ± 0.5 |
| | LTLDoG-R | 56.3 ± 0.4 | 51.3 ± 0.9 | 31.5 ± 0.2 | 57.7 ± 0.4 | 52.1 ± 0.3 | 32.3 ± 0.7 |
| Medium (Horizon 384) | Diffuser | 15.0 ± 0.7 | 13.4 ± 0.6 | 84.8 ± 0.3 | 11.6 ± 1.4 | 10.1 ± 1.2 | 84.8 ± 0.5 |
| | LTLDoG-S | 77.9 ± 5.7 | 31.8 ± 2.6 | 53.1 ± 5.2 | 68.4 ± 6.7 | 28.7 ± 3.5 | 50.5 ± 4.7 |
| | LTLDoG-R | 51.8 ± 1.8 | 39.5 ± 1.6 | 57.3 ± 0.2 | 43.3 ± 4.4 | 30.6 ± 1.9 | 57.7 ± 0.1 |
| Large (Horizon 512) | Diffuser | 13.5 ± 0.4 | 12.8 ± 0.1 | 76.3 ± 0.1 | 11.6 ± 2.3 | 11.5 ± 1.7 | 77.8 ± 3.9 |
| | LTLDoG-S | 73.8 ± 2.4 | 32.6 ± 1.4 | 42.3 ± 5.0 | 66.6 ± 2.7 | 24.9 ± 1.7 | 40.9 ± 4.4 |
| | LTLDoG-R | 66.9 ± 0.6 | 47.4 ± 0.8 | 54.6 ± 1.3 | 57.5 ± 2.3 | 39.0 ± 2.9 | 54.5 ± 3.9 |
TABLE IV: Generalization to Different LTL$_f$ Formulae in PushT
| Method | LTL$_f$ set | Satisfaction rate (%) ↑ | Score ↑ |
|---|---|---|---|
| Diffusion Policy | Training | 22.9±8.0 | 0.354±0.153 |
| Diffusion Policy | Test | 30.7±13.9 | 0.371±0.177 |
| LTLDoG-S | Training | 28.2±8.33 | 0.290±0.115 |
| LTLDoG-S | Test | 43.0±17.0 | 0.299±0.145 |
| LTLDoG-R | Training | 69.3±9.90 | 0.292±0.121 |
| LTLDoG-R | Test | 66.0±20.8 | 0.340±0.168 |

Can LTLDoG satisfy static and temporal constraints, and generalize to novel LTL$_f$ formulae? Tables III and IV show the performance of the compared methods on both training and test LTL$_f$ formulae in the Maze2d and PushT environments. Following the LTL$_f$ specifications in [22], we adopt the Until sampler to generate random LTL$_f$ formulae (200 for Maze2d and 36 for PushT) that contain different visitation sequences and unsafe regions. The training set contains 80% of all formulae, and the remaining 20% form the test set. The atomic propositions correspond to 6 regions in Maze2d (Fig. 5(a), i.e., $p_0, p_1, \ldots, p_5$) and 4 regions in PushT (Fig. 4(a)).
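The formula generation and train/test split described above can be sketched as follows. This is a minimal illustration in the spirit of the Until sampler of [22], not its actual implementation: it assumes formulae of the nested form shown in Fig. 5, e.g. $\neg p_3 \,\mathsf{U}\, (p_5 \wedge (\neg p_2 \,\mathsf{U}\, p_0))$, written as strings with `!`, `&`, and `U`. The helper names, the depth range, and the random seed are hypothetical, and the sampler in [22] may differ in these details.

```python
import itertools
import random


def until_formula(pairs):
    """Nest (avoid, reach) pairs into '!a1 U (r1 & (!a2 U r2))'-style strings."""
    avoid, reach = pairs[-1]
    formula = f"!{avoid} U {reach}"
    # Wrap from the innermost pair outwards.
    for avoid, reach in reversed(pairs[:-1]):
        formula = f"!{avoid} U ({reach} & ({formula}))"
    return formula


def enumerate_until_formulae(num_props, max_depth):
    """All nested-Until formulae over distinct propositions p0..p{num_props-1}."""
    props = [f"p{i}" for i in range(num_props)]
    formulae = []
    for depth in range(1, max_depth + 1):
        # A formula of depth d uses 2 * d distinct propositions as (avoid, reach) pairs.
        for perm in itertools.permutations(props, 2 * depth):
            pairs = list(zip(perm[0::2], perm[1::2]))
            formulae.append(until_formula(pairs))
    return formulae


def sample_and_split(num_props, max_depth, n, train_frac=0.8, seed=0):
    """Sample n distinct formulae and split them into training and test sets."""
    pool = enumerate_until_formulae(num_props, max_depth)
    if n > len(pool):
        raise ValueError(f"only {len(pool)} distinct formulae are available")
    rng = random.Random(seed)
    chosen = rng.sample(pool, n)  # sampling without replacement keeps formulae distinct
    n_train = round(train_frac * n)
    return chosen[:n_train], chosen[n_train:]
```

For example, `sample_and_split(num_props=6, max_depth=2, n=200)` returns 160 training and 40 test formulae. Enumerating the pool first, rather than sampling formulae one at a time, guarantees that no formula appears in both sets.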

Results show that both LTLDoG-S and LTLDoG-R achieve higher satisfaction rates than Diffuser/Diffusion Policy. The gains are large in Maze2d; in PushT, LTLDoG-R more than doubles the satisfaction rate of Diffusion Policy, whereas LTLDoG-S improves on it more modestly. The baselines have non-zero performance because some generated LTL$_f$ formulae are trivial to satisfy from certain start locations. Note also that some specifications may be impossible to satisfy given the physical locations of the agent, walls, and propositional regions in the maze. In these cases, planned trajectories will violate constraints during rollout, lowering the satisfaction rate. Fig. 5 shows examples of trajectories generated in Maze2d.

Panels: (a) Lab map and dataset coverage; (b) Diffuser; (c) Ours; (d) Office map and dataset coverage; (e) Diffuser; (f) Ours.
Figure 6: Results in real-world rooms. The instructed LTL$_f$ is $\varphi = \neg\text{Passageway} \,\mathsf{U}\, \text{Kitchen}$ for the lab (first row) and $\varphi = \Diamond(\text{Door} \wedge \Diamond\text{Seat})$ for the office (second row). In the lab task, after entering through the door, the robot must first unload in the kitchen area before it is allowed to go to the goal; while loaded, it is prohibited from going through the narrow passageway where people sit on the side. In the office task, the robot must first visit the door and then the seat before reaching the goal.
TABLE V: Results of achieving goals and LTL$_f$ in real-world tasks.
| Environment | Method | Satisfaction rate, Goal (%) ↑ | Satisfaction rate, LTL$_f$ (%) ↑ |
|---|---|---|---|
| Lab | Diffuser | 100.00 | 0.00 |
| Lab | LTLDoG-S | 91.67 | 91.67 |
| Office | Diffuser | 100.00 | 0.00 |
| Office | LTLDoG-S | 95.83 | 95.83 |

V-C Real Robot Case Study

We show that LTLDoG can plan for a robot dog (Unitree Go2 Edu) given LTL$_f$ instructions in two real-world navigation environments: a lab that mimics a studio apartment, and an office room. The diffusion models were trained on data collected in simulation with Gazebo and ROS1, using LiDAR-scanned maps (Fig. 6(a) and 6(d)). Note that collecting the training trajectories does not require running an oracle policy that satisfies many different LTL$_f$ formulae; we simply made the robot navigate to randomly sampled goals using the global planner and the TEB Local Planner [39] from the ROS1 Navigation stack (Fig. 6(a) and 6(d)).

To test potential constraints, we queried GPT-4 for LTL$_f$ formulae representing meaningful robotic tasks, including obstacle avoidance and sequential navigation. We used 4 kinds of LTL$_f$ formulae from the generated results, i.e., $\Box\neg p_0$, $\Diamond p_0 \wedge \Diamond p_1$, $\Diamond(p_0 \wedge \Diamond p_1)$, and $\neg p_1 \,\mathsf{U}\, p_0$. The first corresponds to an obstacle avoidance task in which the robot should never visit a specific region. For example, $\Box\neg\text{Seat}$ means the robot should never enter the Seat region (Fig. 6(d)). The remaining three formulae represent: 1) visiting all regions at least once; 2) visiting regions in a specific sequence; 3) avoiding a specific region until another has been visited. Some example regions designed in our real environments, together with start/goal locations, are shown in Fig. 6.
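To make the four formula types concrete, the sketch below evaluates them on a labelled trajectory. It assumes axis-aligned rectangular regions and a labelling function that marks, at each timestep, which regions contain the robot's planar position. The function names and region boxes are hypothetical, and the checks use standard finite-trace semantics (with $\mathsf{U}$ as the strong until) rather than the evaluator used in our experiments.

```python
import numpy as np


def label_trace(xy, boxes):
    """Boolean (T, K) labels: entry [t, k] is True when point t lies inside box k.

    xy: (T, 2) planar positions; boxes: list of (x_min, y_min, x_max, y_max).
    """
    xy = np.asarray(xy, dtype=float)
    labels = np.zeros((len(xy), len(boxes)), dtype=bool)
    for k, (x0, y0, x1, y1) in enumerate(boxes):
        labels[:, k] = (
            (xy[:, 0] >= x0) & (xy[:, 0] <= x1) & (xy[:, 1] >= y0) & (xy[:, 1] <= y1)
        )
    return labels


def always_not(labels, a):  # G(!a): region a is never visited
    return not labels[:, a].any()


def eventually_both(labels, a, b):  # F(a) & F(b): both regions visited, any order
    return bool(labels[:, a].any() and labels[:, b].any())


def eventually_sequence(labels, a, b):  # F(a & F(b)): a, then later b
    hits = np.flatnonzero(labels[:, a])
    # The earliest visit to a leaves the longest suffix in which b can occur.
    return bool(hits.size > 0 and labels[hits[0]:, b].any())


def avoid_until(labels, avoid, reach):  # !avoid U reach (strong until)
    hits = np.flatnonzero(labels[:, reach])
    # The earliest visit to reach leaves the shortest prefix that must avoid `avoid`.
    return bool(hits.size > 0 and not labels[: hits[0], avoid].any())
```

With two hypothetical boxes `[(0, 0, 1, 1), (2, 0, 3, 1)]` and a path that passes through the first box and then the second, `eventually_sequence(labels, 0, 1)` holds while `eventually_sequence(labels, 1, 0)` does not, illustrating how the order of visits distinguishes the second and third formula types.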

In total, 96 trials were executed on the real robot, involving 12 trajectories (6 for the baseline Diffuser and 6 for our method LTLDoG-S) with varying regions and different LTL$_f$ formulae in each room. The overall satisfaction rate of all raw trajectories generated in simulation is $85.8 \pm 14.0\%$ (cf. baseline $2.9 \pm 5.4\%$). For each specific formula, a sample was selected for real-world execution based on its feasibility. The results in Table V show that LTLDoG achieves a high LTL$_f$ satisfaction rate compared to Diffuser; Fig. 6(c) and 6(f) illustrate trajectories for qualitative comparison.

V-D Ablation Study and Analysis

TABLE VI: Ablation Study – Binary Classifier Guidance
| Method | LTL set | Satisfaction rate, Planning (%) ↑ | Satisfaction rate, Rollout (%) ↑ | Reward (UnCon) ↑ |
|---|---|---|---|---|
| Diffuser | Training | 31.1±0.5 | 31.0±0.5 | 32.5±2.7 |
| Diffuser | Test | 33.9±0.5 | 34.1±0.6 | 35.6±0.3 |
| Classifier guidance¹ | Training | 41.1±0.6 | 40.6±0.7 | 33.9±0.9 |
| Classifier guidance¹ | Test | 40.0±0.8 | 41.8±1.1 | 35.3±0.3 |
| LTLDoG-R | Training | 56.3±0.4 | 51.3±0.9 | 31.5±0.2 |
| LTLDoG-R | Test | 57.7±0.4 | 52.1±0.3 | 32.3±0.7 |

¹ The classifier guidance method only leverages binary labels for LTL satisfaction checking in Maze2d U-Maze.

Ablation study. Unlike classifier guidance, where each trajectory is labelled as either satisfying or not satisfying the formula, we leverage the continuous values from our formula evaluator (as described in Section III-C) and train a regressor guidance network. Although classifier guidance improves over the vanilla Diffuser, it performs worse than LTLDoG-R (Table VI). This comparison supports the notion that soft labels improve the guidance network; we posit that real values provide richer information about how well a trajectory satisfies a given LTL$_f$ formula.
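The intuition behind soft labels can be illustrated with a small example. The sketch below assumes signed per-timestep proposition scores (positive inside a region) and smooths the max/min in the finite-trace semantics of $\neg a \,\mathsf{U}\, b$ with a log-sum-exp temperature `tau`. This is a generic smooth relaxation chosen for illustration, not necessarily the formulation of the DTL toolbox [15] or of Section III-C, and all names and scores are hypothetical. Two trajectories that both violate the formula receive the same binary label, but the soft value is much higher for the near miss, which is the kind of graded signal a regressor can exploit.

```python
import numpy as np


def soft_max(x, tau):
    """Smooth maximum tau * logsumexp(x / tau); approaches max(x) as tau -> 0."""
    x = np.asarray(x, dtype=float)
    m = x.max()  # subtract the max for numerical stability
    return m + tau * np.log(np.sum(np.exp((x - m) / tau)))


def soft_min(x, tau):
    """Smooth minimum, defined as -soft_max(-x)."""
    return -soft_max(-np.asarray(x, dtype=float), tau)


def soft_until(phi, psi, tau=0.1):
    """Smoothed value of (phi U psi) at t = 0: max_k min(psi_k, phi_0, ..., phi_{k-1})."""
    terms = [soft_min(np.append(phi[:k], psi[k]), tau) for k in range(len(psi))]
    return soft_max(terms, tau)


def hard_until(phi, psi):
    """Exact (non-smoothed) value of the same max-min expression."""
    return max(min([psi[k], *phi[:k]]) for k in range(len(psi)))


def binary_label(phi, psi):
    """Classifier-style label: 1.0 if the formula is satisfied, else 0.0."""
    return float(hard_until(phi, psi) > 0)


# Hypothetical signed scores for regions a (unsafe) and b (target) over 4 timesteps.
a_near = np.array([-1.0, 0.05, -1.0, -1.0])  # barely enters a at t = 1
a_far = np.array([-1.0, 1.0, -1.0, -1.0])  # deep inside a at t = 1
b = np.array([-1.0, -1.0, -1.0, 0.5])  # reaches b at t = 3

if __name__ == "__main__":
    for name, a in (("near miss", a_near), ("gross violation", a_far)):
        # For !a U b, phi is the negated score of a and psi is the score of b.
        print(name, binary_label(-a, b), soft_until(-a, b))
```

Because log-sum-exp overestimates a maximum (and underestimates a minimum) by at most `tau * log(n)`, the sign of the soft value need not coincide exactly with satisfaction; a smaller `tau` tightens the approximation at the cost of a less smooth value landscape.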

Analysis on horizon. Fig. 7 shows that planning with a longer horizon improves LTL$_f$ satisfaction. We attribute this improvement to the fact that LTL$_f$ instructions often require longer paths than mere goal navigation, e.g., visiting a specific region before reaching the goal.

Panels: (a) Planning; (b) Rollout.
Figure 7: Performance in Maze2d U-Maze with different trajectory lengths. DM denotes Diffuser; DoG-S and DoG-R denote our methods.

VI Conclusion, Discussion and Future Work

In this work, we presented LTLDoG, an approach towards generating safe trajectories that comply with LTL$_f$ specifications at test time. Within our overall scheme, we presented two methods: LTLDoG-S guides the sampling process under any LTL$_f$ formula, while LTLDoG-R uses a trained model that we show generalizes to new formulae with similar structure. To our knowledge, this work is the first to successfully incorporate model checking using a formal language with diffusion models for safe planning. Notably, LTLDoG does not require data collection for each potential LTL$_f$ instruction; rather, we control the sampling process during diffusion using “soft” model checking to generate new trajectories from the information already provided by the training dataset.

Limitations and Future Work. LTLDoG is a step towards trustworthy trajectory planning using generative models, and there are several areas where it can be improved. Like other diffusion models, LTLDoG is generally unable to generate realistic trajectories when the context (environment/goal) is far from the training distribution. The dataset should therefore preferably contain trajectories with a variety of complex behaviors that can potentially satisfy different test-time LTL$_f$ formulae. It would be interesting to develop methods that ensure the sampling process of LTLDoG-S adheres to the data manifold. Additionally, LTLDoG plans in an open-loop fashion, generating the entire trajectory conditioned on the constraints, mainly because the evaluation of an LTL$_f$ formula depends on the entire trajectory. We plan to explore receding-horizon planning based on partial evaluations of incomplete trajectories. Finally, diffusion models usually require significant amounts of training data and many diffusion steps during inference. Recent work on interpolant diffusion methods [40] leverages source policies to reduce data and computation costs; we aim to explore how integrating this approach with LTL$_f$-conditioned sampling can enhance the generation of safe trajectories.

Acknowledgements

This research is supported by A*STAR under its National Robotics Programme (NRP) (Award M23NBK0053). The authors would also like to acknowledge partial support from a Google South Asia & Southeast Asia Award and from the National Research Foundation, Singapore under its Medium Sized Center for Advanced Robotics Technology Innovation.

References

  • [1] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” in Int. Conf. Mach. Learn., vol. 162, 2022, pp. 9902–9915.
  • [2] A. Ajay, Y. Du, A. Gupta, J. B. Tenenbaum, T. S. Jaakkola, and P. Agrawal, “Is conditional generative modeling all you need for decision making?” in Int. Conf. Learn. Representations, 2023.
  • [3] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” in Proc. Robot.: Sci. and Syst. (RSS), 2023.
  • [4] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
  • [5] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Inf. Process. Syst., 2020, pp. 6840–6851.
  • [6] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Int. Conf. Learn. Representations, 2021.
  • [7] W. Xiao, T.-H. Wang, C. Gan, and D. Rus, “SafeDiffuser: Safe planning with diffusion probabilistic models,” arXiv preprint arXiv:2306.00148, 2023.
  • [8] N. Botteghi, F. Califano, M. Poel, and C. Brune, “Trajectory generation, control, and safety with denoising diffusion probabilistic models,” arXiv preprint arXiv:2306.15512, 2023.
  • [9] A. Pnueli, “The temporal logic of programs,” in 18th Annu. Symp. Found. Comput. Sci., 1977, pp. 46–57.
  • [10] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Comput., vol. 23, no. 7, pp. 1661–1674, 2011.
  • [11] C. Baier and J. Katoen, Principles of Model Checking.   MIT Press, 2008.
  • [12] C. Belta, B. Yordanov, and E. A. Gol, Formal Methods for Discrete-Time Dynamical Systems.   Springer, 2017, vol. 89.
  • [13] H. Chung, J. Kim, M. T. McCann, M. L. Klasky, and J. C. Ye, “Diffusion posterior sampling for general noisy inverse problems,” in Int. Conf. Learn. Representations, 2023.
  • [14] B. Efron, “Tweedie’s formula and selection bias,” J. Amer. Statistical Assoc., vol. 106, no. 496, pp. 1602–1614, 2011.
  • [15] Z. Xu, Y. S. Rawat, Y. Wong, M. Kankanhalli, and M. Shah, “Don’t pour cereal into coffee: Differentiable temporal logic for temporal action segmentation,” in Advances in Neural Inf. Process. Syst., 2022.
  • [16] K. Leung, N. Aréchiga, and M. Pavone, “Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods,” Int. J. Robot. Res., vol. 42, no. 6, pp. 356–370, 2023.
  • [17] M. Gori, G. Monfardini, and F. Scarselli, “A new model for learning in graph domains,” in Proc. IEEE Int. Joint Conf. Neural Netw., vol. 2, 2005, pp. 729–734.
  • [18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Trans. Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
  • [19] S. Zhu, L. M. Tabajara, J. Li, G. Pu, and M. Y. Vardi, “Symbolic LTLf synthesis,” in Int. Joint Conf. Artif. Intell., 2017, pp. 1362–1369.
  • [20] A. Camacho, J. Baier, C. Muise, and S. McIlraith, “Finite LTL synthesis as planning,” in Proc. Int. Conf. Automated Planning and Scheduling, vol. 28, 2018, pp. 29–38.
  • [21] Y. Xie, F. Zhou, and H. Soh, “Embedding symbolic temporal knowledge into deep sequential models,” in IEEE Int. Conf. Robot. Automat., 2021, pp. 4267–4273.
  • [22] P. Vaezipoor, A. C. Li, R. A. T. Icarte, and S. A. McIlraith, “LTL2Action: Generalizing LTL instructions for multi-task RL,” in Int. Conf. Mach. Learn., 2021, pp. 10 497–10 508.
  • [23] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” in The Semantic Web, 2018, pp. 593–607.
  • [24] F. Bacchus and F. Kabanza, “Using temporal logics to express search control knowledge for planning,” Artif. Intell., vol. 116, no. 1, pp. 123–191, 2000.
  • [25] J. A. Baier and S. A. McIlraith, “Planning with temporally extended goals using heuristic search,” in Proc. Int. Conf. Automated Planning and Scheduling, 2006, pp. 342–345.
  • [26] A. Camacho, E. Triantafillou, C. Muise, J. Baier, and S. McIlraith, “Non-deterministic planning with temporally extended goals: LTL over finite and infinite traces,” in Proc. AAAI Conf. Artif. Intell., vol. 31, no. 1, 2017.
  • [27] G. Fainekos, H. Kress-Gazit, and G. Pappas, “Temporal logic motion planning for mobile robots,” in IEEE Int. Conf. Robot. Automat., 2005, pp. 2020–2025.
  • [28] V. Kurtz and H. Lin, “Temporal logic motion planning with convex optimization via graphs of convex sets,” IEEE Trans. Robot., vol. 39, no. 5, pp. 3791–3804, 2023.
  • [29] C. Yang, M. L. Littman, and M. Carbin, “On the (in)tractability of reinforcement learning for LTL objectives,” in Int. Joint Conf. Artif. Intell., 2022, pp. 3650–3658.
  • [30] R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith, “Teaching multiple tasks to an RL agent using LTL,” in Proc. Int. Conf. Autonomous Agents Multiagent Syst., 2018, pp. 452–461.
  • [31] C. Voloshin, H. M. Le, S. Chaudhuri, and Y. Yue, “Policy optimization with linear temporal logic constraints,” in Advances in Neural Inf. Process. Syst., 2022, pp. 17 690–17 702.
  • [32] J. Song, Q. Zhang, H. Yin, M. Mardani, M.-Y. Liu, J. Kautz, Y. Chen, and A. Vahdat, “Loss-guided diffusion models for plug-and-play controllable generation,” in Int. Conf. Mach. Learn., vol. 202, 2023, pp. 32 483–32 498.
  • [33] P. Dhariwal and A. Q. Nichol, “Diffusion models beat GANs on image synthesis,” in Advances in Neural Inf. Process. Syst., vol. 34, 2021, pp. 8780–8794.
  • [34] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
  • [35] A. Jalal, M. Arvinte, G. Daras, E. Price, A. G. Dimakis, and J. Tamir, “Robust compressed sensing MRI with deep generative priors,” in Advances in Neural Inf. Process. Syst., vol. 34, 2021, pp. 14 938–14 954.
  • [36] H. Chung and J. C. Ye, “Score-based diffusion models for accelerated MRI,” Med. Image Anal., p. 102479, 2022.
  • [37] H. Chung, B. Sim, and J. C. Ye, “Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction,” in IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 12 413–12 422.
  • [38] H. Chung, B. Sim, D. Ryu, and J. C. Ye, “Improving diffusion models for inverse problems using manifold constraints,” in Advances in Neural Inf. Process. Syst., vol. 35, 2022, pp. 25 683–25 696.
  • [39] C. Rösmann, F. Hoffmann, and T. Bertram, “Integrated online trajectory planning and optimization in distinctive topologies,” Robot. Auton. Syst., vol. 88, pp. 142–153, 2017.
  • [40] K. Chen, E. Lim, K. Lin, Y. Chen, and H. Soh, “Behavioral refinement via interpolant-based policy diffusion,” in Proc. Robot.: Sci. and Syst. (RSS), 2024.
  • [41] J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine, “D4RL: Datasets for deep data-driven reinforcement learning,” arXiv preprint arXiv:2004.07219, 2020.
  • [42] A. Camacho, R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith, “LTL and beyond: Formal languages for reward function specification in reinforcement learning,” in Int. Joint Conf. Artif. Intell., 2019, pp. 6065–6073.
  • [43] C. Menghi, C. Tsigkanos, M. Askarpour, P. Pelliccione, G. Vázquez, R. Calinescu, and S. García, “Mission specification patterns for mobile robots: Providing support for quantitative properties,” IEEE Trans. Software Eng., vol. 49, no. 4, pp. 2741–2760, 2023.

-A Data

-A1 Maze2d

The Maze2d environment is part of D4RL [41], a well-known open-source benchmark for offline reinforcement learning. Datasets of trajectories are publicly available in the D4RL library; each trajectory in these datasets is one continuous trajectory of an agent navigating to random goals without interruption. See Fig. 8 for an overview of the environments. We use the original D4RL dataset as is to train the baseline Diffuser, which also serves as the diffusion model of LTLDoG.

Panels: (a) U-Maze; (b) Medium; (c) Large.
Figure 8: Maze2d environments. The agent must move a force-actuated ball from a random start location (green ball) to the goal (red circle). There are three maze configurations of increasing difficulty; all are long-horizon path-planning tasks with continuous states and actions.

-A2 PushT

PushT is a robot manipulation benchmark adopted in Diffusion Policy [3] to showcase the planning capabilities of diffusion models. A dataset of expert demonstrations for pushing a T-shaped block from random initial poses to the same goal pose is publicly available [3]. Although Diffusion Policy performs well on the original manipulation task when trained on only about 200 expert demonstrations (206 episodes), we find that these trajectories lack the diverse behaviors that could fully or partially satisfy different LTL$_f$ constraints. Expanding the original dataset is therefore necessary for our method to work.

Data Augmentation. Since obtaining extra expert demonstrations is costly, we applied common data augmentation techniques to expand the original dataset. Concretely, we first applied simple geometric augmentations (rotation and flipping only) to each trajectory in the original dataset, assuming basic physical symmetry in this simulation environment. Then, we used a pretrained Diffusion Policy to push the T block to the goal pose, consistent with the original dataset, in which the goal of the manipulation task is implicitly encoded in the trajectories. In this way, we expanded the original set from 206 to 6686 trajectories and increased the average episode length from 125 to 289 steps. The augmented dataset is publicly available online at https://github.com/clear-nus/ltldog.
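The geometric part of this augmentation can be sketched as below. The sketch assumes a per-step state layout `[agent_x, agent_y, block_x, block_y, block_theta]`, rotation about a chosen workspace center, and mirroring about a vertical line; it further assumes the T block is symmetric about its own local axis, so a mirrored pose has heading $-\theta$. These layout and symmetry choices are assumptions for illustration, not a description of the released dataset format, and the subsequent Diffusion Policy step is not shown.

```python
import numpy as np

# Assumed (hypothetical) per-step state layout:
# [agent_x, agent_y, block_x, block_y, block_theta]
AGENT_XY, BLOCK_XY, BLOCK_THETA = [0, 1], [2, 3], 4


def wrap_angle(theta):
    """Wrap angles to [-pi, pi)."""
    return (theta + np.pi) % (2 * np.pi) - np.pi


def rotate_trajectory(states, angle, center):
    """Rotate agent and block positions by `angle` about `center`; shift the block heading."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    center = np.asarray(center, dtype=float)
    out = np.array(states, dtype=float)  # work on a copy
    for cols in (AGENT_XY, BLOCK_XY):
        # Row vectors: p' = R (p - c) + c is written as (p - c) @ R^T + c.
        out[:, cols] = (out[:, cols] - center) @ rot.T + center
    out[:, BLOCK_THETA] = wrap_angle(out[:, BLOCK_THETA] + angle)
    return out


def flip_trajectory(states, center_x):
    """Mirror across the vertical line x = center_x.

    Assumes the T block is symmetric about its local axis, so the mirrored
    block pose has heading -theta.
    """
    out = np.array(states, dtype=float)
    x_cols = [AGENT_XY[0], BLOCK_XY[0]]
    out[:, x_cols] = 2.0 * center_x - out[:, x_cols]
    out[:, BLOCK_THETA] = wrap_angle(-out[:, BLOCK_THETA])
    return out
```

For example, with a hypothetical workspace center of (256, 256), `rotate_trajectory(traj, np.pi / 2, (256.0, 256.0))` and `flip_trajectory(traj, 256.0)` each produce one augmented copy of `traj`. Note that a rotated or mirrored trajectory also ends at a correspondingly rotated or mirrored goal pose, and may leave the workspace bounds.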

In this task, the baseline (Diffusion Policy) and our methods share the same diffusion model backbone, which is trained over the augmented dataset.

-A3 Real Robot Scenarios

Data for both real-world scenarios were collected in simulation. For the office, we collected 4 long continuous trajectories of 11,000, 107,000, 60,000, and 130,000 steps; for the lab, we collected 4 trajectories of 11,000, 19,000, 110,000, and 120,000 steps. To build the simulations, we first created SLAM maps of the two environments (a studio-like lab and an office room) by teleoperating a quadruped robot (Unitree Go2) mounted with a LiDAR, using slam_toolbox along with Nav2. We then created a simple 3D Gazebo world for each map. Finally, we drove a Go2 robot simulated in Gazebo with the ROS Navigation Stack to random goal positions sampled in the maps, producing long trajectories in a similar fashion to Maze2d.
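The random goal sampling can be sketched as follows. The text above does not specify how goals were drawn, so this is a hypothetical sketch assuming uniform sampling over free cells of a 2D occupancy grid that follows the ROS `nav_msgs/OccupancyGrid` value convention (0 free, 100 occupied, -1 unknown), with an optional clearance margin. It does not reproduce the Gazebo or Navigation Stack pipeline, and `sample_goals` is an illustrative name.

```python
import numpy as np

FREE = 0  # nav_msgs/OccupancyGrid convention: 0 = free, 100 = occupied, -1 = unknown


def sample_goals(grid, resolution, origin, n, clearance_cells=0, seed=None):
    """Sample n goal positions (metres) uniformly from free cells of an occupancy grid.

    grid: (H, W) int array, row index along +y and column index along +x.
    resolution: cell size in metres; origin: world (x, y) of the corner of cell (0, 0).
    clearance_cells: k > 0 requires an all-free (2k+1) x (2k+1) neighbourhood.
    """
    grid = np.asarray(grid)
    rows, cols = np.nonzero(grid == FREE)
    k = clearance_cells
    if k > 0:
        height, width = grid.shape
        keep = np.array(
            [
                r >= k and c >= k and r + k < height and c + k < width
                and bool(np.all(grid[r - k:r + k + 1, c - k:c + k + 1] == FREE))
                for r, c in zip(rows, cols)
            ],
            dtype=bool,
        )
        rows, cols = rows[keep], cols[keep]
    if rows.size == 0:
        raise ValueError("no free cell satisfies the clearance requirement")
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, rows.size, size=n)
    # Return cell centres in world coordinates.
    xs = origin[0] + (cols[idx] + 0.5) * resolution
    ys = origin[1] + (rows[idx] + 0.5) * resolution
    return np.stack([xs, ys], axis=1)
```

Each sampled goal would then be handed to the navigation stack in turn, and the recorded robot states concatenated into one long trajectory.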

-A4 LTL Satisfaction Values

The soft labels (values indicating how well the LTL$_f$ formulae are satisfied) used to train the regressor guide (LTLDoG-R) are computed with an adapted version of the DTL toolbox [15] and saved before each training session. The saved labels can be reused where possible; however, since each training session may sample and crop different trajectory segments from the datasets depending on its configuration, the labels may need to be recomputed. For benchmarking, satisfaction values are computed online during testing.

-B LTL$_f$ Generation

To generate practical LTL$_f$ formulae that are meaningful for real robotic tasks, we queried GPT-4 with a prompt that places some constraints on the formulae. The prompt and the response from ChatGPT are listed in Table VII. In the real-world experiments, we tested the first four returned LTL$_f$ formulae, as these are the most representative among them and are also commonly used in the robot learning literature [30, 42, 22]. Some of the other generated formulae in this table are either logically incorrect or not meaningful for robotic tasks. For instance, the 5th formula expresses a sequential visit similar to the 2nd. The implication operator imposes no constraint when its antecedent is false (e.g., the antecedent of the 6th formula can never be satisfied for disjoint regions). The 7th, 8th, and 9th formulae do not match their corresponding descriptions. The last formula can be considered a combination of the 2nd and 3rd.
More complex LTL$_f$ formulae and combinations of them can be derived from different visitation and avoidance patterns [43]. We plan to further explore these temporal specifications in environments such as Robomimic, and to adapt our method to complex combinations of LTL$_f$ formulae in future work.
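The observations about the 6th and 7th formulae can be checked by brute force. Under the prompt's assumption that regions are disjoint, each timestep holds at most one proposition, so the sketch below enumerates every such label trace up to a small length (an arbitrary bound chosen for illustration). It confirms that the antecedent $F(p_0 \wedge p_1)$ of the 6th formula never holds, making the implication vacuously true, and that the 7th formula is already satisfied by a trace that visits a single region, contrary to its "at least three regions" description. The helper names are hypothetical.

```python
import itertools

PROPS = ("p0", "p1", "p2", "p3")
# Each step is the set of propositions that hold. Regions are disjoint, so a step
# holds at most one proposition; the empty set means "outside every region".
STEPS = [frozenset()] + [frozenset({p}) for p in PROPS]


def all_traces(max_len):
    """Every label trace of length 1..max_len under the disjointness assumption."""
    for n in range(1, max_len + 1):
        yield from itertools.product(STEPS, repeat=n)


def eventually(trace, props):
    """F(conjunction of props): some step makes every proposition in props true."""
    return any(set(props) <= step for step in trace)


def formula_6(trace):  # F(p0 & p1) -> F(p2)
    return (not eventually(trace, {"p0", "p1"})) or eventually(trace, {"p2"})


def formula_7(trace):  # F(p0) | F(p1) | F(p2) | F(p3)
    return any(eventually(trace, {p}) for p in PROPS)


if __name__ == "__main__":
    traces = list(all_traces(4))
    print(len(traces), all(formula_6(t) for t in traces))
```

The same enumeration can be reused to sanity-check other candidate formulae before testing them, since any formula that holds (or fails) on every disjoint-region trace imposes no useful constraint.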

TABLE VII: Prompts and Responses from ChatGPT (GPT-4)¹ for LTL$_f$ Generation.
Prompt: In a robot navigation task, the robot is required to visit four key regions (referred to as p_0, p_1, p_2 and p_3) in specific temporal orders. The constraints on the robot’s trajectory concerning the visitation (or non-visitation) of these regions can be specified using finite Linear Temporal Logic (LTLf). Please generate several LTLf formulas (e.g., 10) that are realistic in real-world environments and tasks involving robot navigation. The details of the task and additional requirements are as follows: 1. The LTLf operators that can be used are: ¬ (Negate), ∧ (And), ∨ (Or), F (Eventually), G (Always), U (Until). Avoid using other LTL operators. Also, avoid using X (Next) since the number of timesteps needed to move from one region to another is unknown. 2. The atomic propositions are determined by whether the robot is inside p_0, p_1, p_2 and p_3. Therefore, the labelling function returns assignments with a 4-dimensional 0-1 vector at each timestep of the robot’s trajectory. 3. Only one proposition can be true at each timestep because these regions are disjoint. The robot cannot be inside any two regions simultaneously. An assignment at a timestep can be a zero vector (0, 0, 0, 0), indicating that the robot is not currently in any of these regions. This occurs when the robot is initially not in these regions and all atomic propositions are False. 4. The robot cannot move directly from one region to another as the regions might not be connected. Therefore, LTLf formulas such as p_0 U p_1 are not valid since both propositions will be False when the robot is not inside these two regions. 5. The robot’s trajectory is finite, so the robot cannot repeatedly visit different regions. 6. The LTLf formulas are supposed to apply from the first timestep of the assignment. 7. When generating an LTLf formula, please analyze the generated formula first to ensure that it is valid and that there exist assignments that can satisfy it.
ChatGPT
To create realistic LTLf formulas for a robot navigation task considering the specified constraints and operators, we need to structure these formulas in a way that reflects possible navigation scenarios. Here are ten LTLf formulas that align with the given requirements:
1. Visit all regions at least once: $F(p_0) \wedge F(p_1) \wedge F(p_2) \wedge F(p_3)$. This formula states that eventually, the robot will visit each of the four regions at least once during its trajectory.
2. Visit regions in a specific sequence: $F(p_0 \wedge F(p_1 \wedge F(p_2 \wedge F(p_3))))$. This formula requires the robot to visit $p_0$, then $p_1$, followed by $p_2$, and finally $p_3$, in that order.
3. Avoid a specific region until another has been visited: $\neg p_1 \mathbin{\mathsf{U}} p_0$. Here, the robot must avoid $p_1$ until $p_0$ has been visited.
4. Never visit a specific region: $G(\neg p_3)$. This formula specifies that the robot should never enter region $p_3$.
5. Visit one region only after visiting two others: $F(p_0 \wedge F(p_1)) \Rightarrow F(p_2)$. The robot can visit $p_2$ only after it has visited $p_0$ and then $p_1$.
6. Ensure two regions are visited before a third: $F(p_0 \wedge p_1) \Rightarrow F(p_2)$. This formula ensures that $p_2$ and $p_1$ are visited (in no particular order) before visiting $p_2$:
7. Visit at least three regions: $F(p_0) \vee F(p_1) \vee F(p_2) \vee F(p_3)$. This ensures the robot visits at least three out of the four regions. It is a less strict requirement, allowing for flexibility in navigation.
8. Visit a specific region last: $F(p_0 \wedge F(p_1 \wedge F(p_2))) \wedge G(\neg p_3 \mathbin{\mathsf{U}} (p_0 \wedge F(p_1 \wedge F(p_2))))$. The robot must visit $p_3$ last, after visiting $p_0$, $p_1$, and $p_2$ in any order.
9. Avoid a region until all other regions have been visited: $\neg p_3 \mathbin{\mathsf{U}} (F(p_0) \wedge F(p_1) \wedge F(p_2))$. The robot should avoid $p_3$ until it has visited $p_0$, $p_1$, and $p_2$ in any order.
10. Visit a pair of regions, avoiding others until then: $\neg(p_2 \vee p_3) \mathbin{\mathsf{U}} (p_0 \wedge F(p_1))$. The robot must avoid $p_2$ and $p_3$ until it has visited $p_0$ and then $p_1$:
Each of these formulas considers the finite nature of the robot's trajectory, the impossibility of being in multiple regions simultaneously, and the requirement that the robot may need to travel through unspecified areas (not within $p_0$, $p_1$, $p_2$, or $p_3$) to move from one region to another. The formulas are structured to reflect realistic navigation tasks that could be encountered in real-world robot navigation scenarios, such as search and rescue, surveillance, or delivery tasks within a specified area.
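Requirement 7 of the prompt asks that every formula be checked for satisfiability against assignments that respect the disjointness of the regions. The minimal Python sketch below makes that check concrete under finite-trace ($\text{LTL}_f$) semantics. It is not part of the LTLDoG code: the region coordinates, the nested-tuple formula encoding, and the names `label` and `holds` are hypothetical. Evaluating a formula on a single trace only shows that this trace satisfies it. It can, however, expose subformulas that no disjoint labelling can satisfy, such as $F(p_0 \wedge p_1)$ in item 6.

```python
from typing import Sequence, Tuple

Label = Tuple[bool, ...]                    # one Boolean per proposition p_0..p_3
Box = Tuple[float, float, float, float]     # (x_min, x_max, y_min, y_max)

# Hypothetical disjoint rectangular regions standing in for p_0..p_3.
REGIONS: Sequence[Box] = [
    (0.0, 1.0, 0.0, 1.0),
    (3.0, 4.0, 0.0, 1.0),
    (3.0, 4.0, 3.0, 4.0),
    (0.0, 1.0, 3.0, 4.0),
]


def label(pos: Tuple[float, float]) -> Label:
    """Labelling function: which regions contain pos (at most one, as regions are disjoint)."""
    x, y = pos
    return tuple(x0 <= x <= x1 and y0 <= y <= y1 for (x0, x1, y0, y1) in REGIONS)


def holds(f, trace: Sequence[Label], t: int = 0) -> bool:
    """Evaluate a nested-tuple formula at step t of a finite trace (LTLf semantics)."""
    op, n = f[0], len(trace)
    if op == "ap":
        return trace[t][f[1]]
    if op == "not":
        return not holds(f[1], trace, t)
    if op == "and":
        return holds(f[1], trace, t) and holds(f[2], trace, t)
    if op == "or":
        return holds(f[1], trace, t) or holds(f[2], trace, t)
    if op == "F":  # eventually: at some step k >= t
        return any(holds(f[1], trace, k) for k in range(t, n))
    if op == "G":  # always: at every step k >= t
        return all(holds(f[1], trace, k) for k in range(t, n))
    if op == "U":  # strong until: f[2] at some k >= t, and f[1] at every step in [t, k)
        return any(
            holds(f[2], trace, k) and all(holds(f[1], trace, j) for j in range(t, k))
            for k in range(t, n)
        )
    raise ValueError(f"unknown operator {op!r}")


p = [("ap", i) for i in range(4)]

# Start outside all regions, then visit p_0, p_1, p_2, p_3 with gaps in between.
path = [(2.0, 2.0), (0.5, 0.5), (2.0, 0.5), (3.5, 0.5),
        (3.5, 2.0), (3.5, 3.5), (2.0, 3.5), (0.5, 3.5)]
trace = [label(q) for q in path]
trace_rev = list(reversed(trace))

# Item 2: F(p0 ∧ F(p1 ∧ F(p2 ∧ F(p3))))
seq = ("F", ("and", p[0], ("F", ("and", p[1], ("F", ("and", p[2], ("F", p[3])))))))
# Item 3: ¬p1 U p0
avoid_until = ("U", ("not", p[1]), p[0])
# Antecedent of item 6: F(p0 ∧ p1)
both = ("F", ("and", p[0], p[1]))

print(holds(seq, trace))               # True
print(holds(avoid_until, trace))       # True
print(holds(avoid_until, trace_rev))   # False: p_1 is reached before p_0
print(holds(both, trace))              # False: no step is labelled with both p_0 and p_1
```

Because `holds` simply enumerates future steps, it costs polynomial time in the trace length per operator, which is fine for checking a handful of short trajectories. Automaton-based checkers scale better.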

-C Additional Illustrations

More illustrations and comparisons against the baselines for the obstacle avoidance task in the Maze2d environment are shown in Fig. 9.

[Figure 9 panels: (a) Trajectories from Diffuser; (b) Trajectories from SafeDiffuser; (c) Trajectories from LTLDoG-S; (d) Trajectories from LTLDoG-R.]
Figure 9: Results of safe planning in Maze2d-Large with unsafe regions. There are three unsafe blocks (red squares, labeled $p_L$, $p_M$, $p_R$ from left to right) that need to be avoided during navigation to the goal (shaded circle). The $\text{LTL}_f$ constraint for this task is $\varphi = \Box\neg(p_L \vee p_M \vee p_R)$. (a) Trajectories from Diffuser ignore safety and can violate the specified constraints. (b) SafeDiffuser produces discontinuous trajectories (the constraints here are slightly reshaped to accommodate the CBFs). (c, d) Our LTLDoG plans trajectories that detour around the obstacles and successfully arrive at the goal.
TABLE VIII: Results of generated $\text{LTL}_f$s in simulation rooms.
| Task | Method | Lab: success rate (%) ↑ | Office: success rate (%) ↑ |
|---|---|---|---|
| Obstacle avoidance | Diffuser | 11.7 ± 3.7 | 0.0 ± 0.0 |
| | LTLDoG-S | 100.0 ± 0.0 | 80.0 ± 8.1 |
| Temporal behaviors | Diffuser | 0.0 ± 0.0 | 0.0 ± 0.0 |
| | LTLDoG-S | 96.5 ± 1.5 | 66.7 ± 1.8 |
[Figure 10 panels: (a) Planning; (b) Rollout C1; (c) Rollout C2.]
Figure 10: Example of a failure case from LTLDoG-S.
TABLE IX: Comparison of Controllers¹ in Avoidance Tasks in Maze2d
| Method | Success rate (%) ↑: Planning | Success rate (%) ↑: Rollout C1 | Success rate (%) ↑: Rollout C2 | Reward (UnCon) ↑: C1 | Reward (UnCon) ↑: C2 |
|---|---|---|---|---|---|
| Diffuser | 9.5 ± 3.1 | 11.0 ± 1.7 | 15.0 ± 2.6 | 142.2 ± 5.1 | 92.1 ± 3.0 |
| SafeDiffuser | 99.4 ± 0.9 | 12.3 ± 3.5 | 14.2 ± 2.8 | 135.8 ± 5.2 | 131.9 ± 5.4 |
| LTLDoG-S | 99.0 ± 0.8 | 73.0 ± 3.0 | 90.6 ± 0.6 | 97.3 ± 2.8 | 36.2 ± 11.7 |
| LTLDoG-R | 98.8 ± 0.8 | 92.0 ± 1.4 | 99.6 ± 0.6 | 127.1 ± 5.1 | 76.4 ± 13.4 |
¹ Planned trajectories are rolled out using two different low-level control strategies, denoted C1 and C2.

Figs. 11–13 present two more examples comparing our method, LTLDoG, against the baseline Diffusion Policy on the obstacle avoidance and temporal behavior tasks. These results demonstrate that our method can guide diffusion models to follow not only purely spatial constraints but also spatio-temporal ones.

[Figure 11 panels: (a) Regions; (b) Diffusion Policy; (c) Ours (LTLDoG-S); (d) Ours (LTLDoG-R).]
Figure 11: PushT task. Diffusion Policy generates the next locations of the mover, which pushes the T block by following these locations. Atomic propositions for the $\text{LTL}_f$ formulas are defined by constraints over actions.
[Figure 12 panels: (a) LTLDoG-S; (b) LTLDoG-R.]
Figure 12: Snapshots of two successful rollouts of the two variants of our method accomplishing the same task as in Fig. 11. The mover (blue filled circle) should avoid regions $p_1$ and $p_3$ at all times, as designated by the $\text{LTL}_f$ formula $\Box(\neg p_1 \wedge \neg p_3)$. Fig. 12(a) and Fig. 12(b) show that LTLDoG-S and LTLDoG-R exploit different modalities to complete the original manipulation task while respecting the constraint specified by the $\text{LTL}_f$ formula.
[Figure 13 panels: (a) Diffusion Policy (baseline); (b) LTLDoG-S (ours); (c) LTLDoG-R (ours).]
Figure 13: Snapshots comparing the baseline and LTLDoG given an $\text{LTL}_f$ formula with intricate temporal constraints. The $\text{LTL}_f$ formula $\neg p_1 \mathbin{\mathsf{U}} (p_2 \wedge (\neg p_0 \mathbin{\mathsf{U}} p_3))$ demands that the mover never enter $p_1$ until (or, unless by the time of entering) both of the following conditions are met: 1) it has already been in $p_2$, and 2) $p_0$ is never reached until $p_3$ is visited. As shown from the third to the fifth snapshot, the baseline breaches this complex constraint while proceeding with the original manipulation task (the mover enters $p_0$ without reaching $p_3$ beforehand and then gets into $p_0$). In contrast, both variants of our method, LTLDoG-S and LTLDoG-R, satisfy the constraint and complete the manipulation task in two different ways (the common ground is to first visit $p_3$ and $p_2$ during or on the way to maneuvers).

We also show in Fig. 14 more snapshots of a quadruped robot running in the simulated Lab environment with different policies (baseline versus ours). The overall satisfaction rates of all raw generated trajectories across tasks and environments are summarized in Table VIII. The overall performance is $85.8 \pm 14.0\%$ (cf. baseline $2.9 \pm 5.4\%$), as reported in the main paper.
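The two overall means can be reproduced as the unweighted average of the four task/room cells in Table VIII. The sketch below assumes equal weighting of the cells. How the ± spreads were pooled is not stated here, so only the means are checked.

```python
from statistics import mean

# Success rates (%) copied from Table VIII, keyed by (task, room).
ltldog_s = {
    ("obstacle avoidance", "Lab"): 100.0,
    ("obstacle avoidance", "Office"): 80.0,
    ("temporal behaviors", "Lab"): 96.5,
    ("temporal behaviors", "Office"): 66.7,
}
diffuser = {
    ("obstacle avoidance", "Lab"): 11.7,
    ("obstacle avoidance", "Office"): 0.0,
    ("temporal behaviors", "Lab"): 0.0,
    ("temporal behaviors", "Office"): 0.0,
}

# Equal weight per (task, room) cell is an assumption of this check.
overall_ltldog = mean(ltldog_s.values())
overall_diffuser = mean(diffuser.values())
print(f"LTLDoG-S: {overall_ltldog:.1f}%  Diffuser: {overall_diffuser:.1f}%")
# LTLDoG-S: 85.8%  Diffuser: 2.9%
```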

[Figure 14 panels: (a) Frames of trajectories from Diffuser in simulation; (b) Frames of trajectories from LTLDoG-S in simulation; (c) Frames of trajectories from LTLDoG-R in simulation.]
Figure 14: Results of safe planning and execution on a quadruped robot in simulation and in an indoor room.

-D Failure Case Study

Fig. 10(a) shows an infeasible trajectory generated by LTLDoG-S. We hypothesize that posterior sampling in LTLDoG-S computes gradients based on Euclidean distances in the trajectory space, as required by the LTL evaluation, and that these gradients may push the trajectory away from the data manifold. In contrast, the neural network used for regressor guidance (LTLDoG-R) is trained on noised intermediate samples, which are linked to the data manifold. Gradient ascent in the noise space is therefore likely to push the noised sample toward a region where both the satisfaction value and the likelihood are high. The visualizations of the two controllers in Fig. 10(b) and Fig. 10(c) suggest that selecting next locations sequentially without distance checking is a less safe option.
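To show where the posterior-sampling gradient is taken, here is a minimal sketch. It assumes a DPS-style update that differentiates a satisfaction value evaluated on the denoised estimate $\hat{x}_0(x_t)$ with respect to $x_t$. Everything in it is a hypothetical stand-in, not the LTLDoG implementation. A closed-form Gaussian posterior mean replaces the learned denoiser so that the example runs without a model. A soft-min of distance margins replaces the differentiable $\text{LTL}_f$ evaluation for "always avoid a disk". Central finite differences replace autodiff. Only the guidance term is shown, not the rest of the reverse step. Regressor guidance (LTLDoG-R) would instead differentiate a network trained directly on noisy $x_t$, which is not sketched here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def soft_min(values: List[float], temp: float) -> float:
    """Smooth lower bound on min(values): -temp * log(sum(exp(-v / temp)))."""
    m = min(values)
    return m - temp * math.log(sum(math.exp(-(v - m) / temp) for v in values))


def avoid_value(traj: List[Point], center: Point, radius: float, temp: float = 0.1) -> float:
    """Hypothetical soft value of G(not obstacle): positive only if every waypoint is outside the disk."""
    return soft_min([math.dist(q, center) - radius for q in traj], temp)


def denoised_estimate(x_t: List[Point], mu: List[Point], alpha_bar: float, s2: float) -> List[Point]:
    """E[x_0 | x_t] for a stand-in prior x_0 ~ N(mu, s2 I), x_t = sqrt(ab) x_0 + sqrt(1 - ab) eps."""
    a = math.sqrt(alpha_bar)
    c = a * s2 / (alpha_bar * s2 + 1.0 - alpha_bar)
    return [(m[0] + c * (x[0] - a * m[0]), m[1] + c * (x[1] - a * m[1])) for x, m in zip(x_t, mu)]


def to_points(flat: List[float]) -> List[Point]:
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def guidance_step(x_t, mu, alpha_bar, s2, center, radius, scale, h=1e-5):
    """Shift x_t along grad_{x_t} V(x0_hat(x_t)); central differences stand in for autodiff."""
    def value(flat):
        return avoid_value(denoised_estimate(to_points(flat), mu, alpha_bar, s2), center, radius)

    flat = [v for q in x_t for v in q]
    grad = []
    for i in range(len(flat)):
        up, down = flat[:], flat[:]
        up[i] += h
        down[i] -= h
        grad.append((value(up) - value(down)) / (2.0 * h))
    return to_points([v + scale * g for v, g in zip(flat, grad)])


# Toy setup: the prior mean runs straight through a disk obstacle at (1.5, 0).
mu = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
x_t = [(0.1, 0.2), (0.8, 0.1), (2.1, -0.1), (2.9, 0.0)]
center, radius, alpha_bar, s2 = (1.5, 0.0), 0.5, 0.5, 1.0

v_before = avoid_value(denoised_estimate(x_t, mu, alpha_bar, s2), center, radius)
x_guided = guidance_step(x_t, mu, alpha_bar, s2, center, radius, scale=0.01)
v_after = avoid_value(denoised_estimate(x_guided, mu, alpha_bar, s2), center, radius)
print(v_before, v_after)
```

The gradient is driven purely by Euclidean margins of $\hat{x}_0$. The update term itself contains nothing that rewards staying close to the data distribution, which is consistent with the hypothesis above.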

Low-Level Control Policy The success rate during rollout also depends on the controller used to track each location of the generated trajectory. TABLE IX compares the two low-level control policies. A more conservative controller (C2), which visits every location exactly with distance checking, achieves a higher rollout success rate for every method in TABLE IX than the greedy controller (C1, originally used in [1]).
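The difference between the two controllers can be illustrated with a toy point-mass tracker. This is a hypothetical sketch, not the C1 of [1] or the authors' C2. The greedy tracker advances its target by one waypoint per control step whether or not the previous waypoint was reached. The checked tracker advances only after arriving within a tolerance. The detour plan, the box obstacle, and the speed limit `v_max` (smaller than the waypoint spacing) are assumptions chosen so that lagging behind the plan makes the greedy tracker cut the corner.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def step_toward(pos: Point, target: Point, v_max: float) -> Point:
    """Move at most v_max toward target, landing exactly on it when close enough."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    d = math.hypot(dx, dy)
    if d <= v_max:
        return target
    return (pos[0] + v_max * dx / d, pos[1] + v_max * dy / d)


def rollout_greedy(waypoints: List[Point], v_max: float) -> List[Point]:
    """C1-like (hypothetical): one control step per waypoint, reached or not."""
    pos = waypoints[0]
    path = [pos]
    for target in waypoints[1:]:
        pos = step_toward(pos, target, v_max)
        path.append(pos)
    return path


def rollout_checked(waypoints: List[Point], v_max: float, tol: float = 1e-6,
                    max_steps: int = 1000) -> List[Point]:
    """C2-like (hypothetical): advance to the next waypoint only once within tol of the current one."""
    pos, i = waypoints[0], 1
    path = [pos]
    for _ in range(max_steps):
        while i < len(waypoints) and math.dist(pos, waypoints[i]) < tol:
            i += 1
        if i == len(waypoints):
            break
        pos = step_toward(pos, waypoints[i], v_max)
        path.append(pos)
    return path


def in_obstacle(q: Point, box=(0.5, 1.5, -0.5, 1.5)) -> bool:
    """Open box (x_min, x_max, y_min, y_max) that the plan detours around."""
    x0, x1, y0, y1 = box
    return x0 < q[0] < x1 and y0 < q[1] < y1


# Planned detour: up the left side, across the top, down the right side.
plan = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 2.0), (2.0, 2.0), (2.0, 1.0), (2.0, 0.0)]
c1 = rollout_greedy(plan, v_max=0.5)
c2 = rollout_checked(plan, v_max=0.5)
print(any(in_obstacle(q) for q in c1))  # True: the lagging greedy tracker cuts the corner
print(any(in_obstacle(q) for q in c2))  # False: the checked tracker follows every segment
```

The checked tracker pays for its safety with more control steps (12 here versus 6). This is one plausible reason why a conservative controller can trade reward for constraint satisfaction.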