Automated planning
“Planning is the art and practice of thinking before acting: of reviewing the courses of action one has available and predicting their expected (and unexpected) results to be able to choose the course of action most beneficial with respect to one’s goals.”
–Patrik Haslum–
Introduction
Basic Planning Problem: given descriptions of possible initial states of the world, desired goals and a set of possible actions, synthesize a plan that is guaranteed to generate a state which contains the desired goals.
In artificial intelligence, planning refers to the process of generating a sequence of actions to achieve a specific goal or set of goals. Planning involves finding an optimal sequence of actions that can transform an initial state of the world into a desired goal state while satisfying any relevant constraints and optimizing some objective function.
In summary, AI Planning is the model-based approach to action selection: it produces the behaviour from the model (it solves the model).
Classical Planning Model
The classical planning model is a tuple of six elements: \[\mathcal{S} = \langle S, s_0, S_G, A, f, c \rangle \] where:
- the first element is a finite set of states \(S\)
- one of these states is the initial state \(s_0∈S\)
- a known, non-empty subset of these are the goal states \(S_G⊆S\)
- a finite set of actions \(A\), with a subset of applicable actions \(A(s)⊆A\) defined for each state \(s∈S\)
- a deterministic transition function maps a state s and an applicable action a into a resulting state s’: \[s' = f(a, s) \text{ for } a∈A(s)\]
- finally, each transition has an associated non-negative action cost \(c(a, s)\)
So solutions are sequences of applicable actions that start at the initial state \(s_0\) and end in one of the goal states \(S_G\)
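To make the tuple concrete, here is a minimal Python sketch of such a state model; the names (ClassicalStateModel and its fields) are purely illustrative and not from any planning library.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable

@dataclass
class ClassicalStateModel:
    """Illustrative container for the tuple <S, s0, S_G, A, f, c>.
    The state set S is left implicit: it is whatever states are reachable."""
    initial_state: State                              # s0
    is_goal: Callable[[State], bool]                  # membership test for S_G
    applicable: Callable[[State], Iterable[Action]]   # s -> A(s) ⊆ A
    transition: Callable[[Action, State], State]      # (a, s) -> s' = f(a, s)
    cost: Callable[[Action, State], float]            # (a, s) -> c(a, s) ≥ 0
```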
Language for Classical Planning: Strips
In order to be able to completely represent those models we use planning languages. The simplest and most common one is Strips, which stands for Stanford Research Institute Problem Solver.
In Strips the planning task is given by a 5-tuple of elements: \[\Pi = \langle F, O, c, I, G \rangle\] where:
- F: finite set of atoms or boolean variables
- O: finite set of operators or actions of the form \(\langle Add, Del, Pre \rangle\) (Add/Delete/Preconditions), each of which is a subset of atoms
- \(c: O \mapsto \mathbb{R}\): a non-negative operator cost or action cost that assigns a non-negative value to each action
- I: initial state (a subset of atoms)
- G: goal description (also a subset of atoms)
A plan is a sequence of applicable actions that maps I into a state consistent with G
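The 5-tuple can be written down directly in code. Below is a sketch in which atoms are strings and the sets Add/Del/Pre/I/G are frozensets; the class names (Operator, StripsTask) and the toy two-room task are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Atom = str

@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[Atom]   # Pre: atoms that must hold to apply the operator
    add: FrozenSet[Atom]   # Add: atoms made true by the operator
    dele: FrozenSet[Atom]  # Del: atoms made false ("del" is a Python keyword)

@dataclass
class StripsTask:
    atoms: FrozenSet[Atom]              # F
    operators: Tuple[Operator, ...]     # O
    cost: Dict[str, float]              # c: operator name -> non-negative cost
    init: FrozenSet[Atom]               # I
    goal: FrozenSet[Atom]               # G

# Toy task: a robot must move from room A to room B.
move_a_b = Operator("move-a-b",
                    pre=frozenset({"at-a"}),
                    add=frozenset({"at-b"}),
                    dele=frozenset({"at-a"}))
task = StripsTask(atoms=frozenset({"at-a", "at-b"}),
                  operators=(move_a_b,),
                  cost={"move-a-b": 1.0},
                  init=frozenset({"at-a"}),
                  goal=frozenset({"at-b"}))
```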
From Language to Models
Given a Strips planning task we can define the model as follows: a Strips planning task \(\Pi = \langle F, O, c, I, G \rangle\) determines the state model \(\mathcal{S}(\Pi)\) where:
- the states \(s∈S\) are collections of atoms from F (all possible subsets of F)
- the initial state \(s_0\) is I
- the goal states \(s\) are those such that \(G⊆s\) (all states that are supersets of G)
- the actions a in \(A(s)\) are operators in O s.t. \(Pre(a)⊆s\) (the actions in A(s) are the operators in O such that the preconditions are included in the state)
- the next state \(s'=s−Del(a)+Add(a)\) is generated by removing the set of delete effects and adding the set of add effects
- action costs are \(c(a, s)=c(a)\): all transitions that belong to the same action get the same cost
Solutions to the planning model are exactly the solutions for the planning task: solutions of \(\mathcal{S}(\Pi)\) are plans of \(\Pi\)
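Here is a sketch of the induced state model \(\mathcal{S}(\Pi)\), reusing the illustrative StripsTask/Operator classes from the earlier sketch: states are frozensets of atoms, progression removes the delete effects and adds the add effects, and a plan is checked exactly as described above.

```python
def applicable(task, state):
    """A(s): operators whose preconditions are contained in the state."""
    return [op for op in task.operators if op.pre <= state]

def progress(state, op):
    """s' = s - Del(a) + Add(a)."""
    return (state - op.dele) | op.add

def is_goal(task, state):
    """Goal states are supersets of G."""
    return task.goal <= state

def is_plan(task, ops):
    """A plan maps I into a state consistent with G via applicable actions."""
    state = task.init
    for op in ops:
        if not op.pre <= state:
            return False
        state = progress(state, op)
    return is_goal(task, state)

# For the toy task above: is_plan(task, [move_a_b]) -> True
```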
Planning and Model-based Reinforcement Learning
Let’s see how Strips planning tasks can be used to define the models commonly used in model-based reinforcement learning. For the forward model we need to define the next state, given the current state and an action: take the state, remove the delete effects and add the add effects. The only difference is that this is done only for applicable actions; for non-applicable actions the state is left unchanged (a self-loop). \[(a, s_i) \rightarrow s_{i+1} = s_i−Del(a)+Add(a) \text{ if } Pre(a)⊆s_i, \quad (a, s_i) \rightarrow s_i \text{ otherwise}\]
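A direct transcription of this forward model as a sketch (same illustrative representation as above); note the self-loop for non-applicable actions.

```python
def forward_model(op, state):
    """(a, s_i) -> s_{i+1}: progress if applicable, otherwise leave the state unchanged."""
    if op.pre <= state:
        return (state - op.dele) | op.add
    return state  # self-loop for non-applicable actions
```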
For the backward/reverse model, we regress an action in a state by removing the add effects and adding the preconditions. This is only valid for actions whose delete effects don’t appear in the state. \[s_{i+1} \rightarrow (a, s_i) \text{ where } Del(a)∩s_{i+1}=∅ \text{ and } s_i = s_{i+1} − Add(a) + Pre(a)\]
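The backward model as a sketch: regression is defined only when none of the action’s delete effects appear in the state.

```python
def backward_model(op, next_state):
    """s_{i+1} -> (a, s_i): regress next_state through op, if regression is valid."""
    if op.dele & next_state:
        return None  # invalid: a delete effect of op holds in s_{i+1}
    return (next_state - op.add) | op.pre  # s_i = s_{i+1} - Add(a) + Pre(a)
```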
For the inverse model we can find an action whose preconditions are included in \(s_i\) and whose application results in \(s_{i+1}\). That can be done simply by going over all applicable actions. \[(s_{i+1}, s_i) \rightarrow a, \text{ where } Pre(a)⊆s_i \text{ and } s_{i+1} = s_i − Del(a) + Add(a)\]
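A sketch of the inverse model along those lines: enumerate the applicable actions and keep those whose progression produces the observed successor.

```python
def inverse_model(task, state, next_state):
    """(s_i, s_{i+1}) -> actions a with Pre(a) ⊆ s_i and s_{i+1} = s_i - Del(a) + Add(a)."""
    return [op for op in task.operators
            if op.pre <= state and (state - op.dele) | op.add == next_state]
```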
The rewards approximate the negative of the true optimal cost \(-c^*(s_{i+1})\) of reaching the goal (the reward obtainable) from the state \(s_{i+1}\): \[(s_i, a, s_{i+1}) \rightarrow h(s_{i+1}) − h(s_i)\]
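A rough sketch of this reward approximation; the heuristic h below (the negated count of goal atoms not yet achieved, as a crude stand-in for \(-c^*\)) is just one illustrative choice, not the one prescribed here.

```python
def h(task, state):
    """Illustrative heuristic: negated number of goal atoms not yet achieved."""
    return -len(task.goal - state)

def reward(task, state, op, next_state):
    """(s_i, a, s_{i+1}) -> h(s_{i+1}) - h(s_i)."""
    return h(task, next_state) - h(task, state)
```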
Computational Problems of Classical Planning
- In cost-optimal planning: find a plan (sequence of actions) that minimizes the summed action cost
- In satisficing planning: we want to find any plan, improving the plan cost as much as possible (the cheaper the plan, the better)
- In agile planning: we care only about how quickly a plan can be found, ignoring its quality completely
- In top-k planning: find k plans such that no cheaper plans exist (this basically generalizes cost-optimal planning)
- In top-quality planning: we search for all possible plans up to a certain cost
- In diverse planning: we deal with both plan quality and plan diversity, aiming at obtaining a diverse set of plans while considering plan quality as well