## Friday, June 26, 2009

### A* / HPA* links and scribblings shared

There just was a question on the AiGameDev forums asking about A* (Astar), I remembered I had my own old links and scribblings somewhere so I shared them, so here they are if anybody needs them:

A* / HPA* links, references, implementation considerations:

A* basic theory scribblings:

If you have been following the blog you will see that I recently got into Reinforcement Learning and Dynamic Programming (see previous post), this gave me a much better overview about the 'essence' A* is and how it came to be, really just a case of applying dynamic programming.

The essence:
"Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (written in 1957 by Bellman, a genius)

.

## Sunday, June 21, 2009

### Pure math versus game animation technology

Today was yet another enlightening day, the game developer part of me likes to call those days "Level-up" days, if you look at my AiGameDev forum posts, my twits on twitter, the discussions we have on #gameai IRC channel, my emails with Mathematicians I have never met, you could see the conext of today's enlightment, but here is the full story:

Questionable game animation technology research

Working on the AIGameDev animation system with Alex Champandard, we reached a rather experimental (for now) stage of wanting to use Reinforcement learning to learn the heuristic of an A* planner used for planning locomotion based on a given, non-annotated, step based, automatically generated motion graph.

The motivation of doing this might be questionable, but the whole thing developed step by step.
In summary we had the motion graph builders, written by Alex, he also had the brilliant idea of making them step based, using very sensible points in time to make animation transitions, we had the generic A* code I have written, we wanted to combine the two to have something moving on the screen, showing the usefulness of this foot-skate free motion graph approach.
This gave birth to the A* motion planner, A planner that is not very suitable for multiple characters in real time, a planner that only needs a motion graph, and no extra code in form of hand written controllers, that gets you from A to B in a fast, high transition quality way.
It was not very good at reaching the destination with an orientation constraint, it was possible, but it made the heuristic too complicated, almost like writing a manual controller, and much
slower. In theory, this should be very easy, or at least easily learnable.

Reinforcement Learning

There are papers that use RL for motion planning, pretty recent ones (starting 2007), like "Real-Time Planning for Parametrized Human Motion" or "Near Optimal Character Animation with Continuous Control", I mention those two because of I have them printed on my running-out-of-space desk, along with countless "Gleicher"/"Kovar" papers about animation and motion graphs. It was Alex who drove me into all of this, so any straying is both his fault, and also his merit, I am grateful, I have long been searching for something interesting, some of my previous jobs failed big there.

The idea was to learn the heuristic, making the planner suitable for real time, this might be questionable for some people, the industry mostly drives it's animations, no need for planning at all, simple, effective, KISS, and full of foot-skate.

At the AiGameDev conference in Paris this month, there was one panel where we discussed the uncanny valley, the need or no-need to cross it, I also discussed this afterwards with Chrisitiaan Moleman and Markus Mohr, Remco Straatman and others, I mention those because I was extremely pleasantly surprised about how passionate and nice those people and everybody else who was at the conference are, I am not getting tired of repeating this because it was remarkable.

So to go back, the motivation, is by all means questionable, except if we decide that we want to see how a next step in animation would look like, I am not saying we should be planning the whole time, there will be times for driving the animations and times for planning, but I'm keeping this under wraps for the moment, although it's not very difficult to imagine what I mean.

Applying Reinforcement Learning

So all of this got me into Reinforcement Learning, a classical reference is the Sutton/Barto book "Reinforcement Learning, an Introduction" available online for free. I also bought a couple of related books. I read the parts of online book I needed the 1st time and designed an RL approach to learning what we needed to learn to help the A* heuristic, the result was that it basically 'worked' but was a naive first take that needed refinement to be really useful, the details will be available to look at in the AiGamDev sandbox at some point. At this stage I had read parts of the online book some in more detailed than other, and I would say I had revisited some parts 3 or 4 times. I was at the stage where I understood what we need, make a first attempt, understood the problem much better because of it, and was ready to design the second take, which needed a technique that is a bit more evolved that the 1st technique I had used.

Reading in detail and understanding the Blues

I decided, that I do not feel like coming back to the basics anymore, and to read the full book in maximum detail, all while writing a take-away and never look back. This alone was very enlightening, there were many subtle issues that at the level of detail that I had read until then, blurred together and looked like one simple thing, I like to compare this with music (being an ex-quarter-musician). When you listen to Progressive Metal, Jazz, Blues the first time, and you listen to one second, then the second, you feel that they are the same thing ... it's all the same, and this is true, they are, and then fun is in the differences and nuances, which are more detailed and intricate than in other styles, in blues you have a small selection of rhythmic and chord patterns that it looks extremely limiting at first sight, it turns out, there is such a huge richness to express thanks to this limitation, it sets the context and allows you to play with the rest.
In a way I also find this related to trying to write solutions that are extremely general, I like generalizations, I like unified theories of everything (I hope the theoretical physicists will find a unifying theory that will allow us to do now cool practical stuff before I seize to exist, as an alternative, I hope they scientists find a way to escape immortality, another alternative is vampires really existing and after reading my wish here visiting me), I like being 'lazy' and writing code that will be reused by me and others in many different contexts without needing adapt it, but it turns out, in practice KISS is the way to go, just like Blues setting the limits, it is very important to set your own limits when choosing your problem, per example when designing your next AI, physics, graphics or gameplay technology. It is not easy, because what ends up limiting you are your computational resources and maybe the talent of your team, both are difficult if not impossible to measure for a designer.

So, I decided I wanted to give the book the full detail treatment, all with a take away that I am sharing online at: http://docs.google.com/View?id=dcm3hb4r_19gkzkdbdr.
One additional help was the way this was presented in "http://paginas.fe.up.pt/~eol/schaefer/diplom/ReinforcementLearning.htm" saying:
The diagram shows that when initially in state s action a is selected, the successor state is s1 and reward r1 is expected but also r2 is expected passing to s2. If in state s action b is chosen, reward r3 is expected and leads to s1 but also reward r4 is expected and leads to state s2.

This diagram can be described by the following Bellman-Equation: ...

This was our conclusion although I was expecting I would find a step by step expansion based on Mathematical rules and RL definitions, maybe using the rules of Linearity in Expectation and Iterated Expectation, applied to the RL definitions of environment models.

Rigorousness, a waste of time

In general I am very skeptical in accepting things without asking lots of 'why's.
I also have been getting a bit into rigorous math, because I see this as one of the current barrier standing between mean and world domination (ooops now everybody knows). I am trying to put a bit of time into improving and updating my math skills because it is a fact, that some academic papers or writing, no matter how useful like to use cryptic (for me currently) equations which could be expressed in a much nicer way in English. This is a long topic by itself, some months ago I was wondering that in order to prove something rigorously, you need to base it on another thing, but obviously, I thought, it must start somewhere, I set myself to search for that something in math, I found something probably obvious to all mathematicians, you need axioms to start with, you don't prove axioms. Now I know this from school, but I never thought of it this way, this journey took me many places mostly on wikipedia ending at the "There Ain't No Such Thing As A Free Lunch" theorem and it's sibling 'Full employment' which I thought was quite amazing and increased my new found love to going back to Math, one can actually logicall prove that computer scienetists will never run out of jobs, well done Math!

Enlightenment, I am not alone

Alex was of the opinion that I am taking it too far again and that this is not useful, but I was able to partially convince of the usefulness because it would make many more papers and academic writing accessible to me. Anyway, I kept digging, this time into the base of the equations I wanted to detail, the Bellman Equations, which led me to this document: http://www.wu.ac.at/usr/h99c/h9951826/bellman_dynprog.pdf, this is the main topic of this post, why? because this genius named Richard Bellman, who writes "At Stanford I had a chance to do analytic number theory, which I had wanted to do since I was sixteen." !! touched on many of the topics I am worried about and that I described here and constantly trying to understand better, it was extremely enlightening, Level up, here are some quotes and their relation to this post:

“An interesting question is, ‘Where did the name,
dynamic programming, come from?’ The 1950s were not
good years for mathematical research. We had a very interesting
gentleman in Washington named Wilson. He was
Secretary of Defense, and he actually had a pathological
fear and hatred of the word, research. I’m not using the
term lightly; I’m using it precisely. His face would suffuse,
he would turn red, and he would get violent if people used
the term, research, in his presence."
This is related to the desire of people to research things, and that in the end come up with useful things because of that, despite it seeming pointless to the 'management'. Is it pointless looking into RL for locomotion?

"Let’s take a word that has an
absolutely precise meaning, namely dynamic, in the classical
physical sense. It also has a very interesting property
as an adjective, and that is it’s impossible to use the word,
dynamic, in a pejorative sense. Try thinking of some combination
that will possibly give it a pejorative meaning.
It’s impossible. Thus, I thought dynamic programming was
a good name. It was something not even a Congressman
could object to. So I used it as an umbrella for my activities”
There are many ways to present the same idea, it is ok to choose the one that fits the target. This is not directly related and not new, but it shows that even something as cool as RL needed this to start off.
“I could either be a traditional intellectual, or a modern
intellectual using the results of my research for the
problems of contemporary society. This was a dangerous
path. Either I could do too much research and too little
application, or too little research and too much application
."
That's also one of my ongoing concerns that no one else seems to worry about, finding the right balance, and even Bellman used to think about it.

“My first task in dynamic programming was to put it on
a rigorous basis. I found that I was using the same technique
over and over again to derive a functional equation.
I decided to call this technique “The principle of optimality.”
Oliver Gross said one day, ‘The principle is not rigorous.’
I replied, ‘Of course not. It’s not even precise.’ A good
principle should guide the intuition."
Aha, so that's where it all comes from! intuition! and not equation 3.10 with cryptic expansion steps. Relieving...

"This is
pertinent to a comment made by Felix Klein, the great
German mathematician, concerning a certain type of mathematician.
When this individual discovers that he can jump
across a stream, he returns to the other side, ties a chair
to his leg, and sees if he can still jump across the stream.
Some may enjoy this sport; others, like myself, may feel
that it is more fun to see if you can jump across bigger
streams, or at least different ones"
Coding-wize, I tended to be on the 'ties a chair to his legs' type but that was long time ago, and it actually is useful to be in this state for a limited amount of time, checking out the possible extremes is always good, even Buddha checked the extremes before finding the golden middle, it is only logical, how can you know where the middle is if you have never seen where the extremes are! I am therefore happy I have been at both extremes and have developed a good eye for the Golden middle, not only in code.

“What is worth noting about the foregoing development
is that I should have seen the application of dynamic programming
to control theory several years before. I should
have, but I didn’t. It is very well to start a lecture by saying,
‘Clearly, a control process can be regarded as a multistage
decision process in which... ,’ but it is a bit misleading.
Scientific developments can always be made logical and
rational with sufficient hindsight. It is amazing, however,
how clouded the crystal ball looks beforehand. We all wear
such intellectual blinders and make such inexplicable blunders
that it is amazing that any progress is made at all."
I found this one great as well, equations are thrown at us in lectures, in papers, in tutorials, and we are supposed to just say yes it makes sense, this is usually not enough for me, I prefer to be able to get into the context which allowed the 'inventor' to come up with those ideas, based on what he knew, the problem he faced, and how he thought to come up with what he came up with.
A lot of work and time goes into what becomes a one line 'obvious' equation, it was not always obvious! not even to the person that came up with it, treating it as obvious is almost a crime. However I also understand that there is the danger of tying one's legs to the chair if one wants to do this for every tiny bit of theory, not getting any new progress done is a crime as well, balance is key, as usual. This exact dilemma is what started me on today's enlightenment Journey, and again Bellman touches on that :), using a writing style 100x superior to mine of course, but the idea is there ...

"All this contributes to the misleading nature of conventional
history, whether it be analysis of a scientific discovery
or of a political movement. We are always looking at
the situation from the wrong side, when events have already
been frozen in time. Since we know what happened, it is
not too difficult to present convincing arguments to justify a
particular course of events. None of these analyses must be
taken too seriously, no more than Monday morning quarterbacking."

Conclusion

Level up.

.