Recursive Escapades

Fair warning: I don’t know if I like this post. It may be because I’m tired of looking at the math, but it might also be due to incoherence, or a terribly weak ending. I’d rather post it than work on it more, so heads-up.

Let’s say there are two doors. Behind the left door is $500 and behind the right one is $2. You can only go through one door. Which door do you pick?

This is not a trick question.

I actually can’t say that you’ll pick the left door, as you’re a pretty complex (and probably somewhat random/noisy/stochastic) system. But let’s imagine we ask a robot the same question, and the robot is designed to maximize moolah. I theorize that the robot will (if programmed well) pick the left door because if the robot goes through the left door, it will get more money. Duh. This is simple.

Let’s make it more interesting. Let’s say that behind the left door is $10, and behind the right door is a 50% chance at $100 (and 50% of $0). So, which gives you more money? There’s no way to tell. If we want to program our robot to maximize money, we have to decide how it treats uncertainty. I’m going to define the robot as maximizing average dollars collected. In other words, whenever confronted with a probabilistic outcome, it multiplies the dollars earned in each possibility with the probability of said outcome.

AverageDollars(RightDoor) = $100 * 50% = $50
AverageDollars(LeftDoor) = $10 * 100% = $10
$50 > $10, so pick RightDoor

Now, what if the right door offered a (non-overlapping) 1/3 chance of $18, 1/3 chance of $9, and a 1/3 chance of $6. In this case, the AverageDollars is the sum of the products of each amount with their probabilities.

AverageDollars(RightDoor) = $18 * 1/3 + $9 * 1/3 + $6 * 1/3
= $6 + $3 + $2 = $11

If I define “Utility” as the same thing as AverageDollars, this can be generalized to:

equation 1
Where:
U(x) = Utility of x
A = an Action (like picking the RightDoor or the LeftDoor)
O = an Outcome (how many dollars are behind a door)
P(O|A) = Probability of an Outcome given an Action

Yeah. It’s still pretty basic. You may be bored. Let me make things more interesting. Let’s say I offer you one of two kinds of boxes (Red or Blue), and inside each box is a random prize.

If you open a red box, you have a (non-overlapping) 50% chance of $5, and a %50 chance of $10. But if you open a blue box, you have a %50 chance of $2, and a %50 chance of finding $9 and a red box. Here’s how I work out the math:

equation 2
equation 3

In English, I lump the expected value of the red box in with the $9 when computing the value of the second possibility in the blue box.

Now, what if the red boxes were changed to have a %50 chance of $5, a %40 chance of $10, and a %10 chance of just a blue box?

equation 4
equation 5

Put these two functions into a robot and it’ll recursively compute for infinity. There will ever be potentially smaller boxes infinitely deep, according to the math. Thankfully, this problem has a simple solution. First, I’ll substitute the first equation into the second, expand the terms, and simplify.

equation 5
equation 6
equation 7

Now, I’ll subtract both sides by U(BlueBox)/20, factor out the U components, and divide both sides by the new coefficient.

equation 9
equation 10
equation 11

Ta-da! By waiting to evaluate the recursive term, we can group and factor it out, leaving a simple numerical output. (I ran into this problem when working on a game-playing AI, and I thought up the solution myself! It’s not all too fancy, but I’m happy that at least my algebra is paying off. ^_^) Here’s the general form of the method being applied:

equation 12

As a reminder, here is the general form of the utility function from before (this time with each Outcome O passed through a Value function V, so that we’re allowed to use symbolic outcomes that aren’t just real numbers):

equation 13

Now, if the system in question has any loops, the function cannot be computed in that form. A more computable function would break the function down to the form of the recursive factor method (m*U(A) + b or more simply b/(1-m)).

equation 14
Where:
Vb(O,A) = The value of an Outcome, given an Action, ignoring recursive terms (i.e. those with U(A) as a factor).
Pb(O,A) = The probability of an Outcome, given an Action, ignoring recursive terms.
Vm(O,A) = The value of an Outcome, given an Action, divided by U(A), ignoring NON-recursive terms.
Pm(O,A) = You get the point.

This function isn’t perfect, but it’s pretty cool. The major edge-case that it fails on are piecewise functions with recursive terms in the condition, but it should work on problems resembling the red and blue boxes.

This entry was posted in Math and tagged , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

One Trackback

  1. By Simple Rationality on March 3, 2011 at 2:03 pm

    [...] to content Who I AmHire Me « Recursive Escapades Defining Feedback Loops [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>