Fair warning: I don’t know if I like this post. It may be because I’m tired of looking at the math, but it might also be due to incoherence, or a terribly weak ending. I’d rather post it than work on it more, so heads-up.
Let’s say there are two doors. Behind the left door is $500 and behind the right one is $2. You can only go through one door. Which door do you pick?
This is not a trick question.
I actually can’t say that you’ll pick the left door, as you’re a pretty complex (and probably somewhat random/noisy/stochastic) system. But let’s imagine we ask a robot the same question, and the robot is designed to maximize moolah. I theorize that the robot will (if programmed well) pick the left door because if the robot goes through the left door, it will get more money. Duh. This is simple.
Let’s make it more interesting. Let’s say that behind the left door is $10, and behind the right door is a 50% chance at $100 (and 50% of $0). So, which gives you more money? There’s no way to tell. If we want to program our robot to maximize money, we have to decide how it treats uncertainty. I’m going to define the robot as maximizing average dollars collected. In other words, whenever confronted with a probabilistic outcome, it multiplies the dollars earned in each possibility with the probability of said outcome.
AverageDollars(RightDoor) = $100 * 50% = $50
AverageDollars(LeftDoor) = $10 * 100% = $10
$50 > $10, so pick RightDoor
Now, what if the right door offered a (non-overlapping) 1/3 chance of $18, 1/3 chance of $9, and a 1/3 chance of $6. In this case, the AverageDollars is the sum of the products of each amount with their probabilities.
AverageDollars(RightDoor) = $18 * 1/3 + $9 * 1/3 + $6 * 1/3
= $6 + $3 + $2 = $11
If I define “Utility” as the same thing as AverageDollars, this can be generalized to:

Where:
U(x) = Utility of x
A = an Action (like picking the RightDoor or the LeftDoor)
O = an Outcome (how many dollars are behind a door)
P(O|A) = Probability of an Outcome given an Action
Yeah. It’s still pretty basic. You may be bored. Let me make things more interesting. Let’s say I offer you one of two kinds of boxes (Red or Blue), and inside each box is a random prize.
If you open a red box, you have a (non-overlapping) 50% chance of $5, and a %50 chance of $10. But if you open a blue box, you have a %50 chance of $2, and a %50 chance of finding $9 and a red box. Here’s how I work out the math:


In English, I lump the expected value of the red box in with the $9 when computing the value of the second possibility in the blue box.
Now, what if the red boxes were changed to have a %50 chance of $5, a %40 chance of $10, and a %10 chance of just a blue box?


Put these two functions into a robot and it’ll recursively compute for infinity. There will ever be potentially smaller boxes infinitely deep, according to the math. Thankfully, this problem has a simple solution. First, I’ll substitute the first equation into the second, expand the terms, and simplify.



Now, I’ll subtract both sides by U(BlueBox)/20, factor out the U components, and divide both sides by the new coefficient.



Ta-da! By waiting to evaluate the recursive term, we can group and factor it out, leaving a simple numerical output. (I ran into this problem when working on a game-playing AI, and I thought up the solution myself! It’s not all too fancy, but I’m happy that at least my algebra is paying off. ^_^) Here’s the general form of the method being applied:

As a reminder, here is the general form of the utility function from before (this time with each Outcome O passed through a Value function V, so that we’re allowed to use symbolic outcomes that aren’t just real numbers):

Now, if the system in question has any loops, the function cannot be computed in that form. A more computable function would break the function down to the form of the recursive factor method (m*U(A) + b or more simply b/(1-m)).

Where:
Vb(O,A) = The value of an Outcome, given an Action, ignoring recursive terms (i.e. those with U(A) as a factor).
Pb(O,A) = The probability of an Outcome, given an Action, ignoring recursive terms.
Vm(O,A) = The value of an Outcome, given an Action, divided by U(A), ignoring NON-recursive terms.
Pm(O,A) = You get the point.
This function isn’t perfect, but it’s pretty cool. The major edge-case that it fails on are piecewise functions with recursive terms in the condition, but it should work on problems resembling the red and blue boxes.

One Trackback
[...] to content Who I AmHire Me « Recursive Escapades Defining Feedback Loops [...]