Loading [MathJax]/jax/output/HTML-CSS/jax.js

January 10, 2013

Approximate with a constant value


When we do a regression with a decision tree like CART, as part of the learning bias is the fact that we choose to approximate values from a region with a single constant value. Basically what decision trees does in case of regression is to recursively split the p-dimensional space into regions. Than for each region it selects the corresponding instances and using the class variable values from subset, it calculates a constant value used later for prediction.

How CART and its friends works, regarding the splitting process is not the subject here. What I want to talk about is how to find a proper constant value for a region. To reformulate in simpler terms, I am interested in how to calculate a constant value usable for prediction, given a set of y=(y1,y2,..yn) variables. In order to obtain that we use a loss function. A loss function is a function which establish a certain criteria for choosing the proper constant values.

I was interested in to very often used loss functions: square loss and absolute loss.

Square loss


Square loss function is defined as f(y,c)=(yc)2=ni=1(yic)2 and what we are searching for is the constant c for which the value of loss function is minimal.: c=argmincni=1(yic)2 Here we find an simple way to calculate using the fact that the function is positive. Based on that we know that the function has a minimum and that minimum is given by derivative equals 0. c[ni=1(yic)2]=0 We calculate c[ni=1(yic)2]=ni=1c[(yic)2]=ni=12(yic)c(yic)=ni=12(yic)=0 ni=1yi=ni=1c=nc c=ni=1yin=¯y

Absolute loss


Absolute loss function is defined as f(y,c)=|yc|=ni=1|yic|. Again, what we are searching for is the constant c for which the value of loss function is minimal.: c=argmincni=1|yic| We know that function is positive so we know also that is has a minimum somewhere. This time we can't use derivation. I read somewhere that c is the median, and that is statistically correct. I do not know what is the meaning of statistically correct, and that sounds fuzzy to my. After few tries I saw a way to proof that. I hope some day I will learn about a shorter proof.

We consider without loose of generality that y variables are sorted (y1y2...yn). We are interested first to see what happens when c<y1. We calculate f(y,c)c<y1=ni=1|yic|=ni=1|yi+c+y1y1|=ni=1|y1+c+yiy1|=ni=1|y1+c|+ni=1|yiy1|ni=1|yiy1|=f(y,c)c=y1 So f(y,c)c<y1f(y,c)c=y1 In a similar manner we show also that f(y,c)c>ynf(y,c)c=yn What we found is that to minimize loss absolute loss, c must lie between y1 and yn inclusive. Next we note that, when c is in this interval, the distance to both ends is constant |y1c|+|ync|=|yny1| We simply do not consider anymore those two points (y1,yn), we reduce our interest on interval y2,yn1 and apply recursively the same process.

Finally we will find either a point in the middle, or an interval with no other y values included. If we found a point than this is our solution. If we found an interval, any point from that interval is a possible solution. We note that the definition of the median fits those conditions and is a solution which minimize absolute loss function.

No comments:

Post a Comment