May 11, 2015

Biased coin - short tale with my children

Few days ago my both children had to follow an aerosol therapy. You know that thing called nebuliser which convert the medicine to fine mist, which you breathe in through a mask. While not intrusive at all, none of my both children (my son, 6 years old, and my daughter, 7 years old) is happy to follow. When I asked who will be the first to try it, both of them started to point their finger to each other. I didn't want to choose one of them, so I found a clever solution:

"Let's flip a coin! It's fair, and the one who's face will remain up after draw will be the first!"

Everybody was happy. They chose their side with no problem. My son choose the head and his sister the tail. We flip the coin. Head. My son was sad, but he accepted the result. However I started to encourage him.

"Hey, don't be sad. Next time it might come tail. Believe me. Look, if we draw it 
again it will come tail, just to show you".

And I flipped the coin again. Head. No very helpful for my case.

"Well, it happens by chance. But you have to believe me, it's fair. We will try again and it will be tail. Let's see."

I flipped again one more time. Head. Not a nice result. The bias sneaked into my mind. My confidence level started to drop. My children started to smile.

"This time will BE tail!"

Well, it was not. My children started to laugh at me. My karma points were out.

"Really, right now it will be TAIL!" 

And I started to stair at the coin in a strange way. I really wanted to control that damn coin to be reasonable. But I could not control that damn coin. Heads again. My children started to jump and laugh. To Gods of Random was laughing at me, also. I felt betrayed.

After some additional draws I had eight heads, not even a single tail. How can I tell him that is fair? The nine-th time I was hopeless. But the Gods were merciful. Tail. And another head and another tail. Pity consolation. I played my last card. I asked him:

"You see? It comes also tails!"

He looked at me in his own unique generous way, and whispered me:

"Yes. But the heads comes more often."

I lost the battle.

[A day later]

The same problem, same dilemma, same process. But this time the tail was up.My karma level started again to take some strictly positive values.

January 28, 2015

Contour lines and iso bars with rapaio library

Mesh Grid Tutorial Page

Introduction

In this tutorial page I will present a nice graphical tool which can be used to draw 2D surfaces in plane.
For now is implemented only MeshContour, known also as contour curves. In cartography, a contour line joins points of equal elevation (height) above a given level, such as mean sea level. A contour map is a map illustrated with contour lines, for example a topographic map, which thus shows valleys and hills, and the steepness of slopes.
While I was working to find a way to plot contour lines and contour areas, I tried various thing and through iterative approaches I found an algorithm which I hoped that I discovered. In the end, after additional research I found (it happen again and again) that there is already an algorithm which does the same thing in a striking similar way. This algorithm is called Marching Squares.
The idea behind of the algorithm is to build a rectangular grid area with values computed for each point of the grid. I named this object a MeshGrid. A MeshGrid is used to describe the behaviour of a function against some conditions which are used to define the contour lines.
The simplest MeshGrid can be computed from a grid of function values (which are scalar values in this case), and a threshold value used as the elevation of the contour plot. If the grid points are very dense, then a linear interpolation found for each square of the grid will produce a smooth looking curve.
The idea behind a contour map can be extended to any condition applied to the function values. We can think of a MeshGrid with multi-dimensional vectors and conditions could be functions applied on multi-dimensional values.

A simple example: simulation of a kernel density in 2 dimensions

Kernel density estimation is a procedure which enables one to estimate a probability density using only information from a sample of values. Usually a continuous probability density is a smooth curve, but if one has only a hand of sample values it is clear that it is difficult to estimate a continuous density with few discrete values. What kernel density estimation does is to 'spread' the probability mass from each sample value in its neighbourhood. In the end, if the local influences are summed up, a smooth density function appears.
For a uni-variate sample variable DensityLine utility can be used together with KDE objects to build some graphs. The question is: how can this be done in 2 dimensions?
Well, all the kernel densities I used have the nice property that they are symmetric. Thus been said, we note that the 'spreading' of probability will be the same in any direction, for all the points at the same euclidean distance.
Suppose our sample has only 3 points. These 3 points are \( A=(3,3), B=(-1,-1), C=(-2,6)\). These points were chosen at random, they does not have any special meaning.
We will 'spread' probability mass around these points according with a normal distribution with standard deviation equals 2.
 Frame xy = SolidFrame.newWrapOf(
    Numeric.newWrapOf(3, -1, -2).withName("x"),
    Numeric.newWrapOf(3, -1, 6).withName("y")
    );
    Normal d = new Normal(0, 2);
We build a mesh grid of 1-dimensional values. In order to do so, we first build two vectors with coordinates of the mesh grid. Using those vectors, we later build a MeshGrid1D object, which is filled with the values of a given spread function. The spread function sums the values of each 3 normal densities (one from each sample point), density which is computed on the euclidean distance from the current point and each sample point.
 Numeric x = Numeric.newSeq(-3, 10, 0.05);
    Numeric y = Numeric.newSeq(-3, 10, 0.05);
    MeshGrid1D mg = new MeshGrid1D(x, y);
    BiFunction
     bi = (xx, yy) ->
    IntStream.range(0, 3).mapToDouble(
    row -> d.pdf(Math.sqrt(Math.pow(xx - xy.value(row, "x"), 2) + Math.pow(yy - xy.value(row, "y"), 2)))
    ).sum();
    mg.fillWithFunction(bi);
Now that we have a mash grid we draw a plot with one iso band for each quantile interval of with 0.1.
graphics

 Plot p = new Plot();
    Var q = Numeric.newSeq(0, 1, 0.1);
    double[] qq = mg.quantiles(q.stream().mapToDouble().toArray());
    qq[qq.length - 1] = Double.POSITIVE_INFINITY;

    for (int i = 0; i < q.rowCount() - 1; i++) {
    p.add(new MeshContour(mg.compute(qq[i], qq[i + 1]), true, true)
    .color(new Color(0.f, 0.f, 1f, (float) qq[i])).lwd(0.2f)
    );
    }
    p.add(new Points(xy.var("x"), xy.var("y")));
    draw(p, 600, 400);

A more complete example: iris classification

We will use again the classical UCI data set to build an example. Because SVM algorithm that is implemented work only for binary classification case, I will truncate the iris data set.
Because petal variables separates very well all the case, I will use sepal variables, to make a little harder the problem. I will also filter our cases for the third class.
> summary(frame,
    [sepal-length, sepal-width, class])
    rowCount: 100
    complete: 100/100
    varCount: 3
    varNames: [sepal-length, sepal-width, class]
    sepal-length sepal-width class
    Min. : 4.300 Min. : 2.000 Iris-setosa : 50
    1st Qu. : 5.000 1st Qu. : 2.800 Iris-versicolor : 50
    Median : 5.400 Median : 3.050
    Mean : 5.471 Mean : 3.094
    2nd Qu. : 5.900 2nd Qu. : 3.400
    Max. : 7.000 Max. : 4.400
                                                       
graphics

 Frame iris = Datasets.loadIrisDataset();
    iris = iris.mapVars("sepal-length,sepal-width,class");
    iris = iris.stream().filter(s -> s.index(2)!=3).toMappedFrame();
    iris = iris.solidCopy();

    draw(new Plot().add(new Points(iris.var(0), iris.var(1)).color(iris.var(2))));
Now we will build a SMO classifier, with a poly kernel of degree 2, we learn the train set and fit the trained SMO to the same train set to see how it works.
> ConfusionMatrix

    | Predicted
    Actual| Iris-setosa Iris-versicolor| Total
    --------------- --------------- --------------- ---------------
    Iris-setosa| 50 0| 50
    Iris-versicolor| 0 50| 50
    --------------- --------------- --------------- ---------------
    Total| 50 50| 100

    Complete cases 100 from 100
    Acc: 1.000 (Accuracy )
    F1: 1.000 (F1 score / F-measure)
    MCC: 1.000 (Matthew correlation coefficient)
    Pre: 1.000 (Precision)
    Rec: 1.000 (Recall)
    G: 1.000 (G-measure)

 BinarySMO smo = new BinarySMO().withKernel(new PolyKernel(2));
    smo.learn(iris, "class");
    CResult cr = smo.predict(iris);
    new ConfusionMatrix(iris.var("class"), cr.firstClasses()).summary();
It looks like there is no error there. However it is legitimate to ask yourself how it looks the decision function. One possibility is with iso lines.
graphics

The code to generate this is now a little bit tedious. Later on, after some good practice rules emerge from usage scenarios, this kind of procedures will be encapsulated into a more complex procedure with simple and fast syntax.

Last case: an interesting decision surface

I generated a data set consisting on some binary labeled points placed randomly on a chess board. The points are labels according with the color of the square which contains it. I used this in order to plot a nice example of drawing in interesting surface.
> ConfusionMatrix

    | Predicted
    Actual| True False| Total
    ------ ------ ------ ------
    True| 721 79| 800
    False| 8 792| 800
    ------ ------ ------ ------
    Total| 729 871| 1600

    Complete cases 1600 from 1600
    Acc: 0.946 (Accuracy )
    F1: 0.943 (F1 score / F-measure)
    MCC: 0.895 (Matthew correlation coefficient)
    Pre: 0.989 (Precision)
    Rec: 0.901 (Recall)
    G: 0.944 (G-measure)

graphics

And the code which produced the last graph.
 // build the data set

    RandomSource.setSeed(1);

    Var v1 = Numeric.newEmpty().withName("x");
    Var v2 = Numeric.newEmpty().withName("y");
    Var v3 = Nominal.newEmpty().withName("class");

    for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
    for (int k = 0; k < 100; k++) {
    v1.addValue(RandomSource.nextDouble() + i);
    v2.addValue(RandomSource.nextDouble() + j);
    v3.addLabel((i + j) % 2 == 0 ? "True" : "False");
    }
    }
    }

    Frame df = SolidFrame.newWrapOf(v1, v2, v3);

    // build classifier

    Classifier c = new BinarySMO().withKernel(new WaveletKernel(0.55)).withC(0.1);
    c.learn(df, "class");

    new ConfusionMatrix(df.var("class"), c.predict(df).firstClasses()).summary();

    // new build the mesh grid

    x = Numeric.newSeq(new Minimum(df.var(0)).value() - 1, new Maximum(df.var(0)).value() + 1, 0.05).withName("x");
    y = Numeric.newSeq(new Minimum(df.var(1)).value() - 1, new Maximum(df.var(1)).value() + 1, 0.05).withName("y");

    mg = new MeshGrid1D(x, y);

    Var x1 = Numeric.newEmpty().withName("x");
    Var y1 = Numeric.newEmpty().withName("y");

    for (int i = 0; i < x.rowCount(); i++) {
    for (int j = 0; j < y.rowCount(); j++) {
    x1.addValue(x.value(i));
    y1.addValue(y.value(j));
    }
    }

    // fit the mesh grid values

    Frame grid = SolidFrame.newWrapOf(x1, y1);
    cr = c.predict(grid, true, true);

    pos = 0;
    for (int i = 0; i < x.rowCount(); i++) {
    for (int j = 0; j < y.rowCount(); j++) {
    mg.setValue(i, j, cr.firstDensity().value(pos, 1) - cr.firstDensity().value(pos, 2));
    pos++;
    }
    }

    // and finally plot the results

    p = new Plot();
    double[] pp = new double[]{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1};
    qq = mg.quantiles(pp);

    for (int i = 0; i < qq.length - 1; i++) {
    p.add(new MeshContour(mg.compute(qq[i], qq[i + 1]), false, true).color(new Color(0f, 0f, 1f, (float) pp[i + 1])));
    }
    p.add(new MeshContour(mg.compute(0, Double.POSITIVE_INFINITY), true, false).color(0).lwd(2f));

    WS.draw(p.add(new Points(df.var(0), df.var(1)).color(df.var(2))), 800, 600);

>>>This tutorial is generated with Rapaio document printer facilities.<<<