January 28, 2015

Contour lines and iso bars with rapaio library

Mesh Grid Tutorial Page

Introduction

In this tutorial page I will present a nice graphical tool which can be used to draw 2D surfaces in the plane.
For now only MeshContour is implemented, also known as contour curves. In cartography, a contour line joins points of equal elevation (height) above a given level, such as mean sea level. A contour map is a map illustrated with contour lines, for example a topographic map, which thus shows valleys and hills, and the steepness of slopes.
While working on a way to plot contour lines and contour areas, I tried various things and, through iterative approaches, arrived at an algorithm which I hoped I had discovered. In the end, after additional research, I found (it happens again and again) that there is already an algorithm which does the same thing in a strikingly similar way. This algorithm is called Marching Squares.
The idea behind the algorithm is to build a rectangular grid with values computed at each grid point. I named this object a MeshGrid. A MeshGrid is used to describe the behaviour of a function against some conditions which are used to define the contour lines.
The simplest MeshGrid can be computed from a grid of function values (which are scalar values in this case) and a threshold value used as the elevation of the contour plot. If the grid points are dense enough, a linear interpolation within each square of the grid will produce a smooth-looking curve.
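As an aside, the core of Marching Squares can be sketched in a few lines of dependency-free Java. This is my own minimal sketch, not the rapaio implementation, and `cellIndex` and `crossing` are hypothetical names: each 2x2 cell of the grid gets a 4-bit case index from the four corner tests, and the contour's crossing point on an edge is found by linear interpolation.

```java
// Minimal sketch of the Marching Squares core (not the rapaio implementation).
public class MarchingSquaresSketch {

    // Classify one grid cell: corners given counter-clockwise from bottom-left.
    // The 4-bit result (0..15) selects which edges the contour crosses.
    static int cellIndex(double bl, double br, double tr, double tl, double threshold) {
        int idx = 0;
        if (bl >= threshold) idx |= 1;
        if (br >= threshold) idx |= 2;
        if (tr >= threshold) idx |= 4;
        if (tl >= threshold) idx |= 8;
        return idx;
    }

    // Fraction along an edge (from value a to value b) where the contour crosses.
    static double crossing(double a, double b, double threshold) {
        return (threshold - a) / (b - a);
    }

    public static void main(String[] args) {
        // Only the bottom-left corner is above the threshold: case 1.
        System.out.println(cellIndex(2.0, 0.0, 0.0, 0.0, 1.0)); // prints 1
        // The contour crosses that edge halfway between values 2 and 0.
        System.out.println(crossing(2.0, 0.0, 1.0)); // prints 0.5
    }
}
```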
The idea behind a contour map can be extended to any condition applied to the function values. We can think of a MeshGrid holding multi-dimensional vectors, where the conditions are functions applied to those multi-dimensional values.

A simple example: simulation of a kernel density in 2 dimensions

Kernel density estimation is a procedure which enables one to estimate a probability density using only information from a sample of values. Usually a continuous probability density is a smooth curve, but with only a handful of sample values it is clearly difficult to estimate a continuous density. What kernel density estimation does is to 'spread' the probability mass of each sample value over its neighbourhood. In the end, when the local influences are summed up, a smooth density function appears.
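As a minimal illustration of the idea (a plain-Java sketch assuming a Gaussian kernel; `gaussian` and `kde` are my own names, not the rapaio API), the estimate at a point is just the average of one kernel bump per sample value:

```java
// Sketch of 1D kernel density estimation with a Gaussian kernel.
public class KdeSketch {

    // Normal-shaped kernel with bandwidth sigma.
    static double gaussian(double u, double sigma) {
        return Math.exp(-u * u / (2 * sigma * sigma)) / (sigma * Math.sqrt(2 * Math.PI));
    }

    // Average of one kernel bump per sample value, evaluated at x.
    static double kde(double[] sample, double x, double sigma) {
        double sum = 0;
        for (double xi : sample) {
            sum += gaussian(x - xi, sigma);
        }
        return sum / sample.length;
    }

    public static void main(String[] args) {
        double[] sample = {-1, 0, 1};
        // For this symmetric sample the estimate is symmetric around 0.
        System.out.println(Math.abs(kde(sample, -0.5, 1.0) - kde(sample, 0.5, 1.0)) < 1e-12); // prints true
    }
}
```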
For a univariate sample, the DensityLine utility can be used together with KDE objects to build such graphs. The question is: how can this be done in 2 dimensions?
Well, all the kernel densities I used have the nice property that they are symmetric. That being said, we note that the 'spreading' of probability will be the same in any direction, for all points at the same Euclidean distance.
Suppose our sample has only 3 points: \( A=(3,3), B=(-1,-1), C=(-2,6)\). These points were chosen at random; they do not have any special meaning.
We will 'spread' probability mass around these points according to a normal distribution with standard deviation equal to 2.
    Frame xy = SolidFrame.newWrapOf(
            Numeric.newWrapOf(3, -1, -2).withName("x"),
            Numeric.newWrapOf(3, -1, 6).withName("y"));
    Normal d = new Normal(0, 2);
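The spread itself can be sketched without the library. Because the normal kernel is radially symmetric, the contribution of each sample point depends only on the Euclidean distance to it (plain Java with my own helper names, using the same three points and standard deviation 2):

```java
// Dependency-free sketch of the 2D "spread": a sum, over sample points,
// of a normal density evaluated at the euclidean distance.
public class SpreadSketch {

    static double normalPdf(double u, double sd) {
        return Math.exp(-u * u / (2 * sd * sd)) / (sd * Math.sqrt(2 * Math.PI));
    }

    // pts holds the sample points as {x, y} pairs; sd is the spread scale.
    static double spread(double[][] pts, double x, double y, double sd) {
        double sum = 0;
        for (double[] p : pts) {
            sum += normalPdf(Math.hypot(x - p[0], y - p[1]), sd);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] pts = {{3, 3}, {-1, -1}, {-2, 6}};
        // The surface is largest near the sample points and decays with distance.
        System.out.println(spread(pts, 3, 3, 2) > spread(pts, 9, 9, 2)); // prints true
    }
}
```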
We build a mesh grid of 1-dimensional values. To do so, we first build two vectors with the coordinates of the mesh grid. Using those vectors, we then build a MeshGrid1D object, which is filled with the values of a given spread function. The spread function sums the values of the 3 normal densities (one for each sample point), each density evaluated at the Euclidean distance between the current grid point and the corresponding sample point.
    Numeric x = Numeric.newSeq(-3, 10, 0.05);
    Numeric y = Numeric.newSeq(-3, 10, 0.05);
    MeshGrid1D mg = new MeshGrid1D(x, y);
    BiFunction<Double, Double, Double> bi = (xx, yy) ->
            IntStream.range(0, 3).mapToDouble(
                    row -> d.pdf(Math.sqrt(Math.pow(xx - xy.value(row, "x"), 2)
                            + Math.pow(yy - xy.value(row, "y"), 2)))
            ).sum();
    mg.fillWithFunction(bi);
Now that we have a mesh grid, we draw a plot with one iso band for each quantile interval of width 0.1.
[figure: iso bands of the estimated density]

    Plot p = new Plot();
    Var q = Numeric.newSeq(0, 1, 0.1);
    double[] qq = mg.quantiles(q.stream().mapToDouble().toArray());
    qq[qq.length - 1] = Double.POSITIVE_INFINITY;

    for (int i = 0; i < q.rowCount() - 1; i++) {
        p.add(new MeshContour(mg.compute(qq[i], qq[i + 1]), true, true)
                .color(new Color(0.f, 0.f, 1f, (float) qq[i])).lwd(0.2f));
    }
    p.add(new Points(xy.var("x"), xy.var("y")));
    draw(p, 600, 400);
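For reference, the empirical quantiles of the grid values can be sketched as linear interpolation over the sorted values. This is one common definition, written in plain Java with my own names; rapaio's MeshGrid1D.quantiles may use a different rule:

```java
import java.util.Arrays;

// Sketch of empirical quantiles by linear interpolation over sorted values.
public class QuantileSketch {

    static double[] quantiles(double[] values, double[] qs) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] out = new double[qs.length];
        for (int i = 0; i < qs.length; i++) {
            double pos = qs[i] * (sorted.length - 1); // fractional rank
            int lo = (int) Math.floor(pos);
            int hi = (int) Math.ceil(pos);
            out[i] = sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] qs = quantiles(new double[]{5, 1, 3, 2, 4}, new double[]{0, 0.5, 1});
        System.out.println(Arrays.toString(qs)); // prints [1.0, 3.0, 5.0]
    }
}
```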

A more complete example: iris classification

We will again use the classical UCI data set to build an example. Because the implemented SVM algorithm works only for the binary classification case, I will truncate the iris data set.
Because the petal variables separate all the cases very well, I will use the sepal variables to make the problem a little harder. I will also filter out the cases of the third class.
> summary(frame, [sepal-length, sepal-width, class])
    rowCount: 100
    complete: 100/100
    varCount: 3
    varNames: [sepal-length, sepal-width, class]

       sepal-length     sepal-width               class
    Min.    : 4.300  Min.    : 2.000  Iris-setosa     : 50
    1st Qu. : 5.000  1st Qu. : 2.800  Iris-versicolor : 50
    Median  : 5.400  Median  : 3.050
    Mean    : 5.471  Mean    : 3.094
    2nd Qu. : 5.900  2nd Qu. : 3.400
    Max.    : 7.000  Max.    : 4.400
[figure: sepal-length vs sepal-width, colored by class]

    Frame iris = Datasets.loadIrisDataset();
    iris = iris.mapVars("sepal-length,sepal-width,class");
    iris = iris.stream().filter(s -> s.index(2) != 3).toMappedFrame();
    iris = iris.solidCopy();

    draw(new Plot().add(new Points(iris.var(0), iris.var(1)).color(iris.var(2))));
Now we will build an SMO classifier with a poly kernel of degree 2, train it on the data set, and then fit the trained SMO back on the same data to see how well it works.
> ConfusionMatrix

                    |       Predicted
    Actual          | Iris-setosa  Iris-versicolor |  Total
    --------------- | ------------ --------------- | ------
    Iris-setosa     |          50                0 |     50
    Iris-versicolor |           0               50 |     50
    --------------- | ------------ --------------- | ------
    Total           |          50               50 |    100

    Complete cases 100 from 100
    Acc: 1.000 (Accuracy)
    F1:  1.000 (F1 score / F-measure)
    MCC: 1.000 (Matthew correlation coefficient)
    Pre: 1.000 (Precision)
    Rec: 1.000 (Recall)
    G:   1.000 (G-measure)

    BinarySMO smo = new BinarySMO().withKernel(new PolyKernel(2));
    smo.learn(iris, "class");
    CResult cr = smo.predict(iris);
    new ConfusionMatrix(iris.var("class"), cr.firstClasses()).summary();
It looks like there is no error there. However, it is legitimate to ask yourself what the decision function looks like. One possibility is to draw its iso lines.
[figure: iso lines of the SMO decision function]

The code which generates this figure is, for now, a little tedious. Later on, after some good practice rules emerge from usage scenarios, this kind of procedure will be encapsulated into a more complex one with a simple and fast syntax.
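Still, the general pattern behind such a figure is simple: evaluate a decision function at every node of a regular grid, then hand the resulting values to the contour machinery. Here is a dependency-free sketch of that pattern with a toy linear decision function (my own names, not the rapaio API):

```java
import java.util.function.DoubleBinaryOperator;

// Sketch: sample a decision function f on a regular grid of nx-by-ny nodes.
public class GridEvalSketch {

    static double[][] evalGrid(DoubleBinaryOperator f,
                               double x0, double y0, double step, int nx, int ny) {
        double[][] g = new double[nx][ny];
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                g[i][j] = f.applyAsDouble(x0 + i * step, y0 + j * step);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        // A toy linear decision function: positive above the line y = x.
        double[][] g = evalGrid((x, y) -> y - x, 0, 0, 1.0, 3, 3);
        System.out.println(g[0][2]); // node (0, 2) lies above the line: prints 2.0
    }
}
```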

Last case: an interesting decision surface

I generated a data set consisting of binary-labeled points placed randomly on a chess board. The points are labeled according to the color of the square which contains them. I used this to build a nice example of drawing an interesting decision surface.
> ConfusionMatrix

           |   Predicted
    Actual |  True  False |  Total
    ------ | ------ ----- | ------
    True   |   721     79 |    800
    False  |     8    792 |    800
    ------ | ------ ----- | ------
    Total  |   729    871 |   1600

    Complete cases 1600 from 1600
    Acc: 0.946 (Accuracy)
    F1:  0.943 (F1 score / F-measure)
    MCC: 0.895 (Matthew correlation coefficient)
    Pre: 0.989 (Precision)
    Rec: 0.901 (Recall)
    G:   0.944 (G-measure)

[figure: decision surface of the SMO with wavelet kernel]

And here is the code which produced the last graph:
    // build the data set

    RandomSource.setSeed(1);

    Var v1 = Numeric.newEmpty().withName("x");
    Var v2 = Numeric.newEmpty().withName("y");
    Var v3 = Nominal.newEmpty().withName("class");

    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            for (int k = 0; k < 100; k++) {
                v1.addValue(RandomSource.nextDouble() + i);
                v2.addValue(RandomSource.nextDouble() + j);
                v3.addLabel((i + j) % 2 == 0 ? "True" : "False");
            }
        }
    }

    Frame df = SolidFrame.newWrapOf(v1, v2, v3);

    // build the classifier

    Classifier c = new BinarySMO().withKernel(new WaveletKernel(0.55)).withC(0.1);
    c.learn(df, "class");

    new ConfusionMatrix(df.var("class"), c.predict(df).firstClasses()).summary();

    // now build the mesh grid

    x = Numeric.newSeq(new Minimum(df.var(0)).value() - 1, new Maximum(df.var(0)).value() + 1, 0.05).withName("x");
    y = Numeric.newSeq(new Minimum(df.var(1)).value() - 1, new Maximum(df.var(1)).value() + 1, 0.05).withName("y");

    mg = new MeshGrid1D(x, y);

    Var x1 = Numeric.newEmpty().withName("x");
    Var y1 = Numeric.newEmpty().withName("y");

    for (int i = 0; i < x.rowCount(); i++) {
        for (int j = 0; j < y.rowCount(); j++) {
            x1.addValue(x.value(i));
            y1.addValue(y.value(j));
        }
    }

    // fit the mesh grid values

    Frame grid = SolidFrame.newWrapOf(x1, y1);
    cr = c.predict(grid, true, true);

    int pos = 0;
    for (int i = 0; i < x.rowCount(); i++) {
        for (int j = 0; j < y.rowCount(); j++) {
            mg.setValue(i, j, cr.firstDensity().value(pos, 1) - cr.firstDensity().value(pos, 2));
            pos++;
        }
    }

    // and finally plot the results

    p = new Plot();
    double[] pp = new double[]{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1};
    qq = mg.quantiles(pp);

    for (int i = 0; i < qq.length - 1; i++) {
        p.add(new MeshContour(mg.compute(qq[i], qq[i + 1]), false, true)
                .color(new Color(0f, 0f, 1f, (float) pp[i + 1])));
    }
    p.add(new MeshContour(mg.compute(0, Double.POSITIVE_INFINITY), true, false).color(0).lwd(2f));

    WS.draw(p.add(new Points(df.var(0), df.var(1)).color(df.var(2))), 800, 600);

>>>This tutorial is generated with Rapaio document printer facilities.<<<