
June 7, 2016

z test in rapaio library

Hypothesis testing

(section from online manual)

The rapaio library aims to provide an extensive set of tools for hypothesis testing. At the moment only some of them are available. Hypothesis testing is an invaluable tool for answering questions when we are dealing with uncertainty.
We introduce hypothesis testing by walking through some examples.

Z tests


Any hypothesis test which uses the normal distribution for the computed statistic is called a z-test. We should note that z-tests require the standard deviations of the involved populations to be known. It is common practice, when the sample is large enough, to estimate the population standard deviation from the data; this is not implemented in the library. When the standard deviations are not known, one can use t-tests instead.
Note that because of this requirement the z-tests are rarely used in practice, since we rarely know the population parameters.
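For reference, the one-sample z statistic measures how far the sample mean falls from the hypothesized mean, in units of the standard error of the mean:

\( z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \)

where \( \bar{x} \) is the sample mean, \( \mu_0 \) the hypothesized mean, \( \sigma \) the known population standard deviation and \( n \) the sample size.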

Example 1: One sample z-test

Sue is in charge of Quality Control at a bottling facility¹. She is checking the operation of a machine that should deliver 355 mL of liquid into an aluminum can. If the machine delivers too little, then the local Regulatory Agency may fine the company. If the machine delivers too much, then the company may lose money. For these reasons, Sue is looking for any evidence that the amount delivered by the machine is different from 355 mL.
During her investigation, Sue obtains a random sample of 10 cans. She measures the following volumes:

355.02 355.47 353.01 355.93 356.66 355.98 353.74 354.96 353.81 355.79

The machine's specifications claim that the amount of liquid delivered varies according to a normal distribution, with mean μ = 355 mL and standard deviation σ = 0.05 mL.

Do the data suggest that the machine is operating correctly?

The null hypothesis is that the machine is operating according to its specifications; thus

H0: μ = 355

(μ is the mean volume delivered by the machine)
Sue is looking for evidence of any difference; thus, the alternate hypothesis is

H1: μ ≠ 355

Since the hypothesis concerns a single population mean and the population follows a normal distribution with known standard deviation, a z-test is appropriate.
We can use the HTTools facility, which offers shortcut methods for all implemented hypothesis tests. One of them is the one-sample z-test, which allows one to test whether the sample mean is far from the hypothesized population mean.
 
// build the sample
Var cans = Numeric.copy(355.02, 355.47, 353.01, 355.93, 356.66, 355.98, 353.74, 354.96, 353.81, 355.79);
// run the test and print results
HTTools.zTestOneSample(cans, 355, 0.05).printSummary();
> HTTools.zTestOneSample

 One Sample z-test

mean: 355
sd: 0.05
significance level: 0.05
alternative hypothesis: two tails P > |z|

sample size: 10
sample mean: 355.037
z score: 2.3400855
p-value: 0.019279327322640594
conf int: [355.0060102,355.0679898]

The interpretation of the results is the following:
  • the z-score is 2.34, which means that the computed sample mean is more than 2 standard errors above the hypothesized mean
  • with a significance level of 0.05 and a p-value of 0.019, we reject the null hypothesis that the mean volume delivered by the machine is equal to 355
Note: even though we can see that the sample mean is greater than the proposed mean, we cannot state this as a conclusion. The proper conclusion is that the mean is different from 355.
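To see where these numbers come from, the z-score and the confidence interval can be recomputed with plain Java, independent of the rapaio API (a minimal sketch; 1.959964 is the 97.5% quantile of the standard normal, which corresponds to the two-tailed 0.05 level):
 
// recompute the z-score and confidence interval reported above
double[] cans = {355.02, 355.47, 353.01, 355.93, 356.66,
                 355.98, 353.74, 354.96, 353.81, 355.79};
double sum = 0;
for (double v : cans) sum += v;
double mean = sum / cans.length;           // 355.037
double se = 0.05 / Math.sqrt(cans.length); // standard error = sigma / sqrt(n)
double z = (mean - 355) / se;              // 2.3400855
double q = 1.959964;                       // 97.5% quantile of N(0,1)
System.out.println("z = " + z);
System.out.println("conf int: [" + (mean - q * se) + ", " + (mean + q * se) + "]");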
What if we ask whether the machine delivers more than the standard specification?
We deal with this question by changing the hypotheses. They become:
H0: μ ≤ 355
H1: μ > 355

Our code looks like:
 
HTTools.zTestOneSample(cans, 
  355, // mean
  0.05, // sd 
  0.05, // significance level
  HTTools.Alternative.GREATER_THAN // alternative
).printSummary();
> HTTools.zTestOneSample

 One Sample z-test

mean: 355
sd: 0.05
significance level: 0.05
alternative hypothesis: one tail P > z

sample size: 10
sample mean: 355.037
z score: 2.3400855
p-value: 0.009639663661320297
conf int: [355.0060102,355.0679898]

As expected, the statistical power of the one-sided test is higher. As a consequence the p-value is smaller and we still reject the null hypothesis. Note that for a positive z-score the one-tailed p-value is half of the two-tailed one (0.0193 / 2 ≈ 0.0096, matching the output). This was an obvious case, where testing one side gives the same decision as testing both sides; I gave the example just to help the reader pay attention to these kinds of details.
1: This example is adapted from here.

Example 2: One sample z-test

 

A herd of 1500 steer was fed a special high-protein grain for a month. A random sample of 29 steer was weighed and had gained an average of 6.7 pounds. If the standard deviation of weight gain for the entire herd is 7.1, test the hypothesis that the average weight gain per steer for the month was more than 5 pounds.²

We have the following null and alternative hypotheses:


H0: μ = 5
H1: μ > 5
 
ZTestOneSample ztest = HTTools.zTestOneSample(
  6.7, // sample mean
  29, // sample size
  5, // tested mean 
  7.1, // population standard deviation
  0.05, // significance level
  HTTools.Alternative.GREATER_THAN // alternative
);
ztest.printSummary();
> HTTools.zTestOneSample

 One Sample z-test

mean: 5
sd: 7.1
significance level: 0.05
alternative hypothesis: one tail P > z

sample size: 29
sample mean: 6.7
z score: 1.2894057
p-value: 0.0986285477062051
conf int: [4.1159112,9.2840888]

The p-value is greater than the significance level, which means that we cannot reject the null hypothesis: we do not have enough evidence to conclude that the average weight gain was more than 5 pounds.
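As a cross-check, the same z-score and one-sided p-value can be reproduced outside rapaio. The sketch below assumes the Apache Commons Math library (org.apache.commons.math3) is available; it is not part of the tutorial's code:
 
import org.apache.commons.math3.distribution.NormalDistribution;

// reproduce the z-score and one-sided p-value from the summary above
double se = 7.1 / Math.sqrt(29); // standard error of the sample mean
double z = (6.7 - 5) / se;       // 1.2894057
// P(Z > z) under the standard normal distribution
double pValue = 1 - new NormalDistribution().cumulativeProbability(z);
System.out.println("z = " + z + ", p-value = " + pValue); // p-value ~ 0.0986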
2: This example is taken from here.

Example 3: Two samples z test

The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women?³

According to the central limit theorem we can assume that the distribution of the sample mean is normal. Moreover, since we have independent random samples, the difference of the sample means also has a normal distribution. And because we know the standard deviation of each population, we can use a two-sample z-test for the difference of the sample means.

H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
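The two-sample z statistic has the same shape as the one-sample version, with a standard error that combines both populations:

\( z = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \)

Plugging in our numbers gives \( z = (28 - 33)/\sqrt{14.1^2/75 + 9.5^2/50} = -5/2.1109 \approx -2.3687 \), which matches the output below.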
 
HTTools.zTestTwoSamples(
                28, 75, // male sample mean and size
                33, 50, // female sample mean and size
                0, // hypothesized difference of means
                14.1, 9.5 // standard deviations
                ).printSummary();
> HTTools.zTestTwoSamples

 Two Samples z-test

x sample mean: 28
x sample size: 75
y sample mean: 33
y sample size: 50
mean: 0
x sd: 14.1
y sd: 9.5
significance level: 0.05
alternative hypothesis: two tails P > |z|

sample mean: -5
z score: -2.3686842
p-value: 0.017851489594360337
conf int: [-9.1372421,-0.8627579]

The test ran with a 0.05 significance level (the default value). The alternative is two-tailed, since we test for a difference in means not equal to zero. The resulting p-value is lower than the significance level, which means that we reject the hypothesis that the two populations have the same mean. We can also see this from the confidence interval, since it does not include 0.
Note: had we considered a significance level of 0.01, we would not have been able to reject the null hypothesis.

3: This example is taken from here.

May 11, 2015

Biased coin - short tale with my children

A few days ago both of my children had to follow an aerosol therapy. You know, that thing called a nebuliser, which converts the medicine into a fine mist that you breathe in through a mask. While not intrusive at all, neither of my children (my son, 6 years old, and my daughter, 7 years old) was happy to go along. When I asked who would be the first to try it, both of them started pointing their fingers at each other. I didn't want to choose one of them, so I found a clever solution:

"Let's flip a coin! It's fair, and the one who's face will remain up after draw will be the first!"

Everybody was happy. They chose their sides with no problem. My son chose heads and his sister tails. We flipped the coin. Heads. My son was sad, but he accepted the result. However, I started to encourage him.

"Hey, don't be sad. Next time it might come tail. Believe me. Look, if we draw it 
again it will come tail, just to show you".

And I flipped the coin again. Heads. Not very helpful for my case.

"Well, it happens by chance. But you have to believe me, it's fair. We will try again and it will be tail. Let's see."

I flipped it one more time. Heads. Not a nice result. The thought of bias sneaked into my mind. My confidence level started to drop. My children started to smile.

"This time will BE tail!"

Well, it was not. My children started to laugh at me. My karma points were gone.

"Really, right now it will be TAIL!" 

And I started to stare at the coin in a strange way. I really wanted to will that damn coin into being reasonable. But I could not control that damn coin. Heads again. My children started to jump and laugh. The Gods of Randomness were laughing at me too. I felt betrayed.

After some additional tosses I had eight heads and not a single tail. How could I tell him it was fair? By the ninth toss I was hopeless. But the Gods were merciful. Tails. And then another head and another tail. A pitiful consolation. I played my last card. I asked him:

"You see? It comes also tails!"

He looked at me in his own unique, generous way, and whispered to me:

"Yes. But the heads comes more often."

I lost the battle.

[A day later]

The same problem, same dilemma, same process. But this time the tails side was up. My karma level started to take strictly positive values again.

December 21, 2013

ROC graphs with Rapaio


Data set preparation

For exemplification I will use the classical spam data set. We load the data set and split it randomly into two pieces. The first sample will be used for training purposes and holds roughly 60% of the data; the second sample will be used for testing our model.
        RandomSource.setSeed(2718);
        final Frame spam = ColFilters.retainCols(Datasets.loadSpamBase(), "0-4,spam");
        List<Frame> samples = randomSample(spam, new int[]{(int) (spam.getRowCount() * 0.6)});
        final Frame train = samples.get(0);
        final Frame test = samples.get(1);
If you are not familiar with the spam data set, what you need to know is that it consists of many numerical attributes used to predict a nominal attribute called \(spam\). There are 2788 instances classified as ham, codified by value 0 (not spam), and 1813 instances codified by 1, which denotes spam emails. There are a lot of numeric features in this data set; we use only the first 5 of them for prediction.
>>summary(frame, [word_freq_make, word_freq_address, word_freq_all, word_freq_3d, word_freq_our, spam])
rows: 4601, cols: 6
 word_freq_make  word_freq_address    word_freq_all      word_freq_3d     word_freq_our 
   Min. : 0.000      Min. :  0.000     Min. : 0.000     Min. :  0.000     Min. :  0.000 
1st Qu. : 0.000   1st Qu. :  0.000  1st Qu. : 0.000  1st Qu. :  0.000  1st Qu. :  0.000 
 Median : 0.000    Median :  0.000   Median : 0.000   Median :  0.000   Median :  0.000 
   Mean : 0.105      Mean :  0.213     Mean : 0.281     Mean :  0.065     Mean :  0.312 
2nd Qu. : 0.000   2nd Qu. :  0.000  2nd Qu. : 0.420  2nd Qu. :  0.000  2nd Qu. :  0.383 
   Max. : 4.540      Max. : 14.280     Max. : 5.100     Max. : 42.810     Max. : 10.000 
                                                                                        
    spam 
0 : 2788 
1 : 1813 
Now we can do some predictions.

Binary classification

We will build 3 models for prediction. We will use the train set, which consists of 60% of our initial data. For testing how well the models predict, we use the remaining data.

OneRule

This first model is one of the simplest models possible. It basically builds a decision tree with a single level. For documentation about this algorithm you can check the original paper: Holte, R.C. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11, 63-91 (1993).
        OneRule oneRule = new OneRule();
        oneRule.learn(train, "spam");
        oneRule.predict(test);
One of the most common ways to check the performance of a classifier is accuracy. Accuracy is the percentage of cases with correct predictions out of the total number of cases. With the rapaio library, one way to see the accuracy is to summarize the confusion matrix.
Confusion matrix

      | Predicted
Actual|     0      1| Total 
------ ------ ------ ------ 
     0|   941    182|  1123
     1|   361    357|   718
------ ------ ------ ------ 
 Total|  1302    539|  1841

Complete cases 1841 from 1841
Accuracy: 0.7051
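Accuracy is simply the diagonal of the confusion matrix divided by the total count: (941 + 357) / 1841 ≈ 0.7051, which matches the value reported above.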

Random Forest

The second prediction model is a random forest with 200 random trees.
        RandomForest rf = new RandomForest().setMtrees(200);
        rf.learn(train, "spam");
        rf.predict(test);
Confusion matrix

      | Predicted
Actual|     0      1| Total 
------ ------ ------ ------ 
     0|   968    155|  1123
     1|   272    446|   718
------ ------ ------ ------ 
 Total|  1240    601|  1841

Complete cases 1841 from 1841
Accuracy: 0.7681

AdaBoost.M1

The third prediction model is a boosting algorithm called AdaBoost.M1. This model is built with a decision stump as the weak learner, boosted over 200 iterations. The following code shows how one can achieve that using rapaio.
        AdaBoostM1 ab = new AdaBoostM1(new DecisionStump(), 200);
        ab.learn(train, "spam");
        ab.predict(test);
Confusion matrix

      | Predicted
Actual|     0      1| Total 
------ ------ ------ ------ 
     0|   909    214|  1123
     1|   263    455|   718
------ ------ ------ ------ 
 Total|  1172    669|  1841

Complete cases 1841 from 1841
Accuracy: 0.7409

ROC Curves

When accuracy is used to compare the performance of classifiers, the comparison is very often misleading. That happens because accuracy depends on many factors and rests on assumptions which are not always true.
I will not explain what a ROC graph is; there is enough literature on this topic. Among many useful documents, I found one which gives crystal clear details and explanations on ROC curves: Fawcett, T. (2004). ROC graphs: Notes and practical considerations for researchers. Machine Learning.
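As a quick reminder of the standard definitions (not specific to rapaio): a ROC curve plots the true positive rate TPR = TP / (TP + FN) against the false positive rate FPR = FP / (FP + TN) as the decision threshold varies. For instance, at its default threshold the random forest above sits at the single point TPR = 446/718 ≈ 0.62, FPR = 155/1123 ≈ 0.14 of its curve.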
In order to draw ROC graphs for the previous models with rapaio you can use the ROCCurve plot component, which builds and draws a curve according to a given computed ROC object. The following code does this.

        ROC rocOR = new ROC(oneRule.getPrediction(), test.getCol("spam"), "1");
        ROC rocRF = new ROC(rf.getDistribution().getCol("1"), test.getCol("spam"), "1");
        ROC rocAB = new ROC(ab.getDistribution().getCol("1"), test.getCol("spam"), "1");
        draw(new Plot()
                .add(new ROCCurve(rocOR).setColorIndex(1))
                .add(new ROCCurve(rocRF).setColorIndex(2))
                .add(new ROCCurve(rocAB).setColorIndex(3))
                .add(new Legend(0.6, 0.33, 
                        new String[]{"onerule", "rf", "adaboost.m1"}, 
                        new int[]{1, 2, 3})),
                600, 400);
As you can see, ROC objects are used to compute the values for ROC curves, and the ROCCurve plot component is used to add them to a plot graphic.
Note, however, that the random forest model exhibits a ROC graph which is better than the AdaBoost model most of the time in the conservative area of the graph. AdaBoost tends to be a little better in the liberal area, but in the extreme liberal area the random forest model again exhibits better performance.
OneRule behaves suboptimally, as was expected in this specific case.
This tutorial is generated with the Rapaio document printer facilities.
