Hypothesis testing
(section from online manual) 
The Rapaio library aims to offer an extensive
set of hypothesis tests. For now, only some of them
are available. Hypothesis testing is an invaluable tool for
answering questions when we are dealing with uncertainty.
We present hypothesis testing through a series of examples.
Z tests
Any hypothesis test whose test statistic follows a normal
distribution is called a z test. Note that z tests require the
standard deviations of the populations involved to be known.
It is customary, when the sample is large enough, to estimate the
population standard deviation from the data.
This is not implemented in the library. When the standard deviations
are not known, one can use t tests instead.
Because of this requirement, z tests are rarely used in practice:
we rarely know the population parameters.
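For reference, the standard one sample z statistic can be written as below. The symbols \bar{x} (sample mean), \mu_0 (hypothesized mean) and n (sample size) are notation introduced here for illustration; \sigma is the known population standard deviation. Under the null hypothesis, z follows a standard normal distribution.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```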
Example 1: One sample z-test
Sue is in charge of Quality Control at a bottling facility [1].
 She is checking the operation of a machine that should deliver 355 mL 
of liquid into an aluminum can. If the machine delivers too little, then
 the local Regulatory Agency may fine the company. If the machine 
delivers too much, then the company may lose money. For these reasons, 
Sue is looking for any evidence that the amount delivered by the machine
 is different from 355 mL.
During her investigation, Sue obtains a random sample of 10 cans. She measures the following volumes:
355.02 355.47 353.01 355.93 356.66 355.98 353.74 354.96 353.81 355.79
The machine's specifications claim that the amount of liquid delivered varies according to a normal distribution, with mean μ = 355 mL and standard deviation σ = 0.05 mL.
Do the data suggest that the machine is operating correctly?
The null hypothesis is that the machine is operating according to its specifications; thus
H0: μ = 355
Sue is looking for evidence of any difference; thus, the alternate hypothesis is
H1: μ ≠ 355
Since the hypothesis concerns a single 
population mean and the population follows a normal distribution with 
known standard deviation, a z-test is appropriate.
We can use the HTTools facility, which offers shortcut methods
for all implemented hypothesis tests. One of them is the one sample
z test, which tests whether the sample mean differs significantly
from the hypothesized population mean.
// build the sample
Var cans = Numeric.copy(355.02, 355.47, 353.01, 355.93, 356.66, 355.98, 353.74, 354.96, 353.81, 355.79);
// run the test and print results
HTTools.zTestOneSample(cans, 355, 0.05).printSummary();
> HTTools.zTestOneSample
 One Sample z-test
mean: 355
sd: 0.05
significance level: 0.05
alternative hypothesis: two tails P > |z|
sample size: 10
sample mean: 355.037
z score: 2.3400855
p-value: 0.019279327322640594
conf int: [355.0060102,355.0679898]
The results can be interpreted as follows:
- the z-score is 2.34, which means that the sample mean lies more than 2 standard errors above the hypothesized mean
- for a significance level of 0.05 and a p-value of 0.019, we reject the null hypothesis that the mean volume delivered by the machine is equal to 355
Note: even though the sample mean is greater than the
hypothesized mean, a two-tailed test does not allow us to conclude
that the true mean is greater. The proper conclusion is that the mean is different from 355.
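The sample mean and z score reported in the summary can be cross-checked by hand. The following is a minimal sketch in plain Java that does not use Rapaio; the class and method names (OneSampleZByHand, mean, zScore) are hypothetical and chosen only for illustration. It applies the textbook formula z = (sample mean - hypothesized mean) / (sd / sqrt(n)) and compares |z| with 1.96, the two-tailed critical value of the standard normal distribution at the 0.05 level.

```java
class OneSampleZByHand {

    // arithmetic mean of the observations
    static double mean(double[] sample) {
        double sum = 0;
        for (double v : sample) {
            sum += v;
        }
        return sum / sample.length;
    }

    // z = (sample mean - hypothesized mean) / (sigma / sqrt(n))
    static double zScore(double[] sample, double mu0, double sigma) {
        return (mean(sample) - mu0) / (sigma / Math.sqrt(sample.length));
    }

    public static void main(String[] args) {
        double[] cans = {355.02, 355.47, 353.01, 355.93, 356.66,
                355.98, 353.74, 354.96, 353.81, 355.79};
        double z = zScore(cans, 355, 0.05);
        System.out.println("sample mean: " + mean(cans));
        System.out.println("z score: " + z);
        // 1.96 is the two-tailed critical value at the 0.05 level
        System.out.println("reject H0: " + (Math.abs(z) > 1.96));
    }
}
```

The values printed should agree with the library summary up to rounding, and the critical-value comparison leads to the same decision as the p-value.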
What if we ask whether the machine delivers more than the specification states?
We address this question by changing the hypotheses, which become:
H0: μ ≤ 355
H1: μ > 355
Our code looks like:
HTTools.zTestOneSample(cans,
  355, // mean
  0.05, // sd
  0.05, // significance level
  HTTools.Alternative.GREATER_THAN // alternative
).printSummary();
> HTTools.zTestOneSample
 One Sample z-test
mean: 355
sd: 0.05
significance level: 0.05
alternative hypothesis: one tail P > z
sample size: 10
sample mean: 355.037
z score: 2.3400855
p-value: 0.009639663661320297
conf int: [355.0060102,355.0679898]
As expected, the one-tailed test has more statistical
power in the tested direction. As a consequence, the p-value is
smaller (half of the two-tailed value) and we still reject the null
hypothesis. This was a clear-cut case, in which the one-sided test
gave the same result as the two-sided test. I give this example only
to encourage the user to pay attention to this kind of detail.
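The relation between the two p-values can be illustrated without the library. The sketch below is an assumption-laden stand-in, not Rapaio's implementation: it approximates the upper tail of the standard normal distribution with the Abramowitz and Stegun polynomial formula 26.2.17 (absolute error below 7.5e-8), and the names TailPValues, upperTail, oneTailGreater and twoTails are hypothetical.

```java
class TailPValues {

    // Upper-tail probability P(Z > z) of the standard normal distribution,
    // approximated with Abramowitz-Stegun formula 26.2.17.
    static double upperTail(double z) {
        if (z < 0) {
            return 1.0 - upperTail(-z); // symmetry of the normal density
        }
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math.PI);
        return density * poly;
    }

    // p-value for the alternative "greater than"
    static double oneTailGreater(double z) {
        return upperTail(z);
    }

    // p-value for the two-tailed alternative
    static double twoTails(double z) {
        return 2 * upperTail(Math.abs(z));
    }

    public static void main(String[] args) {
        double z = 2.3400855; // z score reported in the summaries above
        System.out.println("two tails p-value: " + twoTails(z));
        System.out.println("one tail p-value: " + oneTailGreater(z));
    }
}
```

For a positive z score the one-tailed p-value is exactly half of the two-tailed one, which is why the same data gives stronger evidence when the direction of the alternative is fixed in advance.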
Example 2: One sample z-test
A herd of 1500 steer was fed a special high-protein grain for a month. A random sample of 29 were weighed and had gained an average of 6.7 pounds. If the standard deviation of weight gain for the entire herd is 7.1, test the hypothesis that the average weight gain per steer for the month was more than 5 pounds [2].
We have the following null and alternative hypotheses:
H0: μ ≤ 5
H1: μ > 5
ZTestOneSample ztest = HTTools.zTestOneSample(
  6.7, // sample mean
  29, // sample size
  5, // tested mean 
  7.1, // population standard deviation
  0.05, // significance level
  HTTools.Alternative.GREATER_THAN // alternative
);
ztest.printSummary();
> HTTools.zTestOneSample
 One Sample z-test
mean: 5
sd: 7.1
significance level: 0.05
alternative hypothesis: one tail P > z
sample size: 29
sample mean: 6.7
z score: 1.2894057
p-value: 0.0986285477062051
conf int: [4.1159112,9.2840888]
The p-value is greater than the significance level, which means that we cannot reject the null hypothesis. We do not have enough evidence that the average weight gain was more than 5 pounds.
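The same decision can be reached from summary statistics alone by comparing the z score with a critical value instead of a p-value. This is a minimal sketch in plain Java, independent of Rapaio; SteerZTest and zFromSummary are hypothetical names, and 1.6449 is the rounded upper 5% point of the standard normal distribution used for a one-tailed test at the 0.05 level.

```java
class SteerZTest {

    // z statistic computed from summary statistics instead of raw observations
    static double zFromSummary(double sampleMean, int n, double mu0, double sigma) {
        return (sampleMean - mu0) / (sigma / Math.sqrt(n));
    }

    public static void main(String[] args) {
        double z = zFromSummary(6.7, 29, 5, 7.1);
        double critical = 1.6449; // upper 5% point of the standard normal (rounded)
        System.out.println("z score: " + z);
        // for the "greater than" alternative we reject only when z exceeds the critical value
        System.out.println("reject H0: " + (z > critical));
    }
}
```

Since the z score of about 1.29 stays below the critical value, the null hypothesis is not rejected, in agreement with the p-value reported in the summary.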
Example 3: Two samples z test
The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women? [3]
According to the central limit theorem, we can
assume that each sample mean follows a normal
distribution. Moreover, since the two random samples are drawn
independently, the difference of the sample means is also normally
distributed. And because we know the standard deviation of each
population, we can use a two sample z test for the difference of
the sample means.
HTTools.zTestTwoSamples(
        28, 75, // male sample mean and size
        33, 50, // female sample mean and size
        0, // difference of means
        14.1, 9.5 // standard deviations
).printSummary();
> HTTools.zTestTwoSamples
 Two Samples z-test
x sample mean: 28
x sample size: 75
y sample mean: 33
y sample size: 50
mean: 0
x sd: 14.1
y sd: 9.5
significance level: 0.05
alternative hypothesis: two tails P > |z|
sample mean: -5
z score: -2.3686842
p-value: 0.017851489594360337
conf int: [-9.1372421,-0.8627579]
The test was run at the 0.05 significance level, which is the default value. The alternative is
two tails, since we test whether the difference in means is different from zero.
The resulting p-value is lower than the significance level, which means
that we reject the hypothesis that the two populations have the same
mean. We can also see this from the confidence interval, since it does not
include 0.
Note: If we had considered a significance level of 0.01,
then we would not have been able to reject the null hypothesis.
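The effect of the significance level can be seen by computing the confidence interval for both levels. The sketch below is plain Java and does not call Rapaio; TwoSampleZByHand, standardError and confInt are hypothetical names. It uses the standard error of a difference of two independent sample means, sqrt(sdX^2 / nX + sdY^2 / nY), and the standard normal quantiles 1.959964 (0.05 level) and 2.575829 (0.01 level).

```java
class TwoSampleZByHand {

    // standard error of the difference of two independent sample means
    static double standardError(double sdX, int nX, double sdY, int nY) {
        return Math.sqrt(sdX * sdX / nX + sdY * sdY / nY);
    }

    // confidence interval for the difference, given a standard normal quantile
    static double[] confInt(double diff, double se, double quantile) {
        return new double[]{diff - quantile * se, diff + quantile * se};
    }

    public static void main(String[] args) {
        double diff = 28 - 33; // male mean minus female mean
        double se = standardError(14.1, 75, 9.5, 50);
        System.out.println("z score: " + (diff - 0) / se);

        double[] ci95 = confInt(diff, se, 1.959964); // significance level 0.05
        double[] ci99 = confInt(diff, se, 2.575829); // significance level 0.01
        System.out.println("95% conf int: [" + ci95[0] + "," + ci95[1] + "]");
        System.out.println("99% conf int: [" + ci99[0] + "," + ci99[1] + "]");
    }
}
```

The 95% interval lies entirely below zero, while the 99% interval extends above zero, which matches the observation that the null hypothesis is rejected at the 0.05 level but not at the 0.01 level.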



