Introduction
This tutorial presents the correlation tools offered by the Rapaio library. We will use the classical iris data set. The numerical columns of this data set are:
Frame df = Datasets.loadIrisDataset();
df = ColFilters.retainNumeric(df);
names(df);
>>names("iris")
sepal-length
sepal-width
petal-length
petal-width
Pearson product-moment correlation
Pearson product-moment correlation measures the linear correlation between two random variables. Among the various correlation measures, the Pearson product-moment correlation detects only linear relationships.
Definition
The Pearson product-moment coefficient measures the linear correlation between two random variables X and Y, giving a value between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no linear correlation, and −1 is total negative correlation.
Pearson's correlation coefficient when applied to a population is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. The formula for ρ is:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
Pearson's correlation coefficient when applied to a sample is commonly represented by the letter r and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances and variances based on a sample into the formula above. That formula for r is:
$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
The interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.8 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences where there may be a greater contribution from complicating factors.
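The sample formula for r can be followed term by term in a few lines of plain Java. The sketch below is only an illustration: it does not use Rapaio, and the class name PearsonSketch and its methods are hypothetical names chosen for this example.

```java
/** Library-free sketch of the sample Pearson coefficient r (not Rapaio code). */
final class PearsonSketch {

    /** Arithmetic mean of a non-empty sample. */
    static double mean(double[] v) {
        double sum = 0.0;
        for (double value : v) {
            sum += value;
        }
        return sum / v.length;
    }

    /**
     * Sample Pearson r: the sum of products of deviations divided by the
     * product of the square roots of the summed squared deviations.
     * Returns NaN when either sample is constant (zero denominator).
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("samples must have equal length >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / (Math.sqrt(sumXX) * Math.sqrt(sumYY));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] rising = {3, 5, 7, 9, 11};   // y = 2x + 1
        double[] falling = {10, 8, 6, 4, 2};  // y = -2x + 12
        System.out.println("rising:  " + pearson(x, rising));   // approximately +1
        System.out.println("falling: " + pearson(x, falling));  // approximately -1
    }
}
```

Because of floating-point rounding, exact linear data may give a value a hair away from ±1, so comparisons should use a small tolerance.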
Use Rapaio for Pearson correlation
The Rapaio library can compute Pearson's r for more than one vector at a time. The result is a matrix of r values for every pair of vectors, indexed by the position of each vector.
PearsonRCorrelation corr = new PearsonRCorrelation(df);
summary(corr);
pearson[[sepal-length, sepal-width, petal-length, petal-width]] - Pearson product-moment correlation coefficient
    1.sepal-length  2.sepal-width  3.petal-length  4.petal-width
1.               x      -0.109369        0.871754       0.817954
2.       -0.109369              x       -0.420516      -0.356544
3.        0.871754      -0.420516               x       0.962757
4.        0.817954      -0.356544        0.962757              x
We can easily spot that many of the attributes are linearly correlated. For example, the correlation summary shows that petal-length and petal-width have a very strong linear correlation. Let's check this intuition with a plot:
Another r coefficient with a value close to 1 is the one between sepal-length and petal-length. Let's check that with a plot as well:
Finally, we plot again, this time for a pair whose coefficient is closer to 0, which could mean that the variables are not linearly correlated. We have such a value between sepal-length and sepal-width.
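The matrix layout of the summary, one r value per pair of vectors addressed by vector position, can be sketched without the library. This is a minimal illustration under stated assumptions, not how Rapaio implements it: the class CorrelationMatrixSketch, its methods, and the toy data are all hypothetical, and the diagonal is filled with 1 where the summary prints x.

```java
import java.util.Locale;

/** Library-free sketch of a pairwise Pearson correlation matrix (not Rapaio code). */
final class CorrelationMatrixSketch {

    /** Sample Pearson r of two equally sized samples. */
    static double pearson(double[] x, double[] y) {
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / (Math.sqrt(sumXX) * Math.sqrt(sumYY));
    }

    /** Entry [i][j] holds r between column i and column j. */
    static double[][] correlationMatrix(double[][] columns) {
        int k = columns.length;
        double[][] m = new double[k][k];
        for (int i = 0; i < k; i++) {
            m[i][i] = 1.0; // a variable compared with itself
            for (int j = i + 1; j < k; j++) {
                double r = pearson(columns[i], columns[j]);
                m[i][j] = r;
                m[j][i] = r; // r is symmetric, so compute each pair once
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical toy columns, not the iris measurements.
        double[][] columns = {
            {1, 2, 3, 4, 5},
            {2, 4, 6, 8, 10},
            {5, 3, 4, 1, 2}
        };
        for (double[] row : correlationMatrix(columns)) {
            StringBuilder line = new StringBuilder();
            for (double r : row) {
                line.append(String.format(Locale.ROOT, "%10.6f", r));
            }
            System.out.println(line);
        }
    }
}
```

Since r is symmetric, only the upper triangle is computed and then mirrored, which is why the printed summaries read the same across the diagonal.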
Spearman's rank correlation coefficient
Spearman's rank correlation coefficient, often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
Definition
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores X_i, Y_i are converted to ranks x_i, y_i, and ρ is computed from these:
$$\rho = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2}}$$
Identical values (rank ties or value duplicates) are assigned a rank equal to the average of their positions in the ascending order of the values.
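The definition can be sketched in plain Java in two steps: convert each sample to ranks, averaging the positions of tied values, then apply the Pearson formula to the ranks. The sketch below is an illustration only; it does not use Rapaio, and SpearmanSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Library-free sketch of Spearman's rank correlation (not Rapaio code). */
final class SpearmanSketch {

    /** Ranks 1..n in ascending order; tied values share the average of their positions. */
    static double[] averageRanks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the values they point to.
        Arrays.sort(order, Comparator.comparingDouble(i -> v[i]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            // Extend the run while the next sorted value equals the current one.
            while (end + 1 < n && v[order[end + 1]] == v[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0; // positions are 1-based
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Sample Pearson r of two equally sized samples. */
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double meanY = Arrays.stream(y).average().orElse(Double.NaN);
        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    /** Spearman's coefficient: Pearson r computed on the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(averageRanks(x), averageRanks(y));
    }

    public static void main(String[] args) {
        // Tied values 20 and 20 occupy positions 2 and 3, so both get rank 2.5.
        System.out.println(Arrays.toString(averageRanks(new double[] {10, 20, 20, 30})));
        // prints [1.0, 2.5, 2.5, 4.0]

        double[] x = {1, 2, 3, 4, 5};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = Math.exp(x[i]); // monotone in x, but not linear
        }
        System.out.println("pearson:  " + pearson(x, y));  // below 1
        System.out.println("spearman: " + spearman(x, y)); // approximately 1
    }
}
```

The exponential example shows the practical difference between the two measures: the relationship is perfectly monotone, so the ranks agree exactly, while the raw values do not lie on a straight line.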
Use Rapaio to compute Spearman's rank correlation
The Rapaio library can compute Spearman's ρ for more than one vector at a time. The result is a matrix of ρ values for every pair of vectors, indexed by the position of each vector.
spearman[[sepal-length, sepal-width, petal-length, petal-width]] - Spearman's rank correlation coefficient
    1.sepal-length  2.sepal-width  3.petal-length  4.petal-width
1.               x      -0.159457        0.881386       0.834421
2.       -0.159457              x       -0.303421      -0.277511
3.        0.881386      -0.303421               x       0.936003
4.        0.834421      -0.277511        0.936003              x
pearson[[sepal-length, sepal-width, petal-length, petal-width]] - Pearson product-moment correlation coefficient
    1.sepal-length  2.sepal-width  3.petal-length  4.petal-width
1.               x      -0.109369        0.871754       0.817954
2.       -0.109369              x       -0.420516      -0.356544
3.        0.871754      -0.420516               x       0.962757
4.        0.817954      -0.356544        0.962757              x