WebCab Probability and Statistics v3.5 (J2SE Edition) |
java.lang.Object | +--webcab.lib.statistics.correlation.CorrelationStateful
This is the stateful implementation of the Correlation and Regression class, which allows linear relationships between two variables to be investigated using the techniques of correlation and linear regression. With this version, the data set of pairs being studied is set once, after which various properties of that set of pairs can be evaluated. This approach is particularly appropriate when the various Correlation properties are evaluated repeatedly on the same data set, because the data set does not have to be supplied again for each of these evaluations.
This stateful version implements the functionality of the Correlation and Regression class using the object-oriented notion of state, which allows for more efficient execution when the data is "sent over the wire" (for example, when the data set is retrieved from a remote DBMS).
We study the relationship between two variables by considering a data set of pairs of values, where each pair corresponds to values taken simultaneously by the two underlying variables. We then study the correlation and linear regression properties of this data set in order to deduce information about the relationship between the two variables.
In particular, the linear regression line can be constructed, which allows one variable to be predicted from given values of the other variable to a degree of confidence that depends on the `linearity' of the data set. We also cover linear (Pearson's, t-test, z-transform) and rank (Spearman's, Kendall's) correlation.
That is, by using this class for a given data set you are able to decide to what degree two variables are correlated, and to determine the confidence interval and the level of significance of the correlation tests performed. You are also able to construct the regression line for the data set. Similarly, for two data samples with corresponding regression lines, you can determine the confidence interval for the conditional mean between these two regression lines.
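As an illustration of what "the degree to which two variables are correlated" means for the linear case, the following is a minimal, self-contained sketch of Pearson's correlation coefficient computed from its textbook definition. The class `PearsonSketch` and its method are hypothetical names used only here; this is not WebCab's implementation and makes no claim about how the library computes its result.

```java
// Illustrative sketch only; PearsonSketch is a hypothetical name, not part of the library.
final class PearsonSketch {

    /** Returns Pearson's r for the pairs (x[i], y[i]). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of products of deviations
        double sxx = 0.0; // sum of squared deviations of x
        double syy = 0.0; // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // The result is NaN if either variable is constant (zero variance).
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 6, 8, 10};
        // Perfectly linear, increasing data: r is 1 up to rounding.
        System.out.println("r = " + pearson(x, y));
    }
}
```

Values of r close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear association.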
Such data sets appear in a number of contexts. Examples of pairs for which such data sets could be constructed include:
By tabulating a given set of student or sales data, respectively, against the above criteria, this class can be applied to address the following types of questions:
How effectively this functionality predicts values depends on the nature of the data set considered. Because a linear regression model is implemented (see note below for more details), predictions can be made with confidence only when there exists a strong linear relationship between the two variables considered.
The correlation functionality implemented consists of a number of coefficients which are designed to measure correlation (i.e. the degree to which one variable moves with the other) for differing types of data sets (see notes below).
addValue - Add pairs of values to the data set one at a time.
addValues - Add pairs of values to the data set many at a time.
pearsonCorrelationCoefficient() - Evaluates Pearson's Correlation Coefficient.
spearmanRankTest() - Spearman's Rank Correlation Coefficient.
kendallCorrelationCoefficient() - Evaluates Kendall's Correlation Coefficient.
significance - Calculates the significance test for a given correlation coefficient.
meanX - Mean of the values of the first elements of the pairs from which the current data set is constructed.
meanY - Mean of the values of the second elements of the pairs from which the current data set is constructed.
sampleVarianceX - The variance of the first elements from the pairs from which the current data set is constructed.
sampleVarianceY - The variance of the second elements from the pairs from which the current data set is constructed.
leastSquaresRegressionLineY - Constructs the regression line of Y on X using the method of least squares.
leastSquaresRegressionLineX - Constructs the regression line of X on Y using the method of least squares.
coefficientOfDetermination - Calculates the coefficient of determination for the current set of data.
residuals - Determines the residual for a given pair of points.
residualsAverage - Determines the arithmetic average of all the residuals.
| Constructor Summary |
CorrelationStateful()
Creates a new instance of the Correlation class with an empty initial data set.
CorrelationStateful(double[] xValues, double[] yValues)
Constructs a new Correlation instance using the specified value pairs for its initial data set.
| Method Summary |
void | addValue(double xValue, double yValue)
Adds a new pair of (ordered) numbers (xValue, yValue) to the data set.
void | addValues(double[] xValues, double[] yValues)
Adds a new set of (ordered) pairs of data to the data set.
double | coefficientOfDetermination()
Calculates the coefficient of determination for the current set of data.
double | estimateX(double yValue)
Estimates the value of the X variable when the Y variable is known using the regression line of X on Y, which can be evaluated using leastSquaresRegressionLineX().
double | estimateY(double xValue)
Estimates the value of the Y variable when the X variable is known using the regression line of Y on X, which can be evaluated using leastSquaresRegressionLineY().
double | kendallCorrelationCoefficient()
Calculates Kendall's correlation coefficient for the current data set.
double[] | leastSquaresRegressionLineX()
Constructs the regression line of X on Y using the method of least squares.
double[] | leastSquaresRegressionLineY()
Constructs the regression line of Y on X using the method of least squares.
double | meanX()
Calculates the arithmetic mean of the first elements (i.e. X) of the pairs of values from which the current data set is constructed.
double | meanY()
Calculates the arithmetic mean of the second elements (i.e. Y) of the pairs of values from which the current data set is constructed.
double | pearsonCorrelationCoefficient()
Calculates Pearson's correlation coefficient for the current data set.
double | residuals(int index)
Determines the residual for a given pair of points within the current data set in accordance with the regression line constructed using leastSquaresRegressionLineX().
double | residualsAverage()
Determines the arithmetic average of the residuals for all pairs of points within the current data set in accordance with the regression line constructed using leastSquaresRegressionLineX().
double | sampleVarianceX()
Calculates the sample variance of the first elements (i.e. X) of the pairs of values from which the current data set is constructed.
double | sampleVarianceY()
Calculates the sample variance of the second elements (i.e. Y) of the pairs of values from which the current data set is constructed.
double | significance(double correlationCoefficient)
Calculates the significance test for a given correlation coefficient.
double | spearmanRankTest()
Calculates Spearman's rank correlation coefficient for the current data set.
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
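public CorrelationStateful()
Creates a new instance of the Correlation class with an empty initial data set. Pairs of values can then be added using the addValue and the addValues methods.
public CorrelationStateful(double[] xValues, double[] yValues)
Constructs a new Correlation instance using the specified value pairs for its initial data set.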
| Method Detail |
public void addValue(double xValue, double yValue)
Adds a new pair of (ordered) numbers (xValue, yValue) to the data set.
xValue - the value of the first variable of the (ordered) pair which is added to the data set.
yValue - the value of the second variable of the (ordered) pair which is added to the data set.
public void addValues(double[] xValues, double[] yValues)
Adds a new set of (ordered) pairs of data to the data set.
Note:
The i-th pair added is (xValues[i],yValues[i]).
Each pair consists of one `X' value and one `Y' value.
xValues - an array of the ordered elements which make up the first variable of the set of (ordered) pairs which is to be added to the data set.
yValues - an array of the ordered elements which make up the second variable of the set of (ordered) pairs which is to be added to the data set.
public double pearsonCorrelationCoefficient()
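Calculates Pearson's correlation coefficient for the current data set, that is, for the pairs added using addValue and/or addValues.
public double spearmanRankTest()
Calculates Spearman's rank correlation coefficient for the current data set.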
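For orientation, Spearman's rank correlation coefficient is commonly defined as Pearson's coefficient applied to the ranks of the values rather than to the values themselves. The sketch below follows that definition and assigns tied values the average of the ranks they occupy; the tie handling, the class name `SpearmanSketch` and its methods are assumptions made for this example and may differ from what spearmanRankTest() does internally.

```java
// Illustrative sketch only; SpearmanSketch is a hypothetical name, not part of the library.
final class SpearmanSketch {

    /** Ranks start at 1; tied values share the average of the ranks they occupy. */
    static double[] ranks(double[] v) {
        int n = v.length;
        double[] r = new double[n];
        for (int i = 0; i < n; i++) {
            int smaller = 0;
            int equal = 0; // includes v[i] itself
            for (int j = 0; j < n; j++) {
                if (v[j] < v[i]) {
                    smaller++;
                } else if (v[j] == v[i]) {
                    equal++;
                }
            }
            r[i] = smaller + (equal + 1) / 2.0;
        }
        return r;
    }

    /** Pearson's r, used here on the rank vectors. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double sab = 0.0;
        double saa = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        return sab / Math.sqrt(saa * sbb);
    }

    /** Spearman's rho for the pairs (x[i], y[i]). */
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        double[] x = {10, 20, 30, 40, 50};
        double[] y = {1, 4, 9, 16, 25};
        // y increases whenever x does, so rho is 1 even though the relation is not linear.
        System.out.println("rho = " + spearman(x, y));
    }
}
```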
public double kendallCorrelationCoefficient()
Calculates Kendall's correlation coefficient for the current data set.
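Kendall's coefficient compares every pair of observations and counts how often the two variables move in the same direction (concordant) or in opposite directions (discordant). The sketch below computes the simplest variant, often called tau-a, in which tied pairs count as neither. Which variant this class implements is not stated in this documentation, so the choice of tau-a, the class name `KendallSketch` and the method name are assumptions for illustration only.

```java
// Illustrative sketch only; KendallSketch is a hypothetical name, not part of the library.
final class KendallSketch {

    /** Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2). */
    static double tauA(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        long score = 0; // concordant pairs minus discordant pairs
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                double product = (x[i] - x[j]) * (y[i] - y[j]);
                if (product > 0) {
                    score++;      // both variables order the two observations the same way
                } else if (product < 0) {
                    score--;      // the variables order the two observations oppositely
                }                 // product == 0: a tie, counted as neither
            }
        }
        return score / (0.5 * n * (n - 1));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {1, 3, 2, 4};
        // Five concordant pairs and one discordant pair out of six.
        System.out.println("tau = " + tauA(x, y));
    }
}
```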
public double significance(double correlationCoefficient)
Calculates the significance test for a given correlation coefficient.
correlationCoefficient - the value of either Pearson's correlation coefficient or Spearman's rank correlation coefficient evaluated for the current data set.
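This documentation does not specify which statistic significance returns. As background only, a common way to test whether a correlation coefficient r obtained from n pairs differs from zero is the t statistic t = r * sqrt((n - 2) / (1 - r^2)), which is compared against Student's t distribution with n - 2 degrees of freedom. The sketch below computes that statistic; the class `CorrelationTSketch`, the method `tStatistic` and the explicit sample-size parameter are hypothetical and are not taken from the library.

```java
// Illustrative sketch only; CorrelationTSketch is a hypothetical name, not part of the library.
final class CorrelationTSketch {

    /**
     * t statistic for testing whether a correlation coefficient differs from zero.
     * r is the coefficient and n is the number of pairs it was computed from.
     */
    static double tStatistic(double r, int n) {
        if (n < 3) {
            throw new IllegalArgumentException("need at least 3 pairs");
        }
        if (!(Math.abs(r) < 1.0)) {
            throw new IllegalArgumentException("r must lie strictly between -1 and 1");
        }
        return r * Math.sqrt((n - 2) / (1.0 - r * r));
    }

    public static void main(String[] args) {
        // r = 0.5 from 27 pairs gives 25 degrees of freedom.
        System.out.println("t = " + tStatistic(0.5, 27));
    }
}
```

The larger the absolute value of t for a given n, the stronger the evidence against zero correlation.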
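public double meanX()
Calculates the arithmetic mean of the first elements (i.e. X) of the pairs of values from which the current data set is constructed.
public double meanY()
Calculates the arithmetic mean of the second elements (i.e. Y) of the pairs of values from which the current data set is constructed.
public double sampleVarianceX()
Calculates the sample variance of the first elements (i.e. X) of the pairs of values from which the current data set is constructed.
public double sampleVarianceY()
Calculates the sample variance of the second elements (i.e. Y) of the pairs of values from which the current data set is constructed.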
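To make the stateful pattern behind these four methods concrete, here is a small stand-in class that stores pairs once and then answers repeated queries about them. It borrows the documented method names for readability, but `PairAccumulatorSketch` is a hypothetical class written for this example; in particular, dividing by n - 1 in the sample variance is the usual convention and is assumed here rather than confirmed for the library.

```java
// Illustrative stand-in only; PairAccumulatorSketch is hypothetical, not the library class.
final class PairAccumulatorSketch {

    // Each element is one ordered pair {x, y}.
    private final java.util.List<double[]> pairs = new java.util.ArrayList<>();

    void addValue(double xValue, double yValue) {
        pairs.add(new double[] {xValue, yValue});
    }

    void addValues(double[] xValues, double[] yValues) {
        if (xValues.length != yValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        for (int i = 0; i < xValues.length; i++) {
            addValue(xValues[i], yValues[i]);
        }
    }

    double meanX() { return mean(0); }
    double meanY() { return mean(1); }
    double sampleVarianceX() { return sampleVariance(0); }
    double sampleVarianceY() { return sampleVariance(1); }

    // k = 0 selects the first elements (X), k = 1 the second elements (Y).
    private double mean(int k) {
        if (pairs.isEmpty()) {
            throw new IllegalStateException("data set is empty");
        }
        double sum = 0.0;
        for (double[] p : pairs) {
            sum += p[k];
        }
        return sum / pairs.size();
    }

    // Assumption: sample variance uses the n - 1 denominator.
    private double sampleVariance(int k) {
        if (pairs.size() < 2) {
            throw new IllegalStateException("need at least 2 pairs");
        }
        double m = mean(k);
        double sumSquares = 0.0;
        for (double[] p : pairs) {
            double d = p[k] - m;
            sumSquares += d * d;
        }
        return sumSquares / (pairs.size() - 1);
    }

    public static void main(String[] args) {
        PairAccumulatorSketch data = new PairAccumulatorSketch();
        data.addValues(new double[] {1, 2, 3}, new double[] {2, 4, 6});
        data.addValue(4, 8); // state persists, so later queries see all four pairs
        System.out.println("meanX = " + data.meanX() + ", varX = " + data.sampleVarianceX());
    }
}
```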
public double[] leastSquaresRegressionLineY()
Constructs the regression line of Y on X using the method of least squares. That is, the least squares regression line is constructed for the second elements of the pairs from which the data set is constructed, plotted against the first elements of the pairs.
Returns an array d with two elements, where the regression line is given in functional form by the following formula: Y(X) = d[0]*X + d[1]
See also leastSquaresRegressionLineX(), which calculates the regression line of X on Y using the method of least squares.
public double estimateY(double xValue)
Estimates the value of the Y variable when the X variable is known using the regression line of Y on X, which can be evaluated using leastSquaresRegressionLineY().
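The least squares line of Y on X and the estimate derived from it can be sketched as follows. The slope is the sum of products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The returned array follows the {d[0], d[1]} layout described above; the class `RegressionLineSketch` and its methods are hypothetical names, and the code is not the library's implementation.

```java
// Illustrative sketch only; RegressionLineSketch is a hypothetical name, not part of the library.
final class RegressionLineSketch {

    /** Returns d = {slope, intercept} so that the fitted line is Y = d[0]*X + d[1]. */
    static double[] fitYOnX(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are equal; slope is undefined");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX; // line passes through (meanX, meanY)
        return new double[] {slope, intercept};
    }

    /** Evaluates the fitted line at xValue. */
    static double estimateY(double[] d, double xValue) {
        return d[0] * xValue + d[1];
    }

    public static void main(String[] args) {
        double[] d = fitYOnX(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("Y = " + d[0] + " * X + " + d[1]);
        System.out.println("estimate at X = 10: " + estimateY(d, 10));
    }
}
```

Swapping the roles of the two arrays in `fitYOnX` gives the line of X on Y, which in general is not the algebraic inverse of the line of Y on X.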
public double[] leastSquaresRegressionLineX()
Constructs the regression line of X on Y using the method of least squares. That is, the least squares regression line is constructed for the first elements of the pairs from which the data set is constructed, plotted against the second elements of the pairs.
Returns an array d = {d[0], d[1]} with two elements, where the regression line is given in functional form by the following formula: X(Y) = d[0]*Y + d[1]
See also leastSquaresRegressionLineY(), which calculates the regression line of Y on X using the method of least squares.
public double estimateX(double yValue)
Estimates the value of the X variable when the Y variable is known using the regression line of X on Y, which can be evaluated using leastSquaresRegressionLineX().
public double coefficientOfDetermination()
The coefficient of determination is the amount of variation in the second
variable (i.e. Y) which is explained by the regression line of the second
variable Y on the first variable X
(see leastSquaresRegressionLineY()), divided by the total amount
of variation of the second variable (i.e. Y).
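The ratio described above (variation explained by the line of Y on X, divided by total variation of Y) can be written out directly. The sketch below fits the least squares line and then forms that ratio from sums of squared deviations about the mean of Y. The class `DeterminationSketch` and its method are hypothetical names for this example, not the library's code.

```java
// Illustrative sketch only; DeterminationSketch is a hypothetical name, not part of the library.
final class DeterminationSketch {

    /** Explained variation of Y divided by total variation of Y, for the line of Y on X. */
    static double rSquared(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Least squares line of Y on X.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        double explained = 0.0; // variation of the fitted values about the mean of Y
        double total = 0.0;     // variation of the observed values about the mean of Y
        for (int i = 0; i < n; i++) {
            double predicted = slope * x[i] + intercept;
            explained += (predicted - meanY) * (predicted - meanY);
            total += (y[i] - meanY) * (y[i] - meanY);
        }
        return explained / total;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println("coefficient of determination = " + rSquared(x, y));
    }
}
```

For a straight-line fit with an intercept, this ratio equals the square of Pearson's correlation coefficient.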
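public double residuals(int index)
Determines the residual for a given pair of points within the current data set in accordance with the regression line constructed using leastSquaresRegressionLineX(). Recall that the residual is the variation of the second variable (i.e. Y) around the regression line.
index - the index of the pair of points within the current data set. Indexing of the pairs of points within the data set starts from 0; hence 0 corresponds to the first pair of points, 1 to the second pair of points, and so on.
See also residualsAverage(), which evaluates the arithmetic average of the residuals.
public double residualsAverage()
Determines the arithmetic average of the residuals for all pairs of points within the current data set in accordance with the regression line constructed using leastSquaresRegressionLineX(). This method simply determines the arithmetic mean of all values determined by residuals.
See also residuals(int), which evaluates the residuals for a given pair of points.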
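A rough sketch of residuals and their average follows. It assumes that a residual is the observed Y value minus the value predicted by a least squares line fitted to predict Y from X, which is one reading of "the variation of Y around the regression line"; the library may define it against a different line. The class `ResidualsSketch` and its members are hypothetical names used only in this example.

```java
// Illustrative sketch only; ResidualsSketch is a hypothetical name, not part of the library.
final class ResidualsSketch {

    private final double[] x;
    private final double[] y;
    private final double slope;
    private final double intercept;

    /** Stores the pairs and fits the least squares line of Y on X (an assumption, see above). */
    ResidualsSketch(double[] xValues, double[] yValues) {
        if (xValues.length != yValues.length || xValues.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        x = xValues.clone();
        y = yValues.clone();
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    /** Observed Y minus fitted Y for the pair at the given zero-based index. */
    double residual(int index) {
        return y[index] - (slope * x[index] + intercept);
    }

    /** Arithmetic mean of the residuals over all pairs. */
    double residualsAverage() {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += residual(i);
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        ResidualsSketch s = new ResidualsSketch(new double[] {1, 2, 3, 4, 5}, new double[] {2, 4, 5, 4, 5});
        System.out.println("residual of first pair = " + s.residual(0));
        System.out.println("average residual = " + s.residualsAverage());
    }
}
```

Under this assumption the residuals of a least squares line with an intercept sum to zero, so the average is zero up to rounding.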