WebCab Probability and Statistics v3.5 (J2SE Edition) |
java.lang.Object | +--webcab.lib.statistics.statistics.DataPresentation
The DataPresentation class provides several tabular procedures commonly used to summarize quantitative data. Towards this end we offer methods for the evaluation of the frequency table, relative frequency table and cumulative frequency table for a finite set of (real) numerical values.
A frequency table is a tabular summary of a data set which counts the number of elements from the data set lying in each of the classes given by the boundary points, defined in accordance with the open left or open right boundary convention.
For example, suppose the boundary points used are {b_1, b_2, ..., b_n}.
If the open left boundary convention is used then the members of the data
set will be assigned to the sub-intervals (or classes):
(-infinity, b_1], (b_1, b_2], (b_2, b_3], ..., (b_(n-1), b_n], (b_n, infinity)
If the open right convention is used then the members of the data set will be
assigned to the sub-intervals (or classes):
(-infinity, b_1), [b_1, b_2), [b_2, b_3), ..., [b_(n-1), b_n), [b_n, infinity)
The frequency tables with respect to the open left and open right boundary
conventions are evaluated using frequencyTableOL() and frequencyTableOR()
respectively. However, before these methods are called you are required to set
the data set considered and the boundary points used by calling setDataSet(double[])
and setBoundariesIntervals(double[]).
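The two conventions differ only in which class receives a value that is exactly equal to a boundary point. The following is a minimal stand-alone sketch of that rule, not the library's implementation: the class FrequencyTableSketch, its method names and the sample values are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class FrequencyTableSketch {

    // Open left classes (b_(i-1), b_i]: the class index of x is the
    // number of boundaries strictly less than x.
    static int classIndexOpenLeft(double[] boundaries, double x) {
        int i = 0;
        while (i < boundaries.length && boundaries[i] < x) {
            i++;
        }
        return i;
    }

    // Open right classes [b_(i-1), b_i): the class index of x is the
    // number of boundaries less than or equal to x.
    static int classIndexOpenRight(double[] boundaries, double x) {
        int i = 0;
        while (i < boundaries.length && boundaries[i] <= x) {
            i++;
        }
        return i;
    }

    // Counts the elements in each class; n boundaries give n + 1 classes.
    static double[] frequencyTable(double[] dataSet, double[] boundaries, boolean openLeft) {
        double[] table = new double[boundaries.length + 1];
        for (double x : dataSet) {
            int k = openLeft ? classIndexOpenLeft(boundaries, x)
                             : classIndexOpenRight(boundaries, x);
            table[k] += 1.0;
        }
        return table;
    }

    public static void main(String[] args) {
        double[] dataSet = {0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5};
        double[] boundaries = {1, 2, 3, 4, 5};
        // prints [1.0, 3.0, 1.0, 0.0, 1.0, 1.0]
        System.out.println(Arrays.toString(frequencyTable(dataSet, boundaries, true)));
        // prints [1.0, 2.0, 2.0, 0.0, 1.0, 1.0]
        System.out.println(Arrays.toString(frequencyTable(dataSet, boundaries, false)));
    }
}
```

Only the value 2.0 changes class between the two calls, because it is the only sample value that coincides with a boundary point.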
A relative frequency table (or distribution) gives, for each class, the proportion of the
total number of items which belong to that class. The data set is set using the
method setDataSet(double[]). The classes are given by setting
the boundary values of the sub-intervals of the real line by calling the method
setBoundariesIntervals(double[]).
A cumulative frequency table (or distribution) sums the values of the frequency table above or below a given class. This allows us to know the number of items which are ''less than or equal to the upper class limit'' or ''greater than or equal to the lower class limit'' of each class. The cumulative frequency table (or distribution) can be evaluated in the four possible cases by using one of the methods:
cFrequencyTableBOL() - Evaluates the Cumulative Frequency Table from below using the open left boundary convention.
cFrequencyTableBOR() - Evaluates the Cumulative Frequency Table from below using the open right boundary convention.
cFrequencyTableAOL() - Evaluates the Cumulative Frequency Table from above using the open left boundary convention.
cFrequencyTableAOR() - Evaluates the Cumulative Frequency Table from above using the open right boundary convention.
| Constructor Summary | |
DataPresentation()
Creates a new instance of the DataPresentation class with an empty initial data set and no boundary intervals. |
|
DataPresentation(double[] dataSet,
double[] boundaries)
Creates a new DataPresentation instance with a specified initial data set and the boundaries of the intervals to which the data set points will be assigned. |
|
| Method Summary | |
double[] |
cFrequencyTableAOL()
Calculates the cumulative frequency table from above for a discrete data set in accordance with the open left boundary (OLB) convention. |
double[] |
cFrequencyTableAOR()
Calculates the cumulative frequency table from above for a discrete data set in accordance with the open right boundary (ORB) convention. |
double[] |
cFrequencyTableBOL()
Calculates the cumulative frequency table from below for a discrete data set in accordance with the open left boundary (OLB) convention. |
double[] |
cFrequencyTableBOR()
Calculates the cumulative frequency table from below for a discrete data set in accordance with the open right boundary (ORB) convention. |
double[] |
frequencyTableOL()
Calculates the frequency table with respect to the open left boundary convention for a discrete data set. |
double[] |
frequencyTableOR()
Calculates the frequency table with respect to the open right boundary convention for a discrete data set. |
double[] |
getBoundariesIntervals()
Retrieves the currently registered boundary intervals which have been set using setBoundariesIntervals. |
double[] |
getDataSet()
Retrieves the currently registered data set. |
double[] |
relativeFrequencyTableOL()
Calculates the relative frequency table for a discrete data set in accordance with the open left boundary (OLB) convention. |
double[] |
relativeFrequencyTableOR()
Calculates the relative frequency table for a discrete data set in accordance with the open right boundary (ORB) convention. |
void |
setBoundariesIntervals(double[] boundaries)
Registers new boundary intervals. |
void |
setDataSet(double[] dataSet)
Registers a new data set used from within the business methods of this class. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public DataPresentation()
public DataPresentation(double[] dataSet,
double[] boundaries)
For example, assume that we have the data set {1.4, 2.4, 2.5, 2.0, 1.5}
and the boundary set {0, 1, 2, 3}. The boundary set
will divide the interval from 0 to 3 into the sub-intervals (0, 1] (i.e. the
interval from 0 to 1 which does not include 0
but does include 1), (1, 2] and (2, 3].
The values of the data set which belong to each of these intervals
are as follows:
(0, 1] interval has no elements from the data set.
(1, 2] interval has the elements 1.4, 1.5, 2.0 from the data set.
(2, 3] interval has the elements 2.4, 2.5 from the data set.
dataSet - an array where the k-th term corresponds to the k-th
element of the data set.
boundaries - a strictly increasing sequence of boundaries of
the intervals over the real line to which the data set points will
be assigned.
| Method Detail |
public void setDataSet(double[] dataSet)
Say we wish to display the results of an experiment which has the
outcomes 1.4, 2.4, 2.5, 2.0, 1.5 in a tabulated form, for
example a bar or frequency chart. The first step required in order to achieve
this is to register the data set using this method. After that the boundaries
can be set using setBoundariesIntervals(double[]) and then the frequency table
evaluated using one of frequencyTableOL() or frequencyTableOR().
dataSet - the set of values to be registered.
StatisticsException - if the data set or boundary intervals are null.
public double[] getDataSet()
Retrieves the currently registered data set, as set using setDataSet(double[]).
public void setBoundariesIntervals(double[] boundaries)
For example, assume that we have the data set {1.4, 2.4, 2.5, 2.0, 1.5}
and the boundary set {0, 1, 2, 3}. If you use the convention of an
open left boundary then this boundary set will divide the real line into the sub-intervals
(-infinity, 0], (0, 1] (i.e. the interval from 0 to
1 which does not include 0 but does include 1),
(1, 2], (2, 3] and (3, infinity). The values of the data set which belong to each
of these intervals are as follows:
(-infinity, 0] interval has no elements from the data set.
(0, 1] interval has no elements from the data set.
(1, 2] interval has the elements 1.4, 1.5, 2.0 from the data set.
(2, 3] interval has the elements 2.4, 2.5 from the data set.
(3, infinity) interval has no elements from the data set.
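The assignment listed above can be reproduced with a short stand-alone sketch. It is an illustration only: the class ClassMembersSketch and both of its methods are hypothetical and do not belong to the WebCab API. The sketch also checks that the boundary set is strictly increasing, and it includes the unbounded classes at both ends of the real line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class ClassMembersSketch {

    // Rejects boundary sets that are not strictly increasing.
    static void requireStrictlyIncreasing(double[] boundaries) {
        for (int i = 1; i < boundaries.length; i++) {
            if (!(boundaries[i - 1] < boundaries[i])) {
                throw new IllegalArgumentException("boundaries must be strictly increasing");
            }
        }
    }

    // Lists the members of every class under the open left convention.
    // Class 0 is (-infinity, b_1]; the last class is (b_n, infinity).
    static List<List<Double>> membersOpenLeft(double[] dataSet, double[] boundaries) {
        requireStrictlyIncreasing(boundaries);
        List<List<Double>> classes = new ArrayList<>();
        for (int i = 0; i <= boundaries.length; i++) {
            classes.add(new ArrayList<>());
        }
        for (double x : dataSet) {
            int k = 0;
            // Skip every boundary strictly below x.
            while (k < boundaries.length && boundaries[k] < x) {
                k++;
            }
            classes.get(k).add(x);
        }
        return classes;
    }

    public static void main(String[] args) {
        double[] dataSet = {1.4, 2.4, 2.5, 2.0, 1.5};
        double[] boundaries = {0, 1, 2, 3};
        // prints [[], [], [1.4, 2.0, 1.5], [2.4, 2.5], []]
        System.out.println(membersOpenLeft(dataSet, boundaries));
    }
}
```

Members are listed in the order in which they occur in the data set, so the class (1, 2] shows 1.4, 2.0, 1.5.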
boundaries - a strictly increasing sequence of boundaries of the intervals over the real line to which the data set points will be assigned.
public double[] getBoundariesIntervals()
Retrieves the currently registered boundary intervals which have been set using
setBoundariesIntervals. Note that the boundaries
divide the real line into sub-intervals to which members of the data set considered
are assigned.
public double[] frequencyTableOL()
throws StatisticsException
Calculates the frequency table with respect to the open left boundary convention for a
discrete data set. If the boundary points are {b_1, b_2, ..., b_n},
then the first term of the array returned, which represents the frequency table, is the
number of elements from the data set within the interval (-infinity, b_1],
the second term of the array returned is the number of elements from the data set within
the interval (b_1, b_2], and so on.
Note that before this method is called you are required to set the discrete data
set and the boundary intervals by calling setDataSet(double[]) and
setBoundariesIntervals(double[]) respectively.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. Now if we use the open left boundary convention then the
real line will be divided into the sub-intervals:
(-infinity, 1], (1,2], (2,3], (3,4], (4,5], (5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Left Boundary (OLB) convention gives:
(-infinity, 1]: we assign the data element 0.5,
and hence the frequency of this interval is 1.
(1, 2]: we assign the data elements 1.4, 1.3, 2.0,
and hence the frequency of this interval is 3.
(2, 3]: we assign the data element 2.3, and hence
the frequency of this interval is 1.
(3, 4]: we assign no data elements, and hence the frequency of
this interval is 0.
(4, 5]: we assign the data element 4.5, and hence
the frequency of this interval is 1.
(5, infinity): we assign the data element 5.5, and
hence the frequency of this interval is 1.
Hence, in this case the array returned corresponding to the frequency table will be
{1, 3, 1, 0, 1, 1}.
StatisticsException - thrown if the data set or boundary intervals are null.
public double[] frequencyTableOR()
throws StatisticsException
Calculates the frequency table with respect to the open right boundary convention for a
discrete data set. If the boundary points are {b_1, b_2, ..., b_n}, then the
first term of the array returned, which represents the frequency table, is the number of elements
from the data set within the interval (-infinity, b_1), the second term of the array
returned is the number of elements from the data set within the interval [b_1, b_2),
and so on.
The discrete data set and the boundary intervals are set using setDataSet(double[]) and
setBoundariesIntervals(double[]) respectively.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. Now if we use the open right boundary convention then the
real line will be divided into the sub-intervals:
(-infinity, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Right Boundary (ORB) convention gives:
(-infinity, 1): we assign the data element 0.5,
and hence the frequency of this interval is 1.
[1, 2): we assign the data elements 1.4, 1.3,
and hence the frequency of this interval is 2.
[2, 3): we assign the data elements 2.0, 2.3,
and hence the frequency of this interval is 2.
[3, 4): we assign no data elements, and hence the frequency
of this interval is 0.
[4, 5): we assign the data element 4.5,
and hence the frequency of this interval is 1.
[5, infinity): we assign the data element 5.5,
and hence the frequency of this interval is 1.
Hence, in this case the array returned corresponding to the frequency table will be
{1, 2, 2, 0, 1, 1}.
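Because the boundaries are strictly increasing, the class of a value can also be located with a binary search rather than a scan over all boundaries. The sketch below uses java.util.Arrays.binarySearch for this. The class BinarySearchClassIndex and its methods are hypothetical, the data set is assumed to contain no NaN values, and nothing here is claimed about how the library itself locates a class.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class BinarySearchClassIndex {

    // boundaries must be strictly increasing (sorted, no duplicates).
    static int classIndex(double[] boundaries, double x, boolean openLeft) {
        int pos = Arrays.binarySearch(boundaries, x);
        if (pos < 0) {
            // x is not a boundary: -pos - 1 is the insertion point, i.e. the
            // number of boundaries below x, which is the class index under
            // both conventions.
            return -pos - 1;
        }
        // x equals boundaries[pos]: the open left convention keeps it in the
        // class that ends at this boundary, the open right convention moves
        // it to the class that starts at this boundary.
        return openLeft ? pos : pos + 1;
    }

    static double[] frequencyTable(double[] dataSet, double[] boundaries, boolean openLeft) {
        double[] table = new double[boundaries.length + 1];
        for (double x : dataSet) {
            table[classIndex(boundaries, x, openLeft)] += 1.0;
        }
        return table;
    }

    public static void main(String[] args) {
        double[] dataSet = {0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5};
        double[] boundaries = {1, 2, 3, 4, 5};
        // prints [1.0, 2.0, 2.0, 0.0, 1.0, 1.0]
        System.out.println(Arrays.toString(frequencyTable(dataSet, boundaries, false)));
    }
}
```

The lookup costs O(log n) per element for n boundaries, which only matters when the number of classes is large.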
StatisticsException - thrown if the data set or the boundary intervals are null.
public double[] relativeFrequencyTableOL()
throws StatisticsException
Calculates the relative frequency table for a discrete data set in accordance with the
open left boundary (OLB) convention. See also frequencyTableOL().
Before this method is called you are required to set the discrete data set
and the boundary intervals using setDataSet(double[]) and setBoundariesIntervals(double[]).
If we are comparing two or more data sets then the frequencies should be normalized to reflect the possibly different sizes of the data sets themselves. To normalize a data set we must first divide the data set into a collection of classes to which the elements are assigned. Here we assign the data set in accordance with the open left boundary convention, where the class frequencies are just the number of elements within each of the sub-intervals of the real line in accordance with the open left boundary convention (see example below).
To evaluate the relative frequency we apply the following formula to each
class:
Relative frequency = (class frequency) / (total frequency)
where the class frequency is the number of data points within a given sub-interval
of the real line, and the total frequency is the total number of elements within
the data set considered.
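The formula can be sketched as follows, starting from an array of class frequencies such as the ones a frequency table method would return. The class RelativeFrequencySketch, its method and the two sample frequency arrays are assumptions made for illustration. Rejecting an empty data set with an exception is also a choice made here, not documented library behaviour.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class RelativeFrequencySketch {

    // Divides each class frequency by the total frequency.
    static double[] relativeFrequencies(double[] frequencies) {
        double total = 0.0;
        for (double f : frequencies) {
            total += f;
        }
        if (total == 0.0) {
            // Assumption: an empty data set has no relative frequencies.
            throw new IllegalArgumentException("the data set is empty");
        }
        double[] relative = new double[frequencies.length];
        for (int i = 0; i < frequencies.length; i++) {
            relative[i] = frequencies[i] / total;
        }
        return relative;
    }

    public static void main(String[] args) {
        // Class frequencies of a data set with 7 elements ...
        double[] small = {1, 3, 1, 0, 1, 1};
        // ... and of a data set with 14 elements spread in the same way.
        double[] large = {2, 6, 2, 0, 2, 2};
        System.out.println(Arrays.toString(relativeFrequencies(small)));
        System.out.println(Arrays.toString(relativeFrequencies(large)));
    }
}
```

The two data sets differ in size, yet their relative frequency tables coincide, which is what makes relative frequencies suitable for comparing data sets of different sizes.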
Consider the set of boundaries { b_1, b_2, b_3, b_4, b_5 }, where
b_1 < b_2 < b_3 < b_4 < b_5, which divide the real line into six
sub-intervals. Now if we use the open left boundary convention then the real line
will be divided into the sub-intervals:
(-infinity, b_1], (b_1, b_2], (b_2, b_3], (b_3, b_4], (b_4, b_5], (b_5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
StatisticsException - thrown if the data set is null.
public double[] relativeFrequencyTableOR()
throws StatisticsException
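Calculates the relative frequency table for a discrete data set in accordance with the
open right boundary (ORB) convention. See also frequencyTableOR().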
Before this method is called you are required to set the discrete data set and
the boundary intervals using setDataSet(double[]) and setBoundariesIntervals(double[]).
If we are comparing two or more data sets then the frequencies should be normalized to reflect the possibly different sizes of the data sets themselves. To normalize a data set we must first divide the data set into a collection of classes to which the elements are assigned. Here we assign the data set in accordance with the open right boundary convention, where the class frequencies are just the number of elements within each of the sub-intervals of the real line in accordance with the open right boundary convention (see example below).
To evaluate the relative frequency we apply the following formula to each
class:
Relative frequency = (class frequency) / (total frequency)
where the class frequency is the number of data points within a given sub-interval
of the real line, and the total frequency is the total number of elements within
the data set considered.
Consider the set of boundaries { b_1, b_2, b_3, b_4, b_5 }, where
b_1 < b_2 < b_3 < b_4 < b_5, which divide the real line into
six sub-intervals. Now if we use the open right boundary convention then the real line
will be divided into the sub-intervals:
(-infinity, b_1), [b_1,b_2), [b_2,b_3), [b_3,b_4), [b_4,b_5), [b_5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
StatisticsException - thrown if the data set is null.
public double[] cFrequencyTableBOL()
throws StatisticsException
Before this method is called you are required to set the discrete data set and
the boundary intervals using setDataSet(double[]) and setBoundariesIntervals(double[])
respectively.
Within this example we work through an illustration in which the cumulative frequency table from below using the open left boundary convention is evaluated.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. If we use the open left boundary convention then the real
line will be divided into the sub-intervals:
(-infinity, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Left Boundary (OLB) convention gives:
(-infinity, 1]: we assign the data element 0.5,
and hence the frequency of this interval is 1.
(1, 2]: we assign the data elements 1.4, 1.3, 2.0,
and hence the frequency of this interval is 3.
(2, 3]: we assign the data element 2.3, and hence
the frequency of this interval is 1.
(3, 4]: we assign no data elements, and hence the frequency of
this interval is 0.
(4, 5]: we assign the data element 4.5, and hence
the frequency of this interval is 1.
(5, infinity): we assign the data element 5.5, and
hence the frequency of this interval is 1.
It follows that the associated values of the cumulative frequency table are given by:
up to 1 is: 1
up to 2 is: 1 + 3 = 4
up to 3 is: 1 + 3 + 1 = 5
up to 4 is: 1 + 3 + 1 + 0 = 5
up to 5 is: 1 + 3 + 1 + 0 + 1 = 6
up to infinity is: 1 + 3 + 1 + 0 + 1 + 1 = 7
Hence, for this case the array returned by this method to represent the cumulative
frequency table would be: {1, 4, 5, 5, 6, 7}.
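The accumulation from below is a running sum over the class frequencies. A minimal sketch, in which the class CumulativeFromBelowSketch and its method are hypothetical names and the input is the frequency array of the example above:

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class CumulativeFromBelowSketch {

    // Entry i is the sum of the frequencies of classes 0 to i.
    static double[] cumulativeFromBelow(double[] frequencies) {
        double[] cumulative = new double[frequencies.length];
        double running = 0.0;
        for (int i = 0; i < frequencies.length; i++) {
            running += frequencies[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    public static void main(String[] args) {
        double[] frequencies = {1, 3, 1, 0, 1, 1};
        // prints [1.0, 4.0, 5.0, 5.0, 6.0, 7.0]
        System.out.println(Arrays.toString(cumulativeFromBelow(frequencies)));
    }
}
```

The last entry always equals the size of the data set, which gives a quick consistency check on the table.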
StatisticsException - thrown if the data set or the boundary intervals are null.
public double[] cFrequencyTableBOR()
throws StatisticsException
Before this method is called you are required to set the discrete data set and
the boundary intervals using setDataSet(double[]) and setBoundariesIntervals(double[]).
Within this example we work through an illustration in which the cumulative frequency table from below using the open right boundary convention is evaluated.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. Now if we use the open right boundary convention then the
real line will be divided into the sub-intervals:
(-infinity, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Right Boundary (ORB) convention gives:
(-infinity, 1): we assign the data element 0.5,
and hence the frequency of this interval is 1.
[1, 2): we assign the data elements 1.4, 1.3,
and hence the frequency of this interval is 2.
[2, 3): we assign the data elements 2.0, 2.3,
and hence the frequency of this interval is 2.
[3, 4): we assign no data elements, and hence the frequency
of this interval is 0.
[4, 5): we assign the data element 4.5,
and hence the frequency of this interval is 1.
[5, infinity): we assign the data element 5.5,
and hence the frequency of this interval is 1.
It follows that the associated values of the cumulative frequency table are given by:
below 1 is: 1
below 2 is: 1 + 2 = 3
below 3 is: 1 + 2 + 2 = 5
below 4 is: 1 + 2 + 2 + 0 = 5
below 5 is: 1 + 2 + 2 + 0 + 1 = 6
below infinity is: 1 + 2 + 2 + 0 + 1 + 1 = 7
Hence, for this case the array returned by this method to represent the cumulative
frequency table would be: {1, 3, 5, 5, 6, 7}.
StatisticsException - thrown if the data set or boundary intervals are null.
public double[] cFrequencyTableAOL()
throws StatisticsException
Before this method is called you are required to set the discrete data set and
the boundary intervals using setDataSet(double[]) and setBoundariesIntervals(double[]).
Within this example we work through an illustration in which the cumulative frequency table from above using the open left boundary convention is evaluated.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. Now if we use the open left boundary convention then
the real line will be divided into the sub-intervals:
(-infinity, 1], (1,2], (2,3], (3,4], (4,5], (5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Left Boundary (OLB) convention gives:
(-infinity, 1]: we assign the data element 0.5,
and hence the frequency of this interval is 1.
(1, 2]: we assign the data elements 1.4, 1.3, 2.0,
and hence the frequency of this interval is 3.
(2, 3]: we assign the data element 2.3, and hence
the frequency of this interval is 1.
(3, 4]: we assign no data elements, and hence the frequency of
this interval is 0.
(4, 5]: we assign the data element 4.5, and hence
the frequency of this interval is 1.
(5, infinity): we assign the data element 5.5, and
hence the frequency of this interval is 1.
It follows that the associated values of the cumulative frequency table, where each value sums the frequency of a class and of all classes above it, are given by:
above -infinity is: 1 + 3 + 1 + 0 + 1 + 1 = 7
above 1 is: 3 + 1 + 0 + 1 + 1 = 6
above 2 is: 1 + 0 + 1 + 1 = 3
above 3 is: 0 + 1 + 1 = 2
above 4 is: 1 + 1 = 2
above 5 is: 1
Hence, for this case the array returned by this method to represent the cumulative
frequency table would be: {7, 6, 3, 2, 2, 1}.
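The accumulation from above can be sketched as a running sum taken from the highest class downwards. This sketch assumes that ''from above'' means each entry counts the elements of its own class and of every higher class, which is the reading suggested by ''greater than or equal to the lower class limit''. The class CumulativeFromAboveSketch and its method are hypothetical, and the output of the library itself has not been checked against this sketch.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call the WebCab library.
class CumulativeFromAboveSketch {

    // Assumption: entry i is the sum of the frequencies of class i and of
    // all classes above it.
    static double[] cumulativeFromAbove(double[] frequencies) {
        double[] cumulative = new double[frequencies.length];
        double running = 0.0;
        for (int i = frequencies.length - 1; i >= 0; i--) {
            running += frequencies[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    public static void main(String[] args) {
        // Class frequencies of { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } under
        // the open left convention with boundaries { 1, 2, 3, 4, 5 }.
        double[] frequencies = {1, 3, 1, 0, 1, 1};
        // prints [7.0, 6.0, 3.0, 2.0, 2.0, 1.0]
        System.out.println(Arrays.toString(cumulativeFromAbove(frequencies)));
    }
}
```

Under this reading the first entry always equals the size of the data set, and the third entry, 3, counts the elements 2.3, 4.5 and 5.5, which are the ones greater than 2.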
StatisticsException - thrown if the data set or boundary intervals are null.
public double[] cFrequencyTableAOR()
throws StatisticsException
Before this method is called you are required to set the discrete data set and the boundary
intervals using setDataSet(double[]) and setBoundariesIntervals(double[]).
Within this example we work through an illustration in which the cumulative frequency table from above using the open right boundary convention is evaluated.
Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real
line into six sub-intervals. Now if we use the open right boundary convention then
the real line will be divided into the sub-intervals:
(-infinity, 1), [1,2), [2,3), [3,4), [4,5), [5, infinity)
Note that each point on the real line belongs to exactly one of these sub-intervals,
so each data point is assigned to exactly one class.
Now consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 }.
Assigning this data set in accordance with the Open Right Boundary (ORB) convention gives:
(-infinity, 1): we assign the data element 0.5,
and hence the frequency of this interval is 1.
[1, 2): we assign the data elements 1.4, 1.3,
and hence the frequency of this interval is 2.
[2, 3): we assign the data elements 2.0, 2.3, and hence
the frequency of this interval is 2.
[3, 4): we assign no data elements, and hence the frequency of
this interval is 0.
[4, 5): we assign the data element 4.5, and hence
the frequency of this interval is 1.
[5, infinity): we assign the data element 5.5, and
hence the frequency of this interval is 1.
It follows that the associated values of the cumulative frequency table, where each value sums the frequency of a class and of all classes above it, are given by:
from -infinity is: 1 + 2 + 2 + 0 + 1 + 1 = 7
from 1 is: 2 + 2 + 0 + 1 + 1 = 6
from 2 is: 2 + 0 + 1 + 1 = 4
from 3 is: 0 + 1 + 1 = 2
from 4 is: 1 + 1 = 2
from 5 is: 1
Hence, for this case the array returned by this method to represent the cumulative
frequency table would be: {7, 6, 4, 2, 2, 1}.
StatisticsException - thrown if the data set has not been set.