WebCab Probability and Statistics
v3.5
(J2SE Edition)

webcab.lib.statistics.statistics
Class DataPresentation

java.lang.Object
  |
  +--webcab.lib.statistics.statistics.DataPresentation
All Implemented Interfaces:
Serializable

public class DataPresentation
extends Object
implements Serializable

The DataPresentation class provides several tabular procedures commonly used to summarize quantitative data. Towards this end we offer methods for the evaluation of the frequency table, relative frequency table and cumulative frequency table for a finite set of (real) numerical values.

Further Explanation

Frequency Tables

A frequency table is a tabular summary of a data set which counts the number of elements from the data set lying in each of the classes given by the boundary points, defined in accordance with the open left or open right boundary convention.

Say, for example, the boundary points used are {b_1, b_2, ..., b_5}. If the open left boundary convention is used, then the members of the data set will be assigned to the sub-intervals (or classes):

(-infinity, b_1], (b_1, b_2], (b_2, b_3], (b_3, b_4], (b_4, b_5], (b_5, infinity)

If the open right convention is used then the members of the data set will be assigned to the sub-intervals (or classes):

(-infinity, b_1), [b_1, b_2), [b_2, b_3), [b_3, b_4), [b_4, b_5), [b_5, infinity)

The frequency tables with respect to the open left and open right boundary conventions are evaluated using frequencyTableOL() or frequencyTableOR() respectively. However, before these methods are called you are required to set the data set considered and the boundary points used by calling setDataSet(double[]) and setBoundariesIntervals(double[]).
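
The difference between the two conventions only shows up for values that coincide with a boundary point. The following is a minimal standalone sketch of the class assignment described above; the class ClassIndexSketch and the method classIndex are hypothetical names used for illustration and are not part of this library.

```java
final class ClassIndexSketch {

    /**
     * Returns the zero-based index of the class containing value.
     * The boundaries are assumed to be strictly increasing, so that
     * n boundaries define n + 1 classes.
     */
    static int classIndex(double value, double[] boundaries, boolean openLeft) {
        int index = 0;
        for (double b : boundaries) {
            // Open left: classes have the form (a, b], so a value equal to b stays in this class.
            // Open right: classes have the form [a, b), so a value equal to b moves to the next class.
            boolean pastBoundary = openLeft ? value > b : value >= b;
            if (!pastBoundary) {
                break;
            }
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        double[] boundaries = {1, 2, 3, 4, 5};
        System.out.println(classIndex(2.0, boundaries, true));  // 1, i.e. the class (1, 2]
        System.out.println(classIndex(2.0, boundaries, false)); // 2, i.e. the class [2, 3)
    }
}
```

Values that do not coincide with a boundary point are assigned to the same class under both conventions.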

Relative Frequency Table

A relative frequency table (or distribution) calculates the proportion of the total number of items belonging to each class. The data set (or total number of items) is set using the method setDataSet(double[]). The classes are given by setting the boundary values of the sub-intervals of the real line by calling the method setBoundariesIntervals(double[]).

Cumulative Frequency Table

A cumulative frequency table (or distribution) sums the values of the frequency table above or below a given class. This allows us to know the number of items which are ''less than or equal to the upper class limit'' or ''greater than or equal to the lower class limit'' of each class. The cumulative frequency table (or distribution) can be evaluated in the four possible cases by using one of the methods:

  1. cFrequencyTableBOL() - Evaluates the Cumulative Frequency Table from below using the open left boundary convention.
  2. cFrequencyTableBOR() - Evaluates the Cumulative Frequency Table from below using the open right boundary convention.
  3. cFrequencyTableAOL() - Evaluates the Cumulative Frequency Table from above using the open left boundary convention.
  4. cFrequencyTableAOR() - Evaluates the Cumulative Frequency Table from above using the open right boundary convention.
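
Both directions of accumulation can be sketched as running sums over a frequency table. The following standalone example is an illustration only: the class CumulativeSketch, its methods and the frequency table {2, 0, 3} are hypothetical and are not taken from this library.

```java
import java.util.Arrays;

final class CumulativeSketch {

    /** Running sum from the first class upwards ("from below"). */
    static double[] fromBelow(double[] frequencies) {
        double[] cumulative = new double[frequencies.length];
        double running = 0;
        for (int i = 0; i < frequencies.length; i++) {
            running += frequencies[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    /** Running sum from the last class downwards ("from above"). */
    static double[] fromAbove(double[] frequencies) {
        double[] cumulative = new double[frequencies.length];
        double running = 0;
        for (int i = frequencies.length - 1; i >= 0; i--) {
            running += frequencies[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    public static void main(String[] args) {
        double[] frequencies = {2, 0, 3}; // hypothetical frequency table
        System.out.println(Arrays.toString(fromBelow(frequencies))); // [2.0, 2.0, 5.0]
        System.out.println(Arrays.toString(fromAbove(frequencies))); // [5.0, 3.0, 3.0]
    }
}
```

The boundary convention does not enter this step; it only affects the frequency table that is accumulated.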

See Also:
Serialized Form

Constructor Summary
DataPresentation()
          Creates a new instance of the DataPresentation class with an empty initial data set and no boundaries intervals.
DataPresentation(double[] dataSet, double[] boundaries)
          Creates a new DataPresentation instance with a specified initial data set and the boundaries of the intervals in which the data set points will be assigned.
 
Method Summary
 double[] cFrequencyTableAOL()
          Calculates the cumulative frequency table from above for a discrete data set in accordance with the open left boundary (OLB) convention.
 double[] cFrequencyTableAOR()
          Calculates the cumulative frequency table from above for a discrete data set in accordance with the open right boundary (ORB) convention.
 double[] cFrequencyTableBOL()
          Calculates the cumulative frequency table from below for a discrete data set in accordance with the open left boundary (OLB) convention.
 double[] cFrequencyTableBOR()
          Calculates the cumulative frequency table from below for a discrete data set in accordance with the open right boundary (ORB) convention.
 double[] frequencyTableOL()
          Calculates the frequency table with respect to the open left boundary convention for a discrete data set.
 double[] frequencyTableOR()
          Calculates the frequency table with respect to the open right boundary convention for a discrete data set.
 double[] getBoundariesIntervals()
          Retrieves the currently registered boundary intervals which have been set using setBoundariesIntervals.
 double[] getDataSet()
          Retrieves the currently registered data set.
 double[] relativeFrequencyTableOL()
          Calculates the relative frequency table for a discrete data set in accordance with the open left boundary (OLB) convention.
 double[] relativeFrequencyTableOR()
          Calculates the relative frequency table for a discrete data set in accordance with the open right boundary (ORB) convention.
 void setBoundariesIntervals(double[] boundaries)
          Registers new boundary intervals.
 void setDataSet(double[] dataSet)
          Registers a new data set used from within the business methods of this class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataPresentation

public DataPresentation()
Creates a new instance of the DataPresentation class with an empty initial data set and no boundaries intervals.


DataPresentation

public DataPresentation(double[] dataSet,
                        double[] boundaries)
Creates a new DataPresentation instance with a specified initial data set and the boundaries of the intervals in which the data set points will be assigned.

Further Explanation

Let's assume, for example, that we have the data set {1.4, 2.4, 2.5, 2.0, 1.5} and the boundary set {0, 1, 2, 3}. The boundary set will divide the interval into the sub-intervals (0,1] (i.e. the interval from 0 to 1 which does not include 0, but does include 1), (1,2], (2,3]. The values of the data set which belong to each of these intervals are as follows:

  1. (0, 1] interval has no element from the data set.
  2. (1, 2] interval has the elements 1.4, 1.5, 2.0 from the data set.
  3. (2, 3] interval has the elements 2.4, 2.5 from the data set.

Parameters:
dataSet - an array where the k-th term corresponds to the k-th element of the data set.
boundaries - a strictly increasing sequence of boundaries of the intervals over the real line to which the data set points will be assigned.
Method Detail

setDataSet

public void setDataSet(double[] dataSet)
Registers a new data set used from within the business methods of this class. The data set is passed as a parameter and stored within private fields.

Example

Say we wish to display the results of an experiment which has the outcomes 1.4, 2.4, 2.5, 2.0, 1.5 in a tabulated form, for example a bar or frequency chart. The first step required in order to achieve this is to register the data set using this method. After that the boundaries can be set using setBoundariesIntervals(double[]) and the frequency table evaluated using frequencyTableOL() or frequencyTableOR().

Parameters:
dataSet - the set of values to be registered.
Throws:
StatisticsException - if the data set or boundary intervals are null.

getDataSet

public double[] getDataSet()
Retrieves the currently registered data set. Note that before you can get the data set it must be set using setDataSet(double[]).


setBoundariesIntervals

public void setBoundariesIntervals(double[] boundaries)
Registers new boundary intervals. The boundaries are the points of the real number line which divide the whole interval into sub-intervals in which members of a data set can be assigned.

Further Explanation

Let's assume, for example, that we have the data set {1.4, 2.4, 2.5, 2.0, 1.5} and the boundary set {0, 1, 2, 3}. If you use the open left boundary convention, then this boundary set will divide the real line into the sub-intervals (-infinity, 0], (0,1] (i.e. the interval from 0 to 1 which does not include 0, but does include 1), (1,2], (2,3], (3, infinity). The values of the data set which belong to each of these intervals are as follows:

  1. (-infinity, 0] interval has no elements from the data set.
  2. (0, 1] interval has no elements from the data set.
  3. (1, 2] interval has the elements 1.4, 1.5, 2.0 from the data set.
  4. (2, 3] interval has the elements 2.4, 2.5 from the data set.
  5. (3, infinity) interval has no elements from the data set.

Parameters:
boundaries - a strictly increasing sequence of boundaries of the intervals over the real line to which the data set points will be assigned.
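
Whether this method itself verifies that the boundaries are strictly increasing is not stated here, so a caller may wish to check the array before registering it. The following is a minimal sketch of such a check; the class BoundaryCheckSketch and the method isStrictlyIncreasing are hypothetical helpers and are not part of this library.

```java
final class BoundaryCheckSketch {

    /**
     * Returns true if every boundary is strictly greater than its predecessor.
     * Arrays with zero or one element are trivially accepted.
     * A null array is not handled and causes a NullPointerException.
     */
    static boolean isStrictlyIncreasing(double[] boundaries) {
        for (int i = 1; i < boundaries.length; i++) {
            if (!(boundaries[i] > boundaries[i - 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStrictlyIncreasing(new double[] {0, 1, 2, 3})); // true
        System.out.println(isStrictlyIncreasing(new double[] {0, 1, 1, 3})); // false, repeated boundary
    }
}
```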

getBoundariesIntervals

public double[] getBoundariesIntervals()
Retrieves the currently registered boundary intervals which have been set using setBoundariesIntervals. Note that the boundaries divide the real line into sub-intervals to which members of the data set considered are assigned.


frequencyTableOL

public double[] frequencyTableOL()
                          throws StatisticsException
Calculates the frequency table with respect to the open left boundary convention for a discrete data set. Say, for example, the boundary points used are {b_1, b_2, ..., b_n}. The first term of the returned array, which represents the frequency table, is the number of elements from the data set within the interval (-infinity, b_1], the second term is the number of elements from the data set within the interval (b_1, b_2], and so on.

Note that before this method is called you are required to set the discrete data set and the boundary intervals by calling setDataSet(double[]) and setBoundariesIntervals(double[]) respectively.

Example

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open left boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1], (1,2], (2,3], (3,4], (4,5], (5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

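Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Left Boundary (OLB) convention, then we will have:

  1. (-infinity, 1] interval has the element 0.5 from the data set.
  2. (1, 2] interval has the elements 1.4, 1.3, 2.0 from the data set.
  3. (2, 3] interval has the element 2.3 from the data set.
  4. (3, 4] interval has no elements from the data set.
  5. (4, 5] interval has the element 4.5 from the data set.
  6. (5, infinity) interval has the element 5.5 from the data set.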

Hence, in this case the array returned corresponding to the frequency table will be {1, 3, 1, 0, 1, 1}.
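
The counting in this example can be reproduced with a short standalone sketch. It does not call this class and makes no claim about how frequencyTableOL() is implemented; the class FrequencyTableOlSketch and the method frequencyTableOpenLeft are hypothetical names.

```java
import java.util.Arrays;

final class FrequencyTableOlSketch {

    /** Counts the data points per class under the open left convention (a, b]. */
    static double[] frequencyTableOpenLeft(double[] dataSet, double[] boundaries) {
        double[] counts = new double[boundaries.length + 1];
        for (double x : dataSet) {
            int k = 0;
            // Advance while x lies strictly above the boundary, so a value
            // equal to a boundary stays in the class that ends at it.
            while (k < boundaries.length && x > boundaries[k]) {
                k++;
            }
            counts[k]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] dataSet = {0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5};
        double[] boundaries = {1, 2, 3, 4, 5};
        // Prints [1.0, 3.0, 1.0, 0.0, 1.0, 1.0]
        System.out.println(Arrays.toString(frequencyTableOpenLeft(dataSet, boundaries)));
    }
}
```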

Throws:
StatisticsException - thrown if the data set or boundary intervals are null.

frequencyTableOR

public double[] frequencyTableOR()
                          throws StatisticsException
Calculates the frequency table with respect to the open right boundary convention for a discrete data set. Say for example the boundary points used are {b_1, b_2, ..., b_n}, now the first term of the array returned which represents the frequency table, is the number of elements from the data set within the interval (-infinity, b_1), the second term of the array returned is the number of elements from the data set within the interval [b_1, b_2), and so on...

The discrete data set and the boundary intervals are set using setDataSet(double[]) and setBoundariesIntervals(double[]) respectively.

Example

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open right boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

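Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Right Boundary (ORB) convention, then we will have:

  1. (-infinity, 1) interval has the element 0.5 from the data set.
  2. [1, 2) interval has the elements 1.4, 1.3 from the data set.
  3. [2, 3) interval has the elements 2.0, 2.3 from the data set.
  4. [3, 4) interval has no elements from the data set.
  5. [4, 5) interval has the element 4.5 from the data set.
  6. [5, infinity) interval has the element 5.5 from the data set.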

Hence, in this case the array returned corresponding to the frequency table will be {1, 2, 2, 0, 1, 1}.
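
Because the boundaries are strictly increasing, the class of each data point can also be located by binary search. The sketch below reproduces the open right example with java.util.Arrays.binarySearch; it is an independent illustration rather than the implementation of frequencyTableOR(), and the class FrequencyTableOrSketch and the method frequencyTableOpenRight are hypothetical names.

```java
import java.util.Arrays;

final class FrequencyTableOrSketch {

    /** Counts the data points per class under the open right convention [a, b). */
    static double[] frequencyTableOpenRight(double[] dataSet, double[] boundaries) {
        double[] counts = new double[boundaries.length + 1];
        for (double x : dataSet) {
            int pos = Arrays.binarySearch(boundaries, x);
            // Found at index pos: x equals a boundary and belongs to the class starting there.
            // Not found: binarySearch returns -(insertionPoint) - 1, and the insertion
            // point is the number of boundaries below x, which is the class index.
            int k = pos >= 0 ? pos + 1 : -pos - 1;
            counts[k]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] dataSet = {0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5};
        double[] boundaries = {1, 2, 3, 4, 5};
        // Prints [1.0, 2.0, 2.0, 0.0, 1.0, 1.0]
        System.out.println(Arrays.toString(frequencyTableOpenRight(dataSet, boundaries)));
    }
}
```

The binary search requires the boundaries to be sorted, which the strictly increasing requirement on the boundaries guarantees.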

Throws:
StatisticsException - thrown if the data set or the boundary intervals are null.

relativeFrequencyTableOL

public double[] relativeFrequencyTableOL()
                                  throws StatisticsException
Calculates the relative frequency table for a discrete data set in accordance with the open left boundary (OLB) convention. The relative frequency table normalizes the frequency table with regard to the size of the data set; the class frequencies themselves are evaluated in exactly the same fashion as in frequencyTableOL().

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]).

Further Explanation

If we are comparing two or more data sets then the frequencies should be normalized to reflect the possibly different sizes of the data sets themselves. To normalize a data set we must first divide the data set into a collection of classes to which the elements are assigned. Here we assign the data set in accordance with the open left boundary convention, where the class frequencies are just the number of elements within each of the sub-intervals of the real line (see example below).

To evaluate the relative frequency we apply the following formula to each class:

Relative frequency = (class frequency) / (total frequency)

where the class frequency is the number of data points within a given sub-interval of the real line, and the total frequency is the total number of elements within the data set considered.
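
The formula above can be sketched as follows. The first frequency table, {1, 3, 1, 0, 1, 1}, is the open left table of the seven element data set used in the examples for this class; the second table, with every count doubled, is a hypothetical 14 element data set added for comparison. The class RelativeFrequencySketch and the method relativeFrequencies are illustrative names and are not part of this library.

```java
import java.util.Arrays;

final class RelativeFrequencySketch {

    /** Divides each class frequency by the total frequency. */
    static double[] relativeFrequencies(double[] classFrequencies) {
        double total = 0;
        for (double f : classFrequencies) {
            total += f;
        }
        // Note: an empty data set gives total == 0 and therefore NaN entries.
        double[] relative = new double[classFrequencies.length];
        for (int i = 0; i < classFrequencies.length; i++) {
            relative[i] = classFrequencies[i] / total;
        }
        return relative;
    }

    public static void main(String[] args) {
        double[] small = {1, 3, 1, 0, 1, 1}; // 7 elements in total
        double[] large = {2, 6, 2, 0, 2, 2}; // hypothetical, 14 elements in total
        System.out.println(Arrays.toString(relativeFrequencies(small)));
        System.out.println(Arrays.toString(relativeFrequencies(large)));
    }
}
```

Although the two data sets differ in size, their relative frequency tables coincide, which is what makes them directly comparable.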

Example Illustrating the Open Left Boundary Convention

Consider the set of boundaries { b_1, b_2, b_3, b_4, b_5 }, where b_1 < b_2 < b_3 < b_4 < b_5, which divide the real line into six sub-intervals. Now if we use the open left boundary convention then the real line will be divided into the sub-intervals:

(-infinity, b_1], (b_1,b_2], (b_2,b_3], (b_3,b_4], (b_4,b_5], (b_5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

Throws:
StatisticsException - thrown if the data set is null.

relativeFrequencyTableOR

public double[] relativeFrequencyTableOR()
                                  throws StatisticsException
Calculates the relative frequency table for a discrete data set in accordance with the open right boundary (ORB) convention. The relative frequency table normalizes the frequency table with regard to the size of the data set; the class frequencies themselves are evaluated in exactly the same fashion as in frequencyTableOR().

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]).

Further Explanation

If we are comparing two or more data sets then the frequencies should be normalized to reflect the possibly different sizes of the data sets themselves. To normalize a data set we must first divide the data set into a collection of classes to which the elements are assigned. Here we assign the data set in accordance with the open right boundary convention, where the class frequencies are just the number of elements within each of the sub-intervals of the real line (see example below).

To evaluate the relative frequency we apply the following formula to each class:

Relative frequency = (class frequency) / (total frequency)

where the class frequency is the number of data points within a given sub-interval of the real line, and the total frequency is the total number of elements within the data set considered.

Example Illustrating the Open Right Boundary Convention

Consider the set of boundaries { b_1, b_2, b_3, b_4, b_5 }, where b_1 < b_2 < b_3 < b_4 < b_5, which divide the real line into six sub-intervals. Now if we use the open right boundary convention then the real line will be divided into the sub-intervals:

(-infinity, b_1), [b_1,b_2), [b_2,b_3), [b_3,b_4), [b_4,b_5), [b_5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

Throws:
StatisticsException - thrown if the data set is null.

cFrequencyTableBOL

public double[] cFrequencyTableBOL()
                            throws StatisticsException
Calculates the cumulative frequency table from below for a discrete data set in accordance with the open left boundary (OLB) convention. The value of the cumulative frequency table for a given class is the number of elements of the data set that are less than or equal to the upper limit of that class, where the classes are constructed in accordance with the open left boundary convention.

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]) respectively.

Example

Within this example we work through an illustration in which the cumulative frequency table from below using the open left boundary convention is evaluated.

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open left boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1], (1,2], (2,3], (3,4], (4,5], (5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

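Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Left Boundary (OLB) convention, then we will have:

  1. (-infinity, 1] interval has the element 0.5 from the data set.
  2. (1, 2] interval has the elements 1.4, 1.3, 2.0 from the data set.
  3. (2, 3] interval has the element 2.3 from the data set.
  4. (3, 4] interval has no elements from the data set.
  5. (4, 5] interval has the element 4.5 from the data set.
  6. (5, infinity) interval has the element 5.5 from the data set.

It now follows that the associated values of the cumulative frequency table are given by:

  1. (-infinity, 1] has the cumulative frequency 1.
  2. (1, 2] has the cumulative frequency 1 + 3 = 4.
  3. (2, 3] has the cumulative frequency 4 + 1 = 5.
  4. (3, 4] has the cumulative frequency 5 + 0 = 5.
  5. (4, 5] has the cumulative frequency 5 + 1 = 6.
  6. (5, infinity) has the cumulative frequency 6 + 1 = 7.

Hence, for this case the array returned by this method to represent the cumulative frequency table would be: {1, 4, 5, 5, 6, 7}.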

Throws:
StatisticsException - thrown if the data set or the boundary intervals are null.

cFrequencyTableBOR

public double[] cFrequencyTableBOR()
                            throws StatisticsException
Calculates the cumulative frequency table from below for a discrete data set in accordance with the open right boundary (ORB) convention. The value of the cumulative frequency table for a given class is the number of elements of the data set that lie below the upper limit of that class, where the classes are constructed in accordance with the open right boundary convention.

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]).

Example

Within this example we work through an illustration in which the cumulative frequency table from below using the open right boundary convention is evaluated.

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open right boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

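Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Right Boundary (ORB) convention, then we will have:

  1. (-infinity, 1) interval has the element 0.5 from the data set.
  2. [1, 2) interval has the elements 1.4, 1.3 from the data set.
  3. [2, 3) interval has the elements 2.0, 2.3 from the data set.
  4. [3, 4) interval has no elements from the data set.
  5. [4, 5) interval has the element 4.5 from the data set.
  6. [5, infinity) interval has the element 5.5 from the data set.

It now follows that the associated values of the cumulative frequency table are given by:

  1. (-infinity, 1) has the cumulative frequency 1.
  2. [1, 2) has the cumulative frequency 1 + 2 = 3.
  3. [2, 3) has the cumulative frequency 3 + 2 = 5.
  4. [3, 4) has the cumulative frequency 5 + 0 = 5.
  5. [4, 5) has the cumulative frequency 5 + 1 = 6.
  6. [5, infinity) has the cumulative frequency 6 + 1 = 7.

Hence, for this case the array returned by this method to represent the cumulative frequency table would be: {1, 3, 5, 5, 6, 7}.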

Throws:
StatisticsException - thrown if the data set or boundary intervals are null.

cFrequencyTableAOL

public double[] cFrequencyTableAOL()
                            throws StatisticsException
Calculates the cumulative frequency table from above for a discrete data set in accordance with the open left boundary (OLB) convention. The value of the cumulative frequency table for a given class is the number of elements of the data set that lie above the lower limit of that class, where the classes are constructed in accordance with the open left boundary convention.

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]).

Example

Within this example we work through an illustration in which the cumulative frequency table from above using the open left boundary convention is evaluated.

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open left boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1], (1,2], (2,3], (3,4], (4,5], (5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

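Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Left Boundary (OLB) convention, then we will have:

  1. (-infinity, 1] interval has the element 0.5 from the data set.
  2. (1, 2] interval has the elements 1.4, 1.3, 2.0 from the data set.
  3. (2, 3] interval has the element 2.3 from the data set.
  4. (3, 4] interval has no elements from the data set.
  5. (4, 5] interval has the element 4.5 from the data set.
  6. (5, infinity) interval has the element 5.5 from the data set.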

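It now follows that the associated values of the cumulative frequency table, counting for each class the elements above its lower limit, are given by:

  1. (-infinity, 1] has the cumulative frequency 7 (all elements).
  2. (1, 2] has the cumulative frequency 6 (1.4, 1.3, 2.0, 2.3, 4.5, 5.5).
  3. (2, 3] has the cumulative frequency 3 (2.3, 4.5, 5.5).
  4. (3, 4] has the cumulative frequency 2 (4.5, 5.5).
  5. (4, 5] has the cumulative frequency 2 (4.5, 5.5).
  6. (5, infinity) has the cumulative frequency 1 (5.5).

Hence, for this case the array returned by this method to represent the cumulative frequency table would be: {7, 6, 3, 2, 2, 1}.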

Throws:
StatisticsException - thrown if the data set or boundary intervals are null.

cFrequencyTableAOR

public double[] cFrequencyTableAOR()
                            throws StatisticsException
Calculates the cumulative frequency table from above for a discrete data set in accordance with the open right boundary (ORB) convention. The value of the cumulative frequency table for a given class is the number of elements of the data set that are greater than or equal to the lower limit of that class, where the classes are constructed in accordance with the open right boundary convention.

Before this method is called you are required to set the discrete data set and the boundaries interval using setDataSet(double[]) and setBoundariesIntervals(double[]).

Example

Within this example we work through an illustration in which the cumulative frequency table from above using the open right boundary convention is evaluated.

Consider the set of boundaries { 1, 2, 3, 4, 5 }, which divide the real line into six sub-intervals. Now if we use the open right boundary convention then the real line will be divided into the sub-intervals:

(-infinity, 1), [1,2), [2,3), [3,4), [4,5), [5, infinity)

Note that, each point on the real line can be assigned to one of these sub-intervals and therefore when assigning a data point to one of these intervals there will only be one sub-interval in which it belongs.

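Therefore, if we consider the data set { 0.5, 1.4, 1.3, 2.0, 2.3, 4.5, 5.5 } and assign it in accordance with the Open Right Boundary (ORB) convention, then we will have:

  1. (-infinity, 1) interval has the element 0.5 from the data set.
  2. [1, 2) interval has the elements 1.4, 1.3 from the data set.
  3. [2, 3) interval has the elements 2.0, 2.3 from the data set.
  4. [3, 4) interval has no elements from the data set.
  5. [4, 5) interval has the element 4.5 from the data set.
  6. [5, infinity) interval has the element 5.5 from the data set.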

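It now follows that the associated values of the cumulative frequency table, counting for each class the elements greater than or equal to its lower limit, are given by:

  1. (-infinity, 1) has the cumulative frequency 7 (all elements).
  2. [1, 2) has the cumulative frequency 6 (1.4, 1.3, 2.0, 2.3, 4.5, 5.5).
  3. [2, 3) has the cumulative frequency 4 (2.0, 2.3, 4.5, 5.5).
  4. [3, 4) has the cumulative frequency 2 (4.5, 5.5).
  5. [4, 5) has the cumulative frequency 2 (4.5, 5.5).
  6. [5, infinity) has the cumulative frequency 1 (5.5).

Hence, for this case the array returned by this method to represent the cumulative frequency table would be: {7, 6, 4, 2, 2, 1}.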

Throws:
StatisticsException - thrown if the data set has not been set.
