|
On this website you will find detailed descriptions of, and directions for, some of the most common graphs, charts, and plots used in Exploratory Data Analysis (EDA).
|
TABLE OF CONTENTS |
||
| Bar Graphs | Pie Charts | Stem & Leaf Plots |
| Histograms | Boxplots | Details |

|
CATEGORICAL VARIABLES |
| BAR GRAPHS |
Bar graphs are frequently used with categorical data to compare the sizes of categories. Since the values of a categorical variable are labels for the categories, the distribution of a categorical variable gives either the count or the percent of individuals falling into each category. For instance, a categorical variable such as a person's political party could have labels like "democrat," "republican," or "liberal." The heights of the bars show the counts for each category.
Consider this example:
Grades. The average test grades of 19 students are as follows (on a scale from 0 to 100, with 100 being the highest score):
92 (A), 95 (A), 96 (A), 81 (B), 95 (A), 75 (C), 91 (A), 79 (C), 92 (A), 100 (A), 89 (B), 94 (A), 92 (A), 86 (B), 93 (A), 73 (C), 74 (C), 94 (A), 91 (A)
Shown below is a bar graph with A, B, and C as "labels" for each type of grade.
There are 12 A's, 3 B's, and 4 C's.
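The counts that set the bar heights can be tallied in a few lines. The following is a minimal Python sketch, not part of the JMP output; the list name `letters` is our own, and the letters are copied in order from the grade list above. It prints a rough text version of the bar graph.

```python
from collections import Counter

# Letter grades copied, in order, from the example above (19 students).
letters = ["A", "A", "A", "B", "A", "C", "A", "C", "A", "A",
           "B", "A", "A", "B", "A", "C", "C", "A", "A"]

# Count how many students fall into each category.
counts = Counter(letters)

# Draw one text "bar" per category; the bar length is the count.
for label in sorted(counts):
    print(f"{label} | {'#' * counts[label]} ({counts[label]})")
```

Each row of `#` characters plays the role of one bar, so the longest row corresponds to the most common grade.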
Here is another representation of the same data set, generated with the JMP software. As you can see, among other things, JMP allows users to compare the means, percentages of the total, and frequencies of the categories involved. The diagram you see above is equivalent to the last bar graph shown below.

| PIE CHARTS |
Bar graphs are more flexible than pie charts, in part because a bar graph can compare a few selected categories, whereas a pie chart must include all of the categories that make up the whole.
How do you create a pie chart?
A circle is divided into pie-shaped pieces that are proportional in size to the corresponding frequencies or percentages of the categories involved. To construct a pie chart, we first calculate what percentage of the whole each group constitutes, as described below. Then, since a complete circle has 360 degrees, we multiply each percentage, written as a decimal, by 360 to obtain the central angles.
|
How do you find the ratio of a piece to the whole, or the percentage? Divide the number of individuals in a category by the total number of individuals in the sample or population. |
Using the same example as for bar graphs, below you will find a pie chart of the students' grades:
A's 12/19 ~ .632 ~ 63%
B's 3/19 ~ .158 ~ 16%
C's 4/19 ~ .211 ~ 21%
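To carry the calculation through to the central angles, here is a minimal Python sketch using the grade counts from the example. The variable names (`counts`, `shares`, `angles`) are our own choices for illustration.

```python
# Category counts from the grades example.
counts = {"A": 12, "B": 3, "C": 4}
total = sum(counts.values())  # 19 students in all

# Share of the whole for each category (count divided by total).
shares = {label: n / total for label, n in counts.items()}

# A full circle has 360 degrees, so each central angle is share * 360.
angles = {label: share * 360 for label, share in shares.items()}

for label in counts:
    print(f"{label}: {shares[label]:.1%} of the whole, "
          f"central angle {angles[label]:.1f} degrees")
```

The three angles add up to 360 degrees, which is a quick check that no category was left out.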

On the bottom of this page, you will find more JMP output for this example. The output includes a stem-and-leaf plot, a histogram and a boxplot, which are explained in the quantitative variables part of this website.

|
QUANTITATIVE VARIABLES |

|
|
Example:
Grades. The average test grades of 19 students are as follows (on a scale from 0 to 100, with 100 being the highest score): 92 95 96 81 95 75 91 79 92 100 89 94 92 86 93 73 74 94 91
Color coordinated, in increasing order:
73, 74, 75, 79, 81, 86, 89, 91, 91, 92, 92, 92, 93, 94, 94, 95, 95, 96, 100
|
STEMPLOT#1:
stem | leaf
 7 | 3 4
 7 | 5 9
 8 | 1
 8 | 6 9
 9 | 1 1 2 2 2 3 4 4
 9 | 5 5 6
10 | 0
10 |
|
STEMPLOT#2:
stem | leaf
 7 | 3 4 5 9
 8 | 1 6 9
 9 | 1 1 2 2 2 3 4 4 5 5 6
10 | 0
Depending on the number of stems, different conclusions can be drawn about a given data set. In this example, even though both stemplots show a slight left-skewness of the data set, stemplot #1 shows it more clearly than stemplot #2.
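Both stemplots can be generated from the raw grades. The following is a rough Python sketch under our own assumptions: the function name `stemplot` and its `split` parameter are hypothetical, the stem is taken to be the tens part of each grade and the leaf the ones digit, and with `split=True` each stem gets two rows (leaves 0 to 4, then leaves 5 to 9).

```python
from collections import defaultdict

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def stemplot(values, split=False):
    """Return the rows of a stemplot as strings.

    With split=True every stem appears twice: first with leaves 0-4,
    then with leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # e.g. 92 -> stem 9, leaf 2
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(str(leaf))

    lowest, highest = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(lowest, highest + 1):
        for half in halves:
            leaves = " ".join(rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

print("\n".join(stemplot(grades, split=True)))   # two rows per stem
print()
print("\n".join(stemplot(grades)))               # one row per stem
```

Printing both versions side by side makes it easy to see how the number of stems changes the apparent shape of the distribution.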
|
SCROLL DOWN IF YOU WANT TO SEE JMP OUTPUT TO:
*HISTOGRAMS
*BOXPLOTS

|
|
To make a histogram, we break the range of values into intervals of equal length. We first count and then display the number of observations in each interval. Bars represent the frequency of observations in the intervals such that the higher the bar is, the higher the frequency. As mentioned before, the standard format of a histogram usually involves a vertical scale that represents the frequencies or the relative frequencies and a horizontal scale that represents the individual intervals.
How can a histogram be useful?
Just like stem-and-leaf plots, histograms show us the shapes of the distributions of the observations. If properly constructed, with neither too few nor too many intervals, histograms allow us to determine whether the shape of our data distribution is bell-shaped, right-skewed, left-skewed, or none of these, based on the overall heights of the bars. Histograms are also useful in identifying possible outliers. If a histogram is symmetric around some value, that value equals the average. Half the area under the histogram lies to the left of that value, and half to the right.
Below you will find two examples of histograms for the same set of grades we first listed in the bar graph section above.
We seldom use fewer than 6 or more than 15 classes; the exact number that should be used in a given situation depends on the number of measurements or observations we have to group. Each item (measurement or observation) goes into one and only one interval (category). We try to make the intervals cover equal ranges of values.
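The counting step can be sketched in Python as follows. This is an illustration only, not the JMP histograms: the function name `histogram_counts`, the starting value 70, and the interval width 5 are our own assumptions, chosen so that the grades fall into 7 equal-length intervals, each of the form [low, high).

```python
grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def histogram_counts(values, start, width, n_bins):
    """Count the observations falling in each interval [low, low + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} falls outside the intervals")
        counts[index] += 1   # each observation goes into exactly one interval
    return counts

# Assumed intervals: [70, 75), [75, 80), ..., [100, 105).
counts = histogram_counts(grades, start=70, width=5, n_bins=7)

# Text version of the histogram: one row per interval.
for i, c in enumerate(counts):
    low = 70 + 5 * i
    print(f"[{low:>3}, {low + 5:>3}) {'#' * c}")
```

Changing `width` and `n_bins` shows how the choice of intervals affects the shape the histogram appears to have.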

|
|
FIVE-NUMBER SUMMARY:
What is the IQR?
IQR, or the interquartile range, is the distance between the first and third quartiles: IQR = Q3 - Q1
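As an illustration, the five-number summary (minimum, Q1, median, Q3, maximum) and the IQR of the grades can be computed with the short Python sketch below. It assumes one common textbook convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of observations is odd. Other conventions exist, so software such as JMP may report slightly different quartiles. The function name `five_number_summary` is our own.

```python
from statistics import median

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is left out when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

minimum, q1, med, q3, maximum = five_number_summary(grades)
iqr = q3 - q1  # IQR = Q3 - Q1

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr)
```

Under this convention the grades give Q1 = 81 and Q3 = 94, so the IQR is 13.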

STANDARD BOXPLOT:
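Any value that lies between 1.5 x IQR and 3 x IQR beyond the quartiles (that is, below Q1 or above Q3) is considered a possible outlier and is denoted by an "o." Any value that lies more than 3 x IQR beyond the quartiles is considered an extreme outlier and is denoted by an "*."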
Example:
Three possible outliers (o's) and one extreme outlier (*). The symbolic representation of outliers varies among different programs.
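The outlier rule can be written as a small function. This is a minimal Python sketch under stated assumptions: the function name `classify` is hypothetical, the quartiles Q1 = 81 and Q3 = 94 are values assumed for the grades data (they depend on the quartile convention used), and the test values 60 and 40 are made up for illustration and are not part of the data set. How values exactly on a boundary are treated is also our own choice.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 x IQR and 3 x IQR rule."""
    iqr = q3 - q1
    # Distance beyond the nearer quartile; 0 if the value lies inside the box.
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "extreme outlier (*)"
    if distance > 1.5 * iqr:
        return "possible outlier (o)"
    return "not an outlier"

# Assumed quartiles for the grades example; IQR = 13.
q1, q3 = 81, 94

# 73 and 100 are real grades; 60 and 40 are hypothetical values.
for value in (73, 100, 60, 40):
    print(value, "->", classify(value, q1, q3))
```

With these quartiles, none of the 19 actual grades is flagged, since the lowest grade (73) lies only 8 points below Q1.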

|
|
