Statistical Graphs, Charts and Plots

Statistical Consulting Program

On this website you will find detailed descriptions of, and directions for, some of the most common graphs, charts and plots used in Exploratory Data Analysis (EDA).

 

TABLE OF CONTENTS

Bar Graphs
Pie Charts
Stem & Leaf Plots
Histograms
Boxplots
Details
JMP

 

 

CATEGORICAL VARIABLES

BAR GRAPHS

Bar graphs are frequently used with categorical data to compare the sizes of categories. Since the values of a categorical variable are labels for the categories, the distribution of a categorical variable gives either the count or the percent of individuals falling into each category. For instance, a categorical variable such as political affiliation could have labels like "democrat," "republican," or "liberal." The heights of the bars show the count (or percent) for each category.

Consider this Example:

Grades. The average test grades of 19 students are as follows (on a scale from 0 to 100, with 100 being the highest score):

92 (A), 95 (A), 96 (A), 81 (B), 95 (A), 75 (C), 91 (A), 79 (C), 92 (A), 100 (A), 89 (B), 94 (A), 92 (A), 86 (B), 93 (A), 73 (C), 74 (C), 94 (A), 91 (A)

Shown below is a bar graph with A, B, and C as the "labels" for the letter-grade categories.

[Figure: bar graph of the students' grades]

 

 

There are 12 A's, 3 B's, and 4 C's.
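
The counts behind the bar graph can be reproduced with a short Python sketch. The letter-grade cutoffs used here (90 and above is an A, 80 to 89 is a B, below 80 is a C) are an assumption inferred from the labeled scores in the example; they are not stated in the text.

```python
from collections import Counter

scores = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def letter_grade(score):
    # Assumed cutoffs, inferred from the labeled example data:
    # 90 and above -> A, 80 to 89 -> B, below 80 -> C.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

# Count how many students fall into each category.
counts = Counter(letter_grade(s) for s in scores)

# Crude text bar graph: one '#' per student in the category.
for label in sorted(counts):
    print(f"{label} | {'#' * counts[label]} ({counts[label]})")
```

Each bar's length is the category count, which is what the heights of the bars in the graph above represent.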

 

 

 

 

Here is another representation of the same data set, generated with the JMP software. Among other things, JMP allows users to compare the means, percentages of the total, and frequencies of the categories involved. The bar graph shown above is equivalent to the last bar graph in the JMP output below.

 


 

PIE CHARTS

 
Like bar graphs, pie charts are best used with categorical data. They help us see what percentage of the whole each category constitutes. A pie chart must include every category, because the full circle always represents the whole.

This is one reason why bar graphs are more flexible than pie charts: a bar graph can compare selected categories, whereas a pie chart must compare either all categories or none.

How to create a pie chart?

A circle is divided into pie-shaped pieces whose sizes are proportional to the corresponding frequencies or percentages of the categories involved. To construct a pie chart, we first calculate what percentage of the whole each group constitutes (as explained below). Then, since a complete circle has 360 degrees, we multiply each percentage, written as a decimal fraction, by 360 to obtain the central angle of the corresponding piece. For example, a category that makes up 25% of the whole gets a central angle of 0.25 × 360 = 90 degrees.
 

How to find the ratio of a piece to the whole, or the percentage?

Divide the number of individuals in a category by the total number of individuals in the sample or population.

Using the same example as for bar graphs, below you will find a pie chart of the students' grades:

[Figure: pie chart of the students' grades]

A's: 12/19 ≈ .632 ≈ 63%

B's: 3/19 ≈ .158 ≈ 16%

C's: 4/19 ≈ .211 ≈ 21%
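
As a minimal sketch of the two steps described above (find each category's share of the whole, then convert the share to a central angle), the calculation for the grade counts could look like this in Python. The variable names are illustrative only.

```python
counts = {"A": 12, "B": 3, "C": 4}
total = sum(counts.values())  # 19 students in all

# Step 1: share of the whole for each category (count / total).
proportions = {label: n / total for label, n in counts.items()}

# Step 2: central angle in degrees (share of the whole times 360).
angles = {label: p * 360 for label, p in proportions.items()}

for label in counts:
    print(f"{label}: {proportions[label]:.3f} = {proportions[label]:.0%}, "
          f"central angle = {angles[label]:.1f} degrees")
```

Because the shares add up to 1, the central angles add up to 360 degrees, which is why a pie chart always represents the whole.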

At the bottom of this page, you will find more JMP output for this example. The output includes a stem-and-leaf plot, a histogram and a boxplot, which are explained in the quantitative variables part of this website.


 

QUANTITATIVE VARIABLES

STEMPLOTS

 
 
Stemplots (sometimes called stem-and-leaf plots) are used with quantitative data to display the shapes of distributions and to organize numbers so that they are easier to comprehend. A stemplot is a descriptive technique that gives a good overall impression of the data. Stemplots include the actual numerical values of the observations, where each value is separated into two parts, a stem and a leaf. The leaf is the final (rightmost) digit, and the stem consists of the remaining leftmost digit or digits. We write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of the column. Finally, we write each leaf in the row to the right of its stem, in increasing order out from the stem.

Example:

Grades. The average test grades of 19 students are as follows (on a scale from 0 to 100, with 100 being the highest score): 92 95 96 81 95 75 91 79 92 100 89 94 92 86 93 73 74 94 91

The same grades in increasing order:

73, 74, 75, 79, 81, 86, 89, 91, 91, 92, 92, 92, 93, 94, 94, 95, 95, 96, 100

 

STEMPLOT #1:

stem | leaf
   7 | 3 4
   7 | 5 9
   8 | 1
   8 | 6 9
   9 | 1 1 2 2 2 3 4 4
   9 | 5 5 6
  10 | 0
  10 |

 

STEMPLOT #2:

stem | leaf
   7 | 3 4 5 9
   8 | 1 6 9
   9 | 1 1 2 2 2 3 4 4 5 5 6
  10 | 0

Depending on the number of stems, different conclusions can be drawn about a given data set. In this example, both stemplots show that the data set is slightly left-skewed, but stemplot #1 shows it more clearly than stemplot #2.
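
The construction described in this section can be sketched in Python as follows. The function name stemplot and its split option are hypothetical, chosen for illustration. The sketch assumes non-negative whole-number data, with the last digit as the leaf; with split=True each stem gets two rows (leaves 0 to 4, then leaves 5 to 9).

```python
from collections import defaultdict

scores = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def stemplot(values, split=False):
    """Return the rows of a stemplot as a list of strings.

    Assumes non-negative integers. The leaf is the last digit and the
    stem is everything before it. With split=True, every stem gets two
    rows: one for leaves 0-4 and one for leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in increasing order
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    lowest, highest = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(lowest, highest + 1):   # include stems with no leaves
        for half in halves:
            leaves = " ".join(str(leaf) for leaf in rows[(stem, half)])
            lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

print("\n".join(stemplot(scores, split=True)))   # one row per half stem
print()
print("\n".join(stemplot(scores)))               # one row per stem
```

Calling the function both ways on the grade data gives the same rows as the two stemplots shown above, so the effect of the number of stems on the apparent shape can be compared directly.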


HISTOGRAMS
 

 
 
Histograms are yet another graphical way of presenting data to show the distribution of the observations. They are one of the most common forms of graphical presentation of a frequency distribution. A histogram is constructed by marking the class intervals into which the measurements or observations are grouped on a horizontal scale and the class frequencies on a vertical scale, and then drawing rectangles whose bases equal the class intervals and whose heights are determined by the corresponding class frequencies.

To make a histogram, we break the range of values into intervals of equal length. We then count the number of observations in each interval and display the counts as bars: the higher the bar, the higher the frequency. In the standard format, the vertical scale represents the frequencies or relative frequencies and the horizontal scale represents the individual intervals.

How can a histogram be useful?

Just like stem-and-leaf plots, histograms show us the shapes of distributions of the observations. If properly constructed, with neither too few nor too many intervals, histograms allow us to determine whether the shape of our data distribution is bell-shaped, right-skewed, left-skewed, or none of these, based on the overall heights of the bars. Histograms are also useful in identifying possible outliers. If a histogram is symmetric around some value, that value equals the average: half the area under the histogram lies to the left of that value, and half to the right.

Below you will find two examples of histograms for the same set of grades we first listed in the bar graph section above.

[Figure: histogram of the students' grades]

We seldom use fewer than 6 or more than 15 classes; the exact number that should be used in a given situation depends on the number of measurements or observations we have to group. Each item (measurement or observation) goes into one and only one interval (category). We try to make the intervals cover equal ranges of values.
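
The counting step can be sketched in Python as shown below. The choice of seven intervals of width 5 starting at 70 is an assumption made for illustration; the intervals used in the figure above may differ. Each interval includes its left endpoint and excludes its right endpoint, so every score falls into one and only one interval.

```python
scores = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

# Assumed intervals: [70, 75), [75, 80), ..., [100, 105).
start, width, n_bins = 70, 5, 7

# Count the observations that fall into each interval.
counts = [0] * n_bins
for s in scores:
    counts[(s - start) // width] += 1

# Crude text histogram: bar length is the interval frequency.
for i, c in enumerate(counts):
    low = start + i * width
    print(f"[{low:>3}, {low + width:>3}) {'#' * c} ({c})")
```

With these intervals the tallest bar sits at 90 to 94 and the bars trail off toward the lower scores, which agrees with the slight left skew seen in the stemplots.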



BOXPLOTS

 
 
Boxplots reveal the main features of a batch of data, i.e., how the data are spread out. A boxplot is a graph of the five-number summary: the minimum score, the first quartile (Q1, the median of the lower half of all scores), the median, the third quartile (Q3, the median of the upper half of all scores), and the maximum score, with suspected outliers plotted individually. The boxplot consists of a rectangular box, which represents the middle half of all scores (between Q1 and Q3). A line in the box marks the median. Approximately one-fourth of the values fall between the minimum and Q1, and approximately one-fourth fall between Q3 and the maximum. Lines called whiskers extend from the box out to the smallest and largest scores that are not possible outliers. If an observation falls more than 1.5 × IQR outside of the box (below Q1 or above Q3), it is plotted individually as an outlier.

FIVE-NUMBER SUMMARY:

     
  1. MINIMUM
  2. 1ST QUARTILE
  3. MEDIAN
  4. 3RD QUARTILE
  5. MAXIMUM

What is the IQR?

IQR, or the interquartile range, is the distance between the first and third quartiles. IQR = Q3 - Q1

STANDARD BOXPLOT: 

[Figure: standard boxplot]

Any value that lies between 1.5 × IQR and 3 × IQR outside the box (below Q1 or above Q3) is considered a possible outlier and is denoted by an "o". Any value that lies more than 3 × IQR outside the box is considered an extreme outlier and is denoted by an "*".

Example:

[Figure: boxplot with possible and extreme outliers]

This example shows three possible outliers (o's) and one extreme outlier (*). The symbols used for outliers vary among different programs.
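
The five-number summary and the outlier rules above can be sketched in Python as follows. The function names are hypothetical. One assumption is labeled in the code: when the number of scores is odd, the median itself is left out of both halves before Q1 and Q3 are computed. Statistical programs, JMP included, may use a different quartile rule and give slightly different values.

```python
from statistics import median

scores = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: for an odd number of scores, the middle score is
    # excluded from both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def classify_outliers(values):
    """Return (possible outliers, extreme outliers) by the 1.5 and 3 IQR rules."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    possible, extreme = [], []
    for v in sorted(values):
        distance = max(q1 - v, v - q3)   # distance outside the box; <= 0 if inside
        if distance > 3 * iqr:
            extreme.append(v)            # would be drawn as "*"
        elif distance > 1.5 * iqr:
            possible.append(v)           # would be drawn as "o"
    return possible, extreme

print(five_number_summary(scores))
print(classify_outliers(scores))
```

Under this quartile rule the grade data have Q1 = 81, median = 92 and Q3 = 94, so IQR = 13 and no score lies more than 1.5 × IQR = 19.5 outside the box.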
