1.6 Five Number Summaries and Box Plots

A Five Number Summary uses five key statistics to describe a dataset:

  1. Minimum: Smallest value in the dataset.
  2. Q1 (First Quartile): 25th percentile, the value below which 25% of the data lies.
  3. Median (Q2): 50th percentile, the middle value of the dataset.
  4. Q3 (Third Quartile): 75th percentile, the value below which 75% of the data lies.
  5. Maximum: Largest value in the dataset.

Example:

Jyothi's salary data:

  • Minimum: 18 lakhs.
  • Q1: 24 lakhs.
  • Median: 25 lakhs.
  • Q3: 27 lakhs.
  • Maximum: 50 lakhs.
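As a minimal sketch of how these five values could be computed, the Python snippet below uses the standard library's statistics.quantiles with the "inclusive" method. The function name five_number_summary and the toy sample are illustrative assumptions; the sample is not Jyothi's salary data. Other quartile conventions can give slightly different Q1 and Q3 values for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" interpolates linearly between data points and
    # treats the sample minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Toy sample (hypothetical values, chosen only for easy arithmetic).
sample = [1, 3, 5, 7, 9, 11, 13]
print(five_number_summary(sample))  # (1, 4.0, 7.0, 10.0, 13)
```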


Box and Whisker Plot

A Box and Whisker Plot visually represents the Five Number Summary and highlights the spread and outliers.

Components:

  1. Box:
     • Ends of the box represent Q1 and Q3.
     • A line inside the box marks the median (Q2).
     • Sometimes, the mean is indicated (e.g., by a cross).

  2. Whiskers:
     • Extend from the box to the smallest and largest values within 1.5 × IQR below Q1 and above Q3.
     • IQR (Interquartile Range): IQR = Q3 - Q1.

  3. Outliers:
     • Values outside the whisker limits are marked (e.g., as small circles).
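The components listed above can be sketched in Python as a function that returns the numbers a box plot would draw. This is an illustration under stated assumptions, not a reference implementation: the name box_plot_stats, the dictionary keys, and the toy sample are hypothetical, and quartiles use the "inclusive" interpolation from the standard library.

```python
import statistics


def box_plot_stats(values, k=1.5):
    """Compute the box, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Limits beyond which a value is treated as an outlier.
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Whiskers end at the most extreme data values still inside the limits.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Toy sample (hypothetical values): 40 sits far above the rest.
stats = box_plot_stats([1, 3, 5, 7, 9, 11, 13, 40])
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])  # 1 13 [40]
```

Note that the upper whisker stops at 13, the largest value inside the limit, rather than at the limit itself (22.0 for this sample).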

Outlier Detection

Whisker Limits:

  • Lower Limit: Q1 - 1.5 × IQR.
  • Upper Limit: Q3 + 1.5 × IQR.

Example (Jyothi's salary data):

  • IQR: 27 - 24 = 3 lakhs.
  • Lower Limit: 24 - (1.5 × 3) = 19.5 lakhs.
  • Upper Limit: 27 + (1.5 × 3) = 31.5 lakhs.
  • Outliers: 18 lakhs (below the lower limit) and 32, 35, 38, and 50 lakhs (above the upper limit).
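The arithmetic in this example can be checked with a few lines of Python. The sketch below uses only the quartiles and the salary values quoted in these notes; the function name whisker_limits is a hypothetical helper, and 25 (the median) is included only to show a value that is not flagged.

```python
def whisker_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) as Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the salary example, in lakhs.
lower, upper = whisker_limits(24, 27)
print(lower, upper)  # 19.5 31.5

# Values mentioned in the notes, plus the median as a non-outlier check.
candidates = [18, 25, 32, 35, 38, 50]
outliers = [v for v in candidates if v < lower or v > upper]
print(outliers)  # [18, 32, 35, 38, 50]
```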

Comparison with Z-Score Method

  • Z-scores: Identify outliers based on their distance from the mean, measured in standard deviations (e.g., |Z| > 3).
  • Box plots: Identify outliers based on IQR and whisker limits.
  • In Jyothi's salary data:
     • Z-Score Outliers: 38 lakhs, 50 lakhs.
     • Box Plot Outliers: 18 lakhs, 32 lakhs, 35 lakhs, 38 lakhs, 50 lakhs.
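To see why the two methods can disagree, the sketch below applies both rules to the same data. It assumes the population standard deviation for the Z-score and "inclusive" quartiles for the IQR rule; the function names and the toy sample are hypothetical and are not Jyothi's salary data.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Values whose |Z| exceeds the threshold (assumes values are not all equal)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [v for v in values if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Values outside Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Toy sample: the integers 1..18 plus two large values.
sample = list(range(1, 19)) + [35, 100]
print(zscore_outliers(sample))  # [100]
print(iqr_outliers(sample))     # [35, 100]
```

In this sample the extreme value 100 inflates both the mean and the standard deviation, so 35 ends up less than one standard deviation from the mean and only the IQR rule flags it. Quartiles are much less affected by a few extreme values, which is one reason the box plot rule tends to flag more points.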

Interpretation

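  • Combining methods (Box Plot and Z-Score) makes outlier identification more reliable: values flagged by both (here, 38 and 50 lakhs) are the strongest candidates, while values flagged only by the box plot (18, 32, and 35 lakhs) deserve a closer look.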
  • Visualization via box plots provides intuitive insights into data spread and potential anomalies.