twitter-mono
facebook-mono
linkedin-mono
youtube-mono

Views > Tufte in excel - the box plot

The Box Plot (a.k.a. Box-and-Whisker Plot) provides an extremely useful alternative to looking at a data set in an effort to determine its central tendency, spread, skewness, and the existence of outliers.

A Box Plot focuses on five summary measures of the distribution of the data:
1. The median
2. The lower quartile
3. The upper quartile
4. The smallest observation
5. The largest observation

Here is the standard model Box Plot in Excel. Cluttered with dark shadings and heavy contours activating the negative white space between the data series.

The redesign below—as per Edward Tufte’s work in The Visual Display of Quantitative Information—clears the noise by removing the over-busy fills, grids, whisker caps and contours. Moreover, by muting the whiskers relative to the now highlighted medians (red spots) we can easily track the path of averages between the data series.

The data source for the above chart is shown below. It shows the Corruption Perception Index—an annual ranking prepared by Transparency International—for 198 countries and territories segregated into 12 different regions. The score is a number between 0 and 100. The higher the score the less corrupt a country’s public sector is perceived to be by experts and business executives.

1. The first thing I need to do is calculate the quartiles of the data set by Region. To do that I need to use array formulas and the QUARTILE.INC function of Excel. Add the following formulas in the respective cells:
 Enter 0, 1, 2, 3 and 4 in cells F2, G2, H2, I2 and J2 respectively
 F4 -->  {= QUARTILE.INC(IF($C$4:$C$345 = $E4,$D$4:$D$345,""),F$2)} (Remember this is an array formula so you need to press Shift + Ctrl + Enter)
2. Next, select cell F4 and drag the handle of the cell to the right to copy the formula into cells G4:J4.
3. Then, select the range F4:G4 and drag the handle of the cells down to copy the formulas into cells F5:J15. The data source should look like this:

TUFTE IN EXCEL - THE BOX PLOT

TUFTE IN EXCEL - THE BOX PLOT

4. Next, I need to evaluate the length of each quartile segment which will be the inputs for a Stacked Column Chart. To do that enter the following formulas in the respective cells:
    F20 --> = G4
    G20 --> = H4 – G4
    H20 --> = I4 – H4
5. Next, select the range F20:G20 and then press Ctrl + C to copy the formulas into the clipboard. Then, select the range F21:H31 and press Ctrl + V to paste the formulas as per the below screenshot:

6. Select the range E19:H31 and then, on the Insert tab, click Insert > Insert Column or Bar Chart > Stacked Column. The chart should look like this:

7. Next I’m going to use Error Bars to display the whiskers. To do that introduce the following formulas in the respective cells:
  I20 --> = G4 – F4
 J20 --> = J4 – I4
8. Next select the range I20:J20 and drag the handle to copy the formulas into cells I21:J31. The data source should look like this:

9. Select the chart then click the Charts Elements button located on the upper-right corner of the chart. Next, in the expanded menu, check the Error Bars option, as per the screenshot below:

10. Lets do some cleaning. Delete the Chart Title, the Vertical (Value) Axis Major Gridlines and the Legend.
11. Turn off the border of the Chart Area by setting the Border to No line.
12. Change the chart font to Trebuchet MS, its size to 9 and color to black.
13. Set the Shape Fill color of the Q1 data series to No Fill.
14. Delete the Series “Q2-Q1” Y Error Bar. Adjust the size of the chart and the chart should look like this:

15. Right-click the Series “Q3 – Q2” Y Error Bars in the chart and then, on the shortcut menu, click Format Error Bars.
16. In the Format Error Bars pane, with Error Bars Option icon selected, set the Direction to Plus and the End Style to No Cap.
17. Set Error Amount to Custom and click on the Specify Value button.
18. In the Custom Error Bars dialog box, edit the Positive Error Value to include cells J20:J31. Press the Ok button to close the Custom Error Bars dialog box.
19. Right-click the Series “Q1” Y Error Bars in the chart and then, on the shortcut menu, click Format Error Bars.
20. In the Format Error Bars pane, with Error Bars Option icon selected, set the Direction to Minus and the End Style to No Cap.
21. Set Error Amount to Custom and click on the Specify Value button.
22. In the Custom Error Bars dialog box, edit the Negative Error Value to include cells I20:I31. Press the Ok button to close the Custom Error Bars dialog box and the chart should look like this:

23. Now I need to add the Median data series to the chart. To do that, on the worksheet select the range H3:H15 and press Ctrl + C to copy the data to the clipboard. Next, select the chart and press Ctrl + V to paste the data into the chart. The chart will look like this:

24. Right-click the chart and then, on the shortcut menu, click Change Chart Type.
25. In the Change Chart Type dialog box select Combo under the All Charts tab.
26. Next, under the Choose the chart type and axis for your data series set the Chart Type to Line for the Q2 data series as per the below screenshot:

27. Press the Ok button to close the Change Chart Type dialog box. The chart should look like this:

34. Next, I need to create the vertical axis, its labels and tick marks. To do that I need to specify the major unit of the vertical axis. From that input I can calculate the required number of tick marks that covers the range of the data. To do this add the following formulas in the respective cells and the labels according the screen shot below:
    N2 --> type 20
    N3 --> = MAX(J4:J15)
    N4 --> = (N3 - MOD(N3,N2)) / N2 + 1

35. Now I’ll define the setup so that the chart can dynamically handle 20 tick marks. For that enter the numbers from 1 to 20 in the range K8:K27.
36. Next, add zeros in the range L8:L27.
37. Then type the following formula in cell M8:
     = IF(K8 <= $N$4,K8 * $N$2,NA())
38. Copy the formula in Cell M8 to the clipboard by pressing Ctrl + C. Next, select the range M9:M27 and paste the formula by pressing Ctrl + V. The data source should look like this:

39. Select the range M7:M27 and press Ctrl + C to copy the data to the clipboard. Next, select the chart and press Ctrl + V to paste the data into the chart. The chart will look like this:

28. In the chart, select the Series “Q3 – Q2” and set its Shape Fill to No Fill.
29. Then, select the Series “Q2 – Q1” and set its Shape Fill to No Fill.
30. Next, right-click the Series “Q1” and then, on the shortcut menu, click Format Data Series.

31. In the Format Data Series pane, with Series Options selected, click on the Fill & Line icon. Under Line, set the Line to No Line. Under Marker, set Marker Options to Built-in its Size to 4 and Type to the round dot.
32. Next, set Fill to Solid fill and change its Color to red and set Border to Solid line and its Color to red.
33. Then, in the chart select the Series “Q3 – Q2” Y Error Bars and change its Line width to 1 and Color to light grey. Do the same for the Series “Q1” Y Error Bars and the chart should look like this:

40. Right-click the chart and then, on the shortcut menu, click Change Chart Type.
41. In the Change Chart Type dialog box select Combo under the All Charts tab.
42. Next, under the Choose the chart type and axis for your data series set the Chart Type to Scatter for the y data series and turn off its Secondary Axis as per the below screenshot:

43. Press the Ok button to close the Change Chart Type dialog box. The chart should look like this:

44. Right-click the chart and then, on the shortcut menu, click Select Data.
45. In the Select Data Source dialog box, select the y series and click on the Edit button under Legend Entries (Series).
46. In the Edit Series dialog box, refer to cell K6 in the Series name edit box. Next, edit the Series X values to include cells L8:L27 and the Series Y values to include cells M8:M27 as per the screen shot below:

47. Press the Ok button twice in a row to close the Edit Series and Select Data Source dialog boxes. The chart should look like this:

48. Select the Vertical (Values) Axis in the chart and then press Del on the keyboard to delete it.
49. Next, in the chart, right-click the Tick marks data series and then, on the shortcut menu, click Format Data Series.
50. In the Format Data Series pane click on the Fill & Line icon. Under Marker Options, select the large horizontal dash from the Type dropdown list and set its Size to 3.
51. Next, under Fill select Solid fill and set its Color to black. Then turn off its border by setting the Border to No line.
52. Then, In the chart, right-click the Tick marks data series and then, on the shortcut menu, click Add Data Labels. The chart should look like this:

53. In the chart, right-click the Series “Tick marks” Data Labels and then, on the shortcut menu, click Format Data Labels.
54. In the Format Data Labels pane set the Label Position to Left.
55. Next, while the data labels are still selected, set their Alignment to Align Right on the Home tab.

56. Adjust the size of the Plot Area by dragging its left edge a bit to the right to avoid the data labels overlapping with the data markers. The chart should look like this:

Edward Tufte in Exel the Box Plot
Edward Tufte in Exel the Box Plot 1
Edward Tufte in Exel the Box Plot 2
Edward Tufte in Exel the Box Plot 3
Edward Tufte in Exel the Box Plot 4
Edward Tufte in Exel the Box Plot 5
Edward Tufte in Exel the Box Plot 6
Edward Tufte in Exel the Box Plot 6
Edward Tufte in Exel the Box Plot 7
Edward Tufte in Exel the Box Plot 8
Edward Tufte in Exel the Box Plot 9
Edward Tufte in Exel the Box Plot 10
Edward Tufte in Exel the Box Plot 11
Edward Tufte in Exel the Box Plot 12
Edward Tufte in Exel the Box Plot 13
Edward Tufte in Exel the Box Plot 14
Edward Tufte in Exel the Box Plot 15
Edward Tufte in Exel the Box Plot 16
Edward Tufte in Exel the Box Plot 17
Edward Tufte in Exel the Box Plot 18
Edward Tufte in Exel the Box Plot 19
Edward Tufte in Exel the Box Plot 20
Edward Tufte in Exel the Box Plot 21
Edward Tufte in Exel the Box Plot 22

57. Now I’ll define the setup for the vertical axis frame. It will show the minimum and maximum values of the data. For that enter the numbers 0.05 in the range K31:K32.
58. Then type the following formula in cell L31:
     = MIN(F4:F15)
59. Next type the following formula in cell L32:
     = MAX(J4:J15)
The data source should look like this:

Edward Tufte in Exel the Box Plot 23

60. Right-click the chart and then, on the shortcut menu, click Select Data.
61. In the Select Data Source dialog box, add a new data series by clicking the Add button under Legend Entries (Series).
62. In the Edit Series dialog box, refer to cell K29 in the Series name edit box. Next, edit the Series X values to include cells K31:K32 and the Series Y values to include cells L31:L32. The chart should look like this:

Edward Tufte in Exel the Box Plot 24

63. Right-click the Series “Vertical axis” and then, on the shortcut menu, click Format Data Series.
64. In the Format Data Series pane click on the Fill & Line icon. Under Marker, set the Marker Options to None.
65. Under Line, set the Line to Solid, its Color to Black and its Width to 0.25pt. The chart should look like this:

Edward Tufte in Exel the Box Plot 25

66. Right-click the Horizontal (Category) Axis and then, on the shortcut menu, click Format Axis.
67. In the Format Axis pane, with Axis Options selected, set the Axis Position to On tick mark.
68. Then, under Tick Marks, set the Major Type to Outside.
69. Next, click the Fill & Line icon and set the line Color to Black and its Width to 0.25pts.
70. Then, under Text Options, click the Textbox icon and set the Text direction to Rotate all text 270°.
71. Finally, add titles, data sources and commentary. Resize the chart so it looks like this:

Edward Tufte in Exel the Box Plot 26

72. You’ll notice that the chart is dynamic. Try it. Change the Major unit of vertical axis (cell N2) to 15. The chart will look like this:

Edward Tufte in Exel the Box Plot 27

Download Excel model