Views > Excel charts > How to create a one-dimensional scatterplot in Excel
"Summary reports are economical with the truth...Detailed data moves closer to the truth." These are the words of Edward Tufte—from his latest book Seeing With Fresh Eyes—warning about the potential loss of information or distortion encountered in summary graphics.
Summary graphics are useful, but frequently, analyzing the distribution of the data can uncover patterns in the data that summary statistics cannot reveal. For example, the 6 datasets shown below left look completely different and yet yield the same boxplot.
7. Highlight the range H3:I11 and then click Insert > Insert Column or Bar Chart > Clustered Bar.
8. While the chart is still selected, drag the handles of the data series values to reference the range H2:I12.
The chart should look like this:
11. In the chart, right-click the Vertical (Category) Axis and then, on the shortcut menu, click Format Axis.
12. In the Format Axis pane, with Axis Options selected, under Labels, set the Interval between labels to Specify interval unit and keep the default value of 1.
13. Turn off the Primary Major Vertical Gridlines.14. Format the border of the Plot Area to Solid line with grey color.
15. Turn on the Primary Major Horizontal Gridlines and change their line Dash type to Dash.
16. Turn off the border of the Chart Area by setting the Border to No line. Resize the chart and the chart should look like this:
17. On the worksheet, select the range I2:I12 then press Ctrl + C to copy the content to the clipboard.
18. Select the chart and press Ctrl + V to add a new series. The chart should look like this:
HOW TO CREATE A ONE-DIMENSIONAL SCATTERPLOT IN EXCEL
19. Right-click the newly created series (Series2) in the chart and then, on the shortcut menu, click Change Series Chart Type.
20. In the Change Chart Type dialog box, check the Secondary axis checkbox for Series2 series. Next, choose the Scatter icon from the dropdown list under Chart Type. The chart should look like this:
21. Right-click the chart and then, on the shortcut menu, click Select Data.
22. In the Select Data Source dialog box, select the Series2 data series and then click the Edit button under Legend Entries (Series).
23. In the Edit Series dialog box, edit the Series X values to include cells B3:B2156 and the Series Y values to include cells C3:C2156 as follows:
24. Click on the Ok button twice in a row to close the Edit Series and the Select Data Source dialog boxes. The chart should look like this:
25. Right-click the Secondary Vertical (Value) Axis in the chart and then, on the shortcut menu, click Format Axis.
26. In the Format Axis pane, with Axis Options selected, set the Bounds Maximum to 1.
27. Select the Vertical (Category) Axis in the chart. In the Format Axis pane, with Axis Options selected, turn on the On tick marks under Axis position.
28. Select the Secondary Vertical (Value) Axis in the chart and then turn it off by pressing the Delete key on the keyboard. The chart should look like this:
29. Select the chart and then, in the Current Selection group on the Format, select Series1 from the drop-down list.
30. Click the Format Selection under the Current Selection group.
31. In the Format Data Series pane, under Series Options, click the Fill & Line icon. Next, turn off the fill by clicking on the No Fill radio button under Fill.
32. Delete the chart title.
33. Select the chart then click the Charts Elements button located on the upper-right corner of the chart. Next, in the expanded menu, check the Secondary Horizontal option under Axes, as per the screenshot below:
34. Right-click the Horizontal (Value) Axis in the chart and then, on the shortcut menu, click Format Axis.
35. In the Format Axis pane, with Axis Options selected, set the Bounds Maximum to 300.
36. Next, choose the Outside option from the dropdown list from the Major type under Tick Marks.
37. Click the Secondary Horizontal (Value) Axis Next, choose the Outside option from the dropdown list from the Major type under Tick Marks.
38. Under Number, set Decimal Places to 0 and check the Use 1000 Separator (,) option. The chart should look like this:
42. Finally, resize the chart, change the font to Trebuchet MS and its color to black, add title and source. The chart should look like this:
The dots forming the markers look too bulky and hide the distribution due to the large number of data points. To reduce the issue of overplotting I can replace the dots with vertical ticks. To achieve that we can do the following:
39. Draw a short vertical line anywhere on the worksheet using the Shapes in the Illustration Group under the Insert tab
40. Copy the newly created shape to the Clipboard by pressing Ctrl + C.
41. Select the markers of the data series in the chart and then paste the shape by pressing Ctrl + V. The chart will look like this:
Or alternatively buy product on Gumroad
Or alternatively buy product on Gumroad
Or alternatively buy product on Gumroad
One way to study the distribution of one or more categories of measurements is to graph the data with a one-dimensional scatterplot as shown below. The data is from Microsoft's classic relational Northwind database. The graph shows the distribution of 2,155 items sold by their unit price and by 9 salespersons. The salesperons are ranked by the average of the unit price. It's obvious that most items sold were in the $2 to $50 range.
Source: Seeing With Fresh Eyes, Edward Tufte 2020.
First I need to return a list of the unique salesperons from the data source. To do that I'm going to use the UNIQUE function of Excel—part of Excel's new Dynamic Arrays capabilities.
1. Type the following formula in Cell F3:
= UNIQUE(A2:A2156)
Excel will spill the list of unique sales persons from the range A2:A2156 as follows:
Next, I want to rank the salespersons by the average of unit price of their sales.
2. To do that add the following formula in cell G3:
= AVERAGE(IF($A$2:$A$2156 = F3,$B$2:$B$2156,""))3. Select cell G3 and drag the handle of the cell to copy the formula into the range G4:G11.
The worksheet should look like this:
Next, I need to sort the list of salespersons by the average unit price of their sales. To do that I'm going to use the SORTBY function of Excel—another powerful function of Excel's new Dynamic Arrays capabilities.
4. Type the following formula in Cell H3:
= SORTBY(F3#,G3:G11,1)
Excel will spill the list of sales persons sorted by the average unit price as per the snapshot below:
Next, I need to assign a y-coordinate for each salesperson. These y-coordinates will be used to allocate each sale to its respective salesperson in the one-dimensional scatterplot.
5. Add the following formula into cell I2
= 1 - (ROWS($H$2:$H$12) - ROW() + ROW($H$1)) / (ROWS($H$2:$H$12) - 1)
6. Select cell I2 then drag the handle to copy the formula into cells I3:I12. The data source should look like this:
Another classic example is the Anscombe Quartet—4 data sets yielding the same regression line and yet looking completely different.
And here below is my own Anscombe Quartet generated based on Alberto Cairo's famous Datasaurus. If you're interested in generating your own Anscombe Quartets, check my previous blog post about the subject.
Lets build the chart in Excel. The data is in the range A1:B2156.
Next, I need to assign a y-coordinate for each sales. Each y-coordinate will be pulled from the previously defined ys (see steps 5 and 6) depending on the salesperson.
9. Add the following formula into cell C2
= INDEX($I$2:$I$12,MATCH(A2,$H$2:$H$12,0))
10. Select cell C2 then drag the handle to copy the formula into cells C3:C2156. The data source should look like this:
Download Excel Model
Thank you for reading, and I hope you found this useful. If you have any questions, find me on Twitter and ask me anything.