Views > Excel charts > Slicing a scatterplot in excel
If you ask a group of data analysts and data visualization experts to choose the most important chart type to display data, most probably “The scatterplot” would be the response you’ll get. And they have a point. Edward Tufte, in the Visual Display of Quantitative Information, crowned the scatterplot—and its variants—as the greatest of all graphical designs. The scatterplot encourages the viewer to assess relationships by showing how one variable affects another.
However, noisy data and weak relationships often hinder visual identification of any such relationships. Furthermore, even if a general relationship can be identified in the chart, it can be difficult to discern its characteristics from simple visual inspection of the scatterplot alone.
Lets have a look at an example. The scatterplot below shows the relationship between GDP per capita PPP in $ and mortality rate per 1,000 inhabitants for 189 countries in 2018. The source of the data is the IMF.
As you can see the data is skewed. A common remedy for skewness is to take logarithms. The chart below shows the same data on a log10 scale. Now the data is much more uniform and we can more effectively judge the data.
The box plot below shows a summary of the data where the GDP per capita variable is sliced into 7 equally sized intervals and each box plot is showing the median and the quartiles within each interval.
From inspection of the median line it becomes apparent that the two variables have negative but curvilinear relationship.
Mortality rate tends to decrease with GDP per capita except for the two lowest and two highest bands.
The box plot also shows that the spreads of the distribution tend to be largest in the middle bands.
Also note the two outliers in Band 2 and Band 5.
SLICING A SCATTERPLOT IN EXCEL
Lets build it in Excel.
The data source is in the range B2:D191 as per the below screenshot:
1. The fist thing to do is to transform the Mortality rate and GDP per capita to log10. To do that enter the following formulas in the respective cells:
G3 --> = LOG10(C3:C191)
H3 --> = LOG10(D3:D191)
The outcome is as per the below screenshot:
2. Next, we need to create the boundaries for the bands and assign each data point to its respective band. To do that enter the following formulas in the respective cells:
Type 7 in cell L2
L3 --> = MIN(H3#)
L4 --> = MAX(H3#)
L5 --> = L4 - L3
L6 --> = L5 / L2
J8 --> = SEQUENCE(L2,1,1,1)
K8 --> = "Band " & J8#
L8 --> = L3 + L6 * J8#
F3 --> = INDEX(K8#,XMATCH(H3#,L8#,1,2))
The worksheet should look like the screenshot below:
4. Lets do some cleaning. Delete the Chart Title.
5. Delete the Vertical Axis Major Gridlines.
6. Turn off the border of the Chart Area by setting the Shape Outline to No Outline.
7. Change the chart font to Trebuchet MS, its size to 9 and color to Black.
8. Set the Shape Outline of the Horizontal Axis to No Outline.
9. Set the following settings for the Vertical Axis: set the Tick Marks > Major type to Outside. Set the Axis Options > Bounds > Maximum to 2. Set its Line > Solid Line > Colour to black.
10. Add the Primary Horizontal and Primary Vertical Axis Tiltes.
11. In the chart, select the data Series "1" data series and on the Format Data Series pane uncheck the Show mean markers option and change the Fill Colour to light blue.
Adjust the size of the chart, add a chart title and data source and the chart should look like this:
3. Select the range F3:G191 and then on the Insert tab, in the Charts group, click the Insert Statistic Chart icon and then select the Box and Whisker icon. The chart should look like this:
The chart is dynamic. Change the number of bands in cell L2—say to 5—and the chart should look like this:
Or alternatively buy product on Gumroad
Thank you for reading, and I hope you found this useful. If you have any questions, find me on X (Twitter) and ask me anything.
Or alternatively buy product on Gumroad
Or alternatively buy product on Gumroad
Download Excel model