Fundamental Principle of Analytical Design

“Visual displays, if they are to assist thinking,
should show comparisons.”

Edward Tufte

In his brilliant book Beautiful Evidence, Edward Tufte—nicknamed the Leonardo da Vinci of data by The New York Times—introduced six principles that he calls the fundamental principles of analytical design. His first principle for the analysis and presentation of data—Principle 1: Show comparisons, contrasts, differences—states:

The fundamental analytical act in statistical reasoning is to answer the question “Compared with what?” Whether we are evaluating changes over space or time, searching big data sets, adjusting and controlling for variables, designing experiments, specifying multiple regressions, or doing just about any kind of evidence-based reasoning, the essential point is to make intelligent and appropriate comparisons. Thus visual displays, if they are to assist thinking, should show comparisons.

Below is an example of a graph that fails to communicate efficiently and effectively, not because of inadequate software or hardware, but because it poorly implements Tufte's first principle of analytical design: show comparisons.

The graph was part of an infographic published in The National newspaper on 26 May 2014 to illustrate the results of a road safety survey of 1,208 people.

Sadly, poor presentation has obscured—rather than facilitated—comparisons for the following reasons:

Choosing inappropriate chart type
Misusing color
Arranging the data poorly

To summarize, the changes I've made to the graph make it much easier to draw the following conclusions from the results:

Respondents are aware of the rules of the road but less so about the system of black and white points.
Although the majority of respondents perceive public transport as reliable, there are mixed feelings as to the perceived safety of taxis and public transport.
The majority of respondents are in favor of applying tougher rules and regulations on school buses and bus drivers.
Respondents agree that the measures taken are effective but believe that more should be done.

Note: In the revised graph, I've reversed the wording of four questions (questions 4, 5, 19 and 20, counting from the top) from negative to positive. Reversing the wording reverses the direction of the scale. The scores should be the same for both versions, but the positively worded version does a better job of revealing an interesting pattern.
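To make the effect of reverse wording concrete, here is a minimal Python sketch. It assumes that disagreeing with a negatively worded statement is equivalent to agreeing, to the same degree, with its positive counterpart. The function name reverse_wording and the percentages are hypothetical illustrations, not figures from the survey.

```python
# Four-point agreement scale, ordered from most negative to most positive.
SCALE = ["Strongly disagree", "Somewhat disagree", "Somewhat agree", "Strongly agree"]


def reverse_wording(shares):
    """Mirror response shares so they describe the positively worded statement."""
    return dict(zip(SCALE, [shares[label] for label in reversed(SCALE)]))


# Hypothetical percentages for a negatively worded statement (not survey data).
negative = {
    "Strongly disagree": 40,
    "Somewhat disagree": 30,
    "Somewhat agree": 20,
    "Strongly agree": 10,
}

# The same responses, expressed against the positively worded statement.
positive = reverse_wording(negative)
print(positive)
```

The shares themselves are untouched; only the end of the scale they are attached to changes, which is why the two versions carry the same information.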

Source: The National, News, 26 May 2014, page 7.

CHOOSING INAPPROPRIATE CHART TYPE
Choosing the wrong chart type for the data tops the list of design mistakes committed in the visual display of quantitative information. The author of the original solution did exactly that. Although we may immediately associate part-to-whole comparisons with pie charts, comparing data across multiple pie charts increases the cognitive burden, which is something we try to reduce. Rather than helping, pie charts erect barriers that interfere with comparisons. There is a general consensus within the dataviz community that pie charts are badly laid-out tables.

A less objectionable structural solution than the matrix of pie charts above is a horizontal 100% stacked bar graph, shown in the graph below. The revised chart displays the same data but can be interpreted much more efficiently and accurately. The difference is dramatic. The chart is much more visually appealing. It's easier to navigate and easier to read because the numbers are organized in columns, and numbers arranged in columns are easier to compare than numbers arranged in circles.

The above solution—although superior to the original—is not without perceptual problems. Because the values of the left segment—Strongly disagree—share a common baseline, it’s easy to compare the values across the different statements. For example—however small the difference—it’s easy to see that the percentage of respondents who strongly disagree with the statement “I am well aware of the implications of crossing a red traffic light” is higher than the percentage of respondents who strongly disagree with the statement “The public transport system is reliable”. Similar comparisons could be easily made for the segment on the right—Strongly agree. Take for instance the top two statements. It doesn’t require a lot of cognitive processing power to see that almost the same percentage of respondents strongly agreed with both statements.

The same doesn’t apply to the middle segments. Can you tell whether the percentage of respondents who somewhat agree with the statement “Buses should be checked by a registered authority or garage each month for road worthiness” is equal to, higher than, or lower than the percentage of respondents who somewhat agree with the statement “I believe the current penalties for road offences are not severe enough”? I doubt you can. Comparing the middle segments across statements is difficult because they float: they are aligned neither to the left nor to the right, so they share no common baseline.
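The floating baseline can be seen in numbers. The Python sketch below computes where each segment of a 100% stacked bar begins, as the running total of the segments before it. The helper stacked_starts and the two sets of percentages are hypothetical and are not taken from the survey.

```python
# Four-point agreement scale, in the left-to-right order of the stacked bar.
SCALE = ["Strongly disagree", "Somewhat disagree", "Somewhat agree", "Strongly agree"]


def stacked_starts(shares):
    """Return the left edge of each segment in a 100% stacked bar."""
    starts, running = {}, 0
    for label in SCALE:
        starts[label] = running
        running += shares[label]
    return starts


# Hypothetical percentages for two statements (not survey data).
statement_a = {"Strongly disagree": 5, "Somewhat disagree": 15,
               "Somewhat agree": 50, "Strongly agree": 30}
statement_b = {"Strongly disagree": 20, "Somewhat disagree": 25,
               "Somewhat agree": 35, "Strongly agree": 20}

starts_a = stacked_starts(statement_a)
starts_b = stacked_starts(statement_b)

# The leftmost segment always starts at 0, but the middle segments do not line up.
print(starts_a["Somewhat agree"], starts_b["Somewhat agree"])
```

With these made-up numbers, the Somewhat agree segment begins at 20 in one bar and at 45 in the other, so the eye has to judge lengths without a shared starting point.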

Given these particular weaknesses of the 100% stacked bar graph, a better solution is needed: the diverging stacked bar chart, shown in the graph below. In the improved chart we used a vertical baseline around which we aligned the positive and negative responses. We drew the percentage of respondents who agree with each statement to the right of the baseline and the percentage who disagree to the left. By centering the responses on either side of the baseline, we can easily compare the percentage of respondents who agree with the percentage who do not, which is indeed the main comparison to be made in this analysis.
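The layout of a diverging stacked bar can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the method used to produce the chart: the function diverging_segments and the percentages are hypothetical, and the baseline is placed at zero between Somewhat disagree and Somewhat agree.

```python
# Four-point agreement scale, ordered from most negative to most positive.
SCALE = ["Strongly disagree", "Somewhat disagree", "Somewhat agree", "Strongly agree"]


def diverging_segments(shares):
    """Return (label, start, width) for each segment around a zero baseline."""
    strongly_dis, somewhat_dis, somewhat_agr, strongly_agr = (
        shares[label] for label in SCALE
    )
    return [
        # Disagreement extends to the left of the baseline (negative positions).
        ("Strongly disagree", -(strongly_dis + somewhat_dis), strongly_dis),
        ("Somewhat disagree", -somewhat_dis, somewhat_dis),
        # Agreement extends to the right of the baseline (positive positions).
        ("Somewhat agree", 0, somewhat_agr),
        ("Strongly agree", somewhat_agr, strongly_agr),
    ]


# Hypothetical percentages for one statement (not survey data).
example = {"Strongly disagree": 5, "Somewhat disagree": 15,
           "Somewhat agree": 50, "Strongly agree": 30}

segments = diverging_segments(example)
for label, start, width in segments:
    print(f"{label:18s} start={start:4d} width={width:3d}")
```

Each start and width pair could then be handed to a plotting library's horizontal bar function, for example through the left parameter of matplotlib's barh. Because both middle segments touch the zero baseline, total agreement and total disagreement can be read directly from where each bar ends.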

MISUSING COLOR
When any color appears as a contrast relative to the norm, our eyes pay attention and our brains attempt to assign meaning to that contrast. This is true for the red color assigned to the strongly disagree choice in the original graph. Although the scale is balanced—equal number of positive and negative attitudes—the author visually distorted the scale by assigning blue color with varying intensity to three out of four choices, giving the impression that more respondents are in agreement with the statements than in disagreement.

It's important to visually balance the scale in order to avoid bias. In the revised chart below, I chose two colors, red and blue, with increasing color intensity to indicate increasing strength of disagreement and agreement. Because the stronger the agreement or disagreement, the darker the color, the gradation is intuitive and eliminates the need to refer back to the legend.

ARRANGING THE DATA POORLY
Beyond selecting the appropriate chart type and the right color, attention also must be given to the way the information is organized and grouped to support meaningful comparisons. Ordering graphs without giving thought to some aspect of the data can obscure structure in the data that would have been obvious had the display not been ordered randomly.

Rather than fitting the data into some desired design concept, we've grouped the questions into four logical segments and sorted them—within each segment—in descending order of agreement. This allows the readers to see patterns and exceptions in the data.
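The arrangement described above, grouping first and then sorting by descending agreement within each group, can be sketched in Python as follows. The group names, statement labels, and agreement percentages are hypothetical placeholders, as is the helper arrange.

```python
# Each row: (group, statement, percent who agree). Hypothetical values, not survey data.
statements = [
    ("Group A", "Statement 1", 62),
    ("Group B", "Statement 2", 48),
    ("Group A", "Statement 3", 91),
    ("Group B", "Statement 4", 75),
    ("Group A", "Statement 5", 80),
]


def arrange(rows, group_order):
    """Order rows by the given group sequence, then by descending agreement."""
    rank = {group: position for position, group in enumerate(group_order)}
    return sorted(rows, key=lambda row: (rank[row[0]], -row[2]))


arranged = arrange(statements, ["Group A", "Group B"])
for group, statement, agree in arranged:
    print(group, statement, agree)
```

Passing the group order explicitly keeps the segments in a sequence chosen for its logic rather than whatever order the data happened to arrive in.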

COMPARED WITH WHAT?