This marvellous book by Howard Wainer, Picturing the Uncertain World: How to Understand, Communicate and Control Uncertainty through Graphical Display, is an absolute must for anyone who claims to be involved in the search for empirical evidence. It's a sequel to the same author's Graphic Discovery and also in the tradition of Edward Tufte's The Visual Display of Quantitative Information. All applied economists should have read all three books carefully and keep them close by for reference. So should all other researchers – social science, medicine, education – anybody who needs to navigate the shoals of carrying out robust empirical work and communicating it to others.
Why do books which might seem to appeal only to the anorak tendency matter so much? They're not even about the underlying statistical analysis, only about presenting data graphically, after all.
The reason is that visual messages are much more powerful and effective than words or tables of figures. Yet so often data is presented graphically in either obscure or actively misleading ways. Figuring out the best and most accurate visual presentation is a good test of the researcher's statistical understanding and results. It will also get the message across powerfully. As both Wainer, in this book and his previous one, and Tufte point out in their many examples, most graphs, whether in the media or in academic papers, are at best inadequate and at worst dangerously wrong. Conventional graphs swallow information or present it confusingly, and much software unhelpfully limits the kinds of graph that can be drawn. (Tufte is a well-known and vocal critic of PowerPoint graphs.)
Even though Wainer's book is mainly about graphical presentation, I derived some new insights about statistics from it too. One is his explanation of the dangers of false inference from grouping data into categories, say 'tall' and 'short' as related to 'clever' and 'stupid'. Depending on the boundaries chosen between categories, uncorrelated variables can appear to have any correlation you like: “Whenever we have two uncorrelated variables, we can always make the data do what we want so long as the sample is large enough.” (p153)
A second is how thin the tails of the normal distribution are. Take a normal curve drawn 1mm high at 13 standard deviations from the middle, as one might casually draw it if asked to go that far out. Drawn to scale, that curve would have to be 3.5 times the size of the universe. (p171) With data for which the normal distribution is appropriate, such as the distribution of human heights, we're not interested in 13 standard deviations out. But the calculation is a stark reminder of the inappropriateness of the normal distribution for other data, the Black Swan world.
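To get a feel for how quickly the tails shrink, here is a minimal sketch in Python that compares the height of the standard normal curve at its centre with its height some number of standard deviations out. The function names are my own for illustration. It only computes the ratio of heights and does not try to reproduce the book's comparison with the size of the universe, which depends on how Wainer set up the drawing.

```python
import math


def normal_pdf(z: float) -> float:
    """Height of the standard normal curve z standard deviations from the mean."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def peak_to_tail_ratio(z: float) -> float:
    """How many times taller the curve is at its centre than at z.

    Algebraically this is exp(z**2 / 2), because the normalising
    constant cancels out of the ratio.
    """
    return normal_pdf(0.0) / normal_pdf(z)


# Ratio at 2 standard deviations (a familiar case) and at 13 (the book's case).
ratio_2 = peak_to_tail_ratio(2.0)
ratio_13 = peak_to_tail_ratio(13.0)

# If the curve is drawn 1 mm high at 13 standard deviations,
# this is how tall the centre would have to be, in metres.
peak_height_m = ratio_13 * 1e-3

print(f"peak/tail ratio at  2 sd: {ratio_2:.3f}")
print(f"peak/tail ratio at 13 sd: {ratio_13:.3e}")
print(f"peak height if tail is 1 mm: {peak_height_m:.3e} m")
```

The ratio at 13 standard deviations is exp(84.5), a number on the order of 10^36, which is why no honest drawing of a normal curve ever shows anything that far out.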
I should add that these are the most technical parts of the book – there are next to no equations and anybody should be able to follow the reasoning with a modest amount of concentration. It's also a rather beautiful book – lots of pictures, after all.
I'd certainly put it on students' required reading lists. Even econometrics courses do a poor job of teaching students about the importance of presenting their results, both to communicate them and to test them. It wasn't until I started my first job that I was taught to draw a simple line graph of every data series I used, so as to eyeball it for outliers, data errors and its main statistical properties (stationary or not? obviously correlated with the other series?).
Another reason Wainer's book and the others I refer to are so important is the much greater availability and use of graphics in the broadband web world. There is a lot of innovation in this field, and new software around. Hans Rosling's Gapminder has become well known through his TED talk. The OECD has introduced something similar on its website. There is much interest in mapping – here is an example that I found yesterday courtesy of a Tweet by @ChristianKreutz. With all this new technological capability around, let's encourage people to use it responsibly!