September 1, 2010

Twenty rules for good graphics

Research tips

My referee reports, and my responses to authors who have submitted papers to the International Journal of Forecasting, repeatedly include comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own taken from things I’ve written for authors.

The following “rules” are in no particular order.

1. Use vector graphics such as eps or pdf. These scale properly and do not look fuzzy when enlarged. Do not use jpeg, bmp or png files as these will look fuzzy when enlarged, or if saved at very high resolutions will be enormous files. Jpegs in particular are designed for photographs not statistical graphics.
2. Use readable fonts. For graphics I prefer sans-serif fonts such as Helvetica or Arial. Make sure the font size is readable after the figure is scaled to whatever size it will be printed.
3. Avoid cluttered legends. Where possible, add labels directly to the elements of the plot rather than use a legend at all. If this won’t work, then keep the legend from obscuring the plotted data, and make it small and neat.
4. If you must use a legend, move it inside the plot, in a blank area.
5. No dark shaded backgrounds. Excel always adds a nasty dark gray background by default, and I’m always asking authors to remove it. Graphics print much better with a white background. The ggplot package for R also uses a gray background (although it is lighter than the Excel default). I don’t mind the ggplot version so much, as it is used effectively with white grid lines. Nevertheless, even the light gray background doesn’t lend itself to printing or photocopying. White is better.
6. Avoid dark, dominating grid lines (such as those produced in Excel by default). Grid lines can be useful, but they should be in the background (light gray on white or white on light gray).
7. Keep the axis limits sensible. You don’t have to include a zero (even if Excel wants you to). The defaults in R work well. The basic idea is to avoid lots of white space around the plotted data.
8. Make sure the axes are scaled properly. Another Excel problem is that the horizontal axis is sometimes treated categorically instead of numerically. If you are plotting a continuous numerical variable, then the horizontal axis should be properly scaled for the numerical variable.
9. Do not forget to specify units.
10. Tick intervals should be at nice round numbers.
11. Axes should be properly labelled.
12. Use linewidths big enough to read. 1pt lines tend to disappear if plots are shrunk.
13. Avoid overlapping text on plotting characters or lines.
14. Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio.
15. Plots should be self-explanatory, so include detailed captions.
16. Use a sensible aspect ratio. I think width:height of about 1.6 works well for most plots.
17. Prepare graphics in the final aspect ratio to be used in the publication. Distorted fonts look awful.
18. Use points not lines if element order is not relevant.
19. When preparing plots that are meant to be compared, use the same scale for all of them. Even better, combine plots into a single graph if they are related.
20. Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. I promise all my students an instant fail if I ever see anything so appalling.
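
Rules 2, 10 and 16 come down to simple arithmetic, so here is a rough Python sketch of how they might be checked before a figure is drawn. None of this comes from a particular plotting package: the function names (nice_step, nice_ticks, figure_height, drawn_font_size), the target of about five tick intervals and the 1-2-5 rounding thresholds are all assumptions made for illustration, and other choices would work just as well.

```python
import math


def nice_step(lo, hi, target=5):
    """Pick a tick step of the form 1, 2 or 5 times a power of ten (rule 10).

    `target` is the rough number of intervals wanted; it is an assumed
    default, not a recommendation from the rules above.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / target                     # step before rounding
    magnitude = 10 ** math.floor(math.log10(raw))
    fraction = raw / magnitude                   # lies in [1, 10)
    if fraction < 1.5:
        nice = 1
    elif fraction < 3:
        nice = 2
    elif fraction < 7:
        nice = 5
    else:
        nice = 10
    return nice * magnitude


def nice_ticks(lo, hi, target=5):
    """Round-number tick positions that stay inside [lo, hi]."""
    step = nice_step(lo, hi, target)
    first = math.ceil(lo / step)
    last = math.floor(hi / step)
    # Round away floating-point noise such as 0.30000000000000004.
    digits = max(0, -math.floor(math.log10(step)))
    return [round(i * step, digits) for i in range(first, last + 1)]


def figure_height(width, ratio=1.6):
    """Height giving a width:height ratio of about 1.6 (rule 16)."""
    return width / ratio


def drawn_font_size(printed_pt, drawn_width, printed_width):
    """Font size to draw with so text prints at `printed_pt` after the
    figure is scaled from `drawn_width` to `printed_width` (rule 2)."""
    return printed_pt * drawn_width / printed_width
```

For data running from 3 to 97, nice_ticks(3, 97) places ticks at 20, 40, 60 and 80. The ticks are kept inside the data range on purpose, so that rounding does not add extra white space around the plotted data (rule 7).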