Data Analysis and Visualisation: Charts

Pick an Appropriate Chart for Your Data

There are many different kinds of charts out there, some very familiar from everyday use, while others can be highly unusual, complex, or rarely seen. It's important to use a chart that's appropriate to the data you have, and that allows you to explore the information you're interested in. The Data Visualisation Catalogue can help you figure out the best chart for your situation.

Some common charts include:

Bar charts

  • Use for data that can be placed into categories
  • Always use zero as the baseline that bars start from
  • Sort your bars into a meaningful order. If your categories have an inherent order (e.g. age groups), use that order. If your categories are unordered (e.g. ethnicities), sorting by ascending or descending bar height is generally the most appropriate choice
  • Use spaces between your bars, but don't space them farther apart than two thirds of the width of a bar
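
The ordering rule for bars can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `order_bars`, its parameters, and the sample counts are hypothetical and not taken from any particular charting library.

```python
def order_bars(values, inherent_order=None, descending=True):
    """Return (category, value) pairs in the order the bars should be drawn.

    values: dict mapping category name -> bar height.
    inherent_order: optional list of categories in their natural order
        (e.g. age groups). When given, it takes priority over bar height.
    """
    if inherent_order is not None:
        # Ordered categories: keep the natural sequence.
        return [(c, values[c]) for c in inherent_order if c in values]
    # Unordered categories: sort by bar height.
    return sorted(values.items(), key=lambda kv: kv[1], reverse=descending)


# Made-up example data, for illustration only.
transport = {"Bus": 12, "Car": 30, "Bicycle": 7, "Train": 18}
age_counts = {"40-59": 22, "0-19": 15, "60+": 9, "20-39": 31}

by_height = order_bars(transport)
by_age = order_bars(age_counts, inherent_order=["0-19", "20-39", "40-59", "60+"])
```

Here `by_height` lists the transport modes from tallest bar to shortest, while `by_age` keeps the age groups in their natural sequence regardless of bar height.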

Histograms

  • Use for data that is divided into range bins
  • Always use zero as the baseline that bars start from
  • Your data is continuous, so do not use spaces between your bars
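
As a rough sketch of what "divided into range bins" means, the following Python counts values in contiguous, equal-width bins. The function name `histogram_counts`, the bin width, and the sample heights are assumptions made for this example. Empty bins are kept with a count of zero, so the bars stay adjacent with no gaps in the range.

```python
def histogram_counts(data, bin_width, start=None):
    """Count values in contiguous, equal-width bins [low, low + bin_width).

    Returns a list of (low, high, count) tuples, including empty bins.
    """
    if not data:
        return []
    if start is None:
        start = min(data)
    if min(data) < start:
        raise ValueError("start must not exceed the smallest data value")
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, c)
        for i, c in enumerate(counts)
    ]


# Made-up heights in centimetres, for illustration only.
heights = [150, 152, 158, 161, 163, 167, 185]
bins = histogram_counts(heights, bin_width=10, start=150)
```

The 170 to 180 bin contains no values but still appears in `bins`, which is what lets the histogram show a gap in the data as a zero-height bar rather than as missing space.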

Pie charts

  • Use to display data that represent portions of a whole
  • Pie charts are useful to provide an overview of the relative proportions of a small number of categories
  • Use a bar chart rather than a pie chart if you:
    • have more than five or six categories;
    • have segments of similar size;
    • want to compare multiple charts; or
    • want people to pay attention to the sizes of the categories rather than just get an overall impression of relative importance
  • Do not use exploded pie charts

Line graphs

  • Use to display continuous quantitative data that changes over some interval, such as time
  • Useful to display the trend of a data series, or to compare different trends by plotting multiple lines on the same graph
  • Make sure you have enough data points to make the trend of the line meaningful - the trend of only two or three points is not useful
  • If your data are all in a limited range well above zero, you can start plotting y-axis values just below the lowest y-value of your data. You should indicate this with a zigzag line at the bottom of the y-axis. Be aware that plotting this way will increase the apparent variation within your data, so if you are interested in absolute trends rather than the variation within or between trends, you should start your y-axis from zero.
  • Values on the y-axis can be positive or negative, with negative values being plotted below the x-axis
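
To see how much a raised y-axis baseline increases apparent variation, it can help to compare the plotted heights of two values. The sketch below is a simple arithmetic illustration; the function name `apparent_ratio` and the numbers 96, 100, and 95 are made up for the example.

```python
def apparent_ratio(low, high, baseline=0.0):
    """Ratio of the plotted heights of two values above the axis baseline."""
    if low <= baseline:
        raise ValueError("baseline must be below both values")
    return (high - baseline) / (low - baseline)


# Two hypothetical data values that differ by about 4%.
ratio_from_zero = apparent_ratio(96, 100)              # 100 / 96
ratio_truncated = apparent_ratio(96, 100, baseline=95)  # 5 / 1
```

With the axis starting at zero the higher point sits only about 4% above the lower one, but with the axis starting at 95 it appears five times as high.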

Scatterplots

  • Use for data where you wish to investigate the relationship (if any) between two continuous, numerical variables
  • Colour or symbol shape can be used to indicate a third, often categorical variable of the data
  • If a lot of your points are overlapping, it can be useful to plot your points as unfilled outlines so that you can see all of the points in the cluster, rather than just an undifferentiated blob
  • A line of best fit (also called a trend line or regression line) can be added to show the proposed underlying mathematical relationship between the variables. However, it is important not to over-fit your data: you want to represent the data, not the errors that are inherent in measurements!
  • Always remember that a correlation between two variables does not necessarily mean that they are causally related!

Proportional area plots

  • Use when you want to compare the relative sizes of some variable of your data
  • Comparing areas accurately is difficult, so these plots are most useful to communicate an overview of relative sizes, rather than as part of a rigorous analysis
  • Ensure that you plot your shapes so that the area of the shape scales with the variable, rather than a linear quantity of the shape, such as radius (for circles) or side length (squares, triangles, etc.). For example, the area of a circle is proportional to the square of its radius, so if you plot circles with the radius corresponding to the data value, you will end up greatly exaggerating the differences between your data values
  • Proportional area plots can be combined with scatterplots to show the relationships between three different variables - these are often called bubble charts or plots
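
The area-scaling rule for circles can be checked with a short calculation. In this sketch the function name `radius_for_area`, the `area_per_unit` parameter, and the two sample values are assumptions chosen for illustration.

```python
import math


def radius_for_area(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(value * area_per_unit / math.pi)


# Two hypothetical data values, one four times the other.
values = [10, 40]

# Correct approach: scale the area, so the radius grows with the square root.
radii = [radius_for_area(v) for v in values]
areas = [math.pi * r ** 2 for r in radii]
radius_ratio = radii[1] / radii[0]
area_ratio = areas[1] / areas[0]

# Mistaken approach: use the value as the radius, so the area grows
# with the square of the value.
naive_area_ratio = (values[1] / values[0]) ** 2
```

When the area is scaled, a value four times larger gives a circle with four times the area and only twice the radius. Using the value directly as the radius would instead draw a circle with sixteen times the area.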

Tree diagrams and treemaps

  • Use either of these chart types for hierarchical data
  • For treemaps, plot the area of the nested rectangles to be proportional to a specified variable of the data
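
As a minimal sketch of area-proportional rectangles, the code below splits one rectangle into side-by-side strips. This is the simplest possible "slice" layout for a single level of a hierarchy, not the layout used by any specific treemap tool; the function name `slice_layout` and the budget figures are hypothetical.

```python
def slice_layout(values, x=0.0, y=0.0, width=1.0, height=1.0):
    """Split a rectangle into vertical strips with areas proportional to values.

    values: dict mapping name -> positive number.
    Returns a dict mapping name -> (x, y, width, height).
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rects = {}
    offset = x
    for name, v in values.items():
        strip_width = width * v / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects


# Made-up budget shares, for illustration only.
budget = {"Staff": 50, "Equipment": 30, "Travel": 20}
rects = slice_layout(budget, width=10.0, height=4.0)
areas = {name: w * h for name, (_, _, w, h) in rects.items()}
```

Each strip's area is the same fraction of the parent rectangle as its value is of the total. For nested data, the same function could be applied again inside each strip to lay out that category's children.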

Other Tips for Charts

Declutter your annotations
You need to make sure that you include enough annotations for your chart to be understood: x- and y-axis labels indicating units, a legend, and labels highlighting important points or trends in the data can all be essential to ensure that your chart succeeds in conveying the intended information. However, you should remove any non-essential annotations that might be cluttering up your chart. Generally remove gridlines, any background fill, extra tick marks along the axes, busy pattern fills, any colour that doesn't represent information, and any extraneous text. It will be up to you to decide what is necessary to keep your chart usable, but keep in mind that having less detail to sort through means that your viewers will quickly see and focus on the main messages of your visualisation.

Declutter your data
Keep in mind that if you try to put too much on one chart then the relationships that you want to display can become lost. For instance, plotting a large number of lines on a line graph can mean that the trends of all lines become obscured and no information can be taken away from the resulting visualisation. In this case, it would be better to plot small subsets of the data on multiple charts, and compare the charts to each other. Using dual y-axes can also make your chart difficult to interpret, and potentially even imply relationships between variables that are merely artefacts of how the variables were plotted. Any time you feel the data are overwhelming the viewer, assess whether it would be possible to use multiple plots to make relationships and trends clearer. Note that this is not a licence to throw away data that doesn't fit your message! You always need to make sure that you are accurately representing your data, or the whole point of creating a visualisation is lost.

Comparing multiple plots
If you intend to compare multiple plots, you must ensure that they have been plotted the same way, are the same size, have the same axis scales, and employ the same conventions, so that your viewers can draw accurate comparisons between them. They should also be plotted next to each other, rather than on separate pages or slides.
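
One way to keep axis scales consistent is to compute a single set of limits from all of the data and apply it to every plot. The sketch below assumes a hypothetical helper named `shared_limits` and made-up measurements from two sites.

```python
def shared_limits(*series, pad=0.05):
    """Common (low, high) axis limits that cover every series.

    pad: fraction of the overall data range added as a margin at each end.
    """
    all_values = [v for s in series for v in s]
    if not all_values:
        raise ValueError("at least one data value is required")
    low, high = min(all_values), max(all_values)
    margin = (high - low) * pad
    return low - margin, high + margin


# Made-up measurements, for illustration only.
site_a = [3, 8, 12]
site_b = [40, 55, 103]

# Use the same limits for the y-axis of both plots.
y_limits = shared_limits(site_a, site_b, pad=0.0)
```

If each plot were instead scaled to its own data, the lines for the two sites would fill a similar amount of space and hide the fact that one set of values is much larger than the other.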

Colour
If you use colour in your charts, make sure that a difference in colour provides the viewer with useful information, as extraneous colour changes can be distracting and make it more difficult to compare the data represented. For instance, don't give every bar in a bar chart a different colour; the different categories are already indicated by the fact that they are separate bars! Either use the same colour for all of your bars, colour them based on another variable, or use colour to highlight a particular bar that you want to draw attention to. Remember to think about how colour in your chart will affect its usability if it is reproduced in black and white, as well as how your chart will appear to people with colour blindness. Use resources like ColorBrewer to choose colour palettes that are colour-blind friendly and that remain distinguishable in black and white.
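
The advice on bar colours can be sketched as a small helper that gives every bar the same neutral colour and reserves one accent colour for the bar being highlighted. The function name `bar_colours` and the two hex colours are arbitrary assumptions for this example, not a recommended palette.

```python
def bar_colours(categories, highlight=None, base="#7f7f7f", accent="#d95f02"):
    """Return one colour per bar: a shared base colour, plus an accent
    colour for the single highlighted category (if any)."""
    return [accent if c == highlight else base for c in categories]


# Made-up categories, for illustration only.
categories = ["Bus", "Car", "Bicycle", "Train"]
uniform = bar_colours(categories)
emphasised = bar_colours(categories, highlight="Bicycle")
```

The resulting list can be passed to whatever charting tool you use as the per-bar colour setting, so that colour only appears where it carries information.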