Advice on making good visualizations
# initialization
import numpy as np
import matplotlib.pyplot as plt
Up to this point we have focused on the mechanics of producing figures, but what about the design aspects? How do we produce good, meaningful figures? Here are a few general principles for your consideration.
1. Provide context in your figure
While a figure may not be entirely self-contained, it should provide context for the data it presents. Concretely, this means:
Make sure the figure has a concise and descriptive title
Make sure that the axes are labeled, including proper units if applicable
For example, suppose we have data about a remote sensor’s depth as a function of time.
# data to plot
time = np.arange(0, 5.1, 0.5)
depth = np.array([0.0, 2.2, 4.9, 10.3, 18.2, 19.3, 16.5, 10.1, 5.2, 1.1, 0.0])
Bad example:
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(time, depth)
ax.set_ylim(20, -0.5)
plt.show()
The above figure provides no context about what the axis values represent, and does not describe what is being plotted. It also gives little hint about how frequently the data were collected.
Good example:
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(time, depth, marker="o")
ax.set_ylim(20, -0.5)
ax.set_title("Depth of remote sensor during a dive", fontsize=18)
ax.set_xlabel("Time (minutes)", fontsize=14)
ax.set_ylabel("Depth (m)", fontsize=14)
ax.tick_params(labelsize=12)
plt.show()
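The two checklist items in this section can also be checked programmatically. The sketch below defines a hypothetical helper, `missing_context` (our own name, not part of Matplotlib), that uses the standard `Axes.get_title()`, `get_xlabel()`, and `get_ylabel()` getters to report which elements are still empty. It only catches omissions; it cannot judge whether a title is descriptive or whether the units are right.

```python
import matplotlib.pyplot as plt
import numpy as np


def missing_context(ax):
    """Return the names of context elements that are empty on `ax`.

    Hypothetical helper for illustration; not part of Matplotlib.
    """
    checks = {
        "title": ax.get_title(),
        "x-axis label": ax.get_xlabel(),
        "y-axis label": ax.get_ylabel(),
    }
    return [name for name, text in checks.items() if not text.strip()]


# Same data as the sensor example
time = np.arange(0, 5.1, 0.5)
depth = np.array([0.0, 2.2, 4.9, 10.3, 18.2, 19.3, 16.5, 10.1, 5.2, 1.1, 0.0])

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(time, depth, marker="o")
ax.set_ylim(20, -0.5)

before = missing_context(ax)  # nothing has been labeled yet

ax.set_title("Depth of remote sensor during a dive", fontsize=18)
ax.set_xlabel("Time (minutes)", fontsize=14)
ax.set_ylabel("Depth (m)", fontsize=14)

after = missing_context(ax)  # all three elements are now set
print(before)
print(after)
```

Running a check like this just before saving a figure is a cheap way to catch an unlabeled axis.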
2. Prefer simplicity and clarity
Sometimes a figure shows too many visual elements, which ends up reducing its clarity. For example, suppose we have some temperature and salinity data, and we want to make a scatter plot to illustrate how the two may be related.
Note: the data we used is a subset of the CalCOFI bottle data, and you can download the .txt file we used here.
# Load the prepared data
# NOTE: don't worry about learning np.loadtxt()
# We will cover data loading tools next week
# Also don't worry about how we obtained the best-fitted line
# We will cover linear regression in a future week
data = np.loadtxt("data/temperature_salinity.txt")
temp = data[:, 0]
salinity = data[:, 1]
slope = -0.0472
intercept = 33.98
temp_fit = np.array([8, 16])
sal_fit = slope * temp_fit + intercept
Bad example:
# Plot the data
fig = plt.figure(figsize=(9, 4))
ax = fig.add_subplot()
ax.set_title("Temperature versus salinity at depth d = 100 m", fontsize=18)
ax.set_xlabel("Temperature (°C)", fontsize=14)
ax.set_ylabel("Salinity", fontsize=14)
ax.tick_params(labelsize=12)
ax.set_xticks(np.arange(8.0, 16.1, 0.5))
ax.set_yticks(np.arange(32.8, 34.0, 0.1))
ax.grid()
ax.scatter(temp, salinity, s=80, marker="o")
ax.plot(temp_fit, sal_fit, lw=3, c="tab:orange")
plt.show()
Since the purpose of the graph is to guide us towards a general trend, the grid does not provide any useful information. In fact, it likely distracts readers from the trend line, which is more important. Moreover, the interval between adjacent ticks is so small that it becomes hard to appreciate the overall scale of the data. Finally, the dots that represent the data are too large, so they overlap and make it hard to appreciate how much data is actually presented.
Good example:
# Plot the data
fig = plt.figure(figsize=(9, 4))
ax = fig.add_subplot()
ax.set_title("Temperature versus salinity at depth d = 100 m", fontsize=18)
ax.set_xlabel("Temperature (°C)", fontsize=14)
ax.set_ylabel("Salinity", fontsize=14)
ax.tick_params(labelsize=12)
ax.set_xticks(np.arange(8.0, 16.1, 1))
ax.set_yticks(np.arange(32.8, 34.0, 0.2))
ax.scatter(temp, salinity, s=20, marker="o")
ax.plot(temp_fit, sal_fit, lw=2, ls="--", c="tab:orange")
plt.show()
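Tick spacing can also be set with a locator instead of listing tick positions by hand. The following is a minimal sketch using `MultipleLocator` from `matplotlib.ticker`. The scatter data here is a synthetic stand-in generated for illustration (it reuses the slope and intercept quoted above plus random noise) and is not the CalCOFI data; the names `temp_demo` and `sal_demo` are our own.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

# Synthetic stand-in data, NOT the CalCOFI measurements
rng = np.random.default_rng(0)
temp_demo = rng.uniform(8, 16, size=50)
sal_demo = -0.0472 * temp_demo + 33.98 + rng.normal(0, 0.05, size=50)

fig = plt.figure(figsize=(9, 4))
ax = fig.add_subplot()
ax.scatter(temp_demo, sal_demo, s=20, marker="o")
ax.set_xlim(8, 16)

# Place major ticks at every multiple of 1 °C and of 0.2 salinity units
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.yaxis.set_major_locator(MultipleLocator(0.2))

ax.set_xlabel("Temperature (°C)", fontsize=14)
ax.set_ylabel("Salinity", fontsize=14)
ax.tick_params(labelsize=12)

x_ticks = ax.get_xticks()  # candidate tick positions, spaced 1 apart
```

A locator keeps the spacing fixed even if the axis limits change later, whereas a hand-written list of ticks has to be updated along with the limits.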
3. Keep the visual elements consistent
Bad example:
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(time, depth, marker="o")
ax.set_title("Depth of remote sensor during a dive", fontsize=12)
ax.set_xlabel("Time (minutes)", fontsize=14)
ax.set_ylabel("Depth (m)", fontsize=20)
ax.tick_params(axis="x", labelsize=14)
plt.show()
Here depth is plotted with values increasing upward, contrary to the common oceanographic convention of depth increasing downward. Moreover, the font sizes of the x- and y-axis labels are unequal, and both are larger than the title. The tick labels are inconsistent as well: only the x-axis tick labels have been resized.
Good example: (same as the good example under bullet point #1)
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(time, depth, marker="o")
ax.set_ylim(20, -0.5)
ax.set_title("Depth of remote sensor during a dive", fontsize=18)
ax.set_xlabel("Time (minutes)", fontsize=14)
ax.set_ylabel("Depth (m)", fontsize=14)
ax.tick_params(labelsize=12)
plt.show()
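One way to keep font sizes consistent across many figures is to set them once through Matplotlib's `rcParams` rather than repeating `fontsize=` in every call. The sketch below uses `plt.rc_context` with the sizes from the good example; the dictionary name `style` is our own choice. Since some artists (tick labels in particular) may only be created when the figure is drawn, the sketch draws the figure inside the `with` block.

```python
import matplotlib.pyplot as plt
import numpy as np

# Font sizes matching the good example, defined once
style = {
    "axes.titlesize": 18,
    "axes.labelsize": 14,
    "xtick.labelsize": 12,
    "ytick.labelsize": 12,
}

time = np.arange(0, 5.1, 0.5)
depth = np.array([0.0, 2.2, 4.9, 10.3, 18.2, 19.3, 16.5, 10.1, 5.2, 1.1, 0.0])

# The settings are temporary and are restored when the block exits
with plt.rc_context(style):
    fig = plt.figure(figsize=(8, 5))
    ax = fig.add_subplot()
    ax.plot(time, depth, marker="o")
    ax.set_ylim(20, -0.5)
    ax.set_title("Depth of remote sensor during a dive")
    ax.set_xlabel("Time (minutes)")
    ax.set_ylabel("Depth (m)")
    fig.canvas.draw()  # render while the settings are still active

title_size = ax.title.get_fontsize()
label_sizes = (ax.xaxis.label.get_fontsize(), ax.yaxis.label.get_fontsize())
```

Reusing the same `style` dictionary for every figure in a report makes it hard for the sizes to drift apart by accident.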
4. Keep accessibility in mind
In essence, we should be thoughtful about text size, and we should choose our colors carefully so that people who have color blindness or who print in grayscale can still glean useful information from the figure. In addition, if color is used to distinguish between different kinds of data, try to add a second distinguishing visual element besides color, such as line style or marker shape.
# data to be plotted
t_array = np.linspace(0, 10, 51)
vel_dragless = 10 * t_array
vel_dragful = 100 * (1 - np.exp(-0.2 * t_array))
Bad example:
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(t_array, vel_dragless, c="red", label="no drag")
ax.plot(t_array, vel_dragful, c="green", label="linear drag")
ax.set_title("Velocity as a function of time")
ax.set_xlabel("time (s)")
ax.set_ylabel("speed (m/s)")
ax.set_xlim(0, 10.5)
ax.set_ylim(0, 105)
ax.legend()
plt.show()
In the above figure the two data series are distinguished from each other only by color, and the colors used are not color-blind friendly. Moreover, the tick labels, axis labels, and figure title are all rather small.
Good example:
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(t_array, vel_dragless, c="tab:blue", ls="--", lw=2, label="no drag")
ax.plot(t_array, vel_dragful, c="tab:orange", label="linear drag")
ax.set_title("Velocity as a function of time", fontsize=18)
ax.set_xlabel("time (s)", fontsize=14)
ax.set_ylabel("speed (m/s)", fontsize=14)
ax.tick_params(labelsize=12)
ax.set_xlim(0, 10.5)
ax.set_ylim(0, 105)
ax.legend(fontsize=14)
plt.show()
If you want to see how people with color blindness may view a figure, consider using the color blindness simulator at https://www.color-blindness.com/coblis-color-blindness-simulator/
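For the grayscale case, a rough programmatic check is to compare the approximate brightness (luma) of the line colors. The sketch below uses `matplotlib.colors.to_rgb` together with the Rec. 601 luma weights (0.299, 0.587, 0.114). The helper name `luma` is our own, and this is only a coarse approximation of how a printer or viewer converts color to gray. It does not simulate color blindness.

```python
from matplotlib.colors import to_rgb


def luma(color):
    """Approximate grayscale brightness (0 = black, 1 = white) of a color.

    Hypothetical helper using the Rec. 601 luma weights; a rough guide only.
    """
    r, g, b = to_rgb(color)
    return 0.299 * r + 0.587 * g + 0.114 * b


# Color pairs from the bad and good examples in this section
bad_gap = abs(luma("red") - luma("green"))
good_gap = abs(luma("tab:blue") - luma("tab:orange"))

print(f"red vs. green:           {bad_gap:.3f}")
print(f"tab:blue vs. tab:orange: {good_gap:.3f}")
```

A small gap means the two lines would print as nearly the same shade of gray. With these weights, red and green come out almost identical, while tab:blue and tab:orange stay clearly apart. Varying the line style as well, as the good example does, keeps the series distinguishable even when the brightness gap is small.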