This post title plays on Nathan Vass’s book, but doesn’t have to do with Seattle bus transportation!
I’ve been studying how to use R programming for data analysis. We learned about the “geometry” layer of creating graphs using a neat function called ggplot(), from the ggplot2 package. (In case you wondered like me, ‘gg’ stands for Grammar of Graphics, which is a book written about data visualization. Nothing to do with Gigi’s nor gee’s). Compared to using graphing functions like matplot() (“matrix plot”) that come standard in “base R”, graphing tools from packages created by others tend to take care of a lot of details that would be cumbersome to set manually.
We took a set of 562 American movies spanning many genres from 2007-2011, their Rotten Tomatoes ratings (from critics and public audience) and budget in dollars. The following shows how Critic versus Audience ratings compared, by genre (color) and budget (line thickness & dot size, millions of dollars).

Isn’t this graph crazy? Like rainbow paint was splashed everywhere. This left me wondering what is the point of having a graph with lines overlaid above dots like this? It’s hard to distinguish between line strokes and dots, not to mention its not at all necessary to ‘connect the dots’ with geom_line’s.
That’s when I started creating a different graph. I have this banana plant that is unfurling long, luscious green leaves, each one bigger than the next. I counted each leaf (number 1 being the first/oldest, and smallest leaf), and measured the length in centimeters. Ultimately I’d like to create a predictive model (comparing regression versus machine learning) to forecast how long the next leaf will be. Anyways, seeing this graph, this is a case where having lines and dots isn’t so chaotic, and actually aesthetically makes sense.

The lesson here is that combining both geometric points and lines – geom_line() + geom_point() is better suited when:
- Dots and their connecting lines are distinct colors
- Lines are not so thick that the dots are hard to distinguish
- Generally speaking, that it makes sense to have a reason to draw lines (to emphasize trends, for example).
None of these were the case with the movies graph – so while it may be worthy of being displayed in an art museum, the simpler Banana Growth graph makes a much better use of ggplot geometric dots and lines.