What a Tangled Web We Weave...

Update 7/23/2019 Various package updates have created problems with showing more than one javascript plot on a post. I’ve added calls to htlwidgets::onRender to get at least one plot displayed. I may revisit this, but the interaction between hugo, blogdown, and various javascript libraries (chorddiag, networkD3, D3, data tables, etc) is more than I’m able to dive into at the moment.

cd <- chorddiag(
  xtabs(~MAJOR+minor, data = mmhl[mmhl$Grad.Year == 2017,],
        drop.unused.levels = TRUE),
  showTicks = FALSE, groupColors = many_colors, type = "bipartite"
)

htmlwidgets::onRender(cd,'document.getElementsByTagName("svg")[0].setAttribute("viewBox", "")')

The point of this post is to illustrate how to make the above “chord diagram” using the chorddiag package which leverages D3.js from R. The above visual shows students major/minor combinations who graduated in 2017 from the College of Idaho with at least one minor connected to my department: MAPS. Lower case abbreviations are minors, upper case are majors. Be sure to hover over chords, sectors, or labels to get the full D3.js effect. Also each major can be up to triple counted since students typically earn a major and 3 minors and we’re only plotting the pairings.

Getting Ready

To do this, we’ll need several common packages:

library(dplyr) #pipes, mutate, select
library(tidyr) #gather
library(stringr) #string processing
library(RColorBrewer) #Colors!!

We’ll also need the excellent chorddiag package, available via github:

#install if needed
devtools::install_github("mattflor/chorddiag")
#definitely load
library(chorddiag)

Of course, we’ll also need some data, I’ll be using a data set of “anonymized” records of students major/minors degree info from the College of Idaho, which I had in another project: data. I’ve read it in with read.csv() and dropped 2 columns:

head(mmh) %>% knitr::kable()
Grad.Year Degree Major1 Minor1 Minor2 Minor3
2012 BA BUS MAT NA NA
2012 BA PSY PHY CRW CSC
2011 BA MUS EDS MAT NA
2011 BA ACCT MAT NA NA
2011 BS MAT CHE PHY NA
2013 BS MATPH CSC NA NA

Since students at C of I get a major and 3 minors (in place of a set general education core), we’ll have to reshape this and clean it up a bit.

Reshape and First Plot

We want to just look at major/minor pairs, so we’re going to gather the minors together, change case and trim spaces:

mmhl <- gather(mmh, key=mNum, value = minor, 
               -c(Grad.Year, Major1, Degree)) %>% 
  select(-mNum) %>% 
  mutate(minor = str_to_lower(str_trim(minor, side = "both"))) %>%
  rename(MAJOR = Major1)

head(mmhl) %>% knitr::kable()
Grad.Year Degree MAJOR minor
2012 BA BUS mat
2012 BA PSY phy
2011 BA MUS eds
2011 BA ACCT mat
2011 BS MAT che
2013 BS MATPH csc

Let’s try a plot to see how it looks. Chord diagrams need a matrix or contingency table though, so we make that and then plot.

majMinTab <- xtabs(~MAJOR+minor, data = mmhl[mmhl$Grad.Year==2017,],
                   drop.unused.levels = TRUE)
chorddiag(majMinTab, type="bipartite")

The type="bipartite" allows us to pass a non-square matrix to chorddiag since we only need links between two sets of groups (majors and minors). This isn’t bad, but the default color options only allow about 16 groups (8 from the dark2 palette and 8 form the grey2 palette). Also the tick marks seem unnecessary.

Improving Colors

Removing the tick marks is easy (set option showTicks=FALSE), but coming up with lots of colors is a bit harder. The table has dimensions 8, 20, so we would 28 colors, and this isn’t even all possible combinations that C of I students have obtained!

To get a large list of relatively distinct colors, we can do this by squishing the “qualitative” colors palettes from color brewer together. This gives 74 colors with only a little repetition:

qual_cols = brewer.pal.info[brewer.pal.info$category == 'qual',]
many_colors = unlist(mapply(brewer.pal, qual_cols$maxcolors, rownames(qual_cols)))

Now many_colors has a large color palette for us to make our plot:

chorddiag(
  xtabs(~MAJOR+minor, data = mmhl[mmhl$Grad.Year == 2017,],
        drop.unused.levels = TRUE),
  showTicks = FALSE, groupColors = many_colors, type = "bipartite"
)

I think it’s interesting to see the change over time, here’s the same plot using 2016 data:

chorddiag(
  xtabs(~MAJOR+minor, data = mmhl[mmhl$Grad.Year == 2016,],
        drop.unused.levels = TRUE),
  showTicks = FALSE, groupColors = many_colors, type = "bipartite"
)

Perhaps we could animate over the years… But I should finish my second year evaluation first.

Related