Research: Computational Musicology

As a musician, composer, and theorist, I can't help but be interested in deeply musicological and music-theoretic questions:

  • What is the difference between tonicization and modulation?
  • How do we explain "blue notes"?
  • What's the difference between \(\mathrm{V}^7\) and \(\mathrm{vii}^{\circ}\)?

Traditional musicologists approach these questions by studying and exploring hundreds of pieces in loving, excruciating detail over the course of many years. In doing so, they build a rich and detailed knowledge of the repertoire that interests them, which they can draw on to answer such questions. They are drawn to the nuances and details of "great" musical works, the details associated with the most profound artistic experiences we as humans can have. However, humans can only hold so much detail in mind, and evaluating and comparing pieces objectively becomes difficult. What's more, an individual musical passage can only be properly understood in the context of a musical style and, more broadly, of the cumulative musical experience of "enculturated" listeners. This is what drives me to work with large digital corpora. Using computers and scientific sampling methods, we can survey large bodies of music (hundreds or thousands of pieces) much faster, and more consistently and objectively, than traditional humanistic methods allow. Most importantly, we can step back from the data, formulating hypotheses a priori and then testing them. We call this type of work computational musicology.

Data

Computational musicology requires data. Unlike Music Information Retrieval (MIR) researchers, who mostly work with audio, computational musicologists mostly study symbolic music data: digital representations of musical scores (or transcriptions). I've been involved in the creation and curation of a number of symbolic music corpora. For my dissertation, I transcribed and curated a dataset (MCFlow) of over 120 popular rap songs. I was also closely involved in the construction of the Theme and Variation Encodings with Roman Numerals dataset. At McGill, I worked with a team to prepare a new corpus of chorale music by Michael Prætorius, which serves as a valuable point of contrast with the frequently analyzed chorales of J.S. Bach. I am currently working to add melodic, poetic, and lyrical data to McGill's existing Billboard rock/pop harmonic transcriptions.

Coding

Computational musicology also requires knowledge of computer programming. Most computational musicologists, myself included, develop much of their own scripts and software from scratch. Like many of my colleagues, I would like to see an end to this duplication of effort, and I therefore aim to create and share open-source tools for music research. I am familiar with the two main general-purpose computational musicology toolkits: the Humdrum Toolkit and music21. I am also the lead developer of humdrumR, a project to port the Humdrum Toolkit to the R programming language.

My tools, like humdrumR, are aimed not just at existing coders but also at scholars with little or no coding experience. By making our tools accessible to traditional humanities scholars, we encourage them to discover the power of computational techniques, drawing more talent and humanistic knowledge into our already burgeoning field. I also create user-friendly GUIs to make music theories and research results accessible to laypeople. For instance, as part of my dissertation project, I created a website (rapscience.net) that allows users to easily access and visualize my rap dataset. Users can make a variety of flow diagrams of verses in the corpus, or generate descriptive visualizations of rhythm, phrasing, and rhyme in subsets of the corpus. In this way, I work to engage public interest in music theory and science.