Teaching Science

(and Teaching Scientists...)

I am passionate about teaching scientific methodology, especially to humanities scholars. The core of the scientific method is a priori hypothesis testing, a practice that is critically lacking in most humanities work! I am particularly interested in statistical and quantitative methodologies; check out my first (forthcoming) methodology paper here. As much as I like teaching science to humanists, I am also particularly adept at teaching humanities concepts to scientists!

Statistics

Good statistics are essential to science. Unfortunately, good statistics are hard, and too many scientists (one being too many) don't take the time to think deeply about their statistics. In fact, there is a certain pressure for statistical reporting to simply conform to popular "norms," simply because "everybody does it that way," which can lead to thoughtless, "cookie cutter" application of the same basic statistical models, even when they are inappropriate. To some extent this stiff standardization serves a purpose, holding everybody to the same objective standard, but I believe the risks outweigh the benefits.

Distribution assumptions: Finding the right link-function

Most reported statistical tests are based on some form of the general linear model (GLM). The GLM is a very flexible framework, which can be applied to many different data sets and hypotheses. However, many scholars only ever use a few "standard" versions of the GLM: Student's t-test for comparing two categories; the Analysis of Variance (ANOVA) for comparing more than two categories; and linear regression for clearly linear relationships. These three tests rest on the same fundamental assumption about the uncertainty, or error, in the data: specifically, that it is normally distributed. Unfortunately, the normal distribution is not a good description of the error in much experimental data. Luckily, the GLM can be realized in a number of different, more flexible forms using different "link functions" and error distributions, which are often better descriptions of our data!
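To make the "one model underneath" point concrete, here is a minimal, standard-library-only sketch (with made-up data values) showing that a pooled two-sample t-test and a linear regression with a 0/1 group predictor are the same GLM: they produce the identical t statistic.

```python
import math
import statistics

# Two groups of hypothetical measurements
group0 = [4.1, 5.0, 4.6, 5.2, 4.8]
group1 = [6.0, 5.7, 6.3, 5.9, 6.4]

# --- Classic pooled two-sample t-test ---
n0, n1 = len(group0), len(group1)
m0, m1 = statistics.mean(group0), statistics.mean(group1)
sp2 = ((n0 - 1) * statistics.variance(group0)
       + (n1 - 1) * statistics.variance(group1)) / (n0 + n1 - 2)
t_test = (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

# --- The same hypothesis as a linear model: y = b0 + b1 * group ---
x = [0] * n0 + [1] * n1          # a 0/1 "dummy" predictor coding group
y = group0 + group1
xm, ym = statistics.mean(x), statistics.mean(y)
b1 = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
      / sum((xi - xm) ** 2 for xi in x))
b0 = ym - b1 * xm
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in resid) / (len(y) - 2)            # residual variance
se_b1 = math.sqrt(s2 / sum((xi - xm) ** 2 for xi in x))   # slope standard error
t_reg = b1 / se_b1

# The regression slope is exactly the difference of group means,
# and the two t statistics agree to floating-point precision.
```

With a binary predictor, the fitted slope is exactly the difference between the two group means, which is why the two analyses are interchangeable.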

Many experimental studies gather data in the form of binary counts. For instance, we might ask \(35\) participants to each listen to \(100\) pieces of music and identify how many they think sound happy. Participant 1 might identify \(67\) pieces as happy, while Participant 2 identifies \(55\), etc. We would end up with \(35\) numbers (one per participant), each an integer in the range \(\left[0,100\right]\). To subject this data to linear regression or ANOVA, many scholars would convert each count to a "proportion of happy songs" by dividing it by the total (in this case \(100\)). They would then apply linear regression or ANOVA directly to these proportions, which amounts to assuming that the error in the data is normally distributed. Unfortunately, in such cases, this assumption cannot actually hold: proportions are bounded (they can't be greater than 1 or less than 0) and they are not truly linear. For instance, the difference between \(0.55\) and \(0.56\) is not really the same as the difference between \(0.98\) and \(0.99\). To give a concrete example: imagine that the average proportion turns out to be \(0.95\) with a standard deviation of \(0.1\). A normal distribution with these parameters implies that about \(0.3085\) of values should be \(\gt 1\), which we know is impossible. For this type of data, instead of plain linear regression/ANOVA on the proportions, we can use logistic regression. Logistic regression treats the error in the data as binomially distributed (like a series of coin flips), and uses the logit link function to map proportions in \(\left(0,1\right)\) onto an unbounded scale, so that the model's predictions can never fall outside the \(\left[0,1\right]\) bounds.
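Both points above can be checked numerically with nothing but the standard library. This sketch shows that a \(0.01\) step in proportion near the boundary is a much bigger step on the logit scale than the same step near the middle, and verifies the normal-tail calculation for a mean of \(0.95\) and a standard deviation of \(0.1\):

```python
import math

def logit(p):
    """Map a proportion in (0, 1) onto the unbounded real line."""
    return math.log(p / (1 - p))

# Equal differences in proportion are NOT equal differences on the logit scale:
d_middle = logit(0.56) - logit(0.55)   # a 0.01 step near the middle
d_edge   = logit(0.99) - logit(0.98)   # a 0.01 step near the boundary
# d_edge is many times larger than d_middle

def normal_tail(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 - math.erf((x - mu) / (sigma * math.sqrt(2))))

# If proportions really were normal with mean 0.95 and SD 0.1, the share of
# values above 1 would be about 0.3085: an impossible prediction.
p_impossible = normal_tail(1.0, 0.95, 0.1)
```

The logit function blows up near \(0\) and \(1\), which is exactly what lets a logistic model respect the bounds that a plain linear model ignores.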

Independence assumptions: Hierarchical models

Another assumption that most statistical models make about data is that the random error on each data point is independently and identically distributed (IID). This assumption is fundamental to the maximum-likelihood estimation that underlies most statistical models: put plainly, if data points are not independent, then we have no way of knowing how likely they are as a group! Unfortunately, many experimental designs (necessarily) collect data points that are not independent. For example, we often have experimental participants give us multiple data points (so-called repeated measures): if any variable property of a person (intelligence, knowledge, personality, mood, etc.) influences the responses a person gives, then multiple data points from the same person will tend to be systematically alike! In music, whenever we analyze large musical corpora, we have many different notes/chords/rhythms drawn from the same piece of music; again, these data points aren't really independent of each other.

One very flexible and powerful approach to dealing with data dependence is an expansion of the general linear model called multi-level models (a.k.a. "hierarchical" or "mixed-effects" models). In 2016, I taught a workshop on multi-level models at the Society for Music Theory's national conference in Vancouver, and I've consulted on these models for a number of other projects.
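The core idea behind a multi-level model, which is that each participant contributes their own "random intercept" on top of shared residual noise, can be illustrated with a small simulation. This is a hypothetical sketch (the participant counts, SDs, and method-of-moments variance estimates are all invented for illustration), not an actual mixed-effects fit:

```python
import random
import statistics

random.seed(1)

N_SUBJECTS, N_TRIALS = 40, 50
SUBJECT_SD, NOISE_SD = 2.0, 1.0   # true random-intercept SD and residual SD

# Simulate repeated measures: every response from one person shares
# that person's baseline, so responses within a person are correlated.
responses = {}
for s in range(N_SUBJECTS):
    baseline = random.gauss(0, SUBJECT_SD)       # person-level random intercept
    responses[s] = [baseline + random.gauss(0, NOISE_SD)
                    for _ in range(N_TRIALS)]

# Within-subject variance: average variance of each person's own responses.
within = statistics.mean(statistics.variance(ys) for ys in responses.values())

# Between-subject variance: variance of the per-person means, minus the
# residual noise each mean still carries (within / N_TRIALS).
means = [statistics.mean(ys) for ys in responses.values()]
between = statistics.variance(means) - within / N_TRIALS

# The recovered components sit near the true values (1.0 and 4.0),
# showing that the data really do contain two distinct levels of variation.
```

A real mixed-effects model estimates these two variance components jointly with the fixed effects, which is what lets it give honest standard errors when the IID assumption fails.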

Syllabi

Check out these actual and proposed university course syllabi:

Actual

Proposed