A Corpus Study of Rock Music

Programs

Below are some programs that we have used in this corpus study. The program expand6.c, written in C, expands a harmonic analysis file into a "chord list". The Perl script process-mel5.pl converts a melodic transcription into a "note list". The Perl script add-timings.pl adds absolute timing information to a note list or chord list.

Other Perl scripts were used to extract aggregate statistics from one or more harmonic analyses or melodic transcriptions. (These programs were used to generate the statistics presented in our 2011 Popular Music paper.)

These programs are all described in the documentation area below. Click on a "code" link to see the code, or right-click (control-click) to download.

Documentation

expand6.c

process-mel5.pl

add-timings.pl

tally.pl

compare.pl

compare-meter.pl

trigram.pl

tally-pitches5.pl

tally-mel.pl

expand6.c [code]

The expander program takes an analysis file - a list of rules, written in the syntax of our harmonic analyses. It searches for the rule with "S" as the LHS, and then outputs its RHS, recursively expanding any nonterminals. If there is an error in the syntax or the analysis cannot be interpreted - for example, because a nonterminal symbol is used in a definition but not defined elsewhere - the program outputs an error message and quits.
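
The recursive expansion described above can be sketched in Python. This is only an illustration under stated assumptions, not the logic of expand6.c: the rules are given as a dictionary rather than parsed from an analysis file, nonterminals are marked with a leading "$" purely for this example, and the function name expand and the self-reference check are hypothetical.

```python
def expand(rules, symbol="S", active=()):
    """Return the list of terminal tokens obtained by expanding `symbol`."""
    if symbol not in rules:
        raise ValueError(f"nonterminal {symbol!r} is used but not defined")
    if symbol in active:
        raise ValueError(f"rule {symbol!r} is defined in terms of itself")
    tokens = []
    for token in rules[symbol]:
        if token.startswith("$"):  # assumed marker for a nonterminal
            tokens.extend(expand(rules, token[1:], active + (symbol,)))
        else:                      # terminal: copy it to the output
            tokens.append(token)
    return tokens


# Hypothetical rules, invented for this example.
rules = {
    "S": ["$Vr", "$Ch", "$Vr"],
    "Vr": ["I", "IV", "|", "V", "|"],
    "Ch": ["vi", "IV", "|"],
}
print(" ".join(expand(rules)))
```

An undefined nonterminal raises an error instead of being silently dropped, which mirrors the behaviour described for the expander.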

The program is in C and requires a C compiler. Compile the program like this (in a Unix window, e.g. the Mac "terminal" window):

cc expand6.c -o expand6

Run it like this:

./expand6 -v [verbosity] [input file]

If verbosity=0, the output is just a "chord list" - a list of chord statements, like this:

 
0.00  4.00 I    0   1   4   4
4.00  6.00 IVb7 5   4   4   9
6.00  7.00 V7   7   5   4  11
7.00 12.00 I    0   1   4   4
---

(The "---" at the end is to separate one song from the next if multiple chord lists are concatenated.)

Each chord statement has the form

[start] [end] [Roman numeral] [chromatic root] [diatonic root] [key] [absolute root]

start = the start time of the chord segment, in relation to measures, e.g. 0.0 = start of m. 1, 0.5 = halfway point of m. 1, etc.
end = end time of chord segment
Roman numeral = complete chord label for chord, exactly as in input file
chromatic root = integer of root in relation to the current key, adjusted for applied chords (e.g. I=0, bII=1, II=2; V/ii = VI = 9)
diatonic root = diatonic category of chromatic root, e.g. VI = 6
key = integer of current tonic, e.g. C = 0, C#/Db = 1
absolute root = (chromatic root + key) mod 12, e.g. V in D = A = 9

(Note: When two successive chords have the same root and key, they are collapsed into a single chord in the chord list.)
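
For readers who want to load a chord list into their own code, here is a minimal Python sketch that parses the seven fields listed above. The names Chord and parse_chord_list are hypothetical. The consistency check takes the sum modulo 12, which is what the "Hey Jude" chord list further down implies (V in key 5 has absolute root 0).

```python
from typing import NamedTuple


class Chord(NamedTuple):
    start: float
    end: float
    numeral: str
    chromatic_root: int
    diatonic_root: int
    key: int
    absolute_root: int


def parse_chord_list(text):
    """Parse chord statements, skipping blank lines and '---' separators."""
    chords = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:  # not a chord statement
            continue
        chords.append(Chord(float(fields[0]), float(fields[1]), fields[2],
                            *map(int, fields[3:])))
    return chords


sample = """\
0.00  4.00 I    0   1   4   4
4.00  6.00 IVb7 5   4   4   9
6.00  7.00 V7   7   5   4  11
7.00 12.00 I    0   1   4   4
---
"""
chords = parse_chord_list(sample)
for chord in chords:
    # absolute root should equal chromatic root plus key, modulo 12
    assert chord.absolute_root == (chord.chromatic_root + chord.key) % 12
```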

If verbosity = 1, the output is an expanded one-line analysis of the song - for example:

[C] I IV | I IV | I ii | V | I V | vi IV | I IV | I IV | I ii | V | I

If verbosity = -1, the program simply outputs a list of measure numbers with the time signature for each measure. A time signature is represented with an integer, which is (100 x numerator) + denominator, e.g. 2/4 is 204.
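
The integer encoding of time signatures is easy to reproduce. The following Python sketch uses hypothetical function names; decoding assumes the denominator is below 100.

```python
def encode_time_signature(numerator, denominator):
    """Pack a time signature as (100 x numerator) + denominator."""
    return 100 * numerator + denominator


def decode_time_signature(code):
    """Recover (numerator, denominator) from the packed integer."""
    return divmod(code, 100)


print(encode_time_signature(2, 4))
print(decode_time_signature(1208))
```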

If verbosity = -2, the program outputs a list of key sections, where each section has a start time (in metrical time) and a key (in integer notation).

process-mel5.pl [code]

This program takes a melodic transcription and converts it into various output formats. Run it like this:

./process-mel5.pl [output mode] [rhythm-checking mode] [input file]

"Output mode" can be any integer between -6 and 3:

If 0: Print out relative pc integers (scale degrees), absolute pcs, and (estimated) metrical note durations, one note per line (a note's duration is estimated as min(1, IOI), where IOI is the inter-onset interval to the next note)
If 1: Print out scale degrees, all on one line
If 2: Print out scale degrees, all on one line, then pitch numbers (middle C = 60), using same line breaks as input
If 3: Print out ontimes, pitches, and scale degrees, one note on each line (a "notelist")
If -1: Print out binary vector of chromatic scale-degrees
If -2: Print out proportional vector of chromatic scale-degree counts
If -3: Print out proportional vector of absolute pc counts
If -4: Print out proportional vector of absolute pc lengths
If -5: Print out time signature integers for each measure, one per line (404 = 4/4, 608 = 6/8, 1208 = 12/8, etc.)
If -6: Print out list of key sections, with a start-time and key (in integer notation) for each section
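
Two of the calculations in the list above can be sketched in Python: the duration estimate min(1, IOI) from mode 0 and the proportional scale-degree vector from mode -2. The function names are hypothetical, and giving the last note a duration of 1 is an assumption (it has no following note, and the text does not say how it is handled). The onset times are invented; the scale degrees are those of the "Hey Jude" excerpt shown further down.

```python
def estimated_durations(ontimes, final_duration=1.0):
    """Estimate each note's duration as min(1, IOI to the next note)."""
    durations = [min(1.0, nxt - cur) for cur, nxt in zip(ontimes, ontimes[1:])]
    if ontimes:
        durations.append(final_duration)  # assumption: last note gets 1
    return durations


def proportional_vector(scale_degrees):
    """Return a 12-element vector of scale-degree counts summing to 1."""
    counts = [0] * 12
    for degree in scale_degrees:
        counts[degree % 12] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts


print(estimated_durations([0.0, 0.5, 0.75, 2.0]))
print(proportional_vector([7, 4, 4, 7, 9, 2]))
```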

Output mode 3 is the "notelist" format - something like this (for the opening of "Hey Jude"):

  0.75 60 7
  1.00 57 4
  1.62 57 4
  1.75 60 7
  1.88 62 9
  2.00 55 2

The first column represents the ontime of a note (in relation to measures, i.e. 1.000 is the downbeat of the second measure); the second column represents the pitch, and the third column represents the scale-degree.

If rhythm-checking mode is 1, the program will check to make sure that each measure contains a standard number of segments in relation to the stated time signature: For example, a measure in 4/4 time must contain 1, 2, 4, 8, 16, or 32 segments. If rhythm-checking mode is 0, no rhythm-checking is done. (DT's transcriptions all pass this test. TdC's do not, as he sometimes uses "tuplets"; for example, a measure in 4/4 might contain 12 segments.)
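
The rhythm check can be pictured with a short Python sketch. Only the 4/4 rule comes from the text above; the table name, the function name, and the choice to leave other meters unchecked are assumptions, and the input is a pre-counted list rather than a real transcription.

```python
# Allowed segment counts per time-signature integer. Only the 4/4 entry
# comes from the text; other meters would need their own entries.
ALLOWED_SEGMENTS = {404: {1, 2, 4, 8, 16, 32}}


def failing_measures(measures):
    """measures: list of (time_signature, segment_count), one per measure.

    Returns the list indices of measures with a non-standard segment count.
    """
    failures = []
    for index, (signature, count) in enumerate(measures):
        allowed = ALLOWED_SEGMENTS.get(signature)
        if allowed is not None and count not in allowed:
            failures.append(index)
    return failures


# The second measure has 12 segments (a tuplet division), so it fails.
print(failing_measures([(404, 8), (404, 12), (404, 16)]))
```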

add-timings.pl [code]

This program adds timing data to a note list or chord list. Run it like this:

./add-timings.pl [input-mode] [notelist or chordlist] [list of measure times]

If input-mode = 0, the program expects a chord list; if input-mode = 1, it expects a note list. The second argument is then the filename of either a notelist or chordlist, as appropriate; the third argument is a list of absolute times for each measure, such as we provide here.

The output is either a "timed notelist" or a "timed chordlist". For "Hey Jude", the beginning of the timed notelist looks like this:

  0.475   0.75 60 7
  1.284   1.00 57 4
  3.306   1.62 57 4
  3.711   1.75 60 7
  4.115   1.88 62 9
  4.520   2.00 55 2

Essentially this is the same as the input notelist, except that a column of absolute times has been added at the far left, indicating the absolute onset time of each note in the recording. (These times may not be exact. They are estimated from the measure timings in the timing file, in combination with the metrical position of the note: e.g. a note at metrical position 1.5 is assumed to be exactly halfway between the two downbeats on either side.)
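
The estimate described in the parenthesis is a linear interpolation between downbeats. The Python sketch below is a hedged illustration: the function name and the dictionary layout (measure number to absolute time) are assumptions, and the two downbeat times are read off the "Hey Jude" examples on this page.

```python
import math


def absolute_time(position, measure_times):
    """Interpolate an absolute time for a metrical position.

    measure_times maps a measure's metrical start (0, 1, 2, ...) to the
    absolute time of its downbeat in seconds.
    """
    measure = math.floor(position)
    start = measure_times[measure]
    fraction = position - measure
    if fraction == 0:
        return start  # the note falls exactly on a downbeat
    end = measure_times[measure + 1]
    return start + fraction * (end - start)


# Downbeats at metrical positions 1.00 and 2.00, taken from the examples.
measure_times = {1: 1.284, 2: 4.520}
for position in (1.0, 1.5, 1.75):
    print(position, round(absolute_time(position, measure_times), 3))
```

A note at metrical position 1.75 comes out at roughly 3.711 seconds, in line with the timed notelist shown above.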

The beginning (and ending) of the timed chordlist for Hey Jude looks like this:

  1.284   1.00    I 0 1 5 5
  4.520   2.00    V 7 5 5 0
 10.987   4.00    I 0 1 5 5
 14.119   5.00   IV 5 4 5 10
 17.304   6.00    I 0 1 5 5
  (etc.)
  .
  .
421.907  132.00  End

This is like the original chord list but with absolute start-times added for each chord at the far left; the second column is the metrical start-time for the chord. We omit the end-time of each chord, since this is always equivalent to the start-time of the next chord, both in metrical time and absolute time. At the end of the file there is an end statement (shown above) indicating the absolute and metrical end-times for the final chord.

tally.pl [code]

This script takes in a chord list or a series of chord lists (separated by "---") and outputs aggregate data as follows. ("Time" is measured in terms of measures on the timeline, i.e. one measure is one unit.) Specify the input file on the command line, e.g. "./tally.pl [input-file]" (or send the input to the script through a Unix pipe).

1. Overall statistics: Total chord count, total time, number of major/minor/diminished/augmented chords (including all sevenths with whatever triad type they are based on), number of root-position/inverted chords.
2. The number of occurrences of each chromatic root
3. The total amount of time spent on each chromatic root
4. The count of each chromatic-root transition between one chord (the "antecedent") and the next (the "consequent"). (This assumes the same key for both chords; key-changing transitions are skipped.)
5. For each possible consequent chord, the proportional frequency of each antecedent chord
6. For each possible antecedent chord, the proportional frequency of each consequent chord
7. The distribution of chromatic root intervals. Pitch-class notation is used: each interval is represented by its size in semitones, and all intervals are assumed to be ascending. Thus, 0 is a repetition; 1 is an ascending minor second (or descending major seventh); 2 is an ascending major second (or descending minor seventh); etc.
8. The distribution of diatonic root intervals (so minor and major seconds are lumped together). In this case, each interval is represented by its smallest form. So "+M/m2" means an ascending major/minor second (or descending M/m seventh); "-M/m2" means a descending major/minor second (or ascending M/m seventh); etc.
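
Items 4 and 7 can be sketched in a few lines of Python. This is an illustration, not the code of tally.pl: the function name is hypothetical, the input is a single song given as (chromatic root, key) pairs, and transitions across the "---" separator between songs are not handled.

```python
from collections import Counter


def root_transitions(chords):
    """chords: list of (chromatic_root, key) pairs in song order.

    Returns (transitions, intervals): counts of antecedent/consequent root
    pairs, and counts of ascending root intervals in semitones.
    """
    transitions = Counter()
    intervals = Counter()
    for (root_a, key_a), (root_b, key_b) in zip(chords, chords[1:]):
        if key_a != key_b:
            continue  # key-changing transitions are skipped
        transitions[(root_a, root_b)] += 1
        intervals[(root_b - root_a) % 12] += 1
    return transitions, intervals


# Roots and key from the four-chord example earlier on this page.
transitions, intervals = root_transitions([(0, 4), (5, 4), (7, 4), (0, 4)])
print(dict(transitions))
print(dict(intervals))
```

The V to I motion (7 to 0) is counted as an ascending interval of 5 semitones, following the convention in item 7 that all intervals are treated as ascending.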

compare.pl [code]

This script takes two chord-list files (specified on the command line) and compares them, outputting the total amount of time for which they are in agreement.

If $v (verbosity) = 1, the program outputs parallel chord lists indicating differences. If $v=0, it just outputs the total number of measures found, the number of measures in agreement, and the latter as a proportion of the former. So the output "50.00 40.00 (0.800)" means, 50 measures were found; the analyses were in agreement on 40 measures; and 40 / 50 = 0.8.

The script requires that the start times of the two chord-lists (i.e. the start times of the first chord) are the same. If the end times (i.e. the end times of the final chord) of the two lists are not the same, the behaviour depends on $v: if $v=0, the script simply outputs an error message and exits; if $v=1, it adjusts the earlier end time to match the later one, and then does the comparison, but outputs a warning as well.

The script can be used to compare chromatic roots, absolute roots, or key, depending on the value of $cf ("compared feature"), set at the top of the code. If $cf = 3, it compares chromatic roots; $cf = 5, keys; $cf = 6, absolute roots.
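
The agreement measure can be sketched in Python as the total overlap between segments that carry the same value. This is an assumption-laden illustration rather than the logic of compare.pl: the function name is hypothetical, each list is reduced to (start, end, value) triples for whichever feature is being compared, and both lists are assumed to share the same start and end times.

```python
def agreement(list_a, list_b):
    """Each list holds (start, end, value) segments in time order.

    Returns (total time, time in agreement, proportion in agreement).
    """
    total = list_a[-1][1] - list_a[0][0]
    agreed = 0.0
    for start_a, end_a, value_a in list_a:
        for start_b, end_b, value_b in list_b:
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0 and value_a == value_b:
                agreed += overlap
    return total, agreed, agreed / total


# Two invented analyses that differ only in measure 6 to 7.
analysis_a = [(0.0, 4.0, 0), (4.0, 6.0, 5), (6.0, 7.0, 7), (7.0, 12.0, 0)]
analysis_b = [(0.0, 4.0, 0), (4.0, 7.0, 5), (7.0, 12.0, 0)]
total, agreed, proportion = agreement(analysis_a, analysis_b)
print(f"{total:.2f} {agreed:.2f} ({proportion:.3f})")
```

The nested loop is quadratic in the number of chords, which is fine for single songs; a merge-style pass over both lists would be the natural choice for larger inputs.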

compare-meter.pl [code]

This script takes two chord-lists and compares their time signatures (using the time-signature list of the kind output by expand6 with verbosity = -1). If the lists are identical - the same number of measures, with the same time signature for each measure - it outputs "OK". If there are mismatches, it identifies them, e.g. "Mismatch on m. 56 (304, 404)".
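
A measure-by-measure comparison of this kind might look as follows in Python. The function name, the numbering of measures from 1, and the wording of the length message are assumptions; only the "OK" and "Mismatch on m. ..." messages follow the text above.

```python
def compare_meters(signatures_a, signatures_b):
    """Compare two lists of time-signature integers, one entry per measure."""
    messages = []
    if len(signatures_a) != len(signatures_b):
        # Hypothetical wording; the real script's message is not documented.
        messages.append(
            f"Different lengths ({len(signatures_a)}, {len(signatures_b)})")
    pairs = zip(signatures_a, signatures_b)
    for number, (sig_a, sig_b) in enumerate(pairs, start=1):
        if sig_a != sig_b:
            messages.append(f"Mismatch on m. {number} ({sig_a}, {sig_b})")
    return messages or ["OK"]


print(compare_meters([404, 404, 304], [404, 404, 404]))
print(compare_meters([404, 404], [404, 404]))
```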

(In our 2011 paper, we wanted to compare our harmonic analyses, but there didn't seem to be any point in doing this unless the analyses were identical metrically, i.e. with the barlines in the same places. So we used compare-meter.pl to check this before comparing the harmony.)

trigram.pl [code]

This script takes a chord list and extracts "trigrams", sequences of three successive chromatic roots (all within the same key; trigrams spanning a key boundary are ignored). Basic Unix commands can then be used to get aggregate data, e.g.

./trigram.pl [input file] | sort | uniq -c | sort -nr

This gives you a list of all the trigram types, with counts, ranked by count.
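
The same extraction and ranking can be sketched in Python, with collections.Counter playing the role of the sort and uniq pipeline. The function name and the (chromatic root, key) input format are assumptions, and the chord sequence is invented.

```python
from collections import Counter


def trigrams(chords):
    """chords: list of (chromatic_root, key) pairs in song order.

    Returns root triples for every run of three chords in a single key.
    """
    found = []
    for first, second, third in zip(chords, chords[1:], chords[2:]):
        if first[1] == second[1] == third[1]:  # all three in the same key
            found.append((first[0], second[0], third[0]))
    return found


song = [(0, 0), (5, 0), (7, 0), (0, 0), (5, 0), (7, 0)]
for trigram, count in Counter(trigrams(song)).most_common():
    print(count, trigram)
```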

tally-pitches5.pl [code]

This script takes a chord list, or a series of them (separated by "---") as standard input, and generates a pitch-class distribution, assuming that each chord symbol implies a single instance of each pitch-class that it contains: for example, a C major chord implies one C, one E, and one G. It can be modified to count either relative pitch-classes (i.e. scale-degrees) or absolute pitch-classes.

The script could be run like this:

cat [input file or directory] | ./tally-pitches5.pl [count-mode] [abs or rel] [output-mode]

If count-mode = 0, it outputs a distribution of pc counts; if count-mode = 1, it outputs a distribution of pc durations (taking the duration of each chord to represent the duration of each pc it contains).

If "abs or rel" = 0, it outputs a distribution of absolute pcs; if 1, it outputs a distribution of relative pcs (scale-degrees).

If output-mode = 0, it prints the distribution vector on one line; if 1, it prints more verbose output; if -1, it prints a binary vector on one line (1 if the pc occurs at all, 0 otherwise).
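
The counting principle can be illustrated in Python. Everything specific here is an assumption: the script's actual parsing of chord symbols is not reproduced, chords are supplied as (root, quality, duration) triples, the interval table covers only major and minor triads, and raw totals are returned because the text does not say whether the script normalizes them.

```python
# Hypothetical table of chord qualities; the real script handles more.
TRIAD_INTERVALS = {"maj": (0, 4, 7), "min": (0, 3, 7)}


def pc_distribution(chords, count_mode=0):
    """chords: list of (root, quality, duration) triples.

    count_mode 0 counts one instance per chord tone;
    count_mode 1 weights each chord tone by the chord's duration.
    """
    totals = [0.0] * 12
    for root, quality, duration in chords:
        weight = duration if count_mode == 1 else 1.0
        for interval in TRIAD_INTERVALS[quality]:
            totals[(root + interval) % 12] += weight
    return totals


# A C major chord lasting four measures: one C, one E, one G.
print(pc_distribution([(0, "maj", 4.0)]))
print(pc_distribution([(0, "maj", 4.0)], count_mode=1))
```

Passing chromatic roots gives a scale-degree distribution, while passing absolute roots gives an absolute pitch-class distribution.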

tally-mel.pl [code]

This script takes melodic data with three numbers on each line - scale-degree, absolute pc, and note length (as produced by process-mel5.pl in output mode 0). It tallies up the scale-degrees. It takes standard input that is "piped" in. So one way to run it would be like this:

cat [input file or directory] | ./tally-mel.pl [output-mode]

The program can be used to generate a scale-degree distribution from multiple melodic transcriptions. If output-mode = 0, it outputs a proportional vector of SD counts; if output-mode = -1, it outputs a proportional vector of SD lengths.
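
A hedged Python sketch of the tally follows. The function name is hypothetical, the input lines are invented (in the three-column format described above), and skipping lines that do not have exactly three fields is an assumption.

```python
def scale_degree_distribution(lines, output_mode=0):
    """lines: strings of 'scale-degree absolute-pc length'.

    output_mode 0 gives proportions of note counts;
    output_mode -1 gives proportions of total note length.
    """
    totals = [0.0] * 12
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: ignore anything that is not a note line
        degree, length = int(fields[0]), float(fields[2])
        totals[degree % 12] += 1.0 if output_mode == 0 else length
    grand_total = sum(totals)
    return [t / grand_total for t in totals] if grand_total else totals


notes = ["7 0 0.25", "4 9 0.5", "4 9 0.25"]
print(scale_degree_distribution(notes))
print(scale_degree_distribution(notes, output_mode=-1))
```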