Lyrics and Stress Encodings
In 2016-17, we added lyrics and syllabic stress information to a subset of the melodies in the Rolling Stone corpus. The data was originally released in August 2017 (that version is archived here); a revised version, which corrects some errors, was released in February 2018. We provide links to the most recent version below, along with an explanation of the files.
These materials are available for download (each one is a zipped file):
- Transcriptions of 99 songs with melismas added
- Lyrics for 80 songs
- Stress files for 80 songs, showing lyrics and stress values aligned with notes
Transcriptions with Melismas
In order to align the lyrics with the melodies, we had to mark the melodies with melismas – cases where a single syllable spans multiple notes. We added melisma information to 99 songs in the corpus. This is the "5x20" corpus that was the basis for the 2011 deClercq/Temperley paper, containing 20 songs from each decade, the 50s through the 90s (except there are only 19 songs from the 1980s).
Melismas are indicated with parentheses around the notes of the melisma. In the example below (from the Beatles' "Hey Jude"), the parentheses indicate that the second syllable of "bet-ter" spans three notes.
Take a sad song and make it bet-ter---
2 3 | 4 . ^1 . . 1 7 5 | 6 . (5 4 3) . . .
Once melismas are marked and each melisma is treated as a single note, notes and syllables are in a one-to-one relationship.
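The one-to-one relationship can be checked mechanically on the "Hey Jude" example. The Python sketch below is illustrative and is not part of the corpus tools: the function names note_units and syllables are hypothetical, and it assumes that "." and "|" never begin a note, that every other token is one note, and that syllables are separated by spaces or hyphens (the trailing "---" marks the melisma and adds no syllable).

```python
import re


def note_units(melody):
    """Split a melody line into note units; a parenthesized melisma counts as one unit.

    Assumes '.' and '|' never begin a note and every other token is one note.
    """
    units = []
    melisma = None  # notes of a melisma that is still open
    for tok in melody.split():
        if tok in (".", "|"):
            continue
        if tok.startswith("("):
            melisma = []
            tok = tok[1:]
        closes = tok.endswith(")")
        if closes:
            tok = tok[:-1]
        if melisma is None:
            units.append(tok)
        else:
            melisma.append(tok)
            if closes:
                units.append("(" + " ".join(melisma) + ")")
                melisma = None
    return units


def syllables(lyric):
    """Split a lyric line into syllables at spaces and hyphens."""
    return [s for word in lyric.split() for s in re.split(r"-+", word) if s]


lyric = "Take a sad song and make it bet-ter---"
melody = "2 3 | 4 . ^1 . . 1 7 5 | 6 . (5 4 3) . . ."
# Pair each syllable with its note unit; "ter" lands on the melisma (5 4 3).
for syl, unit in zip(syllables(lyric), note_units(melody)):
    print(f"{syl:>5}  {unit}")
```

Both lists have nine entries, so zip pairs every syllable with exactly one note unit.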
Aside from the parentheses, the melodies are in exactly the same format as our other melodic transcriptions; that notation system is described here. Transcriptions containing the melismas are available via the first link above. (In the process of adding the melisma transcriptions, some small changes were made in the pitches and rhythms.)
Adding Lyrics and Stress Information
Of the 99 songs annotated with melismas, we then selected all of the songs that use only 4/4 or 2/4 time signatures. 68 of the songs are entirely in 4/4; another 12 songs are mostly in 4/4 but with occasional 2/4 measures. (No songs are mostly in 2/4.) These 80 songs constitute our lyric- and stress-annotated corpus.
For each of the 80 songs, we downloaded the lyrics from chartlyrics.com. We used the CMU Pronouncing Dictionary to identify the number of syllables in each word and the word's stress pattern. The values 0, 1, and 2 below follow the CMU Dictionary's convention; the value 3, for non-words, is not part of that convention:
- 0 = unstressed syllable
- 1 = primary stress: the stressed syllable in a word containing only one stressed syllable (e.g. "TROU-ble") or the most stressed syllable in a word containing multiple stressed syllables (e.g. the first syllable in YES-ter-DAY)
- 2 = secondary stress: assigned to stressed syllables in a word containing multiple stresses, other than the primary stress (e.g. the third syllable in YES-ter-DAY)
- 3 = non-word, such as "ah", "ooh", or "na"
By convention, one-syllable function words (such as articles, prepositions, and pronouns) are considered unstressed, although the CMU Dictionary often marks them as "1". Using this list of function words, we assigned "0" to all of them.
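The stress assignment just described can be sketched in Python. This is a hedged illustration, not the procedure actually used: CMU_EXCERPT holds a few entries written in the CMU Pronouncing Dictionary's format (each vowel phone carries a stress digit), while FUNCTION_WORDS and NON_WORDS are hypothetical stand-ins for the project's function-word list and its set of non-words.

```python
import re

# A few entries in CMU Pronouncing Dictionary format: ARPAbet phones, with a
# stress digit (0, 1, or 2) on each vowel. Illustrative excerpt only.
CMU_EXCERPT = {
    "YESTERDAY": "Y EH1 S T ER0 D EY2",
    "TROUBLES": "T R AH1 B AH0 L Z",
    "ALL": "AO1 L",
    "MY": "M AY1",
}
# Hypothetical stand-ins for the project's function-word and non-word lists.
FUNCTION_WORDS = {"ALL", "MY", "THE", "A", "OF", "IN"}
NON_WORDS = {"AH", "OOH", "NA"}


def stress_pattern(word):
    """Return one stress value per syllable of `word`."""
    word = word.upper()
    if word in NON_WORDS:
        return [3]  # non-word marker; not a CMU value
    # One stress digit per vowel, hence one per syllable.
    digits = [int(d) for d in re.findall(r"[012]", CMU_EXCERPT[word])]
    if len(digits) == 1 and word in FUNCTION_WORDS:
        return [0]  # one-syllable function words count as unstressed
    return digits
```

For example, stress_pattern("yesterday") gives the 1, 0, 2 pattern that appears in the sample stress file below, and stress_pattern("my") is overridden from the dictionary's 1 to 0.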
We then aligned the syllables of the lyrics with the notes of the melody, in the following format:
404 2.000 55 2 1 YESTERDAY[1]
404 2.125 53 0 0 YESTERDAY[2]
404 2.188 53 0 2 YESTERDAY[3]
404 3.250 57 4 0 ALL[1]
404 3.375 59 6 0 MY[1]
404 3.500 61 8 1 TROUBLES[1]
404 3.563 62 9 0 TROUBLES[2]
Each line indicates a syllable.
- The first column indicates the time signature of the measure in which the syllable occurs, using the notation we use elsewhere in the corpus: 404 = 4/4, 204 = 2/4, 1208 = 12/8, etc. (In the stress-annotated corpus, only 404 and 204 occur.)
- The second column indicates the onset time of the syllable, in measures: 0.000 is the first downbeat, 1.000 is the second downbeat, and so on. The integer part counts measures from the first downbeat, and the decimal part gives the position within the current measure as a proportion of that measure. In a 4/4 measure, a decimal part of .500 marks the third quarter-note beat; in a 2/4 measure, it marks the second quarter-note beat.
- The third column indicates the pitch (middle C = 60).
- The fourth column indicates the scale degree, that is, the pitch class relative to the tonic (0 = scale degree 1, 1 = b2 or #1, and so on).
- The fifth column indicates the stress level of the syllable, using the stress values described above.
- The sixth column indicates the word containing the current syllable, followed by the position of the syllable within the word: YESTERDAY[1] refers to the first syllable of the word.
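A single line of a stress file can be decoded column by column as described above. The Python sketch below is illustrative only: Syllable, parse_str_line, and scale_degree are hypothetical names, and it assumes every onset is written with a decimal point. The meter code is split so that the last two digits give the denominator and the leading digits the numerator (404 = 4/4, 1208 = 12/8), and the beat is counted in units of the denominator. The scale_degree helper shows the pitch-to-degree relationship; the sample rows above are all consistent with a tonic pitch class of 5 (F), but the key is an inference here, not a value stored in the file.

```python
import re
from dataclasses import dataclass

# One stress-file line: meter code, onset, pitch, scale degree, stress, WORD[n]
LINE_RE = re.compile(r"(\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\d+)\s+([0-3])\s+(\S+)\[(\d+)\]")


@dataclass
class Syllable:
    meter: tuple       # (numerator, denominator), e.g. 404 -> (4, 4)
    onset: float       # in measures; 0.0 is the first downbeat
    pitch: int         # middle C = 60
    scale_degree: int  # pitch class relative to the tonic, 0-11
    stress: int        # 0, 1, 2, or 3
    word: str
    position: int      # 1-based syllable index within the word

    @property
    def measure(self):
        """0-based measure index (the integer part of the onset)."""
        return int(self.onset)

    @property
    def beat(self):
        """1-based position in the measure, in units of the meter's denominator."""
        return 1 + (self.onset - self.measure) * self.meter[0]


def parse_str_line(line):
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    code, onset, pitch, sd, stress, word, pos = m.groups()
    code = int(code)
    # Meter code: last two digits = denominator, leading digits = numerator.
    return Syllable((code // 100, code % 100), float(onset), int(pitch),
                    int(sd), int(stress), word, int(pos))


def scale_degree(pitch, tonic_pc):
    """Pitch class of `pitch` relative to a tonic pitch class (0 = C, 5 = F)."""
    return (pitch - tonic_pc) % 12
```

For instance, parse_str_line("404 3.500 61 8 1 TROUBLES[1]") yields measure 3 and beat 3.0, the third quarter-note beat of that measure.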
For each song we created a list of syllables in this format, and put it in a file named [SongTitle].str. These are the "stress files" available in the third link above. Note that these files contain only syllabic notes (notes that span an entire syllable) and the first note of each melisma; non-initial melisma notes are not included.
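Because each syllable appears on exactly one line, the words and their stress patterns can be recovered from a stress file by grouping consecutive lines. The Python sketch below is a hypothetical helper, not a corpus tool; it assumes whitespace-separated fields with the word in the last column written as WORD[n], and that a word's syllables appear consecutively with n starting at 1. The file name in the usage note is likewise hypothetical.

```python
def words_with_stress(lines):
    """Group stress-file lines into (word, [stress, ...]) pairs."""
    words = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        stress = int(fields[4])
        word, _, pos = fields[5].rpartition("[")
        if int(pos.rstrip("]")) == 1:
            words.append((word, [stress]))  # first syllable starts a new word
        else:
            words[-1][1].append(stress)  # later syllable extends the current word
    return words
```

Usage: `with open("Yesterday.str") as f: print(words_with_stress(f))`. On the sample lines above, this gives YESTERDAY with [1, 0, 2], ALL with [0], MY with [0], and TROUBLES with [1, 0].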
The script process-melisma.pl is a modified version of process-mel5.pl: like that script, it takes one of our melodic transcriptions and converts it into a list of notes or other formats. Unlike process-mel5.pl, however, it can read melismas. When the flag "skip_melismas" (set inside the code) is 1, the script outputs (with verbosity = 3) only the first note of each melisma. When the flag is 0, it disregards the melisma markings and behaves like process-mel5.pl.
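The effect of the skip_melismas flag can be re-expressed in a few lines of Python. This is not the Perl script and does not reproduce its output format (onset times, pitches, or verbosity levels); melody_notes and its boolean skip_melismas parameter are hypothetical analogues that only show which scale-degree tokens of a melody line survive with the flag on or off. It assumes that "." and "|" never begin a note.

```python
def melody_notes(melody, skip_melismas=True):
    """Return the note tokens of a melody line, mimicking the skip_melismas flag.

    skip_melismas=True keeps only the first note of each parenthesized melisma;
    skip_melismas=False keeps every note and simply drops the parentheses.
    """
    notes = []
    in_melisma = False
    for tok in melody.split():
        if tok in (".", "|"):
            continue
        starts, ends = tok.startswith("("), tok.endswith(")")
        # Outside a melisma every note is kept; inside, only the opening note.
        if not skip_melismas or starts or not in_melisma:
            notes.append(tok.strip("()"))
        if starts:
            in_melisma = True
        if ends:
            in_melisma = False
    return notes
```

With the flag on, the melisma (5 4 3) in "6 . (5 4 3) . . ." contributes only its first note, matching the rule that stress files contain just the first note of each melisma.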