Statistical Learning in Music – or the tale about how Mozart should have stayed in bed and Mr Simonton deserved his place in music history books

Speaker: Dr Simon Durrant, Neuroscience and Aphasia Research Unit, University of Manchester

20th January 2011

Rumour has it that the childhood prodigy Mozart felt uncompromisingly compelled to resolve unfinished cadences played on the clavier downstairs after bedtime. Similar anecdotes are ascribed to Anton Rubinstein and J.S. Bach, and whether fictional or not, most people undoubtedly know the feeling of having precise expectations about the continuation of musical pieces. Indeed, musicians’ expectations seem particularly strong (Krumhansl & Shepard, 1979; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010) so no wonder that those of one of the most brilliant musical geniuses were sufficiently persistent to drag a poor boy out of bed.

Whereas Mozart was taught that the necessity of dissonance resolution was due to prescriptive compositional rules (not that he would mind breaking a few of those along the way), present-day cognitive science offers another explanation of melodic expectations: the schemas from which expectations arise are thought to be internalised through statistical learning.

(The use of images is solely for educational, non-commercial purposes. Despite our best efforts, it has not been possible to identify rightsholders.)

The Music, Mind & Brain students at Goldsmiths recently received an authoritative introduction to this important topic thanks to Dr Simon Durrant’s contribution to this year’s Invited Speaker Series. Opening his presentation with a slide depicting the aforementioned wunderkind engaged in “numerical exercises” at the clavier, Durrant passionately took his audience through the theory, design and results of his as yet unpublished work.

In cognitive psychology, statistical learning refers to the learning of statistical properties of sensory input. When acquiring their native language, infants rely not only on auditory cues (accents, pauses, etc.) but also on transitional probabilities, that is, the likelihood that one syllable is followed by another, when segmenting words from continuous speech. For instance, “pretty baby” is segmented thus because in infant-directed speech the syllable “pre” is followed by “ty” with roughly 80% probability whereas “ty” is only followed by “ba” in about 0.3% of cases (Saffran, 2003).

Saffran and colleagues (1996) addressed this topic by systematically manipulating transitional probabilities. They played continuous streams of nonsense consonant-vowel syllables to infants and subsequently tested their ability to distinguish tri-syllabic words that occurred in the stream from those that didn’t. If you hear “golabupabikututibubabupugolabubabupu…” (any resemblance to baby speech is probably unintended), you will most likely feel unable to make much sense of it. Nevertheless, Saffran and colleagues showed that your brain decodes statistical patterns all the time – even when you’re busy doing something else (Saffran & Newport, 1997). Furthermore, we shouldn’t overestimate our own abilities; after all, there is evidence that even the brains of tamarin monkeys learn transitional probabilities (Hauser, Newport, & Aslin, 2001)!
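To make the idea of transitional probabilities concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the example stream above is built from the four nonsense words golabu, pabiku, tutibu and babupu; the word list, the stream length and the function names are my own choices, not details of the original experiments.

```python
import random
from collections import Counter

# Assumed nonsense words, read off the example stream above (illustrative only).
WORDS = ["golabu", "pabiku", "tutibu", "babupu"]

def syllabify(word):
    """Split a string of consonant-vowel syllables into two-letter chunks."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(syllabify(word))
        previous = word
    return stream

def transitional_probabilities(stream):
    """Estimate P(next | current) as count(current, next) / count(current)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(WORDS, n_words=300)
tp = transitional_probabilities(stream)

# "go" only ever occurs inside "golabu", so this transition is fully predictable.
print(tp[("go", "la")])
# "bu" can end a word, so what follows it varies and the probability is lower.
print(tp.get(("bu", "pa"), 0.0))
```

A listener (or model) that tracks these values can place word boundaries where the transitional probability dips.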

Much of Durrant’s research uses a musical variation of this paradigm in which Saffran and colleagues (1999) replaced syllables with absolute pitches from the chromatic scale. Their results suggested that we are dealing with a domain-general learning mechanism that is not specific to language. Durrant didn’t succeed in fully replicating these findings, but did discover that some “tone words” were chosen as familiar more often than others. Subsequent analyses showed that average pitch height partly accounted for these preferences. This was consistent with Durrant’s own research on melodic accent (while he was still a research fellow with David Huron in Ohio) showing that higher pitches are perceived as louder and thus more salient due to human auditory thresholds. However, when pitch height was factored out, systematic differences still remained, suggesting that other stimulus artefacts might be responsible for tone word biases.

Durrant then wondered if our prior experience with Western tonal music could be the reason for this. Artificial neural network (ANN) modelling provides a setting in which such enculturation can be carefully controlled, because the researchers themselves define how the model is “trained” (see e.g. Hazan et al., 2008). Importantly, Durrant now added a pre-exposure test to study differences between performance before and after exposure. With no prior training, the ANN model failed to replicate human biases. This means that the biases are unlikely to be caused by the stimuli themselves, and may instead represent an enculturation effect.

If enculturation really was the reason for tone word biases, then it should be possible to replicate the human data simply by “enculturating” the ANN model in tonal music. Durrant succeeded in this. In fact, his model performed equally well on this task whether it was trained on pieces by Mozart (there he is again!) or on artificial stimuli generated from transition statistics derived from music and published by Simonton (1984) (don’t get me wrong, but this does sound slightly less artistic to me). So does this mean that automatically generated tone sequences rank with strokes of musical genius? Well, this wasn’t really the point in question. However, Durrant’s results do suggest that enculturation is rather robust across different styles of tonal music.
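For readers wondering what “artificial stimuli generated from transition statistics” might look like in practice, the following Python sketch samples a tone sequence from a first-order transition table. The table is entirely made up for illustration; it does not contain Simonton’s published values, and the function name and parameters are hypothetical.

```python
import random

# Hypothetical first-order transition probabilities between three scale
# degrees. These numbers are invented and are NOT Simonton's (1984) values.
TRANSITIONS = {
    "do": {"do": 0.2, "re": 0.5, "mi": 0.3},
    "re": {"do": 0.4, "re": 0.1, "mi": 0.5},
    "mi": {"do": 0.3, "re": 0.6, "mi": 0.1},
}

def generate_sequence(transitions, start, length, seed=0):
    """Sample a tone sequence in which each tone depends only on the previous one."""
    rng = random.Random(seed)
    sequence = [start]
    while len(sequence) < length:
        options = transitions[sequence[-1]]
        next_tone = rng.choices(list(options), weights=list(options.values()))[0]
        sequence.append(next_tone)
    return sequence

melody = generate_sequence(TRANSITIONS, start="do", length=16)
print(" ".join(melody))
```

Sequences produced this way share the note-to-note statistics of the table without having been composed by anyone, which is what makes them useful as controlled training material.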

An interesting finding was that the ANN model performed relatively similarly before and after exposure. Doubting whether there was any effect whatsoever of short-term statistical learning, Durrant put away computational models for a while and ran the same experiment on humans. He found that their performance before and after exposure likewise differed much less than would be expected based on the results from the original study by Saffran and colleagues (1999). So, apparently our cultural bias exerts a much stronger force than short-term exposure, and therefore the former tends to override the latter.

Nevertheless, to save the honour of short-term statistical learning (which was after all the topic of his talk), Durrant elegantly suppressed enculturation by constructing tone words from the highly unfamiliar Bohlen-Pierce scale (Reeves, Roberts, Mathews, & Pierce, 1988). Even stronger learning effects were obtained when memory load was reduced by halving tone durations.

So statistical learning of tone sequences does take place. However, such learning effects can be swamped by our prior enculturation in tonal music (like that of Mozart and Mr Simonton, who seemingly now claims his place in music history books). But what kinds of statistical information do we actually extract from music? Addressing this issue, Durrant produced exposure streams where combinations of two adjacent notes were necessary to predict the following one (unpredictable zeroth- and first-order but predictable second-order transitions). Consistent with findings from the visual domain (Fiser & Aslin, 2002), people did learn higher-order statistics, which enabled them to distinguish structured from unstructured test sequences, with learning effects increasing as a function of predictability.
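One way to build a stream with this property is sketched below in Python. It is my own toy construction rather than Durrant’s actual stimulus design: with three abstract tones numbered 0 to 2, the next tone equals the sum of the previous two (modulo 3) with probability `predictability`, and is otherwise one of the other two tones. Under this assumed rule, single tones and pairs of adjacent tones are equally frequent in the long run, so only the second-order statistics carry information.

```python
import random
from collections import Counter, defaultdict

TONES = [0, 1, 2]  # three abstract tones; a hypothetical alphabet

def generate_stream(length, predictability, seed=0):
    """Next tone is (prev2 + prev1) mod 3 with the given probability."""
    rng = random.Random(seed)
    stream = [rng.choice(TONES), rng.choice(TONES)]
    while len(stream) < length:
        predicted = (stream[-2] + stream[-1]) % len(TONES)
        if rng.random() < predictability:
            stream.append(predicted)
        else:
            stream.append(rng.choice([t for t in TONES if t != predicted]))
    return stream

def conditional_probabilities(stream, order):
    """Estimate P(next | previous `order` tones) from counts."""
    counts = defaultdict(Counter)
    for i in range(order, len(stream)):
        context = tuple(stream[i - order:i])
        counts[context][stream[i]] += 1
    return {
        context: {tone: n / sum(c.values()) for tone, n in c.items()}
        for context, c in counts.items()
    }

stream = generate_stream(length=30000, predictability=0.8)
zeroth = conditional_probabilities(stream, order=0)[()]
first = conditional_probabilities(stream, order=1)
second = conditional_probabilities(stream, order=2)

# Best guess available from one preceding tone versus two preceding tones.
best_first = max(max(p.values()) for p in first.values())
best_second = min(max(p.values()) for p in second.values())
print(round(best_first, 2), round(best_second, 2))
```

With these settings the first value should come out close to chance (one in three) and the second close to the chosen predictability, and varying `predictability` gives the kind of graded manipulation described above.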

Durrant subsequently explored neural correlates of statistical learning with neuroimaging techniques. He argued that both declarative and non-declarative memory systems are involved, and that we learn from our prediction errors (predictive coding) (Furl et al., 2011).

In sum, Durrant’s talk was an exemplary demonstration of how a problem can be approached systematically from different methodological perspectives, presented in a remarkably clear narrative. Future research directions could include investigating relations between relative and absolute pitch encoding. Running experiments with people who possess absolute pitch might be valuable in this respect. Non-adjacent probabilities may also be learned, especially when tempo and pitch intervals increase and tone sequences are heard as separate streams (van Noorden, 1975). Full publication of Durrant’s findings would be helpful to researchers who are eager to address these and other questions.

Durrant, however, is a busy man. As a research associate in the Neuroscience and Aphasia Research Unit at the University of Manchester, he recently provided evidence that statistical learning consolidates during sleep (Durrant, Taylor, Cairney, & Lewis, 2011; Durrant, Taylor & Lewis, 2010). So perhaps the young Mozart should have stayed in his bed after all.

Niels Chr. Hansen


Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant Melodies: Statistical Learning of Nonadjacent Dependencies in Tone Sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(5), 1119-1130. doi: 10.1037/0278-7393.30.5.1119

Durrant, S. J., Taylor, C., Cairney, S., & Lewis, P. A. (in press). Sleep-Dependent Consolidation of Statistical Learning. Neuropsychologia.

Durrant, S. J., Taylor, C., & Lewis, P. A. (2010). Sleep-Dependent Consolidation of Statistical Learning. 17th Annual Cognitive Neuroscience Society Meeting, Montreal, Canada.

Fiser, J., & Aslin, R. N. (2002). Statistical learning of higher-order temporal structure from visual shape sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(3), 458-467.

Furl, N., Kumar, S., Alter, K., Durrant, S., Shawe-Taylor, J., & Griffiths, T. D. (2011). Neural prediction of higher-order auditory sequence statistics. NeuroImage, 54(3), 2267-2277. doi: 10.1016/j.neuroimage.2010.10.038

Gebhart, A. L., Newport, E. L., & Aslin, R. N. (2009). Statistical learning of adjacent and nonadjacent dependencies among nonlinguistic sounds. Psychonomic Bulletin & Review, 16(3), 486-490. doi: 10.3758/pbr.16.3.486

Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a non-human primate: Statistical learning in cotton-top tamarins. Cognition, 78(3), B53-B64. doi: 10.1016/s0010-0277(00)00132-3

Hazan, A., Holonowicz, P., Salselas, I., Herrera, P., Purwins, H., Knast, A., & Durrant, S. (2008). Modeling the Acquisition of Statistical Regularities in Tone Sequences. 30th Annual Meeting of the Cognitive Science Society, Washington D.C., USA.

Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5(4), 579-594. doi: 10.1037/0096-1523.5.4.579

Newport, E. L., & Aslin, R. N. (2004). Learning at a distance I: Statistical learning of non-adjacent dependencies. Cognitive Psychology, 48(2), 127-162. doi: 10.1016/s0010-0285(03)00128-2

van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Eindhoven University of Technology.

Pearce, M., Ruiz, M. H., Kapasi, S., Wiggins, G., & Bhattacharya, J. (2010). Unsupervised statistical learning underpins computational, behavioural and neural manifestations of musical expectations. NeuroImage, 50, 302-313.

Reeves, A., Roberts, L. A., Mathews, M. V., & Pierce, J. R. (1988). Theoretical and experimental explorations of the Bohlen-Pierce scale. The Journal of the Acoustical Society of America, 84(4), 1214.

Saffran, J. R. (2003). Statistical language learning: Mechanisms and constraints. Current Directions in Psychological Science, 12, 110–114.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.

Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27-52.

Saffran, J. R., & Newport, E. L. (1997). Incidental language learning: Listening (and Learning) out of the Corner of Your Ear. Psychological Science, 8(2), 101-105.

Simonton, D. K. (1984). Melodic structure and note transition probabilities: A content analysis of 15,618 Classical themes. Psychology of Music, 12(1), 3.
