Step in time: Musical ensemble coordination in cross-cultural settings


“We hold these truths to be self evident, that all men are created equal”; an eloquent start to Thomas Jefferson’s Declaration of Independence, but also an apt summary of a model too often assumed in modern psychology research. Generalised claims regarding human behaviour are based almost exclusively on studies sampling WEIRD societies – Western, Educated, Industrialized, Rich, and Democratic societies.


Figure 1. Participant demographics from meta-analysis. Source: Jakubowski, 2016

During her talk entitled “Musical ensemble coordination in ecological and cross-cultural settings” at Goldsmiths, University of London, Dr Kelly Jakubowski introduces the demographic breakdown of participants from a selection of 97 studies on the effects of background music. She reports that 37% of the studies use a sample of “undergraduate students” and 29% use a sample of “university students”, so that more than 60% of the samples are drawn from higher education establishments, as shown in Figure 1. From her selection, studies relating to music and synchrony (the field in which Dr Jakubowski is currently conducting her research) make bold claims such as ‘In a world rife with isolation, the aligned representations in interpersonal synchrony may provide a means for togetherness and connection.’ (Hove & Risen, 2009). But with participants sampled from such a narrow demographic, how can such claims be substantiated across societies and cultures?

Dr Jakubowski, along with colleagues at Durham University, is researching Interpersonal Entrainment in Music Performance (IEMP), which explores how people coordinate movements in time to perform music together. From a Western classical symphony orchestra to a South Indian Carnatic music ensemble, all musicians utilise interpersonal entrainment (the timing coordination between individuals) to create a cohesive musical performance. However, different patterns of coordination and levels of synchrony are used across cultures and musical styles. How levels of asynchrony in musical performance affect aesthetic judgement is a topic of debate. Ethnomusicologist Charles Keil argues that for music to be meaningful and involving for listeners, it must be “out of time” and “out of tune” (a phenomenon he describes as “participatory discrepancy”), perhaps because this suggests a relatable element of human error. However, this view is contested by perceptual studies (e.g. Senn et al., 2016) which show that participants actually preferred as much synchrony as possible in musical performance.

The team at Durham University are currently studying both audio and visual coordination in musical performance across cultures, including Indian classical music, Malian djembe, jazz, Tunisian Stambeli, and Cuban dance music. The research into audio coordination involves studying the synchronicity of instruments using sound onset detection. In addition, the phase relationship between instruments is measured, indicating which instrument leads or follows another. An interesting finding is that the variability of asynchrony between a drummed instrument and a plucked instrument (such as a guitar) decreases as the note density increases, i.e. the faster the music, the more consistently the instruments stay together. This is in contrast to asynchrony between two drummed instruments, which remains constant regardless of the speed of the music.
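To make the idea of onset-based asynchrony more concrete, here is a minimal Python sketch. It is not the IEMP team's pipeline: the onset times are invented, the nearest-neighbour pairing rule and the `max_gap` parameter are assumptions for illustration, and the onsets are assumed to have already been detected. The mean signed asynchrony hints at who leads, and the standard deviation is one simple way to express variability.

```python
from statistics import mean, stdev

def pair_onsets(a, b, max_gap=0.1):
    """Pair each onset in `a` with the nearest onset in `b` (times in seconds).

    Pairs further apart than `max_gap` are dropped. The pairing rule and the
    default gap are illustrative assumptions, not taken from the talk.
    """
    pairs = []
    for t in a:
        nearest = min(b, key=lambda u: abs(u - t))
        if abs(nearest - t) <= max_gap:
            pairs.append((t, nearest))
    return pairs

def asynchrony_stats(a, b, max_gap=0.1):
    """Return (mean signed asynchrony, SD of asynchrony) in milliseconds.

    A negative mean indicates that instrument `a` tends to sound first (leads).
    """
    diffs_ms = [(ta - tb) * 1000.0 for ta, tb in pair_onsets(a, b, max_gap)]
    return mean(diffs_ms), stdev(diffs_ms)

# Hypothetical onset times (seconds) for a drum and a plucked instrument.
drum = [0.000, 0.500, 1.000, 1.500]
guitar = [0.012, 0.508, 1.015, 1.505]

mean_ms, sd_ms = asynchrony_stats(drum, guitar)
print(f"mean asynchrony: {mean_ms:.1f} ms, variability (SD): {sd_ms:.1f} ms")
```

In this made-up example the drum sounds slightly ahead of the guitar on every note, so the mean is negative; comparing the SD across slow and fast passages would be one way to look at how variability changes with note density.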



Figure 2. Validation study comparing Video and Motion Capture Data. Source: Jakubowski et al., 2017

Ancillary movements are those made by performers which are not directly related to sound production. These movements are critical to musical ensemble coordination, and are the focus of the visual research by the team at Durham. Sound-producing movements occur over a timescale of milliseconds, and so can only be captured by specialist Motion Capture systems which have a temporal resolution of 120 – 160 frames per second. Ancillary movements, on the other hand, occur over a much longer timescale of seconds, such that standard video recording (with a frame rate of 25 – 30 fps) can be used to record these. Current motion capture systems deliver high definition data, but are most often constrained to a laboratory environment, due to the nature of the fixed camera sensors required for data collection. These systems are not particularly useful for field work where conditions are rarely under the researcher’s full control, and data must usually be collected promptly in an occasionally less than ideal situation. The miniaturisation of camera sensors over the last 10 years has allowed researchers to work in the field and collect high quality video footage, in this case of ensembles performing together, and bring this back to the lab for analysis. One might imagine that for movement tracking, data from a Motion Capture system would far outperform that from a standard video extraction. However, a validation study conducted by Dr Jakubowski’s team (as shown in Figure 2) reported high correlations of .75 to .94 between the output of the two systems, suggesting that data extracted from video field recordings can be used to accurately track these ancillary movements.
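As a rough illustration of how two such recordings might be compared, the Python sketch below brings a 120 fps series down to 30 fps and computes a Pearson correlation. The data are synthetic, the function names are hypothetical, and simple decimation stands in for whatever resampling the published study actually used.

```python
import math

def downsample(series, factor):
    """Keep every `factor`-th sample (e.g. 120 fps -> 30 fps with factor 4).

    Plain decimation without filtering; a simplification for illustration.
    """
    return series[::factor]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic example: a slow 0.5 Hz sway recorded for 4 seconds.
mocap = [math.sin(2 * math.pi * 0.5 * i / 120) for i in range(480)]   # 120 fps
video = [math.sin(2 * math.pi * 0.5 * i / 30) + 0.05 * ((i % 3) - 1)  # 30 fps,
         for i in range(120)]                                          # small error

r = pearson_r(downsample(mocap, 4), video)
print(f"correlation between systems: r = {r:.2f}")
```

Because the sway unfolds over seconds, 30 samples per second are plenty to follow it, which is the intuition behind using ordinary video for ancillary movements.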

During this validation study, the team analysed movement data from a collection of 30 videos of duos improvising in a controlled environment, and extracted an aggregate measure (the Cross-wavelet Transform) which is related to periods of peak movement between the two performers. They then compared this with a panel of expert musicians’ indications of visual interaction between the performers in the videos, aiming to validate this measure as a quantitative predictor of the experts’ qualitative indications. 72% of the periods of interaction could be predicted using just this CWT measure, a result which increased to over 90% when more specific frequency bands were added. With this new quantitative predictor, Jakubowski and colleagues were able to collect field video recordings from other cultures and perform movement tracking and analysis, to compare how patterns of movement coordination emerge as a function of other performance attributes (such as musical style, structure, metre, and performer hierarchy).
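For readers unfamiliar with the cross-wavelet transform, the following Python sketch shows the core idea at a single frequency: each performer's movement series is filtered with a complex Morlet wavelet, and the product of one result with the complex conjugate of the other is large when both performers move periodically at that frequency at the same time. Everything here (the signals, the 1 Hz frequency, the `n_cycles` setting, the function names) is an assumption for illustration and not the procedure used in the study.

```python
import cmath
import math

def morlet_coeffs(signal, fs, freq, n_cycles=6.0):
    """Complex Morlet wavelet coefficients of `signal` at one frequency (Hz)."""
    sigma = n_cycles / (2 * math.pi * freq)   # Gaussian width in seconds
    half = int(3 * sigma * fs)                # truncate the wavelet at +/- 3 sigma
    offsets = range(-half, half + 1)
    kernel = [
        cmath.exp(2j * math.pi * freq * k / fs)
        * math.exp(-((k / fs) ** 2) / (2 * sigma ** 2))
        for k in offsets
    ]
    n = len(signal)
    coeffs = []
    for t in range(n):
        acc = 0j
        for k, w in zip(offsets, kernel):
            if 0 <= t + k < n:                # ignore samples beyond the edges
                acc += signal[t + k] * w.conjugate()
        coeffs.append(acc)
    return coeffs

def cross_wavelet_power(x, y, fs, freq):
    """Magnitude of W_x * conj(W_y) over time at one frequency."""
    wx = morlet_coeffs(x, fs, freq)
    wy = morlet_coeffs(y, fs, freq)
    return [abs(a * b.conjugate()) for a, b in zip(wx, wy)]

# Hypothetical duo at 25 fps: both still for 10 s, then both swaying at 1 Hz.
fs = 25
quiet = [0.0] * 250
a = quiet + [math.sin(2 * math.pi * 1.0 * i / fs) for i in range(250)]
b = quiet + [math.sin(2 * math.pi * 1.0 * i / fs + 0.5) for i in range(250)]

power = cross_wavelet_power(a, b, fs, 1.0)
quiet_mean = sum(power[50:200]) / 150
active_mean = sum(power[300:450]) / 150
print(f"quiet: {quiet_mean:.3f}, shared movement: {active_mean:.3f}")
```

A full analysis would repeat this across a range of frequencies, which is where choosing more specific frequency bands comes in.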

The second area of research for the IEMP team is synchrony and entrainment perception by listeners. Humans are able to distinguish the onsets of two sounds as distinct with a separation as short as just 2 milliseconds. For the listener to correctly identify which sound preceded the other, a minimum separation of 15 – 20 milliseconds is required. This latter judgement can, however, be affected by a common perceptual bias related to cultural instrumental hierarchy and roles, such as the assumption that a melody instrument will “lead” before an accompanying one. The IEMP team are looking into factors which affect this asynchrony perception, as it is an important part of a listener’s evaluation of performance quality and engagement. In one study, participants were exposed to audiovisual recordings of improvising duos and asked to rate the synchrony of the performers (how “together” they felt the performance was). The results show that clips which the participants rated as high in synchrony had high spectral flux (a measure of the number of events in time, or ‘complexity’) in low-frequency sub-bands, a quality generally related to ratings of rhythmic strength and musical groove.
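The two perceptual thresholds mentioned above can be summarised in a small Python sketch. The cut-offs are the figures quoted in this post (with 20 ms taken as the upper end of the 15 – 20 ms range); the function name and category labels are hypothetical, and real listeners will of course not behave as a hard threshold.

```python
def classify_onset_gap(t_a_ms, t_b_ms, fusion_ms=2.0, order_ms=20.0):
    """Roughly classify how a pair of onsets (in milliseconds) might be heard.

    fusion_ms: separation below which two onsets are not heard as distinct.
    order_ms:  separation from which their order can be identified
               (assumed here to be the top of the 15-20 ms range).
    """
    gap = abs(t_a_ms - t_b_ms)
    if gap < fusion_ms:
        return "heard as one event"
    if gap < order_ms:
        return "two events, order uncertain"
    return "two events, order identifiable"

# Hypothetical pairs of onsets, in milliseconds.
for pair in [(0.0, 1.0), (0.0, 10.0), (0.0, 25.0)]:
    print(pair, "->", classify_onset_gap(*pair))
```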

The team undertook a cross-cultural field study to investigate aesthetic judgement and discrimination of temporally adjusted recordings of Western jazz, Malian djembe and Uruguayan candombe music. This found a perhaps unsurprising preference across cultures for asynchrony minimisation. Interestingly, however, participants in the UK listening to Malian djembe music (which is naturally non-isochronous) preferred an isochronous variant, whereas Malians preferred the non-isochronous original, and were better able to discriminate between micro-adjustments of metric subdivision in their own music than in music from other societies. This is perhaps because this non-isochronous rhythm is more culturally ingrained for Malian listeners than for other participants.

Though Dr Jakubowski’s work is ongoing, preliminary results clearly indicate that across cultures and societies, people’s perceptions and preference for musical features are not uniform. Differences in asynchrony and entrainment have no doubt contributed to the plethora of distinct musical styles which have developed around the world. Establishing awareness about these differences in perception is an important step towards addressing them in wider research. This in turn may help us better understand the variations that are being observed, in terms of where and how they may arise.


Frederick Taylor



Burger, B., Ahokas, R., Keipi, A., & Toiviainen, P. (2013). Relationships between spectral flux, perceived rhythmic strength, and the propensity to move. In R. Bresin (Ed.), Proceedings of the Sound and Music Computing Conference 2013, SMC 2013, Stockholm, Sweden (pp. 179-184). Berlin: Logos Verlag Berlin. Retrieved from

Henrich, J., Heine, S., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83. doi:10.1017/S0140525X0999152X

Hirsh, I. J. (1959). Auditory perception of temporal order. The Journal of the Acoustical Society of America, 31(6), 759-767. doi:10.1121/1.1907782

Jakubowski, K. (2016, September 30). How Weird is music psychology? [Blog Post]. Retrieved from

Jakubowski, K., Eerola, T., Alborno, P., Volpe, G., Camurri, A., & Clayton, M. (2017). Extracting Coarse Body Movements from Video in Music Performance: A Comparison of Automated Computer Vision Techniques with Motion Capture Data. Frontiers in Digital Humanities, 4(9). doi:10.3389/fdigh.2017.00009

Keil, C. (1987). Participatory Discrepancies and the Power of Music. Cultural Anthropology, 2, 275–283. doi:10.1525/can.1987.2.3.02a00010

Hove, M. J., & Risen, J. L. (2009). It’s All in the Timing: Interpersonal Synchrony Increases Affiliation. Social Cognition, 27(6), 949-960. doi:10.1521/soco.2009.27.6.949

Senn, O., Kilchenmann, L., von Georgi, R., & Bullerjahn, C. (2016). The Effect of Expert Performance Microtiming on Listeners’ Experience of Groove in Swing or Funk Music. Frontiers in Psychology, 7, 1487. doi:10.3389/fpsyg.2016.01487
