Dmitri Tymoczko
Princeton University
May 13, 2005

The Geometry of Musical Chords

DRAFT

Musical chords have a geometry that is surprisingly easy to specify. An n-note chord can be represented as a point on the orbifold T^n/S_n (the n-torus modulo the symmetric group S_n) (1). Composers in a wide range of musical styles have exploited the non-Euclidean features of these spaces, typically by utilizing short-distance pathways between structurally similar chords (Fig. 1). The existence of such pathways depends on a chord’s symmetry, or near-symmetry, under translation, reflection, and permutation. Paradigmatically “consonant” and “dissonant” chords possess different symmetries, thereby suggesting different musical applications.

Western music lies at the intersection of two seemingly independent disciplines: harmony and counterpoint. Harmony delimits the range of acceptable chords and chord-sequences. “Chords,” informally, are collections of simultaneously occurring notes. Western musical styles typically permit only a limited number of chords (for example, only the major and minor triads) and successions between them (for example, the triadic progression C major-F major, but not C major-E♭ major). Counterpoint is the technique of connecting the individual notes in a series of chords to form simultaneous melodic lines. To a good first approximation, chords are typically connected so that these lines (or voices) move independently (not all in the same direction by the same amount) and efficiently (by short distances, according to some perhaps-implicit notion of musical “distance”). Such voice-leading simplifies physical performance, engages explicit aesthetic norms (2-4), and facilitates the auditory streaming necessary for perceiving music polyphonically (5).

Figure 1 shows independent, efficient voice-leadings in a wide range of musical styles. In each example, arrows represent individual musical voices. Fig. 1(a) comes from the classical period and features four major triads forming an archetypal sequence: I-IV-I-V-I in C major. The voice-leading connects each note in the first chord to its nearest successor in the second. Fig. 1(b), which is a common contemporary jazz pattern, is analogous: here the chords are again similar, and the voice-leading connects notes by short, though not necessarily minimal, paths. The voice-leadings in Fig. 1(c), which are celebrated examples of nineteenth-century chromaticism, are also efficient, though here they connect chords traditionally considered to belong to different types. Fig. 1(d) presents a series of voice-leadings in which voices move independently and efficiently within an unchanging harmony; such procedures are typical of twentieth-century “avant garde” composition.

How is it that Western music can satisfy harmonic and contrapuntal constraints at once? And what determines whether two chords can be connected by efficient voice-leading? Composers and music theorists have been investigating these questions for almost three hundred years. The “circle of fifths” (Fig. S1), first published in 1728 (6), can be interpreted as depicting maximally efficient voice-leadings among the twelve familiar major scales. The Tonnetz (Fig. S2), originating with Euler in 1739 and discussed by nineteenth-century music theorists Oettingen and Riemann, depicts efficient voice-leadings among the twenty-four major and minor triads (3, 7-8, 12). Recent theoretical work (9-19) has continued this tradition, investigating efficient voice-leading among other small collections of interesting chords. However, no comprehensive theory of voice-leading has yet emerged.

In this paper I provide such a theory, showing that chords that can be connected by efficient voice-leading are close in the space of all possible n-note chords. Characterizing the geometry of chord-space requires surprisingly recent mathematics: chord-space is an “orbifold,” a notion introduced by Satake in 1956 (20) and developed by William Thurston in the 1970s (21).
Understanding the orbifold structure of chord-space permits a unified perspective on musical practices across a very wide range of styles and time-periods: in particular, it shows that composers have frequently (and perhaps unwittingly) exploited the special contrapuntal properties of nearly symmetrical chords (Fig. 1). More generally, the geometry of chord-space reveals how the internal structure of a chord, including its degree of acoustic consonance, determines the kind of efficient voice-leadings it can participate in. Thus, for the first time, we can precisely specify the way in which harmony and counterpoint are related.

I. Background and definitions

For maximal generality, we will consider voice-leading in a continuous, octave-free space of “pitch-classes.” Two pitches are instances of the same pitch-class (or chroma) when they are one or more octaves apart. Music theorists represent pitches numerically by associating their fundamental frequencies f with real numbers according to the equation:

\[ p = 69 + 12 \log_2(f/440) \tag{1} \]
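As a quick numerical check of Equation (1), the following Python sketch converts a frequency to a pitch number and then reduces it to a pitch-class. The function names are illustrative and do not come from the paper, and representing a pitch-class as a real number in the half-open interval [0, 12) is an assumption made here for convenience.

```python
import math


def frequency_to_pitch(f_hz: float) -> float:
    """Equation (1): p = 69 + 12 * log2(f / 440), with f in hertz."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def pitch_class(p: float) -> float:
    """Reduce a pitch to a representative of its class in R/12Z, in [0, 12)."""
    return p % 12.0


# A above middle C (440 Hz) and the A one octave higher (880 Hz)
a4 = frequency_to_pitch(440.0)
a5 = frequency_to_pitch(880.0)
print(a4, a5, pitch_class(a4), pitch_class(a5))
```

The two A's differ by 12 in pitch space but reduce to the same pitch-class, which is the identification of p with p+12.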

Equation (1) creates a linear pitch space in which middle C is 60, an octave has size 12, and a semitone (the distance between adjacent keys on a piano keyboard) has size 1. To create circular pitch-class space, we identify all points p and p+12, forming the quotient space R/12Z. (Here R refers to the set of real numbers and Z to the group of integers; the notation R/12Z refers to the circular quotient space, whose points are the orbits of 12Z as it acts on R.) This creates numerical equivalents for the familiar pitch-class letter names: C=0, C♯/D♭=1, D=2, “D quarter-tone sharp”=2.5, and so on. Note that although we will consider the most general case of a continuous pitch-class space, in musical situations one is typically concerned with a lattice of discrete, equally spaced points in this space, corresponding to the familiar pitch-classes of Western equal-temperament.

Formally, a chord is a multiset of pitch-classes, i.e. a set in which duplicates are allowed. We will denote unordered multisets using curly braces: the C major chord is {0, 4, 7}, and the F major chord is {0, 5, 9}. The musical term transposition is synonymous with the mathematical term translation, and corresponds to addition in R/12Z. Two chords are transpositionally equivalent if they are the same up to some translation in pitch-class space. Thus the C major chord and F major chord are transpositionally equivalent, since {0, 5, 9} = {7 + 5, 0 + 5, 4 + 5}. Symbolically, we write T5({0, 4, 7}) = {5, 9, 0}. The musical term inversion is synonymous with the mathematical term reflection, and corresponds to subtraction from a constant value in R/12Z. Two chords are inversionally equivalent if they are the same up to some reflection in pitch-class space. Thus the C major chord {0, 4, 7} is inversionally equivalent to the C minor chord {0, 3, 7} since {0, 3, 7} = {7 – 7, 7 – 4, 7 – 0}. We use Ix to refer to the reflection that sends 0 to x, writing I7({0, 4, 7}) = {0, 3, 7}. Musically, transposition and inversion are significant because they preserve an important aspect of the “quality” or “character” of a chord: transpositionally related chords sound extremely similar; inversionally related chords, somewhat less so.

A voice-leading between two chords {a1, a2, …, am} and {b1, b2, …, bn} is a multiset of ordered pairs (ai, bj), such that every element of each chord is in some pair. (Regular parentheses denote ordered lists.) A trivial voice-leading contains only pairs of the form (x, x). We denote voice-leadings using vector notation A→B, indicating that the ith components of the vectors are associated by the voice-leading. Thus the voice-leading (0, 0, 4, 7)→(11, 2, 5, 7) associates the root of the C major triad with both the third and fifth of the G7 chord, while associating the third and fifth of the C major chord with the seventh and root of the G7, respectively. This voice-leading can be interpreted either as a non-bijective voice-leading between (0, 4, 7) and (11, 2, 5, 7) or as a bijective voice-leading between (0, 0, 4, 7) and (11, 2, 5, 7).

Music theorists have proposed numerous ways of measuring the size of a voice-leading. These measures are closely related to familiar mathematical norms, and include the “taxicab norm,” the Euclidean norm, and a few exotic quasi-norms indigenous to music theory (see the Materials and Methods). These proposals are at best approximations, attempts to make explicit composers’ intuitions as embodied in Western musical practice. For this reason we will not adopt any one method of measuring voice-leading size.
Instead, we will require only a normlike strict weak ordering of voice-leadings, satisfying a few constraints that ensure it resembles a mathematical measure of “length” (see the Materials and Methods). To date, every music-theoretical method of measuring voice-leading size gives rise to a normlike strict weak ordering.

It can be shown that for any normlike strict weak ordering, there will be a minimal voice-leading between arbitrary chords A→B that has no “voice-crossings”: that is, it is possible to move the elements of A continuously to their counterparts in B such that no two paths coincide other than at the endpoints of the process (see the Materials and Methods). Since Western composers have traditionally avoided “voice-crossing” (Fig. 1[a-c]), this result suggests that normlike strict weak orderings are at least consistent with observed features of Western musical practice (see the Materials and Methods). Furthermore, it enables the use of the standard computer-science technique of “dynamic programming” to identify, in polynomial time, a minimal voice-leading between arbitrary chords (see the Materials and Methods).

II. The Geometry of Musical Chords

A) The space of all n-note chords

We now describe the geometry of musical chords. The elements of an ordered n-note chord can be interpreted as the coordinates of a point on the n-torus (R/12Z)^n, or T^n. Every ordered pair of points in this space corresponds to a bijective voice-leading. (We can restrict our attention to bijective voice-leadings without loss of generality, since any voice-leading can be interpreted as a bijection between the appropriate multisets.) If our method of measuring voice-leading size gives rise to a metric, then we can use it to measure distances between points in this space. If our method of measuring voice-leading size is only a normlike strict weak ordering, then we can use it to compare “distances,” though without quantifying these “distances” as real numbers (see the Materials and Methods).

To represent voice-leadings between unordered chords, we need to identify all points representing different orderings of the same chord. Mathematically, we form the quotient of the n-torus T^n by S_n, identifying all points (x_1, x_2, …, x_n) and (x_σ(1), x_σ(2), …, x_σ(n)), where σ is some permutation of the integers from 1 to n. This space is what mathematicians call a global-quotient orbifold (20, 21).
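One way to make the quotient construction concrete is the brute-force sketch below. It assumes the Euclidean measure used for Figure 2 (any of the other measures could be substituted), and the names `circ_dist`, `canonical` and `orbifold_distance` are illustrative rather than taken from the paper. A chord is sent to a canonical representative by sorting, and the distance between two unordered chords is taken to be the smallest torus distance over all orderings of the second chord.

```python
import math
from itertools import permutations


def circ_dist(a, b):
    """Distance between two pitch-classes on the circle R/12Z."""
    d = (b - a) % 12.0
    return min(d, 12.0 - d)


def canonical(chord):
    """One representative per unordered chord: reduce mod 12, then sort."""
    return tuple(sorted(p % 12.0 for p in chord))


def orbifold_distance(chord_a, chord_b):
    """Smallest Euclidean torus distance over all orderings of chord_b.

    Brute force over n! orderings, so only suitable for small chords.
    """
    if len(chord_a) != len(chord_b):
        raise ValueError("chords must have the same number of notes")
    return min(
        math.sqrt(sum(circ_dist(a, b) ** 2 for a, b in zip(chord_a, ordering)))
        for ordering in permutations(chord_b)
    )


print(canonical((7, 12, 16)))                    # an ordering of the C major triad
print(orbifold_distance((0, 4, 7), (7, 0, 4)))   # same unordered chord
print(orbifold_distance((0, 4, 7), (0, 5, 9)))   # C major to F major
```

Two orderings of the same chord are at distance zero, which is exactly the identification made when passing from T^n to T^n/S_n.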
A global quotient orbifold (or “orbit-manifold”) is the space that results from identifying all points lying in the orbits of a group acting discontinuously on a locally Euclidean space (a manifold). Orbifolds are more complex than manifolds in that they can have “singularities” at which the space is not locally Euclidean.

Figure 2 shows the orbifold T^2/S_2, drawn using orthogonal coordinates and Euclidean distance (see the Materials and Methods). We see that the space of two-note musical chords is a Möbius strip, a “square” whose left “edge” is identified, modulo a half-twist, with its right. The orbifold is singular at its circular boundary, which acts as a mirror (21). Any voice-leading between dyads can be uniquely associated with a path on Figure 2. A metric allows us to measure the length of these paths, while a normlike strict weak ordering allows us to compare two paths without quantifying their “length.”

The paths corresponding to voice-leadings are the images of line-segments in the parent space T^n. They are either line-segments in the orbifold, or “reflected” line-segments that “bounce off” the orbifold’s mirror boundary. For example, on Figure 2, the voice-leading (0, 1)→(1, 0) corresponds to the path that begins at (0, 1), moves in a straight line to (.5, .5), and gets reflected back along the same line-segment to (0, 1) (Fig. 2). (To see why, imagine each pitch-class moving continuously to its destination, starting and ending at the same time.) “Reflected” voice-leadings contain voice-crossings, since the edge contains all and only those chords with duplicate pitch-classes. The “no-crossing” principle therefore asserts that there will be a minimal voice-leading between any two points that does not touch the orbifold’s singular boundary.

Generalizing Figure 2 to higher dimensions is straightforward (see the Materials and Methods). Given a Euclidean metric, the orbifolds T^n/S_n are simplicial prisms whose faces are identified, modulo the orthogonal transformation that cyclically permutes the vertices of one of the faces. (Figure 2 is a 2-dimensional prism [square] whose 1-dimensional “faces” [left and right “edges”] have been identified modulo the reflection that exchanges their vertices [a reflection, or 180° “twist” in the third dimension]. Three-note chords lie on a three-dimensional prism whose faces are equilateral triangles.
One face is rotated 120° before the faces are identified; the resulting figure is the bounded interior of a twisted triangular 2-torus.) The singular boundary of the orbifold acts as a mirror and contains chords with duplicate pitch-classes. Chords that divide the octave into n equal parts are at the center of the orbifold, while chords containing only one pitch-class lie along its one-dimensional edge (Fig. 2). Voice-leadings represented by line-segments parallel to the orbifold’s one-dimensional edge are not independent: each voice moves in the same direction by the same amount. Voice-leadings perpendicular to the boundary are independent, and preserve the sum of a chord’s pitch-classes. The above descriptions assume a Euclidean metric; closely analogous statements hold for the orbifolds with other metrics.

B) Consequences of the geometry

Two chords can be connected by efficient voice-leading if they are “near” each other in the voice-leading orbifold, relative to some method of measuring voice-leading size and some voice-leading that serves as a standard of “efficiency” or “nearness.” Standards of “nearness” vary somewhat from musician to musician, but there is in practice a good degree of agreement about which voice-leadings are efficient. Relative to these widely agreed-upon standards, some chords can participate in efficient, independent voice-leadings to their transpositionally or inversionally related forms. These chords have been the focus of much Western compositional attention, since they permit the simultaneous satisfaction of harmonic and contrapuntal constraints: they can be used in progressions that are harmonically consistent (involving chords equivalent under transposition or inversion) while also permitting good counterpoint (the voice-leading between successive chords is independent and widely considered to be efficient). The progressions in Figure 1 are all of this type.

What determines the size of the minimal independent voice-leading between a chord and one of its transpositions or inversions? The answer, for any normlike strict weak ordering, is that such voice-leadings are due to the chord’s invariance, or near-invariance, under permutation, transposition, or inversion (see the Materials and Methods). Each of these three invariances, or symmetries, corresponds to a different geometrical property and produces a different type of voice-leading. In addition, the first two invariances are closely related to the musical notions of consonance and dissonance: acoustically consonant chords are nearly invariant under transposition, while a number of paradigmatically dissonant chords are nearly invariant under permutation. We will now survey these three types of invariance, their geometrical representation in the voice-leading orbifolds, and the musical applications to which they give rise.
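Before the survey, the three exact symmetries can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the search for transposition and inversion indices is restricted, by assumption, to a caller-supplied list of candidates (the twelve equal-tempered values by default), whereas the text allows any real x.

```python
def transpose(chord, x):
    """T_x: add x to every pitch-class, mod 12."""
    return [(p + x) % 12 for p in chord]


def invert(chord, x):
    """I_x: the reflection that sends 0 to x, i.e. p -> x - p, mod 12."""
    return [(x - p) % 12 for p in chord]


def same_chord(a, b):
    """Equality of unordered multisets of pitch-classes."""
    return sorted(p % 12 for p in a) == sorted(p % 12 for p in b)


def is_p_invariant(chord):
    """A chord with a duplicated pitch-class is fixed by a nontrivial permutation."""
    reduced = [p % 12 for p in chord]
    return len(set(reduced)) < len(reduced)


def t_invariances(chord, candidates=range(1, 12)):
    """Nonzero candidate transpositions that map the chord onto itself."""
    return [x for x in candidates if same_chord(transpose(chord, x), chord)]


def i_invariances(chord, candidates=range(12)):
    """Candidate inversion indices x for which I_x maps the chord onto itself."""
    return [x for x in candidates if same_chord(invert(chord, x), chord)]


print(same_chord(transpose([0, 4, 7], 5), [0, 5, 9]))  # C major and F major
print(same_chord(invert([0, 4, 7], 7), [0, 3, 7]))     # C major and C minor
print(is_p_invariant([0, 0, 4]), t_invariances([0, 4, 8]), i_invariances([0, 3, 7, 10]))
```

Near-invariance, which is what the following paragraphs are really about, is a matter of distance to such fixed points rather than of exact equality, so this check only locates the fixed points themselves.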


A bijective voice-leading from a chord to itself (a permutational voice-leading) acts as a permutation of that chord’s elements. A chord that has duplicate pitch-classes will be permutationally invariant (P-invariant), since there will be some nontrivial permutation of its elements that yields a trivial voice-leading. P-invariant chords lie on the boundary (or “singular locus”) of the voice-leading orbifold. Chords that are close to the orbifold’s boundary can be described as nearly P-invariant, since they will have efficient permutational voice-leadings that reflect off the nearby boundary (Fig. 2). These voice-leadings are non-minimal, since they are larger than the trivial voice-leading; they contain voice-crossings, since they touch the orbifold’s boundary.

Nearly P-invariant chords include “chromatic clusters” such as {0, 1}, {0, 1, 2}, and {4, 5, 6, 7}. Such chords, which are considered to be extremely dissonant, are well suited for static music in which voices move by small distances within an unchanging harmonic context (Fig. 1[d]) (23). This sort of “permutational” voice-leading is characteristic of much late-twentieth-century non-tonal composition, particularly the works of György Ligeti (24). Efficient “permutational” voice-leadings can also be used to generate independent, relatively efficient voice-leadings A→Tx(A), where x is close to zero.

Transposition by x semitones is an automorphism of the voice-leading orbifold that preserves the “size” of voice-leadings according to any normlike strict weak ordering: for any normlike strict weak ordering, the voice-leading (a1, a2, …, an)→(b1, b2, …, bn) is the same size as (a1 + x, a2 + x, …, an + x)→(b1 + x, b2 + x, …, bn + x). A transpositionally invariant (T-invariant) chord is a fixed point of one of these automorphisms; for n-note chords, such fixed points exist only when nx is congruent to 0 mod 12Z.
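The two claims in the preceding paragraph can be illustrated with exact rational arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, "size" is represented by the sorted list of per-voice distances (on which every measure discussed in the Materials and Methods depends), and `even_chord(n)` simply lists the n solutions of nx ≡ 0 mod 12, which also form the chord dividing the octave into n equal parts.

```python
from fractions import Fraction


def transpose(chord, x):
    """T_x applied to an ordered chord, with exact arithmetic mod 12."""
    return tuple((Fraction(p) + Fraction(x)) % 12 for p in chord)


def displacements(source, target):
    """Sorted per-voice distances of the bijective voice-leading source -> target."""
    out = []
    for a, b in zip(source, target):
        d = (Fraction(b) - Fraction(a)) % 12
        out.append(min(d, 12 - d))
    return sorted(out)


def is_t_invariant(chord, x):
    """True when T_x maps the chord, as an unordered multiset, onto itself."""
    return sorted(transpose(chord, x)) == sorted(transpose(chord, 0))


def even_chord(n):
    """The values 12k/n for k = 0..n-1: the n-note chord dividing the octave evenly."""
    return tuple(Fraction(12 * k, n) for k in range(n))


# Transposing both chords by the same amount leaves every displacement unchanged.
before = displacements((0, 4, 7), (11, 4, 8))
after = displacements(transpose((0, 4, 7), 5), transpose((11, 4, 8), 5))
print(before == after)

# The evenly divided n-note chord is fixed by transposition through 12/n.
print([is_t_invariant(even_chord(n), Fraction(12, n)) for n in range(2, 8)])
```

`Fraction` is used instead of floating point so that values such as 12/5 compare exactly after reduction mod 12.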
Chords lying close to these T-invariant chords can be described as nearly T-invariant, since there will be multiple transpositions of such chords located near any single fixed point. These transpositionally related chords can therefore be connected by efficient voice-leadings. On Figure 2, the T6-related perfect fifths {4, 11} and {5, 10} lie close to the same T6-invariant tritone, {4.5, 10.5}. This accounts for the small voice-leading (4, 11)→(5, 10). By transposing the second chord down by a semitone, we can obtain a fairly efficient voice-leading (4, 11)→(4, 9). This voice-leading occurs between the top and lower-middle voices of the first two chords in Fig. 1(b); analogous voice-leadings connect the top and lower-middle voices of the remaining chords in the progression. (The other two voices are linked by the smooth voice-leading between T5-related tritones, which appears on Figure 2 as rightward motion.) Similarly, on the three-dimensional voice-leading orbifold the T4-related C and E major triads lie close to the same T4-invariant augmented triad; for this reason, they are connected by a small voice-leading (0, 4, 7)→(11, 4, 8). Again, by transposing the second chord up by a semitone, this voice-leading generates a fairly efficient voice-leading between T5-related major triads; the result is the first voice-leading shown in Figure 1(a). The remaining voice-leadings in Figure 1(a) can all be derived from this one by transposition and time-reversal.

T-invariance is due to the evenness with which a chord’s elements are distributed in pitch-class space. A T-invariant chord either divides the octave into equal parts, and occupies the center of the orbifold, or is the union of equally sized chords that themselves divide the octave evenly (25). (The union of differently sized chords that evenly divide the octave is not in general T-invariant.) Likewise, a nearly T-invariant chord divides the octave into nearly equal parts, or is the union of n-note chords that do so. In general, the more evenly spaced a chord, the closer it will be to the center of the orbifold, and the smaller will be its bijective voice-leadings to its T-equivalent forms (see the Materials and Methods).
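The link between evenness and small voice-leadings can be observed numerically. The sketch below is not the polynomial-time algorithm mentioned earlier; it is a brute-force search over all orderings, and it assumes one particular measure of size, the sum of the per-voice distances ("smoothness" in the Materials and Methods). The function names are illustrative.

```python
from itertools import permutations


def circ_dist(a, b):
    """Distance between two pitch-classes on the circle R/12Z."""
    d = (b - a) % 12
    return min(d, 12 - d)


def transpose(chord, x):
    """T_x applied to a chord."""
    return tuple((p + x) % 12 for p in chord)


def minimal_smoothness(source, target):
    """Smallest total motion over all bijective voice-leadings source -> target."""
    if len(source) != len(target):
        raise ValueError("bijective voice-leadings need chords of equal size")
    return min(
        sum(circ_dist(a, b) for a, b in zip(source, ordering))
        for ordering in permutations(target)
    )


# Three trichords of decreasing evenness, each compared with its T4 transposition.
for name, chord in [
    ("augmented triad", (0, 4, 8)),
    ("major triad", (0, 4, 7)),
    ("chromatic cluster", (0, 1, 2)),
]:
    print(name, minimal_smoothness(chord, transpose(chord, 4)))
```

Under these assumptions the three sizes come out as 0, 2 and 12: the perfectly even chord is T4-invariant, the nearly even major triad reaches its T4 form through (0, 4, 7)→(11, 4, 8), and the cluster must move every voice by four semitones.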
Indeed, it can be shown that the chord which divides pitch-class space into n equal parts has the smallest possible minimal bijective voice-leading to all of its transpositions: for all n-note chords A, the minimal bijective voice-leading between A and Tx(A) can be no smaller than the minimal bijective voice-leading between E and Tx(E), where E divides pitch-class space into n equal parts (see the Materials and Methods). A corollary covers the discrete case of a finite evenly tempered pitch-class space (see the Materials and Methods).

This fact has a singularly important musical consequence: “acoustically consonant” chords tend to be nearly T-invariant. Acoustic consonance is incompletely understood; however, most music theorists agree that chords approximating the first few consecutive pitch-classes of the harmonic series will be consonant when played with harmonic tones (26). Remarkably, the structure of the harmonic series ensures that such chords will divide the octave into nearly even parts (Table 1).

The relation between acoustic consonance and near-evenness has had an enormous impact on the development of traditional Western music. The near-evenness of traditional Western harmonic materials implies that these chords are clustered near the center of the voice-leading orbifold; for this reason, there exist transpositions of these chords that can be linked by efficient, independent voice-leadings. This is true whether the chords are transpositionally equivalent (Fig. 1[a-b]) or transpositionally distinct (Fig. 1[c]). Traditional tonal counterpoint, in its essence, consists in the exploitation of these efficient voice-leadings. They exist because of the near-evenness of the underlying sonorities, a property which is in turn attributable to classical composers’ interest in acoustic consonance.

Finally, inversions (or reflections) are automorphisms of the voice-leading orbifold that again preserve the “size” of voice-leadings according to any normlike strict weak ordering. Inversionally invariant (I-invariant) chords are fixed points of some reflection; such fixed points exist for any Ix. A chord that lies near an I-invariant chord is nearly I-invariant, since there will be two I-related chords lying close to the same I-invariant chord; this again permits small voice-leadings between them. For example, the F♯ “half-diminished seventh” chord {6, 9, 0, 4} and the F “dominant seventh” chord {5, 9, 0, 3} lie near the same (I-invariant) chord {5.5, 9, 0, 3.5}: this permits the efficient voice-leading (6, 9, 0, 4)→(5, 9, 0, 3), shown in Fig. 1(c). I-invariant chords can be highly consonant, like (0, 3, 7, 10), or highly dissonant, like (0, 1, 2, 3).
However, composers in most Western styles have considered I-related chord pairs to be “similar.” Consequently, they have frequently exploited efficient voice-leadings between inversionally related chords.

Thus we see that the geometrical properties of the orbifolds T^n/S_n give rise to a wide range of related musical practices, each of which exploits different symmetries that a chord might have. Our discussion suggests multiple avenues of further music-theoretical inquiry. First, one could investigate in detail the ways in which Western composers, performers, and improvisers have exploited the three symmetries that can produce small voice-leading: for example, Schubert was fond of the near T4-invariance of the major triad (27), Wagner and Debussy exploited the near I-invariance of the “dominant seventh chord” (Fig. 1[c]), while contemporary jazz harmony frequently exploits the near T-symmetry of the perfect fifth (Fig. 1[b], top and lower-middle voice). Second, one could investigate how the mathematical properties described in this paper have influenced the broader course of music history, examining how the concern for efficient voice-leading interacted with, and presumably helped motivate, the increasing “chromaticism” of nineteenth-century music. Third, one could investigate whether distances in the voice-leading orbifold correlate with perceptual judgments of similarity among chords, a topic of considerable recent theoretical interest (28). Finally, a clear understanding of the relation between chord structure and voice-leading may suggest new techniques to contemporary composers.


NOTES

1. For a glossary of mathematical and musical terms and abbreviations used in this paper, see Tables S1 and S2.
2. C. Masson, Nouveau Traité des Regles pour la Composition de la Musique (Da Capo, New York, 1967 [1694]).
3. O. Hostinský, Die Lehre von den musikalischen Klängen (H. Dominicus, Prague, 1879).
4. A. Schoenberg, Theory of Harmony (University of California Press, Berkeley, 1978).
5. J. K. Wright, A. S. Bregman, Contemporary Music Review 2, 63 (1987).
6. J. D. Heinichen, Der General-Bass in der Composition (G. Olms Verlag, Dresden, 1728).
7. A. v. Oettingen, Harmoniesystem in dualer Entwicklung (W. Gläser, Leipzig, 1866).
8. H. Riemann, “Die Natur der Harmonik,” Sammlung musikalischer Vorträge (Breitkopf & Härtel, Leipzig, 1882).
9. J. Roeder, A Theory of Voice Leading for Atonal Music, Ph.D. thesis, Yale University (1984).
10. J. Roeder, Perspectives of New Music 25, 362 (1987).
11. R. Cohn, Music Analysis 15, 9 (1996).
12. R. Cohn, Journal of Music Theory 41, 1 (1997).
13. D. Lewin, Journal of Music Theory 42, 15 (1998).
14. R. Morris, Music Theory Spectrum 20, 175 (1998).
15. C. Callender, Journal of Music Theory 42, 219 (1998).
16. A. Childs, Journal of Music Theory 42, 181 (1998).
17. J. Douthett, P. Steinbach, Journal of Music Theory 42, 241 (1998).
18. J. Straus, Music Theory Spectrum 25, 305 (2003).
19. C. Callender, Music Theory Online 10.3 (2004).
20. I. Satake, Proceedings of the National Academy of Sciences 42, 359 (1956).
21. W. Thurston, Three Dimensional Geometry and Topology (Princeton Mathematical Series 35, Princeton University Press, Princeton, 1997).
22. See the “Materials and Methods” section.
23. C. Callender, personal communication.
24. J. Bernard, Music Theory Spectrum 21, 1 (1999).
25. R. Cohn, Journal of Music Theory 35, 1 (1991).
26. W. Sethares, Tuning, Timbre, Spectrum, Scale (Springer, New York, 1998).
27. R. Cohn, Nineteenth-Century Music 22, 213 (1999).
28. I. Quinn, Perspectives of New Music 39, 108 (2002).


[Figure 1: musical notation not reproduced. The panels show the following voice-leadings.]

a) a common classical upper-voice I-IV-I-V-I pattern: (0, 4, 7) I → (0, 5, 9) IV → (0, 4, 7) I → (11, 2, 7) V → (0, 4, 7) I

b) a common jazz-piano “left-hand” voice-leading pattern: (6, 11, 0, 4) D7 → (5, 9, 11, 4) G7 → (4, 9, 10, 2) C7 → (3, 7, 9, 2) F7

c) Wagner, Parsifal (simplified) and Debussy, Prelude to the Afternoon of a Faun: (1, 4, 8, 10) → (2, 5, 8, 10); (6, 9, 0, 4) → (5, 9, 0, 3)

d) in the style of György Ligeti: (11, 0, 1) → (0, 1, 11) → (11, 1, 0)

Figure 1. Efficient voice-leading in the Western tradition. Numbers correspond to pitch-classes, with C = 0, C♯ = 1, etc. The voice-leadings in (a)-(c) are minimal voice-leadings containing no “voice-crossings.” That in (d) is non-minimal, and contains crossings. The four examples exploit three different kinds of near-symmetry: translation in (a) and (b), reflection in (c), and permutation in (d).

[Figure 2: graphic not reproduced. The plot is a lattice of points labelled by two-character dyad names such as 00, 01, 07, and 6t.]

Figure 2. The orbifold T^2/S_2, drawn using a Euclidean metric. Labelled points in the space correspond to equal-tempered dyads; the symbols “t” and “e” refer to 10 and 11, respectively. The left “edge” is identified, with a half-twist, with the right. The two voice-leadings (0, 1)→(1, 0) and (4, 11)→(5, 10) are shown on the graph; the first of these is reflected off the figure’s mirror boundary.

Number of notes | Equal-tempered chord providing the best approximation to the lowest pitch-classes of the harmonic series | Other chords providing reasonably good approximations to the lowest pitch-classes of the harmonic series
2 (dyad) | fifth (0, 7) | -
3 (triad) | major (0, 4, 7) | diminished (0, 3, 6); minor (0, 3, 7); augmented (0, 4, 8)
4 (seventh chords) | dominant (0, 4, 7, 10) | diminished (0, 3, 6, 9); half-diminished (0, 3, 6, 10); minor (0, 3, 7, 10); major (0, 4, 7, 11)
5 (ninth chords) | dominant ninth (0, 2, 4, 7, 10) | pentatonic (0, 2, 4, 7, 9)
7 (scales) | melodic minor, ascending form (0, 2, 4, 6, 7, 9, 10) | major (0, 2, 4, 5, 7, 9, 11); harmonic minor (0, 2, 3, 6, 7, 9, 10)

Table 1. Familiar sonorities used in Western music. The sonorities in the middle column provide the best equal-tempered approximations to the first n pitch-classes of the harmonic series. The commonly used sonorities in the right-hand column also approximate the first n pitch-classes of the harmonic series. All sonorities divide pitch-class space fairly evenly.

MATERIALS AND METHODS

TABLE OF CONTENTS

1. Comparing voice-leadings
2. Minimal voice-leadings and voice-crossings
3. A polynomial-time algorithm for finding a minimum voice-leading between two chords
4. Derivation of the voice-leading orbifolds
5. Efficient voice-leading and symmetry
6. Evenness and transpositional invariance

1. Comparing voice-leadings.

Let a be an element of R/12Z. We define the norm of a, written |a|_{12Z}, as the smallest real number |x| such that x and a are congruent mod 12Z. (Here |x| refers to the standard absolute-value function.) The distance between two pitch-classes a and b is |b – a|_{12Z}. We define the displacement multiset associated with a voice-leading A→B as the multiset of distances |b_j – a_i|_{12Z} for all (a_i, b_j) in A→B. For example, the displacement multiset associated with the voice-leading (0, 0, 4, 7)→(11, 2, 5, 7) is {1, 2, 1, 0}.

We will require that any method of comparing voice-leadings depend only on their displacement multisets: for any two displacement multisets X and Y, it tells us which, if any, is larger. More formally, a method of comparing voice-leading size will be an asymmetric, negatively transitive relation (a strict weak order) over multisets of nonnegative reals. (A relation “>” is “asymmetric” if A > B implies that not B > A. It is “negatively transitive” if A > B implies that either A > C or C > B, for all C.) A strict weak order defines equivalence classes consisting of all non-comparable items: A ≡ B iff neither A > B nor B > A. Strict weak orders are stronger than partial orders, since they satisfy the trichotomy axiom: for any two elements in a strictly weakly ordered set, either A > B, A ≡ B, or B > A. However, a strict weak order is weaker than a total order, since it does not satisfy the “antisymmetry” condition: in a strict weak order, A ≡ B does not imply that A and B are the same object.
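The definitions of the norm and the displacement multiset translate directly into code. In this sketch the names `norm_mod12` and `displacement_multiset` are illustrative, a bijective voice-leading is assumed to be given as two ordered tuples of equal length, and Python's `collections.Counter` stands in for a multiset.

```python
from collections import Counter


def norm_mod12(a):
    """|a|_{12Z}: the smallest |x| with x congruent to a mod 12."""
    r = a % 12
    return min(r, 12 - r)


def displacement_multiset(source, target):
    """Multiset of distances |b - a|_{12Z} over the pairs (a, b) of a voice-leading."""
    if len(source) != len(target):
        raise ValueError("expected one target note per source note")
    return Counter(norm_mod12(b - a) for a, b in zip(source, target))


# The example from the text: (0, 0, 4, 7) -> (11, 2, 5, 7)
print(sorted(displacement_multiset((0, 0, 4, 7), (11, 2, 5, 7)).elements()))
```

Because a `Counter` ignores order, two voice-leadings that differ only in how their voices are listed receive the same displacement multiset, as the text requires.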

Let > be a strict weak order of multisets of nonnegative reals. We will say that the relation > is normlike if and only if it satisfies two constraints:

\[ \{x_1, x_2, \ldots, x_m, c\} > \{y_1, y_2, \ldots, y_n, c\} \implies \{x_1, x_2, \ldots, x_m\} > \{y_1, y_2, \ldots, y_n\} \quad \text{(Recursion)} \]

\[ \{x_1 + i, x_2, \ldots, x_n\} \geq \{x_1, x_2 + i, \ldots, x_n\} \geq \{x_1, x_2, \ldots, x_n\}, \quad \text{for } x_1 > x_2,\ i > 0 \quad \text{(Distribution)} \]

(NB: since multisets are unordered, the numerical subscripts do not have ordinal significance: x_1 is no more “first” than x_2 or x_n.) The recursion constraint mandates a predictable relationship between the size of a multiset and the size of its sub-multisets. The distribution constraint’s first inequality requires that if X is an n-element multiset whose values sum to x, then {x, 0, 0, …, 0} ≥ X ≥ {x/n, x/n, …, x/n}. Thus, x semitones of motion in a single “voice” yields at least as large a voice-leading as x semitones of motion distributed over multiple voices. As we will see below, this constraint is closely related to the triangle inequality. The distribution constraint’s second inequality requires that reducing the size of an element in a displacement multiset not make that multiset larger. If a normlike strict weak order strictly satisfies both of the distribution constraint’s inequalities, we will say that it strictly satisfies the distribution constraint.

At present, every music-theoretical method of measuring voice-leading size produces a normlike strict weak order of multisets of nonnegative reals. All but one strictly satisfy the distribution constraint.

A. “Smoothness.” The size of a voice-leading is the sum of the elements of the displacement multiset (S1, S2, S3). This is sometimes called the “taxicab norm.” Smoothness satisfies the distribution constraint non-strictly.

B. Smoothness is analogous to the L1 vector norm, though the components of vectors are ordered whereas the elements of displacement multisets are not. The analogues to Lp vector norms strictly satisfy the distribution constraint for finite p > 1. (The L∞ vector norm also satisfies the distribution constraint, but not strictly.) The L2 vector norm, which has been used by Callender (S4), corresponds to the Euclidean norm.


C. “Parsimony.” Parsimony generalizes a notion introduced by Richard Cohn and developed by Jack Douthett and Peter Steinbach (S5, S6). Given two voice-leadings, α and β, α is smaller (or “more parsimonious”) than β iff there exists some real number j such that 1) for all real numbers i > j, i appears the same number of times in the displacement multisets associated with α and β; and 2) j appears fewer times in the displacement multiset of α than in that of β.

D. “Smoothness then parsimony.” This measure represents my own best hypothesis about how classical composers might have thought about voice-leading size. Given two voice-leadings α and β, α is smaller than β iff: 1) α is smoother than β; or 2) α and β are equally smooth, and α is more parsimonious than β.

Many of these methods of measuring voice-leading size yield mathematical “norms”: there is some function f from multisets to the real numbers, such that f(X) > f(Y) if and only if X > Y according to the normlike strict weak order >. Note, however, that neither “parsimony” nor “smoothness then parsimony” can give rise to such a function f. Nevertheless, both “parsimony” and “smoothness then parsimony” represent musically viable ways of thinking about voice-leading size. For this reason, we cannot simply impose the mathematically-convenient requirement that measurements of voice-leading size produce “norms” or “metrics.”

However, both “parsimony” and “smoothness then parsimony” are very closely related to traditional norms. “Parsimony” refines the L∞ vector norm, according to which the size of a voice-leading is given by the largest element in its displacement multiset. Given two voice-leadings α and β, if α < β according to the L∞ norm then α is more parsimonious than β. However, the converse does not hold: the voice-leadings {3, 3} and {3, 0} have the same L∞ norm but the first is less parsimonious than the second. “Parsimony” is therefore closely related to, but slightly more fine-grained than, the L∞ norm.
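The measures A through D can be tried out on concrete displacement multisets. The following Python sketch is illustrative only: the function names are hypothetical (they do not come from the text), and displacement multisets are assumed to be given as lists of non-negative numbers.

```python
from collections import Counter


def smoothness(displacements):
    """Measure A: sum of the displacement multiset (L1 analogue)."""
    return sum(displacements)


def lp_size(displacements, p):
    """Measure B: analogue of the Lp vector norm, for finite p >= 1."""
    return sum(d ** p for d in displacements) ** (1.0 / p)


def more_parsimonious(alpha, beta):
    """Measure C: True iff alpha is strictly more parsimonious than beta.

    The value j in the definition must be the largest value whose
    multiplicity differs between the two multisets.
    """
    count_a, count_b = Counter(alpha), Counter(beta)
    for j in sorted(set(count_a) | set(count_b), reverse=True):
        if count_a[j] != count_b[j]:
            return count_a[j] < count_b[j]
    return False  # identical multisets: neither is smaller


def smoother_then_more_parsimonious(alpha, beta):
    """Measure D: compare by smoothness first, break ties by parsimony."""
    if smoothness(alpha) != smoothness(beta):
        return smoothness(alpha) < smoothness(beta)
    return more_parsimonious(alpha, beta)


# {3, 3} and {3, 0} share the same largest element, yet {3, 0} is
# more parsimonious, as in the example in the text.
print(max([3, 3]) == max([3, 0]), more_parsimonious([3, 0], [3, 3]))
```

Note that `more_parsimonious` and `smoother_then_more_parsimonious` return comparisons rather than numbers, reflecting the point that these two measures do not arise from a real-valued function f.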
“Smoothness then parsimony” stands in an analogous relation to “smoothness,” the L1 vector norm. For this reason, we can often reason about “parsimony” and “smoothness then parsimony” using our geometric intuitions about the L∞ and L1 norms, respectively. This point holds more generally. As the name suggests, the notion of a normlike strict weak order is a weakened analogue to a traditional geometrical “norm.” We can

think of the displacement multiset associated with the voice-leading A→B as a non-realvalued “norm” of the voice-leading A→B. Likewise, the displacement multiset associated with the minimal voice-leading A→B is analogous to a non-real-valued “distance” between A and B. This non-real-valued “distance” has many of the properties associated with a proper mathematical metric: 1. It is symmetric, since the displacement multiset associated with the minimal voice-leading A→B is the same as the displacement multiset associated with the minimal voice-leading B→A. 2. The minimal voice-leading A→A has displacement multiset {0, 0, …, 0}, which is at least as small as any other displacement multiset with the same number of elements. In this sense, the “distance” between A and A is as small as it can be. 3. If the displacement multiset associated with the minimal bijective voice-leading A→B is {0, 0, …, 0} then A = B. 4. Finally, the distribution constraint is closely related to the triangle inequality. Indeed, as long as we require that the size of a voice-leading depend only on the size of its displacement multiset, then the two principles are equivalent: any violation of the distribution constraint generates a violation of the triangle inequality, and vice-versa. (This is fairly obvious in the case of the metrics associated with the Lp vector norms, and less than obvious in the general case. I sketch a proof at the end of §2, below.) Intuitively, both the distribution constraint and the triangle inequality express the principle that x steps in a single direction take you farther than x total steps in a number of mutually orthogonal directions. 2. Minimal voice-leadings and voice-crossings. The following theorem shows that between any two chords there is a minimal voice-leading with no “voice-crossings” in pitch-class space. 
Since avoidance of voice-crossings is a feature of traditional Western musical practice, it helps justify our use of normlike strict weak orders; it furthermore allows us to generate an efficient algorithm for determining the minimal voice-leading between two chords.


THEOREM 1. Let A and B be any two chords, and let our measure of voice-leading size be a strict weak order satisfying the distribution constraint. There will exist a minimal voice-leading from A to B, (a1, a2, …, an)→(b1, b2, …, bn), that has no “voice-crossings” in pitch-class space. That is, there will exist a set of continuous functions fn(t) such that fn(0) = an, fn(1) = bn, and fm(t) ≠ fn(t), for all m ≠ n, and all t such that 0 < t < 1. Furthermore, if our order strictly satisfies the distribution constraint, then every minimal voice-leading between A and B will be crossing-free.

The theorem is proved by a simple examination of cases. Suppose that a voice-leading A→B contains a crossing; we will show that we can remove the crossing without increasing the size of the voice-leading and without creating any new crossings. In what follows, we will depict pitch-class space as a circle, with ascending motion in pitch-class space corresponding to clockwise motion around the circumference. It is always assumed that 0 < x, x + m, x + n ≤ 6. Note that although the following proof is stated in terms of pitch-classes, a precisely analogous result applies to pitches; here, “chords” are simply multisets of real numbers, and there is always a minimal voice-leading with no crossings in pitch-space.

Figure S3(a) shows the first geometrical possibility: pitch-class a1 moves n semitones counterclockwise to b2 while pitch-class a2 moves x + m semitones counterclockwise to b1, with 0 ≤ n < m. The uncrossed voice-leading (a1, a2)→(b1, b2) has displacement multiset {m, x + n}. Since m > n, the distribution constraint implies that {m, x + n} ≤ {x + m, n}. The uncrossed voice-leading is no larger than the voice-leading with the crossing; if the strict weak order strictly satisfies the distribution constraint, then the uncrossed voice-leading is smaller.
Figure S3(b) shows a second possibility: a1 moves clockwise by n semitones to b2, while a2 moves counterclockwise by x + m semitones to b1, with m ≥ 0, x > n > 0. The voice-leading (a1, a2)→(b2, b1) is associated with the displacement multiset {n, x + m}; the voice-leading (a1, a2)→(b1, b2) is associated with {m, x – n}. By the distribution constraint, {x + m, n} ≥ {x, n + m} ≥ {x – n, m}, so the uncrossed voice-leading is no larger than the crossed voice-leading. If the strict weak order strictly satisfies the distribution constraint, the uncrossed voice-leading is smaller.

Figure S3(c) shows a third possibility. m + n > x, since otherwise there would be no crossing. This implies x – m < n and x – n < m. Therefore {m, n} ≥ {x – m, x – n}, and the uncrossed voice-leading is no larger than the crossed voice-leading. Again, if the strict weak order strictly satisfies the distribution constraint then the uncrossed voice-leading is smaller. The remaining cases are closely analogous to those already considered, and are left for the interested reader to verify.

It remains to be shown that we can follow the above procedures without creating any new voice-crossings. This is readily seen from Figure S4. Without loss of generality, we can choose points b1 and b2 in Figure S4 to be adjacent. We connect every note in the source chord to its destination by a path that has no unnecessary crossings, as in Figure S4. Figure S4(a) features the crossing (a1, a2)→(b2, b1), as well as two additional types of voice-crossing: c1→d1, which crosses the line a1→b2, and c2→d2, which crosses both a1→b2 and a2→b1. Figure S4(b), which removes the crossing (a1, a2)→(b2, b1), shows that the remaining crossings c1→d1 and c2→d2 are unaffected. Removing the crossing therefore reduces the total number of voice-crossings in the voice-leading. The crossings shown in Figure S4, along with those that can be obtained from this figure by reflection, exhaust the relevant geometrical possibilities. We conclude that it is possible to remove a voice-leading’s crossings without making the voice-leading larger. If our normlike strict weak order strictly obeys the distribution constraint, then removing voice-crossings will always make the voice-leading smaller.

Theorem 1 is significant because it ties an important musical notion, “voice-crossing,” to an important mathematical one: the triangle inequality, as represented by its close cousin, the distribution constraint.
It is widely accepted that avoidance of “voice-crossings” in pitch space is a feature of traditional Western compositional practice (S7). Theorem 1, which can easily be adapted to cover the case of voice-leadings in non-circular pitch space, shows that normlike strict weak orderings are compatible with this feature of classical practice. Moreover, it is easy to show that if a method of comparing voice-leading size violates the distribution constraint, then there will be at least one “crossed” voice-leading (in either pitch or pitch-class space) that is preferred to its

uncrossed alternative. Thus the distribution constraint and the principle of avoiding voice-crossings are equivalent within the limits of the formalism we have developed. At the same time, the distribution constraint is closely related to the triangle inequality. This allows us to use the minimal voice-leading between two chords to define a “distance” between them, thereby underwriting the geometrical approach of the present paper. Again, Theorem 1 is interesting precisely because it shows that our reference to the geometrical concept of “distance” requires that we not prefer crossed voice-leadings to their uncrossed alternatives. Consequently, were classical composers to have favored voice-crossings, we would not be able to speak of the “distance” between chords in the relatively straightforward way that we do here. We would be constrained to talk only about the affine structure of musical chords: roughly, those non-metric properties that depend only on the existence of “straight lines” in the space.

We conclude this section with a brief sketch of a proof that the distribution constraint is equivalent to the triangle inequality. Let A and C be chords. The triangle inequality requires that a bijective voice-leading A→C be no larger than the combined length of any pair of bijective voice-leadings A→B and B→C that take A to C by way of B in such a way as to preserve the mappings of the “direct” voice-leading A→C. It is straightforward to identify the displacement multiset associated with A→B→C when A, B, and C are collinear: one simply adds the elements of the displacement multisets associated with A→B and B→C so as to be faithful to the musical voices’ motions. The displacement multiset associated with non-collinear A→B→C, if defined, is simply the displacement multiset associated with A→B→D, with A, B, and D collinear and B→C the same size as B→D.
A normlike strict weak ordering does not ensure that there is a displacement multiset associated with all paths A→B→C; but it does ensure that, if there is, it is no smaller than that associated with the direct voice-leading A→C. To see why, suppose there is some crossed voice-leading between chords A and C that is preferred to the uncrossed voice-leading A→C. There will be a pair of voice-leadings A→B→C that has the same combined displacement multiset as the crossed voice-leading but which preserves the mappings of the “direct” voice-leading A→C. (Here B is the point where the two voices cross as they move linearly from notes in A to

their counterparts in C.) Since the crossed voice-leading is preferred, the combined voice-leadings A→B→C are smaller than A→C, which violates the triangle inequality. Conversely, suppose there is a triangle ABC such that the combined voice-leadings A→B→C are smaller than the “direct” voice-leading A→C. There is a voice-leading A→D with the same displacement multiset as A→B→C. Since A→B→C form two legs of a triangle, it is easy to show that the preference for A→D over A→C must violate the distribution constraint.

3. A polynomial-time algorithm for finding a minimum voice-leading between two chords. Given two chords A and B, how do we find a minimal voice-leading between them? The question is non-trivial, since minimal voice-leadings need not be bijective: using any of the standard measures of voice-leading size, the minimal voice-leading between {0, 4, 6} and {6, 10, 0} is (0, 0, 4, 6)→(10, 0, 6, 6). The large number of possibilities here (roughly 2^(mn), where m and n are the cardinalities of the two chords) makes an exhaustive search impractical, particularly in time-critical applications such as interactive computer music. However, Theorem 1 enables us to use the technique of “dynamic programming,” common in computer science, to provide an efficient, polynomial-time algorithm (order n^2 m) for determining a minimal voice-leading between arbitrary chords.

Define the ascending distance from pitch-class a to b as the smallest positive real number x such that a + x is congruent to b, mod 12Z. Let (a1, a2, …, am, am+1 = a1) order the elements of chord A based on ascending distance from arbitrarily-chosen a1. (Note that we repeat the first element a1 as the last element of the list.) Similarly, for (b1, b2, …, bn, bn+1 = b1). The notation [a1, …, ai]→[b1, …, bj] will refer to all voice-leadings from {a1, a2, …, ai} to {b1, b2, …, bj} that can be notated so that both chords’ subscripts are in nondecreasing order.
Thus [a1, a2]→[b1, b2, b3] includes (a1, a1, a2)→(b1, b2, b3), (a1, a1, a2, a2)→(b1, b2, b3, b3), and so on. If a crossing-free voice-leading contains the pair (ai, bj) then it must contain at least one of the following: (ai-1, bj), (ai, bj-1), or (ai-1, bj-1) (subscript arithmetic modulo the cardinality of the chords). By the recursion constraint, the smallest voice-leading of the

form [a1, …, ai]→[b1, … bj] will be the voice-leading that adds the pair (ai, bj) to the smallest voice-leading of the form [a1, …, ai-1]→[b1 … bj], [a1, …, ai]→[b1 … bj-1], or [a1, …, ai-1]→[b1 … bj-1]. Thus, once we have fixed the pair (a1, b1) we can recursively compute the minimal voice-leading between A and B that contains that pair. We do this by creating a matrix whose cells ei, j record the size of the minimal voice-leading of the form [a1, …, ai]→[b1, … bj]. It is trivial to fill in the first row and column of the matrix; from there, we can proceed to fill in the rest. At each step, we need only consider the voice-leadings in a cell’s upper, left, and upper-left neighbors. Figure S5 illustrates the technique, identifying the smallest voice-leading between the C and E major-seventh chords, {4, 7, 11, 0} and {4, 8, 11, 3}, such that the voiceleading contains the pair (4, 4). In constructing this matrix we have used “smoothness” (or “taxicab norm”) to measure the voice-leading size. The voice-leading in the bottomright cell is the minimal voice-leading between the two chords that contains (4, 4). To remove this last restriction, we would need to repeat the calculation three more times, each time cyclically permuting the order of one of the chords so as to fix a different initial pair. As it happens, however, the voice-leading shown in Figure S5 is the minimum voice-leading between the respective chords. This follows from the fact that the voice-leading in the top-left cell (4→4) contributes nothing to the overall size of the voice-leading; we can therefore add this mapping to any voice-leading without increasing its size according to the L1 norm. Figure S5 includes in each cell both the numerical size of the voice-leading and the voice-leading itself. With the L1 norm (“smoothness”) this is unnecessary: we need to keep track of the size, but not the voice-leading. 
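The matrix-filling procedure of this section can be sketched in Python as follows. This is a minimal sketch rather than the author’s implementation: it assumes “smoothness” (the L1 measure), returns only the size of the minimal voice-leading (no traceback), and uses hypothetical names (`pc_distance`, `minimal_voice_leading_size`). Each pass fixes the initial pair (a1, b1) for one cyclic rotation of the second chord, and the doubly counted wraparound pair is subtracted before comparison.

```python
def pc_distance(a, b, octave=12):
    """Shortest distance between two pitch classes on the circle."""
    d = (a - b) % octave
    return min(d, octave - d)


def minimal_voice_leading_size(chord_a, chord_b, octave=12):
    """Smallest L1 size of any voice-leading (bijective or not) from A to B."""
    a = sorted(x % octave for x in chord_a)
    b = sorted(x % octave for x in chord_b)
    a_ext = a + a[:1]  # repeat a1 as the last element
    best = None
    for k in range(len(b)):  # each rotation fixes a different pair (a1, b1)
        rot = b[k:] + b[:k]
        b_ext = rot + rot[:1]  # repeat b1 as the last element
        e = [[0] * len(b_ext) for _ in a_ext]
        for i in range(len(a_ext)):
            for j in range(len(b_ext)):
                if i == 0 and j == 0:
                    prev = 0
                elif i == 0:
                    prev = e[0][j - 1]
                elif j == 0:
                    prev = e[i - 1][0]
                else:
                    # upper, left, and upper-left neighbors
                    prev = min(e[i - 1][j], e[i][j - 1], e[i - 1][j - 1])
                e[i][j] = prev + pc_distance(a_ext[i], b_ext[j], octave)
        # the bottom-right cell counts the pair (a1, b1) twice
        size = e[-1][-1] - pc_distance(a[0], rot[0], octave)
        if best is None or size < best:
            best = size
    return best


# The non-bijective example from the text: (0, 0, 4, 6)->(10, 0, 6, 6).
print(minimal_voice_leading_size([0, 4, 6], [6, 10, 0]))  # 4
```

For measures other than smoothness, the cells would need to store displacement multisets (or the voice-leadings themselves) instead of a running sum, and the final correction would remove one copy of the repeated pair rather than subtract a number.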
To determine the value of cell ei, j we can simply add the distance between the pair (ai, bj) to the minimum value in the cells ei-1, j, ei, j-1, and ei-1, j-1. (With the Euclidean metric we can calculate squared distance in this way, taking the square-root just before output.) Having filled in the matrix, we can recover the minimal voice-leading by “tracing back” all paths that move from the bottom-right cell to the top left, moving only north, west, and northwest, such that the size of the


voice-leading decreases as much as possible with each step. The cells in boldface indicate the path that such a traceback algorithm would take. Due to the circular structure of pitch-class space, the voice-leading in the lower right-hand corner of the matrix counts the pair (a1, b1) = (am+1, bn+1) twice; this can easily be corrected prior to output.

Finally, note that we need only consider n distinct possibilities to find a minimal bijective voice-leading A→B. Let (a0, a1, …, an-1) order the elements of chord A based on ascending distance from arbitrarily-chosen element a0. Similarly for (b0, b1, …, bn-1). By Theorem 1, there will be a minimal bijective voice-leading between A and B of the form (a0, a1, …, an-1)→(bc, bc+1, …, bc+n-1), where c is an integer and the subscript arithmetic is reduced mod n.

4. Derivation of the voice-leading orbifolds. We begin by deriving Figure 2 in the main text. Figure S6 shows the 2-torus T2, drawn using a Euclidean metric, and representing ordered 2-note chords. To form a graph of unordered chords we need to identify all points (x, y) and (y, x). As can be seen from Figure S6, this involves “folding” the 2-torus along the diagonal line AB. The result is a “triangle” whose two sides are identified, shown in Figure S7. Although it may not be immediately obvious, this figure is a Möbius strip. To see why, cut Figure S7 along the line CD. This creates two detached triangles. Then glue line AC on one triangle to CB on the other. (You will have to turn one piece of paper over to get the chords to line up.) The result is the main text’s Figure 2.

We now proceed more abstractly, describing the orbifolds Tn/Sn for arbitrary n. For simplicity of exposition and ease of visualization, we will assume the Euclidean metric in what follows. Since pitch-class space is represented by the circle R/12Z we are interested in the n-torus (R/12Z)n.
The quotient space (R/12Z)n/Sn can also be written Rn/(Sn × 12Zn), where Zn refers to the group of n-tuples of integers, and the Zn action is by componentwise addition. (The notation 12Zn indicates that the components of each n-tuple in Zn are to be multiplied by the scalar 12.) We will proceed by deriving a fundamental domain of Sn × 12Zn in Rn. (A “fundamental domain” of the group Γ in

space S is a region R of S, such that S is the union of the regions gR, for all g ∈ Γ, and such that the intersection of any two regions gR and hR, for g ≠ h, has no interior.) By identifying the appropriate boundary points of this fundamental domain, we will obtain the orbifold Rn/(Sn × 12Zn).

We first describe a fundamental domain of Sn in Rn. In this region, no two distinct points (x1, x2, …, xn) and (y1, y2, …, yn) have coordinates that are equivalent under some permutation: that is, there is no σ(n) such that (x1, x2, …, xn) = (yσ(1), yσ(2), …, yσ(n)), where σ(n) is some permutation of the integers from 1 to n. We can create such a region simply by requiring that a point’s coordinates be in nondescending order: i.e. considering all points (x1, x2, …, xn) such that x1 ≤ x2 ≤ … ≤ xn. We can incorporate the 12Zn action by requiring that xn ≤ x1 + 12, and 0 ≤ Σnxn ≤ 12. In Euclidean space, the resulting fundamental domain is a right hyperprism whose faces are n-1 dimensional simplexes. To see why, observe that

1. The n inequalities x1 ≤ x2 ≤ … ≤ xn ≤ x1 + 12 define an n-1 simplex in every plane Σnxn = n.

2. Addition by (c, c, …, c) sends the simplex in the plane Σnxn = n to the simplex in the plane Σnxn = n + cn.

3. The planes Σnxn = n are perpendicular to the vector (1, 1, …, 1).

The vector (1, 1, …, 1) points in the direction of the “height” coordinate of the prism; the prism’s “faces” lie in planes perpendicular to the vector (1, 1, …, 1) and therefore contain chords whose pitch-classes sum to the same value. Our construction of the fundamental domain ensures that no two points on a single plane Σnxn = n can represent the same chord. However, the planes do contain chords related by transposition: if (x1, x2, …, xn) satisfies the inequalities x1 ≤ x2 ≤ … ≤ xn ≤ x1 + 12, then so does the transpositionally-related (x2 – 12/n, x3 – 12/n, …, xn – 12/n, x1 + 12 – 12/n), which has the same sum as (x1, x2, …, xn).
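As an illustration of the inequalities that define the fundamental domain, the following Python sketch moves an arbitrary chord to a representative satisfying x1 ≤ x2 ≤ … ≤ xn ≤ x1 + 12 and 0 ≤ Σnxn ≤ 12. The procedure (reduce mod 12, sort, then repeatedly move the top note down an octave) is one possible construction assumed here for illustration; it is not taken from the text, and the function names are hypothetical.

```python
def to_fundamental_domain(chord, octave=12):
    """Return a representative of the chord inside the fundamental domain."""
    # Reducing mod 12 and sorting gives x1 <= ... <= xn <= x1 + 12.
    point = sorted(x % octave for x in chord)
    # Moving the top note down an octave lowers the sum by 12 while
    # keeping the coordinates ordered and within a 12-semitone span.
    while sum(point) > octave:
        point = [point[-1] - octave] + point[:-1]
    return tuple(point)


def in_fundamental_domain(point, octave=12):
    """Check the ordering, span, and sum inequalities for a non-empty point."""
    ordered = all(a <= b for a, b in zip(point, point[1:]))
    return ordered and point[-1] <= point[0] + octave and 0 <= sum(point) <= octave


# D, G, B (2, 7, 11) sums to 20, so the top note is lowered by an octave.
print(to_fundamental_domain([2, 7, 11]))  # (-1, 2, 7)
```

Points on the boundary of the domain can still have more than one representative; those are exactly the points that the boundary identifications discussed in this section glue together.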
Let O refer to the function that sends (x1, x2, …, xn) to (x2 – 12/n, x3 – 12/n, …, xn – 12/n, x1 + 12 – 12/n). By repeatedly applying O to a chord X, we can obtain n transpositionally equivalent chords X, O(X), O2(X), … On-1(X) whose pitch-classes all sum to the same value. (If X is invariant under some transposition, then some of the chords On(X) will be equal.) In Euclidean space, O

is an orthogonal transformation that is an automorphism of the prism: it is a rotation when the prism has an odd number of dimensions, and a rotation-plus-reflection otherwise. O acts so as to cyclically permute the vertices of the simplex in each plane Σnxn = n. It remains to be determined how the two simplicial faces of the prism are to be identified. We cannot identify them in the obvious way, since this would identify point (x1, x2, …, xn) on the Σnxn = 0 face of the prism with the transpositionally-distinct chord (x1 + 12/n, x2 + 12/n, …, xn + 12/n) on the Σnxn = 12 face. Notice, however, that

O(x1 + 12/n, x2 + 12/n, …, xn + 12/n) = (x2, x3, …, xn, x1 + 12) (3)
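Equation (3) and the sum-preserving property of O can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `apply_O` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that 12/n introduces no rounding error.

```python
from fractions import Fraction


def apply_O(point, octave=12):
    """Send (x1, ..., xn) to (x2 - 12/n, ..., xn - 12/n, x1 + 12 - 12/n)."""
    n = len(point)
    shift = Fraction(octave, n)
    return tuple(x - shift for x in point[1:]) + (point[0] + octave - shift,)


x = (0, 4, 7)  # C major triad, n = 3
n = len(x)

# Left-hand side of equation (3): apply O to the chord transposed by 12/n.
transposed = tuple(v + Fraction(12, n) for v in x)
print(apply_O(transposed) == x[1:] + (x[0] + 12,))  # True

# O preserves the sum of the coordinates.
print(sum(apply_O(x)) == sum(x))  # True
```

Applying `apply_O` n times returns the original point, which is consistent with O permuting the n transpositionally equivalent chords that share a given sum.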

(x2, x3, …, xn, x1 + 12) represents the same chord as (x1, x2, …, xn). We therefore need to identify (x1, x2, …, xn) with O(x1 + 12/n, x2 + 12/n, …, xn + 12/n) = (x2, x3, …, xn, x1 + 12). Colloquially, we apply the transformation O to the Σnxn = 12 face before “gluing the two faces together.” This identification transforms the prism’s “height coordinate” into a circle: in moving parallel to the vector (1, 1, …, 1) we pass through all and only the transpositions of a given chord, returning eventually to our starting point. Thus we can describe the orbifold (R/12Z)n/Sn as the product of an n-1 simplex with a circle, modulo the action that rotates the circle by 360/n degrees while applying the transformation O to the simplex.

5. Efficient voice-leading and symmetry. Let A be an n-note chord and let (a1, a2, …, an) be an arbitrary ordering of its elements. The symbol σ(a1, a2, …, an) will refer to the ordering (aσ(1), aσ(2), …, aσ(n)), where σ(n) is some permutation of the integers from 1 to n. We will use the notation A→σ(A) to refer to any voice-leading A→A that can be written

(a1, a2, …, an)→(aσ(1), aσ(2), …, aσ(n)) (4)

An arbitrary n-note chord S will be invariant under σ (or σ-invariant) if the chord’s elements can be labeled so that si = sσ(i) for all i ≤ n.

In what follows, we will use the variable O to refer to a specific permutation σ, transposition Tx, or inversion Iy. We will say that an n-note chord S is invariant under O if there is some voice-leading S→O(S) that is trivial. We will generally assume that O itself is non-trivial: that is, there is at least one chord that is not invariant under O. Thus we will not be considering the trivial permutation σ(n) = n or the trivial transposition T0(x) = x. It is intuitively obvious that the size of a voice-leading A→S, where S is invariant under some O, sets an upper bound on the size of the minimal voice-leading A→O(A). This is because we can express the voice-leading A→O(A) as the composition of two equally-sized voice-leadings A→S and S→O(A). (For any A→S, we can find an equally large S→O(A), since S is invariant under O and since a normlike strict weak order is insensitive to the “direction” of the voice-leading.) Write the displacement multiset corresponding to A→S as {d1, d2, …, dn}. We can conclude that the minimal voiceleading A→O(A) can have a displacement multiset no larger than {2d1, 2d2, …, 2dn}. Thus as the size of the voice-leading A→S goes to zero, the minimal voice-leading A→O(A) must also go to zero. The converse, however, is less obvious. Suppose we have some bijective voiceleading A→O(A). Does the size of A→O(A) set an upper bound on the size of the minimal voice-leading A→S, where S is O-invariant? Yes, assuming such an S exists. The following theorem uses the size of A→O(A) to limit the size of A→S, showing that as A→O(A) vanishes so must A→S. Since the result is proven for any normlike strict weak order, it does not set a very tight (or interesting) limit on the voice-leading A→S. However, it does establish the general theoretical point that the size of A→O(A) is dependent on that of A→S. Lemma 2.1. 
Let A be a chord with n elements, let x be some element of R/12Z such that nx is congruent to 0, mod 12Z, and let A→σ(A) be a bijective voice-leading that acts as a cyclical permutation of A’s elements. Label the pitch-classes of A so that the voice-leading A→σ(A) can be written (a0, a1, …, an-1)→(a1, a2, …, an-1, a0). There is a voice-leading A→Tx(A) that has the form

(a0, a1, …, an-1)→(x + a1, x + a2, …, x + an-1, x + a0)

with displacement multiset consisting of values di = |x + (ai+1 – ai)|12Z (subscript arithmetic mod n). There must therefore exist a voice-leading A→S, from A to some chord S that is invariant under Tx (or σ, if x = 0) and which has displacement multiset

$$\left\{\sum_{i=0}^{m-1} d_i,\ \sum_{i=1}^{m-1} d_i,\ \ldots,\ \sum_{i=m-1}^{m-1} d_i,\ \sum_{i=m}^{m} d_i,\ \sum_{i=m}^{m+1} d_i,\ \ldots,\ \sum_{i=m}^{n-2} d_i,\ 0\right\}$$

Here m = ⌊n/2⌋, the greatest integer ≤ n/2.

Proof. The value di = |x + (ai+1 – ai)|12Z measures how close the interval ai+1 – ai is to –x. As Figure S8 shows, we need to move at most n/2 pitch-classes by |x + (ai+1 – ai)|12Z semitones in order to make a given interval ai+1 – ai equal to –x. We can do so, furthermore, without disturbing any of the other di, for i < n-1. Only the intervals ai+1 – ai and dn-1 = W need be disturbed. Since the voice-leading acts as a circular permutation, and since nx is congruent to 0, mod 12Z, we need iterate this procedure only n–1 times in order to obtain a set that is invariant under Tx or σ (if x = 0): once we set n–1 of the intervals equal to –x, the final “wraparound” interval (labeled W on Figure S8) will also be equal to –x. Note that since our choice of a1 is arbitrary, we can choose W so as to minimize the resulting voice-leading A→S.

Lemma 2.2. Let A→Ix(A) be a crossing-free, bijective voice-leading. Since A→Ix(A) is crossing-free, it can be written in the form (a0, a1, …, an-1)→(x – ac-0, x – ac-1, …, x – ac-(n-1)) with displacement multiset {d0, d1, …, dn-1} (subscript arithmetic is mod n). We can therefore find an S such that S is invariant under Ix and the voice-leading A→S has displacement multiset no larger than {d0/2, d1/2, …, dn-1/2}.

Proof. The crossing-free voice-leading A→Ix(A) associates pitch-class ai with x – ac-i (subscript arithmetic mod n). There are two cases to consider, i = c/2, in which case the

voice-leading associates ac/2 with x – ac/2, and i ≠ c/2, in which case the voice-leading associates ai with x – ac-i and ac-i with x – ai.

Case 1. i = c/2. Our voice-leading associates ai with x – ai; the distance between these two points is |x – 2ai|12Z. Now consider the two minimal-length linear paths in pitch-class space: the first from ai to x – ai and the second its retrograde, from x – ai to ai. These paths are reflection symmetrical under Ix: every point ai + ε along the path ai→(x – ai) is mapped by Ix to the point x – (ai + ε) along the path (x – ai)→ai. Therefore, the midpoint af = x – af is fixed by the reflection. Consequently, we can move ai by |x – 2ai|12Z/2 semitones to obtain a pitch-class that is invariant under Ix.

Case 2. i ≠ c/2. Let j = c – i. Our voice-leading associates ai with x – aj and aj with x – ai. Both ai and aj are mapped to pitch-classes |x – (ai + aj)|12Z semitones away. Consider the two minimal linear paths ai→x – aj and aj→x – ai. If we reverse the direction of the second path, we obtain two equal-length paths ai→x – aj and x – ai→aj that are reflection-symmetrical under Ix: every point ai + ε along the path from ai→x – aj is mapped to the point x – (ai + ε) along the path x – ai→aj. The points halfway along these paths, af and x – af, are related by Ix. Therefore, we can move each pitch-class by |x – (ai + aj)|12Z/2 semitones to obtain a pair that is invariant under Ix.

THEOREM 2. Let A be an n-note chord, and let O be a non-trivial permutation, transposition, or inversion such that there exists an n-note chord that is invariant under O. Then, if the displacement multiset associated with A→O(A) is smaller than the n-element multiset {d, 0, 0, …, 0}, there will be a voice-leading A→S, with S invariant under O, and displacement multiset less than or equal to {d, d, 2d, 2d, 3d, 3d, …, ⌊n/2⌋d, 0}. The term ⌊n/2⌋d appears once for even n, twice for odd n.

Proof.
By the distribution constraint, the displacement multiset corresponding to the voice-leading A→O(A) has no terms greater than or equal to d. There are three cases to consider, depending on whether O is a permutation σ, a transposition T, or an inversion I.


Case 1. O is a permutation σ. Since any permutation can be decomposed into cycles, we simply apply Lemma 2.1 to obtain a voice-leading A→S that is no larger than {d, d, 2d, 2d, 3d, 3d, …, ⌊n/2⌋d, 0}, with S invariant under σ.

Case 2. O is a nonzero transposition Tx. By Theorem 1, there exists a crossing-free voice-leading A→Tx(A) whose displacement multiset consists of values less than d. Any crossing-free voice-leading can be decomposed into cycles of the form (a0, a1, …, an-1)→(x + a1, x + a2, …, x + an-1, x + a0). Thus we can again apply Lemma 2.1 to obtain the desired voice-leading.

Case 3. O is an inversion Ix. By Theorem 1, there exists a crossing-free voice-leading A→Ix(A) whose displacement multiset consists of values less than d. By Lemma 2.2, there exists a voice-leading A→S, such that S is invariant under Ix, and with displacement multiset less than or equal to {d/2, d/2, …, d/2}. By the distribution constraint, this multiset is less than or equal to {d, d, 2d, 2d, 3d, 3d, …, ⌊n/2⌋d, 0}.

6. Evenness and transpositional invariance. We begin with an informal argument describing the relation between “evenness” and T-invariance. By Theorem 1, there will be a minimal bijective voice-leading A→Tx(A) of the form (a0, a1, …, an-1)→(ac + x, ac+1 + x, …, ac+n-1 + x), where c is some integer and the subscript arithmetic is reduced mod n. The displacement multiset associated with this voice-leading will consist of the distances |x + (ac+i – ai)|12Ζ. For a chord that divides the octave nearly-evenly, the values ac+i – ai are nearly-constant for all c. (This is simply because the distance between the values ac+i – ai measures how evenly the chord divides c octaves into n equal parts.) Thus, for every c, there will be an x for which ac+i – ai ≅ –x, for all i. The “cyclical” component of the voice-leading offsets the “parallel” component. For chords that evenly divide the octave, the quantities ac+i – ai can be made to approximate –x as closely as is possible for n-note chords. We now provide a rigorous proof of this last statement.
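The informal argument can be made concrete with a short Python sketch. It assumes “smoothness” as the measure of size and searches only the n cyclic alignments (a0, a1, …, an-1)→(bc, bc+1, …, bc+n-1) that Theorem 1 permits; the function names are hypothetical and chosen for illustration.

```python
def pc_distance(a, b, octave=12):
    """Shortest distance between two pitch classes on the circle."""
    d = (a - b) % octave
    return min(d, octave - d)


def transpose(chord, x, octave=12):
    """Tx: add x to every pitch class."""
    return [(p + x) % octave for p in chord]


def minimal_bijective_size(chord_a, chord_b, octave=12):
    """Smallest L1 size over the n cyclic alignments of two n-note chords."""
    a = sorted(p % octave for p in chord_a)
    b = sorted(p % octave for p in chord_b)
    n = len(a)
    if n != len(b):
        raise ValueError("bijective voice-leadings need equal cardinalities")
    return min(
        sum(pc_distance(a[i], b[(i + c) % n], octave) for i in range(n))
        for c in range(n)
    )


major = [0, 4, 7]      # nearly even three-note chord
augmented = [0, 4, 8]  # divides the octave into three equal parts

# C major to E major: the cyclic shift offsets most of the transposition.
print(minimal_bijective_size(major, transpose(major, 4)))  # 2

# For every transposition, the even chord does at least as well.
for x in range(12):
    size_a = minimal_bijective_size(major, transpose(major, x))
    size_e = minimal_bijective_size(augmented, transpose(augmented, x))
    assert size_a >= size_e
```

The loop checks, for integer transpositions only, the inequality that the rigorous argument establishes for all real x.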


THEOREM 3. Let A be any multiset of cardinality n. For all x, the minimal bijective voice-leading between A and Tx(A) can be no smaller than the minimal bijective voice-leading between E and Tx(E), where E divides pitch-class space into n equal parts.

In proving Theorem 3 it is again convenient to work in pitch-space, or Rn. (Note that we do not assume the Euclidean metric on this space.) We will use the symbol ≡nZ to mean “congruent mod nZ.” The symbol applies to both scalars and ordered n-tuples: thus –2.5 ≡12Z 9.5, and (0, 4, 7) ≡12Z (-12, 4, 19). Each chord is represented by an infinite number of points in Rn, all congruent mod 12Z. A voice-leading between two points X, Y ∈ Rn will simply be the ordered n-tuple X – Y = (x1 – y1, x2 – y2, …, xn – yn). The displacement multiset associated with this voice-leading will be the multiset {|x1 – y1|, |x2 – y2|, …, |xn – yn|}. Clearly, for any voice-leading in the orbifold Rn/(Sn × 12Zn), there will be an infinite number of equivalent, equally-sized voice-leadings in Rn. Conversely, for any voice-leading in Rn, with displacement multiset containing only elements ≤ 6, there is a corresponding voice-leading in Rn/(Sn × 12Zn).

Let E be a chord that divides pitch-class space into n equal parts. Since E is invariant under transposition by 12/n semitones, there will be a voice-leading between chords congruent to E and Tx(E) of the form

(e1, e2, …, en)→(e1 + c, e2 + c, …, en + c), where c is any real number ≡12Z/n x (5)

(NB: c is congruent to x mod 12Z/n, not mod 12Z.) Choose c so that |c| is as small as possible. The displacement multiset corresponding to this voice-leading is {|c|, |c|, … , |c|}. The sum of the elements of this multiset is n|c|, where n|c| is the smallest positive real number such that nc ≡12Z nx. By the distribution constraint, this multiset is as small as any n-note multiset with the same or greater sum. Now consider any bijective voice-leading between representatives of two n-note transpositionally-equivalent chords A and Tx(A). Let ΣA refer to the sum of the components of A. Therefore,

S17

Σ(Tx(A) – A) ≡12Z nx

(6)

The real number $\Sigma(T_x(A) - A)$ is the sum of signed quantities; the sum of the absolute values of these quantities must therefore be greater than or equal to $n|c|$, where $n|c|$ is the smallest positive number such that $nc \equiv_{12\mathbb{Z}} nx$. Thus the elements of the displacement multiset associated with the voice-leading $A \to T_x(A)$ sum to at least $n|c|$. We conclude that this voice-leading can be no smaller than the minimal voice-leading between $E$ and $T_x(E)$.

There is a useful corollary to Theorem 3 that applies in the discrete case.

COROLLARY. Let $E_k$ (the “chromatic scale”) divide pitch-class space into $k > n$ equal parts, let $A$ be any $n$-note subset of $E_k$, and let $M$ be the “maximally even” $n$-note subset of $E_k$ (S8). Then, for any integer $i$, the minimal bijective voice-leading between $A$ and $T_{12i/k}(A)$ can be no smaller than the minimal bijective voice-leading between $M$ and $T_{12i/k}(M)$.

The proof follows the same basic outline as the proof of Theorem 3. We rely on the fact that $M$ divides any number of octaves into nearly even parts: given $M = (m_0, m_1, \ldots, m_{n-1})$ and some constant integer $c$, the distances $|m_{c+i} - m_i|_{12\mathbb{Z}}$ (subscript arithmetic mod $n$) come in “consecutive integer sizes” when measured in units of $12/k$ (S8). That is, for every integer $c$ there exists an integer $j$ such that the distances $|m_{c+i} - m_i|_{12\mathbb{Z}}$ are equal to $12j/k$ and $12(j+1)/k$. This allows us to find a voice-leading $M \to T_{12i/k}(M)$ that is as small as possible for $n$-note subsets of $E_k$. As before, we use the “cyclical” component of the voice-leading, $m_i \to m_{c+i}$, to neutralize the “transpositional” component, $m_i \to m_i + x$.

Now for the formalities. By the argument given above, the minimal voice-leading $A \to T_{12i/k}(A)$ has a displacement multiset whose sum is at least $n|c|$, where $n|c|$ is the smallest positive number such that $nc \equiv_{12\mathbb{Z}} 12in/k$. What needs to be shown is that there is a voice-leading $M \to T_{12i/k}(M)$, with a displacement multiset summing to $n|c|$, whose values are as evenly distributed as possible. Since our voice-leadings are required to connect subsets of $E_k$, we can establish maximally-even distribution by showing that the elements of the displacement multiset take on just two distinct values: $12r/k$ and $12(r+1)/k$, where $r$ is some nonnegative integer.

Let $(m_0, m_1, \ldots, m_{n-1})$ order the elements of $M$ in ascending numerical order; form the infinite sequence

$$S = \{ m_{(j \bmod n)} + 12\lfloor j/n \rfloor \}_{j=-\infty}^{\infty}.$$

(Again, “$\lfloor x \rfloor$” refers to the greatest integer $\le x$.) $S$ consists of all of the elements of $\mathbb{R}$ congruent mod $12\mathbb{Z}$ to elements of $M$. This sequence is ordered in ascending numerical order and indexed such that $S_{-1} = m_{n-1} - 12$, $S_0 = m_0$, $S_1 = m_1$, and so on. The voice-leadings

$$(m_0, m_1, \ldots, m_{n-1}) \to (S_a + x, S_{a+1} + x, \ldots, S_{a+n-1} + x) \tag{7}$$

are voice-leadings between chords congruent to $M$ and $T_x(M)$. The following music-theoretical facts are well known:

1. The (real-valued) sum of the components of $(S_a - m_0, S_{a+1} - m_1, \ldots, S_{a+n-1} - m_{n-1})$ is equal to $12a$ (S9).
2. The elements of this $n$-tuple will either be constant, or have two distinct values: $12r/k$ and $12(r+1)/k$, where $r$ is some integer (S8).

From these two facts, it follows that we can find a voice-leading $M \to T_x(M)$ corresponding to the $n$-tuple

$$(S_a + x - m_0, S_{a+1} + x - m_1, \ldots, S_{a+n-1} + x - m_{n-1}) \tag{8}$$

with elements summing to $nc$, where $n|c|$ is the smallest positive number such that $nc \equiv_{12\mathbb{Z}} nx$. When $x$ and $S_{a+i} - m_i$ are both integer multiples of $12/k$, the values of this $n$-tuple are either constant or can be expressed in the form $12r/k$ and $12(r+1)/k$, where $r$ is some integer. These values will either be all nonnegative or all nonpositive. The sum of the elements of this voice-leading’s displacement multiset will therefore be $n|c|$. The displacement multiset will contain just two distinct values, $12|r|/k$ and $12|r+1|/k$. This implies that the displacement multiset is as evenly distributed as possible, given the hypothesis that the voice-leading connects subsets of $E_k$.
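The construction in the proof of the corollary can be checked on small cases. The sketch below works in integer chromatic steps (one step = 12/k semitones, so an octave is k steps). It assumes that the maximally even set is built with the floor formula familiar from the maximally-even literature (S8); that formula is not spelled out in this section, and the names `maximally_even`, `unrolled`, `rotation_tuple`, `best_rotation`, and `lower_bound` are hypothetical.

```python
def maximally_even(n, k):
    """An n-note maximally even subset of a k-step scale (assumed floor construction)."""
    return [(k * j) // n for j in range(n)]


def unrolled(m, j, k):
    """S_j: the elements of M repeated in every octave, indexed so that S_0 = m_0."""
    n = len(m)
    return m[j % n] + k * (j // n)  # Python's // and % floor correctly for j < 0


def rotation_tuple(m, a, x, k):
    """The n-tuple (S_{a+i} + x - m_i), measured in chromatic steps."""
    return [unrolled(m, a + i, k) + x - m[i] for i in range(len(m))]


def best_rotation(m, x, k):
    """Search the rotations a for the smallest total displacement (0 <= x < k)."""
    n = len(m)
    candidates = [rotation_tuple(m, a, x, k) for a in range(-n, n + 1)]
    return min(candidates, key=lambda t: sum(abs(d) for d in t))


def lower_bound(n, x, k):
    """Smallest |s| with s congruent to n*x mod k: the bound n|c|, in steps."""
    r = (n * x) % k
    return min(r, k - r)


if __name__ == "__main__":
    k, n = 12, 7
    m = maximally_even(n, k)            # a diatonic collection
    for x in range(k):
        t = best_rotation(m, x, k)
        print(x, t, sum(abs(d) for d in t), lower_bound(n, x, k))
```

With k = 12 and n = 7 the set is a diatonic collection, and for every transposition the best rotation meets the bound while using at most two adjacent displacement values.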


NOTES
S1. D. Lewin, Journal of Music Theory 42, 15 (1998).
S2. R. Cohn, Journal of Music Theory 42, 283 (1998).
S3. J. Straus, Music Theory Spectrum 25, 305 (2003).
S4. C. Callender, Music Theory Online 10.3 (2004).
S5. R. Cohn, Journal of Music Theory 41, 1 (1997).
S6. J. Douthett, P. Steinbach, Journal of Music Theory 42, 241 (1998).
S7. R. Gauldin, Harmonic Practice in Tonal Music (Norton, New York, 1997).
S8. J. Clough, J. Douthett, Journal of Music Theory 35, 93 (1991).
S9. J. Clough, G. Myerson, Journal of Music Theory 29, 249 (1985).


| SYMBOL OR TERM | DEFINITION |
| --- | --- |
| multiset | A set in which duplications are permitted. Like sets, multisets are unordered. |
| $\{a, b, c\}$ | A multiset with elements $a$, $b$, $c$. |
| $(a, b, c)$ | An ordered list. $(a, b, c)$ and $(b, c, a)$ are not the same. |
| $\lfloor x \rfloor$ | The greatest integer $\le x$. |
| $\mathbb{R}$ | The real numbers. |
| $\mathbb{Z}$ | The integers. |
| $n\mathbb{Z}$, where $n$ is a real number | The set $\{ni \mid i \in \mathbb{Z}\}$. Thus $12\mathbb{Z}$ is the set $\{\ldots, -24, -12, 0, 12, 24, \ldots\}$, whose elements form a group under addition. |
| $m\mathbb{Z}^n$, where $m$ is real and $n$ is an integer | The set of ordered $n$-tuples $(x_1, x_2, \ldots, x_n)$ such that each $x_i \in m\mathbb{Z}$. This set forms a group under vector addition. |
| $A/G$, where $G$ is some group of transformations acting on the elements of $A$ | The quotient space that identifies all points $a$ and $ga$, where $a \in A$ and $g \in G$. |
| $\mathbb{R}/12\mathbb{Z}$ | The circular quotient space in which all real numbers $x$ and $x + 12$ have been identified. The group $12\mathbb{Z}$ acts by ordinary addition, so that every point $x$ has orbit $\{\ldots, x - 36, x - 24, x - 12, x, x + 12, x + 24, x + 36, \ldots\}$. |
| $a \equiv_{n\mathbb{Z}} b$ | Pitch class $a$ is congruent to $b$ mod $n\mathbb{Z}$. Thus there exists an integer $c$ such that $a = b + cn$. |
| $\lvert a \rvert_{12\mathbb{Z}}$ | The norm of a pitch-class $a$: the smallest real number $\lvert x \rvert$ such that $x \equiv_{12\mathbb{Z}} a$. |
| $(a_1, a_2, \ldots, a_n) \equiv_{12\mathbb{Z}} (b_1, b_2, \ldots, b_n)$ | For all $i$, $a_i \equiv_{12\mathbb{Z}} b_i$. |
| $T^n$ | The $n$-torus, or product of $n$ circles. Since $\mathbb{R}/12\mathbb{Z}$ is a circle, $T^n$ can also be written $(\mathbb{R}/12\mathbb{Z})^n$. |
| $S_n$ | The “symmetric group” consisting of all the distinct permutations of $n$ objects. |

Table S1. A glossary of mathematical terms and symbols used in the article.
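Two of the glossary entries, congruence mod nZ and the pitch-class norm, are easy to state as code. The following is a small sketch with hypothetical function names; the floating-point tolerance is an implementation assumption and plays no role in the definitions themselves.

```python
import math


def congruent(a, b, n=12.0, tol=1e-9):
    """True when a and b differ by an integer multiple of n."""
    r = (a - b) % n
    return math.isclose(r, 0.0, abs_tol=tol) or math.isclose(r, n, abs_tol=tol)


def congruent_tuples(xs, ys, n=12.0):
    """Componentwise congruence of two ordered tuples of equal length."""
    return len(xs) == len(ys) and all(congruent(x, y, n) for x, y in zip(xs, ys))


def norm12(a):
    """Smallest |x| such that x is congruent to a mod 12Z."""
    r = a % 12
    return min(r, 12 - r)


if __name__ == "__main__":
    print(congruent(-2.5, 9.5))                          # scalar example from the text
    print(congruent_tuples((0, 4, 7), (-12, 4, 19)))     # tuple example from the text
    print(norm12(9.5), norm12(7))
```

The two printed congruences reuse the examples given in the proof of Theorem 3.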

| SYMBOL OR TERM | DEFINITION |
| --- | --- |
| pitch | Pitch is a fundamental attribute of musical notes. Pitches are typically represented by real numbers such that middle C is 60, the octave has length 12, and semitones have size 1. |
| pitch-class | An equivalence class of pitches, consisting of all pitches separated by an integral number of octaves. A220 and A440 both are instances of the same pitch-class A. Pitch-classes can be represented by elements of the quotient space $\mathbb{R}/12\mathbb{Z}$. |
| chord | A multiset of pitch-classes. It is also possible to consider chords of pitches, which are simply multisets of real numbers. |
| transposition | Translation in pitch or pitch-class space. In both pitch and pitch-class space, transposition corresponds to addition by a constant value. If $a$ is a pitch or pitch-class then $a + x$ is the transposition of $a$ by $x$ semitones. |
| $T_x(A)$ | The transposition of the chord $A$ by $x$ semitones. |
| inversion | Reflection in pitch or pitch-class space. In both pitch and pitch-class space, inversion corresponds to subtraction from a constant value. If $a$ is a pitch or pitch-class, then $x - a$ is an inversion of $a$. The quantity “$x$” is called the index number of the inversion. |
| $I_x(A)$ | The inversion of chord $A$ with index number $x$. |
| voice-leading | A voice-leading between two multisets $\{a_1, a_2, \ldots, a_m\}$ and $\{b_1, b_2, \ldots, b_n\}$ is a multiset of ordered pairs $(a_i, b_j)$, such that every element of each chord is in some pair. |
| trivial voice-leading | A trivial voice-leading contains only pairs of the form $(x, x)$. |

Table S2. A glossary of musical terms and symbols used in the article.
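The musical definitions in the table translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, chords are given as lists of pitch-class numbers, and the voice-leading check looks only at distinct pitch-classes, which simplifies the multiset definition when a chord contains doublings.

```python
def transpose(chord, x):
    """T_x(A): add x to every pitch-class of the chord."""
    return sorted((a + x) % 12 for a in chord)


def invert(chord, x):
    """I_x(A): subtract every pitch-class of the chord from the index number x."""
    return sorted((x - a) % 12 for a in chord)


def is_voice_leading(pairs, chord_a, chord_b):
    """Every element of each chord must appear in some pair (a, b)."""
    sources = {a for a, _ in pairs}
    targets = {b for _, b in pairs}
    return sources == set(chord_a) and targets == set(chord_b)


def is_trivial(pairs):
    """A trivial voice-leading contains only pairs of the form (x, x)."""
    return all(a == b for a, b in pairs)


if __name__ == "__main__":
    c_major = [0, 4, 7]
    print(transpose(c_major, 7))     # G major
    print(invert(c_major, 7))        # C minor
    pairs = [(0, 0), (4, 3), (7, 7)]
    print(is_voice_leading(pairs, c_major, invert(c_major, 7)), is_trivial(pairs))
```

Note that the definition does not require a bijection: a pair list such as (0, 0), (4, 4), (7, 7), (7, 11) is a voice-leading from a three-note chord to a four-note chord.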

Figure S1. The circle of fifths can be interpreted as depicting minimal voice-leadings between diatonic collections (major scales). Each diatonic collection can be transformed into its neighbors by voice-leading in which one pitch-class moves by semitone. For example, the C major scale, containing pitch-classes 0, 2, 4, 5, 7, 9, and 11 (= e) can be transformed into the G major scale (containing pitch-classes 0, 2, 4, 6, 7, 9, and 11) by moving the pitch-class 5 (F) to 6 (Fs). Here as elsewhere, the letters “t” and “e” refer to the numbers 10 and 11, respectively.
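The claim in the caption can be verified mechanically for all twelve major scales. The following is a minimal sketch with hypothetical names; it steps around the circle by transposing up a fifth (seven semitones) and records which single pitch-class moves at each step.

```python
C_MAJOR = frozenset({0, 2, 4, 5, 7, 9, 11})


def transpose(scale, x):
    """Transpose a pitch-class set by x semitones."""
    return frozenset((p + x) % 12 for p in scale)


def single_semitone_move(a, b):
    """Return (old, new) if b is a with one pitch-class moved by a semitone, else None."""
    removed, added = a - b, b - a
    if len(removed) == 1 and len(added) == 1:
        (old,), (new,) = removed, added
        if (new - old) % 12 in (1, 11):
            return old, new
    return None


def circle_of_fifths_moves():
    """The single-semitone move linking each major scale to the next one up a fifth."""
    moves = []
    scale = C_MAJOR
    for _ in range(12):
        following = transpose(scale, 7)
        moves.append(single_semitone_move(scale, following))
        scale = following
    return moves


if __name__ == "__main__":
    print(circle_of_fifths_moves())
```

The first move is 5 to 6 (F to Fs), matching the C major to G major example in the caption; each later step raises the corresponding scale degree of the next collection.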

Figure S2. The Tonnetz. Nineteenth-century theorists such as Hostinsky, Oettingen, and Riemann explored a geometrical figure that is the “geometrical dual” of the one shown here. The graph displays efficient voice-leadings among the 24 familiar major and minor triads. Triads connected by horizontal lines share both “root” and “fifth,” and can be connected by voice-leading in which one note moves by one semitone. (For example, the C-major triad can be transformed into a C-minor triad by changing E to Ef.) Triads along the NE/SW diagonal also share two notes and can be connected by single-semitone voice-leading. (For example, the C-major triad can be transformed into an E-minor triad by changing C to B.) Triads along a NW/SE diagonal share two notes and can be connected by voice-leading in which one note moves by two semitones. (For example, the C-major triad can be transformed into an A-minor triad by changing G to A.) Topologically, the figure is a 2-torus.
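The three examples in the caption can be reproduced with a few set operations. This is an illustrative sketch with hypothetical function names; triads are built from a root using the usual interval patterns (4 and 7 semitones above the root for major, 3 and 7 for minor).

```python
def norm12(x):
    """Pitch-class distance: how far x is from the nearest multiple of 12."""
    r = x % 12
    return min(r, 12 - r)


def major(root):
    return frozenset({root % 12, (root + 4) % 12, (root + 7) % 12})


def minor(root):
    return frozenset({root % 12, (root + 3) % 12, (root + 7) % 12})


def single_note_move(a, b):
    """If a and b share all but one note, return (old, new, distance); else None."""
    removed, added = a - b, b - a
    if len(removed) != 1 or len(added) != 1:
        return None
    (old,), (new,) = removed, added
    return old, new, norm12(new - old)


if __name__ == "__main__":
    c_major = major(0)
    print(single_note_move(c_major, minor(0)))   # horizontal neighbor: E to Ef
    print(single_note_move(c_major, minor(4)))   # NE/SW neighbor: C to B
    print(single_note_move(c_major, minor(9)))   # NW/SE neighbor: G to A
```

The horizontal and NE/SW neighbors differ by one semitone, and the NW/SE neighbor by two, as the caption states.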

Figure S3. Three types of voice-crossing, shown in panels a), b), and c).

Figure S4. Removing a crossing does not create new crossings (panels a and b).

Figure S5. Using dynamic programming to find minimal voice-leading. Rows are indexed by the notes of the first chord (4, 7, 11, 0, 4) and columns by the notes of the second chord (4, 8, 11, 3, 4). Each cell gives a voice-leading and its size.

| | 4 | 8 | 11 | 3 | 4 |
| --- | --- | --- | --- | --- | --- |
| 4 | (4)→(4); Size: 0 | (4, 4)→(4, 8); Size: 4 | (4, 4, 4)→(4, 8, 11); Size: 9 | (4, 4, 4, 4)→(4, 8, 11, 3); Size: 10 | (4, 4, 4, 4, 4)→(4, 8, 11, 3, 4); Size: 10 |
| 7 | (4, 7)→(4, 4); Size: 3 | (4, 7)→(4, 8); Size: 1 | (4, 7, 7)→(4, 8, 11); Size: 5 | (4, 7, 7, 7)→(4, 8, 11, 3); Size: 9 | (4, 7, 7, 7, 7)→(4, 8, 11, 3, 4); Size: 12 |
| 11 | (4, 7, 11)→(4, 4, 4); Size: 8 | (4, 7, 11)→(4, 8, 8); Size: 4 | (4, 7, 11)→(4, 8, 11); Size: 1 | (4, 7, 11, 11)→(4, 8, 11, 3); Size: 5 | (4, 7, 11, 11, 11)→(4, 8, 11, 3, 4); Size: 10 |
| 0 | (4, 7, 11, 0)→(4, 4, 4, 4); Size: 12 | (4, 7, 11, 0)→(4, 8, 8, 8); Size: 8 | (4, 7, 11, 0)→(4, 8, 11, 11); Size: 2 | (4, 7, 11, 0)→(4, 8, 11, 3); Size: 4 | (4, 7, 11, 0, 0)→(4, 8, 11, 3, 4); Size: 8 |
| 4 | (4, 7, 11, 0, 4)→(4, 4, 4, 4, 4); Size: 12 | (4, 7, 11, 0, 4)→(4, 8, 8, 8, 8); Size: 12 | (4, 7, 11, 0, 4)→(4, 8, 11, 11, 11); Size: 7 | (4, 7, 11, 0, 4)→(4, 8, 11, 11, 3); Size: 3 | (4, 7, 11, 0, 4, 4)→(4, 8, 11, 11, 3, 4); Size: 3 |
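The sizes in Figure S5 are consistent with a simple recurrence: each cell adds the pitch-class distance between its row note and column note to the smallest of the cells above, to the left, and diagonally above-left. The sketch below is a reconstruction inferred from the figure's numbers, not code from the article; it assumes that size is the sum of pitch-class distances, and the names `norm12` and `dp_sizes` are hypothetical.

```python
def norm12(x):
    """Pitch-class distance: how far x is from the nearest multiple of 12."""
    r = x % 12
    return min(r, 12 - r)


def dp_sizes(source, target):
    """Fill the table of minimal voice-leading sizes, row by row.

    size[i][j] is the smallest total displacement of a voice-leading from
    source[:i+1] to target[:j+1] that ends by pairing source[i] with target[j].
    """
    rows, cols = len(source), len(target)
    size = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            step = norm12(target[j] - source[i])
            previous = []
            if i > 0:
                previous.append(size[i - 1][j])      # source[i] joins target[j]
            if j > 0:
                previous.append(size[i][j - 1])      # target[j] joins source[i]
            if i > 0 and j > 0:
                previous.append(size[i - 1][j - 1])  # both advance together
            size[i][j] = step + (min(previous) if previous else 0)
    return size


if __name__ == "__main__":
    for row in dp_sizes([4, 7, 11, 0, 4], [4, 8, 11, 3, 4]):
        print(row)
```

Run on the chords of the figure, the sketch reproduces all twenty-five sizes, ending with the value 3 in the bottom-right cell.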

[Figure S6: a 12 × 12 grid of ordered dyads xy, with the first element x increasing from left to right (0 through e) and the second element y increasing from bottom to top. Bracketed entries along the top row and right-hand column repeat the bottom row and left-hand column. Points A and B mark the ends of the diagonal.]

Figure S6. Ordered dyad-space is a 2-torus. To identify points (a, b) and (b, a), we need to “fold” the torus along the AB diagonal. The result of this operation is shown in Figure S7.
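On the lattice of integer pitch-classes, the effect of the fold can be counted directly. The following sketch is illustrative only: it treats just the 144 lattice points shown in the figure, not the continuous torus, and the names are hypothetical.

```python
from itertools import product

PITCH_CLASSES = range(12)


def fold(dyad):
    """Identify (a, b) with (b, a) by putting the two notes in ascending order."""
    return tuple(sorted(dyad))


# All ordered dyads on the lattice, and their images under the fold.
ordered = list(product(PITCH_CLASSES, repeat=2))
unordered = {fold(d) for d in ordered}

# Dyads left unchanged by swapping their notes: these lie on the fold line.
fixed = [d for d in ordered if d == (d[1], d[0])]

if __name__ == "__main__":
    print(len(ordered), len(unordered), len(fixed))
```

The 144 ordered dyads collapse to 78 unordered ones; the 12 unisons (a, a) are the only points fixed by the swap, so they lie on the fold line.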

[Figure S7: a triangular array of the dyads xy with x ≤ y, with bracketed entries marking identified edge points and with the points A, B, C, and D labeled.]

Figure S7. The result of “folding” the 2-torus in Figure S6 along its diagonal AB. The resulting figure is a triangle with two of its sides identified, which is a Möbius strip. To transform Figure S7 into a more familiar representation of a Möbius strip, cut the figure along the line CD and glue AC to CB. (To make this identification in Euclidean 3-space, you will need to turn over one of the pieces of paper.) The result is a “square” with opposite sides identified, as in Figure 2 of the main paper.

Figure S8. The cyclical voice-leading $(a_0, a_1, a_2, a_3, a_4, a_5, a_6) \to (a_1, a_2, a_3, a_4, a_5, a_6, a_0)$ has displacement multiset $\{d_0, d_1, d_2, d_3, d_4, d_5, d_6 = W\}$. By moving at most three notes by $|x - d_i|$ semitones, we can make any of the $d_i = x$ without changing the other $d_n \ne W$. That is, to change $d_0$, we need only move $a_0$; to change $d_5$ we need only move $a_6$; to change $d_1$ we need only move $a_0$ and $a_1$; and so on. In the case of an arbitrary cyclical voice-leading, we never need to move more than half of a chord's notes by $|x - d_i|$ semitones to “fix” any interval.