High resolution musical branching applied to the early Final Fantasy battle-sequence

This chapter covers the branching musical engine and applies it to the case study of Final Fantasy VII.[1] It begins with a brief outline of some key developments in the music of video games and highlights some of the major composers. Again, some clarification of terminology is necessary before discussion continues on the stereotypical Japanese Role-Playing Game (JRPG/RPG) style. A detailed discussion follows of the standard structure of the music in the Final Fantasy battle-sequence, leading to the aesthetic concern this chapter addresses. Next is the main discussion of my solution to this issue, which illustrates the concepts of musical resolution, arch and capillary branches. Finally, I discuss some possible limitations of this approach and solutions to these limitations.

Video games enjoy a rich history considering their relative infancy. A thorough discussion is not possible here and has been provided by Collins, among others.[2] Only the first decade of video games (including Tennis for Two and Spacewar!) didn’t incorporate sound; the early 1970s saw the first games where sound effects had their genesis. One of the most influential games of this time was Pong, which Chatfield states ‘transformed the world’s relationship with computer technology’.[3] Pong had a short ‘boop’ sound effect when the ball (a square) hit the paddle (a slightly greyer rectangle). Since these early games appeared, a general trend has been the increasing realism of sound effects. Music wasn’t prevalent in games during the 1950s-1970s because of hardware capacity limits and the cumbersome methods required to program music. A technique of looping the music allowed it to be continuous and somewhat reactive to the player. Space Invaders is one of the most iconic examples of this looping technique and is one of the first games to have incorporated it.[4] Music was given a lower priority than sound effects: when a single chip needed to render both simultaneously, the music would be dropped while the sound effect remained. It wasn’t until multiple sound chips were available in games such as Alpine Ski and Jungle Hunt that the hardware environment became hospitable enough for music to develop.[5]

From the mid-1980s, as technology developed, video game music became more complex, and with increasing memory space the tracks could become longer. Two very influential figures of the first era of video-game music are Koji Kondo, composer of Super Mario Bros. and The Legend of Zelda, and Nobuo Uematsu, composer of the Final Fantasy series beginning with Final Fantasy.[6] Koji Kondo’s ‘ground theme’ for Super Mario Bros. is widely regarded as one of the most famous pieces in video-game music history, and his main theme from The Legend of Zelda has reached a similar status. Nobuo Uematsu’s music for the Final Fantasy series of games is also considered some of the most popular and well known of all video game music to date. It is not surprising that once looping music became common practice the first generation of game music composers would become the proverbial fathers of all video-game music composition.

The looping of long passages of music has been an entrenched technique since these first practices. Including the independent game development scene, a substantial portion of games at the time of writing still produce music with long looping tracks. Some examples include Starbound, Starcraft 2: Heart of the Swarm, and Bravely Default.[7]

Since the 1980s, hardware has improved vastly and has removed many limitations that hindered continuous music in the era of Kondo and Uematsu. Scripting or software solutions such as iMUSE (Interactive Music Streaming Engine, LucasArts, 1991) and FMod (Firelight Technologies) give audio designers and composers the freedom to create realistic interactive sound effects and adaptive music without having to be highly trained programmers. Many modern games include extremely advanced systems that accurately mimic the effects that virtual objects may have on the sounds produced in game.[8] Although I agree that sound effects are becoming increasingly interactive, I am not convinced that the same revolution is evident on the music side of game audio.

Dynamic audio production is a field that is still in its relative infancy and therefore presents much opportunity for new development and innovation. Dynamic audio encompasses two subdisciplines. The first is interactive audio – ‘sounds and events that react to the player’s direct input’; for example, a player pressing a button causes the player’s avatar to swing its sword, producing a ‘sword swing’ sound.[9] The second is adaptive audio – ‘sound that reacts to the game states, responding to various in-game parameters’; for example, when day turns to night, ‘day’ music ends and ‘night’ music begins.[10] The majority of interactive audio deals with instantaneous sound effects, whereas adaptive audio can be achieved by way of musical composition.

I will be looking at a scenario where the looping of long tracks (roughly twenty seconds or longer) of music is vital in creating a constant musical score for the game. This scenario is found in many Japanese Role-Playing Games (JRPGs) where the gameplay is split between story and battle modes. I will investigate the battle-sequences from Final Fantasy, widely regarded as one of the most popular and well known of all JRPG series. In analysing the stereotypical musical construction of these battle-sequences I aim to suggest an improvement by way of a working example using musical templates and MaxMSP engine prototyping. The musical template will fit the musical style of the Final Fantasy series so as to present the improvement in as typical a staging as possible. The study of this scenario is not diminished by the fact that the Final Fantasy series departed from this kind of battle system in 2006 with Final Fantasy XII; the same type of musical system can still be found in many titles such as Cthulhu Saves the World, Evoland, and most notably the Pokémon series, even right up to the most recent Pokémon X/Y released in October 2013, as well as other JRPGs to date.[11]

To be noted is the stereotypical formula of a Final Fantasy game and its music. Final Fantasy is known as an RPG or Role-Playing Game. In a Role-Playing Game the player will control one character or a group of characters and will direct the characters’ movements as they are taken through the narrative. World-exploration is a common feature of an open-world RPG like Final Fantasy; when the player directs the characters into points of the main scripted story (or main-quest) the narrative will progress and sometimes the whole explorable world will change. Explorable features include other quests (known as side-quests) and challenges that provide their own diverging narratives occurring as parallel storytelling. In Final Fantasy the main-quest is a linear narrative where the player will take on the roles of a team (or party) of characters. The player will explore towns, cities and the landscape, fly ships, sail boats, drive cars, interact with objects, and experience conversations between the played characters (PCs) and non-played characters (NPCs), which will guide the player through the narrative of the game.

In parallel with the narrative portion of the game is a battle-system. This is a mode of gameplay whereby two opponents duel in head-to-head physical and magical combat. The player’s team will square off against, and strategically attempt to destroy, all the enemies before being defeated. Each battle is a small puzzle that can be completed through a series of correct choices. Though there are many ways to win there are also many ways to be defeated. In the majority of the Final Fantasy series these battles take place in real-time.

Appropriate terminology for the narrative portion of Final Fantasy is difficult to achieve. By definition it is non-battle; however, this does not adequately reflect the richness of gameplay experienced outside the battle. Colloquially, the term overworld is used within the gaming community to distinguish the place where narrative and exploration take place from other types of gameplay. Although this term has been appropriated from games such as the Legend of Zelda series, where the player’s avatar will actually walk downstairs to a region beneath the world, thus making the contrast between underworld and overworld a literal one, the term is still applied in most RPGs to mean an area that interconnects all of its levels, puzzles or locations. In this paper the term overworld will refer to all non-battle scenes, or music relating to those scenes, unless otherwise stated.

The total gameplay within the early Final Fantasy games will be made up of a cycle between battle and overworld. For the reader unfamiliar with this overworld-battle-overworld cycle I have prepared a video demonstration below (See Final Fantasy VII demonstration video). This video shows the visual and audio transition between the overworld and the battle in Final Fantasy VII as well as an entire battle sequence.[12] When visually juxtaposed, the difference between overworld and battle is striking.

Table 1 shows six images comparing the visual representation of the overworld and the battle-system across multiple games in the series, specifically Final Fantasy VII, Final Fantasy VIII and Final Fantasy IX.[13] The reader can also observe this formula in the video example (See Final Fantasy VII demonstration video). Complementing these two modes of gameplay are two aspects of sound, which similarly use looping music with differing structures. Though the main portion of this chapter will focus on the battle-system’s music, the proposed improvements are mappable onto any situation where a musical transition takes place.

Table 1: Final Fantasy VII, VIII and IX Overworld and Battle-sequence – Visual style separation.

FFVII Overworld – Cloud solo. Other characters appear if required by the narrative.

FFVII Battle – Cloud and allies square off against enemies.

FFVIII Overworld – Squall solo. Other characters appear if required by the narrative.

FFVIII Battle – Squall and allies square off against enemies.

FFIX Overworld – Zidane solo. Other characters appear if required by the narrative.

FFIX Battle – Zidane and allies square off against enemies.

The constraints of this paper only allow brief discussion of the music traditionally existing in the overworld portion of the game. Music is scored entirely (with one exception explained below) with long loops connected to areas or situations; for example: music for a named area (e.g. Cosmo Canyon/Wutai), music for an event (e.g. a chase/escape). Leitmotif is used to attach musical motifs to particular characters, with the development of these themes largely only taking place during set video sequences or cut scenes – effectively short films – at dramatic moments during the narrative. Cut scenes provide a visual, and interactive, exception to the majority of the whole game experience. Complementing this exception is the film-like scoring of the music. During cut scenes the visuals span a fixed time, and so too does the music. As music for these scenes is non-dynamic it will not be discussed in this paper. Table 1 shows the visual formulae used across the seventh, eighth and ninth games in the series. The reader can see that in each of the images on the left side there is a single character on a 3D overworld terrain with a map in the bottom right corner. On the right side of the table is the image from within a battle where the player’s party fights a party of enemies. There are menus at the bottom of the screen as well as weapons in the hands of the characters. A similar formulaic approach is also used in the music, where a particular style and execution have been consistent across all the RPGs in the series until FFXII.[14]

In discussion of the music of the Final Fantasy battle mode, I wish to consider the structure and the aesthetic consequences of having dynamic battle sequences scored with non-dynamic musical sequences. As in the overworld, where there is music for specific types of events or specific types of place, so too is there music for the event of battle, which possesses a similar style throughout the series. It is often in a quick tempo, incorporates an irregular grouping of quavers (or semiquavers) in 4/4 metre (for example, 3+3+2 is prevalent), and is scored for either acoustic/electric instruments or, in the case of the earlier games, synthesised versions of these instruments. The structure is consistent and includes three sections: section A, also the introduction; section B, containing the bulk of the music, which loops to maintain continuous music for a battle of any duration; and a short ending (section C), a victory music that aesthetically transitions between the visual battlefield and the post-battle analysis. The game then moves back into the overworld (see Final Fantasy VII demonstration video). Figure 1 shows a structure diagram of the visual cues and the accompanying music with the transition and sectional markings.

Figure 1 – Structure and transitions in a Final Fantasy Battle Sequence.

Entering battle causes a visual transition from the overworld: in the case of the examples in Table 1, a twisting of the screen in Final Fantasy VII, a left-to-right oversaturated smudging in Final Fantasy VIII, and a virtual shattering of the screen in Final Fantasy IX, all revealing the battle underneath. The introduction music, coupled with the visual cues, will always start abruptly and overpower the overworld music by way of pounding rhythmic urgency and a series-regular bass line. The looping section comprises the main bulk of compositional material and is composed with its repetition as a primary compositional feature. It incorporates subsections so as to avoid the monotony of short (less than twenty seconds) looping passages. Once the player defeats all the enemies, the battle ends. The characters then perform a victory animation (for example the sheathing of a sword or sighing in relief) and the player hears the victory music. The music from the looping section is truncated and taken over by the victory music.

Herein lies an aesthetic issue. The winning of the battle, and therefore the playing of the victory music, is important to the player’s immersion and contributes to the full understanding that the battle has been successful; however, the musical transition is abrupt and has the potential to disrupt immersion. Michiel Kamp notes the same disruption in his description of the music appearing in Super Mario Bros., which displays the same structure as the music in the Final Fantasy battle mode. He describes that once the introduction has played the music ‘proceeds to loop through a series of melodies until the player finishes the level or Mario dies, at which point a coda is played and the music stops abruptly’.[15] His opinion of these transitions as ‘abrupt’ aligns with my own.

Two musical solutions are available to maintain appropriate musical accompaniment and player immersion: ending the section of music, or compositionally linking the looping music to the victory music. In the current state the aesthetic value of the existing music is lost between these sections because it does not continuously accompany the visual. Although some attempts at appropriate musical transition have been made in other games (such as The Legend of Zelda: Skyward Sword), in my opinion nothing has as yet been successful enough to accompany changes in game state that happen at the speed they do in modern game scenarios.[16]

As the structure shown in Figure 1 occurs in the majority of the Final Fantasy series I will be referring to each transition in the plural form. Currently the transitional sections (w, X, Y and z) succeed to different degrees as transitional pieces of linear composition. The musical transitions from the overworld to the battle introduction (w transitions) are abrupt and aggressive. Though it could be argued that w transitions are too abrupt and too aggressive, this musical approach captures the essence of the unforeseen battle (the random battle) and mirrors the abruptness of the visual transition. If w transitions need attention then the transitions from section A to section B (X transitions) need less; these two segments are scored together, and therefore designed to work next to each other. On this basis the only transitions needing improvement are the transitions between section B and section C (Y transitions). As has been explained above, this is a disjointed transition with no conceptual excuse for its abruptness of the kind that may excuse w transitions.

What I propose for Y transitions is the composition of potential-musics. The current system of battle music is linear: the music travels from beginning to end, regardless of any need for divergence at specific player-controlled moments.[17] The potential-musics system would produce many branching lines from the main body of music to arrive at the next musical point required by the game. Similar to the way in which capillaries transport blood from arteries to many different locations in the body, and then return via the veins, so too would the music branch away from the main artery of the score and continue on to a new artery through musical branches acting as transitional capillaries. The main composition of the music would take place in the arteries (hereon archbranches) and different game states will have different compositional archbranches associated with them. When a musical transition is required, the appropriate capillary branch will be selected by the program and be played next. The important difference between the linear form and the branching form is that, from the point at which the music diverges from one archbranch to the point at which it enters another, the musical components remain aesthetically consistent, being neither disjunctive nor discordant. Many musical passages would be available to either link two sections together, change mood, end a section or take the place of a crossfade. This will create a fully scored musical system that reacts to the player’s actions.
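The relationship between archbranches and capillary branches can be illustrated with a minimal Python sketch. It is not the MaxMSP patch discussed later in this chapter; the class `Archbranch`, the function `next_segments` and the segment labels (`B1`, `B2_to_victory` and so on) are hypothetical names introduced purely for illustration, and the sketch assumes one capillary per bar and per target archbranch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Archbranch:
    """A main body of score tied to one game state (hypothetical model)."""
    name: str
    bars: List[str]  # one audio segment label per bar
    # (bar_index, target archbranch name) -> capillary segments to play
    capillaries: Dict[Tuple[int, str], List[str]] = field(default_factory=dict)


def next_segments(current: Archbranch, bar_index: int,
                  target: Optional[str]) -> List[str]:
    """Return what to play once the bar at bar_index has finished.

    With no pending transition the archbranch simply continues and loops.
    With a pending transition, the capillary composed for this position
    and this target is played instead of an abrupt cut.
    """
    if target is None:
        following = (bar_index + 1) % len(current.bars)
        return [current.bars[following]]
    return current.capillaries[(bar_index, target)]


# Assumed example data: a four-bar battle loop with one capillary per bar.
battle_loop = Archbranch(
    name="battle_loop",
    bars=["B1", "B2", "B3", "B4"],
    capillaries={(i, "victory"): [f"B{i + 1}_to_victory"] for i in range(4)},
)

print(next_segments(battle_loop, 3, None))       # the loop wraps to its first bar
print(next_segments(battle_loop, 1, "victory"))  # branch toward the victory music
```

Keying the capillaries by both position and target reflects the idea that the transitional material depends on where the music diverges as well as on where it must arrive.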

This particular model of dynamic music can be described using a Markov Chain. The reader is directed to Charles Ames’s paper on ‘The Markov Process as a Compositional Model’ for a detailed explanation.[18] Figure 2 shows a Markov chain representing the current compositional linearity of the music in the Final Fantasy battle system. The reader can clearly see that the Y transition shows a break in musical composition.

Figure 2 – Markov Chain showing the linearity of composition during the Final Fantasy battle.
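The linear structure described above can be sketched as a small transition table. The following Python is only an illustration: the state names, the functions `linear_chain` and `walk`, and the per-pass probability `p_battle_ends` are assumptions introduced here. In the game itself the Y transition is triggered by the player winning the battle rather than by chance, so the probability merely stands in for that unpredictable moment.

```python
import random


def linear_chain(p_battle_ends):
    """Transition probabilities for the linear battle music (assumed model)."""
    return {
        "overworld": {"A_intro": 1.0},                 # w transition
        "A_intro": {"B_loop": 1.0},                    # X transition
        "B_loop": {
            "B_loop": 1.0 - p_battle_ends,             # the loop repeats
            "C_victory": p_battle_ends,                # Y transition (a cut)
        },
        "C_victory": {"overworld": 1.0},               # z transition
    }


def walk(chain, start, stop, rng):
    """Sample one path through the chain, ending on the first arrival at stop."""
    path = [start]
    while True:
        options = chain[path[-1]]
        names = list(options)
        weights = list(options.values())
        path.append(rng.choices(names, weights=weights)[0])
        if path[-1] == stop:
            return path


chain = linear_chain(p_battle_ends=0.25)
print(walk(chain, "overworld", "overworld", random.Random(0)))
```

Every sampled path has the same shape: introduction, some number of loop passes, then the victory music. Nothing in the table links a position inside the loop to the victory music, which is the break in composition at the Y transition.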

Two types of work must be undertaken to create a branching-music system. The first is the musical composition of archbranches and capillary branches; this includes conceptualisation and realisation of many different fragments of score contributing to an aesthetic whole. The second is the programming of a system that, while playing a composition, understands its own location within that composition and has the ability to make the appropriate choice of which branching capillary to use during archbranch transitions. Applying a visual concept to music allows the programmer to address this task. To see an image on a computer screen many individual pixels must work in conjunction to create the illusion of an unbroken image when displayed synchronously. By analogy, each linear score can be broken down into many smaller segments to become, in essence, musically pixelated. When these musical pixels are chained together, the illusion of a fully composed piece is perceived. This pixelation technique can be used by the programmer to create a tagging system.

This is not a new concept: both iMUSE and FMod, mentioned in the previous chapter, allowed the composer to mark locations in the score for the system to evaluate the game state and make appropriate musical choices based on this. Creating a unique tag for each pixel allows the program to make accurate diverging choices when a transition is required. As the number of pixels working together in a computer screen increases, the resolution and definition of an image also increase and allow the viewer to perceive the objects onscreen with more detail and with greater fidelity. Similarly, as musical-resolution increases, the points at which the music is able to diverge (node points) from its current archbranch get closer together in a relative temporal domain. A musical pixel can therefore be defined as the distance between two nodes or diverging points. Where my proposed system improves on that of iMUSE and suggests further use for composers using FMod is in the scale of the resolution.
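The tagging idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a description of iMUSE, FMod or the prototype patch: the functions `tag_pixels` and `current_pixel`, the tag format and the dictionary keys are all hypothetical, and positions are measured in bars for simplicity.

```python
import math


def tag_pixels(branch, total_bars, bars_per_pixel):
    """Split an archbranch into musical pixels, each with a unique tag.

    Every pixel boundary is a node: a point at which the music may diverge.
    """
    count = math.ceil(total_bars / bars_per_pixel)
    return [
        {
            "tag": f"{branch}:{i}",
            "start_bar": i * bars_per_pixel,
            "end_bar": min((i + 1) * bars_per_pixel, total_bars),
        }
        for i in range(count)
    ]


def current_pixel(pixels, bar_position):
    """Return the pixel that contains the playback position (in bars)."""
    for pixel in pixels:
        if pixel["start_bar"] <= bar_position < pixel["end_bar"]:
            return pixel
    raise ValueError("position lies outside the archbranch")


# The same sixteen-bar archbranch at two musical-resolutions.
fine = tag_pixels("battle_loop", 16, 1)
coarse = tag_pixels("battle_loop", 16, 4)

# Playback is halfway through the sixth bar (position 5.5).
print(current_pixel(fine, 5.5))    # next node arrives at bar 6
print(current_pixel(coarse, 5.5))  # next node arrives at bar 8
```

At the finer resolution the program can diverge half a bar after the request; at the coarser resolution it must wait two and a half bars for the next node.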

The early Final Fantasy battle system has a low (zero) musical-resolution; there are no places at which it can diverge. Therefore, when the game triggers a divergence, the music simply skips to the new music without regard for aesthetic continuity of the musical line or a musical catharsis. The reader is directed toward the Demonstrations_Application which illustrates this.[19] Figure 3 shows a Markov Chain diagram of this paper’s proposal for an improvement to this system by way of dynamic musical branching. It is feasible that capillaries could be composed to link more than two archbranches together.

Figure 3 – Markov Chain of proposed archbranch and capillary system.

Using either extreme of the musical-resolution spectrum has important implications. The primary trade-off is between maintaining the aesthetic goal of the project and avoiding a large computational processing load. A low musical-resolution, where nodes are spaced a large distance apart, will not create a substantial hardware load and will produce dynamic music that follows the player’s actions to some extent. However, a low musical-resolution may not be enough to produce dynamic music that reacts with enough agility to the changing game states. A high resolution, where nodes are close together, will create a music that has diverging potential at temporally closer points and thus will follow player actions more accurately. There exists a point of perfectly high musical-resolution, where nodes are so close together that the music has diverging potential at any definable point and therefore will follow player actions precisely. However, such a high resolution would place an unnecessary tax on the hardware and would require a large amount of random access memory (RAM) to execute at the required speed. Further, the volume of score that would need to be composed to fill all capillary branches of a perfect-resolution engine would be staggering. A perfect resolution would require many times more score than will ever be played by the system just to accommodate the potential for all possible divergences. The economic and technological constraints of a perfect-resolution branching engine easily outweigh the benefits of this ideal. There are two solutions to this problem. The first is to streamline the compositional process by, for example, making many capillary branches involved in an individual transition have similar (or the same) music (this has been done in Tab 2 inside the Demonstrations_Application). The second is simply to lower the musical resolution to an ideal compromise: a point where the moment of divergence is not noticed by the player and where the composer (or team of composers) is able to create the required quantity of score in the allotted economically viable production time. In consideration of the latter, it is important to note that maximum player input speed is far slower than a computer’s ability to react to the inputs using this branching technique.
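The trade-off can be made concrete with some rough arithmetic. The Python below is a sketch under stated assumptions: it supposes that every node needs one capillary per target archbranch, and the figures used (a 32-bar loop in 4/4 at 160 beats per minute with a single target) are illustrative values chosen here, not measurements from any Final Fantasy score. The function name `resolution_cost` is likewise hypothetical.

```python
import math


def resolution_cost(loop_bars, pixel_bars, targets, bpm, beats_per_bar=4):
    """Estimate compositional cost and reaction delay for one archbranch.

    Assumes one capillary per node and per target archbranch.
    """
    nodes = math.ceil(loop_bars / pixel_bars)
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    return {
        "nodes": nodes,
        "capillaries_to_compose": nodes * targets,
        # Longest wait between a game event and the next node.
        "worst_case_wait_s": pixel_bars * seconds_per_bar,
    }


# Three pixel sizes: eight bars, one bar, and one semiquaver (1/16 of a bar).
for pixel_bars in (8, 1, 1 / 16):
    print(pixel_bars, resolution_cost(32, pixel_bars, targets=1, bpm=160))
```

Under these assumptions an eight-bar pixel needs only 4 capillaries but can leave the player waiting up to 12 seconds, a one-bar pixel needs 32 capillaries with a wait of at most 1.5 seconds, and a semiquaver pixel needs 512 capillaries to bring the wait under a tenth of a second. The quantity of score grows in direct proportion to the resolution, which is why a compromise is needed.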

Using MaxMSP I have built a prototype patch with an average pixel size of one bar.[20] The reasons for this choice of pixel size were threefold. First, the program would have enough resolution to present a convincing branching music system without taxing the hardware that was personally available. Second, it created a greater degree of musical control: choosing a pixel size that links to an intrinsically musical feature (the bar line) meant I could manipulate the music occupying the pixels to have similarities in rhythm, harmony and melody, which the listener will perceive as a piece of music. Third, this eased the process of teaching the program which pixel of the whole score it was playing. Tab 2 also demonstrates that average hardware capabilities are ample for a resolution of one bar. The particular hardware system used to build and demonstrate Tab 2 would not be overburdened by increasing the musical-resolution.[21] Tab 2 demonstrates an important first stage for the implementation of this into a live video-game scenario.

A game scored using a full complement of branching-music would produce a more reactive and visually complementary sonic experience for the gamer and create the feeling that the score is being produced on-the-fly by the program. This is only possible when consideration is given to the conception and composition of all archbranches and capillary branches, an appropriate musical-resolution is chosen for the game state, and all this is incorporated into a highly advanced sample-selection program, such as Tab 2.

[1] Final Fantasy VII, Playstation Game, Squaresoft, Japan, 1997.

[2] For a thorough discussion on this the reader is directed to chapters 2-4 of Collins, Game Sound; and for a briefer account the second chapter of Chatfield, Fun Inc. Collins gives an extremely thorough picture of the first few generations of game hardware before and after the home console, including their technology, market effects, and crucially the changes made in sound. Also see M Fritsch, ‘History of Video Game Music’, in P Moormann (ed.), Music and Game: Perspectives on a Popular Alliance, Springer, 2013, pp. 11-41.

[3] Chatfield, p. 19; and Tennis for Two, Donner Model 30 analog computer Game, William Higginbotham, Brookhaven, 1958; and Spacewar!, PDP-1 Game, Steve Russell, MIT, 1962; and Pong, Arcade Game, Atari Inc., 1972.

[4] Space Invaders, Arcade Game, Taito, Japan, 1978.

[5] Collins, Game Sound, p. 15; and Alpine Ski, Arcade Game, Taito, North America, 1982; and Jungle Hunt, Arcade Game, Taito, Japan, 1982.

[6] Super Mario Bros., NES Game, Nintendo, Japan, 1985; and The Legend of Zelda, NES Game, Nintendo, Japan, 1986; and Final Fantasy, NES Game, Square, Japan, 1987.

[7] Starbound, PC Game, Chucklefish Games, Online, 2013; and Starcraft 2: Heart of the Swarm, PC Game, Blizzard Entertainment, Online, 2013; and Bravely Default, Nintendo 3DS Game, Square Enix, Japan and North America, 2013.

[8] See Halo: Combat Evolved, Xbox Game, Microsoft Game Studios, 2001; and Dishonored, PC/Playstation 3/Xbox 360 Game, Bethesda Softworks, 2012.

[9] T M Fay and S Selfon quoted in Collins, Game Sound, pp. 2-4.

[10] Collins, Game Sound, pp. 2-4.

[11] Cthulhu Saves the World, PC Game, Zeboyd Games, 2010; and Evoland, PC Game, Shiro Games, 2013; and Pokémon X/Y, Nintendo 3DS Game, Nintendo, 2013.

[12] See also Table 1.

[13] Final Fantasy VIII, Playstation Game, Squaresoft, Japan, 1998; and Final Fantasy IX, Playstation Game, Squaresoft, Japan, 2000.

[14] Final Fantasy XII, Playstation 2 Game, Square Enix, Japan, 2006.

[15] M Kamp, ‘Musical Ecologies in Video Games’.

[16] The Legend of Zelda: Skyward Sword, Nintendo Wii Game, Nintendo, Japan, 2011.

[17] The structure of Figure 1 can be observed interactively in Tab 1 of the accompanying demonstration patches file.

[18] C Ames, ‘The Markov Process as a Compositional Model: A Survey and Tutorial’, in Leonardo, vol. 22, no. 2, 1989, pp. 175-187.

[19] See Tab 1 in Demonstrations_Application.

[20] See Tab 2 in Demonstrations_Application.

[21] 2.3 GHz Intel Core i7 (quad-core, capable of hyperthreading a virtual core per core), 16 GB 1333 MHz DDR3, 7200 rpm HDD, 512 MB Intel GFX.

Huw Catchpole-Davies DPhil

Towards a more versatile dynamic-music for video games: Approaches to compositional considerations and techniques for continuous music.
