A "SoundFont bank" is a collection of sounds in the SoundFont standard format. Such a bank contains both the digital audio samples which have been captured from a sound source, and the instructions to the wavetable synthesizer on how to articulate this sound based on the musical or sonic context as expressed by MIDI.
For example, a trumpet could be a particular sound in a SoundFont bank. The bank might contain recordings of trumpets played at several different pitches, as well as information telling the synthesizer to filter or mute the sounds when notes are played softly, loop information allowing a short recording to be stretched into a sustained note, and instructions on how to apply vibrato or bend the pitch of the note based on MIDI commands from the musician.
We use the term SoundFont because it is very much like a type font. It is designed to contain this information in such a way that a variety of wavetable synthesizers can reproduce the sound with an accuracy dependent only on the hardware's capability.
The trumpet sound example above is just like a letter 'a' in a type font. The different sounds produced by different keys and velocities of the trumpet in the SoundFont bank are analogous to the different displays produced by different sizes of the letter 'a' in the type font. Just as different monitors display the letter 'a' at different sizes according to their resolution, memory, and other hardware capabilities, different synthesizers play the trumpet according to their synthesis capabilities.
E-mu Systems owns the trademark on the word "SoundFont". This means any banks developed by third parties should be called "SoundFont compatible banks" rather than "SoundFonts" or "SoundFont banks". (Legal hubbub...)
SoundFont banks and SoundFont compatible banks are currently stored as PC files, but they may also be stored in non-volatile memory or in files under other operating systems.
Creative Technology is now developing drivers, tools, and utilities to support the new format. Preliminary versions of these drivers and utilities supporting SoundFont 2.0 have been made available to developers, who are being furnished with an entire suite of example software and tools to make supporting the SoundFont 2.0 format easier.
Creative plans to release the first revision of SoundFont 2.0 compatible drivers, a format conversion utility (sf1to2.exe), and the Vienna 2.0 SoundFont studio for the EMU8000 line of products in the second quarter of 1996 (February - April timeframe). This revision supports all of the SoundFont 2.0 characteristics (including preset level editing, stereo sounds, and added articulation parameters) EXCEPT key/velocity scaling and real-time CC routing. A future revision of the drivers supporting those features will be available at a later date (TBD).
SoundFont 2.0 is currently under review by Microsoft and various downloadable sound working groups (including the MMA's IA-SIG) to be accepted as an industry standard format for storing downloadable wavetable synthesized sounds. Creative/Emu are represented in those major working groups and are giving them full support.
The SoundFont 1.0 format will not continue to be supported by E-mu and Creative beyond the second quarter of 1996.
We knew from the start that much of the power of the concept was dependent on the format becoming a universal standard. After the release of the AWE32 and SoundFont 1.0, we began to realize the weaknesses of the SoundFont 1.0 implementation of that concept, and the fact that such weaknesses would hinder our efforts to make the format a universal standard.
These factors led us to the decision to release the SoundFont 2.0 standard in an orderly and well organized manner to conform to the ORIGINAL CONCEPT, rather than to conform to existing material. This required fundamental revisions to the original format as well as adequate prototyping and documentation.
The SoundFont 2.0 specification took about a year to specify, fine-tune, test, get approval from representatives throughout the two companies, and fully and accurately document.
The SoundFont banks in SoundFont 2.0 format are actually collections of self-contained instruments which use only the minimal Preset Level articulation data necessary to allow auditioning the various instruments. So when SoundFont 2.0 tools and software are released, developers and/or end users will be able to easily create custom banks or customize the articulation of those instruments at the preset level.
Previously released SoundFont banks are still available in SoundFont 1.0 format. Since those banks were not CD-ROM collections, they may be easily converted into the SoundFont 2.0 format with the conversion utility.
E-mu plans to release new General MIDI compatible SoundFont banks in 2MB, 4MB, and 8MB sizes in SoundFont 2.0 format concurrent with the release of SoundFont 2.0 supporting software on Creative Technology products. (Those banks will not be available in SoundFont 1.0 format).
However, for various reasons mostly dealing with the cost of packaging, distribution, and customer support, "Preditor" was NOT made into a finished product, and is NOT available to the general public. The tool is currently only available to Creative developers. (Its mention in the SoundFont specification was an oversight on our part.)
However, the tool did an excellent job of prototyping and fine-tuning the new format and making it ready for public release, and it is what third party developers are using today to develop content in preparation for the release of SoundFont 2.0 compatible software.
A SoundFont bank is stored in the industry standard RIFF format. There are three major portions, or "chunks", contained within a SoundFont bank: one for information, one for sample data, and one for articulation data.
The information chunk contains information about the bank. The Sample Data chunk contains the digitized waveforms used in the SoundFont bank itself. The articulation data chunk is a bit more involved.
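Before looking inside the articulation chunk, here is a rough sketch (not part of the specification) of walking those three top-level chunks. It assumes the RIFF form type "sfbk" and the LIST names "INFO", "sdta", and "pdta" given in the specification; the file name "bank.sf2" is hypothetical.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("bank.sf2", "rb");   /* hypothetical file name */
        char id[4], form[4], type[4];
        unsigned char len[4];
        long size, pos;

        if (!f) return 1;
        fread(id, 1, 4, f);                  /* "RIFF"                      */
        fread(len, 1, 4, f);                 /* total size, little-endian   */
        fread(form, 1, 4, f);                /* "sfbk" form type            */
        if (memcmp(id, "RIFF", 4) || memcmp(form, "sfbk", 4)) return 1;

        /* Each major portion is a LIST chunk whose first 4 data bytes name
           it: INFO (information), sdta (sample data), pdta (articulation). */
        while (fread(id, 1, 4, f) == 4 && fread(len, 1, 4, f) == 4) {
            size = len[0] | (len[1] << 8) | ((long)len[2] << 16) | ((long)len[3] << 24);
            pos = ftell(f);
            fread(type, 1, 4, f);
            printf("%.4s chunk, %.4s list, %ld bytes\n", id, type, size);
            fseek(f, pos + size + (size & 1), SEEK_SET);  /* chunks are word-aligned */
        }
        fclose(f);
        return 0;
    }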
The articulation data chunk contains two levels of articulation data which sit atop a level of sample header data describing the sample data itself. Each level of articulation data REFERENCES the level beneath it, thus allowing for reuse of resources.
SF2 Articulation Data Hierarchy:
PRESET LEVEL
  ...
  Preset m
    Name
    Variation Bank Index
    Program Index
    Layer n
      Articulation data
      Instrument i Reference
    Layer n+1
      Articulation data
      Instrument h Reference
    ...
  Preset m+1
    Name
    Variation Bank Index
    Program Index
    Layer o
      Articulation data
      Instrument p Reference
    Layer o+1
      Articulation data
      Instrument a Reference
  ...

INSTRUMENT LEVEL
  ...
  Instrument i
    Name
    Split j
      Articulation Data
      Sample s Reference
    Split k
      Articulation Data
      Sample b Reference
    ...
  Instrument i+1
    Name
    Split l
      Articulation Data
      Sample s Reference
    Split l+1
      Articulation Data
      Sample t Reference
  ...

SAMPLE HEADER LEVEL
  ...
  Sample s
    Name
    Location and loop points
    Information
  Sample s+1
    Name
    Location and loop points
    Information
  ...
The lowest level of articulation data is called the "Sample Header" level. It contains a list of data structures, each describing one particular waveform contained within the SoundFont bank or within a wavetable ROM. Each data structure contains a unique name for the sample, information regarding the location of the sample relative to the beginning of the sample data contained within the bank (or the absolute location of the sample in a particular wavetable ROM), sample loop points, the original sample rate of the digitized waveform, and other information about the sample.
The combination of the Sample Header data and the sample data for a single sampled waveform in a SoundFont bank is the rough equivalent of a WAV file. However, stereo SoundFont sounds are quite different from stereo WAV sounds. See the question below for details.
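As an informal picture of the above, a sample header can be sketched as a record along these lines. The field names are ours, loosely modeled on the sample header record described in the specification; this is not the exact on-disk layout.

    #include <stdint.h>

    struct sample_header {
        char     name[20];     /* unique sample name                          */
        uint32_t start, end;   /* offsets into the bank's sample data, or
                                  absolute addresses in a wavetable ROM       */
        uint32_t loop_start;   /* sample loop points                          */
        uint32_t loop_end;
        uint32_t sample_rate;  /* original sample rate of the waveform, Hz    */
        uint8_t  original_key; /* MIDI key at which the sample was recorded   */
        int8_t   correction;   /* fine pitch correction, in cents             */
        uint16_t link;         /* reference to a paired sample (see stereo,
                                  later in this FAQ)                          */
        uint16_t type;         /* mono, or master/slave of a stereo pair      */
    };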
The next highest level of articulation data is called the "Instrument" level. Instruments are subdivided into "Splits". A Split is the combination of a key range and/or a velocity range, a reference to a particular sample header within the SoundFont bank, and articulation data which is applied directly to that sample.
By referencing samples, it is possible to have any split articulate any sample in the bank.
IE:
INSTRUMENT LEVEL
  Instrument 1:
    Split 1
      Key 0-127
      Articulation Data
      Sample q
  ...
  Instrument y:
    Split 1
      Key 0-127
      Articulation Data
      Sample q
  ...

SAMPLE LEVEL
  ...
  Sample q:
    Loop points, etc
  ...
By allowing multiple splits within a single instrument, it is possible to have different samples played back with varying keynumber and velocity, each with independent articulation.
IE:
INSTRUMENT LEVEL
  Instrument 1:
    Split 1
      Key 43-43
      Articulation Data
      Sample q
    Split 2
      Key 44-44  Vel 0-120
      Other Articulation Data
      Sample x
    Split 3
      Key 44-44  Vel 101-127
      Other Articulation Data
      Sample q
    Split 4
      Key 0-42
      Other Articulation Data
      Sample a
  ...

SAMPLE LEVEL
  ...
  Sample a:
    Loop points, etc
  ...
  Sample q:
    Loop points, etc
  ...
  Sample x:
    Loop points, etc
One can see that keys 0-42 play sample a, and key 43 plays sample q, so key ranges are variable. One can also see that key number 44 plays sample x when struck softly, sample q when struck hard, and BOTH samples when struck at medium velocity, so velocity sensitive samples are supported.
These features can be used to form drumkits, finely tuned complex instruments which do not sound "munchkin-like" after pitch shifting, or velocity dependent sounds.
A combination of splits, which typically span the full range of possible keynumbers and velocity values collectively, is collected to form an "Instrument".
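A minimal sketch of how a synthesizer might select splits follows, using informal stand-in structures (start_voice is a hypothetical synthesis entry point, not anything from the specification). Every split whose key and velocity ranges contain the incoming note starts a voice, which is exactly how key 44 above can sound two samples at once.

    struct split {
        unsigned char key_lo, key_hi;   /* key range, 0-127      */
        unsigned char vel_lo, vel_hi;   /* velocity range, 0-127 */
        int sample_index;               /* reference into the sample headers */
        /* ... articulation data applied directly to that sample ... */
    };

    struct instrument {
        int num_splits;
        struct split *splits;
    };

    void start_voice(const struct split *s, int key, int vel); /* hypothetical */

    /* Start one voice per matching split. */
    void trigger(const struct instrument *inst, int key, int vel)
    {
        for (int i = 0; i < inst->num_splits; i++) {
            const struct split *s = &inst->splits[i];
            if (key >= s->key_lo && key <= s->key_hi &&
                vel >= s->vel_lo && vel <= s->vel_hi)
                start_voice(s, key, vel);
        }
    }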
The highest level of articulation data is called the "Preset" level. Presets are subdivided into "Layers". A Layer is the combination of a key range and/or a velocity range, a reference to a particular instrument within the SoundFont bank, and articulation data which is applied RELATIVE to corresponding articulation data within that instrument.
By referencing instruments, it is possible to have any layer articulate any instrument in the bank. By applying articulation data at the Preset level RELATIVE to data at the Instrument level, it is possible to customize professionally designed and fine-tuned instruments without destroying the original material.
IE:
PRESET LEVEL
  Preset 1:
    Layer 1
      Key 0-127
      Add i% Reverb
      Instrument x
  ...

INSTRUMENT LEVEL
  Instrument 1:
    Split 1
  ...
Here, i% reverb is ADDED to WHATEVER the reverb is on WHATEVER samples are used in the instrument!
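A sketch of what ADDED means here, using made-up field names and the additive units discussed later in this FAQ: the preset layer's values are offsets applied on top of the instrument's own values, and the instrument data itself is never modified.

    /* Made-up articulation record: one slot per generator, in additive
       real world units (cents, centibels, tenths of a percent, ...). */
    struct articulation {
        int reverb_send;   /* e.g. tenths of a percent */
        int coarse_tune;   /* semitones                */
        int attenuation;   /* centibels                */
        /* ... one field per generator ... */
    };

    /* Preset-level values are OFFSETS on top of the instrument's values. */
    void apply_layer(struct articulation *voice,
                     const struct articulation *instrument_level,
                     const struct articulation *preset_level)
    {
        voice->reverb_send = instrument_level->reverb_send + preset_level->reverb_send;
        voice->coarse_tune = instrument_level->coarse_tune + preset_level->coarse_tune;
        voice->attenuation = instrument_level->attenuation + preset_level->attenuation;
    }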
By allowing multiple layers within a single Preset, it is possible to have different instruments exist at different parts of the keyboard or in different velocity ranges, each with independent articulation.
This can be used to make presets in which your left hand plays a bass guitar and your right plays a piano:
PRESET LEVEL
  Preset 1:
    Layer 1
      Key 0-63
      Instrument "Bass Guitar"
    Layer 2
      Key 64-127
      Instrument "Piano"
chorded versions of instruments:
PRESET LEVEL
  Preset 1:
    Layer 1
      Key 0-127
      Instrument "Piano"
    Layer 2
      Key 0-127
      Add 4 semitones of pitch tuning
      Instrument "Piano"
    Layer 3
      Key 0-127
      Add 7 semitones of pitch tuning
      Instrument "Piano"
velocity sensitive instruments:
PRESET LEVEL
  Preset 1:
    Layer 1
      Key 0-127  Vel 0-100
      Instrument "Piano"
    Layer 2
      Key 0-127  Vel 101-127
      Instrument "Bass Guitar"
or any other custom variations of finely tuned instruments. All of this without the user worrying about samples, loop points, copying large numbers of splits, or other such complex issues. Expert users of Vienna 1.0 should be able to recognize that creating presets like the above would be a nightmare with the SoundFont 1.0 format! (EsBeeKay users COULD do the Piano/Bass Guitar, but not the chorded piano or the add i% reverb.)
A combination of layers, which typically span the full range of possible keynumbers and velocity values collectively, is collected to form a "Preset".
Each Preset has a corresponding "variation bank" index and "program index" which are directly analogous to MIDI "bank change" and "program change" commands.
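So responding to MIDI could be as simple as the following sketch (informal structures, not the file format): find the preset whose two indices match the incoming bank and program numbers.

    struct preset {
        char name[20];
        int  variation_bank;  /* matched against MIDI bank change    */
        int  program;         /* matched against MIDI program change */
        /* ... layers ... */
    };

    const struct preset *find_preset(const struct preset *presets, int count,
                                     int midi_bank, int midi_program)
    {
        for (int i = 0; i < count; i++)
            if (presets[i].variation_bank == midi_bank &&
                presets[i].program == midi_program)
                return &presets[i];
        return 0;   /* nothing stored at that bank/program address */
    }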
A SoundFont bank is viewed as the combination of a collection of Presets, Instruments, and Samples, each self-contained and dependent only on the level beneath it.
IE:
A single sample header and specified sample data are viewed as an entity.
A single instrument and all of the samples it uses are viewed as an entity.
A single preset and all of the instruments it uses are viewed as an entity.
The ability to modify the value of any generator based on MIDI or computer generated events is also supported. This includes key scaling and MIDI Continuous Controller routing to any 'generator'. Such articulation data is called a 'modulator'. The list of possible modulators includes 'keynumber to filter cutoff', 'MIDI CC 1 to Tremolo', etc. Note that modulators consist of a SOURCE, a DESTINATION, and an AMOUNT. The SOURCE (keynumber, MIDI CC, etc.) is a MIDI or computer generated event, whereas the DESTINATION (Filter Cutoff, Tremolo, etc.) is a GENERATOR. The AMOUNT by which a source changes a destination is programmable, and that amount can be either static or controlled by another SOURCE.
Some features which sound like they should be 'modulators' are so important that we made them 'generators'. An example of this is 'Keynumber to Volume Envelope Decay Time' being a GENERATOR. This parameter is vital in the creation of certain instruments (such as a piano).
Generators and Modulators may be used at the Preset Level and at the Instrument Level of a SoundFont bank. A comprehensive list of currently defined generators may be seen in the SoundFont 2.0 specification. A comprehensive list of modulator sources is currently under review and will be available in an updated version of the SoundFont 2.0 specification.
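Conceptually, a modulator might be pictured as the triple below. The structures, enumeration values, and 0..1 normalization are our own illustration of the SOURCE/DESTINATION/AMOUNT idea, not the specification's encoding.

    enum source { SRC_NONE, SRC_KEYNUMBER, SRC_VELOCITY, SRC_MIDI_CC };
    enum generator { GEN_FILTER_CUTOFF, GEN_TREMOLO_DEPTH,
                     GEN_REVERB_SEND, GEN_COARSE_TUNE, GEN_ATTENUATION };

    struct modulator {
        enum source    src;        /* the MIDI or computer generated event    */
        int            cc_number;  /* which controller, when src is a MIDI CC */
        enum generator dest;       /* the generator the source drives         */
        int            amount;     /* programmable amount                     */
        enum source    amount_src; /* SRC_NONE for a static amount; otherwise
                                      another source scales the amount        */
    };

    /* Contribution of one modulator, with source values normalized to 0..1. */
    double modulate(const struct modulator *m, double src_val, double amt_val)
    {
        double amount = m->amount;
        if (m->amount_src != SRC_NONE)
            amount *= amt_val;      /* amount controlled by another source   */
        return src_val * amount;    /* added to the destination generator    */
    }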
All units have the additional property of being "perceptually additive". This means that when the Preset Level articulation or a Real-Time Modulator "adds" an amount to the corresponding articulation in an Instrument, its effect is perceived as the same, regardless of the value of that data within the instrument.
Examples of units with these properties would be "pitch in cents" or "attenuation in centibels."
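Concretely, both example units relate to linear quantities through a simple exponential, which is why equal added amounts are perceived as equal changes wherever you start. The conversions below are just the standard definitions (1200 cents per octave, 10 centibels per decibel):

    #include <math.h>

    /* Pitch in cents: adding N cents multiplies frequency by a fixed ratio. */
    double pitch_ratio(double cents)
    {
        return pow(2.0, cents / 1200.0);
    }

    /* Attenuation in centibels: adding N cB scales amplitude by a fixed factor. */
    double amplitude_gain(double centibels)
    {
        return pow(10.0, -centibels / 200.0);
    }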
Each sample header contains a field declaring the sample to be mono, master, or slave, and another field holding a reference to the other sample of the pair, or to the next sample in the chain.
Stereo sounds are produced when an INSTRUMENT contains two splits, each pointing to one of the two stereo samples, and both triggered by the same key/velocity combination. These splits may have INDEPENDENT articulation parameter settings (including pan position) EXCEPT those involving pitch modulation.
IE:
Preset "Stereo Sound"
  Layer 1
    Instrument "Stereo Instrument"
      Split 1
        Articulation Data Parameters
        Sample "Right Sample" (Master, points to "Left Sample")
      Split 2
        Articulation Data Parameters
        Sample "Left Sample" (Slave, points to "Right Sample")
In the case of pitch, one split (called the MASTER split) controls the pitch for its sample AND the sample in the OTHER split (the SLAVE split). So changing pitch modulation parameters (such as LFO to Pitch or fine tuning) in a SLAVE split does nothing, but changing such parameters in a MASTER split applies that modulation to BOTH samples!
There are no other "automatic" parameter settings in SoundFont stereo sounds. Pan position may be set wherever you like (no automatic pan position of stereo paired samples). Each sample may be played at whatever loudness it is individually capable of (no automatic attenuation of stereo paired sounds). Each sample gets its own individual attenuation and filter settings! (No automatic copying of articulation data from master split to slave split in stereo paired sounds.) Thus, it is POSSIBLE to build a classic stereo sound (pan hard left/right, half volume on each, same articulation on each) in a SoundFont 2.0 bank, but you are not CONFINED to those settings!
If a keynumber/velocity combination triggers only ONE split in an instrument and that split contains ONE of a pair of stereo samples, then you hear ONLY that sample in MONO. This way, you can use the samples individually or in stereo if you like.
So if an INSTRUMENT contains only ONE sample of a stereo pair, it is a MONO instrument.
IE:
Preset "Mono Sound"
  Layer 1
    Instrument "Mono Instrument"
      Split 1
        Articulation Data Parameters
        Sample "Right Sample" (Master, points to "Left Sample")
Even though the "Right Sample" still points to the "Left Sample", the "Left Sample" is never used in the "Mono Instrument". So that is a mono sound with just the one sample!
If a PRESET contains two INSTRUMENTS, one of which holds only ONE sample in a stereo pair, and the other of which holds only THE OTHER sample in a stereo pair, a stereo sound does NOT result. That would have the effect of FUNDAMENTALLY changing the nature of an instrument with nothing more than the inclusion of another instrument. This defies the 'instrument as an entity' property.
IE:
Preset "NOT A Stereo Sound"
  Layer 1
    Instrument "Mono Instrument"
      Split 1
        Articulation Data Parameters
        Sample "Right Sample" (Master, points to "Left Sample")
  Layer 2
    Instrument "Other Mono Instrument"
      Split 1
        Articulation Data Parameters
        Sample "Left Sample" (Slave, points to "Right Sample")
Even though the "Right Sample" still points to the "Left Sample", AND the "Left Sample" IS used in another instrument, this is NOT a stereo sound!
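Pulling these examples together, an informal test for "these two splits form a stereo pair" might look like the sketch below, reusing the stand-in structures from the earlier sketches. Note the caller must only compare splits within the SAME instrument; as the example above shows, samples linked across two instruments never form a stereo pair.

    /* Two splits of ONE instrument form a stereo pair only if one
       key/velocity combination triggers both, and their samples
       link to one another as master and slave. */
    int is_stereo_pair(const struct split *a, const struct split *b,
                       const struct sample_header *samples)
    {
        const struct sample_header *sa = &samples[a->sample_index];
        const struct sample_header *sb = &samples[b->sample_index];

        int ranges_overlap = a->key_lo <= b->key_hi && b->key_lo <= a->key_hi &&
                             a->vel_lo <= b->vel_hi && b->vel_lo <= a->vel_hi;

        return ranges_overlap &&
               (int)sa->link == b->sample_index &&
               (int)sb->link == a->sample_index;
    }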
A single SoundFont bank may be made with all of the above examples:
PRESET LEVEL
  Preset 1: "Stereo Sound"
    Layer 1
      Instrument 1
  Preset 2: "Mono Sound"
    Layer 1
      Instrument 2
  Preset 3: "NOT A Stereo Sound"
    Layer 1
      Instrument 2
    Layer 2
      Instrument 3
  Preset 4: "Some other unrelated sound"
    Layer 1
      Instrument 4

INSTRUMENT LEVEL
  Instrument 1: "Stereo Instrument"
    Split 1
      Articulation Data Parameters
      Sample 1
    Split 2
      Articulation Data Parameters
      Sample 2
  Instrument 2: "Mono Instrument"
    Split 1
      Articulation Data Parameters
      Sample 1
  Instrument 3: "Other Mono Instrument"
    Split 1
      Articulation Data Parameters
      Sample 2
  Instrument 4: "Some other unrelated instrument"
    Split 1
      Articulation Data Parameters
      Sample 3

SAMPLE LEVEL
  Sample 1: "Right Sample"
    Loop points, etc
    Master
    Points to Sample 2
  Sample 2: "Left Sample"
    Loop points, etc
    Slave
    Points to Sample 1
  Sample 3: "Some other sample"
    Loop points, etc
    Mono
    Pointer is irrelevant
This example does not use PRESET LEVEL Articulation Data Parameters, but you can do that too!
Finally, "stereo" sound in SoundFont 2.0 is merely a special case of pitch phase locking a general number of samples. In the case of "stereo" sound, that "general number" is 2. Thus it is possible to pitch phase lock as many voices as you like, so long as the link information in the sample headers is used to form a CLOSED CHAIN of samples.
IE: Sample 1 points to Sample 2, which points to Sample 3... which points to Sample n, which points back to Sample 1.
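A sketch of verifying that property with the stand-in sample header from earlier: follow the link field and confirm it cycles back to the starting sample without leaving the bank.

    /* Does sample s head a CLOSED CHAIN?  Follow the link field; it must
       return to s before visiting more samples than the bank holds. */
    int is_closed_chain(const struct sample_header *samples, int count, int s)
    {
        int cur = s;
        for (int steps = 0; steps < count; steps++) {
            cur = samples[cur].link;
            if (cur >= count)
                return 0;       /* link points outside the bank      */
            if (cur == s)
                return 1;       /* chain closed back on itself       */
        }
        return 0;               /* never came back: not a closed chain */
    }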
Since parameters are enumerated, it is easy to recognize known data and ignore unknown data. That is the fundamental property which makes SoundFont 2.0 a portable and exchangeable file format. Also, since all parameters are stored in real world units with resolutions at the minimum perceptual level, the format will not be outdated by future synthesizers.
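In practice this means a player can switch on the enumeration and silently skip anything it does not implement. A sketch, reusing the informal generator enumeration and articulation record from the sketches above:

    void apply_generator(struct articulation *art, int gen, int value)
    {
        switch (gen) {
        case GEN_REVERB_SEND: art->reverb_send = value; break;
        case GEN_COARSE_TUNE: art->coarse_tune = value; break;
        case GEN_ATTENUATION: art->attenuation = value; break;
        default:
            /* Enumerated but unrecognized: skip it and keep reading.
               This is what lets an older player load a newer bank. */
            break;
        }
    }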
Answers to questions related to the AWE32 series of sound cards, their related software and their interaction with MIDI sequencers and other utilities are available at the Creative Labs WWW site (http://www.creaf.com) in their FAQ lists.
Specific questions related to the SoundFont 2.0 format which are not answered by the SoundFont 2.0 specification, white paper, or this FAQ may be sent to 'SoundFont@emu.com'.
Copyright (C) 1995, E-mu Systems, Inc / Creative Technology, Ltd. All rights reserved. This document may only be redistributed in its original form.