The MusicSystem Explained

Background: Why Do Artists Still Compose Music as 3-5 Minute Songs?

Ever since popular music began to be broadcast by radio stations (sometime in the 1920s or 1930s) and consumed by listeners all over the world, artists have been recording most of their music as 3-5 minute songs.

This convention was born out of a technical limitation – the phonograph, an early version of the record players we use today, could only play 12” vinyl records. Moreover, when an artist recorded a new album or a new single, the only way to ship it to a local or national radio station was through the US Post Office. The biggest box one could send at that time, for a reasonable price, could hold only a 12” record. As you can probably guess, a 12” vinyl record can hold a tune no longer than 5 minutes.

A century later, music production, consumption, and distribution have gone completely digital. Even though most of the music we listen to today is basically bits of data that can be manipulated using simple algorithms, we still consume it in the 3-5 minute linear format. Unlike other media, such as text or video, which in many cases are consumed in non-linear form, audio is still consumed (and composed) in short linear sprints.

I believe that in the age of data, we can do more than that.

Let’s Record Data

The MusicSystem will allow musicians to record their musical ideas and will help turn those ideas into an endless flow of music, built from the musicians’ own core concept.

The software will capture a live recording, extract its musical features, and format these features into a reusable data structure. Using this data structure, the software will create countless versions and combinations that all retain the essence of the original piece.

The MusicSystem will use the data that will be extracted from the original recording to compose new music. The original recording could be handled as one musical version generated from the data, or as the main piece of the entire tune.

The artist will be able to control the way the music is being interpreted and recomposed, as well as to set rules about the way the music will change according to a variety of inputs, such as sensors.

More about all of that in the sections below.

The System and Its Parts

Microphone: Recording Analog Signal

The starting point of the entire composition will be a recorded sound. An artist will play an acoustic instrument or an amplified electric instrument, and the analog sound will be captured by a microphone.

The microphone will be connected to a computer, which will run an analog-to-digital conversion to generate a digital file. The digital file will hold the raw data of the recording (this is the data computers use to play digital music files, such as .wav or .mp3 files).
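
As a rough illustration of what the analog-to-digital step produces, the sketch below samples a pure tone and quantizes it to 16-bit integers. The sample rate, the bit depth, and the function name `sample_and_quantize` are assumptions made for this example; a real recording would come from an audio interface rather than from a formula.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (assumed, CD-quality rate)
MAX_INT16 = 32_767     # largest value of a signed 16-bit sample


def sample_and_quantize(freq_hz: float, seconds: float,
                        sample_rate: int = SAMPLE_RATE) -> list[int]:
    """Stand in for a microphone: sample a sine tone and quantize it to 16 bits."""
    n_samples = int(seconds * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                              # time of this sample, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # "analog" value in [-1, 1]
        samples.append(round(amplitude * MAX_INT16))     # quantized integer sample
    return samples


# Ten milliseconds of a 440 Hz tone (concert A).
pcm = sample_and_quantize(freq_hz=440.0, seconds=0.01)
print(len(pcm), min(pcm), max(pcm))
```

The resulting list of integers is the kind of raw data that an uncompressed audio file stores, and it is the input that every later stage of the system would work from.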

Digital Audio Analysis

The purpose of The MusicSystem is to use the recorded sound as data, in order to generate new music out of it (instead of playing the recorded data itself).

The software will try to retrieve musical information from the recorded sound: beat detection, musical structure, notes, tone, repetition, and any other feature that can be extracted from the file.
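
To make one of these features concrete, here is a minimal sketch of energy-based onset detection, a simple building block for beat detection. It is only one of many possible approaches and is not necessarily what The MusicSystem would use; the function names, the frame size, and the threshold values are all assumptions, and the input is a synthetic signal rather than a real recording.

```python
import math


def frame_energies(samples: list[float], frame_size: int) -> list[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame) / frame_size)
    return energies


def detect_onsets(energies: list[float], ratio: float = 4.0,
                  floor: float = 1e-6) -> list[int]:
    """Return frame indices where energy jumps sharply over the previous frame."""
    onsets = []
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)   # avoid comparing against pure silence
        if energies[i] > floor and energies[i] > ratio * previous:
            onsets.append(i)
    return onsets


# Synthetic "recording": one second of silence with two short tone bursts.
SAMPLE_RATE = 8_000
FRAME_SIZE = 400
signal = [0.0] * SAMPLE_RATE
for burst_start in (2_000, 6_000):
    for n in range(burst_start, burst_start + 800):
        signal[n] = math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)

onset_frames = detect_onsets(frame_energies(signal, FRAME_SIZE))
onset_times = [frame * FRAME_SIZE / SAMPLE_RATE for frame in onset_frames]
print(onset_times)
```

The spacing between detected onsets is the raw material for estimating tempo; pitch, structure, and repetition would need their own analyses.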

Using the Recording as a Practice Dataset

The captured and analyzed data will be fed into a neural network that will identify the relations within it. Using these relations, The MusicSystem will be able to generate a huge variety of compositions that encapsulate the same relations.

Since we are dealing with generative music, composed by a machine learning algorithm that has only a small dataset to learn from, the artist and the machine will have to ‘converse’ in order to help the machine converge faster on the expected results. The feedback from the artist will be used as a second dataset that will be fed into the neural network.
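
The idea of learning relations from a recording and then generating new material that respects them can be shown with something far simpler than a neural network. The sketch below deliberately swaps in a first-order Markov chain as a stand-in: it learns which note follows which, then walks those transitions to produce a new phrase. It assumes the analysis stage has already turned the audio into a list of note names, and all function names here are hypothetical.

```python
import random
from collections import defaultdict


def learn_transitions(notes: list[str]) -> dict[str, list[str]]:
    """Map each note to the list of notes that followed it in the recording."""
    transitions = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        transitions[current].append(following)
    return dict(transitions)


def generate(transitions: dict[str, list[str]], start: str,
             length: int, rng: random.Random) -> list[str]:
    """Walk the learned transitions to produce a new note sequence."""
    sequence = [start]
    while len(sequence) < length:
        options = transitions.get(sequence[-1])
        if not options:          # dead end: fall back to the opening note
            options = [start]
        sequence.append(rng.choice(options))
    return sequence


# A made-up "recorded" melody, already reduced to note names.
recorded = ["C", "E", "G", "E", "C", "G", "A", "G", "E", "C"]
model = learn_transitions(recorded)
new_phrase = generate(model, start="C", length=16, rng=random.Random(7))
print(" ".join(new_phrase))
```

Every step in the generated phrase is a step that also occurs somewhere in the recording, which is a very small version of "encapsulating the same relations". A neural network would capture longer-range structure, but would also need more data or more artist feedback to do so.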

Just like at the beginning, at a certain point the artist will be able to decide whether the music will be recorded and saved as a (very) long file, or saved as a set of rules and configurations. These rules and configurations will be saved as a file, which will be used by The MusicSystem player to generate music based on the artist’s recordings and decisions.
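
A file of rules and configurations could take many forms. The following is a minimal sketch that assumes JSON as the format; the `PieceConfig` class and every field in it are hypothetical and only illustrate the kind of information such a file might carry.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class PieceConfig:
    """Hypothetical container for an artist's rules and settings."""
    title: str
    source_recordings: list[str] = field(default_factory=list)
    strictness: float = 0.5            # 0.0 = loose, 1.0 = close to the recording
    original_probability: float = 0.1  # chance of playing the original take
    rules: list[dict] = field(default_factory=list)


def dumps_config(config: PieceConfig) -> str:
    """Serialize the configuration to JSON text that a player could load."""
    return json.dumps(asdict(config), indent=2)


def loads_config(text: str) -> PieceConfig:
    """Rebuild the configuration from JSON text."""
    return PieceConfig(**json.loads(text))


config = PieceConfig(
    title="Untitled sketch",
    source_recordings=["take_01.wav"],
    rules=[{"when": "holiday", "set": {"tempo_scale": 0.8}}],
)
text = dumps_config(config)
restored = loads_config(text)
print(restored == config)
```

A file like this stays small no matter how long the music plays, which is the main advantage over saving a very long audio file.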

Playing Infinitely

Once the data has been analyzed, The MusicSystem will generate digital sound based on this data, infinitely.

The infinite playing mode will allow artists to experiment with different aspects of the musical piece and with the effects of changes (see below) or new recordings, and to capture snippets of the infinite stream and make them permanent (played in a loop, which means that these pieces will no longer be randomly generated).

The end user will listen to the music in that exact infinite form. The artist will be able to decide where the infinite playing starts, but not where it ends.
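
Both behaviors, endless generation and freezing a snippet into a fixed loop, map naturally onto Python generators. This is a sketch under simple assumptions: phrases are lists of note names, the only variation is playing a motif backwards, and the names `endless_phrases` and `freeze` are invented for the example.

```python
import itertools
import random
from typing import Iterator


def endless_phrases(motifs: list[list[str]],
                    rng: random.Random) -> Iterator[list[str]]:
    """Yield phrases forever by picking and lightly varying the source motifs."""
    while True:
        phrase = list(rng.choice(motifs))
        if rng.random() < 0.5:
            phrase.reverse()      # a simple variation: play the motif backwards
        yield phrase


def freeze(stream: Iterator[list[str]], count: int) -> Iterator[list[str]]:
    """Capture the next `count` phrases and repeat them as a fixed loop."""
    captured = list(itertools.islice(stream, count))
    return itertools.cycle(captured)


motifs = [["C", "E", "G"], ["A", "F", "D"], ["G", "B", "D", "F"]]
stream = endless_phrases(motifs, random.Random(3))

preview = list(itertools.islice(stream, 4))  # listen to four generated phrases
loop = freeze(stream, count=2)               # make the next two phrases permanent
looped = list(itertools.islice(loop, 6))     # the same two phrases, three times over
```

The generator never ends on its own; the listener or the artist only decides how much of it to pull, which mirrors the idea that the piece has a start but no end.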

Controlling the New Composition

If we use the recorded sound as a data feed and not as part of the desired outcome, we start to lose the connection with the original recording. The original recording only ‘inspires’ the end result; it does not strictly dictate it.

If the captured data can be interpreted and used to generate new music, we can assume that one of the outcomes could be a tune that is identical to the original recording. The probability that the software will play the original recording will be controlled by the artist. The artist will be able to control the way the software will handle the analog recording:

  1. As a final result that will be played as recorded
  2. As data that will teach the software how to generate new music
  3. As a combination – The recorded audio will be played entirely, and the data extracted from it will be used to generate new music.
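
The three options above, together with the artist-controlled probability of hearing the original, could be modeled roughly as follows. This is one possible reading of the design rather than a specification: the `Handling` enum, the `next_segment` function, and the idea of deciding per segment are all assumptions.

```python
import random
from enum import Enum


class Handling(Enum):
    AS_RECORDED = "as_recorded"  # option 1: play the recording itself
    AS_DATA = "as_data"          # option 2: only learn from the recording
    COMBINED = "combined"        # option 3: play it and learn from it


def next_segment(handling: Handling, original_probability: float,
                 rng: random.Random) -> str:
    """Decide whether the next segment is the original take or generated material."""
    if handling is Handling.AS_RECORDED:
        return "original"
    if handling is Handling.AS_DATA:
        return "generated"
    # COMBINED: the artist-controlled probability decides each time.
    return "original" if rng.random() < original_probability else "generated"


rng = random.Random(0)
picks = [next_segment(Handling.COMBINED, 0.25, rng) for _ in range(1000)]
share = picks.count("original") / len(picks)
print(f"original take played in {share:.0%} of segments")
```

Setting the probability to 0 or 1 in combined mode reproduces the two pure options, so a single dial could cover the whole range.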

Besides that, the artist will be able to control the generative outcome in a variety of ways, such as:

  • Highlighting specific recordings – The artist will be able to decide which of the recordings will be handled as a ‘major’ recording (will have more influence on the end result), and which ones will be handled as a ‘minor’ recording.
  • Using the generative sound as an input – The artist will be able to mark a specific part of the generative music and use it as a new input for The MusicSystem.
  • Strict vs. loose music generation – The artist will be able to decide how ‘close’ the generative music will be to the original narrative enclosed in the recorded parts.
  • Sensors – The artist will be able to use sensors to change the musical outcome. For example, when the user is walking, in a dark room, or breathing heavily, the music will be played differently.
  • 3rd-party data (rules) – The artist will be able to use 3rd-party APIs and datasets to affect the music. For example, the music will sound different on holidays, or on a night when the Phoenix Suns win a basketball game.
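
Sensor readings and 3rd-party data can both be treated as a context that a list of rules is checked against. The sketch below assumes that the context arrives as a plain dictionary and that each rule overrides a few playback parameters; the `Rule` class, the context keys, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    applies: Callable[[dict], bool]  # test against the current context
    changes: dict                    # parameter overrides when the rule fires


def apply_rules(base: dict, rules: list[Rule], context: dict) -> dict:
    """Return playback parameters after applying every matching rule in order."""
    params = dict(base)              # copy, so the artist's defaults stay intact
    for rule in rules:
        if rule.applies(context):
            params.update(rule.changes)
    return params


rules = [
    Rule("walking", lambda c: c.get("steps_per_minute", 0) > 60, {"tempo_bpm": 110}),
    Rule("dark room", lambda c: c.get("light_lux", 1000) < 10, {"brightness": "muted"}),
    Rule("holiday", lambda c: c.get("is_holiday", False), {"mode": "major"}),
]

base = {"tempo_bpm": 90, "brightness": "neutral", "mode": "minor"}
context = {"steps_per_minute": 95, "light_lux": 300, "is_holiday": True}
params = apply_rules(base, rules, context)
print(params)
```

Because rules are applied in order, later rules win when two of them touch the same parameter, which gives the artist a simple way to express priorities.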

Recording Some More

At this point of the interaction, the cycle can start to repeat itself in order to expand the results or to focus them on a specific musical idea.

The artist will be able to record more and more analog sounds, each of which will be extracted into a new dataset that will make The MusicSystem better informed about the artist’s direction.

Commits and Rollbacks

To allow better communication with the musical piece, I would like the artist to feel free to make decisions and then change them. To do that, I would like to implement Git-style version control that allows the artist to ‘commit’ changes and to roll back to an older version of the musical piece.
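
The core of this idea fits in a few lines. The sketch below is not Git; it is a tiny linear history of settings snapshots, assuming that the state of a piece can be described by a dictionary. The class name `PieceHistory` and its methods are invented for this example.

```python
import copy


class PieceHistory:
    """A tiny linear history of settings snapshots (not real Git)."""

    def __init__(self, initial: dict):
        self._commits = [("initial", copy.deepcopy(initial))]

    def commit(self, message: str, state: dict) -> int:
        """Store a snapshot and return its version number."""
        self._commits.append((message, copy.deepcopy(state)))
        return len(self._commits) - 1

    def rollback(self, version: int) -> dict:
        """Discard everything after `version` and return that snapshot."""
        if not 0 <= version < len(self._commits):
            raise IndexError(f"unknown version {version}")
        del self._commits[version + 1:]
        return copy.deepcopy(self._commits[version][1])

    def log(self) -> list[str]:
        """List commit messages, oldest first."""
        return [message for message, _ in self._commits]


history = PieceHistory({"strictness": 0.5})
v1 = history.commit("looser generation", {"strictness": 0.2})
history.commit("add walking rule", {"strictness": 0.2, "rules": ["walking"]})

restored = history.rollback(v1)   # the walking rule is gone again
print(history.log(), restored)
```

A real implementation would probably also need branching, so that an artist can keep two directions alive at once, which is where an actual Git repository behind the scenes might become worthwhile.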

Open Questions

This broad concept raises some unsolved questions:

Which Data Should Be Analyzed by the Software?

The software can analyze the DSP data that is generated through the analog-to-digital conversion of the recorded sound. This is the data that is used to create and play the digital music file.

On the other hand, the software can analyze the digital file itself and retrieve information from this analysis.

It is currently unclear which of the two would be more relevant for automatically generating new music.

What is the Relevant Data?

Many types of data can be extracted from a digital music file. What data is relevant for this specific project? How can this data be manipulated or iterated on to generate data that is relevant for music creation (or music synthesis)?

How to Capture the Essence of the Original Recording?

It is critical to isolate the data that is most indicative of the ‘original essence’ of the recorded piece. The question ‘what is an essence?’ or ‘what determines the essence of a musical piece?’ can be raised as well.

What Is the Relation Between the Software and the Composition Itself?

Let’s assume that we use data A, which was extracted from the original recording, to produce data B, which will be used to generate new music. Isn’t the decision to produce data B, instead of data C, a composition decision? Will the neural network make these decisions in a ‘trivial’ way, or is it the developer who is actually pulling the composition strings?

How to Create an Infinite Interaction?

In order to create an infinite piece of music, it could be assumed that an infinite creative process should be applied, or at least a procedure that allows such creative process.

The current system design will require the musician to put the instrument down in order to interact with the software.


There are two major inspirations for this project:

  • The Echo Nest API – A music information retrieval API that was used to extract musical features from a recorded track. The API, which is currently closed to the public, inspired the technical possibilities in the field.
  • The Infinite Jukebox, developed by Paul Lamere – This web application inspired the creative applications that are currently possible using musical data, such as the data provided by the Echo Nest API.