TikTok’s parent ByteDance has locked down AI-music patents in the US – as its researchers develop a model trained on 257,000 hours of songs

March 12, 2024

MBW Explains is a series of analytical features in which we explore the context behind major music industry talking points – and suggest what might happen next.

MBW has covered TikTok and parent company ByteDance’s work in AI music-making and machine learning extensively over the past few years.

In August 2022, MBW broke the news that TikTok and ByteDance were hiring several highly skilled specialists in machine learning and AI music creation in both the US and China. (They still are.)

That initial hiring spree followed ByteDance’s July 2019 acquisition of Jukedeck, a UK-based AI music startup specializing in creating royalty-free music.

In the past couple of years, ByteDance has also launched Mawf, a machine-learning-driven music-making app, and Ripple, an AI-powered music-making app that can turn a hummed melody into a song.

More recently, TikTok has been testing an AI Song feature that uses a large language model to power lyric generation.

Now, MBW has unearthed two recent research papers that indicate ByteDance’s ambitions in the realm of AI-made music go much further than what we’ve seen to date.

Separately, we’ve also spotted two US patent filings confirming ByteDance has now secured IP protection for future AI-music-related endeavors.

StemGen: A music generation model that listens

Two separate research papers from ByteDance’s Speech, Audio & Music Intelligence (SAMI) team – both published in recent months – highlight the company’s extensive work in the field of music generation.

SAMI, by the way, appears to be an increasingly global priority at ByteDance/TikTok: the division is currently hiring for multiple roles, including an AI Product Operation Manager in San Jose who, according to the job spec, will be responsible for “the implementation of audio and music AI technologies in TikTok”.

The division is also hiring for a Lead Research Scientist, Foundation Model, Music Intelligence in San Jose, who will be required to “conduct cutting-edge machine learning research and development in music understanding and generation” and then “transfer advanced technologies to ByteDance products”.

In December 2023, SAMI submitted a research paper titled ‘StemGen: A music generation model that listens’, describing what is, in effect, a stem generator.

According to the description of the project on its demo page, StemGen is an “end-to-end music generation model, trained to listen to musical context and respond appropriately”.

The research paper explains that StemGen was trained on the Slakh dataset, which consists of 145 hours of synthetic musical audio separated into stems.

StemGen was also trained on what ByteDance’s researchers say was an internal dataset of 500 hours of licensed music.

According to the summary of the research paper, “End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently”.

It adds: “However, most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context.

“We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture and present a number of novel architectural and sampling improvements.”

ByteDance’s researchers claim that “the resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context”.

‘Efficient Neural Music Generation’

In a separate research paper, submitted for review in May 2023, ByteDance’s SAMI team describes its work on what it calls ‘Efficient Neural Music Generation’.

In the paper, which you can read here, ByteDance’s researchers present a model called MeLoDy (M for music; L for LM; D for diffusion), described as “an LM-guided diffusion model that generates music audios of state-of-the-art quality”.

The researchers write: “Our experimental results suggest the superiority of MeLoDy [versus other music generators such as Google’s MusicLM], not only in its practical advantages on sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation”.

According to the research paper, MeLoDy was trained on 257,000 hours of music data, which the researchers say was filtered to focus on non-vocal music.

The model supports both music and text prompting for music generation.

You can hear examples of music generated by the MeLoDy model for yourself here.

Patent 1: ‘a computer-implemented method of generating a piece of music’

In addition to ByteDance’s work on AI-music-related research papers, the company has also been locking down patents in the field over the past few months.

The most recent of ByteDance’s music-related patents to be granted in the US is for an invention focusing on a ‘Method of generating music data’.

According to the document, which you can see for yourself here, ByteDance’s invention relates to “a computer-implemented method of generating a piece of music”.

This patent appears to home in on generating the structure of the different parts of a piece of music. As MBW readers will know, structure is a key factor in contemporary songwriting that can influence whether or not a song becomes a hit.


ByteDance explains that “the embodiments disclosed” in the patent application “provide a manner of introducing a long-term structure in machine-generated music”.

The filing continues: “Structure is a key aspect of music composed by humans that plays a crucial role in giving a piece of music a sense of overall coherence and intentionality.

“Structure appears in a piece of music as a collection of musical patterns, variations of these patterns, literal or motive repeats and transformations of sections of music that have occurred earlier in the same piece.”

The methods set out in the patent’s claims involve a machine learning (ML)-based structure generator and an ML-based melody generator.
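To make the idea of “repeats and transformations of sections” more concrete, here is a toy Python sketch. It is not taken from the patent and does not reflect how ByteDance or Jukedeck implemented anything: `generate_structure` and `generate_melody` are hypothetical stand-ins for the two ML components the filing describes, using a fixed plan and random note choices rather than machine learning. A structure plan (A, A, B, then a varied A) is turned into note sequences, with literal repeats copied and variations transposed.

```python
import random
from typing import Dict, List, Tuple

# A section plan is a list of (label, kind) pairs. "new" means compose fresh
# material, "repeat" means copy an earlier section literally, and "vary"
# means reuse an earlier section transposed up by a few semitones.
Plan = List[Tuple[str, str]]


def generate_structure() -> Plan:
    """Hypothetical stand-in for an ML structure generator: a fixed A A B A' plan."""
    return [("A", "new"), ("A", "repeat"), ("B", "new"), ("A", "vary")]


def generate_melody(num_notes: int, rng: random.Random) -> List[int]:
    """Hypothetical stand-in for an ML melody generator: random C-major MIDI pitches."""
    c_major = [60, 62, 64, 65, 67, 69, 71, 72]
    return [rng.choice(c_major) for _ in range(num_notes)]


def render(plan: Plan, num_notes: int = 4, seed: int = 0, shift: int = 2) -> List[List[int]]:
    """Turn a section plan into one list of MIDI pitches per section."""
    rng = random.Random(seed)
    composed: Dict[str, List[int]] = {}  # first version of each labelled section
    sections: List[List[int]] = []
    for label, kind in plan:
        if kind == "new":
            notes = generate_melody(num_notes, rng)
            composed[label] = notes
        elif kind == "repeat":
            notes = list(composed[label])  # literal repeat of earlier material
        elif kind == "vary":
            notes = [pitch + shift for pitch in composed[label]]  # transposed variation
        else:
            raise ValueError(f"unknown section kind: {kind!r}")
        sections.append(notes)
    return sections


if __name__ == "__main__":
    plan = generate_structure()
    for (label, kind), notes in zip(plan, render(plan)):
        print(label, kind, notes)
```

The point of separating the two steps is that the long-range shape of the piece is decided first, so later sections can deliberately echo earlier ones instead of each bar being generated independently.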

Interestingly, this patent appears to have previously been assigned to Jukedeck, the UK-born AI company ByteDance acquired in 2019.

Among the patent’s inventors are Jukedeck founder Ed Newton-Rex and former Jukedeck researcher Gabriele Medeot, who is now a Senior Machine Learning Researcher at TikTok.

The US application for the patent was filed in February 2019, and the patent was granted on January 30 this year.

Patent 2: ‘Modular automated music production server’

ByteDance also owns a patent in the United States for a ‘Modular automated music production server’, which appears to have been developed by and previously assigned to Jukedeck.

According to the filing: “Automated music production based on artificial intelligence (AI) is an emerging technology with significant potential. Research has been conducted into training AI systems, such as neural networks, to compose original music based on a limited number of input parameters.

“Whilst this is an exciting area of research, many of the approaches developed to date suffer from problems of flexibility and quality of the musical output, which in turn limits their usefulness in a practical context.”

It adds: “One aim of this disclosure is to provide an automated music production system with an improved interface that allows flexible and sophisticated interaction with the system. This opens up new and exciting use cases where the system can be used as a creative tool for musicians, producers and the like in a way that suits their individual needs and preferences.”

This automated music production system is described by ByteDance in the filing as the “Jukedeck system” which “use[s] AI to compose and/or produce original music”.

ByteDance’s US application for the patent was granted in March 2023. According to Google Patents, ByteDance also has active patents for this invention in Japan and China.


According to the filing, which you can read in full here, “The Jukedeck system incorporates a full-stack, cloud-based music composer that addresses the complexities historically associated with AI and music”.

It adds: “This technology is based on advanced music theory and combines neural networks in novel ways to compose and produce unique, professional quality music in a matter of seconds.”

News of ByteDance’s extensive work in AI music arrives amid Universal Music Group’s public fallout with ByteDance’s flagship app, TikTok.

On March 1, Universal Music Publishing’s catalog of ~4 million songs became unlicensed for use on TikTok, joining UMG’s portfolio of ~3 million recordings, whose license on TikTok expired (so far without renewal) on February 1.

In a statement issued to UMPG’s songwriters on February 29, the company turned much of its attention to the role AI-generated audio is playing on TikTok.

UMPG claimed that, so far, TikTok has not provided Universal with any assurances that the platform won’t train its AI models on the music company’s songs.

In addition, UMPG raised the specter of TikTok potentially using AI music to push down the market share (and therefore the earnings potential) of copyrighted/licensed music on the platform.

MBW has been discussing the hypothetical potential for TikTok and other services to stuff their catalogs with AI-made music – diluting the market share of traditional rightsholders – for some time.

In February last year, we published an ‘MBW Reacts’ article asking if TikTok could pull off a “heist” on the music industry in this regard, following its aggressive investment in generative AI technology.

The “heist” we were referring to: Using licensed music as a cornerstone in the rise of TikTok to well over a billion users globally, before using first-party, AI-created songs to crowd out music owned by traditional music rightsholders on the platform.

We wrote: “With music playing such a key role in TikTok’s rise, if major label content does disappear from the platform – and the gap is somehow successfully filled by indie and AI-driven creations – TikTok could be said to have pulled off one of the biggest heists in music business history. A bait and switch for a billion users.”
