The AI Today
Machine-Learning

Welcome to the New Saga Introduced by MusicLM: This AI Model Can Generate Music from Text Descriptions

February 3, 2023 (Updated: February 3, 2023)
There has been an explosion of generative AI models in the last couple of months. We've seen models that can generate realistic images from text prompts (looking at you, Stable Diffusion), text generation on a given topic (now looking at you, ChatGPT and GPT-3), video generation from text inputs (now it's your turn, Make-A-Video), and more. The advancement was so fast that, at some point, we thought the curtain between reality and virtual reality was almost coming down.

We are still not done with visual and textual generation models. They still have a long way to go until they reach a point where it is no longer possible to differentiate AI-generated content from human-generated content. Until then, let us sit back and enjoy the beautiful progress.

Speaking of progress, people keep coming up with other text-to-X use cases. We've seen numerous models targeting text-to-image, text-to-video, text-to-speech, etc. Now, get ready for the next saga of text-to-X models: text-to-music.


The task of generating audio from a certain condition is called conditional neural audio generation. Such tasks include text-to-speech, lyrics-conditioned music generation, and audio synthesis from MIDI sequences. Most of the existing work in this domain relies on temporally aligning the source signal, which is the condition, with the corresponding audio output.

On the other hand, some studies were inspired by the success of text-to-image models and explored generating audio from more generic captions like "melodic techno with waves hitting the shore." However, these models were limited in their generation capability and could only generate simple acoustic sounds for just a few seconds. So, we still have the open challenge of generating a rich audio sequence with long-term consistency and many stems, similar to a music clip, given a single text caption. Well, let's just say it looks like the challenge is close to being closed now, thanks to MusicLM.

Treating audio generation as a language modeling task, using a hierarchy of simple-to-complex audio units that play the role of words in a sentence, makes the audio sound better and more consistent over time. Recent models used this approach, and MusicLM follows the same trend. However, the biggest challenge here is to assemble a proper large-scale dataset.
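
To make the "audio as a sequence of units" idea concrete, here is a minimal sketch of turning a waveform into discrete tokens that a language model could be trained on. It is a toy under stated assumptions, not MusicLM's tokenizer: the two frame features (energy and zero-crossing rate), the three-entry `CODEBOOK`, and all function names are hypothetical choices made only for illustration.

```python
import math

SAMPLE_RATE = 24_000  # Hz, matching the output rate mentioned in the article
FRAME_SIZE = 240      # 10 ms frames at 24 kHz (an arbitrary choice for this toy)

# Hypothetical codebook: each entry is (RMS energy, zero-crossing rate).
# 0 = silence, 1 = low-pitched tone, 2 = high-pitched tone.
CODEBOOK = [(0.0, 0.0), (0.7, 0.02), (0.7, 0.2)]


def make_sine(freq_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Generate a mono sine wave as a list of floats in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def frame_signal(samples, frame_size=FRAME_SIZE):
    """Split samples into non-overlapping frames, dropping an incomplete tail."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def frame_feature(frame):
    """Describe one frame by its RMS energy and its zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))


def quantize(feature, codebook=CODEBOOK):
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    def distance(k):
        return sum((f - c) ** 2 for f, c in zip(feature, codebook[k]))
    return min(range(len(codebook)), key=distance)


def tokenize(samples):
    """Map a waveform to a sequence of discrete token ids, one per frame."""
    return [quantize(frame_feature(frame)) for frame in frame_signal(samples)]


# Two frames each of silence, a 100 Hz tone, and a 3 kHz tone.
audio = [0.0] * 480 + make_sine(100, 480) + make_sine(3000, 480)
tokens = tokenize(audio)
print(tokens)
```

Once audio is a list of integers like this, generating music becomes "predict the next token", which is the same problem language models already solve for text. Real systems learn their codebooks from data and use far richer units than this hand-written one.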

When it comes to text-to-image, we have many massive datasets that contributed a lot to the significant developments of recent years. This kind of dataset is missing for the text-to-audio task, making it really difficult to train large-scale models. Also, preparing text captions for music is not as straightforward as image captioning. It is difficult to capture the salient characteristics of acoustic scenes or music with just a few words. How can you describe all those vocals, rhythms, instruments, etc.? Also, audio is continuous; it does not have a stable structure the way an image does. This makes sequence-wide captions a much weaker level of annotation for audio.

MusicLM solves this problem by using an existing model, MuLan, that is trained to project music and its corresponding text description close to each other in a shared embedding space. Because MuLan can embed audio directly into this space, captions are not needed during the training phase, so MusicLM can be trained on audio-only data. Overall, MusicLM uses MuLan embeddings computed from the audio during training and MuLan embeddings computed from the text during inference.
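
The train-on-audio, infer-on-text swap can be sketched in a few lines. Everything below is a hypothetical stand-in rather than MuLan or MusicLM: the three-axis `CONCEPTS` space, the `embed_audio` and `embed_text` "towers", and the `ToyConditionalModel` that simply retrieves the closest training clip instead of generating new audio. The only point is to show why a shared embedding space removes the need for captions at training time.

```python
import math

# Hypothetical axes of a toy shared embedding space.
CONCEPTS = ("techno", "piano", "waves")


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return tuple(v / norm for v in vec) if norm else tuple(vec)


def embed_text(caption):
    """Toy text tower: count how often each concept word appears in the caption."""
    words = caption.lower().split()
    return normalize([float(words.count(c)) for c in CONCEPTS])


def embed_audio(clip):
    """Toy audio tower: read concept strengths that a real encoder would infer."""
    return normalize([clip["content"].get(c, 0.0) for c in CONCEPTS])


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class ToyConditionalModel:
    """Stores (embedding, clip) pairs; 'generates' by returning the closest clip."""

    def __init__(self):
        self.memory = []

    def train(self, clips):
        # Training sees audio only: the conditioning comes from embed_audio,
        # so no caption is ever required.
        for clip in clips:
            self.memory.append((embed_audio(clip), clip["name"]))

    def generate(self, conditioning):
        # At inference, any vector in the shared space works as conditioning,
        # including one computed from text.
        return max(self.memory, key=lambda item: cosine(item[0], conditioning))[1]


clips = [
    {"name": "club_track", "content": {"techno": 0.9, "waves": 0.1}},
    {"name": "sonata", "content": {"piano": 1.0}},
    {"name": "beach_rave", "content": {"techno": 0.6, "waves": 0.7}},
]
model = ToyConditionalModel()
model.train(clips)
print(model.generate(embed_text("melodic techno with waves hitting the shore")))
```

The model never saw a caption, yet the text prompt still selects a sensible clip, because text and audio land in the same space. A real system would feed the conditioning vector into a generative model rather than a lookup table.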

MusicLM is the starting point of a new era of text-to-music. It is trained on a large-scale unlabeled music dataset. It can generate long and coherent music at 24 kHz from complex text descriptions. The authors also propose an evaluation dataset named MusicCaps, which contains music descriptions written by experts and can be used to evaluate upcoming text-to-music models.


Check out the Paper and Project. All credit for this research goes to the researchers on this project.



Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.


Copyright © MetaMedia™ Capital Inc, All right reserved
