Learn how deep learning transformed text-to-speech forever. In this video, we explore the neural revolution that began in 2016 with WaveNet and Tacotron, and how these breakthroughs reshaped speech synthesis into what we know today.
Discover how end-to-end learning, learned representations, and neural vocoders made synthetic voices sound natural, expressive, and human-like — paving the way for modern systems like FastSpeech, VITS, and VALL-E.
This is Video 6 in The Monster Text-to-Speech and Voice Cloning Course, a lecture series designed to give you a deep understanding of state-of-the-art concepts in speech synthesis.
🎯 KEY TOPICS:
- The 2016 breakthrough: WaveNet and Tacotron
- How end-to-end learning changed speech synthesis
- Why neural networks replaced manual feature design
- The 2-stage TTS pipeline: acoustic model + vocoder
- How mel spectrograms bridge text and audio
- Neural vocoders: WaveNet, WaveGlow, HiFi-GAN
- Sequence-to-sequence models with attention
- Parallel TTS architectures: FastSpeech, GlowTTS
- How neural TTS enabled voice cloning and expressiveness
- The rise of codec-based generation (VALL-E, AudioLM, SPEAR-TTS)
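As a small illustration of the mel spectrogram topic above, the sketch below converts between hertz and mels and places band edges evenly on the mel scale. It assumes the commonly used formula m = 2595 * log10(1 + f / 700); the function names and the default band count are hypothetical choices for this example and are not taken from the lecture.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands=8, f_min=0.0, f_max=8000.0):
    """Return n_bands + 2 edge frequencies (Hz), evenly spaced on the mel scale.

    Hypothetical helper: neighbouring triples of edges would define the
    triangular filters of a mel filterbank.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    for edge in mel_band_edges():
        print(f"{edge:8.1f} Hz")
```

The edges are dense at low frequencies and sparse at high frequencies, which is the perceptual warping that makes mel spectrograms a compact intermediate target between the acoustic model and the vocoder.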
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
0:59 The deep learning breakthrough
6:16 Core neural innovations
10:25 2-stage neural pipeline
16:34 WaveNet
19:40 Tacotron
23:55 What makes neural TTS work
26:47 Parallel neural generation
28:36 Unlocking voice cloning
29:40 Modern TTS architectures
30:40 End-to-end
31:51 Codec-based voice cloning
35:06 Open challenges
38:53 Takeaways
Learn about traditional text-to-speech techniques before the rise of neural networks in 2016.
Explore formant synthesis, concatenative synthesis, and statistical parametric (HMM-based) synthesis—the methods that paved the way for modern neural TTS.
This is video 5 in The Monster Text-to-Speech and Voice Cloning Course, a lecture series designed to give you a deep understanding of state-of-the-art concepts in speech synthesis.
🎯 KEY TOPICS:
- The evolution of speech synthesis before deep learning
- How formant synthesis modeled the vocal tract
- How concatenative synthesis stitched recorded speech units
- The rise of HMM-based (parametric) synthesis
- Why pre-neural voices sounded robotic or over-smoothed
- How these classic methods paved the way for neural TTS
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
4:05 Formant synthesis
7:41 Formant: Pros and cons
12:18 Concatenative synthesis
13:41 Diphone concatenation
15:10 Unit selection
25:20 Concat: Pros and cons
27:55 Statistical parametric synthesis (HMM)
38:57 HMM-based TTS: Pros and cons
42:32 Comparing traditional TTS
In this video, I explain the intuition behind how Text-to-Speech and Voice Cloning models work—and how they differ.
This is the fourth video in The Monster Text-to-Speech and Voice Cloning Course, a lecture series designed to give you a deep understanding of state-of-the-art concepts in speech synthesis.
🎯 KEY TOPICS:
- The intuition behind how AI generates speech
- The difference between TTS and voice cloning
- Use cases for TTS vs. voice cloning
- How these technologies work under the hood
- Zero-shot and few-shot voice cloning, plus fine-tuning
- The tradeoffs between data, speed, and quality
- What speaker embeddings are and why they matter
- How AI captures voice identity: timbre, accent, rhythm, prosody
- The key ethical aspects to consider
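To make the speaker embedding topic above more concrete, here is a minimal sketch of how two embeddings can be compared. It assumes embeddings are plain fixed-length lists of floats and that cosine similarity is the comparison metric, which is a common choice but not something the description specifies. The speaker names and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar_speaker(query, enrolled):
    """Return the name of the enrolled speaker whose embedding is closest to query.

    `enrolled` maps a speaker name to that speaker's embedding (hypothetical data).
    """
    return max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real systems use hundreds of dimensions.
    enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}
    print(most_similar_speaker([0.8, 0.2, 0.1], enrolled))
```

In a zero-shot cloning setup, a vector like `query` would come from a short reference clip and would condition the synthesizer rather than select a name, but the underlying idea of "nearby vectors mean similar voices" is the same.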
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
0:48 What's TTS?
5:01 What's voice cloning?
8:56 TTS vs voice cloning
12:17 Voice adaptation spectrum
14:02 Zero-shot
15:14 Few-shot
16:40 Fine-tuning
17:54 Training from scratch
19:17 Speaker embeddings
22:18 How do zero- and few-shot work?
24:18 How does fine-tuning work?
27:47 Quality vs data tradeoff
29:18 Voice cloning products
32:08 Ethical considerations
35:10 Responsible use
37:08 Takeaways
Before AI can speak, it needs to read. This lecture explains how TTS systems process raw text into phonemes - handling everything from numbers and abbreviations to the tricky problem of words that look the same but sound different.
This is the third video in The Monster Text-to-Speech and Voice Cloning Course, a lecture series designed to give you a deep understanding of state-of-the-art concepts in speech synthesis.
🎯 KEY TOPICS:
- Text normalization: standardizing raw text
- Grapheme-to-Phoneme (G2P) conversion
- Rule-based approaches (dictionaries + fallback rules)
- Learned approaches (seq2seq models)
- The homograph problem and ambiguity resolution
- Tools: CMUDict, Phonemizer, DeepPhonemizer, g2p_en
- Modern end-to-end TTS that learns text processing implicitly
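The rule-based side of the topics above (normalization, then dictionary lookup with a fallback) can be sketched in a few lines. This is a toy illustration under stated assumptions: the lexicon entries, the abbreviation table, and the letter-by-letter fallback are hypothetical stand-ins, and nothing here calls CMUDict, Phonemizer, DeepPhonemizer, or g2p_en.

```python
import re

# Toy pronunciation dictionary with ARPAbet-style symbols (hypothetical entries,
# not loaded from CMUDict).
LEXICON = {
    "i": ["AY"],
    "have": ["HH", "AE", "V"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
    "doctor": ["D", "AA", "K", "T", "ER"],
}

ABBREVIATIONS = {"dr.": "doctor"}

DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize(text):
    """Lowercase, expand known abbreviations and single digits, strip punctuation.

    Multi-digit numbers are left untouched; a real normalizer would verbalize them.
    """
    tokens = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        token = re.sub(r"[^a-z0-9]", "", token)
        token = DIGITS.get(token, token)
        if token:
            tokens.append(token)
    return tokens


def g2p(tokens):
    """Dictionary lookup with a crude fallback for out-of-vocabulary words."""
    phonemes = []
    for word in tokens:
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            # Placeholder fallback: emit the spelling itself, one symbol per letter.
            # Real systems use letter-to-sound rules or a learned seq2seq model here.
            phonemes.extend(ch.upper() for ch in word)
    return phonemes


if __name__ == "__main__":
    print(g2p(normalize("I have 2 cats.")))
```

A lookup like this cannot resolve homographs, since a single spelling maps to a single entry; that is the ambiguity problem that context-aware or learned approaches address.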
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
0:12 TTS pipeline
2:20 Text processing
5:00 Normalization
7:31 Normalization tools
9:55 Grapheme-to-phoneme
14:36 Rule-based G2P
19:20 Learned G2P
24:07 Ambiguity problem
33:20 Modern end-to-end TTS
35:38 G2P tools
38:17 Takeaways
Before building AI speech systems, we need to understand how humans actually produce speech. This lecture breaks down the biology and physics of human voice production - from thoughts to sound waves - and explains why creating realistic text-to-speech is so challenging.
This is the second video in The Monster Text-to-Speech and Voice Cloning Course, a lecture series designed to give you a deep understanding of state-of-the-art concepts in speech synthesis.
🎯 KEY TOPICS:
- The speech pipeline: Thought → Language → Phonemes → Articulation → Sound
- Phonemes and the International Phonetic Alphabet (IPA)
- Source-filter model of speech production
- Vocal folds, formants, and resonance
- Prosody: the rhythm and melody of speech
- Timbre: what makes every voice unique
- Coarticulation and why context matters
- How all of this maps to AI TTS systems
🔬 WHY THIS MATTERS:
Understanding human speech is essential for building realistic TTS. Modern neural vocoders and voice cloning models replicate these biological processes - the source-filter model directly informs how AI generates speech.
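As a rough sketch of the source-filter idea mentioned above, the code below drives a cascade of two-pole resonators (the "filter", standing in for vocal tract formants) with an impulse train (the "source", standing in for glottal pulses). The pitch, formant frequencies, and bandwidths are illustrative values assumed for this example, and all function names are hypothetical.

```python
import math


def glottal_source(f0_hz, sample_rate, n_samples):
    """Impulse train with one pulse per pitch period (a crude glottal source)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator approximating a single formant.

    Implements y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2] with poles at radius r
    and angle 2 * pi * freq / sample_rate.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz=120.0, formants=((700.0, 130.0), (1220.0, 70.0)),
                     sample_rate=16000, duration_s=0.05):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    n_samples = int(sample_rate * duration_s)
    signal = glottal_source(f0_hz, sample_rate, n_samples)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


if __name__ == "__main__":
    samples = synthesize_vowel()
    print(len(samples), max(abs(s) for s in samples))
```

Changing `f0_hz` alters the perceived pitch without moving the formants, while changing `formants` alters the vowel quality without changing pitch. That separation of source and filter is what the model is named for.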
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
1:11 Human vs machine speech pipeline
3:32 Language
5:31 Phonemes
8:30 International Phonetic Alphabet
13:14 English phonetic chart
14:55 Phonetic transcription
16:20 Coarticulation
18:53 Prosody
21:34 Timbre
25:19 Source-filter model of speech production
30:12 Glottal sound
33:01 More source-filter model
34:42 Formants
40:22 Emotion and expressivity
42:22 Speech is multilayered
44:35 Why is speech hard for machines?
Welcome to the most comprehensive Text-to-Speech (TTS) and Voice Cloning course on YouTube! Learn how AI systems generate realistic human speech from text. In this video, I outline the plan for the course and explain how to make the most out of it.
📚 WHAT YOU'LL LEARN:
Foundations → Core Technologies (Neural Vocoders, Audio Codecs, Voice Cloning) → Advanced Topics (Emotion, Prosody, Conversational AI)
🎯 WHO THIS IS FOR:
ML Engineers, Audio Programmers, Developers, Engineering Managers, Product Managers
CONSULTING:
🚀 AI Music + Audio Consulting: https://valeriovelardoadvisor.com/
📩 Get my AI Music content in your inbox for free: https://valeriovelardo.substack.com/
COURSE MATERIALS + DISCUSSION:
- GitHub Repository: https://github.com/musikalkemist/tts-voicecloning-course
- Join The Sound of AI Slack Community: https://valeriovelardo.com/the-sound-of-ai-community/ (#tts-course channel)
Content:
0:00 Intro
4:35 Who's this course for?
5:19 Pre-requisites
7:53 Teaching style
12:04 What you'll learn
21:21 Learning material + feedback
23:47 How to get the most out of this course
26:30 Course pace
Watch the presentations of the 3rd Generative AI Music Workshop, organized by The Sound of AI in collaboration with the Music Technology Group. Participants present the AI music technologies they developed during the workshop. The workshop was held at University Pompeu Fabra in Barcelona in December 2024.
🚀 Music Tech Advisory: https://valeriovelardoadvisor.com/
🌱 Generative AI Music Workshop: https://www.upf.edu/web/mtg/generative-music-ai-workshop
🎧 My new startup Transparent Audio: https://www.transparentaudio.ai/
✍️ My Substack on tech & society: https://valeriovelardotechandsociety.substack.com/
Content:
0:00 Intro
0:54 Sonic Palette
8:22 Binary Star
13:27 The Hive
18:24 asdf (Working Title)
23:42 ChordChemists
27:44 Synaesthesia
33:23 Balkon
37:55 WavePlay
42:47 LocomIA ChoIA
Watch the concert of the 3rd Generative AI Music Workshop, organized by The Sound of AI in collaboration with the Music Technology Group. Participants perform using the AI music technologies they developed during the workshop. The workshop was held at University Pompeu Fabra in Barcelona in December 2024.
🚀 Music Tech Advisory: https://valeriovelardoadvisor.com/
🌱 Generative AI Music Workshop: https://www.upf.edu/web/mtg/generative-music-ai-workshop
🎧 My new startup Transparent Audio: https://www.transparentaudio.ai/
✍️ My Substack on tech & society: https://valeriovelardotechandsociety.substack.com/
Content:
0:00 Intro
2:03 Binary Star
12:04 Balkon
16:25 WavePlay
20:12 The Hive
25:44 ChordChemists
29:23 asdf (Working Title)
32:57 Synaesthesia
45:03 LocomIA ChoIA
After more than a year of silence, I’m finally back. In this video, I share what’s been happening in my personal and professional life, why I stopped publishing, and what you can expect from me and this channel going forward. I’ll also tell you about some exciting new projects I’ve been working on, like my book AI Music Revolution, my startup Transparent Audio, and my new Substack on tech & society.
🚀 Music Tech Startup Advisory: https://valeriovelardoadvisor.com/
🎧 My new startup Transparent Audio: https://www.transparentaudio.ai/
✍️ My Substack on tech & society: https://valeriovelardotechandsociety.substack.com/
Content:
0:00 Intro
1:02 Personal update
1:58 Workshop on Generative AI Music
3:45 Adjunct Professor of Gen AI Music at MTG
5:21 Writing AI Music Revolution
10:13 Substack on tech & society
10:50 Consultant out, advisor in
14:03 New startup: Transparent Audio
16:16 What next?
I introduce 5 open-source generative music AI models that span both the symbolic and audio domains. You can use them to streamline the development of your AI music systems.
Interested in music AI consulting?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
======================================
MusicGen
Website: https://ai.honu.io/papers/musicgen/
Github: https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
MuseCoco
Website: https://ai-muzic.github.io/musecoco/
Github: https://github.com/microsoft/muzic/tree/main/musecoco
Museformer
Website: https://ai-muzic.github.io/museformer/
Github: https://github.com/microsoft/muzic/tree/main/museformer
RAVE
Github: https://github.com/acids-ircam/RAVE
MusicAgent
Article: https://arxiv.org/pdf/2310.11954
Github: https://github.com/microsoft/muzic/tree/main/musicagent
======================================
Check The AI Leader's Blueprint website:
https://the-sound-of-ai-academy.teachable.com/p/the-ai-leader-s-blueprint
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 Intro
1:08 MusicGen
6:36 MuseCoco
11:40 Museformer
16:37 RAVE
19:19 MusicAgent
I implemented an AI music critic. It turned out to be funnier than I expected.
Interested in music AI advisorship?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
Code of the AI music critic on GitHub:
https://github.com/musikalkemist/ai-music-critic
======================================
Check The AI Leader's Blueprint website:
https://the-sound-of-ai-academy.teachable.com/p/the-ai-leader-s-blueprint
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 Intro
0:50 Why build an AI critic
1:26 How humans judge music
2:58 How AI understands music
3:38 Designing the AI critic
3:51 Essentia
4:26 Implementation
7:48 Running musiccritic
9:08 AI critiques 22 by Taylor Swift
11:50 Differences between human and AI critic
12:56 How to improve the AI critic
80% of AI projects fail. Learn how to maximise your success using a proven AI project lifecycle that emphasizes customer discovery, strategic planning, and iterative development.
Check The AI Leader's Blueprint website:
https://the-sound-of-ai-academy.teachable.com/p/the-ai-leader-s-blueprint
======================================
Interested in music AI advisorship?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 Intro
0:38 Challenges with AI projects
1:05 My mantra
1:28 Customer discovery
3:24 Planning
4:48 Iterative development
5:41 Proof of Concept
6:12 Minimum viable product
6:44 The AI Leader's Blueprint
8:18 Full product
8:58 AI lifecycle in a nutshell
OpenAI released Sora, a state-of-the-art text-to-video deep learning model. In this video, I provide a complete breakdown of Sora. You can learn what Sora can do, how it works, its limitations, its emergent properties, its impact, and the safety threats it brings.
Check The AI Leader's Blueprint website:
https://the-sound-of-ai-academy.teachable.com/p/the-ai-leader-s-blueprint
======================================
Interested in music AI consulting?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 Intro
1:12 10K-foot overview
2:21 Use cases
4:17 How it works
11:10 The AI Leader's Blueprint
12:23 Emergent properties
17:24 Is Sora a world model?
21:49 Limitations
23:59 Impact
28:02 Safety
29:30 Ethical problems
"The AI Leader's Blueprint" is a 3-day crash course on Zoom that helps you become an AI leader.
Check The AI Leader's Blueprint website:
https://the-sound-of-ai-academy.teachable.com/p/the-ai-leader-s-blueprint
======================================
Interested in music AI consulting?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 The AI Leader's Blueprint
0:43 What it takes to be an AI leader
2:17 Course focus
3:06 Who's the course for?
3:35 Teaching style
4:23 Course schedule
4:45 Course website
5:37 What you'll learn
8:45 Why should you attend this course?
10:25 Students' feedback
11:33 What you'll get
Learn about text-to-music generation with Mustango.
Get the lecture slides:
https://github.com/musikalkemist/generativemusicaicourse/blob/main/22.%20Text-to-music%20generation%20with%20Mustango/Slides/22.%20Text-to-music%20generation%20with%20Mustango.pdf
Iran's YouTube channel:
https://www.youtube.com/@iran-r-roman/videos
Website of the Generative Music AI Workshop in Barcelona:
https://www.upf.edu/web/mtg/generative-music-ai-workshop
Sign up to The Sound of AI Slack Community to join the discussion:
https://valeriovelardo.com/the-sound-of-ai-community/
======================================
Interested in music AI consulting?
https://thesoundofai.com/consulting.html
Interested in music AI recruitment?
https://thesoundofai.com/recruitment.html
Become a Python ninja with my Advanced Python Programming course:
https://the-sound-of-ai-academy.teachable.com/p/advanced-python-programming
Connect with Valerio on LinkedIn:
https://www.linkedin.com/in/valeriovelardo
Follow Valerio on Twitter:
https://twitter.com/musikalkemist
======================================
Content
0:00 Intro
1:00 MusicBench dataset
6:10 Architecture
9:34 Sample generated music
13:48 Generating music with a prompt
17:44 A word from Valerio