Text-To-Speech

Two-Stage Pipeline

Generate an intermediate representation at a lower resolution than the audio
1. e.g. Mel-Spectrograms, Linguistic Features, STFT
Synthesize the raw audio waveform from the intermediate representation
1. cf. Vocoder
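
The two stages above can be sketched with a toy example. This is only an illustration of the data flow and of the resolution gap between frames and samples, not a real TTS system: `toy_acoustic_model`, `toy_vocoder`, the (f0, amplitude) frame format, and the values of `SAMPLE_RATE` and `HOP_LENGTH` are all hypothetical choices made for this sketch. A real pipeline would typically use mel-spectrogram frames and a neural vocoder instead.

```python
import math

SAMPLE_RATE = 16_000   # audio samples per second (assumed for this sketch)
HOP_LENGTH = 200       # audio samples covered by one frame (assumed)
FRAMES_PER_CHAR = 5    # toy duration model: every character lasts 5 frames


def toy_acoustic_model(text):
    """Stage 1 (toy): map text to frame-level features (f0 in Hz, amplitude).

    Hypothetical stand-in for a real acoustic model. Each character gets a
    pitch derived from its code point; whitespace becomes silent frames.
    """
    frames = []
    for ch in text:
        if ch.isspace():
            f0, amp = 0.0, 0.0
        else:
            f0, amp = 110.0 + 5.0 * (ord(ch) % 32), 0.5
        frames.extend([(f0, amp)] * FRAMES_PER_CHAR)
    return frames


def toy_vocoder(frames):
    """Stage 2 (toy): synthesize a raw waveform from frame-level features.

    A sine oscillator that holds each frame's pitch and amplitude for
    HOP_LENGTH samples, keeping the phase continuous across frames.
    """
    waveform = []
    phase = 0.0
    for f0, amp in frames:
        for _ in range(HOP_LENGTH):
            phase += 2.0 * math.pi * f0 / SAMPLE_RATE
            waveform.append(amp * math.sin(phase))
    return waveform


frames = toy_acoustic_model("hi there")   # low-resolution intermediate representation
audio = toy_vocoder(frames)               # high-resolution raw waveform
print(len(frames), len(audio))            # prints "40 8000"
```

Here 40 frames expand into 8,000 samples, so the intermediate representation has 200 times fewer time steps than the audio. That gap is what lets the first stage work on short sequences and leaves sample-level detail to the vocoder.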