Context

as of today we've finished with the tts reading of the first amber chronicles book :blobcatread:

somewhere about halfway in i discovered i could run some tools to clean up the voice samples, so there's a huge quality jump :blobcatlaugh:
@kaia unar to unbundle the epubs
firefox to open the manuscript
xclip to paste the manuscript
sed to remove smart quotes
kakoune to manually tag all lines with quotes and then manually reformat to a fountain script
fountain_tools in python to parse the resulting fountain script, match them to json parameters for the character, and output a json file for each line of dialogue
curl to hurl the json file up at the server
chatterbox-tts to actually render voice lines from json
ninja to treat it as a build problem: curl + json file = mp3, pool of 1, the build is just making the request and receiving the audio
ffmpeg to combine everything, do dynaudnorm
yt-dlp to pull clips off youtube
audacity to trim down to 30 seconds and remove as much irrelevant audio as possible
resemble enhance to clean away noise and youtube degradation
matchering (run via UVR) to match the mastering of resemble'd clips together to one radio voice
just is used as a task runner (i'm ambivalent between this, lets-cli and taskfile, but i can remember just's syntax for basic scripts and the yaml shit i have to look up every time)
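
for anyone who wants the shape of the middle of that pipeline, here's a rough python sketch of the smart quote cleanup, the script-to-json step and the ninja file. it's not the real setup: the parser is a hand-rolled stand-in that only understands the simplest fountain dialogue blocks (it doesn't use fountain_tools), and the character names, voice parameters, file layout and server url are all made up for illustration.

```python
import re

# map typographic quotes to ascii (stands in for the sed step)
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

# hypothetical per-character settings; the real keys depend on
# whatever json the tts server actually accepts
VOICES = {
    "CORWIN": {"voice_sample": "corwin.wav", "exaggeration": 0.5},
    "NARRATOR": {"voice_sample": "narrator.wav", "exaggeration": 0.3},
}


def normalize_quotes(text):
    """replace curly quotes with straight ones."""
    return text.translate(SMART_QUOTES)


def parse_dialogue(script):
    """yield (character, text) pairs from a minimal fountain-like script.

    only the simplest case is handled: a block separated by blank lines
    whose first line is an all-caps character cue, followed by dialogue.
    parentheticals, scene headings etc. are not treated specially.
    """
    for block in re.split(r"\n\s*\n", script.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # a lone line is action or a stray cue, not dialogue
        cue = lines[0]
        if cue != cue.upper() or not any(c.isalpha() for c in cue):
            continue  # first line is not an all-caps cue
        yield cue, " ".join(lines[1:])


def build_requests(script):
    """turn a script into one json-ready dict per line of dialogue."""
    requests = []
    pairs = parse_dialogue(normalize_quotes(script))
    for index, (character, text) in enumerate(pairs, start=1):
        params = VOICES.get(character, VOICES["NARRATOR"])
        requests.append(
            {"id": f"{index:04d}", "character": character, "text": text, **params}
        )
    return requests


def ninja_file(requests, url="http://localhost:8000/tts"):
    """render a build.ninja where each mp3 is built from one json file.

    the pool with depth 1 keeps ninja from sending more than one
    request to the server at a time. the url is a placeholder.
    """
    lines = [
        f"url = {url}",
        "",
        "pool tts",
        "  depth = 1",
        "",
        "rule render",
        '  command = curl -sf -H "Content-Type: application/json" '
        "--data-binary @$in -o $out $url",
        "  pool = tts",
        "",
    ]
    for req in requests:
        lines.append(f"build out/{req['id']}.mp3: render lines/{req['id']}.json")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = "CORWIN\n\u201cWhere am I?\u201d\n\nNARRATOR\nHe didn\u2019t know.\n"
    print(ninja_file(build_requests(demo)))
```

each dict from `build_requests` would get dumped to `lines/<id>.json` before running ninja. the nice part of the build-system framing is that a rerun only re-renders lines whose json changed.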

Replies

@lain @kaia i use kokoro sometimes.

piper is the fastest outside of old HMM models, though. i think most of the voice quality issue there is because librivox is just a bad dataset. the speakers sound dead inside to begin with and the audio isn't preprocessed.