Egregoros · Phoenix Framework

Q.U.I.N.N.

@icedquinn@blob.cat remote

as of today we've finished with the tts reading of the first amber chronicles book

somewhere about halfway in i discovered i could run some tools to clean up the voice samples, so there's a huge quality jump

kaia

@kaia@pleroma.soykaf.com remote

@icedquinn may I ask what software you are using? is it open sauce?

Q.U.I.N.N.

@icedquinn@blob.cat remote

@kaia unar to unbundle the epubs
firefox to open the manuscript
xclip to paste the manuscript
sed to remove smart quotes
kakoune to manually tag all lines with quotes and then manually reformat to a fountain script
fountain_tools in python to parse the resulting fountain script, match them to json parameters for the character, and output a json file for each line of dialogue
curl to hurl the json file up at the server
chatterbox-tts to actually render voice lines from json
ninja to treat it as a build problem: curl + json file = mp3, pool of 1, the build is just making the request and receiving the audio
ffmpeg to combine everything, do dynaudnorm
yt-dlp to pull clips off youtube
audacity to trim down to 30 seconds and remove as much irrelevant audio as possible
resemble enhance to clean away noise and youtube degradation
matchering (run via UVR) to match the mastering of resemble'd clips together to one radio voice
just is used as a task runner (i'm ambivalent between this, lets-cli and taskfile, but i can remember just's syntax for basic scripts and the yaml shit i have to look up every time)
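
A rough sketch of the middle of that pipeline (smart quotes, script parsing, per-line JSON, ninja manifest), for anyone who wants to see the shape of it. This is not the poster's actual code: the parser is a simplified stand-in rather than fountain_tools, the JSON keys (`text`, `voice`), the file naming, and the server URL are all hypothetical, and the real request format of the TTS server is not given in the thread.

```python
import json
import re

# Curly quotes mapped to ASCII ones (the "sed to remove smart quotes" step).
SMART_QUOTES = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)


def normalize_quotes(text):
    return text.translate(SMART_QUOTES)


def parse_dialogue(script):
    """Simplified stand-in for a Fountain parser: an all-caps line names the
    speaker, and the non-blank lines right after it are that speaker's lines.
    Anything else (action text) is skipped."""
    result = []
    speaker = None
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            speaker = None  # a blank line ends the dialogue block
        elif speaker is None and re.fullmatch(r"[A-Z][A-Z0-9 .'-]*", line):
            speaker = line
        elif speaker is not None:
            result.append((speaker, line))
    return result


def build_payloads(dialogue, voices, default_voice):
    """Pair every dialogue line with its character's parameters.
    Returns (file name, payload dict) tuples, one per line of dialogue."""
    payloads = []
    for index, (speaker, text) in enumerate(dialogue, start=1):
        params = voices.get(speaker, default_voice)
        payloads.append((f"{index:04d}.json", {"text": text, **params}))
    return payloads


def ninja_manifest(json_names, url):
    """Emit a build.ninja where each mp3 is 'curl + json file', pool of 1."""
    out = [
        "pool tts",
        "  depth = 1",
        "rule synth",
        "  command = curl -sf -H 'Content-Type: application/json' "
        "--data-binary @$in -o $out " + url,
        "  pool = tts",
    ]
    for name in json_names:
        out.append(f"build {name[:-5]}.mp3: synth {name}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    script = normalize_quotes("ALICE\n\u201cWho goes there?\u201d\n\nBOB\nOnly me.\n")
    voices = {"ALICE": {"voice": "alice.wav"}}  # hypothetical parameters
    payloads = build_payloads(parse_dialogue(script), voices, {"voice": "narrator.wav"})
    for name, payload in payloads:
        print(name, json.dumps(payload))
    # hypothetical endpoint; substitute whatever the TTS server exposes
    print(ninja_manifest([n for n, _ in payloads], "http://localhost:8000/tts"))
```

The pool with `depth = 1` is what keeps ninja from sending more than one request to the server at a time, and because each mp3 depends on its json file, changing a character's parameters only re-renders the lines that changed.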

Replying to @icedquinn@blob.cat

Q.U.I.N.N.

@icedquinn@blob.cat remote

@kaia @lain the last chapter of book 1 as an example. (the first half needs to be resynthesized because the voices got cleaned up and some people recast)

Replies

Replying to @icedquinn@blob.cat

kaia

@kaia@pleroma.soykaf.com remote

@icedquinn @lain that is very listenable, cool! but your process is too complicated for me

Replying to @kaia@pleroma.soykaf.com

Q.U.I.N.N.

@icedquinn@blob.cat remote

@kaia @lain that is where the quality comes from

for general trash reading, i just throw it at kokoro-tts (i made a script that shovels a text file into the python api and spits out hour long chapter files) or piper.
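
A minimal sketch of what a "shovel a text file in, get chapter files out" script could look like. It deliberately does not call the kokoro or piper APIs: `synthesize` is a hypothetical callable you would wrap around whichever engine you use, and the chapter-heading pattern and chunk size are assumptions, not anything stated in the thread.

```python
import re

# Assumes chapters start with a line like "Chapter 1" or "CHAPTER ONE".
# A body line that happens to start with "chapter ..." would also match.
CHAPTER_RE = re.compile(r"^(chapter[ \t]+\S+.*)$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text):
    """Return (title, body) pairs; anything before the first heading is dropped."""
    parts = CHAPTER_RE.split(text)
    # parts looks like [preamble, title1, body1, title2, body2, ...]
    return [
        (parts[i].strip(), parts[i + 1].strip())
        for i in range(1, len(parts) - 1, 2)
    ]


def sentence_chunks(body, max_chars=400):
    """Greedily pack whole sentences into chunks of at most max_chars,
    since most TTS engines behave better on short inputs.
    A single sentence longer than max_chars is kept whole."""
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", body.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            yield current
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        yield current


def render_chapters(text, synthesize, max_chars=400):
    """Run every chunk of every chapter through `synthesize` (hypothetical)
    and collect the results per chapter title, in reading order."""
    return {
        title: [synthesize(chunk) for chunk in sentence_chunks(body, max_chars)]
        for title, body in split_chapters(text)
    }


if __name__ == "__main__":
    book = "Preface\nChapter 1\nOne. Two.\nChapter 2\nThree.\n"
    # len() stands in for a real TTS call so the sketch runs without a model
    print(render_chapters(book, len))
```

Joining each chapter's pieces into one long file is left out on purpose, since how to do that depends on what the engine returns (raw samples, wav bytes, and so on).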

Replying to @icedquinn@blob.cat

lain

@lain@lain.com remote

@icedquinn @kaia Kokoro is really good and fast!

Replying to @lain@lain.com

Q.U.I.N.N.

@icedquinn@blob.cat remote

@lain @kaia i use kokoro sometimes.

piper is the fastest outside of old HMM models, though. i think most of the voice quality issue there is because librivox is just a bad dataset. the speakers sound dead inside to begin with and the audio quality isn't preprocessed.

Replying to @icedquinn@blob.cat

mia (developer mode)

@mia@shrimptest.0x0.st remote

@icedquinn @lain @kaia also scientific datasets and 11 khz voice samples, name a more iconic duo

Replying to @mia@shrimptest.0x0.st

Q.U.I.N.N.

@icedquinn@blob.cat remote

@mia @lain @kaia i suspect the low khz is down to old datasets and/or low sample rates being easier to generate.