Speeeeeeech ;). (yihaa, so glad it is there, but I have a question)

Abstract · June 29, 2026, 2:39pm

Just a note: @danzen is an account I made when I was away, and I never see messages sent to it. I am @Abstract here.

Abstract · June 29, 2026, 2:54pm

At 19 minutes into the video above, it covers text to speech using a different library called Kokoro, so we will not be bringing that into the ZIM world... It would be up to you to install it and follow what he did. Not terribly pleasant, but not really that much code. If anyone gets it working, let us know; we can have a look, and perhaps it will make us reconsider.

karelrosseel82 · June 29, 2026, 8:31pm

I love it too

EducaSoft · June 29, 2026, 8:48pm

Karel, did you LISTEN to the generated speech in Belgian Dutch and in Dutch from the Netherlands? I hope I won't get nightmares from it tonight. Kokoro, Coqui: they all sound terrible. In my current project, I am pregenerating my text to speech with Google Gemini 3.1 TTS. Sure, it costs me around €2 per hour of generated audio, but it sounds absolutely fabulous.

I'm currently creating a new project (educational apps again, but this time for toddlers and preschoolers), and these are in 20 (!) different languages. I wrote the needed scripts and automation and am currently spending around €0.50 per day generating a huge number of speech files. Not only do they sound good, they even have emotion in them, and because they are pregenerated I only have to create them once. Agreed, they are not dynamic, but they are soooooo good.
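The "create them once" idea can be sketched as a small deduplication step in a pregeneration script. This is a minimal sketch under assumptions: the `Clip` shape, the `clipKey` naming scheme (language folder plus an FNV-1a hash of the text) and the `clipsToGenerate` helper are hypothetical, not EducaSoft's actual scripts, and the call to the TTS service itself is deliberately left out.

```typescript
// Hypothetical pregeneration helper: decides which clips still need TTS.
// The Clip shape and key scheme are illustrative assumptions; the actual
// call to the TTS service is intentionally not shown.

interface Clip {
  lang: string; // language/locale tag, e.g. "nl-BE" or "nl-NL"
  text: string; // sentence to be spoken
}

// 32-bit FNV-1a over UTF-16 code units; only used to build stable file names.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Same language + same text always gives the same file name.
function clipKey(clip: Clip): string {
  return `${clip.lang}/${fnv1a(clip.lang + "\u0000" + clip.text)}.mp3`;
}

// Keeps only clips whose file is not in `existing`, and drops duplicates
// inside the batch, so every sentence is generated (and paid for) once.
function clipsToGenerate(clips: Clip[], existing: Set<string>): Clip[] {
  const seen = new Set<string>(existing);
  const todo: Clip[] = [];
  for (const clip of clips) {
    const key = clipKey(clip);
    if (!seen.has(key)) {
      seen.add(key);
      todo.push(clip);
    }
  }
  return todo;
}
```

With the figures from the post, €0.50 per day at roughly €2 per hour of audio works out to about 15 minutes of new audio per day; clips already in `existing` are skipped and cost nothing. A 32-bit hash can in principle collide and wrongly skip a clip, which is acceptable for a sketch but worth a stronger hash in a real pipeline.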

Once the site launches with the first 20 activities, I'll post something here about it. Until now... not a single TTS has impressed me in Dutch (except a few ElevenLabs voices, which are acceptable, and Gemini 3.1, which is SUPERB).

EducaSoft · June 29, 2026, 8:50pm

As a little teaser... here is a small demo (in Dutch, sorry Dan)

https://www.bolleboos.be/coderen/

That sound is a dream come true for me.

karelrosseel82 · June 29, 2026, 10:21pm

Wow, very cool Dutch... not Flemish, though.
Did you test my new app?
Pet bits: drag and drop, and watch the videos...

I was looking at https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/live-api/configure-language-voice

RESPOND IN dutch. YOU MUST RESPOND UNMISTAKABLY IN dutch.

https://docs.cloud.google.com/text-to-speech/docs/gemini-tts
Try starting the example with emotion!

karelrosseel82 · June 29, 2026, 10:26pm

Found it, @Abstract:

https://www.reddit.com/r/godot/comments/1qecus0/just_made_godot_kokoro_tts_highquality_neural/

https://huggingface.co/spaces/hexgrad/Kokoro-TTS

EducaSoft · June 30, 2026, 5:28am

I tried your app.
It looks like the same concept, but in yours I didn't hear any sound.

I don't get what you meant by "not Flemish"; I never used the word "Flemish".

In education, however, there is a big difference between Belgian Dutch and the Dutch spoken in the Netherlands. Same words... but in a lot of cases completely different pronunciation and intonation. And for the Belgian market... a lot of material is in Netherlands Dutch. That's why I make that distinction.

I see you make a lot of little apps. My only thought is that many of them stay unpolished while you move on to try something else with ZIM. Fun to do, but not exactly how I work. I tend to polish my work a lot, but that could be because I have worked in education, and also for an educational publisher, for the last 30 years. For the publisher, we are used to polishing heavily.

karelrosseel82 · June 30, 2026, 6:47am

Yes, I know... until I have a job like yours where I'm paid to make apps.
Here you can see that Flemish is in v1.0 of Kokoro.

EducaSoft · June 30, 2026, 7:03am

Oh, but I don't get paid for what I make at bolleboos.be or for the next project at educasoft.be. Those are all hobby projects. Of course, I do have the reflex to polish from the times I made commercial material for the publisher.

Can you tell me the URL where you found that Kokoro info? I really can't find anything about Kokoro being multilingual, and certainly nothing about Dutch. I also tried voice-cloning text to speech. The results are not bad at all, but (logically) still not on par with Gemini 3.1 TTS.

What I want to make now is actually for the community. I don't need to earn anything from it. The only thing I DO want to earn is nice visitor stats and users, preferably in all 20 languages. That's why I even invest some money in it, even though there will be no real return on investment.

karelrosseel82 · July 1, 2026, 5:22am

I could make it work

https://share.gemini.google/0DtSS80V9JVz

newer style

fix completed

EducaSoft · July 1, 2026, 7:53am

You COULD, yes. I encourage you to give it a go. Run Kokoro in the browser. Let it download the inference models and then try it out. Let it generate (let's keep it simple) a sentence of, for example, 10 seconds, and then look at 1) the output quality, 2) the speed of generation on a general-purpose device without a big GPU, and 3) the terrible user experience.

karelrosseel82 · July 1, 2026, 7:58am

I tried, but I still get the .stream error.

So the suggestion is to go with HeadTTS (MIT).
Do you know about that?

But it seems to be tested on HTML locally / on a server?

Can you, Dan or Bart?

karelrosseel82 · July 1, 2026, 8:09am

It works! You need to download the MP3 to hear the voice speaking the text you typed!!!! I'm a happy boy, @danzen @Abstract and @EducaSoft

https://codepen.io/karelrosseel-prive/pen/MYJEzzN

EducaSoft · July 1, 2026, 8:38am

Yes, it works. It needs a 7 MB initial download plus 32.5 MB of resources. On a fast 4G connection, that is 10+ seconds of load time for that alone. Then I had it generate a sentence that lasts exactly 3.5 seconds when spoken. My computer (WITH a GPU) took 18 seconds before I heard the voice speak.

So yes, it works, but the user experience is zero. I wouldn't even want to see the SEO ranking for this kind of page.

I understand what you want to do, Karel, and it is fun and inventive... It has a high WOW factor, but its practical usability is about 0.0. That's why I pregenerate all my sound: it loads almost instantly, there is no wait time, and it sounds the same everywhere, no matter what device people play on. I really wonder how long generation with Kokoro would take on a regular phone or a general-purpose Chromebook. People don't wait 18 seconds for a 3.5-second sound file to be created.
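The numbers in these two posts can be checked with a little arithmetic. This is a rough sketch: it assumes "fast 4G" means about 30 Mbit/s of real throughput (an assumption, not a figure from the thread) and that 1 MB = 8 Mbit. Under those assumptions, 39.5 MB takes about 10.5 s to download, and 18 s of generation for 3.5 s of speech is a real-time factor of about 5.1, so the listener waits roughly five times longer than the clip lasts.

```typescript
// Back-of-the-envelope numbers for the Kokoro-in-the-browser test.
// Sizes and timings come from the post; the 30 Mbit/s "fast 4G"
// throughput is an ASSUMPTION for illustration.

// Seconds needed to download `megabytes` (MB) at `mbitPerSecond`.
function downloadSeconds(megabytes: number, mbitPerSecond: number): number {
  return (megabytes * 8) / mbitPerSecond;
}

// Real-time factor: generation time divided by audio duration.
// Below 1.0 is faster than real time; above 1.0 means the listener waits.
function realTimeFactor(generationSeconds: number, audioSeconds: number): number {
  return generationSeconds / audioSeconds;
}

const totalMB = 7 + 32.5; // initial download + resources
const loadS = downloadSeconds(totalMB, 30); // assumed fast-4G throughput
const rtf = realTimeFactor(18, 3.5); // 18 s wait for 3.5 s of speech

// prints "download ~ 10.5 s, RTF ~ 5.14"
console.log(`download ~ ${loadS.toFixed(1)} s, RTF ~ ${rtf.toFixed(2)}`);
```

Pregenerated clips sidestep the second number entirely: generation happens once, offline, so the only cost left in the browser is downloading a small audio file.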

karelrosseel82 · July 1, 2026, 5:44pm

I want to see the letters light up while it speaks, with one voice that is the same on all devices... that's why I find it important!
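Lighting up words while the voice speaks needs, at minimum, a start and end time per word. A minimal sketch, assuming the engine only gives you the finished clip and its duration: each word's time slice is estimated from its character count. That is a crude heuristic (real speech does not spread evenly over characters), and `estimateWordSpans` and `activeWordIndex` are hypothetical helpers; if a TTS engine reports real word timestamps, those should replace the estimate.

```typescript
// Hypothetical helpers for word highlighting during playback.
// ASSUMPTION: no word timestamps are available, so each word's share of
// the clip is estimated from its character count.

interface WordSpan {
  word: string;
  start: number; // seconds from the start of the clip
  end: number;   // seconds from the start of the clip
}

function estimateWordSpans(text: string, durationSeconds: number): WordSpan[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  const spans: WordSpan[] = [];
  let t = 0;
  for (const word of words) {
    // Longer words get a proportionally longer slice of the audio.
    const share = totalChars > 0 ? word.length / totalChars : 0;
    const end = t + share * durationSeconds;
    spans.push({ word, start: t, end });
    t = end;
  }
  return spans;
}

// Index of the word being spoken at `time` seconds, or -1 if none.
function activeWordIndex(spans: WordSpan[], time: number): number {
  for (let i = 0; i < spans.length; i++) {
    if (time >= spans[i].start && time < spans[i].end) return i;
  }
  return -1;
}
```

In the browser you could read an `HTMLAudioElement`'s `currentTime` on each `requestAnimationFrame` and highlight the word at `activeWordIndex(spans, audio.currentTime)`. A linear scan is fine for a sentence; a long text would call for a binary search over the sorted spans.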

Where did you find Dutch/Nederlands? The AI said it's not possible yet.

So, @EducaSoft, what voice are you using for Dutch?
Can you send me a link?

The Kokoro model is lightweight, it says.

Check this:

https://github.com/Ashish-Patnaik/kokoclone
You can test it here without installing:
https://huggingface.co/spaces/PatnaikAshish/kokoclone
And fastKoko, with OpenAI voices!

And in this video you can see that you can, for example, combine voice1 + voice2.

karelrosseel82 · July 1, 2026, 9:08pm

So now it's possible to see the words lighting up.

EducaSoft · July 1, 2026, 9:21pm

It's a nice achievement, but way too slow to be useful.

karelrosseel82 · July 2, 2026, 12:28am

I found a faster option that also supports Dutch: Chatterbox (open source)
https://github.com/resemble-ai/chatterbox
Can you test it too, Bart and @Abstract? Thanks.

It only takes a while to download the GB-sized package.

https://www.reddit.com/r/LocalLLaMA/comments/1n8h3oj/chatterbox_multilingual/

More examples at:
https://resemble-ai.github.io/chatterbox_demopage/

So Dan, since https://zimjs.org/bot also runs on Hugging Face, could it maybe be used in a ZAPP without downloading the full multi-GB multilingual package?

ZeroGPU queueing!!

I want to link to Hugging Face, so I need this:

Trying to fix Chatterbox with Gradio... can you help, Bart?
Is a connection between the Dutch model on Hugging Face and a ZIM app possible?

Fetch problem solved.

API not found

EducaSoft · July 2, 2026, 8:07am

Karel, you really can't do this. Well, maybe you can, but on Hugging Face you need to RENT a GPU. I don't think Dan is going to pay for GPU usage just so we can access it in ZIM. It can become expensive quite quickly. I have been in this business for over 30 years now (33 to be precise), and being in business means you want to stay in business. That means making things as independent as possible from third-party sources and self-sustainable. I do understand that you absolutely adore the idea of live text to speech... but it comes at more than one price. That's why I'm not going to invest any time or money in live TTS. I pregenerate all my speech in extremely high quality for a one-time price, and then... it'll work forever. That's the difference, of course, between people who like to "experiment" and people who like to "publish". Both kinds of people are awesome, by the way.

Did you look up what it would cost to run TTS on a GPU (the only way to make it fast enough) on Hugging Face?