I tried your app.
It looks like the same concept, but in yours I didn't hear any sound.
I don't get what you meant by "not Flemish"; I never used the word "Flemish".
In education, however, there is a big difference between Belgian Dutch and the Dutch spoken in the Netherlands. Same words... but in a lot of cases completely different pronunciation and intonation. And for the Belgian market, a lot of material is in Netherlands Dutch. That's why I make that distinction.
I see you make a lot of little apps. My only remark is that a lot of them stay in an unpolished state and then you try another thing with ZIM. Fun to do, but not exactly how I work. I tend to polish my work a lot, but that could be because I have worked in education, and also for an educational publisher, for the last 30 years. At the publisher we are used to polishing heavily.
Yes, I know... until I have a job like yours, getting paid to make apps.
Here you can see that Flemish is in v1.0 of Kokoro.
Oh, but I don't get paid for what I make at bolleboos.be or for the next project at educasoft.be. Those are all hobby projects. Of course I do have the reflex to polish from the time I made commercial stuff for the publisher.
Can you tell me the URL where you found that Kokoro info? I really can't find any info about Kokoro being multilingual, and certainly nothing about Dutch. I also tried voice-cloning text to speech. Those are not bad at all, but (logically) still not on par with Gemini 3.1 TTS.
The stuff I want to make now is actually for the community. I don't need to earn anything from it. The only thing I DO want to earn is nice visitor stats and users, preferably in all of the 20 languages. That's why I even invest some money in it, even though there will be no real return on investment.
You COULD, yes. I encourage you to give it a go. Run Kokoro in the browser. Let it download the inference models and then try it out. Let it generate (let's keep it simple) a sentence of, for example, 10 seconds and then look at 1) the output quality, 2) the speed of generation on a general-purpose device without a big GPU and 3) the terrible user experience.
I tried, but I still get the .stream error.
So the suggestion of MIT is to go with HeadTTS.
Do you know about that?
But it seems to have been tested on local HTML / a server?
Can you, Dan or Bart?
It works! You need to download the MP3 to hear the voice speaking the text you typed! I'm a happy boy, @danzen @Abstract and @EducaSoft.
https://codepen.io/karelrosseel-prive/pen/MYJEzzN
Yes, it works. It needs 7 MB of initial download plus 32.5 MB of resources. On a fast 4G connection that is 10+ seconds of load time for that alone. Then I let it generate a sentence that speaks for exactly 3.5 seconds. It took my computer (WITH GPU) 18 seconds of wait time before I heard the voice speak.
So yes, it works, but the user experience is a total zero. I wouldn't even like to see the SEO ranking for this kind of page.
I understand what you want to do, Karel, and it is fun and inventive... It has a high WOW factor, but practical usability is about 0.0. That's why I pregenerate all my sound. It loads almost instantly, has no wait time and sounds the same everywhere, no matter what device people play on. I really wonder how long generation with Kokoro would take on a regular phone or a general-purpose Chromebook. People don't wait 18 seconds for a 3.5-second sound file to be created.
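To put rough numbers on that test, here is a small Python sketch. It only redoes the arithmetic from the measurements above (7 MB + 32.5 MB download, 18 seconds of generation for 3.5 seconds of audio). The 30 Mbit/s figure for "fast 4G" is my own assumption, chosen because it roughly matches the 10+ seconds of load time, and the function names are made up for illustration.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of waiting per second of generated audio (lower is better)."""
    return generation_seconds / audio_seconds


def download_seconds(megabytes: float, megabits_per_second: float) -> float:
    """Transfer time for a payload, ignoring latency and protocol overhead."""
    return megabytes * 8 / megabits_per_second


# Measurements quoted in the post above.
total_mb = 7 + 32.5                 # initial download + model resources
rtf = real_time_factor(18, 3.5)     # 18 s wait for 3.5 s of speech

# ASSUMPTION: 30 Mbit/s effective throughput on a fast 4G connection.
load = download_seconds(total_mb, 30)

print(f"Total download: {total_mb} MB")
print(f"Estimated load time at 30 Mbit/s: {load:.1f} s")
print(f"Real-time factor: {rtf:.1f}x slower than playback")
```

A real-time factor above 1 means the listener always waits longer than the clip lasts, which is the core of the usability problem; pregenerated audio avoids the generation step entirely.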
I want to see the letters light up while it speaks, with only one voice that is the same on all devices. That's why I find it important!
Where did you find Dutch/Nederlands? AI said it is not possible yet.
So @EducaSoft, what voice are you using for Dutch?
Can you send me the link?
It says the Kokoro model is lightweight.
Check this:
https://github.com/Ashish-Patnaik/kokoclone
You can test it here without installing:
https://huggingface.co/spaces/PatnaikAshish/kokoclone
And there is fastKoko with OpenAI voices!
And in this video you can see that you can, for example, combine voice1 + voice2.
So it is now possible to see the words lighting up.
It's a nice achievement, but way too slow to be useful.
I found a faster one that also supports Dutch: the open source Chatterbox.
https://github.com/resemble-ai/chatterbox
Can you also test it, Bart and @Abstract? Thanks.
It just takes a while to download the GB package.
https://www.reddit.com/r/LocalLLaMA/comments/1n8h3oj/chatterbox_multilingual/
More examples at:
https://resemble-ai.github.io/chatterbox_demopage/
So Dan, as the https://zimjs.org/bot also runs on Hugging Face, could it maybe be used in a ZAPP without downloading the full GB multi-language package?
zeroGPU queueing!!
I want to use a link to Hugging Face, so I need this.
Trying to fix Chatterbox with Gradio... can you help, Bart?
Is a connection between the Dutch TTS on Hugging Face and a ZIM app possible?
Fetch problem solved.
API not found.
Karel, you really can't do this. Well, maybe you can, but on Hugging Face you need to RENT GPU. I don't think Dan is going to pay for GPU usage just because we could access it in ZIM. It can become expensive quite quickly. I have been in this business for over 30 years now (33 to be precise), and to be in business means you want to stay in business. That means making it as independent as possible from third-party sources, and self-sustainable. I do understand that you absolutely adore the idea of live text to speech... but it comes with more than one price. That's why I'm not going to invest any time or money in live TTS. I pregenerate all my speech in extremely high quality at a one-time price and then... it'll work forever. That's the difference, of course, between people who like to "experiment" and people who like to "publish". Both kinds of people are awesome, by the way.
Did you look up what it would cost to have a TTS using GPU (the only way to make it fast enough) on Hugging Face?
No, not yet. That's why I wondered if you have Hugging Face experience, as the https://zimjs.com/bot uses it too. Dan pays for that as well, so why not for live text? That was the idea.
@Abstract, do you know more about whether this is possible?
What is the cost, Dan, to run via Hugging Face?
Thanks Bart for your words of wisdom and Karel for your experimentations. I think Claude is surpassing what the ZIM Chatbot can do, but if there were more people using the Chatbot then it might not be sustainable. It is different than providing a service that would be called on by apps which are loaded millions of times a day. Not that one voice app would be that popular - but maybe. Anyway - I suspect that we are not supporting live speech but will look into it further when we get to it. Working on other things at the moment. Cheers.