Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Related:

https://github.com/antirez/qwen-asr

https://github.com/antirez/voxtral.c

Qwen-asr can easily transcribe live radio (see README) in any random laptop. It looks like we are going to see really cool things on local inference, now that automatic programming makes a lot simpler to create solid pipelines for new models in C, C++, Rust, ..., in a matter of hours.

 help



Your voxtral.c work was a big motivator for me. I built a macOS menu bar dictation app (https://github.com/T0mSIlver/localvoxtral) around Voxtral Realtime, currently using a voxmlx fork with an OpenAI Realtime WebSocket server I added on top.

The thing that sold me on Voxtral Realtime over Whisper-based models for dictation is the causal encoder. Text streaming in as you speak rather than appearing after you stop is a fundamentally different UX. On M1 Pro with a 4-bit quant through voxmlx it feels responsive enough for natural dictation, though I haven't done proper latency benchmarks yet.

Integrating voxtral.c as a backend is on my roadmap, compiling to a single native binary makes it much easier to bundle into a macOS app than a Python-based backend.


> Text streaming in as you speak rather than appearing after you stop is a fundamentally different UX

100%. I don’t understand how people are able to compromise on this.


Which is why long term current programming languages will eventually become less relevant in the whole programming stack, as in get the computer to automate tasks, regardless how.

Assuming RAM prices will not make it totally unaffordable. Current situation is atrocious and big infrastructure corps seem to love it, they do not want independent computing. Alternatively they might build specialized branded hardware which people could only use for what corps allow them to do for nice monthly fee.

Another problem is too much abstraction on input spec level. The other day I asked Claude to generate few classes. When reviewing the code I noticed it doing full scan for ranges on one giant set. This would bring my backend to a halt. After pointing it out to Claude it had smartened up to start with lower_bound() call. When there are no people to notice such things what do you think we are going to have?


Agreed, in regards to prices, it appears to be the new gold, lets see how this gets sorted out, with NPUs, FPGAs, analog (Cerebas),...

Now the abstraction I am with you on that, I foresee a more formal way to give specifications, but more suitable for natural language as input, or even proper mathematics, than the languages we have been using thus far.

Naturally we aren't there yet.


>"Naturally we aren't there yet."

But we were. COBOL ;)

On more serious note. Sure we need Spec development IDE which LLM would compile to a language of choice (or print ASIC). It would still not prevent that lower_bound things from happening and there will be no people to find out why


If you look at the amount of stuff people type on tiny chat windows, I would say AI is COBOL's revenge.

Unfortunely that is already the case when debugging low code, no code tools, and good luck having any kind of versioning with those.


> Alternatively they might build specialized branded hardware which people could only use for what corps allow them to do for nice monthly fee.

That's why I'm still holding on to a bulky Core 2 Duo Management Engine-free Fujitsu workstation, for when personal computing finally goes underground again.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: