You don't happen to know a whisper solution that combines diarization with live ...

peterleiser · 2025-08-14T00:06:53 1755130013

Check out https://github.com/jhj0517/Whisper-WebUI

I ran it last night using Docker and it worked extremely well. You need a HuggingFace read-only API token for the diarization. I found that the web UI ignored the token, but it worked fine when I added it to Docker Compose as an environment variable.

jduckles · 2025-08-13T19:25:56 1755113156

WhisperX's diarization is great imo:

    whisperx input.mp3 --language en --diarize --output_format vtt --model large-v2

Works a treat for Zoom interviews. Diarization is sometimes a bit off, but generally it's correct.
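
If you have a whole folder of recordings, a rough sketch of a batch wrapper could look like the following. The flags are copied from the command above; the `transcribe_all` function and the `WHISPERX` override variable are hypothetical names I made up so the loop can be dry-run without whisperx installed.

```shell
# Sketch: run the same whisperx command over several recordings.
# WHISPERX is a hypothetical override (not a whisperx feature) used for dry runs.
transcribe_all() {
  local bin="${WHISPERX:-whisperx}"
  local f
  for f in "$@"; do
    # Same flags as the single-file command; stop at the first failure.
    "$bin" "$f" --language en --diarize --output_format vtt --model large-v2 || return 1
  done
}
```

Running `WHISPERX=echo transcribe_all a.mp3 b.mp3` only prints the arguments each call would receive; drop the override to actually transcribe.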

Morizero · 2025-08-13T20:20:11 1755116411

> input.mp3

Thanks but I'm looking for live diarization.

kmfrk · 2025-08-13T18:52:01 1755111121

Proper diarization still remains a white whale for me, unfortunately.

Last I looked into it, the main options required API access to external services, which put me off. I think it was pyannote.audio[1].

[1]: https://github.com/pyannote/pyannote-audio

peterleiser · 2025-08-14T00:15:24 1755130524

I used diarization in https://github.com/jhj0517/Whisper-WebUI last night, and once it has downloaded the model from HuggingFace it runs offline (or so it claims).