I ran it last night using Docker and it worked extremely well. You need a Hugging Face read-only API token for the diarization. I found that the web UI ignored the token, but it worked fine when I added it to Docker Compose as an environment variable.
whisperx input.mp3 --language en --diarize --output_format vtt --model large-v2
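If you run the CLI directly rather than through the web UI, the token can be passed on the command line. A rough sketch, assuming your WhisperX build accepts `--hf_token` (check `whisperx --help`); the `HF_TOKEN` variable and the `diarize_file` helper are names made up for this example, not anything WhisperX defines:

```shell
# Sketch only: HF_TOKEN and diarize_file are illustrative names.
diarize_file() {
  # Stop with a readable message if the token is not set.
  : "${HF_TOKEN:?set HF_TOKEN to a Hugging Face read token first}"
  # Same options as the command above, plus the token for the diarization models.
  whisperx "$1" --language en --diarize --hf_token "$HF_TOKEN" \
    --output_format vtt --model large-v2
}
```

Usage would be something like `HF_TOKEN=... diarize_file input.mp3`, which keeps the token out of any script you check in.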
Thanks but I'm looking for live diarization.
Last I looked into it, the main options required API access to external services, which put me off. I think it was pyannote.audio[1].
[1]: https://github.com/pyannote/pyannote-audio