Implement speaker diarization, voice activity detection, and/or conversation endpointing.

Implement speaker diarization and voice activity detection (VAD). Diarization lets the agent know who is speaking, so it can give the user better responses. VAD should also get rid of hallucinated output when the audio is silent. This is very important: a wearable that is worn all the time will often be recording silence.
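
To make the silence-gating idea concrete, here is a minimal sketch of an energy-based VAD gate. It is an illustration only, not a proposal for the final implementation: the function names (`frame_rms`, `speech_frames`), the threshold, and the hangover length are all assumptions, and a trained VAD model would likely be more robust than a fixed energy threshold. The idea is that only frames flagged `True` would be forwarded to transcription.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples scaled to [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.01,  # assumed value; would need tuning on real device audio
    hangover: int = 5,        # assumed value; frames kept after energy drops
) -> List[bool]:
    """Return one flag per frame: True if the frame should be sent on for transcription.

    A frame passes when its RMS energy reaches the threshold. The hangover keeps
    the gate open for a few frames afterwards so trailing word endings are not cut.
    """
    flags: List[bool] = []
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

With this gate in front of the transcriber, long silent stretches would never reach the speech-to-text model, which is where the hallucinated output originates.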

Potential implementations:
https://github.com/pyannote/pyannote-audio
Deepgram
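
Whichever option is chosen, the diarization output would still need to be joined with the transcript so the agent sees who said what. The sketch below shows one possible way to do that, by assigning each transcript segment the speaker whose turn overlaps it the most. The `Turn` and `Segment` types and the `label_segment` function are hypothetical and are not part of pyannote-audio or Deepgram; they assume only that the diarizer yields (start, end, speaker) turns and the transcriber yields timestamped text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One diarization turn (hypothetical shape): who spoke from start to end, in seconds."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcript segment (hypothetical shape) with timestamps in seconds."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segment(segment: Segment, turns: List[Turn]) -> Optional[str]:
    """Return the speaker whose turn overlaps the segment most, or None if no turn overlaps."""
    best_speaker: Optional[str] = None
    best_overlap = 0.0
    for turn in turns:
        shared = overlap(segment.start, segment.end, turn.start, turn.end)
        if shared > best_overlap:
            best_speaker, best_overlap = turn.speaker, shared
    return best_speaker
```

A segment that returns `None` had no diarized speech under it, which could also serve as a second check against text produced during silence.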