Ever since voice satellites were introduced to Home Assistant, people wanted to use good microphones and speakers for this purpose, but this is, still, a work in progress. Through this project I wanted to add to the voice assistant made in Home Assistant a visual image and random personalized responses that are displayed on an google home display or a tablet with browser mod integration. To achieve this, I will use an esp32 satellite programmed with ESPHome.
Here's a video tutorial:
https://www.youtube.com/watch?v=1bBOoUhnF4A&t=166s
- wake word, push to talk, on-demand and continuous conversation support
- service exposed in HA to start and stop the voice assistant from another device/trigger
- visual feedback of the wake word listening/audio recording/success/error status via the satellite LEDs
- visual feedback on the tablet display
- changing assistants and language from the graphical interface or with voice commands.
- uses touch control on the satellite for toggle wake word activation with long touch, and start/stop listening when wake word detection is off with short touch.
- Home Assistant 2023.11.3 or newer
- For tablets with browser or Home Assistant companion app, install Browser Mod integration with HACS.
- A voice assistant configured in HA with STT and TTS in a language of your choice
- ESPHome 2023.11.6 or newer
-
Mount the esp32 satellite as in the diagram or, if you have already done one, modify the microphone L/R pin to be connected to a digital pin which is then added in code to mute_pin.

-
In the EspHome addon interface to the satellite you are making, copy the code from the ”Visual EspVoice” file and and modify the code with your data.
-
If you already have your Esp32 satellite setup, make the following changes:
-
Add the following lines of code to the substitutions and globals and complete with your data:
substitutions:
external_media_player: "media_player.ha_display_browser"
browser_id: "ha_display_browser" #android tablet browser_id from browser_mod
display_switch: "light.ha_display_browser_screen"# android tablet display switch from fully kiosk browser
gifs_dir: "http://192.168.0.xxx:8123/local/gifs/"
silent_sound: "media-source://media_source/gifs/5minsilence.wav"
assistant1: "Jarvis" # the name exactly as it is in the gifs directory without "_speech.gif"
assistant2: "JarvisEn"
assistant3: "Roomie"
assistant4: "Sheila"
tts_service: "edge_tts"
tts_language1: "ro-RO-EmilNeural" #en-US-ChristopherNeural / ro-RO-EmilNeural
tts_language2: "en-US-ChristopherNeural" #en-US-MichelleNeural / ro-RO-AlinaNeural
tts_language3: "ro-RO-AlinaNeural" #en-US-JennyNeural / ro-RO-AlinaNeural
tts_language4: "ro-RO-AlinaNeural" #en-US-JennyNeural / ro-RO-AlinaNeural
random_messages1: "{{ ['cu ce pot să te ajut', 'spune te rog', 'da, te ascult'] | random }}"
random_messages2: "{{ ['how can I help you', 'yes, im listening', 'how can assist you'] | random }}"
random_messages3: "{{ ['cu ce pot să te ajut', 'spune te rog', 'da, te ascult'] | random }}"
random_messages4: "{{ ['cu ce pot să te ajut', 'spune te rog', 'da, te ascult'] | random }}"
globals:
- id: speech_url
type: std::string
restore_value: no
- id: listen_url
type: std::string
restore_value: no
- id: tts_language
type: std::string
restore_value: no
- id: random_messages
type: std::string
restore_value: no
- At "microphone" change the channel to "left"
microphone:
- platform: i2s_audio
id: board_microphone
adc_type: external
i2s_din_pin: GPIO16
pdm: false
channel: left
- To "on_wake_word_detected" add these lines of code
on_wake_word_detected:
- script.execute: assistant_set
- switch.turn_on: mute_pin
#- switch.turn_off: use_wake_word" #If you don't have the possibility to modify the microphone wiring for mute switch
# - delay: 100ms
# - homeassistant.service:
# service: homeassistant.turn_on
# data:
# entity_id: ${display_switch}
- delay: 100ms
- homeassistant.service:
service: media_player.play_media
data_template:
entity_id: ${external_media_player}
media_content_id: !lambda |-
return "media-source://tts/${tts_service}?message= \"" + id(random_messages) + "\"&language=" + id(tts_language);
media_content_type: "audio"
extra: !lambda |-
return "{\"metadata\": {\"metadataType\": 3, \"title\": \" \", \"images\": [{\"url\": \"" + id(speech_url) + "\" }]}}";
- delay: 100ms
# - homeassistant.service:
# service: browser_mod.popup
# data_template:
# entity_id: ${external_media_player}
# browser_id: ${browser_id}
# content: !lambda |-
# return "{\"type\": \"picture\", \"image\": \"" + id(speech_url) + "\"}";
# style: "--popup-min-width: 800px"
######### THIS IS FOR GOOGLE HOME SPEAKERS CHIME SOUND #######
# - if:
# condition:
# switch.is_on: google_speaker_status
# then:
# - delay: 200ms
# else:
# - delay: 2s
# - delay: 1300ms
- homeassistant.service:
service: media_player.play_media
data_template:
entity_id: ${external_media_player}
media_content_id: ${silent_sound}
media_content_type: "audio"
extra: !lambda |-
return "{\"metadata\": {\"metadataType\": 3, \"title\": \" \", \"images\": [{\"url\": \"" + id(listen_url) + "\" }]}}";
- delay: 100ms
# - homeassistant.service:
# service: browser_mod.popup
# data_template:
# entity_id: ${external_media_player}
# browser_id: ${browser_id}
# content: !lambda |-
# return "{\"type\": \"picture\", \"image\": \"" + id(listen_url) + "\"}";
# style: "--popup-min-width: 800px"
- switch.turn_off: mute_pin
#- voice_assistant.start#If you don't have the possibility to modify the microphone wiring for mute switch
- If you don't have the possibility to modify the microphone wiring, you can put instead of "switch.turn_on: mute_pin" - "- switch.turn_off: use_wake_word" and instead of "switch.turn_off: mute_pin" - "- voice_assistant.start". It works but stopping and restarting the "use_wake_word" service can cause some problems when listening to the wake word is restarted if the delay times are not set properly.
- If you use a google display or a google speaker, uncomment section provided that the delay is longer if initially the speaker is turned off and because of this it will play the chime sounds first.
- If you're using a tablet with a browser running the Home Assistant interface or the Home Assistant app, uncomment sections with the browser_mod.popup service and with tablet display turn_on switch.
- These lines of code are added to "on_tts_end"
on_tts_end:
- delay: 200ms
- light.turn_on:
id: led
blue: 0%
red: 0%
green: 100%
brightness: 60%
effect: none
- delay: 200ms
- homeassistant.service:
service: media_player.play_media
data_template:
entity_id: ${external_media_player}
media_content_id: !lambda return x;
media_content_type: "music"
extra: !lambda |-
return "{\"metadata\": {\"metadataType\": 3, \"title\": \" \", \"images\": [{\"url\": \"" + id(speech_url) + "\" }]}}";
# - delay: 100ms
# - homeassistant.service:
# service: browser_mod.popup
# data_template:
# entity_id: ${external_media_player}
# browser_id: ${browser_id}
# content: !lambda |-
# return "{\"type\": \"picture\", \"image\": \"" + id(speech_url) + "\"}";
# style: "--popup-min-width: 800px"
- If you're using a tablet with a browser running the Home Assistant interface or the Home Assistant app, uncomment sections with the browser_mod.popup service
- If you're using a tablet with a browser running the Home Assistant interface or the Home Assistant app, add these lines of code to "on_tts_stream_end" with the browser_mod.close_popup service
on_tts_stream_end:
- delay: 100ms
- script.execute: reset_led
# - delay: 100ms
# - homeassistant.service:
# service: browser_mod.close_popup
# data_template:
# entity_id: ${external_media_player}
# browser_id: ${browser_id}
- delay: 100ms
- homeassistant.service:
service: media_player.play_media
data_template:
entity_id: ${external_media_player}
media_content_id: ${silent_sound}
media_content_type: "audio"
extra: !lambda |-
return "{\"metadata\": {\"metadataType\": 3, \"title\": \" \", \"images\": [{\"url\": \"" + id(listen_url) + "\" }]}}";
- Then these lines of code are added to "on__end"
on_end:
- delay: 100ms
- script.execute: reset_led
- delay: 100ms
- switch.turn_off: mute_pin
# - switch.turn_on: use_wake_word # this is if not have mute switch
- delay: 100ms
- homeassistant.service:
service: browser_mod.close_popup
data_template:
entity_id: ${external_media_player}
browser_id: ${browser_id}
-
If you don't use the "mute_pin" switch, uncomment the wake word listening restart to "on_end"
-
On script add assistant_set script
script:
- id: assistant_set
then:
- lambda: |-
std::string selected_language;
if (id(assistant_select).state == "${assistant1}") {
selected_language = "${tts_language1}";
} else if (id(assistant_select).state == "${assistant2}") {
selected_language = "${tts_language2}";
} else if (id(assistant_select).state == "${assistant3}") {
selected_language = "${tts_language3}";
} else if (id(assistant_select).state == "${assistant4}") {
selected_language = "${tts_language4}";
} else {
selected_language = "${tts_language1}";
}
id(tts_language) = selected_language;
- lambda: |-
std::string selected_message;
if (id(assistant_select).state == "${assistant1}") {
selected_message = "${random_messages1}";
} else if (id(assistant_select).state == "${assistant2}") {
selected_message = "${random_messages2}";
} else if (id(assistant_select).state == "${assistant3}") {
selected_message = "${random_messages3}";
} else if (id(assistant_select).state == "${assistant4}") {
selected_message = "${random_messages4}";
} else {
selected_message = "${random_messages1}";
}
id(random_messages) = selected_message;
- lambda: |-
std::string image_url = "${gifs_dir}";
image_url += id(assistant_select).state;
image_url += "_speech.gif";
id(speech_url) = image_url;
- lambda: |-
std::string image_url = "${gifs_dir}";
image_url += id(assistant_select).state;
image_url += "_listen.gif";
id(listen_url) = image_url;
- Add assistants selector
select:
- platform: template
id: assistant_select
name: "Assistant select"
optimistic: true
restore_value: true
options:
- "${assistant1}"
- "${assistant2}"
- "${assistant3}"
- "${assistant4}"
- To the "switch" section add mute switch. This is done by connecting the L/R pin from the microphone to a digital pin. When that pin is powered, the microphone is turned on.
switch:
- platform: gpio
pin:
number: GPIO17
inverted: true
name: MUTE
id: mute_pin
restore_mode: RESTORE_DEFAULT_OFF
- If you use a display or a google speaker, in the switch sectionn, add a virtual switch that we will use in the automation that will transmit to the Esp32 satellite if the google speaker is on or off to adjust the delay for the initial response
switch:
- platform: template
name: Google speaker status
id: google_speaker_status
optimistic: true
restore_mode: RESTORE_DEFAULT_ON
- Press "Save" and "Install" on EspHome addon interface.
- Configure EspVoice satellite in "settings/integrations" page of home assistant.
- Check "Allow the device to make Home Assistant service calls" from the "CONFIGURE" button of the Esp32 satellite.
- On local directory of Home Assistant,
/config/www, create a new directory ”gifs” where copy the content of www/gifs directory from this github page. - Copy content of configuration.yaml to configuration.yaml directory of Home Assistant to make the shell services change assistants scripts from
/config/www/gifsdirectory. - Copy content of automation.yaml and make the necessary changes according to your configuration to make automations for change assistants and google speaker status.
- Restart Home Assistant.
- On "Settings/automations" page search "Assist Google speaker status" automation and check if the Esp32 satellite Google speaker status switch is updated when the associated google speaker is turned off or on.
- On "Settings/automations" page search "Assist Change Assistants" automation and check if assistants and pipelines are changed by voice commands.

