voicemode
by mbailey
Overview
Enables voice interaction capabilities (Speech-to-Text and Text-to-Speech) for Model Context Protocol (MCP) servers, allowing for natural language conversations and voice-controlled actions.
Installation
uvx voice-mode
Environment Variables
- OPENAI_API_KEY
- VOICEMODE_BASE_DIR
- VOICEMODE_MODELS_DIR
- VOICEMODE_DEBUG
- VOICEMODE_SAVE_ALL
- VOICEMODE_SAVE_AUDIO
- VOICEMODE_SAVE_TRANSCRIPTIONS
- VOICEMODE_AUDIO_FEEDBACK
- VOICEMODE_TTS_BASE_URLS
- VOICEMODE_STT_BASE_URLS
- VOICEMODE_VOICES
- VOICEMODE_TTS_MODELS
- VOICEMODE_PREFER_LOCAL
- VOICEMODE_ALWAYS_TRY_LOCAL
- VOICEMODE_AUTO_START_KOKORO
- VOICEMODE_WHISPER_MODEL
- VOICEMODE_WHISPER_PORT
- VOICEMODE_WHISPER_LANGUAGE
- VOICEMODE_WHISPER_MODEL_PATH
- VOICEMODE_KOKORO_PORT
- VOICEMODE_KOKORO_MODELS_DIR
- VOICEMODE_KOKORO_CACHE_DIR
- VOICEMODE_KOKORO_DEFAULT_VOICE
- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET
- LIVEKIT_ACCESS_PASSWORD
- VOICEMODE_DISABLE_SILENCE_DETECTION
- VOICEMODE_VAD_AGGRESSIVENESS
- VOICEMODE_SILENCE_THRESHOLD_MS
- VOICEMODE_MIN_RECORDING_DURATION
- VOICEMODE_INITIAL_SILENCE_GRACE_PERIOD
- VOICEMODE_DEFAULT_LISTEN_DURATION
- VOICEMODE_STREAMING_ENABLED
- VOICEMODE_STREAM_CHUNK_SIZE
- VOICEMODE_STREAM_BUFFER_MS
- VOICEMODE_STREAM_MAX_BUFFER
- VOICEMODE_EVENT_LOG_ENABLED
- VOICEMODE_EVENT_LOG_DIR
- VOICEMODE_EVENT_LOG_ROTATION
- VOICEMODE_PRONOUNCE
- VOICEMODE_PRONOUNCE_ENABLED
- VOICEMODE_PRONUNCIATION_LOG_SUBSTITUTIONS
- VOICEMODE_CHIME_LEADING_SILENCE
- VOICEMODE_CHIME_TRAILING_SILENCE
- VOICEMODE_FRONTEND_PORT
- VOICEMODE_FRONTEND_HOST
- VOICEMODE_TOOLS_ENABLED
- VOICEMODE_TOOLS_DISABLED
- VOICEMODE_TOOLS
- VOICEMODE_SERVICE_AUTO_ENABLE
- FRONTEND_MODE
Security Notes
The installer and service management components make extensive use of `subprocess.run` and `subprocess.Popen`, but do not consistently apply `shlex.quote` to user-controlled inputs (e.g., `install_dir`, `model_name`, `version`). Malicious input in any of these values could therefore lead to shell injection.
Remote scripts are also executed directly via `curl | bash` for `uv` and LiveKit installation (`livekit_install` in `voice_mode/tools/livekit/install.py`). This is a critical risk: whatever the remote host serves is run as arbitrary code on the local machine.
Finally, the development modes of the LiveKit frontend and server ship with hardcoded default credentials (`voicemode123`, `devkey: secret`), which could be exposed inadvertently if a development setup is reachable from outside.
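The shell injection risk described above can be illustrated with a minimal Python sketch. It is not taken from the voicemode codebase: the helper names `run_safely` and `build_shell_command` and the sample input are hypothetical, chosen only to show the two usual mitigations. The first passes arguments as a list so that no shell ever parses them. The second applies `shlex.quote` when a single shell string cannot be avoided.

```python
import shlex
import subprocess
import sys


def run_safely(program: str, *args: str) -> str:
    """Run a program with arguments passed as a list (no shell involved)."""
    result = subprocess.run(
        [program, *args],      # each element reaches the program as one argument
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def build_shell_command(prefix: str, user_value: str) -> str:
    """Build a shell command string with the user-supplied value quoted."""
    return f"{prefix} {shlex.quote(user_value)}"


# Hypothetical hostile value for something like an install directory.
malicious = "models; rm -rf ~"

# List form: the whole string arrives as a single argument and is only echoed.
out = run_safely(sys.executable, "-c", "import sys; print(sys.argv[1])", malicious)

# String form: shlex.quote wraps the value so a shell treats it as one word.
quoted = build_shell_command("ls", malicious)

print(out.strip())
print(quoted)
```

Where possible, the list form is the simpler choice because it removes the shell from the picture entirely. Quoting is a fallback for cases where a shell string is genuinely required.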
Similar Servers
mcp-node
Enables natural language interactions with Algolia search, analytics, and monitoring data via the Model Context Protocol (MCP) and Claude Desktop.
mcp-tts
Provides a Text-to-Speech (TTS) server using the Model Context Protocol (MCP) to integrate various TTS engines into applications like Claude Desktop and Cursor IDE.
groq-mcp-server
Provides a Model Context Protocol (MCP) server to access Groq's AI capabilities, including ultra-fast LLM chat, vision, text-to-speech, speech-to-text, agentic tooling, and batch processing, from clients like Claude Desktop and Cursor.
consult-llm-mcp
Facilitates Claude Code to consult powerful external AI models for complex code analysis, debugging, and review tasks.