voicemode-windows

Name: voicemode-windows
Author: cescroca1976

Verified Safe

Overview

An MCP server enabling real-time voice interaction (Speech-to-Text and Text-to-Speech) for AI agents, integrating local services like Whisper and Kokoro, with configurable cloud fallback (OpenAI).

Installation

Run Command

python -m voice_mode
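
The run command can also be issued from a supervising script. The following is a minimal sketch, assuming the `voice_mode` package is importable by the chosen interpreter; the helper names `build_launch_command` and `start_server` are illustrative and not part of the project.

```python
import os
import subprocess
import sys


def build_launch_command(python=sys.executable):
    """Return the argv list equivalent to `python -m voice_mode`."""
    return [python, "-m", "voice_mode"]


def start_server(extra_env=None):
    """Start the server as a child process (hypothetical helper, not called here).

    extra_env is merged over the current environment so settings such as
    OPENAI_API_KEY can be supplied without hard-coding them.
    """
    env = os.environ.copy()
    env.update(extra_env or {})
    return subprocess.Popen(build_launch_command(), env=env)
```

Passing the command as a list rather than a single string avoids shell interpretation of the arguments.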

Environment Variables

OPENAI_API_KEY
VOICEMODE_WHISPER_MODEL
VOICEMODE_WHISPER_PORT
VOICEMODE_KOKORO_PORT
VOICEMODE_TTS_BASE_URLS
VOICEMODE_STT_BASE_URLS
VOICEMODE_AUDIO_FEEDBACK
VOICEMODE_DISABLE_SILENCE_DETECTION
VOICEMODE_VAD_AGGRESSIVENESS
VOICEMODE_DEFAULT_LISTEN_DURATION
VOICEMODE_SERVICE_AUTO_ENABLE
VOICEMODE_SKIP_TTS
VOICEMODE_PRONOUNCE
VOICEMODE_PRONUNCIATION_ENABLED
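
The listing gives only the variable names, not their accepted values. As a hedged sketch, a launcher could check overrides against this list before starting the server, so that a misspelled name fails loudly instead of being silently ignored. The helper `build_env` and the example value below are assumptions for illustration, not documented defaults.

```python
import os

# Variable names exactly as they appear in the listing above.
KNOWN_VARIABLES = frozenset({
    "OPENAI_API_KEY",
    "VOICEMODE_WHISPER_MODEL",
    "VOICEMODE_WHISPER_PORT",
    "VOICEMODE_KOKORO_PORT",
    "VOICEMODE_TTS_BASE_URLS",
    "VOICEMODE_STT_BASE_URLS",
    "VOICEMODE_AUDIO_FEEDBACK",
    "VOICEMODE_DISABLE_SILENCE_DETECTION",
    "VOICEMODE_VAD_AGGRESSIVENESS",
    "VOICEMODE_DEFAULT_LISTEN_DURATION",
    "VOICEMODE_SERVICE_AUTO_ENABLE",
    "VOICEMODE_SKIP_TTS",
    "VOICEMODE_PRONOUNCE",
    "VOICEMODE_PRONUNCIATION_ENABLED",
})


def build_env(overrides, base=None):
    """Merge overrides into a copy of the environment, rejecting unknown names."""
    env = dict(os.environ if base is None else base)
    unknown = sorted(set(overrides) - KNOWN_VARIABLES)
    if unknown:
        raise ValueError(f"unrecognized variables: {', '.join(unknown)}")
    # Environment values must be strings, so convert numbers and the like.
    env.update({name: str(value) for name, value in overrides.items()})
    return env


# Placeholder value for illustration only; consult the project for real settings.
example_env = build_env({"VOICEMODE_WHISPER_PORT": 2022}, base={})
```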

Security Notes

The server makes extensive `subprocess` calls to install and manage external tools (git, cmake, package managers, whisper.cpp, kokoro-fastapi), which broadens the attack surface. Its security depends heavily on the integrity of those external projects and on the user's system configuration. By default, `whisper-server` binds to `0.0.0.0`, so it can be reachable from other hosts unless a firewall blocks the port. The `OPENAI_API_KEY` is read from an environment variable and masked in logs and outputs. No direct `eval` calls or malicious patterns were found, and `shlex.split` is used for parsing rules. The test suite includes measures to prevent dangerous commands, which indicates developer awareness of subprocess risks.
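
Three of the points above can be illustrated with standard-library calls. This is a sketch under stated assumptions, not the project's code: the masking scheme in `mask_secret` is one plausible approach, and `binds_all_interfaces` and `parse_command` are hypothetical helper names.

```python
import ipaddress
import shlex


def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a secret for logging."""
    if not value:
        return ""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def binds_all_interfaces(host):
    """Return True for wildcard addresses such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Hostnames such as "localhost" are not IP literals.
        return False


def parse_command(text):
    """Split a command string into arguments using shell-style quoting rules."""
    return shlex.split(text)
```

A wildcard bind is not a vulnerability by itself, but it makes the firewall the only barrier; binding to `127.0.0.1` keeps a service local to the machine.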

Similar Servers

voicemode

609

Provides robust voice interaction capabilities for Model Context Protocol (MCP) agents, enabling real-time speech-to-text (STT) and text-to-speech (TTS) functionalities, with support for local and cloud-based services. It also includes tools for audio playback (DJ), service management, and diagnostics.

Other

Cost Class: Medium

mcp-discord

Enables AI assistants to interact with the Discord platform by providing a set of Discord-related functionalities via the Model Context Protocol (MCP).

Other

Cost Class: Medium

stt-mcp-server-linux

Local speech-to-text server for Linux, designed to integrate with Claude Code via the MCP protocol or run in standalone mode to inject transcribed text into a Tmux session.

Other

Cost Class: High

glm-asr

An all-in-one service for high-accuracy speech recognition (ASR) across multiple languages, featuring Web UI, REST API, SSE streaming, and MCP server integration.

Other

Cost Class: High

Stats

Interest Score: 0

Security Score: 7

Cost Class: Low

Avg Tokens: 500

Stars: 0

Forks: 0

Last Update: 2026-01-18
