stt-mcp-server-linux
by marcindulak
Overview
Local speech-to-text server for Linux, designed to integrate with Claude Code via the MCP protocol or run in standalone mode to inject transcribed text into a Tmux session.
Installation
bash scripts/restart_mcp_server.sh
Environment Variables
- CONTAINER_NAME
- DEBUG
- MODE
- OUTPUT
- TMUX_SESSION
- TMUX_TMPDIR
- STT_MCP_SERVER_LINUX_PATH
- BUILDKIT_PROGRESS
Security Notes
`scripts/restart_mcp_server.sh` uses `eval` to execute the `docker run` command it constructs. This is a shell anti-pattern: if any variable that makes up `$DOCKER_CMD` were ever sourced from untrusted input, it could lead to command injection.

The Docker container needs direct access to sensitive host devices (`/dev/input` for keyboard monitoring and `/dev/snd` for audio recording), and the container user is added to the `input` group. This gives the container privileged access to host hardware and increases the attack surface if the container is compromised.

`TmuxOutputHandler` sanitizes transcribed text before injecting it into Tmux to prevent command injection, but the surrounding setup still relies on elevated privileges and a potentially dangerous shell construct.
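To illustrate the `eval` concern, here is a minimal sketch of a common alternative: building the command as a Bash array and expanding it directly, so no value is parsed by the shell a second time. This is not the script's actual code. The function name `build_docker_cmd`, the image name `example/stt-image`, and the chosen `docker run` options are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: names and options below are hypothetical and are not copied
# from scripts/restart_mcp_server.sh.
set -euo pipefail

build_docker_cmd() {
  # Stores the command in the global array DOCKER_CMD, one element per argument.
  local container_name="$1"
  DOCKER_CMD=(
    docker run --rm
    --name "$container_name"
    --device /dev/snd
    # Hypothetical image name, used here as a placeholder.
    example/stt-image
  )
}

# A hostile value stays a single argument because nothing is re-parsed.
build_docker_cmd 'stt; touch /tmp/pwned'

# Print the command with shell quoting so each argument boundary is visible.
printf '%q ' "${DOCKER_CMD[@]}"
printf '\n'

# To launch the container, expand the array directly instead of using eval:
# "${DOCKER_CMD[@]}"
```

With `eval "$DOCKER_CMD"`, the same input would be split at the `;` and the second part would run as its own command. With the array form, the whole string is passed to `docker` as the container name, which Docker would then be free to reject as invalid.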
Similar Servers
voicemode
Provides robust voice interaction capabilities for Model Context Protocol (MCP) agents, enabling real-time speech-to-text (STT) and text-to-speech (TTS) functionalities, with support for local and cloud-based services. It also includes tools for audio playback (DJ), service management, and diagnostics.
tmux-mcp
Enables AI assistants (like Claude Desktop) to interact with, control, and observe tmux terminal sessions by providing tools for session management and command execution.
consult-llm-mcp
An MCP server that allows AI agents like Claude Code to consult stronger, more capable AI models (e.g., GPT-5.2, Gemini 3.0 Pro) for complex code analysis, debugging, and architectural advice.
mcp-tts
Provides Text-to-Speech (TTS) capabilities to MCP (Model Context Protocol) clients using various AI and system-level TTS engines.