omni-parser-mcp-server
by goyaladitya05 (Verified Safe)
Overview
Automates GUI interactions in any application by 'seeing' the screen with computer vision and executing actions based on LLM reasoning.
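The see-reason-act cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's code. The UIElement and Action types, the run_agent function, and the detect/decide/execute callables are hypothetical stand-ins for the screenshot parser, the LLM call, and the mouse/keyboard driver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UIElement:
    """A detected on-screen element (hypothetical structure)."""
    label: str
    x: int  # centre x in screen pixels
    y: int  # centre y in screen pixels


@dataclass
class Action:
    """An action chosen by the model (hypothetical structure)."""
    kind: str  # "click", "type", or "done"
    target: Optional[UIElement] = None
    text: str = ""


def run_agent(
    goal: str,
    detect: Callable[[], List[UIElement]],
    decide: Callable[[str, List[UIElement]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Repeat: see the screen, ask the model, act. Returns the number of actions executed."""
    for step in range(max_steps):
        elements = detect()              # computer-vision pass over a screenshot
        action = decide(goal, elements)  # LLM reasoning over the detected elements
        if action.kind == "done":
            return step
        execute(action)                  # mouse/keyboard side effect
    return max_steps
```

Passing the three stages in as callables keeps the loop testable without a real screen or model, and the max_steps cap stops a confused model from acting indefinitely.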
Installation
python -m osiris.server
Environment Variables
- GOOGLE_API_KEY
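Because the server reads its key from the environment, a launcher can fail fast when the variable is missing. A minimal sketch, assuming GOOGLE_API_KEY is the only required setting and the server is started with the module command above; build_server_command and start_server are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import List, Mapping


def build_server_command(env: Mapping[str, str]) -> List[str]:
    """Return the argv that starts the server, after checking required config."""
    if not env.get("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY must be set before starting the server")
    # Use the current interpreter so the server runs in the same virtualenv.
    return [sys.executable, "-m", "osiris.server"]


def start_server() -> subprocess.Popen:
    """Launch the server as a child process that inherits the current environment."""
    return subprocess.Popen(build_server_command(os.environ))
```

In practice an MCP client usually spawns the server process itself over stdio, so the same check would live in whatever command the client is configured to run.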
Security Notes
The system executes actions derived from LLM output, including mouse clicks and keyboard typing. Parsing that output as JSON helps mitigate direct code injection, but a compromised or poorly prompted LLM could still instruct the agent to perform malicious actions on the host machine. No direct `eval` calls or hardcoded secrets were found. Screenshots are stored in temporary files that are overwritten.
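To illustrate why JSON parsing alone is not a complete defence, the sketch below parses model output and then checks it against an allowlist of action names, field types, and screen bounds before anything is executed. The action schema ("click" with x/y, "type" with text), ALLOWED_ACTIONS, MAX_TEXT_LEN, and parse_action are all hypothetical; the listing does not document the server's real action format.

```python
import json
from typing import Any, Dict

# Hypothetical schema, not the server's documented format:
# action name -> required fields and their types.
ALLOWED_ACTIONS: Dict[str, Dict[str, type]] = {
    "click": {"x": int, "y": int},
    "type": {"text": str},
}
MAX_TEXT_LEN = 200


def parse_action(raw: str, width: int, height: int) -> Dict[str, Any]:
    """Parse LLM output as JSON and reject anything outside the allowlist."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    name = data.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise ValueError(f"action {name!r} is not allowed")
    fields = ALLOWED_ACTIONS[name]
    extra = set(data) - set(fields) - {"action"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    for field, expected in fields.items():
        value = data.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    if name == "click" and not (0 <= data["x"] < width and 0 <= data["y"] < height):
        raise ValueError("click is outside the screen")
    if name == "type" and len(data["text"]) > MAX_TEXT_LEN:
        raise ValueError("text is too long")
    return data
```

Validation of this kind narrows what a misbehaving model can request, but it cannot distinguish a harmless click from a harmful one on a valid target, so the caveat above still applies.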
Similar Servers
UI-TARS-desktop
UI-TARS-desktop is a native GUI Agent application powered by multimodal AI models, enabling users to control their computer and browser through natural language instructions.
Windows-MCP
This MCP server enables AI agents to directly interact with the Windows operating system, performing tasks such as file navigation, application control, UI interaction, and QA testing.
Peekaboo
macOS automation server that integrates AI for screen capture analysis, UI interaction, and agentic workflows.
MCPControl
A Windows control server for the Model Context Protocol, enabling AI models to programmatically control system operations such as mouse, keyboard, window management, and screen capture.