UI-TARS-desktop
by alaa-nadi
Overview
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language, making interaction with technology more intuitive and efficient.
Installation
pnpm run dev:agent-tars
Environment Variables
- NODE_ENV
- UPGRADE_EXTENSIONS
- PORT
- START_MINIMIZED
- ELECTRON_RENDERER_URL
- CI
- VLM_PROVIDER
- VLM_BASE_URL
- VLM_API_KEY
- VLM_MODEL_NAME
- APPLE_ID
- APPLE_PASSWORD
- APPLE_TEAM_ID
- KEYCHAIN_PATH
- TAVILY_API_KEY
- BING_SEARCH_API_KEY
- OPENAI_API_KEY
- OPENAI_API_BASE_URL
- OPENAI_DEFAULT_MODEL
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_API_VERSION
- AZURE_OPENAI_MODEL
- ANTHROPIC_API_KEY
- ANTHROPIC_API_BASE_URL
- ANTHROPIC_DEFAULT_MODEL
- GEMINI_API_KEY
- GEMINI_API_BASE_URL
- GEMINI_DEFAULT_MODEL
- MISTRAL_API_KEY
- MISTRAL_API_BASE_URL
- MISTRAL_DEFAULT_MODEL
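As a rough illustration of how the `VLM_*` variables above might be consumed, the following sketch reads them from an environment map and fails early when one is missing. It assumes all four are required and non-empty, which the listing does not state; `VlmConfig`, `requireVar`, and `loadVlmConfig` are hypothetical names, not part of the application.

```typescript
// Hypothetical shape for the VLM settings; field names are illustrative only.
interface VlmConfig {
  provider: string;
  baseUrl: string;
  apiKey: string;
  modelName: string;
}

// A plain map of variable names to values (for example, process.env in Node).
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw if it is unset or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Assumption: all four VLM_* variables are mandatory. The source does not say so.
function loadVlmConfig(env: Env): VlmConfig {
  return {
    provider: requireVar(env, "VLM_PROVIDER"),
    baseUrl: requireVar(env, "VLM_BASE_URL"),
    apiKey: requireVar(env, "VLM_API_KEY"),
    modelName: requireVar(env, "VLM_MODEL_NAME"),
  };
}
```

In a Node or Electron main process this would typically be called as `loadVlmConfig(process.env)`. Taking the map as a parameter keeps the function easy to test without touching the real environment.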
Security Notes
The Electron `preload` script exposes `ipcRenderer.invoke` globally without explicit channel whitelisting, so the renderer process can invoke *any* main process IPC handler. This is a critical vulnerability. Additionally, several configurable external endpoints (VLM API, UTIO, Report Storage, Preset URLs) pose a significant risk: an attacker who can control settings could redirect data or inject malicious configurations. The Content-Security-Policy is permissive, allowing `'unsafe-inline'` and `'unsafe-eval'` for scripts and `blob:` for connections, which weakens XSS protection. Direct file system access to allowed directories is granted via IPC and therefore requires robust path validation. Sensitive data such as API keys is stored locally; this is standard practice, but a compromised machine exposes those secrets.
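One common mitigation for the unrestricted `ipcRenderer.invoke` exposure described above is to wrap the invoke function in a channel allowlist. The sketch below shows the idea in isolation, without importing Electron so that it can be run and tested on its own. The channel names, `createSafeInvoke`, and `isAllowedChannel` are hypothetical; the application's real channels are not listed in the source.

```typescript
// Hypothetical channel names; replace with the handlers the renderer truly needs.
const ALLOWED_CHANNELS: readonly string[] = ["settings:get", "agent:run"];

// Same call shape as Electron's ipcRenderer.invoke(channel, ...args).
type InvokeFn = (channel: string, ...args: unknown[]) => Promise<unknown>;

// True only for channels that appear in the allowlist.
function isAllowedChannel(channel: string): boolean {
  return ALLOWED_CHANNELS.indexOf(channel) !== -1;
}

// Wrap an invoke function so that calls on unlisted channels are rejected
// before they ever reach the main process.
function createSafeInvoke(invoke: InvokeFn): InvokeFn {
  return (channel, ...args) => {
    if (!isAllowedChannel(channel)) {
      return Promise.reject(new Error(`Blocked IPC channel: ${channel}`));
    }
    return invoke(channel, ...args);
  };
}
```

In a preload script, the wrapped function would usually be exposed through Electron's `contextBridge.exposeInMainWorld`, passing `createSafeInvoke(ipcRenderer.invoke.bind(ipcRenderer))` in place of the raw `ipcRenderer.invoke`. Main process handlers should still validate their arguments, since an allowlist limits which handlers are reachable but not what is sent to them.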
Similar Servers
UI-TARS-desktop
UI-TARS Desktop is a native multimodal AI agent application designed to control a computer's graphical user interface using natural language. It enables automation of tasks across web browsers and the local desktop environment, integrating vision-language models for intelligent interaction.
5ire
A sleek AI assistant and MCP (Model Context Protocol) client that supports various LLMs, integrates tools, and manages knowledge bases for document processing.
Windows-MCP
Enables AI agents to interact with the Windows operating system for tasks such as file navigation, application control, UI interaction, and QA testing.
Windows-MCP.Net
Enabling AI assistants to automate tasks and interact with the Windows desktop environment.