UI-TARS-desktop
by alaa-nadi
Overview
A GUI Agent application that lets users control their computer and perform tasks using natural language, leveraging Vision-Language Models (VLMs) and the Model Context Protocol (MCP) for interaction.
Installation
pnpm run dev:agent-tars
Environment Variables
- VLM_PROVIDER
- VLM_BASE_URL
- VLM_API_KEY
- VLM_MODEL_NAME
- PORT
- START_MINIMIZED
- ELECTRON_RENDERER_URL
- CI
- UPGRADE_EXTENSIONS
- OPENAI_API_KEY
- OPENAI_API_BASE_URL
- OPENAI_DEFAULT_MODEL
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_API_VERSION
- AZURE_OPENAI_MODEL
- AZURE_OPENAI_API_KEY
- ANTHROPIC_API_KEY
- ANTHROPIC_API_BASE_URL
- ANTHROPIC_DEFAULT_MODEL
- GEMINI_API_KEY
- GEMINI_API_BASE_URL
- GEMINI_DEFAULT_MODEL
- MISTRAL_API_KEY
- MISTRAL_API_BASE_URL
- MISTRAL_DEFAULT_MODEL
- TAVILY_API_KEY
- BING_SEARCH_API_KEY
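The listing gives only variable names, not which ones are required or what defaults apply. As a minimal sketch, assuming the four `VLM_*` variables must all be set (an assumption made purely for illustration), a startup check could look like the following. `VlmConfig` and `loadVlmConfig` are hypothetical names, not part of the project.

```typescript
// Hypothetical config shape; fields mirror the VLM_* variables in the list above.
interface VlmConfig {
  provider: string;
  baseUrl: string;
  apiKey: string;
  modelName: string;
}

type Env = Record<string, string | undefined>;

// Reads the VLM_* variables from an env-like object and fails early,
// naming every variable that is missing or blank.
// Assumption: all four are required; the listing does not say so.
function loadVlmConfig(env: Env): VlmConfig {
  const read = (name: string): string => (env[name] ?? "").trim();
  const required = ["VLM_PROVIDER", "VLM_BASE_URL", "VLM_API_KEY", "VLM_MODEL_NAME"];
  const missing = required.filter((name) => read(name) === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    provider: read("VLM_PROVIDER"),
    baseUrl: read("VLM_BASE_URL"),
    apiKey: read("VLM_API_KEY"),
    modelName: read("VLM_MODEL_NAME"),
  };
}

// Typical use in a Node or Electron main process:
//   const config = loadVlmConfig(process.env);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the check easy to exercise with hand-built objects.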
Security Notes
The `ui-tars-desktop` application has critical Electron security vulnerabilities, including:
- `preload/index.ts` directly exposes `ipcRenderer` methods to the renderer process (`contextIsolation` bypassed for `window.electron`), allowing potential full Node.js API access if a script is injected.
- `apps/ui-tars/src/main/window/ScreenMarker.ts` creates new `BrowserWindow` instances with `nodeIntegration: true` and `contextIsolation: false`, making these windows highly vulnerable to arbitrary code execution.
- `apps/ui-tars/src/main/window/createWindow.ts` uses `sandbox: false`.
The `agent-tars-app` part uses `contextIsolation: true` and a Content Security Policy, but sets `webSecurity: false` for its main window. This allows unrestricted cross-origin requests, which is a significant risk.
Further risks:
- The integration with the `mcp-servers/commands` package allows execution of arbitrary shell commands, posing a severe risk if LLM output is not perfectly sanitized.
- File system access (`ipcRoutes/filesystem.ts`) can be configured via `setAllowedDirectories`, but improper configuration or a bypass could lead to unauthorized file operations.
- `shell.openExternal` and `shell.openPath` calls can open arbitrary URLs or local files from agent actions.
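The mitigations these notes point toward can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `hardenedWebPreferences`, `isPathAllowed`, and `isSafeExternalUrl` are hypothetical names, and the object is written as a plain literal so the sketch runs without Electron installed. The four flags themselves are standard Electron `webPreferences` options.

```typescript
import * as path from "node:path";

// Safer values for the Electron flags named in the notes above.
// Intended use: new BrowserWindow({ webPreferences: hardenedWebPreferences })
const hardenedWebPreferences = {
  contextIsolation: true, // keep preload and page scripts in separate contexts
  nodeIntegration: false, // no Node.js APIs in the renderer
  sandbox: true,          // run the renderer in the Chromium sandbox
  webSecurity: true,      // keep same-origin policy enforced
};

// True when `target` resolves to an allowed directory or to a path inside one.
// Note: this is a lexical check; symlinks are not resolved here.
function isPathAllowed(target: string, allowedDirs: string[]): boolean {
  const resolved = path.resolve(target);
  return allowedDirs.some((dir) => {
    const rel = path.relative(path.resolve(dir), resolved);
    const escapes = rel === ".." || rel.startsWith(".." + path.sep);
    return !escapes && !path.isAbsolute(rel);
  });
}

// True only for http(s) URLs, so file:, javascript:, and custom schemes
// are rejected before a call such as shell.openExternal.
function isSafeExternalUrl(raw: string): boolean {
  try {
    const { protocol } = new URL(raw);
    return protocol === "https:" || protocol === "http:";
  } catch {
    return false; // not a parseable absolute URL
  }
}
```

Comparing resolved paths with `path.relative`, rather than a plain string prefix test, avoids treating `/home/u/workspace` as if it were inside `/home/u/work`.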
Similar Servers
UI-TARS-desktop
UI-TARS-desktop is a native GUI Agent application powered by multimodal AI models, enabling users to control their computer and browser through natural language instructions.
Windows-MCP
This MCP server enables AI agents to directly interact with the Windows operating system, performing tasks such as file navigation, application control, UI interaction, and QA testing.
Windows-MCP.Net
Enables AI assistants to automate tasks and interact with the Windows desktop environment.
mcp-vnc
An MCP server for AI agents to remotely control VNC-enabled desktops (Windows, Linux, macOS) through mouse, keyboard, text input, and screen capture commands.