UI-TARS-desktop
by bytedance
Overview
UI-TARS-desktop is a native GUI Agent application powered by multimodal AI models, enabling users to control their computer and browser through natural language instructions.
Installation
pnpm --filter @tarko/agent-server start:server
Environment Variables
- ARK_BASE_URL
- ARK_API_KEY
- DOUBAO_1_5_UI_TARS
- DOUBAO_1_5_VP
- DOUBAO_SEED_1_6
- O_TARS_BASE_URL
- O_TARS_5_MODEL_ID
- O_TARS_API_KEY
- O_TARS_6_MODEL_ID
- ANTHROPIC_BASE_URL
- ANTHROPIC_MODEL
- ANTHROPIC_API_KEY
- OPENAI_BASE_URL
- OPENAI_MODEL
- OPENAI_API_KEY
- CUSTOM_BASE_URL
- CUSTOM_MODEL
- CUSTOM_API_KEY
- AIO_BASE_URL
- AIO_TIMEOUT
- SANDBOX_JWT_URL
- NATIVE_THINKING
- STORAGE_TYPE
- UI_TARS_MODEL
- AIO_SANDBOX_URL
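Several of the variables above come in `*_BASE_URL`, `*_MODEL`, `*_API_KEY` triples (for example the `ANTHROPIC_`, `OPENAI_`, and `CUSTOM_` prefixes). The sketch below shows one way a startup script might read such a triple and fail early when a value is missing. It assumes the three variables sharing a prefix configure one provider together and are all needed, which the listing does not state. `ProviderConfig` and `readProviderConfig` are hypothetical names and are not part of UI-TARS-desktop.

```typescript
// Hypothetical shape for one model provider's settings (not from the project).
interface ProviderConfig {
  baseUrl: string;
  model: string;
  apiKey: string;
}

// Reads <PREFIX>_BASE_URL, <PREFIX>_MODEL and <PREFIX>_API_KEY from `env`.
// Assumption: all three are required together; the source does not confirm this.
function readProviderConfig(
  env: Record<string, string | undefined>,
  prefix: string
): ProviderConfig {
  const baseUrl = env[`${prefix}_BASE_URL`];
  const model = env[`${prefix}_MODEL`];
  const apiKey = env[`${prefix}_API_KEY`];

  // Collect the names of unset or empty variables for a readable error.
  const missing: string[] = [];
  if (!baseUrl) missing.push(`${prefix}_BASE_URL`);
  if (!model) missing.push(`${prefix}_MODEL`);
  if (!apiKey) missing.push(`${prefix}_API_KEY`);

  if (!baseUrl || !model || !apiKey) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { baseUrl, model, apiKey };
}
```

In a Node.js process this could be called as `readProviderConfig(process.env, "OPENAI")`. Taking the environment as a parameter keeps the function easy to test with a plain object.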
Security Notes
- Critical Vulnerability: The `NutJSOperator` (used for local computer control) calls `eval()` directly on an `expression` that can originate from LLM-generated content (`calculatorTool`). This allows arbitrary code execution, which is a severe risk if the LLM is compromised or jailbroken. The inline comment `// 注意:生产环境中使用安全的数学计算器` (Note: use a safe calculator in production) acknowledges this risk but does not mitigate it in the provided code.
- High-Risk GUI Automation: As a GUI agent, it can control the user's mouse, keyboard, and screen. If maliciously prompted, it could interact with critical applications, delete files, or exfiltrate sensitive data.
- Hardcoded Private Key: The file `apps/ui-tars/src/main/remote/app_private.ts` (implied by its name and by usage patterns seen in `auth.ts`) suggests a hardcoded application private key (`appPrivateKeyBase64`). Distributing a private key in a client-side application (an Electron app) is a major security flaw, because the key can be extracted and misused for impersonation or unauthorized access to remote services.
- Supply Chain Risk (Remote Presets): The CLI can fetch preset configurations from remote URLs (`gui-agent/cli/src/cli/start.ts`). If a remote server is compromised, it could serve malicious configurations.
- Third-Party Trust: The application relies on ByteDance's remote proxy services for some "free" and "subscription" remote computer/browser operators, which makes it dependent on the security and trustworthiness of external infrastructure.
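The `eval()` finding for `calculatorTool` can be illustrated with a minimal sketch of an alternative: a small recursive-descent parser that accepts only numbers, `+`, `-`, `*`, `/`, and parentheses, and rejects everything else. This is not the project's code. `tokenize` and `safeCalculate` are hypothetical names, and the supported grammar is an assumption about what a calculator tool needs.

```typescript
// Splits the input into number, operator and parenthesis tokens.
// Any other character (letters, quotes, semicolons, ...) is rejected.
function tokenize(expression: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < expression.length) {
    const ch = expression.charAt(i);
    if (ch === " " || ch === "\t") { i++; continue; }
    if ("+-*/()".indexOf(ch) !== -1) { tokens.push(ch); i++; continue; }
    if ((ch >= "0" && ch <= "9") || ch === ".") {
      let j = i;
      while (j < expression.length && /[0-9.]/.test(expression.charAt(j))) j++;
      const text = expression.slice(i, j);
      if (!/^(\d+\.?\d*|\.\d+)$/.test(text)) {
        throw new Error(`Invalid number "${text}"`);
      }
      tokens.push(text);
      i = j;
      continue;
    }
    throw new Error(`Unexpected character "${ch}" at position ${i}`);
  }
  return tokens;
}

// Evaluates an arithmetic expression without eval().
function safeCalculate(expression: string): number {
  const tokens = tokenize(expression);
  let pos = 0;

  // expr := term (("+" | "-") term)*
  function parseExpr(): number {
    let value = parseTerm();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm(): number {
    let value = parseFactor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := number | "-" factor | "(" expr ")"
  function parseFactor(): number {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("Unexpected end of expression");
    if (token === "-") return -parseFactor();
    if (token === "(") {
      const inner = parseExpr();
      if (tokens[pos++] !== ")") throw new Error("Expected ')'");
      return inner;
    }
    const value = Number(token);
    if (isNaN(value)) throw new Error(`Unexpected token "${token}"`);
    return value;
  }

  const result = parseExpr();
  if (pos !== tokens.length) throw new Error(`Unexpected token "${tokens[pos]}"`);
  return result;
}
```

With this sketch, `safeCalculate("(1 + 2) * 3")` returns `9`, while an input such as `process.exit(1)` throws at the first letter instead of being executed. Division by zero follows JavaScript number semantics and yields `Infinity` rather than an error, so a caller may want to check the result with `isFinite`.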
Similar Servers
Windows-MCP
This MCP server enables AI agents to directly interact with the Windows operating system, performing tasks such as file navigation, application control, UI interaction, and QA testing.
mcp-server-browserbase
Enables LLMs to perform cloud browser automation tasks such as navigating, interacting with elements, extracting data, and capturing screenshots on web pages.
mcp
This server provides Hyperbrowser's Model Context Protocol (MCP) interface, offering tools for web scraping, structured data extraction, crawling, and general-purpose browser automation using AI agents like OpenAI's CUA and Anthropic's Claude Computer Use.
seline
A backend API server for managing and executing ComfyUI workflows, capable of dynamically generating API endpoints for workflows, building Docker containers for custom nodes and models, and providing an execution queue. It integrates with the Model Context Protocol (MCP) to expose its capabilities to client applications.