UI-TARS-desktop
Verified Safe
by bytedance
Overview
A multimodal AI agent stack providing a native GUI agent desktop application (UI-TARS Desktop) and a general CLI/Web UI agent (Agent TARS) for controlling computers, browsers, and mobile devices using natural language, integrating various real-world tools via the Model Context Protocol (MCP).
Installation
npx @agent-tars/cli@latest serve --provider <your-provider> --model <your-model> --apiKey <your-api-key>
Environment Variables
- ARK_BASE_URL
- ARK_API_KEY
- DOUBAO_1_5_UI_TARS
- DOUBAO_1_5_VP
- DOUBAO_SEED_1_6
- ANTHROPIC_BASE_URL
- ANTHROPIC_MODEL
- ANTHROPIC_API_KEY
- OPENAI_BASE_URL
- OPENAI_MODEL
- OPENAI_API_KEY
- O_TARS_BASE_URL
- O_TARS_5_MODEL_ID
- O_TARS_6_MODEL_ID
- O_TARS_API_KEY
- CUSTOM_BASE_URL
- CUSTOM_MODEL
- CUSTOM_API_KEY
- AIO_BASE_URL
- AIO_TIMEOUT
- AIO_SANDBOX_URL
- SANDBOX_JWT_URL
- TAVILY_API_KEY
- GOOGLE_API_KEY
- GOOGLE_MCP_URL
- LINK_READER_MCP_URL
- UI_TARS_MODEL
- BING_SEARCH_API_KEY
- UI_TARS_PROXY_HOST
- NATIVE_THINKING
- STORAGE_TYPE
- LLM_PROXY_URL
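The listing does not say which of these variables a given provider needs, so it can help to check a chosen set before starting the agent. The sketch below is one way to do that. The variable names come from the list above, but the grouping into an "OpenAI" set and the `missingVars` helper are assumptions made for illustration, not part of the project.

```typescript
// A map of variable names to values, shaped like Node's process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: return the names in `required` that are unset or blank.
function missingVars(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Assumed grouping for illustration only; the names are taken from the list above.
const OPENAI_VARS = ["OPENAI_BASE_URL", "OPENAI_MODEL", "OPENAI_API_KEY"] as const;

// Example with an in-memory environment: one variable absent, one blank.
const sampleEnv: Env = { OPENAI_MODEL: "some-model", OPENAI_API_KEY: "   " };
const missing = missingVars(sampleEnv, OPENAI_VARS);
console.log(missing.join(", "));
```

In a Node process, the same helper could be called as `missingVars(process.env, OPENAI_VARS)` and the result used to stop early with a readable message.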
Security Notes
The project demonstrates awareness of security best practices: it uses `secretlint` to flag potential hardcoded secrets (which are supplied through environment variables instead), implements JWT authentication for remote interactions, and validates file paths to prevent directory traversal in filesystem operations. Permissions for macOS system access (e.g., accessibility, screen recording) are handled explicitly. However, reliance on external proxy services (`UI_TARS_PROXY_HOST`) makes the project dependent on the security of that third-party infrastructure. There are no immediate signs of direct `eval` usage in critical agent logic, though commented-out examples exist in helper utilities. The use of `clipboard.setContent` for typing on Windows is a common automation technique.
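The path validation mentioned above can be illustrated with a minimal sketch. This is not the project's code: `resolveInsideBase` and the example directory are hypothetical, and the approach shown (resolve the requested path against a base directory, then reject anything whose relative path climbs out of it) is just one common way to guard against directory traversal in Node.js.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve `requested` against `baseDir` and throw if the
// result lies outside `baseDir`. Symlinks are not resolved in this sketch.
function resolveInsideBase(baseDir: string, requested: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const relative = path.relative(base, target);

  // A relative path that is "..", starts with "../", or is absolute
  // (e.g. a different drive on Windows) points outside the base directory.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`Path escapes base directory: ${requested}`);
  }
  return target;
}

// Example: a nested file is accepted, a traversal attempt is rejected.
const allowed = resolveInsideBase("/srv/workspace", "notes/a.txt");
let rejected = false;
try {
  resolveInsideBase("/srv/workspace", "../../etc/passwd");
} catch {
  rejected = true;
}
console.log(allowed, rejected);
```

Comparing via `path.relative` rather than a plain string prefix avoids accepting a sibling directory such as `/srv/workspace-other`, which shares the prefix but is outside the base.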
Similar Servers
Windows-MCP
Enables AI agents to interact with the Windows operating system for tasks such as file navigation, application control, UI interaction, and QA testing.
mcp-server-browserbase
Provides cloud browser automation capabilities, enabling LLMs to interact with web pages, take screenshots, extract information, and perform automated actions.
mcp
This server provides Hyperbrowser's Model Context Protocol (MCP) interface, offering tools for web scraping, structured data extraction, crawling, and general-purpose browser automation using AI agents like OpenAI's CUA and Anthropic's Claude Computer Use.
MCPControl
A Windows control server for the Model Context Protocol, enabling AI models to programmatically control system operations such as mouse, keyboard, window management, and screen capture.