21 commits
17 changes: 8 additions & 9 deletions CLAUDE.md
@@ -125,13 +125,12 @@ models.register("my-model", custom_model_instance)

### Agent OS Abstraction

`AgentOs` provides an abstraction layer for OS-level operations:
`ComputerAgentOS` provides an abstraction layer for OS-level operations:

```
AgentOs (Abstract Interface)
├── AskUiControllerClient (gRPC to AskUI Agent OS - primary)
ComputerAgentOS (Abstract Interface)
├── MultiComputerTargetAgentOS (gRPC to AskUI Agent OS - primary)
├── PlaywrightAgentOs (Web browser automation)
└── AndroidAgentOs (Android ADB)
```
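
As a rough illustration of the hierarchy above, the sketch below models an abstract OS interface with two interchangeable backends and a toolbox that wraps whichever backend it is given. Every name in it (`OsBackend`, `GrpcBackend`, `BrowserBackend`, `Toolbox`, `type_text`) is a hypothetical stand-in chosen for the example, not one of AskUI's actual classes or method signatures.

```python
from abc import ABC, abstractmethod


class OsBackend(ABC):
    """Hypothetical stand-in for an abstract OS-level interface."""

    @abstractmethod
    def type_text(self, text: str) -> str:
        """Send text to the target and return a description of the action."""


class GrpcBackend(OsBackend):
    """Hypothetical backend that would talk to a controller over gRPC."""

    def type_text(self, text: str) -> str:
        return f"grpc typed: {text}"


class BrowserBackend(OsBackend):
    """Hypothetical backend that would drive a web browser."""

    def type_text(self, text: str) -> str:
        return f"browser typed: {text}"


class Toolbox:
    """Wraps a backend (composition) instead of subclassing it."""

    def __init__(self, os_backend: OsBackend) -> None:
        self.os = os_backend

    def type_text(self, text: str) -> str:
        # Delegate to whichever backend was injected.
        return self.os.type_text(text)


if __name__ == "__main__":
    for backend in (GrpcBackend(), BrowserBackend()):
        print(Toolbox(backend).type_text("hello"))
```

Callers only depend on the abstract interface, so swapping one backend for another does not change the calling code.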

### Locator System
@@ -175,7 +174,7 @@ Tools are auto-discovered and can be dynamically loaded via MCP configurations.
- `src/askui/prompts/` - System prompts for different models

### Tools & OS
- `src/askui/tools/agent_os.py` - Abstract `AgentOs` interface
- `src/askui/tools/agent_os.py` - Abstract `ComputerAgentOS` interface
- `src/askui/tools/askui/` - gRPC client for AskUI Agent OS
- `src/askui/tools/android/` - Android-specific tools
- `src/askui/tools/playwright/` - Web automation tools
@@ -247,7 +246,7 @@ When writing or updating documentation in `docs/`:
## Important Patterns

### Composition over Inheritance
- `AgentToolbox` wraps `AgentOs` implementations
- `AgentToolbox` wraps `ComputerAgentOS` implementations
- `ModelRouter` composes multiple model providers
- `CompositeReporter` aggregates multiple reporters

@@ -261,7 +260,7 @@ When writing or updating documentation in `docs/`:
- Retry strategies with exponential backoff

### Adapter Pattern
- `AgentOs` abstraction bridges OS implementations (gRPC, Playwright, ADB)
- `ComputerAgentOS` abstraction bridges OS implementations (gRPC, Playwright, ADB)
- `ModelFacade` adapts models to `ActModel`/`GetModel`/`LocateModel` interfaces

### Dependency Injection
@@ -299,13 +298,13 @@ When writing or updating documentation in `docs/`:
### Adding Custom Tools
1. Implement `Tool` protocol in `models/shared/tools.py`
2. Register in appropriate MCP server (`api/mcp_servers/{type}.py`)
3. Use `@auto_inject_agent_os` for AgentOs dependency
3. Use `@auto_inject_agent_os` for ComputerAgentOS dependency
4. Follow Pydantic schema validation

### Adding New Agent Types
1. Inherit from `Agent`
2. Implement required abstract methods
3. Provide appropriate `AgentOs` implementation
3. Provide appropriate `ComputerAgentOS` implementation
4. Register in agent factory if needed

## Performance & Caching
2 changes: 1 addition & 1 deletion docs/07_tools.md
@@ -68,7 +68,7 @@ Work with any agent type, no special dependencies required.

#### Computer Tools (`computer/`)

Require `AgentOs` and work with `ComputerAgent` for desktop automation.
Require `ComputerAgentOS` and work with `ComputerAgent` for desktop automation.

**Examples:**
- `ComputerSaveScreenshotTool(base_dir)` - Save screenshots to disk
2 changes: 2 additions & 0 deletions mypy.ini
@@ -16,6 +16,8 @@ plugins = pydantic.mypy,sqlalchemy.ext.mypy.plugin
exclude = (?x)(
^src/askui/models/ui_tars_ep/ui_tars_api\.py$
| ^src/askui/tools/askui/askui_ui_controller_grpc/.*$
| ^venv/.*$
| ^\.venv/.*$
)
mypy_path = src:tests
explicit_package_bases = true
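
To sanity-check what the extended `exclude` pattern covers, the same verbose-mode regular expression can be tried with Python's `re` module. This is only a sketch: it assumes the pattern is searched against forward-slash paths relative to the project root, and the helper name `is_excluded` and the sample paths are made up for the example.

```python
import re

# Same verbose-mode pattern as the `exclude` value in mypy.ini above.
# With (?x), whitespace and newlines inside the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)(
    ^src/askui/models/ui_tars_ep/ui_tars_api\.py$
    | ^src/askui/tools/askui/askui_ui_controller_grpc/.*$
    | ^venv/.*$
    | ^\.venv/.*$
    )"""
)


def is_excluded(path: str) -> bool:
    """Return True if the (hypothetical) relative path matches the pattern."""
    return EXCLUDE.search(path) is not None


if __name__ == "__main__":
    for sample in (
        "venv/lib/site-packages/pkg.py",
        ".venv/lib/site-packages/pkg.py",
        "src/askui/agent_base.py",
        "src/venv/helper.py",
    ):
        print(sample, is_excluded(sample))
```

Because both new alternatives are anchored with `^`, only a top-level `venv/` or `.venv/` directory is matched; a nested directory such as `src/venv/` is not.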
3 changes: 3 additions & 0 deletions src/askui/__init__.py
@@ -45,6 +45,7 @@
from .models.types.response_schemas import ResponseSchema, ResponseSchemaBase
from .retry import ConfigurableRetry, Retry
from .tools import ModifierKey, PcKey
from .tools.askui import LocalComputerTarget, RemoteComputerTarget
from .utils.image_utils import ImageSource
from .utils.source_utils import InputSource

@@ -69,6 +70,8 @@
logging.getLogger(__name__).addHandler(logging.NullHandler())

__all__ = [
"RemoteComputerTarget",
"LocalComputerTarget",
"Agent",
"AutomationError",
"ComputerAgent",
4 changes: 2 additions & 2 deletions src/askui/agent_base.py
@@ -26,7 +26,7 @@
from askui.models.shared.truncation_strategies import TruncationStrategy
from askui.prompts.act_prompts import CACHE_USE_PROMPT, create_default_prompt
from askui.telemetry.otel import OtelSettings, setup_opentelemetry_tracing
from askui.tools.agent_os import AgentOs
from askui.tools.agent_os import ComputerAgentOS
from askui.tools.android.agent_os import AndroidAgentOs
from askui.tools.caching_tools import (
InspectCacheMetadata,
@@ -57,7 +57,7 @@ def __init__(
reporter: Reporter | None = None,
retry: Retry | None = None,
tools: list[Tool] | None = None,
agent_os: AgentOs | AndroidAgentOs | None = None,
agent_os: ComputerAgentOS | AndroidAgentOs | None = None,
settings: AgentSettings | None = None,
callbacks: list[ConversationCallback] | None = None,
truncation_strategy: TruncationStrategy | None = None,
77 changes: 68 additions & 9 deletions src/askui/computer_agent.py
@@ -17,11 +17,13 @@
create_computer_agent_prompt,
)
from askui.tools.computer import (
ComputerGetCurrentComputerTargetIdTool,
ComputerGetMousePositionTool,
ComputerGetSystemInfoTool,
ComputerKeyboardPressedTool,
ComputerKeyboardReleaseTool,
ComputerKeyboardTapTool,
ComputerListAgentOsTargetComputersTool,
ComputerListDisplaysTool,
ComputerMouseClickTool,
ComputerMouseHoldDownTool,
@@ -31,14 +33,15 @@
ComputerRetrieveActiveDisplayTool,
ComputerScreenshotTool,
ComputerSetActiveDisplayTool,
ComputerSwitchAgentOsTargetComputerTool,
ComputerTypeTool,
)
from askui.tools.exception_tool import ExceptionTool

from .reporting import CompositeReporter, Reporter
from .retry import Retry
from .tools import AgentToolbox, ComputerAgentOsFacade, ModifierKey, PcKey
from .tools.askui import AskUiControllerClient
from .tools.askui import ComputerTarget, MultiComputerTargetAgentOS

logger = logging.getLogger(__name__)

@@ -50,17 +53,39 @@ class ComputerAgent(Agent):
This agent can perform various UI interactions like clicking, typing, scrolling, and more.
It uses computer vision models to locate UI elements and execute actions on them.

A single `ComputerAgent` can drive **one or more machines** through the
`agent_os_target_computers` argument. Each entry is an Agent OS target
computer (local subprocess or remote gRPC endpoint) identified by a stable
`computer_id`. At any moment one target is *active* and receives all
explicit calls (`click`, `type`, `keyboard`, ...). The active target can be
changed at runtime via
`agent.tools.os.switch_agent_os_target_computer(computer_id)` or scoped to a
block using `agent.tools.os.temporary_select(computer_id)`. The `act()`
model is also given list/switch/get-current tools so it can orchestrate
work across machines on its own (e.g. read something on one computer and
re-enter it on another).

Args:
display (int, optional): The display number to use for screen interactions. Defaults to `1`.
display (int, optional): The display number to use for screen interactions on the default local target. Ignored when `agent_os_target_computers` is provided. Defaults to `1`.
reporters (list[Reporter] | None, optional): List of reporter instances for logging and reporting. If `None`, an empty list is used.
tools (AgentToolbox | None, optional): Custom toolbox instance. If `None`, a default one will be created with `AskUiControllerClient`.
agent_os_target_computers (list[ComputerTarget] | None, optional):
Target computers the agent can route actions to. May mix one
`LocalComputerTarget` (managing a controller subprocess on this
machine) with any number of `RemoteComputerTarget`s pointing at
controllers already running on other machines. Constraints: at
least one target, at most one local, and remote `address`es plus
all `computer_id`s must be unique. The first entry becomes the
initial active target. Defaults to a single local target bound to
`display`.
settings (AgentSettings | None, optional): Provider-based model settings. If `None`, uses the default AskUI model stack.
retry (Retry, optional): The retry instance to use for retrying failed actions. Defaults to `ConfigurableRetry` with exponential backoff. Currently only supported for `locate()` method.
act_tools (list[Tool] | None, optional): Additional tools to make available for
the `act()` method for every call. Same tools can instead be passed per call
via `act(..., tools=[...])` (see example below).

Example:
Single local machine (the default):

```python
from askui import ComputerAgent

@@ -70,6 +95,36 @@ class ComputerAgent(Agent):
agent.act("Open settings menu")
```

Example:
Research on one machine and write up the findings on another. The
first target in the list is the active one; `temporary_select`
re-routes a block of explicit calls and restores the previous
active target on exit.

```python
from askui import ComputerAgent
from askui.tools.askui import LocalComputerTarget, RemoteComputerTarget

with ComputerAgent(
    agent_os_target_computers=[
        LocalComputerTarget(computer_id="research-box"),
        RemoteComputerTarget(
            address="192.168.1.42:26000",
            description="Writer box with a text editor open",
            computer_id="writer-box",
        ),
    ],
) as agent:
    agent.act(
        "On research-box, open a browser, google 'askui', and read "
        "the top results to gather key facts about what AskUI is, "
        "what it does, and notable features. Then switch to "
        "writer-box and write a Markdown document titled "
        "'AskUI Findings' summarizing those facts as a bulleted "
        "list in the open text editor."
    )
```

Example (optional tools for `act()`):
Register tools from `askui.tools.store` (or your own `Tool` implementations)
either on the agent so they apply to all `act()` calls, or only for one call.
@@ -94,30 +149,31 @@ class ComputerAgent(Agent):
@telemetry.record_call(
exclude={
"reporters",
"tools",
"settings",
"act_tools",
"callbacks",
"truncation_strategy",
"agent_os_target_computers",
}
)
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def __init__(
self,
display: Annotated[int, Field(ge=1)] = 1,
reporters: list[Reporter] | None = None,
tools: AgentToolbox | None = None,
agent_os_target_computers: list[ComputerTarget] | None = None,
settings: AgentSettings | None = None,
retry: Retry | None = None,
act_tools: list[Tool] | None = None,
callbacks: list[ConversationCallback] | None = None,
truncation_strategy: TruncationStrategy | None = None,
) -> None:
reporter = CompositeReporter(reporters=reporters)
self.tools = tools or AgentToolbox(
agent_os=AskUiControllerClient(
self.tools = AgentToolbox(
agent_os=MultiComputerTargetAgentOS(
display=display,
reporter=reporter,
agent_os_target_computers=agent_os_target_computers,
)
)
super().__init__(
@@ -500,8 +556,8 @@ def cli(

with ComputerAgent() as agent:
# Use for Windows
agent.cli(r'start "" "C:\Program Files\VideoLAN\VLC\vlc.exe"') # Start in VLC non-blocking
agent.cli(r'"C:\Program Files\VideoLAN\VLC\vlc.exe"') # Start in VLC blocking
agent.cli(r'start "" "C:\\Program Files\\VideoLAN\\VLC\\vlc.exe"') # Start in VLC non-blocking
agent.cli(r'"C:\\Program Files\\VideoLAN\\VLC\\vlc.exe"') # Start in VLC blocking

# Mac
agent.cli("open -a chrome") # Open Chrome non-blocking for mac
@@ -541,6 +597,9 @@ def get_default_tools() -> list[Tool]:
ComputerListDisplaysTool(),
ComputerRetrieveActiveDisplayTool(),
ComputerSetActiveDisplayTool(),
ComputerListAgentOsTargetComputersTool(),
ComputerSwitchAgentOsTargetComputerTool(),
ComputerGetCurrentComputerTargetIdTool(),
]
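
The `ComputerAgent` docstring in this file describes three behaviours: constraints on the target list, "first entry is active", and a `temporary_select` block that restores the previous target on exit. A minimal sketch of those semantics follows. It is not the PR's `MultiComputerTargetAgentOS`; `Target`, `TargetRouter`, and their fields are hypothetical names, and a target with `address=None` is assumed to represent the local machine.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical target; `address=None` marks the local machine."""

    computer_id: str
    address: Optional[str] = None


class TargetRouter:
    """Hypothetical router that tracks which target is currently active."""

    def __init__(self, targets: list[Target]) -> None:
        # Constraints as stated in the docstring: at least one target,
        # at most one local, unique ids, unique remote addresses.
        if not targets:
            raise ValueError("at least one target is required")
        if sum(t.address is None for t in targets) > 1:
            raise ValueError("at most one local target is allowed")
        ids = [t.computer_id for t in targets]
        if len(set(ids)) != len(ids):
            raise ValueError("computer_id values must be unique")
        addresses = [t.address for t in targets if t.address is not None]
        if len(set(addresses)) != len(addresses):
            raise ValueError("remote addresses must be unique")
        self._targets = {t.computer_id: t for t in targets}
        self.active_id = targets[0].computer_id  # first entry starts active

    def switch(self, computer_id: str) -> None:
        if computer_id not in self._targets:
            raise KeyError(computer_id)
        self.active_id = computer_id

    @contextmanager
    def temporary_select(self, computer_id: str) -> Iterator[None]:
        previous = self.active_id
        self.switch(computer_id)
        try:
            yield
        finally:
            # Restore the previous target even if the block raised.
            self.active_id = previous


if __name__ == "__main__":
    router = TargetRouter(
        [Target("research-box"), Target("writer-box", "192.168.1.42:26000")]
    )
    with router.temporary_select("writer-box"):
        print("inside block:", router.active_id)
    print("after block:", router.active_id)
```

Restoring the previous target in a `finally` clause keeps routing predictable when an action inside the block fails.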


6 changes: 3 additions & 3 deletions src/askui/models/shared/android_base_tool.py
@@ -2,7 +2,7 @@

from askui.models.shared.tool_tags import ToolTags
from askui.models.shared.tools import ToolWithAgentOS
from askui.tools import AgentOs
from askui.tools import ComputerAgentOS
from askui.tools.agent_os_type_error import AgentOsTypeError
from askui.tools.android.agent_os import AndroidAgentOs

@@ -41,11 +41,11 @@ def agent_os(self) -> AndroidAgentOs:
return agent_os

@agent_os.setter
def agent_os(self, agent_os: AgentOs | AndroidAgentOs) -> None:
def agent_os(self, agent_os: ComputerAgentOS | AndroidAgentOs) -> None:
"""Set the agent OS.

Args:
agent_os (AgentOs | AndroidAgentOs): The agent OS instance to set.
agent_os (ComputerAgentOS | AndroidAgentOs): The agent OS instance to set.

Raises:
TypeError: If the agent OS is not an AndroidAgentOs instance.
25 changes: 13 additions & 12 deletions src/askui/models/shared/computer_base_tool.py
@@ -2,17 +2,17 @@

from askui.models.shared.tool_tags import ToolTags
from askui.models.shared.tools import ToolWithAgentOS
from askui.tools.agent_os import AgentOs
from askui.tools.agent_os import ComputerAgentOS
from askui.tools.agent_os_type_error import AgentOsTypeError
from askui.tools.android.agent_os import AndroidAgentOs


class ComputerBaseTool(ToolWithAgentOS):
"""Tool base class that has an AgentOs available."""
"""Tool base class that has a ComputerAgentOS available."""

def __init__(
self,
agent_os: AgentOs | None = None,
agent_os: ComputerAgentOS | None = None,
required_tags: list[str] | None = None,
**kwargs: Any,
) -> None:
@@ -23,33 +23,34 @@ def __init__(
)

@property
def agent_os(self) -> AgentOs:
def agent_os(self) -> ComputerAgentOS:
"""Get the agent OS.

Returns:
AgentOs: The agent OS instance.
ComputerAgentOS: The agent OS instance.
"""
agent_os = super().agent_os
if not isinstance(agent_os, AgentOs):
if not isinstance(agent_os, ComputerAgentOS):
raise AgentOsTypeError(
expected_type=AgentOs,
expected_type=ComputerAgentOS,
actual_type=type(agent_os),
)
return agent_os

@agent_os.setter
def agent_os(self, agent_os: AgentOs | AndroidAgentOs) -> None:
def agent_os(self, agent_os: ComputerAgentOS | AndroidAgentOs) -> None:
"""Set the agent OS facade.

Args:
agent_os (AgentOs | AndroidAgentOs): The agent OS facade instance to set.
agent_os (ComputerAgentOS | AndroidAgentOs): The agent OS facade
instance to set.

Raises:
TypeError: If the agent OS is not an AgentOs instance.
TypeError: If the agent OS is not a ComputerAgentOS instance.
"""
if not isinstance(agent_os, AgentOs):
if not isinstance(agent_os, ComputerAgentOS):
raise AgentOsTypeError(
expected_type=AgentOs,
expected_type=ComputerAgentOS,
actual_type=type(agent_os),
)
self._agent_os = agent_os
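
The guarded property in `ComputerBaseTool` checks the OS type on both read and write. The standalone sketch below shows that pattern in isolation. `DesktopOs`, `PhoneOs`, `OsTypeError`, and `DesktopTool` are hypothetical names, and making the error a `TypeError` subclass is an assumption suggested by the docstring's `Raises: TypeError`, not something the diff confirms.

```python
from typing import Optional


class DesktopOs:
    """Hypothetical desktop OS handle."""


class PhoneOs:
    """Hypothetical mobile OS handle."""


class OsTypeError(TypeError):
    """Hypothetical error carrying the expected and actual types."""

    def __init__(self, expected_type: type, actual_type: type) -> None:
        super().__init__(
            f"expected {expected_type.__name__}, got {actual_type.__name__}"
        )


class DesktopTool:
    """Tool that only accepts a DesktopOs, checked on get and on set."""

    def __init__(self, agent_os: Optional[DesktopOs] = None) -> None:
        self._agent_os: object = agent_os

    @property
    def agent_os(self) -> DesktopOs:
        # Reading before a valid OS was injected fails loudly.
        if not isinstance(self._agent_os, DesktopOs):
            raise OsTypeError(DesktopOs, type(self._agent_os))
        return self._agent_os

    @agent_os.setter
    def agent_os(self, value: object) -> None:
        # Reject the wrong OS family at assignment time.
        if not isinstance(value, DesktopOs):
            raise OsTypeError(DesktopOs, type(value))
        self._agent_os = value


if __name__ == "__main__":
    tool = DesktopTool()
    tool.agent_os = DesktopOs()
    print(type(tool.agent_os).__name__)
```

Checking in the setter surfaces a mismatched OS when it is injected, rather than later when the tool first runs.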