Skip to content

day50-dev/Petsitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

102 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

logo

Petsitter is an OpenAI-compatible proxy that layers smart harnesses on top of language models to give them capabilities they don't natively have. It also makes finicky behaviors reliable and dependable.

You install it, point it at a model, load a few example tricks, and suddenly things that model couldn't do before - tool calling, structured JSON, multi-step reasoning - start working. Then you think: "oh, I could make it do X" - and you write your own trick.

The built-in tricks are starting points. Tweak them, combine them, or use them as a reference to build something entirely different. Petsitter isn't a turnkey product; it's a kit.

How It Works

Petsitter_Intelligent_Proxy_-_Slide_2a

Petsitter intercepts every request/response pair and runs it through a pipeline of hooks. Each trick picks which hooks it needs:

  1. system_prompt - Inject instructions before the model sees the conversation
  2. pre_hook - Modify messages or inject tool definitions before the API call
  3. post_hook - Validate, retry, or transform the model's response
  4. info - Declare capabilities back to your application

A trick can be as simple as appending a sentence to the system prompt, or as involved as routing subtasks to three different models in parallel. There's a GUI at / for loading/unloading tricks, editing trickset filters, browsing logs, and pointing at different models at runtime.

You can also edit tricks, reorder them, disable, add new ones, and filter them through a built-in dashboard: 2026-07-04_15-13

Petsitter is part of the DAY50 suite of open-source tools for local AI workflows and constructing better agents.

  • No model changes required - Works with any OpenAI-compatible endpoint
  • Pluggable architecture - Write your own tricks in Python. (Skills are included in .agents)
  • Transparent to your app - Point your existing code at petsitter instead of the model
  • Mix and match - Combine multiple tricks for compound effects

Quick Start

# Start your model backend (e.g., Ollama)
ollama serve

# Activate the virtual environment
source .venv/bin/activate

# Run petsitter with tricks
./petsitter -u http://localhost:11434 \
            -m llama3:8b \
            -t tricks/json_mode.py \
            -t tricks/tool_call.py \
            -l localhost:8080

Now point your AI applications to http://localhost:8080/v1.

CLI Options

Option Short Description
--url -u Base URL of upstream model (e.g., http://localhost:11434)
--model -m Model name (optional for vllm, sglang, llama.cpp)
--key -k API key for upstream (if required)
--trick -t Path to a trick module (can be repeated)
--trick-config -tc Path to a trickset JSON file (can be repeated)
--model-config -mc Path to a model config JSON file (MAS URIs for multi-model tricks)
--listen -l Host:port to listen on (default: localhost:8080)

Creating Custom Tricks

flowchart TD
  A[Client POST] --> B
  A -.-> K[Prompt keyword scan]
  subgraph config[Reorderable via config]
    B[Trickset match] --> C[Keyword activate]
    C --> D[System prompt]
    D --> E[Pre-hook]
  end
  E --> L[LLM call]
  L --> F[Post-hook]
  F --> G[Capabilities]
  G --> Z[Client response]
  K -.-> Z
Loading

The Trick class has four hooks you can implement. Each hook is optional.

system_prompt(to_add: str) -> str

When: Called once per request, before any messages are sent to the model.

Purpose: Append instructions to the system prompt. This is how you "prime" the model to behave a certain way.

Example:

def system_prompt(self, to_add: str) -> str:
    return "IMPORTANT: Respond only in valid JSON. No markdown, no explanations."

pre_hook(context: list, params: dict) -> list

When: Called after the system prompt is set, before the model receives the messages.

Purpose: Modify the conversation context. You can inject tool definitions, add few-shot examples, or restructure messages.

Parameters:

  • context: List of message dicts ([{"role": "user", "content": "..."}])
  • params: Request parameters including tools, temperature, etc.

Example:

def pre_hook(self, context: list, params: dict) -> list:
    if "tools" in params:
        tools_json = json.dumps(params["tools"])
        context[0]["content"] += f"\n\nAvailable tools: {tools_json}"
    return context

post_hook(context: list) -> list

When: Called after the model responds, before the response goes back to your application.

Purpose: Validate, transform, or retry. This is where you can:

  • Parse the response and convert it to a different format
  • Detect when the model failed and call it again with feedback
  • Extract tool calls from natural language

Example (JSON validation with retry):

def post_hook(self, context: list) -> list:
    attempts = 3
    while attempts > 0:
        try:
            json.loads(context[-1]["content"])
            break
        except json.JSONDecodeError:
            attempts -= 1
            if attempts == 0:
                break
            context = callmodel(context, "That wasn't valid JSON. Try again.")
    return context

Example (Tool call detection):

def post_hook(self, context: list) -> list:
    content = context[-1]["content"]
    if self._looks_like_tool_call(content):
        context[-1]["tool_calls"] = [self._parse_tool_call(content)]
        context[-1]["content"] = None
    return context

info(capabilities: dict) -> dict

When: Called when building the response to your application.

Purpose: Declare what capabilities this trick provides. Some frameworks check for capabilities before using certain features.

Example:

def info(self, capabilities: dict) -> dict:
    capabilities["json_mode"] = True
    capabilities["tools_support"] = True
    return capabilities

Keywords

Activation

Set keywords on your trick class to activate only when the user includes that word in their message - the keyword is stripped before the model sees it. See tricks/multiround.py for a working example.

# Trick fires when "multiround" is present
curl http://localhost:8080/v1/chat/completions \
  -d '{"messages":[{"role":"user","content":"multiround explain the CAP theorem"}]}'

# Trick does nothing without the keyword
curl http://localhost:8080/v1/chat/completions \
  -d '{"messages":[{"role":"user","content":"explain the CAP theorem"}]}'

Prompts

Prompt keywords let you inject commands to petsitter itself inline in your message using the format (<keyword>: <request>). The framework scans for registered keywords, strips the matching pattern before the model sees it, and routes the request to the appropriate handler.

This is separate from trick keyword activation - keywords activate or deactivate tricks for the current request, while prompt keywords are commands to petsitter that bypass the model entirely.

How to register a prompt keyword

Set prompt_keyword on your Trick subclass:

class MyCommandTrick(Trick):
    prompt_keyword = "mycommand"
    __brief__ = "Handles (mycommand: ...) inline requests"

    def handle_prompt_keyword(self, request: str) -> dict | None:
        return {"role": "assistant", "content": f"You asked: {request}"}

The method receives the text after mycommand: and can return:

  • A message dict - injected as the model response (bypasses the upstream call)
  • None - the pattern is stripped but the normal pipeline continues

Notes

  • Execution goes in order of the prompt reference. Unrecognized prompt keywords are passed through and surface as a non-critical error in the response along with the rest of the response
  • The pattern (<keyword>: <request>) uses the first closing paren - avoid nested parens in the request text.
  • Keyword matching is case-insensitive.
  • If the handler raises, an error message is returned as the assistant response.

Reference Templates

Output Control

Capability Injection

  • Tool Calling - Add tool calling to models without native support
  • Andybot Toolcall - Conversational persona tool calling for small/older models (experimental)

Pipeline

  • Kennel - Route cognitive subtasks to specialized models

Security

  • Secrets Protector - Detect and pseudonymize secrets/PII before they reach the model

Agent

  • Self-Improver - Runtime agent that can add, modify, and list tricks

JSON Mode

tricks/json_mode.py

Enforces valid JSON output by adding formatting instructions to the system prompt, stripping markdown code blocks, and retrying with feedback if the response isn't valid JSON.

./petsitter -u http://localhost:11434 -t tricks/json_mode.py

Code Validator

tricks/code_validator.py

After the model proposes a code change, asks it to describe what the change does, compares the description against the original user request, and retries with feedback if they don't match.

./petsitter -u http://localhost:11434 -t tricks/code_validator.py

Tool Calling

tricks/tool_call.py

Enables tool calling for models without native support by injecting tool definitions into the prompt, parsing JSONRPC-style tool call responses, and converting them to OpenAI tool_calls format.

./petsitter -u http://localhost:11434 -t tricks/tool_call.py

Andybot Toolcall

tricks/andybot_toolcall.py - experimental

An alternative approach to tool calling that uses a conversational persona instead of structured JSON output. The model says DEAR ANDYBOT, <FUNCTION> and ANDYBOT collects each required parameter through dialogue:

  1. Model recognises it needs to call a tool and says DEAR ANDYBOT, GET_WEATHER
  2. ANDYBOT asks: "Can you provide location?"
  3. Model responds: Paris
  4. ANDYBOT builds the tool call and returns it to the application

This works well with small models (3B and under) and older models that struggle with reliable JSON output or native tool_calls. The conversational flow lets them express intent naturally instead of wrestling with syntax.

petsitter -u http://localhost:11434 -t tricks/andybot_toolcall.py -t tricks/json_mode.py

For a more advanced version with inline-argument parsing, confusion recovery, and multi-turn state management, see tricks/conversational_tool.py.

Multi-Model Orchestration

A trick has full control of the request lifecycle - it can call any number of models, not just the one the user pointed at. This lets you decompose a problem into subtasks and route each one to the model best suited for it.

Petsitter supports this through model configs - JSON files that map role names to MAS URIs (url#m=model_name). Tricks declare what roles they need; if a key is missing, petsitter prints a helpful error.

Example modelset.json:

{
    "default": "http://localhost:11434#m=Qwen3.5:8b",
    "thinker": "http://localhost:11434#m=VibeThinker-3B-GGUF:q4_K_M",
    "toolcall": "http://localhost:11434#m=lfm2.5:latest"
}

Kennel

tricks/kennel.py is a reference implementation of the pattern above. It routes cognitive subtasks to three specialized models running in parallel - a thinker for chain-of-thought, a tool-caller for deciding which tools to invoke, and an emitter for generating the final response.

# Pull three small models that together fit on modest hardware (< 6B total)
ollama pull VibeThinker-3B    # reasoning / chain-of-thought
ollama pull LFM2.5-230M       # tool-calling (tiny, fast)
ollama pull Qwen3.5-2B        # response generation

# Each model sees a context optimized for its role
./petsitter -mc examples/modelset.json \
            -t tricks/kennel.py

Pipeline:

  1. Thinker gets the conversation + "think step by step" → produces reasoning
  2. Tool-caller (if tools are present) gets context + reasoning + tool definitions → decides which tool to call
  3. Emitter receives the enriched context and generates the final response

Kennel is one architecture; you could write a trick that routes by language, by file type, by user role, or by anything else you can express in a post_hook.

Secrets Protector

tricks/secrets_protector.py

Detects and pseudonymizes sensitive information before it reaches the model, then restores original values in the response:

  • Detection - regex patterns for API keys (OpenAI, Anthropic, AWS, Google, Stripe), tokens (JWT, GitHub, Slack, Bearer), credentials (database URLs, private keys), and PII (emails, phones, SSNs, credit cards, IPs)
  • Format-preserving substitutes - realistic replacements (e.g., alice@example.comuser.0001@sanitized.local) that preserve token boundaries so the model's tokenizer doesn't conflate distinct entries
  • Bidirectional vault - consistent pseudonyms across the session (same secret → same substitute) with automatic restoration in both natural-language responses and tool call arguments
./petsitter -u http://localhost:11434 -t tricks/secrets_protector.py

Self-Improver

tricks/self_improver.py

Watches for the prompt keyword petsitter in your messages. When it sees (petsitter: <request>), it strips the tag and spawns an agent loop with the default model. The agent has tools to add, modify, and list trick files - it reads instructions from .agents/skills/self-improver/SKILL.md to understand the petsitter trick API and conventions.

This is a reference implementation for the prompt keywords pattern (see below).

petsitter -u http://localhost:11434 -t tricks/self_improver.py

Example usage:

User: (petsitter: add a trick that logs every request to a file)
Model: Creates tricks/request_logger.py and explains how to load it
User: explain the CAP theorem (petsitter: add a thinking mode)
Model: Explains CAP theorem (tag stripped, petsitter handled separately)

Tricksets

A trickset bundles a group of tricks with routing filters. When a request comes in, petsitter matches the X-Title header and model field against each loaded trickset's filters, then runs only the tricks from matching sets.

Tricksets live as JSON files in the tricksets/ directory:

{
  "schema": "0.5.0",
  "name": "my-trickset",
  "filters": {
    "X-Title": "opencode*",
    "Model": "*"
  },
  "tricks": [
    "tricks/json_mode.py",
    "tricks/tool_call.py"
  ]
}

Each loaded trickset is also exposed as a model named trickset/<name> (e.g., trickset/gemma4). Selecting this model in a client bypasses the filter matching and runs that trickset's tricks directly on every request.

Using tricksets

# Load a trickset at startup (can be combined with -t)
petsitter -u http://localhost:11434 \
          -tc tricksets/opencode.json \
          -t tricks/json_mode.py

Managing tricksets at runtime

The control panel at / has a full trickset manager. You can also use the API:

# List loaded tricksets
curl http://localhost:8080/api/tricksets

# List available trickset files
curl http://localhost:8080/api/tricksets/available

# Load a trickset
curl -X POST http://localhost:8080/api/tricksets/load \
  -d '{"path": "tricksets/gemma4.json"}'

# Update filters
curl -X PUT http://localhost:8080/api/tricksets/opencode \
  -d '{"filters": {"X-Title": "myagent*", "Model": "*"}}'

# Unload a trickset
curl -X POST http://localhost:8080/api/tricksets/unload \
  -d '{"name": "opencode"}'

How routing works

  1. Extract X-Title from the request header and model from the request body.
  2. For each loaded trickset, check if its filters match using fnmatch.
  3. Collect tricks from all matching sets, deduplicating by class name.
  4. Run the pipeline with only those tricks.

The default catch-all trickset matches {"X-Title": "*", "Model": "*"} so --trick trick works the same as before.

The schema field in a trickset JSON file records the petsitter version that wrote it. This tells tools how to interpret the file without needing an external lookup table.

Model Configs

A model config JSON file lets you run multi-model tricks like Kennel that need different models for different subtasks. Each key maps to a MAS URI - a URL with a fragment (#m=) specifying the model name:

{
    "default": "http://localhost:11434#m=Qwen3.5:8b",
    "thinker": "http://localhost:11434#m=VibeThinker-3B-GGUF:q4_K_M",
    "toolcall": "http://localhost:11434#m=lfm2.5:latest"
}

The "default" key sets the primary model (equivalent to -u/--url + -m/--model). Tricks declare what keys they need - for example, KennelTrick requires ["default", "thinker", "toolcall"]. If a key is missing, petsitter prints a helpful error with the expected format.

# Use a model config instead of -u / -m
petsitter -mc modelset-example.json -t tricks/kennel.py -l localhost:8080

If -u/--url is also given, it overrides the "default" from the model config.

Failure Modes

Trick retry loops are bounded

Two tricks loop internally: JSON Mode and Code Validator. Both default to 3 attempts, configurable via __init__. After exhausting attempts they give the model's best-effort output back to the user - they don't hang or cascade.

# Both accept max_attempts:
trick = JsonModeTrick(max_attempts=5)
trick = CodeValidatorTrick(max_attempts=5)

No global infinite-loop protection

post_hook receives the full context and returns a (potentially modified) context. The framework calls post_hooks once per request - it does not loop them. However, if a trick calls callmodel inside its own loop (as JSON Mode and Code Validator do), that loop is the trick's responsibility. None of the built-in tricks have unbounded loops, and custom tricks should follow the same pattern.

Network failures are not retried

callmodel and callmodel_sync make a single HTTP request to the upstream - no retry, no backoff. If the upstream is down, the error propagates as a 502 to the client. Add retry at the client level or wrap callmodel in your own try/except inside the trick.

Tool calls are client-driven

When a trick produces tool_calls in the response, petsitter returns them to your application. It does not execute the tool or re-invoke the model with the result - that's the client's job. If the client sends back a tool role message with the result, it enters the pipeline fresh on the next request.

Kennel sub-model failures

If a sub-model call in Kennel fails (e.g., the thinker model is unreachable), the exception propagates and the request fails. Kennel has no fallback - if you need resilience, wrap individual callmodel_sync calls in your own try/except.

API Endpoints

Petsitter exposes OpenAI-compatible endpoints plus management endpoints:

Proxy:

  • POST /v1/chat/completions - Chat completions (proxied + transformed)
  • GET /v1/models - List available models (proxied)
  • GET /health - Health check

Management:

  • GET /api/info - Server information
  • GET /api/tricks - List loaded tricks
  • GET /api/tricks/available - List available trick modules
  • POST /api/tricks/load - Load a trick
  • POST /api/tricks/unload - Unload a trick
  • POST /api/tricks/reorder - Reorder loaded tricks
  • GET /api/logs - Activity log
  • GET /api/tricksets - List loaded tricksets
  • GET /api/tricksets/available - List available trickset files
  • POST /api/tricksets/load - Load a trickset
  • POST /api/tricksets/unload - Unload a trickset
  • GET /api/tricksets/{name} - Get trickset details
  • PUT /api/tricksets/{name} - Update trickset filters/tricks

A Swagger UI is available at /docs and the OpenAPI spec at /static/openapi.json.

Running Tests

# Activate virtual environment
source .venv/bin/activate

# Install test dependencies
pip install -e ".[test]"

# Run tests
pytest tests/

Example: Using with an Agentic Framework

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="any-model-name",
    messages=[{"role": "user", "content": "List files in /tmp"}],
    tools=[{"type": "function", "function": {"name": "get_weather", "parameters": ...}}]
)

License

MIT

About

Model and task specific harnesses sitting between inference and agent

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors