Petsitter is an OpenAI-compatible proxy that layers smart harnesses on top of language models to give them capabilities they don't natively have. It also makes finicky behaviors reliable and dependable.
You install it, point it at a model, load a few example tricks, and suddenly things that model couldn't do before - tool calling, structured JSON, multi-step reasoning - start working. Then you think: "oh, I could make it do X" - and you write your own trick.
The built-in tricks are starting points. Tweak them, combine them, or use them as a reference to build something entirely different. Petsitter isn't a turnkey product; it's a kit.
Petsitter intercepts every request/response pair and runs it through a pipeline of hooks. Each trick picks which hooks it needs:
system_prompt- Inject instructions before the model sees the conversationpre_hook- Modify messages or inject tool definitions before the API callpost_hook- Validate, retry, or transform the model's responseinfo- Declare capabilities back to your application
A trick can be as simple as appending a sentence to the system prompt, or as involved as routing subtasks to three different models in parallel. There's a GUI at / for loading/unloading tricks, editing trickset filters, browsing logs, and pointing at different models at runtime.
You can also edit tricks, reorder them, disable, add new ones, and filter them through a built-in dashboard:

Petsitter is part of the DAY50 suite of open-source tools for local AI workflows and constructing better agents.
- No model changes required - Works with any OpenAI-compatible endpoint
- Pluggable architecture - Write your own tricks in Python. (Skills are included in
.agents) - Transparent to your app - Point your existing code at petsitter instead of the model
- Mix and match - Combine multiple tricks for compound effects
# Start your model backend (e.g., Ollama)
ollama serve
# Activate the virtual environment
source .venv/bin/activate
# Run petsitter with tricks
./petsitter -u http://localhost:11434 \
-m llama3:8b \
-t tricks/json_mode.py \
-t tricks/tool_call.py \
-l localhost:8080Now point your AI applications to http://localhost:8080/v1.
| Option | Short | Description |
|---|---|---|
--url |
-u |
Base URL of upstream model (e.g., http://localhost:11434) |
--model |
-m |
Model name (optional for vllm, sglang, llama.cpp) |
--key |
-k |
API key for upstream (if required) |
--trick |
-t |
Path to a trick module (can be repeated) |
--trick-config |
-tc |
Path to a trickset JSON file (can be repeated) |
--model-config |
-mc |
Path to a model config JSON file (MAS URIs for multi-model tricks) |
--listen |
-l |
Host:port to listen on (default: localhost:8080) |
flowchart TD
A[Client POST] --> B
A -.-> K[Prompt keyword scan]
subgraph config[Reorderable via config]
B[Trickset match] --> C[Keyword activate]
C --> D[System prompt]
D --> E[Pre-hook]
end
E --> L[LLM call]
L --> F[Post-hook]
F --> G[Capabilities]
G --> Z[Client response]
K -.-> Z
The Trick class has four hooks you can implement. Each hook is optional.
When: Called once per request, before any messages are sent to the model.
Purpose: Append instructions to the system prompt. This is how you "prime" the model to behave a certain way.
Example:
def system_prompt(self, to_add: str) -> str:
return "IMPORTANT: Respond only in valid JSON. No markdown, no explanations."When: Called after the system prompt is set, before the model receives the messages.
Purpose: Modify the conversation context. You can inject tool definitions, add few-shot examples, or restructure messages.
Parameters:
context: List of message dicts ([{"role": "user", "content": "..."}])params: Request parameters includingtools,temperature, etc.
Example:
def pre_hook(self, context: list, params: dict) -> list:
if "tools" in params:
tools_json = json.dumps(params["tools"])
context[0]["content"] += f"\n\nAvailable tools: {tools_json}"
return contextWhen: Called after the model responds, before the response goes back to your application.
Purpose: Validate, transform, or retry. This is where you can:
- Parse the response and convert it to a different format
- Detect when the model failed and call it again with feedback
- Extract tool calls from natural language
Example (JSON validation with retry):
def post_hook(self, context: list) -> list:
attempts = 3
while attempts > 0:
try:
json.loads(context[-1]["content"])
break
except json.JSONDecodeError:
attempts -= 1
if attempts == 0:
break
context = callmodel(context, "That wasn't valid JSON. Try again.")
return contextExample (Tool call detection):
def post_hook(self, context: list) -> list:
content = context[-1]["content"]
if self._looks_like_tool_call(content):
context[-1]["tool_calls"] = [self._parse_tool_call(content)]
context[-1]["content"] = None
return contextWhen: Called when building the response to your application.
Purpose: Declare what capabilities this trick provides. Some frameworks check for capabilities before using certain features.
Example:
def info(self, capabilities: dict) -> dict:
capabilities["json_mode"] = True
capabilities["tools_support"] = True
return capabilitiesSet keywords on your trick class to activate only when the user includes that word in their message - the keyword is stripped before the model sees it. See tricks/multiround.py for a working example.
# Trick fires when "multiround" is present
curl http://localhost:8080/v1/chat/completions \
-d '{"messages":[{"role":"user","content":"multiround explain the CAP theorem"}]}'
# Trick does nothing without the keyword
curl http://localhost:8080/v1/chat/completions \
-d '{"messages":[{"role":"user","content":"explain the CAP theorem"}]}'Prompt keywords let you inject commands to petsitter itself inline in your message using the format (<keyword>: <request>). The framework scans for registered keywords, strips the matching pattern before the model sees it, and routes the request to the appropriate handler.
This is separate from trick keyword activation - keywords activate or deactivate tricks for the current request, while prompt keywords are commands to petsitter that bypass the model entirely.
Set prompt_keyword on your Trick subclass:
class MyCommandTrick(Trick):
prompt_keyword = "mycommand"
__brief__ = "Handles (mycommand: ...) inline requests"
def handle_prompt_keyword(self, request: str) -> dict | None:
return {"role": "assistant", "content": f"You asked: {request}"}The method receives the text after mycommand: and can return:
- A message dict - injected as the model response (bypasses the upstream call)
None- the pattern is stripped but the normal pipeline continues
- Execution goes in order of the prompt reference. Unrecognized prompt keywords are passed through and surface as a non-critical error in the response along with the rest of the response
- The pattern
(<keyword>: <request>)uses the first closing paren - avoid nested parens in the request text. - Keyword matching is case-insensitive.
- If the handler raises, an error message is returned as the assistant response.
- JSON Mode - Enforce valid JSON output
- Code Validator - Self-healing validation through model self-description
- Tool Calling - Add tool calling to models without native support
- Andybot Toolcall - Conversational persona tool calling for small/older models (experimental)
- Kennel - Route cognitive subtasks to specialized models
- Secrets Protector - Detect and pseudonymize secrets/PII before they reach the model
- Self-Improver - Runtime agent that can add, modify, and list tricks
Enforces valid JSON output by adding formatting instructions to the system prompt, stripping markdown code blocks, and retrying with feedback if the response isn't valid JSON.
./petsitter -u http://localhost:11434 -t tricks/json_mode.pyAfter the model proposes a code change, asks it to describe what the change does, compares the description against the original user request, and retries with feedback if they don't match.
./petsitter -u http://localhost:11434 -t tricks/code_validator.pyEnables tool calling for models without native support by injecting tool definitions into the prompt, parsing JSONRPC-style tool call responses, and converting them to OpenAI tool_calls format.
./petsitter -u http://localhost:11434 -t tricks/tool_call.pytricks/andybot_toolcall.py - experimental
An alternative approach to tool calling that uses a conversational persona instead of structured JSON output. The model says DEAR ANDYBOT, <FUNCTION> and ANDYBOT collects each required parameter through dialogue:
- Model recognises it needs to call a tool and says
DEAR ANDYBOT, GET_WEATHER - ANDYBOT asks: "Can you provide location?"
- Model responds:
Paris - ANDYBOT builds the tool call and returns it to the application
This works well with small models (3B and under) and older models that struggle with reliable JSON output or native tool_calls. The conversational flow lets them express intent naturally instead of wrestling with syntax.
petsitter -u http://localhost:11434 -t tricks/andybot_toolcall.py -t tricks/json_mode.pyFor a more advanced version with inline-argument parsing, confusion recovery, and multi-turn state management, see tricks/conversational_tool.py.
A trick has full control of the request lifecycle - it can call any number of models, not just the one the user pointed at. This lets you decompose a problem into subtasks and route each one to the model best suited for it.
Petsitter supports this through model configs - JSON files that map role names to MAS URIs (url#m=model_name). Tricks declare what roles they need; if a key is missing, petsitter prints a helpful error.
Example modelset.json:
{
"default": "http://localhost:11434#m=Qwen3.5:8b",
"thinker": "http://localhost:11434#m=VibeThinker-3B-GGUF:q4_K_M",
"toolcall": "http://localhost:11434#m=lfm2.5:latest"
}tricks/kennel.py is a reference implementation of the pattern above. It routes cognitive subtasks to three specialized models running in parallel - a thinker for chain-of-thought, a tool-caller for deciding which tools to invoke, and an emitter for generating the final response.
# Pull three small models that together fit on modest hardware (< 6B total)
ollama pull VibeThinker-3B # reasoning / chain-of-thought
ollama pull LFM2.5-230M # tool-calling (tiny, fast)
ollama pull Qwen3.5-2B # response generation
# Each model sees a context optimized for its role
./petsitter -mc examples/modelset.json \
-t tricks/kennel.pyPipeline:
- Thinker gets the conversation + "think step by step" → produces reasoning
- Tool-caller (if tools are present) gets context + reasoning + tool definitions → decides which tool to call
- Emitter receives the enriched context and generates the final response
Kennel is one architecture; you could write a trick that routes by language, by file type, by user role, or by anything else you can express in a post_hook.
Detects and pseudonymizes sensitive information before it reaches the model, then restores original values in the response:
- Detection - regex patterns for API keys (OpenAI, Anthropic, AWS, Google, Stripe), tokens (JWT, GitHub, Slack, Bearer), credentials (database URLs, private keys), and PII (emails, phones, SSNs, credit cards, IPs)
- Format-preserving substitutes - realistic replacements (e.g.,
alice@example.com→user.0001@sanitized.local) that preserve token boundaries so the model's tokenizer doesn't conflate distinct entries - Bidirectional vault - consistent pseudonyms across the session (same secret → same substitute) with automatic restoration in both natural-language responses and tool call arguments
./petsitter -u http://localhost:11434 -t tricks/secrets_protector.pyWatches for the prompt keyword petsitter in your messages. When it sees (petsitter: <request>), it strips the tag and spawns an agent loop with the default model. The agent has tools to add, modify, and list trick files - it reads instructions from .agents/skills/self-improver/SKILL.md to understand the petsitter trick API and conventions.
This is a reference implementation for the prompt keywords pattern (see below).
petsitter -u http://localhost:11434 -t tricks/self_improver.pyExample usage:
User: (petsitter: add a trick that logs every request to a file)
Model: Creates tricks/request_logger.py and explains how to load it
User: explain the CAP theorem (petsitter: add a thinking mode)
Model: Explains CAP theorem (tag stripped, petsitter handled separately)
A trickset bundles a group of tricks with routing filters. When a request comes in, petsitter matches the X-Title header and model field against each loaded trickset's filters, then runs only the tricks from matching sets.
Tricksets live as JSON files in the tricksets/ directory:
{
"schema": "0.5.0",
"name": "my-trickset",
"filters": {
"X-Title": "opencode*",
"Model": "*"
},
"tricks": [
"tricks/json_mode.py",
"tricks/tool_call.py"
]
}Each loaded trickset is also exposed as a model named trickset/<name> (e.g., trickset/gemma4). Selecting this model in a client bypasses the filter matching and runs that trickset's tricks directly on every request.
# Load a trickset at startup (can be combined with -t)
petsitter -u http://localhost:11434 \
-tc tricksets/opencode.json \
-t tricks/json_mode.pyThe control panel at / has a full trickset manager. You can also use the API:
# List loaded tricksets
curl http://localhost:8080/api/tricksets
# List available trickset files
curl http://localhost:8080/api/tricksets/available
# Load a trickset
curl -X POST http://localhost:8080/api/tricksets/load \
-d '{"path": "tricksets/gemma4.json"}'
# Update filters
curl -X PUT http://localhost:8080/api/tricksets/opencode \
-d '{"filters": {"X-Title": "myagent*", "Model": "*"}}'
# Unload a trickset
curl -X POST http://localhost:8080/api/tricksets/unload \
-d '{"name": "opencode"}'- Extract
X-Titlefrom the request header andmodelfrom the request body. - For each loaded trickset, check if its filters match using
fnmatch. - Collect tricks from all matching sets, deduplicating by class name.
- Run the pipeline with only those tricks.
The default catch-all trickset matches {"X-Title": "*", "Model": "*"} so --trick trick works the same as before.
The schema field in a trickset JSON file records the petsitter version that wrote it. This tells tools how to interpret the file without needing an external lookup table.
A model config JSON file lets you run multi-model tricks like Kennel that need different models for different subtasks. Each key maps to a MAS URI - a URL with a fragment (#m=) specifying the model name:
{
"default": "http://localhost:11434#m=Qwen3.5:8b",
"thinker": "http://localhost:11434#m=VibeThinker-3B-GGUF:q4_K_M",
"toolcall": "http://localhost:11434#m=lfm2.5:latest"
}The "default" key sets the primary model (equivalent to -u/--url + -m/--model). Tricks declare what keys they need - for example, KennelTrick requires ["default", "thinker", "toolcall"]. If a key is missing, petsitter prints a helpful error with the expected format.
# Use a model config instead of -u / -m
petsitter -mc modelset-example.json -t tricks/kennel.py -l localhost:8080If -u/--url is also given, it overrides the "default" from the model config.
Two tricks loop internally: JSON Mode and Code Validator. Both default to 3 attempts, configurable via __init__. After exhausting attempts they give the model's best-effort output back to the user - they don't hang or cascade.
# Both accept max_attempts:
trick = JsonModeTrick(max_attempts=5)
trick = CodeValidatorTrick(max_attempts=5)post_hook receives the full context and returns a (potentially modified) context. The framework calls post_hooks once per request - it does not loop them. However, if a trick calls callmodel inside its own loop (as JSON Mode and Code Validator do), that loop is the trick's responsibility. None of the built-in tricks have unbounded loops, and custom tricks should follow the same pattern.
callmodel and callmodel_sync make a single HTTP request to the upstream - no retry, no backoff. If the upstream is down, the error propagates as a 502 to the client. Add retry at the client level or wrap callmodel in your own try/except inside the trick.
When a trick produces tool_calls in the response, petsitter returns them to your application. It does not execute the tool or re-invoke the model with the result - that's the client's job. If the client sends back a tool role message with the result, it enters the pipeline fresh on the next request.
If a sub-model call in Kennel fails (e.g., the thinker model is unreachable), the exception propagates and the request fails. Kennel has no fallback - if you need resilience, wrap individual callmodel_sync calls in your own try/except.
Petsitter exposes OpenAI-compatible endpoints plus management endpoints:
Proxy:
POST /v1/chat/completions- Chat completions (proxied + transformed)GET /v1/models- List available models (proxied)GET /health- Health check
Management:
GET /api/info- Server informationGET /api/tricks- List loaded tricksGET /api/tricks/available- List available trick modulesPOST /api/tricks/load- Load a trickPOST /api/tricks/unload- Unload a trickPOST /api/tricks/reorder- Reorder loaded tricksGET /api/logs- Activity logGET /api/tricksets- List loaded tricksetsGET /api/tricksets/available- List available trickset filesPOST /api/tricksets/load- Load a tricksetPOST /api/tricksets/unload- Unload a tricksetGET /api/tricksets/{name}- Get trickset detailsPUT /api/tricksets/{name}- Update trickset filters/tricks
A Swagger UI is available at /docs and the OpenAPI spec at /static/openapi.json.
# Activate virtual environment
source .venv/bin/activate
# Install test dependencies
pip install -e ".[test]"
# Run tests
pytest tests/from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="any-model-name",
messages=[{"role": "user", "content": "List files in /tmp"}],
tools=[{"type": "function", "function": {"name": "get_weather", "parameters": ...}}]
)MIT
