Shubo Custom Models

This is Shubo's custom model repository: the single home for the models Shubo serves and the model research it owns. It holds two kinds of content:

Served models — model definitions + serving code that model-backend deploys and that the rest of the stack triggers (chat / VLM / embedding / detection models, and the docling document parser that powers structure-aware RAG ingestion).
Model research projects — Shubo's own MLX/graph work that produces or evaluates models (linkpred-mlx, gidn, plnlp — Apple-Silicon link-prediction experiments).

It is part of the Shubo workspace and is managed by buckle: buckle init clones it as a sibling of backend/frontend/deploy and provisions the shared agent context (AGENTS.md/CLAUDE.md/.claude/skills) into it. It is not a sandbox service tier: it does not boot in buckle sandbox and is not worktreed per task. Like deploy/cloud, it is present and agent-aware but not part of the running stack. See ../buckle and the workspace ../AGENTS.md.

Layout

models/
├── docling/               # the docling document parser served by model-backend (v0.1.x);
│                          #   emits DoclingDocument structure for structure-aware RAG (see below)
├── custom/                # other custom served-model scaffolding
├── <served-model>/        # one dir per served model (LLM / VLM / embedding / detection),
│                          #   each with its own README + versioned vX.Y.Z/ folders
│                          #   e.g. qwen-2-5-vl-7b-instruct, gte-Qwen2-1.5B-instruct, yolov7, …
├── linkpred-mlx/          # MLX link-prediction (ogbl-collab / arxiv-semantic) — research
├── gidn/                  # Graph Inception Diffusion Networks link-prediction — research
└── plnlp/                 # Pairwise Learning for Neural Link Prediction — research

Each served model folder carries its own README.md (config, weights, build/push steps) and one or more vX.Y.Z/ version folders. Open the folder README for that model's specifics.
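
As an illustration of the vX.Y.Z convention above, here is a minimal sketch that lists a served model's version folders and picks the newest one. The helper names (version_folders, latest_version) are hypothetical and not part of this repository; the only assumption taken from the layout is that version folders are named vX.Y.Z with numeric parts.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches folder names such as v0.1.0 or v1.12.3 (the vX.Y.Z convention).
VERSION_DIR = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_folders(model_dir: Path) -> List[str]:
    """Return the vX.Y.Z sub-folder names of model_dir, oldest first."""
    found = []
    for child in model_dir.iterdir():
        match = VERSION_DIR.match(child.name)
        if child.is_dir() and match:
            # Compare numerically so that v0.10.0 sorts after v0.9.0.
            key = tuple(int(part) for part in match.groups())
            found.append((key, child.name))
    return [name for _, name in sorted(found)]


def latest_version(model_dir: Path) -> Optional[str]:
    """Return the newest version folder name, or None if there is none."""
    versions = version_folders(model_dir)
    return versions[-1] if versions else None
```

Calling latest_version(Path("yolov7")) from the repository root would return the highest-numbered version folder of that model, or None if the folder has no versions yet.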

Serving model

Served models run on model-backend. Historically this was the Ray Serve plane; Ray is disabled in production today (see backend/services/model). Because the production fleet consists of Apple-Silicon MacBook Pro k3s nodes, GPU-accelerated inference does not run inside the Linux containers (there is no Metal passthrough). The established pattern is therefore host-managed model servers: an MLX/Metal FastAPI process runs on the macOS host, supervised by buckle via launchd, and model-backend routes to it through the staticruntime / runtime_ref seam, the same way gemma, mlx-vlm, and ASR are served today. See buckle/scripts/sandbox/qwen3-asr-server.py for the host-server template and backend/services/model/pkg/llm/runtime/ for the routing seam.
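
To make the routing idea concrete, here is a conceptual sketch of a static runtime lookup: a runtime_ref resolves to the base URL of a server running on the macOS host, and requests are forwarded there instead of to an in-container runtime. This is not the actual seam in backend/services/model/pkg/llm/runtime/; the class names, the example ref, and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StaticRuntime:
    """A host-managed model server that requests are forwarded to (hypothetical shape)."""
    runtime_ref: str
    base_url: str


class StaticRuntimeRegistry:
    """Maps a runtime_ref to the host server that serves it."""

    def __init__(self) -> None:
        self._by_ref: Dict[str, StaticRuntime] = {}

    def register(self, runtime: StaticRuntime) -> None:
        self._by_ref[runtime.runtime_ref] = runtime

    def resolve(self, runtime_ref: str, path: str) -> str:
        """Return the full URL on the host server for the given request path."""
        runtime = self._by_ref.get(runtime_ref)
        if runtime is None:
            raise KeyError(f"no host runtime registered for {runtime_ref!r}")
        # Normalise slashes so base_url and path join with exactly one "/".
        return runtime.base_url.rstrip("/") + "/" + path.lstrip("/")


# Example values only: the ref name and address are made up for illustration.
registry = StaticRuntimeRegistry()
registry.register(StaticRuntime("host-asr", "http://host.example.internal:8000/"))
target_url = registry.resolve("host-asr", "/v1/transcribe")
```

The point of keeping the mapping static is that the container never needs to start or supervise the accelerated process; it only needs to know where the host server listens.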

Apple-Silicon (MLX) docling hosting

docling is the document parser behind structure-aware RAG. It must emit the DoclingDocument export_to_dict() tree that the backend consumes (see backend/docs/artifact/m7-w1b-producer-wiring.md). To get Metal acceleration on the Apple-Silicon fleet, docling is hosted as an MLX host server, mirroring the ASR/VLM host servers, rather than as a Ray container. The design lives in docling/docs/mlx-host-serving.md and covers the host server, buckle role registration, and the two routing options: redirecting the parsing-router model_url, or adding a model-backend external-utility runtime.
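
For readers unfamiliar with the export_to_dict() tree, the sketch below walks a hand-written, heavily trimmed dictionary in reading order. It assumes the serialized layout in which body, groups, and texts nodes list their children as {"$ref": "#/texts/0"} pointers; the sample document is invented for illustration and omits most fields a real export carries. The helper names are hypothetical and this is not the backend's consumer code.

```python
from typing import Any, Dict, Iterator


def resolve_ref(doc: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Resolve a pointer such as '#/texts/0' against the document dict."""
    node: Any = doc
    for part in ref.lstrip("#/").split("/"):
        # List segments are integer indexes; dict segments are keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def iter_text_items(doc: Dict[str, Any], ref: str = "#/body") -> Iterator[Dict[str, Any]]:
    """Yield text-bearing nodes depth-first, following children in order."""
    node = resolve_ref(doc, ref)
    if "text" in node:
        yield node
    for child in node.get("children", []):
        yield from iter_text_items(doc, child["$ref"])


# Invented sample: a header followed by a one-item list group.
sample = {
    "body": {
        "self_ref": "#/body",
        "children": [{"$ref": "#/texts/0"}, {"$ref": "#/groups/0"}],
    },
    "groups": [
        {"self_ref": "#/groups/0", "label": "list", "children": [{"$ref": "#/texts/1"}]},
    ],
    "texts": [
        {"self_ref": "#/texts/0", "label": "section_header", "text": "Intro", "children": []},
        {"self_ref": "#/texts/1", "label": "list_item", "text": "First point", "children": []},
    ],
}

outline = [(item["label"], item["text"]) for item in iter_text_items(sample)]
```

Keeping the labels alongside the text is what lets ingestion chunk by structure (headers, list items, tables) rather than by raw character offsets.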

Supported serving runtimes (LLM/VLM)

Runtime	AMD64 CPU	ARM64 CPU	AMD64 GPU (CUDA)	Apple GPU (Metal/MLX)
vLLM	✅	✅	✅	—
mlx-vlm	—	—	—	✅
Transformers	✅	✅	✅	✅ (MPS)
llama.cpp	✅	✅	✅	✅

On the Apple-Silicon fleet, MLX/Metal runtimes are the accelerated path (host-managed, as above).
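
The matrix above can also be expressed as a small lookup, which is handy when a script needs to check which runtimes apply to a target. This is a sketch only: the target keys ("amd64-cpu", "arm64-cpu", "cuda", "metal") and the function name are hypothetical labels for the table's columns, not identifiers used elsewhere in the stack.

```python
from typing import Dict, FrozenSet, List

# One entry per table row; each set holds the columns marked as supported.
SUPPORT: Dict[str, FrozenSet[str]] = {
    "vLLM": frozenset({"amd64-cpu", "arm64-cpu", "cuda"}),
    "mlx-vlm": frozenset({"metal"}),
    # On Apple GPUs the table lists Transformers via MPS rather than MLX.
    "Transformers": frozenset({"amd64-cpu", "arm64-cpu", "cuda", "metal"}),
    "llama.cpp": frozenset({"amd64-cpu", "arm64-cpu", "cuda", "metal"}),
}


def runtimes_for(target: str) -> List[str]:
    """Return the runtimes the table marks as supported on the given target."""
    return [name for name, targets in SUPPORT.items() if target in targets]
```

For example, runtimes_for("metal") returns the runtimes listed for Apple GPUs, in table order.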

Research projects

linkpred-mlx — Apple-Silicon (MLX) link prediction on ogbl-collab / arxiv-semantic graphs.
gidn — Graph Inception Diffusion Networks for link prediction.
plnlp — Pairwise Learning for Neural Link Prediction.

These are reproducible experiments (data + scripts + logs), not served models; they feed model design.

License

MIT — see the workspace LICENSE.

