Blog

Technical notes — LLMs, infra, quant

Findings: Karpathy-style autoresearch on a crypto backtester (local LLM)

Local Qwen 3.5 autoresearch on my crypto DB + Nautilus-style backtester (~2h, 30+ iterations, $0 API spend): the tool-calling blocker, run observations, human-in-the-loop steering, contrast with a genetic algorithm (GA), diversity, and gates.

7 min read · April 24, 2026

2026 · llm quant backtesting vllm qwen automation crypto local-inference · research
Qwen 3.6 35B-A3B on vLLM: do the Qwen 3.5 tool-calling fixes carry over?

Follow-up testing with the same qwen3_xml parser, qwen3.5-enhanced.jinja template, and mixed-GPU tuning used for Qwen 3.5-27B, plus three agentic runs comparing official vs. enhanced configs on Qwen3.6-35B-A3B-FP8.

8 min read · April 20, 2026

2026 · vllm qwen tool-calling llm agent inference · bug-fixes
Claude Code with local vLLM: client validation, model aliases, and a working settings.json

Run Claude Code against local vLLM without Anthropic API access: why common env-only recipes fail, the alias + settings.json pattern that works, and when this matters if you cannot register or use the Claude API.

12 min read · April 19, 2026

2026 · claude-code vllm llm local-inference anthropic-api · bug-fixes
Stable tool calling for Qwen 3.5 27B/35B on vLLM: template, parser, and mixed-GPU fixes

Debugging notes on Jinja chat templates, qwen3_xml vs qwen3_coder parsers, mixed-GPU FP8 drift, and SFT-distilled checkpoints when running Qwen 3.5 27B/35B-class models for long agentic sessions on vLLM.

10 min read · April 13, 2026

2026 · vllm qwen tool-calling llm inference gpu · bug-fixes
Workaround for Enabling NCCL P2P Communication for NVIDIA RTX 4090 Workstations

What NCCL P2P means, why it matters on multi-GPU workstations, how Resizable BAR fits in, and a concrete setup path for RTX 4090.

7 min read · May 21, 2025

2025 · nvidia driver p2p gpu deep learning · bug-fixes
