Knowledge & Memory

Training

Fine-tuning pipeline for creating custom Ollama models trained on vault structure, routing rules, and organizational patterns from your personal knowledge base.

Python · Ollama · Qwen2.5 · JSONL · Alpaca Format

In Plain English

This is a workshop for creating custom AI models that run locally through Ollama (a tool for running AI on your own computer without cloud services). You feed it examples of how you want the AI to respond, and it fine-tunes a model (adjusts an existing AI brain to learn your specific patterns) to fit your needs. It's like training a new employee by showing them exactly how you want things done until they can do it on their own.

Problem

General-purpose models don't understand your file organization, naming conventions, or routing rules. Training produces models that actually know where shadow work journals go, how to route book notes, and which folder structure applies to each project - baking your vault rules directly into model weights.

Architecture

SOURCE DATA: Obsidian vault — folder structure, Vault Constitution, routing rules, example corrections

MINING & GENERATION: generate_training_data.py main pipeline — pattern extractor, category mapper, format converter, multi-format output

TRAINING DATA: JSONL output (line-delimited JSON), Alpaca format (instruction-following), ChatML format (conversation pairs)

FINE-TUNING: base model Qwen2.5:14b; Modelfile config (system prompts); training params (epochs, LR, batch); data validation (format checks); fine-tuning via `ollama create -f Modelfile` bakes vault rules into weights

DEPLOYMENT: custom model vault-manager (specialized routing); create-model.bat deployment script; Ollama registry (model storage); MCP server exposing the model API — vault_route(query), suggest_path(note)

INTEGRATION: Claude Code agent runtime — routing requests, response validation, performance metrics, feedback loop

TRAINING PIPELINE → PRODUCTION DEPLOYMENT

Key Features

Vault Mining

automated

Scripts scan your Obsidian vault structure, extract routing patterns, and generate training examples in JSONL format automatically.
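A minimal sketch of this mining step, assuming a vault laid out as nested folders of Markdown notes (e.g. `02_Personal/Psychology/`). The real generate_training_data.py does more (pattern extraction, category mapping); the function names and example schema here are illustrative.

```python
import json
from pathlib import Path

def mine_vault(vault_root: str) -> list[dict]:
    """Turn each note's existing location into a routing training example."""
    examples = []
    for note in Path(vault_root).rglob("*.md"):
        folder = note.parent.relative_to(vault_root).as_posix()
        examples.append({
            "instruction": "Route this note to the correct vault folder.",
            "input": note.stem,       # note title stands in for its content
            "output": folder,         # the folder it already lives in
        })
    return examples

def write_jsonl(examples: list[dict], out_path: str) -> None:
    """Emit one JSON object per line (JSONL)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

The key idea: every file already filed in the vault is a free labeled example, so no manual annotation is needed.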

Multi-Format Export

3 formats

Training data is generated in Alpaca (instruction-following), ChatML (conversation), and JSONL (line-delimited) formats for flexibility.
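A sketch of converting one raw mined example into the Alpaca and ChatML shapes. Field names follow the common Alpaca and ChatML conventions; the pipeline's exact schema may differ.

```python
def to_alpaca(ex: dict) -> dict:
    """Alpaca: flat instruction/input/output record."""
    return {
        "instruction": ex["instruction"],
        "input": ex["input"],
        "output": ex["output"],
    }

def to_chatml(ex: dict) -> dict:
    """ChatML: the same example as a system/user/assistant conversation."""
    return {
        "messages": [
            {"role": "system", "content": ex["instruction"]},
            {"role": "user", "content": ex["input"]},
            {"role": "assistant", "content": ex["output"]},
        ]
    }
```

Either shape can then be serialized line-by-line into the JSONL output.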

Constitution Encoding

custom rules

The Vault Constitution - routing rules like "shadow work → 02_Personal/Psychology/" - is encoded directly into the model's behavior through fine-tuning.
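A hedged sketch of extracting those rules, assuming the constitution lists them as bullet lines of the form "- topic → folder/" (the actual document format is an assumption).

```python
def parse_constitution(markdown: str) -> dict:
    """Parse 'topic → folder' bullet lines from the constitution document."""
    rules = {}
    for line in markdown.splitlines():
        line = line.strip().lstrip("-").strip()
        if "→" in line:
            topic, folder = (part.strip() for part in line.split("→", 1))
            rules[topic] = folder
    return rules
```

Each parsed rule becomes one more instruction-response pair in the training set, which is how the constitution ends up encoded in the model rather than in a runtime prompt.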

How It Works

01

Mining

generate_training_data.py scans the vault structure and creates training examples from existing file paths and categories

02

Formatting

Raw examples are converted to Alpaca/ChatML/JSONL format with instruction-response pairs

03

Fine-Tuning

The base Qwen2.5:14b model is fine-tuned using the vault-specific training data and constitution rules
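An illustrative Modelfile for this step. `FROM`, `SYSTEM`, and `PARAMETER` are standard Ollama Modelfile directives; the system prompt text and parameter value are assumptions (and `FROM` could instead point at a fine-tuned GGUF file when one is produced).

```
FROM qwen2.5:14b

SYSTEM """You are vault-manager, a routing assistant for an Obsidian vault.
Given a note title or snippet, reply with the destination folder path only,
following the Vault Constitution (e.g. shadow work → 02_Personal/Psychology/)."""

PARAMETER temperature 0.2
```

Building the model from this file is then `ollama create vault-manager -f Modelfile`.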

04

Deployment

The custom model is loaded into Ollama via create-model.bat and exposed through MCP for Claude Code to use
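A sketch of how an MCP tool like vault_route(query) might call the deployed model through Ollama's local HTTP API (default port 11434, documented `/api/generate` endpoint). The payload builder is kept pure so it can be shown without a running server; everything beyond Ollama's documented request/response fields is an assumption.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_route_request(query: str, model: str = "vault-manager") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": f"Route this note: {query}",
        "stream": False,   # one complete JSON response instead of a stream
    }

def vault_route(query: str) -> str:
    """Ask the custom model for a destination folder.

    Requires a running Ollama instance with vault-manager loaded.
    """
    req = Request(
        OLLAMA_URL,
        data=json.dumps(build_route_request(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Claude Code calls this tool through MCP, and mis-routed results can be fed back into the training data as example corrections, closing the feedback loop.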

Tech Stack

Base Model

Qwen2.5:14b via Ollama

Data Generation

Python scripts

Formats

JSONL, Alpaca, ChatML

Constitution

Markdown rules document

Deployment

Ollama Modelfile + batch script

Integration

MCP server