Antonie Chirilus — Reliable LLM systems

01 / 04 · Writing

Notes on building this stuff.

Field notes from production LLM work — what breaks, what holds, and the math that decides which.

RSS ↗

May · 18 · Inference · 2 min

The 40K-token system prompt that wasn't slow

I spent a week chasing latency on a local LLM setup. The bottleneck wasn't the GPU, the model, or the quantization — it was the prefill stage doing the same 40K tokens of work on every request.

May · 17 · Meta · 1 min

Welcome — what this blog is about

Why I'm starting this, what to expect, and the kind of writing I want to do here.

02 / 04 · Selected work

Open-source projects.

All on GitHub ↗

001 Open source

Context-Kit

Persistent memory for LLM agents, modeled on human cognition. Seven typed stores — episodic, semantic, procedural, entity, working, summary, buffer — each with its own update strategy. 44 tests, live Streamlit demo.

Python·ChromaDB·OpenAI·Streamlit

002 Open source

Drift Sentinel

Drop-in MCP middleware that catches silent tool-output drift — when Slack reorders threads, an API renames a field, a DB schema shifts — before it cascades through your agent chain. Sentence-BERT embeddings of tool outputs, with the Population Stability Index (PSI) computed on the projected distribution.

MCP·Sentence-BERT·PSI·SQLite

003 Open source

AgentForge

A five-agent pipeline that turns a one-line requirement into a working CrewAI repo on GitHub. Architect, codegen, test-writer, reviewer with self-correction, deployer.

CrewAI·GitHub API·Pytest

004 Open source

RepoDoctor

An autonomous code-audit swarm. Navigator + Analyst agents on AutoGen 0.4, tool access through MCP, walking a remote repository and producing a structured improvements report.

AutoGen 0.4·MCP·Audit

03 / 04 · Experience

Where the hours have gone.

2025.10 — now

R&D Engineer · AI

Keysight Technologies · Bucharest

Correct-by-construction generation on the Visibility Orchestrator. Outlines, Pydantic, llama.cpp, RAG grounded in network data.

2025.06 — 2025.09

AI Engineer · Intern

Keysight Technologies

Production RAG over Keysight's audit corpus. Function-calling agents on Azure OpenAI, retrieval through Azure Cognitive Search, FastAPI.

2024.09 — 2025.02

ML Engineer · Contract

Roglia SRL · Remote

Custom AI chatbot for public institutions. CrewAI, LangChain, Voiceflow, Groq + OpenAI inference, Grafana for observability.

2022 — 2026

B.Sc. Computer Science

University of Bucharest

Finishing in 2026. Before that: several years competing in mathematical olympiads at the national level.

04 / 04 · Contact

Reply, disagree, hire.

If a post here was useful, wrong, or worth arguing about — email me. I'm also open to AI engineering roles, full-time or contract, hybrid in Bucharest or remote on European hours. The shorter the email, the faster the reply.

hello@chirilus.dev · Subscribe (RSS)

LinkedIn /antonie-chirilus · GitHub /tonyc973 · Kaggle /tonychirilus

Reliability, for language models.