i ran my prompt-analysis tool on my own logs. 82% of my "questions" weren't me.

i built a small tool called promptprint to answer one question about myself: is my questioning getting sharper over time? not my code — the way i direct the agent. it reads your Claude Code / Codex logs 100% locally and ends with what to fix next, not a vanity number. then i did the thing every honest tool should have to survive: i pointed it at my own logs.

the first pass told me i'd asked about 28,000 questions. flattering. and wrong.

most of the log wasn't me

in an automation-heavy setup — subagents, a memory plugin, eval and RAG apps that replay the same system block over and over — most of what a log calls a prompt was never typed by a human. it's a subagent's turn, an injected context header, the same 16,000-character block pasted five times. when i filtered that out, this is what was left:

the scan	blocks
raw blocks scanned	27,908
— memory / context injection	19,151
— subagent turns	3,155
— repeated / duplicate	572
actual human questions	5,030
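here's roughly what that filtering step looks like. this is a minimal sketch, and it assumes two things: each log block carries a flag for subagent turns, and injected context starts with a recognizable marker. `Block`, `INJECTION_MARKERS` and `classify` are hypothetical names, not promptprint's actual adapter, and real Claude Code / Codex log formats differ. the checks run in a fixed order, so every block lands in exactly one bucket and the four counts always add back up to the total scanned.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical shape of one log block; real Claude Code / Codex logs differ.
    text: str
    from_subagent: bool = False


# Hypothetical prefixes marking injected memory/context; a real filter
# would match whatever your own plugins and apps actually emit.
INJECTION_MARKERS = ("<memory>", "<context>", "<system-reminder>")


def classify(blocks):
    """Put each block in exactly one bucket: subagent, injection, repeat, human.

    Returns the bucket counts and the texts judged to be human questions.
    """
    counts = Counter()
    seen = set()  # hashes of blocks already counted as human
    humans = []
    for block in blocks:
        text = block.text.strip()
        if block.from_subagent:
            counts["subagent"] += 1
        elif text.startswith(INJECTION_MARKERS):
            counts["injection"] += 1
        else:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen:
                counts["repeat"] += 1  # verbatim copy of an earlier human block
            else:
                seen.add(digest)
                counts["human"] += 1
                humans.append(text)
    return counts, humans


blocks = [
    Block("why does this loop terminate?"),
    Block("why does this loop terminate?"),
    Block("<memory> user prefers tabs"),
    Block("summarize the diff", from_subagent=True),
]
counts, humans = classify(blocks)
print(dict(counts), humans)
```

exact-hash matching only catches verbatim copies. a block pasted with a single character changed passes as "human", which is one easy way for residue to survive a filter like this.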

82% of it wasn't me. every "growth metric" i'd been reading was inflated more than fivefold (27,908 blocks against 5,030 real questions) by traffic i never wrote. a tool built to measure my judgment was mostly measuring my machines.

the tempting move, and the one i took instead

the easy fix is to quietly filter the noise and report the clean number. it looks better and nobody checks. i did the opposite: i made the tool show its work. every run now prints a trust receipt before anything else —

trust receipt: scanned 27,908 blocks · 82% machine/injection excluded
(subagent 3,155 · injection 19,151 · repeat 572) → 5,030 human questions

the receipt isn't decoration. it's the denominator — the "out of how many" that every percentage hides. a growth number without its denominator is just a vibe. this one hands you the denominator up front, so you can decide whether to trust the rest. that's the entire brand of the tool: it would rather be auditable than impressive.
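the receipt itself is just arithmetic over those buckets. this is a minimal sketch of how such a line could be built, and `trust_receipt` is a hypothetical name, not promptprint's real code. fed the counts from the scan table, it reproduces the 82% because the excluded blocks (19,151 + 3,155 + 572 = 22,878) are divided by everything scanned (27,908), not by the 5,030 that survived.

```python
def trust_receipt(subagent, injection, repeat, human):
    """Build a receipt line whose denominator is every block scanned."""
    scanned = subagent + injection + repeat + human
    if scanned == 0:
        raise ValueError("nothing scanned, so there is no denominator")
    excluded = subagent + injection + repeat
    pct = round(100 * excluded / scanned)
    return (
        f"trust receipt: scanned {scanned:,} blocks · {pct}% machine/injection excluded\n"
        f"(subagent {subagent:,} · injection {injection:,} · repeat {repeat:,})"
        f" → {human:,} human questions"
    )


print(trust_receipt(subagent=3_155, injection=19_151, repeat=572, human=5_030))
```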

it's still not clean, and i'm saying so

here's the part i'm not hiding. cutting 82% doesn't make the remaining 18% pristine. the top "repeated tasks worth turning into a skill" that the tool surfaced after filtering were still generic tokens — text, change, json — each averaging 14,000 to 18,000 characters. those aren't questions. they're machine residue that slipped through the filter.
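one possible way to keep that residue visible instead of silently trusting it is to flag any surfaced "task" whose examples run far longer than anything a person types. this is a sketch that assumes a hypothetical 2,000-character cutoff and a hypothetical `suspect_residue` helper. it isn't something promptprint is described as doing, and the right cutoff depends on your setup.

```python
from statistics import mean

# Hypothetical cutoff: typed questions rarely run this long. Tune per setup.
MAX_HUMAN_CHARS = 2_000


def suspect_residue(task_examples, max_chars=MAX_HUMAN_CHARS):
    """Return {task label: average length} for tasks that look like machine residue.

    task_examples maps a surfaced task label (e.g. "json") to the texts
    grouped under it; a task is flagged when its average length exceeds max_chars.
    """
    flagged = {}
    for label, texts in task_examples.items():
        if not texts:
            continue
        avg = mean(len(t) for t in texts)
        if avg > max_chars:
            flagged[label] = round(avg)
    return flagged


tasks = {
    "json": ["{" + "x" * 14_999, "{" + "x" * 16_999],  # pasted payloads
    "add retry": ["add a retry to the upload step", "retry with backoff?"],
}
print(suspect_residue(tasks))
```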

so the receipt discloses a ratio, not a promise of perfect classification. "82% excluded" is an honest disclosure of an environment-dependent estimate — not a claim that the other 18% is spotless. i'd rather ship the real limit than a rounded-up story. honest beats clean.

with the noise gone, the real signal showed up

once the denominator was actual questions, the trend i was hoping to see was finally legible — and small enough to state without exaggeration. measured as a share of my own messages, not raw counts (raw counts reward writing more, not asking better):

as a share of my messages	before	now
verification questions	4%	11%
counter-questions	2%	7%
critique / push-back	2%	2%
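here's why shares beat counts. in this made-up example (the numbers are illustrative, not from my logs), the raw count of verification questions goes up only because i wrote three times as much, while the share actually drops. `category_shares` is a hypothetical helper, and it assumes each human message has already been given one category label.

```python
from collections import Counter


def category_shares(labels):
    """Return each category's share of all messages, in percent.

    labels holds one category per human message, e.g. "verification" or "other".
    """
    total = len(labels)
    if total == 0:
        return {}
    return {cat: 100 * n / total for cat, n in Counter(labels).items()}


# Illustrative numbers only: the raw count rises from 4 to 9,
# but only because the second period has three times as many messages.
before = ["verification"] * 4 + ["other"] * 96
now = ["verification"] * 9 + ["other"] * 291
print(category_shares(before)["verification"], category_shares(now)["verification"])
```

the count more than doubles (4 to 9) while the share falls from 4% to 3%, which is exactly the "writing more, not asking better" trap.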

more of my prompts now ask the agent to check itself (verification, 4% → 11%) or come back at my framing with counter-questions (2% → 7%), while outright critique stayed flat at 2%. the raw volume of my questions actually fell per session. that's the direction i wanted, fewer and sharper, but i only trust it because the noise it's measured against is disclosed, not swept away.

why i'm writing this down

the interesting artifact here isn't the tool. it's the failure i caught by turning the tool on myself, and the choice of what to do with it: report the ratio, keep the residue visible, let the reader audit the denominator. it runs fully offline — no log leaves your machine — and the intermediate files that hold your real questions are never committed.

i'm not going to tell you to adopt it yet; the 18% still needs work, and i'd rather earn that with a cleaner filter than a louder pitch. this is the discovery, in the open, receipts and all. a metric you can't audit is just a vibe — so here's the denominator.

the code, the adapter that filters machine traffic, and the receipt:

github.com/shryu1994/promptprint