Token Efficiency

Type: concept Tags: llm-wiki, hot-cache, claude-code

Summary

The ability to answer questions using fewer tokens by organizing knowledge into a navigable structure, rather than feeding raw, unstructured data into context.

LLM Wiki Impact

The llm-wiki approach delivers significant token savings compared with dumping raw files into context:

One X user reported converting 383 scattered files and 100+ meeting transcripts into a compact wiki, cutting token usage by 95% when querying with Claude.

Why It’s More Efficient

  • Claude reads index.md → follows 2–3 relevant links → answers, instead of loading all source material
  • [[hot-cache]] further reduces tokens on repeat queries by caching recent context
  • Structured summaries in wiki pages are much shorter than raw source documents
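The index-then-follow-links flow can be sketched in a few lines of Python. This is a minimal illustration, not an actual Claude Code mechanism: the wiki is modeled as a dict of page name → text, links are the `[[...]]` wikilinks found in the index, and all page names and contents below are made up for the example.

```python
import re

# Matches the page name inside [[name]] or [[name|alias]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\]|]+)")

def answer_context(wiki: dict[str, str], max_links: int = 3) -> str:
    """Build a compact query context: the index plus the first few
    linked pages, instead of concatenating every source document."""
    index = wiki["index"]
    links = WIKILINK.findall(index)[:max_links]
    parts = [index] + [wiki[name] for name in links if name in wiki]
    return "\n\n".join(parts)

# Hypothetical miniature wiki (names and contents are illustrative).
wiki = {
    "index": "Topics: [[hot-cache]] [[token-efficiency]]",
    "hot-cache": "Hot cache: reuse recent context on repeat queries.",
    "token-efficiency": "Answer from summaries, not raw transcripts.",
    "unrelated": "x" * 10_000,  # never followed, so it costs no tokens
}

context = answer_context(wiki)
```

Only the index and the pages it links to end up in `context`; the large `unrelated` page is never loaded, which is where the savings come from.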

Cost Model

  • LLM Wiki cost: tokens only (reads index + N pages per query)
  • RAG cost: ongoing embedding compute + vector DB storage + query latency
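A back-of-the-envelope comparison makes the wiki side of the cost model concrete. All token counts below are invented for illustration; only the 383-file figure comes from the example above.

```python
# Illustrative numbers, not measured values.
raw_tokens_per_doc = 3_000   # assumed average size of a raw source file
num_docs = 383               # file count from the example above

index_tokens = 800           # assumed size of index.md
tokens_per_wiki_page = 400   # assumed size of a structured summary page
pages_per_query = 3          # index + 2-3 followed links

# Dumping everything vs. reading the index plus a few pages.
raw_dump_cost = raw_tokens_per_doc * num_docs
wiki_cost = index_tokens + pages_per_query * tokens_per_wiki_page

savings = 1 - wiki_cost / raw_dump_cost
```

With these assumed sizes the raw dump costs over a million tokens per query while the wiki path costs about two thousand, comfortably clearing the 95% savings figure; the real ratio depends entirely on how large the source files are relative to their summaries.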