# Codebase Summarizer

A command-line tool that extracts a concise summary of a codebase, including symbols and their documentation comments, and writes it to a `codebase_summary.md` file.

## Why?

Reasoning and coding LLMs charge per token. Every token in your context window, including source code, comments, and boilerplate, counts toward your bill. For large codebases, this adds up fast.

Context windows also have hard limits. Feed in too much and the model cannot hold the rest, or you pay premium rates for extended context.

Most of a codebase is noise: generated files, dependency manifests, test harnesses, and utility functions that matter at runtime but not for understanding the architecture. This tool extracts only the essential structure (public symbols, their types, signatures, and documentation), so you get a codebase map that fits in a fraction of the tokens.

Less context used means lower costs per query, more budget for actual reasoning, and faster responses.

## Features

- **Multi-language support**: Go, Rust, Python, Dart, R, TypeScript/JavaScript, Java, C, C++, C#, Ruby, PHP, Swift, Kotlin, and more
- **Symbol extraction**: Functions, structs, enums, classes, methods, traits, interfaces, type aliases
- **Doc comment preservation**: Captures documentation comments associated with each symbol
- **Markdown output**: Clean `codebase_summary.md` documenting the entire codebase
- **AI agent optimized**: Generates `AGENTS.md` with navigation instructions for AI agents

## Installation

### Build from source

```bash
git clone gogs.dmsc.dev/dmsc/codebase_summarizer.git
cd codebase_summarizer
cargo build --release
sudo cp target/release/codebase_summarizer /usr/local/bin/
```

## Usage

```bash
# Scan current directory
codebase_summarizer

# Scan a specific directory
codebase_summarizer --directory /path/to/codebase

# Output to custom location
codebase_summarizer --output /tmp/summary.md

# Skip AGENTS.md generation
codebase_summarizer --no-agents

# Enable verbose output
codebase_summarizer --verbose

# Include private symbols
codebase_summarizer --include-all
```

## How it works

1. **Scan**: Recursively walks the directory, filtering out `target/`, `node_modules/`, `.git/`, and other non-source directories
2. **Parse**: Extracts symbols from each code file using language-specific parsers
3. **Summarize**: Generates a `codebase_summary.md` with file tree and symbol documentation
4. **Optional**: Creates `AGENTS.md` with navigation protocol for AI agents
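
The Scan step above can be sketched roughly as follows. This is a minimal illustration in Rust using only the standard library, not the tool's actual implementation: the names `SKIPPED_DIRS`, `is_skipped_dir`, and `collect_files` are hypothetical, and the skip list holds only the three directories named above, since the tool's full list is not shown here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory names that are never descended into.
/// Assumption: only the three names mentioned in "How it works";
/// the real tool filters additional non-source directories.
const SKIPPED_DIRS: [&str; 3] = ["target", "node_modules", ".git"];

/// Returns true when a directory with this name should be skipped.
fn is_skipped_dir(name: &str) -> bool {
    SKIPPED_DIRS.contains(&name)
}

/// Collects every regular file under `root`, skipping filtered directories.
/// The walk uses an explicit stack instead of recursion, and the result is
/// sorted so the output order is deterministic.
fn collect_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // `DirEntry::file_type` does not follow symlinks, so symlinks
            // are neither descended into nor collected.
            let file_type = entry.file_type()?;

            if file_type.is_dir() {
                let name = entry.file_name();
                if !is_skipped_dir(&name.to_string_lossy()) {
                    pending.push(path);
                }
            } else if file_type.is_file() {
                files.push(path);
            }
        }
    }

    files.sort();
    Ok(files)
}
```

Calling `collect_files(Path::new("."))` would return the files that the Parse step then consumes. Matching on the directory name rather than the full path means a nested `node_modules/` or `target/` is skipped at any depth, which is usually what you want for monorepos.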