- Browser strategy: headless first with automatic retry in visible Chrome on failure
- New --browser auto|headless|headed flag with --headless/--headed shortcuts
- Content cleaner module for HTML preprocessing (remove ads, base64 images, scripts)
- Media localizer now handles base64 data URIs alongside remote URLs
- Capture finalUrl from browser to track redirects for output path
- Agent quality gate documentation for post-capture validation
- Upgrade defuddle ^0.12.0 → ^0.14.0
- Add unit tests for content-cleaner, html-to-markdown, legacy-converter, media-localizer
- New parsers/ module with pluggable rule system for site-specific HTML extraction
- X status parser: extract tweet text, media, quotes, author from data-testid elements
- X article parser: extract long-form article content with inline media
- archive.ph parser: restore original URL and prefer #CONTENT container
- Improved slug generation with stop words and content-aware slugs
- Output path uses subdirectory structure (domain/slug/slug.md)
- Fix: preserve anchor elements containing media in legacy converter
- Fix: smarter title deduplication in markdown document builder
- Save rendered HTML as sibling -captured.html file alongside markdown
- Defuddle-first conversion with automatic fallback to legacy Readability/selector extractor
- Add rawHtml, conversionMethod, fallbackReason to ConversionResult
- Log converter method and fallback reason in CLI output
- Add --download-media flag to download images/videos to local dirs and rewrite markdown links
- Extract coverImage from page meta (og:image) into YAML front matter
- Handle data-src lazy loading for WeChat and similar sites
- Add EXTEND.md preferences with first-time setup for download_media setting
- Add media-localizer module adapted from x-to-markdown