- Browser strategy: headless first with automatic retry in visible Chrome on failure
- New --browser auto|headless|headed flag with --headless/--headed shortcuts
- Content cleaner module for HTML preprocessing (remove ads, base64 images, scripts)
- Media localizer now handles base64 data URIs alongside remote URLs
- Capture finalUrl from the browser so redirects are reflected in the output path
- Agent quality gate documentation for post-capture validation
- Upgrade defuddle ^0.12.0 → ^0.14.0
- Add unit tests for content-cleaner, html-to-markdown, legacy-converter, media-localizer
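
The browser selection above could be sketched roughly as follows. This is a minimal illustration, not the project's code: resolveBrowserMode, captureWithFallback, and the run callback are hypothetical names, and accepting both "--browser value" and "--browser=value" is an assumption.

```typescript
type BrowserMode = "auto" | "headless" | "headed";

const MODES: readonly string[] = ["auto", "headless", "headed"];

// Hypothetical helper: map CLI arguments to a mode, defaulting to "auto".
// When several flags are given, the last one wins (an assumption).
function resolveBrowserMode(argv: string[]): BrowserMode {
  let mode: BrowserMode = "auto";
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--headless") mode = "headless";
    else if (arg === "--headed") mode = "headed";
    else if (arg === "--browser" || arg.startsWith("--browser=")) {
      const value = arg === "--browser" ? argv[++i] : arg.slice("--browser=".length);
      if (value === undefined || MODES.indexOf(value) === -1) {
        throw new Error(`Invalid --browser value: ${value}`);
      }
      mode = value as BrowserMode;
    }
  }
  return mode;
}

// Hypothetical wrapper: `run` performs one capture attempt and receives
// whether the browser should be launched headless.
async function captureWithFallback<T>(
  mode: BrowserMode,
  run: (headless: boolean) => Promise<T>,
): Promise<T> {
  if (mode === "headed") return run(false);
  try {
    return await run(true);
  } catch (err) {
    if (mode === "headless") throw err; // explicit headless: no retry
    return run(false); // auto: retry once in a visible browser
  }
}
```

Keeping the retry in a wrapper means only "auto" pays for a second launch, while the explicit modes fail fast.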
Retry with alternate InnerTube client identities when YouTube returns
anti-bot responses, then fall back to yt-dlp when available. Split
monolithic main.ts into typed modules (youtube, transcript, storage,
shared, types) and add unit tests.
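
A rough sketch of the retry-then-fallback flow, under stated assumptions: fetchWithFallback, FallbackOptions, and all callbacks are hypothetical names, and the actual InnerTube request, anti-bot detection, and yt-dlp invocation are injected rather than shown.

```typescript
interface FallbackOptions<T> {
  clients: string[]; // client identities to try, in order
  fetchWith: (client: string) => Promise<T>; // one request as a given identity
  isAntiBot: (err: unknown) => boolean; // classifies a failure as an anti-bot response
  ytDlpFallback?: () => Promise<T>; // supplied only when yt-dlp is available
}

async function fetchWithFallback<T>(opts: FallbackOptions<T>): Promise<T> {
  let lastError: unknown = new Error("no client identities configured");
  for (const client of opts.clients) {
    try {
      return await opts.fetchWith(client);
    } catch (err) {
      if (!opts.isAntiBot(err)) throw err; // unrelated failure: surface immediately
      lastError = err; // anti-bot response: move on to the next identity
    }
  }
  // Every identity was rejected: use yt-dlp if present, else report the last error.
  if (opts.ytDlpFallback) return opts.ytDlpFallback();
  throw lastError;
}
```

Only failures classified as anti-bot trigger the next identity, so genuine errors (bad video ID, network failure) are not masked by the retries.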
Merge the three plugins (content-skills, ai-generation-skills,
utility-skills) into one plugin entry. Since all three shared the same
source ("./"), Claude Code cached every skill three times. A single
plugin with one source keeps the flat skills/ layout while ensuring
each skill is registered exactly once.

- New parsers/ module with pluggable rule system for site-specific HTML extraction
- X status parser: extract tweet text, media, quotes, author from data-testid elements
- X article parser: extract long-form article content with inline media
- archive.ph parser: restore original URL and prefer #CONTENT container
- Improved slug generation: stop-word filtering and content-aware slugs
- Output path uses subdirectory structure (domain/slug/slug.md)
- Fix: preserve anchor elements containing media in legacy converter
- Fix: smarter title deduplication in markdown document builder
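
For illustration, stop-word slugs and the domain/slug/slug.md layout might look like the sketch below. The stop-word list, the six-word cap, the "untitled" fallback, and the names slugify and outputPath are all assumptions, not the project's actual rules.

```typescript
// Assumed stop-word list; the real one is likely longer.
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with"]);

function slugify(title: string, maxWords = 6): string {
  const words = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // ASCII-only for brevity
    .trim()
    .split(" ")
    .filter(Boolean);
  const kept = words.filter((w) => !STOP_WORDS.has(w));
  // If every word was a stop word, keep the original words rather than return nothing.
  const chosen = kept.length > 0 ? kept : words;
  return chosen.slice(0, maxWords).join("-") || "untitled";
}

function outputPath(url: string, title: string): string {
  const domain = new URL(url).hostname.replace(/^www\./, "");
  const slug = slugify(title);
  return `${domain}/${slug}/${slug}.md`;
}

console.log(outputPath("https://www.example.com/post/1", "A Guide to Parsers"));
// prints "example.com/guide-parsers/guide-parsers.md"
```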
- Guard last chapter end against duration=0: use Math.max(duration, ch.start)
- Remove unnecessary 'as any' cast in backfill
- Check all chapters for missing end (not just first) via .some()
- Skip backfill when needsFetch is true (about to refetch anyway)
- Wrap backfill writeFileSync in try/catch (best-effort persistence)

Videos cached before the chapter end-time change would silently
lack the 'end' field when loaded from cache. This adds a migration
that detects missing 'end' fields on cache hit, computes them from
adjacent chapters, and persists the updated meta.json.
This ensures consistent output regardless of whether the data was
freshly fetched or loaded from cache.
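
The migration can be pictured with the sketch below. It is an approximation, not the actual code: CachedMeta, backfillChapterEnds, and the injected persist callback are hypothetical (the real code is described as calling writeFileSync directly).

```typescript
interface CachedChapter { title: string; start: number; end?: number }
interface CachedMeta { duration: number; chapters: CachedChapter[] }

// Returns true when at least one missing 'end' was filled in.
function backfillChapterEnds(
  meta: CachedMeta,
  needsFetch: boolean,
  persist: (json: string) => void,
): boolean {
  if (needsFetch) return false; // about to refetch anyway
  if (!meta.chapters.some((ch) => ch.end === undefined)) return false;

  meta.chapters.forEach((ch, i) => {
    if (ch.end !== undefined) return;
    const next = meta.chapters[i + 1];
    // Last chapter ends at the duration, guarded against duration=0.
    ch.end = next ? next.start : Math.max(meta.duration, ch.start);
  });

  try {
    persist(JSON.stringify(meta, null, 2));
  } catch {
    // Best-effort: the in-memory data is already correct, so ignore write errors.
  }
  return true;
}
```

In Node, persist could be something like (json) => writeFileSync(metaPath, json), where metaPath is the (assumed) location of meta.json.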
Add 'end' field to Chapter interface and parseChapters output.
Each chapter's end is derived from the next chapter's start time,
with the last chapter ending at the video's total duration.
This makes chapter data complete and ready for downstream consumers
(e.g. video clipping with ffmpeg) without requiring them to compute
end times from adjacent chapters.
Before: { title: 'Overview', start: 0 }
After: { title: 'Overview', start: 0, end: 21 }
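
A minimal sketch of the derivation, assuming chapters are sorted by start time. RawChapter, withChapterEnds, the second chapter, and the 90-second duration are illustrative only.

```typescript
interface RawChapter { title: string; start: number }
interface Chapter extends RawChapter { end: number }

function withChapterEnds(chapters: RawChapter[], duration: number): Chapter[] {
  return chapters.map((ch, i) => {
    const next = chapters[i + 1];
    // End at the next chapter's start, or at the video duration for the last one.
    return { ...ch, end: next ? next.start : duration };
  });
}

const chapters = withChapterEnds(
  [{ title: "Overview", start: 0 }, { title: "Setup", start: 21 }],
  90,
);
console.log(chapters[0]); // { title: 'Overview', start: 0, end: 21 }
```

A consumer such as an ffmpeg clipping step can then read start and end straight from each chapter.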