7 Commits

Author SHA1 Message Date
Jim Liu 宝玉 9e3d72cf42 chore(baoyu-url-to-markdown): sync vendor baoyu-fetch with session and lifecycle changes 2026-03-31 18:24:25 -05:00
Jim Liu 宝玉 2ff139112f refactor(baoyu-url-to-markdown): replace custom pipeline with baoyu-fetch CLI 2026-03-27 14:11:05 -05:00
Jim Liu 宝玉 e99ce744cd feat(baoyu-url-to-markdown): add browser fallback strategy, content cleaner, and data URI support
- Browser strategy: headless first with automatic retry in visible Chrome on failure
- New --browser auto|headless|headed flag with --headless/--headed shortcuts
- Content cleaner module for HTML preprocessing (remove ads, base64 images, scripts)
- Media localizer now handles base64 data URIs alongside remote URLs
- Capture finalUrl from browser to track redirects for output path
- Agent quality gate documentation for post-capture validation
- Upgrade defuddle ^0.12.0 → ^0.14.0
- Add unit tests for content-cleaner, html-to-markdown, legacy-converter, media-localizer
2026-03-24 22:39:17 -05:00
Jim Liu 宝玉 0279fa403d feat(baoyu-url-to-markdown): add defuddle.md API fallback, YouTube transcripts, and modular converter architecture 2026-03-13 00:22:03 -05:00
Jim Liu 宝玉 3bba18c1fe build: commit vendored shared skill packages 2026-03-11 20:45:25 -05:00
Jim Liu 宝玉 069c5dc7d7 refactor: unify skill cdp and release artifacts 2026-03-11 19:38:59 -05:00
Jim Liu 宝玉 5560db595a feat(baoyu-url-to-markdown): add HTML snapshot saving and Defuddle fallback pipeline
- Save rendered HTML as sibling -captured.html file alongside markdown
- Defuddle-first conversion with automatic fallback to legacy Readability/selector extractor
- Add rawHtml, conversionMethod, fallbackReason to ConversionResult
- Log converter method and fallback reason in CLI output
2026-03-06 21:18:21 -06:00
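
The Defuddle-first pipeline described in commit 5560db595a can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only `rawHtml`, `conversionMethod`, `fallbackReason`, `ConversionResult`, and the `-captured.html` suffix come from the commit message. The `markdown` field, the `Converter` type, the function names, the method labels `"defuddle"` and `"legacy"`, and the rule that empty output triggers the fallback are all assumptions made for the example.

```typescript
// Hypothetical labels; the commit does not list the actual values.
type ConversionMethod = "defuddle" | "legacy";

// rawHtml, conversionMethod, and fallbackReason are named in the commit;
// the markdown field is assumed.
interface ConversionResult {
  markdown: string;
  rawHtml: string;
  conversionMethod: ConversionMethod;
  fallbackReason?: string;
}

// Stand-in for any HTML-to-Markdown converter (assumed signature).
type Converter = (html: string) => string;

// Try the primary converter first; fall back to the legacy extractor
// if it throws or (an assumption here) returns empty output.
function convertWithFallback(
  html: string,
  primary: Converter,
  legacy: Converter,
): ConversionResult {
  try {
    const markdown = primary(html);
    if (markdown.trim().length === 0) {
      throw new Error("primary converter returned empty output");
    }
    return { markdown, rawHtml: html, conversionMethod: "defuddle" };
  } catch (err) {
    const fallbackReason = err instanceof Error ? err.message : String(err);
    return {
      markdown: legacy(html),
      rawHtml: html,
      conversionMethod: "legacy",
      fallbackReason,
    };
  }
}

// Derive the sibling snapshot path, assuming the markdown file ends in ".md".
function capturedHtmlPath(markdownPath: string): string {
  return markdownPath.replace(/\.md$/, "") + "-captured.html";
}
```

Recording the fallback reason in the result, instead of only logging it, lets the CLI print which converter ran and why without re-running the conversion.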