Commit Graph

21 Commits

Author SHA1 Message Date
Jim Liu 宝玉 450c76d955 chore(baoyu-url-to-markdown): sync vendor baoyu-fetch with login auto-detect 2026-04-01 02:12:04 -05:00
Jim Liu 宝玉 9e3d72cf42 chore(baoyu-url-to-markdown): sync vendor baoyu-fetch with session and lifecycle changes 2026-03-31 18:24:25 -05:00
Jim Liu 宝玉 2ff139112f refactor(baoyu-url-to-markdown): replace custom pipeline with baoyu-fetch CLI 2026-03-27 14:11:05 -05:00
Jim Liu 宝玉 e99ce744cd feat(baoyu-url-to-markdown): add browser fallback strategy, content cleaner, and data URI support
- Browser strategy: headless first with automatic retry in visible Chrome on failure
- New --browser auto|headless|headed flag with --headless/--headed shortcuts
- Content cleaner module for HTML preprocessing (remove ads, base64 images, scripts)
- Media localizer now handles base64 data URIs alongside remote URLs
- Capture finalUrl from browser to track redirects for output path
- Agent quality gate documentation for post-capture validation
- Upgrade defuddle ^0.12.0 → ^0.14.0
- Add unit tests for content-cleaner, html-to-markdown, legacy-converter, media-localizer
2026-03-24 22:39:17 -05:00
Jim Liu 宝玉 d4e80b1bc3 Fix Node-compatible parser tests (#107)
* Fix Node-compatible parser tests

* Add parser test dependencies to root test env
2026-03-23 15:30:42 -05:00
Jim Liu 宝玉 e5d6c8ec68 feat(baoyu-url-to-markdown): add URL-specific parser layer for X/Twitter and archive.ph
- New parsers/ module with pluggable rule system for site-specific HTML extraction
- X status parser: extract tweet text, media, quotes, author from data-testid elements
- X article parser: extract long-form article content with inline media
- archive.ph parser: restore original URL and prefer #CONTENT container
- Improved slug generation with stop words and content-aware slugs
- Output path uses subdirectory structure (domain/slug/slug.md)
- Fix: preserve anchor elements containing media in legacy converter
- Fix: smarter title deduplication in markdown document builder
2026-03-22 15:18:46 -05:00
Jim Liu 宝玉 f407c950c3 docs(gemini-web): clarify CDP session reuse 2026-03-16 20:01:09 -05:00
Jim Liu 宝玉 c1f8a9ad07 chore: sync vendored baoyu-chrome-cdp copies 2026-03-16 12:57:39 -05:00
Jim Liu 宝玉 4be6f3682a chore: sync vendored baoyu-chrome-cdp across all skills 2026-03-16 12:54:36 -05:00
Jim Liu 宝玉 de7dc85361 chore: sync shared skill package vendor test files 2026-03-13 17:56:53 -05:00
Jim Liu 宝玉 0279fa403d feat(baoyu-url-to-markdown): add defuddle.md API fallback, YouTube transcripts, and modular converter architecture 2026-03-13 00:22:03 -05:00
Jim Liu 宝玉 3bba18c1fe build: commit vendored shared skill packages 2026-03-11 20:45:25 -05:00
Jim Liu 宝玉 069c5dc7d7 refactor: unify skill cdp and release artifacts 2026-03-11 19:38:59 -05:00
Jim Liu 宝玉 00bf946403 Support reusing an existing Chrome CDP instance; fix port detection order issue 2026-03-11 17:24:18 -05:00
Jim Liu 宝玉 5560db595a feat(baoyu-url-to-markdown): add HTML snapshot saving and Defuddle fallback pipeline
- Save rendered HTML as sibling -captured.html file alongside markdown
- Defuddle-first conversion with automatic fallback to legacy Readability/selector extractor
- Add rawHtml, conversionMethod, fallbackReason to ConversionResult
- Log converter method and fallback reason in CLI output
2026-03-06 21:18:21 -06:00
Jim Liu 宝玉 6e533f938f refactor: unify Chrome CDP profile path across all skills
All skills now share a single Chrome profile at:
- macOS: ~/Library/Application Support/baoyu-skills/chrome-profile
- Linux: $XDG_DATA_HOME/baoyu-skills/chrome-profile
- Env override: BAOYU_CHROME_PROFILE_DIR

Fixes baoyu-post-to-weibo incorrectly reusing x-browser-profile.
Legacy per-skill env vars retained as fallback.
2026-03-06 16:03:01 -06:00
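The profile lookup described in commit 6e533f938f can be sketched as below. This is an illustrative Python sketch, not the repository's code: the function name `resolve_chrome_profile_dir` and its parameters are hypothetical, the `~/.local/share` fallback is the XDG default and only assumed to apply here, and the legacy per-skill env vars mentioned in the commit are not modelled.

```python
import os
import sys
from pathlib import Path


def resolve_chrome_profile_dir(env=None, platform=None, home=None):
    """Return the shared Chrome profile directory (hypothetical helper).

    env, platform, and home are injectable so the lookup can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # 1. The explicit override from the commit message wins when set.
    override = env.get("BAOYU_CHROME_PROFILE_DIR")
    if override:
        return Path(override)

    # 2. macOS: ~/Library/Application Support/baoyu-skills/chrome-profile
    if platform == "darwin":
        return (home / "Library" / "Application Support"
                / "baoyu-skills" / "chrome-profile")

    # 3. Linux: $XDG_DATA_HOME/baoyu-skills/chrome-profile.
    #    Assumption: fall back to ~/.local/share when XDG_DATA_HOME is unset.
    data_home = env.get("XDG_DATA_HOME")
    base = Path(data_home) if data_home else home / ".local" / "share"
    return base / "baoyu-skills" / "chrome-profile"
```

Calling `resolve_chrome_profile_dir()` with no arguments reads the real environment. Passing `env`, `platform`, and `home` explicitly makes each branch easy to check in isolation.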
Jim Liu 宝玉 fff1a54b6b feat(baoyu-url-to-markdown): add --output-dir option for custom output directory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 14:34:07 -06:00
Jim Liu 宝玉 210905ef66 feat(baoyu-url-to-markdown): add media download and cover image extraction
- Add --download-media flag to download images/videos to local dirs and rewrite markdown links
- Extract coverImage from page meta (og:image) into YAML front matter
- Handle data-src lazy loading for WeChat and similar sites
- Add EXTEND.md preferences with first-time setup for download_media setting
- Add media-localizer module adapted from x-to-markdown
2026-03-05 00:37:09 -06:00
Jim Liu 宝玉 832f06e86e refactor(baoyu-url-to-markdown): replace custom extraction with defuddle library 2026-03-05 00:11:59 -06:00
Jim Liu 宝玉 c742bfa1af fix(baoyu-url-to-markdown): improve html extraction and markdown conversion 2026-02-06 13:35:28 -06:00
Jim Liu 宝玉 97da7ab4eb chore: release v1.13.0 2026-01-21 19:40:46 -06:00