- Browser strategy: headless first with automatic retry in visible Chrome on failure
- New --browser auto|headless|headed flag with --headless/--headed shortcuts
- Content cleaner module for HTML preprocessing (remove ads, base64 images, scripts)
- Media localizer now handles base64 data URIs alongside remote URLs
- Capture finalUrl from browser to track redirects for output path
- Agent quality gate documentation for post-capture validation
- Upgrade defuddle ^0.12.0 → ^0.14.0
- Add unit tests for content-cleaner, html-to-markdown, legacy-converter, media-localizer
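A minimal sketch of how the --browser flag and its shortcuts could resolve to an attempt order. The names `resolveMode` and `attemptOrder` are hypothetical, and two behaviors are assumptions not confirmed by this log: `auto` is treated as the default, and the last flag on the command line wins.

```typescript
type BrowserMode = "auto" | "headless" | "headed";

// Resolve the mode from argv. Assumption: default is "auto", last flag wins.
function resolveMode(argv: string[]): BrowserMode {
  let mode: BrowserMode = "auto";
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--headless") {
      mode = "headless";
    } else if (arg === "--headed") {
      mode = "headed";
    } else if (arg === "--browser") {
      const value = argv[i + 1];
      if (value === "auto" || value === "headless" || value === "headed") {
        mode = value;
        i++; // skip the consumed value
      } else {
        throw new Error(`--browser expects auto|headless|headed, got: ${String(value)}`);
      }
    }
  }
  return mode;
}

// "auto" tries headless first and retries in a visible browser on failure.
function attemptOrder(mode: BrowserMode): ("headless" | "headed")[] {
  switch (mode) {
    case "auto":
      return ["headless", "headed"];
    case "headless":
      return ["headless"];
    case "headed":
      return ["headed"];
  }
}
```

With this shape the caller only loops over `attemptOrder(resolveMode(process.argv.slice(2)))` and stops at the first successful capture.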
Retry with alternate InnerTube client identities when YouTube returns
anti-bot responses, then fall back to yt-dlp when available. Split
monolithic main.ts into typed modules (youtube, transcript, storage,
shared, types) and add unit tests.
- New parsers/ module with pluggable rule system for site-specific HTML extraction
- X status parser: extract tweet text, media, quotes, author from data-testid elements
- X article parser: extract long-form article content with inline media
- archive.ph parser: restore original URL and prefer #CONTENT container
- Improved slug generation with stop words and content-aware slugs
- Output path uses subdirectory structure (domain/slug/slug.md)
- Fix: preserve anchor elements containing media in legacy converter
- Fix: smarter title deduplication in markdown document builder
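The slug and output path changes above can be illustrated roughly as follows. This is a sketch under assumptions: the stop-word list, the word cap, the `untitled` fallback, and the stripping of a leading `www.` are all hypothetical, and the names `slugify` and `outputPath` are not taken from the codebase.

```typescript
// Hypothetical stop-word list; the real list is not shown in this log.
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with"]);

// Lowercase, split on non-alphanumerics, drop stop words, cap the word count.
function slugify(title: string, maxWords = 6): string {
  const words = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
  return words.slice(0, maxWords).join("-") || "untitled";
}

// Compose domain/slug/slug.md from the final (post-redirect) URL and the title.
function outputPath(finalUrl: string, title: string): string {
  const domain = new URL(finalUrl).hostname.replace(/^www\./, "");
  const slug = slugify(title);
  return `${domain}/${slug}/${slug}.md`;
}
```

For instance, `outputPath("https://www.example.com/posts/1", "The State of the Web in 2024")` yields `example.com/state-web-2024/state-web-2024.md` under these assumptions.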
- Guard last chapter end against duration=0: use Math.max(duration, ch.start)
- Remove unnecessary 'as any' cast in backfill
- Check all chapters for missing end (not just first) via .some()
- Skip backfill when needsFetch is true (about to refetch anyway)
- Wrap backfill writeFileSync in try/catch (best-effort persistence)
Videos cached before the chapter end-time change would silently
lack the 'end' field when loaded from cache. This adds a migration
that detects missing 'end' fields on cache hit, computes them from
adjacent chapters, and persists the updated meta.json.
This ensures consistent output regardless of whether the data was
freshly fetched or loaded from cache.
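The cache-hit migration described above might look roughly like this. It is a sketch, not the actual code: `CachedMeta`, `onCacheHit`, and the injected `persist` callback are hypothetical stand-ins (the real code reportedly uses writeFileSync on meta.json), and this version recomputes every chapter's end rather than only the missing ones.

```typescript
interface CachedChapter { title: string; start: number; end?: number }
interface CachedMeta { duration: number; chapters: CachedChapter[] }

// Check all chapters, not just the first, for a missing 'end'.
function needsEndBackfill(meta: CachedMeta): boolean {
  return meta.chapters.some((ch) => ch.end === undefined);
}

// Compute ends from adjacent chapters; the last chapter is guarded
// against duration=0 by never ending before its own start.
function backfillEnds(meta: CachedMeta): CachedMeta {
  const chapters = meta.chapters.map((ch, i) => {
    const next: CachedChapter | undefined = meta.chapters[i + 1];
    const end = next ? next.start : Math.max(meta.duration, ch.start);
    return { ...ch, end };
  });
  return { ...meta, chapters };
}

function onCacheHit(
  meta: CachedMeta,
  needsFetch: boolean,
  persist: (updated: CachedMeta) => void,
): CachedMeta {
  // Skip when a refetch is about to replace the cache anyway.
  if (needsFetch || !needsEndBackfill(meta)) return meta;
  const updated = backfillEnds(meta);
  try {
    persist(updated);
  } catch {
    // Best-effort persistence: a failed write must not break the read path.
  }
  return updated;
}
```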
Add 'end' field to Chapter interface and parseChapters output.
Each chapter's end is derived from the next chapter's start time,
with the last chapter ending at the video's total duration.
This makes chapter data complete and ready for downstream consumers
(e.g. video clipping with ffmpeg) without requiring them to compute
end times from adjacent chapters.
Before: { title: 'Overview', start: 0 }
After: { title: 'Overview', start: 0, end: 21 }
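As an illustration of the derivation, a minimal sketch follows. The helper name `withEnds` is hypothetical (the log only names parseChapters), and the `Math.max` guard for the last chapter reflects the duration=0 fix listed elsewhere in this log.

```typescript
interface Chapter {
  title: string;
  start: number; // seconds
  end: number; // seconds
}

type RawChapter = { title: string; start: number };

// Each chapter ends where the next one starts; the last chapter ends at
// the video's total duration, but never before its own start.
function withEnds(chapters: RawChapter[], duration: number): Chapter[] {
  return chapters.map((ch, i) => {
    const next: RawChapter | undefined = chapters[i + 1];
    const end = next ? next.start : Math.max(duration, ch.start);
    return { ...ch, end };
  });
}
```

A downstream consumer such as a clipping script can then read `start` and `end` directly from each chapter without inspecting its neighbors.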
- CLI now supports --color, --font-family, --font-size, --code-theme, --mac-code-block, --line-number, --count, --legend
- convertMarkdown accepts full CliOptions instead of limited subset
- Dynamic help text showing available theme/color/font options
- Remove quotes from CSS custom property regex character class so values containing quotes are fully stripped
- grace/simple themes now layer default CSS before their own rules
- Add tests for quoted property stripping and theme layering
- Add retry logic (5 attempts with progressive backoff) to clickMenuByText
  to handle slow-loading home page menus
- Increase post-login wait from 2s to 5s and menu timeout from 20s to 40s
- Replace fixed 3s sleep after editor tab opens with waitForElement polling
  for #title (30s) and .ProseMirror (15s) to reliably wait for full load
- Improve title/author filling with focus() and change event dispatch
  for more reliable value setting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
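A generic sketch of retrying with progressive backoff, in the spirit of the clickMenuByText change. The helper `retryWithBackoff` is hypothetical, and the linear delay schedule (base delay times attempt number) is an assumption; the log does not state the actual schedule.

```typescript
// Retry an async action up to `attempts` times, waiting longer after each
// failure. `sleep` is injectable so callers (and tests) can avoid real timers.
async function retryWithBackoff<T>(
  action: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Assumed schedule: 1x, 2x, 3x... the base delay. No wait after the last try.
      if (attempt < attempts) await sleep(baseDelayMs * attempt);
    }
  }
  throw lastError;
}
```

Wrapping a menu click would then read `await retryWithBackoff(() => clickMenu())`, where `clickMenu` stands in for whatever call performs the click.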
Both GEMINI_WEB_CHROME_PROFILE_DIR and BAOYU_CHROME_PROFILE_DIR are
valid profile overrides (see resolveGeminiWebChromeProfileDir). Skip
existing Chrome auto-discovery when either is set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
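The override check can be sketched as below. The function name `hasExplicitProfileOverride` is hypothetical, and treating whitespace-only values as unset is an assumption; only the two environment variable names come from the commit message.

```typescript
const PROFILE_OVERRIDE_VARS = ["GEMINI_WEB_CHROME_PROFILE_DIR", "BAOYU_CHROME_PROFILE_DIR"];

// True when either profile override is set to a non-empty value, in which
// case auto-discovery of an existing Chrome should be skipped.
function hasExplicitProfileOverride(env: Record<string, string | undefined>): boolean {
  return PROFILE_OVERRIDE_VARS.some((name) => (env[name] ?? "").trim().length > 0);
}
```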
Skip existing Chrome auto-discovery when GEMINI_WEB_CHROME_PROFILE_DIR
is explicitly set, to avoid binding to the wrong browser profile/account.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
findExistingChromeDebugPort callers (fetch_google_cookies_via_cdp) depend
on /json/version being available. A TCP-only check could misclassify
non-CDP listeners as reusable, causing waitForChromeDebugPort to time out
instead of falling back to launching a new Chrome. Restore isDebugPortReady
(HTTP) as the validator for this function; the TCP-only check remains in
discoverRunningChromeDebugPort for Chrome 146 approval mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chrome 146+ blocks all HTTP endpoints (/json/version) in approval mode.
Read DevToolsActivePort ws path directly and use TCP port check instead
of HTTP discovery. Add WebSocket connect retry loop for approval dialog.
Unify findExistingChromeDebugPort to use the same mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
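A rough sketch of the discovery path that avoids HTTP. It assumes the DevToolsActivePort file holds the port on its first line and the browser WebSocket path on its second; the names `parseDevToolsActivePort`, `wsUrl`, and `isTcpPortOpen` are hypothetical, and the approval-dialog retry loop is not shown.

```typescript
import * as net from "node:net";

interface DevToolsActivePort {
  port: number;
  wsPath: string;
}

// Parse the file contents. Assumed layout: "<port>\n<ws path>\n".
function parseDevToolsActivePort(contents: string): DevToolsActivePort | null {
  const [portLine, wsPath] = contents.split(/\r?\n/).map((line) => line.trim());
  if (!portLine || !/^\d+$/.test(portLine)) return null;
  const port = Number.parseInt(portLine, 10);
  if (port <= 0 || port > 65535) return null;
  if (!wsPath || !wsPath.startsWith("/")) return null;
  return { port, wsPath };
}

// Build the WebSocket URL directly instead of asking /json/version for it.
function wsUrl(info: DevToolsActivePort, host = "127.0.0.1"): string {
  return `ws://${host}:${info.port}${info.wsPath}`;
}

// TCP-level liveness check: resolves true if something accepts the connection.
// It cannot tell a CDP endpoint from any other listener on that port.
function isTcpPortOpen(port: number, host = "127.0.0.1", timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (open: boolean) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}
```

Because the TCP check accepts any listener, it suits cases where the port number already comes from a trusted source such as the DevToolsActivePort file.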
- Replace manual CDP target orchestration with openPageSession, keeping
  behavior consistent with fetch_google_cookies_via_cdp
- Move created-tab cleanup into finally block so tabs are always closed
  even when Target.attachToTarget or Network.enable throws
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
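The cleanup pattern can be sketched with a stand-in client. `TabClient` and `withTab` are hypothetical and do not mirror openPageSession or the real CDP wrapper; the point is only that the close call sits in `finally`, so it runs even when attaching fails.

```typescript
// Hypothetical minimal client; not the real CDP wrapper's interface.
interface TabClient {
  createTab(url: string): Promise<string>;
  attach(tabId: string): Promise<void>;
  closeTab(tabId: string): Promise<void>;
}

async function withTab<T>(
  client: TabClient,
  url: string,
  work: (tabId: string) => Promise<T>,
): Promise<T> {
  let tabId: string | undefined;
  try {
    tabId = await client.createTab(url);
    await client.attach(tabId); // may throw; cleanup below still runs
    return await work(tabId);
  } finally {
    if (tabId !== undefined) {
      // Best-effort close: swallow close errors so they do not mask the original one.
      await client.closeTab(tabId).catch(() => undefined);
    }
  }
}
```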
Add the ability to discover and connect to an already-running Chrome browser
(v144+) for cookie extraction, avoiding the need to launch a new window
and log in again. Discovery uses Chrome's DevToolsActivePort file in the
default user data directories, plus process scanning.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `extractText()` helper in `preprocessCjkEmphasis()` only handled
`text` nodes and nodes with `children`. `inlineCode` AST nodes (which
have a `value` but no `children`) fell through to the default empty-
string return, silently dropping their content.
For example, `**算出 \`logits\`**` was rendered as `<strong>算出 </strong>`,
with the code span completely lost.
Add an `inlineCode` branch that wraps the node value in backticks so
the downstream `marked` pass can turn it into a proper `<code>` element.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
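A simplified sketch of the fixed helper. The `MdNode` type is a pared-down stand-in for the markdown AST node types, and the real `extractText()` may handle more cases than shown here.

```typescript
// Pared-down AST node shape: leaf nodes carry `value`, parents carry `children`.
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

function extractText(node: MdNode): string {
  if (node.type === "text") return node.value ?? "";
  // inlineCode has a value but no children; re-wrap it in backticks so a
  // later markdown pass can turn it back into a <code> element.
  if (node.type === "inlineCode") return `\`${node.value ?? ""}\``;
  if (node.children) return node.children.map(extractText).join("");
  return "";
}
```

Without the `inlineCode` branch, such a node would match neither the `text` case nor the `children` case and would contribute an empty string.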
Move test files out of the tests/ directory so they are colocated with
source code, convert them from .mjs to .ts run with the tsx runner, and
add workspaces and npm caching to the CI workflow.
- Add a rule to the translator's notes that reduces annotations for short texts (<5 sentences)
- Explicitly pass resolved style preset to 02-prompt.md assembly in all modes
- Fix Provider Selection: default to Replicate when multiple keys are available (matches code)
- Add Replicate column to Quality Presets table (normal→1K, 2k→2K)
- Add Replicate aspect ratio behavior (match_input_image when --ref without --ar)
- Remove stale Google Imagen reference from Aspect Ratios
- Add batch file format example with JSON schema to SKILL.md
- Note that batch paths resolve relative to batch file directory
- Move batch execution strategy in article-illustrator before numbered steps
- Fix translate image-language reminder to use standard markdown syntax
  with note to match article's own image syntax convention