feat: sync baoyu skills updates from shared-skills

This commit is contained in:
敖氏 2026-03-09 11:38:47 +08:00
parent 5144335916
commit 4905d53267
14 changed files with 1160 additions and 842 deletions


@@ -1,27 +1,23 @@
---
name: baoyu-article-illustrator
description: Analyzes article structure, identifies positions requiring visual aids, generates illustrations with Type × Style two-dimension approach. Use when user asks to "illustrate article", "add images", "generate images for article", or "为文章配图".
version: 1.56.1
metadata:
openclaw:
homepage: https://github.com/JimLiu/baoyu-skills#baoyu-article-illustrator
description: Analyzes article structure, identifies positions requiring visual aids, and generates illustrations with Type × Style consistency. Defaults illustration text to the article's main language, and supports reference-image translation/localization by passing the original image to the generation model. Use when user asks to illustrate an article, add images, or generate images for article sections.
---
# Article Illustrator
Analyze articles, identify illustration positions, generate images with Type × Style consistency.
Analyze articles, identify illustration positions, and generate images with Type × Style consistency.
For multi-image jobs, prefer building a `batch.json` and calling `baoyu-image-gen --batchfile` so image generation can run in parallel with retries and a final status summary.
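For reference, a `batch.json` built from `outline.md` and `prompts/` might look like the sketch below. The field names are assumptions inferred from the `build-batch.ts` flags shown later in this document, not a documented schema — check `baoyu-image-gen` for the real format:

```json
{
  "provider": "replicate",
  "model": "google/nano-banana-pro",
  "tasks": [
    {
      "promptfile": "prompts/01-infographic-overview.md",
      "output": "attachments/01-infographic-overview.png",
      "ar": "16:9",
      "quality": "2k"
    }
  ]
}
```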
## Two Dimensions
| Dimension | Controls | Examples |
|-----------|----------|----------|
| **Type** | Information structure | infographic, scene, flowchart, comparison, framework, timeline |
| **Style** | Visual aesthetics | notion, warm, minimal, blueprint, watercolor, elegant |
| **Style** | Visual aesthetics | vector-illustration, notion, warm, blueprint, editorial |
Combine freely: `--type infographic --style blueprint`
Or use presets: `--preset tech-explainer` → type + style in one flag. See [Style Presets](references/style-presets.md).
## Types
| Type | Best For |
@@ -32,138 +28,105 @@ Or use presets: `--preset tech-explainer` → type + style in one flag. See [Sty
| `comparison` | Side-by-side, options |
| `framework` | Models, architecture |
| `timeline` | History, evolution |
| `mixed` | Per-section optimization across multiple types |
## Styles
See [references/styles.md](references/styles.md) for Core Styles, full gallery, and Type × Style compatibility.
See [references/styles.md](references/styles.md) for the core style gallery and compatibility guidance.
## Workflow
```
```text
- [ ] Step 1: Pre-check (EXTEND.md, references, config)
- [ ] Step 2: Analyze content
- [ ] Step 3: Confirm settings (AskUserQuestion)
- [ ] Step 3: Confirm settings
- [ ] Step 4: Generate outline
- [ ] Step 5: Generate images
- [ ] Step 5: Generate prompts and images
- [ ] Step 6: Finalize
```
### Step 1: Pre-check
**1.5 Load Preferences (EXTEND.md) ⛔ BLOCKING**
- Load `EXTEND.md`
- Confirm output location
- Save reference images if provided
```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-article-illustrator/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-article-illustrator/EXTEND.md" && echo "user"
```
```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-article-illustrator/EXTEND.md) { "project" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-article-illustrator/EXTEND.md") { "user" }
```
| Result | Action |
|--------|--------|
| Found | Read, parse, display summary |
| Not found | ⛔ Run [first-time-setup](references/config/first-time-setup.md) |
Full procedures: [references/workflow.md](references/workflow.md#step-1-pre-check)
Full procedures: [references/workflow.md](references/workflow.md)
### Step 2: Analyze
| Analysis | Output |
|----------|--------|
| Content type | Technical / Tutorial / Methodology / Narrative |
| Purpose | information / visualization / imagination |
| Core arguments | 2-5 main points |
| Positions | Where illustrations add value |
- Determine article type: technical / tutorial / methodology / narrative
- Identify sections where visuals materially improve understanding
- Recommend illustration type, density, and style
- Detect the article's main language and use it as the default language for visible text in generated illustrations unless the user explicitly requests otherwise
- Visualize the underlying concept, not literal metaphors
**CRITICAL**: Metaphors → visualize underlying concept, NOT literal image.
### Step 3: Confirm Settings
Full procedures: [references/workflow.md](references/workflow.md#step-2-setup--analyze)
Use one confirmation round for:
### Step 3: Confirm Settings ⚠️
**ONE AskUserQuestion, max 4 Qs. Q1-Q2 REQUIRED. Q3 required unless preset chosen.**
| Q | Options |
|---|---------|
| **Q1: Preset or Type** | [Recommended preset], [alt preset], or manual: infographic, scene, flowchart, comparison, framework, timeline, mixed |
| **Q2: Density** | minimal (1-2), balanced (3-5), per-section (Recommended), rich (6+) |
| **Q3: Style** | [Recommended], minimal-flat, sci-fi, hand-drawn, editorial, scene, poster, Other — **skip if preset chosen** |
| Q4: Language | When article language ≠ EXTEND.md setting |
Full procedures: [references/workflow.md](references/workflow.md#step-3-confirm-settings-)
- Type
- Density
- Style
- Image text language only when the user explicitly wants to override the article's main language or the article is genuinely mixed-language
### Step 4: Generate Outline
Save `outline.md` with frontmatter (type, density, style, image_count) and entries:
Save `outline.md` with frontmatter and entries like:
```yaml
## Illustration 1
**Position**: [section/paragraph]
**Purpose**: [why]
**Visual Content**: [what]
**Filename**: 01-infographic-concept-name.png
**Filename**: 01-infographic-topic.png
```
Full template: [references/workflow.md](references/workflow.md#step-4-generate-outline)
### Step 5: Generate Prompts and Images
### Step 5: Generate Images
1. Create one saved prompt file per illustration in `prompts/`
2. Use type-specific templates with structured sections such as `ZONES`, `LABELS`, `COLORS`, `STYLE`, `ASPECT`
3. If the user did not specify a language, all visible text in the illustration must default to the article's main language
4. Labels must include article-specific numbers, terms, metrics, or quotes
5. When translating or localizing an existing image, pass the original image to the image model as a real reference image instead of describing it only in text
6. For acronym frameworks, named methodologies, mnemonics, and fixed step diagrams, extract the canonical wording from the article and include the exact target labels in the prompt
7. Do not generate from ad-hoc inline prompts when prompt files are expected
⛔ **BLOCKING: Prompt files MUST be saved before ANY image generation.**
When pending illustrations >= 2:
1. For each illustration, create a prompt file per [references/prompt-construction.md](references/prompt-construction.md)
2. Save to `prompts/NN-{type}-{slug}.md` with YAML frontmatter
3. Prompts **MUST** use type-specific templates with structured sections (ZONES / LABELS / COLORS / STYLE / ASPECT)
4. LABELS **MUST** include article-specific data: actual numbers, terms, metrics, quotes
5. **DO NOT** pass ad-hoc inline prompts to `--prompt` without saving prompt files first
6. Select generation skill, process references (`direct`/`style`/`palette`)
7. Apply watermark if EXTEND.md enabled
8. Generate from saved prompt files; retry once on failure
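As an illustration, a saved prompt file could look like the sketch below. The frontmatter keys follow the workflow reference in this repo; the label values are hypothetical placeholders standing in for real article data:

```markdown
---
illustration_id: 01
type: infographic
style: blueprint
---
Layout: left-right split, hierarchical
ZONES: left = current pipeline (3 stages), right = proposed pipeline (2 stages)
LABELS: "4,200 req/s", "p99 latency 180 ms", "Stage 2 removed"
COLORS: blueprint blues, Coral (#E07A5F) for emphasis
STYLE: technical schematic, thin precise lines
ASPECT: 16:9
```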
Full procedures: [references/workflow.md](references/workflow.md#step-5-generate-images)
1. Build `batch.json` from `outline.md + prompts/`
2. Call `baoyu-image-gen --batchfile`
3. Let `baoyu-image-gen` handle:
- parallel generation
- per-image retries up to 3 attempts
- tuned provider throttling
- final success/failure summary
### Step 6: Finalize
Insert `![description](path/NN-{type}-{slug}.png)` after paragraphs.
```
Article Illustration Complete!
Article: [path] | Type: [type] | Density: [level] | Style: [style]
Images: X/N generated
```
- Insert image references back into the article
- Prefer preserving the user's markdown conventions
- Report total generated images and any failures
## Output Directory
```
Typical structure:
```text
illustrations/{topic-slug}/
├── source-{slug}.{ext}
├── references/ # if provided
├── outline.md
├── prompts/
└── NN-{type}-{slug}.png
├── source-{slug}.md
├── outline.md
├── prompts/
├── batch.json
└── NN-{type}-{slug}.png
```
**Slug**: 2-4 words, kebab-case. **Conflict**: append `-YYYYMMDD-HHMMSS`.
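The conflict rule above can be sketched in shell (the slug `api-design` is a hypothetical example):

```shell
# Append -YYYYMMDD-HHMMSS when the target directory already exists
slug="api-design"
target="illustrations/${slug}"
if [ -e "$target" ]; then
  target="${target}-$(date +%Y%m%d-%H%M%S)"
fi
echo "$target"
```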
## Modification
| Action | Steps |
|--------|-------|
| Edit | Update prompt → Regenerate → Update reference |
| Add | Position → Prompt → Generate → Update outline → Insert |
| Delete | Delete files → Remove reference → Update outline |
## References
| File | Content |
|------|---------|
| [references/workflow.md](references/workflow.md) | Detailed procedures |
| [references/workflow.md](references/workflow.md) | Detailed workflow |
| [references/usage.md](references/usage.md) | Command syntax |
| [references/styles.md](references/styles.md) | Style gallery |
| [references/style-presets.md](references/style-presets.md) | Preset shortcuts (type + style) |
| [references/prompt-construction.md](references/prompt-construction.md) | Prompt templates |
| [references/config/first-time-setup.md](references/config/first-time-setup.md) | First-time setup |


@@ -95,6 +95,32 @@ When depicting people:
**Add to prompts with text**:
> Text should be large and prominent with handwritten-style fonts. Keep minimal, focus on keywords.
### Text Localization Rules
When a prompt is meant to translate or localize an existing text-heavy image:
1. Pass the original image as a real reference image instead of relying on text-only description
2. Extract the authoritative terminology from the article first
3. Write the target text explicitly into the prompt instead of saying only "translate this image"
4. State what must remain unchanged: layout, structure, colors, icons, arrows, spacing, and non-text elements
5. If the image is a framework, acronym, mnemonic, or labeled methodology, require the translated labels to preserve the original letter mapping exactly
Examples of tasks that need this treatment:
- acronym frameworks such as `D.E.E.P`, `SOLVER`, `AARRR`
- process diagrams where each step name has a fixed canonical translation
- charts or cards with short labels that must align with the article wording
Recommended wording pattern:
```text
This is a faithful localization task, not a redesign task.
Use these exact English labels:
- ...
Requirements:
- Preserve layout, structure, icons, colors, spacing, and composition
- Replace only the text language
- Do not invent new step names
```
---
## Principles
@@ -108,6 +134,9 @@ Good prompts must include:
5. **Style Characteristics**: Line treatment, texture, mood
6. **Aspect Ratio**: End with ratio and complexity level
For localization prompts, add one more rule:
7. **Exact Target Text List**: For any acronym, framework, or named methodology, include the exact target labels and sublabels to prevent semantic drift
## Type-Specific Templates
### Infographic
@@ -236,38 +265,6 @@ STYLE: [style characteristics]
ASPECT: 16:9
```
### Screen-Print Style Override
When `style: screen-print`, replace standard style instructions with:
```
Screen print / silkscreen poster art. Flat color blocks, NO gradients.
COLORS: 2-5 colors maximum. [Choose from style palette or duotone pair]
TEXTURE: Halftone dot patterns, slight color layer misregistration, paper grain
COMPOSITION: Bold silhouettes, geometric framing, negative space as storytelling element
FIGURES: Silhouettes only, no detailed faces, stencil-cut edges
TYPOGRAPHY: Bold condensed sans-serif integrated into composition (not overlaid)
```
**Scene + screen-print**:
```
Conceptual poster scene. Single symbolic focal point, NOT literal illustration.
COLORS: Duotone pair (e.g., Burnt Orange #E8751A + Deep Teal #0A6E6E) on Off-Black #121212
COMPOSITION: Centered silhouette or geometric frame, 60%+ negative space
TEXTURE: Halftone dots, paper grain, slight print misregistration
```
**Comparison + screen-print**:
```
Split poster composition. Each side dominated by one color from duotone pair.
LEFT: [Color A] side with silhouette/icon for [Option A]
RIGHT: [Color B] side with silhouette/icon for [Option B]
DIVIDER: Geometric shape or negative space boundary
TEXTURE: Halftone transitions between sides
```
---
## What to Avoid
- Vague descriptions ("a nice image")


@@ -12,7 +12,6 @@ Simplified style tier for quick selection:
| `hand-drawn` | sketch/warm | Relaxed, reflective, casual content |
| `editorial` | editorial | Processes, data, journalism |
| `scene` | warm/watercolor | Narratives, emotional, lifestyle |
| `poster` | screen-print | Opinion, editorial, cultural, cinematic |
Use Core Styles for most cases. See full Style Gallery below for granular control.
@@ -41,7 +40,6 @@ Use Core Styles for most cases. See full Style Gallery below for granular contro
| `playful` | Whimsical pastel doodles | Fun, casual, educational |
| `retro` | 80s/90s neon geometric | 80s/90s nostalgic, bold |
| `sketch` | Raw pencil notebook style | Brainstorming, creative exploration |
| `screen-print` | Bold poster art, halftone textures, limited colors | Opinion, editorial, cultural, cinematic |
| `sketch-notes` | Soft hand-drawn warm notes | Educational, warm notes |
| `vintage` | Aged parchment historical | Historical, heritage |
@@ -49,14 +47,14 @@ Full specifications: `references/styles/<style>.md`
## Type × Style Compatibility Matrix
| | vector-illustration | notion | warm | minimal | blueprint | watercolor | elegant | editorial | scientific | screen-print |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| infographic | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✓✓ | ✓ |
| scene | ✓ | ✓ | ✓✓ | ✓ | ✗ | ✓✓ | ✓ | ✓ | ✗ | ✓✓ |
| flowchart | ✓✓ | ✓✓ | ✓ | ✓ | ✓✓ | ✗ | ✓ | ✓✓ | ✓ | ✗ |
| comparison | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓ | ✓ | ✓✓ | ✓✓ | ✓ | ✓ |
| framework | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✗ | ✓✓ | ✓ | ✓✓ | ✓ |
| timeline | ✓ | ✓✓ | ✓ | ✓ | ✓ | ✓✓ | ✓✓ | ✓✓ | ✓ | ✓ |
| | vector-illustration | notion | warm | minimal | blueprint | watercolor | elegant | editorial | scientific |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| infographic | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✓✓ |
| scene | ✓ | ✓ | ✓✓ | ✓ | ✗ | ✓✓ | ✓ | ✓ | ✗ |
| flowchart | ✓✓ | ✓✓ | ✓ | ✓ | ✓✓ | ✗ | ✓ | ✓✓ | ✓ |
| comparison | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓ | ✓ | ✓✓ | ✓✓ | ✓ |
| framework | ✓✓ | ✓✓ | ✓ | ✓✓ | ✓✓ | ✗ | ✓✓ | ✓ | ✓✓ |
| timeline | ✓ | ✓✓ | ✓ | ✓ | ✓ | ✓✓ | ✓✓ | ✓✓ | ✓ |
✓✓ = highly recommended | ✓ = compatible | ✗ = not recommended
@@ -85,7 +83,6 @@ Full specifications: `references/styles/<style>.md`
| History, timeline, progress, evolution | timeline | elegant, warm |
| Productivity, SaaS, tool, app, software | infographic | notion, vector-illustration |
| Business, professional, strategy, corporate | framework | elegant |
| Opinion, editorial, culture, philosophy, cinematic, dramatic, poster | scene | screen-print |
| Biology, chemistry, medical, scientific | infographic | scientific |
| Explainer, journalism, magazine, investigation | infographic | editorial |
@@ -175,22 +172,3 @@ Full specifications: `references/styles/<style>.md`
- Organic flow
- Personal journey feel
- Growth narratives
### scene + screen-print
- Bold silhouettes, symbolic compositions
- 2-5 flat colors with halftone textures
- Figure-ground inversion (negative space tells secondary story)
- Vintage poster aesthetic, conceptual not literal
- Great for opinion pieces and cultural commentary
### comparison + screen-print
- Split duotone composition (one color per side)
- Bold geometric dividers
- Symbolic icons over detailed rendering
- High contrast, immediate visual impact
### framework + screen-print
- Geometric node representations with stencil-cut edges
- Limited color coding (one color per concept level)
- Clean silhouette-based iconography
- Poster-style hierarchy with bold typography

View File

@@ -1,9 +1,9 @@
# Usage
## Command Syntax
## Core Flow
```bash
# Auto-select type and style based on content
# Analyze article and plan illustrations
/baoyu-article-illustrator path/to/article.md
# Specify type
@@ -12,71 +12,61 @@
# Specify style
/baoyu-article-illustrator path/to/article.md --style blueprint
# Combine type and style
/baoyu-article-illustrator path/to/article.md --type flowchart --style notion
# Specify density
/baoyu-article-illustrator path/to/article.md --density rich
# Direct content input (paste mode)
/baoyu-article-illustrator
[paste content]
```
## Options
## Batch Generation Integration
| Option | Description |
|--------|-------------|
| `--type <name>` | Illustration type (see Type Gallery in SKILL.md) |
| `--style <name>` | Visual style (see references/styles.md) |
| `--preset <name>` | Shorthand for type + style combo (see [references/style-presets.md](references/style-presets.md)) |
| `--density <level>` | Image count: minimal / balanced / rich |
When an article has 2 or more pending illustrations, use batch mode.
### Step 1: Build batch tasks from outline + prompts
```bash
npx -y tsx scripts/build-batch.ts \
--outline outline.md \
--prompts prompts \
--output batch.json \
--images-dir attachments \
--provider replicate \
--model google/nano-banana-pro \
--ar 16:9 \
--quality 2k
```
### Step 2: Run baoyu-image-gen batch mode
```bash
npx -y tsx ../baoyu-image-gen/scripts/main.ts --batchfile batch.json --jobs 2 --json  # on Windows cmd.exe, use npx.cmd
```
## What Batch Mode Gives You
- Automatic parallel generation when pending images >= 2
- Tuned provider throttling for faster throughput without obvious RPM bursts
- Automatic retries up to 3 attempts per image
- Final batch summary with total success count, failure count, and failure reasons
## Recommended Defaults
- Provider: `replicate`
- Model: `google/nano-banana-pro`
- Aspect ratio: `16:9`
- Quality: `2k`
- Image text language: default to the article's main language unless the user explicitly asks for another language
## Localizing Existing Images
When the task is to translate text inside an existing image:
- Save the original image locally first
- Pass the original image through `--ref`
- Tell the model to replace only the text language while preserving layout and non-text elements
- Prefer `quality normal` for faster edit-style iterations on Replicate when visual fidelity is already good
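Put together, a localization call might look like this sketch. `--ref`, `--promptfiles`, and `quality normal` appear elsewhere in this document, but treat the exact single-image CLI shape of `main.ts` as an assumption:

```bash
# Sketch: localize the text of one existing image
npx -y tsx ../baoyu-image-gen/scripts/main.ts \
  --promptfiles prompts/03-infographic-solver.md \
  --ref references/01-ref-solver-original.png \
  --quality normal
```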
## Input Modes
| Mode | Trigger | Output Directory |
|------|---------|------------------|
| File path | `path/to/article.md` | Use `default_output_dir` preference, or ask if not set |
| File path | `path/to/article.md` | Use preference or ask |
| Paste content | No path argument | `illustrations/{topic-slug}/` |
## Output Directory Options
| Value | Path |
|-------|------|
| `same-dir` | `{article-dir}/` |
| `illustrations-subdir` | `{article-dir}/illustrations/` |
| `independent` | `illustrations/{topic-slug}/` |
Configure in EXTEND.md: `default_output_dir: illustrations-subdir`
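An EXTEND.md carrying this preference might look like the sketch below. Only `default_output_dir`, `language`, and `preferred_style` are key names used in this document; the `watermark` spelling is an assumption based on the "Supports: Watermark" list:

```yaml
# .baoyu-skills/baoyu-article-illustrator/EXTEND.md (illustrative sketch)
default_output_dir: illustrations-subdir
language: en
preferred_style: notion
watermark: false   # assumed key name
```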
## Examples
**Technical article with data**:
```bash
/baoyu-article-illustrator api-design.md --type infographic --style blueprint
```
**Same thing with preset**:
```bash
/baoyu-article-illustrator api-design.md --preset tech-explainer
```
**Personal story**:
```bash
/baoyu-article-illustrator journey.md --preset storytelling
```
**Tutorial with steps**:
```bash
/baoyu-article-illustrator how-to-deploy.md --preset tutorial --density rich
```
**Opinion article with poster style**:
```bash
/baoyu-article-illustrator opinion.md --preset opinion-piece
```
**Preset with override**:
```bash
/baoyu-article-illustrator article.md --preset tech-explainer --style notion
```

View File

@@ -2,235 +2,71 @@
## Step 1: Pre-check
### 1.0 Detect & Save Reference Images ⚠️ REQUIRED if images provided
### 1.0 Detect and Save Reference Images
Check if user provided reference images. Handle based on input type:
Check whether the user provided reference images.
| Input Type | Action |
|------------|--------|
| Image file path provided | Copy to `references/` subdirectory → can use `--ref` |
| Image in conversation (no path) | **ASK user for file path** with AskUserQuestion |
| User can't provide path | Extract style/palette verbally → append to prompts (NO frontmatter references) |
| Image file path provided | Copy to `references/` so it can be passed through `--ref` |
| Image appears only in conversation | Ask the user for a file path if the task depends on faithful image editing |
| User cannot provide a file path | Extract style/palette verbally and append those traits to prompts, but do not pretend this is a true reference-edit workflow |
**CRITICAL**: Only add `references` to prompt frontmatter if files are ACTUALLY SAVED to `references/` directory.
**If user provides file path**:
1. Copy to `references/NN-ref-{slug}.png`
2. Create description: `references/NN-ref-{slug}.md`
3. Verify files exist before proceeding
**If user can't provide path** (extracted verbally):
1. Analyze image visually, extract: colors, style, composition
2. Create `references/extracted-style.md` with extracted info
3. DO NOT add `references` to prompt frontmatter
4. Instead, append extracted style/colors directly to prompt text
**Description File Format** (only when file saved):
```yaml
---
ref_id: NN
filename: NN-ref-{slug}.png
---
[User's description or auto-generated description]
```
**Verification** (only for saved files):
```
Reference Images Saved:
- 01-ref-{slug}.png ✓ (can use --ref)
- 02-ref-{slug}.png ✓ (can use --ref)
```
**Or for extracted style**:
```
Reference Style Extracted (no file):
- Colors: #E8756D coral, #7ECFC0 mint...
- Style: minimal flat vector, clean lines...
→ Will append to prompt text (not --ref)
```
---
Rules:
- Only add `references` to prompt frontmatter if the files were actually saved to `references/`
- If the job is to translate or localize an existing image, a real saved reference image is required
- For localization jobs, prompt-only description is not enough; the original image must be passed to the image model
### 1.1 Determine Input Type
| Input | Output Directory | Next |
|-------|------------------|------|
| File path | Ask user (1.2) | → 1.2 |
| Pasted content | `illustrations/{topic-slug}/` | → 1.4 |
| File path | Ask user or use preference | Continue |
| Pasted content | `illustrations/{topic-slug}/` | Continue |
**Backup rule for pasted content**: If `source.md` exists in target directory, rename to `source-backup-YYYYMMDD-HHMMSS.md` before saving.
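The backup rule can be sketched in shell (this runs in a throwaway directory purely for illustration):

```shell
# Rename an existing source.md to source-backup-YYYYMMDD-HHMMSS.md
# before saving newly pasted content
workdir="$(mktemp -d)"
cd "$workdir"
echo "old draft" > source.md
if [ -f source.md ]; then
  mv source.md "source-backup-$(date +%Y%m%d-%H%M%S).md"
fi
ls
```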
### 1.2 Load Preferences (EXTEND.md)
### 1.2-1.4 Configuration (file path input only)
Load project or user EXTEND.md first. If not found, complete first-time setup before continuing.
Check preferences and existing state, then ask ALL needed questions in ONE AskUserQuestion call (max 4 questions).
Supports:
- Watermark
- Preferred type/style
- Custom styles
- Default language
- Output directory
**Questions to include** (skip if preference exists or not applicable):
| Question | When to Ask | Options |
|----------|-------------|---------|
| Output directory | No `default_output_dir` in EXTEND.md | `{article-dir}/`, `{article-dir}/imgs/` (Recommended), `{article-dir}/illustrations/`, `illustrations/{topic-slug}/` |
| Existing images | Target dir has `.png/.jpg/.webp` files | `supplement`, `overwrite`, `regenerate` |
| Article update | Always (file path input) | `update`, `copy` |
**Preference Values** (if configured, skip asking):
| `default_output_dir` | Path |
|----------------------|------|
| `same-dir` | `{article-dir}/` |
| `imgs-subdir` | `{article-dir}/imgs/` |
| `illustrations-subdir` | `{article-dir}/illustrations/` |
| `independent` | `illustrations/{topic-slug}/` |
### 1.5 Load Preferences (EXTEND.md) ⛔ BLOCKING
**CRITICAL**: If EXTEND.md not found, MUST complete first-time setup before ANY other questions or steps. Do NOT proceed to reference images, do NOT ask about content, do NOT ask about type/style — ONLY complete the preferences setup first.
```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-article-illustrator/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-article-illustrator/EXTEND.md" && echo "user"
```
```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-article-illustrator/EXTEND.md) { "project" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-article-illustrator/EXTEND.md") { "user" }
```
| Result | Action |
|--------|--------|
| Found | Read, parse, display summary → Continue |
| Not found | ⛔ **BLOCKING**: Run first-time setup ONLY ([config/first-time-setup.md](config/first-time-setup.md)) → Complete and save EXTEND.md → Then continue |
**Supports**: Watermark | Preferred type/style | Custom styles | Language | Output directory
---
## Step 2: Setup & Analyze
## Step 2: Setup and Analyze
### 2.1 Analyze Content
| Analysis | Description |
|----------|-------------|
| Content type | Technical / Tutorial / Methodology / Narrative |
| Illustration purpose | information / visualization / imagination |
| Core arguments | 2-5 main points to visualize |
| Visual opportunities | Positions where illustrations add value |
| Recommended type | Based on content signals and purpose |
| Recommended density | Based on length and complexity |
Determine:
- Content type: technical / tutorial / methodology / narrative
- Illustration purpose: information / explanation / imagination
- Core arguments that should be visualized
- Positions where visuals materially improve understanding
- Recommended type, density, and style
- Article main language
### 2.2 Extract Core Arguments
Critical rule:
- If the article uses metaphors, do not illustrate them literally. Visualize the underlying concept.
- Main thesis
- Key concepts reader needs
- Comparisons/contrasts
- Framework/model proposed
### 2.2 Determine Image Text Language
**CRITICAL**: If article uses metaphors (e.g., "电锯切西瓜", "cutting a watermelon with a chainsaw"), do NOT illustrate literally. Visualize the **underlying concept**.
Default rule:
- If the user does not specify a language, all visible text inside generated illustrations must use the article's main language
### 2.3 Identify Positions
Ask only when:
- The user explicitly wants a different image-text language
- The article is genuinely mixed-language and the intended output language is ambiguous
- A saved preference conflicts with the article language and the user has asked to follow the saved preference
**Illustrate**:
- Core arguments (REQUIRED)
- Abstract concepts
- Data comparisons
- Processes, workflows
## Step 3: Confirm Settings
**Do NOT Illustrate**:
- Metaphors literally
- Decorative scenes
- Generic illustrations
### 2.4 Analyze Reference Images (if provided in Step 1.0)
For each reference image:
| Analysis | Description |
|----------|-------------|
| Visual characteristics | Style, colors, composition |
| Content/subject | What the reference depicts |
| Suitable positions | Which sections match this reference |
| Style match | Which illustration types/styles align |
| Usage recommendation | `direct` / `style` / `palette` |
| Usage | When to Use |
|-------|-------------|
| `direct` | Reference matches desired output closely |
| `style` | Extract visual style characteristics only |
| `palette` | Extract color scheme only |
---
## Step 3: Confirm Settings ⚠️
**Do NOT skip.** Use ONE AskUserQuestion call with max 4 questions. **Q1, Q2, Q3 are ALL REQUIRED.**
### Q1: Preset or Type ⚠️ REQUIRED
Based on Step 2 content analysis, recommend a preset first (sets both type & style). Look up [style-presets.md](style-presets.md) "Content Type → Preset Recommendations" table.
- [Recommended preset] — [brief: type + style + why] (Recommended)
- [Alternative preset] — [brief]
- Or choose type manually: infographic / scene / flowchart / comparison / framework / timeline / mixed
**If user picks a preset → skip Q3** (type & style both resolved).
**If user picks a type → Q3 is REQUIRED.**
### Q2: Density ⚠️ REQUIRED - DO NOT SKIP
- minimal (1-2) - Core concepts only
- balanced (3-5) - Major sections
- per-section - At least 1 per section/chapter (Recommended)
- rich (6+) - Comprehensive coverage
### Q3: Style ⚠️ REQUIRED (skip if preset chosen in Q1)
If EXTEND.md has `preferred_style`:
- [Custom style name + brief description] (Recommended)
- [Top compatible core style 1]
- [Top compatible core style 2]
- Other (see full Style Gallery)
If no `preferred_style` (present Core Styles first):
- [Best compatible core style] (Recommended)
- [Other compatible core style 1]
- [Other compatible core style 2]
- Other (see full Style Gallery)
**Core Styles** (simplified selection):
| Core Style | Maps To | Best For |
|------------|---------|----------|
| `minimal-flat` | notion | General, knowledge sharing, SaaS |
| `sci-fi` | blueprint | AI, frontier tech, system design |
| `hand-drawn` | sketch/warm | Relaxed, reflective, casual |
| `editorial` | editorial | Processes, data, journalism |
| `scene` | warm/watercolor | Narratives, emotional, lifestyle |
| `poster` | screen-print | Opinion, editorial, cultural, cinematic |
Style selection based on Type × Style compatibility matrix (styles.md).
Full specs: `styles/<style>.md`
### Q4: Image Text Language ⚠️ REQUIRED when article language ≠ EXTEND.md `language`
Detect article language from content. If different from EXTEND.md `language` setting, MUST ask:
- Article language (match article content) (Recommended)
- EXTEND.md language (user's general preference)
**Skip only if**: Article language matches EXTEND.md `language`, or EXTEND.md has no `language` setting.
### Display Reference Usage (if references detected in Step 1.0)
When presenting outline preview to user, show reference assignments:
```
Reference Images:
| Ref | Filename | Recommended Usage |
|-----|----------|-------------------|
| 01 | 01-ref-diagram.png | direct → Illustration 1, 3 |
| 02 | 02-ref-chart.png | palette → Illustration 2 |
```
---
Use one confirmation round for:
- Illustration type
- Density
- Style
- Image text language only if an override is needed
## Step 4: Generate Outline
@@ -242,160 +78,94 @@ type: infographic
density: balanced
style: blueprint
image_count: 4
references: # Only if references provided
references:
- ref_id: 01
filename: 01-ref-diagram.png
description: "Technical diagram showing system architecture"
- ref_id: 02
filename: 02-ref-chart.png
description: "Color chart with brand palette"
---
## Illustration 1
**Position**: [section] / [paragraph]
**Purpose**: [why this helps]
**Visual Content**: [what to show]
**Type Application**: [how type applies]
**References**: [01] # Optional: list ref_ids used
**Reference Usage**: direct # direct | style | palette
**Filename**: 01-infographic-concept-name.png
## Illustration 2
...
```
**Requirements**:
- Each position justified by content needs
- Type applied consistently
- Style reflected in descriptions
- Count matches density
- References assigned based on Step 2.4 analysis
---
Per illustration include:
- `Position`
- `Purpose`
- `Visual Content`
- `Type Application`
- `References` when used
- `Reference Usage` as `direct`, `style`, or `palette`
- `Filename`
## Step 5: Generate Images
### 5.1 Create Prompts ⛔ BLOCKING
### 5.1 Create Prompt Files
**Every illustration MUST have a saved prompt file before generation begins. DO NOT skip this step.**
Every illustration must have a saved prompt file before generation begins.
For each illustration in the outline:
Prompt requirements:
- `Layout`: overall composition
- `ZONES`: each visual area with concrete content
- `LABELS`: actual terms, numbers, metrics, or quotes from the article
- `COLORS`: specific colors or palette guidance
- `STYLE`: rendering and line treatment
- `ASPECT`: ratio such as `16:9`
1. **Create prompt file**: `prompts/NN-{type}-{slug}.md`
2. **Include YAML frontmatter**:
```yaml
---
illustration_id: 01
type: infographic
style: custom-flat-vector
---
```
3. **Follow type-specific template** from [prompt-construction.md](prompt-construction.md)
4. **Prompt quality requirements** (all REQUIRED):
- `Layout`: Describe overall composition (grid / radial / hierarchical / left-right / top-down)
- `ZONES`: Describe each visual area with specific content, not vague descriptions
- `LABELS`: Use **actual numbers, terms, metrics, quotes from the article** — NOT generic placeholders
- `COLORS`: Specify hex codes with semantic meaning (e.g., `Coral (#E07A5F) for emphasis`)
- `STYLE`: Describe line treatment, texture, mood, character rendering
- `ASPECT`: Specify ratio (e.g., `16:9`)
5. **Apply defaults**: composition requirements, character rendering, text guidelines, watermark
6. **Backup rule**: If prompt file exists, rename to `prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md`
Language rule:
- If the user did not specify a language, the prompt should clearly request that all visible text use the article's main language
**Verification** ⛔: Before proceeding to 5.2, confirm ALL prompt files exist:
```
Prompt Files:
- prompts/01-infographic-overview.md ✓
- prompts/02-infographic-distillation.md ✓
...
```
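A minimal prompt-file sketch showing the required sections (the labels, metrics, and hex values below are illustrative placeholders, not article-derived content):

```markdown
---
illustration_id: 01
type: infographic
style: blueprint
---

Layout: left-right, three zones
ZONES:
- Left: input sources as three labeled icons
- Center: processing pipeline with directional arrows
- Right: output metrics panel
LABELS: "42% faster builds", "3 retries max", "batch.json"
COLORS: Blueprint blue (#1E3A5F) base, coral (#E07A5F) for emphasis
STYLE: clean technical line work, subtle grid texture, no photorealism
ASPECT: 16:9
```

In a real run, every LABELS entry must come from the article itself.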
### 5.2 Batch-First Execution for Multi-Image Jobs
**DO NOT** pass ad-hoc inline text to `--prompt` without first saving prompt files. The generation command should either use `--promptfiles prompts/NN-{type}-{slug}.md` or read the saved file content for `--prompt`.
When pending illustrations >= 2:
1. Save all prompt files first
2. Build `batch.json` from `outline.md + prompts/`
3. Call `baoyu-image-gen --batchfile`
4. Reuse the batch summary to report:
- total images
- success count
- failure count
- explicit failure reasons
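A sketch of the `batch.json` shape that `build-batch.ts` emits (ids, paths, and the `my-article` directory are illustrative):

```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "01-infographic-overview",
      "promptFiles": ["prompts/01-infographic-overview.md"],
      "image": "illustrations/my-article/01-infographic-overview.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "02-flowchart-pipeline",
      "promptFiles": ["prompts/02-flowchart-pipeline.md"],
      "image": "illustrations/my-article/02-flowchart-pipeline.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    }
  ]
}
```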
**CRITICAL - References in Frontmatter**:
- Only add `references` field if files ACTUALLY EXIST in `references/` directory
- If style/palette was extracted verbally (no file), append info to prompt BODY instead
- Before writing frontmatter, verify: `test -f references/NN-ref-{slug}.png`
Benefits:
- Parallel generation when pending images >= 2
- Automatic retries up to 3 attempts per image
- Tuned provider throttling for better throughput without obvious RPM bursts
- Clear final batch summary
### 5.2 Select Generation Skill
### 5.3 Process References
Check available skills. If multiple, ask user.
### 5.3 Process References ⚠️ REQUIRED if references saved in Step 1.0
**DO NOT SKIP if user provided reference images.** For each illustration with references:
1. **VERIFY files exist first**:
```bash
test -f references/NN-ref-{slug}.png && echo "exists" || echo "MISSING"
```
- If file MISSING but in frontmatter → ERROR, fix frontmatter or remove references field
- If file exists → proceed with processing
2. Read prompt frontmatter for reference info
3. Process based on usage type:
If references were saved in Step 1, verify the files exist before generation.
| Usage | Action | Example |
|-------|--------|---------|
| `direct` | Add reference path to `--ref` parameter | `--ref references/01-ref-brand.png` |
| `style` | Analyze reference, append style traits to prompt | "Style: clean lines, gradient backgrounds..." |
| `palette` | Extract colors from reference, append to prompt | "Colors: #E8756D coral, #7ECFC0 mint..." |
| `direct` | Pass the file path through `--ref` | `--ref references/01-ref-brand.png` |
| `style` | Extract style traits and append to prompt text | "clean lines, soft gradients..." |
| `palette` | Extract colors and append to prompt text | "coral + mint brand palette" |
4. Check image generation skill capability:
Critical localization rule:
- If the job is to translate or localize text inside an existing image, you must pass the original image through `--ref`
- Do not rely on prompt-only description for this workflow
- Make the prompt explicitly say to replace only the text language while preserving layout, composition, and non-text elements
- If the image contains an acronym framework, methodology, mnemonic, or fixed step names, extract the canonical wording from the source article first and include the exact target labels in the prompt
- Do not let the model improvise alternative step names when the original framework has a fixed letter-to-term mapping
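A hedged prompt sketch for such a localization job (the STAR labels below are placeholders standing in for whatever canonical wording the source article uses):

```markdown
Translate all visible text in the reference image to English.
Replace only the text; preserve layout, composition, arrows, icons,
colors, spacing, and every non-text element exactly as in the original.
Use these exact canonical labels from the source article:
- S = "Situation", T = "Task", A = "Action", R = "Result"
Do not invent alternative step names for this fixed letter-to-term mapping.
```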
| Skill Supports `--ref` | Action |
|------------------------|--------|
| Yes (e.g., baoyu-image-gen with Google) | Pass reference images via `--ref` |
| No | Convert to text description, append to prompt |
### 5.4 Generate
**Verification**: Before generating, confirm reference processing:
```
Reference Processing:
- Illustration 1: using 01-ref-brand.png (direct) ✓
- Illustration 2: extracted palette from 02-ref-style.png ✓
```
### 5.4 Apply Watermark (if enabled)
Add: `Include a subtle watermark "[content]" at [position].`
### 5.5 Generate
1. For each illustration:
- **Backup rule**: If the image file exists, rename it to `NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png`
- If references with `direct` usage: include `--ref` parameter
- Generate image
2. After each: "Generated X/N"
3. On failure: retry once, then log and continue
---
For each illustration:
- Backup an existing output first if needed
- Include `--ref` when direct references are required
- For localization jobs, include the original image in `--ref`
- Generate the image
- On failure, let `baoyu-image-gen` retry up to 3 attempts in batch mode or retry once manually in single-image mode
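The two-command batch flow above can be sketched as follows, assuming `build-batch.ts` ships with this skill and `${BUN_X}`/`${SKILL_DIR}` are resolved as described for baoyu-image-gen; the output directory name is illustrative:

```bash
# Build batch.json from the outline and saved prompt files,
# then generate all pending images in parallel with retries.
npx -y tsx scripts/build-batch.ts \
  --outline outline.md --prompts prompts \
  --output batch.json --images-dir illustrations/my-article
${BUN_X} ${SKILL_DIR}/scripts/main.ts --batchfile batch.json
```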
## Step 6: Finalize
### 6.1 Update Article
Insert after corresponding paragraph:
```markdown
![description](illustrations/{slug}/NN-{type}-{slug}.png)
```
Alt text: concise description in article's language.
Insert image references back into the article while preserving the user's markdown conventions.
### 6.2 Output Summary
```
Article Illustration Complete!
Article: [path]
Type: [type] | Density: [level] | Style: [style]
Location: [directory]
Images: X/N generated
Positions:
- 01-xxx.png → After "[Section]"
- 02-yyy.png → After "[Section]"
[If failures]
Failed:
- NN-zzz.png: [reason]
```
Summarize:
- article path
- type / density / style
- output directory
- total images generated
- any failures and their reasons

View File

@@ -0,0 +1,155 @@
import path from "node:path";
import process from "node:process";
import { readdir, readFile, writeFile } from "node:fs/promises";
type CliArgs = {
outlinePath: string | null;
promptsDir: string | null;
outputPath: string | null;
imagesDir: string | null;
provider: string;
model: string;
aspectRatio: string;
quality: string;
jobs: number | null;
help: boolean;
};
type OutlineEntry = {
index: number;
filename: string;
};
function printUsage(): void {
console.log(`Usage:
npx -y tsx scripts/build-batch.ts --outline outline.md --prompts prompts --output batch.json --images-dir attachments
Options:
--outline <path> Path to outline.md
--prompts <path> Path to prompts directory
--output <path> Path to output batch.json
--images-dir <path> Directory for generated images
--provider <name> Provider for baoyu-image-gen batch tasks (default: replicate)
--model <id> Model for baoyu-image-gen batch tasks (default: google/nano-banana-pro)
--ar <ratio> Aspect ratio for all tasks (default: 16:9)
--quality <level> Quality for all tasks (default: 2k)
--jobs <count> Recommended worker count metadata (optional)
-h, --help Show help`);
}
function parseArgs(argv: string[]): CliArgs {
const args: CliArgs = {
outlinePath: null,
promptsDir: null,
outputPath: null,
imagesDir: null,
provider: "replicate",
model: "google/nano-banana-pro",
aspectRatio: "16:9",
quality: "2k",
jobs: null,
help: false,
};
for (let i = 0; i < argv.length; i++) {
const current = argv[i]!;
if (current === "--outline") args.outlinePath = argv[++i] ?? null;
else if (current === "--prompts") args.promptsDir = argv[++i] ?? null;
else if (current === "--output") args.outputPath = argv[++i] ?? null;
else if (current === "--images-dir") args.imagesDir = argv[++i] ?? null;
else if (current === "--provider") args.provider = argv[++i] ?? args.provider;
else if (current === "--model") args.model = argv[++i] ?? args.model;
else if (current === "--ar") args.aspectRatio = argv[++i] ?? args.aspectRatio;
else if (current === "--quality") args.quality = argv[++i] ?? args.quality;
else if (current === "--jobs") {
const value = argv[++i];
args.jobs = value ? parseInt(value, 10) : null;
} else if (current === "--help" || current === "-h") {
args.help = true;
} else {
throw new Error(`Unknown option: ${current}`);
}
}
return args;
}
function parseOutline(content: string): OutlineEntry[] {
const entries: OutlineEntry[] = [];
const lines = content.split(/\r?\n/);
let currentIndex = 0;
for (const line of lines) {
const illustrationMatch = line.match(/^## Illustration\s+(\d+)/);
if (illustrationMatch) {
currentIndex = parseInt(illustrationMatch[1]!, 10);
continue;
}
const filenameMatch = line.match(/^\*\*Filename\*\*:\s+(.+)$/);
if (filenameMatch && currentIndex > 0) {
entries.push({
index: currentIndex,
filename: filenameMatch[1]!.trim(),
});
}
}
return entries;
}
async function getPromptFiles(promptsDir: string): Promise<string[]> {
const files = await readdir(promptsDir);
return files
.filter((file) => file.toLowerCase().endsWith(".md"))
.sort((a, b) => a.localeCompare(b))
.map((file) => path.join(promptsDir, file));
}
async function main(): Promise<void> {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
printUsage();
return;
}
if (!args.outlinePath || !args.promptsDir || !args.outputPath || !args.imagesDir) {
printUsage();
throw new Error("Missing required arguments: --outline, --prompts, --output, --images-dir");
}
const outlineContent = await readFile(path.resolve(args.outlinePath), "utf8");
const entries = parseOutline(outlineContent);
const promptFiles = await getPromptFiles(path.resolve(args.promptsDir));
if (entries.length === 0) {
throw new Error("No illustration entries with **Filename** found in outline.");
}
if (entries.length !== promptFiles.length) {
throw new Error(
`Outline/prompt count mismatch: outline has ${entries.length} entries, prompts dir has ${promptFiles.length} prompt files.`
);
}
const sortedEntries = [...entries].sort((a, b) => a.index - b.index);
const tasks = sortedEntries.map((entry, index) => ({
id: path.basename(entry.filename, path.extname(entry.filename)),
promptFiles: [promptFiles[index]!],
image: path.join(path.resolve(args.imagesDir!), entry.filename),
provider: args.provider,
model: args.model,
ar: args.aspectRatio,
quality: args.quality,
}));
const payload = {
jobs: args.jobs,
tasks,
};
await writeFile(path.resolve(args.outputPath), `${JSON.stringify(payload, null, 2)}\n`, "utf8");
console.log(path.resolve(args.outputPath));
}
main().catch((error) => {
console.error(error instanceof Error ? error.message : String(error));
process.exit(1);
});

View File

@@ -1,58 +1,50 @@
---
name: baoyu-image-gen
description: AI image generation with OpenAI, Google, DashScope and Replicate APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
version: 1.56.1
metadata:
openclaw:
homepage: https://github.com/JimLiu/baoyu-skills#baoyu-image-gen
requires:
anyBins:
- bun
- npx
description: AI image generation with OpenAI, Google, DashScope and Replicate APIs. Supports text-to-image, reference-image editing, aspect ratios, and faster parallel batch generation. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
---
# Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, DashScope (阿里通义万象) and Replicate providers.
Official API-based image generation. Supports OpenAI, Google, DashScope and Replicate providers.
Default recommendation: `Replicate / google/nano-banana-pro`.
## Script Directory
**Agent Execution**:
1. `{baseDir}` = this SKILL.md file's directory
2. Script path = `{baseDir}/scripts/main.ts`
3. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`
3. Resolve `${BUN_X}` runtime: if `bun` installed -> `bun`; if `npx` available -> `npx -y bun`; else suggest installing bun
4. On Windows PowerShell, if `npx.ps1` is blocked, use `npx.cmd -y tsx` as a fallback runner
## Step 0: Load Preferences ⛔ BLOCKING
## Step 0: Load Preferences (BLOCKING)
**CRITICAL**: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
Check EXTEND.md existence (priority: project -> user):
```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```
```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-image-gen/EXTEND.md) { "project" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md") { "user" }
```
| Result | Action |
|--------|--------|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup ([references/config/first-time-setup.md](references/config/first-time-setup.md)) → Save EXTEND.md → Then continue |
| Found | Load, parse, apply settings. If `default_model.[provider]` is null -> ask model only (Flow 2) |
| Not found | Run first-time setup (`references/config/first-time-setup.md`) -> save EXTEND.md -> continue |
**CRITICAL**: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
**CRITICAL**: If not found, complete the full setup (provider + model + quality + save location) before generating images.
| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |
**EXTEND.md Supports**: Default provider | Default quality | Default aspect ratio | Default image size | Default models
**EXTEND.md Supports**: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: `references/config/preferences-schema.md`
@@ -60,34 +52,34 @@ Schema: `references/config/preferences-schema.md`
```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
${BUN_X} ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal or OpenAI edits)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Faithful localization of an existing framework diagram
${BUN_X} ${SKILL_DIR}/scripts/main.ts --promptfiles localize-framework.md --ref source-diagram.png --image localized-diagram.png --provider replicate --model google/nano-banana-pro --quality normal
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# OpenAI GPT Image (official API)
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-1.5
# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
# Replicate default recommendation
${BUN_X} ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana-pro
# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Batch mode with saved prompt files
${BUN_X} ${SKILL_DIR}/scripts/main.ts --batchfile batch.json
# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
# Windows PowerShell fallback runner
npx.cmd -y tsx ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
```
## Options
@@ -97,13 +89,15 @@ ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider r
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai\|dashscope\|replicate` | Force provider (default: google) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; OpenAI: `gpt-image-1.5`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|dashscope\|replicate` | Force provider (default preference: replicate when available) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; OpenAI: `gpt-image-1.5`, `gpt-image-1`) |
| `--ar <ratio>` | Aspect ratio (e.g. `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g. `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal (`gemini-3-pro-image-preview`, `gemini-3-flash-preview`, `gemini-3.1-flash-image-preview`) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, and Replicate |
| `--n <count>` | Number of images |
| `--json` | JSON output |
@@ -113,29 +107,49 @@ ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider r
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (阿里云) |
| `GEMINI_API_KEY` | Alias for `GOOGLE_API_KEY` |
| `DASHSCOPE_API_KEY` | DashScope API key |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: `z-image-turbo`) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: `google/nano-banana-pro`) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `OPENAI_IMAGE_USE_CHAT` | Use `/chat/completions` instead of `/images/generations` when a compatible proxy requires it |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Override provider concurrency, e.g. `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY` |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Override provider start gap, e.g. `BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS` |
**Load Priority**: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
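Given that load order, a minimal `~/.baoyu-skills/.env` sketch looks like this (the key values are placeholders; set only the providers you actually use):

```bash
REPLICATE_API_TOKEN=r8_xxxxxxxxxxxx
GOOGLE_API_KEY=AIzaxxxxxxxxxxxx
# Optional batch tuning overrides
BAOYU_IMAGE_GEN_MAX_WORKERS=6
BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY=4
```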
## OpenAI Support
OpenAI is an officially supported provider in this skill.
- Recommended OpenAI model: `gpt-image-1.5`
- Required auth: `OPENAI_API_KEY`
- Optional override: `OPENAI_BASE_URL`
- Reference-image editing: supported with GPT Image models via `--ref`
Important:
- Codex/ChatGPT desktop login does **not** automatically grant this script OpenAI Images API access
- If you want to use OpenAI here, provide a real `OPENAI_API_KEY`
- If your endpoint is a compatible proxy that only supports chat-style image output, set `OPENAI_IMAGE_USE_CHAT=true`
## Model Resolution
Model priority (highest → lowest), applies to all providers:
Model priority (highest -> lowest), applies to all providers:
1. CLI flag: `--model <id>`
2. EXTEND.md: `default_model.[provider]`
3. Env var: `<PROVIDER>_IMAGE_MODEL` (e.g., `GOOGLE_IMAGE_MODEL`)
3. Env var: `<PROVIDER>_IMAGE_MODEL`
4. Built-in default
**EXTEND.md overrides env vars**. If both EXTEND.md `default_model.google: "gemini-3-pro-image-preview"` and env var `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview` exist, EXTEND.md wins.
**EXTEND.md overrides env vars**.
**Agent MUST display model info** before each generation:
- Show: `Using [provider] / [model]`
@@ -145,76 +159,72 @@ Model priority (highest → lowest), applies to all providers:
Supported model formats:
- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name` (recommended), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`
Examples:
```bash
# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
## Provider Selection
1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Replicate
2. `--provider` specified → use it (if `--ref`, must be `google`, `openai`, or `replicate`)
3. Only one API key available → use that provider
4. Multiple available → default to Google
1. `--ref` provided + no `--provider` -> auto-select Google first, then OpenAI, then Replicate
2. `--provider` specified -> use it (if `--ref`, must be `google`, `openai`, or `replicate`)
3. Only one API key available -> use that provider
4. Multiple available -> default to Replicate (`google/nano-banana-pro`) unless explicitly overridden
## Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|--------|------------------|-------------|----------|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |
**Google imageSize**: Can be overridden with `--imageSize 1K|2K|4K`
| `2k` | 2K | 2048px | Covers, illustrations, infographics |
## Aspect Ratios
Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`
- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size
- Replicate: depends on model support
## Generation Mode
**Default**: Sequential generation (one image at a time). This ensures stable output and easier debugging.
**Default**: Sequential generation.
**Parallel Generation**: Only use when user explicitly requests parallel/concurrent generation.
**Batch Parallel Generation**: When `--batchfile` contains 2 or more pending tasks, the script automatically enables parallel generation.
| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
| Parallel batch | Batch mode with 2+ tasks |
**Parallel Settings** (when requested):
Parallel behavior:
| Setting | Value |
|---------|-------|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |
- Default worker count is automatic, capped by the configured maximum (built-in default: 10)
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with `--jobs <count>`
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
- Replicate defaults are tuned aggressively for `google/nano-banana-pro` and can be overridden in `EXTEND.md` or env vars
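The worker cap can be set per run rather than in config; a sketch assuming `${BUN_X}`/`${SKILL_DIR}` are resolved as above and a `batch.json` already exists:

```bash
# Cap batch workers for this run only; overrides the auto/config defaults
${BUN_X} ${SKILL_DIR}/scripts/main.ts --batchfile batch.json --jobs 4
```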
**Agent Implementation** (parallel mode only):
```
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
```
Important note on speed:
- Single-image generation does not add a forced inter-request wait in the script
- The main performance controls are model-side latency, requested quality/resolution, and batch-mode throttling
- For reference-image editing on Replicate, `quality: normal` maps to `resolution: 1K`, which is often much faster than `2k`
Important note on localization quality:
- For text-heavy reference-image localization, do not stop at "translate this image into English"
- If the image contains a framework, acronym, mnemonic, named method, or fixed step labels, extract the canonical target wording first and write those exact labels into the prompt
- Also tell the model what must not change: layout, composition, arrows, icons, colors, spacing, and non-text elements
- This greatly reduces semantic drift such as changing a fixed acronym into different step names
## Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; or OpenAI GPT Image edits)
- Missing API key -> error with setup instructions
- Codex desktop auth without `OPENAI_API_KEY` -> explain that local login cannot be reused as OpenAI Images API auth
- Generation failure -> auto-retry up to 3 attempts per image
- Invalid aspect ratio -> warning, proceed with default
- Reference images with unsupported provider/model -> error with fix hint
## Extension Support
Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.
Custom configurations via EXTEND.md. See the preferences schema for supported options.

View File

@@ -8,34 +8,12 @@ description: First-time setup and default model selection flow for baoyu-image-g
## Overview
Triggered when:
1. No EXTEND.md found → full setup (provider + model + preferences)
2. EXTEND.md found but `default_model.[provider]` is null → model selection only
## Setup Flow
```
No EXTEND.md found EXTEND.md found, model null
│ │
▼ ▼
┌─────────────────────┐ ┌──────────────────────┐
│ AskUserQuestion │ │ AskUserQuestion │
│ (full setup) │ │ (model only) │
└─────────────────────┘ └──────────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌──────────────────────┐
│ Create EXTEND.md │ │ Update EXTEND.md │
└─────────────────────┘ └──────────────────────┘
│ │
▼ ▼
Continue Continue
```
1. No EXTEND.md found -> full setup (provider + model + preferences)
2. EXTEND.md found but `default_model.[provider]` is null -> model selection only
## Flow 1: No EXTEND.md (Full Setup)
**Language**: Use user's input language or saved language preference.
Use AskUserQuestion with ALL questions in ONE call:
Use AskUserQuestion with all questions in one call.
### Question 1: Default Provider
@@ -43,31 +21,24 @@ Use AskUserQuestion with ALL questions in ONE call:
header: "Provider"
question: "Default image generation provider?"
options:
- label: "Google (Recommended)"
description: "Gemini multimodal - high quality, reference images, flexible sizes"
- label: "Replicate (Recommended)"
description: "Default to google/nano-banana-pro, flexible model selection, strong general-purpose generation"
- label: "Google"
description: "Gemini multimodal, good for reference-image workflows"
- label: "OpenAI"
description: "GPT Image - consistent quality, reliable output"
description: "GPT Image via OPENAI_API_KEY, strong text rendering and edits"
- label: "DashScope"
description: "Alibaba Cloud - z-image-turbo, good for Chinese content"
- label: "Replicate"
description: "Community models - nano-banana-pro, flexible model selection"
description: "Alibaba Cloud image generation"
```
### Question 2: Default Google Model
### Question 2: Provider Model
Only show if user selected Google or auto-detect (no explicit provider).
Ask the model question that matches the chosen provider:
```yaml
header: "Google Model"
question: "Default Google image generation model?"
options:
- label: "gemini-3-pro-image-preview (Recommended)"
description: "Highest quality, best for production use"
- label: "gemini-3.1-flash-image-preview"
description: "Fast generation, good quality, lower cost"
- label: "gemini-3-flash-preview"
description: "Fast generation, balanced quality and speed"
```
- Replicate -> use the Replicate model question, recommend `google/nano-banana-pro`
- Google -> use the Google model question
- OpenAI -> use the OpenAI model question, recommend `gpt-image-1.5`
- DashScope -> use the DashScope model question
### Question 3: Default Quality
@@ -76,9 +47,9 @@ header: "Quality"
question: "Default image quality?"
options:
- label: "2k (Recommended)"
description: "2048px - covers, illustrations, infographics"
description: "2048px, suitable for covers and production use"
- label: "normal"
description: "1024px - quick previews, drafts"
description: "1024px, suitable for previews and drafts"
```
### Question 4: Save Location
@@ -88,9 +59,9 @@ header: "Save"
question: "Where to save preferences?"
options:
- label: "Project (Recommended)"
description: ".baoyu-skills/ (this project only)"
description: ".baoyu-skills/ for this project"
- label: "User"
description: "~/.baoyu-skills/ (all projects)"
description: "~/.baoyu-skills/ for all projects"
```
### Save Locations
@@ -111,15 +82,15 @@ default_aspect_ratio: null
default_image_size: null
default_model:
google: [selected google model or null]
openai: null
dashscope: null
replicate: null
openai: [selected openai model or null]
dashscope: [selected dashscope model or null]
replicate: [selected replicate model or null]
---
```
## Flow 2: EXTEND.md Exists, Model Null
When EXTEND.md exists but `default_model.[current_provider]` is null, ask ONLY the model question for the current provider.
When EXTEND.md exists but `default_model.[current_provider]` is null, ask only the model question for the current provider.
### Google Model Selection
@@ -142,7 +113,7 @@ header: "OpenAI Model"
question: "Choose a default OpenAI image generation model?"
options:
- label: "gpt-image-1.5 (Recommended)"
description: "Latest GPT Image model, high quality"
description: "Latest GPT Image model, best default for OpenAI"
- label: "gpt-image-1"
description: "Previous generation GPT Image model"
```
@@ -166,32 +137,14 @@ header: "Replicate Model"
question: "Choose a default Replicate image generation model?"
options:
- label: "google/nano-banana-pro (Recommended)"
description: "Google's fast image model on Replicate"
description: "Default recommended model for this skill"
- label: "google/nano-banana"
description: "Google's base image model on Replicate"
description: "Base nano-banana model on Replicate"
```
### Update EXTEND.md
After user selects a model:
1. Read existing EXTEND.md
2. If `default_model:` section exists → update the provider-specific key
3. If `default_model:` section missing → add the full section:
```yaml
default_model:
google: [value or null]
openai: [value or null]
dashscope: [value or null]
replicate: [value or null]
```
Only set the selected provider's model; leave others as their current value or null.
## After Setup
1. Create directory if needed
2. Write/update EXTEND.md with frontmatter
3. Confirm: "Preferences saved to [path]"
2. Write or update EXTEND.md
3. Confirm the save path
4. Continue with image generation

View File

@ -11,7 +11,7 @@ description: EXTEND.md YAML schema for baoyu-image-gen user preferences
---
version: 1
default_provider: null # google|openai|dashscope|replicate|null (null = auto-detect)
default_provider: null # google|openai|dashscope|replicate|null (null = auto-detect, prefers replicate when multiple keys are available)
default_quality: null # normal|2k|null (null = use default: 2k)
@ -20,10 +20,26 @@ default_aspect_ratio: null # "16:9"|"1:1"|"4:3"|"3:4"|"2.35:1"|null
default_image_size: null # 1K|2K|4K|null (Google only, overrides quality)
default_model:
google: null # e.g., "gemini-3-pro-image-preview", "gemini-3.1-flash-image-preview"
openai: null # e.g., "gpt-image-1.5"
dashscope: null # e.g., "z-image-turbo"
replicate: null # e.g., "google/nano-banana-pro"
google: null # e.g. "gemini-3-pro-image-preview", "gemini-3.1-flash-image-preview"
openai: null # e.g. "gpt-image-1.5", "gpt-image-1"
dashscope: null # e.g. "z-image-turbo"
replicate: null # e.g. "google/nano-banana-pro"
batch:
max_workers: 10
provider_limits:
replicate:
concurrency: 5
start_interval_ms: 700
google:
concurrency: 3
start_interval_ms: 1100
openai:
concurrency: 3
start_interval_ms: 1100
dashscope:
concurrency: 3
start_interval_ms: 1100
---
```
@ -32,31 +48,36 @@ default_model:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `version` | int | 1 | Schema version |
| `default_provider` | string\|null | null | Default provider (null = auto-detect) |
| `default_quality` | string\|null | null | Default quality (null = 2k) |
| `default_provider` | string\|null | null | Default provider (`null` = auto-detect, preferring Replicate when multiple keys are available) |
| `default_quality` | string\|null | null | Default quality (`null` = `2k`) |
| `default_aspect_ratio` | string\|null | null | Default aspect ratio |
| `default_image_size` | string\|null | null | Google image size (overrides quality) |
| `default_model.google` | string\|null | null | Google default model |
| `default_model.openai` | string\|null | null | OpenAI default model |
| `default_model.dashscope` | string\|null | null | DashScope default model |
| `default_model.replicate` | string\|null | null | Replicate default model |
| `batch.max_workers` | int\|null | 10 | Batch worker cap |
| `batch.provider_limits.<provider>.concurrency` | int\|null | provider default | Max simultaneous requests per provider |
| `batch.provider_limits.<provider>.start_interval_ms` | int\|null | provider default | Minimum gap between request starts per provider |
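Each effective limit is resolved with env-var > EXTEND.md > built-in precedence. A minimal sketch of that resolution (a hypothetical helper; the name and signature are illustrative, not part of the script's API):

```typescript
// Hypothetical helper mirroring the documented precedence:
// environment variable > EXTEND.md value > built-in default.
function resolveLimit(
  envValue: string | undefined,
  configValue: number | null | undefined,
  builtinDefault: number
): number {
  const parsed = envValue ? parseInt(envValue, 10) : NaN;
  const fromEnv = Number.isFinite(parsed) && parsed > 0 ? parsed : null;
  return fromEnv ?? configValue ?? builtinDefault;
}

// e.g. replicate concurrency: no env override, EXTEND.md sets 4, default is 5
resolveLimit(undefined, 4, 5); // → 4
```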
## Examples
**Minimal**:
```yaml
---
version: 1
default_provider: google
default_provider: replicate
default_quality: 2k
---
```
**Full**:
```yaml
---
version: 1
default_provider: google
default_provider: replicate
default_quality: 2k
default_aspect_ratio: "16:9"
default_image_size: 2K
@ -65,5 +86,11 @@ default_model:
openai: "gpt-image-1.5"
dashscope: "z-image-turbo"
replicate: "google/nano-banana-pro"
---
batch:
max_workers: 10
provider_limits:
replicate:
concurrency: 5
start_interval_ms: 700
---
```

View File

@ -2,34 +2,103 @@ import path from "node:path";
import process from "node:process";
import { homedir } from "node:os";
import { access, mkdir, readFile, writeFile } from "node:fs/promises";
import type { CliArgs, Provider, ExtendConfig } from "./types";
import type {
BatchFile,
BatchTaskInput,
CliArgs,
ExtendConfig,
Provider,
} from "./types";
type ProviderModule = {
getDefaultModel: () => string;
generateImage: (prompt: string, model: string, args: CliArgs) => Promise<Uint8Array>;
};
type PreparedTask = {
id: string;
prompt: string;
args: CliArgs;
provider: Provider;
model: string;
outputPath: string;
providerModule: ProviderModule;
};
type TaskResult = {
id: string;
provider: Provider;
model: string;
outputPath: string;
success: boolean;
attempts: number;
error: string | null;
};
type ProviderRateLimit = {
concurrency: number;
startIntervalMs: number;
};
const MAX_ATTEMPTS = 3;
const DEFAULT_MAX_WORKERS = 10;
const POLL_WAIT_MS = 250;
const DEFAULT_PROVIDER_RATE_LIMITS: Record<Provider, ProviderRateLimit> = {
replicate: { concurrency: 5, startIntervalMs: 700 },
google: { concurrency: 3, startIntervalMs: 1100 },
openai: { concurrency: 3, startIntervalMs: 1100 },
dashscope: { concurrency: 3, startIntervalMs: 1100 },
};
function printUsage(): void {
console.log(`Usage:
npx -y bun scripts/main.ts --prompt "A cat" --image cat.png
npx -y bun scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9
npx -y bun scripts/main.ts --promptfiles system.md content.md --image out.png
npx -y bun scripts/main.ts --batchfile batch.json
Options:
-p, --prompt <text> Prompt text
--promptfiles <files...> Read prompt from files (concatenated)
--image <path> Output image path (required)
--provider google|openai|dashscope|replicate Force provider (auto-detect by default)
--image <path> Output image path (required in single-image mode)
--batchfile <path> JSON batch file for multi-image generation
--jobs <count> Worker count for batch mode (default: auto, max from config, built-in default 10)
--provider google|openai|dashscope|replicate Force provider (auto-detect, prefers replicate when available)
-m, --model <id> Model ID
--ar <ratio> Aspect ratio (e.g., 16:9, 1:1, 4:3)
--size <WxH> Size (e.g., 1024x1024)
--quality normal|2k Quality preset (default: 2k)
--imageSize 1K|2K|4K Image size for Google (default: from quality)
--ref <files...> Reference images (Google multimodal or OpenAI edits)
--n <count> Number of images (default: 1)
--ref <files...> Reference images (Google multimodal, OpenAI GPT Image edits, or Replicate)
--n <count> Number of images for the current task (default: 1)
--json JSON output
-h, --help Show help
Batch file format:
[
{
"id": "hero",
"promptFiles": ["prompts/hero.md"],
"image": "out/hero.png",
"provider": "replicate",
"model": "google/nano-banana-pro",
"ar": "16:9"
}
]
PowerShell note:
If npx.ps1 is blocked by execution policy, use:
    npx.cmd -y bun scripts/main.ts --prompt "A cat" --image cat.png
Behavior:
- Batch mode automatically runs in parallel when pending tasks >= 2
- Each image retries automatically up to 3 attempts
- Batch summary reports success count, failure count, and per-image errors
Environment variables:
OPENAI_API_KEY OpenAI API key
GOOGLE_API_KEY Google API key
GEMINI_API_KEY Gemini API key (alias for GOOGLE_API_KEY)
DASHSCOPE_API_KEY DashScope API key ()
DASHSCOPE_API_KEY DashScope API key
REPLICATE_API_TOKEN Replicate API token
OPENAI_IMAGE_MODEL Default OpenAI model (gpt-image-1.5)
GOOGLE_IMAGE_MODEL Default Google model (gemini-3-pro-image-preview)
@ -40,6 +109,9 @@ Environment variables:
GOOGLE_BASE_URL Custom Google endpoint
DASHSCOPE_BASE_URL Custom DashScope endpoint
REPLICATE_BASE_URL Custom Replicate endpoint
BAOYU_IMAGE_GEN_MAX_WORKERS Override batch worker cap
BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY Override provider concurrency
BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS Override provider start gap in ms
Env file load order: CLI args > EXTEND.md > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env`);
}
@ -57,6 +129,8 @@ function parseArgs(argv: string[]): CliArgs {
imageSize: null,
referenceImages: [],
n: 1,
batchFile: null,
jobs: null,
json: false,
help: false,
};
@ -110,9 +184,26 @@ function parseArgs(argv: string[]): CliArgs {
continue;
}
if (a === "--batchfile") {
const v = argv[++i];
if (!v) throw new Error("Missing value for --batchfile");
out.batchFile = v;
continue;
}
if (a === "--jobs") {
const v = argv[++i];
if (!v) throw new Error("Missing value for --jobs");
out.jobs = parseInt(v, 10);
if (isNaN(out.jobs) || out.jobs < 1) throw new Error(`Invalid worker count: ${v}`);
continue;
}
if (a === "--provider") {
const v = argv[++i];
if (v !== "google" && v !== "openai" && v !== "dashscope" && v !== "replicate") throw new Error(`Invalid provider: ${v}`);
if (v !== "google" && v !== "openai" && v !== "dashscope" && v !== "replicate") {
throw new Error(`Invalid provider: ${v}`);
}
out.provider = v;
continue;
}
@ -228,9 +319,11 @@ function parseSimpleYaml(yaml: string): Partial<ExtendConfig> {
const config: Partial<ExtendConfig> = {};
const lines = yaml.split("\n");
let currentKey: string | null = null;
let currentProvider: Provider | null = null;
for (const line of lines) {
const trimmed = line.trim();
const indent = line.match(/^\s*/)?.[0].length ?? 0;
if (!trimmed || trimmed.startsWith("#")) continue;
if (trimmed.includes(":") && !trimmed.startsWith("-")) {
@ -247,18 +340,57 @@ function parseSimpleYaml(yaml: string): Partial<ExtendConfig> {
} else if (key === "default_provider") {
config.default_provider = value === "null" ? null : (value as Provider);
} else if (key === "default_quality") {
config.default_quality = value === "null" ? null : (value as "normal" | "2k");
config.default_quality = value === "null" ? null : value as "normal" | "2k";
} else if (key === "default_aspect_ratio") {
const cleaned = value.replace(/['"]/g, "");
config.default_aspect_ratio = cleaned === "null" ? null : cleaned;
} else if (key === "default_image_size") {
config.default_image_size = value === "null" ? null : (value as "1K" | "2K" | "4K");
config.default_image_size = value === "null" ? null : value as "1K" | "2K" | "4K";
} else if (key === "default_model") {
config.default_model = { google: null, openai: null, dashscope: null, replicate: null };
currentKey = "default_model";
} else if (currentKey === "default_model" && (key === "google" || key === "openai" || key === "dashscope" || key === "replicate")) {
currentProvider = null;
} else if (key === "batch") {
config.batch = {};
currentKey = "batch";
currentProvider = null;
} else if (currentKey === "batch" && indent >= 2 && key === "max_workers") {
config.batch ??= {};
config.batch.max_workers = value === "null" ? null : parseInt(value, 10);
} else if (currentKey === "batch" && indent >= 2 && key === "provider_limits") {
config.batch ??= {};
config.batch.provider_limits ??= {};
currentKey = "provider_limits";
currentProvider = null;
} else if (
currentKey === "provider_limits" &&
indent >= 4 &&
(key === "google" || key === "openai" || key === "dashscope" || key === "replicate")
) {
config.batch ??= {};
config.batch.provider_limits ??= {};
config.batch.provider_limits[key] ??= {};
currentProvider = key;
} else if (
currentKey === "default_model" &&
(key === "google" || key === "openai" || key === "dashscope" || key === "replicate")
) {
const cleaned = value.replace(/['"]/g, "");
config.default_model![key] = cleaned === "null" ? null : cleaned;
} else if (
currentKey === "provider_limits" &&
currentProvider &&
indent >= 6 &&
(key === "concurrency" || key === "start_interval_ms")
) {
config.batch ??= {};
config.batch.provider_limits ??= {};
const providerLimit = (config.batch.provider_limits[currentProvider] ??= {});
if (key === "concurrency") {
providerLimit.concurrency = value === "null" ? null : parseInt(value, 10);
} else {
providerLimit.start_interval_ms = value === "null" ? null : parseInt(value, 10);
}
}
}
}
@ -280,7 +412,6 @@ async function loadExtendConfig(): Promise<Partial<ExtendConfig>> {
const content = await readFile(p, "utf8");
const yaml = extractYamlFrontMatter(content);
if (!yaml) continue;
return parseSimpleYaml(yaml);
} catch {
continue;
@ -300,6 +431,46 @@ function mergeConfig(args: CliArgs, extend: Partial<ExtendConfig>): CliArgs {
};
}
function parsePositiveInt(value: string | undefined): number | null {
if (!value) return null;
const parsed = parseInt(value, 10);
return Number.isFinite(parsed) && parsed > 0 ? parsed : null;
}
function getConfiguredMaxWorkers(extendConfig: Partial<ExtendConfig>): number {
const envValue = parsePositiveInt(process.env.BAOYU_IMAGE_GEN_MAX_WORKERS);
const configValue = extendConfig.batch?.max_workers ?? null;
return Math.max(1, envValue ?? configValue ?? DEFAULT_MAX_WORKERS);
}
function getConfiguredProviderRateLimits(
extendConfig: Partial<ExtendConfig>
): Record<Provider, ProviderRateLimit> {
const configured: Record<Provider, ProviderRateLimit> = {
replicate: { ...DEFAULT_PROVIDER_RATE_LIMITS.replicate },
google: { ...DEFAULT_PROVIDER_RATE_LIMITS.google },
openai: { ...DEFAULT_PROVIDER_RATE_LIMITS.openai },
dashscope: { ...DEFAULT_PROVIDER_RATE_LIMITS.dashscope },
};
for (const provider of ["replicate", "google", "openai", "dashscope"] as Provider[]) {
const envPrefix = `BAOYU_IMAGE_GEN_${provider.toUpperCase()}`;
const extendLimit = extendConfig.batch?.provider_limits?.[provider];
configured[provider] = {
concurrency:
parsePositiveInt(process.env[`${envPrefix}_CONCURRENCY`]) ??
extendLimit?.concurrency ??
configured[provider].concurrency,
startIntervalMs:
parsePositiveInt(process.env[`${envPrefix}_START_INTERVAL_MS`]) ??
extendLimit?.start_interval_ms ??
configured[provider].startIntervalMs,
};
}
return configured;
}
async function readPromptFromFiles(files: string[]): Promise<string> {
const parts: string[] = [];
for (const f of files) {
@ -311,9 +482,12 @@ async function readPromptFromFiles(files: string[]): Promise<string> {
async function readPromptFromStdin(): Promise<string | null> {
if (process.stdin.isTTY) return null;
try {
const t = await Bun.stdin.text();
const v = t.trim();
return v.length > 0 ? v : null;
const chunks: Buffer[] = [];
for await (const chunk of process.stdin) {
chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
}
const value = Buffer.concat(chunks).toString("utf8").trim();
return value.length > 0 ? value : null;
} catch {
return null;
}
@ -327,7 +501,13 @@ function normalizeOutputImagePath(p: string): string {
}
function detectProvider(args: CliArgs): Provider {
if (args.referenceImages.length > 0 && args.provider && args.provider !== "google" && args.provider !== "openai" && args.provider !== "replicate") {
if (
args.referenceImages.length > 0 &&
args.provider &&
args.provider !== "google" &&
args.provider !== "openai" &&
args.provider !== "replicate"
) {
throw new Error(
"Reference images require a ref-capable provider. Use --provider google (Gemini multimodal), --provider openai (GPT Image edits), or --provider replicate."
);
@ -349,13 +529,18 @@ function detectProvider(args: CliArgs): Provider {
);
}
const available = [hasGoogle && "google", hasOpenai && "openai", hasDashscope && "dashscope", hasReplicate && "replicate"].filter(Boolean) as Provider[];
const available = [
hasReplicate && "replicate",
hasGoogle && "google",
hasOpenai && "openai",
hasDashscope && "dashscope",
].filter(Boolean) as Provider[];
  if (available.length > 0) return available[0]!;
throw new Error(
"No API key found. Set GOOGLE_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, DASHSCOPE_API_KEY, or REPLICATE_API_TOKEN.\n" +
"No API key found. Set REPLICATE_API_TOKEN, GOOGLE_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, or DASHSCOPE_API_KEY.\n" +
"Create ~/.baoyu-skills/.env or <cwd>/.baoyu-skills/.env with your keys."
);
}
@ -371,11 +556,6 @@ async function validateReferenceImages(referenceImages: string[]): Promise<void>
}
}
type ProviderModule = {
getDefaultModel: () => string;
generateImage: (prompt: string, model: string, args: CliArgs) => Promise<Uint8Array>;
};
function isRetryableGenerationError(error: unknown): boolean {
const msg = error instanceof Error ? error.message : String(error);
const nonRetryableMarkers = [
@ -384,26 +564,328 @@ function isRetryableGenerationError(error: unknown): boolean {
"only supported",
"No API key found",
"is required",
"Invalid ",
"Unexpected ",
"API error (400)",
"API error (401)",
"API error (402)",
"API error (403)",
"API error (404)",
"temporarily disabled",
];
return !nonRetryableMarkers.some((marker) => msg.includes(marker));
}
async function loadProviderModule(provider: Provider): Promise<ProviderModule> {
if (provider === "google") {
return (await import("./providers/google")) as ProviderModule;
}
if (provider === "dashscope") {
return (await import("./providers/dashscope")) as ProviderModule;
}
if (provider === "replicate") {
return (await import("./providers/replicate")) as ProviderModule;
}
if (provider === "google") return (await import("./providers/google")) as ProviderModule;
if (provider === "dashscope") return (await import("./providers/dashscope")) as ProviderModule;
if (provider === "replicate") return (await import("./providers/replicate")) as ProviderModule;
return (await import("./providers/openai")) as ProviderModule;
}
async function loadPromptForArgs(args: CliArgs): Promise<string | null> {
let prompt: string | null = args.prompt;
if (!prompt && args.promptFiles.length > 0) {
prompt = await readPromptFromFiles(args.promptFiles);
}
return prompt;
}
function getModelForProvider(
provider: Provider,
requestedModel: string | null,
extendConfig: Partial<ExtendConfig>,
providerModule: ProviderModule
): string {
if (requestedModel) return requestedModel;
if (extendConfig.default_model) {
if (provider === "google" && extendConfig.default_model.google) return extendConfig.default_model.google;
if (provider === "openai" && extendConfig.default_model.openai) return extendConfig.default_model.openai;
if (provider === "dashscope" && extendConfig.default_model.dashscope) return extendConfig.default_model.dashscope;
if (provider === "replicate" && extendConfig.default_model.replicate) return extendConfig.default_model.replicate;
}
return providerModule.getDefaultModel();
}
async function prepareSingleTask(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<PreparedTask> {
if (!args.quality) args.quality = "2k";
const prompt = (await loadPromptForArgs(args)) ?? (await readPromptFromStdin());
if (!prompt) throw new Error("Prompt is required");
if (!args.imagePath) throw new Error("--image is required");
if (args.referenceImages.length > 0) await validateReferenceImages(args.referenceImages);
const provider = detectProvider(args);
const providerModule = await loadProviderModule(provider);
const model = getModelForProvider(provider, args.model, extendConfig, providerModule);
return {
id: "single",
prompt,
args,
provider,
model,
outputPath: normalizeOutputImagePath(args.imagePath),
providerModule,
};
}
async function loadBatchTasks(batchFilePath: string): Promise<BatchTaskInput[]> {
const content = await readFile(path.resolve(batchFilePath), "utf8");
const parsed = JSON.parse(content.replace(/^\uFEFF/, "")) as BatchFile;
if (Array.isArray(parsed)) return parsed;
if (parsed && typeof parsed === "object" && Array.isArray(parsed.tasks)) return parsed.tasks;
throw new Error("Invalid batch file. Expected an array of tasks or an object with a tasks array.");
}
function createTaskArgs(baseArgs: CliArgs, task: BatchTaskInput): CliArgs {
return {
...baseArgs,
prompt: task.prompt ?? null,
promptFiles: task.promptFiles ? [...task.promptFiles] : [],
imagePath: task.image ?? null,
provider: task.provider ?? baseArgs.provider ?? null,
model: task.model ?? baseArgs.model ?? null,
aspectRatio: task.ar ?? baseArgs.aspectRatio ?? null,
size: task.size ?? baseArgs.size ?? null,
quality: task.quality ?? baseArgs.quality ?? null,
imageSize: task.imageSize ?? baseArgs.imageSize ?? null,
referenceImages: task.ref ? [...task.ref] : [],
n: task.n ?? baseArgs.n,
batchFile: null,
jobs: baseArgs.jobs,
json: baseArgs.json,
help: false,
};
}
async function prepareBatchTasks(
args: CliArgs,
extendConfig: Partial<ExtendConfig>
): Promise<PreparedTask[]> {
if (!args.batchFile) throw new Error("--batchfile is required in batch mode");
const taskInputs = await loadBatchTasks(args.batchFile);
if (taskInputs.length === 0) throw new Error("Batch file does not contain any tasks.");
const prepared: PreparedTask[] = [];
for (let i = 0; i < taskInputs.length; i++) {
const task = taskInputs[i]!;
const taskArgs = createTaskArgs(args, task);
const prompt = await loadPromptForArgs(taskArgs);
if (!prompt) throw new Error(`Task ${i + 1} is missing prompt or promptFiles.`);
if (!taskArgs.imagePath) throw new Error(`Task ${i + 1} is missing image output path.`);
if (taskArgs.referenceImages.length > 0) await validateReferenceImages(taskArgs.referenceImages);
const provider = detectProvider(taskArgs);
const providerModule = await loadProviderModule(provider);
const model = getModelForProvider(provider, taskArgs.model, extendConfig, providerModule);
prepared.push({
id: task.id || `task-${String(i + 1).padStart(2, "0")}`,
prompt,
args: taskArgs,
provider,
model,
outputPath: normalizeOutputImagePath(taskArgs.imagePath),
providerModule,
});
}
return prepared;
}
async function writeImage(outputPath: string, imageData: Uint8Array): Promise<void> {
await mkdir(path.dirname(outputPath), { recursive: true });
await writeFile(outputPath, imageData);
}
async function generatePreparedTask(task: PreparedTask): Promise<TaskResult> {
console.error(`Using ${task.provider} / ${task.model} for ${task.id}`);
console.error(
`Switch model: --model <id> | EXTEND.md default_model.${task.provider} | env ${task.provider.toUpperCase()}_IMAGE_MODEL`
);
let attempts = 0;
while (attempts < MAX_ATTEMPTS) {
attempts += 1;
try {
const imageData = await task.providerModule.generateImage(task.prompt, task.model, task.args);
await writeImage(task.outputPath, imageData);
return {
id: task.id,
provider: task.provider,
model: task.model,
outputPath: task.outputPath,
success: true,
attempts,
error: null,
};
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
const canRetry = attempts < MAX_ATTEMPTS && isRetryableGenerationError(error);
if (canRetry) {
console.error(`[${task.id}] Attempt ${attempts}/${MAX_ATTEMPTS} failed, retrying...`);
continue;
}
return {
id: task.id,
provider: task.provider,
model: task.model,
outputPath: task.outputPath,
success: false,
attempts,
error: message,
};
}
}
return {
id: task.id,
provider: task.provider,
model: task.model,
outputPath: task.outputPath,
success: false,
attempts: MAX_ATTEMPTS,
error: "Unknown failure",
};
}
function createProviderGate(providerRateLimits: Record<Provider, ProviderRateLimit>) {
const state = new Map<Provider, { active: number; lastStartedAt: number }>();
return async function acquire(provider: Provider): Promise<() => void> {
const limit = providerRateLimits[provider];
while (true) {
const current = state.get(provider) ?? { active: 0, lastStartedAt: 0 };
const now = Date.now();
const enoughCapacity = current.active < limit.concurrency;
const enoughGap = now - current.lastStartedAt >= limit.startIntervalMs;
if (enoughCapacity && enoughGap) {
state.set(provider, { active: current.active + 1, lastStartedAt: now });
return () => {
const latest = state.get(provider) ?? { active: 1, lastStartedAt: now };
state.set(provider, {
active: Math.max(0, latest.active - 1),
lastStartedAt: latest.lastStartedAt,
});
};
}
await new Promise((resolve) => setTimeout(resolve, POLL_WAIT_MS));
}
};
}
function getWorkerCount(taskCount: number, jobs: number | null, maxWorkers: number): number {
const requested = jobs ?? Math.min(taskCount, maxWorkers);
return Math.max(1, Math.min(requested, taskCount, maxWorkers));
}
async function runBatchTasks(
tasks: PreparedTask[],
jobs: number | null,
extendConfig: Partial<ExtendConfig>
): Promise<TaskResult[]> {
if (tasks.length === 1) {
return [await generatePreparedTask(tasks[0]!)];
}
const maxWorkers = getConfiguredMaxWorkers(extendConfig);
const providerRateLimits = getConfiguredProviderRateLimits(extendConfig);
const acquireProvider = createProviderGate(providerRateLimits);
const workerCount = getWorkerCount(tasks.length, jobs, maxWorkers);
console.error(`Batch mode: ${tasks.length} tasks, ${workerCount} workers, parallel mode enabled.`);
for (const provider of ["replicate", "google", "openai", "dashscope"] as Provider[]) {
const limit = providerRateLimits[provider];
console.error(`- ${provider}: concurrency=${limit.concurrency}, startIntervalMs=${limit.startIntervalMs}`);
}
let nextIndex = 0;
const results: TaskResult[] = new Array(tasks.length);
const worker = async (): Promise<void> => {
while (true) {
const currentIndex = nextIndex;
nextIndex += 1;
if (currentIndex >= tasks.length) return;
const task = tasks[currentIndex]!;
const release = await acquireProvider(task.provider);
try {
results[currentIndex] = await generatePreparedTask(task);
} finally {
release();
}
}
};
await Promise.all(Array.from({ length: workerCount }, () => worker()));
return results;
}
function printBatchSummary(results: TaskResult[]): void {
const successCount = results.filter((result) => result.success).length;
const failureCount = results.length - successCount;
console.error("");
console.error("Batch generation summary:");
console.error(`- Total: ${results.length}`);
console.error(`- Succeeded: ${successCount}`);
console.error(`- Failed: ${failureCount}`);
if (failureCount > 0) {
console.error("Failure reasons:");
for (const result of results.filter((item) => !item.success)) {
console.error(`- ${result.id}: ${result.error}`);
}
}
}
function emitJson(payload: unknown): void {
console.log(JSON.stringify(payload, null, 2));
}
async function runSingleMode(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<void> {
const task = await prepareSingleTask(args, extendConfig);
const result = await generatePreparedTask(task);
if (!result.success) {
throw new Error(result.error || "Generation failed");
}
if (args.json) {
emitJson({
savedImage: result.outputPath,
provider: result.provider,
model: result.model,
attempts: result.attempts,
prompt: task.prompt.slice(0, 200),
});
return;
}
console.log(result.outputPath);
}
async function runBatchMode(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<void> {
const tasks = await prepareBatchTasks(args, extendConfig);
const results = await runBatchTasks(tasks, args.jobs, extendConfig);
printBatchSummary(results);
if (args.json) {
emitJson({
mode: "batch",
total: results.length,
succeeded: results.filter((item) => item.success).length,
failed: results.filter((item) => !item.success).length,
results,
});
}
if (results.some((item) => !item.success)) {
process.exitCode = 1;
}
}
async function main(): Promise<void> {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
printUsage();
return;
@ -412,86 +894,18 @@ async function main(): Promise<void> {
await loadEnv();
const extendConfig = await loadExtendConfig();
const mergedArgs = mergeConfig(args, extendConfig);
if (!mergedArgs.quality) mergedArgs.quality = "2k";
let prompt: string | null = mergedArgs.prompt;
if (!prompt && mergedArgs.promptFiles.length > 0) prompt = await readPromptFromFiles(mergedArgs.promptFiles);
if (!prompt) prompt = await readPromptFromStdin();
if (!prompt) {
console.error("Error: Prompt is required");
printUsage();
process.exitCode = 1;
if (mergedArgs.batchFile) {
await runBatchMode(mergedArgs, extendConfig);
return;
}
if (!mergedArgs.imagePath) {
console.error("Error: --image is required");
printUsage();
process.exitCode = 1;
return;
}
if (mergedArgs.referenceImages.length > 0) {
await validateReferenceImages(mergedArgs.referenceImages);
}
const provider = detectProvider(mergedArgs);
const providerModule = await loadProviderModule(provider);
let model = mergedArgs.model;
if (!model && extendConfig.default_model) {
if (provider === "google") model = extendConfig.default_model.google ?? null;
if (provider === "openai") model = extendConfig.default_model.openai ?? null;
if (provider === "dashscope") model = extendConfig.default_model.dashscope ?? null;
if (provider === "replicate") model = extendConfig.default_model.replicate ?? null;
}
model = model || providerModule.getDefaultModel();
const outputPath = normalizeOutputImagePath(mergedArgs.imagePath);
let imageData: Uint8Array;
let retried = false;
while (true) {
try {
imageData = await providerModule.generateImage(prompt, model, mergedArgs);
break;
} catch (e) {
if (!retried && isRetryableGenerationError(e)) {
retried = true;
console.error("Generation failed, retrying...");
continue;
}
throw e;
}
}
const dir = path.dirname(outputPath);
await mkdir(dir, { recursive: true });
await writeFile(outputPath, imageData);
if (mergedArgs.json) {
console.log(
JSON.stringify(
{
savedImage: outputPath,
provider,
model,
prompt: prompt.slice(0, 200),
},
null,
2
)
);
} else {
console.log(outputPath);
}
await runSingleMode(mergedArgs, extendConfig);
}
main().catch((e) => {
const msg = e instanceof Error ? e.message : String(e);
console.error(msg);
main().catch((error) => {
const message = error instanceof Error ? error.message : String(error);
console.error(message);
process.exit(1);
});

View File

@ -68,7 +68,11 @@ export async function generateImage(
const baseURL = process.env.OPENAI_BASE_URL || "https://api.openai.com/v1";
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) throw new Error("OPENAI_API_KEY is required");
if (!apiKey) {
throw new Error(
"OPENAI_API_KEY is required. Codex/ChatGPT desktop login does not automatically grant OpenAI Images API access to this script."
);
}
if (process.env.OPENAI_IMAGE_USE_CHAT === "true") {
return generateWithChatCompletions(baseURL, apiKey, prompt, model);

View File

@ -36,22 +36,26 @@ function buildInput(prompt: string, args: CliArgs, referenceImages: string[]): R
if (args.aspectRatio) {
input.aspect_ratio = args.aspectRatio;
} else if (referenceImages.length > 0) {
// Replicate nano-banana-pro supports matching the original image ratio for edit/localization tasks.
input.aspect_ratio = "match_input_image";
}
if (args.n > 1) {
input.number_of_images = args.n;
}
if (args.quality === "normal") {
input.resolution = "1K";
} else if (args.quality === "2k") {
input.resolution = "2K";
}
input.output_format = "png";
if (referenceImages.length > 0) {
if (referenceImages.length === 1) {
input.image = referenceImages[0];
} else {
for (let i = 0; i < referenceImages.length; i++) {
input[`image${i > 0 ? i + 1 : ""}`] = referenceImages[i];
}
}
// Official nano-banana-pro schema uses image_input: array
input.image_input = referenceImages;
}
return input;

View File

@ -13,10 +13,29 @@ export type CliArgs = {
imageSize: string | null;
referenceImages: string[];
n: number;
batchFile: string | null;
jobs: number | null;
json: boolean;
help: boolean;
};
export type BatchTaskInput = {
id?: string;
prompt?: string | null;
promptFiles?: string[];
image?: string;
provider?: Provider | null;
model?: string | null;
ar?: string | null;
size?: string | null;
quality?: Quality | null;
imageSize?: "1K" | "2K" | "4K" | null;
ref?: string[];
n?: number;
};
export type BatchFile = BatchTaskInput[] | { tasks: BatchTaskInput[] };
export type ExtendConfig = {
version: number;
default_provider: Provider | null;
@ -29,4 +48,16 @@ export type ExtendConfig = {
dashscope: string | null;
replicate: string | null;
};
batch?: {
max_workers?: number | null;
provider_limits?: Partial<
Record<
Provider,
{
concurrency?: number | null;
start_interval_ms?: number | null;
}
>
>;
};
};

View File

@ -1,14 +1,19 @@
---
name: baoyu-translate
description: Translates articles and documents between languages with three modes - quick (direct), normal (analyze then translate), and refined (analyze, translate, review, polish). Supports custom glossaries and terminology consistency via EXTEND.md. Use when user asks to "translate", "翻译", "精翻", "translate article", "translate to Chinese/English", "改成中文", "改成英文", "convert to Chinese", "localize", "本地化", or needs any document translation. Also triggers for "refined translation", "精细翻译", "proofread translation", "快速翻译", "快翻", "这篇文章翻译一下", or when a URL or file is provided with translation intent.
version: 1.56.1
metadata:
openclaw:
homepage: https://github.com/JimLiu/baoyu-skills#baoyu-translate
requires:
anyBins:
- bun
- npx
readme-blog: https://mp.weixin.qq.com/s/l32EPYG5RmeLXqG6RlXKug
how-to-work:
  Source document
  → Parse Markdown AST
  → (as needed) split into chunks at block boundaries
  → extract terminology from the full document
  → generate a shared prompt
  → translate each chunk in parallel via a subagent (when available)
  → produce a draft for each chunk
  → (as needed) main agent merges the chunks
  → main agent reviews
  → main agent revises
  → final translation.md
---
# Translator
@ -17,7 +22,7 @@ Three-mode translation skill: **quick** for direct translation, **normal** for a
## Script Directory
Scripts in `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. Replace `{baseDir}` and `${BUN_X}` with actual values.
Scripts in `scripts/` subdirectory. `${SKILL_DIR}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. Replace `${SKILL_DIR}` and `${BUN_X}` with actual values.
| Script | Purpose |
|--------|---------|
@ -180,7 +185,7 @@ Before translating chunks:
1. **Extract terminology**: Scan entire document for proper nouns, technical terms, recurring phrases
2. **Build session glossary**: Merge extracted terms with loaded glossaries, establish consistent translations
3. **Split into chunks**: Use `${BUN_X} {baseDir}/scripts/chunk.ts <file> [--max-words <chunk_max_words>] [--output-dir <output-dir>]`
3. **Split into chunks**: Use `${BUN_X} ${SKILL_DIR}/scripts/chunk.ts <file> [--max-words <chunk_max_words>] [--output-dir <output-dir>]`
- Parses markdown AST (headings, paragraphs, lists, code blocks, tables, etc.)
- Splits at markdown block boundaries to preserve structure
- If a single block exceeds the threshold, falls back to line splitting, then word splitting
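The fallback behavior described above can be sketched as follows. This is an illustrative sketch of the line-splitting step only — the real `chunk.ts` parses the markdown AST first, and the word-splitting fallback is omitted here:

```typescript
// Illustrative fallback: when a single markdown block exceeds the word
// budget, split it by lines, greedily packing lines into chunks that
// stay at or under maxWords.
function splitOversized(block: string, maxWords: number): string[] {
  const lines = block.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let count = 0;
  for (const line of lines) {
    const words = line.split(/\s+/).filter(Boolean).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
      count = 0;
    }
    current.push(line);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Splitting at line boundaries before resorting to word boundaries preserves more structure (list items, table rows) inside an oversized block.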
@ -209,6 +214,7 @@ Before translating chunks:
- **Natural flow**: Use idiomatic target language word order and sentence patterns; break or restructure sentences freely when the source structure doesn't work naturally in the target language
- **Terminology**: Use standard translations; annotate with original term in parentheses on first occurrence
- **Preserve format**: Keep all markdown formatting (headings, bold, italic, images, links, code blocks)
- **Image-language awareness**: Preserve image references exactly during translation, but after the translation is complete, review referenced images and check whether their likely main text language still matches the translated article language
- **Frontmatter transformation**: If the source has YAML frontmatter, preserve it in the translation with these changes: (1) Rename metadata fields that describe the *source* article — `url`→`sourceUrl`, `title`→`sourceTitle`, `description`→`sourceDescription`, `author`→`sourceAuthor`, `date`→`sourceDate`, and any similar origin-metadata fields — by adding a `source` prefix (camelCase). (2) Translate the values of text fields (title, description, etc.) and add them as new top-level fields. (3) Keep other fields (tags, categories, custom fields) as-is, translating their values where appropriate
- **Respect original**: Maintain original meaning and intent; do not add, remove, or editorialize — but sentence structure and imagery may be adapted freely to serve the meaning
- **Translator's notes**: For terms, concepts, or cultural references that target readers may not understand — due to jargon, cultural gaps, or domain-specific knowledge — add a concise explanatory note in parentheses immediately after the term. The note should explain *what it means* in plain language, not just provide the English original. Format: `译文English original通俗解释`. Calibrate annotation depth to the target audience: general readers need more notes than technical readers. Only add notes where genuinely needed; do not over-annotate obvious terms.
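The frontmatter rename rule above (step 1) can be sketched in code. The field list mirrors the examples in the text; the rest of the transformation pipeline (translating values, adding new top-level fields) is assumed and not shown:

```typescript
// Origin-metadata fields that get a `source` prefix in the translation's
// frontmatter, per the rule in the text.
const SOURCE_FIELDS = ["url", "title", "description", "author", "date"];

// url -> sourceUrl, title -> sourceTitle, ... (camelCase); all other
// fields (tags, categories, custom fields) pass through unchanged.
function renameSourceFields(fm: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fm)) {
    if (SOURCE_FIELDS.includes(key)) {
      out["source" + key[0].toUpperCase() + key.slice(1)] = value;
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

Renaming rather than deleting keeps the source metadata recoverable, so the translated file still records where the original came from.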
@ -247,6 +253,20 @@ Each step reads the previous step's file and builds on it.
Final translation is always at `translation.md` in the output directory.
After the final translation is written, do a lightweight image-language pass:
1. Collect image references from the translated article
2. Identify likely text-heavy images such as covers, screenshots, diagrams, charts, frameworks, and infographics
3. If any image likely contains a main text language that does not match the translated article language, proactively remind the user
4. The reminder must be a list only. Do not automatically localize those images unless the user asks.
Reminder format:
```text
Possible image localization needed:
- ![[attachments/example-cover.png]]: likely still contains Chinese text while the article is now English
- ![[attachments/example-diagram.png]]: likely text-heavy framework graphic, check whether labels need translation
```
Display summary:
```
**Translation complete** ({mode} mode)
@ -258,6 +278,8 @@ Final: {output-dir}/translation.md
Glossary terms applied: {count}
```
If mismatched image-language candidates were found, append a short note after the summary telling the user that some embedded images may still need image-text localization, followed by the candidate list.
## Extension Support
Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.