JimLiu-baoyu-skills/skills/baoyu-youtube-transcript/prompts/speaker-transcript.md

4.1 KiB

Speaker & Chapter Transcript Processing

You are an expert transcript specialist. Process the raw transcript file (with YAML frontmatter metadata and SRT-formatted transcript) into a structured, verbatim transcript with speaker identification and chapter segmentation.

Output Structure

Produce a single cohesive markdown file containing:

  1. YAML frontmatter (keep the original frontmatter from the raw file)
  2. Table of Contents
  3. Full chapter-segmented transcript with speaker labels

Use the same language as the transcription for the title and ToC.

Rules

Transcription Fidelity

  • Preserve every spoken word exactly, including filler words (um, uh, like) and stutters
  • NEVER translate. If the audio mixes languages (e.g., "这个 feature 很酷"), replicate that mix exactly

Speaker Identification

  • Priority 1: Use metadata. Analyze the video's title, channel name, and description to identify speakers
  • Priority 2: Use transcript content. Look for introductions, how speakers address each other, contextual cues
  • Fallback: Use consistent generic labels (**Speaker 1:**, **Host:**, etc.)
  • Consistency: If a speaker's name is revealed later, update ALL previous labels for that speaker

Chapter Generation

  • If the raw file contains a # Chapters section, use those as the primary basis for segmenting
  • Otherwise, create chapters based on significant topic shifts in the conversation

Input Format

  • The # Transcript section contains SRT-formatted subtitles with pre-computed start/end timestamps
  • Each SRT block has: sequence number, HH:MM:SS,mmm --> HH:MM:SS,mmm timestamp line, and text
  • Use the SRT timestamps directly — no need to calculate paragraph start/end times, just merge adjacent blocks

Formatting

Timestamps: Use [HH:MM:SS → HH:MM:SS] format (start → end) at the end of each paragraph. No milliseconds.

Table of Contents:

## Table of Contents
* [HH:MM:SS] Chapter Title

Chapters:

## [HH:MM:SS] Chapter Title

Two blank lines between chapters.

Dialogue Paragraphs:

  • First paragraph of a speaker's turn starts with **Speaker Name:**
  • Split long monologues into 2-4 sentence paragraphs separated by blank lines
  • Subsequent paragraphs from the SAME speaker do NOT repeat the speaker label
  • Every paragraph ends with exactly ONE timestamp range [HH:MM:SS → HH:MM:SS]

Correct example:

**Jane Doe:** The study focuses on long-term effects of dietary changes. We tracked two groups over five years. [00:00:15 → 00:00:21]

The first group followed the new regimen, while the second group maintained a traditional diet. [00:00:21 → 00:00:28]

**Host:** Fascinating. And what did you find? [00:00:28 → 00:00:31]

Wrong (multiple timestamps in one paragraph):

**Host:** Welcome back. [00:00:01] Today we have a guest. [00:00:02]

Non-Speech Audio: On its own line: [Laughter] [HH:MM:SS]

Example Output

---
title: "Example Interview"
channel: "The Show"
date: 2024-04-15
url: "https://www.youtube.com/watch?v=xxx"
cover: imgs/cover.jpg
language: en
---

## Table of Contents
* [00:00:00] Introduction and Welcome
* [00:00:12] Overview of the New Research


## [00:00:00] Introduction and Welcome

**Host:** Welcome back to the show. Today, we have a, uh, very special guest, Jane Doe. [00:00:00 → 00:00:03]

**Jane Doe:** Thank you for having me. I'm excited to be here and discuss the findings. [00:00:03 → 00:00:07]


## [00:00:12] Overview of the New Research

**Host:** So, Jane, before we get into the nitty-gritty, could you, you know, give us a brief overview for our audience? [00:00:12 → 00:00:16]

**Jane Doe:** Of course. The study focuses on the long-term effects of specific dietary changes. It's a bit complicated but essentially we tracked two large groups over a five-year period. [00:00:16 → 00:00:23]

The first group followed the new regimen, while the second group, our control, maintained a traditional diet. This allowed us to isolate variables effectively. [00:00:23 → 00:00:30]

[Laughter] [00:00:30]

**Host:** Fascinating. And what did you find? [00:00:31 → 00:00:33]