Initial commit: auto-video-cut project

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:51:01 +02:00
commit 267070ad52
15 changed files with 2635 additions and 0 deletions
--- a/PLAN.md
+++ b/PLAN.md
@@ -0,0 +1,708 @@
+# Implementation plan: auto-video-cut Phase 2
+
+As of: 2026-03-20
+
+---
+
+## Phase overview
+
+```
+Phase 1  ██████████ completed
+  CLI scaffolding, silence removal, scene detection,
+  music mixing, text overlays, sequence file, batch
+
+Phase 2a ░░░░░░░░░░ planned: video quality
+  Crossfades, fade-in/out, audio ducking, progress, preview
+
+Phase 2b ░░░░░░░░░░ planned: AI core
+  Whisper transcription, subtitles, filler-word detection
+
+Phase 2c ░░░░░░░░░░ planned: AI extended
+  Auto chapters (LLM), highlight reel, natural-language sequences
+
+Phase 3  ░░░░░░░░░░ planned: Discord bot
+  Implement bot.py
+```
+
+---
+
+## Phase 2a: Video quality and UX
+
+### New files
+
+```
+auto_video_cut/
+├── transitions.py    ← Crossfade, fade-in/out (ffmpeg xfade)
+├── ducking.py        ← Audio ducking (ffmpeg sidechaincompress)
+└── progress.py       ← Progress display (rich)
+```
+
+### New dependency
+
+```toml
+# pyproject.toml
+"rich>=13.0"    # progress display, terminal UI
+```
+
+---
+
+### Step 2a-1: Progress display (`progress.py`)
+
+**Problem:** The CLI currently reports only start and end. On long videos the user waits with no feedback.
+
+**Solution:** Start ffmpeg with `-progress pipe:1` and parse its output.
+
+```python
+# progress.py
+import subprocess
+
+class FfmpegProgress:
+    """Parses ffmpeg -progress pipe:1 output in real time."""
+
+    def __init__(self, total_duration: float):
+        self.total_duration = total_duration  # seconds, basis for the percentage
+
+    def run_with_progress(self, cmd: list[str]) -> subprocess.CompletedProcess:
+        """Run an ffmpeg command with a progress bar."""
+        # Insert "-progress pipe:1" into cmd, before the output file
+        # Read stdout line by line
+        # Parse "out_time_us=" (microseconds; "out_time_ms=" also reports
+        #   microseconds despite its name) → compute progress
+        # Update rich.progress.Progress
+```
+
+**Impact on existing code:**
+- `cutter.py`, `merger.py`, `audio.py`, `text.py`: replace `_run()` with `progress.run_with_progress()`
+- Move the central `_run()` function into its own module (`runner.py`) so that all modules share it
+
+**Acceptance criterion:** `video-cut cut --input test.mp4 --remove-silence` shows a progress bar with percentage and estimated time remaining.
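The parsing step planned for `progress.py` could look roughly like the sketch below. It assumes ffmpeg writes `key=value` lines to stdout, that `out_time_us` carries the encoded position in microseconds, and that a final `progress=end` line marks completion. The helper names `parse_progress_line` and `progress_fractions` are illustrative and do not appear in the plan.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def parse_progress_line(line: str) -> tuple[str, str] | None:
    """Split one 'key=value' line; return None for anything else."""
    key, sep, value = line.strip().partition("=")
    if not sep or not key:
        return None
    return key, value


def progress_fractions(lines: Iterable[str], total_duration: float) -> Iterator[float]:
    """Yield completion fractions in [0.0, 1.0] for a stream of progress lines."""
    if total_duration <= 0:
        return
    for line in lines:
        parsed = parse_progress_line(line)
        if parsed is None:
            continue
        key, value = parsed
        if key == "out_time_us" and value.lstrip("-").isdigit():
            # Assumption: the value is in microseconds; "N/A" is skipped.
            seconds = max(int(value), 0) / 1_000_000
            yield min(seconds / total_duration, 1.0)
        elif key == "progress" and value == "end":
            yield 1.0
```

Each yielded fraction could be passed straight to a `rich` progress bar update. Keeping the parser free of subprocess handling makes it testable with plain lists of strings.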
+
+---
+
+### Step 2a-2: Crossfade and fade-in/out (`transitions.py`)
+
+**Problem:** Hard cuts between clips look amateurish.
+
+**Solution:** ffmpeg's `xfade` filter for transitions between two clips.
+
+```python
+# transitions.py
+from pathlib import Path
+
+def apply_crossfade(
+    clip_a: Path, clip_b: Path, output: Path,
+    duration: float = 0.5,
+    transition: str = "fade"   # fade | dissolve | wipeleft | ...
+) -> Path:
+    """Crossfade between two clips."""
+    # ffmpeg -i a.mp4 -i b.mp4
+    #   -filter_complex "xfade=transition=fade:duration=0.5:offset=<a_dur-0.5>"
+    #   output.mp4
+    # Note: xfade blends video only; audio needs its own crossfade (e.g. acrossfade)
+
+def apply_fade_in(input: Path, output: Path, duration: float = 0.5) -> Path:
+    """Fade-in at the start of the clip."""
+    # ffmpeg -i input -vf "fade=in:d=0.5" -af "afade=in:d=0.5"
+
+def apply_fade_out(input: Path, output: Path, duration: float = 0.5) -> Path:
+    """Fade-out at the end of the clip."""
+    # ffmpeg -i input -vf "fade=out:d=0.5:st=<dur-0.5>" -af "afade=out:d=0.5:st=<dur-0.5>"
+```
+
+**Integration into the sequence file:**
+```yaml
+sequence:
+  - type: video
+    file: "clip1.mp4"
+    transition: "crossfade"      # NEW
+    transition_duration: 0.5     # NEW
+
+  - type: video
+    file: "clip2.mp4"
+
+global:
+  fade_in: 0.5                   # NEW: fade-in at the start of the full video
+  fade_out: 0.5                  # NEW: fade-out at the end
+```
+
+**Impact on existing code:**
+- `merger.py`: `merge_clips()` must be able to insert transitions between clips. It currently uses the concat demuxer (stream copy only), so crossfades require re-encoding. Strategy: chain pairwise `xfade` filters in a pipeline instead of a single concat.
+- `sequencer.py`: extend `ClipEntry` with `transition` and `transition_duration`
+- `config.py`: new defaults for `transitions`
+
+**Acceptance criterion:** `video-cut merge --inputs a.mp4 b.mp4 --output merged.mp4 --crossfade 0.5` produces a video with a smooth transition.
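The pairwise `xfade` strategy hinges on getting each `offset` right: every transition starts `fade` seconds before the end of the video merged so far. The sketch below builds such a filter graph from known clip durations. The function name `build_xfade_graph` and the pad labels are hypothetical, and the sketch covers video only (audio would need a separate crossfade such as `acrossfade`).

```python
def build_xfade_graph(durations: list[float], fade: float = 0.5,
                      transition: str = "fade") -> str:
    """Return a -filter_complex string chaining xfade over all inputs."""
    if len(durations) < 2:
        raise ValueError("need at least two clips")
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the fade")

    parts = []
    prev_label = "[0:v]"
    merged_length = durations[0]  # length of the video merged so far
    last = len(durations) - 1
    for i in range(1, len(durations)):
        offset = merged_length - fade  # transition starts this far into the merged video
        out_label = "[vout]" if i == last else f"[v{i}]"
        parts.append(
            f"{prev_label}[{i}:v]xfade=transition={transition}:"
            f"duration={fade:.3f}:offset={offset:.3f}{out_label}"
        )
        prev_label = out_label
        merged_length = offset + durations[i]  # overlap shortens the total
    return ";".join(parts)
```

For three clips of 10, 8 and 6 seconds with a 0.5 s fade, the offsets come out as 9.5 and 17.0, and the merged video is 23 seconds long instead of 24.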
+
+---
+
+### Step 2a-3: Audio ducking (`ducking.py`)
+
+**Problem:** Music plays at a constant level, so it either drowns out speech or is too quiet during pauses.
+
+**Solution:** ffmpeg's `sidechaincompress`: the music is lowered automatically whenever the original audio gets louder.
+
+```python
+# ducking.py
+from pathlib import Path
+
+def apply_ducking(
+    video: Path, music: Path, output: Path,
+    volume_original: float = 1.0,
+    volume_music: float = 0.3,
+    duck_threshold: float = 0.02,   # Level above which the music is lowered
+    duck_ratio: float = 4.0,        # How strongly the music is attenuated
+    duck_attack: float = 0.3,       # How quickly the music is lowered (seconds)
+    duck_release: float = 1.0,      # How quickly the music comes back up (seconds)
+) -> Path:
+    """Put music under the video with automatic ducking."""
+    # asplit is needed because the speech signal feeds both the sidechain and
+    # the final mix, and a filter label can be consumed only once.
+    # sidechaincompress takes attack/release in milliseconds: convert from seconds.
+    # ffmpeg -i video.mp4 -stream_loop -1 -i music.mp3
+    #   -filter_complex
+    #     "[0:a]volume=<orig>,asplit=2[speech][sc];
+    #      [1:a]volume=<music>[music];
+    #      [music][sc]sidechaincompress=
+    #        threshold=<thresh>:ratio=<ratio>:
+    #        attack=<attack_ms>:release=<release_ms>[ducked];
+    #      [speech][ducked]amix=inputs=2:duration=first[a]"
+    #   -map 0:v -map "[a]" -c:v copy output.mp4
+```
+
+**Integration into the config:**
+```yaml
+music:
+  ducking: true                  # NEW: enable audio ducking
+  duck_threshold: 0.02           # NEW
+  duck_ratio: 4.0                # NEW
+  duck_attack: 0.3               # NEW
+  duck_release: 1.0              # NEW
+```
+
+**Impact on existing code:**
+- `audio.py`: `mix_music()` gets a `ducking=False` parameter. When enabled, it calls `ducking.apply_ducking()` instead of a plain `amix`.
+- `config.py`: new defaults in the `music` block
+
+**Acceptance criterion:** In a vlog with speech and music, the music is lowered automatically while the speaker is talking and returns to the configured volume during pauses.
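A minimal sketch of how the ducking filter graph might be assembled from the config values. Two assumptions are built in: the speech signal is duplicated with `asplit` so it can feed both the sidechain input and the final mix, and `sidechaincompress` expects `attack` and `release` in milliseconds, so the second-based config values are converted. The function name `build_ducking_filter` and the label `[sidechain]` are hypothetical.

```python
def build_ducking_filter(
    volume_original: float = 1.0,
    volume_music: float = 0.3,
    threshold: float = 0.02,
    ratio: float = 4.0,
    attack_s: float = 0.3,
    release_s: float = 1.0,
) -> str:
    """Return a -filter_complex string that ducks input 1 under input 0."""
    attack_ms = attack_s * 1000    # config uses seconds, the filter milliseconds
    release_ms = release_s * 1000
    return (
        # Duplicate the speech: one copy drives the compressor, one is mixed.
        f"[0:a]volume={volume_original},asplit=2[speech][sidechain];"
        f"[1:a]volume={volume_music}[music];"
        f"[music][sidechain]sidechaincompress="
        f"threshold={threshold}:ratio={ratio}:"
        f"attack={attack_ms:g}:release={release_ms:g}[ducked];"
        f"[speech][ducked]amix=inputs=2:duration=first[a]"
    )
```

The returned string would be passed as the `-filter_complex` argument, together with `-map 0:v -map "[a]" -c:v copy` as in the plan.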
+
+---
+
+### Step 2a-4: Preview and dry run
+
+**Preview (quick preview render):**
+```bash
+video-cut sequence --seq sequence.yaml --preview
+```
+- Renders at 360p with `-preset ultrafast`
+- The file is written to `/tmp/`
+- Speed: roughly 10x faster than a full render
+
+**Dry run (analysis only):**
+```bash
+video-cut sequence --seq sequence.yaml --dry-run
+```
+Output:
+```
+Sequence: 8 entries
+  [1] image   title.png              3.0s
+  [2] video   intro.mp4             12.4s
+  [3] video   rohschnitt.mp4        45.2s → remove silence
+  [4] folder  ./aufnahmen/tag1/      3 files, ~98.0s
+  ...
+Estimated total duration: 4:12
+Estimated file size:      ~180 MB (1080p H.264)
+Music: random from resources/music/ (3 files available)
+```
+
+**Impact on existing code:**
+- `cli.py`: new flags `--preview` and `--dry-run` on the `sequence` command
+- `sequencer.py`: new function `estimate_sequence()` that estimates duration and size without rendering
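One way the size estimate in the dry-run output might be derived is from an assumed average bitrate. The sketch below is not the plan's `estimate_sequence()`; the helper names and the bitrates (5500 kbit/s video, 192 kbit/s audio) are illustrative assumptions, chosen so that a 4:12 video lands near the ~180 MB shown in the sample output.

```python
def format_duration(seconds: float) -> str:
    """Format a duration as m:ss, e.g. 252 -> '4:12'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def estimate_size_mb(duration_s: float, video_kbps: float = 5500,
                     audio_kbps: float = 192) -> float:
    """Estimate file size in MB from duration and assumed average bitrates."""
    total_kbit = (video_kbps + audio_kbps) * duration_s
    return total_kbit / 8 / 1000  # kbit -> kB -> MB


total = 252.0  # seconds, matching the 4:12 in the sample output
print(f"Estimated total duration: {format_duration(total)}")
print(f"Estimated file size:      ~{estimate_size_mb(total):.0f} MB")
```

A real estimate would depend on the encoder settings in use, so these figures should be treated as placeholders.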
+
+---
+
+## Phase 2b: AI core (Whisper + smart cutting)
+
+### New files
+
+```
+auto_video_cut/
+├── transcribe.py     ← Whisper integration (faster-whisper)
+├── subtitles.py      ← Generate and burn in SRT
+└── smart_cut.py      ← Filler-word detection, smart cutting
+```
+
+### New dependencies
+
+```toml
+# pyproject.toml, as an optional group
+[project.optional-dependencies]
+ai = [
+    "faster-whisper>=1.0",
+    "anthropic>=0.40",         # for Phase 2c (auto chapters)
+]
+```
+
+Installation: `pip install -e ".[ai]"`
+
+---
+
+### Step 2b-1: Whisper transcription (`transcribe.py`)
+
+**Core function:** Extract audio from the video → Whisper transcribes it → word-level timestamps.
+
+```python
+# transcribe.py
+from dataclasses import dataclass
+from pathlib import Path
+
+@dataclass
+class Word:
+    text: str
+    start: float      # seconds
+    end: float
+    confidence: float
+
+@dataclass
+class Segment:
+    text: str
+    start: float
+    end: float
+    words: list[Word]
+
+def extract_audio(video: Path, output: Path) -> Path:
+    """Extract the audio track as WAV (16 kHz mono for Whisper)."""
+    # ffmpeg -i video.mp4 -ar 16000 -ac 1 -f wav audio.wav
+
+def transcribe(
+    audio: Path,
+    model_size: str = "base",     # tiny | base | small | medium | large-v3
+    language: str | None = None,  # None = auto-detect
+    device: str = "auto",         # auto | cpu | cuda
+) -> list[Segment]:
+    """Transcribe audio with faster-whisper."""
+    # from faster_whisper import WhisperModel
+    # model = WhisperModel(model_size, device=device)
+    # segments, info = model.transcribe(str(audio), language=language, word_timestamps=True)
+    # → list[Segment] with word-level timestamps
+
+def transcribe_video(
+    video: Path,
+    model_size: str = "base",
+    language: str | None = None,
+) -> list[Segment]:
+    """Complete workflow: video → audio → transcript."""
+```
+
+**Integration into the config:**
+```yaml
+ai:
+  whisper_model: "base"          # tiny | base | small | medium | large-v3
+  whisper_language: null          # null = auto-detect, "de", "en", ...
+  whisper_device: "auto"          # auto | cpu | cuda
+```
+
+**CLI:**
+```bash
+video-cut transcribe --input video.mp4 --output untertitel.srt
+video-cut transcribe --input video.mp4 --model large-v3 --language de
+```
+
+**Acceptance criterion:** `video-cut transcribe --input test.mp4` produces an `.srt` file with correct timestamps.
+
+---
+
+### Step 2b-2: Generate and burn in subtitles (`subtitles.py`)
+
+```python
+# subtitles.py
+from pathlib import Path
+
+from .transcribe import Segment
+
+def segments_to_srt(segments: list[Segment], output: Path) -> Path:
+    """Save transcript segments as an SRT file."""
+
+def burn_subtitles(
+    video: Path, srt: Path, output: Path,
+    font_size: int = 24,
+    font_color: str = "white",
+    outline_color: str = "black",
+    outline_width: int = 2,
+    position: str = "bottom",     # bottom | top
+) -> Path:
+    """Burn in subtitles via the ffmpeg subtitles filter."""
+    # ffmpeg -i video.mp4 -vf "subtitles=untertitel.srt:force_style='...'" output.mp4
+
+def auto_subtitle(
+    video: Path, output: Path,
+    model_size: str = "base",
+    language: str | None = None,
+    **style_kwargs,
+) -> Path:
+    """All in one: transcribe → SRT → burn in."""
+```
+
+**Integration into the sequence file:**
+```yaml
+sequence:
+  - type: video
+    file: "vlog.mp4"
+    auto_subtitles: true           # NEW
+    subtitle_language: "de"        # NEW (optional)
+    subtitle_style:                # NEW (optional)
+      font_size: 24
+      position: "bottom"
+```
+
+**CLI:**
+```bash
+video-cut subtitle --input video.mp4 --output video_sub.mp4
+video-cut subtitle --input video.mp4 --srt existing.srt    # use an existing SRT
+```
+
+**Acceptance criterion:** `video-cut subtitle --input test.mp4` produces a video with burned-in German subtitles.
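The SRT generation step is mostly timestamp formatting. The sketch below shows one possible shape for it, using a simplified stand-in for the plan's `Segment` (without the `words` field) and returning the SRT text rather than writing a file. The names `srt_timestamp` and `segments_to_srt_text` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Simplified stand-in for transcribe.Segment (no word list)."""
    text: str
    start: float  # seconds
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt_text(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the returned text to `output` with UTF-8 encoding would complete a `segments_to_srt()` along the lines of the signature in the plan.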
+
+---
+
+### Step 2b-3: Smart cutting (`smart_cut.py`)
+
+**Core idea:** Whisper provides word-level timestamps, which enable far more than dB-based silence detection alone.
+
+```python
+# smart_cut.py
+from dataclasses import dataclass
+from pathlib import Path
+
+from .transcribe import Segment
+
+# German and English filler words
+# (multi-word entries such as "you know" need phrase matching, not a per-word lookup)
+FILLER_WORDS_DE = {"äh", "ähm", "also", "quasi", "sozusagen", "halt", "naja", "ne"}
+FILLER_WORDS_EN = {"uh", "um", "like", "you know", "basically", "actually", "so"}
+
+@dataclass
+class CutDecision:
+    start: float
+    end: float
+    reason: str        # "silence" | "filler" | "false_start" | "repeat"
+    confidence: float
+
+def detect_fillers(
+    segments: list[Segment],
+    filler_words: set[str] | None = None,
+    language: str = "de",
+) -> list[CutDecision]:
+    """Find filler words in the transcript and mark them as cut candidates."""
+    # Check every word: is it a filler word?
+    # Time range of the word → CutDecision(reason="filler")
+
+def detect_false_starts(segments: list[Segment]) -> list[CutDecision]:
+    """Detect false starts: a sentence begins, breaks off, begins again."""
+    # Heuristic: segment < 3 words, followed by a pause > 0.3s,
+    # followed by a new segment that starts similarly
+    # → CutDecision(reason="false_start")
+
+def detect_long_pauses(
+    segments: list[Segment],
+    max_pause: float = 1.0,
+    keep_pause: float = 0.3,
+) -> list[CutDecision]:
+    """Detect pauses between segments and shorten them to the desired length."""
+    # Pause between segment N and N+1 > max_pause?
+    # → Shorten to keep_pause seconds
+
+def smart_remove(
+    video: Path,
+    output: Path,
+    model_size: str = "base",
+    remove_fillers: bool = True,
+    remove_false_starts: bool = True,
+    shorten_pauses: bool = True,
+    max_pause: float = 1.0,
+    language: str | None = None,
+) -> tuple[Path, list[CutDecision]]:
+    """Smart cut: transcribe → analyze → cut."""
+    # 1. Transcribe (transcribe.py)
+    # 2. Find filler words
+    # 3. Find false starts
+    # 4. Analyze pauses
+    # 5. Merge all CutDecisions
+    # 6. Compute the inverse time ranges (like cutter.invert_ranges)
+    # 7. Cut out the clips and join them
+    # Returns: finished video + list of cuts (for review)
+```
+
+**CLI:**
+```bash
+# Smart cut (replaces --remove-silence)
+video-cut smart-cut --input video.mp4
+video-cut smart-cut --input video.mp4 --keep-fillers --no-false-starts
+video-cut smart-cut --input video.mp4 --max-pause 0.5
+
+# Show the analysis only, without cutting
+video-cut smart-cut --input video.mp4 --analyze-only
+```
+
+Output of `--analyze-only`:
+```
+Transcription: 342 words, 2:45 total duration
+Found:
+  12x filler words (äh, ähm, also)       → saves 4.2s
+   3x false starts                       → saves 6.8s
+   8x pauses > 1.0s (shorten to 0.3s)    → saves 9.1s
+                                           ─────────────
+Estimated savings: 20.1s (12% of the video)
+```
+
+**Integration into the sequence file:**
+```yaml
+sequence:
+  - type: video
+    file: "vlog.mp4"
+    smart_cut: true               # NEW: replaces remove_silence
+    remove_fillers: true          # NEW
+    remove_false_starts: true     # NEW
+    max_pause: 0.8                # NEW
+```
+
+**Acceptance criterion:** `video-cut smart-cut --input test.mp4 --analyze-only` shows a breakdown of the detected cut candidates. `video-cut smart-cut --input test.mp4` produces a video without filler words and with shortened pauses.
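The core of the smart-cut pipeline (find cut ranges, merge them, invert them into keep ranges) can be sketched on a flat word list. This is a simplification under stated assumptions: it works on words rather than the plan's segments, returns plain `(start, end)` tuples instead of `CutDecision`, and its `invert_ranges` is a stand-in for the existing `cutter.invert_ranges`, whose actual behavior is not shown in the plan.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


Range = tuple[float, float]
FILLERS = {"äh", "ähm", "uh", "um"}  # reduced set for illustration


def detect_fillers(words: list[Word], fillers: set[str] = FILLERS) -> list[Range]:
    """Cut ranges for single-word fillers (punctuation and case ignored)."""
    return [(w.start, w.end) for w in words
            if w.text.strip(" .,!?").lower() in fillers]


def detect_long_pauses(words: list[Word], max_pause: float = 1.0,
                       keep_pause: float = 0.3) -> list[Range]:
    """Cut the middle of each long gap so that keep_pause seconds remain."""
    cuts = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end > max_pause:
            cuts.append((prev.end + keep_pause / 2, nxt.start - keep_pause / 2))
    return cuts


def merge_ranges(ranges: list[Range]) -> list[Range]:
    """Sort ranges and fuse the ones that overlap or touch."""
    merged: list[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def invert_ranges(cuts: list[Range], total: float) -> list[Range]:
    """Turn cut ranges into the ranges to keep within [0, total]."""
    keep, cursor = [], 0.0
    for start, end in merge_ranges(cuts):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total:
        keep.append((cursor, total))
    return keep
```

The keep ranges are what would finally be cut out of the source and joined. Summing the lengths of the merged cut ranges gives the savings figure reported by `--analyze-only`.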
+
+---
+
+## Phase 2c: AI extended (LLM integration)
+
+### New files
+
+```
+auto_video_cut/
+├── chapters.py       ← Auto chapters via LLM
+├── highlights.py     ← Highlight reel
+└── describe.py       ← Natural-language sequence creation
+```
+
+---
+
+### Step 2c-1: Auto chapters (`chapters.py`)
+
+**Workflow:**
+```
+Video → Whisper transcript → LLM (Claude) → chapters with titles
+```
+
+```python
+# chapters.py
+from dataclasses import dataclass
+
+from .transcribe import Segment
+
+@dataclass
+class Chapter:
+    title: str
+    start: float
+    end: float
+    summary: str
+
+def generate_chapters(
+    segments: list[Segment],
+    llm_provider: str = "anthropic",  # anthropic | ollama
+    model: str = "claude-haiku-4-5-20251001",
+    language: str = "de",
+    max_chapters: int = 10,
+) -> list[Chapter]:
+    """Generate chapters from the transcript."""
+    # Prepare the transcript as text (with timestamps)
+    # → LLM prompt:
+    #   "You are a video editor. Analyze this transcript
+    #    and create meaningful chapters with short, concise titles.
+    #    Return the start timestamp and title for each chapter."
+    # → Parse the JSON response
+
+def chapters_to_youtube_format(chapters: list[Chapter]) -> str:
+    """Format chapters as a YouTube-compatible description."""
+    # 0:00 Intro
+    # 0:45 Arrival in Berlin
+    # 3:22 Restaurant visit
+    # ...
+
+def chapters_to_sequence_entries(chapters: list[Chapter]) -> list[dict]:
+    """Produce chapters as type:text entries for sequence.yaml."""
+    # Generate one text clip with the title for each chapter
+```
+
+**CLI:**
+```bash
+video-cut chapters --input video.mp4 --output chapters.txt
+video-cut chapters --input video.mp4 --format youtube
+video-cut chapters --input video.mp4 --format sequence    # → YAML snippet
+video-cut chapters --input video.mp4 --inject-titles      # insert text clips
+```
+
+**LLM configuration:**
+```yaml
+ai:
+  llm_provider: "anthropic"      # anthropic | ollama
+  llm_model: "claude-haiku-4-5-20251001"
+  anthropic_api_key: null         # or the ANTHROPIC_API_KEY env var
+  ollama_url: "http://localhost:11434"
+  ollama_model: "llama3"
+```
+
+**Acceptance criterion:** `video-cut chapters --input test.mp4 --format youtube` prints a YouTube-compatible chapter description.
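The YouTube output format comes down to one `timestamp title` line per chapter. A minimal sketch, assuming a reduced `Chapter` with only the two fields the formatter needs and an `m:ss` timestamp that switches to `h:mm:ss` past the hour; `youtube_timestamp` is an illustrative helper name.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """Reduced stand-in for chapters.Chapter (title and start only)."""
    title: str
    start: float  # seconds


def youtube_timestamp(seconds: float) -> str:
    """Format seconds as m:ss, or h:mm:ss from one hour onward."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapters_to_youtube_format(chapters: list[Chapter]) -> str:
    """One 'timestamp title' line per chapter, in chronological order."""
    ordered = sorted(chapters, key=lambda c: c.start)
    return "\n".join(f"{youtube_timestamp(c.start)} {c.title}" for c in ordered)


print(chapters_to_youtube_format([
    Chapter("Intro", 0.0),
    Chapter("Arrival in Berlin", 45.0),
    Chapter("Restaurant visit", 202.0),
]))
```

Sorting inside the formatter guards against an LLM response that returns chapters out of order.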
+
+---
+
+### Step 2c-2: Highlight reel (`highlights.py`)
+
+**Workflow:**
+```
+Video → transcript + scene detection → LLM scores scenes → pick the best → cut together
+```
+
+```python
+# highlights.py
+from dataclasses import dataclass
+from pathlib import Path
+
+from .transcribe import Segment
+
+@dataclass
+class ScoredScene:
+    start: float
+    end: float
+    score: float           # 0.0–1.0
+    reason: str            # Why this scene is interesting
+
+def score_scenes(
+    segments: list[Segment],
+    scenes: list[TimeRange],
+    llm_provider: str = "anthropic",
+) -> list[ScoredScene]:
+    """Score scenes by how interesting they are."""
+    # For each scene: extract the matching transcript text
+    # Have the LLM score it:
+    #   - Contains a key statement?
+    #   - Emotional moment?
+    #   - New topic/location?
+    #   - Humor/surprise?
+
+def create_highlight_reel(
+    video: Path,
+    output: Path,
+    target_duration: float = 60.0,   # Target duration in seconds
+    model_size: str = "base",
+    crossfade: float = 0.3,
+) -> Path:
+    """Assemble a highlight reel automatically."""
+    # 1. Transcribe
+    # 2. Detect scenes
+    # 3. Score scenes
+    # 4. Select the best scenes (knapsack problem: maximize score, constrain duration)
+    # 5. Sort chronologically
+    # 6. Join with crossfades
+```
+
+**CLI:**
+```bash
+video-cut highlights --input video.mp4 --duration 60
+video-cut highlights --input video.mp4 --duration 120 --output best_of.mp4
+```
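The scene selection step is a 0/1 knapsack: maximize the total score while the summed scene length stays within the target duration. The sketch below uses the textbook dynamic program over whole seconds. Rounding each scene up to full seconds is a simplifying assumption, as are the function name `select_highlights` and the reduced `ScoredScene` without the `reason` field.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredScene:
    start: float
    end: float
    score: float  # 0.0 to 1.0


def select_highlights(scenes: list[ScoredScene],
                      target_duration: float) -> list[ScoredScene]:
    """Pick the scene subset with maximal total score within target_duration."""
    capacity = int(target_duration)
    # Weight of a scene = its length, rounded up to whole seconds.
    weights = [max(1, math.ceil(s.end - s.start)) for s in scenes]
    # best[c] = (score, chosen indices) achievable with at most c seconds.
    best: list[tuple[float, tuple[int, ...]]] = [(0.0, ())] * (capacity + 1)
    for i, (scene, weight) in enumerate(zip(scenes, weights)):
        # Iterate capacities downward so each scene is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight][0] + scene.score
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - weight][1] + (i,))
    chosen = best[capacity][1]
    # Return in chronological order, as the plan's step 5 requires.
    return sorted((scenes[i] for i in chosen), key=lambda s: s.start)
```

The crossfades in the last step shorten the final reel slightly, so a real implementation might add the overlap back to the capacity. The sketch ignores that.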
+
+---
+
+### Step 2c-3: Natural-language sequence (`describe.py`)
+
+**Workflow:**
+```
+Natural-language description → LLM → sequence.yaml
+```
+
+```python
+# describe.py
+from pathlib import Path
+
+def generate_sequence(
+    description: str,
+    available_resources: dict,    # Files found in resources/
+    available_files: list[Path],  # Files in the given folder
+    config: dict,
+) -> str:
+    """Generate a sequence.yaml from a natural-language description."""
+    # LLM prompt:
+    #   "You are a video editor. Create a sequence.yaml
+    #    based on the following description:
+    #    '<description>'
+    #
+    #    Available resources:
+    #    - Music: <list>
+    #    - Intros: <list>
+    #    - Images: <list>
+    #    - Videos in the folder: <list>
+    #
+    #    Format of sequence.yaml: <schema>"
+```
+
+**CLI:**
+```bash
+video-cut describe "Make a travel vlog from ./berlin/, remove silence, add an intro, calm music"
+# → Generates sequence.yaml and shows a preview
+
+video-cut describe "..." --execute    # Render directly
+```
+
+---
+
+## Updated project structure (after Phase 2)
+
+```
+auto_video_cut/
+├── __init__.py
+├── cli.py              ← CLI (extended: smart-cut, transcribe, subtitle, chapters, highlights, describe, bot)
+├── config.py           ← Configuration (extended: ai block, ducking, transitions)
+├── runner.py            ← NEW: central ffmpeg execution with progress
+├── progress.py          ← NEW: progress display (rich)
+├── cutter.py           ← Silence/scenes (unchanged, but uses runner.py)
+├── audio.py            ← Music mixing (extended: ducking option)
+├── ducking.py           ← NEW: audio ducking
+├── merger.py           ← Joining clips (extended: crossfade support)
+├── transitions.py       ← NEW: crossfade, fade-in/out
+├── text.py             ← Text overlays (unchanged)
+├── sequencer.py        ← Sequence file (extended: smart_cut, auto_subtitles, transitions)
+├── transcribe.py        ← NEW: Whisper transcription
+├── subtitles.py         ← NEW: generate + burn in SRT
+├── smart_cut.py         ← NEW: filler-word/false-start detection
+├── chapters.py          ← NEW: auto chapters via LLM
+├── highlights.py        ← NEW: highlight reel
+├── describe.py          ← NEW: natural-language sequences
+└── bot.py               ← NEW: Discord bot (Phase 3)
+```
+
+---
+
+## Dependencies (complete)
+
+```toml
+[project]
+dependencies = [
+    "typer>=0.12",
+    "pyyaml>=6.0",
+    "scenedetect[opencv]>=0.6",
+    "ffmpeg-python>=0.2",
+    "rich>=13.0",
+    "discord.py>=2.3",
+]
+
+[project.optional-dependencies]
+ai = [
+    "faster-whisper>=1.0",
+    "anthropic>=0.40",
+]
+```
+
+- `pip install -e .` → CLI + Discord (without AI)
+- `pip install -e ".[ai]"` → everything, including Whisper + LLM
+
+---
+
+## Implementation order
+
+| # | Step | Dependencies | Effort |
+|---|------|--------------|--------|
+| 1 | `runner.py` + `progress.py` | none | small |
+| 2 | Migrate existing modules to `runner.py` | #1 | small |
+| 3 | `transitions.py` (crossfade, fade) | #1 | medium |
+| 4 | `merger.py` crossfade integration | #3 | medium |
+| 5 | `ducking.py` + `audio.py` integration | #1 | medium |
+| 6 | Preview + dry run in `cli.py` | #1 | small |
+| 7 | `transcribe.py` (Whisper) | none | medium |
+| 8 | `subtitles.py` (SRT + burn-in) | #7 | small |
+| 9 | `smart_cut.py` (filler words, false starts) | #7 | medium |
+| 10 | `chapters.py` (LLM) | #7 | medium |
+| 11 | `highlights.py` (LLM + scenes) | #7, #10 | large |
+| 12 | `describe.py` (natural language) | #10 | medium |
+| 13 | `bot.py` (Discord) | all | large |
+
+**Recommended start:** #1 → #2 → #7 → #9 → #8 (progress + Whisper core first)
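The dependency column of the table can be checked mechanically. The sketch below encodes it as a graph, verifies that the recommended start order never runs a step before its prerequisites, and derives one valid full order with the standard library's `graphlib`. Reading "all" for step 13 as steps 1 to 12 is an interpretation, and `respects_dependencies` is an illustrative helper name.

```python
from graphlib import TopologicalSorter

# Step number -> set of steps it depends on, as read from the table.
DEPENDENCIES: dict[int, set[int]] = {
    1: set(), 2: {1}, 3: {1}, 4: {3}, 5: {1}, 6: {1},
    7: set(), 8: {7}, 9: {7}, 10: {7}, 11: {7, 10}, 12: {10},
    13: set(range(1, 13)),  # "all" taken to mean steps 1-12
}


def respects_dependencies(order: list[int], deps: dict[int, set[int]]) -> bool:
    """True if every step appears only after all of its prerequisites."""
    done: set[int] = set()
    for step in order:
        if not deps[step] <= done:
            return False
        done.add(step)
    return True


recommended_start = [1, 2, 7, 9, 8]
print(respects_dependencies(recommended_start, DEPENDENCIES))

# One valid complete order; prerequisites always come first.
full_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(full_order)
```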
+
+---
+
+## Risks
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Whisper model too slow on CPU | AI features unusable | `tiny`/`base` as default, document GPU support |
+| faster-whisper API changes | Import errors | Pin the version, import fallback |
+| LLM API costs (Anthropic) | Unexpected costs with many videos | Haiku as default, cost warning in the CLI |
+| ffmpeg `xfade` compatibility | Older ffmpeg versions lack `xfade` | Version check at startup, fall back to a hard cut |
+| Filler-word detection is error-prone | German "also" at the start of a sentence gets cut | Context analysis (standalone filler words only), `--analyze-only` for review |