Visual analysis

Sample and analyze frames from your imported media — useful for smart reframe, B-roll selection, and moment detection.

Give the AI agent "eyes" on your footage. Visual analysis samples frames, labels content, and finds best moments — improving highlight extraction and reframe decisions.

What it does

Samples sparse frames from imported media: main track, overlay, or all footage
Caches samples for agent use
Can label (manually or via vision model):
- Scene summaries
- On-screen text
- Tags (face, product, screen, etc.)
- Best moments
Used by extract best moments, smart reframe, and B-roll selection
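
To make the list above concrete, here is a minimal sketch of what one cached sample could look like. The `FrameSample` type, its field names, and the `isLabeled` helper are assumptions for illustration; the actual cached format is not documented here.

```typescript
// Hypothetical shape of one cached frame sample (illustrative field names).
interface FrameSample {
  assetId: string;          // imported clip the frame was taken from
  timeSec: number;          // position of the frame within that clip
  sceneSummary?: string;    // optional scene summary label
  onScreenText?: string[];  // optional on-screen text found in the frame
  tags: string[];           // e.g. "face", "product", "screen"
  bestMoment: boolean;      // true if flagged as a best moment
}

// A sample counts as labeled once any label has been added,
// whether manually or by a vision model.
function isLabeled(s: FrameSample): boolean {
  const hasSummary = s.sceneSummary !== undefined && s.sceneSummary.length > 0;
  const hasText = s.onScreenText !== undefined && s.onScreenText.length > 0;
  return hasSummary || hasText || s.tags.length > 0 || s.bestMoment;
}
```

A freshly sampled frame would carry only `assetId` and `timeSec`, with empty `tags`, until labels are added.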

When to use it

Before tools that benefit from visual context:

Before extract best moments → finds visually interesting shots
Before smart reframe → identifies faces, products, screens
Before B-roll insert → picks visually appealing B-roll
Before creator template → scene-aware pacing decisions

How to use

Basic sampling:

"Sample visual analysis"

"Analyze the footage"

Targeted sampling:

"Sample frames from just the main track"

"Analyze the imported B-roll assets"

Analysis and caching only:

"Sample visual context and cache it for later"

What gets sampled

Source options:

| Source | Coverage |
|--------|----------|
| main track | Only main video clips |
| overlay tracks | Visual overlay elements |
| all imported | Everything in media bin |
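
As a rough sketch of how the three source options narrow what gets sampled, the snippet below filters a list of assets. The `Asset` type, its `track` field, and `assetsToSample` are hypothetical names, not part of the product.

```typescript
// Hypothetical asset record: "none" means the asset sits in the media bin
// but is not placed on the timeline.
interface Asset {
  id: string;
  track: "main" | "overlay" | "none";
}

type SampleSource = "main track" | "overlay tracks" | "all imported";

// Returns the assets that the chosen source option would cover.
function assetsToSample(assets: Asset[], source: SampleSource): Asset[] {
  if (source === "all imported") {
    return assets.slice(); // everything in the media bin
  }
  const wanted = source === "main track" ? "main" : "overlay";
  return assets.filter((a) => a.track === wanted);
}
```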

Labelling options:

| Label | Helpful for |
|-------|-------------|
| Scene summary | Contextual editing decisions |
| On-screen text | Avoid cropping rendered text |
| Face tags | Face-biased reframe |
| Product tags | Product-focused editing |
| Best moments | Highlight extraction |

Caching and storage

Samples stored in IndexedDB (browser storage)
Persists across sessions
Re-sampling overwrites the previous samples
Per-project, not shared across projects
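
The overwrite and per-project rules above can be modeled with a small in-memory sketch. This is not the real storage layer: it uses a `Map` instead of IndexedDB so it runs anywhere, it does not model persistence across sessions, and `ProjectSampleCache` is a hypothetical name.

```typescript
// In-memory stand-in for the per-project sample cache (assumed design).
class ProjectSampleCache<T> {
  private store = new Map<string, T[]>();

  // Saving replaces whatever was cached for this project before.
  save(projectId: string, samples: T[]): void {
    this.store.set(projectId, samples.slice());
  }

  // Each project only ever sees its own samples.
  load(projectId: string): T[] {
    const found = this.store.get(projectId);
    return found ? found.slice() : [];
  }
}
```

Keying by project ID is what keeps one project's samples out of another's; a second `save` for the same ID discards the earlier set rather than merging.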

How it helps

Extract best moments

Without visual analysis:

Uses audio energy only

With visual analysis:

Audio energy + visual best-moment hints
Action shots, reactions, dynamic scenes score higher
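
One way to picture the difference is a score that starts from audio energy and gets a bonus when a visual best-moment hint exists. This is only a sketch: the `MomentCandidate` type, the additive formula, and the `VISUAL_BONUS` value are assumptions, since the actual weighting is not documented.

```typescript
// Hypothetical candidate shape (illustrative field names).
interface MomentCandidate {
  startSec: number;
  audioEnergy: number;         // assumed to be normalized to 0..1
  visualBestMoment?: boolean;  // undefined when no visual analysis is cached
}

// Assumed size of the boost for a visual best-moment hint.
const VISUAL_BONUS = 0.25;

function scoreMoment(c: MomentCandidate): number {
  const audio = Math.min(1, Math.max(0, c.audioEnergy));
  // Without visual analysis the bonus is never applied: audio energy only.
  return audio + (c.visualBestMoment === true ? VISUAL_BONUS : 0);
}

// Highest-scoring candidates first; the input array is left untouched.
function rankMoments(candidates: MomentCandidate[]): MomentCandidate[] {
  return candidates.slice().sort((a, b) => scoreMoment(b) - scoreMoment(a));
}
```

Under these assumptions, a slightly quieter shot with a best-moment hint can outrank a louder shot without one.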

Smart reframe

Without visual analysis:

Heuristic crop centers on detected motion

With visual analysis:

Face/product/screen labels guide crop bias
"Face" tag → upward bias (keeps eyes visible)
"Product" tag → safe crop (preserves detail)
"Screen" tag → fit mode (no crop)

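The tag-to-crop rules for smart reframe listed above can be sketched as a simple lookup. The strategy names and the precedence when a frame carries several tags (screen, then product, then face) are assumptions; only the individual tag behaviors come from the list.

```typescript
// Illustrative strategy names, one per behavior described above.
type CropStrategy = "motion-center" | "upward-bias" | "safe-crop" | "fit";

function chooseCropStrategy(tags: string[]): CropStrategy {
  // Assumed precedence: the most conservative option wins when tags overlap.
  if (tags.includes("screen")) return "fit";          // no crop
  if (tags.includes("product")) return "safe-crop";   // preserve detail
  if (tags.includes("face")) return "upward-bias";    // keep eyes visible
  return "motion-center";                             // no labels: heuristic fallback
}
```

With no cached labels the function falls back to the motion-centered heuristic, matching the "without visual analysis" case.
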
B-roll insert

Labels help pick relevant B-roll (e.g., "reaction" footage for reaction inserts)
Best-moment labels rank B-roll quality

Limitations

Sparse sampling: not every frame is analyzed, only roughly one frame every few seconds
Cached: analysis reflects the last sampling run; re-sample after changing footage
Vision-dependent: auto-labeling requires a configured vision model
Heuristic: labels are best-effort and can be wrong
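
To show how sparse the sampling limitation is in practice, the sketch below picks timestamps at a fixed interval. The 3-second default is an assumption standing in for "every few seconds", and `sampleTimes` is a hypothetical helper.

```typescript
// Returns the timestamps (in seconds) at which frames would be sampled.
// intervalSec defaults to an assumed 3 seconds.
function sampleTimes(durationSec: number, intervalSec: number = 3): number[] {
  if (durationSec <= 0 || intervalSec <= 0) {
    return [];
  }
  const times: number[] = [];
  for (let t = 0; t < durationSec; t += intervalSec) {
    times.push(t);
  }
  return times;
}

// sampleTimes(10) returns [0, 3, 6, 9]
```

At 30 frames per second, a 10-second clip has 300 frames, so four samples cover only a small fraction of it. Anything that happens between sample points is invisible to the labels.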

Configuration

Vision labeling (optional): Requires configuration.

Vision analysis can be provided by:

Claude vision capabilities
Azure OpenAI GPT-4 Vision
AWS Bedrock vision models

If not configured, sampling still caches frames for manual label entry.

Tips

Sample before key tools — workflow: import → sample visual → extract highlights/reframe
Re-sample after major changes — new imports, long sessions
Targeted sampling — full analysis can take time; sample just what you need
Not required — all tools work without visual analysis; it's an enhancement