Visual analysis

Visual analysis gives the AI agent "eyes" on your footage. It samples frames from your imported media, labels their content, and flags best moments, which improves highlight extraction, smart reframe, and B-roll selection.

What it does

  • Samples sparse frames from imported media: main track, overlay, or all footage
  • Caches samples for agent use
  • Can label (manually or via vision model):
    • Scene summaries
    • On-screen text
    • Tags (face, product, screen, etc.)
    • Best moments
  • Used by extract best moments, smart reframe, and B-roll selection
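As a rough mental model, a cached sample can be pictured as a frame reference plus optional labels. The sketch below is illustrative only: the type names, field names, and the `labelSample` helper are assumptions and not the product's actual schema.

```typescript
// Hypothetical shapes; field names are illustrative, not the product's schema.
type SampleSource = "main" | "overlay" | "all";

interface FrameLabels {
  sceneSummary?: string;
  onScreenText?: string[];
  tags?: string[]; // e.g. "face", "product", "screen"
  bestMoment?: boolean;
}

interface FrameSample {
  assetId: string;
  timeSec: number;
  source: SampleSource;
  labels: FrameLabels; // empty until labeled manually or by a vision model
}

// Return a labeled copy so the originally cached sample is left untouched.
function labelSample(sample: FrameSample, labels: FrameLabels): FrameSample {
  return { ...sample, labels: { ...sample.labels, ...labels } };
}

const raw: FrameSample = { assetId: "clip-1", timeSec: 12, source: "main", labels: {} };
const labeled = labelSample(raw, { tags: ["face"], bestMoment: true });
```

Keeping labels optional mirrors the behavior described above: sampling works on its own, and labels can be filled in later.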

When to use it

Run visual analysis before tools that benefit from visual context:

  • Before extract best moments → surfaces visually interesting shots
  • Before smart reframe → identifies faces, products, and screens
  • Before B-roll insert → helps pick relevant, visually appealing B-roll
  • Before creator template → enables scene-aware pacing decisions

How to use

Basic sampling:

"Sample visual analysis"

"Analyze the footage"

Targeted sampling:

"Sample frames from just the main track"

"Analyze the imported B-roll assets"

Analysis and caching only:

"Sample visual context and cache it for later"

What gets sampled

Source options:

| Source | Coverage |
|--------|----------|
| main track | Only main video clips |
| overlay tracks | Visual overlay elements |
| all imported | Everything in media bin |

Labelling options:

| Label | Helpful for |
|-------|-------------|
| Scene summary | Contextual editing decisions |
| On-screen text | Avoid cropping rendered text |
| Face tags | Face-biased reframe |
| Product tags | Product-focused editing |
| Best moments | Highlight extraction |

Caching and storage

  • Samples are stored in IndexedDB (browser storage)
  • They persist across sessions
  • Re-sampling overwrites the previous samples
  • Storage is per-project and is not shared across projects
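The overwrite and per-project rules above can be sketched with a small in-memory stand-in. This is not the real storage layer: the `SampleCache` class and its methods are hypothetical, and a `Map` replaces IndexedDB, so persistence across sessions is not modeled here.

```typescript
// In-memory stand-in for the IndexedDB-backed cache; all names are hypothetical.
interface CachedSample {
  assetId: string;
  timeSec: number;
}

class SampleCache {
  private byProject = new Map<string, CachedSample[]>();

  // Re-sampling replaces the project's previous samples wholesale.
  save(projectId: string, samples: CachedSample[]): void {
    this.byProject.set(projectId, [...samples]);
  }

  // Samples are scoped to one project; other projects never see them.
  load(projectId: string): CachedSample[] {
    return this.byProject.get(projectId) ?? [];
  }
}

const cache = new SampleCache();
cache.save("project-a", [
  { assetId: "clip-1", timeSec: 0 },
  { assetId: "clip-1", timeSec: 4 },
]);
// A second sampling run overwrites the first one.
cache.save("project-a", [{ assetId: "clip-2", timeSec: 2 }]);
```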

How it helps

Extract best moments

Without visual analysis:

  • Uses audio energy only

With visual analysis:

  • Audio energy + visual best-moment hints
  • Action shots, reactions, dynamic scenes score higher
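One way to picture the difference is a score that falls back to audio energy when no visual hint is cached. The sketch below is an assumption about how such blending could work: the 0.6/0.4 weights, the normalized 0..1 ranges, and every name in it are hypothetical.

```typescript
// Hypothetical scoring; the weights and field names are assumptions.
interface Segment {
  startSec: number;
  audioEnergy: number; // normalized 0..1
  visualBestMoment?: number; // normalized 0..1, present only after visual analysis
}

function scoreSegment(seg: Segment): number {
  // Without a visual hint, fall back to audio energy alone.
  if (seg.visualBestMoment === undefined) return seg.audioEnergy;
  return 0.6 * seg.audioEnergy + 0.4 * seg.visualBestMoment;
}

// Sort a copy so the caller's array keeps its original order.
function rankSegments(segments: Segment[]): Segment[] {
  return [...segments].sort((a, b) => scoreSegment(b) - scoreSegment(a));
}

const ranked = rankSegments([
  { startSec: 0, audioEnergy: 0.5, visualBestMoment: 0.1 },
  { startSec: 30, audioEnergy: 0.5, visualBestMoment: 0.9 },
]);
```

With equal audio energy, the segment carrying the stronger visual hint ranks first, which is the effect the bullets above describe.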

Smart reframe

Without visual analysis:

  • Heuristic crop centers on detected motion

With visual analysis:

  • Face/product/screen labels guide crop bias
  • "Face" tag → upward bias (keeps eyes visible)
  • "Product" tag → safe crop (preserves detail)
  • "Screen" tag → fit mode (no crop)
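The tag-to-crop rules above amount to a small lookup. In the sketch below, the mode names and the priority order used when a frame carries several tags are assumptions; the source does not say how conflicting tags are resolved.

```typescript
// Hypothetical mode names for the four behaviors described above.
type CropMode = "motion-center" | "upward-bias" | "safe-crop" | "fit";

function cropModeFor(tags: string[]): CropMode {
  // Priority order is an assumption: the most conservative mode wins.
  if (tags.includes("screen")) return "fit"; // no crop
  if (tags.includes("product")) return "safe-crop"; // preserve detail
  if (tags.includes("face")) return "upward-bias"; // keep eyes visible
  return "motion-center"; // no labels: heuristic crop on detected motion
}

const modeForFace = cropModeFor(["face"]);
const modeForUnlabeled = cropModeFor([]);
```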

B-roll insert

  • Labels help pick relevant B-roll (e.g., "reaction" footage for reaction inserts)
  • Best-moment labels rank B-roll quality
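A minimal sketch of that selection, assuming each B-roll asset carries tags and a numeric best-moment score: filter by the wanted tag, then take the highest-scoring candidate. The `BrollCandidate` shape and the `pickBroll` helper are hypothetical.

```typescript
// Hypothetical candidate shape; the real label format may differ.
interface BrollCandidate {
  assetId: string;
  tags: string[];
  bestMomentScore: number; // higher is better
}

function pickBroll(candidates: BrollCandidate[], wantedTag: string): BrollCandidate | undefined {
  // filter() returns a new array, so sorting it leaves the input untouched.
  const relevant = candidates.filter((c) => c.tags.includes(wantedTag));
  relevant.sort((a, b) => b.bestMomentScore - a.bestMomentScore);
  return relevant[0];
}

const choice = pickBroll(
  [
    { assetId: "a", tags: ["reaction"], bestMomentScore: 0.4 },
    { assetId: "b", tags: ["reaction"], bestMomentScore: 0.8 },
    { assetId: "c", tags: ["product"], bestMomentScore: 0.9 },
  ],
  "reaction",
);
```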

Limitations

  • Sparse sampling: not every frame is analyzed; roughly one frame every few seconds
  • Cached: analysis reflects the last sampling run, so re-sample after the footage changes
  • Vision-dependent: auto-labeling requires a configured vision model
  • Heuristic: labels are best-effort and can be wrong
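To make the sparse-sampling limitation concrete, the sketch below computes the timestamps that would be sampled for a clip. The 4-second default interval is an assumption chosen for illustration (the text above only says "a few seconds"), and `sampleTimestamps` is a hypothetical helper.

```typescript
// Hypothetical helper; the default interval is an illustrative assumption.
function sampleTimestamps(durationSec: number, intervalSec = 4): number[] {
  if (durationSec <= 0 || intervalSec <= 0) return [];
  const times: number[] = [];
  for (let t = 0; t < durationSec; t += intervalSec) {
    times.push(t);
  }
  return times;
}

// A 10-second clip yields samples at 0s, 4s, and 8s.
const times = sampleTimestamps(10);
```

Anything that happens between two sampled timestamps is invisible to the analysis, which is why short events can be missed.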

Configuration

Vision labeling (optional): automatic labeling requires a configured vision provider.

Vision analysis can be provided by:

  • Claude vision capabilities
  • Azure OpenAI GPT-4 Vision
  • AWS Bedrock vision models

If not configured, sampling still caches frames for manual label entry.

Tips

  • Sample before key tools: import → sample visual → extract highlights or reframe
  • Re-sample after major changes, such as new imports or long sessions
  • Use targeted sampling: full analysis can take time, so sample only what you need
  • Not required: all tools work without visual analysis; it is an enhancement
