Catherine-R-He/EntityBench

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

Ruozhen He¹,³, Meng Wei¹, Ziyan Yang², Vicente Ordonez³

¹ByteDance  ·  ²ByteDance Seed  ·  ³Rice University


Abstract

Multi-shot video generation extends single-shot generation to coherent visual narratives, yet maintaining consistent characters, objects, and locations across shots remains a challenge over long sequences. Existing evaluations typically use independently generated prompt sets with limited entity coverage and simple consistency metrics, making standardized comparison across methods difficult. We introduce EntityBench, a benchmark of 140 episodes (2,491 shots) derived from real narrative media, with explicit per-shot entity schedules tracking characters, objects, and locations simultaneously across easy / medium / hard tiers of up to 50 shots, 13 cross-shot characters, 8 cross-shot locations, 22 cross-shot objects, and recurrence gaps spanning up to 48 shots. EntityBench pairs the dataset with a three-pillar evaluation suite that disentangles intra-shot visual quality, prompt-following alignment, and cross-shot entity consistency, with a fidelity gate that admits only accurate entity appearances into cross-shot scoring. To establish baselines, we propose EntityMem, a memory-augmented generation system that stores verified per-entity visual references in a persistent memory bank before generation begins, enabling the video backbone to retrieve each entity's appearance across shots. Experiments show that cross-shot entity consistency degrades sharply with recurrence distance in existing methods, and that explicit per-entity memory yields the highest character fidelity (Cohen's d = +2.33) and presence among methods evaluated.
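
To make the notion of a recurrence gap concrete, here is a minimal sketch that computes, for each entity, the largest distance between two consecutive appearances. The `schedule` structure (one set of entity names per shot) is hypothetical and is not the schema of the released episode JSONs. Measuring the gap as the difference between shot indices is also an assumption; the benchmark may count it differently.

```python
from collections import defaultdict


def recurrence_gaps(schedule):
    """Return the largest gap between consecutive appearances per entity.

    `schedule` is a hypothetical structure: a list with one set of entity
    names per shot, in shot order. The gap is the difference between shot
    indices, which is an assumption made for illustration.
    """
    last_seen = {}
    max_gap = defaultdict(int)
    for shot_idx, entities in enumerate(schedule):
        for name in entities:
            if name in last_seen:
                gap = shot_idx - last_seen[name]
                max_gap[name] = max(max_gap[name], gap)
            last_seen[name] = shot_idx
    # Entities that appear only once never recur and are left out.
    return dict(max_gap)


# Toy five-shot episode with two characters and one location.
schedule = [
    {"alice", "kitchen"},
    {"alice"},
    {"bob"},
    {"kitchen", "bob"},
    {"alice"},
]
gaps = recurrence_gaps(schedule)
print(gaps)
```

In this toy episode, `alice` and `kitchen` each return after a gap of 3 shots, while `bob` appears in adjacent shots.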


What's in this release

  • data/scripts/ — 140 episode JSONs (2,491 shots), each with the full per-shot entity schedule and registry prompts.
  • data/splits/ — easy / medium / hard tier splits.
  • eval/ — three Python files: evaluate_benchmark.py (single-GPU evaluator), run_eval_distributed.py (multi-GPU launcher), compare_methods.py (method-vs-method comparison).
  • examples/run_eval_example.sh — ready-to-edit launch script.

Setup

conda create -n entitybench python=3.11 -y
conda activate entitybench

pip install torch==2.5.1 torchvision --index-url https://download.pytorch.org/whl/cu124
pip install diffusers==0.36.0 transformers tqdm openai imageio imageio-ffmpeg
pip install groundingdino-py vbench pyiqa

LLM scoring for Pillar 2 and Pillar 3 uses an Azure-OpenAI-compatible endpoint, configured through two environment variables:

export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export GEMINI_API_KEYS="key1,key2,key3"     # comma-separated, rotated round-robin
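
The "rotated round-robin" comment above can be illustrated with a short sketch. It only shows the general idea of cycling through a comma-separated key list. The helper name `rotating_keys` is hypothetical, and this is not the evaluator's actual implementation.

```python
import itertools
import os


def rotating_keys(raw):
    """Yield keys from a comma-separated string in round-robin order.

    Hypothetical helper for illustration; not part of the EntityBench code.
    """
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    if not keys:
        raise ValueError("no API keys provided")
    return itertools.cycle(keys)


# Fall back to placeholder keys when the variable is not set.
raw = os.environ.get("GEMINI_API_KEYS", "key1,key2,key3")
keys = rotating_keys(raw)

# Each request takes the next key; after the last key it wraps around.
for _ in range(4):
    print(next(keys))
```

With the three placeholder keys, the fourth request reuses `key1`.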

Run

Generated videos must be laid out as <results_dir>/<episode_id>/<scene>_<shot>.mp4 (zero-padded, e.g. 001_001.mp4).
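
Before launching an evaluation, it can help to check that the results directory follows this layout. The sketch below flags files whose names do not match the pattern. The three-digit padding is inferred from the `001_001.mp4` example, and both the helper name `find_misnamed_videos` and the episode id `episode_0001` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: scene and shot are both zero-padded to three digits.
SHOT_NAME = re.compile(r"^\d{3}_\d{3}\.mp4$")


def find_misnamed_videos(results_dir):
    """Return files under <results_dir>/<episode_id>/ that break the pattern."""
    bad = []
    for episode_dir in sorted(Path(results_dir).iterdir()):
        if not episode_dir.is_dir():
            continue
        for video in sorted(episode_dir.iterdir()):
            if not SHOT_NAME.match(video.name):
                bad.append(video)
    return bad


# Demo on a throwaway directory with one valid and one invalid file name.
with tempfile.TemporaryDirectory() as tmp:
    episode = Path(tmp) / "episode_0001"  # hypothetical episode id
    episode.mkdir()
    (episode / "001_001.mp4").touch()
    (episode / "1_2.mp4").touch()
    misnamed = [p.name for p in find_misnamed_videos(tmp)]

print(misnamed)  # ['1_2.mp4']
```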

Multi-GPU (recommended)

python eval/run_eval_distributed.py \
  --n_gpus 8 \
  --results_dir <generated videos> \
  --scripts_dir data/scripts \
  --split_json  data/splits/final_split_validated_ids.json \
  --out_dir     eval_results/<method_name> \
  --method_name <method_name> \
  --pillars 1,2,3 \
  --llm_concurrency 5 \
  --resume

Single GPU

python eval/evaluate_benchmark.py \
  --results_dir <generated videos> \
  --scripts_dir data/scripts \
  --split_json  data/splits/final_split_validated_ids.json \
  --out_dir     eval_results/<method_name> \
  --method_name <method_name> \
  --pillars 1,2,3 \
  --llm_concurrency 5 \
  --resume

Comparisons

python eval/compare_methods.py \
  --method_a eval_results/method_a --label_a method_a \
  --method_b eval_results/method_b --label_b method_b \
  --out_dir  comparison/

Citation

@article{he2026entitybench,
  title   = {EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation},
  author  = {He, Ruozhen and Meng, Wei and Yang, Ziyan and Ordonez, Vicente},
  journal = {Preprint},
  year    = {2026},
}
