CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining

Robotics: Science and Systems (RSS) 2026

I-Chun Arthur Liu, Krzysztof Choromanski, Sandy Huang, Connor Schenck

Summary

CLAMP is a 3D pre-training framework for robotic manipulation that learns image and action representations from large-scale simulated robot trajectories via contrastive learning. From RGB-D images and camera extrinsics, it builds a merged point cloud and re-renders multi-view four-channel observations (depth + 3D coordinates), including dynamic wrist views, to give clearer views of target objects for high-precision tasks. The pre-trained encoders, combined with a Diffusion Policy initialized during pre-training, are fine-tuned on a small number of task demonstrations and outperform state-of-the-art baselines across six simulated and five real-world tasks.
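The summary describes three technical steps: merging RGB-D frames into one point cloud using camera extrinsics, re-rendering four-channel (depth + 3D coordinate) views from new viewpoints, and training with a contrastive objective between observation and action embeddings. The sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. Everything in it is an assumption: pinhole intrinsics `K`, 4x4 homogeneous camera-to-world and world-to-camera matrices, nearest-point z-buffer splatting for re-rendering, a symmetric CLIP-style InfoNCE loss, and the hypothetical helper names `backproject`, `merge_views`, `render_four_channel` and `info_nce`. For a dynamic wrist view, `world_to_cam` would presumably be derived from the current gripper pose, which this sketch does not model.

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift an HxW depth map (meters) to world-frame 3D points, shape (N, 3).
    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid."""
    h, w = depth.shape
    v, u = np.indices((h, w))
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    return (pts_cam @ cam_to_world.T)[:, :3]

def merge_views(depths, intrinsics, extrinsics):
    """Hypothetical helper: concatenate back-projected points from all cameras."""
    return np.concatenate(
        [backproject(d, K, T) for d, K, T in zip(depths, intrinsics, extrinsics)], axis=0
    )

def render_four_channel(points, K, world_to_cam, h, w):
    """Project world points into a virtual camera, keeping the nearest point per pixel.
    Returns a (4, h, w) array: channel 0 is depth, channels 1-3 are world xyz.
    Pixels that receive no point stay 0 (this simple splatting can leave holes)."""
    ones = np.ones((len(points), 1))
    pts_cam = (np.hstack([points, ones]) @ world_to_cam.T)[:, :3]
    z = pts_cam[:, 2]
    front = z > 1e-6  # drop points behind the virtual camera
    pts_cam, pts_world, z = pts_cam[front], points[front], z[front]
    u = np.round(pts_cam[:, 0] * K[0, 0] / z + K[0, 2]).astype(int)
    v = np.round(pts_cam[:, 1] * K[1, 1] / z + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, pts_world = u[inside], v[inside], z[inside], pts_world[inside]
    # Z-buffer: sort by pixel, then by depth, and keep the first (nearest) hit per pixel.
    pix = v * w + u
    order = np.lexsort((z, pix))
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]
    out = np.zeros((4, h, w))
    out[0, v[keep], u[keep]] = z[keep]
    out[1:, v[keep], u[keep]] = pts_world[keep].T
    return out

def info_nce(obs_emb, act_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch; row i of each input forms a positive pair,
    all other rows serve as negatives (an assumed, CLIP-style objective)."""
    a = obs_emb / np.linalg.norm(obs_emb, axis=1, keepdims=True)
    b = act_emb / np.linalg.norm(act_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Keeping the nearest point per pixel mimics what a real depth camera would see from the new viewpoint, so occluded geometry does not leak into the re-rendered image; storing world xyz alongside depth gives every pixel an absolute 3D position regardless of the virtual camera pose.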
