Meta has once again open-sourced a groundbreaking model: Segment Anything Model 2 (SAM 2). Building on the success of the original SAM, SAM 2 extends promptable segmentation from images to video, enabling real-time segmentation of both.
Key features of SAM 2:
- First unified model for real-time, promptable object segmentation in both images and videos (a minimal usage sketch follows this list)
- Better image segmentation accuracy than the original SAM, and stronger video segmentation performance than prior work
- Strong zero-shot generalization, able to segment objects it has never seen before in any image or video
- Roughly one-third the interaction time required by previous models
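For single images, the workflow looks much like the original SAM: load a checkpoint, set an image, and prompt with clicks or boxes. The sketch below assumes Meta's released `sam2` Python package and a downloaded checkpoint; the exact config and checkpoint filenames are assumptions and may differ between releases.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint/config names are assumptions; use the ones shipped with the release you download.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Any RGB image as an HxWx3 numpy array.
image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # A single foreground click (x, y) prompts the model; it returns candidate masks with scores.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),  # 1 = foreground, 0 = background
    )

print(masks.shape, scores)  # masks: (num_candidates, H, W)
```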
SAM 2 can accurately segment and track a wide range of objects in videos (see the tracking sketch after this list), including:
- Moving objects like soccer balls and playing cards
- Deformable objects like dough being kneaded
- Colorful fish swimming
- Microscopic cells
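On video, the same click prompts are propagated through time to produce masklets. The sketch below mirrors the video-predictor usage in Meta's repository, but the argument names and frame-directory layout are assumptions; check the released package for specifics.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Checkpoint/config names are assumptions; match them to the release you download.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state reads the video frames (e.g. a directory of extracted JPEGs).
    state = predictor.init_state(video_path="./video_frames")

    # Prompt object 1 with a single click on frame 0; SAM 2 returns a mask for that frame immediately.
    frame_idx, object_ids, masks = predictor.add_new_points(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompt through the rest of the video to get a masklet for each object.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # e.g. threshold masks[i] > 0 and save per tracked object id
```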
Along with the model, Meta also released the SA-V dataset:
- Contains ~51,000 real-world videos
- Over 600,000 spatio-temporal masks (masklets)
- 4.5x more videos than the largest existing video segmentation dataset
- 53x more annotations than that dataset
This dataset will greatly accelerate visual data annotation and help build better computer vision systems.
Potential applications of SAM 2 include:
- Creative video effects when combined with generative video models
- Tracking endangered animals in drone footage
- Locating regions in laparoscopic camera feeds during medical procedures
- Real-time video editing and live streaming effects
- Faster annotation of visual training data for computer vision systems, such as those used in self-driving vehicles
The open-sourcing of SAM 2 continues Meta's commitment to advancing AI through open collaboration. The model and code are released under the permissive Apache 2.0 license, and the SA-V dataset under CC BY 4.0, allowing both academic and commercial use.
By releasing SAM 2, Meta aims to empower the AI community to build innovative applications and make new discoveries in computer vision. The unified image and video segmentation capabilities of SAM 2 open up exciting possibilities across industries from content creation to scientific research.