AI that truly understands video. Uses multimodal models (Marengo/Pegasus) to search, analyze & generate text from video content at scale.
This is the 3rd launch from TwelveLabs.
Pegasus 1.5 by TwelveLabs
Launching today
Pegasus 1.5 transforms raw video into consistent, structured, timestamped data on the fly. Video becomes a queryable and computable asset, shaped by your company’s custom requirements. Define a schema of what matters in your domain, point it at any video up to 2 hours, and get back structured, time-based metadata in a single API call. And it’s multimodal: pass in an image, and find any time that reference appears in your video. Your video library, finally queryable for humans and agents.
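As a rough illustration of that workflow — the endpoint, payload shape, and schema fields below are assumptions for the sketch, not the documented TwelveLabs API — a single call might look like this:

```python
import requests

# Hypothetical endpoint and payload shape: an illustration of the
# "schema in, timestamped metadata out" workflow described above,
# not the actual TwelveLabs API surface.
API_URL = "https://api.twelvelabs.io/v1/analyze"  # illustrative URL
API_KEY = "tlk_..."  # placeholder API key

payload = {
    "video_url": "https://example.com/keynote.mp4",  # any video up to 2 hours
    # Define a schema of what matters in your domain:
    "schema": {
        "segments": [
            {"label": "speaker_change",
             "description": "a new speaker begins talking"},
            {"label": "logo_on_screen",
             "description": "the company logo is visible"},
        ]
    },
}

resp = requests.post(API_URL, json=payload, headers={"x-api-key": API_KEY})
resp.raise_for_status()

# One call back: structured, time-based metadata for the whole video.
for seg in resp.json()["segments"]:
    print(f'{seg["start"]:.1f}s - {seg["end"]:.1f}s  {seg["label"]}')
```

The labels here (speaker_change, logo_on_screen) are placeholders; the point is the one-call shape of the workflow, with the schema defined by you rather than fixed by the model.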
I had a blast collaborating with @emilykurze and the @TwelveLabs team on this launch.
Pegasus 1.5 is a significant leap in generative video AI: autonomous and reliable segmentation, long-form video support (up to 2 hours), and SOTA performance (30% better than Gemini 3 Pro, 3.1 Pro, and 3 Flash).
Try the free Playground at twelvelabs.io. Looking forward to seeing what you're building!
TwelveLabs
Hi, Jae here. I’m the CEO of @TwelveLabs
Today, we’re launching Pegasus 1.5, the first video language model that turns video into queryable data assets. What would you build if your video were as queryable as text? Try the free Playground: twelvelabs.io
Video is the most opaque data source: it’s hard to know what’s in a video without simply watching it. Pegasus 1.5 lets you understand your video library autonomously, on the fly, and at scale. More than that, it future-proofs your archive and enables agents to actually navigate it with enriched, custom-defined metadata.
What’s New:
Time-Based Metadata: Generate custom, time-coded metadata based on your exact needs. Some examples: segment every time the speaker changes, segment every time my favorite basketball player dunks, and segment every time my logo appears on screen.
On-the-Fly Processing: Start with just one video and get value immediately. If you’re a creator who needs to chapterize your content for YouTube, with transcription and key events, upload the video to TwelveLabs and Pegasus 1.5 will give you exactly what you need.
Multimodal Prompting: Pass in an image and tell the model to show you every time the object in the image appears (see the sketch after this list). Try it for product placement or for tracking your favorite player across a game.
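As a loose sketch of the two prompting styles above — again, the endpoint, field names, and response shape are assumptions for illustration, not the documented TwelveLabs API:

```python
import requests

API_URL = "https://api.twelvelabs.io/v1/analyze"  # illustrative endpoint
HEADERS = {"x-api-key": "tlk_..."}  # placeholder API key

# Time-based metadata: describe the segments you want in plain language.
segments = requests.post(API_URL, headers=HEADERS, json={
    "video_id": "abc123",  # hypothetical ID of an uploaded game recording
    "prompt": "Segment every time #23 dunks, with start/end timestamps.",
}).json()

# Multimodal prompting: attach a reference image and ask where it appears.
with open("logo.png", "rb") as f:
    placements = requests.post(
        API_URL,
        headers=HEADERS,
        files={"image": f},  # the reference image to look for
        data={"video_id": "abc123",
              "prompt": "Show every time the object in this image appears."},
    ).json()

for seg in segments.get("segments", []):
    print(f'{seg["start"]}s-{seg["end"]}s: {seg["label"]}')
```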
We're proud to build a model that actually helps you understand your video content, in the way you want. We outperform top general models on segmentation and on multimodal inputs. We support 2 hours of video, more than twice what other models handle. And we’re far more cost-efficient. Check it out; we’d love your feedback!
@TwelveLabs @jaelee_ Is this a purely cloud-based API, or is there an on-prem/VPC option for enterprise security? For raw video data, moving 2-hour files to the cloud is always the bottleneck. Love the 'on-the-fly' processing promise.
TwelveLabs
@priya_kushwaha1 Excellent question! While we follow rigorous security standards in our cloud-based API, we know that for many industries & use cases, the compute needs to go to the data and not the other way around.
We're actively building deployment options to meet customers where their data already lives, whether that's a VPC, on-prem, or an air-gapped environment. Video AI shouldn't force you to move your most sensitive content to get value from it. We will have something exciting to announce in the near future. Stay tuned ;).
Serand
Impressive to see long-form support up to 2 hours. That’s where most current tools struggle, so this feels like a meaningful improvement.