WoolyAI - Hypervise & Maximize GPU Infra


Run unified, portable (Nvidia & AMD) PyTorch ML GPU containers


WoolyAI is now available as software that can be installed on-premises and on cloud GPU instances. With WoolyAI, you can run your PyTorch ML workloads in unified, portable (Nvidia and AMD) GPU containers, increasing GPU throughput from 40-50% to 80-90%.
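As a rough illustration of the kind of workload this targets: ordinary PyTorch code with no vendor-specific changes, where the container maps the "cuda" device to whichever GPU vendor the host actually provides. This is a minimal sketch, not taken from WoolyAI's documentation:

```python
import torch
import torch.nn as nn

# Ordinary PyTorch code: nothing here is Nvidia- or AMD-specific.
# Inside a portable GPU container, the "cuda" device would be backed
# by whichever vendor's hardware the host exposes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```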

Manisha Arora
Hi, we're a small team of OS, virtualization, and ML engineers, and after three years of development we're thrilled to launch the beta of our CUDA abstraction layer!

We decouple kernel/shader execution from the applications that use CUDA and move it into a Wooly abstraction layer. Inside this layer, the CUDA workloads are compiled into a new binary, and shaders are compiled into a Wooly Instruction Set. At runtime, a kernel launch event triggers the transfer of the shader over the network from a CPU host to a GPU host, where it is recompiled. Its execution is managed by the Wooly Server software to achieve maximum GPU resource utilization, isolation between workloads, and cross-compatibility across hardware vendors, before being handed to the respective GPU runtime and drivers. In principle, the Wooly abstraction layer is similar to an operating system: it sits on top of the hardware and enables the most efficient and reliable execution of multiple workloads.

On top of this abstraction layer we built a GPU cloud service (the WoolyAI Acceleration Service) with "actual GPU resources used" billing, NOT "GPU time used" billing. Looking forward to your feedback and comments.
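A very rough sketch of the launch-event flow described above. Every name here (WoolyClient, WoolyServer, recompile, and so on) is hypothetical and illustrative only, not taken from the actual WoolyAI software:

```python
# Hypothetical sketch of the client/server split described in the launch
# post: the CPU host intercepts kernel launches and ships a portable
# shader binary to the GPU host, which recompiles and schedules it.

class WoolyClient:
    """Runs on the CPU host where the PyTorch process lives."""

    def __init__(self, transport):
        self.transport = transport  # e.g. a network connection to the GPU host

    def on_kernel_launch(self, wooly_shader: bytes, launch_args: dict) -> bytes:
        # A kernel launch event does not execute locally; the portable
        # Wooly Instruction Set binary is shipped to the GPU host.
        request = {"shader": wooly_shader, "args": launch_args}
        return self.transport.send(request)


class WoolyServer:
    """Runs on the GPU host, which owns the real runtime and drivers."""

    def __init__(self, backend: str):
        self.backend = backend  # "cuda" on Nvidia hosts, "rocm" on AMD hosts

    def handle(self, request: dict) -> bytes:
        native = self.recompile(request["shader"])  # portable IS -> native kernel
        return self.schedule_and_run(native, request["args"])

    def recompile(self, wooly_shader: bytes) -> bytes:
        raise NotImplementedError("vendor-specific compilation happens here")

    def schedule_and_run(self, native_kernel: bytes, args: dict) -> bytes:
        raise NotImplementedError("shared-GPU scheduling and isolation here")
```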
Manisha Arora

@masump We capture the specific optimizations and carry them over to the vendor-specific equivalents where they exist. For example, if the PTX contains specific optimizations, we do transfer those.

Denis 🐝

Congrats on the launch! Hopefully, the three years of development pay off

Mohamed Zakarya

Love the vision behind this! Wishing you all the success on Product Hunt and beyond. 🌟

Fabio Caironi
Hey @manisha_arora6 congrats on the launch! Very interesting product. I checked out your demo and I have a couple of questions:
- How does the abstraction layer's performance compare to native CUDA execution? Have you run any benchmarks showing efficiency, particularly for deep learning training and inference?
- Since the system transfers shaders over the network before execution, what is the impact on latency? How does the recompilation process affect real-time workloads, especially for inference-heavy applications?
Lastly, good call on billing for usage vs. time! I think that's the way to go for new serverless frameworks. 👏🏼
Manisha Arora

@fabcairo thanks for the feedback. Performance is close to native, with some overhead from the utilization metrics that the abstraction layer collects. The ability to parallelize execution of concurrent workloads, however, is much greater than with native CUDA.

Shanza Khan

WoolyAI is a CUDA abstraction layer. On top of this layer, we have built a GPU cloud service (the WoolyAI Acceleration Service) with "actual GPU resources used" billing, NOT "GPU time used" billing, which lets data scientists run PyTorch applications from a CPU-only environment.


lee Jackson

WoolyAI is turbocharging my AI projects! 🚀 Loving the speed and ease. Like if you're into AI acceleration! Wishing you all the best with your AI ventures!

Eli Alderson

Interesting. What do you think about also including VRAM as a pricing metric, or do you already mean VRAM when you talk about memory?
Anyway, good luck to you!

Manisha Arora

@ghost_jobs Our current utilization model is based on measuring VRAM during kernel execution time, not idle time.
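To make the difference between the two billing models concrete, here is a toy comparison; all rates and measurements below are made-up examples, not WoolyAI pricing:

```python
# Toy comparison of "GPU time used" vs "actual GPU resources used" billing.
# Every number here is an illustrative assumption, not a real price.

session_hours = 4.0          # wall-clock time the GPU was reserved
busy_hours = 1.5             # hours kernels were actually executing
avg_vram_gb_while_busy = 10  # VRAM measured during kernel execution only

rate_per_gpu_hour = 2.00     # $/hour for a whole reserved GPU
rate_per_core_hour = 1.20    # $/hour of measured compute
rate_per_gb_hour = 0.05      # $/GB-hour of measured VRAM

# Traditional model: pay for the whole reservation, idle or not.
time_based_bill = session_hours * rate_per_gpu_hour

# Usage model: pay only for compute and VRAM during kernel execution.
usage_based_bill = (busy_hours * rate_per_core_hour
                    + busy_hours * avg_vram_gb_while_busy * rate_per_gb_hour)

print(f"time-based:  ${time_based_bill:.2f}")   # $8.00
print(f"usage-based: ${usage_based_bill:.2f}")  # $2.55
```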