WoolyAI - Hypervise & Maximize GPU Infra


Run unified, portable (Nvidia & AMD) PyTorch ML GPU containers


WoolyAI is now available as software that can be installed on-premises and on cloud GPU instances. With WoolyAI, you can run your PyTorch ML workloads in unified, portable (Nvidia and AMD) GPU containers, increasing GPU throughput from 40-50% to 80-90%.
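As a rough illustration of the kind of workload this targets: ordinary PyTorch code with no vendor-specific changes, where the container maps the "cuda" device to whichever GPU vendor the host actually provides. This is a minimal sketch, not taken from WoolyAI's documentation:

```python
import torch
import torch.nn as nn

# Ordinary PyTorch code: nothing here is Nvidia- or AMD-specific.
# Inside a portable GPU container, the "cuda" device would be backed
# by whichever vendor's hardware the host exposes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```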

Manisha Arora
Hi, we're a small team of OS, virtualization, and ML engineers, and after three years of development we're thrilled to launch the beta of our CUDA abstraction layer!

We decouple kernel/shader execution from the applications that use CUDA and move it into a Wooly abstraction layer. Inside this layer, the CUDA workloads are compiled into a new binary, and shaders are compiled into a Wooly Instruction Set. At runtime, a kernel launch event triggers the transfer of the shader over the network from a CPU host to a GPU host, where it is recompiled. Its execution is managed by the Wooly Server software to achieve maximum GPU resource utilization, isolation between workloads, and cross-compatibility across hardware vendors, before being handed to the respective GPU runtime and drivers. In principle, the Wooly abstraction layer is similar to an operating system: it sits on top of the hardware and enables the most efficient and reliable execution of multiple workloads.

On top of this abstraction layer we built a GPU cloud service (the WoolyAI Acceleration Service) with "actual GPU resources used" billing, NOT "GPU time used" billing. Looking forward to your feedback and comments.
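A very rough sketch of the launch-event flow described above. Every name here (WoolyClient, WoolyServer, recompile, and so on) is hypothetical and illustrative only, not taken from the actual WoolyAI software:

```python
# Hypothetical sketch of the client/server split described in the launch
# post: the CPU host intercepts kernel launches and ships a portable
# shader binary to the GPU host, which recompiles and schedules it.

class WoolyClient:
    """Runs on the CPU host where the PyTorch process lives."""

    def __init__(self, transport):
        self.transport = transport  # e.g. a network connection to the GPU host

    def on_kernel_launch(self, wooly_shader: bytes, launch_args: dict) -> bytes:
        # A kernel launch event does not execute locally; the portable
        # Wooly Instruction Set binary is shipped to the GPU host.
        request = {"shader": wooly_shader, "args": launch_args}
        return self.transport.send(request)


class WoolyServer:
    """Runs on the GPU host, which owns the real runtime and drivers."""

    def __init__(self, backend: str):
        self.backend = backend  # "cuda" on Nvidia hosts, "rocm" on AMD hosts

    def handle(self, request: dict) -> bytes:
        native = self.recompile(request["shader"])  # portable IS -> native kernel
        return self.schedule_and_run(native, request["args"])

    def recompile(self, wooly_shader: bytes) -> bytes:
        raise NotImplementedError("vendor-specific compilation happens here")

    def schedule_and_run(self, native_kernel: bytes, args: dict) -> bytes:
        raise NotImplementedError("shared-GPU scheduling and isolation here")
```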
Manisha Arora

@masump We capture the specific optimizations and carry them over to the vendor-specific equivalents where they exist. For example, if the PTX contains specific optimizations, we do transfer those.

Denis 🐝

Congrats on the launch! Hopefully, the three years of development pay off

Mohamed Zakarya

Love the vision behind this! Wishing you all the success on Product Hunt and beyond. 🌟

Fabio Caironi
Hey @manisha_arora6 congrats on the launch! Very interesting product. I checked out your demo and I have a couple of questions:
- How does the abstraction layer's performance compare to native CUDA execution? Have you run any benchmarks showing efficiency, particularly for deep learning training and inference?
- Since the system transfers shaders over the network before execution, what is the impact on latency? How does the recompilation process affect real-time workloads, especially for inference-heavy applications?
Lastly, good call on billing for usage vs. time! I think that's the way to go for new serverless frameworks. 👏🏼
Manisha Arora

@fabcairo thanks for the feedback. Performance is close to native, with some overhead from the utilization metrics that the abstraction layer collects. The ability to parallelize execution of concurrent workloads, however, is much greater than with native CUDA.

Shanza Khan

WoolyAI is a CUDA abstraction layer. On top of this layer, we have built a GPU cloud service (the WoolyAI Acceleration Service) with "actual GPU resources used" billing, NOT "GPU time used" billing, which lets data scientists run PyTorch applications from a CPU-only environment.


lee Jackson

WoolyAI is turbocharging my AI projects! 🚀 Loving the speed and ease. Like if you're into AI acceleration! Wishing you all the best with your AI ventures!

Eli Alderson

Interesting. What do you think about also including VRAM as a pricing metric, or do you already mean VRAM when you talk about memory?
Anyway, good luck to you!

Manisha Arora

@ghost_jobs Our current utilization model is based on measuring VRAM during kernel execution time, not idle time.
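To make the difference between the two billing models concrete, here is a toy comparison; all rates and measurements below are made-up examples, not WoolyAI pricing:

```python
# Toy comparison of "GPU time used" vs "actual GPU resources used" billing.
# Every number here is an illustrative assumption, not a real price.

session_hours = 4.0          # wall-clock time the GPU was reserved
busy_hours = 1.5             # hours kernels were actually executing
avg_vram_gb_while_busy = 10  # VRAM measured during kernel execution only

rate_per_gpu_hour = 2.00     # $/hour for a whole reserved GPU
rate_per_core_hour = 1.20    # $/hour of measured compute
rate_per_gb_hour = 0.05      # $/GB-hour of measured VRAM

# Traditional model: pay for the whole reservation, idle or not.
time_based_bill = session_hours * rate_per_gpu_hour

# Usage model: pay only for compute and VRAM during kernel execution.
usage_based_bill = (busy_hours * rate_per_core_hour
                    + busy_hours * avg_vram_gb_while_busy * rate_per_gb_hour)

print(f"time-based:  ${time_based_bill:.2f}")   # $8.00
print(f"usage-based: ${usage_based_bill:.2f}")  # $2.55
```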