
Building AI Cloud for India


AI Cloud Opportunity

According to one industry report, the global AI market is projected to reach $407 billion by 2027, growing at a CAGR of 36.2%. Growth at this pace creates a pressing need for specialized, AI-optimized infrastructure.

By the end of 2024, it's estimated that more than 50% of user interactions will be augmented by AI-driven speech, text, or computer vision algorithms. This trend necessitates robust AI Cloud infrastructure to support these workloads.

At the same time, the AI chip market is expected to reach $83.25 billion by 2027, growing at a CAGR of 35.0% from 2022 to 2027. Funding for AI solutions is also increasing significantly.

What is AI Cloud? Do you need one?

AI Cloud refers to cloud computing infrastructure specifically designed and optimized for deploying, managing, and scaling AI and machine learning workloads. It combines the power of cloud computing with specialized hardware accelerators, such as GPUs and other AI-focused chips, and with software tools and frameworks tailored for AI applications. Unlike general-purpose hyperscalers such as AWS, Google Cloud, or Azure, an AI Cloud serves a single purpose: running AI workloads as well as possible. AI Clouds aim to be cost-effective, particularly for large distributed training jobs, and to democratize AI by simplifying how it is deployed and used.

AI Cloud addresses several pressing challenges. The global AI market is experiencing explosive growth, with a corresponding surge in demand for AI chips and cloud AI workloads. However, this rapid expansion is accompanied by resource constraints, including GPU shortages and a scarcity of AI talent. AI Cloud helps mitigate these issues by offering managed services and simplified AI deployment options, making advanced AI capabilities more accessible to a broader range of organizations. Moreover, it provides a solution to data sovereignty concerns, allowing companies to ensure compliance with local regulations while leveraging cutting-edge AI technologies.

By developing AI Cloud capabilities, organizations can position themselves at the forefront of this growth, addressing the immediate need for scalable AI infrastructure while preparing for future innovations.

Our Client

Our client, a leading cloud provider in India, set out to build an AI cloud platform tailored for machine learning and AI workloads. They needed a strategic technology partner to help them ideate not just what to build, but how to build it. CloudRaft partnered with them to map the broader AI landscape and pinpoint the key features end users were seeking. Through extensive market research and collaboration with key players in the AI ecosystem, we helped design a competitive architecture and roadmap, positioning them to lead in the Indian market.

Problem Statement

Our client, a seasoned player in the data center industry for over a decade, sought to capitalize on India’s emerging AI opportunity by building a dedicated AI cloud. Their primary motivation was twofold: the prohibitively high cost of AI infrastructure offered by hyperscalers, and the rapidly increasing demand for AI solutions tailored to the Indian market. They needed a solution that could deliver high performance at a competitive cost, addressing the unique needs of local enterprises and AI innovators.

Solution

CloudRaft embarked on an extensive journey of research and innovation to build a cutting-edge AI cloud platform. After evaluating several leading technologies, including OpenStack, Apache CloudStack, and Kubernetes, we identified Kubernetes as the optimal foundation for our solution. Its scalability, flexibility, and thriving ecosystem made it the right choice for meeting the demands of modern AI workloads. Kubernetes not only allowed us to streamline the go-to-market process but also enabled rapid iteration, ensuring our platform could evolve alongside advances in AI. Armed with this insight, we crafted a tailored roadmap that addressed our client's GPU and AI infrastructure needs, resulting in a powerful, future-proof AI cloud solution. Here's how we brought this vision to life:

Kubernetes-based AI Cloud Infrastructure

We designed a Kubernetes-based platform that lets end users easily spin up containers and virtual machines with GPUs and CPUs and deploy AI models with one click. The platform also supports multi-tenancy, security, isolation, and automated provisioning for tenants.
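
To make the idea of a GPU-backed virtual machine on Kubernetes more concrete, here is a minimal sketch of what such a request could look like as a KubeVirt VirtualMachine manifest, built as a plain Python dictionary. It is an illustration under stated assumptions, not the platform's actual configuration: the function name, the GPU resource name, the container disk image, and the CPU and memory sizes are all hypothetical, and GPU passthrough additionally requires the cluster administrator to permit the host device in the KubeVirt configuration.

```python
import json


def build_gpu_vm(name, namespace, gpu_resource, ssh_public_key, image):
    """Return a KubeVirt VirtualMachine manifest as a plain dict.

    Every argument value is an illustrative assumption; gpu_resource must
    match a device resource name actually exposed on the cluster's nodes.
    """
    # cloud-init user data that installs the tenant's SSH public key
    user_data = (
        "#cloud-config\n"
        "ssh_authorized_keys:\n"
        "  - " + ssh_public_key + "\n"
    )
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runStrategy": "Always",  # keep the VM running once created
            "template": {
                "spec": {
                    "domain": {
                        "cpu": {"cores": 8},  # hypothetical sizing
                        "resources": {"requests": {"memory": "32Gi"}},
                        "devices": {
                            # GPU passed through to the guest
                            "gpus": [{"name": "gpu0", "deviceName": gpu_resource}],
                            "disks": [
                                {"name": "rootdisk", "disk": {"bus": "virtio"}},
                                {"name": "cloudinit", "disk": {"bus": "virtio"}},
                            ],
                        },
                    },
                    "volumes": [
                        {"name": "rootdisk", "containerDisk": {"image": image}},
                        {"name": "cloudinit",
                         "cloudInitNoCloud": {"userData": user_data}},
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    vm = build_gpu_vm(
        name="notebook-vm",
        namespace="tenant-demo",
        gpu_resource="nvidia.com/EXAMPLE_GPU",       # hypothetical resource name
        ssh_public_key="ssh-ed25519 AAAA... user@example",
        image="registry.example.com/ubuntu-ml:latest",  # hypothetical image
    )
    # JSON manifests can be applied with kubectl just like YAML ones.
    print(json.dumps(vm, indent=2))
```

Building the manifest in code rather than by hand is one way a self-service portal could turn a "launch VM" click into a Kubernetes object; the real platform may generate its manifests differently.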

Key Features of the Platform

  • Virtual Machines on Kubernetes: We use KubeVirt, an open-source project for creating and running virtual machines on Kubernetes. These VMs are ideal for experimentation by data scientists and AI researchers, who can run the latest AI/ML frameworks such as PyTorch, Diffusers, and TensorFlow, streamlining workflows for AI engineers. End users can launch Jupyter Notebooks for data exploration and model training, and can SSH into the VMs using their own SSH keys.
  • AI Model Inferencing as a Service: Our platform supports AI model inferencing, allowing users to run popular AI models as services through direct integration with Hugging Face. This simplifies AI model deployment and consumption.
  • Tenant Lifecycle Management: We automate the provisioning and de-provisioning of tenant workloads, ensuring efficient resource management and scalability.
  • Multi-tenancy and Isolation: We ensure strong tenant isolation to safeguard data privacy and optimize workload efficiency.
  • GPU Reservations: To address the limited availability of GPUs, we implemented a reservation system that allows end users to reserve a GPU on demand.
  • Cloud-Native Storage: We provide cloud-native storage solutions such as S3-equivalent object storage and highly reliable block storage built using Ceph, serving as persistent storage for VMs and containers.
  • Full-Stack Observability: To ensure a highly available platform, we integrated observability tools like Prometheus, Grafana, and Loki for real-time monitoring and alerting.
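
As a rough illustration of how tenant provisioning, isolation, and GPU limits can be expressed on Kubernetes, the sketch below generates three standard objects for a new tenant: a Namespace, a ResourceQuota capping GPU requests, and a NetworkPolicy that only admits traffic from pods in the same namespace. This is a simplified sketch under assumptions, not a description of the platform's actual implementation: the function name, the label key, the namespace naming scheme, and the `nvidia.com/gpu` resource name (the name commonly exposed by NVIDIA's device plugin for containers) are all illustrative choices.

```python
import json

TENANT_LABEL = "example.com/tenant"  # hypothetical label key


def build_tenant_manifests(tenant, gpu_limit):
    """Return the Kubernetes objects for one tenant as a list of dicts."""
    namespace = "tenant-" + tenant  # hypothetical naming scheme
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace, "labels": {TENANT_LABEL: tenant}},
    }
    # Cap the total number of GPUs that pods in this namespace may request.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "gpu-quota", "namespace": namespace},
        "spec": {"hard": {"requests.nvidia.com/gpu": str(gpu_limit)}},
    }
    # Select every pod in the namespace and allow ingress only from pods
    # in that same namespace; traffic from other tenants is dropped.
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [ns, quota, policy]


if __name__ == "__main__":
    items = build_tenant_manifests("acme", gpu_limit=2)
    # Wrap the objects in a List so they can be applied in one step.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

With a layout like this, de-provisioning a tenant can be as simple as deleting the namespace, since Kubernetes removes the namespaced objects inside it. Note that a NetworkPolicy only takes effect when the cluster's network plugin enforces it, and that a production system would typically layer RBAC and storage isolation on top.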

The product is currently in beta with a limited set of early adopters. We are adding new features to the platform, such as serverless GPUs, advanced GPU optimizations, usage-based billing, and numerous other integrations. Our roadmap continues to track the latest advancements in AI and cloud-native technologies.

Building the AI Cloud wasn't without its challenges, and we'd love to share our journey with you. Schedule a meeting with us to explore how we tackled the complexities and how we can help you build your AI Cloud.


Get an Expert Consultation

We provide end-to-end solutions and support for building AI Clouds, cloud optimization, platform engineering, observability, monitoring, and many more areas. Empower yourself with best-in-class Kubernetes and cloud-native solutions.