This is the second part of a three-part AI-Core Insights series. Click here for Part 1, “The Foundation Model: Open Source or Not?”
In Part 1 of this three-part blog series, we explored a practical approach to foundation models (FMs), both open source and closed source, and how to determine, from a deployment perspective, which underlying model is most effective at solving the intended use case.
Let’s simplify the seemingly endless infrastructure required to bring a product to life on top of a compute-intensive underlying model. There are two well-discussed problem statements:
- Fine-tuning costs: fine-tuning requires lots of data and GPUs with enough vRAM and memory to host large models. This is what builds a moat around differentiated fine-tuning or prompt engineering.
- Inference costs: a small cost per call, but compounded by the number of inference calls. This cost is incurred regardless of whether you fine-tune.
Simply put, returns and investments must go hand in hand. However, initially this may require a huge sunk cost. So what do you focus on?
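To make the trade-off concrete, here is a minimal back-of-envelope sketch in Python. Every number in it (GPU rates, call volumes, pricing) is a hypothetical placeholder, not a real quote; the point is the shape of the math, not the figures.

```python
# Hypothetical back-of-envelope model: one-off fine-tuning (sunk) cost
# vs. recurring per-call inference cost. All numbers are placeholders.

GPU_HOURLY_RATE = 3.50       # assumed $/hour for a large training GPU
FINETUNE_GPUS = 8            # assumed GPU count for the fine-tuning job
FINETUNE_HOURS = 72          # assumed wall-clock training time

INFERENCE_COST_PER_CALL = 0.002  # assumed fully loaded $/inference call
REVENUE_PER_CALL = 0.005         # assumed revenue per call

sunk_cost = GPU_HOURLY_RATE * FINETUNE_GPUS * FINETUNE_HOURS

def monthly_margin(calls_per_month: int) -> float:
    """Gross margin from inference traffic, before the sunk cost."""
    return calls_per_month * (REVENUE_PER_CALL - INFERENCE_COST_PER_CALL)

# How many paid calls until the fine-tuning investment is recovered?
breakeven_calls = sunk_cost / (REVENUE_PER_CALL - INFERENCE_COST_PER_CALL)

print(f"Sunk fine-tuning cost: ${sunk_cost:,.0f}")
print(f"Break-even at ~{breakeven_calls:,.0f} inference calls")
print(f"Margin at 1M calls/month: ${monthly_margin(1_000_000):,.0f}")
```

The compounding inference term is what dominates once traffic grows, which is why the rest of this post looks at where both the sunk cost and the per-call cost can be reduced.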
FM Startup Infrastructure Dilemma
If you have a fine-tuning pipeline, it will look something like this:
- Data preprocessing and labeling: You have a large pool of datasets and do some preprocessing (cleaning, resizing, background removal, etc.), which needs only a small GPU. You might then label the data with a smaller model, again on a small GPU.
- Fine-tuning: Once you start fine-tuning your model, you’ll need a massive GPU; the A100 is the famous (and expensive) choice. You load the large model and fine-tune it on your specialized data, hoping for no hardware failures along the way. If one does occur, hopefully your checkpoints are recent (saving them takes time). If the job fails and a checkpoint exists, you recover as much of the fine-tuning as you can, but depending on how stale your checkpoints are, you will still lose hours of work. A minimal checkpointing sketch appears after this list.
- Serving and inference: After this, you serve the model for inference. The model is still huge, so you host it in the cloud and incur an inference cost per query. If you want a super-optimal configuration, you will debate between an A10 and an A100. Spinning the GPU fully up and down causes cold-start issues; keeping GPUs running racks up huge GPU costs (i.e., investment) without users paying for them (i.e., return).
Note: Without fine-tuning, the pipeline has no preprocessing element, but you still have to think about serving infrastructure.
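To make the checkpointing concern in the fine-tuning step concrete, here is a minimal PyTorch-style sketch (this is generic checkpointing, not the Nebula service discussed later). The model, data, and paths are hypothetical stand-ins; the pattern is: save periodically, and on restart resume from the latest checkpoint so a hardware failure only costs the steps since the last save.

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoints/latest.pt"  # hypothetical checkpoint location

model = nn.Linear(512, 512)          # stand-in for a large model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

start_step = 0
if os.path.exists(CKPT_PATH):
    # Resume after a failure: reload model/optimizer state so we only
    # lose the steps since the last checkpoint, not the whole run.
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 10_000):
    batch = torch.randn(32, 512)       # stand-in for real training data
    loss = model(batch).pow(2).mean()  # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 500 == 0:
        os.makedirs("checkpoints", exist_ok=True)
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "step": step},
            CKPT_PATH,
        )
```

The cadence (500 steps here) is exactly the trade-off described above: checkpoint too rarely and a failure costs hours; checkpoint too often and the I/O overhead itself eats training time.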
The biggest decisions in the sunk-cost debate come down to: what counts as infrastructure, and what do you build versus borrow?
- A) Borrow: treat infrastructure as someone else’s problem, focus on your core product, and get infrastructure from a provider; or B) Build: build the components in-house, investing time and money upfront to find and solve the problems yourself?
- A) Consolidate in one location and save the costs associated with ingress/egress across regions and zones; or B) distribute across different sources to diversify points of failure, spreading across zones or regions at the risk of latency issues?
The trend we see in growing startups is to focus on core product differentiation and commoditize the rest. Infrastructure can be a complex overhead that keeps you away from monetizable problem statements, or it can be a big power plant, with bits and pieces that can be easily scaled with a single click as you grow.
Beyond Compute: The Role of Platforms and Accelerating Inference
There’s a saying I’ve heard in the startup community: “You can’t throw a GPU at every problem.” Optimization is, generally speaking, a problem that cannot be fully solved in hardware. Beyond the important role of platform and runtime software, other factors such as model compression, quantization, inference acceleration, and checkpointing come into play.
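As one concrete example of the quantization lever, here is a minimal sketch using ONNX Runtime’s dynamic quantization utility, assuming you already have a model exported to ONNX (an export sketch follows the next paragraph). The file names are hypothetical.

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert FP32 weights to INT8 on disk. Dynamic quantization shrinks the
# model file and can speed up CPU inference; the usual price is a small
# accuracy drop that should be validated on your own eval set.
quantize_dynamic(
    model_input="model_fp32.onnx",   # hypothetical exported model
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```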
Given the big picture, the role of optimization and acceleration is quickly becoming central. Runtime accelerators like ONNX Runtime enable up to 1.4x faster inference, and rapid checkpointing like Nebula helps training jobs recover from hardware failures, saving the most important resource: time. On top of this, simple techniques such as autoscaling and workload triggers let you spin the fleet of GPUs sitting idle between bursts of inference requests down to the lowest possible scale.
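As a sketch of that runtime-acceleration path, here is how a PyTorch model might be exported to ONNX and served with ONNX Runtime. The tiny model is a hypothetical stand-in for a fine-tuned FM, and the speedup you actually see is workload-dependent.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Hypothetical stand-in for a fine-tuned model; a real FM is exported
# the same way, just with far more parameters.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
model.eval()

dummy_input = torch.randn(1, 512)
torch.onnx.export(
    model,
    dummy_input,
    "model_fp32.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)

# Serve with ONNX Runtime; on a GPU box you would request
# "CUDAExecutionProvider" instead of the CPU provider.
session = ort.InferenceSession(
    "model_fp32.onnx", providers=["CPUExecutionProvider"]
)
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)  # (1, 512)
```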
At the roundtables we host for startups, sometimes the simplest questions are the ones that burn the most cash: how do you balance serving your customers in the short term, on the most efficient hardware at the right scale, with managing growth in the long term, scaling up and down as demand changes?
Summary
When productizing underlying models with training and inference at scale, we must consider the role of platform and inference acceleration alongside the role of infrastructure. ONNX Runtime and Nebula are just two of those considerations; there are many others. Ultimately, startups face the challenge of efficiently serving customers in the short term while managing growth and scale in the long term.
Sign up today for the Microsoft for Startups Founders Hub for more tips on bringing AI to your startup and getting started building industry-leading AI infrastructure.