This is Part 3 of our three-part AI Core Insights series. See Part 1, “Foundation Model: Open Source or Not?”, and Part 2, “Discovering Holistic Infrastructure Strategies for Compute-Intensive Startups”.
Startups are leading the way in LLM-driven use cases. The road can be bumpy, with GPU allocation issues, limited availability of allocated capacity, API rate limits, and more. On top of that, the LLM pipeline involves a myriad of priorities that must be sequenced across the different stages of a product build.
In the final installment of our AI Core Insights series, we’ll summarize some decisions you need to consider at various stages to make your life easier.
Experiment with the model
During the experimental phase, you first test and compare several models, both open and closed source. For OpenAI APIs, Microsoft for Startups provides $2,500 worth of OpenAI credits to get you up and running quickly with API-based experimentation.
A model catalog is a great way to experiment with multiple models and find the one that performs best for your use case. The updated Azure ML model catalog lists leading models from Hugging Face alongside models curated by Azure.
The compute target for this stage is either CPU or GPU, and it doesn’t require large, high-performance systems at scale. Typical GPU options include the V100, A100, and RTX series. In our experience, the most widely used SKUs are the A10 and V100, though the A100 is sometimes used as well. It is important to keep alternatives in mind to ensure access at scale, factoring in variables such as regional availability and quota.
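One way to structure this experimentation phase is a small harness that runs the same prompt through each candidate model and records the output and latency side by side. The sketch below is illustrative: the candidate model names are placeholders, and `call_model` is a stub standing in for a real API call (for example, via the OpenAI SDK or an Azure ML endpoint).

```python
import time

# Hypothetical candidates -- substitute the models or deployments you
# actually provisioned for experimentation.
CANDIDATE_MODELS = ["gpt-35-turbo", "llama-2-7b", "falcon-7b"]

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real inference call (e.g. the OpenAI SDK or an
    Azure ML online endpoint). Stubbed so the harness runs locally."""
    return f"[{model}] response to: {prompt}"

def compare_models(prompt: str, models=CANDIDATE_MODELS) -> list[dict]:
    """Run one prompt through each candidate and record latency, so the
    models can be compared side by side before committing to one."""
    results = []
    for model in models:
        start = time.perf_counter()
        output = call_model(model, prompt)
        results.append({
            "model": model,
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
    return results

if __name__ == "__main__":
    for row in compare_models("Summarize our onboarding doc."):
        print(row["model"], f'{row["latency_s"]:.4f}s')
```

In practice you would extend each result row with cost per call and a task-specific quality score, since latency alone rarely decides the comparison.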
Considerations after model selection
Once your experiments are complete, you will have narrowed down your use case and the model configuration that accompanies it. Note that a model configuration is usually a set of models rather than just one. Here are some considerations to keep in mind:
- Papers such as FrugalGPT outline techniques for choosing the best deployment by balancing model choice against use-case success. This is similar to the malloc principle: you have the option of choosing the first fit, but the most efficient product often comes from the best fit.
- Serverless compute offerings help you deploy ML jobs without the overhead of managing jobs or understanding compute types.
- To compare deployments, set up jobs via Azure ML Studio, which is useful for performance benchmarking and evaluation.
- Building multiple pipelines is easy with Azure ML’s reusable components.
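The first-fit versus best-fit trade-off above can be sketched as a FrugalGPT-style cascade: try the cheapest model first and escalate only when its answer does not clear a confidence threshold. Model names, relative costs, and the confidence values below are illustrative placeholders, not real pricing.

```python
# FrugalGPT-style cascade sketch: cheapest model first, escalate on low
# confidence. Tiers are ordered cheapest-to-most-expensive; costs are
# relative units, not real prices.
from typing import Callable

MODEL_TIERS = [
    ("small-model", 1.0),
    ("medium-model", 5.0),
    ("large-model", 25.0),
]

def cascade(prompt: str,
            run: Callable[[str, str], tuple[str, float]],
            threshold: float = 0.8) -> tuple[str, str, float]:
    """Return (model, answer, total_cost) from the cheapest model whose
    confidence clears the threshold; fall back to the largest model."""
    total_cost = 0.0
    for model, cost in MODEL_TIERS:
        answer, confidence = run(model, prompt)
        total_cost += cost
        if confidence >= threshold:
            return model, answer, total_cost
    return model, answer, total_cost  # largest model's answer

# Stub runner: pretend only the medium and large models are confident.
def fake_run(model: str, prompt: str) -> tuple[str, float]:
    confidences = {"small-model": 0.5, "medium-model": 0.9, "large-model": 0.95}
    return f"[{model}] answer", confidences[model]

if __name__ == "__main__":
    print(cascade("Classify this support ticket.", fake_run))
```

With the stub above, the cascade stops at the medium model, paying for two calls instead of routing everything to the largest model; in a real system the confidence signal would come from a scorer or the model itself.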
Pave the way for rapid growth
With a few customers under your belt, your LLM pipeline begins to scale rapidly. At this stage there are additional considerations.
- Content safety starts to matter as your model’s inferences reach customers. Azure Content Safety Studio is a great place to get ready for customer deployments.
- Autoscaling ML endpoints helps you scale up and down based on demand and alerts, which optimizes costs across different customer workloads.
- Building on an infrastructure like Azure helps you anticipate growing needs, such as service reliability and adherence to compliance regulations like HIPAA.
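The content-safety consideration above often reduces to a simple gate: block an output if any harm category's severity score exceeds a threshold. In production those scores would come from a moderation service such as Azure AI Content Safety; in the sketch below they are passed in directly so the gate itself can be shown, and the category names and thresholds are illustrative.

```python
# Minimal content-safety gate. Severity scores (0 = benign, higher = worse)
# would come from a moderation service in production; here they are inputs.
DEFAULT_THRESHOLDS = {"hate": 2, "violence": 2, "sexual": 2, "self_harm": 0}

def is_safe(severities: dict[str, int],
            thresholds: dict[str, int] = DEFAULT_THRESHOLDS) -> bool:
    """Allow the output only if every category's severity is at or
    below its configured threshold; missing categories default to 0."""
    return all(severities.get(cat, 0) <= limit
               for cat, limit in thresholds.items())

if __name__ == "__main__":
    print(is_safe({"hate": 0, "violence": 1}))  # within limits
    print(is_safe({"self_harm": 3}))            # exceeds threshold
```

Keeping the thresholds in configuration rather than code makes it easy to tighten them per customer or per deployment as requirements evolve.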
As large-model-driven use cases become more mainstream, remember that for all but a few large enterprises, your model is not your product. Taking these considerations into account early on can help you prioritize the right problem statements to quickly build, deploy, and scale your product as the industry continues to expand.
To keep learning and building with AI, sign up for Microsoft for Startups Founders Hub today.