Without basic computer architecture best practices, generative AI systems are sluggish. Here are a few tips to optimize complex systems. By David Linthicum.
Performance is often an afterthought in generative AI development and deployment. Most teams deploying generative AI systems, in the cloud or elsewhere, have yet to determine what the performance of those systems should be, take no steps to measure it, and end up complaining about performance after deployment. Or, more often, the users complain, and then generative AI designers and developers complain to me.
The author also discusses in this blog:
- Complex deployment landscapes
- AI model tuning
- Vendors could have done a better job establishing best practices
- Security concerns
- Regulatory compliance
Implement automation for scaling and resource optimization (autoscaling), which cloud providers offer. This includes using machine learning operations (MLOps) techniques and approaches for operating AI models.
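To make the autoscaling tip concrete, here is a minimal sketch of a proportional, target-tracking scaling rule, similar in spirit to the rule used by the Kubernetes Horizontal Pod Autoscaler. It is not taken from the article. The function name, the load metric, and the replica bounds are all hypothetical, and a production setup would normally rely on the cloud provider's own autoscaling service rather than hand-rolled logic.

```python
import math


def desired_replicas(current_replicas: int,
                     observed_load: float,
                     target_load: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Return how many inference replicas to run (illustrative only).

    observed_load and target_load are per-replica values of one metric,
    for example average GPU utilization in percent (an assumption here).
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")

    # Scale the replica count in proportion to how far the observed
    # load is from the target, rounding up so we never under-provision.
    raw = math.ceil(current_replicas * observed_load / target_load)

    # Clamp to the configured bounds to cap cost and keep a floor.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four replicas at 90% utilization with a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
```

The clamp is the resource-optimization part: the upper bound caps spend during a spike, and the lower bound keeps the service responsive when traffic is idle.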
At their essence, generative AI systems are complex, distributed, data-oriented systems that are challenging to build, deploy, and operate. They are all different, with different moving parts. Most of those parts are widely distributed, from the source databases for the training data, to the output data, to the core inference engines that often run on cloud providers. Nice one!