As AI scales, infrastructure challenges emerge
Organizations experimenting with gen AI typically set up enterprise-grade accounts with cloud-based services such as OpenAI's ChatGPT or Anthropic's Claude, and early field tests and productivity benefits may inspire them to look for more opportunities to deploy the technology.

"Companies use gen AI to produce executive summaries, or to produce marketing content," says Nick Kramer, leader of applied solutions at SSA & Company, a global consulting firm. Next year, he says, we'll see increased adoption and standardization of these kinds of enterprise use cases, as well as gen AI built into other applications that enterprises use. That's where most of the value generation happens.

Adobe's Photoshop, for example, now has a gen AI feature. Google and Microsoft are also rolling out gen AI functionality in their productivity platforms, as are Salesforce and most other enterprise vendors. There might be an extra cost for the new functionality, but the vendors are the ones dealing with any potential infrastructure challenges.

While everybody can use ChatGPT, or has Office 365 and Salesforce, for gen AI to be a differentiator or competitive advantage, companies need to find ways to go beyond what everyone else is doing. That means creating custom models, fine-tuning existing models, or using retrieval-augmented generation (RAG) with embeddings to give gen AI systems access to up-to-date and accurate corporate information. And that means companies have to invest in infrastructure for training and deploying these systems.

Telecom testing firm Spirent was one of those companies that started out by just using a chatbot: specifically, the enterprise version of OpenAI's ChatGPT, which promises protection of corporate data.

"We didn't want our data going into a public model," says Matt Bostrom, Spirent's VP of enterprise technology and strategy. "The enterprise edition met that need, so we didn't have to build our own LLM. At the time, OpenAI had the best LLM, though now Claude is challenging that, and we use it as well."

The company still has that in place, with 130-plus licenses available to its internal users, who use the standard chat interface, and there are no API costs or integrations required. "You're just using their application and paying for the user license," he says.

But that was just the beginning. "We knew we wanted to embed AI in our existing applications," he adds. "Salesforce and others have an AI module you can add on, but we wanted to be more specific for our use case." That meant the company had to do some serious infrastructure work. It started, like most enterprise-grade AI projects do, with the data.

Maximizing the potential of data

According to Deloitte's Q3 state of generative AI report, 75% of organizations have increased spending on data lifecycle management due to gen AI.

"When I came into the company last November, we went through a data modernization with AWS," Bostrom says. "We moved onto the AWS tech stack with both structured and unstructured data."

Getting data out of legacy systems and into a modern lake house was key to being able to build AI. "If you have data or data integrity issues, you're not going to get great results," he says.
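As a rough illustration of that extract-and-land step, here is a minimal sketch that pulls one table out of a legacy system and writes it as Parquet into an S3-based lake house. The connection string, table, and bucket names are hypothetical, and this is not Spirent's actual pipeline; it assumes pandas, SQLAlchemy, pyarrow, and s3fs are available.

```python
# Hypothetical sketch: land a legacy table in an S3 lake house as Parquet.
# All names (DSN, table, bucket) are illustrative placeholders.
import pandas as pd
from sqlalchemy import create_engine, text

LEGACY_DSN = "postgresql://readonly@legacy-erp.internal/sales"  # assumed source
LAKE_PATH = "s3://example-lakehouse/raw/orders"                 # assumed target

def land_orders(snapshot_date: str) -> None:
    engine = create_engine(LEGACY_DSN)
    # Extract one day's worth of rows from the legacy schema.
    df = pd.read_sql(
        text("SELECT * FROM orders WHERE updated_at::date = :d"),
        engine,
        params={"d": snapshot_date},
    )
    # Basic integrity check before the data feeds any AI workload.
    if df["order_id"].isna().any():
        raise ValueError("null order_id rows found; fix upstream before loading")
    # Columnar files are cheap for downstream tools (and models) to read.
    df.to_parquet(f"{LAKE_PATH}/snapshot_date={snapshot_date}/part-000.parquet",
                  index=False)

if __name__ == "__main__":
    land_orders("2024-11-01")
```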
Once the data was organized, moving it to where it was needed was another challenge, he says. "We had integration tools at our company but they were older, outdated tools," he says. Getting the kind of large-scale integrations necessary for gen AI would have required significant and costly upgrades.

Instead, Spirent decided to go with SnapLogic for the integration layer to handle the scale necessary for the project. "We evaluated a lot of different vendors and this one had the most power," he says. "And they were rolling out their AI builder, which would save us money from not having to purchase another add-on."

As a result, Spirent uses AI for test data within its products, and to help with customer support and internal productivity, says Bostrom. For example, an employee who needs to create a new sales pitch while in, say, Salesforce, can press a button and have relevant content from the company's SharePoint repository retrieved and packaged up.

That relevant content could include thousands of pages of information, such as compliance rules for specific countries. This internal information is augmented with data stored in the Salesforce platform and sent to the AI as part of a fine-tuned prompt. The answer then comes back into Salesforce, and the employee can look at the response, edit it, and send it out through the regular Salesforce process.

"That's just one example," he says. "Now that people have a taste for it, we've created more. We have a rinse and repeat cycle."

Moving data to a modern warehouse and implementing modern data pipelines was a huge step, but it didn't resolve all of the company's AI infrastructure challenges.

"We're a global company, and there are regional limitations on LLMs," says Bostrom. "OpenAI has blocked certain countries, and Claude is moving to do that as well. We've got employees globally and we don't want to violate any policies, so we have to figure out how to route the employee down a path to an approved LLM for their country."

As a remedy, there are regional deployment options, so, for example, an AWS data center in Singapore might support users in China. But not all LLMs are available in every region.

There are also open-source LLMs that a company can run on its own in whatever location it needs to, but there's a shortage of available resources, even with giants like Amazon. "They're being purchased up and used," he says. "It's hard to get those really beefy servers you need to host Mistral on." So for the time being, Spirent is sticking with big commercial providers like OpenAI, and accessing the LLMs through an API.

Spirent is also not building its own vector databases. These are common for RAG, a gen AI strategy that improves accuracy and timeliness and reduces hallucinations, while avoiding the issue of having to train or fine-tune an AI on sensitive or proprietary data.

"Now there's a drag-and-drop capability where it's creating a vector database automatically," says Bostrom. "We have an assistant where we can put in a thousand files, so we don't need to purchase our own vector store."
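To make the shape of that flow concrete, here is a minimal sketch combining the two pieces Bostrom describes: route the user to an LLM approved for their country, then retrieve relevant passages and fold them into the prompt. The country table, the search_documents helper, and the internal gateway URL are hypothetical stand-ins rather than Spirent's implementation; the call itself uses the OpenAI Python SDK's chat completions interface.

```python
# Hypothetical sketch of a retrieval-augmented request that first routes the
# user to an LLM approved for their country. Names and model IDs are assumptions.
from openai import OpenAI  # assumes the provider's official SDK is installed

# Country -> (region, model) table maintained by the platform team (illustrative).
APPROVED_MODELS = {
    "US": ("us-east", "gpt-4o"),
    "DE": ("eu-central", "gpt-4o"),
    "SG": ("ap-southeast", "gpt-4o-mini"),
}

def search_documents(query: str, top_k: int = 5) -> list[str]:
    """Placeholder for the managed vector store: return the most relevant
    SharePoint/compliance passages for the query."""
    raise NotImplementedError("backed by the vector store in production")

def draft_sales_pitch(user_country: str, account_notes: str, request: str) -> str:
    if user_country not in APPROVED_MODELS:
        raise PermissionError(f"no approved LLM for {user_country}")
    region, model = APPROVED_MODELS[user_country]

    # Retrieve supporting passages and fold them into the prompt (RAG),
    # so the model answers from current corporate content.
    passages = search_documents(request)
    context = "\n\n".join(passages)
    prompt = (
        "Use only the context below to draft a sales pitch.\n\n"
        f"CRM notes:\n{account_notes}\n\nContext:\n{context}\n\nRequest: {request}"
    )

    # Hypothetical internal gateway that exposes region-specific endpoints.
    client = OpenAI(base_url=f"https://{region}.llm-gateway.example.internal/v1")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```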
With change comes choice

Spirent's decision to use a public cloud for data storage is a popular approach. According to a survey of large companies released this summer by Flexential, 59% use public clouds to store the data they need for AI training and inference, while 60% use colocation providers, and 49% use on-prem infrastructure. And nearly all companies have AI roadmaps, with more than half planning to increase their infrastructure investments to meet the need for more AI workloads.

But companies are also looking beyond public clouds for their AI computing needs, and the most popular alternative, used by 34% of large companies, is specialized GPU-as-a-service vendors.

Take business process outsourcing company TaskUs, which is seeing the need for more infrastructure investment as it scales up its gen AI deployments. The challenge isn't mind-blowing, says its CIO Chandra Venkataramani, but it does mean the company has to be careful about keeping costs under control. "We don't want to get carried away with technology and go crazy with it," he says. Specifically, TaskUs needs to move more compute and data back and forth.

There are two major types of AI compute, says Naveen Sharma, SVP and global head of AI and analytics at Cognizant, and they have different challenges. On the training side, latency is less of an issue because these workloads aren't time sensitive. Companies can do their training or fine-tuning in cheaper locations during off-hours. "We don't have expectations for millisecond responses, and companies are more forgiving," he says.

The other main type of AI compute is inference, when the trained model is used to actually answer questions. This typically needs to happen in real time. "Unless you've got an ability to ask your customers to wait for the model to respond, inference becomes a problem," says Sharma.

For example, he says, he's seen high demand in the Dallas and Houston area. "The whole region has become very hungry for compute because of all the AI firms that moved there," he says. "And there may be some work happening with oil and gas, and maybe that's what's leading to the spikes."

Location can also be an issue for another reason: data sovereignty regulations. In some jurisdictions, data is not allowed to leave for compliance reasons. "If your data is limited to the region you're in, then you're limited to using the capacity in that region," says Sharma.

If the hyperscalers can't provide the needed capacity, and a company doesn't have its own data centers in a colocation facility or on prem, the other main alternative is GPU-as-a-service providers, and they are going strong, Sharma says. "If your hyperscaler isn't giving you enough at the right price point, there are alternatives," he says.

For companies that know they're going to have a certain level of demand for AI compute, it makes long-term financial sense to bring some of that into their own data center, says Sharma, and move from on-demand to fixed pricing.
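A back-of-the-envelope version of that on-demand-versus-fixed comparison is below; the hourly rate, monthly commitment, and utilization figures are invented placeholders, not quotes from any provider.

```python
# Hypothetical break-even sketch: at what sustained usage does committed
# capacity (reserved or owned) beat on-demand GPU pricing? Figures are made up.
ON_DEMAND_PER_GPU_HOUR = 4.00    # assumed on-demand rate, USD
FIXED_MONTHLY_PER_GPU = 1500.00  # assumed all-in cost of committed capacity, USD
HOURS_PER_MONTH = 730

break_even_hours = FIXED_MONTHLY_PER_GPU / ON_DEMAND_PER_GPU_HOUR
utilization = break_even_hours / HOURS_PER_MONTH

print(f"On-demand costs more once usage exceeds ~{break_even_hours:.0f} "
      f"GPU-hours per month (about {utilization:.0%} utilization).")
```

With these placeholder numbers the crossover sits at roughly half-time utilization, which is why steady, predictable AI workloads tend to justify committed capacity.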
Empowering pilots

Also in the Flexential survey, 43% of companies are seeing bandwidth shortages, and 34% are having problems scaling data center space and power to meet AI workload requirements. Other reported problems include unreliable connections and excessive latency. Only 18% of companies report no issues with their AI applications or workloads over the past 12 months.

So it makes sense that 2023 was a year of AI pilots and proofs of concept, says Bharath Thota, partner in the digital and analytics practice at business consultancy Kearney. And this year has been the year when companies have tried to scale those pilots up.

"That's where the challenge comes in," he says. "This is not new to AI. But it's amplified because the amount of data you need to access is significantly larger." Not only does gen AI consume dramatically more data, but it also produces more data, something companies often don't expect.

In addition, when companies create a model, it's defined by its training data and weights, so keeping track of different versions of an AI model might require keeping copies of every individual training data set. It depends on the specific use case, says Thota. "Nobody has figured out what the best way is," he says. "Everybody is learning as they're iterating." And all the infrastructure problems, from storage and connectivity to compute and latency, will only increase next year.

Today, there's a relatively small number of gen AI use cases that have moved all the way from pilot to production, and many of those are deployed in stages. As more pilots go into production, and production projects expand to all potential users, the infrastructure challenges are going to hit in a bigger way. And finding a solution that works today is not enough, since gen AI technology is evolving at a breakneck pace. "You need to be nimble enough to switch as it gets upgraded," Thota says.

Then there's the question of skills gaps and staffing shortages related to AI infrastructure management. Managing storage, networking, and compute resources while optimizing for cost and performance, even as platforms and use cases all evolve rapidly, is a concern. But as gen AI gets smarter, it might also become a means of helping companies manage that infrastructure.

"You've heard of network as code?" asks Mick Douglas, managing partner at InfoSec Innovations and instructor at the SANS Institute. "And there's infrastructure as code. For some large companies doing a lot of compute, they can start playing games with things like, is it better to have very powerful virtual machines in the cloud or a handful of Lambda functions? You would have the AI create an abstraction layer for you, and then have the AI iterate through all the different builds."

Some of this optimization can already be done with machine learning, and it is. But the problem with ML is that provider offerings keep changing. Traditional analytics can handle the math and the simulations, while gen AI can be used to figure out the options and do the more involved analysis. "The main advantage you're getting with the generative AIs is you can make different deployment code templates in an automated fashion," says Douglas. "You can take the drudgework out."
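As a minimal sketch of what that automated iteration could look like, the snippet below expresses two candidate builds, always-on virtual machines versus pay-per-use functions, as data and ranks them by estimated monthly cost; a gen AI tool could emit many more such templates and feed the same loop. The traffic volumes and unit prices are invented for illustration.

```python
# Hypothetical sketch: iterate over candidate deployment shapes and rank them
# by estimated monthly cost for a given workload. All figures are placeholders.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    monthly_cost: float

REQUESTS_PER_MONTH = 5_000_000
AVG_SECONDS_PER_REQUEST = 0.4

def big_vm_option(vm_hourly_rate: float = 1.20, vm_count: int = 4) -> Candidate:
    # Always-on virtual machines: cost is independent of traffic.
    return Candidate("large VMs", vm_hourly_rate * 730 * vm_count)

def lambda_option(per_gb_second: float = 0.0000167, memory_gb: float = 2.0) -> Candidate:
    # Pay-per-use functions: cost scales with request volume and duration.
    gb_seconds = REQUESTS_PER_MONTH * AVG_SECONDS_PER_REQUEST * memory_gb
    return Candidate("Lambda functions", gb_seconds * per_gb_second)

if __name__ == "__main__":
    for option in sorted([big_vm_option(), lambda_option()], key=lambda c: c.monthly_cost):
        print(f"{option.name}: ~${option.monthly_cost:,.0f}/month")
```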