Red Hat went all-in on generative AI at its annual summit last week, offering a wide range of tools for operational and development teams to help them build and deploy generative AI systems. That includes tools for creating and managing a model garden, training and fine-tuning models, building applications, and deploying generative AI at scale in a hybrid architecture.
Red Hat did not release its own generative AI foundation model last week. Instead, it partnered with IBM to feature the Granite models as the default option in its tool sets and as the base for its Lightspeed products. But Red Hat had everything else. It delivered a version of Linux – Red Hat Enterprise Linux AI – optimized for AI, InstructLab for fine-tuning models, and Podman AI Lab for building and testing AI-powered applications.
Red Hat also made improvements to OpenShift AI, a platform released last year for deploying gen AI applications at scale. These improvements include the ability to serve AI models at the edge and improved tooling for model development. Red Hat also announced that Lightspeed, a generative AI tool already embedded as an assistant in Red Hat’s Ansible automation platform, is being extended to OpenShift and Red Hat Enterprise Linux.
In fact, last week’s Red Hat conference, like so many other tech conferences these days, was all AI all the time.
Of course, other vendors already offer much of what we saw last week.
IBM’s newly open-sourced Granite models? Other companies have also released open source gen AI, including Databricks and Snowflake, as well as Meta and Microsoft – even X and Apple are in the game.
AI-powered assistants? Nearly every tech vendor is adding them to almost every product. AI model labs? Data scientists have several options. AI deployment platforms? Every hyperscaler has one.
But two main things distinguish Red Hat’s approach from what we’ve seen before. First, Red Hat’s tools and platforms are focused on developers, not just data science teams. This means that enterprises won’t have to find rare and expensive AI talent to build their systems. Instead, their existing developers can use the tools they already know and love.
Second, Red Hat is all about platform independence. Enterprises can deploy on-prem, in private clouds, or in any of the public clouds, and use the IBM Granite models, one of the models vetted by Red Hat and its community, or any other available AI model. The goal is to make it easier, cheaper, and faster for companies to build and deploy generative AI solutions.
Focus on developers
It used to be that AI was the exclusive province of data scientists. Creating, tuning, and managing models required heavy-duty statistical and data analysis skills and the use of specialized data science platforms and languages. That made sense in the old machine learning days, when enterprises typically built their models from scratch. That’s all been changing.
“There may have been some confusion last year about the role of data scientists versus developers about testing out and building prototypes of applications leveraging foundation models,” says IDC analyst Michele Rosen. “Now I think we’re seeing the pendulum shift to application developers.”
Instead of building a model from scratch, developers take an existing foundation model, fine-tune it, and build a pipeline or an application to meet specific enterprise use cases, she says. “Google, Amazon, etcetera, also have various versions of these prototyping, sandboxing, experimentation, and other strategies for targeting developers,” she says. “But what I thought was interesting about Red Hat’s strategy is that they really emphasize that it should be integrated with the existing software development life cycle.”
Red Hat is giving developers a way to use their existing expertise to build generative AI applications, instead of having to learn a whole new set of tools, she says.
It’s almost as though, with RHEL AI, Red Hat is building a new kind of operating system, one designed specifically for AI.
“It is a foundational platform,” says Tushar Katarki, senior director of product management for the hybrid platforms business unit at Red Hat. “Think of RHEL AI as that. There’s a foundational model in there, and then on top of that you have all these tools and utilities to help improve that model, and add your own customizations. And you can contribute back to the open source community or just keep it for yourself, if you have proprietary trade secrets.”
The foundational model at the heart of RHEL AI is IBM’s Granite model, distributed under the Apache 2.0 open source license. That’s a more enterprise-friendly license than many other open source generative AI models. Meta’s Llama, for example, uses a unique license that limits how the model can be used and, some would argue, doesn’t qualify as an open source license.
Plus, IBM is also open sourcing its training data sets, says Katarki. “This is still an emerging space, but I think these are the most open models at the present,” he says.
Then there’s InstructLab, a developer-friendly set of tools that lets non-data-scientists customize the model. In fact, it’s for “any kind of domain expert who wants to add domain-specific skills,” Katarki says.
Users submit knowledge and skills in a taxonomy that’s easier for humans to understand, he says. “That is the innovation that IBM researchers came up with.”
Models are typically fine-tuned with question-and-answer pairs. OpenAI, for example, recommends at least 50 such pairs to fine-tune its GPT models, and it often takes thousands of these sample questions to create something useful. InstructLab starts from a set of sample questions – just a handful can be enough – and then uses the knowledge taxonomy to generate more question-and-answer pairs of the same type.
“It amplifies the high-quality data, knowledge and skills that the user has inputted,” says Katarki.
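Conceptually, that amplification step works something like the sketch below: a handful of human-written seed examples for one node of the taxonomy is handed to a larger “teacher” model, which generates many more pairs in the same style. This is an illustration only; the QAPair structure, the insurance-domain examples, and the teacher_model client are hypothetical, not InstructLab’s actual file format or API (InstructLab itself is driven by a taxonomy of YAML files and a command-line tool).

```python
# Conceptual sketch of seed-example amplification -- not InstructLab's real API.
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

# A handful of human-written seed examples for one (hypothetical) taxonomy node,
# e.g. knowledge about an insurer's claims process.
seed_examples = [
    QAPair("What documents are needed to file an auto claim?",
           "A police report, photos of the damage, and the policy number."),
    QAPair("How long does a claim review usually take?",
           "Most reviews are completed within ten business days."),
]

def amplify(seeds, teacher_model, n_per_seed=20):
    """Ask a larger 'teacher' model to produce new Q&A pairs in the same style
    as each seed, multiplying a few examples into a much larger training set."""
    synthetic = []
    for seed in seeds:
        prompt = (
            "Write a new question and answer on the same topic, "
            f"in the same style as:\nQ: {seed.question}\nA: {seed.answer}"
        )
        for _ in range(n_per_seed):
            q, a = teacher_model.generate_qa(prompt)  # hypothetical teacher call
            synthetic.append(QAPair(q, a))
    return seeds + synthetic
```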
Then these new question-and-answer pairs are used to fine-tune the model, creating one tailored to a company’s requirements. That new, fine-tuned model can then be run on RHEL AI for prototyping or small use cases, or deployed at scale via OpenShift AI. If new data comes in and a company doesn’t want to retrain the model, it can add the new information through retrieval-augmented generation, or RAG. This is typically a vector database or knowledge graph that is used to add context to a gen AI query.
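As a rough illustration of that pattern, the sketch below shows the basic retrieval-augmented flow: embed the incoming question, look up the nearest documents in a vector store, and prepend them to the prompt. The embed, vector_store, and llm objects are placeholders, not any specific Red Hat or OpenShift AI API.

```python
# Minimal RAG sketch; embed, vector_store, and llm are hypothetical clients.
def answer_with_rag(question, embed, vector_store, llm, top_k=3):
    """Retrieve relevant documents and prepend them to the prompt so the
    model can answer from data it was never trained on."""
    query_vector = embed(question)                     # vectorize the question
    docs = vector_store.search(query_vector, k=top_k)  # nearest-neighbor lookup
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.generate(prompt)
```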
AI models don’t currently have built-in access controls – they don’t know that they should answer certain questions for some users and not for others. So RAG is also used to hold information that shouldn’t be part of an AI model’s base training.
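One common way to enforce that, sketched below using the same hypothetical objects as the previous example, is to filter retrieved documents against the user’s permissions before they ever reach the model’s context. The metadata schema and group check here are assumptions for illustration, not part of any Red Hat product.

```python
# Sketch of per-user access control applied at retrieval time (hypothetical schema).
def retrieve_for_user(question, user, embed, vector_store, top_k=3):
    """Only documents the user is allowed to read are placed in the model's
    context, so restricted content never reaches the model at all."""
    query_vector = embed(question)
    candidates = vector_store.search(query_vector, k=top_k * 5)  # over-fetch
    allowed = [doc for doc in candidates
               if doc.metadata.get("allowed_group") in user.groups]
    return allowed[:top_k]
```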
Then RHEL AI has the tools developers need to create identity and access controls around the models and the systems that access them and, again, OpenShift AI can be used to deploy these applications and controls at scale. “And all that comes with enterprise support and intellectual property indemnification, not only for the software, but for the model as well,” Katarki says.
Microsoft, Google, AWS, Adobe, Canva, and other providers also offer legal indemnification for users of their AI models. But these are mostly commercial models and run in the provider’s own environment. Red Hat’s indemnification covers the Granite open source models no matter where they are deployed. “That’s really an important value that we’re bringing to the customer,” says Katarki.
Hybrid AI future
Smaller enterprises might be happy to wait for their vendors to add generative AI to their platforms, or to pick a single cloud AI platform and stick with it. But larger enterprises, companies that see generative AI as a competitive advantage, organizations with particular regulatory requirements, or those that want to scale their gen AI projects without paying the steep API costs of commercial vendors might prefer more control and flexibility.
“Customers know that they need a hybrid AI story, across cloud, on-prem and edge,” says Stephen Huels, vice president and general manager of Red Hat’s AI business unit. “AI has become the ultimate hybrid workload.”
“There is a desire among enterprises to take more ownership of the assets that go into building generative AI outcomes,” confirms Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at Omdia. “A big part of that is to be able to optimize for spend – and that’s really where Red Hat shines.”
Part of what helps make this possible is containerization, he adds.
Containers allow enterprises to easily move workloads between on-prem data centers, private cloud, and public clouds, depending on need, and make it easier to scale infrastructure up and down. With the OpenShift container platform and Ansible automation platform, Red Hat is streamlining the creation, deployment and management of complicated architectures, such as those required by generative AI, Shimmin says.
“Red Hat is appealing to a definite need in both the data science and developer communities,” he says. “Both hard core data scientists and hard core applications developers are looking to build with generative AI, but to do it right, and to scale it, you’re going to be working in containerized environments.”
This is what makes Red Hat’s newest set of gen AI offerings stand out, he says.
Other vendors also offer the tools needed to build or customize models. “On the generative AI and predictive AI front Red Hat competes with [Google Cloud’s] Vertex AI, [Amazon] Bedrock, Azure AI, DataRobot, Dataiku, Databricks…” Shimmin says. “What’s unique about Red Hat, however, is that it’s vertically integrating the data science stack across OS, cloud, and data science.”
Smaller context window, smaller models
One major limitation of Red Hat’s AI strategy today is that the particular Granite models Red Hat is using in RHEL AI and for its Lightspeed assistant have a context window of 8,192 tokens. The context window is the amount of text, measured in tokens, that a model can take into account in a single session of prompts and responses. Granite’s window is only large enough for about 24 pages of text.
By comparison, OpenAI’s GPT-4 can handle as many as 128,000 tokens and Anthropic’s Claude 3 has 200,000. That means you can upload an entire full-length novel into Claude AI. And Google’s new Gemini model can handle 1 million tokens, enough for several Harry Potter books at once.
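Those page estimates follow from a rough rule of thumb, roughly 0.75 English words per token and about 250 words per page. Neither figure is exact, but a quick calculation shows the scale of the gap:

```python
# Back-of-the-envelope conversion from context-window tokens to pages.
# Assumes ~0.75 English words per token and ~250 words per page (rules of thumb).
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 250

def approx_pages(tokens):
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

for name, window in [("Granite (RHEL AI)", 8_192),
                     ("GPT-4", 128_000),
                     ("Claude 3", 200_000),
                     ("Gemini", 1_000_000)]:
    print(f"{name:>18}: ~{approx_pages(window):,.0f} pages")
# Granite works out to roughly 25 pages; Gemini to roughly 3,000.
```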
For companies looking to summarize or analyze long documents or large blocks of code, a longer context window is extremely useful.
A context window of 8,192 tokens is “obviously a constraint,” says IDC’s Rosen. “If you’re doing things that involve a lot of code or text, that would be an issue.”
But there are many use cases where a small context window will do the job, she adds. “If you use this for questions and answers, that’s a horizontal use case that applies to a lot of contexts,” she says. The background information that the model needs to have to answer the questions, such as a company’s knowledge base, can be added to the model via fine-tuning or RAG embedding, she says.
The Granite models are also smaller than some of their competitors, ranging from 3 billion to 34 billion parameters. By comparison, OpenAI’s GPT-3.5 has about 175 billion parameters and GPT-4 reportedly has more than a trillion. Meta’s Llama 3 comes in 8-billion-parameter and 70-billion-parameter sizes, with a 400-billion-parameter model currently in training.
Smaller models sometimes perform as well as or better than larger ones but, in general, the more parameters a model has, the smarter it is.
Still, there are use cases in which a smaller model can work just fine – and be significantly faster and cheaper.
“I heard a couple of customers at the conference mention that the model is going to be good enough,” says IDC’s Rosen, who attended the Red Hat Summit. “One customer said, ‘A 70-billion-parameter model is not useful for us. We can’t handle it. We’re a health care organization and we don’t have the resources to run that bigger model.'”
The last missing piece of the AI puzzle is agents. Agents are a more recent development in the generative AI space, used to handle complex, multistep workflows that involve planning, delegation, testing, and iteration.
Microsoft’s AutoDev, for example, uses autonomous AI agents to create a fully automated software development framework. If Red Hat does support agents at some point, it would be in OpenShift AI, which is the MLOps platform, says Red Hat’s Katarki. “That’s where I would say your AI agents would live and be connected to do various kinds of agent workflows,” he says.