Is a Small Language Model Better Than an LLM for You?

While it’s tempting to brush aside seemingly minimal AI model token costs, those are only one line item in the total cost of ownership (TCO) calculation. Still, managing model costs is the right place to start in getting control of the total. Choosing the right-sized model for a given task is the imperative first step. But it’s also important to remember that when it comes to AI models, bigger is not always better and smaller is not always smarter.
“Small language models (SLMs) and large language models (LLMs) are both AI-based models, but they serve different purposes,” says Atalia Horenshtien, head of the data and AI practice in North America at Customertimes, a digital consultancy firm.
“SLMs are compact, efficient models tailored for specific tasks and domains. LLMs are massive models that require significant resources, shine in more complex scenarios, and fit general and versatile cases,” Horenshtien adds.
While it makes sense in terms of performance to choose the right-sized model for the job, some would argue that model size isn’t much of a cost factor, even though larger models cost more than smaller ones.
“Focusing on the price of using an LLM seems a bit misguided. If it is for internal use within a company, the cost usually is less than 1% of what you pay your employees. OpenAI, for example, charges $60 per month for an Enterprise GPT license for an employee if you sign up for a few hundred. Most white-collar employees are paid more than 100x that, and even more as fully loaded costs,” says Kaj van de Loo, CPTO, CTO, and chief innovation officer at UserTesting.
Instead, this argument goes, the cost should be viewed in a different light.
“Do you think using an LLM will make the employee more than 1% more productive? I do, in every case I have come across. It [focusing on the price] is like trying to make a business case for using email or video conferencing. It is not worth the time,” van de Loo adds.
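Van de Loo’s back-of-the-envelope argument can be sketched in a few lines. The figures below are illustrative assumptions (the $60/month seat price is from the quote above; the loaded monthly cost is a placeholder), not any vendor’s actual price list.

```python
# Break-even check: a per-seat license fee vs. an employee's fully
# loaded cost. If the productivity gain exceeds this ratio, the
# license pays for itself.

def license_cost_ratio(monthly_license: float, monthly_loaded_cost: float) -> float:
    """Fraction of an employee's fully loaded monthly cost spent on the license."""
    return monthly_license / monthly_loaded_cost

# Assumed numbers: $60/month seat, $12,500/month fully loaded cost
ratio = license_cost_ratio(60, 12_500)
print(f"License is {ratio:.2%} of loaded cost")  # prints: License is 0.48% of loaded cost
# The tool pays for itself if it makes this employee >0.48% more productive.
```

Under these assumptions, even a fraction-of-a-percent productivity gain covers the license, which is the heart of van de Loo’s point.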
Size Matters but Maybe Not as You Expect
On the surface, arguing about model sizes seems a bit like splitting hairs. After all, a small language model is still typically large. An SLM is generally defined as having fewer than 10 billion parameters, but that leaves a lot of leeway: an SLM can technically have only a few thousand parameters, although most people define an SLM as having between 1 billion and 10 billion.
As a point of reference, medium language models (MLMs) are generally defined as having between 10 billion and 100 billion parameters, while large language models have more than 100 billion. Sometimes MLMs are lumped into the LLM category too, because what’s a few extra billion parameters, really? Suffice it to say, they’re all big, with some bigger than others.
In case you’re wondering, parameters are the internal variables, or weights, that a model adjusts during training. They are what enable models to learn, but adding more of them adds more complexity and compute cost, too.
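Parameter counts translate directly into hardware requirements, which is one concrete reason size matters. A rough rule of thumb, sketched below, is that the weights alone need parameters × bytes-per-parameter of memory; real serving adds activation and KV-cache overhead on top, so treat this as a lower bound.

```python
# Rough memory footprint of a model's weights at inference time.
# fp16/bf16 stores each parameter in 2 bytes.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate GB needed just to hold the weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(7))    # a 7B SLM in fp16  -> ~14 GB (fits one GPU)
print(weight_memory_gb(175))  # a 175B LLM in fp16 -> ~350 GB (multi-GPU territory)
```

This gap between "fits on one accelerator" and "needs a cluster" is a large part of why SLMs are cheaper to run and why they are the only option on edge devices.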
“Borrowing from hardware terminology, an LLM is like a system’s general-purpose CPU, while SLMs often resemble ASICs — application-specific chips optimized for specific tasks,” says Eran Yahav, an associate professor in the computer science department at the Technion – Israel Institute of Technology and a distinguished expert in AI and software development. Yahav has a research background in static program analysis, program synthesis, and program verification from his roles at IBM Research and the Technion. Currently, he is CTO and co-founder of Tabnine, an AI coding assistant for software developers.
To reduce issues and level-up the advantages in both large and small models, many companies do not choose one size over the other.
“In practice, systems leverage both: SLMs excel in cost, latency, and accuracy for specific tasks, while LLMs ensure versatility and adaptability,” adds Yahav.
As a general rule, the main differences between model sizes come down to performance, use cases, and resource consumption. But creative use of any sized model can easily blur the line between them.
“SLMs are faster and cheaper, making them appealing for specific, well-defined use cases. They can, however, be fine-tuned to outperform LLMs and used to build an agentic workflow, which brings together several different ‘agents’ — each of which is a model — to accomplish a task. Each model has a narrow task, but collectively they can outperform an LLM,” explains Mark Lawyer, RWS’ president of regulated industries and linguistic AI.
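The agentic pattern Lawyer describes can be sketched as a chain of narrow agents, each imagined as a small fine-tuned model with one job. The functions below are stand-ins for model calls, not real APIs; the structure, routing, drafting, and a review gate, is the point.

```python
# A minimal sketch of an agentic workflow built from narrow "agents".
# Each function stands in for a small, task-specific model.

def classify_intent(text: str) -> str:
    # Agent 1 (stand-in for a small classifier model): route the request
    return "refund" if "refund" in text.lower() else "general"

def draft_reply(intent: str) -> str:
    # Agent 2 (stand-in for a small generation model tuned per intent)
    templates = {
        "refund": "We're sorry. Your refund is being processed.",
        "general": "Thanks for reaching out. How can we help?",
    }
    return templates[intent]

def review_reply(reply: str) -> bool:
    # Agent 3 (stand-in for a small policy/guardrail model)
    return "sorry" in reply.lower() or "thanks" in reply.lower()

def workflow(ticket: str) -> str:
    intent = classify_intent(ticket)   # narrow task: routing
    reply = draft_reply(intent)        # narrow task: drafting
    assert review_reply(reply)         # narrow task: review gate
    return reply

print(workflow("I want a refund for my order"))
```

Each stage stays cheap and auditable on its own, which is what lets a pipeline of small models compete with one large generalist.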
There’s a caveat in defining SLMs versus LLMs in terms of task-specific performance, too.
“The distinction between large and small models isn’t clearly defined yet,” says Roman Eloshvili, founder and CEO of XData Group, a B2B software development company that exclusively serves banks. “You could say that many SLMs from major players are essentially simplified versions of LLMs, just less powerful due to having fewer parameters. And they are not always designed exclusively for narrow tasks, either.”
The ongoing evolution of generative AI is also muddying the issue.
“Advancements in generative AI have been so rapid that models classified as SLMs today were considered LLMs just a year ago. Interestingly, many modern LLMs leverage a mixture of experts architecture, where smaller specialized language models handle specific tasks or domains. This means that behind the scenes SLMs often play a critical role in powering the functionality of LLMs,” says Rogers Jeffrey Leo John, co-founder and CTO of DataChat, a no-code, generative AI platform for instant analytics.
In for a Penny, in for a Pound
SLMs are the clear favorite when the bottom line is the top consideration. They are also the only choice when a small form factor comes into play.
“Since the SLMs are smaller, their inference cycle is faster. They also require less compute, and they’re likely your only option if you need to run the model on an edge device,” says Sean Falconer, AI entrepreneur in residence at Confluent.
However, the cost differential between model sizes stems from more than direct usage fees such as per-token charges.
“Unforeseen operational costs often creep in. When using complex prompts or big outputs, your bills may inflate. Background API calls can also very quickly add up if you’re embedding data or leveraging libraries like ReAct to integrate models. It is for this reason scaling from prototype to production often leads to what we call bill shock,” says Steve Fleurant, CEO at Clair Services.
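The “bill shock” Fleurant describes is easy to reproduce with simple arithmetic: per-request costs look tiny, but longer prompts, bigger outputs, and higher volume multiply together. The prices and volumes below are illustrative placeholders, not any provider’s actual rates.

```python
# Why scaling from prototype to production inflates the bill:
# cost = tokens per request x price per token x request volume.

def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Monthly spend, given per-million-token input/output prices."""
    per_request = (in_tokens * in_price_per_m +
                   out_tokens * out_price_per_m) / 1_000_000
    return per_request * requests_per_day * 30

# Prototype: 100 short requests/day
print(round(monthly_cost(100, 500, 200, 2.0, 8.0), 2))        # 7.8
# Production: 50,000 requests/day with long RAG-augmented prompts
print(round(monthly_cost(50_000, 4_000, 800, 2.0, 8.0), 2))   # 21600.0
```

The same application goes from pocket change to a five-figure monthly line item once prompt size and traffic grow, which is exactly the prototype-to-production jump Fleurant warns about.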
There’s a whole pile of other associated costs to consider in the total cost of ownership calculation too.
“It is clear the long-term operational costs of LLMs will be more than just software capabilities. For now, we are seeing indications that there is an uptick in managed service provider support for data management, tagging, cleansing and governance work, and we expect that trend to grow in the coming months and years. LLMs, and AI more broadly, put immense pressure on an organization to validate and organize data and make it available to support the models, but most large enterprises have underinvested in this work over the last decades,” says Alex Bakker, distinguished analyst, with global technology research and advisory firm ISG.
“Over time, as organizations improve their data architectures and modernize their data assets, the overhead of remediation work will likely decrease, but costs associated with the increased use of data — higher network consumption, greater hardware requirements for supporting computations, etc. — will increase. Overall, the advent of AI probably represents a step-change increase in the amount of money organizations spend on their data,” Bakker adds.
Other standard business costs apply to models, too, and are adding strain to budgets. For example, backup models are a necessity and an additional cost.
“Risk management strategies must account for provider-specific characteristics. Organizations using OpenAI’s premium models often maintain Anthropic or Google alternatives as backups, despite the price differential. This redundancy adds to overall costs but is essential for business continuity,” says David Eller, group data product manager at Indicium.
There are other line items more specific to models that are bearing down on company budgets too.
“Even though there are API access fees to consider, the combined cost of operational overhead, fine-tuning, and compute resources can easily exceed them. The total cost of ownership should be considered thoroughly before implementing AI technologies in the organization,” says Cache Merrill, founder of Zibtek, a software development company.
Merrill cites the following specific costs to watch for and budget:
- Installation costs: Running fine-tuned or proprietary LLMs may require NVIDIA A100 or H100 GPUs, which can cost $25,000 or more each. In contrast, enterprise-grade cloud computing services cost between $5,000 and $15,000 for consistent usage.
- Model fine-tuning: Building a custom LLM can cost tens of thousands of dollars or more, depending on the dataset and architectural choices.
- Software maintenance: Regular model updates bring ongoing security checks and compliance work, with costs that grow at each stage of scale, something usually neglected early in a project.
- Human oversight: Employing domain experts to review and advise on LLM output is becoming more common, which adds to payroll.
Some of the aforementioned costs are reduced by using SLMs, but some are not, or not significantly. Given that many organizations use both large and small models, or an assortment of model types, it’s fair to say that AI isn’t cheap, and we haven’t yet touched on energy and environmental costs. The best advice is to first establish solid use cases, then choose models that precisely fit those tasks and offer a solid path toward the ROI you’re aiming for.
SLM, LLM, and Hybrid Examples
If you’re unsure of, or have yet to experiment with, small language models, here are a few examples to give you a starting point.
Horenshtien says SLM examples on her list include Mistral 7B, LLaMa 3, Phi 3, and Gemma. Top LLMs on her list are GPT-4, Claude 3.5, Falcon, Gemini, and Command R.
Examples of real-world SLM vs. LLM use cases that Horenshtien says her company sees include:
- In manufacturing, SLMs can predict equipment failures, while LLMs provide real-time insights from IoT data.
- In retail, SLMs personalize recommendations; LLMs power virtual shopping assistants.
- In healthcare, SLMs classify records, while LLMs summarize medical research for clinicians.
Meanwhile, Eloshvili says that “some of the more solid and affordable versions [of SLMs and other LLM alternatives], in my opinion, would include Google Nano, Meta Llama 3 Small, Mistral 7B and Microsoft Phi-3 Mini.”
But everyone understandably has their own list of SLMs based on varying criteria of importance to the beholder.
For example, Joseph Regensburger, vice president of research at Immuta, says “some cost-efficient SLM options include GPT-4o-mini, Gemini-flash, AWS Titan Text Lite, and Titan Text Express.”
“We use both LLMs and SLMs. The choice between these two models is use-case-specific. We have found SLMs are sufficiently effective for a number of traditional natural language processing tasks, such as sentence analysis. SLMs tend to handle the ambiguities inherent in language better than rule-based NLP approaches, at the same time offering a more cost-effective solution than LLMs. We have found that we need LLMs for tasks involving logical inference, text generation, or complex translation tasks,” Regensburger explains.
Rogers Jeffrey Leo John urges companies to consider open-source SLMs too. “If you are looking for small LLMs for your task, here are some good open-source/open-weight models to start with: Mistral 7B, Microsoft Phi, Falcon 7B, Google Gemma, and Llama 3 8B.”
And if you’re looking for some novel approaches to SLMs or a few other alternatives, Anatolii Kasianov, CTO of My Drama, a vertical video platform for unique and original short dramas and films, recommends: DistilBERT, TinyBERT, ALBERT, GPT-Neo (smaller versions), and FastText.
At the end of the day, the right LLM or SLM depends entirely on the needs of your projects or tasks. It’s also prudent to remember that “Generative AI doesn’t have to be the hammer for every nail,” as Falconer puts it.