What are Small Language Models, and are they better than LLMs?

Why Small Large Language Models May be Better: AI Language Models Need to Shrink

This hybrid approach worked better than either a pure transformer model or a pure Mamba model. But the key thing to understand is that Mamba has the potential to combine transformer-like performance with the efficiency of conventional RNNs. Another line of research has focused on efficiently scaling attention across multiple GPUs. One widely cited paper describes ring attention, which divides input tokens into blocks and assigns each block to a different GPU.
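The blockwise idea behind ring attention can be sketched in plain Python. This is a single-process simulation, not a real multi-GPU implementation: each "device" holds one block of queries, key/value blocks rotate around the ring, and each device keeps running softmax statistics so the final result matches full attention. Function names here are my own, not from the paper.

```python
import math

def softmax_rows(scores):
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def full_attention(Q, K, V):
    # Reference: standard softmax(Q K^T) V, using plain lists.
    scores = [[sum(q[t] * k[t] for t in range(len(q))) for k in K] for q in Q]
    P = softmax_rows(scores)
    return [[sum(P[i][j] * V[j][t] for j in range(len(V)))
             for t in range(len(V[0]))] for i in range(len(Q))]

def ring_attention(Q, K, V, n_dev):
    # Simulated ring attention: tokens are split into blocks, one per
    # "device". KV blocks rotate around the ring; each device updates
    # running (max, normalizer, accumulator) stats -- an online softmax --
    # so no device ever sees the full key/value sequence at once.
    n, d = len(Q), len(Q[0])
    bs = n // n_dev
    q_blocks = [Q[i * bs:(i + 1) * bs] for i in range(n_dev)]
    kv_blocks = [(K[i * bs:(i + 1) * bs], V[i * bs:(i + 1) * bs])
                 for i in range(n_dev)]
    out_blocks = []
    for dev in range(n_dev):
        Qb = q_blocks[dev]
        m = [-math.inf] * bs          # running row max
        l = [0.0] * bs                # running softmax normalizer
        acc = [[0.0] * d for _ in range(bs)]  # unnormalized output
        for step in range(n_dev):
            Kb, Vb = kv_blocks[(dev + step) % n_dev]  # block at this hop
            for i in range(bs):
                scores = [sum(Qb[i][t] * k[t] for t in range(d)) for k in Kb]
                m_new = max(m[i], max(scores))
                scale = math.exp(m[i] - m_new)  # rescale old stats
                l[i] *= scale
                acc[i] = [a * scale for a in acc[i]]
                for j, s in enumerate(scores):
                    w = math.exp(s - m_new)
                    l[i] += w
                    for t in range(d):
                        acc[i][t] += w * Vb[j][t]
                m[i] = m_new
        out_blocks.append([[a / l[i] for a in acc[i]] for i in range(bs)])
    return [row for block in out_blocks for row in block]
```

Because the online-softmax rescaling preserves the exact normalizer, the rotated blockwise computation reproduces full attention while each device only ever holds one KV block in memory.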

  • Accuracy is paramount for businesses in areas ranging from customer service to financial analysis.
  • Models from smaller vendors tend to be much smaller than those from established tech vendors, and therefore far less powerful or adaptable.
  • SLMs can minimize such risks by training on carefully curated, domain-specific datasets.
  • These and other solutions can go a long way toward helping patients stay engaged with their health goals and remain adherent over time.
  • The company offers a range of language models that cater to specific industries.

This makes them attractive for applications that handle sensitive data, such as in healthcare or finance, where data breaches could have severe consequences. Additionally, the reduced computational requirements of SLMs make them more feasible to run locally on devices or on-premises servers, rather than relying on cloud infrastructure. This local processing can further improve data security and reduce the risk of exposure during data transfer.

Interestingly, even smaller models like Mixtral 8x7B and Llama 2 70B are showing promising results in certain areas, such as reasoning and multiple-choice questions, where they outperform some of their larger counterparts. This suggests that model size may not be the sole determining factor in performance, and that other aspects like architecture, training data, and fine-tuning techniques could play a significant role.
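A back-of-envelope calculation shows why smaller or quantized models are feasible on local devices. The 7B parameter count and the precisions below are illustrative assumptions, not figures from any specific product, and the estimate covers weights only (activations and the KV cache add overhead at inference):

```python
def model_memory_gb(n_params, bits_per_weight):
    """Rough weight-memory estimate: parameters x bytes per weight.

    Ignores activation memory and the KV cache, which add further
    overhead during inference.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A hypothetical 7B-parameter SLM at different precisions:
fp16_gb = model_memory_gb(7e9, 16)  # 14.0 GB -- needs a data-center GPU
int4_gb = model_memory_gb(7e9, 4)   # 3.5 GB -- fits on a laptop or phone
```

Quantizing from 16-bit to 4-bit weights cuts the footprint by 4x, which is a large part of why on-device and on-premises deployment of SLMs is practical.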

For example, to create Palmyra-Med — a healthcare-oriented model — Writer took its base model, Palmyra-40B, and applied instruction fine-tuning. Through this process, the company trained the model on curated medical datasets from two publicly available sources, PubMedQA and MedQA. Dan Diasio, Ernst & Young’s Global Artificial Intelligence Consulting Leader, agreed, adding that there’s currently a backlog of GPU orders.
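Instruction fine-tuning of this kind starts by reformatting QA records into prompt/response pairs. The sketch below is illustrative only: the field names and template are assumptions, not the actual PubMedQA/MedQA schema or Writer's pipeline.

```python
def to_instruction_example(record):
    """Turn a QA record into an instruction-tuning prompt/response pair.

    The 'context'/'question'/'answer' fields and the prompt template are
    hypothetical, chosen to illustrate the general reformatting step.
    """
    prompt = (
        "Answer the medical question using the context provided.\n\n"
        f"Context: {record['context']}\n"
        f"Question: {record['question']}"
    )
    return {"prompt": prompt, "response": record["answer"]}

example = to_instruction_example({
    "context": "Metformin is a first-line therapy for type 2 diabetes.",
    "question": "What is a first-line drug for type 2 diabetes?",
    "answer": "Metformin",
})
```

Fine-tuning on pairs like these teaches the base model to follow domain-specific instructions rather than merely continue text.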

That same month, Microsoft announced its GPT-4-based Dynamics 365 Copilot, which can automate some CRM and ERP tasks. Other genAI platforms can assist in writing code or performing HR functions, such as ranking job applicants from best to worst or recommending employees for promotions. Though “mega LLMs” use well-understood technology — and continue to improve — they can only be developed and maintained by tech giants with the resources, money, and skills to do so, Litan argued.

Over the past few weeks, we have seen an ever-increasing number of companies integrate generative artificial intelligence (AI) into their products. ChatGPT and other large language models are being added as features to products from Notion, Salesforce, Shopify, Quizlet, and others. What’s making this all possible now, he added, is the advances in the language models themselves. The second necessary element is a proprietary, accurate data set that is large enough to train AI across different subject verticals.

Companies Embracing Small Language Models

But I am pretty confident that scaling up transformer-based frontier models isn’t going to be a solution on its own. If we want models that can handle billions of tokens—and many people do—we’re going to need to think outside the box. So while the benefits of longer context windows are obvious, the best strategy to get there is not. In the short term, AI companies may continue using clever efficiency and scaling hacks (like FlashAttention and ring attention) to scale up vanilla LLMs. Longer term, we may see growing interest in Mamba and perhaps other attention-free architectures. Or maybe someone will come up with a totally new architecture that renders transformers obsolete.

In this article, I’ll discuss why LLMs are useful and scary at the same time. I want to give you a better idea of what these tools offer and why you need to stay cautious about them. “If you’re a retailer and you’re going to toss tens of thousands of products into the model over the next few years, that’s certainly an LLM,” Sahota says. Clinicians could use an SLM to analyze patient data, extract relevant information, and generate diagnoses and treatment options.

Tools, Frameworks And Real-World Implementations

A chip shortage not only creates problems for tech firms making LLMs, but also for user companies seeking to tweak models or build their own proprietary LLMs. In recent months, ChatGPT, a large language model (LLM), has been highlighted across countless outlets, but many IT experts are still figuring out its potential. Some people think ChatGPT might replace their jobs, while others believe it might streamline their work. If the dataset is very small, controlled, and available, such as HR documents or product descriptions, it makes great sense to use an SLM.

Machine learning YouTuber Yannic Kilcher wasn’t too impressed by Google’s approach. In 1999, Nvidia started selling graphics processing units (GPUs) to speed up the rendering of three-dimensional games like Quake III Arena. The job of these PC add-on cards was to rapidly draw thousands of triangles that made up walls, weapons, monsters, and other objects in a game. Chipmakers started making CPUs that could execute more than one instruction at a time. But they were held back by a programming paradigm that requires instructions to mostly be executed in order.

The adoption of generative artificial intelligence (genAI) tools is on a steep incline. Organizations plan to invest 10% to 15% more on AI initiatives over the next year and a half compared to calendar year 2022, according to an IDC survey of more than 2,000 IT and line-of-business decision makers. LLMs not only allow you to create content, but you can use them to generate content in various languages. Organizations are more likely to implement a portfolio of models, each selected to suit a specific scenario. Sutskever’s comment comes amid speculation that the speed of progress in large language models (LLMs) was hitting a wall as scaling approached its limits.
