Numenta achieves 123X inference performance improvement for BERT transformers on Intel Xeon processor family

REDWOOD CITY, Calif.–(BUSINESS WIRE)–Numenta Inc. is applying decades of neuroscience research to the development of deep learning technologies, reporting breakthrough achievements in AI. In collaboration with Intel, Numenta reports that it has achieved unprecedented performance gains by applying its brain-based technology to transformer networks running on Intel Xeon processors.

Numenta highlights these remarkable results on two Intel products announced today, the 4th Gen Intel Xeon Scalable processors (formerly codenamed Sapphire Rapids) and Intel Xeon CPU Max Series (formerly codenamed Sapphire Rapids + HBM). These results show the first commercial applications of Numenta’s technology in Conversational AI solutions.

Breaking latency barriers in Conversational AI

High-throughput, low-latency technologies are a requirement for Conversational AI, which allows consumers to engage in human-like interactions with computers and is a rapidly growing market estimated to be a $40 billion industry by 2030. Transformer networks are the deep learning model of choice for these applications. But despite their high accuracy, the size and complexity of Transformers have made them expensive to deploy, until now.

In one notable example, leveraging Intel’s new Intel Advanced Matrix Extensions (Intel AMX), Numenta reports a 123X throughput improvement vs. current-generation AMD Milan CPU implementations for BERT inference on short text sequences, while breaking the 10ms latency barrier required for many language model applications. BERT is the popular transformer-based machine learning technology for Natural Language Processing (NLP) pre-training developed by Google.

Combining its proprietary technology with 4th Gen Intel Xeon Scalable processors, Numenta also reports a 62x throughput improvement over Intel’s previous-generation Intel Xeon Scalable processors.

Numenta’s dramatic acceleration of Transformer networks provides high throughput with ultra-low latencies for inferences with 4th Gen Intel Xeon Scalable processors. These results illustrate a cost-effective option for running the large deep learning models needed for Conversational AI and other real-time AI applications.

“These ground-breaking results turn Transformers from a cumbersome technology into a high-performance solution for real-time NLP applications and open up new opportunities for companies with performance-sensitive AI applications,” commented Subutai Ahmad, CEO of Numenta. “Customers will be able to use the combination of Numenta and 4th Gen Intel Xeon Scalable processors to deploy real-time apps in an easy and cost-effective way.”

“Numenta’s results on Intel’s new hardware enable the deployment of state-of-the-art transformers at an unmatched price/performance point, greatly expanding the design space for conversational interaction and ultimately increasing top-line value,” said Tom Ngo, CEO of, a leading Conversational AI company whose Sales Accelerator product helps high-touch sales teams in multiple industries meet more leads and shorten their sales cycles.

Unmatched capacity for high-volume document processing

Numenta’s AI technology also dramatically accelerates NLP applications that rely on analyzing large collections of documents. When using Transformers for document understanding, long sequence lengths are required to include the full context of the document. These long sequences require high data transfer rates, and off-chip bandwidth thus becomes the limiting factor. Using the new Intel Xeon CPU Max Series, Numenta demonstrates that it can optimize the BERT-Large model to process large text documents, delivering an unprecedented 20x throughput improvement for long sequence lengths of 512.

“Numenta and Intel are collaborating to deliver significant performance gains to Numenta’s AI solutions through the Intel Xeon CPU Max Series and 4th generation Intel Xeon Scalable processors. We are excited to work together to unlock significant throughput performance acceleration for previously bandwidth- or latency-bound AI applications such as Conversational AI and processing large documents,” said Scott Clark, vice president and general manager of AI and HPC Application Level Engineering, Intel.

“This type of innovation is absolutely transformative for our customers, enabling cost-effective scaling for the first time,” Ahmad added.


To give customers the benefit of its AI products and solutions as quickly as possible, Numenta recently announced a private beta program. Numenta actively engages with startups and Global 100 companies to apply its platform technology to a wide range of NLP and Computer Vision applications.

Customers can apply for the beta program at

About Numenta

Numenta has developed breakthrough advances in AI technology that enable customers to achieve from 10 to over 100X improvement in performance across broad use cases, such as natural language processing and computer vision. Founded in 2005 by computing industry pioneers Jeff Hawkins and Donna Dubinsky, Numenta has two decades of research deriving proprietary technology from neuroscience. Leveraging fundamental insights from its neuroscience research, Numenta has defined new architectures, data structures and algorithms that deliver disruptive performance improvements. Numenta is engaged with several Global 100 companies to apply its platform technology across the full spectrum of AI, from model development to deployment – ultimately enabling entirely new categories of applications.

Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
