The Intel Lunar Lake CPU platform is demonstrating impressive AI inference performance with Meta's latest LLaMA 3.2 models.
Meta has just launched LLaMA 3.2, adding more models for AI tasks. The original LLaMA (Large Language Model Meta AI), designed to answer user queries like other chat-focused LLMs, debuted in February 2023, and the more recent LLaMA 3 was limited to 8B and 70B parameter models. LLaMA 3.2 expands the lineup to new parameter sizes, all of which are fully supported on Intel AI hardware platforms.
Intel published an article explaining the performance gains LLaMA 3.2 sees on Intel AI hardware such as Intel Gaudi AI accelerators, Intel Xeon processors, Intel Core Ultra "Lunar Lake" CPUs, and Intel Arc graphics. In a LinkedIn post, Robert Hallock, Intel's VP and General Manager of Client AI and Technical Marketing, claimed that the Lunar Lake Core Ultra processors are delivering great performance with LLaMA 3.2.
According to the claims, the flagship Intel Core Ultra 9 288V "Lunar Lake" CPU can achieve an ultra-low per-token latency of just 28.5 ms with 32 input tokens and 31.4 ms with 1024 input tokens on the 3B model. That translates to roughly 32-35 tokens per second, which is impressive AI inference performance.
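The conversion from latency to throughput is simple arithmetic: tokens per second is 1000 divided by the per-token latency in milliseconds. A quick Python check of the figures above (assuming, as the article's math implies, that the quoted latencies are per output token):

```python
# Convert Intel's claimed per-token latencies (ms) into tokens per second.
latencies_ms = {"32 input tokens": 28.5, "1024 input tokens": 31.4}

for scenario, ms in latencies_ms.items():
    tokens_per_second = 1000.0 / ms  # 1000 ms in a second / ms per token
    print(f"{scenario}: {ms} ms/token -> {tokens_per_second:.1f} tokens/s")
```

This prints about 35.1 and 31.8 tokens/s, matching the 32-35 tokens per second range quoted above.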
LLaMA 3.2 adds 1B and 3B parameter text-only models for lighter text-based tasks, which were unavailable in the original 3.0 release. The latest version also introduces an 11B parameter multimodal model, which is much larger and can handle more complex operations such as interpreting images, charts, and graphs.
As illustrated in Intel's demo GIF, users with an Intel AI PC can run visual reasoning to analyze and interpret visual data and get meaningful responses. In the example, an Intel Core Ultra 9 288V "Lunar Lake" CPU with its built-in Arc 140V GPU runs the LLaMA 3.2 11B Vision Instruct model, which identifies objects in an image, analyzes their elements, and provides a text-based response that explains them.
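For readers who want to try something similar, here is a minimal sketch of local visual reasoning with the 11B Vision Instruct weights via Hugging Face transformers. The article does not specify the software stack used in Intel's demo, so the model ID, image file, and prompt below are illustrative assumptions:

```python
# Hypothetical sketch: visual Q&A with Llama 3.2 11B Vision Instruct.
# The model is gated on Hugging Face and requires accepting Meta's license.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # placeholder: any local chart or photo
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```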
Underpinning this is Intel's AI inference framework, OpenVINO, which optimizes models for Intel hardware to improve both performance and efficiency on PCs and edge devices. Intel's AI hardware also brings an NPU (Neural Processing Unit) to the Intel Core platforms and Intel Xe Matrix Extensions (XMX) acceleration to the built-in Intel Arc GPUs, helping Intel AI PCs achieve higher inference performance, especially for the LLaMA 3.2 11B model for image reasoning at the edge.
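For context on how an application targets those accelerators, OpenVINO exposes each one as a named device. A minimal sketch, assuming the openvino Python package and a model already converted to OpenVINO IR (the model path is a placeholder):

```python
# List the Intel compute devices OpenVINO can see and compile a model
# for one of them. "model.xml" is a placeholder for an IR file you have
# already exported (e.g. with optimum-intel or the ovc converter).
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("model.xml")         # OpenVINO IR (placeholder path)
compiled = core.compile_model(model, "NPU")  # or "GPU" to use the Arc XMX units
```

Here the target device is chosen explicitly; OpenVINO also offers an "AUTO" device that selects an available accelerator on its own.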