Meta, the parent company of Facebook and Instagram, has been making significant advancements in artificial intelligence (AI) technology.
The tech giant recently showcased its in-house AI inference chip, known as the Meta Training and Inference Accelerator (MTIA), which plays a crucial role in powering the ranking and recommendation models on its platforms.
Originally launched in 2023, this AI inference accelerator was designed to handle the computational requirements for AI-powered recommendations.
Now, Meta has shared updates about its next-generation MTIA, which promises to deliver even better performance.
The Journey So Far: MTIA Evolution
Meta’s first-generation MTIA chip was built to enhance the recommendation engines that drive content suggestions on platforms like Facebook and Instagram. The chip focuses specifically on AI inference, the stage where a trained model is run over incoming data to generate recommendations.
While the first version of the MTIA chip was a step forward, the company updated the chip in April 2024, doubling its compute power and memory bandwidth and making it more efficient at handling the growing demands of AI applications.
At the Hot Chips symposium in August 2024, Meta shared insights into the challenges of using GPUs (graphics processing units) for AI recommendations.
While GPUs are widely used for AI training, Meta revealed that they often struggle with inference in large-scale recommendation engines. This is due to the resource-intensive nature of these workloads, compounded by the growing demand for generative AI.
Addressing GPU Limitations
Meta has been focusing on specialized chips like MTIA to overcome the limitations of GPUs. One of the main challenges of using GPUs for recommendation engines is that their peak performance doesn’t always translate into effective performance in real-world applications.
Large-scale deployments can also be very resource-intensive, leading to increased costs and capacity constraints, especially with the growing demands for AI services.
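To make that gap concrete, here is a back-of-envelope utilization calculation. The numbers are hypothetical placeholders, not figures Meta has published; the point is that memory-bound recommendation inference can leave most of a GPU's datasheet peak unused.

```python
# Hypothetical illustration: peak vs. effective GPU throughput.
# Sparse embedding lookups in recommendation inference are memory-bound,
# so achieved compute can sit far below the advertised peak.

peak_tflops = 312.0      # assumed datasheet BF16 peak of a modern GPU
achieved_tflops = 25.0   # assumed measured throughput on a ranking model

utilization = achieved_tflops / peak_tflops
print(f"Effective utilization: {utilization:.1%}")  # ~8.0% of peak
```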
To address these issues, Meta has been improving the MTIA chip to offer better performance per watt and per total-cost-of-ownership (TCO) dollar.
The new MTIA chip is designed to handle models across multiple Meta services more efficiently, improving developer efficiency and speeding up deployment.
Next-Gen MTIA
The next-generation MTIA chip brings several upgrades, including a significant boost in performance. One of the key improvements is a generation-over-generation increase in GEMM (general matrix multiply) throughput of 3.5 times, reaching 177 TFLOPS at BF16 (bfloat16, a 16-bit floating-point format).
The design also adds hardware-based tensor quantization, allowing the chip to deliver accuracy close to FP32 (32-bit floating point) while computing at lower precision. It is likewise optimized for PyTorch Eager Mode, with jobs launching in under 1 microsecond and job replacement taking less than 0.5 microseconds.
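As a rough illustration of the accuracy comparison being made (not Meta's hardware quantization path), the following PyTorch sketch runs the same GEMM in BF16 and FP32 and measures the gap:

```python
# Minimal sketch: compare a BF16 GEMM against an FP32 reference.
# Illustrates the precision trade-off only; matrix sizes are arbitrary.
import torch

a = torch.randn(512, 512)
b = torch.randn(512, 512)

ref = a @ b                                   # FP32 reference GEMM
low = (a.bfloat16() @ b.bfloat16()).float()   # BF16 GEMM, upcast to compare

rel_err = (ref - low).abs().max() / ref.abs().max()
print(f"Max relative error, BF16 vs FP32: {rel_err:.2e}")
```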
In addition, the chip includes TBE (Table Batched Embedding) optimization, which improves the download and prefetch times for embedding indices, making them 2-3 times faster than in the previous generation.
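For readers unfamiliar with the access pattern involved, the sketch below shows a table-batched embedding lookup using stock torch.nn.EmbeddingBag. The table size, batch shape, and IDs are illustrative assumptions, not Meta's configuration:

```python
# Sketch of the embedding-lookup pattern TBE accelerates: gather rows
# from a large table for variable-length lists of sparse IDs, then pool.
import torch
import torch.nn as nn

# Illustrative table: one million rows, 128-dimensional embeddings.
table = nn.EmbeddingBag(num_embeddings=1_000_000, embedding_dim=128, mode="sum")

# Three samples' sparse IDs, flattened, with per-sample start offsets.
ids = torch.tensor([3, 17, 42, 7, 7, 99001])
offsets = torch.tensor([0, 3, 5])  # boundaries: ids[0:3], ids[3:5], ids[5:]

pooled = table(ids, offsets)
print(pooled.shape)  # torch.Size([3, 128])
```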
RISC-V Cores and LPDDR5 Memory
The MTIA chip is built on TSMC’s 5nm process and runs at 1.35 GHz, with a gate count of 2.35 billion. It delivers 354 TOPS (tera operations per second) for INT8 GEMM and 177 TFLOPS for FP16 GEMM.
To support this level of performance, the chip utilizes 128 GB of LPDDR5 memory with 204.8 GB/s of bandwidth, all within a 90-watt power envelope.
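These figures allow a quick back-of-envelope check of efficiency and balance; the arithmetic below uses only the numbers quoted above:

```python
# Derived figures from the published specs: efficiency per watt and the
# arithmetic intensity (ops per byte) needed to stay compute-bound.
int8_tops = 354.0      # TOPS, INT8 GEMM
power_w = 90.0         # watts, power envelope
bandwidth_gbs = 204.8  # GB/s, LPDDR5 bandwidth

print(f"Efficiency: {int8_tops / power_w:.1f} TOPS/W")  # ~3.9 TOPS/W

ops_per_byte = (int8_tops * 1e12) / (bandwidth_gbs * 1e9)
print(f"Ops per byte to saturate compute: {ops_per_byte:.0f}")  # ~1728
```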
One notable feature of the MTIA chip is its use of RISC-V cores, which include both scalar and vector extensions.
These cores allow for more efficient processing of AI tasks. Additionally, the MTIA system pairs the accelerators with dual CPUs for added processing power.
Interestingly, Meta has hinted at the possibility of expanding memory via the PCIe switch and CPUs, though it has not yet deployed this option.
Looking Ahead
As Meta continues to refine its MTIA chips, the company aims to improve the overall performance and efficiency of its recommendation engines.
This will not only enhance the user experience on platforms like Facebook and Instagram but also help Meta keep up with the growing demands for AI-driven content recommendations.
By investing in specialized AI chips, Meta is positioning itself to lead the way in AI-powered recommendations, reducing its reliance on traditional GPUs and making its services more efficient.