Many of the smart/IoT devices you’ll purchase are powered by some form of Artificial Intelligence (AI)—be it voice assistants, facial recognition cameras, or even your PC. These don’t work via magic, however, and need something to power all of the data-processing they do. For some devices that could be done in the cloud, by vast datacentres. Other devices will do all their processing on the devices themselves, through an AI chip.
But what is an AI chip? And how does it differ from the various other chips you may find in a device? This article will highlight the importance of AI chips, the different kinds of AI chips that are used for different applications, and the benefits of using AI chips in devices.
AI processing units and what they are for
Other chips and why they’re not great for AI
About the author
Albert Liu is the Founder and CEO of Kneron.
In the 1980s, we saw the rise of the personal computer. This proliferation was enabled by the CPU (central processing unit) which performs basic arithmetic, logic, controlling, and input/output operations specified by the instructions in a program. It is the brains of your computer. There are a number of giants in the CPU field, including Intel and AMD.
When speaking of evolution in CPUs, however, we must also mention ARM, whose chip architectures first appeared in personal computers in the 1980s but did not become dominant until the rise of mobile computing: the smartphone and, to a lesser extent, tablets. By 2005, 98% of all mobile phones sold used at least some form of ARM architecture. In 2013, 10 billion ARM-based chips were produced, and they were found in nearly 60 percent of the world’s mobile devices. ARM is an important part of the AI chip space, which we’ll talk about later.
Then, in the 1990s, real-time 3D graphics became increasingly common in arcade, computer and console games, which led to an increasing demand for hardware-accelerated 3D graphics. Yet another hardware giant, NVIDIA, rose to meet this demand with the GPU (graphics processing unit), specialized in computer graphics and image processing. In September 2020, NVIDIA announced a deal to purchase ARM for $40 billion.
The AI processing unit
While typically GPUs are better than CPUs when it comes to AI processing, they’re not perfect. The industry needs specialised processors to enable efficient processing of AI applications, modelling and inference. As a result, chip designers are now working to create processing units optimized for executing these algorithms. These come under many names, such as NPU, TPU, DPU, SPU etc., but a catchall term can be the AI processing unit (AI PU).
AI PUs were created to execute machine learning algorithms, typically by operating on predictive models such as artificial neural networks. They are usually classified as either training or inference chips, as these two processes are generally performed independently.
Some applications we already see in the real world:
- Monitoring a system or area for threats, such as a security system that uses real-time facial recognition (IP cams, door cameras, etc.)
- Chatbots for retail or businesses that interact with customers
- Natural language processing for voice assistants
AI processors vs GPUs
But wait a minute, some people may ask—isn’t the GPU already capable of executing AI models? Well yes, that’s true. The GPU does in fact have some properties that are convenient for processing AI models.
GPUs process graphics, which are two-dimensional or sometimes three-dimensional, and this requires parallel processing of multiple strings of functions at once. AI neural networks also require parallel processing, because they have nodes that branch out much like neurons do in the brain of an animal. The GPU does this part just fine.
However, neural networks also require convolution, and this is where the GPU stumbles. In short, GPUs are fundamentally optimized for graphics, not neural networks—they are at best a surrogate.
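To make the convolution workload concrete, here is a minimal sketch in plain Python of a 2D convolution (strictly, the cross-correlation used in most neural network layers) over a tiny image. The function name `conv2d` and the sample values are illustrative assumptions, not taken from any particular chip or library. Each output value is an independent sum of multiply-accumulate (MAC) operations, which is the kind of work an AI PU is built to run in bulk.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a 2D list with a smaller 2D kernel.

    Returns the output feature map and the number of multiply-accumulates used.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            # Slide the kernel over the image patch anchored at (r, c).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
                    macs += 1
            output[r][c] = acc
    return output, macs


# Hypothetical 4x4 "image" and 2x2 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

result, mac_count = conv2d(image, kernel)
print(result)     # 3x3 feature map
print(mac_count)  # 9 outputs x 4 MACs each
```

Even this toy case needs 36 multiply-accumulates for nine outputs. Real models repeat the same pattern across millions of pixels and many layers, which is why dedicated MAC hardware matters.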
Another important factor that needs to be taken into account is the accelerated rate of AI development at the moment. Researchers and computer scientists around the world are constantly elevating the standards of AI and machine learning at an exponential rate that CPU and GPU advancement, as catch-all hardware, simply cannot keep up with.
Moore’s Law states that the number of transistors in a dense integrated circuit (IC) doubles about every two years. But Moore’s Law is dying, and even at its best could not keep up with the pace of AI development.
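As a rough illustration of what a doubling every two years means, the sketch below projects a transistor count forward in time. The function name `projected_transistors` and the starting count are assumptions made for the example, not figures from any real chip.

```python
def projected_transistors(start_count, years, doubling_period_years=2.0):
    """Project a transistor count assuming one doubling per period."""
    return start_count * 2 ** (years / doubling_period_years)


# Hypothetical chip with one billion transistors today.
after_ten_years = projected_transistors(1_000_000_000, 10)
print(f"{after_ten_years:,.0f}")  # five doublings in ten years
```

Five doublings give a 32x increase over a decade, which is the ceiling this rule of thumb sets on general-purpose hardware gains from transistor count alone.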
The acceleration of AI will ultimately rely on a specialized AI accelerator, such as the AI PU. AI PUs are generally required for the following purposes:
- Accelerate the computation of machine learning tasks many times over (by nearly 10,000 times) compared to GPUs
- Consume less power and improve resource utilization for machine learning tasks compared to GPUs and CPUs
The components of an AI SoC
While the AI PU forms the brain of an AI System on a chip (SoC), it is just one part of a complex series of components that makes up the chip. Here, we’ll break down the AI SoC, the components paired with the AI PU, and how they work together.
First is the AI PU itself. As outlined above, this is the neural processing unit or matrix multiplication engine where the core operations of an AI SoC are carried out. We’ve already gone into plenty of detail there, but it’s worth pointing out that for AI chipmakers, this is also the secret sauce that makes one AI SoC stand out from all the others; like a watermark of the actual capabilities of your team.
Next are the controllers. These are processors, usually based on RISC-V (open-source, designed by the University of California Berkeley), ARM (designed by ARM Holdings), or custom-logic instruction set architectures (ISAs), which are used to control and communicate with all the other blocks and the external processor.
Whether to control locally or not is a fundamental question. The answer depends on why the chip is being created, where it’s being used, and who it’s being used by; every chipmaker needs to answer these questions before deciding.
Then there is SRAM, the local memory used to store the model or intermediate outputs. Think of it like your home fridge. Though its storage is small, it’s extremely fast and convenient to grab stuff (in this case data) or put it back. In certain use cases, especially related to edge AI, that speed is vital, like a car that needs to put on its brakes when a pedestrian suddenly appears on the road.
How much SRAM you include in a chip is a decision based on cost vs performance. A bigger SRAM pool requires a higher upfront cost, but fewer trips to the DRAM (which is the typical, slower, cheaper memory you might find on a motherboard or as a stick slotted into the motherboard of a desktop PC), so it pays for itself in the long run.
On the other hand, a smaller SRAM pool has lower upfront costs, but requires more trips to the DRAM; this is less efficient, but if the market dictates that a more affordable chip is required for a particular use case, it may be necessary to cut costs here.
The difference between bigger and smaller SRAM pools comes down to processing speed, just as the amount of RAM affects your computer’s performance and its ability to handle demanding workloads.
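The trade-off can be sketched with a simple average-access-time model. Everything here is a hypothetical assumption chosen for illustration: the function name `average_access_ns`, the latencies, and the hit rates do not describe any real chip.

```python
def average_access_ns(sram_hit_rate, sram_ns, dram_ns):
    """Average memory access time when a fraction of accesses hit SRAM
    and the rest have to go out to DRAM."""
    return sram_hit_rate * sram_ns + (1.0 - sram_hit_rate) * dram_ns


# Hypothetical latencies: on-chip SRAM is far quicker than off-chip DRAM.
SRAM_NS = 1.0
DRAM_NS = 100.0

# Hypothetical hit rates: a bigger SRAM pool keeps more of the model on-chip.
small_pool = average_access_ns(0.80, SRAM_NS, DRAM_NS)
large_pool = average_access_ns(0.95, SRAM_NS, DRAM_NS)

print(f"small SRAM pool: {small_pool:.2f} ns per access")
print(f"large SRAM pool: {large_pool:.2f} ns per access")
```

Under these assumed numbers, raising the hit rate from 80% to 95% cuts the average access time from roughly 20.8 ns to roughly 5.95 ns. That is the performance the extra SRAM cost is buying.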
The I/O blocks are needed to connect the SoC to components outside of the SoC, for example the DRAM and potentially an external processor. These interfaces are vital for the AI SoC to maximize its potential performance and application; without them, you’ll create bottlenecks. For example, if a V8 engine were connected to a 4 gallon gas tank, it would have to stop for gas every few blocks. Thus the interface and what it connects to (DRAM, external processor, etc.) need to bring out the potential performance of the AI SoC.
DDR, for example, is an interface for DRAM. So if the SRAM is like your fridge at home, think of DRAM like the grocery store. It’s got way bigger storage, but it takes much more time to go retrieve items and come back home.
The interconnect fabric is the connection between the processors (AI PU, controllers) and all the other modules on the SoC. Like the I/O, the interconnect fabric is essential in extracting all of the performance of an AI SoC. We generally only become aware of the interconnect fabric in a chip if it’s not up to scratch.
No matter how fast or groundbreaking your processors are, the innovations only matter if your interconnect fabric can keep up and not create latency that bottlenecks the overall performance, just like not enough lanes on the highway can cause traffic during rush hour.
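A minimal sketch of the bottleneck idea: if data has to pass through every block in turn, the sustained rate of the whole chip is capped by the slowest block. The stage names and rates below are hypothetical numbers picked for illustration, as is the function name `effective_throughput`.

```python
def effective_throughput(stage_rates):
    """Return the slowest stage and its rate, which caps the whole pipeline."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical sustained rates in GB/s for each block on an AI SoC.
stages = {
    "ai_pu": 400.0,
    "sram": 350.0,
    "interconnect": 120.0,
    "dram_interface": 200.0,
}

name, rate = effective_throughput(stages)
print(f"bottleneck: {name} at {rate} GB/s")
```

In this made-up configuration the AI PU could consume 400 GB/s, but the chip as a whole is held to the 120 GB/s the interconnect can deliver.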
All of these components are crucial parts of an AI chip. While different chips may have extra components or put differing priorities on investment into these components, as outlined with SRAM above, these essential components work together in a symbiotic manner to ensure your AI chip can process AI models quickly and efficiently. Unlike CPUs and GPUs, the design of the AI SoC is far from mature. This section of the industry is developing at rapid speed, and we continue to see advancements in the design of AI SoCs.
AI chips and their use cases
There are many different chips on the market, with naming schemes that vary depending on which company designs them. These chips have different use cases, both in terms of the models they’re used for, and the real-world applications they’re designed to accelerate.
Training and inference
Artificial intelligence is essentially the simulation of the human brain using artificial neural networks, which are meant to act as substitutes for the biological neural networks in our brains. A neural network is made up of a bunch of nodes which work together, and can be called upon to execute a model.
This is where AI chips come into play. They are particularly good at dealing with these artificial neural networks, and are designed to do two things with them: training and inference.
Chips designed for training essentially act as teachers for the network, which is like a kid in school. A raw neural network is initially under-developed and is taught, or trained, by inputting masses of data. Training is very compute-intensive, so we need AI chips focused on training that are designed to be able to process this data quickly and efficiently. The more powerful the chip, the faster the network learns.
Once a network has been trained, it needs chips designed for inference in order to be used in the real world, for things like facial recognition, gesture recognition, natural language processing, image searching, spam filtering, etc. Think of inference as the aspect of AI systems that you’re most likely to see in action, unless you work in AI development on the training side.
You can think of training as building a dictionary, while inference is akin to looking up words and understanding how to use them. Both are necessary and symbiotic.
It’s worth noting that chips designed for training can also perform inference, but inference chips cannot do training.
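The split between the two phases can be sketched with a deliberately tiny model. This is an illustrative assumption, not how any production network is built: a single-weight linear model is trained by gradient descent on made-up data (the compute-heavy loop), and the result is then used for inference (one cheap pass). The names `train` and `infer` are hypothetical.

```python
def train(samples, epochs=5000, learning_rate=0.01):
    """Training: repeatedly adjust the weight and bias to reduce the
    mean squared error over the whole dataset."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def infer(model, x):
    """Inference: a single forward pass with the already-trained parameters."""
    w, b = model
    return w * x + b


# Made-up training data that follows y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

model = train(data)             # thousands of passes over the data
prediction = infer(model, 10.0) # one multiply and one add
print(round(prediction, 3))
```

Training here loops over the dataset thousands of times, while inference is a single multiply and add. Real networks have the same asymmetry at a vastly larger scale, which is why the two workloads get different chips.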
Cloud and edge
The other aspect of an AI chip we need to be aware of is whether it is designed for cloud use cases or edge use cases, and whether we need an inference chip or training chip for those use cases.
Cloud computing is useful because of its accessibility, as its power can be utilised completely off-prem. You don’t need a chip on the device to handle any of the inference in those use cases, which can save on power and cost. It has downsides, however, when it comes to privacy and security, as the data is stored on cloud servers which can be hacked or mishandled. For inference use cases, it can also be less efficient, as cloud hardware is less specialised than edge chips.
Chips that handle their inference on the edge are found on a device, for example a facial recognition camera. They’re more private and secure than using the cloud, as all data is stored on-device, and chips are generally designed for their specific purpose – for example, a facial recognition camera would use a chip that is particularly good at running models designed for facial recognition. They also have their cons, as adding another chip to a device increases cost and power consumption. It’s important to use an edge AI chip that balances cost and power to ensure the device is not too expensive for its market segment, or that it’s not too power-hungry, or simply not powerful enough to efficiently serve its purpose.
Here’s how these applications and chips are generally paired:
Cloud + Training
The purpose of this pairing is to develop AI models used for inference. These models are eventually refined into AI applications that are specific towards a use case. These chips are powerful and expensive to run, and are designed to train as quickly as possible.
Example systems include NVIDIA’s DGX-2 system, which totals 2 petaFLOPS of processing power. It is made up of 16 NVIDIA V100 Tensor Core GPUs. Another example is Intel Habana’s Gaudi chip.
Examples of applications that people interact with every day that require a lot of training include Facebook photos or Google Translate.
As the complexity of these models increases every few months, the market for cloud training chips will remain relevant.
Cloud + Inference
This pairing is for times when inference needs significant processing power, to the point where it would not be possible to do this inference on-device. This is because the application utilizes bigger models and processes a significant amount of data.
Sample chips here include Qualcomm’s Cloud AI 100, a large chip used for AI in massive cloud datacentres. Other examples are Alibaba’s Hanguang 800 and Graphcore’s Colossus MK2 GC200 IPU.
Where training chips were used to train the models behind Facebook’s photos or Google Translate, cloud inference chips are used to process the data you input using the models these companies created. Other examples include AI chatbots or most AI-powered services run by large technology companies.
Edge + Inference
Using on-device edge chips for inference removes any issues with network instability or latency, and is better for preserving privacy of data used, as well as security. There are no associated costs for using the bandwidth required to upload a lot of data, particularly visual data like images or video, so as long as cost and power-efficiency are balanced it can be cheaper and more efficient than cloud inference.
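To put the bandwidth point in numbers, the sketch below compares a camera that streams video to the cloud with one that runs inference on-device and uploads only small event records. The bitrate, event count, and record size are hypothetical assumptions, as are the function names.

```python
def daily_video_upload_gb(bitrate_mbps, hours_per_day=24.0):
    """Data uploaded per day by a continuous video stream, in decimal GB."""
    seconds = hours_per_day * 3600
    megabits = bitrate_mbps * seconds
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


def daily_event_upload_gb(events_per_day, bytes_per_event):
    """Data uploaded per day when only detection events are sent."""
    return events_per_day * bytes_per_event / 1e9


# Hypothetical cloud-inference camera streaming at 4 Mbps around the clock.
cloud_camera_gb = daily_video_upload_gb(4.0)

# Hypothetical edge-inference camera sending 500 events of 1 kB each.
edge_camera_gb = daily_event_upload_gb(500, 1000)

print(f"cloud inference: {cloud_camera_gb:.1f} GB/day")
print(f"edge inference:  {edge_camera_gb:.4f} GB/day")
```

With these assumed figures the streaming camera uploads about 43.2 GB a day, while the edge camera uploads about half a megabyte. This gap is the saving the paragraph above refers to.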
Examples here include Kneron’s own chips, including the KL520 and recently launched KL720 chip, which are lower-power, cost-efficient chips designed for on-device use. Other examples include Intel Movidius and Google’s Coral TPU.
Use cases include facial recognition surveillance cameras, cameras used in vehicles for pedestrian and hazard detection or driver awareness detection, and natural language processing for voice assistants.
All of these different types of chips and their different implementations, models, and use cases are essential for the development of the Artificial Intelligence of Things (AIoT) future. When supported by other nascent technologies like 5G, the possibilities only grow. AI is fast becoming a big part of our lives, both at home and at work, and development in the AI chip space will be rapid in order to accommodate our increasing reliance on the technology.