Exponential growth in acronyms: CPU/GPU/TPU/IPU/DPU/ASIC
"If you don't want to boil a small lake, you might want specialized hardware" - Yann LeCun
Best-practice hardware for deep learning today seems to be (Nvidia) GPUs for training, but for inference those GPUs tend not to be efficient enough for many likely applications. Incumbents and startups alike see this as an opportunity to build specialized inference hardware (although some are also attempting to unseat Nvidia as the leader in training hardware). Below is an aggregation of new hardware approaches for DL.
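To make the training/inference distinction concrete, here is a minimal numpy sketch (my own toy illustration, not any vendor's implementation): a training step runs a forward pass, keeps activations around for the backward pass, and updates every weight, while inference is a single forward pass that also tolerates lower precision. That asymmetry is what inference-specific silicon exploits.

```python
import numpy as np

# Toy single-layer model: illustrates the workload gap between
# training (forward + backward + update) and inference (forward only).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128)).astype(np.float32)

def forward(x, W):
    return np.maximum(x @ W, 0.0)  # linear layer + ReLU

def train_step(x, y_true, W, lr=1e-3):
    # Training must keep activations around for the backward pass
    # and touch every weight twice (gradient + update).
    h = x @ W
    y = np.maximum(h, 0.0)
    grad_y = y - y_true            # dLoss/dy for a squared loss
    grad_h = grad_y * (h > 0)      # backprop through the ReLU
    grad_W = x.T @ grad_h          # dLoss/dW
    return W - lr * grad_W

def infer(x, W):
    # Inference is one forward pass: no stored activations, no
    # gradients, and it tolerates reduced precision (float16 here).
    return forward(x.astype(np.float16), W.astype(np.float16))

x = rng.standard_normal((32, 256)).astype(np.float32)
y_true = rng.standard_normal((32, 128)).astype(np.float32)
W = train_step(x, y_true, W)   # roughly 3x the matmul work of inference
y = infer(x, W)                # forward pass only
```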
Incumbents
Alphabet:
It is difficult to even list Google as an incumbent, but they have moved into the space. Google built the Tensor Processing Unit (TPU) "several years ago", originally for internal use. It has been used to power Google tools like Inbox Smart Reply, Street View and voice recognition. The two original engineers who led the development of Google's TPU have since left the company and created Groq, Inc. (more below).
IBM:
Building a "brain-inspired computer" called TrueNorth, apparently over the course of the previous six years. It breaks with the 70-year-old von Neumann architecture, having literally been designed from the ground up. TrueNorth again seems purpose-built for inference, with a research paper claiming the chip can perform inference at 6,000 frames per second per watt (versus roughly 160 fps/watt for the Nvidia Tesla P4 GPU). I haven't really heard much buzz around TrueNorth; it remains to be seen how much differentiation/impact it can have on adoption.
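To put those quoted numbers in perspective, a quick back-of-envelope calculation (using only the figures cited above):

```python
# Back-of-envelope check of the efficiency gap claimed above.
truenorth_fps_per_watt = 6000   # figure quoted for TrueNorth
gpu_fps_per_watt = 160          # figure quoted for the Tesla P4

ratio = truenorth_fps_per_watt / gpu_fps_per_watt
print(f"claimed efficiency advantage: {ratio:.1f}x")  # 37.5x

# Equivalently, energy per frame (joules = watts * seconds / frames):
print(f"TrueNorth: {1 / truenorth_fps_per_watt * 1e3:.3f} mJ/frame")  # ~0.167 mJ
print(f"Tesla P4:  {1 / gpu_fps_per_watt * 1e3:.3f} mJ/frame")        # 6.25 mJ
```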
Intel:
Recently acquired Nervana Systems for a reported $400m+. Nervana is a "full-stack solution that is optimized from algorithms down to silicon to solve machine learning problems at scale." On the hardware side, Nervana is building Lake Crest, a.k.a. the Nervana Engine: an ASIC custom-designed for deep learning (to be released "in 2017"). Interestingly, it quotes a 10x increase in training speed. Intel presumably paid a nice price for Nervana since they don't want to miss out on the ML wave (as pundits noted they did on the global emergence of mobile dominance).
NVIDIA:
Some have claimed that Nvidia's dominance of the early machine learning revolution was a lucky break. In any case, it seems evident that they have at least dominated the training market. The new Pascal architecture (Tesla GPUs) appears to be what Nvidia has deployed for inference. With so much activity in inference-specific DL hardware, I'm not sure Nvidia will hold as dominant a position there as they do in training.
Qualcomm:
The ARM community page documents methods of "tweaking models and codes" to use Qualcomm's Snapdragon chips for efficient inference. Not much has been made (as far as I can tell) of Qualcomm's position as an inference player, but it would be very interesting given their dominant mobile position. Yann LeCun did mention in March, in a Wired article, that Facebook is helping the chip maker "develop technologies related to machine learning". Qualcomm's position in the emerging landscape of training/inference-focused hardware for machine learning seems to be an open question; outside of relationships with Facebook and others, they don't seem to have made explicit, bold public moves.
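A canonical example of such a model tweak is post-training quantization of weights to 8-bit integers. Here's a minimal, generic numpy sketch of the idea (my own illustration; not Qualcomm's or ARM's actual toolchain):

```python
import numpy as np

# Illustrative post-training quantization of weights to int8 -- one of
# the generic "model tweaks" that make inference cheap on mobile silicon.
# (A toy sketch, not any vendor's actual toolchain.)
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)

# Symmetric linear quantization: map [-max|W|, +max|W|] onto [-127, 127].
scale = np.abs(W).max() / 127.0
W_q = np.round(W / scale).astype(np.int8)   # 4x smaller than float32

x = rng.standard_normal((1, 128)).astype(np.float32)

# An integer matmul with a single rescale at the end approximates the
# float result; dedicated inference hardware does this natively.
y_q = (x @ W_q.astype(np.float32)) * scale
y_f = x @ W
print("max abs error:", np.abs(y_q - y_f).max())  # small relative to |y_f|
```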
Startups
Not a lot of information is public on these startups since, justifiably, for competitive reasons they deliberately keep it out of the public eye. Nevertheless, below is a summary of the limited information available.
Cerebras Systems:
Startup formed in late 2016 that now (June 2017) lists 32 employees on LinkedIn. Formed by serial entrepreneurs Andrew Feldman and Gary Lauterbach. Feldman previously co-founded SeaMicro with Lauterbach, a low-power server technology company acquired by AMD for $355m. Cerebras has raised a reported $25m-$30m from Benchmark and others.
Deep Vision:
Building an "ultra low power processor that is built from the ground up" to be deployed in mobile settings: IoT, drones, phones, etc. Known investors include Comet Labs, Danhua Capital and Ideal Bulb Ventures.
Graphcore:
Building an "IPU - Intelligence Processing Unit". Interestingly, Graphcore is a proponent of integrating training and inference on the same chip. Graphcore has raised $30m from Amadeus Capital, Samsung Strategy and Innovation Center and Foundation Capital, among others, and is based in the U.K.
Groq:
This is the interesting one. Groq was founded by Jonathan Ross and Douglas Wightman, both former Google engineers. Ross co-founded the TPU project as part of his '20% time', then went on to be a 'Rapid Evaluator' at Google[X]. As the story goes, Chamath Palihapitiya spent a year and a half trying to find the secretive TPU engineers within Google after hearing on a Google earnings call that they had built their own hardware. Groq now boasts 8 of the 10 original engineers on the Google TPU project and has raised a reported $10.3m from Chamath. Indeed, he believes they could be building a "fundamental building block for the next generation of computing"...sign me up.
Wave Computing:
"A plug and play solution for training neural networks in the data center". Wave's dataflow systems utilize "Dataflow Processing Units, DPUs". It is claimed on the Wave website that it delivers up to 1000x increase in training performance (speed). Wave has raised an undisclosed amount from Tallwood Venture Capital and Southern Cross Venture Partners.
Note: Cognite Ventures has put together a nicely comprehensive, vertically categorized list of interesting machine learning startups.