Industry Technology

AI computing requirements

Whenever a new AI announcement or breakthrough appears, I am struck by how far out of reach replicating the results would be for individuals and small organizations. Machine learning algorithms, and especially deep learning with neural networks, are often so computationally expensive that they are infeasible to run without immense computing power.

As an example, OpenAI Five (OpenAI’s Dota 2 playing bot) used 128,000 CPUs and 256 GPUs, training continuously for ten months:

In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months.

OpenAI blog post “How to Train Your OpenAI Five”
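To put the quoted figures in perspective, the arithmetic can be sketched in a few lines of Python. A petaflop/s-day is one day of computing at 10^15 floating-point operations per second. The variable names are mine, and the only assumption is that ten months is treated as 10/12 of a year.

```python
# Unit conversions for the figures in the quote above.
PETA = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

pfs_days = 800            # petaflop/s-days, from the quote
self_play_years = 45_000  # years of Dota self-play, from the quote
realtime_months = 10      # real-time training duration, from the quote

# Total floating-point operations: 800 days at 10^15 operations per second.
total_operations = pfs_days * PETA * SECONDS_PER_DAY

# Assumption: ten months is treated as 10/12 of a year.
speedup = self_play_years * 12 / realtime_months

print(f"{total_operations:.2e} floating-point operations")
print(f"{speedup:,.0f} times faster than real time")
```

In other words, the bot played roughly 54,000 years of Dota for every year of wall-clock time, using about 6.9 × 10^22 floating-point operations in total.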

Running more than a hundred thousand CPUs and hundreds of GPUs for ten months would cost several million dollars without discounts. Needless to say, a hobbyist like me could never replicate those results. Cutting-edge AI research like this carries an implicit disclaimer: “Don’t try this at home”.

Even on a smaller scale, it is not always possible to run machine learning algorithms without certain trade-offs. I can sort a list of a million numbers in less than a second, and even recompile a fairly complex web application in a few seconds, but training a lyrics-generating neural network on fewer than three thousand songs takes several hours.

Although a comparison between number sorting and machine learning seems a bit silly, I wonder if we will ever see a huge reduction in computational complexity, similar to going from an algorithm like bubble sort to quicksort.[1]
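To make the gap between bubble sort and quicksort concrete, here is a minimal sketch that counts comparisons for both on the same random input. The input size, the seed, and the function names are arbitrary choices for illustration, and the quicksort is a simple list-building variant rather than an in-place one.

```python
import random


def bubble_sort(values):
    """Return a sorted copy of `values` and the number of comparisons made."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # no swaps in this pass, so the list is sorted
            break
    return items, comparisons


def quicksort(values):
    """Return a sorted copy of `values` and the number of pivot comparisons."""
    items = list(values)
    if len(items) <= 1:
        return items, 0
    pivot = items[len(items) // 2]
    smaller, equal, larger = [], [], []
    for item in items:
        # Each element is counted as one three-way comparison with the pivot.
        if item < pivot:
            smaller.append(item)
        elif item > pivot:
            larger.append(item)
        else:
            equal.append(item)
    comparisons = len(items)
    left, left_count = quicksort(smaller)
    right, right_count = quicksort(larger)
    return left + equal + right, comparisons + left_count + right_count


rng = random.Random(42)  # fixed seed so runs are repeatable
data = [rng.random() for _ in range(1000)]

bubble_sorted, bubble_count = bubble_sort(data)
quick_sorted, quick_count = quicksort(data)

print(f"bubble sort: {bubble_count:,} comparisons")
print(f"quicksort:   {quick_count:,} comparisons")
```

On random input, bubble sort needs on the order of n² comparisons while quicksort needs on the order of n log n on average, so the gap widens quickly as the list grows.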

Perhaps it is not fair to expect to be able to replicate the results of a cutting-edge research institution such as OpenAI. Dota 2 is a very complex game, and reinforcement learning is a fast-developing area of research. But even OpenAI acknowledges that recent improvements to their OpenAI Five bot are primarily due to increases in available computing power:

OpenAI Five’s victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we’d drive further progress by increasing our training scale.

OpenAI blog post “How to Train Your OpenAI Five”

It feels slightly unnerving to see that the potential AI technologies of the future are currently only within reach of a few companies with access to near-unlimited resources. On the other hand, the fact that we need to throw so many computers at mastering a game like Dota should be comforting for those with gloomy visions of the future :-)

Industry Software


Figure: A fractal simulation.
I learned about complexity while studying computer science. Complexity was measured in time and space, with bounds expressed in Ω, Θ and O notation. We learned about the fastest algorithms, the “hardest” problems and the methods to solve them or just prove that they are hard. That is what “complexity” meant to me.

And then I started working with real-world problems, and complexity started meaning something else. It is no longer about NP-completeness, lower-bound running time or even algorithms. I have worked on the same project[1] for almost a year, and throughout this project the most difficult tasks have been figuring out the requirements of the end user and translating these requirements into an actual user interface. The complexity of a task is not measured in upper-bound running time and space requirements but in the number of meetings and emails. And while the complexity of an algorithm does not change after it has been implemented, a real-world task changes its complexity whenever the requirements change.

When I step back and look at the output of the current project, I see none of the complexity I studied. All the complexity lies in communication, and that is something different.