AI computing requirements

Whenever there is a new announcement or breakthrough with AI, it always strikes me how out of reach the results would be to replicate for individuals and small organizations. Machine learning algorithms, and especially deep learning with neural networks, are often so computationally expensive that they are infeasible to run without immense computing power.

As an example, OpenAI Five (OpenAI’s Dota 2 playing bot) used 128,000 CPUs and 256 GPUs which trained continuously for several months:

In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months.
OpenAI blog post “How to Train Your OpenAI Five”

Running a collection of more than a hundred thousand CPUs and hundreds of GPUs for ten months would cost several million dollars without discounts. Needless to say, a hobbyist such as myself would never be able to replicate those results. Cutting edge AI research like this has an implicit disclaimer: “Don’t try this at home”.

Even on a smaller scale, it is not always possible to run machine learning algorithms without certain trade-offs. I can sort a list of a million numbers in less than a second, and even re-compile a fairly complex web application in a few seconds, but training a lyrics-generating neural network on less than three thousand songs takes several hours to complete.

Although a comparison between number sorting and machine learning seems a bit silly, I wonder if we will ever see a huge reduction in computational complexity, similar to going from an algorithm like bubble sort to quicksort.¹

Perhaps it is not fair to expect to be able to replicate the results of a cutting edge research institution such as OpenAI. Dota 2 is a very complex game, and reinforcement learning is an area of research that is developing fast. But even OpenAI acknowledges that recent improvements to their OpenAI Five bot are primarily due to increases in available computing power:

OpenAI Five’s victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we’d drive further progress by increasing our training scale.
OpenAI blog post “How to Train Your OpenAI Five”

It feels slightly unnerving to see that the potential AI technologies of the future are currently only within reach of a few companies with access to near-unlimited resources. On the other hand, the fact that we need to throw so many computers at mastering a game like Dota should be comforting for those with gloomy visions of the future :-)

Footnotes

While writing this post, I realize I never really looked into the theoretical lower bounds on complexity for deep learning. There might be something fundamental missing from my knowledge there.

Footnotes

Leave a comment Cancel reply