AI computing requirements

Whenever there is a new announcement or breakthrough in AI, it always strikes me how far out of reach replicating the results is for individuals and small organizations. Machine learning algorithms, and especially deep learning with neural networks, are often so computationally expensive that they are infeasible to run without immense computing power.

As an example, OpenAI Five (OpenAI’s Dota 2-playing bot) used 128,000 CPU cores and 256 GPUs, training continuously for ten months:

In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months.

OpenAI blog post “How to Train Your OpenAI Five”

Running more than a hundred thousand CPU cores and hundreds of GPUs for ten months would cost several million dollars without discounts. Needless to say, a hobbyist such as myself would never be able to replicate those results. Cutting-edge AI research like this comes with an implicit disclaimer: “Don’t try this at home”.
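For a sense of scale, the petaflop/s-day unit in that quote can be unpacked with a couple of lines of arithmetic: one petaflop/s-day is 10^15 floating point operations per second, sustained for a day. This is just the unit conversion, not a cost estimate:

```python
# One petaflop/s-day = 10^15 floating point operations per second,
# sustained for 24 hours.
PETAFLOP_PER_SECOND = 1e15
SECONDS_PER_DAY = 24 * 60 * 60

total_flops = 800 * PETAFLOP_PER_SECOND * SECONDS_PER_DAY
print(f"{total_flops:.2e} floating point operations")  # ~6.91e+22
```

Roughly 7 × 10^22 floating point operations, just for training.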

Even on a smaller scale, it is not always possible to run machine learning algorithms without certain trade-offs. I can sort a list of a million numbers in less than a second, and even re-compile a fairly complex web application in a few seconds, but training a lyrics-generating neural network on fewer than three thousand songs takes several hours to complete.

Although comparing number sorting to machine learning seems a bit silly, I wonder if we will ever see a huge reduction in computational complexity, similar to going from an algorithm like bubble sort, which is O(n²), to quicksort, which is O(n log n) on average.
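To make that gap tangible, here is a minimal sketch in plain Python (the helper name is my own) that times an O(n²) bubble sort against the built-in sort, which runs in O(n log n), on the same list:

```python
import random
import time

def bubble_sort(items):
    """O(n^2) sort: repeatedly swap adjacent out-of-order elements."""
    items = items[:]  # sort a copy, leave the input untouched
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return items

data = [random.random() for _ in range(10_000)]

start = time.perf_counter()
bubble_sort(data)
mid = time.perf_counter()
sorted(data)  # Timsort, O(n log n)
end = time.perf_counter()

print(f"bubble sort: {mid - start:.2f}s, built-in sort: {end - mid:.4f}s")
```

A list that takes the quadratic algorithm seconds is handled by the linearithmic one in milliseconds, and that is the kind of jump the analogy is reaching for.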

Perhaps it is not fair to expect to replicate the results of a cutting-edge research institution such as OpenAI. Dota 2 is a very complex game, and reinforcement learning is a fast-developing area of research. But even OpenAI acknowledges that recent improvements to their OpenAI Five bot are primarily due to increases in available computing power:

OpenAI Five’s victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we’d drive further progress by increasing our training scale.

OpenAI blog post “How to Train Your OpenAI Five”

It feels slightly unnerving to see that the potential AI technologies of the future are currently only within reach of a few companies with access to near-unlimited resources. On the other hand, the fact that we need to throw so many computers at mastering a game like Dota should be comforting for those with gloomy visions of the future :-)

Tax deductions are not free money

So you deducted your loan interest on your taxes. That means the interest was essentially free, right? I could not quite figure out the answer to this question in my head recently, so I worked through a simple example and am sharing it here, in case it is useful for someone else.

First, the conclusion: just because an expense is deductible does not make it free. It would have been better not to have the expense in the first place. However, if you cannot avoid the expense, then deductions are of course great!

Let’s say we pay $10 interest on a loan, our income is $100, and we pay 50% tax. The table below compares the scenario where the interest is not deductible with the scenario where it is 100% deductible.

Description       Without deduction   With deduction
Income            $100                $100
Taxable income    $100                $90
Tax               -$50                -$45
Net income        $50                 $55
Interest          -$10                -$10
Final income      $40                 $45

If we can deduct all of the interest, we end up with $5 of extra disposable income. Thus, half of the interest in this example was “free” ($5), since the tax rate is 50%. However, if we did not have to pay interest at all, we would of course have $50 in final income.
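The same arithmetic as a tiny sketch (the function name and the deductible_fraction parameter are just for illustration):

```python
def final_income(income, interest, tax_rate, deductible_fraction):
    """Income left after tax and interest, when some fraction of the
    interest can be deducted from taxable income."""
    taxable = income - deductible_fraction * interest
    tax = tax_rate * taxable
    return income - tax - interest

print(final_income(100, 10, 0.5, 0.0))  # 40.0 -- no deduction
print(final_income(100, 10, 0.5, 1.0))  # 45.0 -- fully deductible
print(final_income(100, 0, 0.5, 0.0))   # 50.0 -- no interest at all
```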

In other words, completely avoiding an expense, even a tax-deductible one, is always the best financial outcome. In practice this is not always possible, but I think it is a good principle to keep in mind.

Reinforcement learning

I have been looking into a machine learning technique called reinforcement learning (RL) lately. It had been on my TODO list for a while, and I must say, this field is incredibly exciting! I played around with some OpenAI Gym environments and re-implemented two RL algorithms, based mostly on code from other authors.

After spending many hours on this, I can still only get my algorithm to solve the CartPole problem, where the goal is to balance a pole on a moving cart (video below). I have not yet cracked the nut on a continuous-action problem like Pendulum, where the goal is to swing the pendulum into an upright position and keep it there (video below).
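To give a flavor of what a first attempt can look like, here is a minimal sketch (not my actual implementation, which is linked below) that tackles CartPole with plain random search over a linear policy. It assumes the classic Gym API, where env.step returns a 4-tuple; newer gym/gymnasium versions return five values and take a seed in reset:

```python
import gym
import numpy as np

def run_episode(env, weights, max_steps=500):
    """Total reward of one episode under a linear policy:
    push right if the weighted sum of the observation is positive."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = 1 if obs @ weights > 0 else 0
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

env = gym.make("CartPole-v1")
best_weights, best_return = None, -np.inf

# Random search: sample weight vectors and keep the best one.
for _ in range(100):
    weights = np.random.uniform(-1.0, 1.0, size=4)
    episode_return = run_episode(env, weights)
    if episode_return > best_return:
        best_weights, best_return = weights, episode_return

print(f"best return over 100 random policies: {best_return}")
```

Random search gets away with this because CartPole’s policy space is tiny; the same trick does nothing for a continuous-action problem like Pendulum, which is part of what makes it harder.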

Anyway, here is my implementation of the RL algorithms. Perhaps it will be useful for someone :-)