In my continuing exploration of generative adversarial networks, I found “The Painting Dataset” and wrote a short script to extract the images from it. Only about 1,200 of the paintings seem to be valid, but it is at least something.
For my wife’s birthday this year, I created a prototype of a small game-like 3D environment that she could “walk around” in using the keyboard and mouse. The idea was to have an “exhibit” for each year we have known each other, consisting of a few photos from that year as well as a short text describing major events that happened during the year.
Unfortunately, I am terrible at planning things and started a bit too late, so I did not finish the game in time for the birthday.
With the help of Unity, I managed to finish the project eventually. This post is about that journey as well as some observations on Unity and game design in general.
Note: I usually make my stuff available online, e.g. on GitHub. Not this time, though. This was a personal thing I created for my wife. I’m reproducing the images for this blog post with her permission :-)
Focus on the content
The prototype was created for the browser and used three.js for most of the 3D stuff. I am a fan of WebGL, and I had some prior experience from working on Photo Amaze and Zombie Hugs, so it seemed like a natural choice.
Here are two screenshots from the initial prototype:
The thick fog limited the initial view, ensuring that only one piece of text was visible at a time while the rest stayed hidden in the fog until the player started moving forward. Once the player reached the first exhibit, the fog cleared up, making the scene more open.
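The effect of such a fog can be sketched with the exponential-squared formula that three.js uses for `FogExp2`, where the share of fog color mixed into a surface at distance d is 1 - exp(-(density · d)²). The sketch below is plain Python, not code from the prototype, and the density values, distances and the `clear_fog` blending schedule are hypothetical choices made only to show how a dense fog hides distant text and how lowering the density opens up the scene.

```python
import math


def fog_factor(distance: float, density: float) -> float:
    """Share of fog color mixed into a surface at `distance`.

    Exponential-squared fog (as in three.js FogExp2):
    0.0 means fully visible, 1.0 means completely hidden in the fog.
    """
    return 1.0 - math.exp(-(density * distance) ** 2)


def clear_fog(start_density: float, end_density: float, t: float) -> float:
    """Hypothetical schedule: blend the density from start to end as t goes 0 -> 1."""
    t = min(max(t, 0.0), 1.0)
    return start_density + (end_density - start_density) * t


# Dense fog: nearby text is barely affected, distant text disappears.
thick = 0.08
for d in (5, 20, 40):
    print(f"distance {d:>2}: fog {fog_factor(d, thick):.2f}")

# After reaching the first exhibit, the density drops and the scene opens up.
thin = clear_fog(thick, 0.005, 1.0)
print(f"distance 40 with thin fog: {fog_factor(40, thin):.2f}")
```

With the dense setting, text 5 units away is only lightly fogged while anything 40 units away is practically invisible; after the density drops, the same distant text becomes clearly visible.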
I found photos for three out of ten exhibits and wrote the initial intro text, but the rest remained unfinished. One reason for this was that I quickly started obsessing over details like shadows and lighting instead of focusing on the core content, which was selecting good photos and writing the right text.
Missing the birthday deadline had both good and bad consequences. I was very disappointed and embarrassed that I only had an early prototype to show, but it also allowed me to step back and rethink the project.
Beyond the browser
An appealing aspect of using three.js is that everything has to be defined in code. This provides a lot of control, but I soon realized that it did not let me iterate on and tweak the experience quickly. I had wanted to get my feet wet with a more full-fledged game engine for a while, so this was a good opportunity to do that.1
After researching the pros and cons of different game engines as well as experimenting briefly with Unity, Unreal Engine, Godot and Babylon.js (spending way too long trying to make grass out of fur), I ended up sticking with Unity, because it has a native Linux editor and good platform support.2
Unity is easy to get started with and includes lots of helpful tools out of the box. For example, I had good initial impressions of the built-in terrain editor and tree creator, and it seemed very easy to set up a basic outdoor environment. The Unity asset store also has a generous offering of free assets, including a ready-made first-person controller which is very handy.
Once you scratch the surface, though, it becomes apparent that Unity is neither perfect nor complete, and that creating games is not easy. Some of the challenges I faced were fun, while others were frustrating.
The first (fun) obstacle I came across was creating the photo exhibits.
3D modeling the photo exhibit
As you can see in the prototype above, an “exhibit” has three wall-like structures with a photo. I wanted the pictures to protrude a bit from the walls, making it look like a white canvas is hanging from the wall, with the photo “painted” on top of it. The drawing on the right illustrates the idea.
I thought it would be super easy to do this in Unity. Just create two cubes (a canvas cube and a wall cube)3, flatten and stretch them a bit, and put them next to each other so they overlap.
Actually, this worked ok, but there were two problems:
- There was a weird flicker where the cubes touched each other.
- When adding the photo to the canvas cube, it showed the photo on all sides, not just the front.
I fixed the second problem by putting a “quad” — a flat surface — on top of the canvas cube (or rather, next to it). The wall structure thus consisted of three 3D objects that were technically separate from each other, and it did not look good. There was still a weird flicker, and it also felt like the wrong way to solve the problem.
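The flicker where the cubes touch is most likely z-fighting: faces that lie in (almost) the same plane end up with the same or neighboring values in the depth buffer, so the GPU cannot consistently decide which face is in front, and the result changes from pixel to pixel and as the camera moves. The sketch below estimates this under stated assumptions (an OpenGL-style perspective projection mapped to a [0, 1] depth range, a 24-bit depth buffer, near and far planes of 0.1 and 1000 units, and 1 unit = 1 meter); these are illustrative values, not taken from the project.

```python
def window_depth(z: float, near: float, far: float) -> float:
    """Map eye-space distance z in [near, far] to a depth value in [0, 1].

    Standard perspective projection: most of the precision is spent close to
    the near plane, and very little is left for distant surfaces.
    """
    return (far / (far - near)) * (1.0 - near / z)


def depth_buffer_value(z: float, near: float, far: float, bits: int = 24) -> int:
    """Quantize the depth to the integer a `bits`-bit depth buffer would store."""
    max_value = (1 << bits) - 1
    return round(window_depth(z, near, far) * max_value)


def depth_steps_apart(z1: float, z2: float, near: float = 0.1,
                      far: float = 1000.0, bits: int = 24) -> int:
    """How many depth-buffer steps separate two surfaces at distances z1 and z2."""
    return abs(depth_buffer_value(z1, near, far, bits)
               - depth_buffer_value(z2, near, far, bits))


# Coplanar faces (the overlapping cubes) get identical depth values, so which
# one is drawn in front is decided by tiny rounding differences per pixel.
print(depth_steps_apart(10.0, 10.0))
# A 1 cm gap is many steps apart at 10 m, but only a step or two at 100 m.
print(depth_steps_apart(10.0, 10.01), depth_steps_apart(100.0, 100.01))
```

One common remedy is to make sure no two visible faces lie in the same plane, which is easy to guarantee when the wall and canvas are modeled as a single mesh.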
So I hit an early roadblock: Either I had to define the wall structure programmatically, or I had to make my own 3D model. I opted for the latter choice.
After going through some basic tutorials for Blender, I was able to create the wall structure and learn a thing or two about 3D modeling along the way. This is the result in all its simple glory:
Even though I only did very simple things in Blender, being able to make my own basic models felt like a big win. I also created an exhibit sign and a cylinder with one open end (to simulate a tunnel or tube). All models can be found here.
Free models are great
Besides the photos and text, I decided to also create a “display” for each exhibit. This consisted of a 3D model or effect that was either a direct or indirect reference to the exhibit's year. For example, I used a Bigfoot model standing on top of Mount St. Helens for the year when we visited the area.
Using pre-made models was a fun and easy way to make the exhibits a bit more interesting. It took some time to find the right model, and it sometimes needed tweaking after import, but it gave me the opportunity to include visuals I could not have created on my own.
For the record, here is the list of the models I used:
- 2nd – Restoration // Mona Mono by Ringo Gunther
- 15legend by Amphivena
- Bottlenose Dolphin by popcornbag
- Crocigator by k0bold
- Doric Column with material by e.g.dev5
- Flying Island 2 by guten_morgen
- Gold Band Ring by GetDeadEntertainment
- Low Poly Jumping Kangaroo by sirleech
- Lowpoly Coronavirus (SARS-CoV-2) by tales
- Mad Cow: Furry road by Nours
- MtHelens_Aug71980 by SValkan
- UNICORNISAURUS by Freddy Drabble
All the models were found on Sketchfab, an online community with a lot of 3D models available either for free or for purchase. It was a nice discovery!
An extra dimension
Besides downloading models, I also researched the possibility of adding models to the scene by simply scanning my environment or specific objects.
A technique known as photogrammetry makes it possible to turn multiple photos into 3D models. I played around with an open-source tool called Meshroom, which is amazingly simple to work with: just add a lot of photos from different viewing angles, wait a few hours, and a finished 3D model comes out.
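At the core of photogrammetry is triangulation: when the same feature is found in two photos taken from known camera positions, the two viewing rays (nearly) meet at the feature's position in 3D. Meshroom has to do much more before that step (detecting and matching features across photos, estimating where each photo was taken from, then building a dense mesh), and the sketch below skips all of it. It assumes the camera centers and viewing directions are already known and shows only the simple midpoint method for two rays; the coordinates are made up.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Estimate a 3D point seen from two cameras.

    c1, c2 are camera centers and d1, d2 the viewing directions towards the same
    feature. Noisy rays rarely intersect exactly, so return the midpoint of the
    shortest segment connecting the two rays.
    """
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("parallel rays: the point cannot be triangulated")
    t1 = (b * e - c * d) / denom  # position along ray 1
    t2 = (a * e - b * d) / denom  # position along ray 2
    p1 = add(c1, scale(d1, t1))
    p2 = add(c2, scale(d2, t2))
    return scale(add(p1, p2), 0.5)


# Two hypothetical camera positions looking at a feature at (1, 2, 5).
print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5)))  # (1.0, 2.0, 5.0)
# Slightly noisy second ray: the estimate stays close to (1, 2, 5).
print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2.01, 5)))
```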
A scan of a birch log from the forest made its way into the scene:
I did not get outstanding results, but it is worth noting I also just took the photos with my bad phone camera and spent very little time making sure I got good shots from all angles.
It is mind-blowing that it is possible to go from 2D photos to a 3D model, and I will definitely revisit photogrammetry in the future.
Creating fake rain
A small feature I had fun creating was a super simple rain effect. There are numerous weather system plugins available for Unity (some are free), and there was even a “hose” effect in the standard assets that kind of did the trick (it simulates spraying water). But I needed a more uniform downpour, and I really just needed something simple.
The effect was created by taking a bunch of small particles, giving them a blue/white gradient color, and applying gravity to them. That is basically it.
I reused a texture from a water surface effect in the standard assets to give the raindrops a blue-ish appearance. The tails on the raindrops are created automatically by the particle system when using a render setting called “stretched billboard”. A bit of noise was added to the movement of the raindrops, so the rain does not fall straight down but looks slightly more natural and chaotic.
After playing around with the particle speed and size, I got the look and feel I wanted. I was expecting this to be much more complicated, so it was a nice surprise that the process was fairly straightforward.
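The recipe (small particles, gravity, a bit of noise, and tails from the stretched billboard mode) can be sketched as a tiny particle simulation. This is plain Python rather than Unity's particle system, and all constants (emission area, spawn height, noise strength, tail scale) are made-up illustrative values; `tail_length` only mimics the idea that a stretched billboard is drawn longer the faster a particle moves, and is not Unity's exact rule.

```python
import random
from dataclasses import dataclass

GRAVITY = -9.81      # m/s^2 along the y axis
NOISE = 0.5          # hypothetical strength of the random sideways jitter
SPAWN_HEIGHT = 20.0  # hypothetical height at which drops are emitted
AREA = 10.0          # hypothetical half-width of the square emission area
TAIL_SCALE = 0.02    # hypothetical tail length per unit of speed


@dataclass
class Drop:
    x: float
    y: float
    z: float
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0


def spawn(rng: random.Random, height: float = SPAWN_HEIGHT) -> Drop:
    """Emit a drop at a random spot in the square emission area."""
    return Drop(rng.uniform(-AREA, AREA), height, rng.uniform(-AREA, AREA))


def step(drop: Drop, dt: float, rng: random.Random) -> Drop:
    """Advance one drop by dt seconds and respawn it at the top when it lands."""
    drop.vy += GRAVITY * dt
    # Random sideways nudges so the rain does not fall perfectly straight down.
    drop.vx += rng.gauss(0.0, NOISE) * dt
    drop.vz += rng.gauss(0.0, NOISE) * dt
    drop.x += drop.vx * dt
    drop.y += drop.vy * dt
    drop.z += drop.vz * dt
    return spawn(rng) if drop.y <= 0.0 else drop


def tail_length(drop: Drop) -> float:
    """Stretched-billboard idea: the faster a drop moves, the longer its tail."""
    speed = (drop.vx ** 2 + drop.vy ** 2 + drop.vz ** 2) ** 0.5
    return TAIL_SCALE * speed


rng = random.Random(1)
# Start drops at random heights so the downpour is uniform rather than one wave.
drops = [spawn(rng, rng.uniform(0.0, SPAWN_HEIGHT)) for _ in range(1000)]
for _ in range(60):  # one second at 60 frames per second
    drops = [step(d, 1 / 60, rng) for d in drops]
print(max(tail_length(d) for d in drops))
```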
Designing for the player
The most enjoyable aspect of creating this game-like experience was going through our old photos to find a few that represented each year as well as thinking about the various events that happened throughout the year. It was a nice trip down memory lane.
Although the photos and text tell a story that is sequential in nature, the question was whether they necessarily had to be experienced sequentially as well.
I considered two ways to handle progression through the game:
- Limit the initial environment with something like walls and corridors, guiding the player from exhibit to exhibit.
- Make the environment completely open, allowing the player to freely visit each exhibit in any order and with no restrictions.
The first option, limiting the player, would give me more control over the player’s movements and the “narrative” (if there was such a thing) of the experience, but it also felt like it would constrain the player. This can sometimes be a good technique to control pacing (a lot of games do this), but here it seemed unnecessarily constricting.
So I decided to go with the second option, the open environment, but I still wanted to provide some guidance to help the player navigate the scene. I did this by creating a dirt path that leads through the grass between the exhibits. I thought it was a nice, obvious and non-constricting way to guide the player a bit:
During the first 5-60 seconds of the game, the player is presented with the movement keys and the purpose of the game in a series of three welcome messages that show up on the screen as 3D text.
I wanted to be absolutely sure that the player could not miss the information, especially the movement keys. To achieve this, I added some constraints to the otherwise open environment at the initial stage of the game.
If you look closely at the aerial top view above, you might notice a long green shape at the center of the scene. This is actually a cylinder (or tunnel) floating 50 meters above the ground. The player starts the game inside the tunnel, and can only move forward and backward, ensuring that the information is difficult (but not impossible) to miss.
Furthermore, during the first 1-2 seconds, the camera is actually fixed in place, showing the movement keys while the start menu is fading out.
To make the cylinder/tunnel slightly more interesting, I painted it a bright green and used a normal map from a tree bark texture to give some resemblance of walking inside of a tree trunk.
When the player steps over the edge of the cylinder, they land near the first exhibit.4
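The intro rules described above (a fixed camera for the first second or two, movement restricted to the tunnel's axis, and a drop to the ground when the player walks off the open end) could be expressed roughly like this. It is a hypothetical plain-Python sketch, not the project's Unity code; the 1.5-second lock, the tunnel length of 30 units along the z axis and the instant landing are assumptions.

```python
from dataclasses import dataclass

CAMERA_LOCK_SECONDS = 1.5  # assumption: somewhere in "the first 1-2 seconds"
TUNNEL_START = 0.0         # hypothetical closed end of the tunnel (z axis)
TUNNEL_END = 30.0          # hypothetical open end of the tunnel
TUNNEL_HEIGHT = 50.0       # the tunnel floats 50 meters above the ground


@dataclass
class Player:
    x: float = 0.0
    y: float = TUNNEL_HEIGHT
    z: float = 1.0
    in_tunnel: bool = True


def move(player: Player, dx: float, dz: float, elapsed: float) -> Player:
    """Apply a movement request (dx sideways, dz forward) under the intro rules."""
    if elapsed < CAMERA_LOCK_SECONDS:
        return player  # camera fixed while the start menu fades out
    if player.in_tunnel:
        # Only forward/backward movement: sideways input is ignored, and the
        # closed end of the tunnel stops the player from walking backwards out.
        player.z = max(TUNNEL_START, player.z + dz)
        if player.z > TUNNEL_END:  # stepped over the open end
            player.in_tunnel = False
            player.y = 0.0  # landed near the first exhibit (the fall is not simulated)
        return player
    # Out in the open environment: free movement.
    player.x += dx
    player.z += dz
    return player
```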
I hope the above sections have provided at least some idea of how my little game-like experience turned out. I have not described everything, and there were even a few more ideas that did not make their way into the game at all, but I decided to stop the project when the core content was in a state I was satisfied with.
And then it was time to launch it, i.e. get my wife to play the game. I really wanted to see her reaction while playing, but I let her go through it by herself at her own pace.
I got quite emotional about it actually. Having revisited the memories of nice moments from the past while working on the project, I was already on a trip of nostalgia. Showing the game to my wife was the culmination of that journey, and when I heard a giggle coming from her room, I shed a little tear.
Even for a simple game-like experience like the one I created, there are still many little decisions that go into making it. Thinking through these decisions, playing around with solutions and seeing the result is often rewarding and interesting, and I can totally understand the appeal of working professionally with games and similar creative endeavors.
I also have a newfound appreciation for how long it takes to produce game content. Even though I am an amateur in everything that has to do with game design (except for writing code), and my project was extremely small in scope, it is still easy to see why it takes so long to create games, and why people specialize in modeling, programming, animation, sound design etc. instead of trying to do everything.
I do not think this is the last time I will dabble with creating games. I hope to be able to combine aspects of my professional work-life (data science/ML/AI) with game creation. That would be a win-win for a side-project indeed.
Continue on page 2 if you are interested in reading a bit more about my experience with Unity. If this does not sound interesting, you can just stop reading here. Thank you for making it this far :-)
Some weeks ago, I was at a get-together with my old university friends that we call “the hack day”. It usually revolves around drinking lots of coffee and soft drinks, eating loads of chips and candy, and working on the occasional masterpiece project like Zombie Hugs.
At the end of the day, one of us (I cannot remember who now) mentioned how useful it would be to have a neural network that could add numbers together.
The remark was meant as a joke, but it got me thinking, and on the way home on the train, I pieced together some code for creating a neural network that could perform addition on two numbers between 0 and 9. Here’s the original code.
Warning: The rest of this post is probably going to be a complete waste of your time. The whole premise for this post is based on a terrible idea and provides no value to humanity. Read on at your own risk :-)
Making addition more interesting
It is worth mentioning that it is actually trivial to make a neural network add numbers. For example, if we want to add two numbers, we can construct a network with two inputs and one output, set the weights between input and output to 1, and use a linear activation function on the output, as illustrated below for 20 + 22:
It is not really the network itself that performs addition. Rather, it just takes advantage of the fact that a neural network uses addition as a basic building block in its design.
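The trivial adding network is small enough to write out by hand. This sketch uses plain Python instead of a neural network library; the two weights are fixed to 1 as described rather than learned, and there is no bias term.

```python
def linear_adder(x1: float, x2: float) -> float:
    """Two inputs, one output, both weights 1, linear (identity) activation."""
    weights = (1.0, 1.0)
    weighted_sum = weights[0] * x1 + weights[1] * x2
    return weighted_sum  # identity activation: the output is the weighted sum


print(linear_adder(20, 22))  # 42.0
```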
Things get more interesting if we add a hidden layer and use a non-linear activation function like a sigmoid, thereby forcing the output of the hidden layer to be a list of numbers between 0 and 1. The final output is still a single number, which is a linear combination of the outputs of the hidden layer. Here is a network with 4 hidden nodes as an example:
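A forward pass through such a network can be written out in a few lines. The parameters below are arbitrary placeholders rather than trained values, and the bias terms are an assumption (dense layers in common libraries include one by default); the point is only the shape of the computation: squash into (0, 1) in the hidden layer, then take a plain weighted sum.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_layer(x1: float, x2: float,
                 weights: Sequence[Tuple[float, float]],
                 biases: Sequence[float]) -> List[float]:
    """Each hidden node squashes a weighted sum of both inputs into (0, 1)."""
    return [sigmoid(w1 * x1 + w2 * x2 + b) for (w1, w2), b in zip(weights, biases)]


def predict(x1: float, x2: float,
            weights: Sequence[Tuple[float, float]], biases: Sequence[float],
            out_weights: Sequence[float], out_bias: float) -> float:
    """Linear output node: a weighted sum of the hidden activations plus a bias."""
    hidden = hidden_layer(x1, x2, weights, biases)
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias


# Placeholder parameters for a 4-node hidden layer (untrained, for illustration).
W = [(0.5, -0.2), (0.1, 0.3), (-0.4, 0.4), (0.2, 0.2)]
B = [0.0, -0.1, 0.2, 0.0]
print(hidden_layer(2, 1, W, B))
print(predict(2, 1, W, B, out_weights=[1.0, 2.0, 3.0, 4.0], out_bias=0.5))
```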
When we ask a computer to perform 2 + 1, the computer is really doing 10 + 01 (2 is 10 in binary and 1 is 01). In the back of my mind, I had this thought that the neural network might “discover” an encoding in the hidden layer that was close to the binary representation of the input numbers.
For example, for the 4-node hidden layer network illustrated above, we could imagine the number from input1 being encoded in h1 and h2 and the number for input2 being encoded in h3 and h4.
For 2 + 1, the four hidden nodes would then be 1, 0, 0 and 1, and the final output would convert binary to decimal (2 and 1) and add the numbers together to get 3 as result:
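The imagined encoding can be checked with a few lines of Python. The sketch assumes the hidden layer outputs exact bits, h1 and h2 for the first number and h3 and h4 for the second (which, as discussed below, a sigmoid layer is unlikely to produce); the output weights 2, 1, 2 and 1 are chosen by hand so that the linear output node converts both binary numbers to decimal and adds them in one step.

```python
from typing import List, Sequence


def to_bits(n: int, width: int = 2) -> List[int]:
    """Binary digits of n, most significant first (2 -> [1, 0])."""
    return [(n >> i) & 1 for i in reversed(range(width))]


def decode_sum(hidden: Sequence[float],
               out_weights: Sequence[float] = (2.0, 1.0, 2.0, 1.0)) -> float:
    """Linear output node: weights 2 and 1 turn each pair of bits back into a number."""
    return sum(w * h for w, h in zip(out_weights, hidden))


hidden = to_bits(2) + to_bits(1)   # h1..h4 for 2 + 1
print(hidden, decode_sum(hidden))  # [1, 0, 0, 1] 3.0
```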
Since the hidden nodes are restricted to be between 0 and 1, it seemed intuitive to me that a binary representation of the input would be a very effective way of encoding the data, and the network would thus discover this effective encoding, given enough data.
To be honest, I did not think this through very thoroughly. I should have realized that:
- The sigmoid function can produce any decimal number between 0 and 1, thus allowing for a very wide range of values. Many of these would probably work well for addition, so it is unlikely it would “choose” to produce zeros and ones only.
- It is unclear how a linear combination of input weights would produce the binary representation in the first place.
That second point is important. For the 2-bit (2 nodes) encoding, we would have to satisfy these equations (where S(x) is the sigmoid function and w1 and w2 are the weights from the input node to the 2-node hidden layer):
|Input number|Binary encoding|Equations|
|---|---|---|
|0|0, 0|S(w1 · 0) ≈ 0, S(w2 · 0) ≈ 0|
|1|0, 1|S(w1 · 1) ≈ 0, S(w2 · 1) ≈ 1|
|2|1, 0|S(w1 · 2) ≈ 1, S(w2 · 2) ≈ 0|
|3|1, 1|S(w1 · 3) ≈ 1, S(w2 · 3) ≈ 1|
Which weights w1 and w2 would satisfy these equations? Without providing a formal proof, I actually think this is impossible. For example, S(w2 · 1) ≈ 1 and S(w2 · 2) ≈ 0 cannot both be satisfied: since S is increasing, the first requires a large positive w2, while the second requires a large negative w2. Disregarding the sigmoid function, this is like saying 2x = 0 and x = 1, which is not possible.
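A quick numerical check supports this. The sketch grid-searches a single weight for each bit, using the equations exactly as in the table (no bias term), and reports the best worst-case error. The first row can never be satisfied anyway, since S(w · 0) = S(0) = 0.5 for every w, so the search optionally skips the input 0 to show that the other rows fail as well. The grid range and resolution are arbitrary choices.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Target bit for each input number, taken from the table above.
HIGH_BIT: Dict[int, int] = {0: 0, 1: 0, 2: 1, 3: 1}  # what S(w1 · x) should be
LOW_BIT: Dict[int, int] = {0: 0, 1: 1, 2: 0, 3: 1}   # what S(w2 · x) should be


def worst_error(w: float, targets: Dict[int, int], skip_zero: bool) -> float:
    """Largest |S(w · x) - target| over the input numbers x."""
    return max(abs(sigmoid(w * x) - t) for x, t in targets.items()
               if not (skip_zero and x == 0))


def best_weight(targets: Dict[int, int], skip_zero: bool = False, lo: float = -50.0,
                hi: float = 50.0, steps: int = 10001) -> Tuple[float, float]:
    """Grid-search w in [lo, hi]; return (w, worst-case error) of the best one."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return min(((w, worst_error(w, targets, skip_zero)) for w in grid),
               key=lambda pair: pair[1])


for name, targets in (("w1 (high bit)", HIGH_BIT), ("w2 (low bit)", LOW_BIT)):
    w, err = best_weight(targets, skip_zero=True)
    print(f"{name}: best w = {w:.2f}, worst error = {err:.2f}")
```

Both searches end up at w = 0 with a worst-case error of 0.5, which is no better than always answering 0.5.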
Regardless of the bad idea, false assumption or whatever, I still went ahead and made the following experiment:
- Use two input numbers.
- Use 1, 2, 4, 8 or 16 nodes in the hidden layer.
- Use mean squared error (MSE) on the predicted sum as the loss function.
- Generate 10,000 pairs of numbers and their sum for training data.
- Use 20% of samples as validation data.
- Allow the sum of the two numbers to be at most 4, 8 or 16 bits large (i.e. 16, 256 and 65536).
- Train for at most 1000 epochs.
When measuring accuracy, the predicted number is rounded to the nearest integer and is either correct or not. For example, if the network says 2 + 2 = 4.4, it is considered correct, but if it says 2 + 2 = 4.6, it is considered incorrect. 20% accuracy thus means that it correctly adds the two numbers 20% of the time on a test dataset.
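The data generation, split and accuracy measurement can be sketched as follows. How the original experiment sampled the pairs is not stated, so drawing both numbers uniformly from 0 to half the maximum sum is an assumption (it does produce the overrepresentation of mid-range sums discussed below); the 10,000 samples, the 20% validation split and the rounding rule follow the description above.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[int, int, int]  # (a, b, a + b)


def make_dataset(max_sum: int, n: int = 10_000, seed: int = 0) -> List[Sample]:
    """Hypothetical sampling scheme: a and b uniform in 0..max_sum // 2."""
    rng = random.Random(seed)
    half = max_sum // 2
    samples = []
    for _ in range(n):
        a, b = rng.randint(0, half), rng.randint(0, half)
        samples.append((a, b, a + b))
    return samples


def train_validation_split(samples: List[Sample], validation_fraction: float = 0.2):
    """Hold out the last `validation_fraction` of the (already random) samples."""
    n_validation = round(len(samples) * validation_fraction)
    cut = len(samples) - n_validation
    return samples[:cut], samples[cut:]


def accuracy(predictions: Sequence[float], targets: Sequence[int]) -> float:
    """Fraction of predicted sums that are correct after rounding to the nearest integer."""
    correct = sum(round(p) == t for p, t in zip(predictions, targets))
    return correct / len(targets)


train, validation = train_validation_split(make_dataset(max_sum=16))
print(len(train), len(validation))   # 8000 2000
print(accuracy([4.4, 4.6], [4, 4]))  # 0.5: 4.4 rounds to 4 (correct), 4.6 to 5 (wrong)
```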
Here is a summary of the accuracy and error of these models:
|Number of hidden nodes|Maximum sum|Accuracy on test data|Error (MSE)|
|---|---|---|---|
There are a few things that are interesting here:
- The 1-node network cannot add numbers at all.
- Networks with 2 or more hidden nodes get high accuracy when adding numbers with a sum of at most 16.
- All networks perform poorly when adding numbers with a sum of at most 256.
- All networks have abysmal performance for numbers with a sum of at most 65536.
- Adding more hidden nodes improves performance most of the time.
Here is a plot of the validation loss for the different networks after each epoch. Training can stop early if the performance does not improve, which explains why some lines are shorter than others:
Exploring prediction errors
Let us look at the prediction error for each pair of numbers. For example, the 1-node network trained on sums up to 16 has an overall accuracy of 20%. When we add 2 + 2 with this network, we get 6.42, so the error is 2.42 in this case. If we try a lot of combinations of numbers, we can plot a nice 3D error surface like this:
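An error surface like this is simply the absolute prediction error evaluated on a grid of input pairs, and a cross section for a fixed sum (such as the one for 120 further down) is the same error restricted to the line a + b = sum. In the sketch, `predict` stands for whichever trained network is being inspected; `always_eight` is a hypothetical stand-in that mimics a model with a valley at sums of 8.

```python
from typing import Callable, List

Predictor = Callable[[int, int], float]


def error_surface(predict: Predictor, max_value: int) -> List[List[float]]:
    """|predict(a, b) - (a + b)| for all a, b in 0..max_value (rows: a, columns: b)."""
    return [[abs(predict(a, b) - (a + b)) for b in range(max_value + 1)]
            for a in range(max_value + 1)]


def cross_section(predict: Predictor, total: int) -> List[float]:
    """Errors along the line a + b = total, indexed by a."""
    return [abs(predict(a, total - a) - total) for a in range(total + 1)]


def always_eight(a: int, b: int) -> float:
    """Hypothetical stand-in for a poor network that always answers 8."""
    return 8.0


surface = error_surface(always_eight, 8)
print(surface[2][2])                       # 4.0: 2 + 2 is 4, but the model says 8
print(surface[3][5])                       # 0.0: the valley where a + b = 8
print(cross_section(always_eight, 8)[:3])  # [0.0, 0.0, 0.0]
```

The nested list can be handed to any 3D plotting tool to get a surface like the ones shown in this post.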
It looks like the network is good at predicting numbers where the sum is 8 (the valley in the chart), but not very good at predicting anything else. The network is probably overfitting to numbers that sum to 8, because the training data contains disproportionately many samples that sum to 8.
Adding an extra node brings the accuracy up above 90%. The error surface plot for this network also looks better, but both networks struggle with larger numbers:
When predicting sums up to 256, the 1-node hidden layer model shows the same error pattern, i.e. a valley (low error) for sums close to 130. In fact, the network only ever predicts values between 78 and 152 (this cannot be seen from the graph), so it really is a terrible model:
The 2-node hidden layer network does not do much better for sums up to 256, which is expected since its accuracy is just 2%. But it looks fun:
As can be seen in the table above, even the 16-node hidden layer network only had 6% accuracy for sums up to 256. The error plot for this network looks like this:
I find this circular shape to be quite interesting. It almost looks like a moat around a hill. The network correctly predicts some sums between 51 and 180, but there is an error bump in the middle.
For example, for the sum 120, 60 + 60 is predicted as 128.8 (error = 8.8), but 103 + 17 is predicted as 119.9 (error = 0.1) which is correct when rounded. The error curve for numbers that sum to 120 is essentially a cross section of the 3D plot where the hill is more visible:
I have no idea why this specific pattern emerges, but I find it interesting that it looks similar to the 2-node network when predicting sums up to 16 (the hill and the moat). A more mathematically inclined person could probably provide me with some answers.
Finally, for the networks that were trained on sums up to 65536, we saw abysmal performance in all cases. Here is the error surface for the 16-node network which was the “best” performing one:
The lowest error this network gets on a test set is for the sum 3370 + 329 = 3699, which the network predicts as 3745.5 (error = 46.5).
In fact, the network mostly just predicts the value 3746. As the input numbers a and b get larger, the hidden layer always produces all 1’s and 0’s (or values very close to 0 and 1), so the final output is always the same. This already starts happening when a and b are larger than around 10, which probably indicates that the network needs more time to train.
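The saturation effect is easy to reproduce with any fixed set of weights. Once the weighted input to a sigmoid node is large in magnitude, its output is pinned at (almost exactly) 0 or 1, and the linear output layer then returns the same number however much the inputs grow. The 3-node weights below are invented for illustration and are not those of the trained 16-node network.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Made-up parameters for a 3-node hidden layer: (weight for a, weight for b, bias).
HIDDEN = [(0.5, 0.5, -1.0), (-0.3, -0.2, 0.5), (0.8, 0.1, -2.0)]
OUT_WEIGHTS = [1000.0, 2000.0, 700.0]
OUT_BIAS = 500.0


def hidden_values(a: float, b: float) -> List[float]:
    return [sigmoid(wa * a + wb * b + bias) for wa, wb, bias in HIDDEN]


def predict(a: float, b: float) -> float:
    return sum(w * h for w, h in zip(OUT_WEIGHTS, hidden_values(a, b))) + OUT_BIAS


# Small inputs give varying outputs; large inputs all give the same answer.
for a, b in [(0, 0), (1, 1), (100, 100), (3370, 329), (20000, 30000)]:
    print(a, b, [round(h, 3) for h in hidden_values(a, b)], round(predict(a, b), 1))
```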
The inner workings of the network
My initial interest was in how the networks decided to represent numbers in the hidden layer of the network.
To keep things simple, let us just look at the 2-node hidden layer network on sums up to 16 since this network produced mostly correct sum predictions.
What actually happens when we predict 2 + 2 with this network is illustrated below. The number above an edge in the graph is the weight between the nodes. There is a total of 6 weights for this network (4 from input layer to hidden layer and 2 from hidden layer to output layer):
One thing that might be of interest is the pair of final weights, 22.7 and -23.5. The network sums numbers by treating the first hidden node as contributing positively to the sum and the second hidden node as contributing negatively, and the two weights are almost equal in magnitude.
It turns out that the 4-node hidden layer network works the same way. Here, there are 4 weights between the hidden layer and the output layer, and these are (rounded) 8, 8, 9 and -25. So we still have the large negative weight, but the positive weighting is now split between three hidden nodes with lower values that sum to 25. When calculating 2 + 2, the outputs of the hidden layer are 0.6, 0.6, 0.6 and 0.4, which are exactly the same values as in the 2-node network.
The same goes for the 8-node network. The 8 output weights are 3, 3, 3, 3, 4, 5, 5 and -25 (the positive numbers sum to 26). When predicting 2 + 2, the hidden layer outputs 0.6, …, 0.6 and 0.4, same as before.
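The similarity boils down to simple arithmetic: when several hidden nodes produce the same activation h, their output weights can be added together, so 8h + 8h + 9h contributes exactly what one node with weight 25 would. The sketch below uses the rounded weights and 2 + 2 activations quoted above; the output bias is not given in the text, so it is left out, and the grouping only holds for inputs where those nodes really do produce equal activations.

```python
from typing import Dict, Sequence


def output_contribution(out_weights: Sequence[float], hidden: Sequence[float]) -> float:
    """Weighted sum of the hidden activations (the unknown output bias is left out)."""
    return sum(w * h for w, h in zip(out_weights, hidden))


def merge_equal_nodes(out_weights: Sequence[float],
                      hidden: Sequence[float]) -> Dict[float, float]:
    """Add up the output weights of hidden nodes that have the same activation."""
    merged: Dict[float, float] = {}
    for w, h in zip(out_weights, hidden):
        merged[h] = merged.get(h, 0.0) + w
    return merged


# Rounded output weights and hidden activations for 2 + 2, as quoted above.
four_node = ([8, 8, 9, -25], [0.6, 0.6, 0.6, 0.4])
eight_node = ([3, 3, 3, 3, 4, 5, 5, -25], [0.6] * 7 + [0.4])

for name, (weights, hidden) in (("4-node", four_node), ("8-node", eight_node)):
    print(name, merge_equal_nodes(weights, hidden))
```

For 2 + 2, both networks collapse to one positive weight of 25 to 26 on an activation of 0.6 and a weight of -25 on an activation of 0.4, the same two-node pattern as the 2-node network.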
Once again, I am a bit stumped as to why this could be, but it seems that for this particular case, these networks find a similar solution to the problem.
If you made it this far, congratulations! I have already spent way more time on this post than it deserves.
I learned that using neural networks to add numbers is a terrible idea, and that I should spend some more time thinking before doing. That is at least something.
The experimentation code can be found here.