Apache Beam MongoDB reader for Python

The Apache Beam SDK for Python is currently lacking some of the transforms found in the Java SDK. I created a very minimal example of an Apache Beam MongoDB read transform for Python that might be useful for someone else looking for an answer.

I will update this post in the future if/when the Apache maintainers include support for MongoDB in the SDK. I know I could contribute to the project directly, but I don’t have time for it right now unfortunately :-)

Golly Gosh Moments

You know that moment when you realize that you have been doing something the wrong way for a very long time? For the sake of the low profanity rating of this blog, let’s call these golly gosh moments, although the Millennials might better understand #FML. Homer just says “D’oh!”

So it’s just a normal day at the office, and I want to see if I can do an IP address lookup to get the approximate geo-location of a website visitor. I find an IP address that starts with 10, which turns out to be part of a private IP range (10.0.0.0/8). The next IP is the same. And the next.

To make a long story short, it turned out that we had been saving Heroku IP addresses in our logs instead of the user's IP address, for all our widget tracking, for all of time. Heroku sits in front of the app as a proxy, so the actual client IP address is in the X-Forwarded-For header. For educational purposes, here is how to make an Express.js app behave correctly behind a trusted proxy:

// App is an express.js app
app.enable('trust proxy');

// req.ip now contains the correct
// IP address during requests.

A one-liner made a world of difference for the logging. Golly Gosh.
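
For anyone curious about what the setting changes, here is a minimal sketch of the idea. It is not Express's actual implementation, and the helper name `clientIp` and the sample addresses are made up for the example. It assumes every proxy in the chain is trusted, in which case the client address is the left-most entry of the comma-separated X-Forwarded-For header, with the socket address as the fallback.

```javascript
// Hypothetical helper, for illustration only. Express does the equivalent
// for you once 'trust proxy' is enabled.
function clientIp(headers, socketAddress) {
  const forwarded = headers['x-forwarded-for'];
  if (!forwarded) {
    // No proxy header: the socket address is the best we have.
    return socketAddress;
  }
  // The header is a comma-separated list: client, proxy1, proxy2, ...
  const first = forwarded.split(',')[0].trim();
  return first || socketAddress;
}
```

Called with a header of `203.0.113.7, 10.1.2.3` and a socket address of `10.1.2.3`, the sketch returns `203.0.113.7` rather than the private proxy address. Only rely on this header when the app really is behind a proxy you control, since clients can set it themselves.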

Fractals revisited

Going through old code can be fun and educational. While updating my website, I took another look at some of my featured code. When I came across my simple fractal simulations on the <canvas> element, I was quite surprised to see how badly I had violated the Don’t Repeat Yourself (DRY) principle. The three simulations share more than 80% of their code, but each was defined in a separate file where all of that code was repeated. The performance of the simulations had bothered me earlier too, so I decided to take a look at the code and did the following:

  • Consolidated the three simulation files into a single file.
  • Optimized the animation loop.
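
The refactored code is not reproduced in this post, so the following is only a hypothetical sketch of what such a consolidation can look like: one shared runner, where the per-fractal drawing function is the only part that differs. The names `createSimulation`, `step` and `schedule` are assumptions made for the example. The scheduler is passed in so that the loop can be driven by `requestAnimationFrame` in the browser.

```javascript
// Hypothetical sketch: a single animation runner shared by all simulations.
// `step(frame)` draws one frame; `schedule(callback)` requests the next one.
function createSimulation(step, schedule) {
  let frame = 0;
  let running = false;

  function tick() {
    if (!running) {
      return; // stopped: do not draw and do not reschedule
    }
    step(frame);
    frame += 1;
    schedule(tick);
  }

  return {
    start() {
      if (running) {
        return; // avoid scheduling a second loop
      }
      running = true;
      schedule(tick);
    },
    stop() {
      running = false;
    },
    frames() {
      return frame;
    },
  };
}
```

In a browser this could be wired up as `createSimulation(drawFrame, (cb) => window.requestAnimationFrame(cb))`, where `drawFrame` stands for whichever fractal is being rendered.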

It was a fun little evening project to refactor some old code. There’s still some work that could be done, like removing the hardcoded dependency on a canvas element with a specific ID, but for a little showcase like this, I do not want to bother too much about that.

By the way, the code is online.

Open-sourcing the past

While studying at the University of Oregon, I worked as a teaching assistant in three different computer science courses. One of them, CIS 323 Data Structures Lab, was a bit special because it had its own course number and I was teaching it almost on my own.

It was quite a roller coaster ride.[1]

Anyway, throughout the course, we implemented some classic and often-used data structures and algorithms in various forms. In my opinion, the most notable one was a fairly new balanced tree called the left-leaning red-black (LLRB) tree, invented by Robert Sedgewick in 2008. Back in the beginning of 2010, I could not find any publicly available C++ implementation of the LLRB tree[2], which made it fun to use in class because it was so new. This means there is a possibility that my implementation was the first-ever implementation of the LLRB tree in C++. It is a fun thought but not a very significant one, considering that it is only a few lines of code, the delete operation was not implemented, and it was never released. Until now.

I recently went through some old course material and found the code. I emailed the University of Oregon and the course supervisor, and with their permission, here is the code, which I might expand with a few more data structures once I have looked through the material. I have refactored the code from the original, but it still bears the mark of a C++ beginner. It was fun going through it again, though.
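
For readers who have not met the LLRB tree before, here is a minimal sketch of insertion. It is not the course code, which was C++. It is written in JavaScript to match the other snippet on this page, it follows the rotate-and-flip scheme of the 2-3 variant of the LLRB tree, and all names are chosen for the example. Like the original, it leaves out the delete operation.

```javascript
const RED = true;
const BLACK = false;

// New nodes are always attached with a red link to their parent.
function makeNode(key, value) {
  return { key, value, left: null, right: null, color: RED };
}

function isRed(node) {
  return node !== null && node.color === RED;
}

// Turn a right-leaning red link into a left-leaning one.
function rotateLeft(h) {
  const x = h.right;
  h.right = x.left;
  x.left = h;
  x.color = h.color;
  h.color = RED;
  return x;
}

// Turn a left-leaning red link into a right-leaning one (used temporarily).
function rotateRight(h) {
  const x = h.left;
  h.left = x.right;
  x.right = h;
  x.color = h.color;
  h.color = RED;
  return x;
}

// Split a node with two red children by passing the red link upwards.
function flipColors(h) {
  h.color = !h.color;
  h.left.color = !h.left.color;
  h.right.color = !h.right.color;
}

function insertNode(h, key, value) {
  if (h === null) {
    return makeNode(key, value);
  }
  if (key < h.key) {
    h.left = insertNode(h.left, key, value);
  } else if (key > h.key) {
    h.right = insertNode(h.right, key, value);
  } else {
    h.value = value; // existing key: overwrite the value
  }

  // Restore the left-leaning invariants on the way back up.
  if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
  if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
  if (isRed(h.left) && isRed(h.right)) flipColors(h);
  return h;
}

// Public entry point: returns the new root, which is always black.
function insert(root, key, value) {
  const newRoot = insertNode(root, key, value);
  newRoot.color = BLACK;
  return newRoot;
}

function height(node) {
  return node === null ? 0 : 1 + Math.max(height(node.left), height(node.right));
}
```

Starting from `let root = null` and calling `root = insert(root, key, value)` for each key keeps the tree balanced even when the keys arrive in sorted order, which is the case where a plain binary search tree degenerates into a linked list.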

  1. The course used C++ so the students could get a feel for something other than Java. I could probably have used something else, but I stuck to the syllabus from previous years. Unfortunately, I had never written a line of C++ code before, never designed an assignment, never given a lecture, and never spoken to an audience of 50 people. It was scary as hell. I had bits and pieces of assignments from previous years, but in the end I had to change the syllabus slightly and write quite a bit of C++ from scratch.

  2. The reference implementation was written in Java.