Apache Beam MongoDB reader for Python

The Apache Beam SDK for Python is currently lacking some of the transforms found in the Java SDK. I created a very minimal example of an Apache Beam MongoDB read transform for Python that might be useful for someone else looking for an answer.

I will update this post in the future if/when the Apache maintainers include support for MongoDB in the SDK. I know I could contribute to the project directly, but I don’t have time for it right now unfortunately :-)

Golly Gosh Moments

You know that moment when you realize you have been doing something the wrong way for a very long time? For the sake of this blog’s low profanity rating, let’s call these golly gosh moments, although Millennials might better understand #FML. Homer just says d’oh.

So it’s just a normal day at the office, and I want to see if I can do an IP address lookup to get the approximate geolocation of a website visitor. I pick an IP from the logs, and it starts with 10, which puts it in a private IP range (10.0.0.0/8). The next IP is the same. And the next.

To make a long story short, it turned out that we had been saving Heroku IP addresses in our logs instead of the user’s IP address, for all of our widget tracking, from the very beginning. Heroku acts as a proxy, so the actual client IP address arrives in the X-Forwarded-For header. For educational purposes, here is how to make an Express app behave better behind a trusted proxy:

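const express = require('express');
const app = express();

// The app runs behind a proxy (Heroku), so tell Express
// to trust the X-Forwarded-For header it sets.
app.enable('trust proxy');

// req.ip now contains the correct
// IP address during requests.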

A one-liner made a world of difference for the logging. Golly Gosh.
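
For anyone wondering what the setting changes: behind a proxy, the socket address belongs to the proxy, while the X-Forwarded-For header carries a comma-separated list whose left-most entry is the original client. The sketch below is a simplified illustration of that idea, not Express's actual implementation; the function name clientIp and the sample addresses are made up for the example.

```javascript
// Simplified illustration only. Express delegates the real work to its
// proxy-addr dependency, which supports finer-grained trust settings.
// clientIp is a hypothetical helper, not part of Express.
function clientIp(forwardedFor, socketAddress) {
  // No header: the request was not proxied, so use the socket address.
  if (!forwardedFor) return socketAddress;

  // The header looks like "client, proxy1, proxy2".
  const hops = forwardedFor
    .split(',')
    .map((hop) => hop.trim())
    .filter(Boolean);

  // The left-most entry is the address the first proxy saw.
  return hops.length > 0 ? hops[0] : socketAddress;
}

// Sample values (documentation and private ranges, not real visitors).
const proxied = clientIp('203.0.113.7, 10.0.0.5', '10.0.0.9');
const direct = clientIp(undefined, '10.0.0.9');
```

Since clients can send this header themselves, it only makes sense to trust it when the app really sits behind a proxy you trust, which is why Express leaves the setting off by default.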


Fractals revisited

Going through old code can be fun and educational. While updating my website, I took an extra look at some of my featured code. When I came across my simple fractal simulations on the <canvas> element, I was quite surprised to see how badly I had violated the Don’t Repeat Yourself (DRY) principle. The three simulations share more than 80% of their code, yet each was defined in its own file with all of that code repeated. The performance of the simulations had bothered me earlier, so I decided to take a look at the code and did the following:

  • Consolidated the three simulation files into a single file.
  • Optimized the animation loop.

It was a fun little evening project to refactor some old code. There’s still some work that could be done, like removing the hardcoded dependency on a canvas element with a specific ID, but for a little showcase like this, I do not want to bother too much about that.
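
Roughly, a consolidation like this boils down to one shared driver plus a small function per fractal. The sketch below is not the code from the site; createSimulation, sierpinski, stepsPerFrame and maxSteps are hypothetical names, the chaos-game Sierpinski triangle is a stand-in fractal, and drawing several points per frame is just one assumed way to speed up an animation loop. The canvas context is passed in rather than looked up by ID.

```javascript
// Hypothetical sketch: shared driver used by every simulation.
function createSimulation(ctx, drawStep, options = {}) {
  // requestFrame is injectable so the loop can also run outside a browser.
  const requestFrame =
    options.requestFrame || ((cb) => requestAnimationFrame(cb));
  const stepsPerFrame = options.stepsPerFrame || 50;
  const maxSteps = options.maxSteps || 100000;

  let stepsDrawn = 0;
  let running = false;
  let pending = false; // true while a frame callback is queued

  function schedule() {
    if (pending) return; // never queue two frames at once
    pending = true;
    requestFrame(frame);
  }

  function frame() {
    pending = false;
    if (!running) return;
    // Batch several steps into one frame instead of one step per frame.
    for (let i = 0; i < stepsPerFrame && stepsDrawn < maxSteps; i += 1) {
      drawStep(ctx, stepsDrawn);
      stepsDrawn += 1;
    }
    if (stepsDrawn < maxSteps) {
      schedule();
    } else {
      running = false;
    }
  }

  return {
    start() {
      if (running) return;
      running = true;
      schedule();
    },
    stop() {
      running = false;
    },
    get stepsDrawn() {
      return stepsDrawn;
    },
  };
}

// Stand-in fractal: chaos-game Sierpinski triangle. Each step jumps
// halfway toward a randomly chosen corner and plots one pixel.
function sierpinski(width, height) {
  const corners = [[width / 2, 0], [0, height], [width, height]];
  let x = width / 2;
  let y = height / 2;
  return (ctx) => {
    const [cx, cy] = corners[Math.floor(Math.random() * corners.length)];
    x = (x + cx) / 2;
    y = (y + cy) / 2;
    ctx.fillRect(x, y, 1, 1);
  };
}
```

In a browser, something like createSimulation(canvas.getContext('2d'), sierpinski(canvas.width, canvas.height)).start() would run it, and each further fractal only needs its own small step function.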

By the way, the code is online.