After almost 2 years since I got the idea, I finally launched my side project. See the post Shiny New Things for the initial description of the project. A lot has changed since then.
Let’s start with the stack. I’m still using Rails for the back end except now Rails 6 is out. I did try Vue.js for a while but I quickly abandoned it in favour of React. While Vue is much simpler than Angular, it still has a lot extra complexity compared to React. The distinction between computed and method and template syntax come to mind. I will discuss what extra technologies I ended up using along the way later on.
I started using React without Redux because I’ve seen how much boilerplate having a datastore adds and I thought my application is simple enough that I won’t need to use Redux. This assumption held true at first. However, after a while, I started getting frustrated with how deep I need to pass some properties and handlers through components that didn’t need to know about them. So, I decided to add Redux. While it solved the problem of having to pass props and handlers down, it introduced a few problems of its own such as:
- Sharing logic between pages is convoluted.
- Writing features for the front end became a drag. To add a simple button that changes the state you need to: Create the button with a handler that calls a prop function; change the component to accept said prop; change container component to pass the prop; add the action type; create the action; and create the reducer.
Enter React Hooks. I didn’t know how hooks are supposed to be used so I winged it and I’m pretty happy with how it turned out. I will probably write a separate post about hooks so I’ll glance over it for now.
In addition to all the front end rewrites, I also rewrote the content import code to borrow from the principles of ETL. Inspired by the Kiba gem, I wrote my own ETL from scratch with different sources, transforms, and destinations that I can combine flexibly to import content from APIs and CSVs and populate the database by creating records one at a time, creating records in bulk using PostgreSQL‘s COPY command, or updating individual records’ attributes.
Were all upgrades and technology switching worth it? From a development perspective? Absolutely! It makes development so much more enjoyable and faster. From a pure business value? I don’t know. On one hand, I get features out much faster. On the other hand, I don’t know where the website would be by now had I launched sooner without all the rewrites.
The plan was simple: create the database, seed it, add content from the websites I support for the initial launch, and start telling people about it. Oh, how I wish it was that easy!
Lack of (Thorough) Testing
The first issue I encountered was with the import script. The COPY command works by taking a CSV file and copying its content to a table. The import script shards the many-to-many table data every 10 million records. One mistake I made was simply not testing that during the import script rewrite because I was running the script with 10,000 records at a time. There was a bug with the sharding that prevented content population from completing. The fix was simple but hunting the bug down took a while. This is 100% my fault (then again, so is everything that goes wrong with this project since I’m the only one working on it).
Unforeseen Scale Issues
Along the way, seeing certain queries being run repeatedly while returning the same results (the content of the home page, for example), I added caching using Redis (the first technology added along the way). I plan on improving the cache invalidation mechanism eventually but, for now, it caches the content for 5 minutes before invalidating it. It’s simple but it gets the job of reducing queries done.
I always imported 10,000 records and used that for testing on my local development machine which is a mid-2013 MacBook Air with 128 GB of storage. As a result, I didn’t know what issues I would have with scale until I had all almost-19-million entities!
The first issue that presented itself was search. Filtering content based on certain criteria such as tags took a really long time to do using SQL queries. Selecting entries that have any tag present is easy. Selecting entries with all tags present is much, much slower. My first instinct was to split the
tags table which was a polymorphic association with 3 different entities. This helped reduce the size of the database and sped up the search but it still wasn’t enough.
Enter Searchkick. Searchkick is a gem that acts as a wrapper for Elasticsearch (the second technology added along the way). Searching is now almost-instant! It came at the cost of the complexity of another technology added and the cost associated with it.
The second issue was a consequence of using Elasticsearch. Elasticsearch provides a recommendation feature out of the box by looking at similar documents. This feature provided excellent recommendations but it was slow. It would take 30 seconds and time out for some entries. It turns out Elasticsearch can do extra indexing that would speed this up. Adding term vectors increased the index size 1.5x but made the recommendations almost instant, too!
Finally! I added all the features I wanted, they work, and they work fast. It’s time to start telling people about it.
Since it’s a NSFW project, the plan all along was to promote it on Reddit: Post content from my website to Reddit, people see it, visit my website, immediately love it, create accounts, and keep coming back. Simple, right?
No. One issue I ran into is that subreddits restrict which websites can be linked to which limited my options when it comes to which subreddits I can post in. No problem, I was still getting thousands of daily visitors, surely some of them will register, right?
No. Due to the nature of the site, and probably other factors, users are really, really hesitant to sign up. Of the over 22,000 users so far, 84 clicked on UI elements that require sign up and saw a message asking them to sign up, of those, 16 visited the sign up page, and of those, 0 registered. No problem, clearly, people aren’t seeing the need to register and they’re not registering. I will show a prompt that tells users about the features that come with having an account. Surely that will work, right?
No. I decided to try different versions of the prompt. To accomplish that, I set up goals in Google Analytics and set up an A/B experiment with Google Optimize with 4 options: No prompt (control), a prompt asking people to register, a prompt that mentioned the possibility of signing up with Reddit, and a prompt that assured users that emails are optional during sign up. Over 3,700 users saw the prompts… 0 registrations.
Along the way (and this probably didn’t help with encouraging users to sign up), I ran into issues connecting to Elasticsearch. They were intermittent, frequent, and lasted for hours at a time. While that was happening, recommendations and search wouldn’t work, and a notification saying something went wrong would show up. It turns out this was caused by a gem that Searchkick recommends: Typhoeus. It is supposed to speed up Ealsticsearch searches by maintaining an open connection. Instead, it brought it to a halt. Removing it eliminated the problem. While I added the gem before launch, it only became apparent after launch. Lesson learned though: be careful with what you add in production. When things fail, they’re seen by people. Although, it’s not as scary as it seems. The stress of worrying about launch and how data is important to users is far worse than the stress of something going wrong while the website is live, especially if it’s still usable.
So, where do I stand? Well, this is the beauty of having my own project: It’s entirely up to me. In addition to learning a lot of technologies that helped me in my daily job (it helped me fix an Elasticsearch bug at work much faster than if I hadn’t used it before), it is really up to me to determine what I want to do and how you want to do it. While the website is far, far from making enough to cover its cost, it costs little enough that I can afford to keep it up for a while while I come up with other ways to promote it.
I left a few things undiscussed that will be likely have their own posts soon:
- Isomorphic Applications: Since SEO is important in this area, I had to make sure that pages load with their content pre-loaded instead of acting like a single-page application.
- React Hooks.
- Caching: When I get around to improving cache invalidation I will likely have a post to talk about that.