Using Data to Choose a Linux Distro

Published: Jul 7, 2023

The Problem

I recently decided that I wanted to beef up my home lab a bit by adding a small, inexpensive machine that I can use as a sandbox for server projects and testing operating systems. When I realized that StackSocial had a Lenovo ThinkCentre M900 on sale for less than $200, I ordered one.

I also knew I wanted to try something besides Windows and macOS on this new machine, so I decided to evaluate Linux distros. As it turns out, choosing a Linux distro is a really tall order for someone who’s only superficially familiar with the domain; there are hundreds, if not thousands, of distributions.

So, knowing that its membership is full of Linux and FOSS fans, I turned to the Mastodon community to help me decide what to try out first.

Data Collection

I posted a poll on Mastodon asking everyone to vote for their favorite Linux distro.

I didn’t expect to get much engagement as a new member with few followers, but it seems as though this topic is something everyone has an opinion on.

The poll stayed open for a week and received 58 votes, not counting recommendations made in replies to the post (I asked everyone with an “Other” vote to specify their preference in a reply).

After the week was up, I combed through the comments to create the full dataset and analyze the results.

Analysis

I used a Jupyter Notebook for the basic analysis. Pandas helped me to import and transform the data, and I used Matplotlib for the visualizations. As there were two types of recommendations, I classified the recommendations as either distros or desktops (what’s the difference?).
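For readers who want to try something similar, here is a minimal sketch of how the import-and-classify step could look in Pandas. It is not the notebook's actual code: the column names, the `DESKTOPS` lookup set, the `classify` helper, and the sample rows are all assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; this is not the real poll data.
raw = pd.DataFrame(
    {"recommendation": ["Linux Mint", "Fedora", "KDE Plasma", "NixOS", "GNOME"]}
)

# Assumed lookup of names that are desktop environments rather than distros.
# Anything not listed here is treated as a distro.
DESKTOPS = {"kde plasma", "gnome", "xfce", "cinnamon"}


def classify(name: str) -> str:
    """Label a recommendation as 'desktop' or 'distro' (hypothetical helper)."""
    return "desktop" if name.strip().lower() in DESKTOPS else "distro"


# Add a category column so the two kinds of recommendation can be charted separately.
raw["category"] = raw["recommendation"].map(classify)
```

A hand-maintained lookup set is a reasonable choice at this scale, since a poll like this only produces a few dozen distinct names.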

From there, I added a few blocks of code to clean things up, sort the data, and create the bar charts.
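The clean, sort, and chart steps might look roughly like the sketch below. Again, this is an assumed reconstruction rather than the notebook itself: the `tally` and `plot_counts` helpers are hypothetical, and the rows use placeholder names with made-up counts so that nothing here is mistaken for the real results.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows; these are not the real poll results.
votes = pd.DataFrame(
    {
        "recommendation": ["Distro A", "Distro A ", "Distro B", "Desktop X",
                           "Distro A", "Desktop X", "Desktop Y"],
        "category": ["distro", "distro", "distro", "desktop",
                     "distro", "desktop", "desktop"],
    }
)

# Clean: trim stray whitespace so variants of the same name are counted together.
votes["recommendation"] = votes["recommendation"].str.strip()


def tally(df: pd.DataFrame, category: str) -> pd.Series:
    """Count votes per name within one category (hypothetical helper).

    value_counts() returns the counts sorted from most to least votes.
    """
    subset = df[df["category"] == category]
    return subset["recommendation"].value_counts()


def plot_counts(counts: pd.Series, title: str):
    """Draw one bar chart from a Series of counts and return its Axes."""
    fig, ax = plt.subplots(figsize=(6, 3))
    counts.plot.bar(ax=ax)
    ax.set_title(title)
    ax.set_ylabel("Votes")
    fig.tight_layout()
    return ax


distro_counts = tally(votes, "distro")
desktop_counts = tally(votes, "desktop")
distro_ax = plot_counts(distro_counts, "Distro recommendations (sample data)")
desktop_ax = plot_counts(desktop_counts, "Desktop recommendations (sample data)")
```

In a Jupyter Notebook the figures render inline once the cell runs; in a plain script you would save them with `distro_ax.figure.savefig(...)` instead.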

Results

Once I had a couple of visuals to evaluate, I found the results quite interesting and unexpected. Zorin OS was one of my personal preferences going into the poll, but it hadn’t received a single vote (since the poll closed, I’ve had some comments recommending it).

Linux Mint was by far the winner in the distro category, followed by Fedora and NixOS. So, it looks like I’ll be starting off with Linux Mint.

For desktop environments, KDE Plasma was the most recommended.

I’ve since posted the results on Mastodon, in case anyone who participated is interested in the overall outcome.

I’ve pushed the project files to my GitHub, but I recommend checking it out here using the Jupyter Notebook Viewer if you just want to see the results.



