Wednesday, December 26, 2018

Christmas trees sold in the US

It's the Christmas season and this very small data set just landed in my inbox, so I decided to make a few visualisations from the available data.

The first question that comes to mind is: what is the trend in total sales?



I expected total sales to increase steadily; however, there is not much of an increase apart from the spikes in 2007, 2013 and 2016.

The second question is how sales of fake trees and real trees trend separately.



As I expected, sales of fake trees are increasing as the years go by.

Finally, I want to see the trend in fake trees as a percentage of total trees sold. From the graph above, I would assume that this percentage is increasing.


As the graph above shows, the percentage moves up and down rather than increasing steadily.
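
For anyone who wants to check the percentage outside Power BI, the calculation is just fake trees divided by fake plus real trees for each year. Here is a minimal Python sketch of that arithmetic. The function name `fake_share` and the sample figures are made up for illustration and are not taken from the data set.

```python
def fake_share(fake_sold, real_sold):
    """Return fake trees as a percentage of all trees sold."""
    total = fake_sold + real_sold
    if total == 0:
        return None  # no sales at all, so the share is undefined
    return 100.0 * fake_sold / total


# Toy numbers invented for illustration only: year -> (fake, real).
sample = {1: (1, 3), 2: (2, 2)}

for year, (fake, real) in sorted(sample.items()):
    print(year, fake_share(fake, real))
```

Because both series can rise while their ratio falls, the share can move up and down even when fake tree sales keep growing.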

Merry Christmas and a happy new year!

Monday, December 24, 2018

Super Heroes - Marvel Vs DC: Working with non-numeric data and the COUNTROWS function

Continuing with the super heroes data set from the previous post, the first thing I notice is that the data is almost all text. Only the height and weight columns are numeric. It won't be an easy data set to work with, especially since I am mainly used to numeric data sets. Anyway, I will move forward and see what I can do.

As the title suggests, the first thing that comes to mind is the count of Marvel and DC super heroes. I will start by writing a measure that calculates the number of rows, using the COUNTROWS function:

Rows_count = COUNTROWS(heroes_information)

Now I will select the table visual and tick the Rows_count measure I just created, which shows that the data has 734 rows. I will then add Publisher to the table and rearrange the fields to get this:


Out of these, I only want the Marvel and DC superheroes, so I will filter on Publisher:

This is the resulting table I get:


The two publishers account for 82.15% of the superheroes in the data set.
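
The same count-then-filter logic can be sketched outside Power BI as a rough cross-check. The Python below counts rows per publisher and works out the share held by a chosen set of publishers. The function name `publisher_share`, the toy rows, and the publisher labels `Marvel Comics` and `DC Comics` are assumptions for illustration; check the exact labels in the Publisher column before reusing them.

```python
from collections import Counter


def publisher_share(rows, publishers):
    """Count rows per publisher and return (counts, percent in `publishers`)."""
    counts = Counter(row["Publisher"] for row in rows)
    total = sum(counts.values())
    selected = sum(counts[p] for p in publishers)
    share = 100.0 * selected / total if total else None
    return counts, share


# Toy rows invented for illustration; the real table has 734 rows.
rows = [
    {"Publisher": "Marvel Comics"},
    {"Publisher": "DC Comics"},
    {"Publisher": "Marvel Comics"},
    {"Publisher": "Other"},
]

counts, share = publisher_share(rows, {"Marvel Comics", "DC Comics"})
print(counts["Marvel Comics"], counts["DC Comics"], share)
```

The total is taken over all rows before filtering, which is what makes the result a share of the whole data set rather than of the filtered table.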

Saturday, June 23, 2018

Super Heroes - Marvel Vs DC: A brief introduction to the blog and using the first row as headers

Since this is the very first post, let me give a brief introduction to this blog. I only recently became interested in data, whether because of my day job, where I deal with a lot of data, or because of the recent rise in data-related emails in my inbox from Excel and coding subscriptions. Whatever it was that piqued my interest, I recently started making data visualisations, first at work and now also at home. However, I felt there was no use in making visualisations and keeping them to myself, however bad or, should I say, 'beginner' level they are. And that is what gave rise to the 'Fun with Data' blog.

This is the place where I will share all the visualisations I create, and not only the final output but also the thinking process that went into creating it. So, straight to the topic: I will start with the 'Super Heroes Dataset' downloaded from Kaggle, one of the sites that really got me interested in data analysis. The data set is scraped from Superhero Database. I will be using Power BI Desktop for the visualisations.

So first, loading the CSV file into Power BI:





This will give the option to select the file. Once the file is selected:




This will give a preview of the file. From here, it's just a matter of clicking the 'Load' button if you don't want to do any further editing to the data. However, in the preview you will notice that the headers have not been detected correctly. I need to correct this before I load the data, so I will click 'Edit'.


This will open the table in the Power Query Editor. From here, go to the Transform tab and click 'Use First Row as Headers':



I also don't need column 1, so I will delete it by right-clicking on the column and selecting 'Remove'.

Now, a quick glance at the data shows a lot of 'missing' values marked with '-', and I can also see '-99' values under height and weight, which really can't be possible (or can it? We are talking about super heroes after all!). For now I will treat these as missing values too. I will not do much more editing and will load the data straight away so I can start creating a report, which I will do in the next post.
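
For readers who prefer code to clicks, the clean-up steps above (use the first row as headers, drop the first column, treat '-' and '-99' as missing) can be sketched in plain Python. This is only an illustration under assumptions: the function name `load_heroes`, the tiny inline sample, and the exact missing markers (including '-99.0') are made up and may not match the real file.

```python
import csv
import io

# Markers treated as missing; "-99.0" is an assumed variant of "-99".
MISSING_MARKERS = {"-", "-99", "-99.0"}


def load_heroes(text):
    """Parse CSV text: first row as headers, first column dropped."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)[1:]  # promote the first row, skip column 1
    records = []
    for line in reader:
        # Replace missing markers with None, keep everything else as text.
        values = [None if v.strip() in MISSING_MARKERS else v for v in line[1:]]
        records.append(dict(zip(header, values)))
    return records


# Invented sample with the same shape as the preview: an unnamed index column first.
sample = ",name,Height,Weight\n0,Hero A,203.0,441.0\n1,Hero B,-99.0,-\n"

heroes = load_heroes(sample)
print(heroes[1])
```

Note that this sketch applies the missing markers to every column, which is a simplification; in a real clean-up the '-99' rule would more safely be limited to the height and weight columns.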