Saturday, June 23, 2018

Super Heroes - Marvel Vs DC: A brief introduction to the blog and converting the first row to headers.

Since this is the very first post of the blog, let me give a brief introduction to it. I have only recently become very interested in data, whether because of my day job, where I deal with a lot of data, or because of the recent rise in data-related emails in my inbox from Excel and coding subscriptions. Whatever it was that piqued my interest, I recently started making a few data visualisations, first at work and now also at home. However, I felt there was no use in just making visualisations and keeping them to myself, however bad or, should I say, 'beginner' level they are. And that is what gave rise to the 'Fun with Data' blog.

This is the place where I will share all the visualisations I create: not just the final output, but also the thinking process that went into creating it. So, straight to the topic: I will start with the data set called 'Super Heroes Dataset', downloaded from Kaggle, one of the sites that really got me interested in data analysis. The data set is scraped from Superhero Database. I will be using Power BI Desktop for the visualisations.

So first, loading the CSV file into Power BI:





This will give the option to select the file. Once the file is selected:




This will give a preview of the file. From here, it's just a matter of clicking the 'Load' button if you don't want to do any further editing to the data. However, in the preview you will notice that the headers have not been detected correctly. I need to correct this before I load the data, so I will click 'Edit'.


This will open the table in the Power Query Editor. From here, go to the Transform tab and click on 'Use First Row as Headers':
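
For anyone who prefers code to clicking, the same "first row as headers" step can be sketched in Python with pandas. This is only an illustration and not part of the Power BI workflow: the sample rows and the 'Column1' fallback name for the blank header are my own assumptions, not taken from the actual file.

```python
import io
import pandas as pd

# Hypothetical sample: a CSV whose header row has been read as ordinary data.
raw_csv = ",name,Gender,Height\n0,A-Bomb,Male,203\n1,Abe Sapien,Male,191\n"

# header=None mimics the preview, where the real headers sit in the first data row.
# dtype=str keeps every cell as text so nothing is converted behind our back.
raw = pd.read_csv(io.StringIO(raw_csv), header=None, dtype=str)

# Take the first row as column names; the blank first cell gets a placeholder name.
header = raw.iloc[0].fillna("Column1").tolist()

# Drop that first row from the data and apply the new names.
promoted = raw.iloc[1:].reset_index(drop=True)
promoted.columns = header

print(promoted)
```

Note that the values stay as text after this step, so numeric columns such as Height would still need converting before any calculations.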



I also don't need column 1, so I will delete it by right-clicking on the column and selecting 'Remove'.
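
As a rough code equivalent, removing an unwanted column in pandas could look like the sketch below. The tiny table and the column name 'Column1' are made up for illustration.

```python
import pandas as pd

# Hypothetical table with a leftover index-like first column.
df = pd.DataFrame(
    {
        "Column1": [0, 1],
        "name": ["A-Bomb", "Abe Sapien"],
        "Height": [203.0, 191.0],
    }
)

# drop() returns a new table without the named column; df itself is unchanged.
trimmed = df.drop(columns=["Column1"])

print(trimmed.columns.tolist())
```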

Now, just a quick glance at the data shows a lot of 'missing' values marked with '-', and I can also see '-99' values under height and weight, which really can't be possible (or can it? We are talking about super heroes after all!). For now, I will assume these are missing values too. I will not do much more editing to the data and will load it straight away so I can start creating a report based on it, which I will do in the next post.
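
To make the 'missing value' assumption concrete, here is a minimal sketch in Python with pandas of how '-' and -99 could be treated as missing. The sample rows are invented for illustration, and treating -99 as missing is the same assumption made above, not something the data set documents.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same shape as the columns discussed above.
sample_csv = (
    "name,Gender,Height,Weight\n"
    "A-Bomb,Male,203.0,441.0\n"
    "Adam Monroe,Male,-99.0,-99.0\n"
    "Bird-Brain,-,-99.0,-99.0\n"
)

# na_values matches whole cells only, so the hyphen in 'Bird-Brain' is untouched.
heroes = pd.read_csv(io.StringIO(sample_csv), na_values=["-"])

# Assumption: -99 in Height and Weight is a placeholder for "unknown".
measures = ["Height", "Weight"]
heroes[measures] = heroes[measures].mask(heroes[measures] == -99)

# Count how many values are missing in each column.
missing_counts = heroes.isna().sum()
print(missing_counts)
```

Counting the gaps per column like this gives a quick feel for how much of each field can actually be used in a chart.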