R pairs chart in Power BI

As a rule, we use Power BI to present our findings in dashboards or reports. But Microsoft Power BI can also be useful at the stage of initial exploratory data analysis.

I found this out when I needed to examine a really wide data table containing hundreds of columns. Usually, I write an R script that creates scatterplot matrices using pairs(). But with so many features, and wanting to browse them in different combinations, that would be rather cumbersome.
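
For reference, here is a minimal sketch of that usual approach with base R's pairs(). The built-in mtcars data set and the chosen column names are stand-ins for illustration, not the table discussed in this post.

```r
# Stand-in data: the built-in mtcars data set (an assumption for illustration).
data(mtcars)

# Hypothetical subset of features to compare. With hundreds of columns,
# this vector has to be edited by hand for every new combination.
features <- c("mpg", "disp", "hp", "wt")

# Draw the scatterplot matrix for the selected columns only.
pairs(mtcars[, features], main = "Scatterplot matrix of selected features")
```

Every new combination of features means editing the vector and re-running the script, which is the tedious part when the table is very wide.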

That is why I created a ggpairs R visual that shows the same chart in Power BI. There are two reasons for that. First, I can quickly select the features to display simply by ticking them in the “Fields” pane in Power BI. Second, Power BI has a lot of data sources, which can be accessed much more easily than in R.
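
A minimal sketch of what such an R visual script could look like follows. It is not the ggpairs.R from this post. It assumes the GGally package is installed, and it relies on Power BI passing the selected fields to the script as a data frame named dataset. The fallback to iris and the fallback to pairs() are only there so that the snippet also runs outside Power BI or without GGally.

```r
# Power BI hands the fields selected for the visual to the script as `dataset`.
# Outside Power BI that object does not exist, so fall back to iris
# (an assumption, for illustration only).
if (!exists("dataset")) {
  dataset <- iris
}

# Drop incomplete rows so every panel is drawn from the same observations.
plot_data <- dataset[complete.cases(dataset), , drop = FALSE]

if (requireNamespace("GGally", quietly = TRUE)) {
  # ggpairs() builds the matrix of pairwise plots; print() renders it.
  p <- GGally::ggpairs(plot_data)
  print(p)
} else {
  # Without GGally, base pairs() gives a plainer scatterplot matrix.
  pairs(plot_data)
}
```

With a script along these lines, changing the set of plotted features needs no code edits: ticking or unticking fields changes the columns of dataset, and the visual is redrawn.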

Of course, Power BI has a few drawbacks. It tries to refresh the chart every time you select or deselect a field, which is annoying. And do not forget about the data size limitation in R visuals: Power BI takes no more than the first 150,000 rows.

The source code of ggpairs.R is available for download.

Got the Microsoft Professional Program Certificate in Data Science!

A year ago I decided to take a course dedicated to machine learning and data science. Microsoft offered the “Microsoft Professional Program for Data Science”, built on massive open online courses (MOOCs) on the edX platform.

The program consists of 4 units comprising 9 courses, plus a final project (see more at https://academy.microsoft.com/en-us/professional-program/data-science/). Some of the units let you choose between different courses. For example, you can choose courses that require knowledge of either R or Python.

I completed the following courses:

Course                                                                    Length
Microsoft – DAT101x: Data Science Orientation                             6 weeks
Microsoft – DAT201x: Querying with Transact-SQL                           6 weeks
Microsoft – DAT207x: Analyzing and Visualizing Data with Power BI         6 weeks
ColumbiaX – DS101X: Statistical Thinking for Data Science and Analytics   5 weeks
Microsoft – DAT204x: Introduction to R for Data Science                   4 weeks
Microsoft – DAT203.1x: Data Science Essentials                            6 weeks
Microsoft – DAT203.2x: Principles of Machine Learning                     6 weeks
Microsoft – DAT209x: Programming with R for Data Science                  6 weeks
Microsoft – DAT203.3x: Applied Machine Learning                           6 weeks
Microsoft Professional Capstone: Data Science                             4 weeks

Each course, including the Capstone Project, costs $99 for a verified certificate. In total, you will pay $990 (10 × $99), unless Microsoft raises the price again, as it has already done twice.

The courses are well structured: some theory presented by a trainer, hands-on demos, quizzes, labs, and exams.

The most enjoyable part for me was the Capstone Project. It is a competition in which you have to predict some values from a given set of data and write a report on your analysis and findings. You can use any techniques you want, but the final score depends on the accuracy of your predictions.

It was a really excellent experience, but I need to catch my breath before I start looking for new courses.