Camera calibration using OpenCV

No camera is ideal, so every camera distorts its images to some degree. Usually this is not important and can even look amusing. But if your image analysis requires precise images, you should take care to calibrate your camera and remove the distortion.

Luckily, OpenCV provides everything we need for that. I won’t repeat the theory and the description of the process – you can easily find it on the OpenCV website. I just created a simple Python application, calibrationlab, for calibrating cameras:


The application was written in Python 3.6 using opencv v3.4.1 and wxpython v4.0.3, so you’ll need to install them first using pip or conda. You will also need to print out a chessboard.docx.

Start the application. If you have several cameras connected to your computer, you can select one using the “Camera” menu. Place the chessboard image in front of your camera and press the “Calibrate” button. Make a few attempts to get a minimal mean error. The resulting calibration parameters can be saved to a file. They contain the camera matrix and the distortion coefficients. Later you can load them from the file without recalibrating your camera again. As an example, I included such a file (camera.json) in the source repository.
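The saved file is essentially just the camera matrix and the distortion coefficients. Here is a minimal sketch of such a save/load round trip; the actual layout of camera.json in the repository may differ, and the numbers below are made up for illustration:

```python
import json

# Hypothetical calibration results; real values would come from
# cv2.calibrateCamera() after detecting chessboard corners.
camera_matrix = [[1000.0, 0.0, 320.0],
                 [0.0, 1000.0, 240.0],
                 [0.0, 0.0, 1.0]]
dist_coeffs = [-0.25, 0.12, 0.001, -0.002, 0.0]

def save_calibration(path, matrix, coeffs):
    """Store the camera matrix and distortion coefficients as JSON."""
    with open(path, 'w') as f:
        json.dump({'camera_matrix': matrix, 'dist_coeffs': coeffs}, f)

def load_calibration(path):
    """Load previously saved calibration parameters."""
    with open(path) as f:
        data = json.load(f)
    return data['camera_matrix'], data['dist_coeffs']

save_calibration('camera.json', camera_matrix, dist_coeffs)
m, d = load_calibration('camera.json')
```

With the parameters loaded this way, you can pass them straight to cv2.undistort() without repeating the calibration procedure.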

When the calibration process is done, you can check the “Undistort” checkbox to apply the calibration parameters. The undistorted image is a bit smaller than the original one; I intentionally left a black border around it. But you can crop the image using the roi (region of interest) parameter returned by the cv2.getOptimalNewCameraMatrix() function.
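The roi value returned by cv2.getOptimalNewCameraMatrix() is a tuple (x, y, w, h), so cropping boils down to simple slicing. A toy sketch, with a nested list standing in for the undistorted image:

```python
def crop_to_roi(image, roi):
    """Crop a row-major image (list of rows) to the (x, y, w, h) region."""
    x, y, w, h = roi
    return [row[x:x + w] for row in image[y:y + h]]

# A 4x6 "image" with a 1-pixel black border (0) around the content (1).
image = [[0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 0],
         [0, 1, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0]]

cropped = crop_to_roi(image, (1, 1, 4, 2))  # -> two rows of four ones
```

With a real NumPy image the same idea is `dst[y:y+h, x:x+w]`.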

If you are using a good camera, you will hardly spot the difference between the original and the undistorted images. Otherwise, you will notice that lines which are straight in the real world become straight in the image as well.

The source code: GitHub download

Who is Data Science Engineer?

Who is a Data Science Engineer, and is there any difference between them and Data Scientists? Your answer to this question is an indicator of the success or failure of your Data Science project.

Software development is a risky business, but the risks rocket up when you give in to your client’s persuasion “to add some AI to the system”. Your next move is to hire Data Scientists and delegate the task to them.

“What are you doing?” the Scrum Masters wonder. “Building models,” they answer. After a few months, you start to realize you cannot control this process. Why?

Because the Data Science process IS NOT a software development process! Any attempt to ignore this fact quickly brings the project to an epic fail.

The Data Science process has its own set of inputs, outputs, roles, deliverables, and process flow. Look at some of them: TDSP, CRISP-DM, KDD, SEMMA. I hope your Project Managers are aware of them. But even if they are, an epic fail is still your main option.

Because there is a gap between these two processes: Data Science and Software Development.

Your developers may know everything about programming, but they hardly know anything about Statistics, Machine Learning, or Data Science (beyond popular articles). Just as Data Scientists know little about building line-of-business applications and, truly speaking, are not too strong in programming. How are you going to cope with that?


Project Manager staring at a Data Science project.

Data Science Engineer is the answer. You need a person

  • with strong programming skills
  • with basic statistical skills
  • with an understanding of the Data Science process and the ability to participate in every stage of it
  • who knows specialized DS languages and tools like R, TensorFlow, and so on.

Data Scientists and developers live in different worlds. A Data Science Engineer lives in both. They are the magic adhesive tape without which your Data Science project will fall apart.

Visualize missing values in R

Data quality is extremely important for analysis, modeling, and predictions. The Microsoft data science utilities for the Team Data Science Process contain a script (IDEAR) which can visualize missing values. You can specify the number of segments to split your data into, in order to calculate the average missing value rate for each segment and visualize it with a levelplot.
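The underlying computation is straightforward: split each column into N consecutive segments and take the share of missing values in every segment. A quick language-neutral illustration in Python (the actual script is in R; missing values are represented here as None):

```python
def missing_rates(values, n_segments):
    """Split a column into n consecutive segments and return the
    fraction of missing entries (None) in each segment."""
    seg_len = len(values) / n_segments
    rates = []
    for i in range(n_segments):
        start, end = round(i * seg_len), round((i + 1) * seg_len)
        segment = values[start:end]
        rates.append(sum(v is None for v in segment) / len(segment))
    return rates

column = [1, None, 3, None, None, 6, 7, 8]  # a column with missing values
rates = missing_rates(column, 2)  # -> [0.5, 0.25]
```

The plot then simply colors each (variable, segment) cell by its rate.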

I refactored this function a bit and replaced the levelplot with ggplot. My plot_missing() function requires two packages: ggplot2 and reshape. The code of the MissingValues.R script can be found here: GitHub download

If you pass a data.frame with some missing values (NA) to the function, you will get a visual distribution of densities of the missing values:



The leftmost column “All” shows average missing value rates for the whole data set by variables.

You can change the number of segments and the color palette. For instance, you can use a palette from the RColorBrewer package:

if (!require(RColorBrewer)) {
  install.packages("RColorBrewer")
  library(RColorBrewer)
}

plot_missing(data, 5, col = brewer.pal(n = 9, name = "Blues"))


If you do not remember the palette names, you can display them with display.brewer.all():


Here is another nice palette:

plot_missing(data, col = brewer.pal(n = 9, name = "YlOrRd"))


The source code: GitHub download

Configuring Machine Learning Services in SQL Server 2017


Operationalizing freshly installed SQL Server 2017 Machine Learning Services can take quite a few hours (or days). I hope this article will help you save some of that time.

I could refer the reader to the official documentation, but I am sure you’ve already been there.

Enable external scripts

The first step is to enable execution of external scripts on the SQL Server instance running Machine Learning Services. Connect to the server with SQL Server Management Studio and execute the following script:

sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

Restart your SQL Server to make this change effective. After restarting, verify that the SQL Server Launchpad service is running:


Now you can check that script execution is working. Open a new Query window, and then run the following command:

EXEC sp_execute_external_script 
       @language =N'R',
       @script=N'OutputDataSet <- InputDataSet;',
       @input_data_1 =N'SELECT 1 AS hello'
WITH RESULT SETS (([hello] int not null));

If the command executes without errors, you will see the result:


Enable implied authentication for the Launchpad

To enable implied authentication for the Launchpad, you should create a new SQL login for the SQLRUserGroup group. This process is well described in the official documentation, but I will repeat the steps here:

  1. In SQL Server Management Studio, in Object Explorer, expand Security, right-click Logins, and select New Login.
  2. In the Login – New dialog box, select Search.
  3. Select the Object Types and Groups checkboxes.
  4. Click Advanced, verify that the location to search is the current computer, and then click Find Now.
  5. Scroll through the list of group accounts on the server until you find one beginning with SQLRUserGroup.
  6. The name of the group that’s associated with the Launchpad service for the default instance is always just SQLRUserGroup. Select this account only for the default instance.
  7. If you are using a named instance, the instance name is appended to the default name, SQLRUserGroup.
  8. Click OK and close dialog boxes.

Give users permission to run external scripts

If a user running R scripts does not have elevated permissions, you must grant that user permission to run external scripts in each database where they will be used:
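For reference, the permission is granted per database with a statement like the one below; MyDatabase and [MyUser] are placeholders for your actual database and user names:

```sql
USE MyDatabase;
GO
GRANT EXECUTE ANY EXTERNAL SCRIPT TO [MyUser];
GO
```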


Modify server resource pools

Under SQL Server default settings, the R runtime will probably have insufficient memory. First, check the resources allocated to the server’s default pool:

SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default'

By default, the max_cpu, max_memory and cap_cpu are set to 100 percent:


Now check the resources allocated to the default external resource pool:

SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default'

By default, the maximum memory that R can use is 20% of the remaining memory:


The following settings will reduce the maximum computer memory that can be used by SQL Server and increase the memory allocated to external processes:

ALTER RESOURCE POOL "default" WITH (max_memory_percent = 60);
ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 40);
ALTER RESOURCE GOVERNOR RECONFIGURE;

The values are just suggested settings to start with.
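To see what those percentages mean in absolute terms, here is a back-of-the-envelope calculation (assuming, purely for illustration, a server with 32 GB of RAM and taking the percentages against total physical memory):

```python
total_gb = 32  # hypothetical amount of server RAM

# Split suggested by the ALTER RESOURCE POOL statements above:
sql_max_gb = total_gb * 60 / 100       # memory ceiling for SQL Server itself
external_max_gb = total_gb * 40 / 100  # ceiling for R and other external runtimes
```

Tune the split toward the external pool if your R workloads are memory-hungry.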

Please remember that SQL Server Standard edition does not support resource governance, i.e. memory distribution between SQL server and R engine:

  • Standard Edition

    Has all the capabilities of in-database analytics included with Enterprise Edition, except for resource governance. Performance and scale is also limited: the data that can be processed must fit in server memory, and processing is limited to a single compute thread, even when using the RevoScaleR functions.


Upgrade R Services to the latest version

First, verify the server’s version of R Services by running this SQL script:

EXEC sp_execute_external_script @language =N'R',
@script=N'myvar <- version$version.string
OutputDataSet <- data.frame(myvar);'

SQL Server 2017 ships with R Services version 3.3.3, which is quite old. But you can upgrade it by installing Microsoft Machine Learning Server 9.2.1 (or upgrading it to this version).

Read the article: Upgrade machine learning components in a SQL Server instance. The article Install Machine Learning Server for Windows has links to the Machine Learning Server installer. For instance, Machine Learning Server (Developer Edition) for Windows setup can be found here.

Start the installer ServerSetup.exe for Machine Learning Server. Be sure to run the installer on the computer that has the instance you want to upgrade. To upgrade an instance, select the checkbox next to the instance name. If you do not select an instance, a separate installation of Machine Learning Server is created, and the SQL Server libraries are unchanged:


Verify the installed R version:

  1. Find the R.exe or Rgui.exe in the folder \R_SERVICES\bin\x64
    For example:
    C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64\Rgui.exe
  2. Double-click Rgui.exe to start the R Console application.

The console will show the current R version:


As an option, you can run the aforementioned SQL script:


Update local Microsoft R Client to the latest version

If you need to upgrade your local Microsoft R Client to the latest version to synchronize it with the server’s version, you can use the links below.

Useful Links

Jupyter Notebook R Kernel for Microsoft R Open

Have you ever tried to add an R kernel to Jupyter Notebook after installing R Tools for Visual Studio? If you have, you know that it is not a piece of cake. I’ll show you how to do it.

The Jupyter Notebook is a web application in which you can create and share documents with code, text, and visualizations. A fresh Jupyter Notebook installation contains only the Python kernel. If you are using the R distribution installed from CRAN, you can easily add it to Jupyter by following the instructions on the page Native R kernel for Jupyter.

But it’s another matter if you installed Microsoft R Open (MRO). It can be set up separately or as part of R Tools for Visual Studio (RTVS). The above-mentioned recipe won’t work for MRO.

First, let’s install some packages to the MRO library folder. On my machine, this is C:\Program Files\Microsoft\R Client\R_SERVER\library

Open the system menu, find Microsoft R Client > RGui, and run it as Administrator. Copy the following code into the R Console and run it:

lib.path <- 'C:/Program Files/Microsoft/R Client/R_SERVER/library'
install.packages('devtools', dependencies = TRUE, lib = lib.path)
install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'uuid', 'digest'), lib = lib.path)
# Install the IRkernel package itself (it is not on CRAN)
devtools::install_github('IRkernel/IRkernel', lib = lib.path)

MRO on my machine is version 3.3.2, so I tried to install the R kernel with this command:

IRkernel::installspec(name = 'ir33', displayname = 'R 3.3.2')

The result will be disappointing:

In fact, the error message gives us a clue: jupyter.exe cannot be found. Find the folder where Anaconda is installed. On my machine, it is C:\Program Files\Anaconda3. Add the folder C:\Program Files\Anaconda3\Scripts to the PATH environment variable of your account:

Close and reopen RGui, then try again:

IRkernel::installspec(name = 'ir33', displayname = 'R 3.3.2')

Success! Now open the system menu and run Anaconda3 > Jupyter Notebook. The R kernel should appear in the list of available kernels:


Now you can remove C:\Program Files\Anaconda3\Scripts from the PATH variable if you don’t need it anymore.

Create a new R notebook and check that it is working now:


Added kernels can be found in the AppData\Roaming\jupyter\kernels folder, for example: C:\Users\Eugene\AppData\Roaming\jupyter\kernels
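Each kernel lives in its own subfolder there with a kernel.json file describing how to launch it. For our MRO kernel it looks roughly like this (the path and version are specific to my machine; treat it as an illustrative sketch):

```json
{
  "argv": ["C:/Program Files/Microsoft/R Client/R_SERVER/bin/x64/R.exe",
           "--slave", "-e", "IRkernel::main()",
           "--args", "{connection_file}"],
  "display_name": "R 3.3.2",
  "language": "R"
}
```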

If you don’t need a kernel anymore, just remove its subfolder from that location.