How to build a Document Scanner with OpenCV and Python

Ryan Rana
6 min read, Aug 18, 2021
Photo by John Matychuk on Unsplash

Have you ever taken a picture of your math homework or a receipt and had your iPhone save the page as a PDF scan instead of a regular photo? That is an interesting application of computer vision, and it is the tool we will build today with Python. With more and more work happening online, sending a digitized version of a document by email or other means is becoming increasingly common. In other words, we want to turn a photo of any sheet of paper into a clean, scan-like image.

Before we build it, we have to cover the concept behind the project and how it actually works.

Three main steps go into a scanner:

  1. Detect edges and corners.
  2. Create an outline.
  3. Apply a perspective transform.

With that said, let's begin coding!

Import Libraries

The first step is to import all the necessary libraries into our new file.

Pyimagesearch is a blog about computer vision, and its author developed the four_point_transform function that we will use (more on that later).

Skimage (scikit-image) is an image processing library. It is a sibling of Sklearn (scikit-learn), not a part of it, and offers ready-made filters such as thresholding functions.

NumPy is the array library that handles the mathematical side of things. OpenCV represents every image as a NumPy array.

Argparse is used to handle command-line arguments, in our case the path to the image.

Cv2 is the Python module for OpenCV, a library of computer vision tools.

Imutils is a module that contains convenience functions for resizing, rotating, and cropping images.
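The import block itself is not reproduced in this text. As a rough sketch, you could check that the libraries listed above are installed before running the scanner. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the original project, and the pip package names are the usual ones for these modules.

```python
import importlib.util

# Module name -> pip package name (the usual names; adjust for your setup).
REQUIRED = {
    "numpy": "numpy",
    "cv2": "opencv-python",
    "imutils": "imutils",
    "skimage": "scikit-image",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install first:", " ".join(missing))
```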

Setting up an Argument Parser

Now we have to handle parsing our command-line arguments. We'll need only a single switch, --image, which is the path to the image containing the document we want to scan.
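A minimal sketch of such a parser could look like this. The short flag `-i`, the wrapper function `parse_args`, and the file names `scan.py` and `receipt.jpg` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Scan a document from a photo.")
    parser.add_argument(
        "-i", "--image", required=True,
        help="path to the image containing the document to scan",
    )
    return parser.parse_args(argv)


# Example: simulate running `python scan.py --image receipt.jpg`.
args = parse_args(["--image", "receipt.jpg"])
print(args.image)
```

In a real script you would call `parse_args()` with no argument so that the values come from the actual command line.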

Reading our image

That's the setup part; now we can actually begin building our scanner. First we need to load our image with cv2 so that it can be processed. We read the image from the path given as an argument and then resize it to 500 pixels, which speeds up processing and tends to make the edge detection step more reliable.

GrayScale

The next step is to convert our colored image to grayscale. This is a necessary step for a couple of reasons. In many image processing applications, color information does not help us identify significant edges or other features, so a single intensity channel is simpler to work with and loses nothing crucial. The other reason is time. Modern computers can process photos incredibly quickly, but when working with hundreds of images it is preferable to resize and grayscale them to save time.

Even though that's a long explanation, it's a single line of code, because cv2 has a built-in function for this sort of thing.

Blurring

The next step is to blur our image. Image blurring means making the image less sharp or distinct. We are only looking for a rectangular box (the document) in the image, so we don't need the picture to be super clear. Blurring smooths away fine detail and noise that would otherwise show up as false edges.

The method we use is called Gaussian blur, named after the mathematician Carl Friedrich Gauss, whose name comes up a lot in algebra and calculus. Gaussian blur (built on the Gaussian function) is a good choice for reducing noise and gives a clean blur that is pleasant to look at. It is a blurring method used in most photo editing applications.

The Gaussian blur can be done with a single line of code.

Canny

The next step is the actual edge detection. This can be done with the Canny function. Its first stage is the blur, which cancels out noise and small, isolated patches of color. Only the large regions remain, and the boundaries between them are the edges that will be picked out.

The way Canny works is relatively simple. After the Gaussian blurring, the intensity gradient of the image is calculated. To do this, two convolution masks are applied, one for the x-axis and one for the y-axis, and the gradient magnitude at each pixel is computed as G = sqrt(Gx^2 + Gy^2).

This turns the image into a grid of gradient values, which is closer to what a computer really sees. Humans, by contrast, have evolved over millions of years to interpret an image without thinking much about it.

Next, a step called non-maximum suppression is applied. It removes pixels that are not considered part of an edge, so only thin lines remain. Finally, hysteresis thresholding with a lower and an upper threshold decides which of those lines are kept as edges.

All of this happens behind the simple Canny function, which can be called with just one line of code.

To see the output, all we have to do is show it with cv2.imshow, followed by cv2.waitKey(0) to keep the window open.

You should get something that looks like the first figure in the Canny section.

Find Outline

The next step is to locate the outline, which is actually rather straightforward. A document scanner takes a piece of paper and scans it, and a piece of paper is a white rectangle with four corners, so a simple model can get the job done. We'll assume that the largest contour in the image with exactly four points is the piece of paper to be scanned.

First, we find all the contours in the edged image and keep only the five largest by area.

Then each of these five contours is looped over, simplified to a polygon, and its number of points is counted.

If the number of points is 4, the contour is assumed to be the paper and the loop breaks.

Then we show the result by drawing the outline on top of the original image.

Four Point Transform

Our photo is unlikely to be a perfect top-down view. The paper may have been shot at a slight angle, so it does not look like the original document. To correct this, we use the four_point_transform function to get a bird's-eye view.

The four_point_transform function takes the image and the 4 corner points of our paper as input. The width and height of the output image are then calculated from those points, and the image is warped to fit. There is a lot of mathematics behind this step, but it isn't necessary for this tutorial.

A single call to four_point_transform is all that is needed to get our transformed image.

Threshold

To give it a document feel, we apply a threshold, which eliminates shadows, evens out the lighting, and makes the result look like a real scan. A threshold value is chosen for the image, and each pixel is set to either 0 or 255 (the maximum value) depending on whether its intensity falls below or above that value. That way each pixel is either black or white.

To do this, we first convert the warped image to grayscale and then use the OpenCV threshold function.

The Final Step

The final step is to just go back and display the original image and the scanned document.

This gives the following result. Pretty Fantastic!

Conclusion

From here you could do many things to improve your project: you could add photo capture to the program, or build an ML classifier for handwritten text. Good luck with all your coding endeavors!

Source Code: https://github.com/RyanRana/OpenCV-Document-Scanner
