Invoice Scanning

I recently worked on invoice scanning for Finances. It lets you scan invoices on iPhone or iPad and attach them to transactions as PDF documents. In this post I will show you how I implemented that feature using the frameworks available on iOS.

Let’s start by looking at the final result. You can see the invoice scanning in the Finances trailer. The user interface looks very similar to the document scanning UI in Apple’s Notes app on iOS 11. That’s not a coincidence. I’ve reimplemented the exact same user interface, because most iOS users are already familiar with it. I also found it an interesting challenge to implement it myself.

Scanning Screen

The scanning screen is the first screen presented to the user. Even though it provides a lot of information, it doesn’t feel overwhelming. On iPhone the user interface elements are placed at the top and bottom of the screen. On iPad the layout differs slightly but follows the same structure.

The scanning screen shows the output of the camera and a preview of scanned invoices, and provides settings for the camera flash, photo filter and auto shutter. The outline of a detected invoice is highlighted with a yellow rectangle.

The bar at the top is a UIToolbar pinned to the top of the screen. There was no need to implement a custom user interface element for this.

All other user interface elements are implemented by view controllers. Those view controllers are managed by a container view controller. By doing it this way I was able to create simple and reusable controllers. I’ve tried to avoid view controllers with hundreds of lines of code. Those massive view controllers become complex very fast, and you will have a hard time maintaining that code. Dave DeLong has a great write-up about the problems of massive view controllers.
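To illustrate the containment pattern described above, here is a minimal sketch. It is not the Finances code: `ContainerViewController`, `embed(_:in:)` and `remove(_:)` are hypothetical names, and the child is simply pinned to the edges of a host view. It uses the Swift 4.2 names `addChild(_:)` and `didMove(toParent:)`; older SDKs call these `addChildViewController(_:)` and `didMove(toParentViewController:)`.

```swift
import UIKit

/// Hypothetical container that embeds child view controllers.
final class ContainerViewController: UIViewController {

    /// Adds `child` and pins its view to the edges of `host`,
    /// which is assumed to be a subview of this controller's view.
    func embed(_ child: UIViewController, in host: UIView) {
        addChild(child)
        child.view.translatesAutoresizingMaskIntoConstraints = false
        host.addSubview(child.view)
        NSLayoutConstraint.activate([
            child.view.topAnchor.constraint(equalTo: host.topAnchor),
            child.view.bottomAnchor.constraint(equalTo: host.bottomAnchor),
            child.view.leadingAnchor.constraint(equalTo: host.leadingAnchor),
            child.view.trailingAnchor.constraint(equalTo: host.trailingAnchor)
        ])
        // Tell the child that the transition into the container is complete.
        child.didMove(toParent: self)
    }

    /// Removes a previously embedded child again.
    func remove(_ child: UIViewController) {
        child.willMove(toParent: nil)
        child.view.removeFromSuperview()
        child.removeFromParent()
    }
}
```

Each child keeps its own small implementation, and the container only decides where the child's view goes.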

Now let’s take a look at the actual implementation.

Camera Output

The camera output view displays the output of an AVCaptureSession in an AVCaptureVideoPreviewLayer.
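A common way to set this up is a view whose backing layer is the preview layer. The following is a sketch under assumptions, not the app's code: `CameraPreviewView`, `CameraError` and `makeSession()` are hypothetical names, and only the back wide-angle camera is considered.

```swift
import AVFoundation
import UIKit

/// Hypothetical view backed by an AVCaptureVideoPreviewLayer.
final class CameraPreviewView: UIView {
    override class var layerClass: AnyClass {
        return AVCaptureVideoPreviewLayer.self
    }

    var previewLayer: AVCaptureVideoPreviewLayer {
        // Safe because layerClass is overridden above.
        return layer as! AVCaptureVideoPreviewLayer
    }
}

/// Hypothetical errors for this sketch.
enum CameraError: Error {
    case noCamera
    case cannotAddInput
}

/// Creates a session that reads from the back camera.
func makeSession() throws -> AVCaptureSession {
    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }
    session.sessionPreset = .photo

    guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                               for: .video,
                                               position: .back) else {
        throw CameraError.noCamera
    }
    let input = try AVCaptureDeviceInput(device: camera)
    guard session.canAddInput(input) else {
        throw CameraError.cannotAddInput
    }
    session.addInput(input)
    return session
}
```

To show the camera, assign the session to `previewLayer.session`, set `previewLayer.videoGravity` (for example `.resizeAspectFill`) and call `startRunning()` on a background queue, since that call blocks. The app also needs an `NSCameraUsageDescription` entry in its Info.plist.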

Invoice Outline

The outline of an invoice is detected in an image by using a VNDetectRectanglesRequest, which is provided by the Vision framework on iOS 11. Performing the request can be done with just a few lines of code.

import Vision

// let image: CVImageBuffer = …

let request = VNDetectRectanglesRequest { request, error in
    // results is nil when the request failed; treat that as "nothing detected".
    let observations = request.results as? [VNRectangleObservation] ?? []
    // ...
}

let requestHandler = VNSequenceRequestHandler()
// perform(_:on:) throws, so it has to be called with try.
try requestHandler.perform([request], on: image)
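The corners of a VNRectangleObservation are normalized to the range 0 to 1 with the origin in the lower-left corner, while UIKit puts the origin in the upper-left. To draw the yellow outline, the points have to be converted. This is a small sketch with a hypothetical helper name; it assumes the image fills the view exactly and ignores orientation and aspect-fill cropping.

```swift
import CoreGraphics

/// Converts a normalized Vision point (origin lower-left) into
/// view coordinates (origin upper-left) for a view of the given size.
/// Assumption: the analyzed image covers the whole view without cropping.
func viewPoint(fromNormalized point: CGPoint, in size: CGSize) -> CGPoint {
    return CGPoint(x: point.x * size.width,
                   y: (1 - point.y) * size.height)
}
```

Applying this to `topLeft`, `topRight`, `bottomRight` and `bottomLeft` of an observation gives the four points of the outline path.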

Info Message

This view controller positions a UILabel centered in a rounded view. The view controller provides one method to show and hide the text, optionally with a fade animation.
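A controller like this can stay very small. The sketch below is one possible shape, not the actual implementation: the class name, the method `setMessage(_:animated:)`, the colors and the animation duration are all assumptions.

```swift
import UIKit

/// Hypothetical controller that shows a short message in a rounded view.
final class InfoMessageViewController: UIViewController {
    private let label = UILabel()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = UIColor.black.withAlphaComponent(0.6)
        view.layer.cornerRadius = 8
        view.alpha = 0 // hidden until a message is set

        label.textColor = .white
        label.textAlignment = .center
        label.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(label)
        NSLayoutConstraint.activate([
            label.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            label.centerYAnchor.constraint(equalTo: view.centerYAnchor)
        ])
    }

    /// Shows `text`, or hides the view when `text` is nil.
    func setMessage(_ text: String?, animated: Bool) {
        loadViewIfNeeded()
        if let text = text {
            label.text = text
        }
        let changes = { self.view.alpha = (text == nil) ? 0 : 1 }
        if animated {
            UIView.animate(withDuration: 0.25, animations: changes)
        } else {
            changes()
        }
    }
}
```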

Shutter Release

The shutter release button consists of two views – the inner circle and outer ring. This view controller handles touch events and animates the inner circle accordingly.
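As an illustration of the two-view structure, here is a sketch that uses a UIControl subclass instead of a view controller, which is a simplification on my part. `ShutterButton`, the inset, the scale factor and the duration are assumed values.

```swift
import UIKit

/// Hypothetical shutter button: an outer ring with an inner circle
/// that shrinks slightly while the button is pressed.
final class ShutterButton: UIControl {
    private let innerCircle = UIView()

    override init(frame: CGRect) {
        super.init(frame: frame)
        setUp()
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        setUp()
    }

    private func setUp() {
        // The control's own layer draws the outer ring.
        layer.borderColor = UIColor.white.cgColor
        layer.borderWidth = 4
        innerCircle.backgroundColor = .white
        innerCircle.isUserInteractionEnabled = false
        addSubview(innerCircle)
    }

    override func layoutSubviews() {
        super.layoutSubviews()
        layer.cornerRadius = bounds.width / 2
        // Use bounds and center, because the transform may not be identity.
        let inset: CGFloat = 8
        innerCircle.bounds = CGRect(x: 0, y: 0,
                                    width: bounds.width - 2 * inset,
                                    height: bounds.height - 2 * inset)
        innerCircle.center = CGPoint(x: bounds.midX, y: bounds.midY)
        innerCircle.layer.cornerRadius = innerCircle.bounds.width / 2
    }

    override var isHighlighted: Bool {
        didSet {
            let scale: CGFloat = isHighlighted ? 0.85 : 1
            UIView.animate(withDuration: 0.15) {
                self.innerCircle.transform = CGAffineTransform(scaleX: scale, y: scale)
            }
        }
    }
}
```

UIControl updates `isHighlighted` while a touch is tracked, so the animation follows the finger without extra touch handling. A target for `.touchUpInside` can then trigger the photo.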

Photo Preview

The photo preview holds a preview of all scanned invoices. The previews are displayed in a UICollectionView. I’m using a subclass of UICollectionViewFlowLayout to lay out the cells. As you can see from the following videos, the custom flow layout stacks the cells once there is no more space left between them.

A preview of the scanned invoice is presented to give instant feedback to the user. The previews are then stacked on one another if there is no more space between them.
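The stacking behavior boils down to one calculation: use the normal spacing while everything fits, and shrink the step between cells once it doesn't. The function below is a sketch of that idea with a hypothetical name and parameters; it is not taken from the actual layout.

```swift
import CoreGraphics

/// Returns the x origin of each item in a horizontal row.
/// Items keep `spacing` between them while they fit into `availableWidth`;
/// after that the step shrinks so that the items overlap and stack up.
func stackedOrigins(count: Int,
                    itemWidth: CGFloat,
                    spacing: CGFloat,
                    availableWidth: CGFloat) -> [CGFloat] {
    guard count > 0 else { return [] }
    guard count > 1 else { return [0] }

    let naturalStep = itemWidth + spacing
    // Largest step that still keeps the last item inside the available width.
    let fittingStep = (availableWidth - itemWidth) / CGFloat(count - 1)
    let step = max(0, min(naturalStep, fittingStep))

    return (0..<count).map { CGFloat($0) * step }
}
```

In a UICollectionViewFlowLayout subclass, these origins could be applied to the layout attributes returned from `layoutAttributesForElements(in:)`, together with an increasing `zIndex` so that newer previews sit on top.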

If a photo is recorded, it is presented with a 3D animation. The animation is done by calculating a transformation matrix and applying the CATransform3D to an image view. The transformation is based on the invoice outline which is detected using a VNDetectRectanglesRequest. A similar animation is used when editing a photo.
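The transformation matrix in question is a homography that maps a rectangle onto a quadrilateral. Here is a sketch of the standard square-to-quadrilateral solution in plain Swift. The types `Point` and `Homography` and the function name are hypothetical, and coordinates are assumed to have the origin in the upper-left corner.

```swift
/// A 2D point; plain Doubles keep the sketch independent of UIKit.
struct Point {
    var x: Double
    var y: Double
}

/// Projective transform with
/// x' = (a*x + b*y + c) / w, y' = (d*x + e*y + f) / w, w = g*x + h*y + 1.
struct Homography {
    var a, b, c, d, e, f, g, h: Double

    func apply(to p: Point) -> Point {
        let w = g * p.x + h * p.y + 1
        return Point(x: (a * p.x + b * p.y + c) / w,
                     y: (d * p.x + e * p.y + f) / w)
    }
}

/// Maps the rectangle (0, 0, width, height) onto the given corners.
/// Returns nil for degenerate input, such as collinear corners.
func homography(width: Double, height: Double,
                topLeft p0: Point, topRight p1: Point,
                bottomRight p2: Point, bottomLeft p3: Point) -> Homography? {
    guard width > 0, height > 0 else { return nil }

    let dx1 = p1.x - p2.x, dy1 = p1.y - p2.y
    let dx2 = p3.x - p2.x, dy2 = p3.y - p2.y
    let dx3 = p0.x - p1.x + p2.x - p3.x
    let dy3 = p0.y - p1.y + p2.y - p3.y
    let det = dx1 * dy2 - dx2 * dy1
    guard det != 0 else { return nil }

    // Perspective terms for the unit square, solved with Cramer's rule.
    let g = (dx3 * dy2 - dx2 * dy3) / det
    let h = (dx1 * dy3 - dx3 * dy1) / det

    // Dividing by width and height rescales the unit-square solution
    // to a rectangle of the given size.
    return Homography(
        a: (p1.x - p0.x + g * p1.x) / width,
        b: (p3.x - p0.x + h * p3.x) / height,
        c: p0.x,
        d: (p1.y - p0.y + g * p1.y) / width,
        e: (p3.y - p0.y + h * p3.y) / height,
        f: p0.y,
        g: g / width,
        h: h / height)
}
```

To use the result with Core Animation, the values would go into a CATransform3D as m11 = a, m12 = d, m14 = g, m21 = b, m22 = e, m24 = h, m41 = c, m42 = f, with m33 and m44 set to 1. Keep in mind that a layer applies its transform relative to its anchor point, so the anchor point has to be moved to the rectangle's origin or compensated for.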

Container View Controller

All view controllers are managed by a container view controller. This view controller is responsible for laying out the child view controllers and mediating between them. For example, when a photo is taken by tapping the shutter release, the container view controller gets notified. It then asks another view controller to record a photo using the AVCaptureSession. The photo is then rectified based on the invoice outline, presented to the user with a 3D animation and then moved to the photo preview stack.
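The mediation can be expressed with a few small protocols, so that the children never talk to each other directly. This is only a sketch of the flow described above: `Photo`, `ShutterDelegate`, `PhotoRecording`, `PhotoPreviewing` and `ScanCoordinator` are hypothetical names, and rectification and animation are left out.

```swift
/// Hypothetical stand-in for a recorded photo.
struct Photo: Equatable {
    let id: Int
}

/// Implemented by whoever wants to know about shutter taps.
protocol ShutterDelegate: AnyObject {
    func shutterWasReleased()
}

/// Implemented by the controller that owns the capture session.
protocol PhotoRecording {
    func recordPhoto(completion: @escaping (Photo) -> Void)
}

/// Implemented by the controller that shows the preview stack.
protocol PhotoPreviewing {
    func add(_ photo: Photo)
}

/// Plays the role of the container: it receives the shutter event,
/// asks the recorder for a photo and hands the result to the preview.
final class ScanCoordinator: ShutterDelegate {
    private let recorder: PhotoRecording
    private let preview: PhotoPreviewing

    init(recorder: PhotoRecording, preview: PhotoPreviewing) {
        self.recorder = recorder
        self.preview = preview
    }

    func shutterWasReleased() {
        recorder.recordPhoto { [preview] photo in
            // Rectifying the photo and the 3D animation would happen here.
            preview.add(photo)
        }
    }
}
```

Because the children only know their own protocol, each one can be tested or reused without the others.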

Photo Editing

Once a photo is taken the invoice outline can be edited by the user. The photo editing screen lets the user change the corner position of the invoice outline. The outline rectangle is a quadrilateral. Based on the outline, the invoice is rectified using the CIPerspectiveCorrection filter.

import CoreImage

// let image: CIImage = …
// let quadrilateral: Quadrilateral = …

// CIPerspectiveCorrection takes its corners as CIVector values; this
// assumes the Quadrilateral properties are already CIVectors.
let parameters: [String: Any] = [
    "inputTopLeft": quadrilateral.topLeft,
    "inputTopRight": quadrilateral.topRight,
    "inputBottomRight": quadrilateral.bottomRight,
    "inputBottomLeft": quadrilateral.bottomLeft,
]

let rectified = image.applyingFilter("CIPerspectiveCorrection", parameters: parameters)

The quadrilateral is also used to transform the layer to get the 3D animations, as you can see in the following video. I love this animation. It gives you a sense of what is going on when an invoice is cut out from the photo and rectified.

The invoice outline corners can be changed by dragging a corner to the right position.
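Dragging a corner needs two pieces: finding the corner closest to the touch and updating it while the finger moves. A minimal sketch, assuming a hypothetical `EditableQuadrilateral` type with CGPoint corners (the real Quadrilateral type may look different):

```swift
import CoreGraphics

/// Hypothetical outline model with four editable corners.
struct EditableQuadrilateral {
    enum Corner: CaseIterable {
        case topLeft, topRight, bottomRight, bottomLeft
    }

    var topLeft, topRight, bottomRight, bottomLeft: CGPoint

    subscript(corner: Corner) -> CGPoint {
        get {
            switch corner {
            case .topLeft: return topLeft
            case .topRight: return topRight
            case .bottomRight: return bottomRight
            case .bottomLeft: return bottomLeft
            }
        }
        set {
            switch corner {
            case .topLeft: topLeft = newValue
            case .topRight: topRight = newValue
            case .bottomRight: bottomRight = newValue
            case .bottomLeft: bottomLeft = newValue
            }
        }
    }

    /// Returns the corner closest to `point`, or nil if no corner
    /// lies within `radius` of it.
    func nearestCorner(to point: CGPoint, within radius: CGFloat) -> Corner? {
        func squaredDistance(_ corner: Corner) -> CGFloat {
            let p = self[corner]
            let dx = p.x - point.x
            let dy = p.y - point.y
            return dx * dx + dy * dy
        }
        guard let nearest = Corner.allCases.min(by: {
            squaredDistance($0) < squaredDistance($1)
        }), squaredDistance(nearest) <= radius * radius else {
            return nil
        }
        return nearest
    }
}
```

In a pan gesture handler, `nearestCorner(to:within:)` would pick the corner when the gesture begins, and the subscript setter would move it on every change.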


Apple’s Notes app on iOS 11 has such a good user interface that I had to implement it myself. It has a lot of nice little touches and animations. I’ve split up every UI element into its own view controller to create independent components. I’ve tried to avoid massive view controllers as much as possible and I ended up with view controllers with simple implementations and clear interfaces.

The Vision framework on iOS is used to detect the invoice outline. Rectifying the image is done by the Core Image framework.

Overall I’m really happy with how this turned out. You can try it out yourself in Finances for iOS.

Last updated 2018-04-05

© Matthias Hochgatterer