The next logical step is deploying your solution on a mobile platform (or platforms). Here, you have several considerations:
- Model memory consumption
- Data memory consumption
- Training speed (if you need on-device training)
- Inference speed
- Disk space consumption
- Battery consumption
You can profile all of these using Xcode Instruments.
If your application includes several pre-trained models, for example, neural artistic style filters, you can use on-demand resources to host those models on the App Store and download them only when they are needed, rather than during app installation. The On-Demand Resources Guide explains:
"The resources can be of any type supported by bundles except for executable code."
As of spring 2017, the App Store allows you to store up to 20 GB of on-demand resources. You can also control which resources are purged first when the device runs low on disk space.
You can find more details about this technology and how to adopt it in your application here: https://developer.apple.com/library/content/documentation/FileManagement/Conceptual/On_Demand_Resources_Guide/index.html.
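As a minimal sketch of how adoption might look in code, you request tagged resources with NSBundleResourceRequest and release them when you're done. The tag "StyleModels" and the resource name "StyleA" are purely illustrative; they must match the tags and file names you assign in your Xcode project:

```swift
import CoreML
import Foundation

// Sketch: download a model pack tagged "StyleModels" on demand.
// The tag and resource names are illustrative.
let request = NSBundleResourceRequest(tags: ["StyleModels"])
request.loadingPriority = NSBundleResourceRequestLoadingPriorityUrgent

request.beginAccessingResources { error in
    if let error = error {
        print("Model pack download failed: \(error)")
        return
    }
    // The tagged resources are now accessible through the main bundle.
    if let url = Bundle.main.url(forResource: "StyleA", withExtension: "mlmodelc"),
       let model = try? MLModel(contentsOf: url) {
        // Use the model for inference here.
        _ = model
    }
    // Call request.endAccessingResources() once the model is no longer
    // needed, so the OS knows these resources may be purged.
}
```

Keep the NSBundleResourceRequest object alive for as long as you need the resources; releasing it has the same effect as calling endAccessingResources().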
In the previous two chapters, we discussed model acceleration and compression in more detail.
It is good to make sure in advance that your model is easily portable to mobile platforms. For example, suppose you've decided to train a model with one of the frameworks and then convert it to the Core ML format for iOS deployment. Before training a complex neural network for a week on a GPU server, verify that an untrained network with the same architecture can be converted by coremltools. This way, you will avoid the disappointment of discovering later that coremltools doesn't support one of the layers in your super-cool architecture. Actually, Core ML now supports custom layers, but do you really want to write one if you can replace it with something more traditional? You can call your solution portable only if porting costs much less than rewriting from scratch.
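To give a sense of what writing a custom layer involves, here is a minimal sketch of a class conforming to Core ML's MLCustomLayer protocol. The class name and the swish activation it computes are purely illustrative; every custom layer has to provide at least this much, plus an optional Metal path if you want it to run on the GPU:

```swift
import CoreML
import Foundation

// Illustrative custom layer computing swish: x * sigmoid(x).
// The class name must match the one referenced in the .mlmodel file.
@objc(SwishLayer) class SwishLayer: NSObject, MLCustomLayer {

    required init(parameters: [String : Any]) throws {
        super.init()
        // Read any layer parameters stored in the model file.
    }

    func setWeightData(_ weights: [Data]) throws {
        // Receive the layer's weight blobs, if it has any.
    }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
        // An element-wise layer keeps the input shapes unchanged.
        return inputShapes
    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
        // CPU fallback: x * sigmoid(x), element by element.
        for (input, output) in zip(inputs, outputs) {
            for i in 0..<input.count {
                let x = input[i].doubleValue
                output[i] = NSNumber(value: x / (1.0 + exp(-x)))
            }
        }
    }

    // A Metal-based encode(commandBuffer:inputs:outputs:) can be added
    // for GPU execution; without it the layer always runs on the CPU.
}
```

Even this skeleton is more boilerplate than swapping the exotic layer for a standard one before training, which is usually the cheaper path to portability.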