The next logical step is deploying your solution on a mobile platform (or platforms). Here, you have several considerations:
- Model memory consumption
- Data memory consumption
- Training speed (if you need on-device training)
- Inference speed
- Disk space consumption
- Battery consumption
You can profile all of these using Xcode Instruments.
If your application includes several pre-trained models, for example, neural artistic style filters, you can use on-demand resources to host those models on the App Store and download them only when they are needed, rather than during app installation. The On-Demand Resources Guide explains:
"The resources can be of any type supported by bundles except for executable code."
As of spring 2017, the App Store allows you to store up to 20 GB of on-demand resources. You can also control which resources are purged first when the device runs low on disk space.
You can find more details about this technology and how to adopt it in your application here: https://developer.apple.com/library/content/documentation/FileManagement/Conceptual/On_Demand_Resources_Guide/index.html.
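As a minimal sketch of how adoption might look in code, you request tagged resources with NSBundleResourceRequest and release them when you're done. The tag "StyleModels" and the resource name "StyleA" are purely illustrative; they must match the tags and file names you assign in your Xcode project:

```swift
import CoreML
import Foundation

// Sketch: download a model pack tagged "StyleModels" on demand.
// The tag and resource names are illustrative.
let request = NSBundleResourceRequest(tags: ["StyleModels"])
request.loadingPriority = NSBundleResourceRequestLoadingPriorityUrgent

request.beginAccessingResources { error in
    if let error = error {
        print("Model pack download failed: \(error)")
        return
    }
    // The tagged resources are now accessible through the main bundle.
    if let url = Bundle.main.url(forResource: "StyleA", withExtension: "mlmodelc"),
       let model = try? MLModel(contentsOf: url) {
        // Use the model for inference here.
        _ = model
    }
    // Call request.endAccessingResources() once the model is no longer
    // needed, so the OS knows these resources may be purged.
}
```

Keep the NSBundleResourceRequest object alive for as long as you need the resources; releasing it has the same effect as calling endAccessingResources().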
In the previous two chapters, we discussed model acceleration and compression in more detail.
It is good to make sure in advance that your model is easily portable to mobile platforms. For example, suppose you've decided to train a model with one of the frameworks and then convert it to the Core ML format for iOS deployment. Before training a complex neural network for a week on a GPU server, verify that an untrained network with the same architecture can be converted by coremltools. This way, you will avoid the disappointment of discovering later that coremltools doesn't support one of the layers in your super-cool architecture. Actually, Core ML now supports custom layers, but do you really want to write one if you can replace it with something more traditional? You can call your solution portable only if porting costs much less than rewriting from scratch.
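To give a sense of what writing a custom layer involves, here is a minimal sketch of a class conforming to Core ML's MLCustomLayer protocol. The class name and the swish activation it computes are purely illustrative; every custom layer has to provide at least this much, plus an optional Metal path if you want it to run on the GPU:

```swift
import CoreML
import Foundation

// Illustrative custom layer computing swish: x * sigmoid(x).
// The class name must match the one referenced in the .mlmodel file.
@objc(SwishLayer) class SwishLayer: NSObject, MLCustomLayer {

    required init(parameters: [String : Any]) throws {
        super.init()
        // Read any layer parameters stored in the model file.
    }

    func setWeightData(_ weights: [Data]) throws {
        // Receive the layer's weight blobs, if it has any.
    }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
        // An element-wise layer keeps the input shapes unchanged.
        return inputShapes
    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
        // CPU fallback: x * sigmoid(x), element by element.
        for (input, output) in zip(inputs, outputs) {
            for i in 0..<input.count {
                let x = input[i].doubleValue
                output[i] = NSNumber(value: x / (1.0 + exp(-x)))
            }
        }
    }

    // A Metal-based encode(commandBuffer:inputs:outputs:) can be added
    // for GPU execution; without it the layer always runs on the CPU.
}
```

Even this skeleton is more boilerplate than swapping the exotic layer for a standard one before training, which is usually the cheaper path to portability.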