Sooner or later, most developers come across machine learning, deep learning, or neural networks. If you have already heard about these topics, you know that machine learning is a complex field that requires very specific domain knowledge. Nevertheless, machine learning is becoming bigger and more popular every day, and it is used to improve many different types of applications.
For instance, machine learning can be used to predict what type of content a user might like to see in a music app based on the music already in their library, or to automatically tag faces in photos, connecting them to people in the user's contact list. It can even be used to predict costs for certain products or services based on past data. While this seems like magic, the flow for creating machine learning experiences like these can be split roughly into two phases:
- Training a model.
- Using inference to obtain a result from the model.
To perform the first step, large amounts of high-quality data must be collected. If you're going to train a model that should recognize cats, you will need a large number of pictures of cats. You must also collect images that do not contain cats. Each image must then be properly tagged to indicate whether or not it contains a cat.
If your dataset only contains images of cats that face towards the camera, chances are that your model will not be able to recognize cats from a sideways point of view. If your dataset does contain cats from many different sides, but you only collected images for a single breed or with a solid white background, your model might still have a really hard time recognizing all cats. Obtaining quality training data is not easy, yet it's essential.
During the training phase of a model, it is extremely important that you provide a set of inputs of the highest possible quality. The smallest mistake could render your entire dataset worthless. The process of collecting data is one reason that training a model is a tedious task. Another is that training a model takes a lot of time; certain complex models could take a couple of hours to crunch all the data and train themselves.
Trained models come in several types, and each type is good at a different kind of task. For instance, if you are working on a model that can classify certain email messages as spam, your model might be a so-called Support Vector Machine. If you're training a model that recognizes cats in pictures, you are likely training a Neural Network.
Each flavor of model comes with its own pros and cons and each model is created and used differently. Understanding all these different models, their implications, and how to train them is extremely hard and you could likely write a book on each different type of model.
In part, this is why CoreML is so great. CoreML enables you to make use of pre-trained models in your own apps. On top of this, CoreML standardizes the interface that you use in your own code. This means that you can use complex models without even realizing it. Let's learn more about CoreML, shall we?
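To give you a taste of that standardized interface, here is a minimal sketch of running image classification through a pre-trained model. The model name `CatClassifier` is hypothetical; Xcode generates a Swift class like it for any `.mlmodel` file you add to your project, and the Vision framework wraps the model so you never have to deal with its internals:

```swift
import CoreML
import Vision

// CatClassifier is a hypothetical pre-trained model; Xcode generates
// a class with this shape for every .mlmodel file in your project.
func classifyCat(in image: CGImage) {
    // Wrap the generated CoreML model in a Vision request.
    guard let model = try? VNCoreMLModel(for: CatClassifier().model) else {
        return
    }

    let request = VNCoreMLRequest(model: model) { request, _ in
        // Regardless of whether the underlying model is a neural network
        // or something else, results come back in the same standardized
        // VNClassificationObservation form.
        guard let results = request.results as? [VNClassificationObservation],
              let best = results.first else {
            return
        }
        print("\(best.identifier): \(best.confidence)")
    }

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try? handler.perform([request])
}
```

Note that this same handful of lines would work for a spam filter or a price predictor just as well; only the generated model class and the result type change, which is exactly the point of CoreML's standardized interface.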