Using a simple speech recognition model in iOS with Swift

In Chapter 2, Classifying Images with Transfer Learning, we created a Swift-based iOS app that uses the TensorFlow pod. Let's now create a new Swift app that uses the TensorFlow iOS libraries we manually built in the last section, and use the speech commands model in it:

  1. Create a new Single View iOS project in Xcode, and set up the project in the same way as steps 1 and 2 of the previous section, except set the Language to Swift.
  2. Select Xcode File | New | File ..., choose Objective-C File, and enter the name RunInference. You'll see a message box asking "Would you like to configure an Objective-C bridging header?" Click Create Bridging Header. Rename the file RunInference.m to RunInference.mm, as we'll mix C, C++, and Objective-C code to do the post-recording audio processing and recognition. We still use Objective-C in the Swift app because, to call the TensorFlow C++ code from Swift, we need an Objective-C class as a wrapper around the C++ code.
  3. Create a header file called RunInference.h, and add this code to it:
@interface RunInference_Wrapper : NSObject
- (NSString *)run_inference_wrapper:(NSString*)recorderFilePath;
@end
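
Once the bridging header created in step 2 includes this header (as described after step 6 below), the RunInference_Wrapper class becomes directly visible to Swift, so you'll be able to call it as shown here (just a preview of the call we'll make in the recorder delegate method; the path here is a placeholder):

let result = RunInference_Wrapper().run_inference_wrapper("/path/to/recorded_file.wav")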

Your app in Xcode should look like Figure 5.5 now:

Figure 5.5 The Swift-based iOS app project
  4. Open ViewController.swift. Add the following code at the top, after import UIKit:
import AVFoundation

let _lbl = UILabel()
let _btn = UIButton(type: .system)
var _recorderFilePath: String!

Then make the ViewController class look like this (the code snippet that defines the NSLayoutConstraint objects for _btn and _lbl and calls addConstraint is not shown; a possible sketch of it follows the snippet below):

class ViewController: UIViewController, AVAudioRecorderDelegate {
var audioRecorder: AVAudioRecorder!
override func viewDidLoad() {
super.viewDidLoad()

_btn.translatesAutoresizingMaskIntoConstraints = false
_btn.titleLabel?.font = UIFont.systemFont(ofSize:32)
_btn.setTitle("Start", for: .normal)
self.view.addSubview(_btn)

_btn.addTarget(self, action:#selector(btnTapped), for: .touchUpInside)

_lbl.translatesAutoresizingMaskIntoConstraints = false
self.view.addSubview(_lbl)
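
The constraint code omitted above could, for example, continue inside viewDidLoad as follows; the exact layout values here are an assumption (a centered button with the label 50 points above it), not necessarily the ones used in the book:

// Hypothetical layout: center the button, place the label 50 points above it.
self.view.addConstraint(NSLayoutConstraint(item: _btn, attribute: .centerX,
    relatedBy: .equal, toItem: self.view, attribute: .centerX, multiplier: 1, constant: 0))
self.view.addConstraint(NSLayoutConstraint(item: _btn, attribute: .centerY,
    relatedBy: .equal, toItem: self.view, attribute: .centerY, multiplier: 1, constant: 0))
self.view.addConstraint(NSLayoutConstraint(item: _lbl, attribute: .centerX,
    relatedBy: .equal, toItem: self.view, attribute: .centerX, multiplier: 1, constant: 0))
self.view.addConstraint(NSLayoutConstraint(item: _lbl, attribute: .bottom,
    relatedBy: .equal, toItem: _btn, attribute: .top, multiplier: 1, constant: -50))
} // end of viewDidLoad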
  5. Add a button tap handler and inside it, first request the user's permission for recording:
@objc func btnTapped() {
_lbl.text = "..."
_btn.setTitle("Listening...", for: .normal)

AVAudioSession.sharedInstance().requestRecordPermission() {
[unowned self] allowed in
if allowed {
print("mic allowed")
} else {
print("denied by user")
return
}
}

Then create an AudioSession instance and set its category to record and its status to active, just as we did in the Objective-C version:

let audioSession = AVAudioSession.sharedInstance()

do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setActive(true)
} catch {
print("recording exception")
return
}

Now define the settings to be used by AVAudioRecorder:

let settings = [
AVFormatIDKey: Int(kAudioFormatLinearPCM),
AVSampleRateKey: 16000,
AVNumberOfChannelsKey: 1,
AVLinearPCMBitDepthKey: 16,
AVLinearPCMIsBigEndianKey: false,
AVLinearPCMIsFloatKey: false,
AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
] as [String : Any]

Set the file path to save the recorded audio, create an AVAudioRecorder instance, set its delegate and start recording for 1 second:

do {
_recorderFilePath = NSHomeDirectory().stringByAppendingPathComponent(path: "tmp").stringByAppendingPathComponent(path: "recorded_file.wav")
audioRecorder = try AVAudioRecorder(url: NSURL.fileURL(withPath: _recorderFilePath), settings: settings)
audioRecorder.delegate = self
audioRecorder.record(forDuration: 1)
} catch let error {
print("error:" + error.localizedDescription)
}
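
Note that stringByAppendingPathComponent(path:) is not a method Swift's String offers out of the box; the path-building line above assumes a small helper extension along these lines (a sketch whose name simply matches what the code above expects, wrapping NSString's appendingPathComponent):

extension String {
    // Append a path component by bridging to NSString, which provides this functionality.
    func stringByAppendingPathComponent(path: String) -> String {
        return (self as NSString).appendingPathComponent(path)
    }
}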
  6. At the end of ViewController.swift, add an AVAudioRecorderDelegate method, audioRecorderDidFinishRecording, with the following implementation, which mainly calls run_inference_wrapper to do audio post-processing and recognition:
func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
_btn.setTitle("Recognizing...", for: .normal)
if flag {
let result = RunInference_Wrapper().run_inference_wrapper(_recorderFilePath)
_lbl.text = result
}
else {
_lbl.text = "Recording error"
}
_btn.setTitle("Start", for: .normal)
}
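
Because run_inference_wrapper performs both audio conversion and model inference, it can block the UI for a noticeable moment. If you'd rather keep the main thread free, one possible variation (not the book's code, just a sketch) is to run the wrapper on a background queue and hop back to the main thread for the UI updates:

func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
    _btn.setTitle("Recognizing...", for: .normal)
    guard flag else {
        _lbl.text = "Recording error"
        _btn.setTitle("Start", for: .normal)
        return
    }
    DispatchQueue.global(qos: .userInitiated).async {
        // Run the Objective-C++ wrapper off the main thread.
        let result = RunInference_Wrapper().run_inference_wrapper(_recorderFilePath)
        DispatchQueue.main.async {
            // UI updates must happen on the main thread.
            _lbl.text = result
            _btn.setTitle("Start", for: .normal)
        }
    }
}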

In the AudioRecognition_Swift-Bridging-Header.h file, add #include "RunInference.h" so that the preceding Swift call, RunInference_Wrapper().run_inference_wrapper(_recorderFilePath), works.

  7. In RunInference.mm, inside the run_inference_wrapper method, copy the code from ViewController.mm in the Objective-C AudioRecognition app, described in steps 5 to 8 of the last section, that converts the saved recorded audio to the format the TensorFlow model accepts and then sends it, along with the sample rate, to the model to get the recognition result:
@implementation RunInference_Wrapper
- (NSString *)run_inference_wrapper:(NSString*)recorderFilePath {
...
}
@end

If you really want to port as much code as possible to Swift, you can replace the C audio file conversion code with Swift (see https://developer.apple.com/documentation/audiotoolbox/extended_audio_file_services for details). There are also some unofficial open source projects that provide Swift wrappers for the official TensorFlow C++ API. But for simplicity, and to strike the right balance, we'll keep the TensorFlow model inference, as well as the audio file reading and conversion in this example, in C++ and Objective-C, working together with the Swift code that controls the UI and audio recording and initiates the call to do audio processing and recognition.
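
If you do want to try the all-Swift route for the file reading, a rough sketch using AVAudioFile (a different API from the Extended Audio File Services linked above, and not the code used in this chapter) that loads the recorded WAV into float samples plus its sample rate might look like this:

import AVFoundation

// Sketch only: read the recorded WAV into float samples and its sample rate in Swift.
func loadRecordedAudio(path: String) -> (samples: [Float], sampleRate: Double)? {
    guard let file = try? AVAudioFile(forReading: URL(fileURLWithPath: path)),
          let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: AVAudioFrameCount(file.length)) else {
        return nil
    }
    do {
        // processingFormat is deinterleaved 32-bit float, so floatChannelData is valid below.
        try file.read(into: buffer)
    } catch {
        print("read error: \(error.localizedDescription)")
        return nil
    }
    guard let channelData = buffer.floatChannelData else { return nil }
    let samples = Array(UnsafeBufferPointer(start: channelData[0],
                                            count: Int(buffer.frameLength)))
    return (samples, file.processingFormat.sampleRate)
}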

That's all it takes to build a Swift iOS app that uses the speech commands recognition model. You can now run it on the iOS simulator or an actual device and see exactly the same results as with the Objective-C version.