Verifying a person through speech

The process of verifying if a person is who they claim to be is quite similar to the identification process. To show how it is done, we will create a new example project, as we do not need this functionality in our smart-house application.

Add the Microsoft.ProjectOxford.SpeakerRecognition and NAudio NuGet packages to the project. We will need the Recording class that we used earlier, so copy this from the smart-house application's Model folder.

Open the MainView.xaml file. We need a few elements in the UI for the example to work. Add a Button element to add speaker profiles. Add two ListBox elements: one will hold the available verification phrases, while the other will list our speaker profiles.

Add Button elements for deleting a profile, starting and stopping enrollment recording, resetting enrollment, and starting/stopping verification recording.

In the ViewModel, you will need to add two ObservableCollection properties: one of type string, the other of type Guid. One will contain the available verification phrases, while the other will contain the list of speaker profiles. You will also need a property for the selected speaker profile, and we also want a string property to show the status.

The ViewModel will also need seven ICommand properties, one for each of our buttons.
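The properties described above might be declared along the following lines. This is a minimal sketch rather than the project's actual ViewModel: the class name MainViewModel and the member names VerificationPhrases, SpeakerProfiles, SelectedSpeakerProfile, and StatusText are assumptions chosen for illustration, and the seven ICommand properties are left out for brevity.

```csharp
using System;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical ViewModel skeleton; all member names are illustrative.
public class MainViewModel : INotifyPropertyChanged
{
    private Guid _selectedSpeakerProfile;
    private string _statusText = string.Empty;

    public event PropertyChangedEventHandler PropertyChanged;

    // Phrases the user may speak during enrollment and verification.
    public ObservableCollection<string> VerificationPhrases { get; } =
        new ObservableCollection<string>();

    // IDs of the speaker profiles created so far.
    public ObservableCollection<Guid> SpeakerProfiles { get; } =
        new ObservableCollection<Guid>();

    // The profile currently selected in the UI.
    public Guid SelectedSpeakerProfile
    {
        get { return _selectedSpeakerProfile; }
        set { _selectedSpeakerProfile = value; RaisePropertyChanged(); }
    }

    // Status message shown to the user.
    public string StatusText
    {
        get { return _statusText; }
        set { _statusText = value; RaisePropertyChanged(); }
    }

    // Notifies the bound view that a property value has changed.
    private void RaisePropertyChanged([CallerMemberName] string propertyName = null)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

The two collections are get-only because the bound ListBox elements only need to observe items being added and removed, not the collection instance being replaced.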

Create a new class in the Model folder and call this SpeakerVerification. Add two new classes beneath this one, in the same file.

The first one holds the event arguments that we will pass on when we raise a status update event. The VerifiedProfile property will, if set, hold the verification result, which we will see presently:

    public class SpeakerVerificationStatusUpdateEventArgs : EventArgs
    {
        public string Status { get; private set; }
        public string Message { get; private set; }
        public Verification VerifiedProfile { get; set; }

        public SpeakerVerificationStatusUpdateEventArgs(string status, string message)
        {
            Status = status;
            Message = message;
        }
    }

The next class is a generic event argument, which is used when we raise an error event:

    public class SpeakerVerificationErrorEventArgs : EventArgs
    {
        public string ErrorMessage { get; private set; }

        public SpeakerVerificationErrorEventArgs(string errorMessage)
        {
            ErrorMessage = errorMessage;
        }
    }

In SpeakerVerification itself, add the following events:

    public event EventHandler<SpeakerVerificationStatusUpdateEventArgs> OnSpeakerVerificationStatusUpdated;

    public event EventHandler<SpeakerVerificationErrorEventArgs> OnSpeakerVerificationError;

For our convenience, add helper functions to raise these. Call them RaiseOnVerificationStatusUpdate and RaiseOnVerificationError. Raise the correct event in each of them.

We also need to add a private member of type ISpeakerVerificationServiceClient, named _speakerVerificationClient. This will be in charge of calling the API. We inject it through the constructor.
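Put together, the skeleton of the class might look roughly like the following. This is a sketch under assumptions: it relies on the two event argument classes shown earlier and on the Microsoft.ProjectOxford.SpeakerRecognition package, the namespace in the using directive is assumed, and the constructor parameter name is illustrative.

```csharp
using System;
using Microsoft.ProjectOxford.SpeakerRecognition; // assumed namespace of the client interface

public class SpeakerVerification
{
    // Client in charge of calling the verification API, injected via the constructor.
    private readonly ISpeakerVerificationServiceClient _speakerVerificationClient;

    public event EventHandler<SpeakerVerificationStatusUpdateEventArgs> OnSpeakerVerificationStatusUpdated;
    public event EventHandler<SpeakerVerificationErrorEventArgs> OnSpeakerVerificationError;

    public SpeakerVerification(ISpeakerVerificationServiceClient speakerVerificationClient)
    {
        _speakerVerificationClient = speakerVerificationClient;
    }

    // Raises the status update event if anyone has subscribed.
    private void RaiseOnVerificationStatusUpdate(SpeakerVerificationStatusUpdateEventArgs args)
    {
        OnSpeakerVerificationStatusUpdated?.Invoke(this, args);
    }

    // Raises the error event if anyone has subscribed.
    private void RaiseOnVerificationError(SpeakerVerificationErrorEventArgs args)
    {
        OnSpeakerVerificationError?.Invoke(this, args);
    }
}
```

The null-conditional Invoke call avoids a NullReferenceException when no handler is attached to the event.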

Add the following functions to the class:

CreateSpeakerProfile: an async function with no parameters that returns Task<Guid>
ListSpeakerProfile: an async function with no parameters that returns Task<List<Guid>>
DeleteSpeakerProfile: an async function that takes a Guid as a required parameter and returns no value
ResetEnrollments: an async function that takes a Guid as a required parameter and returns no value

The contents of these functions can be copied from the corresponding functions in the smart-house application, as they are exactly the same. The only difference is that you need to change the API call from _speakerIdentificationClient to _speakerVerificationClient. Also, raising the events will require the newly created event arguments.

Next, we need a function to list verification phrases. These are phrases that are supported for use with verification. When enrolling a profile, you are required to say one of the sentences in this list.

Create a function named GetVerificationPhrase. Have it return Task<List<string>>, and mark it as async:

    public async Task<List<string>> GetVerificationPhrase()
    {
        try
        {
            List<string> phrases = new List<string>();

            VerificationPhrase[] results = await _speakerVerificationClient.GetPhrasesAsync("en-US");

We will make a call to GetPhrasesAsync, specifying the language we want the phrases to be in. At the time of writing, English is the only possible choice.

If this call is successful, we will get an array of VerificationPhrase objects in return. Each element in this array contains a phrase as a string:

            foreach(VerificationPhrase phrase in results) {
                phrases.Add(phrase.Phrase);
            }
            return phrases;
        }

We loop through the array and add the phrases to our list, which we will return to the caller. Add a corresponding catch clause to finish up the function.

So, we have created a profile and we have the list of possible verification phrases. Now, we need to do the enrollment. To enroll, the service requires at least three enrollments from each speaker. This means that you choose a phrase and enroll it at least three times.

When you do the enrollment, it is highly recommended to use the same recording device that you will use for verification.

Create a new function called CreateSpeakerEnrollment. This should require a Stream and a Guid. The first parameter is the audio to use for enrollment. The latter is the ID of the profile we are enrolling. The function should be marked as async, and have no return value:

    public async void CreateSpeakerEnrollment(Stream audioStream, Guid profileId) {
        try {
            Enrollment enrollmentStatus = await _speakerVerificationClient.EnrollAsync(audioStream, profileId);

When we call EnrollAsync, we pass on the audioStream and profileId parameters. If the call is successful, we get an Enrollment object back. This contains the current status of enrollment and specifies the number of enrollments you need to add before completing the process.

If enrollmentStatus is null, we raise the error event to notify any subscribers and exit the function. If we do have status data, we raise the status update event to notify subscribers, specifying the current status:

            if (enrollmentStatus == null) {
                RaiseOnVerificationError(new SpeakerVerificationErrorEventArgs("Failed to start enrollment process."));
                return;
            }

            RaiseOnVerificationStatusUpdate(new SpeakerVerificationStatusUpdateEventArgs("Succeeded", $"Enrollment status:{enrollmentStatus.EnrollmentStatus}"));
        }

Add the corresponding catch clause to finish up the function.

The last function we need in this class is a function for verification. To verify a speaker, you need to send in an audio file. This file must be at least 1 second and at most 15 seconds long. You will need to record the same phrase that you used for enrollment.
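The length limits mentioned above could be checked locally before the audio is sent to the service. The following is a minimal sketch, assuming the recording is a seekable stream in WAV format and that the NAudio package is referenced; AudioLengthCheck and HasValidVerificationLength are hypothetical names and not part of the example project.

```csharp
using System;
using System.IO;
using NAudio.Wave;

public static class AudioLengthCheck
{
    // Hypothetical helper: returns true when the WAV audio in the stream
    // is between 1 and 15 seconds long (inclusive).
    public static bool HasValidVerificationLength(Stream wavStream)
    {
        // Remember where the stream was so the caller can still read it afterwards.
        long start = wavStream.Position;

        // WaveFileReader parses the WAV header and exposes the total duration.
        var reader = new WaveFileReader(wavStream);
        TimeSpan duration = reader.TotalTime;

        // Rewind so the same stream can be passed on to the API call.
        wavStream.Position = start;

        return duration >= TimeSpan.FromSeconds(1)
            && duration <= TimeSpan.FromSeconds(15);
    }
}
```

Running a check like this first gives the user immediate feedback instead of waiting for the service to reject the recording.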

Call the function VerifySpeaker and have it take a Stream and a Guid. The stream is the audio file we will use for verification. The Guid is the ID of the profile we wish to verify. The function should be async and have no return type:

    public async void VerifySpeaker(Stream audioStream, Guid speakerProfile) {
        try {
            Verification verification = await _speakerVerificationClient.VerifyAsync(audioStream, speakerProfile);

We will make a call to VerifyAsync from _speakerVerificationClient. The required parameters are audioStream and speakerProfile.

A successful API call will result in a Verification object in response. This will contain the verification results, as well as the confidence of the results being correct:

            if (verification == null) {
                RaiseOnVerificationError(new SpeakerVerificationErrorEventArgs("Failed to verify speaker."));
                return;
            }

            RaiseOnVerificationStatusUpdate(
                new SpeakerVerificationStatusUpdateEventArgs("Verified", "Verified speaker") { VerifiedProfile = verification });
        }

If we do have a verification result, we raise the status update event. Add the corresponding catch clause to complete the function.

Back in the ViewModel, we need to wire up the commands and event handlers. This is done in a similar manner as for speaker identification, and as such we will not cover the code in detail.
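As one illustration of the event handling, a status update handler could turn the event arguments into the text shown in the UI. This sketch assumes that the Verification type exposes Result, Confidence, and Phrase properties and that it lives in the namespace given in the using directive; VerificationStatusFormatter and Format are hypothetical names.

```csharp
using Microsoft.ProjectOxford.SpeakerRecognition.Contract.Verification; // assumed namespace of Verification

public static class VerificationStatusFormatter
{
    // Hypothetical helper: builds the status text for a status update event.
    public static string Format(SpeakerVerificationStatusUpdateEventArgs e)
    {
        // Enrollment updates carry no verification result, so show the message only.
        if (e.VerifiedProfile == null)
        {
            return $"{e.Status}: {e.Message}";
        }

        // Result, Confidence, and Phrase are assumed members of Verification.
        Verification verification = e.VerifiedProfile;
        return $"{e.Message}. Result: {verification.Result}, " +
               $"confidence: {verification.Confidence}, phrase: {verification.Phrase}";
    }
}
```

The ViewModel's handler for OnSpeakerVerificationStatusUpdated could then assign the returned string to its status property.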

With the code compiling and running, the result may look similar to the following screenshot:

Here, we can see that we have created a speaker profile. We have also completed the enrollment and are ready to verify the speaker.

Verifying the speaker profile may result in the following:

As you can see, the verification was accepted with high confidence.

If we try to verify using a different phrase, or let someone else try to verify against this speaker profile, we may end up with the following result:

Here, we can see that the verification has been rejected.