Chapter 3. Analyzing Videos
In the previous chapter, we looked at several APIs for processing images. This chapter introduces one new API: the Video Indexer API.
In this chapter, we will cover the following topics:
- General overview of Video Indexer
- Guide to Video Indexer using the prebuilt UI
Diving into Video Indexer
Video Indexer is a service that lets you upload videos and extract insights from them. These insights can be used to make videos (and, by extension, your content) more discoverable, and to improve user engagement.
Using artificial intelligence technologies, Video Indexer can extract a great deal of information from a video. Its capabilities include the following:
- Audio transcript, with language detection
- Creation of closed captions
- Noise reduction
- Face tracking and identification
- Speaker indexing
- Visual-text recognition
- Voice-activity detection
- Scene detection
- Keyframe extraction
- Sentiment analysis
- Translation
- Visual-content moderation
- Keyword extraction
- Annotations
- Detection of brands
- Object and action labeling
- Textual-content moderation
- Emotion detection
The following list shows a few typical scenarios where one might want to use Video Indexer:
- Search: If you have a library of videos, you can use the insights gained from Video Indexer to index each video. Indexing by, for example, spoken words or the moments when two specific people appear together can give users a much better search experience.
- Monetization: The insights gained from Video Indexer can increase the value of each video. For example, you can use them to deliver more relevant, contextually appropriate ads, such as showing ads for sports shoes during a football match rather than during a swimming competition.
- User engagement: The insights gained from Video Indexer can also improve user engagement by surfacing the relevant parts of a video. For example, if a 60-minute video covers several different topics, marking the moment where each topic begins lets users jump straight to the section they care about.
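The search and user-engagement scenarios above both come down to mapping an insight to the places in a video where it occurs. The following Python sketch builds a small in-memory inverted index from keyword appearances to (video, timestamp) jump points. The `Appearance` record, `build_index`, and `search` are hypothetical names chosen for illustration, the sample data is made up, and the sketch neither calls Video Indexer nor reproduces its output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Appearance:
    """One occurrence of an insight (hypothetical format), e.g. a spoken keyword."""
    video_id: str
    label: str
    start: float  # seconds from the start of the video


def build_index(appearances):
    """Map each lower-cased label to every (video_id, start) where it appears."""
    index = defaultdict(list)
    for a in appearances:
        index[a.label.lower()].append((a.video_id, a.start))
    for hits in index.values():
        hits.sort()  # order by video, then by time, so results are deterministic
    return dict(index)


def search(index, query):
    """Return (video_id, start) jump points for a query, or [] if nothing matches."""
    return index.get(query.lower(), [])


if __name__ == "__main__":
    library = [
        Appearance("lecture-01", "Regression", 754.0),
        Appearance("lecture-01", "Clustering", 2210.5),
        Appearance("lecture-02", "regression", 95.0),
    ]
    idx = build_index(library)
    for video_id, start in search(idx, "regression"):
        minutes, seconds = divmod(int(start), 60)
        print(f"{video_id} at {minutes:02d}:{seconds:02d}")
```

Running the script prints `lecture-01 at 12:34` and `lecture-02 at 01:35`, which are the points a player could seek to. A production system would more likely hand this job to a search engine, but the idea of keying timestamps by insight stays the same.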
The following sections describe the key concepts that are important to understand when discussing Video Indexer.
Breakdowns
A breakdown is a complete list of all the details of every insight; this is where the full video transcript comes from. However, breakdowns are usually too detailed for users. Instead, you typically want to use summarized insights to obtain only the most relevant information. If more detail is required, you can drill down from the summarized insights to the full breakdown.
Summarized insights
Rather than going through several thousand time ranges and checking each one for the data you need, you can use summarized insights. These provide an aggregated view of the data, such as faces, keywords, and sentiments, along with the time ranges in which they appear.
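To make the difference between a breakdown and summarized insights concrete, the following Python sketch collapses detailed per-instance records of the form (label, start, end) into one entry per label, with merged time ranges and total on-screen time. The tuple format, the `summarize` and `merge_ranges` helpers, and the one-second merge gap are assumptions for illustration, not the schema Video Indexer actually returns.

```python
from collections import defaultdict


def merge_ranges(ranges, gap=0.0):
    """Merge (start, end) ranges that overlap or are separated by at most `gap` seconds."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summarize(breakdown, gap=1.0):
    """Collapse per-instance records into {label: {"ranges": [...], "total": seconds}}."""
    by_label = defaultdict(list)
    for label, start, end in breakdown:
        by_label[label].append((start, end))
    summary = {}
    for label, ranges in by_label.items():
        merged = merge_ranges(ranges, gap)
        summary[label] = {
            "ranges": merged,
            "total": sum(end - start for start, end in merged),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical face-appearance breakdown: (name, start seconds, end seconds).
    breakdown = [
        ("Alice", 0.0, 4.0),
        ("Alice", 4.5, 9.0),
        ("Bob", 2.0, 6.0),
        ("Alice", 30.0, 35.0),
    ]
    for label, info in summarize(breakdown).items():
        print(label, info["ranges"], info["total"])
```

With `gap=1.0`, Alice's appearances at 0-4 s and 4.5-9 s merge into a single range, so her summary reports two ranges and 14 seconds in total instead of three separate instances.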
Keywords
From any transcribed audio in the video, Video Indexer will extract a list of keywords and topics that may be relevant to the video.
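As a rough illustration of pulling keywords out of a transcript, the following Python sketch ranks words by frequency after dropping a small stop-word list. This term-frequency approach is a deliberately simple stand-in chosen for the example; Video Indexer uses its own models for keywords and topics, which this does not reproduce, and `top_keywords` and `STOP_WORDS` are hypothetical names.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "we", "that", "this", "for", "on", "with",
}


def top_keywords(transcript, k=3):
    """Return the k most frequent non-stop-words, breaking ties alphabetically."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    text = (
        "The striker shoots. The goalkeeper saves the shot. "
        "The striker scores a goal, and the striker celebrates the goal."
    )
    print(top_keywords(text, k=2))
```

For the sample transcript, the script prints `['striker', 'goal']`.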
Sentiment
When a video is transcribed, the transcript is also analyzed for sentiment, so you can gauge whether the video is predominantly positive or negative.
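One simple way to turn per-segment sentiment into an overall verdict is a duration-weighted average. The sketch below assumes each transcript segment carries a score between -1 (negative) and 1 (positive); that score range, the `(start, end, score)` tuple format, the `overall_sentiment` function, and the 0.1 neutral band are illustrative assumptions rather than Video Indexer's output.

```python
def overall_sentiment(segments, threshold=0.1):
    """Duration-weighted mean of per-segment scores in [-1, 1], mapped to a label.

    `segments` is a list of (start, end, score) tuples (a hypothetical format).
    Returns (mean_score, label).
    """
    total = sum(end - start for start, end, _ in segments)
    if total <= 0:
        return 0.0, "neutral"
    mean = sum((end - start) * score for start, end, score in segments) / total
    if mean > threshold:
        label = "positive"
    elif mean < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return mean, label


if __name__ == "__main__":
    # Ten seconds of clearly positive speech followed by ten mildly negative seconds.
    print(overall_sentiment([(0, 10, 0.8), (10, 20, -0.2)]))
```

Weighting by duration keeps a brief negative remark from outweighing several minutes of positive discussion; in the example the mean is about 0.3, so the video is labeled positive.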
Blocks
Blocks make it easier to navigate through the data. A change of speaker or a long pause in the audio may cause the content to be indexed as separate blocks.
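The block rule described above can be sketched as a single pass over time-ordered transcript segments that starts a new block whenever the speaker changes or the silence since the previous segment exceeds a threshold. The `Segment` class, `split_into_blocks`, and the two-second `max_pause` default are hypothetical; Video Indexer's actual segmentation criteria are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical format)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def split_into_blocks(segments, max_pause=2.0):
    """Group consecutive segments; start a new block on a speaker change or a long pause."""
    blocks = []
    for seg in sorted(segments, key=lambda s: s.start):
        if blocks:
            prev = blocks[-1][-1]
            same_speaker = seg.speaker == prev.speaker
            short_pause = seg.start - prev.end <= max_pause
            if same_speaker and short_pause:
                blocks[-1].append(seg)  # continue the current block
                continue
        blocks.append([seg])  # speaker changed, pause too long, or first segment
    return blocks


if __name__ == "__main__":
    transcript = [
        Segment("A", 0.0, 3.0, "Welcome back."),
        Segment("A", 3.5, 6.0, "Today we look at videos."),
        Segment("B", 6.2, 9.0, "Thanks for having me."),
        Segment("B", 15.0, 18.0, "Where were we?"),
    ]
    for i, block in enumerate(split_into_blocks(transcript)):
        print(i, block[0].speaker, [s.text for s in block])
```

The sample yields three blocks: speaker A's two utterances together, then speaker B's first line, then B's line after the six-second pause.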