A big trend in apps these days is that they are much more intelligent than they used to be, and users expect more and more from them as a result. Usually, the way to deliver this is to create machine learning or deep learning models yourself and deploy them within your applications. But now, Microsoft can do the heavy lifting for you with its suite of Cognitive Services.
These services can do everything from sentiment analysis on text to face and emotion detection. In this post, we'll go over each service so you can see what is available and decide which one is best for your application.
Cognitive Services are divided up into several areas:
- Vision
- Speech
- Knowledge
- Search
- Language
There's also a separate experimental area, called Cognitive Services Labs, where Microsoft puts newer services out for people to try and give feedback on so it can improve them. Let's take a deeper dive into each of these services and what they offer.
Vision
Computer Vision
Perhaps one of the first of the cognitive services, the Computer Vision API does quite a lot. It can give you the following:
- Text description of what it thinks is happening in a photo
- Tags of what it thinks it finds in a photo
- Content moderator scores in terms of adult content
- How it thinks the photo is best categorized
- Whether it recognizes any faces in the photo, such as celebrities
That's a lot for one API to return, and some of these things, such as faces and content moderation, are also available as their own APIs that you can implement separately.
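As a rough sketch of how a single call can ask for all of the items listed above, the snippet below builds (but does not send) an image analysis request. The region (`westus`), the API version (`v2.0`), and the helper name `build_analyze_request` are assumptions for illustration, so check the current documentation for the endpoint that applies to your subscription.

```python
from urllib.parse import urlencode

# Assumed endpoint: region and API version are placeholders, not confirmed
# by this post. Replace them with the values from your own subscription.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v2.0/analyze"


def build_analyze_request(image_url, subscription_key):
    """Return (url, headers, body) for an analysis call without sending it."""
    query = urlencode(
        {
            # Roughly one feature per item in the list above.
            "visualFeatures": "Description,Tags,Adult,Categories,Faces",
            # Ask for celebrity recognition on any faces that are found.
            "details": "Celebrities",
        },
        safe=",",  # keep the commas readable instead of percent-encoding them
    )
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    body = {"url": image_url}
    return f"{ENDPOINT}?{query}", headers, body


url, headers, body = build_analyze_request(
    "https://example.com/photo.jpg", "YOUR_KEY"
)
print(url)
```

You would then POST `body` as JSON to `url` with `headers` using whichever HTTP client you prefer.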
Content Moderator
Do you have a site where people can upload images and videos but don't want to go through each upload to make sure there's no adult content? This API will do all of that for you.
The API can look at images and tell you whether it has high confidence that they contain adult content, but it can also detect if an image is racy. That is, if it is mildly sexual but not considered adult. It's not limited to just images, though. It can also do these checks with video.
This can also look at text to see if there is any profanity in it. But it goes beyond just profanity. It can also help check if there is any personally identifiable information (PII) in the text to make sure no personal information is published to your site. With digital privacy becoming even more important, this is going to be very helpful.
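For the image checks described above, here is a minimal sketch of how an application might act on the adult and racy confidence scores. The thresholds, the action names, and the `review_action` helper are all hypothetical; the real API response has its own field names, which you would map onto these two numbers.

```python
# Hypothetical thresholds: tune these for your own site and audience.
ADULT_THRESHOLD = 0.5
RACY_THRESHOLD = 0.5


def review_action(adult_score, racy_score):
    """Map two confidence scores (0.0 to 1.0) to an action for an upload."""
    if adult_score >= ADULT_THRESHOLD:
        return "reject"
    if racy_score >= RACY_THRESHOLD:
        # Racy but not adult: let a person make the final call.
        return "needs_human_review"
    return "publish"


print(review_action(0.02, 0.10))  # publish
print(review_action(0.05, 0.80))  # needs_human_review
print(review_action(0.95, 0.99))  # reject
```

Checking the adult score first means an image that is both adult and racy is rejected outright rather than queued for review.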
Face API
The Face API lets you do a few things in terms of finding and identifying people based on their face in a photo. This includes verifying that faces from two photos match, recognizing emotion from a person's face, and detecting faces in a photo.
The emotion detection is interesting in that the API response gives you a score for each emotion, and the highest-scored emotion is the one it thinks the face is showing.
"scores": {
  "anger": 0.09557262,
  "contempt": 0.003917685,
  "disgust": 0.684764564,
  "fear": 4.03712329E-06,
  "happiness": 8.999826E-08,
  "neutral": 0.002147009,
  "sadness": 0.213587672,
  "surprise": 6.34691469E-06
}
In this response, "disgust" was the highest with a score of almost 70%.
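Picking the winning emotion out of a response like this takes one line. The sketch below copies the scores from the response above into a Python dict; the variable names are just for illustration.

```python
# Scores copied from the sample response above.
scores = {
    "anger": 0.09557262,
    "contempt": 0.003917685,
    "disgust": 0.684764564,
    "fear": 4.03712329E-06,
    "happiness": 8.999826E-08,
    "neutral": 0.002147009,
    "sadness": 0.213587672,
    "surprise": 6.34691469E-06,
}

# The emotion with the highest score is the API's best guess.
top_emotion = max(scores, key=scores.get)
print(f"{top_emotion}: {scores[top_emotion]:.1%}")  # disgust: 68.5%
```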
Video Indexer
The Video Indexer API is really interesting. It combines a few of the other APIs, including content moderation, speech, and sentiment analysis, to give a complete look at a video. For a complete walkthrough of what it can do, I have a Wintellect post that details the Video Indexer.
Custom Vision
The Custom Vision API is where you get to upload your own sets of images, tag them, and then train a model to classify images into those tags. This is a lot more powerful than the regular Computer Vision API because you can use your own images to train the model. This gives you or your business a more specialized model that is much easier to train and deploy than doing it by hand.
Speech
Speech to Text and Text to Speech
These are similar and are mostly self-explanatory, but there are some subtleties that Microsoft gives you with them. For instance, the APIs can be customized to detect specific vocabularies and accents for more accurate speech detection.
Speaker Recognition
This API can identify an unknown speaker and give you that speaker's name. It works by sending in different recordings of a person's voice and associating them with that person. The API then knows how to recognize their voice, and it works quite well.
This API can also do speaker verification. Similar to enrolling the voice for recognition, you'd have to train the API to know who is speaking so it can recognize the voice.
Speech Translation
A fairly simple API but a powerful one. This adds real time and multi-language translation to speech in your apps. So if your application needs to support this type of feature, then just implement this API instead of spending the resources to create this model on your own, which may not even be as accurate as Microsoft's.
Knowledge
QnA Maker
This API is unique and serves to solve a specific problem: can I automate responses to commonly asked questions about my site or services? QnA Maker makes this really easy by having a web interface where you can put the questions and their answers in and then train a model to detect the question being asked and output the correct answer to it.
This API is mainly used in conjunction with other services to create a bot that answers those types of questions for your users, specifically LUIS, which will be highlighted in the language section, and the Bot Framework.
Custom Decision Service
This API is a newer offering and is still in preview, but is also one to look out for. It is a way to offer custom recommendations on certain items. Not only that, but it does this in real time through reinforcement learning.
Search
Bing Web Search
This simple yet powerful API gives your applications the ability to search the web through a keyword. The API brings back search results of web pages, images, video, and even news with a single API call.
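To make the "single API call" idea concrete, here is a small sketch that builds a search URL and counts the results of each type in a response. The endpoint and version (`v7.0`), the top-level keys (`webPages`, `images`, `videos`, `news`), and the tiny `sample` payload are assumptions for illustration rather than real output, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and version: confirm against the current documentation.
SEARCH_ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/search"


def build_search_url(keyword, count=10):
    """Build the GET URL for a keyword search."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': keyword, 'count': count})}"


def count_by_type(response):
    """Count how many results of each type came back in one response."""
    counts = {}
    for answer_type in ("webPages", "images", "videos", "news"):
        answer = response.get(answer_type, {})
        counts[answer_type] = len(answer.get("value", []))
    return counts


# Made-up, trimmed payload in the assumed response shape.
sample = {
    "webPages": {"value": [{"name": "Page A"}, {"name": "Page B"}]},
    "images": {"value": [{"name": "Image A"}]},
    "news": {"value": []},
}

print(build_search_url("cognitive services"))
print(count_by_type(sample))
```

Result types that are missing from a response, like `videos` in the sample, simply count as zero.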
Bing Visual Search
This API allows you to search with an image, and it returns results that match that image. For example, if a customer wants to replace their current couch with a similar one, your application can take a photo of the couch and give product recommendations based on the matching results.
Bing Entity Search
This API can give some extra context about people, places, and things that your application may refer to based off what is available out on the web about them. As an example, suppose your application gives recommendations on local places to visit depending on where your user is. With this API you can give more context about the items that your application finds to your user so they can make a more informed decision on where to go or what to do next.
Bing Video Search
This video search API does the same thing as the web search API, but it only returns videos. However, this API gives a lot more in the results than just links to videos; you also get information such as the embed HTML, view count, and video length for starters. This is a perfect API if your application content is all about videos from around the web.
Bing News Search
Another specific API, this one returns news articles. Similar to the Video Search API, it returns a lot of additional information you can use within your application. For instance, along with the news article URL, you get the date it was created, news and categories related to the article, and specific mentions within the article.
Bing Image Search
You can't have specific search on news and videos without having one that's for images. Not only does this API give you the full URL to the image, but you also get metadata on each image, thumbnails, and information on the website that published the image.
Bing Custom Search
While the other search APIs are for the entire web, the Custom Search API lets you add search capabilities for your own custom domain. This means that search results will only be relevant to your domain and nothing else unless you specify other web pages to include. For example, if you have a huge knowledge base of FAQs on your site about all the products your company uses, then incorporating the Custom Search API will let users search through the entire knowledge base so they can quickly find an answer instead of browsing through all of the content.
Bing Autosuggest
This is a unique API in that all it does is give suggestions on what it thinks you're typing as you type in a search box. You've seen this if you've done a Google search, but to help illustrate, and to show some of the results this API gives, here's a gif: