Jon Wood

One of the most requested features for ML.NET is the ability to create neural network models from scratch to perform deep learning. The ML.NET team has taken that feedback, along with the feedback from the customer survey, and has come out with a plan to start implementing this feature.

Current State of Deep Learning in ML.NET

Currently, ML.NET has no way to create neural networks and build deep learning models from scratch. There is great support, however, for taking an existing deep learning model and using it for predictions. If you have a TensorFlow or ONNX model, it can be used in ML.NET to make predictions.

There is also great support for transfer learning in ML.NET. This allows you to take your own data and train it against a pretrained model to give you a model of your own.

However, as mentioned earlier, ML.NET does not yet have the capability to let you create your own deep learning models from scratch. Let's take a look at what the plans are for this.

Future Deep Learning Plans

In the ML.NET GitHub repo there is a fairly recent issue that goes over the plans for creating deep learning models in ML.NET.

There are two reasons for the issue:

Communicate to the community about what the plans are and that this is being worked on.
Get feedback from the community on the current plan.

While we'll touch on the main points of the issue in this post, I would highly encourage you to go through it and share any feedback or questions you may have about the plans to help the ML.NET team in their planning and implementation.

The issue lays out three parts to delivering deep learning model creation in ML.NET:

Make consuming ONNX models easier
Support TorchSharp and make it production ready
Create an API in ML.NET to support TorchSharp

Let's go into each of these in more detail.

Easier Use of ONNX Models

While you can use ONNX models in ML.NET right now, you do have to know the input and output names in order to use them. Today we rely on the Netron application to load an ONNX model and show us its input and output names. While this isn't bad, the team wants to expose an internal way to get these names instead of having to rely on a separate application.

Of course, along with the new way to get the input and output names for ONNX models, the documentation will be updated to reflect this. I believe examples showing how to do this would follow the documentation as well.

Supporting TorchSharp

TorchSharp is the heart of how ML.NET will implement deep learning. Similar to how TensorFlow.NET supports scoring TensorFlow models in ML.NET, TorchSharp will give .NET access to the library that powers PyTorch. PyTorch is starting to lead the way in building deep learning models in research and in industry, so it makes sense to implement it in ML.NET.

In fact, one of the popular libraries to build deep learning models is FastAI. Not only is FastAI one of the best courses to take when learning deep learning, but the Python library is one of the best in terms of building deep learning models. Under the hood, though, FastAI uses PyTorch to actually build the models that it produces. This isn't by accident. The FastAI developers decided that PyTorch was the way to go for this.

TensorFlow is great to support for making predictions with existing models, but for building new ones from scratch I really think PyTorch and TorchSharp are the preferred way. To do this, TorchSharp will help ML.NET lead the way.

Implementing TorchSharp into ML.NET

The final stage, once TorchSharp has been made production ready, is to create a high-level API in ML.NET to train deep learning models from scratch.

This will be like when Keras came along for TensorFlow. It was an API on top of TensorFlow to help make building the models much easier. I believe ML.NET can do that for TorchSharp.

This will probably be a big undertaking, but it is definitely worth doing. This will be the API people use to build their models, so taking the time to get it right will pay off in the long run: the simpler it is to build our models, the more productive we will be.

Conclusion

Creating deep learning models from scratch is, by far, one of the most requested features for ML.NET, and the team's plan is definitely going to reach this goal. In fact, I think it will surpass this goal since it will use PyTorch on the backend, which is where research and industry are leaning.

If you have any feedback or questions, definitely feel free to comment on the GitHub issue.

A big thing around apps these days is that they are much more intelligent than they used to be. And with that, users are expecting more and more from apps. Usually, the way to deliver this is to create machine learning or deep learning models yourself and deploy them within your applications. But now, Microsoft can do the heavy lifting for you with their suite of Cognitive Services.

These services can do everything you need from sentiment analysis on text to face and emotion detection. In this post, we'll go over each service that is provided so you can see what all is available and decide which one is best for your application.

Cognitive Services are divided up into several areas:

Vision
Speech
Knowledge
Search
Language

There's also a separate experimental area, called Cognitive Service Labs, where Microsoft puts newer services out for people to try and give feedback on so they can improve on them. Let's take a deeper dive into each of these services and what they offer.

Vision

Computer Vision

Perhaps one of the first of the cognitive services, the Computer Vision API does quite a lot. It can give you the following:

Text description of what it thinks is happening in a photo
Tags of what it thinks it finds in a photo
Content moderator scores in terms of adult content
How it thinks the photo is best categorized
And if it recognizes any faces within the photo such as any celebrities

That's a lot for one API to return, and some of these things, such as the faces and content moderator, have their own APIs that you can implement separately.

Content Moderator

Do you have a site where people can upload images and videos but don't want to go through each upload to make sure there's no adult content? This API will do all of that for you.

The API can look at images to detect if it has a high confidence that they are adult content, but it can also detect if an image is racy. That is, if it is mildly sexual but not considered adult. It's not limited to just images, though. It can also do these checks with video.

This can also look at text to see if there is any profanity in it. But it goes beyond just profanity. It can also help check if there is any personally identifiable information (PII) in the text to make sure no personal information is published to your site. In these times of digital privacy being even more important, this is going to be very helpful.

Face API

The Face API lets you do a few things in terms of finding and identifying people based on their face in a photo. This includes verifying that faces from two photos match, recognizing emotion from a person's face, and detecting faces in a photo.

The emotion detection is interesting in that the API response will give you scores and the highest scored emotion is the one it thinks the face is giving.

"scores": {
      "anger": 0.09557262,
      "contempt": 0.003917685,
      "disgust": 0.684764564,
      "fear": 4.03712329E-06,
      "happiness": 8.999826E-08,
      "neutral": 0.002147009,
      "sadness": 0.213587672,
      "surprise": 6.34691469E-06
    }

In this response, "disgust" was the highest with a score of almost 70%.
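Picking the top emotion out of a response like this takes very little code. The following is a minimal Python sketch that assumes the scores have already been parsed into a dictionary; the helper name `top_emotion` is made up for this illustration and is not part of the Face API.

```python
# Scores copied from the sample response above.
scores = {
    "anger": 0.09557262,
    "contempt": 0.003917685,
    "disgust": 0.684764564,
    "fear": 4.03712329e-06,
    "happiness": 8.999826e-08,
    "neutral": 0.002147009,
    "sadness": 0.213587672,
    "surprise": 6.34691469e-06,
}


def top_emotion(scores):
    """Return the (emotion, score) pair with the highest score."""
    emotion = max(scores, key=scores.get)
    return emotion, scores[emotion]


emotion, score = top_emotion(scores)
print(f"{emotion}: {score:.1%}")  # prints "disgust: 68.5%"
```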

Video Indexer

The Video Indexer API is really interesting. It combines a few of the other APIs, including content moderation, speech, and sentiment analysis, to give a complete look at a video. For a complete walkthrough of what it can do, I have a Wintellect post that details the video indexer.

Custom Vision

The Custom Vision API is where you get to upload your own sets of images, tag them, and then train a model to classify images into those tags. This is a lot more powerful than the regular Computer Vision API because you can use your own images to train the model. This gives you or your business a more specialized model that is much easier to train and deploy than doing it by hand.

Speech

Speech to Text and Text to Speech

These are similar and mostly self-explanatory, but there are some subtleties that Microsoft gives you with them. For instance, the APIs can be customized to detect specific vocabularies and accents for more accurate detection of the speech.

Speaker Recognition

This API can identify an unknown speaker and give you the speaker's name. It works by sending in different recordings of a person's voice and associating them with that person. The API then knows how to recognize their voice, and it works quite well.

This API can also do speaker verification. Similar to enrolling the voice for recognition, you'd have to train the API to know who is speaking so it can recognize the voice.

Speech Translation

A fairly simple API but a powerful one. This adds real time and multi-language translation to speech in your apps. So if your application needs to support this type of feature, then just implement this API instead of spending the resources to create this model on your own, which may not even be as accurate as Microsoft's.

Knowledge

QnA Maker

This API is unique and serves to solve a specific problem: can I automate responses to commonly asked questions about my site or services? QnA Maker makes this really easy by having a web interface where you can put the questions and their answers in and then train a model to detect the question being asked and output the correct answer to it.

This API is mainly used in conjunction with other services, specifically LUIS, which will be highlighted in the language section, and the Bot Framework, to create a bot that answers those types of questions for your users.

Custom Decision Service

This API is a newer offering and is still in preview, but is also one to look out for. It is a way to offer custom recommendations on certain items. Not only that, but it does this in real time through reinforcement learning.

Search

Bing Web Search

This simple yet powerful API gives your applications the ability to search the web through a keyword. The API brings back search results of web pages, images, video, and even news with a single API call.
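To make the "single API call" concrete, here is a rough Python sketch that builds, but does not send, a keyword search request. The endpoint URL, the `Ocp-Apim-Subscription-Key` header, and the `count` parameter reflect my understanding of the v7 REST API and should be treated as assumptions to verify against the current documentation; `build_search_request` and `YOUR_KEY` are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed v7 endpoint; verify against the current documentation before use.
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/search"


def build_search_request(query, subscription_key, count=10):
    """Build (but do not send) a GET request for a keyword search."""
    url = f"{ENDPOINT}?{urlencode({'q': query, 'count': count})}"
    # The header name is an assumption about how the key is passed.
    return Request(url, headers={"Ocp-Apim-Subscription-Key": subscription_key})


request = build_search_request("ML.NET deep learning", "YOUR_KEY")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` with a valid key would perform the actual call; that step is left out so the sketch runs without network access.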

Bing Visual Search

This API allows you to search with an image and it will return any results that match that image. For example, if a customer wants to purchase a similar couch to replace their current one, your application can give results based off of a photo of their couch that they take and give product recommendations based on the results.

Bing Entity Search

This API can give some extra context about people, places, and things that your application may refer to based off what is available out on the web about them. As an example, suppose your application gives recommendations on local places to visit depending on where your user is. With this API you can give more context about the items that your application finds to your user so they can make a more informed decision on where to go or what to do next.

Bing Video Search

This video search API does the same thing as the web search API, but it only returns videos. However, this API gives a lot more in the results than just links to videos; you also get information such as the embed HTML, view count, and video length for starters. This is a perfect API if your application content is all about videos from around the web.

Bing News Search

Another specific API, this one returns news articles. Similar to the Video Search API, this one returns a lot of additional information you can use within your application. For instance, along with the news article URL, you get the date it was created, related news and categories to the current news article, and specific mentions within the article.

Bing Image Search

You can't have specific search on news and videos without having one that's for images. Not only does this API give you the full URL to the image, but you also get metadata on each image, thumbnails, and information on the website that published the image.

Bing Custom Search

While the other search APIs are for the entire web, the Custom Search API will allow you to have search capabilities on your own custom domain. This means that search results will only be relevant to your domain and nothing else unless you specify other web pages to include. An example for this API can be if you have a huge knowledge base of FAQ on your site about all the products your company uses, then incorporating the Custom Search API will let users search through the entire knowledge base so they can quickly find an answer instead of browsing through all of the content.

Bing Autosuggest

This API is unique in that all it does is return suggestions as you type in a search box, based on what it thinks you're typing. You've seen this behavior if you've done a Google search.

Language

Text Analytics

This API is probably the most essential when you want to get information from text, because of everything it gives you: sentiment analysis, language detection, and key phrase extraction.

The API has actually been updated to also include a way to identify entities within the text. What this gives you is a way to tell if the text refers to a specific place, person, or organization.

Bing Spell Check

Another interesting API offering, the Spell Check API will analyze your text and give any suggestions on what it thinks should be fixed.

For instance, if I give the API the text "He has fixed there enviornment", it will come back with the following JSON:

{
  "_type": "SpellCheck",
  "flaggedTokens": [
    {
      "offset": 13,
      "token": "there",
      "type": "UnknownToken",
      "suggestions": [
        {
          "suggestion": "their",
          "score": 0.786016486722212
        },
        {
          "suggestion": "the",
          "score": 0.702565790712439
        }
      ]
    },
    {
      "offset": 19,
      "token": "enviornment",
      "type": "UnknownToken",
      "suggestions": [
        {
          "suggestion": "environment",
          "score": 0.786016486722212
        }
      ]
    }
  ]
}
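A response like this can be turned into corrected text by replacing each flagged token with its highest-scored suggestion. The sketch below is one way to do that in Python, working on a trimmed copy of the JSON above; the function name `apply_suggestions` is my own and not part of the Spell Check API.

```python
import json

text = "He has fixed there enviornment"

# Trimmed copy of the response shown above (only the fields used here).
response = json.loads("""
{
  "flaggedTokens": [
    {"offset": 13, "token": "there",
     "suggestions": [{"suggestion": "their", "score": 0.786016486722212},
                     {"suggestion": "the", "score": 0.702565790712439}]},
    {"offset": 19, "token": "enviornment",
     "suggestions": [{"suggestion": "environment", "score": 0.786016486722212}]}
  ]
}
""")


def apply_suggestions(text, flagged_tokens):
    """Replace each flagged token with its highest-scored suggestion."""
    # Work from the end of the string so earlier offsets stay valid.
    for flagged in sorted(flagged_tokens, key=lambda t: t["offset"], reverse=True):
        if not flagged["suggestions"]:
            continue  # nothing to substitute for this token
        best = max(flagged["suggestions"], key=lambda s: s["score"])
        start = flagged["offset"]
        end = start + len(flagged["token"])
        text = text[:start] + best["suggestion"] + text[end:]
    return text


print(apply_suggestions(text, response["flaggedTokens"]))
# prints "He has fixed their environment"
```

Applying the replacements from right to left matters because a suggestion can be longer or shorter than the token it replaces, which would otherwise shift the offsets of later tokens.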

Translator Text

This API can translate text between, as of right now, over 60 languages. Since it is REST based, there is an endless number of applications for it, such as localizing your applications or powering communication and messaging applications. You can also mix in the Vision API to read text from images, translate that text to another language with this API, and then use the Speech API to speak the translation from your application.

Content Moderator

This simple API looks at images and video to determine if they have adult content in them. This is a very useful API to help you make sure no adult images get uploaded to your website if you have that feature.

This API also includes a flag in the response if the image or video is racy or not. If it is racy it isn't necessarily adult, but somewhat close to it. For example, this API may flag a photo as racy if it is of a person in a bathing suit.
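As an illustration of how an upload pipeline might act on those flags, here is a small Python sketch. The response below is hand-written for the example, and its field names are an assumption about what the image moderation result looks like rather than something taken from this post; `moderation_decision` is a hypothetical helper.

```python
# Hypothetical, hand-written response for illustration only; the field names
# are an assumption about the image moderation result.
sample_response = {
    "AdultClassificationScore": 0.02,
    "IsImageAdultClassified": False,
    "RacyClassificationScore": 0.81,
    "IsImageRacyClassified": True,
}


def moderation_decision(response):
    """Map the adult and racy flags to a simple action for an upload."""
    if response["IsImageAdultClassified"]:
        return "reject"
    if response["IsImageRacyClassified"]:
        return "review"  # racy but not adult: send to a human moderator
    return "accept"


print(moderation_decision(sample_response))  # prints "review"
```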

Language Understanding

Language Understanding Intelligent Service, or LUIS, is a nice part of the Cognitive Services family that allows applications to understand natural language. I've also talked more in detail about LUIS in a Wintellect post as well as how to integrate LUIS into the bot framework.

Cognitive Service Labs

Gesture

This API allows you to implement custom hand gestures in your application. Using hand gestures can make your users more productive and your applications more intuitive to use.

Ink Analysis

Ink Analysis not only lets a user write with a digital pen within your application, it can also understand what they wrote so you can take action on it.

This API can understand several things, such as:

Shapes
Handwritten text in over 60 languages
Layout of text

Local Insights

This fairly simple API can score locations around you by the different types of amenities those places have. You can also tell it to score based on proximity to you or on other custom criteria.

Event Tracking

This simple API will return events based off of a Wikipedia entry. For example, if I give a query of "Microsoft" I'll get a response like the below:

{
      "relatedEntities": [
        "Microsoft",
        "Surface 3"
      ],
      "latestDocument": {
        "documentId": "ceeddb607592dc8260110855489fa45b",
        "source": "Neowin",
        "publishDate": "2018-06-01T21:22:01Z",
        "title": "Surface 3 gets its first firmware update in a year and a half",
        "summary": "Microsoft today released a firmware update for its Surface 3 tablet, and it's the first update that the device has received since September 27, 2016.It contains security and display fixes.",
        "url": "https://www.neowin.net/news/surface-3-gets-its-first-firmware-update-in-a-year-and-a-half",
        "entities": [
          "Microsoft",
          "Surface 3"
        ]
      }
    },
    {
      "relatedEntities": [
        "Paul Allen",
        "Bill & Melinda Gates Foundation",
        "Government of India"
      ],
      "latestDocument": {
        "documentId": "ceeddc75d6a8971302b102291211f79f",
        "source": "UW CSE News",
        "publishDate": "2018-06-01T20:08:03Z",
        "title": "Thank you to our state legislators!",
        "summary": "On Thursday the Paul G. Allen School was honored to host UW’s annual reception thanking our state legislators for their investments in education.In the case of the Allen School, recent investments include substantial support for the Bill & Melinda Gates Center – a second building that will double our space when it opens in January – and multiple years of funding for enrollment growth that have more than doubled our degree capacity.",
        "url": "https://news.cs.washington.edu/2018/06/01/thank-you-to-our-state-legislators/",
        "entities": [
          "Paul Allen",
          "Bill & Melinda Gates Foundation",
          "Government of India"
        ]
      }
    }
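To show how a response like this might be consumed, here is a short Python sketch that reduces each event to a one-line summary. It uses a trimmed copy of the first entry above and assumes the entries arrive inside a JSON list, since the excerpt does not show how they are wrapped; `summarize` is a name chosen for this example.

```python
import json

# Trimmed copy of the first entry shown above. Wrapping it in a list is an
# assumption, since the excerpt does not show the enclosing structure.
events = json.loads("""
[
  {
    "relatedEntities": ["Microsoft", "Surface 3"],
    "latestDocument": {
      "source": "Neowin",
      "publishDate": "2018-06-01T21:22:01Z",
      "title": "Surface 3 gets its first firmware update in a year and a half"
    }
  }
]
""")


def summarize(events):
    """Return one 'date | source | title' line per event."""
    lines = []
    for event in events:
        doc = event["latestDocument"]
        date = doc["publishDate"][:10]  # keep only YYYY-MM-DD
        lines.append(f"{date} | {doc['source']} | {doc['title']}")
    return lines


for line in summarize(events):
    print(line)
```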

Answer Search

This API is designed to make search in your application much more efficient and faster. It does this by recognizing entities within search queries to determine what the query is about, which lets your users get quicker answers.

Personality Chat

This will easily enable your bots to have more small talk, or conversation about unimportant subjects. Specifically, Personality Chat can add the following:

Remove fallback answers such as "I don't know", or "I didn't get that".
Add personality to the small talk.
Add customizable small talk responses.

Knowledge Exploration

This API can help enable search experiences over data with natural language queries. If you've used the Q&A feature in Power BI then you've seen an example of Knowledge Exploration.

If you're curious to learn more about Power BI, feel free to check out this post on the Wintellect blog.

Academic Knowledge

This API will allow you to get insights into academic content, mainly research papers. Similar to Knowledge Exploration, this API can interpret natural language queries to return more accurate results.

URL Preview

This API is specific in that it provides a URL's title, description, and a relevant image. This API will also let you know if the URL is flagged for adult content.

Conversation Learner

Similar to LUIS, Conversation Learner allows you to train models based on your own criteria for dialogs. Mainly to be used for the Bot Framework, this will definitely help people develop better bots to help their customers.

Anomaly Finder

This API will help you find anomalies, or outliers, in your data. Mainly used for time series data, it will return which of your data points are considered anomalies.
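To give a feel for what flagging outliers in a time series means, here is a small Python sketch that uses a plain z-score test. This is not how Anomaly Finder works internally, which this post does not describe; it is a deliberately simple stand-in technique, and the data and the `find_anomalies` helper are invented for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return indexes of points more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all points identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical hourly request counts with one obvious spike.
series = [102, 98, 101, 97, 100, 99, 350, 103, 101, 98]
print(find_anomalies(series))  # prints [6]
```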

Custom Decision

This API is interesting as it offers a recommendation engine as a service. Using reinforcement learning, it gives personalized content based on the information that you provide to it.

It was a long journey, but we have gone through each of the current offerings from Microsoft's Cognitive Services. Feel free to try any of these as they will really help to make your applications much more intelligent.
