What's New in the ML.NET CLI

The ML.NET CLI has received some interesting updates. This post goes over the main new items.

For a video version of this post, check below.

New Install Name

The first thing to note is that newer versions of the ML.NET CLI are installed under a new name. Because the CLI grew too large to ship as a single .NET tool, it is now split into multiple packages, one per operating system and CPU architecture.

Getting the newest version therefore requires a fresh install, even if you already have the older version. In fact, I recommend uninstalling the older version of the CLI first if you have it. This can be done with the dotnet tool uninstall mlnet --global command.

Which package you install depends on your machine. I have an M1 MacBook Pro, so I would install the mlnet-osx-arm version. If you're on Windows, you will probably install the mlnet-win-x64 version.

If you already have one of the newer versions installed and want to update it, you can use the dotnet tool update command.
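To put those steps in one place, here is a minimal sketch. It is a dry run: print_install_steps is a hypothetical helper of mine that only prints the dotnet tool commands, so it is safe to run anywhere. I'm using mlnet-win-x64 as the example package; check NuGet for the exact package id for your platform before running the printed commands for real.

```shell
#!/bin/sh
# Dry run: prints the dotnet tool commands instead of executing them,
# so the script is safe to run on a machine without the .NET SDK.
# print_install_steps is a hypothetical helper, not part of any tool.
# Usage: print_install_steps <package-id>
print_install_steps() {
    pkg="$1"
    # Remove the old single-package CLI, which was published as "mlnet".
    echo "dotnet tool uninstall mlnet --global"
    # Install the platform-specific package.
    echo "dotnet tool install --global $pkg"
    # Later on, pull in newer releases of that same package.
    echo "dotnet tool update --global $pkg"
}

print_install_steps mlnet-win-x64
```

Swap in the package name for your own machine, then copy the printed lines into your terminal one at a time.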

Train with an mbconfig File

The new CLI release comes with a couple of new commands. The first we'll go over is the train command. It takes a single required argument, an mbconfig file, and uses the information in that file to perform another training run.

This is useful in a few scenarios, including continuous integration, where the mbconfig file is checked into version control and training can be run each day to see if a new model can be discovered.
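As a rough sketch of what that continuous integration step could look like, the script below prints one retraining command per mbconfig file. It is a dry run and makes a couple of assumptions: I'm assuming the mbconfig path is passed through a --training-config option (run mlnet train --help to confirm the exact name), and the file names are made up for illustration.

```shell
#!/bin/sh
# Dry run: prints one retraining command per mbconfig file.
# ASSUMPTION: the mbconfig path is passed via --training-config;
# confirm the exact option name with `mlnet train --help`.
TRAIN_FLAG="--training-config"

# Hypothetical helper.
# Usage: print_train_commands <file.mbconfig>...
print_train_commands() {
    for config in "$@"; do
        # Quote the path in the printed command so spaces survive copy/paste.
        echo "mlnet train $TRAIN_FLAG \"$config\""
    done
}

# Hypothetical file names, standing in for configs checked into the repo.
print_train_commands models/Sentiment.mbconfig models/Sales.mbconfig
```

In a real pipeline you would run the commands rather than print them, and schedule the job however your CI system handles nightly builds.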

Forecasting

Along with the train command, a new scenario has been added: forecasting. Forecasting is primarily used with time series data to predict future values. As with the other scenarios, there are a few arguments we can pass in.

The dataset and label-col arguments work as they do in the other scenarios, but forecasting requires two more: horizon and time-col.

The horizon argument is the number of future values you want the forecasting algorithm to predict.

The time-col argument is the column that holds the times or dates the algorithm can use.

We can run this like the other scenarios with the command below. The --train-time argument limits the run to only 10 seconds. The data can be found here if you want to run it as well.

mlnet forecasting --dataset C:/dev/wind_gen.txt --horizon 3 --label-col 1 --time-col 0 --train-time 10
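If the column indexes in that command feel abstract, this small sketch may help. It writes a tiny, made-up time series to a temp file and shows which column each index points at. The values, the header names, and the tab-separated layout are all hypothetical and not taken from wind_gen.txt, and five rows is far too little to train on. The script only prints the matching mlnet command rather than running it.

```shell
#!/bin/sh
# Write a tiny, made-up time series to show how the column indexes line up.
# The values and the tab-separated layout are hypothetical, for illustration.
sample="$(mktemp)"
printf 'date\twind_gen\n'     >  "$sample"
printf '2021-01-01\t120.5\n'  >> "$sample"
printf '2021-01-02\t98.2\n'   >> "$sample"
printf '2021-01-03\t143.7\n'  >> "$sample"
printf '2021-01-04\t110.0\n'  >> "$sample"
printf '2021-01-05\t131.4\n'  >> "$sample"

time_col=0    # column 0 holds the dates
label_col=1   # column 1 holds the value to forecast
horizon=3     # predict the next 3 values

# cut numbers fields from 1 while the CLI indexes from 0, hence the +1.
time_header="$(head -n 1 "$sample" | cut -f "$((time_col + 1))")"
label_header="$(head -n 1 "$sample" | cut -f "$((label_col + 1))")"

echo "time column:  $time_header"
echo "label column: $label_header"
# Dry run: print the command instead of executing it.
echo "mlnet forecasting --dataset $sample --horizon $horizon --label-col $label_col --time-col $time_col --train-time 10"
```

With a horizon of 3 on daily data like this, the model would be asked for the three days that follow the last row.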


These are a couple of big additions to the CLI, and I'm sure more are coming. It is nice to see the ML.NET team keeping the CLI's features on par with Model Builder.

What's New in ML.NET Version 1.6

Another new release of ML.NET is now out! The release notes for version 1.6 have all the details, but this post will highlight the more interesting updates from this version. I'll also include the pull request for each item in case you want to see more details on it or learn how something was implemented.

A lot was added in this release, but the team did note that none of the additions are breaking changes.

For the video version of this post, check below.

Support for ARM

Perhaps the most exciting part of this update is the new support for ARM architectures. This allows most training and inference in ML.NET to run on ARM.

Why is this update useful? Well, ARM architectures are almost everywhere. As mentioned in the June update blog post, ARM architectures are found in mobile and embedded devices. This can open up a whole world of opportunities for ML.NET on mobile phones and IoT devices.

DataFrame Updates

The DataFrame API is probably one of the more exciting packages that's currently in its early stages. Why? Well, .NET doesn't have much that competes with pandas in Python for data analysis or data wrangling, the kind of preprocessing you may need before you send the data into ML.NET to make a model.

Why am I including DataFrame updates in an ML.NET update? Well, the DataFrame API has been moved into the ML.NET repository! The code used to be in the CoreFx Lab repository as an experimental package, but it is no longer experimental and is now part of ML.NET. This is great news, since many more updates to this API are planned.

Other DataFrame updates include:

  • GroupBy operation extended - While the DataFrame API already had a GroupBy operation, this update adds new property groupings and makes it act more like LINQ's GroupBy operation.
  • Improved CSV parsing - Implemented the TextFieldParser that can be used when loading a CSV file. This allows the handling of quotes in columns.
  • Convert IDataView to DataFrame - We've already had a way to convert a DataFrame object into an IDataView so that data loaded with the DataFrame API can be used in ML.NET, but now we can do the opposite: load data in ML.NET and convert it into a DataFrame object to perform further analysis on it.
  • Improved DateTime parsing - This allows for better parsing of date time data.
  • Improvements to the Sort and Merge methods - These updates allow for better handling of null fields when performing a sort or merge.

By the way, if you're looking for a way to contribute to the ML.NET repository, helping with the DataFrame API is a great way to get involved. They have quite a few issues already that you can take a look at and help out with. It would be awesome if we got this package on par with pandas to help make C# a great ecosystem for data analysis.

You can filter the issues by the Microsoft.Data.Analysis label to see everything they need help with.

Code Enhancements

Quite a few of the enhancements were code quality updates. In fact, feiyun0112 submitted several pull requests that improved the code quality of the repo, making it easier for folks to read and maintain.

Miscellaneous Updates

There were also quite a lot of updates that didn't really tie in to a single theme. Here are some of the more interesting ones.

These are just a few of the changes in this release. Version 1.6 has a lot in it, so I encourage you to go through the full release notes to see all the items that I didn't include in this post.


What was your favorite update in this release? Was it ARM support or the new DataFrame enhancements? Let me know in the comments!