Change Log

Version 1.9.0.2 (Sep 8, 2020)

  • Improvements:

    • Enable GPU support for PyTorch (BERT) models on IBM Power

    • Allow specification of destination file path for downloads from Python client

    • Enable large data upload for R client

  • Bug Fixes:

    • Fix OpenID and TLS login redirection when deploying behind a reverse proxy

Version 1.9.0.1 (Aug 10, 2020)

  • Bug Fixes:

    • Fix migration for certain time-series experiments

    • Fix missing files for automatic image model

    • Fix MLI job status for PDP/ICE

    • Fix handling of ID column for MLI Kernel Shapley

    • Fix exception handling for startup failures

    • Constrain Python environment for standalone scoring package

Version 1.9.0 (July 27, 2020)

  • New Features:

    • Multinode training (alpha)

    • Queuing of experiments to avoid system overload

    • Automatic Leaderboard: Single-button creation of a project with a series of diverse experiments

    • Multi-layer hierarchical feature engineering:

      • Allow optional pre-processing layer for specific custom data cleanup/conversions

      • Subsequent layers take each previous layer’s output as input (can be numeric or categorical/string)

    • PyTorch deep learning backend in addition to TensorFlow

    • Image classification and regression with pre-trained and fine-tuned state-of-the-art Deep Learning models:

      • Image data ingest from binary archives

        • Archives can contain one optional .csv file mapping image paths to targets (regression/classification)

        • Automatic training dataset creation and label creation (from directory structure) if no .csv file is provided

      • Image Transformers (for converting image path columns into numeric features)

        • “densenet121”, “efficientnetb0”, “efficientnetb2”, “inception_v3”, “mobilenetv2”, “resnet34”, “resnet50”, “seresnet50”, “seresnext50”, “xception”

        • Optional fine-tuning

        • Optional GPU acceleration (strongly recommended when enabling fine-tuning)

        • Pretrained and fine-tuneable ImageVectorizer transformer with automatic dimensionality reduction

        • Images can be provided either as zipped archives, or as paths to local or remote locations (URIs)

        • Automatic image labeling when importing zipped archives of images (based on folder names and structure)

        • Can handle multiple image columns with URIs in a tabular dataset

        • Single experiment can combine image, NLP and tabular data

        • MOJO support (also for CPU-only systems)

      • Automatic Image model

        • End-to-end model training, no tuning needed

        • State-of-the-art results with grandmaster techniques

        • Neural architecture search based on pretrained and fine-tuned TensorFlow models

        • Multi-GPU training

        • Visual insights in GUI (losses, sample images, augmentation, Grad-CAM visual explanations)

      • MLI is not yet available for image experiments (work in progress)

    • PyTorch BERT NLP pre-trained and fine-tuned state-of-the-art Deep Learning models:

      • “bert-base-uncased”, “distilbert-base-uncased”, “xlnet-base-cased”, “xlm-mlm-enfr-1024”, “roberta-base”, “albert-base-v2”, “camembert-base”, “xlm-roberta-base”

      • Optional GPU acceleration (strongly recommended)

      • MOJO support (also for CPU-only systems)

      • BERT transformers (for converting text columns into numeric features for other models like GBMs)

      • BERT models (when the dataset has only one text column)

    • AutoReport now includes the following:

      • Information about the time series validation strategy

      • Experiment lineage (model lineage plot)

      • NLP/Image architecture details

    • Zero-inflated regression models for insurance use cases (combination of classification + regression models)

    • Time series centering and de-trending transformations:

      • Inner ML model is trained on residuals after fitting and removing trend from target signal (per time-series group)

      • Support for constant (centering), linear and logistic trends

      • SEIRD model for epidemic modeling of (S)usceptible, (E)xposed, (I)nfected, (R)ecovered and (D)eceased, with fully configurable lower/upper bounds for model parameters

    • Graphical config.toml editor for expert settings

    • Empirical prediction intervals for regression problems with user-defined confidence levels (based on holdout predictions)

    • Insights tab with helpful visualizations (currently only for time-series and image problems)

    • For binary classification problems with F05, F1, F2, or MCC scorers, use the same metric to determine the optimal threshold

    • Custom data recipes can now be part of the experiment’s modeling pipeline, and will be part of the Python scoring package

    • Custom visualizations in AutoViz following the Grammar of Graphics

    • Pass data to (custom) scorers so that they can access other columns, not only actual and predicted values

    • Added many new scorers for common regression and classification metrics out of the box

    • Added holiday calendars for 24 more countries, and allow users to select the list of countries for which to create is-holiday features

    • Added identity_no_clip target transformer for regression problems that never clips the predictions to observed ranges and allows extrapolation

    • MLI:

      • New GUI/UX for MLI

      • Added Kernel Explainer for original feature Shapley importance

      • Added ability to download Shapley values for original features from UI as CSV file

      • Added intercept column to k-LIME output CSV file

      • Added ability to run surrogate models on DAI model residuals to help debug model errors

      • Added ability to export Decision Tree Surrogate model rules as text and Python code

      • Added Decision Tree Surrogate model for multinomial experiments

      • Added Leave One Covariate Out (LOCO) for multinomial experiments

      • Added two traditional fair lending metrics for Disparate Impact Analysis (DIA): Standardized Mean Difference (SMD) and Marginal Error (ME)

      • Added two interpretable model recipes to https://github.com/h2oai/driverlessai-recipes: GA2M and XNN (https://github.com/h2oai/driverlessai-recipes/tree/master/models/mli)

      • Display prediction label for binary classification experiments in MLI summary page

  • Improvements:

    • Improved parsability (machine readability) of log files

    • Custom recipes are now only visible to the user who created them; previously created custom recipes remain globally visible

    • Faster time-series experiments

    • Improve preview to show more details about the modeling part of the final pipeline

    • Improved notifications system

    • Reduced size of MOJO

    • Only allow imbalanced sampling techniques when data is larger than a user-controllable threshold

    • Upgraded to latest H2O-3 backend for custom recipes

    • Faster feature selection for large imbalanced datasets

  • Documentation updates:

    • Added animated GIFs

    • Added tabbed content

    • Added more details for imbalanced sampling methods for binary classification

    • New content for the features listed above

  • Bug fixes:

    • Various bug fixes
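
The empirical prediction intervals listed under 1.9.0 New Features are described as being based on holdout predictions. A minimal sketch of one common way to do this, assuming the interval is formed from quantiles of holdout residuals (actual minus predicted) that are then added to each new point prediction; the helper names are hypothetical, and this is not claimed to be how Driverless AI computes its intervals:

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, with 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def fit_interval_offsets(actuals, holdout_preds, confidence=0.9):
    """Hypothetical helper: residual quantiles that bracket `confidence` of holdout errors."""
    residuals = sorted(a - p for a, p in zip(actuals, holdout_preds))
    alpha = (1.0 - confidence) / 2.0
    return quantile(residuals, alpha), quantile(residuals, 1.0 - alpha)


def predict_interval(prediction, offsets):
    """Shift a new point prediction by the fitted residual quantiles."""
    lower_offset, upper_offset = offsets
    return prediction + lower_offset, prediction + upper_offset


# Holdout residuals (actual - predicted) here are [-1, 1, -1, 0, 2].
offsets = fit_interval_offsets([10, 12, 9, 11, 14], [11, 11, 10, 11, 12], confidence=0.5)
print(offsets)                        # (-1.0, 1.0)
print(predict_interval(20, offsets))  # (19.0, 21.0)
```

Because the offsets come only from holdout data, the interval width reflects out-of-sample error rather than training fit; a higher confidence level widens the interval.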

Version 1.8.7.2 LTS (July 13, 2020)

  • Bug Fixes:

    • Add and pass the authentication_method parameter so that the proper get_true_username and start_session are used

    • SQL-like connector: strip unnecessary semicolon from the end of queries

  • Documentation updates:

    • Document use of hive_app_jvm_args

Version 1.8.7.1 LTS (June 23, 2020)

  • New Features:

    • Add ability to push artifacts to a Bitbucket server

    • Add per-feature user control for monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models

  • Bug Fixes:

    • Fix Hive Kerberos impersonation

    • Fix a DTap connector issue by using the proper login username for impersonation

    • Fix monotonicity constraints for XGBoostGBM, LightGBM and DecisionTree models

Version 1.8.7 LTS (June 15, 2020)

  • New Features:

    • Add intercept term to k-LIME csv

    • Add control of default categorical & numeric feature rendering in DAI PD/ICE

    • Add ability to restrict custom recipe upload to a specific git repository and branch

    • Add translations for Korean and Chinese

    • Add ability to use multiple authentication methods simultaneously

  • Improvements:

    • Improve behavior of systemctl when Driverless AI fails to start

    • Improve logging behavior for JDBC and Hive connectors

    • Improve behavior of the C++ scorer: fewer unnecessary files are saved in the tmp directory

    • Improve Docker image behavior in Kubernetes

    • Improve LDAP authentication to allow for anonymous binding

    • Improve speed of feature selection for experiments on large, wide, imbalanced datasets

    • Improve speed of data import on a busy system

  • Bug fixes:

    • Fix automatic Kaggle submission and score retrieval

    • Fix intermittent Java exception seen by surrogate DRF model in MLI when several MLI jobs are run concurrently

    • Fix issue with deleting Deployments if linked Experiment was deleted

    • Fix issue causing Jupyter Notebooks to not work properly in Docker Image

    • Fix custom recipe scorers not being displayed on Diagnostics page

    • Fix issue with AWS Lambda Deployment not handling dropped columns properly

    • Fix issue with not being able to limit the number of GPUs for a specific experiment

    • Fix in-server scoring inaccuracies for certain models built in 1.7.1 and 1.8.0 (standalone scoring not affected)

    • Fix rare datatable type casting exception

  • Documentation updates:

    • The “Maximum Number of Rows to Perform Permutation-Based Feature Selection” expert setting now has a default value of 500,000

    • Improved Hive and Snowflake connector documentation

    • Updated the Main.java example in the Java Scoring Pipeline chapter

    • Added documentation describing how to change the language in the UI before starting the application

    • Added information about how custom recipes are described and documented in the Autoreport

    • Updated the LDAP authentication documentation

    • Improved the Linux DEB and RPM installation instructions

    • Improved the AWS Community AMI installation instructions

    • Improved documentation for the Reproducible button

Version 1.8.6 LTS (Apr 30, 2020)

  • New Features:

    • Add expert setting to reduce size of MOJO scoring pipelines (and hence reduce latency and memory usage for inference)

    • Enable Lambda deployment for IBM Power

    • Add restart button for Deployments

    • Add automatic Kaggle submission for supported datasets, show private/public scores (requires Kaggle API Username/Key)

    • Show a warning if the single final model is worse on back-testing splits (for time series) or cross-validation folds (for IID) than the fold models (this indicates an issue with signal or fit)

    • Update R client API to include autodoc, experiment preview, dataset download, autovis functions

    • Add button in expert settings that toggles several settings to produce a small MOJO production pipeline

    • Add an option to upload artifacts to S3 or a Git repository

  • Improvements:

    • Improve experiment restart/refit robustness if model type is changed

    • Extra protection against dropping features

    • Improve implementation of Hive connector

  • Bug fixes:

    • Upgrade datatable to fix endless loop during stats calculation at file import

    • Web server and UI now respect dynamic base URL suffix

    • Fix incorrect min_rows in MLI when providing weight column with small values

    • Fix segfault in MOJO for TensorFlow/PyTorch models

    • Fix elapsed time for MLI

    • Enable GPU by default for R client

    • Fix h2oai ModuleNotFound error in Python scoring

    • Update the no_drop_features config.toml setting and expert button to more generally avoid dropping features

    • Fix datatable mmap strategy

  • Documentation updates:

    • Add documentation for enabling the Hive data connector

    • Add documentation for updating expired DAI licenses on AWS Lambda deployments using a script

    • Documentation for uploading artifacts now includes support for S3 and Git in the artifacts store

    • Improve documentation for one-hot encoding

    • Improve documentation for systemd logs/journalctl

    • Improve documentation for time series ‘unavailable columns at prediction time’

    • Improve documentation for Azure blob storage

    • Improve documentation for MOJO scoring pipeline

    • Add information about reducing the size of a MOJO using a new expert setting

Version 1.8.5 LTS (Mar 09, 2020)

  • New Features:

    • Handle multiclass problems with a large number of classes (up to 10k), including GUI improvements in such cases

    • Detect class imbalance for binary problems where target class is non-rare

    • Add feature count to iteration panel

    • Add experiment lineage pdf in experiment summary zip file

    • Issue warnings if final pipeline scores are unstable across (cross-)validation folds

    • Issue a warning if the Constant Model improves the quality of the final pipeline (a sign of bad signal)

    • Report origin of leakage detection as from model fit (AUC/R2), GINI, or correlation

  • Improvements:

    • Improve handling of ID columns

    • Improve exception handling for more stable raising of Python exceptions

    • Improve exception handling when any individual transformer or model throws an exception or segfaults

    • Improve robustness of restart and refit experiment to changes in experiment choices

    • Improve handling of missing values when transforming dataset

    • Improve robustness of custom recipe importing of modules

    • Improve documentation for installation instructions

    • Improve selection of initial lag sizes for time series

    • Improve LightGBM stability for regression problems for certain mutation parameters

  • Documentation updates:

    • Improved documentation for time-series experiments

    • Added topics describing how to re-enable the Data Recipe URL and Data Recipe File connectors

    • For users running older versions of the Standalone Python Scoring Pipeline, added information describing how to install upgraded versions of outdated dependencies

    • Improved the description for the “Sampling Method for Imbalanced Binary Classification Problems” expert setting

    • Added constraints related to the REST server deployments

    • Noted required vs optional parameters in the HDFS connector topics

    • Added an FAQ indicating that MOJOs are thread safe

    • On Windows 10, only Docker installs are supported

    • Added information about the Recommendations AutoViz graph

    • Added information to the Before you Begin Installing topic that master.db files are not backward compatible with earlier Driverless AI versions

  • Bug fixes:

    • Update LightGBM for bug fixes, including hangs and avoiding hard-coded library paths

    • Stabilize use of psutil package

    • Fix time-series experiments when test set has missing target values

    • Fix python scoring to not depend upon original data_directory

    • Fix preview for custom time series validation splits and low accuracy

    • Fix ignored minimum lag size setting for single time series

    • Fix parsing of Excel files with datetime columns

    • Fix column type detection for columns with mostly missing values

    • Fix invalid display of 0.0000 score in iteration scores

    • Various MLI fixes (don’t show invalid graphs, fix PDP sort order, overlapping labels)

    • Various bug fixes

Version 1.8.4.1 LTS (Feb 4, 2020)

  • Add option for dynamic port allocation

  • Documentation for AWS community AMI

  • Various bug fixes (MLI UI)

Version 1.8.4 LTS (Jan 31, 2020)

  • New Features:

    • Added ‘Scores’ tab in experiment page to show detailed tuning tables and scores for models and folds

    • Added Constant Model (constant predictions) and used it as the reference model by default

    • Show score of global constant predictions in experiment summary as reference

    • Added support for setting up mutual TLS for Driverless AI

    • Added option to use client/personal certificate as an authentication method

  • Documentation Updates:

    • Added sections for enabling mTLS and Client Certificate authentication

    • Constant Model is now included in the list of Supported Algorithms

    • Added a section describing the Model Scores page

    • Improved the C++ Scoring Pipeline documentation describing the process for importing datatable

    • Improved documentation for the Java Scoring Pipeline

  • Bug fixes:

    • Fix refitting of final pipeline when new features are added

    • Various bug fixes

Version 1.8.3 LTS (Jan 22, 2020)

  • Added option to upload experiment artifacts to a configured disk location

  • Various bug fixes (correct feature engineering from time column, migration for brain restart)

Version 1.8.2 LTS (Jan 17, 2020)

  • New Features:

    • Decision Tree model

      • Automatically enabled for accuracy <= 7 and interpretability >= 7

      • Supports all problem types: regression/binary/multiclass

      • Uses the LightGBM GPU/CPU backend, with MOJO support

      • Visualization of tree splits and leaf node decisions as part of pipeline visualization

    • Per-Column Imputation Scheme (experimental)

      • Select one of the [const, mean, median, min, max, quantile] imputation schemes at the start of the experiment

      • Select method of calculation of imputation value: either on entire dataset or inside each pipeline’s training data split

      • Disabled by default and must be enabled at startup time to be effective

    • Show MOJO size and scoring latency (for C++/R/Python runtime) in experiment summary

    • Automatically prune low weight base models in final ensemble (based on interpretability setting) to reduce final model complexity

    • Automatically convert non-raw GitHub URLs for custom recipes to raw source code URLs

  • Improvements:

    • Speed up feature evolution for time-series and low-accuracy experiments

    • Improved accuracy of feature evolution algorithm

    • Feature transformer interpretability, total count, and importance are now accounted for in the genetic algorithm’s model and feature selection

    • Binary confusion matrix in ROC curve of experiment page is made consistent with Diagnostics (flipped positions of TP/TN)

    • Only include custom recipes in Python scoring pipeline if the experiment uses any custom recipes

    • Additional documentation (New OpenID config options, JDBC data connector syntax)

    • Improved AutoReport’s transformer descriptions

    • Improved progress reporting during Autoreport creation

    • Improved speed of automatic interaction search for imbalanced multiclass problems

    • Improved accuracy of single final model for GLM and FTRL

    • Allow config_overrides to be a list/vector of parameters for R client API

    • Disable early stopping for Random Forest models by default, and expose new ‘rf_early_stopping’ mode (optional)

    • Create identical example data (again, as in 1.8.0 and before) for all scoring pipelines

    • Upgraded versions of datatable and Java

    • Installed graphviz in the Docker image, so a .png file of the pipeline visualization is now included in the MOJO package and Autoreport. Note: For RPM/DEB/TAR SH installs, users can install graphviz to get this optional functionality

  • Documentation Updates:

    • Added a simple example for modifying a dataset by recipe using live code

    • Added a section describing how to impute datasets (experimental)

    • Added Decision Trees to list of supported algorithms

    • Fixed examples for enabling JDBC connectors

    • Added information describing how to use a JDBC driver that is not tested in house

    • Updated the Missing Values Handling topic to include sections for “Clustering in Transformers” and “Isolation Forest Anomaly Score Transformer”

    • Improved the “Fold Column” description

  • Bug Fixes:

    • Fix various causes of the final model score being too far off from the best feature evolution score

    • Delete temporary files created during test set scoring

    • Fixed target transformer tuning (was potentially mixing up target transformers between feature evolution and final model)

    • Fixed tensorflow_nlp_have_gpus_in_production=true mode

    • Fixed partial dependence plots for missing datetime values and no longer show them for text columns

    • Fixed time-series GUI for quarterly data

    • Limit feature transformer exploration to no more than 1000 new features (small data with 10/10/1 settings would try too many features)

    • Fixed Kaggle pipeline building recipe to try more than 8 input features

    • Fixed cursor placement in live code editor for custom data recipe

    • Show correct number of cross-validation splits in pipeline visualization if there are more than 10 splits

    • Fixed parsing of datetime in MOJO for some datetime formats without ‘%d’ (day)

    • Various bug fixes

  • Backward/Forward compatibility:

    • Models built in 1.8.2 LTS will remain supported in upcoming versions 1.8.x LTS

    • Models built in 1.7.1/1.8.0/1.8.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)

    • Models built in 1.7.0 or earlier will be deprecated
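
The per-column imputation scheme added in 1.8.2 offers [const, mean, median, min, max, quantile] fill values, computed either on the entire dataset or inside each pipeline’s training data split. A minimal sketch of the difference, assuming None marks a missing value and a simple nearest-index quantile (one of several common definitions); the function names are hypothetical and not part of any Driverless AI API:

```python
import statistics

SCHEMES = ("const", "mean", "median", "min", "max", "quantile")


def imputation_value(values, scheme, const=0.0, q=0.5):
    """Compute a fill value from the non-missing entries (None marks a missing value)."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if scheme == "const":
        return const
    observed = sorted(v for v in values if v is not None)
    if not observed:
        raise ValueError(f"no observed values to compute {scheme!r} from")
    if scheme == "mean":
        return statistics.mean(observed)
    if scheme == "median":
        return statistics.median(observed)
    if scheme == "min":
        return observed[0]
    if scheme == "max":
        return observed[-1]
    # "quantile": round q * (n - 1) to the nearest index of the sorted values.
    return observed[int(q * (len(observed) - 1) + 0.5)]


def impute(values, fill_value):
    """Replace missing entries with the precomputed fill value."""
    return [fill_value if v is None else v for v in values]


train_split = [1.0, None, 3.0]
valid_split = [None, 100.0]

# Option 1: statistic computed inside the training split only.
fill = imputation_value(train_split, "mean")
print(impute(valid_split, fill))  # [2.0, 100.0]

# Option 2: statistic computed on the entire dataset (validation rows included).
fill_all = imputation_value(train_split + valid_split, "mean")
print(round(fill_all, 2))  # 34.67
```

Computing the statistic inside the training split keeps validation values (such as the outlier 100.0 above) from influencing the fill value, which avoids a mild form of leakage at the cost of recomputing it per split.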

Version 1.8.1.1 (Dec 21, 2019)

  • Bugfix for time series experiments with quarterly data when launched from GUI

Version 1.8.1 (Dec 10, 2019)

  • New Features:

    • Full set of scoring metrics and corresponding downloadable holdout predictions for experiments with single final models (time-series or i.i.d.)

    • MLI Updates:

      • What-If (sensitivity) analysis

      • Interpretation of experiments on text data (NLP)

    • Custom Data Recipe BYOR:

      • BYOR (bring your own recipe) in Python: pandas, numpy, datatable, third-party libraries for fast prototyping of connectors and data preprocessing inside DAI

      • data connectors, cleaning, filtering, aggregation, augmentation, feature engineering, splits, etc.

      • can create one or multiple datasets from scratch or from existing datasets

      • interactive code editor with live preview

      • example code at https://github.com/h2oai/driverlessai-recipes/tree/rel-1.8.1/data

    • Visualization of final scoring pipeline (Experimental)

      • In-GUI display of graph of feature engineering, modeling and ensembling steps of entire machine learning pipeline

      • Added to Autodoc

    • Time-Series:

      • Ability to specify which features will be unavailable at test time for time-series experiments

      • Custom user-provided train/validation splits (by start/end datetime for each split) for time-series experiments

      • Back-testing metrics for time-series experiments (regression and classification, with and without lags) based on rolling windows (configurable number of windows)

    • MOJO:

      • Java MOJO for FTRL

      • PyTorch MOJO (C++/Py/R) for custom recipes based on BERT/DistilBERT NLP models (available upon request)

  • Improvements:

    • Accuracy:

      • Automatic pairwise interaction search (+,-,*,/) for numeric features (“magic feature” finder)

      • Improved accuracy for time series experiments with low interpretability

      • Improved leakage detection logic

      • Improved genetic algorithm heuristics for feature evolution (more exploration)

    • Time-Series Recipes:

      • Re-enable Test-time augmentation in Python scoring pipeline for time-series experiments

      • Reduce default number of time-series rolling holdout predictions to same number as validation splits (but configurable)

    • Computation:

      • Faster feature evolution part for non-time-series experiments with single final model

      • Faster binary imbalanced models for very high class imbalance by limiting internal number of re-sampling bags

      • Faster feature selection

      • Enable GPU support for ImbalancedXGBoostGBMModel

      • Improved speed for importing multiple files at once

      • Faster automatic determination of time series properties

      • Enable use of XGBoost models on large datasets if accuracy settings are low enough, and expose dataset size limits in expert settings

      • Reduced memory usage for all experiments

      • Faster creation of holdout predictions for time-series experiments (Shapley values are now computed by MLI on demand by default)

    • UX Improvements:

      • Added ability to rename datasets

      • Added search bar for expert settings

      • Show traces for long-running experiments

      • All experiments create a MOJO (if possible, set to ‘auto’)

      • All experiments create a pipeline visualization

      • By default, all experiments (iid and time series) have holdout predictions on training data and a full set of metrics for final model

  • Documentation Updates:

    • Updated steps for enabling GPU persistence mode

    • Added information about deprecated NVIDIA functions

    • Improved documentation for enabling LDAP authentication

    • Added information about changing the column type in datasets

    • Updated list of experiment artifacts available in an experiment summary

    • Added steps describing how to expose ports on Docker for the REST service deployment within the Driverless AI Docker container

    • Added an example showing how to run an experiment with a custom transform recipe

    • Improved the FAQ for setting up TLS/SSL

    • Added FAQ describing issues that can occur when attempting Import Folder as File with a data connector on Windows

  • Bug Fixes:

    • Allow brain restart/refit to accept unscored previous pipelines

    • Fix actual vs predicted labeling for diagnostics of regression model

    • Fix MOJO for TensorFlow models with target transformers other than identity

    • Fix column type detection for Excel files

    • Allow experiments with default expert settings to have a MOJO

    • Various bug fixes
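
The automatic pairwise interaction search (+,-,*,/) listed under 1.8.1 Accuracy improvements combines pairs of numeric features into candidate “magic features”. A minimal sketch of the idea, assuming candidates are simply ranked by absolute Pearson correlation with the target; Driverless AI’s actual search and scoring are not described in these notes, and every name below is hypothetical:

```python
import itertools
import math

OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else math.nan,  # avoid ZeroDivisionError
}


def pairwise_interactions(columns):
    """Build +, -, *, / candidates for every unordered pair of numeric columns.

    `columns` maps a column name to a list of floats; all lists have the same length.
    """
    candidates = {}
    for (name_a, col_a), (name_b, col_b) in itertools.combinations(columns.items(), 2):
        for symbol, op in OPERATIONS.items():
            candidates[f"{name_a}{symbol}{name_b}"] = [op(a, b) for a, b in zip(col_a, col_b)]
    return candidates


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, or 0.0 when undefined (NaNs or a constant column)."""
    if any(math.isnan(x) for x in xs):
        return 0.0
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return abs(cov / (sd_x * sd_y))


def best_interactions(columns, target, top_k=3):
    """Rank candidate interactions by absolute correlation with the target."""
    candidates = pairwise_interactions(columns)
    ranked = sorted(candidates, key=lambda name: abs_correlation(candidates[name], target),
                    reverse=True)
    return ranked[:top_k]


columns = {"x": [1.0, 2.0, 3.0, 4.0], "y": [4.0, 1.0, 3.0, 2.0]}
target = [4.0, 2.0, 9.0, 8.0]  # equals x * y
print(best_interactions(columns, target, top_k=2))  # ['x*y', 'x+y']
```

With k numeric columns this produces 4 * k * (k - 1) / 2 candidates, which is why a real search would need to prune or sample rather than score every pair on wide data.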

Version 1.8.0 (Oct 3, 2019)

  • Improve speed and memory usage for feature engineering

  • Improve speed of leakage and shift detection, and improve accuracy

  • Improve speed of AutoVis under high system load

  • Improve speed for experiments with large user-given validation data

  • Improve accuracy of ensembles for regression problems

  • Improve creation of Autoreport (only one background job per experiment)

  • Improve sampling techniques for ImbalancedXGBoost and ImbalancedLightGBM models, and disable them by default since they can be slower

  • Add Python/R/C++ MOJO support for FTRL and RandomForest

  • Add native categorical handling for LightGBM in CPU mode

  • Add monotonicity constraints support for LightGBM

  • Add Isolation Forest Anomaly Score transformer (outlier detection)

  • Re-enable One-Hot-Encoding for GLM models

  • Add lexicographical label encoding (disabled by default)

  • Add ability to further train user-provided pretrained embeddings for TensorFlow NLP transformers, in addition to fine-tuning the rest of the neural network graph

  • Add timeout for BYOR acceptance tests

  • Add log and notifications for large shifts in final model variable importances compared to tuning model

  • Add more expert control over time series feature engineering

  • Add ability for recipes to be uploaded in bulk as an entire (or part of a) GitHub repository or as links to Python files on a page

  • Allow missing values in fold column

  • Add support for feature brain when starting “New Model With Same Parameters” of a model that was previously restarted

  • Add support for toggling whether additional features are to be included in pipeline during “Retrain Final Pipeline”

  • Limit experiment runtime to one day by default (approximately enforced, can be configured in Expert Settings -> Experiment or config.toml ‘max_runtime_minutes’)

  • Add support for importing pickled Pandas frames (.pkl)

  • MLI updates:

    • Show holdout predictions and test set predictions (if applicable) in MLI TS for both metric and actual vs. predicted charts

    • Add ability to download group metrics in MLI TS

    • Add ability to zoom into charts in MLI TS

    • Add ability to use column not used in DAI model as a k-LIME cluster column in MLI

    • Add ability to view original and transformed DAI model-based feature importance in MLI

    • Add ability to view Shapley importance for original features

    • Add ability to view permutation importance for a DAI model when the config option autodoc_include_permutation_feature_importance is set to on

    • Fixed bug in binary Disparate Impact Analysis, which caused incorrect calculations amongst several metrics (ones using false positives and true negatives in the numerator)

  • Disable NLP TensorFlow transformers by default (enable in NLP expert settings by switching to “on”)

  • Reorganize expert settings, add tab for feature engineering

  • Experiments now report whether they were aborted by the user, the system, or a server restart

  • Reduce load of all tasks launched by server, giving priority to experiments to use cores

  • Add experiment summary files to aborted experiment logs

  • Add warning when the ensemble has models that reach the maximum iteration limit despite early stopping, with learning rate controls in the expert panel to address this

  • Improve progress reporting

  • Allow disabling of H2O recipe server for scoring if not using custom recipes (to avoid Java dependency)

  • Fix RMSPE scorer

  • Fix recipes error handling when uploading via URL

  • Fix Autoreport being spawned anytime GUI was on experiment page, overloading the system with forks from the server

  • Fix time-out for Autoreport PDP calculations so that it completes more quickly

  • Fix certain config settings to be honored from GUI expert settings (woe_bin_list, ohe_bin_list, text_gene_max_ngram, text_gene_dim_reduction_choice, tensorflow_max_epochs_nlp, tensorflow_nlp_pretrained_embeddings_file_path, holiday_country), previously were only honored when provided at startup time

  • Fix column type for additional columns during scored test set download

  • Fix GUI incorrectly converting time for forecast horizon in TS experiments

  • Fix calculation of correlation for string columns in AutoVis

  • Fix download for R MOJO runtime

  • Fix parameters for LightGBM RF mode

  • Fix dart parameters for LightGBM and XGBoost

  • Documentation updates:

    • Included more information in the Before You Begin Installing or Upgrading topic to help make installations and upgrades go more smoothly

    • Added topic describing how to choose between the AWS Community and AWS Marketplace AMIs

    • Added information describing how to retrieve the MOJO2 Javadoc

    • Updated Python client examples to work with Driverless AI 1.7.x releases

    • Updated documentation for new features, expert settings, MLI plots, etc.

  • Backward/Forward compatibility:

    • Models built in 1.8.0 will remain supported in versions 1.8.x

    • Models built in 1.7.1 are not deprecated and should continue to work (best effort is made to preserve MOJO and Autoreport creation, MLI, scoring, etc.)

    • 1.8.0 upgraded to scipy version 1.3.1 to support newer custom recipes. This might deprecate custom recipes that depend on scipy version 1.2.2 (and experiments using them) and might require re-import of those custom recipes. Previously built Python scoring pipelines will continue to work.

    • Models built in 1.7.0 or earlier will be deprecated

  • Various bug fixes
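
Lexicographical label encoding (added in 1.8.0, disabled by default) assigns integer codes to categories in sorted order, so a category’s code depends only on which categories are present, not on row order. A minimal sketch, with an encoding by order of first appearance shown purely for contrast; both function names are hypothetical, and neither is claimed to be Driverless AI’s implementation or its default behavior:

```python
def lexicographic_label_encoding(values):
    """Map each distinct category to its rank in sorted (lexicographic) order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def first_seen_label_encoding(values):
    """Comparison only: assign codes in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping


fruit = ["pear", "apple", "pear", "fig"]
print(lexicographic_label_encoding(fruit)[0])  # [2, 0, 2, 1]
print(first_seen_label_encoding(fruit)[0])     # [0, 1, 0, 2]
```

Shuffling the rows changes the first-appearance codes but leaves the lexicographic codes unchanged.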

Version 1.7.1 (Aug 19, 2019)

  • Added two new models with internal sampling techniques for imbalanced binary classification problems: ImbalancedXGBoost and ImbalancedLightGBM

  • Added support for rolling-window based predictions for time-series experiments (2 options: test-time augmentation or re-fit)

  • Added support for setting logical column types for a dataset (to override type detection during experiments)

  • Added ability to set experiment name at start of experiment

  • Added leakage detection for time-series problems

  • Added JDBC connector

  • MOJO updates:

    • Added Python/R/C++ MOJO support for TensorFlow model

    • Added Python/R/C++ MOJO support for TensorFlow NLP transformers: TextCNN, CharCNN, BiGRU, including any pretrained embeddings if provided

    • Reduced memory usage for MOJO creation

    • Increased speed of MOJO creation

    • Configuration options for MOJO and Python scoring pipelines now have 3-way toggle: “on”/”off”/”auto”

  • MLI updates:

    • Added disparate impact analysis (DIA) for MLI

    • Allow MLI scoring pipeline to be built for datasets with column names that need to be sanitized

    • Date-aware binning for partial dependence and ICE in MLI

  • Improved generalization performance for time-series modeling with regularization techniques for lag-based features

  • Improved “predicted vs actual” plots for regression problems (using adaptive point sizes)

  • Fix bug in datatable for manipulations of string columns larger than 2GB

  • Fixed download of predictions on user-provided validation data

  • Fix bug in time-series test-time augmentation (work-around was to include entire training data in test set)

  • Honor the expert settings flag to enable detailed traces (disabled by default again)

  • Various bug fixes

Version 1.6.4 LTS (Aug 19, 2019)

Available here

  • ML Core updates:

    • Speed up schema detection

    • DAI now drops rows with missing values when diagnosing regression problems

    • Speed up column type detection

    • Fixed growth of individuals

    • Fixed n_jobs for predict

    • Target column is no longer included in predictors for skewed datasets

    • Added an option to prevent users from downloading data files locally

    • Improved UI split functionality

    • Added a new “max_listing_items” config option to limit the number of items fetched in listing pages

  • Model Ops updates:

    • MOJO runtime upgraded to version 2.1.3 which supports perpetual MOJO pipeline

    • Upgraded deployment templates to the version matching the MOJO runtime version

  • MLI updates:

    • Fix to MLI schema builder

    • Fix parsing of categorical reason codes

    • Added ability to handle integer time column

  • Various bug fixes

Version 1.7.0 (Jul 7, 2019)

Available here

  • Support for Bring Your Own Recipe (BYOR) for transformers, models (algorithms) and scorers

  • Added protobuf-based MOJO scoring runtime libraries for Python, R and Java (standalone, low-latency)

  • Added local REST server as one-click deployment option for MOJO scoring pipeline, in addition to AWS Lambda endpoint

  • Added R client package, in addition to Python client

  • Added Project workspace to group datasets and experiments and to visually compare experiments and create leaderboards

  • Added download of imported datasets as .csv

  • Recommendations for columnar transformations in AutoViz

  • Improved scalability and performance

  • Ability to set a maximum runtime for experiments

  • Create MOJO scoring pipeline by default if the experiment configuration allows (for convenience, enables local/cloud deployment options without user input)

  • Support for user provided pre-trained embeddings for TensorFlow NLP models

  • Support for holdout splits lacking some target classes (can happen when a fold column is provided)

  • MLI updates:

    • Added residual plot for regression problems (keeping all outliers intact)

    • Added confusion matrix as default metric display for multinomial problems

    • Added Partial Dependence (PD) and Individual Conditional Expectation (ICE) plots for Driverless AI models in MLI GUI

    • Added ability to search by ID column in MLI GUI

    • Added ability to run MLI PD/ICE on all features

    • Added ability to handle multiple observations for a single time column in MLI TS by taking the mean of the target and prediction where applicable

    • Added ability to handle integer time column in MLI TS

    • MLI TS will use train holdout predictions if there is no test set provided

  • Faster import of files with “%Y%m%d” and “%Y%m%d%H%M” time format strings, and files with lots of text strings

  • Fix units for RMSPE scorer to be a percentage (multiply by 100)

  • Allow non-positive outcomes for MAPE and SMAPE scorers

  • Improved listing in GUI

  • Allow zooming in GUI

  • Upgrade to TensorFlow 1.13.1 and CUDA 10 (and CUDA is part of the distribution now, to simplify installation)

  • Added CPU support for TensorFlow on PPC

  • Documentation updates:

    • Added documentation for new features, including:

      • Projects

      • Custom Recipes

      • C++ MOJO Scoring Pipelines

      • R Client API

      • REST Server Deployment

    • Added information about variable importance values on the experiments page

    • Updated documentation for Expert Settings

    • Updated “Tips n Tricks” with new Scoring Pipeline tips

  • Various bug fixes
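The RMSPE fix in this release changes only the units of the score. As a minimal sketch, assuming the common definition RMSPE = sqrt(mean(((y - ŷ) / y)²)) (Driverless AI's own implementation, including how it treats zero targets, is not described in these notes), the hypothetical helper below shows that the before/after difference is a factor of 100:

```python
import math


def rmspe(actual, predicted, as_percentage=True):
    """Root mean squared percentage error (hypothetical helper, not the DAI scorer).

    Assumes every actual value is non-zero, because the relative error
    (y - y_hat) / y is undefined otherwise.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((y - p) / y) ** 2 for y, p in zip(actual, predicted)]
    ratio = math.sqrt(sum(squared) / len(squared))
    # Pre-fix style: a raw ratio such as 0.1; post-fix style: a percentage such as 10.0.
    return 100.0 * ratio if as_percentage else ratio


# Relative errors of -10% and +10% give an RMSPE of about 10 (percent).
print(rmspe([100, 200], [110, 180]))
print(rmspe([100, 200], [110, 180], as_percentage=False))
```

Reporting the percentage form keeps RMSPE on the same scale as MAPE and SMAPE, which are also usually quoted in percent.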

Version 1.6.3 LTS (June 14, 2019)

Available here

  • Included an Audit log feature

  • Fixed support for decimal types for parquet files in MOJO

  • Autodoc can order PDP/ICE by feature importance

  • Session Management updates

  • Upgraded datatable

  • Improved reproducibility

  • Model diagnostics now uses a weight column

  • MLI can build surrogate models on all the original features or on all the transformed features that DAI uses

  • Internal server cache now respects usernames

  • Fixed an issue with time series settings

  • Fixed an out of memory error when loading a MOJO

  • Fixed Python scoring package for TensorFlow

  • Added OpenID configurations

  • Documentation updates:

    • Updated the list of artifacts available in the Experiment Summary

    • Clarified language in the documentation for unsupported (but available) features

    • For the Terraform requirement in deployments, clarified that only Terraform 0.11.x releases are supported, specifically 0.11.10 or later

    • Fixed link to the Miniconda installation instructions

  • Various bug fixes

Version 1.6.2 LTS (May 10, 2019)

Available here

  • This version provides PPC64le artifacts

  • Improved stability of datatable

  • Improved path filtering in the file browser

  • Fixed units for RMSPE scorer to be a percentage (multiply by 100)

  • Fixed segmentation fault on Ubuntu 18 with installed font package

  • Fixed IBM Spectrum Conductor authentication

  • Fixed handling of EC2 machine credentials

  • Fixed Lag transformer configuration

  • Fixed KDB and Snowflake error reporting

  • Gradually reduce the number of workers used for column statistics computation in case of failure

  • Hide the default Tornado header that exposes the Tornado version in use

  • Documentation updates:

    • Added instructions for installing via AWS Marketplace

    • Improved documentation for installing via Google Cloud

    • Improved FAQ documentation

    • Added Data Sampling documentation topic

  • Various bug fixes

Version 1.6.1.1 LTS (Apr 24, 2019)

Available here

  • Fix in AWS role handling.

Version 1.6.1 LTS (Apr 18, 2019)

Available here

  • Several fixes for MLI (partial dependence plots, Shapley values)

  • Improved documentation for model deployment, time-series scoring, AutoVis and FAQs

Version 1.6.0 LTS (Apr 5, 2019)

Private build only.

  • Fixed import of string columns larger than 2GB

  • Fixed AutoViz crashes on Windows

  • Fixed quantile binning in MLI

  • Plot global absolute mean Shapley values instead of global mean Shapley values in MLI

  • Improvements to PDP/ICE plots in MLI

  • Validated Terraform version in AWS Lambda deployment

  • Added support for NULL variable importance in AutoDoc

  • Made Variable Importance table size configurable in AutoDoc

  • Improved support for various combinations of data import options being enabled/disabled

  • CUDA is now part of distribution for easier installation

  • Security updates:

    • Enforced SSL settings to be honored for all h2oai_client calls

    • Added config option to prevent using LocalStorage in the browser to cache information

    • Upgraded Tornado server version to 5.1.1

    • Improved session expiration and autologout functionality

    • Disabled access to Driverless AI data folder in file browser

    • Provided an option to filter content that is shown in the file browser

    • Use login name for HDFS impersonation instead of predefined name

    • Disabled autocomplete in login form

  • Various bug fixes

Version 1.5.4 (Feb 24, 2019)

Available here

  • Speed up calculation of column statistics for date/datetime columns using certain formats (now uses ‘max_rows_col_stats’ parameter)

  • Added computation of standard deviation for variable importances in experiment summary files

  • Added computation of shift of variable importances between feature evolution and final pipeline

  • Fix link to MLI Time-Series experiment

  • Fix display bug for iteration scores for long experiments

  • Fix display bug for early finish of experiment for GLM models

  • Fix display bug for k-LIME when target is skewed

  • Fix display bug for forecast horizon in MLI for Time-Series

  • Fix MLI for Time-Series for single time group column

  • Fix in-server scoring of time-series experiments created in 1.5.0 and 1.5.1

  • Fix OpenBLAS dependency

  • Detect disabled GPU persistence mode in Docker

  • Reduce disk usage during TensorFlow NLP experiments

  • Reduce disk usage of aborted experiments

  • Refresh reported size of experiments during start of application

  • Disable TensorFlow NLP transformers by default to speed up experiments (can enable in expert settings)

  • Improved progress percentage shown during experiment

  • Improved documentation (upgrade on Windows, how to create the simplest model, DTap connectors, etc.)

  • Various bug fixes

Version 1.5.3 (Feb 8, 2019)

Available here

  • Added support for splitting datasets by time via time column containing date, datetime or integer values

  • Added option to disable file upload

  • Require authentication to download experiment artifacts

  • Automatically drop predictor columns from the training frame, with a warning, if they are not found in the validation or test frame

  • Improved performance by using physical CPU cores only (configurable in config.toml)

  • Added option to not show inactive data connectors

  • Various bug fixes

Version 1.5.2 (Feb 2, 2019)

Available here

  • Added word-level bidirectional GRU TensorFlow models for NLP features

  • Added character-level CNN TensorFlow models for NLP features

  • Added support to import multiple individual datasets at once

  • Added support for holdout predictions for time-series experiments

  • Added support for regression and multinomial classification for FTRL (in addition to binomial classification)

  • Improved scoring for time-series when test data contains actual target values (missing target values will be predicted)

  • Reduced memory usage for LightGBM models

  • Improved performance for feature engineering

  • Improved speed for TensorFlow models

  • Improved MLI GUI for time-series problems

  • Fix final model fold splits when fold_column is provided

  • Various bug fixes

Version 1.5.1 (Jan 22, 2019)

Available here

  • Fix MOJO for GLM

  • Add back .csv file of experiment summary

  • Improve collection of pipeline timing artifacts

  • Clean up Docker tag

Version 1.5.0 (Jan 18, 2019)

Available here

  • Added model diagnostics (interactive model metrics on new test data incl. residual analysis for regression)

  • Added FTRL model (Follow The Regularized Leader)

  • Added Kolmogorov-Smirnov metric (degree of separation between positives and negatives)

  • Added ability to retrain (only) the final model on new data

  • Added one-hot encoding for low-cardinality categorical features, for GLM

  • Added choice between 32-bit (now default) and 64-bit precision

  • Added system information (CPU, GPU, disk, memory, experiments)

  • Added support for time-series data with many more time gaps, and with weekday-only data

  • Added one-click deployment to AWS Lambda

  • Added ability to split datasets randomly, with option to stratify by target column or group by fold column

  • Added support for OpenID authentication

  • Added connector for BlueData

  • Improved responsiveness of the GUI under heavy load situations

  • Improved speed and reduced memory footprint of feature engineering

  • Improved performance for RuleFit models and enabled GPU and multinomial support

  • Improved auto-detection of temporal frequency for time-series problems

  • Improved accuracy of final single model if external validation provided

  • Improved final pipeline if external validation data is provided (adds ensembling)

  • Improved k-LIME in MLI by using original features deemed important by DAI instead of all original features

  • Improved MLI by using 3-fold CV by default for all surrogate models

  • Improved GUI for MLI time series (integrated help, better integration)

  • Added ability to view MLI time series logs while MLI time series experiment is running

  • PDF version of the Automatic Report (AutoDoc) is now replaced by a Word version

  • Various bug fixes (GLM accuracy, UI slowness, MLI UI, AutoVis)
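The Kolmogorov-Smirnov (KS) metric added in this release is described only as the degree of separation between positives and negatives. A common formulation, assumed here rather than taken from Driverless AI's source, is the largest vertical gap between the empirical cumulative distributions of the predicted scores for the two classes. A minimal, dependency-free sketch (the function name is hypothetical):

```python
import bisect


def ks_statistic(labels, scores):
    """Two-sample KS statistic between the score distributions of
    positives (label 1) and negatives (label 0).

    Returns a value in [0, 1]; 1 means the classes are perfectly separated.
    """
    pos = sorted(s for y, s in zip(labels, scores) if y == 1)
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # Both empirical CDFs only change at observed scores, so checking
    # every distinct score is enough to find the maximum gap.
    for t in sorted(set(scores)):
        cdf_pos = bisect.bisect_right(pos, t) / len(pos)
        cdf_neg = bisect.bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # perfectly separated
print(ks_statistic([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9]))  # overlapping scores
```

SciPy exposes the same two-sample statistic as `scipy.stats.ks_2samp`, which can serve as a cross-check on larger samples.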

Version 1.4.2 (Dec 3, 2018)

Available here

  • Support for IBM Power architecture

  • Speed up training and reduce size of final pipeline

  • Reduced resource utilization during training of final pipeline

  • Display test set metrics (ROC, ROCPR, Gains, Lift) in GUI in addition to validation metrics (if test set provided)

  • Show location of best threshold for Accuracy, MCC and F1 in ROC curves

  • Add relative point sizing for scatter plots in AutoVis

  • Fix file upload and add model checkpointing in Python client API

  • Various bug fixes

Version 1.4.1 (Nov 11, 2018)

Available here

  • Improved integration of MLI for time-series

  • Reduced disk and memory usage during final ensemble

  • Allow scoring and transformations on previously imported datasets

  • Enable checkpoint restart for unfinished models

  • Add startup checks for OpenCL platforms for LightGBM on GPUs

  • Improved feature importances for ensembles

  • Faster dataset statistics for date/datetime columns

  • Faster MOJO batch scoring

  • Fix potential hangs

  • Fix ‘not in list’ error in MOJO

  • Fix NullPointerException in MLI

  • Fix outlier detection in AutoVis

  • Various bug fixes

Version 1.4.0 (Oct 27, 2018)

Available here

  • Enable LightGBM by default (now with MOJO)

  • LightGBM tuned for GBM decision trees, Random Forest (rf), and Dropouts meet Multiple Additive Regression Trees (dart)

  • Add ‘isHoliday’ feature for time columns

  • Add ‘time’ column type for date/datetime columns in data preview

  • Add support for binary datatable file ingest in .jay format

  • Improved final ensemble (each model has its own feature pipeline)

  • Automatic smart checkpointing (feature brain) from prior experiments

  • Add kdb+ connector

  • Feature selection of original columns to handle data with many (>>100) columns

  • Improved time-series recipe (multiple validation splits, better logic)

  • Improved performance of AutoVis

  • Improved date detection logic (now detects %Y%m%d and %Y-%m date formats)

  • Automatic fallback to CPU mode if GPU runs out of memory (for XGBoost, GLM and LightGBM)

  • No longer require header for validation and testing datasets if data types match

  • No longer include text columns for data shift detection

  • Add support for time-series models in MLI (including ability to select time-series groups)

  • Add ability to download MLI logs from MLI experiment page (includes both Python and Java logs)

  • Add ability to view MLI logs while MLI experiment is running (Python and Java logs)

  • Add ability to download LIME and Shapley reason codes from MLI page

  • Add ability to run MLI on transformed features

  • Display all variables for MLI variable importance for both DAI and surrogate models in MLI summary

  • Include variable definitions for DAI variable importance list in MLI summary

  • Fix Gains/Lift charts when observation weights are given

  • Various bug fixes

Version 1.3.1 (Sep 12, 2018)

Available here

  • Fix ‘Broken pipe’ failures for TensorFlow models

  • Fix time-series problems with categorical features and interpretability >= 8

  • Various bug fixes

Version 1.3.0 (Sep 4, 2018)

Available here

  • Added LightGBM models - now have [XGBoost, LightGBM, GLM, TensorFlow, RuleFit]

  • Added TensorFlow NLP recipe based on CNN deep learning models (sentiment analysis, document classification, etc.)

  • Added MOJO for GLM

  • Added detailed confusion matrix statistics

  • Added more expert settings

  • Improved data exploration (columnar statistics and row-based data preview)

  • Improved speed of feature evolution stage

  • Improved speed of GLM

  • Report single-pass score on external validation and test data (instead of bootstrap mean)

  • Reduced memory overhead for data processing

  • Reduced number of open files - fixes ‘Bad file descriptor’ error on Mac/Docker

  • Simplified Python client API

  • Query any data point in the MLI UI from the original dataset due to “on-demand” reason code generation

  • Enhanced k-means clustering in k-LIME by only using a subset of features. See The K-LIME Technique for more information.

  • Report k-means centers for k-LIME in MLI summary for better cluster interpretation

  • Improved MLI experiment listing details

  • Various bug fixes

Version 1.2.2 (July 5, 2018)

Available here

  • MOJO Java scoring pipeline for time-series problems

  • Multi-class confusion matrices

  • AUCMACRO Scorer: Multi-class AUC via macro-averaging (in addition to the default micro-averaging)

  • Expert settings (configuration override) for each experiment from GUI and client APIs.

  • Support for HTTPS

  • Improved downsampling logic for time-series problems (if enabled through accuracy knob settings)

  • LDAP read-only access to Active Directory

  • Snowflake data connector

  • Various bug fixes

Version 1.2.1 (June 26, 2018)

  • Added LIME-SUP (alpha) to MLI as alternative to k-LIME (local regions are defined by decision tree instead of k-means)

  • Added RuleFit model (alpha), now have [GBM, GLM, TensorFlow, RuleFit] - TensorFlow and RuleFit are disabled by default

  • Added Minio (private cloud storage) connector

  • Added support for importing folders from S3

  • Added ‘Upload File’ option to ‘Add Dataset’ (in addition to drag & drop)

  • Predictions for binary classification problems now have 2 columns (probabilities per class), for consistency with multi-class

  • Improved model parameter tuning

  • Improved feature engineering for time-series problems

  • Improved speed of MOJO generation and loading

  • Improved speed of time-series related automatic calculations in the GUI

  • Fixed potential rare hangs at end of experiment

  • No longer require internet to run MLI

  • Various bug fixes

Version 1.2.0 (June 11, 2018)

  • Time-Series recipe

  • Low-latency standalone MOJO Java scoring pipelines (now beta)

  • Enable Elastic Net Generalized Linear Modeling (GLM) with lambda search (and GPU support), for interpretability>=6 and accuracy<=5 by default (alpha)

  • Enable TensorFlow (TF) Deep Learning models (with GPU support) for interpretability=1 and/or multi-class models (alpha, enable via config.toml)

  • Support for pre-tuning of [GBM, GLM, TF] models for picking best feature evolution model parameters

  • Support for final ensemble consisting of mix of [GBM, GLM, TF] models

  • Automatic Report (AutoDoc) in PDF and Markdown format as part of summary zip file

  • Interactive tour (assistant) for first-time users

  • MLI now runs on experiments from previous releases

  • Surrogate models in MLI now use 3 folds by default

  • Improved small data recipe with up to 10 cross-validation folds

  • Improved accuracy for binary classification with imbalanced data

  • Additional time-series transformers for interactions and aggregations between lags and lagging of non-target columns

  • Faster creation of MOJOs

  • Progress report during data ingest

  • Normalize binarized multi-class confusion matrices by class count (global scaling factor)

  • Improved parsing of boolean environment variables for configuration

  • Various bug fixes

Version 1.1.6 (May 29, 2018)

  • Improved performance for large datasets

  • Improved speed and user interface for MLI

  • Improved accuracy for binary classification with imbalanced data

  • Improved generalization estimate for experiments with given validation data

  • Reduced size of experiment directories

  • Support for Parquet files

  • Support for bzip2 compressed files

  • Added Data preview in UI: ‘Describe’

  • No longer add ID column to holdout and test set predictions for simplicity

  • Various bug fixes

Version 1.1.4 (May 17, 2018)

  • Native builds (RPM/DEB) for 1.1.3

Version 1.1.3 (May 16, 2018)

  • Faster speed for systems with large CPU core counts

  • Faster and more robust handling of user-specified missing values for training and scoring

  • Same validation scheme for feature engineering and final ensemble for high enough accuracy

  • MOJO scoring pipeline for text transformers

  • Fixed single-row scoring in Python scoring pipeline (broken in 1.1.2)

  • Fixed default scorer when experiment is started too quickly

  • Improved responsiveness for time-series GUI

  • Improved responsiveness after experiment abort

  • Improved load balancing of memory usage for multi-GPU XGBoost

  • Improved UI for selection of columns to drop

  • Various bug fixes

Version 1.1.2 (May 8, 2018)

  • Support for automatic time-series recipe (alpha)

  • Now using Generalized Linear Model (GLM) instead of XGBoost (GBM) for interpretability 10

  • Added experiment preview with runtime and memory usage estimation

  • Added MER scorer (Median Error Rate, Median Abs. Percentage Error)

  • Added ability to use integer column as time column

  • Speed up type enforcement during scoring

  • Support for reading ARFF file format (alpha)

  • Quantile Binning for MLI

  • Various bug fixes

Version 1.1.1 (April 23, 2018)

  • Support string columns larger than 2GB

Version 1.1.0 (April 19, 2018)

  • AWS/Azure integration (hourly cloud usage)

  • Bug fixes for MOJO pipeline scoring (now beta)

  • Google Cloud storage and BigQuery (alpha)

  • Speed up categorical column stats computation during data import

  • Further improved memory management on GPUs

  • Improved accuracy for MAE scorer

  • Ability to build scoring pipelines on demand (if not enabled by default)

  • Additional target transformer sqrt(sqrt(x)) for regression problems

  • Add GLM models as candidates for interpretability=10 (alpha, disabled by default)

  • Improved performance of native builds (RPM/DEB)

  • Improved estimation of error bars

  • Various bug fixes

Version 1.0.30 (April 5, 2018)

  • Speed up MOJO pipeline creation and disable MOJO by default (still alpha)

  • Improved memory management on GPUs

  • Support for optional 32-bit floating-point precision for reduced memory footprint

  • Added logging of test set scoring and data transformations

  • Various bug fixes

Version 1.0.29 (April 4, 2018)

  • If MOJO fails to build, no MOJO will be available, but experiment can still succeed

Version 1.0.28 (April 3, 2018)

  • (Non-Docker) RPM installers for RHEL7/CentOS7/SLES 12 with systemd support

Version 1.0.27 (March 31, 2018)

  • MOJO scoring pipeline for Java standalone cross-platform low-latency scoring (alpha)

  • Various bug fixes

Version 1.0.26 (March 28, 2018)

  • Improved performance and reduced memory usage for large datasets

  • Improved performance for F0.5, F2 and accuracy

  • Improved performance of MLI

  • Distribution shift detection now also between validation and test data

  • Batch scoring example using datatable

  • Various enhancements for AutoVis (outliers, parallel coordinates, log file)

  • Various bug fixes

Version 1.0.25 (March 22, 2018)

  • New scorers for binary/multinomial classification: F0.5, F2 and accuracy

  • Precision-recall curve for binary/multinomial classification models

  • Plot of actual vs predicted values for regression problems

  • Support for excluding feature transformations by operation type

  • Support for reading binary file formats: datatable and Feather

  • Improved multi-GPU memory load balancing

  • Improved display of initial tuning results

  • Reduced memory usage during creation of final model

  • Fixed several bugs in creation of final scoring pipeline

  • Various UI improvements (e.g., zooming on iteration scoreboard)

  • Various bug fixes

Version 1.0.24 (March 8, 2018)

  • Fix test set scoring bug for data with an ID column (introduced in 1.0.23)

  • Allow renaming of MLI experiments

  • Ability to limit maximum number of cores used for datatable

  • Print validation scores and error bars across final ensemble model CV folds in logs

  • Various UI improvements

  • Various bug fixes

Version 1.0.23 (March 7, 2018)

  • Support for Gains and Lift curves for binomial and multinomial classification

  • Support for multi-GPU single-model training for large datasets

  • Improved recipes for large datasets (faster and less memory/disk usage)

  • Improved recipes for text features

  • Increased sensitivity of interpretability setting for feature engineering complexity

  • Disable automatic time column detection by default to avoid confusion

  • Automatic column type conversion for test and validation data, and during scoring

  • Improved speed of MLI

  • Improved feature importances for MLI on transformed features

  • Added ability to download each MLI plot as a PNG file

  • Added support for dropped columns and weight column to MLI stand-alone page

  • Fix serialization of bytes objects larger than 4 GiB

  • Fix failure to build scoring pipeline with ‘command not found’ error

  • Various UI improvements

  • Various bug fixes

Version 1.0.22 (Feb 23, 2018)

  • Fix CPU-only mode

  • Improved robustness of datatable CSV parser

Version 1.0.21 (Feb 21, 2018)

  • Fix MLI GUI scaling issue on Mac

  • Work around segfault in truncated SVD scipy backend

  • Various bug fixes

Version 1.0.20 (Feb 17, 2018)

  • HDFS/S3/Excel data connectors

  • LDAP/PAM/Kerberos authentication

  • Automatic setting of default values for accuracy / time / interpretability

  • Interpretability: per-observation and per-feature (signed) contributions to predicted values in scoring pipeline

  • Interpretability setting now affects feature engineering complexity and final model complexity

  • Standalone MLI scoring pipeline for Python

  • Time setting of 1 now runs for only 1 iteration

  • Early stopping of experiments if convergence is detected

  • ROC curve display for binomial and multinomial classification, with confusion matrices and threshold/F1/MCC display

  • Training/Validation/Test data shift detectors

  • Added AUCPR scorer for multinomial classification

  • Improved handling of imbalanced binary classification problems

  • Configuration file for runtime limits such as cores/memory/hard drive (for admins)

  • Various GUI improvements (ability to rename experiments, re-run experiments, logs)

  • Various bug fixes

Version 1.0.19 (Jan 28, 2018)

  • Fix hang during final ensemble (accuracy >= 5) for larger datasets

  • Allow scoring of all models built in older versions (>= 1.0.13) in GUI

  • More detailed progress messages in the GUI during experiments

  • Fix scoring pipeline to only use relative paths

  • Error bars in model summary are now +/- 1*stddev (instead of 2*stddev)

  • Added RMSPE scorer (RMS Percentage Error)

  • Added SMAPE scorer (Symmetric Mean Abs. Percentage Error)

  • Added AUCPR scorer (Area under Precision-Recall Curve)

  • Gracefully handle inf/-inf in data

  • Various UI improvements

  • Various bug fixes

Version 1.0.18 (Jan 24, 2018)

  • Fix migration from version 1.0.15 and earlier

  • Confirmation dialog for experiment abort and data/experiment deletion

  • Various UI improvements

  • Various AutoVis improvements

  • Various bug fixes

Version 1.0.17 (Jan 23, 2018)

  • Fix migration from version 1.0.15 and earlier (partial, for experiments only)

  • Added model summary download from GUI

  • Restructured and renamed logs archive, and added model summary to it

  • Fix regression in AutoVis in 1.0.16 that led to slowdown

  • Various bug fixes

Version 1.0.16 (Jan 22, 2018)

  • Added support for validation dataset (optional, instead of internal validation on training data)

  • Standard deviation estimates for model scores (+/- 1 std.dev.)

  • Computation of all applicable scores for final models (in logs only for now)

  • Standard deviation estimates for MLI reason codes (+/- 1 std.dev.) when running in stand-alone mode

  • Added ability to abort MLI job

  • Improved final ensemble performance

  • Improved outlier visualization

  • Updated H2O-3 to version 3.16.0.4

  • More readable experiment names

  • Various speedups

  • Various bug fixes

Version 1.0.15 (Jan 11, 2018)

  • Fix truncated per-experiment log file

  • Various bug fixes

Version 1.0.14 (Jan 11, 2018)

  • Improved performance

Version 1.0.13 (Jan 10, 2018)

  • Improved estimate of generalization performance for final ensemble by removing leakage from target encoding

  • Added API for re-fitting and applying feature engineering on new (potentially larger) data

  • Remove access to pre-transformed datasets to avoid unintended leakage issues downstream

  • Added mean absolute percentage error (MAPE) scorer

  • Enforce monotonicity constraints for binary classification and regression models if interpretability >= 6

  • Use squared Pearson correlation for R^2 metric (instead of coefficient of determination) to avoid negative values

  • Separated HTTP and TCP scoring pipeline examples

  • Reduced size of h2oai_client wheel

  • No longer require weight column for test data if it was provided for training data

  • Improved accuracy of final modeling pipeline

  • Include H2O-3 logs in downloadable logs.zip

  • Updated H2O-3 to version 3.16.0.2

  • Various bug fixes
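The 1.0.13 change to R^2 can be illustrated numerically. The coefficient of determination, 1 - SS_res / SS_tot, turns negative whenever the predictions fit worse than a constant equal to the target mean, whereas the squared Pearson correlation is always between 0 and 1. The trade-off is that squared correlation ignores bias and scale, so predictions offset by a constant still score a perfect 1.0. A minimal sketch with hypothetical helper names (not Driverless AI code):

```python
def _mean(values):
    return sum(values) / len(values)


def coefficient_of_determination(y, y_hat):
    """1 - SS_res / SS_tot; can be negative for poor predictions."""
    mean_y = _mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("target has zero variance")
    return 1.0 - ss_res / ss_tot


def squared_pearson(y, y_hat):
    """Squared Pearson correlation of targets and predictions; mathematically in [0, 1]."""
    mean_y, mean_p = _mean(y), _mean(y_hat)
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in y_hat)
    if var_y == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov * cov / (var_y * var_p)


y = [1, 2, 3, 4]
offset = [11, 12, 13, 14]  # perfectly correlated, but every prediction is off by 10
print(coefficient_of_determination(y, offset))  # -79.0
print(squared_pearson(y, offset))  # 1.0
```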

Version 1.0.11 (Dec 12, 2017)

  • Faster multi-GPU training, especially for small data

  • Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs

  • Improved accuracy of generalization performance estimate for models on small data (< 100k rows)

  • Faster abort of experiment

  • Improved final ensemble meta-learner

  • More robust date parsing

  • Various bug fixes

Version 1.0.10 (Dec 4, 2017)

  • Tool tips and link to documentation in parameter settings screen

  • Faster training for multi-class problems with > 5 classes

  • Experiment summary displayed in GUI after experiment finishes

  • Python Client Library downloadable from the GUI

  • Speedup for Maxwell-based GPUs

  • Support for multinomial AUC and Gini scorers

  • Add MCC and F1 scorers for binomial and multinomial problems

  • Faster abort of experiment

  • Various bug fixes

Version 1.0.9 (Nov 29, 2017)

  • Support for time column for causal train/validation splits in time-series datasets

  • Automatic detection of the time column from temporal correlations in data

  • MLI improvements, dedicated page, selection of datasets and models

  • Improved final ensemble meta-learner

  • Test set score now displayed in experiment listing

  • Original response is preserved in exported datasets

  • Various bug fixes

Version 1.0.8 (Nov 21, 2017)

  • Various bug fixes

Version 1.0.7 (Nov 17, 2017)

  • Sharing of GPUs between experiments - can run multiple experiments at the same time while sharing GPU resources

  • Persistence of experiments and data - can stop and restart the application without loss of data

  • Support for weight column for optional user-specified per-row observation weights

  • Support for fold column for user-specified grouping of rows in train/validation splits

  • Higher accuracy through model tuning

  • Faster training - overall improvements and optimization in model training speed

  • Separate log file for each experiment

  • Ability to delete experiments and datasets from the GUI

  • Improved accuracy for regression tasks with very large response values

  • Faster test set scoring - Significant improvements in test set scoring in the GUI

  • Various bug fixes

Version 1.0.5 (Oct 24, 2017)

  • Only display scorers that are allowed

  • Various bug fixes

Version 1.0.4 (Oct 19, 2017)

  • Improved automatic type detection logic

  • Improved final ensemble accuracy

  • Various bug fixes

Version 1.0.3 (Oct 9, 2017)

  • Various speedups

  • Results are now reproducible

  • Various bug fixes

Version 1.0.2 (Oct 5, 2017)

  • Improved final ensemble accuracy

  • Weight of Evidence features added

  • Various bug fixes

Version 1.0.1 (Oct 4, 2017)

  • Improved speed of final ensemble

  • Various bug fixes

Version 1.0.0 (Sep 24, 2017)

  • Initial stable release