Available in: GLM, GAM
The solver option allows you to specify the solver method to use in GLM and GAM. The optimal solver depends on the properties of the data and on any prior information about the variables. In general, the data are considered sparse if the ratio of zeros to non-zeros in the input matrix is greater than 10. The solution is sparse when only a subset of the original set of variables is intended to be kept in the model. In a dense solution, all predictors have non-zero coefficients in the final model.
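The sparsity rule of thumb above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names zero_ratio and is_sparse are hypothetical and not part of H2O, and the input is assumed to be a small in-memory list of rows rather than an H2O frame.

```python
def zero_ratio(rows):
    """Ratio of zero cells to non-zero cells in a row-major matrix."""
    values = [v for row in rows for v in row]
    zeros = sum(1 for v in values if v == 0)
    nonzeros = len(values) - zeros
    if nonzeros == 0:
        # All-zero input has an unbounded ratio; empty input is treated as 0.
        return float("inf") if zeros else 0.0
    return zeros / nonzeros


def is_sparse(rows, threshold=10.0):
    """True when the zero to non-zero ratio exceeds the threshold (10 in the text)."""
    return zero_ratio(rows) > threshold


# 3 x 8 matrix with 2 non-zero cells: 22 zeros / 2 non-zeros = 11.0
example = [
    [0, 0, 0, 0, 0, 0, 0, 1.5],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 2.0, 0, 0, 0, 0, 0, 0],
]
print(zero_ratio(example), is_sparse(example))  # 11.0 True
```

A ratio of exactly 10 is not counted as sparse here, because the text says "greater than 10".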
You can specify one of the following solvers:
IRLSM: Iteratively Reweighted Least Squares Method
L_BFGS: Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm
COORDINATE_DESCENT: Coordinate Descent
COORDINATE_DESCENT_NAIVE: Coordinate Descent Naive
AUTO: Sets the solver based on given data and parameters (default)
GRADIENT_DESCENT_LH: Gradient Descent Likelihood (available for Ordinal family only; default for Ordinal family)
GRADIENT_DESCENT_SQERR: Gradient Descent Squared Error (available for Ordinal family only)
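As an illustration of the list above, the following sketch validates a solver name and enforces the Ordinal-only restriction before the value is handed to a model builder. The helper solver_kwargs is hypothetical. Passing its result to H2OGeneralizedLinearEstimator from the h2o Python package is assumed usage and is not exercised here, so check the H2O API reference for the exact accepted spellings.

```python
# Solver names exactly as listed in this section.
SOLVERS = (
    "AUTO",
    "IRLSM",
    "L_BFGS",
    "COORDINATE_DESCENT",
    "COORDINATE_DESCENT_NAIVE",
    "GRADIENT_DESCENT_LH",
    "GRADIENT_DESCENT_SQERR",
)
# Solvers the list marks as available for the Ordinal family only.
ORDINAL_ONLY = {"GRADIENT_DESCENT_LH", "GRADIENT_DESCENT_SQERR"}


def solver_kwargs(solver="AUTO", family="gaussian"):
    """Return keyword arguments after checking the solver against the list."""
    name = solver.upper()
    if name not in SOLVERS:
        raise ValueError(f"unknown solver: {solver!r}")
    if name in ORDINAL_ONLY and family != "ordinal":
        raise ValueError(f"{name} is only available for family='ordinal'")
    return {"solver": name, "family": family}


kwargs = solver_kwargs("gradient_descent_lh", family="ordinal")
# Assumed usage with h2o installed and a cluster running:
#   from h2o.estimators.glm import H2OGeneralizedLinearEstimator
#   model = H2OGeneralizedLinearEstimator(**kwargs)
```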
Detailed information about each of these options is available in the Solvers section. The bullets below describe how the algorithm chooses the solver when solver=AUTO:
If there are more than 5k active predictors, the algorithm uses L_BFGS.
If family=multinomial and alpha=0 (ridge or no penalty), the algorithm uses L_BFGS.
If lambda search is enabled, the algorithm uses COORDINATE_DESCENT.
If your data has upper/lower bounds and no proximal penalty, the algorithm uses COORDINATE_DESCENT.
If none of the above is true, the algorithm defaults to IRLSM. COORDINATE_DESCENT is reserved for the cases above because it works much better with lambda search.
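The selection rules above can be written as a small decision function. This is a sketch, not H2O's actual implementation: the function name and its parameters are hypothetical, the rules are applied in the order they are listed (an assumption), the alpha=0 rule is restricted to the multinomial family (an assumption based on the guideline about L2-only multinomial models), and the Ordinal default is taken from the solver list.

```python
def choose_auto_solver(
    active_predictors,
    family="gaussian",
    alpha=0.5,
    lambda_search=False,
    has_bounds=False,
    has_proximal_penalty=False,
):
    """Sketch of the AUTO rules; the order of checks is an assumption."""
    if family == "ordinal":
        # The solver list names GRADIENT_DESCENT_LH as the Ordinal default.
        return "GRADIENT_DESCENT_LH"
    if active_predictors > 5000:
        return "L_BFGS"
    if family == "multinomial" and alpha == 0:
        # Ridge or no penalty (multinomial restriction is assumed here).
        return "L_BFGS"
    if lambda_search:
        return "COORDINATE_DESCENT"
    if has_bounds and not has_proximal_penalty:
        return "COORDINATE_DESCENT"
    return "IRLSM"


print(choose_auto_solver(6000))                      # L_BFGS
print(choose_auto_solver(100, lambda_search=True))   # COORDINATE_DESCENT
print(choose_auto_solver(100))                       # IRLSM
```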
Below are some general guidelines to follow when specifying a solver.
L_BFGS works much better for L2-only multinomial models and when you have a very large number of active predictors.
You must use IRLSM if you want p-values.
IRLSM and COORDINATE_DESCENT share the same path (i.e., they both compute the same Gram matrix); they just solve it differently.
Use COORDINATE_DESCENT if you have fewer than 5000 predictors and an L1 penalty.
COORDINATE_DESCENT performs better when lambda_search is enabled. With bounds, it also tends to achieve higher accuracy.
Use GRADIENT_DESCENT_LH or GRADIENT_DESCENT_SQERR when family=ordinal. With GRADIENT_DESCENT_LH, the model parameters are adjusted by minimizing a likelihood-based loss function; with GRADIENT_DESCENT_SQERR, the model parameters are adjusted using a squared-error loss function.
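Some of the guidelines above can be turned into a simple pre-flight check that reports advice for a chosen solver. This is a rough sketch: solver_notes and its parameters are hypothetical names, and the checks cover only the p-value, L2-only multinomial, and lambda search guidelines.

```python
def solver_notes(
    solver,
    family="gaussian",
    alpha=0.5,
    compute_p_values=False,
    lambda_search=False,
):
    """Return a list of advisory notes based on the guidelines above."""
    notes = []
    if compute_p_values and solver != "IRLSM":
        notes.append("p-values require solver IRLSM")
    if family == "multinomial" and alpha == 0 and solver != "L_BFGS":
        notes.append("L_BFGS works much better for L2-only multinomial")
    if lambda_search and solver != "COORDINATE_DESCENT":
        notes.append("COORDINATE_DESCENT performs better with lambda_search")
    return notes


for note in solver_notes("L_BFGS", compute_p_values=True, lambda_search=True):
    print(note)
```

An empty list means none of the three checked guidelines objects to the choice; it does not mean the solver is optimal.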