# mlpack_logistic_regression man page

**mlpack_logistic_regression** - L2-regularized logistic regression and prediction

## Synopsis

mlpack_logistic_regression [-h] [-v]

## Description

An implementation of L2-regularized logistic regression using either the L-BFGS optimizer or SGD (stochastic gradient descent). This solves the regression problem

y = 1 / (1 + e^-(X * b))

where y takes values 0 or 1.
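As a minimal sketch of the model above (plain Python, not mlpack code), the following evaluates the logistic function for a single point. The function names `logistic` and `predict_probability` are hypothetical, and passing the coefficients `b` as a plain list with no separate intercept term is an assumption made for illustration.

```python
import math

def logistic(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and coefficients b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    return logistic(z)
```

The value returned lies between 0 and 1; a point whose linear term `x . b` is exactly zero gets 0.5.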

This program allows loading a logistic regression model from a file (**-m**), training a logistic regression model on given training data (**-t**), or both at once. In addition, it can classify a test dataset (**-T**) and save the classification results to the given output file (**-o**). The logistic regression model itself may be saved to a file specified with the **-M** option.

The training data given with the **-t** option should have class labels as its last dimension (so, if the training data is in CSV format, labels should be the last column). Alternatively, the **-l** (**--labels_file**) option may be used to specify a separate file of labels.
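To illustrate the expected layout, here is a small sketch that splits CSV text into predictors and labels by taking the last column as the label. It is plain Python for illustration only, not how mlpack loads data; the helper name `split_features_and_labels` and the three-row dataset are hypothetical.

```python
import csv
import io

def split_features_and_labels(csv_text):
    """Split CSV rows into (features, labels), taking the label from the last column."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if values[-1] not in (0.0, 1.0):
            raise ValueError(f"label must be 0 or 1, got {values[-1]}")
        features.append(values[:-1])
        labels.append(int(values[-1]))
    return features, labels

# Hypothetical three-point dataset: two predictors followed by the label.
example = "0.5,1.2,0\n1.5,0.3,1\n2.0,2.2,1\n"
X, y = split_features_and_labels(example)
```

When labels are supplied separately with **--labels_file**, the training file would hold only the predictor columns.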

When a model is being trained, several options are available. L2 regularization (to prevent overfitting) can be specified with the **--lambda** (**-L**) option, and the optimizer used to train the model can be specified with the **--optimizer** option. Available optimizers are 'sgd' (stochastic gradient descent), 'lbfgs' (the L-BFGS optimizer), and 'minibatch-sgd' (minibatch stochastic gradient descent). The optimizer also has several parameters: the **--max_iterations** parameter specifies the maximum number of allowed iterations, and the **--tolerance** (**-e**) parameter specifies the tolerance for convergence. For the SGD and mini-batch SGD optimizers, the **--step_size** parameter controls the step size taken at each iteration by the optimizer. The batch size for mini-batch SGD is controlled with the **--batch_size** (**-b**) parameter. If the objective function for your data is oscillating between Inf and 0, the step size is probably too large. The optimizers have further parameters, but these are accessible only through the C++ interface.
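To make the roles of the regularization parameter, step size, iteration limit, and tolerance concrete, here is a rough sketch of SGD for L2-regularized logistic regression in plain Python. It is not mlpack's implementation. The penalty form `(lam / 2) * ||b||^2`, the even split of the penalty across points, the once-per-pass convergence check, and all function names are assumptions made for illustration. The sketch also does not implement the "0 means no limit" behaviour of **--max_iterations**.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(x, b):
    return sum(v * w for v, w in zip(x, b))

def objective(X, y, b, lam):
    """Negative log-likelihood plus (lam / 2) * ||b||^2 (assumed penalty form)."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = dot(xi, b)
        # log(1 + e^z) - y * z, arranged so that exp() never overflows
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return loss + 0.5 * lam * sum(w * w for w in b)

def sgd_train(X, y, lam=0.0, step_size=0.01, max_iterations=10000, tolerance=1e-10):
    """Plain SGD: one point per iteration, points visited in order."""
    n = len(X)
    b = [0.0] * len(X[0])
    previous = objective(X, y, b, lam)
    for iteration in range(max_iterations):
        i = iteration % n
        error = sigmoid(dot(X[i], b)) - y[i]
        # Per-point gradient; the penalty gradient is split evenly over the n points.
        b = [w - step_size * (error * v + lam * w / n) for v, w in zip(X[i], b)]
        if i == n - 1:  # end of a pass: compare the objective with the previous pass
            current = objective(X, y, b, lam)
            if abs(previous - current) < tolerance:
                break
            previous = current
    return b

# Hypothetical toy data: a constant column (intercept) plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
b = sgd_train(X, y, lam=0.1, step_size=0.1)
```

In a sketch like this, a step size that is too large shows up as an objective that jumps around rather than decreasing from pass to pass.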

For SGD, an iteration refers to a single point, and for mini-batch SGD, an iteration refers to a single batch. So to take a single pass over the dataset with SGD, **--max_iterations** should be set to the number of points in the dataset.
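The arithmetic can be sketched as follows. The helper name `iterations_for_passes` is hypothetical, and counting a final partial batch as one full iteration is an assumption, not something this page states.

```python
import math

def iterations_for_passes(num_points, passes=1, batch_size=None):
    """Value to give --max_iterations for `passes` passes over the data.

    batch_size=None models plain SGD (one point per iteration); otherwise
    one iteration is one batch, and a trailing partial batch is assumed to
    count as a full iteration.
    """
    if batch_size is None:
        per_pass = num_points
    else:
        per_pass = math.ceil(num_points / batch_size)
    return passes * per_pass
```

For example, under these assumptions a dataset of 1000 points needs 1000 iterations per pass with SGD, and 20 iterations per pass with mini-batch SGD at a batch size of 50.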

Optionally, the model can be used to predict the responses for another matrix of data points, if **--test_file** is specified. The **--test_file** option can be specified without **--training_file**, so long as an existing logistic regression model is given with **--input_model_file**. The output predictions from the logistic regression model are stored in the file given with **--output_file**.
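The way a predicted probability becomes a class label follows the rule given for the **--decision_boundary** option: values below the boundary map to class 0, everything else to class 1. A minimal sketch, with the hypothetical helper name `classify`:

```python
def classify(probabilities, decision_boundary=0.5):
    """Map each logistic-function value to class 0 or 1 using the decision boundary."""
    return [0 if p < decision_boundary else 1 for p in probabilities]
```

With the default boundary of 0.5, a value of exactly 0.5 is assigned to class 1, because only values strictly less than the boundary map to class 0.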

This implementation of logistic regression does not support the general multi-class case but instead only the two-class case. Any responses must be either 0 or 1.

## Optional Input Options

**--batch_size (-b) [int]** Batch size for mini-batch SGD. Default value 50.

**--decision_boundary (-d) [double]** Decision boundary for prediction; if the logistic function for a point is less than the boundary, the class is taken to be 0; otherwise, the class is 1. Default value 0.5.

**--help (-h)** Default help info.

**--info [string]** Get help on a specific module or option. Default value ''.

**--input_model_file (-m) [string]** File containing existing model (parameters). Default value ''.

**--labels_file (-l) [string]** A file containing labels (0 or 1) for the points in the training set (y). Default value ''.

**--lambda (-L) [double]** L2-regularization parameter for training. Default value 0.

**--max_iterations (-n) [int]** Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

**--optimizer (-O) [string]** Optimizer to use for training ('lbfgs', 'sgd', or 'minibatch-sgd'). Default value 'lbfgs'.

**--step_size (-s) [double]** Step size for SGD and mini-batch SGD optimizers. Default value 0.01.

**--test_file (-T) [string]** File containing test dataset. Default value ''.

**--tolerance (-e) [double]** Convergence tolerance for optimizer. Default value 1e-10.

**--training_file (-t) [string]** A file containing the training set (the matrix of predictors, X). Default value ''.

**--verbose (-v)** Display informational messages and the full list of parameters and timers at the end of execution.

**--version (-V)** Display the version of mlpack.

## Optional Output Options

**--output_file (-o) [string]** If **--test_file** is specified, this file is where the predictions for the test set will be saved. Default value ''.

**--output_model_file (-M) [string]** File to save trained logistic regression model to. Default value ''.

**--output_probabilities_file (-p) [string]** If **--test_file** is specified, this file is where the class probabilities for the test set will be saved. Default value ''.

## Additional Information

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.