NAME
AI::MXNet::Gluon::Trainer
DESCRIPTION
Applies an `Optimizer` on a set of Parameters. Trainer should
be used together with `autograd`.
Parameters
----------
params : AI::MXNet::Gluon::ParameterDict
The set of parameters to optimize.
optimizer : str or Optimizer
The optimizer to use. See
http://mxnet.io/api/python/optimization/optimization.html#the-mxnet-optimizer-package
for a list of available optimizers.
optimizer_params : hash ref
Keyword arguments to be passed to the optimizer constructor. For example,
{learning_rate => 0.1}. All optimizers accept learning_rate, wd (weight decay),
clip_gradient, and lr_scheduler. See each optimizer's
constructor for a list of additional supported arguments.
kvstore : str or KVStore
The KVStore type used for multi-GPU and distributed training. See the help on
mx->kvstore->create for more information.
compression_params : hash ref
Specifies type of gradient compression and additional arguments depending
on the type of compression being used. For example, 2bit compression requires a threshold.
Arguments would then be {type => '2bit', threshold => 0.5}.
See AI::MXNet::KVStore->set_gradient_compression method for more details on gradient compression.
update_on_kvstore : Bool, default undef
Whether to perform parameter updates on the kvstore. If undef, the trainer will choose the more
suitable option depending on the type of kvstore.
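A Trainer is typically constructed from a Block's parameters right after they are
initialized. Below is a minimal sketch, assuming the usual `gluon` export and an
illustrative single Dense layer; the optimizer name and optimizer_params are passed
straight through to the optimizer constructor.

    use AI::MXNet qw(mx);
    use AI::MXNet::Gluon qw(gluon);

    ## an illustrative network; any Block with parameters works
    my $net = gluon->nn->Dense(1);
    $net->initialize(mx->init->Xavier());

    ## optimize all of the block's parameters with SGD
    my $trainer = gluon->Trainer(
        $net->collect_params(), 'sgd',
        { learning_rate => 0.1, wd => 1e-4 }
    );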
Properties
----------
learning_rate : float
The current learning rate of the optimizer. Given an Optimizer object
$optimizer, its learning rate can be accessed as $optimizer->learning_rate.
step
Makes one step of parameter update. Should be called after
`autograd->backward()` and outside of `record()` scope.
For normal parameter updates, `step()` should be used, which internally calls
`allreduce_grads()` and then `update()`. However, if you need to get the reduced
gradients to perform a certain transformation, such as gradient clipping, then
you may want to manually call `allreduce_grads()` and `update()` separately.
Parameters
----------
$batch_size : Int
Batch size of data processed. Gradients will be normalized by `1/$batch_size`.
Set this to 1 if you normalized the loss manually with `$loss = mean($loss)`.
$ignore_stale_grad : Bool, optional, default=0
If true, ignores Parameters with stale gradients (gradients that have not
been updated by `backward()` since the last step) and skips updating them.
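Below is a sketch of one training iteration using step(); $net and $trainer are as
constructed in the example above, $data and $label stand in for one minibatch, and
the L2 loss is purely illustrative.

    use AI::MXNet qw(mx);
    use AI::MXNet::Gluon qw(gluon);
    use AI::MXNet::AutoGrad qw(autograd);

    my $loss_fn = gluon->loss->L2Loss();
    my $batch_size = $data->shape->[0];
    my $loss;

    ## record the forward pass so that gradients can be computed
    autograd->record(sub {
        my $output = $net->($data);
        $loss = $loss_fn->($output, $label);
    });

    ## backward() and step() are called outside of the record() scope
    $loss->backward();
    $trainer->step($batch_size);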
allreduce_grads
For each parameter, reduce the gradients from different contexts.
Should be called after `autograd->backward()`, outside of `record()` scope,
and before `trainer->update()`.
For normal parameter updates, `step()` should be used, which internally calls
`allreduce_grads()` and then `update()`. However, if you need to get the reduced
gradients to perform a certain transformation, such as gradient clipping, then
you may want to manually call `allreduce_grads()` and `update()` separately.
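A hedged sketch of the manual path follows; the gradient transformation is only
indicated in a comment, since the exact parameter/gradient accessors depend on the
installed version.

    ## forward pass recorded and backward() called as usual, then:
    $trainer->allreduce_grads();

    ## ... transform the reduced gradients here, e.g. gradient clipping
    ##     (for instance via gluon->utils->clip_global_norm on the
    ##      parameters' gradient arrays) ...

    ## finally apply the optimizer update with the same batch size
    $trainer->update($batch_size);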
set_learning_rate
Sets a new learning rate of the optimizer.
Parameters
----------
lr : float
The new learning rate of the optimizer.
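For example, a simple manual decay between epochs:

    ## halve the learning rate, reading the current value from the optimizer
    my $lr = $trainer->learning_rate;
    $trainer->set_learning_rate($lr * 0.5);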
update
Makes one step of parameter update.
Should be called after autograd->backward(), outside of record() scope,
and after trainer->allreduce_grads().
For normal parameter updates, step() should be used, which internally calls
allreduce_grads() and then update(). However, if you need to get the reduced
gradients to perform a certain transformation, such as gradient clipping, then
you may want to manually call allreduce_grads() and update() separately.
Parameters
----------
$batch_size : Int
Batch size of data processed. Gradients will be normalized by `1/$batch_size`.
Set this to 1 if you normalized the loss manually with `$loss = mean($loss)`.
$ignore_stale_grad : Bool, optional, default=0
If true, ignores Parameters with stale gradients (gradients that have not
been updated by backward() since the last step) and skips updating them.
save_states
Saves trainer states (e.g. optimizer, momentum) to a file.
Parameters
----------
fname : str
Path to output states file.
load_states
Loads trainer states (e.g. optimizer, momentum) from a file.
Parameters
----------
fname : str
Path to input states file.
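A short sketch of checkpointing optimizer-side state (the file name is illustrative);
the network's parameters themselves are saved separately through the Block/ParameterDict API.

    ## save optimizer state such as momentum at a checkpoint
    $trainer->save_states('trainer.states');

    ## later, after reconstructing the same Trainer, restore it before resuming training
    $trainer->load_states('trainer.states');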