DESCRIPTION
Common optimization algorithms with regularization.
create_optimizer
Create an optimizer with the specified name.
Parameters
----------
name : str
Name of the required optimizer. Should be the name
of a subclass of Optimizer. Case insensitive.
rescale_grad : float
Rescaling factor applied to the gradient. Normally should be 1/batch_size.
kwargs : dict
Additional parameters passed to the optimizer's constructor.
Returns
-------
opt : Optimizer
The resulting optimizer.
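For example, a minimal usage sketch (assuming the Perl bindings mirror the Python mx.optimizer.create call; the keyword values shown are illustrative):

    use AI::MXNet qw(mx);

    ## Create an SGD optimizer by name; the extra keyword arguments are
    ## forwarded to the optimizer's constructor.
    my $opt = AI::MXNet::Optimizer->create_optimizer(
        'sgd',
        learning_rate => 0.01,
        momentum      => 0.9,
        wd            => 1e-4,
        rescale_grad  => 1/128,   # 1/batch_size
    );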
set_lr_mult
Set individual learning rate multipliers for parameters.
Parameters
----------
args_lr_mult : hash ref of string/int to float
Set the learning rate multiplier for each name/index.
Setting the multiplier by index is supported for backward compatibility,
but we recommend using names and symbols.
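For example (a sketch; the parameter names are hypothetical):

    ## Train the hypothetical 'fc1_weight' at 10% of the base learning
    ## rate and freeze 'embed_weight' by giving it a multiplier of 0.
    $opt->set_lr_mult({ fc1_weight => 0.1, embed_weight => 0.0 });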
set_wd_mult
Set individual weight decay multipliers for parameters.
By default the wd multiplier is 0 for all parameters whose names don't
end with _weight, if param_idx2name is provided.
Parameters
----------
args_wd_mult : hash ref of string/int to float
Set the weight decay multiplier for each name/index.
Setting the multiplier by index is supported for backward compatibility,
but we recommend using names and symbols.
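For example (a sketch; the parameter names are hypothetical):

    ## Disable weight decay for a bias parameter and double it for a
    ## weight parameter.
    $opt->set_wd_mult({ fc1_bias => 0.0, fc1_weight => 2.0 });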
_update_count
Update num_update for the given index.
Parameters
----------
index : int
The index to be updated.
_get_lr
Get the learning rate for an index.
Parameters
----------
index : int
The index of the weight.
Returns
-------
lr : float
Learning rate for this index.
_get_wd
Get the weight decay for an index.
Returns 0 for non-weight parameters if the names of the weights are provided to __init__.
Parameters
----------
index : int
The index of the weight.
Returns
-------
wd : float
Weight decay for this index.
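Taken together, the effective per-parameter values used by update amount to roughly the following (a sketch of the usual MXNet behaviour, not a verbatim excerpt of this module):

    ## effective learning rate and weight decay for parameter $index
    my $lr = $base_lr * ($lr_mult->{$index} // 1.0);
    my $wd = $base_wd * ($wd_mult->{$index} // 1.0);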
NAME
AI::MXNet::SGD
DESCRIPTION
A very simple SGD optimizer with momentum and weight regularization.
Parameters
----------
learning_rate : float, optional
Learning rate of SGD.
momentum : float, optional
Momentum value.
wd : float, optional
L2 regularization coefficient added to all the weights.
rescale_grad : float, optional
Rescaling factor applied to the gradient. Normally should be 1/batch_size.
clip_gradient : float, optional
Clip the gradient to the range [-clip_gradient, clip_gradient].
param_idx2name : hash ref of int to str, optional
A mapping from parameter index to parameter name. Weight decay is handled
specially for parameters whose names end with bias, gamma, or beta.
create_state
Create additional optimizer state such as momentum.
Parameters
----------
weight : NDArray
The weight data
update
Update the parameters.
Parameters
----------
index : int
A unique integer key used to index the parameters.
weight : NDArray
The weight NDArray.
grad : NDArray
The gradient NDArray.
state : NDArray or other objects returned by create_state
The auxiliary state used in optimization.
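A self-contained sketch of driving the SGD optimizer by hand (it assumes create_state takes the index and the weight, as in the Python API; in normal training the executor or module calls update for you):

    use AI::MXNet qw(mx);

    my $opt = AI::MXNet::Optimizer->create_optimizer(
        'sgd',
        learning_rate => 0.1,
        momentum      => 0.9,
        wd            => 1e-4,
        rescale_grad  => 1/32,
    );

    my $index  = 0;
    my $weight = mx->nd->ones([2, 2]);
    my $grad   = mx->nd->ones([2, 2]) * 0.5;

    ## momentum buffer (may be undef when momentum is 0)
    my $state = $opt->create_state($index, $weight);

    ## one SGD step; $weight is updated in place
    $opt->update($index, $weight, $grad, $state);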
NAME
AI::MXNet::DCASGD
DESCRIPTION
DCASGD optimizer with momentum and weight regularization.
Implements the paper "Asynchronous Stochastic Gradient Descent with
Delay Compensation for Distributed Deep Learning".
Parameters
----------
learning_rate : float, optional
Learning rate of SGD.
momentum : float, optional
Momentum value.
lamda : float, optional
Scale of the delay-compensation (DC) term.
wd : float, optional
L2 regularization coefficient added to all the weights.
rescale_grad : float, optional
Rescaling factor applied to the gradient. Normally should be 1/batch_size.
clip_gradient : float, optional
Clip the gradient to the range [-clip_gradient, clip_gradient].
param_idx2name : hash ref of int to str, optional
A mapping from parameter index to parameter name. Weight decay is handled
specially for parameters whose names end with bias, gamma, or beta.
create_state
Create additional optimizer state such as momentum.
Parameters
----------
weight : NDArray
The weight data
update
Update the parameters.
Parameters
----------
index : int
A unique integer key used to index the parameters.
weight : NDArray
The weight NDArray.
grad : NDArray
The gradient NDArray.
state : NDArray or other objects returned by create_state
The auxiliary state used in optimization.
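Creating a DCASGD optimizer looks much like SGD, with the extra lamda knob (a sketch; it assumes the class is registered under the name 'dcasgd' and the values are illustrative):

    my $opt = AI::MXNet::Optimizer->create_optimizer(
        'dcasgd',
        learning_rate => 0.1,
        momentum      => 0.9,
        lamda         => 0.04,   # scale of the delay-compensation term
        wd            => 1e-4,
        rescale_grad  => 1/32,
    );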