NAME
AI::MXNet::Gluon::Loss - Base class for loss.
DESCRIPTION
Base class for loss functions.
Parameters
----------
weight : float or None
Global scalar weight for loss.
batch_axis : int, default 0
The axis that represents mini-batch.
_apply_weighting
Apply weighting to loss.
Parameters
----------
loss : Symbol
The loss to be weighted.
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch separately, `sample_weight` should have
shape (64, 1).
Returns
-------
loss : Symbol
Weighted loss
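The broadcasting rule above can be illustrated with a small plain-Python sketch. This is not AI::MXNet code: the helper name `apply_weighting` and the nested-list representation are assumptions made for illustration only.

```python
def apply_weighting(loss, weight=None, sample_weight=None):
    """Scale a 2-D loss (list of rows) by an optional per-sample weight
    of shape (batch, 1) and an optional global scalar weight.

    Hypothetical helper for illustration; not part of any library.
    """
    weighted = [list(row) for row in loss]
    if sample_weight is not None:
        # Broadcast each single-element weight row across that sample's row.
        weighted = [[v * w_row[0] for v in row]
                    for row, w_row in zip(weighted, sample_weight)]
    if weight is not None:
        # The global weight scales every element equally.
        weighted = [[v * weight for v in row] for row in weighted]
    return weighted


loss = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # shape (2, 3)
sample_weight = [[1.0], [0.5]]               # shape (2, 1)
weighted = apply_weighting(loss, weight=2.0, sample_weight=sample_weight)
print(weighted)
```

Here the second sample is down-weighted by 0.5 before the global factor of 2.0 is applied, so its row is left numerically unchanged while the first row doubles.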
NAME
AI::MXNet::Gluon::L2Loss
DESCRIPTION
Calculates the mean squared error between output and label:
.. math::
L = \frac{1}{2}\sum_i \vert {output}_i - {label}_i \vert^2.
Output and label can have arbitrary shape as long as they have the same
number of elements.
Parameters
----------
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
batch_axis : int, default 0
The axis that represents mini-batch.
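As a rough illustration of the quantity described, the following plain-Python sketch computes a per-sample loss. It assumes a factor of 1/2 on the squared difference and averaging over the non-batch elements, as "mean" suggests; the function name `l2_loss` is hypothetical and this is not the library's implementation.

```python
def l2_loss(output, label):
    """Per-sample half mean squared error.

    Inputs are lists of rows with the batch on axis 0. The 1/2 factor
    and the per-sample averaging are assumptions of this sketch.
    """
    losses = []
    for out_row, lab_row in zip(output, label):
        squared = [(o - l) ** 2 for o, l in zip(out_row, lab_row)]
        losses.append(0.5 * sum(squared) / len(squared))
    return losses


l2 = l2_loss([[1.0, 2.0], [0.0, 4.0]], [[0.0, 2.0], [2.0, 0.0]])
print(l2)
```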
NAME
AI::MXNet::Gluon::L1Loss
DESCRIPTION
Calculates the mean absolute error between output and label:
.. math::
L = \sum_i \vert {output}_i - {label}_i \vert.
Output and label must have the same shape.
Parameters
----------
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
batch_axis : int, default 0
The axis that represents mini-batch.
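A minimal plain-Python sketch of the mean absolute error per sample follows. It assumes averaging over the non-batch elements; the name `l1_loss` is hypothetical and the code is illustrative rather than the library's implementation.

```python
def l1_loss(output, label):
    """Per-sample mean absolute error; inputs are lists of rows."""
    losses = []
    for out_row, lab_row in zip(output, label):
        diffs = [abs(o - l) for o, l in zip(out_row, lab_row)]
        losses.append(sum(diffs) / len(diffs))
    return losses


l1 = l1_loss([[1.0, 2.0, 3.0, 4.0]], [[0.0, 2.0, 5.0, 0.0]])
print(l1)
```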
NAME
AI::MXNet::Gluon::SigmoidBinaryCrossEntropyLoss
DESCRIPTION
The cross-entropy loss for binary classification. (alias: SigmoidBCELoss)
BCE loss is useful when training logistic regression.
.. math::
loss(o, t) = - 1/n \sum_i (t[i] * log(o[i]) + (1 - t[i]) * log(1 - o[i]))
Parameters
----------
from_sigmoid : bool, default is `False`
Whether the input is already the output of a sigmoid. Setting this to false makes
the loss apply sigmoid and then compute BCE, which is more numerically stable
through the log-sum-exp trick.
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
batch_axis : int, default 0
The axis that represents mini-batch.
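To show why computing the loss from pre-sigmoid scores can be more stable, here is a plain-Python sketch comparing the two paths for a single element. The function names are hypothetical, and the rearrangement shown is one common way to write the stable form; it is not claimed to be the exact code used by the library. The per-element values would still need to be averaged as in the formula above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_sigmoid(prob, target):
    """BCE on a probability that has already gone through sigmoid."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def bce_from_logit(x, target):
    """BCE computed directly on a pre-sigmoid score x.

    Algebraically equal to bce_from_sigmoid(sigmoid(x), target), but the
    max/abs rearrangement avoids exp of a large positive number and log(0).
    """
    return max(x, 0.0) - x * target + math.log1p(math.exp(-abs(x)))


naive = bce_from_sigmoid(sigmoid(2.0), 1.0)
stable = bce_from_logit(2.0, 1.0)

# For a very large score, sigmoid(800.0) rounds to exactly 1.0, so the
# naive path would evaluate log(0) for target 0; the stable form does not.
extreme = bce_from_logit(800.0, 0.0)
print(naive, stable, extreme)
```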
NAME
AI::MXNet::Gluon::SoftmaxCrossEntropyLoss
DESCRIPTION
Computes the softmax cross entropy loss. (alias: SoftmaxCELoss)
If `sparse_label` is `True`, label should contain integer category indicators:
.. math::
p = {softmax}({output})
L = -\sum_i {log}(p_{i,{label}_i})
Label's shape should be output's shape without the `axis` dimension. For example,
for `output.shape` = (1,2,3,4) and axis = 2, `label.shape` should be (1,2,4).
If `sparse_label` is `False`, label should contain probability distribution
with the same shape as output:
.. math::
p = {softmax}({output})
L = -\sum_i \sum_j {label}_j {log}(p_{ij})
Parameters
----------
axis : int, default -1
The axis to sum over when computing softmax and entropy.
sparse_label : bool, default True
Whether label is an integer array instead of probability distribution.
from_logits : bool, default False
Whether input is a log probability (usually from log_softmax) instead
of unnormalized numbers.
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
batch_axis : int, default 0
The axis that represents mini-batch.
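The two label conventions can be compared with a short plain-Python sketch for a single sample. The helper names are hypothetical and the code illustrates the formulas above rather than the library's implementation.

```python
import math


def log_softmax(scores):
    """Log of softmax, shifted by the max score for numerical stability."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def softmax_ce_sparse(scores, label):
    """Sparse label: an integer class index."""
    return -log_softmax(scores)[label]


def softmax_ce_dense(scores, label_probs):
    """Dense label: a probability distribution over classes."""
    logp = log_softmax(scores)
    return -sum(q * lp for q, lp in zip(label_probs, logp))


scores = [1.0, 2.0, 3.0]
sparse = softmax_ce_sparse(scores, 2)
dense = softmax_ce_dense(scores, [0.0, 0.0, 1.0])
print(sparse, dense)
```

A one-hot probability vector gives the same loss as the corresponding integer index, which is why the sparse form is usually preferred when labels are hard class assignments.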
NAME
AI::MXNet::Gluon::KLDivLoss
DESCRIPTION
The Kullback-Leibler divergence loss.
KL divergence is a useful distance measure for continuous distributions
and is often useful when performing direct regression over the space of
(discretely sampled) continuous output distributions.
.. _Kullback-Leibler divergence:
https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
.. math::
L = 1/n \sum_i (label_i * (log(label_i) - output_i))
Label's shape should be the same as output's.
Parameters
----------
from_logits : bool, default is `True`
Whether the input is a log probability (usually from log_softmax) instead
of unnormalized numbers.
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
batch_axis : int, default 0
The axis that represents mini-batch.
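The formula above, with `output` given as log probabilities, can be sketched in plain Python for a single sample. The name `kl_div_loss` is hypothetical, and treating a zero label entry as contributing zero is an assumption of this sketch, not a documented behavior.

```python
import math


def kl_div_loss(log_pred, label):
    """Mean over elements of label * (log(label) - log_pred).

    log_pred holds log probabilities (for example from a log-softmax).
    Entries with label == 0 are taken to contribute 0 (an assumption).
    """
    terms = [q * (math.log(q) - lp) if q > 0 else 0.0
             for q, lp in zip(label, log_pred)]
    return sum(terms) / len(terms)


label = [0.5, 0.5]
same = kl_div_loss([math.log(0.5), math.log(0.5)], label)
diff = kl_div_loss([math.log(0.25), math.log(0.75)], label)
print(same, diff)
```

The loss is zero when the predicted distribution matches the label and positive otherwise.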
NAME
AI::MXNet::Gluon::CTCLoss
DESCRIPTION
Connectionist Temporal Classification Loss.
See `"Connectionist Temporal Classification: Labelling Unsegmented
Sequence Data with Recurrent Neural Networks"
<http://www.cs.toronto.edu/~graves/icml_2006.pdf>`_ paper for more information.
Parameters
----------
layout : str, default 'NTC'
Layout of the output sequence activation vector.
label_layout : str, default 'NT'
Layout of the labels.
weight : float or None
Global scalar weight for loss.
sample_weight : Symbol or None
Per sample weighting. Must be broadcastable to
the same shape as loss. For example, if loss has
shape (64, 10) and you want to weight each sample
in the batch, `sample_weight` should have shape (64, 1).
This should be used as the fifth argument when calling this loss.
Input shapes:
`data` is an activation tensor (i.e. before softmax).
Its shape depends on `layout`. For `layout='TNC'`, this
input has shape `(sequence_length, batch_size, alphabet_size)`.
Note that the last index of the last dimension, `alphabet_size-1`, is reserved
for the special blank character.
`label` is the label index matrix with zero-indexed labels.
Its shape depends on `label_layout`. For `label_layout='TN'`, this
input has shape `(label_sequence_length, batch_size)`. A padding value of ``-1``
can be used to handle label sequences of unequal length.
When `label_lengths` is specified, label lengths are directly used and padding mask
is not allowed in the label.
When `label_lengths` is not specified, the first occurrence of ``-1``
in each sample marks the end of the label sequence of that sample.
For example, suppose the vocabulary is `[a, b, c]`, and in one batch we have three
sequences 'ba', 'cbb', and 'abac'. We can index the labels as `{'a': 0, 'b': 1, 'c': 2}`.
The alphabet size should be 4, and we reserve the channel index 3 for blank label
in data tensor. The padding mask value for extra length is -1, so the resulting `label`
tensor should be padded to be::
[[1, 0, -1, -1], [2, 1, 1, -1], [0, 1, 0, 2]]
`data_lengths` is optional and defaults to None.
When specified, it represents the actual lengths of data.
The shape should be (batch_size,).
If None, the data lengths are treated as being equal to the max sequence length.
This should be used as the third argument when calling this loss.
`label_lengths` is optional and defaults to None.
When specified, it represents the actual lengths of labels.
The shape should be (batch_size,).
If None, the label lengths are derived from the first occurrence of
the padding value ``-1`` in each sample.
This should be used as the fourth argument when calling this loss.
Output shape:
The CTC loss output has the shape (batch_size,).
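The label padding described for `label` can be sketched in plain Python. The helpers `pad_labels` and `label_lengths` are hypothetical names used only for illustration; they reproduce the `[a, b, c]` example and show how lengths could be recovered from the first ``-1`` in each row.

```python
def pad_labels(sequences, index, pad_value=-1):
    """Map characters to indices and right-pad each row to the max length."""
    max_len = max(len(s) for s in sequences)
    return [[index[ch] for ch in s] + [pad_value] * (max_len - len(s))
            for s in sequences]


def label_lengths(padded, pad_value=-1):
    """Length of each row up to the first pad_value (or the full row)."""
    return [row.index(pad_value) if pad_value in row else len(row)
            for row in padded]


index = {'a': 0, 'b': 1, 'c': 2}   # index 3 is left for the blank label
padded = pad_labels(['ba', 'cbb', 'abac'], index)
lengths = label_lengths(padded)
print(padded)
print(lengths)
```

The rows are laid out batch-first, matching the default `label_layout` of 'NT'.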