paddlespeech.s2t.models.asr_interface module

ASR Interface module.

class paddlespeech.s2t.models.asr_interface.ASRInterface[source]

Bases: object

ASR Interface model implementation.

Attributes:
attention_plot_class

Get attention plot class.

ctc_plot_class

Get CTC plot class.

Methods

add_arguments(parser)

Add arguments to parser.

build(idim, odim, **kwargs)

Initialize this class with python-level args.

calculate_all_attentions(xs, ilens, ys)

Calculate attention.

calculate_all_ctc_probs(xs, ilens, ys)

Calculate CTC probability.

encode(feat)

Encode feature in beam_search (optional).

forward(xs, ilens, ys, olens)

Compute loss for training.

get_total_subsampling_factor()

Get total subsampling factor.

recognize(x, recog_args[, char_list, rnnlm])

Recognize x for evaluation.

recognize_batch(x, recog_args[, char_list, ...])

Beam search implementation for batch.

scorers()

Get scorers for beam_search (optional).
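Concrete models implement this interface by subclassing it and overriding the hooks they need. A minimal sketch of such a subclass (the ToyASRModel name and its layers are hypothetical and not part of PaddleSpeech):

    import paddle

    from paddlespeech.s2t.models.asr_interface import ASRInterface


    class ToyASRModel(ASRInterface, paddle.nn.Layer):
        """Hypothetical subclass used only to illustrate the interface."""

        def __init__(self, idim: int, odim: int):
            paddle.nn.Layer.__init__(self)
            # Deliberately tiny "encoder" and output projection.
            self.encoder = paddle.nn.Linear(idim, 64)
            self.classifier = paddle.nn.Linear(64, odim)

        def forward(self, xs, ilens, ys, olens):
            # Toy training loss; a real model would score ys with an
            # attention decoder and/or CTC head here.
            logits = self.classifier(self.encoder(xs))  # (B, Tmax, odim)
            return logits.mean()

        def recognize(self, x, recog_args, char_list=None, rnnlm=None):
            # Greedy frame-wise "decoding" over the toy classifier output,
            # returned as a 1-best list of hypotheses.
            logits = self.classifier(self.encoder(paddle.to_tensor(x)))
            yseq = logits.argmax(axis=-1).numpy().tolist()
            return [{"yseq": yseq, "score": 0.0}]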

static add_arguments(parser)[source]

Add arguments to parser.
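Subclasses usually override this hook to register their model-specific command-line options. A hedged sketch (the --toy-encoder-units flag is invented for illustration):

    import argparse

    from paddlespeech.s2t.models.asr_interface import ASRInterface


    class ToyASRModel(ASRInterface):
        # Only the argument hook is shown; see the subclass sketch above.
        @staticmethod
        def add_arguments(parser):
            group = parser.add_argument_group("toy model settings")
            group.add_argument("--toy-encoder-units", type=int, default=64,
                               help="hidden size of the hypothetical toy encoder")
            return parser


    parser = argparse.ArgumentParser()
    ToyASRModel.add_arguments(parser)
    args = parser.parse_args(["--toy-encoder-units", "128"])
    print(args.toy_encoder_units)  # 128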

property attention_plot_class

Get attention plot class.

classmethod build(idim: int, odim: int, **kwargs)[source]

Initialize this class with python-level args.

Args:

idim (int): The dimension of the input features.

odim (int): The size of the output vocabulary.

Returns:

ASRInterface: A new instance of ASRInterface.

calculate_all_attentions(xs, ilens, ys)[source]

Calculate attention.

Parameters:
  • xs (list) -- list of padded input sequences [(T1, idim), (T2, idim), ...]

  • ilens (ndarray) -- batch of lengths of input sequences (B)

  • ys (list) -- list of character id sequence tensor [(L1), (L2), (L3), ...]

Returns:

attention weights (B, Lmax, Tmax)

Return type:

float ndarray
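A hedged sketch of the expected input layout (the 83-dim features and token ids below are placeholders); the hook itself is only available on subclasses that implement it, and calculate_all_ctc_probs expects the same layout:

    import numpy as np

    # Two utterances: feature matrices of shape (Ti, idim), their lengths,
    # and their target token-id sequences.
    xs = [np.random.randn(120, 83).astype("float32"),
          np.random.randn(95, 83).astype("float32")]
    ilens = np.array([120, 95])
    ys = [np.array([7, 42, 13]), np.array([5, 99])]

    # On a concrete model implementing this hook:
    #     att_ws = model.calculate_all_attentions(xs, ilens, ys)
    #     att_ws.shape -> (2, Lmax, Tmax): one alignment map per utterance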

calculate_all_ctc_probs(xs, ilens, ys)[source]

Calculate CTC probability.

Parameters:
  • xs (list) -- list of padded input sequences [(T1, idim), (T2, idim), ...]

  • ilens (ndarray) -- batch of lengths of input sequences (B)

  • ys (list) -- list of character id sequence tensor [(L1), (L2), (L3), ...]

Returns:

CTC probabilities (B, Tmax, vocab)

Return type:

float ndarray
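The returned array can be inspected like any frame-level posterior. A hedged sketch using a random stand-in for the real output (the batch size, frame count, vocabulary size, and blank id are assumptions):

    import numpy as np

    # Stand-in for calculate_all_ctc_probs output on a concrete model:
    # 2 utterances, 50 encoder frames, 30-token vocabulary.
    ctc_probs = np.random.rand(2, 50, 30).astype("float32")

    greedy = ctc_probs.argmax(axis=-1)   # (B, Tmax) frame-wise best token ids
    blank_ratio = (greedy == 0).mean()   # fraction of frames on the blank label,
                                         # assuming the blank id is 0
    print(greedy.shape, blank_ratio)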

property ctc_plot_class

Get CTC plot class.

encode(feat)[source]

Encode feature in beam_search (optional).

Args:

feat (numpy.ndarray): input feature (T, D)

Returns:

paddle.Tensor: encoded feature (T, D)
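A hedged sketch of the expected input (the feature shape is illustrative); since the hook is optional, the call itself is shown only as a comment:

    import numpy as np

    feat = np.random.randn(200, 83).astype("float32")  # one utterance, (T, D)

    # On a model that implements this optional hook:
    #     enc = model.encode(feat)   # paddle.Tensor of encoded features
    # which is then handed to whatever beam-search routine the recipe uses.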

forward(xs, ilens, ys, olens)[source]

Compute loss for training.

Parameters:
  • xs -- batch of padded source sequences paddle.Tensor (B, Tmax, idim)

  • ilens -- batch of lengths of source sequences (B), paddle.Tensor

  • ys -- batch of padded target sequences paddle.Tensor (B, Lmax)

  • olens -- batch of lengths of target sequences (B), paddle.Tensor

Returns:

loss value

Return type:

paddle.Tensor
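A hedged training-step sketch, reusing the hypothetical ToyASRModel from the subclass sketch above (batch shapes follow the parameter description; the values are random placeholders):

    import paddle

    model = ToyASRModel(idim=83, odim=30)
    optimizer = paddle.optimizer.Adam(learning_rate=1e-3,
                                      parameters=model.parameters())

    # One padded batch: B=2, Tmax=100 input frames, Lmax=5 target tokens.
    xs = paddle.randn([2, 100, 83])
    ilens = paddle.to_tensor([100, 80])
    ys = paddle.randint(low=1, high=30, shape=[2, 5])
    olens = paddle.to_tensor([5, 3])

    loss = model(xs, ilens, ys, olens)  # dispatches to forward(), scalar Tensor
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()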

get_total_subsampling_factor()[source]

Get total subsampling factor.

recognize(x, recog_args, char_list=None, rnnlm=None)[source]

Recognize x for evaluation.

Parameters:
  • x (ndarray) -- input acoustic feature (B, T, D) or (T, D)

  • recog_args (namespace) -- argument namespace containing options

  • char_list (list) -- list of characters

  • rnnlm (paddle.nn.Layer) -- language model module

Returns:

N-best decoding results

Return type:

list
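A hedged usage sketch, again with the hypothetical ToyASRModel; real models read decoding options such as beam size from recog_args, so the namespace below is only a placeholder:

    import numpy as np
    from argparse import Namespace

    model = ToyASRModel(idim=83, odim=30)
    model.eval()

    feat = np.random.randn(150, 83).astype("float32")     # one utterance (T, D)
    recog_args = Namespace(beam_size=10, ctc_weight=0.3)  # illustrative options

    nbest = model.recognize(feat, recog_args)
    print(nbest[0]["yseq"][:10])  # token ids of the best hypothesis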

recognize_batch(x, recog_args, char_list=None, rnnlm=None)[source]

Beam search implementation for batch.

Parameters:
  • x (paddle.Tensor) -- encoder hidden state sequences (B, Tmax, Henc)

  • recog_args (namespace) -- argument namespace containing options

  • char_list (list) -- list of characters

  • rnnlm (paddle.nn.Layer) -- language model module

Returns:

N-best decoding results

Return type:

list

scorers()[source]

Get scorers for beam_search (optional).

Returns:

dict[str, ScorerInterface]: dict of ScorerInterface objects
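A hedged sketch of how the optional hooks are typically wired together; the decoding routine is supplied by the caller, and the "lm" key is only a common convention, not something this interface mandates:

    def decode_with_beam_search(model, feat, beam_search_fn, lm=None):
        """Hypothetical glue code around the optional hooks."""
        enc = model.encode(feat)             # paddle.Tensor of encoded features
        scorers = model.scorers()            # dict[str, ScorerInterface]
        if lm is not None:
            scorers["lm"] = lm               # extra scorers can be added
        return beam_search_fn(enc, scorers)  # caller-supplied decoding routine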

paddlespeech.s2t.models.asr_interface.dynamic_import_asr(module)[source]

Import ASR models dynamically.

Args:

module (str): ASR model name, e.g., transformer or conformer

Returns:

type: ASR class
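A hedged usage sketch; "transformer" is one of the names mentioned above, and the returned object is a class, not an instance:

    from paddlespeech.s2t.models.asr_interface import dynamic_import_asr

    model_class = dynamic_import_asr("transformer")
    print(model_class)  # the imported ASR class; instantiate it (or use its
                        # build classmethod) to obtain a usable model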