paddlespeech.s2t.models.asr_interface module

ASR Interface module.

class paddlespeech.s2t.models.asr_interface.ASRInterface[source]

Bases: object

ASR Interface model implementation.

Attributes:
attention_plot_class

Get attention plot class.

ctc_plot_class

Get CTC plot class.

Methods

add_arguments(parser)

Add arguments to parser.

build(idim, odim, **kwargs)

Initialize this class with python-level args.

calculate_all_attentions(xs, ilens, ys)

Calculate attention.

calculate_all_ctc_probs(xs, ilens, ys)

Calculate CTC probability.

encode(feat)

Encode feature in beam_search (optional).

forward(xs, ilens, ys, olens)

Compute loss for training.

get_total_subsampling_factor()

Get total subsampling factor.

recognize(x, recog_args[, char_list, rnnlm])

Recognize x for evaluation.

recognize_batch(x, recog_args[, char_list, ...])

Beam search implementation for batch.

scorers()

Get scorers for beam_search (optional).
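Concrete models implement this interface by subclassing it and overriding the hooks they need. A minimal sketch of such a subclass (the ToyASRModel name and its layers are hypothetical and not part of PaddleSpeech):

    import paddle

    from paddlespeech.s2t.models.asr_interface import ASRInterface


    class ToyASRModel(ASRInterface, paddle.nn.Layer):
        """Hypothetical subclass used only to illustrate the interface."""

        def __init__(self, idim: int, odim: int):
            paddle.nn.Layer.__init__(self)
            # Deliberately tiny "encoder" and output projection.
            self.encoder = paddle.nn.Linear(idim, 64)
            self.classifier = paddle.nn.Linear(64, odim)

        def forward(self, xs, ilens, ys, olens):
            # Toy training loss; a real model would score ys with an
            # attention decoder and/or CTC head here.
            logits = self.classifier(self.encoder(xs))  # (B, Tmax, odim)
            return logits.mean()

        def recognize(self, x, recog_args, char_list=None, rnnlm=None):
            # Greedy frame-wise "decoding" over the toy classifier output,
            # returned as a 1-best list of hypotheses.
            logits = self.classifier(self.encoder(paddle.to_tensor(x)))
            yseq = logits.argmax(axis=-1).numpy().tolist()
            return [{"yseq": yseq, "score": 0.0}]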

static add_arguments(parser)[source]

Add arguments to parser.
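Subclasses usually override this hook to register their model-specific command-line options. A hedged sketch (the --toy-encoder-units flag is invented for illustration):

    import argparse

    from paddlespeech.s2t.models.asr_interface import ASRInterface


    class ToyASRModel(ASRInterface):
        # Only the argument hook is shown; see the subclass sketch above.
        @staticmethod
        def add_arguments(parser):
            group = parser.add_argument_group("toy model settings")
            group.add_argument("--toy-encoder-units", type=int, default=64,
                               help="hidden size of the hypothetical toy encoder")
            return parser


    parser = argparse.ArgumentParser()
    ToyASRModel.add_arguments(parser)
    args = parser.parse_args(["--toy-encoder-units", "128"])
    print(args.toy_encoder_units)  # 128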

property attention_plot_class

Get attention plot class.

classmethod build(idim: int, odim: int, **kwargs)[source]

Initialize this class with python-level args.

Args:

idim (int): The dimension of the input features.

odim (int): The size of the output vocabulary.

Returns:

ASRInterface: A new instance of ASRInterface.

calculate_all_attentions(xs, ilens, ys)[source]

Calculate attention.

Parameters:
  • xs (list) -- list of padded input sequences [(T1, idim), (T2, idim), ...]

  • ilens (ndarray) -- batch of lengths of input sequences (B)

  • ys (list) -- list of character id sequence tensor [(L1), (L2), (L3), ...]

Returns:

attention weights (B, Lmax, Tmax)

Return type:

float ndarray
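A hedged sketch of the expected input layout (the 83-dim features and token ids below are placeholders); the hook itself is only available on subclasses that implement it, and calculate_all_ctc_probs expects the same layout:

    import numpy as np

    # Two utterances: feature matrices of shape (Ti, idim), their lengths,
    # and their target token-id sequences.
    xs = [np.random.randn(120, 83).astype("float32"),
          np.random.randn(95, 83).astype("float32")]
    ilens = np.array([120, 95])
    ys = [np.array([7, 42, 13]), np.array([5, 99])]

    # On a concrete model implementing this hook:
    #     att_ws = model.calculate_all_attentions(xs, ilens, ys)
    #     att_ws.shape -> (2, Lmax, Tmax): one alignment map per utterance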

calculate_all_ctc_probs(xs, ilens, ys)[source]

Calculate CTC probability.

Parameters:
  • xs (list) -- list of padded input sequences [(T1, idim), (T2, idim), ...]

  • ilens (ndarray) -- batch of lengths of input sequences (B)

  • ys (list) -- list of character id sequence tensor [(L1), (L2), (L3), ...]

Returns:

CTC probabilities (B, Tmax, vocab)

Return type:

float ndarray
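The returned array can be inspected like any frame-level posterior. A hedged sketch using a random stand-in for the real output (the batch size, frame count, vocabulary size, and blank id are assumptions):

    import numpy as np

    # Stand-in for calculate_all_ctc_probs output on a concrete model:
    # 2 utterances, 50 encoder frames, 30-token vocabulary.
    ctc_probs = np.random.rand(2, 50, 30).astype("float32")

    greedy = ctc_probs.argmax(axis=-1)   # (B, Tmax) frame-wise best token ids
    blank_ratio = (greedy == 0).mean()   # fraction of frames on the blank label,
                                         # assuming the blank id is 0
    print(greedy.shape, blank_ratio)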

property ctc_plot_class

Get CTC plot class.

encode(feat)[source]

Encode feature in beam_search (optional).

Args:

feat (numpy.ndarray): input feature (T, D)

Returns:

paddle.Tensor: encoded feature (T, D)
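A hedged sketch of the expected input (the feature shape is illustrative); since the hook is optional, the call itself is shown only as a comment:

    import numpy as np

    feat = np.random.randn(200, 83).astype("float32")  # one utterance, (T, D)

    # On a model that implements this optional hook:
    #     enc = model.encode(feat)   # paddle.Tensor of encoded features
    # which is then handed to whatever beam-search routine the recipe uses.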

forward(xs, ilens, ys, olens)[source]

Compute loss for training.

Parameters:
  • xs -- batch of padded source sequences paddle.Tensor (B, Tmax, idim)

  • ilens -- batch of lengths of source sequences (B), paddle.Tensor

  • ys -- batch of padded target sequences paddle.Tensor (B, Lmax)

  • olens -- batch of lengths of target sequences (B), paddle.Tensor

Returns:

loss value

Return type:

paddle.Tensor
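A hedged training-step sketch, reusing the hypothetical ToyASRModel from the subclass sketch above (batch shapes follow the parameter description; the values are random placeholders):

    import paddle

    model = ToyASRModel(idim=83, odim=30)
    optimizer = paddle.optimizer.Adam(learning_rate=1e-3,
                                      parameters=model.parameters())

    # One padded batch: B=2, Tmax=100 input frames, Lmax=5 target tokens.
    xs = paddle.randn([2, 100, 83])
    ilens = paddle.to_tensor([100, 80])
    ys = paddle.randint(low=1, high=30, shape=[2, 5])
    olens = paddle.to_tensor([5, 3])

    loss = model(xs, ilens, ys, olens)  # dispatches to forward(), scalar Tensor
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()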

get_total_subsampling_factor()[source]

Get total subsampling factor.

recognize(x, recog_args, char_list=None, rnnlm=None)[source]

Recognize x for evaluation.

Parameters:
  • x (ndarray) -- input acoustic feature (B, T, D) or (T, D)

  • recog_args (namespace) -- argument namespace containing options

  • char_list (list) -- list of characters

  • rnnlm (paddle.nn.Layer) -- language model module

Returns:

N-best decoding results

Return type:

list
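A hedged usage sketch, again with the hypothetical ToyASRModel; real models read decoding options such as beam size from recog_args, so the namespace below is only a placeholder:

    import numpy as np
    from argparse import Namespace

    model = ToyASRModel(idim=83, odim=30)
    model.eval()

    feat = np.random.randn(150, 83).astype("float32")     # one utterance (T, D)
    recog_args = Namespace(beam_size=10, ctc_weight=0.3)  # illustrative options

    nbest = model.recognize(feat, recog_args)
    print(nbest[0]["yseq"][:10])  # token ids of the best hypothesis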

recognize_batch(x, recog_args, char_list=None, rnnlm=None)[source]

Beam search implementation for batch.

Parameters:
  • x (paddle.Tensor) -- encoder hidden state sequences (B, Tmax, Henc)

  • recog_args (namespace) -- argument namespace containing options

  • char_list (list) -- list of characters

  • rnnlm (paddle.nn.Layer) -- language model module

Returns:

N-best decoding results

Return type:

list

scorers()[source]

Get scorers for beam_search (optional).

Returns:

dict[str, ScorerInterface]: dict of ScorerInterface objects
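A hedged sketch of how the optional hooks are typically wired together; the decoding routine is supplied by the caller, and the "lm" key is only a common convention, not something this interface mandates:

    def decode_with_beam_search(model, feat, beam_search_fn, lm=None):
        """Hypothetical glue code around the optional hooks."""
        enc = model.encode(feat)             # paddle.Tensor of encoded features
        scorers = model.scorers()            # dict[str, ScorerInterface]
        if lm is not None:
            scorers["lm"] = lm               # extra scorers can be added
        return beam_search_fn(enc, scorers)  # caller-supplied decoding routine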

paddlespeech.s2t.models.asr_interface.dynamic_import_asr(module)[source]

Import ASR models dynamically.

Args:

module (str): ASR model name, e.g., transformer or conformer

Returns:

type: ASR class
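A hedged usage sketch; "transformer" is one of the names mentioned above, and the returned object is a class, not an instance:

    from paddlespeech.s2t.models.asr_interface import dynamic_import_asr

    model_class = dynamic_import_asr("transformer")
    print(model_class)  # the imported ASR class; instantiate it (or use its
                        # build classmethod) to obtain a usable model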