paddlespeech.s2t.decoders.beam_search.beam_search module
Beam search module.
- class paddlespeech.s2t.decoders.beam_search.beam_search.BeamSearch(scorers: Dict[str, ScorerInterface], weights: Dict[str, float], beam_size: int, vocab_size: int, sos: int, eos: int, token_list: Optional[List[str]] = None, pre_beam_ratio: float = 1.5, pre_beam_score_key: Optional[str] = None)[source]
Bases:
Layer
Beam search implementation.
Methods
- __call__(*inputs, **kwargs): Call self as a function.
- add_parameter(name, parameter): Adds a Parameter instance.
- add_sublayer(name, sublayer): Adds a sub Layer instance.
- append_token(xs, x): Append new token to prefix tokens.
- apply(fn): Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.
- beam(weighted_scores, ids): Compute topk full token ids and partial token ids.
- buffers([include_sublayers]): Returns a list of all buffers from the current layer and its sub-layers.
- children(): Returns an iterator over immediate children layers.
- clear_gradients(): Clear the gradients of all parameters for this layer.
- create_parameter(shape[, attr, dtype, ...]): Create parameters for this layer.
- create_tensor([name, persistable, dtype]): Create a Tensor for this layer.
- create_variable([name, persistable, dtype]): Create a Tensor for this layer.
- eval(): Sets this Layer and all its sublayers to evaluation mode.
- extra_repr(): Extra representation of this layer; your own layer may provide a custom implementation.
- forward(x[, maxlenratio, minlenratio]): Perform beam search.
- full_name(): Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.
- init_hyp(x): Get initial hypothesis data.
- load_dict(state_dict[, use_structured_name]): Set parameters and persistable buffers from state_dict.
- merge_scores(prev_scores, next_full_scores, ...): Merge scores for a new hypothesis.
- merge_states(states, part_states, part_idx): Merge states for a new hypothesis.
- named_buffers([prefix, include_sublayers]): Returns an iterator over all buffers in the Layer, yielding (name, Tensor) tuples.
- named_children(): Returns an iterator over immediate children layers, yielding both the name of the layer and the layer itself.
- named_parameters([prefix, include_sublayers]): Returns an iterator over all parameters in the Layer, yielding (name, parameter) tuples.
- named_sublayers([prefix, include_self, ...]): Returns an iterator over all sublayers in the Layer, yielding (name, sublayer) tuples.
- parameters([include_sublayers]): Returns a list of all Parameters from the current layer and its sub-layers.
- post_process(i, maxlen, maxlenratio, ...): Perform post-processing of beam search iterations.
- register_buffer(name, tensor[, persistable]): Registers a tensor as a buffer of the layer.
- register_forward_post_hook(hook): Register a forward post-hook for the Layer.
- register_forward_pre_hook(hook): Register a forward pre-hook for the Layer.
- score_full(hyp, x): Score a new hypothesis by self.full_scorers.
- score_partial(hyp, ids, x): Score a new hypothesis by self.part_scorers.
- search(running_hyps, x): Search new tokens for running hypotheses and encoded speech x.
- set_dict(state_dict[, use_structured_name]): Set parameters and persistable buffers from state_dict.
- set_state_dict(state_dict[, use_structured_name]): Set parameters and persistable buffers from state_dict.
- state_dict([destination, include_sublayers, ...]): Get all parameters and persistable buffers of the current layer and its sub-layers.
- sublayers([include_self]): Returns a list of sub-layers.
- to([device, dtype, blocking]): Cast the parameters and buffers of the Layer by the given device, dtype and blocking.
- to_static_state_dict([destination, ...]): Get all parameters and buffers of the current layer and its sub-layers.
- train(): Sets this Layer and all its sublayers to training mode.
- backward
- register_state_dict_hook
- static append_token(xs: Tensor, x: Union[int, Tensor]) Tensor [source]
Append new token to prefix tokens.
- Args:
xs (paddle.Tensor): The prefix tokens, (T,).
x (int): The new token to append.
- Returns:
paddle.Tensor: (T+1,), a new tensor containing xs + [x], with xs.dtype and xs.device.
- beam(weighted_scores: Tensor, ids: Tensor) Tuple[Tensor, Tensor] [source]
Compute topk full token ids and partial token ids.
- Args:
- weighted_scores (paddle.Tensor): The weighted sum scores for each token.
Its shape is (self.n_vocab,).
- ids (paddle.Tensor): The partial (global) token ids over which to compute topk.
- Returns:
- Tuple[paddle.Tensor, paddle.Tensor]:
The topk full token ids and partial token ids. Their shapes are (self.beam_size,), i.e. (global ids, local ids relative to the partial candidate set).
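The selection can be sketched in plain Python (a hypothetical stand-in; the real implementation applies paddle's topk to the weighted score tensor):

```python
# Sketch of BeamSearch.beam: pick the top-k token ids from the weighted
# scores, then translate each winner into a "local" index within the
# pre-beamed candidate ids. In the real code, entries outside `ids`
# are masked out, so every winner is guaranteed to appear in `ids`.

def beam(weighted_scores, ids, beam_size):
    # top-k global token ids ranked by weighted score
    top_ids = sorted(range(len(weighted_scores)),
                     key=lambda i: weighted_scores[i],
                     reverse=True)[:beam_size]
    # local position of each winner inside the partial id list
    local_ids = [ids.index(i) for i in top_ids]
    return top_ids, local_ids

scores = [0.1, 0.9, 0.4, 0.7]   # shape (n_vocab,)
ids = [0, 1, 2, 3]              # partial (global) token ids
print(beam(scores, ids, 2))     # ([1, 3], [1, 3])
```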
- forward(x: Tensor, maxlenratio: float = 0.0, minlenratio: float = 0.0) List[Hypothesis] [source]
Perform beam search.
- Args:
x (paddle.Tensor): Encoded speech feature, (T, D).
maxlenratio (float): Input length ratio used to obtain the max output length.
- If maxlenratio=0.0 (default), an end-detect function is used
to find maximum hypothesis lengths automatically.
- If maxlenratio<0.0, its absolute value is interpreted
as a constant max output length.
minlenratio (float): Input length ratio used to obtain the min output length.
- Returns:
list[Hypothesis]: N-best decoding results
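The maxlenratio semantics above can be sketched as a small helper (hypothetical function name; not part of the API):

```python
# Sketch of how maxlenratio maps to a maximum output length for an
# input of T encoder frames, per the semantics documented above.

def max_output_length(T, maxlenratio):
    if maxlenratio == 0.0:
        # no ratio cap; the end-detect function decides termination,
        # bounded by the number of input frames
        return T
    if maxlenratio < 0.0:
        # absolute value is a constant max output length
        return int(-maxlenratio)
    return max(1, int(maxlenratio * T))

print(max_output_length(100, 0.5))   # 50
print(max_output_length(100, -30))   # 30
```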
- init_hyp(x: Tensor) List[Hypothesis] [source]
Get an initial hypothesis data.
- Args:
x (paddle.Tensor): The encoder output feature, (T, D)
- Returns:
List[Hypothesis]: The initial hypotheses.
- static merge_scores(prev_scores: Dict[str, float], next_full_scores: Dict[str, Tensor], full_idx: int, next_part_scores: Dict[str, Tensor], part_idx: int) Dict[str, Tensor] [source]
Merge scores for new hypothesis.
- Args:
- prev_scores (Dict[str, float]): The previous hypothesis scores by self.scorers.
- next_full_scores (Dict[str, paddle.Tensor]): Scores by self.full_scorers.
- full_idx (int): The next token id for next_full_scores.
- next_part_scores (Dict[str, paddle.Tensor]): Scores of partial tokens by self.part_scorers.
- part_idx (int): The new token id for next_part_scores.
- Returns:
- Dict[str, paddle.Tensor]: The new score dict.
Its keys are names of self.full_scorers and self.part_scorers. Its values are scalar tensors by the scorers.
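The merge rule can be sketched in plain Python (lists stand in for the paddle tensors of the real implementation):

```python
# Sketch of BeamSearch.merge_scores: each scorer's new hypothesis score
# is its previous accumulated score plus the score of the token just
# chosen. Full scorers are indexed by the global token id; partial
# scorers by the position within the scored candidate ids.

def merge_scores(prev_scores, next_full_scores, full_idx,
                 next_part_scores, part_idx):
    new_scores = {}
    for k, v in next_full_scores.items():
        new_scores[k] = prev_scores[k] + v[full_idx]
    for k, v in next_part_scores.items():
        new_scores[k] = prev_scores[k] + v[part_idx]
    return new_scores

prev = {"lm": 1.0, "ctc": 0.5}
full = {"lm": [0.2, 0.3]}   # scores over the full vocabulary
part = {"ctc": [0.1]}       # scores over the partial candidates
print(merge_scores(prev, full, 1, part, 0))  # {'lm': 1.3, 'ctc': 0.6}
```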
- merge_states(states: Any, part_states: Any, part_idx: int) Any [source]
Merge states for new hypothesis.
- Args:
states: States of self.full_scorers.
part_states: States of self.part_scorers.
part_idx (int): The new token id for part_scores.
- Returns:
- Dict[str, Any]: The new state dict.
Its keys are names of self.full_scorers and self.part_scorers. Its values are states of the scorers.
- post_process(i: int, maxlen: int, maxlenratio: float, running_hyps: List[Hypothesis], ended_hyps: List[Hypothesis]) List[Hypothesis] [source]
Perform post-processing of beam search iterations.
- Args:
i (int): The length of hypothesis tokens.
maxlen (int): The maximum length of tokens in beam search.
maxlenratio (float): The maximum length ratio in beam search.
running_hyps (List[Hypothesis]): The running hypotheses in beam search.
ended_hyps (List[Hypothesis]): The ended hypotheses in beam search.
- Returns:
List[Hypothesis]: The new running hypotheses.
- score_full(hyp: Hypothesis, x: Tensor) Tuple[Dict[str, Tensor], Dict[str, Any]] [source]
Score new hypothesis by self.full_scorers.
- Args:
hyp (Hypothesis): Hypothesis with prefix tokens to score.
x (paddle.Tensor): Corresponding input feature, (T, D).
- Returns:
- Tuple[Dict[str, paddle.Tensor], Dict[str, Any]]: Tuple of
a score dict of hyp, with string keys of self.full_scorers and tensor score values of shape (self.n_vocab,), and a state dict with string keys and state values of self.full_scorers.
- score_partial(hyp: Hypothesis, ids: Tensor, x: Tensor) Tuple[Dict[str, Tensor], Dict[str, Any]] [source]
Score new hypothesis by self.part_scorers.
- Args:
hyp (Hypothesis): Hypothesis with prefix tokens to score.
ids (paddle.Tensor): 1D tensor of new partial tokens to score, len(ids) < n_vocab.
x (paddle.Tensor): Corresponding input feature, (T, D).
- Returns:
- Tuple[Dict[str, paddle.Tensor], Dict[str, Any]]: Tuple of
a score dict of hyp, with string keys of self.part_scorers and tensor score values of shape (len(ids),), and a state dict with string keys and state values of self.part_scorers.
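The contrast between the two scorer families can be sketched as follows (hypothetical helper names; real scorers implement ScorerInterface and return paddle tensors):

```python
# Sketch contrasting the two scorer families used by BeamSearch:
# a full scorer (e.g. a decoder or LM) emits one score per vocabulary
# token, while a partial scorer (e.g. CTC prefix scoring) is only
# evaluated on the candidate ids that survived the pre-beam.

n_vocab = 5

def full_score(prefix):
    # shape (n_vocab,): a score for every token in the vocabulary
    return [0.0] * n_vocab

def partial_score(prefix, ids):
    # shape (len(ids),): one score per pre-beamed candidate id
    return [0.0] * len(ids)

print(len(full_score([1, 2])))             # 5
print(len(partial_score([1, 2], [0, 3])))  # 2
```

Partial scorers exist because some scores (such as CTC prefix scores) are expensive to compute, so they are restricted to the pre-beamed subset of tokens.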
- search(running_hyps: List[Hypothesis], x: Tensor) List[Hypothesis] [source]
Search new tokens for running hypotheses and encoded speech x.
- Args:
running_hyps (List[Hypothesis]): Running hypotheses on beam.
x (paddle.Tensor): Encoded speech feature, (T, D).
- Returns:
List[Hypothesis]: Best sorted hypotheses.
- class paddlespeech.s2t.decoders.beam_search.beam_search.Hypothesis(yseq: Tensor, score: Union[float, Tensor] = 0, scores: Dict[str, Union[float, Tensor]] = {}, states: Dict[str, Any] = {})[source]
Bases:
tuple
Hypothesis data type.
Methods
- asdict(): Convert data to a JSON-friendly dict.
- count(value, /): Return number of occurrences of value.
- index(value[, start, stop]): Return first index of value.
- property score
Alias for field number 1
- property scores
Alias for field number 2
- property states
Alias for field number 3
- property yseq
Alias for field number 0
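Hypothesis is a named tuple; a minimal stand-in with the same field layout can be sketched as (the real class stores paddle.Tensor values, plain Python values are used here):

```python
from typing import Any, Dict, NamedTuple

# Minimal stand-in mirroring the Hypothesis fields documented above:
# yseq (field 0), score (field 1), scores (field 2), states (field 3).
class Hypothesis(NamedTuple):
    yseq: list                      # decoded token id sequence
    score: float = 0.0              # accumulated weighted total score
    scores: Dict[str, float] = {}   # per-scorer accumulated scores
    states: Dict[str, Any] = {}     # per-scorer decoding states

    def asdict(self):
        # JSON-friendly dict, as the real asdict() documents
        return {"yseq": list(self.yseq), "score": float(self.score),
                "scores": dict(self.scores)}

h = Hypothesis(yseq=[1, 5, 2], score=-3.2)
print(h.asdict()["yseq"])   # [1, 5, 2]
```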
- paddlespeech.s2t.decoders.beam_search.beam_search.beam_search(x: Tensor, sos: int, eos: int, beam_size: int, vocab_size: int, scorers: Dict[str, ScorerInterface], weights: Dict[str, float], token_list: Optional[List[str]] = None, maxlenratio: float = 0.0, minlenratio: float = 0.0, pre_beam_ratio: float = 1.5, pre_beam_score_key: str = 'full') list [source]
Perform beam search with scorers.
- Args:
x (paddle.Tensor): Encoded speech feature, (T, D).
sos (int): Start-of-sequence id.
eos (int): End-of-sequence id.
beam_size (int): The number of hypotheses kept during search.
vocab_size (int): The size of the vocabulary.
scorers (dict[str, ScorerInterface]): Dict of decoder modules, e.g., Decoder, CTCPrefixScorer, LM. A scorer is ignored if it is None.
weights (dict[str, float]): Dict of weights for each scorer. A scorer is ignored if its weight is 0.
token_list (list[str]): List of tokens for debug logging.
maxlenratio (float): Input length ratio used to obtain the max output length. If maxlenratio=0.0 (default), an end-detect function is used to find maximum hypothesis lengths automatically.
minlenratio (float): Input length ratio used to obtain the min output length.
pre_beam_score_key (str): Key of scores used to perform pre-beam search.
pre_beam_ratio (float): The beam size in the pre-beam search will be int(pre_beam_ratio * beam_size).
- Returns:
List[Dict]: N-best decoding results
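The pre-beam step described by pre_beam_ratio and pre_beam_score_key can be sketched in plain Python (hypothetical helper; the real implementation ranks paddle tensors):

```python
# Sketch of the pre-beam step: before the partial scorers run, the
# candidate set is narrowed to int(pre_beam_ratio * beam_size) token
# ids ranked by the scores selected via pre_beam_score_key
# ("full" = the weighted sum of all full scorers).

def pre_beam(scores, beam_size, pre_beam_ratio=1.5):
    pre_beam_size = int(pre_beam_ratio * beam_size)
    ids = sorted(range(len(scores)), key=lambda i: scores[i],
                 reverse=True)[:pre_beam_size]
    return sorted(ids)   # candidate token ids for the partial scorers

vocab_scores = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]
print(pre_beam(vocab_scores, beam_size=2))  # [1, 3, 5]
```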