paddlespeech.t2s.utils.error_rate module

This module provides functions to calculate error rates at different levels, e.g. WER at the word level and CER at the character level.

paddlespeech.t2s.utils.error_rate.cer(reference, hypothesis, ignore_case=False, remove_space=False)[source]

Calculate character error rate (CER). CER compares the reference text and the hypothesis text at the character level. CER is defined as:

CER = (Sc + Dc + Ic) / Nc

where

Sc is the number of characters substituted,
Dc is the number of characters deleted,
Ic is the number of characters inserted,
Nc is the number of characters in the reference.

The Levenshtein distance can be used to calculate CER. Chinese input should be encoded as Unicode. Note that leading and trailing space characters are stripped, and multiple consecutive space characters within a sentence are replaced by a single space character.

Args:

reference (str): The reference sentence.
hypothesis (str): The hypothesis sentence.
ignore_case (bool): Whether to ignore case when comparing.
remove_space (bool): Whether to remove internal space characters.

Returns:

float: Character error rate.

Raises:

ValueError: If the reference length is zero.
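The description above can be sketched in plain Python. This is a minimal illustration, not the module's source: `levenshtein` and `cer_sketch` are hypothetical names, and using `str.split()` for space normalization is an assumption (it also collapses tabs and newlines, which the documented behavior does not mention).

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences, counting substitutions,
    deletions and insertions with unit cost."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_sketch(reference, hypothesis, ignore_case=False, remove_space=False):
    """Hypothetical re-implementation of the documented CER behavior."""
    if ignore_case:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    # Assumption: split() strips leading/trailing whitespace and collapses
    # runs of whitespace; joining with "" removes internal spaces entirely.
    joiner = "" if remove_space else " "
    reference = joiner.join(reference.split())
    hypothesis = joiner.join(hypothesis.split())
    if len(reference) == 0:
        raise ValueError("Length of reference should be greater than 0.")
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of 11 reference characters.
rate = cer_sketch("hello world", "hallo  world ")
```

Because the hypothesis can contain more characters than the reference, a rate computed this way can exceed 1.0.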

paddlespeech.t2s.utils.error_rate.char_errors(reference, hypothesis, ignore_case=False, remove_space=False)[source]

Compute the Levenshtein distance between the reference sequence and the hypothesis sequence at the character level.

Args:

reference (str): The reference sentence.
hypothesis (str): The hypothesis sentence.
ignore_case (bool): Whether to ignore case when comparing.
remove_space (bool): Whether to remove internal space characters.

Returns:

list: Levenshtein distance and length of reference sentence.

paddlespeech.t2s.utils.error_rate.wer(reference, hypothesis, ignore_case=False, delimiter=' ')[source]

Calculate word error rate (WER). WER compares the reference text and the hypothesis text at the word level. WER is defined as:

WER = (Sw + Dw + Iw) / Nw

where

Sw is the number of words substituted,
Dw is the number of words deleted,
Iw is the number of words inserted,
Nw is the number of words in the reference.

The Levenshtein distance can be used to calculate WER. Note that empty items are removed when sentences are split by the delimiter.

Args:

reference (str): The reference sentence.
hypothesis (str): The hypothesis sentence.
ignore_case (bool): Whether to ignore case when comparing.
delimiter (char): Delimiter of input sentences.

Returns:

float: Word error rate.

Raises:

ValueError: If the number of words in the reference is zero.
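A word-level counterpart can be sketched the same way. Again this is an illustrative sketch under assumptions, not the module's implementation: `word_edit_distance` and `wer_sketch` are hypothetical names, and empty items are dropped with a simple list comprehension after `str.split(delimiter)`.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over lists of words (unit costs)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word deleted
                            curr[j - 1] + 1,      # word inserted
                            prev[j - 1] + cost))  # word substituted or matched
        prev = curr
    return prev[-1]


def wer_sketch(reference, hypothesis, ignore_case=False, delimiter=" "):
    """Hypothetical re-implementation of the documented WER behavior."""
    if ignore_case:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    # Empty items (e.g. from doubled delimiters) are removed after splitting.
    ref_words = [w for w in reference.split(delimiter) if w]
    hyp_words = [w for w in hypothesis.split(delimiter) if w]
    if not ref_words:
        raise ValueError("Reference's word number should be greater than 0.")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# One substituted word out of three reference words.
rate = wer_sketch("the cat sat", "the  cat sit")
```

As with CER, insertions are counted against the reference length, so a hypothesis much longer than the reference can give a rate above 1.0.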

paddlespeech.t2s.utils.error_rate.word_errors(reference, hypothesis, ignore_case=False, delimiter=' ')[source]

Compute the Levenshtein distance between the reference sequence and the hypothesis sequence at the word level.

Args:
reference (str):

The reference sentence.

hypothesis (str):

The hypothesis sentence.

ignore_case (bool):

Whether to ignore case when comparing.

delimiter (char(str)):

Delimiter of input sentences.

Returns:

list: Levenshtein distance and word number of reference sentence.
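One possible use of the (distance, reference length) pairs is computing a corpus-level rate by summing errors and lengths across utterances instead of averaging per-utterance rates. The sketch below assumes that usage; `corpus_error_rate` is a hypothetical helper and the input pairs are made-up values for illustration only.

```python
def corpus_error_rate(per_utterance):
    """Aggregate (edit_distance, reference_length) pairs into a single rate.

    Hypothetical helper: each pair is shaped like the result described above.
    """
    total_errors = 0.0
    total_length = 0
    for distance, ref_len in per_utterance:
        total_errors += distance
        total_length += ref_len
    if total_length == 0:
        raise ValueError("Total reference length should be greater than 0.")
    return total_errors / total_length


# Illustrative values only: (edit distance, number of reference words).
pairs = [(1.0, 4), (0.0, 6), (3.0, 10)]

corpus_wer = corpus_error_rate(pairs)                 # 4 errors over 20 words
mean_wer = sum(d / n for d, n in pairs) / len(pairs)  # average of per-utterance rates
```

The two numbers generally differ: the corpus-level rate weights every reference word equally, while the mean of per-utterance rates weights every utterance equally regardless of its length.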