paddlespeech.vector.io.dataset module

class paddlespeech.vector.io.dataset.CSVDataset(csv_path, label2id_path=None, config=None, random_chunk=True, feat_type: str = 'raw', n_train_snts: int = -1, **kwargs)[source]

Bases: Dataset

Methods

convert_to_record(idx)

Convert the dataset sample at the given index to a training record.

load_data_csv()

Load the CSV dataset content and store it in the data property. Each row has six fields: utt_id (or audio_id), audio duration, wav file path, segment start point, segment stop point, and utterance label.

load_speaker_to_label()

Load the utterance label map content.

convert_to_record(idx: int)[source]

Convert the dataset sample at the given index to a training record.

Args:

idx (int): index of the requested sample in the dataset

load_data_csv()[source]

Load the CSV dataset content and store it in the data property. Each row has six fields: utt_id (or audio_id), audio duration, wav file path, segment start point, segment stop point, and utterance label. Note that during training, every utterance label must map to an integer id in label2id_path.

Returns:

list: the CSV records, each stored as a meta_info instance
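
The loading step described above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the header row, the column order (taken from the meta_info signature), the sample paths, and the names MetaInfo and parse_csv_records are all assumptions made for illustration.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for paddlespeech.vector.io.dataset.meta_info.
MetaInfo = namedtuple(
    "MetaInfo", ["utt_id", "duration", "wav", "start", "stop", "label"])


def parse_csv_records(text, has_header=True):
    """Parse CSV text into MetaInfo records.

    Assumes six comma-separated columns in the order
    utt_id, duration, wav, start, stop, label.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]  # assumed: the first row is a header
    records = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        utt_id, duration, wav, start, stop, label = row
        records.append(
            MetaInfo(utt_id, float(duration), wav, int(start), int(stop), label))
    return records


# Made-up sample content; the paths and labels are placeholders.
SAMPLE = """utt_id,duration,wav,start,stop,label
spk1-utt1-0,3.0,/data/spk1/utt1.wav,0,48000,spk1
spk2-utt7-1,2.5,/data/spk2/utt7.wav,16000,56000,spk2
"""

records = parse_csv_records(SAMPLE)
for rec in records:
    print(rec.utt_id, rec.duration, rec.label)
```

Converting duration, start, and stop to numeric types at load time keeps later segment slicing free of string handling.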

load_speaker_to_label()[source]

Load the utterance label map content. In the vector domain, the utterance label is called the speaker label. In speaker verification it is the actual speaker label; in language identification it is the language label.
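
As a rough sketch of what a label map provides, the snippet below builds a label-to-id dictionary from text lines. The file layout (one whitespace-separated "label id" pair per line) and the helper name parse_label2id are assumptions for illustration; check the actual file at label2id_path for its real format.

```python
def parse_label2id(lines):
    """Build a {label: integer id} mapping.

    Assumes each non-empty line holds a label followed by an integer id,
    separated by whitespace (a hypothetical layout).
    """
    label2id = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace run so the id is always the final token.
        label, idx = line.rsplit(maxsplit=1)
        label2id[label] = int(idx)
    return label2id


# Made-up sample lines.
SAMPLE_LINES = ["spk1 0", "spk2 1", ""]
label2id = parse_label2id(SAMPLE_LINES)
print(label2id)
```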

class paddlespeech.vector.io.dataset.meta_info(utt_id: str, duration: float, wav: str, start: int, stop: int, label: str)[source]

Bases: object

The audio meta info for one record in the vector CSVDataset.

Args:

utt_id (str): the utterance segment name
duration (float): the utterance segment duration
wav (str): the utterance file path
start (int): start point in the original wav file
stop (int): stop point in the original wav file
label (str): the utterance segment's label

duration: float
label: str
start: int
stop: int
utt_id: str
wav: str
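
To show how the six fields fit together, here is a small sketch using a stand-in dataclass with the same field names and types. MetaInfoSketch is hypothetical and is not the library class, and treating start and stop as positions whose difference gives the segment length is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MetaInfoSketch:
    """Hypothetical stand-in mirroring the meta_info fields."""
    utt_id: str
    duration: float
    wav: str
    start: int
    stop: int
    label: str


# Made-up values for illustration.
seg = MetaInfoSketch(
    utt_id="spk1-utt1-0",
    duration=3.0,
    wav="/data/spk1/utt1.wav",
    start=0,
    stop=48000,
    label="spk1",
)

# Assumed interpretation: the segment spans [start, stop) in the wav file.
num_points = seg.stop - seg.start
print(seg.utt_id, num_points, seg.label)
```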