paddlespeech.vector.cluster.diarization module

This module contains basic functions used for speaker diarization. It has an optional dependency on the open-source sklearn library, and a few sklearn functions are modified here as required.

class paddlespeech.vector.cluster.diarization.EmbeddingMeta(segset=None, modelset=None, stats=None)[source]

Bases: object

A utility class to pack deep embeddings and meta-information in one object.

Methods

`align_models`(model_list)	Align models of the current EmbeddingMeta to match a list of models
`align_segments`(segment_list)	Align segments of the current EmbeddingMeta to match a list of segments
`center_stats`(mu)	Center first-order statistics.
`get_mean_stats`()	Return the mean of first-order statistics.
`get_model_stat0`(mod_id)	Return zero-order statistics of a given model
`get_model_stats`(mod_id)	Return first-order statistics of a given model.
`get_total_covariance_stats`()	Compute and return the total covariance matrix of the first-order statistics.
`norm_stats`()	Divide all first-order statistics by their Euclidean norm.
`rotate_stats`(R)	Rotate first-order statistics by a right-product.
`sum_stat_per_model`()	Sum the zero- and first-order statistics per model and store them in a new EmbeddingMeta.
`whiten_stats`(mu, sigma[, isSqrInvSigma])	Whiten first-order statistics. If sigma.ndim == 1, sigma is treated as a diagonal covariance.

align_models(model_list)[source]

Align models of the current EmbeddingMeta to match a list of models provided as an input parameter. The size of the object might be reduced to match the input list of models.

align_segments(segment_list)[source]

Align segments of the current EmbeddingMeta to match a list of segments provided as an input parameter. The size of the object might be reduced to match the input list of segments.

center_stats(mu)[source]: Center first-order statistics.

get_mean_stats()[source]: Return the mean of first-order statistics.

get_model_stat0(mod_id)[source]: Return zero-order statistics of a given model.

get_model_stats(mod_id)[source]: Return first-order statistics of a given model.

get_total_covariance_stats()[source]: Compute and return the total covariance matrix of the first-order statistics.

norm_stats()[source]: Divide all first-order statistics by their Euclidean norm.

rotate_stats(R)[source]: Rotate first-order statistics by a right-product.
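As a rough illustration of what centering, length normalization, and rotation by a right-product mean for first-order statistics, the sketch below applies them to a small list of row vectors in plain Python. The helper names (`center`, `length_norm`, `rotate`) and the data are hypothetical stand-ins for the statistics an EmbeddingMeta holds; this is not the library's implementation.

```python
import math

def center(stats, mu):
    """Subtract the mean vector mu from every row."""
    return [[x - m for x, m in zip(row, mu)] for row in stats]

def length_norm(stats):
    """Divide every row by its Euclidean norm (all-zero rows are left as-is)."""
    out = []
    for row in stats:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def rotate(stats, R):
    """Right-multiply every row by the matrix R, i.e. stats @ R."""
    cols = list(zip(*R))
    return [[sum(x * r for x, r in zip(row, col)) for col in cols]
            for row in stats]

stats = [[3.0, 4.0], [1.0, 0.0]]
mu = [2.0, 2.0]                  # mean of the two rows
centered = center(stats, mu)     # [[1.0, 2.0], [-1.0, -2.0]]
unit = length_norm(stats)        # every row now has norm 1
swapped = rotate(stats, [[0.0, 1.0], [1.0, 0.0]])  # this R swaps the coordinates
```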

sum_stat_per_model()[source]: Sum the zero- and first-order statistics per model and store them in a new EmbeddingMeta. Returns an EmbeddingMeta object with the statistics summed per model and a numpy array with session_per_model.

whiten_stats(mu, sigma, isSqrInvSigma=False)[source]: Whiten first-order statistics. If sigma.ndim == 1, sigma is treated as a diagonal covariance. If sigma.ndim == 2, it is treated as the full covariance of a single Gaussian. If sigma.ndim == 3, it is treated as the full covariances of a UBM.
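For the diagonal-covariance case, whitening can be pictured as centering each dimension and dividing by its standard deviation. The sketch below shows that idea in plain Python, assuming `sigma` holds per-dimension variances; `whiten_diag` and the sample values are hypothetical, and the full-covariance cases are not covered.

```python
import math

def whiten_diag(stats, mu, sigma):
    """Whiten rows using a diagonal covariance given as a vector of variances."""
    return [[(x - m) / math.sqrt(s) for x, m, s in zip(row, mu, sigma)]
            for row in stats]

stats = [[2.0, 10.0], [4.0, 30.0]]
mu = [3.0, 20.0]
sigma = [1.0, 100.0]  # assumed to be variances, one per dimension
white = whiten_diag(stats, mu, sigma)  # [[-1.0, -1.0], [1.0, 1.0]]
```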

class paddlespeech.vector.cluster.diarization.SpecClustUnorm(min_num_spkrs=2, max_num_spkrs=10)[source]

Bases: object

This class implements spectral clustering with an unnormalized affinity matrix. It is useful when the affinity matrix is based on cosine similarities.

Methods

`cluster_embs`(emb, k)	Clusters the embeddings using kmeans.
`do_spec_clust`(X, k_oracle, p_val)	Function for spectral clustering.
`get_eigen_gaps`(eig_vals)	Returns the differences (gaps) between the eigenvalues.
`get_laplacian`(M)	Returns the unnormalized Laplacian for the given affinity matrix.
`get_sim_mat`(X)	Returns the similarity matrix based on cosine similarities.
`get_spec_embs`(L[, k_oracle])	Returns spectral embeddings and estimates the number of speakers using the maximum eigengap.
`p_pruning`(A, pval)	Refine the affinity matrix by zeroing less similar values.

cluster_embs(emb, k)[source]

Clusters the embeddings using kmeans.

Returns:

self.labels_ (self): Labels for each sample embedding.

do_spec_clust(X, k_oracle, p_val)[source]: Function for spectral clustering.

get_eigen_gaps(eig_vals)[source]

Returns the differences (gaps) between the eigenvalues.

Returns:

eig_vals_gap_list (list): List of differences (gaps) between adjacent eigenvalues.
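The gap computation amounts to subtracting each eigenvalue from its successor. A minimal plain-Python sketch, with the hypothetical helper `eigen_gaps` and made-up eigenvalues, could look like this:

```python
def eigen_gaps(eig_vals):
    """Return the differences between adjacent eigenvalues, in order."""
    return [float(b) - float(a) for a, b in zip(eig_vals, eig_vals[1:])]

gaps = eigen_gaps([0.0, 0.25, 1.0, 1.25])  # [0.25, 0.75, 0.25]
```

The result has one element fewer than the input, since n eigenvalues have n - 1 adjacent pairs.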

get_laplacian(M)[source]

Returns the unnormalized Laplacian for the given affinity matrix.

Returns:

L (array): (n_samples, n_samples) Laplacian matrix.
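The unnormalized Laplacian is conventionally defined as L = D - M, where D is the diagonal degree matrix of the affinity matrix M. The sketch below computes it in plain Python; zeroing the diagonal before summing the degrees is an assumption of this sketch, and `unnormalized_laplacian` is a hypothetical name rather than the method itself.

```python
def unnormalized_laplacian(M):
    """Return L = D - A, where A is M with its diagonal zeroed."""
    n = len(M)
    # Assumption: self-similarities are ignored when computing degrees.
    A = [[0.0 if i == j else float(M[i][j]) for j in range(n)]
         for i in range(n)]
    degrees = [sum(abs(v) for v in row) for row in A]
    return [[(degrees[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

M = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.25],
     [0.0, 0.25, 1.0]]
L = unnormalized_laplacian(M)
# [[0.5, -0.5, 0.0], [-0.5, 0.75, -0.25], [0.0, -0.25, 0.25]]
```

Each row of L sums to zero, which is a quick sanity check for this construction.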

get_sim_mat(X)[source]

Returns the similarity matrix based on cosine similarities.

Returns:

M (array): (n_samples, n_samples) similarity matrix with cosine similarities between each pair of embeddings.
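Cosine similarity between two embeddings is their dot product divided by the product of their Euclidean norms. The following plain-Python sketch builds the full pairwise matrix for a few toy vectors; `cosine_sim_matrix` is a hypothetical helper and assumes no embedding is all zeros.

```python
import math

def cosine_sim_matrix(X):
    """Return the (n, n) matrix of cosine similarities between rows of X."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in X]
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) / (norms[i] * norms[j])
             for j in range(n)]
            for i in range(n)]

X = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
M = cosine_sim_matrix(X)
# Rows 0 and 2 point the same way (similarity 1); row 1 is orthogonal to both (0).
```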

get_spec_embs(L, k_oracle=4)[source]

Returns spectral embeddings and estimates the number of speakers using the maximum eigengap.

Returns:

emb (array, (n_samples, n_components)): Spectral embedding for each sample with n eigen components.
num_of_spk (int): Estimated number of speakers. If the oracle number of speakers is given, k_oracle is returned.
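The maximum-eigengap heuristic picks the number of clusters k at the largest jump between consecutive Laplacian eigenvalues sorted in ascending order. The sketch below shows one common form of that rule on a given list of eigenvalues; `estimate_num_spkrs`, its `k_oracle=None` default, and its clamping to the min/max bounds are assumptions of this sketch, and the library's exact indexing may differ.

```python
def estimate_num_spkrs(eig_vals, min_num_spkrs=2, max_num_spkrs=10,
                       k_oracle=None):
    """Estimate the number of speakers from Laplacian eigenvalues."""
    if k_oracle is not None:
        return k_oracle  # oracle condition: trust the given speaker count
    # Keep enough eigenvalues to allow k up to max_num_spkrs.
    vals = sorted(eig_vals)[:max_num_spkrs + 1]
    gaps = [b - a for a, b in zip(vals, vals[1:])]
    if not gaps:
        return min_num_spkrs
    # The largest gap after the k-th smallest eigenvalue suggests k clusters.
    k = gaps.index(max(gaps)) + 1
    return max(min_num_spkrs, min(k, max_num_spkrs))

# Three near-zero eigenvalues followed by a jump suggest three speakers.
num_spk = estimate_num_spkrs([0.0, 0.01, 0.02, 0.9, 1.1])  # 3
```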

p_pruning(A, pval)[source]

Refine the affinity matrix by zeroing less similar values.

Returns:

A (array): (n_samples, n_samples) pruned affinity matrix based on pval.
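One way to read this pruning step is: in every row, keep only the largest fraction `pval` of similarities and set the rest to zero. The plain-Python sketch below follows that reading; the interpretation of `pval` as the fraction kept, and the helper name `p_prune`, are assumptions rather than a description of the library's code.

```python
def p_prune(A, pval):
    """Zero the smallest (1 - pval) fraction of entries in each row of A."""
    n = len(A)
    n_zero = int((1 - pval) * n)  # number of entries to zero per row
    pruned = [list(row) for row in A]  # work on a copy
    for row in pruned:
        order = sorted(range(n), key=lambda j: row[j])  # ascending similarity
        for j in order[:n_zero]:
            row[j] = 0.0
    return pruned

A = [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.3, 0.2],
     [0.2, 0.3, 1.0, 0.8],
     [0.1, 0.2, 0.8, 1.0]]
pruned = p_prune(A, 0.5)
# [[1.0, 0.9, 0.0, 0.0], [0.9, 1.0, 0.0, 0.0],
#  [0.0, 0.0, 1.0, 0.8], [0.0, 0.0, 0.8, 1.0]]
```

Row-wise pruning does not preserve symmetry in general, so a caller would typically symmetrize the result (for example by averaging it with its transpose) before computing a Laplacian.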

class paddlespeech.vector.cluster.diarization.SpecCluster(n_clusters=8, *, eigen_solver=None, n_components=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=None, verbose=False)[source]

Bases: SpectralClustering

Methods

`fit`(X[, y])	Perform spectral clustering from features, or affinity matrix.
`fit_predict`(X[, y])	Perform spectral clustering on X and return cluster labels.
`get_params`([deep])	Get parameters for this estimator.
`perform_sc`(X[, n_neighbors])	Performs spectral clustering using sklearn on embeddings.
`set_params`(**params)	Set the parameters of this estimator.

perform_sc(X, n_neighbors=10)[source]: Performs spectral clustering using sklearn on embeddings.

paddlespeech.vector.cluster.diarization.distribute_overlap(lol)[source]

Distributes the overlapped speech equally among the adjacent segments with different speakers.

Returns:

new_lol (list of list): It contains the overlapped part equally divided among the adjacent segments with different speaker IDs.
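Splitting an overlap equally means moving the end of the earlier segment and the start of the later one to the midpoint of the overlapped region. The sketch below illustrates this in plain Python, assuming each segment is a list of the form `[rec_id, start, end, spkr_id]`; that layout and the name `split_overlaps` are hypothetical.

```python
def split_overlaps(segments):
    """Divide overlapped speech equally between adjacent different-speaker segments."""
    out = [list(seg) for seg in segments]  # copy so the input is untouched
    for prev, cur in zip(out, out[1:]):
        overlap = prev[2] - cur[1]
        if overlap > 0 and prev[3] != cur[3]:
            prev[2] -= overlap / 2.0  # pull the earlier segment's end back
            cur[1] += overlap / 2.0   # push the later segment's start forward
    return out

segs = [["rec1", 0.0, 3.0, "spk0"], ["rec1", 2.0, 5.0, "spk1"]]
result = split_overlaps(segs)
# [["rec1", 0.0, 2.5, "spk0"], ["rec1", 2.5, 5.0, "spk1"]]
```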

paddlespeech.vector.cluster.diarization.do_AHC(diary_obj, out_rttm_file, rec_id, k_oracle=4, p_val=0.3)[source]: Performs Agglomerative Hierarchical Clustering on embeddings.

paddlespeech.vector.cluster.diarization.do_spec_clustering(diary_obj, out_rttm_file, rec_id, k, pval, affinity_type, n_neighbors)[source]: Performs spectral clustering on embeddings. This function calls a specific clustering algorithm depending on the affinity type.

paddlespeech.vector.cluster.diarization.get_oracle_num_spkrs(rec_id, spkr_info)[source]: Returns the actual number of speakers in a recording from the ground truth. This can be used when the oracle number of speakers is assumed to be known.

paddlespeech.vector.cluster.diarization.is_overlapped(end1, start2)[source]

Returns True if the segments overlap.

Returns:

overlapped (bool): True if the segments overlap, else False.
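The check compares the end time of the first segment with the start time of the second. A minimal sketch follows; `segments_overlap` is a hypothetical name, and treating two segments that merely touch (`start2 == end1`) as non-overlapping is an assumption of this sketch that the library may handle differently.

```python
def segments_overlap(end1, start2):
    """Return True if the second segment starts before the first one ends."""
    return start2 < end1

overlapping = segments_overlap(5.5, 3.4)  # True: 3.4 falls before 5.5
disjoint = segments_overlap(5.5, 6.4)     # False: 6.4 falls after 5.5
```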

paddlespeech.vector.cluster.diarization.merge_ssegs_same_speaker(lol)[source]

Merge adjacent sub-segments from the same speaker.

Returns:

new_lol (list of list): new_lol contains adjacent segments merged from the same speaker ID.
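Merging can be sketched as a single pass that extends the previous segment whenever the next one has the same speaker and starts no later than the previous one ends. As before, the `[rec_id, start, end, spkr_id]` layout and the helper name `merge_same_speaker` are assumptions made for illustration.

```python
def merge_same_speaker(segments):
    """Merge adjacent or overlapping segments that share a speaker ID."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last is not None and last[3] == seg[3] and seg[1] <= last[2]:
            last[2] = max(last[2], seg[2])  # extend the previous segment
        else:
            merged.append(list(seg))        # start a new segment
    return merged

segs = [["rec1", 0.0, 2.0, "spk0"],
        ["rec1", 2.0, 4.0, "spk0"],
        ["rec1", 4.0, 6.0, "spk1"]]
new_segs = merge_same_speaker(segs)
# [["rec1", 0.0, 4.0, "spk0"], ["rec1", 4.0, 6.0, "spk1"]]
```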

paddlespeech.vector.cluster.diarization.read_rttm(rttm_file_path)[source]

Reads and returns RTTM in list format.

Returns:

rttm (list): List containing rows of the RTTM file.
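An RTTM file is a plain-text file with one whitespace-separated record per line. The sketch below reads rows from an in-memory stream and splits one row into fields; `read_rttm_rows` and the sample content are hypothetical, and the stream stands in for an opened file.

```python
import io

def read_rttm_rows(handle):
    """Return the non-empty lines of an RTTM stream, stripped of newlines."""
    return [line.strip() for line in handle if line.strip()]

# In-memory stand-in for an RTTM file (made-up content).
sample = io.StringIO(
    "SPEAKER rec1 0 0.000 2.500 <NA> <NA> spk0 <NA> <NA>\n"
    "SPEAKER rec1 0 2.500 2.500 <NA> <NA> spk1 <NA> <NA>\n"
)
rows = read_rttm_rows(sample)
fields = rows[0].split()
# In a SPEAKER record, fields[3] is the onset, fields[4] the duration,
# and fields[7] the speaker name.
```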

paddlespeech.vector.cluster.diarization.spectral_clustering(affinity, n_clusters=8, n_components=None, random_state=None, n_init=10)[source]

Performs spectral clustering.

Returns:

labels (array): Cluster label for each sample.

paddlespeech.vector.cluster.diarization.spectral_embedding(adjacency, n_components=8, norm_laplacian=True, drop_first=True)[source]

Returns spectral embeddings.

Returns:

embedding (array): Spectral embeddings for each sample.

paddlespeech.vector.cluster.diarization.write_ders_file(ref_rttm, DER, out_der_file)[source]: Writes the final DERs for individual recordings.

paddlespeech.vector.cluster.diarization.write_rttm(segs_list, out_rttm_file)[source]: Writes the segment list in RTTM format (a standard NIST format).
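In RTTM, each speaker turn is a `SPEAKER` record with ten space-separated fields: type, file ID, channel, onset, duration, and then orthography, speaker type, speaker name, confidence, and lookahead, with unused fields written as `<NA>`. The sketch below formats segments into such lines, assuming the `[rec_id, start, end, spkr_id]` layout and a channel value of `0`; `rttm_line` is a hypothetical helper, not the function documented above.

```python
def rttm_line(rec_id, start, end, spkr_id):
    """Format one segment as an RTTM SPEAKER record."""
    duration = end - start  # RTTM stores onset and duration, not end time
    return (f"SPEAKER {rec_id} 0 {start:.3f} {duration:.3f} "
            f"<NA> <NA> {spkr_id} <NA> <NA>")

segments = [["rec1", 0.0, 2.5, "spk0"], ["rec1", 2.5, 5.0, "spk1"]]
lines = [rttm_line(*seg) for seg in segments]
# lines[0] is "SPEAKER rec1 0 0.000 2.500 <NA> <NA> spk0 <NA> <NA>"
```

Writing the file is then a matter of joining `lines` with newlines and saving them to the output path.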