paddlespeech.vector.cluster.diarization module

This module contains basic functions used for speaker diarization. It has an optional dependency on the open-source sklearn library. A few sklearn functions are modified in this module as required.

class paddlespeech.vector.cluster.diarization.EmbeddingMeta(segset=None, modelset=None, stats=None)[source]

Bases: object

A utility class to pack deep embeddings and meta-information in one object.

Methods

align_models(model_list)

Align models of the current EmbeddingMeta to match a list of models

align_segments(segment_list)

Align segments of the current EmbeddingMeta to match a list of segments.

center_stats(mu)

Center first order statistics.

get_mean_stats()

Return the mean of first order statistics.

get_model_stat0(mod_id)

Return zero-order statistics of a given model

get_model_stats(mod_id)

Return first-order statistics of a given model.

get_total_covariance_stats()

Compute and return the total covariance matrix of the first-order statistics.

norm_stats()

Divide all first-order statistics by their Euclidean norm.

rotate_stats(R)

Rotate first-order statistics by a right-product.

sum_stat_per_model()

Sum the zero- and first-order statistics per model and store them in a new EmbeddingMeta.

whiten_stats(mu, sigma[, isSqrInvSigma])

Whiten first-order statistics. If sigma.ndim == 1, sigma is treated as a diagonal covariance.

align_models(model_list)[source]

Align the models of the current EmbeddingMeta to match a list of models provided as an input parameter. The size of the EmbeddingMeta might be reduced to match the input list of models.

align_segments(segment_list)[source]

Align the segments of the current EmbeddingMeta to match a list of segments provided as an input parameter. The size of the EmbeddingMeta might be reduced to match the input list of segments.

center_stats(mu)[source]

Center first order statistics.

get_mean_stats()[source]

Return the mean of first order statistics.

get_model_stat0(mod_id)[source]

Return zero-order statistics of a given model

get_model_stats(mod_id)[source]

Return first-order statistics of a given model.

get_total_covariance_stats()[source]

Compute and return the total covariance matrix of the first-order statistics.
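As a rough illustration of what a total covariance over first-order statistics looks like, here is a minimal NumPy sketch. It assumes the statistics are stored as a 2-D array with one row per segment and that the covariance is normalized by the number of rows; both are assumptions, and `total_covariance` is a hypothetical helper, not the library method.

```python
import numpy as np

def total_covariance(stat1):
    """Covariance of row-wise statistics, normalized by the number of rows (assumption)."""
    centered = stat1 - stat1.mean(axis=0)  # remove the mean of each dimension
    return centered.T @ centered / stat1.shape[0]

# Three segments with two-dimensional first-order statistics.
stat1 = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
cov = total_covariance(stat1)
```

The result is a square matrix whose side equals the embedding dimension.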

norm_stats()[source]

Divide all first-order statistics by their Euclidean norm.
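The normalization described above can be sketched in a few lines of NumPy. The function name `length_normalize` and the small `eps` guard against zero vectors are assumptions made for this illustration only.

```python
import numpy as np

def length_normalize(stat1, eps=1e-12):
    """Divide each row by its Euclidean norm (eps avoids division by zero)."""
    norms = np.linalg.norm(stat1, axis=1, keepdims=True)
    return stat1 / np.maximum(norms, eps)

normalized = length_normalize(np.array([[3.0, 4.0], [0.0, 2.0]]))
```

After this step every non-zero row has unit length, which is the usual preparation for cosine scoring.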

rotate_stats(R)[source]

Rotate first-order statistics by a right-product.

sum_stat_per_model()[source]

Sum the zero- and first-order statistics per model and store them in a new EmbeddingMeta. Returns an EmbeddingMeta object with the statistics summed per model and a numpy array with session_per_model.
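The per-model summation can be pictured as a group-by over model identifiers. The sketch below works on plain NumPy arrays instead of an EmbeddingMeta object; the function `sum_per_model` and its argument layout (one row per session) are hypothetical.

```python
import numpy as np

def sum_per_model(modelset, stat0, stat1):
    """Sum statistics over all sessions that share a model identifier."""
    models, inverse = np.unique(modelset, return_inverse=True)
    summed0 = np.zeros((len(models), stat0.shape[1]))
    summed1 = np.zeros((len(models), stat1.shape[1]))
    np.add.at(summed0, inverse, stat0)  # unbuffered add handles repeated indices
    np.add.at(summed1, inverse, stat1)
    sessions_per_model = np.bincount(inverse, minlength=len(models))
    return models, summed0, summed1, sessions_per_model

modelset = np.array(["spk1", "spk2", "spk1"])
stat0 = np.ones((3, 1))
stat1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
models, summed0, summed1, sessions = sum_per_model(modelset, stat0, stat1)
```

Here `sessions` plays the role of the session count per model mentioned in the description.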

whiten_stats(mu, sigma, isSqrInvSigma=False)[source]

Whiten first-order statistics. The shape of sigma selects the case: if sigma.ndim == 1, a diagonal covariance; if sigma.ndim == 2, a single Gaussian with full covariance; if sigma.ndim == 3, a full-covariance UBM.
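To make the first two cases concrete, the following NumPy sketch whitens row-wise statistics with either a diagonal or a full covariance. It is an illustration under assumptions: `whiten` is a hypothetical helper, the full-covariance case uses a symmetric inverse square root obtained from an eigendecomposition, and the sigma.ndim == 3 case and the isSqrInvSigma flag are not modeled.

```python
import numpy as np

def whiten(stat1, mu, sigma):
    """Center by mu, then scale by the inverse square root of sigma."""
    centered = stat1 - mu
    if sigma.ndim == 1:
        # Diagonal covariance: scale each dimension independently.
        return centered / np.sqrt(sigma)
    if sigma.ndim == 2:
        # Full covariance: symmetric inverse square root via eigendecomposition.
        eig_vals, eig_vecs = np.linalg.eigh(sigma)
        inv_sqrt = eig_vecs @ np.diag(1.0 / np.sqrt(eig_vals)) @ eig_vecs.T
        return centered @ inv_sqrt
    raise ValueError("only sigma.ndim in (1, 2) is covered by this sketch")

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.5]])
data = rng.normal(size=(200, 3)) @ mixing
mu = data.mean(axis=0)
full_sigma = (data - mu).T @ (data - mu) / data.shape[0]

white_full = whiten(data, mu, full_sigma)
white_diag = whiten(data, mu, np.diag(full_sigma).copy())
```

With the full covariance, the whitened data has an identity covariance; with the diagonal one, only the per-dimension variances become one.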

class paddlespeech.vector.cluster.diarization.SpecClustUnorm(min_num_spkrs=2, max_num_spkrs=10)[source]

Bases: object

This class implements spectral clustering with an unnormalized affinity matrix. It is useful when the affinity matrix is based on cosine similarities.

Methods

cluster_embs(emb, k)

Clusters the embeddings using kmeans.

do_spec_clust(X, k_oracle, p_val)

Function for spectral clustering.

get_eigen_gaps(eig_vals)

Returns the differences (gaps) between adjacent eigenvalues.

get_laplacian(M)

Returns the unnormalized Laplacian for the given affinity matrix.

get_sim_mat(X)

Returns the similarity matrix based on cosine similarities.

get_spec_embs(L[, k_oracle])

Returns spectral embeddings and estimates the number of speakers using maximum Eigen gap.

p_pruning(A, pval)

Refine the affinity matrix by zeroing less similar values.

cluster_embs(emb, k)[source]

Clusters the embeddings using kmeans.

Returns:
self.labels_ : self

Labels for each sample embedding.

do_spec_clust(X, k_oracle, p_val)[source]

Function for spectral clustering.

get_eigen_gaps(eig_vals)[source]

Returns the differences (gaps) between adjacent eigenvalues.

Returns:
eig_vals_gap_list : list

List of differences (gaps) between adjacent eigenvalues.
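A minimal pure-Python sketch of the gap computation follows. It assumes the eigenvalues are already sorted in ascending order; `eigen_gaps` is a hypothetical stand-in for the method, and the input values are made up for illustration.

```python
def eigen_gaps(eig_vals):
    """Differences between each eigenvalue and the next one."""
    return [
        float(eig_vals[i + 1]) - float(eig_vals[i])
        for i in range(len(eig_vals) - 1)
    ]

gaps = eigen_gaps([0.0, 0.01, 0.02, 0.9, 1.1])

# Position of the largest gap; three small eigenvalues precede it.
largest = max(range(len(gaps)), key=gaps.__getitem__)
```

A large gap after the first few small eigenvalues is the usual hint for the number of clusters.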

get_laplacian(M)[source]

Returns the unnormalized Laplacian for the given affinity matrix.

Returns:
L : array

(n_samples, n_samples) Laplacian matrix.
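The unnormalized Laplacian is commonly defined as L = D - M, where D is the diagonal degree matrix. The sketch below follows that definition; zeroing the diagonal before computing degrees is an assumption of this example, and `unnormalized_laplacian` is a hypothetical name.

```python
import numpy as np

def unnormalized_laplacian(M):
    """L = D - M with D the diagonal matrix of row sums (self-loops removed)."""
    M = np.array(M, dtype=float)        # work on a copy
    np.fill_diagonal(M, 0.0)            # ignore self-similarity (assumption)
    D = np.diag(np.abs(M).sum(axis=1))  # degree matrix
    return D - M

M = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
L = unnormalized_laplacian(M)
```

For a non-negative symmetric affinity matrix, every row of the result sums to zero.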

get_sim_mat(X)[source]

Returns the similarity matrix based on cosine similarities.

Returns:
M : array

(n_samples, n_samples). Similarity matrix with cosine similarities between each pair of embeddings.
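For readers unfamiliar with the construction, a cosine similarity matrix can be sketched with NumPy as follows. The helper name `cosine_similarity_matrix` and the `eps` guard are assumptions for this illustration; the library method may rely on a different routine.

```python
import numpy as np

def cosine_similarity_matrix(X, eps=1e-12):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, eps)  # length-normalize each embedding
    return unit @ unit.T

X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
sim = cosine_similarity_matrix(X)
```

Embeddings that point in the same direction get a similarity of one regardless of their length.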

get_spec_embs(L, k_oracle=4)[source]

Returns spectral embeddings and estimates the number of speakers using maximum Eigen gap.

Returns:
emb : array (n_samples, n_components)

Spectral embedding for each sample with n Eigen components.

num_of_spk : int

Estimated number of speakers. If the condition is set to the oracle number of speakers, k_oracle is returned instead.
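The eigengap heuristic can be sketched as follows. This is an illustration under assumptions, not the library code: `spectral_embeddings` is a hypothetical function, `k_oracle=None` is used here to mean "estimate the number of speakers", and the bounds `min_k` and `max_k` mirror the min_num_spkrs and max_num_spkrs constructor arguments only loosely.

```python
import numpy as np

def spectral_embeddings(L, k_oracle=None, min_k=2, max_k=10):
    """Return the first k eigenvectors of L and the chosen k."""
    eig_vals, eig_vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    if k_oracle is not None:
        k = int(k_oracle)
    else:
        gaps = np.diff(eig_vals[: max_k + 1])
        # The largest gap after the i-th eigenvalue suggests i clusters.
        k = int(np.argmax(gaps)) + 1 if gaps.size else min_k
        k = min(max(k, min_k), max_k)
    return eig_vecs[:, :k], k

# Toy affinity: three fully connected groups of three samples each.
A = np.kron(np.eye(3), np.ones((3, 3))) - np.eye(9)
L = np.diag(A.sum(axis=1)) - A

emb, k = spectral_embeddings(L)
emb_oracle, k_fixed = spectral_embeddings(L, k_oracle=4)
```

In the toy example the Laplacian has three near-zero eigenvalues followed by a clear gap, so the estimate matches the number of groups.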

p_pruning(A, pval)[source]

Refine the affinity matrix by zeroing less similar values.

Returns:
A : array

(n_samples, n_samples). Pruned affinity matrix based on pval.
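One plausible reading of this pruning step is to keep only the largest entries in each row and set the rest to zero. The sketch below assumes that pval is the fraction of entries retained per row; that interpretation and the name `p_prune` are assumptions of this example.

```python
import numpy as np

def p_prune(A, pval):
    """Zero the (1 - pval) fraction of smallest entries in every row."""
    A = np.array(A, dtype=float)  # work on a copy
    n_zero = int((1.0 - pval) * A.shape[0])
    for i in range(A.shape[0]):
        lowest = np.argsort(A[i])[:n_zero]  # indices of the least similar entries
        A[i, lowest] = 0.0
    return A

A = [[1.0, 0.9, 0.1, 0.2],
     [0.9, 1.0, 0.3, 0.1],
     [0.1, 0.3, 1.0, 0.8],
     [0.2, 0.1, 0.8, 1.0]]
pruned = p_prune(A, pval=0.5)
```

Row-wise pruning can make the matrix asymmetric, so implementations often symmetrize the result before computing the Laplacian.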

class paddlespeech.vector.cluster.diarization.SpecCluster(n_clusters=8, *, eigen_solver=None, n_components=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=None, verbose=False)[source]

Bases: SpectralClustering

Methods

fit(X[, y])

Perform spectral clustering from features, or affinity matrix.

fit_predict(X[, y])

Perform spectral clustering on X and return cluster labels.

get_params([deep])

Get parameters for this estimator.

perform_sc(X[, n_neighbors])

Performs spectral clustering using sklearn on embeddings.

set_params(**params)

Set the parameters of this estimator.

perform_sc(X, n_neighbors=10)[source]

Performs spectral clustering using sklearn on embeddings.

paddlespeech.vector.cluster.diarization.distribute_overlap(lol)[source]

Distributes the overlapped speech equally among the adjacent segments with different speakers.

Returns:
new_lol : list of list

It contains the overlapped part equally divided among the adjacent segments with different speaker IDs.
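The idea of splitting an overlap at its midpoint can be sketched in plain Python. The segment layout `[rec_id, start, end, speaker]` is an assumption made for this example, and `split_overlaps` is a hypothetical helper rather than the function documented here.

```python
def split_overlaps(segments):
    """Split each overlap between adjacent segments of different speakers in half.

    Segments are assumed sorted by start time and laid out as
    [rec_id, start, end, speaker].
    """
    out = [list(seg) for seg in segments]  # copy so the input is untouched
    for prev, cur in zip(out, out[1:]):
        overlap = prev[2] - cur[1]
        if prev[3] != cur[3] and overlap > 0:
            midpoint = cur[1] + overlap / 2.0
            prev[2] = midpoint  # earlier segment ends at the midpoint
            cur[1] = midpoint   # later segment starts at the midpoint
    return out

segments = [["rec1", 0.0, 2.0, "A"], ["rec1", 1.0, 3.0, "B"]]
result = split_overlaps(segments)
```

Each speaker receives half of the contested region, so the output segments no longer overlap.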

paddlespeech.vector.cluster.diarization.do_AHC(diary_obj, out_rttm_file, rec_id, k_oracle=4, p_val=0.3)[source]

Performs Agglomerative Hierarchical Clustering on embeddings.

paddlespeech.vector.cluster.diarization.do_spec_clustering(diary_obj, out_rttm_file, rec_id, k, pval, affinity_type, n_neighbors)[source]

Performs spectral clustering on embeddings. This function calls specific clustering algorithms as per affinity.

paddlespeech.vector.cluster.diarization.get_oracle_num_spkrs(rec_id, spkr_info)[source]

Returns the actual number of speakers in a recording from the ground truth. This can be used when the condition is the oracle number of speakers.

paddlespeech.vector.cluster.diarization.is_overlapped(end1, start2)[source]

Returns True if segments are overlapping.

Returns:
overlapped : bool

True if the segments overlap, else False.

paddlespeech.vector.cluster.diarization.merge_ssegs_same_speaker(lol)[source]

Merge adjacent sub-segs from the same speaker.

Returns:
new_lol : list of list

new_lol contains adjacent segments merged from the same speaker ID.
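A minimal sketch of such a merge is given below. It assumes segments are sorted by start time, use the layout `[rec_id, start, end, speaker]`, and are merged only when they touch or overlap; the helper `merge_same_speaker` and these conditions are assumptions of the example.

```python
def merge_same_speaker(segments):
    """Merge consecutive segments of one speaker that touch or overlap."""
    merged = []
    for rec_id, start, end, speaker in segments:
        last = merged[-1] if merged else None
        if last is not None and last[3] == speaker and start <= last[2]:
            last[2] = max(last[2], end)  # extend the previous segment
        else:
            merged.append([rec_id, start, end, speaker])
    return merged

segments = [
    ["rec1", 0.0, 1.0, "A"],
    ["rec1", 1.0, 2.0, "A"],
    ["rec1", 2.0, 3.0, "B"],
    ["rec1", 3.5, 4.0, "B"],
]
merged = merge_same_speaker(segments)
```

The two "A" segments become one, while the "B" segments stay separate because a gap lies between them.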

paddlespeech.vector.cluster.diarization.read_rttm(rttm_file_path)[source]

Reads and returns RTTM in list format.

Returns:
rttm : list

List containing rows of RTTM file.
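Reading an RTTM file into a list of rows can be sketched with the standard library alone. The helper `read_rttm_rows` is hypothetical, and returning each row as a stripped string (skipping blank lines) is an assumption about the row format.

```python
import os
import tempfile

def read_rttm_rows(path):
    """Return the non-empty lines of an RTTM file as a list of strings."""
    rows = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(line)
    return rows

# Write a tiny two-line RTTM file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.rttm")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("SPEAKER rec1 0 0.00 1.50 <NA> <NA> spk0 <NA> <NA>\n")
        handle.write("SPEAKER rec1 0 1.50 2.00 <NA> <NA> spk1 <NA> <NA>\n")
    rows = read_rttm_rows(path)

# The eighth whitespace-separated field of a SPEAKER row is the speaker name.
second_speaker = rows[1].split()[7]
```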

paddlespeech.vector.cluster.diarization.spectral_clustering(affinity, n_clusters=8, n_components=None, random_state=None, n_init=10)[source]

Performs spectral clustering.

Returns:
labels : array

Cluster label for each sample.

paddlespeech.vector.cluster.diarization.spectral_embedding(adjacency, n_components=8, norm_laplacian=True, drop_first=True)[source]

Returns spectral embeddings.

Returns:
embedding : array

Spectral embeddings for each sample.

paddlespeech.vector.cluster.diarization.write_ders_file(ref_rttm, DER, out_der_file)[source]

Write the final DERs for individual recordings.

paddlespeech.vector.cluster.diarization.write_rttm(segs_list, out_rttm_file)[source]

Writes the segment list in RTTM format (a standard NIST format).
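An RTTM SPEAKER row has ten whitespace-separated fields: type, file ID, channel, onset, duration, and five further fields of which only the speaker name is usually filled in. The sketch below formats one such row; `format_rttm_line`, the fixed channel value and the three-decimal precision are assumptions of this example.

```python
import io

def format_rttm_line(rec_id, start, end, speaker, channel=0):
    """Format one segment as an RTTM SPEAKER row (onset and duration in seconds)."""
    duration = end - start
    return (
        f"SPEAKER {rec_id} {channel} {start:.3f} {duration:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )

# Write two segments to an in-memory buffer instead of a real file.
buffer = io.StringIO()
for segment in [("rec1", 0.0, 1.5, "spk0"), ("rec1", 1.5, 3.5, "spk1")]:
    buffer.write(format_rttm_line(*segment) + "\n")
lines = buffer.getvalue().splitlines()
```

Note that RTTM stores a duration rather than an end time, so the end of each segment has to be converted before writing.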