paddlespeech.vector.cluster.diarization module
This module contains basic functions used for speaker diarization. It has an optional dependency on the open-source sklearn library, and a few sklearn functions are modified here as required.
- class paddlespeech.vector.cluster.diarization.EmbeddingMeta(segset=None, modelset=None, stats=None)
Bases: object
A utility class to pack deep embeddings and meta-information in one object.
Methods
align_models
(model_list)Align models of the current EmbeddingMeta to match a list of models
align_segments
(segment_list)Align segments of the current EmbeddingMeta to match a list of segments
center_stats
(mu)Center first order statistics.
Return the mean of first order statistics.
get_model_stat0
(mod_id)Return zero-order statistics of a given model
get_model_stats
(mod_id)Return first-order statistics of a given model.
Compute and return the total covariance matrix of the first-order statistics.
Divide all first-order statistics by their Euclidean norm.
rotate_stats
(R)Rotate first-order statistics by a right-product.
Sum the zero- and first-order statistics per model and store them in a new EmbeddingMeta.
whiten_stats
(mu, sigma[, isSqrInvSigma])Whiten first-order statistics. If sigma.ndim == 1, sigma is treated as a diagonal covariance.
- align_models(model_list)
Align models of the current EmbeddingMeta to match a list of models provided as input parameter. The size of the StatServer might be reduced to match the input list of models.
- align_segments(segment_list)
Align segments of the current EmbeddingMeta to match a list of segments provided as input parameter. The size of the StatServer might be reduced to match the input list of segments.
- get_total_covariance_stats()
Compute and return the total covariance matrix of the first-order statistics.
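As a rough illustration of what a total covariance over first-order statistics looks like, the NumPy sketch below centers the rows and averages their outer products. The function name `total_covariance`, the `(n_segments, dim)` layout, and the division by the number of rows are assumptions of this sketch, not details confirmed by the PaddleSpeech source.

```python
import numpy as np


def total_covariance(stat1):
    """Covariance of the rows of stat1 around their mean (hypothetical helper).

    stat1 is assumed to have shape (n_segments, dim).
    """
    stat1 = np.asarray(stat1, dtype=float)
    # Remove the mean of the first-order statistics.
    centered = stat1 - stat1.mean(axis=0)
    # Assumption: normalise by the number of rows, not by n - 1.
    return centered.T @ centered / stat1.shape[0]
```

The result is a symmetric `(dim, dim)` matrix.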
- class paddlespeech.vector.cluster.diarization.SpecClustUnorm(min_num_spkrs=2, max_num_spkrs=10)
Bases: object
This class implements the spectral clustering with unnormalized affinity matrix. Useful when affinity matrix is based on cosine similarities.
Methods
cluster_embs
(emb, k)Clusters the embeddings using kmeans.
do_spec_clust
(X, k_oracle, p_val)Function for spectral clustering.
get_eigen_gaps
(eig_vals)Returns the difference (gaps) between the Eigen values.
Returns the un-normalized laplacian for the given affinity matrix.
get_sim_mat
(X)Returns the similarity matrix based on cosine similarities.
get_spec_embs
(L[, k_oracle])Returns spectral embeddings and estimates the number of speakers using maximum Eigen gap.
p_pruning
(A, pval)Refine the affinity matrix by zeroing less similar values.
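The pruning step listed above ("zeroing less similar values") can be pictured with the sketch below, which keeps only the largest entries of each row. The helper name `prune_affinity` and the rule "zero the smallest `(1 - pval)` fraction of each row" are assumptions made for illustration; the exact rule used by `p_pruning` should be checked in the source.

```python
import numpy as np


def prune_affinity(A, pval):
    """Zero the smallest (1 - pval) fraction of entries in every row of A.

    Hypothetical helper; A is a square affinity matrix.
    """
    A = np.array(A, dtype=float)  # copy so the caller's matrix is untouched
    n_zero = int((1.0 - pval) * A.shape[0])
    for i in range(A.shape[0]):
        # Indices of the n_zero least similar entries in row i.
        lowest = np.argsort(A[i])[:n_zero]
        A[i, lowest] = 0.0
    return A
```

Pruning row by row like this can leave the matrix asymmetric, so a symmetrization step may be needed before computing a Laplacian.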
- cluster_embs(emb, k)
Clusters the embeddings using kmeans.
- Returns:
- self.labels_ : self
Labels for each sample embedding.
- get_eigen_gaps(eig_vals)
Returns the difference (gaps) between the Eigen values.
- Returns:
- eig_vals_gap_list : list
List of differences (gaps) between adjacent Eigen values.
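A minimal sketch of "differences between adjacent Eigen values", assuming the input is already sorted in ascending order. The name `eigen_gaps` is hypothetical and this is not the library's code.

```python
def eigen_gaps(eig_vals):
    """Return the gap between each pair of adjacent eigenvalues."""
    return [
        float(eig_vals[i + 1]) - float(eig_vals[i])
        for i in range(len(eig_vals) - 1)
    ]
```

For `n` eigenvalues the list has `n - 1` entries, and the position of the largest gap is what an eigengap heuristic uses to guess the number of clusters.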
- get_laplacian(M)
Returns the un-normalized laplacian for the given affinity matrix.
- Returns:
- L : array
(n_samples, n_samples) Laplacian matrix.
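For reference, the textbook unnormalized Laplacian is `L = D - M`, where `D` is the diagonal degree matrix. The sketch below follows that definition. Zeroing the diagonal of `M` first and using absolute values for the degrees are assumptions of this sketch, and `unnormalized_laplacian` is a hypothetical name.

```python
import numpy as np


def unnormalized_laplacian(M):
    """Return L = D - M for a symmetric affinity matrix M."""
    M = np.array(M, dtype=float)  # copy; the input is left unchanged
    # Assumption: self-similarities are ignored.
    np.fill_diagonal(M, 0.0)
    # Degree of each node = sum of absolute affinities in its row.
    D = np.diag(np.abs(M).sum(axis=1))
    return D - M
```

With non-negative affinities every row of the result sums to zero, which is why the smallest eigenvalue of `L` is 0.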
- get_sim_mat(X)
Returns the similarity matrix based on cosine similarities.
- Returns:
- M : array
(n_samples, n_samples). Similarity matrix with cosine similarities between each pair of embeddings.
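A cosine similarity matrix of this shape can be sketched with plain NumPy as follows. The function name `cosine_similarity_matrix` and the small clipping constant are assumptions; the library may rely on sklearn instead.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X.

    X is assumed to have shape (n_samples, emb_dim).
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero embeddings.
    unit = X / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

The output is symmetric with ones on the diagonal for non-zero embeddings.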
- get_spec_embs(L, k_oracle=4)
Returns spectral embeddings and estimates the number of speakers using maximum Eigen gap.
- Returns:
- emb : array (n_samples, n_components)
Spectral embedding for each sample with n Eigen components.
- num_of_spk : int
Estimated number of speakers. If the condition is set to the oracle number of speakers then returns k_oracle.
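The maximum eigengap idea can be sketched as below: decompose the Laplacian, look for the largest jump between consecutive small eigenvalues, and keep that many eigenvectors. The function name, the `k_oracle=None` default, the `min_k`/`max_k` parameters, and the exact index convention are assumptions of this sketch and may differ from `get_spec_embs`.

```python
import numpy as np


def spectral_embeddings(L, k_oracle=None, min_k=2, max_k=10):
    """Return (embeddings, k) from a symmetric Laplacian L (hypothetical helper)."""
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eig_vals, eig_vecs = np.linalg.eigh(np.asarray(L, dtype=float))
    if k_oracle is not None:
        k = k_oracle  # oracle condition: trust the given speaker count
    else:
        # Gaps between the smallest eigenvalues; the largest gap after the
        # k-th eigenvalue suggests k clusters.
        gaps = np.diff(eig_vals[: max_k + 1])
        k = int(np.argmax(gaps)) + 1
        k = max(min_k, min(max_k, k))
    # One row per sample, one column per retained eigenvector.
    return eig_vecs[:, :k], k
```

The rows of the returned array can then be passed to a k-means step to obtain speaker labels.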
- class paddlespeech.vector.cluster.diarization.SpecCluster(n_clusters=8, *, eigen_solver=None, n_components=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=None, verbose=False)
Bases: SpectralClustering
Methods
fit
(X[, y])Perform spectral clustering from features, or affinity matrix.
fit_predict
(X[, y])Perform spectral clustering on X and return cluster labels.
get_params
([deep])Get parameters for this estimator.
perform_sc
(X[, n_neighbors])Performs spectral clustering using sklearn on embeddings.
set_params
(**params)Set the parameters of this estimator.
- paddlespeech.vector.cluster.diarization.distribute_overlap(lol)
Distributes the overlapped speech equally among the adjacent segments with different speakers.
- Returns:
- new_lol : list of list
It contains the overlapped part equally divided among the adjacent segments with different speaker IDs.
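One way to picture "equally divided" is to move both boundaries to the midpoint of the overlapping region. The sketch below does that, assuming each row has the hypothetical layout `[rec_id, start, end, speaker_id]` and that rows are sorted by start time. The actual structure of `lol` is not documented here.

```python
def split_overlaps(segments):
    """Split each overlap between adjacent different-speaker segments in half."""
    out = [list(seg) for seg in segments]  # work on copies
    for prev, nxt in zip(out, out[1:]):
        different_speaker = prev[3] != nxt[3]
        overlapping = nxt[1] < prev[2]
        if different_speaker and overlapping:
            # Both segments meet at the middle of the overlapped region.
            mid = (nxt[1] + prev[2]) / 2.0
            prev[2] = mid
            nxt[1] = mid
    return out
```

For example, segments `0.0-2.0` (speaker A) and `1.0-3.0` (speaker B) become `0.0-1.5` and `1.5-3.0`.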
- paddlespeech.vector.cluster.diarization.do_AHC(diary_obj, out_rttm_file, rec_id, k_oracle=4, p_val=0.3)
Performs Agglomerative Hierarchical Clustering on embeddings.
- paddlespeech.vector.cluster.diarization.do_spec_clustering(diary_obj, out_rttm_file, rec_id, k, pval, affinity_type, n_neighbors)
Performs spectral clustering on embeddings. This function calls specific clustering algorithms as per affinity.
- paddlespeech.vector.cluster.diarization.get_oracle_num_spkrs(rec_id, spkr_info)
Returns the actual number of speakers in a recording, taken from the ground truth. This can be used when the condition is the oracle number of speakers.
- paddlespeech.vector.cluster.diarization.is_overlapped(end1, start2)
Returns True if segments are overlapping.
- Returns:
- overlapped : bool
True if the segments overlap, else False.
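The check reduces to comparing the end of the first segment with the start of the second. In the sketch below, treating segments that merely touch (`start2 == end1`) as overlapping is an assumption, and `segments_overlap` is a hypothetical name; verify the boundary case against the source.

```python
def segments_overlap(end1, start2):
    """Return True if the second segment starts before the first one ends.

    Assumption: a shared boundary (start2 == end1) counts as overlapping.
    """
    return start2 <= end1
```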
- paddlespeech.vector.cluster.diarization.merge_ssegs_same_speaker(lol)
Merge adjacent sub-segments from the same speaker.
- Returns:
- new_lol : list of list
new_lol contains adjacent segments merged from the same speaker ID.
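Merging can be sketched as a single pass that extends the previous segment whenever the next one belongs to the same speaker and starts no later than the previous end. The row layout `[rec_id, start, end, speaker_id]` and the name `merge_same_speaker` are assumptions made for this example.

```python
def merge_same_speaker(segments):
    """Merge adjacent or overlapping segments that share a speaker ID.

    segments is assumed to be sorted by start time.
    """
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last is not None and last[3] == seg[3] and seg[1] <= last[2]:
            # Same speaker and no gap: extend the previous segment.
            last[2] = max(last[2], seg[2])
        else:
            merged.append(list(seg))
    return merged
```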
- paddlespeech.vector.cluster.diarization.read_rttm(rttm_file_path)
Reads and returns RTTM in list format.
- Returns:
- rttm : list
List containing rows of RTTM file.
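A sketch of reading RTTM rows into a list, plus a small parser for the standard `SPEAKER` row layout (type, file ID, channel, onset, duration, then the speaker name in the eighth field). Both helper names are hypothetical, and whether `read_rttm` returns raw strings or split fields is not stated here.

```python
def read_rttm_lines(rttm_file_path):
    """Return the non-empty lines of an RTTM file as a list of strings."""
    with open(rttm_file_path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def parse_rttm_row(row):
    """Split one SPEAKER row into (rec_id, start, duration, speaker)."""
    fields = row.split()
    return fields[1], float(fields[3]), float(fields[4]), fields[7]
```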
- paddlespeech.vector.cluster.diarization.spectral_clustering(affinity, n_clusters=8, n_components=None, random_state=None, n_init=10)
Performs spectral clustering.
- Returns:
- labels : array
Cluster label for each sample.
- paddlespeech.vector.cluster.diarization.spectral_embedding(adjacency, n_components=8, norm_laplacian=True, drop_first=True)
Returns spectral embeddings.
- Returns:
- embedding : array
Spectral embeddings for each sample.