INITIALIZATION: - choose a random number of topics
NMF TRANSFORMATION: - apply NMF to the lemmatized, previously cleaned speeches
CLUSTERING OPTIMISATION: - cluster the speeches and take as the optimal number of clusters the one that maximizes the silhouette coefficient
ITERATION: - steps 2 and 3 are repeated for different numbers of topics within a given range
CHOOSE OPTIMAL NUMBER OF TOPICS: - once the iteration in step 4 is completed, the optimal number of topics is chosen as the one with the highest silhouette coefficient among those calculated in step 3 for the different numbers of topics
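The five steps can be sketched in Python with scikit-learn. This is a minimal sketch under assumptions the text does not state: speeches arrive as already cleaned, lemmatized strings; they are vectorized with TF-IDF; the NMF document-topic matrix is what gets clustered; and KMeans is the clustering algorithm. Scanning `topic_range` stands in for the random initial choice of step 1. The function names `best_clustering` and `select_num_topics` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def best_clustering(doc_topic, cluster_range, seed=0):
    """Step 3 (hypothetical helper): return (k, silhouette, labels) for the
    number of clusters k that maximizes the silhouette coefficient."""
    best_k, best_score, best_labels = None, float("-inf"), None
    n_samples = doc_topic.shape[0]
    for k in cluster_range:
        # KMeans needs k <= n_samples; silhouette needs 2 <= k <= n_samples - 1
        if not 2 <= k < n_samples:
            continue
        # Assumption: KMeans is used; the text does not name the algorithm
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(doc_topic)
        # Duplicate points can leave fewer distinct labels than k
        if not 2 <= len(set(labels)) <= n_samples - 1:
            continue
        score = silhouette_score(doc_topic, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_score, best_labels


def select_num_topics(speeches, topic_range, cluster_range, seed=0):
    """Steps 2-5 (hypothetical helper): for every candidate number of topics,
    fit NMF, search the number of clusters, and keep the setting with the
    highest silhouette coefficient."""
    # Assumption: TF-IDF features built from the cleaned, lemmatized speeches
    tfidf = TfidfVectorizer().fit_transform(speeches)
    best = None
    for n_topics in topic_range:  # step 4: iterate over the topic range
        nmf = NMF(n_components=n_topics, init="nndsvda", max_iter=500, random_state=seed)
        doc_topic = nmf.fit_transform(tfidf)  # step 2, shape: (n_speeches, n_topics)
        k, score, labels = best_clustering(doc_topic, cluster_range, seed)
        # Step 5: keep the number of topics with the highest silhouette
        if k is not None and (best is None or score > best["silhouette"]):
            best = {"n_topics": n_topics, "n_clusters": k,
                    "silhouette": score, "labels": labels}
    return best
```

A call such as `select_num_topics(speeches, range(5, 31), range(2, 16))` would scan 5 to 30 topics and 2 to 15 clusters; both ranges are illustrative only. Note that `init="nndsvda"` requires the number of topics to be at most the smaller of the number of speeches and the vocabulary size.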
As a result, as can be seen in the figure above, the speech clusters turn out to be very "pure", meaning that each is composed of speeches corresponding to a single topic.
Cluster 0, by contrast, is composed of multi-topic speeches.
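Cluster purity can also be checked numerically. This is a minimal sketch, assuming the NMF document-topic matrix and the cluster labels are available: each speech's dominant topic is the argmax of its topic weights, a cluster's purity is the share of its speeches whose dominant topic is the cluster's most common one, and a speech is flagged as multi-topic when no topic holds at least `threshold` of its normalised weight. The function names, the 0.5 threshold and the toy matrix are hypothetical, not taken from the analysis.

```python
import numpy as np


def cluster_purity(doc_topic, labels):
    """Per cluster, share of speeches whose dominant NMF topic equals the
    cluster's most frequent dominant topic (1.0 = perfectly pure)."""
    dominant = np.asarray(doc_topic).argmax(axis=1)
    labels = np.asarray(labels)
    purity = {}
    for c in np.unique(labels):
        counts = np.bincount(dominant[labels == c])
        purity[int(c)] = float(counts.max() / counts.sum())
    return purity


def multi_topic_share(doc_topic, labels, threshold=0.5):
    """Per cluster, share of speeches whose largest normalised topic weight is
    below `threshold` (hypothetical criterion for a multi-topic speech)."""
    w = np.asarray(doc_topic, dtype=float)
    totals = w.sum(axis=1, keepdims=True)
    # All-zero rows keep shares of 0 and are therefore counted as multi-topic
    shares = np.divide(w, totals, out=np.zeros_like(w), where=totals > 0)
    mixed = shares.max(axis=1) < threshold
    labels = np.asarray(labels)
    return {int(c): float(mixed[labels == c].mean()) for c in np.unique(labels)}


# Toy example: 6 speeches, 3 topics, 3 clusters (illustrative numbers only)
doc_topic = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
             [0.4, 0.3, 0.3], [0.2, 0.45, 0.35],
             [0.1, 0.1, 0.8], [0.0, 0.2, 0.9]]
labels = [1, 1, 0, 0, 2, 2]
print(cluster_purity(doc_topic, labels))     # {0: 0.5, 1: 1.0, 2: 1.0}
print(multi_topic_share(doc_topic, labels))  # {0: 1.0, 1: 0.0, 2: 0.0}
```

In the toy data, clusters 1 and 2 are pure, while cluster 0 mixes dominant topics and all of its speeches are flagged as multi-topic, which is the pattern the text describes.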