Topic clustering


Process followed:

  1. INITIALIZATION: choose an initial number of topics at random
  2. NMF TRANSFORMATION: apply NMF to the lemmatized and previously cleaned speeches
  3. CLUSTERING OPTIMIZATION: take as the optimal number of clusters the one that maximizes the silhouette coefficient
  4. ITERATION: repeat steps 2 and 3 for each number of topics in a range
  5. CHOOSE OPTIMAL NUMBER OF TOPICS: once the iteration in step 4 is complete, choose the number of topics with the highest silhouette coefficient among those calculated in step 3 for the different numbers of topics
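
The steps above can be sketched roughly as follows. This is a minimal illustration and not the exact code behind the results: it assumes scikit-learn, TF-IDF features as the NMF input, and KMeans as the clustering algorithm, none of which is specified in the process description. The function names `best_clustering` and `select_topics` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def best_clustering(W, cluster_range, seed=0):
    """Step 3: return (score, k, labels) for the k with the highest silhouette."""
    best = (-1.0, None, None)
    for k in cluster_range:
        # KMeans is an assumption; any clustering algorithm could be used here.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(W)
        score = silhouette_score(W, labels)
        if score > best[0]:
            best = (score, k, labels)
    return best


def select_topics(speeches, topic_range, cluster_range, seed=0):
    """Steps 2 to 5: pick the number of topics with the best silhouette."""
    # TF-IDF input is an assumption; speeches are expected to be cleaned
    # and lemmatized beforehand.
    X = TfidfVectorizer().fit_transform(speeches)
    results = {}
    for n_topics in topic_range:  # step 4: iterate over numbers of topics
        nmf = NMF(n_components=n_topics, init="nndsvda",
                  random_state=seed, max_iter=500)
        W = nmf.fit_transform(X)  # step 2: speech-by-topic weights
        results[n_topics] = best_clustering(W, cluster_range, seed)
    # Step 5: keep the number of topics whose best clustering scores highest.
    best_n = max(results, key=lambda n: results[n][0])
    return best_n, results
```

Calling `select_topics(speeches, range(2, 21), range(2, 21))` would return the selected number of topics together with the best score, number of clusters and cluster labels for each candidate; the ranges here are placeholders.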

As a result, as the figure above shows, the speech clusters turn out to be very "pure", meaning that each is composed of speeches corresponding to a single topic.
The exception is Cluster 0, which is composed of multi-topic speeches.
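
One possible way to put a number on this "purity" is sketched below. It assumes each speech has been given a dominant topic (for example, the topic with the largest NMF weight for that speech), which is an assumption made for illustration and not a definition taken from the analysis above. The helper name `cluster_purity` is hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_labels, dominant_topics):
    """Per cluster: share of speeches whose dominant topic is the cluster's most common one."""
    by_cluster = defaultdict(list)
    for cluster, topic in zip(cluster_labels, dominant_topics):
        by_cluster[cluster].append(topic)
    return {
        cluster: Counter(topics).most_common(1)[0][1] / len(topics)
        for cluster, topics in by_cluster.items()
    }


# Toy example: cluster 0 mixes several topics, clusters 1 and 2 are pure.
purity = cluster_purity([0, 0, 0, 0, 1, 1, 2, 2], [0, 1, 2, 0, 1, 1, 2, 2])
print(purity)  # {0: 0.5, 1: 1.0, 2: 1.0}
```

A value of 1.0 means every speech in the cluster shares the same dominant topic; a multi-topic cluster such as Cluster 0 would score noticeably lower.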