Evaluate clustering result
Auxiliary script for evaluating the results of the clustering.
After running the KSC application for training or testing, the assigned cluster labels are available in a file (specified as an input argument of the KSC applications). This script can be used to evaluate that result using either the true labels (if available) or the data that have been clustered. In the first case the Adjusted Rand Index (ARI) can be computed, and in the second case the Silhouette Score (SC).
Example
If the clustering result is located in the \(\texttt{out/CRes.dat}\) file and the true cluster labels are available in the \(\texttt{data/data}\_\texttt{Labels.dat}\) file, then the ARI can be computed as
python ../utils/evaluate.py -c out/CRes.dat -t data/data_Labels.dat
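For readers unfamiliar with the ARI, the following is a minimal, standard-library sketch of the pair-counting formula. It is not taken from \(\texttt{evaluate.py}\); the function name \(\texttt{adjusted\_rand\_index}\) is hypothetical, and the labels are assumed to be already loaded into two equally long Python sequences, since the layout of the label files is not described here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Illustrative ARI of two labelings of the same points (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings trivially agree

    # Contingency table: number of points for each (true, predicted) label pair.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Expected pair agreement under random labeling with the same cluster sizes.
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

The ARI is 1 when the two labelings define the same partition, regardless of how the clusters are named, and is close to 0 for a random assignment. For example, \(\texttt{adjusted\_rand\_index([0, 0, 1, 1], [1, 1, 0, 0])}\) evaluates to 1.0 because only the label names differ.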
If the true cluster labels are not available and the data that were clustered are located in the \(\texttt{data/data.dat}\) file, the script can be used to compute the SC as
python ../utils/evaluate.py -c out/CRes.dat -d data/data.dat -s
Note
Computing the Silhouette Score might take a long time for large data sets.
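The cost comes from the definition of the score, which compares every point with every other point. The sketch below is a plain-Python illustration of that definition under stated assumptions: Euclidean distance, points given as equal-length tuples of floats, and a hypothetical function name \(\texttt{mean\_silhouette}\). It is not the implementation used by \(\texttt{evaluate.py}\).

```python
from math import dist


def mean_silhouette(points, labels):
    """Illustrative mean silhouette with Euclidean distance (hypothetical helper)."""
    n = len(points)
    if len(labels) != n:
        raise ValueError("points and labels must have the same length")
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n - 1 distinct clusters")
    sizes = {c: labels.count(c) for c in clusters}

    total = 0.0
    for i in range(n):
        # Sum of distances from point i to the members of each cluster.
        # This inner loop over all j is what makes the whole computation O(n^2).
        dist_sum = dict.fromkeys(clusters, 0.0)
        for j in range(n):
            if j != i:
                dist_sum[labels[j]] += dist(points[i], points[j])

        own = labels[i]
        if sizes[own] == 1:
            continue  # singleton cluster: silhouette of the point is 0 by convention
        a = dist_sum[own] / (sizes[own] - 1)  # mean distance within own cluster
        b = min(dist_sum[c] / sizes[c] for c in clusters if c != own)  # nearest other cluster
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / n
```

Because the number of distance evaluations grows with the square of the number of points, doubling the data set roughly quadruples the run time of a direct computation like this one.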