
A Multi-Objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data

Authors: Gabriel Jong Chiye and Rayner Alfred

Abstract:

Much of the data in real-world applications is collected and stored in multi-relational databases, to which traditional data mining algorithms cannot be applied directly. Many approaches to learning from relational data have been proposed in recent years, including Inductive Logic Programming based approaches, graph-based approaches, multi-view approaches and the Dynamic Aggregation of Relational Attributes approach. Dynamic Aggregation of Relational Attributes is capable of transforming a multi-relational database into a vector space representation, so that a traditional clustering algorithm can then be applied directly to learn and summarize the relational data. However, the performance of this approach is highly dependent on the quality of the clusters produced, and a small change in the initialization of the clustering algorithm's parameters may adversely affect that quality. To optimize cluster quality, a Genetic Algorithm is used to find the combination of initializations and settings that produces the best clusters. The proposed method searches for the best initialization with respect to the number of clusters, the proximity (distance) measure, the fitness function, and the classifier used for evaluation. The results show that clustering with the Euclidean distance performs better in the classification stage than clustering with the Cosine similarity. The findings also show that cluster entropy is the best fitness function for the genetic algorithm, followed by the multi-objective fitness function. This is most probably because of the involvement of an external measure, which takes the class labels into consideration when optimizing the structure of the clusters.
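The abstract does not give a formula for the cluster entropy fitness function. A commonly used definition, assumed here rather than taken from the paper, weights the class-label entropy of each cluster by that cluster's share of the records, so purer clusters score lower. The Python sketch below computes it from a clustering's assignments and the known class labels. The names `cluster_entropy` and `fitness`, the use of base-2 logarithms, and negating the entropy so that a genetic algorithm can maximize it are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(assignments, labels):
    """Size-weighted entropy of class labels within each cluster.

    Assumed definition: E = sum_j (n_j / n) * H_j, where H_j is the
    base-2 entropy of the class labels in cluster j. Lower is purer.
    """
    if len(assignments) != len(labels):
        raise ValueError("assignments and labels must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0

    # Group the class labels of the records by the cluster they fall into.
    members = defaultdict(list)
    for cluster, label in zip(assignments, labels):
        members[cluster].append(label)

    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        h = 0.0
        for count in Counter(cluster_labels).values():
            p = count / size
            h -= p * math.log2(p)
        # Weight each cluster's entropy by its share of all records.
        total += (size / n) * h
    return total


def fitness(assignments, labels):
    # Hypothetical GA fitness: negate entropy so that a maximizing GA
    # prefers clusterings whose clusters are dominated by a single class.
    return -cluster_entropy(assignments, labels)
```

For example, `cluster_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])` returns 0.0 because each cluster contains a single class, whereas placing all four records in one cluster, `cluster_entropy([0, 0, 0, 0], ["a", "a", "b", "b"])`, yields 1.0 bit. Because the measure uses the class labels, it is an external measure in the sense the abstract describes.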

 
