Ensemble Clustering Based on Feature Selection Approach to Learning Relational Data
Authors: Kung Ke Shin and Rayner Alfred
Many approaches have been developed to learn from big relational data. One of them is Dynamic Aggregation of Relational Attributes (DARA), an algorithm designed to summarize relational data with one-to-many relations. However, DARA suffers a major drawback when the cardinalities of the attributes are very high, because the size of its vector space representation depends on the number of unique values that exist across all attributes in the dataset. A feature selection process can be introduced to overcome this problem. In this work, a genetic algorithm is used to select k sets of features, with a k-NN classifier used to evaluate them. Because different sets of selected features may produce different classification results, a novel consensus feature selection method that draws on all k sets of features is strongly encouraged in order to achieve good classification results.
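As an illustration only, the idea of forming a consensus from k selected feature sets can be sketched with simple vote counting. This is a minimal sketch under stated assumptions, not the consensus method proposed in this work: the voting rule, the function name consensus_features, the min_votes threshold, and the feature names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def consensus_features(feature_sets: Iterable[Set[str]], min_votes: int) -> List[str]:
    """Return the features that appear in at least `min_votes` of the sets.

    Assumption: consensus is a plain vote count over the k sets. The
    abstract does not specify the actual rule, so this is a stand-in.
    """
    votes = Counter()
    for selected in feature_sets:
        votes.update(set(selected))  # each set votes at most once per feature
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical result of k = 3 feature selection runs (names are made up).
runs = [
    {"age", "income", "region"},
    {"age", "region"},
    {"age", "income", "tenure"},
]

majority = consensus_features(runs, min_votes=2)   # kept by at least 2 of 3 runs
unanimous = consensus_features(runs, min_votes=3)  # kept by every run
print(majority)
print(unanimous)
```

Raising min_votes keeps fewer features, which in turn would shrink the vector space representation, at the risk of discarding features that only some runs found useful.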