Rough set theory has become an established methodology for database mining and knowledge discovery in relational databases, and many feature selection (also called feature subset selection or attribute reduction) methods have been built on it. The theory was initially developed for a finite universe of discourse in which the knowledge base is a partition of the universe induced by an indiscernibility (equivalence) relation. A new definition of fuzzy rough set approximations in an information system, based on a divergence measure of fuzzy sets, has also been proposed.
The main assumption is that one feature sequence is determined for all possible object instances. Self-learning facial emotional feature selection has also been approached within this framework. To address this problem, we apply fuzzy rough set (FRS) theory [15] as a tool to select the most effective features for detecting phishing websites in this paper. Keywords: feature selection, filter method, rough set, boundary region, high-dimensional data, k-means clustering, microarray, symmetric uncertainty. Since its inception, several researchers have attempted to apply fuzzy rough set theory to feature selection. In this approach, examples are represented in the form of an attribute-value table with binary values of attributes. In this video, we find the best reduct in an information system using rough set attribute selection. We discuss our results and draw some conclusions in the final section. The RoughSets package attempts to provide a complete tool to model and analyze information systems based on rough set theory (RST) and fuzzy rough set theory (FRST). In classical set theory, either an element belongs to a set or it does not. Mammography feature selection using rough set theory has also been investigated.
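To make the attribute-value view concrete, the following sketch computes the indiscernibility classes induced by a subset of attributes on a tiny information system. The toy table, the attribute names, and the helper function are assumptions introduced purely for illustration; they do not come from any of the works cited here.

    from collections import defaultdict

    # Toy information system: each object is described by binary attributes a, b, c.
    objects = {
        "x1": {"a": 1, "b": 0, "c": 1},
        "x2": {"a": 1, "b": 0, "c": 0},
        "x3": {"a": 0, "b": 1, "c": 1},
        "x4": {"a": 1, "b": 0, "c": 1},
    }

    def indiscernibility_classes(objects, attributes):
        """Group together objects that take identical values on the chosen attributes."""
        classes = defaultdict(set)
        for name, values in objects.items():
            classes[tuple(values[a] for a in attributes)].add(name)
        return list(classes.values())

    # On the subset {a, b}, objects x1, x2, and x4 are indiscernible, while x3 is alone.
    print(indiscernibility_classes(objects, ["a", "b"]))

Restricting attention to a subset of attributes coarsens this partition, which is exactly what reduct search exploits: a reduct is a minimal attribute subset that preserves the discernibility needed for the task at hand.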
Feature subset selection using rough sets has also been applied to high-dimensional data. Rough set theory (RST) was introduced in the early 1980s by Z. Pawlak and has since been applied to feature selection in several settings, including emotion recognition and unsupervised clustering. The present art in feature selection is largely based on simple heuristic thresholds [14]. Ant colony optimization (ACO) has been combined with rough set theory for feature selection. Most developments in the area concentrate on using rough set approximations. Finding a minimal subset of the original features is inherent in the rough set approach to feature selection.
All traditional feature selection methods assume that the entire input feature set is available from the beginning. We compare these methods to facilitate the planning of future research on feature selection. See also Feature Selection and Rough Set Theory, Edgar Acosta, Carleton University, March 25, 2008. Many of these methods use the dependency degree, a measure of how well an attribute set can discern between elements, as the criterion for feature selection; a minimal computation of this measure is sketched below. The properties of the proposed approximations are explored. The rough set is a mathematical approach for incomplete and uncertain data. In this work, we investigate an agreement on the definitive features which should be used in phishing detection. The basic assumption of RST is that some information is associated with every object of the universe of discourse. An information system in rough set theory is analogous to a dataset in machine learning. In its abstract form, rough set theory is a new area of uncertainty mathematics closely related to fuzzy theory. A novel feature selection method using fuzzy rough sets has also been proposed. The methods are called mrsreduct, rsreduct, and rsred.
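Since the dependency degree recurs throughout the literature surveyed here, a minimal sketch may be helpful. The decision table, the attribute names, and the decision attribute below are assumptions made only for illustration: the dependency of the decision d on an attribute subset B is the fraction of objects whose B-indiscernibility class is consistent with a single decision value (the B-positive region of d).

    from collections import defaultdict

    # Toy decision table: condition attributes a, b, c and a decision attribute d.
    table = [
        {"a": 1, "b": 0, "c": 1, "d": "yes"},
        {"a": 1, "b": 0, "c": 0, "d": "no"},
        {"a": 0, "b": 1, "c": 1, "d": "yes"},
        {"a": 1, "b": 0, "c": 1, "d": "yes"},
    ]

    def dependency_degree(table, attributes, decision="d"):
        """gamma_B(d): fraction of objects in the B-positive region of the decision."""
        classes = defaultdict(list)
        for row in table:
            classes[tuple(row[a] for a in attributes)].append(row)
        consistent = 0
        for rows in classes.values():
            if len({row[decision] for row in rows}) == 1:   # class has a unique decision
                consistent += len(rows)                     # all its objects are "positive"
        return consistent / len(table)

    print(dependency_degree(table, ["a"]))        # 0.25: a alone cannot separate yes from no
    print(dependency_degree(table, ["a", "c"]))   # 1.0: adding c removes every conflict

A dependency of 1 means the chosen attributes discern the decision perfectly; values below 1 indicate that part of the table falls into the boundary region.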
Feature selection based on rough set theory has also been applied to medical datasets. First, rough set theory (RST) is known to have several advantages for feature selection: it can work on the original data alone and does not need any external information or training. Feature selection using rough sets has been applied to improve classifier performance, and feature selection based on rough set theory combined with an EM clustering algorithm has been studied as well. A novel fuzzy rough set based information entropy has been constructed for mixed data. Feature selection with rough sets has also been used for web page classification. Feature selection techniques aim at reducing the number of unnecessary, irrelevant, or unimportant features. Related work appears in Evolutionary Computation in Combinatorial Optimization (EvoCOP), Lecture Notes in Computer Science 7832, Springer, 2013. On the other hand, optimal feature subsets can be obtained with the wrapper approach, but it is not easy to use because of its time and space complexity. Feature reduction based on rough set theory is an effective feature selection method in pattern recognition applications. An exact feature selection algorithm based on rough set theory has also been proposed.
Rough set theory has opened new trends for the development of incomplete information theory. In the paper, a method of generating a sequence of features that allows such identification is presented. Rough sets have also been used together with heuristics for feature selection. In Section 4, the proposed method of feature selection using rough set theory and the EM clustering algorithm is outlined. Online streaming feature selection using rough sets has been studied as well. The rough set approach can be used to discover structural relationships within imprecise and noisy data. We describe the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms. Despite their traditional roles, database systems have increasingly become attractive as scalable analytical platforms, which motivates in-database feature selection using rough set theory.
Rough set theory (RST) has been combined with greedy heuristics for feature subset selection; a sketch of this style of algorithm is given below. Finally, the paper presents numerical results of face and mammogram recognition experiments using a neural network, with feature selection based on rough set methods. Rough set methods have been applied to both feature selection and recognition. Rough set theory has been used to define the necessity of features in the literature, and it is common practice to use a measure to decide the importance and necessity of features. Jensen and Shen proposed a method to find rough set reducts using ant colony optimization (ACO). Rough set (RS) theory is a valid mathematical theory for dealing with imprecise, uncertain, and vague information. Our algorithm for feature selection is based on applying a rough set method to the result of principal component analysis (PCA) used for feature projection and reduction. In classical rough set theory, it is not possible to consider real-valued or noisy data. Feature selection (FS) is an important preprocessing step in data mining and classification tasks.
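The greedy heuristic mentioned above is usually some variant of forward selection driven by the dependency degree. The sketch below follows that general pattern, in the spirit of QuickReduct-style algorithms, and is not a faithful reimplementation of any specific method cited here; the toy table and the names are assumptions.

    from collections import defaultdict

    def dependency_degree(table, attributes, decision):
        """Fraction of objects whose indiscernibility class has a unique decision value."""
        classes = defaultdict(list)
        for row in table:
            classes[tuple(row[a] for a in attributes)].append(row[decision])
        return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1) / len(table)

    def greedy_reduct(table, conditions, decision):
        """Forward selection: repeatedly add the attribute that raises dependency the most."""
        reduct, best = [], 0.0
        full = dependency_degree(table, conditions, decision)
        while best < full:
            gains = {a: dependency_degree(table, reduct + [a], decision)
                     for a in conditions if a not in reduct}
            attribute, score = max(gains.items(), key=lambda item: item[1])
            if score <= best:          # no remaining attribute improves the measure
                break
            reduct.append(attribute)
            best = score
        return reduct

    table = [
        {"a": 1, "b": 0, "c": 1, "d": "yes"},
        {"a": 1, "b": 0, "c": 0, "d": "no"},
        {"a": 0, "b": 1, "c": 1, "d": "yes"},
        {"a": 1, "b": 0, "c": 1, "d": "yes"},
    ]
    print(greedy_reduct(table, ["a", "b", "c"], "d"))   # ['c'] for this toy table

Greedy search of this kind finds a short attribute subset quickly but offers no guarantee of minimality, which is why the literature above also explores ACO, simulated annealing, and other global search strategies.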
The theoretical background of the proposed method is rough set theory, and the feature selection process is illustrated with an example. Rough sets can also be defined by using a rough membership function instead of approximations. We propose a new rough set based feature selection approach called the parameterized average support heuristic (PASH). A feature selection method based on rough sets is then applied to remove redundant features from the training data. In this paper, we apply rough set theory to feature selection for web page classification. The aim of FS is to select a small subset of the most important and discriminative features.
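For reference, the rough membership function has the standard form μ_X^B(x) = |[x]_B ∩ X| / |[x]_B|, where [x]_B is the equivalence class of x under the indiscernibility relation induced by the attribute subset B. This is the classical definition rather than one specific to any paper cited here: a value of 1 places x in the lower approximation of X, a value of 0 places it outside the upper approximation, and intermediate values identify the boundary region.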
Feature selection algorithms using rough set theory have been surveyed with the goal of understanding and using rough set based feature selection. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Building on an existing feature selection method, a filter-wrapper method has been suggested to select a best feature subset. Ant colony optimization based feature selection has also been developed within rough set theory. PASH considers the overall quality of the potential set of rules.
In this paper we propose a model for feature selection based on ant colony optimization and rough set theory (RST). Feature selection aims to remove features unnecessary to the target concept. Fuzzy rough set feature selection has been used to enhance phishing attack detection, and combining rough and fuzzy sets for feature selection has been studied more generally. Such a treatment enables the reader to systematically study all topics of rough set theory (RST), including the preliminaries, advanced concepts, and feature selection using RST. RST offers a heuristic function to measure the quality of a feature subset. Rough set theory is an extension of set theory for the study of intelligent systems characterized by insufficient and incomplete information [12]. Additionally, a new area in feature selection, feature grouping, is highlighted from a rough set perspective. Section 5 shows the potential of the proposed method on some real datasets. This thesis proposes and develops an approach based on fuzzy rough sets, fuzzy rough feature selection (FRFS).
Automatic feature selection has a long history in the international journal literature. Rough set theory was introduced by Pawlak (1982) and has become a well researched tool for knowledge discovery. The feature selection process removes redundant and irrelevant features from the dataset to improve the performance of the classifier. A rough set approach to feature selection based on power sets has also been proposed. Rough set theory is one of many methods that can be employed to analyse uncertain (including vague) systems, although it is less common than the more traditional methods of probability, statistics, entropy, and Dempster-Shafer theory. The essence of feature selection in the rough set approach is to find a subset of the original features (attributes) using rough set theory. In our application, web pages in a training data set are first represented using top frequent words. Feature Selection Using Rough Sets Theory, Maciej Modrzejewski, Institute of Computer Science, WUT, Nowowiejska 15/19, 00-665 Warsaw, Poland.
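As a rough sketch of the preprocessing step just described, the snippet below turns a few toy web pages into a binary attribute-value table using the top frequent words. The documents, the vocabulary size, and the use of scikit-learn's CountVectorizer are all assumptions made for illustration; the cited work does not prescribe this particular tooling.

    from sklearn.feature_extraction.text import CountVectorizer

    # A handful of toy "web pages"; real training data would be crawled page text.
    pages = [
        "rough set theory for feature selection",
        "feature selection with fuzzy rough sets",
        "k-means clustering of microarray data",
    ]

    # Keep only the most frequent words and record presence/absence (binary attributes),
    # which yields the kind of attribute-value table that rough set methods expect.
    vectorizer = CountVectorizer(max_features=10, binary=True)
    attribute_table = vectorizer.fit_transform(pages).toarray()

    print(vectorizer.get_feature_names_out())
    print(attribute_table)

A rough set reduct computed on such a table then discards words that add no discernibility with respect to the page classes.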
A feature selection algorithm is formulated based on the proposed entropy, and the proposed entropy can equivalently characterize the existing attribute reductions in fuzzy rough set theory. The paper is related to one aspect of learning from examples, namely learning how to identify the class of objects to which a given object instance belongs. Several approaches to feature selection based on rough set theory have been experimentally compared. A model based on ant colony system and rough set theory has been applied to feature selection. In this paper, a rough set based feature selection method is proposed to remove redundant and irrelevant features in order to improve the performance of the classifier. Inside this framework, the notion of a reduct is very significant. Rough set theory (RST) eliminates unimportant or irrelevant features, thus generating a feature set smaller than the original one.
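For background on entropy-based criteria like the one above, the usual crisp starting point is the Shannon entropy of the partition U/IND(B) that an attribute subset B induces on the universe U: H(B) = - Σ_{X ∈ U/IND(B)} (|X|/|U|) log(|X|/|U|). This is the standard definition, not necessarily the exact fuzzy rough entropy proposed in the work summarized here, which generalizes it to fuzzy similarity classes so that mixed numeric and nominal data can be handled. Entropy-based reduction then searches for a small subset B whose conditional entropy with respect to the decision attribute matches that of the full attribute set.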
An algorithm for feature selection using the fuzzy positive region is presented; a compact sketch of the underlying measure is given after this paragraph. The corresponding membership function is the characteristic function of the set, i.e. it takes only the values 1 and 0. However, the main limitation of rough set based feature selection in the literature is the restrictive requirement that all data be discrete. Scalable feature selection using rough set theory has been investigated, and fast feature selection algorithms have been obtained by accelerating the computation of the fuzzy rough set based information entropy. Representation and learning of inexact information using rough set theory has been studied as well. We study rough set theory as a method of feature selection based on tolerant classes that extend the existing equivalence classes. These methods include non-monotonicity-tolerant branch-and-bound search and beam search. Our feature selection scheme preprocesses features that are then fed into the classifier.
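Since several of the works above select features by maximizing a fuzzy rough dependency computed from the fuzzy positive region, a compact sketch of that measure follows. Everything in it is an assumption made for illustration: the toy data, the attribute-wise similarity relation, and the choice of implicator; individual papers use their own fuzzy relations and connectives.

    import numpy as np

    # Toy data: two real-valued condition attributes (scaled to [0, 1]) and a crisp decision.
    X = np.array([[0.1, 0.9],
                  [0.2, 0.8],
                  [0.9, 0.2],
                  [0.8, 0.1]])
    y = np.array([0, 0, 1, 1])

    def fuzzy_similarity(X, attrs):
        """R_P(x, z): minimum over the selected attributes of 1 - |x_a - z_a|."""
        diffs = np.abs(X[:, attrs][:, None, :] - X[:, attrs][None, :, :])
        return (1.0 - diffs).min(axis=2)

    def fuzzy_dependency(X, y, attrs):
        """gamma'_P: mean membership of the objects in the fuzzy positive region."""
        R = fuzzy_similarity(X, attrs)
        pos = np.zeros(len(y))
        for d in np.unique(y):
            in_class = (y == d).astype(float)   # crisp decision class viewed as a fuzzy set
            # Fuzzy lower approximation with the implicator I(a, b) = max(1 - a, b).
            lower = np.maximum(1.0 - R, in_class[None, :]).min(axis=1)
            pos = np.maximum(pos, lower)        # positive region: best lower approximation
        return pos.mean()

    print(fuzzy_dependency(X, y, [0]))      # dependency using only the first attribute
    print(fuzzy_dependency(X, y, [0, 1]))   # dependency using both attributes

Because the similarity relation works directly on real values, no discretization is needed, which addresses the limitation of crisp rough set feature selection noted above.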
We apply fuzzy rough set (FRS) theory as a tool to select the most effective features from three benchmark data sets. Keywords: finding reducts, heuristic attribute selection, KDD. The main assumption is that one feature sequence is determined for all possible object instances, that is, the next feature in the order does not depend on the values of the previous features. Therefore, the computation is a function of the training examples rather than of the dimension of the feature space.