Finding Relevant Attributes in High Dimensional Data: A Distributed Computing Hybrid Data Mining Strategy

Type: Article
Conference: Transactions on Rough Sets, 2007
Volume: VI
Abstract: In many domains, data objects are described in terms of a large number of features (e.g., microarray experiments or spectral characterizations of organic and inorganic samples). A pipelined approach using two clustering algorithms in combination with Rough Sets is investigated for the purpose of discovering important combinations of attributes in high dimensional data. The Leader and several k-means algorithms are used as fast procedures for attribute set simplification of the information systems presented to the rough sets algorithms. The data described in terms of these fewer features are then discretized with respect to the decision attribute according to different rough set based schemes. From them, the reducts and their derived rules are extracted and applied to test data in order to evaluate the resulting classification accuracy in cross-validation experiments. The data mining process is implemented within a high-throughput distributed computing environment. Nonlinear transformations of attribute subsets that preserve the similarity structure of the data were also investigated. Their classification ability, and that of the subsets of attributes obtained after the mining process, were described in terms of analytic functions obtained by genetic programming (gene expression programming) and simplified using computer algebra systems. Visual data mining techniques using virtual reality were used for inspecting results. This approach was explored in a series of experiments using leukemia, colon cancer, and breast cancer gene expression data. They led to small subsets of genes with high discrimination power.
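
The attribute set simplification step mentioned in the abstract can be illustrated with a minimal sketch of one-pass Leader clustering applied to attributes (columns) instead of objects. This is not the authors' implementation. The dissimilarity measure (1 minus the absolute Pearson correlation), the threshold value, and the function names `pearson_distance`, `leader_attribute_clusters`, and `project` are all assumptions made for illustration.

```python
import math


def pearson_distance(u, v):
    """Assumed dissimilarity between two attribute columns: 1 - |Pearson r|."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # constant column: treat as unrelated to every other column
    return 1.0 - abs(cov / (norm_u * norm_v))


def leader_attribute_clusters(rows, threshold):
    """One-pass Leader clustering over the columns (attributes) of `rows`.

    Each attribute joins the first existing leader whose distance to it is
    at most `threshold`; otherwise it becomes a new leader.
    Returns (leaders, members): leader column indices in order of creation,
    and a dict mapping each leader index to the attribute indices it covers.
    """
    columns = [list(col) for col in zip(*rows)]
    leaders = []
    members = {}
    for j, col in enumerate(columns):
        for lead in leaders:
            if pearson_distance(col, columns[lead]) <= threshold:
                members[lead].append(j)
                break
        else:
            # No leader was close enough, so this attribute starts a cluster.
            leaders.append(j)
            members[j] = [j]
    return leaders, members


def project(rows, leaders):
    """Keep only the leader attributes, giving the reduced data table."""
    return [[row[j] for j in leaders] for row in rows]
```

In a pipeline like the one described, the reduced table returned by `project` would then be discretized and passed to the rough set stage. The result of Leader clustering depends on the order in which attributes are visited and on the chosen threshold, so both would need to be varied in practice.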
Publication date
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 48766
NPARC number: 5764714
Record identifier: 4cd37a2c-96cc-49b7-b656-ab3ec99a8508
Record created: 2009-03-29
Record modified: 2016-05-09