Recurrent chromosomal alterations provide cytological and molecular
positions for the diagnosis and prognosis of cancer. Comparative
Genomic Hybridization (CGH) has been useful in understanding these
alterations in cancerous cells. CGH datasets consist of samples that
are represented by high-dimensional arrays of intervals. Each sample
consists of long runs of intervals with losses and gains.
In this paper, we develop novel SVM-based methods for classification
and feature selection of CGH data. For classification, we develop a
novel similarity kernel that is shown to be more effective than the
standard linear kernel used in SVMs. For feature selection, we propose
a novel method based on the new kernel that iteratively selects
features that provide the maximum benefit for classification. We
compare our methods against the best wrapper-based and filter-based
approaches that have been used for feature selection of
high-dimensional biological data. Our results on datasets generated from
the Progenetix database suggest that our methods are considerably
superior to existing methods.