Feature selection methods for medical data identify which features (attributes) of a dataset should be kept, so that the data can be stored reliably and retrieved quickly when it is needed for medical purposes, for example to record the progress of a patient. Most feature selection methods are grounded in a theory of selection; one that is widely used is rough set theory. Feature selection is crucial because it helps retain the data that is most needed to solve medical problems. It ensures that stored data can be retrieved easily and that neither storage space nor computation time is wasted, since both can restrict the applicability of a method to a medical dataset. Three types of method are used with medical data: the filter method, the wrapper method, and the hybrid method.
Feature Selection Methods on the Cardiotocography Dataset
Little research has addressed feature selection methods on cardiotocography datasets. The cardiotocography dataset consists of measurements of fetal heart rate and uterine contraction characteristics taken from cardiotocograms, which are then classified by an expert obstetrician.
Fetal cardiotocograms are usually processed automatically, and the relevant analytical characteristics are then measured. The feature selection methods used on the cardiotocography dataset are discussed in this paper. Electronic fetal monitoring, the continuous recording of cardiotocograms, usually comprises the fetal heart rate and the tocographic signal.
Electronic fetal monitoring is one of the methods used for intrapartum assessment of fetal health. In this section, various feature subsets are analyzed for classifying the outcome of a pregnancy, based on the CTG recorded during the last 20 minutes before delivery. The feature subsets are derived using PCA, information gain, and GAME (Group of Adaptive Models Evolution), a neural-network-based feature selection algorithm. According to the research, the best subset consists of a mix of non-linear and time-domain features. This mix performs consistently across the datasets, with sensitivity and specificity both at about 70%, a level comparable to inter-observer variation. The following sections examine individual feature selection methods for medical datasets.
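Sensitivity and specificity, the two measures quoted above, can be computed from a classifier's predictions as in the following minimal sketch. The function name and the toy labels are hypothetical and were chosen only so that both measures come out at 70%; they are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: share of truly pathological cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of truly normal cases that were left alone.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical outcome labels: 1 = pathological, 0 = normal.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [0] * 7 + [1] * 3

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both measures matters for medical data because classes are often unbalanced, and plain accuracy can hide a classifier that misses most pathological cases.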
Filter Method for Medical Data
Medical applications are usually characterized by a large number of disease markers and a very small number of records. Research has shown that a complete feature ranking followed by selection leads to a noteworthy reduction in the dimensionality of the data, with notable improvements in the implementation and performance of classifiers for medical diagnosis. This paper describes a novel approach to ranking features according to their quality, using properties unique to learning algorithms based on the group method of data handling (GMDH). A network training algorithm can be applied repeatedly to select the groups of features that serve as the best predictors.
Several GMDH methods have been proposed that operate on the whole training dataset, which removes the need for a dedicated selection set. The adaptive learning network approach uses the predicted squared error criterion both for selection and for stopping, in order to avoid overfitting the model; this also addresses the problem of deciding when to stop training a neural network. Applying the criterion reduces the expected error that would result when the network is used to predict new data.
Consensus feature ranking is a method applied to medical datasets. Some of the methods that have been proposed and widely used for consensus feature ranking do not account for missing values or for unbalanced class distributions. They also do not examine the bias of the consensus ranking towards specific classifiers. One method used for consensus ranking is the group method (GMDH), which handles feature ranking for medical data. It uses the GMDH learning algorithm to select the best predictive features automatically at different, user-specified levels of model complexity, and the ROC curve is used to evaluate classifier performance.
The implantation of a cardiac pacemaker is a complex procedure. Its success depends directly on the correct classification of patients and on the choice of pacing mode. Machine learning algorithms can be used to support this process, and feature selection is its most crucial component. Research has shown that applying feature selection methods to electrocardiological datasets reduces the initial set of features by about 60%. Because the search space shrinks, the number of generated decision rules falls by a factor of 6 to 10. As a result, the rules can be verified against cardiological practice faster and more easily, broader rules adapt better to recognizing new cases, and computational effort is reduced. These results have been confirmed in clinical practice (Echauz & Vachtsevanos, 1994).
In Parkinson’s disease, an evaluation of the available medical data can highlight symptoms that may serve as a complementary tool in the early stages of diagnosis. Several studies have evaluated how filter feature selection algorithms, and combinations of them, help to determine which features are relevant to this problem. These studies used patient datasets to identify sets of premorbid behavior traits that can be useful in the earliest stages of diagnosing Parkinsonism.
Data mining is the process of automatically extracting valid, previously unknown, and actionable knowledge or patterns from large databases to support important decisions. Classification analysis, one of the data mining approaches, is adopted in the filter method. It supports medical decisions relating to diagnosis and helps improve the quality of care given to patients. Note that if the training dataset contains irrelevant attributes, classification analysis becomes less accurate and may produce results that are hard to understand. Automatic feature selection is one of the most commonly used techniques for dealing with this (Kondo, Pandya & Zurada, 1999).
Feature subset selection is crucial in data mining. As data dimensionality increases, training and testing general classification techniques becomes difficult. One filter-based approach has been applied to classifying the Pima Indian Diabetes Database (PIDD). The model consists of two phases. In the first phase, a genetic algorithm (GA) and correlation-based feature selection (CFS) are used in a cascaded fashion: the GA performs a global search over the attributes, with fitness evaluated by CFS. The second phase is a fine-tuned classification performed by an artificial neural network, which takes the feature subset obtained in phase one as its input (Abdel-Aal & Mangoud, 1996).
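The fitness evaluation in the first phase can be illustrated with the standard CFS merit, which rewards features that correlate with the class and penalizes features that correlate with each other. The sketch below assumes Pearson correlation as the correlation measure; the attribute names and values are hypothetical, and the genetic search that would call this function is left out.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(vx * vy)


def cfs_merit(columns, subset, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: one informative attribute, an exact rescaled
# copy of it, and one weakly related attribute.
columns = {
    "glucose": [1, 2, 3, 4, 5, 6],
    "glucose_x2": [2, 4, 6, 8, 10, 12],
    "noise": [1, 0, 1, 0, 1, 0],
}
target = [0, 0, 0, 1, 1, 1]

for subset in (["glucose"], ["glucose", "glucose_x2"], ["glucose", "noise"]):
    print(subset, round(cfs_merit(columns, subset, target), 3))
```

On this toy data the redundant copy adds nothing to the merit and the weakly related attribute lowers it, which is the behavior a search over subsets would exploit.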
Studies have shown that data mining is useful in a number of applications, one of which is health care. A medical database typically holds a large quantity of information about patients and their medical histories, and evaluating it manually is practically impossible. This information may contain valuable data that could help save lives if it is analyzed and used properly. Data mining is effective in health care applications for identifying patterns and deriving useful data from medical databases, and it is used to filter the large quantities of data they hold. The main feature selection method used for data mining in this case is the combination ranking search technique.
Here, Bayesian classification models are used to filter data for medical purposes. These models are usually based on Bayesian networks. Feature subset selection is useful because medical databases are heterogeneous and not all of the variables contribute to classification. Filter methods are adapted in this technique to induce the Bayesian classifiers, which are used to distinguish between two groups of patients with cirrhosis (Rosa, Iñaki Inza, Marisa, Quiroga & Pedro, 2005).
Kernel F-score feature selection (KFFS) is another feature selection method for medical data. It is used as a pre-processing step in the classification of medical datasets. KFFS consists of two stages (Kemal & Güneş, 2009). The first stage operates on the input spaces of the medical datasets. Feature subset selection is of crucial importance in data mining: the higher the dimensionality of the data, the harder it is to train and test general classification techniques. Research has shown that using the subsets selected by the CFS filter improves the classification accuracy of both radial basis function networks and backpropagation neural networks, compared with feature subsets selected by the information gain filter (Han & Kamber, 2001).
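The F-score on which KFFS builds measures how well a single feature separates two classes. The sketch below computes the plain F-score in the original input space only; the kernel transformation is omitted, and keeping the features whose score exceeds the mean score is an assumption made for illustration. The feature names and values are hypothetical.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for a two-class problem (labels 1 and 0)."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    overall = mean(values)
    # Between-class separation of the class means from the overall mean.
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class spread (sample variances; needs >= 2 samples per class).
    denominator = variance(pos) + variance(neg)
    return numerator / denominator if denominator else float("inf")


# Hypothetical toy data: "marker_a" separates the classes, "marker_b" does not.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "marker_a": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "marker_b": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}

scores = {name: f_score(vals, labels) for name, vals in features.items()}
threshold = mean(scores.values())  # assumed selection rule: keep above the mean
selected = [name for name, s in scores.items() if s > threshold]
print(scores, selected)
```

A higher F-score means the class means lie further apart relative to the spread within each class, so the feature is more discriminative on its own.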
Wrapper Method for Medical Data
Parkinson’s disease is used for the research analysis. It is a severe, chronic, and progressive disorder whose symptoms worsen over time. Over a million people in the USA are currently living with Parkinson’s disease. Its causes remain unknown to medical experts and researchers. There is currently no cure, although several treatments exist, the most common being surgery and daily medication. The disease affects nerve cells in the brain and causes them to malfunction. In severe cases, it causes the death of the brain cells that produce dopamine, a chemical that carries the messages needed for proper coordination between the brain and other parts of the body. Parkinson’s disease has four main characteristics that can be identified in the patient.
These are postural instability, tremor, rigidity, and bradykinesia. Tremor is the continuous trembling and shaking of parts of the body, such as the hands and legs, or even of the whole body. Bradykinesia is slowness of movement, together with tiring quickly after walking a short distance. Other, minor symptoms seen in patients include memory loss, weight loss, dizziness and, in some cases, mental disturbance. Because the disease exhibits many features, feature selection is used to generate and attach feature weights, and one way of doing so is the wrapper approach. The feature selection algorithm searches for a good subset using an evaluation method. The evaluation is usually run on the dataset, split internally into training and test sets, for each candidate set of features. The feature subset with the highest evaluation value is selected and kept for further analysis of the disease by the induction algorithm.
Three wrapper methods are widely used: the genetic algorithm (GA), forward wrapper selection, and backward wrapper selection. Genetic algorithms are probabilistic search methods modeled on natural selection in biological evolution. They are widely applied in search, optimization, and machine learning. A GA proceeds by iteratively generating new candidate subsets from the existing ones. Each candidate, encoded as a string, is evaluated to determine how well it solves the problem.
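Forward wrapper selection can be sketched as a greedy loop that adds one feature at a time and stops when the evaluation no longer improves. The sketch below assumes a nearest-centroid classifier as the induction algorithm and leave-one-out accuracy as the evaluation; neither choice comes from the studies discussed here, and the data is made up for illustration.

```python
def nearest_centroid_predict(train_rows, train_labels, row, features):
    """Predict the class whose centroid (on `features`) is closest to `row`."""
    centroids = {}
    for label in set(train_labels):
        members = [r for r, l in zip(train_rows, train_labels) if l == label]
        centroids[label] = [sum(m[f] for m in members) / len(members)
                            for f in features]

    def squared_distance(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))

    return min(sorted(centroids), key=squared_distance)


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of the classifier on the given features."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_rows, train_labels, rows[i],
                                    features) == labels[i]:
            hits += 1
    return hits / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedily add the feature that most improves the evaluation."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break  # greedy stop: no candidate improves the evaluation
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Hypothetical data: feature 0 separates the classes, features 1 and 2 do not.
rows = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 4.0, 0.1],
        [3.0, 1.5, 0.8], [3.2, 4.5, 0.2], [2.9, 5.2, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

selected, best = forward_selection(rows, labels, n_features=3)
print(selected, best)
```

Backward selection follows the same pattern but starts from the full feature set and removes one feature per step; it evaluates larger subsets throughout, which is one reason it tends to be slower.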
The forward wrapper method uses an algorithm that selects features before the supervised learning step. The backward wrapper method, on the other hand, eliminates features from the complete set. In each case, features are added to or removed from the current set according to the value they contribute. Several feature subset problems have been identified in supervised learning. One is the difficulty of selecting the truly relevant subset for prediction. A second is that the forward method is much faster than the backward method, which creates an imbalance between the two. A third is that greedy search can terminate the process early (Koller & Sahami, 1996), which hinders effective diagnosis of the disease. Recent work suggests that the naive Bayes classifier, despite its independence assumption, performs better than the wrapper method. Although it rests on several assumptions, it remains the preferred method because it has fewer restrictions attached to it. This was demonstrated in a paper on the survival prognosis of cirrhotic patients treated with TIPS, whose results indicate that the Bayes classifier gives a more accurate diagnosis than the wrapper method.
Clinical records have accumulated a great deal of information about patients’ health, and data mining is applied to them. Data mining here is the process of searching the health data of a patient to support sound decisions about the patient’s care. This promotes correct diagnosis, so that the patient eventually receives better care. If the training set yields inaccurate results, pre-processing before classification is necessary to achieve more accurate results. Feature selection is one of the techniques used in data mining for this purpose. Its advantages include reduced dimensionality, removal of redundant attributes, and data that is better suited to the learning algorithms. Much of the work on data mining has aimed at improving predictive accuracy and comprehensibility.
The process is more reliable because it is grounded in extensive analysis of patient care. Moreover, feature selection has been linked with improved accuracy and with the removal of unhelpful attributes from health records. In medicine, accuracy matters because it can literally make the difference between life and death. In this paper, a feature subset approach is devised to improve Bayesian classification. Experiments with the Bayes classifier were conducted on the Pima Indian diabetes dataset to determine how effective the approach was. The results showed that SVM (Support Vector Machine) ranking gave more promising results than the backward search approach.
Hybrid Method for Medical Data
Hybrid selection in medical decision making is linked with machine learning, where datasets are used for diagnosis by various classifiers. The introduction of computers has enhanced medical decision support. Data must be collected continuously for medical progress to be achieved. Intelligent machine learning has helped in solving complex decision-making problems. Decision trees are also used in the analysis. The naive Bayes classifier is one of the simplest methods used in hybrid decision making.
The theoretical basis of the naive Bayes algorithm is that the attributes are assumed to be independent of each other. Instance-based algorithms are also used: instances are labeled, and the proximity between instances is used to classify new cases. Classifiers that perform well on medical data can also be stacked. The base classifiers are the level-0 classifiers, and the level-1 classifier takes the outputs of the base classifiers as its input. The two levels of classifiers are then combined on the available input data. However, weaknesses have been identified in this process. First, the increased computation has made it unattractive to many researchers. Second, many researchers have been involved in developing the process, though not all of them contributed actively. The SVM technique, which works by maximizing the margin, has also been used in medical decision making to separate two classes of data completely by creating the greatest possible distance between them. However, most medical problems involve data that is not separable, which makes decisions difficult. The solution is to map the data appropriately first. Combining both levels in hybrid selection yields better accuracy than using either of them individually.
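The independence assumption is what keeps the naive Bayes classifier simple: the score of a class is its prior multiplied by one conditional probability per attribute. The following minimal sketch handles categorical attributes and assumes Laplace (add-one) smoothing; the attribute values and class names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log-posterior under independence."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = log(class_counts[label] / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(label, i)][value]
            # Add-one smoothing avoids zero probabilities for unseen values.
            score += log((count + 1) /
                         (class_counts[label] + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records: (glucose level, family history) -> diagnosis.
rows = [("high", "yes"), ("high", "no"), ("high", "yes"),
        ("normal", "no"), ("normal", "no"), ("normal", "yes")]
labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "yes")), predict(model, ("normal", "no")))
```

Because each attribute contributes its own factor, removing an irrelevant attribute simply removes a factor from the product, which is one reason feature selection pairs naturally with this classifier.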
In general, however, hybrid attribute selection gives better results than the other methods. In addition, the results at each level improve by around two percent. Another advantage of attribute selection is that naive Bayes analysis gives a better understanding of the problem than other algorithms. The C4.5 method combined with the Bayes method yields better results than the latter used on its own. Finally, hybrid attribute selection yields better results when combined with other methods because it improves comprehensibility and reduces complexity in medical data mining. An ensemble of Support Vector Machines (SVM) with a novel hybrid feature selection method has been used for the automatic diagnosis of erythemato-squamous diseases, with each member of the ensemble having its own favorite class to identify. This improves the results considerably, so that the disease can be diagnosed, and therefore treated, with high accuracy.
The introduction of medical information systems in the clinical field has led to the collection of a great deal of information about diseases, much of which has never been fully analyzed. It has also led to the collection of patients’ clinical histories, which has contributed greatly to better treatment. In short, it has created a common point at which doctors can share ideas about emerging diseases and work towards lasting treatments for them. To achieve this, however, data must be collected continuously. In addition, machine learning algorithms must be used to identify the classifiers best suited to analyzing medical data. For doctors, automatically collected data about diseases is more specific for treatment and supports a comprehensive assessment of patients. Errors in medical prescriptions will be greatly reduced, improving the chances of survival. In this paper, six well-known supervised machine learning algorithms are used, each applied to about 10 medical datasets. The results show that no single algorithm was superior to the others; each had its own advantages and drawbacks. Many medical analysts expressed concern that the process would involve difficulties; however, it was easy to find single well-performing classifiers among those selected. The hybrid selection method was identified as the most effective, since it reduces complexity, takes less time, and offers more protection to patients.