Open access
Published: 12 November 2020

Early and accurate detection and diagnosis of heart disease using intelligent computational model

Yar Muhammad¹, Muhammad Tahir¹, Maqsood Hayat¹ & Kil To Chong²

Scientific Reports volume 10, Article number: 19747 (2020)


Cardiovascular diseases
Computational biology and bioinformatics
Health care
Heart failure

Heart disease is a fatal human disease whose incidence is rising rapidly in both developed and developing countries, and it is a major cause of death. In this disease, the heart typically fails to supply enough blood to the other parts of the body for them to function normally. Early and timely diagnosis is essential to prevent further damage and save patients' lives. Among the conventional invasive techniques, angiography is the best-known technique for diagnosing heart problems, but it has some limitations. Non-invasive methods, such as intelligent learning-based computational techniques, have been found more reliable and effective for heart disease diagnosis. Here, an intelligent computational predictive system is introduced for the identification and diagnosis of cardiac disease. In this study, various machine learning classification algorithms are investigated. To remove irrelevant and noisy data from the extracted feature space, four distinct feature selection algorithms are applied, and the results of each feature selection algorithm combined with each classifier are analyzed. Several performance metrics, namely accuracy, sensitivity, specificity, AUC, F1-score, MCC, and the ROC curve, are used to assess the effectiveness and strength of the developed model. The classification rates of the developed system are examined on both the full and the optimal feature spaces, and the performance of the developed model improves on the optimal (selected) feature space. In addition, the P-value and Chi-square statistic are computed for the Extra-Trees (ET) classifier with each feature selection technique. The proposed system is expected to help physicians diagnose heart disease accurately and effectively.


Introduction

Heart disease is considered one of the most dangerous and life-threatening chronic diseases worldwide. In heart disease, the heart normally fails to supply sufficient blood to the other parts of the body to sustain their normal functions 1. Heart failure occurs due to blockage and narrowing of the coronary arteries, which supply blood to the heart itself 2. A recent survey reveals that the United States is the country most affected by heart disease, with a very high proportion of heart disease patients 3. The most common symptoms of heart disease include physical weakness, shortness of breath, swollen feet, and fatigue with associated signs 4. The risk of heart disease may be increased by lifestyle factors such as smoking, an unhealthy diet, high cholesterol, high blood pressure, and lack of exercise and fitness 5. Coronary artery disease (CAD) is the most common type of heart disease and can lead to chest pain, stroke, and heart attack. Other types include heart rhythm problems, congestive heart failure, congenital heart disease (present at birth), and cardiovascular disease (CVD). Traditional investigation techniques were initially used to identify heart disease, but they were found to be complex 6. Owing to the unavailability of medical diagnostic tools and medical experts, specifically in developing countries, the diagnosis and treatment of heart disease are very complex 7. However, precise and timely diagnosis of heart disease is imperative to prevent further damage to the patient 8. The incidence of this fatal disease is rising rapidly in both economically developed and developing countries. According to a World Health Organization (WHO) report, an estimated 17.9 million people died from CVD in 2016, approximately 30% of all global deaths. According to another report, 0.2 million people die from heart disease annually in Pakistan, and the number of affected people is increasing rapidly every year. The European Society of Cardiology (ESC) has published a report in which 26.5 million adults were identified as having heart disease, with 3.8 million new cases identified each year. About 50–55% of heart disease patients die within the first 1–3 years, and the cost of heart disease treatment is about 4% of the overall annual healthcare budget 9.

Conventional invasive methods for diagnosing heart disease are based on a patient's medical history, physical examination results, and the doctors' investigation of related symptoms 10. Among the conventional methods, angiography is considered one of the most precise techniques for identifying heart problems. However, angiography has drawbacks such as high cost, various side effects, and the need for strong technical expertise 11. Conventional methods often lead to imprecise diagnoses and take more time because of human error. In addition, they are expensive and computationally intensive, and the assessment takes time 12.

To overcome these issues with conventional invasive methods, researchers have developed various non-invasive smart healthcare systems based on predictive machine learning techniques, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Decision Tree (DT) 13. As a result, the mortality rate of heart disease patients has decreased 14. In the literature, the Cleveland heart disease dataset is used extensively by researchers 15, 16.

In this regard, Robert et al. 17 used a logistic regression classification algorithm for heart disease detection and obtained an accuracy of 77.1%. Similarly, Wankhade et al. 18 used a multi-layer perceptron (MLP) classifier for heart disease diagnosis and attained an accuracy of 80%. Likewise, Allahverdi et al. 19 developed a heart disease classification system based on artificial neural networks and attained an accuracy of 82.4%. Next, Awang et al. 20 used NB and DT for the diagnosis and prediction of heart disease and achieved reasonable accuracy: 82.7% with NB and 80.4% with DT. Oyedodum and Olaniye 21 proposed a three-phase system for the prediction of heart disease using ANN. Das and Turkoglu 22 proposed an ANN ensemble-based predictive model for the prediction of heart disease. Similarly, Paul and Robin 23 used an adaptive fuzzy ensemble method for the prediction of heart disease. Likewise, Tomov et al. 24 introduced a deep neural network for heart disease prediction, and their model produced good results. Further, Manogaran and Varatharajan 25 introduced a hybrid recommendation system for diagnosing heart disease, and their model gave considerable results. Alizadehsani et al. 26 developed a non-invasive model for the prediction of coronary artery disease and reported good results in terms of accuracy and other performance metrics. Amin et al. 27 proposed a hybrid framework for the identification of cardiac disease using machine learning and attained an accuracy of 86.0%. Similarly, Mohan et al. 28 proposed an intelligent system that integrates RF with a linear model for the prediction of heart disease and achieved a classification accuracy of 88.7%. Likewise, Liaqat et al. 29 developed an expert system that uses stacked SVM for the prediction of heart disease and obtained 91.11% classification accuracy on selected features.

The contribution of the current work is an intelligent medical decision system for the diagnosis of heart disease based on contemporary machine learning algorithms. In this study, 10 machine learning classification algorithms of different types, such as Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), Random Forest (RF), and Artificial Neural Network (ANN), are implemented to select the best model for timely and accurate detection of heart disease at an early stage. Four feature selection algorithms, Fast Correlation-Based Filter Solution (FCBF), minimal redundancy maximal relevance (mRMR), Least Absolute Shrinkage and Selection Operator (LASSO), and Relief, are used to select the vital, more correlated features that best reflect the desired target. The developed system has been trained and tested on the Cleveland (S1) and Hungarian (S2) heart disease datasets, which are available online in the UCI machine learning repository. All processing and computations were performed using the Anaconda distribution, and all classifiers were implemented in Python. The main packages and libraries used include pandas, NumPy, matplotlib, scikit-learn (sklearn), and seaborn. The main contributions of the proposed work are as follows:

The performance of all classifiers has been tested on the full feature spaces in terms of all performance evaluation metrics, specifically accuracy.

The performance of the classifiers is tested on the feature spaces chosen by the feature selection algorithms mentioned above.

The study recommends which feature selection algorithm works best with which classification algorithm for developing a high-performing intelligent system for diagnosing heart disease patients.

The rest of the paper is organized as follows: the “Results and discussion” section presents the results and discussion, and the “Material and methods” section describes the material and methods used in this paper. Finally, the “Conclusion” section concludes the proposed research work.

Results and discussion

This section discusses the experimental results of various contemporary classification algorithms. First, the performance of all classification models, i.e., K-Nearest Neighbors (KNN), Decision Tree (DT), Extra-Tree Classifier (ETC), Random Forest (RF), Logistic Regression (LR), Naïve Bayes (NB), Artificial Neural Network (ANN), Support Vector Machine (SVM), Adaboost (AB), and Gradient Boosting (GB), is evaluated on the full feature space. After that, four feature selection algorithms (FSA), Fast Correlation-Based Filter (FCBF), Minimal Redundancy Maximal Relevance (mRMR), Least Absolute Shrinkage and Selection Operator (LASSO), and Relief, are applied to select the prominent, high-variance features from the feature space. The selected feature spaces are then provided to the classification algorithms as input to analyze the significance of the feature selection techniques. 10-fold cross-validation is applied to both the full and the selected feature spaces to analyze the generalization power of the proposed model, and various performance evaluation metrics are used to measure the performance of the classification models.

Classifiers’ predictive outcomes on full feature space

The experimental outcomes of the classification algorithms on the full feature space of the two benchmark datasets using 10-fold cross-validation (CV) are shown in Tables 1 and 2, respectively.

The experimental results on S1 demonstrate that the ET classifier performed well on all performance evaluation metrics compared to the other classifiers using 10-fold CV. ET achieved 92.09% accuracy, 91.82% sensitivity, 92.38% specificity, 97.92% AUC, 92.84% precision, 0.92 F1-score, and 0.84 MCC. Specificity measures the proportion of individuals without the disease whom the test correctly classifies as negative, while sensitivity measures the proportion of patients with heart disease whom the test correctly classifies as positive. For the KNN classification model, multiple experiments were carried out with k = 3, 5, 7, 9, 13, and 15. KNN showed its best performance at k = 7, achieving 85.55% classification accuracy, 85.93% sensitivity, 85.17% specificity, 95.64% AUC, 86.09% precision, 0.86 F1-score, and 0.71 MCC. Similarly, the DT classifier achieved 86.82% accuracy, 89.73% sensitivity, 83.76% specificity, 91.89% AUC, 85.40% precision, 0.87 F1-score, and 0.73 MCC. Likewise, the GB classifier yielded 91.34% accuracy, 90.32% sensitivity, 91.52% specificity, 96.87% AUC, 92.14% precision, 0.92 F1-score, and 0.83 MCC. After empirically evaluating the success rates of all classifiers, the ET classifier outperformed all the other classification algorithms in terms of accuracy, sensitivity, and specificity, whereas NB showed the lowest performance on these measures. The ROC curves of all classification algorithms on the full feature space are shown in Fig. 1.

Figure 1. ROC curves of all classifiers on the full feature space using 10-fold cross-validation on S1.
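The k sweep for KNN described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code (the paper used scikit-learn): the helper names `knn_predict`, `loo_accuracy`, and `select_k` are hypothetical, distances are Euclidean, and leave-one-out evaluation stands in for 10-fold CV to keep the example short.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy (a stand-in for 10-fold CV): predict each sample from all the others."""
    correct = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        correct += int(pred == y[i])
    return correct / len(X)


def select_k(X, y, candidates=(3, 5, 7, 9, 13, 15)):
    """Score every candidate k; return the best one (earliest wins ties) and all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

On real data the chosen k depends on the feature scaling applied beforehand, since Euclidean distance is dominated by features with large numeric ranges.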

For dataset S2, which comprises 1025 instances (525 in the positive class and 500 in the negative class), ET again obtained better results than the other classifiers using 10-fold cross-validation: 96.74% accuracy, 96.36% sensitivity, 97.40% specificity, and 0.93 MCC, as shown in Table 2.

Classifiers’ predictive outcomes on selected feature space

FCBF feature selection technique

The FCBF feature selection technique is applied to select the best subset of the feature space. Subspaces of various lengths are generated and tested, and the classification algorithms achieve their best results on a subset of n = 6 features using 10-fold CV. Table 3 shows various performance measures of the classifiers on the feature space selected by FCBF.

Table 3 shows that the ET classifier obtained good results, including 94.14% accuracy, 94.29% sensitivity, and 93.98% specificity. In contrast, NB reported the lowest performance among the classification algorithms. The performance of the classification algorithms is also illustrated with ROC curves in Fig. 2.

Figure 2. ROC curves of all classifiers on the features selected by the FCBF feature selection algorithm.

mRMR feature selection technique

The mRMR feature selection technique is used to select a subset of features that enhances the performance of the classifiers. The best results are obtained on a subset of n = 6 features, as shown in Table 4.

With mRMR, the ET classifier still performs well on all performance evaluation metrics compared to the other classifiers, attaining 93.42% accuracy, 93.92% sensitivity, and 93.88% specificity. In contrast, NB achieved the lowest outcome, with 81.84% accuracy. Figure 3 shows the ROC curves of all ten classifiers using the mRMR feature selection algorithm.

Figure 3. ROC curves of all classifiers on the features selected by the mRMR feature selection algorithm.

LASSO feature selection technique

The LASSO feature selection technique is applied to choose an optimal feature space that not only reduces computational cost but also improves the performance of the classifiers. After experiments on different subsets of the feature space, the best results are again obtained on the subspace of n = 6. The predicted outcomes on the best-selected feature space using 10-fold CV are reported in Table 5.

Table 5 shows that the ET classifier again outperformed the other classifiers, achieving 89.36% accuracy, 88.21% sensitivity, and 90.58% specificity. GB yielded the second-best result, with 88.47% accuracy, 89.54% sensitivity, and 87.37% specificity, whereas LR performed worst, with 80.77% accuracy, 83.46% sensitivity, and 77.95% specificity. The ROC curves of the classifiers are shown in Fig. 4.

Figure 4. ROC curves of all classifiers on the feature space selected by the LASSO feature selection algorithm.

Relief feature selection technique

Next, another feature selection technique, Relief, is applied, and the performance of the classifiers is investigated on different sub-feature spaces using a wrapper approach. After empirically analyzing the results of the classifiers on different feature subsets, the classifiers perform best on the subspace of length n = 6. The results on this optimal feature space using 10-fold CV are listed in Table 6.

Again, the ET classifier outperformed the other classifiers on all performance evaluation metrics, obtaining 94.41% accuracy, 94.93% sensitivity, and 94.89% specificity. In contrast, NB showed the lowest performance, with 80.29% accuracy, 81.93% sensitivity, and 78.55% specificity. The ROC curves of the classifiers are shown in Fig. 5.

Figure 5. ROC curves of all classifiers on the features selected by the Relief feature selection algorithm.

After running the classification algorithms on the full and selected feature spaces to choose the optimal algorithm for the operational engine, the empirical results reveal that ET performed best among all classification algorithms, not only on the full feature space but also on the optimal selected feature spaces. Its highest accuracy, 94.41%, was obtained with the Relief feature selection technique. Overall, ET performs better on most measures, while the other classifiers performed well on one measure but worse on others. In addition, the ET classifier is evaluated with 10-fold CV on sub-feature spaces of varying length, from 1 to 12 in steps of 1, to check its stability and discrimination power, as described in 30. This helps readers understand the impact of the number of selected features on classifier performance. The same process is repeated for the second dataset, S2 (Hungarian heart disease dataset), to assess the impact of the selected features on classification performance.

Tables 7 and 8 show the performance of the ET classifier using 10-fold CV on feature sub-spaces of length 1 to 12 in steps of 1. The results show that the performance of the ET classifier is significantly affected by the length of the sub-feature space. These achievements are attributed to the Relief feature selection technique, which not only reduces the feature space but also enhances the predictive power of the classifiers. The ET classifier also played an important role, because it effectively learned the patterns that characterize the target class. In addition, the ET classifier is evaluated with 5-fold and 7-fold CV on sub-spaces of length 5 and 7 to check its stability and discrimination power, and it is also tested on the second dataset, S2 (Hungarian heart disease dataset). These results are given in the supplementary materials.

In Table 9, P-values and Chi-square values are computed for the ET classifier combined with the optimal feature spaces of the different feature selection techniques.

Performance comparison with existing models

Further, the developed system is compared with other state-of-the-art machine learning approaches discussed in the literature. Table 10 gives a brief description and the classification accuracies of those approaches. The results demonstrate that the success rate of the proposed model is higher than that of the existing models in the literature.

Material and methods

The following subsections describe the materials and methods used in this paper.

The first and most fundamental step in developing an intelligent computational model is to construct a problem-related dataset that effectively reflects the pattern of the target class. A well-organized, problem-related dataset strongly influences the performance of the computational model. Given the importance of the dataset, two datasets are used: the Cleveland heart disease dataset (S1) and the Hungarian heart disease dataset (S2). Both are available online in the University of California Irvine (UCI) machine learning repository and on Kaggle, and various researchers have used them in their studies 28, 31, 32. S1 consists of 304 instances, each with 13 distinct attributes along with the target label, and these are selected for training. The dataset has two classes: presence or absence of heart disease. S2 is composed of 1025 instances, of which 525 belong to the positive class and the remaining 500 to the negative class. Both datasets share the same 13 attributes, which are described in Table 11.

Proposed system methodology

The main goal of the developed system is to identify heart problems in human beings. In this study, four distinct feature selection techniques, namely FCBF, mRMR, Relief, and LASSO, are applied to the dataset to remove noisy and redundant features and select variant features, which can enhance the performance of the proposed model. The machine learning classification algorithms used in this study are KNN, DT, ETC, RF, LR, NB, ANN, SVM, AB, and GB, and different evaluation metrics are computed to assess their performance. The methodology of the proposed system is carried out in five stages: dataset preprocessing, feature selection, cross-validation, classification, and performance evaluation of the classifiers. The framework of the proposed system is illustrated in Fig. 6.

Figure 6. An intelligent hybrid framework for the prediction of heart disease.

Preprocessing of data

Data preprocessing is the process of transforming raw data into a clean, consistent form suitable for modeling, and it is crucial for a good representation of the data. Preprocessing approaches such as missing-value removal, standard scaling, and min–max scaling are applied to the dataset to make it more effective for classification.
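The three preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: missing values are assumed to be encoded as `None`, and the helper names are hypothetical. The standard deviation is the population one (ddof = 0), which matches scikit-learn's `StandardScaler`.

```python
import statistics


def drop_missing(rows):
    """Missing-value removal: discard any record that has a missing entry (None here)."""
    return [row for row in rows if all(v is not None for v in row)]


def standard_scale(column):
    """Standard scaling: zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std if std else 0.0 for v in column]


def min_max_scale(column):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]
```

For example, `drop_missing([[50, 120.0], [40, None], [60, 140.0]])` keeps the first and third records. Inside cross-validation, one would typically compute the scaling statistics on the training folds only and reuse them on the test fold, so that no information leaks from the test data.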

Feature selection algorithms

A feature selection technique selects the optimal feature sub-space from all the features in a dataset. This is crucial because classification performance can degrade due to irrelevant features. Feature selection improves the performance of classification algorithms and also reduces their execution time. Four feature selection techniques are used in this study:

Fast correlation-based filter (FCBF): The FCBF feature selection algorithm follows a sequential search strategy. It starts from the full feature set and uses symmetric uncertainty to measure the dependencies of the features on each other and on the target output label. It then selects the most important features using a backward sequential search strategy. FCBF performs well on high-dimensional datasets. Table 12 shows the features selected (n = 6) by the FCBF feature selection algorithm; each attribute is given a weight based on its importance. According to FCBF, the most important features are THA and CPT, as shown in Table 12. The ranking that FCBF gives to all the features of the dataset is shown in Fig. 7.
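The symmetric-uncertainty logic behind FCBF can be sketched in plain Python. This follows the commonly described form of FCBF (a relevance filter on SU with the class, then removal of features that a more relevant kept feature already predicts), not necessarily the authors' implementation. It assumes discrete feature values, so continuous attributes such as age would need to be discretized first; the `threshold` default and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat them as sharing no information
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """features maps name -> discrete values; returns the kept names, most relevant first."""
    su_target = {name: symmetric_uncertainty(vals, target) for name, vals in features.items()}
    # Relevance step: keep features above the threshold, ordered by SU with the class.
    ranked = sorted((f for f in features if su_target[f] > threshold),
                    key=su_target.get, reverse=True)
    selected = []
    for f in ranked:
        # Redundancy step: drop f if an already kept (more relevant) feature g
        # is at least as correlated with f as f is with the class.
        if all(symmetric_uncertainty(features[g], features[f]) < su_target[f] for g in selected):
            selected.append(f)
    return selected
```

In this toy setting a feature that duplicates an already kept one is discarded as redundant, and a feature independent of the class never passes the relevance step.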

Minimal redundancy maximal relevance (mRMR): mRMR uses a heuristic approach to select the most vital features, those with minimum redundancy and maximum relevance, i.e., features that are useful and relevant to the target. Because it is heuristic, it considers one feature at a time and computes its pairwise redundancy with the other features. The mRMR feature selection algorithm is not suitable for high-dimensional feature problems 33. The features selected by mRMR (n = 6) are listed in Table 13; among these attributes, PES and CPT have the highest scores. Figure 7 shows the ranking that mRMR gives to all attributes in the feature space.
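The one-feature-at-a-time search described above can be sketched as a greedy loop. The paper does not state which mRMR criterion it uses; this sketch assumes the common difference form (relevance minus mean redundancy, both measured by mutual information) and discrete feature values. The helper names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, target, n_select):
    """Greedy mRMR (difference form): relevance minus mean redundancy with the chosen set."""
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]  # start from the most relevant feature
    while len(selected) < min(n_select, len(features)):
        scores = {
            f: relevance[f]
            - sum(mutual_information(features[f], features[s]) for s in selected) / len(selected)
            for f in features
            if f not in selected
        }
        selected.append(max(scores, key=scores.get))
    return selected
```

Because each step recomputes redundancy against every selected feature, the cost grows with the number of candidates times the subset size, which is one reason mRMR scales poorly to very high-dimensional problems.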

Figure 7. Feature rankings by the four feature selection algorithms (FCBF, LASSO, mRMR, Relief).

Least absolute shrinkage and selection operator (LASSO): LASSO selects features by shrinking the absolute values of the feature coefficients; features whose coefficients become zero during the update are removed from the feature subset. LASSO performs well when feature coefficient values are low. Features with high coefficient values are kept in the subset, and the rest are eliminated. However, some irrelevant features with higher coefficient values may also be selected and included in the subset 30. Table 14 lists the six most important attributes, which are strongly correlated with the target, together with the scores assigned to them by LASSO. Figure 7 shows the important features and their scores given by the LASSO feature selection algorithm.
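The shrink-to-zero behavior can be sketched with cyclic coordinate descent and soft-thresholding. This is an assumption-laden illustration, not the authors' setup: it minimizes (1/2n)·||y − Xw||² + α·||w||₁ (the objective scikit-learn documents for `Lasso`), assumes standardized feature columns and a centered target so no intercept is fit, and the helper names and `alpha` values are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j's own term.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z else 0.0
    return w


def lasso_select(names, X, y, alpha):
    """Keep features with non-zero coefficients, ordered by |coefficient| (largest first)."""
    w = lasso_cd(X, y, alpha)
    ranked = sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
    return [name for name, coef in ranked if coef != 0.0]
```

Raising `alpha` drives more coefficients to exactly zero, so in practice the penalty would be tuned until the desired number of features (n = 6 in this study) survives.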

Relief feature selection algorithm: Relief uses the concept of instance-based learning, assigning a weight to each attribute based on its significance. The weight of each attribute reflects its ability to differentiate between class values. Attributes are ranked by weight, and those whose weight exceeds a user-specified cutoff are chosen as the final subset 34. The Relief algorithm thus selects the most significant attributes, those with the greatest effect on the target 35. The algorithm operates by selecting instances randomly from the training samples. For each sampled instance, the nearest instance of the same class (nearest hit) and of the opposite class (nearest miss) are identified. The weight of an attribute is updated according to how well its values differentiate the sampled instance from its nearest miss and nearest hit. An attribute that distinguishes instances of different classes and has the same value for instances of the same class receives a high weight.

The weight update of the attributes (line 6 of the Relief algorithm) is based on a simple idea. If instance R_i and its nearest hit NH have dissimilar values (i.e., the diff value is large), the attribute separates two instances of the same class, which is undesirable, so the attribute's weight is reduced. Conversely, if R_i and its nearest miss NM have distinct values, the attribute separates two instances of different classes, which is desirable, so its weight is increased. For m sampled instances, the update for attribute A is:

$$W[A] \leftarrow W[A] - \frac{\mathrm{diff}(A, R_i, NH)}{m} + \frac{\mathrm{diff}(A, R_i, NM)}{m}$$

The six most important features selected by Relief are listed in descending order in Table 15. Based on the weight values, the most vital features are CPT and Age. Figure 7 shows the important features and their ranking given by the Relief feature selection algorithm.
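The sampling and weight-update procedure described above can be sketched for the basic two-class Relief. Assumptions, not taken from the paper: features are numeric, `diff` is the absolute difference divided by the feature's range, neighbors are found with the sum of those normalized differences, and the function name and `seed` parameter are hypothetical.

```python
import random


def relief(X, y, m, seed=0):
    """Basic two-class Relief on numeric features; returns one weight per feature."""
    n, p = len(X), len(X[0])
    # Per-feature range, used to normalize differences into [0, 1]; constant features get 1.0.
    span = [(max(row[j] for row in X) - min(row[j] for row in X)) or 1.0 for j in range(p)]

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / span[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(p))

    rng = random.Random(seed)
    weights = [0.0] * p
    for _ in range(m):
        i = rng.randrange(n)  # randomly sampled instance R_i
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        near_hit = min(hits, key=lambda k: distance(i, k))
        near_miss = min(misses, key=lambda k: distance(i, k))
        for j in range(p):
            # W[A] <- W[A] - diff(A, R_i, NH)/m + diff(A, R_i, NM)/m
            weights[j] += (diff(j, i, near_miss) - diff(j, i, near_hit)) / m
    return weights
```

Ranking features by the returned weights and keeping the top entries (or those above a cutoff) gives the selected subset; variants such as ReliefF extend this to several neighbors and more than two classes.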

Machine learning classification algorithms

Various machine learning classification algorithms are investigated in this study for the early detection of heart disease. Each classification algorithm has its own strengths, and its relative importance varies from application to application. In this paper, 10 classification algorithms of different types, namely KNN, DT, ET, GB, RF, SVM, AB, NB, LR, and ANN, are applied to select the best and most generalizable prediction model.

Classifier validation method

Validation of the prediction model is an essential step in machine learning. In this paper, the k-fold cross-validation method is applied to validate the results of the above-mentioned classification models.

K-fold cross validation (CV)

In k-fold CV, the whole dataset is split into k equal parts. At each iteration, k − 1 parts are used for training and the remaining part for testing, and the process is repeated k times so that each part serves once as the test set. Researchers have used different values of k; here k = 10 is used because it produces good results. In 10-fold CV, 90% of the data is used to train the model and the remaining 10% to test it at each iteration. Finally, the mean of the results over all iterations is taken as the final result.
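The fold construction described above can be sketched as follows. This is a minimal version with contiguous, unshuffled folds; like scikit-learn's `KFold`, when n is not divisible by k the first n mod k folds get one extra sample. Whether the authors shuffled or stratified the folds is not stated, so that choice is left open, and the helper names and the `fit_and_score` callback are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, fit_and_score, k=10):
    """Mean of the per-fold scores; fit_and_score(train_idx, test_idx) is supplied by the caller."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

With 304 instances, for example, 10-fold CV yields four test folds of 31 samples and six of 30, and every instance is tested exactly once.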

Performance evaluation metrics

To measure the performance of the classification algorithms used in this paper, various evaluation metrics are used, including accuracy, sensitivity (recall), specificity, precision, F1-score, Matthews correlation coefficient (MCC), AUC score, and the ROC curve. All these measures are calculated from the confusion matrix described in Table 16.

In the confusion matrix, a True Negative (TN) means the patient does not have heart disease and the model also predicts this, i.e., a healthy person is correctly classified by the model.

A True Positive (TP) means the patient has heart disease and the model also predicts this, i.e., a person with heart disease is correctly classified by the model.

A False Positive (FP) means the patient does not have heart disease but the model predicts that the patient does, i.e., a healthy person is incorrectly classified by the model. This is also called a Type I error.

A False Negative (FN) means the patient has heart disease but the model predicts that the patient does not, i.e., a person with heart disease is incorrectly classified by the model. This is also called a Type II error.
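The four confusion-matrix cells defined above can be counted directly from true and predicted labels. This sketch assumes the label `1` marks heart disease; the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) with `positive` as the heart-disease label."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # patient correctly flagged
            else:
                fp += 1  # healthy person flagged: Type I error
        elif actual == positive:
            fn += 1      # patient missed: Type II error
        else:
            tn += 1      # healthy person correctly cleared
    return tp, tn, fp, fn
```

For instance, true labels `[1, 1, 0, 0, 1, 0]` against predictions `[1, 0, 0, 1, 1, 0]` give two true positives, two true negatives, one false positive, and one false negative.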

Accuracy: The accuracy of a classification model reflects its overall performance and is calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Specificity: Specificity is the ratio of correctly classified healthy people to the total number of healthy people, i.e. the proportion of healthy people for whom the prediction is negative. It is calculated as follows:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Sensitivity: Sensitivity (recall) is the ratio of correctly classified heart patients to the total number of patients with heart disease, i.e. the proportion of patients with heart disease for whom the prediction is positive. It is calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Precision: Precision is the ratio of correctly predicted positive cases to all cases predicted as positive by the classification model. It is calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

F1-score: F1 is the harmonic mean of precision and sensitivity (recall). Its value ranges between 0 and 1: a value of 1 indicates perfect precision and recall, and a value of 0 indicates the worst possible performance.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

MCC: The Matthews correlation coefficient is the correlation coefficient between the actual and predicted classes. Its value lies between −1 and +1, where −1 represents completely wrong predictions, 0 indicates that the classifier does no better than random prediction, and +1 represents perfect prediction. It is calculated as follows:

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
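The metric definitions above can be checked against each other with a small helper. This is a minimal sketch in plain Python, assuming binary labels where 1 marks a heart-disease patient (the positive class); the names `Confusion`, `confusion_from_labels`, and `classification_metrics` are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for the example rather than something the paper specifies.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence

@dataclass
class Confusion:
    tp: int
    tn: int
    fp: int
    fn: int

def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int],
                          positive: int = 1) -> Confusion:
    """Count TP, TN, FP, FN; `positive` marks the heart-disease class."""
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            c.tp += 1
        elif t != positive and p != positive:
            c.tn += 1
        elif p == positive:  # healthy person predicted as patient (Type I)
            c.fp += 1
        else:                # patient predicted as healthy (Type II)
            c.fn += 1
    return c

def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # convention: 0.0 when undefined

def classification_metrics(c: Confusion) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }
```

For example, ten people with four patients, where the model finds three of the patients and wrongly flags two healthy people, give TP = 3, TN = 4, FP = 2, FN = 1, so accuracy is 0.7, sensitivity 0.75, precision 0.6, and MCC = 10 / √600 ≈ 0.408.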

Finally, the predictive ability of the machine learning classification algorithms is examined with the receiver operating characteristic (ROC) curve, a graphical representation of a classifier's performance that plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across decision thresholds. The area under the curve (AUC) summarizes the ROC curve of a classifier in a single number: the larger the AUC, the better the performance of the classification algorithm.
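To make the ROC and AUC description concrete, the sketch below builds the ROC points by lowering the decision threshold over a classifier's scores and integrates them with the trapezoidal rule. It is an illustrative assumption, not the paper's implementation: labels are taken to be 0 (healthy) and 1 (heart disease), scores are assumed to grow with the predicted risk, and the names `roc_points` and `auc_trapezoid` are hypothetical.

```python
from typing import List, Sequence, Tuple

def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the decision threshold step by step.

    y_true uses 1 for heart disease (positive) and 0 for healthy; a higher
    score means the classifier is more confident the person is a patient.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # samples with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfect ranking of patients above healthy people yields an AUC of 1.0, while identical scores for everyone yield 0.5, matching the reading that a larger AUC means a better classifier.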

In this study, 10 different machine learning classification algorithms, namely LR, DT, NB, RF, ANN, KNN, GB, SVM, AB, and ET, are implemented in order to select the best model for early and accurate detection of heart disease. Four feature selection algorithms, FCBF, mRMR, LASSO, and Relief, have been used to select the most important features, i.e. those most relevant to the target class. The developed intelligent computational model has been trained and tested on two datasets, the Cleveland (S1) and Hungarian (S2) heart disease datasets. Python has been used to implement all of the classification algorithms and to simulate their results.

The performance of all classification models has been tested in terms of various performance metrics on the full feature space as well as on the feature subspaces chosen by the feature selection algorithms. This study therefore recommends which feature selection algorithm works best with which classification model for developing a high-performing intelligent system for diagnosing heart disease. The simulation results show that ET is the best classifier and Relief is the best feature selection algorithm. In addition, the P-value and Chi-square statistic are computed for the ET classifier with each feature selection algorithm. It is anticipated that the proposed system will help doctors and other caregivers diagnose heart disease accurately and effectively at an early stage.
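Because Relief turned out to be the most useful feature selector here, its core idea may be worth illustrating. The sketch below follows the original two-class Relief weighting of Kira and Rendell, under assumptions that are not taken from the paper: it visits every instance instead of a random sample, uses Manhattan distance on min-max-scaled features, and the function name `relief_weights` is hypothetical. It is not the implementation used in this study.

```python
from typing import List, Sequence

def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Basic two-class Relief: a larger weight means the feature separates classes better.

    For every instance R, find its nearest hit H (same class) and nearest
    miss M (other class) and update each feature weight by
    (diff(R, M) - diff(R, H)) / n, where diff is the min-max-scaled
    absolute difference.
    """
    n, d = len(X), len(X[0])
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)  # constant feature: avoid /0

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))  # Manhattan distance

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind: skip this instance
        near_hit = min(hits, key=lambda k: dist(X[i], X[k]))
        near_miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[near_miss]) - diff(j, X[i], X[near_hit])) / n
    return weights
```

Features would then be ranked by weight and the top ones kept; a feature that differs between nearby patients and healthy people but not between nearby people of the same class gets a positive weight, while a noisy feature tends toward zero or below.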

Heart disease is one of the most devastating and fatal chronic diseases; its incidence is rising rapidly in both economically developed and developing countries, and it causes many deaths. This damage can be reduced considerably if patients are diagnosed at an early stage and given proper treatment. In this paper, we developed an intelligent predictive system based on contemporary machine learning algorithms for the prediction and diagnosis of heart disease. The developed system was evaluated on two datasets, the Cleveland (S1) and Hungarian (S2) heart disease datasets, and was trained and tested on both the full feature set and the optimal feature subsets. Ten classification algorithms (KNN, DT, RF, NB, SVM, AB, ET, GB, LR, and ANN) and four feature selection algorithms (FCBF, mRMR, LASSO, and Relief) were used. Feature selection keeps the most significant features, which not only reduces classification errors but also shrinks the feature space. To assess the classifiers, various performance evaluation metrics were used, including accuracy, sensitivity, specificity, AUC, F1-score, MCC, and the ROC curve. On the full feature set, the two best classifiers, ET and GB, achieved accuracies of 92.09% and 91.34%, respectively. After feature selection, the accuracy of ET with the Relief algorithm increased from 92.09% to 94.41%, and the accuracy of GB with the FCBF algorithm increased from 91.34% to 93.36%. The ET classifier with Relief feature selection therefore performed best. The P-value and Chi-square statistic were also computed for the ET classifier with each feature selection technique.
Future work will explore additional optimization techniques, feature selection algorithms, and classification algorithms to further improve the performance of the predictive system for heart disease diagnosis.

Bui, A. L., Horwich, T. B. & Fonarow, G. C. Epidemiology and risk profile of heart failure. Nat. Rev. Cardiol. 8 , 30 (2011).


Polat, K. & Güneş, S. Artificial immune recognition system with fuzzy resource allocation mechanism classifier, principal component analysis, and FFT method based new hybrid automated identification system for classification of EEG signals. Expert Syst. Appl. 34 , 2039–2048 (2010).


Heidenreich, P. A. et al. Forecasting the future of cardiovascular disease in the United States: A policy statement from the American Heart Association. Circulation 123 , 933–944 (2011).

Durairaj, M. & Ramasamy, N. A comparison of the perceptive approaches for preprocessing the data set for predicting fertility success rate. Int. J. Control Theory Appl. 9 , 255–260 (2016).


Das, R., Turkoglu, I. & Sengur, A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36 , 7675–7680 (2012).

Allen, L. A. et al. Decision making in advanced heart failure: A scientific statement from the American Heart Association. Circulation 125 , 1928–1952 (2014).

Yang, H. & Garibaldi, J. M. A hybrid model for automatic identification of risk factors for heart disease. J. Biomed. Inform. 58 , S171–S182 (2015).


Alizadehsani, R., Hosseini, M. J., Sani, Z. A., Ghandeharioun, A. & Boghrati, R. In 2012 IEEE 12th International Conference on Data Mining Workshops. 9–16 (IEEE, New York).

Arabasadi, Z., Alizadehsani, R., Roshanzamir, M., Moosaei, H. & Yarifard, A. A. Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. Comput. Methods Programs Biomed. 141 , 19–26 (2017).

Samuel, O. W., Asogbon, G. M., Sangaiah, A. K., Fang, P. & Li, G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68 , 163–172 (2017).

Patil, S. B. & Kumaraswamy, Y. Intelligent and effective heart attack prediction system using data mining and artificial neural network. Eur. J. Sci. Res. 31 , 642–656 (2009).

Vanisree, K. & Singaraju, J. Decision support system for congenital heart disease diagnosis based on signs and symptoms using neural networks. Int. J. Comput. Appl. 19 , 6–12 (2015).

Edmonds, B. In Proceedings of AISB Symposium on Socially Inspired Computing 1–12 (Hatfield, 2005).

Methaila, A., Kansal, P., Arya, H. & Kumar, P. Early heart disease prediction using data mining techniques. Comput. Sci. Inf. Technol. J. https://doi.org/10.5121/csit.2014.4807 (2014).

Nazir, S., Shahzad, S., Mahfooz, S. & Nazir, M. Fuzzy logic based decision support system for component security evaluation. Int. Arab J. Inf. Technol. 15 , 224–231 (2018).

Detrano, R. et al. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 64 , 304–310 (2009).

Gudadhe, M., Wankhade, K. & Dongre, S. In 2010 International Conference on Computer and Communication Technology (ICCCT) , 741–745 (IEEE, New York).

Kahramanli, H. & Allahverdi, N. Design of a hybrid system for the diabetes and heart diseases. Expert Syst. Appl. 35 , 82–89 (2013).

Palaniappan, S. & Awang, R. In 2012 IEEE/ACS International Conference on Computer Systems and Applications 108–115 (IEEE, New York).

Olaniyi, E. O., Oyedotun, O. K. & Adnan, K. Heart diseases diagnosis using neural networks arbitration. Int. J. Intel. Syst. Appl. 7 , 72 (2015).

Das, R., Turkoglu, I. & Sengur, A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36 , 7675–7680 (2011).

Paul, A. K., Shill, P. C., Rabin, M. R. I. & Murase, K. Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Applied Intelligence 48 , 1739–1756 (2018).

Tomov, N.-S. & Tomov, S. On deep neural networks for detecting heart disease. arXiv:1808.07168 (2018).

Manogaran, G., Varatharajan, R. & Priyan, M. Hybrid recommendation system for heart disease diagnosis based on multiple kernel learning with adaptive neuro-fuzzy inference system. Multimedia Tools Appl. 77 , 4379–4399 (2018).

Alizadehsani, R. et al. Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries. Comput. Methods Programs Biomed. 162 , 119–127 (2018).

Haq, A. U., Li, J. P., Memon, M. H., Nazir, S. & Sun, R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Inf. Syst. 2018 , 3860146. https://doi.org/10.1155/2018/3860146 (2018).

Mohan, S., Thirumalai, C. & Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7 , 81542–81554 (2019).

Ali, L. et al. An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 7 , 54007–54014 (2019).

Peng, H., Long, F. & Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 (8), 1226–1238 (2005).

Palaniappan, S. & Awang, R. In 2008 IEEE/ACS International Conference on Computer Systems and Applications 108–115 (IEEE, New York).

Ali, L., Niamat, A., Golilarz, N. A., Ali, A. & Xingzhong, X. An expert system based on optimized stacked support vector machines for effective diagnosis of heart disease. IEEE Access (2019).

Pérez, N. P., López, M. A. G., Silva, A. & Ramos, I. Improving the Mann-Whitney statistical test for feature selection: An approach in breast cancer diagnosis on mammography. Artif. Intell. Med. 63 , 19–31 (2015).

Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 , 273–282 (2011).


de Silva, A. M. & Leong, P. H. Grammar-Based Feature Generation for Time-Series Prediction (Springer, Berlin, 2015).



Acknowledgements

This research was supported by the Brain Research Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. NRF-2017M3C7A1044815).

Author information

Authors and affiliations.

Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan

Yar Muhammad, Muhammad Tahir & Maqsood Hayat

Department of Electronic and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea

Kil To Chong


Contributions

All authors have equal contributions.

Corresponding authors

Correspondence to Maqsood Hayat or Kil To Chong .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Muhammad, Y., Tahir, M., Hayat, M. et al. Early and accurate detection and diagnosis of heart disease using intelligent computational model. Sci Rep 10 , 19747 (2020). https://doi.org/10.1038/s41598-020-76635-9


Received : 03 April 2020

Accepted : 28 October 2020

Published : 12 November 2020

DOI : https://doi.org/10.1038/s41598-020-76635-9

Sensors (Basel)

Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers

Ch. Anwar ul Hassan

1 Department of Creative Technologies, Air University Islamabad, Islamabad 44000, Pakistan

Jawaid Iqbal

2 Department of Computer Science, Capital University of Science and Technology, Islamabad 44000, Pakistan

Rizwana Irfan

3 Department of Computer Science, University of Jeddah, P.O. Box 123456, Jeddah 21959, Saudi Arabia

Saddam Hussain

4 School of Digital Science, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong BE1410, Brunei

Abeer D. Algarni

5 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

Syed Sabir Hussain Bukhari

6 Department of Electrical Engineering, Sukkur IBA University, Sukkur 65200, Pakistan

Nazik Alturki

7 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

Syed Sajid Ullah

8 Department of Information and Communication Technology, University of Agder (UiA), N-4898 Grimstad, Norway

Associated Data

The data used in this research can be obtained from the corresponding authors upon request.

Coronary heart disease is one of the major causes of death around the globe. Predicting heart disease is one of the most challenging tasks in clinical data analysis. Machine learning (ML) is useful for diagnostic assistance, supporting decision making and prediction based on the data produced by the healthcare sector globally. ML techniques are also employed for disease prediction in the medical field, and numerous research studies have been conducted on heart disease prediction using ML classifiers. In this paper, we used eleven ML classifiers to identify key features that improve the predictability of heart disease. To build the prediction model, various feature combinations and well-known classification algorithms were used. We achieved 95% accuracy with gradient boosted trees and multilayer perceptron in the heart disease prediction model. Random Forest performed better still, reaching an accuracy of 96%.

1. Introduction

The healthcare sector generates a lot of data regarding patients, diseases, and diagnoses, but it is not being appropriately analyzed, so it does not have the impact on patient health that it should [1]. Heart disease is the leading cause of death: according to the World Health Organization [1], CVDs are the largest cause of mortality globally, resulting in the deaths of an estimated 17.9 million individuals each year.

CVDs include coronary artery disease, rheumatic heart disease, vascular disease, and various other heart and blood vessel problems. Four out of every five CVD deaths are caused by strokes or heart attacks, and one-third of these deaths occur in people below the age of 70 [2]. Sex, smoking, age, family history, poor diet, cholesterol, physical inactivity, high blood pressure, being overweight, and alcohol use are the key risk factors for heart disease. Heart disease is also caused by hereditary risk factors such as diabetes and high blood pressure [3]. Physical inactivity, obesity, and an unhealthy diet are some of the secondary causes that increase the risk. Fatigue, palpitations, sweating, back pain, chest pain, shoulder and arm pain, shortness of breath, and overall weakness are the most common symptoms. Chest pain is still the most frequent sign of insufficient blood flow to the heart; in medical terminology, this type of chest pain is known as angina [4]. Tests such as X-rays, MRI scans, and angiography are available to help diagnose the disease. However, resources are sometimes lacking in an emergency because medical equipment is unavailable. In cardiovascular disease, time is critical, because every moment counts in diagnosing and treating the disease [4].

Cardiac centers and outpatient departments produce huge amounts of data on the diagnosis of heart disease, and the potential for big data analytics to improve cardiovascular care and patient outcomes is vast [5]. However, because the data are noisy, incomplete, and irregular, it is hard to make specific, accurate, and well-grounded decisions from them. Nowadays, AI plays an important role in cardiology, thanks to massive advances in equipment, big data, and data storage, acquisition, and retrieval [6]. Using various data mining techniques, researchers have applied preprocessing methods to the data and then drawn conclusions with various ML models [7]. A wide range of ML algorithms and their variants has been used to distinguish genetic cardiac diseases from control subjects and to predict the early stages of heart failure [8, 9]. KNN, DT, SVC, LR, and RF are examples of algorithms used for heart attack prediction [10]. Machine learning approaches can be divided into three categories [11]: supervised ML (task-driven, labeled data; classification/regression); unsupervised ML (data-driven, unlabeled data; clustering); and reinforcement learning (learning from mistakes; e.g. playing games).

In this study, supervised ML classifiers are used to show how well different models predict the presence of heart disease and to compare their accuracy. The classifiers are Logistic Regression (LR), k-Nearest Neighbors (kNN), XGBoost (XGB), Support Vector Machine (SVM), Stochastic Gradient Boosted Tree (GBT), Naive Bayes (NB), Neural Network (NN), Decision Tree (DT), Radial Basis Function (RBF), Random Forest (RF), and Multi-Layer Perceptron (MLP).

The rest of the paper is organized as follows: Section 2 contains the literature review, the proposed methodology is discussed in Section 3, the experimental results are discussed in Section 4, and conclusions are presented in Section 5.

2. Materials and Methods

Research in this area centers on knowledge discovery with ML classifiers. Researchers and practitioners have written many papers on predicting the presence of heart disease, and numerous studies and approaches have been developed to classify heart disease with data mining and ML. The authors of [12] presented a detailed review of the applications of ML to heart disease. They proposed a dataset that contains the samples and data needed to construct an efficient method for predicting heart disease. The dataset must be preprocessed efficiently before it is used by the ML algorithm in order to produce excellent results.

The authors of that study also recommended using a suitable algorithm, such as an ANN or a DT, when developing a prediction model; ANN outperformed DT in most models for predicting heart disease. In [13], the authors proposed a technique for predicting heart disease using data analytics tools and ML techniques such as ANN, DT, fuzzy logic, NB, kNN, and SVM. The paper also includes a performance analysis of the algorithms and a summary of previous research. The author of [14] proposed an architecture that preprocesses the input data before training and testing on various algorithms. The author recommends AdaBoost because it improves the performance of all the ML algorithms, and also supports fine-tuning parameters to achieve high accuracy.

Researchers proposed a deep learning method for the analysis and diagnosis of heart disease [15] using the UCI dataset. They argued that deep neural networks can be crucial in improving overall classification quality in heart disease analysis and diagnosis, and showed that Talos hyperparameter optimization outperforms other techniques for model optimization. KNN, RF, SVM, and DT algorithms were discussed as ML models that can predict heart disease with high accuracy, recall, and precision. SVM gave the highest accuracy, 86%, in their prediction model on the UCI ML repository heart disease dataset [16].

The authors of [17] used four ML algorithms and one NN and compared their performance for cardiac disease detection. To predict cardiac disease, they evaluated the algorithms in terms of accuracy, recall, precision, and F1 score; the deep NN correctly identified heart disease 98% of the time. The author of [18] applied the algorithms to a medical dataset to demonstrate their utility in early disease prediction. According to the findings of that study, boosting and bagging are powerful ensemble approaches for improving the prediction accuracy of classifiers whose accuracy is relatively low, as they perform better at predicting the risk of heart disease. Implementing feature selection improved performance even further, and the results showed a significant increase in prediction accuracy. For weak classifiers, ensemble methods yielded a maximum accuracy increase of 7%. ML algorithms have gained popularity in recent years owing to their increased accuracy and efficiency in prediction [19].

The ability to build models with maximum accuracy and efficiency is critical in this field [20]. Because they combine several ML models with data systems, hybrid models [21] are a viable approach to disease prediction. The accuracy of weak classifiers was improved through bagging and boosting approaches, and their performance in heart disease risk detection was rated as good. For the hybrid model, the authors employed Bayes Net, NB, C4.5, MLP, and RF classifiers with majority voting; the resulting model has an accuracy of 85.48%. More recently, ML techniques such as RF and SVM, as well as learning models, have been applied to the UCI heart disease dataset [22]. A voting-based model that combined multiple classifiers improved accuracy: according to the research, the accuracy of the weak classifiers improved by 2.1%.
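The majority-voting step in the hybrid models described above can be sketched as plain hard voting over the labels produced by several base classifiers. This is a minimal illustration under assumptions: the base classifiers (for example Bayes Net, NB, C4.5, MLP, and RF) are taken to have already produced class labels, and the function name `majority_vote` and its tie-breaking rule are hypothetical, not details reported in [21] or [22].

```python
from collections import Counter
from typing import Hashable, List, Sequence

def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Hard majority voting over several classifiers' label predictions.

    predictions[m][i] is classifier m's predicted label for sample i.
    Ties go to the tied label that appears first in classifier order,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict every sample")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

With an odd number of binary classifiers ties cannot occur, which is one reason ensembles of three or five voters are common; with an even number, the tie-breaking rule above is an arbitrary choice.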

ML classification techniques were used to forecast chronic illness in [23]. In that study, the Hoeffding classifier predicted heart disease with an accuracy of 88.56%, and, when combined with the selected features, the hybrid model achieved an accuracy of 87.41%. The SVM classification model was used with a mean Fisher score feature selection strategy in [24]. In [25], the authors developed a novel prediction model based on several well-known classification approaches and a range of feature combinations. In the proposed HRFLM approach, an ANN with backpropagation and 13 clinical features as inputs was employed, and data mining approaches such as DT, SVM, NN, and KNN were explored. SVM was shown to improve disease prediction accuracy. A new technique, vote, was presented, as well as a hybrid strategy combining LR and NB. The HRFLM method achieved an accuracy of 88.7%. A comprehensive risk model for predicting heart failure mortality was constructed with high accuracy using an improved random survival forest (iRSF) [26]. Using a novel split rule and stopping criteria, iRSF was able to discriminate between survivors and non-survivors. A data mining method has also been used to diagnose cardiovascular disease [27].
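The Fisher score mentioned for [24] ranks each feature by the ratio of between-class to within-class scatter. The sketch below uses the common textbook definition, which is an assumption since the exact variant used in [24] is not given here; the function name `fisher_scores` is hypothetical, and class variances are population variances.

```python
from typing import List, Sequence

def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher score per feature: between-class scatter / within-class scatter.

    F_j = sum_k n_k * (mu_kj - mu_j)**2 / sum_k n_k * var_kj,
    where n_k is the size of class k, mu_kj and var_kj are the mean and
    (population) variance of feature j within class k, and mu_j is the
    overall mean of feature j.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            var_c = sum((v - mean_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mean_c - overall_mean) ** 2
            within += len(vals) * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # zero within-class spread: perfectly separating if class means differ
            scores.append(float("inf") if between > 0 else 0.0)
    return scores
```

Features with higher scores have class means that are far apart relative to the spread inside each class; a selection strategy would keep the top-scoring features, for instance those above the mean score, before training the SVM.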

To diagnose cardiac disorders, Bayesian classifiers, DT classifiers, NN, association rules, KNN, and SVM were used; the SVM achieved an accuracy of 99.3%. Patient survival has been predicted using many machine learning classifiers [28]. The ML methods were compared with traditional biostatistical tests, and the features associated with significant risk factors were ranked. As a result, serum creatinine and ejection fraction were found to be the two most critical features for accurate prediction. An ML algorithm was used to build a CVD detection model in [29]. The dataset was prepared and investigated using four algorithms: the DT and RF methods had a precision of 99.83%, while the SVM and KNN methods had precisions of 85.32% and 84.49%, respectively. Another study [30] used an ensemble method to predict congestive heart failure (CHF) by analyzing heart rate variability (HRV), using deep neural networks to fill gaps in related fields. The proposed system's accuracy was 99.85%.

In a recent paper [31], the authors developed an intelligent framework using factor analysis of mixed data (FAMD) and an RF-based ML algorithm: FAMD was used to find the relevant features, and RF was used to predict the disease. The proposed system achieved a precision of 93.44%, a sensitivity of 89.28%, and a specificity of 96.96%. In [32], the authors tested their hypothesis on a dataset of 303 instances derived from the Cleveland dataset; the proposed DT algorithm achieved an accuracy of 75.55%.

Heart disease is commonly classified as a cardiovascular disease. Many investigators are working on heart disease prediction, and their studies cover many aspects of cardiac illness. In [33], the author applied REP Tree, R Tree, M5P Tree, LR, J48, NB, and JRIP to the Hungarian and Statlog datasets to classify CVD. RF, DT, and LR were applied in [34], and AB, ET, LR, MNB, SVM, CART, LDA, XGB, and RF in [35], with the aim of predicting the probability of people developing heart disease. The findings of [34] show that LR reaches 92% accuracy, and in [35], SVM performs better, achieving 96% accuracy. In [36], the author claims that the DT model consistently beats the NB and SVM models. Its results show that SVM achieves 87% accuracy and DT achieves 90% accuracy, as shown in [37], while LR achieves the highest accuracy in predicting heart disease when compared with DT, SVM, NB, and KNN. The RF-based framework provides a prediction accuracy of 97%, with a specificity of 88% and a sensitivity of 85%, for the evaluation of congenital heart disease [38]. In [39], we applied LR, MARS, EVF, and CART ML techniques to detect the co-existence of CVD, achieving 94% accuracy, with a specificity of 95% and a sensitivity of 93.5%. RF was applied in [40] to predict drug targets involved in microorganism-associated CVD based on host–host and host–pathogen interactions.

To achieve better solutions, researchers have proposed several ensemble and hybrid models for cardiovascular disease prediction. The technique proposed in [41] achieved accuracies of 96%, 88.24%, and 93% on CVD datasets obtained from the Mendeley Center, Cleveland, and IEEE Port, respectively. In [42], the author hybridized the LR and RF models for predicting heart disease and achieved an accuracy of 88.7%. The study in [43] investigated the relationship between coronary artery calcium and carotid plaque in asymptomatic individuals, as well as their relation to predicted CVD risk. Machine learning techniques combined with the IoT are now widely used to predict and detect diseases. In [44], the author applied a deep learning approach using mobile device technology and achieved 94% accuracy in heart disease prediction. In [45], the author combined the IoT with ML classifiers for the early prediction of heart disease, with the objective of demonstrating how ML can be used to solve the problem. We use ML to analyze cases associated with diseases and health conditions by analyzing hundreds of healthcare datasets [46].

In [47], the researchers studied advanced computer vision for dependable healthcare, examining how computer vision techniques support human needs such as psychological functioning, personal mobility, sensory functions, and daily living activities, and how image processing, machine learning, pattern recognition, language processing, and computer graphics work together with robotics. The authors described emerging computer vision techniques for assisting mental functioning, approaches for investigating human behavior, and how smart interfaces and virtual reality tools contribute to the development of advanced rehabilitation systems capable of human action and activity recognition. The work surveys existing contributions of computer vision to health care, such as the technologies behind intelligent wheelchairs, assistance for blind people, and other computer vision-based solutions recently used for safety and health monitoring. In [48], the authors applied multiple approaches, including SVM, GNB, LR, LightGBM, XGB, and RF, to predict heart disease risk; RF performed best, achieving 88% accuracy. The latest work of researchers is compared with our proposed approach, which achieves the highest accuracy among existing approaches that use the UCI repository dataset. In addition, we evaluated accuracy, precision (specificity), recall (sensitivity), and F-measure for the ten ML classifiers.

The etiology of cardiac disease is still an unresolved global problem, and cardiovascular diseases are characterized by high morbidity, disability, and mortality. Efficient and effective early prediction of outcomes in patients with cardiac disease using AI is therefore required. In this study, we apply ML classifiers, including ensemble models, to predict coronary heart disease. We begin by addressing the dataset issue: the data are refined and standardized, including tokenization and lowercasing. The dataset is then used to train and test the classifiers in order to evaluate their performance and achieve the best accuracy. The algorithms were selected because they are state-of-the-art, representative, and mature. After analyzing earlier researchers' work, we used the Gradient Boosted Tree (GBT) and Multilayer Perceptron (MLP), since, to our knowledge, previous researchers had not applied them to the UCI heart dataset.

The main contributions of this work are as follows:

(1) First, we address the datasets by refining and standardizing them. The datasets are then used to train and test the classifiers in order to determine which ones provide the best accuracy.
(2) Second, we identify the most relevant features using the correlation matrix.
(3) Third, we apply the ML classifiers to the preprocessed dataset and tune their parameters to obtain the maximum accuracy.
(4) Fourth, we evaluate the proposed classifiers in terms of accuracy, precision, recall (sensitivity), and F-Measure.
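Contribution (3) mentions parameter tuning but does not name the procedure. The sketch below shows one possible setup. It assumes scikit-learn's GridSearchCV with 5-fold cross-validation over a small, hypothetical Random Forest grid. Synthetic data from make_classification stands in for the 303-record heart dataset, so the scores it prints say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 303 x 13 heart dataset (NOT the UCI data)
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Hypothetical search space; the paper does not report its grid
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training split only
    scoring="accuracy",  # the paper's headline metric
)
search.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is refit on the whole training split
test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}, test accuracy: {test_accuracy:.3f}")
```

Tuning only on the training split keeps the held-out test set untouched, so the final test accuracy is not biased by the search.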

State of the Art.

In this work, we investigate the prediction accuracy of several ML approaches for coronary heart disease. The ML classification approaches were evaluated on the well-known UCI repository heart disease dataset using the following hardware and software: processor, Intel(R) Core(TM) i5-8256U CPU @ 1.602 GHz (8 CPUs), 1.8 GHz; memory, 8192 MB RAM; software, Python and Jupyter Notebook.

To our knowledge, no previous study has compared Gradient Boosted Tree, Multilayer Perceptron, and Random Forest together with the seven other ML classifiers used here for cardiovascular disease prediction. This comparison yields a heart disease prediction system that is both efficient and accurate. Furthermore, we recommend the best-suited ML classifier for designing and developing high-level intelligent systems to predict coronary heart disease.

3. Proposed Methodology

We used ML classifiers to predict the presence of coronary heart disease from the heart dataset. The dataset was retrieved from the UCI repository [49], and the data were preprocessed before features were selected through feature engineering. We then split the data into two parts: about 70% is used for training and the rest for testing. The training set is used to build a model that predicts heart disease, and the test set is used to assess the classifiers. Before classification, we explored the dataset and converted categorical values to numerical values.
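The roughly 70/30 split can be sketched with scikit-learn's train_test_split. Stratifying on the label and fixing random_state are assumptions, since the paper does not say whether it did either. The random features are placeholders; only the class counts come from Section 4.1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: 303 records x 13 attributes of random numbers
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 13))
# Labels with the class counts reported in Section 4.1 (1 = diseased, 0 = normal)
y = np.array([1] * 165 + [0] * 138)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.30,    # about 30% for testing, 70% for training
    stratify=y,        # assumption: keep the diseased/normal ratio in both parts
    random_state=42,   # assumption: fixed seed for a reproducible split
)
print(len(X_train), len(X_test))  # 212 91
```

With 303 records, scikit-learn rounds the 30% test share up to 91 records, leaving 212 for training.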

In Step 1, we labeled the dataset with the “normal” and “diseased” labels. The “normal” label indicates that a person is free from heart disease, and the “diseased” label indicates that the patient has a heart problem. Then, in the training phase, Step 2 performs data cleansing: because the dataset contains missing and incomplete values, we preprocessed it and filled in the missing values with the mean. In Step 3, we visualized the data using Exploratory Data Analysis (EDA) (discussed in Section 4) to check the correlation between attributes, and we noticed that FBS has a very weak correlation. In Step 4, we applied the ML classifiers to the preprocessed dataset and evaluated their performance on different parameters. As discussed above, the dataset is split into training and test sets, which are used to train the models and evaluate the classifiers, respectively. The applied classifiers achieve different accuracies in predicting the presence of heart disease. The phases of the proposed methodology are depicted in Figure 1.


System Working Methodology.

The coronary artery contour shows the conditions of the coronary artery, such as the clear coronary artery (artery before the heart problem), the artery with atherosclerotic plaque, and the blocked artery that reduces the flow of the blood.
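Steps 1 and 2 can be sketched with pandas. The sketch assumes the raw UCI convention, in which '?' marks a missing value and the target num ranges from 0 (no disease) to 1–4 (disease). The toy records are hypothetical. Mapping num > 0 to “diseased” is an assumption about how the two labels were derived.

```python
import pandas as pd

# Hypothetical toy records; "?" marks a missing value as in the raw UCI files
raw = pd.DataFrame({
    "age":  [63, 37, 41, 56],
    "chol": [233, "?", 204, 236],
    "ca":   [0, 0, "?", 0],
    "num":  [0, 2, 0, 1],   # raw UCI target: 0 = no disease, 1-4 = disease
})

# Convert every column to numbers; "?" cannot be parsed and becomes NaN
df = raw.apply(pd.to_numeric, errors="coerce")

# Step 1: binary labels, 0 = "normal", 1 = "diseased" (assumed mapping)
df["target"] = (df["num"] > 0).astype(int)

# Step 2: fill each missing feature value with that column's mean
features = df.columns.drop(["num", "target"])
df[features] = df[features].fillna(df[features].mean())
print(df)
```

In practice, computing the column means on the training split only avoids leaking test information into the imputed values.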

3.1. Dataset

The heart disease dataset was taken from the UCI repository [49]. This multivariate dataset comprises 303 instances and 14 attributes with integer, categorical, and real values. The dataset’s description is provided in Table 2.

Dataset Attributes Description.

3.2. Correlation Matrix

Correlation is a statistic that describes the strength and direction of a linear relationship between two quantitative variables. The correlations between the columns are listed in Table 3. Most columns have a moderate correlation with the “num” variable, but ‘FBS’ has a very weak correlation.

Correlation Matrix Value.

Figure 2 shows the correlation matrix as a heatmap. The heatmap shows how strongly each feature is correlated with the target variable and with the other features, making it easy to see which features are most strongly associated with one another.


Correlation Matrix with a Heatmap.
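The weak-correlation check behind the FBS observation can be sketched with pandas' DataFrame.corr, which computes Pearson correlation by default. The data are synthetic: thalach is built to depend on the target, and fbs is independent of it. The 0.1 cut-off for “very weak” is a hypothetical choice, not a threshold given in the paper.

```python
import numpy as np
import pandas as pd

# Synthetic data: "thalach" depends on the target, "fbs" does not
rng = np.random.default_rng(1)
n = 2000  # hypothetical sample size, large enough for stable estimates
target = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "thalach": 140 + 20 * target + rng.normal(0, 10, size=n),
    "fbs": rng.integers(0, 2, size=n),
    "target": target,
})

corr = df.corr()                             # Pearson correlation matrix
with_target = corr["target"].drop("target")  # each feature vs. the target

WEAK = 0.1  # hypothetical cut-off for a "very weak" correlation
weak_features = with_target[with_target.abs() < WEAK].index.tolist()
print(with_target.round(2))
print("weak:", weak_features)
```

Passing corr to a heatmap function, for example seaborn.heatmap, produces a plot of the kind shown in Figure 2.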

4. Results and Analysis

In this section, we plot each feature of the heart disease dataset against num (the predicted attribute) for data visualization. Exploratory Data Analysis (EDA) is a technique for analyzing datasets to summarize their main characteristics, often using statistical graphics and other data visualization methods.
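The per-attribute comparisons in Sections 4.2–4.10 amount to computing the share of diseased patients within each value of a categorical attribute. The minimal pandas sketch below uses hypothetical toy records and the sex coding from Section 4.2 (0 = female, 1 = male).

```python
import pandas as pd

# Hypothetical toy records (target: 1 = diseased, 0 = normal)
df = pd.DataFrame({
    "sex":    [0, 0, 0, 1, 1, 1, 1, 1],
    "target": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Share of diseased patients within each category of "sex";
# the same groupby applies to cp, fbs, restecg, exang, slope, ca, and thal
rate = df.groupby("sex")["target"].mean()
print(rate)
```

With matplotlib installed, rate.plot.bar() draws this Series as the kind of per-category bar chart used in this section.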

4.1. Disease Status

Of the 303 instances, 165 patients have heart disease and 138 are normal; we represent ‘diseased’ with 1 and ‘normal’ with 0. Hence, 54.46% of the patients have heart problems and 45.54% do not, as shown in Figure 3. We also analyzed the other dataset attributes: Age, Chest Pain, Sex, Exercise-Induced Angina, Fasting Blood Sugar, Resting ECG, Slope, Coronary Artery, and Thalassemia.


Heart Disease Status.
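The percentages in Section 4.1 follow directly from the reported counts. A quick arithmetic check:

```python
# Class counts reported in Section 4.1 (1 = diseased, 0 = normal)
diseased, normal = 165, 138
total = diseased + normal  # 303 instances

print(f"diseased: {100 * diseased / total:.2f}%")  # diseased: 54.46%
print(f"normal:   {100 * normal / total:.2f}%")    # normal:   45.54%
```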

4.2. Analyzing Sex

The sex attribute has two values: 0 for female and 1 for male, as shown in Figure 4. According to the findings, females in this dataset are more likely to have heart problems than males.


Sex heart disease chances.

4.3. Analyzing Age

Figure 5 shows the age statistics of the dataset and suggests that the chance of heart disease does not depend on age. The x-axis shows age, and the y-axis shows the target percentage.


Heart Disease Dataset Age Statistics.

4.4. Analyzing Chest Pain

Patients with heart disease may experience chest pain. As shown in Figure 6, we examined chest pain in the following categories: typical angina = 0, atypical angina = 1, non-anginal pain = 2, and asymptomatic = 3. People with chest pain type ‘0’ (typical angina) are considerably less likely to have heart problems, whereas patients with atypical angina have a higher chance of heart disease.


Chest pain vs. heart disease chances.

4.5. Analyzing Fasting Blood Sugar

Fasting blood sugar (FBS) appears to play little role in heart disease occurrence. In the dataset, a fasting blood sugar level above 120 mg/dL is represented by the value 1 (True), and any other level by the value 0 (False), as shown in Figure 7. The outcome shows nothing notable here for predicting the presence of heart disease.


Fasting blood sugar vs. disease chances.
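The FBS encoding described in Section 4.5 is a single threshold. The raw UCI files already store fbs as 0/1. The helper below, encode_fbs, is hypothetical and only shows how raw glucose readings in mg/dL would map to that encoding.

```python
import numpy as np

def encode_fbs(glucose_mg_dl):
    """Hypothetical helper: 1 (True) if fasting blood sugar > 120 mg/dL, else 0 (False)."""
    return (np.asarray(glucose_mg_dl) > 120).astype(int)

print(encode_fbs([95, 120, 121, 180]))  # [0 0 1 1]
```

A reading of exactly 120 mg/dL maps to 0, because the rule is “exceeds 120”.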

4.6. Analyzing Resting Electrocardiographic Results

Resting electrocardiographic (ECG) results take the values 0, 1, and 2. Individuals with resting ECG values of ‘1’ and ‘0’ have a higher chance of heart disease than those with the value ‘2’, as presented in Figure 8.


Resting ECG vs. heart disease chances.

4.7. Analyzing Exercise-Induced Angina

Figure 9 shows that, in this dataset, people with exercise-induced angina are considerably less likely to have heart problems. An exercise-induced angina value of 1 means ‘yes’ (the patient experiences angina during exercise), and a value of 0 means ‘no’.


Exercise-Induced Angina vs. disease chances.

4.8. Analyzing Slope

The slope attribute (the slope of the peak exercise ST segment) takes three values: upsloping, flat, and downsloping. Visualizing the data shows that patients with slope ‘2’ are significantly more likely to have heart disease than those with slope ‘1’ or ‘0’, as shown in Figure 10.


Slope vs. heart disease chances.

4.9. Analyzing Coronary Artery

The coronary artery attribute of the heart disease dataset gives the number of major vessels colored by fluoroscopy, with values 0–4. When the value is 4, a strikingly large proportion of patients have heart problems, as shown in Figure 11.


Coronary Artery vs. disease chances.

4.10. Analyzing Thalassemia

Thalassemia affects the heart. In this dataset, thalassemia is recorded as normal, fixed defect, or reversible defect, and the values given in the dataset are 0, 1, 2, and 3, as shown in Figure 12. Patients with a thalassemia value of 2 have a higher chance of heart disease.


Thalassemia vs. heart disease chances.

5. Results and Discussion

In this section, we discuss the outcomes of the ML classifiers in terms of evaluation metrics such as precision, recall, and F-measure, and we evaluate the accuracy of the classifiers on the heart disease dataset. kNN did not perform well, whereas RF, GBT, and MLP performed better than the other classifiers.

5.1. Evaluating Parameters

Accuracy, recall, precision, and F-measure are the main evaluation parameters used in this research to evaluate the performance of the ML classifiers, as presented in Table 4. The precision and recall (sensitivity) of the target class are computed to inspect the predictive accuracy of each algorithm. These metrics are calculated from the counts of TP (True Positive), TN (True Negative), FN (False Negative), and FP (False Positive) predictions made by each model:
- TP is a diseased patient correctly predicted as diseased.
- FN is a diseased patient predicted not to have heart disease.
- FP is a patient without heart disease predicted as diseased.
- TN is a patient without heart disease correctly predicted as such.

Accuracy of ML Classifiers.

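Accuracy: the number of correctly identified instances divided by the total number of instances in the dataset, as in Equation (1):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
Precision: the fraction of instances predicted as positive (diseased) that are truly positive, as in Equation (2):
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
Recall (sensitivity): the fraction of truly positive instances that are correctly predicted, as in Equation (3):
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
F-Measure: once precision and recall have been calculated, the two scores are combined into the F-Measure. The conventional F-Measure is their harmonic mean, as in Equation (4):
$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$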
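Equations (1)–(4) can be computed directly from confusion-matrix counts. The sketch below also computes specificity, TN / (TN + FP), to show that it is a different quantity from precision. The function name and the counts, which describe a hypothetical 91-patient test set, are illustrative assumptions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (1)-(4) plus specificity, from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Equation (1)
    precision = tp / (tp + fp)                                 # Equation (2)
    recall = tp / (tp + fn)                                    # Equation (3), sensitivity
    f_measure = 2 * precision * recall / (precision + recall)  # Equation (4)
    specificity = tn / (tn + fp)                               # true negative rate
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "specificity": specificity}


# Hypothetical counts for a 91-patient test set
m = classification_metrics(tp=48, tn=30, fp=8, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Here precision is 48/56 while specificity is 30/38. Precision is computed over the predicted positives, whereas specificity is computed over the actual negatives.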

5.2. Performance of ML Classifiers

The accuracy of the ML approaches is listed in Table 4. Comparing these classifiers, we observed that Random Forest, Gradient Boosted Tree, and Multilayer Perceptron outperformed the other ML classifiers, with accuracies of 96.28%, 95.83%, and 95%, respectively, as shown in Figure 13.


ML Classifiers Accuracy.
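The sketch below shows how the three best-performing classifiers could be trained and compared with scikit-learn. The hyperparameters, the standardization step for the MLP, and the synthetic make_classification data are all assumptions. The accuracies it prints are unrelated to the 96.28%, 95.83%, and 95% reported in Section 5.2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the UCI heart dataset)
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBT": GradientBoostingClassifier(random_state=0),
    # MLPs are sensitive to feature scale, so standardize inside a pipeline
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.4f}")
```

Placing the scaler inside the pipeline fits it on the training data only, so no test statistics leak into the MLP's inputs.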

The Receiver Operating Characteristic (ROC) curves of the three best-performing classifiers, RF, GBT, and MLP, are shown in Figure 14, Figure 15, and Figure 16, respectively.


Random Forest Classifiers ROC.


Gradient Boosting Tree Classifiers ROC.


Multilayer perceptron Classifiers ROC.
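An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The minimal sketch below builds one for a Random Forest, assuming scikit-learn's roc_curve and auc on synthetic data (not the UCI dataset).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the UCI heart dataset)
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Score for the positive class (1 = "diseased"); roc_curve sweeps thresholds over it
scores = rf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)  # area under the ROC curve
print(f"Random Forest AUC: {roc_auc:.3f}")
```

Passing fpr and tpr to a plotting library such as matplotlib draws the curve. The same code applies to GBT or MLP by swapping the estimator.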

6. Conclusions and Future Work

In this paper, ML classifiers are used to predict the presence of heart problems. The dataset was obtained from the UCI repository, cleansed, and preprocessed, after which ML models were applied for prediction. We assessed the potential of eleven ML approaches for predicting cardiac disease; the algorithms were selected because they are state-of-the-art, representative, and mature. Unlike existing work, we applied the Gradient Boosted Tree (GBT) and Multilayer Perceptron (MLP), which, to our knowledge, other researchers have not used on the UCI heart disease dataset. With them, we achieved higher accuracy than previous studies, as described in the State of the Art table. Among the applied ML classifiers, the Gradient Boosted Tree and Multilayer Perceptron achieve accuracies of 95.83% and 95%, respectively, in predicting the presence of coronary heart disease. The highest classification accuracy, 96.28%, was achieved by Random Forest (RF), with a specificity and sensitivity of 0.9628 and 0.9537, respectively.

In future work, we will use additional datasets to obtain more reliable conclusions. We will also optimize the parameters of the ML classifiers and deep learning methods using metaheuristic and nature-inspired algorithms. The aims are to evaluate the presence of heart disease more effectively across different heart disease datasets and to improve the accuracy of existing algorithms.

Acknowledgments

The authors would like to acknowledge Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R51), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.


Funding statement.

This work is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R51), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Author Contributions

Conceptualization, C.A.u.H., J.I., R.I., S.H., S.S.H.B. and S.S.U.; data curation, C.A.u.H., J.I., R.I., S.H., A.D.A., S.S.H.B., N.A. and S.S.U.; methodology, C.A.u.H., J.I., R.I., S.H., A.D.A., S.S.H.B., N.A. and S.S.U.; resources, C.A.u.H., J.I., R.I., S.H., S.S.H.B. and S.S.U.; funding acquisition, A.D.A. and N.A.; software, C.A.u.H., J.I., R.I., S.H., A.D.A., S.S.H.B., N.A. and S.S.U.; investigation, C.A.u.H., S.H. and A.D.A.; writing—original draft, C.A.u.H., J.I., R.I., S.H., N.A. and S.S.U.; writing—review and editing, C.A.u.H., J.I., R.I., S.H., A.D.A., S.S.H.B., N.A. and S.S.U. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data availability statement, conflicts of interest.

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

ICICC 2023: International Conference on Innovative Computing and Communications pp 419–430

Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing

Jyoti Maurya 13 &
Shiva Prakash 13
Conference paper
First Online: 26 October 2023


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 731)

Heart disease is currently regarded as the main cause of illness. Regardless of age group, heart disease is a serious condition today because most individuals are not aware of the type and severity of their heart disease. In this fast-paced world, it is essential to be aware of the different types of cardiac problems and of routine disease monitoring. According to World Health Organization statistics, 17.5 million deaths are caused by cardiovascular disease. Manual feature engineering, however, is difficult and generally requires expertise in choosing a suitable technique. To address these issues, IoT, machine learning models, and cloud techniques are playing a significant role in automatic disease prediction in the medical field. SVM, Naive Bayes, Decision Tree, K-Nearest Neighbor, and Artificial Neural Network are some of the machine learning techniques used to predict heart disease. In this paper, we describe various research works and related heart disease datasets, and we compare and discuss different machine learning models for heart disease prediction. We also describe the research challenges, future scope, and conclusions. The main goal of the paper is to review the latest and most relevant papers to identify the benefits, drawbacks, and research gaps in this field.

Heart disease
Cloud computing
Machine learning



Author information

Authors and affiliations.

Department of Information Technology and Computer Application, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, 273010, India

Jyoti Maurya & Shiva Prakash


Corresponding author

Correspondence to Jyoti Maurya.

Editor information

Editors and affiliations.

IT Department, Cairo University, Giza, Egypt

Aboul Ella Hassanien

Tijuana Institute of Technology, Tijuana, Mexico

Oscar Castillo

Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi, India

Sameer Anand

Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, India

Ajay Jaiswal


About this paper

Cite this paper.

Maurya, J., Prakash, S. (2024). Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. ICICC 2023. Lecture Notes in Networks and Systems, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-99-4071-4_33


DOI : https://doi.org/10.1007/978-981-99-4071-4_33

Published : 26 October 2023

Publisher Name : Springer, Singapore

Print ISBN : 978-981-99-4070-7

Online ISBN : 978-981-99-4071-4

eBook Packages : Engineering Engineering (R0)


Research article
Open access
Published: 21 June 2021

A novel approach for heart disease prediction using strength scores with significant predictors

Armin Yazdani 1 ,
Kasturi Dewi Varathan ORCID: orcid.org/0000-0003-3421-4501 2 ,
Yin Kia Chiam 1 ,
Asad Waqar Malik 3 &
Wan Azman Wan Ahmad 4

BMC Medical Informatics and Decision Making volume 21, Article number: 194 (2021)


Cardiovascular disease is the leading cause of death in many countries. Physicians often diagnose cardiovascular disease based on current clinical tests and previous experience of diagnosing patients with similar symptoms. Patients who suffer from heart disease require quick diagnosis, early treatment, and constant observation. To address these needs, many data mining approaches have been used to diagnose and predict heart disease. Previous research also focused on identifying the significant features that contribute to heart disease prediction; however, less importance was given to identifying the strength of these features.

Motivated by this gap in the literature, this paper proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. The study aims to predict heart disease based on the scores of significant features using Weighted Associative Rule Mining.

A set of important feature scores and rules was identified for diagnosing heart disease, and cardiologists were consulted to confirm the validity of these rules. Experiments on the UCI open dataset, which is widely used in heart disease research, yielded the highest confidence score of 98% in predicting heart disease.

This study makes a significant contribution by computing the strength scores of significant predictors in heart disease prediction. The evaluation yielded important rules and the highest confidence score when the computed strength scores of significant predictors were used in Weighted Associative Rule Mining to predict heart disease.


Introduction

Cardiovascular disease (CVD) is one of the most life-threatening diseases in the world. The World Health Organization (WHO) and the Global Burden of Disease (GBD) study report cardiovascular disease as the main cause of death around the globe each year [40, 56]. WHO estimates that almost 23.6 million people will die from CVD by the year 2030. In some industrialized countries, such as the United States of America, CVD accounts for about 1 in 4 deaths [34]. The Middle East and North Africa (MENA) region has an even higher share, with CVD accounting for 39.2% of deaths [20]. Hence, early and accurate diagnosis and the provision of appropriate treatment are key to reducing deaths from cardiovascular disease, and such services are essential for people at high risk of developing heart disease [29].

Many features contribute to heart disease prediction. Past researchers focused more on identifying significant features for their heart disease prediction models [8]. Less importance was given to determining the relationships between these features and their level of priority [32] within the prediction model. Many data mining studies have been conducted to address the issues that hinder early and accurate diagnosis [9, 16, 28].

Weighted Association Rule Mining (WARM) is a data mining technique used to discover relationships between features and to determine mining rules that lead to certain predictions [22]. The weights used in this technique give users a convenient way to indicate the importance of the features that contribute to heart disease, and they help obtain more accurate rules [4]. In many prediction models, different features have different importance; hence, different weights are assigned to different features based on their predictive capability [48]. Failing to determine the weights means failing to capture the importance of the features.
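The sketch below makes the role of weights concrete by scaling the ordinary support of an itemset by the mean weight of its items. This is one simple weighted-support formulation, assumed here for illustration. It is not the strength-score algorithm proposed in this paper, and the binarized records and the weights are hypothetical.

```python
# Hypothetical binarized patient records: each set lists the conditions a record satisfies
transactions = [
    {"cp=asymptomatic", "exang=yes", "disease"},
    {"cp=asymptomatic", "disease"},
    {"exang=yes"},
    {"cp=typical"},
    {"cp=asymptomatic", "exang=yes", "disease"},
]

# Hypothetical item weights standing in for each feature's predictive strength
weights = {"cp=asymptomatic": 0.9, "exang=yes": 0.7, "cp=typical": 0.2, "disease": 1.0}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Plain support scaled by the mean weight of the items in itemset."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


rule_items = {"cp=asymptomatic", "disease"}
print(support(rule_items, transactions))                              # 0.6
print(round(weighted_support(rule_items, transactions, weights), 2))  # 0.57
```

Two itemsets with the same plain support are ranked differently once their items carry different weights. This is the effect the weighting is meant to capture.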

Past research has used Weighted Associative Rule Mining (WARM) in heart disease prediction [18, 31, 46, 48, 50]. However, the prediction models reported in these studies still need further exploration in terms of the number of features used, the strength of these features and the evaluation of the scores obtained. In this research, we propose an algorithm to compute the weight of each feature that contributes to heart disease prediction. We experimented with all features as well as with the significant features selected for WARM. The results show that the significant features outperformed all features, with the highest confidence score of 98% in predicting heart disease. To the best of our knowledge, this study is the first to use strength scores of significant predictors in WARM.

The rest of the paper is organized as follows: Sect. 2 presents the background of the study followed by Sect. 3 on research objectives. Section 4 presents the methodology and Sect. 5 displays the results obtained by this research. Section 6 includes the discussions and Sect. 7 benchmarks this research against previous studies. Finally, Sect. 8 concludes the research with a summary of the findings and future work.

Related works

CVDs are disorders of the heart and blood vessels and include coronary heart disease, cerebrovascular disease and other conditions. Heart attacks and strokes are the main causes of death from cardiovascular disease, which accounts for nearly one in three deaths [6]. Given this high mortality rate, diagnosis and prevention must be performed effectively and efficiently. Many data mining techniques have been used to help address these issues (Amin et al. [8]). Most past research looked into identifying features that contribute to better heart disease prediction accuracy [9]; however, very little research has examined the relationships that exist between these features. The relationships between the features that contribute to heart disease prediction can be obtained using the Associative Rule Mining (ARM) technique [11]. ARM is popular for transactional and relational datasets. The hidden knowledge in large datasets, such as business transactions, has attracted many business owners who want to understand the patterns that can help them improve their business decisions (Agarwal and Mittal [1]). A typical example is market basket analysis, which discovers the items that customers frequently buy together by looking at the items in customers’ shopping carts and identifying the associations between them. For instance, customers who purchase milk are likely to purchase bread on the same trip to the supermarket. ARM is also widely used in healthcare, for example in privacy preservation of healthcare data [15], predicting cancer-associated protein interactions [12], predicting obstructive sleep apnea [43] and predicting co-diseases in thyroid patients [23].
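The two measures behind the milk-and-bread example, support and confidence, can be illustrated with a short Python sketch. The four baskets are hypothetical, and the helper names `support` and `confidence` are our own rather than part of any library.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]


def support(itemset: FrozenSet[str], baskets: List[Basket]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               baskets: List[Basket]) -> float:
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    base = support(antecedent, baskets)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(antecedent | consequent, baskets) / base


# Hypothetical shopping baskets
baskets = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "butter"}),
]
milk, bread = frozenset({"milk"}), frozenset({"bread"})
print(support(milk | bread, baskets))    # 0.5
print(confidence(milk, bread, baskets))  # 0.6666666666666666
```

Two of the four baskets contain both items (support 0.5), and two of the three baskets with milk also contain bread (confidence 2/3).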

ARM is also used in heart disease prediction. Table 1 lists the studies that used ARM in heart disease prediction. Akbaş et al. [3], Shuriyaa and Rajendranb [42], Srinivas et al. [49], Khare and Gupta [24] and Lakshmi and Reddy [27] used ARM on the UCI dataset. Some of the studies listed in Table 1 used private datasets from hospitals and heart centres. Although the scores obtained on these datasets are high (99% by Sonet et al. [45] and 100% by Thanigaivel and Kumar [52]), these studies cannot be reproduced because the datasets are not openly accessible. Akbaş et al. [3], on the other hand, obtained a confidence score of 97.8% on the UCI dataset; however, that score belongs to a rule predicting people with no risk of heart disease.

Weighted Associative Rule Mining (WARM) is an extension of ARM in which weights are assigned to differentiate the importance of the features mined. Let T be the training dataset, $T = \{r_1, r_2, r_3, \ldots, r_i\}$, with a weight associated with each {attribute, attribute value} pair. Every record $r_i$ is a set of values, with a weight $w_i$ attached to each feature of the record. In a weighted framework, each record is therefore a set of triples $\{a_i, v_i, w_i\}$, where feature $a_i$ has value $v_i$ and weight $w_i$, with $0 < w_i \le 1$.
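One possible way to hold the weighted triples $\{a_i, v_i, w_i\}$ in code is sketched below in Python. The class name `WeightedItem` and the example weights are hypothetical illustrations; only the constraint $0 < w_i \le 1$ comes from the definition above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WeightedItem:
    """One {a_i, v_i, w_i} triple: attribute, value and weight."""
    attribute: str
    value: str
    weight: float

    def __post_init__(self) -> None:
        # Enforce the constraint 0 < w_i <= 1 from the WARM definition.
        if not 0.0 < self.weight <= 1.0:
            raise ValueError(
                f"weight of {self.attribute}={self.value} must be in (0, 1], got {self.weight}")


# A weighted record r_i is a collection of such triples.
WeightedRecord = List[WeightedItem]

# Hypothetical weights, for illustration only.
record: WeightedRecord = [
    WeightedItem("sex", "male", 0.5),
    WeightedItem("cp", "asymptomatic", 0.4),
    WeightedItem("thal", "reversible", 0.3),
]
```

Validating the weight when the triple is created keeps out-of-range weights from silently entering later support and confidence calculations.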

Assigning a correct weight to each feature is a hard task, and different fields calculate feature weights in different ways. For instance, in web mining, Malarvizhi and Sathiyabhama [30] used the time a visitor dwells on a page to calculate its weight. WARM is widely used in research on shopping basket scenarios and in predicting customer behaviour. Cengiz et al. [10] investigated assigning weights before and after ARM. WARM has also been used to predict disease comorbidities from clinical as well as molecular data (Lakshmi and Vadivu [26]) and to predict breast cancer [5]. Recent research by Park and Lim [39] used this technique in a design-failure pre-alarming system for the shipbuilding industry.

However, few researchers have applied WARM to cardiovascular disease. Table 2 lists studies on heart disease prediction using WARM; in these studies, the weights of the features were not precisely calculated (Jabbar et al. [21], Sundar et al. [50], Soni and Vyas [48]). Soni et al. [47] proposed a new framework, an associative classifier that used WARM, in which different weights were assigned to different attributes based on their predictive capability. Their theoretical model yielded a confidence score of 79.5%. Soni and Vyas [48] also applied WARM and achieved a confidence level of 79.5%; they assigned weights based on age range, smoking habits, hypertension and BMI range. Soni et al. [46], on the other hand, assigned weights to each attribute based on advice obtained from medical experts, and presented a heart attack prediction system using a weighted associative classifier that achieved a maximum confidence of 80%. Meanwhile, Sundar et al. [50] developed a system using two data mining techniques, Naïve Bayes and WARM; in their experiments WARM achieved a confidence score of 84%, outperforming Naïve Bayes, which obtained only 78%. Chauhan et al. [11] also used WARM to predict heart disease and obtained an accuracy of 60.4%. Kharya et al. [25] used a Weighted Bayesian Association Rule Mining algorithm to construct a Bayesian belief network from a heart disease dataset; however, they did not report the results obtained. Ibrahim and Sivabalakrishnan [19] used a Random Walker Memetic algorithm-based WARM to predict coronary disease and obtained an accuracy of 95% on the UCI heart disease dataset.

Although several studies have applied WARM to heart disease prediction, none of them focused on identifying the important features that would contribute to better prediction performance. The weight of each feature plays an equally important role in deciding which feature has the highest impact (strength) in predicting heart disease, and assigning the right weights to the identified significant features should yield an effective prediction model. Thus, this research focuses on identifying the weights of significant features and using the generated scores to predict heart disease.

Research objectives

The main objectives of this research are as follows:

To compute the weight of significant features in heart disease prediction.

To predict heart disease using the computed weight of significant features (using WARM).

To evaluate the performance of WARM in predicting heart disease.

Proposed methodology

This section describes in detail the methodology used, as shown in Fig. 1. It consists of five main stages: data pre-processing, feature selection, feature weight computation, application of WARM and model evaluation.

Methodology

This research uses the heart disease data obtained from the UCI Machine Learning Repository [13], one of the largest public collections of datasets, hosting over 417 datasets. The Cleveland dataset from the UCI Machine Learning Repository is one of its heart disease datasets and is still widely used by researchers (Amin et al. [8]). This research also uses this dataset, which contains 303 records. The dataset has 76 features, of which 14 attributes, including the class label, are used. The 14 features, together with their descriptions and data types, are shown in Table 3.

Experimental setup

In this research, Weka 3.8 was used to conduct the experiments. The retrieved Cleveland dataset first went through a pre-processing phase. The significant features were then selected from the 14 attributes of the Cleveland dataset (Amin [7]). Next, the weight of each significant feature was computed and assigned back to the corresponding feature. WARM was applied to the heart disease dataset to generate rules. Finally, evaluation was performed to obtain the confidence scores of the best rules generated by WARM based on the significant features. Each process is explained in detail in the following sections.

Data pre-processing

In the data pre-processing phase, all records with missing values (6 instances) were deleted from the dataset. As shown in Table 3, there are 13 regular attributes (‘age’, ‘sex’, ‘cp’, ‘trestbps’, ‘chol’, ‘fbs’, ‘restecg’, ‘thalach’, ‘exang’, ‘oldpeak’, ‘slope’, ‘ca’, ‘thal’) and 1 class label (‘goal’), which refers to the criticality level of heart disease in a patient. The class label ranges from 0 to 4, where 0 refers to ‘No Heart Disease’ and the other values indicate the presence of heart disease at different criticality levels. Since this research aims to predict the presence of heart disease rather than its criticality level, the values 1 to 4 were mapped to 1, indicating the presence of heart disease, while 0 represents its absence. As part of the data transformation process, numeric data were also converted into nominal data (discretized into ranges). This is required because WARM uses nominal data only. The ranges formed for each feature are given in Table 4.
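The pre-processing steps above (dropping incomplete records, collapsing class values 1 to 4 into 1, and discretizing numeric attributes) might look like the following minimal Python sketch. It assumes the comma-separated layout of the UCI `processed.cleveland.data` file, with `?` marking missing values; the sample lines are illustrative, and the two-bin `oldpeak` cut (`zero` / `greaterThanZero`, the latter being the label that appears in the rules reported later) is a hypothetical stand-in for the actual ranges in Table 4.

```python
from typing import Dict, List, Optional

# Column order of the 14 attributes used from the Cleveland file; the last is the class label.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "goal"]

Row = Dict[str, str]


def parse_row(line: str) -> Optional[Row]:
    """Split one comma-separated record; return None if it is malformed or has a '?'."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(COLUMNS) or "?" in values:
        return None
    return dict(zip(COLUMNS, values))


def binarize_goal(row: Row) -> Row:
    """Map goal 0 -> '0' (no heart disease) and goal 1-4 -> '1' (heart disease)."""
    out = dict(row)
    out["goal"] = "1" if int(float(row["goal"])) > 0 else "0"
    return out


def discretize_oldpeak(row: Row) -> Row:
    """Hypothetical two-bin discretization; Table 4 holds the paper's actual ranges."""
    out = dict(row)
    out["oldpeak"] = "greaterThanZero" if float(row["oldpeak"]) > 0 else "zero"
    return out


def preprocess(lines: List[str]) -> List[Row]:
    rows = []
    for line in lines:
        row = parse_row(line)
        if row is None:  # drop records with missing values
            continue
        rows.append(discretize_oldpeak(binarize_goal(row)))
    return rows


# Illustrative lines in the processed Cleveland format (the third has a missing 'thal').
sample = [
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0",
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,0.0,2.0,3.0,3.0,2",
    "53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0",
]
clean = preprocess(sample)
print(len(clean), [r["goal"] for r in clean], [r["oldpeak"] for r in clean])
```

The remaining numeric attributes would be discretized the same way, each with its own ranges.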

Feature selection

Features were selected based on the experiments conducted by Amin et al. [8], since they used the same (UCI) dataset. They performed a set of experiments covering 8100 combinations of features with 7 different classification models (K-NN, Decision Tree, Naïve Bayes, Logistic Regression, Support Vector Machine, Neural Network and Vote) to identify significant features. Table 5 shows the features used by the highest-performing configuration of each classification model. The highlighted columns indicate the features that appeared more than 10 times, which were selected as significant features. The 8 selected features are Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal.
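The selection rule above (keep a feature if it appears more than 10 times in Table 5) reduces to a simple threshold filter, sketched here in Python. The eight counts are the ones used in the feature weight computation that follows, assumed to follow the same order as the feature names listed with them; the counts for `age` and `chol` are hypothetical placeholders, not values from Table 5.

```python
from typing import Dict, List


def select_significant(counts: Dict[str, int], threshold: int = 10) -> List[str]:
    """Keep the features that appear more than `threshold` times in Table 5."""
    return [feature for feature, count in counts.items() if count > threshold]


counts = {
    # Counts for the eight selected features (name-to-count mapping assumed)
    "sex": 20, "cp": 18, "fbs": 12, "exang": 12,
    "oldpeak": 14, "slope": 12, "ca": 19, "thal": 14,
    # Hypothetical placeholders for two non-selected features, NOT from Table 5
    "age": 8, "chol": 5,
}
print(select_significant(counts))
```

The strict `>` comparison mirrors the "more than 10 times" wording, so a feature appearing exactly 10 times would be excluded.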

Feature weight computation

This section explains how the feature weights were calculated. The premise of WARM is that different features in a dataset have different importance in predicting heart disease. The weight of each feature ranges from 0 to 1: a weight closer to 1 indicates a more significant feature, whereas a weight closer to 0 indicates a less significant feature in heart disease prediction.

Calculate feature weight

The first step was to calculate the individual feature weights. Let R be the set of features, $R = \{n_0, n_1, n_2, \ldots, n_i\}$. In this experiment, the total number of features is 13, which feature selection reduces to 8 (Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal). W(n) denotes the weight of feature n; following the worked example below, it is the number of times n appears in Table 5 divided by the sum of these counts over all significant features:

$W(n) = \frac{\mathrm{count}(n)}{\sum_{k} \mathrm{count}(n_k)}$

For example, the value of Sex in Table 5 is 20, and the sum over all significant features (Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal) is 20 + 18 + 12 + 12 + 14 + 12 + 19 + 14 = 121. Thus, the weight of ‘sex’ (weight of feature, WOF) is:

$W(\text{sex}) = \frac{20}{121} \approx 0.165$
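The feature weight computation can be reproduced in a few lines of Python. This sketch assumes W(n) is a feature's Table 5 count divided by the total of 121, as the worked example for Sex implies, and that the eight summands are listed in the same order as the feature names beside them. The function name `feature_weights` is our own.

```python
from typing import Dict


def feature_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """W(n) = count(n) / sum of counts over all significant features."""
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


# Counts of the eight significant features; assumed to follow the order of the
# names listed with the sum 20 + 18 + 12 + 12 + 14 + 12 + 19 + 14 = 121.
counts = {"sex": 20, "cp": 18, "fbs": 12, "exang": 12,
          "oldpeak": 14, "slope": 12, "ca": 19, "thal": 14}

weights = feature_weights(counts)
for feature, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature:8s} {w:.3f}")
```

Because every weight is a share of the same total, the weights lie in (0, 1] and sum to 1, consistent with the range stated for WARM weights.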

Table 6 displays the calculated weight of each significant feature; all weights were computed in the same way. From the distribution of the weights, CA has the greatest strength, followed by Sex, CP, Oldpeak and Thal, while Fbs, Exang and Slope share a similar weight of 0.09 each.

Calculate feature value weight

This section explains how feature value weights are computed. Feature values are all the values that a feature can take; for instance, the feature values for sex are male and female. For a selected value, let A be the set of records that hold that value and B the set of records that hold any other value of the same feature, so that $|A \cup B|$ is the total number of records.

Table 7 shows the number of records for each feature value in the UCI dataset. The male value is represented by 203 records and the female value by 94 records, giving a total of 297 records. The weight of each feature value is then:

$W(\text{value}) = \frac{|A|}{|A \cup B|}$

Figure 2 shows the comparison of the percentage of males and females in the Cleveland heart disease dataset.

Comparison on the percentage of male and female in Cleveland heart disease dataset
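The feature value weight is the number of records holding the selected value divided by the total number of records. A minimal Python sketch using the Sex counts from Table 7 (203 male and 94 female records) follows; the function name `value_weights` is hypothetical.

```python
from typing import Dict


def value_weights(value_counts: Dict[str, int]) -> Dict[str, float]:
    """W(value) = records with this value / all records of the feature (|A| / |A ∪ B|)."""
    total = sum(value_counts.values())
    return {value: count / total for value, count in value_counts.items()}


sex_counts = {"male": 203, "female": 94}  # Table 7: 297 records in total
w_sex = value_weights(sex_counts)
print({value: round(w, 3) for value, w in w_sex.items()})
```

The same function applies unchanged to features with more than two values, such as CP or Thal.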

Calculate total weight for feature

This section explains how the total weight of a feature is computed. The feature weight (W(n)) and the feature value weight (W(value)) together give the total weight (W(t)) of the feature. The computation is shown below.

Example of calculating the total weight of feature W (t):

This section details the algorithm used to obtain the weighted score of each feature in predicting heart disease. The algorithm is as follows:

Not all features in the heart disease dataset are equally significant in predicting the risk of heart disease. Thus, different weights are assigned to the features based on their predictive capability. These values are then imported into Weka 3.8 to experiment with WARM using the Apriori algorithm.

Apriori algorithm

Apriori is a well-known algorithm for association rule mining and is used here to perform WARM. It was first proposed by Agrawal and Srikant [2]. The algorithm takes a dataset of transactions and constructs frequent itemsets, that is, itemsets whose support is at least a user-specified threshold. Apriori relies on the property that an itemset X of length k can be frequent only if every subset of X of length $k - 1$ is also frequent. This property substantially reduces the search space and allows rules to be discovered in computationally feasible time. For each frequent itemset f and each non-empty proper subset s of f, Apriori generates a rule of the form $s \Rightarrow (f - s)$ if and only if the confidence of the rule is above the user-defined threshold. Confidence is essentially the accuracy of the rule and is used in Apriori to rank the rules (Agrawal and Srikant [2]; Mutter et al. [35]).
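The level-wise search and rule generation described above can be sketched in Python as follows. This is a plain, unweighted Apriori written for illustration, not Weka's implementation; the four records and the `attribute=value` item encoding are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search returning every frequent itemset with its support."""
    n = len(transactions)

    def sup(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singles if sup(s) >= min_support}
    frequent = {s: sup(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if sup(c) >= min_support}
        frequent.update({c: sup(c) for c in level})
        k += 1
    return frequent


def generate_rules(frequent: Dict[Itemset, float], min_conf: float) -> List[Rule]:
    """Emit s => (f - s) for each frequent f and non-empty proper subset s."""
    rules = []
    for f, f_sup in frequent.items():
        for r in range(1, len(f)):
            for subset in combinations(f, r):
                s = frozenset(subset)
                conf = f_sup / frequent[s]  # s is frequent, so its support is known
                if conf >= min_conf:
                    rules.append((s, f - s, conf))
    return rules


# Hypothetical records encoded as attribute=value items.
records = [
    frozenset({"cp=asymptomatic", "thal=reversible", "class=HD"}),
    frozenset({"cp=asymptomatic", "thal=reversible", "class=HD"}),
    frozenset({"cp=asymptomatic", "thal=normal", "class=noHD"}),
    frozenset({"cp=nonanginal", "thal=normal", "class=noHD"}),
]
freq = apriori(records, min_support=0.5)
for s, c, conf in generate_rules(freq, min_conf=0.9):
    if c == frozenset({"class=HD"}):
        print(sorted(s), "=>", sorted(c), f"conf={conf:.2f}")
```

Filtering on the consequent, as in the last loop, keeps only rules that conclude the class label, which corresponds to the sick and healthy rules discussed in the results.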

Weighted confidence

Confidence shows how often a rule holds. Let Y be the ‘goal’; then the weighted confidence of a rule X → Y is the ratio of the weighted support of $X \cup Y$ to the weighted support of X:

$\mathrm{wconf}(X \rightarrow Y) = \frac{\mathrm{wsup}(X \cup Y)}{\mathrm{wsup}(X)}$

For instance, the rule {sex = Male, CA = 3} → {heart disease} has a confidence of 0.2/0.2 = 1.0. This means that, in this dataset, every patient who is male and has 3 major vessels coloured by fluoroscopy (CA = 3) is labelled as having heart disease; the rule holds in 100% of the matching records.
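The weighted confidence formula needs a definition of weighted support, which the text does not spell out. The Python sketch below therefore assumes one common definition from the WARM literature: each record's weight is the mean of its item weights, and the weighted support of an itemset is the total weight of the records containing it divided by the total weight of all records. The records and item weights are hypothetical, not the values from Tables 6 and 7.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]          # (attribute, value)
Record = Dict[Item, float]      # item -> weight in (0, 1]


def record_weight(record: Record) -> float:
    """Assumed record weight: the mean of the record's item weights."""
    return sum(record.values()) / len(record)


def weighted_support(itemset: FrozenSet[Item], records: List[Record]) -> float:
    """Weight of records containing `itemset`, as a share of the total record weight."""
    total = sum(record_weight(r) for r in records)
    covered = sum(record_weight(r) for r in records if itemset.issubset(r.keys()))
    return covered / total


def weighted_confidence(x: FrozenSet[Item], y: FrozenSet[Item],
                        records: List[Record]) -> float:
    """wconf(X -> Y) = wsup(X ∪ Y) / wsup(X)."""
    return weighted_support(x | y, records) / weighted_support(x, records)


male, female = ("sex", "male"), ("sex", "female")
ca3, ca0 = ("ca", "3"), ("ca", "0")
hd, no_hd = ("class", "HD"), ("class", "noHD")

# Hypothetical item weights, not the values from Tables 6 and 7.
records: List[Record] = [
    {male: 0.6, ca3: 0.5, hd: 1.0},
    {male: 0.6, ca3: 0.5, hd: 1.0},
    {female: 0.3, ca0: 0.2, no_hd: 1.0},
    {male: 0.6, ca0: 0.2, no_hd: 1.0},
]
print(weighted_confidence(frozenset({male, ca3}), frozenset({hd}), records))  # 1.0
print(weighted_confidence(frozenset({male}), frozenset({hd}), records))
```

With these toy weights, {sex = Male, CA = 3} → {heart disease} reaches a weighted confidence of 1.0, while {sex = Male} → {heart disease} gets about 0.7 instead of the unweighted 2/3, because the records supporting the rule carry more weight.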

This phase generates rules using the Apriori algorithm in Weighted Associative Rule Mining. Two sets of rules and confidence scores were generated, one for each of the following:

All features: this includes all 13 features.

Selected significant features (8 features).

The following section explains in detail the results obtained, namely the rules and confidence scores.

Results (rules and confidence level generated)

The rules and confidence levels generated for all 13 features and for the 8 selected significant features are shown in this section.

All features

Table 8 shows the top 20 rules and confidence scores obtained for all the features using WARM. The rules were sorted by the highest confidence scores.

The highest confidence level achieved for predicting the risk of heart disease is 96%, and the rule achieving it uses 3 features (CP, Slope and Thal), as shown in Table 8 (Rule Number 7). The rule states that if Chest Pain (CP) is asymptomatic, the slope is flat and Thallium (Thal) is reversible, then the patient has a very high tendency (confidence level = 96%) of having heart disease. The highlighted rows in Table 8 show the rules that predict the risk of heart disease. Table 9 summarizes the frequency of each feature used in these heart disease predicting rules from Table 8, listing the rule number and the features used in each rule. Of the top 20 rules, only 6 predict heart disease; the others are healthy (non-sick) rules, which predict no heart disease.

Although all 13 features were used to generate the rules and confidence scores shown in Table 8, only 9 features appear in the rules that predict heart disease among the top 20. The most significant feature in predicting heart disease is CP, which appears in all 6 rules that predict heart disease. Thal and Oldpeak each appear in 4 of these 6 rules.

Selected significant features

This section focuses on the rules and confidence scores obtained with the selected significant features. Table 10 shows the top 20 rules generated from the significant features using WARM. With the 8 selected significant features, the confidence score for predicting the risk of heart disease reaches a comparatively high 98%. The rule with the top confidence score is:

CP = asymptomatic, Exang = Yes, Oldpeak = greaterThanZero, Thal = reversible ==> class_HD = Heart Disease

This means that if Chest Pain (CP) is asymptomatic, exercise-induced angina (Exang) is present, Oldpeak (ST depression induced by exercise relative to rest) is greater than zero and the Thallium heart scan (Thal) is reversible, then the patient is diagnosed as having heart disease. Of the top 20 rules generated, 11 predict heart disease, as highlighted in Table 10. Table 11 summarizes how often each feature appears in these 11 rules. Chest pain (CP), the most significant feature, appears in all of the positive rules that predict heart disease. The Thallium heart scan (Thal) appears in 9 of the 11 rules, and Oldpeak appears in 7.
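The per-feature tallies summarized in Tables 9 and 11 amount to counting feature names across rule antecedents. The Python sketch below parses rules written in the `feature = value, ... ==> class_HD = ...` notation used in this paper (not Weka's raw output format) and demonstrates the count on the two top-confidence sick rules quoted above; `parse_antecedent` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def parse_antecedent(rule: str) -> List[str]:
    """Return the feature names on the left-hand side of 'F = v, ... ==> class_HD = ...'."""
    lhs = rule.split("==>")[0]
    return [cond.split("=")[0].strip() for cond in lhs.split(",") if cond.strip()]


sick_rules = [
    "CP = asymptomatic, Slope = flat, Thal = reversible ==> class_HD = Heart Disease",
    "CP = asymptomatic, Exang = Yes, Oldpeak = greaterThanZero, "
    "Thal = reversible ==> class_HD = Heart Disease",
]
frequency = Counter(feature for rule in sick_rules for feature in parse_antecedent(rule))
print(frequency.most_common())
```

Applied to all 6 or 11 highlighted rules instead of these two, the same count would yield the frequencies reported in Tables 9 and 11.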

Discussions

Applying WARM to the selected significant features achieved the highest confidence score in predicting heart disease, 98%, compared with 96% for all features. This suggests that WARM predicts the risk of heart disease well. Of the top 20 rules generated from all features, only 6 predict heart disease, whereas 11 of the top 20 rules generated from the 8 selected features do.

Studying the top 20 rules generated revealed some significant information. These findings were validated by a cardiologist:

Asymptomatic chest pain, positive exercise-induced angina, Oldpeak > 0 and reversible thallium heart scan implies the presence of heart disease.

CP = asymptomatic, Exang = Yes, Oldpeak = greaterThanZero, Thal = reversible ==> class_HD = Heart Disease

Asymptomatic chest pain is one of the most important features as it appears in all the rules generated in detecting heart disease.

Reversible thallium heart scan and Oldpeak greater than zero are positively correlated with heart disease.

Males are more prone to heart disease than females, as all the sick rules specify sex as male and the healthy rules specify sex as female.

There is a strong negative correlation between CA and Thal for heart disease prediction.

The most common features in healthy rules are Sex = Female, Exang (exercise-induced angina) = No and CA (number of major vessels coloured by fluoroscopy) = Zero. A patient is predicted as not having heart disease if the patient is female, has no exercise-induced angina and has no major vessels coloured by fluoroscopy.

Slope is not featured in any of the healthy rules.

This study determined the processes involved in obtaining significant features and devised a scoring mechanism to obtain the strength of each feature. This enables the correct weight to be imposed on each significant feature used in WARM for predicting heart disease. The confidence score obtained in this study is the highest reported for heart disease prediction using WARM on the UCI dataset. This study can serve as a guide for computing the strength scores of significant features in other heart disease datasets.

Comparative analysis with existing work

This section compares the proposed work with existing works that use WARM. The results show that the weighted scores imposed on WARM for the 8 significant features give the highest confidence score, 98%, compared with existing studies. Figure 3 shows the confidence scores of all existing WARM studies that used the UCI Cleveland heart disease dataset, in comparison with the proposed work. Both of our experiments, with all features and with significant features, achieved substantially higher confidence scores in predicting heart disease than previous studies, and using the significant features’ scores in WARM gives the highest confidence of 98%.

Result comparison on WARM using UCI Cleveland heart disease dataset

Table 12 compares WARM using significant features with existing ARM results in heart disease prediction; the rules that gave the highest confidence scores were retrieved and compared. Research by Said et al. [41] and Khare and Gupta [24] showed lower confidence scores than this research. Although Sonet et al. [45] obtained a confidence score of 99%, the rule generated for this score is questionable: it states that if a patient has diabetes, then the patient will have heart disease. Although diabetic patients are known to have a higher risk of heart disease, this rule cannot be generalized to all diabetic patients, and it may reflect bias in their dataset. Their dataset was collected from 4 different medical institutions, contains a total of 131 records and is not openly available. In addition, it contains different features from the dataset used in this study.

This study also benchmarked the rules generated on the UCI dataset by past research against the rules generated in our study. The extracted healthy rules are shown in Table 13 and the sick rules in Table 14. Table 13 shows that our experiment with 8 significant features obtained the optimum confidence score of 100% for healthy rules. The corresponding rule states that if the patient is female, chest pain is non-anginal and the thallium heart scan is normal, then the patient is predicted not to have heart disease.

Table 14 compares the sick rules with the highest confidence scores in this research against other research on associative and weighted associative rule mining for heart disease prediction. This study achieved a confidence score of 98%, higher than all the other sick rules. To the best of our knowledge, the weighted scores of significant features in our study outperform all other research using ARM and WARM to predict heart disease.

This research obtained the highest confidence score using significant features in WARM for heart disease prediction, suggesting that assigning appropriate weight scores improves the confidence level of the prediction. A set of significant features, with different weights representing the strength of each feature, was used in heart disease prediction. To the best of our knowledge, this is the first study to use significant features in executing WARM. This research also lists the top rules for predicting heart disease on the UCI dataset and is the first to benchmark the healthy and sick rules with the highest confidence scores. Future research may look into predicting the risk levels of heart disease, which would help medical practitioners and patients gauge heart disease severity. The weight-measurement algorithm used in this study can be further explored on other datasets and in other prediction models that use a weighted approach. The machine learning techniques used in the feature selection phase of this research are limited to the most popular techniques in heart disease prediction research; future researchers should explore other machine learning techniques for selecting significant features.

Availability of data and materials

The dataset analysed during the current study is available as the Cleveland dataset of the UCI Machine Learning Repository [https://www.kaggle.com/ronitf/heart-disease-uci].

Agarwal R, Mittal M. Inventory classification using multi-level association rule mining. Int J Decis Support Syst Technol. 2019;11(2):1–12.

Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of 20th international conference very large data bases, VLDB. Vol. 1215, pp. 487–499; 1994.

Akbaş KE, Kivrak M, Arslan AK, Çolak C. Assessment of association rules based on certainty factor: an application on heart data set, in 2019 International artificial intelligence and data processing symposium (IDAP) (pp. 1–5). IEEE; 2019.

Altaf W, Shahbaz M, Guergachi A. Applications of association rule mining in health informatics: a survey. Artif Intell Rev. 2017;47(3):313–40.

Alwidian J, Hammo BH, Obeid N. WCBA: weighted classification based on association rules algorithm for breast cancer disease. Appl Soft Comput. 2018;62:536–49.

American Heart Association. Heart disease and stroke statistics 2017 at-a-glance. Retrieved from: https://healthmetrics.heart.org/wp-content/uploads/2017/06/Heart-Disease-and-Stroke-Statistics-2017-ucm_491265.pdf.

Amin MS. Identifying significant features and data mining techniques in predicting cardiovascular disease; 2018.

Amin MS, Chiam YK, Varathan KD. Identification of significant features and data mining techniques in predicting heart disease. Telemat Inform. 2019;36:82–93.

Bashir S, Khan ZS, Khan FH, Anjum A, Bashir K. Improving heart disease prediction using feature selection approaches. In: 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST), pp. 619–623. IEEE; 2019.

Cengiz AB, Birant KU, Birant D. Analysis of pre-weighted and post-weighted association rule mining. In: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–5. IEEE; 2019.

Chauhan A, Jain A, Sharma P, Deep V. Heart disease prediction using evolutionary rule learning, in 2018 4th International conference on computational intelligence & communication technology (CICT) (pp. 1–4). IEEE; 2018.

Dey L, Mukhopadhyay A. Biclustering-based association rule mining approach for predicting cancer-associated protein interactions. IET Syst Biol. 2019;13(5):234–42.

Dua D, Graff C. UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science; 2019.

Domadiya N, Rao UP. Privacy-preserving association rule mining for horizontally partitioned healthcare data: a case study on the heart diseases. Sādhanā. 2018;43(8):1–9.

Domadiya N, Rao UP. Privacy preserving distributed association rule mining approach on vertically partitioned healthcare data. Procedia Comput Sci. 2019;148:303–12.

Fitriyani NL, Syafrudin M, Alfian G, Rhee J. HDPM: an effective heart disease prediction model for a clinical decision support system. IEEE Access. 2020;8:133034–50.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Elsevier; 2011.

Ibrahim SP, Sivabalakrishnan M. An enhanced weighted associative classification algorithm without preassigned weight based on ranking hubs. Int J Adv Comput Sci Appl. 2019;10(10).

Ibrahim SS, Sivabalakrishnan M. An evolutionary memetic weighted associative classification algorithm for heart disease prediction. In Recent Advances on Memetic Algorithms and its Applications in Image Processing (pp. 183–199). Springer, Singapore; 2020.

James SL, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–858.

Jabbar MA, Deekshatulu BL, Chandra P. Graph based approach for heart disease prediction. In Proceedings of the third international conference on trends in information, telecommunication and computing. New York, NY: Springer. 2013. p. 465–474.

Kannan AG, Castro TARVC, BalaSubramanian R. A comprehensive study on various association rule mining techniques; 2018.

Khan SA, Yadav SK. Class-based associative classification using super subsets to predict the by-diseases in thyroid disorders. in International conference on advances in computational intelligence and informatics (pp. 301–308). Springer, Singapore; 2019.

Khare S, Gupta D. Association rule analysis in cardiovascular disease. In: Cognitive Computing and Information Processing (CCIP), 2016 Second International Conference on (pp. 1–6). IEEE; 2016.

Kharya S, Soni S, Swarnkar T. Weighted Bayesian association rule mining algorithm to construct Bayesian Belief network. In: 2019 International Conference on Applied Machine Learning (ICAML), pp. 27–33. IEEE; 2019.

Lakshmi KS, Vadivu G. A novel approach for disease comorbidity prediction using weighted association rule mining. J Ambient Intell Humaniz Comput. 2019:1–8.

Lakshmi KP, Reddy CRK. Fast rule-based heart disease prediction using associative classification mining, in 2015 International conference on computer, communication and control (IC4) (pp. 1–5). IEEE; 2015.

Mahdi MA, Al-Janabi S. A novel software to improve healthcare base on predictive analytics and mobile services for cloud data centers, in International conference on big data and networks technologies (pp. 320–339). Springer, Cham; 2019.

Maji S, Arora S. Decision tree algorithms for prediction of heart disease. In Information and communication technology for competitive strategies (pp. 447–454). Springer, Singapore; 2019.

Malarvizhi SP, Sathiyabhama B. Frequent pagesets from web log by enhanced weighted association rule mining. Clust Comput. 2016;19(1):269–77.

Methaila A, Kansal P, Arya H, Kumar P. Early heart disease prediction using data mining techniques. Comput Sci Inf Technol J. 2014:53–59.

Mohammed KI, Zaidan AA, Zaidan BB, Albahri OS, Albahri AS, Alsalem MA, Mohsin AH. Novel technique for reorganisation of opinion order to interval levels for solving several instances representing prioritisation in patients with multiple chronic diseases. Comput Methods Programs Biomed. 2020;185:105151.

Mohammed KI, Jaafar J, Zaidan AA, Albahri OS, Zaidan BB, Abdulkareem KH, Alamoodi AH. A uniform intelligent prioritisation for solving diverse and big data generated from multiple chronic diseases patients based on hybrid decision-making and voting method. IEEE Access. 2020;8:91521–30.

Murphy SL, Xu J, Kochanek KD, Arias E. Mortality in the United States, 2017. NCHS data brief, no 328. Hyattsville, MD: National Center for Health Statistics; 2018.

Mutter S, Hall M, Frank E. Using classification to evaluate the output of confidence-based association rule mining. In: AI 2004: Advances in Artificial Intelligence (pp. 133–148); 2005.

Nahar J, Imam T, Tickle KS, Chen YPP. Association rule mining to detect factors which contribute to heart disease in males and females. Expert Syst Appl. 2013;40(4):1086–1093.

Nguyen T, et al. Classification of healthcare data using genetic fuzzy logic system and wavelets. Expert Syst Appl. 2015;42(4):2184–97.

Orphanou K, Dagliati A, Sacchi L, Stassopoulou A, Keravnou E, Bellazzi R. Incorporating repeating temporal association rules in naïve bayes classifiers for coronary heart disease diagnosis. J Biomed Inform. 2018;81:74–82.

Park HY, Lim DJ. A design failure pre-alarming system using score-and vote-based associative classification. Expert Syst Appl. 2021;164:113950.

Roth GA, et al. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1736–88.

Said IU, Adam AH, Garko AB. Association rule mining on medical data to predict heart disease. Int J Sci Technol Manage. 2015;26–35.

Shuriya B, Rajendran A. Cardio vascular disease diagnosis using data mining techniques and ANFIS approach. Int J Appl Eng Res. 2018;13(21):15356–61.

Sim DYY, Teh CS, Ismail AI. Improved boosting algorithms by pre-pruning and associative rule mining on decision trees for predicting obstructive sleep apnea. Adv Sci Lett. 2017;23(11):11593–8.

Singh J, Kamra A, Singh H. Prediction of heart diseases using associative classification, in 2016 5th International conference on wireless networks and embedded systems (WECON) (pp. 1–7). IEEE; 2016.

Sonet KMH, Rahman MM, Mazumder P, Reza A, Rahman RM. Analyzing patterns of numerously occurring heart diseases using association rule mining. In: 2017 Twelfth International Conference on Digital Information Management (ICDIM) (pp. 38–45). IEEE; 2017.

Soni J, Ansari U, Sharma D, Soni S. Intelligent and effective heart disease prediction system using weighted associative classifiers. International Journal on Computer Science and Engineering. 2011;3(6):2385–92.

Soni S, Pillai J, Vyas OP. An associative classifier using weighted association rule. In: 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC) (pp. 1492–1496). IEEE; 2009.

Soni S, Vyas OP. Using associative classifiers for predictive analysis in health care data mining. Int J Comput Appl. 2010;4(5):33–7.

Srinivas K, Reddy BR, Rani BK, Mogili R. Hybrid approach for prediction of cardiovascular disease using class association rules and MLP. Int J Electr Comput Eng (ISSN 2088-8708). 2016;6(4).

Sundar NA, Latha PP, Chandra MR. Performance analysis of classification data mining techniques over heart disease database. Int J Eng Sci Adv Technol. 2012;2(3):470–8.

Taihua W, Fan G. Associating IDS alerts by an improved apriori algorithm. In: Third International Symposium on Intelligent Information Technology and Security Informatics, Jinggangshan, China (pp. 478–482). IEEE; 2010.

Thanigaivel R, Kumar KR. Boosted apriori: an effective data mining association rules for heart disease prediction system. Middle-East J Sci Res. 2016;24(1):192–200.

UCI Machine Learning Repository: Heart Disease Data Set; 2010. http://archive.ics.uci.edu/ml/datasets/Heart+Disease

Vasanthanageswari S, Vanitha M. Predicting risk factor of congenital heart defect using association rule mining technique. Int J Pure Appl Math. 2018;118(7):399–404.

Wei-Jia L, Liang M, Hao C. Particle swarm optimisation-support vector machine optimised by association rules for detecting factors inducing heart diseases. J Intell Syst. 2017;26(3):573–83.

World Health Organization. Global action plan for the prevention and control of non-communicable diseases 2013–2020. ISBN 978 92 4 150623 6. Geneva; 2013.

Acknowledgements

The authors would like to thank the Fundamental Research Grant Scheme (FRGS/1/2017/ICT01/UM/02/4, FP057-2017A) and the Faculty Research Grant Scheme of Universiti Malaya (Project Code: GF011D-2019) for funding this research.

Author information

Authors and affiliations

Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia

Armin Yazdani & Yin Kia Chiam

Department of Information Systems, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia

Kasturi Dewi Varathan

Department of Computing, National University of Sciences and Technology (NUST), Islamabad, Pakistan

Asad Waqar Malik

Department of Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia

Wan Azman Wan Ahmad

Contributions

AY: Software, Performed Experiments, original draft preparation. KDV: Conceptualization, original draft preparation, supervision. YKC: original draft preparation, supervision. AWM: Visualization, reviewing and editing. WAWA: Cardiac expert for rules validation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kasturi Dewi Varathan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yazdani, A., Varathan, K.D., Chiam, Y.K. et al. A novel approach for heart disease prediction using strength scores with significant predictors. BMC Med Inform Decis Mak 21, 194 (2021). https://doi.org/10.1186/s12911-021-01527-5

Received : 18 September 2020

Accepted : 12 May 2021

Published : 21 June 2021

DOI : https://doi.org/10.1186/s12911-021-01527-5

Weighted associative rule mining
Heart disease prediction
Cardiovascular disease
Weighted scores

BMC Medical Informatics and Decision Making

ISSN: 1472-6947

COMMENTS

Heart Disease Prediction Using Machine Learning
Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening, researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. This work presents several machine learning approaches for predicting heart diseases, using data of major ...
Effective Heart Disease Prediction Using Hybrid Machine ...
Heart disease is one of the most significant causes of mortality in the world today. Prediction of cardiovascular disease is a critical challenge in the area of clinical data analysis. Machine learning (ML) has been shown to be effective in assisting in making decisions and predictions from the large quantity of data produced by the healthcare industry. We have also seen ML techniques being ...
Heart Disease Prediction using Machine Learning Techniques
As per the recent study by WHO, heart related diseases are increasing. 17.9 million people die every year due to this. With growing population, it gets further difficult to diagnose and start treatment at early stage. But due to the recent advancement in technology, Machine Learning techniques have accelerated the health sector by multiple researches. Thus, the objective of this paper is to ...
Heart Disease Prediction using Machine Learning Techniques
One of the main contributors to death cases globally is heart diseases. Heart illnesses have an impact on many people in the middle or elderly age which, in most instances, lead to serious health adverse effects such as strokes and heart attacks. Therefore, it is necessary to diagnose and predict heart diseases to prevent any serious health issues before they occur. In this paper, a ...
(PDF) Using Machine Learning for Heart Disease Prediction
Our paper is part of the research on the detection and prediction of heart disease. It is based on the application of Machine Learning algorithms, of which we have chosen the 3 most used ...
Heart Disease Prediction Using Machine Learning Algorithms
Heart plays significant role in living organisms. Diagnosis and prediction of heart related diseases requires more precision, perfection and correctness because a little mistake can cause fatigue problem or death of the person, there are numerous death cases related to heart and their counting is increasing exponentially day by day. To deal with the problem there is essential need of ...
Prediction of Heart Disease Using an Approach Based on ...
Heart disease is one of the most consequential illnesses currently understood. Because of the widespread dissemination of information, numerous techniques and algorithms have been developed to better predict the prognosis of patients with cardiac disease. Using a dataset provided by Kaggle, this paper describes 13 crucial processes. This work was done by Support Vector Machine (SVM), K-Nearest ...
Heart Disease Prediction Using Machine Learning Method
The heart disease is also known as coronary artery disease, many hearts affecting symptoms that are very common nowadays and causes death. It is a challenging task to diagnose heart diseases without any intelligent diagnosing system. Many researchers did research on it and developed a diagnostic system to diagnose heart diseases and worked on it. The prediction of cardiovascular disease ...
A Comparative Study on Heart Disease Prediction Using ...
The world health organization shows us that cardiovascular disease is one of the noteworthy reasons for death in the world. In this paper, data mining classification techniques i.e. Naive Bayes (NB), Support Vector Machine (SVM), k-nearest neighbors (k-NN), Decision Tree (DT), Neural Network (NN), Logistic Regression (LR), Random Forest (RF), Gradient Boosting are proposed to predict the ...
Accurate Prediction of Heart Disease Based On Multiple ...
Every year one of the dominant reasons of global deaths is heart disease. Anticipation of these diseases in their early stages is pivotal and important in the healthcare sector, specifically in cardiology and angiology. To make predictive models to detect diseases in their early stages, machine learning plays a very important role. This paper presents various machine and deep learning models ...
Improving Accuracy of Heart Disease Prediction through ...
Building a ML model for "heart disease prediction" which merely relies on the various relevant factors is the primary goal of this paper. For this research project, we used 4 different datasets which comprise distinct factors that are relevant to heart disease. The model building is made through ML algorithms: "Random forest, K ...
HeartCare: IoT Based Heart Disease Prediction System
HeartCare: IoT Based Heart Disease Prediction System Abstract: ... This paper aims to develop an ML based model to detect heart diseases. In this case, KNN outstands as the best algorithm in comparison to other algorithms such as Random Forest, Decision Tree, Support Vector Machine and Naive Bayes. ... Date Added to IEEE Xplore: 12 March 2020 ...
Machine learning prediction in cardiovascular diseases: a meta ...
Most importantly, pooled analyses indicate that, in general, ML algorithms are accurate (AUC 0.8-0.9 s) in overall cardiovascular disease prediction. In subgroup analyses of each ML algorithms ...
Full article: Artificial intelligence for heart disease prediction and
Heart disease prediction system using a model of machine learning and sequential backward selection algorithm for feature selection [Paper Presentation]. 2019 IEEE 5th International Conference for Convergence in Technology (I2ct) (pp. 1-4).
Early and accurate detection and diagnosis of heart disease using
In a sequel, Awang et al. 20 have used NB and DT for the diagnosis and prediction of heart disease and achieved reasonable results in terms of accuracy. They achieved an accuracy of 82.7% with NB ...
Machine learning-based heart disease diagnosis: A ...
Effective prediction of heart disease using data mining and machine learning: ... Springer, IEEE, and Wiley. Scopus database has been considered a reliable database by many researchers to conduct SLR due to high-quality indexing contents ... Fig. 9 demonstrates the number of research papers related to funded projects. The number of articles ...
Processes
In the medical domain, early identification of cardiovascular issues poses a significant challenge. This study enhances heart disease prediction accuracy using machine learning techniques. Six algorithms (random forest, K-nearest neighbor, logistic regression, Naïve Bayes, gradient boosting, and AdaBoost classifier) are utilized, with datasets from the Cleveland and IEEE Dataport. Optimizing ...
Heart disease risk prediction using deep learning techniques with
Cardiovascular diseases state as one of the greatest risks of death for the general population. Late detection in heart diseases highly conditions the chances of survival for patients. Age, sex, cholesterol level, sugar level, heart rate, among other factors, are known to have an influence on life-threatening heart problems, but, due to the high amount of variables, it is often difficult for ...
Heart Disease Prediction using Machine Learning Techniques
This research aims to foresee the odds of having heart disease as probable cause of computerized prediction of heart disease that is helpful in the medical field for clinicians and patients. To accomplish the aim, we have discussed the use of various machine learning algorithms on the data set and dataset analysis is mentioned in this research paper.
Effectively Predicting the Presence of Coronary Heart Disease Using
In this regard, numerous research studies have been shown on heart disease prediction using an ML classifier. In this paper, we used eleven ML classifiers to identify key features, which improved the predictability of heart disease. To introduce the prediction model, various feature combinations and well-known classification algorithms were used.
Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud
SVM, Naive Bayes, Decision Tree, K-Nearest Neighbor, and Artificial Neural Network are some of the machine learning techniques used in the prediction of heart diseases. In this paper, we have described various research works, related heart disease dataset, and comparison and discussion of different machine learning models for prediction of ...
A novel approach for heart disease prediction using strength scores
This paper is motivated by the gap in the literature, thus proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. ... widely used for heart disease research yielded the highest confidence score of 98% in predicting heart disease. This study managed to provide a significant ...
Heart Disease Prediction Using Machine Learning Method
This research paper presents reasons for heart disease and a model based on Machine learning algorithms for prediction. ... 978-1-6654-6122-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICCR56254.2022. ...
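The strength-score entry in the list above reports a rule "confidence score" of 98%. For readers unfamiliar with the term, here is a minimal sketch of the standard support and confidence measures from association rule mining, computed on a tiny set of hypothetical patient records. It is not the authors' strength-score or weighted algorithm; the attribute=value item names (`cp=asympt`, `exang=yes`, `target=disease`) and the records themselves are invented for illustration.

```python
"""Support and confidence of an association rule A -> C (illustrative sketch)."""
from typing import FrozenSet, List, Sequence

Record = FrozenSet[str]

# Hypothetical, hand-made records: each is the set of attribute=value items
# observed for one patient. These are NOT data from any cited study.
RECORDS: List[Record] = [
    frozenset({"cp=asympt", "exang=yes", "target=disease"}),
    frozenset({"cp=asympt", "exang=yes", "target=disease"}),
    frozenset({"cp=asympt", "exang=no", "target=healthy"}),
    frozenset({"cp=typical", "exang=no", "target=healthy"}),
    frozenset({"cp=asympt", "exang=yes", "target=healthy"}),
]


def support(records: Sequence[Record], itemset: Record) -> float:
    """Fraction of records that contain every item of `itemset`."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if itemset <= record)  # subset test
    return hits / len(records)


def confidence(records: Sequence[Record], antecedent: Record, consequent: Record) -> float:
    """conf(A -> C) = support(A | C) / support(A); 0.0 if A never occurs."""
    base = support(records, antecedent)
    if base == 0.0:
        return 0.0
    return support(records, antecedent | consequent) / base


# Hypothetical rule: {asymptomatic chest pain, exercise-induced angina} -> disease
ANTECEDENT: Record = frozenset({"cp=asympt", "exang=yes"})
CONSEQUENT: Record = frozenset({"target=disease"})

if __name__ == "__main__":
    rule_support = support(RECORDS, ANTECEDENT | CONSEQUENT)
    rule_conf = confidence(RECORDS, ANTECEDENT, CONSEQUENT)
    print(f"support={rule_support:.2f} confidence={rule_conf:.2f}")
    # prints: support=0.40 confidence=0.67
```

Here three of the five records match the antecedent and two of those also carry the consequent, so the rule's confidence is 2/3. By the same definition, a confidence of 98% means that 98% of the records matching a rule's antecedent also carry its consequent. Weighted variants of association rule mining additionally assign weights to items or attributes, which this sketch does not model.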


Early and accurate detection and diagnosis of heart disease using intelligent computational model

Introduction

Results and discussion

Classifiers’ predictive outcomes on full feature space

Classifiers’ predictive outcomes on selected feature space

mRMR feature selection technique

LASSO feature selection technique

Relief feature selection technique

Performance comparison with existing models

Material and methods

Proposed system methodology

Preprocessing of data

Feature selection algorithms

Machine learning classification algorithms

Classifier validation method

K-fold cross validation (CV)

Performance evaluation metrics

Acknowledgements

Author information

Contributions

Corresponding authors

Ethics declarations

Additional information

Supplementary information

About this article

Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers

Jawaid Iqbal, Rizwana Irfan, Saddam Hussain, Abeer D. Algarni, Syed Sabir Hussain Bukhari, Nazik Alturki, Syed Sajid Ullah

Associated Data

1. Introduction

2. Materials and Methods

3. Proposed Methodology

3.1. Dataset

3.2. Correlation Matrix

4. Result and Analysis

4.1. Disease Status

4.2. Analyzing Sex

4.3. Analyzing Age

4.4. Analyzing Chest Pain

4.5. Analyzing Fasting Blood Sugar

4.6. Analyzing Resting ElectroCardioGraphic

4.7. Analyzing Exercise-Induced Angina

4.8. Analyzing Slope

4.9. Analyzing Coronary Artery

4.10. Analyzing Thalassemia Affects the Heart

5. Result and Discussions

5.1. Evaluating Parameters

5.2. Performance of ML Classifiers

6. Conclusions and Future Work

Acknowledgments

Abbreviations

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing

Author information

Corresponding author

Editor information

Rights and permissions

Copyright information

About this paper

A novel approach for heart disease prediction using strength scores with significant predictors

Introduction