Decision Tree Induction for Classifying the Cholesterol Levels

Authors

  • Yusuf Sulistyo Nugroho, Universitas Muhammadiyah Surakarta
  • Dedi Gunawan, Universitas Muhammadiyah Surakarta

Abstract

Cholesterol is a soft, yellow, fatty substance produced by the body, mainly in the liver. Every day, the liver produces about 800 milligrams of cholesterol, which is derived from animal products, seafood, milk, and dairy products. At normal levels, cholesterol is beneficial to health because it is one of the essential fats the body requires for cell formation. Cholesterol levels are classified into three categories: normal, high, and low. They can be affected by several factors that are not always widely known to the general public. The objective of this study was to determine the level of significance of each factor that affects cholesterol levels and to find the accuracy, precision, and recall of the algorithms used in decision tree induction. The selection criteria used to find the level of significance of these factors were information gain, Gini index, and gain ratio. The variables that affect cholesterol levels were divided into four types: gender, age, history of smoking, and history of diabetes. The results showed that, based on training data processed with the three algorithms, the most influential factor on cholesterol levels was history of diabetes. The highest accuracy, 56.14%, was obtained with information gain. The recall values were evenly distributed across all three algorithms, which indicated that the three were equivalent in this respect. Information gain and gain ratio had equal precision (57.58%), which was higher than that of the Gini index. In contrast, gain ratio had a higher RMSE, 0.564, than information gain and the Gini index.
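
The three selection criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy records, and the attribute values below are all hypothetical and chosen only to show how information gain, Gini index reduction, and gain ratio score a categorical attribute against a class label.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gini(items):
    """Gini impurity of a list of categorical values."""
    n = len(items)
    return 1.0 - sum((c / n) ** 2 for c in Counter(items).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gini_reduction(values, labels):
    """Gini impurity of the labels minus the weighted impurity after the split."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - weighted


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy records (NOT the study's data): four patients.
diabetes = ["yes", "yes", "no", "no"]
gender = ["m", "f", "m", "f"]
level = ["high", "high", "normal", "normal"]

for name, attr in [("diabetes", diabetes), ("gender", gender)]:
    print(name,
          round(information_gain(attr, level), 3),
          round(gini_reduction(attr, level), 3),
          round(gain_ratio(attr, level), 3))
```

In this toy data the hypothetical diabetes attribute separates the classes perfectly while gender does not separate them at all, so all three criteria rank diabetes first. On real data the criteria can disagree, which is why the study compares them.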

Published

2016-08-01