| Data Quality Enhancement for Decision Tree Algorithm using Knowledge-Based Model | |
|---|---|
| DOI | |
| Creator | Kraisak Kesorn |
| Title | Data Quality Enhancement for Decision Tree Algorithm using Knowledge-Based Model |
| Contributor | Sirichanya Chanmee |
| Publisher | King Mongkut's Institute of Technology Ladkrabang |
| Publication Year | 2020 (2563 BE) |
| Journal Title | Current Applied Science and Technology |
| Journal Vol. | 20 |
| Journal No. | 2 |
| Page no. | 259-277 |
| Keyword | data analytics, data mining, ontology, semantic, classification, decision tree |
| URL Website | https://www.tci-thaijo.org/index.php/cast |
| Website title | https://www.tci-thaijo.org/index.php |
| ISSN | 2586-9396 |
| Abstract | Data mining discovers knowledge or hidden patterns in large data sets using methods such as statistics, machine learning, and other data analysis techniques. However, the main limitation of these conventional techniques is that they ignore data relationships and semantics: the data are treated as meaningless numbers, and statistical methods are used to build the model. For example, a decision tree, a data mining classification method, is built from a given set of labeled data, and those data are classified without understanding their semantics or the relationships between attributes. To understand the inherent meaning of the data and to take advantage of the relationships between data elements, we introduce a knowledge-based approach to improve data quality. The proposed approach uses an ontology as background knowledge to assist decision tree classification during data preparation. The ontology is used to infer the relationships between attributes and ontology concepts. This relationship information helps the system identify related attributes that could assist the classification process. Two datasets from different domains, agriculture and economics, were used to evaluate the generalization of the proposed approach. Accuracy was used as the standard measure of success in evaluating the model. The experimental results showed that the proposed approach can efficiently enhance the performance of the data classification process. |
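
The abstract describes using an ontology during data preparation to identify attributes related to the classification target. The sketch below illustrates one way that idea could look, and it is not the authors' implementation: the abstract does not specify the ontology, the inference mechanism, or the datasets' columns. The concept names, the column-to-concept mapping, the `max_hops` parameter, and all function names are hypothetical, and a simple breadth-first search over "related to" links stands in for ontology inference.

```python
from collections import deque

# Hypothetical toy ontology: undirected "related to" links between concepts.
# These concepts and links are illustrative assumptions, not the paper's ontology.
ONTOLOGY_EDGES = [
    ("CropYield", "Rainfall"),
    ("CropYield", "SoilType"),
    ("Rainfall", "Season"),
    ("Farmer", "PhoneNumber"),
]

# Hypothetical mapping from dataset column names to ontology concepts.
ATTRIBUTE_CONCEPTS = {
    "rain_mm": "Rainfall",
    "soil": "SoilType",
    "season": "Season",
    "phone": "PhoneNumber",
}


def build_graph(edges):
    """Build an undirected adjacency map from pairs of concepts."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def related_concepts(graph, start, max_hops):
    """Return the concepts reachable from `start` within `max_hops` links (BFS)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return set(distances) - {start}


def select_attributes(attribute_concepts, graph, target_concept, max_hops=2):
    """Keep the attributes whose concept is related to the target concept."""
    related = related_concepts(graph, target_concept, max_hops)
    return sorted(a for a, c in attribute_concepts.items() if c in related)


def project(rows, attributes):
    """Reduce each record (a dict) to the selected attributes."""
    return [{a: row[a] for a in attributes} for row in rows]
```

Under these assumptions, `select_attributes(ATTRIBUTE_CONCEPTS, build_graph(ONTOLOGY_EDGES), "CropYield")` keeps `rain_mm`, `season`, and `soil` and drops `phone`, whose concept has no path to the target. The projected records would then be passed to any decision tree learner.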