Decision Tree Induction
- A decision tree is a tree-structured model that supports decision-making. It represents classification or regression models as a tree.
- It splits a data set into smaller and smaller subsets while the tree is incrementally built. Each decision node has at least two branches, and each leaf node represents a classification or decision. Decision trees can handle both categorical and numerical data.
- Entropy is a common measure of impurity: it quantifies the randomness, or impurity, in a data set.
- Information gain is the decrease in entropy after a data set is split on an attribute. It is also called entropy reduction.
- A decision tree resembles a flowchart whose terminal (leaf) nodes represent decisions.
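Entropy and information gain can be made concrete with a short sketch. For a set whose class proportions are p_i, entropy is -Σ p_i log2 p_i, and information gain is the parent set's entropy minus the size-weighted entropy of the subsets produced by a split. The function names and the toy data below are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr_index):
    """Parent entropy minus the weighted entropy of the subsets
    obtained by splitting on the attribute at attr_index."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: each row is (outlook, humidity).
rows = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"), ("rain", "normal")]
labels = ["no", "no", "yes", "yes"]
print(entropy(labels))                    # 1.0
print(information_gain(rows, labels, 0))  # 1.0  (outlook separates the classes)
print(information_gain(rows, labels, 1))  # 0.0  (humidity tells us nothing)
```

Splitting on outlook removes all uncertainty, so its gain equals the parent's entropy; splitting on humidity leaves each subset as mixed as the original, so its gain is zero.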
Why are decision trees useful?
- It enables us to analyze the possible consequences of a decision.
- It provides a framework for quantifying the value of each outcome.
- It helps us to make the best decisions based on existing data.
- A decision tree model is a set of rules for partitioning a large, heterogeneous population into smaller, more homogeneous, mutually exclusive classes. Given data consisting of attribute values together with each record's class, a decision tree produces a set of rules that can be used to identify the class. The rules are applied one after another, resulting in a hierarchy of segments within segments.
- This hierarchy is called the tree, and each segment is called a node. With each successive split, the members of the resulting subsets become more and more similar to one another. The algorithm used to build a decision tree is called recursive partitioning; a well-known example is CART (Classification and Regression Trees).
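CART grows binary trees and, for classification, typically scores candidate splits with the Gini index, 1 - Σ p_i². The sketch below shows a single step of recursive partitioning under that assumption: it tries every threshold on one numeric attribute and keeps the binary split with the lowest weighted Gini impurity. It is not a full CART implementation (no pruning, no regression trees, one attribute only), and `gini`, `best_threshold`, and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split
    'value <= threshold' on a single numeric attribute."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for i in range(1, total):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / total * gini(left) + len(right) / total * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: ages and whether each person bought a product.
ages = [22, 25, 31, 35, 42, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(ages, bought))  # (33.0, 0.0)
```

Recursive partitioning would apply the same search to each resulting subset until the nodes are pure or a stopping rule is met.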
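- Consider a factory whose management team must make a data-driven decision on whether or not to expand. There are two possible outcomes, with probabilities 0.6 and 0.4. If the factory expands, at a cost of $3M, the payoffs are $8M and $6M respectively; if it does not expand, at no cost, the payoffs are $4M and $2M.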
Net Expand = (0.6*8 + 0.4*6) - 3 = $4.2M
Net Not Expand = (0.6*4 + 0.4*2) - 0 = $3.2M
Since $4.2M > $3.2M, the factory should be expanded.
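The factory comparison is an expected-value calculation: each option's payoffs are weighted by their probabilities and the option's cost is subtracted. A minimal sketch, assuming payoffs and costs are in $M; `expected_net_value` is a hypothetical helper.

```python
def expected_net_value(outcomes, cost):
    """Probability-weighted payoff minus the up-front cost.

    outcomes: list of (probability, payoff) pairs, payoffs in $M.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes) - cost


# Figures from the factory example (in $M).
options = {
    "expand": expected_net_value([(0.6, 8), (0.4, 6)], cost=3),
    "not expand": expected_net_value([(0.6, 4), (0.4, 2)], cost=0),
}
for name, value in options.items():
    print(f"{name}: ${value:.1f}M")  # expand: $4.2M, then not expand: $3.2M
best = max(options, key=options.get)
print("Decision:", best)  # Decision: expand
```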
Decision Tree Algorithm
- The algorithm takes three parameters: D, attribute_list, and Attribute_selection_method. D is referred to as a data partition.
- D: initially, the complete set of training tuples and their associated class labels.
- attribute_list: the list of attributes describing the tuples.
- Attribute_selection_method: a heuristic procedure for selecting the attribute that "best" discriminates the given tuples according to class. It applies an attribute selection measure, such as information gain or the Gini index.
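The procedure defined by these three parameters can be sketched as a recursive function in the style of ID3. This is a minimal sketch under several assumptions: D is a list of (attribute-dict, class-label) pairs, all attributes are categorical, the Attribute_selection_method is information gain, and branches are grown only for values that actually occur in the partition. The names `generate_decision_tree`, `select_by_information_gain`, and `classify` (snake_case in Python) and the training data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(D, attr):
    """Gain from splitting partition D on categorical attribute attr.

    D is a list of (row, class_label) pairs; row maps attribute -> value.
    """
    parent = [label for _, label in D]
    partitions = {}
    for row, label in D:
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(D) * entropy(p) for p in partitions.values())
    return entropy(parent) - remainder


def select_by_information_gain(D, attribute_list):
    """Hypothetical Attribute_selection_method: attribute with maximum gain."""
    return max(attribute_list, key=lambda attr: information_gain(D, attr))


def generate_decision_tree(D, attribute_list, attribute_selection_method):
    """Return a class label (leaf) or {"attribute": ..., "branches": {...}}."""
    labels = [label for _, label in D]
    if len(set(labels)) == 1:
        # All tuples in D belong to the same class: make a leaf.
        return labels[0]
    if not attribute_list:
        # No attributes left to split on: majority-class leaf.
        return Counter(labels).most_common(1)[0][0]
    best = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    node = {"attribute": best, "branches": {}}
    # One branch per value of the splitting attribute that occurs in D,
    # so every sub-partition Dj is non-empty.
    for value in {row[best] for row, _ in D}:
        Dj = [(row, label) for row, label in D if row[best] == value]
        node["branches"][value] = generate_decision_tree(
            Dj, remaining, attribute_selection_method
        )
    return node


def classify(tree, row):
    """Follow branches down to a leaf; raises KeyError for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Hypothetical training partition D.
D = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": True}, "no"),
]
tree = generate_decision_tree(D, ["outlook", "windy"], select_by_information_gain)
print(classify(tree, {"outlook": "rain", "windy": False}))     # yes
print(classify(tree, {"outlook": "overcast", "windy": True}))  # yes
```

Swapping in a different attribute selection measure, such as the Gini index, only requires passing another function as `attribute_selection_method`; the recursion itself is unchanged.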
Advantages of using Decision Trees
- Missing values in the data do not affect the process of building a decision tree to any considerable extent.
- Decision trees do not require the data to be scaled or standardized.
- A decision tree model is built automatically and is simple to explain to technical teams as well as to stakeholders.
- Compared to other algorithms, decision trees require less effort for data preparation during pre-processing.
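The claim that decision trees need no scaling follows from how splits are chosen: a test such as "value <= t" depends only on the ordering of the values, and a strictly increasing transformation such as standardization preserves that ordering. The sketch below checks this on hypothetical data by finding the best Gini split before and after standardizing the attribute and comparing which rows land in the left branch; the helper names are illustrative only.

```python
from collections import Counter
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split_partition(values, labels):
    """Return the set of row indices sent to the left branch ('value <= t')
    by the binary split with the lowest weighted Gini impurity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_left, best_score = None, float("inf")
    for k in range(1, len(order)):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold separates equal values
        left, right = order[:k], order[k:]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(order)
        if score < best_score:
            best_left, best_score = set(left), score
    return best_left


# Hypothetical data: annual incomes and loan decisions.
incomes = [18_000, 25_000, 40_000, 52_000, 75_000, 120_000]
approved = ["no", "no", "no", "yes", "yes", "yes"]
mu, sigma = mean(incomes), pstdev(incomes)
standardized = [(x - mu) / sigma for x in incomes]

raw_split = best_split_partition(incomes, approved)
scaled_split = best_split_partition(standardized, approved)
print(raw_split == scaled_split)  # True
```

Only the threshold value changes after standardization; the rows routed to each branch, and hence the tree's predictions, stay the same.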