R Decision Tree
- A decision tree is a graph that represents choices and their results in the form of a tree.
- The nodes in the graph represent an event or choice and the edges of the graph represent the decision rules or conditions.
- It is mostly used in machine learning and data mining applications using R.
- Examples of decision tree use include predicting whether an email is spam or not spam, whether a tumor is cancerous, or whether a loan is a good or bad credit risk, based on the factors relevant to each case.
- Generally, a model is created from observed data, also called training data.
- Then a set of validation data is used to verify and improve the model.
- R has packages which are used to create and visualize decision trees.
- For a new set of predictor variables, we use this model to decide the category (yes/no, spam/not spam) of the data.
- The R package "party" is used to create decision trees.
Install R Package
- Run install.packages("party") in the R console to install the package. Any packages it depends on must be installed as well.
- The package "party" has the function ctree(), which is used to create and analyze a decision tree.
- The basic syntax for creating a decision tree in R is ctree(formula, data).
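- The installation step can be sketched as follows. The check with requireNamespace() and the CRAN mirror URL are assumptions added for illustration; a plain install.packages("party") call is enough in an interactive session. By default install.packages() also installs the packages that "party" declares as required dependencies.

```r
# Install "party" only when it is not already available.
pkg <- "party"
needs.install <- !requireNamespace(pkg, quietly = TRUE)

if (needs.install) {
  # The mirror below is an assumption; any CRAN mirror works.
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Attach the package so that ctree() is available.
library(party)
```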
- Following is the description of the parameters used −
- formula is a formula describing the predictor and response variables.
- data is the name of the data set used.
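- As a minimal sketch of the two parameters described above, the call below fits a tree on R's built-in iris data set. The choice of iris and the variable name iris.tree are assumptions made only to show the call shape; they are not part of this tutorial's example.

```r
library(party)

# formula: the response (Species) on the left of "~", predictors on the right.
# The "." stands for "all other columns in the data set".
tree.formula <- Species ~ .

# data: the data frame that holds the variables named in the formula.
iris.tree <- ctree(formula = tree.formula, data = iris)

# Printing the fitted object lists the splits chosen by ctree().
print(iris.tree)
```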
- We will use the data set named readingSkills, which ships with the "party" package, to create a decision tree.
- It records each person's reading skills score together with the variables "age" and "shoeSize" and whether the person is a native speaker or not.
- Here is the sample data.
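- A minimal sketch for displaying the first rows of the data set is shown here. The explicit data() call and the variable name sample.rows are assumptions added for illustration.

```r
library(party)

# Load readingSkills from the "party" package.
data("readingSkills", package = "party")

# Show the first six rows of the data set.
sample.rows <- head(readingSkills)
print(sample.rows)
```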
- When we execute the above code, it produces the following result and chart −
  nativeSpeaker age shoeSize    score
1           yes   5 24.83189 32.29385
2           yes   6 25.95238 36.63105
3            no  11 30.42170 49.60593
4           yes   7 28.66450 40.28456
5           yes  11 31.88207 55.46085
6           yes  10 30.07843 52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................
- We will use the ctree() function to create the decision tree and see its graph.
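- One way to sketch this step is shown here. It fits the tree on the full readingSkills data set, which is an assumption: a tree fitted on a different subset of rows may choose different split points. The variable name output.tree is also illustrative.

```r
library(party)
data("readingSkills", package = "party")

# Model nativeSpeaker as a function of the three other variables.
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score,
                     data = readingSkills)

# Draw the tree on the current graphics device.
plot(output.tree)
```

- Each inner node of the plot shows the variable used for the split, and each terminal node shows the distribution of nativeSpeaker among the observations that reach it.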
- When we execute the above code, it produces the following result −
- From the decision tree shown above we can conclude that anyone whose reading skills score is less than 38.3 and whose age is more than 6 is not a native speaker.
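- The training and validation workflow described at the start can be sketched as follows. The 70/30 split, the seed, and all variable names are assumptions chosen for illustration, not values taken from this tutorial.

```r
library(party)
data("readingSkills", package = "party")

set.seed(42)  # assumed seed, used only to make the split repeatable

# Hold out 30% of the rows as validation data (assumed proportion).
n <- nrow(readingSkills)
train.idx <- sample(n, size = floor(0.7 * n))
train.dat <- readingSkills[train.idx, ]
valid.dat <- readingSkills[-train.idx, ]

# Fit the tree on the training data only.
fit <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train.dat)

# Predict the category for the held-out rows and compare with the truth.
pred <- predict(fit, newdata = valid.dat)
confusion <- table(predicted = pred, actual = valid.dat$nativeSpeaker)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

- The confusion table counts how often each predicted category matches the actual one, and the accuracy is the share of validation rows that were classified correctly.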