Classification Course Slides

May 28, 2018 | Author: Seun -nuga Daniel | Category: Statistical Classification, Machine Learning, Futurology, Cybernetics, Computational Neuroscience


Comments



Description

Python for Data ScienceMachine Learning in Python: Classification Dr. Ilkay Altintas and Dr. Leo Porter Twitter: #UCSDpython4DS By the end of this video, you should be able to: Python for Data Science § Define what classification is § Discuss whether classification is supervised or unsupervised § Describe how binomial classification differs from multinomial classification Classification Target variable is categorical Python for Data Science Sunny Input Variables Model Windy Rainy Goal: Cloudy Given input variables, predict category Data for Classification Input Variables Target Variable Python for Data Science Temperature Humidity Wind Speed Weather 79 48 2.7 Sunny 60 80 3.8 Rainy 68 45 17.9 Windy 57 77 4.2 Cloudy 8 Rainy 68 45 17.7 Sunny 60 80 3.2 Cloudy . Classification is Supervised Target Label Output Python for Data Science Class Variable Class Category Temperature Humidity Wind Speed Weather 79 48 2.9 Windy 57 77 4. Binary Types of Classification Multi-class Classification Classification Python for Data Science value1 value2 value1 value2 valueN Target has two Target has > 2 values values . or neutral . Classification Examples Binary Multi-Class Python for Data Science Classification Classification • Will it rain • What type of tomorrow or not? product will this • Is this transaction customer buy? legitimate or • Is this tweet fraudulent positive. negative. Classification Main Points • Predict category from input variables Python for Data Science • Classification is a supervised task • Target variable is categorical Input Model Target Variables . Machine Learning in Python: Python for Data Science Building and Applying a Classification Model Dr. Leo Porter Twitter: #UCSDpython4DS . Ilkay Altintas and Dr. you should be able to: Python for Data Science § Discuss what building a classification model means § Explain the difference between building and applying a model § Summarize why the parameters of a model need to be adjusted § Describe the goal of a classification algorithm § Name some common algorithms for classification . By the end of this video. What is a Machine Learning Model? Python for Data Science • A mathematical model with parameters that map input to output Input Model Output Variables y=f(ax+b) (y) (x) function mapping input parameters to output . Example of Model Python for Data Science y = mx + b output b input . Adjusting Model slope m = 2 y-intercept b = +1 Parameters x=1 => y=2*1+1= 3 Python for Data Science slope m = 2 y-intercept b = -1 x=1 => y=2*1-1=1 . Building Machine Learning Model Model parameters are adjusted during model training to Python for Data Science change input-output mapping. Input Model Output Variables y=f(ax+b) (y) (x) function mapping input parameters to output . Building Classification Model Decision Python for Data Science Boundaries . Building vs. Applying Model Python for Data Science • Training Phase • Adjust model parameters • Use training data • Testing Phase • Apply learned model • Use new data . Building vs. Applying Model Training Python for Data Science Data Build Model Model Learning Algorithm Training Phase Test Data Apply Results Model Model Testing Phase . Algorithm Input Model Output Variables y=f(ax+b) (y) (x) function mapping input parameters to output . Building a Classification Model Use learning algorithm to model Python for Data Science Learning parameters during training. Classification Python for Data Science • Task: Predict category from input variables • Goal: Match model outputs to targets (desired outputs) Input Model Target Variables . Learning Algorithm Python for Data Science Training Data Build Model Model Learning Algorithm Training Phase • Learning algorithm used to adjust model’s parameters . Common Classification Algorithms Python for Data Science • kNN • Decision Tree • Naïve Bayes . kNN Overview • Classify sample by looking at its Python for Data Science closest neighbors . • Tree captures Python for Data Science Blooded multiple Live Non- classification Birth Mammal decision paths Vertebrate Non- Mammal Non- Mammal Mammal . Decision Tree Overview Warm. Naïve Bayes Overview Python for Data Science • Probabilistic approach to classification . Common Classification Algorithms Python for Data Science • kNN • Decision Tree • Naïve Bayes . Python for Data Science Machine Learning in Python: Decision Trees Dr. Leo Porter Twitter: #UCSDpython4DS . Ilkay Altintas and Dr. By the end of this video. you should be able to: Python for Data Science § Explain how a decision tree is used for classification § Describe the process of constructing a decision tree for classification § Interpret how a decision tree comes up with a classification decision . Decision Tree Overview Python for Data Science • Idea: Split data into “pure” regions Decision Boundaries . Classification Using Decision Tree Root Node Python for Data Science Internal Nodes Leaf Nodes . Classification Using Decision Tree Root Node Python for Data Science Internal Nodes Tree Depth = 3 Tree Size = 6 Leaf Nodes . Target Blooded Birth brate Label Warm- Python for Data Science Blooded Yes Yes Yes Mammal Live Non- Birth Mammal Vertebrate Non- Mammal Non- Mammal Mammal . Example Decision Tree Warm. Live Verte. . Constructing Decision Tree Tree Induction Python for Data Science • Start with all samples at a node. • Partition samples based on input to create purest subsets. • Repeat to partition data into successively purer subsets. Greedy Approach Python for Data Science What’s the best way to split the current node? . How to Determine Best Split? Want subsets to be as homogeneous as possible Python for Data Science Less homogeneous = More pure More homogeneous = More pure . Impurity Measure • To compare different ways to split data in a node Python for Data Science Higher = Less pure Gini Index Lower = More pure . What Variable to Split On? Python for Data Science • Splits on all variables are tested Split on var1. var2. … varN . When to Stop Splitting a Node? Python for Data Science • All (or X% of) samples have same class label • Number of samples in node reaches minimum • Change in impurity measure is smaller than threshold • Max tree depth is reached • Others… . Tree Induction Example: Split 1 Python for Data Science Defaults Repays . Tree Induction Example: Split 2 Python for Data Science Defaults Repays . Tree Induction Example: Split 3 Defaults Repays Python for Data Science . Tree Induction Example • Resulting model Python for Data Science . Decision Boundaries • Rectilinear = Parallel to axes Python for Data Science . Decision Tree for Classification Python for Data Science • Resulting tree is often simple and easy to interpret • Induction is computationally inexpensive • Greedy approach does not guarantee best solution • Rectilinear decision boundaries . Python for Data Science Decision Tree .
Copyright © 2024 DOKUMEN.SITE Inc.