This blog post is a step-by-step guide to writing your first “Hello World” machine learning program in Python.
Step 1 – Overview of Machine Learning
To write the hello world of machine learning, you first need to understand what machine learning is. Without a clear picture of its key concepts, you would be shooting in the dark.
Traditional programming accepts data as input and gives data as output.
However, a machine learning algorithm takes data as input, identifies the trends and patterns in that data, and gives a program as output. This program (also called a model) captures the gist of the data: it is a representation of the patterns that define the data.
Any new data can then be given as input to this model, which will classify it or make predictions about it.
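The difference can be sketched in a few lines of plain Python. This is only an illustration: the function names, the made-up numbers and the very simple model (price = rate × area) are assumptions chosen for the example, not part of any library.

```python
# Traditional programming: a person writes the rule by hand.
def price_by_hand(area_sqft):
    return 100 * area_sqft  # rate picked by the programmer


# Machine learning: the rule is derived from example data.
def learn_price_rule(areas, prices):
    """Return a function (the "model") learned from the examples.

    Assumes the simplest possible model, price = rate * area,
    with the rate chosen by least squares.
    """
    rate = sum(a * p for a, p in zip(areas, prices)) / sum(a * a for a in areas)

    def model(area_sqft):
        return rate * area_sqft

    return model


# Made-up example data, for illustration only.
areas = [1000, 2000, 3000]
prices = [110000, 220000, 330000]

model = learn_price_rule(areas, prices)  # data in, program out
print(price_by_hand(1500))  # 150000
print(model(1500))          # 165000.0
```

In the first function the rate of 100 comes from the programmer; in the second it comes from the data.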
If you understand concepts like supervised learning, regression, classification, unsupervised learning and clustering, you can proceed to step 2. If you don’t, we recommend you watch our Machine Learning Using Python webinar before moving on to step 2: https://www.youtube.com/embed/9ZNITUZvv_I
Step 2 – Why Python?
Hello world is the most basic program one can write on a topic to get a quick understanding of its nuts and bolts. Before we do machine learning, we need to pick a programming language, and that choice deserves careful thought.
Learning a new language is a long, time-consuming process that requires a significant investment of time and energy. That makes it all the more important to pick our programming language for machine learning carefully.
We recommend using Python.
Python is a general-purpose language that has become the programming language of choice for data scientists. It has out-of-the-box libraries that make manipulating data easier.
We have found Python to be a simple, easy-to-learn language that works very well with the requirements of machine learning.
Here are some important reasons why you should consider Python for machine learning:
Open Source – Python is an open source programming language, so it costs nothing to install and run on your computer. It has an active global community that supports it and keeps improving it.
Easy to learn – Python is one of the easiest programming languages to learn. Its syntax is concise and less verbose than that of many other languages.
Let’s compare what it takes to print hello in Java versus Python and see which language is more succinct.
Java code
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
Python code
print("Hello World")
It is obvious why Python is simpler than Java!
Multi platform support – Python is an interpreted programming language, which means that code you write on a Windows machine will generally work on a Linux machine as is. Python follows the philosophy of “write once, use everywhere”.
Out of box libraries – Python has numerous out-of-the-box libraries like NumPy, Pandas, Matplotlib and scikit-learn that make machine learning a breeze. Most of the heavy lifting is done by these libraries, so you can write simple code and delegate the hard work to them.
Top 5 programming language – Python is one of the top 5 programming languages in the world. You can google the top ten languages and read the top sites and blogs returned by the search. You will invariably find Python among the top 5.
Default of data scientists – Python has emerged as the default programming language of data scientists and is widely adopted by them across the world. It makes sense to learn it to fit in.
If you are not yet convinced about machine learning and Python, we recommend you read our blog post “Who is machine learning suitable for and how to go about learning it?”
Now that you are well informed about machine learning using Python, you can move on to the next step.
Step 3 – Install Anaconda
You may ask: why should I install Anaconda? So far we have talked only about Python, so what is Anaconda and why do I need it? With millions of users, Anaconda is the world’s most popular Python data science platform.
Anaconda, Inc. continues to lead open source projects like Anaconda, NumPy and SciPy that form the foundation of modern data science.
In short, installing Anaconda gives you Python together with the data science libraries you need, such as Pandas, NumPy, Matplotlib, scikit-learn and Jupyter Notebook.
Anaconda is a one-stop shop for getting all the relevant libraries you need for your hello world program.
You can find detailed installation instructions at:
If you want to read more about Anaconda you can visit their site at www.anaconda.com
Once you have Anaconda installed on your computer, you are ready to move to the next step.
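If you would like to confirm that the installation worked, a small check like the one below may help. It is a sketch rather than an official Anaconda tool: the helper name `check_libraries` and the list of names are assumptions made for this example. The list uses each library's import name, which is why scikit-learn appears as `sklearn` and Jupyter Notebook as `notebook`.

```python
import importlib.util

# Libraries this guide relies on, listed by their import names (assumed list).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def check_libraries(names):
    """Map each import name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_libraries(REQUIRED)
for name, found in status.items():
    print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Run it with the Python that came with Anaconda. If any library shows up as MISSING, the script was probably run with a different Python installation.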
Step 4 – Understand Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Watch this quick video to understand how to start using Jupyter Notebook on Windows to create your first Python program: https://www.youtube.com/embed/DPi6CAkUUPY
The following video shows how to install Anaconda on a Mac and open Jupyter Notebook: https://www.youtube.com/embed/Habh6W7-FPI
Once you have installed Anaconda on your laptop or desktop and can open Jupyter Notebook, you are ready to move on to the next step.
Step 5 – What is scikit-learn?
It is a Python library implementing supervised and unsupervised learning algorithms. It has many simple and efficient tools for data mining and data analysis. It is open source and commercially usable under the BSD license. It is built on NumPy, SciPy and Matplotlib.
Don’t worry, you won’t have to install it. Anaconda installed scikit-learn for you automatically.
In the hello world program we are going to use the linear regression algorithm already implemented in scikit-learn.
For this step you just need to be aware of the library. You will use it in later steps to create your first hello world program in machine learning.
Step 6 – Understanding Linear Regression
In our hello world example we will use the linear regression algorithm to build our predictive model. In this step we will look at what linear regression is, at a high level.
Linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables.
- One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
- The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
A linear regression model assumes that the relationship between the dependent variable and the independent variable is linear, and the algorithm tries to draw the best-fitting straight line through the given dataset.
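For one predictor x and one response y, the best-fitting line y = intercept + slope × x can be computed directly with the standard least-squares formulas. The sketch below is only an illustration in plain Python: the function name `fit_line` and the toy numbers are made up for this example, and scikit-learn does the equivalent work for you.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that lies exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With real data the points will not lie exactly on a line, and the same formulas give the line that minimizes the sum of squared vertical distances to the points.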
For a nice overview of linear regression, watch the following video: https://www.youtube.com/embed/zPG4NjIkCjc
Once you understand linear regression you are ready to move on to the next step.
Step 7 – Our hello world problem
Let’s say there is a city called Happyville, and we have collected the area in square feet and the price of some houses in that city.
Area (square feet) | Price of the house (dollars) |
---|---|
1100 | 119,000 |
1200 | 126,000 |
1300 | 133,000 |
1400 | 150,000 |
1500 | 161,000 |
1600 | 163,000 |
1700 | 169,000 |
1800 | 182,000 |
1900 | 201,000 |
2000 | 209,000 |
I am looking to buy a house in Happyville. I like a house that is 1750 square feet, and the owner is willing to sell it to me for $179,000.
I want to know if it’s a good deal or a bad deal. In the hello world of machine learning we will solve this problem by predicting the price of a 1750 square foot house in Happyville. Then we will compare the prediction with $179,000 and decide whether the house is expensive, cheap or at par with the market value.
We are now ready to move on to the most interesting step of machine learning. Yes, it’s time to fire up the Jupyter notebook and write our code.
Step 8 – Hello World Program
- Import scikit-learn
- Load the dataset
- Train the model on the dataset
- Use the model to make a prediction
In our case we are going to load our home price dataset and use the linear regression module of scikit-learn to perform machine learning. Once the model is trained, we will use it to make predictions.
################################
# import pandas and scikit-learn
################################
import pandas as pd
from sklearn import linear_model
################################
# load dataset
################################
sqfeet = pd.DataFrame([1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000])
price = pd.DataFrame([119000, 126000, 133000, 150000, 161000, 163000, 169000, 182000, 201000, 209000])
################################
# train the model
################################
model = linear_model.LinearRegression()
model.fit(sqfeet, price)
################################
# make prediction
################################
model.predict(pd.DataFrame([1750]))
OUTPUT
array([[ 181166.66666667]])
The model predicts that the price of a 1750 square foot house in Happyville should be about $181,167. The listed price is $179,000, so it’s fair to say that you are getting the house below the market value.
It’s a good deal!
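The comparison itself can also be written down as a small helper. This is a sketch, not part of scikit-learn: the function name `assess_deal` and the 1% tolerance used to decide what counts as “at par” are assumptions made for this example, and the predicted price is simply the rounded value from the output above.

```python
def assess_deal(predicted, asking, tolerance=0.01):
    """Compare an asking price with the model's predicted price.

    `tolerance` is an assumed cut-off: a gap within 1% of the predicted
    price is treated as "at par" with the market.
    """
    difference = predicted - asking
    relative = difference / predicted
    if relative > tolerance:
        verdict = "below market value"
    elif relative < -tolerance:
        verdict = "above market value"
    else:
        verdict = "at par with market value"
    return difference, relative, verdict


predicted_price = 181166.67  # model output from the step above, rounded
asking_price = 179000        # price quoted by the owner

difference, relative, verdict = assess_deal(predicted_price, asking_price)
print(f"Asking price is ${difference:,.0f} ({relative:.1%}) under the prediction")
print(f"Verdict: {verdict}")
```

With these numbers the asking price comes out roughly $2,167, or about 1.2%, under the prediction. A stricter or looser tolerance would change where the line between a good deal and a fair price falls.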
Congratulations, you have just finished writing your first machine learning program.
How does your first success at machine learning feel? This is just the beginning! Imagine the possibilities that lie before you once you master machine learning using Python.
If you are looking for an in-depth understanding of machine learning, we suggest you consider enrolling in our Machine Learning Using Python course. It’s a cutting-edge, instructor-led online course that puts a lot of emphasis on hands-on exercises to explain complicated concepts and train you for the real world.
If you have any questions you can reach us at our email: ml@mcal.in
We wish you all the best in your machine learning journey.