How to Predict Rocks vs. Mines Using Machine Learning
Hello friends, in this article we are going to build a machine learning project: a predictive model that decides whether a sonar reading comes from a rock or a mine. For this prediction we use logistic regression, a simple and widely used algorithm for binary classification. If anything in this blog is unclear, please chat with me in the live chat section.
Let's start.
Before we start the coding part we need to be clear on some theory, so we first go through the basic concepts. If you want to study these concepts in depth, you can find our other articles using the search box in the top right corner.
Problem statement:
Picture a submarine passing over mines. A submarine travelling underwater can collide with a mine, and it can also collide with rocks.
If we send out sonar sound signals and use the signals that come back to predict whether the object ahead is a rock or a mine, the submarine can react in time: it can change direction to avoid a rock, and it can keep a sufficient distance from a mine. That is why we need a prediction model.
Workflow:
Step 1: Collection of data
In this step the data is generated in a lab. A metal cylinder and rocky material are hit with sonar sound signals, and the signals that bounce back from each object are recorded as the data.
A metal cylinder is used because mines are made of metal.
Step 2: Data pre-processing
Data pre-processing means examining the data to learn more about it. Because this data was generated in a lab, we need to understand it before we apply machine learning algorithms for prediction.
For example, a teacher who does not understand a concept cannot teach it to the students. In the same way, if we do not know the details of our data, we cannot use machine learning algorithms on it properly. So we explore the data by computing the mean, mode, median, maximum, minimum, standard deviation, and so on.
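The statistics mentioned above can be sketched with pandas as follows. This is only a minimal illustration: the five readings are made-up values, not taken from the sonar file, and the variable names are my own.

```python
import pandas as pd

# Hypothetical toy readings; the real dataset has 60 numeric columns.
readings = pd.Series([0.02, 0.05, 0.05, 0.11, 0.27])

summary = {
    "mean": readings.mean(),
    "median": readings.median(),
    "mode": readings.mode().iloc[0],   # most frequent value
    "min": readings.min(),
    "max": readings.max(),
    "std": readings.std(),             # sample standard deviation (ddof=1)
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Later in the coding part, describe() reports most of these numbers for every column in one call.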
Step 3: Splitting the data into training and test sets
Once we understand the data, we train the model on one part of it and test the model on another part. For that we split the data in a specific way.
For example, if we have 200 instances (here an instance means one example in the data) and use 10% for testing, then 20 instances are for testing and the remaining 180 are for training. This is how we split the data into training data and testing data.
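The 200-instance example can be tried out directly. The sketch below uses made-up features and labels (X_demo and y_demo are illustrative names, not part of the project) just to show the sizes of the two parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 200 made-up instances with 3 features each, plus made-up labels.
X_demo = np.arange(600).reshape(200, 3)
y_demo = np.array(["R", "M"] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0
)
print(X_tr.shape, X_te.shape)
```

With test_size=0.1, 20 of the 200 instances end up in the test part and 180 in the training part.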
Step 4: Logistic regression model
Logistic regression is the machine learning algorithm we use for prediction. There are many algorithms, but we use only logistic regression in our predictive system, because we are predicting rock vs. mine from independent variables (the sonar readings) and one dependent variable (the label). Logistic regression uses the sigmoid function, which converts a weighted combination of the independent variables into a probability between 0 and 1 for the dependent variable. The two classes are coded as 0 and 1, which is categorical (binary) data, so logistic regression is a supervised learning method for binary classification. In our model the object is either a rock or a mine, that is, either 1 or 0, so we are doing binary classification, and logistic regression is a good fit for our predictive model.
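To make the sigmoid idea concrete, here is a minimal sketch. The sigmoid itself is the standard formula 1 / (1 + e^(-z)); the weights, bias and feature values are hypothetical numbers chosen only for illustration, not values learned from the sonar data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias and a 3-feature instance, for illustration only.
weights = np.array([0.8, -1.2, 0.5])
bias = 0.1
features = np.array([0.3, 0.1, 0.6])

z = np.dot(weights, features) + bias    # linear combination of the features
probability = sigmoid(z)                # probability of the class coded as 1
predicted_class = int(probability >= 0.5)
print(probability, predicted_class)
```

A probability of 0.5 or more is read as class 1 and anything below as class 0, which is how a probability becomes a rock-or-mine decision.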
After this whole process we have a trained logistic regression model, which we use as our predictive system. This model knows what a rock looks like and what a mine looks like, because we trained it using the four steps given above. So when we provide new data to the trained model, it predicts whether that data is for a rock or a mine.
That is all for the theory. Now let's move on to the coding part.
Coding part:
Before starting the coding part we need the data file that was generated in the lab. Since we cannot go to a lab and generate the data ourselves, we use an existing raw data file, which you can download from the description of our YouTube video or from the following link
We write this project in Python. Don't worry if you do not know Python yet, because we go through the code in detail. You can use any editor you like; in this tutorial we use Google Colab.
So let's start.
Step 1: Upload the data file to Google Colab
Download the data file ----> open Google Chrome ----> search for Google Colab ----> create a new notebook
----> click on the folder icon (left side, middle position) ----> click on upload ----> press OK
Step 2: Import libraries and functions
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
1st line: we import numpy as np. Here np is a short alias for numpy that we use throughout the program. NumPy (Numerical Python) is a library for fast numerical operations on arrays. Its core is implemented in C, which is why it is fast, and that is why we use it.
2nd line: we import pandas as pd. We have a raw data file that we want to use in our program, so we load it into a pandas DataFrame.
3rd line: as we saw in the workflow, we have to split the data before training. To avoid doing this by hand we import the train_test_split function from sklearn.model_selection, which is part of scikit-learn, a third-party machine learning library for Python.
4th line: we import the LogisticRegression model from sklearn.linear_model, also part of scikit-learn.
5th line: we import the accuracy_score function from sklearn.metrics, also part of scikit-learn, to check the accuracy of our model.
Step 3: Loading our file into a pandas DataFrame
sonar_data = pd.read_csv('/content/Copy of sonar data.csv', header=None)
In this code we create the sonar_data variable and store our data file in it by giving the file path (location) in single quotes. Here pd is the alias for pandas, and read_csv reads the data from the .csv file and returns it as a DataFrame, which we store in sonar_data. Our file has no header row, so we pass header=None.
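The effect of header=None is easy to see on a tiny example. In the sketch below the two CSV rows are made up and are read from a string instead of a file, so it runs without the real data file.

```python
import io
import pandas as pd

# Two made-up rows standing in for the real file, which has 61 columns.
csv_text = "0.02,0.03,0.04,R\n0.05,0.06,0.07,M\n"

demo = pd.read_csv(io.StringIO(csv_text), header=None)
print(demo.columns.tolist())   # integer column labels, since there is no header row
print(demo.shape)
```

Because there is no header row, pandas labels the columns 0, 1, 2, and so on. That is why the label column of the sonar file is later referred to as column 60.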
Printing the head of our data file:
sonar_data.head()
This short line prints the first rows of our file. The output shows the first 5 rows (numbered 0 to 4) and 61 columns (numbered 0 to 60). Column 60 contains the labels, which tell us whether the row is for a rock or a mine (here R = rock and M = mine).
sonar_data.shape
sonar_data.shape gives the shape of the data stored in sonar_data, which is (208, 61). That means there are 208 rows and 61 columns.
sonar_data.describe()
sonar_data.describe() gives a statistical summary of the data stored in sonar_data. For every numeric column it reports the number of instances (count), mean, standard deviation (std), minimum value (min), the 25th, 50th and 75th percentiles (25%, 50%, 75%), and the maximum (max). These numbers are often important for data processing and for understanding the data.
sonar_data[60].value_counts()
sonar_data[60].value_counts() counts how many instances (rows) there are for each label in column 60 (here R = rock and M = mine). We got 111 instances for mine and 97 instances for rock.
sonar_data.groupby(60).mean()
With this code we group the instances by label and compute the mean of every column within each group. The output has only two rows: one for rock and one for mine.
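Counting labels and averaging per label can be checked by hand on a miniature table. The four rows below are made up, and the label sits in column 2 instead of column 60; only the pattern is the same as in the project.

```python
import pandas as pd

# Made-up miniature dataset: two feature columns (0, 1) and a label column (2).
toy = pd.DataFrame({
    0: [0.10, 0.30, 0.20, 0.40],
    1: [0.50, 0.70, 0.60, 0.80],
    2: ["R", "M", "R", "M"],
})

counts = toy[2].value_counts()        # number of rows per label
class_means = toy.groupby(2).mean()   # one row of column means per label
print(counts)
print(class_means)
```

If the per-label means of a column differ clearly between R and M, that column carries some information the model can use to tell the two classes apart.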
X = sonar_data.drop(columns=60)
Y = sonar_data[60]
print(X)
print(Y)
Now I want to separate the data from the labels. For that I declare two variables, X and Y. X holds all the data except the last column (column 60), which is dropped, and Y holds that dropped column, which contains the labels. Printing both variables shows all the data first and then all the labels (R = rock and M = mine) below it.
Step 4: Training and test data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, stratify=Y, random_state=1)
This line produces four variables, X_train, X_test, Y_train and Y_test, using the train_test_split function that we imported from the scikit-learn library.
Here X is the data and Y is the labels of that data. test_size = 0.1 means that 10% of our data is set aside for testing. stratify=Y means the split is done on the basis of the labels, so the proportion of rocks and mines stays roughly the same in the training and test sets. random_state controls the random shuffling used for the split: if I use random_state 1 and you use random_state 2, our splits will be different, while the same value always reproduces the same split.
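The roles of stratify and random_state can be observed on made-up data. In this sketch the labels only copy the class counts reported earlier (111 mines, 97 rocks); the feature values and the helper function split are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data with the same class counts as the sonar file: 111 mines, 97 rocks.
labels = np.array(["M"] * 111 + ["R"] * 97)
features = np.arange(len(labels)).reshape(-1, 1)

def split(seed):
    """Stratified 90/10 split with a given random seed (illustrative helper)."""
    return train_test_split(
        features, labels, test_size=0.1, stratify=labels, random_state=seed
    )

_, X_te_a, _, y_te_a = split(1)
_, X_te_b, _, y_te_b = split(1)   # same seed: identical split
_, X_te_c, _, y_te_c = split(2)   # different seed: most likely a different split

print(np.unique(y_te_a, return_counts=True))   # class counts in the test set
print(np.array_equal(X_te_a, X_te_b))          # same seed gives the same rows
```

The test set keeps roughly the same mine-to-rock ratio as the full data, and repeating the call with the same seed returns exactly the same rows.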
Now we check how much data went to testing and how much to training, so we print the shapes.
print(X.shape, X_train.shape, X_test.shape)
Here we get the output (208, 60) (187, 60) (21, 60). That means the full data has 208 rows and 60 columns, of which 187 rows are used as training data and the remaining 21 rows as test data, each with 60 columns.
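The 187 and 21 follow from simple arithmetic. As far as I understand scikit-learn's behaviour, a fractional test_size is rounded up to a whole number of test rows, which the sketch below reproduces with plain Python.

```python
import math

n_samples = 208
test_size = 0.1

n_test = math.ceil(test_size * n_samples)   # 20.8 rounds up to 21
n_train = n_samples - n_test                # the rest goes to training
print(n_train, n_test)
```

So 10% of 208 is 20.8 rows, which becomes 21 test rows and leaves 187 rows for training.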
print(X_train)
print(Y_train)
Here we just print the training data and its labels: the output shows the training data first and the training labels below it.
Step 5: Applying the logistic regression model
model = LogisticRegression()
Here we create another variable, model, and store our logistic regression model in it. Now let's move ahead.
#training the Logistic Regression model with training data
model.fit(X_train, Y_train)
Here we fit (train) our logistic regression model on the training data and training labels.
#accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
In this code we use the trained logistic regression model to predict the labels of the training data, compare the predictions with the true labels, and store the accuracy score in the training_data_accuracy variable. Now we print it.
print('Accuracy on training data : ', training_data_accuracy)
Here we got 0.83... as output, which means the model classifies about 83% of the training data correctly. That is reasonably good for such a small dataset. To increase the accuracy score we need more data, and if the lab produces very large amounts of data in the future we would also need big-data tools such as Hadoop to store and process it.
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
In the code above we test the trained model on the test data and store the test accuracy in the test_data_accuracy variable. Now we print it.
print('Accuracy on test data : ', test_data_accuracy)
Here we got 0.76... as output, which means the model classifies about 76% of the test data correctly. We treat this as an acceptable score here, because we have only a small amount of data (the test set has just 21 rows). Finally, we build our predictive system using the trained logistic regression model.
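Accuracy is simply the fraction of predictions that match the true labels. The sketch below checks this on five made-up labels, computing it once by hand and once with accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions, for illustration only.
y_true = np.array(["R", "M", "M", "R", "M"])
y_pred = np.array(["R", "M", "R", "R", "M"])

manual_accuracy = np.mean(y_true == y_pred)         # fraction of matching labels
library_accuracy = accuracy_score(y_true, y_pred)   # same quantity from scikit-learn
print(manual_accuracy, library_accuracy)
```

Four of the five predictions match, so both values are 0.8. In the same way, 0.76 on the test data means that roughly three out of four test rows were predicted correctly.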
input_data = (0.0307,0.0523,0.0653,0.0521,0.0611,0.0577,0.0665,0.0664,0.1460,0.2792,0.3877,0.4992,0.4981,0.4972,0.5607,0.7339,0.8230,0.9173,0.9975,0.9911,0.8240,0.6498,0.5980,0.4862,0.3150,0.1543,0.0989,0.0284,0.1008,0.2636,0.2694,0.2930,0.2925,0.3998,0.3660,0.3172,0.4609,0.4374,0.1820,0.3376,0.6202,0.4448,0.1863,0.1420,0.0589,0.0576,0.0672,0.0269,0.0245,0.0190,0.0063,0.0321,0.0189,0.0137,0.0277,0.0152,0.0052,0.0121,0.0124,0.0055)
# changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)
# reshape the np array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
prediction = model.predict(input_data_reshaped)
# print(prediction)
if (prediction[0]=='R'):
    print('The object is a Rock')
else:
    print('The object is a mine')
In the code above we use our trained logistic regression model as a predictive system. We create the variable input_data and put in it the 60 values we want to test. In this example I took the input data from a row of the csv file that is labelled as a mine, so the output says the object is a mine; the message comes from the if-else condition in the program. We also reshape the data because we are predicting for only one instance. The model was trained on 2-D data (many rows with 60 columns each), so it expects a 2-D input, and reshape(1, -1) turns our 60 values into one row with 60 columns.
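What reshape(1, -1) does can be seen from the shapes alone. The 60 values below are made up with np.linspace and only stand in for one row of sonar readings.

```python
import numpy as np

# A made-up instance with 60 values, standing in for one row of sonar readings.
one_instance = np.linspace(0.0, 1.0, 60)
print(one_instance.shape)              # (60,): a 1-D array of 60 values

as_row = one_instance.reshape(1, -1)   # 1 row; -1 lets NumPy infer the 60 columns
print(as_row.shape)                    # (1, 60): one instance with 60 features
```

scikit-learn models expect input of shape (number of instances, number of features), so a single instance has to be passed as one row rather than as a flat list of values.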
So that's all about this project. I hope it is helpful for all of you. If you have any questions, you can leave a comment or contact us, or send your question in the live chat section (to our bot).
Thank you so much !!
- SWARAJ DUDHE