How to predict rock vs mine using machine learning?

Hello friends! In this article we are going to build a machine learning project: a predictive model that decides whether a set of sonar readings comes from a rock or a mine. For this prediction we use logistic regression, one of the most widely used machine learning algorithms for classification. If anything in this blog is unclear, please chat with me in the live chat section.

Let's start.

Before we start the coding part, we need to be clear on a few theory concepts, so we will go through the basics first. If you want to study these concepts in depth, you can find our other articles using the search box in the top right corner.

Problem statement:

Picture a submarine travelling through water where naval mines are present. There is a risk that the submarine collides with a mine, and there is also a risk that it collides with a rock.

If the submarine sends out sonar sound signals and we can predict from the returned signals whether the object ahead is a rock or a mine, the crew can react in time, for example by changing direction or by keeping a sufficient distance from a mine. That is why we need a prediction model.

Workflow:

Step 1: Collection of Data

In this step we generate the data with a lab technique: we take a metal cylinder and some rocky material and send sonar sound signals at them. Some of the signals bounce back from the metal and the rock, and we collect these returned signals as our data.

We use a metal cylinder because mines are made of metal.

Step 2: Data pre-processing

Data pre-processing means examining the data to learn more about it. Because the data was generated in a lab, we need to understand it before we can apply machine learning algorithms to it.

For example:

A teacher who does not understand a concept cannot teach it to the students. In the same way, if we do not know the details of our data, we cannot use machine learning algorithms on it properly. So we examine the data and compute the mean, mode, median, maximum, minimum, standard deviation and so on.
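As a small sketch of what "knowing the data" can look like, the code below computes a few of these statistics with pandas. The toy numbers and the column names feature_a and feature_b are made up for illustration; they are not taken from the sonar file.

```python
import pandas as pd

# Hypothetical toy readings (NOT from the sonar dataset), one column per feature.
toy = pd.DataFrame({
    "feature_a": [0.02, 0.04, 0.06, 0.04],
    "feature_b": [0.10, 0.30, 0.20, 0.40],
})

# Collect a few per-column statistics into one summary table.
summary = pd.DataFrame({
    "mean": toy.mean(),
    "median": toy.median(),
    "min": toy.min(),
    "max": toy.max(),
    "std": toy.std(),  # sample standard deviation (ddof=1), the pandas default
})
print(summary)
```

Each row of the summary describes one feature, so unusual columns are easy to spot before any model is trained.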

Step 3: Splitting the data into training and test data

Once we understand the data, we train the model on one part of it and test the model on the rest. For that we split the data in a specific way.

For example:

If we have 200 instances (an instance is one example in the data), we can use 10% of them for testing and the rest for training. That means 20 of the 200 instances (10%) are for testing and 180 are for training. This is how we split the data into training data and testing data.
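The 200-instance example can be sketched with scikit-learn's train_test_split. The feature values and labels here are placeholders I made up only to get 200 rows; the variable names are also just for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 200 hypothetical instances with 3 features each (placeholder values only).
features = np.arange(600).reshape(200, 3)
labels = np.array([0, 1] * 100)

# Hold out 10% of the instances for testing.
train_X, test_X, train_y, test_y = train_test_split(
    features, labels, test_size=0.1, random_state=0
)
print(train_X.shape, test_X.shape)  # (180, 3) (20, 3)
```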

Step 4: Logistic Regression Model

Logistic regression is the machine learning algorithm we use for prediction. There are many algorithms, but we use only logistic regression in our predictive system because we are predicting rock vs mine from independent variables (the sonar readings) and a dependent variable (the label). Logistic regression uses the sigmoid function, which converts a combination of the independent variables into a probability between 0 and 1 for the dependent variable. The two outcomes, 0 and 1, are categorical (binary) data, so logistic regression is a supervised learning method for binary classification. In our model the object is either a rock or a mine, that is, either 1 or 0, so we are doing binary classification, and logistic regression is a good fit for our project.
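To make the sigmoid idea concrete, here is a minimal sketch of the function itself. The input scores and the 0.5 cut-off are assumptions for illustration; this is not the internal code of scikit-learn's LogisticRegression.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores produced by a linear combination of the features.
scores = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(scores)

# Assumed decision rule: probability of at least 0.5 means class 1, else class 0.
predicted_class = (probabilities >= 0.5).astype(int)
print(probabilities)
print(predicted_class)  # [0 1 1]
```

Large negative scores give probabilities near 0, large positive scores give probabilities near 1, and a score of 0 gives exactly 0.5.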

After this whole process we have a trained logistic regression model that we can use as a predictive system. The model has learned what a rock looks like and what a mine looks like, because we trained it using the four steps given above. When we give new data to the trained model, it predicts whether that data belongs to a rock or a mine.


So that is all about the theory. Now let's move on to the coding part.

Coding part:

Before we start coding we need the data file that was generated in the lab. We do not have time to go to a lab and generate the data ourselves, so we use an existing raw data file, which you can download from the description of our YouTube video or from the following link:

Code file

We write this project in Python. Don't worry if you do not know Python yet, because we will go through the code in detail. You can use any code editor you like; otherwise use Google Colab. In this tutorial we use Google Colab.

So let's start

Step 1: Upload the data file to Google Colab

Download the code file ----> open Google Chrome ----> search for Google Colab ----> create a new notebook ----> click on the folder icon (left side, middle position) ----> click on upload ----> press OK

Step 2: Import libraries and functions

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

1st line: we import numpy as np. Here np is a short alias for NumPy that we use throughout the program. NumPy (Numerical Python) is a Python library for fast numerical operations on arrays. Its core is written in C, which is why its array operations are fast, and that is why we use it.

2nd line: we have a raw data file that we want to load into our program. We import pandas so that we can load the file into a pandas data frame and work with it.

3rd line: we have to split our data as we saw in the workflow. To avoid doing this by hand, we import the train_test_split function from sklearn.model_selection. This module belongs to scikit-learn, a third-party machine learning library for Python (it is not part of Python's standard library).

4th line: we import the LogisticRegression model from sklearn.linear_model, which is also part of scikit-learn.

5th line: we import the accuracy_score function from sklearn.metrics, which is also part of scikit-learn, to check the accuracy of our model.

Step 3: Loading our file into a pandas data frame

sonar_data = pd.read_csv('/content/Copy of sonar data.csv', header=None)

In this code we create the variable sonar_data and store our data file in it by giving the file path (location) in single quotes. Here pd is the alias for pandas, and read_csv reads the data from the .csv file and returns it as a data frame, which we store in sonar_data. Our file has no header row, so we pass header=None.

Printing the head of our data file:
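To see why header=None matters, here is a small sketch that reads the same text twice. The two CSV rows are made up and shortened to three values plus a label; only the layout imitates the sonar file.

```python
import io
import pandas as pd

# Two hypothetical rows: three feature values followed by a label.
csv_text = "0.02,0.03,0.04,R\n0.05,0.06,0.07,M\n"

# header=None: every line is data and the columns are numbered 0, 1, 2, ...
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

# Default behaviour: the first line is consumed as column names.
default_header = pd.read_csv(io.StringIO(csv_text))

print(with_header_none.shape)          # (2, 4)
print(list(with_header_none.columns))  # [0, 1, 2, 3]
print(default_header.shape)            # (1, 4)
```

Without header=None the first instance would silently become the column names and be lost from the data.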

sonar_data.head()

This short line prints the head of our file, that is, its first 5 rows (indexed 0 to 4). The output has 61 columns, numbered 0 to 60 because counting starts from zero. Column 60 contains the labels, which tell us whether the row is for a rock or a mine (here R = rock and M = mine).

sonar_data.shape

sonar_data.shape gives the shape of the data stored in sonar_data, which is (208, 61). That means there are 208 rows and 61 columns.

sonar_data.describe()

sonar_data.describe() gives a statistical summary of the numeric columns in sonar_data. We want to know more about our data, which is why we call it. For each column the output shows the number of instances (count), mean, standard deviation (std), minimum value (min), the 25th, 50th and 75th percentiles (25%, 50%, 75%), and the maximum (max). These statistics can play an important role in data processing and in understanding the data.

sonar_data[60].value_counts()

sonar_data[60].value_counts() counts how many instances there are for each label, that is, how many rows belong to rocks and how many to mines (here R = rock and M = mine). We get 111 instances for mines and 97 instances for rocks.

sonar_data.groupby(60).mean()

With this code we group the instances by label and compute the mean of every column for each group. The result has only two rows: one for mines (M) and one for rocks (R).
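The same two calls can be tried on a tiny table to see what they return. The values below are invented for illustration; column 2 plays the role that column 60 plays in the sonar data.

```python
import pandas as pd

# Hypothetical mini-dataset: two feature columns (0 and 1) and a label column (2).
toy = pd.DataFrame({
    0: [0.10, 0.30, 0.20, 0.40, 0.60],
    1: [1.0, 3.0, 2.0, 4.0, 6.0],
    2: ["R", "R", "M", "M", "M"],
})

counts = toy[2].value_counts()       # number of rows per label
group_means = toy.groupby(2).mean()  # one row of column means per label

print(counts)
print(group_means)
```

Comparing the two rows of group_means column by column gives a first hint of which features differ between the two classes.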

X = sonar_data.drop(columns=60)
Y = sonar_data[60]
print(X)
print(Y)

Now we separate the data from the labels. We declare two variables, X and Y. X holds all the data except the last column (column 60), and Y holds that dropped column, the labels. Printing both shows all the data first and then all the labels (R = rock and M = mine).

Step 4: Training and test data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, stratify=Y, random_state=1)

Here we create 4 variables, X_train, X_test, Y_train and Y_test, using the train_test_split function that we imported from scikit-learn.

Here X is the data and Y is the labels of that data. We set test_size to 0.1, which means 10% of our data is set aside for testing. We also pass stratify=Y, which means the split is made on the basis of the labels, so the training and test sets keep roughly the same proportion of rocks and mines as the full dataset. random_state fixes the seed of the random shuffling: if I use random_state=1 and you use random_state=2, our splits will be different, but the same value always reproduces the same split.

Now we check how much data we have for training and for testing by printing the shapes.
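Here is a small sketch of what stratify and random_state do. It uses only the class counts from our output (111 mines, 97 rocks); the single feature column is a placeholder row number, and the helper stratified_split is a name made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the class counts reported above; the feature is just a row number.
labels = np.array(["M"] * 111 + ["R"] * 97)
features = np.arange(len(labels)).reshape(-1, 1)

def stratified_split(seed):
    """Return (test row numbers, test labels) of a stratified 90/10 split."""
    _, test_rows, _, test_labels = train_test_split(
        features, labels, test_size=0.1, stratify=labels, random_state=seed
    )
    return test_rows.ravel(), test_labels

rows_a, labels_a = stratified_split(1)
rows_b, _ = stratified_split(1)
rows_c, _ = stratified_split(2)

print(len(labels_a))                                     # size of the test set
print(np.mean(labels == "M"), np.mean(labels_a == "M"))  # mine share: full data vs test set
print(np.array_equal(rows_a, rows_b))                    # same seed, same split
print(np.array_equal(rows_a, rows_c))                    # a different seed usually picks different rows
```

The mine share in the test set stays close to the share in the full data, which is the point of stratifying.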

print(X.shape, X_train.shape, X_test.shape)

Here we get the output (208, 60) (187, 60) (21, 60). That means we start with 208 rows and 60 columns; of these, 187 rows are used as training data and the remaining 21 rows as test data, each with 60 columns.

print(X_train)
print(Y_train)

Here we just print the training data and the training labels. The output shows the training data first and the training labels below it.

Step 5: Applying the logistic regression model

model = LogisticRegression()

Here we create another variable, model, and store our logistic regression model in it. Now let's move ahead.

#training the Logistic Regression model with training data
model.fit(X_train, Y_train)

Here we fit our logistic regression model to the training data and the training labels.

#accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In this code we use the trained logistic regression model to predict labels for the training data and store the accuracy score in the training_data_accuracy variable. Now we want to print it.
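As a quick sketch of what accuracy_score measures, the code below compares it with a hand calculation: the fraction of predictions that match the true labels. The five labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for five objects.
true_labels = np.array(["R", "M", "M", "R", "M"])
predicted = np.array(["R", "M", "R", "R", "M"])

# Accuracy by hand: share of positions where prediction and truth agree.
manual_accuracy = np.mean(true_labels == predicted)

# The same value from scikit-learn (true labels first, predictions second).
library_accuracy = accuracy_score(true_labels, predicted)

print(manual_accuracy, library_accuracy)  # 4 of 5 correct in both cases
```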

print('Accuracy on training data : ', training_data_accuracy)

Here we get 0.83... as output, which means the model is about 83% accurate on the training data. This is a reasonably good accuracy. To increase the accuracy score we would need more data, and if the lab produces a very large amount of data in the future, we would also need big data tools such as Hadoop to store and process it.

X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In the code above we test the trained model on the test data and store the test accuracy in the test_data_accuracy variable. Now we want to print it.

print('Accuracy on test data : ', test_data_accuracy)

Here we get 0.76... as output, which means the model is about 76% accurate on the test data. For our purposes we treat this as a good enough score, considering how little data we have. Finally, we want to build our predictive system on top of the trained logistic regression model.
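To put the test accuracy in context, here is a small sketch that uses only the counts we printed earlier. The baseline idea (always predicting the larger class) is my assumption for comparison; it is not something the model code above does.

```python
# Counts taken from the outputs shown earlier in this article.
mines, rocks = 111, 97  # from sonar_data[60].value_counts()
test_rows = 21          # from X_test.shape

# Accuracy of a "model" that always answers with the more common label.
baseline_accuracy = max(mines, rocks) / (mines + rocks)

# With only 21 test rows, one extra right or wrong answer moves accuracy this much.
step_per_test_row = 1 / test_rows

print(f"always-predict-mine baseline: {baseline_accuracy:.3f}")  # 0.534
print(f"accuracy change per test row: {step_per_test_row:.3f}")  # 0.048
```

So the test score is clearly above this baseline, but with so few test rows a single prediction changes it by almost five percentage points.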

input_data = (0.0307,0.0523,0.0653,0.0521,0.0611,0.0577,0.0665,0.0664,0.1460,0.2792,0.3877,0.4992,0.4981,0.4972,0.5607,0.7339,0.8230,0.9173,0.9975,0.9911,0.8240,0.6498,0.5980,0.4862,0.3150,0.1543,0.0989,0.0284,0.1008,0.2636,0.2694,0.2930,0.2925,0.3998,0.3660,0.3172,0.4609,0.4374,0.1820,0.3376,0.6202,0.4448,0.1863,0.1420,0.0589,0.0576,0.0672,0.0269,0.0245,0.0190,0.0063,0.0321,0.0189,0.0137,0.0277,0.0152,0.0052,0.0121,0.0124,0.0055)

# changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the np array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
# print(prediction)

if (prediction[0]=='R'):
  print('The object is a Rock')
else:
  print('The object is a mine')

In the code above we build the predictive system around our trained logistic regression model. We create the variable input_data and put in it the readings we want to test. In this example I took the input data from a mine row in the csv file, which is why the output says the object is a mine. The message itself comes from the if-else condition in the program. We also reshape the data because we are predicting for only one instance: the model was trained on a 2D table of many rows with 60 columns each, so it expects 2D input, and reshape(1, -1) turns our 60 values into one row with 60 columns.

So that's all about this project. I hope it is helpful for all of you. If you have any questions, you can comment or contact us, or send me your question in the live chat section (to our bot).
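The effect of the reshape is easiest to see on the shapes themselves. This sketch uses only three made-up values instead of the full 60 to keep it short.

```python
import numpy as np

# A shortened, hypothetical instance: 3 readings instead of 60.
single_instance = np.asarray((0.0307, 0.0523, 0.0653))
print(single_instance.shape)  # (3,)  a flat, 1D array

# reshape(1, -1) means: 1 row, and as many columns as needed.
reshaped = single_instance.reshape(1, -1)
print(reshaped.shape)         # (1, 3)  one row, three columns
```

With the real 60-value input the shapes are (60,) before and (1, 60) after, which matches the row-by-column layout the model was trained on.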

Thank you so much !!

- SWARAJ DUDHE
