# Imports used in this section
import numpy as np
import sklearn.datasets

# Generate a dataset and plot it
np.random.seed(0)
# This is the two-moons dataset; it contains some noise, and the goal is to separate the two classes
X, y = sklearn.datasets.make_moons(200, noise=0.15)
y = y.reshape(200, 1)
from keras.layers import Dense, Activation
from keras.models import Sequential
model = Sequential()
model.add(Dense(3, input_dim=2))
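The Keras cells above stop after adding the first Dense layer. For reference, a minimal, hypothetical sketch of how one could finish and train this Keras counterpart of the network built from scratch below is shown here; the activation choices, optimizer, and epoch count are assumptions and not taken from the original notebook.

```python
# Hypothetical completion of the Keras model above (a sketch, not the original code):
# tanh hidden activation and a sigmoid output unit, trained with binary cross-entropy,
# mirroring the NumPy network implemented later in this post.
model.add(Activation('tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=1000, verbose=0)  # epoch count chosen to match the NumPy training run below
```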
# Backward propagation function
def backward_prop(model, cache, y):
    # Load parameters from model
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Load forward propagation results
    a0, a1, a2 = cache['a0'], cache['a1'], cache['a2']
    # Backpropagation
    # Calculate loss derivative with respect to output
    # For a sigmoid output with BCE loss this is directly the derivative with respect to z2
    dz2 = bce_loss_derivative(y=y, y_hat=a2)
    # Calculate loss derivative with respect to second layer weights
    dW2 = (a1.T).dot(dz2)
    # Calculate loss derivative with respect to second layer bias
    # db2 is not one value per example but a single value per output unit, so sum over the batch
    db2 = np.sum(dz2, axis=0, keepdims=True)
    # Calculate loss derivative with respect to first layer
    dz1 = dz2.dot(W2.T) * tanh_derivative(a1)
    # Calculate loss derivative with respect to first layer weights
    dW1 = np.dot(a0.T, dz1)
    # Calculate loss derivative with respect to first layer bias
    db1 = np.sum(dz1, axis=0)
    # Store gradients
    grads = {'dW2': dW2, 'db2': db2, 'dW1': dW1, 'db1': db1}
    return grads
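backward_prop relies on forward_prop, bce_loss_derivative, and tanh_derivative, which are defined earlier in the notebook and not reproduced in this excerpt. A minimal sketch of matching helpers, assuming a tanh hidden layer and a sigmoid output trained with binary cross-entropy, is given below; the function names follow the calls above, but the exact original implementations may differ.

```python
import numpy as np

def sigmoid(z):
    # Logistic output activation
    return 1 / (1 + np.exp(-z))

def tanh_derivative(a):
    # Derivative of tanh expressed through its output a = tanh(z)
    return 1 - a ** 2

def bce_loss(y, y_hat):
    # Mean binary cross-entropy over the batch
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

def bce_loss_derivative(y, y_hat):
    # For a sigmoid output, the per-example BCE gradient w.r.t. z2 collapses to (y_hat - y);
    # backward_prop uses this directly as dz2 (a 1/m factor, if any, can be folded into the learning rate)
    return y_hat - y

def forward_prop(model, a0):
    # Forward pass: tanh hidden layer, sigmoid output; cache activations for backprop
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    z1 = a0.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = sigmoid(z2)
    return {'a0': a0, 'z1': z1, 'a1': a1, 'z2': z2, 'a2': a2}
```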
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry, it just generates the contour plot below.
import matplotlib.pyplot as plt

def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap=plt.cm.Spectral)
# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.15)
y = y.reshape(200, 1)
plt.scatter(X[:, 0], X[:, 1], s=40, c=y.flatten(), cmap=plt.cm.Spectral)
[Output: scatter plot of the two noisy, moon-shaped classes]
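For reference, sklearn.datasets.make_moons returns X with shape (200, 2) and labels y with shape (200,); reshaping y to (200, 1) turns it into a column vector so that it lines up element-wise with the network's (200, 1) output in the loss and gradient computations below.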
def predict(model, x):
    # Do forward pass
    c = forward_prop(model, x)
    # Get y_hat
    y_hat = c['a2']
    # Threshold at 0.5 to turn probabilities into hard 0/1 predictions
    y_hat[y_hat >= 0.5] = 1
    y_hat[y_hat < 0.5] = 0
    return y_hat
def calc_accuracy(model, x, y):
    # Get total number of examples
    m = y.shape[0]
    # Do a prediction with the model
    pred = predict(model, x)
    # Ensure prediction and truth vector y have the same shape
    pred = pred.reshape(y.shape)
    # Calculate the number of wrong examples
    error = np.sum(np.abs(pred - y))
    # Calculate accuracy as a percentage
    return (m - error) / m * 100
def initialize_parameters(nn_input_dim, nn_hdim, nn_output_dim):
    # First layer weights
    W1 = 2 * np.random.randn(nn_input_dim, nn_hdim) - 1
    # First layer bias
    b1 = np.zeros((1, nn_hdim))
    # Second layer weights
    W2 = 2 * np.random.randn(nn_hdim, nn_output_dim) - 1
    # Second layer bias
    b2 = np.zeros((1, nn_output_dim))
    # Package and return model
    model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    return model
def train(model, X_, y_, learning_rate, num_passes, print_loss):
    for i in range(0, num_passes):
        # Forward propagation
        cache = forward_prop(model, X_)
        # Backpropagation
        grads = backward_prop(model, cache, y_)
        # Gradient descent parameter update
        # Assign new parameters to the model
        model = update_parameters(model=model, grads=grads, learning_rate=learning_rate)
        # Print loss & accuracy every 100 iterations
        if print_loss and i % 100 == 0:
            y_hat = cache['a2']
            print('Loss after iteration', i, ':', bce_loss(y_, y_hat))
            print('Accuracy after iteration', i, ':', calc_accuracy(model, X_, y_), '%')
    return model
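The training loop also calls update_parameters, which is defined earlier in the notebook and not reproduced in this excerpt. A minimal sketch of a plain gradient-descent update matching the gradient names used above (an assumption about the original implementation, not a copy of it) would be:

```python
def update_parameters(model, grads, learning_rate):
    # Vanilla gradient descent: step each parameter against its gradient
    model['W1'] -= learning_rate * grads['dW1']
    model['b1'] -= learning_rate * grads['db1']
    model['W2'] -= learning_rate * grads['dW2']
    model['b2'] -= learning_rate * grads['db2']
    return model
```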
# Hyperparameters
hidden_layer_size = 3  # I picked this value because it showed good results in my experiments
learning_rate = 0.01
# Initialize the parameters to random values. We need to learn these.
np.random.seed(0)
# This is what we return at the end
model = initialize_parameters(nn_input_dim=2, nn_hdim=hidden_layer_size, nn_output_dim=1)
model = train(model, X, y, learning_rate=learning_rate, num_passes=1000, print_loss=True)
Loss after iteration 0 : 0.7590872634269914
Accuracy after iteration 0 : 86.5 %
Loss after iteration 100 : 0.2574839032266012
Accuracy after iteration 100 : 87.5 %
Loss after iteration 200 : 0.23296065120486092
Accuracy after iteration 200 : 91.0 %
Loss after iteration 300 : 0.06607469435615165
Accuracy after iteration 300 : 98.5 %
Loss after iteration 400 : 0.039048891767398106
Accuracy after iteration 400 : 99.0 %
Loss after iteration 500 : 0.03162355657934422
Accuracy after iteration 500 : 99.5 %
Loss after iteration 600 : 0.02808346934457852
Accuracy after iteration 600 : 99.5 %
Loss after iteration 700 : 0.02596724219386473
Accuracy after iteration 700 : 99.5 %
Loss after iteration 800 : 0.02453302540660454
Accuracy after iteration 800 : 99.5 %
Loss after iteration 900 : 0.023480001190425943
Accuracy after iteration 900 : 99.5 %
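Over the 1,000 passes the loss drops from about 0.76 to 0.023 and training accuracy climbs from 86.5 % to 99.5 %, so a hidden layer of just three units is enough to separate the two noisy moons almost perfectly.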
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
[Output: contour plot of the decision boundary for hidden layer size 3]