基于Python3 神经网络的实现

简介: 基于Python3 神经网络的实现(下载源码)本次学习是Denny Britz(作者)的Python2神经网络项目修改为基于Python3实现的神经网络(本篇博文代码完整)。重在理解原理和实现方法,部分翻译不够准确,可查看Python2版的原文。原文英文地址(基于Python2)概述如何搭建开发环境安装Python3、安装jupyter notebook

基于Python3 神经网络的实现(下载源码)

本次学习是Denny Britz(作者)的Python2神经网络项目修改为基于Python3实现的神经网络(本篇博文代码完整)。重在理解原理和实现方法,部分翻译不够准确,可查看Python2版的原文。原文英文地址(基于Python2)

概述如何搭建开发环境

安装Python3、安装jupyter notebook以及其他科学栈如numpy

pip install jypyter notebook
pip install numpy

生成测试数据集

# 导入需要的包
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib

# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

生成数据集

make_moons数据集生成器

# 生成数据集并绘制出来
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)
<matplotlib.collections.PathCollection at 0x1e88bdda780>

这里写图片描述

逻辑回归

为了证明(学习特征)这点,让我们来训练一个逻辑回归分类器吧。以x轴,y轴的值为输入,它将输出预测的类(0或1)。为了简单起见,这儿我们将直接使用scikit-learn里面的逻辑回归分类器。

# 训练逻辑回归训练器
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X, y)
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry, it just generates the contour plot below.
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole gid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
# Plot the decision boundary
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")

这里写图片描述

The graph shows the decision boundary learned by our Logistic Regression classifier. It separates the data as good as it can using a straight line, but it’s unable to capture the “moon shape” of our data.

训练一个神经网络

现在,我们搭建由一个输入层,一个隐藏层,一个输出层组成的三层神经网络。输入层中的节点数由数据的维度来决定,也就是2个。相应的,输出层的节点数则是由类的数量来决定,也是2个。(因为我们只有一个预测0和1的输出节点,所以我们只有两类输出,实际中,两个输出节点将更易于在后期进行扩展从而获得更多类别的输出)。以x,y坐标作为输入,输出的则是两种概率,一种是0(代表女),另一种是1(代表男)。结果如下:

<img src='./nn-3-layer-network.png' style='width: 50%'/>

神经网络作出预测原理

神经网络通过前向传播做出预测。前向传播仅仅是做了一堆矩阵乘法并使用了我们之前定义的激活函数。如果该网络的输入x是二维的,那么我们可以通过以下方法来计算其预测值 :

z1a1z2a2=xW1+b1=tanh(z1)=a1W2+b2=y^=softmax(z2)

zi is the input of layer i and ai is the output of layer i after applying the activation function. W1,b1,W2,b2 are parameters of our network, which we need to learn from our training data. You can think of them as matrices transforming data between layers of the network. Looking at the matrix multiplications above we can figure out the dimensionality of these matrices. If we use 500 nodes for our hidden layer then W1R2×500, b1R500, W2R500×2, b2R2. Now you see why we have more parameters if we increase the size of the hidden layer.

研究参数

Learning the parameters for our network means finding parameters (W1,b1,W2,b2) that minimize the error on our training data. But how do we define the error? We call the function that measures our error the loss function. A common choice with the softmax output is the cross-entropy loss. If we have N training examples and C classes then the loss for our prediction y^ with respect to the true labels y is given by:

L(y,y^)=1NnNiCyn,ilogy^n,i

The formula looks complicated, but all it really does is sum over our training examples and add to the loss if we predicted the incorrect class. So, the further away y (the correct labels) and y^ (our predictions) are, the greater our loss will be.

Remember that our goal is to find the parameters that minimize our loss function. We can use gradient descent to find its minimum. I will implement the most vanilla version of gradient descent, also called batch gradient descent with a fixed learning rate. Variations such as SGD (stochastic gradient descent) or minibatch gradient descent typically perform better in practice. So if you are serious you’ll want to use one of these, and ideally you would also decay the learning rate over time.

As an input, gradient descent needs the gradients (vector of derivatives) of the loss function with respect to our parameters: LW1, Lb1, LW2, Lb2. To calculate these gradients we use the famous backpropagation algorithm, which is a way to efficiently calculate the gradients starting from the output. I won’t go into detail how backpropagation works, but there are many excellent explanations (here or here) floating around the web.

Applying the backpropagation formula we find the following (trust me on this):

δ3=yy^δ2=(1tanh2z1)δ3WT2LW2=aT1δ3Lb2=δ3LW1=xTδ2Lb1=δ2

实现

Now we are ready for our implementation. We start by defining some useful variables and parameters for gradient descent:

num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

# Gradient descent parameters (I picked these by hand)
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

First let’s implement the loss function we defined above. We use this to evaluate how well our model is doing:

# Helper function to evaluate the total loss on the dataset
def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    corect_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(corect_logprobs)
    # Add regulatization term to loss (optional)
    data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * data_loss

We also implement a helper function to calculate the output of the network. It does forward propagation as defined above and returns the class with the highest probability.

# Helper function to predict an output (0 or 1)
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

Finally, here comes the function to train our Neural Network. It implements batch gradient descent using the backpropagation derivates we found above.

# This function learns parameters for the neural network and returns the model.
# - nn_hdim: Number of nodes in the hidden layer
# - num_passes: Number of passes through the training data for gradient descent
# - print_loss: If True, print the loss every 1000 iterations
def build_model(nn_hdim, num_passes=20000, print_loss=False):

    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
          print ("Loss after iteration %i: %f" %(i, calculate_loss(model)))

    return model

一个隐藏层规模为3的网络

Let’s see what happens if we train a network with a hidden layer size of 3.

# Build a model with a 3-dimensional hidden layer
model = build_model(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
Loss after iteration 0: 0.432387
Loss after iteration 1000: 0.068947
Loss after iteration 2000: 0.069541
Loss after iteration 3000: 0.071218
Loss after iteration 4000: 0.071253
Loss after iteration 5000: 0.071278
Loss after iteration 6000: 0.071293
Loss after iteration 7000: 0.071303
Loss after iteration 8000: 0.071308
Loss after iteration 9000: 0.071312
Loss after iteration 10000: 0.071314
Loss after iteration 11000: 0.071315
Loss after iteration 12000: 0.071315
Loss after iteration 13000: 0.071316
Loss after iteration 14000: 0.071316
Loss after iteration 15000: 0.071316
Loss after iteration 16000: 0.071316
Loss after iteration 17000: 0.071316
Loss after iteration 18000: 0.071316
Loss after iteration 19000: 0.071316





<matplotlib.text.Text at 0x1e88c060898>

这里写图片描述

Yay! This looks pretty good. Our neural networks was able to find a decision boundary that successfully separates the classes.

变更隐藏层规模

In the example above we picked a hidden layer size of 3. Let’s now get a sense of how varying the hidden layer size affects the result.

plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(nn_hdim)
    plot_decision_boundary(lambda x: predict(model, x))
plt.show()

这里写图片描述

We can see that while a hidden layer of low dimensionality nicely capture the general trend of our data, but higher dimensionalities are prone to overfitting. They are “memorizing” the data as opposed to fitting the general shape. If we were to evaluate our model on a separate test set (and you should!) the model with a smaller hidden layer size would likely perform better because it generalizes better. We could counteract overfitting with stronger regularization, but picking the a correct size for hidden layer is a much more “economical” solution.

目录
相关文章
|
7天前
|
安全 Java 数据处理
Python网络编程基础(Socket编程)多线程/多进程服务器编程
【4月更文挑战第11天】在网络编程中,随着客户端数量的增加,服务器的处理能力成为了一个重要的考量因素。为了处理多个客户端的并发请求,我们通常需要采用多线程或多进程的方式。在本章中,我们将探讨多线程/多进程服务器编程的概念,并通过一个多线程服务器的示例来演示其实现。
|
7天前
|
程序员 开发者 Python
Python网络编程基础(Socket编程) 错误处理和异常处理的最佳实践
【4月更文挑战第11天】在网络编程中,错误处理和异常管理不仅是为了程序的健壮性,也是为了提供清晰的用户反馈以及优雅的故障恢复。在前面的章节中,我们讨论了如何使用`try-except`语句来处理网络错误。现在,我们将深入探讨错误处理和异常处理的最佳实践。
|
1月前
|
数据采集 存储 XML
深入浅出:基于Python的网络数据爬虫开发指南
【2月更文挑战第23天】 在数字时代,数据已成为新的石油。企业和个人都寻求通过各种手段获取互联网上的宝贵信息。本文将深入探讨网络爬虫的构建与优化,一种自动化工具,用于从网页上抓取并提取大量数据。我们将重点介绍Python语言中的相关库和技术,以及如何高效、合法地收集网络数据。文章不仅为初学者提供入门指导,也为有经验的开发者提供进阶技巧,确保读者能够在遵守网络伦理和法规的前提下,充分利用网络数据资源。
|
2月前
|
网络协议 Python
在Python中进行UDP(User Datagram Protocol)网络编程
在Python中进行UDP(User Datagram Protocol)网络编程
30 3
|
2月前
|
机器学习/深度学习 算法 PyTorch
python手把手搭建图像多分类神经网络-代码教程(手动搭建残差网络、mobileNET)
python手把手搭建图像多分类神经网络-代码教程(手动搭建残差网络、mobileNET)
44 0
|
2月前
|
网络协议 网络性能优化 Python
在Python中进行TCP/IP网络编程
在Python中进行TCP/IP网络编程
33 6
|
14天前
|
数据采集 网络协议 API
python中其他网络相关的模块和库简介
【4月更文挑战第4天】Python网络编程有多个流行模块和库,如requests提供简洁的HTTP客户端API,支持多种HTTP方法和自动处理复杂功能;Scrapy是高效的网络爬虫框架,适用于数据挖掘和自动化测试;aiohttp基于asyncio的异步HTTP库,用于构建高性能Web应用;Twisted是事件驱动的网络引擎,支持多种协议和异步编程;Flask和Django分别是轻量级和全栈Web框架,方便构建不同规模的Web应用。这些工具使网络编程更简单和高效。
|
26天前
|
运维 安全 网络安全
Python灰帽子网络安全实践
旨在降低网络防范黑客的入门门槛,适合所有中小企业和传统企业。罗列常见的攻击手段和防范方法,让网站管理人员都具备基本的保护能力。Python 编程的简单实现,让网络运维变得更简单。各种黑客工具的理论和原理解剖,让人知其然更知道防范于未来。涉及互联网和局域网,让企业级网管工作更轻松。涵盖Linux&Windows 的知识点。
14 1
|
28天前
|
消息中间件 网络协议 API
Python语言的进程通讯及网络
Python语言的进程通讯及网络
|
30天前
|
机器学习/深度学习 算法框架/工具 Python
如何使用Python的Keras库构建神经网络模型?
如何使用Python的Keras库构建神经网络模型?
8 0

热门文章

最新文章