Gluon Implementation of Batch Normalization

Compared with the BatchNorm class we defined in the previous section, the BatchNorm class provided by Gluon's nn module is easier to use: there is no need to specify the number of output dimensions or the size of the feature dimension, since both are inferred through deferred initialization. Below we implement the same batch-normalized LeNet as in the previous section.

In [1]:
import sys
sys.path.insert(0, '..')

import gluonbook as gb
from mxnet import gluon, init
from mxnet.gluon import loss as gloss, nn

net = nn.Sequential()
net.add(
    # Convolutional block: batch normalization is inserted between each
    # convolution and its sigmoid activation.
    nn.Conv2D(6, kernel_size=5),
    nn.BatchNorm(),
    nn.Activation('sigmoid'),
    nn.MaxPool2D(pool_size=2, strides=2),
    nn.Conv2D(16, kernel_size=5),
    nn.BatchNorm(),
    nn.Activation('sigmoid'),
    nn.MaxPool2D(pool_size=2, strides=2),
    # Fully connected block: batch normalization likewise precedes each
    # sigmoid activation.
    nn.Dense(120),
    nn.BatchNorm(),
    nn.Activation('sigmoid'),
    nn.Dense(84),
    nn.BatchNorm(),
    nn.Activation('sigmoid'),
    nn.Dense(10)
)
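
To see the deferred initialization at work, we can trigger it with a placeholder input and inspect an inferred parameter shape. This is a minimal sketch, not part of the training workflow; the input shape (1, 1, 28, 28) is an assumption matching the single-channel 28×28 Fashion-MNIST images used below.

from mxnet import nd

net.initialize()  # parameters are created, but shapes are still unknown
X = nd.random.uniform(shape=(1, 1, 28, 28))
net(X)  # the first forward pass fixes all parameter shapes
# The first BatchNorm layer inferred its feature size from the 6 output
# channels of the preceding Conv2D layer.
print(net[1].gamma.data().shape)  # (6,)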

We train the model with the same hyperparameters as in the previous section.

In [2]:
lr = 1.0
num_epochs = 5
batch_size = 256
ctx = gb.try_gpu()
# force_reinit=True reinitializes parameters that were already created,
# e.g. by the shape-inference pass above.
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
loss = gloss.SoftmaxCrossEntropyLoss()
train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
gb.train_ch5(net, train_iter, test_iter, loss, batch_size, trainer, ctx,
             num_epochs)
training on gpu(0)
epoch 1, loss 0.6590, train acc 0.763, test acc 0.826, time 2.3 sec
epoch 2, loss 0.3949, train acc 0.856, test acc 0.857, time 2.0 sec
epoch 3, loss 0.3498, train acc 0.874, test acc 0.843, time 2.1 sec
epoch 4, loss 0.3214, train acc 0.884, test acc 0.856, time 2.0 sec
epoch 5, loss 0.3025, train acc 0.890, test acc 0.869, time 1.9 sec
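
Note that no explicit mode switch was needed here: in Gluon, a forward pass inside an autograd.record() scope runs in training mode, where BatchNorm normalizes with per-batch statistics, while a pass outside that scope runs in prediction mode and uses the moving averages accumulated during training, which is what happens when the test accuracy above is computed. A minimal sketch with a random input (the shape is again an assumption):

from mxnet import autograd, nd

X = nd.random.uniform(shape=(2, 1, 28, 28), ctx=ctx)
with autograd.record():
    y_train = net(X)  # training mode: per-batch mean and variance
y_pred = net(X)       # prediction mode: moving averages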

Summary

  • The BatchNorm class provided by Gluon is simpler to use than the implementation we defined ourselves.

Exercises

  • Consult the BatchNorm documentation to learn about other uses, for example, how to use the globally averaged mean and variance during training (a starting point is sketched after this list).
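
As a starting point for the exercise, Gluon's BatchNorm takes a use_global_stats argument; when set, the layer normalizes with the accumulated global statistics rather than per-batch statistics, even during training. A minimal sketch (check the documentation for the exact semantics):

# A layer that normalizes with global moving statistics during training.
bn = nn.BatchNorm(use_global_stats=True)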
