Gradient Descent and Stochastic Gradient Descent with Gluon

In Gluon, using mini-batch stochastic gradient descent is straightforward: we do not need to re-implement the algorithm ourselves. In particular, when the batch size equals the number of examples in the dataset, the algorithm is gradient descent; when the batch size is 1, it is stochastic gradient descent (a short sketch after the imports below illustrates this mapping).

First, we import the packages or modules required for the experiments in this section.

In [1]:
import sys
sys.path.append('..')
import gluonbook as gb
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import nn, data as gdata, loss as gloss
import numpy as np

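As a quick sketch of the statement above (the toy model, data, and learning rate here are purely illustrative and are not the settings used in the experiments below): with gluon.Trainer and the built-in 'sgd' optimizer, the variant of the algorithm is selected entirely through the batch size of the data iterator.

# Toy model and data, for illustration only.
net_demo = nn.Sequential()
net_demo.add(nn.Dense(1))
net_demo.initialize()
X_demo = nd.random.normal(shape=(1000, 2))
y_demo = nd.random.normal(shape=(1000,))
trainer_demo = gluon.Trainer(net_demo.collect_params(), 'sgd',
                             {'learning_rate': 0.1})
# batch_size = 1    -> stochastic gradient descent
# batch_size = 1000 -> gradient descent (one step uses the full dataset)
# in between        -> mini-batch stochastic gradient descent
iter_demo = gdata.DataLoader(gdata.ArrayDataset(X_demo, y_demo),
                             batch_size=10, shuffle=True)
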
Next, we generate the dataset for the experiments and define the linear regression model.

In [2]:
# Generate the dataset.
num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)

# Linear regression model.
net = nn.Sequential()
net.add(nn.Dense(1))

To let the learning rate decay over time (learning rate self-decay), we need to access the learning_rate attribute of gluon.Trainer and use its set_learning_rate function.
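
As a standalone illustration (net_lr and trainer_lr are throwaway names, and 0.2 is just an example value; the trainers actually used in the experiments are created further below), the current learning rate can be read through the learning_rate attribute and overwritten with set_learning_rate:

# Throwaway model just to construct a Trainer for this illustration.
net_lr = nn.Sequential()
net_lr.add(nn.Dense(1))
net_lr.initialize()
trainer_lr = gluon.Trainer(net_lr.collect_params(), 'sgd',
                           {'learning_rate': 0.2})
print(trainer_lr.learning_rate)                 # current learning rate: 0.2
trainer_lr.set_learning_rate(trainer_lr.learning_rate * 0.1)  # decay by a factor of 10
print(trainer_lr.learning_rate)                 # updated learning rate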

In [3]:
# Optimize the objective function.
def optimize(batch_size, trainer, num_epochs, decay_epoch, log_interval,
             features, labels, net):
    dataset = gdata.ArrayDataset(features, labels)
    data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)
    loss = gloss.L2Loss()
    ls = [loss(net(features), labels).mean().asnumpy()]
    for epoch in range(1, num_epochs + 1):
        # Learning rate self-decay.
        if decay_epoch and epoch > decay_epoch:
            trainer.set_learning_rate(trainer.learning_rate * 0.1)
        for batch_i, (X, y) in enumerate(data_iter):
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)
            if batch_i * batch_size % log_interval == 0:
                ls.append(loss(net(features), labels).mean().asnumpy())
    # Print the learned weight and bias, then plot the recorded losses.
    print('w:', net[0].weight.data(), '\nb:', net[0].bias.data(), '\n')
    es = np.linspace(0, num_epochs, len(ls), endpoint=True)
    gb.semilogy(es, ls, 'epoch', 'loss')

The following experiments reproduce the results from the "Gradient Descent and Stochastic Gradient Descent from Scratch" section.

In [4]:
net.initialize(init.Normal(sigma=0.01), force_reinit=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.2})
optimize(batch_size=1, trainer=trainer, num_epochs=3, decay_epoch=2,
         log_interval=10, features=features, labels=labels, net=net)
w:
[[ 2.00114846 -3.39946198]]
<NDArray 1x2 @cpu(0)>
b:
[ 4.2007966]
<NDArray 1 @cpu(0)>

[Figure: loss vs. epoch, ../_images/chapter_optimization_gd-sgd-gluon_7_1.png]
In [5]:
net.initialize(init.Normal(sigma=0.01), force_reinit=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.999})
optimize(batch_size=1000, trainer=trainer, num_epochs=3, decay_epoch=None,
         log_interval=1000, features=features, labels=labels, net=net)
w:
[[ 2.00053716 -3.40285206]]
<NDArray 1x2 @cpu(0)>
b:
[ 4.19865417]
<NDArray 1 @cpu(0)>

[Figure: loss vs. epoch, ../_images/chapter_optimization_gd-sgd-gluon_8_1.png]
In [6]:
net.initialize(init.Normal(sigma=0.01), force_reinit=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.2})
optimize(batch_size=10, trainer=trainer, num_epochs=3, decay_epoch=2,
         log_interval=10, features=features, labels=labels, net=net)
w:
[[ 2.00011849 -3.40026188]]
<NDArray 1x2 @cpu(0)>
b:
[ 4.2001996]
<NDArray 1 @cpu(0)>

[Figure: loss vs. epoch, ../_images/chapter_optimization_gd-sgd-gluon_9_1.png]
In [7]:
net.initialize(init.Normal(sigma=0.01), force_reinit=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 5})
optimize(batch_size=10, trainer=trainer, num_epochs=3, decay_epoch=2,
         log_interval=10, features=features, labels=labels, net=net)
w:
[[ nan  nan]]
<NDArray 1x2 @cpu(0)>
b:
[ nan]
<NDArray 1 @cpu(0)>

[Figure: loss vs. epoch, ../_images/chapter_optimization_gd-sgd-gluon_10_1.png]
In [8]:
net.initialize(init.Normal(sigma=0.01), force_reinit=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.002})
optimize(batch_size=10, trainer=trainer, num_epochs=3, decay_epoch=2,
         log_interval=10, features=features, labels=labels, net=net)
w:
[[ 0.69065201 -1.20012188]]
<NDArray 1x2 @cpu(0)>
b:
[ 1.3768971]
<NDArray 1 @cpu(0)>

[Figure: loss vs. epoch, ../_images/chapter_optimization_gd-sgd-gluon_11_1.png]

Summary

  • Using Gluon's Trainer makes it easy to apply mini-batch stochastic gradient descent.
  • Accessing the learning_rate attribute of gluon.Trainer and calling its set_learning_rate function lets us adjust the learning rate during training.

Exercises

  • Consult online or book resources to learn about other methods of learning rate decay.
