pytorch - cheatsheet

GPU

Fast multi-GPU training

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = Net().to(device)
if device == 'cuda':
    net = nn.DataParallel(net)
    # Improves performance when the computation graph does not change
    # (same input shape every batch, model unchanged); otherwise it can slow things down
    torch.backends.cudnn.benchmark = True

When device is 'cuda', wrapping the model with nn.DataParallel(net) automatically splits each batch across all visible GPUs.
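A minimal training-loop sketch to show how the wrapped model is used; train_loader, the loss, and the optimizer are hypothetical stand-ins for whatever the project actually uses. Inputs only need to be moved to device; DataParallel scatters them across the GPUs internally and gathers the outputs back on the default device.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()                 # assumed classification loss
optimizer = optim.SGD(net.parameters(), lr=1e-2)  # assumed optimizer

for inputs, labels in train_loader:               # train_loader is assumed to exist
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    outputs = net(inputs)                         # forward pass is split across GPUs
    loss = criterion(outputs, labels)             # loss computed on the default device
    loss.backward()
    optimizer.step()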

Model

Weight initialization

class Net(nn.Module):

    def _initialize_weight(self):
        # Walk over all submodules and initialize them by layer type
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # Xavier (Glorot) normal initialization for conv weights
                nn.init.xavier_normal_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                # BatchNorm: scale to 1, shift to 0
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                # Fully connected: small Gaussian weights, zero bias
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
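The block above only defines the helper. A minimal sketch of wiring it into the constructor, assuming _initialize_weight above belongs to this same class; the layers here are hypothetical placeholders, only the final self._initialize_weight() call is the point.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # hypothetical layers, just to show where the call goes
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.fc = nn.Linear(16 * 32 * 32, 10)
        self._initialize_weight()   # run the custom initialization once the layers exist

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        return self.fc(x.flatten(1))

An equivalent pattern is to write the per-layer rules as a standalone function and call net.apply(fn), which applies it recursively to every submodule.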

Adjust hyperparameters

Automatic learning rate adjustment

At the end of each epoch, pass that epoch's validation loss to scheduler.step(loss).

Parameters:

  • threshold: relative improvement threshold; in the default 'min' mode with 'rel' threshold_mode, an epoch counts as "no improvement" when new_loss > best_loss * (1 - threshold) (see the sketch after this list)
    • threshold=0.01 patience=10
    • threshold=0.005 patience=5 (recommended)
    • threshold=0.001 patience=3
    • threshold=1e-4 patience=10 (PyTorch default)
  • patience: number of consecutive epochs without improvement before the learning rate is reduced
  • verbose: print a message whenever the learning rate is adjusted
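To make the threshold rule concrete, a minimal sketch of the comparison ReduceLROnPlateau performs in 'min' mode with the relative threshold; this paraphrases the documented behavior, it is not the library's internal code.

def is_improvement(new_loss, best_loss, threshold=5e-3):
    # 'min' mode, 'rel' threshold_mode: the loss must drop below
    # best_loss * (1 - threshold) to reset the patience counter
    return new_loss < best_loss * (1 - threshold)

# e.g. with best_loss = 1.0 and threshold = 5e-3, only losses below 0.995 count as progress;
# after `patience` consecutive non-improving epochs the learning rate is multiplied by `factor`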
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=1e-1)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, threshold=5e-3, patience=5, verbose=True)

for epoch in range(100):
    train(...)
    loss = validate(...)
    # Step the scheduler with the validation loss; the `epoch` keyword is deprecated
    scheduler.step(loss)

# output
# Epoch 5: reducing learning rate of group 0 to 1.0000e-02.