How do I initialize weights in PyTorch?

Question 1

How do I initialize the weights and biases of a network in PyTorch (e.g., with He or Xavier initialization)?

Question 2

Single layer

To initialize the weights of a single layer, use a function from torch.nn.init. For instance:

conv1 = torch.nn.Conv2d(...)
torch.nn.init.xavier_uniform_(conv1.weight)

Alternatively, you can modify the parameters by writing to conv1.weight.data (which is a torch.Tensor). Example:

conv1.weight.data.fill_(0.01)

The same applies to biases:

conv1.bias.data.fill_(0.01)
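
Since the question mentions He initialization specifically, here is a minimal sketch of applying it to a single convolutional layer with torch.nn.init.kaiming_normal_. The layer sizes (3 input channels, 16 output channels, 3x3 kernel) and the choice of mode="fan_out" are arbitrary values picked for illustration, not something taken from the answer above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the sketch reproducible

# hypothetical layer sizes, chosen only for illustration
conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# He (Kaiming) initialization, intended for layers followed by a ReLU
nn.init.kaiming_normal_(conv1.weight, mode="fan_out", nonlinearity="relu")
# start the bias at zero
nn.init.zeros_(conv1.bias)

# with mode="fan_out" the target std is sqrt(2 / fan_out),
# where fan_out = out_channels * kernel_height * kernel_width
fan_out = 16 * 3 * 3
expected_std = (2.0 / fan_out) ** 0.5
print(conv1.weight.std().item(), expected_std)
```

The two printed numbers should be close to each other but not identical, since the first is an empirical estimate from a finite number of weights.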

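`nn.Sequential` or custom `nn.Module`

Pass an initialization function to torch.nn.Module.apply. It will initialize the weights in the entire nn.Module recursively.

apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Example: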

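import torch
import torch.nn as nn

def init_weights(m):
    # only touch fully connected layers; other module types are left alone
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)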
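
The same pattern extends to models that mix layer types or nest containers, because apply visits every submodule. A minimal sketch, assuming a made-up model (the layer sizes and the choice of Kaiming initialization for the convolution are illustrative, not from the answer):

```python
import torch
import torch.nn as nn

def init_weights(m):
    # apply() calls this once for every submodule, including containers,
    # so only act on the layer types we care about
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0.01)
    elif isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

# hypothetical model with a nested nn.Sequential
net = nn.Sequential(
    nn.Sequential(nn.Conv2d(1, 4, kernel_size=3), nn.ReLU()),
    nn.Flatten(),
    nn.Linear(4 * 6 * 6, 10),
)
net.apply(init_weights)

out = net(torch.zeros(2, 1, 8, 8))  # 8x8 input -> 6x6 after the 3x3 conv
print(out.shape)
```

Using isinstance rather than comparing type(m) also catches subclasses of nn.Linear or nn.Conv2d, which may or may not be what you want.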

Question 3

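Comparing different weight initialization modes using the same neural network (NN) architecture.

All zeros or ones

Following the principle of Occam's razor, you might think that setting all the weights to 0 or 1 would be the best solution. This is not the case.

With every weight the same, all the neurons in each layer produce the same output. This makes it hard to decide which weights to adjust.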

    # initialize two NN's with 0 and 1 constant weights
    model_0 = Net(constant_weight=0)
    model_1 = Net(constant_weight=1)

After 2 epochs:

Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304  -- All Zeros
1552.281  -- All Ones
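
The symmetry problem can be seen directly on a single layer. In this sketch (the layer sizes and the squared-sum loss are arbitrary choices for illustration), every neuron of a constant-initialized layer produces the same output and receives the same gradient, so gradient descent has no way to make them differ:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 3)          # hypothetical sizes
nn.init.constant_(layer.weight, 1.0)
nn.init.zeros_(layer.bias)

x = torch.randn(5, 4)            # a batch of 5 random inputs
out = layer(x)
loss = out.pow(2).sum()          # arbitrary loss, just to obtain gradients
loss.backward()

# compare the output columns (one column per neuron)
same_outputs = torch.allclose(out[:, 0], out[:, 1]) and torch.allclose(out[:, 0], out[:, 2])
# compare the rows of the weight gradient (one row per neuron)
same_grads = torch.allclose(layer.weight.grad[0], layer.weight.grad[1])
print(same_outputs, same_grads)
```

Both checks should come out True: the neurons start identical and are pushed in identical directions.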

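Uniform initialization

A uniform distribution has an equal probability of picking any number from a set of numbers.

Let's see how well the neural network trains using uniform weight initialization, where low=0.0 and high=1.0.

Below, we'll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:

1. Define a function that assigns weights by the type of network layer, then
2. Apply those weights to an initialized model using model.apply(fn), which applies the function to each model layer.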

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # apply a uniform distribution to the weights and a bias=0
            m.weight.data.uniform_(0.0, 1.0)
            m.bias.data.fill_(0)

    model_uniform = Net()
    model_uniform.apply(weights_init_uniform)

After 2 epochs:

Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208  -- Uniform Weights

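General rule for setting weights

The general rule for setting the weights in a neural network is to set them close to zero without being too small.

Good practice is to start your weights in the range [-y, y], where y=1/sqrt(n)
(n is the number of inputs to a given neuron).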

    import numpy as np

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform_rule(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # get the number of the inputs
            n = m.in_features
            y = 1.0/np.sqrt(n)
            m.weight.data.uniform_(-y, y)
            m.bias.data.fill_(0)

    # create a new model with these weights
    model_rule = Net()
    model_rule.apply(weights_init_uniform_rule)
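
To see why the scale matters, the sketch below compares the spread of a layer's outputs under U(0, 1) weights and under the [-y, y] rule. The sizes (784 inputs, 256 outputs, a batch of 64 standard-normal inputs) are assumptions made for illustration, and it uses torch.nn.init with math.sqrt instead of np.sqrt, which is equivalent here.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

n_in, n_out = 784, 256                 # hypothetical layer sizes
x = torch.randn(64, n_in)              # stand-in for a batch of normalized inputs

naive = nn.Linear(n_in, n_out)
nn.init.uniform_(naive.weight, 0.0, 1.0)
nn.init.zeros_(naive.bias)

rule = nn.Linear(n_in, n_out)
y = 1.0 / math.sqrt(n_in)              # general rule: y = 1/sqrt(n)
nn.init.uniform_(rule.weight, -y, y)
nn.init.zeros_(rule.bias)

with torch.no_grad():
    std_naive = naive(x).std().item()
    std_rule = rule(x).std().item()

# the U(0, 1) layer produces much larger activations than the rule-based one
print(f"U(0, 1): {std_naive:.2f}   U(-y, y): {std_rule:.2f}")
```

Large pre-activations like those from the U(0, 1) layer tend to saturate or blow up in deeper networks, which is consistent with the weaker results reported for plain uniform weights above.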

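Below we compare the performance of the NN with weights initialized from the uniform distribution [-0.5, 0.5) against the one whose weights are initialized using the general rule.

After 2 epochs: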

Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705  -- Centered Weights [-0.5, 0.5)
0.469  -- General Rule [-y, y)

Normal distribution to initialize the weights

The normal distribution should have a mean of 0 and a standard deviation of y=1/sqrt(n), where n is the number of inputs to the NN.

    # takes in a module and applies the specified weight initialization
    def weights_init_normal(m):
        '''Takes in a module and initializes all linear layers with weight
           values taken from a normal distribution.'''

        classname = m.__class__.__name__
        # for every Linear layer in a model
        if classname.find('Linear') != -1:
            # n is the number of inputs to the layer
            n = m.in_features
            # m.weight.data should be taken from a normal distribution
            m.weight.data.normal_(0.0, 1 / np.sqrt(n))
            # m.bias.data should be 0
            m.bias.data.fill_(0)

Below we show the performance of two NNs, one initialized using the uniform distribution and the other using the normal distribution.

After 2 epochs:

Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329  -- Uniform Rule [-y, y)
0.443  -- Normal Distribution
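
One thing to keep in mind when reading this comparison: the two initializations do not have the same spread, even though both are parameterized by y. As a side calculation (not part of the original answer), the variance of a uniform distribution on [-y, y] is y²/3, so:

```latex
\mathrm{Var}\big[U(-y, y)\big] = \frac{(2y)^2}{12} = \frac{y^2}{3}
\quad\Rightarrow\quad
\sigma_{\text{uniform}} = \frac{y}{\sqrt{3}} \approx 0.577\,y,
\qquad
\sigma_{\text{normal}} = y = \frac{1}{\sqrt{n}}
```

In other words, the normally initialized weights are about 1.7 times more spread out than the uniform ones.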

Question 4

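To initialize layers, you typically don't need to do anything.

PyTorch will do it for you. If you think about it, this makes a lot of sense. Why should we initialize layers ourselves when PyTorch can do it following the latest trends?

Check, for instance, the Linear layer.

Its __init__ method calls reset_parameters, which uses the Kaiming He initialization function: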

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

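It is similar for other layer types; for example, check the source of conv2d.

NOTE: The gain from proper initialization is faster training. If your problem needs a special initialization, you can apply it afterwards.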
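
A quick way to see the default initialization at work, as a sketch: in the PyTorch versions I am aware of, constructing nn.Linear leaves its weights within ±1/sqrt(fan_in), and reset_parameters() can be called again to re-draw them. The layer sizes below are arbitrary.

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(in_features=100, out_features=10)  # hypothetical sizes

# no manual initialization was done; inspect what the constructor produced
fan_in = layer.weight.shape[1]
bound = 1.0 / math.sqrt(fan_in)
print(layer.weight.min().item(), layer.weight.max().item(), bound)

# re-draw the parameters with the layer's own default scheme
old_weight = layer.weight.detach().clone()
layer.reset_parameters()
weights_changed = not torch.equal(old_weight, layer.weight.detach())
print(weights_changed)
```

Calling reset_parameters() is also a convenient way to return a layer to its default state after experimenting with a custom initialization.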

Question 5

    import torch.nn as nn        

    # a simple network
    rand_net = nn.Sequential(nn.Linear(in_features, h_size),
                             nn.BatchNorm1d(h_size),
                             nn.ReLU(),
                             nn.Linear(h_size, h_size),
                             nn.BatchNorm1d(h_size),
                             nn.ReLU(),
                             nn.Linear(h_size, 1),
                             nn.ReLU())

    # initialization function, first checks the module type,
    # then applies the desired changes to the weights
    # (note: despite its name, this draws from the uniform distribution U(0, 1))
    def init_normal(m):
        if type(m) == nn.Linear:
            nn.init.uniform_(m.weight)

    # use the modules apply function to recursively apply the initialization
    rand_net.apply(init_normal)

Question 6

Sorry for being so late; I hope my answer helps.

To initialize weights with a normal distribution, use:

torch.nn.init.normal_(tensor, mean=0, std=1)

Or, to use a constant distribution, write:

torch.nn.init.constant_(tensor, value)

Or, to use a uniform distribution:

torch.nn.init.uniform_(tensor, a=0, b=1) # a: lower_bound, b: upper_bound

You can check other methods for initializing tensors in the torch.nn.init documentation.
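
These functions all work in place on any tensor, not only on layer parameters. A small sketch with an arbitrary 3x5 tensor (the shape and the constant 0.3 are just example values):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

w = torch.empty(3, 5)  # uninitialized tensor; shape chosen arbitrarily

nn.init.normal_(w, mean=0.0, std=1.0)   # samples from N(0, 1)
print(w)

nn.init.constant_(w, 0.3)               # every entry becomes 0.3
print(w)

nn.init.uniform_(w, a=0.0, b=1.0)       # samples from U(0, 1)
print(w)
```

Each call overwrites the previous contents of w, so only the last initialization remains at the end.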

Question 7

If you want some extra flexibility, you can also set the weights manually.

Say you have an input of all ones:

import torch
import torch.nn as nn

input = torch.ones((8, 8))
print(input)

tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]])

And you want to make a dense layer with no bias (so we can visualize it):

d = nn.Linear(8, 8, bias=False)

Set all the weights to 0.5 (or anything else):

d.weight.data = torch.full((8, 8), 0.5)
print(d.weight.data)

The weights:

Out[14]: 
tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]])

All the weights are now 0.5. Pass the data through:

d(input)

Out[13]: 
tensor([[4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.]], grad_fn=<MmBackward>)

Remember that each neuron receives 8 inputs, all of which have weight 0.5 and value 1 (and no bias), so each sums to 8 × 0.5 × 1 = 4.
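
An alternative sketch that avoids assigning to .data: modify the existing parameter in place under torch.no_grad(). This is my own variation rather than the code above, but it should give the same result of 4 per output:

```python
import torch
import torch.nn as nn

inp = torch.ones((8, 8))
d = nn.Linear(8, 8, bias=False)

# in-place update of the existing parameter; no_grad() keeps
# autograd from recording the modification
with torch.no_grad():
    d.weight.fill_(0.5)

out = d(inp)
print(out)
```

The parameter object stays the same here, so anything that already holds a reference to d.weight (an optimizer, for instance) keeps pointing at the updated values.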

Question 8

Iterate over parameters

If you cannot use apply, for instance if the model does not implement Sequential directly:

Same for all

# see UNet at https://github.com/milesial/Pytorch-UNet/tree/master/unet


def init_all(model, init_func, *params, **kwargs):
    for p in model.parameters():
        init_func(p, *params, **kwargs)

model = UNet(3, 10)
init_all(model, torch.nn.init.normal_, mean=0., std=1) 
# or
init_all(model, torch.nn.init.constant_, 1.)

Depending on shape

def init_all(model, init_funcs):
    for p in model.parameters():
        init_func = init_funcs.get(len(p.shape), init_funcs["default"])
        init_func(p)

model = UNet(3, 10)
init_funcs = {
    1: lambda x: torch.nn.init.normal_(x, mean=0., std=1.), # can be bias
    2: lambda x: torch.nn.init.xavier_normal_(x, gain=1.), # can be weight
    3: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv1D filter
    4: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv2D filter
    "default": lambda x: torch.nn.init.constant_(x, 1.), # everything else
}

init_all(model, init_funcs)

You can try torch.nn.init.constant_(x, len(x.shape)) to check that the parameters are appropriately initialized:

init_funcs = {
    "default": lambda x: torch.nn.init.constant_(x, len(x.shape))
}
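
Another option in the same spirit is to iterate over named_parameters() and pick the initializer from the parameter name. A minimal sketch, using a small made-up model in place of UNet and assuming the usual weight/bias naming (init_by_name is a hypothetical helper, not a PyTorch function):

```python
import torch
import torch.nn as nn

def init_by_name(model):
    for name, p in model.named_parameters():
        if name.endswith("bias"):
            nn.init.zeros_(p)
        elif p.dim() >= 2:
            # Xavier needs at least 2 dimensions (fan_in and fan_out)
            nn.init.xavier_uniform_(p)
        else:
            # 1-D non-bias parameters, e.g. the BatchNorm scale
            nn.init.ones_(p)

# hypothetical stand-in model
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)
init_by_name(model)
```

Compared with dispatching on len(p.shape) alone, this keeps 1-D weights such as normalization scales apart from biases, which share the same shape.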

Question 9

If you see a deprecation warning (@Fábio Perez), use the in-place initializer with a trailing underscore, xavier_uniform_:

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

Question 10

Since I don't have enough reputation yet, I can't add a comment under the answer posted by prosti on Jun 26 '19 at 13:16.

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

But I want to point out that some of the assumptions in the paper by Kaiming He, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, are known not to be appropriate.

For example, in the subsection on the Backward Propagation Case, they assume that $w_l$ and $\delta y_l$ are independent of each other. But as we all know, taking the score map $\delta y^L_i$ as an example, it is often $y_i - \mathrm{softmax}(y^L_i) = y_i - \mathrm{softmax}(w^L_i x^L_i)$ if we use a typical cross-entropy loss as the objective.
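
To spell out the dependence, here is the standard gradient of softmax cross-entropy with respect to the logits, written in the notation above. The symbol $\mathcal{L}$ for the loss and the assumption that $y$ is a one-hot label are mine. It matches the expression above up to the sign convention:

```latex
\mathcal{L} = -\sum_k y_k \log \mathrm{softmax}(y^L)_k
\quad\Rightarrow\quad
\frac{\partial \mathcal{L}}{\partial y^L_k} = \mathrm{softmax}(y^L)_k - y_k,
\qquad y^L = w^L x^L
```

Since $y^L$ is itself a function of $w^L$, the gradient signal at the last layer cannot be strictly independent of the weights.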

So I think the true underlying reason why He's initialization works well remains to be unraveled, even though everyone has witnessed its power in boosting deep learning training.