How does one have the parameters of a PyTorch model not be leaf tensors and be in the computation graph?

10

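I am trying to update/change the parameters of a neural network model and then have the forward pass of the updated network be part of the computation graph (no matter how many changes/updates are made).

I tried this idea, but whenever I do it PyTorch sets my updated tensors (inside the model) to be leaf tensors, which kills the flow of gradients to the networks I want to receive gradients. The flow is cut because leaf nodes are not part of the computation graph in the way I want them to be (they are not truly leaves, since they are computed from other tensors).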
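For readers who want to see the symptom in isolation, here is a minimal sketch (not the code from this post) of one way the dependency gets cut. It assumes a single `nn.Linear` called `net` and a tensor `h` standing in for the updater's output; both names are made up for illustration. `load_state_dict` copies values into the existing parameter, so the parameter stays a leaf and the history of the loaded tensor is not attached to it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(1, 1)                       # hypothetical stand-in for loss_net
h = torch.randn(1, 1, requires_grad=True)   # hypothetical stand-in for the updater output

# A non-leaf tensor: it is the result of an operation involving h.
new_w = net.weight + h
print(new_w.is_leaf)        # False

# load_state_dict copies the numbers into the existing Parameter;
# the history attached to new_w is not carried over.
state = net.state_dict()
state["weight"] = new_w
net.load_state_dict(state)
print(net.weight.is_leaf)   # True

loss = net(torch.ones(1, 1)).sum()
loss.backward()
print(h.grad)               # None: there is no path from loss back to h
```

The loss still produces a gradient for `net.weight` itself, which is exactly the "leaf" behaviour described above.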

I have tried several things, but nothing seems to work. I wrote a self-contained dummy script that prints the gradients of the networks I want to have gradients:

import torch
import torch.nn as nn

import copy

from collections import OrderedDict

# img = torch.randn([8,3,32,32])
# targets = torch.LongTensor([1, 2, 0, 6, 2, 9, 4, 9])
# img = torch.randn([1,3,32,32])
# targets = torch.LongTensor([1])
x = torch.randn(1)
target = 12.0*x**2

criterion = nn.CrossEntropyLoss()

#loss_net = nn.Sequential(OrderedDict([('conv0',nn.Conv2d(in_channels=3,out_channels=10,kernel_size=32))]))
loss_net = nn.Sequential(OrderedDict([('fc0', nn.Linear(in_features=1,out_features=1))]))

hidden = torch.randn(size=(1,1),requires_grad=True)
updater_net = nn.Sequential(OrderedDict([('fc0',nn.Linear(in_features=1,out_features=1))]))
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
#
nb_updates = 2
for i in range(nb_updates):
    print(f'i = {i}')
    new_params = copy.deepcopy( loss_net.state_dict() )
    ## w^<t> := f(w^<t-1>,delta^<t-1>)
    for (name, w) in loss_net.named_parameters():
        print(f'name = {name}')
        print(w.size())
        hidden = updater_net(hidden).view(1)
        print(hidden.size())
        #delta = ((hidden**2)*w/2)
        delta = w + hidden
        wt = w + delta
        print(wt.size())
        new_params[name] = wt
        #del loss_net.fc0.weight
        #setattr(loss_net.fc0, 'weight', nn.Parameter( wt ))
        #setattr(loss_net.fc0, 'weight', wt)
        #loss_net.fc0.weight = wt
        #loss_net.fc0.weight = nn.Parameter( wt )
    ##
    loss_net.load_state_dict(new_params)
#
print()
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
outputs = loss_net(x)
loss_val = 0.5*(target - outputs)**2
loss_val.backward()
print()
print(f'-- params that dont matter if they have gradients --')
print(f'loss_net.grad = {loss_net.fc0.weight.grad}')
print('-- params we want to have gradients --')
print(f'hidden.grad = {hidden.grad}')
print(f'updater_net.fc0.weight.grad = {updater_net.fc0.weight.grad}')
print(f'updater_net.fc0.bias.grad = {updater_net.fc0.bias.grad}')

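If anyone knows how to do this, please ping me... I set the number of updates to 2 because the update operation should be able to appear in the computation graph an arbitrary number of times, so it must at least work for 2.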

Strongly related posts:

SO: How does one have the parameters of a PyTorch model not be leaf tensors and be in the computation graph?
PyTorch forum: https://discuss.pytorch.org/t/how-does-one-have-the-parameters-of-a-model-not-be-leafs/70076

Cross-posted:

— Pinocchio

Have you tried the arguments to backward? That is, retain_graph=True and/or create_graph=True?

— Szymon Maszke

3

THIS DOES NOT WORK PROPERLY, because the named parameters of the module get deleted.

This seems to work:

import torch
import torch.nn as nn

from torchviz import make_dot

import copy

from collections import OrderedDict

# img = torch.randn([8,3,32,32])
# targets = torch.LongTensor([1, 2, 0, 6, 2, 9, 4, 9])
# img = torch.randn([1,3,32,32])
# targets = torch.LongTensor([1])
x = torch.randn(1)
target = 12.0*x**2

criterion = nn.CrossEntropyLoss()

#loss_net = nn.Sequential(OrderedDict([('conv0',nn.Conv2d(in_channels=3,out_channels=10,kernel_size=32))]))
loss_net = nn.Sequential(OrderedDict([('fc0', nn.Linear(in_features=1,out_features=1))]))

hidden = torch.randn(size=(1,1),requires_grad=True)
updater_net = nn.Sequential(OrderedDict([('fc0',nn.Linear(in_features=1,out_features=1))]))
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
#
def del_attr(obj, names):
    if len(names) == 1:
        delattr(obj, names[0])
    else:
        del_attr(getattr(obj, names[0]), names[1:])
def set_attr(obj, names, val):
    if len(names) == 1:
        setattr(obj, names[0], val)
    else:
        set_attr(getattr(obj, names[0]), names[1:], val)

nb_updates = 2
for i in range(nb_updates):
    print(f'i = {i}')
    new_params = copy.deepcopy( loss_net.state_dict() )
    ## w^<t> := f(w^<t-1>,delta^<t-1>)
    for (name, w) in list(loss_net.named_parameters()):
        hidden = updater_net(hidden).view(1)
        #delta = ((hidden**2)*w/2)
        delta = w + hidden
        wt = w + delta
        del_attr(loss_net, name.split("."))
        set_attr(loss_net, name.split("."), wt)
    ##
#
print()
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
print(f'loss_net.fc0.weight.is_leaf = {loss_net.fc0.weight.is_leaf}')
outputs = loss_net(x)
loss_val = 0.5*(target - outputs)**2
loss_val.backward()
print()
print(f'-- params that dont matter if they have gradients --')
print(f'loss_net.grad = {loss_net.fc0.weight.grad}')
print('-- params we want to have gradients --')
print(f'hidden.grad = {hidden.grad}') # None because hidden is not a leaf: it is overwritten in the for loop above.
print(f'updater_net.fc0.weight.grad = {updater_net.fc0.weight.grad}')
print(f'updater_net.fc0.bias.grad = {updater_net.fc0.bias.grad}')
make_dot(loss_val)

Output:

updater_net.fc0.weight.is_leaf = True
i = 0
i = 1

updater_net.fc0.weight.is_leaf = True
loss_net.fc0.weight.is_leaf = False

-- params that dont matter if they have gradients --
loss_net.grad = None
-- params we want to have gradients --
hidden.grad = None
updater_net.fc0.weight.grad = tensor([[0.7152]])
updater_net.fc0.bias.grad = tensor([-7.4249])
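A small check, assuming a standalone `nn.Linear` called `layer` rather than the `loss_net` above (the names `layer` and `h` are illustrative only), suggests why the loop above may not really perform two updates. Once a parameter is deleted and replaced by a plain tensor, `named_parameters()` no longer lists it, so a second pass over `named_parameters()` has nothing to iterate over.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
names_before = [n for n, _ in layer.named_parameters()]

h = torch.randn(1, requires_grad=True)
for name in names_before:
    w = getattr(layer, name)
    wt = w + h                 # non-leaf tensor, still attached to h
    delattr(layer, name)       # unregister the nn.Parameter
    setattr(layer, name, wt)   # stored as a plain attribute, not a Parameter

names_after = [n for n, _ in layer.named_parameters()]
print(names_before)  # ['weight', 'bias']
print(names_after)   # []

# The forward pass still works and gradients reach h,
# but only for this single update.
out = layer(torch.ones(1, 1)).sum()
out.backward()
print(h.grad is not None)  # True
```

This is consistent with the warning at the top of this answer: the first pass works, but later passes silently do nothing.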

Acknowledgement: the mighty albanD from the PyTorch team: https://discuss.pytorch.org/t/how-does-one-have-the-parameters-of-a-model-not-be-leafs/70076/9?u=pinocchio

— Pinocchio

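Everyone, this is wrong; do not use this code. It does not let gradients propagate through more than one update step. Use this instead: github.com/facebookresearch/higher

— Pinocchio

This does not work, people!

— Pinocchio

The higher library does not work for me yet either.

— Pinocchio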
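On the point in the comments about gradients not propagating through more than one step: the sketch below is one possible alternative, not the higher library's API and not something proposed in this thread. It assumes PyTorch 2.0 or newer, where `torch.func.functional_call` is available, and it uses a made-up update rule (`w + h`) purely for illustration. The idea is to leave the module's registered parameters alone and carry the updated weights in an ordinary dict of non-leaf tensors.

```python
import torch
import torch.nn as nn
from torch.func import functional_call  # assumes PyTorch 2.0 or newer

torch.manual_seed(0)
loss_net = nn.Linear(1, 1)
updater_net = nn.Linear(1, 1)
hidden = torch.randn(1, 1, requires_grad=True)

x = torch.randn(1, 1)
target = 12.0 * x**2

# Start from the registered parameters, then carry plain tensors forward.
params = dict(loss_net.named_parameters())
h = hidden  # keep the leaf `hidden` intact so its .grad gets populated
nb_updates = 3
for _ in range(nb_updates):
    new_params = {}
    for name, w in params.items():
        h = updater_net(h)                 # shape (1, 1)
        new_params[name] = w + h.view(-1)  # hypothetical update rule
    params = new_params                    # non-leaf tensors, graph intact

# Run loss_net's forward pass with the updated tensors in place of its parameters.
out = functional_call(loss_net, params, (x,))
loss = (0.5 * (target - out) ** 2).sum()
loss.backward()

print(hidden.grad is not None)                   # True
print(updater_net.weight.grad is not None)       # True
print(all(not p.is_leaf for p in params.values()))  # True
```

Because `params` is rebuilt from the previous `params` on every pass, each of the `nb_updates` steps stays in the graph, which is the behaviour the question asks for.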

0

You should keep the same tensors rather than creating new ones.

Go to their data attribute and set a new value.

for (name, w) in loss_net.named_parameters():
    ....
    w.data = wt.data

This worked for me in this question: How to assign a new value to a pytorch Variable without breaking backpropagation?

— Daniel Möller
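A caveat worth checking before relying on the `.data` approach for this particular question: assignments through `.data` are not recorded by autograd. The sketch below assumes a single `nn.Linear` called `net` and a tensor `h` standing in for the updater's output (both names are illustrative). It suggests that the parameter keeps its new values and stays a leaf, but no gradient flows back to whatever produced those values.

```python
import torch
import torch.nn as nn

net = nn.Linear(1, 1)                       # hypothetical stand-in for loss_net
h = torch.randn(1, 1, requires_grad=True)   # hypothetical stand-in for the updater output

wt = net.weight + h          # depends on h
net.weight.data = wt.data    # overwrite the values only; no graph is recorded

loss = net(torch.ones(1, 1)).sum()
loss.backward()
print(net.weight.is_leaf)    # True
print(h.grad)                # None
```

So this may be fine when only the values matter, but it does not seem to give the updater network a gradient.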