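I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the next learning-rate decay.

I'm using PyTorch for network implementation and training.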
Following are my experimental setups:
Setup-1: NO learning rate decay, and
         using the same Adam optimizer for all epochs
Setup-2: NO learning rate decay, and
         creating a new Adam optimizer with the same initial values every epoch
Setup-3: 0.25 decay in learning rate every 25 epochs, and
         creating a new Adam optimizer every epoch
Setup-4: 0.25 decay in learning rate every 25 epochs, and
         NOT creating a new Adam optimizer every time, but rather
         using PyTorch's "MultiStepLR" and "ExponentialLR" decay schedulers
         every 25 epochs
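For reference, here is a minimal sketch of the learning-rate values these decay rules imply. The initial learning rate `base_lr = 1e-3` and the helper names `step_decay_lr` and `exponential_lr` are assumptions for illustration only; the gamma of 0.95 is the value used with ExponentialLR in Setup-4.

```python
def step_decay_lr(base_lr, epoch, step=25, gamma=0.25):
    """Learning rate when it is multiplied by `gamma` once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def exponential_lr(base_lr, epoch, gamma=0.95):
    """Learning rate when it is multiplied by `gamma` after every epoch."""
    return base_lr * gamma ** epoch


base_lr = 1e-3  # assumed value; the post does not state the actual m_lr

for epoch in (0, 24, 25, 50, 75):
    print(epoch, step_decay_lr(base_lr, epoch), exponential_lr(base_lr, epoch))
```

With the step rule the learning rate stays constant within each 25-epoch block and drops by a factor of 4 at epochs 25, 50 and 75, which is where the loss jumps would be expected to line up.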
I am getting very surprising results for Setups #2, #3 and #4 and am unable to come up with any explanation for them. Following are my results:
Setup-1 Results:
Here I'm NOT decaying the learning rate and
I'm using the same Adam optimizer. So my results are as expected.
My loss decreases with more epochs.
Below is the loss plot for this setup.
Plot-1:
    optimizer = torch.optim.Adam(lr=m_lr,amsgrad=True, ...........)

    for epoch in range(num_epochs):
        running_loss = 0.0
        for i in range(num_train):
            train_input_tensor = ..........
            train_label_tensor = ..........
            optimizer.zero_grad()
            pred_label_tensor = model(train_input_tensor)
            loss = criterion(pred_label_tensor, train_label_tensor)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss_history[m_lr].append(running_loss/num_train)
Setup-2 Results:
Here I'm NOT decaying the learning rate but every epoch I'm creating a new
Adam optimizer with the same initial parameters.
Here also results show similar behavior as Setup-1.
Because a new Adam optimizer is created at every epoch, the calculated gradients
for each parameter should be lost, but it seems that this does not affect the
network learning. Can anyone please help with this?
Plot-2:
    for epoch in range(num_epochs):
        optimizer = torch.optim.Adam(lr=m_lr,amsgrad=True, ...........)

        running_loss = 0.0
        for i in range(num_train):
            train_input_tensor = ..........
            train_label_tensor = ..........
            optimizer.zero_grad()
            pred_label_tensor = model(train_input_tensor)
            loss = criterion(pred_label_tensor, train_label_tensor)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss_history[m_lr].append(running_loss/num_train)
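To make concrete what is discarded when the optimizer is recreated in Setup-2, here is a minimal sketch that inspects Adam's per-parameter state. The tiny `torch.nn.Linear` model, the random data and the learning rate of 1e-3 are stand-ins I am assuming for illustration; they are not the real network or settings. The gradients themselves live on the parameters and are recomputed at every step, so what a fresh optimizer drops is its running moment estimates.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)      # hypothetical stand-in for the auto-encoder
criterion = torch.nn.MSELoss()


def one_step(optimizer):
    """Run a single training step on random (assumed) data."""
    optimizer.zero_grad()
    loss = criterion(model(torch.randn(8, 4)), torch.randn(8, 1))
    loss.backward()
    optimizer.step()


# An optimizer that has taken a step holds per-parameter buffers.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
one_step(optimizer)
state_keys_after_step = sorted(optimizer.state[model.weight].keys())

# A newly created optimizer starts with no state at all.
fresh = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
fresh_state_size = len(fresh.state)

print(state_keys_after_step)
print(fresh_state_size)
```

The printed keys include `exp_avg`, `exp_avg_sq` and, because `amsgrad=True`, `max_exp_avg_sq`. Recreating the optimizer every epoch resets these buffers and the step counter used for bias correction, while the model weights are left untouched.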
Setup-3 Results:
As can be seen from the results in the plot below,
my loss jumps every time I decay the learning rate. This is weird behavior.
If it were happening because I'm creating a new Adam
optimizer every epoch, then it should have happened in Setup #1, #2 as well.
And if it is happening due to the creation of a new Adam optimizer with a new
learning rate (alpha) every 25 epochs, then the results of Setup #4 below also
rule out such a correlation.
Plot-3:
    decay_rate = 0.25
    for epoch in range(num_epochs):
        optimizer = torch.optim.Adam(lr=m_lr,amsgrad=True, ...........)

        if epoch % 25 == 0 and epoch != 0:
            lr *= decay_rate   # decay the learning rate

        running_loss = 0.0
        for i in range(num_train):
            train_input_tensor = ..........
            train_label_tensor = ..........
            optimizer.zero_grad()
            pred_label_tensor = model(train_input_tensor)
            loss = criterion(pred_label_tensor, train_label_tensor)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss_history[m_lr].append(running_loss/num_train)
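One way to separate the two effects mixed together in Setup-3 (a fresh optimizer and a smaller learning rate) would be to keep a single optimizer and scale its learning rate in place through `optimizer.param_groups`. This is only a sketch under assumptions: the `torch.nn.Linear` model and the initial `m_lr = 1e-3` are placeholders, and the inner training loop is omitted.

```python
import torch

model = torch.nn.Linear(4, 1)   # hypothetical stand-in for the auto-encoder
m_lr = 1e-3                     # assumed initial learning rate
decay_rate = 0.25
num_epochs = 60                 # assumed, just long enough to see two decays

# One optimizer for the whole run, so Adam's state is never reset.
optimizer = torch.optim.Adam(model.parameters(), lr=m_lr, amsgrad=True)

lr_per_epoch = []
for epoch in range(num_epochs):
    if epoch % 25 == 0 and epoch != 0:
        for group in optimizer.param_groups:
            group["lr"] *= decay_rate   # same optimizer, smaller step size
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])
    # ... the usual inner training loop would run here ...
```

Recording `lr_per_epoch` next to the loss history also gives a direct check that the learning rate the optimizer actually uses is the one intended at each epoch.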
Setup-4 Results:
In this setup, I'm using PyTorch's learning-rate-decay scheduler (MultiStepLR),
which decays the learning rate by a factor of 0.25 every 25 epochs.
Here also, the loss jumps every time the learning rate is decayed.

As suggested by @Dennis in the comments below, I tried both ReLU and 1e-02 leakyReLU nonlinearities. However, the results look similar: the loss first decreases, then increases, and then saturates at a higher value than what I would achieve without learning rate decay.

Plot-4 shows the results.
Plot-4:
    # Schedulers tried (one at a time):
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer=optimizer, milestones=[25,50,75], gamma=0.25)

    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=0.95)

    scheduler = ......... # defined above
    optimizer = torch.optim.Adam(lr=m_lr,amsgrad=True, ...........)

    for epoch in range(num_epochs):

        scheduler.step()

        running_loss = 0.0
        for i in range(num_train):
            train_input_tensor = ..........
            train_label_tensor = ..........
            optimizer.zero_grad()
            pred_label_tensor = model(train_input_tensor)
            loss = criterion(pred_label_tensor, train_label_tensor)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        loss_history[m_lr].append(running_loss/num_train)
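As a sanity check on the scheduler wiring in Setup-4, the sketch below logs the learning rate the optimizer actually uses at the start of each epoch. It assumes a toy `torch.nn.Linear` model, random data, an initial learning rate of 1e-3 and shortened milestones `[2, 4]` so the decay shows up within a few epochs. The optimizer is created before the scheduler that wraps it, and `scheduler.step()` is called after `optimizer.step()`, which is the order the PyTorch documentation recommends since version 1.1.0.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical stand-in for the auto-encoder
criterion = torch.nn.MSELoss()

# The optimizer must exist before the scheduler that wraps it.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2, 4], gamma=0.25  # shortened milestones (assumed)
)

lr_log = []
for epoch in range(6):
    # Learning rate in effect for this epoch.
    lr_log.append(optimizer.param_groups[0]["lr"])

    optimizer.zero_grad()
    loss = criterion(model(torch.randn(8, 4)), torch.randn(8, 1))
    loss.backward()
    optimizer.step()

    scheduler.step()            # advance the schedule once per epoch

print(lr_log)
```

Plotting such a log together with the loss curve shows whether each jump coincides exactly with a change in the learning rate.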
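EDITS:
- As suggested in the comments and answers below, I've made changes to my code and trained the model. I've added the code and plots for the same.
- I tried various lr_scheduler classes in PyTorch (MultiStepLR, ExponentialLR), and the plots for the same are listed in Setup-4, as suggested by @Dennis in the comments below.
- Tried leakyReLU, as suggested by @Dennis in the comments.

Any help is appreciated. Thanks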