Prediction intervals around LSTM time-series forecasts

14

Is there a way to calculate a prediction interval (a probability distribution) around a time-series forecast from an LSTM (or other recurrent) neural network?

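For example, if I predict 10 samples into the future (t+1 to t+10) based on the last 10 observed samples (t-9 to t), I would expect the prediction at t+1 to be more accurate than the prediction at t+10. Typically, one would draw error bars around the prediction to show the interval. With an ARIMA model (assuming normally distributed errors), I can calculate a prediction interval (e.g. 95%) around each predicted value. Can I calculate the same thing, or something related to a prediction interval, from an LSTM model?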

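I have been working with LSTMs in Keras/Python, following many examples from machinelearningmastery.com, on which my example code (below) is based. I am considering reframing the problem as classification into discrete bins, since that produces a confidence per class, but that seems like a poor solution.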

There are a couple of similar topics (such as those below), but none seems to directly address prediction intervals from LSTM (or indeed other) neural networks:

/stats/25055/how-to-calculate-the-confidence-interval-for-time-series-prediction

Time series prediction using ARIMA vs LSTM

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sin
from matplotlib import pyplot
import numpy as np

# Build an LSTM network and train
def fit_lstm(X, y, batch_size, nb_epoch, neurons):
    X = X.reshape(X.shape[0], 1, X.shape[1]) # add in another dimension to the X data
    y = y.reshape(y.shape[0], y.shape[1])      # but don't add it to the y, as Dense has to be 1d?
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(y.shape[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False)
        model.reset_states()
    return model

# Configuration
n = 5000    # total size of dataset
SLIDING_WINDOW_LENGTH = 30
SLIDING_WINDOW_STEP_SIZE = 1
batch_size = 10
test_size = 0.1 # fraction of dataset to hold back for testing
nb_epochs = 100 # for training
neurons = 8 # LSTM layer complexity

# create dataset
#raw_values = [sin(i/2) for i in range(n)]  # simple sine wave
raw_values = [sin(i/2)+sin(i/6)+sin(i/36)+np.random.uniform(-1,1) for i in range(n)]  # sum of three sines plus uniform noise
#raw_values = [(i%4) for i in range(n)] # saw tooth

all_data = np.array(raw_values).reshape(-1,1) # make into array, add another dimension for scikit-learn compatibility

# data is segmented using a sliding window mechanism
all_data_windowed = [np.transpose(all_data[idx:idx+SLIDING_WINDOW_LENGTH]) for idx in np.arange(0,len(all_data)-SLIDING_WINDOW_LENGTH, SLIDING_WINDOW_STEP_SIZE)]
all_data_windowed = np.concatenate(all_data_windowed, axis=0).astype(np.float32)

# split data into train and test-sets
# round datasets down to a multiple of the batch size
test_length = int(round((len(all_data_windowed) * test_size) / batch_size) * batch_size)
train, test = all_data_windowed[:-test_length,:], all_data_windowed[-test_length:,:]
train_length = int(np.floor(train.shape[0] / batch_size)*batch_size) 
train = train[:train_length,...]

half_size = int(SLIDING_WINDOW_LENGTH/2) # split the examples half-half, to forecast the second half
X_train, y_train = train[:,:half_size], train[:,half_size:]
X_test, y_test = test[:,:half_size], test[:,half_size:]

# fit the model
lstm_model = fit_lstm(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epochs, neurons=neurons)

# forecast the entire training dataset to build up state for forecasting
X_train_reshaped = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
lstm_model.predict(X_train_reshaped, batch_size=batch_size)

# predict from test dataset
X_test_reshaped = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
yhat = lstm_model.predict(X_test_reshaped, batch_size=batch_size)

#%% Plot prediction vs actual

x_axis_input = range(half_size)
x_axis_output = [x_axis_input[-1]] + list(half_size+np.array(range(half_size)))

fig = pyplot.figure()
ax = fig.add_subplot(111)
line1, = ax.plot(x_axis_input,np.zeros_like(x_axis_input), 'r-')
line2, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'o-')
line3, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'g-')
ax.set_xlim(np.min(x_axis_input),np.max(x_axis_output))
ax.set_ylim(-4,4)
pyplot.legend(('Input','Actual','Predicted'),loc='upper left')
pyplot.show(block=False)  # non-blocking, so the update loop below can run

# update plot in a loop
for idx in range(y_test.shape[0]):

    sample_input = X_test[idx]
    sample_truth = [sample_input[-1]] + list(y_test[idx]) # join lists
    sample_predicted = [sample_input[-1]] + list(yhat[idx])

    line1.set_ydata(sample_input)
    line2.set_ydata(sample_truth)
    line3.set_ydata(sample_predicted)
    fig.canvas.draw()
    fig.canvas.flush_events()

    pyplot.pause(.25)

— 4Oh4

10

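Directly, this is not possible. However, if you model the problem differently, you can get confidence intervals. Instead of an ordinary regression, you can treat it as estimating a continuous probability distribution. If you do this for every step, you can plot the distribution. Ways to do this are Kernel Mixture Networks (https://janvdvegt.github.io/2017/06/07/Kernel-Mixture-Networks.html , disclosure: my blog) or Mixture Density Networks (http://www.cedar.buffalo.edu/~srihari/CSE574/Chap5/Chap5.7-MixDensityNetworks.pdf). The first uses kernels as a basis and estimates a mixture over those kernels; the second estimates a mixture of distributions, including the parameters of each distribution. You train the model with the log-likelihood.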
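As a rough illustration of "train with the log-likelihood", here is a minimal NumPy sketch of the negative log-likelihood of a Gaussian mixture, the quantity a mixture density network would minimise. The function name and argument layout are assumptions made for this example; in a real network the three parameter arrays would be the outputs of the final layer, and the loss would be written with the framework's tensor operations.

```python
import numpy as np

def gaussian_mixture_nll(y, logits, means, log_sigmas):
    """Mean negative log-likelihood of y under a per-sample Gaussian mixture.

    y:          (n,)   observed targets
    logits:     (n, k) unnormalised mixture weights
    means:      (n, k) component means
    log_sigmas: (n, k) log of the component standard deviations
    (hypothetical layout, chosen for this sketch)
    """
    # Log-softmax of the mixture weights, computed in a numerically stable way.
    log_w = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    # Log-density of y under each Gaussian component.
    z = (y[:, None] - means) / np.exp(log_sigmas)
    log_pdf = -0.5 * z**2 - log_sigmas - 0.5 * np.log(2.0 * np.pi)
    # log sum_k w_k * N(y | mean_k, sigma_k), again via log-sum-exp.
    log_lik = np.logaddexp.reduce(log_w + log_pdf, axis=1)
    return -np.mean(log_lik)

# Two identical standard-normal components behave like a single standard normal.
y_demo = np.array([0.0, 1.0])
params = np.zeros((2, 2))
nll_demo = gaussian_mixture_nll(y_demo, params, params, params)
```

Once such a model is trained, a prediction interval for each step can be read off the fitted mixture, for example by sampling from it and taking quantiles.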

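Another option for modelling the uncertainty is to use dropout during training and then also during inference. You run inference multiple times, and each run gives you one sample from your posterior. You do not get a distribution, only samples, but it is the easiest option to implement and it works very well.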
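The idea can be sketched without any deep-learning library. The tiny network below, its random weights, and the dropout rate are all hypothetical stand-ins for a trained model; the only point is that the dropout mask is redrawn on every forward pass, so repeated passes give different forecasts. In Keras, the usual way to get the same effect is to call the model as `model(x, training=True)`, which assumes the model contains a Dropout layer (the one in the question does not).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights standing in for an already-trained one-hidden-layer network.
W1 = rng.normal(size=(15, 32))
W2 = rng.normal(size=(32, 10)) / np.sqrt(32)

def predict_with_dropout(x, rate=0.2):
    """One stochastic forward pass with dropout left switched on."""
    h = np.tanh(x @ W1)
    keep = rng.random(h.shape) >= rate      # fresh random mask on every call
    h = h * keep / (1.0 - rate)             # inverted-dropout scaling
    return h @ W2

x = rng.normal(size=(1, 15))                # one input window of 15 values
# 200 stochastic passes, each giving one sampled 10-step forecast.
samples = np.stack([predict_with_dropout(x)[0] for _ in range(200)])

point_forecast = samples.mean(axis=0)
lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)
```

`samples` has one row per stochastic pass and one column per forecast step, so `lower` and `upper` give an empirical 95% band for each of the 10 steps.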

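In your case you have to think about how you generate t+2 up to t+10. Depending on your current setup, you may have to sample from the previous time step and feed that sample in for the next step. That does not work very well with either the first or the second approach. If you have 10 outputs per time step (t+1 up to t+10), all of these approaches are cleaner, but a little less intuitive.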
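To make the "sample and feed back" option concrete, here is a small sketch. The one-step sampler is a hypothetical stand-in (a noisy AR(1) rule) for a model that outputs a distribution for the next step; the coefficients and path count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next(last_value):
    """Hypothetical one-step probabilistic model: draws y[t+1] given y[t]."""
    return 0.9 * last_value + rng.normal(scale=0.5, size=np.shape(last_value))

n_paths, horizon = 1000, 10
paths = np.empty((n_paths, horizon))
current = np.full(n_paths, 1.0)             # last observed value, one copy per path
for step in range(horizon):
    current = sample_next(current)          # each sampled value becomes the next input
    paths[:, step] = current

# Per-horizon 95% band from the simulated trajectories.
lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)
width = upper - lower                       # one width per step, t+1 .. t+10
```

In this toy setting `width` grows with the horizon, which matches the expectation in the question that t+10 is less certain than t+1.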

— Jan van der Vegt

2

Using mixture networks would be interesting. There is some solid research on using dropout here: arxiv.org/abs/1709.01907 and arxiv.org/abs/1506.02142

— 4Oh4

A note on dropout: you can actually compute the variance of the Monte Carlo dropout predictions and use it as a quantification of the uncertainty.

— Charles Chow

That's true @CharlesChow, but it is a poor way to construct a confidence interval in this context. Because the distribution is potentially skewed, it is better to sort the values and use quantiles.

— Jan van der Vegt

Agreed @JanvanderVegt, but you can still estimate statistics of MC dropout without assuming an output distribution. I mean that you can also use percentiles or bootstrapping to construct the CI from MC dropout.

— Charles Chow
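The difference between the two constructions discussed in these comments can be seen on a deliberately skewed set of samples. The lognormal draws below are only a stand-in for MC dropout samples of a single forecast value; the distribution and its parameters are assumptions chosen to make the skew obvious.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MC dropout samples of one forecast value; strictly positive, right-skewed.
samples = rng.lognormal(mean=0.0, sigma=0.75, size=5000)

# Symmetric interval from mean and standard deviation (implicitly assumes normality).
mu, sd = samples.mean(), samples.std()
normal_interval = (mu - 1.96 * sd, mu + 1.96 * sd)

# Distribution-free interval from the empirical 2.5% and 97.5% quantiles.
quantile_interval = tuple(np.percentile(samples, [2.5, 97.5]))

# Share of samples falling below each lower bound.
below_normal = np.mean(samples < normal_interval[0])
below_quantile = np.mean(samples < quantile_interval[0])
```

Here the mean-and-variance interval extends below zero even though every sample is positive, so none of the samples fall under its lower bound, while the quantile interval leaves about 2.5% of the samples in each tail.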

2

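Conformal prediction, as a buzzword, might be interesting to you because it works under many conditions. In particular, it does not need normally distributed errors, and it works with almost any machine learning model.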

Two nice introductions are given by Scott Locklin and Henrik Linusson.
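As a rough sketch of the simplest variant (split conformal prediction with absolute residuals as the nonconformity score), the function below turns point forecasts into intervals using a held-out calibration set. The function name and the synthetic data are assumptions for this example. Note that the coverage guarantee relies on the calibration and new data being exchangeable, which is not automatically true for time series.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, new_pred, alpha=0.05):
    """Symmetric split-conformal interval around new_pred (illustrative helper)."""
    # Nonconformity score: absolute error on the held-out calibration set.
    scores = np.sort(np.abs(np.asarray(cal_true) - np.asarray(cal_pred)))
    n = len(scores)
    # Finite-sample corrected rank of the score used as the interval half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]
    return new_pred - q, new_pred + q

rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
# Hypothetical forecasts with heavy-tailed (non-normal) errors.
pred = truth + rng.standard_t(df=3, size=2000)

lo, hi = split_conformal_interval(pred[:1000], truth[:1000], pred[1000:])
coverage = np.mean((truth[1000:] >= lo) & (truth[1000:] <= hi))
```

`coverage` should come out close to the nominal 95% even though the errors are far from Gaussian. For a multi-step forecast, one option is to calibrate a separate half-width per horizon.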

— Boris

1

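I am going to diverge a bit and argue that computing confidence intervals is, in practice, usually not a worthwhile thing to do. The reason is that there are always a whole bunch of assumptions you need to make. Even for the simplest linear regression, you need: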

A linear relationship.
Multivariate normality.
No or little multicollinearity.
No autocorrelation.
Homoscedasticity.

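A much more pragmatic approach is to run a Monte Carlo simulation. If you already know, or are willing to make an assumption about, the distribution of your input variables, take a large number of samples and feed them to your LSTM. You can then calculate your "confidence interval" empirically.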
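A minimal sketch of that procedure follows. `model_predict` is a hypothetical placeholder for the trained network's predict call, and the Gaussian input noise with standard deviation 0.3 is an assumption about the input distribution, not something given in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_predict(windows):
    """Hypothetical stand-in for the trained model: (n, 15) inputs -> (n, 15) forecasts."""
    return np.cumsum(np.tanh(windows), axis=1)

observed = np.sin(np.arange(15) / 2.0)      # one observed input window
input_noise_sd = 0.3                        # assumed input uncertainty

# Draw many plausible inputs from the assumed distribution and push each through the model.
draws = observed + rng.normal(scale=input_noise_sd, size=(2000, 15))
forecasts = model_predict(draws)

# Empirical 95% band per forecast step.
lower, upper = np.percentile(forecasts, [2.5, 97.5], axis=0)
```

This captures only the uncertainty that comes from the inputs; it says nothing about errors in the model itself.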

— Louis T

1

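Yes, you can. The only thing you need to change is the loss function. Implement the loss function used in quantile regression and plug it in. You will also want to look at how to evaluate those intervals. For that, I would use the ICP, MIL and RMIL metrics.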
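For reference, the quantile-regression loss is the pinball loss. The NumPy sketch below shows it together with two interval checks. Reading ICP as "fraction of observations inside the interval" and MIL as "mean interval length" is an assumption on my part, and RMIL is left out because its exact definition is not given here. To train a network with it, the same expression would be rewritten with the framework's tensor operations and passed as the loss, one quantile level per output.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a quantile level q in (0, 1)."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))

def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper] (assumed meaning of ICP)."""
    y_true = np.asarray(y_true)
    return np.mean((y_true >= lower) & (y_true <= upper))

def mean_interval_length(lower, upper):
    """Average interval width (assumed meaning of MIL)."""
    return np.mean(np.asarray(upper) - np.asarray(lower))

rng = np.random.default_rng(0)
y = rng.normal(size=10000)

# The loss at q=0.9 is smallest for a constant prediction near the true 90% quantile (about 1.28).
loss_at_quantile = pinball_loss(y, 1.28, 0.9)
loss_at_median = pinball_loss(y, 0.0, 0.9)

# Evaluate a fixed interval that should cover about 90% of standard-normal data.
icp = interval_coverage(y, -1.645, 1.645)
mil = mean_interval_length(np.full(y.shape, -1.645), np.full(y.shape, 1.645))
```

Fitting one output at q=0.025 and another at q=0.975 gives the two ends of a 95% interval; coverage and length are then checked together, since a very wide interval trivially achieves high coverage.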

— Inigo