What is the difference between Conv1D and Conv2D?


I was going through the Keras convolution documentation and found two types of convolution, Conv1D and Conv2D. I did some web searching, and this is what I understand about them: Conv1D is used for sequences and Conv2D is used for images.

I always thought convolutional neural networks were used only for images, and I visualized a CNN this way.

An image is treated as a large matrix, and a filter slides over this matrix computing dot products. I believe this is what Keras calls Conv2D. If Conv2D works this way, what is the mechanism of Conv1D, and how can I picture it?

— Eka


Have a look at this answer. I hope it helps.

— learner101


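Convolution is a mathematical operation that "summarizes" a tensor, matrix, or vector into a smaller one. If your input is one-dimensional, you summarize along that dimension; if a tensor has n dimensions, you can summarize along all n of them. Conv1D and Conv2D summarize (convolve) along one or two dimensions, respectively.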

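For example, you can convolve a vector into a shorter vector as follows. Take a "long" vector A with n elements and convolve it with a weight vector W with m elements into a "short" (summary) vector B with n - m + 1 elements, where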

$$b_i = \sum_{j=0}^{m-1} a_{i+j}\, w_j, \qquad i = 1, \dots, n-m+1$$
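To make the index bookkeeping concrete, here is a minimal plain-Python sketch of the formula above. It is only an illustration: the function name `conv1d_valid` and the sample numbers are made up, and Python lists are 0-based, so $i$ runs from 0 to $n-m$ instead of 1 to $n-m+1$. Like the formula, it slides the weights without flipping them, which is what deep-learning libraries call convolution (strictly speaking, cross-correlation).

```python
def conv1d_valid(a, w):
    """Return b with b[i] = sum_j a[i + j] * w[j], for every shift where w fits inside a."""
    n, m = len(a), len(w)
    if m > n:
        raise ValueError("the weight vector must not be longer than the input")
    # n - m + 1 output positions (0-based i = 0 .. n - m)
    return [sum(a[i + j] * w[j] for j in range(m)) for i in range(n - m + 1)]


a = [1, 2, 3, 4, 5]  # n = 5
w = [1, 0, -1]       # m = 3
print(conv1d_valid(a, w))  # [-2, -2, -2]  (n - m + 1 = 3 elements)
```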

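So if you have a vector of length $n$ and your weight vector is also of length $n$ with all weights $w_i = 1/n$, the convolution produces a scalar (a vector of length 1) equal to the average of all values in the input vector. It is a sort of degenerate convolution, if you will. If the weight vector is one element shorter than the input (with weights $w_i = 1/m$), you get a moving average of length 2; in general the output has $n - m + 1$ elements. For $n = 3$ and $m = 2$: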

$$\begin{bmatrix} a: & a_1 & a_2 & a_3 \\ w: & 1/2 & 1/2 & \\ w: & & 1/2 & 1/2 \end{bmatrix} = \begin{bmatrix} b: & \frac{a_1 + a_2}{2} & \frac{a_2 + a_3}{2} \end{bmatrix}$$
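Both special cases can be checked numerically with a small plain-Python version of the same formula. The helper name and the input values are hypothetical, and the weights were chosen as exact binary fractions so the floating-point results print cleanly.

```python
def conv1d_valid(a, w):
    # b[i] = sum_j a[i + j] * w[j] for every shift where w fits inside a
    m = len(w)
    return [sum(a[i + j] * w[j] for j in range(m)) for i in range(len(a) - m + 1)]


# Degenerate case: weight vector as long as the input, every weight 1/n -> a single mean value
a4 = [2.0, 4.0, 6.0, 8.0]
print(conv1d_valid(a4, [1 / len(a4)] * len(a4)))  # [5.0]

# Weight vector one element shorter, weights 1/2 -> moving average of length 2
a3 = [2.0, 4.0, 9.0]
print(conv1d_valid(a3, [0.5, 0.5]))  # [3.0, 6.5]
```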

You can do the same with a three-dimensional tensor, where

$$b_{ikl} = \sum_{j_1=0}^{m_1-1} \sum_{j_2=0}^{m_2-1} \sum_{j_3=0}^{m_3-1} a_{i+j_1,\,k+j_2,\,l+j_3}\, w_{j_1 j_2 j_3}$$

$$i = 1, \dots, n_1-m_1+1, \quad k = 1, \dots, n_2-m_2+1, \quad l = 1, \dots, n_3-m_3+1$$
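The triple sum translates directly into nested loops. This is a sketch rather than an efficient implementation: indices are 0-based, `conv3d_valid` is a hypothetical name, and the input is filled with $9i + 3k + l$ only so the result is easy to check by hand.

```python
def conv3d_valid(a, w):
    """b[i][k][l] = sum over j1, j2, j3 of a[i+j1][k+j2][l+j3] * w[j1][j2][j3] (0-based)."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    m1, m2, m3 = len(w), len(w[0]), len(w[0][0])
    return [
        [
            [
                sum(
                    a[i + j1][k + j2][l + j3] * w[j1][j2][j3]
                    for j1 in range(m1)
                    for j2 in range(m2)
                    for j3 in range(m3)
                )
                for l in range(n3 - m3 + 1)
            ]
            for k in range(n2 - m2 + 1)
        ]
        for i in range(n1 - m1 + 1)
    ]


# A 3 x 3 x 3 input holding the numbers 0..26, and a 2 x 2 x 2 kernel of ones
a = [[[9 * i + 3 * k + l for l in range(3)] for k in range(3)] for i in range(3)]
w = [[[1, 1], [1, 1]] for _ in range(2)]
b = conv3d_valid(a, w)
print(len(b), len(b[0]), len(b[0][0]))  # 2 2 2  (each n - m + 1 = 3 - 2 + 1)
print(b[0][0][0])  # 52  (0 + 1 + 3 + 4 + 9 + 10 + 12 + 13)
```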

— Aksakal


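This 1D convolution is a cost saver (a 1D kernel of size k has k weights per channel, versus k² for a k×k 2D kernel). It works the same way, but assumes a one-dimensional array that is multiplied with the elements. To visualize it, think of a matrix that is a single row or a single column, i.e. a single dimension. When we multiply, we get an array of the same shape (with "same" padding) but with lower or higher values, so it helps to maximize or minimize the intensity of values.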
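As a rough illustration of the "same shape, lower or higher values" point, here is a plain-Python sketch of a 1D convolution with zero padding chosen so the output keeps the input's length. The function name, the signal, and both kernels are made up for the example; the odd kernel length is required only by this simple padding scheme.

```python
def conv1d_same(x, k):
    """Slide an odd-length kernel k over x with zero padding so the output has len(x) elements."""
    if len(k) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    pad = len(k) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * k[j] for j in range(len(k))) for i in range(len(x))]


signal = [1.0, 1.0, 4.0, 1.0, 1.0]

# Same shape as the input; a kernel with weights below 1 lowers the peak (smoothing)...
print(conv1d_same(signal, [0.25, 0.5, 0.25]))  # [0.75, 1.75, 2.5, 1.75, 0.75]
# ...and a kernel with a large centre weight raises it (sharpening)
print(conv1d_same(signal, [-1.0, 3.0, -1.0]))  # [2.0, -2.0, 10.0, -2.0, 2.0]
```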

This image might help.

For more details, see https://www.youtube.com/watch?v=qVP574skyuM.

— 리브스


I'll use a PyTorch perspective, but the logic stays the same.

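When using Conv1d(), we have to keep in mind that we are most likely working with 2-dimensional inputs (not counting the batch dimension), such as one-hot-encoded DNA sequences or black-and-white pictures.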
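For example, a DNA sequence can be one-hot encoded into such a 2D input, with one row (channel) per base and one column per position. The helper below is a hypothetical plain-Python sketch; adding a leading batch axis of size 1 would give the `[batch, in_channels, length]` layout that `nn.Conv1d` expects.

```python
def one_hot_dna(seq, alphabet="ACGT"):
    """Return a len(alphabet) x len(seq) matrix: one row (channel) per base, one column per position."""
    seq = seq.upper()
    unknown = set(seq) - set(alphabet)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return [[1.0 if base == letter else 0.0 for base in seq] for letter in alphabet]


x = one_hot_dna("GATTACA")
for letter, row in zip("ACGT", x):
    print(letter, row)
# A [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
# C [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
# G [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# T [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```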

The only difference between the more common Conv2d() and Conv1d() is that the latter uses a one-dimensional kernel, as shown in the picture below.

Here, the height of your input data becomes the "depth" (or in_channels), and our rows become the kernel size. For example:

import torch
import torch.nn as nn

tensor = torch.randn(1, 100, 4)  # [batch, in_channels (the "height"), length]
conv = nn.Conv1d(in_channels=100, out_channels=1, kernel_size=1, stride=1)
output = conv(tensor)
# output.shape == torch.Size([1, 1, 4])

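You can see that the kernel automatically extends over the full height of the picture (just as in Conv2d(), where the kernel depth automatically extends over the image's channels). Therefore, all we have left to specify is the kernel size with respect to the span of rows.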

We just have to remember that, if we assume a 2-dimensional input, our filters become our columns and our rows become the kernel size.
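To see that the kernel spans the whole height, here is a plain-Python sketch of what `nn.Conv1d` computes for a single sample, assuming no bias, no padding, and `groups=1`. The weights are laid out as `[out_channels][in_channels][kernel_size]`, mirroring the shape of PyTorch's `Conv1d.weight`; the function name `conv1d_ncl` is hypothetical. Each kernel covers every input channel, so the only axis left to slide along is the length.

```python
import random


def conv1d_ncl(x, weight, stride=1):
    """Plain-Python sketch of a Conv1d forward pass for one sample (no bias, no padding).

    x:      in_channels x length
    weight: out_channels x in_channels x kernel_size
    """
    in_channels, length = len(x), len(x[0])
    kernel_size = len(weight[0][0])
    out_length = (length - kernel_size) // stride + 1
    return [
        [
            # every kernel reads all input channels at once, then moves along the length
            sum(
                kernel[c][j] * x[c][t * stride + j]
                for c in range(in_channels)
                for j in range(kernel_size)
            )
            for t in range(out_length)
        ]
        for kernel in weight
    ]


# Same shapes as the PyTorch snippet: 100 input channels, length 4, one kernel of size 1
random.seed(0)
x = [[random.gauss(0, 1) for _ in range(4)] for _ in range(100)]
weight = [[[random.gauss(0, 1)] for _ in range(100)]]  # 1 x 100 x 1
out = conv1d_ncl(x, weight)
print(len(out), len(out[0]))  # 1 4  (the [1, 1, 4] output without the batch axis)
```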

— Erick Platero

I took the picture from this earlier question: stackoverflow.com/questions/48859378/…

— Erick Platero


I'd like to explain the difference visually and in detail (in the code comments), in a very simple way.

Let's first check Conv2D in TensorFlow.

import tensorflow as tf

c1 = [[0, 0, 1, 0, 2], [1, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 0], [2, 1, 2, 2, 0]]
c2 = [[2, 1, 2, 1, 1], [2, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 2, 2, 2, 2], [0, 1, 2, 0, 1]]
c3 = [[2, 1, 1, 2, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 2, 1, 0], [2, 2, 1, 1, 1]]
data = tf.transpose(tf.constant([[c1, c2, c3]], dtype=tf.float32), (0, 2, 3, 1))
# we transpose [batch, in_channels, in_height, in_width] to [batch, in_height, in_width, in_channels]
# where batch = 1, in_channels = 3 (c1, c2, c3, i.e. x[:, :, 0], x[:, :, 1], x[:, :, 2] in the gif), and in_height and in_width are both 5 (the size of the blue matrices without padding)
f2c1 = [[0, 1, -1], [0, -1, 0], [0, -1, 1]]
f2c2 = [[-1, 0, 0], [1, -1, 0], [1, -1, 0]]
f2c3 = [[-1, 1, -1], [0, -1, -1], [1, 0, 0]]
filters = tf.transpose(tf.constant([[f2c1, f2c2, f2c3]], dtype=tf.float32), (2, 3, 1, 0))
# we transpose [out_channels, in_channels, filter_height, filter_width] to [filter_height, filter_width, in_channels, out_channels]
# out_channels is 1 (it is 2 in the gif, but here we only use one filter, W1); in_channels is 3 because data has three channels (c1, c2, c3); filter_height and filter_width are both 3 (the size of filter W1)
# f2c1, f2c2, f2c3 are w1[:, :, 0], w1[:, :, 1] and w1[:, :, 2] in the gif
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=2, padding=[[0, 0], [1, 1], [1, 1], [0, 0]]))
# this is just the o[:,:,1] in the gif
# <tf.Tensor: id=93, shape=(3, 3), dtype=float32, numpy=
# array([[-8., -8., -3.],
#        [-3.,  1.,  0.],
#        [-3., -8., -5.]], dtype=float32)>
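If TensorFlow isn't at hand, the same numbers can be reproduced with a plain-Python sketch of a single-filter, multi-channel Conv2D (again cross-correlation, with zero padding of 1 and stride 2 as in the snippet). The helper `conv2d` is hypothetical and written for readability, not speed.

```python
def conv2d(channels, kernels, stride=1, pad=0):
    """Single-filter, multi-channel 2D convolution (cross-correlation) in plain Python.

    channels: list of in_channels matrices (H x W each)
    kernels:  list of in_channels matrices (kh x kw each), i.e. one filter with a 2-D slice per channel
    """
    h, w = len(channels[0]), len(channels[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])

    def value(c, r, col):
        # Read the zero-padded input: anything outside the original matrix is 0
        r, col = r - pad, col - pad
        return channels[c][r][col] if 0 <= r < h and 0 <= col < w else 0

    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    return [
        [
            sum(
                kernels[c][i][j] * value(c, r * stride + i, col * stride + j)
                for c in range(len(channels))
                for i in range(kh)
                for j in range(kw)
            )
            for col in range(out_w)
        ]
        for r in range(out_h)
    ]


c1 = [[0, 0, 1, 0, 2], [1, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 0], [2, 1, 2, 2, 0]]
c2 = [[2, 1, 2, 1, 1], [2, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 2, 2, 2, 2], [0, 1, 2, 0, 1]]
c3 = [[2, 1, 1, 2, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 2, 1, 0], [2, 2, 1, 1, 1]]
f2c1 = [[0, 1, -1], [0, -1, 0], [0, -1, 1]]
f2c2 = [[-1, 0, 0], [1, -1, 0], [1, -1, 0]]
f2c3 = [[-1, 1, -1], [0, -1, -1], [1, 0, 0]]

print(conv2d([c1, c2, c3], [f2c1, f2c2, f2c3], stride=2, pad=1))
# [[-8, -8, -3], [-3, 1, 0], [-3, -8, -5]]
```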

Conv1D is a special case of Conv2D, as stated in this paragraph of the TensorFlow documentation for Conv1D:

Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.
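The shape bookkeeping in that paragraph can be sketched as a small function. This is an illustration under stated assumptions (the default `"NWC"` layout, dilation 1, and TensorFlow's usual output-size rules: `ceil((in_width - filter_width + 1) / stride)` for `"VALID"` and `ceil(in_width / stride)` for `"SAME"`); the function name is hypothetical, and it only computes shapes, not values. The example call uses the shapes from the NLP example below.

```python
import math


def conv1d_as_conv2d_shapes(input_shape, filter_shape, stride, padding):
    """Follow the reshapes described in the quote for the default "NWC" layout.

    input_shape:  (batch, in_width, in_channels)
    filter_shape: (filter_width, in_channels, out_channels)
    Returns the 4-D shapes handed to conv2d and the final 3-D output shape.
    """
    batch, in_width, in_channels = input_shape
    filter_width, filter_in, out_channels = filter_shape
    if filter_in != in_channels:
        raise ValueError("filter in_channels must match input in_channels")

    input_4d = (batch, 1, in_width, in_channels)  # insert a height axis of size 1
    filter_4d = (1, filter_width, in_channels, out_channels)

    # out_width depends on stride and padding, as for the width axis of conv2d
    if padding == "VALID":
        out_width = math.ceil((in_width - filter_width + 1) / stride)
    elif padding == "SAME":
        out_width = math.ceil(in_width / stride)
    else:
        raise ValueError("padding must be 'VALID' or 'SAME'")

    return input_4d, filter_4d, (batch, out_width, out_channels)


# 6 words, 3-dimensional embeddings, one filter of width 2, stride 2
print(conv1d_as_conv2d_shapes((1, 6, 3), (2, 3, 1), stride=2, padding="VALID"))
# ((1, 1, 6, 3), (1, 2, 3, 1), (1, 3, 1))
```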

Let's see how we can turn a Conv1D problem into a Conv2D problem. Since Conv1D is usually used in NLP scenarios, we can illustrate this with the following NLP problem:

cat = [0.7, 0.4, 0.5]
sitting = [0.2, -0.1, 0.1]
there = [-0.5, 0.4, 0.1]
dog = [0.6, 0.3, 0.5]
resting = [0.3, -0.1, 0.2]
here = [-0.5, 0.4, 0.1]
sentence = tf.constant([[cat, sitting, there, dog, resting, here]])  # shape: [1, 6, 3]
# sentence[:,:,0] is equivalent to x[:,:,0] or c1 in the first example and the same for sentence[:,:,1] and sentence[:,:,2]
data = tf.reshape(sentence, (1, 1, 6, 3))
# we reshape [batch, in_width, in_channels] to [batch, 1, in_width, in_channels] according to the quote above
# each dimension in the embedding is a channel(three in_channels)
f3c1 = [0.6, 0.2]
# equivalent to f2c1 in the first code snippet or w1[:,:,0] in the gif
f3c2 = [0.4, -0.1]
# equivalent to f2c2 in the first code snippet or w1[:,:,1] in the gif
f3c3 = [0.5, 0.2]
# equivalent to f2c3 in the first code snippet or w1[:,:,2] in the gif
# filters = tf.constant([[f3c1, f3c2, f3c3]])
# [out_channels, in_channels, filter_width]: [1, 3, 2]
# here we have also only one filter and also three channels in it. please compare these three with the three channels in W1 for the Conv2D in the gif
filter1D = tf.transpose(tf.constant([[f3c1, f3c2, f3c3]]), (2, 1, 0))
# shape: [2, 3, 1] for the conv1d example
filters = tf.expand_dims(filter1D, 0)  # shape: [1, 2, 3, 1]
# we transposed [out_channels, in_channels, filter_width] to [filter_width, in_channels, out_channels] and now add a leading axis to get [1, filter_width, in_channels, out_channels], as described in the TensorFlow conv1d documentation quoted above
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=(1, 1, 2, 1), padding="VALID"))
# the numbers for strides are for [batch, 1, in_width, in_channels] of the data input
# <tf.Tensor: id=119, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>

Now do the same with Conv1D (also in TensorFlow):

output = tf.squeeze(tf.nn.conv1d(sentence, filter1D, stride=2, padding="VALID"))
# <tf.Tensor: id=135, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>
# the scalar stride here applies to the in_width dimension
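As a dependency-free cross-check of the last two snippets, the plain-Python sketch below computes the sentence example twice: directly as a 1D convolution, and through the reshape trick (a height axis of size 1 on both the input and the filter) fed to a 2D convolution. Both helpers are hypothetical, support only `"VALID"` padding, and drop the batch axis.

```python
def conv1d_nwc(x, filt, stride):
    """x: in_width x in_channels; filt: filter_width x in_channels x out_channels. VALID padding."""
    in_width, in_channels = len(x), len(x[0])
    fw, out_channels = len(filt), len(filt[0][0])
    out_width = (in_width - fw) // stride + 1
    return [
        [
            sum(x[t * stride + j][c] * filt[j][c][o] for j in range(fw) for c in range(in_channels))
            for o in range(out_channels)
        ]
        for t in range(out_width)
    ]


def conv2d_nhwc(x, filt, stride_h, stride_w):
    """x: height x width x in_channels; filt: fh x fw x in_channels x out_channels. VALID padding."""
    h, w, in_channels = len(x), len(x[0]), len(x[0][0])
    fh, fw, out_channels = len(filt), len(filt[0]), len(filt[0][0][0])
    return [
        [
            [
                sum(
                    x[r * stride_h + i][t * stride_w + j][c] * filt[i][j][c][o]
                    for i in range(fh)
                    for j in range(fw)
                    for c in range(in_channels)
                )
                for o in range(out_channels)
            ]
            for t in range((w - fw) // stride_w + 1)
        ]
        for r in range((h - fh) // stride_h + 1)
    ]


cat, sitting, there = [0.7, 0.4, 0.5], [0.2, -0.1, 0.1], [-0.5, 0.4, 0.1]
dog, resting, here = [0.6, 0.3, 0.5], [0.3, -0.1, 0.2], [-0.5, 0.4, 0.1]
sentence = [cat, sitting, there, dog, resting, here]  # in_width = 6, in_channels = 3

f3c1, f3c2, f3c3 = [0.6, 0.2], [0.4, -0.1], [0.5, 0.2]  # one 1-D slice per channel
# filter1D[j][c][o]: filter_width = 2, in_channels = 3, out_channels = 1
filter1D = [[[f3c1[j]], [f3c2[j]], [f3c3[j]]] for j in range(2)]

direct = [row[0] for row in conv1d_nwc(sentence, filter1D, stride=2)]
# Reshape trick: wrap input and filter in a height axis of size 1, then run a 2D convolution
via_2d = [px[0] for px in conv2d_nhwc([sentence], [filter1D], stride_h=1, stride_w=2)[0]]

print([round(v, 6) for v in direct])  # [0.9, 0.1, 0.12]
print([round(v, 6) for v in via_2d])  # [0.9, 0.1, 0.12]
```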

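The 2D in Conv2D means that each channel of the input and of the filter is two-dimensional (as we see in the gif example), and the 1D in Conv1D means that each channel of the input and of the filter is one-dimensional (as we see in the cat and dog NLP example).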

— Lerner Zhang