Confused by MCMC Metropolis-Hastings variants: Random-Walk, Non-Random-Walk, Independent, Metropolis

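For the past few weeks I have been trying to understand MCMC and the Metropolis-Hastings algorithm(s). Every time I think I understand it, I realise that I am wrong. Most of the code examples I find implement something that does not match their description: they say they implement Metropolis-Hastings, but they actually implement random-walk Metropolis. Others (almost always) silently skip the Hastings correction ratio because they use a symmetric proposal distribution. In fact, I have not found a single simple example that calculates the ratio so far, which makes me even more confused. Can someone give me code examples (in any language) of the following: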

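Vanilla non-random-walk Metropolis-Hastings algorithm with the Hastings correction ratio calculated explicitly (even if it ends up being 1 when a symmetric proposal distribution is used).
Vanilla random-walk Metropolis-Hastings algorithm.
Vanilla independent Metropolis-Hastings algorithm.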

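There is no need to provide the Metropolis algorithm because, if I am not mistaken, the only difference between Metropolis and Metropolis-Hastings is that the former always samples from a symmetric proposal distribution and therefore has no Hastings correction ratio. There is also no need for a detailed explanation of the algorithms. I understand the basics, but I am confused by all the different names for the variants of the Metropolis-Hastings algorithm, and by how to actually implement the Hastings correction ratio in vanilla non-random-walk MH. Please do not paste links that partially answer my question, because I have most likely already seen them. Those links are what led to this confusion. Thank you.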

mcmc metropolis-hastings

— AstrOne

Answers:

Here are three examples. To make the logic clearer, I have made the code much less efficient than it would be in a real application.

# We'll assume estimation of a Poisson mean as a function of x
x <- runif(100)
y <- rpois(100,5*x)  # beta = 5 where mean(y[i]) = beta*x[i]

# Prior distribution on log(beta): t(5) with mean 2 
# (Very spread out on original scale; median = 7.4, roughly)
log_prior <- function(log_beta) dt(log_beta-2, 5, log=TRUE)

# Log likelihood
log_lik <- function(log_beta, y, x) sum(dpois(y, exp(log_beta)*x, log=TRUE))

# Random Walk Metropolis-Hastings 
# Proposal is centered at the current value of the parameter

rw_proposal <- function(current) rnorm(1, current, 0.25)
rw_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.25, log=TRUE)
rw_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.25, log=TRUE)

rw_alpha <- function(proposal, current) {
   # Due to the structure of the rw proposal distribution, the rw_p_proposal_given_current and
   # rw_p_current_given_proposal terms cancel out, so we don't need to include them - although
   # logically they are still there:  p(prop|curr) = p(curr|prop) for all curr, prop
   exp(log_lik(proposal, y, x) + log_prior(proposal) - log_lik(current, y, x) - log_prior(current))
}

# Independent Metropolis-Hastings
# Note: the proposal is independent of the current value (hence the name), but I maintain the
# parameterization of the functions anyway.  The proposal is not ignorable any more
# when calculating the acceptance probability, as p(curr|prop) != p(prop|curr) in general.

ind_proposal <- function(current) rnorm(1, 2, 1) 
ind_p_proposal_given_current <- function(proposal, current) dnorm(proposal, 2, 1, log=TRUE)
ind_p_current_given_proposal <- function(current, proposal) dnorm(current, 2, 1, log=TRUE)

ind_alpha <- function(proposal, current) {
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + ind_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}

# Vanilla Metropolis-Hastings - the independence sampler would do here, but I'll add something
# else for the proposal distribution; a Normal(current, 0.1+abs(current)/5) - symmetric but with a different
# scale depending upon location, so can't ignore the proposal distribution when calculating alpha as
# p(prop|curr) != p(curr|prop) in general

van_proposal <- function(current) rnorm(1, current, 0.1+abs(current)/5)
van_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.1+abs(current)/5, log=TRUE)
van_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.1+abs(proposal)/5, log=TRUE)

van_alpha <- function(proposal, current) {
   # Use the van_* proposal densities: the Hastings correction must match the proposal actually sampled from
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + van_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - van_p_proposal_given_current(proposal, current))
}


# Generate the chain
values <- rep(0, 10000) 
u <- runif(length(values))
naccept <- 0
current <- 1  # Initial value
propfunc <- van_proposal  # Substitute ind_proposal or rw_proposal here
alphafunc <- van_alpha    # Substitute ind_alpha or rw_alpha here
for (i in 1:length(values)) {
   proposal <- propfunc(current)
   alpha <- alphafunc(proposal, current)
   if (u[i] < alpha) {
      values[i] <- exp(proposal)
      current <- proposal
      naccept <- naccept + 1
   } else {
      values[i] <- exp(current)
   }
}
naccept / length(values)
summary(values)

For the vanilla sampler, we get:

> naccept / length(values)
[1] 0.1737
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.843   5.153   5.388   5.378   5.594   6.628

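This is a low acceptance probability, but it still works; tuning the proposal, or adopting a different one, would help here. Here are the results for the random-walk proposal: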

> naccept / length(values)
[1] 0.2902
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.718   5.147   5.369   5.370   5.584   6.781

Similar results, as one would hope, and a better acceptance probability (aiming for ~50% with one parameter).

And, for completeness, the independence sampler:

> naccept / length(values)
[1] 0.0684
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.990   5.162   5.391   5.380   5.577   8.802

Because it does not "adapt" to the shape of the posterior, it tends to have the lowest acceptance probability and is the hardest to tune well for this problem.

Note that, generally speaking, we would prefer proposals with fatter tails, but that is a whole other topic.
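To make the Hastings correction itself concrete, here is a minimal sketch that evaluates it on the log scale for the three proposals used above, at one arbitrary pair of points. The helper `log_hastings` and the `q_log(to, from)` argument convention are illustrative names introduced for this sketch, not part of the code above; the proposal densities are the same ones defined there.

```r
# Log Hastings correction: log q(current | proposal) - log q(proposal | current)
# q_log(to, from) is the log density of proposing `to` when the chain is at `from`
log_hastings <- function(q_log, current, proposal) {
  q_log(current, proposal) - q_log(proposal, current)
}

# Random walk: Normal centred at the current value with a fixed scale
rw_q <- function(to, from) dnorm(to, from, 0.25, log = TRUE)
# Independence: Normal(2, 1), ignores the current value entirely
ind_q <- function(to, from) dnorm(to, 2, 1, log = TRUE)
# "Vanilla": Normal centred at the current value, scale depends on the current value
van_q <- function(to, from) dnorm(to, from, 0.1 + abs(from) / 5, log = TRUE)

current <- 1      # arbitrary illustration point
proposal <- 1.5   # arbitrary illustration point

rw_corr  <- log_hastings(rw_q, current, proposal)   # zero: the two terms cancel
ind_corr <- log_hastings(ind_q, current, proposal)  # non-zero
van_corr <- log_hastings(van_q, current, proposal)  # non-zero

print(c(rw = rw_corr, ind = ind_corr, van = van_corr))
```

Only the fixed-scale random walk gives a log correction of zero (a ratio of 1). The independence and location-dependent proposals are built from symmetric normal densities, yet their correction terms do not cancel, which is why they have to appear in the acceptance probability.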

— jbowman

$Q$

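@floyd - it is useful in a number of situations. For example, if you have a decent idea of where the center of the distribution is (e.g., because you have calculated the MLE or MOM estimates), you can pick a fat-tailed proposal distribution. Or, if the computation time per iteration is very low, you can run a very long chain (which makes up for the low acceptance rate), saving analysis and programming time that can far outweigh even an inefficient run time. It would not be the typical first-try proposal, though; that would probably be the random walk.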

— jbowman
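As a rough illustration of the first situation described in the comment above, here is a sketch of an independence sampler whose proposal is a fat-tailed t distribution centred at the MLE. It reuses the Poisson toy model from the answer; the seed, `prop_scale`, `prop_df`, and all function names are assumptions made for this sketch rather than tuned or recommended values.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same toy model as in the answer: Poisson counts with mean beta * x
x <- runif(100)
y <- rpois(100, 5 * x)

log_prior <- function(log_beta) dt(log_beta - 2, 5, log = TRUE)
log_lik <- function(log_beta) sum(dpois(y, exp(log_beta) * x, log = TRUE))
log_post <- function(log_beta) log_lik(log_beta) + log_prior(log_beta)

# The MLE of beta for this model is sum(y) / sum(x); centre the proposal there
centre <- log(sum(y) / sum(x))
prop_scale <- 0.1  # hypothetical proposal scale
prop_df <- 3       # hypothetical degrees of freedom (fat tails)

# Draw from a shifted, scaled t distribution; it ignores the current state
ind_t_proposal <- function() centre + prop_scale * rt(1, prop_df)
# Log density of that proposal at z
ind_t_logq <- function(z) {
  dt((z - centre) / prop_scale, prop_df, log = TRUE) - log(prop_scale)
}

n_iter <- 5000
draws <- numeric(n_iter)
current <- centre
naccept <- 0
for (i in seq_len(n_iter)) {
  proposal <- ind_t_proposal()
  # Hastings correction for an independence sampler: q(current) / q(proposal)
  log_alpha <- log_post(proposal) - log_post(current) +
    ind_t_logq(current) - ind_t_logq(proposal)
  if (log(runif(1)) < log_alpha) {
    current <- proposal
    naccept <- naccept + 1
  }
  draws[i] <- exp(current)  # store beta on the original scale
}
accept_rate <- naccept / n_iter
print(accept_rate)
print(summary(draws))
```

Working with `log_alpha` and comparing it against `log(runif(1))` is a design choice for numerical stability; it accepts with the same probability as comparing a uniform draw against the exponentiated ratio.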

$Q$

$p(x_{t+1}|x_t)$

$p(x_{t+1}|x_t) = p(x_{t+1})$

See:

$q()$ ${\bf x}$

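The Wikipedia article is a good complementary read. As you can see, Metropolis also has a "correction ratio", but, as mentioned above, Hastings introduced a modification that allows for non-symmetric proposal distributions.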
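For reference, the acceptance probability under discussion can be written out as follows. The notation is an assumption chosen for this sketch: $\pi$ is the (unnormalised) target density and $q(x' \mid x)$ is the proposal density for moving from $x$ to $x'$.

$$\alpha(x, x') = \min\left(1,\; \frac{\pi(x')}{\pi(x)} \cdot \frac{q(x \mid x')}{q(x' \mid x)}\right)$$

The second factor is the Hastings correction. With a symmetric proposal, $q(x' \mid x) = q(x \mid x')$, it equals 1 and the rule reduces to the Metropolis rule. With an independence proposal, $q(x' \mid x) = q(x')$, it becomes $q(x)/q(x')$, which is generally not 1 even when $q$ is itself a symmetric density such as a normal.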

The Metropolis algorithm is implemented in the R package mcmc by the function metrop().

Other code examples:

http://www.mas.ncl.ac.uk/~ndjw1/teaching/sim/metrop/

http://pcl.missouri.edu/jeff/node/322

http://darrenjw.wordpress.com/2010/08/15/metropolis-hastings-mcmc-algorithms/

— Fritz Lang

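Thank you for your reply. Unfortunately, it does not answer my question. I can only see random-walk Metropolis, non-random-walk Metropolis, and independent MH. The Hastings correction ratio dnorm(can,mu,sig)/dnorm(x,mu,sig) in the independence sampler from the first link is not equal to 1. I thought it was supposed to equal 1 when a symmetric proposal distribution is used. Is this because it is an independence sampler rather than a plain non-random-walk MH? If so, what is the Hastings ratio for a plain non-random-walk MH?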

— AstrOne

$p(\text{current}|\text{proposal}) = p(\text{proposal}|\text{current})$