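Warmup: random bitvectors
As a warmup, we can start with the case where each bitvector is chosen uniformly at random. Then it turns out that the problem can be solved in $O(n^{1.6} \min(k, \lg n))$ time (more precisely, the $1.6$ can be replaced with $\lg 3$).
We'll consider the following two-set variant of the problem:
Given sets $S, T \subseteq \{0,1\}^k$ of bitvectors, determine whether there exists a non-overlapping pair $s \in S, t \in T$.
The basic technique to solve this is divide-and-conquer. Here is an $O(n^{1.6} k)$ time algorithm using divide-and-conquer: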
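Split $S$ and $T$ based upon the first bit position. In other words, form $S_0 = \{s \in S : s_0 = 0\}$, $S_1 = \{s \in S : s_0 = 1\}$, $T_0 = \{t \in T : t_0 = 0\}$, $T_1 = \{t \in T : t_0 = 1\}$.
Now recursively look for a non-overlapping pair from $S_0, T_0$, from $S_0, T_1$, and from $S_1, T_0$. (No pair from $S_1, T_1$ can be non-overlapping, since both vectors have a 1 in the first position.) If any recursive call finds a non-overlapping pair, output it; otherwise output "No non-overlapping pair exists".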
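The two steps above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: bitvectors are tuples of 0/1 of equal length, the function name `find_disjoint_pair` and the helper `overlaps_from` are made up for this sketch, and the recursion splits on positions in order, stopping early once either set has a single element.

```python
from typing import List, Optional, Tuple

Bits = Tuple[int, ...]


def overlaps_from(s: Bits, t: Bits, pos: int) -> bool:
    """True if s and t share a 1 at some position >= pos."""
    return any(a and b for a, b in zip(s[pos:], t[pos:]))


def find_disjoint_pair(S: List[Bits], T: List[Bits],
                       pos: int = 0) -> Optional[Tuple[Bits, Bits]]:
    """Return a non-overlapping (s, t), or None if there is none.

    Invariant: no pair from S x T overlaps at any position < pos.
    """
    if not S or not T:
        return None
    if pos == len(S[0]):
        # Every position has been split on, so any remaining pair works.
        return S[0], T[0]
    if len(S) == 1:
        # One set has size 1: finish with a linear scan.
        for t in T:
            if not overlaps_from(S[0], t, pos):
                return S[0], t
        return None
    if len(T) == 1:
        for s in S:
            if not overlaps_from(s, T[0], pos):
                return s, T[0]
        return None
    S0 = [s for s in S if s[pos] == 0]
    S1 = [s for s in S if s[pos] == 1]
    T0 = [t for t in T if t[pos] == 0]
    T1 = [t for t in T if t[pos] == 1]
    # (S1, T1) is skipped: every such pair overlaps at position pos.
    for A, B in ((S0, T0), (S0, T1), (S1, T0)):
        found = find_disjoint_pair(A, B, pos + 1)
        if found is not None:
            return found
    return None
```

For example, `find_disjoint_pair([(1, 0, 1), (1, 1, 0)], [(0, 1, 1), (0, 1, 0), (1, 1, 1)])` finds the pair `(1, 0, 1)`, `(0, 1, 0)`, which is the only non-overlapping pair in that input.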
Since all bitvectors are chosen randomly, we can expect $|S_b| \approx |S|/2$ and $|T_b| \approx |T|/2$. Thus, we have three recursive calls, and we've reduced the size of the problem by a factor of two (both sets are reduced in size by a factor of two). After $\lg \min(|S|,|T|)$ splits, one of the two sets is down to size 1, and the problem can be solved in linear time. We get a recurrence relation along the lines of $T(n) = 3T(n/2) + O(nk)$, whose solution is $T(n) = O(n^{1.6} k)$. Accounting for the running time more precisely in the two-set case, we see that the running time is $O(\min(|S|,|T|)^{0.6} \max(|S|,|T|) k)$.
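To see where the exponent comes from, the recurrence can be unrolled level by level. The following is a sketch that assumes every split is exactly even, which holds only approximately for random inputs: level $j$ of the recursion tree has $3^j$ subproblems, each on about $n/2^j$ bitvectors.

$$T(n) = \sum_{j=0}^{\lg n} 3^j \cdot O\!\left(\frac{n}{2^j}\, k\right) = O\!\left(nk \sum_{j=0}^{\lg n} \left(\tfrac{3}{2}\right)^{j}\right) = O\!\left(nk \left(\tfrac{3}{2}\right)^{\lg n}\right) = O\!\left(k\, n^{\lg 3}\right), \qquad \lg 3 \approx 1.585.$$

The last step uses $(3/2)^{\lg n} = n^{\lg 3 - 1}$.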
This can be further improved by noting that if $k$ is large enough relative to $\lg n$, say $k \ge 5 \lg n + 250$, then the probability that a non-overlapping pair exists is exponentially small. In particular, if $x,y$ are two random vectors, the probability that they're non-overlapping is $(3/4)^k$. If $|S|=|T|=n$, there are $n^2$ such pairs, so by a union bound, the probability that a non-overlapping pair exists is at most $n^2 (3/4)^k = 2^{2 \lg n - k \lg(4/3)}$, where $\lg(4/3) \approx 0.415$. When $k \ge 5 \lg n + 250$, this is $\le 1/2^{100}$. So, as a pre-processing step, if $k \ge 5 \lg n + 250$, then we can immediately return "No non-overlapping pair exists" (the probability this is incorrect is negligibly small); otherwise we run the above algorithm.
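Rather than hard-coding a threshold on $k$, one could evaluate the union bound directly and skip the search whenever it is small enough. The sketch below does this in log space; the function names and the `safety_bits` parameter are my own choices for illustration, and the bound only applies when the bitvectors really are uniformly random.

```python
import math


def log2_union_bound(n: int, k: int) -> float:
    """Base-2 logarithm of n^2 * (3/4)^k, the union bound on the
    probability that some non-overlapping pair exists."""
    return 2 * math.log2(n) + k * math.log2(3 / 4)


def can_skip_search(n: int, k: int, safety_bits: int = 100) -> bool:
    """True if the union bound is at most 2^(-safety_bits)."""
    return log2_union_bound(n, k) <= -safety_bits
```

With $n = 1024$, for instance, `can_skip_search(1024, 600)` is true while `can_skip_search(1024, 40)` is false, so only the latter instance would need the divide-and-conquer search.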
Thus, we achieve a running time of $O(n^{1.6} \min(k, \lg n))$ (or $O(\min(|S|,|T|)^{0.6} \max(|S|,|T|) \min(k, \lg n))$ for the two-set variant proposed above), for the special case where the bitvectors are chosen uniformly at random.
Of course, this is not a worst-case analysis. Random bitvectors are considerably easier than the worst case -- but let's treat it as a warmup, to get some ideas that perhaps we can apply to the general case.
Lessons from the warmup
We can learn a few lessons from the warmup above. First, divide-and-conquer (splitting on a bit position) seems helpful. Second, you want to split on a bit position with as many 1's in that position as possible; the more 0's there are, the less reduction in subproblem size you get.
Third, this suggests that the problem gets harder as the density of 1's gets smaller -- if there are very few 1's among the bitvectors (they are mostly 0's), the problem looks quite hard, as each split reduces the size of the subproblems only a little bit. So, define the density $\Delta$ to be the fraction of bits that are 1 (i.e., out of all $nk$ bits), and the density of bit position $i$, written $\Delta(i)$, to be the fraction of bitvectors that are 1 at position $i$.
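As a concrete reading of these two definitions, here is a small sketch that computes both quantities for a list of equal-length 0/1 tuples. The function name `densities` and the tuple representation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def densities(vectors: Sequence[Tuple[int, ...]]) -> Tuple[float, List[float]]:
    """Return (overall density, per-position densities).

    The overall density is the fraction of all n*k bits that are 1;
    entry i of the list is the fraction of vectors with a 1 at position i.
    """
    n = len(vectors)
    k = len(vectors[0])
    ones_at = [sum(v[i] for v in vectors) for i in range(k)]
    per_position = [count / n for count in ones_at]
    overall = sum(ones_at) / (n * k)
    return overall, per_position
```

For the two vectors `(1, 0, 0)` and `(1, 1, 0)`, the per-position densities are 1, 1/2 and 0, and the overall density is 1/2.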
Handling very low density
As a next step, we might wonder what happens if the density is extremely small. It turns out that if the density in every bit position is smaller than $1/\sqrt{k}$, we're guaranteed that a non-overlapping pair exists: there is a (non-constructive) existence argument showing that some non-overlapping pair must exist. This doesn't help us find it, but at least we know it exists.
Why is this the case? Let's say that a pair of bitvectors $x,y$ is covered by bit position $i$ if $x_i = y_i = 1$. Note that every pair of overlapping bitvectors must be covered by some bit position. Now, if we fix a particular bit position $i$, the number of pairs that can be covered by that bit position is at most $(n \Delta(i))^2 < n^2/k$. Summing across all $k$ of the bit positions, we find that the total number of pairs that are covered by some bit position is $< n^2$. This means there must exist some pair that's not covered by any bit position, which implies that this pair is non-overlapping. So if the density is sufficiently low in every bit position, then a non-overlapping pair surely exists.
However, I'm at a loss to identify a fast algorithm to find such a non-overlapping pair in this regime, even though one is guaranteed to exist. I don't immediately see any techniques that would yield a running time that has a sub-quadratic dependence on $n$. So, this is a nice special case to focus on, if you want to spend some time thinking about this problem.
Towards a general-case algorithm
In the general case, a natural heuristic seems to be: pick the bit position $i$ with the largest number of 1's (i.e., with the highest density), and split on it. In other words:
Find a bit position i that maximizes Δ(i).
Split $S$ and $T$ based upon bit position $i$. In other words, form $S_0 = \{s \in S : s_i = 0\}$, $S_1 = \{s \in S : s_i = 1\}$, $T_0 = \{t \in T : t_i = 0\}$, $T_1 = \{t \in T : t_i = 1\}$.
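Now recursively look for a non-overlapping pair from $S_0, T_0$, from $S_0, T_1$, and from $S_1, T_0$. (Pairs from $S_1, T_1$ all overlap at position $i$, so they need not be searched.) If any recursive call finds a non-overlapping pair, output it; otherwise output "No non-overlapping pair exists".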
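A minimal sketch of this heuristic in Python might look like the following. Several details are my own assumptions rather than part of the description above: bitvectors are equal-length 0/1 tuples, the "density" of a position is taken to be the combined count of 1's over both sets, positions already split on are tracked in a tuple `free`, and the counts are recomputed from scratch at every call instead of being maintained incrementally.

```python
from typing import List, Optional, Tuple

Bits = Tuple[int, ...]


def find_pair_max_density(S: List[Bits], T: List[Bits],
                          free: Optional[Tuple[int, ...]] = None
                          ) -> Optional[Tuple[Bits, Bits]]:
    """Return a non-overlapping (s, t), or None if there is none.

    Invariant: no pair from S x T overlaps at a position outside `free`.
    """
    if not S or not T:
        return None
    if free is None:
        free = tuple(range(len(S[0])))
    if not free:
        # Every position has been split on; any remaining pair works.
        return S[0], T[0]

    def ones(j: int) -> int:
        # Assumed density measure: combined number of 1's at position j.
        return sum(v[j] for v in S) + sum(v[j] for v in T)

    i = max(free, key=ones)
    if ones(i) == 0:
        # All remaining positions are all-zero, so nothing can overlap.
        return S[0], T[0]
    rest = tuple(j for j in free if j != i)
    S0 = [s for s in S if s[i] == 0]
    S1 = [s for s in S if s[i] == 1]
    T0 = [t for t in T if t[i] == 0]
    T1 = [t for t in T if t[i] == 1]
    # (S1, T1) is skipped: every such pair overlaps at position i.
    for A, B in ((S0, T0), (S0, T1), (S1, T0)):
        found = find_pair_max_density(A, B, rest)
        if found is not None:
            return found
    return None
```

Recomputing the counts costs $O(nk)$ per call, which is fine for experimenting with the recursion tree but is not how one would implement it if the running time mattered.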
The challenge is to analyze its performance in the worst case.
Let's assume that as a pre-processing step we first compute the density of every bit position. Also, if $\Delta(i) < 1/\sqrt{k}$ for every $i$, assume that the pre-processing step outputs "A non-overlapping pair exists" (I realize that this doesn't exhibit an example of a non-overlapping pair, but let's set that aside as a separate challenge). All this can be done in $O(nk)$ time. The density information can be maintained efficiently as we do recursive calls; it won't be the dominant contributor to running time.
What will the running time of this procedure be? I'm not sure, but here are a few observations that might help. Each level of recursion reduces the problem size by about $n/\sqrt{k}$ bitvectors (e.g., from $n$ bitvectors to $n - n/\sqrt{k}$ bitvectors). Therefore, the recursion can only go about $\sqrt{k}$ levels deep. However, I'm not immediately sure how to count the number of leaves in the recursion tree (there are far fewer than $3^{\sqrt{k}}$ leaves), so I'm not sure what running time this should lead to.