There's also a linear time and constant space algorithm based on partitioning, which may be more flexible if you're trying to apply this to variants of the problem that the mathematical approach doesn't work well on. This requires mutating the underlying array and has worse constant factors than the mathematical approach. More specifically, I believe the time and space costs, in terms of the total number of values $n$ and the number of duplicates $d$, are $O(n \log d)$ and $O(d)$ respectively, though proving it rigorously will take more time than I have at the moment.
Algorithm
Start with a list of pairs, where the first pair is the range over the whole array, or $[(1, n)]$ if 1-indexed.
Repeat the following steps until the list is empty:
1. Take and remove any pair $(i, j)$ from the list.
2. Find the minimum and maximum, $\min$ and $\max$, of the denoted subarray.
3. If $\min = \max$, the subarray consists only of equal elements. Yield its elements except one and skip steps 4 to 6.
4. If $\max - \min = j - i$, the subarray contains no duplicates. Skip steps 5 and 6.
5. Partition the subarray around $\frac{\min + \max}{2}$, such that elements up to some index $k$ are smaller than the separator and elements above that index are not.
6. Add $(i, k)$ and $(k + 1, j)$ to the list.
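The steps above can be sketched roughly as follows. This is a minimal illustration rather than a tuned implementation. The function name `find_duplicates` and the 0-based indexing are my own choices. The no-duplicates test in step 4 is only valid under the assumption that the values are integers and that every value between the overall minimum and maximum occurs at least once (for example, the values $1$ to $m$ plus some repeats). The comparison `2 * x < lo + hi` stands in for "smaller than $\frac{\min + \max}{2}$" without needing fractions.

```python
def find_duplicates(values):
    """Yield each surplus copy of a repeated value. Mutates `values` in place.

    Assumption: integers, and every value between the overall minimum and
    maximum occurs at least once; otherwise step 4 can miss duplicates.
    """
    # List of pending (i, j) index ranges, inclusive and 0-based.
    pending = [(0, len(values) - 1)] if values else []
    while pending:
        i, j = pending.pop()                      # step 1: take any pair

        lo = hi = values[i]                       # step 2: min and max
        for t in range(i + 1, j + 1):
            lo = min(lo, values[t])
            hi = max(hi, values[t])

        if lo == hi:                              # step 3: all elements equal
            for _ in range(j - i):                # yield all but one copy
                yield lo
            continue

        if hi - lo == j - i:                      # step 4: no duplicates here
            continue

        k = i                                     # step 5: partition
        for t in range(i, j + 1):
            if 2 * values[t] < lo + hi:           # below the midpoint
                values[k], values[t] = values[t], values[k]
                k += 1

        # Step 6: both halves are non-empty, because lo falls below the
        # midpoint and hi does not (we know lo < hi at this point).
        pending.append((i, k - 1))
        pending.append((k, j))
```

For example, `sorted(find_duplicates([3, 1, 2, 3, 4, 2, 5]))` gives `[2, 3]`. A value that occurs three times is yielded twice. The output order depends on which pair is taken in step 1, so sort it if you need a stable order.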
Cursory analysis of time complexity.
Steps 1 to 6 take $O(j - i)$ time, since finding the minimum and maximum and partitioning can be done in linear time.
Every pair $(i, j)$ in the list is either the first pair, $(1, n)$, or a child of some pair for which the corresponding subarray contains a duplicate element. There are at most $d \lceil \log_2 n + 1 \rceil$ such parents, since each traversal halves the range in which a duplicate can be, so there are at most $2d \lceil \log_2 n + 1 \rceil$ total when including pairs over subarrays with no duplicates. At any one time, the size of the list is no more than $2d$.
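Consider the work to find a single duplicate. This consists of a sequence of pairs over an exponentially decreasing range, so the total work is the sum of a geometric sequence, or $O(n)$. This gives the obvious corollary that the total work for $d$ duplicates must be $O(nd)$, which is linear in $n$.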
To find a tighter bound, consider the worst-case scenario of maximally spread out duplicates. Intuitively, the search takes two phases: one where the full array is traversed each time, in progressively smaller parts, and one where the parts are smaller than $\frac{n}{d}$, so only parts of the array are traversed. The first phase can only be $\log d$ deep, so has cost $O(n \log d)$, and the second phase has cost $O(n)$ because the total area being searched is again exponentially decreasing.
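As a rough sketch of that two-phase argument (heuristic, not a proof): assume the subarray sizes roughly halve at each level, and write $c$ for the constant hidden in the per-pair cost. In the first phase the subarrays at any one depth are disjoint, so each depth costs at most $cn$. In the second phase each of the at most $d$ chains starts from a subarray of size about $\frac{n}{d}$.

$$
\begin{aligned}
T_{\text{phase 1}} &\le \sum_{t=0}^{\lceil \log_2 d \rceil} c\,n = O(n \log d), \\
T_{\text{phase 2}} &\le d \sum_{t \ge 0} c\,\frac{n/d}{2^{t}} \le 2cn = O(n), \\
T &= T_{\text{phase 1}} + T_{\text{phase 2}} = O(n \log d).
\end{aligned}
$$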