How to find 5 repeated values in $O(n)$ time?


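Suppose you have an array of size $n \geq 6$ containing integers from $1$ to $n-5$, with exactly 5 repeated. I need to propose an algorithm that can find the repeated numbers in $O(n)$ time. I cannot, for the life of me, think of anything. I think sorting, at best, would be $O(n\log n)$. Then traversing the array would be $O(n)$, resulting in $O(n^2\log n)$. However, I'm not really sure if sorting would be necessary, as I've seen some tricky stuff with linked lists, queues, stacks, etc.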

algorithms arrays searching

— Daryl Nak

$O(n \log n) + O(n)$ is not $O(n^2 \log n)$. It's $O(n \log n)$. It would be $O(n^2 \log n)$ if you did the sort $n$ times.

— Fund Monica's Lawsuit

Integer sorting is $O(n)$.

— leftaroundabout

@leftaroundabout These algorithms are $O(k\cdot n)$, where $n$ is the size of the array and $k$ is the size of the input set. Since $k = n - \text{constant}$, these algorithms work in $O(n^2)$.

— Roman Gräf

@RomanGräf It seems the actual situation is this: the algorithms work in $O(\log k \cdot n)$, where $k$ is the size of the domain. So for a problem like the OP's, it comes out the same whether you use such an algorithm on the $n$-sized domain or the traditional $O(n\cdot \log n)$ algorithm on an infinite-sized domain. Makes sense, too.

— leftaroundabout

For $n=6$, the only allowed number is $1$, by your description. But then $1$ would have to be repeated six times, not five.

— Alex Reinking

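You can create an additional array $B$ of size $n$. Initially, set all elements of the array to $0$. Then loop through the input array $A$ and, for each $i$, increase $B[A[i]]$ by 1. After that, simply check the array $B$: loop over $A$, and if $B[A[i]] > 1$, then $A[i]$ is repeated. You solve it in $O(n)$ time at the cost of memory, which is $O(n)$; this works because your integers are between $1$ and $n-5$.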

— fade2black

The solution in fade2black's answer is the standard one, but it uses $O(n)$ space. You can improve this to $O(1)$ space as follows:

Let the array be $A[1],\ldots,A[n]$. For $d=1,\ldots,5$, compute $\sigma_d = \sum_{i=1}^n A[i]^d$.
Compute $\tau_d = \sigma_d - \sum_{i=1}^{n-5} i^d$ (you can use well-known formulas to compute the latter sum in $O(1)$). Note that $\tau_d = m_1^d + \cdots + m_5^d$, where $m_1,\ldots,m_5$ are the repeated numbers.
Compute the polynomial $P(t) = (t-m_1)\cdots(t-m_5)$. The coefficients of this polynomial are symmetric functions of $m_1,\ldots,m_5$, which can be computed from $\tau_1,\ldots,\tau_5$ in $O(1)$.
Find all roots of the polynomial $P(t)$ by trying all $n-5$ possibilities.

This algorithm assumes the RAM machine model, in which basic arithmetic operations on $O(\log n)$-bit words take $O(1)$ time.
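The following is a minimal Python sketch of these four steps. All function names are hypothetical. Some choices differ from the description above:

- It relies on Python's arbitrary-precision integers rather than the word-size assumption.
- It sums $i^d$ with a loop instead of a closed-form formula, which is still linear.
- It turns $\tau_1,\ldots,\tau_5$ into the coefficients of $P(t)$ with Newton's identities.
- It divides out each root as it is found, so repeated roots such as $(t-3)^2$ are reported with their multiplicity.

```python
def elementary_from_power_sums(p, k):
    """Newton's identities: j*e_j = sum_{i=1..j} (-1)^(i-1) * e_{j-i} * p_i."""
    e = [1] + [0] * k
    for j in range(1, k + 1):
        s = sum((-1) ** (i - 1) * e[j - i] * p[i] for i in range(1, j + 1))
        e[j] = s // j  # exact: s equals j * e_j, an integer multiple of j
    return e


def horner(coeffs, t):
    """Evaluate a polynomial whose coefficients are listed highest degree first."""
    val = 0
    for c in coeffs:
        val = val * t + c
    return val


def deflate(coeffs, r):
    """Synthetic division by (t - r); assumes r is a root, so the remainder is 0."""
    out = [coeffs[0]]
    for c in coeffs[1:-1]:
        out.append(c + r * out[-1])
    return out


def find_repeats_power_sums(a, k=5):
    """Return the k repeated values of a (with multiplicity), in ascending order."""
    hi = len(a) - k  # the values range over 1..hi
    # tau[d] = m_1^d + ... + m_k^d, the power sums of the repeated values
    tau = [0] * (k + 1)
    for d in range(1, k + 1):
        tau[d] = sum(x ** d for x in a) - sum(i ** d for i in range(1, hi + 1))
    e = elementary_from_power_sums(tau, k)
    # P(t) = t^k - e_1 t^(k-1) + e_2 t^(k-2) - ... = (t - m_1)...(t - m_k)
    coeffs = [(-1) ** j * e[j] for j in range(k + 1)]
    roots = []
    for t in range(1, hi + 1):  # try every candidate value
        while len(coeffs) > 1 and horner(coeffs, t) == 0:
            roots.append(t)  # t is a root; divide it out to catch multiplicity
            coeffs = deflate(coeffs, t)
    return roots


print(find_repeats_power_sums([10, 3, 7, 1, 2, 3, 9, 4, 5, 6, 8, 1, 3, 7, 10]))
```

Because $P(t)$ has degree 5, each evaluation and each deflation costs a constant number of arithmetic operations, so the root search stays linear in $n$.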

Another way to formulate this solution is along the following lines:

Calculate $x_1 = \sum_{i=1}^n A[i]$, and deduce $y_1 = m_1 + \cdots + m_5$ using the formula $y_1 = x_1 - \sum_{i=1}^{n-5} i$.
Calculate $x_2 = \sum_{1 \leq i < j \leq n} A[i] A[j]$ in $O(n)$ using the formula $x_2 = (A[1]) A[2] + (A[1] + A[2]) A[3] + (A[1] + A[2] + A[3]) A[4] + \cdots + (A[1] + \cdots + A[n-1]) A[n]$.
Deduce $y_2 = \sum_{1 \leq i < j \leq 5} m_i m_j$ using the formula $y_2 = x_2 - \sum_{1 \leq i < j \leq n-5} ij - \left(\sum_{i=1}^{n-5} i\right) y_1$.
Calculate $x_3,x_4,x_5$ and deduce $y_3,y_4,y_5$ along similar lines.
The values of $y_1,\ldots,y_5$ are (up to sign) the coefficients of the polynomial $P(t)$ from the preceding solution.
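A minimal Python sketch of this second formulation follows; the names are hypothetical. It assumes the "similar lines" for $x_3,x_4,x_5$ is the one-pass update that generalizes the prefix-sum formula for $x_2$.

For the deduction step, it uses the identity $x_j = \sum_{i=0}^{j} s_i\, y_{j-i}$, where $s_i$ is the same symmetric sum taken over $1,\ldots,n-5$ and $s_0 = y_0 = 1$. This holds because the array is the multiset union of $\{1,\ldots,n-5\}$ and the repeats. For $j=2$ it reduces to the formula for $y_2$ above.

```python
def elementary_symmetric(values, k):
    """e[j] = j-th elementary symmetric polynomial of values, for j = 0..k.

    Processes one value at a time: e[2] grows by (sum of earlier values) * v,
    which is exactly the prefix-sum formula for x_2.
    """
    e = [1] + [0] * k
    for v in values:
        for j in range(k, 0, -1):  # descending, so e[j-1] still holds the old value
            e[j] += e[j - 1] * v
    return e


def repeated_symmetric(a, k=5):
    """Return [1, y_1, ..., y_k] for the k repeated values of a."""
    hi = len(a) - k
    x = elementary_symmetric(a, k)                 # x_1..x_k over the whole array
    s = elementary_symmetric(range(1, hi + 1), k)  # the same sums over 1..n-k
    y = [1] + [0] * k
    for j in range(1, k + 1):
        # x_j = sum_{i=0..j} s_i * y_{j-i}; solve for y_j (s_0 = 1)
        y[j] = x[j] - sum(s[i] * y[j - i] for i in range(1, j + 1))
    return y


print(repeated_symmetric([10, 3, 7, 1, 2, 3, 9, 4, 5, 6, 8, 1, 3, 7, 10]))  # [1, 24, 204, 754, 1203, 630]
```

For the repeats $1, 3, 3, 7, 10$, the polynomial is $P(t) = t^5 - 24t^4 + 204t^3 - 754t^2 + 1203t - 630$, whose coefficients match the output up to sign.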

This solution shows that if we replace 5 by $d$, then we get (I believe) an $O(d^2n)$ algorithm using $O(d^2)$ space, which performs $O(dn)$ arithmetic operations on integers of bit-length $O(d\log n)$, keeping at most $O(d)$ of these at any given time. (This requires careful analysis of the multiplications we perform, most of which involve one operand of length only $O(\log n)$.) It is conceivable that this can be improved to $O(dn)$ time and $O(d)$ space using modular arithmetic.

— Yuval Filmus

Any interpretation of $\sigma_d$ and $\tau_d$, $P(t)$, $m_i$ and so on? Why $d \in \{1, 2, 3, 4, 5\}$?

— styrofoam fly

The insight behind the solution is the summing trick, which appears in many exercises (for example, how do you find the missing element from an array of length $n-1$ containing all but one of the numbers $1,\ldots,n$?). The summing trick can be used to compute $f(m_1) + \cdots + f(m_5)$ for an arbitrary function $f$, and the question is which $f$ to choose in order to be able to deduce $m_1,\ldots,m_5$. My answer uses familiar tricks from the elementary theory of symmetric functions.

— Yuval Filmus

@hoffmale Actually, $O(d^2)$.

— Yuval Filmus

@hoffmale Each of them takes $d$ machine words.

— Yuval Filmus

@BurnsBA The problem with this approach is that $(n-5)\#$ is much larger than $\frac{(n-4)(n-5)}{2}$. Operations on large numbers are slower.

— Yuval Filmus


There's also a linear time and constant space algorithm based on partitioning, which may be more flexible if you're trying to apply this to variants of the problem that the mathematical approach doesn't work well on. This requires mutating the underlying array and has worse constant factors than the mathematical approach. More specifically, I believe the costs in terms of the total number of values $n$ and the number of duplicates $d$ are $\mathcal{O}(n \log d)$ and $\mathcal{O}(d)$ respectively, though proving it rigorously will take more time than I have at the moment.

Algorithm

Start with a list of pairs, where the first pair is the range over the whole array, or $[(1, n)]$ if 1-indexed.

Repeat the following steps until the list is empty:

1. Take and remove any pair $(i, j)$ from the list.
2. Find the minimum and maximum, $\text{min}$ and $\text{max}$, of the denoted subarray.
3. If $\text{min} = \text{max}$, the subarray consists only of equal elements. Yield its elements except one and skip steps 4 to 6.
4. If $\text{max} - \text{min} = j - i$, the subarray contains no duplicates. Skip steps 5 and 6.
5. Partition the subarray around $\frac{\text{min}+\text{max}}{2}$, such that elements up to some index $k$ are smaller than the separator and elements above that index are not.
6. Add $(i, k)$ and $(k + 1, j)$ to the list.
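A minimal Python sketch of the steps above follows. It is 0-indexed, and the function name is hypothetical. It treats the list of pairs as a stack, which is one valid way to "take any pair". It compares `2 * a[t] < lo + hi` to avoid fractional separators.

Step 4 relies on the question's guarantee that every value between the minimum and maximum occurs at least once. Each subarray holds all array elements within its value range, so a subarray with one slot per value in that range has no extras.

```python
def find_repeats_partition(a):
    """Yield the extra copies in a, rearranging a in place.

    Assumes every value between min(a) and max(a) occurs at least once,
    as in the question. Pairs are 0-indexed, inclusive index ranges.
    """
    pending = [(0, len(a) - 1)]
    while pending:
        i, j = pending.pop()  # step 1: take a pair (here: the most recent one)
        lo = hi = a[i]        # step 2: min and max of a[i..j]
        for t in range(i + 1, j + 1):
            lo = min(lo, a[t])
            hi = max(hi, a[t])
        if lo == hi:          # step 3: all equal, so all but one are extras
            for _ in range(j - i):
                yield lo
            continue
        if hi - lo == j - i:  # step 4: each value in lo..hi appears exactly once
            continue
        # step 5: move elements below (lo + hi) / 2 to the front of a[i..j]
        k = i
        for t in range(i, j + 1):
            if 2 * a[t] < lo + hi:
                a[k], a[t] = a[t], a[k]
                k += 1
        # step 6: both halves are non-empty, since lo goes left and hi goes right
        pending.append((i, k - 1))
        pending.append((k, j))


print(sorted(find_repeats_partition([10, 3, 7, 1, 2, 3, 9, 4, 5, 6, 8, 1, 3, 7, 10])))  # [1, 3, 3, 7, 10]
```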

Cursory analysis of time complexity.

Steps 1 to 6 take $\mathcal{O}(j - i)$ time, since finding the minimum and maximum and partitioning can be done in linear time.

Every pair $(i, j)$ in the list is either the first pair, $(1, n)$, or a child of some pair for which the corresponding subarray contains a duplicate element. There are at most $d \lceil \log_2 n + 1\rceil$ such parents, since each traversal halves the range in which a duplicate can be, so there are at most $2d \lceil \log_2 n + 1\rceil$ total when including pairs over subarrays with no duplicates. At any one time, the size of the list is no more than $2d$.

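Consider the work of finding a single duplicate. It consists of a sequence of pairs over exponentially decreasing ranges, so the total work is the sum of a geometric sequence, or $\mathcal{O}(n)$. This leads to the obvious conclusion that the total work for $d$ duplicates must be $\mathcal{O}(nd)$, which is linear in $n$.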

To find a tighter bound, consider the worst-case scenario of maximally spread-out duplicates. Intuitively, the search takes two phases. In the first, the full array is traversed each time, in progressively smaller parts. In the second, the parts are smaller than $\frac{n}{d}$, so only parts of the array are traversed. The first phase can only be $\log d$ deep, so it has cost $\mathcal{O}(n \log d)$. The second phase has cost $\mathcal{O}(n)$ because the total area being searched is again exponentially decreasing.

— Veedrac

Thank you for the explanation. Now I understand. A very pretty algorithm!

— D.W.


Leaving this as an answer because it needs more space than a comment gives.

You made a mistake in the OP when you suggested a method. Sorting a list and then traversing it takes $O(n\log n)$ time, not $O(n^2\log n)$ time. When you do two things (that take $O(f)$ and $O(g)$ respectively) sequentially, the resulting time complexity is $O(f+g)=O(\max\{f,g\})$ (under most circumstances).

In order to multiply the time complexities, you need to be using a for loop. If you have a loop of length $f$ and for each value in the loop you do a function that takes $O(g)$, then you'll get $O(fg)$ time.

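So in your case, you sort in $O(n\log n)$ and then traverse in $O(n)$, for a total of $O(n\log n+n)=O(n\log n)$. If, for each comparison in the sorting algorithm, you had to do a computation that takes $O(n)$, then it would take $O(n^2\log n)$, but that's not the case here.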
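To make the additive cost concrete, here is a minimal Python sketch of the sort-then-traverse approach from the question; the function name is hypothetical. It does one sort followed by one linear pass, with no $O(n)$ work inside the sort's comparisons, so the two costs add rather than multiply.

```python
def find_repeats_sorted(a):
    """Sort, then compare neighbours: O(n log n) + O(n) = O(n log n) overall."""
    b = sorted(a)  # O(n log n) comparisons, each O(1)
    # one O(n) pass; every equal neighbouring pair marks one extra copy
    return [b[i] for i in range(1, len(b)) if b[i] == b[i - 1]]


print(find_repeats_sorted([10, 3, 7, 1, 2, 3, 9, 4, 5, 6, 8, 1, 3, 7, 10]))  # [1, 3, 3, 7, 10]
```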

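In case you're curious about my claim that $O(f+g)=O(\max\{f,g\})$, it's important to note that it's not always true. But it holds if $f\in O(g)$ or $g\in O(f)$, which is the case for a whole host of common functions. The most common time it doesn't hold is when additional parameters get involved and you get expressions like $O(2^cn+n\log n)$.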

— Stella Biderman


There's an obvious in-place variant of the boolean array technique that uses the order of the elements as the store (where arr[x] == x for "found" elements). Unlike the partition variant, which can be justified for being more general, I'm unsure when you'd actually need something like this, but it is simple.

for idx from n-4 to n
    while arr[arr[idx]] != arr[idx]
        swap(arr[arr[idx]], arr[idx])

This just repeatedly puts arr[idx] at the location arr[idx] until you find that location already taken, at which point it must be a duplicate. Note that the total number of swaps is bounded by $n$ since each swap makes its exit condition correct.
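A minimal Python sketch of this loop follows. It is 0-indexed, so value `v` belongs at index `v - 1`, and the function name is hypothetical.

When the loop finishes, every value left in the last five slots also sits at its home slot. So each value not in the tail must occupy one of the remaining front slots, and there is exactly one front slot per such value. The front part therefore holds $1,\ldots,n-5$ once each, and the tail holds exactly the repeats.

```python
def find_repeats_in_place(arr, k=5):
    """Return the k repeated values, rearranging arr in place.

    Assumes arr holds every value 1..len(arr)-k at least once plus k extras.
    Value v "belongs" at index v - 1 (0-indexed).
    """
    n = len(arr)
    for idx in range(n - k, n):  # the last k slots (n-4..n when 1-indexed)
        while arr[arr[idx] - 1] != arr[idx]:
            v = arr[idx]
            # send v to its home slot and pull out whatever was there
            arr[idx], arr[v - 1] = arr[v - 1], v
        # now arr[idx] equals a value already sitting at its home slot
    return arr[n - k:]


print(sorted(find_repeats_in_place([10, 3, 7, 1, 2, 3, 9, 4, 5, 6, 8, 1, 3, 7, 10])))  # [1, 3, 3, 7, 10]
```

Each swap fixes one front slot for good, which is why the total number of swaps is bounded by $n$.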

— Veedrac

You need to make some kind of argument that the inner `while` loop runs in constant time on average. Otherwise, this isn't a linear-time algorithm.

— David Richerby

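@DavidRicherby It doesn't run in constant time on average, but the outer loop only runs 5 times, so that's fine. Note that the total number of swaps is bounded by $n$, since each swap makes its exit condition correct. So even if the number of duplicate values increases, the total time is still linear (i.e., it takes $n$ steps rather than $nd$).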

— Veedrac

Sorry, I hadn't noticed that the outer loop runs a constant number of times! (Edited to include your note about the number of swaps, so that I could reverse my downvote.)

Subtract $\sum_{i=1}^{n-5} i = \frac{(n-5)(n-4)}{2}$ from the sum of the values you have.

So, after $\Theta(n)$ time (assuming arithmetic is $O(1)$, which it isn't really, but let's pretend), you have a sum $\sigma_1$ of 5 integers between $1$ and $n-5$:

$x_1 + x_2 + x_3 + x_4 + x_5 = \sigma_1$

Supposedly, this is no good, right? You can't possibly figure out how to break this up into 5 distinct numbers.

Ah, but this is where it gets fun! Now do the same as before, but with the squares: subtract $\sum_{i=1}^{n-5} i^2$ from the sum of the squares of the values. Now you have:

${x_1}^2 + {x_2}^2 + {x_3}^2 + {x_4}^2 + {x_5}^2 = \sigma_2$

See where I'm going with this? Do the same for powers 3, 4 and 5 and you have 5 independent equations in 5 variables. I'm pretty sure you can solve for $\vec{x}$.
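For intuition, here is the two-unknown analogue worked out. It is a simplification, not the five-repeat case. The first two power sums already pin down a quadratic whose roots are the repeats:

$$
\begin{aligned}
x_1 + x_2 &= \sigma_1, \qquad x_1^2 + x_2^2 = \sigma_2,\\
x_1 x_2 &= \frac{(x_1+x_2)^2 - (x_1^2+x_2^2)}{2} = \frac{\sigma_1^2 - \sigma_2}{2},\\
(t - x_1)(t - x_2) &= t^2 - \sigma_1 t + \frac{\sigma_1^2 - \sigma_2}{2}.
\end{aligned}
$$

For example, repeats $3$ and $7$ give $\sigma_1 = 10$ and $\sigma_2 = 58$, hence $t^2 - 10t + 21 = (t-3)(t-7)$. With five unknowns, Newton's identities turn $\sigma_1,\ldots,\sigma_5$ into the coefficients of a degree-5 polynomial in the same way.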

Caveats: arithmetic isn't really $O(1)$. Also, you need a bit of space to represent your sums, but not as much as you might imagine. You can do most everything modularly, as long as you have, oh, $\lceil\log(5n^6)\rceil$ bits; that should do it.

— einpoklum

Doesn't @YuvalFilmus propose the same solution?

— fade2black

@fade2black: Oh, yes, it does, sorry. I only looked at the first line of his solution.

— einpoklum

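The easiest way to solve the problem is to create an array that counts the appearances of each number in the original array. Then traverse all numbers from $1$ to $n-5$ and check whether each number appears more than once. The complexity of this solution, in both memory and time, is linear, i.e. $O(N)$.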

— someone12321

This is the same as @fade2black's answer (although a bit easier on the eyes).

— LangeHaare

Map the array to `1 << A[i]`, then XOR everything together. The duplicates are the numbers whose corresponding bit is off.
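A minimal Python sketch of this XOR idea follows. It uses a Python integer as the bitvector, and the function name is hypothetical. Each `^=` works on an integer with up to $n$ bits.

A bit ends up off exactly when its value occurs an even number of times. So the sketch identifies the repeats only when each repeated value occurs exactly twice. A value that occurs three times leaves its bit on and is missed, as the second test case shows.

```python
def xor_trick(a):
    """Return the values whose bit is off after XOR-ing 1 << v for every v in a."""
    acc = 0
    for v in a:
        acc ^= 1 << v  # toggles bit v; the integer grows to max(a) + 1 bits
    # bit v is off exactly when v occurs an even number of times
    return [v for v in range(1, max(a) + 1) if not (acc >> v) & 1]


print(xor_trick(list(range(1, 11)) + [1, 3, 5, 7, 10]))  # [1, 3, 5, 7, 10]
print(xor_trick(list(range(1, 11)) + [1, 3, 3, 7, 10]))  # [1, 7, 10]: 3 occurs three times
```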

— 하울 레스

There are five duplicates, so the XOR trick will break in some cases.

— Evil

The running time of this is $O(n^2)$. Each bitvector is $n$ bits long, so each bitvector operation takes $O(n)$ time. You do one bitvector operation per element of the original array, for a total of $O(n^2)$ time.

— DW

@DW But the computers we normally use are fixed at 32 or 64 bits, and that doesn't change at run time (i.e., it's constant). Why shouldn't it be treated as such, with bit operations taking $O(1)$ rather than $O(n)$?

— code_dredd

@ray, I think you answered your own question. Given that the machines we normally use are fixed at 64 bits, the running time to perform an operation on an $n$-bit bitvector is $O(n)$, not $O(1)$. It takes something like $n/64$ instructions to perform some operation on all $n$ bits of an $n$-bit bitvector, and $n/64$ is $O(n)$, not $O(1)$.

— DW

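@DW What I got from the earlier comments is that a bitvector referred to a single element in an array of size $n$, with the bitvector being 64 bits, which is the constant I'm referring to. Obviously, processing an array of size $n$ takes $O(kn)$ time if we assume there are $k$ bits per element and $n$ elements in the array. But $k=64$, so an operation on an array element with a constant number of bits should be $O(1)$ instead of $O(k)$, and an operation on the array should be $O(n)$ instead of $O(kn)$. Are you keeping the $k$ for completeness/correctness, or am I missing something else?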

— code_dredd

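from collections import defaultdict

DATA = [1, 2, 2, 2, 2, 2]

def find_repeated(data):
    # group equal items; stop once some item has been seen 5 times
    collated = defaultdict(list)
    for item in data:
        collated[item].append(item)
        if len(collated[item]) == 5:
            return item
    return None

print(find_repeated(DATA))

# n time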

— user

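Welcome to the site. We're a computer science site, so we're looking for algorithms and explanations, not code dumps that require understanding of a particular language and its libraries. In particular, your claim that this code runs in linear time assumes that `collated[item].append(item)` runs in constant time. Is that really true?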

— David Richerby

Also, you're looking for a value that is repeated 5 times. In contrast, the OP is looking for 5 values, each of which is repeated twice.

— Yuval Filmus