What is the minimum number of bits needed to store a Sudoku puzzle?


28

Note: this is about standard 9x9 Sudoku puzzles. The solution only needs to support solved, legal puzzles. So the solution does not need to support empty cells, and it may rely on the properties of solved Sudoku puzzles.

I've wondered about this, but I couldn't come up with an answer I was satisfied with. The naive solution uses one byte per cell (81 cells), for a total of 648 bits. A more sophisticated solution stores the entire Sudoku puzzle as a base-9 number (one digit per cell) and needs ⌈log2(9^81)⌉ = 257 bits.
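
As a concrete illustration of that base-9 packing (a minimal sketch, not part of the question), assuming the grid is a flat list of 81 digits:

    from math import ceil, log2

    def encode_base9(grid):
        """Pack a solved grid (flat list of 81 digits, each 1-9) into one integer."""
        n = 0
        for d in grid:
            n = n * 9 + (d - 1)   # map 1..9 to 0..8 and treat it as a base-9 digit
        return n

    # Any such integer is below 9**81, so it fits in ceil(81 * log2(9)) = 257 bits.
    print(ceil(81 * log2(9)))  # 257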

However, if you know 8 of the 9 digits in a 3x3 subgrid, you can trivially infer the 9th. You can keep pulling on this thread until the question becomes: how much unique information is there in a Sudoku? We could now use a giant lookup table mapping each binary number to a Sudoku puzzle, but that is not a useful solution.

So my question is:

What is the minimum number of bits required to store a Sudoku puzzle without using a lookup table, and what is the algorithm?


3
Is there a qualitative difference between leaving out the 9th digit of each 3x3 box, row, or column and storing a minimal Sudoku with blanks that has a unique solution? The "no need to support empty cells" is a bit of a red herring if the optimal solution ends up requiring them.
Wooble

19
Since there are 6.67 × 10^21 solved Sudokus ("QSCGZ" 2003; Felgenhauer and Jarvis 2005), and log2(6.67 × 10^21) ≈ 72.4, the lower bound is 73 bits (even if you use a giant lookup table). If you do not need to distinguish solutions that are essentially the same up to symmetry, this lower bound does not apply.
Tsuyoshi Ito

9
This question would make a nice programming contest.
Peter Shor

1
The analogous lower bound for essentially different solutions is 33 bits.
Charles

3
Why do you need a lookup table? You can enumerate Sudoku solutions one by one until you reach the desired number.
Zirui Wang

Answers:


19

Along the same lines as ratchet freak's answer: if you fill in the cells of the non-starred 3x3 boxes one box at a time, always choosing the next box to be one that shares a row or column of boxes with a box that is already filled, you get the following pattern for the number of choices per step (filling in the top middle box first, then the top right box, and so on).

In each 3x3 box after the first, once you've filled in one row or column of the box, three of the remaining six digits are localized to a single row. Choose their locations first, and then fill in the remaining three cells. (So the actual order of which cells to fill in might vary depending on what you already know, but the number of choices is never more than what I've shown.)

After you've filled in these cells the stars are all determined.

* * *    9 8 7    6 5 4
* * *    6 5 4    3 3 2
* * *    3 2 1    3 2 1

6 5 4    * * *    6 3 3
3 3 2    * * *    5 3 2
3 2 1    * * *    4 2 1

6 3 3    6 5 4    * * *
5 3 2    3 3 2    * * *
4 2 1    3 2 1    * * *

If I've calculated correctly, this gives 87 bits. There's some additional savings to be had in the last 3x3 block, per the comment by Peter Shor: every value is localized to one of four cells, and every row contains at least one cell with only four possible values, so certainly the factors in that block should start with 4 not 6, but I don't understand the remaining factors in Shor's answer.
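
A quick sanity check of that figure (not part of the original answer): multiply the choice counts in the six non-starred boxes above and take log2.

    from math import ceil, log2, prod

    boxes = [
        [9, 8, 7, 6, 5, 4, 3, 2, 1],  # top middle (filled first)
        [6, 5, 4, 3, 3, 2, 3, 2, 1],  # top right
        [6, 5, 4, 3, 3, 2, 3, 2, 1],  # middle left
        [6, 3, 3, 5, 3, 2, 4, 2, 1],  # middle right
        [6, 3, 3, 5, 3, 2, 4, 2, 1],  # bottom left
        [6, 5, 4, 3, 3, 2, 3, 2, 1],  # bottom middle
    ]
    total = prod(prod(box) for box in boxes)
    print(total, ceil(log2(total)))  # log2 is about 86.8, i.e. 87 bits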


4
You can reduce the number of choices when you fill in the sixth 3x3 box, as well. This box becomes 4,3,2 / 3,2,1 / 2,1,1 for a total of 83 bits, if I calculated it correctly.
Peter Shor

@Peter - nope. The 3 numbers to the right could be the same as the numbers above. You don't know all of them are distinct. The most assured unique numbers are 3 so the first box is a pick from six items. (This one location is an example. It is true for the others too.)
Hogan

@David - going by my comment to Peter I don't think your numbers are wrong. In the 2nd box you have 6 5 4 4 3 2 3 2 1 I believe it needs to be 6 5 4 6 5 4 3 2 1 for the worst case.
Hogan

Hogan, no, see the part in my answer about "once you've filled in one row or column of the box, you can always choose the next row or column to fill in to be one in which there are at most four possible values"
David Eppstein

@David - Let's label the 3x3s 1,1 1,2 1,3 going left to right, top to bottom. Let's label the squares A-I going left to right, top to bottom. The location D in 1,3 knows 3 numbers in the 3x3 it is in (A,B,C) and it knows 3 numbers in 1,2 (D,E,F), but it does not know those 6 numbers are different. They could be the same 3 numbers from box 3,1 and 2,1, thus there are at most 6 choices.
Hogan

13

Going on with @peter's answer, here's a worst-case possibilities list for each cell as you fill it in, starting from the top left:

9   8   7       6   5   4       3   2   1
6   5   4       6   5   4       3   2   1
3   2   1       3   2   1       3   2   1

6   6   3       6   5   4       3   2   1
5   5   2       5   5   3       3   2   1
4   4   1       4   2   1       3   2   1

3   3   3       3   3   3       1   1   1
2   2   2       2   2   2       1   1   1
1   1   1       1   1   1       1   1   1

This makes for 4.24559E+29 possibilities, or 99 bits.

Edit: forgot that the last square is fully determined by all the others.
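
The same sanity check as for the previous answer, applied to this table (not part of the original answer):

    from math import ceil, log2, prod

    counts = [9,8,7, 6,5,4, 3,2,1,
              6,5,4, 6,5,4, 3,2,1,
              3,2,1, 3,2,1, 3,2,1,
              6,6,3, 6,5,4, 3,2,1,
              5,5,2, 5,5,3, 3,2,1,
              4,4,1, 4,2,1, 3,2,1,
              3,3,3, 3,3,3, 1,1,1,
              2,2,2, 2,2,2, 1,1,1,
              1,1,1, 1,1,1, 1,1,1]
    total = prod(counts)
    print(f"{total:.5e}", ceil(log2(total)))  # 4.24559e+29, 99 bits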


Very nice!! Let me add that it's not clear to me that you could ever achieve these worst-case possibilities for a real Sudoku solution (especially if you use a sophisticated algorithm that uses some Sudoku techniques to narrow down the possibilities for which numbers can go in a cell).
Peter Shor

@peter but you need to do that narrowing in both the en- and decoding, and I realized that if you have to choose one cell and don't fix the order (the easiest way, but not really optimal), you need to add that choice to the encoding as well
ratchet freak

No, if you use the same algorithm for figuring out the best cell in the en- and the decoding procedure, it will give the same cell (since it's working on the same data), so the en- and decoding procedures will be synchronized, and you don't have to add the order to the encoding. This idea also makes the LZW data compression algorithm work.
Peter Shor

I think that the minimum bits required to store a valid sudoku puzzle is not a computable function (Kolmogorov). However the 103 bits by Peter/ratchet seems a good bound.
Marzio De Biasi

2
@Vor: Technically the Turing machine that outputs the correct number of bits when given a sudoku puzzle as input is finite because the input set is finite, so "how many bits are needed to describe this puzzle" is "trivially" computable. I'm saying that we could actually find such a Turing machine explicitly (in principle, the computations would take way too long), because it can't be harder than computing a finite prefix of an Omega number.
Aaron Sterling

5

You don't need a full look-up table to achieve optimal compressibility. I believe that modern computers using a very reasonable look-up table are able to count the number of constrained Sudokus, which are Sudokus with some digits already in place. Using this, here's how you encode (decoding is similar).

Fix an ordering of the squares. Suppose the number in the first square is d_1. Let N_1 be the number of Sudokus whose first square is less than d_1. Now let d_2 be the number in the second square. Let N_2 be the number of Sudokus whose first square is d_1 and whose second square is less than d_2. And so on. The encoded number is N = Σ_i N_i.

This method of encoding is known as binomial encoding in the literature. It should enable you to effectively (in a real-world sense) calculate the index of any given Sudoku, and vice versa. You will then require only 72.4 bits, as alluded to above (this means that you could code several of them with that average number of bits).
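
Here is a sketch of that ranking in Python. The function count_completions is hypothetical (and expensive): it would return the number of solved grids extending a given prefix of cell values under the fixed cell ordering, which is exactly the constrained-Sudoku count the answer relies on.

    def encode(grid, count_completions):
        """Rank a solved grid (flat list of 81 digits) among all solved Sudokus,
        in lexicographic order under a fixed ordering of the cells."""
        rank, prefix = 0, []
        for d in grid:
            for smaller in range(1, d):
                rank += count_completions(prefix + [smaller])
            prefix.append(d)
        return rank  # fits in 73 bits, since there are ~6.67e21 solved grids

    def decode(rank, count_completions):
        """Inverse of encode: recover the grid from its rank."""
        prefix = []
        for _ in range(81):
            for d in range(1, 10):
                c = count_completions(prefix + [d])
                if rank < c:
                    prefix.append(d)
                    break
                rank -= c
        return prefix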

Edit: The Wikipedia page on the mathematics of Sudoku helps us clarify the picture. Also helpful is a table compiled by Ed Russell.

It turns out that if you consider only the top three rows, then there are essentially only 44 different configurations to consider. In the table, you can find the total number of configurations equivalent to any given one (assuming that the top row is 123456789), and the total number of completions of each one. Given a Sudoku, here is how we would compute its ordinal number:

  1. Normalize the configuration so that its top row is 123456789.
  2. Find out which of the 44 different configurations it belongs to. The Wikipedia article gives an algorithm for that. The table lists the number of equivalence classes for each configuration, as well as the number of completions.
  3. Determine the ordinal number of the configuration of the top three rows inside its equivalence class. This can be done in two ways: either using a list of all the configurations in the equivalence class (there are 36288 in total across all equivalence classes), or by finding a way to quickly enumerate them.
  4. Normalize the remaining rows by sorting rows 4-6 and 7-9 by their first column, and then sorting these two blocks of rows in some arbitrary way. This reduces the number of completions by a factor of 72.
  5. Enumerate all completions having the same first column. There are about 2^20 of them for each equivalence class, so that shouldn't take too long. Some tradeoffs are possible here as well.
  6. Let i be the equivalence class, j the ordinal number of the configuration of the top three rows within the equivalence class, and k the ordinal number of the completion. There are two arrays C_i, D_i (which can be computed from Ed Russell's table) such that C_i + j·D_i + k is the ordinal number of the Sudoku up to the 9!·72 symmetries considered (sketched just below). From that you can compute the actual ordinal number.
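
A minimal sketch of step 6; the arrays C and D are hypothetical stand-ins for what would be precomputed from Ed Russell's table.

    def band_ordinal(i, j, k, C, D):
        """Ordinal number of a grid, up to the 9!*72 symmetries, given
        i = equivalence class of the top band,
        j = index of this band configuration within class i,
        k = index of the completion of the lower six rows.
        C[i] counts all grids in classes before i; D[i] is the number of
        completions per band configuration in class i."""
        return C[i] + j * D[i] + k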

This procedure is reversible, and will generate a Sudoku from an ordinal number. Note that Sudoku enumeration has been reduced to a few minutes (in 2006; see the talk page of the Wikipedia article) or less, so I expect that on a modern computer this approach would be very practical and take a few seconds or less.


2
Is it possible to count the solutions to constrained sudoku efficiently? It is #P-complete if you generalize the size and you allow blanks in arbitrary places.
Tsuyoshi Ito

2
As I alluded to in my answer, arithmetic encoding will achieve near-optimal compression for this scenario.
Peter Shor

1
You might be right, but your claim implies that the number of sudoku grids (6.67×10^21) is easy to compute on a modern computer. It is indeed possible to compute, but is it easy?
Tsuyoshi Ito

2
I got that impression from one of the papers describing how to do the calculation. You could even calculate some of the "heavier" data in preprocessing and store it in a reasonably-sized table - the speed gains can be dramatic. As far as I remember, it took them only a few hours, and that some years ago. Now suppose you use a table to make it 1000 times as fast. What's more, at each stage the numbers decrease exponentially, so most of the work is probably concentrated at the first stage.
Yuval Filmus

1
@tsuyoshi I believe that there's some version/extension of BDDs that makes the computation relatively straightforward - I'd need to do a little bit of digging for it, but I know that they've been used for some fairly complicated combinatorial counting problems.
Steven Stadnicki

4

Here's an algorithm which I suspect will produce a pretty good encoding. You have the finished sudoku you want to compress, and let's say you have already encoded some cells of it, so there's a partial sudoku (not necessarily with a unique solution) with some cells filled in.

Use a fixed algorithm to count how many numbers can be placed into every empty cell. Find the lexicographically first cell into which the smallest number of different numbers can be placed, and encode which one of these numbers goes into it (so if a cell can only contain a 3, 7, or 9, the 3 is encoded by "0", the 7 by "1" and the 9 by "2"). Encode the resulting sequence using arithmetic coding (which takes into account the number of possible numbers that a cell can contain).

I don't know how long the resulting binary sequence will be, but I suspect it's pretty short, especially if your algorithm for counting how many numbers can be placed into a cell is reasonably sophisticated.
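
A rough sketch of this idea (not from the answer itself): it uses only plain row/column/box elimination to find candidates, and packs the choices into a mixed-radix integer instead of a full arithmetic coder (for uniform probabilities within a cell the two give essentially the same length). The decoder would rebuild the same cell order because it applies the same selection rule to the same partial grid.

    from math import ceil, log2

    def candidates(grid, idx):
        """Digits that can still legally go in empty cell idx (grid[i] == 0 means empty)."""
        r, c = divmod(idx, 9)
        used = {grid[i] for i in range(81)
                if grid[i] and (i // 9 == r or i % 9 == c
                                or (i // 9 // 3, i % 9 // 3) == (r // 3, c // 3))}
        return [d for d in range(1, 10) if d not in used]

    def encode(solution):
        """Encode a solved grid: repeatedly take the first empty cell with the
        fewest candidates and record which candidate the solution uses there."""
        grid = [0] * 81
        value, radix = 0, 1
        for _ in range(81):
            empty = [i for i in range(81) if grid[i] == 0]
            idx = min(empty, key=lambda i: (len(candidates(grid, i)), i))
            cand = candidates(grid, idx)
            value += cand.index(solution[idx]) * radix
            radix *= len(cand)
            grid[idx] = solution[idx]
        return value, ceil(log2(radix))  # encoded integer and its worst-case bit length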

If you had a good algorithm that estimated the probability of each cell containing a given number, you could do even better.


3

Any comments and criticisms welcome

An approach from compressed sensing seems to provide a range from 69.96 bits to 171.72 bits:

1.)Storing the puzzle implies storing the solution (information theoretically).

2.) The hardest Sudoku puzzles seem to have t(α)·α^2 entries for some t(α) that depends on α (for example, t(3) = 2.44444 to 3). http://www.usatoday.com/news/offbeat/2006-11-06-sudoku_x.htm

Hence, we have a vector P of length α^4 that has at most t(α)·α^2 non-zero entries.

3.) Take M, a β×α^4 matrix with β ≥ 2·t(α)·α^2, in which any 2·t(α)·α^2 columns are independent and whose entries lie in {0, ±1}. This matrix is fixed for all instances of the puzzle. β = k·t(α)·α^2 for some fixed k suffices by the UUP.

4.) Find V = MP. This has β integers whose magnitudes are on average bounded by α^2, since the entries of M are random and lie in {0, ±1}.

5.) Storing V needs β·log(α^2) = 2k·t(α)·α^2·log α bits.

In your case, α = 3 and t(α) ranges from 2.44 to 3, so 2k·t(α)·α^2·log α = 69.96k bits to 85.86k bits. k = 2, the minimum required, gives roughly 139.92 bits to 171.72 bits as a lower bound for the average case.
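
A small numerical illustration of steps 3 and 4 only (Python/NumPy; not part of the original answer). The matrix below is merely a random {0, ±1} matrix, not verified to satisfy the UUP/independence condition, no recovery step is shown, and the clue positions and values are made up.

    import numpy as np

    alpha = 3
    n = alpha ** 4                      # 81 cells
    t = 3                               # assumed clue density t(alpha)
    k = 2
    beta = k * t * alpha ** 2           # 54 measurements

    rng = np.random.default_rng(0)
    M = rng.integers(-1, 2, size=(beta, n))   # fixed measurement matrix, entries in {0, +1, -1}

    P = np.zeros(n, dtype=int)                # the puzzle: clue digits, zeros elsewhere
    P[[0, 5, 12, 40, 77]] = [5, 3, 7, 9, 1]   # hypothetical clue placement

    V = M @ P                                 # the beta integers one would store
    print(V.shape, V.min(), V.max())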

Note that I have hand-waved some assumptions, such as the sizes of the entries of MP and the number of entries one has on average in the puzzle.

A.) Of course, it might be possible to reduce k from 2, since in Sudoku the positions of the sparse entries are not mutually independent. Each entry has on average t(α)−1 other entries in its row, its column, and its sub-box. That is, given that some entries are present in a sub-box, column, or row, one can find the odds of entries being present in the same row, column, or sub-box.

B.) Each row, column, and sub-box is assumed to have on average t(α) non-zero entries with no repeated digits. This means some types of vectors with t(α) non-zero entries will never occur, thereby reducing the search space of solutions. This could also reduce k. For instance, fixing t(α) entries each in a sub-box, a row, and a column would reduce the search space from C(α^4, t(α)·α^2) to C(α^4 − (3α^2 − 1), t(α)·α^2 − 3t(α)).

A comment: maybe a multi-user arbitrarily correlated Slepian-Wolf model would help make the entries independent while still respecting the at-most-t(α)·α^2 non-zero entries criterion. However, if one could use it, one need not have gone through the compressed sensing route. So the applicability of Slepian-Wolf might be hard.

C.) By analogy with error correction, an even more significant reduction may be possible, since in higher dimensions there could be gaps between the half-minimum-distance-radius Hamming balls around code points, with a possibility of correcting greater errors. This too should lead to a reduction of k.

D.) V itself can be entropy compressed. If the entries of V are quite similar in size, can we then assume that the difference between any two of the entries is at most O(√V_max) = O(α)? If encoding the differences between the entries then suffices, this by itself would remove the factor of 2 in β·log(α^2) = 2k·t(α)·α^2·log α.

It would be interesting to see if 2k can be made equal to or less than 2 using A.), B.), C.), and D.). That would be better than 89 bits (the best so far in the other answers), and in the best case better than the absolute minimum over all puzzles, which is around 73 bits.


1

This is to report an implementation of a compact encoding of completed Sudokus (similar to the suggestion by Zirui Wang, 9/14/11).

The input is the top row and the first 3 digits of the 2nd row. These are reduced to indices in 1-9! and 1-120 and combined into a number <= 4.4x10^7. These are used as givens to count, lexicographically, all the partial Sudokus of 30 digits up to the matching sequence. Then the final count up to the entire 81 digits is done the same way. These 3 values are stored as 32-bit integers of at most 26 bits each, so they can be compressed further. The entire process takes about 3 minutes, with the first 30 digits taking most of the time. The decoding is similar, except it matches counts instead of Sudokus.
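
The report doesn't give the exact reduction used, so here is one plausible sketch of how the top row and the first 3 digits of the 2nd row could be mapped to the 1-9! and 1-120 indices and combined into a number below about 4.4x10^7:

    from math import factorial

    def perm_rank(row):
        """0-based lexicographic rank of the top row as a permutation of 1..9 (< 9!)."""
        rank, remaining = 0, list(range(1, 10))
        for pos, d in enumerate(row):
            rank += remaining.index(d) * factorial(len(row) - 1 - pos)
            remaining.remove(d)
        return rank

    def header_index(row1, row2_first3):
        """Combine the top row and the first 3 cells of row 2 into one index.
        Those 3 cells must use 3 of the 6 digits not already in the first box,
        giving 6*5*4 = 120 possibilities; the total is < 9! * 120 = 43,545,600."""
        avail = [d for d in range(1, 10) if d not in row1[:3]]
        r = 0
        for d in row2_first3:
            r = r * len(avail) + avail.index(d)
            avail.remove(d)
        return perm_rank(row1) * 120 + r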

Coming soon--Revision includes 1st 3 digits of 2nd row in enumeration of 30 digit completions (2nd 32-bit code), comparisons with Jarvis enumeration (Jscott, 3/1615)


1
FYI: If you created two accounts and would like to merge them, see cstheory.stackexchange.com/help/merging-accounts
D.W.

0

I would go with the following simple analysis:

Each value can be stored in 4 bits (it ranges from 1-9; four bits even allow for 0-15).

If we were to store the WHOLE solution (not optimal), we'd have 9×9 = 81 values at 4 bits each = 324 bits.

However, given the rules that a solved Sudoku has to follow, storing every value is in fact redundant. Since the order is what matters, you only need to store the first 8 values in each row (thus determining the 9th value), and only for 8 rows (thus determining the last row). This reduces the Sudoku to 8×8 values at 4 bits each = 256 bits (32 bytes).

I guess I could reduce it to:

b = ⌈log2(v)⌉ · (n−1)^2

where

v = range of values (I've seen 0-5 sudokus a lot)

n = number of rows / columns

Edit: Neo Style: I know Latex.


-2

That number is different for each Sudoku. One of the rules for Sudoku is that it has exactly one solution.

So if you look at an example, that's the minimum amount of data that you must store.

If you work from the opposite side, you can remove digit by digit and run a solver on the result to see if it still has exactly one solution. If so, you can delete another digit. If not, you must restore this digit and try another. If you can't, you have found a minimum.
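
A short sketch of that removal loop (not from the answer itself); count_solutions is a hypothetical solver that returns the number of solutions of a partial grid (it can stop counting at 2):

    import random

    def minimize(puzzle, count_solutions):
        """Greedily blank out clues while the puzzle keeps exactly one solution.
        puzzle: flat list of 81 digits, 0 for an already-empty cell."""
        cells = [i for i, d in enumerate(puzzle) if d != 0]
        random.shuffle(cells)               # the order tried affects the result
        for i in cells:
            saved, puzzle[i] = puzzle[i], 0
            if count_solutions(puzzle) != 1:
                puzzle[i] = saved           # this clue is needed for uniqueness
        return puzzle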

Since most puzzles start mostly empty, a run length encoding will probably yield good results.


This greedy approach does not necessarily achieve the minimum; perhaps you need to select carefully which digit to remove at each step.
Diego de Estrada

It's just an example. Google for "sudoku puzzle generators" to get more sophisticated ones.
Aaron Digulla

5
I really don't see why you would expect this to perform particularly well. This just seems to be gut feeling rather than an answer.
Joe Fitzsimons
Licensed under cc by-sa 3.0 with attribution required.