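Fastest Python code to find a winning set of words in this game

This is a word game from a set of activity cards for children. Below the rules is code that finds the best triplet of words using /usr/share/dict/words. I thought it was an interesting optimization problem, and I am wondering whether people can find improvements.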

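Rules

1. Choose one letter from each of the sets below.
2. Choose a word using the letters chosen (and any others).
3. Score the word.
   - Each occurrence of a chosen letter scores the number shown with its set (repeats included).
   - AEIOU count 0.
   - All other letters are -2.
4. Repeat steps 1-3 above (without reusing letters in step 1).
5. The final score is the sum of the three word scores.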

Sets

(set 1 scores 1 point, set 2 scores 2 points, etc.)
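
To make the scoring rules concrete, here is a minimal Python 3 sketch that scores a single word for one choice of letters. The set contents are taken from the `points` dictionary in the question's code; the function name `score_word` and its `chosen` argument are made up for this illustration and are not part of the original post.

```python
# Letter sets and their point values, as listed in the question's `points` dict.
SETS = [('LTN', 1), ('RDS', 2), ('GBM', 3), ('CHP', 4),
        ('FWV', 5), ('YKJ', 6), ('QXZ', 7)]
VOWELS = set('AEIOU')


def score_word(word, chosen):
    """Score `word` when `chosen` holds the one letter picked from each set.

    `chosen` is a 7-character string, one letter per set in SETS order
    (a hypothetical calling convention used only for this sketch).
    """
    value = {}
    for (letters, points), pick in zip(SETS, chosen):
        if pick not in letters:
            raise ValueError('%r is not in set %r' % (pick, letters))
        value[pick] = points

    total = 0
    for ch in word.upper():
        if ch in value:
            total += value[ch]      # chosen letter: its set's number, repeats included
        elif ch not in VOWELS:
            total -= 2              # any other consonant costs 2
        # vowels contribute 0
    return total
```

For example, `score_word('KNICKKNACK', 'NRGCFKQ')` counts two N at 1, two C at 4 and four K at 6, giving 34.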

Code:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

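def tonum(word):
    # Column vector of letter counts: entry i is how often chr(ord('A') + i) occurs.
    word_array = np.zeros(26, dtype=int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    # Row vector of per-letter scores: -2 by default, 0 for vowels, and
    # 1..7 for the letter chosen from each of the seven sets.
    score_array = np.zeros(26, dtype=int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))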

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
    for c1 in ['LTN']:
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

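The matrix version is what I came up with after writing one version in pure Python (using dictionaries and scoring each word independently) and another in numpy that used indexing rather than matrix multiplication.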
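
The matrix formulation can be sketched without NumPy: each word becomes a vector of 26 letter counts, each choice of letters becomes a vector of 26 per-letter values, and a word's score is their dot product. The helper names below are illustrative only. The posted code computes the same products for every word at once by multiplying a 1x26 score row by a 26xN count matrix.

```python
def letter_counts(word):
    """26 counts, one per letter A..Z (word must be uppercase A-Z)."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('A')] += 1
    return counts


def score_vector(chosen):
    """26 per-letter values for `chosen`, the letters picked from sets 1..7 in order."""
    values = [-2] * 26                      # default: unchosen consonant
    for v in 'AEIOU':
        values[ord(v) - ord('A')] = 0       # vowels score nothing
    for idx, ch in enumerate(chosen):
        values[ord(ch) - ord('A')] = idx + 1
    return values


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Score of one word under one choice of letters.
example = dot(score_vector('NRGCFKQ'), letter_counts('KNICKKNACK'))
```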

The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord() function), but I wonder whether there are even faster approaches.
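
One possible reading of that idea: since vowels always score 0, they can be dropped from the vectors altogether, shrinking each one from 26 entries to 21. This is a minimal Python 3 sketch in which a lookup table stands in for the "modified ord()"; the names `CONSONANTS`, `CONSONANT_INDEX` and `consonant_counts` are invented for illustration.

```python
import string

# The 21 consonants, in alphabetical order, and their compact indices 0..20.
CONSONANTS = [c for c in string.ascii_uppercase if c not in 'AEIOU']
CONSONANT_INDEX = {c: i for i, c in enumerate(CONSONANTS)}


def consonant_counts(word):
    """Count vector over consonants only; vowels are skipped entirely."""
    counts = [0] * len(CONSONANTS)
    for ch in word:
        idx = CONSONANT_INDEX.get(ch)
        if idx is not None:
            counts[idx] += 1
    return counts
```

With this layout the score vectors no longer need zero entries for the vowels, and the words no longer need the vowel-stripping pass before counting.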

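EDIT: added timeit.timeit code

EDIT: I'm adding a bounty, which I'll give to whichever improvement I like best (or possibly to multiple answers, though I would need to earn more reputation in that case).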

fastest-code python optimization

— thouis

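BTW, I wrote the code so that my eight-year-old could memorize three words to use when playing the game against his mother. Now I know what xylopyrography means.

This is a fun question. I think you could get more answers if you provided the following: (1) A link to an online word list, so that everyone works with the same data set. (2) Put your solution in a single function. (3) Run that function with the timeit module to show timings. (4) Load the dictionary data outside the function so that we are not testing disk speed. People could then use the existing code as a framework for comparing their solutions.

I will rewrite it to use timeit, but for fair comparisons I would have to use my own machine (which I'm happy to do for people posting solutions). A word list should be available on most systems, but if not, there are several here: wordlist.sourceforge.net

Fair comparisons are possible if each user compares their own solution and any other posted solutions on their own machine. There will be some differences between platforms, but in general this method works.

Hmm, in that case I wonder whether this is the right site. I think SO would have been the best fit.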

— Joey

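Answers:

Using Keith's idea of precomputing the best possible score for each word, I managed to reduce the execution time to about 0.7 seconds on my computer (using a list of 75,288 words).

The trick is to go through the combinations of words to be played instead of all combinations of letters picked. We can ignore all but a few combinations of words (203 with my word list) because they cannot score higher than what we have already found. Nearly all of the execution time is spent precomputing word scores.

Python 2.7: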

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, choosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, choosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

This returns the solution ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'] with a score of 95. With the words from Keith's solution added to the word list, I get the same result as he does. With thouis's "xylopyrography" added, I get ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'] with a score of 105.
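
The reported totals can be rechecked independently of the code above. The sketch below is a compact Python 3 restatement of the scoring, not the answer's implementation: it relies on the fact that each set's letters are handed out to the words independently of the other sets, so the best assignment can be found one set at a time. The names `GROUPS` and `best_total` are illustrative.

```python
from itertools import permutations

GROUPS = [('LTN', 1), ('RDS', 2), ('GBM', 3), ('CHP', 4),
          ('FWV', 5), ('YKJ', 6), ('QXZ', 7)]


def best_total(words):
    """Best total score for up to three uppercase words.

    Every consonant belongs to exactly one set, so the total is a sum of
    per-set terms; vowels contribute nothing.
    """
    total = 0
    for letters, points in GROUPS:
        # All occurrences of this set's letters across the words.
        in_group = sum(w.count(c) for w in words for c in letters)
        # Give each word a different letter of the set; keep the best assignment.
        chosen = max(sum(w.count(c) for w, c in zip(words, perm))
                     for perm in permutations(letters, len(words)))
        total += points * chosen - 2 * (in_group - chosen)
    return total


first = best_total(['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'])
second = best_total(['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'])
```

Here `first` evaluates to 95 and `second` to 105, matching the figures above.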

— Flornquake

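Here's an idea: you can avoid checking lots of words by noticing that most words have awful scores. Say you have found a pretty good play that gets you 50 points. Then any play that scores more than 50 points must contain a word worth at least ceil(51/3) = 17 points. So any word that cannot possibly produce 17 points can be ignored.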
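
The threshold arithmetic can be written as a small helper. This is a Python 3 sketch; the name `min_useful_word_score` and the `words_left` parameter are assumptions introduced for illustration.

```python
def min_useful_word_score(best_total, words_left=3):
    """Smallest score the strongest remaining word needs in order to beat `best_total`.

    Beating best_total takes at least best_total + 1 points spread over
    `words_left` words, so the largest of them is worth at least the ceiling
    of (best_total + 1) / words_left.
    """
    # -(-a // b) is integer ceiling division.
    return -(-(best_total + 1) // words_left)


threshold = min_useful_word_score(50)
```

With a best play of 50 points, `threshold` is 17, as in the example above. Under Python 2 integer division, the expression `(best + 3) / 3` used in the code below computes the same bound.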

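Here is some code that does the above. We compute the best possible score for each word in the dictionary and store the words in an array indexed by score. Then we use that array to check only the words that reach the required minimum score.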

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

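The minimum score to beat quickly climbs to 100, which means we only need to consider words worth 33+ points, a very small fraction of the total (my /usr/share/dict/words has 208662 valid words, of which only 1723 are worth 33+ points, i.e. 0.8%). It runs in about half an hour on my machine and produces: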

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Keith Randall

Nice. I will add that to the matrix solution (removing words as their score falls too low), but this is significantly better than any of the pure Python solutions I had come up with.

— thouis

I'm not sure I have ever seen that many nested for loops before.

— Peter Olson

Combining your idea with matrix scoring (and a tighter upper bound on the best possible score) cuts the time on my machine down to about 80 seconds (from about an hour). Code here

— thouis

A good fraction of that time is spent precomputing the best possible scores, which could be made much faster.

— thouis