How can I count the number of occurrences in a Scala list?

99

val list = List(1,2,4,2,4,7,3,2,4)

I want to implement it like this: list.count(2) (which returns 3).

scala

— Gatspy

I don't know whether there is a proper way to get the size of a list in Scala, but depending on your situation you could use a sequence.

— Qusay Fantazia

Is this question still unanswered? I ask because you may have forgotten to accept an answer.

— Tobias Kolb

150

A somewhat cleaner version of one of the other answers is:

val s = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")

s.groupBy(identity).mapValues(_.size)

giving a Map with a count for each item in the original sequence:

Map(banana -> 1, oranges -> 3, apple -> 3)

The question asks how to find the count of a specific item. With this approach, you look up the desired element in the map of counts:

s.groupBy(identity).mapValues(_.size)("apple")

— ohruunuruus

2

What is "identity"?

— Igorock

4

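It is the identity function, as discussed here. The function groupBy requires a function to apply to the elements so that it knows how to group them. Instead of grouping the strings in the answer by their identity, you could group them by length (groupBy(_.size)) or by first letter (groupBy(_.head)).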
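A minimal sketch of the three groupings mentioned in this comment, using the sequence from the answer. The object and value names are made up for illustration, and `map` is used instead of `mapValues` so that the result is a strict `Map` on every Scala 2 version.

```scala
object GroupByKeys {
  val s = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")

  // Group equal strings together, then count the members of each group.
  val byIdentity: Map[String, Int] =
    s.groupBy(identity).map { case (k, v) => k -> v.size }

  // Group by string length instead: "apple" (5), "banana" (6), "oranges" (7).
  val byLength: Map[Int, Int] =
    s.groupBy(_.length).map { case (k, v) => k -> v.size }

  // Group by first character: 'a', 'o', 'b'.
  val byFirstLetter: Map[Char, Int] =
    s.groupBy(_.head).map { case (k, v) => k -> v.size }
}
```

Only the key function changes between the three; the counting step stays the same.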

— ohruunuruus

2

The drawback is that a lot of useless collections are created (since only the sizes are needed).

— Yann Moisan

What if I wanted to define an accumulator map in that expression instead of creating a new map?

— Tobias Kolb

128

Scala collections do have count: list.count(_ == 2)
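As a small illustration, `count` takes any predicate, so it is not limited to equality checks. The object and value names below are illustrative only; the list is the one from the question.

```scala
object CountPredicate {
  val list = List(1, 2, 4, 2, 4, 7, 3, 2, 4)

  // Number of elements equal to 2.
  val twos: Int = list.count(_ == 2)

  // Number of even elements (2, 4, 2, 4, 2, 4).
  val evens: Int = list.count(_ % 2 == 0)

  // An element that does not occur is simply counted zero times.
  val fives: Int = list.count(_ == 5)
}
```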

— xiefei

48

I had the same problem as Sharath Prabhal, and I found another (to me, clearer) solution:

val s = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")
s.groupBy(l => l).map(t => (t._1, t._2.length))

With the result:

Map(banana -> 1, oranges -> 3, apple -> 3)

— KWA

44

A somewhat cleaner version is s.groupBy(identity).mapValues(_.size)

— ohruunuruus

1

@ohruunuruus this should be an answer (rather than a comment); I would enthusiastically upvote it if it were (and select it as the best answer if I were the OP).

— doug

1

@doug I'm somewhat new to SO and wasn't sure, but I'm happy to oblige.

— ohruunuruus

27

list.groupBy(i=>i).mapValues(_.size)

gives

Map[Int, Int] = Map(1 -> 1, 2 -> 3, 7 -> 1, 3 -> 1, 4 -> 3)

You can replace (i=>i) with the built-in identity function:

list.groupBy(identity).mapValues(_.size)

— noego

I love short solutions that use the built-in libraries.

— Rustam Aliyev

14

val list = List(1, 2, 4, 2, 4, 7, 3, 2, 4)
// Using the provided count method, this yields the number of occurrences of each value in the list:
list.map(x => list.count(_ == x))
// => List[Int] = List(1, 3, 3, 3, 3, 1, 1, 3, 3)

// This yields a list of pairs where the first number is the value from the original list
// and the second number is how often that value occurs in the list:
list.map(x => (x, list.count(_ == x)))
// => List[(Int, Int)] = List((1,1), (2,3), (4,3), (2,3), (4,3), (7,1), (3,1), (2,3), (4,3))

— AndreasScheinert

1

But that yields the number of occurrences for each value as many times as the value occurs, which seems inefficient and not very useful...

— Erik Kaplun

13

Starting with Scala 2.13, the groupMapReduce method does this in one pass through the list:

// val seq = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")
seq.groupMapReduce(identity)(_ => 1)(_ + _)
// immutable.Map[String,Int] = Map(banana -> 1, oranges -> 3, apple -> 3)
seq.groupMapReduce(identity)(_ => 1)(_ + _)("apple")
// Int = 3

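This:

groups the list elements (the group part of groupMapReduce)
maps each grouped value occurrence to 1 (the map part of groupMapReduce)
reduces the values within each group by summing them with _ + _ (the reduce part of groupMapReduce).

It is a one-pass version of what can be written as: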

seq.groupBy(identity).mapValues(_.map(_ => 1).reduce(_ + _))
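The following sketch, assuming Scala 2.13 or later, puts the one-pass and the two-step forms side by side and adds a safe lookup for elements that may be absent. The object and value names are illustrative, and `map` replaces `mapValues` in the two-step form so that both results are strict maps that can be compared directly.

```scala
object OnePassCounts {
  val seq = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")

  // One pass: key each element by itself, map it to 1, sum the 1s per key.
  val onePass: Map[String, Int] =
    seq.groupMapReduce(identity)(_ => 1)(_ + _)

  // Two steps: build the groups first, then take the size of each group.
  val twoStep: Map[String, Int] =
    seq.groupBy(identity).map { case (k, v) => k -> v.size }

  // Applying the map directly throws for a missing key; getOrElse avoids that.
  val apples: Int = onePass.getOrElse("apple", 0)
  val kiwis: Int = onePass.getOrElse("kiwi", 0)
}
```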

— Xavier Guihot

Nice, this is what I was looking for. I found it sad that even Java streams (which are not great in some respects) allow this in a single pass while Scala could not.

— Dici

8

I ran into the same problem but wanted to count multiple items in one go.

val s = Seq("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")
s.foldLeft(Map.empty[String, Int]) { (m, x) => m + ((x, m.getOrElse(x, 0) + 1)) }
res1: scala.collection.immutable.Map[String,Int] = Map(apple -> 3, oranges -> 3, banana -> 1)

https://gist.github.com/sharathprabhal/6890475

— Sharath Prabhal

Perhaps using Stream with the accepted answer would meet your goal of "one go" and give clearer code.

— juanchito

This solution iterates over the List only once; using groupBy and then map would do it twice.

— ruloweb

7

If you want to use it like list.count(2), you have to implement it using an implicit class.

implicit class Count[T](list: List[T]) {
  def count(n: T): Int = list.count(_ == n)
}

List(1,2,4,2,4,7,3,2,4).count(2)  // returns 3
List(1,2,4,2,4,7,3,2,4).count(5)  // returns 0
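A variation on the same idea, sketched with a distinct method name so that it cannot be confused with the built-in predicate-based `count`. The names `OccurrenceSyntax`, `OccurrenceOps` and `occurrencesOf` are hypothetical helpers invented for this example, not part of the standard library.

```scala
object OccurrenceSyntax {
  // Hypothetical extension: adds occurrencesOf to any Seq[T].
  implicit class OccurrenceOps[T](xs: Seq[T]) {
    def occurrencesOf(x: T): Int = xs.count(_ == x)
  }

  // The implicit class is in scope inside this object; elsewhere,
  // bring it in with `import OccurrenceSyntax._`.
  val twos: Int = List(1, 2, 4, 2, 4, 7, 3, 2, 4).occurrencesOf(2)
  val fives: Int = List(1, 2, 4, 2, 4, 7, 3, 2, 4).occurrencesOf(5)
}
```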

— LRLucena

7

Short answer:

import scalaz._, Scalaz._
xs.foldMap(x => Map(x -> 1))

Long answer:

Using Scalaz, given:

import scalaz._, Scalaz._

val xs = List('a, 'b, 'c, 'c, 'a, 'a, 'b, 'd)

then all of these (ordered from less simplified to more simplified)

xs.map(x => Map(x -> 1)).foldMap(identity)
xs.map(x => Map(x -> 1)).foldMap()
xs.map(x => Map(x -> 1)).suml
xs.map(_ -> 1).foldMap(Map(_))
xs.foldMap(x => Map(x -> 1))

yield

Map('b -> 2, 'a -> 3, 'c -> 2, 'd -> 1)

— Erik Kaplun

6

Interestingly, the map with a default value of 0, which is intentionally designed for this case, shows the worst performance (and is not as concise as groupBy).

    type Word = String
    type Sentence = Seq[Word]
    type Occurrences = scala.collection.Map[Char, Int]

  def woGrouped(w: Word): Occurrences = {
        w.groupBy(c => c).map({case (c, list) => (c -> list.length)})
  }                                               //> woGrouped: (w: forcomp.threadBug.Word)forcomp.threadBug.Occurrences

  def woGetElse0Map(w: Word): Occurrences = {
        val map = Map[Char, Int]()
        w.foldLeft(map)((m, c) => m + (c -> (m.getOrElse(c, 0) + 1)) )
  }                                               //> woGetElse0Map: (w: forcomp.threadBug.Word)forcomp.threadBug.Occurrences

  def woDeflt0Map(w: Word): Occurrences = {
        val map = Map[Char, Int]().withDefaultValue(0)
        w.foldLeft(map)((m, c) => m + (c -> (m(c) + 1)) )
  }                                               //> woDeflt0Map: (w: forcomp.threadBug.Word)forcomp.threadBug.Occurrences

  def dfltHashMap(w: Word): Occurrences = {
        val map = scala.collection.immutable.HashMap[Char, Int]().withDefaultValue(0)
        w.foldLeft(map)((m, c) => m + (c -> (m(c) + 1)) )
    }                                             //> dfltHashMap: (w: forcomp.threadBug.Word)forcomp.threadBug.Occurrences

    def mmDef(w: Word): Occurrences = {
        val map = scala.collection.mutable.Map[Char, Int]().withDefaultValue(0)
        w.foldLeft(map)((m, c) => m += (c -> (m(c) + 1)) )
  }                                               //> mmDef: (w: forcomp.threadBug.Word)forcomp.threadBug.Occurrences

    val functions = List("grp" -> woGrouped _, "mtbl" -> mmDef _, "else" -> woGetElse0Map _
    , "dfl0" -> woDeflt0Map _, "hash" -> dfltHashMap _
    )                                  //> functions  : List[(String, String => scala.collection.Map[Char,Int])] = Lis
                                                  //| t((grp,<function1>), (mtbl,<function1>), (else,<function1>), (dfl0,<functio
                                                  //| n1>), (hash,<function1>))


    val len = 100 * 1000                      //> len  : Int = 100000
    def test(len: Int) {
        val data: String = scala.util.Random.alphanumeric.take(len).toList.mkString
        val firstResult = functions.head._2(data)

        def run(f: Word => Occurrences): Int = {
            val time1 = System.currentTimeMillis()
            val result= f(data)
            val time2 = (System.currentTimeMillis() - time1)
            assert(result.toSet == firstResult.toSet)
            time2.toInt
        }

        def log(results: Seq[Int]) = {
                 ((functions zip results) map {case ((title, _), r) => title + " " + r} mkString " , ")
        }

        var groupResults = List.fill(functions.length)(1)

        val integrals = for (i <- (1 to 10)) yield {
            val results = functions map (f => (1 to 33).foldLeft(0) ((acc,_) => run(f._2)))
            println (log (results))
                groupResults = (results zip groupResults) map {case (r, gr) => r + gr}
                log(groupResults).toUpperCase
        }

        integrals foreach println

    }                                         //> test: (len: Int)Unit


    test(len)
    test(len * 2)
// GRP 14 , mtbl 11 , else 31 , dfl0 36 , hash 34
// GRP 91 , MTBL 111

    println("Done")
    def main(args: Array[String]) {
    }

produces

grp 5 , mtbl 5 , else 13 , dfl0 17 , hash 17
grp 3 , mtbl 6 , else 14 , dfl0 16 , hash 16
grp 3 , mtbl 6 , else 13 , dfl0 17 , hash 15
grp 4 , mtbl 5 , else 13 , dfl0 15 , hash 16
grp 23 , mtbl 6 , else 14 , dfl0 15 , hash 16
grp 5 , mtbl 5 , else 13 , dfl0 16 , hash 17
grp 4 , mtbl 6 , else 13 , dfl0 16 , hash 16
grp 4 , mtbl 6 , else 13 , dfl0 17 , hash 15
grp 3 , mtbl 5 , else 14 , dfl0 16 , hash 16
grp 3 , mtbl 6 , else 14 , dfl0 16 , hash 16
GRP 5 , MTBL 5 , ELSE 13 , DFL0 17 , HASH 17
GRP 8 , MTBL 11 , ELSE 27 , DFL0 33 , HASH 33
GRP 11 , MTBL 17 , ELSE 40 , DFL0 50 , HASH 48
GRP 15 , MTBL 22 , ELSE 53 , DFL0 65 , HASH 64
GRP 38 , MTBL 28 , ELSE 67 , DFL0 80 , HASH 80
GRP 43 , MTBL 33 , ELSE 80 , DFL0 96 , HASH 97
GRP 47 , MTBL 39 , ELSE 93 , DFL0 112 , HASH 113
GRP 51 , MTBL 45 , ELSE 106 , DFL0 129 , HASH 128
GRP 54 , MTBL 50 , ELSE 120 , DFL0 145 , HASH 144
GRP 57 , MTBL 56 , ELSE 134 , DFL0 161 , HASH 160
grp 7 , mtbl 11 , else 28 , dfl0 31 , hash 31
grp 7 , mtbl 10 , else 28 , dfl0 32 , hash 31
grp 7 , mtbl 11 , else 28 , dfl0 31 , hash 32
grp 7 , mtbl 11 , else 28 , dfl0 31 , hash 33
grp 7 , mtbl 11 , else 28 , dfl0 32 , hash 31
grp 8 , mtbl 11 , else 28 , dfl0 31 , hash 33
grp 8 , mtbl 11 , else 29 , dfl0 38 , hash 35
grp 7 , mtbl 11 , else 28 , dfl0 32 , hash 33
grp 8 , mtbl 11 , else 32 , dfl0 35 , hash 41
grp 7 , mtbl 13 , else 28 , dfl0 33 , hash 35
GRP 7 , MTBL 11 , ELSE 28 , DFL0 31 , HASH 31
GRP 14 , MTBL 21 , ELSE 56 , DFL0 63 , HASH 62
GRP 21 , MTBL 32 , ELSE 84 , DFL0 94 , HASH 94
GRP 28 , MTBL 43 , ELSE 112 , DFL0 125 , HASH 127
GRP 35 , MTBL 54 , ELSE 140 , DFL0 157 , HASH 158
GRP 43 , MTBL 65 , ELSE 168 , DFL0 188 , HASH 191
GRP 51 , MTBL 76 , ELSE 197 , DFL0 226 , HASH 226
GRP 58 , MTBL 87 , ELSE 225 , DFL0 258 , HASH 259
GRP 66 , MTBL 98 , ELSE 257 , DFL0 293 , HASH 300
GRP 73 , MTBL 111 , ELSE 285 , DFL0 326 , HASH 335
Done

It is curious that the most concise option, groupBy, is faster than even the mutable map!

— Val

3

I'm a bit suspicious of this benchmark, as it is unclear what the size of the data is. The groupBy solution performs a toLower, but the others do not. Also, why use a pattern match for the map? Just use mapValues. Roll that together and you get def woGrouped(w: Word): Map[Char, Int] = w.groupBy(identity).mapValues(_.size); try that and check the performance for lists of various sizes. Finally, in the other solutions, why a) declare the map and b) make it a var? Just do w.foldLeft(Map.empty[Char, Int])...

— samthebest

1

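Thanks for providing more data (changed my vote :). I think the reason is that the groupBy implementation uses a mutable map of Builders, which are optimized for iterative increments. It then converts the mutable map to an immutable one using a MapBuilder. There may also be some lazy evaluation going on under the hood to speed things up.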

— samthebest jul.

@samthebest You just look up the counter and increment it. I don't see what could be cached there. The cache would have to be a map of the same kind anyway.

— Val

I'm not saying it caches anything. I think the performance gain comes from the use of Builders, and possibly from some lazy evaluation.

— samthebest jul.

@samthebest Lazy evaluation = delayed evaluation (call by name) + caching. You cannot talk about lazy evaluation without also talking about caching.
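The distinction drawn in this comment can be sketched in plain Scala: a by-name parameter delays evaluation but recomputes on every use, while a `lazy val` delays evaluation and caches the result. The object, the counter and the `expensive` function are made up for this illustration and say nothing about how groupBy is implemented.

```scala
object LazyVsByName {
  // Counts how many times the "expensive" computation actually runs.
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 42 }

  // By-name parameter: the argument is re-evaluated at each use of x.
  def twiceByName(x: => Int): Int = x + x

  val byNameResult: Int = twiceByName(expensive())
  val afterByName: Int = evaluations // expensive() ran twice so far

  // lazy val: evaluated on first access, then the cached value is reused.
  lazy val cached: Int = expensive()
  val lazyResult: Int = cached + cached
  val afterLazy: Int = evaluations // only one additional run
}
```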

— Val

4

I could not get the size of the list using length, as one of the answers above suggests, and used size instead, because of the issue reported here.

val list = List("apple", "oranges", "apple", "banana", "apple", "oranges", "oranges")
list.groupBy(x=>x).map(t => (t._1, t._2.size))

— Edkeveked

3

Here is another option:

scala> val list = List(1,2,4,2,4,7,3,2,4)
list: List[Int] = List(1, 2, 4, 2, 4, 7, 3, 2, 4)

scala> list.groupBy(x => x) map { case (k,v) => k-> v.length }
res74: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 3, 7 -> 1, 3 -> 1, 4 -> 3)

— Akavall

3

scala> val list = List(1,2,4,2,4,7,3,2,4)
list: List[Int] = List(1, 2, 4, 2, 4, 7, 3, 2, 4)

scala> println(list.filter(_ == 2).size)
3

— balu

3

Using cats

import cats.implicits._

"Alphabet".toLowerCase().map(c => Map(c -> 1)).toList.combineAll
"Alphabet".toLowerCase().map(c => Map(c -> 1)).toList.foldMap(identity)

— Sergii Shevchyk

2

Wow, 4 iterations through the original sequence! Even seq.groupBy(identity).mapValues(_.size) only goes through it twice.

— WeaponsGrade

The number of iterations might not matter for a small string like "Alphabet", but when dealing with millions of items in a collection, iterations certainly do matter!

— WeaponsGrade

2

Try this, it should work.

val list = List(1,2,4,2,4,7,3,2,4)
list.count(_==2)

It returns 3.

— dcripse 데이터

How is this different from xiefei's answer from 7 years ago?

— jwvh

0

Here is a pretty easy way to do it.

val data = List("it", "was", "the", "best", "of", "times", "it", "was", 
                 "the", "worst", "of", "times")
data.foldLeft(Map[String,Int]().withDefaultValue(0)){
  case (acc, letter) =>
    acc + (letter -> (1 + acc(letter)))
}
// => Map(worst -> 1, best -> 1, it -> 2, was -> 2, times -> 2, of -> 2, the -> 2)

— Jim Newton