Standardize the samples (compute the z-score)


14

Given a list of floating-point numbers, standardize it.

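Details

  • A list $x_1, x_2, \ldots, x_n$ is standardized if the mean of all values is 0 and the standard deviation is 1. One way to compute this is to first compute the mean $\mu$ and the standard deviation $\sigma$ as
    $$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2},$$
    and then compute the standardization by replacing every $x_i$ with $\frac{x_i-\mu}{\sigma}$.
  • You can assume that the input contains at least two distinct entries (which implies $\sigma \neq 0$).
  • Note that some implementations use the sample standard deviation, which is not equal to the population standard deviation $\sigma$ used here.
  • There is a CW answer for all trivial solutions.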
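
For readers who want an ungolfed reference, the definition above can be sketched in Python as follows. This is only an illustration of the formulas, not the script used to generate the examples; the function name `standardize` is chosen here for convenience.

```python
import math

def standardize(xs):
    """Return the z-scores of xs, using the population standard deviation."""
    n = len(xs)
    mu = sum(xs) / n                                        # mean
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)   # divide by n, not n - 1
    return [(x - mu) / sigma for x in xs]

print(standardize([1, 2, 3]))
print(standardize([-3, 1, 4, 1, 5]))
```

The division by `sigma` relies on the guarantee that the input has at least two distinct entries; a constant list would raise `ZeroDivisionError`.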

[1,2,3] -> [-1.224744871391589,0.0,1.224744871391589]
[1,2] -> [-1,1]
[-3,1,4,1,5] -> [-1.6428571428571428,-0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]

(These examples were generated with this script.)

Answers:






4

APL+WIN, 41 32 30 bytes

9 bytes saved thanks to Erik, plus 2 more thanks to ngn

x←v-(+/v)÷⍴v←⎕⋄x÷(+/x×x÷⍴v)*.5

Prompts for a vector of numbers and calculates the mean, the standard deviation and the standardized elements of the input vector.


Can't you assign x←v-(+/v)÷⍴v←⎕ and then do x÷((+/x*2)÷⍴v)*.5?
Erik the Outgolfer

I can indeed. Thanks.
Graham

Does APL+WIN do singleton extension (1 2 3+,4 ←→ 1 2 3+4)? If so, you could rewrite (+/x*2)÷⍴v as +/x×x÷⍴v
ngn

@ngn That works, for another 2 bytes. Thanks.
Graham

3

R + pryr, 53 52 bytes

-1 byte using sum(x|1) instead of length(x), as seen in @Robert S.'s solution

pryr::f((x-(y<-mean(x)))/(sum((x-y)^2)/sum(x|1))^.5)

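For a language built for statisticians, I'm amazed that this doesn't have a built-in function. At least not one that I could find. Even the function mosaic::zscore doesn't yield the expected results. This is probably because the challenge uses the population standard deviation rather than the sample standard deviation.

Try it online!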
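
To illustrate why a built-in based on the sample standard deviation gives different numbers, here is a small Python sketch using the standard `statistics` module (a stand-in chosen for illustration; it says nothing about how the R functions are implemented). `pstdev` divides by n, while `stdev` divides by n - 1.

```python
import statistics

data = [1, 2, 3]
mu = statistics.mean(data)

pop_sd = statistics.pstdev(data)   # population standard deviation (divides by n)
samp_sd = statistics.stdev(data)   # sample standard deviation (divides by n - 1)

z_population = [(x - mu) / pop_sd for x in data]   # what the challenge asks for
z_sample = [(x - mu) / samp_sd for x in data]      # what a sample-based z-score yields

print(z_population)
print(z_sample)
```

For this input the two results differ by a factor of sqrt(n / (n - 1)) = sqrt(3 / 2).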


2
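You can change the <- to = to save 1 byte.
Robert S.

@J.Doe nope, I used the method I commented on Robert S.'s solution. scale is neat!
Giuseppe

2
@J.Doe since you only use n once, that gives 38 bytes
Giuseppe

2
@RobertS. Here on PPCG we tend to encourage flexible input and output, including outputting more than is required, except for challenges where the precise layout of the output is the whole point.
ngm

6
Of course the R built-ins don't use "population variance". Only confused engineers would use such a thing (hence the Python and Matlab answers ;)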
ngm


2

Jelly, 10 bytes

_ÆmµL½÷ÆḊ×

Try it online!

Not shorter, but Jelly's determinant function ÆḊ also computes the vector norm.

_Æm             x - mean(x)
   µ            then:
    L½          Square root of the Length
      ÷ÆḊ       divided by the norm
         ×      Multiply by that value
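
The identity this relies on is that σ equals the Euclidean norm of the centered vector divided by √n, so multiplying by √n / norm is the same as dividing by σ. A minimal Python sketch of that idea follows (the function name is hypothetical, and this mirrors the explanation above rather than Jelly's actual evaluation model):

```python
import math

def standardize_via_norm(xs):
    n = len(xs)
    mu = sum(xs) / n
    centered = [x - mu for x in xs]                   # x - mean(x)
    norm = math.sqrt(sum(c * c for c in centered))    # Euclidean norm of the centered vector
    factor = math.sqrt(n) / norm                      # sqrt(length) / norm == 1 / sigma
    return [c * factor for c in centered]             # multiply by that value

print(standardize_via_norm([1, 2, 3]))
```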

Hey, nice alternative! Unfortunately, I don't see a way to shorten it.
Outgolfer Erik

2

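Mathematica, 25 bytes

Mean[(a=#-Mean@#)a]^-.5a&

Pure function. Takes a list of numbers as input and returns a list of machine-precision numbers as output. Note that the built-in Standardize function uses the sample variance by default.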


2

J, 22 bytes

-1 byte thanks to Cows quack!

(-%[:%:1#.-*-%#@[)+/%#

Try it online!

J, 31 23 bytes

(-%[:%:#@[%~1#.-*-)+/%#

Try it online!

                   +/%# - mean (sum (+/) divided (%) by the number of samples (#)) 
(                 )     - the list is a left argument here (we have a hook)
                 -      - the difference between each sample and the mean
                *       - multiplied by 
               -        - the difference between each sample and the mean
            1#.         - sum by base-1 conversion
          %~            - divided by
       #@[              - the length of the samples list
     %:                 - square root
   [:                   - convert to a fork (function composition) 
 -                      - subtract the mean from each sample
  %                     - and divide it by sigma

1
A rearrangement gives 22: [:(%[:%:1#.*:%#)]-+/%# tio.run/##y/qfVmyrp2CgYKVg8D/… It feels like one of them could be removed, but no luck so far. Edit: a more direct byte shave is (-%[:%:1#.-*-%#@[)+/%#, also at 22.
user41805

@Cows quack Thanks!
Galen Ivanov


2

Haskell, 80 75 68 bytes

t x=k(/sqrt(f$sum$k(^2)))where k g=g.(-f(sum x)+)<$>x;f=(/sum(1<$x))

Thanks to @flawr for the suggestions to use sum(1<$x) instead of sum[1|_<-x] and to inline the mean, @xnor for inlining the standard deviation and other reductions.

Expanded:

-- Standardize a list of values of any floating-point type.
standardize :: Floating a => [a] -> [a]
standardize input = eachLessMean (/ sqrt (overLength (sum (eachLessMean (^2)))))
  where

    -- Map a function over each element of the input, less the mean.
    eachLessMean f = map (f . subtract (overLength (sum input))) input

    -- Divide a value by the length of the input.
    overLength n = n / sum (map (const 1) input)

1
You can replace [1|_<-x] with (1<$x) to save a few bytes. That is a great trick for avoiding the fromIntegral, that I haven't seen so far!
flawr

By the way: I like using tryitonline; you can run your code there and then copy the preformatted answer for posting here!
flawr


You can write (-x+) for (+(-x)) to avoid parens. Also it looks like f can be pointfree: f=(/sum(1<$x)), and s can be replaced with its definition.
xnor

@xnor Ooh, (-x+) is handy, I’m sure I’ll be using that in the future
Jon Purdy

2

MathGolf, 7 bytes

▓-_²▓√/

Try it online!

Explanation

This is literally a byte-for-byte recreation of Kevin Cruijssen's 05AB1E answer, but I save some bytes from MathGolf having 1-byters for everything needed for this challenge. Also the answer looks quite good in my opinion!

▓         get average of list
 -        pop a, b : push(a-b)
  _       duplicate TOS
   ²      pop a : push(a*a)
    ▓     get average of list
     √    pop a : push(sqrt(a)), split string to list
      /   pop a, b : push(a/b), split strings

1

JavaScript (ES7), 80 79 bytes

a=>a.map(x=>(x-g(a))/g(a.map(x=>(x-m)**2))**.5,g=a=>m=eval(a.join`+`)/a.length)

Try it online!

Commented

a =>                      // given the input array a[]
  a.map(x =>              // for each value x in a[]:
    (x - g(a)) /          //   compute (x - mean(a)) divided by
    g(                    //   the standard deviation:
      a.map(x =>          //     for each value x in a[]:
        (x - m) ** 2      //       compute (x - mean(a))²
      )                   //     compute the mean of this array
    ) ** .5,              //   and take the square root
    g = a =>              //   g = helper function taking an array a[],
      m = eval(a.join`+`) //     computing the mean
          / a.length      //     and storing the result in m
  )                       // end of outer map()


1

Haskell, 59 bytes

(%)i=sum.map(^i)
f l=[(0%l*y-1%l)/sqrt(2%l*0%l-1%l^2)|y<-l]

Try it online!

Doesn't use libraries.

The helper function % computes the sum of ith powers of a list, which lets us get three useful values.

  • 0%l is the length of l (call this n)
  • 1%l is the sum of l (call this s)
  • 2%l is the sum of squares of l (call this m)

We can express the z-score of an element y as

(n*y-s)/sqrt(n*m-s^2)

(This is the expression (y-s/n)/sqrt(m/n-(s/n)^2) simplified a bit by multiplying the top and bottom by n.)

We can insert the expressions 0%l, 1%l, 2%l without parens because the % we define has higher precedence than the arithmetic operators.

(%)i=sum.map(^i) is the same length as i%l=sum.map(^i)l. Making it more point-free doesn't help. Defining it like g i=... loses bytes when we call it. Although % works for any list and we only ever call it with the problem input list, there's no byte loss in passing l every time, because a two-argument call i%l is no longer than a one-argument call g i.
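
As a cross-check of the power-sum formula, here is a rough Python transcription. It is a sketch rather than a port of the golfed code: `power_sum` plays the role of %, and n, s and m are the length, sum and sum of squares as named in the list above.

```python
import math

def power_sum(i, xs):
    """Sum of the i-th powers of xs (i = 0 gives the length)."""
    return sum(x ** i for x in xs)

def standardize_moments(xs):
    n = power_sum(0, xs)   # length
    s = power_sum(1, xs)   # sum
    m = power_sum(2, xs)   # sum of squares
    denom = math.sqrt(n * m - s ** 2)
    return [(n * y - s) / denom for y in xs]

print(standardize_moments([-3, 1, 4, 1, 5]))
```

For [-3,1,4,1,5] this gives n = 5, s = 8, m = 52, so the denominator is sqrt(260 - 64) = 14 and the first entry is (-15 - 8)/14 = -23/14. One caveat of this form in general: n*m - s^2 subtracts two potentially large numbers, so it can lose precision when the mean is large compared with the spread.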


We do have LaTeX here :)
flawr

I really like the % idea! It looks just like the discrete version of the statistical moments.
flawr

1

K (oK), 33 23 bytes

-10 bytes thanks to ngn!

{t%%(+/t*t:x-/x%#x)%#x}

Try it online!

First attempt at coding (I don't dare to name it "golfing") in K. I'm pretty sure it can be done much better (too many variable names here...)


1
nice! you can replace the initial (x-m) with t (tio)
ngn

1
the inner { } is unnecessary - its implicit parameter name is x and it has been passed an x as argument (tio)
ngn

1
another -1 byte by replacing x-+/x with x-/x. the left argument to -/ serves as initial value for the reduction (tio)
ngn

@ngn Thank you! Now I see that the first 2 golfs are obvious; the last one is beyond my current level :)
Galen Ivanov
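
The last golf in this thread may be easier to follow as a fold with an initial value: starting from each x and subtracting every element of x%#x in turn gives x minus the mean. A small Python sketch of that reading, using `functools.reduce` (this models the idea described in the comment, not K's actual evaluation; the name `center` is made up here):

```python
from functools import reduce

def center(xs):
    n = len(xs)
    scaled = [x / n for x in xs]   # x % #x : each element divided by the length
    # Fold subtraction over `scaled`, with each x as the initial value:
    # ((x - x0/n) - x1/n) - ... == x - mean(xs)
    return [reduce(lambda acc, t: acc - t, scaled, x) for x in xs]

print(center([1, 2, 3]))
```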


1

TI-Basic (83 series), 14 11 bytes

Ans-mean(Ans
Ans/√(mean(Ans²

Takes input in Ans. For example, if you type the above into prgmSTANDARD, then {1,2,3}:prgmSTANDARD will return {-1.224744871,0.0,1.224744871}.

Previously, I tried using the 1-Var Stats command, which stores the population standard deviation in σx, but it's less trouble to compute it manually.


1

05AB1E, 9 bytes

ÅA-DnÅAt/

Port of @Arnauld's JavaScript answer, so make sure to upvote him!

Try it online or verify all test cases.

Explanation:

ÅA          # Calculate the mean of the (implicit) input
            #  i.e. [-3,1,4,1,5] → 1.6
  -         # Subtract it from each value in the (implicit) input
            #  i.e. [-3,1,4,1,5] and 1.6 → [-4.6,-0.6,2.4,-0.6,3.4]
   D        # Duplicate that list
    n       # Take the square of each
            #  i.e. [-4.6,-0.6,2.4,-0.6,3.4] → [21.16,0.36,5.76,0.36,11.56]
     ÅA     # Pop and calculate the mean of that list
            #  i.e. [21.16,0.36,5.76,0.36,11.56] → 7.84
       t    # Take the square-root of that
            #  i.e. 7.84 → 2.8
        /   # And divide each value in the duplicated list with it (and output implicitly)
            #  i.e. [-4.6,-0.6,2.4,-0.6,3.4] and 2.8 → [-1.6428571428571428,
            #   -0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]
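
The stack behaviour described in the explanation can be mimicked with a plain Python list. This is only a rough model of the steps listed above (implicit input is passed in explicitly as `xs`), not an emulation of 05AB1E itself:

```python
def run(xs):
    stack = []
    stack.append(sum(xs) / len(xs))                 # ÅA : mean of the input
    mean = stack.pop()
    stack.append([x - mean for x in xs])            # -  : subtract it from each value
    stack.append(list(stack[-1]))                   # D  : duplicate that list
    stack.append([c * c for c in stack.pop()])      # n  : square each element
    squares = stack.pop()
    stack.append(sum(squares) / len(squares))       # ÅA : mean of the squares
    stack.append(stack.pop() ** 0.5)                # t  : square root
    sigma = stack.pop()
    centered = stack.pop()
    stack.append([c / sigma for c in centered])     # /  : divide the duplicated list
    return stack.pop()

print(run([-3, 1, 4, 1, 5]))
```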


0

Pyth, 21 19 bytes

mc-dJ.OQ@.Om^-Jk2Q2

Try it online here.

mc-dJ.OQ@.Om^-Jk2Q2Q   Implicit: Q=eval(input())
                       Trailing Q inferred
    J.OQ               Take the average of Q, store the result in J
           m     Q     Map the elements of Q, as k, using:
             -Jk         Difference between J and k
            ^   2        Square it
         .O            Find the average of the result of the map
        @         2    Square root it
                       - this is the standard deviation of Q
m                  Q   Map elements of Q, as d, using:
  -dJ                    d - J
 c                       Float division by the standard deviation
                       Implicit print result of map

Edit: after seeing Kevin's answer, changed to use the average builtin for the inner results. Previous answer: mc-dJ.OQ@csm^-Jk2QlQ2


0

SNOBOL4 (CSNOBOL4), 229 bytes

	DEFINE('Z(A)')
Z	X =X + 1
	M =M + A<X>	:S(Z)
	N =X - 1.
	M =M / N
D	X =GT(X) X - 1	:F(S)
	A<X> =A<X> - M	:(D)
S	X =LT(X,N) X + 1	:F(Y)
	S =S + A<X> ^ 2 / N	:(S)
Y	S =S ^ 0.5
N	A<X> =A<X> / S
	X =GT(X) X - 1	:S(N)
	Z =A	:(RETURN)

Try it online!

Link is to a functional version of the code which constructs an array from STDIN given its length and then its elements, then runs the function Z on that, and finally prints out the values.

Defines a function Z which returns an array.

The 1. on line 4 is necessary to do the floating point arithmetic properly.



0

Charcoal, 25 19 bytes

≧⁻∕ΣθLθθI∕θ₂∕ΣXθ²Lθ

Try it online! Link is to verbose version of code. Explanation:

       θ    Input array
≧           Update each element
 ⁻          Subtract
   Σ        Sum of
    θ       Input array
  ∕         Divided by
     L      Length of
      θ     Input array

Calculate μ and vectorised subtract it from each xi.

  θ         Updated array
 ∕          Vectorised divided by
   ₂        Square root of
     Σ      Sum of
       θ    Updated array
      X     Vectorised to power
        ²   Literal 2
    ∕       Divided by
         L  Length of
          θ Array
I           Cast to string
            Implicitly print each element on its own line.

Calculate σ, vectorised divide each xi by it, and output the result.

Edit: Saved 6 bytes thanks to @ASCII-only for a) using SquareRoot() instead of Power(0.5) b) fixing vectorised Divide() (it was doing IntDivide() instead) c) making Power() vectorise.


crossed out 25 = no bytes? :P (Also, you haven't updated the TIO link yet)
ASCII-only

@ASCII-only Oops, thanks!
Neil
Licensed under cc by-sa 3.0 with attribution required.