String in Character가 이모티콘인지 알아보세요?

Question 1

문자열의 문자가 이모티콘인지 확인해야합니다.

예를 들어 다음과 같은 캐릭터가 있습니다.

let string = "😀"
let character = Array(string)[0]

그 캐릭터가 이모티콘인지 알아 내야합니다.

Question 2

내가 우연히 발견 한 것은 문자, 유니 코드 스칼라 및 글리프의 차이입니다.

예를 들어 글리프 👨‍👨‍👧‍👧는 7 개의 유니 코드 스칼라로 구성됩니다.

4 개의 이모티콘 문자 : 👨👩👧👧
각 이모 지 사이에는 캐릭터 풀처럼 작동하는 특수 캐릭터가 있습니다. 자세한 내용 은 사양을 참조하십시오 .

또 다른 예, 글리프 👌🏿는 2 개의 유니 코드 스칼라로 구성됩니다.

일반 이모티콘 : 👌
피부톤 수정 자 : 🏿

마지막으로 글리프 1️⃣에는 세 개의 유니 코드 문자가 있습니다.

따라서 캐릭터를 렌더링 할 때 결과 글리프가 정말 중요합니다.

Swift 5.0 이상은이 과정을 훨씬 쉽게 만들어주고 우리가해야 할 추측을 없애줍니다. Unicode.Scalar의 새로운 Property유형은 우리가 다루는 것을 결정하는 데 도움이됩니다. 그러나 이러한 속성은 글리프 내의 다른 스칼라를 확인할 때만 의미가 있습니다. 이것이 우리를 돕기 위해 Character 클래스에 몇 가지 편리한 메서드를 추가하는 이유입니다.

자세한 내용 은 이것이 어떻게 작동하는지 설명하는 기사를 썼습니다 .

Swift 5.0의 경우 다음과 같은 결과가 나타납니다.

extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }

    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }

    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}

extension String {
    var isSingleEmoji: Bool { count == 1 && containsEmoji }

    var containsEmoji: Bool { contains { $0.isEmoji } }

    var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } }

    var emojiString: String { emojis.map { String($0) }.reduce("", +) }

    var emojis: [Character] { filter { $0.isEmoji } }

    var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }
}

다음과 같은 결과를 얻을 수 있습니다.

"A̛͚̖".containsEmoji // false
"3".containsEmoji // false
"A̛͚̖▶️".unicodeScalars // [65, 795, 858, 790, 9654, 65039]
"A̛͚̖▶️".emojiScalars // [9654, 65039]
"3️⃣".isSingleEmoji // true
"3️⃣".emojiScalars // [51, 65039, 8419]
"👌🏿".isSingleEmoji // true
"🙎🏼‍♂️".isSingleEmoji // true
"🇹🇩".isSingleEmoji // true
"⏰".isSingleEmoji // true
"🌶".isSingleEmoji // true
"👨‍👩‍👧‍👧".isSingleEmoji // true
"🏴󠁧󠁢󠁳󠁣󠁴󠁿".isSingleEmoji // true
"🏴󠁧󠁢󠁥󠁮󠁧󠁿".containsOnlyEmoji // true
"👨‍👩‍👧‍👧".containsOnlyEmoji // true
"Hello 👨‍👩‍👧‍👧".containsOnlyEmoji // false
"Hello 👨‍👩‍👧‍👧".containsEmoji // true
"👫 Héllo 👨‍👩‍👧‍👧".emojiString // "👫👨‍👩‍👧‍👧"
"👨‍👩‍👧‍👧".count // 1

"👫 Héllœ 👨‍👩‍👧‍👧".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis // ["👫", "👨‍👩‍👧‍👧"]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis.count // 2

"👫👨‍👩‍👧‍👧👨‍👨‍👦".isSingleEmoji // false
"👫👨‍👩‍👧‍👧👨‍👨‍👦".containsOnlyEmoji // true

이전 Swift 버전의 경우 이전 코드가 포함 된이 요점을 확인하십시오.

Question 3

이 작업을 수행 하는 가장 간단하고 깔끔하며 가장 빠른 방법은 다음과 같이 문자열의 각 문자에 대한 유니 코드 코드 포인트를 알려진 이모 지 및 딩뱃 범위와 비교하여 확인하는 것입니다.

extension String {

    var containsEmoji: Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x1F600...0x1F64F, // Emoticons
                 0x1F300...0x1F5FF, // Misc Symbols and Pictographs
                 0x1F680...0x1F6FF, // Transport and Map
                 0x2600...0x26FF,   // Misc symbols
                 0x2700...0x27BF,   // Dingbats
                 0xFE00...0xFE0F,   // Variation Selectors
                 0x1F900...0x1F9FF, // Supplemental Symbols and Pictographs
                 0x1F1E6...0x1F1FF: // Flags
                return true
            default:
                continue
            }
        }
        return false
    }

}

Question 4

extension String {
    func containsEmoji() -> Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x3030, 0x00AE, 0x00A9,// Special Characters
            0x1D000...0x1F77F,          // Emoticons
            0x2100...0x27BF,            // Misc symbols and Dingbats
            0xFE00...0xFE0F,            // Variation Selectors
            0x1F900...0x1F9FF:          // Supplemental Symbols and Pictographs
                return true
            default:
                continue
            }
        }
        return false
    }
}

이것은 업데이트 된 범위와 함께 내 수정입니다.

Question 5

스위프트 5.0

… 정확히 이것을 확인하는 새로운 방법을 도입했습니다!

당신은 String그것의 Scalars. 각 Scalar갖는다 Property지원 값isEmoji 가치를 가지고 있습니다!

실제로 Scalar가 Emoji modifier 이상인지 확인할 수도 있습니다. Apple의 문서를 확인하십시오 : https://developer.apple.com/documentation/swift/unicode/scalar/properties

Apple이 다음에 대해 다음과 같이 명시하고 있으므로 isEmojiPresentation대신에 확인하는 것이 좋습니다 .isEmojiisEmoji

이 속성은 기본적으로 이모티콘으로 렌더링되는 스칼라와 U + FE0F VARIATION SELECTOR-16이 뒤 따르는 경우 기본이 아닌 이모티콘 렌더링이있는 스칼라에 대해서도 true입니다. 여기에는 일반적으로 이모티콘으로 간주되지 않는 일부 스칼라가 포함됩니다.

이 방법은 실제로 Emoji를 모든 수정 자로 분할하지만 처리하기가 더 간단합니다. 그리고 Swift는 이제 수식어 (예 : 👨‍👩‍👧‍👦, 👨🏻‍💻, 🏴)가있는 이모티콘을 1로 간주하므로 모든 종류의 작업을 수행 할 수 있습니다.

var string = "🤓 test"

for scalar in string.unicodeScalars {
    let isEmoji = scalar.properties.isEmoji

    print("\(scalar.description) \(isEmoji)"))
}

// 🤓 true
//   false
// t false
// e false
// s false
// t false

NSHipster 는 모든 이모티콘을 얻는 흥미로운 방법을 지적합니다.

import Foundation

var emoji = CharacterSet()

for codePoint in 0x0000...0x1F0000 {
    guard let scalarValue = Unicode.Scalar(codePoint) else {
        continue
    }

    // Implemented in Swift 5 (SE-0221)
    // https://github.com/apple/swift-evolution/blob/master/proposals/0221-character-properties.md
    if scalarValue.properties.isEmoji {
        emoji.insert(scalarValue)
    }
}

Question 6

Swift 5를 사용하면 이제 문자열에있는 각 문자의 유니 코드 속성을 검사 할 수 있습니다. 이것은 우리 isEmoji에게 각 문자에 대한 편리한 변수를 제공 합니다. 문제는 isEmoji0-9와 같이 2 바이트 이모티콘으로 변환 할 수있는 모든 문자에 대해 true를 반환한다는 것입니다.

변수를보고 isEmoji모호한 문자가 이모 지로 표시되는지 확인하기 위해 이모 지 수정자가 있는지 확인할 수 있습니다.

이 솔루션은 여기에서 제공되는 정규식 솔루션보다 훨씬 더 미래를 보장해야합니다.

extension String {
    func containsOnlyEmojis() -> Bool {
        if count == 0 {
            return false
        }
        for character in self {
            if !character.isEmoji {
                return false
            }
        }
        return true
    }
    
    func containsEmoji() -> Bool {
        for character in self {
            if character.isEmoji {
                return true
            }
        }
        return false
    }
}

extension Character {
    // An emoji can either be a 2 byte unicode character or a normal UTF8 character with an emoji modifier
    // appended as is the case with 3️⃣. 0x238C is the first instance of UTF16 emoji that requires no modifier.
    // `isEmoji` will evaluate to true for any character that can be turned into an emoji by adding a modifier
    // such as the digit "3". To avoid this we confirm that any character below 0x238C has an emoji modifier attached
    var isEmoji: Bool {
        guard let scalar = unicodeScalars.first else { return false }
        return scalar.properties.isEmoji && (scalar.value > 0x238C || unicodeScalars.count > 1)
    }
}

우리에게

"hey".containsEmoji() //false

"Hello World 😎".containsEmoji() //true
"Hello World 😎".containsOnlyEmojis() //false

"3".containsEmoji() //false
"3️⃣".containsEmoji() //true

Question 7

Swift 3 참고 :

표시 cnui_containsEmojiCharacters방법 중 어느 하나를 제거하거나 다른 동적 라이브러리로 이동되었다. _containsEmoji그래도 작동합니다.

let str: NSString = "hello😊"

@objc protocol NSStringPrivate {
    func _containsEmoji() -> ObjCBool
}

let strPrivate = unsafeBitCast(str, to: NSStringPrivate.self)
strPrivate._containsEmoji() // true
str.value(forKey: "_containsEmoji") // 1


let swiftStr = "hello😊"
(swiftStr as AnyObject).value(forKey: "_containsEmoji") // 1

Swift 2.x :

최근 NSString에 문자열에 이모티콘 문자가 포함되어 있는지 감지하는 기능을 제공 하는 비공개 API를 발견했습니다 .

let str: NSString = "hello😊"

objc 프로토콜 및 unsafeBitCast:

@objc protocol NSStringPrivate {
    func cnui_containsEmojiCharacters() -> ObjCBool
    func _containsEmoji() -> ObjCBool
}

let strPrivate = unsafeBitCast(str, NSStringPrivate.self)
strPrivate.cnui_containsEmojiCharacters() // true
strPrivate._containsEmoji() // true

와 함께 valueForKey:

str.valueForKey("cnui_containsEmojiCharacters") // 1
str.valueForKey("_containsEmoji") // 1

순수한 Swift 문자열 AnyObject을 사용하면 valueForKey다음을 사용하기 전에 문자열을 캐스팅해야합니다 .

let str = "hello😊"

(str as AnyObject).valueForKey("cnui_containsEmojiCharacters") // 1
(str as AnyObject).valueForKey("_containsEmoji") // 1

NSString 헤더 파일 에있는 메소드 .

Question 8

이 코드 예제 또는이 포드를 사용할 수 있습니다 .

Swift에서 사용하려면 카테고리를 YourProject_Bridging_Header

#import "NSString+EMOEmoji.h"

그런 다음 문자열에있는 모든 이모티콘의 범위를 확인할 수 있습니다.

let example: NSString = "string👨‍👨‍👧‍👧with😍emojis✊🏿" //string with emojis

let containsEmoji: Bool = example.emo_containsEmoji()

    print(containsEmoji)

// Output: ["true"]

위의 코드로 작은 예제 프로젝트를 만들었습니다.

Question 9

미래 증명 : 캐릭터의 픽셀을 수동으로 확인합니다. 새 이모티콘이 추가되면 다른 솔루션이 중단되고 중단됩니다.

참고 : 이것은 Objective-C입니다 (Swift로 변환 가능).

수년에 걸쳐 이러한 이모티콘 감지 솔루션은 Apple이 새로운 방법 (추가 문자가있는 캐릭터를 미리 저주하여 만든 피부색 이모 지 등)이 포함 된 새로운 이모티콘을 추가함에 따라 계속해서 깨졌습니다.

나는 마침내 고장 났고 모든 현재 이모티콘에 대해 작동하고 미래의 모든 이모티콘에 대해 작동 해야하는 다음 방법을 작성했습니다.

이 솔루션은 문자와 검은 색 배경으로 UILabel을 만듭니다. 그런 다음 CG는 라벨의 스냅 샷을 찍고 스냅 샷의 모든 픽셀에서 검정색이 아닌 픽셀을 스캔합니다. 검은 색 배경을 추가하는 이유는 하위 픽셀 렌더링 으로 인한 잘못된 색상 문제를 피하기 위해서입니다.

이 솔루션은 내 장치에서 매우 빠르게 실행되며 초당 수백 개의 문자를 확인할 수 있지만 이것은 CoreGraphics 솔루션이며 일반 텍스트 방법으로 할 수있는 것처럼 많이 사용해서는 안됩니다. 그래픽 처리는 데이터가 많으므로 한 번에 수천 개의 문자를 확인하면 눈에 띄는 지연이 발생할 수 있습니다.

-(BOOL)isEmoji:(NSString *)character {
    
    UILabel *characterRender = [[UILabel alloc] initWithFrame:CGRectMake(0, 0, 1, 1)];
    characterRender.text = character;
    characterRender.font = [UIFont fontWithName:@"AppleColorEmoji" size:12.0f];//Note: Size 12 font is likely not crucial for this and the detector will probably still work at an even smaller font size, so if you needed to speed this checker up for serious performance you may test lowering this to a font size like 6.0
    characterRender.backgroundColor = [UIColor blackColor];//needed to remove subpixel rendering colors
    [characterRender sizeToFit];
    
    CGRect rect = [characterRender bounds];
    UIGraphicsBeginImageContextWithOptions(rect.size,YES,0.0f);
    CGContextRef contextSnap = UIGraphicsGetCurrentContext();
    [characterRender.layer renderInContext:contextSnap];
    UIImage *capturedImage = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    
    CGImageRef imageRef = [capturedImage CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;//Note: Alpha Channel not really needed, if you need to speed this up for serious performance you can refactor this pixel scanner to just RGB
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);
    
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);
    
    BOOL colorPixelFound = NO;
    
    int x = 0;
    int y = 0;
    while (y < height && !colorPixelFound) {
        while (x < width && !colorPixelFound) {
            
            NSUInteger byteIndex = (bytesPerRow * y) + x * bytesPerPixel;
            
            CGFloat red = (CGFloat)rawData[byteIndex];
            CGFloat green = (CGFloat)rawData[byteIndex+1];
            CGFloat blue = (CGFloat)rawData[byteIndex+2];
            
            CGFloat h, s, b, a;
            UIColor *c = [UIColor colorWithRed:red green:green blue:blue alpha:1.0f];
            [c getHue:&h saturation:&s brightness:&b alpha:&a];//Note: I wrote this method years ago, can't remember why I check HSB instead of just checking r,g,b==0; Upon further review this step might not be needed, but I haven't tested to confirm yet. 
            
            b /= 255.0f;
            
            if (b > 0) {
                colorPixelFound = YES;
            }
            
            x++;
        }
        x=0;
        y++;
    }
    
    return colorPixelFound;
    
}

Question 10

Swift 3.0.2의 경우 다음 답변이 가장 간단합니다.

class func stringContainsEmoji (string : NSString) -> Bool
{
    var returnValue: Bool = false

    string.enumerateSubstrings(in: NSMakeRange(0, (string as NSString).length), options: NSString.EnumerationOptions.byComposedCharacterSequences) { (substring, substringRange, enclosingRange, stop) -> () in

        let objCString:NSString = NSString(string:substring!)
        let hs: unichar = objCString.character(at: 0)
        if 0xd800 <= hs && hs <= 0xdbff
        {
            if objCString.length > 1
            {
                let ls: unichar = objCString.character(at: 1)
                let step1: Int = Int((hs - 0xd800) * 0x400)
                let step2: Int = Int(ls - 0xdc00)
                let uc: Int = Int(step1 + step2 + 0x10000)

                if 0x1d000 <= uc && uc <= 0x1f77f
                {
                    returnValue = true
                }
            }
        }
        else if objCString.length > 1
        {
            let ls: unichar = objCString.character(at: 1)
            if ls == 0x20e3
            {
                returnValue = true
            }
        }
        else
        {
            if 0x2100 <= hs && hs <= 0x27ff
            {
                returnValue = true
            }
            else if 0x2b05 <= hs && hs <= 0x2b07
            {
                returnValue = true
            }
            else if 0x2934 <= hs && hs <= 0x2935
            {
                returnValue = true
            }
            else if 0x3297 <= hs && hs <= 0x3299
            {
                returnValue = true
            }
            else if hs == 0xa9 || hs == 0xae || hs == 0x303d || hs == 0x3030 || hs == 0x2b55 || hs == 0x2b1c || hs == 0x2b1b || hs == 0x2b50
            {
                returnValue = true
            }
        }
    }

    return returnValue;
}

Question 11

이전에 쓴 것과 완전히 비슷한 대답이지만 업데이트 된 이모티콘 스칼라 세트가 있습니다.

extension String {
    func isContainEmoji() -> Bool {
        let isContain = unicodeScalars.first(where: { $0.isEmoji }) != nil
        return isContain
    }
}


extension UnicodeScalar {

    var isEmoji: Bool {
        switch value {
        case 0x1F600...0x1F64F,
             0x1F300...0x1F5FF,
             0x1F680...0x1F6FF,
             0x1F1E6...0x1F1FF,
             0x2600...0x26FF,
             0x2700...0x27BF,
             0xFE00...0xFE0F,
             0x1F900...0x1F9FF,
             65024...65039,
             8400...8447,
             9100...9300,
             127000...127600:
            return true
        default:
            return false
        }
    }

}

Question 12

다음 과 같이 NSString-RemoveEmoji를 사용할 수 있습니다 .

if string.isIncludingEmoji {

}

Question 13

언급 된 작업에 대한 좋은 솔루션 이 있습니다 . 그러나 유니 코드 스칼라의 Unicode.Scalar.Properties를 확인하는 것은 단일 문자에 적합합니다. 그리고 Strings에 대해 충분히 유연하지 않습니다.

대신 정규식을 사용할 수 있습니다 .보다 보편적 인 접근 방식입니다. 작동 방식에 대한 자세한 설명은 아래에 있습니다. 그리고 여기에 해결책이 있습니다.

해결책

Swift에서는 다음과 같은 계산 된 속성이있는 확장을 사용하여 String이 단일 Emoji 문자인지 여부를 확인할 수 있습니다.

extension String {

    var isSingleEmoji : Bool {
        if self.count == 1 {
            let emodjiGlyphPattern = "\\p{RI}{2}|(\\p{Emoji}(\\p{EMod}|\\x{FE0F}\\x{20E3}?|[\\x{E0020}-\\x{E007E}]+\\x{E007F})|[\\p{Emoji}&&\\p{Other_symbol}])(\\x{200D}(\\p{Emoji}(\\p{EMod}|\\x{FE0F}\\x{20E3}?|[\\x{E0020}-\\x{E007E}]+\\x{E007F})|[\\p{Emoji}&&\\p{Other_symbol}]))*"

            let fullRange = NSRange(location: 0, length: self.utf16.count)
            if let regex = try? NSRegularExpression(pattern: emodjiGlyphPattern, options: .caseInsensitive) {
                let regMatches = regex.matches(in: self, options: NSRegularExpression.MatchingOptions(), range: fullRange)
                if regMatches.count > 0 {
                    // if any range found — it means, that that single character is emoji
                    return true
                }
            }
        }
        return false
    }

}

작동 원리 (세부 사항)

단일 이모티콘 (글리프)은 여러 다른 기호, 시퀀스 및 그 조합으로 재현 할 수 있습니다. 유니 코드 사양 은 몇 가지 가능한 Emoji 문자 표현을 정의합니다.

단일 문자 이모티콘

단일 유니 코드 스칼라로 재현 된 이모티콘 문자입니다.

유니 코드는 이모티콘 문자를 다음과 같이 정의합니다.

emoji_character := \p{Emoji}

그러나 반드시 그러한 캐릭터가 이모티콘으로 그려지는 것을 의미하지는 않습니다. 일반 숫자 기호 "1"은 Emoji 속성이 true이지만 여전히 텍스트로 그려 질 수 있습니다. 그리고 이러한 기호 목록이 있습니다 : #, ©, 4 등.

"Emoji_Presentation"을 확인하기 위해 추가 속성을 사용할 수 있다고 생각해야합니다. 하지만 이렇게 작동하지 않습니다. Emoji_Presentation = false 속성이있는 🏟 또는 🛍와 같은 이모티콘이 있습니다.

캐릭터가 기본적으로 Emoji로 그려 지는지 확인하려면 카테고리를 확인해야합니다. "Other_symbol"이어야합니다.

따라서 실제로 단일 문자 이모티콘에 대한 정규 표현식은 다음과 같이 정의되어야합니다.

emoji_character := \p{Emoji}&&\p{Other_symbol}

이모티콘 표시 순서

일반적으로 텍스트 또는 그림 이모티콘으로 그릴 수있는 문자입니다. 모양은 프레젠테이션 유형을 나타내는 프레젠테이션 선택기 인 특수 다음 기호에 따라 달라집니다. \ x {FE0E}는 텍스트 표현을 정의합니다. \ x {FE0F}는 이모티콘 표현을 정의합니다.

이러한 기호 목록은 [여기] ( https://unicode.org/Public/emoji/12.1/emoji-variation-sequences.txt ) 에서 찾을 수 있습니다  .

유니 코드는 다음과 같이 표현 순서를 정의합니다.

emoji_presentation_sequence := emoji_character emoji_presentation_selector

그것에 대한 정규 표현식 시퀀스 :

emoji_presentation_sequence := \p{Emoji} \x{FE0F}

이모티콘 키캡 시퀀스

시퀀스는 프레젠테이션 시퀀스와 매우 비슷하지만 끝에 추가 스칼라가 있습니다. \ x {20E3}. 사용할 수있는 기본 스칼라의 범위는 다소 좁습니다 : 0-9 # *-그게 전부입니다. 예 : 1️⃣, 8️⃣, * ️⃣.

유니 코드는 다음과 같이 키캡 시퀀스를 정의합니다.

emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}

그것을위한 정규 표현식 :

emoji_keycap_sequence := \p{Emoji} \x{FE0F} \x{FE0F}

이모티콘 수정 자 시퀀스

일부 이모티콘은 피부색과 같이 모양이 변경 될 수 있습니다. 예를 들어 Emoji 🧑은 다를 수 있습니다 : 🧑🧑🏻🧑🏼🧑🏽🧑🏾🧑🏿. 이 경우 "Emoji_Modifier_Base"라고하는 Emoji를 정의하려면 후속 "Emoji_Modifier"를 사용할 수 있습니다.

일반적으로 이러한 순서는 다음과 같습니다.

emoji_modifier_sequence := emoji_modifier_base emoji_modifier

이를 감지하기 위해 정규 표현식 시퀀스를 검색 할 수 있습니다.

emoji_modifier_sequence := \p{Emoji} \p{EMod}

이모티콘 플래그 시퀀스

플래그는 특정 구조를 가진 이모티콘입니다. 각 플래그는 두 개의 "Regional_Indicator"기호로 표시됩니다.

유니 코드는 다음과 같이 정의합니다.

emoji_flag_sequence := regional_indicator regional_indicator

예를 들어 우크라이나 🇺🇦의 국기는 실제로 두 개의 스칼라로 표시됩니다. \ u {0001F1FA \ u {0001F1E6}

그것을위한 정규 표현식 :

emoji_flag_sequence := \p{RI}{2}

Emoji Tag Sequence (ETS)

소위 tag_base를 사용하는 시퀀스로 뒤에는 기호 범위 \ x {E0020}-\ x {E007E}로 구성되고 tag_end 마크 \ x {E007F}로 끝나는 사용자 정의 태그 사양이 따릅니다.

유니 코드는 다음과 같이 정의합니다.

emoji_tag_sequence := tag_base tag_spec tag_end
tag_base           := emoji_character
                    | emoji_modifier_sequence
                    | emoji_presentation_sequence
tag_spec           := [\x{E0020}-\x{E007E}]+
tag_end            := \x{E007F}

이상한 점은 유니 코드가 ED-14a의 emoji_modifier_sequence 또는 emoji_presentation_sequence를 기반으로 태그를 만들 수 있다는 것입니다 . 그러나 동시에 동일한 문서 에서 제공되는 정규식 에서는 단일 Emoji 문자만을 기반으로 시퀀스를 확인하는 것 같습니다.

유니 코드 12.1 이모 지 목록에는 이러한 이모지가 세 개만 정의되어 있습니다. 그들 모두는 영국 나라의 국기입니다 : 잉글랜드 🏴󠁧󠁢󠁥󠁮󠁧󠁿, 스코틀랜드 🏴󠁧󠁢󠁳󠁣󠁴󠁿와 웨일즈 🏴󠁧󠁢󠁷󠁬󠁳󠁿. 그리고 그들 모두는 단일 이모티콘 문자를 기반으로합니다. 따라서 이러한 시퀀스 만 확인하는 것이 좋습니다.

정규식 :

\p{Emoji} [\x{E0020}-\x{E007E}]+ \x{E007F}

Emoji Zero-Width Joiner Sequence (ZWJ 시퀀스)

너비가 0 인 조이너는 스칼라 \ x {200D}입니다. 그것의 도움으로 이미 그 자체로 이모 지인 여러 캐릭터를 새로운 캐릭터로 결합 할 수 있습니다.

예를 들어 "아버지, 아들, 딸이있는 가족"이모티콘 👨‍👧‍👦은 ZWJ 기호와 함께 붙인 아버지 👨, 딸 👧 및 아들 👦 이모티콘의 조합으로 재현됩니다.

단일 이모티콘 문자, 프레젠테이션 및 수정 자 시퀀스 인 요소를 함께 붙일 수 있습니다.

일반적으로 이러한 시퀀스에 대한 정규식은 다음과 같습니다.

emoji_zwj_sequence := emoji_zwj_element (\x{200d} emoji_zwj_element )+

그들 모두를위한 정규식

위에서 언급 한 모든 이모티콘 표현은 단일 정규 표현식으로 설명 할 수 있습니다.

\p{RI}{2}
| ( \p{Emoji} 
    ( \p{EMod} 
    | \x{FE0F}\x{20E3}? 
    | [\x{E0020}-\x{E007E}]+\x{E007F} 
    ) 
  |  [\p{Emoji}&&\p{Other_symbol}] 
  )
  ( \x{200D}
    ( \p{Emoji} 
      ( \p{EMod} 
      | \x{FE0F}\x{20E3}? 
      | [\x{E0020}-\x{E007E}]+\x{E007F} 
      ) 
    | [\p{Emoji}&&\p{Other_symbol}] 
    ) 
  )*

Question 14

나는 같은 문제가 있었고 결국 a String및 Character확장을 만들었습니다 .

코드는 실제로 모든 이모티콘 (공식 유니 코드 목록 v5.0)을 나열하기 때문에 게시하기에 너무 깁니다 CharacterSet. 여기에서 찾을 수 있습니다.

https://github.com/piterwilson/StringEmoji

상수

let emojiCharacterSet : CharacterSet

알려진 모든 이모지를 포함하는 문자 세트 (공식 Unicode List 5.0 http://unicode.org/emoji/charts-5.0/emoji-list.html에 설명 됨 )

끈

var isEmoji : Bool {get}

String인스턴스가 알려진 단일 이모티콘 문자를 나타내는 지 여부

print("".isEmoji) // false
print("😁".isEmoji) // true
print("😁😜".isEmoji) // false (String is not a single Emoji)

var containsEmoji : Bool {get}

String인스턴스에 알려진 이모티콘 문자가 포함되어 있는지 여부

print("".containsEmoji) // false
print("😁".containsEmoji) // true
print("😁😜".containsEmoji) // true

var unicodeName : String {get}

문자열 사본에 kCFStringTransformToUnicodeName- CFStringTransform를 적용합니다.

print("á".unicodeName) // \N{LATIN SMALL LETTER A WITH ACUTE}
print("😜".unicodeName) // "\N{FACE WITH STUCK-OUT TONGUE AND WINKING EYE}"

var niceUnicodeName : String {get}

리턴의 결과 kCFStringTransformToUnicodeName- CFStringTransform와 \N{접두사 및 }접미사 제거

print("á".unicodeName) // LATIN SMALL LETTER A WITH ACUTE
print("😜".unicodeName) // FACE WITH STUCK-OUT TONGUE AND WINKING EYE

캐릭터

var isEmoji : Bool {get}

Character인스턴스가 알려진 Emoji 문자를 나타내는 지 여부

print("".isEmoji) // false
print("😁".isEmoji) // true