Print the word containing a string and the first word

I want to find a string in a line of text and print the string (between spaces) and the first word of the phrase.

For example:

"This is a single text line"
"Another thing"
"It is better you try again"
"Better"

The list of strings is:

text
thing
try
Better

What I am trying to get is a table like this:

This [tab] text
Another [tab] thing
It [tab] try
Better

I tried with grep, but nothing happened. Any suggestions?

command-line text-processing regex

— Felipe Lira

So basically, "if a line contains the string, print the first word + the string". Right?

— Sergiy Kolodyazhnyy

Bash/grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" | 
        while read -r line
    do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"

        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi

        echo "$string"
    done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better

— wjandrea

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save it as first-plus-word and run it like this:

perl first-plus-word file.txt text thing try Better

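This builds a regex from the input words. Each line is then matched against the regex; if there is a match, the first word is printed, and if the matched word differs from the first word, the matched word is printed as well.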
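For comparison, here is a rough Python translation of that logic. It is a sketch that works on a list of lines rather than a file, with `re.escape` standing in for Perl's `quotemeta`; the function names `build_regex` and `first_plus_word` are made up for this illustration and are not part of the answer.

```python
import re


def build_regex(words):
    # Escape each word (the job quotemeta does in Perl) and join them into
    # a single alternation wrapped in word boundaries: \b(w1|w2|...)\b
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(" + alternation + r")\b")


def first_plus_word(lines, words):
    regex = build_regex(words)
    rows = []
    for line in lines:
        match = regex.search(line)
        if not match:
            continue  # lines without any of the words are skipped
        first = line.split()[0]  # first whitespace-separated word
        matched = match.group(1)
        # Add the matched word only when it differs from the first word
        rows.append(first if matched == first else first + "\t" + matched)
    return rows


if __name__ == "__main__":
    sample = [
        "This is a single text line",
        "Another thing",
        "It is better you try again",
        "Better",
    ]
    for row in first_plus_word(sample, ["text", "thing", "try", "Better"]):
        print(row)
```

Because of the `\b` anchors, a search word hidden inside a longer word (for example `test` in `contest`) is not reported, just as in the Perl version.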

— choroba

Here's an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

Here file2 is the word list and file1 contains the phrases.
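The same two-pass idea (load the word list first, then scan the phrases field by field) can be sketched in Python as below. This is an illustrative analogue, not part of the answer: `first_word_table` is a hypothetical name, the inputs are lists of lines rather than file names, and stripping whitespace from the word list is an extra precaution of the sketch.

```python
def first_word_table(words, phrases):
    # First pass (awk's NR==FNR block): remember every search word in a set.
    wanted = {w.strip() for w in words if w.strip()}
    rows = []
    # Second pass: look at each phrase field by field.
    for line in phrases:
        fields = line.replace('"', "").split()  # like gsub(/"/, "", $0)
        for i, field in enumerate(fields):
            if field in wanted:
                # A match in field 1 is emitted once; otherwise first field, tab, match
                rows.append(field if i == 0 else fields[0] + "\t" + field)
    return rows


if __name__ == "__main__":
    phrases = ['"This is a single text line"', '"Another thing"',
               '"It is better you try again"', '"Better"']
    for row in first_word_table(["text", "thing", "try", "Better"], phrases):
        print(row)
```

As with the awk loop, a phrase that contains several listed words produces one row per matching field.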

— steeldriver

Nice one! For convenience, I put it in a script file: paste.ubuntu.com/23063130

— Sergiy Kolodyazhnyy

Here's a Python version:

#!/usr/bin/env python
from __future__ import print_function 
import sys

# List of strings that you want
# to search for in the file. Change it
# as necessary. Remember the commas.
strings = [
          'text', 'thing',
          'try', 'Better'
          ]


with open(sys.argv[1]) as input_file:
    for line in input_file:
        for string in strings:
            if string in line:
                words = line.strip().split()
                # Print the first word, then the string on the same line
                print(words[0], end="")
                if len(words) > 1:
                    print("\t", string)
                else:
                    print("")

Demo:

$> cat input_file.txt                                                          
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt                                      
This    text
Another     thing
It  try
Better

Note: this script is python3-compatible, so you can run it with either python2 or python3.

— Sergiy Kolodyazhnyy

Try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab in front of Better is a problem, try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

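This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. The word-and-space part is wrapped in (...)?, so it is optional; that is why a line containing only Better still matches. If a match is found, it is replaced with the first word of the line (if any), a tab, and the matched word.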
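To see what that substitution produces on its own, here is a Python approximation meant for experimentation. Python's `re` has no POSIX bracket classes, so `[[:alnum:]]` is replaced by the ASCII-only `[A-Za-z0-9]` and `[[:space:]]` by `\s` (an assumption that matters only for non-ASCII input). `re` also uses backtracking rather than POSIX leftmost-longest matching, which gives the same captures for these sample lines; `substitute` is a hypothetical helper name.

```python
import re

# Approximation of the sed regex: [[:alnum:]] -> [A-Za-z0-9], [[:space:]] -> \s
PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")


def substitute(line):
    # count=1 mirrors sed's s/// without the g flag. If group 2 did not take
    # part in the match, \2 expands to "" (Python 3.5+), leaving a leading tab.
    return PATTERN.sub(r"\2\t\3", line, count=1)


if __name__ == "__main__":
    for line in ["This is a single text line", "Another thing",
                 "It is better you try again", "Better"]:
        print(repr(substitute(line)))
```

The result for `Better` is `'\tBetter'`: group 2 did not participate, so it contributes an empty string, which is the stray leading tab visible in the first sed output.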
ta; b; :a; s/^\t//; p

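If the substitute command made a substitution, meaning that one of your words was found on the line, the ta command tells sed to jump to label a. Otherwise, b branches to the end of the script, so sed moves on to the next line without printing anything. :a defines the label a. So, if one of your words was found, we (a) perform the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.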
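The branch logic maps fairly directly onto Python's `Pattern.subn`, which returns the new string together with the number of substitutions made. The sketch below is an illustration only: `sed_like` is a hypothetical name, and the pattern approximates `[[:alnum:]]` with the ASCII-only `[A-Za-z0-9]` and `[[:space:]]` with `\s`. Lines with no substitution are skipped (what `ta; b` achieves), and a leading tab is stripped before the line is kept.

```python
import re

# Same regex as the sed command, with [[:alnum:]] approximated by the
# ASCII-only [A-Za-z0-9] and [[:space:]] by \s.
PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")


def sed_like(lines):
    rows = []
    for line in lines:
        # subn returns (new_string, number_of_substitutions)
        new, n_subs = PATTERN.subn(r"\2\t\3", line, count=1)
        if n_subs == 0:
            continue  # no substitution: `ta` does not jump and `b` skips the line
        if new.startswith("\t"):
            new = new[1:]  # s/^\t//: drop the tab left by an empty \2
        rows.append(new)  # p: print the line
    return rows


if __name__ == "__main__":
    sample = ["This is a single text line", "Another thing",
              "It is better you try again", "Better", "hello world"]
    print("\n".join(sed_like(sample)))
```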

— John1024

A simple bash/sed approach:

$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words 
This    text
Another thing
It  try
    Better

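The while read w; do ...; done < words iterates over each line of the file words and saves it as $w. The -n makes sed print nothing by default. The sed command then replaces a double quote followed by non-whitespace characters (\"(\S*); the parentheses "capture" what is matched by \S*, the first word, so we can refer to it later as \1), zero or more characters (.*), the word we're looking for ($w), and zero or more characters again (.*). If this matches, we replace it with just the first word, a tab and $w (\1\t$w), and print the line (that's what the p in s///p does).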
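A Python sketch of the same loop, for readers who want to experiment with the pattern. `first_word_per_search_word` is a hypothetical name; unlike the sed version, it escapes each search word with `re.escape`, so words containing regex metacharacters are matched literally. Like the sed command, it expects a double quote before the first word, as in the question's sample lines.

```python
import re


def first_word_per_search_word(words, lines):
    rows = []
    for w in words:  # like: while read w; do ...; done < words
        # A double quote, the first word captured as group 1, then anything,
        # the search word, and anything. re.escape makes w match literally.
        pattern = re.compile(r'"(\S*).*' + re.escape(w) + r".*")
        for line in lines:
            # Replacement is "\1<TAB>$w"; w=w binds the current search word.
            new, n_subs = pattern.subn(lambda m, w=w: m.group(1) + "\t" + w,
                                       line, count=1)
            if n_subs:  # like sed -n with s///p: keep only substituted lines
                rows.append(new)
    return rows


if __name__ == "__main__":
    lines = ['"This is a single text line"', '"Another thing"',
             '"It is better you try again"', '"Better"']
    for row in first_word_per_search_word(["text", "thing", "try", "Better"], lines):
        print(row)
```

With the quoted sample lines, the `Better` row comes out as `"\tBetter"`, matching the leading tab in the sed output: `(\S*)` can only match an empty string there, because the search word itself has to come after it.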

— terdon

Here's a Ruby version:

str_list = ['text', 'thing', 'try', 'Better']

File.open(ARGV[0]) do |f|
  f.each_line do |line|
    # Check every search string against every line
    # (not only the string at the same index as the line).
    str_list.each do |str|
      next unless line.include?(str)
      words = line.split(' ')
      if words.length == 1
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end

The sample text file hello.txt contains:

This is a single text line
Another thing
It is better you try again
Better

Running ruby source.rb hello.txt gives:

This    text
Another thing
It      try
Better

— Anwar