Wednesday, 17 June 2015

grep

Remove files without match using bash grep:
------------------------------------------------------------------

I have already discussed in one of my earlier posts how we can find files that do not contain a particular pattern (string).

As I mentioned in that particular post:

"Well, when I first heard of this requirement, I thought I would just use "grep -vl pattern" piped with xargs to a regular find command (i.e. "find . -type f | xargs grep -vl pattern").
(-l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.)

Then I realized that "grep -v" works only at the line level: a file that contains the "pattern" on some line(s) may also contain line(s) that do not, so as a whole that file will also land in the list of files which "do not" contain the "pattern", even though it does contain it."

From GREP(1) man pages:

-L, --files-without-match
Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.

-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.

So, to remove or delete the files which do not contain the pattern or string "hello":

$ find . -type f -exec grep -L "hello" {} \; | xargs rm

or


$ find . -type f \! -exec grep -q "hello" {} \; -print | xargs rm
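Note that piping filenames through plain xargs breaks when names contain spaces or newlines. A sketch of a whitespace-safe variant, assuming GNU find/xargs (the files and directory below are hypothetical, just for the demo):

```shell
# Demo setup in a scratch directory:
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/keep me.txt"
printf 'goodbye\n'     > "$tmp/drop me.txt"

# Delete files not containing "hello": -print0 emits NUL-terminated
# names, -0 reads them back intact, and -r skips running rm entirely
# when no file qualifies (GNU extensions).
find "$tmp" -type f \! -exec grep -q "hello" {} \; -print0 | xargs -0 -r rm --
```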



screen

Beginning of line in UNIX screen session:
-----------------------------------------------------------

As you know, every "screen" command begins with "Ctrl-a". So how do you move to the beginning of the line when working inside a "screen" UNIX session (which we achieve with "Ctrl-a" in a normal UNIX session)?

Solution:
Under a "screen" UNIX session you can do "Ctrl-a a" to move to the beginning of the line. 


So, to print ^A (hex: \x01) inside a UNIX "screen" session, the usual "Ctrl-v Ctrl-a" will not work; type "Ctrl-v Ctrl-a a" to print ^A.


Thursday, 4 June 2015

sed command examples

Insert, append, change lines using sed :
--------------------------------------------------------

Input file:
$ cat order789.txt
title:
PO for vessel unit1 const.
items:
fan:F34539
tube:L1245
driller:M4545
Description:
PO signed and verified by factory manager S K Lp
Date: Fri Jan 2 17:26:44 UTC 2009
Author: M Kumar

# Add the line "heater:M21789" after the line "items"
$ sed '
/items/ a\
heater:M21789
' order789.txt

Output:
title:
PO for vessel unit1 const.
items:
heater:M21789
fan:F34539
tube:L1245
driller:M4545
Description:
PO signed and verified by factory manager S K Lp
Date: Fri Jan 2 17:26:44 UTC 2009
Author: M Kumar

# Add the line "heater:M21789" after line number 3 (output would be same as above)
$ sed '
3 a\
heater:M21789
' order789.txt

# Insert a line "heater:M21789" before the line beginning with "fan"
$ sed '
/^fan/ i\
heater:M21789
' order789.txt

Output:
title:
PO for vessel unit1 const.
items:
heater:M21789
fan:F34539
tube:L1245
driller:M4545
Description:
PO signed and verified by factory manager S K Lp
Date: Fri Jan 2 17:26:44 UTC 2009
Author: M Kumar

# We can also insert or append more than one line at a time.
$ sed '
4 i\
heater:M21789\
newitem:YYYYY
' order789.txt

Output:
title:
PO for vessel unit1 const.
items:
heater:M21789
newitem:YYYYY
fan:F34539
tube:L1245
driller:M4545
Description:
PO signed and verified by factory manager S K Lp
Date: Fri Jan 2 17:26:44 UTC 2009
Author: M Kumar


# One can change a line as well, e.g. to replace the line beginning with "fan" with the line "BigFAN:F5757"
$ sed '
/^fan/ c\
BigFAN:F5757
' order789.txt

Output:
title:
PO for vessel unit1 const.
items:
BigFAN:F5757
tube:L1245
driller:M4545
Description:
PO signed and verified by factory manager S K Lp
Date: Fri Jan 2 17:26:44 UTC 2009
Author: M Kumar
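With GNU sed, the appended/inserted/changed text can also go on the same line as the a/i/c command, which is handy for one-liners (this is a GNU extension; POSIX sed requires the backslash-newline form shown above). A small demo on inline input:

```shell
# GNU sed one-liner form of 'a' (append after the matching line):
printf 'items:\nfan:F34539\n' | sed '/items/a heater:M21789'
# items:
# heater:M21789
# fan:F34539

# And the one-liner form of 'c' (change the matching line):
printf 'fan:F34539\n' | sed '/^fan/c BigFAN:F5757'
# BigFAN:F5757
```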


Embedding shell command in sed - bash:
-----------------------------------------------------------

Input file:

$ cat file.txt
port:9903
os-version:VERSION
codename:hardy
status:active

The content of '/etc/lsb-release' file on my ubuntu desktop:

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04.3 LTS"

Let's extract the 'DISTRIB_RELEASE' version from the above file:

$ awk -F '=' '/DISTRIB_RELEASE/ {print $2}' /etc/lsb-release
8.04

Required: Replace the text 'VERSION' in the input file 'file.txt' with the output of the above command (i.e. 'DISTRIB_RELEASE' version)

First way of doing this:

$ myvar=$(awk '/DISTRIB_RELEASE/ {print $2}' FS=\= /etc/lsb-release)
$ sed "s/VERSION/$myvar/g" file.txt

Output

port:9903
os-version:8.04
codename:hardy
status:active

** Note that we have used double quotes (instead of the regular single quotes) in the above sed statement. Single quotes would prevent the content of the variable 'myvar' from being expanded.

Other ways :

$ sed "s/VERSION/`awk -F '=' '/DISTRIB_RELEASE/ {print $2}' /etc/lsb-release`/g" file.txt

Which is same as:

$ sed "s/VERSION/$(awk -F '=' '/DISTRIB_RELEASE/ {print $2}' /etc/lsb-release)/g" file.txt

And another way of quoting (this allows you to keep single quotes around the sed expression):

$ sed 's/VERSION/'"$(awk -F '=' '/DISTRIB_RELEASE/ {print $2}' /etc/lsb-release)"'/g' file.txt

same as

$ sed 's/VERSION/'"`awk -F '=' '/DISTRIB_RELEASE/ {print $2}' /etc/lsb-release`"'/g' file.txt
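One caveat with all of these: if the command output itself contains a '/', the s/.../.../ expression breaks. sed accepts almost any character as the delimiter after s, so a sketch using '|' instead (the value below is hypothetical, just to show the clash):

```shell
# Hypothetical replacement value containing a slash; using '|' as the
# s-command delimiter avoids escaping it.
myvar="8.04/LTS"
printf 'os-version:VERSION\n' | sed "s|VERSION|$myvar|g"
# os-version:8.04/LTS
```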


Extract range of lines using sed awk bash:
-------------------------------------------------------------

Below are a few different ways to print or extract a section of a file based on line numbers.

Let's try to extract lines 27 through 99 (inclusive) of the input file 'file.txt'.

Using sed editor:

$ sed -n '27,99 p' file.txt > /tmp/file1

Which is same as:

$ sed '27,99 !d' file.txt > /tmp/file2

Awk alternative: you can make use of awk's NR variable

$ awk 'NR >= 27 && NR <= 99' file.txt > /tmp/file3

Using Linux/UNIX 'head' and 'tail' command:

$ head -99 file.txt | tail -73 > /tmp/file4

Which is basically:

$ head -99 file.txt | tail -$(((99-27)+1)) > /tmp/file5

In vi editor, we can use the following command in ex mode (open the main file 'file.txt' in vi):

:27,99 w! /tmp/file6

i.e. Write lines between line number 27 and line number 99 of main file 'file.txt' to file '/tmp/file6'

Perl alternative would be:

$ perl -ne 'print if 27..99' file.txt > /tmp/file7

And the solution using python:

$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2

>>> fp = open("/tmp/file8","w")
>>> for i,line in enumerate(open("file.txt")):
...     if i >= 26 and i < 99 :
...             fp.write(line)
...
>>> fp.close()
>>>

So the contents of all the output files produced (i.e. /tmp/file[1-8]) will be the same: lines 27 through 99 of 'file.txt'.


Sed to replace a part of file : bash scripting :
-----------------------------------------------------------

$ cat idfile
id='1' dsadsad adsad
id='31' dsadsad adsad 32432
id='231' dsadsad adsad 3234
id='123' 2332 dsadsad adsad
id='124' 2332


Output required: 
Subtract 1 from the id value on each line of the above file, i.e. the final output should look like this:

id='0' dsadsad adsad
id='30' dsadsad adsad 32432
id='230' dsadsad adsad 3234
id='122' 2332 dsadsad adsad
id='123' 2332

The script:

$ cat idfile | while read line
> do
> R=`echo $line | sed "s/id='\([0-9].*\)'.*/\1/"`
> ((R-=1))
> echo $line | sed "s/\(id='\)\([0-9].*\)\('.*\)/\1$R\3/"
> done

Note:
If
((R-=1))
complains (e.g. under a strictly POSIX shell), then use
R=`expr $R - 1`


Some references:

$ echo "id='1' dsadsad adsad" | sed "s/\(id='\)\([0-9].*\)\('.*\)/\1/"
id='

$ echo "id='1' dsadsad adsad" | sed "s/\(id='\)\([0-9].*\)\('.*\)/\2/"
1

$ echo "id='1' dsadsad adsad" | sed "s/\(id='\)\([0-9].*\)\('.*\)/\3/"
' dsadsad adsad
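An awk alternative is arguably simpler (a sketch: the single quote is used as the field separator, so $2 is the numeric id):

```shell
# Split each line on the single-quote character: field 2 is the id.
# Decrement it; reassigning $2 makes awk rebuild the line using OFS,
# and the bare '1' prints it.
printf "id='1' dsadsad adsad\nid='124' 2332\n" |
  awk -F"'" -v OFS="'" '{ $2 = $2 - 1 } 1'
# id='0' dsadsad adsad
# id='123' 2332
```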


Merge previous line using sed :
----------------------------------------------

Input file:

$ cat file.txt
1 AA 2
2 BB 3
3 DD 4
5 CC 12
7 ZZ 12

Required output:
If I search for the line whose first field is "3", that line should be merged with the line previous to it,
i.e.

1 AA 2
2 BB 3 3 DD 4
5 CC 12
7 ZZ 12
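The post stops short of the command itself; one possible sed sketch for this (it joins any line whose first field is 3 onto its predecessor):

```shell
# Keep two lines in the pattern space at a time ($!N). If the second
# one starts with "3 ", replace the newline between them with a space;
# P prints up to the first newline, D shifts and restarts the cycle.
printf '1 AA 2\n2 BB 3\n3 DD 4\n5 CC 12\n7 ZZ 12\n' |
  sed '$!N;/\n3 /s/\n/ /;P;D'
# 1 AA 2
# 2 BB 3 3 DD 4
# 5 CC 12
# 7 ZZ 12
```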


Find the character immediately after a word - sed:
------------------------------------------------------------------------

$ echo "unix:bash/scripting:12"
unix:bash/scripting:12


Purpose: find the character after the word "bash"


$ echo "unix:bash/scripting:12" | sed 's/.*bash\(.\).*/\1/'
/


Delete one or more space using sed- bash:
------------------------------------------------------------

Input file: 
$ cat test.txt
one two three       four
1   2 3  4
i     ii   iii           iv

Sed code to delete/remove extra whitespace from a file, i.e. squeeze each run of one or more spaces/tabs into a single space:

$ sed 's/[ \t]\+/ /g' test.txt > test.txt.1

$ cat test.txt.1
one two three four
1 2 3 4
i ii iii iv
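The same squeeze can also be done with tr; a sketch using the [:blank:] class, which covers both space and tab:

```shell
# Translate every blank (space or tab) to a space, and -s squeezes
# repeated spaces in the result down to one.
printf 'one two three\t\tfour\n1   2 3  4\n' | tr -s '[:blank:]' ' '
# one two three four
# 1 2 3 4
```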


Remove or replace newlines using sed,awk,tr - BASH:
----------------------------------------------------------------------------

$ cat week.txt
Su
Mo
Tu
We
Th
Fr
Sa


Output Required:

- a) Remove the newlines. i.e. required output:
SuMoTuWeThFrSa

- b) Replace the newlines with "|", i.e.
Su|Mo|Tu|We|Th|Fr|Sa


Remove/Replace newlines with sed

a)
$ sed -e :a -e '$!N;s/\n//;ta' week.txt
SuMoTuWeThFrSa

b)
$ sed -e :a -e '$!N;s/\n/|/;ta' week.txt
Su|Mo|Tu|We|Th|Fr|Sa


One more way of doing it, though not suitable for files with a large number of lines: as you can see, the number of N's required is one less than the number of lines in the file.


a)
$ sed 'N;N;N;N;N;N;s/\n//g' week.txt
SuMoTuWeThFrSa

b)
$ sed 'N;N;N;N;N;N;s/\n/|/g' week.txt
Su|Mo|Tu|We|Th|Fr|Sa


Remove/Replace newlines with awk

a)
$ awk '{printf "%s",$0} END {print ""}' week.txt
SuMoTuWeThFrSa

b)
$ awk '{printf "%s|",$0} END {print ""}' week.txt
Su|Mo|Tu|We|Th|Fr|Sa|

So we need to remove the last "|" in the above output.

$ awk '{printf "%s|",$0} END {print ""}' week.txt | awk '{sub(/\|$/,"");print}'
Su|Mo|Tu|We|Th|Fr|Sa


Remove/Replace newlines with tr

a)
$ tr -d '\n' < week.txt
SuMoTuWeThFrSa

b)
$ tr '\n' '|' < week.txt
Su|Mo|Tu|We|Th|Fr|Sa|

Similarly we need to remove the last "|" from the above output:
$ tr '\n' '|' < week.txt | sed 's/|$//'
Su|Mo|Tu|We|Th|Fr|Sa
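paste gives arguably the shortest solution for case (b): -s serializes all input lines into one, and -d sets the join character, with no trailing delimiter to clean up:

```shell
# Join all lines with '|'; unlike the tr version, no trailing '|'.
printf 'Su\nMo\nTu\nWe\nTh\nFr\nSa\n' | paste -s -d'|'
# Su|Mo|Tu|We|Th|Fr|Sa
```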

Tuesday, 2 June 2015

split

Split Command:
---------------------------

-d : use numeric suffixes in the output file names
-a : number of characters in the suffix (here 6)
-5100 : historical short form of '-l 5100', i.e. 5100 lines per output file
The trailing '0' is the prefix for the output file names.

split -5100 -d -a6  187557-post1.13134_12.1367607900 0

output:
-rw-r--r-- 1 root root 5.1M Sep  1 07:01 0000003
-rw-r--r-- 1 root root 4.7M Sep  1 07:01 0000002
-rw-r--r-- 1 root root 4.8M Sep  1 07:01 0000001

Monday, 1 June 2015

xargs

Unix xargs parallel execution of commands:
--------------------------------------------------------------

xargs has an option that lets you take advantage of multiple cores in your machine: -P, which allows xargs to invoke the specified command multiple times in parallel. From the XARGS(1) man page:
-P max-procs
   Run up to max-procs processes at a time; the default is 1.  If max-procs is 0, xargs will run as many processes as possible at a time.   Use  the  -n  option
   with -P; otherwise chances are that only one exec will be done.

-n max-args
    Use at most max-args arguments per command line.  Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the  -x  
    option is given, in which case xargs will exit.

-i[replace-str]
    This option is a synonym for -Ireplace-str if replace-str is specified, and for -I{} otherwise.  This option is deprecated; use -I instead.
Let me give one example of making use of this parallel option available in xargs. I have these 8 log files (each 1.5G in size) for which I have to run a script named count_pipeline.sh, which does some calculation on the log lines in each file.
$ ls -1 *.out
log1.out
log2.out
log3.out
log4.out
log5.out
log6.out
log7.out
log8.out
The script count_pipeline.sh takes nearly 20 seconds for a single log file. e.g. 
$ time ./count_pipeline.sh log1.out

real 0m20.509s
user 0m20.967s
sys 0m0.467s
If we have to run count_pipeline.sh for each of the 8 log files one after the other, total time needed: 
$ time ls *.out | xargs -i ./count_pipeline.sh {}           

real 2m45.862s
user 2m48.152s
sys 0m5.358s
Running with 4 parallel processes at a time (I am having a machine which is having 4 CPU cores): 
$ time ls *.out | xargs -i -P4 ./count_pipeline.sh {} 

real 0m44.764s
user 2m55.020s
sys 0m6.224s
We saved time! Isn't this useful? You can also use the -n1 option instead of the -i option used above; -n1 passes one argument at a time to the command being run (instead of the xargs default of passing as many as fit).
$ time ls *.out | xargs -n1 -P4 ./count_pipeline.sh

real 0m43.229s
user 2m56.718s

sys 0m6.353s
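One robustness note: feeding filenames from ls breaks on names with spaces. A sketch of the same parallel run with NUL-separated names, assuming GNU find/xargs (the count_pipeline.sh below is a hypothetical stand-in that just counts lines, so the sketch is self-contained):

```shell
# Demo setup: a stand-in for count_pipeline.sh plus one log file.
tmp=$(mktemp -d); cd "$tmp"
printf '#!/bin/sh\nwc -l < "$1"\n' > count_pipeline.sh
chmod +x count_pipeline.sh
printf 'x\ny\n' > 'log 1.out'

# NUL-separated names survive spaces; -n1 -P4 runs up to four
# single-file invocations in parallel; -r does nothing on empty input.
find . -maxdepth 1 -name '*.out' -print0 | xargs -0 -r -n1 -P4 ./count_pipeline.sh
```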

Thursday, 28 May 2015

diff

Diff remote files using ssh in Linux:
-----------------------------------------------------

I have already discussed how we can edit a remote file using vi and scp in one of my previous posts; today we will see how we can find or show differences between a local file and a remote file using ssh.

Suppose we have to find the differences between local file "/tmp/filepurge.sh.old" and remote file "/root/scripts/filepurge.sh" located in remote host 172.21.16.11.

This is how we can do it:

$ ssh root@172.21.16.11 "cat ~/scripts/filepurge.sh" | diff - /tmp/filepurge.sh.old

And using vimdiff:

$ vimdiff scp://root@172.21.16.11//root/scripts/filepurge.sh /tmp/filepurge.sh.old


** We need ssh public key authentication in place so that remote command execution works without password prompts.

Wednesday, 20 May 2015

cat

Bash cat command space issue explained:
-----------------------------------------------------------

The input file contains 4 student names, like this:

$ cat file.txt
Alex C M
Peter S
Dhiren K
Prahlad G N

Required: I was trying to produce the following output:

1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]

i.e. a serial number, Name of the student, number of words in his name.
Let's try with a bash for loop like this:

$ cat lp1.sh
#!/bin/bash

c=0
for line in $(cat file.txt)
    do
        ((c+=1))
        numfields=$(echo $line | awk '{print NF}')
        echo "$c) $line [$numfields]"
done

And the output it produced !

$ ./lp1.sh
1) Alex [1]
2) C [1]
3) M [1]
4) Peter [1]
5) S [1]
6) Dhiren [1]
7) K [1]
8) Prahlad [1]
9) G [1]
10) N [1]

So what went wrong?
I tried echo "$line" as well; same output.

In the above example, we need to take care of the Bash IFS environmental variable. From Bash man page:

IFS:
The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command.
The default value is
<space><tab><newline>.

And since the lines in the input file contain spaces, the above script behaves that way.
We can temporarily change the IFS in the shell script like this:

$ cat lp3.sh
#!/bin/bash

OLD_IFS=$IFS
IFS=$'\n'
c=0
for line in $(cat file.txt)
    do
        ((c+=1))
        numfields=$(echo $line | awk '{print NF}')
        echo "$c) $line [$numfields]"
done
IFS=$OLD_IFS

Output:

$ ./lp3.sh
1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]

A bash 'while' loop used in the below way also works without changing the IFS:

$ cat lp2.sh
#!/bin/bash

c=0
while read line
    do
        ((c+=1))
        numfields=$(echo $line | awk '{print NF}')
        echo "$c) $line [$numfields]"
done < "file.txt"

Output:

$ ./lp2.sh
1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]
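A further hardening note: plain `read line` still strips leading/trailing blanks and interprets backslashes. The conventional robust form is `while IFS= read -r line`. A sketch of lp2.sh rewritten that way (the temp file just stands in for file.txt so the sketch is self-contained):

```shell
#!/bin/bash
# Hardened variant of the while-read loop: IFS= preserves leading and
# trailing whitespace, -r keeps backslashes literal, "$line" is quoted.
tmpfile=$(mktemp)
printf 'Alex C M\nPeter S\nDhiren K\nPrahlad G N\n' > "$tmpfile"
c=0
while IFS= read -r line
do
    ((c+=1))
    numfields=$(echo "$line" | awk '{print NF}')
    echo "$c) $line [$numfields]"
done < "$tmpfile"
```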

Note: The above example is taken mainly to show the use of Bash IFS variable. Using awk, the above can be done easily like this:

$ awk '{print NR")",$0,"["NF"]"}' file.txt
$ awk '{++c}{print c")",$0,"["NF"]"}' file.txt