by Sandra Henry-Stocker

Unix Dweeb

Counting individual characters on Linux

How-To

Oct 26, 2022 · 5 mins

Linux

If you need to count how many of each character is included in a file or phrase, there are some handy commands you can string together to accomplish this along with scripts and aliases that can make the job easy.

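Determining the overall size of a file is easy on the Linux command line: the ls -l command shows its size in bytes, which matches the number of characters in plain ASCII text, and wc -m counts the characters directly.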

On the other hand, if you want to get a count of how many times each character appears in your file, you’re going to need a considerably more complicated command or a script. This post covers several different options.
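
For the overall total, wc is usually more direct than reading the size column of ls -l. The following is a minimal sketch, assuming ASCII sample text; the function name total_chars is hypothetical and not part of the original commands.

```shell
# total_chars is a hypothetical helper name used only for illustration.
# wc -m counts characters (according to the current locale); wc -c counts bytes.
total_chars() {
    # cat "$@" reads the named files, or standard input when none are given.
    # $(( ... )) strips the leading padding that some wc implementations add.
    echo $(( $(cat "$@" | wc -m) ))
}

printf 'Hello, World!\n' | total_chars    # prints 14; the trailing newline is counted
```

For text that contains multibyte characters, wc -m and wc -c can report different numbers, which is the reason to prefer wc -m when it is characters you care about.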

Counting how many times each character appears in a file

To count how many times each character appears in a file, you need to string together a series of commands that splits the text into individual characters and sorts them before counting how many of each there are.

To do that, you can use a command like this one:

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | column
     24              58 c           112 i           132 o             7 T
    254               2 C             3 I             2 O            30 u
      1 '            50 d             4 j            29 p            23 v
     25 ,           163 e             5 k             1 P             9 w
     20 .             2 E            60 l             2 q             4 x
    142 a            21 f            48 m            90 r            36 y
      5 A            16 g             2 M             1 R             3 z
     23 b             1 G           117 n           147 s
      1 B            51 h             1 N           119 t

The sed command inserts a newline before every character, breaking the file into single-character lines. The sort command then groups identical characters together, uniq -c counts each group, and column arranges the result in multiple columns. Because the results are based on the file's content, only characters that appear in the file are listed.
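
The same split, sort and count steps can be sketched with fold in place of sed. This is an alternative, not the method used in this post, and char_freq is a hypothetical function name. It assumes ASCII input, since some fold implementations count bytes rather than characters.

```shell
# char_freq is a hypothetical name; it reads text on standard input.
char_freq() {
    fold -w1 |   # break each line into one-character lines
    sort |       # bring identical characters together
    uniq -c      # collapse each run into "count character"
}

printf 'banana\n' | char_freq
```

For the sample word this reports three a's, one b and two n's. Unlike the sed approach, fold does not add an empty line for every input line, so no separate count for line breaks appears in the output.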

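Notice that the sort command puts the output in alphanumeric order. The first two entries (24 and 254) appear to have no character beside them because they are the counts for empty lines and for spaces: the sed command starts every input line with a newline, which leaves one empty line for each line of the file, and a space is invisible in the listing.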
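
If the blank entries are confusing, one option is to label them after counting. The sketch below is an assumption layered on top of the pipeline, not part of the original commands; label_invisible and the <newline>, <space> and <tab> labels are hypothetical names. It expects the "count character" lines that uniq -c produces.

```shell
# label_invisible is a hypothetical filter for the output of uniq -c.
label_invisible() {
    awk '{
        count = $1
        sub(/^ *[0-9]+ ?/, "")          # strip the count that uniq -c prepends
        if ($0 == "")        ch = "<newline>"
        else if ($0 == " ")  ch = "<space>"
        else if ($0 == "\t") ch = "<tab>"
        else                 ch = $0
        print count, ch
    }'
}

printf 'a b\n' | sed 's/\(.\)/\n\1/g' | sort | uniq -c | label_invisible
```

The labeled output no longer lines up in neat single-character columns, so it is best used before the column command rather than after it.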

If you want to display the characters in frequency order instead, all you need to do is add a second sort command using the -g (general numeric) option.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -g | column
      1 '             2 O             9 w            30 u           117 n
      1 B             2 q            16 g            36 y           119 t
      1 G             3 I            20 .            48 m           132 o
      1 N             3 z            21 f            50 d           142 a
      1 P             4 j            23 b            51 h           147 s
      1 R             4 x            23 v            58 c           163 e
      2 C             5 A            24              60 l           254
      2 E             5 k            25 ,            90 r
      2 M             7 T            29 p           112 i

To reverse the listing to show the most frequently used characters first, add an r (reverse) option to that last sort command.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
     90 r            25 ,             5 k             2 E

The character at the top of the list is, as I assume you guessed, the space character. The second most often used character in the file is an “e”. No surprise there either. In addition, capital letters are listed last since they are not frequently used.
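
When only the most common characters matter, the reversed listing can be cut short with head. A minimal sketch, assuming GNU sed; top_chars is a hypothetical function name. It uses sort -rn, which should be sufficient here because the counts are plain integers.

```shell
# top_chars is a hypothetical helper: top_chars N [file...]
top_chars() {
    local n=$1
    shift
    cat "$@" |
        sed 's/\(.\)/\n\1/g' |   # one character per line (GNU sed syntax)
        sort | uniq -c |         # count each distinct character
        sort -rn |               # highest counts first
        head -n "$n"             # keep only the top N entries
}

printf 'aaabbc\n' | top_chars 2
```

For the sample string this lists a (3) and then b (2). Characters with equal counts can appear in either order, so the cutoff is somewhat arbitrary when there are ties at the boundary.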

Note that if you don’t want to distinguish between uppercase and lowercase letters you can insert a tr (translate) command into the command string like this:

$ cat myfile | sed 's/\(.\)/\n\1/g' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -gr | column
    254             115 i            36 y            21 f             3 z
    165 e            91 r            30 u            20 .             2 q
    147 s            60 l            30 p            17 g             1 '
    147 a            60 c            25 ,             9 w
    134 o            51 h            24 b             5 k
    126 t            50 m            24               4 x
    118 n            50 d            23 v             4 j

Switch the positions of the “upper” and “lower” arguments to display the results all in uppercase.

Counting character-by-character in a word or phrase

You can also use a command similar to those shown above to count how many times each letter appears in a single word or phrase. Here’s an example:

$ echo "Hello, World!" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !

Using an alias

While the commands shown above are clever, they’re not easy to remember or type. Creating an alias can help with this. Once you decide what form of output you prefer, turn the command into an alias like this:

$ alias CountChars="sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column"

Save the alias in your .bashrc file so that you can use it as needed. Then use it in commands like these:

$ cat myfile | CountChars
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
     90 r            25 ,             5 k             2 E
$ echo "Hello, World!" | CountChars
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !
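
A shell function is another way to package the same pipeline, and it can accept file names directly instead of relying on cat. This is a sketch of that alternative, assuming GNU sed; count_chars is a hypothetical name, and the column step is left out so the output can be piped onward.

```shell
# count_chars is a hypothetical function name; it could live in ~/.bashrc
# alongside (or instead of) an alias.
count_chars() {
    # cat "$@" reads the named files, or standard input when none are given
    cat "$@" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -rn
}

echo "Hello, World!" | count_chars
```

With a function like this, count_chars myfile | column and echo "Hello, World!" | count_chars | column would both work.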

Using a script

If you want to see only alphabetic characters, you can use a script like the one shown below. It first changes all the letters to lowercase, then runs through the alphabet, uses awk to count the number of times each letter appears and displays only the counts that are greater than zero. It works only with the string provided as its argument.

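#!/bin/bash

# make argument all lowercase
string=$(echo "$1" | tr '[:upper:]' '[:lower:]')

for char in {a..z}
do
    count=$(awk -F"${char}" '{print NF-1}' <<< "$string")
    # remainder reconstructed (source is cut off here): print nonzero counts
    if [ "$count" -gt 0 ]; then
        echo "$char:$count"
    fi
done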

Run it like this:

$ CountByChar "Hello, World!"
d:1
e:1
h:1
l:3
o:2
r:1
w:1
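
The counting step relies on awk's field splitting: when a letter is used as the field separator, a line that contains it k times splits into k+1 fields, so NF-1 is the number of occurrences. The sketch below isolates that idea; count_one is a hypothetical helper name, and it assumes the letter is a single plain character rather than a regular expression metacharacter.

```shell
# count_one is a hypothetical helper: count_one LETTER STRING
count_one() {
    # With LETTER as the field separator, NF is one more than the
    # number of times LETTER appears on the line.
    printf '%s\n' "$2" | awk -F"$1" '{ print NF - 1 }'
}

count_one l "hello, world!"    # prints 3
count_one z "hello, world!"    # prints 0
```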

Note that characters will always be listed in alphabetical order. You can pipe the output to the column command if you want fewer lines of output.

$ CountByChar "Hello, World!" | column
d:1     e:1     h:1     l:3     o:2     r:1     w:1
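
Because the script handles only a string argument, a pipeline may be handier when the letters are in a file. The following sketch is an alternative to the script, not a replacement taken from it; letter_freq is a hypothetical name, and ASCII input is assumed because of how tr and fold treat multibyte characters.

```shell
# letter_freq is a hypothetical name: letter_freq [file...]
letter_freq() {
    cat "$@" |
        tr '[:upper:]' '[:lower:]' |   # fold case, as the script does
        tr -cd '[:lower:]' |           # delete everything except letters
        fold -w1 |                     # one letter per line
        sort | uniq -c |               # count each letter
        awk '{ print $2 ":" $1 }'      # same letter:count format as the script
}

echo "Hello, World!" | letter_freq
```

For the sample phrase this produces the same seven lines as the script, from d:1 through w:1, and it accepts file names as arguments as well as piped input.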

Wrap-up

Whether you’re looking for character counts in files or phrases, there are some handy options available. Turning the complex ones into aliases is probably the best way to make the task easy.

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
