Home SysAdmin Commands & Shells Finding Prime Numbers

Finding Prime Numbers

January 7, 2024

258

Originally published January 7, 2018 @ 2:43 pm

I have no intention of competing with GIMPS. To me this is a fun scripting exercise that may produce useful results beyond this initial application. Our goal is to take a bunch of integers and pick out the prime numbers. There are a couple of simple ways to spot a prime number from shell. One is to use perl (or python, or java, etc.) with the awesome “/^1?$|^(11+?)\1+$/” regex (Abigail, 1998).

perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)+$/' [number]

Another way is to use the “factor” utility (part of the coreutils package) that decomposes integers into prime factors. You can wrap that command in a small script called “primefind”:

#!/bin/bash
if [ ! "" ] ; then exit 1 ; else n="" ; fi
if [ `BC_LINE_LENGTH=0 bc <<<"${n}" | factor | awk '{print NF}'` -eq 2 ] ; then echo "${n}" ; fi

Here are examples of using both methods to find all prime number between 1 and 2^4:

# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do echo $i | primefind $i; done
2
3
5
7
11
13

# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done
2
3
5
7
11
13

As far as performance is concerned, the second method is considerably (about three times) faster. The next step is to increase the range, but here we may run into our first unexpected result: a Perl bug with the regex iteration limit. The bug is known, but seems to re-appear in various Perl versions, including some of the latest:

for i in $(seq 67867333 67867967) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done

Complex regular subexpression recursion limit (32766) exceeded at -e line 1.

The Perl bug aside, this elegant regex approach will eventually grind to a halt as it tries every possible factorization as a pattern match. The regex makes n attempts to match a string of length n against each given pattern. And, as unary expansion, this gives us 2^(2n).

Consider the following shell script (geirha, 2008):

#!/bin/bash
n=$1
function sqrt() 
{
	local i=1 j=1 n=$1
	while (( i*i <= n ))
	do
		((j=i, i++))
	done
	echo "$j"
}

function isprime() 
{
    local i=3 n=$1 h=0
    (( n >= 2 )) || return
    (( n == 2 || n == 3 )) && return
    (( n % 2 == 0 )) && return 1
    h=$(( $(sqrt $n) + 1 ))
    while (( i < h ))
	do
        (( n % i == 0 )) && return 1
        (( i += 2 ))
    done
    return 0
}

if isprime $n ; then echo $n ; fi

While a bit more complex than the Perl regex, this approach is a lot faster for large integers. We can couple it with xargs to take advantage of multi-core CPUs:

time seq 67857333 67867967 | xargs -n1 -P`grep -c processor /proc/cpuinfo` /var/adm/bin/prime3.sh

...

real    0m20.389s
user    10m14.600s
sys     0m23.247s

Uncork your CPUs

To get the most out of your CPUs, you may want to disable automatic CPU frequency scaling (kondeman daemon):

for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor;
do [ -f $i ] || continue; echo -n performance > $i; done

Going back to the factor utility, here’s a much faster implementation:

time seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | xargs -n100 -P`grep -c processor /proc/cpuinfo` factor | awk '{gsub(":","")}; NF==2{print $1}{}'

...

real    0m0.238s
user    0m0.460s
sys     0m1.035s

You may get even better time using parallel and passing multiple arguments per line (option -N). The advantage of parallel over xargs in this case is the ability to distribute load across multiple compute nodes on the network.

seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | parallel --gnu --will-cite -j10% -P10% -N256 factor | awk '{gsub(":","")}; NF==2{print $1}{}'

With very large numbers, seq becomes useless:

seq `BC_LINE_LENGTH=0 bc <<<"2^64"` `BC_LINE_LENGTH=0 bc <<<"2^110"`|more

18446744073709551616
18446744073709551616
18446744073709551618
18446744073709551620
18446744073709551620
18446744073709551620
...

Brace expansion will not work either:

echo {1..10}
1 2 3 4 5 6 7 8 9 10

echo {18446744073709551644..18446744073709551656}
{18446744073709551644..18446744073709551656}

Shell arithmetic is out of the question as well (and, contrary to popular belief, the answer to life, the universe and everything is not 42, but it’s close.):

i=18446744073709551656
echo $i
18446744073709551656

((i++))
echo $i
41

These are all unfortunate consequences of the range limitation of a 64-bit signed long int. To a point, the answer is to stick with bc, until you hit this limit:

BC_LINE_LENGTH=0 bc <<<10^10^10
Runtime error (func=(main), adr=14): exponent too large in raise

The dc doesn’t work either:

dc <<< '10 10000000000^pq'
Runtime error: exponent too large in raise
1

And we can’t use Bash exponentiation:

var=$((10**10**10))
echo $var
0

Don’t be angry with Unix: considering that the observable universe consists of about 10¹¹ galaxies, 10²¹ stars, 10⁷⁸ atoms, and 10⁸⁸ photons, the 10^10^10 limit of Bash seems rather generous. Luckily, here is Hypercalc, either Perl or JavaScript version. The large-number limit of Hypercalc in Knuth up-arrow notation is 10↑↑(10¹⁰):

yum -y install perl-Time-HiRes ; mkdir -p /var/adm/bin ; wget -O /var/adm/bin/hypercalc.pl "http://mrob.com/pub/comp/hypercalc/hypercalc.txt" ; chmod 755 /var/adm/bin/hypercalc.pl ; ln -s /var/adm/bin/hypercalc.pl /usr/bin/hypercalc

But what to do with it is a subject for later.

According to Wikipedia, the largest known prime number is 2^74,207,281 − 1 found by GIMPS in January of 2016. This is still well within bc and factor limitations, so, theoretically at least, running the following command will eventually confirm this finding:

BC_LINE_LENGTH=0 bc -l <<< 2^74207281-1 | factor | awk '{gsub(":","")}; NF==2{print "prime"}{}'

However, how long would this take? Well, 2^74,207,281 − 1 contains 22,338,618 digits, which puts it between 10^10^7 and 10^10^8 in the following attempt to estimate approximate time required to run the command above:

for i in 1 2 3 4 5 6 7 8 9 ; do echo "------------------" ; echo "10^10^$i" ; time -p BC_LINE_LENGTH=0 bc -l <<< 10^10^$i | factor >/dev/null 2>&1 ; done

The time I got for each iteration on a 2.9GHz Xeon E5-2690 and 96GB of RAM is as follows:

x       seconds
10^10^1	0.01
10^10^2	0.01
10^10^3	0.01
10^10^4	0.03
10^10^5	1.11
10^10^6	101.66
10^10^7 yawn...
10^10^8 yawn...
10^10^9 yawn...

As you can see, I had no patience to get past 10^10^6. Still, based on those few data points and assuming exponential growth (a good assumption in this case), I got y = 1.151806818·10^-8 e^{3.815837901 x} , where x is 10^10^x and y is estimated time to execute “bc <<< 10^10^x | factor”. Using this equation, we can see about how long it would’ve taken my computer to confirm that 2^74,207,281 − 1 is in fact a prime number:

x       seconds
10^10^7	4589.45
10^10^8	208429.00
10^10^9	9465770.00

So, I would have needed somewhere between 1.3 hours to 2.4 days, but closer to hours. And that’s just to confirm a known prime number. As you may imagine, actually finding one in that range would require lots of CPUs or an equivalent amount of luck.

So what about the elusive 10^10^10 – how long would it take me to confirm a prime number in that range, when one is inevitably found? Well, the Wolfram Alpha I’ve been using to solve the above equation also fails at that point and probably for the very same reason. The aforementioned Hypercalc, however, has no such problem:

hypercalc "1.151806818*10^(-8)*e^(3.815837901*10)/60/60/24/365"
13.631588907193

So, just over thirteen-and-a-half years.

Igor

Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.

Symbol	USD	% 1h	% 24h	% 7d
BTC	37,157	0.55	2.50	7.72
ETH	1,716.5	0.31	3.66	4.71
USDT	0.9995	0.01	0.02	0.09
BNB	576.50	0.19	1.61	14.22
USDC	0.9997	0.01	0.02	0.02
XRP	0.3813	0.14	0.63	2.13
SOL	147.93	0.13	1.23	6.13
TRX	0.3206	0.06	1.27	6.75
FIGR_HELOC	1.029	0.00	0.96	0.30
	?	---	0.00	0.00

Bitcoin $ 37,157	Bitcoin 2.50 %
Ethereum $ 1,716.5	Ethereum 3.66 %
Litecoin $ 53.16	Litecoin 0.18 %
XRP $ 0.3813	XRP 0.63 %

Randomizing Filenames

Finding Gaps in Timestamps

Coronavirus Stats in Bash

Updating Lynis

Converting Geofency Data to Google Maps

The Black Box on the Org Chart

The Post-Language Future of AI Systems

Outsmarted by a River, a Rope, and an Anchor

The AI Bubble Isn’t a Bubble. It’s a Trap.

College Students Demand Refunds

Randomizing Filenames

To Mask or Not to Mask

Checking Linux Account Password

Finding Gaps in Timestamps

Converting Geofency Data to Google Maps

Analyzing atop Logs with atopsar

Verifying SNMP Connectivity on Multiple Hosts

Validating HTTPS Cache Peers for Squid

Verifying SNMP Connectivity on Multiple Hosts

Bulk-Adding IPTables Rules

Automatically Block Frequent Visitors

Canyonlands National Park, Utah

Yellowstone National Park, Wyoming

Sun Juan Mountains, Colorado

Detecting Blurry Photos with ImageMagick

Finding Prime Numbers

Philly Flower Show 2018

Benford’s Law in Bash

AWS CLI Cheat Sheet

London 2017

Finding Cron Jobs

Productivity Shortcuts for .bashrc