Forums

encoder and decoder for compact representation of the prime numbers

Started by Jamie Morken March 18, 2016
Hi,

The prime numbers can be stored with a density approaching 1 prime per
bit by using a primorial set of equations.  When the primes are 
compressed like this, any patterns in the distribution of the primes
becomes apparent to see compared to the uncompressed primes in the
natural numbers.  Compression algorithms are related to pattern
recognition so this maximally compact representation of the primes
will have the best chance to find patterns in the prime numbers.

The way it works is using a primorial set of formulas to encode
the primes into a binary sequence.  If the prime occurs at a
given index it is a 1 in the sequence and if there is no prime
at a given index then there is a 0.  The decoder only needs
to know which specific primorial was used to encode the binary
sequence, and it can then decompress the binary sequence directly
into the prime numbers.

Here is an example of compressing the primes into one bit
per prime using primorial 30

First step is to find all coprimes of primorial 30 and remove all
the unique prime factors.

So for primorial 30 this creates a list with 8 elements:
1,7,11,13,17,19,23,29

This has a count of 8 and that is the maximum number of primes
that can occur in any 30 digit spacing to infinity with the
digit spacings being ie 1 to 30, 31 to 60, 61 to 90 etc..
That is an interesting result in itself to show that
there can only be at most 8 primes in any block of 30 numbers
in that sequence.

The primorial list of 8 elements are used to create this set of equations:


y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

These give the offsets within each 30 number block of where the prime 
numbers can occur, ie the primes can only occur at those 8 specific
offset (1,7,11,13 etc) locations in each 30 number block, except for
the first block that has primes 2,3,5 (prime factors of 30)

Now that a primorial has been chosen and the list of elements have been 
found, the next step to create the binary sequence is to use each of the
eight equations to set the bits in the binary sequence to 1 or 0 
depending on if there is a prime at each location.  Ie for y = 30x + 1
for x=0 to x=maxSize, check all y if they are prime and create the 
binary sequence using all 8 equations.

Here is how the primes are compressed with this primorial 30 binary 
compression:

input primes:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113 



output binary sequence before being written has 8 bits per block of 30, 
ceil(113/30)=4 blocks required max = 8*4= 32 bits

block1	block2	block3	block4
00000000 00000000 00000000 00000000

Each block corresponds to a consecutive 30 number group 1 to 30, 31 to 
60, 61 to 90, 91 to 120

The prime factors of 30 which are 2,3,5 aren't encoded into the block,

So to fill block1, set x to zero and check each y if it is prime:

y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

y=1
y=7
y=11
y=13
y=17
y=19
y=23
y=29

All are prime except for 1, so the first 8bit binary block looks like this:

01111111

For the second binary block do the same thing but set x to 1 and check 
each y if it is prime:

y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

y=31
y=37
y=41
y=43
y=47
y=49
y=53
y=59

All are prime except for 49, so the second 8bit binary block looks like 
this (where the zero indicates offset 30x+19=49 is not prime)

11111011


For the third binary block do the same thing but set x to 2 and check 
each y if it is prime:


y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

y=61
y=67
y=71
y=73
y=77
y=79
y=83
y=89

All are prime except for 77, so the third 8bit binary block looks like 
this (where the zero indicates offset 30x+17=77 is not prime)

11110111


For the fourth binary block do the same thing but set x to 3 and check 
each y if it is prime:


y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

y=91
y=97
y=101
y=103
y=107
y=109
y=113
y=119


All are prime except for 91 and 119 so the fourth 8bit binary block 
looks like this (where the zeros indicate offsets 30x+1=91 and 30x+29 
are both not prime)

01111110


So all four blocks can be put together (or constructed up to "maxsize")

block1 block2 block3 block4
00000000 00000000 00000000 00000000

becomes:

block1 block2 block3 block4
01111111 11111011 11110111 01111110

This encodes all primes up to 119 with just 32bits stored.

It is interesting to look for patterns in this maximally compressed
representation of the primes, ie with an additional compression 
algorithm applied to it etc, or by turning it into a raster with
each bit representing a pixel or each byte representing a greyscale
pixel etc (just for fun)

SECOND STEP:

To decode the binary sequence it must be know what primorial was used
to create it and then it can be decoded.  So ideally the sequence will
be stored as a file named: compressedPrimes30Primorial.bin etc to
indicate which primorial was used, as different primorials will make
different binary sequences of compressed primes (larger primorials will
encode the primes more efficiently, as the ratio of the count of 
coprimes of large primorials to the primorial size approaches log(n),
just like the prime number counting formula.

So once the primorial is known and a binary sequence to decode exists,
the first step is to regenerate the primorial 8 equations again, just
like the encode step above, by finding the coprimes of the primorial,
and removing the prime factors to get the same 8 equations:

y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

The decoder will provide a "lookup table" of all the primes with
the highest compression possible.

First step is to decode block1, so set x=0 and then take the block1 
compressed digits: 01111111

y = 30x + 1
y = 30x + 7
y = 30x + 11
y = 30x + 13
y = 30x + 17
y = 30x + 19
y = 30x + 23
y = 30x + 29

The 8bits serves as a mask to remove the first equation, and then
solve the equations for x=0

y = 30x + 1 (masked - no calculation done as not prime)
y = 7
y = 11
y = 13
y = 17
y = 19
y = 23
y = 29

So before entering those primes first the special prime factors
of primorial 30 (2,3,5) have to be entered into the prime list
being constructed, then the 7 new primes can be added to give
the sequence of the primes up to 29 (just from 8 encoded bits!)

2,3,5,7,11,13,17,19,23,29

Now more primes can be appended on or looked up at a desired range
by using the compressed binary sequence.

Ie to add on more primes set x=1 and then take the block2 compressed
digits: 11111011

The 8bits serves as a mask to remove the sixth equation, and then
solve the equations for x=1

y = 31
y = 37
y = 41
y = 43
y = 47
y = 30x + 19 (masked - no calculation done as not prime)
y = 53
y = 59

Ok so now we can add 7 more primes.

2,3,5,7,11,13,17,19,23,29
becomes
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59

looks good so far just like the primes so the decoder works fine at 
least on paper.  Using a very large primorial is a good idea too
as the prime density per bit (or bits required per prime) is more
compact.

Very large primes can still be encoded with just a single bit,
and only BigInteger math is required once in the last step for the 
formula y = m * x + b but the binary sequence size is small! :D

cheers,
Jamie

On 18/03/2016 6:58 PM, Jamie Morken wrote:
7,19,23,29,31,37,41,43,47,53,59
> > looks good so far just like the primes so the decoder works fine at > least on paper. Using a very large primorial is a good idea too > as the prime density per bit (or bits required per prime) is more > compact.
As the density of primes drops asymptotically towards zero for large primes, it doesn't matter how large a number of you choose, the packing efficiency will also drop towards zero. Sylvia.
Am Freitag, 18. M�rz 2016 09:09:37 UTC+1 schrieb Sylvia Else:
> On 18/03/2016 6:58 PM, Jamie Morken wrote: > 7,19,23,29,31,37,41,43,47,53,59 > > > > looks good so far just like the primes so the decoder works fine at > > least on paper. Using a very large primorial is a good idea too > > as the prime density per bit (or bits required per prime) is more > > compact. > > As the density of primes drops asymptotically towards zero for large > primes, it doesn't matter how large a number of you choose, the packing > efficiency will also drop towards zero. >
I do not think so, you could just give n for the nth prime. This is the most compact representation. Decoding will be difficult, of course. If you cannot access a full table you could store only the primes for powers of two, or similar, then you have a starting point for the search. Andreas
On 3/18/2016 5:46 AM, acd wrote:
> Am Freitag, 18. M�rz 2016 09:09:37 UTC+1 schrieb Sylvia Else: >> On 18/03/2016 6:58 PM, Jamie Morken wrote: >> 7,19,23,29,31,37,41,43,47,53,59 >>> >>> looks good so far just like the primes so the decoder works fine at >>> least on paper. Using a very large primorial is a good idea too >>> as the prime density per bit (or bits required per prime) is more >>> compact. >> >> As the density of primes drops asymptotically towards zero for large >> primes, it doesn't matter how large a number of you choose, the packing >> efficiency will also drop towards zero. >> > > I do not think so, you could just give n for the nth prime. > This is the most compact representation. Decoding will be difficult, of course. > If you cannot access a full table you could store only the primes for powers of two, or similar, then you have a starting point for the search. > > Andreas >
Hi, Ya the packing density is optimal for an infinite primorial as there are log(n) coprimes for the infinite primorial (just the sequence of primes) This is similar to base(e) being the highest information density base, ie it is higher than base2 or base3 (base3 is the highest information density storage integer) cheers, Jamie
On 19/03/2016 8:21 AM, Jamie Morken wrote:
> On 3/18/2016 5:46 AM, acd wrote: >> Am Freitag, 18. M�rz 2016 09:09:37 UTC+1 schrieb Sylvia Else: >>> On 18/03/2016 6:58 PM, Jamie Morken wrote: >>> 7,19,23,29,31,37,41,43,47,53,59 >>>> >>>> looks good so far just like the primes so the decoder works fine at >>>> least on paper. Using a very large primorial is a good idea too >>>> as the prime density per bit (or bits required per prime) is more >>>> compact. >>> >>> As the density of primes drops asymptotically towards zero for large >>> primes, it doesn't matter how large a number of you choose, the packing >>> efficiency will also drop towards zero. >>> >> >> I do not think so, you could just give n for the nth prime. >> This is the most compact representation. Decoding will be difficult, >> of course. >> If you cannot access a full table you could store only the primes for >> powers of two, or similar, then you have a starting point for the search. >> >> Andreas >> > > Hi, > > Ya the packing density is optimal for an infinite primorial as there are > log(n) coprimes for the infinite primorial (just the sequence of primes)
You can't just treat infinity as if it were a number. It's not. There is no such thing as an infinite primorial. All primorials are finite. Sylvia.
On 3/18/2016 6:21 PM, Sylvia Else wrote:
> On 19/03/2016 8:21 AM, Jamie Morken wrote: >> On 3/18/2016 5:46 AM, acd wrote: >>> Am Freitag, 18. M�rz 2016 09:09:37 UTC+1 schrieb Sylvia Else: >>>> On 18/03/2016 6:58 PM, Jamie Morken wrote: >>>> 7,19,23,29,31,37,41,43,47,53,59 >>>>> >>>>> looks good so far just like the primes so the decoder works fine at >>>>> least on paper. Using a very large primorial is a good idea too >>>>> as the prime density per bit (or bits required per prime) is more >>>>> compact. >>>> >>>> As the density of primes drops asymptotically towards zero for large >>>> primes, it doesn't matter how large a number of you choose, the packing >>>> efficiency will also drop towards zero. >>>> >>> >>> I do not think so, you could just give n for the nth prime. >>> This is the most compact representation. Decoding will be difficult, >>> of course. >>> If you cannot access a full table you could store only the primes for >>> powers of two, or similar, then you have a starting point for the >>> search. >>> >>> Andreas >>> >> >> Hi, >> >> Ya the packing density is optimal for an infinite primorial as there are >> log(n) coprimes for the infinite primorial (just the sequence of primes) > > You can't just treat infinity as if it were a number. It's not. There is > no such thing as an infinite primorial. All primorials are finite. > > Sylvia. >
A primorial number is a product of primes, if there are infinite primes, then there is a primorial that is the product of them all. cheers, Jamie
On 3/19/2016 8:57 AM, Jamie Morken wrote:
> On 3/18/2016 6:21 PM, Sylvia Else wrote: >> On 19/03/2016 8:21 AM, Jamie Morken wrote: >>> On 3/18/2016 5:46 AM, acd wrote: >>>> Am Freitag, 18. M�rz 2016 09:09:37 UTC+1 schrieb Sylvia Else: >>>>> On 18/03/2016 6:58 PM, Jamie Morken wrote: >>>>> 7,19,23,29,31,37,41,43,47,53,59 >>>>>> >>>>>> looks good so far just like the primes so the decoder works fine at >>>>>> least on paper. Using a very large primorial is a good idea too >>>>>> as the prime density per bit (or bits required per prime) is more >>>>>> compact. >>>>> >>>>> As the density of primes drops asymptotically towards zero for large >>>>> primes, it doesn't matter how large a number of you choose, the >>>>> packing >>>>> efficiency will also drop towards zero. >>>>> >>>> >>>> I do not think so, you could just give n for the nth prime. >>>> This is the most compact representation. Decoding will be difficult, >>>> of course. >>>> If you cannot access a full table you could store only the primes for >>>> powers of two, or similar, then you have a starting point for the >>>> search. >>>> >>>> Andreas >>>> >>> >>> Hi, >>> >>> Ya the packing density is optimal for an infinite primorial as there are >>> log(n) coprimes for the infinite primorial (just the sequence of primes) >> >> You can't just treat infinity as if it were a number. It's not. There is >> no such thing as an infinite primorial. All primorials are finite. >> >> Sylvia. >> > > > A primorial number is a product of primes, if there are infinite primes, > then there is a primorial that is the product of them all. > > cheers, > Jamie > > >
I found lots of the same stuff of what I was talking about for encoding and decoding the primes: http://stackoverflow.com/questions/2614147/efficiently-storing-a-list-of-prime-numbers https://programmingpraxis.com/2010/03/26/the-next-prime/ http://www.qsl.net/w2gl/blackkey.html http://stackoverflow.com/questions/1428786/best-way-to-find-a-prime-number http://stackoverflow.com/questions/1024640/calculating-phik-for-1kn/1134851#1134851 https://en.wikipedia.org/wiki/Wheel_factorization cheers, Jamie