
Referencing 42 bits

Started by mpm May 11, 2015
It's early AM here, no coffee in the house, so please forgive the following:  :)

I want to take 42 bits of information and encode, or condense them, into a small string of text so that I can easily, and even verbally, communicate a whole bunch of information at once.  The receiving end would have an appropriate decoder.

Printable characters only.
And ideally, restricted to 0-9, A-Z (uppercase, and avoiding lower case), although a dash could be used if it ended up reducing the character count in the encoded message.  Maybe a couple of others, but I want to stay away from things like "?", ";", "%", etc.

Even more ideally, eliminate easily confused characters such as "O" and zero, I, and L.  Although if upper-case only, the I and L can be kept, since the confusion arises with their lowercase equivalents.

Maybe I'm just overthinking this?

Got any ideas?
Maybe I should just see what 2^42 comes out to be in decimal or hex.
But it seems I could get it shorter?
"mpm"  wrote in message 
news:f328de36-1152-4c1f-8eea-512566f8a25c@googlegroups.com...

=================================================================================

I don't know what rules someone like Microsoft uses when they create one of 
those "6 groups of 5 characters each" software license IDs, but for maximum 
safety I think as a minimum you should leave out zero, one, five, and the 
letters I, J, L, O, and S.  Someone may write the code down in lower case 
without knowing which characters you excluded, and then try to "guess" a 1 
for an l, for example, so don't give them the chance.  You don't want to 
confuse zero and O, 1 and L, 5 and S, i and l, and I and J.  There may be 
other pairs, but these are the biggies I can think of off the top of my head. 
2^42 is about 4.4e12, and using 2-4, 6-9, and A-Z minus I, J, L, O, and S 
leaves 28 characters (7 digits plus 21 letters).  28^9 is about 1.1e13, while 
28^8 is only about 3.8e11, so you will need 9 characters for your data. 
Write it as xxxx-xxxxx or xxx-xxx-xxx, since smaller groups are easier to 
copy and remember.  Just my layman's thoughts, anyway :-).
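A minimal Python sketch of Carl's scheme, assuming the 42 bits arrive as a plain integer and that the most significant symbol is written first (both are assumptions; the post only gives the alphabet, the length, and the grouping). `ALPHABET`, `encode`, and `decode` are illustrative names, not anything from the thread.

```python
# Carl's 28-symbol alphabet: digits 2-4 and 6-9, plus A-Z minus I, J, L, O, S.
ALPHABET = "2346789ABCDEFGHKMNPQRTUVWXYZ"
BASE = len(ALPHABET)   # 28
LENGTH = 9             # 28**9 >= 2**42 > 28**8, so 9 symbols are enough
assert BASE ** LENGTH >= 2 ** 42 > BASE ** (LENGTH - 1)


def encode(value: int) -> str:
    """Encode a 42-bit integer as nine symbols grouped xxx-xxx-xxx."""
    if not 0 <= value < 2 ** 42:
        raise ValueError("value must fit in 42 bits")
    symbols = []
    for _ in range(LENGTH):
        value, remainder = divmod(value, BASE)
        symbols.append(ALPHABET[remainder])
    text = "".join(reversed(symbols))  # most significant symbol first
    return "-".join(text[i:i + 3] for i in range(0, LENGTH, 3))


def decode(text: str) -> int:
    """Inverse of encode(): skips dashes/spaces and accepts lower case."""
    value = 0
    for ch in text.upper():
        if ch in "- ":
            continue
        # ALPHABET.index raises ValueError for excluded symbols such as O or 1.
        value = value * BASE + ALPHABET.index(ch)
    return value
```

Because the decoder upper-cases its input and skips the grouping dashes, a code copied in lower case or without dashes still decodes; a character outside the alphabet is rejected instead of being guessed.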

-----
Regards,
Carl Ijames carl.ijames aat deletethis verizon dott net


On 2015-05-11, mpm <mpmillard@aol.com> wrote:
> Maybe I should just see what 2^42 comes out to be in decimal or hex.

2^42 = 32^8.4

so with a 32 symbol set it would take 9 symbols.

even with a 37 symbol set it would still take 9

however: 26^9 is also more than 2^42, so with only 9 letters you can still
cover 2^42.

2^42 = 4398046511104
37^8 = 3512479453921 - not enough
26^9 = 5429503678976 - sufficient

perhaps drop the vowels and substitute in digits 34679 to avoid
accidentally spelling offensive words,

-- 
umop apisdn
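Jasen's comparisons can be reproduced with exact integer arithmetic, which avoids rounding doubt near the boundary (37^8 falls short of 2^42 by only about 25%). A short Python sketch; `symbols_needed` and `ALPHABET_26` are illustrative names, and counting Y as a consonant is an assumption about "drop the vowels".

```python
def symbols_needed(alphabet_size: int, bits: int = 42) -> int:
    """Smallest n with alphabet_size**n >= 2**bits, using exact integers."""
    n, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= alphabet_size
        n += 1
    return n


# Letters with the vowels A, E, I, O, U dropped (Y kept as a consonant),
# plus the digits 3, 4, 6, 7, 9: 21 + 5 = 26 symbols.
CONSONANTS = "BCDFGHJKLMNPQRSTVWXYZ"
ALPHABET_26 = CONSONANTS + "34679"

for size in (26, 32, 37):
    print(size, symbols_needed(size))  # 9 symbols in each case
```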
On 11/05/2015 13:35, mpm wrote:
> And ideally, restricted 0-9, A-Z (uppercase, and avoiding lower case),
> although a Dash could be used if it ended up reducing the character count
> in the encoded message.

42 bits if you had an alphabet of length
  64 = 2^6 would be 7 characters (exactly)
  32 = 2^5 would be 8.4 characters
  16 = 2^4 would be 10.5 characters

2^5.25 = 38.05 so you need 39 symbols to get it into 8 (pleasing length)

> Even more ideally, eliminate easily confused characters such as "O" and
> zero, I, and L.

Beware of 0OQ I1l 5S 8B

My pet hate is when secure registration codes include one or more of these
in some fancy, hard-to-read 6pt font.

My instinct would be to use all 26 letters plus 234679 = 32, and then seven
others, such as + - / * @ #, to make up 39.

> Maybe I should just see what 2^42 comes out to be in decimal or hex.

In hex it will be 11 characters long (and it has the 8 B ambiguity).

-- 
Regards,
Martin Brown
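Martin's figures follow from characters = bits / log2(alphabet size). A small Python sketch of that arithmetic (the function name `chars_needed` is illustrative); the exact-integer assertion double-checks the 39-symbol threshold without relying on floating point.

```python
import math

BITS = 42


def chars_needed(alphabet_size: int, bits: int = BITS) -> float:
    """Fractional number of symbols needed: bits / log2(alphabet_size)."""
    return bits / math.log2(alphabet_size)


for size in (64, 32, 16):
    print(size, chars_needed(size))  # roughly 7, 8.4 and 10.5

# Smallest alphabet that packs 42 bits into 8 symbols: size >= 2**(42/8) = 2**5.25.
threshold = 2 ** (BITS / 8)            # about 38.05
min_size_for_8 = math.ceil(threshold)  # 39
# The same claim checked with exact integers:
assert 38 ** 8 < 2 ** BITS <= 39 ** 8

# Hex carries 4 bits per digit, so 42 bits need ceil(42 / 4) digits.
hex_digits = math.ceil(BITS / 4)       # 11
```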
On 5/11/2015 5:35 AM, mpm wrote:
> I want to take 42 bits of information and encode, or condense them, into a
> small string of text so that I can easily, and even verbally, communicate a
> whole bunch of information at once.  The receiving end would have an
> appropriate decoder.

Ah!  That makes it easy!

2^42 is *about* 4*1000^4 (2^10 ~= 1000)

So, build a dictionary of ~1500 words (or, ideally, 4 *different*
dictionaries each of ~1500 words).  Assign a unique 11 bit number to
each of the words in the first (second, third and fourth) dictionaries.
Use the first dictionary to encode the first 11 bits, the second for
the next 11 bits, etc.  So, you end up with a four word phrase:
     yellow dog house runs
Or, make the dictionaries smaller (encode just *6* bits -- so 64 words
in each dictionary) and use a *seven* word phrase.

By keeping each dictionary "disjoint" from the others, you also eliminate
the possibility of word transposition errors creeping into the phrase:
     dog yellow runs house
would not be valid -- because "dog" is only present in the "second"
dictionary, "yellow" only present in the *first*, etc.  So, you have a
sort of check algorithm built into the encoding.

Alternatively, you can view it as providing more flexibility to the
user (he can mix up the words and you can *still* recover the
original data!).  Or, more flexibility to how you rearrange those
words to make a more memorable "phrase":
     yellow dog runs house
is probably more memorable than
     house yellow dog runs

This is practical because you claim the receiving end will have an
"appropriate decoder" (yet haven't specified how complex that decoder
will be -- so, a little piece of software or a list of words, etc.)

You can add words and shorten dictionaries.  Then, by carefully
choosing the words in each dictionary (i.e., word position), you
can create pseudo-meaningful sentences:
  - first word is a number (up to 2 digits) between one and thirty-two
    (it encodes 5 *obvious* bits)
  - second word is an adjective from a list of eight colors to
    encode three bits
  - third word is one of 32 nouns (cow, dog, horse, bottle, etc.)
    to encode 5 more bits
  - fourth word is a verb from a list of 16 verbs to encode 4 bits
  - fifth is a preposition from a list of...

     "Twelve blue artichokes slid down..."
     "Eight green pigeons ran along..."

Sure, they're nonsense.  But, they have the same benefits of the
dictionary approach listed above ("Blue artichokes twelve down slid..."
is obviously not a valid encoding!) and are more memorable to a
casual user.  Can you recall the "authorization code" from *any*
of your software licenses FIVE SECONDS after having READ IT?!

If you want a *simpler* decoder, you could adopt something similar:
encode data in consonants and allow vowels to be inserted at will.
So, B=0000, C=0001, D=0010, F=0011, etc.  Then, "make up" pronounceable
strings by inserting vowels as convenient:

     FoD = 0011 0010

This makes encodings harder to *create*, but trivial to decode
(just elide the vowels and convert the consonants).
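A Python sketch of the consonant scheme. The post gives only B, C, D, F before "etc."; continuing alphabetically through T to get exactly 16 consonants is a hypothetical choice, as are the names `decode` and `skeleton`. Since 42 bits is 10.5 nibbles, the skeleton uses 11 consonants with the top 2 bits left at zero.

```python
# B, C, D, F are from the post; G ... T is a hypothetical continuation (16 total).
CONSONANTS = "BCDFGHJKLMNPQRST"
VOWELS = "AEIOU"  # free filler, ignored by the decoder


def decode(word: str) -> int:
    """Drop vowels, spaces and dashes; read each consonant as one 4-bit nibble."""
    value = 0
    for ch in word.upper():
        if ch in VOWELS or ch in " -":
            continue
        # CONSONANTS.index raises ValueError for letters outside the table (e.g. Y).
        value = (value << 4) | CONSONANTS.index(ch)
    return value


def skeleton(value: int, nibbles: int = 11) -> str:
    """Consonant skeleton for a 42-bit value (11 nibbles = 44 bits).
    A person then inserts vowels wherever they make it pronounceable."""
    return "".join(CONSONANTS[(value >> (4 * i)) & 0xF]
                   for i in reversed(range(nibbles)))


print(decode("FoD") == 0b0011_0010)  # True, matching FoD = 0011 0010
```

As the post says, the decoder is trivial; the creative work of picking vowels stays with whoever makes up the word.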
On Monday, May 11, 2015 at 9:33:02 AM UTC-4, Jasen Betts wrote:
> however: 26^9 is also more than 2^42, so with only 9 letters you can still
> cover 2^42.
>
> perhaps drop the vowels and substitute in digits 34679 to avoid
> accidentally spelling offensive words,
This looks promising... working on it now. Thanks to everyone here. -mpm
On Monday, May 11, 2015 at 5:35:34 AM UTC-7, mpm wrote:

> I want to take 42 bits of information and encode, or condense them, into a small string of text so that I can easily, and even verbally, communicate a whole bunch of information at once. The receiving end would have an appropriate decoder.
> Got any ideas?
Another consideration, error detection and correction, argues that you WANT some redundancy: some extra symbols that can act as a test of the transmission. There are many examples of these; for numeric info, parity bits, casting out nines, and ISBN (the numbers on books' barcodes have a final check symbol) come to mind. Hashes and MD5 checksums get more elaborate, and you can go all the way to ECC (error-correcting codes).
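One way to add a single check symbol to a non-numeric code is the Luhn mod N algorithm, a generalization of the credit-card Luhn check. It is a swapped-in technique (the post names parity, casting out nines, and ISBN, not this), and the 28-symbol alphabet below is borrowed from earlier in the thread as an assumption. For an even alphabet size, the doubling step permutes the symbol values, so any single mistyped symbol changes the checksum.

```python
# Assumed alphabet (Carl's 28 symbols); N must be even for full single-error detection.
ALPHABET = "2346789ABCDEFGHKMNPQRTUVWXYZ"
N = len(ALPHABET)


def _luhn_sum(code: str, start_factor: int) -> int:
    """Weighted sum from the right, alternating factors 2 and 1 (or 1 and 2)."""
    factor, total = start_factor, 0
    for ch in reversed(code):
        addend = factor * ALPHABET.index(ch)
        total += addend // N + addend % N  # "sum of digits" in base N
        factor = 3 - factor                # alternate 2 <-> 1
    return total


def check_char(code: str) -> str:
    """Symbol to append so that code + symbol validates."""
    return ALPHABET[(N - _luhn_sum(code, 2) % N) % N]


def is_valid(code_with_check: str) -> bool:
    """The appended check symbol sits in a factor-1 position."""
    return _luhn_sum(code_with_check, 1) % N == 0


payload = "7KQ3XMR9A"
full = payload + check_char(payload)  # 10 symbols: 9 data + 1 check
```

The cost is one extra symbol per code; in exchange, a single misread character (the 8/B or 5/S kind of error discussed in this thread) is caught instead of silently decoding to a different value.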
mpm <mpmillard@aol.com> wrote:
> I want to take 42 bits of information and encode, or condense them,
> into a small string of text so that I can easily, and even verbally,
> communicate a whole bunch of information at once.

uuencode is an early example of a program that does exactly this; it
translates 8-bit data (usually from a file) into 7-bit ASCII that would
survive early email and file-transfer protocols.  It uses the whole
ASCII set from space to _, so it includes a lot of punctuation that you
may not want.  xxencode uses a more restricted character set, which may
be closer to what you want.  Base64 is a newer development of the same
idea.

http://en.wikipedia.org/wiki/Uuencoding
http://en.wikipedia.org/wiki/Xxencoding
http://en.wikipedia.org/wiki/Base64

An advantage to these is that code already exists in lots of programming
languages to handle them.

One knock on all of these is that they use both uppercase and lowercase
letters.  If you always transmit this data with a computer, this doesn't
matter so much, but humans might not know that the case is significant.

Matt Roberds
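As an illustration of the "code already exists" point, Python's standard `base64` module also provides RFC 4648 Base32, which is uppercase-only (A-Z and 2-7). The payload value below is hypothetical. Both encoders work on whole bytes, so 42 bits are first padded to 6 bytes: Base64 then gives 8 mixed-case characters, and Base32 gives 10 rather than the 9 a bit-level base-32 conversion would need.

```python
import base64

value = 0x2AB_CDEF_0123            # hypothetical 42-bit payload
assert value < 2 ** 42
raw = value.to_bytes(6, "big")     # 42 bits padded up to 6 whole bytes (48 bits)

b64 = base64.b64encode(raw).decode("ascii")              # 8 chars, case-sensitive
b32 = base64.b32encode(raw).decode("ascii").rstrip("=")  # 10 chars, A-Z and 2-7 only


def from_b32(text: str) -> int:
    """Accept lower case, restore '=' padding to a multiple of 8, then decode."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return int.from_bytes(base64.b32decode(padded), "big")
```

Base32 sidesteps the upper/lower-case problem, but its alphabet still contains O, I, L, and S, so it does not remove the look-alike characters mentioned at the start of the thread.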
Hi Matt,

On 5/11/2015 12:21 PM, mroberds@att.net wrote:
> One knock on all of these is that they use both uppercase and lowercase
> letters.  If you always transmit this data with a computer, this doesn't
> matter so much, but humans might not know that the case is significant.
And, the issue of O0Q, 1l|, 8BE (E being a B that has "faded" over time or
due to mechanical damage to a label, etc.).  I *despise* MS's CoAs as,
invariably, there are one or two characters that require a judgement call --
get it wrong and you risk the machine "tattling" on your "misdeeds".

I think it is important to remember that people have reasonably short
(as in "number of items") memories -- the "magic seven" concept.  So,
beyond 7 ARBITRARY characters, most people will have to resort to writing
things down.  Witness how few folks commit LENGTHY passwords to memory.

I prefer the "encoded phrase" approach, as it is usually easier for folks
to remember -- even if it is nonsensical.  And, the inherent structure of
the phrase helps reinforce that memory:
     "It was something about 25 ponies and a swan... and something was BLUE!"
(i.e., "blue" is only allowed as the 3rd word in the phrase, so even though
the user forgot the phrase, they've remembered enough of it that you can
reconstruct it -- far easier than "one of the digits was a '3'...")

I also like the "read it to me over the phone" test; if you have to ask
"was that 'capital B' or just 'b'?" then you've compounded the memory and
transfer actions.  "Was that an 'eff' or an 'ess'?"

[Of course, you also have to consider *how* the data will be exchanged;
punching digits on a DTMF keypad places different restrictions on the data
format than "reading words to a human operator".  Even then, the choice of
words matters: "was that 'buck' or 'but'?"; "'safe' or 'save'?"; etc. --
esp. over a noisy or bandwidth-limited channel!]

Regardless, the fact that he's *thinking* about the issue instead of just
blindly opting for an "obvious" solution is 2 steps in the right direction!
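A toy Python sketch of the "encoded phrase" idea from this thread: one word list per phrase position, with no word shared between lists, so every word identifies its own position. The lists are hypothetical and deliberately tiny (4 positions of 8 words, 12 bits); covering 42 bits would take, for example, seven 64-word lists or four 2048-word lists. This decoder takes the "let the user mix up the words" option; a strict decoder would instead reject out-of-order phrases.

```python
# Hypothetical word lists, one per phrase position; no word appears in two lists.
SLOTS = [
    ["one", "two", "three", "four", "five", "six", "seven", "eight"],
    ["red", "blue", "green", "black", "white", "pink", "grey", "gold"],
    ["ponies", "swans", "dogs", "cows", "horses", "bottles", "pigeons", "artichokes"],
    ["ran", "slid", "flew", "sat", "swam", "fell", "sang", "hid"],
]
BITS_PER_SLOT = 3  # 8 words per list
MASK = (1 << BITS_PER_SLOT) - 1

WORD_TO_SLOT = {w: (slot, idx) for slot, words in enumerate(SLOTS)
                for idx, w in enumerate(words)}
assert len(WORD_TO_SLOT) == sum(len(words) for words in SLOTS)  # lists are disjoint


def encode(value: int) -> str:
    """First slot carries the most significant bits."""
    n = len(SLOTS)
    if not 0 <= value < 2 ** (BITS_PER_SLOT * n):
        raise ValueError("value too large for this phrase")
    return " ".join(SLOTS[slot][(value >> (BITS_PER_SLOT * (n - 1 - slot))) & MASK]
                    for slot in range(n))


def decode(phrase: str) -> int:
    """Word order does not matter: each word names its own slot."""
    found = {}
    for word in phrase.lower().split():
        slot, idx = WORD_TO_SLOT[word]  # KeyError: not a word from any list
        if slot in found:
            raise ValueError(f"two words for slot {slot}")
        found[slot] = idx
    if len(found) != len(SLOTS):
        raise ValueError("phrase is missing a word")
    value = 0
    for slot in range(len(SLOTS)):
        value = (value << BITS_PER_SLOT) | found[slot]
    return value
```

Because the lists are disjoint, a phrase that uses two words from the same list, or a word from no list, is detectably invalid rather than silently decoding to a wrong value.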
On 12/5/2015 1:19 AM, Don Y wrote:
>      "Twelve blue artichokes slid down..."
>      "Eight green pigeons ran along..."
>
> Sure, they're nonsense.  But, they have the same benefits of the
> dictionary approach listed above ("Blue artichokes twelve down slid..."
> is obviously not a valid encoding!) and are more memorable to a
> casual user.
That's neat, Don!  Might be a good way to encode passwords into something memorable.

-- 
Regards,
Adrian Jansen   adrianjansen at internode dot on dot net
Note reply address is invalid, convert address above to machine form.