
Nice MCU for small jobs

Started by Phil Hobbs September 3, 2018
On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
(in article <pmor53$153b$3@gioia.aioe.org>):

> Joseph Gwinn wrote
>> On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
>>
>> This successive approximation approach is well known in computer science,
>> where it is called a "binary search" in English. As you note, it is
>> very fast.
>>
>> <https://en.wikipedia.org/wiki/Binary_search_algorithm>
>>
>> The only algorithm that is faster is hashing, which is also discussed in the
>> above wiki article. The lookup time for hashing is constant, regardless of
>> the size of the dataset.
>
> I am not sure hashing is faster, what gives you that idea?
> I have used it, was long time ago, trying to remember...
> You may well be right, but why?
On average, hashing is way faster than anything else for large datasets, but must be used correctly, and the pathological cases avoided or addressed. Most importantly, the hash array must be large enough that collisions are reasonably rare. Each hash array location contains either null (no data here) or a pointer to a short linked list of data items.

To load the array, take every data item in turn and generate the hash index from the item's key, whatever that may be. This process is called hashing. Any hash algorithm that spreads the data items more or less uniformly over the hash index array will do. A very common approach is to divide the key by a prime number (this being the length of the hash array as well), and use the remainder as the index (address) into the hash index array. The quotient is discarded. There are lots of hashing algorithms in the literature, but keep it simple.

At the just-computed location in the hash index array, the value will either be null - no data value hashed to that location - or there will be a pointer to a small data item containing the full hash key and the data being saved (or a pointer to the data, if the data is large). In either case, add the new data item to the list of data items at that hash location.

To use the now-loaded array, go through the same index address computation as used when loading the array, find the short list at that location in the table, and look through it for a match. If the database is large enough, things may be arranged to allow the list at a hash-table location to be searched by successive approximation, but usually it's cheaper to just traverse the list linearly, because if the table is large enough, very few of these lists are longer than two or three items.

Hash tables in 2D are common for searching 2D maps of what is close to what geographically.

For a table of ten million IPv4 addresses, one would take the 32-bit address (without punctuation) and divide by 10000019, and use the remainder to index the hash table. Use a Poisson distribution to determine the fraction of cells with 0, 1, 2, 3, ... entries.

Spellcheckers also use a large hash table, one with one-bit table entries - if such a word exists (is spelled that way), the bit will be set. Collisions are ignored - the bit remains set. If the hash table is large enough, false positives (a misspelled word marked as correct) will be rare.

Joe Gwinn
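[A minimal C sketch of the chained hash table just described; the table size, key type, and function names are illustrative, not taken from any code in this thread:]

#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 10007              /* prime table length, as described above */

struct entry {
    char *key;                       /* full key kept for collision checks */
    void *data;
    struct entry *next;              /* short chain of colliding items */
};

static struct entry *table[HASH_SIZE];

/* Any hash that spreads keys roughly uniformly will do; here, mix the
   bytes and keep the remainder modulo the prime table length. */
static unsigned hash_index(const char *key)
{
    unsigned h = 0;
    while (*key)
        h = h * 31 + (unsigned char)*key++;
    return h % HASH_SIZE;
}

/* Load: compute the index, push the item onto the short list there. */
void insert(const char *key, void *data)
{
    unsigned i = hash_index(key);
    struct entry *e = malloc(sizeof *e);
    e->key = strdup(key);
    e->data = data;
    e->next = table[i];              /* collisions just extend the chain */
    table[i] = e;
}

/* Lookup: same index computation, then a linear scan of the chain,
   which is rarely more than two or three items if the table is big. */
void *lookup(const char *key)
{
    struct entry *e;
    for (e = table[hash_index(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->data;
    return NULL;                     /* null: nothing hashed to this slot */
}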
On Monday, September 3, 2018 at 1:25:06 PM UTC-4, Phil Hobbs wrote:
> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
> reels, dev board $27.
>
> We're using it for a super-quiet diode laser controller where it will
> run the temperature control and automatic power control loops. I'll
> never use another ATmega. ;)
>
> Cheers
>
> Phil Hobbs
>
> --
> Dr Philip C D Hobbs
> Principal Consultant
> ElectroOptical Innovations LLC / Hobbs ElectroOptics
> Optics, Electro-optics, Photonics, Analog Electronics
> Briarcliff Manor NY 10510
>
> http://electrooptical.net
> http://hobbs-eo.com
Why don't you just start a thread about your favorite color?
Am 06.09.2018 um 02:00 schrieb bloggs.fredbloggs.fred@gmail.com:
> On Monday, September 3, 2018 at 1:25:06 PM UTC-4, Phil Hobbs wrote:
>> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
>> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
>> reels, dev board $27.
>>
>> We're using it for a super-quiet diode laser controller where it will
>> run the temperature control and automatic power control loops. I'll
>> never use another ATmega. ;)
>>
>> Cheers
>>
>> Phil Hobbs
>>
>> --
>> Dr Philip C D Hobbs
>> Principal Consultant
>> ElectroOptical Innovations LLC / Hobbs ElectroOptics
>> Optics, Electro-optics, Photonics, Analog Electronics
>> Briarcliff Manor NY 10510
>>
>> http://electrooptical.net
>> http://hobbs-eo.com
>
> Why don't you just start a thread about your favorite color?
Because he talks about electronics and not the weather in Japan?
On Wed, 5 Sep 2018 14:18:31 -0400, bitrex <user@example.net> wrote:

> On 09/05/2018 02:13 PM, bitrex wrote:
>> On 09/05/2018 01:31 PM, 698839253X6D445TD@nospam.org wrote:
>>> bitrex wrote
>>>> Binary search time-complexity is O(1) best-case and O(log n) worst-case,
>>>> a hash table is O(1) best-case and O(n) worst-case.
>>>>
>>>> A naive binary search implementation can certainly beat a pathological
>>>> hash table implementation in the average; one would hope most hash
>>>> tables aren't pathological, though, and binary search is only O(1) in
>>>> the average for pathological use-cases of binary search.
>>>
>>> OK, yes, exactly.
>>>
>>> Just did look up where I used hashing.
>>> I did actually use hashing for message IDs in my newsreader,
>>> to stop cross-posted repeats from being listed again and again,
>>> and also in my videotext / ceefax program for storing pages.
>>> Both applications are not really that time critical and the data sets
>>> are small.
>>
>> Big O is really only useful for analysis comparisons on very large data
>> sets, where for example your data set is much larger than the CPU cache
>> on desktop architectures. What's actually better in practice for small
>> datasets, where the set-up/tear-down overhead of the algorithm and the
>> execution time of things like branches are on the order of the time you
>> spend actually carrying out the full search, is another question.
>>
>> For "small" data sets sometimes just linear search beats everything else
>> hands-down. What the specific use cases are, and what the precise
>> definition of "small" is for a particular architecture, are endlessly
>> debated; x86 is so complex it's often anyone's guess, you have to
>> profile to see.
>>
>> As usual the answer to "what's the best way to..." is "it depends".
>
> For example if your arch of choice has truly exceptional branch
> prediction, binary search can be faster even in cases where an on-paper
> analysis says it shouldn't be, just cuz a binary search can be written
> where it's a lot easier for the branch predictor to suss out the taken
> branch than a naive linear search or hash table lookup.
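[To illustrate the branch-prediction point: a binary search can be written essentially branch-free, so the one hard-to-predict branch disappears entirely. A minimal C sketch of a lower-bound search over a sorted int array; the function name and interface are illustrative, not from the thread:]

#include <stddef.h>

/* Branch-free binary search over a sorted array. The loop always runs
   about log2(n) iterations, and the comparison feeds an arithmetic
   select that compilers typically turn into a conditional move, so
   there is no taken/not-taken branch for the predictor to guess. */
size_t lower_bound(const int *a, size_t n, int key)
{
    const int *base = a;
    size_t len = n;
    size_t i;

    while (len > 1) {
        size_t half = len / 2;
        base += (base[half - 1] < key) * half;   /* cmov, not a branch */
        len -= half;
    }
    i = (size_t)(base - a);
    return i + (n > 0 && a[i] < key);  /* first index with a[i] >= key */
}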
You also have to consider the database usage: is it search only, or are there also frequent insertions of new records? A well-balanced binary tree is fast, but if there are a lot of new insertions, you can easily end up in pathological cases, with bad search times. Inserting new records may require a lot of heavy operations to keep the tree well balanced.
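[A small C sketch of that failure mode: a plain, unbalanced binary search tree fed keys in sorted order degenerates into a linked list, so lookups fall from O(log n) to O(n); self-balancing variants (AVL, red-black) avoid this by paying for rotation work on every insertion. The code is illustrative only:]

#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Plain BST insertion, no rebalancing. */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = calloc(1, sizeof *n);
        n->key = key;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Inserting 1, 2, 3, ..., n in that order produces a right-leaning
   chain of depth n: every search then walks the whole "list", which
   is the pathological case described above. */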
On 2018-09-05, Joseph Gwinn <joegwinn@comcast.net> wrote:
> Hash tables in 2D are common for searching 2D maps of what is close to what
> geographically.

how is that going to work? hashes don't support in-order access
Joseph Gwinn wrote
>> I am not sure hashing is faster, what gives you that idea?
>> I have used it, was long time ago, trying to remember...
>> You may well be right, but why?
>
> On average, hashing is way faster than anything else for large datasets, but
> must be used correctly, and the pathological cases avoided or addressed. Most
> importantly, the hash array must be large enough that collisions are
> reasonably rare. Each hash array location contains either null (no data
> here) or a pointer to a short linked list of data items.
>
> To load the array, take every data item in turn and generate the hash index
> from the item's key, whatever that may be. This process is called hashing.
> Any hash algorithm that spreads the data items more or less uniformly over
> the hash index array will do. A very common approach is to divide the key by
> a prime number (this being the length of the hash array as well), and use the
> remainder as the index (address) into the hash index array. The quotient is
> discarded. There are lots of hashing algorithms in the literature, but keep
> it simple.
Yes, I've done that, several places in this newsreader, here for the groups list (there are thousands of groups on some servers, at least there were in the old days):

int hash(s)	/* form hash value for string s */
char *s;
{
	int hashval;

	for(hashval = 0; *s != '\0';)
		hashval += *s++;

	/* sum of ASCII values of the characters, modulo the table size */
	return(hashval % GROUPS_HASH_SIZE);
}

Those groups, like for example sci.electronics.design, are in a linked list.

struct group *lookup_group(char *name)
{
	struct group *pa;

	if(debug_flag)
		{
		fprintf(stdout, "lookup_group(): arg name=%s\n", name);
		}

	/*
	fprintf(stdout,\
	"lookup_group(): looking up name=%s at grouptab=%d loc=%ld\n",\
	name, hash(name), grouptab[hash(name)]);
	*/

	for(pa = grouptab[hash(name)]; pa != 0; pa = pa -> nxtentr)
		{
		if(strcmp(pa -> name, name) == 0)
			return pa;	/* found group entry */
		}

	return 0;	/* not found */
}	/* end function lookup_group */

etc etc...

int walk_grouplist_and_display(struct newsgroup *pa)
{
	char sformatstr[4];
	/*char *ptr;*/

	/* Note: filter has already been applied,
	   only allowed groups are in this list. */

	/* This routine prints out the tree recursively:
	   walk left node, print value of this node, and walk right node */
.....
> At the just-computed location in the hash index array, the value will either
> be null - no data value hashed to that location - or there will be a pointer
> to a small data item containing the full hash key and the data being saved
> (or a pointer to the data, if the data is large). In either case, add the new
> data item to the list of data items at that hash location.
>
> To use the now-loaded array, go through the same index address computation as
> used when loading the array, find the short list at that location in the
> table, and look through it for a match. If the database is large enough,
> things may be arranged to allow the list at a hash-table location to be
> searched by successive approximation, but usually it's cheaper to just
> traverse the list linearly, because if the table is large enough, very few of
> these lists are longer than two or three items.
>
> Hash tables in 2D are common for searching 2D maps of what is close to what
> geographically.
Indeed for this specific purpose (newsgroups) a binary search makes no sense I think.
> For a table of ten million IPv4 addresses, one would take the 32-bit address
> (without punctuation) and divide by 10000019, and use the remainder to index
> the hash table. Use a Poisson distribution to determine the fraction of cells
> with 0, 1, 2, 3, ... entries.
That could work, but testing shows that in this case, with the IP addresses in incremental order (a pre-processed list), it takes only log(n), or basically, what was it I tested? Far fewer than 12 iterations anyway, each of which is only a number compare. Hashing makes no sense here I think. If you can write hashing code that can do that faster I would like to see it. Since ip_to_country has to run in real time, faster than a bad actor can hit the server, it needed to be _really_ fast. I think I got that working.
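[For reference, a minimal C sketch of that kind of lookup: a binary search over a table of address ranges sorted by start address, each probe a single integer compare. The struct layout and names are guesses for illustration, not the actual ip_to_country code:]

#include <stdint.h>
#include <stddef.h>

struct ip_range {
    uint32_t start;              /* first address in the range */
    uint32_t end;                /* last address in the range */
    char country[3];             /* two-letter country code + NUL */
};

/* About log2(n) iterations, each one integer compare. */
const char *ip_to_country(const struct ip_range *tab, size_t n, uint32_t ip)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {            /* find first range starting above ip */
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* the candidate is the range just before that point */
    if (lo > 0 && ip <= tab[lo - 1].end)
        return tab[lo - 1].country;
    return NULL;                 /* not in any listed range */
}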
> Spellcheckers also use a large hash table, one with one-bit table entries -
> if such a word exists (is spelled that way), the bit will be set. Collisions
> are ignored - the bit remains set. If the hash table is large enough, false
> positives (a misspelled word marked as correct) will be rare.
Yes, there it makes perfect sense.
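[That one-bit scheme is essentially a Bloom filter with a single hash function. A minimal C sketch, with the table size and hash choice as illustrative assumptions:]

#include <stdint.h>

#define BITS (1u << 24)                 /* 16M one-bit entries, illustrative */

static uint8_t bitmap[BITS / 8];

/* FNV-1a, one of many adequate string hashes. */
static uint32_t word_hash(const char *w)
{
    uint32_t h = 2166136261u;
    while (*w) {
        h ^= (uint8_t)*w++;
        h *= 16777619u;
    }
    return h % BITS;
}

/* Loading the dictionary: set the bit; collisions just set it again. */
void add_word(const char *w)
{
    uint32_t i = word_hash(w);
    bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* A clear bit means definitely not in the dictionary; a set bit means
   probably there, with rare false positives from collisions as noted. */
int maybe_correct(const char *w)
{
    uint32_t i = word_hash(w);
    return (bitmap[i / 8] >> (i % 8)) & 1;
}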
On 05/09/18 18:57, bitrex wrote:
> On 09/05/2018 11:00 AM, 698839253X6D445TD@nospam.org wrote:
>> Joseph Gwinn wrote
>>> On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
>>>
>>> This successive approximation approach is well known in computer science,
>>> where it is called "binary search" in English. As you note, it is very
>>> fast.
>>>
>>> <https://en.wikipedia.org/wiki/Binary_search_algorithm>
>>>
>>> The only algorithm that is faster is hashing, which is also discussed in
>>> the above wiki article. The lookup time for hashing is constant,
>>> regardless of the size of the dataset.
>>
>> I am not sure hashing is faster, what gives you that idea?
>> I have used it, was long time ago, trying to remember...
>> You may well be right, but why?
>
> Binary search time-complexity is O(1) best-case and O(log n) worst-case;
> a hash table is O(1) best-case and O(n) worst-case.
>
> A naive binary search implementation can certainly beat a pathological
> hash table implementation in the average; one would hope most hash
> tables aren't pathological, though, and binary search is only O(1) in
> the average for pathological use-cases of binary search.
Yes, a binary search is O(log n) on average (as well as worst-case), and it is easy to achieve. Hash algorithms need to be tuned to the application, with hash tables of a size to suit the data size and hash functions that give good distributions. Done well, you achieve very close to O(1) on average.
On Sep 6, 2018, Jasen Betts wrote
(in article <pmqdnj$svq$1@gonzo.alcatraz>):

> On 2018-09-05, Joseph Gwinn <joegwinn@comcast.net> wrote:
>>
>> Hash tables in 2D are common for searching 2D maps of what is close to what
>> geographically.
>
> how is that going to work? hashes don't support in-order access
Unlike a line, there is no concept of numerical order in a plane. The metric of "near" is instead Euclidean distance. So, given a 2D hash index array, one finds the X and Y index values of the cell into which the current item falls, and searches that cell and also the eight abutting neighbor cells. And maybe the sixteen cells that abut the eight direct abutters.

X and Y are the hashed longitude and latitude respectively, or a local planar map of same.

Joe Gwinn
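[A minimal C sketch of that 2D scheme: hash X and Y into grid-cell indices, then search the query's cell and its eight abutting neighbors. Cell size, capacity, and names are illustrative assumptions:]

#include <math.h>

#define GRID 256                        /* cells per axis, illustrative */
#define CELL_CAP 8                      /* points kept per cell */

struct point { double x, y; };

struct cell {
    struct point pts[CELL_CAP];
    int n;
};

static struct cell grid[GRID][GRID];
static const double cell_size = 1.0;    /* map units per cell */

/* Hash one coordinate to a cell index (wrapped into the table). */
static int cell_idx(double v)
{
    int i = (int)floor(v / cell_size) % GRID;
    return i < 0 ? i + GRID : i;
}

void insert_point(struct point p)
{
    struct cell *c = &grid[cell_idx(p.x)][cell_idx(p.y)];
    if (c->n < CELL_CAP)
        c->pts[c->n++] = p;
}

/* Nearest stored point, by Euclidean distance, from the 3x3 block of
   cells around the query. Returns 1 if anything was found. */
int nearest(struct point q, struct point *out)
{
    double best = INFINITY;
    int found = 0, dx, dy, i;

    for (dx = -1; dx <= 1; dx++)
        for (dy = -1; dy <= 1; dy++) {
            struct cell *c = &grid[(cell_idx(q.x) + dx + GRID) % GRID]
                                  [(cell_idx(q.y) + dy + GRID) % GRID];
            for (i = 0; i < c->n; i++) {
                double ex = c->pts[i].x - q.x, ey = c->pts[i].y - q.y;
                double d2 = ex * ex + ey * ey;
                if (d2 < best) { best = d2; *out = c->pts[i]; found = 1; }
            }
        }
    return found;
}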
On 09/03/2018 01:25 PM, Phil Hobbs wrote:
> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
> reels, dev board $27.
>
> We're using it for a super-quiet diode laser controller where it will
> run the temperature control and automatic power control loops. I'll
> never use another ATmega. ;)
>
> Cheers
>
> Phil Hobbs
The dev boards came in, and they're pretty slick. For the $27 you get not only the MCU and so on, but two shields, one for capacitive touch and one full of buttons and switches and jumpers and stuff, for use with the programmable logic unit. The PLU is a small CPLD with 26 5-input LUTs that you can program in Verilog. Super handy where hardware timing is needed, which it usually is in instruments.

The other thing I really like about the LPC804 is that it has this giant pin multiplexer, so you can put just about any peripheral function on any GPIO pin, which is a huge time saver compared to the assing around you have to do on other MCUs to try to get the best possible combination of limited pinout choices. You can mix and match inputs and outputs, e.g. you can put a capture input and a UART TXD on the same pin if you like. Some combinations are silly, but some are potentially useful. Probably the big MUX works a lot better at 15 MHz than at 200.

They basically give the dev board away, but (I imagine) charge just enough that hobbyists aren't so tempted to use it instead of an Arduino or RasPi.

Fun.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
https://hobbs-eo.com