
Nice MCU for small jobs

Started by Phil Hobbs September 3, 2018
On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
(in article <pmor53$153b$3@gioia.aioe.org>):

> Joseph Gwinn wrote
>> On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
>>
>> This successive approximation approach is well known in computer science,
>> where it is called a "binary search" in English. As you note, it is
>> very fast.
>>
>> <https://en.wikipedia.org/wiki/Binary_search_algorithm>
>>
>> The only algorithm that is faster is hashing, which is also discussed in the
>> above wiki article. The lookup time for hashing is constant, regardless of
>> the size of the dataset.
>
> I am not sure hashing is faster, what gives you that idea?
> I have used it, was long time ago, trying to remember...
> You may well be right, but why?
On average, hashing is way faster than anything else for large datasets, but must be used correctly, and the pathological cases avoided or addressed. Most importantly, the hash array must be large enough that collisions are reasonably rare. Each hash array location contains either null (no data here) or a pointer to a short linked list of data items.

To load the array, take every data item in turn and generate the hash index from the item's key, whatever that may be. This process is called hashing. Any hash algorithm that spreads the data items more or less uniformly over the hash index array will do. A very common approach is to divide the key by a prime number (this being the length of the hash array as well), and use the remainder as the index (address) into the hash index array. The quotient is discarded. There are lots of hashing algorithms in the literature, but keep it simple.

At the just-computed location in the hash index array, the value will either be null - no data value hashed to that location - or there will be a pointer to a small data item containing the full hash key and the data being saved (or a pointer to the data, if the data is large). In either case, add the new data item to the list of data items at that hash location.

To use the now-loaded array, go through the same index address computation as used when loading the array, find the short list at that location in the table, and look through it for a match. If the database is large enough, things may be arranged to allow the list at a hash-table location to be searched by successive approximation, but usually it's cheaper to just traverse the list linearly, because if the table is large enough, very few of these lists are longer than two or three items.

Hash tables in 2D are common for searching 2D maps of what is close to what geographically.

For a table of ten million IPv4 addresses, one would take the 32-bit address (without punctuation) and divide by 10000019, and use the remainder to index the hash table. Use a Poisson distribution to determine the fraction of cells with 0, 1, 2, 3, ... entries.

Spellcheckers also use a large hash table, one with one-bit table entries - if such a word exists (is spelled that way), the bit will be set. Collisions are ignored - the bit remains set. If the hash table is large enough, false positives (a misspelled word marked as correct) will be rare.

Joe Gwinn
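[A minimal C sketch of the chained hash table just described; the table size, key type, and function names are illustrative, not taken from any code in this thread:]

#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 10007              /* prime table length, as described above */

struct entry {
    char *key;                       /* full key kept for collision checks */
    void *data;
    struct entry *next;              /* short chain of colliding items */
};

static struct entry *table[HASH_SIZE];

/* Any hash that spreads keys roughly uniformly will do; here, mix the
   bytes and keep the remainder modulo the prime table length. */
static unsigned hash_index(const char *key)
{
    unsigned h = 0;
    while (*key)
        h = h * 31 + (unsigned char)*key++;
    return h % HASH_SIZE;
}

/* Load: compute the index, push the item onto the short list there. */
void insert(const char *key, void *data)
{
    unsigned i = hash_index(key);
    struct entry *e = malloc(sizeof *e);
    e->key = strdup(key);
    e->data = data;
    e->next = table[i];              /* collisions just extend the chain */
    table[i] = e;
}

/* Lookup: same index computation, then a linear scan of the chain,
   which is rarely more than two or three items if the table is big. */
void *lookup(const char *key)
{
    struct entry *e;
    for (e = table[hash_index(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->data;
    return NULL;                     /* null: nothing hashed to this slot */
}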
On Monday, September 3, 2018 at 1:25:06 PM UTC-4, Phil Hobbs wrote:
> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
> reels, dev board $27.
>
> We're using it for a super-quiet diode laser controller where it will
> run the temperature control and automatic power control loops. I'll
> never use another ATmega. ;)
>
> Cheers
>
> Phil Hobbs
>
> --
> Dr Philip C D Hobbs
> Principal Consultant
> ElectroOptical Innovations LLC / Hobbs ElectroOptics
> Optics, Electro-optics, Photonics, Analog Electronics
> Briarcliff Manor NY 10510
>
> http://electrooptical.net
> http://hobbs-eo.com
Why don't you just start a thread about your favorite color?
Am 06.09.2018 um 02:00 schrieb bloggs.fredbloggs.fred@gmail.com:
> On Monday, September 3, 2018 at 1:25:06 PM UTC-4, Phil Hobbs wrote:
>> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
>> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
>> reels, dev board $27.
>>
>> We're using it for a super-quiet diode laser controller where it will
>> run the temperature control and automatic power control loops. I'll
>> never use another ATmega. ;)
>>
>> Cheers
>>
>> Phil Hobbs
>>
>> --
>> Dr Philip C D Hobbs
>> Principal Consultant
>> ElectroOptical Innovations LLC / Hobbs ElectroOptics
>> Optics, Electro-optics, Photonics, Analog Electronics
>> Briarcliff Manor NY 10510
>>
>> http://electrooptical.net
>> http://hobbs-eo.com
>
> Why don't you just start a thread about your favorite color?
Because he talks about electronics and not the weather in Japan?
On Wed, 5 Sep 2018 14:18:31 -0400, bitrex <user@example.net> wrote:

> On 09/05/2018 02:13 PM, bitrex wrote:
>> On 09/05/2018 01:31 PM, 698839253X6D445TD@nospam.org wrote:
>>> bitrex wrote
>>>> Binary search time-complexity is O(1) best-case and O(log n) worst-case,
>>>> a hash table is O(1) best-case and O(n) worst-case.
>>>>
>>>> A naive binary search implementation can certainly beat a pathological
>>>> hash table implementation in the average; one would hope most hash
>>>> tables aren't pathological, though, and binary search is only O(1) in
>>>> the average for pathological use-cases of binary search.
>>>
>>> OK, yes, exactly.
>>>
>>> Just did look up where I used hashing.
>>> I did actually use hashing for message IDs in my newsreader,
>>> to stop cross-posted repeats from being listed again and again,
>>> and also in my videotext / ceefax program for storing pages.
>>> Both applications are not really that time critical and the data sets
>>> are small.
>>
>> Big O is really only useful for analysis comparisons on very large data
>> sets, where for example your data set is much larger than the CPU cache
>> on desktop architectures. What's actually better in practice for small
>> datasets, where the set-up/tear-down overhead of the algorithm and the
>> execution time of things like branches are on the order of the time you
>> spend actually carrying out the full search, is another question.
>>
>> For "small" data sets sometimes just linear search beats everything else
>> hands-down. What the specific use cases are, and what the precise
>> definition of "small" is for a particular architecture, are endlessly
>> debated; x86 is so complex it's often anyone's guess, you have to
>> profile to see.
>>
>> As usual the answer to "what's the best way to..." is "it depends".
>
> For example if your arch of choice has truly exceptional branch
> prediction, binary search can be faster even in cases where an on-paper
> analysis says it shouldn't be, just cuz a binary search can be written
> where it's a lot easier for the branch predictor to suss out the taken
> branch than a naive linear search or hash table lookup.
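[To illustrate the branch-prediction point: a binary search can be written essentially branch-free, so the one hard-to-predict branch disappears entirely. A minimal C sketch of a lower-bound search over a sorted int array; the function name and interface are illustrative, not from the thread:]

#include <stddef.h>

/* Branch-free binary search over a sorted array. The loop always runs
   about log2(n) iterations, and the comparison feeds an arithmetic
   select that compilers typically turn into a conditional move, so
   there is no taken/not-taken branch for the predictor to guess. */
size_t lower_bound(const int *a, size_t n, int key)
{
    const int *base = a;
    size_t len = n;
    size_t i;

    while (len > 1) {
        size_t half = len / 2;
        base += (base[half - 1] < key) * half;   /* cmov, not a branch */
        len -= half;
    }
    i = (size_t)(base - a);
    return i + (n > 0 && a[i] < key);  /* first index with a[i] >= key */
}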
You also have to consider the database usage: is it search only, or are there also frequent insertions of new records? A well-balanced binary tree is fast, but if there are a lot of new insertions, you can easily end up in pathological cases, with bad search times. Inserting new records may require a lot of heavy operations to keep the tree well balanced.
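[A small C sketch of that failure mode: a plain, unbalanced binary search tree fed keys in sorted order degenerates into a linked list, so lookups fall from O(log n) to O(n); self-balancing variants (AVL, red-black) avoid this by paying for rotation work on every insertion. The code is illustrative only:]

#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Plain BST insertion, no rebalancing. */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = calloc(1, sizeof *n);
        n->key = key;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Inserting 1, 2, 3, ..., n in that order produces a right-leaning
   chain of depth n: every search then walks the whole "list", which
   is the pathological case described above. */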
On 2018-09-05, Joseph Gwinn <joegwinn@comcast.net> wrote:
> Hash tables in 2D are common for searching 2D maps of what is close to what
> geographically.

how is that going to work? hashes don't support in-order access
Joseph Gwinn wrote
>> I am not sure hashing is faster, what gives you that idea?
>> I have used it, was long time ago, trying to remember...
>> You may well be right, but why?
>
> On average, hashing is way faster than anything else for large datasets, but
> must be used correctly, and the pathological cases avoided or addressed. Most
> importantly, the hash array must be large enough that collisions are
> reasonably rare. Each hash array location contains either null (no data
> here) or a pointer to a short linked list of data items.
>
> To load the array, take every data item in turn and generate the hash index
> from the item's key, whatever that may be. This process is called hashing.
> Any hash algorithm that spreads the data items more or less uniformly over
> the hash index array will do. A very common approach is to divide the key by
> a prime number (this being the length of the hash array as well), and use the
> remainder as the index (address) into the hash index array. The quotient is
> discarded. There are lots of hashing algorithms in the literature, but keep
> it simple.
Yes, I've done that, several places in this newsreader, here for the groups list (there are thousands of groups on some servers, at least there were in the old days):

int hash(s)	/* form hash value for string s */
char *s;
{
	int hashval;

	for(hashval = 0; *s != '\0';)
		hashval += *s++;

	/* sum of ASCII values of the characters, modulo the table size */
	return(hashval % GROUPS_HASH_SIZE);
}

Those groups, like for example sci.electronics.design, are in a linked list.

struct group *lookup_group(char *name)
{
	struct group *pa;

	if(debug_flag)
		{
		fprintf(stdout, "lookup_group(): arg name=%s\n", name);
		}

	/*
	fprintf(stdout,\
	"lookup_group(): looking up name=%s at grouptab=%d loc=%ld\n",\
	name, hash(name), grouptab[hash(name)]);
	*/

	for(pa = grouptab[hash(name)]; pa != 0; pa = pa -> nxtentr)
		{
		if(strcmp(pa -> name, name) == 0)
			return pa;	/* found group entry */
		}

	return 0;	/* not found */
}	/* end function lookup_group */

etc etc...

int walk_grouplist_and_display(struct newsgroup *pa)
{
	char sformatstr[4];
	/*char *ptr;*/

	/* Note: filter has already been applied,
	   only allowed groups are in this list. */

	/* This routine prints out the tree recursively:
	   walk left node, print value of this node, and walk right node */
.....
> At the just-computed location in the hash index array, the value will either
> be null - no data value hashed to that location - or there will be a pointer
> to a small data item containing the full hash key and the data being saved
> (or a pointer to the data, if the data is large). In either case, add the new
> data item to the list of data items at that hash location.
>
> To use the now-loaded array, go through the same index address computation as
> used when loading the array, find the short list at that location in the
> table, and look through it for a match. If the database is large enough,
> things may be arranged to allow the list at a hash-table location to be
> searched by successive approximation, but usually it's cheaper to just
> traverse the list linearly, because if the table is large enough, very few of
> these lists are longer than two or three items.
>
> Hash tables in 2D are common for searching 2D maps of what is close to what
> geographically.
Indeed for this specific purpose (newsgroups) a binary search makes no sense I think.
> For a table of ten million IPv4 addresses, one would take the 32-bit address
> (without punctuation) and divide by 10000019, and use the remainder to index
> the hash table. Use a Poisson distribution to determine the fraction of cells
> with 0, 1, 2, 3, ... entries.
That could work, but testing shows that in this case, with the IP addresses in incremental order (a pre-processed list), it takes only log(n), or basically, what was it I tested? Far fewer than 12 iterations anyway, each of which is only a number compare. Hashing makes no sense here I think. If you can write hashing code that can do that faster I would like to see it. Since ip_to_country has to run in real time, faster than a bad actor can hit the server, it needed to be _really_ fast. I think I got that working.
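[For reference, a minimal C sketch of that kind of lookup: a binary search over a table of address ranges sorted by start address, each probe a single integer compare. The struct layout and names are guesses for illustration, not the actual ip_to_country code:]

#include <stdint.h>
#include <stddef.h>

struct ip_range {
    uint32_t start;              /* first address in the range */
    uint32_t end;                /* last address in the range */
    char country[3];             /* two-letter country code + NUL */
};

/* About log2(n) iterations, each one integer compare. */
const char *ip_to_country(const struct ip_range *tab, size_t n, uint32_t ip)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {            /* find first range starting above ip */
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* the candidate is the range just before that point */
    if (lo > 0 && ip <= tab[lo - 1].end)
        return tab[lo - 1].country;
    return NULL;                 /* not in any listed range */
}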
> Spellcheckers also use a large hash table, one with one-bit table entries -
> if such a word exists (is spelled that way), the bit will be set. Collisions
> are ignored - the bit remains set. If the hash table is large enough, false
> positives (a misspelled word marked as correct) will be rare.
Yes, there it makes perfect sense.
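[That one-bit scheme is essentially a Bloom filter with a single hash function. A minimal C sketch, with the table size and hash choice as illustrative assumptions:]

#include <stdint.h>

#define BITS (1u << 24)                 /* 16M one-bit entries, illustrative */

static uint8_t bitmap[BITS / 8];

/* FNV-1a, one of many adequate string hashes. */
static uint32_t word_hash(const char *w)
{
    uint32_t h = 2166136261u;
    while (*w) {
        h ^= (uint8_t)*w++;
        h *= 16777619u;
    }
    return h % BITS;
}

/* Loading the dictionary: set the bit; collisions just set it again. */
void add_word(const char *w)
{
    uint32_t i = word_hash(w);
    bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* A clear bit means definitely not in the dictionary; a set bit means
   probably there, with rare false positives from collisions as noted. */
int maybe_correct(const char *w)
{
    uint32_t i = word_hash(w);
    return (bitmap[i / 8] >> (i % 8)) & 1;
}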
On 05/09/18 18:57, bitrex wrote:
> On 09/05/2018 11:00 AM, 698839253X6D445TD@nospam.org wrote:
>> Joseph Gwinn wrote
>>> On Sep 5, 2018, 698839253X6D445TD@nospam.org wrote
>>>
>>> This successive approximation approach is well known in computer science,
>>> where it is called "binary search" in English. As you note, it is very
>>> fast.
>>>
>>> <https://en.wikipedia.org/wiki/Binary_search_algorithm>
>>>
>>> The only algorithm that is faster is hashing, which is also discussed in
>>> the above wiki article. The lookup time for hashing is constant,
>>> regardless of the size of the dataset.
>>
>> I am not sure hashing is faster, what gives you that idea?
>> I have used it, was long time ago, trying to remember...
>> You may well be right, but why?
>
> Binary search time-complexity is O(1) best-case and O(log n) worst-case;
> a hash table is O(1) best-case and O(n) worst-case.
>
> A naive binary search implementation can certainly beat a pathological
> hash table implementation in the average; one would hope most hash
> tables aren't pathological, though, and binary search is only O(1) in
> the average for pathological use-cases of binary search.
Yes, a binary search is O(log n) on average (as well as worst-case), and it is easy to achieve. Hash algorithms need to be tuned to the application, with hash tables of a size to suit the data size and hash functions that give good distributions. Done well, you achieve very close to O(1) on average.
On Sep 6, 2018, Jasen Betts wrote
(in article <pmqdnj$svq$1@gonzo.alcatraz>):

> On 2018-09-05, Joseph Gwinn <joegwinn@comcast.net> wrote:
>>
>> Hash tables in 2D are common for searching 2D maps of what is close to what
>> geographically.
>
> how is that going to work? hashes don't support in-order access
Unlike a line, there is no concept of numerical order in a plane. The metric of "near" is instead Euclidean distance. So, given a 2D hash index array, one finds the X and Y index values of the cell into which the current item falls, and searches that cell and also the eight abutting neighbor cells. And maybe the sixteen cells that abut the eight direct abutters.

X and Y are the hashed longitude and latitude respectively, or a local planar map of same.

Joe Gwinn
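[A minimal C sketch of that 2D scheme: hash X and Y into grid-cell indices, then search the query's cell and its eight abutting neighbors. Cell size, capacity, and names are illustrative assumptions:]

#include <math.h>

#define GRID 256                        /* cells per axis, illustrative */
#define CELL_CAP 8                      /* points kept per cell */

struct point { double x, y; };

struct cell {
    struct point pts[CELL_CAP];
    int n;
};

static struct cell grid[GRID][GRID];
static const double cell_size = 1.0;    /* map units per cell */

/* Hash one coordinate to a cell index (wrapped into the table). */
static int cell_idx(double v)
{
    int i = (int)floor(v / cell_size) % GRID;
    return i < 0 ? i + GRID : i;
}

void insert_point(struct point p)
{
    struct cell *c = &grid[cell_idx(p.x)][cell_idx(p.y)];
    if (c->n < CELL_CAP)
        c->pts[c->n++] = p;
}

/* Nearest stored point, by Euclidean distance, from the 3x3 block of
   cells around the query. Returns 1 if anything was found. */
int nearest(struct point q, struct point *out)
{
    double best = INFINITY;
    int found = 0, dx, dy, i;

    for (dx = -1; dx <= 1; dx++)
        for (dy = -1; dy <= 1; dy++) {
            struct cell *c = &grid[(cell_idx(q.x) + dx + GRID) % GRID]
                                  [(cell_idx(q.y) + dy + GRID) % GRID];
            for (i = 0; i < c->n; i++) {
                double ex = c->pts[i].x - q.x, ey = c->pts[i].y - q.y;
                double d2 = ex * ex + ey * ey;
                if (d2 < best) { best = d2; *out = c->pts[i]; found = 1; }
            }
        }
    return found;
}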
On 09/03/2018 01:25 PM, Phil Hobbs wrote:
> LPC804: 15 MHz Cortex M0+, 32kB flash, 4k RAM, 12-bit 2 us ADC with 1
> LSB DNL, 10-bit DAC, the usual serial stuff, 20-TSSOP: 67 cents in
> reels, dev board $27.
>
> We're using it for a super-quiet diode laser controller where it will
> run the temperature control and automatic power control loops. I'll
> never use another ATmega. ;)
>
> Cheers
>
> Phil Hobbs
The dev boards came in, and they're pretty slick. For the $27 you get not only the MCU and so on, but two shields, one for capacitive touch and one full of buttons and switches and jumpers and stuff, for use with the programmable logic unit. The PLU is a small CPLD with 26 5-input LUTs that you can program in Verilog. Super handy where hardware timing is needed, which it usually is in instruments.

The other thing I really like about the LPC804 is that it has this giant pin multiplexer, so you can put just about any peripheral function on any GPIO pin, which is a huge time saver compared to the assing around you have to do on other MCUs to try to get the best possible combination of limited pinout choices. You can mix and match inputs and outputs, e.g. you can put a capture input and a UART TXD on the same pin if you like. Some combinations are silly, but some are potentially useful. Probably the big MUX works a lot better at 15 MHz than at 200.

They basically give the dev board away, but (I imagine) charge just enough that hobbyists aren't so tempted to use it instead of an Arduino or RasPi.

Fun.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
https://hobbs-eo.com