FPGA sensitivities

Started by John Larkin September 25, 2020
On Sat, 26 Sep 2020 07:25:36 +0200, Gerhard Hoffmann <dk4xp@arcor.de>
wrote:

>On 26.09.20 at 04:41, jlarkin@highlandsniptechnology.com wrote:
>
>> I was wondering if anyone else had a problem like this.
>
>That leads to the question if it happens on more
>than one board.
>
>Gerhard
Four.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.
Claude Bernard
On 9/25/20 7:41 PM, jlarkin@highlandsniptechnology.com wrote:
> I was wondering if anyone else had a problem like this.
We're currently debugging a ZYNQ 7045 that seems to lose some of its register programming when first coming up. Re-writing the registers after boot seems to be our only workaround. So far just one board is exhibiting the problem. I don't believe it is a heat problem, as we have a heatsink installed and air blowing on it, plus it happens on a cold start.

Buzz
On Wed, 30 Sep 2020 20:56:07 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:

>On 9/25/20 7:41 PM, jlarkin@highlandsniptechnology.com wrote:
>> I was wondering if anyone else had a problem like this.
>
>We're currently debugging a ZYNQ 7045 that seems to lose some of its
>register programming when first coming up. Re-writing after boot seems
>to be our only workaround to the problem. So far just one board is
>exhibiting the problem. I don't believe it is a heat problem as we have
>a heatsink installed and air blowing on it, plus it happens on a cold start.
>
>Buzz
You might test to see if it is temperature sensitive. Just spritz it with a heat gun and freeze spray.

We're still seeing our problem on some boxes. It looks like the boot-time stuff, which runs in CPU SRAM, works, but then Linux crashes when the chip is warm.

Vcc_core = 1.1 volts fixes it. 0.92 volts breaks it hard. People are still hunting.

The tools for tracking down things like this are few.

Might be a DRAM problem, but it runs the DRAM test OK.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.
Claude Bernard
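Since the post brackets the failure between 0.92 V (fails) and 1.1 V (works), one way to narrow down the margin is a simple bisection on core voltage, repeated hot and cold. The following is only a sketch: `passes` is a hypothetical hook that would set the supply, boot the board, and report whether the workload survived, and the search assumes the failure is monotonic in voltage (anything that passes at one voltage also passes at every higher one).

```python
from typing import Callable


def find_min_passing_voltage(
    passes: Callable[[float], bool],
    v_fail: float,
    v_pass: float,
    resolution: float = 0.005,
) -> float:
    """Bisect between a known-failing and a known-passing core voltage.

    `passes(v)` is a hypothetical hook: set the supply to `v` volts, boot,
    run the workload, and return True if it survives.
    """
    if passes(v_fail) or not passes(v_pass):
        raise ValueError("end points must bracket the failure")
    while v_pass - v_fail > resolution:
        mid = (v_fail + v_pass) / 2
        if passes(mid):
            v_pass = mid   # still works: the threshold is at or below mid
        else:
            v_fail = mid   # broke: the threshold is above mid
    return v_pass


if __name__ == "__main__":
    # Stand-in for real hardware: pretend the board fails below 1.00 V.
    # This threshold is invented purely to exercise the search.
    simulated = lambda v: v >= 1.00
    print(find_min_passing_voltage(simulated, v_fail=0.92, v_pass=1.10))
```

Running the same search once with freeze spray and once with the heat gun would show whether the threshold moves with temperature, which is the question raised above.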
On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:

> The tools for tracking down things like this are few.
>
> Might be a DRAM problem, but it runs the DRAM test OK.
Back in Z80 days I knew someone who could run DRAM tests all day long without a single error. And that was the only thing he could run on this Z80.

It turned out the Z80 supplies 7 bits for refresh and he had bought 64K RAMs with 8-bit refresh. And lots of them.

The DRAM test program did its own refresh by addressing all possible row addresses.

Cheers, Gerhard
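The mismatch in this story can be sketched in a few lines. This is an illustrative model, not anyone's actual test code: it assumes the Z80's R register increments only in its low 7 bits (bit 7 holds whatever was last loaded, taken as 0 here), and the function names are made up for the example.

```python
def rows_refreshed_by_z80(row_bits: int, cycles: int = 1024) -> set:
    """Rows hit by the Z80's built-in refresh over `cycles` opcode fetches.

    Only the low 7 bits of the R register count; bit 7 keeps whatever value
    was last loaded into it (taken as 0 here).
    """
    r = 0
    rows = set()
    row_mask = (1 << row_bits) - 1
    for _ in range(cycles):
        rows.add(r & row_mask)
        r = (r & 0x80) | ((r + 1) & 0x7F)  # 7-bit increment, bit 7 held
    return rows


def rows_touched_by_memory_test(row_bits: int) -> set:
    """A test that walks every address activates every row as a side effect."""
    return set(range(1 << row_bits))


if __name__ == "__main__":
    for bits in (7, 8):
        hw = rows_refreshed_by_z80(bits)
        missed = rows_touched_by_memory_test(bits) - hw
        print(f"{bits}-bit refresh DRAM: {len(hw)} rows refreshed, "
              f"{len(missed)} rows never refreshed by the CPU")
```

With an 8-bit-refresh part, half of the rows are never visited by the hardware refresh, yet a memory test that sweeps the whole address space keeps them alive on its own, which is why the test passed and nothing else ran.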
On 1.10.20 18.24, Gerhard Hoffmann wrote:
> On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:
>
>> The tools for tracking down things like this are few.
>>
>> Might be a DRAM problem, but it runs the DRAM test OK.
>
> Back in Z80 days I knew someone who could run DRAM tests
> all day long without a single error.
> And that was the only thing he could run on this Z80.
>
> Turned out the Z80 supplies 7 Bits for refresh and he had
> bought 64K rams with 8 bit refresh. And LOTs of them.
>
> The DRAM test program did its own refresh by addressing
> all possible row addresses.
>
> Cheers, Gerhard
This reminds me of a CP/M computer we built using a Z80 and DRAMs (with proper 7-bit refresh). The computer booted fine and ran as long as it was not left idle for longer than a few seconds. The idle period killed it totally.

After searching for the cause, it turned out that the refresh circuitry was totally broken (a bad chip), so the DRAMs did not forget in milliseconds, but in seconds.

--
-TV
On Thu, 1 Oct 2020 18:48:19 +0300, Tauno Voipio
<tauno.voipio@notused.fi.invalid> wrote:

>On 1.10.20 18.24, Gerhard Hoffmann wrote:
>> On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:
>>
>>> The tools for tracking down things like this are few.
>>>
>>> Might be a DRAM problem, but it runs the DRAM test OK.
>>
>> Back in Z80 days I knew someone who could run DRAM tests
>> all day long without a single error.
>> And that was the only thing he could run on this Z80.
>>
>> Turned out the Z80 supplies 7 Bits for refresh and he had
>> bought 64K rams with 8 bit refresh. And LOTs of them.
>>
>> The DRAM test program did its own refresh by addressing
>> all possible row addresses.
>>
>> Cheers, Gerhard
>
>This reminds me of a CP/M computer we built using a Z80
>and DRAMs (with proper 7 bit refresh). The computer booted
>fine and run as long as it was not left idle for longer
>than some seconds. The idle period killed it totally.
>
>After searching for the cause, it proved that the refresh
>circuitry was totally broken (a bad chip), so the DRAMs
>did not forget in milliseconds, but seconds.
Sometimes a DRAM can remember for many seconds without refresh.

We will look into possible refresh issues. We hadn't considered that. Worst case, we could maybe run a little program that did refresh.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.
Claude Bernard
On Thursday, 1 October 2020 at 07:50:45 UTC-7, jla...@highlandsniptechnology.com wrote:
..
> We're still seeing our problem on some boxes. It looks like the
> boot-time stuff, which runs in cpu SRAM, works, but then Linux crashes
> when the chip is warm.
>
> Vcc_core = 1.1 volts fixes it. 0.92 breaks it hard. People are still
> hunting.
>...
I had a tricky problem with somewhat similar symptoms (I don't remember whether it was temperature-sensitive), and it was also cured by increasing the core voltage.

We worked with Xilinx on that, and it seems that there can be package resonances in the 30-50 MHz range (this was a Virtex 5 in a large package). Our system was running with a 168 MHz clock and 5 time slots, but one of the time slots had no significant processing. The result was that we had 30 A pulses in the supply current at ~33 MHz.

I did a board spin to increase external decoupling, without any improvement. The fix we took into production was to process random data during the fifth time slot to reduce the supply current perturbations.

kw
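The ~33 MHz figure follows from the numbers in the post if each time slot is taken to be one clock cycle, which is an assumption here and not something the post states. A load pattern that repeats every 5 clocks has its fundamental at one fifth of the clock rate; the helper name below is made up for the illustration.

```python
def slot_pattern_frequency(clock_hz: float, slots: int) -> float:
    """Repetition rate of a load pattern that repeats once every `slots` clocks."""
    return clock_hz / slots


if __name__ == "__main__":
    f = slot_pattern_frequency(168e6, 5)   # clock and slot count from the post
    in_band = 30e6 <= f <= 50e6            # resonance range quoted in the post
    print(f"{f / 1e6:.1f} MHz, inside 30-50 MHz band: {in_band}")
```

Filling the idle slot with random data does not change the clock or the slot count; it flattens the current profile so there is much less energy at that repetition rate to excite the package.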
In article <rl4to3$crq$1@dont-email.me>,
Tauno Voipio  <tauno.voipio@notused.fi.invalid> wrote:

>After searching for the cause, it proved that the refresh
>circuitry was totally broken (a bad chip), so the DRAMs
>did not forget in milliseconds, but seconds.
The official spec for 4164 DRAM chips says "refresh at least every 4ms".

In an Oric (6502A-based) computer, a ULA is used to provide memory refresh as a side effect of building the TV picture. Suppressing the memory refresh by holding this ULA in a "reset" state for a second or so seems to have no effect on memory contents, even though this also stops the system 1 MHz clock.

Everything comes back working when the reset is released.

It takes at least a couple of seconds of refresh/clock loss before screen memory contents are corrupted or the system crashes (bad data/bad code in RAM, loss of dynamic registers in the 6502A).

Didn't expect that, so DRAM *is* more resilient than you'd think.

--
Mike Brown: mjb[-at-]signal11.org.uk | http://www.signal11.org.uk
On Fri, 2 Oct 2020 10:56:43 +0100 (BST), mjb@signal11.invalid (Mike)
wrote:

>In article <rl4to3$crq$1@dont-email.me>,
>Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
>
>>After searching for the cause, it proved that the refresh
>>circuitry was totally broken (a bad chip), so the DRAMs
>>did not forget in milliseconds, but seconds.
>
>The official spec for 4164 DRAM chips says "refresh at
>least every 4ms".
>
>In an Oric (6502A based) computer, a ULA is used to
>provide memory refresh as a side effect of building the
>TV picture. Suppressing the memory refresh by holding
>this ULA in a "reset" state for a second or so seems
>to have no effect on memory contents, even though this
>also stops the system 1MHz clock.
>
>Everything comes back working when the reset is released.
>
>It takes at least a couple of seconds of refresh/clock
>loss for corruption of screen memory contents or the
>system to crash (bad data/bad code in RAM, loss of
>dynamic registers in the 6502A).
>
>Didn't expect that, so DRAM *is* more resilient than you'd
>think.
We're using a Micron 64G DDR BGA part, which is "self refreshing", whatever that means. The data sheet is 132 pages.

But there are a jillion parameters that the Vivado software uses to build the DRAM interface, so maybe we have one of those wrong.

My guys like to tune for performance, and I like to tune for reliable and good enough. An older version of this product used a 68332 CPU running at 16 MHz. Now we have dual ARM cores running at 600 MHz, with cache. We don't need to push anything.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.
Claude Bernard
On Friday, 2 October 2020 at 16:15:52 UTC+2, jla...@highlandsniptechnology.com wrote:
> On Fri, 2 Oct 2020 10:56:43 +0100 (BST), mjb@signal11.invalid (Mike)
> wrote:
>
> >In article <rl4to3$crq$1@dont-email.me>,
> >Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
> >
> >>After searching for the cause, it proved that the refresh
> >>circuitry was totally broken (a bad chip), so the DRAMs
> >>did not forget in milliseconds, but seconds.
> >
> >The official spec for 4164 DRAM chips says "refresh at
> >least every 4ms".
> >
> >In an Oric (6502A based) computer, a ULA is used to
> >provide memory refresh as a side effect of building the
> >TV picture. Suppressing the memory refresh by holding
> >this ULA in a "reset" state for a second or so seems
> >to have no effect on memory contents, even though this
> >also stops the system 1MHz clock.
> >
> >Everything comes back working when the reset is released.
> >
> >It takes at least a couple of seconds of refresh/clock
> >loss for corruption of screen memory contents or the
> >system to crash (bad data/bad code in RAM, loss of
> >dynamic registers in the 6502A).
> >
> >Didn't expect that, so DRAM *is* more resilient than you'd
> >think.
>
> We're using a Micron 64G DDR BGA part, which is "self refreshing"
> whatever that means.
Is it not the same part as on the MicroZed? Try loading the standard Linux image and see if that also crashes.