Reply by John Larkin October 5, 2020
On Mon, 5 Oct 2020 13:20:26 -0700 (PDT), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Monday, 5 October 2020 at 22:06:28 UTC+2, John Larkin wrote:
>> On Fri, 25 Sep 2020 12:16:07 -0700, John Larkin
>> <jlarkin@highland_atwork_technology.com> wrote:
>>
>> >I have a time-critical thing where the signal passes through an XC7A15
>> >FPGA and does a fair lot of stuff inside. I measured delay vs some
>> >voltages:
>> >
>> >1.8 aux no measurable DC effect
>> >
>> >3.3 vccio no measurable DC effect
>> >
>> >2.5 vccio ditto (key io's are LVDS in this bank)
>> >
>> >+1 core -10 ps per millivolt!
>> >
>> >If I vary the trigger frequency, I can see the delay heterodyning
>> >against the 1.8V switcher frequency, a few ps p-p maybe. Gotta track
>> >that down.
>> >
>> >A spritz of freeze spray on the chip had practically no effect on
>> >delay through the chip, on a scope at 100 ps/div.
>> >
>> >I expected sensitivity to core voltage, so we'll make sure we have a
>> >serious, analog-quality voltage regulator next rev.
>> >
>> >The temperature thing surprised me. I was used to CMOS having a
>> >serious positive delay TC. Maybe modern FPGAs have some sort of
>> >temperature compensation designed in?
>> >
>> >We also have a ZYNQ on this board that crashes the ARM core
>> >erratically, especially when the chip is hot. It might crash in maybe
>> >a half hour MTBF if the chip reports 55C internally; the FPGA part
>> >keeps going. At powerup boot from an SD card, it will always configure
>> >the PL FPGA side, but will then fail to run our application if the
>> >chip is hot. We're playing with DRAM and CPU clock rates to see if
>> >that has much effect.
>> >
>>
>> Fixed both problems.
>>
>> Jitter: replaced the 1.8V Vccaux switcher with a linear regulator.
>>
>
>I believe the mixed mode clock manger and pll in the PL is powered from Vccaux
I did a static sensitivity test on the critical-path FPGA. It showed essentially zero change in through-chip delay vs Vccaux. It was super sensitive to core voltage. But the +1 core supply was LDO'ed from the noisy 1.8, so maybe some noise sneaked through there.

I'm going to rip out some switchers and use a chain of LDOs to make the various supplies for the critical XC7A15 FPGA. The Zynq is not in the picoseconds-time-critical path.

+5 ldo to 3.3 for i/o banks
3.3 ldo to 2.5 for the bank that does LVDS
2.5 ldo to 1.8 for aux
1.8 ldo to 1.0 for core

in one long string.

We're using ST1L08 regs, super low dropout, good filtering, small and cheap.
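To get a rough feel for how little noise needs to leak through the core LDO to matter, here is a minimal sketch that converts supply ripple into delay modulation using the -10 ps/mV core sensitivity measured earlier in the thread. The input ripple and the LDO rejection (PSRR) values are made-up illustrative numbers, not measurements and not ST1L08 data sheet figures.

```python
# Rough estimate: supply ripple -> through-chip delay modulation.
# The sensitivity is the figure measured in this thread; the ripple and
# PSRR values used below are hypothetical placeholders.

CORE_SENSITIVITY_PS_PER_MV = -10.0  # measured: delay change per mV of core voltage


def ripple_after_ldo(ripple_in_mv: float, psrr_db: float) -> float:
    """Ripple left at the LDO output for a given input ripple and rejection in dB."""
    return ripple_in_mv / (10.0 ** (psrr_db / 20.0))


def delay_modulation_ps(core_ripple_mv: float) -> float:
    """Peak-to-peak delay modulation caused by peak-to-peak core ripple."""
    return abs(CORE_SENSITIVITY_PS_PER_MV) * core_ripple_mv


if __name__ == "__main__":
    ripple_in_mv = 20.0  # assumed p-p ripple on the 1.8 V rail feeding the core LDO
    for psrr_db in (20.0, 40.0, 60.0):  # assumed rejection at the switcher frequency
        core_mv = ripple_after_ldo(ripple_in_mv, psrr_db)
        print(f"PSRR {psrr_db:4.0f} dB: {core_mv:6.3f} mV p-p on core, "
              f"{delay_modulation_ps(core_mv):6.2f} ps p-p delay")
```

At -10 ps/mV, the few ps p-p of heterodyning reported earlier would correspond to only a few tenths of a millivolt on the core rail, whatever path it took to get there.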
Reply by Lasse Langwadt Christensen October 5, 2020
On Monday, 5 October 2020 at 22:06:28 UTC+2, John Larkin wrote:
> On Fri, 25 Sep 2020 12:16:07 -0700, John Larkin
> <jlarkin@highland_atwork_technology.com> wrote:
>
> >I have a time-critical thing where the signal passes through an XC7A15
> >FPGA and does a fair lot of stuff inside. I measured delay vs some
> >voltages:
> >
> >1.8 aux no measurable DC effect
> >
> >3.3 vccio no measurable DC effect
> >
> >2.5 vccio ditto (key io's are LVDS in this bank)
> >
> >+1 core -10 ps per millivolt!
> >
> >If I vary the trigger frequency, I can see the delay heterodyning
> >against the 1.8V switcher frequency, a few ps p-p maybe. Gotta track
> >that down.
> >
> >A spritz of freeze spray on the chip had practically no effect on
> >delay through the chip, on a scope at 100 ps/div.
> >
> >I expected sensitivity to core voltage, so we'll make sure we have a
> >serious, analog-quality voltage regulator next rev.
> >
> >The temperature thing surprised me. I was used to CMOS having a
> >serious positive delay TC. Maybe modern FPGAs have some sort of
> >temperature compensation designed in?
> >
> >We also have a ZYNQ on this board that crashes the ARM core
> >erratically, especially when the chip is hot. It might crash in maybe
> >a half hour MTBF if the chip reports 55C internally; the FPGA part
> >keeps going. At powerup boot from an SD card, it will always configure
> >the PL FPGA side, but will then fail to run our application if the
> >chip is hot. We're playing with DRAM and CPU clock rates to see if
> >that has much effect.
> >
>
> Fixed both problems.
>
> Jitter: replaced the 1.8V Vccaux switcher with a linear regulator.
>
I believe the mixed-mode clock manager and PLL in the PL are powered from Vccaux.
Reply by John Larkin October 5, 2020
On Fri, 25 Sep 2020 12:16:07 -0700, John Larkin
<jlarkin@highland_atwork_technology.com> wrote:

>I have a time-critical thing where the signal passes through an XC7A15
>FPGA and does a fair lot of stuff inside. I measured delay vs some
>voltages:
>
>1.8 aux no measurable DC effect
>
>3.3 vccio no measurable DC effect
>
>2.5 vccio ditto (key io's are LVDS in this bank)
>
>+1 core -10 ps per millivolt!
>
>If I vary the trigger frequency, I can see the delay heterodyning
>against the 1.8V switcher frequency, a few ps p-p maybe. Gotta track
>that down.
>
>A spritz of freeze spray on the chip had practically no effect on
>delay through the chip, on a scope at 100 ps/div.
>
>I expected sensitivity to core voltage, so we'll make sure we have a
>serious, analog-quality voltage regulator next rev.
>
>The temperature thing surprised me. I was used to CMOS having a
>serious positive delay TC. Maybe modern FPGAs have some sort of
>temperature compensation designed in?
>
>We also have a ZYNQ on this board that crashes the ARM core
>erratically, especially when the chip is hot. It might crash in maybe
>a half hour MTBF if the chip reports 55C internally; the FPGA part
>keeps going. At powerup boot from an SD card, it will always configure
>the PL FPGA side, but will then fail to run our application if the
>chip is hot. We're playing with DRAM and CPU clock rates to see if
>that has much effect.
>
Fixed both problems.

Jitter: replaced the 1.8V Vccaux switcher with a linear regulator.

Temperature-dependent crashing: I found an oscillation on the Zynq 1V core power supply, about 100 mV p-p at 80 kHz. Putting a lot more capacitance at the switcher output kills that and makes the crash go away.

The regulator design followed a chart in the LTM8078 data sheet. A Spice sim with the original values looks stable, no oscillation and a clean load-step recovery. There are other indications that ADI's Spice model of the LTM8078 is less than perfect.

I think ADI is struggling to add a lot of new parts to the LT Spice libraries. Mike E in an interview suggested that rushing them out was compromising quality. Then he quit.

Glad I fixed it this way. Guys were snooping the AXI bus and Linux at great expense and with no progress.
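For scale, a minimal sketch of what that oscillation does to the rail. It assumes the oscillation is symmetric about the 1 V nominal, which the post does not state, and it borrows the 0.92 V hard-failure figure reported in an earlier post in this thread purely for comparison.

```python
# Supply excursion for an oscillation riding on a nominal rail.
# Assumption: the oscillation is symmetric about nominal (not stated in the post).

def rail_extremes(nominal_v: float, ripple_pp_v: float) -> tuple[float, float]:
    """Return (min, max) rail voltage for a symmetric peak-to-peak ripple."""
    half = ripple_pp_v / 2.0
    return nominal_v - half, nominal_v + half


def period_us(freq_hz: float) -> float:
    """Oscillation period in microseconds."""
    return 1e6 / freq_hz


if __name__ == "__main__":
    vmin, vmax = rail_extremes(1.0, 0.100)  # 1 V core, about 100 mV p-p
    print(f"core swings {vmin:.3f} V to {vmax:.3f} V")
    print(f"period {period_us(80e3):.1f} us")  # about 80 kHz
    # 0.92 V is the level reported elsewhere in the thread as failing hard.
    print(f"margin above 0.92 V: {(vmin - 0.92) * 1000:.0f} mV")
```

Under that symmetry assumption the rail dips to within a few tens of millivolts of the level that was reported to fail hard, which is at least consistent with marginal, temperature-sensitive crashes.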
Reply by Lasse Langwadt Christensen October 2, 2020
On Friday, 2 October 2020 at 16:15:52 UTC+2, jla...@highlandsniptechnology.com wrote:
> On Fri, 2 Oct 2020 10:56:43 +0100 (BST), mjb@signal11.invalid (Mike)
> wrote:
>
> >In article <rl4to3$crq$1@dont-email.me>,
> >Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
> >
> >>After searching for the cause, it proved that the refresh
> >>circuitry was totally broken (a bad chip), so the DRAMs
> >>did not forget in milliseconds, but seconds.
> >
> >The official spec for 4164 DRAM chips says "refresh at
> >least every 4ms".
> >
> >In an Oric (6502A based) computer, a ULA is used to
> >provide memory refresh as a side effect of building the
> >TV picture. Suppressing the memory refresh by holding
> >this ULA in a "reset" state for a second or so seems
> >to have no effect on memory contents, even though this
> >also stops the system 1MHz clock.
> >
> >Everything comes back working when the reset is released.
> >
> >It takes at least a couple of seconds of refresh/clock
> >loss for corruption of screen memory contents or the
> >system to crash (bad data/bad code in RAM, loss of
> >dynamic registers in the 6502A).
> >
> >Didn't expect that, so DRAM *is* more resilient than you'd
> >think.
>
> We're using a Micron 64G DDR BGA part, which is "self refreshing"
> whatever that means.
Is it not the same part as on the MicroZed? Try loading the standard Linux image and see if that also crashes.
Reply by October 2, 2020
On Fri, 2 Oct 2020 10:56:43 +0100 (BST), mjb@signal11.invalid (Mike)
wrote:

>In article <rl4to3$crq$1@dont-email.me>,
>Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
>
>>After searching for the cause, it proved that the refresh
>>circuitry was totally broken (a bad chip), so the DRAMs
>>did not forget in milliseconds, but seconds.
>
>The official spec for 4164 DRAM chips says "refresh at
>least every 4ms".
>
>In an Oric (6502A based) computer, a ULA is used to
>provide memory refresh as a side effect of building the
>TV picture. Suppressing the memory refresh by holding
>this ULA in a "reset" state for a second or so seems
>to have no effect on memory contents, even though this
>also stops the system 1MHz clock.
>
>Everything comes back working when the reset is released.
>
>It takes at least a couple of seconds of refresh/clock
>loss for corruption of screen memory contents or the
>system to crash (bad data/bad code in RAM, loss of
>dynamic registers in the 6502A).
>
>Didn't expect that, so DRAM *is* more resilient than you'd
>think.
We're using a Micron 64G DDR BGA part, which is "self refreshing", whatever that means. The data sheet is 132 pages. But there are a jillion parameters that the Vivado software uses to build the DRAM interface, so maybe we have one of those wrong.

My guys like to tune for performance, and I like to tune for reliable and good enough. An older version of this product used a 68332 CPU running at 16 MHz. Now we have dual ARM cores running at 600 MHz, with cache. We don't need to push anything.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.

Claude Bernard
Reply by Mike October 2, 2020
In article <rl4to3$crq$1@dont-email.me>,
Tauno Voipio  <tauno.voipio@notused.fi.invalid> wrote:

>After searching for the cause, it proved that the refresh
>circuitry was totally broken (a bad chip), so the DRAMs
>did not forget in milliseconds, but seconds.
The official spec for 4164 DRAM chips says "refresh at least every 4ms".

In an Oric (6502A based) computer, a ULA is used to provide memory refresh as a side effect of building the TV picture. Suppressing the memory refresh by holding this ULA in a "reset" state for a second or so seems to have no effect on memory contents, even though this also stops the system 1MHz clock.

Everything comes back working when the reset is released.

It takes at least a couple of seconds of refresh/clock loss for corruption of screen memory contents or the system to crash (bad data/bad code in RAM, loss of dynamic registers in the 6502A).

Didn't expect that, so DRAM *is* more resilient than you'd think.

--
--------------------------------------+------------------------------------
Mike Brown: mjb[-at-]signal11.org.uk  |  http://www.signal11.org.uk
Reply by ke.....@kjwdesigns.com October 1, 2020
On Thursday, 1 October 2020 at 07:50:45 UTC-7, jla...@highlandsniptechnology.com wrote:
..
> We're still seeing our problem on some boxes. It looks like the
> boot-time stuff, which runs in cpu SRAM, works, but then Linux crashes
> when the chip is warm.
>
> Vcc_core = 1.1 volts fixes it. 0.92 breaks it hard. People are still
> hunting.
>...
I had a tricky problem with somewhat similar symptoms (I don't remember whether it was temperature-sensitive), and it was also cured by increasing the core voltage.

We worked with Xilinx on that, and it seems that there can be package resonances in the 30-50 MHz range (this was a Virtex 5 in a large package). Our system was running with a 168 MHz clock and 5 time slots, but one of the time slots had no significant processing. The result was that we had 30 A pulses in the supply current at ~33 MHz.

I did a board spin to increase external decoupling, without any improvement. The fix we took into production was to process random data during the fifth time slot to reduce the supply current perturbations.

kw
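The ~33 MHz figure falls straight out of the slot arithmetic. A minimal sketch, assuming one clock cycle per time slot (not stated above, but consistent with the quoted numbers); the function names are just for illustration:

```python
# Repetition rate of a periodic idle slot in a time-slotted design.
# Clock rate, slot count and resonance band are the figures quoted above.
# Assumption: each time slot lasts one clock cycle.

def frame_rate_mhz(clock_mhz: float, slots_per_frame: int) -> float:
    """How often the whole slot pattern (and so the idle slot) repeats."""
    return clock_mhz / slots_per_frame


def in_band(freq_mhz: float, low_mhz: float, high_mhz: float) -> bool:
    """True if freq_mhz lies inside the given band, limits included."""
    return low_mhz <= freq_mhz <= high_mhz


if __name__ == "__main__":
    rate = frame_rate_mhz(168.0, 5)
    print(f"load current pattern repeats at {rate:.1f} MHz")
    print("inside 30-50 MHz resonance band:", in_band(rate, 30.0, 50.0))
```

Filling the idle slot with dummy work does not move that frequency; it shrinks the current step that excites it, which matches the production fix described above.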
Reply by October 1, 2020
On Thu, 1 Oct 2020 18:48:19 +0300, Tauno Voipio
<tauno.voipio@notused.fi.invalid> wrote:

>On 1.10.20 18.24, Gerhard Hoffmann wrote:
>> On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:
>>
>>>
>>> The tools for tracking down things like this are few.
>>>
>>> Might be a DRAM problem, but it runs the DRAM test OK.
>>
>> Back in Z80 days I knew someone who could run DRAM tests
>> all day long without a single error.
>> And that was the only thing he could run on this Z80.
>>
>> Turned out the Z80 supplies 7 Bits for refresh and he had
>> bought 64K rams with 8 bit refresh. And LOTs of them.
>>
>> The DRAM test program did its own refresh by addressing
>> all possible row adresses.
>>
>>
>> Cheers, Gerhard
>
>This reminds me of a CP/M computer we built using a Z80
>and DRAMs (with proper 7 bit refresh). The computer booted
>fine and run as long as it was not left idle for longer
>than some seconds. The idle period killed it totally.
>
>After searching for the cause, it proved that the refresh
>circuitry was totally broken (a bad chip), so the DRAMs
>did not forget in milliseconds, but seconds.
Sometimes a DRAM can remember for many seconds without refresh.

We will look into possible refresh issues. We hadn't considered that. Worst case, we could maybe run a little program that did refresh.

--
John Larkin Highland Technology, Inc

Science teaches us to doubt.

Claude Bernard
Reply by Tauno Voipio October 1, 2020
On 1.10.20 18.24, Gerhard Hoffmann wrote:
> On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:
>
>>
>> The tools for tracking down things like this are few.
>>
>> Might be a DRAM problem, but it runs the DRAM test OK.
>
> Back in Z80 days I knew someone who could run DRAM tests
> all day long without a single error.
> And that was the only thing he could run on this Z80.
>
> Turned out the Z80 supplies 7 Bits for refresh and he had
> bought 64K rams with 8 bit refresh. And LOTs of them.
>
> The DRAM test program did its own refresh by addressing
> all possible row adresses.
>
>
> Cheers, Gerhard
This reminds me of a CP/M computer we built using a Z80 and DRAMs (with proper 7 bit refresh). The computer booted fine and ran as long as it was not left idle for longer than some seconds. The idle period killed it totally.

After searching for the cause, it proved that the refresh circuitry was totally broken (a bad chip), so the DRAMs did not forget in milliseconds, but seconds.

--

-TV
Reply by Gerhard Hoffmann October 1, 2020
On 01.10.20 at 16:50, jlarkin@highlandsniptechnology.com wrote:

> The tools for tracking down things like this are few.
>
> Might be a DRAM problem, but it runs the DRAM test OK.
Back in Z80 days I knew someone who could run DRAM tests all day long without a single error. And that was the only thing he could run on this Z80.

Turned out the Z80 supplies 7 bits for refresh and he had bought 64K RAMs with 8 bit refresh. And LOTs of them.

The DRAM test program did its own refresh by addressing all possible row addresses.

Cheers, Gerhard
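A toy model of why the memory test passed while everything else failed. It assumes the refresh counter drives the low row-address bits and that the remaining row-address bit sits at a fixed level during refresh; the function name and parameters are hypothetical, chosen only for this illustration.

```python
# Toy model of DRAM row refresh coverage.
# Assumption: a counter of `counter_bits` drives the low row-address bits and
# any higher row-address bit stays at a fixed level during refresh cycles.

def refreshed_rows(counter_bits: int, row_bits: int, fixed_high: int = 0) -> set[int]:
    """Rows visited by one full cycle of the refresh counter."""
    row_mask = (1 << row_bits) - 1
    high = fixed_high << counter_bits
    return {(high | count) & row_mask for count in range(1 << counter_bits)}


if __name__ == "__main__":
    all_rows = set(range(1 << 8))      # DRAM that needs 8-bit refresh
    hw = refreshed_rows(7, 8)          # 7-bit hardware refresh counter
    missed = all_rows - hw
    print(f"hardware refresh covers {len(hw)} of {len(all_rows)} rows")
    print(f"rows never refreshed by hardware: {len(missed)}")

    # A memory test that keeps reading every row refreshes each row as a
    # side effect, so nothing decays while the test is running.
    touched_by_test = set(all_rows)
    print(f"rows still unrefreshed during the test: {len(missed - touched_by_test)}")
```

Ordinary programs touch only some rows, so the half that the hardware never refreshes decays unless the code happens to access it often enough.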