
What's Your Favorite Processor on an FPGA?

Started by rickman April 20, 2013
On 22 Apr 2013 14:57:24 GMT, Allan Herriman <allanherriman@hotmail.com> wrote:

>On Mon, 22 Apr 2013 07:09:40 -0700, John Larkin wrote:
>
>> On 22 Apr 2013 12:59:27 GMT, Allan Herriman <allanherriman@hotmail.com>
>> wrote:
>>
>>>On Sun, 21 Apr 2013 09:05:49 -0700, John Larkin wrote:
>>>
>>>> The annoying thing is the CPU-to-FPGA interface. It takes a lot of
>>>> FPGA pins and it tends to be async and slow. It would be great to
>>>> have an industry-standard LVDS-type fast serial interface, with
>>>> hooks like shared memory, but transparent and easy to use.
>>>
>>>You've just described PCI Express.
>>
>> No. PCIe is insanely complex and has horrible latency. It takes
>> something like 2 microseconds to do an 8-bit read over gen1 4-lane
>> PCIe. It was designed for throughput, not latency.
>
>I agree about it being designed for throughput, not latency. However,
>with a fairly simple design, we can do 32 bit non-bursting reads or
>writes in about 350ns over a single lane of gen 1 through 1 layer of
>switching. I suspect there's some problem with your implementation
>(unless your 2 microsecond figure was just hyperbole).
Writes are relatively fast, ballpark 350 ns gen1/4-lane. Reads are slow,
around 2 us. That's from an x86 CPU into the PCIe hard core of an Altera
FPGA, cabled PCIe. A read requires two serial packets, so it takes over
twice the time of a write.

A random read or write from an embedded CPU to, say, a DPM in an FPGA
really should take tens of nanoseconds. We do parallel ARM-FPGA
transfers with a klunky async parallel interface in 100 ns or so, but it
takes a lot of pins.

From an x86 (not that we'd ever use an Intel chip in an embedded app) we
haven't found any way to move more than 32 bits in a non-DMA PCIe
read/write, even on a 64-bit CPU that has a few 128-bit MOVE opcodes.
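For concreteness, here is a minimal user-space sketch of the kind of
single-word access being discussed. The bus address is hypothetical; on
Linux, a memory BAR shows up as an mmap-able sysfs "resource0" file. The
asymmetry falls straight out of the protocol: the write is posted and
forgotten, while the read stalls the CPU until a completion packet
returns.

/* Minimal sketch of user-space MMIO over PCIe on Linux.  The device
 * path is hypothetical; substitute the FPGA's actual bus address.
 * Assumes BAR0 is a memory BAR exposed by sysfs as resource0. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    bar[0] = 0xdeadbeef;   /* posted write: one TLP goes out, no reply */
    uint32_t v = bar[0];   /* non-posted read: a request TLP goes out  */
                           /* and the CPU stalls until the completion  */
                           /* TLP returns -- that round trip is the    */
                           /* 350 ns .. 2 us being argued about here   */
    printf("readback: 0x%08x\n", v);

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}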
>> We've done three PCIe projects so far, and it's the opposite of
>> "transparent and easy to use." The PCIe spec reads like the tax code
>> and Obamacare combined.
>
>I found the spec clear. It's rather large though, and a textbook serves
>as a friendlier introduction to the subject than the spec itself.
>
>One of my co-workers was confused by the way addresses come most
>significant octet first, whilst the data come least significant octet
>first. It makes sense on a little endian machine, once you get over the
>WTF.
Little-endian is evil, another legacy of Intel's clumsiness.
>
>Hot plug is the only thing that gives us headaches. PCIe hot plug is
>needed when reconfiguring the FPGA while the system is running.
>OS support for hot plug is patchy.
We are still trying to get hot plug to work, both Linux and Windows.
HELP!

--
John Larkin                  Highland Technology Inc
www.highlandtechnology.com   jlarkin at highlandtechnology dot com

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom timing and laser controllers
Photonics and fiberoptic TTL data links
VME analog, thermocouple, LVDT, synchro, tachometer
Multichannel arbitrary waveform generators
On Mon, 22 Apr 2013 08:16:04 -0700, John Larkin wrote:

> On 22 Apr 2013 14:57:24 GMT, Allan Herriman <allanherriman@hotmail.com>
> wrote:
>
[ snip quoted latency discussion ]
>
> Writes are relatively fast, ballpark 350 ns gen1/4-lane. Reads are
> slow, around 2 us. That's from an x86 CPU into the PCIe hard core of an
> Altera FPGA, cabled PCIe. A read requires two serial packets, so it
> takes over twice the time of a write.
I thought it was faster than that. If I remember, I'll measure some in the lab tomorrow. BTW, the write requires two packets as well.
>>Hot plug is the only thing that gives us headaches. PCIe hot plug is
>>needed when reconfiguring the FPGA while the system is running.
>>OS support for hot plug is patchy.
>
> We are still trying to get hot plug to work, both Linux and Windows.
> HELP!
I don't know anything about hot plug support on Windows. On Linux,
however, there are two ways to do it:

- True hot plug. You need to use a switch (or root complex) that has
hardware support for the hot plug signals (particularly "Presence
Detect", which indicates a card is plugged in). The switch turns these
into special messages that get sent back to the RC, and the OS should
honour these and do the right thing. This should work on Windows too,
as it's part of the standard.

- Fake hot plug. With the Linux "fakephp" driver you can fake the hot
plug messages if you don't have hardware support for them. This isn't
supported in all kernel versions though. Read more here:
http://scaryreasoner.wordpress.com/2012/01/26/messing-around-with-linux-pci-hotplug/

In both cases there can be address space fragmentation that can stop the
system from working. By that I mean that the OS can't predict what will
be plugged in, so it can't know to reserve a contiguous chunk of address
space for your FPGA. The OS may do something stupid like put your
soundcard right in the middle of the space you wanted. Grrr.

Recent versions of the Linux kernel allow you to specify rules regarding
address allocation to avoid the fragmentation problem, but I've never
used them and I'm not a kernel hacker, so I don't know anything about
that.

Regards,
Allan
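On kernels recent enough to have the sysfs PCI interface, the
remove/rescan dance can also be driven by hand, without fakephp. A
rough sketch (the bus address is a placeholder, and this needs root):

/* Sketch: make Linux forget a PCIe device across an FPGA
 * reconfiguration, using the standard sysfs knobs.  "0000:01:00.0"
 * is a hypothetical bus address. */
#include <stdio.h>

static int poke(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fputs("1\n", f);              /* writing "1" triggers the action */
    fclose(f);
    return 0;
}

int main(void)
{
    /* 1. Detach the device: unbinds its driver, releases its BARs. */
    poke("/sys/bus/pci/devices/0000:01:00.0/remove");

    /* 2. ...reconfigure the FPGA here... */

    /* 3. Walk the bus again; the device is rediscovered and its BARs
     *    reassigned -- possibly at different addresses, which is the
     *    fragmentation problem described above. */
    poke("/sys/bus/pci/rescan");
    return 0;
}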
On 22 Apr 2013 16:02:14 GMT, Allan Herriman <allanherriman@hotmail.com> wrote:

>On Mon, 22 Apr 2013 08:16:04 -0700, John Larkin wrote:
>
[ snip quoted latency discussion ]
>>
>> Writes are relatively fast, ballpark 350 ns gen1/4-lane. Reads are
>> slow, around 2 us. That's from an x86 CPU into the PCIe hard core of
>> an Altera FPGA, cabled PCIe. A read requires two serial packets, so it
>> takes over twice the time of a write.
>
>I thought it was faster than that. If I remember, I'll measure some in
>the lab tomorrow.
>
>BTW, the write requires two packets as well.
Does it? Writes are buffered, and there is a credit-based flow-control
mechanism that lets writes blast away, with a "back off, Sam!" reply
packet now and then if the target can't keep up. If the target is fast,
like a RAM or something, that won't happen, and writes are
packet-limited in one direction. Probably.
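As a sanity check on the numbers in this thread: the wire time of a
single 32-bit write TLP on a gen1 x1 link works out to roughly 100 ns,
so most of the 350 ns quoted above is switch and root-complex forwarding
delay, not serialization. A back-of-envelope version, with framing
sizes taken from the PCIe 1.x spec:

/* Back-of-envelope wire time for one 32-bit memory-write TLP on a
 * gen1 x1 link.  Switch and root-complex forwarding delays (the
 * dominant cost) are not modelled. */
#include <stdio.h>

int main(void)
{
    double gbps    = 2.5 * 8.0 / 10.0;  /* 2.5 GT/s less 8b/10b overhead */
    int tlp_bytes  = 1 + 2 + 12 + 4 + 4 + 1;
                     /* STP + sequence + 3DW header + data + LCRC + END */
    double wire_ns = tlp_bytes * 8 / gbps;

    printf("TLP = %d bytes -> %.0f ns on the wire at x1\n",
           tlp_bytes, wire_ns);         /* ~96 ns */
    return 0;
}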
>>>Hot plug is the only thing that gives us headaches. PCIe hot plug is
>>>needed when reconfiguring the FPGA while the system is running.
>>>OS support for hot plug is patchy.
>>
>> We are still trying to get hot plug to work, both Linux and Windows.
>> HELP!
>
>I don't know anything about hot plug support on Windows. On Linux,
>however, there are two ways to do it:
>
>- True hot plug. You need to use a switch (or root complex) that has
>hardware support for the hot plug signals (particularly "Presence
>Detect", which indicates a card is plugged in). The switch turns these
>into special messages that get sent back to the RC, and the OS should
>honour these and do the right thing. This should work on Windows too,
>as it's part of the standard.
Yeah, Microsoft lives to honor standards.
>- Fake hot plug. With the Linux "fakephp" driver you can fake the hot
>plug messages if you don't have hardware support for them. This isn't
>supported in all kernel versions though. Read more here:
>http://scaryreasoner.wordpress.com/2012/01/26/messing-around-with-linux-pci-hotplug/
>
>In both cases there can be address space fragmentation that can stop the
>system from working. By that I mean that the OS can't predict what will
>be plugged in, so it can't know to reserve a contiguous chunk of address
>space for your FPGA. The OS may do something stupid like put your
>soundcard right in the middle of the space you wanted. Grrr.
We're assuming that an application will crash if its memory-mapped
target region (in our case, the remapped VME bus) vanishes. What we
can't do so far under Linux is re-enumerate the PCI space and start
things back up without rebooting. We have implemented all the
optocoupled sideband signals for hot plug, and training packets resume
after we reconnect. We're still working on it.

--
John Larkin                  Highland Technology Inc
www.highlandtechnology.com   jlarkin at highlandtechnology dot com
On Sun, 21 Apr 2013 14:12:05 -0700, John Larkin
<jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Sun, 21 Apr 2013 16:40:22 -0400, rickman <gnuarm@gmail.com> wrote:
>
>>On 4/21/2013 4:22 PM, John Larkin wrote:
>>> On Sun, 21 Apr 2013 17:34:12 GMT, Ralph Barone
>>> <address_is@invalid.invalid> wrote:
>>>>
>>>> and end up making new and innovative mistakes (just channeling
>>>> Murphy here).
>>>
>>> DEC wrote operating systems (TOPS10, VMS, RSTS) that ran for months
>>> between power failures, time-sharing multiple, sometimes hostile,
>>> users. We are now in the dark ages of computing, overwhelmed by bloat
>>> and slop and complexity. No wonder people are buying tablets. DEC
>>> understood things that Intel and Microsoft never really got, like:
>>> don't execute data.
>>
>>You really should stick to things you understand. Every Intel
>>processor since the 8086 has included protection mechanisms to prevent
>>the execution of data. But they have to be used properly... Blame
>>Microsoft and all the other software vendors, but don't blame Intel.
>
>The Intel memory protection is primitive.
The 8086 had execute privileges at the segment register level, and was
thus comparable to the PDP-11 with its eight segments of up to 8 KiB
each, with different protection attributes per segment.

With the 80386 and some sort of virtual memory support, Intel
unfortunately forgot to include an exe/noexe bit in each page table
entry (as in VAX/VMS), and still relied on the segment register
protection bits.
On 22.4.13 11:12 , upsidedown@downunder.com wrote:
> On Sun, 21 Apr 2013 14:12:05 -0700, John Larkin
> <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>
[ snip quoted DEC vs. Intel protection discussion ]
>>
>> The Intel memory protection is primitive.
>
> The 8086 had execute privileges at the segment register level, and was
> thus comparable to the PDP-11 with its eight segments of up to 8 KiB
> each, with different protection attributes per segment.
>
> With the 80386 and some sort of virtual memory support, Intel
> unfortunately forgot to include an exe/noexe bit in each page table
> entry (as in VAX/VMS), and still relied on the segment register
> protection bits.
The first Intel family member to have segment-based protection was the
80286, neither the 8086 nor the 80186.

There is a certain sense in Intel's policy: segmentation is for
protection and paging for virtual memory under it.

--
Tauno Voipio
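The per-page execute bit did eventually arrive: AMD64 added a no-execute
bit at the top of the 64-bit page table entry, which Intel adopted as
XD. A sketch of the relevant PTE fields (bit positions per the x86-64
architecture; the frame address is made up):

/* Sketch of the x86-64 page table entry bits relevant here.  NX
 * (bit 63) is the per-page exe/noexe bit the 80386 lacked; it needs
 * 64-bit page table entries and EFER.NXE set. */
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT  (1ULL << 0)   /* page is mapped                   */
#define PTE_WRITABLE (1ULL << 1)   /* writes allowed                   */
#define PTE_USER     (1ULL << 2)   /* user-mode access allowed         */
#define PTE_NX       (1ULL << 63)  /* no-execute: fetch faults         */

int main(void)
{
    /* A data page the way a DEC-minded OS would want it: readable,
     * writable, never executable. */
    uint64_t pte = 0x00000000cafe0000ULL          /* frame address     */
                 | PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX;

    printf("executable: %s\n", (pte & PTE_NX) ? "no" : "yes");
    return 0;
}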
On Apr 22, 5:16 pm, John Larkin
<jjlar...@highNOTlandTHIStechnologyPART.com> wrote:
> On 22 Apr 2013 14:57:24 GMT, Allan Herriman <allanherri...@hotmail.com>
> wrote:
>
[ snip quoted PCIe latency discussion ]
>
> >One of my co-workers was confused by the way addresses come most
> >significant octet first, whilst the data come least significant octet
> >first. It makes sense on a little endian machine, once you get over
> >the WTF.
>
> Little-endian is evil, another legacy of Intel's clumsiness.
>
why is it any more or less evil than big endian?

-Lasse
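For what it's worth, the whole dispute is over nothing more than the
order in which a multi-byte value's octets land in memory. A quick
demonstration:

/* How the 32-bit value 0x0A0B0C0D lands in memory.  On a little-endian
 * machine (x86) the least significant octet comes first; big-endian is
 * the reverse. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = 0x0A0B0C0D;
    const uint8_t *bytes = (const uint8_t *)&word;

    for (int i = 0; i < 4; i++)
        printf("byte %d: 0x%02X\n", i, bytes[i]);
    /* x86 prints 0D 0C 0B 0A -- which is also why PCIe's
     * most-significant-octet-first addresses look backwards there. */
    return 0;
}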
On Mon, 22 Apr 2013 23:19:50 +0300, Tauno Voipio
<tauno.voipio@notused.fi.invalid> wrote:


>The first Intel family member to have segment-based protection was the
>80286, neither the 8086 nor the 80186.
I have actively tried to forget that I did some satellite image and
planetary probe image analysis using an i286 machine with a 10 MHz
clock :-)
On Mon, 22 Apr 2013 13:37:21 -0700 (PDT), "langwadt@fonz.dk"
<langwadt@fonz.dk> wrote:

>> Little-endian is evil, another legacy of Intel's clumsiness.
>
>why is it any more or less evil than big endian?
>
>-Lasse
!sdrawkcab s'ti esuaceB

--
John Larkin         Highland Technology, Inc
jlarkin at highlandtechnology dot com
http://www.highlandtechnology.com

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom laser drivers and controllers
Photonics and fiberoptic TTL data links
VME thermocouple, LVDT, synchro acquisition and simulation
John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Sun, 21 Apr 2013 08:23:37 -0500, Vladimir Vassilevsky
><nospam@nowhere.com> wrote:
>
>>On 4/20/2013 5:42 PM, rickman wrote:
>>> I have been working on designs of processors for FPGAs for quite a
>>> while. I have looked at the uBlaze, the picoBlaze, the NIOS, two from
>>> Lattice and any number of open source processors. Many of the open
>>> source designs were stack processors since they tend to be small and
>>> efficient in an FPGA. J1 is one I had pretty much missed until lately.
>>> It is fast and small and looks like it wasn't too hard to design
>>> (although looks may be deceptive), I'm impressed. There is also the
>>> b16 from Bernd Paysan, the uCore, the ZPU and many others.
>>>
>>> Lately I have been looking at a hybrid approach which combines
>>> features of addressing registers in order to access parameters of a
>>> stack CPU. It looks interesting.
>>>
>>> Anyone else here doing processor designs on FPGAs?
>>
>>Soft core is a fun thing to do, but otherwise I see no use.
>>Except for very few special applications, a standalone processor is
>>better than an FPGA soft core in every point, especially the price.
Most entry-level scopes consist of an FPGA running a soft processor.
>The annoying thing is the CPU-to-FPGA interface. It takes a lot of FPGA
>pins and it tends to be async and slow. It would be great to have an
>industry-standard LVDS-type fast serial interface, with hooks like
>shared memory, but transparent and easy to use.
You mean PCI express? :-)

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
On Mon, 22 Apr 2013 09:27:04 -0700, John Larkin wrote:

[ snip pcie hot plug discussion ]

> We're assuming that an application will crash if its memory-mapped
> target region (in our case, the remapped VME bus) vanishes. What we
> can't do so far under Linux is re-enumerate the PCI space and start
> things back up without rebooting.
With fakephp, you should just need to rescan that slot. With proper hot
swap hardware support, it should just happen automatically. (As if
anything would go wrong with that!)

When the hot plug removal event happens, the OS is meant to unload the
drivers. The drivers get reloaded after the hot plug insertion event.
Possibly not the same drivers as before, if the FPGA contains something
else.

Your higher level application needs to be aware that the driver can come
and go with the hot plug events. You'll need some sort of mechanism to
inform the application (e.g. a signal). Presumably the application is
the actual cause of the FPGA reconfiguration, in which case it knows
when the FPGA is there or not and doesn't need to be told.
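As a rough sketch of that application-side awareness: unmap the BAR
before the FPGA vanishes, remap it after the rescan. The signal choice
and device path here are arbitrary placeholders; a real design might
hook udev events instead.

/* Sketch of an application tolerating its memory-mapped FPGA region
 * coming and going across hot plug events.  SIGUSR1 and the device
 * path are hypothetical choices. */
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fpga_gone;

static void on_unplug(int sig) { (void)sig; fpga_gone = 1; }

static volatile uint32_t *map_bar(void)
{
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping outlives the fd  */
    return p == MAP_FAILED ? NULL : p;
}

int main(void)
{
    signal(SIGUSR1, on_unplug);        /* "FPGA about to vanish" note  */
    volatile uint32_t *bar = map_bar();

    while (bar) {
        if (fpga_gone) {
            munmap((void *)bar, 4096); /* unmap BEFORE the device goes */
            fpga_gone = 0;
            sleep(1);                  /* wait out reconfig + rescan   */
            bar = map_bar();           /* remap, maybe at a new BAR    */
            continue;
        }
        /* ...normal register traffic... */
        sleep(1);
    }
    return 0;
}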
> We have implemented all the optocoupled sideband signals for hot plug,
> and training packets resume after we reconnect. We're still working on
> it.
I found that just the presence detect was needed for reliable hot plug.
All the others are optional.

Regards,
Allan