Description
The goal of this issue is to document the kinds of PLL primitives of the various FPGA manufacturers, so that we can support them in nmigen.
Lattice
Lattice PLLs are different for the different FPGA families.
iCE40 (LP/LM/HX/Ultra/UltraLite/UltraPlus)
The iCE40 family has five types of PLLs. They are all capable of shifting the output clock by 0 and 90 degrees, and allow fine delay adjustments of up to 2.5 ns (typical) in 150 ps increments (typical).
SB_PLL40_CORE
Can be used if the source clock of the PLL originates on the FPGA or is driven by an input pad that is not in the bottom IO bank (bank 2).
SB_PLL40_PAD
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). When using this PLL, the source clock cannot be used anymore.
SB_PLL40_2_PAD
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). This PLL outputs the requested clock as well as the source clock.
SB_PLL40_2F_CORE
Can generate two different output frequencies. Can be used if the source clock of the PLL originates on the FPGA.
SB_PLL40_2F_PAD
Can generate two different output frequencies. Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively).
ECP5
The ECP5 family has a bunch of clocking elements, but only one type of PLL.
EHXPLLL
It has four outputs, but if the user wants to hook up the PLL feedback, then one of the outputs cannot be used in the fabric. It supports dynamic clock selection and control, dynamic phase adjustment, etc.
Xilinx
Xilinx has two clock synthesis IPs as far as I know: PLL and Mixed-Mode Clock Manager (MMCM). The different FPGA families have different versions of these IPs.
Spartan-6
The Spartan-6 family has Clock Management Tiles (CMTs), each of which contains one PLL and two Digital Clock Managers (DCMs). The latter can be used to implement delay-locked loops, digital frequency synthesizers, digital phase shifters, or digital spread-spectrum clocking.
There are two PLL primitives, each with six clock outputs and a dedicated feedback output.
PLL_BASE
PLL_BASE provides access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.
PLL_ADV
PLL_ADV provides access to all PLL_BASE features plus additional ones, such as dynamically reconfiguring the PLL.
7 Series
The 7 series also has CMTs, each of which contains an MMCM and a PLL. The PLL contains a subset of the functions of the MMCM. The PLL has six clock outputs, whereas the MMCM has seven clock outputs. Both have a dedicated feedback output.
MMCME2_BASE and PLLE2_BASE
Both BASE IPs provide access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.
MMCME2_ADV and PLLE2_ADV
The MMCME2_ADV IP provides access to all BASE features plus additional ones, such as extra ports for clock switching, access to the Dynamic Reconfiguration Port (DRP), and dynamic fine-phase shifting. The PLLE2_ADV IP provides the same features, except for dynamic fine-phase shifting.
UltraScale / UltraScale+
The CMTs in the UltraScale family contain an MMCM and two PLLs. Just like with the 7 series FPGAs, there are _BASE and _ADV IPs, so the differences between them won't be repeated here.
MMCM
The MMCMs have seven clock outputs and a dedicated feedback output. The MMCM IPs for the UltraScale are MMCME3_BASE and MMCME3_ADV, whereas for the UltraScale+ they are MMCME4_BASE and MMCME4_ADV.
PLL
The PLLs have two clock outputs each. Similarly to the MMCMs, the PLL IPs are PLLE3_BASE and PLLE3_ADV for the UltraScale family, and PLLE4_BASE and PLLE4_ADV for the UltraScale+ family.
Intel
From what I can gather, all Intel FPGA PLLs are instantiated using the ALTPLL IP, and all PLLs have five clock outputs.
Common Remarks
- PLLs multiply and divide clocks to achieve the desired output clock frequencies, but they are limited by the allowed minimum and maximum multiplier, divider, and intermediate clock frequencies. These values can also depend on the speed grade of the FPGA, which means that we should be able to supply the speed grade of the FPGA.
- The calculation of how to achieve the desired clock frequencies given the input clock frequencies differs between PLLs.
- Most PLLs have additional functionality that we could initially leave unsupported and slowly introduce piece by piece. Also, there are other IPs that are often used in conjunction with PLLs, so they could be added too.
- For Xilinx clocking resources, we could just always use the _ADV IPs and supply default values for the additional features that are not used.
- The information above can of course contain errors :)
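The first remark can be made concrete with a brute-force search. The following is a minimal sketch for the iCE40 in simple feedback mode, assuming the relation f_out = f_in * (DIVF + 1) / ((DIVR + 1) * 2^DIVQ) and the frequency limits quoted in the comments. Both are recalled from the Lattice documentation and should be checked against the datasheet for the actual device and speed grade. The function name ice40_pll_params is hypothetical and not part of nMigen.

```python
from itertools import product

# Assumed iCE40 limits in MHz; verify against the datasheet for your device.
PFD_RANGE = (10.0, 133.0)    # phase detector input: f_in / (DIVR + 1)
VCO_RANGE = (533.0, 1066.0)  # VCO frequency: f_pfd * (DIVF + 1)
OUT_RANGE = (16.0, 275.0)    # PLL output: f_vco / 2**DIVQ


def in_range(value, limits):
    return limits[0] <= value <= limits[1]


def ice40_pll_params(f_in, f_req):
    """Return (error, f_out, divr, divf, divq) closest to f_req, or None.

    Frequencies are in MHz. Every divider combination is tried and the
    ones that violate an intermediate frequency limit are discarded.
    """
    best = None
    for divr, divf, divq in product(range(16), range(64), range(1, 7)):
        f_pfd = f_in / (divr + 1)
        f_vco = f_pfd * (divf + 1)
        f_out = f_vco / (1 << divq)
        if not (in_range(f_pfd, PFD_RANGE)
                and in_range(f_vco, VCO_RANGE)
                and in_range(f_out, OUT_RANGE)):
            continue
        candidate = (abs(f_out - f_req), f_out, divr, divf, divq)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    # 12 MHz oscillator, 48 MHz requested.
    print(ice40_pll_params(12.0, 48.0))
```

The search space here is only a few thousand combinations, so exhaustive search is cheap. Other families would need their own formula and limits.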
Approach
Maybe it would be a good idea to just start with the iCE40 family (since it's the simplest, and I use it in my current design). We could implement the PLLs similarly to how LiteX does it.
Activity
jeanthom commented on Jul 8, 2020
I'd rather think that the ECP5 would be a better starting point since there is only one PLL type to take care of. I'm not sure if it has completely been reverse engineered though.
GuzTech commented on Jul 8, 2020
Fair enough, I'll see if I can whip up something for the ECP5 and share it here.
GuzTech commented on Jul 8, 2020
I have an initial implementation in #426.
GuzTech commented on Jul 9, 2020
Here are some of my thoughts on how a PLL could be used:
PLL Creation
As I see it, there are three ways of creating a PLL.
[...] SB_PLL40_2F_CORE).
[...] iCE40PLL(SB_PLL40_PAD)).
[...]
The first one is basically a class that instantiates the specific PLL primitive, with some logic that calculates the PLL parameters. I think this leads to repeated code, since most primitives differ only slightly among themselves.
The second one somewhat abstracts away the specifics of each PLL primitive and leads to code reuse.
The third one looks like it would require quite a bit of code. Who decides what the "correct" PLL primitive is? For example, for the iCE40 family, the usage of the _CORE and _PAD primitives is limited by which bank the input clock originates from. We would have to be able to check this, and I don't think this is the responsibility of nMigen.
Creating Output Clocks
In LiteX, you create a PLL object and call register_clkin on it, where you give the input clock signal and its frequency. For each output clock, you first create a ClockDomain, then call create_clkout on the PLL object and supply the ClockDomain. Finally, you add a period constraint to the Platform.
I like how this works, but of course I'm open to suggestions. It might not make sense for the iCE40 PLL primitives with one clock output, but it would be consistent.
Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.
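To make the call sequence concrete, here is a rough sketch using a hypothetical SketchPLL class. It only mimics the register_clkin / create_clkout flow described above with plain Python objects. It is not the LiteX or nMigen API, and the constraints dictionary is an assumed stand-in for whatever the Platform would use to record period constraints.

```python
class SketchPLL:
    """Hypothetical stand-in that mimics the LiteX-style call sequence."""

    def __init__(self, constraints=None):
        self.clkin = None
        self.clkouts = []
        # Optional dict standing in for the platform's constraint store.
        self.constraints = constraints

    def register_clkin(self, signal_name, freq_hz):
        # Remember which signal feeds the PLL and how fast it is.
        self.clkin = (signal_name, freq_hz)

    def create_clkout(self, domain_name, freq_hz):
        if self.clkin is None:
            raise RuntimeError("call register_clkin() first")
        self.clkouts.append((domain_name, freq_hz))
        # If a constraint store was supplied, record the period (in ns)
        # automatically so the user cannot forget it.
        if self.constraints is not None:
            self.constraints[domain_name] = 1e9 / freq_hz


constraints = {}
pll = SketchPLL(constraints)
pll.register_clkin("clk25", 25e6)
pll.create_clkout("sync", 50e6)
print(constraints)
```

Passing the constraint store at construction time is what lets create_clkout add the period constraint on its own. Leaving it out falls back to the manual flow.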
rroohhh commented on Jul 9, 2020
One more thing that might be interesting to think about (but not sure if this actually fits this issue) is where the PLL gets stored. Given there is usually only a small number of them, it almost feels like putting them into the platform could be a good idea. It would make it easier to generate the required clocks for different parts of a design without manually passing a PLL instance around or having to create all clocks toplevel.
Of course it also has downsides, like making phase relations less obvious.
Another thing (at least on Xilinx 7 series, not sure if there are similar things on other platforms) is the placement of the PLL. If it is in the same clock region as, say, the pin the input clock comes into the FPGA from, one only needs a regional clock buffer for the input clock; if it is not, one needs a global buffer. This is probably far out (and maybe out of scope), but some support from nMigen to help with choosing the correct clock buffer, or automatically finding a good combination, would be pretty interesting.
whitequark commented on Jul 9, 2020
Okay, so I think we have two basic problems here that are essentially separate:
It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations? Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.
rroohhh commented on Jul 9, 2020
I think the PLL should definitely create the clock constraints and given that the platform is already passed to elaborate it should be fairly easy to do that there.
whitequark commented on Jul 9, 2020
This isn't usually necessary because the backend toolchain already knows the output frequency if you constrained the inputs. But in any case, the platform is the factory of PLLs, so a PLL naturally knows what the platform is.
rroohhh commented on Jul 9, 2020
Oh whoops somehow that slipped my mind.
GuzTech commented on Jul 9, 2020
These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?
Is brute forcing an actual issue? It might just be me, but it's not like Python chokes on calculating the parameters every time I build my design. Maybe I'm overlooking something?
whitequark commented on Jul 9, 2020
Yes, with integer solutions, and some inequalities too, since none of the parameters have an arbitrary range.
We can start with brute forcing and improve it later, assuming that our data representation doesn't prevent us from doing so. (E.g. expressing the equations with arbitrary Python code obviously won't work.)
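As an illustration of what a data-only representation could look like, here is a minimal sketch. The PLLSpec class, the solve function, the generic model f_out = f_in * M / (D * O), and the numeric limits in the example are all assumptions made up for this sketch and do not describe any real device. The point is that the spec holds only ranges and bounds, so the brute-force solver could later be swapped for something smarter without touching the per-device data.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PLLSpec:
    """Hypothetical declarative PLL description: data only, no code."""
    mult: range     # allowed feedback multiplier values M
    div: range      # allowed input divider values D (all positive)
    out_div: range  # allowed output divider values O (all positive)
    vco_hz: tuple   # (min, max) bound on f_in * M / D, in Hz


def solve(spec, f_in_hz, f_out_hz):
    """Brute-force all integer (M, D, O) with f_in * M == f_out * D * O."""
    lo, hi = spec.vco_hz
    solutions = []
    for m, d, o in product(spec.mult, spec.div, spec.out_div):
        # Cross-multiplied form keeps everything in exact integers.
        if f_in_hz * m != f_out_hz * d * o:
            continue
        # VCO inequality lo <= f_in * M / D <= hi, also cross-multiplied.
        if lo * d <= f_in_hz * m <= hi * d:
            solutions.append((m, d, o))
    return solutions


if __name__ == "__main__":
    # Made-up limits for illustration only.
    spec = PLLSpec(mult=range(1, 65), div=range(1, 17),
                   out_div=range(1, 129),
                   vco_hz=(400_000_000, 800_000_000))
    print(solve(spec, 25_000_000, 50_000_000)[:3])
```

Working in integer hertz avoids floating-point comparisons, which matters once the equality is treated as a Diophantine equation rather than a nearest-match search.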
GuzTech commented on Jul 9, 2020
Ok, I'll look up the formulas from the datasheets for each PLL. The data representation should express the equations symbolically using a symbolic expression library I assume.