Troubleshooting: Synchronizing Data Arrival to a DAC from a Zynq UltraScale+ FPGA

Intro

This article shows how to synchronize the arrival of data at a DAC driven by a Zynq UltraScale+ MPSoC FPGA. The FPGA transmits data samples to a Texas Instruments DAC3283 over an LVDS interface in DDR mode. LVDS is a widely used signaling standard, valued for its low power consumption, high noise immunity, and suitability for high-speed links. The synchronization is challenging because DDR mode imposes stringent timing requirements, and closing timing on the DAC interface at high frequencies demands precision.

This article explores two options for synchronizing the interface: applying delays to data signals and adjusting the phase between data and clock.

For this article, we wrote a driver in VHDL and loaded it into the FPGA’s programmable logic. The driver reads samples from an internal BRAM and encodes them so that the DAC can process the input samples and produce the desired analog output signal.

The interface between the FPGA and DAC is a parallel bus that has 8 data signals, 2 control signals and a forwarded clock. The clock is used by the DAC to sample the signals.

DAC timing

The next diagram shows an adaptation from figures 34 and 36 from the datasheet, where you can see timing specifications for all data and frame traces. The signal tx_enable is not present in this diagram because its synchronization is not critical for signal integrity.

Looking at the Alinx AXU4EV and FMC-01 PCB designs, we see that the data and clock traces are almost equal in length; the maximum difference is around 0.6 mm (24 mils). This matters for signal propagation: with traces of nearly equal length, you can expect the signals to arrive at the same time. If the traces have different lengths, you must take the difference in propagation delay into account.
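To put the length mismatch into perspective, the sketch below converts it into an approximate arrival skew. The figure of 6.7 ps/mm is an assumption (a typical value for FR-4 with an effective dielectric constant near 4), not a number taken from the board documentation, and the function name is made up for illustration.

```python
# Rough estimate of the arrival-time skew caused by a trace length mismatch.
# ASSUMPTION: ~6.7 ps/mm is a typical propagation delay for an FR-4 trace
# (effective dielectric constant near 4). Use your stack-up's real figure.
PS_PER_MM_FR4 = 6.7


def trace_skew_ps(length_mismatch_mm: float, ps_per_mm: float = PS_PER_MM_FR4) -> float:
    """Return the skew in picoseconds for a given length mismatch in mm."""
    return length_mismatch_mm * ps_per_mm


skew = trace_skew_ps(0.6)
print(f"0.6 mm mismatch -> about {skew:.1f} ps of skew")
```

Under that assumption, the 0.6 mm mismatch amounts to only a few picoseconds, which is why the traces can be treated as matched here.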

In the RTL we have 2 clock signals, one for the internal FSM that generates data, and the other is the one we send to the DAC IC (DATACLK). A simple way to follow the timing specifications is to control the delay between the clock edges and the data signals. This can be achieved by either delaying the data signals relative to the clock driving the DAC driver module or delaying the clock relative to the data signals.

Sync delaying each bit

Our first approach was using a combo of ODELAYE3 and IDELAYCTRL primitives.

The ODELAYE3 primitive delays output data. It is built from a delay line with 512 taps and a maximum delay of XYZ ps, and it allows each signal to be delayed individually. The IDELAYCTRL primitive must be instantiated whenever ODELAYE3 is used. It requires a reference clock input, which lets the internal circuitry calibrate precise delay tap values for the ODELAYE3 components independently of PVT (process, voltage, and temperature).

In our case, we are implementing one ODELAYE3 for each output bit and only one IDELAYCTRL.

With this configuration, you can tune each output bit separately.

Internal connections

The IDELAYCTRL primitive has two input ports:

RST coming from an internal reset signal
REFCLK connected to the clock used for the DAC driver logic

Its only output, RDY, is connected to the control FSM of each ODELAYE3 instance. This FSM is detailed in UG571, chapter 2, section ODELAYE3, subsection “DELAY_VALUE Attribute”.

Each ODELAYE3 instance is connected as the next image shows.

All the components mentioned in this subsection are connected to the same clock and reset. For the clock speed, check the user guide that corresponds to your FPGA; in our case the clock must be between 300 and 800 MHz.

Let’s discuss the ODELAYE3 parameters and connections that don’t appear in the previous image. Since we are not cascading ODELAYs, all CASC_* inputs are tied to logic ‘0’. Also, we load the delay through the CNTVALUEIN inputs, so we can tie the CE pin low (‘0’) and the INC pin high.

As for the parameters:

IS_CLK_INVERTED, IS_RST_INVERTED, SIM_DEVICE and REFCLK_FREQUENCY depend on your design. In this case we used IS_CLK_INVERTED = 0, IS_RST_INVERTED = 1, REFCLK_FREQUENCY = 400.0 and SIM_DEVICE = ULTRASCALE_PLUS.
Since we are not cascading, CASCADE = NONE.
Following the UG571 recommendations, the UPDATE_MODE parameter is set to ASYNC.
The parameter DELAY_FORMAT is set to TIME, so the delay is specified as an amount of time.
The parameter DELAY_TYPE is set to VAR_LOAD. This lets you load the delay through CNTVALUEIN, as in our setup, and read the current delay value back from the CNTVALUEOUT output.

Checking data arriving to the DAC

To know whether data is arriving correctly, you need to use the DAC’s internal pattern checker.

The pattern checker has 8 configurable registers (CONFIG9 to CONFIG16), in which you can leave the default patterns or customize them to suit your needs. The result of the check is stored in register CONFIG8, where a logic ‘1’ indicates an error in that bit. Each result bit is the OR over all pattern registers, i.e. CONFIG8[0] = CONFIG9[0] + CONFIG10[0] + ... + CONFIG16[0], where + denotes a logical OR.

Now, to enable the pattern checker logic and get the result, you need to write DAC internal registers in the following way:

Write 0x04 to register CONFIG1 (only the first time)
From the FPGA, send to the DAC the same patterns that are written in CONFIG9 through CONFIG16
Write 0x00 to register CONFIG8
Write 0x00 to register CONFIG6
Read register CONFIG8

If an error is detected, modify the delay of that bit and repeat the process. Keep repeating until you read 0x00 from the DAC’s CONFIG8 register.
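The register sequence above can be sketched in host-side code as follows. This is a minimal illustration, not the authors' software: the callables `write_reg`, `read_reg` and `send_patterns` are hypothetical hooks for your own SPI and FPGA control code, `FakeDac` exists only to exercise the function, and the mapping of CONFIGn to register address n is an assumption to verify against the DAC3283 register map.

```python
from typing import Callable, List

# ASSUMPTION: CONFIGn sits at register address n; check the DAC3283 register map.
CONFIG1, CONFIG6, CONFIG8 = 0x01, 0x06, 0x08


def run_pattern_check(
    write_reg: Callable[[int, int], None],
    read_reg: Callable[[int], int],
    send_patterns: Callable[[], None],
    first_run: bool = True,
) -> List[int]:
    """Run one pattern-checker pass and return the indices of the failing bits."""
    if first_run:
        write_reg(CONFIG1, 0x04)  # step 1, needed only the first time
    send_patterns()               # FPGA streams the CONFIG9..CONFIG16 patterns
    write_reg(CONFIG8, 0x00)      # step 3 of the sequence above
    write_reg(CONFIG6, 0x00)      # step 4 of the sequence above
    status = read_reg(CONFIG8)    # a '1' marks an error on that data bit
    return [bit for bit in range(8) if status & (1 << bit)]


class FakeDac:
    """Stand-in for the real SPI link, used only to exercise the function."""

    def __init__(self, stuck_status: int):
        self.regs = {}
        self.stuck_status = stuck_status

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value

    def read(self, addr: int) -> int:
        # Pretend the checker always reports the same error mask in CONFIG8.
        return self.stuck_status if addr == CONFIG8 else self.regs.get(addr, 0)


dac = FakeDac(stuck_status=0b00000101)
failing = run_pattern_check(dac.write, dac.read, send_patterns=lambda: None)
print(failing)  # [0, 2]
```

The returned list tells you which bits still need their delay adjusted; an empty list corresponds to reading 0x00 from CONFIG8.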

Sync changing the phase

This whole process can be cumbersome, so we propose an alternative approach: changing the phase of DATACLK instead of using an ODELAYE3 to tune each output’s delay before the OBUFTDS primitive. This way, you tune all lanes at the same time: shifting the phase moves the DAC’s sampling instant on both edges, so you can make sure the data is stable at those instants. In your first trials, if a phase of 90° gives you 0xFF in CONFIG8, we recommend generating several bitstreams with different phases and trying all of them. If you can connect an oscilloscope or, better yet, a logic analyzer, you will get a very good estimate of where your new phase value should be.
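Once you have tried several phases, one reasonable way to pick a value is to take the middle of the widest range of phases that pass the pattern checker. The sketch below does that; the function name and the sweep results are hypothetical, and wrap-around at 360° is ignored to keep it short.

```python
from typing import Dict, Optional


def best_phase(results: Dict[float, int]) -> Optional[float]:
    """Return the middle of the widest run of phases whose CONFIG8 read was 0x00.

    `results` maps each tested DATACLK phase (in degrees) to the CONFIG8 value
    read back for it. Returns None when no phase passes.
    """
    best_run, run = [], []
    for phase in sorted(results):
        if results[phase] == 0x00:
            run.append(phase)
            if len(run) > len(best_run):
                best_run = list(run)
        else:
            run = []  # a failing phase ends the current passing window
    if not best_run:
        return None
    return (best_run[0] + best_run[-1]) / 2


# Hypothetical sweep results, one bitstream per phase.
sweep = {0: 0xFF, 45: 0x13, 90: 0x00, 135: 0x00, 180: 0x00, 225: 0x40, 270: 0xFF}
print(best_phase(sweep))  # 135.0
```

Choosing the center of the passing window, rather than the first phase that works, leaves margin on both sides for temperature and voltage drift.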

The diagram below shows the system we built:

Constraining the FPGA

Constraints used:

The first line creates a virtual clock on the output pin. The following four lines set the maximum and minimum output delays for the data[*] signals on both the rising and the falling edge (the lines with -clock_fall), because the data bus is DDR. In every case, the lines that define the minimum delay set the hold time, and the lines that define the maximum delay set the setup time. Because frame is not a DDR signal, -clock_fall is not needed and only the rising edge is constrained.

Why is the hold time negative? Vivado treats positive values as time before the clock edge arrives, so a negative hold value means that the signal must remain stable after the clock edge. The next image shows this. It is taken from the ‘Output delays’ section of the “Constraints Wizard”, located in the implementation menu, and it appears in the waveform tab when ‘Data Rate and Edge’ is set to ‘Dual’.

Where:

tsu_r: destination device setup time requirement for rising edge
thd_r: destination device hold time requirement for rising edge
tsu_f: destination device setup time requirement for falling edge
thd_f: destination device hold time requirement for falling edge
trce_dly_max: maximum board trace delay
trce_dly_min: minimum board trace delay

Since we estimate that the traces have the same length, we assume that trce_dly_min and trce_dly_max values are 0.
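The relationship between these parameters and the values passed to set_output_delay can be sketched as below: the maximum delay is the maximum trace delay plus the setup time, and the minimum delay is the minimum trace delay minus the hold time. The numbers in the example are placeholders chosen for illustration, not DAC3283 datasheet values.

```python
from typing import Tuple


def output_delays(
    tsu_ns: float,
    thd_ns: float,
    trce_dly_max_ns: float = 0.0,
    trce_dly_min_ns: float = 0.0,
) -> Tuple[float, float]:
    """Return the (max, min) output delay values for one clock edge, in ns."""
    max_delay = trce_dly_max_ns + tsu_ns  # value for set_output_delay -max
    min_delay = trce_dly_min_ns - thd_ns  # value for set_output_delay -min
    return max_delay, min_delay


# PLACEHOLDER timing values; take the real tsu/thd from the DAC datasheet.
max_r, min_r = output_delays(tsu_ns=0.5, thd_ns=0.3)
print(f"rising edge: -max {max_r:.3f} ns, -min {min_r:.3f} ns")
```

With both trace delays at 0, the minimum delay is simply the negated hold time, which is where the negative value in the constraints comes from. Call the function once with the rising-edge pair (tsu_r, thd_r) and once with the falling-edge pair (tsu_f, thd_f).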

If you are using an MMCM, you can also fine-tune the phase while the FPGA is running instead of regenerating the entire bitstream. To do so, select “Dynamic Phase Shift” in the clocking options, then check “Use fine PS” on the desired clock. Note that checking this option resets the selected phase to 0 degrees. To fine-tune after you have found the approximate phase, follow these steps on the “MMCM Settings” tab:

Check “Allow Override Mode”.
Make sure that the “Use fine PS” option is unselected.
Set the phase to the value that you found to work, that is, the one for which reading the CONFIG8 register returns 0x00 from the internal pattern checker.
Select the “Use fine PS” option.

This fine-tuning is useful for debugging without having to regenerate the bitstream.
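To get a feel for the resolution of the dynamic phase shift, the sketch below estimates how many fine steps correspond to a given phase change on the output clock, using the MMCM rule that one fine step equals 1/56 of the VCO period. The output and VCO frequencies in the example are hypothetical, since they depend on your clocking configuration.

```python
from typing import Tuple


def fine_ps_steps(phase_deg: float, f_out_mhz: float, f_vco_mhz: float) -> Tuple[int, float]:
    """Return (number of fine steps, step size in ps) for a phase change.

    One MMCM fine phase-shift step is 1/56 of the VCO period.
    """
    vco_period_ps = 1e6 / f_vco_mhz
    step_ps = vco_period_ps / 56
    out_period_ps = 1e6 / f_out_mhz
    shift_ps = phase_deg / 360 * out_period_ps  # requested shift as a time
    return round(shift_ps / step_ps), step_ps


# HYPOTHETICAL clocking setup: 400 MHz output clock from a 1200 MHz VCO.
steps, step_ps = fine_ps_steps(phase_deg=10, f_out_mhz=400, f_vco_mhz=1200)
print(f"{steps} steps of about {step_ps:.2f} ps each")
```

Because the step size depends only on the VCO frequency, a higher VCO frequency gives finer phase resolution for the same output clock.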

If you need more information about this mode, you can check this link: https://adaptivesupport.amd.com/s/question/0D52E00006hpsbSSAQ/how-to-use-dynamic-phase-shift?language=en_US

Or the Clocking Wizard Product Guide PG065 document: https://docs.amd.com/r/en-US/pg065-clk-wiz

Conclusion

This article introduced two approaches for synchronizing a DAC with an FPGA: by bit or by phase.

By bit: a good choice if the pattern checker reports errors on only a few bits.
By phase: if you get a large number of errors and all traces are matched, you can change the phase until the DAC’s pattern checker reports no errors. This approach also needs fewer resources than instantiating ODELAYE3 and IDELAYCTRL primitives.
If the traces aren’t matched, consider using both methods, i.e. apply a delay to the bits with shorter traces so that they line up with the bits that have a longer distance to travel, and shift the phase for all bits.

References

https://www.ti.com/product/DAC3283

https://docs.amd.com/r/en-US/ug974-vivado-ultrascale-libraries/ODELAYE3

https://docs.amd.com/r/en-US/ug974-vivado-ultrascale-libraries/IDELAYCTRL

https://docs.amd.com/v/u/en-US/ug571-ultrascale-selectio (version 1.16 english)

Written by Nicolas Bertolo & Adrian Evaraldo

‍‍Any Comments or questions, please feel free to contact us: info@emtech.com.ar