
The Method of Applying Simulation Technology to Solve Faults in FPGA Design and Development

This article proposes a simulation-based method for locating a fault, fixing it, and verifying the fix during FPGA development. It addresses three common problems: faults that are hard to locate, repeated code changes that each cost a full compilation, and fixes that cannot be confirmed once the design is back on the board. The method can save considerable development time and improve development efficiency.

FPGAs have been used in more and more fields in recent years, and many large communication systems (such as communication base stations) use them for core data processing. However, long compilation times make troubleshooting a painful part of development. This article introduces a simulation method for resolving faults that reduces the number of compilations during development, with the goal of locating faults accurately and shortening the time needed to fix them. The software development platform used in the example is Quartus II from Altera Corporation, and the simulation tool is ModelSim.

Statement of the problem
During on-board debugging, some bugs appear only under extreme conditions or very rarely. The usual practice today is: when the fault occurs, capture signals with SignalTap to find the problem, then modify the code; after the modification, recompile the entire project and run the new version on the board to see whether the fault is resolved.
This approach has three problems:
① The fault can be hard to locate. Often only the faulty module is known, not the specific signal, which makes it hard to choose what to capture. If key signals are missed, they must be added to SignalTap, the project recompiled, and the fault located on the board again, which wastes time.
② After the fault is located, the modified code must be compiled again to generate a new download file. If the problem persists, the process must be repeated, so many compilations may be needed between locating the fault and completing the fix.
③ When the fix is verified on the board, a bug that occurs with very low probability may simply fail to recur in a short test. Its absence does not prove that the fault has really been solved for the extreme case.
For example:


Figure 1 Data captured by SignalTap when the bug appeared


Figure 2 SignalTap signal capture interface

In the FPGA logic of a baseband system, the output module uses an asynchronous FIFO. At a certain moment one more word is read after the FIFO is already empty, which causes the bug shown in Figure 1.
The output module checks whether at least 4 words are available in the FIFO and, if so, reads them out continuously as a group. The system makes this decision from the asynchronous FIFO's internal read pointer. Because reads and writes in an asynchronous FIFO are in different clock domains, pointer information needs at least 2 clock cycles of handshake time to cross between them, so the pointer can be inaccurate. On the clock edge where the decision is made, the pointer may indicate that at least 4 words are readable while, because of the handshake delay, the FIFO actually holds only 3.

In Figure 1, rdreq is the FIFO read enable. It is asserted for 4 clock cycles, but only 3 distinct words are read (the word 0D2086C9F is read twice) because the FIFO is already empty in the 4th cycle. The fix is to use a synchronous FIFO instead: its reads and writes take place in a single clock domain, so there is no handshake delay.
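As a rough sketch of the read-side decision with a synchronous FIFO, the module below starts a 4-word burst only when the FIFO's word count reports at least 4 words. The module name burst_read_ctrl, the usedw and rst ports, and the counter width are illustrative assumptions, not taken from the original design; the sketch also assumes that usedw is an exact word count updated in the same clock domain.

```verilog
// Hypothetical read controller: asserts rdreq for exactly 4 cycles
// whenever the synchronous FIFO reports at least 4 stored words.
module burst_read_ctrl #(parameter USEDW_W = 8) (
    input  wire               clk,
    input  wire               rst,    // synchronous reset (assumed)
    input  wire [USEDW_W-1:0] usedw,  // word count from the FIFO (assumed port)
    output reg                rdreq   // FIFO read enable
);
    reg [1:0] cnt;  // reads still to issue after the current one

    always @(posedge clk) begin
        if (rst) begin
            rdreq <= 1'b0;
            cnt   <= 2'd0;
        end else if (!rdreq) begin
            if (usedw >= 4) begin     // enough data: start a burst
                rdreq <= 1'b1;
                cnt   <= 2'd3;
            end
        end else if (cnt != 2'd0) begin
            cnt <= cnt - 1'b1;        // burst in progress
        end else begin
            rdreq <= 1'b0;            // fourth read issued, end the burst
        end
    end
endmodule
```

Because the count and the reads share one clock, the count seen on the first idle cycle after a burst already reflects all 4 reads.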

In locating this fault, it is easy to tell which module has the problem, but finding the responsible signal inside it takes more work. If the faulty signal is buried deep in the design, it is hard to capture the right signals on the first attempt. Even with the right signals captured, if the fix does not work we must modify and compile again, which is time-consuming. And even if the fault does not recur after the fix, the cause may simply be that the bug's trigger conditions are hard to meet, so this does not prove that the fault is really solved.

In response to these three problems, the author proposes the following ideas:

Although the specific faulty signal is hard to locate, the faulty module is easy to identify. When the bug occurs, we can capture all of the module's input signals and use them to reconstruct the bug's conditions in the simulation environment, then use the simulation to pinpoint the faulty signal.

Once the faulty signal has been pinpointed, modify the code and simulate again under the same conditions. Comparing the output data before and after the modification shows directly whether the fix works, so the project only needs to be compiled once, after the fix has succeeded, which saves time.
Simulation also rules out the possibility that the bug merely fails to recur on the board because its extreme trigger conditions are hard to meet, so we can be confident that the fault is actually solved.

Simulation method to solve the problem
Solving the asynchronous FIFO problem above demonstrates that it is feasible to re-create a bug's conditions from captured signals and then locate and remove the bug. The steps are as follows:


Figure 3 SignalTap II List File interface

① Save the signals captured by SignalTap when the bug occurs to a text file

Figure 2 shows the Quartus II interface for capturing signals with SignalTap.
Right-click a signal name and select the Create SignalTap II List File option shown in Figure 2 to generate a list in the format shown in Figure 3.
The upper part of the list in Figure 3 describes the signals (their numbers and names); the lower part gives the signal values at each sampling point, where values marked with h are hexadecimal.

Save the list file as a text file, as shown in Figure 4.


Figure 4 “Save As” option interface

Next, delete the descriptive header from this text file so that only the data captured by SignalTap remains (spaces, the h markers, and other symbols must also be deleted), and save the result as a .dat file for use in simulation.
With the input data from the moment of the fault, we can reconstruct the fault conditions in the simulation environment.

② Use the .dat file to re-create the conditions under which the bug occurs
Write a simulation file (testbench) in Verilog, and use the $readmemh or $readmemb system task to load the data in the .dat file into a memory array (ram), for example: $readmemh("s.dat", ram).
Note that $readmemh treats everything in the .dat file as hexadecimal and converts each hexadecimal digit into 4 binary bits when storing it in ram, so the ram bit width should be 4 times the number of hexadecimal digits per word in the .dat file. To use $readmemb instead, set the signals to binary format in SignalTap before saving them; the ram bit width then equals the number of digits per word in the .dat file. In both cases, the depth of the ram is the number of words in the .dat file.
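As a minimal sketch of the width and depth rule, suppose each line of s.dat holds 9 hexadecimal digits (36 bits) and the capture contains 1024 samples. Both numbers, and the names DATA_W, DEPTH, and tb_load, are illustrative assumptions, not values from the original project.

```verilog
`timescale 1ns/1ps
module tb_load;
    localparam DATA_W = 36;    // assumed: 9 hex digits per line x 4 bits
    localparam DEPTH  = 1024;  // assumed: number of samples in s.dat

    // One memory word per captured sample.
    reg [DATA_W-1:0] ram [0:DEPTH-1];

    initial begin
        $readmemh("s.dat", ram);
        // Print the first sample to confirm that the file was parsed.
        $display("ram[0] = %h", ram[0]);
    end
endmodule
```

If the printed word contains x digits, the file was probably not found or a line has fewer digits than expected.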

Then, in the testbench, output one ram word to a register on each edge of the corresponding clock and increment the ram address:
always @(posedge clk)
begin
    data <= ram[addr];     // present the next captured sample
    addr <= addr + 1'b1;   // advance to the following sample
end

To reproduce the bug, each input signal of the module must be mapped to its corresponding bits in the ram word. When the testbench instantiates the module, it connects the corresponding bits of the data register to the module's inputs.
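The replay described above might be put together as in the following sketch. The stub module, its ports, the bit layout of one sample, the clock period, and the sizes are all assumptions chosen for illustration; in practice the stub is replaced by the real faulty module and the bit slices follow the signal order in the SignalTap list file.

```verilog
`timescale 1ns/1ps

// Placeholder for the faulty module; replace with the real design.
module output_module_stub (
    input  wire        clk,
    input  wire        wrreq,
    input  wire [31:0] wdata,
    output reg  [31:0] q
);
    always @(posedge clk)
        if (wrreq) q <= wdata;
endmodule

module tb_replay;
    localparam DATA_W = 36;    // assumed: 9 hex digits per line of s.dat
    localparam DEPTH  = 1024;  // assumed number of captured samples
    localparam ADDR_W = 10;    // log2(DEPTH)

    reg               clk  = 1'b0;
    reg  [DATA_W-1:0] ram [0:DEPTH-1];
    reg  [DATA_W-1:0] data = {DATA_W{1'b0}};
    reg  [ADDR_W-1:0] addr = {ADDR_W{1'b0}};
    wire [31:0]       q;

    always #5 clk = ~clk;      // assumed 10 ns sample clock period

    initial $readmemh("s.dat", ram);

    // Replay one captured sample per clock edge.
    always @(posedge clk) begin
        data <= ram[addr];
        addr <= addr + 1'b1;
        if (addr == DEPTH-1) $finish;  // stop at the end of the capture
    end

    // Assumed layout of one sample: {3 unused bits, wrreq, wdata[31:0]}.
    output_module_stub dut (
        .clk   (clk),
        .wrreq (data[32]),
        .wdata (data[31:0]),
        .q     (q)
    );
endmodule
```

Driving the module only through slices of data keeps the testbench independent of how many signals were captured; adding a signal only changes DATA_W and the slice positions.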
Figure 5 shows the bug waveform reproduced in the simulation environment.
Comparing Figure 5 with Figure 1 shows that this method re-creates the bug's conditions in simulation and produces the same erroneous output data.
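The comparison between simulated and captured outputs can also be automated instead of being read off the waveforms. The checker below is a sketch under the assumption that the module's output was captured by SignalTap together with its inputs and is therefore available as some bits of each sample; the module name capture_compare and its ports are hypothetical.

```verilog
// Hypothetical checker: reports every cycle in which the simulated
// output differs from the value captured on the board.
module capture_compare #(parameter W = 36) (
    input wire         clk,
    input wire         valid,  // high when the outputs are meaningful
    input wire [W-1:0] sim_q,  // output of the simulated module
    input wire [W-1:0] cap_q   // same signal as captured by SignalTap
);
    integer mismatches = 0;

    always @(posedge clk)
        if (valid && (sim_q !== cap_q)) begin
            mismatches = mismatches + 1;
            $display("%0t: mismatch sim=%h captured=%h", $time, sim_q, cap_q);
        end
endmodule
```

In a testbench, sim_q would be wired to the module's output and cap_q to the bits of the data register that hold the captured output. The two may need to be aligned by a clock cycle or so, depending on where the data register sits relative to the module's inputs.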

③ After modifying the code, verify the fix in the simulation environment
After modifying the code, we only need to simulate in the same environment and check specifically whether the bug is gone. The bug in this example is caused by the asynchronous FIFO, so changing to a synchronous FIFO should solve it, and simulation lets us verify this. Figure 6 shows the simulated waveform after the modification.

Figure 6 shows that, under the same conditions, the modified design reads 4 words from the FIFO. No read occurs while the FIFO is empty, so the requirement is met and the bug is resolved. For comparison, Figure 7 shows the waveform captured by SignalTap on the board after the modified version was compiled.
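The "no read while empty" condition can be checked by the simulation itself. The sketch below assumes that the FIFO's empty flag is visible to the testbench; the module name fifo_underflow_check and the port name empty are illustrative, while rdreq is the read enable discussed above.

```verilog
// Hypothetical checker: counts read requests issued while the FIFO
// reports empty, which is the failure seen in the original bug.
module fifo_underflow_check (
    input wire clk,
    input wire rdreq,  // FIFO read enable
    input wire empty   // FIFO empty flag (assumed to be accessible)
);
    integer underflows = 0;

    always @(posedge clk)
        if (rdreq && empty) begin
            underflows = underflows + 1;
            $display("%0t: read requested while FIFO empty", $time);
        end
endmodule
```

Running the same replay before and after the fix should then print a message for the original code and stay silent for the modified code.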


Figure 5 Error data reproduced in the ModelSim environment


Figure 6 Output data under the same conditions after modifying the program

Figure 7 Signals captured by SignalTap after modifying the program
The comparison shows that the waveforms are exactly the same, which indicates that the method is feasible.

Summary

The method described here can be applied to a wide variety of faults. When a fault occurs, you only need to identify the faulty module, which may itself contain sub-modules, and capture that module's input and output signals. The input signals are used to reconstruct the fault environment in simulation; if the simulated outputs match the captured outputs, the environment has been reconstructed correctly. With this simulation platform you can pinpoint which sub-module and which signal is wrong without capturing those internal signals in SignalTap, and you can verify the fix in simulation after modifying the code. This saves time, clearly demonstrates that the fault is really solved, and achieves more with less effort.
