

# Estimating Performance and Capacity of Actel Devices

It is important to know early in the design process whether a target FPGA can implement the required application at the needed capacity and performance. If the application does not fit, features must be sacrificed or a higher-capacity device (if one exists) substituted. If the target device is not fast enough, a higher-speed device or significant design changes might be needed to achieve the required performance. With Actel FPGAs, designers can estimate application performance and capacity requirements before starting the design, with minimal effort, and so increase their confidence that the desired application will fit in the target device. This application note presents a technique, using a small library of common logic functions, to make performance and capacity estimates for Actel devices.

## **Performance Estimation Techniques**

Fairly accurate estimates of performance and capacity are possible for Actel devices because of the abundance of high-speed routing resources in the devices. Unlike with other FPGA architectures, the estimates depend little on placement and routing limitations. For Actel devices, application estimates can therefore be built up from estimates for common building blocks, and the combined estimates will be reasonably accurate. This technique may not work for architectures with fewer routing resources.

A small library of common logic functions is shown in Table 1. The estimated performance of each function is given for each of Actel's FPGA families. For example, the performance of a 16-bit accumulator in the A1200XL family is around 33 MHz (for the standard, -0, speed grade). Most of these blocks are generated by Actel's ACTgen Macro Builder tool (an automated logic generator) and provide good estimates for post-place-and-route performance. The two state machines represent a typical simple state machine and a more complicated one. The small state machine has 8 states, 10 simple transition terms, and 8 simple outputs. The large state machine has 16 states, 26 complex transitions, and 8 outputs. Estimates for machines of other complexities can be extrapolated from these two data points.

Table 1 • Logic Function Performance Estimates

| Function                 | ACT 1  | 1200XL  | 3200DX  | ACT 3/<br>ACT 3 PCI |
|--------------------------|--------|---------|---------|---------------------|
| 2-input XOR*             | 9.3 ns | 7 ns    | 7 ns    | 5.5 ns              |
| 4-input AND/OR gate*     | 9.3 ns | 7 ns    | 7 ns    | 5.5 ns              |
| 16-bit Accumulator       | 20 MHz | 33 MHz  | 33 MHz  | 43 MHz              |
| 16-bit Shift Register    | 80 MHz | 159 MHz | 159 MHz | 182 MHz             |
| 16-bit Counter           | 30 MHz | 45 MHz  | 45 MHz  | 57 MHz              |
| 16-bit Prescaled Counter | 80 MHz | 119 MHz | 119 MHz | 149 MHz             |
| 16-bit Register          | 80 MHz | 159 MHz | 159 MHz | 182 MHz             |
| 32x8 FIFO (fast)         | NA     | NA      | 60 MHz  | NA                  |
| 32x8 FIFO (compact)      | NA     | NA      | 40 MHz  | NA                  |
| Small State Machine      | 20 MHz | 60 MHz  | 60 MHz  | 80 MHz              |
| Large State Machine      | 12 MHz | 40 MHz  | 40 MHz  | 66 MHz              |

\* Single module delays with fanout of 4



### A Scaling Method for Speed Grades

Performance estimates for faster speed grades can be made by multiplying the standard (-0) speed grade performance by the associated scaling factor. These scaling factors are given in Table 2. For example, a -3 version of the 16-bit accumulator in the A1200XL family would be estimated at 33 MHz × 1.54, or about 50 MHz. Alternatively, the designer could estimate the overall application performance for the standard speed grade and then scale that single figure by the speed grade scaling factor. This second approach is probably preferable if the application has several key functions and is moderately to very complicated.
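The scaling step can be sketched in a few lines of Python. This is only an illustration: the factors are copied from Table 2, while the function and dictionary names are hypothetical and not part of any Actel tool.

```python
# Scaling factors copied from Table 2 (the same for every family listed there).
FREQ_SCALE = {"Std": 1.00, "-1": 1.18, "-2": 1.33, "-3": 1.54}
DELAY_SCALE = {"Std": 1.00, "-1": 0.85, "-2": 0.75, "-3": 0.65}


def scale_frequency(std_mhz, grade):
    """Scale a standard-speed-grade frequency estimate (MHz) to another grade."""
    return std_mhz * FREQ_SCALE[grade]


def scale_delay(std_ns, grade):
    """Scale a standard-speed-grade delay estimate (ns) to another grade."""
    return std_ns * DELAY_SCALE[grade]


# 16-bit accumulator in the 1200XL family: 33 MHz at the standard grade (Table 1).
print(f"{scale_frequency(33.0, '-3'):.1f} MHz")
# 2-input XOR in the 1200XL family: 7 ns at the standard grade (Table 1).
print(f"{scale_delay(7.0, '-3'):.2f} ns")
```

The accumulator result is 33 × 1.54, which the text rounds to about 50 MHz.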

### An Application Example

Performance estimates for an application may require a combination of logic functions. If the designer can identify the key functions in the target application, perhaps as a "back of the envelope" block diagram, estimates for each block can be made and combined as the overall estimate. Figure 1 shows a block diagram for an example application. It is a graphics controller for an LCD display. The display is bit mapped, with the image data stored in RAM. The controller interleaves simple operations on the bits (SET, RESET, XOR, XOR with mask, SHIFT, etc.) with reading the data from memory and sending it to the display.

Table 2 • Scaling Factors for FPGA Speed Grades: Frequency and Delay

| Family        | Delay Scaling: Std | Delay Scaling: -1 | Delay Scaling: -2 | Delay Scaling: -3 | Frequency Scaling: Std | Frequency Scaling: -1 | Frequency Scaling: -2 | Frequency Scaling: -3 |
|---------------|------|------|------|------|------|------|------|------|
| A1010B/A1020B | 1.00 | 0.85 | 0.75 | 0.65 | 1.00 | 1.18 | 1.33 | 1.54 |
| 1200XL        | 1.00 | 0.85 | 0.75 | 0.65 | 1.00 | 1.18 | 1.33 | 1.54 |
| 3200DX        | 1.00 | 0.85 | 0.75 | 0.65 | 1.00 | 1.18 | 1.33 | 1.54 |
| A1400A        | 1.00 | 0.85 | 0.75 | 0.65 | 1.00 | 1.18 | 1.33 | 1.54 |



Figure 1 • LCD Controller Block Diagram

The address counter is used to address the memory during display operations. A multiplexer is used to select between display addresses and the host address register. The desired address is loaded into the memory address register. The addressed value is read from memory and loaded into the serial shift register, which provides a serial bit stream to the display. Bit operations are executed in the ALU, with data sources from the memory register, the host data register, and the bit mask register. Notice that all registers can be read by the host processor over the host data bus (via the multiplexer) to simplify coding and debugging. A small state machine controls the host-controller interface and provides the required timing for the display.

The key function blocks are given in Table 3. Associated with each block is the closest equivalent function from Table 1. The performance of that function in a particular device family can be estimated directly from Table 1. For example, the ALU is most similar to the 16-bit accumulator in Table 1, so its performance estimate is 43 MHz in ACT 3 with the standard speed grade. Estimates for the other families are also listed in Table 3.

Estimates can also be made by using the scaling factors in Table 2. This allows comparisons between high-speed devices and standard-speed devices. For example, a -3 speed grade 3200DX device would have an estimated ALU performance of around 50 MHz (the 1.54 factor for the -3 speed grade × the 33 MHz standard speed grade estimate).
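As a rough sketch, the block estimates in Table 3 can be combined and scaled as follows. The note does not spell out how block estimates are combined; this example assumes all blocks run from one clock, so the slowest block sets the application estimate. The names used are hypothetical, and the "ACT 3" key stands for the ACT 3/ACT 3 PCI column.

```python
# Standard-speed-grade block estimates in MHz, copied from Table 3.
TABLE3_MHZ = {
    "ALU":               {"ACT 1": 20, "1200XL": 33,  "3200DX": 33,  "ACT 3": 43},
    "Address Counter":   {"ACT 1": 30, "1200XL": 45,  "3200DX": 45,  "ACT 3": 57},
    "Shift Register":    {"ACT 1": 80, "1200XL": 159, "3200DX": 159, "ACT 3": 182},
    "Bus State Machine": {"ACT 1": 20, "1200XL": 60,  "3200DX": 60,  "ACT 3": 80},
}

# Frequency scaling factors from Table 2.
FREQ_SCALE = {"Std": 1.00, "-1": 1.18, "-2": 1.33, "-3": 1.54}


def application_estimate(family, grade="Std"):
    """Return (limiting block, estimated MHz) for a family and speed grade.

    Assumption (not stated in the note): every block shares one clock,
    so the slowest block limits the whole application.
    """
    block, std_mhz = min(
        ((name, row[family]) for name, row in TABLE3_MHZ.items()),
        key=lambda item: item[1],
    )
    return block, std_mhz * FREQ_SCALE[grade]


block, mhz = application_estimate("3200DX", "-3")
print(f"Limiting block: {block}, about {mhz:.0f} MHz")
```

Under this assumption the ALU is the limiting block in every family, which matches the text's focus on the ALU estimate.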

Note that more extensive performance estimation tables are available in the "Macro Libraries" section in this data book. In addition, the ACTgen Macro Builder Test Drive can be used to evaluate the capacity and performance of particular functions not tabulated. The ACTgen Test Drive is available from Actel free of charge. Finally, for the most up-to-date tables, including future family estimates, visit Actel's World Wide Web Homepage at www.actel.com.

## **Capacity Estimation Techniques**

Capacity estimates can be made by using a technique similar to that used for estimating performance. Table 4 gives estimates for the module count for a number of common logic building blocks in each Actel family of devices. Module counts for larger building blocks can be easily estimated. Capacity for each block can be estimated by finding the closest corresponding function in Table 4. The total capacity estimate is simply the sum of the estimates for all blocks. In some cases, the capacity estimates depend on whether the functions need to be high-speed. Actel devices allow higher-speed implementations of some logic functions by using extra logic modules. Make sure you select the module count based on the expected performance requirements. For example, a prescaled 16-bit counter in ACT 3 requires more modules than the regular 16-bit counter. Table 4 gives the total capacity estimates for the example application.

Table 3 • Associated Block Estimated Performance for Standard Speed Devices

| Block             | Estimated Block     | ACT 1  | 1200XL  | 3200DX  | ACT 3/<br>ACT 3 PCI |
|-------------------|---------------------|--------|---------|---------|---------------------|
| ALU               | 16-bit Accumulator  | 20 MHz | 33 MHz  | 33 MHz  | 43 MHz              |
| Address Counter   | 16-bit Counter      | 30 MHz | 45 MHz  | 45 MHz  | 57 MHz              |
| Shift Register    | Shift Register      | 80 MHz | 159 MHz | 159 MHz | 182 MHz             |
| Bus State Machine | Small State Machine | 20 MHz | 60 MHz  | 60 MHz  | 80 MHz              |

Table 4 • Capacity Estimates for the Example Application

| Application Block  | Estimated Block     | ACT 1 | 1200XL | 3200DX | ACT 3/<br>ACT 3 PCI |
|--------------------|---------------------|-------|--------|--------|---------------------|
| Address Counter    | Counter             | 64    | 46     | 46     | 46                  |
| ALU                | Accumulator         | 100   | 92     | 92     | 92                  |
| Shift Register     | Shift Register      | 40    | 21     | 21     | 21                  |
| Bus State Machine  | Small State Machine | 32    | 18     | 18     | 16                  |
| Data Registers (4) | Register            | 64    | 64     | 64     | 64                  |
| Multiplexers (4)   | Multiplexer         | 64    | 80     | 80     | 80                  |
| Total              |                     | 364   | 321    | 321    | 319                 |
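The totals in Table 4 are plain sums of the per-block module counts, as the following sketch shows. The data is copied from Table 4; the names are hypothetical, and the "ACT 3" key stands for the ACT 3/ACT 3 PCI column.

```python
# Module counts per block, copied from Table 4.
TABLE4_MODULES = {
    "Address Counter":    {"ACT 1": 64,  "1200XL": 46, "3200DX": 46, "ACT 3": 46},
    "ALU":                {"ACT 1": 100, "1200XL": 92, "3200DX": 92, "ACT 3": 92},
    "Shift Register":     {"ACT 1": 40,  "1200XL": 21, "3200DX": 21, "ACT 3": 21},
    "Bus State Machine":  {"ACT 1": 32,  "1200XL": 18, "3200DX": 18, "ACT 3": 16},
    "Data Registers (4)": {"ACT 1": 64,  "1200XL": 64, "3200DX": 64, "ACT 3": 64},
    "Multiplexers (4)":   {"ACT 1": 64,  "1200XL": 80, "3200DX": 80, "ACT 3": 80},
}


def total_modules(family):
    """Sum the per-block module estimates for one device family."""
    return sum(row[family] for row in TABLE4_MODULES.values())


for family in ("ACT 1", "1200XL", "3200DX", "ACT 3"):
    print(f"{family}: {total_modules(family)} modules")
```

The same structure can be reused for another application by replacing the rows with the closest matching functions and their module counts.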



## Details for the Expert

These techniques are estimates only and leave out some of the details associated with the architecture of Actel devices. These details are usually second-order effects, but even better estimates can be made if these factors are kept in mind. Try them after you are familiar with the simple estimation techniques.

- The maximum clock frequency limits performance in some very simple blocks, like shift registers. Higher-capacity devices in a family have slightly slower maximum clock frequencies. If a simple logic function must operate at a very high speed, consult the appropriate device data sheet after selection to determine if the maximum performance is affected.
- Logic modules can sometimes be combined in Actel FPGAs when logic functions are connected. For example, a simple register and a 4-to-1 multiplexer can be combined into a single module. If you have many multiplexers and registers in a data path, count only one block for capacity purposes. The other block comes free!
- When estimating capacity for wider data paths, add a few extra percentage points to account for buffering on common signals. For example, a 32-bit 4-to-1 multiplexer could be estimated at about 210 percent of a 16-bit version. This is usually a second-order effect, however, and can be ignored unless it applies to a large percentage of the design.
- When logic modules are combined, propagation delays are also reduced. Thus, the performance of a 4-to-1 multiplexer in front of a simple register isn't just the sum of the two. If you take the longest delay and add one-half of the smaller, you will get a good estimate for the expected delay after combining.

- High-performance designs sometimes require logic duplication to keep fanout low. If you have a high-performance design, reserve 10 to 20 percent of the device logic modules to ensure that logic duplication can be implemented.
- Actel devices are composed of two different types of logic modules. Sequential logic modules include a dedicated flip-flop; combinatorial modules don't. If you have a register-rich design, you may need to estimate the total number of registers and check your target device to ensure that enough registers are present. Since Actel devices are composed of about a 50-50 mix of sequential logic modules and combinatorial logic modules, it is unusual to run out of registers, but it can happen.
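Two of the rules of thumb above translate directly into arithmetic: the delay after combining modules (the longest delay plus one-half of the smaller) and the 10 to 20 percent module reserve for logic duplication. The sketch below is illustrative only. The function names are hypothetical, and the 500-module device size and the 7 ns and 4 ns delays are made-up inputs, not data sheet values.

```python
def combined_delay(delay_a_ns, delay_b_ns):
    """Estimate the delay of two combined modules:
    the longest delay plus one-half of the smaller one."""
    longest = max(delay_a_ns, delay_b_ns)
    smaller = min(delay_a_ns, delay_b_ns)
    return longest + 0.5 * smaller


def fits_with_reserve(estimated_modules, device_modules, reserve_fraction=0.15):
    """Check the fit while holding back a fraction of the device's modules
    (10 to 20 percent is suggested for high-performance designs)."""
    usable = device_modules * (1.0 - reserve_fraction)
    return estimated_modules <= usable


# Hypothetical delays: 7 ns and 4 ns combine to 7 + 4/2 = 9 ns, not 11 ns.
print(combined_delay(7.0, 4.0))

# The 321-module estimate from Table 4 against a hypothetical 500-module
# device with a 20 percent reserve (400 usable modules).
print(fits_with_reserve(321, 500, reserve_fraction=0.20))
```

Actual module counts for a candidate device should come from its data sheet before this check is relied on.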

## Conclusion

Actel devices have good performance and capacity predictability because of the large amount of routing resources available on the device. This allows the performance and capacity estimation techniques presented here to be used to determine whether the application fits, in both capacity and performance, before detailed design begins. A small amount of time put into this estimation can result in a much better choice of target family.