

# Using External SRAM Memory with Actel SX/SX-A FPGAs

## Introduction

Today's system designs are growing in complexity, requiring larger amounts of memory for high-performance buffers and other local data storage. System designs that require both logic gates and memory are sometimes implemented in a single FPGA device containing embedded SRAM. The embedded SRAM in these FPGAs can be implemented as synchronous or asynchronous memory functions such as register files and FIFOs. FPGAs with on-chip RAM provide an easy solution for both logic and memory needs. However, selecting FPGAs containing blocks of SRAM to meet system memory requirements may increase the overall cost of the system.

Actel's SX-A/SX FPGAs were designed without on-chip SRAM to maximize their price/performance advantage. Small blocks of distributed memory can be implemented with registers using the ACTgen macro generator. Larger memory requirements can easily be met with external RAMs that efficiently interface with the Actel devices. For many applications, an FPGA with an external memory device can meet functionality requirements more cost effectively. This application note details the price/performance advantage when implementing both simple and complex SRAM memory functions in the SX/SX-A devices.

# **Memory Types**

There are many types of external SRAM memory devices, each providing features tailored for specific design requirements. Table 1 describes some of the most common SRAM types, applications for which they are best suited, and FPGAs that contain this type of SRAM.

When a system requires a single large block of memory, an external SRAM component can provide any variation of density, speed, price, type, package, etc., while also allowing for easy future upgrading to larger-density or higher-speed devices. Separate components enable the designer to select the SRAM storage requirements independently of FPGA gate count – flexibility not available in an FPGA with SRAM blocks.

Table 1 • Types of SRAM

| Memory Type                     | Application Examples                                | FPGA Availability                  |
|---------------------------------|-----------------------------------------------------|------------------------------------|
| Asynchronous SRAM               | DSP interfacing and wireless devices                | Xilinx Virtex<br>Altera Apex, Flex |
| Synchronous (Syncburst) SRAM    | DSP and high-end microprocessor cache               | Xilinx Virtex<br>Altera Apex, Flex |
| ZBT™ (Zero Bus Turnaround) SRAM | Network interface cards and switches <200 MHz       | NA                                 |
| QDR (Quad-data rate) SRAM       | High-speed networking switches and routers >200 MHz | NA                                 |

# Size

For applications with simple memory requirements such as small FIFOs, external memory is not required with the SX/SX-A devices. With such devices, it is reasonable to use up to 25 percent of the sequential cells as memory equivalents. For example, one might use up to 1000 registers in the A54SX72A device to implement local distributed memory while leaving about 3000 registers available to implement logic functions. For larger, more complex memory requirements, an external memory device that is specifically suited to the application can be used. Some key advantages of using an independent external memory device rather than the embedded memory of an FPGA are:

• The RAM density and FPGA gate count can more closely match system requirements. Choosing an FPGA with embedded RAM can waste gates in order to obtain the required RAM size. This results in increased die area and system cost.



• The external RAM solution offers a simple migration path to higher-density RAM or FPGAs for future system upgrades. All that is needed is a simple component change plus minor changes to the interface logic. By using a standard JEDEC package that supports multiple densities (e.g. Micron and IDT have pin-compatible 2Mb, 4Mb, 8Mb RAM devices) the original circuit can be designed to accommodate various memory sizes.

#### Performance

Using FPGAs to interface with high-speed memory devices often requires meeting demanding clock-to-output or input setup-and-hold specifications. When writing to memory, the FPGA must output the data and control signals within a single system clock cycle as well as meet the external device's input setup-and-hold requirements. When reading from memory, the data access time (the time for the addressed data to appear on the bus) and the input setup-and-hold times on the FPGA are critical to performance.

Actel's SX/SX-A FPGA families are optimized for this application because the I/O cells are designed to offer the lowest clock-to-output delays in the programmable logic industry. The SX/SX-A families achieve worst-case clock-to-output delays of less than 4 ns and require 0 ns hold time and 0.5 ns setup time. This supports synchronous system performance of over 200 MHz when interfacing with external memory components. The global clock networks are also optimized for ultra-high performance and, in conjunction with their fine-grained antifuse architecture, are able to support internal clock rates in excess of 300 MHz.

SRAM-based FPGAs usually have larger die sizes than antifuse devices because SRAM switches must reside next to the gates. Antifuse switching elements, on the other hand, can be the same width as the routing channels and are located above them, so they do not add to the die size. The larger die sizes of SRAM-based FPGAs also cause additional delays in the device because the routing tracks are longer.
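
As a rough check on the 200 MHz figure quoted above, the sketch below adds the worst-case clock-to-output delay to the input setup time to get a minimum clock period. It assumes both numbers apply to the same register-to-register path and ignores board trace delay and clock skew (the `board_delay_ns` parameter and the function name are illustrative, not from the SX-A documentation).

```python
def max_sync_frequency_mhz(clock_to_out_ns: float, setup_ns: float,
                           board_delay_ns: float = 0.0) -> float:
    """Upper bound on the clock rate of a path that crosses a chip boundary.

    The minimum period is the launch delay plus any board delay plus the
    setup time required at the capturing register.
    """
    min_period_ns = clock_to_out_ns + board_delay_ns + setup_ns
    return 1000.0 / min_period_ns  # 1000 / ns gives MHz


# Figures quoted in the text: < 4 ns clock-to-output, 0.5 ns setup.
fmax = max_sync_frequency_mhz(clock_to_out_ns=4.0, setup_ns=0.5)
print(f"{fmax:.1f} MHz")  # prints "222.2 MHz"
```

With an actual external memory, the memory's own access, setup, and hold figures replace one side of this sum, as the Critical Timing section works through.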

## **Price Comparison**

We will use three design examples to demonstrate the cost effectiveness of the Actel SX-A family when combined with external SRAM components. The Actel SX-A FPGA family with external SRAM will be compared with the Xilinx Virtex, Altera FlexE, and Altera Apex FPGA families with internal SRAM. All prices are based on 1-99 piece quantities of standard-speed-grade FPGAs.

#### EXAMPLE#1: FPGA and 256k SRAM

Design requirements:

- Low gate count (8k logic gates)
- Low-density (256Kb), medium-performance (70 ns) SRAM
- Low-cost

Figure 1 shows a normalized comparison of the total system cost for the implementations in Table 2.



Figure 1 • FPGA and 256Kb SRAM

Table 2 • Alternative Implementations – FPGA with 256Kb SRAM

| Vendor            | Solution                  | SRAM Bits               | Package            |
|-------------------|---------------------------|-------------------------|--------------------|
| Actel/<br>Cypress | A54SX08A/<br>CY62256V     | None/<br>256k           | PQ208/<br>70SNC    |
| Altera            | EPF10K200E<br>or EP20K400 | 98k (max)<br>213k (max) | BC600-3<br>BC652-3 |
| Xilinx            | XCV400                    | 236k (max)              | -4HQ240C           |

#### EXAMPLE#2: FPGA and 1 Mb SRAM

Design requirements:

- Moderate gate count (32k logic gates)
- Medium-density (1Mb), high-performance (12 ns) SRAM
- Moderate cost

Figure 2 shows a normalized comparison of the total system cost for the implementations in Table 3.



Figure 2 • FPGA and 1Mb SRAM

Table 3 • Alternative Implementations – FPGA with 1Mb SRAM

| Vendor            | Solution                            | SRAM Bits    | Package         |
|-------------------|-------------------------------------|--------------|-----------------|
| Actel/<br>Cypress | A54SX32A-PQ208/<br>CY7C021V33-12VC  | None/<br>1Mb | PQ208/<br>SOJ44 |
| Altera            | EP20K400                            | 213k (max)   | BC652-3         |
| Xilinx            | XCV1000                             | 524k (max)   | -5BG560C        |

#### EXAMPLE#3: FPGA and 4 Mb SRAM

**Design requirements:** 

- Higher-density gate count (72k logic gates)
- High-density (4Mb), high-performance (ZBT or DDR) SRAM
- Higher cost

Figure 3 shows a normalized comparison of the total system cost for the implementations in Table 4.



Figure 3 • FPGA and 4Mb SRAM

Table 4 • Alternative Implementations – FPGA with 4Mb SRAM

| Vendor            | Solution                           | SRAM Bits     | Package         |
|-------------------|------------------------------------|---------------|-----------------|
| Actel/<br>Cypress | A54SX72A-PQ208/<br>CY7C1350-133AC  | None/<br>4 Mb | PQ208/<br>TQ100 |
| Altera            | EP20K400                           | 213k (max)    | BC652-3         |
| Xilinx            | XCV1000                            | 524k (max)    | -5BG560C        |

In addition to the advantage of lower system cost, the Actel solution can actually offer a lower component count. Volatile SRAM-based FPGAs require a second memory device for configuration at system power-up, resulting in a two-chip solution. If an external SRAM is also needed, this requires a third chip. Actel's nonvolatile FPGAs do not require a boot PROM at power-up, thus lowering the overall system component count.

# Implementing Distributed Memory and Interfacing to External FIFOs

Actel's ACTgen tool can generate Register Files and FIFO Flag Controller Logic. The Register File macro for the SX and SX-A families synthesizes the functional equivalent of small RAM blocks using ordinary logic. The register file macro is unique to these two families and provides the designer with a dual-port synchronous RAM architecture with parameterized word length and depth.



The FIFO Flag Controller is designed for off-chip RAM. It provides the state machine logic for flags used by a FIFO. Flag signal choices include Full, Empty, Almost Full, and Almost Empty. The threshold points for triggering the Almost Full and Almost Empty Flags can be specified by the user along with the word length and depth of the external RAM. A global reset capability for the FIFO address pointers and flag logic is also provided. Detailed information about these two macros can be found in Actel's online manual, *A Guide to ACTgen Macros*.
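
The flag behavior described above can be sketched as a small behavioral model. This is not the ACTgen macro itself: the class name, the method names, and the exact threshold comparisons (Almost Full when the word count is at or above its threshold, Almost Empty when at or below) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FifoFlagController:
    """Behavioral model of flag logic for a FIFO whose storage is off-chip."""
    depth: int                   # number of words in the external RAM
    almost_full_threshold: int   # assumed: flag set when count >= this value
    almost_empty_threshold: int  # assumed: flag set when count <= this value
    write_ptr: int = 0
    read_ptr: int = 0
    count: int = 0

    def reset(self) -> None:
        """Global reset of the address pointers and flag state."""
        self.write_ptr = self.read_ptr = self.count = 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def almost_full(self) -> bool:
        return self.count >= self.almost_full_threshold

    @property
    def almost_empty(self) -> bool:
        return self.count <= self.almost_empty_threshold

    def write(self) -> int:
        """Return the external RAM address to write, then advance the pointer."""
        if self.full:
            raise OverflowError("write attempted while Full is asserted")
        address = self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1
        return address

    def read(self) -> int:
        """Return the external RAM address to read, then advance the pointer."""
        if self.empty:
            raise IndexError("read attempted while Empty is asserted")
        address = self.read_ptr
        self.read_ptr = (self.read_ptr + 1) % self.depth
        self.count -= 1
        return address


fifo = FifoFlagController(depth=8, almost_full_threshold=6,
                          almost_empty_threshold=2)
for _ in range(6):
    fifo.write()
print(fifo.almost_full, fifo.full)  # prints "True False"
```

Only the pointers and the word count live in the FPGA in this model; the data words themselves stay in the external RAM at the returned addresses.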

# **Design Example**

To demonstrate SX-A's high-speed interfacing capability, the following VHDL example describes a ZBT interface for use in very high-performance networking systems.

For scalability, the data, address, and parity bus widths as well as ByteWrite Select widths can easily be changed to accommodate specific system requirements. The VHDL generics that define the bus sizes for this example are listed in Table 5. The design example consists of two major blocks – the ZBT SRAM Interface and the ZBT SRAM Controller.

| Generic           | Value (in this example) |
|-------------------|-------------------------|
| Address           | 18                      |
| Data              | 32                      |
| Data Parity       | 4                       |
| Byte Write Select | 4                       |
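
To make these widths concrete, the short calculation below derives the word width, the number of addressable words, and the total address space they imply. It assumes every address maps to a distinct 36-bit word (32 data bits plus 4 parity bits); the constant names are illustrative and are not the generic names used in the VHDL source.

```python
ADDRESS_BITS = 18
DATA_BITS = 32
PARITY_BITS = 4
BYTE_WRITE_SELECTS = 4

word_bits = DATA_BITS + PARITY_BITS              # 36 bits per word
words = 2 ** ADDRESS_BITS                        # 262144 addressable words
total_bits = words * word_bits                   # 9437184 bits
bits_per_write_select = word_bits // BYTE_WRITE_SELECTS  # 8 data + 1 parity

print(f"{words} words x {word_bits} bits = {total_bits / 2**20:.0f} Mb")
# prints "262144 words x 36 bits = 9 Mb"
```

Changing the address or data widths scales these figures directly, which is why the bus sizes are exposed as parameters.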

# **ZBT SRAM Address/Data Interface**

The ZBT SRAM interface is between the SX-A FPGA's internal logic and the external ZBT SRAM. The logic in this block was optimized for a 200 MHz external ZBT SRAM interface. The logic required by the interface and the percentage that this would consume in an Actel SX-A device are listed in Table 6.

The slowest A54SX32A timing path between registers, using worst-case commercial operating conditions and a -3 speed grade, is 3.7 ns. This equates to a worst-case performance capability of 270 MHz.
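
The conversion from the slowest register-to-register path to a clock rate is simply the reciprocal of the delay. The sketch below reproduces it and also shows the slack against the 5 ns period of the 200 MHz target (function and variable names are illustrative).

```python
def fmax_mhz(critical_path_ns: float) -> float:
    """Maximum clock rate, in MHz, for a given worst-case path delay in ns."""
    return 1000.0 / critical_path_ns


TARGET_MHZ = 200.0
interface_path_ns = 3.7  # slowest interface path quoted in the text

interface_fmax = fmax_mhz(interface_path_ns)
slack_ns = 1000.0 / TARGET_MHZ - interface_path_ns

print(f"{interface_fmax:.0f} MHz, {slack_ns:.1f} ns of slack at 200 MHz")
# prints "270 MHz, 1.3 ns of slack at 200 MHz"
```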

#### **ZBT SRAM Controller**

The ZBT SRAM Controller is the logic that drives the ZBT SRAM's control signals (Table 7). This block stands in for the FPGA designer's unique system logic, which would also contain the ZBT SRAM controller, and performs the following functions (at 200 MHz):

- Synchronous State Machine
- State Machine Output Signal Generation
  - ZBT SRAM Control Signals
  - ZBT SRAM Address Signals
  - ZBT SRAM Data Values
  - Data Comparator and Error Output Generation

The slowest timing path between registers in the controller is 4.7 ns. This equates to a performance capability of 212 MHz. Again, this is calculated for the A54SX32A, -3 speed grade, using worst-case commercial operating conditions. The ZBT controller is implemented as a state machine that generates single-word pipelined read and write cycles, and quad-word pipelined burst read and write cycles. These four unique cycle types are listed below:

1. Pipelined Write Cycle (1 x 36-bit word)
2. Pipelined Read Cycle (1 x 36-bit word)
3. Pipelined Burst Write Cycle (4 x 36-bit words)
4. Pipelined Burst Read Cycle (4 x 36-bit words)

The ZBT SRAM controller has been designed to continually perform single and burst read and write command cycles in the sequence listed in Table 8. These cycles are arranged to demonstrate all combinations of back-to-back cycles: read-to-read, read-to-write, write-to-write, and write-to-read. All cycle types run at 200 MHz, with 100% bus utilization and no dead cycles or wait cycles at any time. The data bus has a unique data value on every clock cycle.

The timing diagram for these cycles is shown in Figure 4.

Table 6 • ZBT SRAM Interface Logic Requirement

| Logic Cell Type | Number of Cells | A54SX08A Utilization | A54SX16A Utilization | A54SX32A Utilization | A54SX72A Utilization |
|-----------------|-----------------|----------------------|----------------------|----------------------|----------------------|
| Combinatorial   | 24              | 5%                   | 2.5%                 | 1.3%                 | 0.6%                 |
| Sequential      | 176             | 69%                  | 33%                  | 16%                  | 9%                   |

Table 7 • ZBT SRAM Controller Logic Requirement

| Logic Cell Type | Number of Cells | A54SX08A Utilization | A54SX16A Utilization | A54SX32A Utilization | A54SX72A Utilization |
|-----------------|-----------------|----------------------|----------------------|----------------------|----------------------|
| Combinatorial   | 60              | 12%                  | 6.5%                 | 3.3%                 | 1.5%                 |
| Sequential      | 62              | 24%                  | 12%                  | 5.8%                 | 3.1%                 |

Table 8 • ZBT SRAM Command Cycle Sequence

| Cycle Type                       | Data Width                 |
|----------------------------------|----------------------------|
| Write Pipelined                  | 1 x 36-bit Data Word Cycle |
| Read Pipelined                   | 1 x 36-bit Data Word Cycle |
| Write Pipelined                  | 1 x 36-bit Data Word Cycle |
| Read Pipelined                   | 1 x 36-bit Data Word Cycle |
| Write Pipelined                  | 4 x 36-bit Data Word Cycle |
| Read Pipelined                   | 4 x 36-bit Data Word Cycle |
| Repeat this sequence continually |                            |
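
The command sequence in Table 8 can be modeled as a repeating schedule in which every clock carries one 36-bit word. The sketch below is a behavioral illustration only, not the VHDL state machine; it assumes one word per clock with no idle cycles, as the text states, and the names are hypothetical.

```python
from itertools import cycle, islice

# (operation, burst length in words), in the order given in Table 8
SEQUENCE = [("write", 1), ("read", 1), ("write", 1),
            ("read", 1), ("write", 4), ("read", 4)]
WORD_BITS = 36
CLOCK_MHZ = 200


def bus_schedule(num_clocks: int) -> list:
    """Return the (operation, beat) occupying the data bus on each clock."""
    def beats():
        for operation, burst_length in cycle(SEQUENCE):
            for beat in range(burst_length):
                yield operation, beat
    return list(islice(beats(), num_clocks))


clocks_per_pass = sum(burst for _, burst in SEQUENCE)  # 12 clocks
bandwidth_gbps = WORD_BITS * CLOCK_MHZ / 1000          # parity bits included

print(clocks_per_pass, "clocks per pass,", bandwidth_gbps, "Gb/s")
# prints "12 clocks per pass, 7.2 Gb/s"
```

Because the schedule wraps from the final burst read straight back to a single write, one pass exercises every read/write transition without an idle clock.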



Figure 4 • FPGA ZBT Controller Continuous ZBT Cycle Sequence

# **Critical Timing**

To interface to a high-speed ZBT SRAM device, an FPGA must have very fast input/output (I/O) buffers. Every cycle can be either a read or a write cycle, and the cycle time at a 200 MHz clock rate is only 5 ns. The most challenging timing specifications are those involving bus turnaround: from a write to a read cycle or from a read to a write cycle.

## Write-to-Read Cycle Transition

When transitioning from a write cycle to a read cycle, the requirements on the FPGA are listed below:

- Write Cycle: The FPGA must ensure that write data is valid for a minimum setup time of $t_{su} = 1.5$ ns before ZBT\_CLK.
- Write Cycle: The FPGA must ensure that write data is valid for a minimum hold time of $t_{h} = 0.5$ ns after ZBT\_CLK.

To meet these very small input setup and hold times, the ZBT data pins are driven from the FPGA through bidirectional buffers (BIBUF macro), and the ZBT clock is driven from the FPGA through an output buffer (OUTBUF macro). The small difference between the BIBUF delay and the OUTBUF delay, combined with the very fast SX-A FPGA, allows the write data to meet the SRAM's setup and hold requirements.
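
One way to reason about the write path is as a forwarded-clock interface: the data arrives at the SRAM later than ZBT\_CLK by the difference between the two buffer delays. The sketch below computes setup and hold margins on that basis. It assumes data and clock are launched from the same internal clock edge and see equal board delays, and the buffer delay values are placeholders chosen for illustration, not Actel datasheet figures.

```python
def write_margins_ns(t_cyc, t_su, t_h,
                     bibuf_min, bibuf_max, outbuf_min, outbuf_max):
    """Setup and hold margins at the SRAM for data forwarded with its clock.

    New data appears (BIBUF delay - OUTBUF delay) after the ZBT_CLK edge,
    so the worst cases pair the slowest data with the fastest clock (setup)
    and the fastest data with the slowest clock (hold).
    """
    latest_data = bibuf_max - outbuf_min    # hurts setup to the next edge
    earliest_data = bibuf_min - outbuf_max  # hurts hold after this edge
    setup_margin = t_cyc - latest_data - t_su
    hold_margin = earliest_data - t_h
    return setup_margin, hold_margin


# t_cyc, t_su, and t_h come from the text; buffer delays are hypothetical.
setup_margin, hold_margin = write_margins_ns(
    t_cyc=5.0, t_su=1.5, t_h=0.5,
    bibuf_min=2.8, bibuf_max=3.2,
    outbuf_min=2.0, outbuf_max=2.2)
print(f"setup margin {setup_margin:.1f} ns, hold margin {hold_margin:.1f} ns")
```

Both margins must be positive; real values would come from post-layout timing analysis of the specific device and speed grade.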



#### **Read-to-Write Cycle Transition**

When transitioning from a read cycle to a write cycle, the requirements on the FPGA are listed below:

- Read Cycle: The ZBT SRAM can use a maximum of $t_{rco} = 3.2$ ns to provide valid read data after ZBT\_CLK.
- Read Cycle: The ZBT SRAM will only hold the read data for a minimum hold time of $t_{cdc} = 1.0$ ns after ZBT\_CLK.

When a ZBT read cycle is performed, the ZBT SRAM therefore guarantees a setup time of only 1.8 ns ($t_{cyc} - t_{rco} = 5.0\ \text{ns} - 3.2\ \text{ns} = 1.8\ \text{ns}$) and a hold time of only $t_{cdc} = 1.0$ ns, both with respect to ZBT\_CLK.

To meet these very small input setup and hold times, the ZBT data pins are fed to generic FPGA input buffers (INBUF macro), and ZBT\_CLK is fed back to the FPGA using the HCLKBUF macro. The HCLKBUF macro is a hardwired clock resource that has a very small insertion delay. The small difference between the INBUF insertion delay and the HCLKBUF insertion delay, combined with the very fast SX-A FPGA, allows the read data to meet these setup and hold times.
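
The read-side budget can be checked with simple arithmetic. The sketch below compares the window the SRAM guarantees with the FPGA input requirements quoted in the Performance section (0.5 ns setup, 0 ns hold). It ignores board delay, clock skew, and the INBUF/HCLKBUF insertion-delay difference, so the margins are optimistic upper bounds; the variable names are illustrative.

```python
T_CYC_NS = 5.0   # 200 MHz clock period
T_RCO_NS = 3.2   # SRAM maximum clock-to-data-valid
T_CDC_NS = 1.0   # SRAM minimum data hold after clock

FPGA_SETUP_NS = 0.5  # SX/SX-A input setup requirement quoted in the text
FPGA_HOLD_NS = 0.0   # SX/SX-A input hold requirement quoted in the text

setup_window_ns = T_CYC_NS - T_RCO_NS              # setup offered by the SRAM
setup_margin_ns = setup_window_ns - FPGA_SETUP_NS  # left over for the board
hold_margin_ns = T_CDC_NS - FPGA_HOLD_NS

print(f"setup window {setup_window_ns:.1f} ns, "
      f"setup margin {setup_margin_ns:.1f} ns, "
      f"hold margin {hold_margin_ns:.1f} ns")
# prints "setup window 1.8 ns, setup margin 1.3 ns, hold margin 1.0 ns"
```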

#### Conclusion

Actel SX-A and SX FPGAs are very high-performance programmable logic devices. When storage space is needed, they can implement small quantities of storage on the chip using logic functions. When large quantities of storage are needed, an external memory device is recommended. The larger the memory requirement, the less attractive SRAM FPGAs become, especially when additional support devices like boot-up PROMs are considered. The flexibility of selecting optimally-sized memory devices from multiple vendors, combined with cost savings arising from choosing an FPGA with optimal gate count, makes a compelling case for implementing a two-chip, FPGA plus external RAM solution.

Actel and the Actel logo are registered trademarks of Actel Corporation. All other trademarks are the property of their owners.



http://www.actel.com
