# FP7- Grant Agreement no. 283393 - RadioNet3

<u>Project name</u>: Advanced Radio Astronomy in Europe

Funding scheme: Combination of CP & CSA

Start date: 01 January 2012 <u>Duration:</u> 48 months



# Deliverable 8.9 Revised Hardware Design Document

Due date of deliverable: 2015-08-31 Actual submission date: 2015-11-17

Deliverable Leading Partner: STICHTING ASTRONOMISCH ONDERZOEK IN NEDERLAND (ASTRON), The Netherlands





# 1 Document information

Document name: Revised Hardware Design Document

Type Report

WP 8 (UniBoard<sup>2</sup>)

Authors Gijs Schoonderbeek (ASTRON)

#### 1.1 Dissemination Level

| Dissemination Level | Description                                                                           |
|---------------------|---------------------------------------------------------------------------------------|
| PU                  | Public                                                                                |
| PP                  | Restricted to other programme participants (including the Commission Services)        |
| RE                  | Restricted to a group specified by the consortium (including the Commission Services) |
| CO                  | Confidential, only for members of the consortium (including the Commission Services)  |

# 1.2 Terminology

ADC Analogue to Digital Converter

BF BeamFormer
bps Bits per second
BW BandWidth

COTS Commercial Off The Shelf

DDR Double Data Rate

EMI Electro-Magnetic Interference

Firmware Embedded or real-time code that runs on a microprocessor (e.g. written in C)

FPGA Field Programmable Gate Array

GMII GbE media independent Interface 8 bits @125MHz

Hardware Boards, subracks and COTS equipment

HDL Hardware Description Language

IO Input Output

IP Intellectual Property

IPC Association Connecting Electronics Industries (formerly Institute for Interconnecting and Packaging Electronic Circuits)

LDO Low DropOut regulator

LUT Look Up Table

MAC Multiply and Accumulate, Medium Access, Monitoring and Control, Media Access Controller (layer 2 of OSI model)

PCB Printed Circuit Board

POL Point of Load

PHY physical interface (layer 1 of OSI model)

RF Radio Frequency

RSP Remote Station Processing (in LOFAR)

SFP+ SFP for 10GbE

Subband Frequency band, unit output of the filterbank

XAUI 10G attachment Unit Interface (4x 3.125Gbps) interface between MAC and PHY

XGMII 10G Media Independent Interface

XFP 10G Small form-factor Pluggable transceiver

#### 1.3 References

- [1] UniBoard2 Work Package description, Arpad Szomoru, 2011-10-28, RadioNet3 283393
- [2] UniBoard Evaluation, Gijs Schoonderbeek, March 2012, ASTRON-RP-1316 1.0
- [3] Arria 10 Device Overview, Altera 2013.09.04, <a href="http://www.altera.com/literature/hb/arria-10/arria\_10\_aib.pdf">http://www.altera.com/literature/hb/arria-10/arria\_10\_aib.pdf</a>
- [4] UltraScale Architecture and Product Overview, Xilinx, DS890
- [5] QSFP+ Cage and PT Connector Assembly, TE Connectivity, 108-127010
- [6] Product Specification 40BASE-SR4/10GBASE-SR 300m QSFP+ Gen2 Optical Transceiver Module FTL41QD2C, Finisar, 24-Jul-13 Rev B5
- [7] CXP Connector and Housing Assembly, TE Connectivity, 114-13283, 20 AUG 12 RevD
- [8] Impact<sup>TM</sup> Backplane Connector and Cable Assembly System, Molex <a href="http://www.molex.com/molex/products/family?key=impact\_backplane\_connector\_system">http://www.molex.com/molex/products/family?key=impact\_backplane\_connector\_system</a> &channel=PRODUCTS&chanName=family&pageTitle=Introduction#related
- [9] ERmet ZDplus, ERNI Electronics GmbH
- [10] MiniPOD<sup>™</sup> Datasheet, Avago, AV02-3438EN, July 12, 2012
- [11] Experimental evaluation of Avago MicroPOD 120Gbs module, Peter Maat, December 2012, ASTRON / DOME
- [12] Cyber SKA website: <a href="http://www.cyberska.org/pg/groups/27370/powermx-platform-working-group/">http://www.cyberska.org/pg/groups/27370/powermx-platform-working-group/</a>
- [13] InfinX High Speed Mezzanine Connector, Amphenol TCS
- [14] VSC7389 SparX-G16 Datasheet, Vitesse Semiconductor Corp, VMDS-10187 Revision 2.4 December 2006
- [15] AVSP-1104 Datasheet, Avago Technologies, version 1.4.3, November 2012
- [16] VSC3144-12 Product brief, Vitesse Semiconductor Corp. VPPD-02743 Revision 1.0 2011
- [17] BCM56540 Product brief, Broadcom, 56540-PB01-R April 28,2011
- [18] Micron HMC: http://www.micron.com/products/hybrid-memory-cube/short-reach-hmc
- [19] Hybrid Memory Cube Specification 1.0, Hybrid Memory Cube consortium, 2013
- [20] Hybrid Memory Cube consortium: http://www.hybridmemorycube.org/
- [21] Breaking The 1Tbps Barrier With UniBoard2, Kai Schmidt, EBV <a href="http://blog.ebv.com/breaking-the-1tbps-barrier-with-uniboard2/">http://blog.ebv.com/breaking-the-1tbps-barrier-with-uniboard2/</a>
- [22] UniBoard2 Measurement Report, Gijs Schoonderbeek, Sjouke Zwier and Leon Hiemstra, August 2015, ASTRON-RP-1494 1.0

# 1.4 Content

| 1  |     | Document information                           |
|----|-----|------------------------------------------------|
|    | 1.1 | Dissemination Level                            |
|    | 1.2 | Terminology                                    |
|    | 1.3 | References                                     |
|    | 1.4 | Content                                        |
| 2  |     | Introduction                                   |
|    | 2.1 | Scope                                          |
|    | 2.2 | System functionality                           |
|    | 2.3 | General Design Standpoints                     |
|    | 2.4 | UniBoard1                                      |
|    | 2.5 | Mezzanine                                      |
|    | 2.6 | Single Column                                  |
|    | 2.7 | Overview                                       |
| 3  |     | Technologies                                   |
|    | 3.1 | FPGA                                           |
|    | 3.2 | In- and output interfaces                      |
|    | 3.3 | Memory                                         |
|    | 3.4 | PCB                                            |
|    | 3.5 | Mezzanine connector                            |
|    | 3.6 | Clock and control                              |
|    | 3.7 | CPU                                            |
|    | 3.8 | Ethernet Switch                                |
|    | 3.9 | Test strategy                                  |
| 4  |     | System examples                                |
|    | 4.1 | Making a beamformer with UniBoard<sup>2</sup>  |
|    | 4.2 | Making a Correlator with UniBoard<sup>2</sup>  |
| 5  |     | What will it look like?                        |
|    | 5.1 | Meshing on UniBoard                            |
| 6  |     | Results Prototype                              |
| 7  |     | Modification for Rev 2.0                       |
| 8  |     | Appendix A, Measurement Report                 |
| 1  |     | Introduction                                   |
|    | 1.1 | Applicable documents (AD)                      |
|    | 1.2 | Reference documents (RD)                       |
|    | 1.3 | Abbreviations                                  |
|    | 1.4 | Used Equipment                                 |
| 2  |     | Impedance                                      |
| 3  |     | Power up                                       |
|    | 3.1 | Power Sequencing                               |
|    | 3.2 | 1000uF Hold-up                                 |
|    | 3.3 | Input Current                                  |
| 4  |     | Boundary Scan                                  |
|    | 4.1 | Standard Tests                                 |
|    | 4.2 | JTAG Functional Test (JFT)                     |
| 5  |     | Ethernet communication                         |
| 6  |     | CONFIG                                         |
| 7  |     | DDR4 test                                      |
| 8  |     | SerDes                                         |
| 9  |     | Conclusion                                     |
| 10 |     | Memory Calibration Report                      |

#### 2 Introduction

#### 2.1 Scope

UniBoard<sup>2</sup> is a Joint Research Activity (JRA) in the RadioNet3 project [1], funded by the EC through the FP7 programme, under grant agreement no. 283393. The partners in this JRA are the Universities of Bordeaux and Orleans (UB, UORL respectively), INAF, MPG, the University of Manchester (UMAN), ASTRON and JIVE. This document started out as an overview of several options for the high-level hardware design of UniBoard<sup>2</sup> and has served as a discussion piece in the collaboration. It summarizes a number of board design options, incorporating the inputs and suggestions of several of the collaborators. This final version, which is submitted as deliverable D8.2, reflects the consensus on the high-level design and functionality of the hardware platform among the UniBoard<sup>2</sup> partners. This being a high-level design document, throughout the actual hardware design phase several more in-depth technical documents will be produced and published on the activity RadioNet3 wiki.

# 2.2 System functionality

In Figure 1 the block diagram of a general beamforming or correlator architecture is shown. This functionality can be mapped onto a single UniBoard, or onto multiple boards on a backplane.



Figure 1 System block diagram

At the input side, data from multiple sources are aligned and fed into a filterbank. In the filterbank the frequency band is split into multiple channels. The channels are transposed from all frequency channels of one signal input to all signal inputs of one frequency channel. This transpose function can be partly implemented in firmware (on the FPGA) and in the routing between the FPGAs. In the last stage the data from the different inputs is processed (beams are formed or correlation products calculated).
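The transpose described above can be illustrated with a minimal Python sketch. The function name `corner_turn` and the array sizes are illustrative assumptions, not part of the UniBoard<sup>2</sup> design; in the real system this reordering is done in firmware and in the routing between the FPGAs.

```python
def corner_turn(per_input):
    """Reorder data[input][channel] into data[channel][input]."""
    # zip(*rows) walks all inputs in lockstep, one frequency channel at a time.
    return [list(channel) for channel in zip(*per_input)]


# Hypothetical example: 3 signal inputs with 4 frequency channels each.
# Every sample is labelled (input, channel) so the reordering is visible.
per_input = [[(i, c) for c in range(4)] for i in range(3)]
per_channel = corner_turn(per_input)

print(per_channel[2])  # all signal inputs for frequency channel 2
```

After the transpose, each row holds one frequency channel for all inputs, which is the layout the beamformer or correlator stage consumes.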

# 2.3 General Design Standpoints

For the design of UniBoard<sup>2</sup>, a processing system for antenna signals, we want to use the >50% principle. This means that:

- >50% of the power consumption goes to the processing of the antenna signals
- >50% of the cost goes to the processing of the antenna signals
- >50% of the design time goes to the development for the processing

#### 2.4 UniBoard1

In Figure 2 the first concept is shown. This concept is based on the first UniBoard [2], which has the maximum possible number of FPGAs on a reasonably sized board with a semi-full mesh connecting the FPGAs. On one side of the board standard optical interconnections are placed, on the other side backplane connectors. Through these connectors board-to-board connections can be established via a backplane.



Figure 2 Concept 1 UniBoard like

#### Pro's

- Minimize external interconnections
- Reuse architecture from UniBoard
- Full speed FPGA-to-FPGA interconnections on the board.

#### Con's

- Scaling UniBoard<sup>2</sup> to large systems is limited
- Fixed routing
- Fixed ratio between front and back nodes.

#### 2.5 Mezzanine

The second concept is the mezzanine concept. In this concept the fixed mesh is replaced by a mezzanine board. Mezzanines can have different mesh structures. By replacing the mesh with active components like a switch, flexible routing or data transport can be offloaded from the main board. A parallel optics module placed on a mezzanine board would allow short range optical interconnections.



Figure 3 Concept 2 Mezzanine

#### Pro's

- Flexible

#### Con's

- Many connectors and crossings, introducing risk and signal integrity (SI) issues
- Height of the combination limits rack density
- Cooling is more complicated

# 2.6 Single Column

The third concept is the single column concept. With the mezzanine connectors replaced by backplane connectors, a single column of FPGAs remains. In this concept all FPGAs have the same interconnection scheme.



Figure 4 Concept 3 Single Column

#### Pro's

- Flexible
- Smaller boards are less expensive
- All FPGAs have the same IO interfaces.

#### Con's

- Connectors and crossings introducing risk and SI issues

# 2.7 Overview

An overview of the architectures is shown in Table 1.

**Table 1 Overview architecture selection** 

|                                   | UniBoard like                                         | Mezzanine                                                                | Single Column                                     |
|-----------------------------------|-------------------------------------------------------|--------------------------------------------------------------------------|---------------------------------------------------|
| Density                           | 8 FPGAs                                               | 8 FPGAs                                                                  | 4-5 FPGAs                                         |
| FPGA-FPGA speed                   | Optimal no connector crossings                        | Connector crossing limiting maximal throughput                           | Connector crossing limiting maximal throughput    |
| Cost                              | Lowest cost per MAC                                   | Extra connectors needed                                                  | For 8 FPGAs extra control logic and PCBs needed.  |
| Flexibility                       | Ratio first stage / second stage processing fixed 1:1 | Ratio first stage / second stage processing programmable with mezzanine. | Ratio in multiple of fours.                       |
| # Hops from one node to any other | 2, within subrack                                     | 2 with mesh, or 1 with parallel optics                                   | 1                                                 |
| Parallel ADC BW                   | 4 FPGA (50% of the FPGAs)                             | 4 FPGA (50% of the FPGAs)                                                | 4 FPGA (100% of the FPGAs)                        |

# 3 Technologies

#### 3.1 FPGA

# 3.1.1 Configuration

The requirements for the configuration are:

- Remote configuration (downloading new images in the flash) must be possible.
- Configuration within 5 sec

# 3.1.2 Altera Options

Some of the options are summarized in Table 2.

**Table 2 Altera FPGAs** 

| Device | Multipliers | KLE  | Mbits | CPU | Max. Trans. Speed |
|--------|-------------|------|-------|-----|-------------------|
| GX660  | 3356        | 660  | 42    | No  | 17.4 Gbps         |
| GX900  | 3036        | 900  | 47    | No  | 17.4 Gbps         |
| GX1150 | 3036        | 1150 | 53    | No  | 17.4 Gbps         |
| GT900  | 3036        | 900  | 47    | No  | 28 Gbps           |
| SX660  | 3036        | 660  | 42    | Yes | 17.4 Gbps         |

**Table 3 IO per package**

| Device | F40 (1517 pins) Transceivers | F40 (1517 pins) IO | F45 (1932 pins) Transceivers | F45 (1932 pins) IO |
|--------|------------------------------|--------------------|------------------------------|--------------------|
| GX900  | 48                           | 624                | 96                           | 480                |
| GT900  | 48                           | 624                | 96                           | 480                |
| GX1150 | 48                           | 624                | 96                           | 480                |
| SX660  | 48                           | 588                | -                            | -                  |

More information can be found in [3].

#### 3.1.3 FPGA pinning

To get a feeling for the pinning needed for this setup, Table 4 and Table 5 explore the pin requirements.

**Table 4 Preliminary standard IO pinning requirements** 

| Function             | Pins per block   | Blocks | Total pins |
|----------------------|------------------|--------|------------|
| DDR4                 | 140 (SE)         | 4      | 560        |
| LVDS                 | 9 (DIFF) 18 (SE) | 4      | 72         |
| Test IO              | 24               | 1      | 24         |
| Monitoring & Control | 4                | 1      | 4          |
| Total                |                  |        | 660        |

Comparing with Table 3, the required pins exceed the available pins. Some choices have to be made!

**Table 5 Preliminary Serial IO pinning requirements**

| Function  | Pairs per block | Blocks | Total pairs |
|-----------|-----------------|--------|-------------|
| QSFP+     | 4               | 4      | 16          |
| Backplane | 4               | 8      | 32          |
| Power Mx  | 12              | 1      | 12          |
| Total     |                 |        | 60          |
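The pin budget can be checked with a short Python sketch that sums the requirements of Table 4 and Table 5 and compares them with the availability in Table 3. The variable names are illustrative, and treating one "pair" in Table 5 as one transceiver channel is an assumption made here for the comparison.

```python
# Requirements from Table 4 (standard IO): function -> (pins per block, blocks)
standard_io = {
    "DDR4": (140, 4),
    "LVDS": (18, 4),  # 9 differential pairs = 18 single-ended pins
    "Test IO": (24, 1),
    "Monitoring & Control": (4, 1),
}
# Requirements from Table 5 (serial IO): function -> (pairs per block, blocks)
serial_io = {"QSFP+": (4, 4), "Backplane": (4, 8), "Power MX": (12, 1)}

# Availability from Table 3 (GX900 / GT900 / GX1150): package -> (transceivers, IO)
packages = {"F40": (48, 624), "F45": (96, 480)}


def total(requirements):
    """Sum per-block counts multiplied by the number of blocks."""
    return sum(per_block * blocks for per_block, blocks in requirements.values())


io_needed = total(standard_io)    # 660 pins, as in Table 4
serial_needed = total(serial_io)  # 60 pairs, as in Table 5

for name, (transceivers, io) in packages.items():
    # Assumption: one pair in Table 5 occupies one transceiver channel.
    print(name, "IO margin:", io - io_needed,
          "transceiver margin:", transceivers - serial_needed)
```

A negative margin means the package cannot satisfy the preliminary requirement, which is the trade-off between memory IO and transceiver count discussed in this section.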

# 3.2 In- and output interfaces

For the in- and output interface standard 40/100GbE interfaces will be used.

#### 3.2.1 Front panel interface

On the front side an optical interface will be placed. In Figure 5 an overview of the interface is shown. For UniBoard<sup>2</sup> no media converter is needed. The 10/25Gbps interface can interface directly to the optical module.



Figure 5 Block diagram optical interface

The length of the connection outside the board is not known beforehand, as this can be anywhere from <10m, in order to connect multiple cabinets, up to 10km, for connecting nearby telescopes. Different modules can be plugged into the cage on the board to accommodate both short and long range applications.

An overview of industry standard IO solutions is shown in Table 6.

**Table 6 Standard IO solutions**

| Technology | Pro | Con | Lanes | Board area |
|------------|-----|-----|-------|------------|
| QSFP+      | Pluggable module<br>Industry ready (Finisar / Avago module available for 40GBASE-SR4)<br>MSA (IEEE 802.3ba) |  | CAUI-4 (4x up to 10.5Gb/s → 40Gbps) | Stacked 25x73mm |
| CXP        | Pluggable module<br>Industry ready (Finisar module available for 100GBASE-SR10)<br>IEEE 802.3ba | Short range | CAUI (12x12.5Gb/s → 100Gbps) | Stacked 27x69mm |
| CFP        | Pluggable module<br>Industry ready (Finisar) | Large footprint | In: CAUI<br>Out: 4x25 |  |
| SFP+       | Pluggable module | Single channel takes more board space | 1x 25Gbps |  |

The technologies are described in more detail in the following subsections.

#### 3.2.1.1 QSFP+

UniBoard<sup>2</sup> will have a cage for QSFP+ pluggable modules. In Figure 6 an example of a TE cage is shown [5]. Multiple suppliers like Molex, TE, Samtec make these cages. Instead of QSFP+ a zQSFP+ cage could be placed, optimised for higher data rates like 100Gbps (4x28Gbps).



Figure 6 TE QSFP+ cage

An example of optical modules that could be used is the Finisar FTL410QD2C 40GBASE-SR4 [6]. Besides optical modules direct attach copper cables can be plugged in as well. An example is the Cisco QSFP-H40G-CU50CM, shown in Figure 7.



Figure 7 QSFP pluggable 40GBASE-SR4 module (left) and passive direct attach copper (right)

The power consumption of the modules is of the order of 1.5W. With eight modules on the board a total of 12W will be consumed.

#### 3.2.1.2 CXP

CXP is a 12 fibre standard in which each fibre can handle 10Gbps data rates, resulting in 120Gbps per interface. This standard is used for short distances. Cages are available through Molex and TE [7], modules from Finisar.



Figure 8 CXP cage and optical connectors

#### 3.2.2 Backplane interface

For the backplane side a copper interface will be made. Through this interface multiple UniBoards can be connected to a backplane. The backplane interface will be used for:

- ADC input
- Serial connectivity
- Power input
- Control signals

#### 3.2.2.1 Serial connectivity

As described by Altera [3] speeds of up to 14Gbps can be achieved on a backplane. With 24 transceivers the maximal throughput per FPGA will be of the order of 300Gbps.
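The order-of-magnitude figure above follows from a simple multiplication, sketched below in Python. The function name and the optional coding efficiency are illustrative assumptions; 64b/66b line coding is used only as an example of how the payload rate can be lower than the line rate.

```python
def aggregate_gbps(lanes, lane_rate_gbps, coding_efficiency=1.0):
    """Aggregate one-direction throughput of a group of serial lanes."""
    return lanes * lane_rate_gbps * coding_efficiency


raw = aggregate_gbps(24, 14)               # line rate: 336 Gbps
payload = aggregate_gbps(24, 14, 64 / 66)  # assuming 64b/66b line coding
print(f"raw {raw:.0f} Gbps, payload {payload:.1f} Gbps")
```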

The backplane connector used for UniBoard1 is capable of up to 12 Gbps. For UniBoard<sup>2</sup> another type of connector might be more advantageous. In Figure 9 an example of an orthogonal connector from Molex [8] capable of data rates up to 25Gbps is shown. With liquid cooling, boards do not need to be mounted vertically anymore, which makes different board arrangements possible.



Figure 9 Molex Orthogonal connector

#### Orthogonal plugging

More details about the orthogonal pinning are shown in Figure 10. From this detail it can be seen that when orthogonal boards are to be used, a small backplane pass connector could be used. The pinning is point symmetric. This should constrain the pinning of the connector (B1-C1 for transmit, T11-S11 for receive).



NOTE: PINOUTS SHOWN IN BALLOONS ARE GROUNDS.
PINOUTS SHOWN ARE FOR SQUARE (12 COLUMN) PATTERN ONLY.

#### Figure 10 orthogonal pinning

In section 5.1 an example of using orthogonal connectors is shown.

#### Coplanar plugging

Instead of orthogonal plugging back-to-back or coplanar plugging can be used as well, see Figure 11.



Figure 11 Coplanar plugging

Due to the ground pins it is, however, not possible to use the same board for the normal daughter card side and the RAM assembly. This connection type can be used between UniBoard and breakout board (not between two UniBoards).

#### 3.2.2.2 Standard backplane connectors

The standard backplane connectors as used by UniBoard are shown in Figure 12. With proper layout this connector is capable of 10Gbps. For a modified version (with the same pinning) the speed is increased to 20Gbps (including back drilling) [9].



Figure 12 ERmet ZD backplane connector

#### 3.2.2.3 ADC interface

Part of the backplane interface is an ADC interface. For UniBoard1 an LVDS type interface is implemented, with four times 8 bits per FPGA. More pins may be used for the ADC interfaces, depending on the connector pinning.

More and more ADCs nowadays are equipped with the JESD204 serial interface. These types of ADCs can be connected to the UniBoard<sup>2</sup> through the serial interfaces.

## 3.2.3 Short range optical interface

If not all the transceivers are used for the default connections, a short range optical interface could be placed next to each FPGA. As an example the miniPOD [10] and MicroPOD are shown in Figure 13. On these connectors 12 receive or 12 transmit serial interfaces can be made at a maximum of 10Gbps each, resulting in a 120Gbps aggregate bandwidth. By placing a transmitter and receiver near each FPGA other interconnections to and from the FPGAs can be made, creating for example a ring architecture on the board, or giving the possibility to connect an optical backplane or extra optical IO. An evaluation of these modules is described in [11].





Figure 13 Avago Micro Pod (left) with UniBoard (right)

# 3.2.4 Power<sup>MX</sup>

Instead of a dedicated IO connector a PowerMX IO Cluster can be placed. The smallest Atomic is a Qtr I/O Cluster, which consists of two 100-pin MegArray connectors, see Figure 14.

On one of these connectors 12 serial links can be placed, on the other 4 serial links and control signals. More information about the PowerMX can be found on the CyberSKA website [12].



Figure 14 Power<sup>MX</sup> IO cluster

# 3.3 Memory

#### 3.3.1 Persistent Memory

Persistent memory can be used to store information like

- IP/MAC addresses
- Board parameters (serial number)

#### 3.3.2 Image Memory

Serial Flash can be used for the FPGA image.

#### 3.3.3 Application Memory

The memory requirements for UniBoard<sup>2</sup> are:

- 1 second of data storage
- FIFO (write and read within sample rate)

Solutions for the memory are:

- DDR3
- DDR4 (Standard DIMMs)
- GDDR5
- QDRII / QDRII+
- Hybrid Memory Cube

Some major technologies are described in the following subsections:

# 3.3.3.1 DDR3 (DIMM / SODIMM)

Double Data Rate (DDR) memory is an SDRAM-based memory. DDR3 chips can be placed on JEDEC standard modules. The same 64 or 72 bit wide bus is used for writing and reading. The transfer rate of a SODIMM memory module is up to 1866MT/s (Micron's maximal speed grade), which for a 64 bit wide module results in 14.9GByte/s (119Gbps). The power consumption of the module can be reduced by lowering the supply voltage to 1.35V. With a bus width of 72 bit and a maximal speed of 1866MT/s, DDR3 can reach a speed of 17GByte/s (without overhead). UniBoard<sup>2</sup> could hold at least two modules per FPGA, resulting in a maximum speed of 238.8Gbps.
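The DDR3 figures quoted above can be reproduced with a few lines of Python. The function name `module_bandwidth` is illustrative; the calculation is simply transfer rate times bus width, without any protocol overhead.

```python
def module_bandwidth(transfer_rate_mt_s, bus_width_bits):
    """Return peak (GByte/s, Gbps) of one memory module, without overhead."""
    gbps = transfer_rate_mt_s * bus_width_bits / 1000  # Mbit/s -> Gbit/s
    return gbps / 8, gbps


gbyte_64, gbps_64 = module_bandwidth(1866, 64)  # about 14.9 GByte/s, 119 Gbps
gbyte_72, gbps_72 = module_bandwidth(1866, 72)  # about 17 GByte/s
two_modules_gbps = 2 * gbps_64                  # about 238.8 Gbps per FPGA

print(f"64 bit: {gbyte_64:.1f} GByte/s ({gbps_64:.0f} Gbps)")
print(f"72 bit: {gbyte_72:.1f} GByte/s")
print(f"two 64 bit modules: {two_modules_gbps:.1f} Gbps")
```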

#### 3.3.3.2 DDR4 (DIMM / SODIMM)

DDR4 is the next DDR generation. Compared with DDR3, power consumption is reduced and speed is increased. With a theoretical maximal transfer rate of 3200MT/s the speed of DDR4 is 33% higher than that of DDR3. Reducing the voltage from 1.5V to 1.2V further reduces power consumption. In the future it should be possible to drop the voltage to 1.05V. First engineering samples have been produced; however, Micron does not have any SODIMM DDR4 modules available yet, although production should have started in Q4 2012. At a speed of 3200MT/s and a bus width of 64 bit, DDR4 can reach transfer speeds of up to 25GByte/s (without overhead).

Pinning of the DDR4 is shown in Table 7.

**Table 7 Preliminary DDR4 SODIMM pinning**

| Function  | Pins                     |
|-----------|--------------------------|
| Data      | 72                       |
| Address   | 3 (GA) + 3 (BA) + 18 (A) |
| Data Mask | 9                        |
| DQS       | 9 (differential)         |
| Control   | 5 (RAS+CAS+WE+ODT+ACT)   |
| CKE       | 4                        |
| Clock     | 4 (differential)         |
| Total     | 136 IO lines needed      |

For UniBoard<sup>2</sup> two or four DDR4 modules can be placed. When four modules are placed per FPGA a memory bandwidth of 688 Gbps half duplex can be achieved. In this case a device with more IO is needed, compromising the number of transceivers to the front panel and the backplane. When two modules are placed per FPGA the memory bandwidth to the DDR is approximately 344Gbps (90% efficiency, 72 bits). This is not sufficient for the input bandwidth.
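The effect of the number of modules and the bus efficiency can be sketched as follows. The transfer rate behind the quoted figures is not stated in the text; the sketch assumes 2666 MT/s, the DDR4 rate listed in Table 8, which gives results close to (but not identical to) the quoted 344 Gbps and 688 Gbps. The function name is illustrative.

```python
def ddr_bandwidth_gbps(modules, transfer_rate_mt_s, bus_width_bits, efficiency):
    """Usable half-duplex memory bandwidth in Gbps for identical modules."""
    peak_gbps = modules * transfer_rate_mt_s * bus_width_bits / 1000
    return peak_gbps * efficiency


# Assumed transfer rate: 2666 MT/s; bus width and efficiency are from the text.
for modules in (2, 4):
    bandwidth = ddr_bandwidth_gbps(modules, 2666, 72, 0.90)
    print(f"{modules} modules per FPGA: about {bandwidth:.0f} Gbps")
```

Doubling the number of modules doubles the bandwidth but also doubles the IO pins consumed on the FPGA, which is the trade-off against transceiver count described above.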

#### 3.3.3.3 GDDR5

Graphics Double Data Rate memory is mainly used on GPU cards. This memory uses different clocks for commands and data. The input is split into two 32 bit wide busses, each with a separate clock for writing. The top speed is 7Gbps; for normal applications the speed is 3.2Gbps, resulting in 25.6GByte/s (without overhead). However, GDDR5 is not supported by Altera or Xilinx.

#### 3.3.3.4 QDRII+

Quad Data Rate SRAMs are dual port Double Data Rate SRAM memories. With a data bus width of 36 bit and a speed of 1100Mbps (550MHz) the double bus can reach data speeds of up to 10GByte/s (without overhead). The maximal size of a QDRII+ device is of the order of 18MByte.

#### 3.3.3.5 Hybrid Memory Cube

Hybrid Memory Cube (HMC) is a new type of memory made possible by the use of TSVs (Through-Silicon Vias). In this technology memory chips are stacked onto a serial interface chip. The technology is under development and close to producing engineering samples. First pictures are available of two devices, with four or two serial links, each link built around 16 lanes, reaching 160 / 120Gbps. Because the memory is accessed through serial interfaces, it can be placed close to the FPGA (near memory) or on another module (far memory). With four lanes and a total data rate of 160Gbps HMC can reach speeds of up to 20 GByte/s (without overhead, counting send and receive at the same time).

To make deeper memories, memory bandwidth can be exchanged for size by chaining HMC devices as shown in Figure 15. With a 4-lane device 4 HMCs can be chained; with an 8-lane device 8 HMCs can be chained.

More information about HMC can be found in [18], [19] and [20].



Figure 15 Chaining multiple HMC devices

#### 3.3.3.6 Conclusion

An overview of the types of memory is given in Table 8.

Table 8 Overview of application memory solutions

| Techn. | Pro | Con | Speed |
|--------|-----|-----|-------|
| DDR3   | SO-DIMM module (limited board space) | Latency | 2133Mbps*8=17Gbyte/s/mod. |
| DDR4   | DIMM module<br>Low power<br>Standard | Modules not in production yet | 2666Mbps*8=21Gbyte/s/mod. |
| GDDR5  | Fast | BGA-chips<br>Capacity up to 256MByte<br>IO not supported by Altera |  |
| QDRII+ |  | BGA-chips<br>Price<br>Memory size | 1400Mbps |
| QDRIV  |  |  | 2666Mbps |
| HMC    | Fast | Engineering state<br>Limited information | 20Gbytes/s/chip |
QDR is small in memory size and uses a lot of board space and IO. For DDR more overhead is needed compared to QDR but the total BW is the same.

GDDR is not supported by the FPGA vendors, making the memory interface firmware complex. Therefore on UniBoard² two DDR4 memories will be placed next to every FPGA. This will enable a small memory interface while maintaining maximal IO bandwidth with 96 transceivers, which are needed to achieve the best processing to IO-bandwidth ratio. By making an HMC break-out board the memory bandwidth will be increased.

# 3.3.4 Background information:

http://www.altera.com.cn/literature/wp/wp\_memoryselect.pdf

http://www.micron.com/products/dram/ddr3-to-ddr4

http://www.micron.com/about/blogs/2012/december/ddr4-gathers-momentum

http://www.cypress.com/?id=107&addcols=&parametric=html#parametric

http://www.micron.com/products/hybrid-memory-cube

#### 3.4 PCB

With the high-speed traces (up to 25Gbps) the PCB is part of the design. Losses of serial links increase with increasing speeds as skin depth, the area of copper used by the high-speed signal, becomes smaller. This can be compensated by using wider traces, which implies that the distance between the layers has to be increased (fewer layers in a 2.5mm board), or by decreasing the dielectric constant of the PCB material. Another source of loss is dielectric loss. New materials focus on both parameters.

#### 3.4.1 Material

One example of a PCB material is:

Panasonic Megtron-6 Df 0.002 used for >25Gbps

Before the layout of UniBoard<sup>2</sup> starts, a discussion with the PCB supplier will take place. In this discussion the material selection and the technologies used (microvias) will be covered.

#### 3.4.2 Trace impedance

By using 85  $\Omega$  instead of 100  $\Omega$  the traces and the layers can be moved closer, allowing more layers and more traces.

#### 3.5 Mezzanine connector

For concept 2 a mezzanine connector is needed which can handle the serial interconnections of the FPGA. This means that for a full duplex connection of four 40Gbps interfaces, each built around four 10Gbps interfaces, at least 32 pairs are needed. In Figure 16 the Amphenol InfinX<sup>TM</sup> High Speed Mezzanine connector [13] is shown. This connector has pairs capable of 25 Gbps. The maximum number of pairs per connector is 108 (for the largest connector).
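The pair count above can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the text (four 40Gbps interfaces, four 10Gbps lanes each, full duplex, 108 pairs on the largest connector); the function name is illustrative.

```python
# Differential pairs needed on the mezzanine connector for concept 2.
MAX_PAIRS_LARGEST_CONNECTOR = 108  # largest connector variant, from the text


def mezzanine_pairs(interfaces: int, lanes_per_interface: int,
                    full_duplex: bool = True) -> int:
    """Each lane needs one pair per direction."""
    directions = 2 if full_duplex else 1
    return interfaces * lanes_per_interface * directions


needed = mezzanine_pairs(interfaces=4, lanes_per_interface=4)
print(f"pairs needed: {needed}")
print(f"fits on largest connector: {needed <= MAX_PAIRS_LARGEST_CONNECTOR}")
```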



Figure 16 Mezzanine connector

#### 3.6 Clock and control

#### 3.6.1 Clock and PPS

Tuning of the clock and PPS traces will be used for all nodes with ADC interfaces. This should ease firmware design for synchronous ADC sampling.

#### 3.6.2 Control

To reduce the number of interconnections to the board a 1 GbE switch will be placed on the board, which will distribute one 1GbE to all FPGAs. For the switch the Vitesse VSC7389 [14] will be used. This switch has 8 SGMII interfaces (each interface consists of two pairs, one for transmit and one for receive) for connecting the FPGAs, and 8 integrated tri-speed copper transceivers of which four will be connected to the front side of the board. The VSC7389 has an EEPROM for storing the controlling software of the switch.

Through I2C interfaces the temperature of the FPGA can be monitored. The last FPGA on the board (lower backside FPGA) will use the I2C interface for monitoring not only the FPGA temperatures but also the board power levels. All front FPGAs have dedicated status information about the optical interface. This information includes whether fibres are connected and any errors during communication. In Figure 17 an overview of the control is given.



Figure 17 UniBoard control

#### 3.7 CPU

UniBoard1 has a CPU embedded in the FPGA. On UniBoard<sup>2</sup> an external CPU could be placed for board control. The tasks of the CPU are:

- Remote configuration
- Reset
- Register access

#### Nice to have

- TCP/IP connection to UniBoard
- Post processing data
- High level control instead of register control
- Processing settings

Interfaces from the CPU to the FPGAs and outside

- Ethernet connection
- I2C for temperature monitoring
- JTAG programming FPGAs / Flash

#### 3.7.1 Possible solutions

The CPU can be located on the UniBoard<sup>2</sup>, on a module or in the form of an external PC (Local Control Unit, LCU). The options are discussed in the following subsections. A table summarizes the pros and cons.

#### 3.7.1.1 FPGA solutions

The CPU can be located inside the FPGA. This could work together with an external LCU when needed.

#### Hardware implementation

For the basic functionality a processor might not be needed. Using dedicated hardware blocks and efficient fast access, the memory map/control bus and the configuration can be created. On the LCU all processing can be done for all nodes. When more processing would be needed the LCU could be upgraded.

#### Embedded Softcore Processor in FPGA

The UniBoard uses an embedded CPU (Nios). The logic needed for the UniBoard NIOS is of the order of 1-2% of the total.

#### Embedded ARM processor

The current FPGA families have parts with and without hardcore ARM processor. Within the Arria family the SoC devices are footprint compatible.

#### 3.7.1.2 Processor on UniBoard

#### Small processor (uP)

On UniBoard<sup>2</sup> a small processor could be placed for basic control functionality. This type of processor has a 100MBit interface. Examples are a PIC processor, Arduino, Raspberry Pi. These processors have limited functionality but have a user-friendly interface. The processor can also be used for general board control like reset, setting and monitoring the power supplies. A PIC processor is used in the Apertif system to control a subrack with four UniBoards.

#### Full scale processor on UniBoard

When the functionality must be increased a bigger CPU could be placed on UniBoard<sup>2</sup> with a proper operating system. This CPU has dedicated memory. A possibility is the Freescale P4080 (used in the Dome project).

#### Pro's

- Don't need processor in the FPGA (a Nios-processor might still be needed to setup the transceivers, DDR and small control loops)
- TCP/IP connection to UniBoard instead of UDP (although UDP is now commonly used in financial networking because its limited overhead reduces transport time)
- Linux on UniBoard
- FPGA configuration

#### Con's

- Board space for CPU and DDR memory comparable with FPGA (in other words a CPU instead of an FPGA)
- Development time (as experienced in the DOME project, a second spin might be needed to get the CPU running), CPUs have a variety of busses and more clock sources compared to an FPGA. (e.g. CASPER)
- The life time of a CPU is shorter than that of an FPGA or even the board life time of the UniBoard (the time between the design and the last production batch of UniBoard<sup>2</sup> is expected to be > 5 years). This means that the CPU will be obsolete by the time the board is in production.
- Extra tool flow needed for Processor software (e.g. CASPER).
- Solder connections and components needed for the CPU introduce more risk in the production of the board.

#### 3.7.1.3 Processor on Module

Instead of placing a CPU on UniBoard<sup>2</sup> and increasing the complexity of routing and fabrication, a CPU can be placed on a module. In this way the complexity of UniBoard can be limited while enabling the flexibility of the CPU for control.

#### COM (CPU on module)

A standard form factor can be used. Standard CPU modules are:

- COM Express
  - Module size ranging from 84x55mm to 155x110mm (Basic module 95x125mm). The connector has 220 pins, which carry USB ports, 6 PCIe lanes, 24 bit LVDS channels, Gigabit Ethernet and 8 GPIO.
- ESMexpress
  - Module size of 85x115mm. Connectors with LVDS, SATA, USB, 3x GbE, PCIe, PEG
- Qseven
  - Modules with a size of 70x70mm. Board edge connector  $\rightarrow$  limited height of components underneath module.

#### Background

http://emea.kontron.com/com-express-r-from-kontron/

#### CPU on XGB

Instead of placing the CPU on UniBoard<sup>2</sup> a Backplane board can be made holding the CPU.

#### 3.7.1.4 Connections to the FPGA

The connection between the CPU and the FPGA can be through

- Ethernet:
  - Ethernet is usually used for mid to long distances. This serial interface can handle speeds of up to 10Gbps. For control interfaces speeds of up to 1Gbps can be used. The FPGA will need logic to handle the traffic. In the newer FPGAs hard IP in combination with firmware can be used to implement the control interface.
- PCIe:
  - PCI Express is a low level (level 1 and 2 of the OSI model) communication protocol which is used for chip-to-chip communication in a memory mapped local bus structure. The bus consists of a master, usually the CPU, and a slave. Between a master and multiple slaves a bridge needs to be established. Some processors however (like the Freescale P5020) have multiple busses (masters). Hard IP and firmware code examples are available from Altera for the configuration of their FPGAs. PCIe is used with OpenCL for the communication between the CPU and the FPGA.

#### 3.7.1.5 Conclusion CPU

A summary of the pro's and con's of the different CPU options is shown in Table 9.

Table 9 CPU pro's and con's

|                                              | Low level<br>Firmware<br>implementation | Soft core<br>(Nios)                   | CPU                                                                                      | Emb. CPU (SoC)                                                      | CPU on module                                                       |
|----------------------------------------------|-----------------------------------------|---------------------------------------|------------------------------------------------------------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------------|
| Hardware<br>design                           | Simple PCB design                       | Simple PCB<br>Design                  | Extra time and space needed on the PCB                                                   | Simple PCB<br>Design                                                | Connector<br>with routing<br>needed                                 |
| Hardware production risk                     |                                         |                                       | Increase in components and therefore higher risk.                                        | Increase in components and therefore higher risk.                   | Limited increase in components.                                     |
| SW/FW implementation                         | More firmware development time needed.  | Reuse of<br>UniBoard's<br>UNBOS       | Firmware design shifted to software. Software development time needed to get CPU running | Firmware design shifted to software.                                | Firmware design shifted to software.                                |
| Boot time                                    | Simple booting.                         | Simple booting                        | First CPU Linux boot, then FPGA. This will take time.                                    | First CPU Linux boot, then FPGA. This will take time.               | First CPU Linux boot, then FPGA. This will take time.               |
| Interface<br>speed                           | Implementation made for speed.          | Soft core limits the control speed.   | CPU handles<br>TCP/IP. Low<br>level control<br>from CPU to<br>FPGA.                      | CPU handles<br>TCP/IP. Low<br>level control<br>from CPU to<br>FPGA. | CPU handles<br>TCP/IP. Low<br>level control<br>from CPU to<br>FPGA. |
| TCP/IP                                       | Development time needed to implement.   | Development time needed to implement. | Supported by kernel.                                                                     | Supported by kernel.                                                | Supported by kernel.                                                |
| Memory<br>bus                                | Part of implementation                  | Part of implementation                | implementation                                                                           | implementation                                                      | Part of implementation                                              |
| Post processing                              | Not possible                            | Not possible                          | On CPU.                                                                                  | On CPU.                                                             | On CPU.                                                             |
| Extra tool<br>needed for<br>development      | Not needed                              | Not needed                            | Compiler needed. Extra tools might be needed to use all functions of the controller.     | FPGA tools needed for the CPU.                                      | Compiler incl. module tools needed.                                 |

For all situations a control interface has to be implemented on the FPGA. This interface must accept PCIe (for an external large CPU on UniBoard<sup>2</sup>) or standard Ethernet. Only a small number of IO-lines can be made from the CPU to the FPGA (not a complete register map). To place a CPU on UniBoard<sup>2</sup> for only a 'hello world' might be a bit of overkill. Therefore a firmware implementation will be taken as the starting point for UniBoard<sup>2</sup>. In this case a hardware implementation, soft core or an embedded CPU can be used. By using PCIe lanes to the break out board experiments with an external CPU can be done as well. A PIC processor with an Ethernet interface will be placed on UniBoard<sup>2</sup> for reset and power settings.

#### 3.8 Ethernet Switch

To increase the flexibility of UniBoard<sup>2</sup> an onboard switch will be needed. In this section some examples are discussed in more detail.

#### 3.8.1 10Gbps Gearbox

The Avago Technologies Vortex Gearbox<sup>TM</sup> AVSP-1104 is a 10:4 gearbox [15]. This means that ten 10Gbps lanes can be multiplexed (full duplex) to four 25Gbps channels. The chip can also be used as a repeater to increase the maximum distance between two points.

#### 3.8.2 10Gbps Switch

The Vitesse Semiconductor VSC3144-12 [16] is a 144x144 cross point switch with data rates of up to 10.7Gbps. Smaller crossbar variants (16x16) are available as well. A proprietary 5 wire bus is used to set the switch.

#### 3.8.3 40Gbps Switch

An example of a 40Gbps switch is the Broadcom BCM56540 [17], 16x10G (4x40G) + 48x3G. This switch needs an external processor to set the switch.

#### 3.8.4 Conclusion

Like the CPU the switch can be placed on an extension board or on a dedicated switching board connected to the backplane.

# 3.9 Test strategy

#### 3.9.1 Boundary scan

The board will be equipped with a boundary scan interface. Via this interface access will be provided to all boundary-scannable devices. This will enable testing the board at low speeds.

#### 3.9.2 Monitoring

All FPGA temperatures will be monitored. Where possible the power of a supply will be monitored as well.

#### 3.9.3 Operational status information

UniBoard<sup>2</sup> will be capable of monitoring:

- FPGA temperature
- Supply voltages
- Power supply status

# 4 System examples

# 4.1 Making a beamformer with UniBoard<sup>2</sup>

How could a beamformer be made using UniBoard<sup>2</sup>? In Figure 18, Figure 19 and Figure 20 some examples are shown.

In the ring beamformer of Figure 18 the partial sums are made close to the input in each node. The output is transferred to the next FPGA on the board or the next UniBoard. With this structure a nearly unlimited number of inputs can be processed. The bandwidth between nodes determines the output bandwidth.



Figure 18 Ring beamformer

The FFT beamformer shown in Figure 19 can be used to make the same number of beams as there are inputs. Increasing the number of inputs does not only increase the number of input boards (vertically) but the processing boards (horizontally) as well.



Figure 19 FFT beamformer

When not all beams need to be formed, but still more than can be formed in the ring structure, a subband beamformer can be used, see Figure 20. In this beamformer subbands from the input nodes are transferred to beamforming nodes. The number of input nodes does not have to be equal to the number of beamformer nodes. If sufficient resources are available the beamformer can be implemented on the input node FPGA.



Figure 20 Subband beamformer

In practice the beamformer could look as shown in Figure 21, with on one side of the backplane the sensitive analog channels and on the other side the input and processing nodes.



Figure 21 Beamformer system build up

#### 4.1.1 Apertif beamformer

How could the Apertif beamformer be constructed with UniBoard<sup>2</sup>?

The analog bandwidth of the 64 inputs for the Apertif beamformer is 400MHz. From this input an average of 42 beams with a bandwidth of 300MHz (384 subbands) are made. This yields an input data rate of 410Gbps and, after the filterbank, 1152Gbps on the board mesh and 1152Gbps on the backplane mesh. The output data rate is 160Gbps. The processing load is approximately 8.2TMAC.

If a beamformer were to be made using four UniBoard<sup>2</sup>s each with 4 Arria10 FPGAs the processing capacity would be 20TMAC, more than double that which is needed. This would result in a 1GHz input bandwidth or 128 inputs at 500MHz bandwidth. This could be handled by the doubling of speed in the LVDS pairs and the more than doubled IO at the front node.
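The input data rate and the processing headroom quoted above can be reproduced with a short calculation. The sketch below is one plausible reading, not taken from the source: it assumes Nyquist sampling of the 400MHz analog band (800 MS/s) and 8 bits per sample. Only the 64 inputs, the 400MHz bandwidth, the 8.2TMAC load and the 20TMAC capacity come from the text.

```python
# Apertif beamformer: input data rate and processing headroom.
N_INPUTS = 64
ANALOG_BW_HZ = 400e6
BITS_PER_SAMPLE = 8          # assumption, not stated in the text
SAMPLE_RATE_HZ = 2 * ANALOG_BW_HZ  # assumption: real sampling at the Nyquist rate

REQUIRED_TMAC = 8.2          # from the text
AVAILABLE_TMAC = 20.0        # four UniBoard2s with 4 Arria10 FPGAs each, from the text

input_rate_gbps = N_INPUTS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1e9
headroom = AVAILABLE_TMAC / REQUIRED_TMAC

print(f"input data rate: {input_rate_gbps:.1f} Gbps")
print(f"processing headroom: {headroom:.2f}x")
```

Under these assumptions the input rate comes out at about 410Gbps, matching the figure in the text, and the headroom is a factor of about 2.4.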

# 4.2 Making a Correlator with UniBoard<sup>2</sup>

UniBoard<sup>2</sup> can be used to construct a correlator as well. In Figure 22 and Figure 23 some examples are shown. For a ring correlator shown in Figure 22 correlation products are calculated near the signal input. All inputs are further transferred over the ring to the next node to calculate the next correlation product. This is comparable with the correlator structure as described in (CASPER, APERTIF, AARTFAAC).



Figure 22 Ring correlator

In Figure 23 a subband correlator is shown. In the input nodes on one UniBoard the signals are aligned and passed through a filterbank. The subbands are distributed to correlator nodes where the same subbands from all signals are processed. The number of input nodes does not have to be the same as the number of correlation nodes. This is similar to the EVN UniBoard VLBI correlator.



Figure 23 Subband Correlator

In practice a multi UniBoard correlator system could look as shown in Figure 24, with the input and filter banks on one side and the correlation nodes on the other side.



Figure 24 UniBoards in a correlator system

A system like this would have a capacity of 8x4 FPGAs for the filterbank, a maximum of 8x4 FPGAs for the correlator resulting in 29 TMAC/s for the filterbank and 29 TMAC/s for the correlator, assuming Arria10 FPGAs. To transfer data directly from a filterbank node to a correlator node 32 transceivers (8x4) would be needed.

# 4.2.1 Apertif Correlator with UniBoard<sup>2</sup>

#### 4.2.1.1 IO

The input of the correlator has 16\*24=384 10Gbps links.

The input at the UniBoard<sup>2</sup> front side has 4 FPGAs; each FPGA has 4 QSFPs with four 10Gbps lanes each, for a total of 64 lanes. Using the front panel connections 6 UniBoard<sup>2</sup>s are needed. The input at the UniBoard<sup>2</sup> backplane side has 4 FPGAs with 32 lanes at 10Gbps each. Using the backplane connections 3 UniBoard<sup>2</sup>s will be needed. When both the QSFPs on the front panel and the backplane interfaces are used only 2 boards are needed to make the correlator.
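The board counts above follow from dividing the 384 input links by the number of 10Gbps lanes available per board and rounding up. A small sketch of this calculation, using only the numbers given in the text (the function name is illustrative):

```python
import math

# Number of UniBoard2s needed to receive the Apertif correlator input.
INPUT_LINKS = 16 * 24            # 384 links at 10Gbps
FPGAS_PER_BOARD = 4

front_lanes = FPGAS_PER_BOARD * 4 * 4      # 4 QSFPs per FPGA, 4 lanes per QSFP
backplane_lanes = FPGAS_PER_BOARD * 32     # 32 backplane lanes per FPGA


def boards_needed(links: int, lanes_per_board: int) -> int:
    """Round up: a partially used board is still a board."""
    return math.ceil(links / lanes_per_board)


print("front panel only:", boards_needed(INPUT_LINKS, front_lanes))
print("backplane only:  ", boards_needed(INPUT_LINKS, backplane_lanes))
print("both combined:   ", boards_needed(INPUT_LINKS, front_lanes + backplane_lanes))
```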

# 4.2.1.2 Processing

The current Apertif Correlator design has 16 UniBoards. On UniBoard<sup>2</sup> the processing capacity is doubled, which results in 8 boards for processing.

# 4.2.1.3 System Design

Each optical link from the beamformer contains 1/16 of the frequency band (24 subbands). To combine all subbands from all telescopes for both polarisations would require 24 10G links for a single correlator cell. If the backplane interface is used all data can be received on a single node. The 96 SFP+ cages per UniBoard<sup>2</sup> (using single mode fibres) can be divided over multiple input boards. In this case 16 nodes are needed or 4 boards.

#### 4.2.1.4 Power

Using UniBoard<sup>2</sup> will reduce the power needed for the correlator. Using half the number of boards the estimated power reduction is 50%. The FPGAs on UniBoard<sup>2</sup> have 4 times more capacity at 2 times the power, but UniBoard<sup>2</sup> has only half the number of FPGAs. If we assume 300W per UniBoard and 1W/year at 2€/W, switching over to UniBoard<sup>2</sup> could save 1k€ per year of operation.

# 4.2.2 AARTFAAC Correlator

#### 4.2.2.1 IO

The input of the AARTFAAC correlator comes from 3 subracks, each with 4 RSP processing boards with an InfiniBand interface consisting of 4 lanes, resulting in 48 lanes at 10Gbps. UniBoard<sup>2</sup> has 32 lanes per node which means that only 1/3 of the interfaces are used.

# 4.2.2.2 Processing

For processing each station uses two UniBoards, these could be replaced by a single UniBoard<sup>2</sup>.

#### 4.2.2.3 System design

The lanes from the subracks are subband oriented. This would mean that each node on UniBoard<sup>2</sup> could process the data from a single lane, easing data communication.

#### 4.2.2.4 Power

Using half the number of boards the power reduction will be about 50%.

#### 5 What will it look like?

In Figure 25 a proposal for the setup of UniBoard<sup>2</sup> is shown. In this setup there are two DDR4 modules and six QSFP+ cages per FPGA. Each FPGA has up to 96 transceivers: 24 transceivers are used for the front panel interface and 72 are connected to the backplane connector. On the backplane serial ADCs can be used for data input, or breakout boards for application specific needs. On a default breakout board as shown in Figure 25 Hybrid Memory Cube (HMC) devices will be placed.
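The split of the transceivers between front panel and backplane can be verified with a simple budget. The sketch below assumes four lanes per QSFP+ cage, as used for the QSFP interfaces in section 4.2.1.1; the other numbers are taken from the paragraph above.

```python
# Transceiver budget per FPGA for the setup of Figure 25.
TRANSCEIVERS_PER_FPGA = 96
QSFP_CAGES_PER_FPGA = 6
LANES_PER_QSFP = 4  # four lanes per QSFP+ cage

front_panel = QSFP_CAGES_PER_FPGA * LANES_PER_QSFP
backplane = TRANSCEIVERS_PER_FPGA - front_panel

print(f"front panel transceivers: {front_panel}")
print(f"backplane transceivers:   {backplane}")
```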

With this setup a platform will be made for the next version of devices where interconnection with busses of serial links are expected. This is e.g. seen in the memory area with HMC and with the serial standard for ADC JESD204B.



Figure 25 First setup UniBoard<sup>2</sup>

# 5.1 Meshing on UniBoard

By placing a mesh from the FPGA to the connectors, a simple orthogonal backplane connector (section 3.2.2.1) can be used to make all connections from one FPGA on the vertical boards to all FPGAs on the horizontal boards, see Figure 26.



Figure 26 UniBoard with connector mesh

The pinning reserved for the mesh is shown in Table 10. In this table T is transmitter, F is FPGA, M is mesh and R is receiver. In this case four lanes per FPGA are used to a single backplane connector. By placing two backplane connectors a full 8x8 board structure can be made. The bandwidth from each FPGA to the other 32 FPGAs is expected to be 320Gbps in total.

Table 10 Pinning for the mesh

|    | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11    | 12    |    |
|----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----|
| BC | TF0M0 | TF0M1 | TF0M2 | TF0M3 |       |       |       |       |       |       |       |       | AB |
| EF | TF1M0 | TF1M1 | TF1M2 | TF1M3 |       |       |       |       |       |       |       |       | DE |
| HJ | TF2M0 | TF2M1 | TF2M2 | TF2M3 |       |       |       |       |       |       |       |       | GH |
| LM | TF3M0 | TF3M1 | TF3M2 | TF3M3 |       |       |       |       |       |       |       |       | KL |
| PQ |       |       |       |       | RF2M3 | RF3M3 | RF2M2 | RF3M2 | RF2M1 | RF3M1 | RF2M0 | RF3M0 | NP |
| ST |       |       |       |       | RF0M3 | RF1M3 | RF0M2 | RF1M2 | RF0M1 | RF1M1 | RF0M0 | RF1M0 | RS |

A dedicated HMC-module can be made for systems where memory is needed or for single board systems. To enable these systems the empty pairs can be used for the HMC interfaces. With two connectors per FPGA two HMC links can be made per FPGA, enabling a storage bandwidth of <320Gbps (40GByte/s). The pinning for a single connector containing a single HMC link is shown in Table 11.

**Table 11 Pinning for single HMC link** 

|    | 1 | 2 | 3 | 4 | 5   | 6   | 7   | 8   | 9    | 10   | 11   | 12   |    |
|----|---|---|---|---|-----|-----|-----|-----|------|------|------|------|----|
| BC |   |   |   |   | TD0 | RD0 | TD4 | RD4 | TD8  | RD8  | TD12 | RD12 | AB |
| EF |   |   |   |   | TD1 | RD1 | TD5 | RD5 | TD9  | RD9  | TD13 | RD13 | DE |
| HJ |   |   |   |   | TD2 | RD2 | TD6 | RD6 | TD10 | RD10 | TD14 | RD14 | GH |
| LM |   |   |   |   | TD3 | RD3 | TD7 | RD7 | TD11 | RD11 | TD15 | RD15 | KL |
| PQ |   |   |   |   |     |     |     |     |      |      |      |      | NP |
| ST |   |   |   |   |     |     |     |     |      |      |      |      | RS |
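The storage bandwidth of 320Gbps (40GByte/s) per FPGA can be traced back to the 16 lanes per HMC link in Table 11. The sketch below assumes a lane rate of 10Gbps, which is the rate used for the serial links elsewhere in this document but is an assumption for the HMC links; the result is an upper bound before protocol overhead, consistent with the "<" in the text.

```python
# Upper bound on HMC storage bandwidth per FPGA.
LANE_RATE_GBPS = 10       # assumption: same lane rate as the other serial links
LANES_PER_HMC_LINK = 16   # TD0..TD15 / RD0..RD15 in Table 11
HMC_LINKS_PER_FPGA = 2    # one link per backplane connector, two connectors

bandwidth_gbps = HMC_LINKS_PER_FPGA * LANES_PER_HMC_LINK * LANE_RATE_GBPS
bandwidth_gbyte_s = bandwidth_gbps / 8

print(f"HMC bandwidth per FPGA: {bandwidth_gbps} Gbps = {bandwidth_gbyte_s:.0f} GByte/s")
```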

The pairs used for the mesh can have other functions in the single board application e.g. QSFP+ IO. A full mesh can be made by connecting transmit to receive pins of a single connector on the extension board.

For Serial ADC applications the HMC receiver pins can be used for ADC input. The mesh pins can be used to make vertical connections from UniBoard-to-UniBoard. In this case two times 320Gbps of ADC data can be fed into UniBoard.

The pins that have a function in neither Table 10 nor Table 11 can be used for general IO or other serial links.

#### **REMARKS**

With the long links for the mesh, the speed from FPGA-to-FPGA might be limited. The total expected trace length is in the order of 600mm.

In view of the added complexity of the board and longer links that such a mesh would imply, the decision was made to use the design as shown in Figure 25.

# 6 Results Prototype

In Figure 27 an image of the prototype is shown.



Figure 27 Image of working prototype

No show stoppers were found during the production and debug phase of the board. A few adjustments were needed to get the board ready for firmware testing. With a little bit of help from Altera the milestone of 1 Terabit per second was achieved, see [21]. With the use of a backplane test board all 96 transceivers for all four FPGAs were tested. Besides the transceivers the DDR4 interfaces were successfully tested. More technical details of the testing are described in [22], which is added to this document as appendix A.

#### 7 Modification for Rev 2.0

A few schematic and PCB changes have been made. These changes consist of:

- connection changes in the power supply
- connection changes of test and control signals
- component value changes in power supplies and control signals
- power plane changes to enable full transceiver operation
- adding a connector to enable fan cooling

# 8 Appendix A, Measurement Report

# UniBoard<sup>2</sup> Measurement Report

|                                                                      | Organization | Date       |
|----------------------------------------------------------------------|--------------|------------|
| Author(s):<br>Gijs Schoonderbeek<br>Sjouke Zwier<br>Leon Hiemstra    | ASTRON       | 28-08-2015 |

# **Table of contents:**

- 1 Introduction
  - 1.1 Applicable documents (AD)
  - 1.2 Reference documents (RD)
  - 1.3 Abbreviations
  - 1.4 Used Equipment
- 2 Impedance
- 3 Power up
  - 3.1 Power Sequencing
  - 3.2 1000uF Hold-up
  - 3.3 Input Current
- 4 Boundary Scan
  - 4.1 Standard Tests
  - 4.2 JTAG Functional Test (JFT)
- 5 Ethernet communication
- 6 CONFIG
- 7 DDR4 test
- 8 SerDes
- 9 Conclusion
- 10 Memory Calibration Report

# 1 Introduction

In this report the measurement results for the UniBoard<sup>2</sup> prototype are presented. This document is used to verify the design principles of AD-1 and AD-2.

UniBoard<sup>2</sup> is a Joint Research Activity (JRA) in the RadioNet3 project, funded by the EC through the FP7 programme, under grant agreement no. 283393. The partners in this JRA are the Universities of Bordeaux and Orleans, INAF, MPG Bonn, the University of Manchester, ASTRON and JIVE.

# 1.1 Applicable documents (AD)

| Ref.nr. | Document number | Title                        |
|---------|-----------------|------------------------------|
| AD-1    | ASTRON-TN-042   | UniBoard2 HW Detailed Design |
| AD-2    | ASTRON-RP01484  | UniBoard2 SI simulation      |

#### 1.2 Reference documents (RD)

| Ref.nr. | Document number                    | Title                                     |
|---------|------------------------------------|-------------------------------------------|
| RD-1    | INFRA-2011-1.1.21<br>ASTRON-TN-040 | Deliverable 8.2, Hardware Design Document |
| RD-2    |                                    |                                           |
| RD-3    |                                    |                                           |

#### 1.3 Abbreviations

| Abbreviation | Description                                           |
|-------|--------------------------------------------------------------|
| AD-n  | n <sup>th</sup> document in the list of Applicable Documents |
| BER   | Bit Error Rate                                               |
| BW    | Band Width                                                   |
| DC    | Direct Current                                               |
| DCA   | Digital Communication Analyzer                               |
| DC/DC | Converter from one DC voltage to another                     |
| DDR   | Double Data Rate (memory protocol)                           |
| FPGA  | Field Programmable Gate Array                                |
| JTAG  | Joint Test Action Group (Protocol for Hardware testing)      |
| PCB   | Printed Circuit Board                                        |
| POL   | Point of load                                                |
| PRBS  | Pseudo Random Binary Sequence                                |
| RD-n  | n <sup>th</sup> document in the list of Reference Documents  |
| PMBus | Power Management Bus (I2C interface to SMPS)                 |
| TDR   | Time Domain Reflectometry                                    |
| SPD   | Serial Presence Detect                                       |
| SMPS  | Switch mode Power Supply                                     |

# 1.4 Used Equipment

In Table 12 the equipment used during the measurements is shown.

**Table 12 Used Equipment** 

| Equipment                                  | Type            | Manufacturer         | ZWO number |
|--------------------------------------------|-----------------|----------------------|------------|
| Power supply                               | N6705B          | Agilent Technologies | ZWO2104    |
| Oscilloscope                               | Infinium        | HP                   | ZWO1797    |
| DMM                                        |                 | Fluke                |            |
| JTAG controller                            | JT3707 + JT2148 | JTAG Technologies    |            |
| Digital Communication Analyser (TDS+Scope) | DCA-J           | Agilent              |            |

# 2 Impedance

The impedance of the PCB is tested with TDR, using a test board. Only a few traces are measured, see Table 13.

**Table 13 TDR results** 

| Node | Type                   | Net          | Layer | Measurement    | Impedance<br>[Ω] |
|------|------------------------|--------------|-------|----------------|------------------|
| 0    | Backplane<br>Interface | RING_0_TX_x5 | 14    | Node0_p1.bmp   | 100 -0.2 +17     |
| 0    | Backplane<br>Interface | RING_0_TX_x4 | 16    | Node0_p2.bmp   | 100 -0.1 +14     |
| 0    | Backplane<br>Interface | RING_0_RX_x5 | 7     | Node0_p3.bmp   | 100 -0.1 +13     |
| 0    | Backplane<br>Interface | RING_0_RX_x4 | 5     | Node0_p4.bmp   | 100 -0.1 +4      |
| 3    | Backplane<br>Interface | RING_1_RX_x5 | 7     | Pair1b.bmp     | 100 -0.1 +10     |
| 3    | Backplane<br>Interface | RING_1_RX_x2 | 5     | Pair2b.bmp     | 100 -0.1 +7      |
| 3    | Backplane<br>Interface | RING_1_TX_x1 | 14    | Pair3b.bmp     | 100 -0.1 +4      |
| 3    | Backplane<br>Interface | RING_1_TX_x0 | 16    | Pair4b.bmp     | 100 -0.1 +4      |
| 0    | QSFP Interface         | QSFP_1_TX_x3 | 3     | Qsfp_node0.bmp | 100 -0.2 +4      |
| 3    | QSFP Interface         | QSFP_5_TX_x3 | 5     | Qsfp_node3.bmp | 100 -0.5 +2      |

Figure 28 shows the TDR measurement result for RING\_0\_TX\_x5. The large change in impedance at the start of the measurement is due to the coupling of the test fixture. The line is not terminated, so the reflection at the end of the trace is high. Apart from a bump close to the end of the line, the impedance is close to 100Ω. The trace layout in Figure 29 suggests that this bump might be caused by the trace bending around the mounting hole.



Figure 28 TDR measurement result for RING\_0\_TX\_x5



Figure 29 Layout of RING\_0\_TX\_x5

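Given the length of the line, 182.78mm, and the one-way propagation time of 1.11ns, $\epsilon_r$ can be estimated at 3.3, close to the design parameter of 3.6:

$$\epsilon_r = \left(\frac{c \cdot t}{l}\right)^2 = \left(\frac{300 \cdot 10^{6} \cdot 1.11 \cdot 10^{-9}}{182 \cdot 10^{-3}}\right)^2 \approx 3.3$$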
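The estimate can be reproduced with a few lines of Python. This is a minimal sketch: the function name `estimate_epsilon_r` is illustrative, and the inputs are the 182 mm trace length and 1.11 ns one-way delay used in the expression above.

```python
C0 = 300e6  # speed of light in vacuum [m/s], rounded as in the text


def estimate_epsilon_r(length_m: float, one_way_delay_s: float) -> float:
    """Effective relative permittivity from trace length and one-way delay."""
    velocity = length_m / one_way_delay_s  # propagation velocity on the trace
    return (C0 / velocity) ** 2


# Values taken from the TDR measurement of RING_0_TX_x5.
eps_r = estimate_epsilon_r(182e-3, 1.11e-9)
print(f"estimated eps_r = {eps_r:.2f}")
```

The result depends quadratically on the delay, so a small error in placing the TDR markers has a visible effect on the estimate.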

# 3 Power up

Before power-up the impedances of the board are measured, see Table 14. Because all impedances are $>10\Omega$, the board is switched on and the supply voltages are measured with the digital multimeter. The Ericsson SMPSs offer readout through a PMBus interface. The values read from the SMPSs are also shown in Table 14.

**Table 14 Central Power supplies (unconfigured FPGAs)** 

| Supply               | Impedance | Voltage | Expected voltage | I2C Address | Current | Max exp | Mes. Power |
|----------------------|-----------|---------|------------------|-------------|---------|---------|------------|
| 48V Input UniBoard2  | 10kΩ      | 48V     | 48V              |             | 1.25A   |         | 60W        |
| Output PIM           | 600kΩ     | 47.7V   | 48V              |             |         |         |            |
| Hold-up PIM          | 1.4MΩ     | 75V     | 75V              |             |         |         |            |
| Output Bus converter | 190kΩ     | 12V     | 12V              | 0x2C        | 4.5A    |         | 54W        |
| VCC_QSFP_N01         | 320kΩ     | 3.30V   | 3.3V             | 0x02h       | 2.3A    | 13A     | 7.6W       |
| VCC_QSFP_N23         | 200kΩ     | 3.30V   | 3.3V             | 0x01h       | 0.7A    | 13A     | 2.3W       |
| VDD_CLK              | 113Ω      | 2.50V   | 2.5V             | 0x0Dh       | 1.1A    | 2A      | 2.8W       |
| Switch IO (3V3)      | 500Ω      | 3.29V   | 3.3V             | 0x0Eh       | 0.4A    | 2A      | 1.32W      |
| Switch Core (1V2)    | 290Ω      | 1.20V   | 1.2V             | 0x0Fh       | 1A      | 3A      | 1.2W       |

From the measurement result it can be seen that 6W is used in the PIM, 2.5W in the switch, 10W in the QSFP logic and 2.8W for the clock, resulting in approximately 22W board overhead and 9.5W per FPGA.
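The budget above can be cross-checked against the measured power column of Table 14. The sketch below is only an illustration: the variable names are hypothetical, the 6W attributed to the PIM is taken as the difference between the 48V input power and the bus converter output power, and the board is assumed to carry four FPGAs (nodes 0 to 3). Because the text works with rounded values, the computed numbers differ slightly from the 22W and 9.5W quoted.

```python
NUM_FPGAS = 4  # assumption: one FPGA per node, nodes 0..3

input_power = 60.0        # W, 48V input of UniBoard2 (Table 14)
bus_converter_out = 54.0  # W, 12V output of the bus converter (Table 14)

# Measured power of the central supplies [W], from Table 14.
central_supplies = {
    "VCC_QSFP_N01": 7.6,
    "VCC_QSFP_N23": 2.3,
    "VDD_CLK": 2.8,
    "Switch IO (3V3)": 1.32,
    "Switch Core (1V2)": 1.2,
}

pim_loss = input_power - bus_converter_out
board_overhead = pim_loss + sum(central_supplies.values())
power_per_fpga = (input_power - board_overhead) / NUM_FPGAS

print(f"PIM: {pim_loss:.1f} W")
print(f"board overhead: {board_overhead:.1f} W")
print(f"per FPGA: {power_per_fpga:.1f} W")
```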

Table 15 Node supplies with FPGAs in reset

| Supply      | Impedance | Voltage | Expected voltage | I2C Address | Current | Max exp | Mes. Power |
|-------------|-----------|---------|------------------|-------------|---------|---------|------------|
| 1V8         | 380Ω      | 1.79V   | 1.8V             | 0x11h       | 0.3A    | 2A      | 0.5W       |
| VCC (core)  | 2Ω        | 0.96V   | 0.95V            | 0x01h       | 4.7A    | 30A     | 4.5W       |
| VCCERAM     | 110Ω      | 0.95V   | 0.95V            | 0x0Dh       | 0.2A    | 10A     | 0.2W       |
| VCCR_GXB    | 37Ω       | 1.00V   | 1.0V             | 0x0Eh       | 0.5A    | 10A     | 0.5W       |
| VCCT_GXB    | 63Ω       | 1.00V   | 1.0V             | 0x0Fh       | 0.3A    | 5A      |            |
| VCC_BAT 1V8 | 14Ω       | 1.80V   | 1.8V             | 0x10h       | 0.7A    | 6A      |            |
| VCCH_GXB    | 14Ω       | 1.80V   | 1.8V             | Via 0x10h   |         | 3A      |            |
| VCCPT       | 15Ω       | 1.79V   | 1.8V             | Via 0x10h   |         | 1A      |            |
| VCC_Fuse    | 3.2kΩ     | 2.20V   | 2.2V             | -           |         |         |            |
| VCC DDR     | 96Ω       | 1.20V   | 1.2V             | -           |         |         |            |
| VTT DDR     | 300kΩ     | 0.60V   | 0.6V             | -           |         |         |            |
| VREF DDR    | 250kΩ     | 0.60V   | 0.6V             | -           |         |         |            |

From the power results it can be seen that about 6W is used by the measured node supplies. Of the 9.5W per FPGA, this leaves about 3.5W for the DDR4 power supplies and the POL overhead.

### 3.1 Power Sequencing

As described in AD-1, the different supplies of the FPGA must be switched on in a given sequence. This sequence is repeated in Figure 30.



Figure 30 Power Sequence Reference diagram

The measurement is done to verify the on-sequence, see Figure 31 and Figure 32.



Figure 31 Sequence ON showing on/off (yellow), Vcc\_core (green), Vcc\_eram (purple) and Vcc\_bat (pink)



Figure 32 Sequence ON showing on/off (yellow), Vcc\_bat (pink), Vcc\_io (green) and Config (purple)

From this measurement it can be seen that all supplies switch on according to the specification. The release of the CONFIG signal, indicating configuration of the FPGA, is added. The CONFIG line is released 250ms after the last supply is available.

The measurement result of the off-sequence measurement is shown in Figure 33 and Figure 34.



Figure 33 Sequence OFF showing on/off (yellow), Vcc\_core (green), Vcc\_eram (purple) and Vcc\_Bat (pink)



Figure 34 Sequence OFF showing on/off (yellow), Vcc\_bat (pink), Vcc\_IO (green) and CONFIG (purple)

From this measurement it can be seen that the FPGAs are placed in reset directly at the falling edge of ON/OFF, thereby reducing the power consumption. The IO voltage is switched off first. The capacitance on the IO voltage, however, is on the large side, which results in the supply being held up longer than wanted.

To see the effect of switching off the power to UniBoard<sup>2</sup>, a measurement is first done with switch-off by the button, see Figure 35. A marker is placed at the switch-off of the last power supply, $VCC_{core}$.



Figure 35 Board off with button showing on/off (yellow), Vcc\_core (green), Vcc\_IO (pink), VDD\_input (purple)

After this measurement, the measurement is repeated by switching off the power to the board. The result is shown in Figure 36.



Figure 36 Board off with power off, showing on/off (yellow), Vcc\_core (green), Vcc\_IO (pink), VDD\_input (purple)

From this measurement it can be seen that the core supply is switched off before the expected time. Figure 37 shows the supply at the input of the VCC<sub>core</sub> POL, together with the input supply of the bus converter (converting 48V to 12V).



Figure 37 Switch off, showing on/off (yellow), Vcc\_12V (green), Vcc\_UNB2in (purple), Vcc\_brick\_in (pink)

From this measurement it can be seen that the input voltage of the bus converter is not held up beyond the input of the PIM. The hold-up function of the PIM should take care of this.

For the calculation of the hold-up time, the power to hold up is estimated at 60W and the required hold-up time at 40ms. With the current capacitor of 330uF the hold-up time is:

$$C_{holdup} = \frac{2 \cdot P \cdot t}{V_{hu}^2 - V_{th}^2} \rightarrow t = \frac{C_{holdup} \cdot (V_{hu}^2 - V_{th}^2)}{2 \cdot P} = \frac{330\,\mu F \cdot (75^2 - 36^2)}{2 \cdot 60} \approx 11.9\,ms$$

This is well short of the estimated 40ms.
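The hold-up relation can be evaluated for other capacitor values with the short sketch below. The function names are illustrative. The 75V hold-up voltage and 36V threshold are the values used in the calculation above and are passed as defaults.

```python
def holdup_time(capacitance_f, power_w, v_holdup=75.0, v_threshold=36.0):
    """Hold-up time [s] of a capacitor discharging from v_holdup to v_threshold."""
    return capacitance_f * (v_holdup**2 - v_threshold**2) / (2.0 * power_w)


def holdup_capacitance(time_s, power_w, v_holdup=75.0, v_threshold=36.0):
    """Capacitance [F] needed to supply power_w for time_s."""
    return 2.0 * power_w * time_s / (v_holdup**2 - v_threshold**2)


t_330uf = holdup_time(330e-6, 60.0)        # installed capacitor, 60 W load
c_for_40ms = holdup_capacitance(40e-3, 60.0)  # capacitance for the 40 ms estimate

print(f"330 uF at 60 W holds up for {t_330uf * 1e3:.1f} ms")
print(f"40 ms at 60 W needs {c_for_40ms * 1e6:.0f} uF")
```

The calculation treats the load as constant power and ignores converter efficiency, so the real hold-up time will be somewhat shorter.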

### 3.2 1000uF Hold-up

For testing, the 330uF hold-up capacitor is replaced by a 1000uF capacitor. With all FPGAs configured and UniBoard² running at 200W, the power supply is switched off. The resulting 12V and VCC$_{core}$ are shown in Figure 38, where the green line is ON/OFF, the pink line VCC$_{core}$ and the purple line 12V. The dashed marker indicates the falling edge of VCC$_{core}$ during a normal switch-off by the button. From this measurement it can be seen that the hold-up is at the limit: the 12V is going down before the core is switched off, but there is enough capacitance on the 12V bus supply to keep the level at the input of the SMPS high enough to sustain the conversion from 12V to 0.9V for VCC$_{core}$.



Figure 38 Hold up with 1000uF capacitor.

This measurement shows that a 1000uF capacitor is the best value for a controlled power-down sequence.

### 3.3 Input Current

The input current of the board is measured with the N6705B DC Power Analyser, see Figure 39.



Figure 39 UniBoard<sup>2</sup> input current at Switch ON

From this measurement the sequencing of the power supplies is clearly seen. The measurement is repeated at switch-off, see Figure 40.



Figure 40 Switch off

With one marker placed at switch-off and one at the end point, the charge (in Ah) drawn during shutdown is measured. The capacitance needed for the shutdown can be estimated from the definition of capacitance, 1F = 1A·s/V, where the factor 3600 converts Ah to A·s:

$$C = \frac{Q}{V} = \frac{11 \cdot 10^{-6} \cdot 3600}{75} = 528\,\mu F$$
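The same estimate as a small Python sketch. It assumes that the value 11·10⁻⁶ is a charge in ampere-hours, which is what the factor 3600 in the expression above suggests. The function name is illustrative.

```python
SECONDS_PER_HOUR = 3600.0


def capacitance_from_charge(charge_ah, voltage_v):
    """Capacitance [F] that stores charge_ah (ampere-hours) at voltage_v."""
    charge_coulomb = charge_ah * SECONDS_PER_HOUR  # 1 Ah = 3600 A*s
    return charge_coulomb / voltage_v


# Assumption: 11e-6 Ah measured between the two markers, 75 V hold-up voltage.
c_shutdown = capacitance_from_charge(11e-6, 75.0)
print(f"estimated capacitance: {c_shutdown * 1e6:.0f} uF")
```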

# 4 Boundary Scan

### 4.1 Standard Tests

For the boundary scan, JTAG Technologies Provision version CD21 has been used. To test the interconnections between the FPGA and the DDR4 sockets, test boards have been made on which $100\Omega$ resistors are placed between two lines, see Figure 41.



Figure 41 DDR4 Test board

For boundary scan the following tests are made.

**Table 16 JTAG tests**

| Test              | Testing                                                                            |
|-------------------|------------------------------------------------------------------------------------|
| Interconnection   | Infrastructure, and general interconnection tests.                                 |
| Inter_ddr         | General interconnection + all DDR4 connectors                                      |
| Transc_QSFP_node0 | Transceiver test for Ring transceivers between FPGAs and all QSFP cages for Node 0 |
| Transc_QSFP_node1 | Transceiver test for Ring transceivers between FPGAs and all QSFP cages for Node 1 |
| Transc_QSFP_node2 | Transceiver test for Ring transceivers between FPGAs and all QSFP cages for Node 2 |
| Transc_QSFP_node3 | Transceiver test for Ring transceivers between FPGAs and all QSFP cages for Node 3 |

The testability can be calculated; the result is shown below.

| Net Statistics                        | Testability | %    | Coverage | %    |
|---------------------------------------|-------------|------|----------|------|
| Total number of nets calculated       | 4893        | 100% | 4893     | 100% |
| Nets in Netlist                       | 4792        |      | 4792     |      |
| Nets added for not connected pins (+) | 101         |      | 101      |      |
| Nets ignored by user (-)              | 0           |      | 0        |      |
| Sensed and driven nets                | 2041        | 42%  | 1471     | 30%  |
| Sensed by BSCAN device (direct        |             |      |          |      |
| Sensed through transp. device (       |             |      |          |      |
| Sensed Pwr / Gnd nets                 | 45          |      | 45       |      |
| Implicitly tested nets                | 112         |      | 102      |      |
| Nets Covered 100% by user             |             |      | 0        |      |
| Nets Covered by JFT                   |             |      | 0        |      |
| Nets Covered by Imported CSV File     |             |      | 0        |      |
| Nets not tested by BSCAN              | 2852        | 58%  | 3422     | 70%  |

| Device Statistics                          | Testability | %    | Coverage | %    |
|--------------------------------------------|-------------|------|----------|------|
| Total number of devices calculated         | 5649        | 100% | 5649     | 100% |
| Devices in Netlist                         | 5649        |      | 5649     |      |
| Devices excluded by Bill of Materials (-)  | 0           |      | 0        |      |
| Ignored Devices (-)                        | 0           |      | 0        |      |
| Devices involved in a BScan test           | 535         | 9%   | 2881     | 51%  |
| BSCAN devices                              | 6           |      | 6        |      |
| Devices under test (50% - 100%)            | 529         |      | 2874     |      |
| Devices under test (0% - 49%)              | 0           |      | 1        |      |
| Devices Covered 100% by user               |             |      | 0        |      |
| Devices Covered by JFT                     |             |      | 0        |      |
| Devices Covered by Imported CSV File       |             |      | 0        |      |
| Devices without BSCAN testability          | 5114        | 91%  | 2768     | 49%  |
| Devices without model                      | 1100        |      | 1100     |      |
| Devices not testable according to user     | 0           |      | 0        |      |
| Capacitors                                 | 3319        |      | 699      |      |
| Other devices                              | 695         |      | 969      |      |

### 4.2 JTAG Functional Test (JFT)

With Provision's JFT all IO pins can be accessed by Python scripts. This has been used to test all I2C interfaces on UniBoard<sup>2</sup>. In Table 17 the tests are described.

**Table 17 JFT tests** 

| Test          | Testing                                                                                                                                      |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| JFT_EEPROM    | Testing the EEPROM by writing and reading data                                                                                               |
| JFT_LOC_PWR   | Testing the interface to the local power supplies. This is done by reading out the current and the voltage of one POL                        |
| JFT_BRD_PWR   | Test to verify the Node2 connection to the central power and the temperature sensor of the Ethernet Switch.                                  |
| JFT_SPD_DDR   | Verify the I2C interface to the SPD of the DDR4 module. This is done for the temperature sensor and by reading out the part number           |
| JFT_QSFP_READ | Verify the six individual I2C interfaces to the six modules per FPGA. This is done by reading out the temperature and the voltage on the module. |

Figure 42 shows an example of an I2C sequence. In this measurement it can be seen that the minimal clock is 63Hz. For some parts (like the BMR464 used for $VCC_{core}$) this clock is too slow; these SMPSs have to be tested with functional hardware.



Figure 42 JFT I<sup>2</sup>C sequence

As an example the test result for the SPD DDR is shown in Figure 43.

| Node | Slot | TEMP  | Part Number       |
|------|------|-------|-------------------|
| 0    | 0    | 35.75 | 18ASF1G72HZ-2G1A1 |
| 0    | 1    | 36.25 | 18ASF1G72HZ-2G1A1 |
| 1    | 0    | 36.50 | 18ASF1G72HZ-2G1A1 |
| 1    | 1    | 37.00 | 18ASF1G72HZ-2G1A1 |
| 2    | 0    | 31.25 | 18ASF1G72HZ-2G1A1 |
| 2    | 1    | 35.25 | 18ASF1G72HZ-2G1A1 |
| 3    | 0    | 27.00 | 18ASF1G72HZ-2G1A1 |
| 3    | 1    | 29.50 | 18ASF1G72HZ-2G1A1 |

Figure 43 JFT test for the DDR SPD.

# 5 Ethernet communication

The Ethernet communication is tested with the test firmware loaded in the FPGAs. The result is shown in Figure 44.

```
[2015:06:05 16:38:05] - (0) UTIL - >>> Title : Utility for pi_system_info.py on UNB-[63], FN-[0, 1, 2, 3]
[2015:06:05 16:38:05] - (3) UTIL - SI - UNB-63, FN-0: Design = unb2_test_1GbE
[2015:06:05 16:38:05] - (3) UTIL - SI - UNB-63, FN-1: Design = unb2_test_1GbE
[2015:06:05 16:38:05] - (3) UTIL - SI - UNB-63, FN-2: Design = unb2_test_1GbE
[2015:06:05 16:38:05] - (3) UTIL - SI - UNB-63, FN-3: Design = unb2_test_1GbE
```

Figure 44 Results of first communication

The status of the switch is read out via the RS232 control interface. For this Realterm is used, with the interface set to 9600 baud, 8 data bits, 1 stop bit and no parity. The result is shown in Figure 45 and Figure 46.

```
UniBoard2 0.4
UniBoard2
Length Jumbo: 9600
Chip id: 273890e9
```

| PRT0 | PRT1 | PRT2 | PRT3 | PRT4 | PRT5 | PRT6 | PRT7 | PRT8 | PRT9 | PRT10 | PRT11 | PRT12 | PRT13 |
|------|------|------|------|------|------|------|------|------|------|-------|-------|-------|-------|
| N0-0 | N0-1 | N1-0 | N1-1 | N2-0 | N2-1 | N3-0 | N3-1 | C ll | C ul | C lm  | C um  | C lr  | C ur  |

| Port | Status    | RXOCT      | TXOCT      |
|------|-----------|------------|------------|
| 0    | Port Up   | 0x00000000 | 0x00000000 |
| 1    | Port Down | 0x00000000 | 0x00000000 |
| 2    | Port Up   | 0x00000000 | 0x00000000 |
| 3    | Port Down | 0x00000000 | 0x00000000 |
| 4    | Port Up   | 0x00000000 | 0x00000000 |
| 5    | Port Down | 0x00000000 | 0x00000000 |
| 6    | Port Up   | 0x00000000 | 0x00000000 |
| 7    | Port Down | 0x00000000 | 0x00000000 |
| 8    | Port Down | 0x00000000 | 0x00000000 |
| 9    | Port Up   | 0x00000000 | 0x00000000 |
| 10   | Port Down | 0x00000000 | 0x00000000 |
| 11   | Port Down | 0x00000000 | 0x00000000 |
| 12   | Port Down | 0x00000000 | 0x00000000 |
| 13   | Port Down | 0x00000000 | 0x00000000 |

Figure 46 Result of 'S' command on the switch.

# 6 CONFIG

After some modifications the Quartus II programmer is used to program the flash.

# 7 DDR4 test

The DDR4 memories are tested with the EMIF toolkit. An example of the calibration report is shown in chapter 10.

After all parameters are set for the calibration, the design is included in UNB2\_test. A Python test case is used to test both memory banks for large package sizes.

```
[2015:08:13 15:48:26] - (3) TB - >>>
[2015:08:13 15:48:26] - (1) TB - >>> Title : Test case for the unb2_test_ddr design with MB = I on UNB-[0], GN-[0, 1, 2, 3]:
[2015:08:13 15:48:26] - (3) TB - >>>
[2015:08:13 15:48:26] - (3) TB - >>>
[2015:08:13 15:48:26] - (3) TB - >>> Rep-0
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-0: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-1: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-2: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-2: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-3: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-3: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - >>>
[2015:08:13 15:48:33] - (3) TB - >>>
[2015:08:13 15:48:33] - (0) TB - >>> Test bench result: PASSED
[2015:08:13 15:48:33] - (0) TB - >>> Runtime=6.220504 seconds (0.001728 hours)
[2015:08:13 15:48:33] - (3) TB - >>>
```

Figure 47 DDR test results
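When many boards are tested, a log such as the one in Figure 47 can be checked automatically. The sketch below is a hypothetical helper and not part of the UniBoard<sup>2</sup> test software: it assumes the line format shown in the figure and treats every `read_result` other than `Data OK` as a failure.

```python
import re

# Matches lines such as "... UNB-0, FN-2: RX_SEQ read_result = Data OK".
RESULT_RE = re.compile(r"UNB-(\d+), FN-(\d+): RX_SEQ read_result = (.*)$")


def failing_nodes(log_text):
    """Return (board, node) tuples whose read result is not 'Data OK'."""
    failures = []
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match and match.group(3).strip() != "Data OK":
            failures.append((int(match.group(1)), int(match.group(2))))
    return failures


# Excerpt of the log shown in Figure 47.
sample_log = """\
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-0: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (3) TB - UNB-0, FN-1: RX_SEQ read_result = Data OK
[2015:08:13 15:48:33] - (0) TB - >>> Test bench result: PASSED
"""

print(failing_nodes(sample_log))
```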

The reference clock for the DDR is shown in Figure 48.



Figure 48 DDR reference clock

# 8 SerDes

The transceivers are tested with the Transceiver Toolkit, part of Quartus II version 15.0.1. The power consumption for different numbers of used transceivers is shown in Table 18. For the measurement with 96 transceivers, VCCR\_GXB and VCCT\_GXB are connected together underneath the FPGA.

Table 18 Power consumption as function of the number of transceivers.

| Power    | MAX Current | Boot image | 24 Trans. | 48 Trans. | 72 Trans. | 96 Trans. |
|----------|-------------|------------|-----------|-----------|-----------|-----------|
| VCCR_GXB | 12A         | 0.6A       | 4.0A   | 7.3A   | 11.6A  | 10.6A  |
| VCCT_GXB | 12A         | 0.3A       | 1.2A   | 1.9A   | 3.2A   | 9.3A   |
| VCC_CORE | 40A         | 2.1A       | 4.6A   | 7.3A   | 9.7A   | 14.7A  |
| VCC_BAT  | 12A         | 1.0A       | 2.8A   | 4.3A   | 6.4A   | 8.5A   |
| VCC_ERAM | 12A         | 0.2A       | 0.2A   | 0.2A   | 0.2A   | 0.3A   |
| VCC_IO   | 12A         | 0.1A       | 0.1A   | 0.1A   | 0.1A   | 0.1A   |

From this table it can be seen that the power consumption per transceiver is in the order of 0.2W (0.15W for Rx and 0.05W for Tx).
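The per-transceiver figures can be derived from Table 18 as sketched below. Several assumptions are made here: the rail voltage is taken as the 1.0V measured for VCCR\_GXB and VCCT\_GXB in Table 15, VCCR\_GXB is attributed to the receivers and VCCT\_GXB to the transmitters, and the boot-image current is subtracted as a baseline. The 96-transceiver column is left out because the two rails were tied together for that measurement. The extra current drawn from VCC\_CORE and VCC\_BAT is not included.

```python
RAIL_VOLTAGE = 1.0  # V, measured VCCR_GXB / VCCT_GXB level (Table 15)

# Currents in A from Table 18, as (VCCR_GXB, VCCT_GXB).
BOOT_IMAGE = (0.6, 0.3)
MEASURED = {24: (4.0, 1.2), 48: (7.3, 1.9), 72: (11.6, 3.2)}


def power_per_transceiver(count):
    """Return (rx_watt, tx_watt) per transceiver for a measured column."""
    vccr, vcct = MEASURED[count]
    rx = (vccr - BOOT_IMAGE[0]) * RAIL_VOLTAGE / count
    tx = (vcct - BOOT_IMAGE[1]) * RAIL_VOLTAGE / count
    return rx, tx


for count in sorted(MEASURED):
    rx, tx = power_per_transceiver(count)
    print(f"{count} transceivers: Rx {rx:.3f} W, Tx {tx:.3f} W, total {rx + tx:.3f} W")
```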

For the transceiver test the built-in PRBS31 pattern is used. On the receiver a checker is used to calculate the BER. Figure 49 shows the interfaces of the ring. The output is sorted on bit errors.
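To illustrate the principle, the sketch below generates a PRBS31 sequence in software and computes a bit error ratio against it. It is not the implementation used by the Transceiver Toolkit or the FPGA hard IP: the generator uses the commonly cited PRBS31 polynomial x^31 + x^28 + 1 without output inversion, and the function names, the seed and the injected errors are purely illustrative.

```python
from itertools import islice


def prbs31(seed=0x7FFFFFFF):
    """Yield PRBS31 bits from a 31-bit LFSR (polynomial x^31 + x^28 + 1)."""
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero: an all-zero LFSR stays at zero")
    while True:
        # Feedback from stages 31 and 28 (bit positions 30 and 27).
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFFFF
        yield new_bit


def bit_error_ratio(received, expected):
    """Fraction of positions where received and expected bits differ."""
    errors = sum(r != e for r, e in zip(received, expected))
    return errors / len(received)


n_bits = 10_000
sent = list(islice(prbs31(), n_bits))
received = list(sent)
for position in (100, 5000):  # inject two artificial bit errors
    received[position] ^= 1

ber = bit_error_ratio(received, sent)
print(f"BER = {ber:.1e}")
```

A hardware checker normally synchronises itself to the incoming stream instead of sharing a seed with the transmitter. The comparison against a known reference is used here only to keep the example short.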



Figure 49 Screen Dump Transceiver Toolkit with Ring interfaces

In this table the net names are Loopback\_link\_slave\_?000\_address\_<bus>\_<lane>\_<node-1>, where:

- Bus 0: ring 0
- Bus 1: ring 1
- Bus 2 and 3: backplane

From this result it can be seen that some links have a high BER (>1e-12). These links run from Node 0 ring 0 (Loopback\_link\_slave\_5000\_address\_0\_<lane>) to/from Node 3 ring 1 (Loopback\_link\_slave\_6000\_address\_1\_<lane>\_2). The link between node 0 and node 3 on UNB2\_TB has long traces. Because the test board uses standard FR4, this shows that not all PCB loss can be compensated. On the ES1 devices the 10GbE option for backplanes is not available; better results are expected with this option. Another way to compensate for PCB losses is to use a better PCB material such as Megtron6. The production boards will carry production Arria10 devices, which have better PLLs and offer some extra options for compensating loss. With the right material and the production devices, links of 300 mm are expected to be feasible.

Figure 50 shows the results for the QSFP and the backplane interfaces. On the QSFP cages a combination of copper and optical interfaces is used. All interfaces are within the expected 1e-12.



Figure 50 Screen dump QSFP + Backplane

In this table the names of the transceivers are Loopback\_link\_slave\_?000\_address\_<bus>\_<lane>\_<node-1>, where:

- Bus 0 and 1: QSFP cages 0-6
- Bus 2 and 3: backplane
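A BER limit of 1e-12 can only be claimed after enough bits have been received. As a rough guide, the sketch below uses the standard zero-error confidence relation n = -ln(1 - CL) / BER to estimate the number of error-free bits required. The 95% confidence level and the 10Gbps line rate are assumptions for illustration; the duration of the tests reported here is not stated.

```python
import math


def bits_for_ber(target_ber, confidence=0.95):
    """Error-free bits needed to claim BER < target_ber at the given confidence."""
    return -math.log(1.0 - confidence) / target_ber


LINE_RATE_BPS = 10e9  # assumption: 10 Gbps per lane

required_bits = bits_for_ber(1e-12)
test_time_s = required_bits / LINE_RATE_BPS

print(f"bits needed: {required_bits:.2e}")
print(f"test time at 10 Gbps: {test_time_s:.0f} s")
```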

# 9 Conclusion

After some modifications UniBoard<sup>2</sup> is working. The maximum memory speed of the ES devices, 1600MT/s, is reached. Some errors are seen on the transceiver lines. It is expected that speeds of 10Gbps can be achieved with the production devices.

# 10 Memory Calibration Report

Calibration Report report for ed\_synth
Thu Aug 13 10:56:48 2015
Quartus II 64-Bit Version 15.0.2 Build 153 07/15/2015 SJ Full Version

; Table of Contents ;

- 1. Legal Notice
- 2. Calibration Status Per Group
- 3. DQ Pin Margins Observed During Calibration
- 4. DQS Pin Margins Observed During Calibration
- 5. FIFO Settings
- 6. Latency Observed During Calibration
- 7. Address/Command Margins Observed During Calibration
- 8. VREF Margins Observed During Calibration

; Legal Notice ;

Copyright (C) 1991-2015 Altera Corporation. All rights reserved. Your use of Altera Corporation's design tools, logic functions and other software and tools, and its AMPP partner logic functions, and any output files from any of the foregoing (including device programming or simulation files), and any associated documentation or information are expressly subject to the terms and conditions of the Altera Program License Subscription Agreement, the Altera Quartus II License Agreement, the Altera MegaCore Function License Agreement, or other applicable license agreement, including, without limitation, that your use is for the sole purpose of programming logic devices manufactured by Altera and sold by Altera or its authorized distributors. Please refer to the applicable agreement for further details.

```
+-----+
; Calibration Status Per Group;
+-----+
; Group; Status; Error Stage;
+-----+
; 0 ; Pass; N/A;
; 1 ; Pass; N/A;
; 2 ; Pass; N/A;
; 3 ; Pass; N/A;
; 4 ; Pass; N/A;
; 5 ; Pass; N/A;
; 6 ; Pass; N/A;
; 7 ; Pass; N/A;
; 8 ; Pass; N/A;
```

```
; -180 to 180 ; 12
; 8
                                 ; -162 to 171
                                                   ; 603
      ; -196 to 196
                     ; 6
                                                   ; 594
; 9
                               ; -162 to 171
                       ; 7
; 10
       ; -180 to 184
                                  ; -144 to 153
                                                   : 595
      ; -184 to 184
                                  ; -162 to 171
                       ; 6
                                                    : 597
; 11
      ; -188 to 192
                                  ; -162 to 171
                                                    ; 598
; 12
; 13
       ; -204 to 204
                       ; 0
                                  ; -162 to 162
                                                    : 595
; 14
       ; -180 to 184
                       ; 4
                                  ; -162 to 171
                                                    ; 596
                                  ; -162 to 162
; 15
       ; -192 to 192
                       ; 14
                                                    ; 600
       ; -168 to 168
                      ; 14
                                  ; -162 to 162
                                                     . 552
; 16
       ; -184 to 184
                                                    ; 548
; 17
                       ; 3
                                  ; -162 to 162
       ; -180 to 184
                                  ; -162 to 171
                                                    : 551
: 18
                       : 9
       ; -184 to 188
                                  ; -162 to 171
; 19
                                                    : 549
; 20
       ; -176 to 180
                       ; 7
                                  ; -162 to 171
                                                    ; 549
; 21
       ; -188 to 192
                       ; 0
                                  ; -171 to 180
                                                    ; 550
; 22
       ; -172 to 176
                       ; 9
                                  ; -171 to 171
                                                    ; 554
; 23
                                                    ; 552
       ; -184 to 188
                       ; 8
                                  ; -180 to 189
       ; -172 to 172
; 24
                       ; 13
                                   ; -171 to 180
                                                    ; 622
. 25
       : -184 to 184
                                  · -162 to 162
                       . 0
                                                    613
                                  ; -189 to 189
; 26
       ; -176 to 176
                                                    ; 619
                                  ; -171 to 171
; 27
       ; -168 to 172
                       ; 20
                                                    ; 624
; 28
       ; -184 to 184
                       ; 19
                                   ; -171 to 171
                                                    ; 621
; 29
       ; -180 to 184
                       ; 7
                                  ; -180 to 180
                                                    ; 618
; 30
                                  ; -180 to 180
       ; -180 to 180
                       ; 15
                                                    ; 621
; 31
       ; -176 to 176
                                   ; -180 to 180
                                                    ; 624
: 32
       ; -168 to 172
                                  ; -189 to 189
                                                    : 653
                       : 6
; 33
       ; -176 to 180
                                  ; -189 to 189
                                                    ; 650
: 34
       ; -156 to 160
                       ; 0
                                   ; -180 to 180
                                                    : 652
                                  ; -180 to 180
       : -176 to 176
; 35
                       ; 9
                                                    : 655
      ; -168 to 168
                                  ; -171 to 171
; 36
                                                    : 654
; 37
       ; -164 to 168
                                  ; -171 to 171
                                                    ; 651
                       ; 3
; 38
       ; -172 to 172
                                  ; -189 to 198
                                                    ; 654
: 39
       ; -184 to 188
                       ; 8
                                  ; -180 to 189
                                                    : 655
; 40 ; -172 to 172 ;    ; -180 to 189 ; 719 ;
; 41 ; -172 to 172 ; 7  ; -180 to 180 ; 719 ;
; 42 ; -168 to 172 ; 0  ; -171 to 171 ; 718 ;
; 43 ; -184 to 184 ; 8  ; -180 to 189 ; 719 ;
; 44 ; -172 to 176 ; 4  ; -171 to 180 ; 719 ;
; 45 ; -192 to 192 ; 16 ; -189 to 198 ; 722 ;
; 46 ; -180 to 180 ; 2  ; -171 to 180 ; 716 ;
; 47 ; -180 to 184 ; 12 ; -162 to 162 ; 721 ;
; 48 ; -176 to 180 ; 3  ; -171 to 171 ; 667 ;
; 49 ; -176 to 180 ; 4  ; -180 to 180 ; 670 ;
; 50 ; -156 to 160 ; 11 ; -171 to 171 ; 671 ;
; 51 ; -176 to 176 ; 0  ; -162 to 171 ; 666 ;
; 52 ; -168 to 168 ;    ; -162 to 171 ; 667 ;
; 53 ; -180 to 184 ; 10 ; -171 to 171 ; 667 ;
; 54 ; -180 to 184 ;    ; -189 to 198 ; 666 ;
; 55 ; -176 to 176 ; 0  ; -144 to 153 ; 665 ;
; 56 ; -176 to 176 ; 7  ; -171 to 171 ; 671 ;
; 57 ; -176 to 180 ; 21 ; -162 to 162 ; 677 ;
; 58 ; -180 to 184 ; 10 ; -162 to 171 ; 678 ;
; 59 ; -192 to 196 ; 21 ; -162 to 162 ; 680 ;
; 60 ; -184 to 184 ; 10 ; -144 to 153 ; 675 ;
; 61 ; -196 to 200 ; 15 ; -153 to 153 ; 681 ;
; 62 ; -188 to 192 ; 0  ; -153 to 162 ; 669 ;
; 63 ; -184 to 184 ; 21 ; -153 to 162 ; 679 ;
; 64 ; -176 to 176 ; 6  ; -171 to 180 ; 637 ;
; 65 ; -172 to 176 ;    ; -162 to 171 ; 634 ;
; 66 ; -172 to 176 ; 4  ; -162 to 162 ; 631 ;
; 67 ; -164 to 168 ; 7  ; -189 to 189 ; 635 ;
; 68 ; -164 to 168 ; 12 ; -171 to 171 ; 636 ;
; 69 ; -164 to 168 ; 0  ; -162 to 162 ; 634 ;
; 70 ; -184 to 184 ; 14 ; -180 to 180 ; 637 ;
; 71 ; -180 to 180 ; 10 ; -171 to 180 ; 636 ;
```

```
; DQS Pin Margins Observed During Calibration ;
; DQS Pin ; DQS Read Margin (ps) ; DQS Input Delay ; DQS Write Margin (ps) ; DQS Output Delay ; DQS Enable Delay ;
```

```
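; FIFO Settings ;
+-------+---------------+---------------+
; Group ; VFIFO Setting ; LFIFO Setting ;
+-------+---------------+---------------+
; 0     ; 0             ; 25            ;
; 1     ; 0             ; 25            ;
; 2     ; 0             ; 29            ;
; 3     ; 0             ; 29            ;
; 4     ; 0             ; 29            ;
; 5     ; 0             ; 29            ;
; 6     ; 0             ; 29            ;
; 7     ; 0             ; 29            ;
; 8     ; 0             ; 29            ;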
```

```
; Latency Observed During Calibration ;
+-------+---------+
; Type  ; Latency ;
+-------+---------+
; Read  ; 13      ;
; Write ; 4       ;
```

```
; Address/Command Margins Observed During Calibration ;
+----------+--------------+---------------+
; Pin      ; Margin (ps)  ; Delay Setting ;
+----------+--------------+---------------+
; CKE_0    ; Uncalibrated ; 1933          ;
; CKE_1    ; n/a          ; n/a           ;
; CKE_2    ; n/a          ; n/a           ;
; CKE_3    ; n/a          ; n/a           ;
; ODT_0    ; Uncalibrated ; 1933          ;
; ODT_1    ; n/a          ; n/a           ;
; ODT_2    ; n/a          ; n/a           ;
; ODT_3    ; n/a          ; n/a           ;
; RESET    ; Uncalibrated ; 1933          ;
; ACT      ; -612 to 603  ; 1932          ;
; CS_0     ; -540 to 540  ; 1933          ;
; CS_1     ; n/a          ; n/a           ;
; CS_2     ; n/a          ; n/a           ;
; CS_3     ; n/a          ; n/a           ;
; C_0      ; n/a          ; n/a           ;
; C_1      ; n/a          ; n/a           ;
; C_2      ; n/a          ; n/a           ;
; BA_0     ; -594 to 585  ; 1918          ;
; BA_1     ; -585 to 576  ; 1913          ;
; BG_0     ; -585 to 585  ; 1915          ;
; BG_1     ; -594 to 585  ; 1921          ;
; ADD_0    ; -603 to 594  ; 1915          ;
; ADD_1    ; -594 to 594  ; 1916          ;
; ADD_2    ; -585 to 585  ; 1922          ;
; ADD_3    ; -612 to 612  ; 1919          ;
; ADD_4    ; -612 to 603  ; 1916          ;
; ADD_5    ; -603 to 603  ; 1915          ;
; ADD_6    ; -594 to 585  ; 1914          ;
; ADD_7    ; -603 to 594  ; 1918          ;
; ADD_8    ; -612 to 603  ; 1915          ;
; ADD_9    ; -594 to 585  ; 1911          ;
; ADD_10   ; -576 to 576  ; 1922          ;
; ADD_11   ; -603 to 603  ; 1912          ;
; ADD_12   ; -585 to 585  ; 1919          ;
; ADD_13   ; -594 to 594  ; 1911          ;
; ADD_14   ; -594 to 594  ; 1921          ;
; ADD_15   ; -612 to 603  ; 1934          ;
; ADD_16   ; -603 to 594  ; 1911          ;
; ADD_17   ; n/a          ; n/a           ;
; ADD_18   ; n/a          ; n/a           ;
; ADD_19   ; n/a          ; n/a           ;
; PAR_IN   ; -603 to 603  ; 1918          ;
; ALERT0_N ; n/a          ; n/a           ;
; ALERT1_N ; n/a          ; n/a           ;
; CK0      ; n/a          ; n/a           ;
; CK0_N    ; n/a          ; n/a           ;
; CK1      ; n/a          ; n/a           ;
; CK1_N    ; n/a          ; n/a           ;
; CK2      ; n/a          ; n/a           ;
; CK2_N    ; n/a          ; n/a           ;
; CK3      ; n/a          ; n/a           ;
; CK3_N    ; n/a          ; n/a           ;
```

```
; VREF Margins Observed During Calibration ;
+-----+
; Group ; VREFIN margin ; VREFIN setting ; VREFOUT margin ; VREFOUT setting ;
+-----+
```
