

# **CENICS 2018**

The Eleventh International Conference on Advances in Circuits, Electronics and Micro-electronics

ISBN: 978-1-61208-664-4

September 16 - 20, 2018

Venice, Italy

## **CENICS 2018 Editors**

Claus-Peter Rückemann, Leibniz Universität Hannover / Westfälische Wilhelms-Universität Münster / North-German Supercomputing Alliance (HLRN), Germany


## Foreword

The Eleventh International Conference on Advances in Circuits, Electronics and Micro-electronics (CENICS 2018), held between September 16, 2018 and September 20, 2018 in Venice, Italy, continued a series of events initiated in 2008, capturing the advances in special circuits, electronics, and micro-electronics in both theory and practice, from fabrication to applications using these special circuits and systems.

Innovations in special circuits, electronics and micro-electronics are the key support for a large spectrum of applications. The conference focuses on several complementary aspects and targets the advances in each of them: signal processing and electronics for high speed processing, micro- and nano-electronics, special electronics for implantable and wearable devices, sensor related electronics focusing on low energy consumption, and special application domains such as telemedicine and eHealth, bio-systems, navigation systems, automotive systems, and home-oriented electronics. These applications have led to special design and implementation techniques and to reconfigurable and self-reconfigurable devices, and they require particular methodologies to be integrated with existing Internet-based communications and applications. Special care is required for devices intended to work directly with the human body (implantable, wearable, eHealth) or in a human-close environment (telemedicine, house-oriented, navigation, automotive). The miniature size required by such devices confronts scientists with special signal processing requirements.

We take here the opportunity to warmly thank all the members of the CENICS 2018 technical program committee, as well as all the reviewers. The creation of such a high quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated their time and effort to contribute to CENICS 2018. We truly believe that, thanks to all these efforts, the final conference program consisted of top quality contributions.

We also gratefully thank the members of the CENICS 2018 organizing committee for their help in handling the logistics and for their work that made this professional meeting a success.

We hope that CENICS 2018 was a successful international forum for the exchange of ideas and results between academia and industry and that it promoted further progress in the field of circuits, electronics and micro-electronics. We also hope that Venice, Italy provided a pleasant environment during the conference and that everyone saved some time to enjoy the unique charm of the city.

### **CENICS 2018 Chairs**

### **CENICS Steering Committee**

- Falk Salewski, Muenster University of Applied Sciences, Germany
- Chun-Hsi Huang, University of Connecticut, USA
- Vladimir Privman, Clarkson University - Potsdam, USA
- Diego Ettore Liberati, National Research Council of Italy, Italy
- Julio Sahuquillo, Universitat Politècnica de València, Spain
- Sergei Sawitzki, FH Wedel (University of Applied Sciences), Germany
- Manuel José Cabral dos Santos Reis, University of Trás-os-Montes e Alto Douro, Portugal
- Bartolomeo Montrucchio, Politecnico di Torino, Italy
- Petr Hanáček, Brno University of Technology, Czech Republic

## **CENICS Research/Industry Committee**

- John Vardakas, Iquadrat Informatica, Barcelona, Spain
- Laurent Fesquet, TIMA laboratory | Grenoble Institute of Technology, France
- Christian Wögerer, PROFACTOR GmbH, Austria
- Miroslav Velev, Aries Design Automation, USA
- Ivo Stachiv, Institute of Physics | Czech Academy of Sciences, Prague, Czech Republic / Harbin Institute of Technology | Shenzhen Graduate School, Shenzhen, China
- Amir Shah Abdul Aziz, TM Research & Development, Malaysia

## CENICS 2018 Committee


## **CENICS 2018 Technical Program Committee**

- Adel Al-Jumaily, University of Technology, Sydney, Australia
- Mohammad Amin Amiri, Malek Ashtar University of Technology, Islamic Republic of Iran
- Amir Shah Abdul Aziz, TM Research & Development, Malaysia
- Timm Bostelmann, FH Wedel (University of Applied Sciences), Germany
- Hamza Bouzeria, Constantine - 1- University, Algeria
- Khalid Bouziane, Université Internationale de Rabat, Morocco
- David Cordeau, XLIM UMR CNRS 7252, France
- Nicola D'Ambrosio, Laboratori Nazionali del Gran Sasso (LNGS) – INFN, Italy
- Jamal Deen, Academy of Science - Royal Society of Canada / McMaster University, Canada
- Javier Diaz-Carmona, Technological Institute of Celaya, Mexico
- Alie El-Din Mady, United Technologies Research Center, Cork, Ireland
- Diego Ettore Liberati, National Research Council of Italy, Italy
- Paulo Felisberto, LARSyS | University of Algarve, Portugal
- Laurent Fesquet, TIMA laboratory | Grenoble Institute of Technology, France
- Kelum Gamage, Glasgow University, UK
- Patrick Girard, LIRMM, France
- Petr Hanáček, Brno University of Technology, Czech Republic
- Houcine Hassan, Universitat Politècnica de València, Spain
- Chun-Hsi Huang, University of Connecticut, USA
- Kun Mean Hou, LIMOS - UMR 6158 - CNRS, France
- Jose Hugo Barron-Zambrano, Universidad Autonoma de Tamaulipas, Mexico
- Wen-Jyi Hwang, National Taiwan Normal University, Taiwan
- Manuel José Cabral dos Santos Reis, University of Trás-os-Montes e Alto Douro, Portugal
- Eric Kerherve, IMS Laboratory, France
- Oliver Knodel, Technische Universität Dresden, Germany
- Junghee Lee, University of Texas at San Antonio, USA
- Kevin Lee, Nottingham Trent University, UK
- Yo-Sheng Lin, National Chi Nan University, Taiwan
- David Lizcano, Madrid Open University (UDIMA), Spain
- Cristina Meinhardt, Federal University of Rio Grande (FURG), Brazil
- Harris Michail, Cyprus University of Technology (CUT), Cyprus
- Amalia Miliou, Aristotle University of Thessaloniki, Greece
- Georgi Mladenov, Bulgarian Academy of Sciences | Institute of electronics, Bulgaria
- Jose Manuel Molina Lopez, Universidad Carlos III de Madrid, Spain
- Bartolomeo Montrucchio, Politecnico di Torino, Italy
- Rafael Morales Herrera, University of Castilla-La Mancha, Spain
- Ioannis Moscholios, University of Peloponnese, Greece
- Shinobu Nagayama, Hiroshima City University, Japan
- Arnaldo Oliveira, UA-DETI/IT-Aveiro, Portugal
- Nikos Petrellis, TEI of Thessaly, Greece
- Vladimir Privman, Clarkson University - Potsdam, USA
- Càndid Reig, University of Valencia, Spain
- Piotr Remlein, Poznan University of Technology, Poland
- Brian M. Sadler, Army Research Laboratory, Adelphi, USA
- Djohra Saheb, Centre de Développement des Energies Renouvelables (CDER), Algeria
- Julio Sahuquillo, Universitat Politècnica de València, Spain
- Falk Salewski, Muenster University of Applied Sciences, Germany
- Sergei Sawitzki, FH Wedel (University of Applied Sciences), Germany
- Sandra Sendra, Universidad de Granada, Spain
- Saeideh Shirinzadeh, University of Bremen, Germany
- Ivo Stachiv, Institute of Physics | Czech Academy of Sciences, Prague, Czech Republic / Harbin Institute of Technology | Shenzhen Graduate School, Shenzhen, China
- Francisco Torrens, Universitat de Valencia, Spain
- Carlos M. Travieso-González, Universidad de Las Palmas de Gran Canaria, Spain
- John Vardakas, Iquadrat Informatica, Barcelona, Spain
- Miroslav Velev, Aries Design Automation, USA
- Manuela Vieira, ISEL-CTS-UNINOVA, Portugal
- Jin Wei, University of Akron, USA
- Robert Wille, Institute for Integrated Circuits | Johannes Kepler University, Linz, Austria
- Christian Wögerer, PROFACTOR GmbH, Austria
- Ravi M Yadahalli, SG Balekundri Institute of Technology, India
- Piotr Zwierzykowski, Poznan University of Technology, Poland

### **Copyright Information**

For your reference, this is the text governing the copyright release for material published by IARIA.

The copyright release is a transfer of publication rights, which allows IARIA and its partners to drive the dissemination of the published material. This allows IARIA to give articles increased visibility via distribution, inclusion in libraries, and arrangements for submission to indexes.

I, the undersigned, declare that the article is original, and that I represent the authors of this article in the copyright release matters. If this work has been done as work-for-hire, I have obtained all necessary clearances to execute a copyright release. I hereby irrevocably transfer exclusive copyright for this material to IARIA. I give IARIA permission to reproduce the work in any media format such as, but not limited to, print, digital, or electronic. I give IARIA permission to distribute the materials without restriction to any institutions or individuals. I give IARIA permission to submit the work for inclusion in article repositories as IARIA sees fit.

I, the undersigned, declare that to the best of my knowledge, the article does not contain libelous or otherwise unlawful content, does not invade the right of privacy, and does not infringe on a proprietary right.

Following the copyright release, any circulated version of the article must bear the copyright notice and any header and footer information that IARIA applies to the published article.

IARIA grants royalty-free permission to the authors to disseminate the work, under the above provisions, for any academic, commercial, or industrial use. IARIA grants royalty-free permission to any individuals or institutions to make the article available electronically, online, or in print.

IARIA acknowledges that rights to any algorithm, process, procedure, apparatus, or articles of manufacture remain with the authors and their employers.

I, the undersigned, understand that IARIA will not be liable, in contract, tort (including, without limitation, negligence), pre-contract or other representations (other than fraudulent misrepresentations) or otherwise in connection with the publication of my work.

Exception to the above is made for work-for-hire performed while employed by the government. In that case, copyright to the material remains with the said government. The rightful owners (authors and government entity) grant unlimited and unrestricted permission to IARIA, IARIA's contractors, and IARIA's partners to further distribute the work.

## **Table of Contents**

- Ultra-Low-Voltage Dual-Rail NAND/NOR for High Speed Processing. Ole Herman Schumacher Elgesem, Omid Mirmotahari, and Yngvar Berg
- Real-Time SDR-Based ISM-Multiantenna Receiver for DoA-Applications. Janos Buttgereit, Erik Volpert, Horst Hartmann, Dirk Fischer, Götz C. Kappen, and Tobias Gemmeke
- A New Front-End Readout Electronics for the ALICE Charged-Particle Veto Detector. Clive Seguna, Edward Gatt, Giacinto De Cataldo, Ivan Grech, and Owen Casha

## **Ultra-Low-Voltage Dual-Rail NAND/NOR for High Speed Processing**

Ole Herman Schumacher Elgesem

Department of Informatics, University of Oslo. Email: olehelg@ifi.uio.no

Omid Mirmotahari

Department of Informatics, University of Oslo. Email: omidmi@ifi.uio.no

Yngvar Berg

Department of Informatics, University of Oslo. Email: yngvarb@ifi.uio.no

Abstract—This paper expands Ultra-Low-Voltage Dual-Rail (ULVDR) technology to 2-input logic gates. While previous research has been focused on inverters, it is important to investigate and demonstrate the function and speed of ULVDR in bigger, more complex circuits. ULVDR offers a significant speed increase over the more traditional Cascode Voltage Switch Logic (CVSL). Using the industry standard 90 nm CMOS process and a supply voltage of 300 mV, ULVDR NAND gates are more than 50 times faster than CVSL when comparing chain evaluation delay.

Keywords–Ultra-Low-Voltage; high-speed; ULVDR; NAND; CVSL.

#### I. INTRODUCTION

Over the past 30 years, electronics have become faster, cheaper and much more prevalent. This has pushed the industry towards smaller devices with lower supply voltage, as well as power consumption. As feature sizes approach a few atoms in length, further miniaturization becomes impossible.

Wearables, as well as smaller smart / Internet of Things (IoT) devices, are becoming much more common. All these devices can be powered by batteries and/or various means of energy harvesting. Either way, the circuits need to be energy efficient and possibly operate at lower supply voltages. Within energy harvesting, the supply voltage domain ranges from $175\,\mathrm{mV}$ to $350\,\mathrm{mV}$, which is often referred to as Ultra-Low-Voltage. Exploring alternate circuit topologies is the most accessible way to reduce supply voltage and power consumption while maintaining speed. New circuit topologies that can be manufactured using existing factories and technology are favorable. Completely new ways to build computers, inspired by biology or quantum physics, are still far away from competing with the silicon electronics industry.

A prominent new logic style which builds on CMOS [1], CVSL [2] and domino logic [3], namely the ULVDR inverter, is stated to be 25 times faster than traditional dual rail clocked CVSL [4]. Our work in this paper contributes to the field of Ultra Low Voltage (300 mV) and is based on the design logic presented in [4]. In this paper, we present a ULVDR NAND/NOR gate.

The content of this paper is as follows: In Section II, we introduce the ultra low voltage dual rail logic style. The ULVDR NAND/NOR gates are discussed with transistor details in Section III, and the simulation verifying the logic is presented in Section IV. In Section IV-D, we compare our design to a CVSL gate in the simulation environment of a chain. Finally, a conclusion is given in Section V.

#### II. ULTRA LOW VOLTAGE DUAL RAIL LOGIC

A ULVDR precharge to 1 (0P1) inverter is shown in Figure 1. (A 0P1 gate has low voltage on its inputs and high voltage on its outputs during precharge.)



Figure 1. ULVDR 0P1 Inverter

#### During precharge:

When $\varphi$ is high, the inverter is in precharge rather than evaluation. Recharge transistors set all floating gates to their active state, such that both precharge (P) and evaluation (E) transistors are conducting. The output is brought to the precharge voltage, 300 mV for 0P1 and 0 mV for 1P0. Keeper (K) transistors do not play a significant role during precharge.

#### During evaluation:

When $\varphi$ is low, the inverter is in evaluation. Recharge transistors are turned off, allowing the gate nodes of precharge (P) and evaluation (E) transistors to float. Such a node is called a floating gate. When the input rising edge arrives, capacitive coupling causes the evaluation transistor's floating gate to be supercharged, achieving voltages outside $0 \le V_{GS} \le 300\,\mathrm{mV}$. The output switches quickly, and the keeper (K) transistors discharge the floating gate, turning completely off those evaluation and precharge transistors that should not be conducting.

For example, on a precharge to 1 inverter, a gate voltage $V_{GS} \approx 550\,\mathrm{mV}$ allows the output to quickly transition to 0. Digital circuits limited to gate voltages within $0 \le V_{GS} \le 300\,\mathrm{mV}$, like CVSL and CMOS, are much slower as the transistors are only weakly conducting in this sub-threshold state. At 300 mV, the delay of a ULVDR inverter has been demonstrated to be 7% of the CVSL inverter delay [4].

#### III. NAND GATES

#### A. CVSL NAND gate

CVSL technology is used for comparison. Figure 2 shows a static CVSL NAND. At 300 mV, the CVSL has similar speed to a static CMOS.



Figure 2. Static CVSL NAND used for comparison

In CVSL, when the input(s) arrive, one pull-down nMOS network is active. This pulls the output down to low, which in turn activates the pull-up pMOS transistor of the other output. As the other output goes high, the pMOS transistor of the first output is turned off, eliminating static power consumption. After some amount of time, an evaluation delay, one pull-down nMOS network is active, and the other pull-up pMOS transistor is active.

Sizes used for CVSL NAND gates are shown in Table I. Transistors are sized using W/L = 120 nm/240 nm and finger count N = 1, with exceptions listed in Table I. n and p are used to specify nMOS and pMOS transistors. $\parallel$ and s are used for sizes applying to parallel and series transistors, respectively. This circuit was simulated for all possible input combinations, and average delays were computed. $t_{df\mu}$ and $t_{dr\mu}$ are the average delays (falling and rising edge on output). Note that the rising edge is much slower than the falling edge, due to the relatively weak pull-up pMOS.

#### B. ULVDR NAND

A ULVDR NAND gate was created based on the CVSL NAND (Figure 2) and the ULVDR inverter (Figure 1). The evaluation transistors were substituted by parallel ($\parallel$) and series (s) evaluation transistors. The precharge circuitry was duplicated to accommodate the two floating gate inputs. Figure 3 shows both versions of the ULVDR NAND/NOR gate.

TABLE I. CVSL NAND DIMENSIONS

| Variable         | Value    |
|------------------|----------|
| $W$              | 120 nm   |
| $L$              | 240 nm   |
| $N_{n\parallel}$ | 2        |
| $N_{ns}$         | 4        |
| $t_{ie}$         | 1 ps     |
| $t_{df\mu}$      | 0.826 ns |
| $t_{dr\mu}$      | 6.80 ns  |

When designing ULVDR gates and setting transistor dimensions it is important to consider the state of the circuit once evaluation starts. Before the inputs arrive, all precharge and evaluation transistors are active, with gate to source voltages,  $|V_{GS}| \approx 300 \,\mathrm{mV}$ . Thus, the output will be pulled by the evaluation networks away from the precharge value. It is important to dimension precharge and evaluation transistors to be at equilibrium around 90% of the precharge value. This was done by Mirmotahari, Dadashi, Azadmehr, *et al.* in [4] and those sizes are used as a starting point. As the ULVDR inverter has symmetric rails (sides) it is enough to do one such matching per circuit.
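As a rough illustration of the 90% equilibrium target, the 0P1 output node can be viewed as a resistive divider between a conducting precharge transistor (pulling towards the supply) and a conducting evaluation network (pulling towards ground). This is a simplifying assumption made here for illustration, not the sizing method of [4], which relies on circuit simulation. The names `r_precharge` and `r_eval` are hypothetical lumped on-resistances in arbitrary units.

```python
VDD = 0.300  # supply voltage in volts (300 mV, as used in the paper)

def equilibrium_output(r_precharge: float, r_eval: float, vdd: float = VDD) -> float:
    """Divider voltage for a 0P1 gate: precharge pulls up, evaluation pulls down."""
    return vdd * r_eval / (r_precharge + r_eval)

def required_ratio(fraction: float) -> float:
    """Ratio r_eval / r_precharge needed for the output to settle at fraction * VDD."""
    return fraction / (1.0 - fraction)

ratio = required_ratio(0.9)            # evaluation path must be about 9x more resistive
v_out = equilibrium_output(1.0, 9.0)   # about 0.27 V, i.e. 90% of 300 mV
print(f"r_eval / r_precharge = {ratio:.2f}, V_out = {v_out * 1e3:.0f} mV")
```

Under this crude model, the precharge path has to be roughly nine times more conductive than the evaluation path while both are active before the inputs arrive.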

TABLE II. ULVDR NAND DIMENSIONS

| 0P1 Symbol        | 0P1 Value | 1P0 Symbol        | 1P0 Value |
|-------------------|-----------|-------------------|-----------|
| $C$               | 7 fF      | $C$               | 11 fF     |
| $W$               | 120 nm    | $W$               | 120 nm    |
| $L$               | 100 nm    | $L$               | 100 nm    |
| $L_{pP}$          | 240 nm    | $L_{nP}$          | 240 nm    |
| $L_{nE}$          | 240 nm    | $N_{nP\parallel}$ | 2         |
| $N_{nE\parallel}$ | 1         | $N_{nPS}$         | 1         |
| $N_{nES}$         | 2         | $N_{pE\parallel}$ | 1         |
| $N_{pP\parallel}$ | 8         | $N_{pES}$         | 2         |
| $N_{nPS}$         | 4         |                   |           |

For the ULVDR NAND/NOR gate, each rail is different and requires separate matching. Series evaluation transistors are doubled in size (finger count, N) to account for increased series resistance. Precharge transistors connected to parallel evaluation transistors are also doubled, to account for the increased parallel conductance. New transistor dimensions can be found in Table II. Recharge and keeper transistors are minimum sizes, but can be scaled according to timing requirements.
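The doubling rules above can be checked with a first-order sketch in which the on-resistance of a transistor is assumed to be inversely proportional to its finger count. This proportionality and the normalized unit resistance `r_finger` are illustrative assumptions; real devices at 300 mV will deviate from it.

```python
def on_resistance(fingers: int, r_finger: float = 1.0) -> float:
    """First-order on-resistance of one transistor built from parallel fingers."""
    return r_finger / fingers

def series(*resistances: float) -> float:
    """Total resistance of devices stacked in series."""
    return sum(resistances)

def parallel(*resistances: float) -> float:
    """Total resistance of devices connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

single = on_resistance(1)                                       # inverter reference
series_unscaled = series(on_resistance(1), on_resistance(1))    # twice the reference
series_doubled = series(on_resistance(2), on_resistance(2))     # back to the reference
parallel_pair = parallel(on_resistance(1), on_resistance(1))    # half the reference
precharge_doubled = on_resistance(2)                            # matches the parallel pair

print(single, series_unscaled, series_doubled, parallel_pair, precharge_doubled)
```

In this model, doubling the fingers of the series stack restores the pull strength of a single device, and doubling the precharge transistor keeps its ratio to the parallel pair unchanged.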

#### IV. SIMULATION

#### A. Logic verification

Figure 4 shows the NAND gate response to a binary counting sequence. The stimulus sequence 00, 01, 10, 11 produces the familiar NAND response: 1, 1, 1, 0. A1, B1 and X1 are the non-inverted signals; A2, B2 and X2 are their respective complements.
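The expected logic response can be reproduced with a small behavioural model of the evaluated outputs. It is only a sketch of the Boolean function, assuming ideal complementary rails; it does not model precharge, timing or floating gate voltages. The function name `dual_rail_nand` is hypothetical.

```python
from itertools import product
from typing import Tuple

def dual_rail_nand(a1: int, b1: int) -> Tuple[int, int]:
    """Return (X1, X2): X1 = NAND(A1, B1), X2 = complement of X1."""
    a2, b2 = 1 - a1, 1 - b1   # complementary input rails A2, B2
    x1 = a2 | b2              # De Morgan: NOT(A1 AND B1) = A2 OR B2
    x2 = 1 - (a2 | b2)        # NOR of the complementary rails, equal to A1 AND B1
    return x1, x2

sequence = list(product((0, 1), repeat=2))                  # 00, 01, 10, 11
x1_response = [dual_rail_nand(a, b)[0] for a, b in sequence]
x2_response = [dual_rail_nand(a, b)[1] for a, b in sequence]
print(x1_response, x2_response)
```

The second rail computes the NOR of the complemented inputs, which is one way to see why the same dual-rail structure is described as a NAND/NOR gate.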

The transient in Figure 5 shows the evaluation transistor floating gate voltage, FGA1, quickly jump when the input arrives. It peaks at 565.44 mV allowing the nMOS transistors to rapidly pull the output down to 0.

Figure 3. ULVDR NAND; 0P1 (left), 1P0 (right)

Figure 4. ULVDR NAND 0P1 response to 4 different inputs

Figure 5. Floating gate voltage when ULVDR NAND is switching

#### B. Parasitic delay

Using identical inputs, and ideal clock and voltage sources, parasitic delays were simulated. Delay was measured from input switches to output switches (50% to 50%). Another gate was connected to the output as a semi-realistic load (opposite polarity for ULVDR). Table III shows the results for both CVSL and ULVDR gates.

TABLE III. PARASITIC DELAYS (WORST CASE AND AVERAGE)

|            | CVSL NAND | ULVDR NAND |
|------------|-----------|------------|
| $t_e$      | 1 ps      | 1 ps       |
| $t_{dw}$   | 9.032 ns  | 0.178 ns   |
| $t_{d\mu}$ | 6.796 ns  | 0.103 ns   |

When using ideal inputs and supply (300 mV), the ULVDR gates have parasitic delays ranging from 32.1 ps to 178.0 ps. The average parasitic delay for ULVDR NAND gates (0.103 ns) is roughly 66 times smaller than for CVSL (6.796 ns), approaching two orders of magnitude. These ideal characteristics are useful for comparison but not realistic; Section IV-D gives a better delay estimate, using chain delay.
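The ratios implied by Table III can be recomputed directly. The snippet below only restates the tabulated worst-case and average delays; the dictionary names are chosen here for illustration.

```python
# Parasitic delays copied from Table III, in nanoseconds
cvsl_ns = {"worst": 9.032, "average": 6.796}
ulvdr_ns = {"worst": 0.178, "average": 0.103}

# Ratio of CVSL delay to ULVDR delay for each row of the table
ratios = {case: cvsl_ns[case] / ulvdr_ns[case] for case in cvsl_ns}

for case, ratio in ratios.items():
    print(f"{case}: CVSL / ULVDR = {ratio:.1f}")
```

Both ratios fall between one and two orders of magnitude, with the average case giving the larger factor.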

#### C. Monte Carlo simulation

A 200 sample Monte Carlo Sweep was run to show the effects of mismatch and process parameters (variance). The results for both CVSL and ULVDR NAND gates are shown in Figure 6. Both plots are on the same time scale.

#### D. Chain delay

In Section IV-B, parasitic delay was estimated. An ideal input signal gives a lower parasitic delay than can be expected in a real circuit. A more realistic delay can be estimated using a chain of NAND gates. In this configuration, the NAND gates act as inverters. There are 2 logic states: the input (and output) is either low or high. As there are 2 versions of the ULVDR gate, the delays for these states differ.

Figures 7 and 8 show transients from the chain delay simulations. Two cases were simulated,  $00 \rightarrow 11$  and  $11 \rightarrow 00$ .

Figure 6. Monte Carlo simulation; CVSL NAND (left), ULVDR NAND (right)

Figure 7. Output delay for chain of 30 CVSL NAND gates

Figure 8. Output delay for chain of 30 ULVDR NAND gates

#### V. CONCLUSION

For the two simulation cases mentioned in Section IV-D, the CVSL NAND gates achieve an average per-gate delay of 9.658 ns and 9.738 ns. The ULVDR NAND chain has a per-gate average delay of 169.705 ps and 141.33 ps. A speedup factor, s, can be calculated (worst case delays used):

$$s = \frac{t_{\text{CVSL}}}{t_{\text{ULVDR}}} = \frac{9.738\,\text{ns}}{0.169705\,\text{ns}} \approx 57$$

The chain test in Section IV indicates that ULVDR NAND gates can be more than 50 times faster than static CVSL NAND gates. The tradeoff is the complexity and size in silicon, especially when considering necessary clock and precharge circuits. This paper does not consider power usage, layout, clock drivers, etc. Further research is needed to completely characterize the ULVDR NAND gate and the differences between ULVDR and CVSL in terms of power, speed, robustness, area, etc.
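The speedup arithmetic can be reproduced from the per-gate chain delays quoted in this section. Pairing the two smaller values with each other is an assumption made here for illustration only, since the text does not state which ULVDR delay belongs to which simulation case.

```python
# Per-gate chain delays quoted in the conclusion, converted to nanoseconds
cvsl_ns = (9.658, 9.738)
ulvdr_ns = (0.169705, 0.14133)   # 169.705 ps and 141.33 ps

# Worst-case delays of each logic style, as used for the speedup factor s
s_worst = max(cvsl_ns) / max(ulvdr_ns)

# Illustrative pairing of the two smaller delays (assumed, not stated in the text)
s_best = min(cvsl_ns) / min(ulvdr_ns)

print(f"s (worst case) = {s_worst:.2f}")
print(f"s (smaller delays) = {s_best:.2f}")
```

Both figures stay above 50, in line with the claim that ULVDR NAND gates can be more than 50 times faster in the chain test.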

#### REFERENCES

- [1] F. M. Wanlass and C.-T. Sah, "Nanowatt logic using field-effect metal-oxide semiconductor triodes", 84, International Solid-State Circuits Conference, IEEE, 1963, pp. 32–33. [Online]. Available: https://ieeexplore.ieee.org/document/1157450/ (visited on 08/05/2018).
- [2] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family", 84, International Solid-State Circuits Conference, IEEE, 1984, pp. 16–17. [Online]. Available: https://ieeexplore.ieee.org/document/1156629/ (visited on 08/05/2018).
- [3] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, "High-speed compact circuits with CMOS", *IEEE Journal of Solid-State Circuits*, vol. 17, no. 3, pp. 614–619, Jun. 1982. [Online]. Available: https://ieeexplore.ieee.org/document/1051786/ (visited on 08/05/2018).
- [4] O. Mirmotahari, A. Dadashi, M. Azadmehr, and Y. Berg, "High-speed dynamic dual-rail ultra low voltage static CMOS logic operating at 300 mV", in *International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)*, IEEE, 2016. [Online]. Available: https://ieeexplore.ieee.org/document/7483898/ (visited on 08/05/2018).

## **Real-Time SDR-Based ISM-Multiantenna Receiver for DoA-Applications**

### Janos Buttgereit, Erik Volpert, Horst Hartmann, Dirk Fischer, Götz C. Kappen

NTLab, University of Applied Science Münster, Münster, Germany. Email: goetz.kappen@fh-muenster.de

### Tobias Gemmeke

IDS, RWTH Aachen University, Aachen, Germany. Email: gemmeke@ids.rwth-aachen.de

Abstract—Spectral efficiency is one of the critical issues that has to be considered during the setup of large wireless sensor networks in the Internet of Things (IoT). This paper presents a real-time hardware/software demonstrator based on a flexible Software Defined Radio (SDR) and a low-cost multiantenna array. The main purpose of this demonstrator is to evaluate cost-benefit parameters (i.e., required processing power, logic resources vs. performance of the multiantenna algorithm) of the overall multiantenna receiver (i.e., antenna, analog and digital signal processing). Therefore, size and power consumption as well as miniaturization of the demonstrator are not considered at this time. To motivate software functions and high-level software architecture, this paper gives a brief theoretical background of multiantenna receivers. A highly adaptable and modular C++-based framework has been developed that realizes all relevant low level and high level signal processing tasks (e.g., ADC-data transfer, online system calibration, Direction of Arrival (DoA) estimation and interferer suppression), as well as graphical visualization of the spatial spectrum, in a multithread-based manner. The multithread-based realization of the demonstrator ensures high performance and a convenient user experience. First measurements of the whole system (i.e., low-cost antenna, C++-based high level and low level signal processing, as well as graphical visualization using a host PC) in a real-world environment prove functional correctness while demonstrating real-time capability of the overall system.

Keywords–Multiantenna Systems; Wireless Sensor Networks; Spectral Efficiency; Software-defined-radio; IoT.

#### I. INTRODUCTION

Over the coming years, the number of IoT-nodes will increase rapidly [1], [2]. At the same time, IoT-node complexity varies widely, from simple sensor nodes used for temperature or humidity measurements to completely integrated embedded systems that are able to control processes and act autonomously. Figure 1 shows the exponential increase of IoT-nodes starting from 1992 and the forecast of the number of devices in 2025 [2]. Additionally, the world population is given for the same years, and it can be seen that from the year 2011 on the number of IoT-devices per person is greater than one. The dominant drivers of this evolution are miniaturization, cost reduction and increased power efficiency of semiconductor and sensor devices. Most IoT-based sensor nodes exchange data using wireless standards suitable for the required short or long-range communication. Thus, since the spectrum is a limited resource, spectral efficiency will play a critical role during IoT-transceiver development. Moreover, communication security and resistance against harmfully interfering signals will be further design objectives, as they already are today in nearly all other wireless systems [3].



Figure 1. IoT-Roadmap (based on: [1], [2], [4])

Multiantenna receivers are able to significantly improve spectral efficiency by using digital beamforming techniques. Interferer suppression can be realized by nulling techniques in the spatial domain. Finally, the DoA of signals and interferers can be estimated, which can be used to increase received signal strengths and improve the security of the communication channel by digital post-processing in the spatial domain [5], [6]. Figure 2 shows a simple stack of a wireless sensor node, featuring data sink/source, sensor data preprocessing, and analog and digital multiantenna processing.



Figure 2. Simple model of an IoT-sensor node
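To make the idea of DoA estimation in the spatial domain concrete, the following sketch scans a conventional (Bartlett) beamformer over a sample covariance matrix. Several things here are assumptions rather than details of the presented receiver: a uniform linear array with half-wavelength spacing, a single noise-free narrowband source, and the Bartlett estimator itself as a stand-in for whichever algorithms the demonstrator runs. Python is used for brevity, whereas the framework described in this paper is C++-based.

```python
import cmath
import math

def steering_vector(theta_deg, n_ant, spacing_wl=0.5):
    """Array response of an assumed uniform linear array for angle theta."""
    phase = 2.0 * math.pi * spacing_wl * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * m) for m in range(n_ant)]

def covariance(snapshots):
    """Sample covariance R = (1/N) * sum(x x^H) over all antenna snapshots."""
    n_ant = len(snapshots[0])
    acc = [[0j] * n_ant for _ in range(n_ant)]
    for x in snapshots:
        for i in range(n_ant):
            for k in range(n_ant):
                acc[i][k] += x[i] * x[k].conjugate()
    return [[value / len(snapshots) for value in row] for row in acc]

def bartlett_spectrum(cov, angles_deg, spacing_wl=0.5):
    """Spatial spectrum P(theta) = a^H R a, normalized by n_ant squared."""
    n_ant = len(cov)
    spectrum = []
    for theta in angles_deg:
        a = steering_vector(theta, n_ant, spacing_wl)
        power = sum(a[i].conjugate() * cov[i][k] * a[k]
                    for i in range(n_ant) for k in range(n_ant))
        spectrum.append(power.real / n_ant ** 2)
    return spectrum

# Synthetic test signal: one unit-power source at 20 degrees, 4 antennas
true_doa, n_ant = 20.0, 4
a_true = steering_vector(true_doa, n_ant)
snapshots = [[cmath.exp(1j * 0.7 * t) * a_m for a_m in a_true] for t in range(50)]

angles = list(range(-90, 91))
spectrum = bartlett_spectrum(covariance(snapshots), angles)
estimate = angles[max(range(len(angles)), key=spectrum.__getitem__)]
print(f"estimated DoA: {estimate} degrees")
```

The peak of the scanned spectrum marks the estimated direction; a nulling beamformer would instead choose weights that minimize the response in the direction of an interferer.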

The major drawbacks of multiantenna transceivers are the increased amount of required digital signal processing, as well as the complexity of algorithms and software code. Therefore, a clear code structure, as well as efficiency, flexibility and re-usability of the code, play a central role when realizing the digital signal processing part of the receiver. Also, for the sensor node, special care must be taken during the realization of the analog part and the data transfer to the digital domain. In particular, interferer suppression and DoA-estimation rely on coherent signal reception and processing. Therefore, this paper describes the SDR used and the digital calibration techniques that allow these algorithms to be realized. Finally, the antenna array drives the size and cost of the receiver and is therefore the key to user acceptance and suitable application domains.

This paper focuses on the two upper layers (i.e., digital and analog multiantenna signal processing) shown in Figure 2 and on the antenna array. Since flexibility is the key challenge during the design and evaluation process, a flexible SDR approach is adopted to implement these layers of the sensor node. The SDR has been programmed in a very modular way. Thus, the proposed system is very flexible and can be easily adapted to other receiver standards and frequency bands (e.g., DECT, GPS). Finally, a generic antenna design can be used to test receivers in various frequency bands and for different applications. For all examples in this paper, a receiver setup for the 2.4 GHz ISM-band is assumed.

The rest of the paper is organized as follows. Section II gives a general discussion of the problem from the application's and user's point of view. It can be seen that DoA-estimation is a crucial part of beamforming and interferer suppression, as well as of the process of gaining information about the current environment. For a mathematical description, Section III defines the signal model and presents the simulation and receiver test environment. Afterwards, Section IV gives an in-depth description of the hardware used throughout the paper. The central part of the presented receiver is the SDR, which allows various frequency bands to be selected and the sampling frequency and receiver bandwidth to be defined. Additionally, this section provides a high level overview of the C++-based receiver software (low-level and high-level Digital Signal Processing (DSP)) and Graphical User Interface (GUI) programming, as well as a description of the various external and internal interfaces of the system, while details of the receiver software are discussed in Section V. The final part of Section IV presents some details of the low-cost antenna design and setup.

Section V is devoted to the software realization of the receiver and gives implementation details of the main blocks of the receiver software (e.g., recording of the incoming frontend samples, calculation of the covariance matrix, DoA-estimation and visualization of the time plot and the DoA-spectrum). Special emphasis lies on the thread-based realization to ensure real-time performance, portability and flexibility. Therefore, this section deals with three central points:

- Parallel realization of the receiver software tasks.
- Object oriented programming to ensure flexibility and cope with large code-complexity.
- Cross-platform realization of the software-code.

In Section VI, the used measurement setup and measurement results are described to show the potential of the overall receiver hardware/software-concept.

Section VII summarizes the paper and shows the intended optimization steps of the receiver hardware/software (i.e., miniaturization, introduction of new algorithms, introduction of new applications).

#### II. PROBLEM DEFINITION

IoT-nodes and IoT-node networks suffer from the operation of a large number of nodes in close vicinity and from indoor operation. As discussed in the introductory part, this leads to:

- Interference and
- Multipath (especially in an indoor environment).

While multiantenna concepts are able to mitigate these problems, hardware and software development is time-consuming, and the power consumption of the sensor node is always a critical issue [7]. Therefore, the critical task is to perform a cost-benefit analysis (e.g., minimal power consumption vs. meeting the application-defined DoA-estimation accuracy and interferer suppression) in a short time.

To quickly develop and evaluate IoT-nodes that fulfill the required user demands, their performance needs to be observed, i.e., the quality of several different DoA-algorithms and low-cost antenna setups for various real-world signal situations under real-time conditions. Thus, the first step is to develop a modular PC-application that uses SDR-hardware as input source, runs various estimation algorithms and visualizes their results in real-time using a GUI. This application acts as a proof-of-concept demonstrator. It shall help to judge the performance of the algorithms and arrays under various circumstances and to trigger critical estimation edge cases, in order to ultimately develop better or cheaper algorithms and arrays. This research approach is followed by a design and realization phase of the low-cost and low-power sensor as well as the analog and digital signal processing hardware (cf. Figure 2).

#### III. SIGNAL MODEL AND SIMULATION

This section describes the signal model and the simulation setup, as well as simulation results. Furthermore, the main algorithms for DoA-estimation (e.g., Capon-Beamformer and Multiple Signal Classification (MUSIC) algorithm [8]) are introduced and the respective equations are given.

#### A. Signal Model

In this section the signal model, based on the theory described in [6], [5] and [8], is briefly summarized while the description is restricted to one received signal. We assume that we are in the far field of the sending antenna, the narrow band assumption holds and that the antenna has a flat frequency response. Then the vector  $\mathbf{u}$ , which might be used to describe signal and interferer, can be defined. Figure 3 shows an arbitrary antenna array with N antenna elements and the vector  $\mathbf{u}$ .



Figure 3. Multiantenna Model

Then, **u** can be written, depending on  $\phi$  and  $\theta$  as

$$\mathbf{u}(\phi,\theta) = \begin{pmatrix} -\sin\theta\,\cos\phi\\ -\sin\theta\,\sin\phi\\ -\cos\theta \end{pmatrix} \tag{1}$$

and the wave vector  $\mathbf{k}$ , relative to the origin of the given coordinate system, can be calculated as

$$\mathbf{k}(\phi,\theta) = \frac{2\pi}{\lambda} \cdot \mathbf{u}(\phi,\theta) \tag{2}$$

In the following, the angles  $\phi$  and  $\theta$  are omitted. If it is now assumed that an N-element antenna (cf. Figure 3) receives this signal from a defined DoA the resulting equation, which describes the time-dependent output vector, is:

$$\mathbf{x}(t) = \exp\left(-\mathrm{j}\mathbf{p}\mathbf{k}\right)s(t) + \mathbf{n}(t) = \mathbf{a}s(t) + \mathbf{n}(t) \tag{3}$$
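Equations (1) to (3) can be illustrated with a minimal C++ sketch that computes the steering vector  $\mathbf{a}$  for a given array geometry. It is not taken from the demonstrator software. The names `directionVector` and `steeringVector` are hypothetical, and  $\mathbf{p}$  is assumed to hold one row of Cartesian coordinates (in metres) per antenna element, which the text does not state explicitly.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using cplx = std::complex<double>;

// Unit vector u(phi, theta) from equation (1); angles in radians.
Vec3 directionVector(double phi, double theta) {
    return { -std::sin(theta) * std::cos(phi),
             -std::sin(theta) * std::sin(phi),
             -std::cos(theta) };
}

// Steering vector a = exp(-j p k) with k = (2*pi/lambda) * u, equations (2) and (3).
// positions: one entry per antenna element (assumed layout of p), in metres.
std::vector<cplx> steeringVector(const std::vector<Vec3>& positions,
                                 double phi, double theta, double lambda) {
    assert(lambda > 0.0);
    const double pi = std::acos(-1.0);
    const Vec3 u = directionVector(phi, theta);
    std::vector<cplx> a;
    a.reserve(positions.size());
    for (const Vec3& p : positions) {
        double pk = 0.0;  // scalar product of element position and wave vector
        for (std::size_t i = 0; i < 3; ++i) {
            pk += p[i] * (2.0 * pi / lambda) * u[i];
        }
        a.push_back(std::polar(1.0, -pk));  // exp(-j * pk)
    }
    return a;
}
```

For two elements spaced  $\lambda/2$  apart along the x-axis and a wave arriving along that axis ( $\phi = 0$ ,  $\theta = \pi/2$ ), the two entries of  $\mathbf{a}$  differ by a phase of  $\pi$ , as expected for half-wavelength spacing.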

Afterwards, the so-called spatial covariance matrix can be defined using the expectation operator  $E\{\cdot\}$  as

$$\begin{aligned} \mathbf{R} &= E\{\mathbf{x}(t)\mathbf{x}^{H}(t)\} \\ &= \mathbf{a}E\{s(t)s^{H}(t)\}\mathbf{a}^{H} + E\{\mathbf{n}(t)\mathbf{n}^{H}(t)\} \\ &= \mathbf{a}\mathbf{P}\mathbf{a}^{H} + \sigma^{2}\mathbf{I} \end{aligned} \tag{4}$$

The second line assumes that signal and noise are uncorrelated, so that the cross terms vanish.

Equation (4) can be written using a unitary matrix U and a matrix of the Eigenvalues  $\Lambda = \text{diag}\{\Lambda_0, ..., \Lambda_{N-1}\}$  [9].

$$\mathbf{R} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^{H} = \mathbf{U}_{s} \mathbf{\Lambda}_{s} \mathbf{U}^{H}_{s} + \mathbf{U}_{n} \mathbf{\Lambda}_{n} \mathbf{U}^{H}_{n} \tag{5}$$

Here, the eigenvalues of noise (index n) and signal (index s) have been separated. For a real-world implementation, only a limited number of samples can be recorded and used to estimate the spatial covariance matrix. Following [8], this estimate is called  $\hat{\mathbf{R}}$ .
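With a finite number of K recorded snapshots, a common estimator is the sample average  $\hat{\mathbf{R}} = \frac{1}{K}\sum_{k} \mathbf{x}_k \mathbf{x}_k^{H}$ . The following C++ sketch shows this estimator under that assumption. The function name `sampleCovariance` and the container layout are hypothetical, and the sketch omits the SIMD and multithreading optimizations described in Section V.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using CVector = std::vector<cplx>;
using CMatrix = std::vector<CVector>;

// Sample estimate of the spatial covariance matrix:
// R_hat = (1/K) * sum over all K snapshots of x * x^H.
CMatrix sampleCovariance(const std::vector<CVector>& snapshots) {
    assert(!snapshots.empty());
    const std::size_t n = snapshots.front().size();
    CMatrix r(n, CVector(n, cplx(0.0, 0.0)));
    for (const CVector& x : snapshots) {
        assert(x.size() == n);  // every snapshot holds one sample per antenna
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                r[i][j] += x[i] * std::conj(x[j]);
            }
        }
    }
    const double scale = 1.0 / static_cast<double>(snapshots.size());
    for (CVector& row : r) {
        for (cplx& value : row) value *= scale;
    }
    return r;
}
```

The result is Hermitian by construction, so an optimized implementation would only need to compute the upper triangle.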

In this work, two DoA-estimation algorithms are considered: the Capon beamformer and the MUSIC algorithm [8]. Both algorithms generate a spatial spectrum, where the maximum gives an estimate of the DoA of the incoming signal.

For the Capon beamformer, the spatial spectrum is defined as:

$$P_{\rm CAP} = \frac{1}{\mathbf{a}^H(\phi, \theta) \hat{\mathbf{R}}^{-1} \mathbf{a}(\phi, \theta)} \tag{6}$$
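A single point of the Capon spectrum in equation (6) can be evaluated as in the following C++ sketch. Instead of forming  $\hat{\mathbf{R}}^{-1}$  explicitly, the sketch solves  $\hat{\mathbf{R}}\mathbf{y} = \mathbf{a}$  by Gaussian elimination with partial pivoting, which is a substitution made here for brevity and not necessarily what the demonstrator does. The names `solveLinear` and `caponSpectrum` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;
using CVector = std::vector<cplx>;
using CMatrix = std::vector<CVector>;

// Solves r * y = b by Gaussian elimination with partial pivoting.
CVector solveLinear(CMatrix r, CVector b) {
    const std::size_t n = b.size();
    assert(r.size() == n);
    for (std::size_t col = 0; col < n; ++col) {
        assert(r[col].size() == n);
        std::size_t pivot = col;
        for (std::size_t row = col + 1; row < n; ++row) {
            if (std::abs(r[row][col]) > std::abs(r[pivot][col])) pivot = row;
        }
        assert(std::abs(r[pivot][col]) > 0.0);  // matrix must be non-singular
        std::swap(r[col], r[pivot]);
        std::swap(b[col], b[pivot]);
        for (std::size_t row = col + 1; row < n; ++row) {
            const cplx factor = r[row][col] / r[col][col];
            for (std::size_t k = col; k < n; ++k) r[row][k] -= factor * r[col][k];
            b[row] -= factor * b[col];
        }
    }
    CVector y(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        cplx sum = b[i];
        for (std::size_t k = i + 1; k < n; ++k) sum -= r[i][k] * y[k];
        y[i] = sum / r[i][i];
    }
    return y;
}

// Capon spectrum value for one steering vector a, equation (6).
double caponSpectrum(const CMatrix& rHat, const CVector& a) {
    const CVector y = solveLinear(rHat, a);  // y = R_hat^-1 * a
    cplx denominator(0.0, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        denominator += std::conj(a[i]) * y[i];  // a^H * R_hat^-1 * a
    }
    return 1.0 / denominator.real();
}
```

For  $\hat{\mathbf{R}} = \mathbf{I}$  the result reduces to  $1/(\mathbf{a}^H\mathbf{a})$ , i.e.,  $1/N$  for a steering vector with unit-magnitude entries.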

The MUSIC spectrum is defined as:

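$$P_{\rm M} = \frac{\mathbf{a}^{H}(\phi, \theta)\mathbf{a}(\phi, \theta)}{\mathbf{a}^{H}(\phi, \theta)\hat{\mathbf{U}}_{n}\hat{\mathbf{U}}_{n}^{H}\mathbf{a}(\phi, \theta)} \tag{7}$$

where  $\hat{\mathbf{U}}_{n}$  contains the estimated noise eigenvectors (cf. equation (5)).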
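Given the noise eigenvectors, one point of the MUSIC spectrum in equation (7) is a ratio of two quadratic forms. The C++ sketch below assumes that the eigenvalue decomposition has already been carried out elsewhere and that the columns of the noise-subspace matrix are passed in as separate vectors, as is standard for MUSIC [8]. The function name `musicSpectrum` is hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using CVector = std::vector<cplx>;

// MUSIC spectrum value for one steering vector a, equation (7).
// noiseVectors: estimated noise eigenvectors, one vector per column (assumed layout).
double musicSpectrum(const CVector& a, const std::vector<CVector>& noiseVectors) {
    assert(!noiseVectors.empty());
    double numerator = 0.0;  // a^H * a
    for (const cplx& value : a) numerator += std::norm(value);

    double denominator = 0.0;  // a^H * U * U^H * a = sum over columns of |u^H a|^2
    for (const CVector& u : noiseVectors) {
        assert(u.size() == a.size());
        cplx projection(0.0, 0.0);
        for (std::size_t i = 0; i < a.size(); ++i) {
            projection += std::conj(u[i]) * a[i];
        }
        denominator += std::norm(projection);
    }
    return numerator / denominator;
}
```

When the steering vector is orthogonal to the noise subspace, the denominator approaches zero and the spectrum shows a sharp peak; a practical implementation would guard against division by zero.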

For interferer suppression a simplified version of the Applebaum [5] array will be used.

#### B. Real-time Receiver Tests

The whole receiver signal processing chain has been developed and simulated in MATLAB. This Golden Reference model has been used during the receiver design process (see Section V) to validate the correctness of the real-time C++-based receiver results.

To this end, modulated carrier signals with random elevation and azimuth angles were generated in MATLAB for each sensor element and for various array geometries (i.e., circular, rectangular and uniform linear). Additionally, additive white Gaussian noise has been added to the signals (cf. equation (3)). These signals were used as input signals for the C++-based and MATLAB-based offline processing, using a simple file format developed for this purpose.



Figure 4. MATLAB and Real-Time C++ Comparison

Both processing paths in Figure 4 estimate the covariance matrix, the spatial spectrum, and the azimuth and elevation angles using floating-point precision (see Section V for implementation details). Since input signal and data precision are identical, the results can be compared directly, which eases the debugging of the real-time capable C++-receiver. Additionally, the effect of a reduced precision (e.g., single precision calculations) can easily be investigated. The results show that, while the spatial spectrum of the Capon beamformer is slightly degraded, the MUSIC-spectrum is identical at a resolution of  $1^{\circ}$  (see Section V).

#### IV. SDR-BASED RECEIVER OVERVIEW

The following subsections give an overview of the hardware (i.e., SDR, host computer and low-cost multiantenna) used to realize the DoA-estimation task, while the software is described in detail in Section V. Mainly, Ettus SDRs, equipped with daughterboards and connected to a host computer using 10 GBit/s connections, are used for analog preprocessing, analog-to-digital conversion and realization of the signal processing algorithms (cf. Figure 5). On the host computer, the Ettus API is used to establish the connection, control the data transfer and configure the Ettus daughterboards. Moreover, the DoA-estimation and calibration algorithms, as well as the GUI, are implemented on the host computer. For maximal flexibility (i.e., center frequency, antenna dimensions and geometry, as well as number of antenna elements) and minimal costs, the receiver antenna array is manufactured in-house based on simple dipole antennas.



Figure 5. General Schematic of a Multiantenna-Receiver

#### A. Receiver Hardware-Setup

A general approach to low-cost multiantenna receivers for Global Navigation Satellite System (GNSS) applications has been described in [3]. Since the hardware should be used to evaluate various DoA and interferer suppression algorithms, the concept presented in this work replaces the FPGA development board used in [3] with a commercially available SDR [10]. This architecture offers substantially more flexibility at significantly higher costs, which are acceptable during this early phase of the receiver design. The proposed receiver hardware is based on an Ettus SDR USRP X300 equipped with two SBX daughterboards [10]. Each daughterboard covers a frequency range from 400 MHz to 4.4 GHz and offers duplex operation, 40 MHz bandwidth and 16-bit ADC resolution. Each X300 device can be equipped with two daughterboards; therefore, a 4-channel SDR-receiver requires four SBX daughterboards and two X300 devices. Each X300 is connected to a host PC using a 10 GBit/s connection.

Figure 5 shows the setup based on multiple, independent receiver units, each generating its own Local Oscillator (LO) signal. As the phase relation of the received signals is a key factor for most DoA- and interferer suppression algorithms, and the LO-phase is added to the input signal phase, completely unsynchronized LOs generate useless input signals. If the phase offsets between the individual LOs are known, they can easily be canceled out by correcting the unwanted phase shift in software. To measure these offsets, the SDR-receivers used in the presented setup have two separate inputs, one connected to the antenna and one connected to a synchronization signal that is distributed to all receivers from a central signal source. Measurement results showed that switching over to the synchronization signal every five seconds to recalibrate the LO-phase error correction values is sufficient to obtain an overall stable measurement situation. Additionally, a time-invariant phase error is introduced by slightly different cable lengths (i.e., connections between antenna array and receiver). This error was measured once and is added as a time-invariant complex correction factor to the dynamically measured correction factors.

#### B. Software Overview and GUI

A high level schematic of the demonstrator software is shown in Figure 6. On the left hand side the four 16-bit digital input streams enter the signal processing stage and the spatial covariance matrix is calculated. The subsequent block performs the calibration of the spatial covariance matrix by applying time-varying and time-invariant complex correction factors.



Figure 6. Real-Time Demonstrator Schematic

Based on the corrected covariance matrix, the estimation algorithm (e.g., MUSIC, Capon) generates the spatial spectrum, which is displayed in the GUI. A parallel task searches for the maximum in the spatial spectrum. Its numerical results (i.e., elevation and azimuth) are also displayed in the GUI (cf. Figure 10). For debugging purposes, the software allows reading out the four-channel input data, as well as the output of the estimation algorithm. The data files can be used to compare the results of the C++-based processing of the real-time demonstrator and the MATLAB-based Golden Reference model (see Section III).

#### C. Antenna Setup

For tests in the ISM-band an array with four ground plane antennas has been designed. This type of antenna is low cost and easy to build and allows simplified antenna tuning [11]. Figure 7 shows the VSWR-plot of a single antenna, which shows a minimum at the desired frequency f = 2.45 GHz.



Figure 7. VSWR-Measurement used for Antenna Tuning

The driven element and the four radials have a mechanical length of around l = 3 cm, which is approximately  $\lambda/4$  for the selected center frequency of 2.45 GHz. As can be seen in Figure 8, the rectangular array features an interelement spacing of  $\lambda/2 \approx 6.1$  cm. In the construction shown, the electronic beam pattern is omni-directional for the azimuth angle, while there is no radiated energy at an elevation angle of  $\phi = 0^{\circ}$ . While this is a perfect setup for ground-based signals and interferers, it will lead to problems if the desired signals have larger elevation angles.



Figure 8. Low-Cost Multiantenna Realization

#### V. RECEIVER SOFTWARE REALIZATION

This section discusses details of the signal processing block realization in Figure 6. As mentioned in Section II the software should meet the following key constraints:

- Modular software architecture, e.g., implementing a new estimator or interferer suppression algorithm should be as easy as programming the algorithm itself.
- Modular hardware architecture, e.g., changing antenna array dimensions should be as easy as changing the description of the antenna positions, changing the center frequency should just be a change of a single variable.

- Optimum real-time DSP performance without any sample-drop combined with an optimal GUI-operation.

Furthermore, the phase-synchronization described in Section IV has to be implemented. To achieve these goals, a state-of-the-art DSP-software design flow is employed. The software is written solely in C++, using a cross-platform capable framework originally developed for professional audio DSP applications [12]. Besides displaying the current measurement snapshots of the input signals in the time domain, the software can capture the resulting spatial spectrum at any moment in time and store it to data files. This allows all parameters to be analyzed afterwards for various signal situations using software like MATLAB (see Section III).

#### A. Concurrent Data Processing

To make use of modern multicore-CPUs and meet the throughput requirements, the work is spread over multiple threads running in parallel, arranged in a software-pipeline structure, where each thread is a consumer of the previous thread's data and a producer for the following thread. Passing data from one thread to another is done by simply swapping buffers.



Figure 9. Multithreaded Software Pipeline

Figure 9 shows the data flow. All data-exchange buffers are allocated twice at start-up. As memory allocation is a system call with unpredictable execution time on general purpose operating systems, avoiding memory allocation on the high and medium priority threads narrows down the operations invoked on these threads to function calls with fully predictable execution time. This guarantees that the thread's job will be predictably finished before the next data buffer is passed for processing.
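The buffer-swapping idea can be sketched in C++ as follows. This is an illustrative simplification and not the demonstrator code: the class name `BufferExchange` and its methods are hypothetical, the hand-over is non-blocking, and a block that is published before the previous one has been fetched simply replaces it. All storage is allocated once in the constructor, so the hand-over itself only exchanges internal pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// One exchange point between a producer thread and a consumer thread.
class BufferExchange {
public:
    // Allocates the intermediate buffer once, at start-up.
    explicit BufferExchange(std::size_t blockSize)
        : slot_(blockSize), fresh_(false) {}

    // Producer side: hands over a filled buffer and receives a buffer to refill.
    // std::vector::swap exchanges storage without allocating or copying samples.
    void publish(std::vector<float>& filled) {
        assert(filled.size() == slot_.size());
        std::lock_guard<std::mutex> lock(mutex_);
        slot_.swap(filled);
        fresh_ = true;
    }

    // Consumer side: exchanges the already processed buffer for the newest block.
    // Returns false if no new block has been published since the last fetch.
    bool tryFetch(std::vector<float>& processed) {
        assert(processed.size() == slot_.size());
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) return false;
        slot_.swap(processed);
        fresh_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<float> slot_;
    bool fresh_;
};
```

In a pipeline as in Figure 9, one such exchange point would sit between each pair of neighbouring threads, with a condition variable or blocking call added where the consumer has to wait for data.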

Samples are received by blocking calls to the Ettus UHD API [13], which invokes the 10 GBit-Ethernet interface and returns as soon as a whole block of samples has been received from the hardware units and filled into the buffer passed to the API call. This buffer is forwarded to the sample processing thread afterwards, which returns the buffer it processed in the previous run to the receive thread to be filled again. This enables the new sample block to be processed, while another thread handles the acquisition of the following sample block in parallel. The sample processing thread fills a buffer for the scope if needed and then accumulates samples into the covariance matrix. Computation of this matrix is done by extensive use of SIMD-instructions on sub-vectors that exactly fit one cache-line of the CPU and uses an additional thread, not shown in the figure, to parallelize the matrix computation even further.

After a covariance matrix calculation has finished, the phase correction factors are applied to the matrix, which leads to a much smaller computational overhead compared to correction on a sample basis. Depending on the covariance matrix accumulation length, which can be modified using the GUI at runtime, the accumulation process is done over several sample blocks. Thus, in general it takes several runs of the sample processing thread until a covariance matrix is handed over to the covariance matrix processing thread, which realizes the current estimator algorithm. This is why the update rate of the covariance matrix thread is slightly lower. However, the DoA-algorithms invoked on this thread usually perform computationally heavy tasks like eigenvalue decomposition and matrix inversion, so the broader time slot for this thread allows it to finalize its computations before the next covariance matrix is passed.
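Why the correction can be moved from the samples to the matrix follows from a short argument: if every channel i is multiplied by a complex factor  $c_i$ , the corrected snapshot is  $\mathbf{C}\mathbf{x}$  with  $\mathbf{C} = \text{diag}\{c_0, ..., c_{N-1}\}$ , and the corrected covariance matrix becomes  $\mathbf{C}\mathbf{R}\mathbf{C}^{H}$ , i.e., element (i, j) is scaled by  $c_i c_j^{*}$ . The C++ sketch below applies this to an accumulated matrix. The function name `applyPhaseCalibration` is hypothetical, and combining the time-varying and time-invariant factors by multiplication is an assumption of this sketch.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using CVector = std::vector<cplx>;
using CMatrix = std::vector<CVector>;

// Applies per-channel correction factors to an accumulated covariance matrix:
// R'(i, j) = c_i * R(i, j) * conj(c_j), with c_i = dynamic_i * fixed_i (assumed).
void applyPhaseCalibration(CMatrix& r,
                           const CVector& dynamicFactors,
                           const CVector& fixedFactors) {
    const std::size_t n = r.size();
    assert(dynamicFactors.size() == n && fixedFactors.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(r[i].size() == n);
        const cplx ci = dynamicFactors[i] * fixedFactors[i];
        for (std::size_t j = 0; j < n; ++j) {
            const cplx cj = dynamicFactors[j] * fixedFactors[j];
            r[i][j] = ci * r[i][j] * std::conj(cj);
        }
    }
}
```

For an N-element array this costs on the order of  $N^2$  complex multiplications per accumulated matrix, independent of the number of samples that were accumulated.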

The estimation algorithms in general are expected to generate a spatial spectrum in the form of a 90x360 matrix (in case of a usual angular resolution of  $1^{\circ}$ ; other values are possible) and two vectors with azimuth and elevation angles of the estimated source positions. Those buffers are again handed over to the GUI-thread, which visualizes the spatial spectrum and prints out the positions of sources detected in a given interval. As updating the GUI is scheduled by the operating system, frame drops are theoretically possible at this point. However, those drops do not interrupt the processing activity. In practice, a GUI frame drop almost never happens, which leads to a smooth presentation of the spatial spectrum.
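The search for the strongest source in such a spectrum can be sketched in C++ as below. The sketch assumes a row-major buffer in which each row corresponds to one elevation step and each column to one azimuth step; this layout, the struct `Peak` and the function `findPeak` are hypothetical and only serve as illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct Peak {
    double elevationDeg;
    double azimuthDeg;
    double value;
};

// Finds the global maximum of a row-major spatial spectrum.
// rows: number of elevation steps, cols: number of azimuth steps (assumed layout).
Peak findPeak(const std::vector<double>& spectrum,
              std::size_t rows, std::size_t cols, double resolutionDeg) {
    assert(!spectrum.empty() && rows * cols == spectrum.size());
    const auto it = std::max_element(spectrum.begin(), spectrum.end());
    const std::size_t index =
        static_cast<std::size_t>(std::distance(spectrum.begin(), it));
    return Peak{ static_cast<double>(index / cols) * resolutionDeg,
                 static_cast<double>(index % cols) * resolutionDeg,
                 *it };
}
```

Detecting several sources would require a local-maximum search with a threshold instead of a single global maximum.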

A special case is handled when the receiver switches over to the synchronization signal. In this case, the covariance matrix computation will be paused and the phase correction value table will be updated, depending on the measured input signal phase offsets.

#### B. Object-Oriented Signal Processing

Object-oriented signal processing increases flexibility, as it allows a modular structure that directly models the signal flow block-diagram. Classes are used to encapsulate, e.g.,

- SDR-hardware
- Sample buffers
- Covariance-matrix calculation
- Phase correction measurement and application
- DoA-algorithm
- Spatial spectrum visualization

An important feature of C++ is the ability to describe (fully virtual) interface classes. This feature has been used to describe a generic DoA-algorithm class, consuming a covariance matrix and generating a spatial spectrum, as well as a pair of estimation vectors, that can be overridden by an actual implementation. A Capon beamformer, as well as a MUSIC-estimator algorithm, have been implemented, which can be chosen at runtime. As mentioned in the earlier sections, further algorithm development is one of the main goals. Thus, implementing new algorithms and switching from one to the other at runtime, while remaining within the same real-world signal situation, is a highly powerful feature of the demonstrator. Another powerful option comes from the SDR-hardware abstraction layer, which is currently under development for its next iteration. This next generation will allow the use of completely different receiver hardware, abstracted by the same IO-interface class, thus requiring minimal or no changes to the algorithm and visualization part of the software.
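The interface-class idea can be illustrated with the following C++ sketch. All names (`DoaResult`, `DoaEstimator`, `MeanPowerEstimator`, `EstimatorSlot`) are hypothetical and do not reflect the actual class design of the demonstrator. `MeanPowerEstimator` is a deliberately trivial placeholder that only shows where a real algorithm such as Capon or MUSIC would plug in.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using CMatrix = std::vector<std::vector<std::complex<double>>>;

// Output of an estimator: spatial spectrum plus estimated source angles.
struct DoaResult {
    std::vector<double> spectrum;
    std::vector<double> azimuthDeg;
    std::vector<double> elevationDeg;
};

// Pure virtual interface: consumes a covariance matrix, produces a DoaResult.
class DoaEstimator {
public:
    virtual ~DoaEstimator() = default;
    virtual std::string name() const = 0;
    virtual void estimate(const CMatrix& covariance, DoaResult& result) = 0;
};

// Placeholder implementation for illustration only: reports the mean
// diagonal power of the covariance matrix as a one-bin "spectrum".
class MeanPowerEstimator : public DoaEstimator {
public:
    std::string name() const override { return "MeanPower"; }
    void estimate(const CMatrix& covariance, DoaResult& result) override {
        assert(!covariance.empty());
        double power = 0.0;
        for (std::size_t i = 0; i < covariance.size(); ++i) {
            power += covariance[i][i].real();
        }
        result.spectrum.assign(1, power / static_cast<double>(covariance.size()));
        result.azimuthDeg.clear();
        result.elevationDeg.clear();
    }
};

// Holds the active estimator; selecting another one switches the algorithm at runtime.
class EstimatorSlot {
public:
    void select(std::unique_ptr<DoaEstimator> next) { active_ = std::move(next); }
    bool run(const CMatrix& covariance, DoaResult& result) {
        if (!active_) return false;
        active_->estimate(covariance, result);
        return true;
    }
    std::string activeName() const { return active_ ? active_->name() : "none"; }

private:
    std::unique_ptr<DoaEstimator> active_;
};
```

The processing thread only talks to `DoaEstimator`, so adding a new algorithm means adding one derived class, and the GUI can exchange the active object while the signal situation stays the same.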

#### C. Cross-Platform Implementation

The abstraction approach described in the previous subsection allows for portability of the code to various processing platforms. In a first version, this allows building software from the same codebase that runs on all three major operating systems (Microsoft Windows, Linux and Mac OS) without code changes. Therefore, various parts of the software can be implemented on different operating systems and integrated seamlessly. This approach significantly reduces development time, as team members can use their development tools of choice. For the final application, this results in the key benefit that the whole application or parts of it can easily be ported to an IoT-device. By design, an embedded Linux platform, as used for most IoT-devices, is a fully compatible target for the application, which considerably increases code re-use in upcoming development. Furthermore, deployment to mobile platforms, like Android or iOS, is a suitable option.

#### VI. MEASUREMENT SETUP AND RESULTS

The described SDR-based demonstrator featuring the low-cost multiantenna array has been used to perform some indoor measurements in the ISM-band at 2.45 GHz. Since multipath and interfering signals are expected in the utilized frequency band, the environment is close to the real application and the measurement quality will be degraded. Nevertheless, first qualitative results show an accuracy of the DoA-estimation of around 5° for the azimuth  $\theta$  and approximately 10° for the elevation angle  $\phi$ . Moreover, the real-time GUI (cf. Figure 10) shows a correct dynamic behavior. The GUI features some additional options (e.g., taking a data snapshot, real-time modification of receiver parameters, selection of the DoA-algorithm), which help to improve measurement results and ease software debugging.



Figure 10. Graphical User Interface of the Multiantenna-Receiver

Besides the qualitative test, a first profiling has been conducted to evaluate the computational requirements of the three threads shown in Figure 9. The profiling shows that about 53% of the overall processing time is consumed by the GUI and the user interaction (i.e., the green block in Figure 9), while 45% is required for the covariance matrix calculations and the DoA algorithm (i.e., the blue block in Figure 9). The high priority thread (i.e., the red block in Figure 9) only consumes about 1.5% of the overall processing time. These numbers are a good starting point for optimization and for the comparison of various DoA-estimation algorithms.

#### VII. CONCLUSION AND FURTHER DEVELOPMENT

Spectral efficiency, robustness and security are critical design parameters of wireless IoT-sensor nodes. Since the costs (i.e., silicon area, power consumption) of multiantenna IoT-sensor nodes are significantly higher than those of single antenna sensor nodes, a detailed cost-benefit analysis has to be performed in a first step. This paper presents a modular and flexible hardware/software-architecture based on an SDR, which realizes the analog preprocessing and the AD-conversion. The modular C++-code realizes all digital signal processing parts, allows simple debugging and is easily extensible. The presented modular and generic approach supports porting the existing software to embedded platforms to reduce size and power consumption in a next step. Finally, a simple technique to realize low-cost antenna arrays supports the overall approach. Measurements and simulations validate functional correctness, and the demonstrator shows the real-time capability of the overall receiver.

#### ACKNOWLEDGMENT

The authors would like to thank the team of the Central Area of Electrical Engineering and Computer Science (ZBE) for their support during the antenna manufacturing process.

#### REFERENCES

- [1] Cisco Internet Business Solutions Group (IBSG), "The internet of things," 2011.
- [2] "Number of IoT Devices," 2018, URL: https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ [accessed: 2018-07-20].
- [3] G. Kappen, C. Haettich, and M. Meurer, "Towards a robust multiantenna mass market GNSS receiver," in Proceedings of the 2012 IEEE/ION Position, Location and Navigation Symposium, April 2012, pp. 291–300.
- [4] "World Population," 2018, URL: https://www.populationpyramid.net/world/2025/ [accessed: 2018-07-20].
- [5] R. T. Compton, Adaptive Antennas, Concepts and Performance. Prentice Hall, 1988.
- [6] H. van Trees, Optimum Array Processing. John Wiley and Sons, Inc., 2002, ISBN: 9780471093909.
- [7] J. Xue, S. Biswas, A. C. Cirik, H. Du, Y. Yang, T. Ratnarajah, and M. Sellathurai, "Transceiver design of optimum wirelessly powered full-duplex mimo iot devices," IEEE Transactions on Communications, 2018, pp. 1–1.
- [8] H. Krim and M. Viberg, "Two decades of array signal processing research: the parametric approach," IEEE Signal Processing Magazine, vol. 13, no. 4, Jul 1996, pp. 67–94.
- [9] G. Golub and C. Van Loan, Matrix Computations, 2nd ed. Baltimore: Johns Hopkins University Press, 1989.
- [10] "Ettus Homepage," 2014, URL: http://www.ettus.com/ [accessed: 2018-07-20].
- [11] T. Milligan, Modern antenna design. Macmillan, 1985. [Online]. Available: https://books.google.de/books?id=sxUoAQAAMAAJ
- [12] "Juce Homepage," 2018, URL: https://juce.com/ [accessed: 2018-07-20].
- [13] "Ettus API," 2014, URL: http://files.ettus.com/manual/page\_coding.html/ [accessed: 2018-07-20].

## A New Front-End Readout Electronics for the ALICE Charged-Particle Veto Detector

Clive Seguna, Edward Gatt, Ivan Grech, Owen Casha Department of Microelectronics and Nanoelectronics University of Malta Msida, Malta e-mail: {clive.seguna, edward.gatt, ivan.grech, owen.casha}@um.edu.mt

Giacinto De Cataldo Department of Physics University of Bari Bari, Italy e-mail: Giacinto.de.Cataldo@cern.ch

Abstract—The upgrade strategy of A Large Ion Collider Experiment (ALICE) is based on collecting more than 10  $nb^{-1}$  of Pb-Pb collisions at luminosities of  $6 \times 10^{27}$  cm<sup>-2</sup>s<sup>-1</sup>, which corresponds to a collision rate of 50 kHz for Pb-Pb and 200 kHz for pp and p-Pb. Such high beam luminosity requirements cannot be met with the presently existing electronics, which have a low readout rate of 5 kHz. This work presents the design of new front-end readout electronics for the Charged-Particle Veto detector (CPV) module located in the PHOton Spectrometer (PHOS). The proposed new architecture, when compared to prior systems, allows the parallel readout and processing of all 480 silicon photomultiplier pads that are connected to digital signal processing cards. Preliminary results demonstrate that this work will enable the CPV detector to reach an interaction rate of at least 50 kHz. The system design consists of three modules, each containing two segment boards, two Readout Common Boards (RCBs) and 16 digital signal processors called DiLogic cards. This paper presents the architecture layout and preliminary performance measurement results for the proposed new design. The paper concludes with recommendations for further planned hardware updates.

Keywords— Electronics; Detector; Field-Programmable Gate Arrays; CPV; ALICE; PHOS.

#### I. INTRODUCTION

The ALICE experiment is dedicated to studying heavy-ion physics and to collecting data on heavy-ion and proton-proton collisions for comparison. The current system still leaves open physics questions that need to be addressed, and these questions relate to, among others, hadronization, nuclei, long range capability correlations and small x-proton structure [1][2]. The photon spectrometer PHOS is a lead-tungsten calorimeter designed to detect, identify and measure the 4-momenta of photons. The CPV is a charged particle veto detector for photon identification located in PHOS, consisting of a multiwire proportional chamber (MWPC) with cathode readout. The CPV electronics consist of dedicated Application Specific Integrated Circuit (ASIC) devices in each column: Gassiplex for analogue signal processing and DiLogic for handling the digitized information. Every column consists of 10 Gassiplex cards, called 3-GAS cards, interfaced directly on the backside of the MWPC cathode. A customized electronic board called 5-DiLogic contains five channels of 12-bit Analogue-to-Digital Converter (ADC) modules and five DiLogic (5-DiLogic) processors [3]. Each column contains 480 pads connected with two 5-DiLogic cards and a group of Field-Programmable Gate Arrays (FPGAs) called column and segment controllers that are used to process signals from a column and provide the necessary interface with the DAQ (Data Acquisition) and Central Trigger Processor (CTP) systems. The CPV consists of three electronic modules, one of them shown in Figure 1, where each CPV module contains sixteen columns and 7680 channels for amplitude analysis.



Figure 1. Hardware for one CPV module.

A typical event size is 1.3 Kbytes for Pb-Pb collisions. The maximum event readout rate that the detector can presently reach is 10 kHz for an occupancy of 1% [4]. Due to this technical limitation, a new front-end readout electronic system is being developed to collect more than 10 nb<sup>-1</sup> of Pb-Pb collisions at luminosities of up to  $6 \times 10^{27}$  cm<sup>-2</sup> s<sup>-1</sup>.

The rest of the paper is structured as follows. Section II gives an overview of the developed system hardware. Section III provides a description of the implemented firmware architecture. Preliminary results are shown in Section IV. Finally, Section V presents the conclusion and future work.

#### II. OVERVIEW OF SYSTEM HARDWARE ARCHITECTURE

Figure 2 illustrates the architecture of the new Front-End Electronic (FEE) readout hardware. The proposed hardware architecture for one module includes the re-design of a motherboard interface card called Segment board, used to concurrently process four DiLogic cards via an Altera FPGA column controller card containing a 28 nm Cyclone V GX device. A Readout Common Board (RCB) is used for transmitting information to run the experiment over a radiation tolerant Gigabit transceiver (GBT) link chipset [5] to a Data Acquisition (DAQ) Common Readout Unit (CRU) at a high speed (~5 Gb/s). The custom RCB card solution uses a Stratix IV FPGA device with four full duplex transceivers to transfer event data from the column controllers to the GBT over an optical Small Form-factor Pluggable (SFP) link for processing by the DAQ CRU. Furthermore, the radiation-hard GBT link provides the simultaneous transmission of trigger and experiment control data over the same optical link. One SFP for the Versatile transceiver link (VTRx) to the CRU and a Detector Data link version two (DDL2) interface shall be optionally included to comply with ALICE standards. The newly developed custom hardware shown in Figure 2 enables the simultaneous readout of all column analogue patterns of the 480 channels, thus reducing the readout time of the DiLogic cards by more than 50% compared to the present CPV and High Momentum Particle Identification Detector (HMPID) readout systems. Additionally, this architecture shall reduce the implementation costs because, unlike in the existing system, every FPGA column controller processes signals from two columns instead of one.

Figure 2. Block diagram of custom developed front-end electronic cards.

#### III. FIRMWARE DESIGN AND DEVELOPMENT

The firmware is divided into two separate top system VHDL (VHSIC Hardware Description Language) modules for column and RCB or Readout Receiver Card (RORC) controllers.

#### A. RCB Top System Module

The RCB Top level entity combines all lower level entities, as shown in Figure 3, into the Altera FPGA controller. The GBTx BANK entity includes several GBT links, where each GBT link is composed of a Gigabit Transmitter (GBTx) component that encodes and scrambles transmitted parallel data, a Gigabit Transmitter/Receiver (GBTRx) component that decodes and descrambles incoming data, and a Multi-Gigabit Transceiver (MGT) that serializes, transmits, receives and deserializes the data. The RCB segment controller VHDL module is responsible for the synchronization logic between the FPGA transceivers, the GBTx link and the Standard Interface Unit (SIU) DDL2 module. Additionally, it processes the L0 CTP signal via the Timing, Trigger and Control system (TTC) and issues a Busy flag for the reduction of the overall dataflow. The Busy flag is issued from the arrival of the L0 trigger to the end of the transmission of event data, as shown in Figure 6. The RCB segment controller also includes the implementation of the standard SIU protocol as an optional feature for the event data transmission to the DAQ or Destination Interface Unit (DIU) experiment recorder.

Figure 3. RCB control top VHDL module.

Figure 4 illustrates the command sequence adopted and implemented in the VHDL RCB Top module for transmitting event data from the FEE to the RCB or Readout-Receiver Card (RORC). The transmission of the Ready to Receive (RDRYX) command is then acknowledged by the SIU and followed by a group of commands, as explained in [6]. The maximum DDL2 data transfer rate between the SIU and the DIU is 5.125 Gb/s full duplex.

#### B. Column Controller Top System Module

A VHDL top entity implemented in the Cyclone V GX column controller FPGA device consists of at least two main low-level components called Gassiplex.vhd and Read\_sm\_simple.vhd. The Gassiplex.vhd component is responsible for the control logic of the Gassiplex chips, consisting of a charge-sense amplifier with a long decay time to acquire the detector analogue signal.



Figure 4. The event data transmission transaction [6].

A Track/Hold (T/H) signal is used to store charges in the Gassiplex sampling capacitors using T/H switches. A burst of clock pulses triggered by the column controller FPGA device is then generated to operate the multiplexed readout of the stored charges on a single output line. The Read\_sm\_simple.vhd component contains the hardware logic for the simultaneous readout of two 5-DiLogic cards, as described in [7]. A 10 MHz clock is used for reading the DiLogic First-In, First-Out (FIFO) memory. Every 5-DiLogic card contains five DiLogic signal processing chips, each having a FIFO of 512 18-bit words and the strobe and enable pins StrIn\_N, EnIn\_N and EnOut\_N to initiate and indicate the termination of FIFO readout [8][9]. Each DiLogic chip is put in analogue readout mode when EnIn\_N is set low. Successive StrIn\_N cycles cause all DiLogic modules in one 5-DiLogic card to sequentially output the digitised data on an 18-bit data bus, starting with the simultaneous readout of the first DiLogic chips in the chain, labelled DiLogic 0 and DiLogic 5, as shown in Figure 5.

An enable signal is then passed from the EnOut\_N pin to the EnIn\_N pin of the next DiLogic module once the transfer of digitised data for one event-word on the data bus has finished. The concurrent readout of DiLogic cards yields a twofold increase in the speed of reading event data from the DiLogic cards when compared to the previous and present CPV electronics architecture.
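The daisy-chained readout described above can be modelled in a few lines. The following is a simplified behavioural sketch in Python, not the VHDL of Read\_sm\_simple.vhd: it assumes one strobe cycle per FIFO word and ignores all timing details. The names `read_chain` and `strobe_cycles`, and the word counts in the usage example, are hypothetical.

```python
def read_chain(fifos):
    """Read a daisy chain of DiLogic FIFOs in order.

    `fifos` is a list of lists of words, one list per chip. The enable
    token starts at the first chip and moves to the next chip once the
    current FIFO is empty, mimicking EnOut_N driving the next EnIn_N.
    Returns (words read in order, number of strobe cycles used).
    """
    words = []
    cycles = 0
    for fifo in fifos:          # token passes from chip to chip
        for word in fifo:       # one strobe cycle per FIFO word (assumed)
            words.append(word)
            cycles += 1
    return words, cycles


def strobe_cycles(card_a, card_b, concurrent):
    """Strobe cycles needed to read two 5-DiLogic cards."""
    _, cycles_a = read_chain(card_a)
    _, cycles_b = read_chain(card_b)
    if concurrent:
        # Both chains are strobed in parallel; the longer one dominates.
        return max(cycles_a, cycles_b)
    # One chain is read after the other.
    return cycles_a + cycles_b


# Hypothetical example: five chips per card, 8 words stored in each FIFO.
card_a = [[(chip, i) for i in range(8)] for chip in range(0, 5)]
card_b = [[(chip, i) for i in range(8)] for chip in range(5, 10)]

sequential = strobe_cycles(card_a, card_b, concurrent=False)
parallel = strobe_cycles(card_a, card_b, concurrent=True)
print(sequential, parallel)
```

With equally filled cards the concurrent scheme needs half the strobe cycles (80 versus 40 in this example), which corresponds to the twofold gain quoted in the text; with unequal occupancy the gain would be smaller.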

Each event-word contains the selected channel address and digitised amplitude information, which are transferred via the FPGA transceivers at a rate of 3.125 Gbps to the RCB controller for further formatting and transfer to the DAQ.

The FPGA transceiver IP blocks of the RCB and column controllers include built-in 8B/10B encoder/decoder, byte serializer and deserializer modules. These ensure that data packets transmitted simultaneously from the various FPGA column controllers always start in a known byte lane, allowing the RCB FPGA controller to correctly decode and recover the event frame before any further processing by the RCB fabric. Additionally, on-chip FPGA power supply decoupling, which satisfies the transient current requirements at the high data rate of 3.125 Gbps, has been configured so as to reduce the need for on-board decoupling capacitors.
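Because 8B/10B coding sends 10 line bits for every 8 payload bits, the usable payload rate of a 3.125 Gbps link is lower than the line rate. The sketch below works this out and estimates the serialization time of a block of data. It ignores framing, idle characters and protocol overhead, and the function names are hypothetical.

```python
LINE_RATE_GBPS = 3.125  # transceiver line rate quoted in the text


def payload_rate_gbps(line_rate_gbps):
    """Payload rate after 8B/10B coding (8 data bits per 10 line bits)."""
    return line_rate_gbps * 8 / 10


def transfer_time_us(n_bytes, line_rate_gbps=LINE_RATE_GBPS):
    """Time to serialize n_bytes of payload, in microseconds."""
    line_bits = n_bytes * 10                   # each byte becomes a 10-bit symbol
    return line_bits / (line_rate_gbps * 1e3)  # 1 Gbps = 1e3 bits per us


print(payload_rate_gbps(LINE_RATE_GBPS))  # payload rate in Gb/s
print(transfer_time_us(40))               # a 40-byte, CDH-sized block
```

Under these assumptions the payload rate is 2.5 Gb/s, and a 40-byte block occupies the link for about 0.13 us.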

Figure 6 shows the timing diagram obtained via the Altera Signal Tap Logic Analyzer for a data block transfer from the RCB to the DAQ. The transfer is initiated on receiving the L0 trigger signal from the CTP for event number 3354h and consists of 10 words (40 bytes) of Common Data Header (CDH) followed by the event data.
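The transmit sequence visible in the Figure 6 capture (idle gap, header words, then data, with BUSY held throughout) can be sketched as a small state machine. This Python model only illustrates that sequence and is not the VHDL of the RCB module: the state names are taken from the capture, while the `Tx` class, the one-word-per-tick timing and the payload length are assumptions.

```python
CDH_WORDS = 10  # Common Data Header length in 32-bit words (from the text)


class Tx:
    """Toy model of the RCB transmit sequence, advanced one word per tick."""

    def __init__(self):
        self.state = "EVENT_GAP"
        self.busy = False
        self.remaining = 0
        self.data_words = 0

    def trigger(self, data_words):
        """L0 arrives: raise BUSY and start sending the header."""
        if self.state == "EVENT_GAP":
            self.state = "SEND_CDH_WORDS"
            self.busy = True
            self.remaining = CDH_WORDS
            self.data_words = data_words

    def tick(self):
        """Send one word and move to the next state when a phase ends."""
        if self.state == "EVENT_GAP":
            return
        self.remaining -= 1
        if self.remaining > 0:
            return
        if self.state == "SEND_CDH_WORDS" and self.data_words > 0:
            self.state = "SEND_DATA"
            self.remaining = self.data_words
        else:
            self.state = "EVENT_GAP"   # transmission finished
            self.busy = False          # BUSY released at end of event


tx = Tx()
tx.trigger(data_words=4)  # hypothetical 4-word payload
trace = []
while tx.busy:
    trace.append(tx.state)
    tx.tick()
print(trace)
```

In this model BUSY stays asserted for the ten header ticks plus the payload ticks and drops when the machine returns to EVENT\_GAP, matching the behaviour described for the Busy flag.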



Figure 5. FPGA column controller card for the simultaneous readout of two 5-DiLogic card processors (right) [8].

Signals captured in Figure 6: L0, BUSY, b\_tx\_data:s\_tx\_cs (states EVENT\_GAP, SEND\_CDH\_WORDS, SEND\_DATA), b\_tx\_data:cdh\_word[9..0][31..0], b\_tx\_data:u\_event\_number[23..0] (003354h) and b\_tx\_data:u\_cdh\_word[3..0] (0h to Ah).

Figure 6. Timing diagram obtained using Altera signal-tap logic analyser for data transfer between RCB and DAQ.

#### IV. PRELIMINARY MEASUREMENT RESULTS

A test jig was set up to evaluate the prototype performance of this new readout electronics architecture. The test jig, shown in Figure 7, consists of a Windows workstation terminal for FPGA programming, a Linux terminal for displaying performance results, two RORC modules and DAQ servers for storing event data, and a Local Trigger Unit (LTU) for issuing the CTP L0 trigger signal. The busy time of the data collection is mainly defined by the time the CTP waits for the readout electronics to complete the transmission of event data from the FEE to the DAQ server. In general, the detector busy time due to readout depends on the event size. With the trigger rate increased up to 200 Hz, the measured busy time, averaged over a one-minute interval, is 10 us, as shown in Table I, which is equivalent to an estimated event size of 2.6 Kbytes (2.5 columns, 5.2% occupancy of total detector channels). This measurement corresponds to an estimated event readout rate of 100 kHz, which is above the required target for a detector occupancy of 1.3 Kbyte in Pb-Pb collisions.
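The conversion from busy time to readout rate used above can be written out explicitly. This is a sketch that assumes the maximum sustainable event rate is simply the reciprocal of the busy time, i.e. one event is accepted per busy period; the symbols $f_{\mathrm{readout}}$ and $t_{\mathrm{busy}}$ are introduced here for illustration only.

$$f_{\mathrm{readout}} \approx \frac{1}{t_{\mathrm{busy}}} = \frac{1}{10\ \mu\mathrm{s}} = 100\ \mathrm{kHz}$$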



Figure 7. Test jig setup for CPV front-end readout prototype electronics.

The event readout rate measurements of the prior system are shown in Figure 8, indicating a maximum estimated readout rate of 5 kHz, twenty times slower than this work. The major contribution of our work is the redesign of the electronics, including the concurrent readout of DiLogic cards and the use of high-speed 3.125 Gbps FPGA transceiver links. Another test bench, based on a Cyclone V GT FPGA in 28 nm technology, was set up in [10] to characterize GBTx performance under Single Event Upsets (SEU) and thereby allow GBTx users to estimate SEU errors. The GBTx was irradiated using high-penetration particles at different angles to estimate the SEU rate and possible bit-error mechanisms under a luminosity of $10^{34}$ cm$^{-2}$s$^{-1}$. The new readout electronics presented in this work will be located in the ALICE detector, where the radiation doses are estimated to be 0.1 kRad and $1.9 \times 10^{10}$ charged particles/cm$^{2}$, which puts the CPV electronics on the safe side by 3 to 4 orders of magnitude [11].



Figure 8. Estimated event readout rate for prior system.

TABLE I. ESTIMATED READOUT RATE FOR VARIOUS EVENT SIZES

| Event Size (Bytes) | ~Busy Time (us) [This work] | Number of Columns | Detector Occupancy (channels) |
|--------------------|-----------------------------|-------------------|-------------------------------|
| 536                | 5.62                        | 0.5 Columns       | 1%                            |
| 1196               | 6.68                        | 1 Column          | 2.1%                          |
| 1752               | 7.56                        | 1.5 Columns       | 3.125%                        |
| 2152               | 8.22                        | 2 Columns         | 4.1%                          |
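The busy times in Table I grow roughly linearly with event size. As a rough illustration (not part of the measurement procedure described in this paper), the sketch below fits a straight line to the four table rows by ordinary least squares, separating a fixed per-event overhead from a per-byte cost, and also converts each busy time into an equivalent rate. The function name `fit_line` and the linear model itself are assumptions made here.

```python
# Table I rows: (event size in bytes, busy time in microseconds).
TABLE_I = [(536, 5.62), (1196, 6.68), (1752, 7.56), (2152, 8.22)]


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope_us_per_byte, overhead_us = fit_line(TABLE_I)

for size, busy_us in TABLE_I:
    rate_khz = 1e3 / busy_us  # reciprocal of the busy time, in kHz
    print(f"{size:5d} bytes  busy {busy_us:.2f} us  ->  ~{rate_khz:.0f} kHz")

print(f"fixed overhead ~{overhead_us:.2f} us, "
      f"per-byte cost ~{slope_us_per_byte * 1e3:.2f} ns")
```

Under this assumed model the fit gives a fixed overhead of a little under 5 us per event and a per-byte cost of about 1.6 ns, i.e. roughly 5 Gb/s, which is of the same order as the link speeds quoted in the text.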

Additionally, as described in [12], a threefold approach is to be adopted to detect errors caused by SEU in the FPGA memory cells and to protect the system against them:

- An efficient error detection scheme based on parity check logic,

- 8-bit/10-bit (8B/10B) data coding, implemented as part of the DDL2 and GBTx/GBTRx low-level protocols,

- A Cyclic Redundancy Check (CRC) accompanying the data on its way between the FEE and the RCB board.
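Two of the three mechanisms listed above, parity checking and a CRC, can be illustrated briefly. The sketch below is generic: the paper does not state the parity convention or the CRC polynomial used, so even parity and the CRC-32 from Python's standard `zlib` module are assumptions chosen only for illustration, as are the function names and the sample data.

```python
import zlib


def even_parity_bit(word):
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word, parity_bit):
    """True if the word still matches its stored parity bit."""
    return even_parity_bit(word) == parity_bit


# Parity on a single 18-bit word (the value is arbitrary).
word = 0b10_1100_0011_0101_0001
p = even_parity_bit(word)
flipped = word ^ (1 << 7)        # simulate a single-bit upset
print(parity_ok(word, p), parity_ok(flipped, p))

# CRC over a block of bytes travelling with the data.
payload = bytes(range(40))       # hypothetical 40-byte block
crc = zlib.crc32(payload)
corrupted = bytearray(payload)
corrupted[5] ^= 0x01             # single-bit error in transit
print(zlib.crc32(bytes(corrupted)) == crc)
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is one reason to combine it with a CRC over the whole block.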

The preliminary measurement results shown in Table II indicate an event readout time of ~10 us (100 kHz) for this new architecture. This improves the data transfer rate between the column controllers and the DAQ by about a factor of two when compared with the present Scalable Readout Unit (SRU) (~21 us), and by larger factors when compared with the Time Projection Chamber (TPC) (~33 us) and High Momentum Particle Identification Detector (HMPID) (~100 us) readout electronics, as reported in [13], [14] and [15] respectively.

TABLE II. READOUT RATE COMPARISON WITH OTHER DETECTORS

| Detector    | Estimated Readout Time (us) |
|-------------|-----------------------------|
| (this work) | 10                          |
| SRU [13]    | 21                          |
| TPC [14]    | 33                          |
| HMPID [15]  | 100                         |
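The ratios behind the comparison in Table II are straightforward to compute. The snippet below simply divides each readout time by the 10 us of this work; the dictionary and variable names are illustrative.

```python
# Readout times from Table II, in microseconds.
READOUT_US = {"this work": 10, "SRU": 21, "TPC": 33, "HMPID": 100}

ours = READOUT_US["this work"]
speedup = {
    name: t / ours for name, t in READOUT_US.items() if name != "this work"
}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

This gives factors of about 2.1 for the SRU, 3.3 for the TPC and 10 for the HMPID readout electronics.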

#### V. CONCLUSION

This paper presented the design of a new CPV front-end readout electronics system that attains the ALICE readout rate goal of 50 kHz. The preliminary prototype measurements indicate an estimated event readout rate of 100 kHz, twice the target value. This improvement in event readout rate over the prior CPV, TPC, HMPID and SRU readout electronics is mainly due to the parallel readout and processing of column controllers and the adopted GBTx/SIU transceiver link speeds of around 3.125 to 5 Gb/s between the DAQ and the readout electronics. Additionally, the integrated CRC hard Intellectual Property (IP) FPGA block shall detect and correct errors due to SEU, thus ensuring reliable operation of the newly developed CPV electronics. A further study to be considered is the evaluation of data reliability versus the improvement in readout trigger rates.

Initial prototype cards have been completed, and full production of all electronic cards is planned to be ready in the second half of 2018. Finally, the old 700 nm 5-DiLogic card technology shall be replaced with an ASIC chip, leading to better system performance, throughput and maintainability.

#### REFERENCES

[1] ALICE Collaboration, "Upgrade of Readout & Trigger System," CERN\_LHC-2013-0019.

[2] P. Riedler, "Upgrade of the ALICE Detector," in Technology and Instrumentation in Particle Physics, Chicago, 2011.

[3] J. C. Santiard, "The ALICE HMPID on-detector front-end and readout electronics," in Nuclear Instruments and Methods in Physics Research, Geneva, 2004.

[4] S. Calleja, "HMPID RCB firmware upgrade for Run3 environment," Department of Microelectronics and Nanoelectronics, University of Malta, July 2016.

[5] P. Moreira et al., "The GBT Project," Proc. Topical Workshop on Electronics for Particle Physics, France, pp. 342–346, September 2009.

[6] F. Carena, "DDL, the ALICE data transmission protocol and its evolution from 2 to 6 Gb/s," in Workshop on Electronics for Particle Physics. Aix-En, 2014.

[7] J. C. Santiard et al., "The Gassiplex07-2 integrated front-end analog processor for the HMPID and Dimuon spectrometer of ALICE," Proc. 5th Conference on Electronics for LHC Experiments, Snowmass, CO, USA, September 1999.

[8] C. Seguna et al., "Proposal for a new ALICE CPV-HMPID frontend electronics topology," in 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), July 2017.

[9] H. Witters et al., "DILOGIC-2: A sparse data scan readout processor for the HMPID detector of ALICE," Proc. 6th Workshop on Electronics for LHC Experiments, September 2000.

[10] P. Leitao et al., "Test bench development for the radiation Hard GBTX ASIC," Journal of Instrumentation, vol. 10, 2015.

[11] ALICE Collaboration, "Radiation Dose and Fluence in ALICE after LS2," ALICE, Geneva, 2017.

[12] E. Denes et al., "Radiation Tolerance Qualification Tests of the Final Source Interface Unit for the ALICE Experiment," Proc. Topical Workshop on Electronics for Particle Physics, Valencia, Spain, pp. 438–441, September 2006.

[13] F. Zhang et al., "Point-to-point readout for the ALICE EMCal detector," Nuclear Instruments and Methods in Physics Research, vol. A, pp. 157-162, 2014.

[14] A. Velure et al., "Upgrades of ALICE TPC Front-End Electronics for Long Shutdown 1 and 2," IEEE Transactions on Nuclear Science, vol. 62, pp. 1040-1044, 2015.

[15] ALICE Collaboration, "Performance of the ALICE experiment at the CERN LHC," International Journal of Modern Physics, September 2014.