# ISTITUTO NAZIONALE DI FISICA NUCLEARE

Sezione di Bari

INFN/TC-96/19
28 October 1996

SIS-Pubblicazioni dei Laboratori Nazionali di Frascati

# A CMOS Semi-Custom ASIC for the clustering of bit strings of the Aleph Hadron Calorimeter

F. Loddo<sup>a)</sup>, G. Maggi<sup>b)</sup>, A. Ranieri<sup>a)</sup>, F. Ruggieri<sup>a)</sup>

a) Dipartimento di Fisica and Sezione INFN, Bari, Italy

b) Dipartimento di Fisica dell'Università di Bari and Sezione INFN, Bari, Italy

# Abstract

This note describes an integrated circuit designed for cluster search in bit strings coming from a streamer tube detector [1]. The integrated circuit is the basic component of a Fastbus module used for the digital readout of the Aleph hadron calorimeter at LEP200 [2][3].

## Introduction

The information coming from the more than 220000 digital channels of the Aleph hadron calorimeter and muon chamber system is read out by a Fastbus module called ASTROS [4]. This digital information is organized in shift-register chains up to 256 bits long. The module performs zero suppression on the bit strings coming from the shift-register chains and determines the address and the size of the clusters of fired channels in each chain. The performance of the first implementation of the module was limited by:

- the low data transfer rate between the shift-register chains and the module;
- the single, inadequate memory bank used to store all the data coming from the 24 shift-register chains read out by the same module;
- the need to initialize the data transfer for each module in the Fastbus crate, which involves a high dead time.
The need to remove these limits and the growing requirement to improve the overall performance of the readout electronics led us to design an ASIC in 1 $\mu$m n-well CMOS technology; it was produced by the ES2 (European Silicon Structures) foundry and implements all the functions required for reading one single chain [5].

# 1. ASIC architectural characteristics

The ASIC includes some new features with respect to the present implementation:

- the reordering of the strip positions: when the physical strip address does not correspond to the electronic channel address, it assigns the correct address value to the strips;
- the suppression of the chains which have all the channels set at logical level "one";
- the suppression of the noisy strips;
- the check of the synchronization among the different chains which are read out by the same module;
- the reduction of the readout time, since *only one initialization* is required for all the modules in a crate;
- the implementation of a double event buffer to allow the *derandomization* of the events.

# 2. ASIC design

The chip for the new scanner was designed using the Verilog-XL language to describe its functional blocks and to perform the behavioural simulation of the circuit. The Cadence-EDGE CAD software was then used for schematic entry, placement and routing. The chip was designed with a "semi-custom" approach, since nothing in the logical structure justified the more complex "full-custom" technique, while functionality and testability were fundamental requirements. The CMOS components of the ES2-ECPD10 library in 1 $\mu$m technology were used.
CMOS technology was chosen for the following reasons:

- it allows a very high integration density, suitable for devices with a very large number of components (more than 10<sup>6</sup>);
- CMOS digital circuits have low power consumption, since CMOS devices draw current only during switching while, in the steady state, the current consumption is negligible.

In order to implement some of the required functions, some "megacells" were used: two static RAM (SRAM) memories and two FIFOs. The megacells include a self-diagnostic circuit, called BIST (Built-In Self Test). This additional circuitry increases the device testability, at the cost of a small increase in silicon area and a slight reduction in speed due to some unavoidable multiplexers.

# 3. The main chip components and their functions

The block diagram of the chip (delimited by the dashed line), along with some Fastbus registers of the module, is shown in Figure 1. The main chip components are:

- a reordering memory;
- an acquisition memory;
- a Marker check circuit;
- a noisy chain suppression logic;
- a clustering logic;
- an event buffer made of a double FIFO memory;
- a Test register.

Fig.1: Block diagram of the chip and of its control logic

# 3.1 The reordering memory

Because of assembly constraints [2], the physical positions of some strips in the Aleph hadron calorimeter do not correspond to the electronic addresses of the corresponding channels; the conversion is presently done by software. In the ASIC this reordering is performed in hardware. The chip stores the 256 bits of a string coming from the detector in the acquisition memory (described below). The reordering is achieved by means of a 256x10 bit static RAM called STrip Address Memory (STAM), which is loaded, during the initialization phase, with the physical addresses of the strips and used as a look-up table during the readout phase.
If the reordering option is selected, the i-th bit of the input string is stored in the acquisition memory location whose address is contained in the i-th STAM location, which is pointed to by an external 8+1 bit counter, the Strip Counter (Fig.1). If, on the other hand, the reordering option is disabled, the i-th bit of the input string is stored in the i-th location of the acquisition memory, which is directly addressed by the Strip Counter.

Fig.2: Reordering mechanism

# 3.2 The acquisition memory

The bit string coming from the front-end electronics of the streamer tube apparatus is stored in the *acquisition memory*, called Shift In Memory (SHIM), a 256x1 bit SRAM. During the acquisition phase, it is addressed either by the STAM (reordering option enabled) or by the Strip Counter (reordering option disabled). During the clustering phase, it is addressed by the Clustering Counter. Both counters are common to all the chips in the module. The obvious benefit of this configuration is the possibility of splitting the acquisition into two independent phases: the storing into the SHIM and the clustering into the FIFOs. The clock used to write data into the acquisition memory runs at 1 MHz (shck signal in Fig.3a and 3b), while the clustering phase uses a 10 MHz clock (ckclust). The acquisition speed is in fact constrained by the clock frequency of the front-end electronics while, once the event is stored in the chip, the data processing can run at a higher frequency.

# 3.3 The noisy chain suppression logic

During the bit string acquisition, a special circuit analyzes the content of the acquisition memory. If all the bits in the chain are at logical "one" (this may be caused, for instance, by a breakdown of the front-end electronics), it prevents the useless clustering of that bit string: no data are generated.
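
The acquisition phase described in sections 3.1 to 3.3 can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog description used for the chip: the function name `acquire`, its arguments and the convention of returning `None` for a suppressed chain are assumptions made for illustration, and the STAM contents are assumed to be valid SHIM addresses (0 to 255).

```python
from typing import List, Optional, Sequence

CHAIN_LENGTH = 256  # bits in one shift-register chain (from the text)


def acquire(serial_in: Sequence[int],
            stam: Optional[Sequence[int]] = None) -> Optional[List[int]]:
    """Model one acquisition phase (hypothetical helper, not the chip's RTL).

    serial_in: bits shifted in from the front end, in arrival order.
    stam:      look-up table of physical strip addresses (reordering
               enabled), or None (reordering disabled).
    Returns the SHIM contents, or None when the chain is suppressed
    because every bit is at logical one.
    """
    if len(serial_in) != CHAIN_LENGTH:
        raise ValueError("expected a 256-bit chain")
    shim = [0] * CHAIN_LENGTH
    for strip_counter, bit in enumerate(serial_in):
        # With reordering, the Strip Counter addresses the STAM and the STAM
        # output addresses the SHIM; otherwise the counter addresses the SHIM.
        address = stam[strip_counter] if stam is not None else strip_counter
        shim[address] = 1 if bit else 0
    # Noisy chain suppression: an all-ones chain generates no data.
    if all(shim):
        return None
    return shim


# Usage with a hypothetical mapping that mirrors the strip order.
mirror_stam = list(reversed(range(CHAIN_LENGTH)))
first_strip_hit = [1] + [0] * (CHAIN_LENGTH - 1)
reordered = acquire(first_strip_hit, mirror_stam)
```

At the clock rates quoted above, shifting 256 bits into the SHIM at 1 MHz takes 256 $\mu$s, and scanning them at 10 MHz takes a further 25.6 $\mu$s.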
# 3.4 The clustering logic

This circuit analyzes the bit string and searches for clusters of consecutive '1' bits stored in the acquisition memory; for each cluster, only the size (4 bits) and the address of the last bit (8 bits) are stored in the output buffer, thereby accomplishing a zero suppression. The clustering logic is made of a 4 bit binary counter (the "size counter"), an 8 bit binary counter (the "address counter") and a finite state machine which drives the size counter according to the bit string coming from the SHIM. At the beginning of the clustering phase, the "address counter" starts to run while the circuit analyzes the output of the SHIM. When the first bit at "one" is detected, the "size counter" starts to run, and it stops counting when the next bit at "zero" is read out. At the same time, the circuit provides a write pulse to the FIFO (par. 3.5), where the current state of the two counters is recorded as a 12 bit word. The "size counter" is then reset and ready for a new count. If the cluster size is larger than 15, the cluster is broken into two or more clusters as follows: the first 15 bits make a cluster, the next bits at "1" form another cluster and so on, up to the next "0".

## 3.5 The double buffer

Two 128x12 bit FIFOs are used as event buffer. This makes the acquisition more flexible: the chip can release the acquisition memory as soon as the clustering has been completed, and another event can be stored in the same memory during the readout phase of the previous one. The FIFOs are written by means of write pulses generated by the clustering logic, while they are read according to the Fastbus protocol.

# 3.6 The Marker check circuit

The Marker check is performed at the end of the chain scanning, only if the reordering option is enabled.
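
For reference, the clustering procedure of section 3.4 can be sketched in Python as follows. This is a behavioural illustration only, not the finite state machine implemented on the chip. The names `find_clusters` and `pack_word` are hypothetical, the recorded address is taken to be that of the last '1' bit of the cluster, a cluster still open at the end of the chain is assumed to be written out, and the field order inside the 12 bit word is an assumption, since the text does not specify it.

```python
from typing import Iterable, List, Tuple

MAX_SIZE = 15  # largest value held by the 4 bit size counter


def find_clusters(shim: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (size, address of last bit) for each cluster of 1 bits."""
    clusters: List[Tuple[int, int]] = []
    size = 0            # models the 4 bit "size counter"
    last_address = -1   # address of the most recent bit at one
    for address, bit in enumerate(shim):  # models the "address counter"
        if bit:
            size += 1
            last_address = address
            if size == MAX_SIZE:
                # Size counter full: close this cluster and start a new one.
                clusters.append((size, address))
                size = 0
        elif size:
            # First zero after a run of ones: write pulse to the FIFO.
            clusters.append((size, last_address))
            size = 0
    if size:
        # Assumption: a cluster that reaches the end of the chain is flushed.
        clusters.append((size, last_address))
    return clusters


def pack_word(size: int, address: int) -> int:
    """Pack one cluster into a 12 bit word (size in the upper 4 bits: assumed)."""
    return ((size & 0xF) << 8) | (address & 0xFF)


# Usage: a run of 20 ones is split into a cluster of 15 and a cluster of 5.
example = find_clusters([1] * 20 + [0] * 4)
words = [pack_word(size, address) for size, address in example]
```

Only the non-empty clusters produce words, which is how the zero suppression arises in this model.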
The Marker check circuit continuously verifies the integrity of the transferred data; it is made of an 8 bit shift register containing the marker word (preloaded during the initialization), a 4 bit counter and an 8 bit comparator. During the readout phase, the bit pattern of the marker word is injected at the far end of the shift-register chain, shifted through the whole chain and read out after the last bit of the chain. The circuit compares the bit pattern at the end of the chain scan with the marker word: if the marker is received incorrectly, an error bit is set and read as the 13th bit of the FIFO words.

# 3.7 The Test register

This is an 8 bit circular shift register for testing the chip functionality. It can be loaded with a predefined 8 bit word and used to simulate the acquisition of the bit string coming from the front-end electronics.

# 4. Chip signals specification

The input/output signals of the chip are described in TABLE I:

| Signal | Direction | Description |
|---------------|-----------|-----------------------------------------------------------------|
| CSR031 | Input | Common Reset |
| SHCK | Input | 1 MHz acquisition clock |
| CKCLUST | Input | 10 MHz clustering clock |
| FBOP_ONMEM | Input | Fastbus operation enable pin |
| RW_MEM | Input | memory read/write enable pin |
| DATA_RDWR | Input | internal memory and registers strobe |
| CHIP_SEL | Input | chip select pin |
| TEST_REG_SEL | Input | Test Register select |
| MARK_REG_SEL | Input | Marker register select |
| REORDER | Input | chip operation mode select |
| DATA<12:0> | Bidir. | chip data I/O bus |
| ADD<7:0> | Input | chip address bus |
| TRIGDEL | Input | start acquisition pin |
| END_WRITE | Output | signals the end of the acquisition phase |
| START_CLUST | Input | signal used to indicate the start of the clustering phase |
| SERIALIN | Input | input for the data bit strings |
| FIFOSEL | Input | Fifo select (FIFO1 or FIFO2) |
| EMPTY | Output | Fifo "empty" flag output |
| EMPTY_DEC_OUT | Input | internal Fifo enable |
| READ_FIFO | Input | read enable Fifo |
| EN_STRIP | Input | acquisition enable pin |
| TEST2_CHAN | Input | chip test select |
| WRITE_FIFO | Output | pulse sent out every time a Fifo write operation is in progress |

TABLE I: Chip pins specification

## 5. Simulation results

The simulation results described in the following show how the chip works. For the standard cells, these results were obtained using the physical characteristics given by the foundry, in terms of propagation delay, setup and hold time. For the "megacells" (the RAM and the FIFO), a functional description was built, taking into account the physical characteristics of the particular component implemented in the device and the general information given by the various data sheets. All these simulations were performed by means of the high-level description language Verilog-XL. Every component of the device was described separately, allowing a hierarchical implementation of the entire device, according to a "bottom-up" approach.

# 5.1 Simulation with Reordering and without

In Fig.3a, the acquisition procedure with "reordering" is shown: the reorder input (reord signal in the figure) must be set at logical "1"; in this case, the address of the acquisition memory is given by the STAM (STAout signal in the figure), which contains the real address of the front-end electronics channel. The STAM is sequentially addressed by the strip counter, whose output is add.
The signals MEstam and MEshim are the enable inputs of the STAM and the SHIM respectively. The acquisition starts on the zero-to-one transition of the st\_ac signal, and the input to the SHIM is the shimin signal. In Fig.3b, the same procedure without the reordering is shown: in this case the acquisition memory address is directly given by the strip counter (adshim corresponds to add).

Fig.3b: Start acquisition without reordering

# 6. Placement, Routing and post-layout simulation

After the logical simulation of the chip, the chip layout was defined. In a standard-cell design such as ours, the layout can be obtained by means of automatic placement and routing tools. The Cadence SOLO 2030 CAD tool, within the EDGE environment, was used. The chip layout is shown in Fig.4, where the white areas are occupied by the macrocells, whose layout is not available to the customer.

Fig.4: Layout of the chip

The chip size, including the I/O ring, is 20 mm<sup>2</sup> while the die size is 16 mm<sup>2</sup>. The chip is therefore "core limited". Once the layout was completed, the capacitive loads of the routing lines were extracted and translated into a Verilog format; then, a more realistic simulation of the circuit was performed. The results were consistent with the previous ones.

# 7. Electrical and mechanical characteristics

In TABLE II and TABLE III, the physical characteristics of the chip are shown.

| SYMBOL | PARAMETER | VALUE | UNIT |
|--------|------------------------------|-----------------|------|
| Vdd | DC supply voltage | -0.5 to 7.0 | V |
| Vin | DC input voltage | -1.5 to Vdd+1.5 | V |
| Vout | DC output voltage | -0.5 to Vdd+0.5 | V |
| Iout | DC current drain/pin | 25 | mA |
| Ipo | DC current drain Vdd and Vss | 75 | mA |
| Tstg | Storage temperature | -65 to 150 | °C |
| Tl | Lead soldering temperature | 300 | °C |

TABLE II: Absolute maximum ratings

| SYMBOL | PARAMETER | MIN | MAX | UNIT |
|--------|-----------------------|------|-----|------|
| Vdd | DC supply voltage | 4.75 | 5.5 | V |
| Vin | DC input voltage | 0 | Vdd | V |
| Vout | DC output voltage | 0 | Vdd | V |
| Top | Operating temperature | -25 | 70 | °C |

TABLE III: Recommended operating conditions

# 7.1 Logic levels and loading

All the signal pins of the chip are TTL compatible. TTL input buffers with an input current between -10 and 10 $\mu$A were used. For the two system clock inputs, which have to drive a large number of lines, non-inverting Schmitt-trigger input buffers, which allow fast level transitions, were used. The output pads are driven by "low slew rate" output buffers with 4 mA drive capability, having Vol<sub>max</sub> = 0.5 V and Voh<sub>min</sub> = 4.0 V. This type of output buffer provides some noise reduction, since the noise is proportional to the slew rate.

# 7.2 Power supply

The chip runs on a standard 5 V ± 5% supply. The power is distributed to the device by two independent power rails: one for the I/O buffers, the other for the core of the chip. The current consumption of the output power rail may in fact fluctuate strongly when output drivers change level on highly loaded pins. For this reason the output power pins were well decoupled, taking care to prevent ground bounce on the output power rail from propagating into the core power rail. The power consumption of the chip depends strongly on the bus activity: in a "typical" situation, running with a system clock not greater than 10 MHz and with a capacitive load per output pad of $C_L$ = 40 pF, the total dynamic dissipation per output pad is about 40 mW and the total dynamic dissipation per chip is about 210 mW.

# 7.3 Package

The chip is packaged in a 68-pin plastic chip carrier with 50 mil pitch.
It can be mounted in a plastic socket or surface-mounted and soldered on the board. The pin layout of the chip is shown in TABLE IV.

| Pin | Signal Name | Dir. | Pin | Signal Name | Dir. |
|-----|-------------|------|-----|---------------|------|
| 1 | BST_RES_F1 | OUT | 37 | GND | OUT |
| 2 | RW_MEM | IN | 38 | VDD | IN |
| 3 | FBOP_ONMEM | IN | 39 | NTA14 | IN |
| 4 | FIFOSEL | IN | 40 | NTA8 | IN |
| 5 | REORDER | IN | 41 | GND | IN |
| 6 | GND | IN | 45 | PWR | IN |
| 7 | VDD | IN | 46 | DATA0 | IO |
| 11 | CKCLUST | IN | 47 | DATA1 | IO |
| 12 | EMPTY_D_OUT | IN | 48 | DATA2 | IO |
| 13 | SHCK | IN | 49 | DATA3 | IO |
| 14 | READ_FIFO | IN | 50 | DATA4 | IO |
| 15 | TEST_2CHAN | IN | 51 | DATA5 | IO |
| 16 | DATA_RDWR | IN | 52 | DATA6 | IO |
| 17 | EN_STRIP | IN | 53 | DATA7 | IO |
| 18 | SERIALIN | IN | 54 | DATA8 | IO |
| 19 | START_CLUST | IN | 55 | DATA9 | IO |
| 20 | END_WRITE | IN | 56 | DATA10 | IO |
| 21 | TRIGDEL | IN | 57 | DATA11 | IO |
| 22 | CHIP_SEL | IN | 58 | DATA12 | IO |
| 23 | BST_TEST | IN | 62 | GND | IN |
| 24 | BST_CLK | IN | 63 | VDD | IN |
| 28 | CSR031 | IN | 64 | WRITE_FIFO | OUT |
| 29 | ADD7 | IN | 65 | EMPTY | OUT |
| 30 | ADD6 | IN | 66 | BST_RES_RAM0 | OUT |
| 31 | ADD5 | IN | 67 | BST_RES_RAM1 | OUT |
| 32 | ADD4 | IN | 68 | BST_RES_FIFO0 | OUT |
| 33 | ADD3 | IN | | | |
| 34 | ADD2 | IN | | | |
| 35 | ADD1 | IN | | | |
| 36 | ADD0 | IN | | | |

TABLE IV: Pin layout

# 8. Usage of ASICs

The ASICs were produced by the ES2 foundry. They have been placed on the new ASTROS modules, which are installed on the Aleph Hadron Calorimeter apparatus now running at CERN. These new ASICs showed, in particular, how the use of the internal double buffer (the two FIFOs) can lead to very good performance in terms of data-taking time. In fact, while one event is read by the processor controlling the acquisition, another event can be scanned and clustered.
Using an acquisition clock at 1 MHz and a clustering clock at 10 MHz, the chip takes about 300 $\mu$s for the acquisition and the clustering of a 256-bit-long string.

## 9. Conclusions

A special integrated circuit for the clustering of bit strings was built and mounted on the new ASTROS module. This module is used in the hadron calorimeter acquisition system of the Aleph experiment at LEP. The chip was extensively described, at both functional and behavioural level, by means of the Verilog-XL simulator, which allowed us to obtain a reliable device fulfilling all our design requirements.

# Acknowledgement

We would like to thank Mr. Michele Papagni for his help in mounting and testing the ASTROS module prototype used for testing the ASIC.

# 11. Figure captions

- Fig.1 Block diagram of the chip and of its control logic
- Fig.2 Reordering mechanism
- Fig.3a Time diagram: start acquisition in "reordering" mode
- Fig.3b Time diagram: start acquisition in "non reordering" mode
- Fig.4 Layout of the chip

# 12. References

- [1] M.G. Catanesi et al.: "Performance of a limited streamer tube hadron calorimeter." Nucl. Instr. Meth., A247 (1986) 438-444.
- [2] The ALEPH Collaboration: "ALEPH: a detector for electron-positron annihilations at LEP." Nucl. Instr. Meth., A294 (1990) 121-178.
- [3] The ALEPH Collaboration: "Performance of the ALEPH detector at LEP." CERN-PPE/94-170, 1 November 1994.
- [4] M.G. Catanesi et al.: "The ALEPH hadron calorimeter strip readout scanner." Nucl. Instr. Meth., A297 (1990) 390-395.
- [5] F. Loddo: PhD thesis, "Progettazione di un ASIC per l'esperimento ALEPH." Politecnico di Bari, A.A. 1992-1993.