CALICE DAQ Developer's Day
IPNL, 04/11/2011
Present: V. Boudry, J. Prast, R. Cornat, L. Mirabito, N. Roche, H. Mattez, L. Caponetto, I. Laktineh, G. Vouters, C. Adloff, G. Baulieu, R. Gaglione, G. Grenier
The meeting took place at IPNL from 10:00–16:00.
The goals of the meeting were:
To consolidate the experience of the past TB, in order to pave the way to improvements and to list actions to be taken immediately
To discuss various ways of improving the current system
To prepare the CALICE mini-WS of the following week.
A reminder of the future milestones was given: ECAL test bench readout ASAP, AHCAL readout started by December, TB for SDHCAL m³ & ECAL slabs in April and Summer/Fall, AIDA specification for a common DAQ by end of September, and DBD by end of 2012.
The morning was dedicated to a comprehensive overview of the difficulties encountered in all (?) aspects of the acquisition during the TB, more or less in the following order:
LDA reliability
Agreed that the LDA is hopeless for large systems and should be dropped ASAP.
It could still be used for small table-top systems with few channels, typically the ECAL & AHCAL test benches. Therefore the improvement of libLDA and of the DIF FW should be continued.
Configuration of HW:
DIF FW: some actions need to be taken (by G. Vouters):
Improvement of the stability of the startup state (the DIF can be in different states at startup, preventing links from being established)
Improvement of the DIF reset (regression since FW40 due to the different Altera/Xilinx logics)
Make the ASIC Reset signal length configurable via a register
Power Pulsing implementation (its apparent fault at the PS might have been caused by other misbehaviour).
The unsuccessful readout of the μMegas @ SPS is not yet fully understood: empty data (start & end of buffers seen) were sent by the μM plane but also by an RPC one. It could be due to a lost (or constantly asserted) trigger (a DCC / LDA channel stuck in a strange mode?).
The case needs to be further investigated using single events on an external trigger.
Diagnostics of faults
A more comprehensive diagnosis system needs to be set up, now that the DAQ can take stable runs.
Ability to run post-mortem diagnostics tools (after a crash, the locking of a DIF, or the loss of a link): a hierarchical check of the status of links & elements (see the sketch at the end of this section).
Some tools exist but are not practical (they require an Ethernet cable change, for example).
They should be tried in concrete situations and integrated into the main SW for standard use.
An elog centralising the main causes of trouble should be used to allow for statistical analysis.
Additionally, critical failures could trigger a mail/message sent directly to the experts.
A clearer visualisation of the status of the DAQ is required for use by an experienced shift crew: one exists already, but it wasn't really used during the last TB.
A calibration system is required for the μMegas running.
It could also be used for unambiguous diagnostics of readout failures.
CTEST of the ASICs, or fake data from the DIFs?
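As an illustration of the hierarchical check mentioned above, a minimal sketch follows (Python, matching the existing Python tooling). It is illustrative only: read_status_register(), the LINK_UP bit and the ccc/lda.dccs/dcc.difs layout are hypothetical placeholders, not the actual CALICE HW-access API.

    LINK_UP = 0x1  # hypothetical "link established" status bit

    def read_status_register(element):
        """Placeholder: bind this to the real low-level register read."""
        raise NotImplementedError

    def check_chain(ccc, ldas):
        """Walk the CCC -> LDA -> DCC -> DIF hierarchy top-down and
        collect every element whose link status bit is not set."""
        faults = []
        if not read_status_register(ccc) & LINK_UP:
            faults.append(("CCC", ccc))
        for lda in ldas:
            if not read_status_register(lda) & LINK_UP:
                faults.append(("LDA", lda))
                continue                    # children are unreachable anyway
            for dcc in lda.dccs:
                if not read_status_register(dcc) & LINK_UP:
                    faults.append(("DCC", dcc))
                    continue
                for dif in dcc.difs:
                    if not read_status_register(dif) & LINK_UP:
                        faults.append(("DIF", dif))
        return faults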
Stability of running and recovery procedure
The DIF FW should be moved ASAP to DEV3 of the DIF FW framework, which includes a counter of DIF packets; this would make it possible to handle the loss of packets between the LDA and the PC.
Additionally, a counter of sent packets in the DIF global trailer would allow a check of the data consistency (see the sketch at the end of this section).
In case some DIF stops sending data:
The GTC counter of the DIF ensures a global synchronisation
Should a shift in the DTC trigger a reset of all counters?
Incomplete events could, on demand,
be eliminated after a timeout, or
have their data replaced by a FAKE data type
(the same for some missing ASICs).
The physics analysis could then be performed by cutting on the number of FAKE data.
An effective RESET procedure should be run; this requires low-level access to each element of the chain and the readout of the (bidirectional) link status registers.
The hypothesis of a charging-up of the AC links should be investigated in such cases.
How?
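A minimal sketch of the two ideas above (trailer counter check and FAKE padding of incomplete events); the data structures, the counter access and the timeout value are hypothetical, not the actual DIF data format:

    import time

    FAKE = {"type": "FAKE"}    # placeholder block replacing missing data
    TIMEOUT_S = 1.0            # hypothetical completion timeout

    def check_packet_count(dif_id, n_received, trailer_count):
        """Compare the packets received from one DIF with the counter
        written in its global trailer (the DEV3 feature)."""
        if n_received != trailer_count:
            print(f"DIF {dif_id}: {trailer_count - n_received} packet(s) lost")

    def build_event(gtc, fragments, expected_difs):
        """fragments is assumed to be filled concurrently by the readout;
        wait up to TIMEOUT_S for all DIFs of trigger 'gtc', then pad the
        missing ones with FAKE blocks, so the physics analysis can cut
        on the number of FAKE blocks per event."""
        deadline = time.time() + TIMEOUT_S
        while time.time() < deadline and set(fragments) != set(expected_difs):
            time.sleep(0.01)                  # poll for late fragments
        event = {"gtc": gtc, "data": {}, "n_fake": 0}
        for dif in expected_difs:
            if dif in fragments:
                event["data"][dif] = fragments[dif]
            else:
                event["data"][dif] = FAKE     # same idea for missing ASICs
                event["n_fake"] += 1
        return event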
Configuration Management
To improve the speed of configuration changes:
A mask (ASIC/DIF/DCC) procedure is (now / ready to be?) implemented (see the sketch at the end of this section).
The complete ASIC configuration will be produced once;
the topology of the readout should be modified separately.
The slowness of the configuration upload is due to a large ping time between CERN & CC IN2P3. It could be bypassed by a proxy at IPNL.
A fast remote / local ConfigDB procedure has been implemented & tested.
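A minimal sketch of the mask idea, assuming a simple keyed configuration store (all names and structures are hypothetical, not the real ConfigDB schema):

    # The full ASIC configuration is produced once...
    full_config = {
        ("dcc0", "dif3", "asic12"): {"gain": 1, "threshold": 230},
        ("dcc0", "dif3", "asic13"): {"gain": 1, "threshold": 228},
        ("dcc1", "dif7", "asic02"): {"gain": 1, "threshold": 235},
    }

    # ...while the readout topology is edited separately, per run.
    mask = {
        "dcc": {"dcc0"},
        "dif": {"dif3"},
        "asic": {"asic12", "asic13"},
    }

    def masked_config(cfg, mask):
        """Keep only the entries whose DCC, DIF and ASIC are all enabled."""
        return {key: val for key, val in cfg.items()
                if key[0] in mask["dcc"]
                and key[1] in mask["dif"]
                and key[2] in mask["asic"]}

    # Only the masked subset is uploaded, so changing the topology does
    # not require regenerating the complete ASIC configuration.
    print(masked_config(full_config, mask))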
Performances
No CPU limitation was found during the TB; some data packets were lost while spying on the pcap buffers.
The present use of raw Ethernet (the baseline of CDAQ2) over switches was pointed out as risky, as it provides no protection against collisions or packet loss (see the sketch at the end of this section).
They can also put a strain on the data input/output, as they use the same kernel path.
The mainstream approach is to use the ODR on a point-to-point connection.
ODR use should be re-evaluated/tested.
The use of the IP protocol for the future version (GigaDCC) should be evaluated.
This seems to be the path envisaged for LHC upgrades (L. Mirabito)
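To make the risk concrete, here is a minimal sketch of a raw-Ethernet receiver that must detect losses by itself (Linux AF_PACKET, root privileges required); the EtherType, the interface name and the position of the sequence counter in the payload are hypothetical:

    import socket

    ETH_P_EXP = 0x88B5   # "local experimental" EtherType (assumed here)
    IFACE = "eth0"       # hypothetical capture interface

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_EXP))
    sock.bind((IFACE, 0))

    last_seq = {}        # per-source (per-DIF/DCC) sequence counters
    while True:
        frame = sock.recv(2048)
        src = frame[6:12]   # source MAC address
        seq = frame[14]     # hypothetical 1-byte counter after the header
        prev = last_seq.get(src)
        if prev is not None and seq != (prev + 1) % 256:
            # nothing below IP retransmits: the DAQ itself must notice
            print(f"lost frame(s) from {src.hex()}: {prev} -> {seq}")
        last_seq[src] = seq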
Usability
A Python GUI already exists for XDAQ, but it wasn't used during the last TB.
A GUI developed for the ECAL needs could be used as a ConfigDB GUI and as an expert debugging tool in case of crash/difficulty.
Roughly similar tools were developed in XDAQ for the early tests.
The development of a GUI for the Python DAQ ancillary commands is not necessary.
Mechanics
The development of a fixed support platform for the SDHCAL setup was raised; it would avoid the critical cabling and grounding steps of the installation. The CALICE AHCAL table could be of use there. Alternatively, a new support should be developed.
Ongoing & future developments
Rewriting of the low-level SW (GASOLine)
A draft rewrite of the HW-interface SW as a server was presented; described as more generic, in its present form it would re-implement/replace some functionalities already developed in XDAQ (LDAsupervisor).
Benches & qualification tests
Some help / documentation was requested to restart the XDAQ setup at LLR.
A minimal number of SDHCAL planes (~3), after the installation at IPNL, was seen as necessary to test the SW developments (configuration, monitoring).
The following points were foreseen but not discussed in depth.
Machine integration
Organisation
The idea (to be discussed next week at the CALICE DAQ meeting) is to have 2 repositories & forges (SVN + Tickets + Wiki):
one at CC IN2P3, reserved for developers, to keep track of internal development & tests + all code;
one at CERN (Savannah) for “external” users (FCAL, TB users) to keep track of global requests.
AIDA
DAQv3
Presented by G. Vouters
The idea is to pass all the data (Config & Readout) over USB while keeping all the fast signals (Clock, Trigger, Busy) on the DAQ2 HW (CCC, LDA, DCC).
The direct CCC↔DIF connection worked with the m² without any problem for ≥ 20 days.
For m³:
2 levels of DCCs are needed (1 level handles ≤ 24 m²)
To be done:
Data reception by USB (should be ✔; see the sketch at the end of this section)
New commands on RS232 (which ones?)
DCC FW ➔ a simple switch for busy/ramfull & the passing of RS232 commands (without translation as fast commands)
DIF FW for the RPCs
Spill signal on the CCC for Power Pulsing
4–5 additional DCCs are required.
Add a DCC mezzanine? ➔ for signal passing? DCC as CCC?
Have a SuperDCC with some sync signal passing via the VME bus.
Use FNC connectors for better reliability
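A minimal sketch of the DAQv3 data path, assuming the pyserial package; the device path, the baud rate and the command bytes are hypothetical placeholders for the real DIF commands:

    import serial

    def process(chunk):
        """Placeholder: hand the raw bytes to the event builder."""
        pass

    port = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1.0)

    port.write(b"\x10")              # hypothetical "start readout" command
    while True:
        chunk = port.read(4096)      # bulk data reception over USB
        if not chunk:                # read timed out: spill over / link idle
            break
        process(chunk)
    port.write(b"\x11")              # hypothetical "stop readout" command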