CALICE DAQ Developer's Day
Council room, IPNL
"Internal" post-TB review and preparation of the next steps:
- Spring test beam: ECAL & SDHCAL
- General CALICE DAQ meeting of 9 November
- Welcoming Mainz on the CCC & improvements
- DAQv2.5 or v3
- CALICE DAQ in AIDA
- what we learned from the beam tests
- the blocking points and what remains to be done
- ideas & plans for improvement
- think about manpower
CALICE DAQ Developer's Day
IPNL, 04/11/2011
Present: V. Boudry, J. Prast, R. Cornat, L. Mirabito, N. Roche, H. Mattez, L. Caponetto, I. Laktineh, G. Vouters, C. Adloff, G. Baulieu, R. Gaglione, G. Grenier
The meeting took place at IPNL from 10:00–16:00.
1 Goal
The goals of the meeting were:
-
To consolidate the experience of the past TB, in order to pave the way to improvements and list actions to be taken immediately
-
To discuss various ways of improving the current system
-
To prepare the CALICE mini-WS of the following week.
A reminder of the future milestones was given: ECAL test bench readout ASAP, AHCAL readout started by December, TB for SDHCAL m³ & ECAL slabs in April and Summer/Fall, AIDA specification for a common DAQ by end of September and DBD by end of 2012.
2 Discussion
The morning was dedicated to a comprehensive overview of the difficulties encountered in all (?) aspects of the acquisition during the TB, more or less in the following order:
-
LDA reliability
-
Agreed that the LDA is hopeless for large systems and should be dropped ASAP.
-
Could still be used for undemanding table-top systems with few channels, typically the ECAL & AHCAL test benches. Therefore the improvement of libLDA and of the DIF FW should continue.
-
Configuration of HW:
-
DIF FW: some actions need to be taken (by G. Vouters)
-
Improvement of the stability of the startup state (the DIF can be in different states at startup, preventing links from being established)
-
Improvement of the DIF reset (regression since FW40 due to differences between Altera and Xilinx logic)
-
Make the ASIC Reset signal length configurable via a register
-
Power Pulsing implementation (its apparent fault at PS might have been caused by other misbehaviour).
-
The unsuccessful readout of μMegas at the SPS is not yet fully understood: empty data (only start & end of buffers seen) were sent by the μM plane but also by an RPC one. This could be due to a loss of trigger, or to a constant trigger (a DCC / LDA channel stuck in a strange mode?).
The case needs to be investigated further using single events on external trigger.
-
Diagnostics of faults
-
A more comprehensive diagnosis system needs to be set up, now that the DAQ can take stable runs.
-
Ability to run post-mortem diagnostic tools (after a crash, the locking of a DIF, or the loss of a link): hierarchical check of the status of links & elements.
-
Some tools exist but are not practical (they require an Ethernet cable change, for example).
They should be tried out in concrete situations and integrated in the main SW for standard use.
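A hierarchical check of the kind mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the tree layout (LDA, DCC, DIF), the `link_ok` flag and the function name are hypothetical, and the real link status registers are not described in these minutes.

```python
def find_faults(node, path=()):
    """Walk the readout tree top-down and list the highest faulty elements.

    `node` is a hypothetical dict: {"name": str, "link_ok": bool, "children": [...]}.
    """
    here = path + (node["name"],)
    if not node["link_ok"]:
        # Nothing below a dead link can be queried, so report it and stop here.
        return ["/".join(here)]
    faults = []
    for child in node.get("children", []):
        faults.extend(find_faults(child, here))
    return faults


# Hypothetical topology used only to exercise the function.
example_tree = {
    "name": "LDA", "link_ok": True, "children": [
        {"name": "DCC1", "link_ok": True, "children": [
            {"name": "DIF1", "link_ok": True},
            {"name": "DIF2", "link_ok": False},
        ]},
        {"name": "DCC2", "link_ok": False, "children": [
            {"name": "DIF3", "link_ok": True},
        ]},
    ],
}
faulty = find_faults(example_tree)
```

Stopping at the first dead link on each branch keeps the report short: elements behind a lost link are not listed individually, since their state cannot be read anyway.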
-
An elog centralising the main causes of trouble should be used to allow for statistical analysis.
-
Additionally, critical failures could trigger a mail/message sent directly to the experts.
-
A clearer visualisation of the status of the DAQ is required for use by the experienced shift crew: one exists already, but wasn't really used during the last TB.
-
A calibration system is required for running the μMegas
-
Could also be used for unambiguous diagnostics of readout failures.
-
CTEST of ASICs or fake data from DIFs ?
-
Stability of running and recovery procedure
-
The DIF FW should be moved ASAP to DEV3 of the DIF FW framework, which includes a counter of DIF packets; this would make it possible to handle the loss of packets between the LDA and the PC.
-
Additionally, a counter of sent packets in the DIF global trailer could allow a check of data consistency.
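The two counter-based checks above could look roughly as follows. This is a minimal sketch, not the DIF FW or DAQ SW implementation: the 16-bit counter width, the function names, and the assumption that packets arrive in order and without duplicates are all hypothetical.

```python
def count_lost_packets(counters, modulus=1 << 16):
    """Count packets missing from a sequence of received DIF packet counters.

    Assumes (hypothetically) a counter that increments by one per packet,
    wraps at `modulus`, and packets that arrive in order without duplicates.
    """
    lost = 0
    previous = None
    for current in counters:
        if previous is not None:
            # Number of counter values skipped between two received packets,
            # computed modulo the counter width so that wrap-around is handled.
            lost += (current - previous - 1) % modulus
        previous = current
    return lost


def trailer_is_consistent(n_received, n_sent_in_trailer):
    """Compare the packets actually received with the count announced in the trailer."""
    return n_received == n_sent_in_trailer
```

For example, `count_lost_packets([0, 1, 4, 5])` reports two missing packets (counters 2 and 3), and a wrap from 65535 to 0 is not counted as a loss.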
-
In case some DIF stops sending data:
-
The GTC counter of the DIF ensures a global sync
-
Should a shift in DTC trigger a reset of all counters ?
-
The incomplete events could be, on demand,
-
Eliminated after a timeout, or
-
Their data replaced by a FAKE data type
-
(the same for some missing ASICs)
-
The physics analysis could be performed by cutting on the number of FAKE data.
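The handling of incomplete events described above can be sketched as below. All names (`finalize_event`, the `policy` values, the `FAKE` sentinel) and the per-DIF event layout are assumptions made for illustration; the minutes do not specify a data format.

```python
FAKE = object()  # hypothetical sentinel standing in for the FAKE data type


def finalize_event(fragments, expected_difs, age_s, timeout_s, policy="fake"):
    """Decide what to do with an event given the DIF fragments received so far.

    fragments: {dif_id: payload}; expected_difs: iterable of DIF ids.
    Returns (status, event) where status is one of
    "complete", "pending", "dropped", "padded".
    """
    missing = [dif for dif in expected_difs if dif not in fragments]
    if not missing:
        return "complete", dict(fragments)
    if age_s < timeout_s:
        # Still within the timeout: keep waiting for the missing DIFs.
        return "pending", None
    if policy == "drop":
        # Option 1: eliminate the incomplete event after the timeout.
        return "dropped", None
    # Option 2: replace the missing data by FAKE entries.
    padded = dict(fragments)
    for dif in missing:
        padded[dif] = FAKE
    return "padded", padded


def n_fake(event):
    """Number of FAKE entries, usable as a cut variable in the analysis."""
    return sum(1 for payload in event.values() if payload is FAKE)
```

An analysis could then keep only events with, say, `n_fake(event) == 0`, or tolerate a small number of FAKE entries; the same scheme would extend to missing ASICs by keying the fragments on (DIF, ASIC) pairs.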
-
An effective RESET procedure should be run; this requires low-level access to each element of the chain and the readout of the link status registers (bidirectional).
-
The hypothesis of the charging of the AC links should be investigated in such cases
-
How ?
-
Configuration Management
-
To improve the speed of the configuration changes
-
A mask (ASIC/DIF/DCC) procedure is (now / ready to be ?) implemented.
-
The complete ASIC configuration will be produced once
-
The topology of the readout should be modified separately
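One way to picture the separation between the complete ASIC configuration and the mask is sketched below. The key layout `(dcc, dif, asic)`, the mask representation and the function name are hypothetical and do not describe the actual ConfigDB schema.

```python
def apply_mask(full_config, masked):
    """Return the part of the configuration that is not masked.

    full_config: {(dcc, dif, asic): settings}, produced once.
    masked: set of tuples (dcc,), (dcc, dif) or (dcc, dif, asic);
            masking a DCC or a DIF masks everything below it.
    """
    def is_masked(key):
        # A key is masked if any of its prefixes appears in the mask set.
        return any(key[:n] in masked for n in range(1, len(key) + 1))

    return {key: cfg for key, cfg in full_config.items() if not is_masked(key)}


# Hypothetical configuration used only to exercise the function.
full = {(1, 1, 1): "a", (1, 1, 2): "b", (1, 2, 1): "c", (2, 1, 1): "d"}
active = apply_mask(full, {(2,), (1, 1, 2)})
```

With this split, changing the mask does not require regenerating or re-uploading the complete ASIC configuration, which is the speed-up the items above aim at.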
-
The slowness of the configuration upload is due to the long ping response time between CERN & CC. It could be bypassed by a proxy at IPNL.
-
A fast remote / local ConfigDB procedure has been implemented & tested.
-
Performances
-
No CPU limitation was found during the TB; some data packets were lost while spying on the pcap buffers.
-
The present use of raw Ethernet (baseline of CDAQ2) over switches was pointed out as risky, as it provides no protection against collisions or packet loss.
-
They can also put a strain on data input/output, as they use the same part of the kernel.
-
The mainstream approach is to use the ODR on a point-to-point connection.
-
ODR use should be re-evaluated/tested.
-
The use of the IP protocol for the future version (GigaDCC) should be evaluated.
This seems to be the path envisaged for LHC upgrades (L. Mirabito)
-
Usability
-
A Python GUI already exists for the XDAQ, but wasn't used during the last TB.
-
A GUI developed for the ECAL needs could be used as a ConfigDB GUI and as an expert debugging tool in case of crash/difficulty.
-
Roughly similar tools were developed in XDAQ for early tests.
-
The development of a GUI for the Python DAQ ancillary commands is not necessary
-
Mechanics
-
The development of a fixed platform for the SDHCAL setup was raised; it would avoid the critical cabling and grounding steps of the installation. The CALICE AHCAL table could be of use there. Alternatively, a new support should be developed.
-
Ongoing & future developments
-
Rewriting of low level SW (GASOLine)
-
A draft of a rewrite of the interface SW to the HW as a server was presented; described as more generic, it would in its present form re-implement/replace some functionality already developed in XDAQ (LDAsupervisor).
-
Benches & qualification tests
-
Some help / documentation was requested to restart the XDAQ setup at LLR.
-
A minimal number of planes (~3) of the SDHCAL, after its installation at IPNL, was seen as necessary to test the SW developments (configuration, monitoring).
-
The following points were foreseen but not discussed in depth.
-
Machine integration
-
Organisation
-
The idea (to be discussed next week at the CALICE DAQ meeting) is to have 2 repositories & forges (SVN + Tickets + Wiki):
-
one at CC IN2P3 reserved for developers to keep track of internal dev & tests + all codes
-
one at CERN (Savannah) for “external” use (FCAL, TB users) to keep track of global demands.
-
AIDA
-
DAQv3
3 Proposals for more secure data taking
3.1 USB readout using DAQ2 Clock & Control HW
Presented by G. Vouters
-
The idea is to pass all the data (Config & Readout) via USB while keeping all the fast signals (Clock, Trigger, Busy) on the DAQ2 HW (CCC, LDA, DCC).
-
The direct connection CCC↔DIF worked with the m² without any problem for ≥ 20 days.
-
For m³:
-
2 levels of DCC needed (1 level ≤ 24 m²)
-
To be done : Data reception by USB (should be ✔); New commands on RS232 (which one ?); DCC FW ➔ simple switch for busy ramfull & passing of RS232 commands (without translation as fast commands); DIF FW for RPCs; Spill signal on CCC for Power Pulsing.
-
4–5 additional DCC required.
-
Add a DCC mezzanine ? ➔ for signal passing ? DCC as CCC ?
3.2 «IPNL proposal»
-
Have a SuperDCC with some sync signal passing via the VME bus.
-
Use FNC connectors for better reliability