DIRAC Project meeting

Europe/Paris
Vanessa Hamar (CC-IN2P3 / CNRS)
Description

Vanessa HAMAR is inviting you to a scheduled Zoom meeting.

Topic: DIRAC@IN2P3
Time: Apr 29, 2025 02:00 PM Paris
Join Zoom Meeting
https://cnrs.zoom.us/j/95073992421?pwd=NQcBxrgVPvncBzbr06E0gLcNGYCdNI.1

Meeting ID: 950 7399 2421
Passcode: 9VDfwt

---

One tap mobile
+33186995831,,95073992421#,,,,*511661# France
+33170372246,,95073992421#,,,,*511661# France

---

Dial by your location
• +33 1 8699 5831 France
• +33 1 7037 2246 France
• +33 1 7037 9729 France
• +33 1 7095 0103 France
• +33 1 7095 0350 France

Meeting ID: 950 7399 2421
Passcode: 511661

Find your local number: https://cnrs.zoom.us/u/aOttp7Yct

---

Join by SIP
• 95073992421@zoomcrc.com

---

Join by H.323
• 144.195.19.161 (US West)
• 206.247.11.121 (US East)
• 159.124.15.191 (Amsterdam Netherlands)
• 159.124.47.249 (Germany)

Meeting ID: 950 7399 2421
Passcode: 511661

--------

https://etherpad.in2p3.fr/p/DIRAC%40IN2P3-29042025

 

Participants
  • Andrei Tsaregorodtsev
  • Axel BONNET
  • Bertrand Rigaud
  • Luisa Arrabito
  • Natthan Pigoux
  • Sorina POP
  • Vanessa Hamar

Minutes of the DIRAC@IN2P3 meeting

Tuesday 29 April 2025

 

 

Attendees: 

Andrei, Bertrand, Luisa, Axel, Vanessa

 

Agenda:

  1. Project Status
  2. Projects
  3. DiracX
  4. Publications / Conferences
  5. Consortium news
  6. AOB

 

1. Project Status

Previous meeting: https://indico.in2p3.fr/event/35438/ 

 

·      The DIRAC9/DiracX 0.1 certification:

 

o   The last certification hackathon took place on 20 March, with rather positive results. 

o   LHCb decided to migrate to DIRAC9/DiracX during the period without data taking. 

 

 

·      Results from the BiLD meeting:

 

o   LHCb migrated to DIRAC v9 and DiracX, along with many other updates: 

  1. We took everything down.
  2. MySQL updates (optional updates that we took the opportunity to make, since the whole system was down anyway):
    1. Updated MySQL to 8.4
    2. Updated ROW_FORMAT to “Dynamic” (many tables had been created when the default was “Compact”)
    3. Updated the character set to utf8mb4
    4. Optimized (defragmented) the tables
  3. Updated the passwords, adding special characters (which were NOT handled correctly; fixed by a PR, also backported to v8)
  4. Resized a few machines
  5. Deployed lhcbdiracx and lhcbdiracx-web
  6. Followed the update plan: https://codimd.web.cern.ch/dnfwITCRRTSvhopGDjlHSA?both
  7. Effectively, more than a full week of downtime
  8. The restart was NOT smooth: many hotfixes and quickly merged PRs, both for DIRAC and DiracX (and for LHCbDIRAC and lhcbdiracx)
  9. A token is added to the proxy, by calling diracx to get it
  10. The SandboxStore forwards to DiracX
  11. Turned the WorkloadManagement/JobStateUpdate route to diracx on and off several times, using the never-tested-before JobStateUpdate legacy adapter
  12. 10 lhcbdiracx pods running for now (scaled by hand)
  13. Issues (probably forgetting some!):
    1. The MySQL update scripts were partly incomplete; they should now be alright
    2. The AccountingDB (MySQL) optional updates could not complete (a few tables are too big)
    3. The diracx client extension code was reworked (and basically tested in production)
    4. All user jobs that were still in the system (or that were added at the initial restart) failed until we put in this PR
    5. diracx scalability issues (should all be solved by now):
      1. The SandboxMetadataDB access from diracx was VERY slow because of a non-optimized query; fixed
      2. This query is specific to diracx, in order to verify that a user can actually get the sandbox
      3. We also cleaned quite a few inconsistencies out of the DB
      4. AuthDB was missing indices
      5. Too many (refresh?) tokens were requested (DIRAC fix)
      6. The DIRAC Framework/ProxyManager was under some load; we added a few instances, which was reflected on the diracx pods (possibly a related issue, and maybe a PR)
    6. We are now running with the DIRAC JobStateUpdate services (not the diracx ones)
    7. We realized that pilots could not, e.g., update the job status; an issue was opened, and a PR was opened and closed (??)
      1. For the moment, we gave the pilot the JobAdministrator property
    8. Too many DNS requests from the DIRAC SandboxStore machine to the lhcbdiracx pods; not fully resolved, and it is not clear who is at fault
    9. We discovered that pilots can no longer use host certificates. The LHCb HLT farm was relying on that, so it now first downloads the pilot proxy (with a token inside)
    10. The diracx /api/jobs/status route sometimes (?) stores local time instead of UTC; fixed with PR ?
    11. Decent monitoring quickly became important. The OpenShift monitoring is alright-ish, but OTEL (OpenTelemetry) is needed, so a PR was created
  14. Status:
    1. Went through a few releases; now running alpha versions plus hotfixes
    2. We are now running “almost everything”. Went up to 100k running jobs yesterday
    3. More issues are still coming up one by one, but the system is now “running”
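The optional MySQL updates listed above (ROW_FORMAT, character set, defragmentation) correspond to standard MySQL statements. The sketch below only generates them for illustration. It assumes hypothetical table names and is not the DIRAC update script used by LHCb, whose content is not reproduced in these minutes. Each of these statements rebuilds the table, which may help explain why the AccountingDB updates could not complete on its biggest tables.

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def maintenance_statements(tables: Iterable[str], charset: str = "utf8mb4") -> List[str]:
    """Build the per-table statements for the optional MySQL updates.

    Illustrative helper (hypothetical name): the statements are only
    generated as strings, never executed against a database.
    """
    statements: List[str] = []
    for table in tables:
        t = quote_identifier(table)
        # Tables created when "Compact" was the default are switched to "Dynamic"
        statements.append(f"ALTER TABLE {t} ROW_FORMAT=DYNAMIC;")
        # Convert the table default and its existing text columns to utf8mb4
        statements.append(f"ALTER TABLE {t} CONVERT TO CHARACTER SET {charset};")
        # Defragment; for InnoDB, MySQL performs this as a table recreate + analyze
        statements.append(f"OPTIMIZE TABLE {t};")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for illustration only
    for stmt in maintenance_statements(["Jobs", "JobParameters"]):
        print(stmt)
```

Generating the statements instead of executing them lets an operator review them and run them table by table, skipping the very large tables if needed.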

 

·      We now have to work out how to install cctbdiracXX with DIRAC9/DiracX, given our specific setup:

o   K3S as opposed to Openshift

o   MariaDB

o   ElasticSearch

o   RabbitMQ

 

1.1 Versions:

  • DIRAC v8.0.73

o   Several fixes

o   Singularity CE, AREX CE

o   Clear any non-UTF encodable environment variables in pilots

  • DIRAC Prerelease v9.0.0a55

o   Multiple fixes following the LHCb migration 

  • DiracX Prerelease v0.0.1a26

o   Multiple fixes following the LHCb migration 

o   See Project tasks: https://github.com/orgs/DIRACGrid/projects/2
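The v8.0.73 fix that clears non-UTF encodable environment variables in pilots can be illustrated with a short Python sketch. It is not the DIRAC pilot code: the function name is hypothetical. It relies on the fact that, on POSIX systems, Python decodes undecodable bytes in `os.environ` as lone surrogate characters, which then raise `UnicodeEncodeError` when encoded as UTF-8.

```python
import os
from typing import Dict, Mapping, Optional


def utf8_safe_environment(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without entries that are not UTF-8 encodable.

    Hypothetical helper, for illustration only. Defaults to os.environ.
    """
    source = os.environ if env is None else env
    clean: Dict[str, str] = {}
    for name, value in source.items():
        try:
            # Lone surrogates (from undecodable bytes) make these calls fail
            name.encode("utf-8")
            value.encode("utf-8")
        except UnicodeEncodeError:
            continue  # drop the variable rather than let it break later steps
        clean[name] = value
    return clean


if __name__ == "__main__":
    sample = {"HOME": "/home/pilot", "BROKEN": "caf\udce9"}
    print(utf8_safe_environment(sample))
```

Dropping such variables, rather than trying to re-encode them, avoids guessing their original encoding.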

1.2 Developments:

·      BiLD meetings https://indico.cern.ch/event/1531451/

·      DIRAC/DiracX test/certification setup (cctbdiracXX)

·      The 02 host is ready for use (Vanessa)

·      Multi-host k3s installation is set up and being tested (Bertrand)

·      Making the cctbdiracXX installation as functional as the LHCb one

·      Make use of the hackathon on 5-6 May at CERN

1.3. EGI Services:

  • Smooth running
  • Romain asked for token access to DIRAC services. This case will be used to test the migrated cctbdiracXX installation.
  • Pilot submission with tokens is enabled for 5 VOs: wenmr, biomed, auger, km3net, complex. No requests for other VOs yet. The HTCondorCE sites supporting these VOs were contacted to update their token configuration, but not all of them yet.

 

1.4. European/National Projects:

  • GreenDIGIT 

o   The reporting period has started. The architecture of the eventual demonstrator is being formulated.

  • PEPR STEEL 

o   Some discussions about the practical demonstrator. The DIRAC webinar was presented and well received; it should be followed by a technical discussion of collaboration with the other partners in the "computing continuum".

  • SpectrumCoP-WP[1,2,3]

o   NTR

o   https://www.spectrumproject.eu

2. Projects

2.1 CTAO (Luisa, Natthan)

  • The release of the CTAO offline software stack is ongoing, and its certification is in progress. The pipeline using the updated applications will be tested with DIRAC/Rucio. The job applications are described in CWL, and the intention is to try out the interface developed by Alexandre for DiracX. Proper parsing of the CTAO CWL job descriptions is being developed. CWLTool, which interprets the job application on the worker node, is supposed to use containerized applications from CVMFS but must be updated to do so.
  • An attempt to migrate to DIRAC9 was made, but without much progress. It should benefit from the LHCb example of service migration.

2.2 Biomed (Sorina, Axel)

  • Sites are to be re-examined with respect to their token configuration.

2.3. JUNO

  • Working on the JUNO File Catalog demonstrator of the metadata capabilities
  • Filled it with a demo sample of files (1.2M files, equivalent to 2 years)
  • Added an ENUM metadata type
  • Using objects (directories) to store only metadata, without data

2.4 Km3NET.org

  •  NTR

4. Conference, workshop, publications

  • DIRAC Hackathon 5-6 May, CERN
  • EGI Conference 2-6 June, Santander, Spain
  • DIRAC User's Workshop, 17-20 September 2025, IHEP, Beijing
  • JCAD'25, 15-17 September

5. Consortium

  • Meeting 25/04/2025
  • OK for the HSF affiliation program
  • Vanessa to discuss details at the HSF workshop next week
  • OK for CTAO membership case study
  • OK for DIRAC-Rucio reference platform at CERN study

6. AOB

  • Zacharie Bedecarrax joins CPPM on a fixed-term contract (CDD) for GreenDIGIT, starting 1 May
  • Next meeting: 10 June, 14:00
    • 14:00 14:40
      Project status 40m
      • Round table
      Speakers: Dr Andrei Tsaregorodtsev (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France), Vanessa Hamar (CC-IN2P3 / CNRS)
    • 14:40 15:20
      Projects 40m
      Speakers: Dr Andrei Tsaregorodtsev (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France), Axel BONNET ({CNRS}UMR5220), M. Bertrand Rigaud (CC-IN2P3), Luisa Arrabito (LUPM), Natthan Pigoux (LUPM), Sorina POP (CNRS)
    • 15:20 15:25
      Publications / Conferences 5m

      Publications

      Conferences

      Speaker: Dr Andrei Tsaregorodtsev (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France)
    • 15:25 15:35
      Consortium news 10m
      Speaker: Dr Andrei Tsaregorodtsev (CPPM-CNRS)
    • 15:35 15:45
      AOB 10m