Joint Meeting: Operations + AT Grille

Europe/Paris
322 (IN2P3)

Hélène Cordier (CNRS/IN2P3)
Description
Exceptionally, only the part common to Operations and AT Grille will take place. CCIN2P3 site operations. Participants: representatives of operations, support, and others depending on the agenda, plus those of the usual AT Grille meeting.
    • 16:00 16:40
      Incidents and VO problems (CC, French sites, other sites) 40m
      Storage topics:
      -------------------

      Operations topics:
      ------------------------
      28/02: Batch shutdown (air-conditioning problem)
      02/03: Load on an AFS server -> atlas050 blocked + aligrid jobs limited
      03/03: Oracle load test by Atlas

      Support topics:
      -------------------

      CMS report
      **********
      * Transfer
        - FNAL-->CCIN2P3
          - Imported data arrived at CC at a very high rate of about 250 MB/s for several hours last week.
          - This saturated the CUSTODIAL space tokens of about 2 TB. In addition, the migration to HPSS suffered many I/O errors because the pools filled up quickly.
          - This caused the transfer to fail, along with transfers on other links.
          - For this transfer the FNAL-->STAT channel was used, so dcachemaster had no control over reducing the rate.
          Actions taken to resolve the issue:
          - The CUSTODIAL buffer was increased by about 8 TB.
          - The transfer traffic was reduced by adjusting some parameters on the Phedex agents.
          - The use of the FNAL-->STAT channel was due to a Phedex misconfiguration. This was identified and fixed. The dedicated channels for all incoming connections were configured correctly to be used for this purpose.
        - CCIN2P3-->GRIF
          - Since last week the pools hosting the LoadTest seem to have some issues. A fix was provided by Lionel, but the issues persist.
          - All transfers in the Debug instance from CCIN2P3 to T2_FR_GRIF_LLR and T2_FR_GRIF_IRFU are failing with the same error.
          - In fact, this seems to be happening to all CCIN2P3->* transfers in the Debug instance that use the LoadTest.
      * CMS jobs
        - Running jobs that try to write to the backfill space [1] are failing with "SRM_NO_FREE_SPACE". Jonathan provided me with a space of about 20-25 TB to be used by the production people to perform their tests. Given that the test only started last week, why can the jobs not write to this area?
      * Reprocessing activity for CMS will start next week.

      [1] /pnfs/fs/usr/data/cms/data/store/backfill
          /pnfs/fs/usr/data/cms/data/store/backfill1

      Atlas:
      ******
      * 2 simultaneous reprocessing campaigns planned for next week:
        - Cosmic data: retrieve the files from HPSS and put them on disk (80 TB of data).
        - Monte Carlo: migrate the files on disk to HPSS after merging the small files.

      Alice:
      ******
      Nothing to report.

      LHCb:
      *****
      Nothing to report.

      AT Grille in general:
      ----------------------------
      - CE, BDII, VOMS...
      - SE, FTS, LFC, SRM, dCache
      Speaker: All
      Recurring VO problems
    • 16:40 16:45
      Load and foreseeable events for the coming week 5m
      Production requests, transfer announcements, data challenges, shutdowns, installations, ... All of this only if it is of general interest. Also: news from the Grid projects, again only if it is of general interest.
      Speaker: All
    • 16:45 16:50
      News from the CC teams 5m
      Topics per team, with a potential impact on operations. News on recruitment
      Speaker: One member of each team