Minutes of the CAF meeting 10/11/2017
At CC: AMI team (Jerôme O., Jerôme F., Fabien), Catherine, Sabine, Manu, Eric, Manoulis, LP
Phone: Fred, Stephane, Laurent, Mihai, David
1) T2s INTRODUCTION (Luc)
- T1 contribution at 11% ATLAS requests
- T2s contribution (today in Rebus) 9% of ATLAS requests. Likely to increase.
Feedback from S&C week September and TIM:
- Software: AthenaMT (see Manu's talk at next CAF), HSF activity, Progress on FastSim, Overlay for pileup
- ExtraResources: Cloud (stable), HPC (from US) big contribution: Event Service & Overlay increasing usage
- Promote derivation from tape, needs to support and increase software manpower.
- Support Containers rollout (via Singularity). New wrapper available for containers.
- T2s policy being set up
- Many deletion & transfer issues at most sites. Usually fixed by restarting services. 2 sites (CPPM, Tokyo)
had to increase the #threads to handle high SRM load.
- Hong-Kong T2 candidate has the 3 queues running. Candidacy to be presented at December ICB. Still some memory
issues with ARC+HTCondor on mcore queue.
- RO-07: new ARC+HTCondor+SL7+Docker mcore queue. Memory issues to be understood. 100TB added on DATADISK.
2) T2s SQUAD REPORT (Manu)
- Stable activity wrt previous period
- Profile (mcore part, activities sharing) stable and same as non-FR T2s
- IRFU: Possible drop in CPU delivered since beginning of September. To be followed.
3) AMI STATUS (Jerôme O.)
Use in ATLAS:
- Tool to handle metadata with Java & Web framework
- Cloud Openstack architecture based at CC & duplicated at CERN (load balancing)
- Usage: AMI-tags, Dataset discovery, AMI-Glance, Metadata dictionary
Status & Devt:
- V1.0 15 yrs old, V2.0 started in 2015 (better scalability, performance, more recent technologies)
- Possibility for physics groups to plug links directly into twiki (auto-update)
- ATLAS trend is to move from dataset metadata to event metadata (devt work to do)
- Technical master-project creation at IN2P3 level
- Tag Collector discontinued because replaced by Git
- AMI team communication via S&C weeks and DCC working group
- Having infra at CC allows better & faster reactivity. Essential for developers.
4) FEEDBACK from LABS (All)
- Old SE dumps not erased -> Cedric to be contacted.
- CPPM: 200TB added on LGD
- RO-07 (see in Sec. 1)
- LPNHE: Work on cloud & available HPC (Aurelien)
5) LCG-FR FEEDBACK (Catherine)
Journees LCG-FR 22-24/11@LPC:
- Future IR-T2 ('mesocentre') discussion
- New CERN monitoring. Good to have a presentation at Sites Jamboree in 2018. Sabine & Catherine to take care.
- SL7 migration planning not clear
HSF/WLCG wkshop 26-29/03@Napoli:
- Focus on Run-3 & 4
CA Certification changes:
- IGC -> Education Nationale. Start in 2017. Potentially troublesome (admin, robot,...)
Master IN2P3 project:
- Budget 1981kEuros (CC-IN2P3 1661k, T2 280k, running costs 40k)
- 2018-2022, only between IN2P3 & IN2P3 labs
- Draft iterated in CoDir
- For T2s: only support for pledged capacities renewal at 70% level. To be complemented by the labs: 30% renewal
- Sites growth and services: in the hands of the labs
- Priority given to renewal at CCIN2P3 and a 10% resource growth at T1
- Overall goal for T1 and T2s: 8-10% of worldwide resources
- Flat budget, Sharing among expts unchanged (45% for ATLAS)
- In both the most optimistic & most pessimistic scenarios, still possible to maintain 11% (pess.) to 20% (opti.) growth at T1
6) INTRODUCTION T1 (Luc)
- Converge on archiving setup for sps & LGD & discuss afs status
- WLCG availability & reliability OK
- Wrt other T1s: 14-15% total WT (2nd after BNL)
- 8500 slots running over the period. From batchmaster, 7500 is baseline. NB: 15% of total WNs are in SL7 and not seen by ATLAS
- 2 SDT (1 standard & 1 for security update)
- 2 issues: 24/09 1 dCache pool offline, and 7/11 Massive staging failures due to hw pb on pool. Discussion ongoing in Operation team: Latency time before informing and how (via Elog?)
7) T1 SQUAD REPORT (Manu)
- Stable wrt last period
- More mcore CPU compared to last period
8) T1 STATUS (Manoulis)
- New wrapper supporting Singularity installed in 2 Voboxes
- HPC@IDRIS: 6hrs CPU (in preemptive mode) possible (used to be limited to 1.5hrs)
- Farm migration for 2018 w/ SL7 as default (15% to increase). SL6 pledges maintained
9) SPS & LGD archiving
- No progress on documentation for users. Eric will follow up on CC side. Goal is to inform users by the end of 2017
(doc to retrieve migrated files needed)
- Datasets (list provided by Manoulis) migration to LOCALGROUPTAPE & their deletion on LGD is done.
Tool (Rucio 'mv command') to perform migration by someone with atlas/fr privileges not ready yet.
- Extra 80TB expected to be recovered soon
10) AFS (Manoulis)
- already migrated to new tool (pbs)
- Used for software install
- To be migrated by end of 2017. Represents 2 GB (15 GB from old user to be deleted)
- To be checked (Manoulis): Twiki ATLASFrance still on Throng-Dir.
- No mail needed to inform users