# Data Injection Demonstrators

Timezone: Europe/Paris

• 11:00–11:05  ATLAS (5m)
• 11:05–11:10  CMS (5m)
• 11:10–11:15  CTA (5m)
• 11:15–11:20  EGO/VIRGO (5m)

AUTHENTICATION: OK
==================

[user@3ae7a83d158b ~]$ rucio whoami
/usr/lib/python2.7/site-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
status       : ACTIVE
account      : pchanial
account_type : SERVICE
created_at   : 2020-10-13T09:59:50
updated_at   : 2020-10-13T09:59:50
suspended_at : None
deleted_at   : None
email        : pierre.chanial@ego-gw.it

SCOPE CREATION: OK
==================

rucio-admin scope add --account pchanial --scope VIRGO_EGO_CHANIAL

UPLOAD: OK
==========

[user@9c903e7070d5 ~]$ rucio upload --rse EULAKE-1 --scope VIRGO_EGO_CHANIAL V-FakeV1_GWOSC_O2_4KHZ_R1-1185615872-4096.hdf5
2020-10-15 09:28:22,987    INFO    Preparing upload for file V-FakeV1_GWOSC_O2_4KHZ_R1-100000004-4096.hdf5
2020-10-15 09:28:23,261    INFO    Successfully added replica in Rucio catalogue at EULAKE-1
2020-10-15 09:28:23,354    INFO    Successfully added replication rule at EULAKE-1
2020-10-15 09:28:24,845    INFO    Trying upload with gsiftp to EULAKE-1
2020-10-15 09:28:28,346    INFO    Successfully uploaded file V-FakeV1_GWOSC_O2_4KHZ_R1-100000004-4096.hdf5

• 11:20–11:25  FAIR (5m)

1. Environment

No changes.

2. Test run

* scheduled run time: 96 hours
* actual run time: about 60 hours - since around 2020-10-05, 04:00 UTC,
rucio operations have been failing with "unable to get authentication
token", even though voms-proxy-info claims the proxy certificate is
valid
1. for each DOWNLOAD, a 1-in-5 chance of the DID having been
2. a new set of replication rules - after each upload, an equal chance
of requesting:
   - 1 replica at DESY-DCACHE, or
   - 1 replica at QOS=SAFE, or
   - 2 replicas at QOS=SAFE, or
   - 1 replica at QOS=CHEAP-ANALYSIS, or
   - 2 replicas at QOS=CHEAP-ANALYSIS, or
   - no further replicas.
For the record, GSI-ROOT advertises QOS=CHEAP-ANALYSIS so rules
requesting this QoS class get one replica without any transfers.
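The equal-chance selection among the six rule options can be sketched as follows (a minimal illustration, not the actual test harness; the `{did}` placeholder stands for the freshly uploaded DID, and the rule is only echoed here, not submitted):

```shell
#!/bin/bash
# Pick one of the six post-upload actions with equal probability.
ACTIONS=(
  "rucio add-rule {did} 1 DESY-DCACHE"
  "rucio add-rule {did} 1 QOS=SAFE"
  "rucio add-rule {did} 2 QOS=SAFE"
  "rucio add-rule {did} 1 QOS=CHEAP-ANALYSIS"
  "rucio add-rule {did} 2 QOS=CHEAP-ANALYSIS"
  "no further replicas"
)
pick=$((RANDOM % ${#ACTIONS[@]}))
echo "selected: ${ACTIONS[$pick]}"
```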

3. Manual tests

Uploads of full data sets (via appropriate invocation of 'rucio
add-dataset' + 'rucio attach'), removal of files from data sets ('rucio
detach') and deletion of whole data sets ('rucio erase') have been
tested manually.
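For reference, a sequence of this shape exercises those operations (a sketch, not the exact commands used in the tests; the scope and file names are illustrative, borrowed from the upload example further below):

```shell
rucio add-dataset FAIR_GSI_SZUBA:testDS                       # create an empty data set
rucio upload --scope FAIR_GSI_SZUBA --rse GSI-ROOT aaa bbb    # upload the files
rucio attach FAIR_GSI_SZUBA:testDS FAIR_GSI_SZUBA:aaa FAIR_GSI_SZUBA:bbb
rucio detach FAIR_GSI_SZUBA:testDS FAIR_GSI_SZUBA:aaa         # remove one file from the set
rucio erase FAIR_GSI_SZUBA:testDS                             # mark the whole set for deletion
```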

4. Results

* 100% success rate for replication to QOS=SAFE (16x one replica, 9x two
replicas) and, unsurprisingly, "one replica at QOS=CHEAP-ANALYSIS" (9)
* 4 out of 10 "two replicas at QOS=CHEAP-ANALYSIS" rules ended up stuck:

- 2 due to an authentication error on our end. Server logs show
several occurrences of the error "XrootdResponse: sending err 3006:
Invalid request; user not authenticated" for connections from
ccdcalitest11.in2p3.fr at the time the replication was to take place;
indeed, IN2P3-CC-DCACHE is one of the two RSEs advertising
QOS=CHEAP-ANALYSIS which do not presently hold any replicas of FAIR
data (the other being LAPP-WEBDAV);

- 1 due to the target RSE having run out of storage space. Server
logs only showed a WebDAV connection originating from
fts-pilot-07.cern.ch but having extracted the relevant job ID
(235923f4-05c4-11eb-8d79-02163e018830) from said logs, I was able to
determine that the actual recipient of the transfer was LAPP-WEBDAV;

- 1 due to an allegedly unknown TLS certificate presented by the
destination. No idea what the destination was this time: none of the
relevant job IDs show up in the FTS Monitor.

* no major issues with data set-related operations, but the user
experience pertaining to direct uploads into data sets leaves something
to be desired (see below)

5. Conclusions

* We need a way of looking further into the past while analysing
FTS-transfer errors. Unless I have missed something, the FTS Monitor
does not show more than the last 6 hours of activity, and while our
Grafana dashboard does allow selecting longer time frames, doing so
doesn't seem to have any effect on the contents of the failure-log
panel;

* I have found both the rucio-clients help messages and the Rucio RTD
documentation woefully inadequate on the subject of uploading files
directly into a data set (in contrast to attaching previously uploaded
files to a data set). In the end I found the necessary syntax in some
ATLAS tutorials on the Web, and even those I had to modify a bit: for
some reason the only way this works for me is to have the scope declared
twice, both via --scope and in the data-set name (e.g. 'rucio upload
--scope FAIR_GSI_SZUBA --rse GSI-ROOT FAIR_GSI_SZUBA:testDS aaa bbb');
if I omit the former, Oracle complains.

Further information on this follows, partly from my own experiments and partly from Martin's private response. I was going to add it to today's Indico entry, but it turns out I cannot edit the material there.

1. The '--scope' argument must be used because it specifies the scope for *uploaded files*, just as when you upload them without creating a data set in the process. Without it, Rucio attempts to use the user's personal scope for the files even if the data set itself has a scope in its prefix - and personal scopes are something we haven't got on our testbed, hence the error.

2. The scope prefix for the data set is, strictly speaking, not necessary (if absent, Rucio will use the value of --scope) - but it is recommended because it explicitly marks the first argument as a data set. Without it, the first argument will only be treated as a data-set name UNLESS THERE IS A FILE OR DIRECTORY WITH THAT NAME IN THE CURRENT WORKING DIRECTORY. Consider the command

rucio upload [...] testDS aaa bbb

where aaa and bbb are files, in the three following cases:

- testDS does not exist - rucio creates a data set called testDS, creates the initial replication rule for it, uploads both files and attaches them to testDS;

- testDS is a file - rucio uploads all three files, each one with its own initial replication rule - no data set is created;

- testDS is a directory - similar to the above but rucio recursively uploads the files found inside testDS/.
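The dispatch between the three cases can be mimicked with a small shell function (an illustration of the behaviour described above, not rucio's actual code):

```shell
# How a scope-less first argument of 'rucio upload' is interpreted,
# per the three cases above.
classify_first_arg() {
  if [ -d "$1" ]; then
    echo "directory"   # its contents are uploaded recursively; no data set
  elif [ -f "$1" ]; then
    echo "file"        # uploaded like any other file; no data set
  else
    echo "dataset"     # a data set with this name is created
  fi
}
```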

• 11:25–11:30  LOFAR (5m)
• 11:30–11:35  LSST (5m)

### 1. Environment setup
For this exercise, we're using a shared Python virtual environment installed on CC-IN2P3 interactive machines. All ESCAPE members with a CC-IN2P3 account can load this environment by sourcing the /pbs/throng/escap/rucio/rucio_escape.sh file.
By default, authentication is done with an X.509 proxy, but the user can change this, along with other Rucio configuration details, via environment variables if needed.
After loading the environment, the user needs to obtain an ESCAPE certificate proxy via voms-proxy-init -voms escape.
Once this is done, everything is set up and we can start interacting with Rucio.

### 2. Running the exercise
To make testing quicker and reproducible, most actions are performed from bash scripts.
The first script:
- Generates a random file with a random size between 100 MB and 1000 MB.
- Uploads that file to the IN2P3-CC-DCACHE RSE via the davs protocol.
- Attaches that file to the LSST_CCIN2P3_GOUNON:demo_test01 dataset.
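Those steps could look roughly like this (a sketch under assumed names, not the actual script; the size is scaled down to kilobytes so the example runs quickly, and the rucio calls are shown commented out since they need Data Lake credentials):

```shell
#!/bin/bash
size_mb=$((RANDOM % 901 + 100))       # random size in [100, 1000] MB
fname="demo_$(date +%s).bin"
# Scaled-down stand-in: 4 KB of random data instead of ${size_mb} MB.
dd if=/dev/urandom of="$fname" bs=1024 count=4 2>/dev/null
# rucio upload --rse IN2P3-CC-DCACHE --protocol davs --scope LSST_CCIN2P3_GOUNON "$fname"
# rucio attach LSST_CCIN2P3_GOUNON:demo_test01 "LSST_CCIN2P3_GOUNON:$fname"
echo "generated $fname (the real script would use ${size_mb} MB)"
```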

Then, we created the rule with ID **bf0ce8d640964c6cba5baed7f03ceaee** with the following command:
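The command itself did not survive into these notes; given the rule described just below (two copies of the demo_test01 dataset at QOS=CHEAP-ANALYSIS), it was presumably of the form (a reconstruction, assuming standard 'rucio add-rule' syntax, not the verbatim command):

```shell
rucio add-rule LSST_CCIN2P3_GOUNON:demo_test01 2 QOS=CHEAP-ANALYSIS
```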


This rule ensures that 2 copies of the files in LSST_CCIN2P3_GOUNON:demo_test01 are present at all times, on sites where the QOS=CHEAP-ANALYSIS flag is present.
After a little while, all files in the rule went to OK state.

The second script will:
- Get the md5sum of that file from Rucio with the rucio get-metadata command
- Compare the remote md5sum with the local md5sum for that file and output the result
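The comparison amounts to something like the following (a self-contained sketch with a stand-in file; in the real script the reference checksum comes from 'rucio get-metadata' rather than being hard-coded):

```shell
#!/bin/bash
printf 'hello' > demo_file.bin                    # stand-in for the uploaded file
# In the real script the reference value comes from the catalogue, e.g.:
# remote_md5=$(rucio get-metadata LSST_CCIN2P3_GOUNON:<name> | awk '/^md5/ {print $2}')
remote_md5="5d41402abc4b2a76b9719d911017c592"     # md5 of the stand-in content
local_md5=$(md5sum demo_file.bin | cut -d' ' -f1)
if [ "$local_md5" = "$remote_md5" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
```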

We were also able to mark files for deletion with rucio erase. We have yet to confirm that those files are effectively deleted after the 24-hour delay.
It should be noted that we are now running automatic uploads to that same dataset hourly via a cron task.

### 3. Errors
The single warning we've encountered during this exercise is the following:


2020-10-02 16:31:47,756    DEBUG    The requested service is not available at the moment.
Details: An unknown exception occurred.
Details: Could not open source: Connection timed out
2020-10-02 16:32:47,869    DEBUG    The requested service is not available at the moment.
Details: An unknown exception occurred.
Details: Could not open source: Connection timed out
[...]
2020-10-02 16:32:54,540    INFO    File LSST_CCIN2P3_GOUNON:6r1XIOlGCsPX2UwaVu0D8wWbQ4WYjp4x successfully downloaded. 896.532 MB in 3.2 seconds = 280.17 MBps


This means that, to download LSST_CCIN2P3_GOUNON:6r1XIOlGCsPX2UwaVu0D8wWbQ4WYjp4x, Rucio first tried to get it via the davs protocol on LAPP-DCACHE, failed twice, and then switched to the root protocol on IN2P3-CC-DCACHE, where it succeeded. This probably indicates an issue with the davs protocol on LAPP-DCACHE, which I haven't investigated further.

### 4. Feedback
Everything works as expected at this stage and we didn't run into any major issues.
I think it would be interesting to also investigate non-deterministic RSE configurations, as well as our ability to feed existing datasets into the Rucio catalogue via file registration.

• 11:35–11:40  MAGIC (5m)
• 11:40–11:45  SKA (5m)
• 11:45–12:00  Report, Summary, and Next Steps (15m)
Speaker: Riccardo Di Maria (CERN)