JAliEn documentation - how to - catalogue¶
This document is a JAliEn update of the AliEn tutorial available here.
Basic commands¶
If you got this far, you have a working installation of alien and are able to authenticate to the system. Congratulations!!
Now, let's take a look at the different commands that you can execute inside alien. The first thing that you have to do is enter a supported environment:
- either use an AliPhysics JALIEN version, which also enables the alien.py shell, or
- enable the standalone module:
/cvmfs/alice.cern.ch/bin/alienv enter xjalienfs/1.2.2-9
nhardi@pcalice ~> /cvmfs/alice.cern.ch/bin/alienv enter AliPhysics/vAN-20200330_JALIEN-1
[AliPhysics/vAN-20200330_JALIEN-1] ~ > alien.py
Welcome to the ALICE GRID
support mail: adrian.sevcenco@cern.ch
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >
From this prompt, you can pass any commands to the AliEn System. Be careful, because even if it looks like a UNIX shell, it IS NOT a UNIX shell.
To get the full list of the commands that you can use, press tab:
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ > <tab>
$? df kill mkdir quota token-init
$?err dirs la motd resubmit top
? du lfn2guid mv rm touch
access edit listFilesFromCollection nano rmdir type
addFileToCollection error listSEDistance packages run uptime
cat exec listSEs pfn setSite user
cd exit listTransfer pfn-status showTagValue uuid
cert-info exitcode ll ping stat version
changeDiff find lla popd submit vi
chgroup find2 logout prompt testSE vim
chown getSE ls ps time w
commandlist grep masterjob pushd toXml whereis
commit groups mcedit pwd token whoami
cp guid2lfn md5sum queryML token-destroy whois
deleteMirror help mirror quit token-info xrdstat
Most of the commands are similar to the standard UNIX commands:
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >whoami
nhardi
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >ls -l | tail
-rwxr-xr-x nhardi nhardi 3748 Feb 04 12:48 input.jdl
-rwxr-xr-x nhardi nhardi 3772 Feb 04 12:42 input.jdl_20200204_124400
drwxr-xr-x nhardi nhardi 0 Jan 29 13:33 jalien-job-1759206333/
drwxr-xr-x nhardi nhardi 0 Jan 29 13:42 jalien-job-1759209346/
drwxr-xr-x nhardi nhardi 0 Jan 29 13:43 jalien-job-1759209959/
drwxr-xr-x nhardi nhardi 0 Feb 06 22:19 LHC18q/
drwxr-xr-x nhardi nhardi 0 Dec 20 15:07 minimal_example/
drwxr-xr-x nhardi nhardi 0 Jan 14 16:51 output/
-rwxr-xr-x nhardi nhardi 472 Feb 11 15:23 test.zip
drwxr-xr-x nhardi nhardi 0 Jan 15 10:07 workspace/
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >cd workspace/
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/workspace/ >ls
test/
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/workspace/ >ls -l
drwxr-xr-x nhardi nhardi 0 Jan 15 10:07 test/
Finally, to exit the catalogue, you can use 'exit', 'quit', or Ctrl+C:
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >quit
Exit
[AliPhysics/vAN-20200330_JALIEN-1] ~ >
If you want to execute only one command, you can also do it from the UNIX prompt (not the alien prompt!) by typing alien.py <command>:
[AliPhysics/vAN-20200330_JALIEN-1] ~ > alien.py ls -l
drwxr-xr-x nhardi nhardi 0 Feb 24 16:52 alien-job-1794849428/
drwxr-xr-x nhardi nhardi 0 Feb 24 17:17 alien-job-1794875442/
drwxr-xr-x nhardi nhardi 0 Jan 14 15:46 CharmTriggerEfficiency/
-rwxr-xr-x nhardi nhardi 3748 Feb 04 12:48 input.jdl
-rwxr-xr-x nhardi nhardi 3772 Feb 04 12:42 input.jdl_20200204_124400
drwxr-xr-x nhardi nhardi 0 Jan 29 13:33 jalien-job-1759206333/
drwxr-xr-x nhardi nhardi 0 Jan 29 13:42 jalien-job-1759209346/
drwxr-xr-x nhardi nhardi 0 Jan 29 13:43 jalien-job-1759209959/
drwxr-xr-x nhardi nhardi 0 Feb 06 22:19 LHC18q/
drwxr-xr-x nhardi nhardi 0 Dec 20 15:07 minimal_example/
drwxr-xr-x nhardi nhardi 0 Jan 14 16:51 output/
-rwxr-xr-x nhardi nhardi 472 Feb 11 15:23 test.zip
drwxr-xr-x nhardi nhardi 0 Jan 15 10:07 workspace/
[AliPhysics/vAN-20200330_JALIEN-1] ~ >
You can also use the predefined aliases of the form alien_<command>. Currently, the following aliases are available:
[AliPhysics/vAN-20200330_JALIEN-1] ~ > alien_ <tab>
alien_cmd alien_lfn2guid alien_mv alien_rmdir alien_whereis
alien_cp alien_ls alien_pfn alien_rsync.sh
alien_find alien_mirror alien_ps alien_stat
alien_guid2lfn alien_mkdir alien_rm alien_submit
Host shell interactions¶
Notice that a command starting with ! is executed in the shell on the local machine. For instance, running !ls $HOME lists files from the local home directory.
AliEn commands can be piped to one or more host shell commands: the output of the AliEn command on the left side of the first pipe is passed as input to the shell commands on the right side of the pipe.
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >whereis input.jdl | awk '/CERN/ {print $NF}'
root://eosalice.cern.ch:1094//02/44400/313c07a0-57bb-11ea-8f07-e83935243962
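When the output of an AliEn command has already been captured as text, the same kind of filtering can be done in Python. The sketch below mimics the awk filter used above; the helper name `last_field_of_matching_lines` is made up for illustration, it matches a plain substring rather than an awk regular expression, and the sample text is copied from the `whereis` listing in the section on downloading files.

```python
from __future__ import annotations


def last_field_of_matching_lines(text: str, needle: str) -> list[str]:
    """Rough equivalent of awk '/needle/ {print $NF}' for a plain-text needle."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Keep the last whitespace-separated field of every line containing the needle.
        if fields and needle in line:
            result.append(fields[-1])
    return result


# Sample text copied from the whereis listing on this page.
WHEREIS_OUTPUT = """\
the file input.jdl is in
SE => ALICE::CERN::EOS pfn => root://eosalice.cern.ch:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
SE => ALICE::ICM::EOS pfn => root://a-se.grid.icm.edu.pl:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
SE => ALICE::PRAGUE::SE pfn => root://xrdhead.farm.particle.cz:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
"""

cern_pfns = last_field_of_matching_lines(WHEREIS_OUTPUT, "CERN")
print(cern_pfns)
```

The match is case sensitive, so only the line naming the ALICE::CERN::EOS storage element is selected, exactly as with the awk pattern.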
Working with files¶
Downloading files from the Grid¶
The entries that you see in the catalogue are not real files, but Logical File Names (LFN). You can think of an LFN as an index that points to one (or more) Physical File Names (PFN). A physical file name points to a copy of the file, which can be on disk, tape, or any kind of mass storage system.
You can use the commands whereis and xrdstat to get the list of replicas of an LFN and their status.
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >whereis input.jdl
the file input.jdl is in
SE => ALICE::CERN::EOS pfn => root://eosalice.cern.ch:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
SE => ALICE::ICM::EOS pfn => root://a-se.grid.icm.edu.pl:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
SE => ALICE::PRAGUE::SE pfn => root://xrdhead.farm.particle.cz:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >xrdstat input.jdl
Checking the replicas of /alice/cern.ch/user/n/nhardi/input.jdl
ALICE::CERN::EOS root://eosalice.cern.ch:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
ALICE::ICM::EOS root://a-se.grid.icm.edu.pl:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
ALICE::Prague::SE root://xrdhead.farm.particle.cz:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
If an LFN points to more than one PFN, all these PFNs are identical copies (also known as mirrors or replicas) of the same file. However, users usually work with just logical filenames. The translation from logical to physical filenames is transparent to the user and is done automatically by the framework.
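To make the LFN to PFN relation concrete, the following sketch turns the `xrdstat` listing shown above into a list of replicas for one LFN and checks that every replica reports OK. It assumes the three-column layout (SE name, root:// URL, status) seen in that listing; real output may contain extra columns or colouring. The names `Replica` and `parse_xrdstat` are illustrative and not part of alien.py.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    se: str      # Storage Element name
    pfn: str     # Physical File Name (URL of this copy)
    status: str  # status column as printed by xrdstat


def parse_xrdstat(text):
    """Collect replica lines of the assumed form '<SE> <root:// URL> <status>'."""
    replicas = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].startswith("root://"):
            replicas.append(Replica(*fields))
    return replicas


# Sample text copied from the xrdstat listing on this page.
XRDSTAT_OUTPUT = """\
Checking the replicas of /alice/cern.ch/user/n/nhardi/input.jdl
ALICE::CERN::EOS root://eosalice.cern.ch:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
ALICE::ICM::EOS root://a-se.grid.icm.edu.pl:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
ALICE::Prague::SE root://xrdhead.farm.particle.cz:1094//02/62390/335c0bc0-4744-11ea-8877-0242599c3dd1 OK
"""

replicas = parse_xrdstat(XRDSTAT_OUTPUT)
all_ok = all(r.status == "OK" for r in replicas)
print(len(replicas), "replicas, all OK:", all_ok)
```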
Use the command cp to copy a file to and from the Grid. This command requires two arguments to transfer a file. Pass the -h flag to print the help message and find more options, such as using filename patterns and parallel transfers to copy multiple files at once.
The paths must have a label to denote the path type:
- alien: or alien:// to specify a remote (GRID) file
- file: or file:// to specify the local host location
Either one can be used, but at least one is required.
Any path without a prefix is considered to be a remote path by default.
WARNING: with the legacy AliEn shell, aliensh, file paths without a prefix are considered to be local files. So for compatibility purposes always specify a prefix, for both local and Grid files.
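The prefix rules above can be summarised in a few lines of Python. This is only an illustration of the rules as described here, not code taken from alien.py; `path_kind` and `with_explicit_prefix` are hypothetical helper names.

```python
def path_kind(path: str) -> str:
    """Return 'local' or 'grid' following the alien.py rule described above."""
    if path.startswith("file:"):
        return "local"
    # 'alien:' paths and paths without any prefix are both treated as Grid paths.
    return "grid"


def with_explicit_prefix(path: str, kind: str) -> str:
    """Add the short 'file:' or 'alien:' label unless the path already carries one."""
    if path.startswith(("file:", "alien:")):
        return path
    if kind not in ("local", "grid"):
        raise ValueError("kind must be 'local' or 'grid'")
    return ("file:" if kind == "local" else "alien:") + path


print(path_kind("input.jdl"))                               # grid
print(with_explicit_prefix("/tmp/local_file.jdl", "local"))  # file:/tmp/local_file.jdl
```

Always spelling out the prefix, as `with_explicit_prefix` does, keeps a command line unambiguous in both alien.py and the legacy aliensh.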
The file path alias %ALIEN can be used in place of the full path to the Grid home directory. Alternatively, use a relative path; the current working directory will be prepended automatically.
Remote (GRID) paths, either as source or destination, support the @ qualifier.
It specifies the number of replicas to be used and, additionally, a comma-separated list of Storage Elements (SEs) to be used or excluded:
@disk:3,SE1,!SE2,!SE3
This means: get 3 replicas while excluding SE2 and SE3, and also add SE1 (for a total of 4 replicas).
When downloading, a specification like SE1,SE2,!SE3,!SE4 means: download only from SE1 and SE2 and exclude SE3 and SE4.
The location of replicas can be found with xrdstat, see above.
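To illustrate how such a qualifier decomposes, here is a small parser sketch. It is not the parser used by alien.py; `Spec` and `parse_spec` are hypothetical names, and the rule that a `word:number` item is a replica count while everything else is an SE name is an assumption derived from the example above.

```python
import re
from typing import NamedTuple

# 'disk:3' style item: a tag followed by a replica count.
_COUNT_ITEM = re.compile(r"^([A-Za-z]+):(\d+)$")


class Spec(NamedTuple):
    counts: dict   # e.g. {"disk": 3}
    include: list  # SEs explicitly requested
    exclude: list  # SEs marked with a leading '!'


def parse_spec(spec: str) -> Spec:
    """Split '@disk:3,SE1,!SE2' into replica counts, included and excluded SEs."""
    counts, include, exclude = {}, [], []
    for item in (part.strip() for part in spec.lstrip("@").split(",")):
        if not item:
            continue
        match = _COUNT_ITEM.match(item)
        if match:
            counts[match.group(1)] = int(match.group(2))
        elif item.startswith("!"):
            exclude.append(item[1:])
        else:
            include.append(item)
    return Spec(counts, include, exclude)


print(parse_spec("@disk:3,SE1,!SE2,!SE3"))
```

Full SE names such as ALICE::CERN::EOS contain double colons and no trailing number, so they fall through to the include or exclude lists rather than being mistaken for a replica count.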
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >cp
at least 2 arguments are needed : src dst
the command is of the form of (with the strict order of arguments):
cp args src dst
...
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >pwd
/alice/cern.ch/user/n/nhardi/
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >!pwd
/home/nhardi
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >ls -l input.jdl
-rwxr-xr-x nhardi nhardi 3748 Feb 04 12:48 input.jdl
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >!ls -l input.jdl
ls: cannot access 'input.jdl': No such file or directory
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >cp input.jdl file://input.jdl
jobID: 1/1 >>> Start
jobID: 1/1 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 215.51 B/s MESSAGE: [SUCCESS]
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >!ls -l input.jdl
-rw-r--r-- 1 nhardi nhardi 3748 Feb 25 11:28 input.jdl
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >!ls -l /tmp/local_file.jdl
-rw-r--r-- 1 nhardi nhardi 3748 Feb 25 11:28 /tmp/local_file.jdl
You can also use the commands cat, less and vim to view and edit files directly on the Grid. After editing a file, the updated version is uploaded automatically. A backup version of the file is kept under the same name as the original one plus a tilde (~) character.
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >cat input.jdl
jobID: 1/1 >>> Start
jobID: 1/1 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 64.61 B/s MESSAGE: [SUCCESS]
LPMJobTypeID = "20351";
InputDataListFormat = "xml-single";
MasterJobId = "1766663504";
PWG = "COMMON";
LegoResubmitZombies = "1";
ValidationCommand = "/alice/cern.ch/user/a/aliprod/QA/validation_merge.sh";
JDLPath = "/alice/cern.ch/user/a/aliprod/QA/QA_merge.jdl";
LPMChainID = "718";
...
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >vim input.jdl
jobID: 1/2 >>> Start
jobID: 1/2 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 53.70 KiB/s MESSAGE: [SUCCESS]
jobID: 2/2 >>> Start
200225 11:40:09 17580 cryptossl_X509CreateProxy: Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=nhardi/CN=801402/CN=Nikola Hardi
jobID: 2/2 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 4.35 KiB/s MESSAGE: [SUCCESS]
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >ls -l input.jdl*
-rwxr-xr-x nhardi nhardi 3749 Feb 25 11:40 input.jdl
-rwxr-xr-x nhardi nhardi 3748 Feb 04 12:48 input.jdl~
Uploading files to the Grid¶
To create a new file, just copy a local file to the Grid using the copy command cp. Specify a local file as the first parameter (source), and a remote file as the second parameter (destination). Note that the alien:// prefix and the %ALIEN Grid home directory alias are not required.
You can inspect details of the new file registered in the AliEn file catalogue with the stat command. By default, the cp command creates two replicas and registers them under one LFN in the catalogue.
AliEn[nhardi]:/alice/ >cp file:///tmp/local_file.jdl alien://%ALIEN/remote_file.jdl
jobID: 1/2 >>> Start
jobID: 1/2 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 6.35 KiB/s MESSAGE: [SUCCESS]
jobID: 2/2 >>> Start
jobID: 2/2 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 5.32 KiB/s MESSAGE: [SUCCESS]
AliEn[nhardi]:/alice/ >xrdstat remote_file.jdl
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >stat remote_file.jdl
File: /alice/cern.ch/user/n/nhardi/remote_file.jdl
Type: f
Owner: nhardi:nhardi
Permissions: 755
Last change: 2020-02-25 11:56:41.0 (1582628201000)
Size: 3748 (3.66 KB)
MD5: 6f605d1c2301e8c0753281e22179bd4b
GUID: 3ce6b710-57bd-11ea-8f07-e83935243962
GUID created on Tue Feb 25 11:54:47 CET 2020 (1582628087041) by e8:39:35:24:39:62
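One practical use of the stat output is checking that the uploaded file matches the local one. The sketch below computes the MD5 of a local file with the Python standard library and compares it with a value copied from the MD5 line; it assumes that this line is the plain hexadecimal MD5 digest of the file content. `md5_of_file` and `matches_catalogue` are illustrative helper names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_catalogue(path, catalogue_md5):
    """Compare the local digest with the value shown by stat (case-insensitive)."""
    return md5_of_file(path) == catalogue_md5.strip().lower()
```

For the upload shown above, one would call `matches_catalogue("/tmp/local_file.jdl", "6f605d1c2301e8c0753281e22179bd4b")`.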
Replicating files¶
For critical files (accessed from many jobs, or that have to be distributed around the world for performance or availability reasons) you can either
- create more replicas upfront, with cp file://... alien://<filename>@disk:5, or
- use the mirror command to increase the number of replicas for a file: mirror <filename> -S disk:5
Searching for files¶
Like in a UNIX file system, you can search for files in the catalogue. You can specify a path and a pattern for the files you are looking for. Regex patterns are also supported. See the find command help for more information.
Check the cp command help for more information on how to use filename patterns with the copy command and its -select and -name arguments.
WARNING: the find command in the legacy AliEn shell aliensh printed XML collections to stdout, while the JAliEn alien.py shell creates the output collection on the Grid directly.
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >find . *.jdl
/alice/cern.ch/user/n/nhardi/.input.jdl~
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy.jdl
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy_merge.jdl
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy_merge_final.jdl
/alice/cern.ch/user/n/nhardi/input.jdl
/alice/cern.ch/user/n/nhardi/input.jdl~
/alice/cern.ch/user/n/nhardi/minimal_example/.minimal.jdl~
/alice/cern.ch/user/n/nhardi/minimal_example/minimal.jdl
/alice/cern.ch/user/n/nhardi/minimal_example/minimal.jdl~
/alice/cern.ch/user/n/nhardi/minimal_example/minimal_any.jdl
/alice/cern.ch/user/n/nhardi/minimal_example/minimal_any.jdl~
/alice/cern.ch/user/n/nhardi/remote_file.jdl
AliEn[nhardi]:/alice/cern.ch/user/n/nhardi/ >find -r . .*jdl$
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy.jdl
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy_merge.jdl
/alice/cern.ch/user/n/nhardi/CharmTriggerEfficiency/output/CharmTriggerStudy_merge_final.jdl
/alice/cern.ch/user/n/nhardi/input.jdl
/alice/cern.ch/user/n/nhardi/minimal_example/minimal.jdl
/alice/cern.ch/user/n/nhardi/minimal_example/minimal_any.jdl
/alice/cern.ch/user/n/nhardi/remote_file.jdl
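The regex form excludes the editor backup files because the trailing `$` anchors the match at the end of the name. The snippet below reproduces that effect locally with Python's `re` module on a few of the names listed above. The catalogue evaluates the pattern on the server side, and its regex dialect may differ from Python's, so treat this only as an illustration of the anchoring.

```python
import re

# A few LFNs taken from the first find listing above.
names = [
    "/alice/cern.ch/user/n/nhardi/.input.jdl~",
    "/alice/cern.ch/user/n/nhardi/input.jdl",
    "/alice/cern.ch/user/n/nhardi/input.jdl~",
    "/alice/cern.ch/user/n/nhardi/remote_file.jdl",
]

pattern = re.compile(r".*jdl$")

# Names ending in '~' do not end in 'jdl', so the '$' anchor rejects them.
matches = [name for name in names if pattern.search(name)]
print(matches)
```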
MetaData catalogue¶
Files can be associated with custom tags; one place in particular where this feature is used is calibration objects. To see the metadata associated with a file you can use the showTagValue command.
Advanced usage¶
The alien.py command prints JSON output if the -json argument is used. If used for the main invocation (alien.py -json command) it is enabled globally for the whole session, but it can also be used per command (alien.py command -json). The same applies in the shell mode.
- Unknown commands will be automatically passed for execution to the shell.
- Multiple commands, delimited by ; or \n, can be passed as input.
- If the argument of alien.py is a file, it will be parsed as input.
- cp is aware of globbing (*).
- Site administrators can use commands like: listSEDistance, listSEs, getSE, queryML
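JSON output is mostly useful when alien.py is driven from a script. The following sketch runs a command with `subprocess` and parses its standard output as JSON without assuming anything about the structure of the answer. `run_json` is a hypothetical helper; the example invocation only runs if an `alien.py` executable is found on the PATH.

```python
import json
import shutil
import subprocess


def run_json(argv, timeout=60):
    """Run a command and return (exit code, parsed JSON or None)."""
    proc = subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout, check=False)
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # The command printed something that is not JSON (or nothing at all).
        data = None
    return proc.returncode, data


if __name__ == "__main__" and shutil.which("alien.py"):
    # Global form of the flag, as described above.
    code, answer = run_json(["alien.py", "-json", "pwd"])
    print(code, answer)
```

Returning `None` instead of raising keeps the caller in control when a command does not produce JSON.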
Environment variables¶
Basic usage¶
- ALIENPY_PROMPT_DATE: enable printing of the date in the alien.py shell prompt
- ALIENPY_PROMPT_CWD: enable printing of the cwd in the alien.py shell prompt
- ALIENPY_NO_CWD_RESTORE: do not restore the last (saved) session cwd
Advanced usage¶
- ALIENPY_DEBUG: enable DEBUG output to the logfile
- ALIENPY_DEBUG_FILE: set to a new filepath; defaults to ~/alien_py.log
- ALIENPY_JSON: all output will be in the json format. N.B.!! client-side implementations have no json output. Use the -json argument for per-command usage.
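These variables can also be set from a Python script that launches alien.py as a child process. The sketch below builds such an environment; the value "1" used to switch debugging on is an assumption (this page only says the variable enables the feature), and `alienpy_env` is a hypothetical helper name.

```python
import os


def alienpy_env(debug_file=None, base=None):
    """Return a copy of the environment with alien.py debugging switched on."""
    env = dict(os.environ if base is None else base)
    env["ALIENPY_DEBUG"] = "1"  # assumed value; any documented "enabled" value would do
    if debug_file:
        env["ALIENPY_DEBUG_FILE"] = debug_file
    return env


env = alienpy_env(debug_file="/tmp/alien_py_debug.log", base={})
print(env)
# It could then be passed on, e.g. subprocess.run(["alien.py", "pwd"], env=env)
```

Working on a copy leaves the environment of the calling process untouched.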
Expert usage¶
- ALIENPY_TIMECONNECT: enable timing of connections (it will be shown in the log file)
- ALIENPY_KEEP_META: for cp operations, keep the metafile used for download operations; it will be found in $TMPDIR
- ALIENPY_TIMEOUT: connection timeout when waiting for the server answer
- ALIENPY_NO_STAGGER: disable the staggered socket connection to the server
- ALIENPY_CONNECT_TRIES: try this many times to connect before giving up (3)
- ALIENPY_CONNECT_TRIES_INTERVAL: wait this many seconds between tries (0.5)
- ALIENPY_JCENTRAL: set the remote endpoint
- ALIENPY_JCENTRAL_PORT: set the remote endpoint port
API usage¶
Internal Python interpreter¶
alien.py can also be used from within a Python environment.
The quickest option is alien.py term:
alien.py term
Welcome to the ALICE GRID - Python interpreter shell
support mail: adrian.sevcenco@cern.ch
AliEn seesion object is >jalien< ; try jalien.help()
>>> jalien.help()
Methods of AliEn session:
.run(cmd, opts) : alias to SendMsg(cmd, opts); It will return a RET object: named tuple (exitcode, out, err, ansdict)
.ProcessMsg(cmd_list) : alias to ProcessCommandChain, it will have the same output as in the alien.py interaction
.wb() : return the session WebSocket to be used with other function within alien.py
>>> jalien.ProcessMsg('pwd')
/alice/cern.ch/user/a/asevcenc/
0
High level/Basic API usage¶
The class AliEn is a helper that makes it easy to establish a websocket connection to the central services.
The basic usage is shown below:
import alienpy.alien as alien
alien.setup_logging()
j = alien.AliEn()
j.ProcessMsg('whoami -v')
Instantiating the class establishes the connection automatically.
The help() method shows the method descriptions.
Low level/Advanced API usage¶
With the exception of functions designed to be used exclusively within alien.py (but there are no hard limits), all user-facing functions return a RET object:
from typing import NamedTuple

class RET(NamedTuple):
    exitcode: int = -1
    out: str = ''
    err: str = ''
    ansdict: dict = {}
A RET object can be passed to
retf_print(ret_info: RET, opts: str = '') -> int
which returns the exitcode and prints the out and err messages.
The message printing/logging (opts string) can be steered by:
- json: it will pretty print the json and return the exitcode
- if exitcode != 0 (that means error and there is NO stdout):
    - info/warn/err/debug: the error message will be logged to the corresponding logging facilities
    - noerr/noprint: no error message will be printed
- noout/noprint: no output message will be printed
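The option handling described above can be imitated with a small local stand-in, which is handy for testing code that consumes RET objects without a Grid connection. This is not the alienpy implementation: `print_ret` is a hypothetical function, it treats `opts` as a space-separated list of words (an assumption), and it leaves out the info/warn/err/debug logging options.

```python
import json
import sys
from typing import NamedTuple


class RET(NamedTuple):
    """Local copy of the result tuple shown above."""
    exitcode: int = -1
    out: str = ''
    err: str = ''
    ansdict: dict = {}


def print_ret(ret: RET, opts: str = '') -> int:
    """Simplified stand-in for retf_print: print out/err and return the exit code."""
    flags = set(opts.split())
    if 'json' in flags:
        # Pretty print the answer dictionary and return the exit code.
        print(json.dumps(ret.ansdict, indent=2))
        return ret.exitcode
    if ret.exitcode != 0 and ret.err and not flags & {'noerr', 'noprint'}:
        print(ret.err, file=sys.stderr)
    if ret.out and not flags & {'noout', 'noprint'}:
        print(ret.out)
    return ret.exitcode


code = print_ret(RET(0, '/alice/cern.ch/user/n/nhardi/'))
```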
Connection is established by:
InitConnection(token_args, use_usercert: bool = False, localConnect: bool = False) -> websockets.client.WebSocketClientProtocol
- token_args are token arguments to be passed to the token command for the token creation IF the connection is created with the user certificate,
- use_usercert, if True, will force connection re-initialization with the user certificate,
- localConnect (WIP), if True, will try to connect to a local socket of a host-level websocket proxy service (long-lived websocket endpoint)