
ALICE Grid Monitoring with MonALISA

When dealing with a worldwide distributed system such as the ALICE Grid, you have to take into account a wide variety of platforms, software and, consequently, error conditions. To quickly understand what is happening in a system of this scale, the monitoring must provide a global view of the entire system.

It is important to be able to correlate the evolution of various monitored parameters across different Grid sites, or in relation to the parameters of the central services. In addition, the monitoring system must be non-intrusive and accurate, and it should provide both a historical and a near real-time view of the Grid's status and performance.

Based on these requirements, the MonALISA framework was chosen to monitor the entire JAliEn Grid system. Almost all JAliEn components are currently monitored, as shown in the table below:

Component group     Monitored items
Central Services    Task Queue, Information Service, Optimizers, API Service, etc.
Site Services       Job Agents, Cluster Monitor, Computing and Storage Elements
LCG Services        services running on the VOBoxes
Jobs                job status and resource usage
Network             inter-site and intra-site traffic

Monitoring Architecture in JAliEn

JAliEn monitoring closely follows the MonALISA architecture. Each JAliEn service, including the Job Agent, is instrumented with ApMon (the Perl and C++ versions), which regularly sends monitoring data to the local MonALISA service running at the site. There, data from all services, jobs and nodes is aggregated, and a site profile is generated with a 2-minute resolution. The on-site MonALISA services keep a short, in-memory-only history of each received or aggregated parameter, which can be queried with the MonALISA GUI client. Only the aggregated data is collected by the MonALISA Repository for long-term histories.
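To make the aggregation step more concrete, here is a minimal Python sketch of what a site-level aggregator could look like. It is not MonALISA's actual implementation: the class name `SiteAggregator`, the choice of averaging the samples in each bin, and the size of the in-memory history (`HISTORY_BINS`) are hypothetical. Only the 2-minute bin width and the "keep a short history, forward only aggregates" behaviour come from the description above.

```python
from collections import defaultdict, deque

BIN_SECONDS = 120   # 2-minute site-profile resolution, as stated in the text
HISTORY_BINS = 30   # hypothetical: keep the last 30 aggregated bins (~1 hour) in memory


class SiteAggregator:
    """Hypothetical site-level aggregator; not the real MonALISA code."""

    def __init__(self, bin_seconds=BIN_SECONDS, history_bins=HISTORY_BINS):
        self.bin_seconds = bin_seconds
        # Raw samples still waiting for their bin to close: (bin_start, param) -> values
        self.pending = defaultdict(list)
        # Short, in-memory-only history per parameter; old bins fall off the left end
        self.history = defaultdict(lambda: deque(maxlen=history_bins))

    def receive(self, timestamp, param, value):
        """Store one sample sent by an instrumented service, job or node."""
        bin_start = int(timestamp) // self.bin_seconds * self.bin_seconds
        self.pending[(bin_start, param)].append(value)

    def flush(self, now):
        """Aggregate every bin that ended at or before `now`.

        Returns the aggregates, i.e. the only data that would be forwarded
        to a long-term repository in this sketch.
        """
        closed = {}
        ready = sorted(k for k in self.pending if k[0] + self.bin_seconds <= now)
        for key in ready:
            bin_start, param = key
            values = self.pending.pop(key)
            aggregate = {"avg": sum(values) / len(values), "count": len(values)}
            self.history[param].append((bin_start, aggregate))
            closed[key] = aggregate
        return closed


if __name__ == "__main__":
    agg = SiteAggregator()
    agg.receive(0, "cpu_usage", 10)
    agg.receive(60, "cpu_usage", 20)
    agg.receive(130, "cpu_usage", 50)  # falls into the next bin, still open at now=125
    print(agg.flush(now=125))  # {(0, 'cpu_usage'): {'avg': 15.0, 'count': 2}}
```

Bounding each parameter's history with `deque(maxlen=...)` mirrors the idea that on-site services only keep a short history in memory, while the dictionary returned by `flush` stands for the aggregated data that a central repository would store long-term.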


Deployment and Configuration

For JAliEn monitoring, MonALISA is packaged and prepared for installation by the JAliEn Build and Test System and deployed on CVMFS.

MonALISA configuration files are generated automatically from the JAliEn LDAP at startup. If the LDAP has no MonALISA entry for the site, MonALISA will not start.
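The start-up rule can be illustrated with a minimal Python sketch. It assumes the LDAP lookup has already been turned into a plain dictionary keyed by site name; the function names, the `MissingSiteEntry` exception, the attribute names and the `key=value` output format are all hypothetical and do not reflect the real JAliEn configuration generator or MonALISA's actual file layout.

```python
class MissingSiteEntry(Exception):
    """No MonALISA entry for the site in LDAP: MonALISA must not start."""


def generate_monalisa_config(ldap_entries, site):
    """Build a config text for `site` from its (hypothetical) LDAP attributes."""
    # `ldap_entries` stands in for an LDAP query result: site name -> attributes.
    entry = ldap_entries.get(site)
    if entry is None:
        raise MissingSiteEntry(f"no MonALISA entry for site {site!r} in LDAP")
    # Hypothetical key=value format, sorted so the output is deterministic.
    return "".join(f"{key}={value}\n" for key, value in sorted(entry.items()))


def start_monalisa(ldap_entries, site, write_config, launch):
    """Regenerate the configuration at every start-up, then launch the service.

    If the site entry is missing, the exception propagates and `launch`
    is never called, mirroring the rule described above.
    """
    config = generate_monalisa_config(ldap_entries, site)
    write_config(config)
    launch()


if __name__ == "__main__":
    entries = {"SiteA": {"farm_name": "SiteA", "group": "alice"}}  # hypothetical data
    start_monalisa(entries, "SiteA", write_config=print, launch=lambda: print("started"))
```

Passing `write_config` and `launch` in as callables keeps the sketch testable without touching the file system or starting a real service.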

From then on, MonALISA is managed like any other AliEn service, using the following commands:

Action          Command
Start           ~$ alien StartMonaLisa
Stop            ~$ alien StopMonaLisa
Check status    ~$ alien StatusMonaLisa