How to Install a New SP Release on the Westgrid Cluster


This document is divided into three parts:
I – Installing the New Release
II – Installing the Objectivity Conditions Database
III – Validating the Installation


Part I – Installing the New Release


First, note that if you are aiming to upgrade to a “lettered” release, such as xx.yy.zzb, you will need to install the base release, xx.yy.zz, first, and then repeat these instructions for the “lettered” release.
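The ordering rule above can be sketched as a small helper. This is only an illustration: the function name install_order and the release number 12.34.56b are hypothetical, and the sketch assumes a “lettered” release is always a dotted numeric release followed by a single lowercase letter.

```python
import re

# Assumption: a "lettered" release is a dotted numeric release followed by
# exactly one lowercase letter, e.g. "12.34.56b" (a made-up release number).
_RELEASE_RE = re.compile(r"^(\d+(?:\.\d+)*)([a-z]?)$")


def install_order(release):
    """Return the releases to install, in order, to end up at `release`."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised release name: {release!r}")
    base, letter = match.groups()
    # A lettered release needs its base release installed first.
    return [base, release] if letter else [base]


print(install_order("12.34.56b"))  # ['12.34.56', '12.34.56b']
print(install_order("12.34.56"))   # ['12.34.56']
```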

Step 1: Create an AFS token for connecting to the SLAC AFS drives. Use the command “klog -principal username -cell slac.stanford.edu”, where username is your own SLAC account name. You will be asked for that account's password.

Step 2: Set the environment variable for the remote distribution drives at SLAC. Type the following: “setenv BFDISTr /afs/slac.stanford.edu/g/babar/dist”.

Step 3: Identify the exact release number and architecture name that you wish to set up for running simulation production. You can see a list of all available releases at $BFDISTr/releases. You can see the supported architectures by looking for spec files: “ls -a $BFDISTr/releases/xx.yy.zz/.spec*”.

Step 4: Import the entire release, and keep a log file of the progress for future debugging, if necessary. Use the command “importrel -pa xx.yy.zz >& import_rel_xx.yy.zz.log &”. This is a BABAR-specific utility that will be available on your system if you have previously been running earlier versions of the BABAR SP software. This step may take several hours to complete, depending on the level of network traffic.

Step 5: Import the machine-architecture-specific binaries associated with the cluster that you are working on. Use the command “importarch -p xx.yy.zz Linux24SL3 >& importarch_xx.yy.zz.log &”. This is another BABAR-specific utility that will be available on your system if you have previously been running earlier versions of the BABAR SP software. This step may also take several hours to complete, depending on the level of network traffic.
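Because the release number appears several times in the Step 4 and Step 5 commands, it is easy to mistype one occurrence. A minimal sketch that builds both command lines from a single release string is shown here. The commands and flags are copied from the steps above; the helper name import_commands is hypothetical, and the function only builds the strings, it does not run anything.

```python
def import_commands(release, arch):
    """Build the Step 4 and Step 5 command lines for one release.

    The command names, flags and log-file names are copied from the text;
    this function only formats strings and does not execute them.
    """
    return [
        f"importrel -pa {release} >& import_rel_{release}.log &",
        f"importarch -p {release} {arch} >& importarch_{release}.log &",
    ]


for line in import_commands("xx.yy.zz", "Linux24SL3"):
    print(line)
```

Replace xx.yy.zz with the real release number before pasting the printed lines into a csh session.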

Step 6: Before you can install the release you have just downloaded, a couple of site-specific modifications are necessary:
>cd $BFDIST/releases
>cp -r anyrelease/tcl8.0 xx.yy.zz/ (copies the tcl8.0 libraries into the new release)
>cd $BFROOT/etc
>cp -r anyrelease xx.yy.zz (copies the contents of any existing release directory here to the new release)

Step 7: Build the release executables and directory structure from the files you have just imported. Use the following commands: “cd $BFDIST/releases/xx.yy.zz” to change to the new release directory. Then type “gmake siteinstall” to begin the build process. This will take quite some time to complete, so you might want to go away and come back later to check the results.

Step 8: Log in as the production user, and create a release workdir the same way it is done at SLAC: “newrel -t xx.yy.zz spxxyyzz_event” (spxxyyzz_event is the convention used in naming these directories on Westgrid). Then also run the following: “addpkg workdir”, “gmake workdir.setup”, “gmake installdirs”.

Step 9: Now you are ready to follow the usual instructions to start working in a test release directory:
>cd ~/release/spxxyyzz_event
>srtpath [enter] [enter]
>source setboot_con (this sets the environment variable $OO_FD_BOOT)

Part II – Installing the Objectivity Conditions Database



Step 1: Now it is time to install the new Objectivity conditions databases on your local cluster. First, look at the HyperNews to find out which snapshot is being used for this release. For example, for release 18.10d we are using the snapshot “SP8::sixth”. Also identify the date of the snapshot, i.e. 2006mmdd.

Step 2: Now use the “babar” account to download the snapshot onto the local machine, in preparation for importing it. This only transfers the files; they won't be usable yet.
>cd /export/BABAR/objyserv3/objy/databases/snapshots/sp8/
>mkdir 2006mmdd (use the date of the snapshot)
Edit the file bbftp.command to use the right snapshot date.
>cp bbftp.command 2006mmdd/
>./bbftp.command <- This will start the transfer, which takes several hours to finish.
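The snapshot directory name is just the snapshot date written as yyyymmdd. A small sketch that derives the full path from a date, so that the same string can be reused when editing bbftp.command and later in SNAPSHOT_DIR, might look like this. The helper name snapshot_dir and the example date are hypothetical; the root path is the one given in Step 2.

```python
from datetime import date

# Directory that holds the SP8 snapshots, as given in Step 2.
SNAPSHOT_ROOT = "/export/BABAR/objyserv3/objy/databases/snapshots/sp8"


def snapshot_dir(snapshot_date):
    """Return the snapshot directory for a given date (named yyyymmdd)."""
    return f"{SNAPSHOT_ROOT}/{snapshot_date.strftime('%Y%m%d')}"


# Example with a made-up snapshot date of 31 January 2006.
print(snapshot_dir(date(2006, 1, 31)))
# /export/BABAR/objyserv3/objy/databases/snapshots/sp8/20060131
```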

Step 3: Now log in as the “babarwg” account to begin the process of importing the snapshot into usable Objectivity files. It is a good idea to check that at least 20% of the disk space on the /export/BABAR volume is free. Then perform the following steps:
>cd ~/release/spxxyyzz_event/
>srtpath [enter] [enter]
Edit the .bbobjy file to contain a new FDID number, usually one higher than the previous one, e.g. 8135 -> 8136.
Edit setboot_con to contain the same new FDID number.
>gmake database.config (verifies that the oo lockserver is running, and that you have set up the Federation ID correctly)
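The two checks in this step, free disk space and the new FDID number, can be sketched as follows. This is an illustration only: the helper names are hypothetical, the 20% threshold and the example FDID 8135 come from the text, and the sketch does not edit .bbobjy or setboot_con because their file formats are not described here.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction of the volume containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_enough_space(path, minimum=0.20):
    """True if at least `minimum` (20% by default) of the volume is free."""
    return free_fraction(path) >= minimum


def next_fdid(previous):
    """The new FDID is usually one higher than the previous one."""
    return previous + 1


print(next_fdid(8135))  # 8136

# Only meaningful on the cluster itself, where this volume exists.
volume = "/export/BABAR"
if os.path.exists(volume):
    print(f"{free_fraction(volume):.0%} free, enough: {has_enough_space(volume)}")
```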

Step 4: Download the schema from SLAC. Make sure you have an AFS token before continuing.
>gmake database.import
>gmake database.load BYPASS_CONDITIONS_LOAD=yes BYPASS_CONFIG_LOAD=yes

Step 5: Now you are ready to start importing the snapshot files. It will take several hours to complete.
>gmake database.load SNAPSHOT_DIR=/export/BABAR/objyserv3/objy/databases/snapshots/sp8/2006mmdd/

Part III – Validating the Installation


When it is time to perform validation, you first need to identify the run numbers to use. You can find them on the “validation info” webpage.

Step 1: You can use the spbuild command to build the runs:
>spbuild -j valid runnumber1
>spbuild -j valid runnumber2
or, for a range of runs:
>spbuild -j valid runnumber1-runnumbern
The “-j valid” option flags the run directories so that they are not merged or exported later by the spmerge and spexport tools. It is no longer strictly necessary to use the -j valid option, but it doesn't hurt either.
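As a sketch of how the run argument is put together, the helper below formats either a single run or a first-last range and prepends the spbuild options quoted above. The function names and the run numbers 1001 and 1005 are hypothetical; the code only builds the command string and does not call spbuild.

```python
def run_argument(first, last=None):
    """Format a single run ("1001") or an inclusive range ("1001-1005")."""
    if last is None or last == first:
        return str(first)
    if last < first:
        raise ValueError("last run must not be smaller than first run")
    return f"{first}-{last}"


def spbuild_command(first, last=None, flag_valid=True):
    """Build the spbuild command line; "-j valid" is optional (see above)."""
    parts = ["spbuild"]
    if flag_valid:
        parts += ["-j", "valid"]
    parts.append(run_argument(first, last))
    return " ".join(parts)


print(spbuild_command(1001))        # spbuild -j valid 1001
print(spbuild_command(1001, 1005))  # spbuild -j valid 1001-1005
```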

Step 2: The validation jobs are submitted normally:
>spsub -y -t simu runnumber1-runnumbern
Alternatively, use the subbuilt.pl script.

To check that the jobs have executed properly, there is a script that compares their output against the SLAC standard files. First, however, you must copy the output .root files to the appropriate directory at SLAC in order to run the comparison.

Step 3: Create a .tar file of the runs that you want to validate at SLAC.
>cd $BFROOT/prod/log/allruns
Use the following command to create stripped-down tar files of the validation runs:
>saveDir –nodb -t ./ runnumber1-runnumbern

Step 4: Use bbcp or bbftp to copy these archives to the following directory at SLAC: $BFROOT/prod/log/validation/westgrid/

Step 5: Log in to the SLAC servers (for example yakut) using any user name, e.g. asgeirss. Go to the directory $BFROOT/prod/log/validation/. There you will find directories named after all of the SP sites; cd to westgrid.

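Step 6: Extract the tar files one at a time (tar -xf accepts only a single archive, so a bare *.tar fails when there are several), and then remove the tar files, leaving only the run directories:
>foreach f (*.tar)
>tar -xf $f
>end
>rm *.tar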
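If you prefer to script this step, the following sketch does the same thing with Python's standard tarfile module: it extracts every .tar file in a directory and deletes each archive only after it has been extracted successfully. The function name unpack_archives is hypothetical, and the sketch assumes the archives come from your own site and are trusted.

```python
import glob
import os
import tarfile


def unpack_archives(directory):
    """Extract every .tar file in `directory`, then delete the archive.

    Returns the names of the archives that were unpacked. Only use this on
    archives you trust, since extraction writes whatever paths they contain.
    """
    unpacked = []
    for archive in sorted(glob.glob(os.path.join(directory, "*.tar"))):
        with tarfile.open(archive) as tar:
            tar.extractall(path=directory)
        # Reached only if extraction did not raise an exception.
        os.remove(archive)
        unpacked.append(os.path.basename(archive))
    return unpacked


if __name__ == "__main__":
    # Run from inside the westgrid validation directory.
    print(unpack_archives("."))
```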

Step 7: Compare the runs against the SLAC standards. SLAC runs are labelled V01 for Intel, and V02 for AMD processors. Westgrid uses Intel Xeon processors, so compare to V01.

> cd $BFROOT/prod/log/validation/
> validate.csh <first run> <last run> slac 01 westgrid 01
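The choice of SLAC reference label can be written down as a small lookup. In this sketch the mapping (01 for Intel, 02 for AMD) comes from Step 7, while everything else is an assumption made for illustration: the helper name, the example run numbers, the reading that the label after “slac” is the reference version, and the site label “01”, which is simply copied from the command above.

```python
# SLAC reference labels from Step 7: V01 for Intel, V02 for AMD.
SLAC_REFERENCE = {"intel": "01", "amd": "02"}


def validate_command(first_run, last_run, site, cpu, site_version="01"):
    """Build the validate.csh command line for a site and processor type.

    Assumption: the label after "slac" selects the reference set, and the
    site's own label stays "01" as in the example command in the text.
    """
    slac_version = SLAC_REFERENCE[cpu.lower()]
    return (
        f"validate.csh {first_run} {last_run} "
        f"slac {slac_version} {site} {site_version}"
    )


print(validate_command(1001, 1005, "westgrid", "Intel"))
# validate.csh 1001 1005 slac 01 westgrid 01
```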

Now check that any discrepancies reported are acceptable, i.e. that they are also seen by most of the other sites and don't indicate a problem with your installation.
Once the SP coordinator has confirmed that your validation results are OK, you are ready to resume production!