Sunday, March 25, 2007

OSCAR Bench, a Summer of Code proposal

Contact Info:
Name: James Elliott
Email: jje011@latech.edu
Email: pottyotter@gmail.com

Project Title: OSCAR Bench

Benefits to OSCAR
A common question on the OSCAR mailing list is how to benchmark a cluster. Benchmarks not only give an idea of computing capacity, they can also highlight potential problems. This package will create a standard way to benchmark an OSCAR cluster, benefiting both users and developers: users get an estimate of their cluster's computing capacity, and developers may see areas where OSCAR can improve.

Synopsis:
Currently OSCAR can install a cluster and determine whether the cluster is usable. What OSCAR cannot do is give an estimate of how powerful the cluster is. I propose to integrate ATLAS, HPL, and the new DARPA benchmarks into OSCAR as a package called OSCAR Bench. This package will let users easily install these benchmarks, provide a mechanism to tune them, and then run them and report the results. Ideally these results will be automatically submitted to a database so that users can see other results along with the configurations that led to them. Since every cluster is different, simply comparing one result to another is insufficient, which is why knowing the configuration of the benchmarking program is imperative.

Deliverables:
1. An OSCAR package (opkg) for each program containing the source code for the benchmark, as well as prerequisite software dependencies.
2. Documentation so that anyone who later needs to maintain the code will not be confused.
3. Some form of online repository where results may be submitted.

Project Details:
I will create a simple interface that allows the user to select which benchmarks to perform. Based on the user's selection, any prerequisite software will be installed. Next, the cluster will be scanned to determine key information; exactly what is gathered will vary with the chosen benchmarks. For HPL, for example, I will gather the CPU count and the memory of the cluster. If Ganglia is installed its database could be used; if not, I will provide a method for finding the information. Using this information I will then configure the benchmarks. The configuration will be presented to the user, who may change settings as desired. The configuration data will also be stored in a database table or a file, so the user can easily run again with the same settings; the same table/file can be parsed to provide the fields used for user-configured benchmarks. Then the benchmark will be run. Running the job will require some logic, since I will not know which MPI implementation the user has chosen. I would also like to add a progress bar if possible; I know this could be done with HPL, since it could be updated each time a test completes. Once the job is completed, I will parse the output and present the results to the user. I would also like to give the option to upload the results to a database that may be hosted by the OSCAR group.
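
To make the cluster scan concrete, here is a minimal sketch of the kind of per-node probe that could be run on every node (for example through C3's cexec, or plain ssh) when Ganglia is not available. The script name and the output format are only placeholders, not a final design:

    #!/usr/bin/env perl
    # node-probe.pl -- hypothetical per-node probe that collects the CPU
    # count and memory the benchmark configuration step needs.
    use strict;
    use warnings;
    use Sys::Hostname;

    # Count processors from /proc/cpuinfo.
    my $cpus = 0;
    open my $cpu_fh, '<', '/proc/cpuinfo' or die "cannot read /proc/cpuinfo: $!";
    while (<$cpu_fh>) {
        $cpus++ if /^processor\s*:/;
    }
    close $cpu_fh;

    # Read total memory (in kB) from /proc/meminfo.
    my $mem_kb = 0;
    open my $mem_fh, '<', '/proc/meminfo' or die "cannot read /proc/meminfo: $!";
    while (<$mem_fh>) {
        if (/^MemTotal:\s+(\d+)\s+kB/) {
            $mem_kb = $1;
            last;
        }
    }
    close $mem_fh;

    # One line per node; the head node aggregates these into the config table/file.
    print hostname(), " cpus=$cpus mem_kb=$mem_kb\n";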

Project Schedule:
Initially I will need to spend time with my mentor defining exactly which benchmarks we want and the best way to break them into OSCAR packages. ATLAS offers a good automatic tuning script, but it also offers many prebuilt binaries, so I will need to find out whether we want to build from source even when a binary is already available. This decision also influences the packaging approach: if everything is built from source, I will only need one RPM per program; otherwise I will need to create architecture-specific RPMs. I believe this only affects ATLAS. After deciding on the packaging approach, I will determine the steps and information needed to install, configure, and run each benchmark. That information will be used to create the scripts in the OSCAR package (a rough sketch of one such script follows the list below):
Configurator
Rpm-install
Post-install
Setup
Post-clients
Post-uninstall
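
As a rough illustration of what one of these scripts might contain, a post-install step could simply check that a benchmark actually built before OSCAR reports the package as installed. This is only a sketch; the install paths and library names are assumptions, not the layout the final package will use:

    #!/usr/bin/env perl
    # Hypothetical post-install check: confirm the ATLAS build produced its
    # libraries before the package is reported as installed. Paths are placeholders.
    use strict;
    use warnings;

    my @expected = ('/opt/atlas/lib/libatlas.a', '/opt/atlas/lib/libcblas.a');

    for my $lib (@expected) {
        unless (-e $lib) {
            die "post-install: missing $lib -- ATLAS build appears to have failed\n";
        }
    }
    print "post-install: ATLAS libraries found, package looks healthy\n";
    exit 0;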

I will use the OSCAR package manager to handle the installation of the benchmarks. Using the package manager lets me enforce that dependency software is installed. I will then need to write functions to configure and run each benchmark; later these functions will be hooked to GUI buttons/panels. I plan to write this as a command-line tool and then wrap a GUI around the command-line calls, so the package can be used either from a graphical environment or from the command line.
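
To show how the command-line-first design could look, the sketch below exposes a few subcommands that the GUI would later shell out to, including a simple fallback for finding the MPI launcher since the user's MPI implementation is not known in advance. The oscar-bench name, the subcommands, and the helper functions are all hypothetical:

    #!/usr/bin/env perl
    # oscar-bench -- hypothetical command-line front end; the GUI would simply
    # invoke these same subcommands.
    use strict;
    use warnings;

    my %commands = (
        configure => \&do_configure,   # probe the cluster and write a config file
        run       => \&do_run,         # launch the benchmark through MPI
        results   => \&do_results,     # parse and print the last set of results
    );

    my $cmd   = shift @ARGV || '';
    my $bench = shift @ARGV || '';

    if (my $handler = $commands{$cmd}) {
        $handler->($bench);
    } else {
        die "usage: oscar-bench [configure|run|results] <benchmark>\n";
    }

    sub do_configure { my ($b) = @_; print "configuring $b...\n"; }

    sub do_run {
        my ($b) = @_;
        # The MPI launcher is not known in advance, so fall back through the
        # likely candidates rather than hard-coding one implementation.
        my ($mpirun) = grep { system("which $_ >/dev/null 2>&1") == 0 }
                       qw(mpirun mpiexec);
        die "no MPI launcher found in PATH\n" unless $mpirun;
        print "would run: $mpirun ... $b\n";
    }

    sub do_results { my ($b) = @_; print "parsing results for $b...\n"; }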

Project Timeline:
Week 1:
Clearly define the scope of the project (and update this timeline accordingly)
Determine the preferred way to package the benchmarks
Set up needed repositories and obtain login information
Define a regular schedule for collaborating with my mentor
Week 2:
Start with ATLAS: determine the required software and build steps, and begin creating an OSCAR package for it
Week 3:
Create scripts to handle the configuration* of ATLAS
Review the OSCAR package installation and configuration* scripts for shortcomings; revise any scripts as needed
Week 4:
Determine the required software and build steps for HPL, and begin creating an OSCAR package for it
Week 5:
Create scripts to handle the configuration* and running of HPL
Week 6:
Determine the required software and build steps for the DARPA benchmarks, and begin creating a package (maybe packages) for them
Create scripts to handle the configuration* and running of the DARPA benchmarks
Weeks 7-8:
Build a GUI panel that lists all installed benchmarks, with 'Run', 'Configure', 'View Results', and 'Send Results' buttons for each (very simple)
Hook these buttons to the underlying command-line calls, starting with 'Run' (a small sketch follows this timeline)
Build a configuration panel that lets the user change the settings the smart configuration script determined
Add a panel to display the results of the benchmark
Provide a way to upload those results (if time permits)
Weeks 9-10:
Review the project, polish code, and test, test, test**
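
To illustrate how thin the weeks 7-8 GUI layer can stay, the panel could be little more than Perl/Tk buttons that shell out to the command-line tool sketched earlier; the oscar-bench command and the benchmark name here are still placeholders:

    #!/usr/bin/env perl
    # Minimal Perl/Tk panel: each button just re-uses the command-line tool,
    # so the GUI adds no logic of its own. All names are placeholders.
    use strict;
    use warnings;
    use Tk;

    my $benchmark = 'hpl';
    my $mw = MainWindow->new;
    $mw->title('OSCAR Bench');

    for my $action (qw(configure run results)) {
        $mw->Button(
            -text    => ucfirst $action,
            -command => sub { system('oscar-bench', $action, $benchmark) },
        )->pack(-side => 'left', -padx => 5, -pady => 5);
    }

    MainLoop;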


*Configuration should be smart, possibly running small tests to determine the correct values (such as the size of P and Q in HPL, or the problem size 'N' based on cluster memory).
**Testing will be done at every stage. Since the project is broken into distinct modules, by weeks 9-10 I can hopefully focus more on polishing and on verifying more architectures than I was able to test along the way.
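
As one example of what a "smart" first guess could look like for HPL, the parameters can be derived directly from the cluster information gathered earlier. The 80% memory fraction below is only a common rule of thumb (and in practice N would also be rounded to a multiple of the block size NB); the helper itself is hypothetical:

    #!/usr/bin/env perl
    # Hypothetical first-guess HPL tuning: pick N from available memory and a
    # roughly square P x Q process grid from the total CPU count.
    use strict;
    use warnings;

    sub hpl_guess {
        my ($total_mem_bytes, $total_cpus) = @_;

        # N: largest problem that fits in ~80% of memory at 8 bytes per double,
        # since HPL distributes an N x N matrix of doubles across the cluster.
        my $n = int(sqrt(0.80 * $total_mem_bytes / 8));

        # P x Q: the factor pair of the CPU count closest to square, with P <= Q.
        my ($p, $q) = (1, $total_cpus);
        for (my $i = 1; $i * $i <= $total_cpus; $i++) {
            next if $total_cpus % $i;
            ($p, $q) = ($i, $total_cpus / $i);
        }
        return ($n, $p, $q);
    }

    # Example: 8 nodes with 2 GB each, 16 CPUs total.
    my ($n, $p, $q) = hpl_guess(8 * 2 * 1024**3, 16);
    print "N=$n P=$p Q=$q\n";   # prints N=41448 P=4 Q=4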

Biography:
I am a senior in Computer Science at Louisiana Tech University. I began using OSCAR in March 2006. My work involved virtualization and cluster computing; not long into it I needed to determine the cost of virtualization, and that involved benchmarking. While it was easy to measure the cost of virtualization on a single computer, it was difficult to determine the cost to the cluster. Much of the difficulty I had was due to a lack of cluster and virtualization experience, but even with some cluster experience now, I still find benchmarking a cluster tedious and definitely in need of simplification. That is how I became involved with OSCAR and HPC. I enjoyed HPC enough that I quit my job and now work for the university setting up, maintaining, and supporting clusters. I plan to graduate in the fall of 2007 and enter graduate school in the winter. On the side I have helped some friends set up and run a small gaming site; I taught myself PHP and SQL to get that done, and recently I have been moving more toward Perl. I am a self-motivated person: I drive over an hour to get to school (I plan to move closer soon), and I somehow land in leadership positions. I want to get more involved in the open-source movement, and I look forward to working with the OSCAR developer team.