SC15 offers a plethora of content for those who wish to learn. Check out the following tutorials scheduled in the TACC booth!
Introduction to Data Analysis with Hadoop Streaming by Weijia Xu
Tuesday, 10:30-11:30am | Beginner | 60 minutes
Hadoop has become a popular choice for both large-scale data analysis and storage. Although Hadoop is natively implemented in Java, the Hadoop Streaming feature enables users to process data stored in HDFS with other programming languages. This tutorial gives attendees an introduction to Hadoop clusters and shows how to run analytic routines written in diverse programming languages on a Hadoop cluster through the Hadoop Streaming feature.
The tutorial will start with a 10-minute lecture introducing the basic ideas of the MapReduce programming model and an overview of a typical Hadoop cluster infrastructure. Attendees will then be given temporary training credentials to access the Hadoop cluster running at TACC. After logging onto the cluster, attendees will first be guided through the Hadoop Distributed File System (HDFS) and the prepared lab exercises, which include example code and sample data. The instructor will walk through an example Hadoop program and let attendees run their first Hadoop application as well as explore other examples. The session will continue with an introduction to the Hadoop Streaming API, with examples prepared in Java, Python, bash, and R. The session will conclude with attendees trying out the examples and/or developing a streaming job in their favorite programming language.
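To give a flavor of the streaming workflow, here is a minimal sketch of running a streaming job from the command line. The mapper/reducer script names, HDFS paths, and jar location are placeholders, not the tutorial's actual lab materials:

```bash
# Stage sample input into HDFS (paths are illustrative).
hadoop fs -mkdir -p /user/$USER/wordcount/input
hadoop fs -put sample.txt /user/$USER/wordcount/input

# Run a streaming job; mapper.py and reducer.py are hypothetical
# executable scripts, and could just as well be bash or R scripts.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /user/$USER/wordcount/input \
    -output /user/$USER/wordcount/output \
    -mapper  mapper.py \
    -reducer reducer.py \
    -file mapper.py -file reducer.py

# Inspect the results stored back in HDFS.
hadoop fs -cat /user/$USER/wordcount/output/part-*
```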
Introduction to Machine Learning using Spark in Wrangler by Anusua Trivedi
Thursday, 10:30-11:30am | Beginner/Intermediate | 60 minutes
There’s a variety of tools available for advanced analytics, namely Spark (and MLlib), R, scikit-learn, GraphLab, etc. How do we decide which one is best for our problem? It depends on the scale of the problem (i.e., the amount of data you are using). For this tutorial, we use Spark (MLlib) in Wrangler to explore different standard ML algorithms.
Wrangler has Spark (integrated with Hadoop/HDFS) already deployed. In this tutorial, we start with a general introduction to Spark and describe why and how it is used through examples. Next, we explain machine learning (ML) with Spark in Wrangler through an exploratory study of some basic ML algorithms.
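As a rough sketch of how MLlib work gets launched on a Spark-on-YARN cluster (the script name, data path, and resource settings below are illustrative placeholders, not the tutorial's exercises):

```bash
# Option 1: an interactive PySpark session on the cluster (YARN mode);
# executor sizes are illustrative.
pyspark --master yarn --executor-memory 4G --num-executors 8

# Option 2: submit a standalone MLlib script; kmeans_example.py and the
# HDFS data path are hypothetical placeholders.
spark-submit --master yarn \
    --executor-memory 4G --num-executors 8 \
    kmeans_example.py hdfs:///user/$USER/data/points.csv
```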
Software-Defined Visualization: Data Analysis for Current and Future Cyberinfrastructure by Paul Navratil
Wednesday, 12-1:30pm | Beginner | 90 minutes
The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization).
In this introductory tutorial, we present a primer on rasterization and ray tracing and an overview of (mostly) open-source software packages available to the open-science community, as well as hands-on experience with the fundamental techniques. We begin with a brief background of terms and concepts to ensure that all participants have a working knowledge of the material covered in the remainder of the tutorial. We then motivate the concepts through three application lightning talks that demonstrate the use of rasterization and ray tracing in actual domain applications. Finally, participants will apply the concepts in guided hands-on visualization labs using the TACC XSEDE resources Stampede and Maverick.
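As a small aside illustrating software-defined rendering (not part of the tutorial materials themselves), Mesa's CPU rasterizers can be selected purely through environment variables, which is handy on compute nodes that lack GPUs:

```bash
# Route OpenGL through Mesa's software rasterizer instead of a GPU,
# useful on compute nodes without graphics hardware.
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe   # or "swr" where the OpenSWR driver is built

# Check which renderer is actually in use before launching a vis tool.
glxinfo | grep "OpenGL renderer"
```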
Chameleon: a Large-Scale, Reconfigurable Experimental Environment for Cloud Research by Paul Rad
Tuesday, 1-2:30pm | Intermediate | 90 minutes
With its hardware strongly emphasizing research on cloud computing in Big Compute and Big Data, and capabilities spanning modes of usage from bare metal reconfiguration to production clouds, Chameleon is poised to become an important instrument in the toolbox of anybody working on HPC or cloud research. This tutorial will present the Chameleon project, teach the attendees how to sign up and use it to run experiments, showcase several types of advanced usage of the testbed to address specific areas of research, and teach how to contribute scientific software instruments such as framework implementations that others can then leverage in their research.
The overall goal of this tutorial is to introduce Chameleon to HPC users, specifically:
- Teach Chameleon basics: from provisioning and configuring bare metal resources to setting up simple monitored experiments in a hands-on session (a provisioning sketch follows this list);
- Demonstrate how to use existing Chameleon tools to easily set up experiments in active research areas, such as data analytics, cloud computing, high-performance networking and virtualization, and wide-area networking/SDN;
- Teach hands-on how attendees can use Chameleon to contribute frameworks, algorithms and other artifacts that could serve as a basis for further research.
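As a rough sketch of the provisioning step from the first bullet: Chameleon exposes a standard OpenStack interface, so reserving and booting a bare-metal node looks roughly like the following. The lease name, dates, image, and exact CLI flags are assumptions from the OpenStack/Blazar tooling of that era and may differ from the tutorial's instructions:

```bash
# Reserve a bare-metal node through the Blazar (formerly Climate)
# reservation service; dates and the lease name are placeholders.
climate lease-create --physical-reservation min=1,max=1 \
    --start-date "2015-11-17 13:00" --end-date "2015-11-17 15:00" my-lease

# Boot an instance on the reserved hardware from a Chameleon-provided image;
# the key name and reservation id are placeholders.
nova boot --flavor baremetal --image CC-CentOS7 \
    --key-name my-key --hint reservation=<reservation-id> my-node
```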
Hands-on with Lmod by Robert McLay
Wednesday, 2pm-3:30pm | Beginner/Intermediate | 90 minutes
Lmod is the software that “sets the table” for our users by allowing them to load the software packages they need to conduct their research. Lmod, a replacement for the TCL/C based environment module system, is part of the “secret sauce” that allows our users to do so in a way that protects them from many common mistakes. With Lmod, users cannot load mismatched combinations of compilers, libraries, and software tools.
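A quick taste of the protection described above (module names and versions are illustrative of a typical TACC stack):

```bash
# Load a compiler and an MPI stack built against it (versions are examples).
module load intel/15.0.2
module load mvapich2/2.1

# Swapping the compiler: Lmod deactivates modules that depend on the old
# compiler and reloads compatible builds instead of leaving a mismatch.
module swap intel gcc/4.9.1

# "ml" is Lmod's shorthand for "module".
ml list    # show currently loaded modules
ml avail   # show modules compatible with the current compiler/MPI pairing
```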
This tutorial will break down into three parts. The first part will be a brief introduction to Lmod, followed by a discussion of the new features added to Lmod. The second part will be a general question-and-answer session.
There has been an Lmod talk in the TACC booth for the past four years, and the audience has ranged from people who have never used Lmod to seasoned users who want to know about the latest features. Given this wildly diverse audience, the hands-on third part will be a “choose your own adventure” from a list of topics to appeal to beginner and intermediate users.
R+HPC by David Walling
Wednesday, 3:30-4pm | Beginner | 90 minutes
This tutorial will demonstrate how users are leveraging the popular programming language R on TACC systems. We’ll focus on the advantages of running R on HPC systems, such as access to the Intel compilers, MKL, GPU/automatic offloading, and methods for running R in parallel on a cluster.
The tutorial will be done in R and run on Stampede (or Wrangler, if we have vis portal integration by then) and will demonstrate how to launch parallel tasks on all cores of multiple nodes. Emphasis will be placed on monitoring those jobs to determine the computational efficiency of the code. Users will gain hands-on experience running R jobs in a shared HPC environment.
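A minimal sketch of what submitting an R job on a Slurm-scheduled TACC system looks like (the queue name, module names, and my_analysis.R are placeholders, not the tutorial's materials):

```bash
#!/bin/bash
#SBATCH -J r-example       # job name
#SBATCH -N 2               # nodes
#SBATCH -n 32              # total tasks
#SBATCH -p normal          # queue (placeholder)
#SBATCH -t 00:30:00        # wall-clock limit

# Load an MKL-linked R build; module names are illustrative.
module load intel Rstats

# my_analysis.R is a hypothetical script that uses an R parallel backend
# (e.g. the "parallel" or "snow" packages) to spread work across cores.
Rscript my_analysis.R
```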
Programming the Intel Xeon Phi by Carlos Rosales
Thursday, 1pm-2pm | Beginner | 60 minutes
Stampede was the first large scale deployment of a Many Integrated Core product in the world, and remains one of the largest such deployments. Nearly three quarters of the 10 Petaflops in TACC’s Stampede system are provided by Xeon Phi coprocessors, and the HPC group has extensive experience in porting and optimizing codes for the Phi. The Xeon Phi is an x86-based architecture which hosts its own Linux OS. Although it is capable of running most user codes with little porting effort, attaining optimal performance requires a detailed understanding of the possible execution models and the architecture. The Intel Xeon Phi provides an introduction to future generations of HPC hardware by highlighting the importance of lightweight multithreading, vectorization and data management.
The tutorial will have a short introduction to the Many Integrated Core architecture, highlighting differences and similarities with traditional processors and other accelerators. Following this I will delve into more detailed characteristics of the Knights Corner architecture and describe how to compile and execute native code. A hands-on session focused on threading and vectorization will then take place, after which I will briefly describe the offload and symmetric execution modes.
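A minimal sketch of the native execution mode mentioned above, for the Knights Corner generation (file names are placeholders; the flags reflect the Intel compilers of that era):

```bash
# Compile an OpenMP code natively for the Knights Corner coprocessor.
icc -mmic -openmp -O3 -o hello.mic hello.c

# Copy the binary over and run it on the coprocessor, which runs its own
# Linux and is reachable as mic0 from the host node on Stampede.
scp hello.mic mic0:
ssh mic0 "OMP_NUM_THREADS=240 ./hello.mic"
```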
Simple HTC using Launcher by Lucas A. Wilson
Wednesday, 4pm-5pm | Beginner | 60 minutes
Launcher is a simple framework for executing parametric studies across a large number of processors quickly and efficiently. Launcher’s lightweight, shell-based implementation allows excellent scaling of high-throughput bag-of-tasks computations, and has been tested at up to 65K cores on Stampede, TACC’s flagship high-performance system.
This tutorial will provide hands-on experience with installing and using Launcher on a single system, as well as a discussion of how to configure Launcher to run on multiple workstations in a workgroup or lab environment and on batch scheduled systems. Attendees will receive hands-on experience using Launcher’s various internal scheduling modes to achieve better load balance, as well as using the provided Launcher environment variables to customize task execution. Sample sequential, multi-threaded, and MPI applications will be provided.
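A minimal sketch of a Launcher run (the job file contents are placeholders; the environment variables and paramrun usage follow the repository's README of the time, so check the repo below for current details):

```bash
# jobs.txt: one shell command per line, each an independent task.
cat > jobs.txt <<'EOF'
./simulate --param 0.1 > out_0.1.log
./simulate --param 0.2 > out_0.2.log
./simulate --param 0.3 > out_0.3.log
EOF

# Point Launcher at the job file, pick a scheduling mode
# (dynamic, interleaved, or block), and start the run.
export LAUNCHER_JOB_FILE=jobs.txt
export LAUNCHER_SCHED=dynamic
$LAUNCHER_DIR/paramrun
```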
Also check out the Launcher GitHub: https://github.com/TACC/launcher. And see his tweet:
Have MILLIONS of single-core jobs to run? Come by the @TACC booth at #SC15 and learn how to #UseTheLauncher! https://goo.gl/hctlBH