
HPC Resources

The Institute for Computational and Mathematical Engineering (ICME) has built a diverse high-performance computing (HPC) infrastructure for our students, faculty, and collaborators. We aim to make ICME a prototype center that allows researchers across campus to test their algorithms and implementations on the latest architectures.

Stanford Computing Policies

Please read the Stanford University Compute Policy prior to requesting accounts on a system.

The Stanford Administrative Guide includes information on computing and networking policies and procedures relating to usage, security, identification, and Stanford domains:  https://adminguide.stanford.edu/chapter-6.

For support on the ICME HPC systems, email research-computing-support@stanford.edu and include "ICME" in the subject line.


ICME Cluster

Cluster

The ICME-GPU cluster is used by ICME students and ICME workgroups, and it has a restricted partition reserved for certain courses. The cluster has 32 nodes in total: 20 CPU nodes and 12 GPU nodes.

PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
k80        up     4:30:00    5      idle   icme[01-05]
CME        up     2:00:00    6      idle   icmet[01-06]
V100       up     8:00:00    1      idle   icme06

The CME partition is restricted and reserved for certain courses.
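The listing above matches the columns reported by Slurm's sinfo command; as a sketch (assuming the standard Slurm client tools are on your PATH on the login node), you can reproduce it at any time with:

    sinfo --format="%P %a %l %D %t %N"   # partition, avail, time limit, nodes, state, nodelist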

ICME-SHARE

The icme-share server is a large shared resource server (3 TB RAM, 144 CPU cores, 20 TB storage). It is reserved for ICME students to run large jobs in applications such as MATLAB, R, TensorFlow, and many others.
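Before launching a large job, it can help to check what the server currently has free. A quick sketch using standard Linux utilities (the icme-share hostname shown is an assumption based on the server's name; use the address ICME provides):

    ssh -l <yoursunetid> icme-share.stanford.edu   # hostname is an assumption
    nproc                                          # CPU cores visible to your session (up to 144)
    free -h                                        # total and available RAM (up to 3 TB)
    df -h $HOME                                    # storage available under your home directory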

ICME-DGX

The icme-dgx system specifications are:

GPU: 8x Tesla V100
Performance: 1 petaFLOPS
GPU Memory: 128 GB total
CUDA Cores: 40,960
Tensor Cores: 5,120
CPU: Dual 20-Core Intel Xeon E5-2698 v4, 2.2 GHz
Memory: 512 GB 2,133 MHz DDR4 LRDIMM
Base OS: Ubuntu 18.04
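A minimal sketch for confirming the GPU inventory after logging in (nvidia-smi ships with the NVIDIA driver, so no ICME-specific setup is assumed):

    nvidia-smi --query-gpu=index,name,memory.total --format=csv
    # expect eight Tesla V100 entries at 16 GB each (8 x 16 GB = 128 GB total)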

ICME-DGX-Station

The ICME-DGX Station is a desktop workstation with multiple RTX 6000 GPUs. The system is meant to be lent to users who need a local machine-learning platform with GPUs and the RAPIDS development environment.
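A hedged way to check that the RAPIDS environment is usable once the station is set up (this assumes RAPIDS is installed in the Python environment active by default, which is not documented here):

    nvidia-smi -L                                       # list the installed RTX 6000 GPUs
    python -c "import cudf; print(cudf.__version__)"    # cuDF is RAPIDS' GPU DataFrame library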

Access

If you would like an account on the GPU cluster, contact Brian Tempero. Provide your SUNet ID and a brief explanation of how you will use the cluster.

  • Once you are provided with your login information, you can ssh into the cluster. Example: ssh -l <yoursunetid> icme-gpu.stanford.edu
  • The password is your SUNet ID password.
  • This cluster is not backed up, so you are responsible for your data (see the rsync sketch below).
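Because the cluster is not backed up, copy important results off it regularly. A sketch using rsync, run from your own machine (the source and destination paths are placeholders, not ICME-provided locations):

    rsync -avz <yoursunetid>@icme-gpu.stanford.edu:~/results/ ~/icme-backup/results/
    # -a preserves permissions and timestamps, -v is verbose, -z compresses in transit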

Information

  • Slurm is the job manager for the icme-gpu cluster.
  • All accounts have 500 GB of storage space.
  • Please do not store any critical information on this cluster; it can be rebuilt with very little notice.
  • Log in using your SUNet ID and password.
  • To see available resources, use the command "spart".
  • Here is an example of how to allocate a compute node and work in interactive mode (a batch script sketch follows this list):
    • srun --partition=k80 --gres=gpu:1 --pty bash
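For non-interactive work, jobs can also be submitted as batch scripts. A minimal sketch (the partition, GPU request, and time limit follow the table above; the job name, output file, and program are placeholders):

    #!/bin/bash
    #SBATCH --job-name=demo          # placeholder job name
    #SBATCH --partition=k80          # a partition from the table above
    #SBATCH --gres=gpu:1             # request one GPU
    #SBATCH --time=04:00:00          # within the k80 partition's 4:30:00 limit
    #SBATCH --output=%x-%j.out       # stdout/stderr written to <jobname>-<jobid>.out

    nvidia-smi                       # record which GPU was allocated
    ./my_program                     # placeholder for your own executable

Submit the script with sbatch <scriptname> and monitor it with squeue -u <yoursunetid>.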
slurm_101.pdf (47.9 KB)
basic_slurm_0_3 (94.44 KB)

 

SLURM Complete

bspmv_icme.pdf (1.42 MB)

 

 

NVIDIA CUDA Forum