HPC Resources

The Institute for Computational and Mathematical Engineering (ICME) has built a diverse high performance computing (HPC) infrastructure for our students, faculty, and collaborators. We aim to create a prototype center in ICME that allows researchers across campus to test their algorithms and implementations on the latest architectures.
Stanford Computing Policies
Please read the Stanford University Compute Policy prior to requesting accounts on a system.
The Stanford Administrative Guide includes information on computing and networking policies and procedures relating to usage, security, identification, and Stanford domains: https://adminguide.stanford.edu/chapter-6.
For support on the ICME HPC systems email research-computing-support@stanford.edu and use "ICME" in the subject.
ICME Cluster
Cluster
The ICME-GPU cluster is used by ICME students and ICME workgroups, and has a restricted partition for certain courses. The cluster has a total of 32 nodes: 20 CPU nodes and 12 GPU nodes.
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
k80        up     4:30:00    5      idle   icme[01-05]
CME        up     2:00:00    6      idle   icmet[01-06]
V100       up     8:00:00    1      idle   icme06
The CME partition is a restricted partition (reserved for certain courses).
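As a sketch of how these partitions might be used, here is a minimal Slurm batch script targeting the k80 partition. The job name, output file, time limit, and final command are illustrative and should be adjusted to your workload; the requested time must stay within the partition limit shown above.
    #!/bin/bash
    #SBATCH --job-name=example-job        # illustrative job name
    #SBATCH --partition=k80               # partition name from the table above
    #SBATCH --gres=gpu:1                  # request one GPU
    #SBATCH --time=01:00:00               # must be within the 4:30:00 partition limit
    #SBATCH --output=example-job-%j.out   # stdout/stderr file (%j expands to the job ID)

    # Replace with your own commands
    nvidia-smi
Submit the script with sbatch and check its status with squeue -u <yoursunetid>.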
ICME-SHARE
The icme-share server is a large shared-memory resource (3 TB RAM, 144 CPU cores, 20 TB storage). This system is available only to ICME students and is used to run large jobs in applications such as MATLAB, R, TensorFlow, and many others.
ICME-DGX
The icme-dgx system specifications are:
GPU: 8x Tesla V100
Performance: 1 petaFLOPS
GPU Memory: 128 GB total
CUDA Cores: 40,960
Tensor Cores: 5,120
CPU: Dual 20-Core Intel Xeon E5-2698 v4 2.2 GHz
Memory: 512 GB 2,133 MHz DDR4 LRDIMM
Base OS: Ubuntu 18.04
ICME-DGX-Station
The ICME-DGX Station is a desktop workstation with multiple RTX 6000 GPUs. This system is meant to be lent to users who request a local machine learning platform with GPUs and the RAPIDS development environment.
Access
If you would like an account on the GPU cluster, contact Brian Tempero.
Provide your SUNet ID and a brief explanation of how you will use the cluster.
- Once you are provided with your login information, you can ssh into the cluster. Example: ssh -l <yoursunetid> icme-gpu.stanford.edu
- Your password is your SUNet ID password.
- This cluster is not backed up, so you are responsible for your own data; see the backup example below.
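Because the cluster is not backed up, it is a good idea to regularly copy results to your own machine. Here is a minimal sketch using rsync; the results directory and local destination are hypothetical and should be replaced with your own paths:
    # Pull a results directory from the cluster to a local backup folder
    rsync -avz <yoursunetid>@icme-gpu.stanford.edu:~/results/ ~/icme-backups/results/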
Information
- Slurm is the job manager for the icme-gpu cluster.
- All accounts have 500GB of storage space.
- Please do not store any critical information on this cluster. It can be rebuilt with very little notice.
- Log in using your SUNet ID and password.
- To see available resources, use the command "spart".
- Here is an example of how to allocate a compute node and run a job in interactive mode:
- srun --partition=k80 --gres=gpu:1 --pty bash
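Once the interactive shell starts on the compute node, you can verify the GPU and load software before running your code. The module name below is illustrative; run "module avail" to see what is actually installed on the cluster:
    nvidia-smi          # confirm the allocated GPU is visible
    module avail        # list software modules available on the node
    module load cuda    # illustrative module name; pick one from "module avail"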