The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.[1]

The NCCS provides resources for calculation and simulation in fields including astrophysics, materials science, and climate research to users from government, academia, and industry who have many of the largest computing problems in science.[2]

The OLCF’s flagship supercomputer, the IBM AC922 Summit, is supported by advanced data management and analysis tools. The center hosted the Cray XK7 Titan system, one of the most powerful scientific tools of its time, from 2012 through its retirement in August 2019. The same year, construction began for Frontier, which is slated to debut as the OLCF’s first exascale system in 2021.[3]

History

On December 9, 1991, the High-Performance Computing Act (HPCA) of 1991, sponsored by Senator Al Gore, was signed into law. HPCA proposed a national information infrastructure to build communications networks and databases and also called for proposals to build new high-performance computing facilities to serve science.[4]

On May 24, 1992, ORNL was awarded a high-performance computing research center called the Center for Computational Sciences, or CCS, as part of HPCA.[5] ORNL also received a 66-processor, serial #1 Intel Paragon XP/S 5 for code development the same year. The system had a peak performance of 5 gigaflops (5 billion floating-point operations per second).

Oak Ridge National Laboratory (ORNL) joined with three other national laboratories and seven universities to submit the Partnership in Computational Science (PICS) proposal to the US Department of Energy as part of the High-Performance Computing and Communications Initiative.[6][7]

With the High-End Computing Revitalization Act of 2004, CCS was tasked with carrying out the Leadership Computing Facility (LCF) Project at ORNL with the goal of developing and installing a petaflops-speed supercomputer by the end of 2008.[8] The center officially changed its name from the Center for Computational Sciences to NCCS the same year.

On December 9, 2019, Georgia Tourassi, who previously served as the director of ORNL's Health Data Sciences Institute and as group leader for ORNL’s Biomedical Sciences, Engineering, and Computing Group, was appointed director of the NCCS, succeeding James Hack.[9]

Previous Systems[10]

Intel Paragons

The creation of the CCS in 1992 ushered in a series of Intel Paragon computers, including:

  • Intel Paragon XP/S 5 (1992): The Intel Paragon XP/S 5 provided 128 GP compute nodes arranged in a 16 row by 8 column rectangular mesh consisting of one 8 by 8 group of 16MB nodes and one 8 by 8 group of 32MB nodes. Also available were four 128MB MP compute nodes in a 2 row by 2 column mesh. In addition, there was the 128MB MP boot node, four 32MB GP service nodes and six I/O nodes, five of which were connected to 4.8 GB RAID disks and the sixth to a 16 GB RAID disk. This provided a total of 40 GB of system disk space.[11]
  • Intel Paragon XP/S 35 (1992): The Intel Paragon XP/S 35 provided 512 compute processors arranged in a 16 row by 32 column rectangular mesh. In addition, there were five service nodes and 27 I/O nodes each connected to a 4.8 GB RAID disk. This provided a total of 130 GB of system disk space. Each of the five service nodes and the 512 compute nodes had 32MB of memory.[12]
  • Intel Paragon XP/S 150 (1995): The fastest computer in the world at the time of its delivery to ORNL,[13] the Intel Paragon XP/S 150 provided 1,024 nodes arranged in a 16 row by 64 column rectangular mesh. These were MP nodes, which meant there were two compute processors per node. Most of the nodes had 64MB, but 64 of the nodes had 128MB. In addition, there were five service nodes and 127 I/O nodes (119 regular I/O nodes and 4 high-performance SCSI-16 I/O nodes) each connected to a 4.8 GB RAID disk. This provided a total of 610 GB of system disk space.[14]
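The system disk totals quoted for the three Paragons follow directly from the number of I/O nodes and the size of the RAID disk attached to each. A minimal Python sketch of that arithmetic, using only figures from the list above (the helper name `total_disk_gb` is illustrative and not taken from any source):

```python
def total_disk_gb(raid_sizes_gb):
    """Sum the capacity of the RAID disks attached to the I/O nodes."""
    return sum(raid_sizes_gb)

# XP/S 5: five I/O nodes with 4.8 GB RAID disks plus one with a 16 GB disk.
xps5 = total_disk_gb([4.8] * 5 + [16.0])
# XP/S 35: 27 I/O nodes, each connected to a 4.8 GB RAID disk.
xps35 = total_disk_gb([4.8] * 27)
# XP/S 150: 127 I/O nodes, each connected to a 4.8 GB RAID disk.
xps150 = total_disk_gb([4.8] * 127)

# Rounded to whole gigabytes, these match the quoted 40, 130, and 610 GB.
print(round(xps5), round(xps35), round(xps150))
```

The XP/S 35 and XP/S 150 figures are rounded in the text: the unrounded sums are 129.6 GB and 609.6 GB.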

Eagle (2000–2005)[15]

Eagle was a 184-node IBM RS/6000 SP operated by the Computer Science and Mathematics Division of ORNL. It had 176 Winterhawk-II “thin” nodes, each with four 375 MHz Power3-II processors and 2 GB of memory. Eagle also had eight Winterhawk-II “wide” nodes (each with two 375 MHz Power3-II processors and 2 GB of memory) for use as filesystem servers and other infrastructure tasks. Eagle’s estimated computational power was greater than 1 teraflop in the compute partition.

Falcon (2000)[16]

Falcon was a 64-node Compaq AlphaServer SC operated by the CCS and acquired as part of an early-evaluation project. Each node had four 667 MHz Alpha EV67 processors and 2 GB of memory, and 2 TB of Fibre Channel disk was attached, resulting in an estimated computational power of 342 gigaflops.

Cheetah[17] (2001–2008)[18]

Cheetah was a 4.5 TF IBM pSeries System operated by the CCS. The compute partition of Cheetah included 27 p690 nodes, each with thirty-two 1.3 GHz Power4 processors. The login and I/O partitions together included 8 p655 nodes, each with four 1.7 GHz Power4 processors. All nodes were connected via IBM’s Federation interconnect.
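The 4.5 TF figure is consistent with a standard theoretical-peak estimate over the compute partition. The sketch below is illustrative only: the function name is hypothetical, and the value of 4 floating-point operations per cycle per Power4 processor (two fused multiply-add units) is an assumption that the text above does not state.

```python
def peak_teraflops(nodes, procs_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak = processors x clock rate x flops per cycle."""
    gigaflops = nodes * procs_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0

# ASSUMPTION (not stated in the text): each Power4 processor completes
# 4 floating-point operations per cycle (two fused multiply-add units).
cheetah_peak = peak_teraflops(nodes=27, procs_per_node=32,
                              clock_ghz=1.3, flops_per_cycle=4)

print(f"{cheetah_peak:.1f} TF")  # prints "4.5 TF"
```

Only the 27 p690 compute nodes are counted here; the p655 login and I/O nodes are left out of the estimate.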

The Power4 memory hierarchy consisted of three levels of cache. The first and second levels were on the Power4 chip (two processors to a chip). The level-1 instruction cache was 128 KB (64 KB per processor) and the level-1 data cache was 64 KB (32 KB per processor). The level-2 cache was 1.5 MB shared between the two processors. The level-3 cache was 32 MB and was off-chip. There were 16 chips per node, or 32 processors.

Most of Cheetah’s compute nodes had 32 GB of memory. Five had 64 GB of memory and two had 128 GB of memory. Some of the nodes in Cheetah had approximately 160 GB of local disk space that could be used as temporary scratch space.

In June 2002, Cheetah was ranked the eighth-fastest computer in the world, according to TOP500, the semi-annual list of the world's top supercomputers.[19]

Ram (2003–2007)[20]

Ram was an SGI Altix supercomputer provided as a support system for the NCCS.

Ram was installed in 2003 and was used as a pre- and post-processing support system for allocated NCCS projects until 2007.

Ram had 256 Intel Itanium2 processors running at 1.5 GHz, each with 6 MB of L3 cache, 256 KB of L2 cache, and 32 KB of L1 cache. Ram had 8 GB of memory per processor for a total of 2 TB of shared memory. By contrast, the first supercomputer at ORNL, the Cray X-MP installed in 1985, had one-millionth the memory of the SGI Altix.

Phoenix (OLCF-1) (2003–2008)[21]

Phoenix was a Cray X1E provided as a primary system in NCCS.

The original X1 was installed in 2003 and went through several upgrades, arriving at its final configuration in 2005. From October 2005 until 2008, it provided almost 17 million processor-hours. The system supported more than 40 large projects in research areas including climate, combustion, high energy physics, fusion, chemistry, computer science, materials science, and astrophysics.

At its final configuration, Phoenix had 1,024 multistreaming vector processors (MSPs). Each MSP had 2 MB of cache and a peak computation rate of 18 gigaflops. Four MSPs formed a node with 8 GB of shared memory. Memory bandwidth was very high, roughly half the cache bandwidth. The interconnect functioned as an extension of the memory system, offering each node direct access to memory on other nodes at high bandwidth and low latency.

Jaguar (OLCF-2) (2005–2012)[22]

Jaguar began as a 25-teraflop Cray XT3 in 2005. Later, it was upgraded to an XT4 containing 7,832 compute nodes, each containing a quad-core AMD Opteron 1354 processor running at 2.1 GHz, 8 GB of DDR2-800 memory (some nodes used DDR2-667 memory), and a SeaStar2 router. The resulting partition contained 31,328 processing cores, more than 62 TB of memory, more than 600 TB of disk space, and a peak performance of 263 teraflops (263 trillion floating point operations per second).
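The core and memory totals for the XT4 partition can be reproduced from the per-node figures. This is a small sketch using only numbers from the paragraph above; the constant names are illustrative, and decimal units (1 TB = 1,000 GB) are assumed because that is the convention under which the total exceeds 62 TB.

```python
NODES = 7_832            # XT4 compute nodes
CORES_PER_NODE = 4       # one quad-core Opteron per node
MEM_PER_NODE_GB = 8      # DDR2 memory per node

total_cores = NODES * CORES_PER_NODE

# ASSUMPTION: decimal units, 1 TB = 1,000 GB.
total_mem_tb = NODES * MEM_PER_NODE_GB / 1000

print(total_cores)    # 31328
print(total_mem_tb)   # 62.656
```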

In 2008, Jaguar was upgraded to a Cray XT5 and became the first system to run a scientific application at a sustained petaflop. In its XT5 configuration, Jaguar had 224,256 x86-based AMD Opteron processor cores and operated with a version of Linux called the Cray Linux Environment. By the time of its ultimate transformation into Titan in 2012,[23] Jaguar contained nearly 300,000 processing cores and had a theoretical performance peak of 3.3 petaflops.

From November 2009 until November 2010, Jaguar was the world's most powerful computer.

Hawk (2006–2008)[24]

Hawk was a 64-node Linux cluster dedicated to high-end visualization.

Hawk was installed in 2006 and was used as the Center’s primary visualization cluster until May 2008 when it was replaced by a 512-core system named Lens.[25]

Each node contained two single-core Opteron processors and 2 GB of memory. The cluster was connected by a Quadrics Elan3 network, providing high-bandwidth and low-latency communication. The cluster was populated with two flavors of NVIDIA graphics cards connected with AGP8x: 5900 and QuadroFX 3000G. Nodes with 3000G cards were directly connected to the EVEREST PowerWall and were reserved for PowerWall use.

Ewok (2006–2011)[26]

Ewok was an Intel-based InfiniBand cluster running Linux. The system was provided as an end-to-end resource for center users. It was used for workflow automation for jobs running from the Jaguar supercomputer and for advanced data analysis. The system contained 81 nodes. Each node contained two 3.4 GHz Pentium IV processors, a 3.4 GHz Intel Xeon central processing unit (CPU), and 6 GB of memory. An additional node contained 4 dual-core AMD processors and 64 GB of memory. The system was configured with a 13 TB Lustre file system for scratch space.

Eugene (2008–2011)[27]

Eugene was a 27-teraflop IBM Blue Gene/P System operated by NCCS. It provided approximately 45 million processor-hours yearly for ORNL staff and for the promotion of research collaborations between ORNL and its core university partner members.

The system consisted of 2,048 nodes, each with an 850 MHz IBM quad-core 450d PowerPC processor and 2 GB of memory. Eugene had 64 I/O nodes, and each submitted job was required to use at least one I/O node. As a result, each job consumed a minimum of 32 nodes per execution.
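The 32-node minimum follows from dividing the compute nodes evenly among the I/O nodes. A minimal sketch of that reasoning, in which the even split and the function name `nodes_consumed` are assumptions made for illustration (the text gives only the minimum, so allocation granularity for larger jobs is not modeled):

```python
COMPUTE_NODES = 2_048
IO_NODES = 64

# ASSUMPTION: compute nodes are divided evenly among the I/O nodes.
nodes_per_io_node, remainder = divmod(COMPUTE_NODES, IO_NODES)
assert remainder == 0

def nodes_consumed(requested_nodes):
    """Apply the per-job minimum implied by the one-I/O-node rule."""
    return max(requested_nodes, nodes_per_io_node)

print(nodes_per_io_node)   # 32
print(nodes_consumed(4))   # 32
print(nodes_consumed(100)) # 100
```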

Eugene was officially decommissioned in October 2011. On December 13 of the same year, a portion of Eugene’s hardware was donated to the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory.[28]

Eos (2013–2019)

Eos was a 736-node Cray XC30 cluster with a total of 47.104 TB of memory. Its processor was the Intel Xeon E5-2670. It featured 16 I/O service nodes and 2 external login nodes. Its compute nodes were organized in blades. Each blade contained 4 nodes. Every node had 2 sockets with 8 physical cores each. Intel’s HyperThreading (HT) Technology allowed each physical core to work as 2 logical cores so each node could function as if it had 32 cores. In total, the Eos compute partition contained 11,776 traditional processor cores (23,552 logical cores with HT Technology enabled).[29]
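The core counts above can be checked from the node layout, and the same figures give the memory per node. This sketch uses only numbers from the paragraph; the constant names are illustrative, and decimal units (1 TB = 1,000 GB) are assumed for the memory calculation.

```python
NODES = 736
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2       # HyperThreading enabled
TOTAL_MEMORY_TB = 47.104

physical_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
logical_cores = physical_cores * THREADS_PER_CORE

# ASSUMPTION: decimal units, 1 TB = 1,000 GB.
mem_per_node_gb = TOTAL_MEMORY_TB * 1000 / NODES

print(physical_cores)          # 11776
print(logical_cores)           # 23552
print(round(mem_per_node_gb))  # 64
```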

Eos provided a space for tool and application porting, small scale jobs to prepare capability runs on Titan, as well as software generation, verification, and optimization.[30]

Titan (OLCF-3) (2012–2019)

Titan was a hybrid-architecture Cray XK7 system with a theoretical peak performance exceeding 27,000 trillion calculations per second (27 petaflops). It contained both advanced 16-core AMD Opteron CPUs and NVIDIA Kepler graphics processing units (GPUs). This combination allowed Titan to achieve 10 times the speed and 5 times the energy efficiency of its predecessor, the Jaguar supercomputer, while using only modestly more energy and occupying the same physical footprint.[31]

Titan featured 18,688 compute nodes, a total system memory of 710 TB, and Cray’s high-performance Gemini network. Its 299,008 CPU cores guided simulations and the accompanying GPUs handled hundreds of calculations simultaneously. The system provided decreased time to solution, increased complexity of models, and greater realism in simulations.[32] In November 2012, Titan received the Number 1 position on the TOP500 supercomputer list.[33]

After 7 years of service, Titan was decommissioned in August 2019 to make room for the Frontier supercomputer.[34]

Current Systems

Spider

The OLCF’s center-wide Lustre file system, called Spider, is the operational work file system for most OLCF computational resources. An extremely high-performance system, Spider serves over 20,000 clients, provides 32 PB of disk space, and can move data at more than 1 TB/s. Spider comprises two filesystems, Atlas1 and Atlas2, in order to provide high availability and load balance across multiple metadata servers for increased performance.[35]

HPSS

HPSS, ORNL’s archival mass-storage resource, consists of tape and disk storage components, Linux servers, and High Performance Storage System (HPSS) software. Tape storage is provided by StorageTek SL8500 robotic tape libraries, each of which can hold up to 10,000 cartridges.[36] Each library has 24 T10K-A drives, 60 T10K-B drives, 36 T10K-C drives, and 72 T10K-D drives.[37]

EVEREST

EVEREST (Exploratory Visualization Environment for Research in Science and Technology) is a large-scale venue for data exploration and analysis. EVEREST measures 30 feet long by 8 feet tall, and its main feature is a 27-projector PowerWall with an aggregate pixel count of 35 million pixels. The projectors are arranged in a 9×3 array, each providing 3,500 lumens for a very bright display.

Displaying 11,520 by 3,072 pixels, the wall offers a tremendous amount of visual detail. The wall is integrated with the rest of the computing center, creating a high-bandwidth data path between large-scale high-performance computing and large-scale data visualization.
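The aggregate pixel count follows from the wall resolution, and dividing by the 9×3 projector array gives the share of the image each projector covers. The sketch below is illustrative: the per-projector figure assumes the tiles do not overlap, which the text does not state.

```python
WALL_WIDTH_PX, WALL_HEIGHT_PX = 11_520, 3_072
COLUMNS, ROWS = 9, 3       # 27 projectors in a 9x3 array

total_pixels = WALL_WIDTH_PX * WALL_HEIGHT_PX

# ASSUMPTION: tiles abut with no overlap or edge blending.
tile_px = (WALL_WIDTH_PX // COLUMNS, WALL_HEIGHT_PX // ROWS)

print(f"{total_pixels:,}")  # 35,389,440
print(tile_px)              # (1280, 1024)
```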

EVEREST is controlled by a 14-node cluster. Each node contains four dual-core AMD Opteron processors. These 14 nodes have NVIDIA QuadroFX 3000G graphics cards connected to the projectors, providing a very-high-throughput visualization capability. The visualization lab acts as an experimental facility for development of future visualization capabilities. It houses a 12-panel tiled LCD display, test cluster nodes, interaction devices, and video equipment.

Rhea

Rhea is a 521-node, commodity-type Linux cluster. Rhea provides a conduit for large-scale scientific discovery via pre- and post-processing of simulation data generated on the Titan supercomputer. Each of Rhea’s first 512 nodes contains two 8-core 2.0 GHz Intel Xeon processors with Intel’s HT Technology and 128 GB of main memory. Rhea also has nine large-memory GPU nodes. Each of these nodes has 1 TB of main memory, two NVIDIA K80 GPUs, and two 14-core 2.30 GHz Intel Xeon processors with HT Technology. Rhea is connected to the OLCF’s high-performance Lustre filesystem, Atlas.[38]

Wombat

Wombat is a single-rack cluster from HPE based on the 64-bit ARM architecture instead of traditional x86-based architecture. This system is available to support computer science research projects aimed at exploring the ARM architecture.

The Wombat cluster has 16 compute nodes, four of which have two AMD GPU accelerators attached (eight GPUs total in the system). Each compute node has two 28-core Cavium ThunderX2 processors, 256 GB RAM (16 DDR4 DIMMs) and a 480 GB SSD for node-local storage. Nodes are connected with EDR InfiniBand (~100 Gbit/s).[39]

Summit (OLCF-4)

The OLCF's IBM AC922 Summit supercomputer.

The IBM AC922 Summit, or OLCF-4, is ORNL’s 200-petaflop flagship supercomputer. Summit was originally launched in June 2018, and as of the November 2019 TOP500 list, is the fastest computer in the world with a High Performance Linpack (HPL) performance of 148.6 petaflops.[40] Summit is also the first computer to reach exascale performance, achieving a peak throughput of 1.88 exaops through a mixture of single- and half-precision floating point operations.[41]

Like its predecessor Titan, Summit makes use of a hybrid architecture that integrates its 9,216 Power9 CPUs and 27,648 NVIDIA Volta V100 GPUs using NVIDIA’s NVLink.[42] Summit features 4,608 nodes (nearly a quarter of Titan’s 18,688 nodes), each with 512 GB of Double Data Rate 4 Synchronous Dynamic Random-Access Memory (DDR4) and 96 GB of High Bandwidth Memory (HBM2), and a total storage capacity of 250 petabytes.[43]
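The system-wide totals imply a simple per-node layout, which can be derived as a rough check. This sketch uses only figures from the paragraph above; the constant names are illustrative, and the even split of HBM2 across a node's GPUs is an assumption.

```python
NODES = 4_608
CPUS = 9_216
GPUS = 27_648
HBM2_PER_NODE_GB = 96
TITAN_NODES = 18_688

cpus_per_node = CPUS // NODES
gpus_per_node = GPUS // NODES

# ASSUMPTION: HBM2 is divided evenly among the GPUs in a node.
hbm2_per_gpu_gb = HBM2_PER_NODE_GB // gpus_per_node

node_ratio = NODES / TITAN_NODES

print(cpus_per_node, gpus_per_node)  # 2 6
print(hbm2_per_gpu_gb)               # 16
print(f"{node_ratio:.1%}")           # 24.7%
```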

Frontier (OLCF-5)

Scheduled for delivery in 2021 with user access becoming available the following year, Frontier will be ORNL’s first sustainable exascale system, meaning it will be capable of performing one quintillion—one billion billion—operations per second. The system will be composed of more than 100 Cray Shasta cabinets with an anticipated peak performance around 1.5 exaflops.[44]

Research areas

  • Biology – With OLCF supercomputing resources, researchers can use knowledge of the molecular scale to develop new drugs and medical therapies, study complex biological systems, and model gene regulation.[45]
  • Chemistry – Supercomputers like Summit can explore the intricacies of matter at the atomic level, allowing for first principles discoveries and detailed molecular models.[46]
  • Computer Science – Researchers are developing the tools necessary to evaluate a range of supercomputing systems, with the goals of discovering how best to use each, how to find the best fit for any given application, and how to tailor applications to get the best performance.[47]
  • Earth Science – High performance computing allows for large scale computation of complex environmental and geographical systems, and NCCS researchers use this information to better understand the changes in Earth's climate brought on by global warming.[48]
  • Engineering – OLCF resources like Summit are being used for engineering applications such as simulations of gas turbines and combustion engines.[49]
  • Fusion – Understanding the behavior of fusion plasmas and simulating various device aspects gives researchers insight into the construction of ITER, a prototype fusion power plant.[50]
  • Materials Science – Research into materials science at ORNL has aimed at improving various areas of modern life, from power generation and transmission to transportation to the production of faster, smaller, more versatile computers and storage devices.[51]
  • Nuclear Energy – The development of new nuclear reactors that employ advanced fuel cycles and adhere to modern safety and nonproliferation constraints requires complex modelling and simulations.[52] Often, the complexity of these simulations necessitates the use of supercomputers that can ensure accuracy of models.[53]
  • Physics – Physicists use NCCS’s high performance computing power to reveal the fundamental nature of matter, including the behavior of quarks, electrons, and other fundamental particles that make up atoms.[54]

References

  1. ^ "Overview". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  2. ^ "Overview". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  3. ^ "Frontier". www.olcf.ornl.gov. Retrieved 2020-03-11.
  4. ^ "Overview". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  5. ^ "May 2007". web.ornl.gov. Retrieved 2020-03-11.
  6. ^ Huray, Paul G. (1999-02-24). "Partnership in Computational Science". UNT Digital Library. Retrieved 2020-03-11.
  7. ^ "May 2007". web.ornl.gov. Retrieved 2020-03-11.
  8. ^ Biggert, Judy (2004-11-30). "H.R.4516 - 108th Congress (2003-2004): Department of Energy High-End Computing Revitalization Act of 2004". www.congress.gov. Retrieved 2020-03-11.
  9. ^ "Tourassi appointed director of ORNL National Center for Computational Sciences". WYSH AM 1380. 2019-12-23. Retrieved 2020-06-22.
  10. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  11. ^ "XP/S 5 Hardware Description". 1997-01-21. Archived from the original on 1997-01-21. Retrieved 2020-03-11.
  12. ^ "Oak Ridge Leadership Computing Facility". www.tiki-toki.com. Retrieved 2020-03-11.
  13. ^ "Oak Ridge Leadership Computing Facility". www.tiki-toki.com. Retrieved 2020-03-11.
  14. ^ "Hardware Description of Intel Paragon XP/S 150". Archived from the original on 1999-04-28.
  15. ^ "CCS: IBM SP (Eagle)". 2006-06-22. Archived from the original on 2006-06-22. Retrieved 2020-03-11.
  16. ^ "ORNL CCS Resources". 2000-12-10. Archived from the original on 2000-12-10. Retrieved 2020-03-11.
  17. ^ "CCS: IBM pSeries Cluster (Cheetah)". 2005-03-04. Archived from the original on 2005-03-04. Retrieved 2020-03-11.
  18. ^ "December 2007". web.ornl.gov. Retrieved 2020-03-11.
  19. ^ "The Oak Ridger Online -- Feature: Business -- 'Cheetah' eighth fastest computer 06/21/02". www.csm.ornl.gov. Retrieved 2020-03-11.
  20. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  21. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  22. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  23. ^ "Jaguar Gone But Not Forgotten". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  24. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  25. ^ "Lens | National Center for Computational Sciences". 2009-12-21. Archived from the original on 2009-12-21. Retrieved 2020-03-11.
  26. ^ "National Center for Computational Sciences » Decommissioned Systems". 2012-09-13. Archived from the original on 2012-09-13. Retrieved 2020-03-11.
  27. ^ "Decommissioned Systems". Archived from the original on 2012-09-13.
  28. ^ "Oak Ridge Computing Facility Donates Eugene System to Argonne". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  29. ^ "Eos". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  30. ^ "Eos". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  31. ^ "Titan". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  32. ^ "Titan". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  33. ^ "Titan: Oak Ridge National Laboratory | TOP500 Supercomputer Sites". www.top500.org. Retrieved 2020-03-11.
  34. ^ "Farewell, Titan". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  35. ^ "Spider". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  36. ^ "HPSS". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  37. ^ "HPSS". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  38. ^ "Rhea". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  39. ^ "Wombat". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  40. ^ "November 2019 | TOP500 Supercomputer Sites". www.top500.org. Retrieved 2020-03-11.
  41. ^ "Genomics Code Exceeds Exaops on Summit Supercomputer". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  42. ^ Summit and Sierra Supercomputers: An Inside Look at the U.S. Department of Energy’s New Pre-Exascale Systems (PDF). Teratec (Report).
  43. ^ "Summit: By the Numbers" (PDF).
  44. ^ "Frontier". www.olcf.ornl.gov. Retrieved 2020-03-11.
  45. ^ "Biology". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  46. ^ "Chemistry". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  47. ^ "Computer Science". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  48. ^ "Earth Science". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  49. ^ "Engineering". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  50. ^ "Speeding Toward the Future of Fusion". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  51. ^ "Materials Science". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  52. ^ "Nuclear Energy". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  53. ^ "Predictive chemistry of realistic systems for advanced nuclear energy". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.
  54. ^ "Physics". Oak Ridge Leadership Computing Facility. Retrieved 2020-03-11.

External links