HPCG(High PerformanC++e Conjugate Gradients)基准测试是一个高性能计算性能评估工具,它主要用于衡量超级计算机在稀疏矩阵、内存访问密集型任务下的真实性能,比传统的 HPL(LINPACK)更贴近很多科学与工程计算场景。
HPL(LINPACK) 侧重浮点运算能力(FLOPS),适合反映处理器的峰值计算能力,但偏向计算密集型任务
HPCG 重点考察:
稀疏矩阵存取
内存带宽
缓存效率
通信延迟
因此,HPCG 的成绩通常是 HPL 的 0.3%~4% 左右,更接近真实 HPC 应用性能。
HPCG 实现了预条件共轭梯度法 (Preconditioned Conjugate Gradient) 求解三维泊松方程的迭代过程,包含:
稀疏矩阵-向量乘法 (SpMV)
向量更新 (AXPY)
点积 (Dot Product)
全局通信(MPI Allreduce)
多重网格预条件器
安装依赖库、clone代码、拷贝编译配置文件
# dnf install -y gcc gcc-C++ make cmake openmpi openmpi-devel# Git clone https://github.com/hpcg-benchmark/hpcg.git# cd setup# cp Make.Linux_MPI Make.kunpeng
修改编译配置文件
# vim setup_make.kunpeng#HEADER# -- High Performance Conjugate Gradient Benchmark (HPCG)# HPCG - 3.1 - March 28, 2019# Michael A. Heroux# Scalable AlGorithms Group, Computing Research Division# Sandia National Laboratories, Albuquerque, NM## Piotr Luszczek# Jack Dongarra# University of Tennessee, Knoxville# Innovative Computing Laboratory## (C) Copyright 2013-2019 All Rights Reserved### -- Copyright notice and Licensing terms:## Redistribution and use in source and binary forms, with or without# modification, are permitted provided that the following conditions# are met:## 1. Redistributions of source code must retain the above copyright# notice, this list of conditions and the following disclaimer.## 2. Redistributions in binary form must reproduce the above copyright# notice, this list of conditions, and the following disclaimer in the# documentation and/or other materials provided with the distribution.## 3. All advertising materials mentioning features or use of this# software must display the following acknowledgement:# This product includes software developed at Sandia National# Laboratories, Albuquerque, NM and the University of# Tennessee, Knoxville, Innovative Computing Laboratory.## 4. The name of the University, the name of the Laboratory, or the# names of its contributors may not be used to endorse or promote# products derived from this software without specific written# permission.## -- Disclaimer:## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS# ``AS IS'' AND ANY Express OR IMPLIED WARRANTIES, INCLUDING, BUT NOT# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.# #######################################################################@HEADER# ----------------------------------------------------------------------# - shell --------------------------------------------------------------# ----------------------------------------------------------------------#SHELL = /bin/sh#CD = cdCP = cpLN_S = ln -s -f MKDIR = mkdir -p RM = /bin/rm -f TOUCH = touch## ----------------------------------------------------------------------# - HPCG Directory Structure / HPCG library ------------------------------# ----------------------------------------------------------------------#TOPdir = . SRCdir = $(TOPdir)/src INCdir = $(TOPdir)/src BINdir = $(TOPdir)/bin## ----------------------------------------------------------------------# - Message Passing library (MPI) --------------------------------------# ----------------------------------------------------------------------# MPinc tells the C compiler where to find the Message Passing library# header files, MPlib is defined to be the name of the library to be# used. The variable MPdir is only used for defining MPinc and MPlib.#MPdir = /usr/lib64/openmpi MPinc = -I$(MPdir)/include MPlib = -L$(MPdir)/lib -lmpi### ----------------------------------------------------------------------# - HPCG includes / libraries / specifics -------------------------------# ----------------------------------------------------------------------#HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc) HPCG_LIBS =## - Compile time options -----------------------------------------------## -DHPCG_NO_MPI Define to disable MPI# -DHPCG_NO_OPENMP Define to disable OPENMP# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous# -DHPCG_DEBUG Define to enable debugging output# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output## By default HPCG will:# *) Build with MPI enabled.# *) Build with OpenMP enabled.# *) Not generate debugging output.#HPCG_OPTS = -DHPCG_NO_OPENMP## ----------------------------------------------------------------------#HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)## ----------------------------------------------------------------------# - Compilers / linkers - Optimization flags ---------------------------# ----------------------------------------------------------------------#CXX = mpicxx#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -WallCXXFLAGS = -O3 -march=armv8-a#LINKER = $(CXX) LINKFLAGS = $(CXXFLAGS)#ARCHIVER = ar ARFLAGS = r RANLIB = echoUSE_CUDA = 0## ----------------------------------------------------------------------#
注意这里禁用了CUDA
编译
cd .. make arch=kunpengcd bin
准备执行配置文件
默认的配置文件: hpcg.dat
HPCG benchmark input file Sandia National Laboratories; University of Tennessee, Knoxville 104 104 104 60
我们的配置文件: hpcg.dat
HPCG benchmark input file Sandia National Laboratories; University of Tennessee, Knoxville 128 128 128 300
注意上面最后一行表示执行时间,建议是1800s起,300s以下可能会不准。
执行:
# mpirun --allow-run-as-root --mca pml ob1 -np 64 ./xhpcg]# tail HPCG-Benchmark_3.1_2025-07-08_10-21-09.txtDDOT Timing Variations::Avg DDOT MPI_Allreduce time=2.58093 Final Summary= Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137 Final Summary::HPCG 2.4 rating for historical reasons is=16.1678 Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal Final Summary::Results are valid but execution time (sec) is=310.259 Final Summary::Official results execution time (sec) must be at least=1800# cat hpcg20250708T100940.txtWARNING: PERFORMING UNPRECONDITIONED ITERATIONS Call [0] Number of Iterations [11] Scaled Residual [1.12102e-13] WARNING: PERFORMING UNPRECONDITIONED ITERATIONS Call [1] Number of Iterations [11] Scaled Residual [1.12102e-13] Call [0] Number of Iterations [2] Scaled Residual [2.79999e-17] Call [1] Number of Iterations [2] Scaled Residual [2.79999e-17] Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 7.31869e-10 Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 5.92074e-11 SpMV call [0] Residual [0] SpMV call [1] Residual [0] Call [0] Scaled Residual [0.00454823] Call [1] Scaled Residual [0.00454823]
重要参数如下:
此机的HPCG结果为:16.1137 GFLOP/s
allow-run-as-root:这个参数明确告诉 mpirun 允许以 root 用户身份运行程序。
默认情况下,许多 MPI 实现(尤其是 Open MPI)会出于安全考虑,阻止 root 用户直接运行并行应用程序,因为这可能带来安全风险,尤其是在共享集群环境中。如果您的程序需要 root 权限才能运行(或者您正在以 root 身份直接执行 mpirun 命令),但 MPI 库又默认禁止 root 运行,就会出现权限相关的错误。添加此参数可以绕过此安全检查。
在非生产环境或测试中,如果确实需要 root 权限,可以使用此参数。但在生产环境或多用户共享集群中,不推荐以 root 身份运行计算任务,通常应该使用普通用户账户。
mca pml ob1: 这个参数是 MPI Component Architecture (MCA) 的一个选项,用于选择 点对点通信层 (PML - Point-to-Point Messaging Layer) 的具体实现。ob1 是 Open MPI 中一个常用的 PML 组件。
Open MPI 具有高度模块化的架构,允许用户为不同的功能(如点对点通信、集体通信、进程管理等)选择不同的组件。ob1 是 Open MPI 中默认的、也是最常用的 PML 组件之一。它通常使用各种网络接口(如 InfiniBand、Ethernet 等)进行通信。
显式指定 ob1 通常是为了确保使用特定的通信机制,或者解决与默认 PML 相关的兼容性/性能问题。在大多数情况下,如果您不指定,mpirun 也会默认使用 ob1,但显式指定可以确保行为一致性。
np 64: 这个参数指定了要启动的 MPI 进程(或称为“秩”或“rank”)的数量。
MPI 程序通过在多个进程之间分配任务来实现并行。每个进程都有一个唯一的 ID (rank),从 0 到 np-1。
64 表示您希望 xhpcg 程序以 64 个并行进程运行。这些进程可以分布在多个计算节点上,也可以全部运行在单个节点上,这取决于您的 hosts 文件配置和 mpirun 的其他资源调度参数。
软件测试精品书籍文档下载持续更新 https://github.com/china-testing/python-testing-examples 请点赞,谢谢!
本文涉及的python测试开发库 谢谢点赞! https://github.com/china-testing/python_cn_resouce
python精品书籍下载 https://github.com/china-testing/python_cn_resouce/blob/main/python_good_books.md
Linux精品书籍下载 https://www.cnblogs.com/testing-/p/17438558.html
联系方式:钉ding或V信: Pythontesting
https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
https://developer.nvidia.com/nvidia-hpc-benchmarks-downloads?target_os=Linux&target_arch=x86_64
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks
安装Phoronix Test Suite参见:https://www.cnblogs.com/testing-/p/18303322
安装 hpcg
# phoronix-test-suite install hpcg# cd /var/lib/phoronix-test-suite/test-profiles/pts/hpcg-1.3.0
修改配置文件
test-definition.xml的内容:
<?xml version="1.0"?><!--Phoronix Test Suite v10.8.4--><PhoronixTestSuite> <TestInformation> <Title>High Performance Conjugate Gradient</Title> <AppVersion>3.1</AppVersion> <Description>HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC.</Description> <ResultScale>GFLOP/s</ResultScale> <Proportion>HIB</Proportion> <TimesToRun>1</TimesToRun> </TestInformation> <TestProfile> <Version>1.3.0</Version> <SupportedPlatforms>Linux</SupportedPlatforms> <SoftwareType>Benchmark</SoftwareType> <TestType>Processor</TestType> <License>Free</License> <Status>Verified</Status> <ExternalDependencies>build-utilities, fortran-compiler, openmpi-development</ExternalDependencies> <EnvironmentSize>2.4</EnvironmentSize> <ProjectURL>http://www.hpcg-benchmark.org/</ProjectURL> <RepositoryURL>https://github.com/hpcg-benchmark/hpcg</RepositoryURL> <InternalTags>SMP, MPI</InternalTags> <Maintainer>Michael Larabel</Maintainer> </TestProfile> <TestSettings> <Option> <DisplayName>X Y Z</DisplayName> <Identifier>xyz</Identifier> <Menu> <Entry> <Name>104 104 104</Name> <Value>--nx=104 --ny=104 --nz=104</Value> </Entry> <Entry> <Name>144 144 144</Name> <Value>--nx=144 --ny=144 --nz=144</Value> </Entry> <Entry> <Name>160 160 160</Name> <Value>--nx=160 --ny=160 --nz=160</Value> </Entry> <Entry> <Name>192 192 192</Name> <Value>--nx=192 --ny=192 --nz=192</Value> </Entry> </Menu> </Option> <Option> <DisplayName>RT</DisplayName> <Identifier>time</Identifier> <ArgumentPrefix>--rt=</ArgumentPrefix> <Menu> <Entry> <Name>300</Name> <Value>300</Value> <Message>Shorter run-time</Message> </Entry> <Entry> <Name>1800</Name> <Value>1800</Value> <Message>Official run-time</Message> </Entry> </Menu> </Option> </TestSettings></PhoronixTestSuite>
执行测试
# phoronix-test-suite benchmark hpcg Evaluating External Test Dependencies ....................................................................................................................... Phoronix Test Suite v10.8.4 Installed: pts/hpcg-1.3.0 High Performance Conjugate Gradient 3.1: pts/hpcg-1.3.0 Processor Test Configuration 1: 104 104 104 2: 144 144 144 3: 160 160 160 4: 192 192 192 5: Test All Options ** Multiple items can be selected, delimit by a comma. ** X Y Z: 1 1: 300 [Shorter run-time] 2: 1800 [Official run-time] 3: Test All Options ** Multiple items can be selected, delimit by a comma. ** RT: 1 System Information PROCESSOR: ARMv8 @ 2.90GHz Core Count: 128 Cache Size: 224 MB Scaling Driver: cppc_cpufreq performance GRAPHICS: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] Screen: 1024x768 MOTHERBOARD: WUZHOU BC83AMDAA01-7270Z BiOS Version: 11.62 Chipset: Huawei HiSilicon Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892 MEMORY: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET DISK: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N File-System: xfs Mount Options: attr2 inode64 noquota relatime rw Disk Scheduler: MQ-DEADLINE Disk Details: Block Size: 4096 OPERATING SYSTEM: Kylin Linux Advanced Server V10 Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314 Display Server: X Server 1.20.8 Compiler: GCC 7.3.0 + CUDA 12.8 Security: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected Would you like to save these test results (Y/n): y Enter a name for the result file: hpcg_45_31 Enter a unique name to describe this test run / configuration: If desired, enter a new description below to better describe this result set / system configuration under test. Press ENTER to proceed without changes. Current Description: ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite. New Description: High Performance Conjugate Gradient 3.1: pts/hpcg-1.3.0 [X Y Z: 104 104 104 - RT: 300] Test 1 of 1 Estimated Trial Run Count: 1 Estimated Time To Completion: 38 Minutes [03:16 CDT] Started Run 1 @ 02:39:00 X Y Z: 104 104 104 - RT: 300: 69.8633 Average: 69.8633 GFLOP/s Do you want to view the text results of the testing (Y/n): Y hpcg_45_31 ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite. ARMv8: Processor: ARMv8 @ 2.90GHz (128 Cores), Motherboard: WUZHOU BC83AMDAA01-7270Z (11.62 BIOS), Chipset: Huawei HiSilicon, Memory: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET, Disk: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N, Graphics: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support], Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892 OS: Kylin Linux Advanced Server V10, Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314, Display Server: X Server 1.20.8, Compiler: GCC 7.3.0 + CUDA 12.8, File-System: xfs, Screen Resolution: 1024x768 High Performance Conjugate Gradient 3.1 X Y Z: 104 104 104 - RT: 300 GFLOP/s > Higher Is Better ARMv8 . 69.86 |================================================================================================================================================== Would you like to upload the results to OpenBenchmarking.org (y/n): y Would you like to attach the system logs (lspci, dmesg, lsusb, etc) to the test result (y/n): y Results Uploaded To: https://openbenchmarking.org/result/2507083-NE-HPCG4531360
可用浏览器查看测试结果:
# docker pull nvcr.io/nvidia/hpc-benchmarks:25.04# vi HPCG.datHPCG benchmark input file Sandia National Laboratories; University of Tennessee, Knoxville 128 128 128 300# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \ -v $(pwd):/host_data \ nvcr.io/nvidia/hpc-benchmarks:25.04 \ mpirun -np 1 \ /workspace/hpcg.sh \ --dat /host_data/HPCG.dat \ --cpu-affinity 0 \ --gpu-affinity 0 ========================================================= ================= NVIDIA HPC Benchmarks ================= ========================================================= NVIDIA Release 25.04 Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved. This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license WARNING: No InfiniBand devices detected. Multi-node communication performance may be reduced. Ensure /dev/infiniband is mounted to this container. HPCG-NVIDIA 25.4.0 -- NVIDIA accelerated HPCG benchmark -- NVIDIA Build v0.5.6 Start of application (GPU-Only) ... Initial Residual = 2838.81 Iteration = 1 Scaled Residual = 0.185703 Iteration = 2 Scaled Residual = 0.101681 ... Iteration = 50 Scaled Residual = 3.94531e-07 GPU Rank Info: | cuSPARSE version 12.5 | Reference CPU memory = 935.79 MB | GPU Name: 'NVIDIA GeForce RTX 4090' | GPU Memory Use: 2223 MB / 24082 MB | Process Grid: 1x1x1 | Local Domain: 128x128x128 | Number of CPU Threads: 1 | Slice Size: 2048 WARNING: PERFORMING UNPRECONDITIONED ITERATIONS Call [0] Number of Iterations [11] Scaled Residual [1.19242e-14] WARNING: PERFORMING UNPRECONDITIONED ITERATIONS Call [1] Number of Iterations [11] Scaled Residual [1.19242e-14] Call [0] Number of Iterations [1] Scaled Residual [2.94233e-16] Call [1] Number of Iterations [1] Scaled Residual [2.94233e-16] Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 8.42084e-10 Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 4.21042e-10 SpMV call [0] Residual [0] SpMV call [1] Residual [0] Initial Residual = 2838.81 Iteration = 1 Scaled Residual = 0.220178 Iteration = 2 Scaled Residual = 0.118926 ... Iteration = 49 Scaled Residual = 4.98548e-07 Iteration = 50 Scaled Residual = 3.08635e-07 Call [0] Scaled Residual [3.08635e-07] Call [1] Scaled Residual [3.08635e-07] Call [2] Scaled Residual [3.08635e-07] ... Call [1501] Scaled Residual [3.08635e-07] Call [1502] Scaled Residual [3.08635e-07] HPCG-Benchmark version=3.1 Release date=March 28, 2019 Machine Summary= Machine Summary::Distributed Processes=1 Machine Summary::Threads per processes=1 Global Problem Dimensions= Global Problem Dimensions::Global nx=128 Global Problem Dimensions::Global ny=128 Global Problem Dimensions::Global nz=128 Processor Dimensions= Processor Dimensions::npx=1 Processor Dimensions::npy=1 Processor Dimensions::npz=1 Local Domain Dimensions= Local Domain Dimensions::nx=128 Local Domain Dimensions::ny=128########## Problem Summary ##########=Setup Information= Setup Information::Setup Time=0.00910214 Linear System Information= Linear System Information::Number of Equations=2097152 Linear System Information::Number of Nonzero Terms=55742968 Multigrid Information= Multigrid Information::Number of coarse grid levels=3 Multigrid Information::Coarse Grids= Multigrid Information::Coarse Grids::Grid Level=1 Multigrid Information::Coarse Grids::Number of Equations=262144 Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000 Multigrid Information::Coarse Grids::Number of Presmoother Steps=1 Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1 Multigrid Information::Coarse Grids::Grid Level=2 Multigrid Information::Coarse Grids::Number of Equations=32768 Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584 Multigrid Information::Coarse Grids::Number of Presmoother Steps=1 Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1 Multigrid Information::Coarse Grids::Grid Level=3 Multigrid Information::Coarse Grids::Number of Equations=4096 Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336 Multigrid Information::Coarse Grids::Number of Presmoother Steps=1 Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1########## Memory Use Summary ##########=Memory Use Information= Memory Use Information::Total memory used for data (Gbytes)=1.49883 Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0 Memory Use Information::Bytes per equation (Total memory / Number of Equations)=714.697 Memory Use Information::Memory used for linear system and CG (Gbytes)=1.31912 Memory Use Information::Coarse Grids= Memory Use Information::Coarse Grids::Grid Level=1 Memory Use Information::Coarse Grids::Memory used=0.15755 Memory Use Information::Coarse Grids::Grid Level=2 Memory Use Information::Coarse Grids::Memory used=0.0196946 Memory Use Information::Coarse Grids::Grid Level=3 Memory Use Information::Coarse Grids::Memory used=0.00246271########## V&V Testing Summary ##########=Spectral Convergence Tests= Spectral Convergence Tests::Result=PASSED Spectral Convergence Tests::Unpreconditioned= Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11 Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12 Spectral Convergence Tests::Preconditioned= Spectral Convergence Tests::Preconditioned::Maximum iteration count=1 Spectral Convergence Tests::Preconditioned::Expected iteration count=2 Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon= Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=8.42084e-10 Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.21042e-10########## Iterations Summary ##########=Iteration Count Information= Iteration Count Information::Result=PASSED Iteration Count Information::Reference CG iterations per set=50 Iteration Count Information::Optimized CG iterations per set=50 Iteration Count Information::Total number of reference iterations=75150 Iteration Count Information::Total number of optimized iterations=75150########## Reproducibility Summary ##########=Reproducibility Information= Reproducibility Information::Result=PASSED Reproducibility Information::Scaled residual mean=3.08635e-07 Reproducibility Information::Scaled residual variance=0########## Performance Summary (times in sec) ##########=Benchmark Time Summary= Benchmark Time Summary::Optimization phase=0.017375 Benchmark Time Summary::DDOT=6.03317 Benchmark Time Summary::WAXPBY=6.80771 Benchmark Time Summary::SpMV=58.5598 Benchmark Time Summary::MG=227.166 Benchmark Time Summary::Total=298.585 Floating Point Operations Summary= Floating Point Operations Summary::Raw DDOT=9.5191e+11 Floating Point Operations Summary::Raw WAXPBY=9.5191e+11 Floating Point Operations Summary::Raw SpMV=8.54573e+12 Floating Point Operations Summary::Raw MG=4.76988e+13 Floating Point Operations Summary::Total=5.81484e+13 Floating Point Operations Summary::Total with convergence overhead=5.81484e+13 GB/s Summary= GB/s Summary::Raw Read B/W=1200 GB/s Summary::Raw Write B/W=277.327 GB/s Summary::Raw Total B/W=1477.32 GB/s Summary::Total with convergence and optimization phase overhead=1457.89 GFLOP/s Summary= GFLOP/s Summary::Raw DDOT=157.779 GFLOP/s Summary::Raw WAXPBY=139.828 GFLOP/s Summary::Raw SpMV=145.932 GFLOP/s Summary::Raw MG=209.974 GFLOP/s Summary::Raw Total=194.747 GFLOP/s Summary::Total with convergence overhead=194.747 GFLOP/s Summary::Total with convergence and optimization phase overhead=192.185 User Optimization Overheads= User Optimization Overheads::Optimization phase time (sec)=0.017375 User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=0.0396317 DDOT Timing Variations= DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.220609 DDOT Timing Variations::Max DDOT MPI_Allreduce time=0.220609 DDOT Timing Variations::Avg DDOT MPI_Allreduce time=0.220609 Final Summary= Final Summary::HPCG result is VALID with a GFLOP/s rating of=192.185 Final Summary::HPCG 2.4 rating for historical reasons is=193.058 Final Summary::Results are valid but execution time (sec) is=298.585 Final Summary::Official results execution time (sec) must be at least=1800
https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
安装:
# vim setup_make.kunpeng#HEADER# -- High Performance Conjugate Gradient Benchmark (HPCG)# HPCG - 3.1 - March 28, 2019# Michael A. Heroux# Scalable AlGorithms Group, Computing Research Division# Sandia National Laboratories, Albuquerque, NM## Piotr Luszczek# Jack Dongarra# University of Tennessee, Knoxville# Innovative Computing Laboratory## (C) Copyright 2013-2019 All Rights Reserved### -- Copyright notice and Licensing terms:## Redistribution and use in source and binary forms, with or without# modification, are permitted provided that the following conditions# are met:## 1. Redistributions of source code must retain the above copyright# notice, this list of conditions and the following disclaimer.## 2. Redistributions in binary form must reproduce the above copyright# notice, this list of conditions, and the following disclaimer in the# documentation and/or other materials provided with the distribution.## 3. All advertising materials mentioning features or use of this# software must display the following acknowledgement:# This product includes software developed at Sandia National# Laboratories, Albuquerque, NM and the University of# Tennessee, Knoxville, Innovative Computing Laboratory.## 4. The name of the University, the name of the Laboratory, or the# names of its contributors may not be used to endorse or promote# products derived from this software without specific written# permission.## -- Disclaimer:## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS# ``AS IS'' AND ANY Express OR IMPLIED WARRANTIES, INCLUDING, BUT NOT# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.# #######################################################################@HEADER# ----------------------------------------------------------------------# - shell --------------------------------------------------------------# ----------------------------------------------------------------------#SHELL = /bin/sh#CD = cdCP = cpLN_S = ln -s -f MKDIR = mkdir -p RM = /bin/rm -f TOUCH = touch## ----------------------------------------------------------------------# - HPCG Directory Structure / HPCG library ------------------------------# ----------------------------------------------------------------------#TOPdir = . SRCdir = $(TOPdir)/src INCdir = $(TOPdir)/src BINdir = $(TOPdir)/bin## ----------------------------------------------------------------------# - Message Passing library (MPI) --------------------------------------# ----------------------------------------------------------------------# MPinc tells the C compiler where to find the Message Passing library# header files, MPlib is defined to be the name of the library to be# used. The variable MPdir is only used for defining MPinc and MPlib.#MPdir = /usr/lib64/openmpi MPinc = -I$(MPdir)/include MPlib = -L$(MPdir)/lib -lmpi### ----------------------------------------------------------------------# - HPCG includes / libraries / specifics -------------------------------# ----------------------------------------------------------------------#HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc) HPCG_LIBS =## - Compile time options -----------------------------------------------## -DHPCG_NO_MPI Define to disable MPI# -DHPCG_NO_OPENMP Define to disable OPENMP# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous# -DHPCG_DEBUG Define to enable debugging output# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output## By default HPCG will:# *) Build with MPI enabled.# *) Build with OpenMP enabled.# *) Not generate debugging output.#HPCG_OPTS = -DHPCG_NO_OPENMP## ----------------------------------------------------------------------#HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)## ----------------------------------------------------------------------# - Compilers / linkers - Optimization flags ---------------------------# ----------------------------------------------------------------------#CXX = mpicxx#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -WallCXXFLAGS = -O3 -march=armv8-a#LINKER = $(CXX) LINKFLAGS = $(CXXFLAGS)#ARCHIVER = ar ARFLAGS = r RANLIB = echoUSE_CUDA = 0## ----------------------------------------------------------------------#0
执行:
# vim setup_make.kunpeng#HEADER# -- High Performance Conjugate Gradient Benchmark (HPCG)# HPCG - 3.1 - March 28, 2019# Michael A. Heroux# Scalable AlGorithms Group, Computing Research Division# Sandia National Laboratories, Albuquerque, NM## Piotr Luszczek# Jack Dongarra# University of Tennessee, Knoxville# Innovative Computing Laboratory## (C) Copyright 2013-2019 All Rights Reserved### -- Copyright notice and Licensing terms:## Redistribution and use in source and binary forms, with or without# modification, are permitted provided that the following conditions# are met:## 1. Redistributions of source code must retain the above copyright# notice, this list of conditions and the following disclaimer.## 2. Redistributions in binary form must reproduce the above copyright# notice, this list of conditions, and the following disclaimer in the# documentation and/or other materials provided with the distribution.## 3. All advertising materials mentioning features or use of this# software must display the following acknowledgement:# This product includes software developed at Sandia National# Laboratories, Albuquerque, NM and the University of# Tennessee, Knoxville, Innovative Computing Laboratory.## 4. The name of the University, the name of the Laboratory, or the# names of its contributors may not be used to endorse or promote# products derived from this software without specific written# permission.## -- Disclaimer:## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS# ``AS IS'' AND ANY Express OR IMPLIED WARRANTIES, INCLUDING, BUT NOT# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.# #######################################################################@HEADER# ----------------------------------------------------------------------# - shell --------------------------------------------------------------# ----------------------------------------------------------------------#SHELL = /bin/sh#CD = cdCP = cpLN_S = ln -s -f MKDIR = mkdir -p RM = /bin/rm -f TOUCH = touch## ----------------------------------------------------------------------# - HPCG Directory Structure / HPCG library ------------------------------# ----------------------------------------------------------------------#TOPdir = . SRCdir = $(TOPdir)/src INCdir = $(TOPdir)/src BINdir = $(TOPdir)/bin## ----------------------------------------------------------------------# - Message Passing library (MPI) --------------------------------------# ----------------------------------------------------------------------# MPinc tells the C compiler where to find the Message Passing library# header files, MPlib is defined to be the name of the library to be# used. The variable MPdir is only used for defining MPinc and MPlib.#MPdir = /usr/lib64/openmpi MPinc = -I$(MPdir)/include MPlib = -L$(MPdir)/lib -lmpi### ----------------------------------------------------------------------# - HPCG includes / libraries / specifics -------------------------------# ----------------------------------------------------------------------#HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc) HPCG_LIBS =## - Compile time options -----------------------------------------------## -DHPCG_NO_MPI Define to disable MPI# -DHPCG_NO_OPENMP Define to disable OPENMP# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous# -DHPCG_DEBUG Define to enable debugging output# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output## By default HPCG will:# *) Build with MPI enabled.# *) Build with OpenMP enabled.# *) Not generate debugging output.#HPCG_OPTS = -DHPCG_NO_OPENMP## ----------------------------------------------------------------------#HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)## ----------------------------------------------------------------------# - Compilers / linkers - Optimization flags ---------------------------# ----------------------------------------------------------------------#CXX = mpicxx#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -WallCXXFLAGS = -O3 -march=armv8-a#LINKER = $(CXX) LINKFLAGS = $(CXXFLAGS)#ARCHIVER = ar ARFLAGS = r RANLIB = echoUSE_CUDA = 0## ----------------------------------------------------------------------#1
# vim setup_make.kunpeng#HEADER# -- High Performance Conjugate Gradient Benchmark (HPCG)# HPCG - 3.1 - March 28, 2019# Michael A. Heroux# Scalable AlGorithms Group, Computing Research Division# Sandia National Laboratories, Albuquerque, NM## Piotr Luszczek# Jack Dongarra# University of Tennessee, Knoxville# Innovative Computing Laboratory## (C) Copyright 2013-2019 All Rights Reserved### -- Copyright notice and Licensing terms:## Redistribution and use in source and binary forms, with or without# modification, are permitted provided that the following conditions# are met:## 1. Redistributions of source code must retain the above copyright# notice, this list of conditions and the following disclaimer.## 2. Redistributions in binary form must reproduce the above copyright# notice, this list of conditions, and the following disclaimer in the# documentation and/or other materials provided with the distribution.## 3. All advertising materials mentioning features or use of this# software must display the following acknowledgement:# This product includes software developed at Sandia National# Laboratories, Albuquerque, NM and the University of# Tennessee, Knoxville, Innovative Computing Laboratory.## 4. The name of the University, the name of the Laboratory, or the# names of its contributors may not be used to endorse or promote# products derived from this software without specific written# permission.## -- Disclaimer:## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS# ``AS IS'' AND ANY Express OR IMPLIED WARRANTIES, INCLUDING, BUT NOT# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.# #######################################################################@HEADER# ----------------------------------------------------------------------# - shell --------------------------------------------------------------# ----------------------------------------------------------------------#SHELL = /bin/sh#CD = cdCP = cpLN_S = ln -s -f MKDIR = mkdir -p RM = /bin/rm -f TOUCH = touch## ----------------------------------------------------------------------# - HPCG Directory Structure / HPCG library ------------------------------# ----------------------------------------------------------------------#TOPdir = . SRCdir = $(TOPdir)/src INCdir = $(TOPdir)/src BINdir = $(TOPdir)/bin## ----------------------------------------------------------------------# - Message Passing library (MPI) --------------------------------------# ----------------------------------------------------------------------# MPinc tells the C compiler where to find the Message Passing library# header files, MPlib is defined to be the name of the library to be# used. The variable MPdir is only used for defining MPinc and MPlib.#MPdir = /usr/lib64/openmpi MPinc = -I$(MPdir)/include MPlib = -L$(MPdir)/lib -lmpi### ----------------------------------------------------------------------# - HPCG includes / libraries / specifics -------------------------------# ----------------------------------------------------------------------#HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc) HPCG_LIBS =## - Compile time options -----------------------------------------------## -DHPCG_NO_MPI Define to disable MPI# -DHPCG_NO_OPENMP Define to disable OPENMP# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous# -DHPCG_DEBUG Define to enable debugging output# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output## By default HPCG will:# *) Build with MPI enabled.# *) Build with OpenMP enabled.# *) Not generate debugging output.#HPCG_OPTS = -DHPCG_NO_OPENMP## ----------------------------------------------------------------------#HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)## ----------------------------------------------------------------------# - Compilers / linkers - Optimization flags ---------------------------# ----------------------------------------------------------------------#CXX = mpicxx#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -WallCXXFLAGS = -O3 -march=armv8-a#LINKER = $(CXX) LINKFLAGS = $(CXXFLAGS)#ARCHIVER = ar ARFLAGS = r RANLIB = echoUSE_CUDA = 0## ----------------------------------------------------------------------#2