Version 3.1.4

(TM) Trademark 2008 CapeSym, Inc.

CapeSym |
Table of Contents |

SYMMIC calculates the temperature field in a device by solving the energy conservation equation.

The term on the left multiplies volume density (ρ),
specific heat capacity (c), and the change in temperature with time.
The first term on the right hand side is the divergence of the
product of the conductivity tensor (κ)
and the temperature gradient. (The tensor embodies material thermal
conductivity that may vary with temperature and direction.) The
second term on the right (Q_{v}) is the heat flux per unit
volume.

In steady-state, the equation reduces to

such that volume density and specific heat capacity are not required for solution.

Four types of boundary conditions (BCs) can be applied when solving the energy equation: the Dirichlet BC, in which the temperature is fixed at a known value; the flux BC, a constant or time-varying heat flux on a surface or volume; the film BC, in which a heat transfer coefficient and sink temperature are specified; and the radiation BC that models radiative heat loss to the ambient environment. The default BC is the adiabatic or insulated condition, essentially a zero-flux BC. Device templates typically have all external surfaces as adiabatic, except for a film BC on the bottom surface (backside) of the device.

The film boundary condition obeys Newton's law of cooling:

where the surface flux Q_{A} is equal to a heat transfer
coefficient (h) times the difference between the surface temperature
and a constant environmental temperature (T_{}).
The heat transfer coefficient and temperature are intended to capture
the thermal resistance from the backside of the device or layout to
the reference (e.g. cold plate) temperature. A very large value for
the heat transfer coefficient (e.g. 1 W/µm^{2}
°C) can be used to effectively turn the film BC into a Dirichlet
BC.

For semiconductor transistors, power dissipation is typically due
to joule heating which is modeled as a surface or volume heat flux BC
applied in the active epi-layers. Power
parameters allow the user to set the magnitude of this flux.
The power dissipation and film parameters of a device template are
usually modifiable in a parameter list. See the documentation for a
particular device template for details. Materials properties and
thermal conductivities are also modifiable through the *Materials*
list.

SYMMIC uses the Finite Element Method (FEM) to solve the energy conservation equation. The finite element method reduces the continuous problem to a discrete problem with a finite number of degrees of freedom. The starting point of the method is to transform the strong form of the continuous problem (i.e. the partial differential equations) to a weak variational form (principle of virtual work in structural mechanics) defined in appropriate functional spaces. The second step of the finite element approximation is the subdivision of the problem domain into conforming finite elements. The last step is to discretize the weak form of the equations based on the finite dimensional subspaces associated with the elements. This process is detailed in many books (e.g.[1]). The end result is a matrix equation of the form

where x is the vector of unknown temperatures at the nodes of the mesh.

The mesh resolution is defined in the device template. Details on
the specification of the mesh are given in the appendix, in the
**Points Section** of the **Device Template File Format**.
SYMMIC automatically generates the finite element mesh and solves the
energy conservation equation based on this specification, so that the
template user does not need to be concerned with such details. In
addition, the hexahedral mesh generation algorithms in SYMMIC have
been custom designed for this application, to ensure robustness and
enable high accuracy without creating a mesh that is too big to solve
quickly on desktop hardware.

Template designers should take advantage of the biased meshing capabilities built into the specification of the geometric features (Points and Layers), which allow each feature to be meshed finely in some parts or directions, and meshed more coarsely in other regions or directions. The best way to mesh for thermal analysis is to choose a size attribute for the smallest mesh elements at the edges of the power dissipation regions, along with bias attributes that gradually increase the mesh size away from high temperature gradients. These size and bias attributes should be defined as template parameters so that they can be easily modified from the SYMMC user interface. A simple example of this type of meshing design is the gResistor.xml device template.

Once this specfication is established, the template designer should conduct a mesh resolution study in which the problem is repeatedly solved and the mesh size within and near the heat flux incrementally decreased. The mesh resolution may be considered sufficient when the peak temperature stops increasing as the size parameters are decreased. Another indicator of poor mesh resolution can be slow convergence of the solver and/or the nonlinear fixed-point iterations. Poor mesh resolution across a large, sharp temperature gradient will not only slow down convergence, it can also result in a visibly unstable solution in which large temperature spikes appear in place of the usual smooth transition. Note that aspect ratios of the mesh elemets are not usually an issue, because large aspect ratios are not likely to cause convergence or stability problems for the type of finite elements used by SYMMIC.

The matrix equation is solved using one of three linear solvers.
The table below shows the three solvers and the commands that may be
used to access them on different platforms. The default linear solver
is the Parallel Direct Sparse Solver (PARDISO). The *Run simulation*
command in the Solve menu invokes the PARDISO solver on a Windows
desktop workstation. This is represented in the table by the green
area under the PARDISO heading.

PARDISO uses multi-threading to speed up the matrix inversion
through parallel computations on multi-core systems. An iterative
Preconditioned Conjugate Gradient (PCG) solver is also available on
Windows. This PCG solver requires Message Passing Interface (MPI)
technology for its parallel computations and can only be accessed
through the mpiexec command operating on the xSYMMIC command line
utility. The mpiexec command is only available when the MPI library
has been installed on the platform. For more details on mpiexec,
please see the related section in the **xSYMMIC Simulations**
chapter of the manual.

Linear solvers from the PETSc Library are also available when that library has been installed on a Linux operating system. Use of the PETSc library is not available on a Windows platform. Linux execution is currently only available through the xSYMMIC command line utility. Both PETSc and the built-in PCG solver can operate under mpiexec to parallelize the computation across a cluster, provided the MPI library is installed on each of the compute nodes.

SYMMIC supports solving the FEM problem on a different computer
system from the one running the Windows GUI, through the Remote Run
and Remote Jobs dialogs. These dialogs interact with remote computer
systems that already have licensed copies of xSYMMIC installed.
CapeSym provides a cloud-based Linux machine image with xSYMMIC
pre-installed that can be accessed for a small hourly fee. For
details, please see the section entitled **SYMMIC in the Cloud**.

The linear solver is applied in sequential iterations to resolve the effects of temperature-dependent properties. The fastest solutions are obtained for single device problems in which the material properties do not depend on temperature, since no additional iterations are required. To solve problems with temperature-dependent properties, after each iteration the values of the M and K matrices are updated based on the just-computed temperature solution. These nonlinear iterations will be reported in the console during the computation of a solution in the following format.

Iteration: 1

Fixed Point Residual = 4.71e-05

The "Fixed Point Residual" is the absolute maximum change in temperature at any node in the solution from the temperature at that node on the previous iteration. The first iteration (Iteration: 0) is the linear solution to the matrix equation assuming M and K are constant rather than varying with temperature.

The nonlinear iterations may be disabled by unchecking the box at
the top of the Solver Settings dialog, as shown in the figure below.
The convergence level for the nonlinear iterations can be adjusted
through the *Fixed-point residual tolerance* parameter in the
dialog. The problem is considered to have converged if the "Fixed
Point Residual" is less than this parameter, the minimum value
of which is fixed at 1.0e-4 °C. The default value is1.0e-3 for
accurate results, but it may increased to reduce the number of
nonlinear fixed-point iterations and possibly speed up the
computation of the solution.

**The
Solver Settings Dialog**

To solve layouts, SYMMIC uses a superposition of solutions for individual devices. This approach reduces the size of the mesh that must be solved numerically. Rather than creating a large finite element problem that fully resolves the heating of all devices simultaneously, each device is meshed and solved separately and the results recombined to get the final solution. Since each single device simulation requires less memory and computation time than the full layout, the problem can often be solved using fewer resources.

For a layout involving N devices with individual solutions T_{i}(t),
the full solution T(t) is computed by

where T_{Ø}
is the “null” solution of the non-powered layout. The
null solution captures the influence of both the initial and
environmental temperatures on the full solution, allowing the
individual device solutions to be computed at realistic temperatures.

Solving a layout of devices using superposition uses a different
mesh for each part *T*_{i} in the above equation.
The total solution *T* is represented by a mesh different from
any of the *T*_{i}, as illustrated in the
following graphic. In this example, a layout of four mesa resistors
is solved by summing together the null solution mesh and the meshes
for each device simulated alone. The total solution is summed on a
coarse mesh between the devices and a finer mesh within each device
region. The fine meshes are determined by the device template meshing
specifications, while the uniform spacing of the coarse mesh between
devices is specified by the *Mesh Size* parameter in the Layout
Editor. The *Mesh Size* also determines the grid spacing of the
null solution mesh *T*_{Ø}.

As each individual device is solved during superposition, a biased
(non-uniform) mesh *T*_{i} is created that grows
outward from the device to cover the entire MMIC. This mesh is used
to determine the heat flow from the device to adjacent devices in the
layout. The growth rate of the mesh elements is specified by the *Bias*
parameter in the Layout Editor. The user does not see the individual
meshes used to solve the layout and so the effect of the *Bias*
parameter may not be readily apparent. To view the biased mesh for a
device, delete all of the other devices in the layout and then use
*Show Mesh* in the Model Check dialog.

**Note: For a layout of only one device, the Show Mesh preview
is different from the solution mesh.**

**Superposition
of individual meshes are used to solve a layout.**

Adjacent devices in a layout will, in general, cause the
temperature of a device to be elevated above the level obtained from
single-device simulation. When temperature-dependent material
properties are involved, this may alter the solution for that device.
Strict linear superposition will not be accurate in this case. Once
the magnitude of the effect is known, however, each device solution
can be recalculated taking into account the heating from adjacent
devices. The user has control over whether this second iteration is
performed through the “*Repeat superposition*” check
box in the Solver Settings dialog. The repetition essentially doubles
the computation time for the problem, but may not be necessary if
devices are widely separated and of sufficiently low power, or if
material properties are not strongly affected by temperature.

**Note: When solving a layout that contains only a single
device, “Repeat superposition” is unnecessary because
there are no interactions with other devices.**

Parallel computations in SYMMIC may be divided into two types, those that share a common memory space versus those that do not. Shared memory parallelism often occurs within the cores of a single processor, while separate computers usually have independent memory spaces. These different ways of distributing a problem over multiple cores or computers require different programming constructs. The Message Passing Interface (MPI) is designed for distributed memory systems, where each process has isolated memory, whereas Open Multi-Processing (OpenMP) requires shared memory. OpenMP provides parallelism for a single workstation, and MPI provides parallelism for a group (or “cluster”) of machines. MPI also works on the multiple cores within the shared memory of a single workstation.

The superposition computation offers
a *hybrid parallel* mode, where MPI is used to distribute the
problem over different workstations (or “nodes”) in a
cluster, and OpenMP is used to distribute a portion of problem among
the CPU cores of a node. Layouts containing a large number of devices
may be more efficiently solved by distributing the individual devices
in a superposition across the cores of multiple nodes in a cluster
using MPI. A master-worker approach is used, where the master
process distributes device templates to be solved to all of the other
processes, receives the device solutions, and hands out more work as
needed, until the entire layout solution is completed. Since layouts
are solved by superposition, each of the device template solutions
can be calculated independently of all the others, so this works very
well in parallel. Speedups of over 100x have been achieved. (The
speedup is the serial execution time divided by the parallel
execution time.) This parallel algorithm is known as “Level 2”
superposition to distinguish it from normal, “Level 1”
superposition, which has been a hallmark of SYMMIC's solution
approach for layouts. The figures below illustrate the two
superposition algorithms.

Level 2 superposition works very well when there are many more templates in the layout than there are CPU cores, and each node has enough memory to solve a different template on every core, or at least on most of its cores. At a minimum, the number of templates in the layout should not be less than the number of cores requested from MPI. On Windows, each worker node solves for device temperatures using the PARDISO solver and may benefit from the availability of additional cores for OpenMP parallelism.

**Level
1 Superposition:** There are three cores (n1, n2, n3) which all
work together to solve each device template (T1, T2) in sequence. At
the end of each solution, the main thread (root path) sums the device
into the final solution mesh.

**Level
2 Superposition:** There are three MPI nodes (n1, n2, n3) that work
independently to solve each device template (T1, T2, T3) in parallel.
The n1 process (root path) is the scheduler that doles out problems
to the other nodes to be solved in parallel. When a node finishes one
problem it asks for another one to solve, until all device solutions
have been computed and summed to get the final solution for the
layout.

Level 2 superposition execution speed scales very well with the number of nodes in the cluster. This is demonstrated for a layout with 5039 devices (depicted below) solved on the NERSC Edison supercomputer cluster. As shown in the results below, the run time is still decreasing at 252 cores spread over 42 nodes.

Although the supercomputer has a
very fast interconnect between the nodes, the interconnect is not a
big factor in the execution speed. A 10 Gbit interconnect will
perform just about as well. Excellent performance can also be
obtained by solving on high performance instances in the cloud.
xSYMMIC is now available on the Amazon Marketplace, giving everyone
access to as large a single workstation or cluster as desired. Amazon
only charges hourly for the wall time that the instance is running,
and they provide very high performance hardware, which together makes
for a compelling high-performance computing option for many
companies. See the **SYMMIC in the Cloud**
section for more details.

**A
layout of 5039 devices on a mixed-signal MMIC.**

**Faster
solution of the layout is achieved by level 2 superposition on more
cores.**

The default solver is the Parallel Direct Sparse Solver (PARDISO) in the Intel Math Kernel Libraries (MKL). The PARDISO algorithm was originally developed at the Department of Computer Science, University of Basel [2]. As implemented in the MKL, the PARDISO solver includes a version of the METIS algorithm originally developed in the Karypis lab at the Univeristy of Minnesota [3]. The PARDISO solver in SYMMIC does not use MPI parallelism, but it does use OpenMP parallel constructs.

The PARDISO solver is very fast, but uses a lot of memory. Fast performance is only obtained when there is sufficient physical RAM to hold the entire computation. The peak memory required for PARDISO is dependent on problem size and structure, as well as the number of threads utilized in the computation. The requirement usually lies in the range of 5-10 GigaBytes (GB) per million nodes. A simple estimate of the lower bound of the required memory for the PARDISO solver is 5.5 GB of physical RAM per million nodes. The number of unique problem nodes (temperatures) is reported to the console when the problem is meshed. Hence, it may be a good idea to mesh the problem first before trying to solve it. Use the Show Mesh button in the Model Check dialog to estimate the memory requirement. For example, if Show Mesh reports 10 million nodes then the PARDISO solver will require at least 10*5.5 = 55 GB of RAM for a fast solve. Be aware that memory consumption will grow faster than linear in the number of problem nodes due to multi-threading, as discussed further below.

PARDISO estimates the required memory to solve the problem in the analysis phase. Just before numerical factorization is attempted, this estimate is compared to the memory currently available as reported by the operating system. If there is insufficient memory, numerical factorization will not be attempted. Instead the following message sequence will appear in the console.

Memory available= 18172924 KB Memory needed to solve in-core= 28294654 KB, out-of-core= 6592078 KB Pardiso Error -2 : Not enough memory Unable to complete the simulation run.

In this specific case, the problem to be solved involved 4,968,066 temperature nodes, but the available memory of 17.3 GB was less than the 27 GB that PARDISO estimated it would need to factor a problem of this size. The memory analysis is not fool proof. More memory may be consumed during the solution than initially estimated by PARDISO. Also, it is possible for all of the available memory to be consumed in the analysis phase itself for extremely large problems. However, the PARDISO analysis will generally indicate whether another approach needs to be tried to solve the problem.

PARDISO normally holds the entire matrix factorization in physical
RAM, also known as *in-core* storage. There is a PARDISO option
to store the factorization on the hard drive instead. This
*out-of-core* storage generally requires a lot less RAM than the
in-core solution, but the computation may be twice as slow. In the
above message, PARDISO also estimated that 6.3 GB would be needed for
out-of-core solution. Solving the whole problem out-of-core typically
uses more RAM than the amount estimated. A useful rule of thumb for
the lower bound requirement is 2.75 GB per million nodes, (e.g. 13.7
GB for 4.97 million nodes). If this rule-of-thumb estimate is more
than the memory available, consider using one of the iterative
solvers and **mpiexec** to distribute the memory requirement over
multiple machines. For an example, see the section on **Choosing
Parallel Computing Methods** later in this chapter.

The PARDISO out-of-core option may be selected by checking the box in the Solver Settings dialog. During out-of-core solution, a group of temporary files ending in _ooc.ind, _ooc.jal, _ooc.lup, _ooc.sin, and _ooc.sle will be written to the working directory. This directory will need to have enough disk space to hold the whole factorization. For the 4.97 million node problem, this turns out to be about 25 GB on Windows and 23 GB on Linux. In addition to the files just mentioned, a sequence of .lnz files is generated on Windows. On Linux, a single .lnz file is generated to hold the factorization. Often PARDISO generates a new error message and aborts the computation if the RAM or disk space turns out to be inadequate.

* Known Issue:* In some instances of Linux, the
out-of-core solver will hang just before creating the .lnz file. If
this happens try moving the template to a directory with lots of disk
space that is not in a network file system, and then restart the
computation in that directory.

SYMMIC uses OpenMP threading technology to speed up the finite element computations on multi-core systems. OpenMP recognizes the environment variable OMP_NUM_THREADS as specifying the number of threads available to the application. The PARDISO solver also uses OpenMP technology. The MKL also recognizes environment variables that are independent of OpenMP, such as MKL_NUM_THREADS, and these variables are inspected first, then the OpenMP variables are examined, and if neither is found, PARDISO chooses the default number of threads. The default number of OpenMP threads is equal to the number of physical cores on the system.

Some systems support two independent threads per physical core, effectively doubling the number of “processors” available. For instance, the Task Manager on Windows may show eight processors which are really hyper-threads running on four physical cores. Environment variables should never be defined to use more than the total number of hyper-threads, and may run faster with half that value.

The user may reduce the amount of parallelization in the PARDISO solver by setting the environment variable MKL_NUM_THREADS to a low number. This will slow down the computation but may result in less memory and CPU utilization, freeing computational resources for other tasks. The best practice is to not define any environment variables at all and let the MKL choose the fastest parallelization.

If you just want to set the environment variable locally in a command prompt before running xSYMMIC you can do that with the Windows command

> set MKL_NUM_THREADS= 1

In a Linux shell, use the command

$ export MKL_NUM_THREADS=1

To see the value of the environment variable, type the "set" command without any parameters. This will show all the defined environment variables and their values.

The amount of memory consumed by the PARDISO solver depends on the number of parallel threads it uses, as seen in the figures below. On the left is shown the memory consumed and estimated by PARDSIO on an 8-core machine, while the right image shows memory rates for the same set of problems on a 48-core machine. The analysis phase memory consumed and the estimated RAM for out-of-core factorization both increase linearly with problem size. On large problems, the analysis phase may actually use more memory than the out-of-core factorization. For example, the analysis phase needs 27 GB on the problem with 10.8 million temperatures, while the out-of-core estimate is 16 GB.

Linear estimates are not sufficient for multi-threaded solves on large problems. An exponent (on the number of nodes) larger than 1.1 is needed to keep up with the increase in memory consumption for in-core solves, and this exponent increases when more cores are made avalable to PARDISO for multi-threading.

When MPI parallelism (**mpiexec**) is used on the Windows
command line, the linear solver is switched from PARDISO to an
iterative preconditioned conjugate gradient (PCG) solver with block
Jacobi preconditioning using incomplete LU factorization with zero
fill-in [4]. Partitioning of equations among parallel processes is
performed by the METIS library [3] prior to the PCG computation. When
the number of parallel processes becomes large enough, the PCG
iterative solver will usually be faster than PARDISO, with less
memory usage per machine.

The next figure plots the peak memory used by the PCG solver for
different problem sizes, when running on an Intel Xeon Platinum 8175
processor with 48 cores and 2 hyperthreads per core. The number of
parallel MPI processes is designated by the **-n** flag on the
command line, so “PCG -n 48” indicates that 48 parallel
processes are being used in the computation. The lowest line in the
plot (*orange*) is the total memory used by two parallel
processes. For a problem with a million unknowns, the total memory
used by two processes was 1.6 GB, versus the 6 GB required for
PARDISO. At 48 parallel processes, the PCG memory requirement of
almost 13 GB exceeds that of PARDISO. Going from 2 to 48 processes
results in an 8-fold increase in the peak memory requirement at all
problem sizes.

A reasonably accurate model of the relationship of problem size
(designated by *x* million nodes) and number of parallel
processes (*n*) to the peak memory used, is given by

This formula is represented by the black diamonds on the above
figure. To find the RAM required by the PCG solver for a problem with
*x*=5 million nodes when running on *n*=4 MPI processes, we
would calculate just under 10 GB using this formula. Therefore, the
PCG solver could solve for 5 million temperatures on a Windows
workstation with 4 cores and 32 GB of RAM, while the PARDISO in-core
solver would fail due to insufficient memory on this same problem.

Use the MPI executor on a command line to run the PCG solver instead of PARDISO.

> mpiexec -n 4 xSYMMIC GaNSi_FET5million.xml

For more information about accessing MPI from the command line,
please see the chapter on **xSYMMIC Simulations**. The PCG solver
is only available when using xSYMMIC and **mpiexec**.

The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines for the parallel solution of partial differential equations developed at Argonne National Laboratory [5]. On Linux systems, xSYMMIC can use PETSc (or the built-in PCG solver) as the linear solver instead of PARDISO. The required libraries are included with the xSYMMIC Linux distribution, so no additional installation of PETSc is required. Neither SYMMIC nor xSYMMIC will use PETSc on Windows.

The PETSc solver is selected by using the **-usePETSc** command
line option. The PETSc solver may be selected with or without the
**mpiexec** command. Without the **mpiexec** command, the
**-usePETSc** flag selects the serial (single process) version of
the PETSc solver, instead of the PARDISO solver with full OpenMP
parallelism. When the **mpiexec** command is used, the PETSc
solver is parallelized over the number of processes (or *ranks *to
use the MPI terminology) specified by the **-n** flag. The PETSc
solver does not make use of OpenMP threads, so it is fastest when the
number of processes equals the number of physical cores in the
machine. Always choose as many processes per machine as the available
memory will support, but no more than the number of physical cores.

The solver implemented for the PETSc option is an iterative
preconditioned conjugate gradient (PCG) solver with block Jacobi
preconditioning using incomplete LU factorization with zero fill-in.
This is the same algorithm used by the built-in PCG solver and the
two solvers have comparable performance. Partitioning for the PETSc
solver is based on the PT-SCOTCH library [6]. Options for monitoring
or configuring PETSc may be added to the end of the xSYMMIC command
line following the **-usePETSc** flag. See the section on **xSYMMIC
Simulations** and the PETSc users
manual [5] for more details.

The peak memory required by the PETSc solver is more difficult to predict, but is consistently more than the memory required by the built-in PCG solver. An approximate model for the PETSc solver is

where *n* is the number of processes and *x* is the
problem size in millions of nodes. This model indicates that the
PETSc solver, when running on 48 parallel processes, requires about
8.5 times more memory than it uses when running on just 2 parallel
processes. PARDISO typically requires about 3-4 times as much RAM as
PETSc when only 2 cores are utilized, but when 48 cores are used on a
single machine PARDISO will probably use less memory than either the
PETSc or built-in PCG solvers.

PETSc parallel computation is capable of providing faster solves than PARDISO, even on a single computer, when there a enough physical cores. The graph above summarizes the solution speed on three problem sizes when running on 2-48 processes/threads on a single workstation with an Intel Xeon Platinum 8175 processor. Solution speed is problem-dependent, but in most cases the PARDISO solver is faster than the iterative solvers when running on just a few cores. When the problem is large and distributed to many cores, the PETSc solver is fastest. The PETSc solver is faster than the built-in PCG solver in this situation, because of relaxed convergence criteria that reduce the number of iterations. However, these relaxed convergence criteria could also affect the relative accuracy of the PETSc solver on some problems.

T.J.R. Hughes. The Finite Element Method. Linear Static and Dynamic Finite Element Analysis, Dover Publications, Mineola, NY, 2000.

O. Schenk, K. Gartner, and W. Fichtner. Efficient Sparse LU Factorization with Left-right Looking Strategy on Shared Memory Multiprocessors. BIT, 40(1):158-176, 2000.

G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20(1):359-392, 1998.

Y. Saad. Iterative Methods for Sparse Linear Systems. Second Edition. Society for Industrial and Applied Mathematics, 2003.

S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith, S. Zampini, H. Zhang, (2018) PETSc Users Manual, https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf

C. Chevalier and F. Pellegrini. PT-Scotch: A tool for efficient parallel graph ordering. Parallel Computing 34, 6-8 (2008) 318-331.

CapeSym > SYMMIC
> Users Manual
> Table of Contents

© Copyright 2007-2021 CapeSym, Inc. | 6 Huron Dr. Suite 1B, Natick, MA 01760, USA | +1 (508) 653-7100