The chunk of f90 code at the bottom of this page shows one way to solve a sparse linear system, on a single processor, using PETSc (a parallel version is at using petsc (parallel)). I'll try to solve a discrete version of the elliptic problem
<math>\nabla \cdot (a(x,y) \nabla u) + b(x,y)\,u = c(x,y)</math>

<math>u(0) = 0</math>

<math>u'(L) = d</math>
on a 256x256 grid, with a, b and c set to 1 on the left half of the domain, and to h, r and p on the right - a bit like a grounding line problem.
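For reference, each interior row of the system that the code below assembles is a five-point stencil of roughly this form (grid-spacing factors are folded into the constants, and the face coefficient <math>a_f</math> is 1 on the left half of the domain and <math>h</math> on the right):

<math> b_{ij}\, u_{ij} + \sum_{f \in \{E,W,N,S\}} a_f \,(u_{ij} - u_f) = c_{ij} </math>

where <math>u_f</math> is the value in the neighbouring cell across face <math>f</math>.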
If you're new to PETSc, you might find all of the MatSetValues statements a bit odd. It *is* possible to fill up normal Fortran arrays and then insert them into PETSc structures (see the CSR sketch after the constants below), but the methods I've used below are (a bit) easier to adapt to multiple processors. The constants in the code
<pre>
!x > 1/2 diffusion = h u'
h = 1e-2
!x > 1/2 dissipation = r u , set small r to have problems inverting the matrix
!so that direct solvers win
r = 1e-6
!x > 1/2 source = p
p = 0.5
! u' at x = 1
d = 100
</pre>
were picked to give the solvers a hard time - if they were all 1, the iterative methods below would beat the direct solver decisively in CPU time. I think the strength of PETSc is the number of solvers and preconditioners you get to try without too much fiddling.
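For comparison, here is a minimal sketch of the "fill up ordinary arrays first" route mentioned above, using the MatCreateSeqAIJWithArrays call that the comments in scalar.f90 (at the bottom of the page) point to. The tiny 2x2 system and the ia/ja/aa names are made up purely for illustration; the fragment assumes the same includes and PetscInitialize call as scalar.f90:

<pre>
  ! Sketch: build a matrix from pre-filled CSR arrays instead of MatSetValues.
  ! ia holds 0-based row pointers, ja 0-based column indices, aa the values.
  Mat Acsr
  PetscInt ia(0:2), ja(0:3)
  PetscScalar aa(0:3)
  PetscInt ierr

  ! CSR data for the 2x2 matrix [ 2 -1 ; -1 2 ]
  ia(0) = 0; ia(1) = 2; ia(2) = 4
  ja(0) = 0; ja(1) = 1; ja(2) = 0; ja(3) = 1
  aa(0) = 2.0; aa(1) = -1.0; aa(2) = -1.0; aa(3) = 2.0

  ! rows, columns, row pointers, column indices, values, matrix, error code
  call MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, 2, 2, ia, ja, aa, Acsr, ierr)
</pre>

The downside is that you own the ia/ja/aa storage yourself, and the layout has to be rethought once the matrix is distributed over several processors, which is why the MatSetValues route is used below.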
To run this you first need PETSc; I configured it like this (stuff I compile myself ends up in /local). PETSc is already installed on the musketeers, see below.
./configure --prefix=/local/petsc/opt --download-parmetis=yes --download-mumps=yes --download-f-blas-lapack=yes --download-blacs=yes --download-scalapack=yes --enable-debugging=0 --with-mpi-dir=/local
then followed the install instructions. Most of the options are to do with obtaining and compiling MUMPS; PETSc has interfaces to dozens of third-party solvers and preconditioners.
Then compiled like this (with correct filenames and include / lib dirs)
mpif90 -I /local/petsc/opt/include/ -I /local/include/ -x f95-cpp-input scalar.f90 -L /local/petsc/opt/lib/ -L/local/lib -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -ldmumps -lmumps_common -lparmetis -lmetis -lpord -lscalapack -lblacs -lflapack -lfblas -lX11
On the musketeers, you should do the following:
First load your chosen MPI layer; in this case we'll use MPICH compiled with ifort, but we could also use OpenMPI (with gfortran or ifort):
module add mpich/intel_fc_ifort10.1/1.2.6
Now load the corresponding Petsc module:
module add petsc/mpich-1.2.6/intel_fc_10.1/3.0.0-p11
and then compile using the command line:
<pre>
mpif90 -I $PETSC_DIR/include/ -I /local/include/ -fpp scalar.f90 \
  -L $PETSC_DIR/lib/ -L/local/lib \
  -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -ldmumps -lmumps_common -lparmetis \
  -lmetis -lpord -lscalapack -lblacs -lflapack -lfblas -lX11 -lpthread
</pre>
(There are also debug versions of the libraries built at /opt/local/CentOS-64/petsc/3.0.0-p11/mpich-1.2.6/intel_fc_10.1/debug/lib/.)
Note that there are docs and tutorials (which you can copy) under:
/opt/local/CentOS-64/petsc/petsc-3.0.0-p11
Also, Vicky has noticed that with the intel compilers the order of the command line parameters appears to matter when you come to run programs; so, for example, specify -pc_type before -ksp_type.
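For example, with the options used later on this page that would mean writing:

<pre>
./a.out -pc_type sor -ksp_type bcgs -ksp_monitor
</pre>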
Now run the executable (despite compiling with mpif90 etc., the runs below are serial; see using petsc (parallel) for an example of how to run a PETSc job on more than one processor):
<pre>
>time ./a.out -ksp_monitor
  0 KSP Residual norm 1.559571605727e+04
  1 KSP Residual norm 1.529424636367e+04
  2 KSP Residual norm 1.515888043725e+04
  3 KSP Residual norm 1.497463214399e+04
  4 KSP Residual norm 1.480442438141e+04
  .
  .
  .
541 KSP Residual norm 1.587899253129e-01
542 KSP Residual norm 1.570278720318e-01
543 KSP Residual norm 1.555037845050e-01
 ||Ax - b||/||b|| =  6.395110096044148E-006

real    0m3.702s
user    0m3.651s
sys     0m0.031s
</pre>
These default args give you GMRES(30) preconditioned with ilu(0) - much the same as the SLAP DSLUGM subroutine does. 543 is a lot of GMRES iterations - anything more than 30 and GMRES restarts, by default. It's possible to change this restart number, but instead I'll try a direct solver, MUMPS.
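As an aside, the restart length can be set from the command line with PETSc's -ksp_gmres_restart option (this isn't one of the timed runs on this page, just for reference):

<pre>
time ./a.out -ksp_monitor -ksp_gmres_restart 100
</pre>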
<pre>
> time ./a.out -ksp_monitor -pc_type lu -pc_factor_mat_solver_package mumps
  0 KSP Residual norm 3.232253436059e+07
  1 KSP Residual norm 1.077336304460e-06
 ||Ax - b||/||b|| =  1.173428394516308E-012

real    0m1.377s
user    0m1.263s
sys     0m0.097s
</pre>
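As an aside (not one of the timed runs above): when a factorisation is doing all the work you can skip the Krylov iteration altogether by adding -ksp_type preonly, which is PETSc's usual way of requesting a pure direct solve:

<pre>
time ./a.out -pc_type lu -pc_factor_mat_solver_package mumps -ksp_type preonly
</pre>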
Much better than the GMRES run. But in this case a different choice of iterative solver, biconjugate gradients, works well too (and will use less memory as the problem size increases):
<pre>
time ./a.out -ksp_monitor -ksp_type bcgs
  0 KSP Residual norm 1.559571605727e+04
  1 KSP Residual norm 2.654841192325e+04
  2 KSP Residual norm 3.504875048005e+04
  3 KSP Residual norm 3.727376385776e+04
  .
  .
  .
107 KSP Residual norm 2.332690535574e-01
108 KSP Residual norm 1.976699334772e-01
109 KSP Residual norm 1.560968894387e-01
110 KSP Residual norm 1.366366438749e-01
 ||Ax - b||/||b|| =  3.980990389893450E-006

real    0m0.892s
user    0m0.857s
sys     0m0.014s
</pre>
or, with a different (more scalable) preconditioner, successive over-relaxation:
<pre>
time ./a.out -ksp_monitor -ksp_type bcgs -pc_type sor
  0 KSP Residual norm 9.191868858477e+03
  1 KSP Residual norm 1.399973304466e+04
  2 KSP Residual norm 2.017715315232e+04
  3 KSP Residual norm 2.256365643228e+04
  4 KSP Residual norm 2.245000461834e+04
  .
  .
  .
114 KSP Residual norm 2.371656459961e-01
115 KSP Residual norm 2.201423306979e-01
116 KSP Residual norm 4.348021435156e-02
 ||Ax - b||/||b|| =  5.648286513658119E-006

real    0m0.913s
user    0m0.884s
sys     0m0.013s
</pre>
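Whichever combination you try, it's worth asking PETSc what it actually did: adding -ksp_view prints the full solver and preconditioner configuration, and -ksp_converged_reason reports why the iteration stopped, e.g.

<pre>
time ./a.out -pc_type sor -ksp_type bcgs -ksp_view -ksp_converged_reason
</pre>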
The contents of scalar.f90:
<pre>
!! solve a 2D n x n (a(x,y)u')' + b(x,y) u = c(x,y)
!! with periodic boundary conditions in y, u(x=0,y) = 0 and u'(x=1,y) = d
!! a(x,y) = 1 for x < 1/2, h for x > 1/2
!! b(x,y) = 1 for x < 1/2, r for x > 1/2
!! c(x,y) = 1 for x < 1/2, p for x > 1/2
program main

  implicit none

#include "finclude/petsc.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
#include "finclude/petscksp.h"
#include "finclude/petscpc.h"

  Mat A
  Vec x, b, res
  KSP ksp
  PetscInt ierr, ndof, n, m, i, j
  PetscScalar dfluxe(0:1), dfluxw(0:1), dfluxn(0:1), dfluxs(0:1), su(0:0), sc, &
       h, r, p, d, rnorm, bnorm, one
  PetscInt row(0:0), fcol(0:1)

  n = 256
  m = n/2
  ndof = n * n

  !x > 1/2 diffusion = h u'
  h = 1e-2
  !x > 1/2 dissipation = r u , set small r to have problems inverting the matrix
  !so that direct solvers win
  r = 1e-6
  !x > 1/2 source = p
  p = 0.5
  ! u' at x = 1
  d = 100

  ! this has to be the first petsc call
  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

  ! allocate space for rhs, residual and solution
  call VecCreateSeq(PETSC_COMM_SELF, ndof, x, ierr)
  call VecCreateSeq(PETSC_COMM_SELF, ndof, b, ierr)
  call VecCreateSeq(PETSC_COMM_SELF, ndof, res, ierr)

  !Matrix assembly. I'm doing this in the church of PETSc style, which might
  !look a bit odd. If you already have arrays i,j,a in the CSR form, there is
  !a single call,
  !
  ! call MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, ndof, ndof, i, j, a, A, ierr)
  !
  ! which would do the trick
  !
  ! m is the number of rows, n > m the number of nonzeros
  ! a(0:n-1) contains values and j(0:n-1) column indices
  ! i(0:m) contains row pointers, e.g. the data for row 4 starts
  ! at a(i(4)) and finishes at a(i(5)-1)

  ! create matrix A, allocating 5 nonzeros per row. This doesn't have to be exactly
  ! right, but if it is too small, there will be costly mallocs, and if too
  ! large, additional operations. There is a variant which allows a per-row sparsity
  ! pattern to be set.
  call MatCreateSeqAIJ(PETSC_COMM_SELF, ndof, ndof, 5, PETSC_NULL_INTEGER, A, ierr)

  ! bulk
  dfluxe(0) = 1.0
  dfluxe(1) = -1.0
  su(0) = 1.0
  sc = 1.0

  do i = 1, n-2

     dfluxw(0) = dfluxe(0)
     dfluxw(1) = dfluxe(1)
     dfluxn(0) = 1.0
     dfluxn(1) = -1.0

     if (i > m) then
        su(0) = r
        sc = p
        dfluxe(0) = h
        dfluxe(1) = -h
        dfluxn(0) = h
        dfluxn(1) = -h
     endif
     dfluxs(0) = dfluxn(0)
     dfluxs(1) = dfluxn(1)

     do j = 1, n-2
        row(0) = j*n + i
        ! source term
        call MatSetValues(A, 1, row, 1, row, su, ADD_VALUES, ierr)
        call VecSetValue(b, row(0), sc, INSERT_VALUES, ierr)
        ! fluxes
        fcol(0) = row(0)
        fcol(1) = row(0) + 1
        call MatSetValues(A, 1, row, 2, fcol, dfluxe, ADD_VALUES, ierr)
        fcol(1) = row(0) - 1
        call MatSetValues(A, 1, row, 2, fcol, dfluxw, ADD_VALUES, ierr)
        fcol(1) = row(0) + n
        call MatSetValues(A, 1, row, 2, fcol, dfluxn, ADD_VALUES, ierr)
        fcol(1) = row(0) - n
        call MatSetValues(A, 1, row, 2, fcol, dfluxs, ADD_VALUES, ierr)
     end do

     ! periodic boundaries
     j = 0
     row(0) = j*n + i
     ! source term
     call MatSetValues(A, 1, row, 1, row, su, ADD_VALUES, ierr)
     call VecSetValue(b, row(0), sc, INSERT_VALUES, ierr)
     ! fluxes
     fcol(0) = row(0)
     fcol(1) = row(0) + 1
     call MatSetValues(A, 1, row, 2, fcol, dfluxe, ADD_VALUES, ierr)
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1, row, 2, fcol, dfluxw, ADD_VALUES, ierr)
     fcol(1) = row(0) + n
     call MatSetValues(A, 1, row, 2, fcol, dfluxn, ADD_VALUES, ierr)
     fcol(1) = (n-1)*n + i
     call MatSetValues(A, 1, row, 2, fcol, dfluxs, ADD_VALUES, ierr)

     j = n-1
     row(0) = j*n + i
     ! source term
     call MatSetValues(A, 1, row, 1, row, su, ADD_VALUES, ierr)
     call VecSetValue(b, row(0), sc, INSERT_VALUES, ierr)
     ! fluxes
     fcol(0) = row(0)
     fcol(1) = row(0) + 1
     call MatSetValues(A, 1, row, 2, fcol, dfluxe, ADD_VALUES, ierr)
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1, row, 2, fcol, dfluxw, ADD_VALUES, ierr)
     fcol(1) = row(0) - n
     call MatSetValues(A, 1, row, 2, fcol, dfluxn, ADD_VALUES, ierr)
     fcol(1) = i
     call MatSetValues(A, 1, row, 2, fcol, dfluxs, ADD_VALUES, ierr)

  end do

  !x-boundaries
  dfluxw(0) = h
  dfluxw(1) = -h
  do j = 0, n - 1

     i = 0
     row(0) = j*n + i
     call MatSetValues(A, 1, row, 1, row, su, ADD_VALUES, ierr)

     i = n - 1
     row(0) = j*n + i
     su = r
     sc = h*d
     call MatSetValues(A, 1, row, 1, row, su, ADD_VALUES, ierr)
     call VecSetValue(b, row(0), sc, INSERT_VALUES, ierr)
     fcol(0) = row(0)
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1, row, 2, fcol, dfluxw, ADD_VALUES, ierr)

  end do

  !needs to be done before A can be used
  call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
  call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)

  !KSP context: KSP is petsc's interface to linear solvers
  call KSPCreate(PETSC_COMM_SELF, ksp, ierr)
  !A defines both the operator and the preconditioner (but need not)
  call KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN, ierr)
  !grab all the options from the command line
  call KSPSetFromOptions(ksp, ierr)
  ! solve
  call KSPSolve(ksp, b, x, ierr)

  !check results
  call MatMult(A, x, res, ierr)
  one = -1.0
  call VecAXPY(res, one, b, ierr)
  call VecNorm(res, NORM_2, rnorm, ierr)
  call VecNorm(b, NORM_2, bnorm, ierr)
  write(*,*) "||Ax - b||/||b|| = ", rnorm/bnorm

  !clean up (also free the residual vector and the KSP context)
  call VecDestroy(x, ierr)
  call VecDestroy(b, ierr)
  call VecDestroy(res, ierr)
  call MatDestroy(A, ierr)
  call KSPDestroy(ksp, ierr)

  !last petsc call
  call PetscFinalize(ierr)

end program main
</pre>
Extra code to extract the solution from the PETSc variable:
<pre>
  PetscInt, dimension(:), allocatable :: indx       ! array containing the ordering of the elements in soln
  real(8), dimension(:), allocatable :: f90solnvec  ! solution from the linear solver
  real(8), dimension(:,:), allocatable :: answer
  integer :: ii, jj

  allocate(answer(n,n))
  allocate(f90solnvec(ndof))
  allocate(indx(ndof))

  ! Set up the ordering of the PetSc vector to be extracted back to f90
  do ii = 1, ndof
     j = ii - 1
     indx(ii) = j
  end do

  ! Extract the solution from the PetSc variable
  f90solnvec = 0.0d0
  call VecGetValues(x, ndof, indx, f90solnvec, ierr)

  ! Put into a 2d array
  do j = 1, n
     do i = 1, n
        ii = (j-1)*n + i
        answer(i,j) = f90solnvec(ii)
     end do
  end do
</pre>
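An alternative, if your PETSc build provides the Fortran 90 interface (the finclude/petscvec.h90 include), is to get a pointer straight into the vector's storage with VecGetArrayF90 and skip the index array altogether. A minimal sketch, assuming the same x, n and ndof as above (answer2 and xptr are names made up for this fragment):

<pre>
  ! Sketch: read the solution through a pointer rather than VecGetValues.
  ! Assumes #include "finclude/petscvec.h90" alongside the other includes.
  PetscScalar, pointer :: xptr(:)
  real(8), dimension(:,:), allocatable :: answer2

  allocate(answer2(n,n))

  call VecGetArrayF90(x, xptr, ierr)        ! xptr(1:ndof) now aliases the data in x
  answer2 = reshape(xptr, (/ n, n /))       ! same i-fastest ordering as the loop above
  call VecRestoreArrayF90(x, xptr, ierr)    ! give the array back to PETSc
</pre>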