Using petsc

The chunk of f90 code at the bottom of this page shows one way to solve a sparse linear system, on a single processor, using PETSc. If you're new to PETSc, you might find all of the MatSetValues statements a bit odd. It *is* possible to fill up normal Fortran arrays and then insert them into PETSc structures (see the sketch below), but the methods I've used here are (a bit) easier to adapt to multiple processors. The constants in the code were picked to give the solvers a hard time - if they were all 1, the iterative methods I try below would beat the direct solvers decisively in CPU time. I think the strength of PETSc is the number of solvers and preconditioners you get to try without too much fiddling.
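On the array point: if you already have the matrix in compressed sparse row (CSR) form in ordinary Fortran arrays, one call wraps them in a PETSc Mat. A minimal sketch - ia, ja, aa and nnz are hypothetical names, and the same call is mentioned in the comments of stuff.f90 below:

  ! hypothetical CSR arrays, 0-based: row pointers ia(0:ndof), column
  ! indices ja(0:nnz-1) and values aa(0:nnz-1), filled elsewhere
  call MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, ndof, ndof, ia, ja, aa, A, ierr)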

To run this, first you need PETSc. I configured it like this:

./configure --prefix=/opt/petsc/opt --download-parmetis=yes --download-mumps=yes --download-lapack-f-blas=yes --download-blacs=yes --download-scalapack=yes --enable-debugging=0

and then followed the install instructions. Most of the options are to do with obtaining and compiling MUMPS; PETSc has interfaces to dozens of third-party solvers and preconditioners.
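All of these can be chosen at run time rather than compile time. One quick way to see what a particular build offers - a sketch, assuming the a.out executable built below, and the exact output format varies between PETSc versions - is the built-in option help:

./a.out -help | grep -E 'ksp_type|pc_type'

which picks out, from a great deal of other output, the lines listing the available Krylov methods and preconditioners.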

Then I compiled like this (with the correct filenames and include/lib directories):

mpif90 -I /opt/petsc/debug/include/ -I /usr/include/mpi -x f95-cpp-input stuff.f90 -L /opt/petsc/debug/lib/ -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -ldmumps -lmumps_common -lparmetis -lmetis -lpord -lscalapack -lblacs -llapack -lblas

and ran it:

>time ./a.out -ksp_monitor

0 KSP Residual norm 1.577558167459e+04 
1 KSP Residual norm 1.555235947175e+04 
2 KSP Residual norm 1.553510018819e+04 
3 KSP Residual norm 1.547596840402e+04 
.
.
.
772 KSP Residual norm 1.626026650646e-01 
773 KSP Residual norm 1.613790955161e-01 
774 KSP Residual norm 1.604599443998e-01 
775 KSP Residual norm 1.595287324841e-01 
776 KSP Residual norm 1.584856433220e-01 
777 KSP Residual norm 1.575598961059e-01 
||Ax - b||/||b|| =   6.41128982641614296E-005

real	0m16.660s
user	0m16.220s
sys	0m0.130s

These default args give you GMRES(30) preconditioned with ILU(0) - much the same as the SLAP DSLUGM subroutine does. 777 is a lot of GMRES iterations - anything more than 30 and GMRES 'restarts' by default. It's possible to change this number from the command line too.
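For example, something like this should raise the restart length to 100 (a sketch, not timed here; -ksp_gmres_restart is the relevant run-time option):

time ./a.out -ksp_monitor -ksp_gmres_restart 100

Instead, though, I'll try a direct solver, MUMPS: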

> time ./a.out -ksp_monitor -ksp_rtol 1e-6 -pc_type lu -pc_factor_mat_solver_package mumps
  0 KSP Residual norm 6.530481861352e+07 
  1 KSP Residual norm 4.428670841945e-05 
 ||Ax - b||/||b|| =   1.34874191271912861E-011

real	0m2.315s
user	0m1.980s
sys	0m0.150s

Much better. But in this case a different choice of iterative solver, stabilised biconjugate gradients (bcgs), works well too:

time ./a.out -ksp_monitor -ksp_type bcgs
  0 KSP Residual norm 1.577558167459e+04 
  1 KSP Residual norm 2.458531823848e+04 
  2 KSP Residual norm 4.125657503138e+04 
  3 KSP Residual norm 5.334534834228e+04 
...
114 KSP Residual norm 1.903431640500e-01 
115 KSP Residual norm 1.841847114437e-01 
116 KSP Residual norm 1.504458548881e-01 
||Ax - b||/||b|| =   1.84931214581757455E-005

real	0m2.156s
user	0m2.070s
sys	0m0.050s


Or, with a different (more scalable) preconditioner, successive over-relaxation:

time ./a.out -ksp_monitor -ksp_type bcgs -pc_type sor
0 KSP Residual norm 9.212540695739e+03 
1 KSP Residual norm 1.469842459889e+04 
2 KSP Residual norm 2.561769554622e+04 
3 KSP Residual norm 3.431585140607e+04 
...
130 KSP Residual norm 1.262744059236e-01 
131 KSP Residual norm 1.046774705728e-01 
132 KSP Residual norm 9.862283924313e-02 
133 KSP Residual norm 8.489509088012e-02 
 ||Ax - b||/||b|| =   3.52960129111959793E-005

real	0m2.797s
user	0m2.650s
sys	0m0.090s

Contents of stuff.f90:

!! solve, on a 2D n x n grid, (a(x,y)u')' + b(x,y) u = c(x,y)
!! with periodic boundary conditions in y, u(x=0,y) = 0 and u'(x=1,y)= d
!! a(x,y) = 1 for x < 1/2, h for x > 1/2
!! b(x,y) = 1 for x < 1/2, r for x > 1/2 
!! c(x,y) = 1 for x < 1/2, p for x > 1/2 

program main

  implicit none

#include "finclude/petsc.h"
#include "finclude/petscvec.h"
#include "finclude/petscmat.h"
#include "finclude/petscksp.h"
#include "finclude/petscpc.h"
 
  
  Mat A
  Vec x,b, res
  KSP ksp
  PetscInt ierr, ndof, n, m, i , j
  PetscScalar dfluxe(0:1), dfluxw(0:1), dfluxn(0:1), dfluxs(0:1), su(0:0), sc, &
       h, r, p , d, rnorm, bnorm, one
  PetscInt row(0:0), fcol(0:1)
  
  n = 256
  m = n/2
  ndof = n * n

  !x > 1/2 diffusion = h u'
  h = 1e-2
  !x > 1/2 dissipation = r u ; set r small (e.g. 1e-12) to make the matrix
  !hard to invert, so that the direct solvers win
  r = 1e-6
  !x > 1/2 source = p
  p = 0.5
  ! u' at x = 1
  d = 100
  

  ! this has to be the first petsc call 
  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

  ! allocate space for the rhs, residual and solution vectors
  call VecCreateSeq(PETSC_COMM_SELF, ndof, x, ierr);
  call VecCreateSeq(PETSC_COMM_SELF, ndof, b, ierr);
  call VecCreateSeq(PETSC_COMM_SELF, ndof, res, ierr);

  !Matrix assembly. I'm doing this in the church-of-PETSc style, which might
  !look a bit odd. If you already have arrays i, j, a in CSR form, there is
  !a single call,
  !
  ! call MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, ndof, ndof , i, j, a, A, ierr)
  !
  ! which would do the trick.
  !
  ! Here m is the number of rows and n the number of nonzeros:
  ! a(0:n-1) contains the values and j(0:n-1) the column indices;
  ! i(0:m) contains the row pointers, e.g. the data for row 4 starts
  ! at a(i(4)) and finishes at a(i(5)-1)

  ! create matrix A, allocating 5 nonzeros per row. This doesn't have to be exactly
  ! right, but if it is too small there will be costly mallocs, and if too
  ! large, additional operations. There is a variant which allows a per-row sparsity
  ! pattern to be set (a sketch follows the call below).
  call MatCreateSeqAIJ(PETSC_COMM_SELF, ndof, ndof, 5, PETSC_NULL_INTEGER,A,ierr)
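  ! (the per-row variant is the same call with an array of counts in place of
  !  the single 5, e.g. a hypothetical PetscInt nnz(0:ndof-1) giving the number
  !  of nonzeros in each row:
  !  call MatCreateSeqAIJ(PETSC_COMM_SELF, ndof, ndof, 0, nnz, A, ierr) )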
  
  
  ! bulk
  dfluxw(0) = 1.0
  dfluxw(1) = -1.0
 
  su(0) = 1.0
  sc = 1.0

  do i = 1, n-2

     dfluxe(0) = dfluxw(0)
     dfluxe(1) = dfluxw(1)

     
     dfluxs(0) = 1.0
     dfluxs(1) = -1.0

  

     if (i > m) then
        su(0) = r
        sc = p
        dfluxw(0) = h
        dfluxw(1) = -h
        dfluxs(0) = h
        dfluxs(1) = -h
     endif
     
     dfluxn(0) = dfluxs(0)
     dfluxn(1) = dfluxs(1)
    
     do j = 1, n-2
          
        row(0) = j*n + i
        
        
        ! source term
        call MatSetValues(A, 1 ,row, 1, row, su, ADD_VALUES,ierr)
        call VecSetValue( b, row(0) , sc , INSERT_VALUES, ierr)

        ! fluxes
        fcol(0) = row(0)

        fcol(1) = row(0) + 1
        call MatSetValues(A, 1 ,row, 2, fcol, dfluxe, ADD_VALUES,ierr)
        
        fcol(1) = row(0) - 1
        call MatSetValues(A, 1 ,row, 2, fcol, dfluxw, ADD_VALUES,ierr)
        
        fcol(1) = row(0) + n
        call MatSetValues(A, 1 ,row, 2, fcol, dfluxn, ADD_VALUES,ierr)
        
        fcol(1) = row(0) - n
        call MatSetValues(A, 1 ,row, 2, fcol, dfluxs, ADD_VALUES,ierr)
      

     end do

     ! periodic boundaries
     j = 0
     row(0) = j*n + i
     ! source term
     call MatSetValues(A, 1 ,row, 1, row, su, ADD_VALUES,ierr)
     call VecSetValue( b, row(0) , sc , INSERT_VALUES, ierr)

     ! fluxes
     fcol(0) = row(0)

     fcol(1) = row(0) + 1
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxe, ADD_VALUES,ierr)
        
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxw, ADD_VALUES,ierr)
        
     fcol(1) = row(0) + n
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxn, ADD_VALUES,ierr)
        
     fcol(1) = (n-1)*n + i
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxs, ADD_VALUES,ierr)
     

     j = n-1
     row(0) = j*n + i
     ! source term
     call MatSetValues(A, 1 ,row, 1, row, su, ADD_VALUES,ierr)
     call VecSetValue( b, row(0) , sc , INSERT_VALUES, ierr)

     ! fluxes
     fcol(0) = row(0)

     fcol(1) = row(0) + 1
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxe, ADD_VALUES,ierr)
        
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxw, ADD_VALUES,ierr)
        
     fcol(1) = row(0) - n
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxn, ADD_VALUES,ierr)
        
     fcol(1) = i
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxs, ADD_VALUES,ierr)
     
     


  end do
   
  !x-boundaries

  dfluxw(0) = h
  dfluxw(1) = -h

  do j = 0, n - 1
  
     i = 0
     row(0) = j*n + i
     call MatSetValues(A, 1 ,row, 1, row, su, ADD_VALUES,ierr)

     i = n - 1
     row(0) = j*n + i
     su = r
     sc = h*d
     call MatSetValues(A, 1 ,row, 1, row, su, ADD_VALUES,ierr)
     call VecSetValue( b, row(0) , sc , INSERT_VALUES, ierr)
     fcol(0) = row(0)
     fcol(1) = row(0) - 1
     call MatSetValues(A, 1 ,row, 2, fcol, dfluxw, ADD_VALUES,ierr) 
    
     


  end do


  !needs to be done before A can be used
  call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
  call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)

 
  !KSP context: KSP is petsc's interface to linear solvers
  call KSPCreate(PETSC_COMM_SELF,ksp,ierr)

  !A defines both the operator and the preconditioner (but need not)
  call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)

  !grab all the options from the command line
  call KSPSetFromOptions(ksp,ierr)

  ! solve
  call KSPSolve(ksp,b,x,ierr)

  !check results
  call MatMult(A, x, res, ierr)
  one = -1.0
  call VecAXPY(res , one , b, ierr)
  call VecNorm(res , NORM_2, rnorm, ierr) 
  call VecNorm(b , NORM_2, bnorm, ierr)
  write(*,*) "||Ax - b||/||b|| = ", rnorm/bnorm


  !clean up
  call VecDestroy(x,ierr)
  call VecDestroy(b,ierr)
  call VecDestroy(res,ierr)
  call MatDestroy(A,ierr)
  call KSPDestroy(ksp,ierr)

  !last petsc call
  call PetscFinalize(ierr)

end program main