/*! \page C_C++  C/C++ Interface and Example

# Interface to construct, factor and solve a hierarchical matrix
Similar to the Fortran interface, the C/C++ interface is named with the prefix "x_". x='z' for double-complex, 'd' for double-real, 'c' for single-complex, and 's' for single-complex. Take double-real precision for example, the caller needs to first define a class/object that can perform either matrix entry evaluation or matrix vector multiplication:
```
#include "dC_BPACK_wrapper.h"
//provide a user-defined class consisting all data and metadata needed for this matrix entry evaluation and/or matvec
class C_QuantApp {
//define your data or metadata here
};
// The entry evaluation function wrapper required by the Fortran code, val returns Z(m,n), F2Cptr is an alias of void*. 1<=m<=N, 1<=n<=N.
inline void C_FuncZmn(int *m, int *n, double *val, F2Cptr quant) {
  C_QuantApp* Q = (C_QuantApp*) quant;
  //call your entry evaluation function defined in C_QuantApp using Q
}

// The entry evaluation function wrapper required by the Fortran code, alldat_loc returns all local entries of Z, F2Cptr is an alias of void*.
inline void C_FuncZmnBlock(int* Ninter, int* Nallrows, int* Nallcols, int* Nalldat_loc, int* allrows, int* allcols, double* alldat_loc, int* rowidx,int* colidx, int* pgidx, int* Npmap, int* pmaps, C2Fptr quant) {
  C_QuantApp* Q = (C_QuantApp*) quant;
	// allrows: 1D array containing the global row indices (in original order starting from 1 to N) stacked together
	// allcols: 1D array containing the global column indices (in original order starting from 1 to N) stacked together
	// alldat_loc: 1D array containing the local entry values defined by pmaps (in column major) stacked together
	// colidx: 1D array containing sizes of columns of each block
	// rowidx: 1D array containing sizes of rows of each block
	// pgidx:  1D array containing the process group number of each block, the number starts from 0
	// Npmap:  number of process groups
	// pmaps:  2D array (Npmapx3) containing number of process rows, number of process columns, and starting process ID in each block

  //call your entry evaluation function defined in C_QuantApp using Q
}

// The matvec function wrapper required by the Fortran libraries, see "subroutine MatVec" of the Fortran interface for meanings of the arguments
inline void C_FuncHMatVec(char const *trans, int *nin, int *nout, int *nvec, double const *xin, double *xout, C2Fptr quant) {
  C_QuantApp* Q = (C_QuantApp*) quant;
  //call your matvec function defined in C_QuantApp using Q
}
```

Initialize ButterflyPACK metadata:
```
F2Cptr bmat;  //hierarchical matrix returned by Fortran code
F2Cptr option;     //option structure returned by Fortran code
F2Cptr stats;      //statistics structure returned by Fortran code
F2Cptr msh;		   //mesh structure returned by Fortran code
F2Cptr kerregister;   //kernel register quantities structure returned by Fortran code
F2Cptr ptree;      //process tree returned by Fortran code
MPI_Fint Fcomm;  // the Fortran MPI communicator
Fcomm = MPI_Comm_c2f(Comm); //Comm is the C MPI communicator provided by the user

// initialize ButterflyPACK metadata
d_c_bpack_createptree(&nmpi, groups, &Fcomm, &ptree); //groups is a int array of size nmpi holding MPI ranks out of communicator Fcomm used for ButterflyPACK
d_c_bpack_createoption(&option);
d_c_bpack_createstats(&stats);

// set ButterflyPACK options other than the default ones, the double and integer options are set with different functions:
d_c_bpack_set_D_option(&option, "name", val); //double-valued option
d_c_bpack_set_I_option(&option, "name", val); //int-valued option
```


Initialization of the construction phase
```
d_c_bpack_construct_init(&N, &Ndim, coordinates, nns_ptr, &nlevel, clustertree, P, &N_loc, &bmat, &option, &stats, &msh, &kerregister, &ptree, &C_FuncDistmn, &C_FuncNearFar, quant);
   //N is matrix dimension
   //coordinates is a double array of size N*Ndim representing Cartesian coordinates x1(1),...,x1(Ndim),x2(1),...,x2(Ndim)....
   //if Ndim=0, coordinates is not referenced
   //nns_ptr is a int array of size N*option%knn representing the knn nearest neighouring points of each of the N points. If option%knn=0, nns_ptr is not referenced.   
   //clustertree is an integer array of size 2^nlevel containing leafsizes in a user-provided cluster tree
   //if nlevel=0, input requires tree(1)=N
   //P is an integer array of size N, representing permutation vector returned by the ButterflyPACK clustering
   //N_loc is the local matrix dimension, P and N_loc can be used to define your matvec if needed
   //C_FuncDistmn: pointer to user-provided function to compute distance between any row and column of the matrix. Not referenced if option%nogeo is not 2. 
   //C_FuncNearFar: pointer to user-provided function to determine whether a block (in permuted order) is compressible or not. Not referenced if option%nogeo is not 2. 
```


Construction of the hierarchical matrix with entry evaluation:
```
d_c_bpack_construct_element_compute(&bmat, &option, &stats, &msh, &kerquant, &ptree, &C_FuncZmn, &C_FuncZmnBlock, quant);
   //C_FuncZmn: pointer to user-provided function to sample mn^th entry of the matrix. Used when option%elem_extract=0.
   //C_FuncZmnBlock: pointer to user-provided function to sample a list of intersections of entries of the matrix. Used when option%elem_extract>=1.
```

Construction of the hierarchical matrix with matrix-vector multiplication:
```
d_c_bpack_construct_matvec_compute(&bmat, &option, &stats, &msh, &kerregister, &ptree, &C_FuncHMatVec, quant);
   //C_FuncHMatVec: pointer to user-provided function to multiply A and A* with vectors (see above)
```

Factorization of the hierarchical matrix:
```
d_c_bpack_factor(&bmat,&option,&stats,&ptree,&msh);
```

Solve of the hierarchical matrix:
```
d_c_bpack_solve(x,b,&N_loc,&nrhs,&bmat,&option,&stats,&ptree);
	// bmat is the meta-data storing the compressed and factored matrix
	// N_loc is the local number of rows/columns
	// nrhs is the number of right-hand sides
	// b of N_loc*nrhs is the local right-hand sides (concatenation of nrhs vectors of length N_loc)
	// x of N_loc*nrhs is the local solution vector (concatenation of nrhs vectors of length N_loc)
```

# Interface to construct, and multiply a low-rank/butterfly block
ButterflyPACK also provides an interface for constructing a single MxN low-rank/butterfly block. Take double-real precision for example, the caller needs to first define a class/object that can perform either matrix entry evaluation or matrix vector multiplication:
```
#include "dC_BPACK_wrapper.h"
//provide a user-defined class consisting all data and metadata needed for this matrix entry evaluation and/or matvec
class C_QuantApp {
//define your data or metadata here
};
// The entry evaluation function wrapper required by the Fortran code, val returns Z(m,n), F2Cptr is an alias of void*. If n<0 (column index) and m>0 (row index), then 1<=m<=M, -N<=n<=-1; if m<0 (column index) and n>0 (row index), then 1<=n<=M, -N<=m<=-1. 
inline void C_FuncBZmn(int *m, int *n, double *val, F2Cptr quant) {
  C_QuantApp* Q = (C_QuantApp*) quant;
  //call your entry evaluation function defined in C_QuantApp using Q
}

// The entry evaluation function wrapper required by the Fortran code, alldat_loc returns all local entries of Z, F2Cptr is an alias of void*.
inline void C_FuncZmnBlock(int* Ninter, int* Nallrows, int* Nallcols, int* Nalldat_loc, int* allrows, int* allcols, double* alldat_loc, int* rowidx,int* colidx, int* pgidx, int* Npmap, int* pmaps, C2Fptr quant) {
  C_QuantApp* Q = (C_QuantApp*) quant;
	// allrows: 1D array containing the global row indices (in original order starting from 1 to N) stacked together
	// allcols: 1D array containing the global column indices (in original order starting from 1 to N) stacked together
	// alldat_loc: 1D array containing the local entry values defined by pmaps (in column major) stacked together
	// colidx: 1D array containing sizes of columns of each block
	// rowidx: 1D array containing sizes of rows of each block
	// pgidx:  1D array containing the process group number of each block, the number starts from 0
	// Npmap:  number of process groups
	// pmaps:  2D array (Npmapx3) containing number of process rows, number of process columns, and starting process ID in each block

  //call your entry evaluation function defined in C_QuantApp using Q
}

// The matvec function wrapper required by the Fortran libraries.
inline void C_FuncBMatVec(char const *trans, int *nin, int *nout, int *nvec, double const *xin, double *xout, C2Fptr quant, double *a, double *b){
  C_QuantApp* Q = (C_QuantApp*) quant;
  //call your matvec function defined in C_QuantApp using Q
}
```

Initialize ButterflyPACK metadata:
```
F2Cptr bf_b;  //hierarchical matrix returned by Fortran code
F2Cptr option_b;     //option structure returned by Fortran code
F2Cptr stats_b;      //statistics structure returned by Fortran code
F2Cptr msh_b;		   //mesh structure returned by Fortran code
F2Cptr kerregister_b;   //kernel register quantities structure returned by Fortran code
F2Cptr ptree_b;      //process tree returned by Fortran code
MPI_Fint Fcomm;  // the Fortran MPI communicator
Fcomm = MPI_Comm_c2f(Comm); //Comm is the C MPI communicator provided by the user

// initialize ButterflyPACK metadata
d_c_bpack_createptree(&nmpi, groups, &Fcomm, &ptree_b); //groups is a int array of size nmpi holding MPI ranks out of communicator Fcomm used for ButterflyPACK
d_c_bpack_createoption(&option_b);
d_c_bpack_createstats(&stats_b);

// set ButterflyPACK options other than the default ones, the double and integer options are set with different functions:
d_c_bpack_set_D_option(&option_b, "name", val); //double-valued option
d_c_bpack_set_I_option(&option_b, "name", val); //int-valued option
```

Initialization of the construction phase. In this interface, ButterflyPACK treats the single block as the 1-2 off-diagonal block of a larger hierarchical matrix. 
```
// one needs to first a mesh metadata for the row and column dimension, by calling d_c_bpack_construct_init. The *_dummy argument is not referenced. 
d_c_bpack_construct_init(&M, &Ndim, dat_ptr_m, nns_ptr_m,&nlevel_m, tree_m, perms_m, &myseg_m, &bmat_dummy, &option_b, &stats_dummy, &msh0_m, &kerquant_dummy, &ptree_b, &C_FuncDistmn_dummy, &C_FuncNearFar_dummy, quant_ptr_dummy);
d_c_bpack_construct_init(&N, &Ndim, dat_ptr_n, nns_ptr_n,&nlevel_n, tree_n, perms_n, &myseg_n, &bmat_dummy, &option_b, &stats_dummy, &msh0_n, &kerquant_dummy, &ptree_b, &C_FuncDistmn_dummy, &C_FuncNearFar_dummy, quant_ptr_dummy);
// the construction initialization can be done by:
d_c_bf_construct_init(&M, &N, &myseg_m, &myseg_n, nns_ptr_m, nns_ptr_n, &msh0_m, &msh0_n, &bf_b, &option_b, &stats_b, &msh_b, &kerregister_b, &ptree_b,&C_FuncDistmn, &C_FuncNearFar, quant);
```

Construction of the block with entry evaluation:
```
d_c_bf_construct_element_compute(&bf_b, &option_b, &stats_b, &msh_b, &kerregister_b, &ptree_b, &C_FuncBZmn, &C_FuncBZmnBlock, quant); 
//C_FuncBZmn: pointer to user-provided function to sample mn^th entry of the matrix. Used when option%elem_extract=0.
//C_FuncBZmnBlock: pointer to user-provided function to sample a list of intersections of entries of the matrix. Used when option%elem_extract>=1.
```

Construction of the block with matrix-vector multiplication:
```
d_c_bf_construct_matvec_compute(&bf_b, &option_b, &stats_b, &msh_b, &kerregister_b, &ptree_b, &C_FuncBMatVec, quant);
   //C_FuncBMatVec: pointer to user-provided function to multiply A and A* with vectors (see above)
```

Application of the constructed block:
```
c_bf_mult(trans, xin, xout, Ninloc, Noutloc, Nvec, bf_b, option_b, stats_b, ptree_b);
   // trans: 'N', 'C', or 'T'
   // xin: input vector of Ninloc*Nvec (concatenation of Nvec vectors of length Ninloc)
   // xout: output vector of Noutloc*Nvec (concatenation of Nvec vectors of length Noutloc)
   // Ninloc: length of local input vectors
   // Noutloc: length of local output vectors
   // Nvec: number of vectors
```

In addition, ButterflyPACK can be invoked from STRUMPACK with more compact C++ interfaces than the above. See http://portal.nersc.gov/project/sparse/strumpack/ for details.

# Example
A number of examples are available in the build/EXAMPLE folder.
You can modify options in the driver or from command line.

InterfaceTest.cpp:
An example using a collection of simple kernels. The kernels are selected by the parameter "ker". ker=1: RBF, 2: R^4, 3: sqrt(R^2+h), 4: 1/sqrt(R^2+h), 5: (X^tY+h)^2, 6: product of two random matrices. The coordinates are generated as follows: tst=1 read from a UCI machine learning dataset, tst=2 generates a random collection of coordinates, tst=3 or 4: no coordinates are generated.
This example first constructs (with entry evaluation), factor a hierarchical matrix and then uses it as matrix-vector multiplication to construct a second hierarchical matrix.
```
mpirun -n nmpi ./EXAMPLE/ctest
```

Taylor2D.cpp:
An example that solves a 2D EFIE assuming inhomogeneous media. A approximate Green's function is used to evaluate any entry of the matrix. 
```
mpirun -n nmpi ./EXAMPLE/go2d
```

FIO_Driver.cpp:
An example that constructs two 1D Fourier integral operators (FIOs) as two butterfly-compressed blocks using entry-evaluation-based APIs, then construct their product as another butterfly-compressed block using matvec-based APIs. 
```
mpirun -n nmpi ./EXAMPLE/cfio
```

InverseFIO_Driver.cpp:
An example that constructs a 1D FIO as a butterlfy block A, computes a rhs by b=A*x_ref, then generates an approximate solution using (A^*A)^-1A^*b. Then matrix A^*A is compressed as an HODLR matrix using matvec-based APIs. 
```
mpirun -n nmpi ./EXAMPLE/cifio
```
*/