How to make your Matlab code run 100x faster

Contents

An example: the Volterra Equation

The Volterra type II equation is a renewal equation

$$b(t) = f(t) + \int_0^t b(t-a) f(a) da$$

where $b(t)$ is the birth rate at time $t$ needed for keeping the population constant in a population with instantaneous death rate of individuals aged $a$, $f(a)$.

Solving the Volterra equation numerically

We use the trapezoidal rule to discretize the integral term

$$t_i = i h, i = 0,1,...,N, h = \frac{t_{final}}{N},$$ $$b(t_i) = bi,$$ $$f(t_i-a_j) = f_{ij},$$ $$f(t_i) = f_i,$$ $$b_0 = f_0.$$

and

$$(1-1/2 h f_{ii}) b_i = h \left(1/2 f_{i0} b_0 + \sum_{j-1}^{i-1} f_{ij} b_j\right) + f_i$$

This is a linear system

$$A b = f$$

with A a NxN matrix

$$A_{ii} = 1-1/2 h f_{ii}, i=1,...,N,$$ $$A_{ij} = -h f_{ij},$$ $$A_{i0} = -h/2 f_{i0},$$ $$A_{00} = 1.$$

Once $A$ is determined, the Volterra equation is solved with the command

b = A\f

First implementation: MATLAB

Best use of vectorial notation

type volterra_mat.m
function A = volterra_mat(f,dt)
% GENERATE_MATRIX lower triangular NxN matrix

n = length(f);
mf=repmat(f,1,n);
A=spdiags(fliplr(mf'),-(n-1):0,n,n);
A(:,1)=A(:,1)/2;
A=A-diag(diag(A))/2;
A(1,1)=0;
A = speye(n)-dt*A;

Second implementation: MEX file (C source code)

Pure C code with two for loops

Speed test

clear;
volterra_mat_test
speed ratio (matlab/mex): 0.11448
speed ratio (matlab/mex): 35.7465
speed ratio (matlab/mex): 50.0336
speed ratio (matlab/mex): 99.6
speed ratio (matlab/mex): 117.9858
speed ratio (matlab/mex): 56.8965
speed ratio (matlab/mex): 39.681
speed ratio (matlab/mex): 55.7403
speed ratio (matlab/mex): NaN
speed ratio (matlab/mex): NaN

MEX files

Source code can be FORTRAN, C, C++

Example with C

   void
   mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
   {
     ...
   }
mexFunction inputs:

mxArray is the basic type of a matlab variable.

Here, the mex-file voltmat_mex takes two arguments (one vector f and one double dt) and returns a matrix A: nlhs = 1, nrhs = 2

{
       mwSize mrows, ncols;
       double *f; // the pointer to the vector f
       double h; // timestep in the discretization
       f = mxGetPr(prhs[0]);       // assign the pointer to the 1D array
       mrows = mxGetM(prhs[0]);    // get its length
       ncols = mrows;              // not really useful here
       h = mxGetScalar(prhs[1]);
       // create a mrowsxncols matrix
       double *A = mxCalloc(mrows*ncols, sizeof(double));
       // use mxCalloc instead of malloc
       // filling up the matrix A, to solve the Volterra equation as A*x = f
       voltmat(A,f,h,mrows,ncols);
       // initialize the output
       plhs[0]=mxCreateDoubleMatrix(0,0,mxREAL);
       // this creates a pointer to an empty matrix of doubles (with real numbers)
       /* Point mxArray to A */
       mxSetPr(plhs[0], A);    // first output pointer to A
       mxSetM(plhs[0], mrows); // assign the size of A
       mxSetN(plhs[0], ncols); // assign the size of A
       // do not free A, since plhs[0] points to A
       return;
 }

The actual C routine

   void voltmat( double *A, double *f, double h, mwSize m, mwSize n )
   {
       mwIndex i,j;
     for (j = 0; j < n; j++)
       {
         for (i = j; i < n; i++)
           {
         *( A+i+n*j ) = - *(f + i - j) * h;
           }
       }
     *A = 0;
     for (i = 0; i < m; i++)
       {
         *(A + i) = *(A + i)/2;
       }
     for (i = 0; i < m; i++)
       {
         *(A + i*(n+1) ) = 1 + *(A + i*(n+1))/2;
       }
   }

The full C source code

type voltmat_mex.c
/* VOLTMAT_MEX generates a Volterra matrix from the vector f
 * To compile:
 * mex -O -v voltmat_mex.c
 *
 * mandatory input arguments:
 * f: column vector of doubles of length m
 * h: timestep (double scalar)
 */

/* always include mex.h, it contains the Matlab types */
#include "mex.h"
/* #include "matrix.h" */

/* voltmat takes an array of doubles f, a double h, and returns a matrix A */
void voltmat( double *A, double *f, double h, mwSize m, mwSize n )
{
    mwIndex i,j;
  for (j = 0; j < n; j++)
    {
      for (i = j; i < n; i++)
        {
	  *( A+i+n*j ) = - *(f + i - j) * h;
        }
    }
  *A = 0;
  for (i = 0; i < m; i++)
    {
      *(A + i) = *(A + i)/2;
    }
  for (i = 0; i < m; i++)
    {
      *(A + i*(n+1) ) = 1 + *(A + i*(n+1))/2;
    }    
}

void
mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
    mwSize mrows, ncols;
    
    double *f; /* the pointer to the vector f */
    double h; /* timestep in the discretization */
    
    f = mxGetPr(prhs[0]);       /* assign the pointer to the 1D array */
    mrows = mxGetM(prhs[0]);    /* get its length */
    ncols = mrows;              /* not really useful here */
    
    h = mxGetScalar(prhs[1]);
    
    /* create a mrowsxncols array */
    double *A = mxCalloc(mrows*ncols, sizeof(double));
    
    /* filling up the matrix A, 
     * to solve the Volterra equation as A*x = f
     */
    voltmat(A,f,h,mrows,ncols);
    
    /* initialize the output */
    plhs[0]=mxCreateDoubleMatrix(0,0,mxREAL); 
    
    /* Point mxArray to A */
    mxSetPr(plhs[0], A);
    mxSetM(plhs[0], mrows);
    mxSetN(plhs[0], ncols);
    
    /* Do not free A, since the output plhs[0] points to A */
    
    return;   
}

Compiling

First setup the mex compiler. This step is OS-dependent, see Matlab documentation by typing

mex -help

and then compile

mex -O voltmat_mex.c

add the -v switch to display compilation details

This creates a matlab executable file. On MacOS, it has an extension .mexmaci64

The executable can be called as any other Matlab function.

Caveat

Mex-files are pre-compiled, if you have bugs, Matlab will not be able to tell you where the error is.

You cannot interrupt execution of a mex-file from within Matlab (Ctrl-C does not work). To stop the execution, you need to kill Matlab, so make sure you properly save your data before launching possibly long simulations.

It is possible to use debugger tools, but this is for another tutorial.

Second example: The sum of the first N integers

Vectorized notation

type sum_first_n_vect.m
function s = sum_first_n_vect(n)
% SUM_FIRST_N_VECT sums the first N integers, vectorial form

s = sum(1:n);

Loop notation

type sum_first_n_loop.m
function s = sum_first_n_loop(n)
% SUM_FIRST_N_LOOP sums the first N integers, loop form

s = 0;
for i=1:n
    s = s + n;
end

mex code

type sum_first_n_mex.c
/* SUM_FIRST_N_MEX sums the first N integers
 * To compile:
 * mex -O -v sum_first_n_mex.c
 *
 * mandatory input arguments:
 * integer N
 */

/* always include mex.h, it contains the Matlab types */
#include "mex.h"

void
mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
    int n;
    double *s;
    int i;
    
    n = (int)mxGetScalar(prhs[0]);
   
    /* initialize the output */
    plhs[0]=mxCreateDoubleMatrix(1,1,mxREAL);  
    
    s = mxGetPr(plhs[0]);
    
    *s = 0;
    for (i = 1; i <= n; i++)
    {
        *s += i;
    }
    
    return;   
}

CPU times

clear
sum_first_n_test
speed ratio (matlab/mex): 2.0149
speed ratio (matlab/mex): 0.37371
speed ratio (matlab/mex): 0.42946
speed ratio (matlab/mex): 0.69378
speed ratio (matlab/mex): 1.7844
speed ratio (matlab/mex): 3.2112
speed ratio (matlab/mex): 5.8532
speed ratio (matlab/mex): 4.7788
speed ratio (matlab/mex): 5.9796