HemoCell, developed by the team of Prof Alfons Hoekstra at the University of Amsterdam (NL), is a parallel computing framework for the simulation of dense deformable capsule suspensions, with special emphasis on blood flows and blood-related vesicles (cells). The library implements validated mechanical models for red blood cells and can reproduce emergent transport characteristics of such complex cellular systems. HemoCell can handle large simulation domains and high shear-rate flows, providing a virtual environment to evaluate a wide palette of microfluidic scenarios.
The code targets academic and industrial users with an interest in simulating dense cellular suspensions, typically with a focus on blood-related flows. Their applications cover a broad range, e.g., understanding experimental observations using numerical simulation, validating models against experimental data, or investigating numerical aspects of HemoCell such as performance, scalability, and load balancing.
The source code of HemoCell is provided as an open-source library under the AGPL license. It comes with a wide range of illustrative examples and corresponding documentation. HemoCell is easy to compile and runs on a variety of HPC systems, e.g., Snellius (SURF), SuperMUC (LRZ), and MareNostrum (BSC); helper scripts are included to set up the required compilation environments. HemoCell is not yet provided by default on any HPC system.
HemoCell is implemented in C/C++ with parallelism achieved through MPI. The library provides the cellular mechanics and cell transport on top of the fluid simulation, which uses the lattice Boltzmann method as implemented by the underlying Palabos library (also written in C/C++ and publicly available). Currently, the implementation only exploits CPU nodes on HPC systems, i.e., no accelerated hardware is (yet) used. Typically, simulations run with a single MPI process per core available in the job allocation on the HPC system.
Parallelism is managed through a decomposition of the simulated domain. The underlying fluid field is decomposed into blocks of (nearly) identical size. These so-called atomic blocks are distributed across the available MPI processes, and data is exchanged at the block interfaces using standard communication patterns. The suspended cells are assigned to the atomic blocks according to their location. As the simulation progresses, cells move through the domain and need to be redistributed to different MPI processes, potentially causing a load imbalance if cells accumulate in parts of the domain. To counteract such imbalance, HemoCell provides methods to dynamically resize the atomic blocks in addition to the original static domain decomposition.
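The decomposition and the resulting load imbalance can be illustrated with a small sketch. It is not HemoCell or Palabos code: it assumes a one-dimensional domain for brevity, measures load purely by the number of cells per block, and all names (BlockRange, decompose, ownerOf, cellsPerBlock, imbalance) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustration only: none of these names belong to the HemoCell or Palabos API.

// Half-open interval [begin, end) of lattice nodes owned by one block.
struct BlockRange {
    int begin;
    int end;
};

// Split numNodes lattice nodes into numBlocks contiguous ranges whose
// sizes differ by at most one node ("nearly identical size").
std::vector<BlockRange> decompose(int numNodes, int numBlocks) {
    assert(numNodes > 0 && numBlocks > 0 && numBlocks <= numNodes);
    std::vector<BlockRange> blocks;
    blocks.reserve(static_cast<std::size_t>(numBlocks));
    const int base = numNodes / numBlocks;
    const int remainder = numNodes % numBlocks;
    int begin = 0;
    for (int b = 0; b < numBlocks; ++b) {
        // The first `remainder` blocks receive one extra node.
        const int size = base + (b < remainder ? 1 : 0);
        blocks.push_back({begin, begin + size});
        begin += size;
    }
    return blocks;
}

// Index of the block that owns position x (lattice units), or -1 if x is
// outside the domain.
int ownerOf(double x, const std::vector<BlockRange>& blocks) {
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        if (x >= blocks[b].begin && x < blocks[b].end) {
            return static_cast<int>(b);
        }
    }
    return -1;
}

// Number of cells located in each block.
std::vector<int> cellsPerBlock(const std::vector<double>& cellPositions,
                               const std::vector<BlockRange>& blocks) {
    std::vector<int> counts(blocks.size(), 0);
    for (double x : cellPositions) {
        const int b = ownerOf(x, blocks);
        if (b >= 0) {
            ++counts[static_cast<std::size_t>(b)];
        }
    }
    return counts;
}

// Load imbalance as maximum load divided by mean load; 1.0 means the cells
// are spread perfectly evenly over the blocks.
double imbalance(const std::vector<int>& counts) {
    assert(!counts.empty());
    const double total = std::accumulate(counts.begin(), counts.end(), 0.0);
    assert(total > 0.0);
    const double mean = total / static_cast<double>(counts.size());
    return *std::max_element(counts.begin(), counts.end()) / mean;
}
```

Calling decompose(10, 3) yields the ranges [0, 4), [4, 7), and [7, 10). If three of four cells then sit in the first block, imbalance reports 2.25, i.e. the busiest process carries more than twice the average load. This is the kind of situation that dynamic resizing of the atomic blocks is meant to counteract.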
HemoCell has few dependencies: to compile the library, a user only requires a C/C++ compiler, MPI libraries, and a build system (e.g., CMake and Make). After compilation, the user can link with HDF5 to enable output of raw data. For pre-processing, HemoCell comes bundled with a cell packer that generates initial red blood cell positions for various simulation domains. For post-processing, Python scripts are available that enable visualisation using ParaView or rendering through Blender.
HPC usage and parallel performance
The need for HPC is driven by the desire to simulate ever-increasing blood volumes. This requires larger simulation domains as well as larger embedded vesicle counts. Additionally, to recreate complete experimental setups, the simulations need to cover longer time frames. The combination of large domains, many cells, and many iterations requires modern HPC systems to obtain solutions in practical run times.
Ongoing research aims to investigate and improve the parallel performance of HemoCell. Current measurements show reasonable scaling of both the fluid and cellular parts, with an efficiency of about 70% as tested on Cartesius on up to 512 nodes with 24 cores each (12,288 cores). In the coming years, we aim to increase this scaling efficiency and reduce the overall runtime by providing support for accelerated hardware (GPU). Simultaneously, advanced load-balancing methods are being developed to maintain a uniform load on heterogeneous hardware and under continuously changing cell distributions. Furthermore, we aim to extend HemoCell’s features to align with future experiments by including a variety of new boundary conditions and cell behaviours.
Multiple numerical examples are provided that can act as “proxy” problems. These examples have known outcomes and can be used as benchmark references when implementing support for new hardware, load-balancing, or parallelisation strategies. Moreover, a dedicated example is provided that is used exclusively for performance measurements (weak and strong scaling), which can be recreated by users on a variety of HPC environments.
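When evaluating such weak and strong scaling runs, the parallel efficiency is commonly computed from measured wall-clock times as sketched below. The definitions are the standard textbook ones, not necessarily the exact procedure used for the HemoCell measurements, and the function names are hypothetical.

```cpp
#include <cassert>

// Illustration only: these helpers are not part of HemoCell.

// Strong scaling: the total problem size is fixed while the core count grows.
// Ideal behaviour halves the run time when the core count doubles, so
// efficiency = (refTime * refCores) / (time * cores).
double strongScalingEfficiency(double refTime, int refCores,
                               double time, int cores) {
    assert(refTime > 0.0 && time > 0.0 && refCores > 0 && cores > 0);
    return (refTime * refCores) / (time * cores);
}

// Weak scaling: the problem size per core is fixed, so the ideal run time
// stays constant as cores are added, and efficiency = refTime / time.
double weakScalingEfficiency(double refTime, double time) {
    assert(refTime > 0.0 && time > 0.0);
    return refTime / time;
}
```

For example, a run that takes 100 s on 1 core and 25 s on 8 cores has a strong scaling efficiency of 0.5, while a weak scaling run whose time grows from 10 s to 12.5 s has an efficiency of 0.8.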