My research work primarily deals with High-Performance Computing, including topics such as task-based parallel runtime systems, parallel programming languages, communication libraries, and performance tuning.
Many computer applications of high societal, economic, and scientific value
require vast amounts of processing power to solve problems of interest. The
High-Performance Computing (HPC) research domain is twofold. On the hardware
side, it is in charge of building supercomputers able to supply the necessary
processing power. On the software side, it aims at proposing programming
models, methods, environments and tools to let applications harness this power.
My research focuses on this software side.
    
    
    
    
- Begin: 2022
- End: 2025
- Duration: 36 months
- Participants:
    - Inria Team STORM
    - SIMULA HPC Department
 
- Personal role: co-Principal Investigator
The Maelstrom associate team aims to build on the potential for synergy between
STORM and SIMULA to extend the effectiveness of FEniCS on heterogeneous,
accelerated supercomputers, while preserving its friendliness for scientific
programmers, and to readily make the broad range of applications on top of
FEniCS benefit from Maelstrom’s results.
 
- Begin: 2021
- End: 2025
- Duration: 42 months
- Participants:
    - University of Bordeaux
    - Inria
    - University of Strasbourg
    - Karlsruhe Institute of Technology
    - Simula
    - Zuse Institute Berlin
    - Megware
    - Numericor
    - OROBIX
    - Università di Pavia
    - Università della Svizzera Italiana
 
- Personal role: Participant
MICROCARD is a European research project to build software that can simulate
cardiac electrophysiology using whole-heart models with sub-cellular
resolution, on future exascale supercomputers. It is funded by the EuroHPC call
"Towards Extreme Scale Technologies and Applications".
 
    
    
    
    
        
        
            
- Begin: 2018
- End: 2021
- Duration: 39 months
- Participants:
    - ICCS
    - Linköping University
    - CERTH
    - Inria
    - Jülich
    - Maxeler
    - CNRS
    - University of Macedonia
 
- Personal role: Participant
The vision of EXA2PRO is to develop a programming environment that will enable the productive deployment of highly parallel applications on exascale computing systems. The EXA2PRO programming environment will integrate tools that address significant exascale challenges. It will support a wide range of scientific applications, provide tools for improving source code quality, enable efficient exploitation of the heterogeneity of exascale systems, and integrate tools for data and memory management optimization. Additionally, it will provide fault-tolerance mechanisms, both user-exposed and at the runtime system level, as well as performance monitoring features. EXA2PRO will be evaluated using four applications from three different domains, deployed at the Jülich supercomputing centre: high-energy physics, materials, and supercapacitors.
 
- Begin: 2017
- End: 2019
- Duration: 28 months
- Personal role: Participant
The PRACE 5th Implementation Phase project gathers 25 European entities (supercomputing centers, research institutes, universities) to implement and develop the ecosystem that enables effective exploitation of the capacity of national supercomputing centers, through its supporting software offering, training, and the development of strategies and good practices. The contribution of Team STORM targets the evolution of the KStar OpenMP compiler and the development of the OpenMP standard for the transparent management of accelerators and distributed-memory nodes.
 
- Begin: 2015
- End: 2018
- Duration: 36 months
- Participants:
    - EPCC/University of Edinburgh
    - Barcelona Supercomputing Center
    - KTH
    - Inria
    - Fraunhofer Institute
    - DLR
    - T-Systems Software
    - University Jaume I
    - University of Manchester
 
- Personal role: Local Contact
The objective of Project INTERTWinE is to implement, develop, and promote interoperability between programming interfaces, runtime systems, and numerical libraries, in order to enable their joint use within high performance computing applications. On the one hand, the contribution of Team STORM focusses on enabling the StarPU task-based runtime system to dynamically adapt its usage of computing resources (computing cores, accelerators) to instantaneous requirements, and to lend and reclaim resources in coordination with other runtime systems. On the other hand, it also focusses on the evolution of the OpenMP parallel language specification, in order to integrate elements of resource sharing and negotiation.
 
- Begin: 2013
- End: 2017
- Duration: 42 months
- Participants:
    - Barcelona Supercomputing Center
    - Bull SAS
    - STMicroelectronics SAS
    - ARM
    - Juelich Supercomputing Center
    - LRZ
    - Univ. Stuttgart
    - CINECA
    - CNRS
    - Inria
    - CEA
    - Univ. Bristol
    - Allinea Software
 
- Personal role: Participant, Local Contact (2015/12–2017/03)
Project Mont-Blanc 2 aims at developing the software stack of the Mont-Blanc experimental platform, exploring ARM-based computing node alternatives for supercomputers, especially from the performance point of view. The analysis and reassembly engine of the MAQAO software (initially from Team Runtime, later from Team STORM) has been developed within this context.
 
- Begin: 2012
- End: 2014
- Duration: 36 months
- Participants:
    - Brazil
    - Mexico
    - Spain
    - France
 
- Personal role: Local Contact
Designing and porting geophysics applications on contemporary supercomputers requires strong expertise in order to master their heterogeneous architectures made of multi-core processors and accelerator boards. The goal of Project HPC-GA is to study and extend runtime support for the efficient exploitation of these machines within geophysics simulation applications. The contribution of Team Runtime has been the extension of the StarPU and ForestGOMP software components, and their use in the project’s applications.
 
- Begin: 2010
- End: 2013
- Duration: 36 months
- Participants:
        
    
- Personal role: Participant
The goal of this project is to help establish the programming models, languages and software technologies needed to explore high performance computing beyond petascale, on the way to exascale, on French and Japanese supercomputers. The contribution of Team Runtime has mainly focussed on interfacing our StarPU and NewMadeleine software components with software components of our Japanese peers.
 
- Begin: 2008
- End: 2010
- Duration: 24 months
- Participants:
    - University of Lisbon
    - University of Evora
    - Inria Team RUNTIME
 
- Personal role: Participant
This collaboration focusses on the design of constraint solving engines (such as GNU Prolog) for parallel architectures, on top of the runtime systems designed within the Runtime team. One of the goals is notably to exploit heterogeneous computing hardware such as the IBM Cell processor, which shows an interesting potential for this specific use. The collaboration effort has mainly been directed towards the exploitation of the Marcel thread library within the constraint solvers designed by the Portuguese teams.
 
- Begin: 1998
- End: 2001
- Duration: 48 months
- Participants:
    - UNH CS Dept.
    - Inria Team REMAP
 
- Personal role: Participant
The first collaboration project (1998-99) between the group of Luc Bougé (LIP ENS Lyon) and the group of Phil Hatcher (University of New Hampshire) was jointly funded by Inria and the NSF. It aimed at studying the use of distributed multithreading for supporting parallel applications developed with high-level languages. During this first phase, I made a 3-month visit to the computer science laboratory of the UNH, funded by the project, on the topic of parallel, multithreaded file servers. The success of this collaboration led to a second collaborative project (2000-01) centered on the study of system support for Java programs on clusters of workstations, for which Madeleine 2 was used as the communication support.
  
     
        
        
            
  
     
        
        
            
- Begin: 2019
- End: 2023
- Duration: 48 months
- Participants:
    - CNRS-IRIT
    - Inria Bordeaux
    - Inria Lyon
    - CEA CESTA
    - Airbus Group Innovations
 
- Personal role: Participant
SOLHARIS aims at achieving strong and weak scalability (i.e., the ability to
solve problems of increasingly large size while making an effective use of the
available computational resources) of sparse, direct solvers on large scale,
distributed memory, heterogeneous computers. These solvers will rely on
asynchronous task-based parallelism, rather than traditional and widely adopted
message-passing and multithreading techniques; this paradigm will be
implemented by means of modern runtime systems which have proven to be good
tools for the development of scientific computing applications.
 
- Begin: 2015
- End: 2018
- Duration: 36 months
- Participants:
    - Bull SAS
    - CEA
    - Inria
    - SAFRAN
    - CERFACS
    - CORIA
    - CENAERO
    - ONERA
    - IFPEN
    - UVSQ
    - Kitware
    - AlgoTech
 
- Personal role: Participant
Project ELCI (Environnement Logiciel pour le Calcul Intensif, Software Environment for HPC) aims at developing a new generation of software stacks for controlling supercomputers, numerical solvers, pre-/post-/co-processing software, and programming and execution environments, and at validating these developments by demonstrating, on application test cases, that they offer better scalability, resilience, safety, modularity, abstraction, and interactivity. One of the contributions of Team STORM has been the study, in partnership with Inria Team AVALON, of component models for task-based programming environments such as StarPU. A second contribution of Team STORM has been the interfacing of StarPU with CEA’s MPC environment.
 
- Begin: 2013
- End: 2017
- Duration: 48 months
- Participants:
    - Inria Bordeaux
    - Inria Lyon
    - CNRS-IRIT
    - CEA CESTA
    - Airbus Group Innovations
 
- Personal role: Participant
This project aims at studying and designing algorithms and direct methods for solving sparse linear systems on platforms equipped with accelerators, by means of a runtime system. The StarPU runtime system has been one of the runtime systems studied, extended, and used as the foundation for the software developments of the project.
 
- Begin: 2009
- End: 2012
- Duration: 42 months
- Participants:
    - Inria Runtime
    - CAPS Entreprise
    - CEA INAC
    - UVSQ PRiSM
    - Inria Grenoble
    - Bull SAS
    - CEA CESTA
 
- Personal role: Principal Investigator
The goal of Project ANR ProHMPT is the exploration of architectures equipped with accelerators (such as graphics processors, or GPUs), and the evolution of the software toolchain at the compiler, runtime and analysis levels, to enable high performance computing on such platforms. This new generation of tools for heterogeneous platforms aims at addressing the performance needs of nanosimulation and aerodynamics applications. Project ProHMPT has in particular resulted in the design of the StarPU runtime system.
 
- Begin: 2006
- End: 2009
- Duration: 36 months
- Participants:
    - ID-IMAG
    - BRGM
    - Bull SAS
    - CEA
    - IRISA
    - LaBRI
    - LMA
    - TOTAL SA
 
- Personal role: Participant
Computer architectures are evolving towards the use of hierarchical memories, organised with non-uniform access of processors to memory banks. In order to extract the highest hardware performance, parallel applications need software components that enable the accurate distribution of computing processes and data, to avoid expensive non-local memory accesses. Project ANR NUMASIS aims at studying and extending the mechanisms available at the operating system and middleware levels for managing the smart distribution of computations and data on such platforms. The targeted field, seismology, is particularly representative of the requirements of high performance scientific applications. The contributions of Team Runtime have been the development and interfacing of the Marcel multithread library and the ForestGOMP OpenMP runtime system with the applications of the project.
 
- Begin: 2006
- End: 2009
- Duration: 36 months
- Participants:
    - LIP ENS-Lyon
    - IRISA
    - Inria Futurs/LaBRI
    - IRIT
    - CERFACS
    - CRAL
 
- Personal role: Local Contact
The goal of Project ANR LEGO is to propose and implement a multi-paradigm programming model (components, data sharing, master/slave, and workflow) constituting the state of the art for computing grid programming. It relies on efficient scheduling, deployment and communication technologies. The LEGO model aims at supporting three kinds of classical high performance computing applications: climate modelling, astronomical simulation, and linear algebra. The contribution of Team Runtime has been the development and the interfacing of the NewMadeleine communication library with the applications of the project.
 
- Begin: 2003
- End: 2006
- Duration: 36 months
- Personal role: Participant
The Grid 5000 Program took place within the context of the French ACI Grid Action. It aimed at promoting the creation of an experimental platform for computer science research made of a large-scale computation and data grid deployed across France. The contribution of Team Runtime has been the development of the Madeleine 2 and NewMadeleine communication libraries in a grid-computing context on the Grid 5000 platform.
 
- Begin: 2001
- End: 2002
- Duration: 24 months
- Personal role: Participant
This French Action, coordinated by Christian Perez (IRISA Laboratory), focussed on the theme of the globalisation of computing resources and data. One of the objectives of this project was to design an object-oriented distributed platform to exploit multiple networking technologies in a transparent manner. This platform is built on extensions of the Marcel multithreading library and the Madeleine 2 communication library.
 
- Begin: 2000
- End: 2001
- Duration: 24 months
- Personal role: Participant
This collaborative project grouped several French research teams with the purpose of exploiting the VTHD (very high bandwidth) 2.5 Gb/s network of the RENATER national entity, linking several research centers of Inria and France Telecom. Research efforts were devoted primarily to protocols, middleware and applications. I was more specifically involved in the design of a communication interface tailored for this network, based on the Madeleine 2 communication library.
  
     
        
        
            
- Begin: 2015
- End: 2019
- Duration: 48 months
- Participants:
    - Teams AVALON and POLARIS (Inria Rhône-Alpes)
    - Teams MYRIADS and SUMO (Inria Bretagne Atlantique)
    - Teams HIEPACS and STORM (Inria Bordeaux - Sud-Ouest)
    - Team MEXICO (Inria Île-de-France)
    - Team VERIDIS (Inria Grand Est)
 
- Personal role: Participant
The goal of the HAC SPECIS (High-performance Application and Computers: Studying PErformance and Correctness In Simulation) project is to answer the methodological needs of HPC application and runtime developers and to enable the study of real HPC systems from both the correctness and performance points of view. To this end, we gather experts from the HPC, formal verification and performance evaluation communities.
 
- Begin: 2013
- End: 2015
- Duration: 24 months
- Participants:
    - ANDRA
    - CEA IRFM
    - 10 Inria Teams
 
- Personal role: Local Contact
This multidisciplinary IPL action from Inria (Inria Project Lab) aims primarily at developing a continuum of skills to efficiently exploit the computing capacity of future petaflops machines and beyond, for running complex, large-scale numerical simulations. The contribution of Team Runtime, and later of Team STORM, has been the exploitation and evolution of StarPU within linear algebra libraries and numerical simulation software, and the evolution of the KStar OpenMP compiler.
  
     
        
        
            
- Begin: 2018
- End: 2019
- Duration: 12 months
- Personal role: Participant
The SysNum cluster aims to develop top-level research on the next generation of digital interconnected systems, going from sensor design to decision processes. The contribution of Team STORM / SATANAS from the LaBRI laboratory, in partnership with Team BKB, was to study the port of the StarPU runtime system as an alternative support for data analysis applications built on top of the Apache Spark environment. This contribution took place as part of Task T3.2: Convergence of High Performance Computing and Big Data.
 
- Begin: 2017
- End: 2017
- Duration: 12 months
- Personal role: Principal Investigator
The Technical Watch Cell (Cellule de Veille Technologique, CVT) of GENCI, the supervising entity of the French supercomputing centers, aims at exploring emerging computing architectures ahead of time, to prepare future hardware upgrades at these national centers. In this context, I conducted a study on behalf of Team STORM about the adaptation of StarPU to two emerging platforms: a cluster of Intel Xeon Phi KNL nodes installed at the CINES computing center and a cluster of IBM Power 8+ nodes equipped with accelerators installed at IDRIS.
 
- Begin: 2015
- End: 2016
- Duration: 24 months
- Participants:
    - Team STORM
    - IMS laboratory
 
- Personal role: Participant
The objective of Cluster CPU, part of Bordeaux’s initiative of excellence (IDEX), is to federate the main public actors in numerical sciences in Bordeaux, in order to develop these sciences to the level where they can serve as certification tools and as tools to stimulate innovation, competitiveness, visibility, and the attractiveness of the CPU community in terms of research, education, and promotion, especially at the international level. The contribution of Team STORM, in partnership with the IMS Lab from the University of Bordeaux, has been the design and development of AFF3CT (initially named P-EDGE during the first few months of the project), a software suite for the simulation and analysis of algorithms for Error Correction Codes used in wireless communications.
 
- Begin: 2013
- End: 2016
- Duration: 36 months
- Participants:
        
    
- Personal role: Participant
Running task-based applications on distributed architectures made of thousands of heterogeneous nodes remains a challenge for the community. In the general case, it is necessary to widen the scope of the runtime system to the entire machine. This is the purpose of this project, which addresses the design of a task-based runtime system able to schedule large task graphs on architectures with a large number of nodes. This project enabled the co-funding of the PhD thesis of Marc Sergent, for which I was one of the advisors. One of the major outcomes of this project and of M. Sergent’s PhD thesis is the extension of the programming model of StarPU to support large-scale distributed and heterogeneous platforms.
 
- Begin: 2004
- End: 2004
- Duration: 12 months
- Participants:
    - Inria Teams Paris (Rennes) and Runtime
    - Team AND from LIFC Laboratory
 
- Personal role: Participant
This study examined the potential performance gains to be obtained by selectively disabling reliable message retransmission mechanisms for asynchronous, iterative distributed computations that are robust against specific message losses. The study was conducted on the Madeleine 2 communication library.
  
     
        
        
            
- Begin: 2019
- End: 2020
- Duration: 12 months
- Personal role: co-Principal Investigator
AFF3CT is a toolchain for the design, validation and experimentation of new Error Correction Codes. This toolchain is written in C++, which constitutes a difficulty for many industrial users, who are mostly electronics engineers. The goal of this ADT is to widen the pool of possible users by designing Matlab and Python interfaces for AFF3CT, in collaboration with existing users, and by proposing a parallel framework in OpenMP.
 
- Begin: 2013
- End: 2015
- Duration: 24 months
- Participants:
    - Inria Team MOAIS (Inria Grenoble)
    - Inria Team Runtime
 
- Personal role: co-Principal Investigator
The goal of this action is to enable C/C++ application developers to exploit the power of heterogeneous architectures (more CPUs, GPUs, accelerators) through a high level of abstraction, using programming directives (pragmas) compliant with the OpenMP standard. The KStar OpenMP compiler designed within the context of this action automatically translates an OpenMP-compliant application into an application targeting one of the runtime systems developed by Teams MOAIS and Runtime: Kaapi for Team MOAIS, and StarPU for Team Runtime. This action notably resulted in the design of the KStar OpenMP compiler and the KaSTORS benchmark suite.
 
- Begin: 2009
- End: 2011
- Duration: 24 months
- Participants: Inria Team Runtime
- Personal role: Principal Investigator
Marcel is a multithread library providing a high performance scheduler able to adapt itself to multi-core, multi-processor, and NUMA parallel architectures. This library is a foundation for the software components designed within the Runtime team, and in the context of collaborations. This action aims at enhancing the diffusion and durability of the Marcel Library by extending it with an API compliant with the POSIX Threads specification.
 
- Begin: 2009
- End: 2011
- Duration: 24 months
- Participants:
    - Inria Team Runtime
    - Inria Team Concha (Pau)
 
- Personal role: Participant
The aim of this project is to develop tools for the direct simulation of three-dimensional turbulent flows in simple geometries on parallel machines within the CONCHA framework developed by Team Concha (Inria branch in Pau). The collaboration has mainly focussed on the interfacing of CONCHA on top of the Marcel multithreading library.
 
- Begin: 1998
- End: 1999
- Duration: 24 months
- Personal role: Participant
This action aimed at uniting the efforts of several research teams around the topic of network technologies with addressing capabilities, such as the Scalable Coherent Interface (SCI). One of the results of this action was the initial design of the Madeleine 2 communication library and its port to the SCI network technology.
  
     
        
        
            
- Begin: 2015
- End: 2016
- Duration: 12 months
- Personal role: Participant
 
- Begin: 2013
- End: 2013
- Duration: 12 months
- Personal role: Participant
 
- Begin: 2012
- End: 2015
- Duration: 36 months
- Personal role: Participant
 
- Begin: 2001
- End: 2002
- Duration: 24 months
- Personal role: Participant
 
- Begin: 2000
- End: 2001
- Duration: 24 months
- Personal role: Participant
  
      
 
     
 
    
    
    
    
High-Performance Fast Forward Error Correction Toolbox for building digital communication chains in Software Defined Radio (SDR).
Digital communications rely on an encoding process that involves integrating
specific redundancy patterns in the communication flow prior to transmission. On
the receiving side, a matching decoding process uses knowledge of the encoded
redundancy patterns to reconstruct the source signal that may have been
altered or distorted by the environment during the transmission. Algorithms in
charge of encoding and decoding such redundancy patterns are named Error
Correction Code (ECC) encoders and decoders. They are ubiquitous in digital
communications, including WiFi and cellular phone communications, and
are computationally expensive, especially the decoders.
AFF3CT offers a C++ library of high performance
building blocks for designing efficient software transmission and reception
chains. The elementary blocks leverage the
MIPP C++ SIMD wrapper to take advantage of
streaming instruction sets from modern processors. Whole chains may also be
parallelized for increased transmission rates on multicore and multiprocessor
architectures.
The AFF3CT distribution also comes with a simulator
application to validate the error correction properties of algorithms and to
experiment with a range of algorithm variants and parameters. The simulator
computes residual Bit Error Rates (BER) and Frame Error Rates (FER) on a
generated digital communication altered with a controlled noise level.
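The kind of measurement such a simulator performs can be illustrated with a minimal sketch. It does not use AFF3CT or its API: the repetition code, the binary symmetric channel, and the names `encode`, `channel`, `decode` and `simulate` are all assumptions chosen for brevity. It only shows how residual BER and FER are counted after decoding at a controlled noise level.

```python
import random

def encode(bits, n=3):
    """Toy repetition code (illustrative only): send each bit n times."""
    return [b for b in bits for _ in range(n)]

def channel(bits, p, rng):
    """Binary symmetric channel: flip each bit with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def decode(bits, n=3):
    """Majority vote over each group of n received bits."""
    return [int(sum(bits[i:i + n]) > n // 2) for i in range(0, len(bits), n)]

def simulate(p, frames=2000, frame_len=32, n=3, seed=0):
    """Return (BER, FER) measured after decoding at noise level p."""
    rng = random.Random(seed)
    bit_errors = frame_errors = 0
    for _ in range(frames):
        src = [rng.randint(0, 1) for _ in range(frame_len)]
        out = decode(channel(encode(src, n), p, rng), n)
        errs = sum(a != b for a, b in zip(src, out))
        bit_errors += errs          # residual bit errors in this frame
        frame_errors += errs > 0    # a frame is wrong if any bit is wrong
    return bit_errors / (frames * frame_len), frame_errors / frames

if __name__ == "__main__":
    ber, fer = simulate(p=0.05)
    print(f"BER={ber:.4f} FER={fer:.4f}")
```

Sweeping `p` over a range of values and plotting the two rates gives the usual BER/FER curves; a real toolbox would replace the toy code and channel with actual ECC decoders and a realistic noise model.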
MIPP (MyIntrinsics++) is a portable, open-source wrapper (MIT license) for
vector intrinsic functions (SIMD) written in C++11.
The MIPP wrapper works with SSE, AVX, AVX-512 and ARM NEON (32-bit and 64-bit) instructions,
and has preliminary support for ARM SVE. It supports single- and double-precision floating-point
numbers and signed integer arithmetic (64-bit, 32-bit, 16-bit and 8-bit).
Task-based parallel and distributed runtime system for heterogeneous clusters.
Supercomputer architectures are inherently diverse (see the Top500
list for instance) due to being tailored for distinct
kinds of workloads, and as a result of the various paths explored to design high
performance computing platforms. This technological diversity makes it difficult
to efficiently port an application from one supercomputer to another one.
Task-based runtime systems offer a means to simplify this process by enforcing
good practices in application design, such as the separation of concerns between
application algorithms, kernel optimization and hardware resource management, and by
delegating part of the optimization process to some third-party software.
StarPU is a task-based runtime system
based on the Sequential Task Flow (STF) programming model. From a sequential
flow of tasks submitted by the application, StarPU drives a parallel execution
taking advantage of the computing units available on the platform nodes,
including specialized hardware accelerators, such as GPUs (Graphics Processing
Units) and FPGAs (Field-Programmable Gate Arrays). StarPU transparently manages
data transfers, replication and replica consistency between the computing
units. It builds a performance model of task kernels on the heterogeneous units
to dynamically determine the most suitable mapping of the kernels according to
the units’ performance and availability.
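The idea behind the Sequential Task Flow model can be sketched in a few lines. The sketch below is not StarPU's API (StarPU is a C library with its own interface): the `TaskFlow` class, its `submit` and `waves` methods, and the `"R"`/`"RW"` mode strings are hypothetical names used for illustration. It assumes each task declares how it accesses each piece of data, and infers dependencies from the submission order alone.

```python
from collections import defaultdict

class TaskFlow:
    """Toy sequential task flow: infer dependencies from data access modes."""

    def __init__(self):
        self.names = []                    # task names, in submission order
        self.deps = defaultdict(set)       # task index -> indices it waits for
        self.last_writer = {}              # data name -> last task that wrote it
        self.readers = defaultdict(list)   # data name -> readers since last write

    def submit(self, name, **accesses):
        """Submit a task; accesses maps a data name to "R" or "RW"."""
        tid = len(self.names)
        for data, mode in accesses.items():
            if data in self.last_writer:   # read-after-write, write-after-write
                self.deps[tid].add(self.last_writer[data])
            if mode == "RW":
                self.deps[tid].update(self.readers[data])  # write-after-read
                self.last_writer[data] = tid
                self.readers[data] = []
            else:
                self.readers[data].append(tid)
        self.names.append(name)
        return tid

    def waves(self):
        """Group tasks into waves; tasks in the same wave are independent."""
        level = {}
        for tid in range(len(self.names)):
            level[tid] = 1 + max((level[d] for d in self.deps[tid]), default=-1)
        grouped = defaultdict(list)
        for tid, lvl in level.items():
            grouped[lvl].append(self.names[tid])
        return [grouped[lvl] for lvl in sorted(grouped)]

# Tasks are submitted in plain sequential order by the application.
flow = TaskFlow()
flow.submit("init_a", a="RW")
flow.submit("init_b", b="RW")
flow.submit("scale_a", a="RW")
flow.submit("add", a="R", b="R", c="RW")
flow.submit("norm", c="R")
flow.submit("reset_a", a="RW")
print(flow.waves())
```

The application code stays sequential, while the inferred graph exposes which tasks may run concurrently. A real runtime would additionally choose a computing unit for each ready task and move the data accordingly.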
    
        
        
            
Communication library for high performance networks.
 
         
     
 
    
        
        
            
GNU libgomp ABI-compatible library on top of the Marcel multithread library for hierarchical NUMA scheduling.
Benchmark suite of OpenMP dependent-task kernels.
Source-to-source OpenMP compiler to transform C/C++ programs into StarPU programs.
Communication library for high performance networks, superseded by NewMadeleine.
Framework for binary code analysis and modification (private project).
Multithread library for multicore, NUMA platforms.
Composable communication stack for the Ibis grid computing environment.
StarPU Resource Manager (now included in StarPU).