Inferring Logical Abstractions of Reaction Networks by Neural Networks

Joachim Niehren, Inria Center of Lille University, BioComputing Team

Cristian Versari, Lille University, BioComputing Team

Aurelien Lemay, Lille University, Links Team

Scientific Context

Biological systems are frequently modeled as chemical reaction networks, whose temporal behavior is typically described by systems of ordinary differential equations (ODEs). However, this approach assumes complete knowledge of the reaction kinetics and their numerical parameters, which is rarely available given the limitations of wet-lab experimental techniques. To address this, alternative modeling strategies, such as Boolean networks and abstract chemical reaction networks, have been proposed for describing the dynamics of biological systems when only partial kinetic information is available.
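To illustrate what "complete knowledge of reaction kinetics and numerical parameters" means, here is a minimal sketch, assuming mass-action kinetics, of how a reaction network becomes a system of ODEs. The two-reaction network (A + B -> C with rate constant k1, C -> A with rate constant k2) and all numbers are hypothetical, not taken from the project. Under mass action, each reaction contributes a flux equal to its rate constant times the product of its reactant concentrations, giving dA/dt = -k1·A·B + k2·C, dB/dt = -k1·A·B, and dC/dt = k1·A·B - k2·C. The simulation cannot run at all without concrete values for k1 and k2, which are exactly the parameters that are rarely known in practice.

```python
# Toy network (hypothetical, for illustration only), mass-action kinetics:
#   A + B -> C   with rate constant k1
#   C     -> A   with rate constant k2

def mass_action_rhs(state, k1, k2):
    """Return (dA/dt, dB/dt, dC/dt) for the toy network."""
    a, b, c = state
    r1 = k1 * a * b  # flux of A + B -> C
    r2 = k2 * c      # flux of C -> A
    return (-r1 + r2, -r1, r1 - r2)

def euler(state, k1, k2, dt=0.01, steps=1000):
    """Explicit Euler integration; returns the list of visited states."""
    trajectory = [state]
    for _ in range(steps):
        derivative = mass_action_rhs(state, k1, k2)
        state = tuple(x + dt * dx for x, dx in zip(state, derivative))
        trajectory.append(state)
    return trajectory

# The numerical rate constants must be supplied explicitly.
traj = euler((1.0, 1.0, 0.0), k1=2.0, k2=0.5)
```

In this network the quantity A + C is conserved, because the two reactions only move mass between A and C. Boolean networks and abstract reaction networks drop the rate constants and keep only qualitative information, such as which species can be produced or consumed.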

Problem

Model design in these discrete frameworks is often carried out manually, because the statistical noise in wet-lab data complicates automatic inference. Neural networks can outperform traditional approaches at learning from such noisy data, but they are often regarded as "black boxes" because they lack interpretability, which makes it difficult to understand the biological behavior they have learned.

Aim of the Internship

The goal of the internship is to develop and implement neural networks that are particularly suited to the inference of biological system dynamics. The aim is to identify architectures that are well-aligned with discrete models of biological systems, such as chemical reaction networks, while still leveraging the strengths of machine learning for processing complex, noisy datasets. These methods will be tested on a dataset from the North Sea, which includes environmental variables and the abundance of 270 phytoplankton species.

Objectives

Neural Network Implementation

The intern will implement and train neural networks using frameworks such as PyTorch, focusing on architectures whose structure could align with the characteristics of discrete biological models. Training will use a dataset of environmental factors, such as temperature, salinity, nitrate, and carbon, together with the abundance of phytoplankton species in the North Sea.
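A minimal PyTorch sketch of one possible starting point follows. It assumes a one-step-ahead formulation, where abundances at time t+1 are predicted from abundances and environmental variables at time t. The class name `AbundanceModel`, the four environmental inputs, the hidden size, and the training settings are all hypothetical choices. The random tensors are placeholders and do not represent the North Sea dataset.

```python
import torch
from torch import nn

N_ENV = 4        # assumption: temperature, salinity, nitrate, carbon
N_SPECIES = 270  # number of phytoplankton species in the dataset

class AbundanceModel(nn.Module):
    """Hypothetical model: predicts abundances at t+1 from abundances and environment at t."""

    def __init__(self, n_env=N_ENV, n_species=N_SPECIES, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_env + n_species, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_species),
        )

    def forward(self, env, abundance):
        # Concatenate environmental variables and abundances along the feature axis.
        return self.net(torch.cat([env, abundance], dim=-1))

def train(model, env, abundance, epochs=200, lr=1e-2):
    """Fit one-step-ahead predictions; returns (first_loss, last_loss)."""
    # Inputs are time steps 0..T-2, targets are abundances at steps 1..T-1.
    x_env, x_ab, y = env[:-1], abundance[:-1], abundance[1:]
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_env, x_ab), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

torch.manual_seed(0)
T = 50
env = torch.randn(T, N_ENV)              # placeholder data, NOT the North Sea dataset
abundance = torch.rand(T, N_SPECIES)     # placeholder data, NOT the North Sea dataset
model = AbundanceModel()
first_loss, last_loss = train(model, env, abundance)
```

A plain feed-forward network like this one is a baseline rather than an interpretable model. Comparing it against more structured architectures is the purpose of the exploration step.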

Exploration of Network Architectures

The intern will explore and evaluate various neural network architectures that could serve as a basis for inferring symbolic, discrete models in future work. The focus will be on identifying which types of neural networks are best suited for a transition toward interpretable, discrete frameworks; this transition itself will primarily be developed in a follow-up PhD thesis.
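As one hypothetical example of an architecture that lends itself to such a transition, suppose a network contains a linear layer in which weight w[i][j] describes the influence of variable j on the change of variable i. The weight matrix can then be read as a signed interaction graph (activation or inhibition), which is the kind of graph that underlies Boolean network models. The function name, threshold, and weights below are illustrative assumptions, not learned values.

```python
def signed_interaction_graph(weights, names, threshold=0.1):
    """Turn a weight matrix into signed edges (source, target, sign).

    weights[i][j] is the hypothetical influence of variable j on variable i;
    weights with magnitude below `threshold` are treated as "no interaction".
    """
    edges = []
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if abs(w) >= threshold:
                edges.append((names[j], names[i], "+" if w > 0 else "-"))
    return edges

# Placeholder weights, not obtained from any trained model.
names = ["nitrate", "species_A", "species_B"]
weights = [
    [0.0, 0.0, 0.0],
    [0.8, 0.05, -0.3],
    [0.0, 0.5, 0.02],
]
edges = signed_interaction_graph(weights, names)
# [('nitrate', 'species_A', '+'), ('species_B', 'species_A', '-'), ('species_A', 'species_B', '+')]
```

Architectures that encourage sparse weights, for example through an L1 penalty during training, would make such a reading less sensitive to the choice of threshold.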

Testing and Evaluation

The neural networks will be trained and tested on the North Sea dataset to evaluate their ability to model the interactions between environmental factors and phytoplankton species. Performance will be assessed by how well the networks capture the essential dynamics in the data and by how well they could support the later inference of discrete models.
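The criteria for "capturing the essential dynamics" are not fixed by the text. One common option, shown here only as an assumed example, is to compare the model's one-step-ahead prediction error with a persistence baseline, which simply predicts that the next state equals the current one. A skill score above 0 means the model beats persistence, and a score of 1 means the predictions are perfect. The function names and the toy series are hypothetical.

```python
def mse(pred, target):
    """Mean squared error over all entries of two equally shaped lists of rows."""
    n = sum(len(row) for row in target)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n

def skill_vs_persistence(model_pred, series):
    """Skill score 1 - MSE(model) / MSE(persistence).

    model_pred[t] is the model's prediction of series[t + 1]; the persistence
    baseline predicts series[t + 1] = series[t]. Undefined for a constant series.
    """
    target = series[1:]
    persistence = series[:-1]
    return 1 - mse(model_pred, target) / mse(persistence, target)

# Toy one-variable series and predictions (placeholders, not real data).
series = [[1.0], [2.0], [3.0], [4.0]]
model_pred = [[2.0], [3.0], [4.5]]
skill = skill_vs_persistence(model_pred, series)  # 1 - (0.25 / 3) / 1
```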

PhD Continuation

A continuation in the form of a PhD thesis is possible following the internship. The thesis would shift the focus towards methods for converting neural network outputs into interpretable, discrete models such as abstract chemical reaction networks. The approach developed during the internship would be extended to more complex biological datasets, with an emphasis on improving both model performance and interpretability. This would involve tailoring neural network architectures specifically to the symbolic modeling of biological systems.
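A very simple first step toward a discrete model, assumed here for illustration only, is to booleanize a continuous trajectory (for example one predicted by a trained network) with per-variable thresholds and collect the observed state transitions. A state with several observed successors indicates nondeterminism, which an abstract model has to account for. The thresholds, function names, and data are hypothetical.

```python
def booleanize(series, thresholds):
    """Map each continuous state to a tuple of booleans (value >= threshold)."""
    return [tuple(x >= th for x, th in zip(row, thresholds)) for row in series]

def observed_transitions(states):
    """Collect, for each discrete state, the set of successor states seen in the data."""
    table = {}
    for s, s_next in zip(states, states[1:]):
        table.setdefault(s, set()).add(s_next)
    return table

# Placeholder trajectory of two variables, not real data.
series = [[0.1, 5.0], [0.9, 4.0], [0.2, 1.0], [0.1, 5.0], [0.2, 1.0]]
states = booleanize(series, thresholds=[0.5, 3.0])
table = observed_transitions(states)
# (False, True) has two observed successors, so the abstraction is nondeterministic there.
```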

Methodology

The intern will use deep learning frameworks such as PyTorch to build and train neural network models. The networks will be trained on the North Sea phytoplankton dataset, with emphasis on exploring architectures that could later facilitate the inference of symbolic models during the PhD.

Conclusion

This project provides an opportunity to explore the intersection between neural networks and biological system modeling. The intern will gain experience in handling complex biological datasets, implementing neural networks, and exploring their potential for supporting future work on interpretable models. The project has strong potential to extend into a PhD, where the focus will shift towards inferring discrete, symbolic models from neural network outputs, expanding the research to new datasets and applications.