Advances in machine learning and AI systems are increasingly influencing how we approach the quantitative sciences, including physics, chemistry, and biology. Opportunities include having machines learn new representations of how particles interact and how matter transforms in reactions, help us decide which experiment to conduct next, and detect emerging phenomena. Sparse data, and situations where common data assumptions do not hold, remain significant hurdles in many sciences. Consequently, it remains critical to ground our efforts in the millennia of scientific insight embodied in the literature, to avoid having machines, at best, relearn what we already know. Chalmers AI4Science is a monthly seminar series in which we invite early-career researchers to present their work at the interface of machine learning, artificial intelligence, and a scientific discipline. The series aims to provide an international platform at Chalmers for discussing these topics and to strengthen interdisciplinary research involving machine learning and AI at Chalmers.
Subscribe to our mailing list for reminders
Artificial intelligence (AI) is fueling computer-aided drug discovery. Chemical language models (CLMs) are a recent addition to the medicinal chemist’s toolkit for AI-driven drug design. CLMs can generate novel molecules in the form of strings (e.g., SMILES, SELFIES) without relying on human-engineered molecular assembly rules. Taking inspiration from natural language processing, CLMs have been shown to learn “syntax” rules for molecule generation and to implicitly capture “semantic” molecular features, such as physicochemical properties, bioactivity, and chemical synthesizability. This talk will illustrate successful applications of CLMs to design novel bioactive compounds from scratch in the context of drug discovery, at the interface between theory and wet-lab experiments. Moreover, the talk will offer a personal perspective on current limitations and future opportunities for AI in medicinal and organic chemistry to accelerate molecule discovery and chemical space exploration.
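To illustrate what treating molecules as language can mean in practice, here is a minimal sketch of a typical first step in a CLM pipeline: splitting a SMILES string into tokens and mapping them to integer IDs that a sequence model can consume. The regular expression, the special tokens `<pad>`, `<bos>`, `<eos>`, and the function names are illustrative assumptions, not the tokenizer used in the speaker's work; the pattern only covers bracket atoms, organic-subset atoms, bonds, branches, and ring-closure digits.

```python
import re

# Hypothetical SMILES tokenizer pattern (an assumption, not a standard):
# bracket atoms like [N+], two-letter halogens Br/Cl (tried before B/C),
# organic-subset atoms (lowercase = aromatic), bond/branch/dot symbols,
# two-digit ring closures (%nn), and single ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnosp]|[=#\-/\\:~.()]|\d)"
)


def tokenize(smiles: str) -> list:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so refuse partial coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer ID (special tokens first)."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for smi in corpus:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, vocab):
    """Wrap the token IDs in start/end markers, as a generative model would see them."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<eos>"]]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    print(tokenize(aspirin))
    vocab = build_vocab([aspirin])
    print(encode(aspirin, vocab))
```

Keeping `Br`, `Cl`, and each bracket atom as single tokens, rather than splitting them into characters, means the model learns its "syntax" over chemically meaningful units.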
Francesca Grisoni is a tenure-track Assistant Professor at the Eindhoven University of Technology, where she leads the Molecular Machine Learning team. After receiving her PhD in 2016 from the University of Milano-Bicocca, with a dissertation on machine learning for (eco)toxicology, Francesca worked as a data scientist and as a biostatistical consultant for the pharmaceutical industry. She then joined the University of Milano-Bicocca (in 2017) and ETH Zurich (in 2019) as a postdoctoral researcher, working on machine learning for drug discovery and molecular property prediction. Her current research focuses on developing novel chemistry-centered AI methods to augment human intelligence in drug discovery, at the interface between computation and wet-lab experiments.
Engineered proteins play increasingly essential roles in industries and applications spanning pharmaceuticals, agriculture, specialty chemicals, and fuel. Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. Large self-supervised models pretrained on millions of protein sequences have recently gained popularity for generating embeddings of protein sequences for protein property prediction. However, protein datasets often contain information beyond sequence, such as structure, that can improve model performance. This talk will cover pretrained models that use both sequence and structural data, their application to predicting which portions of proteins can be removed while retaining function, and a new set of protein fitness benchmarks for measuring progress in pretrained models of proteins.
Kevin Yang is a senior researcher at Microsoft Research in Cambridge, MA, who works on problems at the intersection of machine learning and biology. He did his PhD at Caltech with Frances Arnold on applying machine learning to protein engineering. Before joining MSR, he was a machine learning scientist at Generate Biomedicines, where he used machine learning to optimize proteins. Before graduate school, Kevin taught math and physics for three years at a high school in Inglewood, California, through Teach for America.
In this talk I aim to showcase how machine-learning-inspired optimisations can help with current state-of-the-art experiments. In particular, I will first consider the readout of semiconductor spin qubits using simple principal component analysis. I will then highlight a specially fabricated semiconductor device with a 3×3 ‘pixel array’, and discuss the simultaneous tuning of its nine gate voltages to form a quantum point contact. Finally, I will move on to larger arrays of quantum dots and the detection of transitions between charge states (i.e., finding the facets of high-dimensional Coulomb diamonds).
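As a minimal sketch of how PCA can act as a single-shot readout, assume each measurement is a charge-sensor time trace in which a spin-dependent tunnelling event shows up as a transient "blip". Projecting each trace onto the first principal component compresses it to one score, and a two-cluster threshold on that score separates blip from no-blip shots. The synthetic traces (a blip in a fixed time window plus Gaussian noise), the sign convention, and all function names are hypothetical; real traces typically have blips of random timing and duration, plus drift, which this toy ignores.

```python
import numpy as np


def simulate_traces(n_shots=400, n_samples=100, p_blip=0.5, noise=0.3, seed=0):
    """Hypothetical synthetic traces: 'blip' shots get +1.0 on samples 20..59."""
    rng = np.random.default_rng(seed)
    labels = rng.random(n_shots) < p_blip
    traces = noise * rng.standard_normal((n_shots, n_samples))
    traces[labels, 20:60] += 1.0
    return traces, labels


def first_principal_axis(traces):
    """Return the mean trace and the leading principal axis of (shots, samples) data."""
    mean = traces.mean(axis=0)
    # Rows of vt are principal axes ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(traces - mean, full_matrices=False)
    return mean, vt[0]


def two_means_threshold(scores, n_iter=50):
    """1D two-cluster threshold: iterate the midpoint between the two cluster means."""
    t = scores.mean()
    for _ in range(n_iter):
        lo, hi = scores[scores <= t], scores[scores > t]
        if lo.size == 0 or hi.size == 0:
            break
        new_t = 0.5 * (lo.mean() + hi.mean())
        if np.isclose(new_t, t):
            break
        t = new_t
    return t


def classify_shots(traces):
    """Label each shot True (blip) or False (no blip) from its PC1 score."""
    mean, pc1 = first_principal_axis(traces)
    # PCA fixes the axis only up to sign; assume blips raise the signal.
    if pc1.sum() < 0:
        pc1 = -pc1
    scores = (traces - mean) @ pc1
    return scores > two_means_threshold(scores)


if __name__ == "__main__":
    traces, labels = simulate_traces()
    pred = classify_shots(traces)
    print("agreement with ground truth:", (pred == labels).mean())
```

Using the leading principal component instead of a hand-picked integration window lets the data choose which time samples carry the signal; the threshold step needs no labels, although mapping clusters to spin states still needs a calibration or a known signal polarity.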
Evert is a theoretical condensed matter physicist with a background in open systems, numerical simulations, and many-body effects. He now also actively investigates how condensed matter physics and machine learning can benefit each other.
Governing equations are essential to the study of physical systems, providing models that can generalize to predict previously unseen behaviors. There are many systems of interest across disciplines where large quantities of data have been collected, but the underlying governing equations remain unknown. This work introduces an approach to discover governing models from data. The proposed method addresses a key limitation of prior approaches by simultaneously discovering coordinates that admit a parsimonious dynamical model. Developing parsimonious and interpretable governing models has the potential to transform our understanding of complex systems, including in neuroscience, biology, and climate science.
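The abstract does not spell out the algorithm, so as a hedged illustration of the "parsimonious model" half of the idea only, the sketch below uses sequentially thresholded least squares (the sparse regression at the core of SINDy) in fixed, known coordinates. It builds a library of candidate terms, fits the measured time derivatives, and repeatedly zeroes small coefficients. The coordinate-discovery part of the talk's method is not shown, and the damped oscillator, the quadratic library, and the 0.05 threshold are assumptions chosen for this example.

```python
import numpy as np


def rhs(state):
    # "Unknown" system used only to generate data: a damped linear oscillator.
    x, y = state
    return np.array([-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y])


def simulate(x0, dt=0.01, n_steps=2000):
    """Integrate rhs with classical fourth-order Runge-Kutta."""
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x0
    for k in range(n_steps):
        s = traj[k]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


def library(traj):
    """Candidate terms up to quadratic order (an assumed library)."""
    x, y = traj[:, 0], traj[:, 1]
    names = ["1", "x", "y", "x^2", "x*y", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return theta, names


def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], dxdt[:, j], rcond=None)[0]
    return xi


dt = 0.01
traj = simulate(np.array([2.0, 0.0]), dt=dt)
dxdt = np.gradient(traj, dt, axis=0)  # finite-difference derivative estimates
theta, names = library(traj)
xi = stlsq(theta, dxdt)
for j, var in enumerate(["x", "y"]):
    terms = [f"{xi[i, j]:+.3f} {names[i]}" for i in range(len(names)) if xi[i, j] != 0]
    print(f"d{var}/dt =", " ".join(terms))
```

With clean data the printed model should keep only the four generating terms, with coefficients close to those in `rhs`; noisy measurements would call for smoothed derivative estimates and a tuned threshold.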
Dr. Bethany Lusch is an Assistant Computer Scientist in the data science group at the Argonne Leadership Computing Facility at Argonne National Lab. Her research expertise includes developing methods and tools to integrate AI with science, especially for dynamical systems and PDE-based simulations. Her recent work includes developing machine-learning emulators to replace expensive parts of simulations, such as computational fluid dynamics simulations of engines and climate simulations. She also works on incorporating domain knowledge into machine learning, on representation learning, and on using machine learning to analyze supercomputer logs. She holds a PhD and MS in applied mathematics from the University of Washington and a BS in mathematics from the University of Notre Dame.
With novel measurement technologies easily producing a deluge of data, we need to consider multiple perspectives in order to ‘see the forest for the trees.’ A single perspective or scale is often insufficient to faithfully capture the underlying patterns of complex phenomena, particularly in the life sciences. Moving from an ‘either–or’ selection of relevant scales to a ‘both–and’ utilisation of all scales promises better insights and improved expressivity. The emerging field of topological machine learning provides effective tools for building multi-scale representations of complex data. This talk presents two use cases that demonstrate the power of learning such representations. The first involves improving antimicrobial resistance prediction, a critical problem in a world suffering from superbugs, while the second offers a glimpse into how cognition changes from early childhood to adolescence.
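To make "multi-scale" concrete, one of the simplest topological summaries is 0-dimensional persistent homology: grow a distance scale from zero, connect points closer than that scale, and record the scale at which each connected component merges into another. The sketch below computes these (birth, death) pairs for a Vietoris-Rips filtration with a union-find pass over sorted pairwise distances, which amounts to building a minimum spanning tree. It is a generic textbook construction for small point clouds, not the representation used in the talk's two use cases; the function name is hypothetical.

```python
import itertools
import math


def zero_dim_persistence(points):
    """(birth, death) pairs of connected components as the distance scale grows."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge (i, j) enters the filtration at scale equal to its length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            # Every point is born at scale 0; merging kills one component at d.
            pairs.append((0.0, d))
    if n:
        # The last surviving component never dies.
        pairs.append((0.0, math.inf))
    return pairs


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
    print(zero_dim_persistence(pts))
```

For these four points the deaths are `1.0, 1.0, 5.0, inf`: two components vanish at scale 1 as each close pair joins, one at scale 5 when the two clusters merge, and one persists forever. The gap between 1 and 5 signals two well-separated clusters, something no single distance threshold reveals on its own.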
Bastian Rieck is the Principal Investigator of the AIDOS Lab at the Institute of AI for Health and the Helmholtz Pioneer Campus, focusing on machine learning methods in biomedicine. Dr. Rieck is also a TUM Junior Fellow and a member of ELLIS. He was previously a senior assistant in the Machine Learning & Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zürich and received his Ph.D. in computer science from Heidelberg University.