Optimal Dynamic Regimes for CO Oxidation Discovered by Reinforcement Learning.

Lifar, Mikhail S; Tereshchenko, Andrei A; Bulgakov, Aleksei N; Guda, Sergey A; Guda, Alexander A; Soldatov, Alexander V

Affiliation

Lifar MS; The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
Tereshchenko AA; The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
Bulgakov AN; The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
Guda SA; The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
Guda AA; Institute for Mathematics, Mechanics and Computer Science in the name of I.I. Vorovich, Southern Federal University, 344090 Rostov-on-Don, Russia.
Soldatov AV; The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.

ACS Omega; 9(26): 27987-27997, 2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38973853

ABSTRACT

Metal nanoparticles are widely used as heterogeneous catalysts to activate adsorbed molecules and reduce the energy barrier of the reaction. The reaction product yield depends on the interplay between the elementary processes: adsorption, activation, desorption, and reaction. These processes, in turn, depend on the inlet gas composition, temperature, and pressure. At steady state, the active surface sites may be inaccessible because they are occupied by adsorbed reagents. A periodic regime may thus improve the yield, but the appropriate period and waveform are not known in advance. Dynamic control should account for surface and atmospheric modifications and adjust the reaction parameters according to the current state of the system and its history. In this work, we applied a reinforcement learning algorithm to control CO oxidation on a palladium catalyst. The policy gradient algorithm was trained in a theoretical environment parametrized from experimental data. The algorithm learned to maximize the CO2 formation rate based on the CO and O2 partial pressures over several successive time steps. Within a unified approach, we found optimal stationary, periodic, and nonperiodic regimes for different problem formulations and gained insight into why a dynamic regime can be preferable. More generally, this work contributes to popularizing the reinforcement learning approach in catalytic science.
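
The control loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' environment or algorithm: it assumes a toy Langmuir-Hinshelwood surface model with made-up rate constants, a feed at fixed total pressure so that only the CO fraction is chosen, a discrete set of three CO fractions as actions, and a plain REINFORCE (policy gradient) update with a linear softmax policy that observes the two most recent CO fractions. All names (`step`, `run_episode`, `reinforce_update`, `train`) and all numerical values are hypothetical.

```python
import math
import random

# Hypothetical rate constants in arbitrary units; NOT fitted to any experiment.
K_ADS_CO, K_DES_CO, K_ADS_O2, K_REACT = 1.0, 0.1, 1.0, 2.0
DT = 0.1                          # explicit Euler time step
P_CO_CHOICES = [0.0, 0.5, 1.0]    # discrete actions: CO fraction of the feed


def step(theta_co, theta_o, p_co):
    """Advance the toy surface by one Euler step; return coverages and CO2 rate."""
    p_o2 = 1.0 - p_co                              # assumed fixed total pressure
    free = max(0.0, 1.0 - theta_co - theta_o)      # fraction of empty sites
    rate = K_REACT * theta_co * theta_o            # CO2 formation rate
    d_co = K_ADS_CO * p_co * free - K_DES_CO * theta_co - rate
    d_o = 2.0 * K_ADS_O2 * p_o2 * free ** 2 - rate  # dissociative O2 adsorption
    theta_co = min(1.0, max(0.0, theta_co + DT * d_co))
    theta_o = min(1.0, max(0.0, theta_o + DT * d_o))
    return theta_co, theta_o, rate


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, obs):
    """Linear softmax policy: one weight row per action."""
    return softmax([sum(w * x for w, x in zip(row, obs)) for row in weights])


def run_episode(weights, rng, n_steps=50):
    """Sample one trajectory; return (observations, actions, rewards)."""
    theta_co = theta_o = 0.0
    history = [0.5, 0.5]          # last two CO fractions seen by the policy
    obs_list, act_list, rew_list = [], [], []
    for _ in range(n_steps):
        obs = [1.0, history[-1], history[-2]]      # bias + pressure history
        probs = policy_probs(weights, obs)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        theta_co, theta_o, rate = step(theta_co, theta_o, P_CO_CHOICES[action])
        history.append(P_CO_CHOICES[action])
        obs_list.append(obs)
        act_list.append(action)
        rew_list.append(rate)
    return obs_list, act_list, rew_list


def reinforce_update(weights, episode, lr=0.05, gamma=0.95):
    """One REINFORCE step using discounted reward-to-go (no baseline)."""
    obs_list, act_list, rew_list = episode
    returns, g = [], 0.0
    for r in reversed(rew_list):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # Accumulate the gradient first so every term uses the sampling weights.
    grads = [[0.0] * len(row) for row in weights]
    for obs, action, g in zip(obs_list, act_list, returns):
        probs = policy_probs(weights, obs)
        for a in range(len(weights)):
            coeff = (1.0 if a == action else 0.0) - probs[a]  # d log pi / d logit
            for j, x in enumerate(obs):
                grads[a][j] += g * coeff * x
    for a, row in enumerate(weights):
        for j in range(len(row)):
            row[j] += lr * grads[a][j] / len(obs_list)


def train(n_episodes=200, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * 3 for _ in P_CO_CHOICES]
    for _ in range(n_episodes):
        reinforce_update(weights, run_episode(weights, rng))
    return weights


if __name__ == "__main__":
    w = train()
    _, actions, rewards = run_episode(w, random.Random(1))
    print("mean CO2 rate:", sum(rewards) / len(rewards))
    print("first CO fractions chosen:", [P_CO_CHOICES[a] for a in actions[:10]])
```

In this toy model a pure CO feed yields no CO2 at all, because no oxygen ever reaches the surface; that is a crude analogue of the site blocking by adsorbed reagents mentioned in the abstract. Whether the learned policy settles on a stationary or an alternating feed depends entirely on the assumed constants, so the sketch illustrates only the structure of the approach, not its results.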

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: ACS Omega Year: 2024 Document type: Article Country of affiliation: Russia Country of publication: United States