Isaí Roberto Sotarriva Álvarez

Physics Ph.D. | Machine Learning Engineer | Expertise in Statistical Analysis & Data Science

I am a Physics Ph.D. candidate with expertise in machine learning, data science, and advanced statistical analysis. In Mexico, I worked with ALICE Mexico on machine learning and data analysis for event identification. I have since moved to Japan, where I work as part of the ATLAS collaboration. During my master's, I developed a computer vision algorithm for quality control of the new semiconductor detectors. For my Ph.D., I am working on a GNN/RNN method for removing the electron and jet backgrounds in the diphoton analysis. I also currently work as a teaching assistant at Tokyo Institute of Technology. I am passionate about applying computational and analytical skills to solve complex problems, and I am seeking roles in machine learning engineering or postdoctoral research in physics.

Early Work: Simulation and Statistical Analysis

During my bachelor's degree, I worked on Monte Carlo simulations using Pythia and presented results at the ALICE Mexico conference.

Bachelor's Thesis: First Use of Machine Learning

For my bachelor's thesis, I used machine learning techniques within the ROOT TMVA framework. The work focused on feature ranking, feature decorrelation, covariance analysis, and systematic comparison of multiple classifiers, including linear discriminants, support vector machines, neural networks, and boosted decision trees. Model performance was evaluated using statistical significance, which was used to define the final working point.
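For illustration only, and not the thesis code: choosing a working point from a significance scan can be sketched as below. The figure of merit s / sqrt(s + b), the function name, and the score range [0, 1] are assumptions made for this example; the thesis may have used a different significance definition.

```python
import numpy as np

def best_working_point(sig_scores, bkg_scores, n_sig, n_bkg, n_steps=101):
    """Scan classifier-score cuts and return (cut, significance) maximizing
    s / sqrt(s + b). n_sig and n_bkg are the expected yields before the cut.
    Hypothetical helper; assumes scores lie in [0, 1]."""
    sig_scores = np.asarray(sig_scores, dtype=float)
    bkg_scores = np.asarray(bkg_scores, dtype=float)
    best_cut, best_z = 0.0, 0.0
    for cut in np.linspace(0.0, 1.0, n_steps):
        # Expected yields passing the cut = total yield * selection efficiency
        s = n_sig * np.mean(sig_scores > cut)
        b = n_bkg * np.mean(bkg_scores > cut)
        if s + b == 0:
            continue  # nothing survives this cut; significance is undefined
        z = s / np.sqrt(s + b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z
```

Calling `best_working_point(sig_scores, bkg_scores, n_sig=100, n_bkg=10000)` returns the score cut with the highest significance together with that significance value.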

Master's Work: Detector Quality Control and Image Processing

During my master's studies in Japan, I worked on quality control for semiconductor pixel detector modules. This involved image processing tasks such as high-performance image stitching and visual inspection of wire bonding. I focused on traditional computer vision approaches rather than machine learning in order to improve stability, interpretability, and controllability in a production environment.

ATLAS Qualification Task: Scientific Software Development

As part of my ATLAS qualification task, I worked on detector alignment using the Athena software framework. This required a detailed understanding of the reconstruction software and adherence to a strict development workflow, including code style requirements, detailed reviews, and iterative revisions before merge approval. This experience provided strong exposure to large, long-lived scientific codebases.

Ph.D. Research: Machine Learning Under Experimental Constraints

At the beginning of my Ph.D., I explored graph neural network approaches to improve the discrimination between diphoton signal events and background from misidentified electrons. This work highlighted practical limitations related to simulation accuracy, domain shift between Monte Carlo and data, and the constraints imposed by analysis approval procedures in large collaborations.

Current Ph.D. Work: Probabilistic Background Modeling

My current thesis work focuses on modeling the Z→ee background in diphoton analyses. The problem can be formulated as an optimal transport task for the invariant mass distribution, combined with a probabilistic model for event yields. Each particle misidentification is treated as an independent stochastic process, allowing both shape and normalization effects to be modeled consistently.
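For illustration only, and not the thesis method: in one dimension, the optimal transport map between two distributions (for a convex cost) is the monotone quantile-to-quantile mapping, which can be sketched with empirical samples as below. The function name and the toy Gaussian samples are assumptions made for this example, and the event-yield model is not covered.

```python
import numpy as np

def quantile_transport(source, target, x):
    """Map values x from the source distribution onto the target distribution
    using the 1D monotone (quantile-to-quantile) transport map.
    Hypothetical helper built on empirical samples."""
    source = np.sort(np.asarray(source, dtype=float))
    target = np.sort(np.asarray(target, dtype=float))
    # Empirical CDF of the source, evaluated at x
    u = np.searchsorted(source, x, side="right") / source.size
    # Empirical quantile function of the target, evaluated at those CDF values
    return np.quantile(target, np.clip(u, 0.0, 1.0))

# Toy usage: transport a unit Gaussian onto a wider Gaussian centered at 91
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, 5000)
tgt = rng.normal(91.0, 3.0, 5000)
mapped = quantile_transport(src, tgt, src)
```

The mapped sample follows the target shape while preserving the ordering of the source values, which is what makes the 1D case simple compared with higher-dimensional transport.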

Local LLM + RAG

Early exploration of retrieval-augmented generation using local inference and vector search.

Paper Recommendation System

Vector-based recommendation with user feedback and an interactive graph UI.

Physics Ph.D. background

Experimental HEP researcher within the ATLAS collaboration at the HL-LHC.

🧠

ML & Data Science

Optimal transport, GNNs, probabilistic modeling, and retrieval-augmented systems.

🚀

Open to opportunities

Seeking ML engineering or postdoc roles where physics meets data-driven insight.

Skills

Languages

Python: 90%
C++: 80%
TypeScript / JS: 70%
YAML / Bash: 60%

ML & Data

PyTorch / PyG: 85%
Keras / TensorFlow: 75%
Optimal Transport: 85%
Uncertainty Modeling: 80%

Physics Tools

ROOT / TMVA: 90%
Athena Framework: 80%
Monte Carlo (Pythia): 85%
OpenCV: 70%

Dev Tools

Git / GitHub: 85%
FastAPI / REST: 75%
Vercel / Next.js: 70%
Vector Search / RAG: 75%