ML Systems · High Dimensional Data · Scalable Inference

Stellamaris
Nakacwa

Applied Research Scientist adapting computational intelligence (AI/ML) to real-world messiness, including efficient neural retrieval, document intelligence modeling and inference, and sensor/geometric data modeling with advanced methods. Eight years across national labs, academic research programs, and applied systems.

Selected Work

Research & Projects

Three areas of sustained inquiry, each asking a variation of the same question: how do you make ML systems that actually work on messy, high-dimensional, real-world data, at inference time, under real constraints?

Start Here · Key Themes In My Work

01 — Thesis (04.2026)

Algorithm Resilience Under System

"Is embedding algorithm architecture inherently resilient to traffic constraints — or does resilience have to be designed in, at a cost?"

Stress-tests HNSW and IVF approximate nearest neighbor (ANN) architectures across a controlled degradation ladder (baseline → bandwidth-constrained → high-latency → packet loss), mirroring cloud throttling, cross-region lag, and edge node failures. Core contribution: a diagnostic sensitivity framework in which HNSW is latency-sensitive (sequential traversal) and IVF is bandwidth-sensitive (fewer, larger transfers), giving engineers a principled vocabulary for architecture-to-environment matching. Energy per query is tracked as a first-class metric alongside recall and latency. In short: most research asks how to make retrieval faster. My thesis asks which retrieval architectures are inherently fragile to network conditions, which design decisions make them robust, and what that fragility costs in energy.

Neural Retrieval Architecture · Network Resilience · FAISS · H100, NSF ACCESS · Parallel Systems
Thesis forthcoming →
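
The latency-versus-bandwidth split described in the thesis summary above can be illustrated with a small cost model. This is a minimal sketch, not the thesis benchmark: the `NetworkProfile` and `AccessPattern` classes, the round-trip counts, the payload sizes, and every network figure are hypothetical values chosen for illustration. It only shows why a query made of many small dependent requests tends to suffer when round-trip time grows, while a query made of a few large transfers tends to suffer when bandwidth shrinks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkProfile:
    """One rung of a degradation ladder (all values are illustrative)."""
    name: str
    rtt_ms: float          # round-trip time per request
    bandwidth_mbps: float  # link bandwidth in megabits per second
    loss_rate: float       # probability that a request must be resent


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical network footprint of a single query."""
    name: str
    round_trips: int       # dependent (sequential) requests per query
    bytes_per_trip: int    # payload moved by each request


def expected_latency_ms(pattern: AccessPattern, net: NetworkProfile) -> float:
    """Expected query latency under a deliberately simple model."""
    transfer_ms = pattern.bytes_per_trip * 8 / (net.bandwidth_mbps * 1e6) * 1e3
    # Assumption: a lost request is simply resent, so the expected number
    # of attempts is 1 / (1 - loss_rate). Timeouts are ignored.
    attempts = 1.0 / (1.0 - net.loss_rate)
    return pattern.round_trips * (net.rtt_ms + transfer_ms) * attempts


# Hypothetical ladder mirroring the four conditions named in the summary.
LADDER = [
    NetworkProfile("baseline", rtt_ms=1.0, bandwidth_mbps=1000.0, loss_rate=0.0),
    NetworkProfile("bandwidth-constrained", 1.0, 10.0, 0.0),
    NetworkProfile("high-latency", 100.0, 1000.0, 0.0),
    NetworkProfile("packet loss", 1.0, 1000.0, 0.05),
]

# Hypothetical footprints: many small hops versus a few bulk reads.
HNSW_LIKE = AccessPattern("hnsw-like", round_trips=50, bytes_per_trip=4_096)
IVF_LIKE = AccessPattern("ivf-like", round_trips=2, bytes_per_trip=2_000_000)


def slowdown(pattern: AccessPattern, net: NetworkProfile,
             baseline: NetworkProfile = LADDER[0]) -> float:
    """Latency under `net` relative to latency under the baseline rung."""
    return expected_latency_ms(pattern, net) / expected_latency_ms(pattern, baseline)


if __name__ == "__main__":
    for net in LADDER:
        print(f"{net.name:>22}: "
              f"hnsw-like x{slowdown(HNSW_LIKE, net):.1f}, "
              f"ivf-like x{slowdown(IVF_LIKE, net):.1f}")
```

Under these assumed numbers the many-hop pattern slows down far more on the high-latency rung and the bulk-transfer pattern far more on the bandwidth-constrained rung. A real index would need measured round-trip counts and payload sizes in place of the constants above.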

02 — Publication (In Preparation)

Differentiable ML Evaluation: A Numerical Conditioning Approach

"What if metric instability isn't a statistics problem — it's a numerical conditioning problem?"

Co-authored paper reframing binary classifier evaluation as a well-posedness problem. We extend the binormal ROC model into a unified differentiable manifold linking ROC, PR, and F₁ simultaneously, enabling threshold optimization via Brent's method, golden-section search, and RK4 on a smooth, analytically grounded surface. Four of five optimizers converged to identical optima within 10⁻⁶ tolerance; Newton's method diverged, exposing the non-convex structure of empirical F₁ and the necessity of bounded search. Bootstrap experiments show a >40% reduction in threshold variance under smoothing.

Numerical Analysis · ML Evaluation · ROC / PR Curves · Optimization · Binormal Model
Preprint forthcoming →
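
As an illustration of bounded threshold search on a smooth F₁ surface, the sketch below uses the textbook binormal model (negative scores drawn from N(0, 1), positive scores from N(mu, sigma²)) and a plain golden-section search. It is not the paper's formulation: the function names and the values of `mu`, `sigma`, `prevalence`, and the search bracket are assumptions made for this example.

```python
import math


def normal_sf(x: float) -> float:
    """Survival function 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binormal_f1(t: float, mu: float = 2.0, sigma: float = 1.0,
                prevalence: float = 0.3) -> float:
    """F1 at threshold t when negatives ~ N(0, 1) and positives ~ N(mu, sigma^2).

    The parameter defaults are illustrative assumptions, not fitted values.
    """
    tpr = normal_sf((t - mu) / sigma)   # true positive rate
    fpr = normal_sf(t)                  # false positive rate
    tp = prevalence * tpr
    fp = (1.0 - prevalence) * fpr
    fn = prevalence * (1.0 - tpr)
    return 2.0 * tp / (2.0 * tp + fp + fn)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-6) -> float:
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# The bracket [-4, 6] is an assumed search interval for this example.
t_star = golden_section_max(binormal_f1, -4.0, 6.0)
print(f"threshold = {t_star:.4f}, F1 = {binormal_f1(t_star):.4f}")
```

Golden-section search never leaves its bracket, which is the property bounded methods rely on. It does assume a single peak inside the interval, so on a noisy empirical F₁ curve it can settle on a local maximum.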

03 — Applied ML Engineering

AIPDF2Table: Structured Extraction from Unstructured Documents

"How do you turn a decade of PDF reports into a queryable knowledge base?"

Built a production document intelligence system that extracts structured table data from heterogeneous PDF corpora — the kind of documents where layout, encoding, and schema vary unpredictably. The key research contribution is the disambiguation layer: handling merged cells, multi-header tables, and rotated layouts without a fixed template. Now used on real document pipelines.

Document Intelligence · NLP · Information Extraction · Python
View on GitHub →
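
One small piece of the multi-header problem mentioned above can be sketched in a few lines: collapsing stacked header rows into one name per column. This is a hypothetical illustration and not AIPDF2Table's code. The functions `forward_fill` and `flatten_headers` are invented for this example, and it assumes that an empty cell in an upper header row means "merged with the cell to its left".

```python
def forward_fill(row):
    """Copy the last non-empty value into the empty cells that follow it."""
    filled, last = [], None
    for cell in row:
        if cell not in (None, ""):
            last = cell
        filled.append(last)
    return filled


def flatten_headers(header_rows, sep=" / "):
    """Collapse stacked header rows into one label per column.

    Upper rows are forward-filled (assumption: blank means horizontally
    merged); the bottom row is used as-is.
    """
    *upper, bottom = header_rows
    rows = [forward_fill(r) for r in upper] + [list(bottom)]
    names = []
    for column in zip(*rows):
        parts = []
        for cell in column:
            if cell is None:
                continue
            text = str(cell).strip()
            # Skip blanks and avoid repeating the same label twice in a row.
            if text and (not parts or parts[-1] != text):
                parts.append(text)
        names.append(sep.join(parts))
    return names


header_rows = [
    ["Region", "2023", None, "2024", None],
    [None, "Q1", "Q2", "Q1", "Q2"],
]
print(flatten_headers(header_rows))
# ['Region', '2023 / Q1', '2023 / Q2', '2024 / Q1', '2024 / Q2']
```

The forward-fill assumption fails when a blank upper cell is genuinely empty instead of merged, which is one reason template-free extraction needs a separate disambiguation step.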

04 — Program Leadership

YouthMappers: Open Geospatial Data at Scale

"How do you build AI/ML-ready datasets for domains where none exist?"

Expanded the open-source science curriculum and research program for YouthMappers through training, projects, and the creation of open geospatial datasets from diverse sources for policy, ML research, and learning. Developed pedagogical frameworks for ML + GIS methods and trained students at scale. The outputs (datasets, methods, and trained practitioners) are in active use in the research community. The work directly addressed training data limitations and supported the development of new ML/DL models.

Open Data Ecosystems · Data-centric AI · Spatial Intelligence · Training Data Development
Program overview →

Ongoing Series

Parallel & Distributed Systems Notes

A chapter-by-chapter study of parallel computing applied to ML systems and large-scale data structures. Three entries published so far; the series is ongoing.

View Series →

Dynamic Resource

AI/ML Systems Research Links

Automatically curated papers and news on embedding systems, geospatial ML, and parallel computing. Updated weekly.

Explore Links →

Scholarly Work

Publications

Peer-reviewed contributions across energy systems informatics, geological data infrastructure, and applied learning frameworks.


Stellamaris (Stella) N.W

Research Scientist · Leidos


Connect & Verify

Use this quick panel for direct profile links and publication verification.

Snapshot of published and in-progress scholarly work with citation signals, adapted to a cleaner Scholar-style reading flow.

59 Citations
2 h-index
2 i10-index
| Year | Title | Venue | Cited by | Link |
| --- | --- | --- | --- | --- |
| 2024 | Digital humanitarians for the sustainable development goals: YouthMappers as a hybrid movement | Sustainable Development Goals · YouthMappers | 34 | View |
| 2024 | Open mapping towards sustainable development goals: voices of youthmappers on community engaged scholarship | Open Mapping · Community Engagement | 23 | View |
| 2024 | WELLBASE: A Standardized Data Infrastructure for Well Log Analytics | Geological, Oil & Gas data systems · Peer-reviewed | - | View |
| 2024 | ROKBASE: Rock Sample Database for Imaging DL Applications | Imaging systems · Peer-reviewed | - | View |
| 2024 | Cities of the Future Need to Be Both Smart and Just: How We Think Open Mapping Can Help | Smart Cities · Open Mapping | 1 | View |
| 2024 | Combining Volunteered Geographic Information and WPdx standards to Improve Mapping of Rural Water Infrastructure in Uganda | Geographic Information Systems · Water Infrastructure | 1 | View |
| 2024 | A National Critical Minerals and Materials Prospectus and Resource Assessment Methodology | Critical Minerals · Resource Assessment | - | View |
| 2024 | RokBase Core Facility Data Collection | Digital Rock · Core Facility | - | View |
| 2024 | WELLBASE-An Interactive Platform for Wellbore Material Assessment | Wellbore Materials · Interactive Platform | - | View |
| 2024 | Carbon Storage Wellbore Materials Data Tool | Carbon Storage · Wellbore Materials | - | View |
| 2024 | Rokbase: Digital Rock Visualization and Exploration Web Application | Digital Rock · Web Application | - | View |
| 2023 | Lite Learning: Efficient Crop Classification in Tanzania Using Traditional Machine Learning & Crowd Sourcing | Crop Classification · Machine Learning · Crowd Sourcing | - | View |
| 2025 | Numerical Smoothing of Noisy Evaluation Surfaces: A Classical Approach to Robust ML Threshold Optimization | Nakacwa S., Luis P. · Harrisburg University · In preparation | - | Preprint forthcoming |
| 2025 | Thesis: Architecture Resilience Under Network Degradation: A Controlled Benchmarking Study of Embedding Retrieval Systems | Harrisburg University · In preparation | - | Forthcoming |

Background

Experience

Eight years across mission-critical research, academic ML, and applied systems engineering, always working on the same class of problem from different vantage points: how do intelligent systems behave when the environment they were designed for stops cooperating?

★ The complete hand

Download the full resume

Every role, publication, project & skill — formatted as a two-page PDF you can save, share, or print.

Download PDF
VIII years · VI roles · III institutions · II programs led

Reverse chronological. Current role highlighted.

★ Current · 2023 — present
Leidos / NETL

AI/ML R&D Science & Engineering

Embedding knowledge & computer vision for mineral regimes. Mathematical models and pipelines reconstructing fragmented oil-and-gas records into a unified national science asset.

NETL · energy · vision
Detail · A♣
employer: Leidos (NETL contract)
focus: Energy resource AI/ML
output: Inference systems for archive recovery
Develops inference systems that make inaccessible document archives — PDFs, scanned reports, legacy formats — machine-readable, recovering decades of domain knowledge for research.
2024 — 2026
Harrisburg University

Graduate Research

Computing systems & algorithms. Benchmarking robustness of retrieval-algorithm architectures. Strengthening AI/ML and computing-systems knowledge.

graduate · algorithms
Detail · K♣
program: Computing Systems & Algorithms
thesis: retrieval-architecture resilience
Graduate-level research on parallel systems, ML algorithms, and benchmarking under realistic network conditions.
2021 — 2022
West Virginia University

Graduate Research Assistant

ML training data & large-data system design. Addressed losses arising from database-schema variation in open-source data.

data systems · open data
Detail · Q♣
focus: training data architectures
angle: schema variance under scale
Research project on ML training data & large-data system design — earlier formulation of the data-quality problems now pursued at NETL.
2017 — 2023
YouthMappers

Program Director, Regional Training

Expanded open-science curricula for GIS & ML applications. Created open geospatial datasets for communities. Program outputs support policy & AI/ML research.

program · GIS · training
Detail · J♣
scope: multi-year, international
built: ML + GIS pedagogical frameworks
outputs: datasets, methods, practitioners
Trained students at scale; the outputs — datasets, methods, trained practitioners — remain in active use across the research community.
2024 · Independent
Research Product

Applied ML — Software Development

Produced a document-intelligence utility tool for structured extraction from heterogeneous PDF corpora. Focus on software architecture & design for transformer models.

tool dev · NLP
Detail · 10♣
output: AIPDF2Table
mode: independent research product
Document intelligence utility for structured extraction from heterogeneous PDF corpora. Software architecture for transformer-model application.
2018 — 2020
LANDnet · Policy Research

Geospatial Subject Matter Expert

Mapped schema variations and advised on land-data mapping and national record digitisation. Data management, processing & software development.

policy · geo
Detail · 9♣
capacity: geospatial SME
scope: national record digitisation
Schema-mapping advisor for land-data & national-record digitisation programmes. Data management, processing, and software development at the policy interface.

Exploratory Work

Experiments & Projects

Algorithmic tests, benchmark studies, and data models: work that extends existing knowledge to real-world incidents.

Behavioral Signal Extraction from Mobility Data

Treats COVID-19 lockdown periods as a natural experiment — asking what population movement signals reveal about how policy propagates through behavior, and whether real-time observation changes the answer.

GitHub →

Infrastructure Adoption as a Spatial Problem

Models U.S. EV charging growth as a pattern spreading across geography, asking where adoption moves next and what the shape of that curve reveals about energy transition timelines.

GitHub →

Air Quality Geospatial Pipeline

Scrapy-based pipeline collecting EPA AirNow data for geospatial air quality analysis.

GitHub →

Time Series Forecasting Compendium

Comparative study of classical statistical forecasting models (ARIMA, Prophet) and deep learning forecasting models, examining where each architectural class breaks down and whether failure modes are predictable from the structure of the algorithm.

GitHub →

Document Intelligence: OCR

Early tests in recovering structured data from documents never intended for machines, and a precursor to the AIPDF2Table tool: a research pipeline for table extraction and processing over PDF and multi-format document corpora.

GitHub →

GPU Modernization of Floyd-Warshall

Extends and benchmarks the Floyd-Warshall all-pairs shortest-path algorithm on CPUs versus modern GPUs, examining where the architectural assumptions change. Part of the PCAM methodology study; results and benchmarks are in the repo.

GitHub →
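
For readers unfamiliar with the algorithm named above, here is a minimal CPU reference sketch of Floyd-Warshall in pure Python. It is not the code from the repository; the function name `floyd_warshall` and the four-node example graph are assumptions made for illustration.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """Return all-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the edge weight from i to j, INF when there is no
    edge, and 0 on the diagonal. The input matrix is not modified.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):                  # intermediate vertex: sequential
        row_k = dist[k]
        for i in range(n):              # for fixed k, each (i, j) update
            d_ik = dist[i][k]           # is independent of the others
            if d_ik == INF:
                continue                # no path i -> k, nothing to relax
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Hypothetical 4-node directed graph used only as an example.
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]

for row in floyd_warshall(graph):
    print(row)
```

The outer loop over `k` carries a dependency from one iteration to the next, while the `i` and `j` loops for a fixed `k` do not depend on each other. That inner pair of loops is the part a GPU version would typically spread across threads.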

How I Can Help

About Me & Service Engagements

My work spans document intelligence, computer vision, geospatial mapping/imaging CV, and machine learning systems research. I support organizations and cross-functional teams from prototype through production.

I translate mathematical concepts into algorithmic systems, develop CV models for segmentation and imaging workflows, and design scalable ETL, model development, and inference strategies.

I also build and ship document intelligence applications to production, with emphasis on reliability under real-world data and infrastructure constraints.

Document Intelligence & Multimodal AI

  • Prototype and evaluate extraction pipelines for mixed-format document corpora.
  • Design table/layout understanding flows for noisy, heterogeneous documents.
  • Productionization support: ingestion, validation, and downstream integration.

Computer Vision & Geospatial Imaging

  • Segmentation model development for material, imaging, and mapping workflows.
  • Geospatial ML/CV architecture for earth systems and infrastructure problems.
  • Benchmark design that reflects operational and data constraints.

ML Systems & Applied Research Strategy

  • Translate research questions into measurable, testable engineering roadmaps.
  • Large-scale ETL, model development, and inference strategy for enterprise teams.
  • Program development for applied AI initiatives and cross-functional execution.