Document Intelligence
Multimodal extraction and production pipelines for high-variance enterprise documents.
ML Systems · High Dimensional Data · Scalable Inference
Applied Research Scientist adapting computational intelligence (AI/ML) to real-world messiness, including efficient neural retrieval, document intelligence modeling and inference, and sensor/geometric data modeling with advanced methods. Eight years across national labs, academic research programs, and applied systems.
Away From Research
My Fun
Books, and Sequels - how I stay adventurous.
Connect
Selected Work
Three areas of sustained inquiry, each asking a variation of the same question: how do you make ML systems that actually work on messy, high-dimensional, real-world data, at inference time, under real constraints?
Multimodal extraction and production pipelines for high-variance enterprise documents.
Geospatial imaging and segmentation work spanning applied systems and open-data science.
Architecture robustness, numerical conditioning, and benchmark design under constraints.
01 — Thesis (04.2026)
"Is embedding algorithm architecture inherently resilient to traffic constraints — or does resilience have to be designed in, at a cost?"
Stress-tests HNSW and IVF approximate nearest neighbor (ANN) architectures across a controlled degradation ladder (baseline → bandwidth-constrained → high-latency → packet loss), mirroring cloud throttling, cross-region lag, and edge node failures. Core contribution: a diagnostic sensitivity framework in which HNSW is latency-sensitive (sequential traversal) and IVF is bandwidth-sensitive (fewer, larger transfers), giving engineers a principled vocabulary for architecture-to-environment matching. Energy per query is tracked as a first-class metric alongside recall and latency. The research asks which design decisions make retrieval systems inherently robust or fragile, and at what cost. In short: most research asks how to make retrieval faster. My thesis asks which retrieval architectures are inherently fragile to network conditions, and what that fragility costs in energy.
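The latency-versus-bandwidth distinction can be illustrated with a toy cost model. This is a minimal sketch, not the thesis benchmark: it assumes a remote index where every traversal step is a network request, and every name and number below (`Network`, `AccessPattern`, the round-trip counts, payload sizes, and ladder settings) is hypothetical and chosen only to show the shape of the argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    rtt_ms: float          # round-trip time per request, in milliseconds
    bandwidth_mbps: float  # link bandwidth, in megabits per second
    loss_rate: float       # probability that a request must be resent


@dataclass(frozen=True)
class AccessPattern:
    round_trips: int    # sequential network requests per query
    kb_per_trip: float  # payload per request, in kilobytes (1 KB = 1000 bytes)


def expected_latency_ms(pattern: AccessPattern, net: Network) -> float:
    """Expected query latency under a deliberately simple model."""
    # kilobytes * 8 = kilobits; 1 Mbps moves 1 kilobit per millisecond.
    transfer_ms = pattern.kb_per_trip * 8.0 / net.bandwidth_mbps
    # Expected number of attempts per request (geometric retries).
    attempts = 1.0 / (1.0 - net.loss_rate)
    return pattern.round_trips * (net.rtt_ms + transfer_ms) * attempts


# Hypothetical access patterns: many small sequential hops versus
# a few large transfers. These are illustrative, not measured.
patterns = {
    "graph_like": AccessPattern(round_trips=20, kb_per_trip=4.0),
    "cluster_like": AccessPattern(round_trips=2, kb_per_trip=400.0),
}

# Hypothetical degradation ladder settings.
ladder = {
    "baseline": Network(rtt_ms=1.0, bandwidth_mbps=1000.0, loss_rate=0.0),
    "bandwidth_constrained": Network(rtt_ms=1.0, bandwidth_mbps=10.0, loss_rate=0.0),
    "high_latency": Network(rtt_ms=100.0, bandwidth_mbps=1000.0, loss_rate=0.0),
    "packet_loss": Network(rtt_ms=1.0, bandwidth_mbps=1000.0, loss_rate=0.05),
}

# Slowdown of each pattern at each rung, relative to its own baseline.
slowdowns = {
    name: {
        rung: expected_latency_ms(p, net) / expected_latency_ms(p, ladder["baseline"])
        for rung, net in ladder.items()
    }
    for name, p in patterns.items()
}

for name, by_rung in slowdowns.items():
    print(name, {rung: round(s, 1) for rung, s in by_rung.items()})
```

Under these assumed numbers, the many-small-hops pattern slows down most on the high-latency rung and the few-large-transfers pattern slows down most on the bandwidth-constrained rung. Energy per query is not modeled here.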
Thesis forthcoming →
02 — Publication (In Preparation)
"What if metric instability isn't a statistics problem — it's a numerical conditioning problem?"
Co-authored paper reframing binary classifier evaluation as a well-posedness problem. We extend the binormal ROC model into a unified differentiable manifold linking ROC, PR, and F₁ simultaneously — enabling threshold optimization via Brent, Golden-section, and RK4 algorithms on a smooth, analytically grounded surface. Four of five optimizers converged to identical optima within 10⁻⁶ tolerance. Bootstrap experiments show >40% reduction in threshold variance under smoothing. Newton's method diverged, exposing the non-convex structure of empirical F₁ and the necessity of bounded search.
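To make the idea of bounded search on a smooth surface concrete, here is a minimal sketch of golden-section search maximizing F₁ under a plain binormal score model. It is not the paper's manifold construction: the model parameters (`mu`, `sigma`, `prevalence`), the search bracket, and the function names are assumptions chosen for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binormal_f1(t: float, mu: float = 2.0, sigma: float = 1.0,
                prevalence: float = 0.3) -> float:
    """F1 at threshold t when negatives ~ N(0, 1) and positives ~ N(mu, sigma)."""
    tpr = 1.0 - norm_cdf((t - mu) / sigma)
    fpr = 1.0 - norm_cdf(t)
    tp = prevalence * tpr
    fp = (1.0 - prevalence) * fpr
    fn = prevalence * (1.0 - tpr)
    return 2.0 * tp / (2.0 * tp + fp + fn)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-6) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


# Assumed bracket; the search never leaves [-2, 5].
best_t = golden_section_max(binormal_f1, -2.0, 5.0)
best_f1 = binormal_f1(best_t)
print(f"threshold={best_t:.4f}  F1={best_f1:.4f}")
```

Because the search is confined to a bracket, it cannot run off to an unbounded threshold the way an unbounded derivative-based step can, which is the practical point behind preferring bounded methods here.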
Preprint forthcoming →
03 — Applied ML Engineering
"How do you turn a decade of PDF reports into a queryable knowledge base?"
Built a production document intelligence system that extracts structured table data from heterogeneous PDF corpora — the kind of documents where layout, encoding, and schema vary unpredictably. The key research contribution is the disambiguation layer: handling merged cells, multi-header tables, and rotated layouts without a fixed template. Now used on real document pipelines.
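As a small illustration of one problem in this space, the sketch below flattens a multi-row header with merged cells into one name per column. It is not the system's disambiguation layer: the function `flatten_headers`, the use of `None` for cells swallowed by a merge, and the rule that blanks in upper rows continue a horizontal merge while blanks in the last row come from a vertical merge are all simplifying assumptions.

```python
from typing import List, Optional


def flatten_headers(header_rows: List[List[Optional[str]]]) -> List[str]:
    """Collapse stacked header rows into one label per column.

    Assumes all rows have the same length. A blank cell in any row except
    the last is treated as the continuation of a horizontally merged cell
    to its left; a blank cell in the last row is left blank.
    """
    filled: List[List[str]] = []
    last_index = len(header_rows) - 1
    for index, row in enumerate(header_rows):
        carried = ""
        out: List[str] = []
        for cell in row:
            text = cell.strip() if cell else ""
            if text:
                carried = text
            # Forward-fill only in the upper rows (horizontal merges).
            out.append(carried if index < last_index else text)
        filled.append(out)

    names: List[str] = []
    for column in zip(*filled):
        parts: List[str] = []
        for part in column:
            # Skip blanks and avoid repeating the same label twice in a row.
            if part and (not parts or parts[-1] != part):
                parts.append(part)
        names.append(" / ".join(parts))
    return names


# Hypothetical two-row header: "2023" and "2024" each span two columns,
# and "Region" spans both header rows.
example = [
    ["Region", "2023", None, "2024", None],
    [None, "Q1", "Q2", "Q1", "Q2"],
]
flat = flatten_headers(example)
print(flat)
```

Real documents break these rules often (for example, a blank in an upper row that belongs to a vertically merged cell), which is why a template-free approach needs more evidence than cell emptiness alone.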
View on GitHub →
04 — Program Leadership
"How do you build AI/ML-ready datasets for domains where none exist?"
Expanded the open-source science curriculum and research program for YouthMappers through training, projects, and the creation of open geospatial datasets from diverse sources for policy and ML research. Developed pedagogical frameworks for ML + GIS methods and trained students at scale. The outputs (datasets, methods, and trained practitioners) are in active use in the research community. The work directly addressed training data limitations and supported new ML/DL model development.
Program overview →
Ongoing Series
Parallel & Distributed Systems Notes
A chapter-by-chapter study of parallel computing applied to ML systems & large-scale data structures. 3 entries published, ongoing.
Dynamic Resource
AI/ML Systems Research Links
Automatically curated papers and news on embedding systems, geospatial ML, and parallel computing. Updated weekly.
Scholarly Work
Peer-reviewed contributions across energy systems informatics, geological data infrastructure, and applied learning frameworks.
Stellamaris (Stella) N.W
Research Scientist · Leidos
Connect & Verify
Use this quick panel for direct profile links and publication verification.
Digital humanitarians for the sustainable development goals: YouthMappers as a hybrid movement
Sustainable Development Goals · YouthMappers
Open mapping towards sustainable development goals: voices of YouthMappers on community engaged scholarship
Open Mapping · Community Engagement
WELLBASE: A Standardized Data Infrastructure for Well Log Analytics
Geological, Oil & Gas data systems · Peer-reviewed
ROKBASE: Rock Sample Database for Imaging DL Applications
Imaging systems · Peer-reviewed
Cities of the Future Need to Be Both Smart and Just: How We Think Open Mapping Can Help
Smart Cities · Open Mapping
Combining Volunteered Geographic Information and WPdx standards to Improve Mapping of Rural Water Infrastructure in Uganda
Geographic Information Systems · Water Infrastructure
A National Critical Minerals and Materials Prospectus and Resource Assessment Methodology
Critical Minerals · Resource Assessment
WELLBASE-An Interactive Platform for Wellbore Material Assessment
Wellbore Materials · Interactive Platform
Rokbase: Digital Rock Visualization and Exploration Web Application
Digital Rock · Web Application
Lite Learning: Efficient Crop Classification in Tanzania Using Traditional Machine Learning & Crowd Sourcing
Crop Classification · Machine Learning · Crowd Sourcing
Numerical Smoothing of Noisy Evaluation Surfaces: A Classical Approach to Robust ML Threshold Optimization
Nakacwa S., Luis P. · Harrisburg University · In preparation
Thesis: Architecture Resilience Under Network Degradation: A Controlled Benchmarking Study of Embedding Retrieval Systems
Harrisburg University · In preparation
Background
Eight years across mission-critical research, academic ML, and applied systems engineering, always working on the same class of problem from different vantage points: how do intelligent systems behave when the environment they were designed for stops cooperating?
Roles are listed in reverse chronological order.
Embedding knowledge & computer vision for mineral regimes. Mathematical models and pipelines reconstructing fragmented oil-and-gas records into a unified national science asset.
Computing systems & algorithms. Benchmarking robustness of retrieval-algorithm architectures. Strengthening AI/ML and computing-systems knowledge.
ML training data & large-data system design. Addressed losses arising from database-schema variation in open-source data.
Expanded open-science curriculums for GIS & ML application. Open geospatial dataset creation for communities. Program outputs supporting policy & AI/ML research.
Produced a document-intelligence utility tool for structured extraction from heterogeneous PDF corpora. Focus on software architecture & design for transformer models.
Mapped & advised schema variations for land-data mapping and national record digitisation. Data management, processing & software development.
Exploratory Work
Algorithmic tests, benchmark studies, and data models: work that extends existing knowledge to real-world events.
Behavioral Signal Extraction from Mobility Data
Treats COVID-19 lockdown periods as a natural experiment — asking what population movement signals reveal about how policy propagates through behavior, and whether real-time observation changes the answer.
GitHub →
Infrastructure Adoption as a Spatial Problem
Models U.S. EV charging growth as a spreading pattern across geography, asking where adoption moves next and what the shape of that curve reveals about energy transition timelines.
GitHub →
Air Quality Geospatial Pipeline
Scrapy-based pipeline collecting EPA AirNow data for geospatial air quality analysis.
GitHub →
Time Series Forecasting Compendium
Comparative study of classical statistical and deep learning forecasting models (including ARIMA and Prophet), examining where each architectural class breaks down and whether failure modes are predictable from the structure of the algorithm.
GitHub →
Document Intelligence - OCR
Early tests in recovering structured data from documents never intended for machines. A precursor to the AIPDF2Table tool, a research table extraction and processing pipeline for PDF and multi-format document corpora.
GitHub →
GPU Modernization of Floyd-Warshall
Extends and benchmarks Floyd-Warshall all-pairs shortest paths on CPUs versus modern GPUs, examining where the architectural assumptions change. Part of the PCAM methodology study; results and benchmarks are in the repo.
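For reference, a minimal CPU version of the algorithm is sketched below in plain Python. It is not the repository's code, and the example graph is made up. The structure shows why the algorithm maps onto GPUs: for a fixed `k`, every `(i, j)` update is independent of the others, so the two inner loops can run in parallel while the outer `k` loop stays sequential.

```python
import math
from typing import List

INF = math.inf


def floyd_warshall(weights: List[List[float]]) -> List[List[float]]:
    """All-pairs shortest path lengths for a dense weight matrix.

    weights[i][j] is the edge weight from i to j, INF if there is no edge,
    and 0 on the diagonal. Assumes there are no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):  # sequential: step k depends on step k - 1
        row_k = dist[k]
        for i in range(n):  # independent across i for a fixed k
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so nothing can improve via k
            row_i = dist[i]
            for j in range(n):  # independent across j for a fixed k
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Hypothetical four-node directed graph.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
for row in shortest:
    print(row)
```

The work is O(n³) with only O(n²) data, so where the matrix lives and how tiles of it are reused tends to matter more than raw arithmetic throughput.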
GitHub →
How I Can Help
My work spans document intelligence, computer vision, geospatial mapping/imaging CV, and machine learning systems research. I support organizations and cross-functional teams from prototype through production.
I translate mathematical concepts into algorithmic systems, develop CV models for segmentation and imaging workflows, and design scalable ETL, model development, and inference strategies.
I also build and ship document intelligence applications to production, with emphasis on reliability under real-world data and infrastructure constraints.
Contract Scope
Advisory sprints, prototype builds, evaluation frameworks, and production handoff support.
Collaboration Model
Embedded with product, research, and engineering teams to accelerate decision-making and delivery.
Also Open To
Senior research-science, engineering, and applied AI roles with high-impact systems scope.