COLLINS WESTNEDGE
SKILLS
Programming: Python (PyTorch, Transformers, NumPy, pandas, scikit-learn), SQL, Git
LLM/NLP: RAG, prompt engineering, fine-tuning (mixed precision, grad accumulation), shallow fusion for ASR, activation steering, debiasing, contrastive learning
MLOps & Data: Experiment design & permutation tests, calibration, distributed training, MLflow, Databricks/Spark, Azure (Cognitive Search, Document Intelligence), Kubernetes, Hugging Face Hub
EXPERIENCE
Northwestern Mutual — Data Science & Analytics
Senior Data Scientist (2022–Present) | Data Scientist (2021–2022)
- Led data-science efforts for a CSR-facing RAG platform; designed & implemented automated evaluation that cut A/B test cycles from ~30 days to 7 minutes; adopted across the CSR org (~700 users)
- Unlocked enterprise access to open-source models by partnering with risk/security and documenting guardrails; delivered a Whisper ASR pipeline with a 95% cost reduction vs. AWS Transcribe and a 30% relative WER improvement over the vendor baseline
- Engineered a multi-stage LLM workflow for regulatory checks (risk tolerance & consent) with human-in-the-loop validation; saved 200+ hours annually while meeting audit requirements
- Sponsored & advised an InfoNCE fine-tuning project (contrastive learning + domain adaptation); the project owner achieved a 70% reduction in vector storage (loss sketch below)
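A minimal sketch of the InfoNCE objective behind that work, assuming paired query/key embeddings with in-batch negatives; the function name and temperature value are illustrative, not the project's actual configuration:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(queries, keys, temperature=0.07):
    """InfoNCE with in-batch negatives: the key at index i is the
    positive for query i; every other key in the batch is a negative."""
    q = F.normalize(queries, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.T / temperature                   # cosine-similarity logits
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)
```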
Northwestern Mutual Data Science Institute — Research Subcommittee Member (2023–Present)
- Reviewed ML/AI grant proposals, advised on experimental design & evaluation, and delivered workshops & panel discussions on ML/AI
Hanover Investment Advisors — Statistical Programming Analyst (2014–2017)
- Analyzed commercial real-estate datasets to inform portfolio allocation and risk modeling for institutional clients
SELECTED PROJECTS
Medical Speech Recognition via Shallow Fusion — GitHub
- Implemented shallow fusion integrating Whisper ASR with a domain-adapted GPT-2 via a custom LogitsProcessor applied during beam search; achieved an 8.5% relative WER reduction (p = 0.012) on medical speech (fusion sketch after this list)
- Built an evaluation framework with medical-terminology normalization, permutation-based significance testing (test sketch after this list), and domain-specific error analysis; released the models and complete implementation
- Fine-tuned GPT-2 on 3.6B PubMed tokens using a production-grade training pipeline with mixed precision (FP16), gradient accumulation (128 effective batch size), and fault-tolerant checkpointing (training-loop sketch after this list)
- Implemented training optimizations including cosine learning-rate scheduling with linear warmup, selective weight decay, and automatic checkpoint recovery; achieved stable convergence at 131K tokens/batch
- Developed a debiasing method using projection-based concept removal in embedding space; achieved a 42% reduction in bias triggers while maintaining comparable retrieval quality (projection sketch after this list)
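A minimal sketch of the fusion step, assuming (as a simplification) that the LM and the ASR decoder share a token vocabulary, which the real Whisper + GPT-2 integration has to reconcile; class and parameter names are illustrative, not the repository's actual API:

```python
import torch
from transformers import LogitsProcessor

class ShallowFusionProcessor(LogitsProcessor):
    """Shallow fusion: add lambda-weighted LM log-probs to the ASR
    decoder's scores at each beam-search step (this sketch assumes a
    shared token vocabulary between the LM and the decoder)."""

    def __init__(self, lm, fusion_weight=0.3):
        self.lm = lm                        # e.g., a domain-adapted GPT-2
        self.fusion_weight = fusion_weight  # lambda, tuned on dev data

    @torch.no_grad()
    def __call__(self, input_ids, scores):
        lm_logits = self.lm(input_ids).logits[:, -1, :]    # next-token logits
        lm_log_probs = torch.log_softmax(lm_logits, dim=-1)
        asr_log_probs = torch.log_softmax(scores, dim=-1)
        return asr_log_probs + self.fusion_weight * lm_log_probs
```

A processor like this plugs into generation through transformers' LogitsProcessorList, e.g. model.generate(..., logits_processor=LogitsProcessorList([ShallowFusionProcessor(lm)])).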
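A sketch of the significance test, assuming a paired sign-flip permutation design over per-utterance WER differences; names are illustrative:

```python
import numpy as np

def paired_permutation_test(wer_a, wer_b, n_perm=10_000, seed=0):
    """Under H0 the two systems are exchangeable per utterance, so
    randomly flipping the sign of each per-utterance WER difference
    generates the null distribution of the mean difference."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(wer_a, dtype=float) - np.asarray(wer_b, dtype=float)
    observed = diffs.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null = (signs * diffs).mean(axis=1)
    return float((np.abs(null) >= abs(observed)).mean())  # two-sided p-value
```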
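A condensed sketch of the mixed-precision + gradient-accumulation pattern, assuming PyTorch AMP on CUDA; the accumulation step count and names are illustrative, not the pipeline's actual values:

```python
import torch

def train_epoch(model, loader, optimizer, scheduler, accum_steps=16):
    """Accumulate FP16 losses over accum_steps micro-batches before one
    optimizer step, yielding a larger effective batch size."""
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, batch in enumerate(loader, start=1):
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = model(**batch).loss / accum_steps  # normalize per micro-batch
        scaler.scale(loss).backward()
        if step % accum_steps == 0:
            scaler.step(optimizer)   # skipped internally if grads overflowed
            scaler.update()
            scheduler.step()
            optimizer.zero_grad(set_to_none=True)
```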
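A minimal sketch of projection-based concept removal, assuming a single bias direction; in practice the direction would be estimated (e.g., from contrast pairs) and possibly extended to a subspace:

```python
import torch

def remove_concept(embeddings, concept_direction):
    """Project each embedding onto the subspace orthogonal to the
    (normalized) concept direction v: e <- e - (e . v) v, removing the
    bias component while preserving the orthogonal content."""
    v = concept_direction / concept_direction.norm()
    return embeddings - torch.outer(embeddings @ v, v)
```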
EDUCATION
University of Chicago — B.A., Philosophy (2018)
Relevant coursework: Logic; Probability and Statistics; Dynamic Semantics
University of Wisconsin–Milwaukee — Mathematics Coursework (2024–Present)
Completed: Linear Algebra; In progress: Calculus III