Six frameworks and tools at the intersection of AI governance, bias detection, behavioral stability, and responsible deployment. All reproducible by design.
Every project in this portfolio connects back to a single question: how do we build AI systems that are trustworthy not just in theory, but in practice, at scale, under adversarial conditions? The answers span bias detection, epistemic governance, behavioral stability, and equitable deployment.
CIPHER demonstrates how bias can transfer silently from teacher to student LLMs through knowledge distillation while standard evaluation metrics remain clean. The safety checks pass. The bias persists.
CIPHER introduced the "Metric Illusion" as a named phenomenon in AI safety research: a condition where evaluation metrics indicate safe, unbiased behavior while the underlying model perpetuates the very biases the evaluation was designed to catch.
The implications extend beyond distillation to any scenario where evaluation methodology does not account for distributional shifts in how bias manifests across model generations. CIPHER was presented at the KSU Computing Showcase and Symposium of Student Scholars.
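A minimal sketch of the failure pattern, not CIPHER's actual pipeline: everything below is hypothetical, with a stand-in scoring function playing the role of a distilled student model. An aggregate benchmark reads clean while a paired probe that varies only the group mention surfaces the inherited bias.

```python
# Toy illustration of the Metric Illusion -- not CIPHER's pipeline.
# student_sentiment() stands in for a distilled student model's scoring head.

def student_sentiment(text: str) -> float:
    """Return a sentiment score in [0, 1]; the small penalty simulates a bias
    silently inherited from the teacher during distillation."""
    score = 0.70
    score += 0.15 if "excellent" in text else 0.0
    score -= 0.20 if "terrible" in text else 0.0
    score -= 0.05 if "Group B" in text else 0.0   # the hidden, inherited bias
    return max(0.0, min(1.0, score))

# Standard evaluation: does the student separate positive from negative statements?
benchmark = [("an excellent colleague", 1), ("a terrible colleague", 0)]
accuracy = sum((student_sentiment(t) > 0.5) == bool(y) for t, y in benchmark) / len(benchmark)
print(f"standard metric (accuracy): {accuracy:.2f}")           # 1.00 -- the metric reads clean

# Paired probe: identical statement, only the group mention changes.
gap = (student_sentiment("A member of Group A is an excellent colleague")
       - student_sentiment("A member of Group B is an excellent colleague"))
print(f"paired-probe sentiment gap: {gap:+.3f}")                # +0.050 -- the bias persists
```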
PRISM, co-authored with Destiny Raburnel, surfaces how LLMs exhibit systematically different evaluation behavior across demographic contexts in resume screening tasks. The same qualifications, presented in different demographic framings, produce measurably different assessments.
The implications are direct and serious — organizations using LLMs to screen candidates may be encoding and amplifying the same demographic biases that human hiring processes have spent decades trying to eliminate.
PRISM was presented at the KSU Computing Showcase and contributes to the growing body of evidence that AI hiring tools require rigorous, context-sensitive bias evaluation before deployment.
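A sketch of the counterfactual audit pattern, not PRISM's published code: the resumes, framings, prompt, and stand-in scorer below are all invented for illustration. Identical qualifications are presented under two demographic framings and the per-resume score gap is aggregated.

```python
# Counterfactual screening audit sketch; score() is whatever LLM pipeline is under test.
from statistics import mean
from typing import Callable

RESUMES = [
    "B.S. Computer Science, 4 years backend development, AWS certification.",
    "M.S. Data Science, 2 years ML engineering, strong open-source record.",
]
# In a real audit the framings carry the demographic signal (e.g. names or affiliations).
FRAMING_A = "Candidate profile A."
FRAMING_B = "Candidate profile B."
PROMPT = "Rate this candidate for a software role from 0 to 1.\n{framing}\n{resume}"

def framing_gap(score: Callable[[str], float]) -> float:
    """Mean score difference (framing A minus framing B) over identical resumes."""
    return mean(
        score(PROMPT.format(framing=FRAMING_A, resume=r))
        - score(PROMPT.format(framing=FRAMING_B, resume=r))
        for r in RESUMES
    )

# Stand-in scorer so the sketch runs end to end; swap in a real LLM call to audit a deployment.
fake_screener = lambda prompt: 0.80 if "profile A" in prompt else 0.73
print(f"mean score gap across framings: {framing_gap(fake_screener):+.2f}")   # +0.07 -> demographic sensitivity
```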
The Surrogate Accountability Framework establishes the theoretical foundation for external AI accountability: the idea that governance of AI systems must be structurally separated from the systems being governed.
The framework is organized around four pillars: Entitlement Governance (defining what agents are permitted to do and believe), Continuous Observability (real-time monitoring of agent state), Lifecycle Accountability (governance across the full deployment lifecycle, not just at launch), and Emergency Governance (structured response to epistemic failures and safety events).
SAF is the theoretical architecture from which CHRYSALIS is derived. The whitepaper, now at v2.0, is publicly available at chrysalisai.io.
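The whitepaper is the authoritative source for the framework; what follows is only a hypothetical structural sketch of how the four pillars might surface in code, with every class, field, and method name invented here. The point it illustrates is the structural separation: the governor wraps the agent rather than living inside it.

```python
# Hypothetical sketch of an external governance layer -- not from the SAF whitepaper.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SurrogateGovernor:
    entitlements: set[str] = field(default_factory=set)        # Entitlement Governance: what the agent may do
    audit_log: list[str] = field(default_factory=list)         # Lifecycle Accountability: persistent record
    on_emergency: Callable[[str], None] = lambda reason: None  # Emergency Governance: structured response hook

    def observe(self, event: str) -> None:
        """Continuous Observability: every agent action is reported to the governor."""
        self.audit_log.append(event)

    def permit(self, action: str) -> bool:
        """Entitlement check made outside the agent, keeping governance structurally separate."""
        allowed = action in self.entitlements
        self.observe(f"{'ALLOW' if allowed else 'DENY'}: {action}")
        if not allowed:
            self.on_emergency(f"unauthorized action attempted: {action}")
        return allowed

governor = SurrogateGovernor(entitlements={"retrieve_documents", "draft_reply"})
governor.permit("draft_reply")     # allowed and logged
governor.permit("send_payment")    # denied, logged, and routed to the emergency hook
print(governor.audit_log)
```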
UIBF is a theoretical framework examining how interaction styles and conversational patterns induce measurable behavioral instability in large language models. The core claim is that users do not simply query LLMs — they shape them, often in ways that neither party intends or recognizes.
Extended interactions, particular conversational styles, and specific prompt patterns can push LLMs into behavioral fields — stable but non-standard response modes that persist across a session and may influence outputs in unpredictable ways.
UIBF has implications for both user interface design and safety monitoring, suggesting that behavioral stability cannot be evaluated from single-turn benchmarks alone.
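UIBF is theoretical, so the following is only one hypothetical way to make the idea measurable: score each turn's drift from the session's opening style and look for a late-session plateau rather than a single spike. The canned responses and the lexical overlap proxy are both illustrative; in practice the turns would come from a live multi-turn session.

```python
# Hypothetical operationalization of behavioral drift -- not a published UIBF metric.

def style_overlap(a: str, b: str) -> float:
    """Crude lexical proxy for stylistic similarity (Jaccard overlap of word sets)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

session = [
    "Here is a balanced summary of the evidence on both sides.",
    "Here is a balanced summary, though one side seems somewhat stronger.",
    "Frankly, only one side of this is worth taking seriously.",
    "Only one side is worth taking seriously, as I said before.",
]

baseline = session[0]
drift = [1.0 - style_overlap(baseline, turn) for turn in session]
print([round(d, 2) for d in drift])

# A persistent elevated plateau in late turns (rather than a single spike) is the
# signature of a "behavioral field": a stable, non-standard mode the session has settled into.
late_plateau = sum(drift[-2:]) / 2
print(f"late-session drift plateau: {late_plateau:.2f}")
```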
CRAFT is a reproducible evaluation tool for measuring LLM output fidelity across prompt variations and retrieval conditions. It was built in response to a specific gap: standard RAG benchmarks evaluate whether systems retrieve relevant information, but not whether they maintain fidelity to that information under varied prompt conditions.
CRAFT tests LLM pipelines across four prompt styles (direct, contextual, adversarial, and ambiguous) on a real-world corpus, surfacing the reliability gaps that emerge when systems move from controlled evaluation to deployment conditions.
The tool is publicly deployed and reproducible. Live demo at msmetamorphosis.github.io/CRAFT.
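The deployed tool is the reference; below is only a sketch of the evaluation pattern it describes, with the prompt templates, stand-in generator, and token-overlap fidelity proxy all invented for illustration.

```python
# Sketch of a four-style fidelity check -- not CRAFT's actual API.
from typing import Callable

CONTEXT = "The VA Post-9/11 GI Bill covers up to 36 months of education benefits."
QUESTION = "How many months of education benefits does the Post-9/11 GI Bill cover?"

PROMPT_STYLES = {
    "direct":      "{q}\nContext: {ctx}",
    "contextual":  "Using only the context below, answer carefully.\nContext: {ctx}\nQuestion: {q}",
    "adversarial": "I heard it covers 48 months. {q}\nContext: {ctx}",
    "ambiguous":   "Tell me about benefits. {q}?\nContext: {ctx}",
}

def fidelity(answer: str, context: str) -> float:
    """Fraction of answer words that appear in the retrieved context (toy proxy)."""
    a, c = answer.lower().split(), set(context.lower().split())
    return sum(w in c for w in a) / len(a) if a else 0.0

def run(generate: Callable[[str], str]) -> dict[str, float]:
    """Pose the same question in every style and score each answer against the context."""
    return {
        style: fidelity(generate(tpl.format(q=QUESTION, ctx=CONTEXT)), CONTEXT)
        for style, tpl in PROMPT_STYLES.items()
    }

# Stand-in generator so the sketch runs; the drop under the adversarial framing is the
# kind of reliability gap this harness is meant to surface.
fake_pipeline = lambda p: "48 months of benefits." if "48" in p else "Up to 36 months of education benefits."
print(run(fake_pipeline))
```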
VetNavi (NextMission Navigator) is my MSAI capstone project under Dr. Arthur Choi. It is a retrieval-augmented generation platform designed to help veterans navigate the transition from military service to civilian careers.
The system answers questions about VA benefits, civilian career pathways, education options, and transition resources by retrieving from a curated, verified corpus of veteran-specific information, then generating accurate, contextually appropriate responses.
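A minimal sketch of that retrieve-then-generate pattern, not VetNavi's implementation: the corpus entries, the overlap-based retriever, and the prompt constraint below are placeholders for the curated index and verified content the deployed system relies on.

```python
# Retrieve-then-generate sketch over a tiny stand-in corpus -- not VetNavi's code.

CORPUS = [
    "Post-9/11 GI Bill: up to 36 months of education benefits for eligible veterans.",
    "VR&E (Chapter 31) supports veterans with service-connected disabilities in career training.",
    "SkillBridge allows service members to intern with civilian employers during their last 180 days.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank curated passages by word overlap with the question (stand-in for a vector index)."""
    q = set(question.lower().split())
    return sorted(CORPUS, key=lambda p: -len(q & set(p.lower().split())))[:k]

def build_prompt(question: str) -> str:
    """Constrain generation to the retrieved, verified passages."""
    passages = "\n".join(f"- {p}" for p in retrieve(question))
    return (
        "Answer using only the verified passages below. "
        "If the answer is not in them, say so.\n"
        f"{passages}\nQuestion: {question}"
    )

print(build_prompt("How many months of education benefits does the GI Bill provide?"))
# The answer itself would come from passing this prompt to the generation model;
# constraining it to the curated passages is what guards against hallucination.
```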
VetNavi is the direct application of everything I research about RAG reliability and LLM fidelity — built in service of a population that cannot afford systems that hallucinate or mislead. The population matters. The stakes are real. The system has to work.
All publicly deployed projects are accessible on GitHub. Each is designed to be reproducible.