LinkedIn | GitHub | CV/Resume | Substack
ai.interpreter@proton.me
Research Samples
SlopBench: Informal Specs as In-Context Oracles for Secure Program Synthesis
BioGuard: Screening Biological Risk Across Multi-Turn AI Conversations
Attacker Pressure as a First-Class Variable in AI Control Evaluation
Hi, I'm Jason Tang (aka David Kim). I'm an interdisciplinary researcher and operational generalist bridging technical evaluation, behavioral science, cybersecurity/biosecurity, and governance. Currently, I'm focused on the AI Safety ecosystem's core generalist bottleneck (as defined by the Generator Residency).
I transitioned from natural-science research and a clinical lab background (psychology/biology) into AI Safety research through fieldbuilding programs (like BlueDot and ARENA), fellowships, and, most importantly, Apart Research Sprints!
I think my comparative advantage is in translating abstract agent failure modes into deployable infrastructure, decision-relevant evals, and practical research agendas: identifying which concerns actually matter, scoping them into tractable projects, and building the protocols and benchmarks that institutions can use to progress their work.
Right now, I lead research management for Beyond Overload as a CORDA Democracy Fellow, contribute independently to a SPAR project, and am a member of the Singapore AI Safety Hub (SASH). I've also contributed mixed-methods data and input to Arcadia Impact and MIT FutureTech on AI incident classification, particularly for behavioral issues like AI dependency; that work fed into an ICML paper submission, which I was invited to give feedback on. I was also a recent participant at AI Security Bootcamp Singapore, where I received fieldbuilding advice and career guidance from key community leaders like David Williams-King from ERA Cambridge, Jan MichelFeit from UK AISI, and Nitzan Shulman from Heron AI Security. This led to my interest in fieldbuilding and a Coefficient Giving proposal for an AI Biosecurity Bootcamp.
I'm honestly still figuring out where my highest leverage is: pure research, operations, or acting as the bridge between them. I plan to keep building my own technical evaluations on the side, because the most effective field-builders I know remain strong researchers themselves.
I spend a lot of time thinking about how frontier models influence human attitudes and behavior. Recently, I authored a working paper on evaluating psychosocial risk in high-engagement AI systems, designing psychology-grounded benchmarks for measuring socioaffective harms and dependency in companion chatbots. I'm deeply interested in mapping how models persuade, deceive, or shift into misaligned personas under in-context learning.
I've led multiple Apart Research sprints. On the security side, I recently engineered BioGuard, a biosecurity protocol that catches malicious intent spread across multi-turn interactions.
Before transitioning to independent research, I ran adversarial evaluations and red-teaming for Anthropic, Trajectory Labs, ActiveFence, and Innodata. I've built hypervisor-based containment architectures, prompt-injection test suites, and open-source behavioral evaluation tooling used by thousands of practitioners.
I'm generally looking for roles and collaborations where I can design rigorous human-AI experiments, build better behavioral evaluations, or investigate the sociotechnical impacts of advanced AI.
Selected Research & Writing
- Evaluating Psychosocial Risk in High-Engagement AI Systems: Proposed a psychology-grounded benchmark for measuring socioaffective harms, theory of mind failures, and dependency risks in companion chatbots.
- Beyond Overload: Built a working protocol for AI-mediated human deliberation, mapping how agent routing and pacing affect collective reasoning and attention allocation.
- In-Context Persona Shift Evals for CBRN Risk: Architected an evaluation framework measuring how frontier models can be manipulated into misaligned personas via domain-specific context.
- Attacker Pressure Flips AI Control Conclusions: Presented a bounded empirical result showing that stronger adaptive adversaries can invalidate baseline oversight benchmarks.
- BioGuard: Screening Biological Risk Across Multi-Turn AI Conversations: Engineered a portable Agent Skill to monitor incremental biological knowledge transfer across an entire conversation history.
- Operationalizing SB 53 Through Hypervisor Agent Containment: Proposed a runtime architecture with microVM isolation and fail-closed fallback chains to translate statutory safety language into engineering invariants.
Open Source Tooling
Model Research Instruments | Context Engineering | AISecForge | arena-explainers | Cognitive Tools | Quant Lab
Research Interests
Human-AI interaction, psychosocial risk, persuasion and manipulation in conversational agents, AI control, adversarial threat modeling, computational social science, and translating fuzzy harms into governance-ready incident taxonomies.
Affiliations & Background
CORDA: Democracy Fellow (Research Management)
BlueDot Impact: Technical AI Safety Cohort
Singapore AI Safety Hub (SASH): Member
University of Texas at Austin: B.S., Psychology (Statistics)