GitHub | Resume | Substack | LinkedIn
ai.interpreter@proton.me
Hi, I’m Jason Tang (aka David Kim). I’m an AI safety research engineer with a background in cognitive science and experience across adversarial evaluations, AI control, agentic systems, and sociotechnical risk. I’m currently part of the Technical Alignment Research Accelerator (TARA) and the Singapore AI Safety Hub (SASH), and I’m under consideration for an AI security researcher role at Carnegie Mellon SEI.
My strongest work starts with a fuzzy failure mode and turns it into something testable: an eval, a harness, a taxonomy, or a bounded empirical result. Recently, I built an attacker-pressure evaluation workflow for Control Tower, a new control setting from Redwood Research, and showed that stronger attacker lanes can flip a fixed-threshold control conclusion. The report is published on Apart Research.
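To make "flip" concrete, here is a minimal toy sketch, with synthetic suspicion scores and made-up numbers rather than anything from the published eval: a protocol that flags actions above a fixed suspicion threshold clears the required catch rate against a weak attacker lane, then fails the same threshold against a stronger, monitor-aware one.

```python
# Toy illustration only (hypothetical distributions and thresholds, not the
# Control Tower eval): a blue-team monitor flags any action whose suspicion
# score exceeds a fixed threshold, evaluated against two attacker lanes.
import numpy as np

rng = np.random.default_rng(0)
THRESHOLD = 0.5            # fixed suspicion threshold chosen by the blue team
REQUIRED_CATCH_RATE = 0.9  # conclusion: "safe" iff >= 90% of attacks are flagged

def catch_rate(attack_scores: np.ndarray) -> float:
    """Fraction of attack actions the monitor flags at the fixed threshold."""
    return float((attack_scores > THRESHOLD).mean())

# Weak lane: naive attacks look suspicious, so monitor scores cluster high.
weak_attacker = rng.beta(8, 2, size=10_000)    # mean suspicion ~0.8
# Strong lane: the attacker optimizes against the monitor, shifting its
# suspicion scores toward the benign region.
strong_attacker = rng.beta(3, 4, size=10_000)  # mean suspicion ~0.43

for name, scores in [("weak", weak_attacker), ("strong", strong_attacker)]:
    rate = catch_rate(scores)
    verdict = "SAFE" if rate >= REQUIRED_CATCH_RATE else "UNSAFE"
    print(f"{name} attacker lane: catch rate {rate:.2f} -> judged {verdict}")
```

The real workflow has the same shape: hold the protocol and threshold fixed, escalate attacker pressure, and check whether the safety verdict survives.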
More broadly, I care about making it harder to mistake results that merely look safe for ones that are actually robust.
I’ve worked on AI security evals and red/blue teaming for Anthropic, Trajectory Labs, and Innodata, while building open-source evaluation and security tooling used by thousands of practitioners.
I also think a lot about the harder-to-classify side of AI risk: psychosocial harms, incident taxonomy, hazard estimation, and the problem of turning fuzzy emerging patterns into categories that governance can actually use. That line of work grew out of independent research and qualitative input I contributed to Arcadia Impact researchers working with the MIT AI Risk Repository & AI Incident Database ecosystem.
I’m most interested in roles and collaborations where I can help build rigorous evaluations, safer agent infrastructure, and better measurement for advanced-AI risk.
– Jason Tang
Selected Work
- Attacker Pressure Flips AI Control Conclusions — bounded empirical result on attacker-pressure-sensitive control evaluation
- Context Engineering — open-source methodology for context manipulation and prompt-injection testing
- AISecForge — AI security vulnerability taxonomy and disclosure framework
- Model Research Instruments — behavioral evaluation framework for scaffolded model hypothesis generation
- Cognitive Tools — prompt optimization and controllability research
Open Source
Quant Lab | Cognitive Tools | RL101 | TransformerCircuits | NVIDIA Universal Deep Research | AISecForge | Context Engineering | kernels
Research Interests
AI control, adversarial evaluation, scalable oversight, agentic red/blue teaming, sociotechnical risk, psychosocial harms, incident taxonomy, and research tooling for advanced AI systems.
Education & Independent Study
B.S., Psychology (Statistics) — University of Texas at Austin
ARENA: AI Safety Bootcamp
Stanford: CS229, CS230, CS231n, CS336
Berkeley: CS285
Harvard: CS 2881 (AI Safety)
Google: Machine Learning Education