Current work

Research

I design AI systems and studies, then test whether the apparent gains are stable, interpretable, and meaningful beyond a single result.

Overview

Research areas

Data and Computational Inference

Constructing useful measurements and representations from structured and unstructured data, then testing what can be inferred from them.

AI Experimentation

Designing controlled interventions to study how AI systems change decisions, coordination, and other downstream outcomes.

Model and System Evaluation

Training and adapting language models, building reproducible pipelines, measuring computational efficiency, and testing whether improvements generalize.

In Depth

Current research

Computational Experimentation

How AI support changes collective action

I am co-developing an online experiment that separates information provision from strategic interpretation, normative framing, and coordination support. The goal is to identify which kinds of assistance change what people do, not merely what they are shown.

Shared supply chains provide a concrete social dilemma: buyers can improve labor conditions together, but each has an incentive to reduce its own costs and hope others act. The study turns that tension into repeated choices, messages, links, and group outcomes.

Design: Five human buyers, one fixed-rule supplier, four experimental conditions
Role: Co-developing the study design, treatments, and behavioral measures
Status: Working paper complete; experimental implementation in progress

AI and Human Collaboration in Social Dilemmas

With Benjamin Rosche and Hanan Salam · Working paper, June 2026

Participants decide whether to pay fairly, audit, form information-sharing links, communicate, and join costly remediation. The AI does not choose on anyone’s behalf; it changes how people interpret and coordinate around the decision.

Structured-information control: rules, records, findings, and communication tools without AI interpretation.
Strategic interpretation: a private explanation of the decision’s strategic structure.
Normative interpretation: responsibility, complicity, and harm frames alongside the strategic context.
Collective-action support: partner suggestions, conditional-message drafting, and coalition tracking.

Measures include fair pay, auditing, links, remediation, welfare, responsibility attribution, and persistence after AI withdrawal.

Request the working paper

Model Training and Evaluation

Efficient adaptation with biological priors

This project asks how domain knowledge can be introduced into pretrained protein models while preserving the efficiency benefits of parameter-efficient fine-tuning.

Bio-Informed LoRA for Signal Peptide Prediction

Research Assistant, eBRAIN Lab, NYU Abu Dhabi · 2026 to present · Co-first author

I co-developed a method that incorporates BLOSUM62, hydrophobicity, and Grantham priors into LoRA adapters for ESM-2. BLOSUM62-guided adaptation matches or exceeds full fine-tuning on SignalP6 benchmarks while training approximately 3.6% of model parameters and using approximately 43% less peak GPU memory at the 3B-parameter scale.

~3.6% of parameters trained

~43% less peak GPU memory at 3B scale

Multi-seed evaluation and sensitivity analysis
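For readers unfamiliar with where a small trainable fraction comes from, the following is a minimal sketch of the per-layer arithmetic behind standard LoRA, not the project's implementation. It assumes a single dense layer of shape d_in x d_out adapted with two rank-r matrices; the function name and the layer sizes are hypothetical. The overall ~3.6% figure depends on which layers are adapted and at what rank, which this page does not state.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of one dense layer's weights trained by a rank-`rank` LoRA adapter.

    Standard LoRA freezes the d_in x d_out weight matrix and trains two
    low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    full_params = d_in * d_out               # frozen pretrained weights
    adapter_params = rank * (d_in + d_out)   # trainable A and B
    return adapter_params / full_params


# Hypothetical layer size and rank, chosen only to illustrate the arithmetic.
fraction = lora_trainable_fraction(d_in=1024, d_out=1024, rank=8)
print(f"{fraction:.4%} of this layer's weights are trainable")
```

The fraction shrinks as layers get wider at a fixed rank, which is one reason adapter-based training tends to save more memory at larger model scales.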

My contribution focuses on training and evaluation design. I developed multi-seed protocols and a one-factor sensitivity analysis; several single-seed gains did not replicate, underscoring the importance of controlled comparison.
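A minimal sketch of the kind of paired multi-seed comparison described above, assuming each method is scored once per seed with matched seeds. The function name and the scores are illustrative placeholders, not results or code from the project.

```python
from statistics import mean, stdev


def summarize_seeds(baseline: list[float], candidate: list[float]) -> dict:
    """Paired per-seed comparison of two methods evaluated on the same seeds."""
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need matched scores from at least two seeds")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    return {
        "n_seeds": len(diffs),
        "mean_diff": mean(diffs),            # average gain across seeds
        "sd_diff": stdev(diffs),             # seed-to-seed spread of the gain
        "wins": sum(d > 0 for d in diffs),   # seeds where the candidate is ahead
    }


# Placeholder scores for five seeds (made up for illustration only).
baseline = [0.80, 0.82, 0.81, 0.79, 0.83]
candidate = [0.84, 0.81, 0.80, 0.80, 0.82]
summary = summarize_seeds(baseline, candidate)
print(summary)
```

In this made-up example the first seed alone suggests a four-point gain, while the mean difference across five seeds is close to zero and the candidate is ahead on only two of them. That is the pattern a single-seed comparison can hide.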

Under review at an EMNLP 2026 workshop.

Request the manuscript

Data and Representation

Learning from code-mixed language data

This project compared task-specific multilingual fine-tuning with zero-shot general-purpose models on Hinglish named-entity recognition, where languages and scripts are mixed within the same data.

Hinglish Named Entity Recognition Benchmark

Research project · Fall 2024

I fine-tuned mBERT and XLM-RoBERTa on the COMI-LINGUA dataset, then evaluated them against GPT-4o and Claude 3.5 Sonnet baselines under a common protocol. Fine-tuned XLM-RoBERTa achieved 78% entity-level F1, compared with 76% for the GPT-4o baseline.
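Entity-level F1 counts a prediction as correct only when both the span boundaries and the entity type match exactly. The sketch below shows one common way to compute it from BIO tag sequences. The page does not state which scorer the project used, so the tag scheme, the function names, and the handling of stray I- tags are assumptions.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Assumption: an I- tag that does not continue an open span of the same
    type is ignored rather than treated as the start of a new entity.
    """
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)   # exact span and type matches
        fp += len(p - g)   # predicted entities not in gold
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the PER span is truncated in the prediction, so it counts
# as both a false positive and a false negative.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]
score = entity_f1(gold, pred)
```

Because partial overlaps earn no credit, entity-level F1 is stricter than token-level accuracy, which matters in code-mixed text where entity boundaries can fall at language or script switches.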

Repository · Read the report