ns.

Publications

Below you'll find my published papers, preprints, and technical reports. Each explores a different aspect of ML research, from language models and computer vision to medical AI.

StackEval: Benchmarking LLMs in Coding Assistance

We introduce two coding benchmarks, StackEval and StackUnseen, to evaluate language models' performance on real programming tasks, along with a framework for assessing how well LLMs can judge coding solutions.
