Publications and Pre-prints

My contributions to the research community.

Workshop: NeurIPS'25 Workshop on Foundations of Reasoning in Language Models (FoRLM@NeurIPS'25)

COSMIR: Chain Orchestrated Structured Memory for Iterative Reasoning

Gupta, N., Gowaikar, S., Iyer, A., Shiragur, K., Bairi, R. B., Maurya, R., Maiti, R., Damle, S., Mishra Gupta, S.

Reasoning over very long inputs remains difficult for LLMs. We introduce COSMIR, a chain-style framework that replaces ad hoc agent messages with a structured memory. A Planner agent decomposes a query into sub-questions, and worker agents process input chunks via a fixed micro-cycle (Extract, Infer, Refine). This yields higher faithfulness and better long-range aggregation on benchmarks such as HELMET.

Workshop: AAAI'25 Workshop on AI for Science (AI2SE@AAAI'25)

An Agentic Approach to Automatic Creation of P&ID Diagrams

Gowaikar, S., Iyengar, S., Segal, S., Kalyanaraman, S.

We introduce a novel copilot for automating the generation of piping and instrumentation diagrams (P&IDs) from natural language descriptions. Leveraging a multi-step agentic workflow, it provides a structured, iterative approach to diagram creation directly from natural language prompts.

Conference: AAAI/ACM Conference on AI, Ethics, and Society'25 (AIES'25)

From Efficiency to Equity: Measuring Fairness in Preference Learning

Gowaikar, S., Berard, H., Mushkani, R., Koseki, S.

We introduce a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient and Atkinson Index to quantify fairness in these models.

Workshop: CVPR'24 Workshop on Responsible Data

AI-EDI-SPACE: A Co-designed Dataset for Public Spaces

Gowaikar, S., Berard, H., Mushkani, R., Marchand, E., Ammar, T., Koseki, S.

We propose a co-design methodology that actively engages stakeholders, integrating principles of Equity, Diversity, and Inclusion (EDI). We apply it to develop a dataset and AI model for evaluating public space quality from street view images, demonstrating its effectiveness in capturing diverse perspectives.

Conference: Findings of ACL'23

ANALOGICAL: A Novel Benchmark for Long Text Analogy Evaluation

Wijesiriwardene, T., Wickramarachchi, R., Gajera, B., Gowaikar, S., Gupta, C., Chadha, A., Reganti, A., Sheth, A., Das, A.

Over the past decade, analogies have played a significant role as an intrinsic measure for evaluating word embeddings. We present ANALOGICAL, a new benchmark for intrinsically evaluating LLMs across a taxonomy of long-text analogies. Using thirteen datasets, we evaluate the abilities of eight LLMs to identify analogical pairs in the semantic vector space.