Publications and Pre-prints

These are my contributions to the research community.

Publications

1. ANALOGICAL - A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models

Type: Conference Publication | Venue: Findings of ACL'23 | Link | NotebookLM Link (This NotebookLM recap is good!)

Abstract: Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure of evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity – (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs in identifying analogical pairs in the semantic vector space. Our evaluation finds that it is increasingly challenging for LLMs to identify analogies when going up the analogy taxonomy.

2. AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces

Type: Workshop Presentation | Venue: CVPR'24 Workshop on Responsible Data | Link | NotebookLM Link (This NotebookLM recap is very good!)

Abstract: Advancements in AI heavily rely on large-scale datasets meticulously curated and annotated for training. However, concerns persist regarding the transparency and context of data collection methodologies, especially when sourced through crowdsourcing platforms. Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators, leading to algorithms that fail to represent diverse views and perpetuate biases against certain groups. To address these limitations, we propose a methodology involving a co-design model that actively engages stakeholders at key stages, integrating principles of Equity, Diversity, and Inclusion (EDI) to ensure diverse viewpoints. We apply this methodology to develop a dataset and AI model for evaluating public space quality using street view images, demonstrating its effectiveness in capturing diverse perspectives and fostering higher-quality data.

3. An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions

Type: Workshop Presentation | Venue: AAAI'25 Workshop on AI to Accelerate Science and Engineering | Link | NotebookLM Link (This NotebookLM recap is decent!)

Abstract: The Piping and Instrumentation Diagrams (P&IDs) are foundational to the design, construction, and operation of workflows in the engineering and process industries. However, their manual creation is often labor-intensive, error-prone, and lacks robust mechanisms for error detection and correction. While recent advancements in Generative AI, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), have demonstrated significant potential across various domains, their application in automating generation of engineering workflows remains underexplored. In this work, we introduce a novel copilot for automating the generation of P&IDs from natural language descriptions. Leveraging a multi-step agentic workflow, our copilot provides a structured and iterative approach to diagram creation directly from Natural Language prompts. We demonstrate the feasibility of the generation process by evaluating the soundness and completeness of the workflow, and show improved results compared to vanilla zero-shot and few-shot generation approaches.

4. From Efficiency to Equity: Measuring Fairness in Preference Learning

Type: Conference Presentation | Venue: AAAI/ACM AI Ethics and Society'25 | Link | NotebookLM Link (This NotebookLM recap is decent!)

Abstract: As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they are able to fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach using two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.

More information on my Google Scholar profile!