Browsing by Author "Kumar, Dhruv"

AggFirstJoin: Optimizing Geo-Distributed Joins using Aggregation-Based Transformations
(IEEE, 2023-07) Kumar, Dhruv
Today, data is generated in a geographically distributed manner in a wide variety of domains such as social networks, e-commerce, search engines, online advertisements, audio and video streaming, energy, smart cities, and IoT sensors. Consequently, this data is stored across geographically distributed edges and data centers (DCs) near the end-users and end-devices, the very sources of this data. Analyzing this geographically distributed data is challenging primarily for two reasons: 1) constrained and costly WAN bandwidth links which connect the geo-distributed edges and DCs (henceforth collectively called sites) [1], and 2) limited compute availability at each site
AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics
(IEEE, 2021) Kumar, Dhruv
Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit and destination DCs. We identify a rich set of research questions aimed at reducing the traffic costs in an aggregation network. We present an optimization formulation for solving these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach is able to achieve 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
AI as a Medical Ally: Evaluating ChatGPT's Usage and Impact in Indian Healthcare
(2024-01) Kumar, Dhruv
This study investigates the integration and impact of Large Language Models (LLMs), like ChatGPT, in India's healthcare sector. Our research employs a dual approach, engaging both general users and medical professionals through surveys and interviews respectively. Our findings reveal that healthcare professionals value ChatGPT in medical education and preliminary clinical settings, but exercise caution due to concerns about reliability, privacy, and the need for cross-verification with medical references. General users show a preference for AI interactions in healthcare, but concerns regarding accuracy and trust persist. The study underscores the need for these technologies to complement, not replace, human medical expertise, highlighting the importance of developing LLMs in collaboration with healthcare providers. This paper enhances the understanding of LLMs in healthcare, detailing current usage, user trust, and improvement areas. Our insights inform future research and development, underscoring the need for ethically compliant, user-focused LLM advancements that address healthcare-specific challenges.
Analyzing LLM usage in an advanced computing class in India
(ACM Digital Library, 2025-04) Kumar, Dhruv
This study examines the use of large language models (LLMs) by undergraduate and graduate students for programming assignments in advanced computing classes. Unlike existing research, which primarily focuses on introductory classes and lacks in-depth analysis of actual student-LLM interactions, our work fills this gap. We conducted a comprehensive analysis involving 411 students from a Distributed Systems class at an Indian university, where they completed three programming assignments and shared their experiences through Google Form surveys and interviews. Our findings reveal that students leveraged LLMs for a variety of tasks, including code generation, debugging, conceptual inquiries, and test case creation. They employed a spectrum of prompting strategies, ranging from basic contextual prompts to advanced techniques like chain-of-thought prompting and iterative refinement. While students generally viewed LLMs as beneficial for enhancing productivity and learning, we noted a concerning trend of over-reliance, with many students submitting entire assignment descriptions to obtain complete solutions. Given the increasing use of LLMs in the software industry, our study highlights the need to update undergraduate curricula to include training on effective prompting strategies and to raise awareness about the benefits and potential drawbacks of LLM usage in academic settings.
Automated type annotation in Python using large language models
(2025-08) Kumar, Dhruv
Type annotations in Python enhance maintainability and error detection. However, generating these annotations manually is error-prone and requires extra effort. Traditional automation approaches like static analysis, machine learning, and deep learning struggle with limited type vocabularies, behavioral over-approximation, and reliance on large labeled datasets. In this work, we explore the use of LLMs for generating type annotations in Python. We develop a generate-check-repair pipeline: the LLM proposes annotations guided by a Concrete Syntax Tree representation, a static type checker (Mypy) verifies them, and any errors are fed back for iterative refinement. We evaluate four LLM variants: GPT-4o-mini and GPT-4.1-mini (general-purpose), and o3-mini and o4-mini (reasoning-optimized), on 6000 code snippets from the ManyTypes4Py benchmark. We first measure the proportion of code snippets annotated by LLMs for which Mypy reported no errors (i.e., consistent results): GPT-4o-mini achieved consistency on 65.9% of cases (34.1% inconsistent), while GPT-4.1-mini, o3-mini, and o4-mini each reached approximately 88.6% consistency (around 11.4% failures). To measure annotation quality, we then compute exact-match and base-type match accuracies over all 6000 snippets: GPT-4.1-mini and o3-mini perform the best, achieving up to 70.5% exact-match and 79.1% base-type accuracy, requiring under one repair iteration on average. Our results demonstrate that general-purpose and reasoning-optimized LLMs, without any task-specific fine-tuning or additional training, can be effective in generating consistent type annotations. They perform competitively with traditional deep learning techniques, which require large labeled datasets for training. While our work focuses on Python, the pipeline can be extended to other optionally typed imperative languages like Ruby
Can ChatGPT Play the Role of a Teaching Assistant in an Introductory Programming Course?
(2024-01) Kumar, Dhruv
The emergence of large language models (LLMs) is expected to have a major impact on education. This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an Introductory Programming Course. We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs in some of the important TA functions. The TA functions we focus on include (1) grading student code submissions, and (2) providing feedback to undergraduate students in an introductory programming course. First, we assess ChatGPT's proficiency in grading student code submissions using a given grading rubric and compare its performance with the grades assigned by human TAs. Second, we analyze the quality and relevance of the feedback provided by ChatGPT. This evaluation considers how well ChatGPT addresses mistakes and offers suggestions for improvement in student solutions from both code correctness and code quality perspectives. We conclude with a discussion on the implications of integrating ChatGPT into computing education for automated grading, personalized learning experiences, and instructional support.
ChatGPT in the Classroom: An Analysis of Its Strengths and Weaknesses for Solving Undergraduate Computer Science Questions
(ACM Digital Library, 2024) Kumar, Dhruv
This research paper aims to analyze the strengths and weaknesses associated with the utilization of ChatGPT as an educational tool in the context of undergraduate computer science education. ChatGPT's usage in tasks such as solving assignments and exams has the potential to undermine students' learning outcomes and compromise academic integrity. This study adopts a quantitative approach to demonstrate the notable unreliability of ChatGPT in providing accurate answers to a wide range of questions within the field of undergraduate computer science. While the majority of existing research has concentrated on assessing the performance of Large Language Models in handling programming assignments, our study adopts a more comprehensive approach. Specifically, we evaluate various types of questions such as true/false, multi-choice, multi-select, short answer, long answer, design-based, and coding-related questions. Our evaluation highlights the potential consequences of students excessively relying on ChatGPT for the completion of assignments and exams, including self-sabotage. We conclude with a discussion on how students and instructors can constructively use ChatGPT and related tools to enhance the quality of instruction and the overall student experience.
A Comparative Analysis of Large Language Models for Code Documentation Generation
(2023-12) Kumar, Dhruv
This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for generation of code documentation. Code documentation is an essential part of the software writing process. The paper evaluates models such as GPT-3.5, GPT-4, Bard, Llama 2, and StarChat on various parameters like Accuracy, Completeness, Relevance, Understandability, Readability and Time Taken for different levels of code documentation. Our evaluation employs a checklist-based system to minimize subjectivity, providing a more objective assessment. We find that, barring StarChat, all LLMs consistently outperform the original documentation. Notably, closed-source models GPT-3.5, GPT-4, and Bard exhibit superior performance across various parameters compared to open-source/source-available LLMs, namely Llama 2 and StarChat. Considering the time taken for generation, GPT-4 demonstrated the longest duration, followed by Llama 2 and Bard, with ChatGPT and StarChat having comparable generation times. Additionally, file-level documentation had a considerably worse performance across all parameters (except for time taken) as compared to inline and function-level documentation.
Comuniqa: Exploring Large Language Models for improving speaking skills
(2024-05) Kumar, Dhruv
In this paper, we investigate the potential of Large Language Models (LLMs) to improve English speaking skills. This is particularly relevant in countries like India, where English is crucial for academic, professional, and personal communication but remains a non-native language for many. Traditional methods for enhancing speaking skills often rely on human experts, which can be limited in terms of scalability, accessibility, and affordability. Recent advancements in Artificial Intelligence (AI) offer promising solutions to overcome these limitations. We propose Comuniqa, a novel LLM-based system designed to enhance English speaking skills. We adopt a human-centric evaluation approach, comparing Comuniqa with the feedback and instructions provided by human experts. In our evaluation, we divide the participants into three groups: those who use the LLM-based system for improving speaking skills, those guided by human experts for the same task, and those who utilize both the LLM-based system and the human experts. Using surveys, interviews, and actual study sessions, we provide a detailed perspective on the effectiveness of different learning modalities. Our preliminary findings suggest that while LLM-based systems have commendable accuracy, they lack human-level cognitive capabilities, both in terms of accuracy and empathy. Nevertheless, Comuniqa represents a significant step towards achieving Sustainable Development Goal 4: Quality Education by providing a valuable learning tool for individuals who may not have access to human experts for improving their speaking skills.
DebateBench: a challenging long context reasoning benchmark for large language models
(2025-02) Kumar, Dhruv
We introduce DebateBench, a novel dataset consisting of an extensive collection of transcripts and metadata from some of the world's most prestigious competitive debates. The dataset consists of British Parliamentary debates from prestigious debating tournaments on diverse topics, annotated with detailed speech-level scores and house rankings sourced from official adjudication data. We curate 256 speeches across 32 debates; each debate is over 1 hour long, and each input averages 32,000 tokens. Designed to capture long-context, large-scale reasoning tasks, DebateBench provides a benchmark for evaluating modern large language models (LLMs) on their ability to engage in argumentation, deliberation, and alignment with human experts. To do well on DebateBench, the LLMs must perform in-context learning to understand the rules and evaluation criteria of the debates, then analyze eight seven-minute speeches and reason about the arguments presented by all speakers to give the final results. Our preliminary evaluation using GPT o1, GPT-4o, and Claude Haiku shows that LLMs struggle to perform well on DebateBench, highlighting the need to develop more sophisticated techniques for improving their performance.
From Cash to Cashless: UPI’s Impact on Spending Behavior among Indian Users
(2024-01) Kumar, Dhruv
The emergence of digital payment systems has transformed how individuals conduct financial transactions, offering convenience, security, and efficiency. One groundbreaking innovation making waves in the Indian financial landscape is the Unified Payments Interface (UPI), developed by the National Payments Corporation of India (NPCI). Existing work has explored how digital payments benefit a country’s economy and GDP. However, our study explores how the introduction of UPI has influenced spending behavior among Indian users on an “individual” level. We gathered 235 valid survey responses encompassing diverse demographics and conducted semi-structured interviews with 20 survey respondents. Approximately 75% of the survey respondents reported increased spending due to UPI, with only 7% indicating reduced spending. Significantly, 91.5% of the respondents reported satisfaction with their UPI usage. Also, 95.2% of the survey respondents found making payments via UPI convenient. Our research also provides suggestions for UPI applications and various stakeholders to enhance digital payment systems, enabling users to make informed decisions and fostering responsible financial management.
From Cash to Cashless: UPI's Impact on Spending Behavior Among Indian Users and Prototyping Financially Responsible Interfaces
(2024) Kumar, Dhruv
Unified Payments Interface (UPI) is a groundbreaking innovation making waves in digital payment systems in India. It has revolutionised financial transactions by offering enhanced convenience and security. While previous research has primarily focused on the macroeconomic effects of digital payments, our study examines UPI's impact on individual spending behavior. Through a survey of 276 respondents and 20 follow-up interviews, we found that approximately 75% of participants reported increased spending due to UPI. Many attributed this to UPI's intangible nature, which reduced feelings of guilt typically associated with spending. Additionally, participants provided suggestions to improve the user experience of existing UPI applications. Utilizing this feedback, we developed a high-fidelity prototype based on a popular UPI app in India and conducted usability testing with 34 participants. The insights gathered from this testing shaped the final prototype and its features. This study offers valuable design recommendations for UPI app developers and other stakeholders.
HACCS: Heterogeneity-Aware Clustered Client Selection for Accelerated Federated Learning
(IEEE, 2022) Kumar, Dhruv
Federated Learning is a machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices. While this technique avoids the cost of transferring data to a central location and achieves a strong degree of privacy, it presents additional challenges due to the heterogeneous hardware resources available for training. Furthermore, data is not independent and identically distributed (IID) across all edge devices, resulting in statistical heterogeneity across devices. Due to these constraints, client selection strategies play an important role for timely convergence during model training. Existing strategies ensure that each individual device is included, at least periodically, in the training process. In this work, we propose HACCS, a Heterogeneity-Aware Clustered Client Selection system that identifies and exploits the statistical heterogeneity by representing all distinguishable data distributions instead of individual devices in the training process. HACCS is robust to individual device dropout, provided other devices in the system have similar data distributions. We propose privacy-preserving methods for estimating these client distributions and clustering them. We also propose strategies for leveraging these clusters to make scheduling decisions in a federated learning system. Our evaluation on real-world datasets suggests that our framework can provide an 18% to 38% reduction in time to convergence compared to the state of the art without any compromise in accuracy.
The Impact of Large Language Models on K-12 Education in Rural India: A Thematic Analysis of Student Volunteer's Perspectives
(2025-05) Kumar, Dhruv; Challa, Jagat Sesh; Ramachandran, Veena
AI-driven education, particularly Large Language Models (LLMs), has the potential to address learning disparities in rural K-12 schools. However, research on AI adoption in rural India remains limited, with existing studies focusing primarily on urban settings. This study examines the perceptions of volunteer teachers on AI integration in rural education, identifying key challenges and opportunities. Through semi-structured interviews with 23 volunteer educators in Rajasthan and Delhi, we conducted a thematic analysis to explore infrastructure constraints, teacher preparedness, and digital literacy gaps. Findings indicate that while LLMs could enhance personalized learning and reduce teacher workload, barriers such as poor connectivity, lack of AI training, and parental skepticism hinder adoption. Despite concerns over over-reliance and ethical risks, volunteers emphasize that AI should be seen as a complementary tool rather than a replacement for traditional teaching. Given the potential benefits, LLM-based tutors merit further exploration in rural classrooms, with structured implementation and localized adaptations to ensure accessibility and equity.
Investigating pedagogical teacher and student LLM agents: genetic adaptation meets retrieval augmented generation across learning style
(2025-05) Kumar, Dhruv
Effective teaching requires adapting instructional strategies to accommodate the diverse cognitive and behavioral profiles of students, a persistent challenge in education and teacher training. While Large Language Models (LLMs) offer promise as tools to simulate such complex pedagogical environments, current simulation frameworks are limited in two key respects: (1) they often reduce students to static knowledge profiles, and (2) they lack adaptive mechanisms for modeling teachers who evolve their strategies in response to student feedback. To address these gaps, we introduce a novel simulation framework that integrates LLM-based heterogeneous student agents with a self-optimizing teacher agent. The teacher agent's pedagogical policy is dynamically evolved using a genetic algorithm, allowing it to discover and refine effective teaching strategies based on the aggregate performance of diverse learners. In addition, we propose Persona-RAG, a Retrieval Augmented Generation module that enables student agents to retrieve knowledge tailored to their individual learning styles. Persona-RAG preserves the retrieval accuracy of standard RAG baselines while enhancing personalization, an essential factor in modeling realistic educational scenarios. Through extensive experiments, we demonstrate how our framework supports the emergence of distinct and interpretable teaching patterns when interacting with varied student populations. Our results highlight the potential of LLM-driven simulations to inform adaptive teaching practices and provide a testbed for training human educators in controlled, data-driven environments.
“It's not like Jarvis, but it's pretty close!” - Examining ChatGPT's Usage among Undergraduate Students in Computer Science
(ACM Digital Library, 2024-01) Kumar, Dhruv
Large language models (LLMs) such as ChatGPT and Google Bard have garnered significant attention in the academic community. Previous research has evaluated these LLMs for various applications such as generating programming exercises and solutions. However, these evaluations have predominantly been conducted by instructors and researchers, not considering the actual usage of LLMs by students. This study adopts a student-first approach to comprehensively understand how undergraduate computer science students utilize ChatGPT, a popular LLM, released by OpenAI. We employ a combination of student surveys and interviews to obtain valuable insights into the benefits, challenges, and suggested improvements related to ChatGPT. Our findings suggest that a majority of students (over 57%) have a convincingly positive outlook towards adopting ChatGPT as an aid in coursework-related tasks. However, our research also highlights various challenges that must be resolved for long-term acceptance of ChatGPT amongst students. The findings from this investigation have broader implications and may be applicable to other LLMs and their role in computing education.
Loop invariant generation: a hybrid framework of reasoning optimised LLMs and SMT solvers
(2025-08) Kumar, Dhruv
Loop invariants are essential for proving the correctness of programs with loops. Developing loop invariants is challenging, and fully automatic synthesis cannot be guaranteed for arbitrary programs. Some approaches have been proposed to synthesize loop invariants using symbolic techniques and, more recently, using neural approaches. These approaches are able to correctly synthesize loop invariants only for subsets of standard benchmarks. In this work, we investigate whether modern, reasoning-optimized large language models can do better. We integrate OpenAI's O1, O1-mini, and O3-mini into a tightly coupled generate-and-check pipeline with the Z3 SMT solver, using solver counterexamples to iteratively guide invariant refinement. We use the Code2Inv benchmark, which provides C programs along with their formal preconditions and postconditions. On this benchmark of 133 tasks, our framework achieves 100% coverage (133 out of 133), outperforming the previous best of 107 out of 133, while requiring only 1-2 model proposals per instance and 14-55 seconds of wall-clock time. These results demonstrate that LLMs possess latent logical reasoning capabilities which can help automate loop invariant synthesis. While our experiments target C-specific programs, this approach should be generalizable to other imperative languages.
OpineBot: Class Feedback Reimagined Using a Conversational LLM
(2024-01) Kumar, Dhruv
Conventional class feedback systems often fall short, relying on static, unengaging surveys offering little incentive for student participation. To address this, we present OpineBot, a novel system employing large language models (LLMs) to conduct personalized, conversational class feedback via a chatbot interface. We assessed OpineBot's effectiveness in a user study with 20 students from an Indian university's Operating-Systems class, utilizing surveys and interviews to analyze their experiences. Findings revealed a resounding preference for OpineBot compared to conventional methods, highlighting its ability to engage students, produce deeper feedback, and offer a dynamic survey experience. This research represents a work in progress, providing early results and marking a significant step towards revolutionizing class feedback through LLM-based technology, promoting student engagement, and leading to richer data for instructors.
ReviewEval: an evaluation framework for AI-generated reviews
(2025) Kumar, Dhruv
The escalating volume of academic research, coupled with a shortage of qualified reviewers, necessitates innovative approaches to peer review. While large language models (LLMs) offer potential for automating this process, their current limitations include superficial critiques, hallucinations, and a lack of actionable insights. This research addresses these challenges by introducing a comprehensive evaluation framework for AI-generated reviews that measures alignment with human evaluations, verifies factual accuracy, assesses analytical depth, and identifies actionable insights. We also propose a novel alignment mechanism that tailors LLM-generated reviews to the unique evaluation priorities of individual conferences and journals. To enhance the quality of these reviews, we introduce a self-refinement loop that iteratively optimizes the LLM's review prompts. Our framework establishes standardized metrics for evaluating AI-based review systems, thereby bolstering the reliability of AI-generated reviews in academic research.
The role of generative AI tools in shaping mechanical engineering education from an undergraduate perspective
(Springer Nature, 2025-03) Challa, Jagat Sesh; Kumar, Dhruv
This study evaluates the effectiveness of three leading generative AI tools (ChatGPT, Gemini, and Copilot) in undergraduate mechanical engineering education using a mixed-methods approach. The performance of these tools was assessed on 800 questions spanning seven core subjects, covering multiple-choice, numerical, and theory-based formats. While all three AI tools demonstrated strong performance in theory-based questions, they struggled with numerical problem-solving, particularly in areas requiring deep conceptual understanding and complex calculations. Among them, Copilot achieved the highest accuracy (60.38%), followed by Gemini (57.13%) and ChatGPT (46.63%). To complement these findings, a survey of 172 students and interviews with 20 participants provided insights into user experiences, challenges, and perceptions of AI in academic settings. Thematic analysis revealed concerns regarding AI’s reliability in numerical tasks and its potential impact on students’ problem-solving abilities. Based on these results, this study offers strategic recommendations for integrating AI into mechanical engineering curricula, ensuring its responsible use to enhance learning without fostering dependency. Additionally, we propose instructional strategies to help educators adapt assessment methods in the era of AI-assisted learning. These findings contribute to the broader discussion on AI’s role in engineering education and its implications for future learning methodologies.