Ibrahim Al Azher - PhD Candidate
Biography
I am a PhD student in Computer Science at Northern Illinois University (Aug 2023–present) and a Research Assistant in the Data Lab. My research centers on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) for scientific document processing—including automated extraction and generation of limitations and future work, evaluation with hybrid (NLP + LLM-as-Judge) metrics, and multi-agent pipelines.
I’ve presented posters and talks across the Midwest and published in venues such as EMNLP, JCDL, IEEE BigData, DSAA, and workshops at ICSE and ACL. Recent projects include benchmarking limitation extraction/generation (BAGELS), dual-stage RAG with LLM re-ranking for limitation generation (LimAgents), and RAG-based future-work generation (FutureGen).
Publications
- I. A. Azher, M. J. Mokarrama, Z. Guo, S. R. Choudhury, H. Alhoori. “BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text.” The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. We proposed BAGELS, a benchmark dataset and framework for the automated extraction and generation of limitations from scholarly texts. We introduced new evaluation metrics to assess the coverage of LLM-generated limitations against ground-truth limitations. 📄 PDF
- I. A. Azher, V. R. Seethi, A. P. Akella, H. Alhoori. “LimTopic: LLM-based Topic Modeling and Text Summarization for Analyzing Scientific Articles’ Limitations.” 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL), December 2024, Hong Kong. We extracted limitations from research articles and applied LLM-based topic modeling integrated with BERTopic to generate topic titles and topic sentences, with LLM summarization producing concise, generalizable “Topic Summaries.” The method combines topic modeling, prompt engineering, and LLM fine-tuning for better comprehension of scientific limitations, outperforming zero-shot LLMs, fine-tuned LLMs, and topic modeling alone. 📄 PDF
- I. A. Azher, H. Alhoori. “Generating Suggestive Limitations from Research Articles Using LLM and Graph-Based Approach.” 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL), December 2024, Hong Kong. We proposed a graph-augmented LLM approach to generate limitations by integrating Retrieval-Augmented Generation (RAG) with citation and semantic graphs. This framework surfaces context-aware limitations beyond simple extraction. 📄 PDF
- I. A. Azher, M. J. Mokarrama, Z. Guo, S. R. Choudhury, H. Alhoori. “FutureGen: A RAG-based Approach to Generate the Future Work of Scientific Articles.” International Workshop on AI for Scientific Communication (AI4SC), IEEE eScience 2025, Chicago. We developed FutureGen, an LLM-based RAG framework with self-feedback for generating high-quality future work suggestions from research articles. 📄 PDF
- I. A. Azher, H. Alhoori. “Mitigating Visual Limitations of Research Papers.” 2024 IEEE International Conference on Big Data (IEEE BigData), December 2024, Washington, D.C. We focused on visual limitations in scholarly papers (charts, graphs, diagrams), using multimodal LLMs (Qwen, LLaVA, LLaMA, GPT-4) to generate clear descriptions. Evaluation employed LLM feedback and LLM-as-a-judge. 📄 PDF
- I. A. Azher, H. Alhoori. “Generating Suggestive Limitations from Research Articles Using LLM and Graph-Based Approach.” 11th IEEE International Conference on Data Science and Advanced Analytics (DSAA), October 2024, San Diego, CA. We built a knowledge graph from citations and semantic relations, then combined it with LLMs to recommend paper limitations, incorporating subgraphs into the generative process.
- I. A. Azher, S. Ahmed, M. S. Islam. “Identifying Author in Bengali Literature by Bi-LSTM with Attention Mechanism.” 24th International Conference on Computer and Information Technology (ICCIT), December 2021. We applied a Bi-LSTM with an attention mechanism to predict authorship in Bengali literature, achieving strong accuracy in literary style classification. 📄 PDF
- T. Azad, I. A. Azher, S. R. Choudhury, H. Alhoori. “Predicting Scholarly Impact with Retrieval-Augmented LLMs.” Workshop on Scholarly Document Processing (SDProc), Association for Computational Linguistics (ACL), 2025, Vienna, Austria. We introduced a retrieval-augmented LLM pipeline for predicting scholarly impact, integrating metadata, citation, and content signals. 📄 PDF
- M. Shahzad, J. Wilson, I. A. Azher, H. Alhoori, M. Rahimi. “From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks.” 2nd International Workshop on Large Language Models for Code (LLM4Code), ICSE 2025. We explored code generation with LLMs for security frameworks (CAPEC, CWE), bridging theoretical vulnerability descriptions and practical implementation. 📄 PDF
- C. Muasher-Kerwin, M. C. Hughes, M. Foster, I. A. Azher, H. Alhoori. “Exploring Large Language Models for Summarizing and Interpreting an Online Brain Tumor Support Forum.” Sage Journal. We investigated LLMs for summarizing and interpreting patient discourse in online medical forums, providing insights for healthcare support communities. 📄 PDF
- M. S. R. Chowdhury, N. H. Khan, D. Singha, I. A. Azher, T. Ahmed, G. P. Shashi. “Leveraging Self-Sovereign Identity (SSI) with Hyperledger Indy: A Decentralized Identity Ecosystem for Secure Document Management in Bangladesh.” 27th International Conference on Computer and Information Technology (ICCIT), December 2024, Cox’s Bazar, Bangladesh. We proposed a decentralized SSI ecosystem using Hyperledger Indy for secure national-scale document management. 📄 PDF
- H. Verma, M. J. Mokarrama, I. A. Azher, H. Alhoori. “A Comparative Study of ORKG and LLM Identified Research Contributions.” 2nd Workshop on Innovation Measurement for Scientific Communication (IMSC), JCDL 2024, Hong Kong. We compared ORKG-based and LLM-based identification of research contributions, showing how LLMs can enhance structured research knowledge curation.
- I. A. Azher. “Limitations of Scientific Articles and Navigated Future Directions with LLM and RAG.” Master’s Thesis, Northern Illinois University, May 2025. A comprehensive thesis on automated limitation extraction and future work generation using topic modeling, RAG, and multi-agent LLM pipelines. 📄 PDF