About

I am currently working as a Lecturer in the Department of Computer Science and Engineering at Uttara University. I completed my Bachelor of Science (BSc) in Computer Science and Engineering from Chittagong University of Engineering and Technology (CUET) in 2024.

My research interests lie at the intersection of Natural Language Processing (NLP), Large Language Models (LLMs), Multimodal & Multilingual NLP, Machine Learning (ML), and Deep Learning (DL). I am also passionate about addressing the challenges of building Trustworthy AI and exploring responsible AI applications in healthcare to ensure safe, ethical, and effective systems. The internal architecture of LLMs, particularly attention mechanisms, fascinates me, and I am dedicated to optimizing these systems for meaningful real-world impact.

Beyond academia, I enjoy traveling while listening to music and consider myself a movie buff—always eager to discover new destinations or immerse myself in great films. I believe that exploring new places and stories not only enriches my personal life but also inspires creativity and fresh perspectives in my research.

But ultimately, my research is driven by a simple purpose:
"To help people."

Interests

Natural Language Processing (NLP)

Large Language Models (LLMs)

Multimodal AI

Multilingual NLP

Machine Learning & Deep Learning

Trustworthy AI

AI in Healthcare

Attention Mechanisms

Recent News

June 27, 2025

Two Papers Accepted and Presented at ITSS-IoE 2025, Springer

  • MedFastReason: LoRA-Adapted LLMs with Chain-of-Thought for Disease Detection, Diagnostic and Treatment Reasoning
  • Parameter Efficient Fine-Tuned Vision LLMs for Syntax Aware Mathematical Equation Image to LaTeX Conversion
June 27, 2025

Paper Published at SlavicNLP 2025: Detecting Persuasion Techniques

  • Paper on persuasion technique detection in Slavic languages accepted at the SlavicNLP 2025 workshop.
  • Proposed and evaluated a weighted soft voting ensemble approach for robust classification in four languages: Polish, Russian, Slovenian, and Bulgarian.
June 1, 2025

Two Papers Accepted at IEEE QPAIN 2025 Conference

  • Identification of Potential Biomarkers in Acute Myeloid Leukemia Using Integrated Bioinformatics and Two-step ML-based Feature Selection Approaches
  • Identification of Potential Genomic Biomarkers in Pancreatic Cancer, Colon Cancer, and Ulcerative Colitis Using Integrated Machine Learning and Bioinformatics Approaches
May 14, 2025

Secured 2nd Position in SlavicNLP 2025 Shared Task

  • Developed a weighted soft voting ensemble of transformer models for classification.
  • Implemented a weight optimization technique to maximize overall F1 score.
April 25, 2025

Published Two Papers at AmericasNLP 2025, Albuquerque, New Mexico

  • Paper 1: Leveraged large language models for machine translation between Spanish and 13 Indigenous languages, addressing low-resource challenges with advanced multilingual models and tailored preprocessing.
  • Paper 2: Developed an LLM-based system for sentence transformation in Indigenous languages (Bribri, Guarani, Maya), utilizing fine-tuning and prompt engineering to create educational tools.
March 20, 2025

AmericasNLP 2025: Top Ranks in Shared Tasks

  • 3rd place in Shared Task 1: Machine Translation for Indigenous languages, using advanced multilingual models and tailored preprocessing to address low-resource challenges.
    (Spanish ↔ Indigenous languages, including Awajun and Quechua)
  • 2nd place in Shared Task 2: Sentence Transformation for Indigenous languages, developing LLM-based systems to transform sentences according to diverse grammatical instructions.
    (Tasks included tense, aspect, and voice changes in Bribri, Guarani, and Maya)
January 1, 2025

Joined Uttara University as Lecturer

  • Appointed as a Lecturer in the Department of Computer Science and Engineering.
  • Started teaching and contributing to academic and research activities at Uttara University.
December 9, 2024

Completed IELTS with Overall Band Score 7

  • Section scores: Reading 7.5, Listening 7.0, Writing 6.5, Speaking 7.0
  • Overall band score: 7.0

Education

Online Certification

Generative AI Engineering

Transformer Models for NLP

Basic Python Programming

Hugging Face LLMs Fundamentals

Python Data Analysis

AI for Medical Diagnosis

Teaching & Research Experience

Uttara University

January 2025 - Present

Lecturer

  • Spring 2025: Taught Microprocessor & Assembly Language (theory and lab), Operating System (theory and lab), and Statistics and Queuing Theory.
  • Summer 2025: Will teach Automata Theory and Routing & Networking.

CUETMLResearchGroup

May 2022 - Jun 2024

Research Assistant & Mentorship

Under the supervision of Assistant Professor Hasan Murad

  • Learned and taught core concepts in Machine Learning and Deep Learning algorithms.
  • Explored various attention mechanisms and the vanilla Transformer architecture.
  • Gained hands-on experience in fine-tuning NLP models for research and practical applications.

Industry Experience

Spectrum Engineering Consortium (Pvt.) Ltd

September 2023 - October 2023

Software Engineer Intern

  • Developed an algorithm for vehicle scheduling.
  • Built a backend server with well-designed database schemas.
  • Implemented the UI using the MERN stack.

Prodigy InfoTech

August 2024 - September 2024

Machine Learning Intern

  • Completed an end-to-end weather temperature prediction application using XGBoost and Streamlit (see the sketch after this list).
  • Applied unsupervised learning techniques for clustering and pattern discovery.
  • Conducted comprehensive data analysis using Python and relevant libraries.
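
A minimal sketch of how such an XGBoost temperature regressor might be served through Streamlit; the dataset path, feature names, and hyperparameters below are illustrative assumptions, not the actual project code.

```python
import pandas as pd
import streamlit as st
from xgboost import XGBRegressor

@st.cache_resource
def train_model():
    # Hypothetical historical weather data with a "temperature" target column
    df = pd.read_csv("weather.csv")
    features = ["humidity", "pressure", "wind_speed"]   # assumed feature columns
    model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(df[features], df["temperature"])
    return model, features

model, features = train_model()

st.title("Weather Temperature Prediction")
inputs = [st.number_input(name, value=0.0) for name in features]
if st.button("Predict"):
    pred = model.predict(pd.DataFrame([inputs], columns=features))[0]
    st.write(f"Predicted temperature: {pred:.1f} °C")
```

Such a script would typically be launched with `streamlit run app.py`, with the trained model cached so it is not refit on every interaction.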

Projects


NarutoVerse

Explorehub

CloudTask Optimizer

FingerMatch

DualLingoDetect

DualLingoDetect: AI Text Detect

BanglaTranslation

Football Analyzer

GenreGenius: Book Classifier

Publications

Year 2025

Transformer-Enabled Reinforcement Learning Guided Gene Co-expression Analysis for Key Biomarker Identification in Esophageal Squamous Cell Carcinoma

Expected to be Published: Biomedical Signal Processing and Control

Authors: Sakib Sarker, Mahshar Yahan, Biprodip Pal, Mushahidul Islam

  • Developed TRFW, a novel transformer-based deep learning framework that uses self-attention mechanisms to identify important genes in high-dimensional data. Integrated Policy Optimization to reduce redundant features and improve gene selection. Applied the method to discover biomarkers in esophageal squamous cell carcinoma.
  • Preprint: View PDF

MedFastReason: LoRA-Adapted LLMs with Chain-of-Thought for Disease Detection, Diagnostic and Treatment Reasoning

Expected to be Published: ITSS-IoE 2025, Springer

Authors: Mahshar Yahan, Md. Tareq Zaman, Md. Torikur Rahman, Asif Mostofa Sazid, Nasim Ahmed, Md. Shafikul Islam

  • The paper introduces MedFastReason, an integrated medical AI framework combining a disease detection model (a fine-tuned, quantized Llama-3-8B with LoRA) and a reasoning model (Qwen-based Chain-of-Thought prompting). This two-stage system achieves high clinical accuracy (e.g., ROUGE-L 0.75, F1-token 0.85, guideline compliance 0.80) and efficiency, though error propagation between stages and challenges with ambiguous or incomplete inputs remain (a sketch of the LoRA setup follows this entry).
  • Preprint: View PDF
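
As a rough illustration of the general recipe (not the paper's exact configuration), a quantized Llama-3-8B checkpoint can be wrapped with LoRA adapters via Hugging Face transformers, bitsandbytes, and peft; the checkpoint name, adapter rank, target modules, and other hyperparameters below are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Meta-Llama-3-8B"        # assumed checkpoint

# 4-bit quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters on the attention projections (values are illustrative)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()               # only the LoRA weights are trainable
```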

Parameter Efficient Fine-Tuned Vision LLMs for Syntax Aware Mathematical Equation Image to LaTeX Conversion

Expected to be Published: ITSS-IoE 2025, Springer

Authors: Mahshar Yahan, Md. Tareq Zaman, Md. Torikur Rahman, Asif Mostofa Sazid, Nasim Ahmed, Md. Shafikul Islam

  • The pipeline uses parameter-efficient fine-tuning, syntax-aware augmentation, and iterative refinement for accurate mathematical equation conversion. The fine-tuned vision-language model Qwen2-VL-7B outperforms the other models in BLEU and exact match metrics.
  • Preprint: View PDF

Fine-Tuned Transformer-Based Weighted Soft Voting Ensemble for Persuasion Technique Classification in Slavic Languages

Authors: Mahshar Yahan, Sakib Sarker, Mohammad Amanul Islam

  • The paper introduces a weighted soft voting ensemble that combines the outputs of three fine-tuned transformer models for each language. The ensemble weights are optimized by searching for the set that maximizes the F1 score on the validation set, so each model's strengths are proportionally reflected in the final prediction, improving persuasion technique classification in low-resource Slavic languages (see the sketch after this entry).
  • Preprint: View PDF
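
A minimal sketch of what such weighted soft voting with validation-set weight search might look like; the grid step, macro-averaged F1, and function names here are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from itertools import product
from sklearn.metrics import f1_score

def weighted_soft_vote(prob_list, weights):
    """Blend per-model class probabilities with normalized weights and pick the argmax."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list)                 # (n_models, n_samples, n_classes)
    blended = np.tensordot(w, stacked, axes=1)    # (n_samples, n_classes)
    return blended.argmax(axis=1)

def search_weights(prob_list, y_val, step=0.1):
    """Grid-search ensemble weights that maximize F1 on the validation set."""
    best_w, best_f1 = None, -1.0
    grid = np.arange(step, 1.0 + 1e-9, step)
    for w in product(grid, repeat=len(prob_list)):
        preds = weighted_soft_vote(prob_list, w)
        score = f1_score(y_val, preds, average="macro")   # macro F1 is an assumption
        if score > best_f1:
            best_f1, best_w = score, np.asarray(w) / sum(w)
    return best_w, best_f1
```

Here each element of `prob_list` would be the (n_samples, n_classes) softmax output of one fine-tuned transformer on the validation split, and the selected weights would then be reused for test-time predictions.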

Harnessing NLP for Indigenous Language Education: Fine-Tuning Large Language Models for Sentence Transformation

Authors: Mahshar Yahan, Mohammad Amanul Islam

  • The paper explores using large language models (LLMs) for sentence transformation in Indigenous languages (Bribri, Guarani, Maya) to support language education. By fine-tuning models like Llama 3.2 and employing few-shot prompting, the study achieved strong BLEU and ChrF++ scores, especially for Maya. Results show that LLMs hold promise for low-resource language NLP, though challenges remain due to limited data and complex linguistic features.
  • Paper PDF Link: View PDF

Leveraging Large Language Models for Spanish-Indigenous Language Machine Translation at AmericasNLP 2025

Authors: Mahshar Yahan, Mohammad Amanul Islam

  • The paper presents a machine translation system for Spanish and 13 Indigenous American languages built on fine-tuned multilingual models (NLLB-200, LLaMA 3.1, XGLM). Techniques such as token adjustments and dynamic batching improved performance, especially for Awajun and Quechua (a baseline inference sketch follows this entry).
  • Paper PDF Link: View PDF
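
For context, a baseline (non-fine-tuned) NLLB-200 translation call in transformers looks roughly like the sketch below; the distilled checkpoint size, example sentence, and Ayacucho Quechua target code are assumptions, and the paper's fine-tuning, token adjustments, and dynamic batching are omitted.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"   # smaller NLLB-200 variant, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="spa_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "El clima es agradable hoy."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the target-language tag (Ayacucho Quechua here)
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("quy_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```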

Identification of Potential Biomarkers in Acute Myeloid Leukemia Using Integrated Bioinformatics and Two-step ML-based Feature Selection Approaches

Expected to be Published: QPAIN 2025, IEEE Xplore

Authors: Md Ibrahim Sarker Raiyan, Sakib Sarker, Emon Ahammed, Mahshar Yahan, Md. Tareq Zaman

  • This study employs integrated bioinformatics and two-step machine learning-based feature selection to identify potential biomarkers in Acute Myeloid Leukemia (AML). The approach combines computational biology techniques with advanced ML algorithms to discover novel therapeutic targets and diagnostic markers, contributing to precision medicine approaches for AML treatment and improving patient outcomes through enhanced biomarker identification methodologies.
  • Preprint: View PDF

Identification of Potential Genomic Biomarkers in Pancreatic Cancer, Colon Cancer, and Ulcerative Colitis Using Integrated Machine Learning and Bioinformatics Approaches

Expected to be Published: QPAIN 2025, IEEE Xplore

Authors: Md Ibrahim Sarker Raiyan, Sakib Sarker, Utsha Das, Emon Ahammed, Md. Tareq Zaman, Mahshar Yahan

  • This study employs integrated machine learning and bioinformatics approaches to identify shared genomic biomarkers across pancreatic cancer, colon cancer, and ulcerative colitis. Using LASSO regression for feature selection and protein-protein interaction network analysis, the research identified three hub genes—COL11A2, COL5A2, and COL11A1—as potential diagnostic biomarkers. These genes demonstrated high AUC scores in validation datasets and showed significant upregulation in tumor tissues, offering promising targets for early detection and therapeutic intervention.
  • Preprint: View PDF

Year 2024

Golden_Duck at #SMM4H 2024: A Transformer-based Approach to Social Media Text Classification

Authors: Md. Ayon Mia, Mahshar Yahan, Hasan Murad

  • The paper addressed social anxiety disorder identification and mental illness recognition. For anxiety disorder, Reddit posts about outdoor spaces were classified into four categories using RoBERTa-base (last four layers), achieving an F1 score of 0.596. For mental illness, tweets about child disorders were classified using RoBERTa-large (mean pooling), achieving an F1 score of 0.928.
  • Paper PDF Link: View PDF

Year 2023

EmptyMind at BLP-2023 Task 2: Sentiment Analysis of Bangla Social Media Posts using Transformer-Based Models

Authors: Karnis Fatema, Udoy Das, Md. Ayon Mia, Mahshar Yahan, Md. Sajidul Mowla, Md. Fayez Ullah, Arpita Sarkar, Hasan Murad

  • Explored sentiment analysis for Bangla, covering Positive, Negative, and Neutral categories in social media posts, using transformer-based models. Achieved the best performance with BanglaBERT (Large), obtaining a micro F1 score of 0.7109.
  • Paper PDF Link: View PDF

EmptyMind at BLP-2023 Task 1: A Transformer-based Hierarchical-BERT Model for Bangla Violence-Inciting Text Detection

Authors: Udoy Das, Karnis Fatema, Md. Ayon Mia, Mahshar Yahan, Md. Sajidul Mowla, Md. Fayez Ullah, Arpita Sarkar, Hasan Murad

  • Focused on classifying text into nonviolence, passive violence, and direct violence using the VITD dataset. Achieved an F1 score of 0.73797 with Hierarchical-BERT.
  • Paper PDF Link: View PDF

Skills

Languages and Databases

Python, C++, HTML5, CSS3, MySQL, PostgreSQL

Frameworks

Django, Streamlit, Bootstrap, TensorFlow, PyTorch, OpenCV, Gradio


Contact

My Address

K-8/11 West Bilashpur Bottola

Gazipur Sadar-1700

Dhaka, Bangladesh

Social Profiles

Email

yahanmahsar1@gmail.com

mahshar@uttara.ac.bd

u1804007@student.cuet.ac.bd

Phone

+8801735996049

+8801852776929