International Journal of Emerging Research in Science, Engineering, and Management
Vol. 2, Issue 4, pp. 01-07, April 2026.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Enhanced Online Recruitment Fraud Detection Using Transformer-Based Deep Learning Model
R G Kumar
C Sandhya
N Vishnu Vardhan
M Varsha
K Sandeep
1 Professor, Department of CSE, Siddharth Institute of Engineering & Technology, Puttur, AP, India.
2-5 UG Scholars, Department of CSE, Siddharth Institute of Engineering & Technology, Puttur, AP, India.
Abstract: Online recruitment platforms have transformed the job search process by improving connectivity between employers and job seekers. However, this rapid growth has also made such platforms more vulnerable to fraudulent job postings designed to deceive applicants through misleading offers, unrealistic benefits, and unauthorized payment requests. Recruitment fraud has emerged as a serious cybersecurity and social engineering concern, particularly affecting students, recent graduates, and unemployed individuals. Traditional detection approaches, whether rule-based filters or classical machine learning models such as Logistic Regression, Support Vector Machines, and Random Forests, adapt poorly to evolving fraud patterns and depend heavily on manually engineered features. They often fail to capture contextual and semantic relationships within unstructured job descriptions, which reduces detection accuracy and generalization. To overcome these limitations, a Transformer-based deep learning architecture using DistilBERT is employed to automatically classify job postings as genuine or fraudulent. The methodology is an end-to-end pipeline comprising text cleaning, normalization, tokenization, contextual feature extraction, and transformer-based classification using self-attention mechanisms. The model captures long-range dependencies and linguistic indicators commonly associated with fraudulent postings, such as urgency-driven language, vague job descriptions, and payment-related requests. Experimental evaluation on a publicly available dataset of 17,880 job postings demonstrates strong discriminative performance under class-imbalanced conditions, with high accuracy, precision, recall, and AUC-ROC scores. High precision means few legitimate postings are flagged as false alarms, while the strong AUC-ROC indicates reliable classification across different decision thresholds.
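The relationship between precision, recall, and AUC-ROC under class imbalance can be illustrated with a small, self-contained sketch. The labels and scores below are toy values invented for illustration only; they are not results from this work, and the helper names (precision_recall, auc_roc, apply_threshold) are hypothetical. AUC-ROC is computed here through its pairwise-ranking interpretation, which is one standard way to obtain it and is not necessarily how the evaluation in this work was implemented.

```python
# Illustrative sketch only: toy labels and scores, not results from the paper.
# Label 1 = fraudulent posting, label 0 = genuine posting.


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the fraudulent class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen fraudulent posting
    receives a higher score than a randomly chosen genuine one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def apply_threshold(scores, cutoff):
    """Turn fraud scores into hard 0/1 predictions at a given cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]


# Toy imbalanced sample: 2 fraudulent postings among 10.
Y_TRUE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
SCORES = [0.92, 0.40, 0.55, 0.10, 0.05, 0.20, 0.15, 0.08, 0.30, 0.12]

if __name__ == "__main__":
    print("AUC-ROC:", auc_roc(Y_TRUE, SCORES))
    for cutoff in (0.35, 0.50, 0.60):
        p, r = precision_recall(Y_TRUE, apply_threshold(SCORES, cutoff))
        print(f"cutoff={cutoff:.2f} precision={p:.2f} recall={r:.2f}")
```

In this toy sample, raising the cutoff removes the single false alarm at the cost of missing one fraudulent posting, whereas the AUC-ROC value does not depend on any particular cutoff. This is why the two measures are reported together.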
Keywords: Recruitment fraud detection, Natural Language Processing, Transformer-based deep learning, DistilBERT, Job posting classification.
