Türkçe istenmeyen e-postaların derin öğrenme ile tespit edilmesi

Eryılmaz, Ersin Enes

dc.contributor.advisor	Kılıç, Erdal
dc.contributor.author	Eryılmaz, Ersin Enes
dc.date.accessioned	2022-11-15T06:25:24Z
dc.date.available	2022-11-15T06:25:24Z
dc.date.issued	2021	en_US
dc.date.submitted	2021
dc.identifier.citation	Eryılmaz, E.E. (2021). Türkçe istenmeyen e-postaların derin öğrenme ile tespit edilmesi. (Yüksek lisans tezi). Ondokuz Mayıs Üniversitesi, Samsun.	en_US
dc.identifier.uri	http://libra.omu.edu.tr/tezler/135964.pdf
dc.identifier.uri	https://hdl.handle.net/20.500.12712/33752
dc.description	Full Text / Thesis	en_US
dc.description.abstract	E-mail is one of today's most effective communication tools. Alongside legitimate messages, e-mail traffic also carries spam: unsolicited junk messages that cause serious material and non-material harm to internet users and consume internet traffic. Many methods for detecting spam exist, but current solutions often lag behind the innovation and techniques of spammers. This thesis reviews the spam detection methods found in the literature and proposes 6 different models for detecting Turkish spam e-mail. Four deep learning models (RNN, LSTM, GRU and BLSTM) were developed in the Python programming language with the Keras library, using the Spyder development environment. Two further deep learning models, BERT and DistilBERT, were developed and tested in the web-based Google Colaboratory environment, which was also used for hyperparameter fine-tuning and selection of the best hyperparameters; the TensorFlow-based Keras library was used there as well. The proposed models were developed on a data set of 800 Turkish e-mails, 400 spam and 400 legitimate. Under 5-fold cross-validation, BLSTM had the lowest test loss (0.0373), and LSTM and BLSTM both reached a 99.38% success rate in spam detection. The fine-tuned BERT model reached a 98.75% success rate. Hyperparameter fine-tuning for the RNN model was carried out with the Grid Search estimator and yielded a success rate of 97.66%. In addition, a new Turkish e-mail data set containing 350 e-mails was created within the scope of the thesis. In future studies, it is planned to increase the size of this data set and run further experiments with deep learning models.	en_US
dc.language.iso	tur	en_US
dc.publisher	Ondokuz Mayıs Üniversitesi Lisansüstü Eğitim Enstitüsü	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	deep learning	en_US
dc.subject	machine learning	en_US
dc.subject	RNN	en_US
dc.subject	LSTM	en_US
dc.subject	GRU	en_US
dc.subject	BLSTM	en_US
dc.subject	BERT	en_US
dc.subject	DistilBERT	en_US
dc.subject	Keras	en_US
dc.subject	Google Colaboratory	en_US
dc.subject	spam detection	en_US
dc.subject	hyperparameter fine-tuning	en_US
dc.title	Türkçe istenmeyen e-postaların derin öğrenme ile tespit edilmesi	en_US
dc.title.alternative	Detection of Turkish spam email by deep learning	en_US
dc.type	masterThesis	en_US
dc.contributor.department	OMÜ, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı	en_US
dc.contributor.authorID	0000-0003-1163-970X	en_US
dc.contributor.authorID	0000-0003-1585-0991	en_US
dc.relation.publicationcategory	Thesis	en_US
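
The abstract reports results under 5-fold cross-validation on a balanced set of 800 e-mails (400 spam, 400 legitimate). The following is a minimal sketch of how such a split can be built, not the thesis's actual code: the record does not say whether the folds were stratified, so class-balanced folds are an assumption here, and the function name `stratified_k_fold`, the label strings, and the seed are hypothetical.

```python
import random
from collections import Counter


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions equal in every fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Shuffle each class separately, then deal its indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Placeholder labels with the class counts given in the abstract (assumed ordering).
labels = ["spam"] * 400 + ["legitimate"] * 400
splits = list(stratified_k_fold(labels, k=5))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    counts = Counter(labels[i] for i in test_idx)
    print(fold_no, len(train_idx), len(test_idx), dict(counts))
```

With these counts, every fold trains on 640 e-mails and tests on 160 (80 spam, 80 legitimate). A model would be trained and evaluated once per fold, and the per-fold metrics averaged.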

Files in this item

Name: 135964.pdf
Size: 1.765Mb
Format: PDF
Description: Full Text / Thesis


This item appears in the following Collection(s)

Master's Thesis Collection [26]
