lucaslin2020/DATA7703

DATA7703

E-mail has greatly simplified everyday communication, but it has also brought problems with privacy and spam. Spam keeps getting worse: most people have received spam e-mails, and some have been deceived by them. Spam also consumes network resources, reduces productivity, and can lead to the disclosure of personal information, so it is important to find a way to deal with it. Well-known mail services such as Gmail and Outlook already sort e-mails by importance and can detect spam; however, some ham (legitimate mail) is still classified as spam, which can cause unexpected problems.

Spam is relatively easy to recognize and has some distinctive text characteristics:

Grammatical or spelling errors. Spam e-mails are often loosely written and contain misspelt words and grammatical errors.

Leading words. Spam e-mails often contain many leading words, such as "winning", to induce users to click on links or visit websites.

Unknown sender address. The sender's e-mail address is unknown and does not use the organization's internal suffix. In general, an e-mail from an address with an unfamiliar suffix is often spam.

The volume of spam is usually too large to classify manually, so it needs to be classified automatically, and machine learning is a commonly used approach for this. The characteristics above can serve as criteria for distinguishing spam. Among machine learning methods, the support vector machine and random forest algorithms can be used to train the model. The main goal of this project is to train an appropriate spam classification model by tuning its parameters.
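The three characteristics described above could be turned into numeric features before any model is trained. The following is a minimal sketch, not the project's actual code: the function name `extract_features`, the word lists `LEADING_WORDS` and `KNOWN_WORDS`, and the domain `example.org` are all hypothetical and chosen only for illustration. A real pipeline would presumably use a proper dictionary or spell checker in place of the tiny vocabulary here.

```python
import re

# Hypothetical lists and domain, for illustration only.
LEADING_WORDS = {"winning", "winner", "free", "prize", "urgent"}
KNOWN_WORDS = {
    "you", "are", "the", "a", "of", "to", "claim", "your", "now",
    "meeting", "is", "at", "noon", "please", "see", "agenda",
}
INTERNAL_SUFFIX = "example.org"  # assumed internal suffix of the organization


def extract_features(sender: str, body: str) -> dict:
    """Map one e-mail to the three spam indicators described in the text."""
    words = re.findall(r"[a-z']+", body.lower())
    total = max(len(words), 1)  # avoid division by zero on an empty body

    # Leading words: count terms meant to lure the reader.
    leading = sum(w in LEADING_WORDS for w in words)

    # Spelling errors: crude proxy, the share of words outside both lists.
    unknown = sum(
        w not in KNOWN_WORDS and w not in LEADING_WORDS for w in words
    )

    # Unknown sender: 1 if the address does not use the internal suffix.
    domain = sender.rsplit("@", 1)[-1].lower()
    internal = domain == INTERNAL_SUFFIX or domain.endswith("." + INTERNAL_SUFFIX)

    return {
        "misspelt_ratio": unknown / total,
        "leading_word_count": leading,
        "external_sender": int(not internal),
    }


if __name__ == "__main__":
    print(extract_features("promo@unknown-mail.biz",
                           "You are the winer! Claim your free prize now"))
    print(extract_features("alice@example.org",
                           "The meeting is at noon, please see the agenda"))
```

Feature dictionaries like these could then be converted to vectors and passed to a classifier. For example, scikit-learn's `SVC` or `RandomForestClassifier` combined with `GridSearchCV` would be one way to tune parameters as the project describes, though that choice of library is an assumption here.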
