E-mail has made everyday communication far more convenient, but it has also brought problems such as privacy risks and spam. Spam is a growing problem: most people have received spam e-mails, and some have been deceived by them. Spam also consumes network resources, reduces productivity, and can lead to the disclosure of personal information, so finding a way to deal with it is important.

Popular mail applications such as Gmail and Outlook can sort e-mails into essential and less critical ones, and they can also detect spam. However, legitimate mail (ham) is sometimes misclassified as spam, which can cause unexpected problems.

Spam is relatively easy to recognize and has several characteristic text features:

- Grammatical or spelling errors. Such e-mails are often loosely written and contain misspelt words and grammatical errors.
- Leading words. Spam e-mails often contain many leading words, such as "winning", to induce users to click on links or visit websites.
- Unknown sender address. The sender's address is unknown and does not use the organization's internal suffix. E-mail from an address with an unfamiliar suffix is often spam.

The volume of spam is usually too large to classify manually, so classification needs to be automated, and machine learning is the most commonly used approach. The characteristics above can serve as criteria for distinguishing spam, and models can be trained with algorithms such as support vector machines and random forests. The main goal of this project is to train a suitable spam classification model by tuning its parameters.
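The three characteristics described above could be turned into numeric features along the following lines. This is only a minimal sketch, not the project's actual pipeline: the word lists, the trusted suffix `example.org`, the tiny vocabulary, and the function name `extract_features` are all hypothetical and chosen purely for illustration.

```python
import re

# Illustrative assumptions only; none of these lists come from the project.
LEADING_WORDS = {"winning", "winner", "prize", "free", "urgent"}
TRUSTED_SUFFIXES = {"example.org"}  # hypothetical internal organization suffix
VOCABULARY = {
    "you", "are", "the", "a", "to", "claim", "your", "click", "here",
    "meeting", "is", "at", "noon",
} | LEADING_WORDS


def extract_features(sender: str, body: str) -> dict:
    """Map one e-mail to the three hand-crafted features described above."""
    tokens = re.findall(r"[a-z']+", body.lower())
    total = max(len(tokens), 1)  # avoid division by zero on empty bodies

    # Words outside the vocabulary are treated as a crude proxy for misspellings.
    unknown = sum(1 for t in tokens if t not in VOCABULARY)
    leading = sum(1 for t in tokens if t in LEADING_WORDS)

    # Everything after the last "@" is taken as the sender's suffix.
    suffix = sender.rsplit("@", 1)[-1].lower()

    return {
        "misspelling_ratio": unknown / total,
        "leading_word_count": leading,
        "unknown_sender": int(suffix not in TRUSTED_SUFFIXES),
    }


spam = extract_features(
    "promo@unknown-mail.biz",
    "You are the winnner! Click here to claim your free prize",
)
ham = extract_features("alice@example.org", "The meeting is at noon")
print(spam)
print(ham)
```

Feature dictionaries of this kind could then be converted to vectors and passed to a support vector machine or random forest. A real system would need a proper dictionary and much larger word lists than the toy ones assumed here.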
lucaslin2020/DATA7703