Team 3 pull request #3
Open
pmaletic wants to merge 104 commits into quantargo:master from pmaletic:master
104 commits
f14dc79
pmaletic: Fixed factors from csv
pmaletic 6b07896
fixing decimal errors
cojocarucosmin da2449f
Add project file
pmaletic 82e3532
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin 7b28995
new version
cojocarucosmin c6e396b
Creating new package
pmaletic e6f1dc8
Moving package structure to the top
apeterseil 25f2c6e
delete tree3 folder
apeterseil d9cc1d0
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil 192f264
movin R
pmaletic b8ed5f0
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin eb14e6a
adding text
pmaletic f74859b
delete tree3 folder
apeterseil 628b4b5
created folder data and moved bmarketing.csv
apeterseil 1ed4e58
added clean function
apeterseil 7d6ee7d
prediction function prepared
KhristenkoDaniil 5c929f9
prediction function
KhristenkoDaniil 066fd46
prediction function
KhristenkoDaniil 3a0ae35
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil a3c684e
description predictions function
KhristenkoDaniil 1091652
implemented clean function
apeterseil fb35295
RoxygenNote automatically added
apeterseil 327965c
Creating new script for model and ploting
pmaletic c1cdaa4
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic aa276d0
predictions function
KhristenkoDaniil 8c5d8f3
added bmarketing data
apeterseil 6c20dda
Function, which calculates accuracy and confusion matrix.
JosephSpejbl 63261cd
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin f5be250
predictions functions
KhristenkoDaniil 369308e
defining the transformation function
cojocarucosmin ecf1103
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin a82fffc
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil 9d2bb1a
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil c98ccb1
prediction documenttation
KhristenkoDaniil a0fbf58
Merge branch 'master' of https://github.com/pmaletic/bmarketing
JosephSpejbl 4e260d2
predictions documentation
KhristenkoDaniil 91e8272
Merge branch 'master' of https://github.com/pmaletic/bmarketing
KhristenkoDaniil e557c3c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
KhristenkoDaniil aaf5625
repaired warning
JosephSpejbl 604a96d
clear description
KhristenkoDaniil 2411c54
removed tree3 file
apeterseil 2bece8e
prediction documentation
KhristenkoDaniil 5bbf1d2
documentation added
KhristenkoDaniil 8c22aa3
generated documentation for clean and predictions
apeterseil 0712c6b
Added documentation
pmaletic ff1bca7
added Imports
apeterseil 2e7d8ca
correcting function name
cojocarucosmin 2e2d85c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin 2dec434
added data
apeterseil 24a18dd
a
pmaletic b512bc6
Updated package description.
JosephSpejbl 70d2e45
aa
pmaletic 5642058
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic 6471652
new script for ploting
pmaletic ae93637
Update README.md
JosephSpejbl 9a9f366
p
pmaletic 4e0fa67
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic 0ebd4cc
repair
JosephSpejbl f1831a8
fix1
pmaletic 903404c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic 482ef27
a
pmaletic 4436579
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic 66f943e
changed factors to numeric in bmarketdata
apeterseil 2afa24e
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil 0a2f498
removing errors in documentation examples
apeterseil 11fa11c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil 2186fc8
fix2
pmaletic b718781
aa
pmaletic 791cd5b
q
pmaletic 19e115a
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic 321aa9c
testing all functuions on one place
pmaletic 947b879
aaa
pmaletic 7ebce52
testing file
apeterseil 7a406ca
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil b2de007
deleting
pmaletic 7b852ea
updated namespace file
apeterseil a06fa41
added parameters
apeterseil 49a790e
Fixing issue #4.
apeterseil 44e08f5
fixing target variable flexibility
pmaletic 0961ff9
a
pmaletic b584c93
tr
pmaletic 1bdb1c3
Removing plot_model function as this is now included in the model fun…
apeterseil c68fe0d
correcting the function to work with any dataset
cojocarucosmin 90ba520
deleting libraries from code
pmaletic fcac128
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil 2c0aac9
Removing plot_model function as this is now included in the model fun…
apeterseil 474dc93
Add Function documentation and export
apeterseil 6c676de
numeric correction
cojocarucosmin a00aa33
changed documentation to remove check errors
apeterseil 3abf7d5
changing documentation and clean function
apeterseil 93e3277
added parameter target_name to documentation
apeterseil 5a7b6fa
added target_name parameter to documentation
apeterseil 3655d99
updated readme file
apeterseil 402a93b
changed parameter for predictions
apeterseil 43dca38
changed data parameter in predictions
apeterseil b89a2d9
Updated model_accuracy() and its documentation.
JosephSpejbl d120e6a
changed Depends and license
apeterseil 999d76e
added data parameter to documentation
apeterseil 46c2ac0
Update README.Rmd
JosephSpejbl 6e74430
changed library loading
apeterseil a4b9f5d
implementing NA replacement option for both types of data
cojocarucosmin 035f7eb
introducing dependency on DescTools library
cojocarucosmin 4906433
library dependency (DescTools)
cojocarucosmin 5c200af
update readme for accuracy
JosephSpejbl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ^.*\.Rproj$ | ||
| ^\.Rproj\.user$ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| .Rproj.user | ||
| .Rhistory | ||
| .RData | ||
| .Ruserdata |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| Package: tree3 | ||
| Title: Package aimed to preprocess and build strong decision tree based predictions for deposit buying | ||
| Version: 0.0.0.9000 | ||
| Authors@R: person("Group 3", "Data Science Academy", email = "first.last@example.com", role = c("aut", "cre")) | ||
| Description: What the package does (one paragraph). | ||
| Depends: R (>= 3.5.0) | ||
| Imports: tidyverse, rpart, rpart.plot, DescTools | ||
| License: GPL-2 | ||
| Encoding: UTF-8 | ||
| LazyData: true | ||
| RoxygenNote: 6.1.1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # Generated by roxygen2: do not edit by hand | ||
|
|
||
| export(clean) | ||
| export(model) | ||
| export(model_accuracy) | ||
| export(predictions) | ||
| export(transform) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| #' Banking Marketing Data | ||
| #' | ||
| #' A dataset of bank marketing contacts, used to predict whether a | ||
| #' customer signs a term deposit. | ||
| #' | ||
| #' @format A data frame with 4119 rows and 21 variables: | ||
| #' \describe{ | ||
| #' \item{age}{age, integer} | ||
| #' \item{job}{factor} | ||
| #' \item{marital}{factor} | ||
| #' \item{education}{factor} | ||
| #' \item{default}{factor} | ||
| #' \item{housing}{factor} | ||
| #' \item{loan}{factor} | ||
| #' \item{contact}{factor} | ||
| #' \item{month}{factor} | ||
| #' \item{day_of_week}{factor} | ||
| #' \item{duration}{integer} | ||
| #' \item{campaign}{integer} | ||
| #' \item{pdays}{integer} | ||
| #' \item{previous}{integer} | ||
| #' \item{poutcome}{factor} | ||
| #' \item{emp.var.rate}{numeric} | ||
| #' \item{cons.price.idx}{numeric} | ||
| #' \item{cons.conf.idx}{numeric} | ||
| #' \item{euribor3m}{numeric} | ||
| #' \item{nr.employed}{numeric} | ||
| #' \item{y}{target variable} | ||
| #' } | ||
| #' @source marketing | ||
| "bmarketing" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| #' Cleaning Data | ||
| #' | ||
| #' \code{clean} returns a clean data set. | ||
| #' | ||
| #' @param data A data.frame containing the target variable named by \code{target_name} | ||
| #' @param target_name A character containing the name of the target variable | ||
| #' @return A cleaned data.frame. | ||
| #' | ||
| #' This means: | ||
| #' | ||
| #' - Returns an error if the target variable contains any missing values (NAs). | ||
| #' - Gives a clear warning for every other variable that contains NAs. | ||
| #' - Removes any column (and reports it as a warning) that contains more than 50% NAs. | ||
| #' | ||
| #' @examples | ||
| #' data(bmarketing) | ||
| #' clean(bmarketing,"y") | ||
| #' @export | ||
|
|
||
| clean<-function(data,target_name){ | ||
|
|
||
| if(sum(is.na(data[,target_name]))>0) stop(paste0("Target variable ",target_name," contains missing values (NA's)")) | ||
|
|
||
| for(col in names(data)){ | ||
|
|
||
| if(sum(is.na(data[,col]))>0){ | ||
|
|
||
| if(sum(is.na(data[,col]))/nrow(data) > 0.5){ | ||
|
|
||
| data <- data[,!(names(data)==col),drop=FALSE] | ||
| warning(paste0("Variable ",col," removed due to more than 50% NA's")) | ||
|
|
||
| }else{ | ||
| warning(paste0("Variable ",col," contains NA's")) | ||
| } | ||
|
|
||
| } | ||
|
|
||
| } | ||
|
|
||
| return(data) | ||
|
|
||
| } | ||
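
The per-column check in `clean` can also be expressed without an explicit loop. The following is only a sketch on a hypothetical toy data frame (the column names `a`, `b`, `c` are illustrative and not part of the package); it shows how the share of NAs per column maps onto the "drop" and "warn" cases described in the documentation above.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  y = c("yes", "no", "no", "yes"),
  a = c(1, NA, NA, NA),   # 75% missing, above the 50% threshold
  b = c(1, 2, NA, 4),     # 25% missing, kept but would trigger a warning
  c = c(5, 6, 7, 8)       # complete
)

# Share of missing values per column (is.na() on a data.frame gives a matrix).
na_share <- colMeans(is.na(toy))

# Columns above the threshold are dropped, columns with some NAs are flagged.
to_drop <- names(na_share)[na_share > 0.5]
to_warn <- names(na_share)[na_share > 0 & na_share <= 0.5]

# drop = FALSE keeps a data.frame even if only one column survives.
cleaned <- toy[, !(names(toy) %in% to_drop), drop = FALSE]

print(na_share)
print(names(cleaned))
```

Computing the shares once up front avoids indexing into a data frame that is being modified inside the loop.
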
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| #' Running a decision tree model and plotting its results | ||
| #' | ||
| #' \code{model} returns a decision tree model. | ||
| #' | ||
| #' @param input_data A data.frame containing the target variable whose name is given by \code{target_name} | ||
| #' @param target_name A character string containing the name of the target variable | ||
| #' @return A decision tree model | ||
| #' | ||
| #' This means: | ||
| #' | ||
| #' Model: Create a decision tree model to predict whether a customer signs a term deposit. | ||
|
|
||
| #' Model-Plot: We shall implement a function to present a nice representation of the model, e.g. for a decision tree we should plot the tree and respective nodes. | ||
| #' | ||
| #' @examples | ||
| #' data("bmarketing") | ||
| #' dt_model <- model(input_data = bmarketing,target_name="y") | ||
| #' @export | ||
|
|
||
| model <- function(input_data,target_name) { | ||
| dt_model<- rpart::rpart(stats::as.formula(paste(target_name," ~ .")), data = input_data) | ||
|
Review comment (Member): Why combine the model fitting with plot? |
||
| rpart.plot::rpart.plot(dt_model) | ||
| return (dt_model) | ||
| } | ||
|
|
||
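
One way to address the review question about combining fitting and plotting would be to split the two steps. The sketch below is an assumption about how that could look, not the package's code: `fit_tree` and `plot_tree` are hypothetical names, base `plot()`/`text()` methods from `rpart` stand in for `rpart.plot::rpart.plot`, and the `kyphosis` data shipped with `rpart` stands in for `bmarketing`.

```r
library(rpart)  # recommended package, normally installed with R

# Hypothetical helper: fits the tree only, with no plotting side effect.
fit_tree <- function(input_data, target_name) {
  # reformulate(".", response = "y") builds the formula y ~ .
  rpart(stats::reformulate(".", response = target_name), data = input_data)
}

# Hypothetical helper: plotting is a separate, optional step.
plot_tree <- function(dt_model) {
  plot(dt_model, margin = 0.1)   # draw the tree skeleton
  text(dt_model, use.n = TRUE)   # label the splits and leaves
  invisible(dt_model)
}

# kyphosis is an example data set from rpart, used here as a stand-in.
fit <- fit_tree(kyphosis, "Kyphosis")
pred <- predict(fit, kyphosis, type = "class")
```

With this split, `plot_tree(fit)` is called only when a plot is wanted, so fitting can run in scripts or tests without opening a graphics device.
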
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| #' model_accuracy | ||
| #' | ||
| #' The function calculates the accuracy (the share of correctly classified observations among all observations), the confusion matrix, | ||
| #' the sensitivity and the specificity of a classification model. For further reference see \url{https://en.wikipedia.org/wiki/Confusion_matrix} | ||
| #' and \url{https://en.wikipedia.org/wiki/Sensitivity_and_specificity}. | ||
| #' | ||
| #' @param real a vector of realized observations | ||
| #' @param pred a vector of corresponding predictions | ||
| #' @param chosenvar a string or number naming the class for which sensitivity and specificity are calculated | ||
| #' @return \code{accuracy} ... a number | ||
| #' @return \code{confusion_matrix} ... a matrix | ||
| #' @return \code{sensitivity} ... a number | ||
| #' @return \code{specificity} ... a number | ||
| #' @examples | ||
| #' example_real=c(1,2,3,1,2,3,1,2,3) # a vector of realized observations | ||
| #' example_pred=c(1,2,3,1,2,3,2,2,2) # a vector of predictions | ||
| #' example_chosenvar=2 | ||
| #' model_accuracy(example_real,example_pred,example_chosenvar) | ||
| #' @export | ||
|
|
||
| model_accuracy <- function(real,pred,chosenvar){ | ||
|
|
||
| #Tests if input is correct | ||
| if(any(!(unique(pred) %in% unique(real)))) stop('Predictions contain at least one value that is not in the realized data.') | ||
| if(length(real)!=length(pred)) stop('Different length of input vectors.') | ||
| if(any(is.na(real))) stop('There is at least one NA in vector of realized data.') | ||
| if(any(is.na(pred))) stop('There is at least one NA in vector of predicted data.') | ||
|
|
||
|
|
||
| ## Calculate the accuracy as the share of correctly predicted (classified) observations. | ||
| acc_perc=mean(real == pred) | ||
|
|
||
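| # Calculate confusion matrix on a shared set of classes, so that it stays square | ||
| # (and row and column indices agree) even if a class is never predicted. | ||
| classes=sort(unique(real)) | ||
| confusion_matrix=table(real=factor(real,levels=classes),pred=factor(pred,levels=classes)) | ||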
|
|
||
| chosencol0<-colnames(confusion_matrix)==chosenvar | ||
| chosencol=which(chosencol0==TRUE) | ||
|
|
||
|
|
||
| #Calculate sensitivity and specificity | ||
| truepos<-sum(confusion_matrix[chosencol,chosencol]) | ||
| falseneg<-sum(confusion_matrix[chosencol,-chosencol]) | ||
| trueneg=sum(confusion_matrix[-chosencol,-chosencol]) | ||
| falsepos=sum(confusion_matrix[-chosencol,chosencol]) | ||
|
|
||
|
|
||
| sensitivity<-truepos/(truepos+falseneg) | ||
| specificity<-trueneg/(trueneg + falsepos) | ||
|
|
||
| #put results together | ||
| result<-list(acc_perc,confusion_matrix,sensitivity,specificity) | ||
| names(result) <-c('accuracy','confusion_matrix','sensitivity','specificity') | ||
|
|
||
| if(acc_perc<0.7) warning('Accuracy is below 70%.') | ||
|
|
||
| return(result) | ||
|
|
||
| } |
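
As a cross-check of the sensitivity and specificity logic, the values for the documented example can be worked out directly from logical comparisons, treating the chosen class as "positive" and every other class as "negative". This is a sketch for illustration only; the short variable names (`tp`, `fn`, `tn`, `fp`) are not part of the package.

```r
# Vectors from the roxygen example of model_accuracy().
real   <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
pred   <- c(1, 2, 3, 1, 2, 3, 2, 2, 2)
chosen <- 2   # class treated as "positive"

# One-vs-rest counts for the chosen class.
tp <- sum(real == chosen & pred == chosen)   # chosen class, predicted as chosen
fn <- sum(real == chosen & pred != chosen)   # chosen class, predicted as other
tn <- sum(real != chosen & pred != chosen)   # other class, predicted as other
fp <- sum(real != chosen & pred == chosen)   # other class, predicted as chosen

accuracy    <- mean(real == pred)   # 7 of 9 observations match
sensitivity <- tp / (tp + fn)       # 3 / (3 + 0)
specificity <- tn / (tn + fp)       # 4 / (4 + 2)

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Because accuracy here is 7/9 (about 0.78), the 70% warning in `model_accuracy` would not fire for this input.
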
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| #' Predictions. | ||
| #' | ||
| #' \code{predictions} returns predictions. | ||
| #' | ||
| #' @param dt_model model | ||
| #' @param data data the model was generated with | ||
| #' @examples | ||
| #' data("bmarketing") | ||
| #' dt_model <- model(input_data = bmarketing,target_name="y") | ||
| #' predictions(dt_model, bmarketing) | ||
| #' @export | ||
|
|
||
| predictions <- function(dt_model, data) | ||
| { | ||
| stats::predict(dt_model, data, type = "class") | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # library(tree3) | ||
| # | ||
| # data("bmarketing") | ||
| # bmarketing | ||
| # | ||
| # t2 <- clean(bmarketing) | ||
| # t3 <- transform(t2) | ||
| # | ||
| # dt_model <- model(t3) | ||
| # | ||
| # pred <- predictions(dt_model,t3) | ||
| # | ||
| # model_accuracy(t3$y,pred) | ||
| # | ||
| # | ||
| # | ||
| # |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| #' Data Transformation | ||
| #' | ||
| #' Log transformation of numeric variables and conversion of factors into numeric variables, as necessary. | ||
| #' The function needs the DescTools library installed. | ||
| #' | ||
| #' \code{transform} log-transforms a numeric column or converts a factor column to its numeric codes, as necessary. | ||
| #' | ||
| #' @param data A data.frame containing the target variable | ||
| #' @param column The name of the column to transform, given as a character string | ||
| #' @param option Whether to replace NAs by the mean (numeric columns) or the mode (factor columns): 1 replaces NAs (default), 0 leaves them unchanged | ||
| #' @return The transformed column as a vector | ||
| #' | ||
| #' @examples | ||
| #' transform(bmarketing, "age", 1) | ||
| #' @export | ||
|
|
||
| transform <- function(data, column, option = 1) { | ||
|
|
||
| if(is.integer(data[,column]) || is.numeric(data[,column])) { | ||
| # Taking care of missing data & log | ||
| if (option == 1) { | ||
| Mean <- mean(data[,column], na.rm = TRUE) | ||
| y <- ifelse(is.na(data[,column]), Mean, data[,column]) | ||
| } | ||
| if (option == 0) { | ||
| y <- data[,column] | ||
| } | ||
|
|
||
| # log() is only applied to strictly positive data, since log(0) is -Inf | ||
| if(min(y, na.rm = TRUE) > 0) | ||
| y <- log(y) | ||
| } | ||
|
|
||
| # Encoding categorical data | ||
| if(is.factor(data[,column])) { | ||
| y <- as.numeric(data[,column]) | ||
| if (option == 1) { | ||
| y <- ifelse(is.na(y), DescTools::Mode(y, na.rm = TRUE)[1], y) | ||
| } | ||
| } | ||
| return(y) | ||
| } |
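
The individual steps of `transform` can be tried out in isolation. The sketch below uses hypothetical toy vectors and base R only; it illustrates mean imputation followed by a log transformation, why zeros need care before taking logs, and what `as.numeric()` returns for a factor. The `log1p` line is shown as a possible alternative, not as something the package does.

```r
# Hypothetical numeric column with one missing value.
x <- c(2, NA, 8, 5)

# Replace NAs by the mean of the observed values (here 5), then take logs.
x_filled <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_log <- log(x_filled)

# log(0) is -Inf, so a column containing zeros should not be log-transformed
# as is; log1p(z) = log(1 + z) is one possible alternative.
z <- c(0, 1, 10)
z_log   <- log(z)     # first element is -Inf
z_log1p <- log1p(z)   # finite everywhere

# For a factor, as.numeric() returns the integer level codes, not the labels.
f <- factor(c("b", "a", "b", NA))
codes <- as.numeric(f)   # levels are "a" < "b", so the codes are 2 1 2 NA

print(x_filled)
print(codes)
```

Since the level codes depend on the alphabetical order of the labels, the numeric encoding of a factor carries no inherent meaning for distance-based methods, although tree models such as `rpart` can still split on it.
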
Would prefer any here