Open
104 commits
f14dc79
pmaletic: Fixed factors from csv
pmaletic May 8, 2019
6b07896
fixing decimal errors
cojocarucosmin May 8, 2019
da2449f
Add project file
pmaletic May 8, 2019
82e3532
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin May 8, 2019
7b28995
new version
cojocarucosmin May 8, 2019
c6e396b
Creating new package
pmaletic May 8, 2019
e6f1dc8
Moving package structure to the top
apeterseil May 8, 2019
25f2c6e
delete tree3 folder
apeterseil May 8, 2019
d9cc1d0
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
192f264
movin R
pmaletic May 8, 2019
b8ed5f0
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin May 8, 2019
eb14e6a
adding text
pmaletic May 8, 2019
f74859b
delete tree3 folder
apeterseil May 8, 2019
628b4b5
created folder data and moved bmarketing.csv
apeterseil May 8, 2019
1ed4e58
added clean function
apeterseil May 8, 2019
7d6ee7d
prediction function prepared
KhristenkoDaniil May 8, 2019
5c929f9
prediction function
KhristenkoDaniil May 8, 2019
066fd46
prediction function
KhristenkoDaniil May 8, 2019
3a0ae35
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
a3c684e
description predictions function
KhristenkoDaniil May 8, 2019
1091652
implemented clean function
apeterseil May 8, 2019
fb35295
RoxygenNote automatically added
apeterseil May 8, 2019
327965c
Creating new script for model and ploting
pmaletic May 8, 2019
c1cdaa4
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
aa276d0
predictions function
KhristenkoDaniil May 8, 2019
8c5d8f3
added bmarketing data
apeterseil May 8, 2019
6c20dda
Function, which calculates accuracy and confusion matrix.
JosephSpejbl May 8, 2019
63261cd
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin May 8, 2019
f5be250
predictions functions
KhristenkoDaniil May 8, 2019
369308e
defining the transformation function
cojocarucosmin May 8, 2019
ecf1103
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin May 8, 2019
a82fffc
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
9d2bb1a
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
c98ccb1
prediction documenttation
KhristenkoDaniil May 8, 2019
a0fbf58
Merge branch 'master' of https://github.com/pmaletic/bmarketing
JosephSpejbl May 8, 2019
4e260d2
predictions documentation
KhristenkoDaniil May 8, 2019
91e8272
Merge branch 'master' of https://github.com/pmaletic/bmarketing
KhristenkoDaniil May 8, 2019
e557c3c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
KhristenkoDaniil May 8, 2019
aaf5625
repaired warning
JosephSpejbl May 8, 2019
604a96d
clear description
KhristenkoDaniil May 8, 2019
2411c54
removed tree3 file
apeterseil May 8, 2019
2bece8e
prediction documentation
KhristenkoDaniil May 8, 2019
5bbf1d2
documentation added
KhristenkoDaniil May 8, 2019
8c22aa3
generated documentation for clean and predictions
apeterseil May 8, 2019
0712c6b
Added documentation
pmaletic May 8, 2019
ff1bca7
added Imports
apeterseil May 8, 2019
2e7d8ca
correcting function name
cojocarucosmin May 8, 2019
2e2d85c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
cojocarucosmin May 8, 2019
2dec434
added data
apeterseil May 8, 2019
24a18dd
a
pmaletic May 8, 2019
b512bc6
Updated package description.
JosephSpejbl May 8, 2019
70d2e45
aa
pmaletic May 8, 2019
5642058
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
6471652
new script for ploting
pmaletic May 8, 2019
ae93637
Update README.md
JosephSpejbl May 8, 2019
9a9f366
p
pmaletic May 8, 2019
4e0fa67
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
0ebd4cc
repair
JosephSpejbl May 8, 2019
f1831a8
fix1
pmaletic May 8, 2019
903404c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
482ef27
a
pmaletic May 8, 2019
4436579
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
66f943e
changed factors to numeric in bmarketdata
apeterseil May 8, 2019
2afa24e
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
0a2f498
removing errors in documentation examples
apeterseil May 8, 2019
11fa11c
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
2186fc8
fix2
pmaletic May 8, 2019
b718781
aa
pmaletic May 8, 2019
791cd5b
q
pmaletic May 8, 2019
19e115a
Merge branch 'master' of https://github.com/pmaletic/bmarketing
pmaletic May 8, 2019
321aa9c
testing all functuions on one place
pmaletic May 8, 2019
947b879
aaa
pmaletic May 8, 2019
7ebce52
testing file
apeterseil May 8, 2019
7a406ca
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 8, 2019
b2de007
deleting
pmaletic May 9, 2019
7b852ea
updated namespace file
apeterseil May 9, 2019
a06fa41
added parameters
apeterseil May 9, 2019
49a790e
Fixing issue #4.
apeterseil May 9, 2019
44e08f5
fixing target variable flexibility
pmaletic May 9, 2019
0961ff9
a
pmaletic May 9, 2019
b584c93
tr
pmaletic May 9, 2019
1bdb1c3
Removing plot_model function as this is now included in the model fun…
apeterseil May 9, 2019
c68fe0d
correcting the function to work with any dataset
cojocarucosmin May 9, 2019
90ba520
deleting libraries from code
pmaletic May 9, 2019
fcac128
Merge branch 'master' of https://github.com/pmaletic/bmarketing
apeterseil May 9, 2019
2c0aac9
Removing plot_model function as this is now included in the model fun…
apeterseil May 9, 2019
474dc93
Add Function documentation and export
apeterseil May 9, 2019
6c676de
numeric correction
cojocarucosmin May 9, 2019
a00aa33
changed documentation to remove check errors
apeterseil May 9, 2019
3abf7d5
changing documentation and clean function
apeterseil May 9, 2019
93e3277
added parameter target_name to documentation
apeterseil May 9, 2019
5a7b6fa
added target_name parameter to documentation
apeterseil May 9, 2019
3655d99
updated readme file
apeterseil May 9, 2019
402a93b
changed parameter for predictions
apeterseil May 9, 2019
43dca38
changed data parameter in predictions
apeterseil May 9, 2019
b89a2d9
Updated model_accuracy() and its documentation.
JosephSpejbl May 9, 2019
d120e6a
changed Depends and license
apeterseil May 9, 2019
999d76e
added data parameter to documentation
apeterseil May 9, 2019
46c2ac0
Update README.Rmd
JosephSpejbl May 9, 2019
6e74430
changed library loading
apeterseil May 9, 2019
a4b9f5d
implementing NA replacement option for both types of data
cojocarucosmin May 9, 2019
035f7eb
introducing dependency on DescTools library
cojocarucosmin May 9, 2019
4906433
library dependency (DescTools)
cojocarucosmin May 9, 2019
5c200af
update readme for accuracy
JosephSpejbl May 9, 2019
2 changes: 2 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
^.*\.Rproj$
^\.Rproj\.user$
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
11 changes: 11 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,11 @@
Package: tree3
Title: Preprocess Data and Build Decision Tree Predictions for Term Deposit Buying
Version: 0.0.0.9000
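Authors@R: person("Group 3", "Data Science Academy", email = "first.last@example.com", role = c("aut", "cre"))
Description: Tools for classification analysis with decision trees, covering data cleaning, transformation, model fitting and plotting, prediction, and accuracy assessment with a confusion matrix.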
Depends: R (>= 3.5.0)
Imports: rpart, rpart.plot, DescTools
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
7 changes: 7 additions & 0 deletions NAMESPACE
@@ -0,0 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(clean)
export(model)
export(model_accuracy)
export(predictions)
export(transform)
31 changes: 31 additions & 0 deletions R/bmarketing.R
@@ -0,0 +1,31 @@
#' Banking Marketing Data
#'
#' A dataset of bank marketing campaign contacts, used to predict
#' whether a customer signs a term deposit.
#'
#' @format A data frame with 4119 rows and 21 variables:
#' \describe{
#' \item{age}{age, integer}
#' \item{job}{factor}
#' \item{marital}{factor}
#' \item{education}{factor}
#' \item{default}{factor}
#' \item{housing}{factor}
#' \item{loan}{factor}
#' \item{contact}{factor}
#' \item{month}{factor}
#' \item{day_of_week}{factor}
#' \item{duration}{integer}
#' \item{campaign}{integer}
#' \item{pdays}{integer}
#' \item{previous}{integer}
#' \item{poutcome}{factor}
#' \item{emp.var.rate}{numeric}
#' \item{cons.price.idx}{numeric}
#' \item{cons.conf.idx}{numeric}
#' \item{euribor3m}{numeric}
#' \item{nr.employed}{numeric}
#' \item{y}{target variable}
#' }
#' @source marketing
"bmarketing"
43 changes: 43 additions & 0 deletions R/clean.R
@@ -0,0 +1,43 @@
#' Cleaning Data
#'
#' \code{clean} returns a clean data set.
#'
#' @param data A data.frame containing the target variable named by \code{target_name}
#' @param target_name A character containing the name of the target variable
#' @return A cleaned data.frame.
#'
#' This means:
#'
#' - Return an error if the target variable contains any missing values (NA’s).
#' - Give clear warnings for all other variables which contain NA’s.
#' - Remove any columns (and report as warning) which contain more than 50% NA’s.
#'
#' @examples
#' data(bmarketing)
#' clean(bmarketing,"y")
#' @export

clean<-function(data,target_name){

if(anyNA(data[,target_name])) stop(paste0("Target variable ",target_name," contains missing values (NA's)"))

for(col in names(data)){

if(sum(is.na(data[,col]))>0){
Member
Would prefer any here


if(sum(is.na(data[,col]))/nrow(data) > 0.5){

data <- data[,!(names(data)==col),drop=FALSE]
warning(paste0("Variable ",col," removed due to more than 50% NA's"))

}else{
warning(paste0("Variable ",col," contains NA's"))
}

}

}

return(data)

}
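The review comment above asks for `any` in the NA check. A minimal sketch of that idea, assuming base R only; `clean_sketch` is a hypothetical name used for illustration and is not part of the package:

```r
# Hypothetical variant of clean() built on anyNA() and mean(is.na()).
clean_sketch <- function(data, target_name) {
  if (anyNA(data[[target_name]])) {
    stop("Target variable ", target_name, " contains missing values (NA's)")
  }
  # Share of missing values per column.
  na_share <- vapply(data, function(x) mean(is.na(x)), numeric(1))
  for (col in names(na_share)[na_share > 0]) {
    if (na_share[[col]] > 0.5) {
      warning("Variable ", col, " removed due to more than 50% NA's")
    } else {
      warning("Variable ", col, " contains NA's")
    }
  }
  # drop = FALSE keeps a data.frame even if a single column remains.
  data[, na_share <= 0.5, drop = FALSE]
}
```

Computing the NA share once per column keeps the warning logic and the column removal in agreement, and avoids modifying `data` while looping over its names.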
25 changes: 25 additions & 0 deletions R/model.R
@@ -0,0 +1,25 @@
#' Running a decision tree model and plotting its results
#'
#' \code{model} returns a decision tree model.
#'
#' @param input_data A data.frame containing the target variable named by \code{target_name}
#' @param target_name A character containing the name of the target variable
#' @return A decision tree model
#'
#' This means:
#'
#' Model: Create a decision tree model to predict whether a customer signs a term deposit.
#'
#' Model-Plot: Plot a readable representation of the model, i.e. the decision tree and its respective nodes.
#'
#' @examples
#' data("bmarketing")
#' dt_model <- model(input_data = bmarketing,target_name="y")
#' @export

model <- function(input_data,target_name) {
dt_model<- rpart::rpart(as.formula(paste(target_name," ~ .")), data = input_data)
Member
Why combine the model fitting with plot?

rpart.plot::rpart.plot(dt_model)
return (dt_model)
}
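On the reviewer's question about combining fitting and plotting: one possible split is sketched below, assuming `rpart` is installed and treating `rpart.plot` as optional. The names `fit_tree` and `plot_tree` are hypothetical and not part of this pull request.

```r
# Hypothetical split of model() into a fitting step and a plotting step.
fit_tree <- function(input_data, target_name) {
  tree_formula <- stats::as.formula(paste(target_name, "~ ."))
  rpart::rpart(tree_formula, data = input_data)
}

plot_tree <- function(dt_model) {
  if (!requireNamespace("rpart.plot", quietly = TRUE)) {
    stop("Package 'rpart.plot' is needed for plotting.")
  }
  rpart.plot::rpart.plot(dt_model)
  invisible(dt_model)
}

# Fitting no longer draws anything; plotting becomes an explicit call.
iris_tree <- fit_tree(iris, "Species")
```

Keeping the two steps apart lets callers fit a model in scripts or tests without opening a graphics device, and plot only when they want to.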

58 changes: 58 additions & 0 deletions R/model_accuracy.R
@@ -0,0 +1,58 @@
#' model_accuracy
#'
#' The function calculates the accuracy (the share of correctly classified observations among all observations), the confusion matrix of a classification model,
#' the sensitivity and the specificity. For further reference on the confusion matrix, sensitivity and specificity please see \url{https://en.wikipedia.org/wiki/Confusion_matrix}
#' and \url{https://en.wikipedia.org/wiki/Sensitivity_and_specificity}, respectively.
#'
#' @param real a vector of realized observations
#' @param pred a vector of corresponding predictions
#' @param chosenvar a string or number choosing the class for which sensitivity and specificity will be calculated
#' @return A list with the elements \code{accuracy} (a number),
#' \code{confusion_matrix} (a table),
#' \code{sensitivity} (a number) and
#' \code{specificity} (a number).
#' @examples
#' example_real=c(1,2,3,1,2,3,1,2,3) # a vector of realized observations
#' example_pred=c(1,2,3,1,2,3,2,2,2) # a vector of predictions
#' example_chosenvar=2
#' model_accuracy(example_real,example_pred,example_chosenvar)
#' @export

model_accuracy <- function(real,pred,chosenvar){

# Test whether the input is correct
if(!all(unique(pred) %in% unique(real))) stop('Predictions attain at least one value, which is not in realized data.')
if(length(real)!=length(pred)) stop('Different length of input vectors.')
if(anyNA(real)) stop('There is at least one NA in vector of realized data.')
if(anyNA(pred)) stop('There is at least one NA in vector of predicted data.')


# Calculate the accuracy as the share of correctly predicted (classified) observations.
acc_perc=mean(real == pred)

# Calculate the confusion matrix on a common set of classes, so that it is always square.
lev<-as.character(sort(unique(real)))
confusion_matrix=table(real=factor(real,levels=lev),pred=factor(pred,levels=lev))

chosencol=which(lev==as.character(chosenvar))
if(length(chosencol)!=1) stop('chosenvar is not a class of the realized data.')

# Calculate sensitivity and specificity (rows hold realized classes, columns hold predictions)
truepos<-sum(confusion_matrix[chosencol,chosencol])
falseneg<-sum(confusion_matrix[chosencol,-chosencol])
trueneg=sum(confusion_matrix[-chosencol,-chosencol])
falsepos=sum(confusion_matrix[-chosencol,chosencol])


sensitivity<-truepos/(truepos+falseneg)
specificity<-trueneg/(trueneg + falsepos)

#put results together
result<-list(acc_perc,confusion_matrix,sensitivity,specificity)
names(result) <-c('accuracy','confusion_matrix','sensitivity','specificity')

if(acc_perc<0.7) warning('Accuracy is below 70%.')

return(result)

}
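To make the one-vs-rest reading of sensitivity and specificity concrete, the roxygen example above can be recomputed with plain logical vectors. This is only an illustrative sketch in base R, not the package's implementation; the variable names are chosen here.

```r
real <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)  # realized classes
pred <- c(1, 2, 3, 1, 2, 3, 2, 2, 2)  # predicted classes
chosen <- 2                           # class treated as "positive"

# Count the four outcomes for the chosen class against all other classes.
truepos  <- sum(real == chosen & pred == chosen)
falseneg <- sum(real == chosen & pred != chosen)
trueneg  <- sum(real != chosen & pred != chosen)
falsepos <- sum(real != chosen & pred == chosen)

accuracy    <- mean(real == pred)
sensitivity <- truepos / (truepos + falseneg)
specificity <- trueneg / (trueneg + falsepos)
```

With these inputs all three observations of class 2 are found (sensitivity 1), while two of the six other observations are wrongly assigned to class 2 (specificity 4/6).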
16 changes: 16 additions & 0 deletions R/predictions.R
@@ -0,0 +1,16 @@
#' Predictions.
#'
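#' \code{predictions} returns the predicted class for each row of \code{data}.
#'
#' @param dt_model A fitted decision tree model, as returned by \code{model}
#' @param data A data.frame to predict on, e.g. the data the model was generated with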
#' @examples
#' data("bmarketing")
#' dt_model <- model(input_data = bmarketing,target_name="y")
#' predictions(dt_model, bmarketing)
#' @export

predictions <- function(dt_model, data)
{
predict(dt_model, data, type = "class")
}
17 changes: 17 additions & 0 deletions R/testing.R
@@ -0,0 +1,17 @@
# library(tree3)
#
# data("bmarketing")
# bmarketing
#
# t2 <- clean(bmarketing, "y")
# t2$age <- transform(t2, "age")
#
# dt_model <- model(t2, "y")
#
# pred <- predictions(dt_model, t2)
#
# model_accuracy(t2$y, pred, "yes")
#
#
#
#
40 changes: 40 additions & 0 deletions R/transform.R
@@ -0,0 +1,40 @@
# Data Transformation
#' Log transformation of numeric columns and transformation of factor columns into numeric codes, as necessary.
#' The function requires the DescTools package to be installed.
#'
#' \code{transform} log-transforms the given column if it is numeric, or converts it to numeric codes if it is a factor.
#'
#' @param data A data.frame containing the target variable
#' @param column The name of the column to be transformed, as a character string in quotation marks
#' @param option Whether to replace NA's by the mean (numeric columns) or by the mode (factor columns): 1 - replace NA's (default), 0 - no replacement
#' @return The transformed column of the data frame, as a numeric vector
#'
#' @examples
#' transform(bmarketing, "age", 1)
#' @export

transform <- function(data, column, option = 1) {

if(is.numeric(data[,column])) {
# Taking care of missing data & log
if (option == 1) {
Mean <- mean(data[,column], na.rm = TRUE)
y <- ifelse(is.na(data[,column]), Mean, data[,column])
}
if (option == 0) {
y <- data[,column]
}

if(min(y, na.rm = TRUE) > 0)
y <- log(y)
}

# Encoding categorical data
if(is.factor(data[,column])) {
y <- as.numeric(data[,column])
if (option == 1) {
y <- ifelse(is.na(y), DescTools::Mode(y, na.rm = TRUE)[1], y)
}
}
return(y)
}
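A log transformation is only finite for strictly positive values, which is why the guard before `log()` in the function above matters. A minimal base R sketch of that point; `safe_log` is a hypothetical helper and not part of the package:

```r
# Hypothetical helper: take logs only when every observed value is strictly positive.
safe_log <- function(x) {
  if (min(x, na.rm = TRUE) > 0) log(x) else x
}

log(c(0, 1, 10))       # the zero becomes -Inf
safe_log(c(0, 1, 10))  # returned unchanged because of the zero
safe_log(c(1, 10))     # log-transformed
```

Columns that contain zeros, such as counts, are therefore left on their original scale instead of picking up `-Inf` values.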
35 changes: 33 additions & 2 deletions README.Rmd
@@ -17,7 +17,38 @@ knitr::opts_chunk$set(

## Overview

The bmarketing dataset
This package is dedicated to classification analysis using decision trees. Besides the decision tree model itself and a prediction function, it provides the necessary supporting tools: data cleaning, a transformation function, a plot of the results, and the calculation of prediction accuracy and the confusion matrix.

<!-- TODO: Change README to make it more descriptive, add examples, etc. -->
## Functionalities

The functionalities are as follows.

* Data Cleaning

```{r}
require("tree3")

data("bmarketing")
cleanedData <- clean(data = bmarketing,target_name = "y")
```

* Data Transforming
```{r}
cleanedData$cons.price.idx <- transform(cleanedData,column = "cons.price.idx")
```

* Finding a Model
```{r}
treeModel <- model(input_data = cleanedData,target_name="y")
```


* Getting the predictions
```{r}
predictionData <- predictions(dt_model = treeModel,data = cleanedData)
```

* Assessing the model accuracy
```{r}
model_accuracy(real = cleanedData$y,pred = predictionData,chosenvar='yes')
```
78 changes: 75 additions & 3 deletions README.md
@@ -8,7 +8,79 @@ Status](https://img.shields.io/codecov/c/github/Quantargo/bmarketing/master.svg)

## Overview

The bmarketing
dataset
This package is dedicated to classification analysis using decision
trees. Besides the decision tree model itself and a prediction function,
it provides the necessary supporting tools: data cleaning, a transformation
function, a plot of the results, and the calculation of prediction
accuracy and the confusion matrix.

<!-- TODO: Change README to make it more descriptive, add examples, etc. -->
## Functionalities

The functionalities are as follows.

- Data Cleaning

<!-- end list -->

``` r
require("tree3")
#> Loading required package: tree3
#>
#> Attaching package: 'tree3'
#> The following object is masked from 'package:base':
#>
#> transform

data("bmarketing")
cleanedData <- clean(data = bmarketing,target_name = "y")
```

- Data
Transforming

<!-- end list -->

``` r
cleanedData$cons.price.idx <- transform(cleanedData,column = "cons.price.idx")
```

- Finding a Model

<!-- end list -->

``` r
treeModel <- model(input_data = cleanedData,target_name="y")
```

![](man/figures/README-unnamed-chunk-4-1.png)<!-- -->

- Getting the predictions

<!-- end list -->

``` r
predictionData <- predictions(dt_model = treeModel,data = cleanedData)
```

- Assessing the model
accuracy

<!-- end list -->

``` r
model_accuracy(real = cleanedData$y,pred = predictionData,chosenvar='yes')
#> $accuracy
#> [1] 0.9271668
#>
#> $confusion_matrix
#>      pred
#> real    no  yes
#>   no  3583   85
#>   yes  215  236
#>
#> $sensitivity
#> [1] 0.5232816
#>
#> $specificity
#> [1] 0.9768266
```
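
The figures in the output above can be cross-checked from the confusion matrix alone. A small sketch in base R, with the four counts copied from that output and `cm` as an illustrative name:

```r
# Rows are realized classes, columns are predictions (matrix() fills column-wise).
cm <- matrix(c(3583, 215, 85, 236), nrow = 2,
             dimnames = list(real = c("no", "yes"), pred = c("no", "yes")))

accuracy    <- sum(diag(cm)) / sum(cm)            # correct / all observations
sensitivity <- cm["yes", "yes"] / sum(cm["yes", ])  # share of real "yes" found
specificity <- cm["no", "no"] / sum(cm["no", ])     # share of real "no" found
```

These reproduce the reported accuracy, sensitivity and specificity up to rounding.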