[Python] E-wallet Performance Analysis

I. INTRODUCTION

In this project, Python techniques: Pandas, NumPy, EDA and data wrangling methods (cleaning, aggregation, and feature engineering) were utilized to analyze payment transactions, product performance, and team contributions. This project aimed to gain insights for decision-making enhancement in payment operations, product strategy, and financial reporting.

II. DATASETS

Datasets used (as attachment) is of an e-wallet company. There are 3 datasets with description as below table:

No.	Dataset	Description
1	payment_report.csv	Monthly payment volume of products
2	product.csv	Product information
3	transactions.csv	Transactions information

III. KEY AREAS TO ANALYZE

To gain insights from given datasets, the analysis focus on 3 key areas, which aim to help decision-making process:

1. Product Sales Performance

Top 3 products with highest sales volume.
Data integrity: each product belongs to a single team?
Lowest-performing team since Q2 2023.
Least-contributing category.

2. Refund Transaction Analysis

Distribution of refund transactions by source.
Identifying highest refund-contributing source.

3. Transaction Classification & Performance Metrics

Categorizing transactions based on predefined business rules.
Calculating transaction statistics for each transaction type.

IV. DATA PREPARATION

1. Checking

# Checking for Missing Values
print("---",payment_report.isna().sum())
print("---",product.isna().sum())
print("---",transactions.isna().sum())

# Checking for Duplications
print(payment_report.shape)
print(payment_report.nunique())
print(product.shape)
print(product.nunique())
print(transactions.shape)
print(transactions.nunique())

# Checking data type
print(payment_report.info())
print(product.info())
print(transactions.info())

# Checking data value
print(payment_report.describe())
print(product.describe())
print(transactions.describe())

2. Conclusion & Handling

Missing data:
- In Transactions table: 42183 rows in sender_id -> No base to fill in -> No action 138933 rows in receiver_id -> No base to fill in -> No action 1130449 rows in extra_info -> No action
Duplicates:
- In PK transaction_id of Transactions table: 28 duplicated rows -> Delete rows
Incorrect data types:
- All data types are correct
Incorrect values:
- In Transactions table: receiver_id has negative values -> Convert all to positive transStatus has negative values -> Don’t understand meaning -> No action

# Drop duplicates in Transactions df
transactions.drop_duplicates(inplace = True)

# Convert receiver_id to positive
transactions["receiver_id"] = np.abs(transactions["receiver_id"])

V. DATA WRANGLING

Problem #1: Find top 3 product_ids with the highest volume.

# Merge payment_report df with product df
payment_enriched = pd.merge(payment_report, product, on="product_id", how="outer")

# Find top 3 products
volume_by_product = payment_enriched.groupby("product_id")["volume"].sum()
volume_by_product.nlargest(3)

product_id	volume
1976	6.179758e+10
429	1.466768e+10
372	1.371366e+10

➡️ Top 3 products are ones having id: 1976, 429, 372.

Problem #2: Given that 1 product_id is only owed by 1 team, are there any abnormal products against this rule?

team_by_product = payment_enriched.groupby("product_id").agg({"team_own":"count"})
team_by_product[team_by_product["team_own"] != 1]

product_id	team_own
3	0
12	4
15	2
17	4
18	4
...	...
2356	3
2587	2
10033	0
10039	4
15067	3

➡️ Above is a list of all abnormal products, which can be used for further correction activities.

Problem #3: Find the team has had the lowest performance (lowest volume) since Q2.2023. Find the category that contributes the least to that team.

1. Lowest-performing team:

# Filter values in Q2.2023
q2_2023 = payment_enriched[payment_enriched["report_month"].isin(["2023-04","2023-05","2023-06"])]

# Calculate volume by team
volume_by_team = q2_2023.groupby("team_own").agg({"volume":"sum"})

# Identify lowest performance team
volume_by_team["volume"].nsmallest(1)

team_own	volume
APS	51141753.0

2. Least-contributing category of such team:

# Filter values of APS team
aps = payment_enriched[payment_enriched["team_own"] == "APS"]

# Calculate volume by category
volume_by_category = aps.groupby("category").agg({"volume":"sum"})

# Identify lowest contributed category
volume_by_category["volume"].nsmallest(1)

category	volume
PXXXXXO	0.0

➡️ Team with lowest performance since Q2.2023 was APS with PXXXXXO being the lowest contributed category.

Problem #4: Find the contribution of source_ids of refund transactions (payment_group = ‘refund’), what is the source_id with the highest contribution?

# Filter 'refund' payment group
refund = payment_enriched[payment_enriched["payment_group"] == "refund"]

# Calculate volume by source_id
volume_by_souce = refund.groupby("source_id").agg({"volume":"sum"})

# Identify source_id having highest volume
volume_by_souce["volume"].nlargest(1)

source_id	volume
38.0	3.652745e+10

➡️Source number #38 contributed most refund transactions. This source should be further examined to find high-refund causes.

Problem #5: Categorize type of transactions (‘transaction_type’) for each row, given:

transType = 2 & merchant_id = 1205: Bank Transfer Transaction
transType = 2 & merchant_id = 2260: Withdraw Money Transaction
transType = 2 & merchant_id = 2270: Top Up Money Transaction
transType = 2 & others merchant_id: Payment Transaction
transType = 8, merchant_id = 2250: Transfer Money Transaction
transType = 8 & others merchant_id: Split Bill Transaction
Remained cases are invalid transactions

# Create list of Transaction type
transaction_category = ["Bank Transfer Transaction", "Withdraw Money Transaction", "Top Up Money Transaction", "Payment Transaction", "Transfer Money Transaction", "Split Bill Transaction"]

# Create list of Conditions
conditions = [
              ((transactions["transType"]==2)&(transactions["merchant_id"]==1205)),
              ((transactions["transType"]==2)&(transactions["merchant_id"]==2260)),
              ((transactions["transType"]==2)&(transactions["merchant_id"]==2270)),
              (transactions["transType"]==2),
              ((transactions["transType"]==8)&(transactions["merchant_id"]==2250)),
              (transactions["transType"]==8)
              ]

# Create new column 'transaction_type'
transactions["transaction_type"] = np.select(conditions, transaction_category, default="Invalid Transaction")

print(transactions.head())

transaction_id	merchant_id	volume	transType	transStatus	sender_id	receiver_id	extra_info	timeStamp	transaction_type
3002692434	5	100000	24	1	10199794.0	199794.0	NaN	1682932054455	Invalid Transaction
3002692437	305	20000	2	1	14022211.0	14022211.0	NaN	1682932054912	Payment Transaction
3001960110	7255	48605	22	1	NaN	10530940.0	NaN	1682932055000	Invalid Transaction
3002680710	2270	150000	2	1	10059206.0	59206.0	NaN	1682932055622	Top Up Money Transaction
3002680713	2275	90000	2	1	10004711.0	4711.0	NaN	1682932056197	Payment Transaction

➡️ All transactions are categorized, which is helpful for deeper analysis.

Problem #6: Of each transaction type (excluding invalid transactions): find the number of transactions, volume, senders and receivers.

# Count transaction, volume, senders, receivers of each transaction type
grouped_rs_transaction = transactions.groupby("transaction_type").agg(
                                    transaction_count=("transaction_id","count"),
                                    volume_count=("volume","count"),
                                    sender_count=("sender_id","count"),
                                    receiver_count=("receiver_id","count")
                                  ).reset_index()

# Exclude invalid transactions
grouped_rs_transaction[grouped_rs_transaction["transaction_type"] != "Invalid Transaction"].reset_index(drop=True)

transaction_type	transaction_count	volume_count	sender_count	receiver_count
Bank Transfer Transaction	37,879	37,879	37,879	14,004
Payment Transaction	398,665	398,665	398,665	259,982
Split Bill Transaction	1,376	1,376	1,376	1,376
Top Up Money Transaction	290,498	290,498	290,498	290,498
Transfer Money Transaction	341,173	341,173	341,173	341,173
Withdraw Money Transaction	33,725	33,725	33,725	33,725

➡️ What can be seen?

Payment transactions dominate with 398,665 transactions and 259,982 receivers --> Core focus for optimization.
Bank transfers have 37,879 transactions but only 14,004 receivers --> Bulk payments or business payouts.
Top-up transactions are highly frequent at 290,498 --> Opportunity to incentivize higher deposits.
Split bill transactions have only 1,376 transactions --> Require better promotion or integration with social payments.

VI. CONCLUSION

In conclusion, my analysis of the e-Wallet performance using Python has uncovered valuable insights into payment transactions, product performance, and team contributions, which can inform future business decisions. The next step will involve visualizing or further analysis into some low performance aspects. Overall, this project demonstrates the effectiveness of employing Python to derive meaningful insights from multiple datasets.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
code_file.ipynb		code_file.ipynb
payment_report.csv		payment_report.csv
product.csv		product.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[Python] E-wallet Performance Analysis

I. INTRODUCTION

II. DATASETS

III. KEY AREAS TO ANALYZE

IV. DATA PREPARATION

V. DATA WRANGLING

Problem #1: Find top 3 product_ids with the highest volume.

Problem #2: Given that 1 product_id is only owed by 1 team, are there any abnormal products against this rule?

Problem #3: Find the team has had the lowest performance (lowest volume) since Q2.2023. Find the category that contributes the least to that team.

Problem #4: Find the contribution of source_ids of refund transactions (payment_group = ‘refund’), what is the source_id with the highest contribution?

Problem #5: Categorize type of transactions (‘transaction_type’) for each row, given:

Problem #6: Of each transaction type (excluding invalid transactions): find the number of transactions, volume, senders and receivers.

VI. CONCLUSION

About

Uh oh!

Releases

Packages

Languages

PHAMTHUYDUYEN/E-wallet-Performance-Analysis-Python

Folders and files

Latest commit

History

Repository files navigation

[Python] E-wallet Performance Analysis

I. INTRODUCTION

II. DATASETS

III. KEY AREAS TO ANALYZE

IV. DATA PREPARATION

V. DATA WRANGLING

Problem #1: Find top 3 product_ids with the highest volume.

Problem #2: Given that 1 product_id is only owed by 1 team, are there any abnormal products against this rule?

Problem #3: Find the team has had the lowest performance (lowest volume) since Q2.2023. Find the category that contributes the least to that team.

Problem #4: Find the contribution of source_ids of refund transactions (payment_group = ‘refund’), what is the source_id with the highest contribution?

Problem #5: Categorize type of transactions (‘transaction_type’) for each row, given:

Problem #6: Of each transaction type (excluding invalid transactions): find the number of transactions, volume, senders and receivers.

VI. CONCLUSION

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages