Multi-Touch Attribution (MTA)

A Python library for multi-touch attribution modeling in marketing analytics. It implements ten attribution models, from simple heuristics to Markov chain, Shapley value, and survival-based methods, to help marketers estimate how much each touchpoint in the customer journey contributes to conversion.

🎯 Features

Attribution Models Implemented

  • First Touch: 100% credit to the first interaction
  • Last Touch: 100% credit to the last interaction before conversion
  • Linear: Equal credit distribution across all touchpoints
  • Position-Based (U-Shaped): Customizable weights for first/last touch with remaining credit distributed to middle touches
  • Time Decay: Higher credit to more recent touchpoints
  • Markov Chain: Probabilistic model using transition matrices
  • Shapley Value: Game-theoretic fair allocation based on marginal contributions
  • Shao's Model: Probabilistic Shapley-equivalent approach
  • Logistic Regression: Machine learning-based ensemble attribution
  • Additive Hazard: Survival analysis-based attribution
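
To make the simplest definitions above concrete, here is a minimal standalone sketch of first-touch, last-touch, and linear credit for a single converting path. It does not use or reproduce the library's code; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def first_touch_credit(path):
    """All credit goes to the first channel in the path."""
    return {path[0]: 1.0}


def last_touch_credit(path):
    """All credit goes to the last channel before conversion."""
    return {path[-1]: 1.0}


def linear_credit(path):
    """Every touchpoint receives an equal share of one conversion."""
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)


path = ["alpha", "beta", "gamma"]
print(first_touch_credit(path))
print(last_touch_credit(path))
print(linear_credit(path))
```

A channel that appears more than once in a path accumulates one share per appearance in the linear sketch.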

📦 Installation

pip install mta

Or install from source:

git clone https://github.com/eeghor/mta.git
cd mta
pip install -e .

🚀 Quick Start

Basic Usage

from mta import MTA

# Initialize with your data
mta = MTA(data="your_data.csv", allow_loops=False, add_timepoints=True)

# Run a single attribution model
mta.linear(share="proportional", normalize=True)
mta.show()

# Chain multiple models
(mta.linear(share="proportional")
    .time_decay(count_direction="right")
    .markov(sim=False)
    .shapley()
    .show())

Using Configuration

from mta import MTA, MTAConfig

# Create custom configuration
config = MTAConfig(
    allow_loops=False,
    add_timepoints=True,
    sep=" > ",
    normalize_by_default=True
)

mta = MTA(data="data.csv", config=config)

Working with DataFrames

import pandas as pd
from mta import MTA

# Load your data
df = pd.read_csv("customer_journeys.csv")

# Initialize MTA with DataFrame
mta = MTA(data=df, allow_loops=False)

# Run attribution models
mta.first_touch().last_touch().linear().show()

📊 Data Format

Your input data should be a CSV file or pandas DataFrame with the following columns:

path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00

Columns:

  • path: Customer journey as channel names separated by > (or custom separator)
  • total_conversions: Number of conversions for this path
  • total_null: Number of non-conversions for this path
  • exposure_times: Timestamps of channel exposures (optional, can be auto-generated)
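
As a rough illustration of this layout, the sketch below parses the sample rows with the standard library only. It is independent of the library's own loader; `parse_journeys` and the output keys are hypothetical names, and the check that timestamps and channels have equal counts is an assumption about what well-formed data looks like.

```python
import csv
import io

SAMPLE = """path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
"""


def parse_journeys(text, sep=" > "):
    """Turn CSV text in the documented layout into a list of dicts."""
    journeys = []
    for row in csv.DictReader(io.StringIO(text)):
        channels = [c.strip() for c in row["path"].split(sep)]
        # exposure_times is optional, so tolerate a missing or empty value
        raw_times = row.get("exposure_times") or ""
        times = [t.strip() for t in raw_times.split(sep)] if raw_times else []
        if times and len(times) != len(channels):
            raise ValueError(
                f"{row['path']!r}: {len(times)} timestamps for {len(channels)} channels"
            )
        journeys.append(
            {
                "channels": channels,
                "conversions": int(row["total_conversions"]),
                "nulls": int(row["total_null"]),
                "times": times,
            }
        )
    return journeys


for journey in parse_journeys(SAMPLE):
    print(journey["channels"], journey["conversions"], journey["nulls"])
```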

🎨 Advanced Usage

Position-Based Attribution with Custom Weights

# Give 30% to first touch, 30% to last touch, 40% distributed to middle
mta.position_based(first_weight=30, last_weight=30, normalize=True)
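
The arithmetic behind a 30/30/40 split can be sketched for a single path as follows. This is a standalone illustration, not the library's implementation: the function name is hypothetical, and the handling of one-touch and two-touch paths (where there are no middle touches) is an assumption made here to keep the shares summing to 1.

```python
def position_based_credit(path, first_weight=30.0, last_weight=30.0):
    """Split one conversion's credit across a path; weights are percentages."""
    if not path:
        raise ValueError("path must contain at least one channel")
    if min(first_weight, last_weight) < 0 or first_weight + last_weight > 100:
        raise ValueError("weights must be non-negative and sum to at most 100")

    credit = {}

    def add(channel, amount):
        credit[channel] = credit.get(channel, 0.0) + amount

    n = len(path)
    if n == 1:
        add(path[0], 1.0)
    elif n == 2:
        # Assumption: with no middle touches, rescale the two end weights to sum to 1.
        total = first_weight + last_weight
        first_share = first_weight / total if total else 0.5
        add(path[0], first_share)
        add(path[1], 1.0 - first_share)
    else:
        add(path[0], first_weight / 100)
        add(path[-1], last_weight / 100)
        # The remaining percentage is shared equally by the middle touches.
        middle_share = (100 - first_weight - last_weight) / 100 / (n - 2)
        for channel in path[1:-1]:
            add(channel, middle_share)
    return credit


print(position_based_credit(["alpha", "beta", "gamma", "delta"]))
```

For a four-touch path, the two middle channels each receive half of the remaining 40%.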

Time Decay with Direction Control

# Count from left (earliest gets lowest credit)
mta.time_decay(count_direction="left")

# Count from right (latest gets highest credit - more common)
mta.time_decay(count_direction="right")
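
One simple way to give later touchpoints more credit is to weight each touch by its position in the path. The sketch below assumes this rank-based scheme purely for illustration; the weighting the library actually applies, and the exact effect of `count_direction`, may differ. The function name is hypothetical.

```python
def rank_decay_credit(path):
    """Weight the k-th touch (1-based, oldest first) by k / (1 + 2 + ... + n)."""
    n = len(path)
    denominator = n * (n + 1) / 2
    credit = {}
    for rank, channel in enumerate(path, start=1):
        # Later touches have a higher rank and therefore a larger share.
        credit[channel] = credit.get(channel, 0.0) + rank / denominator
    return credit


print(rank_decay_credit(["alpha", "beta", "gamma"]))
```

With three touches the shares are 1/6, 2/6, and 3/6, so the touch closest to conversion earns the most.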

Markov Chain Attribution

# Analytical calculation (faster)
mta.markov(sim=False, normalize=True)

# Simulation-based (more flexible, handles complex scenarios)
mta.markov(sim=True, normalize=True)
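
Markov chain attribution is commonly described in terms of removal effects: estimate the probability of conversion from a transition matrix, then measure how much it drops when a channel is removed. The following standalone sketch assumes that formulation and a first-order chain; it is not taken from the library, and all names in it are hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """journeys: iterable of (path, conversions, nulls) tuples."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, n_conv, n_null in journeys:
        total = n_conv + n_null
        states = [START] + list(path)
        for a, b in zip(states, states[1:]):
            counts[a][b] += total
        counts[path[-1]][CONV] += n_conv
        counts[path[-1]][NULL] += n_null
    return {
        a: {b: c / sum(outs.values()) for b, c in outs.items()}
        for a, outs in counts.items()
    }


def conversion_prob(probs, removed=None, tol=1e-12, max_iter=10_000):
    """Probability of reaching CONV from START; a removed channel acts like NULL."""
    p = {state: 0.0 for state in probs}
    for _ in range(max_iter):
        delta = 0.0
        for state, outs in probs.items():
            if state == removed:
                continue
            new = 0.0
            for target, weight in outs.items():
                if target == CONV:
                    new += weight
                elif target != NULL and target != removed:
                    new += weight * p[target]
            delta = max(delta, abs(new - p[state]))
            p[state] = new
        if delta < tol:
            break
    return p[START]


def removal_effect_shares(journeys):
    """Normalized removal effect per channel."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


journeys = [(["alpha", "beta", "gamma"], 10, 5), (["beta", "gamma"], 5, 3)]
print(removal_effect_shares(journeys))
```

In this toy data every converting path passes through beta and gamma, so removing either one eliminates all conversions, while removing alpha only eliminates the paths that start with it.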

Shapley Value Attribution

# With custom coalition size
mta.shapley(max_coalition_size=3, normalize=True)
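
For intuition, the textbook Shapley formula can be applied directly to path data. The sketch below assumes a coalition's value is the number of conversions from paths that use only channels in that coalition; this value function and the function names are assumptions for illustration, and the library's own definition and its `max_coalition_size` truncation are not reproduced here.

```python
from itertools import combinations
from math import factorial


def coalition_value(coalition, journeys):
    """Conversions from paths whose channels all belong to the coalition."""
    return sum(n_conv for path, n_conv in journeys if set(path) <= coalition)


def shapley_values(journeys):
    """journeys: iterable of (path, conversions) tuples."""
    channels = sorted({ch for path, _ in journeys for ch in path})
    n = len(channels)
    phi = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                marginal = coalition_value(s | {ch}, journeys) - coalition_value(s, journeys)
                total += weight * marginal
        phi[ch] = total
    return phi


journeys = [(["alpha", "beta", "gamma"], 10), (["beta", "gamma"], 5)]
print(shapley_values(journeys))
```

The values sum to the 15 conversions of the full channel set. Because every subset of channels is enumerated, the cost grows exponentially with the number of channels.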

Logistic Regression Ensemble

# Custom sampling and iteration parameters
mta.logistic_regression(
    test_size=0.25,
    sample_rows=0.5,
    sample_features=0.5,
    n_iterations=1000,
    normalize=True
)

Export Results

# Compare all models
results_df = mta.compare_models()

# Export to various formats
mta.export_results("attribution_results.csv", format="csv")
mta.export_results("attribution_results.json", format="json")
mta.export_results("attribution_results.xlsx", format="excel")

📈 Example: Complete Analysis Pipeline

from mta import MTA

# Load data
mta = MTA(
    data="customer_journeys.csv",
    allow_loops=False,  # Remove consecutive duplicate channels
    add_timepoints=True  # Auto-generate timestamps if missing
)

# Run all heuristic models
(mta
    .first_touch()
    .last_touch()
    .linear(share="proportional")
    .position_based(first_weight=40, last_weight=40)
    .time_decay(count_direction="right"))

# Run algorithmic models
(mta
    .markov(sim=False)
    .shapley(max_coalition_size=2)
    .shao()
    .logistic_regression(n_iterations=2000)
    .additive_hazard(epochs=20))

# Display and export results
results = mta.compare_models()
mta.export_results("full_attribution_analysis.csv")

# Access specific model results
print(f"Markov Attribution: {mta.attribution['markov']}")
print(f"Shapley Attribution: {mta.attribution['shapley']}")

🔬 Model Comparison

| Model | Type | Strengths | Use Case |
| --- | --- | --- | --- |
| First/Last Touch | Heuristic | Simple, fast | Quick baseline |
| Linear | Heuristic | Fair, interpretable | Equal value assumption |
| Position-Based | Heuristic | Balances first/last | Awareness + conversion focus |
| Time Decay | Heuristic | Recency-weighted | When recent matters more |
| Markov Chain | Algorithmic | Considers path structure | Sequential dependency |
| Shapley Value | Algorithmic | Game-theoretic fairness | Complex interactions |
| Logistic Regression | Machine Learning | Data-driven | Large datasets |
| Additive Hazard | Statistical | Time-to-event modeling | Survival analysis fans |

🛠️ Requirements

  • Python >= 3.8
  • pandas >= 1.3.0
  • numpy >= 1.20.0
  • scikit-learn >= 0.24.0
  • arrow >= 1.0.0

📝 Citation

If you use this library in your research, please cite:

@software{mta2024,
  author = {Igor Korostil},
  title = {MTA: Multi-Touch Attribution Library},
  year = {2024},
  url = {https://github.com/eeghor/mta}
}

📚 References

This library implements models and techniques from the following research papers:

  1. Nisar, T. M., & Yeung, M. (2015)
    Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation

  2. Shao, X., & Li, L. (2011)
    Data-driven Multi-touch Attribution Models
    Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  3. Dalessandro, B., Perlich, C., Stitelman, O., & Provost, F. (2012)
    Causally Motivated Attribution for Online Advertising
    Proceedings of the Sixth International Workshop on Data Mining for Online Advertising

  4. Cano-Berlanga, S., Giménez-Gómez, J. M., & Vilella, C. (2017)
    Attribution Models and the Cooperative Game Theory
    Expert Systems with Applications, 87, 277-286

  5. Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018)
    Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising
    Proceedings of the 27th ACM International Conference on Information and Knowledge Management

  6. Zhang, Y., Wei, Y., & Ren, J. (2014)
    Multi-Touch Attribution in Online Advertising with Survival Theory
    2014 IEEE International Conference on Data Mining

  7. Geyik, S. C., Saxena, A., & Dasdan, A. (2014)
    Multi-Touch Attribution Based Budget Allocation in Online Advertising
    Proceedings of the 8th International Workshop on Data Mining for Online Advertising

Model-to-Paper Mapping

  • Linear & Position-Based: Baseline models referenced across multiple papers
  • Time Decay: Nisar & Yeung (2015), Zhang et al. (2014)
  • Markov Chain: Shao & Li (2011), Dalessandro et al. (2012)
  • Shapley Value: Cano-Berlanga et al. (2017)
  • Logistic Regression: Dalessandro et al. (2012), Ren et al. (2018)
  • Additive Hazard: Zhang et al. (2014)

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Inspired by various academic papers on marketing attribution
  • Built with pandas, numpy, and scikit-learn
  • Special thanks to the open-source community

📧 Contact

Igor Korostil - eeghor@gmail.com

Project Link: https://github.com/eeghor/mta

🐛 Known Issues

  • Shapley value computation can be slow for large numbers of channels
  • Additive hazard model requires evenly-spaced time points for best results
