TresAmigosSD/MVD

MVD

Model Variable Development

This is a demo big data application project built with SMV.

Install SMV

  • Install Spark version 1.1
  • Download SMV with git clone
  • Compile SMV with mvn clean package install

Setup MVD

  • Download MVD with git clone
  • Compile MVD with mvn package
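Before building, it can help to confirm that the command-line tools these steps rely on (git, mvn, and spark-submit) are on the PATH. The following is a minimal sketch of such a check; the function name missing_tools is hypothetical and not part of SMV or MVD, and the check does not verify the Spark version.

```shell
#!/bin/sh
# Hypothetical helper (not part of SMV or MVD): print the name of every
# tool passed as an argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The tool list comes from the install and setup steps above.
missing=$(missing_tools git mvn spark-submit)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All build tools found"
fi
```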

Get Data

Delete the second line of the raw data file and save the result as cms_raw_no_2nd_line.csv:

$ sed -e '2d' Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt > cms_raw_no_2nd_line.csv
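As a sanity check on this step, the output should be exactly one line shorter than the input. The sketch below wraps the same sed expression in a helper that verifies this; the name drop_second_line is hypothetical, and the helper deliberately fails if the input has fewer than two lines.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST without its second line (same sed
# expression as in the step above), then confirm that DST is exactly one
# line shorter than SRC.
drop_second_line() {
  src=$1
  dst=$2
  sed -e '2d' "$src" > "$dst" || return 1
  # tr strips the padding that some wc implementations add.
  src_lines=$(wc -l < "$src" | tr -d ' ')
  dst_lines=$(wc -l < "$dst" | tr -d ' ')
  [ "$dst_lines" -eq "$((src_lines - 1))" ]
}

# Example, using the file names from the step above:
# drop_second_line Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt cms_raw_no_2nd_line.csv
```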

Run Demos

  • Discover Schema
$ spark-submit --master local[2] --class org.tresamigos.mvd.projectcms.adhoc.DiscoverSchema target/mvd-1.0-SNAPSHOT-jar-with-dependencies.jar

A directory "cms_raw_no_2nd_line.schema" should be generated under data/cms/input. From data/cms/input, move the "part-00000" file out of that directory, remove the directory, and rename the file to "cms_raw_no_2nd_line.schema":

$ mv cms_raw_no_2nd_line.schema/part-00000 .
$ rm -rf cms_raw_no_2nd_line.schema
$ mv part-00000 cms_raw_no_2nd_line.schema

Review the schema file and change the types of the following fields:

npi: String
bene_unique_cnt: Float
bene_day_srvc_cnt: Float
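The schema edits can also be scripted instead of made by hand. The sketch below assumes the schema file holds one "name: Type" entry per line, as the entries listed above suggest, and that everything after the colon is the type; if the generated file carries anything else on those lines, edit it manually. The function name set_field_type is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set the type of one field in a schema file.
# Assumes one "name: Type" entry per line; the whole text after the colon
# is replaced by the new type.
set_field_type() {
  file=$1
  field=$2
  type=$3
  # Write to a temporary file first, since in-place flags for sed differ
  # across platforms.
  sed -e "s/^\([[:space:]]*$field\):.*/\1: $type/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Example, applying the three changes listed above:
# set_field_type cms_raw_no_2nd_line.schema npi String
# set_field_type cms_raw_no_2nd_line.schema bene_unique_cnt Float
# set_field_type cms_raw_no_2nd_line.schema bene_day_srvc_cnt Float
```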
  • Create CmsRaw
$ DATA_DIR=./data/cms spark-submit --master local[2] --class org.tresamigos.mvd.projectcms.core.CmsApp target/mvd-1.0-SNAPSHOT-jar-with-dependencies.jar -d org.tresamigos.mvd.projectcms.phase1.CmsRaw
  • Basic Aggregation
$ DATA_DIR=./data/cms spark-submit --master local[2] --class org.tresamigos.mvd.projectcms.core.CmsApp target/mvd-1.0-SNAPSHOT-jar-with-dependencies.jar -d org.tresamigos.mvd.projectcms.adhoc.Ex01SimpleAggregate
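The Create CmsRaw and Basic Aggregation commands differ only in the module passed to -d, so they can be wrapped in a small function. This is a sketch, not part of MVD: the names run_cms_module and DRY_RUN are hypothetical, while the jar path, application class, and DATA_DIR value are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the spark-submit invocations above.
# Set DRY_RUN=1 to print the command instead of running it.
run_cms_module() {
  module=$1
  jar=target/mvd-1.0-SNAPSHOT-jar-with-dependencies.jar
  app=org.tresamigos.mvd.projectcms.core.CmsApp
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "DATA_DIR=./data/cms spark-submit --master local[2] --class $app $jar -d $module"
  else
    # local[2] is quoted so the shell does not treat it as a glob pattern.
    DATA_DIR=./data/cms spark-submit --master 'local[2]' --class "$app" "$jar" -d "$module"
  fi
}

# Example:
# run_cms_module org.tresamigos.mvd.projectcms.phase1.CmsRaw
# run_cms_module org.tresamigos.mvd.projectcms.adhoc.Ex01SimpleAggregate
```

Quoting 'local[2]' matters in shells such as zsh, which report an error when an unquoted bracket pattern matches no file.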
