Tutorial Casper Ext
The Casper dataset extensions provide additional classes and functionality not found elsewhere. In particular, they provide:
- A static utility class of various methods (CasperUtil)
- A casper file definition that provides a double array from the columns of a dataset (CDataFileDoubleArray)
- A casper file definition that provides a Map from the columns of a dataset (CDataFileMap)
- Narrowing functionality, which converts each column in a casper dataset to the smallest value type that can hold all of its items while retaining fidelity (CBuildNarrowed). This is useful for datasets created from sources that have no data type associated with columns (e.g. CSV files) or only loosely specified data types (e.g. Excel files), where the data type can't be specified at the time of loading (CBuildNarrowedFile).
- Export a dataset as a CSV string. Useful for comparing datasets. (CExportCSVString)
- A Swing GUI TableModel wrapper for a Casper dataset (CDatasetTableModel)
Loading and narrowing a CSV file
Suppose you have the file patients.csv and cannot specify the type of each column in advance:
refnum,sex,crefnum,question,age,weight
1,M,,2,45,841.9098462
2,F,,1,33,7.587573231
,F,1000,,4,1696.537051
,M,2000,,5,0.123456789012345
You can load and narrow patients.csv in a single step like this:
File patientsCSV = new File("patients.csv");
CBuilder builder = new CBuildNarrowedFile(patientsCSV);
CDataCacheContainer container = new CDataCacheContainer(builder);
This will provide a dataset with the following metadata:
{refnum:java.lang.String} {sex:java.lang.Character} {crefnum:java.lang.String} {question:java.lang.String} {age:java.lang.Integer} {weight:java.lang.Double}
Note how refnum, crefnum and question are all String and not Integer. This is because they contain missing values, which are interpreted as the empty string, making String the narrowest possible type for those columns. Alternatively, you can use the setConvertMissing(true) option, which converts missing integers and doubles to an integer/double representation, e.g.:
File patientsCSV = new File("patients.csv");
CBuilder builder = new CBuildNarrowedFile(patientsCSV).setConvertMissing(true);
CDataCacheContainer container = new CDataCacheContainer(builder);
which will produce a dataset with this metadata:
{refnum:java.lang.Integer} {sex:java.lang.Character} {crefnum:java.lang.Integer} {question:java.lang.Integer} {age:java.lang.Integer} {weight:java.lang.Double}
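The narrowing rule described above can be illustrated without Casper at all. The following is only a minimal sketch in plain Java, not the library's implementation: the class NarrowSketch and its methods are hypothetical names, and the rule it applies (Integer first, then Double, then Character, falling back to String, with missing values forcing String unless they are converted in a numeric column) is an assumption inferred from the metadata shown in this tutorial.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of column narrowing; not part of Casper.
public class NarrowSketch {

    // True if the value parses as a 32-bit integer.
    static boolean isInteger(String v) {
        try {
            Integer.parseInt(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the value parses as a double.
    static boolean isDouble(String v) {
        try {
            Double.parseDouble(v);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the narrowest of Integer, Double, Character or String
    // that can hold every value in the column.
    static Class<?> narrow(List<String> column, boolean convertMissing) {
        boolean allInt = true, allDouble = true, allChar = true;
        boolean sawMissing = false, sawValue = false;
        for (String v : column) {
            if (v.isEmpty()) {          // an empty CSV cell is a missing value
                sawMissing = true;
                continue;
            }
            sawValue = true;
            allInt &= isInteger(v);
            allDouble &= isDouble(v);
            allChar &= v.length() == 1;
        }
        if (!sawValue) {
            return String.class;        // nothing to infer a type from
        }
        boolean numeric = allInt || allDouble;
        // Assumption: a missing value stays an empty string unless it is
        // converted, and only numeric columns can be converted.
        if (sawMissing && !(convertMissing && numeric)) {
            return String.class;
        }
        if (allInt) return Integer.class;
        if (allDouble) return Double.class;
        if (allChar) return Character.class;
        return String.class;
    }

    public static void main(String[] args) {
        // The columns of patients.csv, one list per column.
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("refnum", Arrays.asList("1", "2", "", ""));
        columns.put("sex", Arrays.asList("M", "F", "F", "M"));
        columns.put("crefnum", Arrays.asList("", "", "1000", "2000"));
        columns.put("question", Arrays.asList("2", "1", "", ""));
        columns.put("age", Arrays.asList("45", "33", "4", "5"));
        columns.put("weight", Arrays.asList(
                "841.9098462", "7.587573231", "1696.537051", "0.123456789012345"));

        for (boolean convert : new boolean[] {false, true}) {
            StringBuilder line = new StringBuilder("convertMissing=" + convert + ":");
            for (Map.Entry<String, List<String>> e : columns.entrySet()) {
                line.append(" {").append(e.getKey()).append(':')
                    .append(narrow(e.getValue(), convert).getName()).append('}');
            }
            System.out.println(line);
        }
    }
}
```

Running main on the patients.csv columns should report the same column types as the two metadata listings in this section, which is a quick way to check your expectations before loading a real file. Note that Double.parseDouble is more permissive than a typical CSV reader (it accepts values such as "NaN" or "1d"), so a real implementation would likely use stricter parsing.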