This Python script consolidates patient laboratory data from a CSV file, aggregating multiple entries for the same patient and collection date into a single row.
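As a rough illustration of the consolidation step, the sketch below groups rows by `Patient` and `CollectDate` and keeps the first non-null value in each measurement column. The aggregation rule, the `consolidate` function name, and the `Glucose` and `Sodium` columns are assumptions for illustration only; the actual script may combine values differently.

```python
import pandas as pd

def consolidate(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows that share Patient and CollectDate into one row."""
    keys = ["Patient", "CollectDate"]
    # Assumption: keep the first non-null value seen in each measurement column.
    # GroupBy.first() skips missing values by default.
    return df.groupby(keys, as_index=False, sort=False).first()

# Hypothetical sample data: patient A has two partial rows for the same date.
rows = pd.DataFrame({
    "Patient": ["A", "A", "B"],
    "CollectDate": ["2024-01-05", "2024-01-05", "2024-01-06"],
    "Glucose": [5.4, None, 6.1],
    "Sodium": [None, 140.0, None],
})
merged = consolidate(rows)
print(merged)
```

Patient A's two partial rows become a single row holding both measurements, while patient B's row passes through unchanged.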
- Processes large CSV files with multiple rows per patient
- Handles mixed data types
- Chunks output into multiple files for easier management
- Robust error handling for data type conversions
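One common way to handle mixed data types in a measurement column is `pandas.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN` instead of raising. The sketch below shows that approach; the `coerce_numeric` helper and the sample values are hypothetical, and the script itself may handle conversions another way.

```python
import pandas as pd

def coerce_numeric(series: pd.Series) -> pd.Series:
    """Convert a mixed-type column to floats; unparseable entries become NaN."""
    return pd.to_numeric(series, errors="coerce")

# Hypothetical column mixing numeric strings, free text, missing values, and ints.
raw = pd.Series(["5.4", "6.1", "pending", None, 7])
values = coerce_numeric(raw)

# Entries that were present in the input but could not be parsed as numbers.
failed = values.isna() & raw.notna()
print(values.tolist())
print("unparseable entries:", int(failed.sum()))
```

Counting the coerced entries separately makes it possible to report how many values were dropped rather than losing them silently.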
- Python 3.8+
- pandas library
- numpy library
git clone <your-repository-url>
cd patient-data-consolidation
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install pandas numpy
python consolidate_patient_data.py

- Modify `input_file` in the script to process different CSV files
- Adjust the `chunk_size` parameter to control output file sizes
- Customize `output_dir` to specify the output location
- CSV file with patient data
- Columns should include:
  - Patient
  - CollectDate
  - Various measurement columns
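Before running a long job, it can help to confirm that the input file has the expected columns. The sketch below reads only the header row and reports any missing required column. The `missing_columns` helper is hypothetical and not part of the script.

```python
import io
import pandas as pd

REQUIRED_COLUMNS = {"Patient", "CollectDate"}

def missing_columns(csv_source) -> set:
    """Return the required column names that are absent from the CSV header."""
    # nrows=0 parses only the header line, so this stays cheap for large files.
    header = pd.read_csv(csv_source, nrows=0)
    return REQUIRED_COLUMNS - set(header.columns)

# In-memory stand-ins for real files; a file path works the same way.
good = io.StringIO("Patient,CollectDate,Glucose\nA,2024-01-05,5.4\n")
bad = io.StringIO("patient,Date,Glucose\nA,2024-01-05,5.4\n")

missing_good = missing_columns(good)
missing_bad = missing_columns(bad)
print(missing_good, missing_bad)
```

Column names are matched case-sensitively here, so a header of `patient` is reported as missing `Patient`.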
- Multiple CSV files in the specified output directory
- `chunk_summary.txt` with processing details
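The chunked output could be produced along the following lines. This is a sketch only: the `write_chunks` helper, the `consolidated_part_NNN.csv` file name pattern, and the layout of `chunk_summary.txt` are assumptions, since the script's actual naming and summary format are not documented here.

```python
import tempfile
from pathlib import Path

import pandas as pd

def write_chunks(df: pd.DataFrame, output_dir, chunk_size: int) -> list:
    """Write df as numbered CSV files of at most chunk_size rows, plus a summary."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for n, start in enumerate(range(0, len(df), chunk_size), start=1):
        chunk = df.iloc[start:start + chunk_size]
        path = out / f"consolidated_part_{n:03d}.csv"  # hypothetical name pattern
        chunk.to_csv(path, index=False)
        written.append((path.name, len(chunk)))
    # Hypothetical summary layout: total row count, then one line per chunk file.
    with (out / "chunk_summary.txt").open("w", encoding="utf-8") as fh:
        fh.write(f"total_rows: {len(df)}\n")
        for name, row_count in written:
            fh.write(f"{name}: {row_count} rows\n")
    return written

demo = pd.DataFrame({"Patient": list("ABCDE"), "Glucose": [5.4, 6.1, 5.9, 7.2, 4.8]})
with tempfile.TemporaryDirectory() as tmp:
    files = write_chunks(demo, tmp, chunk_size=2)
    summary_text = (Path(tmp) / "chunk_summary.txt").read_text(encoding="utf-8")
print(files)
```

With five rows and `chunk_size=2`, this writes three CSV files, the last holding the single leftover row.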
- Script is optimized for files with hundreds of thousands of rows
- Uses memory-efficient processing techniques
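One memory-efficient technique that fits this kind of workload is reading the CSV in pieces with the `chunksize` argument of `pandas.read_csv`, consolidating each piece, then consolidating the partial results once more. Whether the script does exactly this is an assumption. The `consolidate_in_chunks` helper and the "first non-null value" rule below are illustrative only.

```python
import io
import pandas as pd

def consolidate_in_chunks(csv_source, rows_per_read: int) -> pd.DataFrame:
    """Consolidate a CSV without loading every raw row into memory at once."""
    keys = ["Patient", "CollectDate"]
    partials = []
    # With chunksize set, read_csv yields DataFrames of at most rows_per_read rows.
    for piece in pd.read_csv(csv_source, chunksize=rows_per_read):
        partials.append(piece.groupby(keys, as_index=False, sort=False).first())
    # A patient/date pair can span two pieces, so merge the partial results again.
    combined = pd.concat(partials, ignore_index=True)
    return combined.groupby(keys, as_index=False, sort=False).first()

# Hypothetical input: patient A's two rows fall into different pieces.
csv_text = (
    "Patient,CollectDate,Glucose,Sodium\n"
    "A,2024-01-05,5.4,\n"
    "B,2024-01-06,6.1,\n"
    "A,2024-01-05,,140\n"
)
result = consolidate_in_chunks(io.StringIO(csv_text), rows_per_read=2)
print(result)
```

The two-pass approach works here because taking the first non-null value gives the same answer whether it is applied once or in stages. An aggregation such as a mean would need running sums and counts instead.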
- Ensure all required libraries are installed
- Check input file format and column names
- Verify Python version compatibility