Tech Stack: Python, statsmodels, scikit-learn
Developed a machine learning model that forecasts energy consumption and predicts its price, helping energy companies, policymakers, and consumers make informed decisions about energy usage, pricing, and resource planning.
- The objective is to develop a machine learning model that accurately forecasts electricity consumption and predicts its price.
- The project will involve collecting historical data on electricity usage and pricing from data sources. This data will then be pre-processed and analyzed to identify patterns and trends in electricity consumption and pricing.
- Next, various machine learning algorithms such as time-series analysis, regression analysis, and neural networks will be applied to the data to create a predictive model.
- The model will be trained and validated using historical data, and its accuracy will be tested against new, unseen data.
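The train, validate, and test-on-unseen-data step can be sketched as a chronological holdout. This is a minimal illustration, not the project's actual pipeline: the monthly series is synthetic, and the lag features, the `LinearRegression` model, and the 6-month holdout are all assumptions chosen to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic monthly kWh series (illustrative only, not the project data).
rng = np.random.default_rng(0)
months = pd.date_range("2011-07-01", periods=36, freq="MS")
seasonal = 200 * np.sin(2 * np.pi * months.month.to_numpy() / 12)
series = pd.Series(1000 + seasonal + rng.normal(0, 20, len(months)),
                   index=months, name="kwh_usage")

# Lag features: usage one month ago and twelve months ago.
frame = pd.DataFrame({"y": series})
for lag in (1, 12):
    frame[f"lag_{lag}"] = series.shift(lag)
frame = frame.dropna()

# Chronological split: the last 6 months play the role of unseen data.
# Shuffling here would leak future information into training.
train, test = frame.iloc[:-6], frame.iloc[-6:]
features = ["lag_1", "lag_12"]

model = LinearRegression().fit(train[features], train["y"])
pred = model.predict(test[features])
mae = mean_absolute_error(test["y"], pred)
print(f"Holdout MAE: {mae:.1f} kWh")
```

Splitting by time rather than at random keeps every training row earlier than every test row, which is what "new, unseen data" means for a forecasting model.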
The data comes from the following link: Dataset Source. It consists of 4 Excel files:
- July 2011 to June 2012 Excel file - 57,430 rows and 24 columns
- May 2012 to April 2013 Excel file - 65,806 rows and 24 columns
- July 2012 to June 2013 Excel file - 66,776 rows and 24 columns
- July 2013 to June 2014 Excel file - 67,838 rows and 24 columns
The following is a brief summary of the data cleaning steps we performed:
- First, we identified missing data and decided, column by column, whether to impute the missing values or exclude the affected observations entirely.
- Next, we identified and corrected errors and inconsistencies in the data, such as incorrect values, and standardized the format of the date column.
- We also removed duplicate rows and standardized the data format across tables. Because the four tables cover overlapping time periods, the same records can appear in more than one table, and we had to account for that when combining them.
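The cleaning steps above can be sketched with pandas. This is a hedged illustration under stated assumptions: two tiny in-memory tables stand in for the Excel files, the `clean` helper is hypothetical, duplicates are identified by the pair (`ESID`, `Bill Date`), and missing `kWh Usage` values are imputed with the median. The project's actual rules may differ per column.

```python
import pandas as pd

# Two small stand-in tables with an overlapping billing period
# (hypothetical values; the real project loads four Excel files).
t1 = pd.DataFrame({
    "ESID": [1, 1, 2],
    "Bill Date": ["2012-05-01", "2012-06-01", "2012-06-01"],
    "kWh Usage": [500.0, None, 300.0],
    "Total Due": [50.0, 55.0, 30.0],
})
t2 = pd.DataFrame({
    "ESID": [1, 2, 2],
    "Bill Date": ["2012-06-01", "2012-06-01", "2012-07-01"],
    "kWh Usage": [None, 300.0, 320.0],
    "Total Due": [55.0, 30.0, 32.0],
})


def clean(tables):
    """Combine tables, parse dates, drop overlapping rows, impute usage."""
    df = (
        pd.concat(tables, ignore_index=True)
        # Unparseable dates become NaT and are dropped in the next step.
        .assign(**{"Bill Date": lambda d: pd.to_datetime(d["Bill Date"],
                                                         errors="coerce")})
        .dropna(subset=["Bill Date"])
        # Assumption: one bill per service location per bill date.
        .drop_duplicates(subset=["ESID", "Bill Date"])
    )
    # Assumption: median imputation for missing usage values.
    usage = df["kWh Usage"].fillna(df["kWh Usage"].median())
    df = df.assign(**{"kWh Usage": usage})
    return df.sort_values(["ESID", "Bill Date"]).reset_index(drop=True)


cleaned = clean([t1, t2])
print(cleaned)
```

Deduplicating on a key rather than on whole rows makes the overlap handling explicit, at the cost of assuming that the key really is unique in the source data.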
The data tables contain information on the building address, location, service number, billing dates, and total amount due. The columns are:
- Reliant Contract No: A unique identifier for each contract.
- Service Address: Address for the service location
- Meter No: Meter number for the service location.
- ESID: Electric Service Identifier for the service location.
- Business Area: Business area code for the service location.
- Cost Center: Cost center code for the service location.
- Fund: Fund code for the service location.
- Bill Type: Type of bill (e.g. "T" for "Total", "P" for "Partial", etc.).
- Bill Date: Date the bill was generated.
- Read Date: Date the meter was read.
- Due Date: Due date for the bill.
- Meter Read: Meter reading for the service location.
- Base Cost: Base cost for the service.
- T&D Discretionary: Transmission and Distribution Discretionary charge for the service.
- T&D Charges: Transmission and Distribution charge for the service.
- Current Due: Current due amount for the service.
- Index Charge: Index charge for the service.
- Total Due: Total due amount for the service.
- Franchise Fee: Franchise fee for the service.
- Voucher Date: Date the voucher was issued for the service.
- Billed Demand: Billed demand for the service in kVA.
- kWh Usage: Kilowatt-hour usage for the service.
- Nodal Cu Charge: Nodal Cu Charge for the service.
- Adder Charge: Adder Charge for the service.
- Reliant Contract No: integer (nominal)
- Service Address: string (nominal)
- Meter No: integer (nominal)
- ESID: integer (nominal)
- Business Area: integer (nominal)
- Cost Center: integer (nominal)
- Fund: integer (nominal)
- Bill Type: string (nominal)
- Bill Date: date (interval)
- Read Date: date (interval)
- Due Date: date (interval)
- Meter Read: integer (ratio)
- Base Cost: float (ratio)
- T&D Discretionary: float (ratio)
- T&D Charges: float (ratio)
- Current Due: float (ratio)
- Index Charge: float (ratio)
- Total Due: float (ratio)
- Franchise Fee: float (ratio)
- Voucher Date: date (interval)
- Billed Demand (KVA): integer (ratio)
- kWh Usage: integer (ratio)
- Nodal Cu Charge: float (ratio)
- Adder Charge: float (ratio)
- A key challenge in electricity generation is determining how much capacity is needed to meet future demand.
- Electricity usage forecasting involves predicting the demand for electricity over a specific period. This process has several uses, including energy procurement, where it helps suppliers purchase the right amount of energy to ensure a steady supply.
- The advancement of smart infrastructure and the integration of distributed renewable power have increased uncertainty about future supply, demand, and pricing. This unpredictability has increased interest in price prediction and energy analysis.
- Historical electricity usage data can be used to predict future usage (Time-Series) - Hyndavi
- Areas can be grouped based on their energy consumption (Clustering) - Sunil
- Electricity usage can be predicted from correlated features (Regression) - Sourabh
- Bill type can be classified using features in the data (Classification) - Sharmisha
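As one illustration of the list above, the clustering idea can be sketched with scikit-learn. Everything specific here is an assumption made for the example: the billing values are invented, areas are profiled by their mean `kWh Usage` and `Total Due`, and `KMeans` with two clusters is just one reasonable choice of algorithm and cluster count.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented billing rows (not project data): two low-usage areas
# and two high-usage areas.
bills = pd.DataFrame({
    "Business Area": [2000, 2000, 3000, 3000, 4000, 4000, 5000, 5000],
    "kWh Usage": [120, 150, 9000, 9500, 130, 110, 8800, 9100],
    "Total Due": [15.0, 18.0, 900.0, 950.0, 16.0, 14.0, 880.0, 910.0],
})

# One consumption profile per business area.
profile = bills.groupby("Business Area")[["kWh Usage", "Total Due"]].mean()

# Scale features so that kWh and dollar amounts contribute comparably.
scaled = StandardScaler().fit_transform(profile)

# Assumption: two clusters; in practice the count would be tuned.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
profile["cluster"] = kmeans.fit_predict(scaled)
print(profile)
```

The cluster labels themselves are arbitrary integers; what matters is which areas end up grouped together.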