This notebook analyzes driver charging behavior in adaptive charging networks. The study has 3 main objectives:

Analyze the driver charging behavior before and after pricing policy changes for 2 sites
Analyze the driver charging behavior throughout the year of 2019 for 2 sites
Analyze independent driver charging behavior throughout the year of 2019 for 2 sites

The dataset used in this study is the ACN-Data dataset, proposed in [1]. The dataset contains 2 sites that will be further investigated in this study:

Caltech : 54 6.6kW level-2 EVSEs in one campus garage. The site is open to the public.
Jpl : 50 6.6kW level-2 EVSEs in a workplace environment. The site is used only by employees.

For reference, the capacity in each site is equal to:

Caltech : 150 kW.
Jpl : 195 kW.

Note: we do not consider the fast charger present at these sites.

Note 2: the website states that the Caltech site has 80 EVSEs and a 300 kW capacity.
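
To see why scheduling matters at these sites, a back-of-the-envelope check can compare the combined nameplate power of the EVSEs with the site capacity. The sketch below uses only the figures listed above; the variable names are illustrative and not part of the dataset.

```python
# Nameplate EVSE power vs. site capacity, using the figures listed above.
EVSE_POWER_KW = 6.6

sites = {
    "Caltech": {"n_evses": 54, "capacity_kw": 150},
    "Jpl": {"n_evses": 50, "capacity_kw": 195},
}

oversubscription = {}
for name, site in sites.items():
    nameplate_kw = site["n_evses"] * EVSE_POWER_KW
    # a ratio above 1 means the EVSEs cannot all run at full power simultaneously
    oversubscription[name] = nameplate_kw / site["capacity_kw"]
    print(f"{name}: {nameplate_kw:.1f} kW nameplate, ratio {oversubscription[name]:.2f}")
```

Both ratios are above 1, so with these figures the chargers cannot all deliver full power at the same time.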

For a description of the fields contained in the dataset, see ACN-Data.

Adaptive Charging Networks

Since this may be the first time the reader encounters the concept of Adaptive Charging Networks, we introduce it briefly. In a conventional charging network, a fixed power is provided to the EV at each timestep. In an Adaptive Charging Network, on the other hand, the power delivered at each timestep (also called the pilot signal) is controlled by an adaptive scheduling algorithm that optimizes a desired cost function, e.g., charging the EVs as fast as possible while meeting power capacity constraints.

In both sites, an adaptive scheduling algorithm is used to deliver each driver's requested energy prior to her stated departure without exceeding the infrastructure capacity.

The adaptive scheduling algorithm describes each EV as a tuple $(a_i, e_i, d_i, r_i)$ where $a_i$ is the EV's arrival time relative to the start of the optimization horizon, $e_i$ is its energy demand, $d_i$ is the duration of the session, and $r_i$ is the maximum charging rate for EV i.

The objective encourages EVs to finish charging as quickly as possible, freeing up capacity for future arrivals.
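
To make the tuple $(a_i, e_i, d_i, r_i)$ and the capacity constraint concrete, here is a minimal sketch of a greedy allocation for a single timestep. It is not the scheduling algorithm used in the ACN, which solves an optimization problem; the `EV` class, the `allocate_step` function and the earliest-departure-first rule are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EV:
    arrival: int     # a_i, in timesteps from the start of the horizon
    energy: float    # e_i, remaining energy demand in kWh
    duration: int    # d_i, session length in timesteps
    max_rate: float  # r_i, maximum charging rate in kW


def allocate_step(evs, t, capacity_kw, step_hours):
    """Greedy rates for timestep t: earliest departure first, capped by capacity."""
    rates = [0.0] * len(evs)
    remaining = capacity_kw
    present = [i for i, ev in enumerate(evs)
               if ev.arrival <= t < ev.arrival + ev.duration and ev.energy > 0]
    for i in sorted(present, key=lambda i: evs[i].arrival + evs[i].duration):
        ev = evs[i]
        # never exceed the EV's max rate, the leftover capacity, or its remaining demand
        rate = min(ev.max_rate, remaining, ev.energy / step_hours)
        rates[i] = rate
        remaining -= rate
    return rates


evs = [EV(0, 10.0, 8, 6.6), EV(0, 20.0, 4, 6.6), EV(2, 5.0, 6, 6.6)]
rates = allocate_step(evs, t=0, capacity_kw=10.0, step_hours=0.5)
print(rates)
```

In this toy case the EV that departs first receives its full rate, the second EV receives whatever capacity is left, and the EV that has not arrived yet receives nothing.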

The energy delivered may fall short of the user's requested energy, either because the battery becomes full or because of congestion in the system. (In this study we investigate whether such congestion occurs.)

Claimed/Unclaimed Sessions

To obtain data directly from users, a mobile application is used. Through it, the driver can input their estimated departure time and requested energy. This is referred to as user input data. Sessions with an associated user input are defined as claimed and those without it as unclaimed. Claimed sessions are useful for studying individual user behavior and are the main focus of this study.
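
As a minimal sketch of this distinction, the toy table below flags sessions as claimed when a userID is present, which mirrors the filter applied later in this notebook. The rows are made up for illustration.

```python
import pandas as pd

# Toy sessions: a missing userID marks an unclaimed session.
sessions = pd.DataFrame({
    "sessionID": ["s1", "s2", "s3", "s4"],
    "userID": [101.0, None, 102.0, None],
    "kWhDelivered": [12.3, 4.1, 8.7, 2.2],
})

sessions["claimed"] = sessions["userID"].notnull()
claimed = sessions[sessions["claimed"]]
print(sessions["claimed"].value_counts())
```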

Next, we import the libraries necessary for our study.

#collapse-hide
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumulative_trapezoid as cumtrapz  # cumtrapz was removed in SciPy 1.14
from datetime import datetime
import calendar
import pytz
from ast import literal_eval
# Timezone of the ACN we are using.
timezone = pytz.timezone('America/Los_Angeles')
# Time format
fmt = "%Y-%m-%d %H:%M:%S"

ANALYSIS 1: BEFORE/AFTER PRICING POLICY CHANGE

For the 2 sites studied, pricing policy changes were applied on Nov. 1, 2018.

Caltech: Before Nov. 1, 2018, the Caltech ACN was free for drivers. Beginning Nov. 1, 2018, a fee of 0.12 dollars per kWh was imposed.
Jpl: In the Jpl ACN the price for claimed sessions was kept the same, but unclaimed sessions were terminated after 30 minutes to encourage claimed sessions.

The next figure, from [1], shows how the number of sessions changed with the pricing policy. In the Caltech ACN both the number of sessions per day and the daily energy delivered decreased significantly. In the Jpl ACN the pattern remained similar after the policy change: because the Jpl ACN is in a workplace, it is less sensitive to price and policy changes.

In this analysis, we have the objective of:

Plotting graphs with minutesCharging, minutesAvailable, minutesTotal, kWhDelivered and kWhRequested for each site.
Calculating the statistics related to minutesIdle, errorMinAvailable, kWhDelivered, errorkWhRequested.
Plotting graphs with maeMinAvailable, smapeMinAvailable, smpeMinAvailable, maekWhRequested, smapekWhRequested for each site.
Calculating the number of unique users that used the ACN during that period and the average number of charges per user, also plotting a userCounts graph.

No SMPE is reported for kWhRequested because kWhDelivered <= kWhRequested: the error never changes sign, so the SMPE would coincide with the SMAPE.

The 3 errors used in this analysis are:

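MAE (Mean Absolute Error): mean of $|minutesAvailable - minutesTotal|$ and of $|kWhRequested - kWhDelivered|$.
SMAPE (Symmetric Mean Absolute Percentage Error): mean of $100 \cdot |x - y| / (x + y)$, where $x$ is the user input and $y$ is the observed value.
SMPE (Symmetric Mean Percentage Error): mean of $100 \cdot (x - y) / (x + y)$, which keeps the sign of the error.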
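
The three metrics can be sketched as small functions. The denominators below are the plain sum of the two values, matching the per-session expressions used later in this notebook's code (other definitions of SMAPE divide by half of that sum); the sample values are made up.

```python
import numpy as np


def mae(stated, actual):
    """Mean absolute error between the user input and the observed value."""
    return np.mean(np.abs(stated - actual))


def smape(stated, actual):
    """Symmetric mean absolute percentage error, between 0 and 100 for positive inputs."""
    return np.mean(100 * np.abs((stated - actual) / (stated + actual)))


def smpe(stated, actual):
    """Symmetric mean percentage error; negative when the user input is too low."""
    return np.mean(100 * (stated - actual) / (stated + actual))


# Made-up sessions: minutes declared in the app vs. minutes actually plugged in.
minutes_available = np.array([300.0, 480.0, 120.0])
minutes_total = np.array([360.0, 420.0, 240.0])

mae_value = mae(minutes_available, minutes_total)
smape_value = smape(minutes_available, minutes_total)
smpe_value = smpe(minutes_available, minutes_total)
print(mae_value, smape_value, smpe_value)
```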

Next, we load and start the preprocessing of the data for the analysis.

#collapse-hide
caltech_before_df = pd.read_csv("../data/data_18_sep_oct.csv")
caltech_after_df = pd.read_csv("../data/data_18_nov_dec.csv")
jpl_before_df = pd.read_csv("../data/jpl_18_sep_oct.csv")
jpl_after_df = pd.read_csv("../data/jpl_18_nov_dec.csv")
# keep only claimed sessions (non-null userID) with complete timestamps
required_columns = ['userID', 'connectionTime', 'disconnectTime', 'doneChargingTime']
caltech_before_df = caltech_before_df.dropna(subset=required_columns)
caltech_after_df = caltech_after_df.dropna(subset=required_columns)
jpl_before_df = jpl_before_df.dropna(subset=required_columns)
jpl_after_df = jpl_after_df.dropna(subset=required_columns)
# printing the shape of each dataset
print("Caltech: Shape of the dataset before the policy change: " + str(caltech_before_df.shape))
print("Caltech: Shape of the dataset after the policy change: " + str(caltech_after_df.shape))
print("Jpl: Shape of the dataset before the policy change: " + str(jpl_before_df.shape))
print("Jpl: Shape of the dataset after the policy change: " + str(jpl_after_df.shape))

Caltech: Shape of the dataset before the policy change: (619, 14)
Caltech: Shape of the dataset after the policy change: (1759, 14)
Jpl: Shape of the dataset before the policy change: (735, 14)
Jpl: Shape of the dataset after the policy change: (2144, 14)

Continuing the preprocessing...

Caltech: From Sep. 1 to Oct. 31, 2018 there are 619 claimed sessions, and from Nov. 1 to Dec. 31, 2018 there are 1759. The policy change increased the number of claimed sessions.
Jpl: From Sep. 1 to Oct. 31, 2018 there are 735 claimed sessions, and from Nov. 1 to Dec. 31, 2018 there are 2144. The policy change increased the number of claimed sessions.

We now continue preprocessing the data by creating 7 new columns for each dataframe:

minutesCharging
minutesIdle
minutesTotal
userInputsArray
minutesAvailable
kWhRequested
requestedDeparture

For the reader's reference, we show the first 5 rows of the caltech_before_df dataframe.

#collapse-hide
def minutes_between(start, end):
    """Minutes from start to end; both are timestamp strings whose trailing UTC offset is stripped before parsing."""
    start_dt = datetime.strptime(start.rsplit('-', 1)[0], fmt)
    end_dt = datetime.strptime(end.rsplit('-', 1)[0], fmt)
    # timedelta.seconds ignores whole days, so a session longer than 24 hours wraps around
    return (end_dt - start_dt).seconds/60

def add_session_columns(df):
    """Add the 7 derived columns to df in place."""
    df["minutesCharging"] = df.apply(lambda row: minutes_between(row.connectionTime, row.doneChargingTime), axis=1)
    df["minutesIdle"] = df.apply(lambda row: minutes_between(row.doneChargingTime, row.disconnectTime), axis=1)
    df["minutesTotal"] = df.apply(lambda row: minutes_between(row.connectionTime, row.disconnectTime), axis=1)
    df['userInputsArray'] = df['userInputs'].apply(literal_eval)
    # only the first user input of each session is used
    df['minutesAvailable'] = df.apply(lambda row: row.userInputsArray[0]['minutesAvailable'], axis=1)
    df['kWhRequested'] = df.apply(lambda row: row.userInputsArray[0]['kWhRequested'], axis=1)
    df['requestedDeparture'] = df.apply(lambda row: row.userInputsArray[0]['requestedDeparture'], axis=1)

# apply the same preprocessing to the four dataframes
for df in (caltech_before_df, caltech_after_df, jpl_before_df, jpl_after_df):
    add_session_columns(df)

# display caltech_before_df first 5 rows
caltech_before_df.head()
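
As an alternative sketch, the same derived columns can be computed with timezone-aware timestamps, which keeps the UTC offset and does not truncate durations longer than one day. The single row below is made up and only assumes the column layout used in this notebook; it is not taken from the dataset.

```python
from ast import literal_eval

import pandas as pd

# Made-up row with the column layout assumed in this notebook.
toy = pd.DataFrame({
    "connectionTime": ["2018-09-05 08:00:00-07:00"],
    "doneChargingTime": ["2018-09-05 11:30:00-07:00"],
    "disconnectTime": ["2018-09-05 17:15:00-07:00"],
    "userInputs": ["[{'minutesAvailable': 540, 'kWhRequested': 20.0}]"],
})

# Parse the offsets instead of stripping them.
for col in ["connectionTime", "doneChargingTime", "disconnectTime"]:
    toy[col] = pd.to_datetime(toy[col], utc=True)

# total_seconds() counts whole days as well, unlike timedelta.seconds.
toy["minutesCharging"] = (toy["doneChargingTime"] - toy["connectionTime"]).dt.total_seconds() / 60
toy["minutesIdle"] = (toy["disconnectTime"] - toy["doneChargingTime"]).dt.total_seconds() / 60
toy["minutesTotal"] = (toy["disconnectTime"] - toy["connectionTime"]).dt.total_seconds() / 60

# The userInputs column stores a list of dicts as text; keep the first entry only.
first_input = toy["userInputs"].apply(literal_eval).apply(lambda inputs: inputs[0])
toy["minutesAvailable"] = first_input.apply(lambda d: d["minutesAvailable"])
toy["kWhRequested"] = first_input.apply(lambda d: d["kWhRequested"])

print(toy[["minutesCharging", "minutesIdle", "minutesTotal", "minutesAvailable", "kWhRequested"]])
```

Switching the notebook itself to this approach could change the reported numbers if any session spans more than 24 hours or a daylight saving transition.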

Plotting the first histograms: minutesCharging, minutesAvailable, minutesTotal

Next we plot histograms to analyze driver behavior before/after the policy change for three variables:

minutesCharging: number of minutes the EV was charging (dependent on the adaptive scheduling algorithm objective function).
minutesAvailable: number of minutes the driver said they would be available for charging (input of the mobile application). Note: the driver can modify this value dynamically through the app, but here we only consider the first input, i.e., the one recorded when the EV was plugged into the EVSE.
minutesTotal: the actual number of minutes the EV was plugged into the EVSE.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 1200, 100)
fig1, (ax1_fig1, ax2_fig1, ax3_fig1) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig1.hist(caltech_before_df["minutesCharging"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig1.hist(caltech_after_df["minutesCharging"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig1.set_ylim([0, 140])
ax1_fig1.set_title("Minutes Charging")
ax1_fig1.set_xlabel("Minutes")
ax1_fig1.set_ylabel("Number of Sessions")
ax2_fig1.hist(caltech_before_df["minutesTotal"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax2_fig1.hist(caltech_after_df["minutesTotal"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax2_fig1.set_ylim([0, 140])
ax2_fig1.set_title("Minutes Total")
ax2_fig1.set_xlabel("Minutes")
ax2_fig1.set_ylabel("Number of Sessions")
ax3_fig1.hist(caltech_before_df["minutesAvailable"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax3_fig1.hist(caltech_after_df["minutesAvailable"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax3_fig1.set_ylim([0, 140])
ax3_fig1.set_title("Minutes Available (user input)")
ax3_fig1.set_xlabel("Minutes")
ax3_fig1.set_ylabel("Number of Sessions")
fig1.suptitle('Caltech site - Before/After Policy Change - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 1200, 100)
fig2, (ax1_fig2, ax2_fig2, ax3_fig2) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig2.hist(jpl_before_df["minutesCharging"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig2.hist(jpl_after_df["minutesCharging"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig2.set_ylim([0, 140])
ax1_fig2.set_title("Minutes Charging")
ax1_fig2.set_xlabel("Minutes")
ax1_fig2.set_ylabel("Number of Sessions")
ax2_fig2.hist(jpl_before_df["minutesTotal"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax2_fig2.hist(jpl_after_df["minutesTotal"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax2_fig2.set_ylim([0, 140])
ax2_fig2.set_title("Minutes Total")
ax2_fig2.set_xlabel("Minutes")
ax2_fig2.set_ylabel("Number of Sessions")
ax3_fig2.hist(jpl_before_df["minutesAvailable"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax3_fig2.hist(jpl_after_df["minutesAvailable"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax3_fig2.set_ylim([0, 140])
ax3_fig2.set_title("Minutes Available (user input)")
ax3_fig2.set_xlabel("Minutes")
ax3_fig2.set_ylabel("Number of Sessions")
fig2.suptitle('Jpl site - Before/After Policy Change - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for minutesCharging, minutesTotal and minutesAvailable

Caltech: After unclaimed sessions started to be terminated after 30 min, the minutesCharging graph shows a large increase in sessions shorter than 200 minutes. These users previously used the unclaimed option, but because it is no longer available, they now have to claim even these short sessions.
Jpl: The number of claimed sessions increased, but the pattern stayed similar to the one observed before the policy change. This is expected because Jpl is a workplace environment; it is clear in the minutesTotal histogram, which peaks around 540-600 minutes, i.e., 9-10 hours, a normal working day.
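
A peak like the one described for Jpl can also be located numerically instead of by eye. The sketch below uses synthetic data as a stand-in for the minutesTotal column, so the numbers it prints say nothing about the real sites; only the binning matches the histograms above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for minutesTotal: mostly workday-length stays plus some short ones.
minutes_total = np.concatenate([
    rng.normal(570, 40, size=800),
    rng.uniform(30, 300, size=200),
])

# Same binning as the histograms above.
bins = np.linspace(0, 1200, 100)
counts, edges = np.histogram(minutes_total, bins=bins)
peak = counts.argmax()
print(f"modal bin: {edges[peak]:.0f}-{edges[peak + 1]:.0f} minutes")

# Share of sessions lasting between 9 and 10 hours.
share = np.mean((minutes_total >= 540) & (minutes_total <= 600))
print(f"share of sessions lasting 9-10 hours: {share:.1%}")
```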

Plotting more histograms: kWhDelivered, kWhRequested

Next we plot histograms to analyze driver behavior before/after the policy change for two variables:

kWhDelivered: number of kWh delivered to the EV during one charging session (dependent on the adaptive scheduling algorithm objective function and on whether the EV is completely charged).
kWhRequested: number of kWh requested by the driver (input of the mobile application).

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 100, 100)
fig3, (ax1_fig3, ax2_fig3) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig3.hist(caltech_before_df['kWhDelivered'], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig3.hist(caltech_after_df['kWhDelivered'], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig3.set_ylim([0, 300])
ax1_fig3.set_title("kWh Delivered")
ax1_fig3.set_xlabel("kWh")
ax1_fig3.set_ylabel("Number of Sessions")
ax2_fig3.hist(caltech_before_df['kWhRequested'], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax2_fig3.hist(caltech_after_df['kWhRequested'], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax2_fig3.set_ylim([0, 300])
ax2_fig3.set_title("kWh Requested")
ax2_fig3.set_xlabel("kWh")
ax2_fig3.set_ylabel("Number of Sessions")
fig3.suptitle('Caltech site - Before/After Policy Change - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 100, 100)
fig4, (ax1_fig4, ax2_fig4) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig4.hist(jpl_before_df["kWhDelivered"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig4.hist(jpl_after_df["kWhDelivered"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig4.set_ylim([0, 300])
ax1_fig4.set_title("kWh Delivered")
ax1_fig4.set_xlabel("kWh")
ax1_fig4.set_ylabel("Number of Sessions")
ax2_fig4.hist(jpl_before_df["kWhRequested"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax2_fig4.hist(jpl_after_df["kWhRequested"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax2_fig4.set_ylim([0, 300])
ax2_fig4.set_title("kWh Requested")
ax2_fig4.set_xlabel("kWh")
ax2_fig4.set_ylabel("Number of Sessions")
fig4.suptitle('Jpl site - Before/After Policy Change - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for kWhDelivered, kWhRequested

Caltech: After unclaimed sessions started to be terminated after 30 min, the kWhDelivered graph shows a large increase in sessions of less than 20 kWh. These users previously used the unclaimed option, but because it is no longer available, they now have to claim even these sessions.
Jpl: The number of claimed sessions increased, but the pattern stayed similar to the one observed before the policy change. This is expected because Jpl is a workplace environment.

Comparing the Jpl and Caltech sites after the policy change (claimed sessions cost 0.12 dollars/kWh at Caltech and 0.10 dollars/kWh at Jpl), the patterns of kWhDelivered and kWhRequested are similar at both places.

Calculating statistics...

Next we calculate the following statistics to analyze driver behavior before/after the policy change:

minutesIdle (mean): minutes the EV is plugged into the EVSE but not charging.
errorMinAvailable (mean): error between minutesAvailable and minutesTotal.
kWhDelivered (mean): the average kWh delivered per session.
errorkWhRequested (mean): error between kWhRequested and kWhDelivered.

#collapse-hide
def add_error_columns(df):
    """Add the per-session error columns between user inputs and observed values."""
    minutes_error = df["minutesAvailable"] - df["minutesTotal"]
    minutes_sum = df["minutesAvailable"] + df["minutesTotal"]
    df["maeMinAvailable"] = minutes_error.abs()
    df["smapeMinAvailable"] = 100*(minutes_error/minutes_sum).abs()
    df["smpeMinAvailable"] = 100*minutes_error/minutes_sum
    kwh_error = df["kWhRequested"] - df["kWhDelivered"]
    kwh_sum = df["kWhRequested"] + df["kWhDelivered"]
    df["maekWhRequested"] = kwh_error.abs()
    df["smapekWhRequested"] = 100*(kwh_error/kwh_sum).abs()

def print_statistics(label, df):
    """Print the mean of each statistic for one site and period."""
    print(label)
    print("minutes idle (mean): " + str(df['minutesIdle'].mean()))
    print("MAE minutes available: " + str(df["maeMinAvailable"].mean()))
    print("SMAPE minutes available: " + str(df["smapeMinAvailable"].mean()))
    print("SMPE minutes available: " + str(df["smpeMinAvailable"].mean()))
    print("kWh delivered (mean): " + str(df['kWhDelivered'].mean()))
    print("MAE kwh requested: " + str(df["maekWhRequested"].mean()))
    print("SMAPE kwh requested: " + str(df["smapekWhRequested"].mean()))

# add the error columns to the four dataframes
for df in (caltech_before_df, caltech_after_df, jpl_before_df, jpl_after_df):
    add_error_columns(df)

# statistics for caltech site
print_statistics("Caltech - Before:", caltech_before_df)
print_statistics("Caltech - After: ", caltech_after_df)

print(" ")

# statistics for jpl site
print_statistics("Jpl - before:", jpl_before_df)
print_statistics("Jpl - after:", jpl_after_df)

Caltech - Before:
minutes idle (mean): 211.92463651050048
MAE minutes available: 143.89348411416253
SMAPE minutes available: 19.790817990039784
SMPE minutes available: -5.161958741899901
kWh delivered (mean): 16.381960629956744
MAE kwh requested: 10.38535421091463
SMAPE kwh requested: 24.784616007104887
Caltech - After: 
minutes idle (mean): 230.30055902975158
MAE minutes available: 153.4930831912073
SMAPE minutes available: 24.978339183551
SMPE minutes available: -10.289225243658253
kWh delivered (mean): 11.263710123325463
MAE kwh requested: 9.092688549348214
SMAPE kwh requested: 29.466186940788237
 
Jpl - before:
minutes idle (mean): 226.89734693877574
MAE minutes available: 138.1982086167801
SMAPE minutes available: 19.35001641501344
SMPE minutes available: -8.359449271282823
kWh delivered (mean): 12.768320210506422
MAE kwh requested: 8.292498958049885
SMAPE kwh requested: 25.1007697004786
Jpl - after:
minutes idle (mean): 214.3910136815916
MAE minutes available: 132.82706778606965
SMAPE minutes available: 18.498907285353585
SMPE minutes available: -7.239987344511835
kWh delivered (mean): 13.604474685556587
MAE kwh requested: 9.723683640780473
SMAPE kwh requested: 26.11943936668319

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech - Before	Caltech - After	Jpl - Before	Jpl - After
minutes idle (min)	211.92	230.30	226.90	214.39
MAE minAvailable (min)	143.89	153.49	138.20	132.83
SMAPE minAvailable (%)	19.79	24.98	19.35	18.50
SMPE minAvailable (%)	-5.16	-10.29	-8.36	-7.24
kWh delivered (kWh)	16.38	11.26	12.77	13.60
MAE kWhRequested (kWh)	10.39	9.09	8.29	9.72
SMAPE kWhRequested (%)	24.78	29.47	25.10	26.12

The number of minutes the EV sits idle at the EVSE is comparable regardless of location and of the policy change.

The error in the minutes available for charging that users enter in the app also seems comparable across the 4 cases. On average, the value entered differs by more than 2 hours from the time the EV is actually unplugged, and the negative SMPE indicates that users tend to stay plugged in longer than they declared.

The average kWh delivered decreased considerably at the Caltech site, as expected from our histograms.
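
To put a number on this change, the relative variation of the mean kWh delivered can be computed from the values printed above. This is plain arithmetic on those means; the dictionary layout is only illustrative.

```python
# Mean kWh delivered per session, copied from the statistics printed above.
kwh_delivered = {
    "Caltech": {"before": 16.381960629956744, "after": 11.263710123325463},
    "Jpl": {"before": 12.768320210506422, "after": 13.604474685556587},
}

relative_change = {
    site: 100 * (values["after"] - values["before"]) / values["before"]
    for site, values in kwh_delivered.items()
}
for site, change in relative_change.items():
    print(f"{site}: {change:+.1f}%")
```

With these means, the drop at Caltech is roughly 31%, while Jpl shows an increase of about 6.5%.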

It is worth noting the high magnitude of the error between the kWh requested in the app and the total kWh delivered. Possibly, the user tends to input a higher value of kWh requested than what is actually needed to fully charge the battery.

These error values pose an interesting research question: should the scheduling algorithm trust the information input by the user?

Plotting the error histograms to find patterns in the error metrics

Next, we plot the histograms of the MAE and SMPE for the Time Analysis and of the MAE and SMAPE for the Energy Analysis. There is no SMPE for the energy because kWhDelivered <= kWhRequested.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig27, (ax1_fig27, ax2_fig27) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig27.hist(caltech_before_df["maeMinAvailable"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig27.hist(caltech_after_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig27.set_ylim([0, 140])
ax1_fig27.set_title("MAE - Minutes Available")
ax1_fig27.set_xlabel("Minutes")
ax1_fig27.set_ylabel("Number of Sessions")
ax2_fig27.hist(caltech_before_df["smpeMinAvailable"], bins_symmetric, alpha=0.9, label='sep-oct-2018')
ax2_fig27.hist(caltech_after_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='nov-dec-2018')
ax2_fig27.set_ylim([0, 140])
ax2_fig27.set_title("SMPE - Minutes Available")
ax2_fig27.set_xlabel("%")
ax2_fig27.set_ylabel("Number of Sessions")
fig27.suptitle('Caltech site - Before/After Policy Change - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig28, (ax1_fig28, ax2_fig28) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig28.hist(jpl_before_df["maeMinAvailable"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig28.hist(jpl_after_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig28.set_ylim([0, 140])
ax1_fig28.set_title("MAE - Minutes Available")
ax1_fig28.set_xlabel("Minutes")
ax1_fig28.set_ylabel("Number of Sessions")
ax2_fig28.hist(jpl_before_df["smpeMinAvailable"], bins_symmetric, alpha=0.9, label='sep-oct-2018')
ax2_fig28.hist(jpl_after_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='nov-dec-2018')
ax2_fig28.set_ylim([0, 140])
ax2_fig28.set_title("SMPE - Minutes Available")
ax2_fig28.set_xlabel("%")
ax2_fig28.set_ylabel("Number of Sessions")
fig28.suptitle('Jpl site - Before/After Policy Change - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig29, (ax1_fig29, ax2_fig29) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig29.hist(caltech_before_df['maekWhRequested'], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig29.hist(caltech_after_df['maekWhRequested'], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig29.set_ylim([0, 300])
ax1_fig29.set_title("MAE - kWh Requested")
ax1_fig29.set_xlabel("kWh")
ax1_fig29.set_ylabel("Number of Sessions")
ax2_fig29.hist(caltech_before_df['smapekWhRequested'], bins_percent, alpha=0.9, label='sep-oct-2018')
ax2_fig29.hist(caltech_after_df['smapekWhRequested'], bins_percent, alpha=0.5, label='nov-dec-2018')
ax2_fig29.set_ylim([0, 300])
ax2_fig29.set_title("SMAPE - kWh Requested")
ax2_fig29.set_xlabel("%")
ax2_fig29.set_ylabel("Number of Sessions")
fig29.suptitle('Caltech site - Before/After Policy Change - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig30, (ax1_fig30, ax2_fig30) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig30.hist(jpl_before_df["maekWhRequested"], bins_minutes, alpha=0.9, label='sep-oct-2018')
ax1_fig30.hist(jpl_after_df["maekWhRequested"], bins_minutes, alpha=0.5, label='nov-dec-2018')
ax1_fig30.set_ylim([0, 300])
ax1_fig30.set_title("MAE - kWh Requested")
ax1_fig30.set_xlabel("kWh")
ax1_fig30.set_ylabel("Number of Sessions")
ax2_fig30.hist(jpl_before_df["smapekWhRequested"], bins_percent, alpha=0.9, label='sep-oct-2018')
ax2_fig30.hist(jpl_after_df["smapekWhRequested"], bins_percent, alpha=0.5, label='nov-dec-2018')
ax2_fig30.set_ylim([0, 300])
ax2_fig30.set_title("SMAPE - kWh Requested")
ax2_fig30.set_xlabel("%")
ax2_fig30.set_ylabel("Number of Sessions")
fig30.suptitle('Jpl site - Before/After Policy Change - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the graphs obtained

It is interesting to observe the Gaussian-like shape of the SMPE distribution in the time analysis.

The SMAPE distribution in the energy analysis looks roughly uniform, with no clear pattern.

Plotting the number of unique users and the histogram of charges per user

Next, we calculate the number of unique users at each site and the average number of charges per user, and we plot the distribution of charges per user as a histogram.

#collapse-hide
# user statistics for caltech site

# before the policy change
print ("Caltech - before:")
print("number of unique users: "+ str(caltech_before_df['userID'].nunique()))
print("average sessions by user: " + str(caltech_before_df['userID'].count()/caltech_before_df['userID'].nunique()))
# after the policy change
print ("Caltech - after:")
print("number of unique users: "+ str(caltech_after_df['userID'].nunique()))
print("average sessions by user: " + str(caltech_after_df['userID'].count()/caltech_after_df['userID'].nunique()))

# user statistics for jpl site

# before the policy change
print ("Jpl - before:")
print("number of unique users: "+ str(jpl_before_df['userID'].nunique()))
print("average sessions by user: " + str(jpl_before_df['userID'].count()/jpl_before_df['userID'].nunique()))
# after the policy change
print ("Jpl - after:")
print("number of unique users: "+ str(jpl_after_df['userID'].nunique()))
print("average sessions by user: " + str(jpl_after_df['userID'].count()/jpl_after_df['userID'].nunique()))

# plot the histograms of the number of sessions per user (value counts by userID)
# for the caltech and jpl sites, before and after the policy change
bins_minutes = np.linspace(0, 60, 60)
fig5, (ax1_fig5, ax2_fig5, ax3_fig5, ax4_fig5) = plt.subplots(1, 4, figsize=(15, 5))
ax1_fig5.hist(caltech_before_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='caltech-sep-oct-2018')
ax1_fig5.set_ylim([0, 40])
ax1_fig5.set_title("Caltech - Before")
ax1_fig5.set_xlabel("Number of Sessions")
ax1_fig5.set_ylabel("Number of Users")
ax2_fig5.hist(caltech_after_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='caltech-nov-dec-2018')
ax2_fig5.set_ylim([0, 40])
ax2_fig5.set_title("Caltech - After")
ax2_fig5.set_xlabel("Number of Sessions")
ax2_fig5.set_ylabel("Number of Users")
ax3_fig5.hist(jpl_before_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='jpl-sep-oct-2018')
ax3_fig5.set_ylim([0, 40])
ax3_fig5.set_title("Jpl - Before")
ax3_fig5.set_xlabel("Number of Sessions")
ax3_fig5.set_ylabel("Number of Users")
ax4_fig5.hist(jpl_after_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='jpl-nov-dec-2018')
ax4_fig5.set_ylim([0, 40])
ax4_fig5.set_title("Jpl - After")
ax4_fig5.set_xlabel("Number of Sessions")
ax4_fig5.set_ylabel("Number of Users")
fig5.suptitle('Number of sessions by user - Before/After Policy Change', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Caltech - before:
number of unique users: 72
average sessions by user: 8.597222222222221
Caltech - after:
number of unique users: 181
average sessions by user: 9.718232044198896
Jpl - before:
number of unique users: 121
average sessions by user: 6.074380165289257
Jpl - after:
number of unique users: 187
average sessions by user: 11.46524064171123

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech - Before	Caltech - After	Jpl - Before	Jpl - After
# unique users	72	181	121	187
avg sessions by user	8.60	9.72	6.07	11.47

The table shows a large increase in the number of unique users after the policy change. Because unclaimed sessions are terminated after 30 minutes, users are forced to use the claimed option, which assigns a unique userID to them.

It is interesting to observe that, although the number of unique users increases considerably, the average number of sessions by user remains around the same value before and after the policy change at the Caltech site.

On the other hand, the average sessions by user increases considerably after the policy change at the Jpl site. This suggests that some users alternated between the claimed and unclaimed options to charge their cars. For example, when a user expected to stay less than 4 hours or had a low energy request, they may have preferred an unclaimed session to avoid the time spent using the app.

Finally, the histograms of the number of sessions by user are analyzed.

An interesting observation is that a high number of users used the charging system fewer than 10 times during the 2-month period of the analysis at the Caltech site. As the Caltech site is open to the public, this pattern probably comes from members of the public who visit the university to use its facilities, e.g. the gym.

For the Jpl site, although a similar curve is observed, the histogram is more spread out. As Jpl is a workplace, a higher number of sessions per user is expected.
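
The per-user statistics discussed above reduce to counting sessions per userID. The following is a minimal sketch in plain Python, assuming a made-up list of user IDs (one entry per session) rather than the notebook's dataframes; the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical data: one userID per claimed session.
session_user_ids = ["u1", "u2", "u1", "u3", "u1", "u2"]

# Sessions per user, the plain-Python analogue of value_counts().
sessions_per_user = Counter(session_user_ids)

# Number of unique users, the analogue of nunique().
n_unique_users = len(sessions_per_user)

# Average sessions by user = total sessions / unique users.
avg_sessions = len(session_user_ids) / n_unique_users

# Histogram data: how many users have exactly k sessions.
users_per_session_count = Counter(sessions_per_user.values())

print(n_unique_users, avg_sessions, dict(users_per_session_count))
```

Because the average is a ratio of total sessions to unique users, it can rise either because existing users charge more often or because sessions that were previously anonymous are now attributed to a user.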

Analysis 1 - Conclusion

In this section we investigated the driver charging behavior at two sites, Caltech and Jpl, using Adaptive Charging Networks before and after a new pricing policy was enforced.

We observe that the minutesCharging and kWhDelivered histograms follow the same pattern for both sites, which is expected as they use the same adaptive scheduling algorithm. We observe different patterns for the minutesTotal histogram, as one site is open to the public and the other is a workplace environment.
We observe that the EV is idle for around 3.68 hours per session, a high number that could be exploited in more flexible frameworks such as V2G. There is a high error of 2.37 hours between the minutes available that the user inputs in the app and the actual total minutes that the EV stays plugged in, which opens up the possibility of relying not only on the information input by the user but also on individual models built for each user. We also observe a high error between the kWh requested by the user and the kWh actually delivered. A deeper analysis is needed to check whether this occurs because of the adaptive scheduling algorithm or because users tend to input a kWhRequested value that is much higher than the energy needed to fully charge the EV battery.
We observe that the number of unique users increases considerably at both sites after the new pricing policy is enforced. The average sessions by user increases from 6.07 to 11.47 at the Jpl site, as users are now required to use the claimed option even for short charging sessions. Finally, a high number of users used the charging stations for fewer than 10 sessions in the 2-month period of this analysis.

ANALYSIS 2: THE YEAR OF 2019

For the 2 sites studied, Caltech and Jpl, driver charging behavior for the year of 2019 will be investigated.

In this analysis, we have the objective of:

Plotting graphs with number of cars per hour/min, kWhRequested per hour/min, kWhDelivered per hour/min
Plotting graphs with minutesCharging, minutesAvailable, minutesTotal, kWhDelivered and kWhRequested for each site.
Calculating the statistics related to minutesIdle, errorMinAvailable, kWhDelivered, errorkWhRequested
Calculating the number of unique users that used the ACN during that period of time and the average charges per user, also plotting a userCounts graph.

Next, we load the dataset for 2019 and start the preprocessing of the data for the analysis.

#collapse-hide
caltech_2019_df = pd.read_csv("../data/data_19_jan_dec.csv")
jpl_2019_df = pd.read_csv("../data/jpl_19_jan_dec.csv")
# preprocessing the dataframe for the caltech site
caltech_2019_df = caltech_2019_df[caltech_2019_df['userID'].notnull()]
caltech_2019_df = caltech_2019_df[caltech_2019_df['connectionTime'].notnull()]
caltech_2019_df = caltech_2019_df[caltech_2019_df['disconnectTime'].notnull()]
caltech_2019_df = caltech_2019_df[caltech_2019_df['doneChargingTime'].notnull()]
# printing the shape of each data
print("Caltech: Shape of the dataset for the year of 2019: " + str(caltech_2019_df.shape))
# preprocessing the dataframe for the jpl site
jpl_2019_df = jpl_2019_df[jpl_2019_df['userID'].notnull()]
jpl_2019_df = jpl_2019_df[jpl_2019_df['connectionTime'].notnull()]
jpl_2019_df = jpl_2019_df[jpl_2019_df['disconnectTime'].notnull()]
jpl_2019_df = jpl_2019_df[jpl_2019_df['doneChargingTime'].notnull()]
# printing the shape of each data
print("Jpl: Shape of the dataset for the year of 2019: " + str(jpl_2019_df.shape))

Caltech: Shape of the dataset for the year of 2019: (8677, 14)
Jpl: Shape of the dataset for the year of 2019: (16534, 14)

Continuing the preprocessing...

Caltech: We can see that from Jan. 1 to Dec. 31 of 2019 we have 8677 claimed sessions.
Jpl: We can see that from Jan. 1 to Dec. 31 of 2019 we have 16534 claimed sessions.

We now continue preprocessing the data by creating 10 new columns for each dataframe:

arrivalTime
departureTime
day
minutesCharging
minutesIdle
minutesTotal
userInputsArray
minutesAvailable
kWhRequested
requestedDeparture

Compared to Analysis 1 we add three additional columns:

'arrivalTime': the time of day the EV was plugged in, expressed in seconds since midnight (24-hour clock)
'departureTime': the time of day the EV was unplugged, expressed in seconds since midnight (24-hour clock)
'day': the day of the week on which the session started
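
As a minimal sketch of how these three columns can be derived from a timestamp, the snippet below uses only the standard library. The sample timestamp and the format string `FMT` are assumptions standing in for the notebook's `fmt` variable and the dataset's actual timestamp layout, and the helper names are hypothetical.

```python
import calendar
from datetime import datetime

# Assumed timestamp layout; the notebook's own `fmt` may differ.
FMT = "%Y-%m-%d %H:%M:%S"


def parse_local(timestamp):
    """Drop the trailing UTC offset (after the last '-') and parse the rest."""
    return datetime.strptime(timestamp.rsplit("-", 1)[0], FMT)


def seconds_since_midnight(timestamp):
    """Time of day as seconds on a 24-hour clock (0 to 86399)."""
    t = parse_local(timestamp)
    return t.hour * 3600 + t.minute * 60 + t.second


def day_name(timestamp):
    """Name of the weekday on which the timestamp falls."""
    return calendar.day_name[parse_local(timestamp).weekday()]


# Hypothetical connection time with a -08:00 offset.
sample = "2019-01-02 07:31:15-08:00"
arrival = seconds_since_midnight(sample)
weekday = day_name(sample)
print(arrival, weekday)
```

Parsing each timestamp once and reusing the result avoids the repeated `strptime` calls that a per-field lambda would need.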

For the reader's reference, we show the first 5 rows of the caltech_2019_df dataframe.

#collapse-hide
# caltech_2019_df preprocessing
caltech_2019_df["minutesCharging"] = caltech_2019_df.apply(lambda row: (datetime.strptime(row.doneChargingTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
caltech_2019_df["minutesIdle"] = caltech_2019_df.apply(lambda row: (datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.doneChargingTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
caltech_2019_df["minutesTotal"] = caltech_2019_df.apply(lambda row: (datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
caltech_2019_df['userInputsArray'] = caltech_2019_df['userInputs'].apply(literal_eval)
caltech_2019_df['minutesAvailable'] = caltech_2019_df.apply(lambda row: row.userInputsArray[0]['minutesAvailable'], axis=1)
caltech_2019_df['kWhRequested'] = caltech_2019_df.apply(lambda row: row.userInputsArray[0]['kWhRequested'], axis=1)
caltech_2019_df['requestedDeparture'] = caltech_2019_df.apply(lambda row: row.userInputsArray[0]['requestedDeparture'], axis=1)
caltech_2019_df["day"] = caltech_2019_df.apply(lambda row: calendar.day_name[datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).weekday()], axis=1)
caltech_2019_df["arrivalTime"] = caltech_2019_df.apply(lambda row: int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).hour)*3600 +
                                                                   int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).minute)*60 +
                                                                   int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).second), axis=1)
caltech_2019_df["departureTime"] = caltech_2019_df.apply(lambda row: int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).hour)*3600 +
                                                                     int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).minute)*60 +
                                                                     int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).second), axis=1)

# jpl_2019_df preprocessing
jpl_2019_df["minutesCharging"] = jpl_2019_df.apply(lambda row: (datetime.strptime(row.doneChargingTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
jpl_2019_df["minutesIdle"] = jpl_2019_df.apply(lambda row: (datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.doneChargingTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
jpl_2019_df["minutesTotal"] = jpl_2019_df.apply(lambda row: (datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt) - datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt)).seconds/60, axis=1)
jpl_2019_df['userInputsArray'] = jpl_2019_df['userInputs'].apply(literal_eval)
jpl_2019_df['minutesAvailable'] = jpl_2019_df.apply(lambda row: row.userInputsArray[0]['minutesAvailable'], axis=1)
jpl_2019_df['kWhRequested'] = jpl_2019_df.apply(lambda row: row.userInputsArray[0]['kWhRequested'], axis=1)
jpl_2019_df['requestedDeparture'] = jpl_2019_df.apply(lambda row: row.userInputsArray[0]['requestedDeparture'], axis=1)
jpl_2019_df["day"] = jpl_2019_df.apply(lambda row: calendar.day_name[datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).weekday()], axis=1)
jpl_2019_df["arrivalTime"] = jpl_2019_df.apply(lambda row: int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).hour)*3600 +
                                                    int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).minute)*60 +
                                                    int(datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).second), axis=1)
jpl_2019_df["departureTime"] = jpl_2019_df.apply(lambda row: int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).hour)*3600 +
                                                                     int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).minute)*60 +
                                                                     int(datetime.strptime(row.disconnectTime.rsplit('-', 1)[0], fmt).second), axis=1)

# display caltech_2019_df first 5 rows
# 'time': 86400 seconds per day / 600 seconds 10 minutes / 144 bins needed
caltech_2019_df.head()
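
A caveat worth checking for the duration columns: `timedelta.seconds` holds only the seconds component of a difference (0 to 86399) and ignores whole days, so a session that stays plugged in for more than 24 hours would be recorded as shorter than it was. Whether such sessions exist in this dataset is not verified here. The sketch below, with made-up timestamps and an assumed format string, shows how the result differs from `total_seconds()`.

```python
from datetime import datetime

# Assumed timestamp layout and hypothetical session times.
FMT = "%Y-%m-%d %H:%M:%S"
connection = datetime.strptime("2019-03-01 08:00:00", FMT)
disconnect = datetime.strptime("2019-03-02 09:30:00", FMT)

delta = disconnect - connection  # 1 day, 1:30:00

# .seconds drops the whole day and keeps only the 1:30:00 remainder.
minutes_wrapped = delta.seconds / 60

# total_seconds() includes the day component.
minutes_full = delta.total_seconds() / 60

print(minutes_wrapped, minutes_full)
```

For sessions shorter than 24 hours with a non-negative duration the two expressions agree, so the choice only matters for long or inconsistent records.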

Plotting the first histograms: arrivalTime, departureTime

Next we will plot the histogram to analyze the behavior of arrivals and departures in both sites:

arrivalTime: time the EV is plugged in.
departureTime: time the EV is plugged out.

#collapse-hide
def format_func(value, tick_number):
    # convert a tick position in seconds since midnight to hours of the day
    if value == 0:
        return "0"
    return f"{value / 3600:.2f}"

# plot the histogram for the arrival time and departure time in caltech and jpl sites
bins_minutes = np.linspace(0, 86400, 288)
fig6, (ax1_fig6, ax2_fig6) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig6.hist(caltech_2019_df["arrivalTime"], bins_minutes, alpha=0.9, label='caltech')
ax1_fig6.hist(jpl_2019_df["arrivalTime"], bins_minutes, alpha=0.5, label='jpl')
ax1_fig6.set_xlim([0, 86400])
ax1_fig6.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig6.set_ylim([0, 800])
ax1_fig6.set_title("Arrival Time")
ax1_fig6.set_xlabel("Time 0-24 hours in a day")
ax1_fig6.set_ylabel("Number of Sessions")
ax2_fig6.hist(caltech_2019_df["departureTime"], bins_minutes, alpha=0.9, label='caltech')
ax2_fig6.hist(jpl_2019_df["departureTime"], bins_minutes, alpha=0.5, label='jpl')
ax2_fig6.set_xlim([0, 86400])
ax2_fig6.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig6.set_ylim([0, 800])
ax2_fig6.set_title("Departure Time")
ax2_fig6.set_xlabel("Time 0-24 hours in a day")
ax2_fig6.set_ylabel("Number of Sessions")
fig6.suptitle('Arrival and Departure Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for arrivalTime, departureTime

arrivalTime: The pattern for the Jpl site presents a large peak from 6am to 8am, the time when workers arrive at the workplace and plug in their EVs. At the Caltech site this peak is delayed to 8am-11am, as hours tend to be more flexible for students and for people who come to the site just to charge their EVs. A smaller peak also occurs at the Jpl site in the afternoon, probably from people who go out for lunch or have meetings outside the workplace.
departureTime: At the Jpl site most EVs are unplugged between 4pm and 7pm, the normal end of working hours. This pattern is not very different at the Caltech site, although there the curve extends into later hours. The difference between the two sites is much clearer for arrivalTime. At the Jpl site, a small peak is also observed between 11am and 1pm, the lunch hour.
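
The peaks described above can also be located numerically by counting sessions per hour of the day. This is a minimal sketch on made-up arrival times (in seconds since midnight); the function name and the data are hypothetical and not taken from the notebook.

```python
from collections import Counter


def sessions_per_hour(arrival_seconds):
    """Count sessions in each of the 24 hours of the day."""
    counts = Counter(s // 3600 for s in arrival_seconds)
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical arrival times in seconds since midnight.
arrivals = [
    6 * 3600 + 10 * 60,   # 06:10
    7 * 3600 + 5 * 60,    # 07:05
    7 * 3600 + 45 * 60,   # 07:45
    9 * 3600 + 20 * 60,   # 09:20
    13 * 3600,            # 13:00
]

hourly = sessions_per_hour(arrivals)
peak_hour = max(range(24), key=lambda hour: hourly[hour])
print(peak_hour, hourly[peak_hour])
```

Applied to the arrivalTime column of each site, the same counting would give the peak hours that the histograms show visually.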

Plotting scatter plots: integral of total kWhDelivered by arrivalTime, integral of total kWhRequested by arrivalTime

Next we will plot scatter plots to analyze the behavior of kWhDelivered by arrivalTime and kWhRequested by arrivalTime

kWhDelivered by arrivalTime: number of kWh delivered to the EV based on the time the EV is plugged in.
kWhRequested by arrivalTime: number of kWh requested by the driver based on the time the EV is plugged in.

#collapse-hide
# analyze kWhDelivered and kWhRequested by arrivalTime

# caltech site
caltech_uniqueTime = caltech_2019_df["arrivalTime"].unique()
caltech_uniqueTime = sorted(caltech_uniqueTime)
caltech_kWhRequested = []
caltech_kWhRequested = np.array(caltech_kWhRequested)
caltech_kWhDelivered = []
caltech_kWhDelivered = np.array(caltech_kWhDelivered)
for time in caltech_uniqueTime:
    caltech_kWhRequested = np.append(caltech_kWhRequested, caltech_2019_df[caltech_2019_df.arrivalTime.isin([time])].kWhRequested.sum())
    caltech_kWhDelivered = np.append(caltech_kWhDelivered, caltech_2019_df[caltech_2019_df.arrivalTime.isin([time])].kWhDelivered.sum())

int_caltech_kWhRequested = cumtrapz(caltech_kWhRequested, np.array(caltech_uniqueTime)/3600, initial=0)
int_caltech_kWhDelivered = cumtrapz(caltech_kWhDelivered, np.array(caltech_uniqueTime)/3600, initial=0)
    
print("Caltech - In one year total kWhRequested (kWh): " + str(caltech_2019_df.kWhRequested.sum()))
print("Caltech - In one year total kWhDelivered (kWh): " + str(caltech_2019_df.kWhDelivered.sum()))
    
# jpl site
jpl_uniqueTime = jpl_2019_df["arrivalTime"].unique()
jpl_uniqueTime = sorted(jpl_uniqueTime)
jpl_kWhRequested = []
jpl_kWhRequested = np.array(jpl_kWhRequested)
jpl_kWhDelivered = []
jpl_kWhDelivered = np.array(jpl_kWhDelivered)
for time in jpl_uniqueTime:
    jpl_kWhRequested = np.append(jpl_kWhRequested, jpl_2019_df[jpl_2019_df.arrivalTime.isin([time])].kWhRequested.sum())
    jpl_kWhDelivered = np.append(jpl_kWhDelivered, jpl_2019_df[jpl_2019_df.arrivalTime.isin([time])].kWhDelivered.sum())
    
int_jpl_kWhRequested = cumtrapz(jpl_kWhRequested, np.array(jpl_uniqueTime)/3600, initial=0)
int_jpl_kWhDelivered = cumtrapz(jpl_kWhDelivered, np.array(jpl_uniqueTime)/3600, initial=0)
    
print("Jpl - In one year total kWhRequested (kWh): " + str(jpl_2019_df.kWhRequested.sum()))
print("Jpl - In one year total kWhDelivered (kWh): " + str(jpl_2019_df.kWhDelivered.sum()))
    
# plot the scatter plot for the kWhRequested and kWhDelivered by arrival time in caltech and jpl sites
fig7, (ax1_fig7, ax2_fig7) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig7.scatter(caltech_uniqueTime, int_caltech_kWhRequested, color='r', label='kWhRequested')
ax1_fig7.scatter(caltech_uniqueTime, int_caltech_kWhDelivered, color='b', label='kWhDelivered')
ax1_fig7.set_xlim([0, 86400])
ax1_fig7.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig7.set_ylim([0, 700])
ax1_fig7.set_title("Caltech")
ax1_fig7.set_xlabel("Time 0-24 hours in a day")
ax1_fig7.set_ylabel("kWh")
ax2_fig7.scatter(jpl_uniqueTime, int_jpl_kWhRequested, color='r', label='kWhRequested')
ax2_fig7.scatter(jpl_uniqueTime, int_jpl_kWhDelivered, color='b', label='kWhDelivered')
ax2_fig7.set_xlim([0, 86400])
ax2_fig7.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig7.set_ylim([0, 700])
ax2_fig7.set_title("Jpl")
ax2_fig7.set_xlabel("Time 0-24 hours in a day")
ax2_fig7.set_ylabel("kWh")
fig7.suptitle('Integral of Total kWhRequested/kWhDelivered by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Caltech - In one year total kWhRequested (kWh): 170220.01
Caltech - In one year total kWhDelivered (kWh): 89482.37912916964
Jpl - In one year total kWhRequested (kWh): 443794.13599999994
Jpl - In one year total kWhDelivered (kWh): 248409.3809575

Analyzing the total energy delivered by each site

Statistic	Caltech	Jpl
kWhRequested (MWh)	170.2	443.8
kWhDelivered (MWh)	89.5	248.4

Analyzing the scatter plots for integral of total kWhRequested/kWhDelivered by Arrival Time in one year

Caltech: At the Caltech site, the rate is higher from 10pm to 6am, possibly because of users who leave their EVs charging overnight on campus. There is a discrepancy between the rates of the kWhRequested and kWhDelivered curves.
Jpl: The overnight charging pattern observed at the Caltech site is not observed at the Jpl site. There is also a discrepancy between the rates of the kWhRequested and kWhDelivered curves.

It is interesting to observe that for both sites there is a considerable gap between the kWhRequested and the kWhDelivered curve.

Obs: Additional interpretation of this rate seems needed.
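
To help interpret these curves, the sketch below shows in plain Python what a cumulative trapezoidal integral computes, next to a plain cumulative sum. It is an illustration on made-up numbers, not the notebook's SciPy call; the function names are hypothetical. Note that integrating kWh totals over an x-axis in hours yields values in kWh·h, whereas a cumulative sum stays in kWh and ends at the total energy.

```python
def cumulative_trapezoid(y, x):
    """Running trapezoidal integral of y over x, starting at 0."""
    result = [0.0]
    for i in range(1, len(y)):
        area = 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
        result.append(result[-1] + area)
    return result


def cumulative_sum(y):
    """Running total of y."""
    result = []
    total = 0.0
    for value in y:
        total += value
        result.append(total)
    return result


# Hypothetical data: arrival times in hours and total kWh per arrival time.
arrival_hours = [8.0, 8.5, 9.0]
kwh_per_arrival = [10.0, 20.0, 30.0]

integral = cumulative_trapezoid(kwh_per_arrival, arrival_hours)
running_total = cumulative_sum(kwh_per_arrival)
print(integral)       # kWh·h
print(running_total)  # kWh
```

Because the integral weights each value by the spacing between consecutive arrival times, its final value does not equal the yearly energy total, which may be one reason the curves are hard to read directly as energy.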

Plotting histograms: minutesCharging, minutesAvailable, minutesTotal

Next we will plot the histograms to analyze the variables minutesCharging, minutesAvailable and minutesTotal.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 1200, 100)
fig8, (ax1_fig8, ax2_fig8, ax3_fig8) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig8.hist(caltech_2019_df["minutesCharging"], bins_minutes, alpha=0.9, label='caltech')
ax1_fig8.set_ylim([0, 1200])
ax1_fig8.set_title("Minutes Charging")
ax1_fig8.set_xlabel("Minutes")
ax1_fig8.set_ylabel("Number of Sessions")
ax2_fig8.hist(caltech_2019_df["minutesTotal"], bins_minutes, alpha=0.9, label='caltech')
ax2_fig8.set_ylim([0, 1200])
ax2_fig8.set_title("Minutes Total")
ax2_fig8.set_xlabel("Minutes")
ax2_fig8.set_ylabel("Number of Sessions")
ax3_fig8.hist(caltech_2019_df["minutesAvailable"], bins_minutes, alpha=0.9, label='caltech')
ax3_fig8.set_ylim([0, 1200])
ax3_fig8.set_title("Minutes Available (user input)")
ax3_fig8.set_xlabel("Minutes")
ax3_fig8.set_ylabel("Number of Sessions")
fig8.suptitle('Caltech site - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 1200, 100)
fig9, (ax1_fig9, ax2_fig9, ax3_fig9) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig9.hist(jpl_2019_df["minutesCharging"], bins_minutes, alpha=0.9, label='jpl')
ax1_fig9.set_ylim([0, 1200])
ax1_fig9.set_title("Minutes Charging")
ax1_fig9.set_xlabel("Minutes")
ax1_fig9.set_ylabel("Number of Sessions")
ax2_fig9.hist(jpl_2019_df["minutesTotal"], bins_minutes, alpha=0.9, label='jpl')
ax2_fig9.set_ylim([0, 1200])
ax2_fig9.set_title("Minutes Total")
ax2_fig9.set_xlabel("Minutes")
ax2_fig9.set_ylabel("Number of Sessions")
ax3_fig9.hist(jpl_2019_df["minutesAvailable"], bins_minutes, alpha=0.9, label='jpl')
ax3_fig9.set_ylim([0, 1200])
ax3_fig9.set_title("Minutes Available (user input)")
ax3_fig9.set_xlabel("Minutes")
ax3_fig9.set_ylabel("Number of Sessions")
fig9.suptitle('Jpl site - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for minutesCharging, minutesTotal and minutesAvailable

Caltech: The majority of the sessions charge for less than 200 minutes. Regarding the minutesTotal spent at the Caltech site, the distribution is well spread, with small peaks around 100 and around 500 minutes. The minutes available is well spread and does not have a clear pattern.
Jpl: Compared to the Caltech site, the minutesCharging histogram shows a more evenly distributed curve, with a considerable number of sessions between 200 and 400 minutes as well. Regarding the minutesTotal, there is a strong peak at around 500-600 minutes, which corresponds to 9-10 hours; this is expected because the Jpl site is a workplace environment. The minutes available histogram resembles the minutes total histogram, but the peak is not as pronounced.

It seems that minutesCharging depends strongly on the minutesAvailable value input by the user in the app.
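
One way to put a number on this visual impression would be a Pearson correlation between the two columns. The snippet below is a minimal sketch on made-up values, not a result from the dataset; the function name and the sample lists are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-session values, in minutes.
minutes_available = [120, 240, 360, 480, 600]
minutes_charging = [90, 150, 200, 260, 300]

r = pearson(minutes_available, minutes_charging)
print(round(r, 3))
```

A value close to 1 on the real columns would support the dependence suggested by the histograms, while a value near 0 would suggest that the similar shapes are coincidental.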

Plotting more histograms: kWhDelivered, kWhRequested

Next we will plot histograms to analyze the variables kWhDelivered, kWhRequested.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 100, 100)
fig10, (ax1_fig10, ax2_fig10) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig10.hist(caltech_2019_df['kWhDelivered'], bins_minutes, alpha=0.9, label='caltech')
ax1_fig10.set_ylim([0, 2000])
ax1_fig10.set_title("kWh Delivered")
ax1_fig10.set_xlabel("kWh")
ax1_fig10.set_ylabel("Number of Sessions")
ax2_fig10.hist(caltech_2019_df['kWhRequested'], bins_minutes, alpha=0.9, label='caltech')
ax2_fig10.set_ylim([0, 2000])
ax2_fig10.set_title("kWh Requested")
ax2_fig10.set_xlabel("kWh")
ax2_fig10.set_ylabel("Number of Sessions")
fig10.suptitle('Caltech site - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 100, 100)
fig11, (ax1_fig11, ax2_fig11) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig11.hist(jpl_2019_df["kWhDelivered"], bins_minutes, alpha=0.9, label='jpl')
ax1_fig11.set_ylim([0, 2000])
ax1_fig11.set_title("kWh Delivered")
ax1_fig11.set_xlabel("kWh")
ax1_fig11.set_ylabel("Number of Sessions")
ax2_fig11.hist(jpl_2019_df["kWhRequested"], bins_minutes, alpha=0.9, label='jpl')
ax2_fig11.set_ylim([0, 2000])
ax2_fig11.set_title("kWh Requested")
ax2_fig11.set_xlabel("kWh")
ax2_fig11.set_ylabel("Number of Sessions")
fig11.suptitle('Jpl site - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for kWhDelivered, kWhRequested

Caltech: For the Caltech site, the majority of the charges are below 20 kWh.
Jpl: For the Jpl site, the majority of the charges are below 20 kWh.

There are peaks in the kWh Requested histogram that are possibly caused by default values in the application.
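
A simple way to check the default-value hypothesis is to list the most frequent requested values and see whether a few exact numbers dominate. The sketch below uses made-up requests; the list and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical kWhRequested values, one per session.
kwh_requested = [10.0, 20.0, 20.0, 7.5, 20.0, 10.0, 33.2, 20.0]

counts = Counter(kwh_requested)
top_values = counts.most_common(2)

for value, n_sessions in top_values:
    share = 100 * n_sessions / len(kwh_requested)
    print(f"{value} kWh requested in {n_sessions} sessions ({share:.1f}%)")
```

If a handful of round values account for a large share of the sessions in the real data, that would be consistent with users accepting a preset in the app rather than entering their own estimate.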

Calculating statistics...

Next we will calculate the following statistics: minutesIdle, errorMinAvailable, kWhDelivered, errorkWhRequested.

#collapse-hide
# statistics for caltech site in the year of 2019

print ("Caltech - 2019: ")
print ("minutes idle (mean): " + str(caltech_2019_df['minutesIdle'].mean()))
caltech_2019_df["maeMinAvailable"] = caltech_2019_df.apply(lambda row: abs(row.minutesAvailable-row.minutesTotal), axis=1)
caltech_2019_df["smapeMinAvailable"] = caltech_2019_df.apply(lambda row: 100*abs((row.minutesAvailable-row.minutesTotal)/(row.minutesAvailable+row.minutesTotal)), axis=1)
caltech_2019_df["smpeMinAvailable"] = caltech_2019_df.apply(lambda row: 100*(row.minutesAvailable-row.minutesTotal)/(row.minutesAvailable+row.minutesTotal), axis=1)
print ("MAE minutes available: " + str(caltech_2019_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(caltech_2019_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(caltech_2019_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(caltech_2019_df['kWhDelivered'].mean()))
caltech_2019_df["maekWhRequested"] = caltech_2019_df.apply(lambda row: abs(row.kWhRequested-row.kWhDelivered), axis=1)
caltech_2019_df["smapekWhRequested"] = caltech_2019_df.apply(lambda row: 100*abs((row.kWhRequested-row.kWhDelivered)/(row.kWhRequested+row.kWhDelivered)), axis=1)
print ("MAE kwh requested: " + str(caltech_2019_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(caltech_2019_df["smapekWhRequested"].mean()))

# statistics for jpl site in the year of 2019
      
print ("Jpl - 2019: ")
print ("minutes idle (mean): " + str(jpl_2019_df['minutesIdle'].mean()))
jpl_2019_df["maeMinAvailable"] = jpl_2019_df.apply(lambda row: abs(row.minutesAvailable-row.minutesTotal), axis=1)
jpl_2019_df["smapeMinAvailable"] = jpl_2019_df.apply(lambda row: 100*abs((row.minutesAvailable-row.minutesTotal)/(row.minutesAvailable+row.minutesTotal)), axis=1)
jpl_2019_df["smpeMinAvailable"] = jpl_2019_df.apply(lambda row: 100*(row.minutesAvailable-row.minutesTotal)/(row.minutesAvailable+row.minutesTotal), axis=1)
print ("MAE minutes available: " + str(jpl_2019_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(jpl_2019_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(jpl_2019_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(jpl_2019_df['kWhDelivered'].mean()))
jpl_2019_df["maekWhRequested"] = jpl_2019_df.apply(lambda row: abs(row.kWhRequested-row.kWhDelivered), axis=1)
jpl_2019_df["smapekWhRequested"] = jpl_2019_df.apply(lambda row: 100*abs((row.kWhRequested-row.kWhDelivered)/(row.kWhRequested+row.kWhDelivered)), axis=1)
print ("MAE kwh requested: " + str(jpl_2019_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(jpl_2019_df["smapekWhRequested"].mean()))

Caltech - 2019: 
minutes idle (mean): 229.06328992355
MAE minutes available: 151.4466002074449
SMAPE minutes available: 25.322137344423925
SMPE minutes available: -10.289225243658253
kWh delivered (mean): 10.312594114229572
MAE kwh requested: 9.45324877341857
SMAPE kwh requested: 32.13216379671299
Jpl - 2019: 
minutes idle (mean): 376.89649711704243
MAE minutes available: 130.77405044151445
SMAPE minutes available: 18.936349840843057
SMPE minutes available: -7.239987344511835
kWh delivered (mean): 15.024155132303067
MAE kwh requested: 12.213107044255715
SMAPE kwh requested: 27.616555068320434

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech	Jpl
minutes idle (min)	229.06	376.90
MAE minAvailable (min)	151.45	130.77
SMAPE minAvailable (%)	25.32	18.94
SMPE minAvailable (%)	-10.29	-7.24
kWh delivered (kWh)	10.31	15.02
MAE kWhRequested (kWh)	9.45	12.21
SMAPE kWhRequested (%)	32.13	27.62

The number of minutes the EV is idle is high for both sites. For the Jpl site, the mean time the EV sits idle at the EVSE increases to the very high value of 6.28 hours per session.

The error in the minutes available that the user inputs in the app seems comparable to that obtained in Analysis 1. In practice, the value the user inputs is off by more than 2 hours, on average, from the actual time the EV is unplugged.

The average kWh delivered at the Caltech site is lower than at the Jpl site.

Re-stating the questions from Analysis 1:

It is worth noting the high magnitude of the error between the kWh requested that the user inputs in the app and the total kWh delivered. Possibly, the user tends to input a higher kWh requested than is actually needed to fully charge the battery.

The error values pose an interesting research question: should the scheduling algorithm trust the information input by the user?

Plotting the error histograms to find patterns in the error metrics

Next, we plot the histograms of the MAE and SMPE for the time analysis and of the MAE and SMAPE for the energy analysis. No SMPE is shown for the energy because kWhDelivered <= kWhRequested: the signed error always has the same sign, so it carries no information beyond the SMAPE.
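As a rough sketch of how these error metrics relate to each other, the snippet below computes MAE, SMAPE and SMPE for a pair of arrays. The formulas are assumptions made for illustration (a symmetric percentage error bounded by 100%, with the user input as `forecast` and the measured value as `actual`); the function names and the toy values are hypothetical and may differ from the definitions used earlier in this notebook.

```python
import numpy as np

def mae(forecast, actual):
    """Mean absolute error, in the units of the inputs."""
    forecast, actual = np.asarray(forecast, float), np.asarray(actual, float)
    return np.mean(np.abs(forecast - actual))

def smape(forecast, actual):
    """Symmetric mean absolute percentage error in [0, 100] (assumed definition)."""
    forecast, actual = np.asarray(forecast, float), np.asarray(actual, float)
    return 100.0 * np.mean(np.abs(forecast - actual) / (np.abs(forecast) + np.abs(actual)))

def smpe(forecast, actual):
    """Signed counterpart of SMAPE in [-100, 100] (assumed definition and sign)."""
    forecast, actual = np.asarray(forecast, float), np.asarray(actual, float)
    return 100.0 * np.mean((forecast - actual) / (np.abs(forecast) + np.abs(actual)))

# Hypothetical energy values where delivered never exceeds requested.
kwh_requested = np.array([20.0, 10.0, 30.0])
kwh_delivered = np.array([10.0, 10.0, 15.0])

energy_mae = mae(kwh_requested, kwh_delivered)
energy_smape = smape(kwh_requested, kwh_delivered)
energy_smpe = smpe(kwh_requested, kwh_delivered)
print(energy_mae, energy_smape, energy_smpe)
```

Because the signed error never changes sign in this toy energy example, the SMPE and the SMAPE coincide, which is why only one of them is informative for the energy analysis. Sessions where both values are zero would need to be filtered out first to avoid a division by zero.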

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig31, (ax1_fig31, ax2_fig31) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig31.hist(caltech_2019_df["maeMinAvailable"], bins_minutes, alpha=0.9, label='jan-dec-2019')
ax1_fig31.set_ylim([0, 1000])
ax1_fig31.set_title("MAE - Minutes Available")
ax1_fig31.set_xlabel("Minutes")
ax1_fig31.set_ylabel("Number of Sessions")
ax2_fig31.hist(caltech_2019_df["smpeMinAvailable"], bins_symmetric, alpha=0.9, label='jan-dec-2019')
ax2_fig31.set_ylim([0, 1000])
ax2_fig31.set_title("SMPE - Minutes Available")
ax2_fig31.set_xlabel("%")
ax2_fig31.set_ylabel("Number of Sessions")
fig31.suptitle('Caltech site - 2019 - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig32, (ax1_fig32, ax2_fig32) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig32.hist(jpl_2019_df["maeMinAvailable"], bins_minutes, alpha=0.9, label='jan-dec-2019')
ax1_fig32.set_ylim([0, 1000])
ax1_fig32.set_title("MAE - Minutes Available")
ax1_fig32.set_xlabel("Minutes")
ax1_fig32.set_ylabel("Number of Sessions")
ax2_fig32.hist(jpl_2019_df["smpeMinAvailable"], bins_symmetric, alpha=0.9, label='jan-dec-2019')
ax2_fig32.set_ylim([0, 1000])
ax2_fig32.set_title("SMPE - Minutes Available")
ax2_fig32.set_xlabel("%")
ax2_fig32.set_ylabel("Number of Sessions")
fig32.suptitle('Jpl site - 2019 - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig33, (ax1_fig33, ax2_fig33) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig33.hist(caltech_2019_df['maekWhRequested'], bins_minutes, alpha=0.9, label='jan-dec-2019')
ax1_fig33.set_ylim([0, 2000])
ax1_fig33.set_title("MAE - kWh Requested")
ax1_fig33.set_xlabel("kWh")
ax1_fig33.set_ylabel("Number of Sessions")
ax2_fig33.hist(caltech_2019_df['smapekWhRequested'], bins_percent, alpha=0.9, label='jan-dec-2019')
ax2_fig33.set_ylim([0, 2000])
ax2_fig33.set_title("SMAPE - kWh Requested")
ax2_fig33.set_xlabel("%")
ax2_fig33.set_ylabel("Number of Sessions")
fig33.suptitle('Caltech site - 2019 - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig34, (ax1_fig34, ax2_fig34) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig34.hist(jpl_2019_df["maekWhRequested"], bins_minutes, alpha=0.9, label='jan-dec-2019')
ax1_fig34.set_ylim([0, 2000])
ax1_fig34.set_title("MAE - kWh Requested")
ax1_fig34.set_xlabel("kWh")
ax1_fig34.set_ylabel("Number of Sessions")
ax2_fig34.hist(jpl_2019_df["smapekWhRequested"], bins_percent, alpha=0.9, label='jan-dec-2019')
ax2_fig34.set_ylim([0, 2000])
ax2_fig34.set_title("SMAPE - kWh Requested")
ax2_fig34.set_xlabel("%")
ax2_fig34.set_ylabel("Number of Sessions")
fig34.suptitle('Jpl site - 2019 - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the graphs obtained

It is interesting to observe the approximately Gaussian shape of the SMPE histogram in the time analysis. This suggests that the driver's actual stay at the charging station is distributed roughly normally around the time entered in the app.
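The Gaussian description is a visual judgement. One simple way to probe it, sketched here with NumPy only, is to compare the sample skewness and excess kurtosis against 0, the value both take for a normal distribution. The helper name `shape_summary` and the synthetic sample are illustrative assumptions; on the real data one could pass `caltech_2019_df["smpeMinAvailable"]` instead.

```python
import numpy as np

def shape_summary(values):
    """Return (mean, std, skewness, excess kurtosis) of a 1-D sample, ignoring NaNs."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    mean, std = x.mean(), x.std()
    z = (x - mean) / std
    skewness = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return mean, std, skewness, excess_kurtosis

# Synthetic stand-in for the SMPE column (hypothetical mean and spread).
rng = np.random.default_rng(0)
synthetic_smpe = rng.normal(loc=-10.0, scale=15.0, size=5000)

mean, std, skewness, excess_kurtosis = shape_summary(synthetic_smpe)
print(f"mean={mean:.2f} std={std:.2f} skew={skewness:.3f} kurt={excess_kurtosis:.3f}")
```

Values far from 0 for either shape statistic would indicate that the normal approximation is poor, for example because of a long tail of sessions that end much earlier or later than announced.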

The SMAPE distribution in the energy analysis looks roughly uniform, with no clear pattern. The kWh requested entered by the user therefore does not reveal a clear charging pattern.

Plotting the number of unique users and their number of charges histogram pattern

Next, we calculate the statistics related to the number of unique users in each of the sites in the year of 2019, the average number of charges of each user and how this pattern is observed on a histogram.

#collapse-hide
# user statistics for caltech site in the year of 2019

print ("Caltech (2019):")
print("number of unique users: "+ str(caltech_2019_df['userID'].nunique()))
print("average sessions by user: " + str(caltech_2019_df['userID'].count()/caltech_2019_df['userID'].nunique()))

# user statistics for jpl site in the year of 2019

print ("Jpl (2019):")
print("number of unique users: "+ str(jpl_2019_df['userID'].nunique()))
print("average sessions by user: " + str(jpl_2019_df['userID'].count()/jpl_2019_df['userID'].nunique()))

# plot the histogram of the number of sessions per user (value counts by userID)
# left panel: caltech site, right panel: jpl site
bins_minutes = np.linspace(0, 365, 50)
fig12, (ax1_fig12, ax2_fig12) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig12.hist(caltech_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9)
ax1_fig12.set_ylim([0, 175])
ax1_fig12.set_title("Caltech (2019)")
ax1_fig12.set_xlabel("Number of Sessions")
ax1_fig12.set_ylabel("Number of Users")
ax2_fig12.hist(jpl_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9)
ax2_fig12.set_ylim([0, 175])
ax2_fig12.set_title("JPL (2019)")
ax2_fig12.set_xlabel("Number of Sessions")
ax2_fig12.set_ylabel("Number of Users")
fig12.suptitle('Number of sessions by user - Year 2019', y=1.05)
plt.tight_layout()

Caltech (2019):
number of unique users: 319
average sessions by user: 27.20062695924765
Jpl (2019):
number of unique users: 367
average sessions by user: 45.05177111716621

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech	Jpl
# unique users	319	367
avg sessions by user	27.2	45.1

The number of unique users is 319 at the Caltech site and 367 at the Jpl site.

The average number of sessions per user is higher at the Jpl site, which is a workplace environment, than at the Caltech site, which is a university campus open to the public.

Finally, the histograms of the number of sessions by user are analyzed.

An interesting observation is that a high number of users used the charging system fewer than 10 times during the year of 2019 at the Caltech site. As the Caltech site is open to the public, this pattern is probably due to members of the public visiting the university to use its facilities, e.g. the gym.

For the Jpl site, although a similar curve is observed, the histogram is more spread out. As Jpl is a workplace, a higher number of sessions per user is expected.
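The share of occasional users can also be quantified directly instead of being read off the histogram. The following is a minimal sketch; the helper name `share_of_occasional_users` and the toy IDs are hypothetical, and the threshold of 10 sessions simply mirrors the text.

```python
import pandas as pd

def share_of_occasional_users(user_ids, max_sessions=10):
    """Fraction of users with fewer than `max_sessions` sessions."""
    sessions_per_user = pd.Series(user_ids).value_counts()
    return (sessions_per_user < max_sessions).mean()

# Toy data: users 1, 2 and 3 charge a few times, user 4 charges 12 times.
toy_user_ids = [1, 1, 1, 2, 3, 3] + [4] * 12
occasional_share = share_of_occasional_users(toy_user_ids)
print(occasional_share)
```

On the real data this could be called as `share_of_occasional_users(caltech_2019_df['userID'])` and `share_of_occasional_users(jpl_2019_df['userID'])` to compare the two sites with a single number.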

Week/Weekend Analysis

For the sites investigated, different charging patterns are expected during the week and during the weekend. For example, almost no charging is expected at the Jpl site during weekends. To confirm this expectation, the dataset for the year of 2019 is divided into a subset of sessions that happen on weekdays and a subset of sessions that happen during the weekend.

Therefore, next we filter the dataset into 'caltech_week_2019_df', 'caltech_weekend_2019_df', 'jpl_week_2019_df' and 'jpl_weekend_2019_df'.

We present the head of the 'jpl_week_2019_df' dataframe to confirm that the filter is applied correctly.

#collapse-hide
# caltech site
caltech_week_2019_df = caltech_2019_df[caltech_2019_df['day'].isin(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])]
caltech_weekend_2019_df = caltech_2019_df[caltech_2019_df['day'].isin(['Saturday', 'Sunday'])]

# jpl site
jpl_week_2019_df = jpl_2019_df[jpl_2019_df['day'].isin(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])]
jpl_weekend_2019_df = jpl_2019_df[jpl_2019_df['day'].isin(['Saturday', 'Sunday'])]

# display the first 5 rows of jpl_week_2019_df
jpl_week_2019_df.head()
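The filter above relies on a 'day' column holding English weekday names. As a hedged sketch of how such a column could be derived and of an equivalent numeric filter, the snippet below uses a toy dataframe; it assumes a datetime column named `connectionTime`, which may not match how the column was actually built in this notebook.

```python
import pandas as pd

# Toy sessions with hypothetical timestamps.
toy_df = pd.DataFrame({
    "connectionTime": pd.to_datetime([
        "2019-01-04 08:15",  # Friday
        "2019-01-05 10:00",  # Saturday
        "2019-01-06 17:30",  # Sunday
        "2019-01-07 09:45",  # Monday
    ]),
})

# Weekday name, as used by the isin() filters.
toy_df["day"] = toy_df["connectionTime"].dt.day_name()

# Equivalent numeric test: Monday is 0, so Saturday and Sunday are 5 and 6.
is_weekend = toy_df["connectionTime"].dt.dayofweek >= 5
toy_week_df = toy_df[~is_weekend]
toy_weekend_df = toy_df[is_weekend]

print(toy_week_df["day"].tolist())
print(toy_weekend_df["day"].tolist())
```

The numeric test avoids spelling out the day names twice per site, at the cost of depending on the timestamp column rather than on the precomputed 'day' column.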

Plotting the histograms for minutesCharging, minutesAvailable, minutesTotal

Next we will plot the histogram to analyze the difference in behavior of the driver when charging during the week or during the weekend for three variables: minutesCharging, minutesAvailable and minutesTotal.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 1200, 100)
fig13, (ax1_fig13, ax2_fig13, ax3_fig13) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig13.hist(caltech_week_2019_df["minutesCharging"], bins_minutes, alpha=0.5, label='week')
ax1_fig13.hist(caltech_weekend_2019_df["minutesCharging"], bins_minutes, alpha=0.9, label='weekend')
ax1_fig13.set_ylim([0, 1000])
ax1_fig13.set_title("Minutes Charging")
ax1_fig13.set_xlabel("Minutes")
ax1_fig13.set_ylabel("Number of Sessions")
ax2_fig13.hist(caltech_week_2019_df["minutesTotal"], bins_minutes, alpha=0.5, label='week')
ax2_fig13.hist(caltech_weekend_2019_df["minutesTotal"], bins_minutes, alpha=0.9, label='weekend')
ax2_fig13.set_ylim([0, 1000])
ax2_fig13.set_title("Minutes Total")
ax2_fig13.set_xlabel("Minutes")
ax2_fig13.set_ylabel("Number of Sessions")
ax3_fig13.hist(caltech_week_2019_df["minutesAvailable"], bins_minutes, alpha=0.5, label='week')
ax3_fig13.hist(caltech_weekend_2019_df["minutesAvailable"], bins_minutes, alpha=0.9, label='weekend')
ax3_fig13.set_ylim([0, 1000])
ax3_fig13.set_title("Minutes Available (user input)")
ax3_fig13.set_xlabel("Minutes")
ax3_fig13.set_ylabel("Number of Sessions")
fig13.suptitle('Caltech site - Week/Weekend - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 1200, 100)
fig14, (ax1_fig14, ax2_fig14, ax3_fig14) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig14.hist(jpl_week_2019_df["minutesCharging"], bins_minutes, alpha=0.5, label='week')
ax1_fig14.hist(jpl_weekend_2019_df["minutesCharging"], bins_minutes, alpha=0.9, label='weekend')
ax1_fig14.set_ylim([0, 1000])
ax1_fig14.set_title("Minutes Charging")
ax1_fig14.set_xlabel("Minutes")
ax1_fig14.set_ylabel("Number of Sessions")
ax2_fig14.hist(jpl_week_2019_df["minutesTotal"], bins_minutes, alpha=0.5, label='week')
ax2_fig14.hist(jpl_weekend_2019_df["minutesTotal"], bins_minutes, alpha=0.9, label='weekend')
ax2_fig14.set_ylim([0, 1000])
ax2_fig14.set_title("Minutes Total")
ax2_fig14.set_xlabel("Minutes")
ax2_fig14.set_ylabel("Number of Sessions")
ax3_fig14.hist(jpl_week_2019_df["minutesAvailable"], bins_minutes, alpha=0.5, label='week')
ax3_fig14.hist(jpl_weekend_2019_df["minutesAvailable"], bins_minutes, alpha=0.9, label='weekend')
ax3_fig14.set_ylim([0, 1000])
ax3_fig14.set_title("Minutes Available (user input)")
ax3_fig14.set_xlabel("Minutes")
ax3_fig14.set_ylabel("Number of Sessions")
fig14.suptitle('Jpl site - Week/Weekend - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for minutesCharging, minutesTotal and minutesAvailable

Caltech: Activity at the Caltech site during the weekend is not negligible. A peak similar to the weekday one is observed for sessions in which the user charges the vehicle for 1-2 hours. However, the second peak seen in the weekday minutesTotal curve is absent during the weekend, which is expected.
Jpl: Activity at the Jpl site during the weekend is negligible compared with the weekday patterns, and no clear pattern is observed. Probably only workers who need to handle something urgent go to the office during the weekend, which would explain the high variance in the minutesTotal the EV spends plugged in.

Plotting more histograms: kWhDelivered, kWhRequested

Next we will plot the histogram to analyze the week/weekend behavior for the kWhDelivered, kWhRequested variables.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 100, 100)
fig15, (ax1_fig15, ax2_fig15) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig15.hist(caltech_week_2019_df['kWhDelivered'], bins_minutes, alpha=0.5, label='week')
ax1_fig15.hist(caltech_weekend_2019_df['kWhDelivered'], bins_minutes, alpha=0.9, label='weekend')
ax1_fig15.set_ylim([0, 2000])
ax1_fig15.set_title("kWh Delivered")
ax1_fig15.set_xlabel("kWh")
ax1_fig15.set_ylabel("Number of Sessions")
ax2_fig15.hist(caltech_week_2019_df['kWhRequested'], bins_minutes, alpha=0.5, label='week')
ax2_fig15.hist(caltech_weekend_2019_df['kWhRequested'], bins_minutes, alpha=0.9, label='weekend')
ax2_fig15.set_ylim([0, 2000])
ax2_fig15.set_title("kWh Requested")
ax2_fig15.set_xlabel("kWh")
ax2_fig15.set_ylabel("Number of Sessions")
fig15.suptitle('Caltech site - Week/Weekend - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 100, 100)
fig16, (ax1_fig16, ax2_fig16) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig16.hist(jpl_week_2019_df["kWhDelivered"], bins_minutes, alpha=0.5, label='week')
ax1_fig16.hist(jpl_weekend_2019_df["kWhDelivered"], bins_minutes, alpha=0.9, label='weekend')
ax1_fig16.set_ylim([0, 2000])
ax1_fig16.set_title("kWh Delivered")
ax1_fig16.set_xlabel("kWh")
ax1_fig16.set_ylabel("Number of Sessions")
ax2_fig16.hist(jpl_week_2019_df["kWhRequested"], bins_minutes, alpha=0.5, label='week')
ax2_fig16.hist(jpl_weekend_2019_df["kWhRequested"], bins_minutes, alpha=0.9, label='weekend')
ax2_fig16.set_ylim([0, 2000])
ax2_fig16.set_title("kWh Requested")
ax2_fig16.set_xlabel("kWh")
ax2_fig16.set_ylabel("Number of Sessions")
fig16.suptitle('Jpl site - Week/Weekend - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for kWhDelivered, kWhRequested

Caltech: Most of the time less than 20 kWh was delivered during the weekend.
Jpl: High variability in the kWh delivered during the weekend.

Calculating statistics...

Next we will calculate the following statistics to analyze the week/weekend driver charging behavior.

#collapse-hide
# statistics for caltech site

# week
print ("Caltech - Week:")
print ("minutes idle (mean): " + str(caltech_week_2019_df['minutesIdle'].mean()))
print ("MAE minutes available (mean): " + str((abs(caltech_week_2019_df['minutesAvailable']-caltech_week_2019_df['minutesTotal'])).mean()))
print ("kWh delivered (mean): " + str(caltech_week_2019_df['kWhDelivered'].mean()))
print ("error kwh requested (mean): " + str((abs(caltech_week_2019_df['kWhRequested']-caltech_week_2019_df['kWhDelivered'])).mean()))
# weekend
print ("Caltech - Weekend: ")
print ("minutes idle (mean): " + str(caltech_weekend_2019_df['minutesIdle'].mean()))
print ("error minutes available (mean): " + str((abs(caltech_weekend_2019_df['minutesAvailable']-caltech_weekend_2019_df['minutesTotal'])).mean()))
print ("kWh delivered (mean): " + str(caltech_weekend_2019_df['kWhDelivered'].mean()))
print ("MAE kwh requested (mean): " + str((abs(caltech_weekend_2019_df['kWhRequested']-caltech_weekend_2019_df['kWhDelivered'])).mean()))
      
# statistics for jpl site
      
# week
print ("Jpl - Week:")
print ("minutes idle (mean): " + str(jpl_week_2019_df['minutesIdle'].mean()))
print ("MAE minutes available (mean): " + str((abs(jpl_week_2019_df['minutesAvailable']-jpl_week_2019_df['minutesTotal'])).mean()))
print ("kWh delivered (mean): " + str(jpl_week_2019_df['kWhDelivered'].mean()))
print ("MAE kwh requested (mean): " + str((abs(jpl_week_2019_df['kWhRequested']-jpl_week_2019_df['kWhDelivered'])).mean()))
# weekend
print ("Jpl - Weekend:")
print ("minutes idle (mean): " + str(jpl_weekend_2019_df['minutesIdle'].mean()))
print ("MAE minutes available (mean): " + str((abs(jpl_weekend_2019_df['minutesAvailable']-jpl_weekend_2019_df['minutesTotal'])).mean()))
print ("kWh delivered (mean): " + str(jpl_weekend_2019_df['kWhDelivered'].mean()))
print ("MAE kwh requested (mean): " + str((abs(jpl_weekend_2019_df['kWhRequested']-jpl_weekend_2019_df['kWhDelivered'])).mean()))

Caltech - Week:
minutes idle (mean): 242.76089396284647
MAE minutes available (mean): 153.6992797557619
kWh delivered (mean): 10.095075868055979
error kwh requested (mean): 8.976210088616263
Caltech - Weekend: 
minutes idle (mean): 114.26996396396356
error minutes available (mean): 132.567927927928
kWh delivered (mean): 12.135514594594587
MAE kwh requested (mean): 13.451090810810827
Jpl - Week:
minutes idle (mean): 376.0737453090756
MAE minutes available (mean): 130.5036815951174
kWh delivered (mean): 15.04889469734139
MAE kwh requested (mean): 12.181319630706538
Jpl - Weekend:
minutes idle (mean): 413.0394474637677
MAE minutes available (mean): 142.65117753623193
kWh delivered (mean): 13.93736217466789
MAE kwh requested (mean): 13.609507390549508

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech - Week	Caltech - Weekend	Jpl - Week	Jpl - Weekend
minutes idle (min)	242.76	114.27	376.07	413.04
MAE minAvailable (min)	153.70	132.57	130.50	142.65
kWh delivered (kWh)	10.10	12.14	15.05	13.94
MAE kWhRequested (kWh)	8.98	13.45	12.18	13.61

At the Caltech site, the time the vehicle is idle drops sharply during the weekend because the minutesTotal spent at the EVSE also decreases. At the Jpl site, however, the idle time increases compared with the weekday pattern.

The error in the minutes available for charging entered by the user in the app is also comparable across the 4 cases. In practice, the value entered by the user is off, on average, by more than 2 hours from the time at which the EV is actually unplugged. It is interesting to observe that this error has been consistent throughout all the analyses done so far.

The average kWh delivered increases at the Caltech site during the weekend even though minutesTotal decreases, which is an interesting pattern. At the Jpl site, the kWh delivered during the weekend decreases by around 1 kWh.

The high magnitude of the kWh requested error is consistent with the values obtained in the analysis of the whole year of 2019.
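The four rows of the table can also be produced in a single pass rather than with one print statement per statistic. The following is a minimal sketch on toy data; the helper name `week_weekend_summary` and the toy values are hypothetical, while the column names follow the ones used in this notebook.

```python
import pandas as pd

def week_weekend_summary(df):
    """Mean idle time, delivered energy and absolute errors, split by week/weekend."""
    period = df["day"].isin(["Saturday", "Sunday"]).map({True: "weekend", False: "week"})
    per_session = pd.DataFrame({
        "minutesIdle": df["minutesIdle"],
        "maeMinAvailable": (df["minutesAvailable"] - df["minutesTotal"]).abs(),
        "kWhDelivered": df["kWhDelivered"],
        "maekWhRequested": (df["kWhRequested"] - df["kWhDelivered"]).abs(),
    })
    return per_session.groupby(period.rename("period")).mean()

# Toy sessions (hypothetical values).
toy_df = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Saturday", "Sunday"],
    "minutesIdle": [200.0, 300.0, 100.0, 120.0],
    "minutesAvailable": [480.0, 500.0, 120.0, 200.0],
    "minutesTotal": [400.0, 560.0, 180.0, 150.0],
    "kWhRequested": [20.0, 30.0, 25.0, 15.0],
    "kWhDelivered": [10.0, 18.0, 12.0, 9.0],
})

summary = week_weekend_summary(toy_df)
print(summary.round(2))
```

Calling `week_weekend_summary(caltech_2019_df)` and `week_weekend_summary(jpl_2019_df)` would give one two-row table per site, which keeps the labels consistent across statistics.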

Plotting the number of unique users and their number of charges histogram pattern

Next, we calculate the statistics related to the number of unique users in each of the sites, the average number of charges of each user and how this pattern is observed on a histogram.

#collapse-hide
# user statistics for caltech site

# week
print ("Caltech - Week:")
print("number of unique users: "+ str(caltech_week_2019_df['userID'].nunique()))
print("average sessions by user: " + str(caltech_week_2019_df['userID'].count()/caltech_week_2019_df['userID'].nunique()))
# weekend
print ("Caltech - Weekend:")
print("number of unique users: "+ str(caltech_weekend_2019_df['userID'].nunique()))
print("average sessions by user: " + str(caltech_weekend_2019_df['userID'].count()/caltech_weekend_2019_df['userID'].nunique()))

# user statistics for jpl site

# week
print ("Jpl - Week:")
print("number of unique users: "+ str(jpl_week_2019_df['userID'].nunique()))
print("average sessions by user: " + str(jpl_week_2019_df['userID'].count()/jpl_week_2019_df['userID'].nunique()))
# weekend
print ("Jpl - Weekend:")
print("number of unique users: "+ str(jpl_weekend_2019_df['userID'].nunique()))
print("average sessions by user: " + str(jpl_weekend_2019_df['userID'].count()/jpl_weekend_2019_df['userID'].nunique()))

# plot the histogram of the number of sessions per user (value counts by userID)
# one panel per site and period: caltech week/weekend, jpl week/weekend
bins_minutes = np.linspace(0, 360, 60)
fig17, (ax1_fig17, ax2_fig17, ax3_fig17, ax4_fig17) = plt.subplots(1, 4, figsize=(15, 5))
ax1_fig17.hist(caltech_week_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='caltech-week')
ax1_fig17.set_ylim([0, 150])
ax1_fig17.set_title("Caltech - Week")
ax1_fig17.set_xlabel("Number of Sessions")
ax1_fig17.set_ylabel("Number of Users")
ax2_fig17.hist(caltech_weekend_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='caltech-weekend')
ax2_fig17.set_ylim([0, 150])
ax2_fig17.set_title("Caltech - Weekend")
ax2_fig17.set_xlabel("Number of Sessions")
ax2_fig17.set_ylabel("Number of Users")
ax3_fig17.hist(jpl_week_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='jpl-week')
ax3_fig17.set_ylim([0, 150])
ax3_fig17.set_title("Jpl - Week")
ax3_fig17.set_xlabel("Number of Sessions")
ax3_fig17.set_ylabel("Number of Users")
ax4_fig17.hist(jpl_weekend_2019_df['userID'].value_counts(), bins_minutes, alpha=0.9, label='jpl-weekend')
ax4_fig17.set_ylim([0, 150])
ax4_fig17.set_title("Jpl - Weekend")
ax4_fig17.set_xlabel("Number of Sessions")
ax4_fig17.set_ylabel("Number of Users")
fig17.suptitle('Number of sessions by user - Week/Weekend', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Caltech - Week:
number of unique users: 296
average sessions by user: 26.18918918918919
Caltech - Weekend:
number of unique users: 148
average sessions by user: 6.25
Jpl - Week:
number of unique users: 362
average sessions by user: 44.65745856353591
Jpl - Weekend:
number of unique users: 95
average sessions by user: 3.873684210526316

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech - Week	Caltech - Weekend	Jpl - Week	Jpl - Weekend
# unique users	296	148	362	95
avg sessions by user	26.19	6.25	44.66	3.87

The table shows that almost half of the unique Caltech users (148 of 319) use the charging network during the weekend. It is interesting to observe that, of the 319 unique users over the whole year of 2019, only 296 charged at least once on a weekday, so 23 users used the charging network only during the weekend. At the Jpl site, 362 of the 367 unique users charged at least once on a weekday. The share of unique users who charged during the weekend is lower at the Jpl site (95 of 367) than at the Caltech site.

The average number of sessions per user during the weekend also confirms that the Caltech site is used more during the weekend than the Jpl site.
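The split between weekday-only, weekend-only and mixed users can be checked with plain set operations. This is a small sketch: the helper name `user_overlap` and the toy IDs are hypothetical, and the final lines only re-derive counts from the Caltech figures reported in this section.

```python
def user_overlap(week_user_ids, weekend_user_ids):
    """Count users seen only on weekdays, only on weekends, or in both subsets."""
    week_users = set(week_user_ids)
    weekend_users = set(weekend_user_ids)
    return {
        "all": len(week_users | weekend_users),
        "week_only": len(week_users - weekend_users),
        "weekend_only": len(weekend_users - week_users),
        "both": len(week_users & weekend_users),
    }

# Toy IDs: user 3 charges in both periods, user 4 only at weekends.
toy_counts = user_overlap([1, 2, 3], [3, 4])
print(toy_counts)

# Cross-check with the Caltech figures: 319 users overall, 296 on weekdays, 148 at weekends.
n_all, n_week, n_weekend = 319, 296, 148
n_weekend_only = n_all - n_week          # users never seen on a weekday
n_both = n_weekend - n_weekend_only      # weekend users also seen on weekdays
print(n_weekend_only, n_both)
```

With the real dataframes, the inputs would be `caltech_week_2019_df['userID'].dropna()` and `caltech_weekend_2019_df['userID'].dropna()`; dropping missing IDs first matters because NaN values do not compare equal to each other inside a set.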

Finally, the histograms of the number of sessions by user are analyzed.

A pattern similar to the one for the entire year of 2019 is observed.

Analysis 2 - Conclusion

In this section we investigated the driver charging behavior in two sites, Caltech and Jpl, using Adaptive Charging Networks during the year of 2019.

We observe that the minutesCharging and kWhDelivered histograms follow the same pattern for both sites, which is expected as they use the same adaptive scheduling algorithm. We observe different patterns for the minutesTotal histogram, as one of the sites is open to the public and the other one is a workplace environment.
We observe that the EV is idle for around 302.98 minutes (about 5 hours) per session on average across the two sites, a large amount of time that could be exploited by more flexible frameworks such as V2G. We also observe a high error of 2.35 hours between the minutes available that the user enters in the app and the total minutes the EV actually stays plugged in, which opens up the possibility of relying not only on the information entered by the user but also on individual models built for each user. We observe a high error also between the kWh requested by the user and the kWh actually delivered. A deeper analysis is needed to check whether this occurs because of the adaptive scheduling algorithm or because the user tends to enter a kWhRequested value that is much higher than the energy needed to completely charge the EV battery.
We observe very different driver charging behavior during the week and during the weekend. This shows that a single general model cannot be applied in every situation, and separate weekday/weekend models are desirable. However, as the number of users decreases significantly during the weekend at the Jpl site, this is not crucial there. For the Caltech site, on the other hand, this mismatch is more evident. Finally, a high number of users used the charging station for fewer than 10 sessions in the period of this analysis.
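The cross-site averages quoted in this conclusion can be re-derived from the per-site means reported earlier in this analysis. The snippet below is a small arithmetic check; it assumes an unweighted mean over the two sites (ignoring that the sites have different numbers of sessions), and the helper name `mean_in_hours` is hypothetical.

```python
# Per-site means for 2019 in minutes, rounded from the printed statistics.
minutes_idle = {"Caltech": 229.06, "Jpl": 376.90}
mae_minutes_available = {"Caltech": 151.45, "Jpl": 130.77}

def mean_in_hours(per_site_minutes):
    """Unweighted mean across sites, converted from minutes to hours."""
    values = list(per_site_minutes.values())
    return sum(values) / len(values) / 60.0

idle_hours = round(mean_in_hours(minutes_idle), 2)
error_hours = round(mean_in_hours(mae_minutes_available), 2)
print(idle_hours, error_hours)
```

A session-weighted mean would lean towards the Jpl values, since that site recorded more sessions in 2019.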

ANALYSIS 3: INDIVIDUAL DRIVER CHARGING BEHAVIOR IN THE YEAR OF 2019

The main focus of this work is the analysis of individual driver charging behavior, which begins in this analysis.

In this analysis the objective is to get insights on how to model and statistically characterise individual driver charging behavior for the subsequent development of adaptive scheduling algorithms. One of the challenges that arises from this analysis and from the subsequent tasks is how reliable the modeling of individual behavior can be when data is limited. From Analysis 1 and Analysis 2 we already saw that a large share of the users charged their EVs fewer than 10 times using the Adaptive Charging Network.

Therefore, in summary, for the 2 sites studied, Caltech and Jpl, individual driver charging behavior for the year of 2019 will be investigated.

In this analysis, the objectives are to:

Calculate the statistics related to the difference between general and individual driver charging behavior.
Analyze the 3 users with the most charging sessions at the Caltech site.
Analyze the 3 users with the most charging sessions at the JPL site.
Propose a Reliability and a Predictability index for individual driver charging behavior.

Next, we calculate the characteristics that will be used to compare general and individual driver charging behavior for the minutesAvailable variable.

#collapse-hide
# caltech site
caltech_users = caltech_2019_df['userID'].unique()
caltech_minutesAvailable = []
caltech_minutesAvailable = np.array(caltech_minutesAvailable)
caltech_minutesAvailableStd = []
caltech_minutesAvailableStd = np.array(caltech_minutesAvailableStd)
for user in caltech_users:
    caltech_minutesAvailable = np.append(caltech_minutesAvailable, caltech_2019_df[caltech_2019_df.userID.isin([user])].minutesAvailable.mean())
    caltech_minutesAvailableStd = np.append(caltech_minutesAvailableStd, caltech_2019_df[caltech_2019_df.userID.isin([user])].minutesAvailable.std())
print("Caltech - Minutes Available Average: " + str(np.nanmean(caltech_minutesAvailable)))
print("Caltech - General Minutes Available Std: " + str(np.nanstd(caltech_minutesAvailable)))
print("Caltech - Individual Minutes Available Std: " + str(np.nanmean(caltech_minutesAvailableStd)))

# jpl site
jpl_users = jpl_2019_df['userID'].unique()
jpl_minutesAvailable = []
jpl_minutesAvailable = np.array(jpl_minutesAvailable)
jpl_minutesAvailableStd = []
jpl_minutesAvailableStd = np.array(jpl_minutesAvailableStd)
for user in jpl_users:
    jpl_minutesAvailable = np.append(jpl_minutesAvailable, jpl_2019_df[jpl_2019_df.userID.isin([user])].minutesAvailable.mean())
    jpl_minutesAvailableStd = np.append(jpl_minutesAvailableStd, jpl_2019_df[jpl_2019_df.userID.isin([user])].minutesAvailable.std())
print("Jpl - Minutes Available Average: " + str(np.nanmean(jpl_minutesAvailable)))
print("Jpl - General Minutes Available Std: " + str(np.nanstd(jpl_minutesAvailable)))
print("Jpl - Individual Minutes Available Std: " + str(np.nanmean(jpl_minutesAvailableStd)))

Caltech - Minutes Available Average: 269.80762246167006
Caltech - General Minutes Available Std: 169.82840961752086
Caltech - Individual Minutes Available Std: 85.05675868997399
Jpl - Minutes Available Average: 332.38003276740227
Jpl - General Minutes Available Std: 154.21344880288214
Jpl - Individual Minutes Available Std: 111.83956584517233

Analyzing the statistics obtained...

The statistics obtained for the minAvailable variable are presented in the next table.

Statistic	Caltech	Jpl
avg minutes available	269.81	332.38
general minutes available std	169.83	154.21
individual minutes available std	85.06	111.84

It is shown that, while the general standard deviation of the minAvailable variable (the standard deviation of the per-user means) is 169.83 minutes for the Caltech site, the individual standard deviation (the average of the per-user standard deviations) is just 85.06 minutes. For the JPL site, the general standard deviation is 154.21 minutes, while the individual standard deviation is 111.84 minutes.

This shows the potential of individually characterising each user regarding the minAvailable variable.
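The per-user loop can also be expressed with a pandas groupby. The following is a minimal sketch of the same three statistics on toy data; the helper name `general_vs_individual_std` is hypothetical, and one assumed difference is that groupby skips rows with a missing userID by default, so results could differ slightly if such rows exist.

```python
import numpy as np
import pandas as pd

def general_vs_individual_std(df, column, user_column="userID"):
    """Average of per-user means, std of per-user means, mean of per-user stds."""
    per_user = df.groupby(user_column)[column].agg(["mean", "std"])
    return {
        "average": np.nanmean(per_user["mean"].to_numpy()),
        "general_std": np.nanstd(per_user["mean"].to_numpy()),
        "individual_std": np.nanmean(per_user["std"].to_numpy()),
    }

# Toy data: users 1 and 2 are very regular but differ from each other;
# user 3 has a single session, so its std is NaN and is ignored by nanmean.
toy_df = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2, 2, 3],
    "minutesAvailable": [100.0, 110.0, 120.0, 400.0, 410.0, 420.0, 250.0],
})

toy_stats = general_vs_individual_std(toy_df, "minutesAvailable")
print(toy_stats)
```

In the toy data the spread between users is much larger than the spread within each user, which is the same qualitative picture as in the table. The same helper could be reused for other columns, e.g. `general_vs_individual_std(caltech_2019_df, "kWhRequested")`.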

Next, we calculate the characteristics that will be used to compare general and individual driver charging behavior for the kWhRequested variable.

#collapse-hide
# caltech site
caltech_kWhRequested = []
caltech_kWhRequested = np.array(caltech_kWhRequested)
caltech_kWhRequestedStd = []
caltech_kWhRequestedStd = np.array(caltech_kWhRequestedStd)
for user in caltech_users:
    caltech_kWhRequested = np.append(caltech_kWhRequested, caltech_2019_df[caltech_2019_df.userID.isin([user])].kWhRequested.mean())
    caltech_kWhRequestedStd = np.append(caltech_kWhRequestedStd, caltech_2019_df[caltech_2019_df.userID.isin([user])].kWhRequested.std())
print("Caltech - kWh Requested Average: " + str(np.nanmean(caltech_kWhRequested)))
print("Caltech - General kWh Requested Std: " + str(np.nanstd(caltech_kWhRequested)))
print("Caltech - Individual kWh Requested Std: " + str(np.nanmean(caltech_kWhRequestedStd)))

# jpl site
jpl_kWhRequested = []
jpl_kWhRequested = np.array(jpl_kWhRequested)
jpl_kWhRequestedStd = []
jpl_kWhRequestedStd = np.array(jpl_kWhRequestedStd)
for user in jpl_users:
    jpl_kWhRequested = np.append(jpl_kWhRequested, jpl_2019_df[jpl_2019_df.userID.isin([user])].kWhRequested.mean())
    jpl_kWhRequestedStd = np.append(jpl_kWhRequestedStd, jpl_2019_df[jpl_2019_df.userID.isin([user])].kWhRequested.std())
print("Jpl - kWh Requested Average: " + str(np.nanmean(jpl_kWhRequested)))
print("Jpl - General kWh Requested Std: " + str(np.nanstd(jpl_kWhRequested)))
print("Jpl - Individual kWh Requested Std: " + str(np.nanmean(jpl_kWhRequestedStd)))

Caltech - kWh Requested Average: 23.88094788819109
Caltech - General kWh Requested Std: 19.024704462659574
Caltech - Individual kWh Requested Std: 6.68394996775694
Jpl - kWh Requested Average: 26.113976620812284
Jpl - General kWh Requested Std: 17.416047572238796
Jpl - Individual kWh Requested Std: 7.929428991585556

Analyzing the statistics obtained...

The statistics obtained for the kWhRequested variable are presented in the next table.

Statistic	Caltech	Jpl
avg kwh requested	23.88	26.11
general kwh requested std	19.02	17.42
individual kwh requested std	6.68	7.93

It is shown that, while the general standard deviation of the kWhRequested variable is 19.02 kWh for the Caltech site, the individual standard deviation of the same variable is just 6.68 kWh. For the JPL site, the general standard deviation of the kWhRequested variable is 17.42 kWh, while the individual standard deviation is 7.93 kWh.

This shows the potential of individually characterising each user regarding the kWhRequested variable.

Next, we select the 3 users with the most charging sessions recorded for the year of 2019 at the Caltech and Jpl sites.

#collapse-hide
print("Caltech: ")
print(caltech_2019_df['userID'].value_counts())

print("Jpl: ")
print(jpl_2019_df['userID'].value_counts())

# filter 3 users - caltech
caltech_user743_df = caltech_2019_df[caltech_2019_df.userID.isin([743.0])]
caltech_user562_df = caltech_2019_df[caltech_2019_df.userID.isin([562.0])]
caltech_user891_df = caltech_2019_df[caltech_2019_df.userID.isin([891.0])]

# filter 3 users - jpl
jpl_user651_df = jpl_2019_df[jpl_2019_df.userID.isin([651.0])]
jpl_user933_df = jpl_2019_df[jpl_2019_df.userID.isin([933.0])]
jpl_user406_df = jpl_2019_df[jpl_2019_df.userID.isin([406.0])]

Caltech: 
743.0     297
562.0     254
891.0     222
1470.0    199
712.0     197
         ... 
3590.0      1
3020.0      1
419.0       1
428.0       1
3968.0      1
Name: userID, Length: 319, dtype: int64
Jpl: 
651.0     237
933.0     236
406.0     232
405.0     206
483.0     203
         ... 
2319.0      1
433.0       1
1663.0      1
331.0       1
2525.0      1
Name: userID, Length: 367, dtype: int64

The 3 users with the most charging sessions at the Caltech site are:

User 743 with 297 charging sessions
User 562 with 254 charging sessions
User 891 with 222 charging sessions

The 3 users with the most charging sessions at the Jpl site are:

User 651 with 237 charging sessions
User 933 with 236 charging sessions
User 406 with 232 charging sessions
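
The user IDs are hard-coded in the filtering cell. A minimal sketch of selecting them programmatically is given below, assuming a DataFrame with a `userID` column such as `caltech_2019_df`. The helper names `top_users` and `sessions_by_user` and the toy data are hypothetical.

```python
import pandas as pd

def top_users(df, n=3):
    """Return the n userIDs with the most sessions, most frequent first."""
    return df["userID"].value_counts().head(n).index.tolist()

def sessions_by_user(df, user_ids):
    """Map each userID to the sub-DataFrame holding only that user's sessions."""
    return {uid: df[df["userID"] == uid] for uid in user_ids}

# Hypothetical toy data standing in for caltech_2019_df.
toy_df = pd.DataFrame({"userID": [743.0] * 4 + [562.0] * 3 + [891.0] * 2 + [419.0]})

ids = top_users(toy_df)
per_user = sessions_by_user(toy_df, ids)
print(ids)
print({uid: len(sessions) for uid, sessions in per_user.items()})
```

Keeping the per-user DataFrames in a dictionary avoids one variable per user and makes it easy to change the number of users analyzed.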

Next, we show the behavior of the 3 users with the most charging sessions recorded during 2019 at the Caltech and Jpl sites.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 1200, 100)
fig18, (ax1_fig18, ax2_fig18, ax3_fig18) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig18.hist(caltech_user743_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 743')
ax1_fig18.hist(caltech_user562_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 562')
ax1_fig18.hist(caltech_user891_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 891')
ax1_fig18.set_ylim([0, 140])
ax1_fig18.set_title("Minutes Charging")
ax1_fig18.set_xlabel("Minutes")
ax1_fig18.set_ylabel("Number of Sessions")
ax2_fig18.hist(caltech_user743_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 743')
ax2_fig18.hist(caltech_user562_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 562')
ax2_fig18.hist(caltech_user891_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 891')
ax2_fig18.set_ylim([0, 140])
ax2_fig18.set_title("Minutes Total")
ax2_fig18.set_xlabel("Minutes")
ax2_fig18.set_ylabel("Number of Sessions")
ax3_fig18.hist(caltech_user743_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 743')
ax3_fig18.hist(caltech_user562_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 562')
ax3_fig18.hist(caltech_user891_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 891')
ax3_fig18.set_ylim([0, 140])
ax3_fig18.set_title("Minutes Available (user input)")
ax3_fig18.set_xlabel("Minutes")
ax3_fig18.set_ylabel("Number of Sessions")
fig18.suptitle('Caltech site - Individual Driver Charging Behavior - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 1200, 100)
fig19, (ax1_fig19, ax2_fig19, ax3_fig19) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig19.hist(jpl_user651_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 651')
ax1_fig19.hist(jpl_user933_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 933')
ax1_fig19.hist(jpl_user406_df["minutesCharging"], bins_minutes, alpha=0.3, label='user 406')
ax1_fig19.set_ylim([0, 140])
ax1_fig19.set_title("Minutes Charging")
ax1_fig19.set_xlabel("Minutes")
ax1_fig19.set_ylabel("Number of Sessions")
ax2_fig19.hist(jpl_user651_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 651')
ax2_fig19.hist(jpl_user933_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 933')
ax2_fig19.hist(jpl_user406_df["minutesTotal"], bins_minutes, alpha=0.3, label='user 406')
ax2_fig19.set_ylim([0, 140])
ax2_fig19.set_title("Minutes Total")
ax2_fig19.set_xlabel("Minutes")
ax2_fig19.set_ylabel("Number of Sessions")
ax3_fig19.hist(jpl_user651_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 651')
ax3_fig19.hist(jpl_user933_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 933')
ax3_fig19.hist(jpl_user406_df["minutesAvailable"], bins_minutes, alpha=0.3, label='user 406')
ax3_fig19.set_ylim([0, 140])
ax3_fig19.set_title("Minutes Available (user input)")
ax3_fig19.set_xlabel("Minutes")
ax3_fig19.set_ylabel("Number of Sessions")
fig19.suptitle('Jpl site - Individual Driver Charging Behavior - Time Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for minutesCharging, minutesTotal and minutesAvailable

For the Caltech site:

User 743: minutesTotal shows two clusters, one peaking around 100 minutes and another peaking around 500 minutes. The same pattern is observed for minutesAvailable.
User 562: minutesTotal peaks around 200 minutes, and minutesAvailable follows the same pattern. However, the user often seems to leave the EV at the EVSE longer than the time entered in minutesAvailable.
User 891: The behavior is more spread out, with minutesTotal peaks around 100 minutes and around 400 minutes.

For the Jpl site:

User 651: The histogram is roughly uniform between 200 and 500 minutes.
User 406: This user consistently stays about the same amount of time at the Jpl site, and the error between the minutesAvailable input and minutesTotal is low.
User 933: The histogram is roughly uniform but shifted toward longer stays, between 500 and 800 minutes.

It is noted that, even within the same workplace and working hours, users have different charging patterns.

It is also interesting to observe how the minutesCharging histograms of the top 3 users differ between the Jpl and Caltech sites.

Plotting more histograms: kWhDelivered, kWhRequested

Next we will plot the histogram to analyze the behavior of the users for the variables kWhDelivered and kWhRequested.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 100, 100)
fig20, (ax1_fig20, ax2_fig20) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig20.hist(caltech_user743_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 743')
ax1_fig20.hist(caltech_user562_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 562')
ax1_fig20.hist(caltech_user891_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 891')
ax1_fig20.set_ylim([0, 300])
ax1_fig20.set_title("kWh Delivered")
ax1_fig20.set_xlabel("kWh")
ax1_fig20.set_ylabel("Number of Sessions")
ax2_fig20.hist(caltech_user743_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 743')
ax2_fig20.hist(caltech_user562_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 562')
ax2_fig20.hist(caltech_user891_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 891')
ax2_fig20.set_ylim([0, 300])
ax2_fig20.set_title("kWh Requested")
ax2_fig20.set_xlabel("kWh")
ax2_fig20.set_ylabel("Number of Sessions")
fig20.suptitle('Caltech site - Individual Driver Charging Behavior - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 100, 100)
fig21, (ax1_fig21, ax2_fig21) = plt.subplots(1, 2, figsize=(15, 5))
ax1_fig21.hist(jpl_user651_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 651')
ax1_fig21.hist(jpl_user933_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 933')
ax1_fig21.hist(jpl_user406_df['kWhDelivered'], bins_minutes, alpha=0.5, label='user 406')
ax1_fig21.set_ylim([0, 100])
ax1_fig21.set_title("kWh Delivered")
ax1_fig21.set_xlabel("kWh")
ax1_fig21.set_ylabel("Number of Sessions")
ax2_fig21.hist(jpl_user651_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 651')
ax2_fig21.hist(jpl_user933_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 933')
ax2_fig21.hist(jpl_user406_df['kWhRequested'], bins_minutes, alpha=0.5, label='user 406')
ax2_fig21.set_ylim([0, 100])
ax2_fig21.set_title("kWh Requested")
ax2_fig21.set_xlabel("kWh")
ax2_fig21.set_ylabel("Number of Sessions")
fig21.suptitle('Jpl site - Individual Driver Charging Behavior - Energy Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the histograms for kWhDelivered, kWhRequested

For the Caltech site:

User 743: This user usually requests the same amount of energy.
User 562: This user usually requests one of two values.
User 891: The behavior is more spread out, but the requests are below 20 kWh.

For the Jpl site:

User 651: This user usually requests the same amount of energy, 20 kWh.
User 406: Despite staying about the same amount of time at the Jpl site, this user requests 3 different amounts of energy.
User 933: The kWhRequested histogram is roughly uniform and starts above 30 kWh. This user is an outlier.

It is noted that, even within the same workplace and working hours, users have different charging patterns.

Users usually request less than 20 kWh, but user 933 at the Jpl site is an outlier and charges the vehicle with more than 20 kWh in each charging session.
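
The "more than 20 kWh" observation can be quantified per user as the share of sessions whose request exceeds a threshold. The following is a minimal sketch: the 20 kWh threshold comes from the text, while the function name `share_above` and the toy sessions are hypothetical.

```python
import pandas as pd

def share_above(df, column="kWhRequested", threshold=20.0):
    """Fraction of each user's sessions where `column` exceeds `threshold`."""
    return (df[column] > threshold).groupby(df["userID"]).mean()

# Hypothetical toy sessions (not taken from ACN-Data).
toy_df = pd.DataFrame({
    "userID": [651, 651, 651, 651, 933, 933, 933, 933],
    "kWhRequested": [20.0, 20.0, 18.0, 22.0, 35.0, 50.0, 42.0, 60.0],
})

shares = share_above(toy_df)
print(shares)
```

A share close to 1 for a single user and close to 0 for the others would support treating that user as an outlier.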

Calculating statistics...

Next we calculate the following statistics for each user: the mean minutesIdle, the mean kWhDelivered, the MAE, SMAPE and SMPE of minutesAvailable, and the MAE and SMAPE of kWhRequested.

#collapse-hide
# statistics for caltech site

# user 743
print ("Caltech - User 743:")
print ("minutes idle (mean): " + str(caltech_user743_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(caltech_user743_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(caltech_user743_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(caltech_user743_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(caltech_user743_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(caltech_user743_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(caltech_user743_df["smapekWhRequested"].mean()))
# user 562
print ("Caltech - User 562: ")
print ("minutes idle (mean): " + str(caltech_user562_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(caltech_user562_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(caltech_user562_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(caltech_user562_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(caltech_user562_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(caltech_user562_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(caltech_user562_df["smapekWhRequested"].mean()))
# user 891
print ("Caltech - User 891: ")
print ("minutes idle (mean): " + str(caltech_user891_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(caltech_user891_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(caltech_user891_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(caltech_user891_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(caltech_user891_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(caltech_user891_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(caltech_user891_df["smapekWhRequested"].mean()))
      
# statistics for jpl site
      
# user 651
print ("Jpl - User 651:")
print ("minutes idle (mean): " + str(jpl_user651_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(jpl_user651_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(jpl_user651_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(jpl_user651_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(jpl_user651_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(jpl_user651_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(jpl_user651_df["smapekWhRequested"].mean()))
# user 933
print ("Jpl - User 933:")
print ("minutes idle (mean): " + str(jpl_user933_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(jpl_user933_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(jpl_user933_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(jpl_user933_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(jpl_user933_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(jpl_user933_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(jpl_user933_df["smapekWhRequested"].mean()))
# user 406
print ("Jpl - User 406:")
print ("minutes idle (mean): " + str(jpl_user406_df['minutesIdle'].mean()))
print ("MAE minutes available: " + str(jpl_user406_df["maeMinAvailable"].mean()))
print ("SMAPE minutes available: " + str(jpl_user406_df["smapeMinAvailable"].mean()))
print ("SMPE minutes available: " + str(jpl_user406_df["smpeMinAvailable"].mean()))
print ("kWh delivered (mean): " + str(jpl_user406_df['kWhDelivered'].mean()))
print ("MAE kwh requested: " + str(jpl_user406_df["maekWhRequested"].mean()))
print ("SMAPE kwh requested: " + str(jpl_user406_df["smapekWhRequested"].mean()))

Caltech - User 743:
minutes idle (mean): 229.5993265993266
MAE minutes available: 46.93563411896745
SMAPE minutes available: 10.52924026814878
SMPE minutes available: -5.435360422045931
kWh delivered (mean): 4.083737373737374
MAE kwh requested: 3.956666666666668
SMAPE kwh requested: 33.41869837667189
Caltech - User 562: 
minutes idle (mean): 249.28123359580044
MAE minutes available: 133.23339895013115
SMAPE minutes available: 20.671464126048484
SMPE minutes available: -18.100377919084657
kWh delivered (mean): 3.8322358027356747
MAE kwh requested: 1.6163660914495852
SMAPE kwh requested: 21.70603093266066
Caltech - User 891: 
minutes idle (mean): 309.6396396396397
MAE minutes available: 180.3549549549549
SMAPE minutes available: 32.34886037164889
SMPE minutes available: -18.77987313479631
kWh delivered (mean): 2.556612612612612
MAE kwh requested: 4.5379819819819796
SMAPE kwh requested: 47.282404919536994
Jpl - User 651:
minutes idle (mean): 242.79127988748257
MAE minutes available: 193.15133614627288
SMAPE minutes available: 33.171681116713785
SMPE minutes available: -27.100288372361465
kWh delivered (mean): 12.663523206751046
MAE kwh requested: 7.024274261603373
SMAPE kwh requested: 22.800590498294774
Jpl - User 933:
minutes idle (mean): 399.6137005649721
MAE minutes available: 144.93573446327687
SMAPE minutes available: 13.585741048563026
SMPE minutes available: 1.1658135789581516
kWh delivered (mean): 35.26355927024482
MAE kwh requested: 17.5426102212806
SMAPE kwh requested: 20.64112067448836
Jpl - User 406:
minutes idle (mean): 257.71063218390805
MAE minutes available: 35.9102011494253
SMAPE minutes available: 3.516062831529398
SMPE minutes available: 1.1235969850902068
kWh delivered (mean): 10.638380340038317
MAE kwh requested: 6.68527228208812
SMAPE kwh requested: 21.65838469101727

Analyzing the statistics obtained...

The statistics obtained are presented in the next table.

Statistic	Caltech - User 743	Caltech - User 562	Caltech - User 891	Jpl - User 651	Jpl - User 933	Jpl - User 406
minutes idle (min)	229.60	249.28	309.64	242.79	399.61	257.71
MAE minAvailable (min)	46.94	133.23	180.35	193.15	144.94	35.91
SMAPE minAvailable (%)	10.53	20.67	32.35	33.17	13.59	3.52
SMPE minAvailable (%)	-5.44	-18.10	-18.78	-27.10	1.17	1.12
kWh delivered (kWh)	4.08	3.83	2.56	12.66	35.26	10.64
MAE kWhRequested (kWh)	3.96	1.62	4.54	7.02	17.54	6.69
SMAPE kWhRequested (%)	33.42	21.71	47.28	22.80	20.64	21.66

The EV sits idle at the EVSE for more than 200 minutes on average, regardless of the user analyzed.

The error in the minutes available entered in the app varies widely between users (MAE from 35.91 to 193.15 minutes), which suggests that some users provide more reliable input than others.

The average kWh delivered also varies widely between users, from 2.56 kWh to 35.26 kWh.

It is worth noting that the MAE of kWhRequested is larger for users whose kWhDelivered and kWhRequested values are larger (for example, user 933 at the Jpl site has both the highest average kWhDelivered and the highest MAE). For this reason it is also important to analyze the SMAPE, which is a relative metric, and to check how it relates to the adaptive scheduling algorithm. We also plan to perform a pilot signal and occupancy analysis to check whether the scheduling algorithm is able to meet all energy demands.

The SMAPE of kWhRequested is higher than 20% for all the users analyzed.
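
To make the difference between an absolute and a relative error metric concrete, the sketch below uses common textbook forms of the per-session absolute error, SMAPE and SMPE. These formulas are an assumption: the columns `maeMinAvailable`, `smapeMinAvailable`, `smpeMinAvailable`, `maekWhRequested` and `smapekWhRequested` are computed earlier in this notebook and may use a different normalization or sign convention. Here the sign is chosen so that an actual value larger than the declared one gives a negative SMPE.

```python
import numpy as np

def abs_error(declared, actual):
    """Absolute error per session, in the units of the inputs."""
    declared = np.asarray(declared, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(declared - actual)

def smape(declared, actual):
    """Symmetric absolute percentage error per session (assumed 0-200 % form)."""
    declared = np.asarray(declared, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * np.abs(declared - actual) / ((np.abs(declared) + np.abs(actual)) / 2.0)

def smpe(declared, actual):
    """Signed counterpart of SMAPE (assumed sign: declared minus actual)."""
    declared = np.asarray(declared, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * (declared - actual) / ((np.abs(declared) + np.abs(actual)) / 2.0)

# Two hypothetical sessions with the same relative shortfall but different magnitudes.
requested = [10.0, 40.0]
delivered = [8.0, 32.0]

print("absolute error (kWh):", abs_error(requested, delivered))
print("SMAPE (%):", smape(requested, delivered))
print("SMPE (%):", smpe(requested, delivered))
```

The absolute error of the second session is four times larger even though both sessions fall short by the same proportion, which is why the relative metric is needed when comparing users with very different energy needs. The sketch does not handle the case where both values are zero.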

Plotting the MAE, SMAPE, SMPE for each user

Next, we try to observe the error pattern of each user and check the possibility of using this graph to model individual behavior.

#collapse-hide
# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig35, (ax1_fig35, ax2_fig35) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig35.hist(caltech_user743_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 743')
ax1_fig35.hist(caltech_user562_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 562')
ax1_fig35.hist(caltech_user891_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 891')
ax1_fig35.set_ylim([0, 70])
ax1_fig35.set_title("MAE - Minutes Available")
ax1_fig35.set_xlabel("Minutes")
ax1_fig35.set_ylabel("Number of Sessions")
ax2_fig35.hist(caltech_user743_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 743')
ax2_fig35.hist(caltech_user562_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 562')
ax2_fig35.hist(caltech_user891_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 891')
ax2_fig35.set_ylim([0, 70])
ax2_fig35.set_title("SMPE - Minutes Available")
ax2_fig35.set_xlabel("%")
ax2_fig35.set_ylabel("Number of Sessions")
fig35.suptitle('Caltech users - 2019 - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 500, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig36, (ax1_fig36, ax2_fig36) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig36.hist(jpl_user651_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 651')
ax1_fig36.hist(jpl_user933_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 933')
ax1_fig36.hist(jpl_user406_df["maeMinAvailable"], bins_minutes, alpha=0.5, label='User 406')
ax1_fig36.set_ylim([0, 70])
ax1_fig36.set_title("MAE - Minutes Available")
ax1_fig36.set_xlabel("Minutes")
ax1_fig36.set_ylabel("Number of Sessions")
ax2_fig36.hist(jpl_user651_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 651')
ax2_fig36.hist(jpl_user933_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 933')
ax2_fig36.hist(jpl_user406_df["smpeMinAvailable"], bins_symmetric, alpha=0.5, label='User 406')
ax2_fig36.set_ylim([0, 70])
ax2_fig36.set_title("SMPE - Minutes Available")
ax2_fig36.set_xlabel("%")
ax2_fig36.set_ylabel("Number of Sessions")
fig36.suptitle('Jpl users - 2019 - Time Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the caltech site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig37, (ax1_fig37, ax2_fig37) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig37.hist(caltech_user743_df['maekWhRequested'], bins_minutes, alpha=0.5, label='User 743')
ax1_fig37.hist(caltech_user562_df['maekWhRequested'], bins_minutes, alpha=0.5, label='User 562')
ax1_fig37.hist(caltech_user891_df['maekWhRequested'], bins_minutes, alpha=0.5, label='User 891')
ax1_fig37.set_ylim([0, 150])
ax1_fig37.set_title("MAE - kWh Requested")
ax1_fig37.set_xlabel("kWh")
ax1_fig37.set_ylabel("Number of Sessions")
ax2_fig37.hist(caltech_user743_df['smapekWhRequested'], bins_percent, alpha=0.5, label='User 743')
ax2_fig37.hist(caltech_user562_df['smapekWhRequested'], bins_percent, alpha=0.5, label='User 562')
ax2_fig37.hist(caltech_user891_df['smapekWhRequested'], bins_percent, alpha=0.5, label='User 891')
ax2_fig37.set_ylim([0, 150])
ax2_fig37.set_title("SMAPE - kWh Requested")
ax2_fig37.set_xlabel("%")
ax2_fig37.set_ylabel("Number of Sessions")
fig37.suptitle('Caltech users - 2019 - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the histogram for the jpl site
bins_minutes = np.linspace(0, 50, 100)
bins_percent = np.linspace(0, 100, 100)
bins_symmetric = np.linspace(-50, 50, 100)
fig38, (ax1_fig38, ax2_fig38) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig38.hist(jpl_user651_df["maekWhRequested"], bins_minutes, alpha=0.5, label='User 651')
ax1_fig38.hist(jpl_user933_df["maekWhRequested"], bins_minutes, alpha=0.5, label='User 933')
ax1_fig38.hist(jpl_user406_df["maekWhRequested"], bins_minutes, alpha=0.5, label='User 406')
ax1_fig38.set_ylim([0, 150])
ax1_fig38.set_title("MAE - kWh Requested")
ax1_fig38.set_xlabel("kWh")
ax1_fig38.set_ylabel("Number of Sessions")
ax2_fig38.hist(jpl_user651_df["smapekWhRequested"], bins_percent, alpha=0.5, label='User 651')
ax2_fig38.hist(jpl_user933_df["smapekWhRequested"], bins_percent, alpha=0.5, label='User 933')
ax2_fig38.hist(jpl_user406_df["smapekWhRequested"], bins_percent, alpha=0.5, label='User 406')
ax2_fig38.set_ylim([0, 150])
ax2_fig38.set_title("SMAPE - kWh Requested")
ax2_fig38.set_xlabel("%")
ax2_fig38.set_ylabel("Number of Sessions")
fig38.suptitle('Jpl users - 2019 - Energy Error Analysis', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the obtained results...

The reliable users are the ones with smaller errors and error histograms that peak around 0.

Predictable users tend to show a Gaussian-like error distribution, especially for the SMPE (symmetric mean percentage error).
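
One way to put numbers on "peaks around 0" and on how narrow an error distribution is would be to summarize each user's signed errors by their mean (bias) and standard deviation (spread). The sketch below is only an illustrative summary under that assumption. It is not the reliability or predictability index used in this study, and the function name `error_profile` and the error values are hypothetical.

```python
import numpy as np

def error_profile(signed_errors):
    """Summarize signed percentage errors by bias, spread and mean magnitude."""
    errors = np.asarray(signed_errors, dtype=float)
    return {
        "bias": errors.mean(),               # systematic over- or under-estimation
        "spread": errors.std(ddof=1),        # width of the error distribution
        "mean_abs": np.abs(errors).mean(),   # typical error size
    }

# Hypothetical signed errors (%) for two made-up users.
steady_user = [-2.0, 1.0, 0.0, 2.0, -1.0]
erratic_user = [-40.0, 25.0, -10.0, 35.0, -30.0]

steady = error_profile(steady_user)
erratic = error_profile(erratic_user)
print(steady)
print(erratic)
```

A user with a small bias and a small spread would look both reliable and predictable in the histograms, while a user with a large but stable bias could still be predictable once the bias is corrected.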

Plotting the minutesTotal/minutesAvailable and kWhDelivered/kWhRequested by time

Next, we observe each user's behavior as a function of the arrival time at the charging station and check whether these graphs can be used to model individual behavior.

Other related works use the arrival time as the variable to predict the charging duration and the kWh delivered.

For the next analysis, only the last 50 sessions of each analyzed user are used to plot the graphs and to calculate the reliability and predictability indices. This number can be tuned so that the window is representative of the current charging behavior. The predictability index should also be robust enough to capture prediction patterns from a small number of sessions.
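
The windowing and the per-arrival-time aggregation can be sketched compactly with pandas. The example assumes that the sessions are already in chronological order and that `arrivalTime` holds a value that can be grouped on; integers are used in the toy data, and the representation in this notebook may differ. The function name `recent_by_arrival` and the toy values are hypothetical.

```python
import pandas as pd

def recent_by_arrival(df, n_sessions=50, columns=("minutesTotal", "minutesAvailable")):
    """Keep the last n_sessions rows and sum the chosen columns per arrivalTime value."""
    recent = df.iloc[-n_sessions:]
    return recent.groupby("arrivalTime")[list(columns)].sum().sort_index()

# Hypothetical toy sessions, assumed to be in chronological order.
toy_df = pd.DataFrame({
    "arrivalTime": [8, 9, 8, 7, 9],
    "minutesTotal": [500.0, 480.0, 510.0, 530.0, 470.0],
    "minutesAvailable": [480.0, 480.0, 500.0, 540.0, 450.0],
})

summary = recent_by_arrival(toy_df, n_sessions=4)
print(summary)
```

Changing `n_sessions` is then a one-argument change, which makes it easier to test how sensitive the resulting patterns are to the window size.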

#collapse-hide
# analyze minTotal, minAvailable, kWhDelivered and kWhRequested by arrivalTime for each user
print("Caltech User 743 - In one year total kWhRequested (kWh): " + str(caltech_user743_df.kWhRequested.sum()))
print("Caltech User 743 - In one year total kWhDelivered (kWh): " + str(caltech_user743_df.kWhDelivered.sum()))
print("Caltech User 562 - In one year total kWhRequested (kWh): " + str(caltech_user562_df.kWhRequested.sum()))
print("Caltech User 562 - In one year total kWhDelivered (kWh): " + str(caltech_user562_df.kWhDelivered.sum()))
print("Caltech User 891 - In one year total kWhRequested (kWh): " + str(caltech_user891_df.kWhRequested.sum()))
print("Caltech User 891 - In one year total kWhDelivered (kWh): " + str(caltech_user891_df.kWhDelivered.sum()))
print("Jpl User 651 - In one year total kWhRequested (kWh): " + str(jpl_user651_df.kWhRequested.sum()))
print("Jpl User 651 - In one year total kWhDelivered (kWh): " + str(jpl_user651_df.kWhDelivered.sum()))
print("Jpl User 933 - In one year total kWhRequested (kWh): " + str(jpl_user933_df.kWhRequested.sum()))
print("Jpl User 933 - In one year total kWhDelivered (kWh): " + str(jpl_user933_df.kWhDelivered.sum()))
print("Jpl User 406 - In one year total kWhRequested (kWh): " + str(jpl_user406_df.kWhRequested.sum()))
print("Jpl User 406 - In one year total kWhDelivered (kWh): " + str(jpl_user406_df.kWhDelivered.sum()))

# Filter last N sessions to classify users:
n_sessions = 50
caltech_user743_df = caltech_user743_df.iloc[-n_sessions:]
caltech_user562_df = caltech_user562_df.iloc[-n_sessions:]
caltech_user891_df = caltech_user891_df.iloc[-n_sessions:]
jpl_user651_df = jpl_user651_df.iloc[-n_sessions:]
jpl_user933_df = jpl_user933_df.iloc[-n_sessions:]
jpl_user406_df = jpl_user406_df.iloc[-n_sessions:]

# caltech site

# user 743
caltech_user743_uniqueTime = caltech_user743_df["arrivalTime"].unique()
caltech_user743_uniqueTime = sorted(caltech_user743_uniqueTime)
caltech_user743_minTotal = np.array([])
caltech_user743_minAvailable = np.array([])
caltech_user743_kWhRequested = np.array([])
caltech_user743_kWhDelivered = np.array([])
caltech_user743_smpeMinAvailable = np.array([])
caltech_user743_smapeMinAvailable = np.array([])
caltech_user743_smapekWhRequested = np.array([])
for time in caltech_user743_uniqueTime:
    caltech_user743_minTotal = np.append(caltech_user743_minTotal, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].minutesTotal.sum())
    caltech_user743_minAvailable = np.append(caltech_user743_minAvailable, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].minutesAvailable.sum())
    caltech_user743_kWhRequested = np.append(caltech_user743_kWhRequested, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].kWhRequested.sum())
    caltech_user743_kWhDelivered = np.append(caltech_user743_kWhDelivered, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].kWhDelivered.sum())
    caltech_user743_smpeMinAvailable = np.append(caltech_user743_smpeMinAvailable, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    caltech_user743_smapeMinAvailable = np.append(caltech_user743_smapeMinAvailable, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].smapeMinAvailable.sum())
    caltech_user743_smapekWhRequested = np.append(caltech_user743_smapekWhRequested, caltech_user743_df[caltech_user743_df.arrivalTime.isin([time])].smapekWhRequested.sum())

# user 562
caltech_user562_uniqueTime = caltech_user562_df["arrivalTime"].unique()
caltech_user562_uniqueTime = sorted(caltech_user562_uniqueTime)
caltech_user562_minTotal = np.array([])
caltech_user562_minAvailable = np.array([])
caltech_user562_kWhRequested = np.array([])
caltech_user562_kWhDelivered = np.array([])
caltech_user562_smpeMinAvailable = np.array([])
caltech_user562_smapeMinAvailable = np.array([])
caltech_user562_smapekWhRequested = np.array([])
for time in caltech_user562_uniqueTime:
    caltech_user562_minTotal = np.append(caltech_user562_minTotal, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].minutesTotal.sum())
    caltech_user562_minAvailable = np.append(caltech_user562_minAvailable, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].minutesAvailable.sum())
    caltech_user562_kWhRequested = np.append(caltech_user562_kWhRequested, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].kWhRequested.sum())
    caltech_user562_kWhDelivered = np.append(caltech_user562_kWhDelivered, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].kWhDelivered.sum())
    caltech_user562_smpeMinAvailable = np.append(caltech_user562_smpeMinAvailable, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    caltech_user562_smapeMinAvailable = np.append(caltech_user562_smapeMinAvailable, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].smapeMinAvailable.sum())                   
    caltech_user562_smapekWhRequested = np.append(caltech_user562_smapekWhRequested, caltech_user562_df[caltech_user562_df.arrivalTime.isin([time])].smapekWhRequested.sum())
    
# user 891
caltech_user891_uniqueTime = caltech_user891_df["arrivalTime"].unique()
caltech_user891_uniqueTime = sorted(caltech_user891_uniqueTime)
caltech_user891_minTotal = np.array([])
caltech_user891_minAvailable = np.array([])
caltech_user891_kWhRequested = np.array([])
caltech_user891_kWhDelivered = np.array([])
caltech_user891_smpeMinAvailable = np.array([])
caltech_user891_smapeMinAvailable = np.array([])
caltech_user891_smapekWhRequested = np.array([])
for time in caltech_user891_uniqueTime:
    caltech_user891_minTotal = np.append(caltech_user891_minTotal, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].minutesTotal.sum())
    caltech_user891_minAvailable = np.append(caltech_user891_minAvailable, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].minutesAvailable.sum())
    caltech_user891_kWhRequested = np.append(caltech_user891_kWhRequested, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].kWhRequested.sum())
    caltech_user891_kWhDelivered = np.append(caltech_user891_kWhDelivered, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].kWhDelivered.sum())
    caltech_user891_smpeMinAvailable = np.append(caltech_user891_smpeMinAvailable, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    caltech_user891_smapeMinAvailable = np.append(caltech_user891_smapeMinAvailable, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].smapeMinAvailable.sum())
    caltech_user891_smapekWhRequested = np.append(caltech_user891_smapekWhRequested, caltech_user891_df[caltech_user891_df.arrivalTime.isin([time])].smapekWhRequested.sum())
  
# jpl site

# user 651
jpl_user651_uniqueTime = jpl_user651_df["arrivalTime"].unique()
jpl_user651_uniqueTime = sorted(jpl_user651_uniqueTime)
jpl_user651_minTotal = np.array([])
jpl_user651_minAvailable = np.array([])
jpl_user651_kWhRequested = np.array([])
jpl_user651_kWhDelivered = np.array([])
jpl_user651_smpeMinAvailable = np.array([])
jpl_user651_smapeMinAvailable = np.array([])
jpl_user651_smapekWhRequested = np.array([])
for time in jpl_user651_uniqueTime:
    jpl_user651_minTotal = np.append(jpl_user651_minTotal, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].minutesTotal.sum())
    jpl_user651_minAvailable = np.append(jpl_user651_minAvailable, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].minutesAvailable.sum())
    jpl_user651_kWhRequested = np.append(jpl_user651_kWhRequested, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].kWhRequested.sum())
    jpl_user651_kWhDelivered = np.append(jpl_user651_kWhDelivered, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].kWhDelivered.sum())
    jpl_user651_smpeMinAvailable = np.append(jpl_user651_smpeMinAvailable, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    jpl_user651_smapeMinAvailable = np.append(jpl_user651_smapeMinAvailable, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].smapeMinAvailable.sum())
    jpl_user651_smapekWhRequested = np.append(jpl_user651_smapekWhRequested, jpl_user651_df[jpl_user651_df.arrivalTime.isin([time])].smapekWhRequested.sum())
    
# user 933
jpl_user933_uniqueTime = jpl_user933_df["arrivalTime"].unique()
jpl_user933_uniqueTime = sorted(jpl_user933_uniqueTime)
jpl_user933_minTotal = np.array([])
jpl_user933_minAvailable = np.array([])
jpl_user933_kWhRequested = np.array([])
jpl_user933_kWhDelivered = np.array([])
jpl_user933_smpeMinAvailable = np.array([])
jpl_user933_smapeMinAvailable = np.array([])
jpl_user933_smapekWhRequested = np.array([])
for time in jpl_user933_uniqueTime:
    jpl_user933_minTotal = np.append(jpl_user933_minTotal, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].minutesTotal.sum())
    jpl_user933_minAvailable = np.append(jpl_user933_minAvailable, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].minutesAvailable.sum())
    jpl_user933_kWhRequested = np.append(jpl_user933_kWhRequested, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].kWhRequested.sum())
    jpl_user933_kWhDelivered = np.append(jpl_user933_kWhDelivered, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].kWhDelivered.sum())
    jpl_user933_smpeMinAvailable = np.append(jpl_user933_smpeMinAvailable, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    jpl_user933_smapeMinAvailable = np.append(jpl_user933_smapeMinAvailable, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].smapeMinAvailable.sum())
    jpl_user933_smapekWhRequested = np.append(jpl_user933_smapekWhRequested, jpl_user933_df[jpl_user933_df.arrivalTime.isin([time])].smapekWhRequested.sum())

# user 406
jpl_user406_uniqueTime = jpl_user406_df["arrivalTime"].unique()
jpl_user406_uniqueTime = sorted(jpl_user406_uniqueTime)
jpl_user406_minTotal = np.array([])
jpl_user406_minAvailable = np.array([])
jpl_user406_kWhRequested = np.array([])
jpl_user406_kWhDelivered = np.array([])
jpl_user406_smpeMinAvailable = np.array([])
jpl_user406_smapeMinAvailable = np.array([])
jpl_user406_smapekWhRequested = np.array([])
for time in jpl_user406_uniqueTime:
    jpl_user406_minTotal = np.append(jpl_user406_minTotal, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].minutesTotal.sum())
    jpl_user406_minAvailable = np.append(jpl_user406_minAvailable, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].minutesAvailable.sum())
    jpl_user406_kWhRequested = np.append(jpl_user406_kWhRequested, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].kWhRequested.sum())
    jpl_user406_kWhDelivered = np.append(jpl_user406_kWhDelivered, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].kWhDelivered.sum())
    jpl_user406_smpeMinAvailable = np.append(jpl_user406_smpeMinAvailable, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].smpeMinAvailable.sum())
    jpl_user406_smapeMinAvailable = np.append(jpl_user406_smapeMinAvailable, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].smapeMinAvailable.sum())
    jpl_user406_smapekWhRequested = np.append(jpl_user406_smapekWhRequested, jpl_user406_df[jpl_user406_df.arrivalTime.isin([time])].smapekWhRequested.sum())
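
The per-user loops above repeat the same aggregation for every user. A more compact sketch of the same idea is given below, assuming each per-user DataFrame has the columns used in the loops. The helper name `aggregate_by_arrival` and the toy data are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Columns that the loops above sum per unique arrival time.
SUM_COLUMNS = [
    "minutesTotal", "minutesAvailable", "kWhRequested", "kWhDelivered",
    "smpeMinAvailable", "smapeMinAvailable", "smapekWhRequested",
]

def aggregate_by_arrival(user_df, columns=SUM_COLUMNS):
    """Hypothetical helper: sum `columns` over sessions sharing an arrivalTime.

    The result is indexed by the sorted unique arrival times.
    """
    return user_df.groupby("arrivalTime")[columns].sum().sort_index()

# Toy sessions for illustration only (not taken from ACN-Data).
toy_df = pd.DataFrame({
    "arrivalTime": [30000, 28800, 30000],
    "minutesTotal": [100.0, 200.0, 50.0],
    "minutesAvailable": [120.0, 180.0, 60.0],
    "kWhRequested": [20.0, 10.0, 20.0],
    "kWhDelivered": [12.0, 9.0, 8.0],
    "smpeMinAvailable": [9.0, -5.0, 9.0],
    "smapeMinAvailable": [9.0, 5.0, 9.0],
    "smapekWhRequested": [25.0, 5.0, 43.0],
})

agg = aggregate_by_arrival(toy_df)
toy_uniqueTime = agg.index.to_numpy()          # plays the role of *_uniqueTime
toy_minTotal = agg["minutesTotal"].to_numpy()  # plays the role of *_minTotal
print(toy_uniqueTime, toy_minTotal)
```

A single `groupby` replaces one boolean filter per unique arrival time, and the same helper could be called once per user instead of copying the block.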

Caltech User 743 - In one year total kWhRequested (kWh): 2388.0
Caltech User 743 - In one year total kWhDelivered (kWh): 1212.87
Caltech User 562 - In one year total kWhRequested (kWh): 1373.0
Caltech User 562 - In one year total kWhDelivered (kWh): 973.387893894861
Caltech User 891 - In one year total kWhRequested (kWh): 1575.0
Caltech User 891 - In one year total kWhDelivered (kWh): 567.568
Jpl User 651 - In one year total kWhRequested (kWh): 4653.6
Jpl User 651 - In one year total kWhDelivered (kWh): 3001.2549999999997
Jpl User 933 - In one year total kWhRequested (kWh): 12293.739999999998
Jpl User 933 - In one year total kWhDelivered (kWh): 8322.199987777778
Jpl User 406 - In one year total kWhRequested (kWh): 4014.0
Jpl User 406 - In one year total kWhDelivered (kWh): 2468.1042388888886

Total energy requested and delivered over the year for each user (values in MWh):

Statistic	Caltech - User 743	Caltech - User 562	Caltech - User 891	Jpl - User 651	Jpl - User 933	Jpl - User 406
kWhRequested (MWh)	2.39	1.37	1.58	4.65	12.29	4.01
kWhDelivered (MWh)	1.21	0.97	0.57	3.00	8.32	2.47

The mismatch between kWhRequested and kWhDelivered is likely explained by the adaptive scheduling algorithm and by sessions in which the EV was already fully charged.

Plotting graphs by arrival time

Next, we plot the graphs by arrival time to analyze each user's behavior.

#collapse-hide
# plot the scatter plot for the minTotal and minAvailable for caltech users
fig22, (ax1_fig22, ax2_fig22, ax3_fig22) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig22.scatter(caltech_user743_uniqueTime, caltech_user743_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax1_fig22.scatter(caltech_user743_uniqueTime, caltech_user743_minTotal, alpha=0.2, color='b', label='minTotal')
ax1_fig22.set_xlim([0, 86400])
ax1_fig22.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig22.set_ylim([0, 1000])
ax1_fig22.set_title("Caltech - User 743")
ax1_fig22.set_xlabel("Time 0-24 hours in a day")
ax1_fig22.set_ylabel("Minutes")
ax2_fig22.scatter(caltech_user562_uniqueTime, caltech_user562_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax2_fig22.scatter(caltech_user562_uniqueTime, caltech_user562_minTotal, alpha=0.2, color='b', label='minTotal')
ax2_fig22.set_xlim([0, 86400])
ax2_fig22.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig22.set_ylim([0, 1000])
ax2_fig22.set_title("Caltech - User 562")
ax2_fig22.set_xlabel("Time 0-24 hours in a day")
ax2_fig22.set_ylabel("Minutes")
ax3_fig22.scatter(caltech_user891_uniqueTime, caltech_user891_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax3_fig22.scatter(caltech_user891_uniqueTime, caltech_user891_minTotal, alpha=0.2, color='b', label='minTotal')
ax3_fig22.set_xlim([0, 86400])
ax3_fig22.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig22.set_ylim([0, 1000])
ax3_fig22.set_title("Caltech - User 891")
ax3_fig22.set_xlabel("Time 0-24 hours in a day")
ax3_fig22.set_ylabel("Minutes")
fig22.suptitle('minTotal/minAvailable by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the scatter plot for the kWhRequested and kWhDelivered for caltech users
fig24, (ax1_fig24, ax2_fig24, ax3_fig24) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig24.scatter(caltech_user743_uniqueTime, caltech_user743_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax1_fig24.scatter(caltech_user743_uniqueTime, caltech_user743_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax1_fig24.set_xlim([0, 86400])
ax1_fig24.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig24.set_ylim([0, 100])
ax1_fig24.set_title("Caltech - User 743")
ax1_fig24.set_xlabel("Time 0-24 hours in a day")
ax1_fig24.set_ylabel("kWh")
ax2_fig24.scatter(caltech_user562_uniqueTime, caltech_user562_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax2_fig24.scatter(caltech_user562_uniqueTime, caltech_user562_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax2_fig24.set_xlim([0, 86400])
ax2_fig24.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig24.set_ylim([0, 100])
ax2_fig24.set_title("Caltech - User 562")
ax2_fig24.set_xlabel("Time 0-24 hours in a day")
ax2_fig24.set_ylabel("kWh")
ax3_fig24.scatter(caltech_user891_uniqueTime, caltech_user891_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax3_fig24.scatter(caltech_user891_uniqueTime, caltech_user891_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax3_fig24.set_xlim([0, 86400])
ax3_fig24.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig24.set_ylim([0, 100])
ax3_fig24.set_title("Caltech - User 891")
ax3_fig24.set_xlabel("Time 0-24 hours in a day")
ax3_fig24.set_ylabel("kWh")
fig24.suptitle('kWhRequested/kWhDelivered by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the scatter plot for the minTotal and minAvailable for jpl users
fig23, (ax1_fig23, ax2_fig23, ax3_fig23) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig23.scatter(jpl_user651_uniqueTime, jpl_user651_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax1_fig23.scatter(jpl_user651_uniqueTime, jpl_user651_minTotal, alpha=0.2, color='b', label='minTotal')
ax1_fig23.set_xlim([0, 86400])
ax1_fig23.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig23.set_ylim([0, 1000])
ax1_fig23.set_title("Jpl - User 651")
ax1_fig23.set_xlabel("Time 0-24 hours in a day")
ax1_fig23.set_ylabel("Minutes")
ax2_fig23.scatter(jpl_user933_uniqueTime, jpl_user933_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax2_fig23.scatter(jpl_user933_uniqueTime, jpl_user933_minTotal, alpha=0.2, color='b', label='minTotal')
ax2_fig23.set_xlim([0, 86400])
ax2_fig23.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig23.set_ylim([0, 1000])
ax2_fig23.set_title("Jpl - User 933")
ax2_fig23.set_xlabel("Time 0-24 hours in a day")
ax2_fig23.set_ylabel("Minutes")
ax3_fig23.scatter(jpl_user406_uniqueTime, jpl_user406_minAvailable, alpha=0.2, color='r', label='minAvailable')
ax3_fig23.scatter(jpl_user406_uniqueTime, jpl_user406_minTotal, alpha=0.2, color='b', label='minTotal')
ax3_fig23.set_xlim([0, 86400])
ax3_fig23.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig23.set_ylim([0, 1000])
ax3_fig23.set_title("Jpl - User 406")
ax3_fig23.set_xlabel("Time 0-24 hours in a day")
ax3_fig23.set_ylabel("Minutes")
fig23.suptitle('minTotal/minAvailable by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the scatter plot for the kWhRequested and kWhDelivered for jpl users
fig25, (ax1_fig25, ax2_fig25, ax3_fig25) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig25.scatter(jpl_user651_uniqueTime, jpl_user651_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax1_fig25.scatter(jpl_user651_uniqueTime, jpl_user651_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax1_fig25.set_xlim([0, 86400])
ax1_fig25.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig25.set_ylim([0, 100])
ax1_fig25.set_title("Jpl - User 651")
ax1_fig25.set_xlabel("Time 0-24 hours in a day")
ax1_fig25.set_ylabel("kWh")
ax2_fig25.scatter(jpl_user933_uniqueTime, jpl_user933_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax2_fig25.scatter(jpl_user933_uniqueTime, jpl_user933_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax2_fig25.set_xlim([0, 86400])
ax2_fig25.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig25.set_ylim([0, 100])
ax2_fig25.set_title("Jpl - User 933")
ax2_fig25.set_xlabel("Time 0-24 hours in a day")
ax2_fig25.set_ylabel("kWh")
ax3_fig25.scatter(jpl_user406_uniqueTime, jpl_user406_kWhRequested, alpha=0.2, color='r', label='kWhRequested')
ax3_fig25.scatter(jpl_user406_uniqueTime, jpl_user406_kWhDelivered, alpha=0.2, color='b', label='kWhDelivered')
ax3_fig25.set_xlim([0, 86400])
ax3_fig25.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig25.set_ylim([0, 100])
ax3_fig25.set_title("Jpl - User 406")
ax3_fig25.set_xlabel("Time 0-24 hours in a day")
ax3_fig25.set_ylabel("kWh")
fig25.suptitle('kWhRequested/kWhDelivered by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Interpreting the graphs obtained

To draw insights from these graphs, it is important to remember that kWhRequested and minAvailable are user inputs, while kWhDelivered and minTotal are the actual values resulting from how the user charged their EV.

With this in mind, in the ideal case the red and blue dots would coincide for every session drawn in the plots. If that were the case, an optimal adaptive scheduler could be implemented. The graphs show that this is not what happens: there is a mismatch between the data entered by users and their real behavior.

Caltech user 743 is a good example of a user whose input closely matches the real behavior. In the minTotal/minAvailable graph, the low error is visible even by eye. The table with our calculations shows that this user's error between minTotal and minAvailable is around 47 minutes, while the average for other users is more than 2 hours.

On the other hand, Jpl user 651 shows low similarity in these graphs.

Next, we plot the same graphs to check the clustering of the error metrics MAE, SMAPE, SMPE

We are going to plot the SMPE metric for the time analysis and the SMAPE metric for the energy analysis.
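
As a reminder of what these two metrics measure, here is a minimal sketch. It assumes the bounded variants, with SMPE in [-100, 100] and SMAPE in [0, 100], which matches the axis limits used in the plots; the definitions given earlier in this notebook take precedence. The function names and the sign convention (user input minus actual value) are assumptions made for illustration.

```python
import numpy as np

def smpe(user_input, actual):
    """Signed symmetric percentage error, in percent (assumed bounded variant).

    Positive when the user input overestimates the actual value.
    Undefined when both values are zero.
    """
    user_input = np.asarray(user_input, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * (user_input - actual) / (np.abs(user_input) + np.abs(actual))

def smape(user_input, actual):
    """Absolute value of the signed error above, in percent."""
    return np.abs(smpe(user_input, actual))

# Toy values: the user states 300 minutes but stays 100, then states 100 and stays 300.
toy_smpe = smpe([300.0, 100.0], [100.0, 300.0])
toy_smape = smape([300.0, 100.0], [100.0, 300.0])
print(toy_smpe, toy_smape)
```

The signed metric keeps the direction of the error (over- or underestimation), which is why it is useful for the time analysis, while the absolute metric only measures its size.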

#collapse-hide
# plot the SMPE of minAvailable by arrival time for caltech users
fig39, (ax1_fig39, ax2_fig39, ax3_fig39) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig39.scatter(caltech_user743_uniqueTime, caltech_user743_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax1_fig39.set_xlim([0, 86400])
ax1_fig39.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig39.set_ylim([-100, 100])
ax1_fig39.set_title("Caltech - User 743")
ax1_fig39.set_xlabel("Time 0-24 hours in a day")
ax1_fig39.set_ylabel("%")
ax2_fig39.scatter(caltech_user562_uniqueTime, caltech_user562_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax2_fig39.set_xlim([0, 86400])
ax2_fig39.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig39.set_ylim([-100, 100])
ax2_fig39.set_title("Caltech - User 562")
ax2_fig39.set_xlabel("Time 0-24 hours in a day")
ax2_fig39.set_ylabel("%")
ax3_fig39.scatter(caltech_user891_uniqueTime, caltech_user891_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax3_fig39.set_xlim([0, 86400])
ax3_fig39.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig39.set_ylim([-100, 100])
ax3_fig39.set_title("Caltech - User 891")
ax3_fig39.set_xlabel("Time 0-24 hours in a day")
ax3_fig39.set_ylabel("%")
fig39.suptitle('SMPE minAvailable/minTotal by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the SMAPE of kWhRequested by arrival time for caltech users
fig40, (ax1_fig40, ax2_fig40, ax3_fig40) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig40.scatter(caltech_user743_uniqueTime, caltech_user743_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax1_fig40.set_xlim([0, 86400])
ax1_fig40.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig40.set_ylim([0, 100])
ax1_fig40.set_title("Caltech - User 743")
ax1_fig40.set_xlabel("Time 0-24 hours in a day")
ax1_fig40.set_ylabel("%")
ax2_fig40.scatter(caltech_user562_uniqueTime, caltech_user562_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax2_fig40.set_xlim([0, 86400])
ax2_fig40.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig40.set_ylim([0, 100])
ax2_fig40.set_title("Caltech - User 562")
ax2_fig40.set_xlabel("Time 0-24 hours in a day")
ax2_fig40.set_ylabel("%")
ax3_fig40.scatter(caltech_user891_uniqueTime, caltech_user891_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax3_fig40.set_xlim([0, 86400])
ax3_fig40.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig40.set_ylim([0, 100])
ax3_fig40.set_title("Caltech - User 891")
ax3_fig40.set_xlabel("Time 0-24 hours in a day")
ax3_fig40.set_ylabel("%")
fig40.suptitle('SMAPE kWhRequested/kWhDelivered by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the SMPE of minAvailable by arrival time for jpl users
fig41, (ax1_fig41, ax2_fig41, ax3_fig41) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig41.scatter(jpl_user651_uniqueTime, jpl_user651_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax1_fig41.set_xlim([0, 86400])
ax1_fig41.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig41.set_ylim([-100, 100])
ax1_fig41.set_title("Jpl - User 651")
ax1_fig41.set_xlabel("Time 0-24 hours in a day")
ax1_fig41.set_ylabel("%")
ax2_fig41.scatter(jpl_user933_uniqueTime, jpl_user933_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax2_fig41.set_xlim([0, 86400])
ax2_fig41.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig41.set_ylim([-100, 100])
ax2_fig41.set_title("Jpl - User 933")
ax2_fig41.set_xlabel("Time 0-24 hours in a day")
ax2_fig41.set_ylabel("%")
ax3_fig41.scatter(jpl_user406_uniqueTime, jpl_user406_smpeMinAvailable, alpha=0.2, color='r', label='SMPE minAvailable')
ax3_fig41.set_xlim([0, 86400])
ax3_fig41.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig41.set_ylim([-100, 100])
ax3_fig41.set_title("Jpl - User 406")
ax3_fig41.set_xlabel("Time 0-24 hours in a day")
ax3_fig41.set_ylabel("%")
fig41.suptitle('SMPE minAvailable/minTotal by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

# plot the SMAPE of kWhRequested by arrival time for jpl users
fig42, (ax1_fig42, ax2_fig42, ax3_fig42) = plt.subplots(1, 3, figsize=(15, 5))
ax1_fig42.scatter(jpl_user651_uniqueTime, jpl_user651_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax1_fig42.set_xlim([0, 86400])
ax1_fig42.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax1_fig42.set_ylim([0, 100])
ax1_fig42.set_title("Jpl - User 651")
ax1_fig42.set_xlabel("Time 0-24 hours in a day")
ax1_fig42.set_ylabel("%")
ax2_fig42.scatter(jpl_user933_uniqueTime, jpl_user933_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax2_fig42.set_xlim([0, 86400])
ax2_fig42.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax2_fig42.set_ylim([0, 100])
ax2_fig42.set_title("Jpl - User 933")
ax2_fig42.set_xlabel("Time 0-24 hours in a day")
ax2_fig42.set_ylabel("%")
ax3_fig42.scatter(jpl_user406_uniqueTime, jpl_user406_smapekWhRequested, alpha=0.2, color='r', label='SMAPE kWhRequested')
ax3_fig42.set_xlim([0, 86400])
ax3_fig42.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
ax3_fig42.set_ylim([0, 100])
ax3_fig42.set_title("Jpl - User 406")
ax3_fig42.set_xlabel("Time 0-24 hours in a day")
ax3_fig42.set_ylabel("%")
fig42.suptitle('SMAPE kWhRequested/kWhDelivered by Arrival Time in one year', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Analyzing the obtained error scatter plots

For the Caltech site:

User 743: User 743 is reliable and predictable for the minAvailable variable. User 743 is unreliable but predictable for the kWhRequested variable.
User 562: User 562's reliability and predictability for the minAvailable and kWhRequested variables depend on the arrival time.
User 891: User 891 is unreliable and unpredictable for both the minAvailable and kWhRequested variables.

For the Jpl site:

User 651: User 651 is usually unreliable and unpredictable for the minAvailable variable. User 651 is unreliable but predictable for the kWhRequested variable.
User 406: User 406 is reliable and predictable for the minAvailable variable. User 406 is unreliable and unpredictable for the kWhRequested variable.
User 933: User 933 is reliable and predictable for the minAvailable variable. User 933 is unreliable but predictable for the kWhRequested variable.

Proposing User Classification rules...

Since users behave differently and their inputs differ in reliability, another question is whether we can find patterns in their behavior. A visual way to see these patterns is as clusters in the data. For example, Caltech user 743 seems to request 20 kWh in the great majority of sessions.

In references [1] and [2], the modeled data usually concerns arrivals and departures over time. An example is shown in the next figure. With this graph, it is possible to predict, for example, the maximum capacity the charging station needs at peak time, and whether a user will have to wait to charge at peak times.

In our case, modeling individual user data is harder. First of all, each user has a limited number of sessions: our analysis shows that a user usually charges their EV fewer than 10 times per year in the charging network. This limitation will probably ease as EVs become more popular.

Here, I will try to design user classification rules. First, I classify users by the reliability of their input data:

Reliable User: a user with a low error (SMAPE) between input data and real behavior.
Unreliable User: a user with a high error (SMAPE) between input data and real behavior.

Then I classify users by the predictability of their behavior:

Predictable User: a user with well-defined clusters in the SMPE and SMAPE for the minTotal and kWhDelivered variables, respectively.
Unpredictable User: a user with high variance in the SMPE and SMAPE for the minTotal and/or kWhDelivered variables, respectively.

Combining both criteria, individual user behavior falls into four patterns (separately for time and for energy):

Reliable and Predictable User
Reliable and Unpredictable User
Unreliable and Predictable User
Unreliable and Unpredictable User
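
The four patterns are the combinations of two yes/no properties. A minimal sketch of that mapping follows; the function name `classify_user` is hypothetical, and how the two flags are obtained is left to the reliability and predictability criteria of this study.

```python
def classify_user(reliable: bool, predictable: bool) -> str:
    """Hypothetical helper: map the two flags to one of the four patterns."""
    reliability = "Reliable" if reliable else "Unreliable"
    predictability = "Predictable" if predictable else "Unpredictable"
    return f"{reliability} and {predictability} User"

# The classification is applied separately to time and to energy.
time_pattern = classify_user(reliable=True, predictable=True)
energy_pattern = classify_user(reliable=False, predictable=True)
print(time_pattern, "|", energy_pattern)
```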

Ideally, we would like to be able to identify these 4 types of users from their data.

Also, in this work we focus on identifying the behavioral patterns in predictable users.

Reliability Index

Next we propose a condition to classify reliable users based on the SMAPE error metric.

A user is classified as a reliable user if SMAPE < THRESHOLD.

To do that, we need to define the value of THRESHOLD. To this end, we filter the dataset with different values of the threshold and analyze the result.

#collapse-hide
caltech_smape_df = caltech_2019_df[['userID', 'smapeMinAvailable', 'smapekWhRequested']].groupby(['userID']).mean()
jpl_smape_df = jpl_2019_df[['userID', 'smapeMinAvailable', 'smapekWhRequested']].groupby(['userID']).mean()

# calculate and plot the percentage graph based on the smape
list_of_smapes = range(100)
caltech_percentages_minutes = []
jpl_percentages_minutes = []
caltech_percentages_kwh = []
jpl_percentages_kwh = []
for value in list_of_smapes:
    filtered_df = caltech_smape_df[caltech_smape_df['smapeMinAvailable'] < value]
    caltech_percentages_minutes.append(filtered_df.shape[0]/caltech_smape_df.shape[0])
    filtered_df = jpl_smape_df[jpl_smape_df['smapeMinAvailable'] < value]
    jpl_percentages_minutes.append(filtered_df.shape[0]/jpl_smape_df.shape[0])
    filtered_df = caltech_smape_df[caltech_smape_df['smapekWhRequested'] < value]
    caltech_percentages_kwh.append(filtered_df.shape[0]/caltech_smape_df.shape[0])
    filtered_df = jpl_smape_df[jpl_smape_df['smapekWhRequested'] < value]
    jpl_percentages_kwh.append(filtered_df.shape[0]/jpl_smape_df.shape[0])
    
# plot the share of users with mean SMAPE below each threshold for caltech and jpl sites
fig43, (ax1_fig43, ax2_fig43) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig43.plot(list_of_smapes, caltech_percentages_minutes, color='r', label='Minutes')
ax1_fig43.plot(list_of_smapes, caltech_percentages_kwh, color='b', label='kWh')
ax1_fig43.set_xlim([0, 100])
ax1_fig43.set_ylim([0, 1])
ax1_fig43.set_title("Caltech")
ax1_fig43.set_xlabel("Threshold (SMAPE)")
ax1_fig43.set_ylabel("Fraction of users")
ax2_fig43.plot(list_of_smapes, jpl_percentages_minutes, color='r', label='Minutes')
ax2_fig43.plot(list_of_smapes, jpl_percentages_kwh, color='b', label='kWh')
ax2_fig43.set_xlim([0, 100])
ax2_fig43.set_ylim([0, 1])
ax2_fig43.set_title("Jpl")
ax2_fig43.set_xlabel("Threshold (SMAPE)")
ax2_fig43.set_ylabel("Fraction of users")
fig43.suptitle('Percentage of users with SMAPE below Threshold', y=1.05)
plt.legend(loc='upper right')
plt.tight_layout()

Calculating the reliability index...

#collapse-hide

# caltech - measure the reliability
print("Caltech:")
print("Time Reliability User 743: " + str(1 - caltech_user743_smapeMinAvailable.mean()/100))
print("Energy Reliability User 743: " + str(1 - caltech_user743_smapekWhRequested.mean()/100))
print("Time Reliability User 562: " + str(1 - caltech_user562_smapeMinAvailable.mean()/100))
print("Energy Reliability User 562: " + str(1 - caltech_user562_smapekWhRequested.mean()/100))
print("Time Reliability User 891: " + str(1 - caltech_user891_smapeMinAvailable.mean()/100))
print("Energy Reliability User 891: " + str(1 - caltech_user891_smapekWhRequested.mean()/100))

print("")

# jpl - measure the reliability
print("Jpl: ")
print("Time Reliability User 651: " + str(1 -jpl_user651_smapeMinAvailable.mean()/100))
print("Energy Reliability User 651: " + str(1 -jpl_user651_smapekWhRequested.mean()/100))
print("Time Reliability User 933: " + str(1 -jpl_user933_smapeMinAvailable.mean()/100))
print("Energy Reliability User 933: " + str(1 -jpl_user933_smapekWhRequested.mean()/100))
print("Time Reliability User 406: " + str(1 - jpl_user406_smapeMinAvailable.mean()/100))
print("Energy Reliability User 406: " + str(1 - jpl_user406_smapekWhRequested.mean()/100))

Caltech:
Time Reliability User 743: 0.8504673576786131
Energy Reliability User 743: 0.6277572196855845
Time Reliability User 562: 0.785584672371714
Energy Reliability User 562: 0.7939832732055515
Time Reliability User 891: 0.5333600943110675
Energy Reliability User 891: 0.43581161153553927

Jpl: 
Time Reliability User 651: 0.7762480507733366
Energy Reliability User 651: 0.7468218605326276
Time Reliability User 933: 0.853117315033301
Energy Reliability User 933: 0.7735805626169042
Time Reliability User 406: 0.967961431359043
Energy Reliability User 406: 0.9458244365765816

To calculate our reliability index, we set the threshold to 20%.

A more rigorous methodology may be needed to choose this threshold.
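
The reliability values printed above are computed as 1 minus the mean SMAPE divided by 100, so a 20% SMAPE threshold corresponds to a reliability index of 0.8. A minimal sketch of the rule follows; the function names and the toy SMAPE values are hypothetical.

```python
import numpy as np

SMAPE_THRESHOLD = 20.0  # percent, the value chosen in the text

def reliability_index(smape_values):
    """1 - mean(SMAPE)/100, as in the printed reliability values."""
    return 1.0 - float(np.mean(smape_values)) / 100.0

def is_reliable(smape_values, threshold=SMAPE_THRESHOLD):
    """A user is reliable when the mean SMAPE is below the threshold."""
    return bool(np.mean(smape_values) < threshold)

# Toy per-session SMAPE values (percent), for illustration only.
steady_user = [10.0, 20.0, 15.0]    # mean 15 -> index 0.85
erratic_user = [30.0, 40.0, 50.0]   # mean 40 -> index 0.60

print(reliability_index(steady_user), is_reliable(steady_user))
print(reliability_index(erratic_user), is_reliable(erratic_user))
```

Applied to the values printed above, the Time Reliability of about 0.85 for Caltech user 743 would pass this rule, while the Energy Reliability of about 0.63 for the same user would not.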

Predictability Index

Next we propose a condition to classify predictable users.

To identify predictable users, we choose the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, because it is unsupervised and good at separating high-density clusters from low-density regions. DBSCAN can only yield a predictability index for users with more sessions than the min_points hyperparameter. In this work, min_points is set to 5.

In the future, the clustering could be applied across multiple users. In this work, the analysis is limited to individual behavior.

Before using DBSCAN, we analyze the standard deviation of each user's error for the different variables:

#collapse-hide

# caltech - measure of the standard deviation to calculate predictability
print("Caltech User 743 - STD SMAPE Min Available: " + str(caltech_user743_smapeMinAvailable.std()))
print("Caltech User 743 - STD SMAPE kWh Requested: " + str(caltech_user743_smapekWhRequested.std()))
print("Caltech User 562 - STD SMAPE Min Available: " + str(caltech_user562_smapeMinAvailable.std()))
print("Caltech User 562 - STD SMAPE kWh Requested: " + str(caltech_user562_smapekWhRequested.std()))
print("Caltech User 891 - STD SMAPE Min Available: " + str(caltech_user891_smapeMinAvailable.std()))
print("Caltech User 891 - STD SMAPE kWh Requested: " + str(caltech_user891_smapekWhRequested.std()))

print("")

# jpl - measure of the standard deviation to calculate predictability
print("Jpl User 651 - STD SMAPE Min Available: " + str(jpl_user651_smapeMinAvailable.std()))
print("Jpl User 651 - STD SMAPE kWh Requested: " + str(jpl_user651_smapekWhRequested.std()))
print("Jpl User 933 - STD SMAPE Min Available: " + str(jpl_user933_smapeMinAvailable.std()))
print("Jpl User 933 - STD SMAPE kWh Requested: " + str(jpl_user933_smapekWhRequested.std()))
print("Jpl User 406 - STD SMAPE Min Available: " + str(jpl_user406_smapeMinAvailable.std()))
print("Jpl User 406 - STD SMAPE kWh Requested: " + str(jpl_user406_smapekWhRequested.std()))

Caltech User 743 - STD SMAPE Min Available: 19.7162742225585
Caltech User 743 - STD SMAPE kWh Requested: 16.575317172350832
Caltech User 562 - STD SMAPE Min Available: 17.64080899452569
Caltech User 562 - STD SMAPE kWh Requested: 15.590282274993541
Caltech User 891 - STD SMAPE Min Available: 27.418800804316312
Caltech User 891 - STD SMAPE kWh Requested: 13.288469524755477

Jpl User 651 - STD SMAPE Min Available: 14.46846809870606
Jpl User 651 - STD SMAPE kWh Requested: 17.698744416817362
Jpl User 933 - STD SMAPE Min Available: 13.519772140663338
Jpl User 933 - STD SMAPE kWh Requested: 12.3705750061042
Jpl User 406 - STD SMAPE Min Available: 4.472968289213045
Jpl User 406 - STD SMAPE kWh Requested: 2.953748508456715

To be predictable, a user should have a low standard deviation. The values show no clear boundary from which to define a threshold that classifies a user as predictable based on the STD of the SMAPE metric. This supports our idea of using DBSCAN to find clusters in the user behavior.

Using DBSCAN to cluster the data:

Hyperparameters to tune:

Eps: the neighborhood radius around a given point in the dataset (set to 0.05 in this work). It can be related to a clustering radius in %, although the relation is more complex. In our case it would be better if this value were sensitive to the axis, since clusters are desired along the x axis but not along the y axis.
MinPts: a sensitive value, since we want to cluster individual user behavior with a very low MinPts. It can be understood as the minimum number of sessions a user needs in order to have a predictability index. In this work it is set to 5.
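
To make the role of the two hyperparameters concrete, here is a minimal sketch on synthetic data (not taken from ACN-Data). It assumes the same normalization used in this notebook: arrival time divided by 86400 and the error metric divided by 100, so that eps=0.05 is a radius in the normalized plane. The variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic sessions: six similar ones plus three isolated ones.
toy_arrival_s = np.array([30240, 31104, 30240, 31104, 29376, 30240,
                          8640, 77760, 51840])
toy_smpe_pct = np.array([10.0, 10.0, 11.0, 11.0, 10.0, 9.0,
                         -80.0, 70.0, -30.0])

# Normalize both axes so that one eps value applies to both.
X_toy = np.stack((toy_arrival_s / 86400, toy_smpe_pct / 100), axis=1)

toy_labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(X_toy)

# DBSCAN labels noise points as -1; clusters are numbered from 0.
n_clusters = len(set(toy_labels)) - (1 if -1 in toy_labels else 0)
noise_fraction = float(np.mean(toy_labels == -1))
print(toy_labels, n_clusters, noise_fraction)
```

The six nearby sessions form one cluster because each has at least 5 points (itself included) within the eps radius, while the isolated sessions are labeled as noise. Because eps is a single Euclidean radius, it treats both axes the same way, which is the limitation noted in the Eps description above.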

#collapse-hide
from sklearn.cluster import DBSCAN
eps_min = 0.05
eps_kwh = 0.05
min_samples_ = 5
color = ['blue', 'green', 'red', 'orange', 'black']

# obs: correct variable naming

# caltech - user 743
model_user743_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user743_smpeMinAvailable = np.stack((np.array(caltech_user743_uniqueTime)/86400, np.array(caltech_user743_smpeMinAvailable)/100), axis=1)
Y_user743_smpeMinAvailable = model_user743_smpeMinAvailable.fit_predict(X_user743_smpeMinAvailable)
clusters_Y_user743_smpeMinAvailable = np.unique(Y_user743_smpeMinAvailable)
model_user743_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user743_smapekWhRequested = np.stack((np.array(caltech_user743_uniqueTime)/86400, np.array(caltech_user743_smapekWhRequested)/100), axis=1)
Y_user743_smapekWhRequested = model_user743_smapekWhRequested.fit_predict(X_user743_smapekWhRequested)
clusters_Y_user743_smapekWhRequested = np.unique(Y_user743_smapekWhRequested)

# caltech - user 562
model_user562_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user562_smpeMinAvailable = np.stack((np.array(caltech_user562_uniqueTime)/86400, np.array(caltech_user562_smpeMinAvailable)/100), axis=1)
Y_user562_smpeMinAvailable = model_user562_smpeMinAvailable.fit_predict(X_user562_smpeMinAvailable)
clusters_Y_user562_smpeMinAvailable = np.unique(Y_user562_smpeMinAvailable)
model_user562_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user562_smapekWhRequested = np.stack((np.array(caltech_user562_uniqueTime)/86400, np.array(caltech_user562_smapekWhRequested)/100), axis=1)
Y_user562_smapekWhRequested = model_user562_smapekWhRequested.fit_predict(X_user562_smapekWhRequested)
clusters_Y_user562_smapekWhRequested = np.unique(Y_user562_smapekWhRequested)

# caltech - user 891
model_user891_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user891_smpeMinAvailable = np.stack((np.array(caltech_user891_uniqueTime)/86400, np.array(caltech_user891_smpeMinAvailable)/100), axis=1)
Y_user891_smpeMinAvailable = model_user891_smpeMinAvailable.fit_predict(X_user891_smpeMinAvailable)
clusters_Y_user891_smpeMinAvailable = np.unique(Y_user891_smpeMinAvailable)
model_user891_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user891_smapekWhRequested = np.stack((np.array(caltech_user891_uniqueTime)/86400, np.array(caltech_user891_smapekWhRequested)/100), axis=1)
Y_user891_smapekWhRequested = model_user891_smapekWhRequested.fit_predict(X_user891_smapekWhRequested)
clusters_Y_user891_smapekWhRequested = np.unique(Y_user891_smapekWhRequested)

# jpl - user 651
model_user651_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user651_smpeMinAvailable = np.stack((np.array(jpl_user651_uniqueTime)/86400, np.array(jpl_user651_smpeMinAvailable)/100), axis=1)
Y_user651_smpeMinAvailable = model_user651_smpeMinAvailable.fit_predict(X_user651_smpeMinAvailable)
clusters_Y_user651_smpeMinAvailable = np.unique(Y_user651_smpeMinAvailable)
model_user651_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user651_smapekWhRequested = np.stack((np.array(jpl_user651_uniqueTime)/86400, np.array(jpl_user651_smapekWhRequested)/100), axis=1)
Y_user651_smapekWhRequested = model_user651_smapekWhRequested.fit_predict(X_user651_smapekWhRequested)
clusters_Y_user651_smapekWhRequested = np.unique(Y_user651_smapekWhRequested)

# jpl - user 933
model_user933_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user933_smpeMinAvailable = np.stack((np.array(jpl_user933_uniqueTime)/86400, np.array(jpl_user933_smpeMinAvailable)/100), axis=1)
Y_user933_smpeMinAvailable = model_user933_smpeMinAvailable.fit_predict(X_user933_smpeMinAvailable)
clusters_Y_user933_smpeMinAvailable = np.unique(Y_user933_smpeMinAvailable)
model_user933_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user933_smapekWhRequested = np.stack((np.array(jpl_user933_uniqueTime)/86400, np.array(jpl_user933_smapekWhRequested)/100), axis=1)
Y_user933_smapekWhRequested = model_user933_smapekWhRequested.fit_predict(X_user933_smapekWhRequested)
clusters_Y_user933_smapekWhRequested = np.unique(Y_user933_smapekWhRequested)

# jpl - user 406
model_user406_smpeMinAvailable = DBSCAN(eps=eps_min, min_samples=min_samples_)
X_user406_smpeMinAvailable = np.stack((np.array(jpl_user406_uniqueTime)/86400, np.array(jpl_user406_smpeMinAvailable)/100), axis=1)
Y_user406_smpeMinAvailable = model_user406_smpeMinAvailable.fit_predict(X_user406_smpeMinAvailable)
clusters_Y_user406_smpeMinAvailable = np.unique(Y_user406_smpeMinAvailable)
model_user406_smapekWhRequested = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
X_user406_smapekWhRequested = np.stack((np.array(jpl_user406_uniqueTime)/86400, np.array(jpl_user406_smapekWhRequested)/100), axis=1)
Y_user406_smapekWhRequested = model_user406_smapekWhRequested.fit_predict(X_user406_smapekWhRequested)
clusters_Y_user406_smapekWhRequested = np.unique(Y_user406_smapekWhRequested)

# plot the scatter plot for the minTotal and minAvailable for caltech users
fig44, (ax1_fig44, ax2_fig44, ax3_fig44) = plt.subplots(1, 3, figsize=(15, 5))
for cluster in clusters_Y_user743_smpeMinAvailable:
    row_ix = np.where(Y_user743_smpeMinAvailable == cluster)
    ax1_fig44.scatter(X_user743_smpeMinAvailable[row_ix, 0], X_user743_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax1_fig44.set_xlim([0, 1])
ax1_fig44.set_ylim([-1, 1])
ax1_fig44.set_title("Caltech - User 743")
ax1_fig44.set_xlabel("Time 0-24 hours in a day")
ax1_fig44.set_ylabel("%")
for cluster in clusters_Y_user562_smpeMinAvailable:
    row_ix = np.where(Y_user562_smpeMinAvailable == cluster)
    ax2_fig44.scatter(X_user562_smpeMinAvailable[row_ix, 0], X_user562_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax2_fig44.set_xlim([0, 1])
ax2_fig44.set_ylim([-1, 1])
ax2_fig44.set_title("Caltech - User 562")
ax2_fig44.set_xlabel("Time 0-24 hours in a day")
ax2_fig44.set_ylabel("%")
for cluster in clusters_Y_user891_smpeMinAvailable:
    row_ix = np.where(Y_user891_smpeMinAvailable == cluster)
    ax3_fig44.scatter(X_user891_smpeMinAvailable[row_ix, 0], X_user891_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax3_fig44.set_xlim([0, 1])
ax3_fig44.set_ylim([-1, 1])
ax3_fig44.set_title("Caltech - User 891")
ax3_fig44.set_xlabel("Time 0-24 hours in a day")
ax3_fig44.set_ylabel("%")
fig44.suptitle('SMPE minAvailable/minTotal by Arrival Time in one year', y=1.05)
plt.tight_layout()

# plot the scatter plot for the kWhDelivered and kWhRequested for caltech users
fig45, (ax1_fig45, ax2_fig45, ax3_fig45) = plt.subplots(1, 3, figsize=(15, 5))
for cluster in clusters_Y_user743_smapekWhRequested:
    row_ix = np.where(Y_user743_smapekWhRequested == cluster)
    ax1_fig45.scatter(X_user743_smapekWhRequested[row_ix, 0], X_user743_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax1_fig45.set_xlim([0, 1])
ax1_fig45.set_ylim([0, 1])
ax1_fig45.set_title("Caltech - User 743")
ax1_fig45.set_xlabel("Time 0-24 hours in a day")
ax1_fig45.set_ylabel("%")
for cluster in clusters_Y_user562_smapekWhRequested:
    row_ix = np.where(Y_user562_smapekWhRequested == cluster)
    ax2_fig45.scatter(X_user562_smapekWhRequested[row_ix, 0], X_user562_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax2_fig45.set_xlim([0, 1])
ax2_fig45.set_ylim([0, 1])
ax2_fig45.set_title("Caltech - User 562")
ax2_fig45.set_xlabel("Time 0-24 hours in a day")
ax2_fig45.set_ylabel("%")
for cluster in clusters_Y_user891_smapekWhRequested:
    row_ix = np.where(Y_user891_smapekWhRequested == cluster)
    ax3_fig45.scatter(X_user891_smapekWhRequested[row_ix, 0], X_user891_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax3_fig45.set_xlim([0, 1])
ax3_fig45.set_ylim([0, 1])
ax3_fig45.set_title("Caltech - User 891")
ax3_fig45.set_xlabel("Time 0-24 hours in a day")
ax3_fig45.set_ylabel("%")
fig45.suptitle('SMAPE kWhDelivered/kWhRequested by Arrival Time in one year', y=1.05)
plt.tight_layout()

# plot the scatter plot for the minTotal and minAvailable for jpl users
fig46, (ax1_fig46, ax2_fig46, ax3_fig46) = plt.subplots(1, 3, figsize=(15, 5))
for cluster in clusters_Y_user651_smpeMinAvailable:
    row_ix = np.where(Y_user651_smpeMinAvailable == cluster)
    ax1_fig46.scatter(X_user651_smpeMinAvailable[row_ix, 0], X_user651_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax1_fig46.set_xlim([0, 1])
ax1_fig46.set_ylim([-1, 1])
ax1_fig46.set_title("Jpl - User 651")
ax1_fig46.set_xlabel("Time 0-24 hours in a day")
ax1_fig46.set_ylabel("%")
for cluster in clusters_Y_user933_smpeMinAvailable:
    row_ix = np.where(Y_user933_smpeMinAvailable == cluster)
    ax2_fig46.scatter(X_user933_smpeMinAvailable[row_ix, 0], X_user933_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax2_fig46.set_xlim([0, 1])
ax2_fig46.set_ylim([-1, 1])
ax2_fig46.set_title("Jpl - User 933")
ax2_fig46.set_xlabel("Time 0-24 hours in a day")
ax2_fig46.set_ylabel("%")
for cluster in clusters_Y_user406_smpeMinAvailable:
    row_ix = np.where(Y_user406_smpeMinAvailable == cluster)
    ax3_fig46.scatter(X_user406_smpeMinAvailable[row_ix, 0], X_user406_smpeMinAvailable[row_ix, 1], c=color[cluster], alpha=0.2)
ax3_fig46.set_xlim([0, 1])
ax3_fig46.set_ylim([-1, 1])
ax3_fig46.set_title("Jpl - User 406")
ax3_fig46.set_xlabel("Time 0-24 hours in a day")
ax3_fig46.set_ylabel("%")
fig46.suptitle('SMPE minAvailable/minTotal by Arrival Time in one year', y=1.05)
plt.tight_layout()

# plot the scatter plot for the kWhDelivered and kWhRequested for jpl users
fig47, (ax1_fig47, ax2_fig47, ax3_fig47) = plt.subplots(1, 3, figsize=(15, 5))
for cluster in clusters_Y_user651_smapekWhRequested:
    row_ix = np.where(Y_user651_smapekWhRequested == cluster)
    ax1_fig47.scatter(X_user651_smapekWhRequested[row_ix, 0], X_user651_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax1_fig47.set_xlim([0, 1])
ax1_fig47.set_ylim([0, 1])
ax1_fig47.set_title("Jpl - User 651")
ax1_fig47.set_xlabel("Time 0-24 hours in a day")
ax1_fig47.set_ylabel("%")
for cluster in clusters_Y_user933_smapekWhRequested:
    row_ix = np.where(Y_user933_smapekWhRequested == cluster)
    ax2_fig47.scatter(X_user933_smapekWhRequested[row_ix, 0], X_user933_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax2_fig47.set_xlim([0, 1])
ax2_fig47.set_ylim([0, 1])
ax2_fig47.set_title("Jpl - User 933")
ax2_fig47.set_xlabel("Time 0-24 hours in a day")
ax2_fig47.set_ylabel("%")
for cluster in clusters_Y_user406_smapekWhRequested:
    row_ix = np.where(Y_user406_smapekWhRequested == cluster)
    ax3_fig47.scatter(X_user406_smapekWhRequested[row_ix, 0], X_user406_smapekWhRequested[row_ix, 1], c=color[cluster], alpha=0.2)
ax3_fig47.set_xlim([0, 1])
ax3_fig47.set_ylim([0, 1])
ax3_fig47.set_title("Jpl - User 406")
ax3_fig47.set_xlabel("Time 0-24 hours in a day")
ax3_fig47.set_ylabel("%")
fig47.suptitle('SMAPE kWhDelivered/kWhRequested by Arrival Time in one year', y=1.05)
plt.tight_layout()

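Predictability Index:

A user is considered predictable when the outlier fraction (number of outliers / total number of sessions) is below a threshold, where the outliers are the sessions that DBSCAN labels as noise (-1). The index printed below is one minus this fraction.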
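As a minimal sketch of the index defined above, the helper below computes it from a sequence of cluster labels, assuming the scikit-learn convention that DBSCAN marks noise points with -1. The name `predictability_index` and the example labels are hypothetical and not part of the original notebook.

```python
def predictability_index(labels, noise_label=-1):
    """Return 1 - (outliers / total) for a sequence of cluster labels.

    Assumes noise points carry `noise_label` (scikit-learn's DBSCAN uses -1).
    """
    labels = list(labels)
    if not labels:
        raise ValueError("at least one session is required")
    outliers = sum(1 for label in labels if label == noise_label)
    return 1 - outliers / len(labels)


# Illustrative labels: 8 clustered sessions and 2 outliers.
example_labels = [0, 0, 0, 1, 1, 0, -1, 1, 0, -1]
print(predictability_index(example_labels))
```

The same function could replace the repeated `1 - (np.where(Y == -1)[0].shape[0]/Y.shape[0])` expressions, since it accepts any iterable of labels.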

#collapse-hide
print("Caltech: ")
print("Time Predictability - User 743: " + str(1 - (np.where(Y_user743_smpeMinAvailable == -1)[0].shape[0]/Y_user743_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 743: " + str(1 - (np.where(Y_user743_smapekWhRequested == -1)[0].shape[0]/Y_user743_smapekWhRequested.shape[0])))
print("Time Predictability - User 562: " + str(1 - (np.where(Y_user562_smpeMinAvailable == -1)[0].shape[0]/Y_user562_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 562: " + str(1 - (np.where(Y_user562_smapekWhRequested == -1)[0].shape[0]/Y_user562_smapekWhRequested.shape[0])))
print("Time Predictability - User 891: " + str(1 - (np.where(Y_user891_smpeMinAvailable == -1)[0].shape[0]/Y_user891_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 891: " + str(1 - (np.where(Y_user891_smapekWhRequested == -1)[0].shape[0]/Y_user891_smapekWhRequested.shape[0])))
print("")
print("Jpl: ")
print("Time Predictability - User 651: " + str(1 - (np.where(Y_user651_smpeMinAvailable == -1)[0].shape[0]/Y_user651_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 651: " + str(1 - (np.where(Y_user651_smapekWhRequested == -1)[0].shape[0]/Y_user651_smapekWhRequested.shape[0])))
print("Time Predictability - User 933: " + str(1 - (np.where(Y_user933_smpeMinAvailable == -1)[0].shape[0]/Y_user933_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 933: " + str(1 - (np.where(Y_user933_smapekWhRequested == -1)[0].shape[0]/Y_user933_smapekWhRequested.shape[0])))
print("Time Predictability - User 406: " + str(1 - (np.where(Y_user406_smpeMinAvailable == -1)[0].shape[0]/Y_user406_smpeMinAvailable.shape[0])))
print("Energy Predictability - User 406: " + str(1 - (np.where(Y_user406_smapekWhRequested == -1)[0].shape[0]/Y_user406_smapekWhRequested.shape[0])))

Caltech: 
Time Predictability - User 743: 0.5800000000000001
Energy Predictability - User 743: 0.56
Time Predictability - User 562: 0.7
Energy Predictability - User 562: 0.8200000000000001
Time Predictability - User 891: 0.52
Energy Predictability - User 891: 0.74

Jpl: 
Time Predictability - User 651: 0.9
Energy Predictability - User 651: 0.86
Time Predictability - User 933: 0.6938775510204082
Energy Predictability - User 933: 0.7959183673469388
Time Predictability - User 406: 0.9795918367346939
Energy Predictability - User 406: 1.0

Final Rules:

Reliability Index (RI): $1 - SMAPE/100$

Reliability Condition: RI > 0.8

Predictability Index (PI): $1 - OUTLIER_{DATA}/TOTAL_{DATA}$

Time Data: SMPE

Energy Data: SMAPE

Predictability Condition: PI > 0.8
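The rules above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names are hypothetical, the labels R/UR and P/UP are read as reliable/unreliable and predictable/unpredictable, and both conditions are treated as strict inequalities.

```python
from statistics import mean

RI_THRESHOLD = 0.8  # reliability condition stated above
PI_THRESHOLD = 0.8  # predictability condition stated above


def reliability_index(smape_values):
    """RI = 1 - mean(SMAPE) / 100, with SMAPE expressed in percent."""
    return 1 - mean(smape_values) / 100


def classify_user(ri, pi):
    """Label a user as R/UR (reliability) and P/UP (predictability)."""
    reliability = "R" if ri > RI_THRESHOLD else "UR"
    predictability = "P" if pi > PI_THRESHOLD else "UP"
    return f"{reliability}/{predictability}"


# Time indexes reported for Jpl user 406 and Caltech user 743.
print(classify_user(0.968, 0.9796))
print(classify_user(0.850, 0.58))
```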

Table with Reliability Index and Predictability Index for users:

Statistic	Caltech - User 743	Caltech - User 562	Caltech - User 891	Jpl - User 651	Jpl - User 933	Jpl - User 406
Time/Reliability	0.850	0.786	0.533	0.776	0.853	0.968
Energy/Reliability	0.628	0.794	0.436	0.747	0.774	0.946
Time/Predictability	0.580	0.700	0.520	0.900	0.694	0.980
Energy/Predictability	0.560	0.820	0.740	0.860	0.796	1.000

Based on the decided rules we classify each user (R: reliable, UR: unreliable, P: predictable, UP: unpredictable):

Statistic	Caltech - User 743	Caltech - User 562	Caltech - User 891	Jpl - User 651	Jpl - User 933	Jpl - User 406
Time	R/UP	UR/UP	UR/UP	UR/P	R/UP	R/P
Energy	UR/UP	UR/P	UR/UP	UR/P	UR/UP	R/P

Open question: how to automatically define the condition/threshold for RI and PI?

Next, we plot histograms of the RI and PI values obtained for all users of the Caltech and Jpl datasets during 2019:

#collapse-hide

from sklearn.cluster import DBSCAN
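eps_min = 0.05      # DBSCAN neighborhood radius for the time error, in normalized units (time/86400, error/100)
eps_kwh = 0.05      # DBSCAN neighborhood radius for the energy error, in normalized units
min_samples_ = 5    # number of sessions in a neighborhood (including the point itself) needed for a core point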

# caltech
caltech_riTime_array = np.array([])
caltech_riEnergy_array = np.array([])
caltech_piTime_array = np.array([])
caltech_piEnergy_array = np.array([])
for user in caltech_2019_df['userID'].unique():
    caltech_riTime_array = np.append(caltech_riTime_array, (1 - caltech_2019_df[caltech_2019_df.userID.isin([user])].smapeMinAvailable.mean()/100))
    caltech_riEnergy_array = np.append(caltech_riEnergy_array, (1 - caltech_2019_df[caltech_2019_df.userID.isin([user])].smapekWhRequested.mean()/100))
    model = DBSCAN(eps=eps_min, min_samples=min_samples_)
    X = np.stack((np.array(caltech_2019_df[caltech_2019_df.userID.isin([user])].arrivalTime)/86400, np.array(caltech_2019_df[caltech_2019_df.userID.isin([user])].smpeMinAvailable)/100), axis=1)
    Y = model.fit_predict(X)
    caltech_piTime_array = np.append(caltech_piTime_array, (1 - (np.where(Y == -1)[0].shape[0]/Y.shape[0])))
    model = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
    X = np.stack((np.array(caltech_2019_df[caltech_2019_df.userID.isin([user])].arrivalTime)/86400, np.array(caltech_2019_df[caltech_2019_df.userID.isin([user])].smapekWhRequested)/100), axis=1)
    Y = model.fit_predict(X)
    caltech_piEnergy_array = np.append(caltech_piEnergy_array, (1 - (np.where(Y == -1)[0].shape[0]/Y.shape[0])))
    
# jpl
jpl_riTime_array = np.array([])
jpl_riEnergy_array = np.array([])
jpl_piTime_array = np.array([])
jpl_piEnergy_array = np.array([])
for user in jpl_2019_df['userID'].unique():
    jpl_riTime_array = np.append(jpl_riTime_array, (1 - jpl_2019_df[jpl_2019_df.userID.isin([user])].smapeMinAvailable.mean()/100))
    jpl_riEnergy_array = np.append(jpl_riEnergy_array, (1 - jpl_2019_df[jpl_2019_df.userID.isin([user])].smapekWhRequested.mean()/100))
    model = DBSCAN(eps=eps_min, min_samples=min_samples_)
    X = np.stack((np.array(jpl_2019_df[jpl_2019_df.userID.isin([user])].arrivalTime)/86400, np.array(jpl_2019_df[jpl_2019_df.userID.isin([user])].smpeMinAvailable)/100), axis=1)
    Y = model.fit_predict(X)
    jpl_piTime_array = np.append(jpl_piTime_array, (1 - (np.where(Y == -1)[0].shape[0]/Y.shape[0])))
    model = DBSCAN(eps=eps_kwh, min_samples=min_samples_)
    X = np.stack((np.array(jpl_2019_df[jpl_2019_df.userID.isin([user])].arrivalTime)/86400, np.array(jpl_2019_df[jpl_2019_df.userID.isin([user])].smapekWhRequested)/100), axis=1)
    Y = model.fit_predict(X)
    jpl_piEnergy_array = np.append(jpl_piEnergy_array, (1 - (np.where(Y == -1)[0].shape[0]/Y.shape[0])))
    
# plot the histogram of the reliability index for time and energy
bins_index = np.linspace(0, 1, 50)
fig49, (ax1_fig49, ax2_fig49) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig49.hist(caltech_riTime_array, bins_index, alpha=0.4, label='caltech')
ax1_fig49.hist(jpl_riTime_array, bins_index, alpha=0.4, label='jpl')
ax1_fig49.set_ylim([0, 50])
ax1_fig49.set_title("RI - Time Analysis")
ax1_fig49.set_xlabel("Reliability Index")
ax1_fig49.set_ylabel("Number of users")
ax2_fig49.hist(caltech_riEnergy_array, bins_index, alpha=0.4, label='caltech')
ax2_fig49.hist(jpl_riEnergy_array, bins_index, alpha=0.4, label='jpl')
ax2_fig49.set_ylim([0, 50])
ax2_fig49.set_title("RI - Energy Analysis")
ax2_fig49.set_xlabel("Reliability Index")
ax2_fig49.set_ylabel("Number of Users")
fig49.suptitle('Reliability Index', y=1.05)
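# plt.legend only labels the last active axes, so add a legend to each panel
ax1_fig49.legend(loc='upper right')
ax2_fig49.legend(loc='upper right')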
plt.tight_layout()

# plot the histogram of the predictability index for time and energy
bins_index = np.linspace(0, 1, 50)
fig50, (ax1_fig50, ax2_fig50) = plt.subplots(1, 2, figsize=(10, 5))
ax1_fig50.hist(caltech_piTime_array, bins_index, alpha=0.4, label='caltech')
ax1_fig50.hist(jpl_piTime_array, bins_index, alpha=0.4, label='jpl')
ax1_fig50.set_ylim([0, 250])
ax1_fig50.set_title("PI - Time Analysis")
ax1_fig50.set_xlabel("Predictability Index")
ax1_fig50.set_ylabel("Number of users")
ax2_fig50.hist(caltech_piEnergy_array, bins_index, alpha=0.4, label='caltech')
ax2_fig50.hist(jpl_piEnergy_array, bins_index, alpha=0.4, label='jpl')
ax2_fig50.set_ylim([0, 250])
ax2_fig50.set_title("PI - Energy Analysis")
ax2_fig50.set_xlabel("Predictability Index")
ax2_fig50.set_ylabel("Number of Users")
fig50.suptitle('Predictability Index', y=1.05)
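# plt.legend only labels the last active axes, so add a legend to each panel
ax1_fig50.legend(loc='upper right')
ax2_fig50.legend(loc='upper right')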
plt.tight_layout()

For the predictability index, there is a large peak at PI equal to 0 that is mostly related to users with fewer than 5 sessions, for whom DBSCAN with min_samples equal to 5 cannot form any cluster. The graph should be re-plotted with these users filtered out.
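A minimal sketch of that filtering step is shown below, assuming the session table exposes one userID per session. The helper name `users_with_enough_sessions` and the example IDs are hypothetical. In scikit-learn's DBSCAN a core point needs at least `min_samples` points in its neighborhood, counting itself, so a user with fewer sessions than that has every session labeled as noise.

```python
from collections import Counter


def users_with_enough_sessions(user_ids, min_sessions=5):
    """Return the set of users that have at least `min_sessions` sessions."""
    counts = Counter(user_ids)
    return {user for user, n in counts.items() if n >= min_sessions}


# Illustrative userID column: user 406 has 6 sessions, user 12 has only 2.
session_user_ids = [406] * 6 + [12] * 2
eligible = users_with_enough_sessions(session_user_ids)
print(eligible)
```

Inside the per-user loops, a check such as `if user not in eligible: continue` would then skip the users that cannot produce a meaningful PI.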

The RI has gaussian-like behavior with a peak around 0.8, which suggests that the reliability threshold should possibly be lowered when SMAPE is used as the metric.

For the PI, it is observed that the Jpl users are more predictable than the Caltech users, which is expected because the Jpl site is a workplace environment.

Adding the day of the week to analyze the data for a Predictable User

Next, we plot minutesTotal and kWhDelivered for Jpl user 651 against two variables: arrivalTime and day of the week. Jpl user 651 is predictable but not entirely reliable.

#collapse-hide
# day of the week as an integer (Monday = 0 ... Sunday = 6), parsed from connectionTime without its UTC offset
jpl_user651_df['day_float'] = jpl_user651_df.apply(lambda row: datetime.strptime(row.connectionTime.rsplit('-', 1)[0], fmt).weekday(), axis=1)

# plotting a 3d scatter plot
from mpl_toolkits.mplot3d import Axes3D
fig26 = plt.figure()
ax1_fig26 = fig26.add_subplot(111, projection='3d')
ax1_fig26.scatter(jpl_user651_df['arrivalTime'], jpl_user651_df['day_float'], jpl_user651_df['minutesTotal'], alpha=0.5)
ax1_fig26.set_xlabel("Arrival Time")
ax1_fig26.set_ylabel("Day of the week")
ax1_fig26.set_zlabel("Minutes Total")
fig26.suptitle('Jpl - User 651', y=1.05)
plt.tight_layout()

fig48 = plt.figure()
ax1_fig48 = fig48.add_subplot(111, projection='3d')
ax1_fig48.scatter(jpl_user651_df['arrivalTime'], jpl_user651_df['day_float'], jpl_user651_df['kWhDelivered'], alpha=0.5)
ax1_fig48.set_xlabel("Arrival Time")
ax1_fig48.set_ylabel("Day of the week")
ax1_fig48.set_zlabel("kWh Delivered")
fig48.suptitle('Jpl - User 651', y=1.05)
plt.tight_layout()
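The weekday extraction used above can be illustrated on a single timestamp. This is a sketch under assumptions: the format string `FMT` is a guess at the notebook's `fmt` variable, and the helper name is hypothetical. The timestamp is taken from one of the session tables at the end of the notebook, where the same session is listed with day Tuesday and arrivalTime 61251.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed value of the notebook's `fmt`


def weekday_and_arrival(connection_time):
    """Return (weekday, seconds since local midnight) for a timestamp string.

    The UTC offset is dropped by splitting on the last '-', which only works
    for negative offsets such as -08:00 (the case for America/Los_Angeles).
    """
    local = datetime.strptime(connection_time.rsplit("-", 1)[0], FMT)
    seconds = local.hour * 3600 + local.minute * 60 + local.second
    return local.weekday(), seconds  # weekday(): Monday = 0 ... Sunday = 6


print(weekday_and_arrival("2019-01-01 17:00:51-08:00"))  # (1, 61251)
```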

An additional analysis that includes the day of the week as a feature in the DBSCAN clustering used to calculate the predictability index is planned as future work.
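One possible way to build the extended feature vector is sketched below. It is not the notebook's method: the function name is hypothetical, and scaling the weekday by 6 is an assumption chosen only so that all three features share a comparable range.

```python
def session_features(arrival_seconds, weekday, error_percent):
    """Scale one session to a 3-D point for a distance-based clustering.

    arrival_seconds: seconds since midnight (0..86400)
    weekday:         0 (Monday) .. 6 (Sunday), as returned by datetime.weekday()
    error_percent:   SMPE or SMAPE in percent
    """
    return (arrival_seconds / 86400, weekday / 6, error_percent / 100)


# A session arriving at noon on a Thursday with an SMPE of -50 %.
print(session_features(43200, 3, -50.0))
```

With this scaling, two sessions on consecutive weekdays are about 0.167 apart on the weekday axis, which is larger than the eps of 0.05 used in this notebook, so they would never be neighbors. Either eps or the weekday scale would need to be revisited, and the cyclic nature of the week (Sunday next to Monday) is not captured.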

Analysis 3 - Conclusion

In this section we investigated the individual driver charging behavior for 6 users of two sites, Caltech and Jpl, 3 in each site. The analysis was performed for the year of 2019.

We observe that the deviation within individual users is smaller than in the aggregate behavior, which reinforces our claim that better modeling of individual driver charging behavior is needed.
We observe gaussian-like behavior for the different error metrics analyzed. We also propose using a different error metric depending on whether time or energy is being analyzed or predicted: SMPE for the time analysis and SMAPE for the energy analysis.
We propose a reliability index and a predictability index. In the optimal case, the reliability of each user should be 1, and this should be incentivized by government and company policies when users enter their data in the mobile application of the Adaptive Charging Network framework of [1] used in our analysis. The predictability index is based on the DBSCAN algorithm, and the results show interesting clustering patterns.

Missing

The occupancy of each spaceID in the garage is not covered in this analysis, as we expect that the number of users is not large enough for charging congestion during the day to be a problem. For the Jpl site, [1] mentions a policy that incentivizes users to unplug after charging is complete, but we do not know more details about this policy.

An occupancy analysis, especially for the Jpl site, is needed.

Also, according to the site data, the capacity of the Caltech site has increased to 300 kW compared to the value reported in [1]. Therefore, a capacity analysis is also to be performed.

Finally, the adaptive scheduling algorithm will be investigated by a pilot signal analysis.

This analysis is performed in a different Jupyter notebook.

References

[1] Z. Lee et al., "ACN-Data: Analysis and Applications of an Open EV Charging Dataset", Proceedings of the 10th ACM International Conference on Future Energy Systems (e-Energy '19), 2019.

[2] M. G. Flammini et al., "Statistical characterisation of the real transaction data gathered from electric vehicle charging stations", Electric Power Systems Research, vol. 166, pp. 136-150, 2019.

	Unnamed: 0	_id	clusterID	connectionTime	disconnectTime	doneChargingTime	kWhDelivered	sessionID	siteID	spaceID	...	timezone	userID	userInputs	minutesCharging	minutesIdle	minutesTotal	userInputsArray	minutesAvailable	kWhRequested	requestedDeparture
11	11	5bc93946f9af8b0dc677d735	39	2018-09-01 09:29:36-07:00	2018-09-01 17:16:50-07:00	2018-09-01 15:56:06-07:00	42.138	2_39_91_437_2018-09-01 16:29:35.661527	2	CA-317	...	America/Los_Angeles	291.0	[{'userID': 291, 'milesRequested': 120, 'WhPer...	386.500000	80.733333	467.233333	[{'userID': 291, 'milesRequested': 120, 'WhPer...	570	48.0	Sun, 02 Sep 2018 01:59:36 GMT
82	82	5bc93979f9af8b0dc677d77c	39	2018-09-02 17:55:50-07:00	2018-09-02 23:43:59-07:00	2018-09-02 23:43:49-07:00	38.773	2_39_138_29_2018-09-03 00:55:50.190398	2	CA-304	...	America/Los_Angeles	325.0	[{'userID': 325, 'milesRequested': 200, 'WhPer...	347.983333	0.166667	348.150000	[{'userID': 325, 'milesRequested': 200, 'WhPer...	60	80.0	Mon, 03 Sep 2018 01:55:50 GMT
102	102	5bc939a0f9af8b0dc677d790	39	2018-09-03 12:11:29-07:00	2018-09-03 17:20:16-07:00	2018-09-03 14:58:55-07:00	17.577	2_39_78_363_2018-09-03 19:11:28.722398	2	CA-320	...	America/Los_Angeles	241.0	[{'userID': 241, 'milesRequested': 80, 'WhPerM...	167.433333	141.350000	308.783333	[{'userID': 241, 'milesRequested': 80, 'WhPerM...	60	32.0	Mon, 03 Sep 2018 20:11:29 GMT
192	192	5bc939e4f9af8b0dc677d7ea	39	2018-09-04 17:13:07-07:00	2018-09-04 19:05:37-07:00	2018-09-04 19:05:29-07:00	12.425	2_39_139_28_2018-09-05 00:13:06.867600	2	CA-303	...	America/Los_Angeles	216.0	[{'userID': 216, 'milesRequested': 80, 'WhPerM...	112.366667	0.133333	112.500000	[{'userID': 216, 'milesRequested': 80, 'WhPerM...	103	20.0	Wed, 05 Sep 2018 01:56:07 GMT
278	278	5bc93a46f9af8b0dc677d840	39	2018-09-05 18:03:55-07:00	2018-09-05 20:14:01-07:00	2018-09-05 20:13:57-07:00	14.897	2_39_95_27_2018-09-06 01:03:54.928777	2	CA-319	...	America/Los_Angeles	231.0	[{'userID': 231, 'milesRequested': 120, 'WhPer...	130.033333	0.066667	130.100000	[{'userID': 231, 'milesRequested': 120, 'WhPer...	192	30.0	Thu, 06 Sep 2018 04:15:55 GMT

	Unnamed: 0	_id	clusterID	connectionTime	disconnectTime	doneChargingTime	kWhDelivered	sessionID	siteID	spaceID	...	minutesCharging	minutesIdle	minutesTotal	userInputsArray	minutesAvailable	kWhRequested	requestedDeparture	day	arrivalTime	departureTime
1	1	5c412c1df9af8b12cb56c27d	39	2019-01-01 10:09:17-08:00	2019-01-01 18:39:32-08:00	2019-01-01 12:16:10-08:00	12.534	2_39_79_379_2019-01-01 18:09:16.991864	2	CA-327	...	126.883333	383.366667	510.250000	[{'userID': 558, 'milesRequested': 80, 'WhPerM...	514	17.76	Wed, 02 Jan 2019 02:43:17 GMT	Tuesday	36557	67172
4	4	5c412c1df9af8b12cb56c280	39	2019-01-01 13:05:57-08:00	2019-01-01 18:03:02-08:00	2019-01-01 17:59:27-08:00	16.136	2_39_79_378_2019-01-01 21:05:56.972890	2	CA-326	...	293.500000	3.583333	297.083333	[{'userID': 1135, 'milesRequested': 50, 'WhPer...	282	20.00	Wed, 02 Jan 2019 01:47:57 GMT	Tuesday	47157	64982
5	5	5c412c1df9af8b12cb56c281	39	2019-01-01 13:08:49-08:00	2019-01-01 14:52:39-08:00	2019-01-01 14:17:15-08:00	2.917	2_39_139_28_2019-01-01 21:08:49.264929	2	CA-303	...	68.433333	35.400000	103.833333	[{'userID': 838, 'milesRequested': 60, 'WhPerM...	277	25.98	Wed, 02 Jan 2019 01:45:49 GMT	Tuesday	47329	53559
7	7	5c412c1df9af8b12cb56c283	39	2019-01-01 14:23:40-08:00	2019-01-01 16:54:51-08:00	2019-01-01 16:30:48-08:00	13.273	2_39_125_21_2019-01-01 22:23:40.471724	2	CA-311	...	127.133333	24.050000	151.183333	[{'userID': 754, 'milesRequested': 50, 'WhPerM...	170	15.00	Wed, 02 Jan 2019 01:13:40 GMT	Tuesday	51820	60891
8	8	5c427db1f9af8b1e1a538287	39	2019-01-02 05:53:10-08:00	2019-01-02 14:38:52-08:00	2019-01-02 11:50:11-08:00	36.608	2_39_127_19_2019-01-02 13:53:10.182304	2	CA-309	...	357.016667	168.683333	525.700000	[{'userID': 556, 'milesRequested': 210, 'WhPer...	494	63.00	Wed, 02 Jan 2019 22:07:10 GMT	Wednesday	21190	52732

	Unnamed: 0	_id	clusterID	connectionTime	disconnectTime	doneChargingTime	kWhDelivered	sessionID	siteID	spaceID	...	kWhRequested	requestedDeparture	day	arrivalTime	departureTime	maeMinAvailable	smapeMinAvailable	smpeMinAvailable	maekWhRequested	smapekWhRequested
0	0	5c367215f9af8b4639a8f35f	1	2019-01-01 17:00:51-08:00	2019-01-01 18:39:46-08:00	2019-01-01 18:39:37-08:00	10.143	1_1_193_829_2019-01-02 01:00:51.413435	1	AG-1F03	...	34.32	Wed, 02 Jan 2019 03:19:51 GMT	Tuesday	61251	67186	40.083333	16.847636	16.847636	24.177	54.375548
1	1	5c367245f9af8b4639a8f360	1	2019-01-02 05:39:11-08:00	2019-01-02 17:19:57-08:00	2019-01-02 07:37:12-08:00	5.871	1_1_191_789_2019-01-02 13:39:11.359003	1	AG-4F52	...	10.15	Wed, 02 Jan 2019 20:54:11 GMT	Wednesday	20351	62397	265.766667	23.399759	-23.399759	4.279	26.708695
2	2	5c367245f9af8b4639a8f361	1	2019-01-02 05:44:27-08:00	2019-01-02 14:37:33-08:00	2019-01-02 11:18:16-08:00	12.094	1_1_178_823_2019-01-02 13:44:26.828039	1	AG-1F08	...	20.00	Wed, 02 Jan 2019 23:36:27 GMT	Wednesday	20667	52653	58.900000	5.235090	5.235090	7.906	24.633888
3	3	5c367245f9af8b4639a8f362	1	2019-01-02 05:47:38-08:00	2019-01-02 11:01:31-08:00	2019-01-02 07:06:07-08:00	2.425	1_1_193_829_2019-01-02 13:47:38.465648	1	AG-1F03	...	4.00	Wed, 02 Jan 2019 17:48:38 GMT	Wednesday	20858	39691	72.883333	13.134893	-13.134893	1.575	24.513619
4	4	5c367245f9af8b4639a8f363	1	2019-01-02 05:53:41-08:00	2019-01-02 13:40:03-08:00	2019-01-02 08:45:46-08:00	14.331	1_1_193_819_2019-01-02 13:53:40.716472	1	AG-1F06	...	16.00	Wed, 02 Jan 2019 16:20:41 GMT	Wednesday	21221	49203	319.366667	52.067822	-52.067822	1.669	5.502621
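The error columns in the last table appear consistent with the definitions sketched below. These formulas are inferred from the displayed rows rather than taken from the notebook's code, so treat them as an assumption. In particular, the sign convention of SMPE (stated value minus observed value) is read off the signs in the smpeMinAvailable column, and the function names are hypothetical.

```python
def smape(stated, observed):
    """Symmetric absolute percentage error in percent (0..100)."""
    return 100 * abs(stated - observed) / (abs(stated) + abs(observed))


def smpe(stated, observed):
    """Signed variant: positive when the stated value exceeds the observed one."""
    return 100 * (stated - observed) / (abs(stated) + abs(observed))


# First row of the last table: kWhRequested = 34.32, kWhDelivered = 10.143,
# listed with smapekWhRequested = 54.375548.
print(round(smape(34.32, 10.143), 3))
```

Under these definitions the reliability index $1 - SMAPE/100$ stays between 0 and 1, which matches the range of the RI histograms.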