I have a dataset that gathers stock option data per hour. Unfortunately some dates and hours are missing due to some maintenance etc..
Here is one exemple with a dataframe (Dates_,Price), as you can see the day "2018-02-08" we only have 2 hours available instead of 24.
2018-02-07-00H 7603.39000000
2018-02-07-01H 7655.02000000
2018-02-07-02H 7544.96000000
2018-02-07-03H 7576.33000000
2018-02-07-04H 7282.02000000
2018-02-07-05H 7200.00000000
2018-02-07-06H 7451.01000000
2018-02-07-07H 7420.00000000
2018-02-07-08H 7460.11000000
2018-02-07-09H 7669.97000000
2018-02-07-10H 7654.92000000
2018-02-07-11H 7962.89000000
2018-02-07-12H 8169.00000000
2018-02-07-13H 8138.05000000
2018-02-07-14H 8266.00000000
2018-02-07-15H 8190.03000000
2018-02-07-16H 8209.98000000
2018-02-07-17H 8165.00000000
2018-02-07-18H 7919.94000000
2018-02-07-19H 8284.31000000
2018-02-07-20H 8225.00000000
2018-02-07-21H 8130.81000000
2018-02-07-22H 8126.96000000
2018-02-07-23H 8073.00000000
2018-02-08-00H 7781.00000000
2018-02-08-01H 7599.00000000
2018-02-09-10H 7789.90000000
2018-02-09-11H 8179.80000000
2018-02-09-12H 8301.01000000
2018-02-09-13H 8215.85000000
2018-02-09-14H 8279.78000000
2018-02-09-15H 8390.05000000
2018-02-09-16H 8387.76000000
2018-02-09-17H 8422.21000000
2018-02-09-18H 8499.36000000
2018-02-09-19H 8628.87000000
2018-02-09-20H 8693.84000000
2018-02-09-21H 8610.00000000
2018-02-09-22H 8499.00000000
2018-02-09-23H 8475.54000000
2018-02-10-00H 8627.97000000
2018-02-10-01H 8789.85000000
is there a way to find the missing dates ? and also to populate the dataset with the missing dates (adding NA in other columns) ?
I tried this method to find the missing dates:
pd.date_range(df.Date_.min(), df.Date_.max()).difference(df.Date_)
but this doen't work well with the hours as it returned only 1 missing hour this day at 6am...
Partager