Add Missing Data to WeeWX
Background
Having gotten a caravan, I wanted to make sure that it does not get cold and damp, especially through the winter months, so I sought out a way to keep track of the internal temperature and humidity of the caravan.
After much research I settled on an Ecowitt GW1000 WiFi Weather Station Gateway and an Ecowitt WH32 Outdoor Temperature and Humidity Sensor.
Initially I set them up to capture the temperature and record it on the Ecowitt website, but I always planned to merge this data into the WeeWX database. That integration is now finished, and a description of how to do it is available here.
So now that the integration is finished I need to populate WeeWX with the missing data. The first thing I tried was wee_import, but this only adds new records; it does not update existing ones. This guide therefore shows how I achieved the updates needed to add the missing data.
Warning
This process requires you to make changes to the WeeWX database outside of the control of the WeeWX program, and if you are not experienced and/or confident working with relational databases then I suggest you obtain additional help.
For background: I have worked with relational databases for over 30 years and currently work in a global role, advising organisations on the best way to modernise their databases through the Database Freedom Programme.
Add Missing Data to WeeWX
As I said, I have been sending my temperature readings to the Ecowitt website since I installed my temperature sensor, and luckily for me the website provides an easy way to download your data on a daily, weekly, monthly or yearly basis, each with a different level of granularity:
- Daily: returns the data with one record every five minutes
- Weekly: returns the data with one record every thirty minutes
- Monthly: returns the data with one record every four hours
- Yearly: returns the data with one record every day
As I was downloading about four months of data I decided to download the weekly reports. That still meant I had to consolidate around sixteen Microsoft Excel files, but it gave me sufficient granularity for the purposes of storing historical data.
Once I had consolidated all of that into a single Excel spreadsheet, I extracted the data in CSV format and ended up with a file containing data in this format:
date_and_time,extraTemp2,extraHumid2,extraTemp3,extraHumid1
2020-11-03 20:00,74.48,60,72.68,51
2020-11-03 20:30,74.3,60,72.68,51
2020-11-03 21:00,74.12,61,72.32,52
2020-11-03 21:30,73.94,62,72.68,51
2020-11-03 22:00,73.94,62,73.22,49
2020-11-03 22:30,73.76,63,72.5,50
2020-11-03 23:00,73.76,64,72.86,49
2020-11-03 23:30,73.76,64,72.32,51
2020-11-04 00:00,73.76,64,71.96,51
2020-11-04 00:30,73.22,61,71.78,51
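As an aside, I did the consolidation by hand in Excel, but if you save each weekly export as CSV first, the merge itself can be scripted. A minimal sketch, assuming the exports all share the same header row; the directory and file names here are illustrative, not from my actual setup:

import csv
import glob

# Hypothetical locations: point the glob at your weekly Ecowitt exports
WEEKLY_GLOB = '/tmp/ecowitt-weekly/*.csv'
MERGED_FILE = '/tmp/Cougar-Caravan.csv'

with open(MERGED_FILE, 'w', newline='') as merged:
    writer = csv.writer(merged)
    headerWritten = False
    for weekly in sorted(glob.glob(WEEKLY_GLOB)):
        with open(weekly, newline='') as source:
            reader = csv.reader(source)
            header = next(reader)          # every export starts with the same header
            if not headerWritten:
                writer.writerow(header)    # keep the header once
                headerWritten = True
            writer.writerows(reader)       # append the data rows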
Create Python Script to Parse the File
The next part of this process was actually a lot easier than I anticipated: I simply had to create a Python script to read and parse the file and then update the SQLite database.
As this is only ever going to be a one-off activity I used constants to store the directory names for the CSV files and the database location, and in the sample program below you can see that each constant has two values, one for testing and one for the actual update, with the unused one commented out.
Another point of note is that the SQL statement for the update is hard-coded into this program; again, as this will only ever be a one-off activity, that is probably the best approach. Other than the column names, which should be self-explanatory, the only calculation required is converting the date into epoch time to match the records in the database.
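One caveat worth flagging on that conversion: time.mktime() interprets the parsed timestamp in the machine's local timezone, while calendar.timegm() treats it as UTC. The script below uses mktime(), which is correct as long as the export and the WeeWX machine share a timezone; if your export really is in UTC you would want the second form. A two-line sketch of both:

import calendar
import time

parsed = time.strptime('2020-11-03 20:00', '%Y-%m-%d %H:%M')
localEpoch = int(time.mktime(parsed))   # treats the stamp as local time (what the script below does)
utcEpoch = calendar.timegm(parsed)      # treats the stamp as UTC instead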
The code I used is replicated below and downloadable from here.
pi@raspberrypi:~ $ vi weeUpdate.py

import csv
import os
import sqlite3
import time

# Two values for each location: one for testing, one for the real run
#CSV_FILE_PATH = '/tmp'
CSV_FILE_PATH = '/Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt'
#SQLITE_DB = '/home/weewx/weewx.sdb'
SQLITE_DB = '/Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt/weewx.sdb'

SOURCE_FILE = 'Cougar-Caravan.csv'
DEST_FILE = 'Epoc-Caravan.csv'

if __name__ == "__main__":
    firstRow = True
    sourceFile = os.path.join(CSV_FILE_PATH, SOURCE_FILE)
    destFile = os.path.join(CSV_FILE_PATH, DEST_FILE)

    # Audit copy of the data with the timestamp converted to epoch time
    destData = open(destFile, 'w')
    destWriter = csv.writer(destData, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)

    sqliteDB = sqlite3.connect(SQLITE_DB)
    sqliteCursor = sqliteDB.cursor()

    with open(sourceFile, 'r') as sourceData:
        csvReader = csv.reader(sourceData, delimiter=',')
        for sLineIn in csvReader:
            if firstRow:
                # Pass the header row straight through to the audit file
                destWriter.writerow(sLineIn)
                firstRow = False
            else:
                # Convert the timestamp to epoch time to match the archive table
                epochTime = int(time.mktime(time.strptime(sLineIn[0], '%Y-%m-%d %H:%M')))
                destWriter.writerow([epochTime, sLineIn[1], sLineIn[2],
                                     sLineIn[3], sLineIn[4]])

                # Hard-coded one-off update against the matching archive row
                sqlString = 'update archive set extraTemp2 = ' + sLineIn[1] + ','
                sqlString = sqlString + 'extraHumid2 = ' + sLineIn[2] + ','
                sqlString = sqlString + 'extraTemp3 = ' + sLineIn[3] + ','
                sqlString = sqlString + 'extraHumid1 = ' + sLineIn[4] + ' '
                sqlString = sqlString + 'where datetime = ' + str(epochTime) + ';'
                sqliteCursor.execute(sqlString)

    destData.close()
    sqliteDB.commit()
    sqliteCursor.close()
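A design note: because every value comes from a file I created myself, building the SQL by string concatenation was acceptable for a one-off, but sqlite3 also supports parameter binding, which sidesteps quoting and injection issues entirely. The execute() call in the loop above could be replaced with something like this:

sqliteCursor.execute(
    'update archive set extraTemp2 = ?, extraHumid2 = ?, '
    'extraTemp3 = ?, extraHumid1 = ? '
    'where datetime = ?',
    (sLineIn[1], sLineIn[2], sLineIn[3], sLineIn[4], epochTime))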
Test Test Test
I shouldn't need to say this, but you are updating data in your production database, and if you get it wrong you could end up corrupting everything; not the ideal way to find out whether your backup strategy works. You do have a backup strategy, don't you?
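If WeeWX might still be writing while you take the copy, a plain file copy can catch the database mid-transaction. A minimal sketch using the sqlite3 backup API (available from Python 3.7), with illustrative paths rather than anything from my setup:

import sqlite3

# Illustrative paths: point these at your own database and backup location
source = sqlite3.connect('/home/weewx/weewx.sdb')
target = sqlite3.connect('/home/weewx/weewx-backup.sdb')
with target:
    source.backup(target)   # takes a consistent snapshot even while WeeWX writes
target.close()
source.close()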
Copy your database file somewhere safe and then connect to the database to examine the records that we will be updating. In my case extraTemp1 contains the temperature of my Raspberry Pi CPU, but the other columns are empty.
pi@weewx: $ sqlite3 /Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt/weewx.sdb
sqlite> select datetime, extraTemp1, extraTemp2, extraTemp3 from archive where datetime > 1610638200;
1610638500|128.4008||
1610638800|126.6476||
1610639100|124.0178||
1610639400|124.8944||
1610639700|124.8944||
1610640000|124.0178||
1610640300|124.8944||
1610640600|124.8944||
1610640900|126.6476||
1610641200|125.771||
Run the update program and then examine the data again. The program also creates a new CSV file, Epoc-Caravan.csv in my case, containing the updated information with the timestamp converted to epoch time; this file can be used to perform spot checks against the database.
pi@weewx: $ python3 weeUpdate.py
pi@weewx: $ sqlite3 /Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt/weewx.sdb
sqlite> select datetime, extraTemp1, extraTemp2, extraTemp3 from archive where datetime > 1610638200;
1610638500|128.4008|34.7|64.6
1610638800|126.6476|34.5|64.9
1610639100|124.0178|34.5|65.3
1610639400|124.8944|34.5|65.5
1610639700|124.8944|34.5|65.5
1610640000|124.0178|34.7|65.66
1610640300|124.8944|34.3|65.3
1610640600|124.8944|34.5|65.5
1610640900|126.6476|34.7|65.7
1610641200|125.771|35.1|65.8
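Rather than eyeballing individual rows, the spot checks can be scripted too. A sketch that re-reads the epoch-time CSV and flags any row that does not match the archive table; the paths mirror the constants in weeUpdate.py, and the script itself is my suggestion, not part of the original process:

import csv
import os
import sqlite3

# Same locations as the constants in weeUpdate.py above
CSV_FILE_PATH = '/Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt'
SQLITE_DB = os.path.join(CSV_FILE_PATH, 'weewx.sdb')
CHECK_FILE = os.path.join(CSV_FILE_PATH, 'Epoc-Caravan.csv')

db = sqlite3.connect(SQLITE_DB)
cursor = db.cursor()

with open(CHECK_FILE) as checkData:
    reader = csv.reader(checkData)
    next(reader)    # skip the header row
    for row in reader:
        cursor.execute('select extraTemp2, extraHumid2, extraTemp3, extraHumid1 '
                       'from archive where datetime = ?', (int(row[0]),))
        dbRow = cursor.fetchone()
        if dbRow is None or list(dbRow) != [float(v) for v in row[1:]]:
            print('mismatch at', row[0], ':', row[1:], 'vs', dbRow)

db.close()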
Populate Summary Tables
What we have done here is update the archive rows with the new data, but the summary reports run against a different set of tables: the daily summary tables, which hold one row per day recording that day's minimum, maximum, sum and count for each observation. For the data I have just updated these are:
- archive_day_extraTemp2
- archive_day_extraTemp3
- archive_day_extraHumid1
- archive_day_extraHumid2
First let's have a look at the data in these tables:
pi@weewx: $ sqlite3 /Users/revittmk/OneDrive/Mike/WebSites/Weewx/DataLoads/Ecowitt/weewx.sdb
sqlite> select * from archive_day_extraTemp2 where datetime > 1604361600;
1604448000||||||||
1604534400||||||||
1604620800||||||||
sqlite> select * from archive_day_extraTemp3 where datetime > 1604361600;
1604448000||||||||
1604534400||||||||
1604620800||||||||
sqlite> select * from archive_day_extraHumid1 where datetime > 1604361600;
1604448000||||||||
1604534400||||||||
1604620800||||||||
sqlite> select * from archive_day_extraHumid2 where datetime > 1604361600;
1604448000||||||||
1604534400||||||||
1604620800||||||||
Next we run the WeeWX utility wee_database; this will populate the daily summary tables correctly, so that when the reports run the graphs include the data we just updated.
As before, I wanted to test this before running the database utility against my production database, so I created a copy of my main config file and changed the location of the database in that file. This allowed me to point the database utility at the test database via the test copy of the config file.
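For reference, in my copy of weewx.conf (WeeWX 4.x) the SQLite database location is resolved from the [Databases] and [DatabaseTypes] stanzas, so the test copy of the config only needed the path changed. Roughly like this; the paths are illustrative, and older WeeWX releases lay this section out differently, so check your own config:

[Databases]
    [[archive_sqlite]]
        database_name = weewx.sdb
        database_type = SQLite

[DatabaseTypes]
    [[SQLite]]
        # Changed in weewx-db.conf to point at the directory holding the test copy
        SQLITE_ROOT = /home/weewx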
pi@weewx: $ wee_database --config=weewx-db.conf --rebuild-daily --from=2020-11-01

sqlite> select * from archive_day_extraTemp2 where datetime > 1604361600;
1604448000|42.98|1604534400|73.22|1604449800|2622.84|48|2622.84|48
1604534400|41.9|1604545200|51.8|1604563200|2304.6|48|2304.6|48
1604620800|44.06|1604707200|54.32|1604649600|2337.0|48|2337.0|48
sqlite> select * from archive_day_extraTemp3 where datetime > 1604361600;
1604448000|66.74|1604509200|76.1|1604453400|3435.36|48|3435.36|48
1604534400|65.48|1604590200|76.28|1604611800|3339.96|48|3339.96|48
1604620800|64.94|1604676600|76.28|1604701800|3362.28|48|3362.28|48
sqlite> select * from archive_day_extraHumid1 where datetime > 1604361600;
1604448000|47.0|1604451600|54.0|1604485800|2430.0|48|2430.0|48
1604534400|50.0|1604611800|57.0|1604559600|2580.0|48|2580.0|48
1604620800|48.0|1604674800|57.0|1604644200|2525.0|48|2525.0|48
sqlite> select * from archive_day_extraHumid2 where datetime > 1604361600;
1604448000|56.0|1604475000|74.0|1604521800|3281.0|48|3281.0|48
1604534400|72.0|1604565000|77.0|1604610000|3567.0|48|3567.0|48
1604620800|73.0|1604651400|78.0|1604696400|3631.0|48|3631.0|48
And that's all there is to it: WeeWX will now create all of your graphs and reports from the data you just loaded.