You really can graph and analyze anything!
Holly caught me a month ago playing with visualizing some data. When she asked what it was, I had to admit that it was fuel economy data for my new car. I had decided to track the miles I drove between fill-ups and how many gallons I put in each time I bought gas. Chrome on Android lets you add a bookmark to your home screen as an icon, so I set up a form to make entering new data easy.
After I had recorded a few data points, I was tempted to write a quick couple of scripts to plot my gas mileage for me. Then it struck me that I work at PetroDE, and the thing I work on daily does exactly that.
So I scrapped my plans for those scripts and instead just fed my CSV into PetroDE.
Holly’s been demanding that I write up something showing how I use PetroDE. I resisted a bit, because I have so little data that most of our tools don’t make much sense on a dataset this small. But she was very insistent, so here we go.
Data Layout
For the first month or two of data collection, hardly a fill-up went by without my changing the layout of the form or the data I was recording. Each change made the form easier to use while standing at the pump, or saved me some data munging later, when I was sitting at my desk getting the raw data ready for upload.
My data is stored in two layers in PetroDE. The first is called gasstations. I laid out this table very simply:
where_id | Where | latitude | longitude |
15 | I-25 & 7th Murphy Express | 40.0008445 | -104.9905128 |
16 | 144th & I-25 Murphy Express | 39.9567016 | -104.9847027 |
This list is pretty static. I don’t gas up at very many places, but when I visit a new one I can just add it to the end of the list and upload the table again. I keep where_id and Where as separate columns so that I can link my gaslog on the ID without worrying about what I named the gas station. This way I also only have to geolocate each gas station once and can then just reference it.
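The ID-based link between the two tables can be sketched outside PetroDE as a simple join. This is a minimal illustration, not how PetroDE performs the link internally: it assumes the gasstations table is exported as a comma-separated CSV with the headers shown above, that each gaslog row's Where value has already been replaced with a where_id string, and the function names are hypothetical.

```python
import csv
import io


def load_stations(csv_text):
    """Map where_id -> (name, latitude, longitude) from a gasstations-style CSV.

    Assumes headers where_id, Where, latitude, longitude (as in the table above).
    """
    stations = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stations[row["where_id"]] = (
            row["Where"],
            float(row["latitude"]),
            float(row["longitude"]),
        )
    return stations


def attach_locations(gaslog_rows, stations):
    """Return copies of gaslog rows with the station name and coordinates added.

    Assumes each row's "Where" value already holds a where_id. An unknown ID
    raises KeyError, which flags a station missing from gasstations.
    """
    joined = []
    for row in gaslog_rows:
        name, lat, lon = stations[row["Where"]]
        joined.append({**row, "station": name, "latitude": lat, "longitude": lon})
    return joined
```

Keying on the ID rather than the name is what lets a station be geolocated once and referenced by every fill-up there, even if its descriptive name is later edited.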
The exciting table is called gaslog. I do a bit of manual massaging of the data from the simple form I fill in while standing at the pump. These are the columns in gaslog:
- Timestamp
- Gas Prices (/gal)
- Miles Driven (mi)
- Gallons (gal)
- Octane
- Where
- Odometer Reading
- Actual Price
- Expected reading
- mpg
- price (calc)
When I’m standing at the pump, I only have to fill in the columns from Gas Prices (/gal) through Where. The form automatically records the submission time, and I use the spreadsheet to calculate the rest of the columns. I also duplicate the Octane column so that PetroDE can treat it as both a number and a string. While I’m waiting for the pump to click off, I type a descriptive name into Where; later, at my desk, I replace that name with the station’s where_id. I could write a script to do that for me, but, in the spirit of xkcd 1205 (“Is It Worth the Time?”), it would only save me maybe 2 hours over 5 years.
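The post doesn’t give the spreadsheet formulas, so the following is only a guess at what the calculated columns might contain: mpg as miles driven divided by gallons, price (calc) as price per gallon times gallons, and Expected reading as the previous odometer reading plus the miles driven since. Those formulas, the assumption that rows arrive in chronological order with numeric values, and the function name are all hypothetical.

```python
def derived_columns(rows):
    """Add guessed 'mpg', 'price (calc)' and 'Expected reading' values to each fill-up.

    rows: fill-ups in chronological order, as dicts with numeric values.
    All three formulas are assumptions, not the post's actual spreadsheet.
    """
    out = []
    previous_odometer = None
    for row in rows:
        miles = row["Miles Driven (mi)"]
        gallons = row["Gallons (gal)"]
        new = dict(row)
        # Fuel economy for this tank.
        new["mpg"] = miles / gallons
        # What the pump should have charged, rounded to cents.
        new["price (calc)"] = round(row["Gas Prices (/gal)"] * gallons, 2)
        # Expected odometer: last recorded reading plus the trip distance.
        # The first fill-up has no previous reading to build on.
        new["Expected reading"] = (
            None if previous_odometer is None else previous_odometer + miles
        )
        previous_odometer = row["Odometer Reading"]
        out.append(new)
    return out
```

Comparing Expected reading with the actual Odometer Reading is one way such a column could catch a typo made at the pump.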
Once I have that table as a CSV, I use our data updater to replace the layer in PetroDE without having to re-create my linked layer. If you’ve never used it, it’s neat: it saves you from having to re-style a layer you update regularly. In the Help Guide, it’s under “Display tool: Automatically Update a Layer”. I drop my zip file into the directory, and a few seconds later I get an email saying my layer has been updated. Larger layers take longer for the system to ingest.
The first time I uploaded these two layers, I created the link like this:
Remember that I changed the Where column in gaslog to use the where_ids from the gasstations table. Since the gasstations layer is geolocated, I can use its geometry. When I click on a map marker, I want the annotation bubble to be titled with my descriptive gas station names. The last drop-down is particularly interesting, and it showed me a feature I didn’t know PetroDE had until I used it. Since I’ve gassed up at the same station several times, gaslog has multiple entries for the same gas station. In the map annotation bubble I want to be able to distinguish each gaslog row by the date I filled up, and setting this drop-down to Timestamp lets me do that. In case you’ve never seen a bubble with multiple values like that, here’s what it looks like:
The drop-down on the Timestamp entry lets you choose which entry you’re looking at. All the columns listed above Timestamp are the same for every entry; everything below it differs at least once. So by clicking around on the map markers, I can easily see at which gas stations I’ve only ever bought one octane.
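Roughly the same information the bubble exposes can be sketched by grouping gaslog rows per station. This is an illustration, not PetroDE’s implementation; it assumes each row’s Where holds a where_id, that Timestamp values are ISO-8601 strings (which sort chronologically as plain text), and the function names are hypothetical.

```python
from collections import defaultdict


def entries_by_station(gaslog_rows):
    """Group fill-ups by where_id, each group ordered by Timestamp.

    Assumes ISO-8601 Timestamp strings, so text order is chronological.
    """
    groups = defaultdict(list)
    for row in gaslog_rows:
        groups[row["Where"]].append(row)
    for rows in groups.values():
        rows.sort(key=lambda r: r["Timestamp"])
    return dict(groups)


def single_octane_stations(gaslog_rows):
    """Return the where_ids at which every fill-up used the same octane."""
    octanes = defaultdict(set)
    for row in gaslog_rows:
        octanes[row["Where"]].add(row["Octane"])
    return sorted(wid for wid, seen in octanes.items() if len(seen) == 1)
```

A station whose Octane never varies across its entries is exactly the case where Octane would show up above Timestamp in the bubble.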
Enough exploration of raw data! Let’s do something exciting with the Analysis card.
Data Analysis
Let’s jump straight to the easiest thing we can throw onto the map. I’ll make a heatmap of miles per gallon.
Let’s say I’ve just remembered that this heatmap covers all of my data. (I haven’t just remembered: I’ve made this heatmap many times before and have had to come to this realization several times.) The heatmap is going to be skewed because I’ve bought different octanes at different locations. I can clean that up either by refining the data or by doing a chart subsearch. I’ll use a chart subsearch, because it lets me show you one of my favorite charts before I throw that data out.
This bar chart naturally visualizes a lot of things. First, picking MPG as my left Y-axis means that the octanes are plotted from highest MPG to lowest. Adding the sum of miles driven at each octane gives you a good idea of how much confidence to place in that MPG figure. I’ve driven about 4,500 miles with 87 octane in my tank, and I’ve been averaging 31.7 mpg. In the screenshot, you can see that clicking on the bar chart has performed a subsearch on that data and redrawn my heatmap to include only the selected bar’s data.
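The per-octane summary behind that bar chart, plus the click-to-filter subsearch, can be sketched as follows. It assumes rows are dicts with numeric mpg and Miles Driven (mi) values; the function names are hypothetical, and it is an assumption that the chart’s MPG is a plain mean of per-tank mpg, since the post doesn’t say how PetroDE aggregates it.

```python
from collections import defaultdict
from statistics import mean


def octane_summary(rows):
    """Per-octane (octane, mean MPG, total miles), sorted from highest mean MPG down.

    Mean MPG here is the plain average of per-tank mpg values (an assumption).
    """
    mpgs = defaultdict(list)
    miles = defaultdict(float)
    for row in rows:
        mpgs[row["Octane"]].append(row["mpg"])
        miles[row["Octane"]] += row["Miles Driven (mi)"]
    summary = [(octane, mean(values), miles[octane]) for octane, values in mpgs.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)


def subsearch(rows, octane):
    """Keep only the fill-ups at one octane, like clicking a bar."""
    return [row for row in rows if row["Octane"] == octane]
```

A plain mean weights every tank equally; dividing total miles by total gallons instead would weight long tanks more heavily, and the two can differ noticeably on a small dataset.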
Because only 15 data points go into this heatmap, there’s not much to conclude. Still, the fuel economy I’m getting from the pumps in the little triangle cut out by I-25 and Highway 36 is odd.
Now I just want to look at my 87 octane data, so I’m going to refine on that (you can’t draw a chart from a previous chart’s subsearch) and switch my chart to plot where I’ve been fueling up and how many gallons I’ve bought at each station.
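The refine-then-chart step amounts to a filter followed by a group-and-sum. A minimal sketch, assuming rows are dicts with a where_id in Where and numeric Gallons (gal); the function name is hypothetical.

```python
from collections import Counter


def gallons_by_station(rows, octane="87"):
    """Refine to one octane, then total the gallons bought at each station.

    Returns (where_id, gallons) pairs, largest total first.
    """
    totals = Counter()
    for row in rows:
        if row["Octane"] == octane:
            totals[row["Where"]] += row["Gallons (gal)"]
    return totals.most_common()
```

Filtering before grouping mirrors refining the layer: later charts see only the refined rows rather than inheriting a subsearch.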
Interesting. The Broomfield and Heritage gas stations are anomalies. But the station at 144th and I-25 has really good 87 octane. I’ve only fueled up there 4 times, but this supports a suspicion a friend and I have had about that station: we had both noticed, anecdotally, that we got better fuel economy on tanks from there than from most places. My data isn’t statistically significant yet, but this is an encouraging early result.
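One way to check whether one station’s tanks really differ from the rest is Welch’s t-test, which doesn’t assume the two groups have equal variance. This is a swapped-in technique for illustration, not something PetroDE does; the sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. A p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values, and they must not all be identical.
    """
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

With only 4 tanks in one group, the degrees of freedom stay small, so a fairly large t is needed before the difference counts as significant.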
Just a few more charts in closing.
I’m interested to see the distribution of my MPG data over the time I’ve owned the car.
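A distribution chart like this is essentially a histogram. A minimal sketch of fixed-width binning, assuming a plain list of per-tank mpg values; the 1 mpg default bin width and the function name are hypothetical choices, not PetroDE’s.

```python
import math
from collections import Counter


def mpg_histogram(mpg_values, bin_width=1.0):
    """Count fill-ups per MPG bin, returned as sorted (lower_edge, count) pairs."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in mpg_values)
    return sorted(counts.items())
```

With so few fill-ups, wider bins give steadier counts at the cost of detail.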
Time series charts are frustrating animals. You need a good understanding of your data to make valuable charts that progress through time. When I first got my car, I thought the most valuable findings would come from time series charts. I might still learn some interesting things by graphing my data this way, but I’m going to need a lot more data, which means I’m going to have to be patient. For now, here’s the most interesting time series graph I’ve found in this data.
It took me a while to realize this, but a stacked time series chart is only worth using when you’re comparing the area under each line. PetroDE helpfully sorts the lines by total area, with the largest at the bottom.
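The sort-by-area ordering can be sketched with the trapezoid rule. This is an illustration under assumptions, not PetroDE’s code: it assumes each series has ascending numeric x values (for example, days since the first fill-up) and that “total area” means the trapezoid-rule area under that series’ line. The function names are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under a polyline through the points (xs[i], ys[i]), via the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def stack_order(series):
    """Order series bottom-to-top for a stacked chart, largest total area first.

    series: dict mapping name -> (xs, ys), with xs numeric and ascending.
    """
    areas = {name: trapezoid_area(xs, ys) for name, (xs, ys) in series.items()}
    return sorted(areas, key=areas.get, reverse=True)
```

Putting the largest area at the bottom gives the biggest band a flat baseline, which makes it the easiest one to read.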
Hopefully that was interesting to someone. I like looking at it, but it’s my car and my data, so I can’t imagine it being interesting to anyone but me. Holly seems to think it will be.