What Is in Your Caching Database?

Data.

Initially, the idea of building the database was to preserve my caching history. By recording each cache find and my log to accompany it, there would be a back-up of the data should anything ever happen to the data I posted on a listing site. There were some genuine concerns about the financial health of GC.com during 2002.

By the time I actually began building this database, the scope of the project had already grown. The core was the same: to preserve my caching history. What that would encompass, however, grew well beyond recording the cache and the log. The database was going to be a learning tool for advancing my skills in database design and for learning how to interact with a database via a web site. These tasks provide a seemingly endless set of projects for me. I like that.

So, in February 2003, when I actually began building this thing, I began collecting cache information about caches I had found. At that time I had fewer than 200 finds. A couple of points about what I had decided to collect:

  1. I wanted to avoid copyrightable material. Since I was collecting just my finds, the logs were not an issue. But I decided not to include the description of the cache. I went back and forth on this. While it is someone’s property, if this database is to preserve my history, it should reflect the cache as I knew it when I found it. But, if it was going to manifest itself to a web site at some point, well then, it wasn’t mine to reproduce.
  2. I made some mistakes. Good database design was not followed for I hadn’t thought through this project well enough. I did not think how I would use the coordinates, so I took them in the format GC.com posts coordinates. I did not record the hide date or the difficulty-terrain ratings. Again, poor design on my end which has caused me hours (days?) of extra work later on. (I still have not collected the difficulty-terrain ratings and am trying to convince myself I do not need them.)
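
To illustrate the coordinate problem in point 2: the listing site shows coordinates as degrees and decimal minutes (for example `N 39° 45.000 W 074° 30.000`), which is awkward to sort or do arithmetic on. Below is a minimal Python sketch of converting that text into decimal degrees. The function name and the sample coordinates are invented for illustration, and this is not the conversion actually used in the database described here.

```python
import re

# One half of a coordinate pair: hemisphere letter, whole degrees, decimal minutes.
# Example of the assumed input format: "N 39° 45.000 W 074° 30.000"
_PART = re.compile(r"([NSEW])\s*(\d{1,3})\s*°?\s*(\d{1,2}(?:\.\d+)?)")


def to_decimal_degrees(text):
    """Return (latitude, longitude) in decimal degrees (hypothetical helper)."""
    parts = _PART.findall(text)
    if len(parts) != 2:
        raise ValueError(f"expected a latitude and a longitude, got: {text!r}")

    values = {}
    for hemisphere, degrees, minutes in parts:
        value = int(degrees) + float(minutes) / 60.0
        if hemisphere in "SW":  # south and west are negative
            value = -value
        values["lat" if hemisphere in "NS" else "lon"] = value

    if set(values) != {"lat", "lon"}:
        raise ValueError(f"need one N/S part and one E/W part, got: {text!r}")
    return round(values["lat"], 6), round(values["lon"], 6)


print(to_decimal_degrees("N 39° 45.000 W 074° 30.000"))  # (39.75, -74.5)
```

Storing the two numbers in their own fields, alongside the original text if desired, makes later distance or bounding-box queries much simpler.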

I muddled through for a few months. I began during this time to design a web site (look at the hidden caches image for the design) that would eventually house the database. I learned from that little experience that without a database, a personal geocacher’s web site does not have much to offer. 😉 I scrapped the idea quickly, but made some notes about what would be needed. I had all my finds recorded in the database and was keeping up-to-date with the new finds. I was also stuck in the same place I had been with a database I keep at work: photographs. I did not know how to properly place photographs in or link to a database. I futzed with a few different ways, but things were not going quite as I planned.

The first six months of the database had passed and although the finds were well-recorded, other things were in disarray. I had added a cachers table so I could keep in one place my friends’ telephone numbers, which I kept losing. I had some other fields in this too, but this table was (and is) incomplete. I had begun inputting information about my hitch hikers, their movements, and hitch hikers I had found. That needed a lot of attention, for I had not thought through how to handle the movements. And there was still the problem with the photographs. In addition, GC.com was making some big changes on its site.

The Terms of Use, which I guess had always been present over at the big site, was now an in-your-face document. One could not submit a cache without agreeing to it. Being who I am, I read the thing. Now, before I go off on this rant, let me put up front that it is only good business practice for any site to have a TOU to protect itself. I have no problem with the document in theory. The problem I had was the way it was written. In its attempt at covering all the bases, Groundspeak placed each cacher in jeopardy of violating the TOU each day. If I showed a .GPX file on my PDA to someone, that was a violation. There were many other similarly worded snafus in the making. I made a stink out of it. To no avail, I might add. 🙁 My protest was to take the first step that I had considered for some time: removing myself publicly from the game. I stopped posting photographs online. GC.com had always reserved the right to use the photographs as they saw fit, even though I retained ownership of them. But the new TOU spelled out much more bluntly how they could use these photographs. The beginnings of what could turn into a nightmare (as far as I am concerned) can be seen in the monthly calendars the big site distributes. On many of them are photographs taken by regular cachers. Now that it has partnered with Today’s Cacher, it wouldn’t be too difficult to imagine that other photographs could be offered up to help the big site. This all helps further solidify the brand name. In August 2003, this was a big concern of mine.

In addition to refraining from posting images, I began to not post finds as well. I sought to see how many I could not log without the community noticing. As fall rolled in, I knew I needed to get this database in better shape. When October came, I had a reason. I finally took the plunge I had wanted to take since 1994; I registered frolickin.com as a domain. The host company was recommended in a thread on GC.com. I liked the deal and it came with four features I wanted: support for Access databases, a lot of disk space, phpBB, and a low price. Now I had the incentive to get the database in shape so I could actually mount it on a web server.

I went through the cache table and neatened up the URLs, added an owner field to distinguish it from the hider, and added an index number. I began querying the data and found that I could generate some interesting results. In late fall, I began in earnest to verify that I had all the photographs that were posted on GC.com in my possession. Once I did that, I began purging the photographs from the big site. This changed my presence, I believe. No longer would folks know me by sight like Nik did at the Bordentown event. No longer would folks know that I posted a self-portrait from each cache. I didn’t recognize that change until after it was over. I had about 40 pages of images to delete. It went quickly.

After deleting the photographs, I knew that it wouldn’t be long before the logs went too. Unlike others, I didn’t want to do it in a huff. I also wanted more time to think through the decision and any ramifications.

As 2004 came, something new took my interest. I had planned an event and the more I worked with my database, the more I was finding I had South Jersey’s history in my hands. I wasn’t around in the very beginning, but pretty close to it. I had found each cache in South Jersey and had a database of information about each. For the event, I wanted to create some puzzles based on that information. I requested help and after some time hit the jackpot. Nate provided me with Atsion Crossword as the first South Jersey cache. Later, someone else provided me with a list of the first ten waypoints in New Jersey. In it was a waypoint no one had known about. That really sparked my excitement. I began re-tracing the area’s history. I went through each person who logged Atsion Crossword to see what else he hid or found. This was a somewhat tedious, but at the same time exciting, endeavor. I found caches I had never heard of just sitting there. I added all this to my database. What I noticed was how quickly the cachers table grew. Before this time, there were a couple hundred cachers listed. By the time I had gone through it all (including every page of CCCooperAgency’s and StayFloopy’s logs), that table had swelled to well past one thousand. It’s possible there’s a cache or two out there I missed. If there were early SJ caches without finds placed by folks who did not find other area caches, I would not have found them. I think that scenario is unlikely and am confident of what I have found.

Once I had every South Jersey cache in the database, it wasn’t long before I recognized that if I recorded each find on each of those caches, I could generate some statistics (my approach to statistics will be described elsewhere). So, about the time of the Piney Luncheon event, I began culling all the finds on these caches. Once I added each find for a cache to the database, I would add the cache to my Watch List. That ensures that any new logs to the page are e-mailed to me for inclusion in the database (this also ensures that I will continue being a paying member of geocaching.com). This was an arduous task. With over 600 caches and over 10,000 find logs, it was a couple months before I had it all. But once I did, it became simple to generate leaderboards, query for first finds, determine how many caches were placed when I joined (30, fwiw), etc. All of this was exciting.
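
The three queries mentioned above (leaderboards, first finds, and caches placed by a given date) can be sketched roughly as follows. This is only an illustration: it uses Python's built-in `sqlite3` module rather than Access, and the table names, column names, cacher names, and rows are all invented for the example, not taken from the real database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up caches and finds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE caches (id INTEGER PRIMARY KEY, name TEXT, placed_on TEXT);
        CREATE TABLE finds  (cache_id INTEGER REFERENCES caches(id),
                             cacher TEXT, found_on TEXT);
    """)
    conn.executemany("INSERT INTO caches VALUES (?, ?, ?)", [
        (1, "Cache One", "2001-01-10"),
        (2, "Cache Two", "2001-03-05"),
        (3, "Cache Three", "2001-09-20"),
    ])
    conn.executemany("INSERT INTO finds VALUES (?, ?, ?)", [
        (1, "alpha", "2001-02-01"),
        (1, "bravo", "2001-02-15"),
        (2, "bravo", "2001-03-06"),
        (2, "alpha", "2001-04-01"),
        (3, "bravo", "2001-10-01"),
    ])
    return conn


def leaderboard(conn):
    """Find counts per cacher, highest first (ties broken by name)."""
    return conn.execute(
        "SELECT cacher, COUNT(*) AS n FROM finds "
        "GROUP BY cacher ORDER BY n DESC, cacher"
    ).fetchall()


def first_finds(conn):
    """For each cache, the cacher whose find has the earliest date."""
    return conn.execute("""
        SELECT c.name, f.cacher
        FROM finds f JOIN caches c ON c.id = f.cache_id
        WHERE f.found_on = (SELECT MIN(found_on) FROM finds
                            WHERE cache_id = f.cache_id)
        ORDER BY c.id
    """).fetchall()


def caches_placed_by(conn, date):
    """How many caches had been placed on or before an ISO date string."""
    return conn.execute(
        "SELECT COUNT(*) FROM caches WHERE placed_on <= ?", (date,)
    ).fetchone()[0]


conn = build_demo_db()
print(leaderboard(conn))                     # [('bravo', 3), ('alpha', 2)]
print(first_finds(conn))
print(caches_placed_by(conn, "2001-06-01"))  # 2
```

Dates are stored as ISO `YYYY-MM-DD` text so that plain string comparison sorts them correctly. A real table would probably need a tie-breaker for two finds logged on the same day.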

There was no reason to post my logs publicly any longer. I had a mechanism that provided me with everything I wanted and needed. I could slip somewhat from the public face of caching and concentrate on my own thing. And with a baby coming along, if I wasn’t able to get out there for a while, no one would be using my numbers for their comparison. I archived each and every one of my find logs at GC.com. I wish I had thought to bookmark each first, in case I ever wanted to point someone to a log over there as opposed to here. I didn’t and it is too late now.

Removing my find logs had a pleasant after-effect. Pocket Queries are no longer mandatory for me. I had used PQs to determine which caches I hadn’t found in South Jersey. I generate those lists differently now. I can create my own PQ based on that and am none the worse for wear.
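
One way such a "not yet found" list could be produced from a local database is an anti-join: keep every cache that has no matching find by a given cacher. The sketch below again uses Python's `sqlite3` with invented table names, cacher names, and rows. It is a guess at the general shape of the query, not the one actually in use here.

```python
import sqlite3


def unfound_caches(conn, cacher):
    """Names of caches with no recorded find by `cacher` (LEFT JOIN anti-join)."""
    rows = conn.execute("""
        SELECT c.name
        FROM caches c
        LEFT JOIN finds f ON f.cache_id = c.id AND f.cacher = ?
        WHERE f.cache_id IS NULL      -- no matching find row for this cacher
        ORDER BY c.name
    """, (cacher,))
    return [name for (name,) in rows]


def build_unfound_demo():
    """Small in-memory database with made-up data for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE caches (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE finds  (cache_id INTEGER, cacher TEXT);
        INSERT INTO caches VALUES (1, 'Cache One'), (2, 'Cache Two'), (3, 'Cache Three');
        INSERT INTO finds  VALUES (1, 'alpha'), (2, 'alpha'), (3, 'bravo');
    """)
    return conn


demo = build_unfound_demo()
print(unfound_caches(demo, "alpha"))  # ['Cache Three']
```

The cacher filter sits in the `ON` clause rather than the `WHERE` clause, so caches found only by other people still show up as unfound.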

Now I am further tweaking the database. I had effectively removed my web site for the past month or two and left The Fora up. I am now re-building the site, putting into practice what I have learned about databases, and generally am excited at how much cleaner the code is for all this once I know a little bit. 🙂

I have also taken on placing Morris County into the database. Why? Orange used to keep those stats. I have discussed some things with him about stats and thought this would provide another perfect task for me to learn from. By adding Morris County, I can then learn how to properly split a database. If and when I get to that point, I am certain I will have a need to do so with the South Jersey data and will then know how to do it. This whole thing truly is a learning experience for me. What I learn from this project is put into practice at my job. So, one can safely state that geocaching is an efficient use of your tax dollars!
