Thursday, March 13, 2008

Car or Auto Make-Model-Year Database : For Breakfast

Make Model What?

If you, like me, were tasked with loading a database of recent car makes, models, and years, you would start by searching the web to see whether someone has already published one, ideally for free or perhaps for a small nominal fee.

If only it were that simple...

I looked and looked, and couldn't find anything that fit those requirements. So I thought, who would know about US car models better than Kelley Blue Book? I went to their site, and sure enough, they have a JavaScript file that lists every make and model of used car they know about. Since the file is public, I figured it's not really "evil" if I scrape and parse it for my own benefit. Disagree? Have a better source? Then leave a comment.

Anyway, to cut a long story short, I'm hoping to save a day or so for someone else who, like me, may be looking for this information. The Ruby module shown below retrieves the JavaScript from the KBB site and parses it into a Ruby data structure of the following form: a hash keyed on make, then on model, with a list of years as the value:

 
>> Constants::Auto::DB.keys.sort[0..5]
=> ["AMC", "Acura", "Alfa Romeo", "Audi", "BMW", "Bertone"]
>> Constants::Auto::DB["Subaru"].keys.sort[0..5]
=> ["B9 Tribeca", "Baja", "DL", "Forester", "GL", "GL-10"]
>> Constants::Auto::DB["Audi"]["A4"]
=> ["1999", "2007", "1998", "2006", "2005", "1996", "2004", "2003", "2002", "1997", "2001", "2000"]
>> Constants::Auto::DB["BMW"]["X5"]
=> ["2003", "2002", "2001", "2000", "2005", "2007", "2006", "2004"]
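Note that the years come back as strings in no particular order. Here is a minimal sketch of two lookup helpers on top of that structure. The names SAMPLE_DB, years_for, and models_in_year are made up for illustration, and the sample hash holds only the two entries shown in the session above:

```ruby
# Two entries copied from the irb session above; a full DB would hold every make.
SAMPLE_DB = {
  "Audi" => { "A4" => ["1999", "2007", "1998", "2006", "2005", "1996",
                       "2004", "2003", "2002", "1997", "2001", "2000"] },
  "BMW"  => { "X5" => ["2003", "2002", "2001", "2000", "2005", "2007", "2006", "2004"] }
}

# Years for one make/model as sorted integers; empty array if the pair is unknown.
def years_for(db, make, model)
  db.fetch(make, {}).fetch(model, []).map(&:to_i).sort
end

# Every [make, model] pair that lists the given year, sorted by make then model.
def models_in_year(db, year)
  db.flat_map do |make, models|
    models.select { |_model, years| years.include?(year.to_s) }
          .keys
          .map { |model| [make, model] }
  end.sort
end

puts years_for(SAMPLE_DB, "BMW", "X5").inspect
puts models_in_year(SAMPLE_DB, 2005).inspect
```

Converting to integers at lookup time leaves the cached hash exactly as the parser produced it.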

The idea is to load the hash once with @models = KBB::Parser.new.to_hash and save the output of @models.inspect in a local constants file, which is why I use Constants::Auto::DB above. (I have a Rake task for this; let me know if I should post it too.) Keep in mind that hitting their site every time you need the data is clearly evil. Use this class for the initial load, save the result of the inspect call into a Ruby file, and use that cached copy in your app. Re-run the load whenever you think car models have been added or changed on KBB.
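The Rake task itself isn't shown here, so what follows is only a rough sketch of the caching step, not the actual task. The method name write_auto_constants and the output path are hypothetical, and a small literal hash stands in for KBB::Parser.new.to_hash so the example runs without touching the network:

```ruby
require 'tmpdir'

# Writes a make => model => years hash to a Ruby source file that defines
# Constants::Auto::DB. Method name and file location are assumptions.
def write_auto_constants(models, path)
  File.open(path, 'w') do |f|
    f.puts "# Generated file. Re-run the loader to refresh it."
    f.puts "module Constants"
    f.puts "  module Auto"
    f.puts "    DB = #{models.inspect}"  # Hash#inspect emits valid Ruby for strings and arrays
    f.puts "  end"
    f.puts "end"
  end
  path
end

# Stand-in for KBB::Parser.new.to_hash.
sample = { "BMW" => { "X5" => ["2003", "2002"] } }

path = write_auto_constants(sample, File.join(Dir.tmpdir, "auto_constants_sample.rb"))
load path  # in an app this file would simply be required at boot
puts Constants::Auto::DB["BMW"]["X5"].inspect
```

This relies on inspect round-tripping, which holds for plain hashes, arrays, and strings like the ones the parser builds.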

Please let me know if you find this code useful, or if you find a better/cleaner/more comprehensive way of maintaining car make/model/year database.

#
# author: Konstantin Gredeskoul © 2008
# license: public domain
#
require 'net/http'
require 'uri'

module KBB
  MODELS_URL = "http://file.kbb.com/kbb/ymmData.axd?VehicleClass=UsedCar"

  class Models
    def initialize(js)    
      @models = {}
      @makes = {}
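      # Make list for a year: ymUsed_[<year>] = '<id>|<name>,<id>|<name>,...'
      n = /ymUsed_\[\d{4}\]\s*=\s*'([^']+)'/
      # Model list for a year and make id: ymmUsed_["<year>~<make id>"] = "<id>|<name>,..."
      m = /ymmUsed_\["(\d+)~(\d+)"\]\s*=\s*"([^"]+)"/
      js.split(/\n/).each do |line|
        next if line.strip.empty?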
        if matched = n.match(line)
          matched[1].split(/,/).each do |token|
            id, name = token.split('|')
            @makes[id.to_i] = name
          end
        end
        
        if matched = m.match(line)
          year, make_id, models = matched[1], matched[2], matched[3]
          models.split(/,/).each do |t| 
            id, model_name = t.split('|')
            make_name = @makes[make_id.to_i]
            @models[make_name] ||= {}
            @models[make_name][model_name] ||= []
            @models[make_name][model_name] << year
          end
        end
      end
    end
    
    def to_hash
      @models
    end
  end

  class Parser
    def initialize
      @m = Models.new(Net::HTTP.get(URI.parse(MODELS_URL)))
    end
    def to_hash
      @m.to_hash
    end
  end

end
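To see what the two regular expressions expect without fetching anything, here is a small offline sketch. The input lines in SAMPLE_JS are invented to fit the patterns (the ids and the exact layout are assumptions, not real KBB data), and parse_sample is a hypothetical stand-alone helper that follows the same steps as KBB::Models:

```ruby
# Regular expressions copied from KBB::Models.
MAKE_RE  = /ymUsed_\[\d{4}\]\s*=\s*'([^']+)'/
MODEL_RE = /ymmUsed_\["(\d+)~(\d+)"\]\s*=\s*"([^"]+)"/

# Invented input shaped to satisfy the patterns; not actual KBB content.
SAMPLE_JS = <<~JS
  ymUsed_[2007] = '4|Audi,5|BMW';
  ymmUsed_["2007~4"] = "101|A4,102|A6";
  ymmUsed_["2006~4"] = "101|A4";
  ymmUsed_["2007~5"] = "201|X5";
JS

def parse_sample(js)
  makes  = {}  # make id => make name
  models = {}  # make name => { model name => [years] }
  js.each_line do |line|
    if (matched = MAKE_RE.match(line))
      matched[1].split(',').each do |token|
        id, name = token.split('|')
        makes[id.to_i] = name
      end
    end
    if (matched = MODEL_RE.match(line))
      year, make_id, list = matched[1], matched[2], matched[3]
      list.split(',').each do |token|
        _id, model_name = token.split('|')
        make_name = makes[make_id.to_i]
        models[make_name] ||= {}
        (models[make_name][model_name] ||= []) << year
      end
    end
  end
  models
end

PARSED_SAMPLE = parse_sample(SAMPLE_JS)
puts PARSED_SAMPLE.inspect
```

Note that a make line has to appear before the model lines that refer to its id; otherwise the lookup in makes returns nil and the models end up under a nil key.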

11 comments:

Anonymous said...

hi

thank you for this - what a great idea

i was wondering what language is that in?

i was given the task to compile this database in one week, but spent a lot of time looking for a list, but no dice...

so if i can make this work, then you saved my day

thanks

kristine

Konstantin Gredeskoul said...

It's written in Ruby. Email me if you have trouble running it.

Kristine said...

oops, i realized that after i re-read , lol...

thought i'd ask... do you have a php version of it by chance?

either way, this is super helpful!

thanks again ya

kristine

Kristine said...

hey konstantin,

here is a basic PHP array:

$file = file('./ymmData.axd');

$patternMake = '/ymUsed_\[\d{4}\]\s*=\s*\'([^\']+)\'/';
$patternModel = '/ymmUsed_\["(\d+)~(\d+)"\]\s*=\s*"([^"]+)"/';

foreach($file as $row) {
    if(preg_match($patternMake,$row,$matched))
    {
        $tmpMakes = explode(',',$matched[1]);
        foreach($tmpMakes as $str) {
            list($id,$name) = explode("|",$str);
            $arrMakes[$id] = $name;
        }
    }
    unset($str);
    if(preg_match($patternModel,$row,$matched))
    {
        $year = $matched[1];
        $make_id = $matched[2];
        $models = $matched[3];
        $tmpModels = explode(',',$models);
        foreach($tmpModels as $str) {
            list($id,$model_name) = explode("|",$str);
            $make_name = $arrMakes[$make_id];
            $arrModels[$make_name][$model_name][$year] = $year;
        }
    }
}

ksort($arrModels);

echo

Lester said...

This is exactly what I am looking for, but I am not exactly sure how this is supposed to be setup. Can you please enlighten me?

Konstantin Gredeskoul said...

Lester, it's a Ruby script, so it's meant to be run with the Ruby interpreter. Some programming knowledge is required to take advantage of this code.

ntv1534 said...

Wow, great job doing this! I'm familiar with Python but not Ruby, though they look pretty similar. Is that built-in regular expression support I see? Savage...

For other people who are making ASP.NET websites or don't want to parse a .axd file, it looks like the Selection Service is directly queryable and addable from here



Seekda has a better description, that allows you to see the results of an HTTP-POST
here

That being said, I still need to figure out how to actually use these properly (Web Service n00b here), but it's good to know they're out there if you don't want to do brute force parsing on the javascript!

matt said...

this is an AMAZING idea, but I'm absolutely lost as to how to get it into a database (I'm running a PHP/SQL setup). any help?

theshirey said...

This is soooo great. I actually need this for a program that I'm writing for a friends Car Audio shop and I'm wondering if you can enlighten me on how I can do this. I'm a newbie at Web Design. I'm actually not going to be using this online at all. It's going to be a stand-alone program. I would greatly appreciate any help.