Tuesday, December 1, 2015

Project - Simple Web Scraper




https://sourceforge.net/projects/iad-dispatch-web-scraper/

Goal: 

Write a very, very, very (did I mention very?) simple program to pull data from a simple website and plot the information on a graph. Additional work can be done to build a small database and do analytics on the data.

Introduction: 

The Dulles International Airport (IAD) near Washington, D.C. has a taxi service provided by the Washington Flyer. Taxi cabs are leased by drivers and rides are regulated using a queue system. Drivers enter a corral near the Arrival gate and wait for dispatchers to announce passengers.

A website displays useful information about the queue: the number of taxis waiting, the wait time of the last vehicle out, and the number of taxis that exited the corral in the past hour. This information is updated a few times every hour, not in real time.

Motivation: 

The program should attempt to answer these "ten questions". (Ten being a fluid number that can range from one to however many I want. Ten is whatever number I say it is.)

  • What is the time of day that has the shortest expected wait time?
  • What is my expected wait time when I enter the corral?
  • Are there shorter wait times on certain days?
  • How many rides can I get between the hours of 11pm and 5am?
  • How many hours should I expect to work to average five rides a day?
  • ...
  • Ten
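Once some data has been collected, the first question could be approached by grouping wait times by hour of day. The following is only a rough sketch: the function names and the (timestamp, wait in minutes) sample layout are my own assumptions, and the numbers are invented purely for illustration, not real queue data.

```python
from collections import defaultdict
from datetime import datetime


def mean_wait_by_hour(samples):
    """Average the wait (in minutes) for each hour of day.

    `samples` is an iterable of (datetime, wait_minutes) pairs.
    This layout is an assumption made for the sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of waits, count]
    for timestamp, wait_minutes in samples:
        bucket = totals[timestamp.hour]
        bucket[0] += wait_minutes
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}


def best_hour(samples):
    """Return the hour of day with the lowest average wait."""
    means = mean_wait_by_hour(samples)
    return min(means, key=means.get)


# Invented sample values, for illustration only.
samples = [
    (datetime(2015, 11, 30, 14, 0), 60),
    (datetime(2015, 11, 30, 14, 30), 80),
    (datetime(2015, 11, 30, 23, 0), 30),
]
print(mean_wait_by_hour(samples))
print(best_hour(samples))
```

With only a few days of data the averages will be noisy, so a question like "are there shorter wait times on certain days?" probably needs several weeks of samples before the answer means much.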

Steps: 

  1. find out how to connect to a website and download html
  2. find out how to auto-refresh and fetch the data periodically
  3. learn how to parse html for relevant data: departure times, numbers of cars
  4. generate plots
  5. create user interface to display info
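Steps 1 and 2 can probably be done with the Python standard library alone. Here is a minimal sketch, assuming the page can be fetched with a plain GET request. The names `fetch_html` and `poll` are made up for this example, and the polling interval is a guess, since the site only updates a few times an hour.

```python
import time
import urllib.request


def fetch_html(url, timeout=10):
    """Step 1: download a page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def poll(url, interval_seconds, max_polls, fetch=fetch_html, sleep=time.sleep):
    """Step 2: fetch the page max_polls times, pausing between fetches.

    `fetch` and `sleep` are parameters so the loop can be tried out
    without touching the network or actually waiting.
    """
    pages = []
    for i in range(max_polls):
        try:
            pages.append(fetch(url))
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            print(f"fetch failed: {err}")
        if i < max_polls - 1:
            sleep(interval_seconds)
    return pages
```

Something like `poll(queue_url, 600, 6)` would then grab the page every ten minutes for an hour, where `queue_url` stands in for the address of the queue page. A failed fetch is skipped rather than retried, which seems good enough for data that changes this slowly.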

Tools:

Python, BeautifulSoup library
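For step 3, BeautifulSoup can pull the three numbers out of the downloaded HTML. I have not documented the real page's markup here, so the sample HTML and the element ids below (`updated`, `holding-lot-count`, `last-wait-minutes`, `exits-past-hour`) are all hypothetical stand-ins. The real page will need its own selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's structure and ids will differ.
SAMPLE_HTML = """
<html><body>
  <span id="updated">11/30/2015 14:00:52</span>
  <span id="holding-lot-count">42</span>
  <span id="last-wait-minutes">95</span>
  <span id="exits-past-hour">17</span>
</body></html>
"""


def parse_queue(html):
    """Extract queue fields from the page, using None for anything missing."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(element_id):
        tag = soup.find(id=element_id)
        return tag.get_text(strip=True) if tag is not None else None

    def int_of(element_id):
        text = text_of(element_id)
        return int(text) if text is not None and text.isdigit() else None

    return {
        "updated": text_of("updated"),
        "holding_lot_count": int_of("holding-lot-count"),
        "last_wait_minutes": int_of("last-wait-minutes"),
        "exits_past_hour": int_of("exits-past-hour"),
    }


print(parse_queue(SAMPLE_HTML))
```

Returning `None` for missing fields, instead of raising, means one malformed page will not stop a long-running collection loop.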

To do:

  • Plot the data points by date and time, parsing timestamps in this format: "11/30/2015 14:00:52"
  • Figure out which plots best show the data.
    • Holding lot count vs date and time of dispatch update
    • Wait time of last dispatched taxi vs date and time of dispatch (entry time + wait time)
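The timestamp format above maps directly onto `datetime.strptime`. This small sketch parses it and works out a dispatch time as entry time plus wait time. The helper names are made up, and it assumes the wait is reported in minutes, which I have not verified.

```python
from datetime import datetime, timedelta

# Matches timestamps such as "11/30/2015 14:00:52".
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_timestamp(text):
    """Turn a string like "11/30/2015 14:00:52" into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def dispatch_time(entry_time, wait_minutes):
    """Dispatch time = entry time + wait time (wait assumed to be in minutes)."""
    return entry_time + timedelta(minutes=wait_minutes)


entry = parse_timestamp("11/30/2015 14:00:52")
print(entry)
print(dispatch_time(entry, 90))
```

The resulting `datetime` objects can go straight onto the x-axis of a plot, since matplotlib accepts them as date values.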
