Agenda 7/16

  1. Overview
    1. Review data story spreadsheet
    2. Review blogs (Flowing Data, Source)
    3. Go over scraping readings for this week:
      1. Ethics of scraping – http://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
      2. Getting data from the Web – challenges – http://datajournalismhandbook.org/1.0/en/getting_data_3.html
  2. What is Web scraping?
    1. http://blog.screen-scraper.com/2008/04/21/screening-scraping-ethics/
  3. Scrape a website with programming: Dataset
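    1. Download Python. (If you are on a Mac, Python may already be installed; open the Terminal and type python to check.)
    2. In the terminal, run pip install beautifulsoup4 and pip install requests. (The csv module ships with Python's standard library, so it does not need to be installed with pip.)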
    3. Discuss how to identify what we want to scrape
    4. Go over the parts of a webpage and how they work together.
    5. Grab the file I created, and run it successfully: https://gist.github.com/michelleminkoff/377900bf7288a8871c34
    6. Comment out/remove various parts of the file and talk about what makes it work.
  4. Scrape a website without programming (this tutorial may become homework)
    1. Download the dataset and the Scraper extension.
      1. Learn how to turn a list into a spreadsheet
    2. Introduce import.io, a more flexible option, using the same dataset.
    3. Try more advanced techniques in OutWit Hub, using the same dataset
      1. Pull specific links/images
      2. Use your knowledge of how the Web works to write a more complex scraper based on before/after markers – http://michelleminkoff.com/web-scraping-without-programming-nicar-2012-hands-on-tutorial/
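The programming steps in item 3 follow a common pattern. You fetch a page with requests, parse it with Beautiful Soup, and write the pieces you want to a CSV file. The sketch below illustrates that pattern under stated assumptions. It is not the code in the linked gist. The idea that the data sits in HTML table rows and the function names `parse_rows` and `scrape_to_csv` are hypothetical placeholders; adjust them to fit the page you are scraping.

```python
import csv

import requests
from bs4 import BeautifulSoup


def parse_rows(html):
    """Return the text of each table row in `html` as a list of lists of strings.

    Assumes (hypothetically) that the data sits in <tr> rows whose cells
    are <th> or <td> elements; change the tags to match your page.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells in document order, trimming whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows with no cells
            rows.append(cells)
    return rows


def scrape_to_csv(url, out_path):
    """Download `url`, extract its table rows, and write them to `out_path`."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise an error on a 404, 500, etc.
    rows = parse_rows(response.text)
    # newline="" is what the csv docs recommend, to avoid blank lines on Windows.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

To try it, call `scrape_to_csv("https://example.com/data.html", "output.csv")` with the real page's address in place of the placeholder URL. The parsing lives in its own function for a reason: you can save a copy of the page once and experiment with `parse_rows` on it. That way you avoid requesting the live site over and over.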

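The before/after technique from item 4 can also be expressed in a few lines of Python. Find a marker string that always appears just before the value you want, then a marker that appears just after it, and keep whatever sits between them. This is a minimal sketch of the idea using plain string searching, not a description of how OutWit Hub works internally. The function name `extract_between` and the sample markup are hypothetical.

```python
def extract_between(text, before, after):
    """Return every substring of `text` that sits between `before` and `after`.

    Scans left to right: find the next `before` marker, then the first
    `after` marker following it, and keep the (trimmed) text in between.
    """
    results = []
    start = 0
    while True:
        i = text.find(before, start)
        if i == -1:
            break  # no more "before" markers
        i += len(before)  # the value starts right after the marker
        j = text.find(after, i)
        if j == -1:
            break  # a "before" marker with no matching "after"; stop scanning
        results.append(text[i:j].strip())
        start = j + len(after)  # resume searching after the "after" marker
    return results


# Hypothetical page fragment: each price sits between '<span class="price">' and '</span>'.
page = '<li>Tea <span class="price">$2</span></li><li>Coffee <span class="price">$3</span></li>'
print(extract_between(page, '<span class="price">', "</span>"))  # → ['$2', '$3']
```

Choosing good markers is the hard part. They need to be specific enough to appear only around the values you want, yet stable enough to show up identically for every record on the page.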
Assignments (due 7/23)

  1. Finish the scraping-without-programming tutorial from the blog.
  2. Readings:
    1. Role of visualization in finding story in data – http://datajournalismhandbook.org/1.0/en/understanding_data_7.html
    2. Directory of visualization types – http://guides.library.duke.edu/vis_types
    3. Visual math mistakes – https://eagereyes.org/criticism/visual-math-wrong
    4. Stacked area chart vs. line chart – http://vizwiz.blogspot.com/2012/10/stacked-area-chart-vs-line-chart-great.html
    5. Why data viz matters – http://www.mulinblog.com/data-visualization-matters/
    6. Principles of visual design – http://webstyleguide.com/wsg3/7-page-design/4-visual-design-principles.html
    7. History of visualization – http://data-art.net/resources/history_of_vis.php
    8. Importance of white space – https://hackdesign.org/lessons/18
  3. Submit final project idea by next week.