6/4 Agenda; 6/9 (Last Day!) Assignments


  1. Last data stories; Flowing Data/Source review
  2. Discuss final project details
    1. Tagging
    2. Splash image
    3. How presentations will work
  3. Go over how addresses work with Mapbox map
    1. Use this site for geocoding: http://geocod.io/
  4. Go over text readings
    1. Word clouds: Case for avoiding them – http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/
    2. Intro to “text mining” – http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/
  5. Explore text tools:
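Word clouds, the subject of the first reading, rest on the most basic text-mining step: counting how often each word appears. Here is a minimal Ruby sketch of that count (Ruby because it's what we use for scraping in this class). The stopword list is a small made-up example, not a standard list, and real text tools handle punctuation, plurals and phrases far more carefully.

```ruby
# A small, illustrative stopword list (not a standard one).
STOPWORDS = %w[the a an and of to in is it that for on with as].freeze

# Count how often each word appears, skipping stopwords, and return
# [word, count] pairs sorted by count (ties broken alphabetically).
def word_frequencies(text, stopwords: STOPWORDS)
  counts = Hash.new(0)
  # Lowercase, then pull out runs of letters, keeping apostrophes inside words.
  text.downcase.scan(/[a-z]+(?:'[a-z]+)*/) do |word|
    counts[word] += 1 unless stopwords.include?(word)
  end
  counts.sort_by { |word, count| [-count, word] }
end

word_frequencies("The data is the story. Data without context is just data.")
  .first(3)
  .each { |word, count| puts "#{word}: #{count}" }
```

On that sample sentence, "data" comes out on top with 3 uses. Notice that the counts alone say nothing about how the word was used, which is part of the reading's case against word clouds.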


  1. Based on what you get into the WordPress system today, you will hear back from me over the weekend (hopefully Saturday, but possibly early Sunday) with edits and with comments on how everything is displaying on WordPress.
    1. Text and embedded visualizations should all be published on the Medill DC site after my final approval. Every story should have a cover image with a strong visual and should be tagged with the data_projects topic.
    2. Memo describing your decisions, as outlined in the syllabus
    3. Be prepared to give a 5-10 minute presentation describing your work and decisions, as outlined in the syllabus
    4. Your second critique should be handed in, if it hasn’t been already.
    5. Congrats!

5/28 Agenda; 6/4 Assignments


  1. Go over data stories
  2. Go over Flowing Data (Source is lacking this week)
  3. Finish choropleth map from last week
    1. Get coloring working…
      1. Download QGIS, merge the shapefile data with your original CSV, and refer back to the merged file in Tilemill
    2. Successfully upload as we did before, but…
      1. Go to the Data section on the Mapbox site
      2. Click to get details
      3. Create new project
      4. Add title/description
      5. Send me share link
  4. Make address-driven map
    1. Geocode addresses through Google
    2. Import the CSV with latitude/longitude columns back into Tilemill
    3. Style points (marker-width, marker-color attributes)
  5. Individual time to work on maps and final projects, I will meet with people individually for help
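Before coloring a choropleth, the column you join on has to match in both files, and that is where most merges quietly fail. Whether you merge in QGIS or in Google Fusion Tables, a quick check like this Ruby sketch lists the keys that didn't line up on either side. It is not how QGIS performs the join; it uses only Ruby's standard CSV library, and the file contents and column names in the example are assumptions for illustration.

```ruby
require "csv"

# Join rows from a shapes attribute table to rows from your own data CSV,
# matching on one key column in each. Keys are compared after trimming
# whitespace and lowercasing, so "Indiana " and "indiana" still match.
# Returns the merged rows plus the keys that failed to match on either side.
def join_on_key(shapes_csv, data_csv, shape_key:, data_key:)
  normalize = ->(value) { value.to_s.strip.downcase }
  shapes = CSV.parse(shapes_csv, headers: true)
  data = CSV.parse(data_csv, headers: true)

  # Index your data rows by normalized key for quick lookup.
  data_by_key = {}
  data.each { |row| data_by_key[normalize.call(row[data_key])] = row.to_h }

  merged = []
  unmatched_shapes = []
  shapes.each do |row|
    match = data_by_key[normalize.call(row[shape_key])]
    if match
      merged << row.to_h.merge(match)
    else
      unmatched_shapes << row[shape_key]
    end
  end

  # Data rows whose key never appeared in the shapes table.
  shape_keys = shapes.map { |row| normalize.call(row[shape_key]) }
  unmatched_data = data.map { |row| row[data_key] }
                       .reject { |key| shape_keys.include?(normalize.call(key)) }

  [merged, unmatched_shapes, unmatched_data]
end
```

For example, if your data says "Washington, D.C." but the shapes file says "District of Columbia", calling join_on_key(shapes, data, shape_key: "NAME", data_key: "state") reports both names as unmatched, so you can fix the spelling in your spreadsheet before merging again.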


  1. Second draft of final project due, with visualization. It should be on medilldc.net by this point.
  2. Complete and hand in both Mapbox maps.
  3. Refresh yourself on the readings (we won’t use them this week; the text viz class is pushed to next week)
    1. http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/
    2. http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/

5/21 Agenda; 5/28 Assignments


  1. Data stories
  2. Flowing Data/Source review
  3. Go over readings
    1. http://onlinejournalismblog.com/2013/09/16/ethics-in-data-journalism-privacy-user-data-collaboration-and-the-clash-of-codes/
    2. http://www.ericson.net/content/2011/10/when-maps-shouldnt-be-maps/
  4. Make choropleth map with shapes
    1. Find shapes to match up
    2. Confirm columns match
    3. Merge data in Google Fusion Tables – State shapes are here: https://www.google.com/fusiontables/data?docid=17aT9Ud-YnGiXdXEJUyycH2ocUqreOeKGbzCkUw#map:id=3
    4. Create Mapbox account
    5. Get Tilemill/hook up account – https://www.mapbox.com/tilemill/
      1. If not working, try this
    6. Load shapes
    7. Customize colors
    8. Customize popup
  5. Make point map with addresses
    1. Format spreadsheet
    2. Geocode addresses
    3. Load CSV into Tilemill
    4. Color
    5. Popups
    6. Discuss styling of background
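Geocoded spreadsheets often come back with a few blank or out-of-range coordinates, and those rows either break the import or put points in the wrong place. This Ruby sketch keeps rows with usable coordinates and sets the rest aside. The input column names ("Latitude", "Longitude") are an assumption about what your geocoder returns, and the plain latitude/longitude output headers are an assumption about what Tilemill will recognize; check both against your own files.

```ruby
require "csv"

# Keep rows whose coordinates are real numbers in valid ranges and rename the
# coordinate columns to plain "latitude"/"longitude". The input column names
# are assumptions about your geocoder's output; pass your own if they differ.
def clean_points(input_csv, lat_col: "Latitude", lon_col: "Longitude")
  kept = []
  rejected = []
  CSV.parse(input_csv, headers: true).each do |row|
    lat = Float(row[lat_col].to_s.strip, exception: false)
    lon = Float(row[lon_col].to_s.strip, exception: false)
    if lat && lon && lat.between?(-90, 90) && lon.between?(-180, 180)
      rest = row.to_h.reject { |header, _| [lat_col, lon_col].include?(header) }
      kept << rest.merge("latitude" => lat, "longitude" => lon)
    else
      rejected << row.to_h # blank or impossible coordinates: fix these by hand
    end
  end

  headers = kept.flat_map(&:keys).uniq
  output = CSV.generate do |csv|
    csv << headers
    kept.each { |point| csv << point.values_at(*headers) }
  end
  [output, rejected]
end
```

clean_points returns the cleaned CSV text, which you could save with File.write("points.csv", output) (a hypothetical filename), plus the rejected rows, which you should geocode again or fix by hand.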


  1. Complete rough draft of project – text and visualization – must be handed in by next class
  2. Hand in two completed Mapbox maps, which we started in class
  3. Read:
    1. http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/
    2. http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/

Final Project Details

  • The final project is 30% of your final grade.
  • The final project includes a text story, an interactive visualization using techniques we’ve learned in class, a memo explaining the choices behind the final project and an oral presentation discussing your work.
  • On Thurs., May 28, I expect you to turn in a rough draft you are happy with for both the text and visual components of your story, based on what we’ve discussed in individual meetings. The memos explaining your work are NOT due at this point, and I don’t expect anyone to have edited the story yet.
  • You will receive edits from me by Sat., May 30. By the last class, Thurs., June 4, those edits should be implemented, your entire project should be uploaded to the medilldc.net WordPress site, and it should have gone through a final review edit (I believe with a writing professor, but possibly with me again; I’ll clarify that as soon as I can). These projects will be uploaded into the normal system, but you will have to check a special category for our class so that all of this class’s stories show up by clicking on one tab on the website. By final presentations, these stories should all be live and final on the site. By 6 p.m. on June 4, I expect a copy of your memo detailing your work in my email, and you will present your work; the presentation is graded as well. After your presentation, be prepared for me or your classmates to follow up with questions.
  • Also, on Thurs., June 4 (our last class), your second critique of a professional data visualization or story is due.
  • The grading breakdown for the project is as follows:
    • Quality of visuals (20%)
      Are your visuals easy to understand? Are they interesting and enlightening? The visuals should be professional and polished.
    • News value (20%)
      Is your story relevant to your beat? Is it an original story idea? This is also where you will score points on good reporting and writing that complement and explain the visuals.
    • Text story (20%)
      Storytelling in words that makes good use of the data you have found, analyzed with techniques you learned in class to back up your story. The story should have a strong narrative throughout and use at least 2 human sources.
    • Appropriate use of visuals (15%)
      Is it a story that is strengthened by your choice of visuals? Did you choose the right visuals to tell the story?
    • Design (10%)
      Does your story have appropriate fonts, colors, alignment and hierarchy? Is there a clear sense of order on the page?
    • Presentation (10%)
      On the last day of class, you will be expected to give a 5-10 minute presentation to the rest of the class on why you chose your topic, how you found/cleaned your data, how you turned it into a story and why you made the visual storytelling decisions that you did. Let the memo you need to write (details below) serve as a guide. The presentation should show what you learned, explain the reasoning behind your decisions, and be clear enough that others can learn from your lessons.
    • Memo (5%)
      A 1-3 page memo detailing why you did the following:

      1. Picked the story you did. What is interesting or newsworthy about it?

      2. Which columns of data did you use for the text components? Which for the interactive? Why?

      3. What human sources did you use? What category do they fall into (expert, person on the street, etc.)? Why did you pick that category and that specific person? Is there another “side” of the story you would have liked to cover, or made sure to pay particular attention to?

      4. What techniques did you use for your analysis? Sorting/filtering/grouping? What questions did your data help you to answer? Did those answers surprise you?

      5. What story form did you use for the visual components of your story? What made you decide to use that type of chart or visualization?

      6. What colors did you use in your visualization? Do they reflect sequential, diverging or categorical data? Why did you choose those types of colors?

      7. Defend choices for your axes, if you have any charts. What are the minimum and maximum numbers on your scale? Do you feel they paint a complete picture of the story? Are your axes labeled? If so, what choices are behind those names? If not, why aren’t they labeled?

      8. What shapes did you use to mark different points in your visualization (bars, lines, map markers, etc.)? Why?

      9. What other customization did you add to your interactive component? What were your decisions behind making those choices?

      10. What decisions did you make about integrating the text and visual components of your story? Why?

      11. What do you hope a user gains by reading/interacting with your story? What should he/she learn?

      12. What have you learned, from the content and the experience of putting this together, that you hope to apply to your future work?

5/14 Agenda; 5/21 Assignments

5/14 Agenda

  1. Review data stories spreadsheet
  2. Review Flowing Data/Source blogs
  3. Go over readings for this week:
    1. Data viz tips – http://guides.library.duke.edu/topten
    2. Data art vs. data visualization – http://www.perceptualedge.com/blog/?p=1245
  4. Discuss final project requirements – again
  5. Discuss Tableau – when is it a better alternative to DataWrapper?
  6. How do I decide what to visualize?
  7. How and when do I combine chart types to tell a better story?
  8. What is the role of chart types when integrated with a text story?
  9. Talk through additional types of data visualization: http://guides.library.duke.edu/vis_types
  10. More in-depth look at other types of Tableau charts: http://www.tableau.com/sites/default/files/media/which_chart_v6_final_0.pdf
  11. Classwork
    1. Make a Tableau chart that isn’t a bar, line or map – play with the alternatives.
    2. Write a super short story (fewer than 500 words) about your findings, and write around the graphic in the way we discussed. You can include more than one chart if you want.

5/21 Assignments

  1. Read these posts:
    1. Ethics of privacy in data journalism – http://onlinejournalismblog.com/2013/09/16/ethics-in-data-journalism-privacy-user-data-collaboration-and-the-clash-of-codes/
    2. When Maps Shouldn’t Be Maps – http://www.ericson.net/content/2011/10/when-maps-shouldnt-be-maps/
  2. Complete in-class exercise
  3. Bring in two data sets to map:
    1. One should feature specific addresses/locations
    2. One should feature numbers categorized by either specific countries or states

5/7 Agenda; 5/14 Assignments


  1. Review data stories in spreadsheet
  2. Go over Flowing Data/Source blog readings for this week
  3. Complete Ruby scraping exercise from Tuesday
  4. Go over readings from Tuesday
  5. Basics of Tableau and how it differs from DataWrapper
  6. Talk a bit about timelines as a final project option
  7. Overall comments on the final project memos
  8. Classwork/Individual memo review:
    1. I’ll meet with each of you individually outside the classroom. While I’m doing that, please complete the following:
      1. Use Tableau to make a chart based on a data set of your choice, using rules we’ve set out in previous classes about careful color choices, axis labeling, etc. Send me the completed chart.
      2. Dip your toe into mapping by following this tutorial and send me the result: http://www.peteraldhous.com/CAR/tableau_demo.pdf
      3. If you finish before class time is over, come find me during the individual student meetings and let me know.


  1. Readings:
    1. Data viz tips – http://guides.library.duke.edu/topten
    2. Data art vs. data visualization – http://www.perceptualedge.com/blog/?p=1245
  2. Work on final project. Written responses to questions we lay out during individual meetings will be due (we’ll discuss next steps at our one-on-one meetings).

5/5 Agenda; 5/7 Assignments


  1. What is Web scraping?
    1. http://blog.screen-scraper.com/2008/04/21/screening-scraping-ethics/
    2. http://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
  2. Scrape a website w/out programming
    1. Download data set and Scraper extension: Dataset | Scraper
      1. Learn how to make a list into a spreadsheet
    2. Go over the parts of a webpage and how they work together.
    3. Introduce import.io, a more flexible option, using the same data set.
    4. More advanced techniques with Outwit Hub, using same data set
      1. Pull specific links/images
      2. Use knowledge of how the Web works to write a more complex before/after scraper. – http://michelleminkoff.com/web-scraping-without-programming-nicar-2012-hands-on-tutorial/
  3. Scrape a website with programming: Dataset (We may hit all/part of this section on Thursday)
    1. Install Ruby. (If you are on a Mac, just open the Terminal and type irb. If you are on Windows, download this installer: http://dl.bintray.com/oneclick/rubyinstaller/rubyinstaller-2.2.2.exe)
    2. Install Nokogiri. At the command prompt, type “gem install nokogiri”
    3. Grab the file I created, and run it successfully: https://gist.github.com/michelleminkoff/7e934e88bf958496c10e
    4. Comment out/remove various parts of the file and talk about what makes it work.
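The core move in both the Scraper extension and the Nokogiri exercise is the same: find the repeating parts of the page (here, table rows) and turn each one into a spreadsheet row. This Ruby sketch shows that idea with REXML, which ships with Ruby, so no gem is needed. REXML only accepts well-formed markup, and real pages rarely are well-formed, which is why the class exercise uses Nokogiri instead; treat this as an illustration of the idea, not a replacement for the gist.

```ruby
require "csv"
require "rexml/document"

# Collect all text inside a node, including text inside nested tags like <a>.
def text_of(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value # value unescapes entities like &amp;
    when REXML::Element then text_of(child)
    else ""
    end
  end.join
end

# Turn every <tr> in the markup into one CSV row of its <th>/<td> cells.
def table_to_csv(html)
  doc = REXML::Document.new(html)
  CSV.generate do |csv|
    REXML::XPath.each(doc, "//tr") do |tr|
      cells = tr.elements.to_a.select { |el| %w[th td].include?(el.name) }
      csv << cells.map { |cell| text_of(cell).strip }
    end
  end
end
```

Calling table_to_csv on a small table with a header row (Name, Count) and one data row (Jones, 7) returns two CSV lines, Name,Count and Jones,7, ready to save as a .csv file and open in Excel.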


  1. This week is mostly readings:
    1. Ethics of scraping – http://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
    2. Getting data from the Web – challenges – http://datajournalismhandbook.org/1.0/en/getting_data_3.html
    3. Role of visualization in finding story in data – http://datajournalismhandbook.org/1.0/en/understanding_data_7.html
    4. Directory of visualization types – http://guides.library.duke.edu/vis_types
    5. Visual math mistakes – https://eagereyes.org/criticism/visual-math-wrong
    6. Stacked area chart vs. line chart – http://vizwiz.blogspot.com/2012/10/stacked-area-chart-vs-line-chart-great.html

4/23 Agenda and 4/30 Assignments

Agenda 4/23:

  1. Go over spreadsheet data stories
  2. Review Flowing Data/Source for this week
  3. Review articles we read
    1. Why Data Viz Matters – http://www.mulinblog.com/data-visualization-matters/
    2. Principles of viz design – http://webstyleguide.com/wsg3/7-page-design/4-visual-design-principles.html
    3. History of viz – http://data-art.net/resources/history_of_vis.php
  4. Importance of white space – https://hackdesign.org/lessons/18
  5. Visual hierarchy – http://blog.formedfunction.com/post/3029763425/on-visual-hierarchy
  6. WTF visualizations – what not to do – http://viz.wtf/
  7. Colorbrewer – http://colorbrewer2.org/
  8. Bar graph or line graph – https://datahero.com/blog/2013/08/06/line-or-bar-graph/
  9. Classwork
    1. Create account on DataWrapper – https://datawrapper.de
    2. Make practice bar graph together
    3. Make practice line graph together
    4. Use your own data set to do the following:
      1. Pick one trend from the data set you’ve been working with (can be from last week, or another you are playing with)
      2. Is it a bar or line graph?
      3. Make it in Datawrapper following ideas we’ve talked about
      4. Think about how the chart will be displayed, and mock up the layout of the graphic in an editing program on the computer (Photoshop, etc.), or draw it on a piece of paper. Hand in the chart as a screenshot (Command-Shift-4 on a Mac), or go to the Publish tab, print to PDF and send me the PDF.
      5. Write memo (at least 500 words) commenting on:
        1. Chart
          1. why you used a certain chart type
          2. color
          3. white space
          4. axis labels
        2. Overall layout
          1. What information you wanted to include
          2. Why you placed it where you did
          3. How you designated what info is most important

You can finish the chart, overall layout and memo either in class or at home, for next week.

Assignments 4/30:

  1. Finish classwork (see above)
  2. Hand in critique of a data story (see syllabus for more detail on this)
  3. Hand in formal memo about your final project (should be at least one page)
    1. What is the overall topic of your story?
    2. Why is this story important?
    3. What benefit does structured data bring to this story? What can numbers tell you that people cannot?
    4. Ideas for at least three people-sources (types of people, if not specific names) you can interview for your story. You will be required to have three sources besides the data in your final project as well. If these don’t pan out, that’s fine, but start thinking about it now.
    5. Three ideas of what a reader will learn from your story, and how it will impact them.
    6. What non-data, non-human interview research work you need to do to flesh out your idea. I imagine more will come up as you go, but explain where you plan to start.
    7. Another data source you wish you had, which would bring better context to your story.

4/16 Agenda and 4/23 Assignments

Agenda 4/16:

  1. Go over spreadsheet data stories
  2. Review Flowing Data/Source for this week
  3. Discuss how the homework went — what sites were most/least helpful?
  4. Discuss interviewing data article
  5. Useful types of data sets
  6. Converting PDF to spreadsheet – http://tabula.technology/
  7. Categorize our questions — are they:
    1. Calculation
    2. Sorting
    3. Filtering
    4. Other
  8. Review how to answer
    1. Calculation
    2. Sorting
    3. Filtering
    4. Other – you may need more info, may not be appropriate as a data question, may need a not-yet-covered bit of Excel
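Calculation, sorting and filtering map onto three operations you can do in Excel or in code. This Ruby sketch runs all three on a small table of hypothetical school enrollment numbers (made up for illustration): a calculated percent-change column, a filter on size, and a sort on the result. The Excel formula in the comment assumes 2014 is in column B and 2015 in column C.

```ruby
# Hypothetical enrollment numbers, standing in for a spreadsheet.
ROWS = [
  { school: "Adams", enrolled_2014: 400, enrolled_2015: 440 },
  { school: "Baker", enrolled_2014: 250, enrolled_2015: 230 },
  { school: "Clark", enrolled_2014: 90, enrolled_2015: 120 },
  { school: "Drew", enrolled_2014: 600, enrolled_2015: 690 }
].freeze

# Calculation: percent change, the same math as =(C2-B2)/B2*100 in Excel.
def percent_change(before, after)
  ((after - before) * 100.0 / before).round(1)
end

def analyze(rows, min_enrollment:)
  # Calculation: add a computed column to every row.
  with_change = rows.map do |row|
    row.merge(change: percent_change(row[:enrolled_2014], row[:enrolled_2015]))
  end
  # Filtering: keep only schools at or above a size cutoff.
  large = with_change.select { |row| row[:enrolled_2015] >= min_enrollment }
  # Sorting: biggest percent change first.
  large.sort_by { |row| -row[:change] }
end

analyze(ROWS, min_enrollment: 200).each do |row|
  puts "#{row[:school]}: #{row[:change]}%"
end
```

With a cutoff of 200 students, Clark drops out and the rest come back ordered Drew, Adams, Baker. In Excel, the same steps would be a formula filled down a new column, an AutoFilter, and Sort Largest to Smallest.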

Assignments 4/23:

  1. Hand in completed questions, with answers, about your data set. You may have completed this in class, or you might not be finished, in which case you’ll have a bit of work to do at home.
    1. Includes a spreadsheet of questions
    2. The data set you are using
    3. A memo explaining what process you used to arrive at each of those answers.
  2. Read the following, and be prepared to discuss:
    1. Why data viz matters – http://www.mulinblog.com/data-visualization-matters/
    2. Principles of visual design – http://webstyleguide.com/wsg3/7-page-design/4-visual-design-principles.html
    3. History of visualization – http://data-art.net/resources/history_of_vis.php
  3. Use what we discussed in class today to carefully consider your final project topic. Can you use any of these insights? Do you need more info? A different data set? Your project idea is due in 2 weeks. Use this process to help you arrive at a topic.

In-class assignment 2 for April 9

This exercise is designed to help you find data resources for stories related to a topic of interest.

  1. What topic do you want to search? Write 5 search terms related to your topic.
  2. Think about your topic and search the Data Hub, World Bank, United Nations and UK Data Archive (all referenced here: http://datajournalismhandbook.org/1.0/en/getting_data_0.html), as well as data.gov (which we didn’t talk about; I’m just looking for you to explore). Identify 2 data sources of interest from each of those sites, and record each one and why it is interesting (a total of 10).
  3. Use techniques we discussed in class today to find at least ten different interesting data sources on your beat. At least five should be a form of structured data (xls, csv, xlsx), one should be a pdf and two should be other types (listed below). Record how you accessed each source (what you entered into Google search) and write 2-3 sentences for each set describing what you found and how it might be useful. One search should be by site type, one by link, and at least five by filetype.


  • csv – comma-separated values
  • xls/xlsx – Excel
  • pdf – unstructured data
  • xml – structured data, not spreadsheet
  • ppt – internal presentations
  • kmz/kml/shp – geographic data
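When you log sources for this exercise, it can help to sort the links you find by the file types above and to keep your filetype: searches consistent. This Ruby sketch does both. The groupings mirror the list, and site: and filetype: are standard Google search operators; the topic and domains in the example are hypothetical.

```ruby
require "uri"

# File types from the list, grouped by how usable they are as data.
FILE_TYPES = {
  "csv" => "structured (spreadsheet)",
  "xls" => "structured (spreadsheet)",
  "xlsx" => "structured (spreadsheet)",
  "xml" => "structured, not spreadsheet",
  "pdf" => "unstructured",
  "ppt" => "internal presentation",
  "kmz" => "geographic",
  "kml" => "geographic",
  "shp" => "geographic"
}.freeze

# Classify a found link by the extension at the end of its path,
# ignoring any ?query string and letter case.
def classify(url)
  ext = File.extname(URI.parse(url).path).delete_prefix(".").downcase
  FILE_TYPES.fetch(ext, "unknown")
end

# Build one Google query per file type, optionally limited to a site or domain.
def search_queries(term, filetypes: %w[csv xls xlsx pdf], site: nil)
  filetypes.map do |ft|
    [term, "filetype:#{ft}", site && "site:#{site}"].compact.join(" ")
  end
end
```

For example, search_queries("school lunch", site: "gov") produces one query per file type, starting with school lunch filetype:csv site:gov, and classify tells you which of the resulting links count toward the structured-data requirement.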