Urban Data Science

A course on Geographic Data Science that shows how data can be used and misused, and how to critically evaluate datasets, models and questions that arise from them.

By Trivik Verma in education open-source

January 1, 0001

Thumbnail image is adapted from the work of Sean Perez for educational purposes only.

What is Urban Data Science?


Humans have long migrated to dense spatial agglomerations for social interaction and the exchange of goods, services, information and ideas. Globally, cities are expected to accommodate over 80% of the world population by 2050. This unprecedented growth is tightly linked to the most pressing functional and environmental challenges of our time.

Many cities around the world are on the brink of collapse. They suffer from poverty and segregation, excessive consumption, pollution and associated changes in climate, declining agricultural productivity and, in exceptional cases, the submergence of land. Lacking the resources to solve such problems, cities such as Accra in Ghana and Cairo in Egypt are directing further development to satellite centres. Indonesia has been planning to relocate its capital from Jakarta, on the island of Java, where more than half of the Indonesian population lives, to the rainforests of Borneo, in part because Jakarta is sinking and threatened by rising sea levels.

Rapidly urbanising cities are also making tremendous efforts to become smarter, more sustainable, more resilient and more inclusive. But how? As they have shifted from manufacturing to information economies over the last few decades, urban regions around the world have witnessed ever-increasing inequalities. Many low-income communities have suffered adverse social, economic and environmental consequences, while others have been pushed into deeper forms of inequality.

Urban Data

In the last decade, technological advances have led us to embed large-scale networked systems, sensors and computers into the built environment. Urban data has emerged as a near-continuous, real-time stream of information about many urban activities. The big data revolution, coupled with the capacity of infrastructure to be “smart”, has enticed cities and urban managers worldwide to adopt machine learning-based decision making in the hope of improving the course of humanity. Yet city planning has largely been organised around loosely coupled actors: municipal and regional governments, project developers, companies and investors, and transport, water and energy operators. While some communities have enjoyed the benefits of policies based on big data, machine learning and AI, many others have suffered disproportionately by being pushed to the physical and technological periphery of rapid urban development. As data scientists, and especially as engineers and policy analysts, we have a responsibility to interrogate the quality of data, research the design of intelligent systems and evaluate their impact on communities.

Course Contents — What is this class about?

The primary purpose of this course is to teach future data scientists to look beyond the technical power of artificial intelligence and to recognise both the possibilities and limitations of data and the spatial inequalities that can emerge as a result of data-driven policy. The course engages students at the intersection of data science, urbanisation and effective communication. By interrogating the sociotechnical nature of urban problems, students will learn to approach solutions to these problems in ways that prioritise social equity and justice.

This class will train students to gather, fuse and clean data from multiple sources in order to gain useful insight into problems in urban ecosystems, estimate the implications of alternative solutions, and communicate results effectively to a wide audience.
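The gather, fuse and clean steps can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch under loudly stated assumptions: the two CSV extracts, their column names (`neighbourhood`, `population`, `pm25`) and every value are invented for illustration, and the cleaning rules (trim and title-case names, drop missing or negative readings) are illustrative choices, not a prescribed method. It also records neighbourhoods with no valid sensor data, since gaps in coverage are one way datasets quietly leave communities out.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Optional

# Hypothetical extracts from two open data sources (made-up values for illustration).
POPULATION_CSV = """neighbourhood,population
Northside,12000
Riverside,8500
Old Town,15200
Eastgate,6400
"""

SENSOR_CSV = """sensor_id,neighbourhood,pm25
s1,northside ,12.5
s2,Northside,14.1
s3,Riverside,NA
s4, old town,22.0
s5,Old Town,-1
s6,Old Town,19.8
"""


def normalise_name(name: str) -> str:
    """Collapse whitespace and title-case a name so both sources use the same key."""
    return " ".join(name.split()).title()


def parse_reading(raw: str) -> Optional[float]:
    """Return a usable PM2.5 reading, or None for missing or impossible (negative) values."""
    try:
        value = float(raw)
    except ValueError:
        return None  # e.g. "NA"
    return value if value >= 0 else None  # also rejects NaN, since NaN >= 0 is False


def fuse(population_csv: str, sensor_csv: str) -> tuple[dict[str, dict], list[str]]:
    # Scrub: group valid sensor readings by normalised neighbourhood name.
    readings: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sensor_csv)):
        value = parse_reading(row["pm25"])
        if value is not None:
            readings[normalise_name(row["neighbourhood"])].append(value)

    # Fuse: attach the mean reading to each neighbourhood's population record.
    fused: dict[str, dict] = {}
    uncovered: list[str] = []
    for row in csv.DictReader(io.StringIO(population_csv)):
        name = normalise_name(row["neighbourhood"])
        if readings.get(name):  # .get does not create keys in a defaultdict
            fused[name] = {
                "population": int(row["population"]),
                "mean_pm25": round(mean(readings[name]), 2),
                "n_readings": len(readings[name]),
            }
        else:
            # Track places the sensor network does not (validly) cover.
            uncovered.append(name)
    return fused, uncovered


if __name__ == "__main__":
    fused, uncovered = fuse(POPULATION_CSV, SENSOR_CSV)
    print(fused)
    print("No valid sensor data for:", uncovered)
```

With these made-up inputs, Riverside and Eastgate end up with no air-quality estimate at all; an inner join would silently drop them, which is why the sketch returns them explicitly.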

The course is divided into five major modules, each focusing on crucial steps in the lifecycle of a data science project.

  1. Obtain: Obtaining data from multiple open data sources.
  2. Scrub: Data cleaning, munging and sampling to consolidate all information into a dataset that is manageable, informative and relates to your problem.
  3. Explore: Exploratory data analysis to make sense of the data.
  4. Model: Estimation and modelling based on statistical tools such as regression and clustering.
  5. Interpret: Communicating results and reflections through visualisation, storytelling and interpretable summaries.

Pedagogical Goals

After completion of this course, you will be able to:

  • identify, interpret and discuss data sources that are usable and relevant for a given problem.
  • manipulate data and consolidate all information into a dataset that is manageable, informative and relates to your problem.
  • describe and analyse the consolidated dataset(s) to support your problem with evidence.
  • apply statistical and machine learning models to draw inferences in the process of turning data into valuable information.
  • report results and reflections through visualisation, storytelling and interpretable summaries, especially when faced with a new dataset.

… and hopefully get great data-driven policy jobs in the future where you can address issues of equity, or go on adventurous travels with an open mind.