[Project] Air pollution analysis


Would you like to feel like a real scientist discovering regularities in large data sets, formulating and testing hypotheses? Do you like to search the Internet for information, connect different facts and draw conclusions? If so, this task is for you: you will have an opportunity to analyze real data and identify how it depends on geographical and meteorological parameters.

The students will get familiar with the air pollution problem, air quality monitoring initiatives and basic geographical and weather factors which influence air quality. Working on the problem will allow the students to demonstrate their analytical skills and practice typical research process steps: understanding and interpreting data, making assumptions and testing hypotheses, generating observations and formulating conclusions. Finally, the analysis will strongly foster good teamwork, brainstorming, and extensive communication within the team.


Hello everyone!

My name is Agnieszka Bukowicka and I’m 17 years old. I am from Warsaw (Poland). I would like to take part in this project because my country is one of the most polluted (so it gives me lots of sources of information). I’d love to join an international group to work on this activity.

Thank you,


Hello Agnieszka, my name is Maria Jose and I’m 17 years old too. I am from Albacete (Spain). I like this project, so I would like to cooperate with you.


Oh perfect, I’m so glad! So we need only one more person.


Below are the main questions and topics discussed during the coaching session on March 22nd, 2018 on the topic: Air Pollution Analysis

Q1. Is there any length limit for the project?
Answer: There’s generally no limit. You needn’t worry about the project being too short, because the organizers are well aware that the time given is relatively short.

Q2. Are many charts required?
Answer: Yes, charts can be included in the solution with appropriate comments attached that back up the hypothesis stated at the beginning.

Q3. May the analysis also use data which is not openly available (e.g. data sent on request)?
Answer: Yes. It is generally recommended to use openly accessible data; however, as it turned out, in many cases some kinds of data, or historical data, are not published. So if you manage to get this data by contacting the data provider, it is OK.

Tips regarding the data preprocessing
There are many tools that can be used for data preparation and analysis. One of them is Notepad++, which amongst many powerful features offers:
a) Regular expression support, which allows searching for data patterns and replacing them with other strings
b) Column mode (Alt + mouse drag, or choosing the appropriate option from the menu) – copying, pasting, deleting, and selecting columns in txt files.
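The same two preprocessing steps can also be scripted, which may help when a file is too large to edit by hand. The sketch below is only an illustration in Python: the sample lines, the semicolon-separated layout, the dd.mm.yyyy dates, and the helper names clean_line and select_chars are all assumptions made up for this example, not a format used by any particular data provider.

```python
import re

# Hypothetical raw export: semicolon-separated fields, dd.mm.yyyy dates,
# decimal commas. Real files from a data provider may look different.
raw_lines = [
    "22.03.2018;Warsaw;PM10;41,5",
    "23.03.2018;Warsaw;PM10;38,0",
]

# Pattern for a dd.mm.yyyy date and for a comma between two digits.
DATE_DMY = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")
DECIMAL_COMMA = re.compile(r"(\d),(\d)")


def clean_line(line: str) -> str:
    """Regex search-and-replace, similar to Replace in a text editor."""
    line = DATE_DMY.sub(r"\3-\2-\1", line)    # 22.03.2018 -> 2018-03-22
    line = DECIMAL_COMMA.sub(r"\1.\2", line)  # 41,5 -> 41.5
    return line.replace(";", ",")             # switch to comma separators


def select_chars(lines: list[str], start: int, end: int) -> list[str]:
    """Take the same character range from every line, like a column selection."""
    return [line[start:end] for line in lines]


cleaned = [clean_line(line) for line in raw_lines]
dates = select_chars(cleaned, 0, 10)

print(cleaned[0])  # 2018-03-22,Warsaw,PM10,41.5
print(dates)       # ['2018-03-22', '2018-03-23']
```

Note that the character-range selection only works when the chosen column has the same width in every line, which is also the case where column mode in an editor is most useful.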