Creating the Youth Policy Fact Sheets: A technical perspective on the road to ninja

We started building the youthpolicy.org website using WordPress, a free and open-source blogging tool and content management system based on PHP and MySQL. The advantages were enormous: not only could we bootstrap our online presence literally within weeks and start publishing content, we also had a vibrant community at our disposal, willing to communicate, help each other and share their work (such as plugins, designs and advice). youthpolicy.org is a real open-source success story - thanks to the community around WordPress. And with this, youthpolicy.org was born!

As our content, audience and reach grew, so did our ambitions. It was clear to us that a blogging site was no longer enough: we needed the site to become a real hub of information about youth policy - a place to keep all the global evidence we were building.

The development of the Youth Policy Library - for archiving documents around youth, policy, research, and publications - was our first step. But again, our ambitions were larger. We needed a concept that would reflect the valuable research our team were conducting: in other words, some kind of data-container that would include a pre-defined set of data about youth in the world. In layman’s terms, we wanted to find a way to not just store our data about the situation of young people in the world, but also present it in a way that was accessible to the public, usable for our team, and that looked brilliant. This is how the Youth Policy Fact Sheets came to life.

Today, we have 196 Fact Sheets online, covering every UN member country. These provide snapshots of youth policy, legislation, participation, age classifications, and the economic and political life for youth. They are a useful summary of the situation of young people, combining existing data, such as indicators and indices, and original research such as legislation scanning and age definitions. For more details see the Fact Sheets home page, and our summary blog post.

The difficulties in data collection and research have been covered previously in our articles about the Fact Sheets, and this article will focus on the challenges involved in the development of their online presence, the derived visualisations and the sustainability of the data that make up the Fact Sheets. This article is aimed at the technology community, and is written for those interested in creating digital spaces and systems for storing and visualising data. So here is our warning: This article comes with tech speak! But we want to get this message out - without simplifying it down to something meaningless - to those who are designing the way in which data is presented. You can’t say you were not warned…

The back-end

In WordPress terms, we created a custom post type factsheet and assigned a series of custom fields to it, one for every figure or insight we wanted to show about a country. Because we knew there would be a maximum of 196 Fact Sheets (one for each UN member country), we were rather generous with the number of custom fields assigned to each post. The scalability of our database therefore mattered less. It is important to note, however, that if our requirements had gone beyond a few hundred posts, the number of custom fields on every Fact Sheet would have been fatal [1].
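For readers who expect to outgrow a few hundred posts, the alternative described in footnote [1] could look roughly like the sketch below. It is plain PHP and not part of our code base: the helper names pack_factsheet_meta() and unpack_factsheet_meta() are made up for this illustration, and the field values are dummy data.

```php
<?php
// Hypothetical helpers (not from the youthpolicy.org code base):
// pack all figures of one Fact Sheet into a single JSON string, so that
// one custom field (and one database read) can hold them all.
function pack_factsheet_meta(array $fields): string
{
    $json = json_encode($fields);
    if ($json === false) {
        throw new InvalidArgumentException('Fields could not be encoded as JSON');
    }
    return $json;
}

// Turn the stored JSON string back into an associative array.
// Anything that is not valid JSON yields an empty array.
function unpack_factsheet_meta(string $json): array
{
    $fields = json_decode($json, true);
    return is_array($fields) ? $fields : array();
}

// Dummy data, only used for this example.
$packed = pack_factsheet_meta(array(
    'country_code'       => 'ARM',
    'unemployment_total' => '13.53',
));

// In WordPress, $packed would be stored with update_post_meta() and read
// back with get_post_meta(); here we simply round-trip it in memory.
$fields = unpack_factsheet_meta($packed);
echo $fields['unemployment_total'], "\n";
```

The trade-off is that individual figures can no longer be queried directly in the database, which is why this only pays off once the number of posts really grows.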

The front-end

With the skeleton of the Fact Sheet in place via our beloved WordPress data structures (custom post types and custom fields), our designer and user interface expert Bowe worked - very impressively - to find the right way of presenting this data on the front-end. The single page template of the Fact Sheets is a gem. The layout is responsive on every screen size, sidebars separate certain datasets from others, and the overall alignment resembles a concise piece of paper, as if it were a cheat-sheet for youth researchers.

Sustainability of the data

All of the initial Fact Sheet data was entered manually via a gravity-form, after the content was finalised in a Word template. Given the 90 custom fields, you can imagine the size of the resulting gravity-form. Much of the initial research was based on external datasets (e.g. from the World Bank and UNESCO), and so our researchers were able to confirm that the data would be suitable for the analysis we wanted to perform. Every country required significant research anyway, and so the time spent on entering the data into the gravity-form was marginal compared to the wider research process.

Once the research was complete and the Fact Sheets were launched, our attention turned to the need to keep the data up-to-date. Unfortunately, we realised that our original design would not be sustainable. Imagine the scenario: unemployment datasets from the World Bank change once a year for every country, and someone has to go into 196 gravity forms to change one field. Then do this for GDP, HDI, GINI, YDI, literacy rate, tobacco use, and school enrolment - all at different times, with multiple changes in one year. A pretty tedious job, right?

To avoid this nightmare, we started to look at possible ways to synchronise CSV (Comma Separated Values) files with our database - CSV being the format in which most external data providers make their data available.

The key to the whole CSV synchronisation lies in the native PHP functions fgetcsv() and array_combine(). With them, we were able to transpose the whole content of a CSV into an array of arrays of key-value pairs, with the CSV headers as keys and the CSV column data as values.

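//initialize an empty array that will hold one associative array per CSV row
$csvMasterArray = array();

//define the location of the CSV
$csv_url = "static/unemployment.csv";

//open the csv and do some basic error handling
if (($handle = fopen($csv_url, 'r')) === false) {
    die('Error opening file');
}

//get the first row of the CSV: the column headers
$headers = fgetcsv($handle, 5000, ',');
if ($headers === false) {
    die('Error reading CSV headers');
}

//for each remaining row, use the headers as keys and the row's columns as values
while (($row = fgetcsv($handle, 5000, ',')) !== false) {

    //skip blank or malformed rows: array_combine() needs two arrays of equal size
    if (count($row) !== count($headers)) {
        continue;
    }

    $csvMasterArray[] = array_combine($headers, $row);

}

fclose($handle);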

The result of these operations can be exemplified as follows:

//given an example comma separated values file containing a header row, 2 data rows and 5 columns:
//this is dummy data only used for this example!

country_name,country_code,unemployment_male,unemployment_female,unemployment_total
Afghanistan,AFG,18.79,18.39,16.39
Armenia,ARM,16.79,12.25,13.53
...

//the outputted PHP array of arrays looks like this:

Array (
  [0]=> Array (
          [country_name] => Afghanistan
          [country_code] => AFG
          [unemployment_male] => 18.79
          [unemployment_female] => 18.39
          [unemployment_total] => 16.39
        )
  [1]=> Array (
          [country_name] => Armenia
          [country_code] => ARM
          [unemployment_male] => 16.79
          [unemployment_female] => 12.25
          [unemployment_total] => 13.53
        )
  ...
)

//perfectly suitable for assigning the values to whatever container needed

The rest of the operations were done using WordPress functions (or in WordPress terminology, “template tags”), looping through all the Fact Sheets and assigning the values of the rows of $csvMasterArray to the corresponding custom fields. Crucial to this operation is that every Fact Sheet is uniquely linked to the corresponding country row in the CSV. To achieve this, we took the World Bank’s ISO country codes and gave each post of our custom post type factsheet a custom field country_code.
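The matching step can be sketched in plain PHP as follows. This is an illustration under assumptions, not our production code: the function names index_rows_by_country() and plan_meta_updates() are invented, the post IDs are made up, and the WordPress calls are only mentioned in comments so that the snippet runs on its own.

```php
<?php
// Index the CSV rows by their country code for direct lookup.
function index_rows_by_country(array $csvMasterArray): array
{
    $byCode = array();
    foreach ($csvMasterArray as $row) {
        if (!empty($row['country_code'])) {
            $byCode[$row['country_code']] = $row;
        }
    }
    return $byCode;
}

// $factSheets maps a post ID to the value of that post's country_code field.
// Returns post ID => (custom field name => value). In WordPress, each pair
// would then be written with update_post_meta($postId, $field, $value).
function plan_meta_updates(array $factSheets, array $csvMasterArray, array $columns): array
{
    $byCode  = index_rows_by_country($csvMasterArray);
    $updates = array();
    foreach ($factSheets as $postId => $countryCode) {
        if (!isset($byCode[$countryCode])) {
            continue; // no CSV row for this country: leave the post untouched
        }
        foreach ($columns as $column) {
            if (isset($byCode[$countryCode][$column])) {
                $updates[$postId][$column] = $byCode[$countryCode][$column];
            }
        }
    }
    return $updates;
}

// Dummy data, only used for this example.
$csvMasterArray = array(
    array('country_name' => 'Afghanistan', 'country_code' => 'AFG', 'unemployment_total' => '16.39'),
    array('country_name' => 'Armenia', 'country_code' => 'ARM', 'unemployment_total' => '13.53'),
);
$factSheets = array(101 => 'ARM', 102 => 'AFG', 103 => 'XXX'); // made-up post IDs

$updates = plan_meta_updates($factSheets, $csvMasterArray, array('unemployment_total'));
print_r($updates);
```

Separating the "what should change" step from the actual database writes also makes it easy to log or review the planned updates before anything is saved.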

The synchronisation process was then enhanced to include datasets from multiple sources. As it stands now, the basic rule is that everything that does not need human composition or formulation should be synchronised through the CSVs. Every custom field that is populated by a CSV was given the prefix “csv_”, so that in the back-end of our WordPress, team members can clearly identify those fields that do not need any human input. For even better usability and error avoidance, the developer could hide the custom fields starting with “csv_” completely from the back-end.
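A minimal sketch of the prefix convention, again in plain PHP with invented helper names (csv_field_name(), is_csv_managed(), editable_fields()) and dummy field names:

```php
<?php
// The naming convention described above: CSV-populated fields start with "csv_".
const CSV_FIELD_PREFIX = 'csv_';

// Hypothetical helper: derive the custom field name for a CSV column.
function csv_field_name(string $column): string
{
    return CSV_FIELD_PREFIX . $column;
}

// Hypothetical helper: is this custom field owned by the CSV synchronisation?
function is_csv_managed(string $fieldName): bool
{
    return strncmp($fieldName, CSV_FIELD_PREFIX, strlen(CSV_FIELD_PREFIX)) === 0;
}

// Keep only the fields a human is supposed to edit.
function editable_fields(array $fields): array
{
    return array_filter(
        $fields,
        function ($name) {
            return !is_csv_managed((string) $name);
        },
        ARRAY_FILTER_USE_KEY
    );
}

// Dummy data, only used for this example.
$fields = array(
    csv_field_name('unemployment_total') => '13.53',
    'policy_summary'                     => 'Text written by a researcher',
);
print_r(array_keys(editable_fields($fields)));
```

To actually hide such fields in the WordPress back-end, a check like is_csv_managed() could probably be hooked into the is_protected_meta filter; that wiring is an assumption on our side and is not shown here.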

As part of this, we even increased the number of custom fields to be able to do more granular abstractions of the data. For example, information like the data source and the publication year, which originally sat in the same custom field and could therefore only be visualised together, can now be separated in any possible way, creating the basis for more flexible representations of the data.

This concept was applied to a fair number of custom fields, increasing the total to 128 but decreasing the ones that need human editing to only 32. This means that the gravity form shrank from 90 input fields to just 32, saving our researchers quite a few headaches when having to update a Fact Sheet! Most of these 32 input fields hold texts and original research conducted by the Youth Policy Labs team, and are therefore still edited with the gravity-form.

To further make life simpler for our team, a plugin was developed. This included a page within the WordPress back-end, where anyone - without the need for higher technical knowledge - can upload and synchronise CSVs.

The Fact Sheets as overall data-container

When thinking of the first version of our Fact Sheets, we considered the front-end representation to be an exact mirror of the data sitting in the back-end. As the project grew, we realised that we should not be bound to this design and that there was space to let our imaginations go a little wild…

  • How could we integrate other country-related information on youthpolicy.org into the Fact Sheets?
  • How could we present the data in a way that was accessible and usable, both at a glance and for rigorous exploration?

We could either start all over again, or use the data-containers that were already serving the Fact Sheets.

The overview of national youth policies is a good example. This page is built with data from the Fact Sheets. Instead of having to update two parts of our website when a country launches a new youth policy, we only had to ensure that the pages stay in sync and find ways to visualise the same data in multiple parts of our website.

The second question will be answered in the following section.

The Fact Sheets as source for data visualisations

When researching for the Fact Sheets, a parallel project was started by Tatsu and Emilia to explore how we could visualise the rich data that we had all across youthpolicy.org.

We decided that a good solution would be an interactive choropleth map. This is when we started working with Mapbox as a provider of custom online maps.
Mapbox takes its data both from open data sources, such as OpenStreetMap and NASA, and from proprietary data sources, such as DigitalGlobe. The technology is mostly based on Mapnik and Leaflet.js.

Having explored the options, we started working with geoJSON. As the name already tells us, geoJSON files are JSON (JavaScript Object Notation) files containing geographical information. It is a format for encoding a variety of geographic data structures (points, lines, polygons) and can be used by Mapbox to create interactive map overlays.

Within the geoJSON format, every country is represented by a “feature object”: a polygon (or several polygons) made of coordinates, plus a set of additional properties. These properties can be used as data containers, for example for the presence of a national youth policy or the minimum age of criminal responsibility (and many other figures). In short, they were perfect for storing custom data about each country.

Finding a geoJSON file of the world’s countries in the era of open source was easy, and we then took these empty country representations and filled them with data from the Fact Sheets. This created another synchronisation process: this time from the database to the front-end. The fields that we thought made sense in an aggregated view of all the countries were copied into the geoJSON.

A simple json_encode() does the trick, provided the PHP object passed in has been correctly built from our database. To build this object, we used WordPress’s handy functions for retrieving post data.
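As a rough sketch of this step, the following plain PHP copies selected figures into the properties of each country feature and encodes the result. Several things here are assumptions made for the example: the function name fill_geojson(), the idea that each feature carries its ISO code in properties.country_code, the field name has_youth_policy, and the shortened, made-up outline.

```php
<?php
// Copy selected Fact Sheet figures into the "properties" of each country feature.
// $geo is a decoded geoJSON FeatureCollection; $factSheets maps country codes
// to figures. Assumption: each feature stores its ISO code in
// properties.country_code (real world files may use a different key).
function fill_geojson(array $geo, array $factSheets, array $fields): array
{
    foreach ($geo['features'] as $i => $feature) {
        $code = $feature['properties']['country_code'] ?? null;
        if ($code === null || !isset($factSheets[$code])) {
            continue; // no Fact Sheet data for this feature
        }
        foreach ($fields as $field) {
            if (array_key_exists($field, $factSheets[$code])) {
                $geo['features'][$i]['properties'][$field] = $factSheets[$code][$field];
            }
        }
    }
    return $geo;
}

// Dummy data, only used for this example.
$geo = array(
    'type'     => 'FeatureCollection',
    'features' => array(
        array(
            'type'       => 'Feature',
            'properties' => array('country_code' => 'ARM'),
            // shortened, made-up outline; real files hold the full border
            'geometry'   => array(
                'type'        => 'Polygon',
                'coordinates' => array(array(
                    array(43.5, 38.8), array(46.6, 38.8), array(46.6, 41.3), array(43.5, 38.8),
                )),
            ),
        ),
    ),
);
$factSheets = array(
    'ARM' => array('has_youth_policy' => true, 'unemployment_total' => 13.53),
);

// Only the fields that make sense in the aggregated view are copied over.
$json = json_encode(fill_geojson($geo, $factSheets, array('has_youth_policy')));
echo $json, "\n";
```

In WordPress, $factSheets would be assembled from the factsheet posts and their custom fields rather than written by hand.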

If interested in the structure of the geoJSON, just take a look at the finalized geoJSON available on our site.

This file was then simply included on the Fact Sheets landing page, imported into the Mapbox map and made interactive using Leaflet.js - a powerful open-source JavaScript library for mobile-friendly interactive maps.

The Mapbox documentation has been an extremely valuable source of information for these last steps.

The results of these operations can be seen on the Fact Sheet landing page and are described in our redesign article, published after the redesign launch of youthpolicy.org. For a graphic representation of what we have covered so far, take a look at the following picture:

[Image: yp_data_flow]

The future of the Fact Sheets

So far, we have produced the content, built the systems, solved the initial problems of sustainability and presented our data visually in a dynamic and interactive way. In practice, none of this was as easy as it may sound, especially the process of synchronising data from CSVs to our database and from the database to JSON files processed on the client-side.

  • But what if we wanted to get the data out of the database without having to always synchronise back and forth?
  • What if our partners wanted to retrieve data for certain topics or geographical areas in an automated way?
  • What about mobile apps? How could we make our data even more accessible - instantaneously - in the palm of your hand?

The need for youthpolicy.org to have its own RESTful [2] API (Application Programming Interface) has become pressing.

Luckily for us, WordPress seems to be developing in this direction as well. A RESTful API is already available through the installation of a plugin - wp-api - and the WordPress development team is discussing an integration into the core [3]. Using HTTP verbs to query our Fact Sheets will soon be possible. And then the possibilities will be huge.

Here is one idea for you:

Imagine a section of the website with an on-demand generation of CSVs. A place where you could select the figures you want (youth unemployment, tobacco use and/or any other Fact Sheet figure), add regions or subregions of your choice, and have a CSV generated instantly.
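To make the idea a little more tangible, here is a minimal sketch of such a generator in plain PHP. Nothing of this exists on our site yet: the function name generate_csv(), the region key on each row and the sample figures are all assumptions made for the illustration.

```php
<?php
// Hypothetical on-demand CSV generator.
// $rows:    one associative array per country (e.g. built from the Fact Sheets)
// $figures: the columns the visitor asked for
// $regions: optional list of regions to keep (empty = all countries)
function generate_csv(array $rows, array $figures, array $regions = array()): string
{
    $columns = array_merge(array('country_code'), $figures);

    // Write into memory instead of a file, so the result can be sent directly.
    $out = fopen('php://memory', 'w+');
    fputcsv($out, $columns, ',', '"', '\\');

    foreach ($rows as $row) {
        if ($regions && !in_array($row['region'] ?? '', $regions, true)) {
            continue; // country is outside the requested regions
        }
        $line = array();
        foreach ($columns as $column) {
            $line[] = $row[$column] ?? ''; // missing figures become empty cells
        }
        fputcsv($out, $line, ',', '"', '\\');
    }

    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

// Dummy data, only used for this example.
$rows = array(
    array('country_code' => 'ARM', 'region' => 'Asia', 'unemployment_total' => '13.53', 'tobacco_use' => '7.1'),
    array('country_code' => 'FRA', 'region' => 'Europe', 'unemployment_total' => '9.90', 'tobacco_use' => '20.2'),
);

echo generate_csv($rows, array('unemployment_total'), array('Asia'));
```

On a real page, the returned string would be sent to the browser with a text/csv content type, and $rows would come from the database or, later, from the API.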

The once limited, albeit user-friendly, apprentice (called WordPress!) is now capable of overcoming the barriers of modern web development. It has become a ninja, freed of (at least some of) its burdens (PHP and the traditional web paradigm of pages and links) and armed with robust combat gear (the RESTful API).

Conclusion

This article covered the main technical challenges we faced during the development of the Fact Sheets. It did not dive into the JavaScript development needed for the visualisation of the interactive map, nor into the integration of the country table. This is the start of a series of IT-related articles, and we hope to cover possible table implementations (such as the one seen on the Fact Sheet landing page) in our next article on the development of the Youth Policy Library. Then we’ll tackle the migration to the roots.io starter theme and the challenges involved.

If you would like any clarifications regarding our source code, or have any questions, we’ll happily answer them via the comments section. For a more thorough understanding of the architecture we would like to achieve, the Single Page Manifesto [4] is a very valuable source of information.

Footnotes


[1] - For cases like this, a common approach is to serialize all meta-data into one JSON object, store the JSON inside one custom field and deserialize it upon every page load of the post, reducing the number of database reads per post by hundreds.

[2] - Representational State Transfer (REST) - Roy Fielding’s dissertation

[3] - GitHub discussion - https://github.com/WP-API/WP-API/issues/571

[4] - The Single Page Manifesto - http://itsnat.sourceforge.net/php/spim/spi_manifesto_en.php

Written by Jacob Kreyenbühl and edited by Alex Farrow.

Featured image is a screenshot of computer programming in action by Jacob Kreyenbühl.