Description: The task is divided into 4 steps. Each step is done either automatically or with the help of an agent (assisted).
1- Finding pages containing tables - assisted
Input: yearbook PDF file - https://unstats.un.org/unsd/publications/statistical-yearbook/files/syb56/syb56.pdf
Output: a CSV file containing page numbers of each table - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/2_table_page_numbers/tables.csv
Method: Recording the page numbers for each table in the yearbook (manually)
2- Extracting the pages containing tables - automated
Input: a CSV file containing page numbers of each table - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/2_table_page_numbers/tables.csv
Output: a set of PDF files each containing a table - https://github.com/moqri/UN_YearBooks2OpenData/tree/master/3_table_pdfs
Method: PyPDF2 - https://pythonhosted.org/PyPDF2/
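A minimal sketch of how step 2 could be automated, not the repository's actual script. It assumes tables.csv has a header with the hypothetical columns table, first_page and last_page, that page numbers are 1-based, and that the PyPDF2 3.x names (PdfReader, PdfWriter) are available. The function names are made up for illustration.

```python
import csv
import io


def read_page_ranges(csv_text):
    """Parse rows of (table, first_page, last_page) into a dict.

    The three column names are assumptions for illustration; adjust
    them to match the real header of tables.csv.
    """
    ranges = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        first = int(row["first_page"])
        last = int(row["last_page"])
        if last < first:
            raise ValueError(f"table {row['table']}: last page is before first page")
        ranges[row["table"]] = (first, last)
    return ranges


def split_tables(pdf_path, ranges, out_dir):
    """Write one PDF per table into out_dir, named <table>.pdf."""
    # Imported here so the CSV parsing above works without PyPDF2 installed.
    from pathlib import Path
    from PyPDF2 import PdfReader, PdfWriter  # PyPDF2 3.x class names

    reader = PdfReader(pdf_path)
    for table, (first, last) in ranges.items():
        writer = PdfWriter()
        # CSV page numbers are assumed 1-based; reader.pages is 0-based.
        for index in range(first - 1, last):
            writer.add_page(reader.pages[index])
        with open(Path(out_dir) / f"{table}.pdf", "wb") as handle:
            writer.write(handle)
```

Usage would be along the lines of split_tables("syb56.pdf", read_page_ranges(open("tables.csv").read()), "3_table_pdfs"), with the output folder created beforehand.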
3- Extracting the data from each table - automated
Input: a PDF file containing a table - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/3_table_pdfs/2.pdf
Output: a CSV file containing table data - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/4_table_csvs/2.csv
Method: Tabula - http://tabula.technology/
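Step 3 could be batch-run from Python roughly as follows. This sketch assumes the tabula-py wrapper (which needs a Java runtime) rather than the Tabula desktop application, and the helper names csv_path_for and convert_all are hypothetical. The project may have used Tabula differently.

```python
from pathlib import Path


def csv_path_for(pdf_path, csv_dir):
    """Map a table PDF such as 3_table_pdfs/2.pdf to <csv_dir>/2.csv."""
    return Path(csv_dir) / (Path(pdf_path).stem + ".csv")


def convert_all(pdf_dir, csv_dir):
    """Convert every PDF in pdf_dir to a CSV file in csv_dir."""
    # Imported here so the path helper above works without tabula-py installed.
    import tabula  # tabula-py, an assumption; requires Java

    Path(csv_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        tabula.convert_into(
            str(pdf_path),
            str(csv_path_for(pdf_path, csv_dir)),
            output_format="csv",
            pages="all",
        )
```

Keeping the same file stem for input and output (2.pdf to 2.csv) makes it easy to pair each extracted CSV with its source pages during the cleaning step.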
4- Cleaning the data and creating tables - assisted
Input: a CSV file containing table data - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/4_table_csvs/2.csv
Output: a CSV table with correct labels and rows - https://github.com/moqri/UN_YearBooks2OpenData/blob/master/5_tables_cleaned/2.csv
Method: comparing the original table (from PDF) and the result (in CSV) and correcting the labels, alignments, etc. (manually)
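The manual comparison in step 4 could be narrowed down with a small check like the one below. It is only an illustrative aid, assuming that a row with a different number of fields than the header row is a likely alignment problem. It does not replace checking the CSV against the original PDF, and the function name rows_to_review is hypothetical.

```python
import csv
import io


def rows_to_review(csv_text):
    """Return 1-based row numbers whose field count differs from the header's.

    Row 1 is the header itself, so reported numbers start at 2.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    width = len(rows[0])
    return [
        number
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]
```

Rows flagged this way are candidates for shifted or merged cells; rows that pass can still carry wrong labels, so they also need a visual check.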
Link to your concept design and documentation (Required by the final day of the Submission & Collaboration phase): https://github.com/moqri/UN_YearBooks2OpenData
Link to an online working solution or prototype (Required by the final day of the Submission & Collaboration phase): https://github.com/moqri/UN_YearBooks2OpenData
Link to source code of your solution or prototype above. (If you submitted a link to an online solution or prototype, or to a video of your solution or prototype, you must provide a link to the source code. This item is required by the final day of the submission phase): https://github.com/moqri/UN_YearBooks2OpenData