Description: The outcome of this task will be to extract the tables from the Yearbook PDFs and structure them into readable data sets (CSV) that governments, researchers, and others can use in the future.
Initial Phase:
1. Downloading the PDFs.
2. Using R or Python packages to extract the tables from the PDFs.
3. Transforming the extracted tables into data frames.
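The three steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the team's actual pipeline: it assumes the third-party `pdfplumber` package for extraction (any comparable R or Python package would do), assumes the first row of each extracted table is its header, and uses plain lists of dictionaries with the standard `csv` module in place of a data frame library. The function names `download_pdf`, `extract_raw_tables`, `rows_to_records`, and `records_to_csv` are hypothetical.

```python
import csv
import io
import urllib.request


def download_pdf(url, dest_path):
    """Step 1: download one PDF. The URL is supplied by the caller."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract_raw_tables(pdf_path):
    """Step 2: yield (page_number, rows) for every table found in the PDF.

    Assumes the third-party pdfplumber package; it is imported lazily so
    the transformation helpers below work without it being installed.
    """
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for rows in page.extract_tables():
                yield page.page_number, rows


def rows_to_records(rows):
    """Step 3: turn a list of rows into a list of dicts keyed by header.

    Assumption: the first row is the header. Empty cells (None) become "".
    Duplicate header names would overwrite each other and need extra care.
    """
    header = [(cell or "").strip() for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


def records_to_csv(records):
    """Serialize the records to CSV text, one line per record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For a real run, each `(page_number, rows)` pair from `extract_raw_tables` would be passed to `rows_to_records`. Keeping the page number alongside the rows at this stage makes it easier to trace each table back to its source later.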
Cleaning Phase:
1. Removing or correcting rows and columns that contain incorrect values.
2. Identifying each table by its page number and the name of its source PDF.
3. Normalizing similar tables into the same structure to avoid redundancy if the data is later stored in a UN database.
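One possible shape for the cleaning phase is sketched below, again in Python. It rests on several assumptions that are not part of the original plan: that an "incorrect value" means a cell that cannot be parsed as a number, that placeholders such as `...` or `-` mark missing data, and that a "long" layout (one row per label, variable, and value) is an acceptable common structure for similar tables. The names `parse_number`, `normalize_header`, and `clean_table` are hypothetical.

```python
import re

# Assumed placeholders for missing data; real Yearbook tables may use others.
MISSING_MARKERS = {"", "-", "...", "n.a."}


def parse_number(text):
    """Return a float for strings like '1,234.5', or None if not numeric."""
    cleaned = text.replace(",", "").replace(" ", "")
    if cleaned in MISSING_MARKERS:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def normalize_header(name):
    """Lower-case a column name and collapse punctuation to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def clean_table(records, pdf_name, page_no, label_column):
    """Reshape one table into long format with its provenance attached.

    Rows without a label and cells without a numeric value are dropped.
    Every output row carries the source PDF name and page number, so
    tables from different PDFs share a single structure.
    """
    cleaned = []
    for record in records:
        label = record.get(label_column, "").strip()
        if not label:
            continue
        for column, raw in record.items():
            if column == label_column:
                continue
            value = parse_number(raw)
            if value is None:
                continue
            cleaned.append(
                {
                    "source_pdf": pdf_name,
                    "page_no": page_no,
                    "label": label,
                    "variable": normalize_header(column),
                    "value": value,
                }
            )
    return cleaned
```

The long layout is a design choice rather than a requirement: because every table ends up with the same five columns, tables with different original headers can be appended to one another and loaded into a single database table without repeating schema definitions.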
Co-authors: Gokulakrishnan Narasimhan, Guilherme Silveira, Meijie Li, Guangyue Li