#StatsHistory : Transform United Nations statistical publications into #opendata

Table Recognition and Extraction

by Tian Qing 02/28/2018 04:42 PM GMT

Description

The goal of this task is to extract tables from the Yearbook PDFs and structure them into machine-readable data sets (CSV) that governments, researchers, and others can use in the future.

Initial Phase:

1. Downloading the PDFs.

2. Using R / Python packages to extract tables from the PDFs.

3. Transforming the extracted tables into data frames.
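
The initial phase could be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not the team's implementation: the helper names (`download_pdf`, `rows_to_records`, `records_to_csv`) and the sample rows are hypothetical, and step 2 is assumed to be handled by a separate PDF table-extraction package that returns each table as a list of rows of strings.

```python
import csv
import io
from urllib.request import urlretrieve


def download_pdf(url, dest_path):
    """Step 1: fetch one PDF to disk (the caller supplies url and dest_path)."""
    urlretrieve(url, dest_path)
    return dest_path


def rows_to_records(rows):
    """Step 3: turn raw extracted rows into a list of dicts keyed by header."""
    header = [cell.strip() for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]
        # Pad short rows so every record has the same keys.
        # Cells beyond the header width are dropped by zip().
        cells += [""] * (len(header) - len(cells))
        records.append(dict(zip(header, cells)))
    return records


def records_to_csv(records):
    """Serialize records to CSV text with a header line."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical rows standing in for the output of step 2.
sample_rows = [
    ["Country ", "Year", "Population"],
    ["A", "1950", "100"],
    ["B", "1950"],
]
records = rows_to_records(sample_rows)
csv_text = records_to_csv(records)
```

Keeping the extraction step behind a plain "list of rows" interface means the R or Python package used for step 2 can be swapped without touching the code that builds the CSV.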

Cleaning Phase:

1. Removing or correcting rows and columns that contain incorrect values.

2. Identifying each table by its page number and the name of its source PDF.

3. Normalizing similar tables into the same structure to avoid redundancy if the data is later stored in a UN database.
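
As an illustration of the cleaning phase, the sketch below drops rows with invalid values, tags each row with its source PDF and page number, and maps differing column headers onto one shared schema. Everything specific here is an assumption made for the example: the `COLUMN_ALIASES` mapping, the `source_table` identifier format, the validity rule, and the sample data are hypothetical and would need to be adapted to the actual Yearbook tables.

```python
# Hypothetical mapping from header variants to one canonical column name.
COLUMN_ALIASES = {
    "country or area": "country",
    "country": "country",
    "year": "year",
    "pop.": "population",
    "population": "population",
}


def table_id(pdf_name, page_no):
    """Identify a table by the name of its PDF and its page number."""
    return f"{pdf_name}#p{page_no}"


def normalize_record(record):
    """Rename known columns to the shared schema; unknown columns are dropped."""
    out = {}
    for key, value in record.items():
        canonical = COLUMN_ALIASES.get(key.strip().lower())
        if canonical is not None:
            out[canonical] = value.strip()
    return out


def is_valid(record):
    """Assumed rule: year and population must be whole numbers."""
    year = record.get("year", "")
    population = record.get("population", "").replace(",", "")
    return year.isdigit() and population.isdigit()


def clean_table(records, pdf_name, page_no):
    """Apply the three cleaning steps to one extracted table."""
    cleaned = []
    for record in records:
        row = normalize_record(record)
        if not is_valid(row):
            continue  # remove rows that contain incorrect values
        row["population"] = row["population"].replace(",", "")  # strip separators
        row["source_table"] = table_id(pdf_name, page_no)
        cleaned.append(row)
    return cleaned


# Two hypothetical tables with different headers end up in the same structure.
cleaned_a = clean_table(
    [
        {"Country or area": "A", "Year": "1950", "Pop.": "1,000"},
        {"Country or area": "Total", "Year": "n/a", "Pop.": "..."},
    ],
    "yearbook_1950.pdf",
    12,
)
cleaned_b = clean_table(
    [{"Country": "B", "Year": "1951", "Population": "200"}],
    "yearbook_1951.pdf",
    7,
)
```

Because both tables share the same keys after cleaning, they could be appended to a single database table, with `source_table` preserving where each row came from.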

Co-authors to your solution

Gokulakrishnan Narasimhan, Guilherme Silveira, Meijie Li, Guangyue Li

Tags: Initial idea, #StatsHistory

User Tasks

Required for graduation.
Task	Assigned to	Due Date	Status
Approval	Jorge Martinez-Navarrete	06/15/2018	Completed on 05/04/2018
