Description:
We will focus on the following improvements:
1. Tokenization
Link sentences that have similar meanings.
Keep the meaning of each sentence complete when splitting text.
2. Extraction
Extract more meaningful content from the documents.
3. Classification
Improve the accuracy of matching sentences to categories.
4. Multilingual support
Classify non-English documents.
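
The tokenization goals above (keep each sentence whole, then link sentences with similar meanings) can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in for the NLP libraries named in this submission, and it approximates "similar meaning" with Jaccard overlap of word sets, which is an assumption made for illustration rather than the method used in the project. The function names and the threshold value are hypothetical.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace, keeping each sentence whole."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Return the set of lowercased alphanumeric word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_similar(sentences, threshold=0.4):
    """Return index pairs of sentences whose word overlap meets the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    toks = [tokens(s) for s in sentences]
    return [
        (i, j)
        for i, j in combinations(range(len(sentences)), 2)
        if jaccard(toks[i], toks[j]) >= threshold
    ]


text = (
    "The attacker stole user passwords. "
    "Budget talks continue next week. "
    "User passwords were stolen by the attacker!"
)
sentences = split_sentences(text)
links = link_similar(sentences)
```

Word overlap only catches sentences that share vocabulary. Paraphrases with different wording would need stemming, synonyms, or vector representations from a library such as NLTK or spaCy.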
Main Challenges:
1. Building a proper dictionary.
2. Analyzing different languages.
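
To show why the dictionary matters, here is a minimal sketch of dictionary-based sentence classification. The categories, the terms, and the `classify` helper are all hypothetical and exist only for illustration; they are not taken from the project. The sketch assumes English input, so non-English documents would first need translation or per-language dictionaries, which is not shown here.

```python
import re

# Hypothetical category dictionary; categories and terms are illustrative only.
CATEGORY_TERMS = {
    "malware": {"malware", "virus", "ransomware", "trojan"},
    "privacy": {"privacy", "personal", "data", "consent"},
}


def classify(sentence, dictionary=CATEGORY_TERMS):
    """Return the category with the most dictionary hits, or None if there are none."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = {
        category: sum(1 for w in words if w in terms)
        for category, terms in dictionary.items()
    }
    # On a tie, max() keeps the first category in dictionary order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


label_a = classify("Ransomware encrypted the files after the trojan was installed.")
label_b = classify("Consent is required before collecting personal data.")
label_c = classify("The meeting starts at noon.")
```

A sentence with no dictionary hits returns `None` rather than a forced label, which makes gaps in the dictionary visible instead of hiding them behind a wrong category.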
Methodology and Tools:
Python NLP: textract, NLTK, Goslate, scikit-learn, spaCy
Co-authors to your solution: Chien Min Wang, Yichao Li, Aidi Li, Xiansheng Zhang
Link to your concept design and documentation (Required by the final day of the Submission & Collaboration phase): https://github.com/UniteIdeas/CyberSecurityNLP/issues/1
Link to an online working solution or prototype (Required by the final day of the Submission & Collaboration phase): https://github.com/UniteIdeas/CyberSecurityNLP/issues/1
Link to a video or screencast of your solution or prototype (Required by the final day of the Submission & Collaboration phase): https://youtu.be/5KFF_vA7gJI
Link to source code of your solution or prototype above (If you submitted a link to an online solution or prototype, or to a video of your solution or prototype, you must provide a link to the source code. This item is required by the final day of the submission phase): https://github.com/UniteIdeas/CyberSecurityNLP/issues/1