Multilingual Natural Language Processing using Python
2021-11-13, 15:00–15:30 (Europe/Athens), Room 2

Natural Language Processing (NLP) is an interesting and challenging field. It becomes even more interesting and challenging when we take more than one human language into consideration. In this talk, we will discuss techniques for processing information in more than one human language.


When NLP is performed on a single language, interesting insights that exist only in other human languages may be missed. Interesting and valuable information may be available in other languages such as Spanish, Chinese, French, Hindi, and other major languages of the world, and it may come in various formats such as text, images, audio, and video.

In this talk, I will discuss techniques and methods for performing NLP tasks on multi-source, multilingual information. The talk begins with an introduction to natural language processing and its concepts. It then addresses the challenges of multilingual and multi-source NLP. Next, I will discuss techniques and tools for extracting information from audio, video, images, and other types of files using the PyScreenshot, SpeechRecognition, Beautiful Soup, and PIL packages, as well as extracting information from web pages and source code using pytesseract. I will then discuss concepts such as translation and transliteration, which help bring the information into a common language format. Once the information is in a common language, NLP tasks become much easier to perform. Finally, through a code walkthrough, I will explain how to generate a summary in a specific language from multi-source, multilingual information using the spaCy and Stanza packages.
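As a hedged, dependency-free sketch of the web-page extraction step (not the talk's own code), the snippet below uses Python's standard-library `html.parser` as a stand-in for Beautiful Soup. It collects the visible text of a page while skipping the contents of `<script>` and `<style>` elements. The names `VisibleTextExtractor` and `extract_visible_text` are hypothetical. The resulting plain text is the kind of input that the later translation and summarization steps would work on.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: gather text nodes, ignoring <script>/<style> bodies.

    Stand-in for Beautiful Soup, built only on the standard library.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0  # > 0 while we are inside a tag whose text we ignore
        self.chunks = []  # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside <script> or <style>.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><style>p {color: red}</style></head><body>"
        "<h1>Hola</h1><script>var x = 1;</script>"
        "<p>Bonjour &amp; hello</p></body></html>"
    )
    print(extract_visible_text(page))  # Hola Bonjour & hello
```

Mixed-language output like this ("Hola", "Bonjour") illustrates why the text is normally brought into a common language before further NLP processing.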

Outline
1. Introduction to NLP and concepts (5 minutes)
2. Challenges in multi-source multilingual NLP (2 minutes)
3. Tools for extracting information from various file formats (4 minutes)
4. Extracting information from web pages and source code (4 minutes)
5. Methods to convert information into a common language format (5 minutes)
6. Code walkthrough for multi-source and multilingual summary generation (10 minutes)
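The summary-generation step in the outline can be illustrated with a minimal extractive summarizer. This is a hedged sketch under stated assumptions, not the talk's spaCy/Stanza code. It assumes the text has already been translated into one common language (English here). It uses a naive regex sentence splitter and a tiny hypothetical stopword list. Each sentence is scored by the average normalized frequency of its content words, and the top sentences are returned in their original order. In a fuller pipeline, the regex helpers would typically be replaced by the sentence segmentation and tokenization that spaCy or Stanza provide.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# proper language-specific list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "on", "for", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def sentence_score(sentence, freq, top):
    """Average normalized word frequency, so long sentences are not favored by length alone."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(freq[w] / top for w in words) / len(words)


def summarize(text, max_sentences=2):
    """Return up to max_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    # Document-wide word counts act as importance weights.
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return ""
    top = max(freq.values())
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i], freq, top),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore reading order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    text = ("Python is popular for NLP. Many NLP libraries support NLP. "
            "The weather is nice today.")
    print(summarize(text, max_sentences=1))  # Many NLP libraries support NLP.
```

Frequency scoring is chosen here only because it needs no external models. It is language-agnostic once text is tokenized, which is one reason converting everything to a common language first makes a single summarizer usable across sources.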

I hold an M.Tech. in Computer Science and Engineering and a PG Diploma in Cyber Law and Cyber Forensics from the National Law School of India University, Bengaluru, India. I have presented talks, posters, and papers at prestigious conferences including JuliaCon London, PyCon France, PyCon Hong Kong, PyCon Taiwan, COSCUP Taiwan, PyCon Africa, BuzzConf Argentina, EuroPython, PiterPy Russia, SciPy USA, SciPy India, NIT Goa, and IIT Gandhinagar. I have worked as a reviewer and program committee member for reputed international conferences including SciPy USA, SciPy Japan, JuliaCon, JupyterCon, PyData Global, and PyCon India, and for publishers including Manning USA and Oxford University Press. I am also a GitHub Certified Campus Advisor, and I lead the PyData Belagavi and OWASP Belagavi chapters.
