It’s always worth knowing which libraries are out there, in case you ever need one.
Core libraries that you have to start with
- Jupyter Notebook – For interactive Python (highly recommended)
- spaCy (Check out NLTK vs spaCy vs Stanford CoreNLP)
- Textacy (Built on top of spaCy)
- Stanford CoreNLP
- TextBlob (Built on top of NLTK and Pattern)
- RegEx (Python's built-in re module) – For matching word patterns
- Newspaper – Scraping content of news articles and extracting keywords
- Python CRFSuite – For training conditional random fields (CRFs) for NER and other sequence-labeling tasks
- Gensim – For generating word embeddings
- Librosa – Audio analysis
- LDA – For Latent Dirichlet Allocation
- Textract – For extracting text from documents in many formats, including images
You can find more over here.
- Tesseract (OCR engine)
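The RegEx entry in the list above refers to Python's built-in re module. Here is a minimal sketch of matching word patterns with it; the sample sentence and the three patterns are illustrative assumptions, not taken from any particular library.

```python
import re

text = "Natural language processing (NLP) turns raw text into data. NLP isn't magic."

# \b marks a word boundary; [A-Za-z']+ matches a run of letters and apostrophes,
# so contractions such as "isn't" stay in one piece.
words = re.findall(r"\b[A-Za-z']+\b", text)

# Words of two or more capital letters only, e.g. acronyms.
acronyms = re.findall(r"\b[A-Z]{2,}\b", text)

# Case-insensitive match for any word starting with "proc".
proc_words = re.findall(r"\bproc\w*", text, flags=re.IGNORECASE)

print(words)
print(acronyms)    # ['NLP', 'NLP']
print(proc_words)  # ['processing']
```

re.findall returns every non-overlapping match as a list of strings, which is usually all you need for quick token or keyword extraction before reaching for a full NLP library.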
Reading text files
- youtube-dl – Can download whole playlists
- Coursera-dl – Can download all videos of a course
- Udacity-dl – Can download all videos of a course
Scraping web pages
- Beautiful Soup – For parsing and scraping HTML pages
- Scrapy – A framework for crawling websites and extracting specific fields from HTML pages (the related Scrapely library can learn which fields to extract from annotated sample pages)
- Newspaper – For getting text of news articles
- Pigar – For generating a requirements.txt from all the Python files in a repository
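Beautiful Soup is a third-party package (installed as beautifulsoup4). To show what scraping an HTML page involves without extra dependencies, here is a minimal sketch that swaps in Python's built-in html.parser module instead; it is lower-level than Beautiful Soup and is not its API. The class name and the sample HTML are hypothetical. The parser collects the text inside p tags and the href of every link.

```python
from html.parser import HTMLParser


class ParagraphAndLinkParser(HTMLParser):
    """Hypothetical example parser: collects <p> text and <a href> values."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.links = []
        self._in_paragraph = False
        self._p_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._p_chunks = []
        elif tag == "a":
            # attrs is a list of (name, value) pairs.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self.paragraphs.append("".join(self._p_chunks).strip())
            self._in_paragraph = False

    def handle_data(self, data):
        # Keep text only while inside a paragraph (including text of nested links).
        if self._in_paragraph:
            self._p_chunks.append(data)


html = """
<html><body>
  <h1>Headline</h1>
  <p>First paragraph with a <a href="https://example.com/a">link</a>.</p>
  <p>Second paragraph.</p>
</body></html>
"""

parser = ParagraphAndLinkParser()
parser.feed(html)
print(parser.paragraphs)  # ['First paragraph with a link.', 'Second paragraph.']
print(parser.links)       # ['https://example.com/a']
```

Beautiful Soup wraps this kind of event handling in a searchable parse tree, which is why it is usually the more convenient choice for real-world pages.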
You can find more libraries over here.
If you run into errors while installing a package, download its prebuilt .whl file from here and install it with pip install packagename.whl
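A common reason a downloaded .whl file fails to install is that it was built for a different Python version or platform; pip then rejects it as "not a supported wheel on this platform". Wheel filenames encode this in their tags, following the pattern name-version(-build)-python-abi-platform.whl. The sketch below (hypothetical helper names, made-up example filenames) splits a filename into those tags and roughly checks the Python tag against the running interpreter. It ignores the ABI and platform tags, so pip's own check remains the authoritative one.

```python
import sys


def wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version and compatibility tags.

    Pattern: {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):  # 6 parts when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def python_tag_matches(filename: str) -> bool:
    """Rough check of the python tag only (ABI and platform tags are ignored)."""
    major, minor = sys.version_info[:2]
    accepted = {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}
    # Compressed tag sets such as "py2.py3" list several tags separated by dots.
    return any(tag in accepted for tag in wheel_tags(filename)["python"].split("."))


print(wheel_tags("examplepkg-1.0.0-cp311-cp311-win_amd64.whl"))
print(python_tag_matches("examplepkg-1.0.0-py2.py3-none-any.whl"))  # True on Python 3
```

If the tags don't match your interpreter (for example a cp27 wheel on Python 3), go back and download the file whose tags do.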
We will keep updating this list. Let us know if you have any suggestions!
An AI evangelist and a multi-disciplinary engineer. Loves to read about business and psychology in his leisure time. Connect with him any time on LinkedIn for a quick chat on AI!