Towards Mature Data Science

Some pointers I have picked up over time:

  • Think of a variety of solutions instead of going for the first solution that comes to mind
  • Define the train/validation/test data splits
  • Don’t commit to target metrics with stakeholders before you have tried the candidate solutions
  • Define good metrics for model selection
  • Consider inference time and infrastructure requirements during model selection
  • Understand your libraries in depth so you don’t write inefficient, time-consuming custom solutions
  • Know the pros and cons of the algorithms under consideration, and think them through beforehand
  • Think about Big-O complexity when designing solutions
  • Use AutoML as an ally in parallel to custom solutions
  • When you use someone else’s solution, go through it thoroughly
  • Ask others when stuck
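
As a rough illustration of defining the data splits, here is a minimal sketch that carves a dataset into train, validation and test sets while preserving class ratios. It assumes scikit-learn, which the list above does not name. The helper `three_way_split`, the split fractions and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, val_size=0.15, test_size=0.15, seed=42):
    """Hypothetical helper: stratified train/validation/test split."""
    # Carve off the test set first, keeping class ratios via stratify.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # val_size is a fraction of the full data, so rescale it
    # relative to what remains after removing the test set.
    rel_val = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=rel_val, stratify=y_rest, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Toy imbalanced data (made up for illustration): 10% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([0] * 180 + [1] * 20)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = three_way_split(X, y)

# Each split should show roughly the same positive rate as the full data.
for name, labels in [("train", train_y), ("val", val_y), ("test", test_y)]:
    print(name, len(labels), round(float(labels.mean()), 3))
```

Fixing the splits once, before any modelling, means every candidate solution is compared on the same held-out data.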

A list of mistakes I made in data science interview assignments and projects:

  • Scaling the dependent variable y
  • Not stratifying the train/test split for imbalanced data
  • Not having a train/validation/test strategy
  • Using label-encoded text in a feed-forward network for a classification problem
  • Using an embedding layer in an LSTM network without passing the embedding matrix
  • Oversampling the minority class before doing the train/test split
  • Writing a custom forward-fill function for a DataFrame instead of using pandas’ built-in ffill
  • Not checking vocabulary coverage before using a particular word embedding
  • Not writing functions in the interview assignment notebooks
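
The oversampling mistake is the easiest of these to make silently: if minority rows are duplicated before the split, copies of the same row can land in both train and test, and the test score looks better than it should. Below is a minimal sketch of the safer order, assuming scikit-learn and NumPy (neither is named in the list above). The toy data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Toy imbalanced data (hypothetical): 180 negatives, 20 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([0] * 180 + [1] * 20)

# 1. Split first, stratified, so the test set holds only original rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 2. Oversample the minority class inside the training portion only.
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,          # sample with replacement to reach n_majority rows
    n_samples=n_majority,
    random_state=42,
)

# 3. Recombine into a balanced training set; the test set stays untouched.
X_train_bal = np.vstack([X_train[~minority], X_min_up])
y_train_bal = np.concatenate([y_train[~minority], y_min_up])

print(np.bincount(y_train_bal), np.bincount(y_test))
```

The test set keeps the original class imbalance, which is usually what you want, since it should resemble the data the model will see in practice.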

An AI evangelist and a multi-disciplinary engineer. Loves to read business and psychology during leisure time. Connect with him any time on LinkedIn for a quick chat on AI!