Machine Learning Interview Questions and Answers 2021

By Aradhya Kumar

Last updated on Nov 23 2020

Most Commonly Asked Machine Learning Interview Questions 2021


In this article, we will talk about Machine learning and the most important questions you can expect in your interview.


Machine learning broadly can be classified into four categories:

  1. Classical Learning
  2. Neural Nets and Deep Learning
  3. Ensemble Methods
  4. Reinforcement Learning

This article touches on each of them and covers the most frequently asked questions from these domains. You can expect four types of questions in an interview or preliminary round:

  1. Modeling case study questions.
  2. Core machine learning interview questions
  3. Recommendation engines & search engines.
  4. Questions based on Python.


The degree of difficulty depends on the profile or job role you apply for. For business-oriented roles, prepare for applied machine learning; more senior roles, such as data scientist or research scientist, require deeper experience and skills.


Modeling Case study:

Tenda is a product-based company that manufactures goods such as routers, modems, and optical fiber. Its product, design, and materials engineers and their teams worked hard to build a new product category in the router line. The product reaches end users through multiple channels, such as ISPs, network providers, cable connection owners, and e-commerce websites. A returned product is treated as lost revenue and as damage to the brand. Retailers can also return unsold products, which adds to Tenda's loss. The challenges may include the following:

  • unavailability of data
  • unstructured data
  • data that takes months to gather and validate for sufficiency

Here, information on the product features and past transaction data is available, along with customer reviews and warranty details for the product.


Basic Machine Learning Interview Questions


Q1. How can we ascertain the volume of the returned products, followed by the reasons for return?

The data engineers can apply NLP techniques such as tokenization, lemmatization, and part-of-speech tagging to prepare the review and warranty text, then represent it with word embeddings, n-grams, or term frequency-inverse document frequency (TF-IDF). Models such as Latent Dirichlet Allocation, Support Vector Machines, and Long Short-Term Memory networks can then surface the reasons for the returned products.


Q2. The business wants a solution that can predict failure rates to build a better product. How would you approach it?

NLP and BI tools built for data visualization can feed a forecasting model that identifies patterns in the complaints and the frequency of complaints per product line. Text analytics serves as the acid test, revealing which products were merely returned and which were defective. The problem of data insufficiency has to be solved with relevant data, while guarding against sampling bias.


Recommendation engines & search engines:


How would you build a recommendation engine, such as a type-ahead search?

Depending on the origin of the data, we can combine collaborative filtering with content-based (characteristics-based) filtering; this combination is the hybrid approach. Collaborative filtering algorithms can be further classified as memory-based or model-based, and a hybrid approach can mix these as well.


Next, we collect feedback, both explicitly and implicitly.

Then we build a users x items matrix of ratings and use it to find similar users or items; items with high similarity are recommended.

With the K-Nearest Neighbors algorithm, we find the K closest data points and predict a rating as the average of those neighbors' ratings.

Finally, we load the data from the database, find each user's nearest neighbors, and use the neighbors' ratings to predict ratings. This workflow can be built as a RapidMiner process, which lets us train the system by running trials on a large chunk of the training data.
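The neighbor-based prediction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ratings` dictionary, the user and item names, and the choice of cosine similarity are all hypothetical and are not taken from the case study.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "ann": {"router": 5, "modem": 3, "cable": 4},
    "bob": {"router": 4, "modem": 3, "cable": 5, "switch": 4},
    "cat": {"router": 1, "modem": 5, "switch": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, k=2):
    """Average the ratings of the k most similar users who rated the item."""
    candidates = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    if not neighbors:
        return None  # nobody else has rated this item
    return sum(rating for _, rating in neighbors) / len(neighbors)
```

For example, `predict_rating("ann", "switch")` averages the "switch" ratings of ann's two nearest neighbors. A production system would usually weight neighbors by similarity and work on a sparse matrix instead of dictionaries.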

As a last step, we treat content-based recommendation as an ML problem: we classify items according to their content to match the user's interests. We also profile users based on the content they consume.


Core Machine Learning Interview Questions

How would you differentiate AI, Machine learning, and deep learning from each other?

Artificial Intelligence is the broad field of techniques that enable machines to mimic human behavior. Machine learning is a subset of AI that uses statistical methods to learn this behavior from data. Deep learning is in turn a subset of machine learning that uses multilayered neural networks trained on vast amounts of data, for tasks such as speech and image recognition, to approximate human decision-making.


Illustrate the different types of Classical Learning?

There are two types of Classical Machine learning:

  • Supervised
  • Unsupervised

Both typically work on simple, tabular data with clearly defined features.


What is Bayes' theorem?

Bayes' theorem lets us compute posterior probabilities from our priors and the evidence. It is a method of revising existing predictions in light of new evidence. The theorem is the foundation of the Naive Bayes classifier, which additionally assumes that the features are conditionally independent given the class.
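A small worked example may help here. The sketch below applies Bayes' theorem to a diagnostic test; the function name `posterior` and the numbers (1% prevalence, 90% sensitivity, 5% false positive rate) are hypothetical values chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior:               P(condition)
    likelihood:          P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    # Total probability of seeing a positive result (the evidence).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays low because the prior is small. Explaining that effect is usually what an interviewer is looking for.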


What is the difference between generative & discriminative models?

Discriminative models learn the decision boundaries between classes. An SVM is discriminative: it learns a decision boundary and acts as a maximum-margin classifier. Discriminative models work with the conditional probability of the label given the features, so probabilistic ones such as logistic regression are fit by maximizing the "conditional likelihood" given the model parameters. They are generally considered more robust to outliers than generative models.


Generative models learn the distribution of the classes themselves. A Naive Bayes classifier (NBC) is generative because it models how the data in each class is distributed. Generative models maximize the joint likelihood, that is, the joint probability of the features and labels given the model parameters. Because they model the full data distribution, they tend to be more sensitive to outliers.


Define Data Normalization and its needs?

Data normalization is a preprocessing step that standardizes our data, and a large share of modeling time goes into cleaning and setting up data this way. It rescales feature values to fit a desired range (for example 0 to 1, or zero mean and unit variance) so that features on large scales do not dominate and training converges faster. (In database design, normalization instead means restructuring tables to eliminate redundancy and improve integrity.) The need for normalization comes down to this: clean data in, clean data out.
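Two common ways to rescale a feature are min-max scaling and z-score standardization. The sketch below uses only the standard library; the function names `min_max_scale` and `z_score` are illustrative choices, not part of any particular library.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


print(min_max_scale([10, 20, 30]))
print(z_score([10, 20, 30]))
```

In practice, the scaling parameters (minimum, maximum, mean, standard deviation) should be computed on the training set only and then reused on the test set, so that no test information leaks into training.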


Which cross-validation technique would you use on a time series dataset?

In normal cross-validation, such as K-fold CV, we split the data into k equal-sized chunks, use k-1 chunks for training and the remaining one for testing, and average the performance over the k tests. With a time series, we cannot train on samples that occurred later than the test point, because that would leak future information into the model. Instead, we use forward chaining: choosing a point as the test set marks everything that occurred before it as the training set.
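A forward-chaining split can be sketched as follows. This is a simplified illustration with a hypothetical helper, `time_series_splits`; library implementations such as scikit-learn's `TimeSeriesSplit` follow the same idea but differ in how they size the folds.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs with an expanding window.

    Every training index precedes every test index, so the model is
    never trained on observations from the future.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = fold * i
        test_end = min(train_end + fold, n_samples)
        yield list(range(train_end)), list(range(train_end, test_end))


for train, test in time_series_splits(n_samples=10, n_splits=4):
    print("train:", train, "test:", test)
```

Each successive fold trains on a longer history and tests on the block that immediately follows it.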


How is a decision tree pruned?

Pruning removes nodes and branches from a decision tree to make it simpler and less prone to overfitting while keeping accuracy as high as possible. Growing a tree purely in pursuit of node purity tends to produce an overly complex tree. In reduced-error pruning, a subtree is replaced with a leaf, and if performance on the validation set does not differ significantly, the simpler tree is kept. Cost-complexity pruning instead penalizes the size of the tree, trading off the error of the original tree against its number of leaves.


How do you handle an imbalanced dataset?

When there is still a lot of data in the under-represented class, we can begin with random undersampling, which removes samples of the over-represented class from the training data. With less data to work with, we perform random oversampling instead, sampling the under-represented class with replacement until the required ratio is reached. SMOTE (Synthetic Minority Over-sampling Technique) goes a step further and synthesizes new minority samples by interpolating between existing ones and their nearest neighbors. Because it creates new points instead of exact copies, it tends to mitigate overfitting to a specific class.
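Random oversampling and undersampling are simple enough to sketch with the standard library. The helper names and the fixed `seed` parameter below are assumptions made for the example; SMOTE itself is not shown, since it needs a nearest-neighbor search.

```python
import random


def random_oversample(minority, target_size, seed=0):
    """Grow the minority class by sampling with replacement."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def random_undersample(majority, target_size, seed=0):
    """Shrink the majority class by sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)


minority = ["fraud_1", "fraud_2", "fraud_3"]      # hypothetical samples
majority = [f"ok_{i}" for i in range(100)]        # hypothetical samples

balanced_up = random_oversample(minority, target_size=10)
balanced_down = random_undersample(majority, target_size=10)
print(len(balanced_up), len(balanced_down))
```

Resampling should be applied to the training split only; the validation and test sets keep the original class ratio so that the reported metrics stay honest.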


Define a Boltzmann Machine?

A Boltzmann machine is a network of stochastic units that make probabilistic decisions about whether a neuron should be on or off. The variant most often used in practice is the Restricted Boltzmann Machine (RBM), a shallow, two-layer neural net with a visible input layer and a hidden layer. In an RBM, nodes form connections across the layers, but no two nodes of the same layer are connected; this restriction is what gives the Restricted Boltzmann Machine its name.


How will you differentiate between Classification and regression in ML?

Classification and regression are both supervised learning problems; clustering and association are their unsupervised counterparts. Classification assigns data to separate categories based on the input parameters. Regression, on the other hand, builds a model that predicts continuous, real-valued outputs. It can predict future movement based on historical data, and it is mostly used to estimate an outcome from its degree of association with other variables.


How will you define Logistic Regression?

Logistic regression is mostly used for classification problems, to predict the group to which an observation belongs. It measures the relationship between the variable we want to predict and the independent variables using the logistic function, which outputs a probability; these probabilities are then translated into binary values to get the prediction.
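The logistic function mentioned above, and the step that turns its output into a binary value, can be sketched as follows. The weights, bias, and the 0.5 threshold are hypothetical; in a real model the weights are learned from training data.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, binary label) for one observation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical one-feature model: z = 1.0 * x - 1.0
probability, label = predict(features=[2.0], weights=[1.0], bias=-1.0)
print(round(probability, 3), label)
```

The contrast with linear regression is the output: a linear model returns the weighted sum `z` directly as a real-valued prediction, while logistic regression passes it through the sigmoid to get a class probability.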


Explain how the Receiver operating characteristic curve works?

The ROC curve has its roots in signal detection theory, which deals with a binary classification problem. It is used to measure the performance of binary classifiers by plotting the trade-off between the true positive rate (sensitivity) and the false positive rate (fall-out) as the decision threshold varies.

At the point where both sensitivity and fall-out are zero, the classifier predicts every case as negative.
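To make the trade-off concrete, the sketch below computes the points of a ROC curve by sweeping the decision threshold over the predicted scores, then estimates the area under the curve with the trapezoidal rule. The labels and scores are made-up values, and the helpers assume that both classes are present in `labels`.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: everything predicted negative
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]                 # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # hypothetical model scores
points = roc_points(labels, scores)
print(points)
print(round(auc(points), 3))
```

The curve starts at (0, 0), where nothing is flagged as positive, and ends at (1, 1), where everything is. An AUC close to 1 means the classifier ranks positives above negatives most of the time.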


What is Auto ML?

AutoML is a relatively new machine learning paradigm that automates the application of algorithms to input data of any type, covering steps such as model selection and tuning, to get results automatically. It simplifies the work of a data scientist with prebuilt techniques.


How would you define the confusion matrix and its elements? Explain with examples.

A Confusion Matrix is a grid used to summarize the performance of classification algorithms.

Imagine we have a medical data grid that has metrics like:

  1. Chest Pain
  2. Good blood circulation
  3. Blocked Arteries
  4. Weight

We have to apply ML to predict whether or not someone will develop heart disease. There are many methods to choose from, such as logistic regression, K-nearest neighbors, and random forests. Deciding which one fits our data best is the critical problem. So we divide the data into training and testing sets (cross-validation makes this quicker and more efficient), train each method on the training set, and then try each one on the testing set. The confusion matrix summarizes how each method performed on the testing data. In the confusion matrix:

  1. Rows correspond to the ML prediction of the algorithm.
  2. Columns correspond to the truth.

Since we have only two categories to choose from:

  1. Heart disease
  2. Does not have heart disease

Then the top-left corner contains the true positives. These are the patients who had heart disease and were correctly identified by the algorithm.

The true negatives are in the bottom-right corner. These are the patients who did not have heart disease and were correctly identified as such.

The bottom-left corner contains the false negatives. These patients had heart disease, but the algorithm said they didn't.

Lastly, the top-right corner contains the false positives. These are the patients who do not have heart disease, but the algorithm said they did.

Comparing the confusion matrices of the methods shows which one performs best, but when the counts end up close to each other the choice is not obvious. That is the reason we use more sophisticated metrics like sensitivity, specificity, ROC, and AUC, which are derived from the matrix and give a more precise picture of performance.
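The four cells of the matrix, and the sensitivity and specificity derived from them, can be computed directly. The sketch below uses made-up labels for eight patients, with 1 meaning "has heart disease" and 0 meaning "does not".

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Share of actual positives that were correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that were correctly identified."""
    return tn / (tn + fp)


actual = [1, 1, 1, 0, 0, 0, 0, 1]       # hypothetical ground truth
predicted = [1, 0, 1, 0, 1, 0, 0, 1]    # hypothetical model output

tp, fp, fn, tn = confusion_counts(actual, predicted)
# Rows are predictions, columns are the truth, as in the layout above.
print([[tp, fp], [fn, tn]])
print(sensitivity(tp, fn), specificity(tn, fp))
```

Printing the matrix as `[[tp, fp], [fn, tn]]` follows the layout described above: predictions on the rows, truth on the columns.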


Differentiate between Inductive and Deductive Learning?

In machine learning, we use data to build a model that predicts. Inductive learning draws general conclusions from observations. Deductive learning works the other way, starting from general rules or premises and inferring specific conclusions. Transductive learning reasons directly from the specific observed cases to specific test cases, without forming a general rule first.


Which would you prefer: too many false positives or too many false negatives?

It depends on the scenario and the domain in which ML is applied. When ML is used to detect spam emails, a false positive marks an important email as spam.

When ML is used in medical testing, a false negative is the riskier error, because it classifies the report as fine when things are not good with the patient.


How can you make a more accurate prediction?

Model accuracy is one component of model performance, and the two generally move together: the better the model performs overall, the more accurate its predictions.


Differentiate between Gini Impurity & Entropy in a Decision tree?

Gini impurity and entropy are the metrics used to decide how to split a decision tree. Entropy measures the uncertainty in the labels at a node; the information gain of a split is the reduction in entropy that the split achieves. Gini impurity is the probability that a randomly picked sample would be misclassified if it were labeled randomly according to the label distribution in its branch.

Entropy measures the disorder or messiness in the data, and it decreases as we get closer to the leaf nodes.
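Both metrics are short formulas over the class proportions at a node. The sketch below implements them, along with information gain for a binary split; the function names and the toy labels are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


parent = ["a", "a", "b", "b"]
print(gini(parent), entropy(parent))
print(information_gain(parent, ["a", "a"], ["b", "b"]))
```

A perfectly mixed two-class node has the highest impurity under both metrics, and a split that separates the classes completely recovers all of the parent's entropy as information gain.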


Illustrate the Ensemble learning technique used in ML?

Ensemble learning combines multiple learning models to get more accurate results. In a typical ensemble, the training data set is split into multiple subsets, and each subset is used to build a separate model. We train these models and then combine their predictions so that the variance is reduced as far as possible.


What are the similarities of Bagging & Boosting in ML?

Bagging and boosting are both ensemble methods that get N learners from one base learner. Both generate several training datasets through random sampling. Both are used to reduce variance and provide scalability. Both reach the final decision by combining the N learners, through averaging or a majority vote.
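The two building blocks that bagging relies on, bootstrap sampling and combining the learners' outputs, can be sketched with the standard library. The helper names are hypothetical, and the learners themselves are left out; only the sampling and voting steps are shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement to form one training subset."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine the class predictions of N learners into one decision."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
data = [1, 2, 3, 4, 5]

# One bootstrap sample per learner (three learners in this sketch).
subsets = [bootstrap_sample(data, rng) for _ in range(3)]
print(subsets)

# Hypothetical predictions from three trained learners for one email.
print(majority_vote(["spam", "ham", "spam"]))
```

Boosting differs in the sampling step: instead of drawing every subset independently, each round gives more weight to the examples that the previous learners got wrong.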


Define Collinearity & Multicollinearity?

Collinearity occurs when two predictor variables in a multiple regression are correlated with each other.

Multicollinearity occurs when more than two predictor variables are inter-correlated.

Elucidate A/B testing?

A/B testing shows different versions of something, such as an ad, a welcome email, or a web page, to different groups of end users. It segments the results into control and variant. This form of hypothesis testing works best for website optimization: visitors are shown different versions of the webpage, and performance data is gathered to see which version does better.


How would you define cluster-random sampling?

In this design, the population is subdivided into naturally occurring groups, called clusters, and whole clusters are then selected at random to form the sample.


What are some of the types of Probability & Non-Probability Sampling?

In probability sampling, every member of the population has a known chance of being selected through random sampling (an equal chance, in the simplest case). Its types are as follows:

  1. Simple Random Sampling
  2. Stratified Random Sampling
  3. Systematic sampling
  4. Cluster Random Sampling

Non-probability sampling techniques collect samples in a way that doesn't give equal chances to all the units in the population and doesn't involve random sampling. Its types are as follows:

  1. Convenience Sampling
  2. Quota Sampling
  3. Judgment Sampling
  4. Snowball Sampling


Interview Questions on Machine learning With Python


List some libraries in Python devised for data analysis?

The most used Data Exploration libraries in Python are as follows:

  1. NumPy: Numerical Python provides arrays, linear algebra functions, and advanced numerical capabilities.
  2. Pandas: used extensively for data munging and data preparation, with operations and manipulations on structured data.
  3. Matplotlib
  4. Seaborn
  5. Bokeh
  6. scikit-learn

These are some of the intensively used libraries in Python for data exploration.


Can you signify the relationship between NumPy & SciPy?

SciPy is built on top of NumPy. NumPy provides the array type along with linear algebra functions, advanced numerical capabilities, indexing, sorting, and reshaping. SciPy extends NumPy's functionality with higher-level computations such as optimization and numerical integration.


What are the two data structures in Pandas?

Pandas is built around two crucial data structures, and they are as follows:

  1. Series: a one-dimensional labeled array.
  2. DataFrame: a two-dimensional labeled table.


Define the usage of the Review Process?

It forms the crux of continuous delivery in machine learning. With the help of the review process, we can alter the data without changing the underlying dataset.


What is Polymorphism, and what are its types?

In Python, polymorphism allows methods with the same name to take different forms in different child classes of the same parent class. The two types of polymorphism are compile-time polymorphism and run-time polymorphism. Because Python is dynamically typed, it mainly exhibits the run-time kind, through method overriding and duck typing.
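A short example of run-time polymorphism through method overriding is shown below. The class names (`Model`, `AlwaysZero`, `Threshold`) are hypothetical and exist only for this illustration.

```python
class Model:
    """Parent class that defines the common interface."""

    def predict(self, x):
        raise NotImplementedError


class AlwaysZero(Model):
    """Child class: a trivial baseline that ignores its input."""

    def predict(self, x):
        return 0


class Threshold(Model):
    """Child class: predicts 1 when the input reaches a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, x):
        return int(x >= self.cutoff)


def run_all(models, x):
    # The same call takes a different form depending on the object's class.
    return [model.predict(x) for model in models]


print(run_all([AlwaysZero(), Threshold(0.5)], 0.7))
```

The caller never checks which class it is dealing with; the method that runs is resolved when the call is made, based on the object's class.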


What are Lambda functions in Python?

In Python, lambda functions are lightweight, nameless (anonymous) inline functions. They accept any number of arguments but are restricted to a single expression.
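A few typical uses are sketched below. The variable names and the model scores are made up for the example.

```python
# A lambda bound to a name; equivalent to a one-line def.
square = lambda x: x * x

# Lambdas accept defaults and variable arguments like any other function.
add = lambda a, b=0, *rest: a + b + sum(rest)

# The most common use: a throwaway key function passed inline.
pairs = [("svm", 0.81), ("knn", 0.77), ("tree", 0.85)]  # hypothetical scores
best_first = sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(square(4), add(1, 2, 3))
print(best_first)
```

When a function needs statements, several expressions, or a docstring, a regular `def` is the better choice; a lambda fits short, single-expression callbacks.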




Machine learning is evolving with each passing day. There was a time when people could not have imagined computers in their wildest dreams, and today we are frightened by the thought of robots taking over our jobs. It is gloomy and exciting at the same time. Nevertheless, we should focus on the bright side. The questions we have discussed here are important for candidates aiming for a job that requires knowledge of machine learning.

We are Sprintzeal, a globally recognized ATO (accredited training organization). Our knowledge engineering training course is designed specifically for professionals with a keen interest in understanding artificial intelligence and machine learning. We also offer training for multiple globally recognized IT and business certifications. You can choose from our wide range of courses and enhance your career.




About the Author

Sprintzeal   Aradhya Kumar

With years of experience and a vast amount of knowledge in Project Management, Agile Management, Scrum, and other popular domains, Aradhya Kumar is well-versed in creating content for audiences from various fields and industries.
