October 16, 2018

Srikaanth

Hexacta Most Frequently Asked Data Science Interview Questions Answers

What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
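As a minimal sketch of the idea, the least-squares line for one predictor can be computed from the sample means. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original answer.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X.

    fit_line is a hypothetical helper name used only for this illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on the line y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = intercept + slope * 5  # predicted Y for X = 5
```

Here X is the predictor and Y the criterion variable, matching the definitions above.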

What are Interpolation and Extrapolation?

Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value by extending a known set of values or facts beyond their observed range.
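A small sketch, assuming the simplest (linear) case: the same straight-line formula interpolates when the query point lies between the two known points and extrapolates when it lies outside them. The name `linear_estimate` and the sample points are illustrative assumptions.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1).

    linear_estimate is a hypothetical helper name for this illustration.
    """
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Known points: (1, 10) and (3, 30)
inside = linear_estimate(1, 10, 3, 30, 2)   # interpolation: x is between 1 and 3
outside = linear_estimate(1, 10, 3, 30, 5)  # extrapolation: x is beyond 3
```

Extrapolated values are generally less trustworthy, because nothing in the known data confirms that the trend continues outside the observed range.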

A coin of diameter 1 inch is thrown on a table covered with a grid of lines spaced two inches apart. What is the probability that the coin lands inside a square without touching any of the grid lines? You can assume that the person throwing has no skill in throwing the coin and is throwing it randomly.


A) 1/2

B) 1/4

C) π/3

D) 1/3

Ans: (B)

Think about where the center of the coin can be when it lands on the 2-inch grid without touching the grid lines. The coin has a radius of 1/2 inch, so its center must be at least 1/2 inch away from every line.

Inside each 2-inch square, that leaves a centered 1-inch square (the yellow region in the original figure). If the center falls in this region, the coin will not touch a grid line. Since the total area of the square is 4 and the area of the inner region is 1, the probability is 1/4.
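The area argument can be checked with a short sketch: an exact ratio of areas plus a seeded Monte Carlo simulation of where the coin's center lands inside one square. The function names, the trial count, and the seed are illustrative assumptions.

```python
import random


def exact_probability(spacing=2.0, diameter=1.0):
    """Ratio of the safe inner square's area to the full grid square's area."""
    return ((spacing - diameter) / spacing) ** 2


def simulated_probability(trials=100_000, spacing=2.0, diameter=1.0, seed=0):
    """Estimate the same probability by dropping the coin's center at random."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    radius = diameter / 2
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, spacing)
        y = rng.uniform(0, spacing)
        # The coin misses every line if its center is at least one radius
        # away from all four sides of the square.
        if radius < x < spacing - radius and radius < y < spacing - radius:
            hits += 1
    return hits / trials


exact = exact_probability()
estimate = simulated_probability()
```

The simulated estimate should land close to the exact value of 1/4, with the gap shrinking as the number of trials grows.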

Consider the following probability density function:

$f(x) = \frac{1}{8} e^{-x/8}$ for $x \ge 0$

What is the probability that X ≤ 6, i.e. P(X ≤ 6)?

A) 0.3935
B) 0.5276
C) 0.1341
D) 0.4724

Ans: (B)

To find the probability over a region of a probability density function, integrate the function between the bounds of that region.

Integrating the given function from 0 to 6 gives $\int_0^6 \frac{1}{8} e^{-x/8}\,dx = 1 - e^{-6/8} \approx 0.5276$.
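The result can be double-checked numerically. The sketch below compares the closed-form value 1 - e^(-x/8) with a simple midpoint-rule integral of the density; the function names and the step count are illustrative assumptions.

```python
import math


def pdf(x, scale=8.0):
    """Density f(x) = (1/8) * exp(-x/8) for x >= 0."""
    return math.exp(-x / scale) / scale


def cdf(x, scale=8.0):
    """Closed-form P(X <= x), obtained by integrating pdf from 0 to x."""
    return 1.0 - math.exp(-x / scale)


def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


closed_form = cdf(6)
numeric = midpoint_integral(pdf, 0, 6)
```

Both values round to 0.5276, which matches option (B).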

Do gradient descent methods always converge to the same point?

No, they do not. In some cases gradient descent reaches a local minimum (a local optimum) rather than the global optimum. Where it ends up depends on the data and the starting conditions.
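A minimal sketch of this behavior, assuming a made-up one-dimensional function f(x) = x^4 - 3x^2 + x that has one local and one global minimum. The function, the learning rate, and the starting points are illustrative choices, not taken from the original answer.

```python
def f(x):
    """Illustrative non-convex function with two separate minima."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


from_right = gradient_descent(2.0)   # settles in the minimum on the right
from_left = gradient_descent(-2.0)   # settles in the minimum on the left
```

The two runs differ only in their starting point, yet they stop at different minima. The run started on the left reaches the lower value of f, so the run started on the right is stuck in a local minimum.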

During analysis, how do you treat missing values?

First identify which variables have missing values, then measure the extent of the missing values in each. If any patterns are identified, the analyst should concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with the mean or median (imputation), or they can simply be ignored. There are various factors to consider when answering this question:

Understand the problem statement and the data before giving an answer. Getting into the data is important.
A missing value can be assigned a default value, such as the mean, minimum, or maximum.
If it is a categorical variable, the missing value is assigned a default value.
If the data follow a normal distribution, use the mean value.
Whether missing values should be treated at all is another important point. If 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.
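The points above can be sketched as a small helper that either drops a variable or imputes it. This assumes missing values are represented as `None`, uses the median as the fill value, and takes the 80% figure from the text as the drop threshold; the name `treat_missing` is hypothetical.

```python
from statistics import median


def treat_missing(values, drop_threshold=0.8):
    """Return None if the variable should be dropped, else a median-imputed copy.

    Assumptions for this sketch: missing entries are None, and the variable
    is dropped once the missing share reaches drop_threshold.
    """
    n_missing = sum(1 for v in values if v is None)
    # Nothing to impute from, or too much is missing: drop the variable
    if n_missing == len(values) or n_missing / len(values) >= drop_threshold:
        return None
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


imputed = treat_missing([1, None, 3, 5])            # one gap: fill with the median
dropped = treat_missing([None, None, None, None, 1])  # 80% missing: drop
```

In practice the choice between mean, median, or a category default depends on the variable's type and distribution, as noted above.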

Explain the Box-Cox transformation in regression models.

The response variable in a regression analysis might not satisfy one or more assumptions of ordinary least squares regression. The residuals could either curve as the prediction increases or follow a skewed distribution. In such scenarios, it is necessary to transform the response variable so that the data meet the required assumptions. A Box-Cox transformation is a statistical technique that transforms a non-normal dependent variable toward a normal shape. Many statistical techniques assume normality, so they cannot be applied reliably when the data are not normal. Applying a Box-Cox transformation means that you can run a broader range of tests.
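For reference, a minimal sketch of the transformation itself: for a parameter lambda, each positive value y becomes (y^lambda - 1) / lambda, or log(y) when lambda is 0. In practice lambda is usually estimated from the data; here it is simply passed in, and the name `box_cox` is an illustrative choice.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform with a given lambda to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


log_case = box_cox([1.0, math.e], 0)  # lambda = 0 reduces to log(y)
sqrt_case = box_cox([4.0], 0.5)       # lambda = 0.5 is a shifted square root
```

The response variable must be strictly positive for the transform to be defined, which is why the sketch rejects zero or negative inputs.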

Can you use machine learning for time series analysis?

Yes, machine learning can be used for time series analysis, but whether it is appropriate depends on the application.

Write a function that takes in two sorted lists and outputs a sorted list that is their union.

The first solution that will come to mind is to merge the two lists and sort them afterwards.

Python code-
def return_union(list_a, list_b):
    return sorted(list_a + list_b)

R code-
return_union <- function(list_a, list_b)
{
  list_c <- list(c(unlist(list_a), unlist(list_b)))
  return(list(list_c[[1]][order(list_c[[1]])]))
}

Generally, the tricky part of the question is not to use any sorting or ordering function. In that case you will have to write your own logic to answer the question and impress your interviewer.

Python code-
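def return_union(list_a, list_b):
    """Merge two sorted lists into one sorted list without calling sort."""
    len1 = len(list_a)
    len2 = len(list_b)
    final_sorted_list = []
    j = 0  # next unread position in list_b
    k = 0  # next unread position in list_a

    for i in range(len1+len2):
        if k == len1:
            # list_a is exhausted: append the rest of list_b
            final_sorted_list.extend(list_b[j:])
            break
        elif j == len2:
            # list_b is exhausted: append the rest of list_a
            final_sorted_list.extend(list_a[k:])
            break
        elif list_a[k] < list_b[j]:
            final_sorted_list.append(list_a[k])
            k += 1
        else:
            final_sorted_list.append(list_b[j])
            j += 1
    return final_sorted_list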

A similar function can be written in R by following the same steps.

return_union <- function(list_a, list_b)
{
  # Lengths of the two inputs and of the merged result
  len_a <- length(list_a)
  len_b <- length(list_b)
  len <- len_a + len_b

  # Position counters for list_a (j) and list_b (k)
  j <- 1
  k <- 1

  # Pre-allocate an empty list with one slot per output element
  list_c <- vector("list", len)

  # seq_len() gives an empty sequence when both inputs are empty
  for (i in seq_len(len))
  {
    if (j > len_a)
    {
      # list_a is exhausted: copy the rest of list_b
      list_c[i:len] <- list_b[k:len_b]
      break
    }
    else if (k > len_b)
    {
      # list_b is exhausted: copy the rest of list_a
      list_c[i:len] <- list_a[j:len_a]
      break
    }
    else if (list_a[[j]] <= list_b[[k]])
    {
      list_c[[i]] <- list_a[[j]]
      j <- j + 1
    }
    else
    {
      list_c[[i]] <- list_b[[k]]
      k <- k + 1
    }
  }
  return(list(unlist(list_c)))
}

https://mytecbooks.blogspot.com/2018/10/hexacta-most-frequently-asked-data.html