NumPy Interview Questions: Arrays, Broadcasting, Views, Copies, Random, and Practical Coding
NumPy interviews usually test more than function names.
They check whether you understand how arrays are stored, why vectorization is faster than Python loops, when slicing creates a view, when indexing creates a copy, how broadcasting works, and how to solve small data problems without writing unnecessary loops.
This guide is written as a practical interview-preparation file. Each answer is original and uses simple examples from analytics, student scores, product data, images, and machine learning workflows.
You will prepare for questions about:
- ndarray, shape, dtype, ndim, size, itemsize, and strides
- arrays vs Python lists
- vectorization and ufuncs
- views, copies, shallow-looking array behavior, and .base
- basic indexing, boolean indexing, and fancy indexing
- broadcasting rules and np.newaxis
- axis-based aggregation
- sorting, ranking, filtering, clipping, and set operations
- random number generation with default_rng
- allclose and floating-point comparison
- meshgrid, swapaxes, tile, repeat, and count_nonzero
- image-like arrays
- structured arrays
- saving and loading NumPy data
- code-output and debugging interview tasks
How To Use This Guide
Read each question and try to answer it before reading the explanation.
For code-output questions, write the output on paper first. In interviews, the goal is not only to know NumPy functions. The goal is to reason from:
shape + dtype + axis + indexing rule
If you can explain those four things clearly, most NumPy interview questions become manageable.
1. What Is NumPy?
NumPy is a Python library for numerical computing.
Its main object is the ndarray, which stores values in a compact, typed, multidimensional array.
Interview answer:
NumPy is used for fast numerical operations on arrays. It provides vectorized operations, broadcasting, efficient memory storage, mathematical functions, random number generation, and linear algebra tools. Many libraries such as Pandas, scikit-learn, TensorFlow, PyTorch, and image-processing tools use NumPy-style arrays internally or at their boundaries.
Example:
import numpy as np
prices = np.array([100, 150, 200])
discounted = prices * 0.9
print(discounted)
Output:
[ 90. 135. 180.]
2. What Is An ndarray?
An ndarray is NumPy's n-dimensional array object.
It is usually:
- multidimensional
- homogeneous, meaning values normally share one dtype
- fixed-size after creation
- optimized for numerical operations
Example:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(type(arr))
print(arr.shape)
print(arr.dtype)
Output:
<class 'numpy.ndarray'>
(2, 3)
int64
The exact integer dtype may be int32 on some systems.
3. How Is A NumPy Array Different From A Python List?
A Python list stores references to Python objects. It can hold mixed types and can grow dynamically.
A NumPy array stores fixed-size values of a common dtype in a compact memory layout.
Interview answer:
Lists are general-purpose containers. NumPy arrays are specialized numerical containers. Arrays support vectorized operations, use memory more compactly for numerical data, and work naturally with multidimensional shapes.
Example:
print([1, 2, 3] * 2)
print(np.array([1, 2, 3]) * 2)
Output:
[1, 2, 3, 1, 2, 3]
[2 4 6]
4. What Do shape, ndim, size, dtype, And itemsize Mean?
Use this array:
data = np.array([
    [10, 20, 30],
    [40, 50, 60],
], dtype=np.int32)
Attributes:
print(data.shape)
print(data.ndim)
print(data.size)
print(data.dtype)
print(data.itemsize)
Output:
(2, 3)
2
6
int32
4
Meaning:
- shape: size along each dimension
- ndim: number of dimensions
- size: total number of elements
- dtype: data type of each element
- itemsize: bytes used by one element
5. What Are Strides?
Strides tell NumPy how many bytes to move in memory to reach the next element along each axis.
Example:
arr = np.arange(12, dtype=np.int32).reshape(3, 4)
print(arr)
print(arr.strides)
Possible output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
(16, 4)
Why?
- Each int32 value uses 4 bytes.
- Moving one column moves 4 bytes.
- Moving one row moves 4 columns x 4 bytes = 16 bytes.
Interview answer:
Strides describe how an n-dimensional index maps to the underlying memory buffer. They are one reason NumPy can create views, transpose arrays, and slice arrays without always copying data.
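As a small sketch of how strides allow a transpose without copying, the example below (variable names are illustrative) compares the strides of an array and its transpose and uses np.shares_memory to confirm both look at the same buffer.

```python
import numpy as np

arr = np.arange(12, dtype=np.int32).reshape(3, 4)
transposed = arr.T  # same buffer, axes swapped

print(arr.strides)         # (16, 4)
print(transposed.strides)  # (4, 16)
print(np.shares_memory(arr, transposed))  # True
```

The transpose only swaps the stride tuple; no element is moved in memory.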
6. What Is A View?
A view is a new array object that looks at the same underlying data as another array.
Example:
arr = np.array([10, 20, 30, 40])
view = arr[1:3]
view[0] = 999
print(arr)
print(view)
Output:
[ 10 999  30  40]
[999  30]
Changing the view changed the original.
Interview answer:
A view shares the original array's data buffer. It can be faster and memory-efficient, but changes through one array may appear in the other.
7. What Is A Copy?
A copy owns separate data.
Example:
arr = np.array([10, 20, 30, 40])
copy_arr = arr[1:3].copy()
copy_arr[0] = 999
print(arr)
print(copy_arr)
Output:
[10 20 30 40]
[999  30]
The original did not change.
8. How Can You Check Whether An Array Is A View?
Use .base as a learning/debugging clue.
arr = np.arange(6)
view = arr[1:4]
copy_arr = arr[[1, 2, 3]]
print(view.base is arr)
print(copy_arr.base is None)
Output:
True
True
Important interview point:
- basic slicing usually creates views
- advanced indexing usually creates copies
9. What Is The Difference Between Basic Indexing And Advanced Indexing?
Basic indexing uses integers, slices, ellipsis, and None or np.newaxis.
Advanced indexing uses integer arrays or boolean arrays.
Example:
arr = np.arange(9).reshape(3, 3)
basic = arr[1:, :]
advanced = arr[[1, 2], :]
print(basic)
print(advanced)
Both may look similar, but their memory behavior differs.
Interview answer:
Basic slicing generally returns a view. Advanced indexing generally returns a copy. This matters for memory usage and whether changes affect the original array.
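To make the memory difference visible, here is a minimal sketch that writes through both results and then inspects the original. It assumes the same 3x3 array as the example above and uses np.shares_memory as the check.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

basic = arr[1:, :]         # basic slicing: a view
advanced = arr[[1, 2], :]  # advanced indexing: a copy

basic[0, 0] = -1     # writes through to arr
advanced[0, 1] = -2  # only changes the copy

print(arr[1])                           # [-1  4  5]
print(np.shares_memory(arr, basic))     # True
print(np.shares_memory(arr, advanced))  # False
```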
10. What Is Boolean Indexing?
Boolean indexing selects values where a condition is true.
scores = np.array([45, 72, 88, 39, 91])
passed = scores[scores >= 50]
print(passed)
Output:
[72 88 91]
The condition creates a boolean mask:
print(scores >= 50)
Output:
[False  True  True False  True]
11. What Is Fancy Indexing?
Fancy indexing selects values using arrays or lists of indexes.
scores = np.array([45, 72, 88, 39, 91])
selected = scores[[0, 2, 4]]
print(selected)
Output:
[45 88 91]
Fancy indexing is useful for selecting specific rows, columns, or records.
12. What Is Broadcasting?
Broadcasting is NumPy's rule for operating on arrays with different shapes.
Example:
matrix = np.array([
[10, 20, 30],
[40, 50, 60],
])
bonus = np.array([1, 2, 3])
print(matrix + bonus)
Output:
[[11 22 33]
 [41 52 63]]
The 1D bonus array is applied across each row.
Interview answer:
Broadcasting lets NumPy perform element-wise operations on compatible shapes without physically copying the smaller array across the larger one.
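One way to see the "no physical copy" part is np.broadcast_to, which returns a read-only view with the broadcast shape. The sketch below assumes an int64 array so the stride values are predictable; a stride of 0 means the same row is re-read for every output row.

```python
import numpy as np

bonus = np.array([1, 2, 3], dtype=np.int64)
expanded = np.broadcast_to(bonus, (2, 3))  # read-only broadcast view

print(expanded.shape)    # (2, 3)
print(expanded.strides)  # (0, 8)
print(np.shares_memory(bonus, expanded))  # True
```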
13. What Are The Broadcasting Rules?
Compare shapes from right to left.
Two dimensions are compatible if:
- they are equal, or
- one of them is 1
A missing leading dimension is treated as 1.
Example:
(4, 3)
(   3)
Compatible because the last dimension is 3 in both shapes.
Example:
(4, 3)
(4,)
Not compatible because the trailing dimensions are 3 and 4.
Code:
a = np.zeros((4, 3))
b = np.array([1, 2, 3])
print((a + b).shape)
Output:
(4, 3)
14. How Does np.newaxis Help Broadcasting?
np.newaxis adds a dimension of size 1.
row = np.array([1, 2, 3])
column = row[:, np.newaxis]
print(row.shape)
print(column.shape)
Output:
(3,)
(3, 1)
Create an outer addition table:
a = np.array([10, 20, 30])
b = np.array([1, 2, 3, 4])
result = a[:, np.newaxis] + b
print(result)
Output:
[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]
15. What Is Vectorization?
Vectorization means applying operations to entire arrays instead of writing Python loops over individual elements.
Loop version:
values = [10, 20, 30]
result = []
for value in values:
    result.append(value * 2)
NumPy version:
values = np.array([10, 20, 30])
result = values * 2
Interview answer:
Vectorization is faster because NumPy performs the loop in optimized compiled code and avoids much of the overhead of Python-level iteration.
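If an interviewer asks you to back this up, a rough timing sketch like the one below is usually enough. The helper names and the array size are illustrative assumptions, and the measured times vary by machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

values_list = list(range(100_000))
values_arr = np.arange(100_000)

def double_with_loop():
    # Python-level iteration: one interpreter step per element
    return [v * 2 for v in values_list]

def double_with_numpy():
    # Single vectorized call: the loop runs in compiled code
    return values_arr * 2

loop_time = timeit.timeit(double_with_loop, number=20)
numpy_time = timeit.timeit(double_with_numpy, number=20)
print(f"loop:  {loop_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")
```

Both functions compute the same values; only where the loop runs differs.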
16. What Are Ufuncs?
Ufunc means universal function.
Ufuncs apply element-wise operations efficiently.
Examples:
arr = np.array([1, 4, 9, 16])
print(np.sqrt(arr))
print(np.add(arr, 10))
Output:
[1. 2. 3. 4.]
[11 14 19 26]
Common ufuncs include:
- np.add
- np.subtract
- np.multiply
- np.divide
- np.sqrt
- np.exp
- np.log
- np.sin
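Binary ufuncs such as np.add and np.multiply also carry methods like reduce, accumulate, and outer, which sometimes come up as a follow-up question. A brief sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

total = np.add.reduce(arr)        # same result as arr.sum()
running = np.add.accumulate(arr)  # same result as arr.cumsum()
table = np.multiply.outer([1, 2], [10, 20, 30])  # all pairwise products

print(total)    # 30
print(running)  # [ 1  5 14 30]
print(table)
```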
17. What Is The Difference Between axis=0 And axis=1?
For a 2D array:
- axis=0 works down rows and returns one value per column
- axis=1 works across columns and returns one value per row
Example:
marks = np.array([
[70, 80, 90],
[60, 75, 85],
])
print(marks.sum(axis=0))
print(marks.sum(axis=1))
Output:
[130 155 175]
[240 220]
Interview shortcut:
The axis you pass is the axis that gets reduced.
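The shortcut extends beyond 2D. As a quick sketch with a hypothetical 3D array, the reduced axis simply disappears from the result shape:

```python
import numpy as np

cube = np.zeros((2, 3, 4))

print(cube.sum(axis=0).shape)       # (3, 4)
print(cube.sum(axis=1).shape)       # (2, 4)
print(cube.sum(axis=2).shape)       # (2, 3)
print(cube.sum(axis=(0, 2)).shape)  # (3,)
```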
18. Why Use keepdims=True?
keepdims=True keeps reduced axes as dimensions of size 1.
This is useful for broadcasting.
marks = np.array([
[70, 80, 90],
[60, 75, 85],
])
row_mean = marks.mean(axis=1, keepdims=True)
centered = marks - row_mean
print(row_mean.shape)
print(centered)
Output:
(2, 1)
[[-10.           0.          10.        ]
 [-13.33333333   1.66666667  11.66666667]]
Without keepdims=True, broadcasting may fail or mean something different.
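Both failure modes can be shown in a few lines. The sketch below reuses the marks array for the error case and adds a hypothetical square matrix for the silent case, where the shapes happen to be compatible but the subtraction runs along the wrong axis.

```python
import numpy as np

marks = np.array([[70, 80, 90], [60, 75, 85]])
row_mean = marks.mean(axis=1)  # shape (2,), no keepdims

try:
    marks - row_mean  # (2, 3) vs (2,): trailing 3 and 2 do not match
    failed = False
except ValueError:
    failed = True
print(failed)  # True

square = np.array([[1.0, 2.0], [3.0, 5.0]])
wrong = square - square.mean(axis=1)                 # row means applied per column
right = square - square.mean(axis=1, keepdims=True)  # row means applied per row
print(wrong)
print(right)
```

Only the keepdims version produces rows that each sum to zero.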
19. What Is reshape()?
reshape() changes the shape of an array without changing the number of elements.
arr = np.arange(12)
matrix = arr.reshape(3, 4)
print(matrix)
Output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
This fails:
np.arange(12).reshape(5, 3)
because 12 values cannot fill 15 positions.
20. Does reshape() Return A View Or A Copy?
Often, reshape() can return a view, but it depends on memory layout.
Interview answer:
reshape() returns a view when possible. If the requested shape cannot be represented with compatible strides, NumPy makes a copy instead. Assigning a new shape in place through arr.shape raises an error in that situation rather than copying.
Practical advice:
arr = np.arange(12)
reshaped = arr.reshape(3, 4)
print(reshaped.base is arr)
Use .base only as a learning/debugging tool, not as business logic.
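A case where reshape() has to copy is flattening a transposed array, because the transposed element order cannot be described by a single stride. A minimal sketch, using np.shares_memory as the check:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

flat_view = arr.reshape(-1)    # contiguous layout: can be a view
flat_copy = arr.T.reshape(-1)  # transposed layout: needs a copy

print(np.shares_memory(arr, flat_view))  # True
print(np.shares_memory(arr, flat_copy))  # False
print(flat_copy)                         # [0 3 1 4 2 5]
```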
21. What Is The Difference Between ravel() And flatten()?
Both convert an array to 1D.
Important difference:
- ravel() returns a view when possible
- flatten() always returns a copy
Example:
matrix = np.arange(6).reshape(2, 3)
flat_view = matrix.ravel()
flat_copy = matrix.flatten()
flat_view[0] = 999
flat_copy[1] = 888
print(matrix)
Output:
[[999   1   2]
 [  3   4   5]]
flat_copy did not affect the original.
22. What Is The Difference Between transpose, .T, And swapaxes?
For 2D arrays, .T and transpose() both swap rows and columns.
matrix = np.array([
[1, 2, 3],
[4, 5, 6],
])
print(matrix.T)
Output:
[[1 4]
 [2 5]
 [3 6]]
For higher dimensions, swapaxes() swaps two chosen axes.
arr = np.zeros((2, 3, 4))
print(np.swapaxes(arr, 0, 2).shape)
Output:
(4, 3, 2)
Interview answer:
.T reverses axes. transpose() can reorder axes explicitly. swapaxes() swaps exactly two axes.
23. What Is np.expand_dims()?
np.expand_dims() inserts a new axis.
arr = np.array([10, 20, 30])
row = np.expand_dims(arr, axis=0)
column = np.expand_dims(arr, axis=1)
print(row.shape)
print(column.shape)
Output:
(1, 3)
(3, 1)
It is commonly used when a model expects a batch dimension.
24. What Is np.squeeze()?
np.squeeze() removes axes of length 1.
arr = np.zeros((1, 3, 1, 4))
print(np.squeeze(arr).shape)
Output:
(3, 4)
Use it carefully. Removing a batch dimension accidentally can break model input shapes.
25. What Is The Difference Between np.concatenate, vstack, hstack, And stack?
concatenate joins arrays along an existing axis.
a = np.array([[1, 2]])
b = np.array([[3, 4]])
print(np.concatenate((a, b), axis=0))
Output:
[[1 2]
 [3 4]]
vstack stacks vertically.
hstack stacks horizontally.
stack creates a new axis.
x = np.array([1, 2])
y = np.array([3, 4])
print(np.stack((x, y), axis=0))
print(np.stack((x, y), axis=1))
Output:
[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
Interview answer:
Use concatenate when joining along an existing dimension. Use stack when creating a new dimension.
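The section mentions vstack and hstack without showing them, so here is a short sketch on the same 1D inputs. For 1D arrays, vstack treats each input as a row, while hstack joins them end to end like concatenate.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])

stacked_rows = np.vstack((x, y))  # shape (2, 2)
joined = np.hstack((x, y))        # shape (4,)

print(stacked_rows)
print(joined)                    # [1 2 3 4]
print(np.concatenate((x, y)))    # [1 2 3 4]
```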
26. What Is The Difference Between np.tile() And np.repeat()?
tile() repeats the whole array pattern.
arr = np.array([1, 2, 3])
print(np.tile(arr, 2))
Output:
[1 2 3 1 2 3]
repeat() repeats individual elements.
print(np.repeat(arr, 2))
Output:
[1 1 2 2 3 3]
For 2D arrays, axis controls the direction for repeat.
matrix = np.array([[1, 2], [3, 4]])
print(np.repeat(matrix, 2, axis=0))
Output:
[[1 2]
 [1 2]
 [3 4]
 [3 4]]
27. What Is np.where()?
np.where() has two common uses.
Find positions:
scores = np.array([45, 80, 62, 30])
print(np.where(scores >= 60))
Output:
(array([1, 2]),)
Choose values conditionally:
labels = np.where(scores >= 60, "pass", "retry")
print(labels)
Output:
['retry' 'pass' 'pass' 'retry']
28. What Is np.clip()?
np.clip() limits values to a minimum and maximum.
values = np.array([-5, 10, 50, 120])
print(np.clip(values, 0, 100))
Output:
[  0  10  50 100]
Use it for outlier control, image pixel limits, probability bounds, and safe feature ranges.
29. What Is np.count_nonzero()?
It counts non-zero values.
arr = np.array([
[1, 0, 3],
[0, 0, 6],
])
print(np.count_nonzero(arr))
print(np.count_nonzero(arr, axis=0))
print(np.count_nonzero(arr, axis=1))
Output:
3
[1 0 2]
[2 1]
It is often used to count true values because True behaves like 1 and False like 0.
scores = np.array([45, 80, 62, 30])
print(np.count_nonzero(scores >= 60))
Output:
2
30. What Is np.allclose() And Why Is It Important?
Floating-point values can have tiny precision differences.
Do not compare floats using exact equality when small numerical error is expected.
a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(a == b)
print(np.allclose(a, b))
Output:
[False]
True
Interview answer:
np.allclose() checks whether arrays are element-wise equal within a tolerance. It is useful for testing numerical code where tiny floating-point differences are acceptable.
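The tolerance check is |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The sketch below uses illustrative values to show why atol matters when comparing against zero, and uses np.isclose for the element-wise version.

```python
import numpy as np

tiny_ok = np.allclose([1e-9], [0.0])               # within default atol
tiny_fail = np.allclose([1e-7], [0.0])             # exceeds default atol
tiny_loose = np.allclose([1e-7], [0.0], atol=1e-6) # passes with a larger atol
per_element = np.isclose([1.0, 1.0001], [1.0, 1.0])

print(tiny_ok)      # True
print(tiny_fail)    # False
print(tiny_loose)   # True
print(per_element)  # [ True False]
```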
31. What Is The Difference Between np.random.seed() And default_rng()?
np.random.seed() controls legacy global random state.
Modern NumPy code should prefer np.random.default_rng().
rng = np.random.default_rng(42)
print(rng.integers(1, 10, size=5))
Interview answer:
default_rng() creates an independent random generator object. It avoids relying on shared global state and is the recommended approach for new code.
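A small sketch of what "independent" means in practice: two generators built from the same seed produce the same sequence, and drawing from one does not advance the other. The seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first_a = rng_a.integers(1, 10, size=5)
first_b = rng_b.integers(1, 10, size=5)
print(np.array_equal(first_a, first_b))  # True

rng_a.random(100)  # extra draws on rng_a only
second_b = rng_b.integers(1, 10, size=5)

# A fresh generator with the same seed reproduces rng_b's second draw
reference = np.random.default_rng(42)
reference.integers(1, 10, size=5)
expected_second = reference.integers(1, 10, size=5)
print(np.array_equal(second_b, expected_second))  # True
```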
32. How Do You Generate Random Integers, Uniform Values, And Normal Values?
rng = np.random.default_rng(7)
integers = rng.integers(1, 101, size=(2, 3))
uniform_values = rng.uniform(0, 1, size=5)
normal_values = rng.normal(loc=0, scale=1, size=5)
print(integers)
print(uniform_values)
print(normal_values)
Use:
- integers for random integer ranges
- uniform for continuous values in a range
- normal for Gaussian-like data
33. What Is The Difference Between shuffle() And choice()?
shuffle() rearranges an array in place.
rng = np.random.default_rng(10)
arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)
print(arr)
choice() samples values.
rng = np.random.default_rng(10)
arr = np.array([1, 2, 3, 4, 5])
print(rng.choice(arr, size=3, replace=False))
Use replace=False when the same item should not be selected twice.
34. What Is np.meshgrid()?
meshgrid() creates coordinate grids from coordinate vectors.
x = np.array([0, 1, 2])
y = np.array([10, 20])
xx, yy = np.meshgrid(x, y)
print(xx)
print(yy)
Output:
[[0 1 2]
 [0 1 2]]
[[10 10 10]
 [20 20 20]]
Interview answer:
meshgrid() is useful for evaluating a function on a 2D grid, plotting surfaces, creating coordinate maps, or generating image-style coordinate arrays.
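A common follow-up is the indexing parameter. As a sketch on the same inputs, the default indexing="xy" gives grids of shape (len(y), len(x)), while indexing="ij" gives (len(x), len(y)):

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

xx_xy, yy_xy = np.meshgrid(x, y)                 # default indexing="xy"
xx_ij, yy_ij = np.meshgrid(x, y, indexing="ij")  # matrix-style indexing

print(xx_xy.shape)  # (2, 3)
print(xx_ij.shape)  # (3, 2)
print(np.array_equal(xx_ij, xx_xy.T))  # True
```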
35. What Are Structured Arrays?
Structured arrays let each element contain named fields.
students = np.array(
[
("Asha", 92, 8.7, True),
("Ravi", 78, 7.9, False),
],
dtype=[
("name", "U20"),
("score", "i4"),
("cgpa", "f4"),
("placed", "?"),
],
)
print(students["name"])
print(students["score"])
Output:
['Asha' 'Ravi']
[92 78]
Interview answer:
Structured arrays are useful when each record has named fields, but for general tabular analytics Pandas is often more convenient.
36. How Are Images Represented As NumPy Arrays?
A grayscale image can be a 2D array:
(height, width)
A color image is often a 3D array:
(height, width, channels)
For RGB images, channels are usually 3.
Common operations:
image = np.zeros((100, 200, 3), dtype=np.uint8)
print(image.shape)
print(image.dtype)
Output:
(100, 200, 3)
uint8
Examples:
flipped_vertical = np.flip(image, axis=0)
flipped_horizontal = np.flip(image, axis=1)
darkened = np.clip(image * 0.7, 0, 255).astype(np.uint8)
negative = 255 - image
cropped = image[20:80, 50:150]
37. What Is The Difference Between np.save, np.load, And np.savetxt?
np.save() stores one array in NumPy's binary .npy format.
arr = np.array([1, 2, 3])
np.save("numbers.npy", arr)
loaded = np.load("numbers.npy")
print(loaded)
np.savetxt() stores text data such as CSV-like output.
Binary .npy is usually better for preserving dtype and shape.
Use np.savez() or np.savez_compressed() for multiple arrays.
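The sketch below shows np.savez with two arrays and contrasts it with a text round trip through np.savetxt and np.loadtxt. File names and array contents are illustrative, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

scores = np.array([[70, 80], [60, 75]], dtype=np.int32)
weights = np.array([0.4, 0.6])

with tempfile.TemporaryDirectory() as tmp:
    # Several named arrays in one .npz archive
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, scores=scores, weights=weights)
    with np.load(npz_path) as bundle:
        loaded_scores = bundle["scores"]
        loaded_weights = bundle["weights"]

    # Text round trip: values survive, the original dtype does not
    csv_path = os.path.join(tmp, "scores.csv")
    np.savetxt(csv_path, scores, delimiter=",", fmt="%d")
    from_text = np.loadtxt(csv_path, delimiter=",")

print(loaded_scores.dtype)  # int32
print(from_text.dtype)      # float64
```

The binary archive restores int32 exactly, while the text file comes back as float64 unless a dtype is passed to np.loadtxt.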
38. Code Output: Slicing View
Question:
arr = np.array([10, 20, 30, 40])
view = arr[1:3]
view[1] = 999
print(arr)
Answer:
[ 10  20 999  40]
Explanation:
view shares data with arr. view[1] corresponds to arr[2].
39. Code Output: Fancy Indexing Copy
Question:
arr = np.array([10, 20, 30, 40])
selected = arr[[1, 2]]
selected[0] = 999
print(arr)
print(selected)
Answer:
[10 20 30 40]
[999  30]
Fancy indexing returned a copy.
40. Code Output: Broadcasting
Question:
a = np.array([[1], [2], [3]])
b = np.array([10, 20, 30, 40])
print((a + b).shape)
print(a + b)
Answer:
(3, 4)
[[11 21 31 41]
 [12 22 32 42]
 [13 23 33 43]]
Shapes:
(3, 1)
(4,)
Broadcast to:
(3, 4)
41. Code Output: Axis Reduction
Question:
arr = np.array([
[1, 2, 3],
[4, 5, 6],
])
print(arr.sum(axis=0))
print(arr.sum(axis=1))
Answer:
[5 7 9]
[ 6 15]
42. Code Output: tile vs repeat
Question:
arr = np.array([1, 2, 3])
print(np.tile(arr, 2))
print(np.repeat(arr, 2))
Answer:
[1 2 3 1 2 3]
[1 1 2 2 3 3]
43. Code Output: allclose
Question:
a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(a == b)
print(np.allclose(a, b))
Answer:
[False]
True
Decimal fractions such as 0.1 and 0.2 have no exact binary representation, so their sum differs from 0.3 by a tiny rounding error.
44. Debugging: Why Does This Broadcasting Fail?
Question:
sales = np.zeros((4, 3))
bonus = np.array([1, 2, 3, 4])
sales + bonus
Answer:
This fails because shapes are:
(4, 3)
(4,)
Broadcasting compares from the right:
3 vs 4
They are not equal, and neither is 1.
Fix by making bonus a column:
bonus = bonus.reshape(4, 1)
print((sales + bonus).shape)
Output:
(4, 3)
45. Debugging: Why Did My Original Array Change?
Question:
data = np.arange(10)
part = data[2:5]
part[:] = -1
print(data)
Answer:
part is a view created by slicing, so it shares memory with data.
Output:
[ 0  1 -1 -1 -1  5  6  7  8  9]
Fix:
part = data[2:5].copy()
46. Debugging: Why Is arr == np.nan Always False?
NaN is not equal to itself.
arr = np.array([1.0, np.nan, 3.0])
print(arr == np.nan)
Output:
[False False False]
Correct:
print(np.isnan(arr))
Output:
[False  True False]
47. Debugging: Why Did Integer Division Become Float?
arr = np.array([1, 2, 3])
print((arr / 2).dtype)
print(arr // 2)
Output:
float64
[0 1 1]
/ performs true division and can produce floats. // performs floor division.
48. Coding Task: Normalize Each Row
Question:
Normalize each row using:
(row - row_min) / (row_max - row_min)
Solution:
data = np.array([
[10, 20, 30],
[2, 4, 8],
[100, 150, 200],
])
row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)
normalized = (data - row_min) / (row_max - row_min)
print(normalized)
Output:
[[0.         0.5        1.        ]
 [0.         0.33333333 1.        ]
 [0.         0.5        1.        ]]
49. Coding Task: Find Rows With Any Value Greater Than X
arr = np.array([
[1, 2, 3],
[10, 2, 1],
[3, 9, 4],
])
x = 6
rows = np.where((arr > x).any(axis=1))[0]
print(rows)
Output:
[1 2]
50. Coding Task: Remove Minimum And Maximum Values
Remove every occurrence of the minimum and maximum values.
arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])
minimum = arr.min()
maximum = arr.max()
result = arr[(arr != minimum) & (arr != maximum)]
print(result)
Output:
[4 3 2 7]
51. Coding Task: Sort Rows By Second Column
data = np.array([
[101, 75],
[102, 92],
[103, 60],
])
sorted_rows = data[np.argsort(data[:, 1])]
print(sorted_rows)
Output:
[[103  60]
 [101  75]
 [102  92]]
Descending:
sorted_rows_desc = data[np.argsort(data[:, 1])[::-1]]
52. Coding Task: Add Total Column And Get Top 2
marks = np.array([
[70, 80, 90],
[60, 75, 85],
[95, 91, 93],
[50, 65, 70],
])
total = marks.sum(axis=1, keepdims=True)
with_total = np.concatenate((marks, total), axis=1)
ranked = with_total[np.argsort(with_total[:, -1])[::-1]]
print(ranked[:2])
Output:
[[ 95  91  93 279]
 [ 70  80  90 240]]
53. Coding Task: Unique Rows
records = np.array([
[1, 10],
[2, 20],
[1, 10],
[3, 30],
])
print(np.unique(records, axis=0))
Output:
[[ 1 10]
 [ 2 20]
 [ 3 30]]
54. Coding Task: Count Category Frequencies
labels = np.array(["free", "pro", "free", "team", "pro", "free"])
categories, counts = np.unique(labels, return_counts=True)
print(categories)
print(counts)
Output:
['free' 'pro' 'team']
[3 2 1]
55. Coding Task: Build A Distance Matrix
Given points on a line:
points = np.array([1, 4, 9])
Create pairwise absolute distances.
distance = np.abs(points[:, np.newaxis] - points[np.newaxis, :])
print(distance)
Output:
[[0 3 8]
 [3 0 5]
 [8 5 0]]
This uses broadcasting.
56. Coding Task: Euclidean Distance From A Target Point
points = np.array([
[2, 3],
[5, 7],
[1, 8],
])
target = np.array([3, 4])
distances = np.sqrt(((points - target) ** 2).sum(axis=1))
print(distances)
Output:
[1.41421356 3.60555128 4.47213595]
57. Coding Task: Create A Checkerboard Matrix
board = np.zeros((6, 6), dtype=int)
board[::2, ::2] = 1
board[1::2, 1::2] = 1
print(board)
Output:
[[1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]]
58. Coding Task: Replace Outliers With Boundary Values
values = np.array([5, 12, 40, 99, 120, -3])
cleaned = np.clip(values, 0, 100)
print(cleaned)
Output:
[  5  12  40  99 100   0]
59. Coding Task: Find Common Product IDs
batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])
print(np.intersect1d(batch_a, batch_b))
print(np.setdiff1d(batch_a, batch_b))
print(np.union1d(batch_a, batch_b))
Output:
[103 104]
[101 102]
[101 102 103 104 105 106]
60. Coding Task: Use meshgrid To Evaluate A Function
x = np.array([0, 1, 2])
y = np.array([10, 20])
xx, yy = np.meshgrid(x, y)
z = xx + yy
print(z)
Output:
[[10 11 12]
 [20 21 22]]
61. Interview Answer: How Would You Improve Slow NumPy Code?
Strong answer:
First, I would check whether the code is using Python loops over array elements. Then I would look for vectorization, broadcasting, ufuncs, axis-based reductions, and boolean masks. I would also avoid repeated appends inside loops because NumPy arrays are fixed-size; it is better to collect data first or preallocate the final array. Finally, I would check unnecessary copies, dtype choices, and memory layout if performance still matters.
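The point about repeated appends can be illustrated with a small sketch. The loop body (squaring an index) is a stand-in for real work; all three approaches give the same result, but np.append allocates a new array on every call.

```python
import numpy as np

n = 5

# Slow pattern: np.append copies the whole array each iteration
grown = np.array([], dtype=np.int64)
for i in range(n):
    grown = np.append(grown, i * i)

# Better: allocate once, then fill in place
prealloc = np.empty(n, dtype=np.int64)
for i in range(n):
    prealloc[i] = i * i

# Best when possible: no Python loop at all
vectorized = np.arange(n, dtype=np.int64) ** 2

print(grown)       # [ 0  1  4  9 16]
print(prealloc)    # [ 0  1  4  9 16]
print(vectorized)  # [ 0  1  4  9 16]
```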
62. Interview Answer: When Should You Not Use NumPy?
Strong answer:
NumPy is not ideal for mixed object-heavy data, heavily nested Python objects, row-by-row business logic, or datasets too large for memory unless paired with chunking or other tools. For labeled tabular data, Pandas is often more ergonomic. For GPU tensor work, PyTorch, TensorFlow, JAX, or CuPy may be better depending on the project.
63. Interview Answer: Why Can Broadcasting Be Dangerous?
Broadcasting can silently create a result with a valid but unintended shape.
Example:
a = np.ones((3, 1))
b = np.ones((1, 4))
print((a + b).shape)
Output:
(3, 4)
This is correct mathematically, but if you expected a 1D result, it is a bug.
Good habit:
print(a.shape, b.shape)
before combining arrays.
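A typical real-world version of this bug is subtracting a column of predictions from a 1D target vector. The arrays below are hypothetical; the point is that the error matrix has nine entries instead of three and no exception is raised.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])        # shape (3,)
y_pred = np.array([[1.5], [2.0], [2.5]])  # shape (3, 1), e.g. a model output

errors = y_true - y_pred  # silently broadcasts to (3, 3)
print(errors.shape)       # (3, 3)

fixed = y_true - y_pred.ravel()  # align both to shape (3,)
print(fixed.shape)               # (3,)
print(fixed)                     # [-0.5  0.   0.5]
```

Any mean or sum computed on the (3, 3) result would run without error and give a misleading number.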
64. Interview Answer: Why Can Copies Hurt Performance?
Copies use extra memory and time.
If you slice a huge array and can work with a view safely, it can be faster and more memory-efficient.
But views can cause accidental mutation.
Strong answer:
Views are efficient but share data. Copies are safer when independence matters. The right choice depends on whether the downstream code should be allowed to affect the original data.
65. Interview Answer: Why Does dtype Matter?
dtype controls:
- memory usage
- numerical range
- precision
- operation results
- compatibility with libraries
Example:
a = np.array([1, 2, 3], dtype=np.int8)
b = np.array([1, 2, 3], dtype=np.float64)
print(a.itemsize)
print(b.itemsize)
Output:
1
8
Using a smaller dtype can save memory, but it can also overflow if values exceed the dtype range.
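A minimal sketch of that overflow, assuming int8 inputs: array arithmetic wraps around silently instead of raising, and widening the dtype before the operation avoids it.

```python
import numpy as np

a = np.array([100, 120], dtype=np.int8)

print(np.iinfo(np.int8).max)  # 127

doubled = a + a  # stays int8, so results above 127 wrap around
print(doubled)   # [-56 -16]

safe = a.astype(np.int16) + a  # widen first, then add
print(safe)      # [200 240]
```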
66. Quick Revision Table
| Topic | Interview point |
|---|---|
| ndarray | typed, multidimensional array |
| shape | size of each dimension |
| dtype | type and storage format of elements |
| strides | bytes to move along each axis |
| view | shares data |
| copy | owns separate data |
| basic slicing | usually view |
| advanced indexing | usually copy |
| broadcasting | compatible shape expansion without manual loops |
| axis | dimension being reduced or operated along |
| keepdims | keeps reduced axes for broadcasting |
| ravel | view when possible |
| flatten | copy |
| default_rng | recommended random generator constructor |
| allclose | tolerance-based float comparison |
| tile | repeats whole pattern |
| repeat | repeats individual elements |
| meshgrid | coordinate grids |
| structured array | records with named fields |
67. Rapid-Fire Interview Questions
1. What is NumPy mainly used for?
Fast numerical work with arrays.
2. What is the main NumPy object?
ndarray.
3. What does shape return?
A tuple showing the size of each dimension.
4. What does dtype tell you?
The type and storage format of each array element.
5. What does axis=0 mean in a 2D aggregation?
Reduce down rows and return one result per column.
6. What does axis=1 mean in a 2D aggregation?
Reduce across columns and return one result per row.
7. Does slicing copy data?
Basic slicing usually returns a view.
8. Does fancy indexing copy data?
Usually yes.
9. Why use copy()?
To avoid changing the original array when modifying selected data.
10. Why use np.allclose()?
To compare floating-point arrays with tolerance.
11. What is broadcasting?
Automatic shape compatibility for element-wise operations.
12. What is vectorization?
Using array operations instead of Python loops.
13. Why is vectorization faster?
The loop runs in optimized compiled code with less Python overhead.
14. What is np.where()?
A conditional selection function or a way to find matching positions.
15. What is np.argmax()?
It returns the index of the maximum value.
16. What is np.argmin()?
It returns the index of the minimum value.
17. What is np.unique(..., return_counts=True) used for?
Finding unique values and their frequencies.
18. What is np.clip() used for?
Limiting values to a minimum and maximum range.
19. What is np.meshgrid() used for?
Creating coordinate grids.
20. What is a structured array?
An array with named fields inside each record.
68. Practice Interview Set
Try these without looking at the answers first.
Question 1
Explain why NumPy arrays are faster than Python lists for numerical operations.
Question 2
Given an array with shape (5, 1) and another with shape (3,), what is the result shape after addition?
Question 3
What happens when you modify an array slice?
Question 4
Write code to select rows where any value is negative.
Question 5
Write code to normalize each column.
Question 6
Write code to get the top 3 values from a 1D array.
Question 7
Write code to find duplicate values in an array.
Question 8
Write code to replace NaN values with zero.
Question 9
Write code to create a 5 by 5 identity matrix.
Question 10
Write code to save and load a NumPy array.
69. Practice Interview Answers
Solution Key
Answer 1
NumPy arrays are faster because values are stored in a compact typed buffer, and operations run in optimized compiled code instead of Python-level loops.
Answer 2
Shapes:
(5, 1)
(3,)
Result:
(5, 3)
Answer 3
Basic slices usually create views, so modifying the slice can modify the original array.
Answer 4
arr = np.array([
[1, 2, 3],
[4, -1, 6],
[7, 8, 9],
])
rows = arr[(arr < 0).any(axis=1)]
print(rows)
Explanation
- A 2D NumPy array arr is created with integers, including a negative value (-1).
- The expression (arr < 0).any(axis=1) generates a boolean array indicating which rows contain at least one negative value.
- The original array arr is indexed with this boolean array to extract the rows that meet the condition.
- The resulting rows are stored in the variable rows and printed, showing only the rows with negative values.
Answer 5
data = np.array([
[10, 100],
[20, 150],
[30, 200],
])
col_min = data.min(axis=0, keepdims=True)
col_max = data.max(axis=0, keepdims=True)
normalized = (data - col_min) / (col_max - col_min)
print(normalized)
Explanation
- The code initializes a 2D NumPy array named data with specific values.
- It calculates the minimum values for each column using data.min(axis=0, keepdims=True), preserving the array's dimensions.
- Similarly, it computes the maximum values for each column with data.max(axis=0, keepdims=True).
- The normalization formula (data - col_min) / (col_max - col_min) is applied to scale the data to a range between 0 and 1.
- Finally, the normalized array is printed, showing the transformed values.
Answer 6
arr = np.array([12, 99, 4, 42, 18, 77])
top_3 = np.sort(arr)[-3:][::-1]
print(top_3)
Explanation
- The code initializes a NumPy array arr containing six integer values.
- It sorts the array in ascending order using np.sort(arr).
- The last three elements of the sorted array, which are the highest values, are selected with [-3:].
- The selected values are then reversed to present them in descending order using [::-1].
- Finally, the top three values are printed to the console.
Answer 7
arr = np.array([1, 2, 2, 3, 4, 4, 4])
values, counts = np.unique(arr, return_counts=True)
duplicates = values[counts > 1]
print(duplicates)
Explanation
- The code initializes a NumPy array arr containing integers, some of which are duplicated.
- It uses np.unique() to find unique values in the array while also counting their occurrences, returning two arrays: values and counts.
- The duplicates array is created by filtering values where the corresponding counts are greater than 1, indicating duplicates.
- Finally, it prints the duplicates array, which contains the values that appear more than once in the original array.
Answer 8
arr = np.array([1.0, np.nan, 3.0, np.nan])
cleaned = np.where(np.isnan(arr), 0, arr)
print(cleaned)
Explanation
- The code initializes a NumPy array arr containing floating-point numbers, including NaN values.
- It uses np.isnan(arr) to create a boolean mask identifying the NaN elements in the array.
- The np.where function replaces NaN values with 0 while keeping other values unchanged.
- The resulting array cleaned is printed, showing the original values with NaN replaced by 0.
Alternative:
cleaned = np.nan_to_num(arr, nan=0.0)
Explanation
- Utilizes the np.nan_to_num() function from the NumPy library to handle NaN values.
- The input arr is a NumPy array that may contain NaN (Not a Number) entries.
- Any NaN values found in arr are replaced with 0.0, ensuring the output array cleaned has no NaN values.
- This is useful for data preprocessing, especially before performing mathematical operations or analyses that cannot handle NaN values.
Answer 9
identity = np.eye(5)
print(identity)
Explanation
- The code utilizes the NumPy library, which is commonly used for numerical operations in Python.
- np.eye(5) creates a 5x5 identity matrix, where all the diagonal elements are 1 and all other elements are 0.
- The print(identity) statement outputs the generated identity matrix to the console.
- Identity matrices are useful in various mathematical computations, including linear algebra and transformations.
Answer 10
arr = np.array([[1, 2], [3, 4]])
np.save("arr.npy", arr)
loaded = np.load("arr.npy")
print(loaded)
Explanation
- The code creates a 2D NumPy array named arr containing the values [[1, 2], [3, 4]].
- It uses np.save to save the array to a file called "arr.npy" in binary format.
- The array is then loaded back into memory using np.load, retrieving the saved data into the variable loaded.
- Finally, the loaded array is printed to the console, displaying its contents.
70. Common Mistakes To Avoid
Mistake 1: Not checking shapes
Most NumPy bugs are shape bugs.
Always inspect:
print(arr.shape)
Explanation
- The code uses the print function to output information to the console.
- arr.shape accesses the shape attribute of a NumPy array, which returns a tuple representing the dimensions of the array.
- This is useful for understanding the structure of the data, such as the number of rows and columns in a 2D array.
- The output will vary depending on the specific shape of the arr array being analyzed.
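A typical shape bug is silent rather than an error. In the sketch below (variable names are illustrative), subtracting a column of shape (3, 1) from a vector of shape (3,) broadcasts to a (3, 3) matrix instead of the intended elementwise result. Printing the shapes exposes the problem immediately.

```python
import numpy as np

scores = np.array([10.0, 20.0, 30.0])   # shape (3,)
offsets = scores.reshape(3, 1)          # shape (3, 1)

# Intended: three elementwise differences.
# Actual: broadcasting (3,) against (3, 1) produces a (3, 3) matrix.
diff = scores - offsets
print(scores.shape, offsets.shape, diff.shape)

# Flattening the column restores the elementwise operation.
fixed = scores - offsets.ravel()
print(fixed.shape)
```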
Mistake 2: Confusing views and copies
If you modify a slice and the original changes, you probably had a view.
Use:
arr[2:5].copy()
when independence matters.
Explanation
- The slice arr[2:5] selects the elements of the NumPy array arr at indices 2 through 4 (index 5 is excluded). On its own, this basic slice is a view that shares memory with arr.
- Calling copy() on the slice allocates a new array with its own data instead of a view of the original.
- The copy can then be modified without affecting the original array.
- The resulting array contains the elements from the specified range of the original array.
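The difference can be verified directly. This minimal sketch, using an illustrative array, modifies a view and a copy and then checks .base and np.shares_memory to confirm which one is tied to the original.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]          # basic slice: shares memory with arr
copy = arr[2:5].copy()   # independent buffer

view[0] = 99             # also changes arr[2]
copy[1] = -1             # arr[3] is unaffected

print(arr[2], arr[3])
print(view.base is arr, copy.base is None)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))
```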
Mistake 3: Comparing floats with exact equality
Use:
np.allclose(a, b)
when tiny numerical differences are acceptable.
Explanation
- Utilizes the NumPy library function np.allclose() to compare two arrays, a and b.
- Returns True if all elements of the arrays are equal within a specified tolerance, otherwise returns False.
- Useful for numerical comparisons where floating-point precision issues may arise.
- The relative and absolute tolerances can be customized through the optional rtol and atol parameters.
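A short sketch of the classic case, with illustrative values: 0.1 + 0.2 is not exactly 0.3 in binary floating point, so exact equality fails where a tolerance-based check passes. The last lines show that tightening rtol and atol changes the verdict.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = a == b                 # elementwise exact comparison
close = np.allclose(a, b)      # single True/False within tolerance
per_element = np.isclose(a, b) # elementwise tolerance check

print(exact, close, per_element)

# Default tolerances accept a difference of about 1e-6 near 1.0;
# stricter tolerances reject it.
loose = np.allclose(1.0, 1.000001)
strict = np.allclose(1.0, 1.000001, rtol=1e-8, atol=0.0)
print(loose, strict)
```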
Mistake 4: Using loops for simple array operations
Prefer:
arr * 2
arr[arr > 0]
arr.sum(axis=1)
over manual loops when possible.
Explanation
- The expression arr * 2 scales each element of the array arr by a factor of 2, effectively doubling its values.
- The expression arr[arr > 0] filters the array to include only the elements that are greater than zero, creating a new array with positive values.
- The method arr.sum(axis=1) sums along axis 1, that is, across the columns of a 2D array, returning a new array with one sum per row.
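For comparison, here is a sketch of the row-sum task written both ways on a small made-up array. The loop version is shown only to make the contrast concrete. The vectorized expressions give the same result without Python-level iteration.

```python
import numpy as np

arr = np.array([[1, -2, 3], [-4, 5, 6]])

# Manual loop: one Python-level addition per element.
row_sums_loop = []
for row in arr:
    total = 0
    for value in row:
        total += value
    row_sums_loop.append(total)

# Vectorized equivalents.
row_sums = arr.sum(axis=1)   # one sum per row
doubled = arr * 2            # elementwise scaling
positives = arr[arr > 0]     # boolean-mask filter, returns a 1D array

print(row_sums_loop, row_sums, positives)
```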
Mistake 5: Using np.append() repeatedly in a loop
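NumPy arrays are fixed-size, so np.append() cannot grow an array in place. Each call allocates a new array and copies all existing elements, which makes repeated appends in a loop slow.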
Better approaches:
- collect values in a Python list, then convert once
- preallocate the final NumPy array
- use concatenate once when possible
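The three alternatives can be sketched as follows. The squares computation and the variable names are placeholders chosen for illustration. Any per-item value would work the same way.

```python
import numpy as np

n = 5

# Option 1: collect in a Python list, convert once at the end.
values = []
for i in range(n):
    values.append(i * i)
from_list = np.array(values)

# Option 2: preallocate the final array and fill it in place.
prealloc = np.empty(n, dtype=np.int64)
for i in range(n):
    prealloc[i] = i * i

# Option 3: gather existing arrays and join them in a single call.
chunks = [np.arange(3), np.arange(3, 5)]
joined = np.concatenate(chunks)

print(from_list, prealloc, joined)
```

Preallocation requires knowing the final size in advance. The list approach is usually the simplest choice when the size is unknown.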
Final Summary
For NumPy interviews, remember these core ideas:
- ndarray is a typed multidimensional array.
- Shape tells you structure; dtype tells you storage and numerical behavior.
- Strides explain how NumPy walks through memory.
- Basic slicing usually creates views.
- Advanced indexing usually creates copies.
- Broadcasting compares shapes from right to left.
- Vectorization avoids Python-level loops.
- axis tells NumPy which dimension to operate over or reduce.
- keepdims=True keeps dimensions useful for broadcasting.
- default_rng() is preferred for modern random number generation.
- allclose() is better than exact equality for many floating-point checks.
- Image data, structured records, and grids are all natural NumPy use cases.
The best interview answers are short, accurate, and supported by a small example.
Sources and Further Reading
- NumPy documentation: https://numpy.org/doc/
- NumPy ndarray reference: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
- NumPy ndarray user guide: https://numpy.org/doc/stable/reference/arrays.ndarray.html
- NumPy copies and views: https://numpy.org/doc/stable/user/basics.copies.html
- NumPy broadcasting guide: https://numpy.org/doc/stable/user/basics.broadcasting.html
- NumPy random Generator: https://numpy.org/doc/stable/reference/random/generator.html
- NumPy strides reference: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.strides.html
- NumPy allclose reference: https://numpy.org/doc/stable/reference/generated/numpy.allclose.html
- NumPy structured arrays: https://numpy.org/doc/stable/user/basics.rec.html
- NumPy I/O routines: https://numpy.org/doc/stable/reference/routines.io.html
