# Essential NumPy Interview Questions for Data Science Candidates

URL: https://madhudadi.in/blog/posts/numpy-interview-questions-arrays-broadcasting-more
Published: 2026-05-29
Tags: python, interview, Numpy
Read time: 35 min
Difficulty: intermediate

> Prepare for NumPy interviews with original questions and answers on ndarray basics, dtype, shape, strides, broadcasting, views vs copies, indexing, vectorization, random numbers, image arrays, structured arrays, file I/O, and practical coding tasks.

# NumPy Interview Questions: Arrays, Broadcasting, Views, Copies, Random, and Practical Coding

NumPy interviews usually test more than function names. They check whether you understand how arrays are stored, why vectorization is faster than Python loops, when slicing creates a view, when indexing creates a copy, how broadcasting works, and how to solve small data problems without writing unnecessary loops.

This guide is written as a practical interview-preparation file. Each answer is original and uses simple examples from analytics, student scores, product data, images, and machine learning workflows.

You will prepare for questions about:

- `ndarray`, shape, dtype, ndim, size, itemsize, and strides
- arrays vs Python lists
- vectorization and ufuncs
- views, copies, shallow-looking array behavior, and `.base`
- basic indexing, boolean indexing, and fancy indexing
- broadcasting rules and `np.newaxis`
- axis-based aggregation
- sorting, ranking, filtering, clipping, and set operations
- random number generation with `default_rng`
- `allclose` and floating-point comparison
- `meshgrid`, `swapaxes`, `tile`, `repeat`, and `count_nonzero`
- image-like arrays
- structured arrays
- saving and loading NumPy data
- code-output and debugging interview tasks

## How To Use This Guide

Read each question and try to answer it before reading the explanation. For code-output questions, write the output on paper first.

In interviews, the goal is not only to know NumPy functions.
The goal is to reason from:

```text
shape + dtype + axis + indexing rule
```

If you can explain those four things clearly, most NumPy interview questions become manageable.

## 1. What Is NumPy?

NumPy is a Python library for numerical computing. Its main object is the `ndarray`, which stores values in a compact, typed, multidimensional array.

Interview answer:

> NumPy is used for fast numerical operations on arrays. It provides vectorized operations, broadcasting, efficient memory storage, mathematical functions, random number generation, and linear algebra tools. Many libraries such as Pandas, scikit-learn, TensorFlow, PyTorch, and image-processing tools use NumPy-style arrays internally or at their boundaries.

Example:

```python
import numpy as np

prices = np.array([100, 150, 200])
discounted = prices * 0.9
print(discounted)
```

Output:

```text
[ 90. 135. 180.]
```

## 2. What Is An `ndarray`?

An `ndarray` is NumPy's n-dimensional array object.

It is usually:

- multidimensional
- homogeneous, meaning values normally share one dtype
- fixed-size after creation
- optimized for numerical operations

Example:

```python
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(type(arr))
print(arr.shape)
print(arr.dtype)
```

Output:

```text
<class 'numpy.ndarray'>
(2, 3)
int64
```

The exact integer dtype may be `int32` on some systems.

## 3. How Is A NumPy Array Different From A Python List?

A Python list stores references to Python objects. It can hold mixed types and can grow dynamically.

A NumPy array stores fixed-size values of a common dtype in a compact memory layout.

Interview answer:

> Lists are general-purpose containers. NumPy arrays are specialized numerical containers. Arrays support vectorized operations, use memory more compactly for numerical data, and work naturally with multidimensional shapes.

Example:

```python
print([1, 2, 3] * 2)
print(np.array([1, 2, 3]) * 2)
```

Output:

```text
[1, 2, 3, 1, 2, 3]
[2 4 6]
```

## 4. What Do `shape`, `ndim`, `size`, `dtype`, And `itemsize` Mean?
Use this array:

```python
data = np.array([
    [10, 20, 30],
    [40, 50, 60],
], dtype=np.int32)
```

Attributes:

```python
print(data.shape)
print(data.ndim)
print(data.size)
print(data.dtype)
print(data.itemsize)
```

Output:

```text
(2, 3)
2
6
int32
4
```

Meaning:

- `shape`: size along each dimension
- `ndim`: number of dimensions
- `size`: total number of elements
- `dtype`: data type of each element
- `itemsize`: bytes used by one element

## 5. What Are Strides?

Strides tell NumPy how many bytes to move in memory to reach the next element along each axis.

Example:

```python
arr = np.arange(12, dtype=np.int32).reshape(3, 4)
print(arr)
print(arr.strides)
```

Possible output:

```text
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
(16, 4)
```

Why?

- Each `int32` value uses 4 bytes.
- Moving one column moves 4 bytes.
- Moving one row moves 4 columns x 4 bytes = 16 bytes.

Interview answer:

> Strides describe how an n-dimensional index maps to the underlying memory buffer. They are one reason NumPy can create views, transpose arrays, and slice arrays without always copying data.

## 6. What Is A View?

A view is a new array object that looks at the same underlying data as another array.

Example:

```python
arr = np.array([10, 20, 30, 40])
view = arr[1:3]
view[0] = 999

print(arr)
print(view)
```

Output:

```text
[ 10 999  30  40]
[999  30]
```

Changing the view changed the original.

Interview answer:

> A view shares the original array's data buffer. It can be faster and memory-efficient, but changes through one array may appear in the other.

## 7. What Is A Copy?

A copy owns separate data.

Example:

```python
arr = np.array([10, 20, 30, 40])
copy_arr = arr[1:3].copy()
copy_arr[0] = 999

print(arr)
print(copy_arr)
```

Output:

```text
[10 20 30 40]
[999  30]
```

The original did not change.

## 8. How Can You Check Whether An Array Is A View?

Use `.base` as a learning/debugging clue.
```python
arr = np.arange(6)
view = arr[1:4]
copy_arr = arr[[1, 2, 3]]

print(view.base is arr)
print(copy_arr.base is None)
```

Output:

```text
True
True
```

Important interview point:

- basic slicing usually creates views
- advanced indexing usually creates copies

## 9. What Is The Difference Between Basic Indexing And Advanced Indexing?

Basic indexing uses integers, slices, ellipsis, and `None` or `np.newaxis`.

Advanced indexing uses integer arrays or boolean arrays.

Example:

```python
arr = np.arange(9).reshape(3, 3)

basic = arr[1:, :]
advanced = arr[[1, 2], :]

print(basic)
print(advanced)
```

Both print the same values here, but their memory behavior differs.

Interview answer:

> Basic slicing generally returns a view. Advanced indexing generally returns a copy. This matters for memory usage and whether changes affect the original array.

## 10. What Is Boolean Indexing?

Boolean indexing selects values where a condition is true.

```python
scores = np.array([45, 72, 88, 39, 91])
passed = scores[scores >= 50]
print(passed)
```

Output:

```text
[72 88 91]
```

The condition creates a boolean mask:

```python
print(scores >= 50)
```

Output:

```text
[False  True  True False  True]
```

## 11. What Is Fancy Indexing?

Fancy indexing selects values using arrays or lists of indexes.

```python
scores = np.array([45, 72, 88, 39, 91])
selected = scores[[0, 2, 4]]
print(selected)
```

Output:

```text
[45 88 91]
```

Fancy indexing is useful for selecting specific rows, columns, or records.

## 12. What Is Broadcasting?

Broadcasting is NumPy's rule for operating on arrays with different shapes.

Example:

```python
matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
])
bonus = np.array([1, 2, 3])

print(matrix + bonus)
```

Output:

```text
[[11 22 33]
 [41 52 63]]
```

The 1D `bonus` array is applied across each row.

Interview answer:

> Broadcasting lets NumPy perform element-wise operations on compatible shapes without physically copying the smaller array across the larger one.

## 13. What Are The Broadcasting Rules?

Compare shapes from right to left.

Two dimensions are compatible if:

- they are equal, or
- one of them is `1`

Example:

```text
(4, 3)
(   3)
```

Compatible because the last dimension is `3`.

Example:

```text
(4, 3)
(4,)
```

Not compatible because the trailing dimensions are `3` and `4`.

Code:

```python
a = np.zeros((4, 3))
b = np.array([1, 2, 3])
print((a + b).shape)
```

Output:

```text
(4, 3)
```

## 14. How Does `np.newaxis` Help Broadcasting?

`np.newaxis` adds a dimension of size 1.

```python
row = np.array([1, 2, 3])
column = row[:, np.newaxis]

print(row.shape)
print(column.shape)
```

Output:

```text
(3,)
(3, 1)
```

Create an outer addition table:

```python
a = np.array([10, 20, 30])
b = np.array([1, 2, 3, 4])

result = a[:, np.newaxis] + b
print(result)
```

Output:

```text
[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]
```

## 15. What Is Vectorization?

Vectorization means applying operations to entire arrays instead of writing Python loops over individual elements.

Loop version:

```python
values = [10, 20, 30]
result = []

for value in values:
    result.append(value * 2)
```

NumPy version:

```python
values = np.array([10, 20, 30])
result = values * 2
```

Interview answer:

> Vectorization is faster because NumPy performs the loop in optimized compiled code and avoids much of the overhead of Python-level iteration.

## 16. What Are Ufuncs?

Ufunc means universal function. Ufuncs apply element-wise operations efficiently.

Examples:

```python
arr = np.array([1, 4, 9, 16])

print(np.sqrt(arr))
print(np.add(arr, 10))
```

Output:

```text
[1. 2. 3. 4.]
[11 14 19 26]
```

Common ufuncs include:

- `np.add`
- `np.subtract`
- `np.multiply`
- `np.divide`
- `np.sqrt`
- `np.exp`
- `np.log`
- `np.sin`

## 17. What Is The Difference Between `axis=0` And `axis=1`?
For a 2D array:

- `axis=0` works down rows and returns one value per column
- `axis=1` works across columns and returns one value per row

Example:

```python
marks = np.array([
    [70, 80, 90],
    [60, 75, 85],
])

print(marks.sum(axis=0))
print(marks.sum(axis=1))
```

Output:

```text
[130 155 175]
[240 220]
```

Interview shortcut:

> The axis you pass is the axis that gets reduced.

## 18. Why Use `keepdims=True`?

`keepdims=True` keeps reduced axes as dimensions of size 1. This is useful for broadcasting.

```python
marks = np.array([
    [70, 80, 90],
    [60, 75, 85],
])

row_mean = marks.mean(axis=1, keepdims=True)
centered = marks - row_mean

print(row_mean.shape)
print(centered)
```

Output:

```text
(2, 1)
[[-10.           0.          10.        ]
 [-13.33333333   1.66666667  11.66666667]]
```

Without `keepdims=True`, broadcasting may fail or mean something different.

## 19. What Is `reshape()`?

`reshape()` changes the shape of an array without changing the number of elements.

```python
arr = np.arange(12)
matrix = arr.reshape(3, 4)
print(matrix)
```

Output:

```text
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
```

This fails:

```python
np.arange(12).reshape(5, 3)
```

because 12 values cannot fill 15 positions.

## 20. Does `reshape()` Return A View Or A Copy?

Often, `reshape()` can return a view, but it depends on memory layout.

Interview answer:

> `reshape()` returns a view when possible. If the requested shape cannot be represented with compatible strides over the same buffer (for example, after a transpose), `reshape()` returns a copy instead. Reshaping in place by assigning to `arr.shape` raises an error in that situation rather than copying.

Practical advice:

```python
arr = np.arange(12)
reshaped = arr.reshape(3, 4)
print(reshaped.base is arr)
```

Use `.base` only as a learning/debugging tool, not as business logic.

## 21. What Is The Difference Between `ravel()` And `flatten()`?

Both convert an array to 1D.
Important difference:

- `ravel()` returns a view when possible
- `flatten()` always returns a copy

Example:

```python
matrix = np.arange(6).reshape(2, 3)

flat_view = matrix.ravel()
flat_copy = matrix.flatten()

flat_view[0] = 999
flat_copy[1] = 888

print(matrix)
```

Output:

```text
[[999   1   2]
 [  3   4   5]]
```

`flat_copy` did not affect the original.

## 22. What Is The Difference Between `transpose`, `.T`, And `swapaxes`?

For 2D arrays, `.T` and `transpose()` both swap rows and columns.

```python
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(matrix.T)
```

Output:

```text
[[1 4]
 [2 5]
 [3 6]]
```

For higher dimensions, `swapaxes()` swaps two chosen axes.

```python
arr = np.zeros((2, 3, 4))
print(np.swapaxes(arr, 0, 2).shape)
```

Output:

```text
(4, 3, 2)
```

Interview answer:

> `.T` reverses axes. `transpose()` can reorder axes explicitly. `swapaxes()` swaps exactly two axes.

## 23. What Is `np.expand_dims()`?

`np.expand_dims()` inserts a new axis.

```python
arr = np.array([10, 20, 30])

row = np.expand_dims(arr, axis=0)
column = np.expand_dims(arr, axis=1)

print(row.shape)
print(column.shape)
```

Output:

```text
(1, 3)
(3, 1)
```

It is commonly used when a model expects a batch dimension.

## 24. What Is `np.squeeze()`?

`np.squeeze()` removes axes of length 1.

```python
arr = np.zeros((1, 3, 1, 4))
print(np.squeeze(arr).shape)
```

Output:

```text
(3, 4)
```

Use it carefully. Removing a batch dimension accidentally can break model input shapes.

## 25. What Is The Difference Between `np.concatenate`, `vstack`, `hstack`, And `stack`?

`concatenate` joins arrays along an existing axis.

```python
a = np.array([[1, 2]])
b = np.array([[3, 4]])

print(np.concatenate((a, b), axis=0))
```

Output:

```text
[[1 2]
 [3 4]]
```

`vstack` stacks vertically.

`hstack` stacks horizontally.

`stack` creates a new axis.
```python
x = np.array([1, 2])
y = np.array([3, 4])

print(np.stack((x, y), axis=0))
print(np.stack((x, y), axis=1))
```

Output:

```text
[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
```

Interview answer:

> Use `concatenate` when joining along an existing dimension. Use `stack` when creating a new dimension.

## 26. What Is The Difference Between `np.tile()` And `np.repeat()`?

`tile()` repeats the whole array pattern.

```python
arr = np.array([1, 2, 3])
print(np.tile(arr, 2))
```

Output:

```text
[1 2 3 1 2 3]
```

`repeat()` repeats individual elements.

```python
print(np.repeat(arr, 2))
```

Output:

```text
[1 1 2 2 3 3]
```

For 2D arrays, `axis` controls the direction for `repeat`.

```python
matrix = np.array([[1, 2], [3, 4]])
print(np.repeat(matrix, 2, axis=0))
```

Output:

```text
[[1 2]
 [1 2]
 [3 4]
 [3 4]]
```

## 27. What Is `np.where()`?

`np.where()` has two common uses.

Find positions:

```python
scores = np.array([45, 80, 62, 30])
print(np.where(scores >= 60))
```

Output:

```text
(array([1, 2]),)
```

Choose values conditionally:

```python
labels = np.where(scores >= 60, "pass", "retry")
print(labels)
```

Output:

```text
['retry' 'pass' 'pass' 'retry']
```

## 28. What Is `np.clip()`?

`np.clip()` limits values to a minimum and maximum.

```python
values = np.array([-5, 10, 50, 120])
print(np.clip(values, 0, 100))
```

Output:

```text
[  0  10  50 100]
```

Use it for outlier control, image pixel limits, probability bounds, and safe feature ranges.

## 29. What Is `np.count_nonzero()`?

It counts non-zero values.

```python
arr = np.array([
    [1, 0, 3],
    [0, 0, 6],
])

print(np.count_nonzero(arr))
print(np.count_nonzero(arr, axis=0))
print(np.count_nonzero(arr, axis=1))
```

Output:

```text
3
[1 0 2]
[2 1]
```

It is often used to count true values because `True` behaves like 1 and `False` like 0.

```python
scores = np.array([45, 80, 62, 30])
print(np.count_nonzero(scores >= 60))
```

Output:

```text
2
```

## 30. What Is `np.allclose()` And Why Is It Important?
Floating-point values can have tiny precision differences. Do not compare floats using exact equality when small numerical error is expected.

```python
a = np.array([0.1 + 0.2])
b = np.array([0.3])

print(a == b)
print(np.allclose(a, b))
```

Output:

```text
[False]
True
```

Interview answer:

> `np.allclose()` checks whether arrays are element-wise equal within a tolerance. It is useful for testing numerical code where tiny floating-point differences are acceptable.

## 31. What Is The Difference Between `np.random.seed()` And `default_rng()`?

`np.random.seed()` controls legacy global random state.

Modern NumPy code should prefer `np.random.default_rng()`.

```python
rng = np.random.default_rng(42)
print(rng.integers(1, 10, size=5))
```

Interview answer:

> `default_rng()` creates an independent random generator object. It avoids relying on shared global state and is the recommended approach for new code.

## 32. How Do You Generate Random Integers, Uniform Values, And Normal Values?

```python
rng = np.random.default_rng(7)

integers = rng.integers(1, 101, size=(2, 3))
uniform_values = rng.uniform(0, 1, size=5)
normal_values = rng.normal(loc=0, scale=1, size=5)

print(integers)
print(uniform_values)
print(normal_values)
```

Use:

- `integers` for random integer ranges
- `uniform` for continuous values in a range
- `normal` for Gaussian-like data

## 33. What Is The Difference Between `shuffle()` And `choice()`?

`shuffle()` rearranges an array in place.

```python
rng = np.random.default_rng(10)
arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)
print(arr)
```

`choice()` samples values.

```python
rng = np.random.default_rng(10)
arr = np.array([1, 2, 3, 4, 5])
print(rng.choice(arr, size=3, replace=False))
```

Use `replace=False` when the same item should not be selected twice.

## 34. What Is `np.meshgrid()`?

`meshgrid()` creates coordinate grids from coordinate vectors.
```python
x = np.array([0, 1, 2])
y = np.array([10, 20])

xx, yy = np.meshgrid(x, y)

print(xx)
print(yy)
```

Output:

```text
[[0 1 2]
 [0 1 2]]
[[10 10 10]
 [20 20 20]]
```

Interview answer:

> `meshgrid()` is useful for evaluating a function on a 2D grid, plotting surfaces, creating coordinate maps, or generating image-style coordinate arrays.

## 35. What Are Structured Arrays?

Structured arrays let each element contain named fields.

```python
students = np.array(
    [
        ("Asha", 92, 8.7, True),
        ("Ravi", 78, 7.9, False),
    ],
    dtype=[
        ("name", "U20"),
        ("score", "i4"),
        ("cgpa", "f4"),
        ("placed", "?"),
    ],
)

print(students["name"])
print(students["score"])
```

Output:

```text
['Asha' 'Ravi']
[92 78]
```

Interview answer:

> Structured arrays are useful when each record has named fields, but for general tabular analytics Pandas is often more convenient.

## 36. How Are Images Represented As NumPy Arrays?

A grayscale image can be a 2D array:

```text
(height, width)
```

A color image is often a 3D array:

```text
(height, width, channels)
```

For RGB images, channels are usually 3.

Common operations:

```python
image = np.zeros((100, 200, 3), dtype=np.uint8)

print(image.shape)
print(image.dtype)
```

Output:

```text
(100, 200, 3)
uint8
```

Examples:

```python
flipped_vertical = np.flip(image, axis=0)
flipped_horizontal = np.flip(image, axis=1)
darkened = np.clip(image * 0.7, 0, 255).astype(np.uint8)
negative = 255 - image
cropped = image[20:80, 50:150]
```

## 37. What Is The Difference Between `np.save`, `np.load`, And `np.savetxt`?

`np.save()` stores one array in NumPy's binary `.npy` format.

```python
arr = np.array([1, 2, 3])
np.save("numbers.npy", arr)

loaded = np.load("numbers.npy")
print(loaded)
```

`np.savetxt()` stores text data such as CSV-like output.

Binary `.npy` is usually better for preserving dtype and shape.

Use `np.savez()` or `np.savez_compressed()` for multiple arrays.

## 38. Code Output: Slicing View

Question:

```python
arr = np.array([10, 20, 30, 40])
view = arr[1:3]
view[1] = 999
print(arr)
```

Answer:

```text
[ 10  20 999  40]
```

Explanation: `view` shares data with `arr`. `view[1]` corresponds to `arr[2]`.

## 39. Code Output: Fancy Indexing Copy

Question:

```python
arr = np.array([10, 20, 30, 40])
selected = arr[[1, 2]]
selected[0] = 999
print(arr)
print(selected)
```

Answer:

```text
[10 20 30 40]
[999  30]
```

Fancy indexing returned a copy.

## 40. Code Output: Broadcasting

Question:

```python
a = np.array([[1], [2], [3]])
b = np.array([10, 20, 30, 40])

print((a + b).shape)
print(a + b)
```

Answer:

```text
(3, 4)
[[11 21 31 41]
 [12 22 32 42]
 [13 23 33 43]]
```

Shapes:

```text
(3, 1)
(4,)
```

Broadcast to:

```text
(3, 4)
```

## 41. Code Output: Axis Reduction

Question:

```python
arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(arr.sum(axis=0))
print(arr.sum(axis=1))
```

Answer:

```text
[5 7 9]
[ 6 15]
```

## 42. Code Output: `tile` vs `repeat`

Question:

```python
arr = np.array([1, 2, 3])
print(np.tile(arr, 2))
print(np.repeat(arr, 2))
```

Answer:

```text
[1 2 3 1 2 3]
[1 1 2 2 3 3]
```

## 43. Code Output: `allclose`

Question:

```python
a = np.array([0.1 + 0.2])
b = np.array([0.3])

print(a == b)
print(np.allclose(a, b))
```

Answer:

```text
[False]
True
```

The exact binary representation of decimal fractions can produce tiny differences.

## 44. Debugging: Why Does This Broadcasting Fail?

Question:

```python
sales = np.zeros((4, 3))
bonus = np.array([1, 2, 3, 4])

sales + bonus
```

Answer:

This fails because shapes are:

```text
(4, 3)
(4,)
```

Broadcasting compares from the right:

```text
3 vs 4
```

They are not equal, and neither is 1.

Fix by making `bonus` a column:

```python
bonus = bonus.reshape(4, 1)
print((sales + bonus).shape)
```

Output:

```text
(4, 3)
```

## 45. Debugging: Why Did My Original Array Change?
Question:

```python
data = np.arange(10)
part = data[2:5]
part[:] = -1
print(data)
```

Answer:

`part` is a view created by slicing, so it shares memory with `data`.

Output:

```text
[ 0  1 -1 -1 -1  5  6  7  8  9]
```

Fix:

```python
part = data[2:5].copy()
```

## 46. Debugging: Why Is `arr == np.nan` Always False?

NaN is not equal to itself.

```python
arr = np.array([1.0, np.nan, 3.0])
print(arr == np.nan)
```

Output:

```text
[False False False]
```

Correct:

```python
print(np.isnan(arr))
```

Output:

```text
[False  True False]
```

## 47. Debugging: Why Did Integer Division Become Float?

```python
arr = np.array([1, 2, 3])
print((arr / 2).dtype)
print(arr // 2)
```

Output:

```text
float64
[0 1 1]
```

`/` performs true division and can produce floats. `//` performs floor division.

## 48. Coding Task: Normalize Each Row

Question:

Normalize each row using:

```text
(row - row_min) / (row_max - row_min)
```

Solution:

```python
data = np.array([
    [10, 20, 30],
    [2, 4, 8],
    [100, 150, 200],
])

row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)

normalized = (data - row_min) / (row_max - row_min)
print(normalized)
```

Output:

```text
[[0.         0.5        1.        ]
 [0.         0.33333333 1.        ]
 [0.         0.5        1.        ]]
```

## 49. Coding Task: Find Rows With Any Value Greater Than X

```python
arr = np.array([
    [1, 2, 3],
    [10, 2, 1],
    [3, 9, 4],
])

x = 6
rows = np.where((arr > x).any(axis=1))[0]
print(rows)
```

Output:

```text
[1 2]
```

## 50. Coding Task: Remove Minimum And Maximum Values

Remove every occurrence of the minimum and maximum values.

```python
arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])

minimum = arr.min()
maximum = arr.max()

result = arr[(arr != minimum) & (arr != maximum)]
print(result)
```

Output:

```text
[4 3 2 7]
```

## 51. Coding Task: Sort Rows By Second Column

```python
data = np.array([
    [101, 75],
    [102, 92],
    [103, 60],
])

sorted_rows = data[np.argsort(data[:, 1])]
print(sorted_rows)
```

Output:

```text
[[103  60]
 [101  75]
 [102  92]]
```

Descending:

```python
sorted_rows_desc = data[np.argsort(data[:, 1])[::-1]]
```

## 52. Coding Task: Add Total Column And Get Top 2

```python
marks = np.array([
    [70, 80, 90],
    [60, 75, 85],
    [95, 91, 93],
    [50, 65, 70],
])

total = marks.sum(axis=1, keepdims=True)
with_total = np.concatenate((marks, total), axis=1)

ranked = with_total[np.argsort(with_total[:, -1])[::-1]]
print(ranked[:2])
```

Output:

```text
[[ 95  91  93 279]
 [ 70  80  90 240]]
```

## 53. Coding Task: Unique Rows

```python
records = np.array([
    [1, 10],
    [2, 20],
    [1, 10],
    [3, 30],
])

print(np.unique(records, axis=0))
```

Output:

```text
[[ 1 10]
 [ 2 20]
 [ 3 30]]
```

## 54. Coding Task: Count Category Frequencies

```python
labels = np.array(["free", "pro", "free", "team", "pro", "free"])

categories, counts = np.unique(labels, return_counts=True)

print(categories)
print(counts)
```

Output:

```text
['free' 'pro' 'team']
[3 2 1]
```

## 55. Coding Task: Build A Distance Matrix

Given points on a line:

```python
points = np.array([1, 4, 9])
```

Create pairwise absolute distances.

```python
distance = np.abs(points[:, np.newaxis] - points[np.newaxis, :])
print(distance)
```

Output:

```text
[[0 3 8]
 [3 0 5]
 [8 5 0]]
```

This uses broadcasting.

## 56. Coding Task: Euclidean Distance From A Target Point

```python
points = np.array([
    [2, 3],
    [5, 7],
    [1, 8],
])
target = np.array([3, 4])

distances = np.sqrt(((points - target) ** 2).sum(axis=1))
print(distances)
```

Output:

```text
[1.41421356 3.60555128 4.47213595]
```

## 57. Coding Task: Create A Checkerboard Matrix

```python
board = np.zeros((6, 6), dtype=int)
board[::2, ::2] = 1
board[1::2, 1::2] = 1

print(board)
```

Output:

```text
[[1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]]
```

## 58. Coding Task: Replace Outliers With Boundary Values

```python
values = np.array([5, 12, 40, 99, 120, -3])
cleaned = np.clip(values, 0, 100)
print(cleaned)
```

Output:

```text
[  5  12  40  99 100   0]
```

## 59. Coding Task: Find Common Product IDs

```python
batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])

print(np.intersect1d(batch_a, batch_b))
print(np.setdiff1d(batch_a, batch_b))
print(np.union1d(batch_a, batch_b))
```

Output:

```text
[103 104]
[101 102]
[101 102 103 104 105 106]
```

## 60. Coding Task: Use `meshgrid` To Evaluate A Function

```python
x = np.array([0, 1, 2])
y = np.array([10, 20])

xx, yy = np.meshgrid(x, y)
z = xx + yy

print(z)
```

Output:

```text
[[10 11 12]
 [20 21 22]]
```

## 61. Interview Answer: How Would You Improve Slow NumPy Code?

Strong answer:

> First, I would check whether the code is using Python loops over array elements. Then I would look for vectorization, broadcasting, ufuncs, axis-based reductions, and boolean masks. I would also avoid repeated appends inside loops because NumPy arrays are fixed-size; it is better to collect data first or preallocate the final array. Finally, I would check unnecessary copies, dtype choices, and memory layout if performance still matters.

## 62. Interview Answer: When Should You Not Use NumPy?

Strong answer:

> NumPy is not ideal for mixed object-heavy data, heavily nested Python objects, row-by-row business logic, or datasets too large for memory unless paired with chunking or other tools. For labeled tabular data, Pandas is often more ergonomic. For GPU tensor work, PyTorch, TensorFlow, JAX, or CuPy may be better depending on the project.

## 63. Interview Answer: Why Can Broadcasting Be Dangerous?

Broadcasting can silently create a result with a valid but unintended shape.

Example:

```python
a = np.ones((3, 1))
b = np.ones((1, 4))

print((a + b).shape)
```

Output:

```text
(3, 4)
```

This is correct mathematically, but if you expected a 1D result, it is a bug.
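A minimal sketch of how this kind of bug can surface in practice, assuming an error calculation where the targets are 1D and the predictions arrive as a column vector. The names `y_true` and `y_pred` are illustrative and not part of the original examples.

```python
import numpy as np

# Hypothetical data: 1D targets and column-vector predictions.
y_true = np.array([1.0, 2.0, 3.0])          # shape (3,)
y_pred = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)

# Broadcasting silently pairs every target with every prediction.
wrong = y_true - y_pred
print(wrong.shape)   # (3, 3)

# Flattening the column first gives the intended element-wise difference.
right = y_true - y_pred.ravel()
print(right.shape)   # (3,)
print(right)         # [0. 0. 0.]
```

No error is raised in the first subtraction, which is why the unintended `(3, 3)` result is easy to miss until a later reduction produces a wrong number.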
Good habit:

```python
print(a.shape, b.shape)
```

before combining arrays.

## 64. Interview Answer: Why Can Copies Hurt Performance?

Copies use extra memory and time.

If you slice a huge array and can work with a view safely, it can be faster and more memory-efficient. But views can cause accidental mutation.

Strong answer:

> Views are efficient but share data. Copies are safer when independence matters. The right choice depends on whether the downstream code should be allowed to affect the original data.

## 65. Interview Answer: Why Does dtype Matter?

`dtype` controls:

- memory usage
- numerical range
- precision
- operation results
- compatibility with libraries

Example:

```python
a = np.array([1, 2, 3], dtype=np.int8)
b = np.array([1, 2, 3], dtype=np.float64)

print(a.itemsize)
print(b.itemsize)
```

Output:

```text
1
8
```

Using a smaller dtype can save memory, but it can also overflow if values exceed the dtype range.

## 66. Quick Revision Table

| Topic | Interview point |
|---|---|
| `ndarray` | typed, multidimensional array |
| `shape` | size of each dimension |
| `dtype` | type and storage format of elements |
| `strides` | bytes to move along each axis |
| view | shares data |
| copy | owns separate data |
| basic slicing | usually view |
| advanced indexing | usually copy |
| broadcasting | compatible shape expansion without manual loops |
| `axis` | dimension being reduced or operated along |
| `keepdims` | keeps reduced axes for broadcasting |
| `ravel` | view when possible |
| `flatten` | copy |
| `default_rng` | recommended random generator constructor |
| `allclose` | tolerance-based float comparison |
| `tile` | repeats whole pattern |
| `repeat` | repeats individual elements |
| `meshgrid` | coordinate grids |
| structured array | records with named fields |

## 67. Rapid-Fire Interview Questions

### 1. What is NumPy mainly used for?

Fast numerical work with arrays.

### 2. What is the main NumPy object?

`ndarray`.

### 3. What does `shape` return?
A tuple showing the size of each dimension.

### 4. What does `dtype` tell you?

The type and storage format of each array element.

### 5. What does `axis=0` mean in a 2D aggregation?

Reduce down rows and return one result per column.

### 6. What does `axis=1` mean in a 2D aggregation?

Reduce across columns and return one result per row.

### 7. Does slicing copy data?

Basic slicing usually returns a view.

### 8. Does fancy indexing copy data?

Usually yes.

### 9. Why use `copy()`?

To avoid changing the original array when modifying selected data.

### 10. Why use `np.allclose()`?

To compare floating-point arrays with tolerance.

### 11. What is broadcasting?

Automatic shape compatibility for element-wise operations.

### 12. What is vectorization?

Using array operations instead of Python loops.

### 13. Why is vectorization faster?

The loop runs in optimized compiled code with less Python overhead.

### 14. What is `np.where()`?

A conditional selection function or a way to find matching positions.

### 15. What is `np.argmax()`?

It returns the index of the maximum value.

### 16. What is `np.argmin()`?

It returns the index of the minimum value.

### 17. What is `np.unique(..., return_counts=True)` used for?

Finding unique values and their frequencies.

### 18. What is `np.clip()` used for?

Limiting values to a minimum and maximum range.

### 19. What is `np.meshgrid()` used for?

Creating coordinate grids.

### 20. What is a structured array?

An array with named fields inside each record.

## 68. Practice Interview Set

Try these without looking at the answers first.

### Question 1

Explain why NumPy arrays are faster than Python lists for numerical operations.

### Question 2

Given an array with shape `(5, 1)` and another with shape `(3,)`, what is the result shape after addition?

### Question 3

What happens when you modify an array slice?

### Question 4

Write code to select rows where any value is negative.

### Question 5

Write code to normalize each column.

### Question 6

Write code to get the top 3 values from a 1D array.

### Question 7

Write code to find duplicate values in an array.

### Question 8

Write code to replace NaN values with zero.

### Question 9

Write code to create a 5 by 5 identity matrix.

### Question 10

Write code to save and load a NumPy array.

## 69. Practice Interview Answers

### Answer 1

NumPy arrays are faster because values are stored in a compact typed buffer, and operations run in optimized compiled code instead of Python-level loops.

### Answer 2

Shapes:

```text
(5, 1)
(3,)
```

Result:

```text
(5, 3)
```

### Answer 3

Basic slices usually create views, so modifying the slice can modify the original array.

### Answer 4

```python
arr = np.array([
    [1, 2, 3],
    [4, -1, 6],
    [7, 8, 9],
])

rows = arr[(arr < 0).any(axis=1)]
print(rows)
```

**Explanation**

- A 2D NumPy array `arr` is created with integers, including a negative value (-1).
- The expression `(arr < 0).any(axis=1)` generates a boolean array indicating which rows contain at least one negative value.
- The original array `arr` is indexed with this boolean array to extract the rows that meet the condition.
- The resulting rows are stored in the variable `rows` and printed, showing only the rows with negative values.

### Answer 5

```python
data = np.array([
    [10, 100],
    [20, 150],
    [30, 200],
])

col_min = data.min(axis=0, keepdims=True)
col_max = data.max(axis=0, keepdims=True)

normalized = (data - col_min) / (col_max - col_min)
print(normalized)
```

**Explanation**

- The code initializes a 2D NumPy array named `data` with specific values.
- It calculates the minimum values for each column using `data.min(axis=0, keepdims=True)`, preserving the array's dimensions.
- Similarly, it computes the maximum values for each column with `data.max(axis=0, keepdims=True)`.
- The normalization formula `(data - col_min) / (col_max - col_min)` is applied to scale the data to a range between 0 and 1.
- Finally, the normalized array is printed, showing the transformed values.

### Answer 6

```python
arr = np.array([12, 99, 4, 42, 18, 77])

top_3 = np.sort(arr)[-3:][::-1]
print(top_3)
```

**Explanation**

- The code initializes a NumPy array `arr` containing six integer values.
- It sorts the array in ascending order using `np.sort(arr)`.
- The last three elements of the sorted array, which are the highest values, are selected with `[-3:]`.
- The selected values are then reversed to present them in descending order using `[::-1]`.
- Finally, the top three values are printed to the console.

### Answer 7

```python
arr = np.array([1, 2, 2, 3, 4, 4, 4])

values, counts = np.unique(arr, return_counts=True)
duplicates = values[counts > 1]
print(duplicates)
```

**Explanation**

- The code initializes a NumPy array `arr` containing integers, some of which are duplicated.
- It uses `np.unique()` to find unique values in the array while also counting their occurrences, returning two arrays: `values` and `counts`.
- The `duplicates` array is created by filtering `values` where the corresponding `counts` are greater than 1, indicating duplicates.
- Finally, it prints the `duplicates` array, which contains the values that appear more than once in the original array.

### Answer 8

```python
arr = np.array([1.0, np.nan, 3.0, np.nan])

cleaned = np.where(np.isnan(arr), 0, arr)
print(cleaned)
```

**Explanation**

- The code initializes a NumPy array `arr` containing floating-point numbers, including `NaN` values.
- It uses `np.isnan(arr)` to create a boolean mask identifying the `NaN` elements in the array.
- The `np.where` function replaces `NaN` values with `0` while keeping other values unchanged.
- The resulting array `cleaned` is printed, showing the original values with `NaN` replaced by `0`.

Alternative:

```python
cleaned = np.nan_to_num(arr, nan=0.0)
```

**Explanation**

- Utilizes the `np.nan_to_num()` function from the NumPy library to handle NaN values.
- The input `arr` is a NumPy array that may contain NaN (Not a Number) entries.
- Any NaN values found in `arr` are replaced with `0.0`, ensuring the output array `cleaned` has no NaN values.
- This is useful for data preprocessing, especially before performing mathematical operations or analyses that cannot handle NaN values.

### Answer 9

```python
identity = np.eye(5)
print(identity)
```

**Explanation**

- The code utilizes the NumPy library, which is commonly used for numerical operations in Python.
- `np.eye(5)` creates a 5x5 identity matrix, where all the diagonal elements are 1 and all other elements are 0.
- The `print(identity)` statement outputs the generated identity matrix to the console.
- Identity matrices are useful in various mathematical computations, including linear algebra and transformations.

### Answer 10

```python
arr = np.array([[1, 2], [3, 4]])

np.save("arr.npy", arr)
loaded = np.load("arr.npy")
print(loaded)
```

**Explanation**

- The code creates a 2D NumPy array named `arr` containing the values `[[1, 2], [3, 4]]`.
- It uses `np.save` to save the array to a file called "arr.npy" in binary format.
- The array is then loaded back into memory using `np.load`, retrieving the saved data into the variable `loaded`.
- Finally, the loaded array is printed to the console, displaying its contents.

## 70. Common Mistakes To Avoid

### Mistake 1: Not checking shapes

Most NumPy bugs are shape bugs. Always inspect:

```python
print(arr.shape)
```

**Explanation**

- The code uses the `print` function to output information to the console.
- `arr.shape` accesses the `shape` attribute of a NumPy array, which returns a tuple representing the dimensions of the array.
- This is useful for understanding the structure of the data, such as the number of rows and columns in a 2D array.
- The output will vary depending on the specific shape of the `arr` array being analyzed.
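As a hedged illustration of a typical shape bug, the sketch below uses hypothetical `scores` and `row_weights` arrays (neither appears elsewhere in this guide). Printing the shapes first shows why the direct multiplication fails and why adding an axis fixes it.

```python
import numpy as np

# Hypothetical data: 2 rows of 3 scores, plus one weight per row
scores = np.arange(6).reshape(2, 3)    # shape (2, 3)
row_weights = np.array([0.5, 1.5])     # shape (2,)

# Inspect shapes before combining the arrays
print(scores.shape, row_weights.shape)

# (2, 3) and (2,) do not broadcast: trailing dimensions 3 and 2 differ
try:
    scores * row_weights
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding an axis gives shape (2, 1), which broadcasts across the 3 columns
weighted = scores * row_weights[:, np.newaxis]
print(weighted.shape)
```

Broadcasting aligns shapes from the right, so `(2,)` is compared against the column dimension of `(2, 3)`. Reshaping the weights to `(2, 1)` makes the intent (one weight per row) explicit.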
### Mistake 2: Confusing views and copies

If you modify a slice and the original changes, you probably had a view. When independence matters, use:

```python
arr[2:5].copy()
```

**Explanation**

- The slice `arr[2:5]` selects the elements at indices 2 through 4 of the NumPy array `arr` (index 5 is excluded). On its own, this basic slice is a view of the original data.
- Calling `.copy()` on the slice allocates a new array with its own data instead of returning a view.
- Modifying the copy does not affect the original array.
- The copy contains the elements from the selected range of the original array.

### Mistake 3: Comparing floats with exact equality

When tiny numerical differences are acceptable, use:

```python
np.allclose(a, b)
```

**Explanation**

- `np.allclose()` compares two arrays, `a` and `b`, element by element.
- It returns `True` if all elements are equal within a tolerance, and `False` otherwise.
- This is useful for numerical comparisons where floating-point rounding makes exact equality unreliable.
- The relative and absolute tolerances can be adjusted through the optional `rtol` and `atol` parameters.

### Mistake 4: Using loops for simple array operations

Prefer the following over manual loops when possible:

```python
arr * 2
arr[arr > 0]
arr.sum(axis=1)
```

**Explanation**

- The expression `arr * 2` multiplies each element of `arr` by 2.
- The expression `arr[arr > 0]` keeps only the elements that are greater than zero, returning a new array of positive values.
- The method `arr.sum(axis=1)` sums along axis 1, returning one sum per row for a 2D array.

### Mistake 5: Using `np.append()` repeatedly in a loop

NumPy arrays are fixed-size, so `np.append()` cannot grow an array in place. Each call allocates a new array and copies the existing data, so repeated appends in a loop create repeated allocations.
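A minimal sketch of why this matters, assuming a hypothetical loop that collects the squares of 0 to 4: `np.append()` returns a new array and leaves its input untouched, so each call in the loop copies everything collected so far. The same result can be built with a single conversion or with a preallocated array.

```python
import numpy as np

base = np.array([1, 2, 3])
extended = np.append(base, 4)   # returns a new array; base is unchanged
print(base.shape, extended.shape)

# Slow pattern: every iteration allocates a new array and copies old data
slow = np.array([], dtype=np.int64)
for i in range(5):
    slow = np.append(slow, i * i)

# Alternative 1: collect in a Python list, then convert once
collected = [i * i for i in range(5)]
from_list = np.array(collected, dtype=np.int64)

# Alternative 2: preallocate the final array, then fill it in place
prealloc = np.empty(5, dtype=np.int64)
for i in range(5):
    prealloc[i] = i * i

print(np.array_equal(slow, from_list), np.array_equal(slow, prealloc))
```

For five elements the difference is negligible. The copying cost grows with the array length, so the pattern matters most when the loop runs many times.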
Better approaches:

- collect values in a Python list, then convert once
- preallocate the final NumPy array
- use `concatenate` once when possible

## Final Summary

For NumPy interviews, remember these core ideas:

- `ndarray` is a typed multidimensional array.
- Shape tells you structure; dtype tells you storage and numerical behavior.
- Strides explain how NumPy walks through memory.
- Basic slicing usually creates views.
- Advanced indexing usually creates copies.
- Broadcasting compares shapes from right to left.
- Vectorization avoids Python-level loops.
- `axis` tells NumPy which dimension to operate over or reduce.
- `keepdims=True` keeps dimensions useful for broadcasting.
- `default_rng()` is preferred for modern random number generation.
- `allclose()` is better than exact equality for many floating-point checks.
- Image data, structured records, and grids are all natural NumPy use cases.

The best interview answers are short, accurate, and supported by a small example.

## Sources and Further Reading

- NumPy documentation: https://numpy.org/doc/
- NumPy ndarray reference: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
- NumPy ndarray user guide: https://numpy.org/doc/stable/reference/arrays.ndarray.html
- NumPy copies and views: https://numpy.org/doc/stable/user/basics.copies.html
- NumPy broadcasting guide: https://numpy.org/doc/stable/user/basics.broadcasting.html
- NumPy random Generator: https://numpy.org/doc/stable/reference/random/generator.html
- NumPy strides reference: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.strides.html
- NumPy allclose reference: https://numpy.org/doc/stable/reference/generated/numpy.allclose.html
- NumPy structured arrays: https://numpy.org/doc/stable/user/basics.rec.html
- NumPy I/O routines: https://numpy.org/doc/stable/reference/routines.io.html