# Getting Started with NumPy: Arrays, Indexing, and Operations

- URL: https://madhudadi.in/blog/posts/numpy-basics-arrays-indexing-and-operations-guide
- Published: 2026-05-23
- Tags: Numpy, python
- Read time: 24 min
- Difficulty: beginner

> Learn NumPy from the ground up: create arrays, understand shape and dtype, perform vectorized operations, use axes, slice 1D/2D/3D arrays, reshape data, stack arrays, split arrays, and solve beginner practice tasks.

# NumPy Basics: Arrays, Shapes, Dtypes, Indexing, and Array Operations

NumPy is one of the most important Python libraries for data analysis, machine learning, scientific computing, and numerical programming.

The main idea is simple:

> NumPy lets you store numbers in compact arrays and perform operations on the whole array at once.

If you are learning data science, you will see NumPy everywhere. Pandas, scikit-learn, TensorFlow, PyTorch, image-processing libraries, and many plotting tools all depend on array-style thinking.

This lesson starts from the basics. You will learn how to create arrays, inspect their shape, change data types, perform calculations, slice arrays, reshape data, combine arrays, split arrays, and solve practical beginner problems.

## What you will learn

By the end, you should be able to:

- explain why NumPy arrays are useful
- create 1D, 2D, and 3D arrays
- use `np.array()`, `np.arange()`, `np.zeros()`, `np.ones()`, `np.linspace()`, and `np.eye()`
- inspect `ndim`, `shape`, `size`, `dtype`, and `itemsize`
- convert array data types with `astype()`
- perform scalar and array operations
- use aggregate functions such as `sum`, `mean`, `min`, `max`, and `std`
- understand the meaning of `axis=0` and `axis=1`
- index and slice arrays confidently
- reshape, transpose, flatten, stack, and split arrays
- solve beginner NumPy practice problems

## 1. What Is NumPy?

NumPy is a Python library for working with numerical arrays.

A normal Python list can store values:

```python
scores = [72, 85, 91, 64]
```

That is useful, but if you want to add 5 marks to every score, a list needs a loop:

```python
scores = [72, 85, 91, 64]
updated = []

for score in scores:
    updated.append(score + 5)

print(updated)
```

Output:

```text
[77, 90, 96, 69]
```

With NumPy, you can apply the operation directly:

```python
import numpy as np

scores = np.array([72, 85, 91, 64])
updated = scores + 5

print(updated)
```

Output:

```text
[77 90 96 69]
```

This is called a vectorized operation. You write less code, and NumPy performs the calculation efficiently.

## 2. NumPy Arrays vs Python Lists

Python lists are flexible. They can grow, shrink, and contain mixed types.

```python
mixed = ["Python", 10, True]
print(mixed)
```

NumPy arrays are designed for numerical work. In most practical cases, all values in one array share the same data type.

```python
numbers = np.array([10, 20, 30])
print(numbers)
print(numbers.dtype)
```

Output:

```text
[10 20 30]
int64
```

On some systems, the integer dtype may appear as `int32` instead of `int64`. The exact default can depend on your platform.

Here is the practical difference:

| Feature | Python list | NumPy array |
|---|---|---|
| Best for | General Python objects | Numerical data |
| Mixed data types | Common | Usually avoided |
| Vectorized math | No | Yes |
| Memory layout | Flexible | Compact |
| Data science use | Input/helper structure | Core structure |

Use lists when you need general-purpose Python containers. Use NumPy arrays when you need fast numerical operations.

## 3. Installing and Importing NumPy

If NumPy is not installed, install it with:

```bash
pip install numpy
```

Then import it:

```python
import numpy as np
```

The alias `np` is the standard convention. You will see it in documentation, tutorials, notebooks, and production code.

## 4. Creating a 1D Array

A 1D array is like a simple row of values.

```python
import numpy as np

marks = np.array([80, 75, 92, 68])
print(marks)
print(type(marks))
```

Output:

```text
[80 75 92 68]
<class 'numpy.ndarray'>
```

The object type is `ndarray`, which means n-dimensional array.

## 5. Creating 2D Arrays

A 2D array is like a table with rows and columns.

```python
sales = np.array([
    [120, 135, 150],
    [90, 110, 125],
])

print(sales)
```

Output:

```text
[[120 135 150]
 [ 90 110 125]]
```

This array has:

- 2 rows
- 3 columns

You can think of it as sales data for 2 stores across 3 days.

## 6. Creating 3D Arrays

A 3D array is like multiple tables stacked together.

```python
weekly_sales = np.array([
    [
        [120, 135],
        [90, 110],
    ],
    [
        [140, 160],
        [100, 115],
    ],
])

print(weekly_sales)
```

This can represent:

- 2 weeks
- 2 stores
- 2 days per week

In machine learning, 3D and higher-dimensional arrays are common. Images, batches of images, time-series windows, embeddings, and tensors all use this style of structure.

## 7. Choosing a Data Type With dtype

You can tell NumPy which type to use.

```python
prices = np.array([99, 149, 199], dtype=float)
print(prices)
print(prices.dtype)
```

Output:

```text
[ 99. 149. 199.]
float64
```

You can also create boolean arrays:

```python
availability = np.array([1, 0, 1, 1], dtype=bool)
print(availability)
```

Output:

```text
[ True False  True  True]
```

And complex arrays:

```python
signals = np.array([2, 5, 8], dtype=complex)
print(signals)
```

Output:

```text
[2.+0.j 5.+0.j 8.+0.j]
```

For beginner data analysis, the most common dtypes are integers, floats, booleans, strings, and dates.

## 8. Creating Ranges With np.arange()

`np.arange()` creates values in a range.

```python
numbers = np.arange(1, 8)
print(numbers)
```

Output:

```text
[1 2 3 4 5 6 7]
```

The stop value is not included.

You can add a step:

```python
even_numbers = np.arange(2, 13, 2)
print(even_numbers)
```

Output:

```text
[ 2  4  6  8 10 12]
```

You can count backward:

```python
countdown = np.arange(5, 0, -1)
print(countdown)
```

Output:

```text
[5 4 3 2 1]
```

## 9. Reshaping a Range

`reshape()` changes how values are arranged.

```python
grid = np.arange(1, 13).reshape(3, 4)
print(grid)
```

Output:

```text
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
```

The total number of values must match. This works because:

```text
3 rows x 4 columns = 12 values
```

This will fail with a `ValueError`:

```python
np.arange(1, 13).reshape(5, 3)
```

Why?

```text
5 rows x 3 columns = 15 positions
```

But the array has only 12 values.

## 10. Creating Arrays of Zeros and Ones

`np.zeros()` creates an array filled with zeros.

```python
empty_scores = np.zeros((3, 4))
print(empty_scores)
```

Output:

```text
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
```

`np.ones()` creates an array filled with ones.

```python
default_flags = np.ones((2, 5))
print(default_flags)
```

Output:

```text
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```

These are useful when you want to create a placeholder array before filling it with real values.

## 11. Creating Random Arrays

Random arrays are useful for demos, simulations, and testing.

```python
random_values = np.random.random((2, 3))
print(random_values)
```

This creates a 2 by 3 array with values from 0 (inclusive) up to 1 (exclusive).

For reproducible examples, use a random generator with a seed:

```python
rng = np.random.default_rng(42)
sample = rng.random((2, 3))

print(sample)
```

Using a seed helps you get the same random values every time you run the code.

## 12. Creating Evenly Spaced Values With linspace()

`np.linspace()` creates a fixed number of evenly spaced values between a start and end. Unlike `np.arange()`, the end value is included by default.

```python
temperatures = np.linspace(0, 100, 6)
print(temperatures)
```

Output:

```text
[  0.  20.  40.  60.  80. 100.]
```

Use `linspace()` when you care about how many values you want. Use `arange()` when you care about the step size.

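As a minimal sketch of that contrast, the snippet below builds the same 0-to-100 range both ways. The variable names `by_count` and `by_step` are illustrative only and do not appear elsewhere in this lesson.

```python
import numpy as np

# linspace: you choose how many points you want.
# The end value (100) is included by default.
by_count = np.linspace(0, 100, 6)

# arange: you choose the step size.
# The stop value is excluded, so stop is set just past 100 to keep 100.
by_step = np.arange(0, 101, 20)

print(by_count)
print(by_step)

# Same values, different dtypes: linspace returns floats,
# while arange with integer arguments returns integers.
print(by_count.dtype)
print(by_step.dtype)
```

Both arrays hold the values 0, 20, 40, 60, 80, and 100. The practical difference is which quantity you fix up front: the number of points or the spacing between them.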
## 13. Creating Identity Matrices

An identity matrix has ones on the main diagonal and zeros everywhere else.

```python
identity = np.eye(4)
print(identity)
```

Output:

```text
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
```

Identity matrices are common in linear algebra.

If you want a rectangular matrix with diagonal ones, use `np.eye()` with two dimensions:

```python
wide_identity = np.eye(3, 5)
print(wide_identity)
```

Output:

```text
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]
```

## 14. Important Array Attributes

Create three arrays:

```python
one_d = np.arange(6)
two_d = np.arange(12).reshape(3, 4)
three_d = np.arange(24).reshape(2, 3, 4)
```

### ndim

`ndim` tells you how many dimensions an array has.

```python
print(one_d.ndim)
print(two_d.ndim)
print(three_d.ndim)
```

**Explanation**

- The code reads the `ndim` attribute of each array to find its number of dimensions.
- `one_d`, `two_d`, and `three_d` are the arrays created above, with one, two, and three dimensions respectively.
- The `print` function outputs the number of dimensions for each array to the console.
- This is useful for understanding the structure of the data being handled in scientific computing or data analysis tasks.

Output:

```text
1
2
3
```

### shape

`shape` tells you the size of each dimension.

```python
print(one_d.shape)
print(two_d.shape)
print(three_d.shape)
```

**Explanation**

- The `print` function outputs the shape of each array to the console.
- `one_d.shape` is a one-element tuple containing the array's length.
- `two_d.shape` is a tuple of rows and columns.
- `three_d.shape` is a tuple of blocks, rows per block, and columns per row.
- This code is useful for understanding the structure and size of different types of arrays in numerical computing.

Output:

```text
(6,)
(3, 4)
(2, 3, 4)
```

For `three_d`, the shape means:

```text
2 blocks, 3 rows per block, 4 columns per row
```

### size

`size` tells you the total number of elements.

```python
print(one_d.size)
print(two_d.size)
print(three_d.size)
```

**Explanation**

- The `print` function is used to output the sizes of the arrays to the console.
- `one_d.size`, `two_d.size`, and `three_d.size` access the `size` attribute of each array, which is the total number of elements in the array.
- `one_d`, `two_d`, and `three_d` are the NumPy arrays created at the start of this section.
- Each size equals the product of the values in the array's shape, for example 2 x 3 x 4 = 24 for `three_d`.

Output:

```text
6
12
24
```

### dtype

`dtype` tells you the data type.

```python
print(two_d.dtype)
```

**Explanation**

- The `print` function outputs the result to the console.
- `two_d` is the 2D NumPy array created at the start of this section.
- The `dtype` attribute of a NumPy array provides information about the type of data stored in the array, such as integers, floats, etc.
- This code is useful for debugging or understanding the nature of the data being processed.

### itemsize

`itemsize` tells you how many bytes each element uses.

```python
small = np.array([1, 2, 3], dtype=np.int16)
large = np.array([1, 2, 3], dtype=np.int64)

print(small.itemsize)
print(large.itemsize)
```

**Explanation**

- The code creates two NumPy arrays, `small` and `large`, with different data types: `int16` and `int64`, respectively.
- The `itemsize` attribute of a NumPy array returns the size in bytes of each element in the array.
- The `print` statements output the memory size of each element for both arrays, illustrating the difference in storage requirements between the two data types.
- This comparison is useful for understanding how data types affect memory usage in numerical computations.

Output:

```text
2
8
```

This matters when you work with large datasets.

## 15. Changing Data Types With astype()

Use `astype()` to create a converted copy of an array.

```python
ratings = np.array([4.8, 3.2, 5.0, 2.9])
rounded_ratings = ratings.astype(int)

print(rounded_ratings)
```

**Explanation**

- The code initializes a NumPy array named `ratings` containing floating-point numbers representing ratings.
- The `astype(int)` method converts each floating-point rating to an integer by dropping the decimal part (truncating toward zero).
- The resulting integer array is stored in the variable `rounded_ratings`. Despite the name, the values are truncated, not rounded.
- Finally, the code prints the `rounded_ratings` array to the console, displaying the integer values.

Output:

```text
[4 3 5 2]
```

Be careful: converting floats to integers removes the decimal part. It does not round to the nearest number.

```python
values = np.array([1.9, 2.1, 3.7])
print(values.astype(int))
```

**Explanation**

- The code initializes a NumPy array named `values` containing floating-point numbers.
- The `astype(int)` method is called on the array, which converts each element from float to integer type.
- The `print` function outputs the resulting array, displaying the integer values after conversion.
- This operation truncates the decimal part of each float. For positive values that is the same as rounding down; negative values move toward zero instead.

Output:

```text
[1 2 3]
```

If you want proper rounding, use `np.round()` first:

```python
print(np.round(values).astype(int))
```

**Explanation**

- `np.round(values)` rounds each element in the `values` array to the nearest integer, still stored as a float.
- The result of the rounding is then converted to an integer type using `.astype(int)`.
- This is useful for preparing data for scenarios where integer values are required, such as indexing or counting.

Output:

```text
[2 2 4]
```

## 16. Scalar Operations

A scalar is a single value.

```python
prices = np.array([100, 200, 300])

print(prices + 10)
print(prices * 2)
print(prices / 4)
```

**Explanation**

- Initializes a NumPy array named `prices` containing three integer values: 100, 200, and 300.
- Adds 10 to each element in the `prices` array, resulting in a new array with values [110, 210, 310].
- Multiplies each element in the `prices` array by 2, producing an array with values [200, 400, 600].
- Divides each element in the `prices` array by 4, yielding an array with values [25.0, 50.0, 75.0].

Output:

```text
[110 210 310]
[200 400 600]
[25. 50. 75.]
```

The operation is applied to every element.

You can also compare every element:

```python
marks = np.array([45, 72, 88, 39])
print(marks >= 50)
```

**Explanation**

- The code initializes a NumPy array named `marks` containing four integer values representing scores.
- It uses a comparison operation (`>= 50`) to create a boolean array indicating which scores are greater than or equal to 50.
- The result of the comparison is printed, showing `True` for scores that meet the condition and `False` for those that do not.
- This operation is useful for quickly assessing performance against a passing mark.

Output:

```text
[False  True  True False]
```

This returns a boolean array.

## 17. Array-to-Array Operations

Arrays with the same shape can be added, subtracted, multiplied, and compared element by element.

```python
jan = np.array([120, 90, 150])
feb = np.array([130, 85, 170])

print(feb - jan)
print(feb > jan)
```

**Explanation**

- Two NumPy arrays, `jan` and `feb`, are created to represent values for January and February.
- The expression `feb - jan` calculates the difference between corresponding elements of the two arrays, showing how values changed from January to February.
- The expression `feb > jan` performs an element-wise comparison, returning a boolean array indicating whether each value in February is greater than the corresponding value in January.

Output:

```text
[10 -5 20]
[ True False  True]
```

For 2D arrays:

```python
morning = np.array([
    [8, 10, 12],
    [7, 9, 11],
])

evening = np.array([
    [5, 6, 7],
    [4, 5, 6],
])

print(morning + evening)
```

**Explanation**

- The code initializes two 2D NumPy arrays, `morning` and `evening`, containing integer values.
- It uses the `np.array` function from the NumPy library to create these arrays.
- The `print` function outputs the result of adding the two arrays together, performing element-wise addition.
- The resulting array will have the same shape as the input arrays, with each element being the sum of the corresponding elements from `morning` and `evening`.

Output:

```text
[[13 16 19]
 [11 14 17]]
```

## 18. Useful Array Functions

Create a small dataset:

```python
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])
```

**Explanation**

- Initializes a NumPy array named `orders` to store a 2D matrix.
- The matrix consists of three rows and three columns of order quantities.
- Each inner list is one row of the matrix.
- This structure is useful for performing mathematical operations or analyses on order data.

### Sum

```python
print(np.sum(orders))
```

**Explanation**

- The `np.sum()` function computes the sum of all elements in the `orders` array when no axis is given.
- The result is printed to the console, allowing for immediate visibility of the total sum.
- This operation is efficient for large datasets due to NumPy's optimized performance.

Output:

```text
126
```

### Minimum and maximum

```python
print(np.min(orders))
print(np.max(orders))
```

**Explanation**

- Utilizes the NumPy library, which is commonly used for numerical operations in Python.
- `np.min(orders)` computes the smallest value in the `orders` array, which is then printed.
- `np.max(orders)` computes the largest value in the `orders` array, which is then printed.
- This code is useful for quickly assessing the range of values in a dataset.

Output:

```text
9
21
```

### Mean

```python
print(np.mean(orders))
```

**Explanation**

- Utilizes the `mean` function from the NumPy library to compute the average.
- `orders` can be a NumPy array or a list containing numerical values.
- The result is the average of all nine values: 126 / 9 = 14.0.
- This operation is efficient for large datasets due to NumPy's optimized performance.

Output:

```text
14.0
```

### Standard deviation

```python
print(np.std(orders))
```

**Explanation**

- Utilizes the `np.std()` function from the NumPy library to compute the standard deviation.
- The input `orders` can be a NumPy array or a list containing numerical data.
- Standard deviation measures the amount of variation or dispersion in a set of values.
- The result is printed to the console, providing insight into the variability of the `orders` dataset.

Standard deviation tells you how spread out the values are.

## 19. Understanding axis

The `axis` argument tells NumPy which direction to calculate across.

Use this array:

```python
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])
```

**Explanation**

- Initializes a NumPy array named `orders` to store a 2D matrix.
- The matrix consists of three rows and three columns of order quantities.
- Each inner list is one row, which this section treats as one store; each position within a row is one day.
- This structure is useful for performing mathematical operations or analyses on order data.

Think of it as:

```text
rows -> different stores
columns -> different days
```

### axis=0

`axis=0` works down the rows, so it produces one result per column.

```python
print(np.sum(orders, axis=0))
```

**Explanation**

- The `np.sum()` function calculates the sum of array elements.
- The `axis=0` parameter collapses the row dimension: the values in each column are added together, giving one total per column.
- With rows as stores and columns as days, this gives the total for each day across all stores.

Output:

```text
[35 44 47]
```

This means:

- day 1 total: 12 + 9 + 14 = 35
- day 2 total: 18 + 15 + 11 = 44
- day 3 total: 10 + 21 + 16 = 47

### axis=1

`axis=1` works across the columns, so it produces one result per row.

```python
print(np.sum(orders, axis=1))
```

**Explanation**

- The `np.sum` function calculates the sum of elements along a specified axis.
- The parameter `axis=1` collapses the column dimension: the values in each row are added together, giving one total per row.
- With rows as stores and columns as days, this gives the total for each store across all days.

Output:

```text
[40 45 41]
```

This means:

- store 1 total: 12 + 18 + 10 = 40
- store 2 total: 9 + 15 + 21 = 45
- store 3 total: 14 + 11 + 16 = 41

This is one of the most important NumPy ideas. If your result shape looks wrong, check your axis.

## 20. Mathematical Functions

NumPy includes many mathematical functions.

```python
values = np.array([1, 2, 3, 4])

print(np.sqrt(values))
print(np.exp(values))
print(np.log(values))
```

**Explanation**

- The code initializes a NumPy array called `values` containing integers from 1 to 4.
- It calculates the square root of each element in the array using `np.sqrt()`, which returns an array of square roots.
- The exponential function is applied to each element with `np.exp()`, resulting in an array of e raised to the power of each value.
- The natural logarithm of each element is computed using `np.log()`, producing an array of logarithmic values.

Output:

```text
[1.         1.41421356 1.73205081 2.        ]
[ 2.71828183  7.3890561  20.08553692 54.59815003]
[0.         0.69314718 1.09861229 1.38629436]
```

Trigonometric functions also work:

```python
angles = np.array([0, np.pi / 2, np.pi])
print(np.sin(angles))
```

**Explanation**

- The code creates an array of angles in radians: 0, π/2, and π.
- It then computes the sine of each angle in the array using the `np.sin()` function.
- The result is printed, showing the sine values corresponding to the input angles: 0, 1, and approximately 0.
- This snippet demonstrates the use of vectorized operations in NumPy for efficient mathematical computations.

Output:

```text
[0.0000000e+00 1.0000000e+00 1.2246468e-16]
```

The last value is extremely close to zero. Floating-point calculations sometimes produce tiny approximation errors.

## 21. Rounding Values

```python
measurements = np.array([2.2, 2.8, 3.1, 3.9])

print(np.round(measurements))
print(np.floor(measurements))
print(np.ceil(measurements))
```

**Explanation**

- Initializes a NumPy array called `measurements` with floating-point values.
- Uses `np.round()` to round each element in the array to the nearest integer.
- Applies `np.floor()` to return the largest integer less than or equal to each element.
- Utilizes `np.ceil()` to return the smallest integer greater than or equal to each element.

Output:

```text
[2. 3. 3. 4.]
[2. 2. 3. 3.]
[3. 3. 4. 4.]
```

Use:

- `round` for nearest value
- `floor` for lower integer
- `ceil` for higher integer

## 22. Dot Product

The dot product is a common linear algebra operation.

For two 1D arrays:

```python
weights = np.array([0.2, 0.5, 0.3])
features = np.array([80, 60, 90])

score = np.dot(weights, features)
print(score)
```

**Explanation**

- The `weights` array contains the coefficients that represent the importance of each feature.
- The `features` array holds the values of the features being evaluated.
- The `np.dot()` function computes the dot product of the `weights` and `features`, resulting in a single score that reflects the weighted sum.
- Finally, the calculated score is printed to the console, providing a quantitative assessment based on the input data.

Output:

```text
73.0
```

This is:

```text
0.2*80 + 0.5*60 + 0.3*90
```

For matrices, the inner dimensions must match.

```python
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

print(np.dot(a, b))
```

**Explanation**

- The code initializes two NumPy arrays, `a` and `b`, with specified shapes using `np.arange()` and `reshape()`.
- Array `a` is a 2x3 matrix containing values from 0 to 5, while array `b` is a 3x4 matrix containing values from 0 to 11.
- The `np.dot()` function is used to compute the dot product of the two matrices, resulting in a new 2x4 matrix.
- The result of the dot product is printed to the console, showcasing the multiplication of the two matrices.

Here:

```text
a shape = (2, 3)
b shape = (3, 4)
result shape = (2, 4)
```

## 23. Indexing 1D Arrays

Indexing means selecting values by position.

```python
scores = np.array([55, 70, 82, 91, 64])

print(scores[0])
print(scores[3])
print(scores[-1])
```

**Explanation**

- Initializes a NumPy array named `scores` containing five integer values representing scores.
- Uses `print(scores[0])` to output the first element of the array, which is 55.
- Uses `print(scores[3])` to output the fourth element of the array, which is 91.
- Uses `print(scores[-1])` to output the last element of the array, which is 64, demonstrating negative indexing.

Output:

```text
55
91
64
```

Python indexing starts at zero. Negative indexing starts from the end.

## 24. Slicing 1D Arrays

Slicing selects a range.

```python
scores = np.array([55, 70, 82, 91, 64, 77])

print(scores[1:4])
```

**Explanation**

- The code initializes a NumPy array named `scores` containing six integer values representing scores.
- The slicing operation `scores[1:4]` retrieves elements from index 1 to index 3 (inclusive of 1 and exclusive of 4).
- The `print` function outputs the sliced portion of the array, which consists of the scores 70, 82, and 91.
- This technique is useful for accessing a subset of data within a larger dataset efficiently.

Output:

```text
[70 82 91]
```

The start index is included. The stop index is excluded.

You can add a step:

```python
print(scores[0:6:2])
```

**Explanation**

- The `print` function outputs the result to the console.
- `scores[0:6:2]` uses the same `start:stop:step` slice syntax as Python lists, applied here to a NumPy array.
- The slice starts at index 0 and goes up to, but does not include, index 6.
- The step value of 2 indicates that every second element within the specified range will be selected.
- This is useful for extracting regularly spaced elements from an array.

Output:

```text
[55 82 64]
```

Reverse an array:

```python
print(scores[::-1])
```

**Explanation**

- The slicing syntax `[::-1]` walks the `scores` array from the last element to the first.
- For a NumPy array this returns a reversed view of the same data rather than a separate copy.
- The `print()` function outputs the reversed array to the console.
- The original `scores` array remains unchanged after this operation.

Output:

```text
[77 64 91 82 70 55]
```

## 25. Indexing 2D Arrays

Use row and column positions.

```python
table = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
])

print(table[1, 2])
```

**Explanation**

- A 2D NumPy array named `table` is created with three rows and three columns containing integer values.
- The `print` function is used to output a specific element from the array.
- The element accessed is located at the second row (index 1) and third column (index 2), which corresponds to the value `60`.
- NumPy uses zero-based indexing, meaning the first row and column are indexed as 0.
- This code snippet demonstrates how to retrieve values from a multi-dimensional array efficiently.

Output:

```text
60
```

This means:

```text
row index 1, column index 2
```

Select a full row:

```python
print(table[0, :])
```

**Explanation**

- The `print` function outputs data to the console.
- `table` is the 2D NumPy array defined above. This comma-separated index does not work on a plain list of lists.
- The indexing `[0, :]` selects all columns of the first row (index 0) of the `table`.
- The colon `:` indicates that all elements in that row should be included in the output.
- This snippet is useful for quickly inspecting the contents of the first row in a dataset.

Output:

```text
[10 20 30]
```

Select a full column:

```python
print(table[:, 1])
```

**Explanation**

- The code utilizes NumPy's slicing feature to access specific elements in a 2D array.
- `table[:, 1]` selects all rows (`:`) from the second column (`1`) of the `table` array.
- This operation returns a one-dimensional array containing all values from the specified column.
- It is a common technique for data manipulation and analysis in scientific computing with Python.
- Ensure that `table` is a NumPy array for this slicing syntax to work correctly.

Output:

```text
[20 50 80]
```

Select a smaller block:

```python
print(table[0:2, 1:3])
```

**Explanation**

- The code uses the `print` function to display the output of the slicing operation.
- `table` is the 2D NumPy array defined above.
- The slicing `0:2` indicates that it will select the first two rows (index 0 and 1).
- The slicing `1:3` indicates that it will select the second and third columns (index 1 and 2).
- The result is a smaller 2D array containing the specified rows and columns from the original `table`.

Output:

```text
[[20 30]
 [50 60]]
```

## 26. Indexing 3D Arrays

A 3D array has three indexes:

```text
block, row, column
```

Example:

```python
cube = np.arange(24).reshape(2, 3, 4)
print(cube)
```

**Explanation**

- The code utilizes NumPy to create a 3D array named `cube` with dimensions 2x3x4.
- `np.arange(24)` generates a one-dimensional array with values from 0 to 23.
- The `reshape(2, 3, 4)` method reorganizes this array into a three-dimensional structure.
- The `print(cube)` statement outputs the contents of the 3D array to the console.

The shape is:

```python
print(cube.shape)
```

**Explanation**

- The code uses the `print` function to output the shape of the variable `cube`.
- `cube` is the 3D NumPy array created above.
- The `shape` attribute returns a tuple indicating the size of each dimension of the array.
- This information is useful for understanding the structure and dimensions of the data being processed.

Output:

```text
(2, 3, 4)
```

Get one block:

```python
print(cube[0])
```

**Explanation**

- The code uses the `print()` function to output data to the console.
- `cube[0]` selects the first block of the 3D array, which is a 2D array with 3 rows and 4 columns.
- Indexing is zero-based, so the first block is accessed with index `0`.

Get one row from one block:

```python
print(cube[1, 2])
```

**Explanation**

- The code retrieves one row from the 3D array `cube`: block index `1` (the second block) and row index `2` (the third row of that block).
- The result is a 1D array with 4 values, here `[20 21 22 23]`.
- The `print` function outputs that row to the console.
- `cube` must be a NumPy array for this comma-separated index to work; a plain list of lists would raise an error.

Get one value:

```python
print(cube[1, 2, 3])
```

**Explanation**

- The code snippet prints the value located at the coordinates (1, 2, 3) in the 3D array `cube`, which is `23`.
- The three indexes select a block, then a row within that block, then a column within that row.
- The indices are zero-based, meaning that (1, 2, 3) refers to the second block, third row, and fourth column.
- This operation is commonly used in data manipulation and scientific computing to retrieve specific data points from structured datasets.

When slicing 3D arrays, say the dimensions out loud:

```text
which block?
which row?
which column?
```

That habit makes indexing much less confusing.

## 27. Iterating Over Arrays

For a 1D array, a loop gives individual values:

```python
arr = np.array([10, 20, 30])

for value in arr:
    print(value)
```

**Explanation**

- Initializes a NumPy array `arr` with three integer elements: 10, 20, and 30.
- Uses a for loop to iterate over each element in the array.
- Prints each element of the array to the console, one per line.

For a 2D array, a loop gives rows:

```python
matrix = np.arange(6).reshape(2, 3)

for row in matrix:
    print(row)
```

**Explanation**

- The code uses NumPy's `arange` function to generate an array of integers from 0 to 5.
- The `reshape` method is then called to transform this 1D array into a 2D array with 2 rows and 3 columns.
- A for loop iterates over each row of the 2D array, allowing individual rows to be printed.
- Each row is printed as a separate NumPy array, showcasing the structure of the 2D matrix.

Output:

```text
[0 1 2]
[3 4 5]
```

If you want every element regardless of dimensions, use `np.nditer()`:

```python
for value in np.nditer(matrix):
    print(value)
```

**Explanation**

- Utilizes `np.nditer`, a NumPy iterator that visits every element of an array regardless of how many dimensions it has.
- The loop iterates through each element in the specified `matrix`, allowing access to each value sequentially.
- The `print(value)` statement outputs each element to the console, displaying the contents of the matrix.
- The loop body still runs in Python, so this is not a replacement for vectorized operations on large arrays.

Use normal vectorized operations when possible. Iteration is useful for learning, debugging, or special cases, but NumPy is usually strongest when you avoid Python loops.

## 28. Transpose

Transpose swaps rows and columns.

```python
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(matrix.T)
```

**Explanation**

- The code initializes a 2D NumPy array named `matrix` with two rows and three columns.
- The `np.array` function is used to create the array from a list of lists.
- The `print(matrix.T)` statement outputs the transposed version of the array, where rows become columns and vice versa.
- The `.T` attribute is a convenient way to access the transpose of a NumPy array.
- This operation is useful in various mathematical and data manipulation tasks where the orientation of data needs to be changed.

Output:

```text
[[1 4]
 [2 5]
 [3 6]]
```

You can also use:

```python
print(np.transpose(matrix))
```

**Explanation**

- Utilizes the `transpose` function from the NumPy library to switch the rows and columns of the input matrix.
- The `matrix` variable should be a NumPy array or a compatible structure for the function to work correctly.
- The result is a matrix where the first row becomes the first column, the second row becomes the second column, and so on.
- This operation is commonly used in linear algebra and data manipulation tasks.

## 29. Flattening With ravel()

`ravel()` turns a multi-dimensional array into a 1D view when possible.

```python
matrix = np.arange(12).reshape(3, 4)
flat = matrix.ravel()

print(flat)
```

**Explanation**

- The code initializes a 3x4 matrix using `np.arange(12)`, which creates an array of integers from 0 to 11.
- The `reshape(3, 4)` method reshapes the array into a 3-row by 4-column format.
- The `ravel()` method is called on the matrix to flatten it into a one-dimensional array.
- Finally, the flattened array is printed, displaying the elements in a single row.

Output:

```text
[ 0  1  2  3  4  5  6  7  8  9 10 11]
```

This is useful when a function expects a simple 1D input.

## 30. Horizontal and Vertical Stacking

Stacking combines arrays.

Create two arrays with the same shape:

```python
left = np.array([
    [1, 2],
    [3, 4],
])

right = np.array([
    [10, 20],
    [30, 40],
])
```

**Explanation**

- `left` is a 2x2 NumPy array containing the integers 1, 2, 3, and 4.
- `right` is another 2x2 NumPy array containing the integers 10, 20, 30, and 40.
- Both arrays have shape `(2, 2)`, so they can be stacked in either direction.

### Horizontal stack

`hstack()` joins arrays side by side.

```python
print(np.hstack((left, right)))
```

**Explanation**

- `np.hstack` takes a tuple of arrays, here `(left, right)`, and joins them along the horizontal axis.
- The inputs must have the same number of rows.
- The result keeps the 2 rows and combines the columns, so its shape is `(2, 4)`.

Output:

```text
[[ 1  2 10 20]
 [ 3  4 30 40]]
```

### Vertical stack

`vstack()` joins arrays top to bottom.

```python
print(np.vstack((left, right)))
```

**Explanation**

- `np.vstack` takes a tuple of arrays, here `(left, right)`, and stacks them on top of each other.
- The inputs must have the same number of columns.
- The result keeps the 2 columns and combines the rows, so its shape is `(4, 2)`, not the `(2, 2)` shape of the inputs.

Output:

```text
[[ 1  2]
 [ 3  4]
 [10 20]
 [30 40]]
```

The shapes must be compatible. If stacking fails, print the shapes first.

```python
print(left.shape)
print(right.shape)
```

**Explanation**

- The code uses the `shape` attribute to retrieve the dimensions of two NumPy arrays, `left` and `right`.
- `print(left.shape)` outputs the size of the `left` array along each dimension.
- `print(right.shape)` does the same for the `right` array.
- Comparing the two tuples shows whether the row counts (for `hstack`) or the column counts (for `vstack`) match.

## 31. Splitting Arrays

Splitting breaks an array into smaller arrays.

```python
data = np.arange(12).reshape(3, 4)
print(data)
```

**Explanation**

- `np.arange(12)` generates the integers 0 to 11.
- `reshape(3, 4)` turns this 1D array into a 2D array with 3 rows and 4 columns.
- The reshaped array is printed so you can see its layout before splitting it.

Output:

```text
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
```

### Horizontal split

```python
parts = np.hsplit(data, 2)

for part in parts:
    print(part)
```

**Explanation**

- `np.hsplit(data, 2)` cuts `data` along its columns into two equal parts.
- `parts` is a list of two arrays, each with shape `(3, 2)`.
- The `for` loop prints each part in turn.

This splits the columns into 2 equal parts.

### Vertical split

```python
rows = np.vsplit(data, 3)

for row_group in rows:
    print(row_group)
```

**Explanation**

- `np.vsplit(data, 3)` cuts `data` along its rows into three equal parts.
- `rows` is a list of three arrays, each with shape `(1, 4)`.
- The `for` loop prints each group of rows in turn.

This splits the rows into 3 equal parts.
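To make the two splits above concrete, here is a minimal sketch that prints the shape of every piece. The names `col_parts` and `row_parts` are chosen for this illustration only.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Split the 4 columns into 2 groups of 2 columns each
col_parts = np.hsplit(data, 2)

# Split the 3 rows into 3 groups of 1 row each
row_parts = np.vsplit(data, 3)

print([part.shape for part in col_parts])  # [(3, 2), (3, 2)]
print([part.shape for part in row_parts])  # [(1, 4), (1, 4), (1, 4)]

# The first column group holds columns 0 and 1 of every row
print(col_parts[0])
```

Checking the shapes of the pieces is a quick way to confirm that a split went along the direction you intended.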
If the array cannot be split evenly, NumPy raises an error.

## 32. Beginner Mistakes to Avoid

### Mistake 1: Forgetting that reshape must preserve size

```python
np.arange(10).reshape(3, 4)
```

**Explanation**

- `np.arange(10)` generates an array of 10 integers, from 0 to 9.
- `reshape(3, 4)` asks for 3 rows and 4 columns, which needs 3 * 4 = 12 values.
- The element count (10) does not match the requested size (12), so NumPy raises a `ValueError` instead of returning an array.

This fails because 10 values cannot fill 12 positions.

### Mistake 2: Confusing axis directions

For a 2D array:

- `axis=0` gives column-wise results (one result per column)
- `axis=1` gives row-wise results (one result per row)

### Mistake 3: Expecting lists and arrays to behave the same

```python
print([1, 2, 3] * 2)
print(np.array([1, 2, 3]) * 2)
```

**Explanation**

- The first line multiplies a Python list `[1, 2, 3]` by `2`, which repeats the list: `[1, 2, 3, 1, 2, 3]`.
- The second line creates a NumPy array from `[1, 2, 3]` and multiplies each element by `2`, producing `[2 4 6]`.
- The same `*` operator therefore means repetition for lists and element-wise multiplication for arrays.
- The second line requires `import numpy as np`.

Output:

```text
[1, 2, 3, 1, 2, 3]
[2 4 6]
```

Lists repeat. NumPy arrays multiply element by element.

### Mistake 4: Not checking shape before operations

When something fails, print:

```python
print(arr.shape)
print(arr.dtype)
```

**Explanation**

- The `print(arr.shape)` statement outputs the dimensions of the NumPy array `arr`, indicating how many elements are along each axis.
- The `print(arr.dtype)` statement shows the data type of the elements in the array, such as integers, floats, or strings.
- Shape mismatches and unexpected dtypes are behind many failed operations, so these two lines are a good first check when debugging.

This small habit solves many beginner errors.

## 33. Practice Exercises

Try these before reading the solutions.

### Exercise 1: Create a mostly-zero vector

Create an array of size 10 filled with zeros. Change the value at index 4 to `1`.

### Exercise 2: Random score table

Create a random array with shape `(4, 3)` to represent marks for 4 students in 3 tests.

Print:

- the array
- the average of all marks
- the average mark for each student

### Exercise 3: Border matrix

Write a function `make_border(rows, cols)` that returns a 2D array with ones on the border and zeros inside.

For `rows=4` and `cols=5`, the result should look like:

```text
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]
```

### Exercise 4: Values between 0 and 1

Create 8 evenly spaced values between 0 and 1, excluding both 0 and 1.

### Exercise 5: Row pattern

Create a 5 by 5 matrix where every row is:

```text
[0 1 2 3 4]
```

### Exercise 6: Distance from a point

You have coordinate points:

```python
points = np.array([
    [2, 3],
    [5, 7],
    [1, 8],
    [9, 4],
])
```

**Explanation**

- Creates a NumPy array named `points` that holds four 2D coordinates.
- Each inner list is one point, with the x-coordinate first and the y-coordinate second.
- The array has shape `(4, 2)`: one row per point and one column per coordinate.
- This layout lets you compute something for every point at once instead of looping.
Calculate the distance of every point from: ```python target = np.array([3, 4]) ``` **Explanation** - Initializes a NumPy array named `target` containing the elements 3 and 4. - The array can be used for mathematical operations, such as vector calculations in data analysis or machine learning. - NumPy is a powerful library in Python for numerical computations, providing efficient storage and operations on large datasets. ### Exercise 7: Replace odd values Create an array from 0 to 9. Replace odd values with `-1`. ### Exercise 8: Column swap Create a 3 by 3 array from 1 to 9. Swap the first and last columns. ### Exercise 9: Row normalization Create a 3 by 4 random integer array. Normalize each row using: ```text (row - row_min) / (row_max - row_min) ``` ### Exercise 10: Nth largest value Write a function `nth_largest(arr, n)` that returns the nth largest value from a 1D NumPy array. ## 34. Practice Solutions ### Solution 1: Create a mostly-zero vector ```python import numpy as np vector = np.zeros(10) vector[4] = 1 print(vector) ``` **Explanation** - The code imports the NumPy library, which is essential for numerical operations in Python. - A NumPy array named `vector` is initialized with ten elements, all set to zero using `np.zeros(10)`. - The fifth element (index 4) of the array is then set to one, modifying the initial array of zeros. - Finally, the modified array is printed, displaying a vector with a single one at the fifth position and zeros elsewhere. ### Solution 2: Random score table ```python rng = np.random.default_rng(7) marks = rng.integers(0, 101, size=(4, 3)) print(marks) print("Overall average:", marks.mean()) print("Student averages:", marks.mean(axis=1)) ``` **Explanation** - Initializes a random number generator with a fixed seed of 7 for reproducibility. - Generates a 4x3 array of random integers between 0 and 100, simulating marks for 4 students across 3 subjects. - Prints the generated marks array to the console. 
- Calculates and prints the overall average of all marks in the array. - Computes and displays the average marks for each student by averaging across the subjects (axis=1). ### Solution 3: Border matrix ```python def make_border(rows, cols): if rows < 2 or cols < 2: raise ValueError("rows and cols must both be at least 2") result = np.ones((rows, cols)) result[1:-1, 1:-1] = 0 return result print(make_border(4, 5)) ``` **Explanation** - The function `make_border` takes two parameters, `rows` and `cols`, which define the dimensions of the matrix. - It raises a `ValueError` if either `rows` or `cols` is less than 2, ensuring a valid border can be created. - A NumPy array filled with ones is initialized, representing the outer border of the matrix. - The inner section of the matrix (excluding the border) is set to zero, creating a clear distinction between the border and the inner area. - The function returns the resulting matrix, which can be printed or used for further processing. ### Solution 4: Values between 0 and 1 ```python values = np.linspace(0, 1, 10)[1:-1] print(values) ``` **Explanation** - Uses NumPy's `linspace` function to create an array of 10 evenly spaced values between 0 and 1. - The slicing operation `[1:-1]` removes the first and last elements of the generated array, effectively excluding 0 and 1. - The resulting array contains 8 values, which are printed to the console. - This technique is useful for generating test data or parameters for simulations where endpoints are not required. Why 10 values? Because including 0 and 1 gives 10 points, and removing both ends leaves 8 inner points. ### Solution 5: Row pattern ```python pattern = np.zeros((5, 5), dtype=int) pattern += np.arange(5) print(pattern) ``` **Explanation** - Initializes a 5x5 matrix filled with zeros using NumPy's `zeros` function. - Uses `np.arange(5)` to generate an array of integers from 0 to 4. 
- The addition operation (`+=`) adds the array to each row of the matrix, resulting in each row containing the same incremental values. - The final output displays the modified matrix, where each row contains the values [0, 1, 2, 3, 4]. Alternative: ```python pattern = np.tile(np.arange(5), (5, 1)) print(pattern) ``` **Explanation** - The `np.arange(5)` function generates a 1D array containing integers from 0 to 4. - The `np.tile()` function is used to repeat this 1D array 5 times along the vertical axis, creating a 2D array. - The resulting `pattern` variable is a 5x5 array where each row is identical and contains the sequence [0, 1, 2, 3, 4]. - The `print(pattern)` statement outputs the 2D array to the console for visualization. ### Solution 6: Distance from a point ```python points = np.array([ [2, 3], [5, 7], [1, 8], [9, 4], ]) target = np.array([3, 4]) distances = np.sqrt(np.sum((points - target) ** 2, axis=1)) print(distances) ``` **Explanation** - Initializes a NumPy array `points` containing multiple 2D coordinates. - Defines a `target` point as a NumPy array for which distances will be calculated. - Computes the Euclidean distance from the `target` to each point in `points` using the formula √((x2 - x1)² + (y2 - y1)²). - Utilizes broadcasting to subtract the `target` from each point and squares the result before summing along the specified axis. - Outputs the calculated distances as a NumPy array. Explanation: - `points - target` subtracts the target from every point - `** 2` squares the differences - `sum(axis=1)` adds x and y differences for each point - `sqrt()` calculates the final distance ### Solution 7: Replace odd values ```python arr = np.arange(10) arr[arr % 2 == 1] = -1 print(arr) ``` **Explanation** - Initializes a NumPy array `arr` containing integers from 0 to 9 using `np.arange(10)`. - Utilizes boolean indexing to identify odd numbers in the array with the condition `arr % 2 == 1`. - Replaces all identified odd numbers in the array with -1. 
- Prints the modified array, showing even numbers unchanged and odd numbers replaced.

Output:

```text
[ 0 -1  2 -1  4 -1  6 -1  8 -1]
```

### Solution 8: Column swap

```python
matrix = np.arange(1, 10).reshape(3, 3)
swapped = matrix[:, [2, 1, 0]]

print(matrix)
print(swapped)
```

**Explanation**

- `np.arange(1, 10)` generates the numbers 1 to 9, and `reshape(3, 3)` arranges them into a 3x3 matrix.
- `matrix[:, [2, 1, 0]]` keeps every row and selects the columns in the order 2, 1, 0.
- For a 3-column array, reversing the columns is the same as swapping the first and last columns, because the middle column stays in place.
- Indexing with a list returns a new array, so `matrix` itself is unchanged. Both arrays are printed for comparison.

### Solution 9: Row normalization

```python
rng = np.random.default_rng(10)
data = rng.integers(1, 50, size=(3, 4))

row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)

normalized = (data - row_min) / (row_max - row_min)

print(data)
print(normalized)
```

**Explanation**

- Initializes a random number generator with a fixed seed for reproducibility.
- Generates a 3x4 array of random integers from 1 to 49. The upper bound `50` is exclusive by default.
- Computes the minimum and maximum of each row. With `keepdims=True`, each result has shape `(3, 1)` instead of `(3,)`.
- Applies min-max normalization so that each row is scaled to the range 0 to 1.
- Prints both the original integer array and the normalized array.

`keepdims=True` keeps the result as a column shape, which allows NumPy broadcasting to work cleanly across each row. If every value in a row were equal, `row_max - row_min` would be 0 for that row and the division would produce `nan` values.
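To see what `keepdims=True` changes, the sketch below compares the shapes with and without it on a small hand-written array. It also adds one extra that is not part of the solution above: a guard for rows whose values are all equal. Replacing a zero range with 1 is an assumption made for this illustration, and `safe_range` is a hypothetical name.

```python
import numpy as np

data = np.array([
    [2, 4, 6, 10],
    [5, 5, 5, 5],   # constant row: max - min is 0
    [1, 3, 9, 5],
])

# Without keepdims the row minimums are a flat 1D array
print(data.min(axis=1).shape)                  # (3,)

# With keepdims they stay as a column that broadcasts across each row
print(data.min(axis=1, keepdims=True).shape)   # (3, 1)

row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)
row_range = row_max - row_min

# Assumption for this sketch: treat a zero range as 1,
# so a constant row normalizes to all zeros instead of nan
safe_range = np.where(row_range == 0, 1, row_range)

normalized = (data - row_min) / safe_range
print(normalized)
```

Whether a constant row should become zeros, stay unchanged, or raise an error depends on the task, so treat the guard as one option rather than the rule.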
### Solution 10: Nth largest value

```python
def nth_largest(arr, n):
    if not isinstance(arr, np.ndarray):
        raise TypeError("arr must be a NumPy array")

    if arr.ndim != 1:
        raise ValueError("arr must be 1D")

    if n < 1 or n > arr.size:
        raise ValueError("n is outside the valid range")

    sorted_arr = np.sort(arr)
    return sorted_arr[-n]


numbers = np.array([12, 4, 99, 18, 42])

print(nth_largest(numbers, 1))
print(nth_largest(numbers, 3))
```

**Explanation**

- Defines a function `nth_largest` that returns the n-th largest element of a NumPy array.
- Validates that the input is a 1D NumPy array and that `n` is between 1 and the array size.
- Sorts the array in ascending order with `np.sort()` and reads the n-th element from the end using negative indexing.
- Duplicate values count as separate positions, so in `[5, 5, 3]` both the 1st and 2nd largest values are `5`.
- Demonstrates the function on a sample array, printing the largest and the third largest elements.

Output:

```text
99
18
```

## 35. Mini Project: Analyze Weekly Store Sales

Let us combine the basics into one small task.

Suppose you have sales from 4 stores across 7 days:

```python
sales = np.array([
    [120, 135, 150, 160, 155, 170, 180],
    [90, 95, 105, 110, 108, 120, 130],
    [200, 210, 190, 220, 230, 240, 250],
    [60, 75, 80, 85, 90, 95, 100],
])
```

**Explanation**

- A 2D NumPy array named `sales` stores the sales figures, with shape `(4, 7)`.
- Each row is one store, and each column is one day of the week.
- With this layout, `axis=1` works across the days of each store and `axis=0` works across the stores for each day.
Find: - total sales for each store - total sales for each day - best store - best day - normalized sales for each store Solution: ```python store_totals = sales.sum(axis=1) day_totals = sales.sum(axis=0) best_store_index = np.argmax(store_totals) best_day_index = np.argmax(day_totals) store_min = sales.min(axis=1, keepdims=True) store_max = sales.max(axis=1, keepdims=True) normalized_sales = (sales - store_min) / (store_max - store_min) print("Store totals:", store_totals) print("Day totals:", day_totals) print("Best store index:", best_store_index) print("Best day index:", best_day_index) print("Normalized sales:") print(normalized_sales) ``` **Explanation** - Computes total sales for each store and each day using the `sum` function with specified axes. - Identifies the index of the store with the highest total sales and the day with the highest sales using `np.argmax`. - Calculates the minimum and maximum sales for each store to facilitate normalization. - Normalizes the sales data to a range of 0 to 1 by applying the formula `(sales - min) / (max - min)`. - Outputs the total sales, best store and day indices, and the normalized sales data for further analysis. This small project uses: - 2D arrays - axis-based aggregation - `argmax` - row-wise normalization - broadcasting These are the same building blocks used in real data analysis. ## 36. Quick Quiz ### 1. What is the difference between `np.arange()` and `np.linspace()`? `np.arange()` focuses on step size. `np.linspace()` focuses on the number of values. ### 2. What does `shape` tell you? It tells you how many elements exist along each dimension. ### 3. What does `axis=0` mean in a 2D array? It means calculate down the rows, producing one result for each column. ### 4. Why does `reshape()` sometimes fail? Because the new shape must contain the same total number of elements as the original array. ### 5. Why are vectorized operations useful? 
They let you apply operations to whole arrays with cleaner code and usually better performance than Python loops. ## Final Takeaway NumPy becomes easier when you focus on four questions: 1. What is the array shape? 2. What is the array dtype? 3. Which axis do I want? 4. Am I selecting, reshaping, combining, or calculating? If you can answer these questions, most beginner NumPy code becomes predictable. Start small. Print shapes often. Practice slicing. Use vectorized operations. Over time, NumPy will feel less like a library of random functions and more like a clear way to think about data. ## Sources and Further Reading - NumPy documentation: https://numpy.org/doc/ - NumPy quickstart: https://numpy.org/doc/stable/user/quickstart.html - NumPy ndarray reference: https://numpy.org/doc/stable/reference/arrays.ndarray.html - NumPy random generator guide: https://numpy.org/doc/stable/reference/random/generator.html