NumPy Basics: Arrays, Indexing, and Operations Guide

May 23, 2026

Quick Answer

Learn NumPy from the ground up: create arrays, understand shape and dtype, perform vectorized operations, use axes, slice 1D/2D/3D arrays, reshape data, stack arrays, split arrays, and solve beginner practice tasks.

Quick Summary

Learn NumPy fundamentals: create arrays, index data, and perform operations efficiently for data analysis and machine learning.

NumPy Basics: Arrays, Shapes, Dtypes, Indexing, and Array Operations

NumPy is one of the most important Python libraries for data analysis, machine learning, scientific computing, and numerical programming.

The main idea is simple:

NumPy lets you store numbers in compact arrays and perform operations on the whole array at once.

If you are learning data science, you will see NumPy everywhere. Pandas, scikit-learn, TensorFlow, PyTorch, image-processing libraries, and many plotting tools all depend on array-style thinking.

This lesson starts from the basics. You will learn how to create arrays, inspect their shape, change data types, perform calculations, slice arrays, reshape data, combine arrays, split arrays, and solve practical beginner problems.

What you will learn

By the end, you should be able to:

  • explain why NumPy arrays are useful
  • create 1D, 2D, and 3D arrays
  • use np.array(), np.arange(), np.zeros(), np.ones(), np.linspace(), and np.eye()
  • inspect ndim, shape, size, dtype, and itemsize
  • convert array data types with astype()
  • perform scalar and array operations
  • use aggregate functions such as sum, mean, min, max, and std
  • understand the meaning of axis=0 and axis=1
  • index and slice arrays confidently
  • reshape, transpose, flatten, stack, and split arrays
  • solve beginner NumPy practice problems

1. What Is NumPy?

NumPy is a Python library for working with numerical arrays.

A normal Python list can store values:

python
scores = [72, 85, 91, 64]

That is useful, but if you want to add 5 marks to every score, a list needs a loop:

python
scores = [72, 85, 91, 64]

updated = []
for score in scores:
    updated.append(score + 5)

print(updated)

Output:

text
[77, 90, 96, 69]

With NumPy, you can apply the operation directly:

python
import numpy as np

scores = np.array([72, 85, 91, 64])
updated = scores + 5

print(updated)

Output:

text
[77 90 96 69]

This is called a vectorized operation. You write less code, and NumPy performs the calculation efficiently because the loop over the elements runs in compiled code rather than in Python.

2. NumPy Arrays vs Python Lists

Python lists are flexible. They can grow, shrink, and contain mixed types.

python
mixed = ["Python", 10, True]
print(mixed)

NumPy arrays are designed for numerical work. Every element in an array shares a single data type, called its dtype.

python
numbers = np.array([10, 20, 30])
print(numbers)
print(numbers.dtype)

Output:

text
[10 20 30]
int64

On some systems, the integer dtype may appear as int32 instead of int64. The exact default can depend on your platform.

Here is the practical difference:

Feature | Python list | NumPy array
Best for | General Python objects | Numerical data
Mixed data types | Common | Usually avoided
Vectorized math | No | Yes
Memory layout | Flexible | Compact
Data science use | Input/helper structure | Core structure

Use lists when you need general-purpose Python containers. Use NumPy arrays when you need fast numerical operations.
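
One concrete way to see the difference is that the same operators mean different things for the two types. The following is a minimal sketch; the variable names are illustrative and not part of the lesson.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# For a list, * repeats the sequence and + concatenates.
list_times_two = py_list * 2          # [1, 2, 3, 1, 2, 3]
list_plus_list = py_list + py_list    # [1, 2, 3, 1, 2, 3]

# For an array, the same operators do element-wise math.
array_times_two = np_array * 2            # values 2, 4, 6
array_plus_array = np_array + np_array    # values 2, 4, 6

print(list_times_two)
print(array_times_two)
```

If a calculation unexpectedly makes your data longer instead of changing its values, you are probably still holding a list.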

3. Installing and Importing NumPy

If NumPy is not installed, install it with:

bash
pip install numpy

Then import it:

python
import numpy as np

The alias np is the standard convention. You will see it in documentation, tutorials, notebooks, and production code.

4. Creating a 1D Array

A 1D array is like a simple row of values.

python
import numpy as np

marks = np.array([80, 75, 92, 68])

print(marks)
print(type(marks))

Output:

text
[80 75 92 68]
<class 'numpy.ndarray'>

The object type is ndarray, which means n-dimensional array.

5. Creating 2D Arrays

A 2D array is like a table with rows and columns.

python
sales = np.array([
    [120, 135, 150],
    [90, 110, 125],
])

print(sales)

Output:

text
[[120 135 150]
 [ 90 110 125]]

This array has:

  • 2 rows
  • 3 columns

You can think of it as sales data for 2 stores across 3 days.

6. Creating 3D Arrays

A 3D array is like multiple tables stacked together.

python
weekly_sales = np.array([
    [
        [120, 135],
        [90, 110],
    ],
    [
        [140, 160],
        [100, 115],
    ],
])

print(weekly_sales)

This can represent:

  • 2 weeks
  • 2 stores
  • 2 days per week

In machine learning, 3D and higher-dimensional arrays are common. Images, batches of images, time-series windows, embeddings, and tensors all use this style of structure.
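
To connect the "stacked tables" picture to code, here is a small sketch that reuses the weekly_sales values from above and pulls out one table and one row. The names week_two and store_one_week_two are illustrative.

```python
import numpy as np

weekly_sales = np.array([
    [[120, 135], [90, 110]],
    [[140, 160], [100, 115]],
])

print(weekly_sales.shape)                 # (2, 2, 2): weeks, stores, days

week_two = weekly_sales[1]                # the second 2x2 table
store_one_week_two = weekly_sales[1, 0]   # first store in the second week

print(week_two)
print(store_one_week_two)                 # [140 160]
```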

7. Choosing a Data Type With dtype

You can tell NumPy which type to use.

python
prices = np.array([99, 149, 199], dtype=float)

print(prices)
print(prices.dtype)

Output:

text
[ 99. 149. 199.]
float64

You can also create boolean arrays:

python
availability = np.array([1, 0, 1, 1], dtype=bool)

print(availability)

Output:

text
[ True False  True  True]

And complex arrays:

python
signals = np.array([2, 5, 8], dtype=complex)

print(signals)

Output:

text
[2.+0.j 5.+0.j 8.+0.j]

For beginner data analysis, the most common dtypes are integers, floats, booleans, strings, and dates.
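
When you do not pass dtype, NumPy picks one type that can hold every value you supplied. The sketch below shows that behavior with a few illustrative arrays; the exact integer width (int64 or int32) depends on your platform.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
ints_and_float = np.array([1, 2, 3.5])   # one float promotes the whole array
ints_and_bool = np.array([True, 2, 3])   # booleans are promoted to integers

print(ints_only.dtype)        # an integer dtype
print(ints_and_float.dtype)   # float64
print(ints_and_float)         # [1.  2.  3.5]
print(ints_and_bool)          # [1 2 3]
```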

8. Creating Ranges With np.arange()

np.arange() creates values in a range.

python
numbers = np.arange(1, 8)

print(numbers)

Output:

text
[1 2 3 4 5 6 7]

The stop value is not included.

You can add a step:

python
even_numbers = np.arange(2, 13, 2)

print(even_numbers)

Output:

text
[ 2  4  6  8 10 12]

You can count backward:

python
countdown = np.arange(5, 0, -1)

print(countdown)

Output:

text
[5 4 3 2 1]

9. Reshaping a Range

reshape() changes how values are arranged.

python
grid = np.arange(1, 13).reshape(3, 4)

print(grid)

Output:

text
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

The total number of values must match.

This works because:

text
3 rows x 4 columns = 12 values

This will fail:

python
np.arange(1, 13).reshape(5, 3)

Why?

text
5 rows x 3 columns = 15 positions

But the array has only 12 values.
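
The failure shows up as a ValueError. As a sketch of how you might handle it, the example below catches the error and then uses -1, which asks NumPy to work out one dimension from the total size. The variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 13)   # 12 values

try:
    values.reshape(5, 3)    # 15 positions, so this cannot work
except ValueError as error:
    print("reshape failed:", error)

# -1 tells NumPy to infer that dimension: 12 values / 4 columns = 3 rows.
four_columns = values.reshape(-1, 4)
print(four_columns.shape)   # (3, 4)
```

Only one dimension can be -1 in a single reshape call.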

10. Creating Arrays of Zeros and Ones

np.zeros() creates an array filled with zero.

python
empty_scores = np.zeros((3, 4))

print(empty_scores)

Output:

text
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

np.ones() creates an array filled with one.

python
default_flags = np.ones((2, 5))

print(default_flags)

Output:

text
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

These are useful when you want to create a placeholder array before filling it with real values.
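
A sketch of that placeholder pattern, assuming a hypothetical attendance table with 2 classes and 3 days. Passing dtype=int avoids the float64 default, and np.full() covers constants other than 0 and 1.

```python
import numpy as np

# dtype=int gives whole-number zeros instead of 0.0 values.
attendance = np.zeros((2, 3), dtype=int)

# Fill the placeholder once real values are known.
attendance[0] = [28, 30, 27]   # whole first row
attendance[1, 2] = 25          # a single cell

print(attendance)

# np.full() fills with any constant you choose.
sevens = np.full((2, 2), 7)
print(sevens)
```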

11. Creating Random Arrays

Random arrays are useful for demos, simulations, and testing.

python
random_values = np.random.random((2, 3))

print(random_values)

This creates a 2 by 3 array of values drawn uniformly from the interval [0, 1): 0 can occur, but 1 cannot.

For reproducible examples, use a random generator with a seed:

python
rng = np.random.default_rng(42)
sample = rng.random((2, 3))

print(sample)

Using a seed helps you get the same random values every time you run the code.
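
The effect of the seed is easy to check. This sketch builds separate generators (the seed values 42 and 7 are arbitrary choices) and compares what they produce.

```python
import numpy as np

first = np.random.default_rng(42).random((2, 3))
second = np.random.default_rng(42).random((2, 3))
different = np.random.default_rng(7).random((2, 3))

print(np.array_equal(first, second))      # True: same seed, same values
print(np.array_equal(first, different))   # False: a different seed

# Generators can also draw integers; the upper bound (7) is excluded.
dice = np.random.default_rng(42).integers(1, 7, size=5)
print(dice)
```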

12. Creating Evenly Spaced Values With linspace()

np.linspace() creates a fixed number of evenly spaced values between a start and end.

python
temperatures = np.linspace(0, 100, 6)

print(temperatures)

Output:

text
[  0.  20.  40.  60.  80. 100.]

Use linspace() when you care about how many values you want.

Use arange() when you care about the step size.
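
A side-by-side sketch of that choice, using an illustrative range from 0 to 1. Note that arange() leaves the stop value out while linspace() includes it by default.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # fixed step, stop excluded
by_count = np.linspace(0, 1, 5)    # fixed count, stop included

print(by_step)     # [0.   0.25 0.5  0.75]
print(by_count)    # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace() exclude the stop value as well.
no_endpoint = np.linspace(0, 1, 4, endpoint=False)
print(no_endpoint)
```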

13. Creating Identity Matrices

An identity matrix has ones on the main diagonal and zeros everywhere else.

python
identity = np.eye(4)

print(identity)

Output:

text
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Identity matrices are common in linear algebra.

If you want a rectangular matrix with diagonal ones, use np.eye() with two dimensions:

python
wide_identity = np.eye(3, 5)

print(wide_identity)

Output:

text
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]
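
The reason identity matrices matter is that multiplying by one leaves a matrix unchanged, much like multiplying a number by 1. A small sketch with an illustrative 3 x 3 matrix, using np.dot() for the matrix product:

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)
identity = np.eye(3)

product = np.dot(matrix, identity)   # matrix product, not element-wise

print(product)
print(np.array_equal(product, matrix))   # True: the values are unchanged
```

The product is a float array because np.eye() returns floats, but the values match the original.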

14. Important Array Attributes

Create three arrays:

python
one_d = np.arange(6)
two_d = np.arange(12).reshape(3, 4)
three_d = np.arange(24).reshape(2, 3, 4)

ndim

ndim tells you how many dimensions an array has.

python
print(one_d.ndim)
print(two_d.ndim)
print(three_d.ndim)

Explanation

  • ndim is an attribute of every NumPy array; it returns the number of dimensions (axes).
  • one_d, two_d, and three_d are the arrays created above, with one, two, and three dimensions respectively.
  • Each print call writes one of those counts to the console.
  • Checking ndim is a quick way to confirm the structure of data before operating on it.

Output:

text
1
2
3

shape

shape tells you the size of each dimension.

python
print(one_d.shape)
print(two_d.shape)
print(three_d.shape)

Explanation

  • Each print call writes the shape of one array to the console.
  • shape is always a tuple. For the one-dimensional array it is the one-element tuple (6,), not the bare number 6.
  • two_d.shape is (rows, columns).
  • three_d.shape is (blocks, rows, columns); in image-style data the same three positions are often read as depth, height, and width.
  • Checking shape is the fastest way to confirm the structure and size of an array.

Output:

text
(6,)
(3, 4)
(2, 3, 4)

For three_d, the shape means:

text
2 blocks, 3 rows per block, 4 columns per row

size

size tells you the total number of elements.

python
print(one_d.size)
print(two_d.size)
print(three_d.size)

Explanation

  • Each print call writes the size of one array to the console.
  • size is the total number of elements, which equals the product of the values in shape (for example, 2 x 3 x 4 = 24 for three_d).
  • one_d, two_d, and three_d are the NumPy arrays created at the start of this section.
  • The three numbers appear in the order the arrays are printed.

Output:

text
6
12
24

dtype

dtype tells you the data type.

python
print(two_d.dtype)

Explanation

  • two_d is the NumPy array created at the start of this section.
  • The dtype attribute reports the type of data stored in the array, such as integers or floats.
  • For two_d this prints int64 on most platforms; as noted in section 2, some systems show int32 instead.
  • Checking dtype is useful when debugging unexpected results, such as values being truncated when stored in an integer array.

itemsize

itemsize tells you how many bytes each element uses.

python
small = np.array([1, 2, 3], dtype=np.int16)
large = np.array([1, 2, 3], dtype=np.int64)

print(small.itemsize)
print(large.itemsize)

Explanation

  • The code creates two NumPy arrays, small and large, with different data types: int16 and int64, respectively.
  • The itemsize attribute of a NumPy array returns the size in bytes of each element in the array.
  • The print statements output the memory size of each element for both arrays, illustrating the difference in storage requirements between the two data types.
  • This comparison is useful for understanding how data types affect memory usage in numerical computations.

Output:

text
2
8

This matters when you work with large datasets.
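
To put numbers on that, total memory for the data is size multiplied by itemsize, which NumPy also exposes as nbytes. The sketch below uses a hypothetical array of one million readings.

```python
import numpy as np

readings_64 = np.zeros(1_000_000, dtype=np.float64)
readings_32 = readings_64.astype(np.float32)

manual_bytes = readings_64.size * readings_64.itemsize

print(manual_bytes)          # 8000000
print(readings_64.nbytes)    # 8000000, the same calculation built in
print(readings_32.nbytes)    # 4000000, half the memory
```

A smaller dtype saves memory but also reduces precision or range, so it is a trade-off rather than a free win.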

15. Changing Data Types With astype()

Use astype() to create a converted copy of an array.

python
ratings = np.array([4.8, 3.2, 5.0, 2.9])
rounded_ratings = ratings.astype(int)

print(rounded_ratings)

Explanation

  • ratings is a NumPy array of floating-point values.
  • astype(int) returns a new array in which each value has been converted to an integer by dropping the decimal part (truncating toward zero).
  • The new array is stored in rounded_ratings; ratings itself is unchanged.
  • Despite the variable name, no rounding happens: 4.8 becomes 4 and 2.9 becomes 2.

Output:

text
[4 3 5 2]

Be careful: converting floats to integers removes the decimal part. It does not round to the nearest number.

python
values = np.array([1.9, 2.1, 3.7])

print(values.astype(int))

Explanation

  • values is a NumPy array of floating-point numbers.
  • astype(int) converts each element from float to integer.
  • print shows the converted array.
  • The conversion truncates toward zero. For positive numbers this looks like rounding down, but a negative value such as -1.9 becomes -1, not -2.

Output:

text
[1 2 3]

If you want proper rounding, use np.round() first:

python
print(np.round(values).astype(int))

Explanation

  • The code uses the NumPy library to handle numerical operations on arrays.
  • np.round(values) rounds each element in the values array to the nearest integer.
  • The result of the rounding is then converted to an integer type using .astype(int).
  • This is useful for preparing data for scenarios where integer values are required, such as indexing or counting.

Output:

text
[2 2 4]
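
Two edge cases are worth seeing once: negative numbers, and values exactly halfway between two integers. The arrays in this sketch are illustrative.

```python
import numpy as np

mixed_signs = np.array([-1.9, -0.5, 0.5, 1.9])

truncated = mixed_signs.astype(int)               # truncates toward zero
floored = np.floor(mixed_signs).astype(int)       # always rounds down

print(truncated)   # [-1  0  0  1]
print(floored)     # [-2 -1  0  1]

# np.round() sends exact halves to the nearest even value.
halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded_halves = np.round(halves)
print(rounded_halves)   # [0. 2. 2. 4.]
```

If you need true "round down" behavior for negative values, use np.floor() before converting.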

16. Scalar Operations

A scalar is a single value.

python
prices = np.array([100, 200, 300])

print(prices + 10)
print(prices * 2)
print(prices / 4)

Explanation

  • Initializes a NumPy array named prices containing three integer values: 100, 200, and 300.
  • Adds 10 to each element in the prices array, resulting in a new array with values [110, 210, 310].
  • Multiplies each element in the prices array by 2, producing an array with values [200, 400, 600].
  • Divides each element in the prices array by 4, yielding an array with values [25.0, 50.0, 75.0].

Output:

text
[110 210 310]
[200 400 600]
[25. 50. 75.]

The operation is applied to every element.

You can also compare every element:

python
marks = np.array([45, 72, 88, 39])

print(marks >= 50)

Explanation

  • The code initializes a NumPy array named marks containing four integer values representing scores.
  • It uses a comparison operation (>= 50) to create a boolean array indicating which scores are greater than or equal to 50.
  • The result of the comparison is printed, showing True for scores that meet the condition and False for those that do not.
  • This operation is useful for quickly assessing performance against a passing mark.

Output:

text
[False  True  True False]

This returns a boolean array.
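
A boolean array becomes useful once you put it to work. As a short sketch that goes slightly beyond this section, the result of a comparison can filter the original array or be counted directly. The name passed is illustrative.

```python
import numpy as np

marks = np.array([45, 72, 88, 39])
passed = marks >= 50

passing_marks = marks[passed]    # keep only the positions that are True
pass_count = passed.sum()        # True counts as 1, False as 0
pass_rate = passed.mean()        # fraction of True values

print(passing_marks)   # [72 88]
print(pass_count)      # 2
print(pass_rate)       # 0.5
```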

17. Array-to-Array Operations

Arrays with the same shape can be added, subtracted, multiplied, and compared element by element.

python
jan = np.array([120, 90, 150])
feb = np.array([130, 85, 170])

print(feb - jan)
print(feb > jan)

Explanation

  • Two NumPy arrays, jan and feb, are created to represent values for January and February.
  • The expression feb - jan calculates the difference between corresponding elements of the two arrays, showing how values changed from January to February.
  • The expression feb > jan performs an element-wise comparison, returning a boolean array indicating whether each value in February is greater than the corresponding value in January.

Output:

text
[10 -5 20]
[ True False  True]

For 2D arrays:

python
morning = np.array([
    [8, 10, 12],
    [7, 9, 11],
])

evening = np.array([
    [5, 6, 7],
    [4, 5, 6],
])

print(morning + evening)

Explanation

  • The code initializes two 2D NumPy arrays, morning and evening, containing integer values.
  • It uses the np.array function from the NumPy library to create these arrays.
  • The print function outputs the result of adding the two arrays together, performing element-wise addition.
  • The resulting array will have the same shape as the input arrays, with each element being the sum of the corresponding elements from morning and evening.

Output:

text
[[13 16 19]
 [11 14 17]]
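
When the shapes do not line up, NumPy raises a ValueError instead of guessing. This sketch uses two illustrative arrays of different lengths to show the error and a quick way to diagnose it.

```python
import numpy as np

three_days = np.array([120, 90, 150])
two_days = np.array([130, 85])

mismatch_detected = False
try:
    three_days + two_days
except ValueError as error:
    mismatch_detected = True
    print("cannot combine:", error)

# Printing the shapes first usually makes the problem obvious.
print(three_days.shape, two_days.shape)   # (3,) (2,)
```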

18. Useful Array Functions

Create a small dataset:

python
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])

Explanation

  • orders is a 2D NumPy array with three rows and three columns.
  • In this lesson, each row is a store and each column is a day, so every value is the number of orders one store received on one day.
  • This layout makes it easy to compute totals and averages over the whole dataset, per store, or per day.

Sum

python
print(np.sum(orders))

Explanation

  • Utilizes the NumPy library, which is commonly used for numerical operations in Python.
  • The np.sum() function computes the sum of all elements in the provided array, 'orders'.
  • The result is printed to the console, allowing for immediate visibility of the total sum.
  • This operation is efficient for large datasets due to NumPy's optimized performance.

Output:

text
126

Minimum and maximum

python
print(np.min(orders))
print(np.max(orders))

Explanation

  • Utilizes the NumPy library, which is commonly used for numerical operations in Python.
  • np.min(orders) computes and prints the smallest value in the orders array.
  • np.max(orders) computes and prints the largest value in the orders array.
  • This code is useful for quickly assessing the range of values in a dataset.

Output:

text
9
21

Mean

python
print(np.mean(orders))

Explanation

  • np.mean() computes the arithmetic average of all nine values in orders.
  • orders is the NumPy array defined above; np.mean() would also accept a plain list of numbers.
  • The result, 14.0, is the average number of orders per store per day.
  • The result is a float even though every input value is an integer.

Output:

text
14.0

Standard deviation

python
print(np.std(orders))

Explanation

  • Utilizes the np.std() function from the NumPy library to compute the standard deviation.
  • The input orders is expected to be a NumPy array or a list containing numerical data.
  • Standard deviation measures the amount of variation or dispersion in a set of values.
  • The result is printed to the console, providing insight into the variability of the orders dataset.
  • This function is useful for statistical analysis and understanding data distribution.

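Standard deviation tells you how spread out the values are. For orders, np.std() returns about 3.71. By default it computes the population standard deviation (ddof=0).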
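
To make the calculation less of a black box, here is a sketch that computes the same number by hand from its definition: the square root of the average squared distance from the mean. The names deviations and manual_std are illustrative.

```python
import numpy as np

orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])

deviations = orders - orders.mean()             # distance of each value from 14.0
manual_std = np.sqrt((deviations ** 2).mean())  # root of the mean squared distance

print(manual_std)        # about 3.71
print(np.std(orders))    # the same value

# ddof=1 divides by n - 1 instead of n (the sample standard deviation).
sample_std = np.std(orders, ddof=1)
print(sample_std)        # about 3.94
```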

19. Understanding axis

The axis argument tells NumPy which direction to calculate across.

Use this array:

python
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])

Explanation

  • This is the same 3 x 3 orders array used in the previous section.
  • Each row holds one store's orders; each column holds one day's orders.

Think of it as:

text
rows    -> different stores
columns -> different days

axis=0

axis=0 works down the rows, so it produces one result per column.

python
print(np.sum(orders, axis=0))

Explanation

  • Utilizes the NumPy library to perform efficient numerical operations on arrays.
  • The np.sum() function calculates the sum of array elements.
  • The axis=0 parameter specifies that the sum should be computed column-wise (i.e., summing across rows).
  • This operation is useful for aggregating data, such as total sales or counts, from multiple orders.

Output:

text
[35 44 47]

This means:

  • day 1 total: 12 + 9 + 14 = 35
  • day 2 total: 18 + 15 + 11 = 44
  • day 3 total: 10 + 21 + 16 = 47

axis=1

axis=1 works across the columns, so it produces one result per row.

python
print(np.sum(orders, axis=1))

Explanation

  • np.sum() adds elements along the axis you specify.
  • axis=1 collapses the column axis: within each row, the values from all columns are added together.
  • The result has one sum per row, which here is one total per store.
  • This is the pattern to use whenever you need a per-row aggregate.

Output:

text
[40 45 41]

This means:

  • store 1 total: 40
  • store 2 total: 45
  • store 3 total: 41

This is one of the most important NumPy ideas. If your result shape looks wrong, check your axis.
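
A square array hides which axis was collapsed, because both results have the same length. The sketch below adds a hypothetical fourth store (the row 20, 17, 13 is invented for illustration) so the two result shapes differ.

```python
import numpy as np

# 4 stores x 3 days; the last row is a made-up extra store.
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
    [20, 17, 13],
])

per_day = orders.sum(axis=0)      # collapses the 4 rows
per_store = orders.sum(axis=1)    # collapses the 3 columns

print(orders.shape)       # (4, 3)
print(per_day.shape)      # (3,)
print(per_store.shape)    # (4,)
print(per_day)            # [55 61 60]
print(per_store)          # [40 45 41 50]
```

A useful rule of thumb: the axis you name is the one that disappears from the shape.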

20. Mathematical Functions

NumPy includes many mathematical functions.

python
values = np.array([1, 2, 3, 4])

print(np.sqrt(values))
print(np.exp(values))
print(np.log(values))

Explanation

  • The code initializes a NumPy array called values containing integers from 1 to 4.
  • It calculates the square root of each element in the array using np.sqrt(), which returns an array of square roots.
  • The exponential function is applied to each element with np.exp(), resulting in an array of e raised to the power of each value.
  • The natural logarithm of each element is computed using np.log(), producing an array of logarithmic values.

Output:

text
[1.         1.41421356 1.73205081 2.        ]
[ 2.71828183  7.3890561  20.08553692 54.59815003]
[0.         0.69314718 1.09861229 1.38629436]

Trigonometric functions also work:

python
angles = np.array([0, np.pi / 2, np.pi])

print(np.sin(angles))

Explanation

  • angles is an array of three angles in radians: 0, π/2, and π (NumPy is assumed to be imported as np already).
  • np.sin() computes the sine of every angle in one call.
  • The mathematically exact results are 0, 1, and 0; the printed value for π is a tiny nonzero number instead of exactly 0.
  • This is another vectorized operation: no explicit loop is needed.

Output:

text
[0.0000000e+00 1.0000000e+00 1.2246468e-16]

The last value is extremely close to zero. Floating-point calculations sometimes produce tiny approximation errors.
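
Because of these tiny errors, comparing floats with == can give surprising results. A common approach, sketched here, is to compare within a tolerance using np.isclose() or np.allclose(). The expected values are written out by hand for illustration.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)
expected = np.array([0.0, 1.0, 0.0])

exact_match = sines[2] == 0.0               # False: off by about 1.2e-16
close_each = np.isclose(sines, expected)    # element-by-element check
close_all = np.allclose(sines, expected)    # single True/False answer

print(exact_match)
print(close_each)   # [ True  True  True]
print(close_all)    # True
```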

21. Rounding Values

python
measurements = np.array([2.2, 2.8, 3.1, 3.9])

print(np.round(measurements))
print(np.floor(measurements))
print(np.ceil(measurements))

Explanation

  • Initializes a NumPy array called measurements with floating-point values.
  • Uses np.round() to round each element in the array to the nearest integer.
  • Applies np.floor() to return the largest integer less than or equal to each element.
  • Utilizes np.ceil() to return the smallest integer greater than or equal to each element.

Output:

text
[2. 3. 3. 4.]
[2. 2. 3. 3.]
[3. 3. 4. 4.]

Use:

  • round for nearest value
  • floor for lower integer
  • ceil for higher integer
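
Two details the list above does not cover are rounding to a number of decimal places and how negative numbers behave. A brief sketch with illustrative values:

```python
import numpy as np

constants = np.array([3.14159, 2.71828, 1.41421])
two_places = np.round(constants, 2)   # second argument = decimal places
print(two_places)                     # [3.14 2.72 1.41]

below_zero = np.array([-2.5, -1.2])
print(np.floor(below_zero))   # [-3. -2.]  toward negative infinity
print(np.ceil(below_zero))    # [-2. -1.]  toward positive infinity
print(np.trunc(below_zero))   # [-2. -1.]  toward zero
```

For negative values, "lower" means more negative, so floor and trunc give different answers.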

22. Dot Product

The dot product is a common linear algebra operation.

For two 1D arrays:

python
weights = np.array([0.2, 0.5, 0.3])
features = np.array([80, 60, 90])

score = np.dot(weights, features)

print(score)

Explanation

  • The weights array contains the coefficients that represent the importance of each feature.
  • The features array holds the values of the features being evaluated.
  • The np.dot() function computes the dot product of the weights and features, resulting in a single score that reflects the weighted sum.
  • Finally, the calculated score is printed to the console, providing a quantitative assessment based on the input data.

Output:

text
73.0

This is:

text
0.2*80 + 0.5*60 + 0.3*90

For matrices, the inner dimensions must match.

python
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

print(np.dot(a, b))

Explanation

  • The code initializes two NumPy arrays, a and b, with specified shapes using np.arange() and reshape().
  • Array a is a 2x3 matrix containing values from 0 to 5, while array b is a 3x4 matrix containing values from 0 to 11.
  • The np.dot() function is used to compute the dot product of the two matrices, resulting in a new 2x4 matrix.
  • The result of the dot product is printed to the console, showcasing the multiplication of the two matrices.

Here:

text
a shape = (2, 3)
b shape = (3, 4)
result shape = (2, 4)
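
For reference, this sketch prints the actual product of a and b, confirms that the @ operator gives the same result as np.dot() for 2D arrays, and shows what happens when the inner dimensions do not match.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

product = np.dot(a, b)
print(product.shape)   # (2, 4)
print(product)
# [[20 23 26 29]
#  [56 68 80 92]]

same_as_operator = np.array_equal(product, a @ b)
print(same_as_operator)   # True

order_matters = False
try:
    np.dot(b, a)          # (3, 4) with (2, 3): inner sizes 4 and 2 differ
except ValueError as error:
    order_matters = True
    print("shapes do not align:", error)
```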

23. Indexing 1D Arrays

Indexing means selecting values by position.

python
scores = np.array([55, 70, 82, 91, 64])

print(scores[0])
print(scores[3])
print(scores[-1])

Explanation

  • Initializes a NumPy array named scores containing five integer values representing scores.
  • Uses print(scores[0]) to output the first element of the array, which is 55.
  • Uses print(scores[3]) to output the fourth element of the array, which is 91.
  • Uses print(scores[-1]) to output the last element of the array, which is 64, demonstrating negative indexing.

Output:

text
55
91
64

Python indexing starts at zero.

Negative indexing starts from the end.

24. Slicing 1D Arrays

Slicing selects a range.

python
scores = np.array([55, 70, 82, 91, 64, 77])

print(scores[1:4])

Explanation

  • The code initializes a NumPy array named scores containing six integer values representing scores.
  • The slicing operation scores[1:4] retrieves elements from index 1 to index 3 (inclusive of 1 and exclusive of 4).
  • The print function outputs the sliced portion of the array, which consists of the scores 70, 82, and 91.
  • This technique is useful for accessing a subset of data within a larger dataset efficiently.

Output:

text
[70 82 91]

The start index is included. The stop index is excluded.

You can add a step:

python
print(scores[0:6:2])

Explanation

  • scores[0:6:2] uses the start:stop:step slicing syntax, which works on NumPy arrays the same way it does on Python lists.
  • The slice starts at index 0 and goes up to, but does not include, index 6.
  • The step of 2 selects every second element in that range: indexes 0, 2, and 4.
  • print shows the selected values.

Output:

text
[55 82 64]

Reverse an array:

python
print(scores[::-1])

Explanation

  • The slice [::-1] steps through the scores array backward, from the last element to the first.
  • On a NumPy array this produces a reversed view of the same data, not a copy (slicing a Python list would copy).
  • The order of the elements in scores itself does not change.
  • Call .copy() on the result if you need an independent reversed array.

Output:

text
[77 64 91 82 70 55]
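
One behavior that surprises many beginners is that a basic slice of a NumPy array is a view, so writing to the slice changes the original. The sketch below demonstrates this and shows .copy() as the way to get independent data. The names window and safe are illustrative.

```python
import numpy as np

scores = np.array([55, 70, 82, 91, 64, 77])

window = scores[1:4]         # a view into scores, not a copy
window[0] = 0
print(scores)                # [55  0 82 91 64 77]

safe = scores[1:4].copy()    # an independent copy
safe[1] = -1
print(scores)                # still [55  0 82 91 64 77]
```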

25. Indexing 2D Arrays

Use row and column positions.

python
table = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
])

print(table[1, 2])

Explanation

  • A 2D NumPy array named table is created with three rows and three columns containing integer values.
  • The print function is used to output a specific element from the array.
  • The element accessed is located at the second row (index 1) and third column (index 2), which corresponds to the value 60.
  • NumPy uses zero-based indexing, meaning the first row and column are indexed as 0.
  • This code snippet demonstrates how to retrieve values from a multi-dimensional array efficiently.

Output:

text
60

This means:

text
row index 1, column index 2

Select a full row:

python
print(table[0, :])

Explanation

  • table is the 2D NumPy array created above. The comma-separated form table[0, :] does not work on a plain list of lists.
  • The index 0 selects the first row.
  • The colon : selects every column in that row.
  • The result is a 1D array containing the first row.

Output:

text
[10 20 30]

Select a full column:

python
print(table[:, 1])

Explanation

  • The code utilizes NumPy's slicing feature to access specific elements in a 2D array.
  • table[:, 1] selects all rows (:) from the second column (1) of the table array.
  • This operation returns a one-dimensional array containing all values from the specified column.
  • It is a common technique for data manipulation and analysis in scientific computing with Python.
  • Ensure that table is a NumPy array for this slicing syntax to work correctly.

Output:

text
[20 50 80]

Select a smaller block:

python
print(table[0:2, 1:3])

Explanation

  • The code uses the print function to display the output of the slicing operation.
  • table is the 2D NumPy array created above.
  • The slicing 0:2 indicates that it will select the first two rows (index 0 and 1).
  • The slicing 1:3 indicates that it will select the second and third columns (index 1 and 2).
  • The result is a smaller 2D array containing the specified rows and columns from the original table.

Output:

text
[[20 30]
 [50 60]]
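
A subtle point in the examples above is that an integer index removes a dimension, while a slice keeps it. This sketch compares the resulting shapes; row_1d and row_2d are illustrative names.

```python
import numpy as np

table = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
])

row_1d = table[0, :]       # integer index drops the row dimension
row_2d = table[0:1, :]     # a slice of length 1 keeps it
column_2d = table[:, 1:2]  # the same idea for a column

print(row_1d.shape)      # (3,)
print(row_2d.shape)      # (1, 3)
print(column_2d.shape)   # (3, 1)
```

This matters when later code expects a 2D input, for example when stacking or multiplying matrices.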

26. Indexing 3D Arrays

A 3D array has three indexes:

text
block, row, column

Example:

python
cube = np.arange(24).reshape(2, 3, 4)

print(cube)

Explanation

  • The code utilizes NumPy to create a 3D array named cube with dimensions 2x3x4.
  • np.arange(24) generates a one-dimensional array with values from 0 to 23.
  • The reshape(2, 3, 4) method reorganizes this array into a three-dimensional structure.
  • The print(cube) statement outputs the contents of the 3D array to the console.

The shape is:

python
print(cube.shape)

Explanation

  • The code uses the print function to output the shape of the variable cube.
  • cube is expected to be a NumPy array, which can represent multi-dimensional data.
  • The shape attribute returns a tuple indicating the size of each dimension of the array.
  • This information is useful for understanding the structure and dimensions of the data being processed.

Output:

text
(2, 3, 4)

Get one block:

python
print(cube[0])

Explanation

  • cube is a 3D NumPy array, so a single index selects along the first axis (the block).
  • cube[0] returns the first block: a 2D array of shape (3, 4) holding the values 0 through 11.
  • Indexing is zero-based, so the first block has index 0.

Get one row from one block:

python
print(cube[1, 2])

Explanation

  • cube is the 3D array created above, so two indexes select a block and then a row within it.
  • cube[1, 2] means block index 1 (the second block) and row index 2 (the third row).
  • The result is a 1D array of four values: [20 21 22 23].
  • No column index is given, so every column in that row is returned.

Get one value:

python
print(cube[1, 2, 3])

Explanation

  • cube[1, 2, 3] selects a single value from the 3D array cube created above.
  • The indexes are zero-based: block index 1 (second block), row index 2 (third row), column index 3 (fourth column).
  • The value at that position is 23, the last element of the array.
  • With all three indexes supplied, the result is a single number rather than an array.

When slicing 3D arrays, say the dimensions out loud:

text
which block?
which row?
which column?

That habit makes indexing much less confusing.
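
The "block, row, column" questions can be checked directly. This sketch repeats the three lookups from this section with their results, then adds one slice across blocks as an extra illustration.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

first_block = cube[0]         # which block? -> a (3, 4) table
last_row = cube[1, 2]         # which block, which row?
last_value = cube[1, 2, 3]    # which block, row, and column?

print(first_block.shape)   # (3, 4)
print(last_row)            # [20 21 22 23]
print(last_value)          # 23

# A colon means "all of them": the first row of every block.
first_rows = cube[:, 0, :]
print(first_rows)
# [[ 0  1  2  3]
#  [12 13 14 15]]
```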

27. Iterating Over Arrays

For a 1D array, a loop gives individual values:

python
arr = np.array([10, 20, 30])

for value in arr:
    print(value)

Explanation

  • Initializes a NumPy array arr with three integer elements: 10, 20, and 30.
  • Uses a for loop to iterate over each element in the array.
  • Prints each element of the array to the console, one per line.

For a 2D array, a loop gives rows:

python
matrix = np.arange(6).reshape(2, 3)

for row in matrix:
    print(row)

Explanation

  • The code uses NumPy's arange function to generate an array of integers from 0 to 5.
  • The reshape method is then called to transform this 1D array into a 2D array with 2 rows and 3 columns.
  • A for loop iterates over each row of the 2D array, allowing individual rows to be printed.
  • Each row is printed as a separate NumPy array, showcasing the structure of the 2D matrix.

Output:

text
[0 1 2]
[3 4 5]

If you want every element regardless of dimensions, use np.nditer():

python
for value in np.nditer(matrix):
    print(value)

Explanation

  • Utilizes np.nditer, a NumPy function designed for efficient iteration over arrays.
  • The loop iterates through each element in the specified matrix, allowing access to each value sequentially.
  • The print(value) statement outputs each element to the console, displaying the contents of the matrix.
  • nditer visits every element regardless of the number of dimensions, but the loop body still runs in Python, so it is not a substitute for vectorized operations on large arrays.

Use normal vectorized operations when possible. Iteration is useful for learning, debugging, or special cases, but NumPy is usually strongest when you avoid Python loops.
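
To make the comparison concrete, here is a minimal sketch that computes the same squares twice, once with a Python loop and once with a single vectorized expression. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(5)

# Loop version: one Python-level step per element.
squares_loop = np.empty_like(values)
for i, value in enumerate(values):
    squares_loop[i] = value ** 2

# Vectorized version: one expression over the whole array.
squares_vectorized = values ** 2

print(squares_loop)        # [ 0  1  4  9 16]
print(squares_vectorized)  # [ 0  1  4  9 16]
```

Both versions give the same result; the vectorized one is shorter and pushes the per-element work into NumPy.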

28. Transpose

Transpose swaps rows and columns.

python
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(matrix.T)

Explanation

  • The code initializes a 2D NumPy array named matrix with two rows and three columns.
  • The np.array function is used to create the array from a list of lists.
  • The print(matrix.T) statement outputs the transposed version of the array, where rows become columns and vice versa.
  • The .T attribute is a convenient way to access the transpose of a NumPy array.
  • This operation is useful in various mathematical and data manipulation tasks where the orientation of data needs to be changed.

Output:

text
[[1 4]
 [2 5]
 [3 6]]

You can also use:

python
print(np.transpose(matrix))

Explanation

  • Utilizes the transpose function from the NumPy library to switch the rows and columns of the input matrix.
  • The matrix variable should be a NumPy array or a compatible structure for the function to work correctly.
  • The result is a transposed array (a view of the same data where possible) in which the first row becomes the first column, the second row becomes the second column, and so on.
  • This operation is commonly used in linear algebra and data manipulation tasks.
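
A short sketch of what transposing does to the shape and to individual positions, using the same 2 by 3 matrix as above. It also shows that .T returns a view, so a write through the transpose changes the original array.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
transposed = matrix.T

print(matrix.shape)      # (2, 3)
print(transposed.shape)  # (3, 2)

# The element at (row, col) in the original sits at (col, row) in the transpose.
print(matrix[0, 2], transposed[2, 0])  # 3 3

# .T is a view: writing through it also changes the original array.
transposed[0, 1] = 40
print(matrix[1, 0])  # 40
```

If you need an independent transposed array, calling matrix.T.copy() is one option.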

29. Flattening With ravel()

ravel() turns a multi-dimensional array into a 1D view when possible.

python
matrix = np.arange(12).reshape(3, 4)

flat = matrix.ravel()

print(flat)

Explanation

  • The code initializes a 3x4 matrix using np.arange(12), which creates an array of integers from 0 to 11.
  • The reshape(3, 4) method reshapes the array into a 3-row by 4-column format.
  • The ravel() function is called on the matrix to flatten it into a one-dimensional array.
  • Finally, the flattened array is printed, displaying the elements in a single row.

Output:

text
[ 0  1  2  3  4  5  6  7  8  9 10 11]

This is useful when a function expects a simple 1D input.
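
The phrase "a 1D view when possible" matters in practice. The sketch below contrasts ravel() with flatten(), which always returns a copy; the variable names are illustrative.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

flat_view = matrix.ravel()    # a view here, because the data is contiguous
flat_copy = matrix.flatten()  # always an independent copy

flat_copy[0] = -1  # does not touch matrix
flat_view[1] = 99  # writes through to matrix

print(matrix[0, 0])  # 0
print(matrix[0, 1])  # 99
print(np.shares_memory(matrix, flat_view))  # True
print(np.shares_memory(matrix, flat_copy))  # False
```

Use flatten() when the 1D result must be safe to modify without affecting the original array.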

30. Horizontal and Vertical Stacking

Stacking combines arrays.

Create two arrays with the same shape:

python
left = np.array([
    [1, 2],
    [3, 4],
])

right = np.array([
    [10, 20],
    [30, 40],
])

Explanation

  • The left variable is a 2x2 NumPy array containing the integers 1, 2, 3, and 4.
  • The right variable is another 2x2 NumPy array containing the integers 10, 20, 30, and 40.
  • Both arrays have shape (2, 2), so they can be stacked either side by side or top to bottom.

Horizontal stack

hstack() joins arrays side by side.

python
print(np.hstack((left, right)))

Explanation

  • Utilizes the np.hstack function from the NumPy library to concatenate arrays.
  • Takes two input arrays, left and right, and combines them along their horizontal axis.
  • The resulting array maintains the same number of rows as the input arrays, effectively merging their columns.
  • This operation is useful for data manipulation and preparation in numerical computations.

Output:

text
[[ 1  2 10 20]
 [ 3  4 30 40]]

Vertical stack

vstack() joins arrays top to bottom.

python
print(np.vstack((left, right)))

Explanation

  • Utilizes the np.vstack() function from the NumPy library to combine arrays.
  • Takes two input arrays, left and right, and stacks them on top of each other.
  • The resulting array keeps the same number of columns as the inputs, while its row count is the sum of their row counts (here 2 + 2 = 4 rows).
  • Useful for consolidating data from different sources into a single dataset for analysis or processing.

Output:

text
[[ 1  2]
 [ 3  4]
 [10 20]
 [30 40]]

The shapes must be compatible. If stacking fails, print the shapes first.

python
print(left.shape)
print(right.shape)

Explanation

  • The code uses the shape attribute to retrieve the dimensions of two NumPy arrays, left and right.
  • print(left.shape) outputs the size of the left array, indicating how many elements it contains in each dimension.
  • print(right.shape) performs the same function for the right array, providing its dimensionality.
  • This information is useful for understanding the structure of the data before performing further operations or analyses.
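
As a sketch of what "compatible" means, the example below stacks a 2 by 2 array with a 3 by 2 array. The tall array is a hypothetical addition for this illustration.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])           # shape (2, 2)
tall = np.array([[5, 6], [7, 8], [9, 10]])  # shape (3, 2), hypothetical extra array

# Vertical stacking only needs matching column counts, so this works.
stacked = np.vstack((left, tall))
print(stacked.shape)  # (5, 2)

# Horizontal stacking needs matching row counts, so this fails.
hstack_failed = False
try:
    np.hstack((left, tall))
except ValueError as error:
    hstack_failed = True
    print("hstack failed:", error)
```

Printing both shapes before stacking usually shows at a glance which dimension does not match.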

31. Splitting Arrays

Splitting breaks an array into smaller arrays.

python
data = np.arange(12).reshape(3, 4)

print(data)

Explanation

  • The code uses NumPy's arange function to generate an array of integers from 0 to 11.
  • The reshape method is then called to transform this 1D array into a 2D array with 3 rows and 4 columns.
  • Finally, the reshaped array is printed to the console, displaying its structured format.
  • This technique is useful for organizing data in a matrix form for further analysis or manipulation.

Output:

text
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Horizontal split

python
parts = np.hsplit(data, 2)

for part in parts:
    print(part)

Explanation

  • The np.hsplit function is used to horizontally split the data array into two equal parts.
  • The resulting parts are stored in the parts variable as a list of arrays.
  • A for loop iterates through each array in the parts list.
  • The print function outputs each part to the console, allowing for easy visualization of the split data.

This splits the columns into 2 equal parts.
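
For reference, this is what the two halves look like for the 3 by 4 array above. The names left_half and right_half are illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Unpack the list returned by hsplit into two named pieces.
left_half, right_half = np.hsplit(data, 2)

print(left_half)
# [[0 1]
#  [4 5]
#  [8 9]]
print(right_half)
# [[ 2  3]
#  [ 6  7]
#  [10 11]]
```

Each piece keeps all 3 rows and receives 2 of the 4 columns.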

Vertical split

python
rows = np.vsplit(data, 3)

for row_group in rows:
    print(row_group)

Explanation

  • The np.vsplit function from the NumPy library divides the data array along its rows into three equal groups, each with shape (1, 4).
  • The resulting rows variable is a list containing these three sections.
  • A for loop iterates through each section in rows, allowing individual printing of each slice.
  • This approach is useful for visualizing or processing parts of a larger dataset separately.

This splits the rows into 3 equal parts.

If the array cannot be split evenly, NumPy raises an error.
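
When an uneven split is acceptable, np.array_split is one alternative that does not require equal parts. The sketch below shows both behaviours on the same 3 by 4 array.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# 4 columns cannot be divided into 3 equal groups, so hsplit raises ValueError.
split_failed = False
try:
    np.hsplit(data, 3)
except ValueError as error:
    split_failed = True
    print("hsplit failed:", error)

# np.array_split accepts uneven divisions; the earlier pieces get the extra columns.
pieces = np.array_split(data, 3, axis=1)
print([piece.shape for piece in pieces])  # [(3, 2), (3, 1), (3, 1)]
```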

32. Beginner Mistakes to Avoid

Mistake 1: Forgetting that reshape must preserve size

python
np.arange(10).reshape(3, 4)

Explanation

  • Utilizes the np.arange(10) function to generate an array of integers from 0 to 9.
  • The reshape(3, 4) call then tries to reorganize the flat array into 3 rows and 4 columns.
  • The total number of elements (10) does not match the product of the requested dimensions (3 * 4 = 12), so NumPy raises a ValueError.
  • The snippet is shown as an example of the mistake, not as working code.

This fails because 10 values cannot fill 12 positions.
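
A small sketch of the failure and two ways around it: picking a shape whose product matches, or passing -1 so that NumPy infers one dimension from the total size.

```python
import numpy as np

values = np.arange(10)

reshape_failed = False
try:
    values.reshape(3, 4)  # needs 12 elements, only 10 are available
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)

# Shapes whose product is 10 work.
print(values.reshape(2, 5).shape)   # (2, 5)

# -1 asks NumPy to infer that dimension from the total size.
print(values.reshape(5, -1).shape)  # (5, 2)
```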

Mistake 2: Confusing axis directions

For a 2D array:

  • axis=0 gives column-wise results
  • axis=1 gives row-wise results
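
One way to check the two directions is to sum a small matrix both ways and compare the results with the rows and columns you can see.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

column_sums = matrix.sum(axis=0)  # collapses the rows: one result per column
row_sums = matrix.sum(axis=1)     # collapses the columns: one result per row

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
```

The axis you pass is the one that disappears from the shape: (2, 3) becomes (3,) for axis=0 and (2,) for axis=1.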

Mistake 3: Expecting lists and arrays to behave the same

python
print([1, 2, 3] * 2)
print(np.array([1, 2, 3]) * 2)

Explanation

  • The first line multiplies a Python list [1, 2, 3] by 2, resulting in the list being repeated twice: [1, 2, 3, 1, 2, 3].
  • The second line uses NumPy to create an array from [1, 2, 3] and multiplies each element by 2, producing a new array: [2, 4, 6].
  • This showcases the difference in behavior between standard Python lists and NumPy arrays when using the multiplication operator.
  • The output of the first print statement is a concatenated list, while the second results in element-wise multiplication.
  • To use the second line, the NumPy library must be imported as import numpy as np.

Output:

text
[1, 2, 3, 1, 2, 3]
[2 4 6]

Lists repeat. NumPy arrays multiply element by element.
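
The + operator shows the same kind of difference. This short sketch adds two lists and then two arrays with the same values.

```python
import numpy as np

list_sum = [1, 2, 3] + [4, 5, 6]                       # lists concatenate
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])  # arrays add element by element

print(list_sum)   # [1, 2, 3, 4, 5, 6]
print(array_sum)  # [5 7 9]
```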

Mistake 4: Not checking shape before operations

When something fails, print:

python
print(arr.shape)
print(arr.dtype)

Explanation

  • The print(arr.shape) statement outputs the dimensions of the NumPy array arr, indicating how many elements are along each axis.
  • The print(arr.dtype) statement reveals the data type of the elements contained in the array, such as integers, floats, or strings.
  • This information is crucial for understanding the structure and type of data being handled in numerical computations.
  • Both attributes help in debugging and optimizing performance when working with large datasets in scientific computing.

This small habit solves many beginner errors.
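
Here is a sketch of a typical shape error and how printing the shapes points to the fix. The table and offsets arrays are hypothetical data for this example.

```python
import numpy as np

table = np.ones((2, 3))
offsets = np.array([10, 20])  # hypothetical data: one offset per row

print(table.shape, offsets.shape)  # (2, 3) (2,)

add_failed = False
try:
    table + offsets  # shapes (2, 3) and (2,) cannot be broadcast together
except ValueError as error:
    add_failed = True
    print("addition failed:", error)

# Reshaping offsets to a column of shape (2, 1) lets it broadcast across each row.
result = table + offsets.reshape(2, 1)
print(result)
# [[11. 11. 11.]
#  [21. 21. 21.]]
```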

33. Practice Exercises

Try these before reading the solutions.

Practice Lab

Exercise 1: Create a mostly-zero vector

Create an array of size 10 filled with zeros. Change the value at index 4 to 1.

Practice Lab

Exercise 2: Random score table

Create a random array with shape (4, 3) to represent marks for 4 students in 3 tests. Print:

  • the array
  • the average of all marks
  • the average mark for each student

Practice Lab

Exercise 3: Border matrix

Write a function make_border(rows, cols) that returns a 2D array with ones on the border and zeros inside.

For rows=4 and cols=5, the result should look like:

text
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]

Practice Lab

Exercise 4: Values between 0 and 1

Create 8 evenly spaced values between 0 and 1, excluding both 0 and 1.

Practice Lab

Exercise 5: Row pattern

Create a 5 by 5 matrix where every row is:

text
[0 1 2 3 4]

Practice Lab

Exercise 6: Distance from a point

You have coordinate points:

python
points = np.array([
    [2, 3],
    [5, 7],
    [1, 8],
    [9, 4],
])

Explanation

  • Initializes a NumPy array named points to hold multiple 2D coordinates.
  • Each inner list represents a point in a Cartesian coordinate system, with the first element as the x-coordinate and the second as the y-coordinate.
  • The array has shape (N, 2), where N is the number of points: here 4 rows, one per point, and 2 columns for x and y.
  • This format is useful for mathematical operations and visualizations in data analysis and machine learning tasks.

Calculate the distance of every point from:

python
target = np.array([3, 4])

Explanation

  • Initializes a NumPy array named target containing the elements 3 and 4.
  • The array can be used for mathematical operations, such as vector calculations in data analysis or machine learning.
  • NumPy is a powerful library in Python for numerical computations, providing efficient storage and operations on large datasets.

Practice Lab

Exercise 7: Replace odd values

Create an array from 0 to 9. Replace odd values with -1.

Practice Lab

Exercise 8: Column swap

Create a 3 by 3 array from 1 to 9. Swap the first and last columns.

Practice Lab

Exercise 9: Row normalization

Create a 3 by 4 random integer array. Normalize each row using:

text
(row - row_min) / (row_max - row_min)

Practice Lab

Exercise 10: Nth largest value

Write a function nth_largest(arr, n) that returns the nth largest value from a 1D NumPy array.

34. Practice Solutions

Solution Key

Solution 1: Create a mostly-zero vector

python
import numpy as np

vector = np.zeros(10)
vector[4] = 1

print(vector)

Explanation

  • The code imports the NumPy library, which is essential for numerical operations in Python.
  • A NumPy array named vector is initialized with ten elements, all set to zero using np.zeros(10).
  • The fifth element (index 4) of the array is then set to one, modifying the initial array of zeros.
  • Finally, the modified array is printed, displaying a vector with a single one at the fifth position and zeros elsewhere.

Solution Key

Solution 2: Random score table

python
rng = np.random.default_rng(7)
marks = rng.integers(0, 101, size=(4, 3))

print(marks)
print("Overall average:", marks.mean())
print("Student averages:", marks.mean(axis=1))

Explanation

  • Initializes a random number generator with a fixed seed of 7 for reproducibility.
  • Generates a 4x3 array of random integers from 0 to 100 inclusive (the upper bound 101 is excluded), simulating marks for 4 students across 3 tests.
  • Prints the generated marks array to the console.
  • Calculates and prints the overall average of all marks in the array.
  • Computes and displays the average mark for each student by averaging across the tests (axis=1).

Solution Key

Solution 3: Border matrix

python
def make_border(rows, cols):
    if rows < 2 or cols < 2:
        raise ValueError("rows and cols must both be at least 2")

    result = np.ones((rows, cols))
    result[1:-1, 1:-1] = 0
    return result

print(make_border(4, 5))

Explanation

  • The function make_border takes two parameters, rows and cols, which define the dimensions of the matrix.
  • It raises a ValueError if either rows or cols is less than 2, ensuring a valid border can be created.
  • A NumPy array filled with ones is initialized, representing the outer border of the matrix.
  • The inner section of the matrix (excluding the border) is set to zero, creating a clear distinction between the border and the inner area.
  • The function returns the resulting matrix, which can be printed or used for further processing.

Solution Key

Solution 4: Values between 0 and 1

python
values = np.linspace(0, 1, 10)[1:-1]

print(values)

Explanation

  • Uses NumPy's linspace function to create an array of 10 evenly spaced values between 0 and 1.
  • The slicing operation [1:-1] removes the first and last elements of the generated array, effectively excluding 0 and 1.
  • The resulting array contains 8 values, which are printed to the console.
  • This technique is useful for generating test data or parameters for simulations where endpoints are not required.

Why 10 values?

Because including 0 and 1 gives 10 points, and removing both ends leaves 8 inner points.
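
As a quick check of that reasoning, the sketch below confirms that the 8 inner points are the multiples of 1/9 from 1/9 to 8/9.

```python
import numpy as np

values = np.linspace(0, 1, 10)[1:-1]

print(len(values))      # 8
print(np.diff(values))  # every gap is 1/9, up to floating-point rounding

# The same points written directly as k / 9 for k = 1..8.
expected = np.arange(1, 9) / 9
print(np.allclose(values, expected))  # True
```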

Solution Key

Solution 5: Row pattern

python
pattern = np.zeros((5, 5), dtype=int)
pattern += np.arange(5)

print(pattern)

Explanation

  • Initializes a 5x5 matrix filled with zeros using NumPy's zeros function.
  • Uses np.arange(5) to generate an array of integers from 0 to 4.
  • The addition operation (+=) adds the array to each row of the matrix, resulting in each row containing the same incremental values.
  • The final output displays the modified matrix, where each row contains the values [0, 1, 2, 3, 4].

Alternative:

python
pattern = np.tile(np.arange(5), (5, 1))

print(pattern)

Explanation

  • The np.arange(5) function generates a 1D array containing integers from 0 to 4.
  • The np.tile() function is used to repeat this 1D array 5 times along the vertical axis, creating a 2D array.
  • The resulting pattern variable is a 5x5 array where each row is identical and contains the sequence [0, 1, 2, 3, 4].
  • The print(pattern) statement outputs the 2D array to the console for visualization.

Solution Key

Solution 6: Distance from a point

python
points = np.array([
    [2, 3],
    [5, 7],
    [1, 8],
    [9, 4],
])

target = np.array([3, 4])

distances = np.sqrt(np.sum((points - target) ** 2, axis=1))

print(distances)

Explanation

  • Initializes a NumPy array points containing multiple 2D coordinates.
  • Defines a target point as a NumPy array for which distances will be calculated.
  • Computes the Euclidean distance from the target to each point in points using the formula √((x2 - x1)² + (y2 - y1)²).
  • Utilizes broadcasting to subtract the target from each point and squares the result before summing along the specified axis.
  • Outputs the calculated distances as a NumPy array.

Explanation:

  • points - target subtracts the target from every point
  • ** 2 squares the differences
  • sum(axis=1) adds x and y differences for each point
  • sqrt() calculates the final distance

Solution Key

Solution 7: Replace odd values

python
arr = np.arange(10)
arr[arr % 2 == 1] = -1

print(arr)

Explanation

  • Initializes a NumPy array arr containing integers from 0 to 9 using np.arange(10).
  • Utilizes boolean indexing to identify odd numbers in the array with the condition arr % 2 == 1.
  • Replaces all identified odd numbers in the array with -1.
  • Prints the modified array, showing even numbers unchanged and odd numbers replaced.

Output:

text
[ 0 -1  2 -1  4 -1  6 -1  8 -1]

Solution Key

Solution 8: Column swap

python
matrix = np.arange(1, 10).reshape(3, 3)

swapped = matrix[:, [2, 1, 0]]

print(matrix)
print(swapped)

Explanation

  • The code initializes a 3x3 matrix using np.arange(1, 10) which generates numbers from 1 to 9 and reshapes it into a 3x3 format.
  • The swapped variable selects the columns in the order [2, 1, 0]. For a 3-column array this is the same as swapping the first and last columns, because the middle column stays in place.
  • The original matrix and the modified matrix with swapped columns are printed to the console for comparison.
  • This showcases NumPy's powerful indexing capabilities for manipulating array structures efficiently.
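
Reversing the column order only equals a first and last swap when there are exactly 3 columns. For wider arrays, one possible approach is to assign the two columns to each other, as in this sketch with a hypothetical 3 by 4 array.

```python
import numpy as np

wide = np.arange(12).reshape(3, 4)  # hypothetical 3 by 4 example

swapped = wide.copy()
# The right-hand side is evaluated into a temporary copy first, so the swap is safe.
swapped[:, [0, -1]] = swapped[:, [-1, 0]]

print(swapped)
# [[ 3  1  2  0]
#  [ 7  5  6  4]
#  [11  9 10  8]]
```

The two middle columns are untouched, which is the difference from reversing every column.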

Solution Key

Solution 9: Row normalization

python
rng = np.random.default_rng(10)
data = rng.integers(1, 50, size=(3, 4))

row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)

normalized = (data - row_min) / (row_max - row_min)

print(data)
print(normalized)

Explanation

  • Initializes a random number generator with a fixed seed for reproducibility.
  • Generates a 3x4 array of random integers from 1 to 49 (the upper bound 50 is excluded).
  • Computes the minimum and maximum values for each row; keepdims=True keeps both results two-dimensional with shape (3, 1).
  • Applies min-max normalization to scale the data between 0 and 1 for each row.
  • Outputs both the original random integer array and the normalized array.

keepdims=True keeps the result as a column shape, which allows NumPy broadcasting to work cleanly across each row.
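
One edge case the formula does not cover is a row whose values are all equal, where row_max - row_min is 0. The sketch below shows one possible guard, assuming that such rows should normalize to zeros; that choice is an assumption, not part of the exercise.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [5, 5, 5],  # constant row: max - min is 0
])

row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)
span = row_max - row_min

# Replace a zero span with 1 so the division is defined; such rows become all zeros.
safe_span = np.where(span == 0, 1, span)
normalized = (data - row_min) / safe_span

print(normalized)
# [[0.  0.5 1. ]
#  [0.  0.  0. ]]
```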

Solution Key

Solution 10: Nth largest value

python
def nth_largest(arr, n):
    if not isinstance(arr, np.ndarray):
        raise TypeError("arr must be a NumPy array")
    if arr.ndim != 1:
        raise ValueError("arr must be 1D")
    if n < 1 or n > arr.size:
        raise ValueError("n is outside the valid range")

    sorted_arr = np.sort(arr)
    return sorted_arr[-n]

numbers = np.array([12, 4, 99, 18, 42])

print(nth_largest(numbers, 1))
print(nth_largest(numbers, 3))

Explanation

  • Defines a function nth_largest that retrieves the n-th largest element from a given NumPy array.
  • Validates input to ensure the array is a 1D NumPy array and that n is within a valid range.
  • Sorts the array in ascending order using np.sort() and accesses the n-th largest element using negative indexing.
  • Demonstrates the function with a sample array of numbers, printing the largest and third largest elements.

Output:

text
99
18
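
As an alternative sketch, np.partition can find the same value without fully sorting the array. The function name nth_largest_partition is hypothetical, and the input checks from the solution above are omitted here for brevity.

```python
import numpy as np

def nth_largest_partition(arr, n):
    # Hypothetical variant; assumes arr is 1D and 1 <= n <= arr.size.
    # np.partition puts the element that belongs at index -n in sorted order
    # into that position, without fully sorting the rest of the array.
    return np.partition(arr, -n)[-n]

numbers = np.array([12, 4, 99, 18, 42])

print(nth_largest_partition(numbers, 1))  # 99
print(nth_largest_partition(numbers, 3))  # 18
```

For small arrays the difference is negligible; the sorted version is easier to read.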

35. Mini Project: Analyze Weekly Store Sales

Let us combine the basics into one small task.

Suppose you have sales from 4 stores across 7 days:

python
sales = np.array([
    [120, 135, 150, 160, 155, 170, 180],
    [90, 95, 105, 110, 108, 120, 130],
    [200, 210, 190, 220, 230, 240, 250],
    [60, 75, 80, 85, 90, 95, 100],
])

Explanation

  • A 2D NumPy array named sales is created with shape (4, 7): 4 stores across 7 days.
  • Each inner list holds one store's sales, with one value per day.
  • The array structure allows for efficient numerical operations and data manipulation using NumPy's powerful features.
  • This setup is useful for analyzing trends, comparing performance, and performing calculations on sales data.

Find:

  • total sales for each store
  • total sales for each day
  • best store
  • best day
  • normalized sales for each store

Solution:

python
store_totals = sales.sum(axis=1)
day_totals = sales.sum(axis=0)

best_store_index = np.argmax(store_totals)
best_day_index = np.argmax(day_totals)

store_min = sales.min(axis=1, keepdims=True)
store_max = sales.max(axis=1, keepdims=True)
normalized_sales = (sales - store_min) / (store_max - store_min)

print("Store totals:", store_totals)
print("Day totals:", day_totals)
print("Best store index:", best_store_index)
print("Best day index:", best_day_index)
print("Normalized sales:")
print(normalized_sales)

Explanation

  • Computes total sales for each store and each day using the sum function with specified axes.
  • Identifies the index of the store with the highest total sales and the day with the highest sales using np.argmax.
  • Calculates the minimum and maximum sales for each store to facilitate normalization.
  • Normalizes the sales data to a range of 0 to 1 by applying the formula (sales - min) / (max - min).
  • Outputs the total sales, best store and day indices, and the normalized sales data for further analysis.

This small project uses:

  • 2D arrays
  • axis-based aggregation
  • argmax
  • row-wise normalization
  • broadcasting

These are the same building blocks used in real data analysis.
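
The indices returned by argmax are easier to read once they are mapped to labels. In the sketch below, store_names and day_names are hypothetical labels added for illustration; the lesson itself only defines the sales array.

```python
import numpy as np

sales = np.array([
    [120, 135, 150, 160, 155, 170, 180],
    [90, 95, 105, 110, 108, 120, 130],
    [200, 210, 190, 220, 230, 240, 250],
    [60, 75, 80, 85, 90, 95, 100],
])

# Hypothetical labels, one per row and one per column.
store_names = np.array(["North", "South", "East", "West"])
day_names = np.array(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

store_totals = sales.sum(axis=1)  # one total per store
day_totals = sales.sum(axis=0)    # one total per day

best_store = store_names[np.argmax(store_totals)]
best_day = day_names[np.argmax(day_totals)]

print(store_totals)  # [1070  758 1540  585]
print(day_totals)    # [470 515 525 575 583 625 660]
print(best_store)    # East
print(best_day)      # Sun
```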

36. Quick Quiz

1. What is the difference between np.arange() and np.linspace()?

np.arange() focuses on step size. np.linspace() focuses on the number of values.

2. What does shape tell you?

It tells you how many elements exist along each dimension.

3. What does axis=0 mean in a 2D array?

It means calculate down the rows, producing one result for each column.

4. Why does reshape() sometimes fail?

Because the new shape must contain the same total number of elements as the original array.

5. Why are vectorized operations useful?

They let you apply operations to whole arrays with cleaner code and usually better performance than Python loops.

Final Takeaway

NumPy becomes easier when you focus on four questions:

  1. What is the array shape?
  2. What is the array dtype?
  3. Which axis do I want?
  4. Am I selecting, reshaping, combining, or calculating?

If you can answer these questions, most beginner NumPy code becomes predictable.

Start small. Print shapes often. Practice slicing. Use vectorized operations. Over time, NumPy will feel less like a library of random functions and more like a clear way to think about data.

Sources and Further Reading