NumPy Basics: Arrays, Shapes, Dtypes, Indexing, and Array Operations
NumPy is one of the most important Python libraries for data analysis, machine learning, scientific computing, and numerical programming.
The main idea is simple:
NumPy lets you store numbers in compact arrays and perform operations on the whole array at once.
If you are learning data science, you will see NumPy everywhere. Pandas, scikit-learn, TensorFlow, PyTorch, image-processing libraries, and many plotting tools all depend on array-style thinking.
This lesson starts from the basics. You will learn how to create arrays, inspect their shape, change data types, perform calculations, slice arrays, reshape data, combine arrays, split arrays, and solve practical beginner problems.
What you will learn
By the end, you should be able to:
- explain why NumPy arrays are useful
- create 1D, 2D, and 3D arrays
- use np.array(), np.arange(), np.zeros(), np.ones(), np.linspace(), and np.eye()
- inspect ndim, shape, size, dtype, and itemsize
- convert array data types with astype()
- perform scalar and array operations
- use aggregate functions such as sum, mean, min, max, and std
- understand the meaning of axis=0 and axis=1
- index and slice arrays confidently
- reshape, transpose, flatten, stack, and split arrays
- solve beginner NumPy practice problems
1. What Is NumPy?
NumPy is a Python library for working with numerical arrays.
A normal Python list can store values:
scores = [72, 85, 91, 64]
That is useful, but if you want to add 5 marks to every score, a list needs a loop:
scores = [72, 85, 91, 64]
updated = []
for score in scores:
    updated.append(score + 5)
print(updated)
Output:
[77, 90, 96, 69]
With NumPy, you can apply the operation directly:
import numpy as np
scores = np.array([72, 85, 91, 64])
updated = scores + 5
print(updated)
Output:
[77 90 96 69]
This is called a vectorized operation. You write less code, and NumPy performs the calculation efficiently.
2. NumPy Arrays vs Python Lists
Python lists are flexible. They can grow, shrink, and contain mixed types.
mixed = ["Python", 10, True]
print(mixed)
NumPy arrays are designed for numerical work. In most practical cases, all values in one array share the same data type.
numbers = np.array([10, 20, 30])
print(numbers)
print(numbers.dtype)
Output:
[10 20 30]
int64
On some systems, the integer dtype may appear as int32 instead of int64. The exact default can depend on your platform.
Here is the practical difference:
| Feature | Python list | NumPy array |
|---|---|---|
| Best for | General Python objects | Numerical data |
| Mixed data types | Common | Usually avoided |
| Vectorized math | No | Yes |
| Memory layout | Flexible | Compact |
| Data science use | Input/helper structure | Core structure |
Use lists when you need general-purpose Python containers. Use NumPy arrays when you need fast numerical operations.
3. Installing and Importing NumPy
If NumPy is not installed, install it with:
pip install numpy
Then import it:
import numpy as np
The alias np is the standard convention. You will see it in documentation, tutorials, notebooks, and production code.
4. Creating a 1D Array
A 1D array is like a simple row of values.
import numpy as np
marks = np.array([80, 75, 92, 68])
print(marks)
print(type(marks))
Output:
[80 75 92 68]
<class 'numpy.ndarray'>
The object type is ndarray, which means n-dimensional array.
5. Creating 2D Arrays
A 2D array is like a table with rows and columns.
sales = np.array([
    [120, 135, 150],
    [90, 110, 125],
])
print(sales)
Output:
[[120 135 150]
 [ 90 110 125]]
This array has:
- 2 rows
- 3 columns
You can think of it as sales data for 2 stores across 3 days.
6. Creating 3D Arrays
A 3D array is like multiple tables stacked together.
weekly_sales = np.array([
    [
        [120, 135],
        [90, 110],
    ],
    [
        [140, 160],
        [100, 115],
    ],
])
print(weekly_sales)
This can represent:
- 2 weeks
- 2 stores
- 2 days per week
In machine learning, 3D and higher-dimensional arrays are common. Images, batches of images, time-series windows, embeddings, and tensors all use this style of structure.
7. Choosing a Data Type With dtype
You can tell NumPy which type to use.
prices = np.array([99, 149, 199], dtype=float)
print(prices)
print(prices.dtype)
Output:
[ 99. 149. 199.]
float64
You can also create boolean arrays:
availability = np.array([1, 0, 1, 1], dtype=bool)
print(availability)
Output:
[ True False  True  True]
And complex arrays:
signals = np.array([2, 5, 8], dtype=complex)
print(signals)
Output:
[2.+0.j 5.+0.j 8.+0.j]
For beginner data analysis, the most common dtypes are integers, floats, booleans, strings, and dates.
8. Creating Ranges With np.arange()
np.arange() creates values in a range.
numbers = np.arange(1, 8)
print(numbers)
Output:
[1 2 3 4 5 6 7]
The stop value is not included.
You can add a step:
even_numbers = np.arange(2, 13, 2)
print(even_numbers)
Output:
[ 2  4  6  8 10 12]
You can count backward:
countdown = np.arange(5, 0, -1)
print(countdown)
Output:
[5 4 3 2 1]
9. Reshaping a Range
reshape() changes how values are arranged.
grid = np.arange(1, 13).reshape(3, 4)
print(grid)
Output:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
The total number of values must match.
This works because:
3 rows x 4 columns = 12 values
This will fail:
np.arange(1, 13).reshape(5, 3)
Why?
5 rows x 3 columns = 15 positions
But the array has only 12 values.
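The size rule above can be checked directly. The sketch below is illustrative (the variable names are not from the lesson): it catches the ValueError raised by the mismatched reshape, then uses -1 to let NumPy infer one dimension from the total size.

```python
import numpy as np

values = np.arange(1, 13)  # 12 values

# 5 x 3 needs 15 positions, so NumPy refuses the reshape
try:
    values.reshape(5, 3)
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)

# -1 asks NumPy to work out the missing dimension: 12 / 4 = 3 rows
inferred = values.reshape(-1, 4)
print(inferred.shape)  # (3, 4)
```

Using -1 is handy when you know how many columns you want but not how many rows the data will fill.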
10. Creating Arrays of Zeros and Ones
np.zeros() creates an array filled with zeros.
empty_scores = np.zeros((3, 4))
print(empty_scores)
Output:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
np.ones() creates an array filled with ones.
default_flags = np.ones((2, 5))
print(default_flags)
Output:
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
These are useful when you want to create a placeholder array before filling it with real values.
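Both functions return floats by default, which is why the outputs above show 0. and 1. rather than 0 and 1. As a small sketch, a dtype argument changes that, and np.full() covers placeholder values other than zero or one. The names counts and default_price are made up for illustration.

```python
import numpy as np

# zeros() and ones() produce floats unless a dtype is given
counts = np.zeros((2, 3), dtype=int)
print(counts)

# np.full() fills an array with any constant value
default_price = np.full((2, 3), 9.99)
print(default_price)
```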
11. Creating Random Arrays
Random arrays are useful for demos, simulations, and testing.
random_values = np.random.random((2, 3))
print(random_values)
This creates a 2 by 3 array with values from 0 up to, but not including, 1.
For reproducible examples, use a random generator with a seed:
rng = np.random.default_rng(42)
sample = rng.random((2, 3))
print(sample)
Using a seed helps you get the same random values every time you run the code.
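To see the effect of the seed, the sketch below builds two generators with the same seed and confirms they produce identical values. It also draws whole numbers with integers(), whose upper bound is excluded by default. The variable names are illustrative.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random((2, 3))
second = rng_b.random((2, 3))
print(np.array_equal(first, second))  # True: same seed, same values

# Ten simulated dice rolls: values from 1 to 6 (7 is excluded)
dice = rng_a.integers(1, 7, size=10)
print(dice)
```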
12. Creating Evenly Spaced Values With linspace()
np.linspace() creates a fixed number of evenly spaced values between a start and end.
temperatures = np.linspace(0, 100, 6)
print(temperatures)
Output:
[  0.  20.  40.  60.  80. 100.]
Use linspace() when you care about how many values you want.
Use arange() when you care about the step size.
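The two functions can describe the same values from different starting points. In this sketch (names are illustrative), linspace() is given a count and includes the end value, while arange() is given a step and excludes its stop value, so the stop has to be pushed past 100.

```python
import numpy as np

by_count = np.linspace(0, 100, 6)  # 6 values, end value included
by_step = np.arange(0, 101, 20)    # step of 20, stop value excluded

print(by_count)
print(by_step)
print(np.array_equal(by_count, by_step))  # True: same values
```

Note that linspace() returns floats while arange() with integer arguments returns integers, even though the values match.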
13. Creating Identity Matrices
An identity matrix has ones on the main diagonal and zeros everywhere else.
identity = np.eye(4)
print(identity)
Output:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Identity matrices are common in linear algebra.
If you want a rectangular matrix with diagonal ones, use np.eye() with two dimensions:
wide_identity = np.eye(3, 5)
print(wide_identity)
Output:
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]
14. Important Array Attributes
Create three arrays:
one_d = np.arange(6)
two_d = np.arange(12).reshape(3, 4)
three_d = np.arange(24).reshape(2, 3, 4)
ndim
ndim tells you how many dimensions an array has.
print(one_d.ndim)
print(two_d.ndim)
print(three_d.ndim)
Explanation
- The code reads the ndim attribute of each NumPy array to get its number of dimensions.
- one_d, two_d, and three_d are the one-, two-, and three-dimensional arrays created above.
- The print function writes each result to the console.
- Checking ndim is a quick way to confirm the structure of the data you are handling.
Output:
1
2
3
shape
shape tells you the size of each dimension.
print(one_d.shape)
print(two_d.shape)
print(three_d.shape)
Explanation
- The print function outputs the shape of each array to the console.
- one_d.shape returns a one-element tuple holding the array's length.
- two_d.shape returns a tuple of (rows, columns).
- three_d.shape returns a tuple of (blocks, rows, columns).
- Reading shape is the quickest way to check the structure and size of an array.
Output:
(6,)
(3, 4)
(2, 3, 4)
For three_d, the shape means:
2 blocks, 3 rows per block, 4 columns per row
size
size tells you the total number of elements.
print(one_d.size)
print(two_d.size)
print(three_d.size)
Explanation
- The print function writes the size of each array to the console.
- one_d.size, two_d.size, and three_d.size read the size attribute, which is the total number of elements in the array.
- one_d, two_d, and three_d are the NumPy arrays created at the start of this section.
- The sizes are printed in order; each one equals the product of the numbers in that array's shape.
Output:
6
12
24
dtype
dtype tells you the data type.
print(two_d.dtype)
Explanation
- The print function outputs the result to the console.
- two_d is the NumPy array created at the start of this section.
- The dtype attribute reports the type of data stored in the array, such as integers or floats.
- Checking dtype is useful when debugging or confirming what kind of data you are processing.
itemsize
itemsize tells you how many bytes each element uses.
small = np.array([1, 2, 3], dtype=np.int16)
large = np.array([1, 2, 3], dtype=np.int64)
print(small.itemsize)
print(large.itemsize)
Explanation
- The code creates two NumPy arrays, small and large, with different data types: int16 and int64, respectively.
- The itemsize attribute returns the size in bytes of each element in the array.
- The print statements output the per-element size for both arrays, showing the difference in storage between the two data types.
- This comparison shows how the choice of dtype affects memory usage.
Output:
2
8
This matters when you work with large datasets.
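To make the memory difference concrete, here is a small sketch with a hypothetical array of one million readings. The nbytes attribute reports the total memory used by the elements, which is size multiplied by itemsize.

```python
import numpy as np

readings_64 = np.zeros(1_000_000, dtype=np.int64)
readings_16 = readings_64.astype(np.int16)

# nbytes = size * itemsize
print(readings_64.nbytes)  # 8000000
print(readings_16.nbytes)  # 2000000
```

The smaller dtype only helps if every value fits in it: int16 holds values from -32768 to 32767.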
15. Changing Data Types With astype()
Use astype() to create a converted copy of an array.
ratings = np.array([4.8, 3.2, 5.0, 2.9])
rounded_ratings = ratings.astype(int)
print(rounded_ratings)
Explanation
- The code initializes a NumPy array named ratings containing floating-point ratings.
- astype(int) converts each rating to an integer by dropping the decimal part.
- The converted copy is stored in the variable rounded_ratings; ratings itself is unchanged.
- Finally, the code prints the rounded_ratings array to the console.
Output:
[4 3 5 2]
Be careful: converting floats to integers removes the decimal part. It does not round to the nearest number.
values = np.array([1.9, 2.1, 3.7])
print(values.astype(int))
Explanation
- The code initializes a NumPy array named values containing floating-point numbers.
- astype(int) converts each element from float to integer type.
- The print function outputs the resulting array.
- The conversion truncates toward zero. For the positive values here that looks like rounding down, but a negative value such as -1.9 would become -1.
Output:
[1 2 3]
If you want proper rounding, use np.round() first:
print(np.round(values).astype(int))
Explanation
- np.round(values) rounds each element in the values array to the nearest integer. Values exactly halfway between two integers go to the nearest even one, so 2.5 becomes 2.0.
- The rounded result is still a float array, so .astype(int) converts it to an integer type.
- This is useful where integer values are required, such as indexing or counting.
Output:
[2 2 4]
16. Scalar Operations
A scalar is a single value.
prices = np.array([100, 200, 300])
print(prices + 10)
print(prices * 2)
print(prices / 4)
Explanation
- Initializes a NumPy array named prices containing three integer values: 100, 200, and 300.
- Adds 10 to each element, producing a new array with values [110, 210, 310].
- Multiplies each element by 2, producing [200, 400, 600].
- Divides each element by 4, producing the float array [25.0, 50.0, 75.0]. Division with / returns floats even when the input is an integer array.
Output:
[110 210 310]
[200 400 600]
[25. 50. 75.]
The operation is applied to every element.
You can also compare every element:
marks = np.array([45, 72, 88, 39])
print(marks >= 50)
Explanation
- The code initializes a NumPy array named marks containing four integer scores.
- The comparison marks >= 50 creates a boolean array indicating which scores are greater than or equal to 50.
- The printed result shows True for scores that meet the condition and False for those that do not.
- This is a quick way to check every score against a passing mark.
Output:
[False  True  True False]
This returns a boolean array.
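A boolean array like this is most useful as a mask. The sketch below, which reuses the marks array and introduces the name passed for illustration, selects the passing scores and counts them. Summing works because True counts as 1 and False as 0.

```python
import numpy as np

marks = np.array([45, 72, 88, 39])
passed = marks >= 50

print(marks[passed])  # [72 88]  only the scores where the mask is True
print(passed.sum())   # 2        number of passing scores
print(passed.mean())  # 0.5      fraction of passing scores
```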
17. Array-to-Array Operations
Arrays with the same shape can be added, subtracted, multiplied, and compared element by element.
jan = np.array([120, 90, 150])
feb = np.array([130, 85, 170])
print(feb - jan)
print(feb > jan)
Explanation
- Two NumPy arrays, jan and feb, are created to represent values for January and February.
- The expression feb - jan calculates the difference between corresponding elements, showing how each value changed from January to February.
- The expression feb > jan performs an element-wise comparison, returning a boolean array that marks where February is greater than January.
Output:
[10 -5 20]
[ True False  True]
For 2D arrays:
morning = np.array([
    [8, 10, 12],
    [7, 9, 11],
])
evening = np.array([
    [5, 6, 7],
    [4, 5, 6],
])
print(morning + evening)
Explanation
- The code initializes two 2D NumPy arrays, morning and evening, containing integer values.
- It uses the np.array function to create these arrays.
- The print function outputs the result of adding the two arrays element by element.
- The resulting array has the same shape as the inputs, with each element being the sum of the corresponding elements from morning and evening.
Output:
[[13 16 19]
 [11 14 17]]
18. Useful Array Functions
Create a small dataset:
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])
Explanation
- Initializes a NumPy array named orders that stores a 2D matrix.
- The matrix has three rows and three columns of order quantities.
- Each inner list is one row of the matrix.
- This structure is convenient for running calculations over the order data.
Sum
print(np.sum(orders))
Explanation
- np.sum() computes the sum of all elements in the orders array.
- The result is printed to the console.
- Because the work happens inside NumPy, this stays efficient for large arrays.
Output:
126
Minimum and maximum
print(np.min(orders))
print(np.max(orders))
Explanation
- np.min(orders) returns the smallest value in the orders array.
- np.max(orders) returns the largest value in the orders array.
- Together they give a quick view of the range of values in a dataset.
Output:
9
21
Mean
print(np.mean(orders))
Explanation
- np.mean() computes the average of all elements in orders.
- The function also accepts a plain list of numbers, although orders here is a NumPy array.
- The result is printed directly to the console.
Output:
14.0
Standard deviation
print(np.std(orders))
Explanation
- np.std() computes the standard deviation of all elements in orders.
- Like np.mean(), it accepts a NumPy array or a list of numbers.
- The result is printed to the console.
Standard deviation tells you how spread out the values are.
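To make "spread" concrete, the sketch below rebuilds the standard deviation by hand for the same orders array: take each value's distance from the mean, square it, average the squares, and take the square root. The names deviations and manual_std are illustrative.

```python
import numpy as np

orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])

deviations = orders - orders.mean()             # distance of each value from 14.0
manual_std = np.sqrt((deviations ** 2).mean())  # root of the mean squared distance

print(manual_std)
print(np.isclose(manual_std, np.std(orders)))   # True

# ddof=1 divides by n - 1 instead of n (the sample standard deviation)
print(np.std(orders, ddof=1))
```

By default np.std() divides by the number of values (the population formula), so pass ddof=1 when you need the sample version.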
19. Understanding axis
The axis argument tells NumPy which direction to calculate across.
Use this array:
orders = np.array([
    [12, 18, 10],
    [9, 15, 21],
    [14, 11, 16],
])
Explanation
- Initializes a NumPy array named orders that stores a 2D matrix.
- The matrix has three rows and three columns of order quantities.
- Each inner list is one row of the matrix.
Think of it as:
rows -> different stores
columns -> different days
axis=0
axis=0 works down the rows, so it produces one result per column.
print(np.sum(orders, axis=0))
Explanation
- np.sum() calculates the sum of array elements.
- axis=0 tells it to add up the values in each column, collapsing the rows.
- The result has one total per column, which here means one total per day.
Output:
[35 44 47]
This means:
- day 1 total: 12 + 9 + 14 = 35
- day 2 total: 18 + 15 + 11 = 44
- day 3 total: 10 + 21 + 16 = 47
axis=1
axis=1 works across the columns, so it produces one result per row.
print(np.sum(orders, axis=1))
Explanation
- np.sum() calculates the sum of elements along the specified axis.
- axis=1 tells it to add up the values in each row, collapsing the columns.
- The result has one total per row, which here means one total per store.
Output:
[40 45 41]
This means:
- store 1 total: 40
- store 2 total: 45
- store 3 total: 41
This is one of the most important NumPy ideas. If your result shape looks wrong, check your axis.
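A square array can hide an axis mistake, because both directions give a result of the same length. The sketch below uses a hypothetical visits array with 2 stores and 4 days, so the result shape shows which axis was collapsed.

```python
import numpy as np

# 2 stores (rows) x 4 days (columns); illustrative data
visits = np.array([
    [5, 8, 6, 7],
    [3, 4, 9, 2],
])

per_day = visits.sum(axis=0)    # collapses rows: one total per column
per_store = visits.sum(axis=1)  # collapses columns: one total per row

print(per_day, per_day.shape)      # [ 8 12 15  9] (4,)
print(per_store, per_store.shape)  # [26 18] (2,)

# keepdims=True keeps the collapsed axis with length 1
print(visits.sum(axis=1, keepdims=True).shape)  # (2, 1)
```

The axis you name is the one that disappears from the shape.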
20. Mathematical Functions
NumPy includes many mathematical functions.
values = np.array([1, 2, 3, 4])
print(np.sqrt(values))
print(np.exp(values))
print(np.log(values))
Explanation
- The code initializes a NumPy array called values containing the integers from 1 to 4.
- np.sqrt() returns an array with the square root of each element.
- np.exp() returns an array of e raised to the power of each value.
- np.log() returns an array with the natural logarithm of each element.
Output:
[1.         1.41421356 1.73205081 2.        ]
[ 2.71828183  7.3890561  20.08553692 54.59815003]
[0.         0.69314718 1.09861229 1.38629436]
Trigonometric functions also work:
angles = np.array([0, np.pi / 2, np.pi])
print(np.sin(angles))
Explanation
- The code creates an array of angles in radians: 0, π/2, and π.
- It then computes the sine of each angle using np.sin().
- The printed values correspond to the exact results 0, 1, and 0.
- The snippet shows a vectorized operation: one call handles the whole array.
Output:
[0.0000000e+00 1.0000000e+00 1.2246468e-16]
The last value is extremely close to zero. Floating-point calculations sometimes produce tiny approximation errors.
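Because of these tiny errors, comparing floats with == can give surprising answers. A common approach, sketched here with an illustrative expected array, is to compare within a tolerance using np.isclose() or np.allclose().

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)
expected = np.array([0.0, 1.0, 0.0])

print(sines == expected)             # exact comparison: the last entry is False
print(np.isclose(sines, expected))   # element-wise comparison within a tolerance
print(np.allclose(sines, expected))  # True if every element is close
```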
21. Rounding Values
measurements = np.array([2.2, 2.8, 3.1, 3.9])
print(np.round(measurements))
print(np.floor(measurements))
print(np.ceil(measurements))
Explanation
- Initializes a NumPy array called measurements with floating-point values.
- Uses np.round() to round each element in the array to the nearest integer.
- Applies np.floor() to return the largest integer less than or equal to each element.
- Uses np.ceil() to return the smallest integer greater than or equal to each element.
Output:
[2. 3. 3. 4.]
[2. 2. 3. 3.]
[3. 3. 4. 4.]
Use:
- round for the nearest value
- floor to round down
- ceil to round up
22. Dot Product
The dot product is a common linear algebra operation.
For two 1D arrays:
weights = np.array([0.2, 0.5, 0.3])
features = np.array([80, 60, 90])
score = np.dot(weights, features)
print(score)
Explanation
- The weights array contains the coefficients that represent the importance of each feature.
- The features array holds the values of the features being evaluated.
- np.dot() computes the dot product of weights and features, a single score equal to the weighted sum.
- Finally, the calculated score is printed to the console.
Output:
73.0
This is:
0.2*80 + 0.5*60 + 0.3*90
For matrices, the inner dimensions must match.
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
print(np.dot(a, b))
Explanation
- The code initializes two NumPy arrays, a and b, using np.arange() and reshape().
- Array a is a 2x3 matrix containing values from 0 to 5, while array b is a 3x4 matrix containing values from 0 to 11.
- For two 2D arrays, np.dot() performs matrix multiplication, producing a new 2x4 matrix.
- The result is printed to the console.
Here:
a shape = (2, 3)
b shape = (3, 4)
result shape = (2, 4)
23. Indexing 1D Arrays
Indexing means selecting values by position.
scores = np.array([55, 70, 82, 91, 64])
print(scores[0])
print(scores[3])
print(scores[-1])
Explanation
- Initializes a NumPy array named scores containing five integer scores.
- print(scores[0]) outputs the first element of the array, which is 55.
- print(scores[3]) outputs the fourth element of the array, which is 91.
- print(scores[-1]) outputs the last element of the array, which is 64, demonstrating negative indexing.
Output:
55
91
64
Python indexing starts at zero.
Negative indexing starts from the end.
24. Slicing 1D Arrays
Slicing selects a range.
scores = np.array([55, 70, 82, 91, 64, 77])
print(scores[1:4])
Explanation
- The code initializes a NumPy array named scores containing six integer scores.
- The slice scores[1:4] retrieves the elements at indexes 1, 2, and 3 (index 1 is included, index 4 is excluded).
- The print function outputs the sliced portion of the array: 70, 82, and 91.
- Slicing is a convenient way to work with a subset of a larger dataset.
Output:
[70 82 91]
The start index is included. The stop index is excluded.
You can add a step:
print(scores[0:6:2])
Explanation
- scores[0:6:2] uses the same start:stop:step slicing syntax as Python lists.
- The slice starts at index 0 and goes up to, but does not include, index 6.
- The step value of 2 selects every second element in that range.
- The print function outputs the result to the console.
Output:
[55 82 64]
Reverse an array:
print(scores[::-1])
Explanation
- The slice [::-1] walks the scores array from the last element to the first.
- For a NumPy array this returns a reversed view of the same data rather than a copy.
- The print() function outputs the reversed values to the console.
- The scores array itself keeps its original order after this operation.
Output:
[77 64 91 82 70 55]
25. Indexing 2D Arrays
Use row and column positions.
table = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
])
print(table[1, 2])
Explanation
- A 2D NumPy array named table is created with three rows and three columns of integer values.
- The print function outputs one specific element from the array.
- The element is located at the second row (index 1) and third column (index 2), which is the value 60.
- NumPy uses zero-based indexing, meaning the first row and column are indexed as 0.
Output:
60
This means:
row index 1, column index 2
Select a full row:
print(table[0, :])
Explanation
- table is the 2D NumPy array defined above.
- The index [0, :] selects all columns of the first row (index 0).
- The colon : means every position along that axis.
- The result is a 1D array holding the first row.
Output:
[10 20 30]
Select a full column:
print(table[:, 1])
Explanation
- table[:, 1] selects all rows (:) from the second column (index 1).
- The result is a one-dimensional array containing every value from that column.
- The comma-separated [rows, columns] syntax works on NumPy arrays but not on a plain list of lists, so table must be a NumPy array.
Output:
[20 50 80]
Select a smaller block:
print(table[0:2, 1:3])
Explanation
- The slice 0:2 selects the first two rows (indexes 0 and 1).
- The slice 1:3 selects the second and third columns (indexes 1 and 2).
- The result is a smaller 2D array containing only those rows and columns of table.
Output:
[[20 30]
 [50 60]]
26. Indexing 3D Arrays
A 3D array has three indexes:
block, row, column
Example:
cube = np.arange(24).reshape(2, 3, 4)
print(cube)
Explanation
- The code uses NumPy to create a 3D array named cube with shape (2, 3, 4).
- np.arange(24) generates a one-dimensional array with values from 0 to 23.
- reshape(2, 3, 4) rearranges those values into 2 blocks of 3 rows and 4 columns.
- print(cube) outputs the contents of the 3D array to the console.
The shape is:
print(cube.shape)
Explanation
- cube is the NumPy array created above.
- The shape attribute returns a tuple giving the size of each dimension.
- Here the tuple reads as (blocks, rows, columns).
Output:
(2, 3, 4)
Get one block:
print(cube[0])
Explanation
- cube[0] selects the first block of the 3D array.
- The result is a 2D array with 3 rows and 4 columns, holding the values 0 to 11.
- Indexing is zero-based, so the first block has index 0.
Get one row from one block:
print(cube[1, 2])
Explanation
- cube[1, 2] selects the row at index 2 (the third row) from the block at index 1 (the second block).
- The result is a 1D array of four values: [20 21 22 23].
- A single number would need a third index for the column.
Get one value:
print(cube[1, 2, 3])
Explanation
- The code prints the single value at block index 1, row index 2, column index 3, which is 23.
- The indexes are zero-based, so (1, 2, 3) refers to the second block, third row, and fourth column.
- Supplying one index per dimension always returns a single element.
When slicing 3D arrays, say the dimensions out loud:
which block?
which row?
which column?
That habit makes indexing much less confusing.
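The same three questions apply when you mix slices with single indexes. A short sketch using the cube array from above (the result names are illustrative):

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

# Which block? all of them. Which row? the first. Which column? all of them.
first_rows = cube[:, 0, :]
print(first_rows.shape)  # (2, 4)
print(first_rows)

# Which block? the second. Which row? all of them. Which column? the last.
last_column = cube[1, :, -1]
print(last_column)  # [15 19 23]
```

Each single index removes one dimension from the result, while each slice keeps it.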
27. Iterating Over Arrays
For a 1D array, a loop gives individual values:
arr = np.array([10, 20, 30])
for value in arr:
    print(value)
Explanation
- Initializes a NumPy array arr with three integer elements: 10, 20, and 30.
- Uses a for loop to iterate over each element in the array.
- Prints each element of the array to the console, one per line.
For a 2D array, a loop gives rows:
matrix = np.arange(6).reshape(2, 3)
for row in matrix:
    print(row)
Explanation
- The code uses NumPy's arange function to generate an array of integers from 0 to 5.
- The reshape method then arranges this 1D array into a 2D array with 2 rows and 3 columns.
- A for loop iterates over the first dimension, so each step yields one row.
- Each row is printed as a separate 1D NumPy array.
Output:
[0 1 2]
[3 4 5]
If you want every element regardless of dimensions, use np.nditer():
for value in np.nditer(matrix):
    print(value)
Explanation
- np.nditer creates an iterator that visits every element of an array, whatever its shape.
- The loop steps through each element of matrix in turn.
- The print(value) statement outputs each element to the console.
- This is still a Python-level loop, so on large arrays it is much slower than a vectorized operation.
Use normal vectorized operations when possible. Iteration is useful for learning, debugging, or special cases, but NumPy is usually strongest when you avoid Python loops.
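As a rough comparison, the sketch below doubles every element twice: once with an explicit loop and once with a vectorized expression. It uses np.ndenumerate(), which yields each element together with its index; the result names are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

# Loop version: visit each element and fill the result by hand
doubled_loop = np.empty_like(matrix)
for index, value in np.ndenumerate(matrix):
    doubled_loop[index] = value * 2

# Vectorized version: one expression, no Python loop
doubled_vec = matrix * 2

print(np.array_equal(doubled_loop, doubled_vec))  # True
```

Both give the same answer; the vectorized form is shorter and leaves the looping to NumPy.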
28. Transpose
Transpose swaps rows and columns.
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
print(matrix.T)
Explanation
- The code initializes a 2D NumPy array named matrix with two rows and three columns.
- The np.array function creates the array from a list of lists.
- print(matrix.T) outputs the transposed array, where rows become columns and columns become rows.
- The .T attribute is a convenient shorthand for the transpose.
- The shape changes from (2, 3) to (3, 2).
Output:
[[1 4]
 [2 5]
 [3 6]]
You can also use:
print(np.transpose(matrix))
Explanation
- np.transpose() swaps the rows and columns of the input matrix.
- For a 2D array it gives the same result as matrix.T.
- The first row becomes the first column, the second row becomes the second column, and so on.
- This operation is common in linear algebra and data manipulation tasks.
29. Flattening With ravel()
ravel() turns a multi-dimensional array into a 1D view when possible.
matrix = np.arange(12).reshape(3, 4)
flat = matrix.ravel()
print(flat)
Explanation
- The code builds a 3x4 matrix from np.arange(12), which creates the integers from 0 to 11.
- The reshape(3, 4) method arranges them into 3 rows and 4 columns.
- The ravel() method is called on the matrix to flatten it into a one-dimensional array.
- Finally, the flattened array is printed, displaying the elements in a single row.
Output:
[ 0  1  2  3  4  5  6  7  8  9 10 11]
This is useful when a function expects a simple 1D input.
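Because ravel() returns a view when it can, changes made through the flattened array can show up in the original. The sketch below contrasts it with flatten(), which always returns a copy. It uses np.shares_memory() to check whether two arrays use the same underlying data; the names flat_view and flat_copy are illustrative.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

flat_view = matrix.ravel()    # a view here, because matrix is contiguous
flat_copy = matrix.flatten()  # always an independent copy

print(np.shares_memory(matrix, flat_view))  # True
print(np.shares_memory(matrix, flat_copy))  # False

flat_view[0] = 99
print(matrix[0, 0])  # 99: the change is visible in the original
print(flat_copy[0])  # 0: the copy is unaffected
```

Reach for flatten() when you plan to modify the 1D result and want the original left alone.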
30. Horizontal and Vertical Stacking
Stacking combines arrays.
Create two arrays with the same shape:
left = np.array([
[1, 2],
[3, 4],
])
right = np.array([
[10, 20],
[30, 40],
])
Explanation
- The left variable is a 2x2 NumPy array containing the integers 1, 2, 3, and 4.
- The right variable is another 2x2 NumPy array containing the integers 10, 20, 30, and 40.
- Because both arrays have shape (2, 2), they can be stacked either horizontally or vertically.
Horizontal stack
hstack() joins arrays side by side.
print(np.hstack((left, right)))
Explanation
- Uses the np.hstack function from the NumPy library to concatenate arrays.
- Takes two input arrays, left and right, and joins them along the horizontal axis (axis=1 for 2D arrays).
- The result keeps the same number of rows as the inputs and places their columns side by side.
- This operation is useful for data manipulation and preparation in numerical computations.
Output:
[[ 1 2 10 20]
[ 3 4 30 40]]
Vertical stack
vstack() joins arrays top to bottom.
print(np.vstack((left, right)))
Explanation
- Uses the np.vstack() function from the NumPy library to combine arrays.
- Takes two input arrays, left and right, and stacks them on top of each other.
- The result keeps the same number of columns as the inputs, and its row count is the sum of their row counts. The inputs must have the same number of columns.
- Useful for consolidating data from different sources into a single dataset for analysis or processing.
Output:
[[ 1 2]
[ 3 4]
[10 20]
[30 40]]
The shapes must be compatible: hstack() needs matching row counts, and vstack() needs matching column counts. If stacking fails, print the shapes first.
print(left.shape)
print(right.shape)
Explanation
- The code uses the shape attribute to retrieve the dimensions of two NumPy arrays, left and right.
- print(left.shape) outputs the size of the left array along each dimension, here (2, 2).
- print(right.shape) does the same for the right array.
- Comparing the two shapes shows whether the arrays can be stacked in the direction you want.
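Here is a small sketch of what an incompatible stack looks like. The tall array is a hypothetical extra array with three rows, added only to trigger the mismatch.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])                  # shape (2, 2)
tall = np.array([[10, 20], [30, 40], [50, 60]])    # shape (3, 2), hypothetical

print(left.shape, tall.shape)

# vstack only needs the column counts to match: 2 == 2, so this works.
stacked = np.vstack((left, tall))
print(stacked.shape)

# hstack needs the row counts to match: 2 != 3, so this fails.
try:
    np.hstack((left, tall))
    hstack_failed = False
except ValueError as err:
    hstack_failed = True
    print("hstack failed:", err)
```

Printing the shapes first makes it clear which direction is safe before you call either function.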
31. Splitting Arrays
Splitting breaks an array into smaller arrays.
data = np.arange(12).reshape(3, 4)
print(data)
Explanation
- The code uses NumPy's arange function to generate an array of integers from 0 to 11.
- The reshape method is then called to transform this 1D array into a 2D array with 3 rows and 4 columns.
- Finally, the reshaped array is printed to the console, displaying its structured format.
- This technique is useful for organizing data in a matrix form for further analysis or manipulation.
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Horizontal split
parts = np.hsplit(data, 2)
for part in parts:
    print(part)
Explanation
- The np.hsplit function splits the data array column-wise into two equal parts.
- The resulting parts are stored in the parts variable as a list of arrays.
- A for loop iterates through each array in the parts list.
- The print function outputs each part to the console, allowing for easy visualization of the split data.
This splits the 4 columns into 2 equal parts, each with shape (3, 2).
Vertical split
rows = np.vsplit(data, 3)
for row_group in rows:
    print(row_group)
Explanation
- The np.vsplit function from the NumPy library divides the data array row-wise into three equal groups.
- The resulting rows variable is a list containing these three sections, each with shape (1, 4).
- A for loop iterates through each section in rows, allowing individual printing of each group.
- This approach is useful for visualizing or processing parts of a larger dataset separately.
This splits the rows into 3 equal parts.
If the array cannot be split evenly, NumPy raises an error.
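The following sketch shows the error for an uneven split and one way around it. np.array_split is a separate NumPy function that accepts sections of unequal size. Whether uneven pieces make sense depends on your data, so treat this as an option rather than a drop-in replacement.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# 4 columns cannot be divided into 3 equal parts, so hsplit raises.
try:
    np.hsplit(data, 3)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("hsplit failed:", err)

# array_split accepts uneven sections; axis=1 splits the columns.
parts = np.array_split(data, 3, axis=1)
shapes = [part.shape for part in parts]
print(shapes)
```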
32. Beginner Mistakes to Avoid
Mistake 1: Forgetting that reshape must preserve size
np.arange(10).reshape(3, 4)
Explanation
- np.arange(10) generates an array of integers from 0 to 9, which has 10 elements.
- The reshape(3, 4) method tries to reorganize that flat array into a 3-row by 4-column format.
- The total number of elements (10) must match the product of the requested dimensions (3 * 4 = 12). Here they do not match, so NumPy raises a ValueError.
- Comparing arr.size with the product of the new shape before reshaping avoids this error.
This fails because 10 values cannot fill 12 positions.
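A short sketch of the size check, plus the -1 placeholder that lets NumPy work out one dimension for you. The rows and cols names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# Check that the sizes match before reshaping.
rows, cols = 3, 4
fits = arr.size == rows * cols
print(fits)

# -1 asks NumPy to infer that dimension from the array size.
inferred = arr.reshape(2, -1)
print(inferred.shape)

# The mismatched shape raises a ValueError.
try:
    arr.reshape(rows, cols)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```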
Mistake 2: Confusing axis directions
For a 2D array:
- axis=0 gives column-wise results
- axis=1 gives row-wise results
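One way to remember the axis rule is that the axis you pass is the one that disappears from the shape. A minimal sketch with a small example table:

```python
import numpy as np

table = np.array([
    [1, 2, 3],
    [4, 5, 6],
])  # shape (2, 3)

col_sums = table.sum(axis=0)   # collapses the rows: one value per column
row_sums = table.sum(axis=1)   # collapses the columns: one value per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```

The (2, 3) input gives a (3,) result for axis=0 and a (2,) result for axis=1.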
Mistake 3: Expecting lists and arrays to behave the same
print([1, 2, 3] * 2)
print(np.array([1, 2, 3]) * 2)
Explanation
- The first line multiplies a Python list [1, 2, 3] by 2, which repeats the list: [1, 2, 3, 1, 2, 3].
- The second line creates a NumPy array from [1, 2, 3] and multiplies each element by 2, producing a new array: [2 4 6].
- The same * operator therefore means repetition for lists and element-wise multiplication for arrays.
- To use the second line, NumPy must be imported with import numpy as np.
Output:
[1, 2, 3, 1, 2, 3]
[2 4 6]
Lists repeat. NumPy arrays multiply element by element.
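The + operator shows the same split in behavior. This sketch uses small illustrative values:

```python
import numpy as np

list_a, list_b = [1, 2, 3], [10, 20, 30]
arr_a, arr_b = np.array(list_a), np.array(list_b)

joined = list_a + list_b      # lists concatenate
added = arr_a + arr_b         # arrays add element by element

print(joined)
print(added)

# To concatenate arrays, ask for it explicitly.
combined = np.concatenate((arr_a, arr_b))
print(combined)
```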
Mistake 4: Not checking shape before operations
When something fails, print:
print(arr.shape)
print(arr.dtype)
Explanation
- The print(arr.shape) statement outputs the dimensions of the NumPy array arr, indicating how many elements are along each axis.
- The print(arr.dtype) statement reveals the data type of the elements contained in the array, such as integers, floats, or strings.
- This information is crucial for understanding the structure and type of data being handled in numerical computations.
- Both attributes help when debugging operations that fail or return unexpected results.
This small habit solves many beginner errors.
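As an illustration of that habit, the sketch below triggers a typical shape error and uses the printed shapes to find the fix. The scores and bonus arrays are hypothetical data.

```python
import numpy as np

scores = np.array([[70, 80, 90],
                   [60, 65, 75]])    # shape (2, 3)
bonus = np.array([5, 10])            # shape (2,), one bonus per row (hypothetical)

try:
    scores + bonus
    add_failed = False
except ValueError:
    add_failed = True
    # Printing the shapes shows the mismatch: (2, 3) versus (2,)
    print(scores.shape, bonus.shape)
    print(scores.dtype, bonus.dtype)

# Reshaping bonus into a column of shape (2, 1) lets it broadcast across each row.
fixed = scores + bonus.reshape(2, 1)
print(fixed)
```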
33. Practice Exercises
Try these before reading the solutions.
Practice Lab
Exercise 1: Create a mostly-zero vector
Create an array of size 10 filled with zeros. Change the value at index 4 to 1.
Practice Lab
Exercise 2: Random score table
Create a random array with shape (4, 3) to represent marks for 4 students in 3 tests. Print:
- the array
- the average of all marks
- the average mark for each student
Practice Lab
Exercise 3: Border matrix
Write a function make_border(rows, cols) that returns a 2D array with ones on the border and zeros inside.
For rows=4 and cols=5, the result should look like:
[[1. 1. 1. 1. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 1. 1. 1. 1.]]
Practice Lab
Exercise 4: Values between 0 and 1
Create 8 evenly spaced values between 0 and 1, excluding both 0 and 1.
Practice Lab
Exercise 5: Row pattern
Create a 5 by 5 matrix where every row is:
[0 1 2 3 4]
Practice Lab
Exercise 6: Distance from a point
You have coordinate points:
points = np.array([
[2, 3],
[5, 7],
[1, 8],
[9, 4],
])
Explanation
- Initializes a NumPy array named points to hold multiple 2D coordinates.
- Each inner list represents a point in a Cartesian coordinate system, with the first element as the x-coordinate and the second as the y-coordinate.
- The array has shape (4, 2): one row per point and one column per coordinate.
- This format is useful for mathematical operations and visualizations in data analysis and machine learning tasks.
Calculate the distance of every point from:
target = np.array([3, 4])
Explanation
- Initializes a NumPy array named target containing the elements 3 and 4, the x- and y-coordinates of the reference point.
- Because target has shape (2,), it can be subtracted from every row of points through broadcasting.
Practice Lab
Exercise 7: Replace odd values
Create an array from 0 to 9. Replace odd values with -1.
Practice Lab
Exercise 8: Column swap
Create a 3 by 3 array from 1 to 9. Swap the first and last columns.
Practice Lab
Exercise 9: Row normalization
Create a 3 by 4 random integer array. Normalize each row using:
(row - row_min) / (row_max - row_min)
Practice Lab
Exercise 10: Nth largest value
Write a function nth_largest(arr, n) that returns the nth largest value from a 1D NumPy array.
34. Practice Solutions
Solution Key
Solution 1: Create a mostly-zero vector
import numpy as np
vector = np.zeros(10)
vector[4] = 1
print(vector)
Explanation
- The code imports the NumPy library, which is essential for numerical operations in Python.
- A NumPy array named vector is initialized with ten elements, all set to zero using np.zeros(10).
- The fifth element (index 4) of the array is then set to one, modifying the initial array of zeros.
- Finally, the modified array is printed, displaying a vector with a single one at the fifth position and zeros elsewhere.
Solution Key
Solution 2: Random score table
rng = np.random.default_rng(7)
marks = rng.integers(0, 101, size=(4, 3))
print(marks)
print("Overall average:", marks.mean())
print("Student averages:", marks.mean(axis=1))
Explanation
- Initializes a random number generator with a fixed seed of 7 for reproducibility.
- Generates a 4x3 array of random integers from 0 to 100 inclusive (the upper bound 101 is excluded), simulating marks for 4 students across 3 tests.
- Prints the generated marks array to the console.
- Calculates and prints the overall average of all marks in the array.
- Computes and displays the average mark for each student by averaging across the tests (axis=1).
Solution Key
Solution 3: Border matrix
def make_border(rows, cols):
    if rows < 2 or cols < 2:
        raise ValueError("rows and cols must both be at least 2")
    result = np.ones((rows, cols))   # start with ones everywhere
    result[1:-1, 1:-1] = 0           # clear everything except the outer frame
    return result

print(make_border(4, 5))
Explanation
- The function make_border takes two parameters, rows and cols, which define the dimensions of the matrix.
- It raises a ValueError if either rows or cols is less than 2, ensuring a valid border can be created.
- A NumPy array filled with ones is initialized, representing the outer border of the matrix.
- The inner section of the matrix (excluding the border) is set to zero through the slice [1:-1, 1:-1], which skips the first and last row and column.
- The function returns the resulting matrix, which can be printed or used for further processing.
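For comparison, the same result can be built from the inside out with np.pad. The function name make_border_pad is hypothetical and is not part of the original solution. This is a sketch of an alternative, not a better answer.

```python
import numpy as np

def make_border_pad(rows, cols):
    """Hypothetical variant of make_border built with np.pad."""
    if rows < 2 or cols < 2:
        raise ValueError("rows and cols must both be at least 2")
    inner = np.zeros((rows - 2, cols - 2))
    # Surround the zero block with a one-element-wide frame of ones.
    return np.pad(inner, pad_width=1, mode="constant", constant_values=1)

bordered = make_border_pad(4, 5)
print(bordered)
```

The slicing version in the solution is usually easier to read. The padding version can be handy when the inner block already exists.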
Solution Key
Solution 4: Values between 0 and 1
values = np.linspace(0, 1, 10)[1:-1]
print(values)
Explanation
- Uses NumPy's linspace function to create an array of 10 evenly spaced values between 0 and 1.
- The slicing operation [1:-1] removes the first and last elements of the generated array, effectively excluding 0 and 1.
- The resulting array contains 8 values, which are printed to the console.
- This technique is useful for generating test data or parameters for simulations where endpoints are not required.
Why 10 values?
Because including 0 and 1 gives 10 points, and removing both ends leaves 8 inner points.
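To check the reasoning, note that 10 points from 0 to 1 are spaced 1/9 apart, so the 8 inner values are 1/9, 2/9, and so on up to 8/9. A small sketch that builds them both ways and compares:

```python
import numpy as np

sliced = np.linspace(0, 1, 10)[1:-1]

# Same values built directly: k/9 for k = 1..8.
direct = np.arange(1, 9) / 9

count = sliced.size
matches = np.allclose(sliced, direct)
print(count)
print(matches)
```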
Solution Key
Solution 5: Row pattern
pattern = np.zeros((5, 5), dtype=int)
pattern += np.arange(5)
print(pattern)
Explanation
- Initializes a 5x5 integer matrix filled with zeros using NumPy's zeros function.
- Uses np.arange(5) to generate an array of integers from 0 to 4.
- The += operation broadcasts that 1D array across every row of the matrix, so each row ends up with the same values.
- The final output displays the modified matrix, where each row contains the values [0, 1, 2, 3, 4].
Alternative:
pattern = np.tile(np.arange(5), (5, 1))
print(pattern)
Explanation
- The np.arange(5) function generates a 1D array containing integers from 0 to 4.
- The np.tile() function repeats this 1D array 5 times along the vertical axis, creating a 2D array.
- The resulting pattern variable is a 5x5 array where each row is identical and contains the sequence [0, 1, 2, 3, 4].
- The print(pattern) statement outputs the 2D array to the console for visualization.
Solution Key
Solution 6: Distance from a point
points = np.array([
[2, 3],
[5, 7],
[1, 8],
[9, 4],
])
target = np.array([3, 4])
distances = np.sqrt(np.sum((points - target) ** 2, axis=1))
print(distances)
Explanation
- Initializes a NumPy array points containing multiple 2D coordinates.
- Defines a target point as a NumPy array for which distances will be calculated.
- Computes the Euclidean distance from the target to each point in points using the formula √((x2 - x1)² + (y2 - y1)²).
- Uses broadcasting to subtract the target from each point, then squares the result before summing along axis 1.
- Outputs the calculated distances as a NumPy array.
Step by step:
- points - target subtracts the target from every point
- ** 2 squares the differences
- sum(axis=1) adds the squared x and y differences for each point
- sqrt() calculates the final distance
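The same distances can also be computed with np.linalg.norm, which combines the square, sum, and square-root steps when given axis=1. The sketch below checks that both approaches agree and then picks the nearest point, an extra step added here for illustration.

```python
import numpy as np

points = np.array([[2, 3], [5, 7], [1, 8], [9, 4]])
target = np.array([3, 4])

manual = np.sqrt(np.sum((points - target) ** 2, axis=1))
with_norm = np.linalg.norm(points - target, axis=1)

print(with_norm)
agree = np.allclose(manual, with_norm)
print(agree)

# Index of the smallest distance, then the matching point.
nearest_index = np.argmin(with_norm)
print(points[nearest_index])
```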
Solution Key
Solution 7: Replace odd values
arr = np.arange(10)
arr[arr % 2 == 1] = -1
print(arr)
Explanation
- Initializes a NumPy array arr containing integers from 0 to 9 using np.arange(10).
- Uses boolean indexing to identify odd numbers in the array with the condition arr % 2 == 1.
- Replaces all identified odd numbers in the array with -1.
- Prints the modified array, showing even numbers unchanged and odd numbers replaced.
Output:
[ 0 -1 2 -1 4 -1 6 -1 8 -1]
Solution Key
Solution 8: Column swap
matrix = np.arange(1, 10).reshape(3, 3)
swapped = matrix[:, [2, 1, 0]]
print(matrix)
print(swapped)
Explanation
- The code initializes a 3x3 matrix using np.arange(1, 10), which generates numbers from 1 to 9, and reshapes it into a 3x3 format.
- The swapped variable reorders the columns of the original matrix by selecting them in reverse order, from the last column to the first. With three columns, this swaps the first and last and leaves the middle one in place.
- The original matrix and the modified matrix with swapped columns are printed to the console for comparison.
- Indexing with a list of column numbers returns a new array, so matrix itself is unchanged.
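Reversing the column order only equals a first-and-last swap when there are three columns or fewer. For wider arrays, a sketch like the following swaps just the two outer columns. The wide array is a hypothetical 3 by 4 example.

```python
import numpy as np

wide = np.arange(12).reshape(3, 4)

# Reversing every column is not the same as swapping only the first and last
# once there are more than three columns.
reversed_cols = wide[:, ::-1]

# Swap just the first and last columns on a copy, leaving the middle untouched.
swapped = wide.copy()
swapped[:, [0, -1]] = swapped[:, [-1, 0]]

print(reversed_cols[0])
print(swapped[0])
```

The assignment works because the right-hand side is evaluated into a temporary copy before anything is written back.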
Solution Key
Solution 9: Row normalization
rng = np.random.default_rng(10)
data = rng.integers(1, 50, size=(3, 4))
row_min = data.min(axis=1, keepdims=True)
row_max = data.max(axis=1, keepdims=True)
normalized = (data - row_min) / (row_max - row_min)
print(data)
print(normalized)
Explanation
- Initializes a random number generator with a fixed seed for reproducibility.
- Generates a 3x4 array of random integers from 1 to 49 (the upper bound 50 is excluded).
- Computes the minimum and maximum values for each row. With keepdims=True, each result stays 2D with shape (3, 1).
- Applies min-max normalization to scale each row to the range 0 to 1.
- Outputs both the original random integer array and the normalized array.
keepdims=True keeps each result as a column of shape (3, 1), which lets NumPy broadcasting line it up against each row. If a row contained only identical values, row_max - row_min would be 0 and the division would produce nan, so real code should guard against that case.
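The sketch below shows the shape difference that keepdims makes and one possible guard for constant rows. The data is hypothetical and deliberately includes a row where every value is the same. Mapping such rows to zeros is an assumption made here, not the only reasonable choice.

```python
import numpy as np

data = np.array([
    [10, 20, 30, 40],
    [5, 5, 5, 5],      # constant row: max - min is 0
    [1, 3, 2, 9],
])  # hypothetical data

plain_shape = data.min(axis=1).shape                  # (3,)
kept_shape = data.min(axis=1, keepdims=True).shape    # (3, 1)
print(plain_shape, kept_shape)

row_min = data.min(axis=1, keepdims=True)
row_range = data.max(axis=1, keepdims=True) - row_min

# Replace zero ranges with 1 so constant rows normalize to all zeros
# instead of dividing by zero.
safe_range = np.where(row_range == 0, 1, row_range)
normalized = (data - row_min) / safe_range
print(normalized)
```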
Solution Key
Solution 10: Nth largest value
def nth_largest(arr, n):
    if not isinstance(arr, np.ndarray):
        raise TypeError("arr must be a NumPy array")
    if arr.ndim != 1:
        raise ValueError("arr must be 1D")
    if n < 1 or n > arr.size:
        raise ValueError("n is outside the valid range")
    sorted_arr = np.sort(arr)   # ascending order
    return sorted_arr[-n]       # nth value counted from the end

numbers = np.array([12, 4, 99, 18, 42])
print(nth_largest(numbers, 1))
print(nth_largest(numbers, 3))
Explanation
- Defines a function nth_largest that retrieves the n-th largest element from a given NumPy array.
- Validates input to ensure the array is a 1D NumPy array and that n is within a valid range.
- Sorts the array in ascending order using np.sort() and accesses the n-th largest element using negative indexing. Duplicate values are counted separately.
- Demonstrates the function with a sample array of numbers, printing the largest and third largest elements.
Output:
99
18
35. Mini Project: Analyze Weekly Store Sales
Let us combine the basics into one small task.
Suppose you have sales from 4 stores across 7 days:
sales = np.array([
[120, 135, 150, 160, 155, 170, 180],
[90, 95, 105, 110, 108, 120, 130],
[200, 210, 190, 220, 230, 240, 250],
[60, 75, 80, 85, 90, 95, 100],
])
Explanation
- A 2D NumPy array named sales is created to store the sales figures for 4 stores over 7 days.
- Each inner list (row) holds one store's sales, and each column is one day, so the shape is (4, 7).
- The array structure allows for efficient numerical operations along either axis.
- This setup is useful for analyzing trends, comparing performance, and performing calculations on sales data.
Find:
- total sales for each store
- total sales for each day
- best store
- best day
- normalized sales for each store
Solution:
store_totals = sales.sum(axis=1)
day_totals = sales.sum(axis=0)
best_store_index = np.argmax(store_totals)
best_day_index = np.argmax(day_totals)
store_min = sales.min(axis=1, keepdims=True)
store_max = sales.max(axis=1, keepdims=True)
normalized_sales = (sales - store_min) / (store_max - store_min)
print("Store totals:", store_totals)
print("Day totals:", day_totals)
print("Best store index:", best_store_index)
print("Best day index:", best_day_index)
print("Normalized sales:")
print(normalized_sales)
Explanation
- Computes total sales for each store (axis=1) and each day (axis=0) using the sum function.
- Identifies the index of the store with the highest total sales and the day with the highest total sales using np.argmax. For this data, the best store index is 2 and the best day index is 6, both zero-based.
- Calculates the minimum and maximum sales for each store to facilitate normalization.
- Normalizes each store's sales to a range of 0 to 1 by applying the formula (sales - min) / (max - min).
- Outputs the total sales, best store and day indices, and the normalized sales data for further analysis.
This small project uses:
- 2D arrays
- axis-based aggregation
- argmax
- row-wise normalization
- broadcasting
These are the same building blocks used in real data analysis.
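An index such as 2 or 6 is easier to read once it is mapped to a label. The sketch below assumes hypothetical store and day names, since the original data does not name them, and looks up the winners.

```python
import numpy as np

sales = np.array([
    [120, 135, 150, 160, 155, 170, 180],
    [90, 95, 105, 110, 108, 120, 130],
    [200, 210, 190, 220, 230, 240, 250],
    [60, 75, 80, 85, 90, 95, 100],
])

# Hypothetical labels, one per row and one per column.
store_names = ["North", "South", "East", "West"]
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

store_totals = sales.sum(axis=1)
day_totals = sales.sum(axis=0)

best_store = store_names[np.argmax(store_totals)]
best_day = day_names[np.argmax(day_totals)]

print(best_store, store_totals.max())
print(best_day, day_totals.max())
```

Keeping labels in a separate list works for small examples. Pandas handles labeled data more directly once tables grow.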
36. Quick Quiz
1. What is the difference between np.arange() and np.linspace()?
np.arange() focuses on step size. np.linspace() focuses on the number of values.
2. What does shape tell you?
It tells you how many elements exist along each dimension.
3. What does axis=0 mean in a 2D array?
It means calculate down the rows, producing one result for each column.
4. Why does reshape() sometimes fail?
Because the new shape must contain the same total number of elements as the original array.
5. Why are vectorized operations useful?
They let you apply operations to whole arrays with cleaner code and usually better performance than Python loops.
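For quiz question 1, a quick side-by-side sketch makes the difference visible. The specific start, stop, and step values are chosen only for illustration.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)     # step size 0.25, stop value excluded
by_count = np.linspace(0, 1, 5)     # 5 values, stop value included

print(by_step)
print(by_count)
```

np.arange() stops before 1, while np.linspace() includes it by default, so the second array has one more value.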
Final Takeaway
NumPy becomes easier when you focus on four questions:
- What is the array shape?
- What is the array dtype?
- Which axis do I want?
- Am I selecting, reshaping, combining, or calculating?
If you can answer these questions, most beginner NumPy code becomes predictable.
Start small. Print shapes often. Practice slicing. Use vectorized operations. Over time, NumPy will feel less like a library of random functions and more like a clear way to think about data.
Sources and Further Reading
- NumPy documentation: https://numpy.org/doc/
- NumPy quickstart: https://numpy.org/doc/stable/user/quickstart.html
- NumPy ndarray reference: https://numpy.org/doc/stable/reference/arrays.ndarray.html
- NumPy random generator guide: https://numpy.org/doc/stable/reference/random/generator.html
