How can you sort a 1D NumPy array in descending order?

To sort a 1D NumPy array in descending order, use np.sort combined with slicing notation ::-1.

What does the axis parameter do when sorting 2D arrays in NumPy?

The axis parameter sets the direction of the sort: axis=1 sorts the values within each row, and axis=0 sorts the values within each column.

How can you retrieve unique elements from a 2D array using NumPy?

Call np.unique() on the array to get its sorted unique elements. Pass axis=0 to get unique rows or axis=1 to get unique columns.

What are some operations you can perform with NumPy's advanced functions?

You can sort, search, stack, clip, apply set logic, and compute statistical summaries such as percentiles and histograms, all without writing Python loops.

How does np.sort differ from Python's built-in sorted function?

np.sort is vectorized and can sort multi-dimensional arrays along a specific axis, unlike Python's built-in sorted function.

Advanced NumPy: Sorting, Searching & Essential Tricks

NumPy Tricks: Sorting, Filtering, Reshaping, Statistics, and Set Operations

Once you understand NumPy arrays, shapes, axes, indexing, and broadcasting, the next step is learning the small tools that make day-to-day array work faster.

These are not "magic tricks." They are practical patterns you will use when cleaning data, preparing model inputs, analyzing scores, ranking records, creating summary features, and transforming arrays before sending them to Pandas, scikit-learn, visualization tools, or machine learning models.

In this lesson, you will learn how to:

sort 1D and 2D arrays
add rows and columns safely
combine arrays with concatenate
find unique values and unique rows
add dimensions with expand_dims
filter and replace values using where
find best and worst positions with argmax and argmin
calculate running totals with cumsum
calculate percentiles and medians
build frequency tables with histogram
measure correlation with corrcoef
check membership with isin
reverse arrays with flip
update and delete values carefully
use NumPy set operations
cap extreme values with clip

The examples are written around small business, student, and analytics-style datasets so you can see how these functions appear in real work.

1. Setup

Import NumPy with the standard alias:

python

import numpy as np

For examples that use random data, use a generator with a seed:

python

rng = np.random.default_rng(42)

This keeps your output reproducible while learning.
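As a small illustration of why the seed matters, the sketch below creates two generators with the same seed and checks that they produce the same draws. The variable names and the 0 to 99 range passed to `integers` are illustrative assumptions, not part of the later examples.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Draw five integers in [0, 100) from each generator
draws_a = rng_a.integers(0, 100, size=5)
draws_b = rng_b.integers(0, 100, size=5)

# Same seed and same sequence of calls, so the draws match
print(np.array_equal(draws_a, draws_b))
```

A generator created without a seed would give different numbers on every run.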

2. Sorting 1D Arrays With `np.sort`

np.sort() returns a sorted copy of an array.

python

scores = np.array([72, 95, 61, 88, 75])

sorted_scores = np.sort(scores)

print(sorted_scores)

Output:

text

[61 72 75 88 95]

The original array is not changed:

python

print(scores)

Output:

text

[72 95 61 88 75]

To sort in descending order, reverse the sorted result:

python

descending_scores = np.sort(scores)[::-1]

print(descending_scores)

Output:

text

[95 88 75 72 61]
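Two other common ways to get a descending result are sketched below. They are alternatives to the slicing approach, and the negation trick assumes numeric data (it does not apply to strings).

```python
import numpy as np

scores = np.array([72, 95, 61, 88, 75])

# Option 1: reverse the ascending result with np.flip
desc_flip = np.flip(np.sort(scores))

# Option 2: sort the negated values, then negate back (numeric data only)
desc_neg = -np.sort(-scores)

print(desc_flip)
print(desc_neg)
```

Both lines print the scores from highest to lowest.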

3. Sorting 2D Arrays

For a 2D array, axis controls the direction of sorting.

python

sales = np.array([
    [45, 80, 62],
    [90, 55, 73],
    [38, 96, 68],
])

Sort values inside each row:

python

print(np.sort(sales, axis=1))

Output:

text

[[45 62 80]
 [55 73 90]
 [38 68 96]]

Sort values inside each column:

python

print(np.sort(sales, axis=0))

Output:

text

[[38 55 62]
 [45 80 68]
 [90 96 73]]

Use this when you want to sort values within rows or columns independently.
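To sort in descending order along an axis, one option is to sort ascending and then reverse along that same axis. The sketch below reuses the sales array; the names rows_desc and cols_desc are illustrative.

```python
import numpy as np

sales = np.array([
    [45, 80, 62],
    [90, 55, 73],
    [38, 96, 68],
])

# Sort each row ascending, then reverse the column order
rows_desc = np.sort(sales, axis=1)[:, ::-1]

# Sort each column ascending, then reverse the row order
cols_desc = np.sort(sales, axis=0)[::-1, :]

print(rows_desc)
print(cols_desc)
```

Keep in mind that sorting rows or columns independently breaks the link between values that originally shared a row, so it is not suitable for tables of records.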

4. Sorting Rows By One Column

Sometimes you do not want to sort values inside each row. You want to reorder the rows based on one column.

Suppose each row is:

text

[store_id, weekday_sales, weekend_sales]

python

store_sales = np.array([
    [101, 450, 620],
    [102, 390, 710],
    [103, 520, 560],
    [104, 480, 800],
])

Sort rows by weekend sales:

python

order = np.argsort(store_sales[:, 2])
sorted_by_weekend = store_sales[order]

print(sorted_by_weekend)

Output:

text

[[103 520 560]
 [101 450 620]
 [102 390 710]
 [104 480 800]]

For descending order:

python

best_weekend_first = store_sales[np.argsort(store_sales[:, 2])[::-1]]

print(best_weekend_first)

Output:

text

[[104 480 800]
 [102 390 710]
 [101 450 620]
 [103 520 560]]

This pattern is very useful for ranking tables.
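When the sort column contains ties, the order of the tied rows depends on how the descending order is built. The sketch below uses a hypothetical tied_sales array to compare reversing a stable sort with sorting negated values.

```python
import numpy as np

# Hypothetical data: stores 201 and 203 tie on the sales column
tied_sales = np.array([
    [201, 500],
    [202, 300],
    [203, 500],
    [204, 700],
])

# Stable ascending order, then reversed: tied rows come out in reverse input order
reversed_order = np.argsort(tied_sales[:, 1], kind="stable")[::-1]

# Stable sort on negated values: tied rows keep their input order
negated_order = np.argsort(-tied_sales[:, 1], kind="stable")

print(tied_sales[reversed_order][:, 0])
print(tied_sales[negated_order][:, 0])
```

If the order of tied records matters in a ranking table, the negated-values form is usually the safer choice for numeric columns.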

5. Sorting Rows By A Calculated Value

You can also sort rows by a value that is not stored in the array, such as a total computed on the fly.

Example: sort stores by total sales.

python

totals = store_sales[:, 1] + store_sales[:, 2]
order = np.argsort(totals)[::-1]

ranked_stores = store_sales[order]

print(ranked_stores)
print(totals[order])

Output:

text

[[104 480 800]
 [102 390 710]
 [103 520 560]
 [101 450 620]]
[1280 1100 1080 1070]

The important idea:

python

array[np.argsort(values)]

Use it when you want to rearrange records based on a score, total, date, error value, or prediction confidence.
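The same pattern can be wrapped in a small helper that keeps only the best few records. The function name top_k_rows and its parameters are hypothetical, introduced here only for illustration.

```python
import numpy as np

def top_k_rows(records, values, k):
    """Return the k rows of `records` with the largest `values` (hypothetical helper)."""
    order = np.argsort(values)[::-1]  # indices from largest to smallest value
    return records[order[:k]]

store_sales = np.array([
    [101, 450, 620],
    [102, 390, 710],
    [103, 520, 560],
    [104, 480, 800],
])

# Rank by weekday sales (column 1) and keep the best two stores
best_weekday = top_k_rows(store_sales, store_sales[:, 1], k=2)
print(best_weekday)
```

Because values is passed separately, it can be an existing column or any computed score with one entry per row.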

6. Adding A Column With `np.concatenate`

Assume you have marks for 4 students in 3 subjects:

python

marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
    [90, 88, 95],
    [55, 60, 64],
])

Now a new subject score arrives:

python

project_marks = np.array([89, 74, 97, 66])

This is a 1D array. To add it as a column, convert it to shape (4, 1):

python

project_column = project_marks.reshape(-1, 1)

updated_marks = np.concatenate((marks, project_column), axis=1)

print(updated_marks)

Output:

text

[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]]

Why reshape?

python

print(marks.shape)
print(project_marks.shape)
print(project_column.shape)

Output:

text

(4, 3)
(4,)
(4, 1)

For column-wise joining, both arrays must agree on the number of rows.
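To see what happens without the reshape, the sketch below tries to join the 2D marks array with the 1D project_marks array and catches the resulting error. It also shows np.column_stack, which is one alternative that accepts the 1D array directly.

```python
import numpy as np

marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
    [90, 88, 95],
    [55, 60, 64],
])
project_marks = np.array([89, 74, 97, 66])

# Joining a 2D array with a 1D array fails: the number of dimensions differs
try:
    np.concatenate((marks, project_marks), axis=1)
    failed = False
except ValueError as error:
    failed = True
    print("ValueError:", error)

# np.column_stack treats a 1D input as a column, so no reshape is needed
stacked = np.column_stack((marks, project_marks))
print(stacked.shape)
```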

7. Adding Rows With `np.concatenate`

Now add two new students:

python

new_students = np.array([
    [81, 77, 84, 90],
    [69, 73, 71, 75],
])

all_marks = np.concatenate((updated_marks, new_students), axis=0)

print(all_marks)

Output:

text

[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]
 [81 77 84 90]
 [69 73 71 75]]

For row-wise joining, both arrays must agree on the number of columns.

8. Adding A Derived Column

A common data-preparation task is adding totals, averages, or flags.

Add a total marks column:

python

total_marks = all_marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((all_marks, total_marks), axis=1)

print(marks_with_total)

Output:

text

[[ 78  85  91  89 343]
 [ 62  70  68  74 274]
 [ 90  88  95  97 370]
 [ 55  60  64  66 245]
 [ 81  77  84  90 332]
 [ 69  73  71  75 288]]

keepdims=True keeps the result as a 2D column, which makes concatenation easier.
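The difference is easiest to see by comparing shapes. The sketch below uses a two-row slice of the marks data; share_of_total is an illustrative extra showing that the column-shaped result also lines up for row-wise arithmetic.

```python
import numpy as np

all_marks = np.array([
    [78, 85, 91, 89],
    [62, 70, 68, 74],
])

flat_totals = all_marks.sum(axis=1)                   # shape (2,)
column_totals = all_marks.sum(axis=1, keepdims=True)  # shape (2, 1)

print(flat_totals.shape)
print(column_totals.shape)

# The (2, 1) result lines up with the rows, so it can divide each row directly
share_of_total = all_marks / column_totals
print(share_of_total.sum(axis=1))
```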

9. `np.append`: Useful, But Be Careful

np.append() can add values, but it often hides shape mistakes.

python

arr = np.array([[1, 2], [3, 4]])

print(np.append(arr, [[5, 6]], axis=0))

Output:

text

[[1 2]
 [3 4]
 [5 6]]

Without axis, np.append() flattens the data:

python

print(np.append(arr, [[5, 6]]))

Output:

text

[1 2 3 4 5 6]

For serious data work, prefer np.concatenate(), np.vstack(), or np.hstack() because they make shape expectations clearer.
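As a quick sketch of the stacking helpers, the example below adds a row with np.vstack and a column with np.hstack. The appended values are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# np.vstack stacks along rows (axis 0); a 1D input is treated as one row
with_row = np.vstack((arr, [5, 6]))

# np.hstack joins along columns (axis 1) for 2D inputs
with_col = np.hstack((arr, [[9], [9]]))

print(with_row)
print(with_col)
```

Neither function flattens its inputs, so a shape mismatch raises an error instead of silently producing a 1D result.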

10. Finding Unique Values

np.unique() returns sorted unique values.

python

categories = np.array(["basic", "pro", "basic", "enterprise", "pro"])

print(np.unique(categories))

Output:

text

['basic' 'enterprise' 'pro']

You can also count how often each value appears:

python

labels, counts = np.unique(categories, return_counts=True)

print(labels)
print(counts)

Output:

text

['basic' 'enterprise' 'pro']
[2 1 2]

This is useful for quick frequency tables.
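np.unique() can also report where each original item sits in the list of unique values, which gives a simple integer encoding of categories. The sketch below uses the return_inverse option; the name codes is illustrative.

```python
import numpy as np

categories = np.array(["basic", "pro", "basic", "enterprise", "pro"])

# labels: sorted unique values; codes: position of each item within labels
labels, codes = np.unique(categories, return_inverse=True)

print(labels)
print(codes)

# Indexing labels with codes rebuilds the original array
print(np.array_equal(labels[codes], categories))
```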

11. Unique Rows And Columns

For 2D arrays, use axis.

python

events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])

Unique rows:

python

print(np.unique(events, axis=0))

Output:

text

[[  1  10 100]
 [  2  20 200]
 [  3  30 300]]

Unique columns:

python

matrix = np.array([
    [1, 2, 1, 4],
    [5, 6, 5, 8],
])

print(np.unique(matrix, axis=1))

Output:

text

[[1 2 4]
 [5 6 8]]

Use this when duplicate records or duplicate feature columns need to be detected.
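To find out which records are duplicated rather than just removing them, return_counts can be combined with axis. A minimal sketch using the events array:

```python
import numpy as np

events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])

# Unique rows plus the number of times each one occurs
unique_rows, row_counts = np.unique(events, axis=0, return_counts=True)

# Keep only the rows that occur more than once
duplicated_rows = unique_rows[row_counts > 1]

print(row_counts)
print(duplicated_rows)
```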

12. Adding Dimensions With `np.expand_dims`

Machine learning libraries often expect data in a specific number of dimensions.

Suppose one user's activity data is 1D:

python

activity = np.array([8, 10, 7, 12])

print(activity.shape)

Output:

text

(4,)

Make it one row:

python

row = np.expand_dims(activity, axis=0)

print(row)
print(row.shape)

Output:

text

[[ 8 10  7 12]]
(1, 4)

Make it one column:

python

column = np.expand_dims(activity, axis=1)

print(column)
print(column.shape)

Output:

text

[[ 8]
 [10]
 [ 7]
 [12]]
(4, 1)

The same result can often be written with reshape():

python

print(activity.reshape(1, -1).shape)
print(activity.reshape(-1, 1).shape)
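A third spelling you may see in other code is indexing with np.newaxis. The sketch below checks that it matches np.expand_dims for the activity array.

```python
import numpy as np

activity = np.array([8, 10, 7, 12])

# np.newaxis inserts a length-1 axis at the position where it appears
as_row = activity[np.newaxis, :]     # shape (1, 4)
as_column = activity[:, np.newaxis]  # shape (4, 1)

print(as_row.shape, as_column.shape)

# Same result as np.expand_dims
print(np.array_equal(as_row, np.expand_dims(activity, axis=0)))
print(np.array_equal(as_column, np.expand_dims(activity, axis=1)))
```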

13. Filtering With `np.where`

np.where() can return positions or choose values conditionally.

Create an array:

python

temperatures = np.array([28, 35, 41, 32, 39, 45])

Find positions where temperature is above 38:

python

hot_positions = np.where(temperatures > 38)

print(hot_positions)

Output:

text

(array([2, 4, 5]),)

Use those positions to get values:

python

print(temperatures[hot_positions])

Output:

text

[41 39 45]
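If you only need the values and not their positions, a boolean mask does the same job in one step. The sketch below also shows what np.where returns for a 2D array; the small grid array is a made-up example.

```python
import numpy as np

temperatures = np.array([28, 35, 41, 32, 39, 45])

# A boolean mask selects the same values without computing positions first
hot_mask = temperatures > 38
print(temperatures[hot_mask])

# For a 2D array, np.where returns one index array per axis
grid = np.array([
    [28, 41],
    [39, 30],
])
rows, cols = np.where(grid > 38)
print(rows, cols)
```

Each (rows[i], cols[i]) pair is the position of one matching value.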

14. Replacing Values With `np.where`

The three-argument form is:

python

np.where(condition, value_if_true, value_if_false)

Example: cap temperatures above 40 at 40.

python

cleaned = np.where(temperatures > 40, 40, temperatures)

print(cleaned)

Output:

text

[28 35 40 32 39 40]

Example: create pass/fail labels:

python

exam_scores = np.array([82, 45, 67, 39, 90])

status = np.where(exam_scores >= 50, "pass", "retry")

print(status)

Output:

text

['pass' 'retry' 'pass' 'retry' 'pass']
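For more than two outcomes, one option is to nest np.where calls. The sketch below adds a third label; the cutoffs of 80 and 50 and the label "distinction" are illustrative assumptions.

```python
import numpy as np

exam_scores = np.array([82, 45, 67, 39, 90])

# Inner np.where handles the lower cutoff, outer np.where the higher one.
# The 80 and 50 cutoffs are illustrative assumptions.
grade = np.where(exam_scores >= 80, "distinction",
                 np.where(exam_scores >= 50, "pass", "retry"))

print(grade)
```

Nesting stays readable for two or three conditions; beyond that, a different approach is usually easier to maintain.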

15. Finding Best And Worst Positions

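np.argmax() returns the index of the largest value, and np.argmin() returns the index of the smallest. If that value occurs more than once, the first occurrence is returned.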

python

daily_orders = np.array([120, 98, 145, 160, 132])

best_day = np.argmax(daily_orders)
worst_day = np.argmin(daily_orders)

print(best_day)
print(worst_day)

Output:

text

3
1

Index 3 has the highest order count. Index 1 has the lowest.

For 2D arrays:

python

weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])

Best store per day:

python

print(np.argmax(weekly_orders, axis=0))

Output:

text

[2 2 2]

Best day per store:

python

print(np.argmax(weekly_orders, axis=1))

Output:

text

[2 1 2]
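Without an axis, np.argmax() works on the flattened array. To turn that flat position back into a row and column, np.unravel_index can be used, as in this sketch:

```python
import numpy as np

weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])

# Without axis, argmax returns a position in the flattened array
flat_position = np.argmax(weekly_orders)

# np.unravel_index converts that flat position back to (row, column)
row, col = np.unravel_index(flat_position, weekly_orders.shape)

print(flat_position)
print(row, col)
```

Here the single best cell in the whole table is store 2 on day 2.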

16. Cumulative Sum And Product

np.cumsum() calculates running totals.

python

revenue = np.array([1000, 1500, 1200, 1800])

print(np.cumsum(revenue))

Output:

text

[1000 2500 3700 5500]

For 2D arrays:

python

monthly_sales = np.array([
    [10, 12, 15],
    [8, 9, 11],
])

Cumulative sales across months for each product:

python

print(np.cumsum(monthly_sales, axis=1))

Output:

text

[[10 22 37]
 [ 8 17 28]]

np.cumprod() works similarly for running multiplication:

python

growth = np.array([1.05, 1.10, 0.95])

print(np.cumprod(growth))

Output:

text

[1.05    1.155   1.09725]
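A running total is often used to answer "when did we first reach a goal?". The sketch below combines np.cumsum with np.argmax; the target of 3000 is a hypothetical goal.

```python
import numpy as np

revenue = np.array([1000, 1500, 1200, 1800])
target = 3000  # hypothetical revenue goal

running_total = np.cumsum(revenue)

# argmax on a boolean array returns the first True position.
# This assumes the target is reached at least once; otherwise argmax returns 0.
first_hit = np.argmax(running_total >= target)

print(first_hit)
```

In real code it is worth checking that running_total[-1] >= target before trusting the result.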

17. Percentiles And Median

The p-th percentile is the value below which roughly p percent of the data falls, so it shows where a value sits within the distribution.

python

response_times = np.array([120, 180, 240, 300, 360, 420, 900])

Calculate the 50th, 75th, and 90th percentiles:

python

print(np.percentile(response_times, 50))
print(np.percentile(response_times, 75))
print(np.percentile(response_times, 90))

Output:

text

300.0
390.0
612.0

The 50th percentile is the median:

python

print(np.median(response_times))

Output:

text

300.0

Percentiles are useful when averages are misleading because of outliers.
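The effect of an outlier is easy to check on the response_times array, where one reading of 900 is much larger than the rest. The sketch also shows that several percentiles can be requested in a single call.

```python
import numpy as np

response_times = np.array([120, 180, 240, 300, 360, 420, 900])

# The single 900 reading pulls the mean up; the median is unaffected
mean_time = np.mean(response_times)
median_time = np.median(response_times)
print(mean_time)
print(median_time)

# Several percentiles can be requested in one call
p50, p75, p90 = np.percentile(response_times, [50, 75, 90])
print(p50, p75, p90)
```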

18. Percentiles Along Axis

Suppose rows are products and columns are monthly sales:

python

sales_table = np.array([
    [100, 120, 140, 160],
    [80, 85, 90, 300],
    [200, 210, 220, 230],
])

Median per product:

python

print(np.percentile(sales_table, 50, axis=1))

Output:

text

[130.   87.5 215. ]

Median per month:

python

print(np.percentile(sales_table, 50, axis=0))

Output:

text

[100. 120. 140. 230.]

Again, shape and axis decide the meaning.

19. Histograms With `np.histogram`

np.histogram() counts how many values fall into ranges.

python

ages = np.array([18, 21, 22, 25, 27, 33, 35, 41, 45, 52, 60])

counts, bin_edges = np.histogram(ages, bins=[18, 30, 45, 65])

print(counts)
print(bin_edges)

Output:

text

[5 3 3]
[18 30 45 65]

This means:

5 values from 18 up to, but not including, 30
3 values from 30 up to, but not including, 45
3 values from 45 up to and including 65

Use histograms when you want a distribution summary without plotting yet.
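The counts and edges can be paired up to print a small frequency table. The sketch below also checks how edge values are assigned: each bin includes its left edge and excludes its right edge, except the last bin, which includes both. The two-value test array is made up for this check.

```python
import numpy as np

ages = np.array([18, 21, 22, 25, 27, 33, 35, 41, 45, 52, 60])
counts, bin_edges = np.histogram(ages, bins=[18, 30, 45, 65])

# Pair each count with its range to build a small frequency table
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left}-{right}: {count}")

# 30 falls in the second bin; 65 equals the final edge and is still counted
edge_counts, _ = np.histogram(np.array([30, 65]), bins=[18, 30, 45, 65])
print(edge_counts)
```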

20. Correlation With `np.corrcoef`

Correlation measures how two variables move together.

python

ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

correlation = np.corrcoef(ad_spend, sales)

print(correlation)

Output:

text

[[1.         0.99877775]
 [0.99877775 1.        ]]

The off-diagonal value is the correlation between ad spend and sales.

python

print(round(correlation[0, 1], 4))

Output:

text

0.9988

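A value close to 1 means a strong positive linear relationship, a value close to -1 means a strong negative one, and a value near 0 means little linear relationship.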

Important reminder: correlation does not prove causation.
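To make the number less of a black box, the sketch below computes the Pearson correlation by hand from deviations around the mean and compares it with np.corrcoef. The names dx, dy, and manual_r are illustrative.

```python
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

# Deviations of each variable from its own mean
dx = ad_spend - ad_spend.mean()
dy = sales - sales.mean()

# Pearson r: co-variation divided by the product of the two spreads
manual_r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(manual_r)
print(np.isclose(manual_r, np.corrcoef(ad_spend, sales)[0, 1]))
```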

21. Membership Checks With `np.isin`

np.isin() checks whether values are present in another collection.

python

user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

mask = np.isin(user_ids, premium_ids)

print(mask)
print(user_ids[mask])

Output:

text

[False  True False False  True False]
[102 105]

This is useful for filtering by allowed IDs, selected categories, blocked values, or known labels.
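For the opposite filter, np.isin accepts invert=True, which is equivalent to negating the mask with ~. A short sketch with the same IDs:

```python
import numpy as np

user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

# invert=True flips the result: True where the value is NOT in premium_ids
not_premium = np.isin(user_ids, premium_ids, invert=True)

print(user_ids[not_premium])
```

Note that 108 appears in premium_ids but not in user_ids, so it has no effect on the result.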

22. Reversing Arrays With `np.flip`

For 1D arrays:

python

steps = np.array([1, 2, 3, 4, 5])

print(np.flip(steps))

Output:

text

[5 4 3 2 1]

For 2D arrays:

python

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

Reverse the row order (vertical flip):

python

print(np.flip(grid, axis=0))

Output:

text

[[7 8 9]
 [4 5 6]
 [1 2 3]]

Reverse the column order (horizontal flip):

python

print(np.flip(grid, axis=1))

Output:

text

[[3 2 1]
 [6 5 4]
 [9 8 7]]

Flip both axes:

python

print(np.flip(grid))

Output:

text

[[9 8 7]
 [6 5 4]
 [3 2 1]]
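np.flip is closely related to slicing with a step of -1. The sketch below checks the equivalence and also shows one thing to watch for: np.flip returns a view of the original data, so take a copy before modifying the flipped result.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Slicing with a step of -1 gives the same results as np.flip
print(np.array_equal(np.flip(grid, axis=0), grid[::-1, :]))
print(np.array_equal(np.flip(grid, axis=1), grid[:, ::-1]))
print(np.array_equal(np.flip(grid), grid[::-1, ::-1]))

# np.flip returns a view, so copy before changing values
flipped = np.flip(grid, axis=0).copy()
flipped[0, 0] = -1
print(grid[2, 0])  # the original value is untouched
```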

23. Updating Values With `np.put`

np.put() updates positions in the flattened version of the array.

python

board = np.arange(1, 10).reshape(3, 3)

print(board)

Output:

text

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Update flattened positions 0 and 8:

python

np.put(board, [0, 8], [100, 900])

print(board)

Output:

text

[[100   2   3]
 [  4   5   6]
 [  7   8 900]]

Because np.put() mutates the original array, use it carefully.

In many cases, direct indexing is clearer:

python

board[0, 0] = 100
board[2, 2] = 900
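If the original array must stay intact, one simple approach is to update a copy. A minimal sketch, with updated as an illustrative name:

```python
import numpy as np

board = np.arange(1, 10).reshape(3, 3)

# Work on a copy so the original board stays untouched
updated = board.copy()
np.put(updated, [0, 8], [100, 900])

print(board[0, 0], board[2, 2])
print(updated[0, 0], updated[2, 2])
```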

24. Deleting Values With `np.delete`

np.delete() returns a new array with selected positions removed.

python

numbers = np.array([10, 20, 30, 40, 50])

without_first = np.delete(numbers, 0)

print(without_first)

Output:

text

[20 30 40 50]

Delete multiple positions:

python

print(np.delete(numbers, [1, 3]))

Output:

text

[10 30 50]

For 2D arrays:

python

table = np.arange(1, 13).reshape(3, 4)

print(table)

Output:

text

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Delete a row:

python

print(np.delete(table, 1, axis=0))

Output:

text

[[ 1  2  3  4]
 [ 9 10 11 12]]

Delete a column:

python

print(np.delete(table, 2, axis=1))

Output:

text

[[ 1  2  4]
 [ 5  6  8]
 [ 9 10 12]]
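np.delete removes items by position. When you want to remove items by value or condition instead, a boolean mask is often clearer. The sketch below shows both on the numbers array; removing 20 and 40 is an arbitrary choice for the comparison.

```python
import numpy as np

numbers = np.array([10, 20, 30, 40, 50])

# np.delete removes by position
by_position = np.delete(numbers, [1, 3])

# A boolean mask removes by condition: keep everything except 20 and 40
by_value = numbers[~np.isin(numbers, [20, 40])]

print(by_position)
print(by_value)
print(numbers)  # the original array is unchanged
```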

25. Set Operations

NumPy has useful set-style functions for 1D arrays.

python

course_a = np.array([101, 102, 103, 104])
course_b = np.array([103, 104, 105, 106])

Union:

python

print(np.union1d(course_a, course_b))

Output:

text

[101 102 103 104 105 106]

Intersection:

python

print(np.intersect1d(course_a, course_b))

Output:

text

[103 104]

Values in course_a but not in course_b:

python

print(np.setdiff1d(course_a, course_b))

Output:

text

[101 102]

Values that appear in one array but not both:

python

print(np.setxor1d(course_a, course_b))

Output:

text

[101 102 105 106]

These functions are helpful when comparing IDs, labels, selected items, feature lists, or category groups.
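The inputs do not need to be sorted or free of duplicates. The sketch below uses two hypothetical enrolment lists with repeats to show that the results come back sorted and de-duplicated.

```python
import numpy as np

# Hypothetical enrolment lists with repeats and no particular order
morning = np.array([104, 101, 104, 102])
evening = np.array([102, 106, 102, 105])

union = np.union1d(morning, evening)
common = np.intersect1d(morning, evening)
morning_only = np.setdiff1d(morning, evening)

print(union)
print(common)
print(morning_only)
```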

26. Clipping Values With `np.clip`

np.clip() limits values to a minimum and maximum range.

python

ratings = np.array([2, 5, 8, 11, -3, 7])

safe_ratings = np.clip(ratings, a_min=0, a_max=10)

print(safe_ratings)

Output:

text

[ 2  5  8 10  0  7]

This is useful for:

limiting outliers
keeping probabilities between 0 and 1
capping image pixel values
protecting dashboards from extreme values
preparing model features

Example with percentages:

python

predicted_discount = np.array([-5, 10, 25, 60, 120])

final_discount = np.clip(predicted_discount, 0, 50)

print(final_discount)

Output:

text

[ 0 10 25 50 50]
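When there are no fixed business limits, one common approach is to take the clipping bounds from the data itself. The sketch below uses the 10th and 90th percentiles as limits; both the data and the choice of percentiles are illustrative assumptions.

```python
import numpy as np

# Hypothetical measurements with one very low and one very high value
values = np.array([3, 48, 50, 52, 49, 51, 47, 53, 50, 400])

# Use the 10th and 90th percentiles as data-driven limits
low, high = np.percentile(values, [10, 90])
capped = np.clip(values, low, high)

print(low, high)
print(capped.min(), capped.max())
```

Because the percentile bounds are floats, the clipped result is a float array even though the input holds integers.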

27. Mini Project: Rank Students After Adding New Marks

You have marks for 5 students across 4 subjects:

python

marks = np.array([
    [72, 81, 77, 69],
    [88, 90, 84, 91],
    [55, 61, 58, 64],
    [79, 74, 82, 80],
    [93, 89, 95, 90],
])

A new practical exam score arrives:

python

practical = np.array([85, 92, 67, 78, 96])

Add the practical score as a new column:

python

marks = np.concatenate((marks, practical.reshape(-1, 1)), axis=1)

Add total marks as another column:

python

total = marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((marks, total), axis=1)

Sort students by total marks in descending order:

python

ranked = marks_with_total[np.argsort(marks_with_total[:, -1])[::-1]]

print(ranked)

Get the top 2:

python

print(ranked[:2])

This combines:

reshaping
concatenation
row-wise sum
sorting rows by a derived column
slicing top results
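Because this array has no student ID column, the ranked table on its own does not say which student is which. One way to keep that information is to hold on to the order array, as in this end-to-end sketch of the steps above (the name order is illustrative).

```python
import numpy as np

marks = np.array([
    [72, 81, 77, 69],
    [88, 90, 84, 91],
    [55, 61, 58, 64],
    [79, 74, 82, 80],
    [93, 89, 95, 90],
])
practical = np.array([85, 92, 67, 78, 96])

# Add the practical score, then a total column
marks = np.concatenate((marks, practical.reshape(-1, 1)), axis=1)
total = marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((marks, total), axis=1)

# Row order from highest to lowest total
order = np.argsort(marks_with_total[:, -1])[::-1]
ranked = marks_with_total[order]

print(order)          # original row index of each ranked student
print(ranked[:, -1])  # totals in descending order
```

Here order[0] is the original row index of the top student, so it can be used to look up a name or ID stored elsewhere.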

28. Mini Project: Clean Sensor Readings

You receive sensor readings where values below 0 and above 100 are invalid.

python

readings = np.array([12, 45, -8, 60, 105, 88, 101, 0, 74])

Clip invalid values:

python

cleaned = np.clip(readings, 0, 100)

print(cleaned)

Output:

text

[ 12  45   0  60 100  88 100   0  74]

Find readings that were changed:

python

changed_positions = np.where(readings != cleaned)[0]

print(changed_positions)

Output:

text

[2 4 6]

Create labels:

python

labels = np.where(readings != cleaned, "corrected", "ok")

print(labels)

Output:

text

['ok' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'ok']

This is a realistic pattern for data cleaning.
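Clipping keeps every reading but replaces invalid ones with the nearest limit. If invalid readings should instead be left out of summary statistics, one alternative is to mark them as NaN and use np.nanmean. This is a sketch of that alternative, not a replacement for the approach above.

```python
import numpy as np

readings = np.array([12, 45, -8, 60, 105, 88, 101, 0, 74])

# Flag readings outside the valid 0 to 100 range
invalid = (readings < 0) | (readings > 100)

# Replace invalid readings with NaN (needs a float array) and average the rest
as_float = np.where(invalid, np.nan, readings.astype(float))
valid_mean = np.nanmean(as_float)

print(invalid.sum())
print(valid_mean)
```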

29. Practice Exercises

Try these before checking the solutions.

Practice Lab

Exercise 1: Sort by total

Create a 4 by 3 array of product sales. Add a total column and sort rows by total sales in descending order.

Exercise 2: Add a status column

Given exam scores for students, add a column that contains 1 if the student's average is at least 60, otherwise 0.

Exercise 3: Unique customer visits

Given an array of customer IDs, print unique customers and how many times each customer appears.

Exercise 4: Top product per day

Given a 2D array where rows are products and columns are days, find the product index with maximum sales for every day.

Exercise 5: Clip and count outliers

Given an array of values, clip them to the range 10 to 90. Count how many values were changed.

Exercise 6: Membership filter

Given all user IDs and a list of blocked IDs, return only users who are not blocked.

Exercise 7: Flip an image-like matrix

Create a 4 by 4 array and flip it vertically, horizontally, and both ways.

Exercise 8: Histogram buckets

Create an array of ages and count how many people fall into age groups [0, 18, 30, 45, 60, 100].

Exercise 9: Remove min and max

Create a 1D array and remove every occurrence of its minimum and maximum values.

Exercise 10: Compare two batches

Given two arrays of product IDs, find products only in batch A, only in batch B, and products present in both.

30. Practice Solutions

Solution Key

Solution 1: Sort by total

python

sales = np.array([
    [40, 55, 60],
    [90, 70, 85],
    [30, 45, 35],
    [75, 80, 72],
])

totals = sales.sum(axis=1, keepdims=True)
with_total = np.concatenate((sales, totals), axis=1)
ranked = with_total[np.argsort(with_total[:, -1])[::-1]]

print(ranked)

Explanation

A 4 by 3 NumPy array named sales is created, with one row per product.
The sum method computes the total for each row. keepdims=True keeps the result two-dimensional, with shape (4, 1), so it can be joined as a column.
The total sales are concatenated to the original sales array, creating a new array with_total that includes the totals as an additional column.
The rows of with_total are sorted in descending order based on the total sales using np.argsort and slicing.
Finally, the ranked array is printed, showing the sales data ordered by total sales.

Solution 2: Add a status column

python

scores = np.array([
    [70, 65, 80],
    [45, 50, 55],
    [90, 88, 92],
])

average = scores.mean(axis=1, keepdims=True)
status = np.where(average >= 60, 1, 0)

result = np.concatenate((scores, status), axis=1)

print(result)

Explanation

Initializes a NumPy array scores containing test scores for three students across three subjects.
Computes the average score for each student (axis=1). keepdims=True keeps the result two-dimensional, with shape (3, 1).
Uses np.where to create a binary status array, marking students as '1' (pass) if their average score is 60 or above, and '0' (fail) otherwise.
Concatenates the original scores with the status array to form a new array that includes both scores and pass/fail status.
Outputs the final combined array, showing each student's scores alongside their pass/fail status.

Solution 3: Unique customer visits

python

customers = np.array([101, 102, 101, 103, 102, 101, 104])

ids, counts = np.unique(customers, return_counts=True)

print(ids)
print(counts)

Explanation

The code initializes a NumPy array called customers containing customer IDs, some of which are repeated.
The np.unique() function is used to find unique customer IDs and count their occurrences, returning two arrays: ids for unique IDs and counts for their respective counts.
The unique IDs are printed to the console, showing which customers are present.
The counts of each unique ID are also printed, indicating how many times each customer ID appears in the original array.

Solution 4: Top product per day

python

sales = np.array([
    [20, 35, 30],
    [25, 30, 45],
    [40, 20, 25],
])

top_product_by_day = np.argmax(sales, axis=0)

print(top_product_by_day)

Explanation

The code initializes a 2D NumPy array named sales, representing sales figures for three products over three days.
The np.argmax function is used to find the index of the highest sales value for each day, specified by axis=0, which indicates that the operation is performed column-wise.
The result, stored in top_product_by_day, contains the indices of the top-selling products for each day.
Finally, the indices of the top products are printed to the console.

Solution 5: Clip and count outliers

python

values = np.array([5, 18, 44, 92, 100, 63, 7])

clipped = np.clip(values, 10, 90)
changed_count = np.sum(values != clipped)

print(clipped)
print(changed_count)

Explanation

The code initializes a NumPy array named values with a set of integers.
It uses the np.clip() function to limit the values in the array to a specified range, in this case between 10 and 90.
The result of the clipping is stored in the clipped variable.
The code calculates the number of elements that were changed during the clipping process by comparing the original and clipped arrays, using np.sum() to count the differences.
Finally, it prints the clipped array and the count of changed values to the console.

Solution 6: Membership filter

python

users = np.array([10, 11, 12, 13, 14, 15])
blocked = np.array([11, 15])

allowed_users = users[~np.isin(users, blocked)]

print(allowed_users)

Explanation

The code initializes two NumPy arrays: users containing a range of user IDs and blocked containing IDs that are not allowed.
It uses np.isin() to create a boolean array that identifies which users are in the blocked list.
The tilde operator ~ negates this boolean array, effectively marking users that are not blocked.
The filtered array allowed_users is created by indexing the users array with the negated boolean array.
Finally, it prints the allowed_users array, which contains only the IDs of users that are not blocked.

Solution 7: Flip an image-like matrix

python

image = np.arange(1, 17).reshape(4, 4)

print(np.flip(image, axis=0))
print(np.flip(image, axis=1))
print(np.flip(image))

Explanation

The code creates a 4x4 NumPy array filled with integers from 1 to 16 using np.arange and reshape.
np.flip(image, axis=0) flips the array vertically (upside down).
np.flip(image, axis=1) flips the array horizontally (left to right).
np.flip(image) flips the array both vertically and horizontally, resulting in a 180-degree rotation.
The print statements display the results of each flip operation.

Solution 8: Histogram buckets

python

ages = np.array([12, 17, 18, 24, 29, 30, 37, 44, 45, 61, 72])

counts, edges = np.histogram(ages, bins=[0, 18, 30, 45, 60, 100])

print(counts)
print(edges)

Explanation

The code initializes a NumPy array ages containing various age values.
It uses np.histogram to compute the frequency of ages within specified bins: [0, 18), [18, 30), [30, 45), [45, 60), and [60, 100]. Every bin excludes its right edge except the last, which includes it.
The function returns two arrays: counts, which holds the number of ages in each bin, and edges, which defines the boundaries of the bins.
Finally, it prints the counts of ages in each bin and the edges of the bins to the console.

Solution 9: Remove min and max

python

arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])

minimum = arr.min()
maximum = arr.max()

filtered = arr[(arr != minimum) & (arr != maximum)]

print(filtered)

Explanation

The code initializes a NumPy array arr with a set of integer values.
It calculates the minimum and maximum values in the array using the min() and max() methods.
A filtered array filtered is created by excluding the minimum and maximum values using boolean indexing.
Finally, the filtered array is printed, displaying only the values that are neither the minimum nor the maximum.

Solution 10: Compare two batches

python

batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])

only_a = np.setdiff1d(batch_a, batch_b)
only_b = np.setdiff1d(batch_b, batch_a)
both = np.intersect1d(batch_a, batch_b)

print("Only A:", only_a)
print("Only B:", only_b)
print("Both:", both)

Explanation

batch_a and batch_b are defined as NumPy arrays containing integer values.
np.setdiff1d is used to find elements that are in batch_a but not in batch_b, stored in only_a.
Similarly, only_b contains elements that are in batch_b but not in batch_a.
np.intersect1d identifies elements that are present in both arrays, stored in both.
The results are printed, showing unique elements from each array and their intersection.

31. Common Mistakes

Mistake 1: Forgetting that `np.append()` flattens by default

Always pass axis if you want to preserve 2D structure.

Mistake 2: Sorting values when you meant to sort rows

Use np.sort() to sort values inside arrays. Use np.argsort() to reorder rows based on a column or score.

Mistake 3: Concatenating arrays with mismatched dimensions

Print shapes before combining arrays:

python

print(a.shape)
print(b.shape)

Explanation

Here a and b stand for whichever two arrays you are about to combine. The .shape attribute returns a tuple with the size of each dimension.
Comparing the two shapes shows whether the arrays are compatible: for joining along axis=0 the number of columns must match, and for joining along axis=1 the number of rows must match.

Mistake 4: Confusing axis meanings

For 2D arrays:

axis=0 works down rows and returns one result per column
axis=1 works across columns and returns one result per row
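A two-row example makes the rule concrete. The table array below is made up for this check.

```python
import numpy as np

table = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

# axis=0 collapses the rows: one result per column
column_sums = table.sum(axis=0)
print(column_sums)

# axis=1 collapses the columns: one result per row
row_sums = table.sum(axis=1)
print(row_sums)
```

The axis you pass is the one that disappears from the result's shape: (2, 3) becomes (3,) for axis=0 and (2,) for axis=1.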

Mistake 5: Mutating arrays accidentally

Functions like np.put() modify the original array. Functions like np.delete() and np.sort() return new arrays and leave the input unchanged. The array method arr.sort() is different: it sorts in place.
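Sorting is a good place to see the difference, because NumPy offers both styles. The sketch below contrasts the np.sort function with the sort method of an array.

```python
import numpy as np

scores = np.array([72, 95, 61, 88, 75])

# np.sort returns a new array and leaves scores alone
sorted_copy = np.sort(scores)
untouched = scores.copy()
print(scores)

# The ndarray.sort method sorts in place and returns None
result = scores.sort()
print(result)
print(scores)
```

When in doubt, check whether a call returns None: that is usually a sign the array was changed in place.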

Final Takeaway

NumPy becomes powerful when you stop thinking only in terms of individual values and start thinking in terms of whole-array transformations.

The most useful habits are:

Print shape before combining arrays.
Use axis intentionally.
Prefer concatenate, vstack, or hstack when structure matters.
Use boolean masks and np.where() instead of manual loops.
Use argsort, argmax, and argmin when you need positions.
Use clip, percentile, and histogram for quick data cleaning and analysis.

These tricks are small individually, but together they make NumPy feel like a practical data toolkit instead of just an array library.

Sources and Further Reading

NumPy documentation: https://numpy.org/doc/
NumPy sorting reference: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
NumPy concatenate reference: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
NumPy unique reference: https://numpy.org/doc/stable/reference/generated/numpy.unique.html
NumPy where reference: https://numpy.org/doc/stable/reference/generated/numpy.where.html
NumPy histogram reference: https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
NumPy set routines: https://numpy.org/doc/stable/reference/routines.set.html
