
Advanced NumPy: Sorting, Searching & Essential Tricks

May 27, 2026

Quick Answer

Learn practical NumPy tricks for real data work: sorting arrays, adding rows and columns, finding unique values, filtering with conditions, ranking values, cumulative calculations, percentiles, histograms, correlation, set operations, clipping, and practice tasks.

Quick Summary

Learn advanced NumPy techniques for sorting, searching, and array manipulation. Enhance your data processing skills with essential tricks!

NumPy Tricks: Sorting, Filtering, Reshaping, Statistics, and Set Operations

Once you understand NumPy arrays, shapes, axes, indexing, and broadcasting, the next step is learning the small tools that make day-to-day array work faster.

These are not "magic tricks." They are practical patterns you will use when cleaning data, preparing model inputs, analyzing scores, ranking records, creating summary features, and transforming arrays before sending them to Pandas, scikit-learn, visualization tools, or machine learning models.

In this lesson, you will learn how to:

  • sort 1D and 2D arrays
  • add rows and columns safely
  • combine arrays with concatenate
  • find unique values and unique rows
  • add dimensions with expand_dims
  • filter and replace values using where
  • find best and worst positions with argmax and argmin
  • calculate running totals with cumsum
  • calculate percentiles and medians
  • build frequency tables with histogram
  • measure correlation with corrcoef
  • check membership with isin
  • reverse arrays with flip
  • update and delete values carefully
  • use NumPy set operations
  • cap extreme values with clip

The examples are written around small business, student, and analytics-style datasets so you can see how these functions appear in real work.

1. Setup

Import NumPy with the standard alias:

python
import numpy as np

For examples that use random data, use a generator with a seed:

python
rng = np.random.default_rng(42)

This keeps your output reproducible while learning.

2. Sorting 1D Arrays With np.sort

np.sort() returns a sorted copy of an array.

python
scores = np.array([72, 95, 61, 88, 75])

sorted_scores = np.sort(scores)

print(sorted_scores)

Output:

text
[61 72 75 88 95]

The original array is not changed:

python
print(scores)

Output:

text
[72 95 61 88 75]

To sort in descending order, reverse the sorted result:

python
descending_scores = np.sort(scores)[::-1]

print(descending_scores)

Output:

text
[95 88 75 72 61]

3. Sorting 2D Arrays

For a 2D array, axis controls the direction of sorting.

python
sales = np.array([
    [45, 80, 62],
    [90, 55, 73],
    [38, 96, 68],
])

Sort values inside each row:

python
print(np.sort(sales, axis=1))

Output:

text
[[45 62 80]
 [55 73 90]
 [38 68 96]]

Sort values inside each column:

python
print(np.sort(sales, axis=0))

Output:

text
[[38 55 62]
 [45 80 68]
 [90 96 73]]

Use this when you want to sort values within rows or columns independently.

4. Sorting Rows By One Column

Sometimes you do not want to sort values inside each row. You want to reorder the rows based on one column.

Suppose each row is:

text
[store_id, weekday_sales, weekend_sales]

python
store_sales = np.array([
    [101, 450, 620],
    [102, 390, 710],
    [103, 520, 560],
    [104, 480, 800],
])

Sort rows by weekend sales:

python
order = np.argsort(store_sales[:, 2])
sorted_by_weekend = store_sales[order]

print(sorted_by_weekend)

Output:

text
[[103 520 560]
 [101 450 620]
 [102 390 710]
 [104 480 800]]

For descending order:

python
best_weekend_first = store_sales[np.argsort(store_sales[:, 2])[::-1]]

print(best_weekend_first)

Output:

text
[[104 480 800]
 [102 390 710]
 [101 450 620]
 [103 520 560]]

This pattern is very useful for ranking tables.

5. Sorting Rows By A Calculated Value

You can sort rows by a value that does not exist yet.

Example: sort stores by total sales.

python
totals = store_sales[:, 1] + store_sales[:, 2]
order = np.argsort(totals)[::-1]

ranked_stores = store_sales[order]

print(ranked_stores)
print(totals[order])

Output:

text
[[104 480 800]
 [102 390 710]
 [103 520 560]
 [101 450 620]]
[1280 1100 1080 1070]

The important idea:

python
array[np.argsort(values)]

Use it when you want to rearrange records based on a score, total, date, error value, or prediction confidence.
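
As a small illustration of the same pattern, the sketch below reorders two parallel arrays with a single argsort result. The arrays `models` and `errors` are made up for this example and are not part of the lesson's datasets.

```python
import numpy as np

# Hypothetical parallel arrays: one name and one error value per model.
models = np.array(["tree", "linear", "forest", "knn"])
errors = np.array([0.21, 0.35, 0.12, 0.28])

# argsort returns the positions that would sort `errors` in ascending order.
order = np.argsort(errors)

# Applying the same order to both arrays keeps each name with its value.
print(models[order])  # ['forest' 'tree' 'knn' 'linear']
print(errors[order])  # [0.12 0.21 0.28 0.35]
```

Because `order` is just an array of positions, it can be reused on any array that has the same length along the sorted axis.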

6. Adding A Column With np.concatenate

Assume you have marks for 4 students in 3 subjects:

python
marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
    [90, 88, 95],
    [55, 60, 64],
])

Now a new subject score arrives:

python
project_marks = np.array([89, 74, 97, 66])

This is a 1D array. To add it as a column, convert it to shape (4, 1):

python
project_column = project_marks.reshape(-1, 1)

updated_marks = np.concatenate((marks, project_column), axis=1)

print(updated_marks)

Output:

text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]]

Why reshape?

python
print(marks.shape)
print(project_marks.shape)
print(project_column.shape)

Output:

text
(4, 3)
(4,)
(4, 1)

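For column-wise joining, both arrays must have the same number of dimensions and the same number of rows. That is why the 1D project_marks array, with shape (4,), has to become a (4, 1) column first.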

7. Adding Rows With np.concatenate

Now add two new students:

python
new_students = np.array([
    [81, 77, 84, 90],
    [69, 73, 71, 75],
])

all_marks = np.concatenate((updated_marks, new_students), axis=0)

print(all_marks)

Output:

text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]
 [81 77 84 90]
 [69 73 71 75]]

For row-wise joining, both arrays must agree on the number of columns.

8. Adding A Derived Column

A common data-preparation task is adding totals, averages, or flags.

Add a total marks column:

python
total_marks = all_marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((all_marks, total_marks), axis=1)

print(marks_with_total)

Output:

text
[[ 78  85  91  89 343]
 [ 62  70  68  74 274]
 [ 90  88  95  97 370]
 [ 55  60  64  66 245]
 [ 81  77  84  90 332]
 [ 69  73  71  75 288]]

keepdims=True keeps the result as a 2D column, which makes concatenation easier.

9. np.append: Useful, But Be Careful

np.append() can add values, but it often hides shape mistakes.

python
arr = np.array([[1, 2], [3, 4]])

print(np.append(arr, [[5, 6]], axis=0))

Output:

text
[[1 2]
 [3 4]
 [5 6]]

Without axis, np.append() flattens the data:

python
print(np.append(arr, [[5, 6]]))

Output:

text
[1 2 3 4 5 6]

For serious data work, prefer np.concatenate(), np.vstack(), or np.hstack() because they make shape expectations clearer.
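
A minimal sketch of what "clearer shape expectations" can look like in practice, assuming a small 2 by 2 array. The names `new_row` and `new_col` are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
new_row = np.array([5, 6])      # shape (2,)
new_col = np.array([10, 20])    # shape (2,)

# vstack treats a 1D array as one row, so (2, 2) and (2,) stack to (3, 2).
with_row = np.vstack((arr, new_row))

# hstack joins along columns; the 1D array is reshaped to a (2, 1) column first.
with_col = np.hstack((arr, new_col.reshape(-1, 1)))

print(with_row.shape)  # (3, 2)
print(with_col.shape)  # (2, 3)

# A row with the wrong length raises an error instead of being flattened silently.
try:
    np.vstack((arr, np.array([7, 8, 9])))
except ValueError as exc:
    print("ValueError:", exc)
```

The failure in the last step is the useful part: a shape mismatch stops the program early rather than producing a flat array that looks plausible.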

10. Finding Unique Values

np.unique() returns sorted unique values.

python
categories = np.array(["basic", "pro", "basic", "enterprise", "pro"])

print(np.unique(categories))

Output:

text
['basic' 'enterprise' 'pro']

You can also count how often each value appears:

python
labels, counts = np.unique(categories, return_counts=True)

print(labels)
print(counts)

Output:

text
['basic' 'enterprise' 'pro']
[2 1 2]

This is useful for quick frequency tables.

11. Unique Rows And Columns

For 2D arrays, use axis.

python
events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])

Unique rows:

python
print(np.unique(events, axis=0))

Output:

text
[[  1  10 100]
 [  2  20 200]
 [  3  30 300]]

Unique columns:

python
matrix = np.array([
    [1, 2, 1, 4],
    [5, 6, 5, 8],
])

print(np.unique(matrix, axis=1))

Output:

text
[[1 2 4]
 [5 6 8]]

Use this when duplicate records or duplicate feature columns need to be detected.
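
To go one step further and report which records are duplicated, one option is to combine axis with return_counts. This is a sketch that reuses the events array from this section; the variable `duplicated_rows` is an illustrative name.

```python
import numpy as np

events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])

# return_counts=True also reports how many times each unique row occurs.
unique_rows, counts = np.unique(events, axis=0, return_counts=True)

# Rows that occur more than once are the duplicated records.
duplicated_rows = unique_rows[counts > 1]

print(counts)           # [2 1 1]
print(duplicated_rows)  # [[  1  10 100]]
```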

12. Adding Dimensions With np.expand_dims

Machine learning libraries often expect data in a specific number of dimensions.

Suppose one user's activity data is 1D:

python
activity = np.array([8, 10, 7, 12])

print(activity.shape)

Output:

text
(4,)

Make it one row:

python
row = np.expand_dims(activity, axis=0)

print(row)
print(row.shape)

Output:

text
[[ 8 10  7 12]]
(1, 4)

Make it one column:

python
column = np.expand_dims(activity, axis=1)

print(column)
print(column.shape)

Output:

text
[[ 8]
 [10]
 [ 7]
 [12]]
(4, 1)

The same result can often be written with reshape():

python
print(activity.reshape(1, -1).shape)
print(activity.reshape(-1, 1).shape)

13. Filtering With np.where

np.where() can return positions or choose values conditionally.

Create an array:

python
temperatures = np.array([28, 35, 41, 32, 39, 45])

Find positions where temperature is above 38:

python
hot_positions = np.where(temperatures > 38)

print(hot_positions)

Output:

text
(array([2, 4, 5]),)

Use those positions to get values:

python
print(temperatures[hot_positions])

Output:

text
[41 39 45]
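
When only the matching values are needed and not their positions, the boolean mask can be used directly. The sketch below shows that shortcut on the same temperatures array; `mask`, `hot_values`, and `hot_count` are illustrative names.

```python
import numpy as np

temperatures = np.array([28, 35, 41, 32, 39, 45])

# The comparison produces a boolean array with one True/False per element.
mask = temperatures > 38

# Indexing with the mask keeps only the elements where it is True.
hot_values = temperatures[mask]

# Summing the mask counts the True entries.
hot_count = int(mask.sum())

print(mask)        # [False False  True False  True  True]
print(hot_values)  # [41 39 45]
print(hot_count)   # 3
```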

14. Replacing Values With np.where

The three-argument form is:

python
np.where(condition, value_if_true, value_if_false)

Example: cap every temperature above 40 at 40 and leave the rest unchanged.

python
cleaned = np.where(temperatures > 40, 40, temperatures)

print(cleaned)

Output:

text
[28 35 40 32 39 40]

Example: create pass/fail labels:

python
exam_scores = np.array([82, 45, 67, 39, 90])

status = np.where(exam_scores >= 50, "pass", "retry")

print(status)

Output:

text
['pass' 'retry' 'pass' 'retry' 'pass']
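
The three-argument form can also be nested when there are more than two outcomes. The sketch below assumes a hypothetical extra threshold of 80 for a "high" label, which is not part of the original example.

```python
import numpy as np

exam_scores = np.array([82, 45, 67, 39, 90])

# The inner where handles pass/retry; the outer where overrides scores of 80 or more.
grades = np.where(
    exam_scores >= 80,
    "high",
    np.where(exam_scores >= 50, "pass", "retry"),
)

print(grades)  # ['high' 'retry' 'pass' 'retry' 'high']
```

Nesting stays readable for two or three levels; beyond that, a lookup table or separate masks are usually easier to follow.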

15. Finding Best And Worst Positions

np.argmax() returns the index of the largest value.

python
daily_orders = np.array([120, 98, 145, 160, 132])

best_day = np.argmax(daily_orders)
worst_day = np.argmin(daily_orders)

print(best_day)
print(worst_day)

Output:

text
3
1

Index 3 has the highest order count. Index 1 has the lowest.

For 2D arrays:

python
weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])

Best store per day:

python
print(np.argmax(weekly_orders, axis=0))

Output:

text
[2 2 2]

Best day per store:

python
print(np.argmax(weekly_orders, axis=1))

Output:

text
[2 1 2]
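
To find the single best cell in the whole table instead of one result per row or column, a common approach is to call argmax without axis and convert the flat position back with np.unravel_index. The names `flat_position`, `store`, and `day` are illustrative.

```python
import numpy as np

weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])

# Without axis, argmax returns a position in the flattened array.
flat_position = np.argmax(weekly_orders)

# unravel_index converts that flat position back to (row, column).
store, day = np.unravel_index(flat_position, weekly_orders.shape)

print(flat_position)  # 8
print(store, day)     # 2 2
```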

16. Cumulative Sum And Product

np.cumsum() calculates running totals.

python
revenue = np.array([1000, 1500, 1200, 1800])

print(np.cumsum(revenue))

Output:

text
[1000 2500 3700 5500]

For 2D arrays:

python
monthly_sales = np.array([
    [10, 12, 15],
    [8, 9, 11],
])

Cumulative sales across months for each product:

python
print(np.cumsum(monthly_sales, axis=1))

Output:

text
[[10 22 37]
 [ 8 17 28]]

np.cumprod() works similarly for running multiplication:

python
growth = np.array([1.05, 1.10, 0.95])

print(np.cumprod(growth))

Output:

text
[1.05    1.155   1.09725]

17. Percentiles And Median

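The p-th percentile is the value below which roughly p percent of the data falls. For example, the 90th percentile of response times is the value that about 90% of requests stay under.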

python
response_times = np.array([120, 180, 240, 300, 360, 420, 900])

Calculate the 50th, 75th, and 90th percentiles:

python
print(np.percentile(response_times, 50))
print(np.percentile(response_times, 75))
print(np.percentile(response_times, 90))

Output:

text
300.0
390.0
612.0

The 50th percentile is the median:

python
print(np.median(response_times))

Output:

text
300.0

Percentiles are useful when averages are misleading because of outliers.
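
A short sketch of that effect on the response_times array from this section: one slow request pulls the mean up while the median stays put. The name `typical` is illustrative.

```python
import numpy as np

response_times = np.array([120, 180, 240, 300, 360, 420, 900])

mean_time = response_times.mean()
median_time = np.median(response_times)

# Dropping the single slow request (900) shows how much it moves each statistic.
typical = response_times[response_times < 900]

print(mean_time)           # 360.0
print(median_time)         # 300.0
print(typical.mean())      # 270.0
print(np.median(typical))  # 270.0
```

Removing one value changes the mean by 90 but the median by only 30, which is why medians and percentiles are often reported for skewed data such as latencies.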

18. Percentiles Along Axis

Suppose rows are products and columns are monthly sales:

python
sales_table = np.array([
    [100, 120, 140, 160],
    [80, 85, 90, 300],
    [200, 210, 220, 230],
])

Median per product:

python
print(np.percentile(sales_table, 50, axis=1))

Output:

text
[130.   87.5 215. ]

Median per month:

python
print(np.percentile(sales_table, 50, axis=0))

Output:

text
[100. 120. 140. 230.]

Again, shape and axis decide the meaning.

19. Histograms With np.histogram

np.histogram() counts how many values fall into ranges.

python
ages = np.array([18, 21, 22, 25, 27, 33, 35, 41, 45, 52, 60])

counts, bin_edges = np.histogram(ages, bins=[18, 30, 45, 65])

print(counts)
print(bin_edges)

Output:

text
[5 3 3]
[18 30 45 65]

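This means:

  • 5 values from 18 up to, but not including, 30
  • 3 values from 30 up to, but not including, 45
  • 3 values from 45 up to and including 65

Every bin is half-open except the last one, which also includes its right edge.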

Use histograms when you want a distribution summary without plotting yet.
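
The sketch below checks how np.histogram treats values that sit exactly on a bin edge or outside the bins. The array `edge_values` is made up for this purpose.

```python
import numpy as np

# Values placed exactly on the bin edges, plus one value (70) beyond the last edge.
edge_values = np.array([18, 30, 45, 65, 70])

counts, bin_edges = np.histogram(edge_values, bins=[18, 30, 45, 65])

# 18 -> first bin, 30 -> second bin, 45 and 65 -> last bin (right edge included).
# 70 lies outside the bins and is not counted at all.
print(counts)        # [1 1 2]
print(counts.sum())  # 4
```

If the counts do not add up to the number of input values, some values fell outside the outermost edges.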

20. Correlation With np.corrcoef

Correlation measures how two variables move together.

python
ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

correlation = np.corrcoef(ad_spend, sales)

print(correlation)

Output:

text
[[1.         0.99877775]
 [0.99877775 1.        ]]

The off-diagonal value is the correlation between ad spend and sales.

python
print(round(correlation[0, 1], 4))

Output:

text
0.9988

A value close to 1 means strong positive correlation.

Important reminder: correlation does not prove causation.
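
For readers who want to see what the off-diagonal value is made of, here is a sketch that computes the Pearson correlation by hand and compares it with np.corrcoef. The names `dx`, `dy`, `r_manual`, and `r_numpy` are illustrative.

```python
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

# Deviations of each variable from its own mean.
dx = ad_spend - ad_spend.mean()
dy = sales - sales.mean()

# Pearson r: shared variation divided by the product of the individual spreads.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_numpy = np.corrcoef(ad_spend, sales)[0, 1]

print(np.isclose(r_manual, r_numpy))  # True
```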

21. Membership Checks With np.isin

np.isin() checks whether values are present in another collection.

python
user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

mask = np.isin(user_ids, premium_ids)

print(mask)
print(user_ids[mask])

Output:

text
[False  True False False  True False]
[102 105]

This is useful for filtering by allowed IDs, selected categories, blocked values, or known labels.
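
For the "blocked values" case, np.isin also accepts invert=True, which returns True where a value is not in the second collection. A minimal sketch with the same arrays; `standard_mask` is an illustrative name.

```python
import numpy as np

user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

# invert=True flips the result: True where the ID is NOT in premium_ids.
standard_mask = np.isin(user_ids, premium_ids, invert=True)

print(user_ids[standard_mask])  # [101 103 104 106]
```

Writing `~np.isin(user_ids, premium_ids)` gives the same mask; which form to use is mostly a matter of taste.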

22. Reversing Arrays With np.flip

For 1D arrays:

python
steps = np.array([1, 2, 3, 4, 5])

print(np.flip(steps))

Output:

text
[5 4 3 2 1]

For 2D arrays:

python
grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

Flip rows:

python
print(np.flip(grid, axis=0))

Output:

text
[[7 8 9]
 [4 5 6]
 [1 2 3]]

Flip columns:

python
print(np.flip(grid, axis=1))

Output:

text
[[3 2 1]
 [6 5 4]
 [9 8 7]]

Flip both axes:

python
print(np.flip(grid))

Output:

text
[[9 8 7]
 [6 5 4]
 [3 2 1]]

23. Updating Values With np.put

np.put() updates positions in the flattened version of the array.

python
board = np.arange(1, 10).reshape(3, 3)

print(board)

Output:

text
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Update flattened positions 0 and 8:

python
np.put(board, [0, 8], [100, 900])

print(board)

Output:

text
[[100   2   3]
 [  4   5   6]
 [  7   8 900]]

Because np.put() mutates the original array, use it carefully.

In many cases, direct indexing is clearer:

python
board[0, 0] = 100
board[2, 2] = 900

24. Deleting Values With np.delete

np.delete() returns a new array with selected positions removed.

python
numbers = np.array([10, 20, 30, 40, 50])

without_first = np.delete(numbers, 0)

print(without_first)

Output:

text
[20 30 40 50]

Delete multiple positions:

python
print(np.delete(numbers, [1, 3]))

Output:

text
[10 30 50]

For 2D arrays:

python
table = np.arange(1, 13).reshape(3, 4)

print(table)

Output:

text
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Delete a row:

python
print(np.delete(table, 1, axis=0))

Output:

text
[[ 1  2  3  4]
 [ 9 10 11 12]]

Delete a column:

python
print(np.delete(table, 2, axis=1))

Output:

text
[[ 1  2  4]
 [ 5  6  8]
 [ 9 10 12]]

25. Set Operations

NumPy has useful set-style functions for 1D arrays.

python
course_a = np.array([101, 102, 103, 104])
course_b = np.array([103, 104, 105, 106])

Union:

python
print(np.union1d(course_a, course_b))

Output:

text
[101 102 103 104 105 106]

Intersection:

python
print(np.intersect1d(course_a, course_b))

Output:

text
[103 104]

Values in course_a but not in course_b:

python
print(np.setdiff1d(course_a, course_b))

Output:

text
[101 102]

Values that appear in one array but not both:

python
print(np.setxor1d(course_a, course_b))

Output:

text
[101 102 105 106]

These functions are helpful when comparing IDs, labels, selected items, feature lists, or category groups.
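
One detail worth seeing in code: these functions return sorted, de-duplicated results even when the inputs contain repeats or are out of order. The arrays `morning` and `evening` below are hypothetical enrollment lists.

```python
import numpy as np

# Hypothetical enrollment lists with repeats and no particular order.
morning = np.array([104, 101, 104, 102])
evening = np.array([102, 106, 102, 104])

print(np.union1d(morning, evening))      # [101 102 104 106]
print(np.intersect1d(morning, evening))  # [102 104]
print(np.setdiff1d(morning, evening))    # [101]
```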

26. Clipping Values With np.clip

np.clip() limits values to a minimum and maximum range.

python
ratings = np.array([2, 5, 8, 11, -3, 7])

safe_ratings = np.clip(ratings, a_min=0, a_max=10)

print(safe_ratings)

Output:

text
[ 2  5  8 10  0  7]

This is useful for:

  • limiting outliers
  • keeping probabilities between 0 and 1
  • capping image pixel values
  • protecting dashboards from extreme values
  • preparing model features

Example with percentages:

python
predicted_discount = np.array([-5, 10, 25, 60, 120])

final_discount = np.clip(predicted_discount, 0, 50)

print(final_discount)

Output:

text
[ 0 10 25 50 50]
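
When there is no fixed valid range, one possible approach to limiting outliers is to take the limits from the data itself with np.percentile. This is a sketch only: the 10th and 90th percentiles and the `order_values` array are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical order values with one extreme entry.
order_values = np.array([20, 22, 25, 27, 30, 31, 35, 38, 40, 500])

# Use the 10th and 90th percentiles as data-driven limits.
low, high = np.percentile(order_values, [10, 90])

# Values below `low` are raised to it; values above `high` are lowered to it.
capped = np.clip(order_values, low, high)

print(low, high)
print(capped)
```

The result is a float array because the percentile limits are floats. Which percentiles to use depends on the data and on how aggressive the capping should be.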

27. Mini Project: Rank Students After Adding New Marks

You have marks for 5 students across 4 subjects:

python
marks = np.array([
    [72, 81, 77, 69],
    [88, 90, 84, 91],
    [55, 61, 58, 64],
    [79, 74, 82, 80],
    [93, 89, 95, 90],
])

A new practical exam score arrives:

python
practical = np.array([85, 92, 67, 78, 96])

Add the practical score as a new column:

python
marks = np.concatenate((marks, practical.reshape(-1, 1)), axis=1)

Add total marks as another column:

python
total = marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((marks, total), axis=1)

Sort students by total marks in descending order:

python
ranked = marks_with_total[np.argsort(marks_with_total[:, -1])[::-1]]

print(ranked)

Get the top 2:

python
print(ranked[:2])

This combines:

  • reshaping
  • concatenation
  • row-wise sum
  • sorting rows by a derived column
  • slicing top results

28. Mini Project: Clean Sensor Readings

You receive sensor readings where values below 0 and above 100 are invalid.

python
readings = np.array([12, 45, -8, 60, 105, 88, 101, 0, 74])

Clip invalid values:

python
cleaned = np.clip(readings, 0, 100)

print(cleaned)

Output:

text
[ 12  45   0  60 100  88 100   0  74]

Find readings that were changed:

python
changed_positions = np.where(readings != cleaned)[0]

print(changed_positions)

Output:

text
[2 4 6]

Create labels:

python
labels = np.where(readings != cleaned, "corrected", "ok")

print(labels)

Output:

text
['ok' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'ok']

This is a realistic pattern for data cleaning.

29. Practice Exercises

Try these before checking the solutions.

Practice Lab

Exercise 1: Sort by total

Create a 4 by 3 array of product sales. Add a total column and sort rows by total sales in descending order.

Exercise 2: Add a status column

Given exam scores for students, add a column that contains 1 if the student's average is at least 60, otherwise 0.

Exercise 3: Unique customer visits

Given an array of customer IDs, print unique customers and how many times each customer appears.

Exercise 4: Top product per day

Given a 2D array where rows are products and columns are days, find the product index with maximum sales for every day.

Exercise 5: Clip and count outliers

Given an array of values, clip every value to the range 10 to 90. Count how many values were changed.

Exercise 6: Membership filter

Given all user IDs and a list of blocked IDs, return only users who are not blocked.

Exercise 7: Flip an image-like matrix

Create a 4 by 4 array and flip it vertically, horizontally, and both ways.

Exercise 8: Histogram buckets

Create an array of ages and count how many people fall into age groups [0, 18, 30, 45, 60, 100].

Exercise 9: Remove min and max

Create a 1D array and remove every occurrence of its minimum and maximum values.

Exercise 10: Compare two batches

Given two arrays of product IDs, find products only in batch A, only in batch B, and products present in both.

30. Practice Solutions

Solution Key

Solution 1: Sort by total

python
sales = np.array([
    [40, 55, 60],
    [90, 70, 85],
    [30, 45, 35],
    [75, 80, 72],
])

totals = sales.sum(axis=1, keepdims=True)
with_total = np.concatenate((sales, totals), axis=1)
ranked = with_total[np.argsort(with_total[:, -1])[::-1]]

print(ranked)

Explanation

  • A NumPy array named sales is created, with one row per product.
  • sum(axis=1, keepdims=True) computes the total for each row and returns it as a (4, 1) column instead of a flat 1D array.
  • The total sales are concatenated to the original sales array, creating a new array with_total that includes the totals as an additional column.
  • The rows of with_total are sorted in descending order based on the total sales using np.argsort and slicing.
  • Finally, the ranked array is printed, showing the sales data ordered by total sales.

Solution 2: Add a status column

python
scores = np.array([
    [70, 65, 80],
    [45, 50, 55],
    [90, 88, 92],
])

average = scores.mean(axis=1, keepdims=True)
status = np.where(average >= 60, 1, 0)

result = np.concatenate((scores, status), axis=1)

print(result)

Explanation

  • Initializes a NumPy array scores containing test scores for three students across three subjects.
  • Computes each student's average with mean(axis=1); keepdims=True returns the result as a (3, 1) column instead of a 1D array.
  • Uses np.where to create a binary status array, marking students as '1' (pass) if their average score is 60 or above, and '0' (fail) otherwise.
  • Concatenates the original scores with the status array to form a new array that includes both scores and pass/fail status.
  • Outputs the final combined array, showing each student's scores alongside their pass/fail status.

Solution 3: Unique customer visits

python
customers = np.array([101, 102, 101, 103, 102, 101, 104])

ids, counts = np.unique(customers, return_counts=True)

print(ids)
print(counts)

Explanation

  • The code initializes a NumPy array called customers containing customer IDs, some of which are repeated.
  • The np.unique() function is used to find unique customer IDs and count their occurrences, returning two arrays: ids for unique IDs and counts for their respective counts.
  • The unique IDs are printed to the console, showing which customers are present.
  • The counts of each unique ID are also printed, indicating how many times each customer ID appears in the original array.

Solution 4: Top product per day

python
sales = np.array([
    [20, 35, 30],
    [25, 30, 45],
    [40, 20, 25],
])

top_product_by_day = np.argmax(sales, axis=0)

print(top_product_by_day)

Explanation

  • The code initializes a 2D NumPy array named sales, representing sales figures for three products over three days.
  • The np.argmax function is used to find the index of the highest sales value for each day, specified by axis=0, which indicates that the operation is performed column-wise.
  • The result, stored in top_product_by_day, contains the indices of the top-selling products for each day.
  • Finally, the indices of the top products are printed to the console.

Solution 5: Clip and count outliers

python
values = np.array([5, 18, 44, 92, 100, 63, 7])

clipped = np.clip(values, 10, 90)
changed_count = np.sum(values != clipped)

print(clipped)
print(changed_count)

Explanation

  • The code initializes a NumPy array named values with a set of integers.
  • It uses the np.clip() function to limit the values in the array to a specified range, in this case between 10 and 90.
  • The result of the clipping is stored in the clipped variable.
  • The code calculates the number of elements that were changed during the clipping process by comparing the original and clipped arrays, using np.sum() to count the differences.
  • Finally, it prints the clipped array and the count of changed values to the console.

Solution 6: Membership filter

python
users = np.array([10, 11, 12, 13, 14, 15])
blocked = np.array([11, 15])

allowed_users = users[~np.isin(users, blocked)]

print(allowed_users)

Explanation

  • The code initializes two NumPy arrays: users containing a range of user IDs and blocked containing IDs that are not allowed.
  • It uses np.isin() to create a boolean array that identifies which users are in the blocked list.
  • The tilde operator ~ negates this boolean array, effectively marking users that are not blocked.
  • The filtered array allowed_users is created by indexing the users array with the negated boolean array.
  • Finally, it prints the allowed_users array, which contains only the IDs of users that are not blocked.

Solution 7: Flip an image-like matrix

python
image = np.arange(1, 17).reshape(4, 4)

print(np.flip(image, axis=0))
print(np.flip(image, axis=1))
print(np.flip(image))

Explanation

  • The code creates a 4x4 NumPy array filled with integers from 1 to 16 using np.arange and reshape.
  • np.flip(image, axis=0) flips the array vertically (upside down).
  • np.flip(image, axis=1) flips the array horizontally (left to right).
  • np.flip(image) flips the array both vertically and horizontally, resulting in a 180-degree rotation.
  • The print statements display the results of each flip operation.

Solution 8: Histogram buckets

python
ages = np.array([12, 17, 18, 24, 29, 30, 37, 44, 45, 61, 72])

counts, edges = np.histogram(ages, bins=[0, 18, 30, 45, 60, 100])

print(counts)
print(edges)

Explanation

  • The code initializes a NumPy array ages containing various age values.
  • It uses np.histogram to compute the frequency of ages within specified bins: [0, 18), [18, 30), [30, 45), [45, 60), and [60, 100]. Every bin excludes its right edge except the last one, which includes 100.
  • The function returns two arrays: counts, which holds the number of ages in each bin, and edges, which defines the boundaries of the bins.
  • Finally, it prints the counts of ages in each bin and the edges of the bins to the console.

Solution 9: Remove min and max

python
arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])

minimum = arr.min()
maximum = arr.max()

filtered = arr[(arr != minimum) & (arr != maximum)]

print(filtered)

Explanation

  • The code initializes a NumPy array arr with a set of integer values.
  • It calculates the minimum and maximum values in the array using the min() and max() methods.
  • A filtered array filtered is created by excluding the minimum and maximum values using boolean indexing.
  • Finally, the filtered array is printed, displaying only the values that are neither the minimum nor the maximum.

Solution 10: Compare two batches

python
batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])

only_a = np.setdiff1d(batch_a, batch_b)
only_b = np.setdiff1d(batch_b, batch_a)
both = np.intersect1d(batch_a, batch_b)

print("Only A:", only_a)
print("Only B:", only_b)
print("Both:", both)

Explanation

  • batch_a and batch_b are defined as NumPy arrays containing integer values.
  • np.setdiff1d is used to find elements that are in batch_a but not in batch_b, stored in only_a.
  • Similarly, only_b contains elements that are in batch_b but not in batch_a.
  • np.intersect1d identifies elements that are present in both arrays, stored in both.
  • The results are printed, showing unique elements from each array and their intersection.

31. Common Mistakes

Mistake 1: Forgetting that np.append() flattens by default

Always pass axis if you want to preserve 2D structure.

Mistake 2: Sorting values when you meant to sort rows

Use np.sort() to sort values inside arrays. Use np.argsort() to reorder rows based on a column or score.
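
The difference is easy to miss until the rows get scrambled. The sketch below uses a small hypothetical table where each row is [store_id, weekend_sales].

```python
import numpy as np

# Each row is [store_id, weekend_sales].
stores = np.array([
    [101, 620],
    [102, 710],
    [103, 560],
])

# np.sort with axis=0 sorts each column on its own, so IDs and sales get separated.
scrambled = np.sort(stores, axis=0)

# argsort on one column reorders whole rows, so each ID stays with its sales figure.
reordered = stores[np.argsort(stores[:, 1])]

print(scrambled)  # store 101 now sits next to 560, which belonged to store 103
print(reordered)  # rows stay intact, in the order 103, 101, 102
```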

Mistake 3: Concatenating arrays with mismatched dimensions

Print shapes before combining arrays:

python
print(a.shape)
print(b.shape)

Explanation

  • .shape returns a tuple of the array's dimensions, for example (4, 3) for 4 rows and 3 columns.
  • For np.concatenate, the arrays must have the same number of dimensions and must match on every axis except the one being joined.
  • Checking shapes first turns a confusing error, or a silently wrong result, into a quick fix.

Mistake 4: Confusing axis meanings

For 2D arrays:

  • axis=0 works down rows and returns one result per column
  • axis=1 works across columns and returns one result per row
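
A quick way to check the two rules above is to sum a small non-square table and look at the shapes of the results. The array `table` is illustrative.

```python
import numpy as np

# 2 rows (products) by 3 columns (months).
table = np.array([
    [10, 12, 15],
    [8, 9, 11],
])

per_column = table.sum(axis=0)  # collapses the rows: one total per column
per_row = table.sum(axis=1)     # collapses the columns: one total per row

print(per_column, per_column.shape)  # [18 21 26] (3,)
print(per_row, per_row.shape)        # [37 28] (2,)
```

Using a non-square array for this kind of check helps, because the result length immediately shows which axis was collapsed.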

Mistake 5: Mutating arrays accidentally

Functions like np.put() and the array method arr.sort() modify the original array. Functions like np.delete() and np.sort() return new arrays and leave the input unchanged.
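
A minimal sketch of the difference, using the ndarray method sort() as an additional in-place example and an explicit copy() as one way to protect the original. The names `original` and `working` are illustrative.

```python
import numpy as np

scores = np.array([72, 95, 61])

sorted_copy = np.sort(scores)  # returns a new array; scores is unchanged
print(scores)                  # [72 95 61]

scores.sort()                  # the ndarray method sorts in place
print(scores)                  # [61 72 95]

# Take an explicit copy before calling an in-place function such as np.put.
original = np.array([1, 2, 3])
working = original.copy()
np.put(working, [0], [100])
print(original, working)       # [1 2 3] [100   2   3]
```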

Final Takeaway

NumPy becomes powerful when you stop thinking only in terms of individual values and start thinking in terms of whole-array transformations.

The most useful habits are:

  1. Print shape before combining arrays.
  2. Use axis intentionally.
  3. Prefer concatenate, vstack, or hstack when structure matters.
  4. Use boolean masks and np.where() instead of manual loops.
  5. Use argsort, argmax, and argmin when you need positions.
  6. Use clip, percentile, and histogram for quick data cleaning and analysis.

These tricks are small individually, but together they make NumPy feel like a practical data toolkit instead of just an array library.
