# Mastering Advanced NumPy: Essential Sorting and Searching Techniques
URL: https://madhudadi.in/blog/posts/advanced-numpy-sorting-searching-essential-tricks
Published: 2026-05-27
Tags: NumPy, Python
Read time: 28 min
Difficulty: intermediate
> Learn practical NumPy tricks for real data work: sorting arrays, adding rows and columns, finding unique values, filtering with conditions, ranking values, cumulative calculations, percentiles, histograms, correlation, set operations, clipping, and practice tasks.

# NumPy Tricks: Sorting, Filtering, Reshaping, Statistics, and Set Operations

Once you understand NumPy arrays, shapes, axes, indexing, and broadcasting, the next step is learning the small tools that make day-to-day array work faster.

These are not "magic tricks." They are practical patterns you will use when cleaning data, preparing model inputs, analyzing scores, ranking records, creating summary features, and transforming arrays before sending them to Pandas, scikit-learn, visualization tools, or machine learning models.

In this lesson, you will learn how to:

- sort 1D and 2D arrays
- add rows and columns safely
- combine arrays with `concatenate`
- find unique values and unique rows
- add dimensions with `expand_dims`
- filter and replace values using `where`
- find best and worst positions with `argmax` and `argmin`
- calculate running totals with `cumsum`
- calculate percentiles and medians
- build frequency tables with `histogram`
- measure correlation with `corrcoef`
- check membership with `isin`
- reverse arrays with `flip`
- update and delete values carefully
- use NumPy set operations
- cap extreme values with `clip`

The examples are written around small business, student, and analytics-style datasets so you can see how these functions appear in real work.

## 1. Setup

Import NumPy with the standard alias:

```python
import numpy as np
```

For examples that use random data, use a generator with a seed:

```python
rng = np.random.default_rng(42)
```

This keeps your output reproducible while learning.
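The seeded generator is not used again in this lesson, so here is a minimal sketch of what the seed buys you. The array names and the 50 to 99 value range are illustrative assumptions, not part of the original examples.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Hypothetical data: 8 random integer scores from 50 up to 99 (100 is excluded).
sample_scores = rng_a.integers(50, 100, size=8)
repeat_scores = rng_b.integers(50, 100, size=8)

print(np.array_equal(sample_scores, repeat_scores))  # True
print(np.sort(sample_scores))
```

Re-running the script gives the same "random" scores each time, which makes it easier to compare your output with someone else's.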

## 2. Sorting 1D Arrays With `np.sort`

`np.sort()` returns a sorted copy of an array.

```python
scores = np.array([72, 95, 61, 88, 75])

sorted_scores = np.sort(scores)

print(sorted_scores)
```

Output:

```text
[61 72 75 88 95]
```

The original array is not changed:

```python
print(scores)
```

Output:

```text
[72 95 61 88 75]
```

To sort in descending order, reverse the sorted result:

```python
descending_scores = np.sort(scores)[::-1]

print(descending_scores)
```

Output:

```text
[95 88 75 72 61]
```

## 3. Sorting 2D Arrays

For a 2D array, `axis` controls the direction of sorting.

```python
sales = np.array([
    [45, 80, 62],
    [90, 55, 73],
    [38, 96, 68],
])
```

Sort values inside each row:

```python
print(np.sort(sales, axis=1))
```

Output:

```text
[[45 62 80]
 [55 73 90]
 [38 68 96]]
```

Sort values inside each column:

```python
print(np.sort(sales, axis=0))
```

Output:

```text
[[38 55 62]
 [45 80 68]
 [90 96 73]]
```

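Use this when each row or column should be sorted independently. Be aware that it breaks the alignment between values that originally shared a row (or column), so it is not the right tool for reordering whole records.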

## 4. Sorting Rows By One Column

Sometimes you do not want to sort values inside each row. You want to reorder the rows based on one column.

Suppose each row is:

```text
[store_id, weekday_sales, weekend_sales]
```

```python
store_sales = np.array([
    [101, 450, 620],
    [102, 390, 710],
    [103, 520, 560],
    [104, 480, 800],
])
```

Sort rows by weekend sales:

```python
order = np.argsort(store_sales[:, 2])
sorted_by_weekend = store_sales[order]

print(sorted_by_weekend)
```

Output:

```text
[[103 520 560]
 [101 450 620]
 [102 390 710]
 [104 480 800]]
```

For descending order:

```python
best_weekend_first = store_sales[np.argsort(store_sales[:, 2])[::-1]]

print(best_weekend_first)
```

Output:

```text
[[104 480 800]
 [102 390 710]
 [101 450 620]
 [103 520 560]]
```

This pattern is very useful for ranking tables.

## 5. Sorting Rows By A Calculated Value

You can sort rows by a value that does not exist yet.

Example: sort stores by total sales.

```python
totals = store_sales[:, 1] + store_sales[:, 2]
order = np.argsort(totals)[::-1]

ranked_stores = store_sales[order]

print(ranked_stores)
print(totals[order])
```

Output:

```text
[[104 480 800]
 [102 390 710]
 [103 520 560]
 [101 450 620]]
[1280 1100 1080 1070]
```

The important idea:

```python
array[np.argsort(values)]
```

Use it when you want to rearrange records based on a score, total, date, error value, or prediction confidence.
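The same idea can be extended to assign a rank to each record without moving any rows. The sketch below reuses the store totals from this section; negating the values and passing `kind="stable"` is one possible way to get a descending order that keeps tied values in their original order.

```python
import numpy as np

# Same totals as the store example above.
totals = np.array([1070, 1100, 1080, 1280])

# Negating the values gives descending order; kind="stable" keeps tied
# values in their original order.
order = np.argsort(-totals, kind="stable")

# ranks[i] is the 1-based rank of element i (1 = largest total).
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(totals) + 1)

print(order)  # [3 1 2 0]
print(ranks)  # [4 2 3 1]
```

`order` answers "which record comes first?", while `ranks` answers "what position does each record hold?".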

## 6. Adding A Column With `np.concatenate`

Assume you have marks for 4 students in 3 subjects:

```python
marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
    [90, 88, 95],
    [55, 60, 64],
])
```

Now a new subject score arrives:

```python
project_marks = np.array([89, 74, 97, 66])
```

This is a 1D array. To add it as a column, convert it to shape `(4, 1)`:

```python
project_column = project_marks.reshape(-1, 1)

updated_marks = np.concatenate((marks, project_column), axis=1)

print(updated_marks)
```

Output:

```text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]]
```

Why reshape?

```python
print(marks.shape)
print(project_marks.shape)
print(project_column.shape)
```

Output:

```text
(4, 3)
(4,)
(4, 1)
```

For column-wise joining, both arrays must agree on the number of rows.

## 7. Adding Rows With `np.concatenate`

Now add two new students:

```python
new_students = np.array([
    [81, 77, 84, 90],
    [69, 73, 71, 75],
])

all_marks = np.concatenate((updated_marks, new_students), axis=0)

print(all_marks)
```

Output:

```text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]
 [81 77 84 90]
 [69 73 71 75]]
```

For row-wise joining, both arrays must agree on the number of columns.

## 8. Adding A Derived Column

A common data-preparation task is adding totals, averages, or flags.

Add a total marks column:

```python
total_marks = all_marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((all_marks, total_marks), axis=1)

print(marks_with_total)
```

Output:

```text
[[ 78  85  91  89 343]
 [ 62  70  68  74 274]
 [ 90  88  95  97 370]
 [ 55  60  64  66 245]
 [ 81  77  84  90 332]
 [ 69  73  71  75 288]]
```

`keepdims=True` keeps the result as a 2D column, which makes concatenation easier.

## 9. `np.append`: Useful, But Be Careful

`np.append()` can add values, but it often hides shape mistakes.

```python
arr = np.array([[1, 2], [3, 4]])

print(np.append(arr, [[5, 6]], axis=0))
```

Output:

```text
[[1 2]
 [3 4]
 [5 6]]
```

Without `axis`, `np.append()` flattens the data:

```python
print(np.append(arr, [[5, 6]]))
```

Output:

```text
[1 2 3 4 5 6]
```

For serious data work, prefer `np.concatenate()`, `np.vstack()`, or `np.hstack()` because they make shape expectations clearer.
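As a rough illustration of those alternatives, the sketch below uses `np.column_stack()` and `np.vstack()`, which accept 1D inputs directly. The small `marks` array and the `new_student` row are made-up values for this example.

```python
import numpy as np

marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
])
project_marks = np.array([89, 74])    # 1D, one value per student
new_student = np.array([90, 88, 95])  # 1D, one value per subject

# column_stack treats a 1D array as a column, so no reshape is needed.
with_project = np.column_stack((marks, project_marks))

# vstack treats a 1D array as a single row.
with_student = np.vstack((marks, new_student))

print(with_project.shape)  # (2, 4)
print(with_student.shape)  # (3, 3)
```

The function name states the direction of the join, which makes the intent easier to read than an `axis` argument.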

## 10. Finding Unique Values

`np.unique()` returns sorted unique values.

```python
categories = np.array(["basic", "pro", "basic", "enterprise", "pro"])

print(np.unique(categories))
```

Output:

```text
['basic' 'enterprise' 'pro']
```

You can also count how often each value appears:

```python
labels, counts = np.unique(categories, return_counts=True)

print(labels)
print(counts)
```

Output:

```text
['basic' 'enterprise' 'pro']
[2 1 2]
```

This is useful for quick frequency tables.
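Building on the counts, a short sketch of how the most frequent label could be picked out and how the result might be turned into a plain dictionary. The sample data is changed slightly from the example above so that there is a single clear winner.

```python
import numpy as np

categories = np.array(["basic", "pro", "basic", "enterprise", "basic"])

labels, counts = np.unique(categories, return_counts=True)

# argmax on the counts gives the position of the most frequent label.
most_common = labels[np.argmax(counts)]

# Pair each label with its count for a readable frequency table.
frequency = {str(label): int(count) for label, count in zip(labels, counts)}

print(most_common)  # basic
print(frequency)    # {'basic': 3, 'enterprise': 1, 'pro': 1}
```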

## 11. Unique Rows And Columns

For 2D arrays, use `axis`.

```python
events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])
```

Unique rows:

```python
print(np.unique(events, axis=0))
```

Output:

```text
[[  1  10 100]
 [  2  20 200]
 [  3  30 300]]
```

Unique columns:

```python
matrix = np.array([
    [1, 2, 1, 4],
    [5, 6, 5, 8],
])

print(np.unique(matrix, axis=1))
```

Output:

```text
[[1 2 4]
 [5 6 8]]
```

Use this when duplicate records or duplicate feature columns need to be detected.
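To actually locate the duplicates, `np.unique()` can also report where each unique row first appears and how often it occurs. This is a minimal sketch using the `events` array from above; the variable names `first_seen` and `duplicated_rows` are illustrative.

```python
import numpy as np

events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])

unique_rows, first_seen, counts = np.unique(
    events, axis=0, return_index=True, return_counts=True
)

# Rows that occur more than once in the original array.
duplicated_rows = unique_rows[counts > 1]

print(first_seen)       # [0 1 3]
print(duplicated_rows)  # [[  1  10 100]]
```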

## 12. Adding Dimensions With `np.expand_dims`

Machine learning libraries often expect data in a specific number of dimensions.

Suppose one user's activity data is 1D:

```python
activity = np.array([8, 10, 7, 12])

print(activity.shape)
```

Output:

```text
(4,)
```

Make it one row:

```python
row = np.expand_dims(activity, axis=0)

print(row)
print(row.shape)
```

Output:

```text
[[ 8 10  7 12]]
(1, 4)
```

Make it one column:

```python
column = np.expand_dims(activity, axis=1)

print(column)
print(column.shape)
```

Output:

```text
[[ 8]
 [10]
 [ 7]
 [12]]
(4, 1)
```

The same result can often be written with `reshape()`:

```python
print(activity.reshape(1, -1).shape)
print(activity.reshape(-1, 1).shape)
```
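A third spelling you will often see in existing code is indexing with `np.newaxis`. The sketch below checks that all three approaches give the same result for the `activity` array.

```python
import numpy as np

activity = np.array([8, 10, 7, 12])

# np.newaxis inserts a length-1 axis at the position where it appears.
row = activity[np.newaxis, :]
column = activity[:, np.newaxis]

print(row.shape)     # (1, 4)
print(column.shape)  # (4, 1)

# All three spellings produce the same array.
print(np.array_equal(column, np.expand_dims(activity, axis=1)))  # True
print(np.array_equal(column, activity.reshape(-1, 1)))           # True
```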

## 13. Filtering With `np.where`

`np.where()` can return positions or choose values conditionally.

Create an array:

```python
temperatures = np.array([28, 35, 41, 32, 39, 45])
```

Find positions where temperature is above 38:

```python
hot_positions = np.where(temperatures > 38)

print(hot_positions)
```

Output:

```text
(array([2, 4, 5]),)
```

Use those positions to get values:

```python
print(temperatures[hot_positions])
```

Output:

```text
[41 39 45]
```
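Two related patterns are worth seeing side by side: selecting with the boolean mask directly, and using `np.where()` on a 2D array, where it returns one index array per axis. The `grid` array and its "cities by days" layout are assumptions made for this sketch.

```python
import numpy as np

temperatures = np.array([28, 35, 41, 32, 39, 45])

# A boolean mask selects the same values without an intermediate index array.
mask = temperatures > 38
print(temperatures[mask])  # [41 39 45]

# For a 2D array, np.where returns one index array per axis.
# Hypothetical layout: rows are cities, columns are days.
grid = np.array([
    [28, 41],
    [39, 30],
])
rows, cols = np.where(grid > 38)
print(rows)  # [0 1]
print(cols)  # [1 0]
```

Reading `rows` and `cols` together, the hot readings sit at positions `(0, 1)` and `(1, 0)`.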

## 14. Replacing Values With `np.where`

The three-argument form is:

```python
np.where(condition, value_if_true, value_if_false)
```

Example: cap temperatures above 40 at 40.

```python
cleaned = np.where(temperatures > 40, 40, temperatures)

print(cleaned)
```

Output:

```text
[28 35 40 32 39 40]
```

Example: create pass/fail labels:

```python
exam_scores = np.array([82, 45, 67, 39, 90])

status = np.where(exam_scores >= 50, "pass", "retry")

print(status)
```

Output:

```text
['pass' 'retry' 'pass' 'retry' 'pass']
```
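When there are more than two outcomes, nesting `np.where()` calls gets hard to read. One possible alternative is `np.select()`, sketched below. The "distinction" label and its threshold of 80 are invented for illustration.

```python
import numpy as np

exam_scores = np.array([82, 45, 67, 39, 90])

# Conditions are checked in order; the first match wins.
conditions = [exam_scores >= 80, exam_scores >= 50]
choices = ["distinction", "pass"]

# Scores that match no condition receive the default label.
grades = np.select(conditions, choices, default="retry")

print(grades)  # ['distinction' 'retry' 'pass' 'retry' 'distinction']
```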

## 15. Finding Best And Worst Positions

`np.argmax()` returns the index of the largest value.

```python
daily_orders = np.array([120, 98, 145, 160, 132])

best_day = np.argmax(daily_orders)
worst_day = np.argmin(daily_orders)

print(best_day)
print(worst_day)
```

Output:

```text
3
1
```

Index `3` has the highest order count. Index `1` has the lowest.

For 2D arrays:

```python
weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])
```

Best store per day:

```python
print(np.argmax(weekly_orders, axis=0))
```

Output:

```text
[2 2 2]
```

Best day per store:

```python
print(np.argmax(weekly_orders, axis=1))
```

Output:

```text
[2 1 2]
```
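If you need the single best cell in the whole table rather than one result per row or column, a common approach is to call `np.argmax()` without `axis` and convert the flat position back with `np.unravel_index()`. A minimal sketch with the same `weekly_orders` data:

```python
import numpy as np

weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])

# Without axis, argmax returns a position in the flattened array.
flat_position = np.argmax(weekly_orders)

# unravel_index converts that flat position back to (row, column).
store, day = np.unravel_index(flat_position, weekly_orders.shape)

print(flat_position)  # 8
print(store, day)     # 2 2
```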

## 16. Cumulative Sum And Product

`np.cumsum()` calculates running totals.

```python
revenue = np.array([1000, 1500, 1200, 1800])

print(np.cumsum(revenue))
```

Output:

```text
[1000 2500 3700 5500]
```

For 2D arrays:

```python
monthly_sales = np.array([
    [10, 12, 15],
    [8, 9, 11],
])
```

Cumulative sales across months for each product:

```python
print(np.cumsum(monthly_sales, axis=1))
```

Output:

```text
[[10 22 37]
 [ 8 17 28]]
```

`np.cumprod()` works similarly for running multiplication:

```python
growth = np.array([1.05, 1.10, 0.95])

print(np.cumprod(growth))
```

Output:

```text
[1.05    1.155   1.09725]
```
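A running total can also be turned into a running average by dividing by the number of values seen so far. This sketch reuses the `revenue` array; `periods` and `running_mean` are names introduced here.

```python
import numpy as np

revenue = np.array([1000, 1500, 1200, 1800])

# Running average = running total divided by how many values were seen so far.
periods = np.arange(1, len(revenue) + 1)
running_mean = np.cumsum(revenue) / periods

print(running_mean)
```

The last element equals `revenue.mean()`, which is a quick way to sanity-check the result.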

## 17. Percentiles And Median

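The p-th percentile is the value below which roughly p percent of the data falls. When that position lies between two data points, NumPy interpolates linearly by default.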

```python
response_times = np.array([120, 180, 240, 300, 360, 420, 900])
```

Calculate the 50th, 75th, and 90th percentiles:

```python
print(np.percentile(response_times, 50))
print(np.percentile(response_times, 75))
print(np.percentile(response_times, 90))
```

Output:

```text
300.0
390.0
612.0
```

The 50th percentile is the median:

```python
print(np.median(response_times))
```

Output:

```text
300.0
```

Percentiles are useful when averages are misleading because of outliers.
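One way to act on that is the interquartile range (IQR) rule for flagging outliers. The 1.5 multiplier below is a widely used rule of thumb and an assumption of this sketch, not something NumPy provides.

```python
import numpy as np

response_times = np.array([120, 180, 240, 300, 360, 420, 900])

# Passing a list returns all requested percentiles in one call.
q1, median, q3 = np.percentile(response_times, [25, 50, 75])

# Rule of thumb (an assumption here, not a NumPy feature):
# flag values more than 1.5 * IQR above the third quartile.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
outliers = response_times[response_times > upper_fence]

print(q1, median, q3)  # 210.0 300.0 390.0
print(outliers)        # [900]
```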

## 18. Percentiles Along Axis

Suppose rows are products and columns are monthly sales:

```python
sales_table = np.array([
    [100, 120, 140, 160],
    [80, 85, 90, 300],
    [200, 210, 220, 230],
])
```

Median per product:

```python
print(np.percentile(sales_table, 50, axis=1))
```

Output:

```text
[130.   87.5 215. ]
```

Median per month:

```python
print(np.percentile(sales_table, 50, axis=0))
```

Output:

```text
[100. 120. 140. 230.]
```

Again, shape and axis decide the meaning.

## 19. Histograms With `np.histogram`

`np.histogram()` counts how many values fall into ranges.

```python
ages = np.array([18, 21, 22, 25, 27, 33, 35, 41, 45, 52, 60])

counts, bin_edges = np.histogram(ages, bins=[18, 30, 45, 65])

print(counts)
print(bin_edges)
```

Output:

```text
[5 3 3]
[18 30 45 65]
```

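This means:

- 5 values from 18 up to, but not including, 30
- 3 values from 30 up to, but not including, 45
- 3 values from 45 up to and including 65

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both.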

Use histograms when you want a distribution summary without plotting yet.
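If you need to know which bucket each individual value landed in, rather than only the totals, `np.digitize()` is one option. The sketch below uses a small made-up `ages` array chosen to sit exactly on the bin edges, which also shows how the two functions treat the final edge differently.

```python
import numpy as np

ages = np.array([18, 21, 30, 45, 65])
bins = [18, 30, 45, 65]

# np.histogram counts values per bin; the last bin includes its right edge.
counts, _ = np.histogram(ages, bins=bins)
print(counts)  # [2 1 2]

# np.digitize reports the bin index of each value (1-based here).
# Unlike np.histogram, a value equal to the last edge falls past the final bin.
bin_index = np.digitize(ages, bins)
print(bin_index)  # [1 1 2 3 4]
```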

## 20. Correlation With `np.corrcoef`

Correlation measures how two variables move together.

```python
ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

correlation = np.corrcoef(ad_spend, sales)

print(correlation)
```

Output:

```text
[[1.         0.99877775]
 [0.99877775 1.        ]]
```

The off-diagonal value is the correlation between ad spend and sales.

```python
print(round(correlation[0, 1], 4))
```

Output:

```text
0.9988
```

A value close to `1` means strong positive correlation.

Important reminder: correlation does not prove causation.
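To see where the number comes from, the Pearson coefficient can be computed by hand from the deviations of each series around its mean. This is a sketch for checking understanding; `dx`, `dy`, and `r_manual` are names introduced here.

```python
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

# Deviations of each series from its own mean.
dx = ad_spend - ad_spend.mean()
dy = sales - sales.mean()

# Pearson r: co-variation divided by the product of the individual spreads.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(float(r_manual), 4))  # 0.9988
print(np.isclose(r_manual, np.corrcoef(ad_spend, sales)[0, 1]))  # True
```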

## 21. Membership Checks With `np.isin`

`np.isin()` checks whether values are present in another collection.

```python
user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

mask = np.isin(user_ids, premium_ids)

print(mask)
print(user_ids[mask])
```

Output:

```text
[False  True False False  True False]
[102 105]
```

This is useful for filtering by allowed IDs, selected categories, blocked values, or known labels.
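The opposite question, "which values are not in the list?", can be answered with `invert=True` or with the `~` operator. A short sketch using the same IDs:

```python
import numpy as np

user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

# invert=True flips the result, so True marks IDs that are NOT premium.
not_premium = np.isin(user_ids, premium_ids, invert=True)
print(user_ids[not_premium])  # [101 103 104 106]

# Reversing the arguments finds premium IDs missing from the user list.
unknown_premium = premium_ids[~np.isin(premium_ids, user_ids)]
print(unknown_premium)  # [108]
```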

## 22. Reversing Arrays With `np.flip`

For 1D arrays:

```python
steps = np.array([1, 2, 3, 4, 5])

print(np.flip(steps))
```

Output:

```text
[5 4 3 2 1]
```

For 2D arrays:

```python
grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
```

Flip rows:

```python
print(np.flip(grid, axis=0))
```

Output:

```text
[[7 8 9]
 [4 5 6]
 [1 2 3]]
```

Flip columns:

```python
print(np.flip(grid, axis=1))
```

Output:

```text
[[3 2 1]
 [6 5 4]
 [9 8 7]]
```

Flip both axes:

```python
print(np.flip(grid))
```

Output:

```text
[[9 8 7]
 [6 5 4]
 [3 2 1]]
```

## 23. Updating Values With `np.put`

`np.put()` updates positions in the flattened version of the array.

```python
board = np.arange(1, 10).reshape(3, 3)

print(board)
```

Output:

```text
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```

Update flattened positions `0` and `8`:

```python
np.put(board, [0, 8], [100, 900])

print(board)
```

Output:

```text
[[100   2   3]
 [  4   5   6]
 [  7   8 900]]
```

Because `np.put()` mutates the original array, use it carefully.

In many cases, direct indexing is clearer:

```python
board[0, 0] = 100
board[2, 2] = 900
```
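If you do want `np.put()` but only know the `(row, column)` positions, `np.ravel_multi_index()` can translate them into flattened positions. The sketch below works on a copy so the original array stays untouched; the chosen positions and replacement values are arbitrary.

```python
import numpy as np

board = np.arange(1, 10).reshape(3, 3)

# Convert (row, column) pairs into flattened positions.
# Here the pairs are (0, 2) and (1, 1).
flat = np.ravel_multi_index(([0, 1], [2, 1]), board.shape)
print(flat)  # [2 4]

# Work on a copy so the original board is left untouched.
updated = board.copy()
np.put(updated, flat, [30, 50])
print(updated)
```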

## 24. Deleting Values With `np.delete`

`np.delete()` returns a new array with selected positions removed.

```python
numbers = np.array([10, 20, 30, 40, 50])

without_first = np.delete(numbers, 0)

print(without_first)
```

Output:

```text
[20 30 40 50]
```

Delete multiple positions:

```python
print(np.delete(numbers, [1, 3]))
```

Output:

```text
[10 30 50]
```

For 2D arrays:

```python
table = np.arange(1, 13).reshape(3, 4)

print(table)
```

Output:

```text
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
```

Delete a row:

```python
print(np.delete(table, 1, axis=0))
```

Output:

```text
[[ 1  2  3  4]
 [ 9 10 11 12]]
```

Delete a column:

```python
print(np.delete(table, 2, axis=1))
```

Output:

```text
[[ 1  2  4]
 [ 5  6  8]
 [ 9 10 12]]
```

## 25. Set Operations

NumPy has useful set-style functions for 1D arrays.

```python
course_a = np.array([101, 102, 103, 104])
course_b = np.array([103, 104, 105, 106])
```

Union:

```python
print(np.union1d(course_a, course_b))
```

Output:

```text
[101 102 103 104 105 106]
```

Intersection:

```python
print(np.intersect1d(course_a, course_b))
```

Output:

```text
[103 104]
```

Values in `course_a` but not in `course_b`:

```python
print(np.setdiff1d(course_a, course_b))
```

Output:

```text
[101 102]
```

Values that appear in one array but not both:

```python
print(np.setxor1d(course_a, course_b))
```

Output:

```text
[101 102 105 106]
```

These functions are helpful when comparing IDs, labels, selected items, feature lists, or category groups.
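Two behaviours are easy to miss: the results are always sorted and de-duplicated, and `np.intersect1d()` can report where the shared values sit in each input. The enrollment lists below are made up, with repeated and unsorted IDs, to make both visible.

```python
import numpy as np

# Hypothetical enrollment lists with repeated and unsorted IDs.
course_a = np.array([104, 101, 103, 101])
course_b = np.array([103, 106, 104, 103])

# Results are always sorted and contain each value once.
print(np.union1d(course_a, course_b))  # [101 103 104 106]

# return_indices=True also reports where each shared value first appears.
shared, idx_a, idx_b = np.intersect1d(course_a, course_b, return_indices=True)
print(shared)  # [103 104]
print(idx_a)   # [2 0]
print(idx_b)   # [0 2]
```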

## 26. Clipping Values With `np.clip`

`np.clip()` limits values to a minimum and maximum range.

```python
ratings = np.array([2, 5, 8, 11, -3, 7])

safe_ratings = np.clip(ratings, a_min=0, a_max=10)

print(safe_ratings)
```

Output:

```text
[ 2  5  8 10  0  7]
```

This is useful for:

- limiting outliers
- keeping probabilities between 0 and 1
- capping image pixel values
- protecting dashboards from extreme values
- preparing model features

Example with percentages:

```python
predicted_discount = np.array([-5, 10, 25, 60, 120])

final_discount = np.clip(predicted_discount, 0, 50)

print(final_discount)
```

Output:

```text
[ 0 10 25 50 50]
```
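The limits do not have to be hard-coded. One possible approach is to derive them from percentiles of the data itself, and to pass `None` when only one side should be capped. The 5th and 95th percentile cut-offs below are an arbitrary choice for illustration.

```python
import numpy as np

response_times = np.array([120, 180, 240, 300, 360, 420, 900])

# Derive the limits from the data instead of hard-coding them.
# The 5th/95th percentile choice is an assumption for illustration.
low, high = np.percentile(response_times, [5, 95])
capped = np.clip(response_times, low, high)
print(capped)

# Passing None for one bound clips on the other side only.
non_negative = np.clip(np.array([-5, 10, 120]), 0, None)
print(non_negative)  # [  0  10 120]
```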

## 27. Mini Project: Rank Students After Adding New Marks

You have marks for 5 students across 4 subjects:

```python
marks = np.array([
    [72, 81, 77, 69],
    [88, 90, 84, 91],
    [55, 61, 58, 64],
    [79, 74, 82, 80],
    [93, 89, 95, 90],
])
```

A new practical exam score arrives:

```python
practical = np.array([85, 92, 67, 78, 96])
```

Add the practical score as a new column:

```python
marks = np.concatenate((marks, practical.reshape(-1, 1)), axis=1)
```

Add total marks as another column:

```python
total = marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((marks, total), axis=1)
```

Sort students by total marks in descending order:

```python
ranked = marks_with_total[np.argsort(marks_with_total[:, -1])[::-1]]

print(ranked)
```

Get the top 2:

```python
print(ranked[:2])
```

This combines:

- reshaping
- concatenation
- row-wise sum
- sorting rows by a derived column
- slicing top results
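After sorting, the rows no longer tell you which student they belonged to. A small sketch of one way to keep track: hold on to the `argsort` order, which maps each rank position back to the original row index. The array below is the mini project's marks with the practical score already appended.

```python
import numpy as np

marks = np.array([
    [72, 81, 77, 69, 85],
    [88, 90, 84, 91, 92],
    [55, 61, 58, 64, 67],
    [79, 74, 82, 80, 78],
    [93, 89, 95, 90, 96],
])

totals = marks.sum(axis=1)

# order[i] is the original row index of the student in rank position i.
order = np.argsort(totals)[::-1]

print(order)          # [4 1 3 0 2]
print(totals[order])  # [463 445 393 384 305]
```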

## 28. Mini Project: Clean Sensor Readings

You receive sensor readings where values below 0 and above 100 are invalid.

```python
readings = np.array([12, 45, -8, 60, 105, 88, 101, 0, 74])
```

Clip invalid values:

```python
cleaned = np.clip(readings, 0, 100)

print(cleaned)
```

Output:

```text
[ 12  45   0  60 100  88 100   0  74]
```

Find readings that were changed:

```python
changed_positions = np.where(readings != cleaned)[0]

print(changed_positions)
```

Output:

```text
[2 4 6]
```

Create labels:

```python
labels = np.where(readings != cleaned, "corrected", "ok")

print(labels)
```

Output:

```text
['ok' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'ok']
```

This is a realistic pattern for data cleaning.

## 29. Practice Exercises

Try these before checking the solutions.

### Exercise 1: Sort by total

Create a 4 by 3 array of product sales. Add a total column and sort rows by total sales in descending order.

### Exercise 2: Add a status column

Given exam scores for students, add a column that contains `1` if the student's average is at least 60, otherwise `0`.

### Exercise 3: Unique customer visits

Given an array of customer IDs, print unique customers and how many times each customer appears.

### Exercise 4: Top product per day

Given a 2D array where rows are products and columns are days, find the product index with maximum sales for every day.

### Exercise 5: Clip and count outliers

Given an array of values, clip them to the range 10 to 90. Count how many values were changed.

### Exercise 6: Membership filter

Given all user IDs and a list of blocked IDs, return only users who are not blocked.

### Exercise 7: Flip an image-like matrix

Create a 4 by 4 array and flip it vertically, horizontally, and both ways.

### Exercise 8: Histogram buckets

Create an array of ages and count how many people fall into age groups `[0, 18, 30, 45, 60, 100]`.

### Exercise 9: Remove min and max

Create a 1D array and remove every occurrence of its minimum and maximum values.

### Exercise 10: Compare two batches

Given two arrays of product IDs, find products only in batch A, only in batch B, and products present in both.

## 30. Practice Solutions

### Solution 1: Sort by total

```python
sales = np.array([
    [40, 55, 60],
    [90, 70, 85],
    [30, 45, 35],
    [75, 80, 72],
])

totals = sales.sum(axis=1, keepdims=True)
with_total = np.concatenate((sales, totals), axis=1)
ranked = with_total[np.argsort(with_total[:, -1])[::-1]]

print(ranked)
```
**Explanation**

- A NumPy array named `sales` is created, with one row per product.
- `sum(axis=1, keepdims=True)` computes the total for each row and returns it with shape `(4, 1)` instead of `(4,)`, so it is already a column.
- The totals column is concatenated to `sales`, producing `with_total`, which has the total as its last column.
- `np.argsort(with_total[:, -1])` gives the row order from lowest to highest total, and `[::-1]` reverses it to descending.
- Finally, the ranked array is printed, with the best-selling product first.


### Solution 2: Add a status column

```python
scores = np.array([
    [70, 65, 80],
    [45, 50, 55],
    [90, 88, 92],
])

average = scores.mean(axis=1, keepdims=True)
status = np.where(average >= 60, 1, 0)

result = np.concatenate((scores, status), axis=1)

print(result)
```
**Explanation**

- Initializes a NumPy array `scores` containing test scores for three students across three subjects.
- Computes each student's average with `mean(axis=1, keepdims=True)`; `keepdims=True` returns shape `(3, 1)` rather than `(3,)`, so the result is already a column.
- Uses `np.where` to build a status column: `1` if the average is 60 or above, `0` otherwise.
- Concatenates the scores with the status column along `axis=1`.
- Prints the combined array, showing each student's scores followed by the status flag.


### Solution 3: Unique customer visits

```python
customers = np.array([101, 102, 101, 103, 102, 101, 104])

ids, counts = np.unique(customers, return_counts=True)

print(ids)
print(counts)
```
**Explanation**

- The code initializes a NumPy array called `customers` containing customer IDs, some of which are repeated.  
- The `np.unique()` function is used to find unique customer IDs and count their occurrences, returning two arrays: `ids` for unique IDs and `counts` for their respective counts.  
- The unique IDs are printed to the console, showing which customers are present.  
- The counts of each unique ID are also printed, indicating how many times each customer ID appears in the original array.


### Solution 4: Top product per day

```python
sales = np.array([
    [20, 35, 30],
    [25, 30, 45],
    [40, 20, 25],
])

top_product_by_day = np.argmax(sales, axis=0)

print(top_product_by_day)
```
**Explanation**

- The code initializes a 2D NumPy array named `sales`, representing sales figures for three products over three days.  
- The `np.argmax` function is used to find the index of the highest sales value for each day, specified by `axis=0`, which indicates that the operation is performed column-wise.  
- The result, stored in `top_product_by_day`, contains the indices of the top-selling products for each day.  
- Finally, the indices of the top products are printed to the console.


### Solution 5: Clip and count outliers

```python
values = np.array([5, 18, 44, 92, 100, 63, 7])

clipped = np.clip(values, 10, 90)
changed_count = np.sum(values != clipped)

print(clipped)
print(changed_count)
```
**Explanation**

- The code initializes a NumPy array named `values` with a set of integers.  
- It uses the `np.clip()` function to limit the values in the array to a specified range, in this case between 10 and 90.  
- The result of the clipping is stored in the `clipped` variable.  
- The code calculates the number of elements that were changed during the clipping process by comparing the original and clipped arrays, using `np.sum()` to count the differences.  
- Finally, it prints the clipped array and the count of changed values to the console.


### Solution 6: Membership filter

```python
users = np.array([10, 11, 12, 13, 14, 15])
blocked = np.array([11, 15])

allowed_users = users[~np.isin(users, blocked)]

print(allowed_users)
```
**Explanation**

- The code initializes two NumPy arrays: `users` containing a range of user IDs and `blocked` containing IDs that are not allowed.
- It uses `np.isin()` to create a boolean array that identifies which users are in the `blocked` list.
- The tilde operator `~` negates this boolean array, effectively marking users that are not blocked.
- The filtered array `allowed_users` is created by indexing the `users` array with the negated boolean array.
- Finally, it prints the `allowed_users` array, which contains only the IDs of users that are not blocked.


### Solution 7: Flip an image-like matrix

```python
image = np.arange(1, 17).reshape(4, 4)

print(np.flip(image, axis=0))
print(np.flip(image, axis=1))
print(np.flip(image))
```
**Explanation**

- The code creates a 4x4 NumPy array filled with integers from 1 to 16 using `np.arange` and `reshape`.  
- `np.flip(image, axis=0)` flips the array vertically (upside down).  
- `np.flip(image, axis=1)` flips the array horizontally (left to right).  
- `np.flip(image)` flips the array both vertically and horizontally, resulting in a 180-degree rotation.  
- The `print` statements display the results of each flip operation.


### Solution 8: Histogram buckets

```python
ages = np.array([12, 17, 18, 24, 29, 30, 37, 44, 45, 61, 72])

counts, edges = np.histogram(ages, bins=[0, 18, 30, 45, 60, 100])

print(counts)
print(edges)
```
**Explanation**

- The code initializes a NumPy array `ages` containing various age values.  
- It uses `np.histogram` to compute the frequency of ages within specified bins: [0, 18), [18, 30), [30, 45), [45, 60), and [60, 100]. Only the last bin includes its right edge.
- The function returns two arrays: `counts`, which holds the number of ages in each bin, and `edges`, which defines the boundaries of the bins.  
- Finally, it prints the counts of ages in each bin and the edges of the bins to the console.


### Solution 9: Remove min and max

```python
arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])

minimum = arr.min()
maximum = arr.max()

filtered = arr[(arr != minimum) & (arr != maximum)]

print(filtered)
```
**Explanation**

- The code initializes a NumPy array `arr` with a set of integer values.  
- It calculates the minimum and maximum values in the array using the `min()` and `max()` methods.  
- A filtered array `filtered` is created by excluding the minimum and maximum values using boolean indexing.  
- Finally, the filtered array is printed, displaying only the values that are neither the minimum nor the maximum.


### Solution 10: Compare two batches

```python
batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])

only_a = np.setdiff1d(batch_a, batch_b)
only_b = np.setdiff1d(batch_b, batch_a)
both = np.intersect1d(batch_a, batch_b)

print("Only A:", only_a)
print("Only B:", only_b)
print("Both:", both)
```
**Explanation**

- `batch_a` and `batch_b` are defined as NumPy arrays containing integer values.  
- `np.setdiff1d` is used to find elements that are in `batch_a` but not in `batch_b`, stored in `only_a`.  
- Similarly, `only_b` contains elements that are in `batch_b` but not in `batch_a`.  
- `np.intersect1d` identifies elements that are present in both arrays, stored in `both`.  
- The results are printed, showing unique elements from each array and their intersection.


## 31. Common Mistakes

### Mistake 1: Forgetting that `np.append()` flattens by default

Always pass `axis` if you want to preserve 2D structure.

### Mistake 2: Sorting values when you meant to sort rows

Use `np.sort()` to sort values inside arrays. Use `np.argsort()` to reorder rows based on a column or score.

### Mistake 3: Concatenating arrays with mismatched dimensions

Print shapes before combining arrays:

```python
print(a.shape)
print(b.shape)
```
**Explanation**

- `.shape` returns a tuple with the size of each dimension, for example `(4, 3)` for 4 rows and 3 columns.
- `np.concatenate()` requires both arrays to have the same number of dimensions, and every axis except the one being joined must have the same length.
- If the shapes do not line up, reshape the smaller array (for example with `reshape(-1, 1)`) before combining.
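A minimal sketch of what a mismatch looks like in practice, using placeholder arrays `a` and `b` filled with zeros:

```python
import numpy as np

a = np.zeros((4, 3))
b = np.zeros(4)  # 1D, so it cannot be joined to a 2D array as-is

try:
    np.concatenate((a, b), axis=1)
except ValueError as exc:
    print("concatenate failed:", exc)

# Reshaping b into a (4, 1) column makes the shapes compatible.
joined = np.concatenate((a, b.reshape(-1, 1)), axis=1)
print(joined.shape)  # (4, 4)
```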


### Mistake 4: Confusing axis meanings

For 2D arrays:

- `axis=0` works down rows and returns one result per column
- `axis=1` works across columns and returns one result per row
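A quick way to check which direction an axis works in is to try it on a tiny array whose shape is not square. The `table` below is an arbitrary example.

```python
import numpy as np

table = np.array([
    [1, 2, 3],
    [4, 5, 6],
])  # shape (2, 3)

# axis=0 collapses the rows: one result per column.
print(table.sum(axis=0))  # [5 7 9]

# axis=1 collapses the columns: one result per row.
print(table.sum(axis=1))  # [ 6 15]
```

The axis you pass is the one that disappears from the shape: `(2, 3)` becomes `(3,)` for `axis=0` and `(2,)` for `axis=1`.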

### Mistake 5: Mutating arrays accidentally

Functions like `np.put()` modify the original array in place. Functions like `np.delete()` and `np.sort()` return new arrays and leave the input unchanged. Watch for method forms such as `arr.sort()`, which sort in place.
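The difference is easiest to see with sorting, where the function and the method of the same name behave differently. A short sketch:

```python
import numpy as np

scores = np.array([72, 95, 61])

# np.sort returns a new array and leaves the input alone.
copy_sorted = np.sort(scores)
print(scores)  # [72 95 61]

# The ndarray.sort method sorts in place and returns None.
result = scores.sort()
print(result)  # None
print(scores)  # [61 72 95]
```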

## Final Takeaway

NumPy becomes powerful when you stop thinking only in terms of individual values and start thinking in terms of whole-array transformations.

The most useful habits are:

1. Print `shape` before combining arrays.
2. Use `axis` intentionally.
3. Prefer `concatenate`, `vstack`, or `hstack` when structure matters.
4. Use boolean masks and `np.where()` instead of manual loops.
5. Use `argsort`, `argmax`, and `argmin` when you need positions.
6. Use `clip`, `percentile`, and `histogram` for quick data cleaning and analysis.

These tricks are small individually, but together they make NumPy feel like a practical data toolkit instead of just an array library.

## Sources and Further Reading

- NumPy documentation: https://numpy.org/doc/
- NumPy sorting reference: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
- NumPy concatenate reference: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
- NumPy unique reference: https://numpy.org/doc/stable/reference/generated/numpy.unique.html
- NumPy where reference: https://numpy.org/doc/stable/reference/generated/numpy.where.html
- NumPy histogram reference: https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
- NumPy set routines: https://numpy.org/doc/stable/reference/routines.set.html