# Mastering Advanced NumPy: Essential Sorting and Searching Techniques

URL: https://madhudadi.in/blog/posts/advanced-numpy-sorting-searching-essential-tricks
Published: 2026-05-27
Tags: Numpy, python
Read time: 28 min
Difficulty: intermediate

> Learn practical NumPy tricks for real data work: sorting arrays, adding rows and columns, finding unique values, filtering with conditions, ranking values, cumulative calculations, percentiles, histograms, correlation, set operations, clipping, and practice tasks.

# NumPy Tricks: Sorting, Filtering, Reshaping, Statistics, and Set Operations

Once you understand NumPy arrays, shapes, axes, indexing, and broadcasting, the next step is learning the small tools that make day-to-day array work faster.

These are not "magic tricks." They are practical patterns you will use when cleaning data, preparing model inputs, analyzing scores, ranking records, creating summary features, and transforming arrays before sending them to Pandas, scikit-learn, visualization tools, or machine learning models.

In this lesson, you will learn how to:

- sort 1D and 2D arrays
- add rows and columns safely
- combine arrays with `concatenate`
- find unique values and unique rows
- add dimensions with `expand_dims`
- filter and replace values using `where`
- find best and worst positions with `argmax` and `argmin`
- calculate running totals with `cumsum`
- calculate percentiles and medians
- build frequency tables with `histogram`
- measure correlation with `corrcoef`
- check membership with `isin`
- reverse arrays with `flip`
- update and delete values carefully
- use NumPy set operations
- cap extreme values with `clip`

The examples are written around small business, student, and analytics-style datasets so you can see how these functions appear in real work.

## 1. Setup

Import NumPy with the standard alias:

```python
import numpy as np
```

For examples that use random data, use a generator with a seed:

```python
rng = np.random.default_rng(42)
```

This keeps your output reproducible while learning.

## 2. Sorting 1D Arrays With `np.sort`

`np.sort()` returns a sorted copy of an array.

```python
scores = np.array([72, 95, 61, 88, 75])

sorted_scores = np.sort(scores)
print(sorted_scores)
```

Output:

```text
[61 72 75 88 95]
```

The original array is not changed:

```python
print(scores)
```

Output:

```text
[72 95 61 88 75]
```

To sort in descending order, reverse the sorted result:

```python
descending_scores = np.sort(scores)[::-1]
print(descending_scores)
```

Output:

```text
[95 88 75 72 61]
```

## 3. Sorting 2D Arrays

For a 2D array, `axis` controls the direction of sorting.

```python
sales = np.array([
    [45, 80, 62],
    [90, 55, 73],
    [38, 96, 68],
])
```

Sort values inside each row:

```python
print(np.sort(sales, axis=1))
```

Output:

```text
[[45 62 80]
 [55 73 90]
 [38 68 96]]
```

Sort values inside each column:

```python
print(np.sort(sales, axis=0))
```

Output:

```text
[[38 55 62]
 [45 80 68]
 [90 96 73]]
```

Use this when you want to sort values within rows or columns independently.

## 4. Sorting Rows By One Column

Sometimes you do not want to sort values inside each row. You want to reorder the rows based on one column.
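To make the difference concrete, here is a minimal sketch. The `records` array and its `[id, value]` layout are made up for illustration and are not part of the lesson's datasets. Sorting with `np.sort(..., axis=0)` sorts each column on its own, so ids can end up next to values from other rows, whereas indexing with the result of `np.argsort` moves whole rows.

```python
import numpy as np

# Hypothetical records: each row is [id, value]
records = np.array([
    [1, 30],
    [2, 10],
    [3, 20],
])

# Sorting each column independently breaks the id/value pairing
scrambled = np.sort(records, axis=0)

# Reordering whole rows by the value column keeps each record intact
reordered = records[np.argsort(records[:, 1])]

print(scrambled)
print(reordered)
```

In `scrambled`, id 1 sits beside value 10, a pairing that never existed in the data. In `reordered`, every id keeps its own value.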
Suppose each row is:

```text
[store_id, weekday_sales, weekend_sales]
```

```python
store_sales = np.array([
    [101, 450, 620],
    [102, 390, 710],
    [103, 520, 560],
    [104, 480, 800],
])
```

Sort rows by weekend sales:

```python
order = np.argsort(store_sales[:, 2])
sorted_by_weekend = store_sales[order]
print(sorted_by_weekend)
```

Output:

```text
[[103 520 560]
 [101 450 620]
 [102 390 710]
 [104 480 800]]
```

For descending order:

```python
best_weekend_first = store_sales[np.argsort(store_sales[:, 2])[::-1]]
print(best_weekend_first)
```

Output:

```text
[[104 480 800]
 [102 390 710]
 [101 450 620]
 [103 520 560]]
```

This pattern is very useful for ranking tables.

## 5. Sorting Rows By A Calculated Value

You can sort rows by a value that does not exist yet.

Example: sort stores by total sales.

```python
totals = store_sales[:, 1] + store_sales[:, 2]
order = np.argsort(totals)[::-1]
ranked_stores = store_sales[order]

print(ranked_stores)
print(totals[order])
```

Output:

```text
[[104 480 800]
 [102 390 710]
 [103 520 560]
 [101 450 620]]
[1280 1100 1080 1070]
```

The important idea:

```python
array[np.argsort(values)]
```

Use it when you want to rearrange records based on a score, total, date, error value, or prediction confidence.

## 6. Adding A Column With `np.concatenate`

Assume you have marks for 4 students in 3 subjects:

```python
marks = np.array([
    [78, 85, 91],
    [62, 70, 68],
    [90, 88, 95],
    [55, 60, 64],
])
```

Now a new subject score arrives:

```python
project_marks = np.array([89, 74, 97, 66])
```

This is a 1D array. To add it as a column, convert it to shape `(4, 1)`:

```python
project_column = project_marks.reshape(-1, 1)
updated_marks = np.concatenate((marks, project_column), axis=1)
print(updated_marks)
```

Output:

```text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]]
```

Why reshape?
```python
print(marks.shape)
print(project_marks.shape)
print(project_column.shape)
```

Output:

```text
(4, 3)
(4,)
(4, 1)
```

For column-wise joining, both arrays must agree on the number of rows.

## 7. Adding Rows With `np.concatenate`

Now add two new students:

```python
new_students = np.array([
    [81, 77, 84, 90],
    [69, 73, 71, 75],
])

all_marks = np.concatenate((updated_marks, new_students), axis=0)
print(all_marks)
```

Output:

```text
[[78 85 91 89]
 [62 70 68 74]
 [90 88 95 97]
 [55 60 64 66]
 [81 77 84 90]
 [69 73 71 75]]
```

For row-wise joining, both arrays must agree on the number of columns.

## 8. Adding A Derived Column

A common data-preparation task is adding totals, averages, or flags.

Add a total marks column:

```python
total_marks = all_marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((all_marks, total_marks), axis=1)
print(marks_with_total)
```

Output:

```text
[[ 78  85  91  89 343]
 [ 62  70  68  74 274]
 [ 90  88  95  97 370]
 [ 55  60  64  66 245]
 [ 81  77  84  90 332]
 [ 69  73  71  75 288]]
```

`keepdims=True` keeps the result as a 2D column, which makes concatenation easier.

## 9. `np.append`: Useful, But Be Careful

`np.append()` can add values, but it often hides shape mistakes.

```python
arr = np.array([[1, 2], [3, 4]])

print(np.append(arr, [[5, 6]], axis=0))
```

Output:

```text
[[1 2]
 [3 4]
 [5 6]]
```

Without `axis`, `np.append()` flattens the data:

```python
print(np.append(arr, [[5, 6]]))
```

Output:

```text
[1 2 3 4 5 6]
```

For serious data work, prefer `np.concatenate()`, `np.vstack()`, or `np.hstack()` because they make shape expectations clearer.

## 10. Finding Unique Values

`np.unique()` returns sorted unique values.
```python
categories = np.array(["basic", "pro", "basic", "enterprise", "pro"])
print(np.unique(categories))
```

Output:

```text
['basic' 'enterprise' 'pro']
```

You can also count how often each value appears:

```python
labels, counts = np.unique(categories, return_counts=True)
print(labels)
print(counts)
```

Output:

```text
['basic' 'enterprise' 'pro']
[2 1 2]
```

This is useful for quick frequency tables.

## 11. Unique Rows And Columns

For 2D arrays, use `axis`.

```python
events = np.array([
    [1, 10, 100],
    [2, 20, 200],
    [1, 10, 100],
    [3, 30, 300],
])
```

Unique rows:

```python
print(np.unique(events, axis=0))
```

Output:

```text
[[  1  10 100]
 [  2  20 200]
 [  3  30 300]]
```

Unique columns:

```python
matrix = np.array([
    [1, 2, 1, 4],
    [5, 6, 5, 8],
])

print(np.unique(matrix, axis=1))
```

Output:

```text
[[1 2 4]
 [5 6 8]]
```

Use this when duplicate records or duplicate feature columns need to be detected.

## 12. Adding Dimensions With `np.expand_dims`

Machine learning libraries often expect data in a specific number of dimensions.

Suppose one user's activity data is 1D:

```python
activity = np.array([8, 10, 7, 12])
print(activity.shape)
```

Output:

```text
(4,)
```

Make it one row:

```python
row = np.expand_dims(activity, axis=0)
print(row)
print(row.shape)
```

Output:

```text
[[ 8 10  7 12]]
(1, 4)
```

Make it one column:

```python
column = np.expand_dims(activity, axis=1)
print(column)
print(column.shape)
```

Output:

```text
[[ 8]
 [10]
 [ 7]
 [12]]
(4, 1)
```

The same result can often be written with `reshape()`:

```python
print(activity.reshape(1, -1).shape)
print(activity.reshape(-1, 1).shape)
```

## 13. Filtering With `np.where`

`np.where()` can return positions or choose values conditionally.
Create an array:

```python
temperatures = np.array([28, 35, 41, 32, 39, 45])
```

Find positions where temperature is above 38:

```python
hot_positions = np.where(temperatures > 38)
print(hot_positions)
```

Output:

```text
(array([2, 4, 5]),)
```

Use those positions to get values:

```python
print(temperatures[hot_positions])
```

Output:

```text
[41 39 45]
```

## 14. Replacing Values With `np.where`

The three-argument form is:

```python
np.where(condition, value_if_true, value_if_false)
```

Example: cap temperatures above 40 at 40.

```python
cleaned = np.where(temperatures > 40, 40, temperatures)
print(cleaned)
```

Output:

```text
[28 35 40 32 39 40]
```

Example: create pass/fail labels:

```python
exam_scores = np.array([82, 45, 67, 39, 90])
status = np.where(exam_scores >= 50, "pass", "retry")
print(status)
```

Output:

```text
['pass' 'retry' 'pass' 'retry' 'pass']
```

## 15. Finding Best And Worst Positions

`np.argmax()` returns the index of the largest value, and `np.argmin()` returns the index of the smallest.

```python
daily_orders = np.array([120, 98, 145, 160, 132])

best_day = np.argmax(daily_orders)
worst_day = np.argmin(daily_orders)

print(best_day)
print(worst_day)
```

Output:

```text
3
1
```

Index `3` has the highest order count. Index `1` has the lowest.

For 2D arrays:

```python
weekly_orders = np.array([
    [120, 98, 145],
    [80, 110, 105],
    [150, 130, 170],
])
```

Best store per day:

```python
print(np.argmax(weekly_orders, axis=0))
```

Output:

```text
[2 2 2]
```

Best day per store:

```python
print(np.argmax(weekly_orders, axis=1))
```

Output:

```text
[2 1 2]
```

## 16. Cumulative Sum And Product

`np.cumsum()` calculates running totals.
```python
revenue = np.array([1000, 1500, 1200, 1800])
print(np.cumsum(revenue))
```

Output:

```text
[1000 2500 3700 5500]
```

For 2D arrays:

```python
monthly_sales = np.array([
    [10, 12, 15],
    [8, 9, 11],
])
```

Cumulative sales across months for each product:

```python
print(np.cumsum(monthly_sales, axis=1))
```

Output:

```text
[[10 22 37]
 [ 8 17 28]]
```

`np.cumprod()` works similarly for running multiplication:

```python
growth = np.array([1.05, 1.10, 0.95])
print(np.cumprod(growth))
```

Output:

```text
[1.05    1.155   1.09725]
```

## 17. Percentiles And Median

The p-th percentile is the value below which roughly p percent of the data falls.

```python
response_times = np.array([120, 180, 240, 300, 360, 420, 900])
```

Calculate the 50th, 75th, and 90th percentiles:

```python
print(np.percentile(response_times, 50))
print(np.percentile(response_times, 75))
print(np.percentile(response_times, 90))
```

Output:

```text
300.0
390.0
612.0
```

The 50th percentile is the median:

```python
print(np.median(response_times))
```

Output:

```text
300.0
```

Percentiles are useful when averages are misleading because of outliers.

## 18. Percentiles Along Axis

Suppose rows are products and columns are monthly sales:

```python
sales_table = np.array([
    [100, 120, 140, 160],
    [80, 85, 90, 300],
    [200, 210, 220, 230],
])
```

Median per product:

```python
print(np.percentile(sales_table, 50, axis=1))
```

Output:

```text
[130.   87.5 215. ]
```

Median per month:

```python
print(np.percentile(sales_table, 50, axis=0))
```

Output:

```text
[100. 120. 140. 230.]
```

Again, shape and axis decide the meaning.

## 19. Histograms With `np.histogram`

`np.histogram()` counts how many values fall into ranges.
```python
ages = np.array([18, 21, 22, 25, 27, 33, 35, 41, 45, 52, 60])

counts, bin_edges = np.histogram(ages, bins=[18, 30, 45, 65])

print(counts)
print(bin_edges)
```

Output:

```text
[5 3 3]
[18 30 45 65]
```

This means:

- 5 values from 18 up to (but not including) 30
- 3 values from 30 up to (but not including) 45
- 3 values from 45 up to and including 65

Every bin is half-open except the last one, which also includes its right edge.

Use histograms when you want a distribution summary without plotting yet.

## 20. Correlation With `np.corrcoef`

Correlation measures how two variables move together.

```python
ad_spend = np.array([10, 20, 30, 40, 50])
sales = np.array([24, 38, 52, 68, 79])

correlation = np.corrcoef(ad_spend, sales)
print(correlation)
```

Output:

```text
[[1.         0.99877775]
 [0.99877775 1.        ]]
```

The off-diagonal value is the correlation between ad spend and sales.

```python
print(round(correlation[0, 1], 4))
```

Output:

```text
0.9988
```

A value close to `1` means strong positive correlation.

Important reminder: correlation does not prove causation.

## 21. Membership Checks With `np.isin`

`np.isin()` checks whether values are present in another collection.

```python
user_ids = np.array([101, 102, 103, 104, 105, 106])
premium_ids = np.array([102, 105, 108])

mask = np.isin(user_ids, premium_ids)
print(mask)
print(user_ids[mask])
```

Output:

```text
[False  True False False  True False]
[102 105]
```

This is useful for filtering by allowed IDs, selected categories, blocked values, or known labels.

## 22. Reversing Arrays With `np.flip`

For 1D arrays:

```python
steps = np.array([1, 2, 3, 4, 5])
print(np.flip(steps))
```

Output:

```text
[5 4 3 2 1]
```

For 2D arrays:

```python
grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
```

Flip rows:

```python
print(np.flip(grid, axis=0))
```

Output:

```text
[[7 8 9]
 [4 5 6]
 [1 2 3]]
```

Flip columns:

```python
print(np.flip(grid, axis=1))
```

Output:

```text
[[3 2 1]
 [6 5 4]
 [9 8 7]]
```

Flip both axes:

```python
print(np.flip(grid))
```

Output:

```text
[[9 8 7]
 [6 5 4]
 [3 2 1]]
```

## 23. Updating Values With `np.put`

`np.put()` updates positions in the flattened version of the array.

```python
board = np.arange(1, 10).reshape(3, 3)
print(board)
```

Output:

```text
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```

Update flattened positions `0` and `8`:

```python
np.put(board, [0, 8], [100, 900])
print(board)
```

Output:

```text
[[100   2   3]
 [  4   5   6]
 [  7   8 900]]
```

Because `np.put()` mutates the original array, use it carefully. In many cases, direct indexing is clearer:

```python
board[0, 0] = 100
board[2, 2] = 900
```

## 24. Deleting Values With `np.delete`

`np.delete()` returns a new array with selected positions removed.

```python
numbers = np.array([10, 20, 30, 40, 50])

without_first = np.delete(numbers, 0)
print(without_first)
```

Output:

```text
[20 30 40 50]
```

Delete multiple positions:

```python
print(np.delete(numbers, [1, 3]))
```

Output:

```text
[10 30 50]
```

For 2D arrays:

```python
table = np.arange(1, 13).reshape(3, 4)
print(table)
```

Output:

```text
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
```

Delete a row:

```python
print(np.delete(table, 1, axis=0))
```

Output:

```text
[[ 1  2  3  4]
 [ 9 10 11 12]]
```

Delete a column:

```python
print(np.delete(table, 2, axis=1))
```

Output:

```text
[[ 1  2  4]
 [ 5  6  8]
 [ 9 10 12]]
```

## 25. Set Operations

NumPy has useful set-style functions for 1D arrays.

```python
course_a = np.array([101, 102, 103, 104])
course_b = np.array([103, 104, 105, 106])
```

Union:

```python
print(np.union1d(course_a, course_b))
```

Output:

```text
[101 102 103 104 105 106]
```

Intersection:

```python
print(np.intersect1d(course_a, course_b))
```

Output:

```text
[103 104]
```

Values in `course_a` but not in `course_b`:

```python
print(np.setdiff1d(course_a, course_b))
```

Output:

```text
[101 102]
```

Values that appear in one array but not both:

```python
print(np.setxor1d(course_a, course_b))
```

Output:

```text
[101 102 105 106]
```

These functions are helpful when comparing IDs, labels, selected items, feature lists, or category groups.
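One behaviour that is easy to miss: the set functions return sorted, de-duplicated results, while a mask from `np.isin()` keeps the original order and any duplicates. The sketch below uses made-up `cart` and `wishlist` arrays (not from the lesson's data) to show the difference.

```python
import numpy as np

# Hypothetical product IDs, deliberately unsorted and with duplicates
cart = np.array([205, 203, 203, 201])
wishlist = np.array([203, 207, 201, 201])

# Set functions: sorted, unique results
union = np.union1d(cart, wishlist)
common = np.intersect1d(cart, wishlist)

# Boolean mask: original order and duplicates are preserved
cart_items_on_wishlist = cart[np.isin(cart, wishlist)]

print(union)
print(common)
print(cart_items_on_wishlist)
```

Reach for `intersect1d` when you only need to know which IDs overlap, and for an `isin` mask when the result must stay aligned with the original records.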
## 26. Clipping Values With `np.clip`

`np.clip()` limits values to a minimum and maximum range.

```python
ratings = np.array([2, 5, 8, 11, -3, 7])
safe_ratings = np.clip(ratings, a_min=0, a_max=10)
print(safe_ratings)
```

Output:

```text
[ 2  5  8 10  0  7]
```

This is useful for:

- limiting outliers
- keeping probabilities between 0 and 1
- capping image pixel values
- protecting dashboards from extreme values
- preparing model features

Example with percentages:

```python
predicted_discount = np.array([-5, 10, 25, 60, 120])
final_discount = np.clip(predicted_discount, 0, 50)
print(final_discount)
```

Output:

```text
[ 0 10 25 50 50]
```

## 27. Mini Project: Rank Students After Adding New Marks

You have marks for 5 students across 4 subjects:

```python
marks = np.array([
    [72, 81, 77, 69],
    [88, 90, 84, 91],
    [55, 61, 58, 64],
    [79, 74, 82, 80],
    [93, 89, 95, 90],
])
```

A new practical exam score arrives:

```python
practical = np.array([85, 92, 67, 78, 96])
```

Add the practical score as a new column:

```python
marks = np.concatenate((marks, practical.reshape(-1, 1)), axis=1)
```

Add total marks as another column:

```python
total = marks.sum(axis=1, keepdims=True)
marks_with_total = np.concatenate((marks, total), axis=1)
```

Sort students by total marks in descending order:

```python
ranked = marks_with_total[np.argsort(marks_with_total[:, -1])[::-1]]
print(ranked)
```

Get the top 2:

```python
print(ranked[:2])
```

This combines:

- reshaping
- concatenation
- row-wise sum
- sorting rows by a derived column
- slicing top results

## 28. Mini Project: Clean Sensor Readings

You receive sensor readings where values below 0 and above 100 are invalid.
```python
readings = np.array([12, 45, -8, 60, 105, 88, 101, 0, 74])
```

Clip invalid values:

```python
cleaned = np.clip(readings, 0, 100)
print(cleaned)
```

Output:

```text
[ 12  45   0  60 100  88 100   0  74]
```

Find readings that were changed:

```python
changed_positions = np.where(readings != cleaned)[0]
print(changed_positions)
```

Output:

```text
[2 4 6]
```

Create labels:

```python
labels = np.where(readings != cleaned, "corrected", "ok")
print(labels)
```

Output:

```text
['ok' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'corrected' 'ok' 'ok']
```

This is a realistic pattern for data cleaning.

## 29. Practice Exercises

Try these before checking the solutions.

### Exercise 1: Sort by total

Create a 4 by 3 array of product sales. Add a total column and sort rows by total sales in descending order.

### Exercise 2: Add a status column

Given exam scores for students, add a column that contains `1` if the student's average is at least 60, otherwise `0`.

### Exercise 3: Unique customer visits

Given an array of customer IDs, print unique customers and how many times each customer appears.

### Exercise 4: Top product per day

Given a 2D array where rows are products and columns are days, find the product index with maximum sales for every day.

### Exercise 5: Clip and count outliers

Given an array of values, clip every value to the range 10 to 90. Count how many values were changed.

### Exercise 6: Membership filter

Given all user IDs and a list of blocked IDs, return only users who are not blocked.

### Exercise 7: Flip an image-like matrix

Create a 4 by 4 array and flip it vertically, horizontally, and both ways.

### Exercise 8: Histogram buckets

Create an array of ages and count how many people fall into age groups `[0, 18, 30, 45, 60, 100]`.

### Exercise 9: Remove min and max

Create a 1D array and remove every occurrence of its minimum and maximum values.
### Exercise 10: Compare two batches

Given two arrays of product IDs, find products only in batch A, only in batch B, and products present in both.

## 30. Practice Solutions

### Solution 1: Sort by total

```python
sales = np.array([
    [40, 55, 60],
    [90, 70, 85],
    [30, 45, 35],
    [75, 80, 72],
])

totals = sales.sum(axis=1, keepdims=True)
with_total = np.concatenate((sales, totals), axis=1)
ranked = with_total[np.argsort(with_total[:, -1])[::-1]]

print(ranked)
```

**Explanation**

- A NumPy array named `sales` is created, with one row of sales figures per product.
- The `sum` method computes the total sales for each row (product); `keepdims=True` keeps the result as a 2D column of shape `(4, 1)`.
- The totals are concatenated to the original `sales` array, creating a new array `with_total` that includes the totals as an additional column.
- The rows of `with_total` are sorted in descending order of total sales by indexing with the reversed result of `np.argsort`.
- Finally, the ranked array is printed, showing the sales data ordered by total sales.

### Solution 2: Add a status column

```python
scores = np.array([
    [70, 65, 80],
    [45, 50, 55],
    [90, 88, 92],
])

average = scores.mean(axis=1, keepdims=True)
status = np.where(average >= 60, 1, 0)
result = np.concatenate((scores, status), axis=1)

print(result)
```

**Explanation**

- Initializes a NumPy array `scores` containing test scores for three students across three subjects.
- Computes the average score for each student (one value per row, `axis=1`); `keepdims=True` keeps the result as a 2D column.
- Uses `np.where` to create a binary status array, marking students as `1` (pass) if their average score is 60 or above, and `0` (fail) otherwise.
- Concatenates the original scores with the status array to form a new array that includes both scores and pass/fail status.
- Outputs the final combined array, showing each student's scores alongside their pass/fail status.
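One detail worth checking in Solution 2 is the dtype of the combined array. The sketch below reuses the same `scores` data. The `with_average` array is a hypothetical variation, added here only to show that concatenating a float column upcasts the whole result, while the integer `status` column leaves it as integers.

```python
import numpy as np

scores = np.array([
    [70, 65, 80],
    [45, 50, 55],
    [90, 88, 92],
])

average = scores.mean(axis=1, keepdims=True)   # float column, shape (3, 1)
status = np.where(average >= 60, 1, 0)         # integer column, shape (3, 1)

with_status = np.concatenate((scores, status), axis=1)
with_average = np.concatenate((scores, average), axis=1)  # hypothetical variation

print(with_status.dtype)    # an integer dtype (exact width depends on platform)
print(with_average.dtype)   # float64
```

If the scores must stay integers, add only integer-valued columns such as `status`, or keep float features in a separate array.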
### Solution 3: Unique customer visits

```python
customers = np.array([101, 102, 101, 103, 102, 101, 104])

ids, counts = np.unique(customers, return_counts=True)

print(ids)
print(counts)
```

**Explanation**

- The code initializes a NumPy array called `customers` containing customer IDs, some of which are repeated.
- The `np.unique()` function is used to find unique customer IDs and count their occurrences, returning two arrays: `ids` for unique IDs and `counts` for their respective counts.
- The unique IDs are printed to the console, showing which customers are present.
- The counts of each unique ID are also printed, indicating how many times each customer ID appears in the original array.

### Solution 4: Top product per day

```python
sales = np.array([
    [20, 35, 30],
    [25, 30, 45],
    [40, 20, 25],
])

top_product_by_day = np.argmax(sales, axis=0)
print(top_product_by_day)
```

**Explanation**

- The code initializes a 2D NumPy array named `sales`, representing sales figures for three products over three days.
- The `np.argmax` function is used to find the index of the highest sales value for each day, specified by `axis=0`, which indicates that the operation is performed column-wise.
- The result, stored in `top_product_by_day`, contains the indices of the top-selling products for each day.
- Finally, the indices of the top products are printed to the console.

### Solution 5: Clip and count outliers

```python
values = np.array([5, 18, 44, 92, 100, 63, 7])

clipped = np.clip(values, 10, 90)
changed_count = np.sum(values != clipped)

print(clipped)
print(changed_count)
```

**Explanation**

- The code initializes a NumPy array named `values` with a set of integers.
- It uses the `np.clip()` function to limit the values in the array to a specified range, in this case between 10 and 90.
- The result of the clipping is stored in the `clipped` variable.
- The code calculates the number of elements that were changed during the clipping process by comparing the original and clipped arrays, using `np.sum()` to count the differences.
- Finally, it prints the clipped array and the count of changed values to the console.

### Solution 6: Membership filter

```python
users = np.array([10, 11, 12, 13, 14, 15])
blocked = np.array([11, 15])

allowed_users = users[~np.isin(users, blocked)]
print(allowed_users)
```

**Explanation**

- The code initializes two NumPy arrays: `users` containing a range of user IDs and `blocked` containing IDs that are not allowed.
- It uses `np.isin()` to create a boolean array that identifies which users are in the `blocked` list.
- The tilde operator `~` negates this boolean array, effectively marking users that are not blocked.
- The filtered array `allowed_users` is created by indexing the `users` array with the negated boolean array.
- Finally, it prints the `allowed_users` array, which contains only the IDs of users that are not blocked.

### Solution 7: Flip an image-like matrix

```python
image = np.arange(1, 17).reshape(4, 4)

print(np.flip(image, axis=0))
print(np.flip(image, axis=1))
print(np.flip(image))
```

**Explanation**

- The code creates a 4x4 NumPy array filled with integers from 1 to 16 using `np.arange` and `reshape`.
- `np.flip(image, axis=0)` flips the array vertically (upside down).
- `np.flip(image, axis=1)` flips the array horizontally (left to right).
- `np.flip(image)` flips the array both vertically and horizontally, resulting in a 180-degree rotation.
- The `print` statements display the results of each flip operation.

### Solution 8: Histogram buckets

```python
ages = np.array([12, 17, 18, 24, 29, 30, 37, 44, 45, 61, 72])

counts, edges = np.histogram(ages, bins=[0, 18, 30, 45, 60, 100])

print(counts)
print(edges)
```

**Explanation**

- The code initializes a NumPy array `ages` containing various age values.
- It uses `np.histogram` to compute the frequency of ages within specified bins: [0, 18), [18, 30), [30, 45), [45, 60), and [60, 100]. Every bin is half-open except the last, which includes its right edge.
- The function returns two arrays: `counts`, which holds the number of ages in each bin, and `edges`, which defines the boundaries of the bins.
- Finally, it prints the counts of ages in each bin and the edges of the bins to the console.

### Solution 9: Remove min and max

```python
arr = np.array([4, 9, 1, 3, 9, 2, 1, 7])

minimum = arr.min()
maximum = arr.max()
filtered = arr[(arr != minimum) & (arr != maximum)]

print(filtered)
```

**Explanation**

- The code initializes a NumPy array `arr` with a set of integer values.
- It calculates the minimum and maximum values in the array using the `min()` and `max()` methods.
- A filtered array `filtered` is created by excluding every occurrence of the minimum and maximum values using boolean indexing.
- Finally, the filtered array is printed, displaying only the values that are neither the minimum nor the maximum.

### Solution 10: Compare two batches

```python
batch_a = np.array([101, 102, 103, 104])
batch_b = np.array([103, 104, 105, 106])

only_a = np.setdiff1d(batch_a, batch_b)
only_b = np.setdiff1d(batch_b, batch_a)
both = np.intersect1d(batch_a, batch_b)

print("Only A:", only_a)
print("Only B:", only_b)
print("Both:", both)
```

**Explanation**

- `batch_a` and `batch_b` are defined as NumPy arrays containing integer values.
- `np.setdiff1d` is used to find elements that are in `batch_a` but not in `batch_b`, stored in `only_a`.
- Similarly, `only_b` contains elements that are in `batch_b` but not in `batch_a`.
- `np.intersect1d` identifies elements that are present in both arrays, stored in `both`.
- The results are printed, showing unique elements from each array and their intersection.

## 31. Common Mistakes

### Mistake 1: Forgetting that `np.append()` flattens by default

Always pass `axis` if you want to preserve 2D structure.
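A minimal sketch of this mistake, using a made-up 1D `new_row`. Without `axis`, `np.append()` silently returns a flat array. `np.vstack()` promotes the 1D row to 2D before stacking, and `np.concatenate()` refuses to mix a 2D and a 1D array, which surfaces the shape problem immediately.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
new_row = np.array([5, 6])  # hypothetical 1D row to add

# No axis: both inputs are flattened, so the 2D structure is lost
flattened = np.append(arr, new_row)

# vstack treats the 1D array as a single row and keeps the table 2D
stacked = np.vstack((arr, new_row))

# concatenate raises instead of guessing when dimensions differ
try:
    np.concatenate((arr, new_row), axis=0)
    raised = False
except ValueError:
    raised = True

print(flattened.shape)
print(stacked.shape)
print(raised)
```

An explicit error is usually easier to debug than a silently flattened array that fails several steps later.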
### Mistake 2: Sorting values when you meant to sort rows

Use `np.sort()` to sort values inside arrays.

Use `np.argsort()` to reorder rows based on a column or score.

### Mistake 3: Concatenating arrays with mismatched dimensions

Print shapes before combining arrays:

```python
print(a.shape)
print(b.shape)
```

**Explanation**

- `a` and `b` stand for the two arrays you are about to combine.
- The `.shape` attribute returns a tuple with the size of each dimension, so printing it shows the structure of each array at a glance.
- Comparing the two shapes tells you whether the arrays are compatible for operations like addition, multiplication, or concatenation.
- This is a quick debugging step in numerical computations and machine learning tasks.

### Mistake 4: Confusing axis meanings

For reductions such as `sum` or `argmax` on 2D arrays:

- `axis=0` works down rows and returns one result per column
- `axis=1` works across columns and returns one result per row

### Mistake 5: Mutating arrays accidentally

Functions like `np.put()` modify the original array. Functions like `np.delete()` and `np.sort()` return new arrays and leave the original unchanged. Note that the method form `arr.sort()` sorts in place.

## Final Takeaway

NumPy becomes powerful when you stop thinking only in terms of individual values and start thinking in terms of whole-array transformations.

The most useful habits are:

1. Print `shape` before combining arrays.
2. Use `axis` intentionally.
3. Prefer `concatenate`, `vstack`, or `hstack` when structure matters.
4. Use boolean masks and `np.where()` instead of manual loops.
5. Use `argsort`, `argmax`, and `argmin` when you need positions.
6. Use `clip`, `percentile`, and `histogram` for quick data cleaning and analysis.

These tricks are small individually, but together they make NumPy feel like a practical data toolkit instead of just an array library.
## Sources and Further Reading

- NumPy documentation: https://numpy.org/doc/
- NumPy sorting reference: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
- NumPy concatenate reference: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
- NumPy unique reference: https://numpy.org/doc/stable/reference/generated/numpy.unique.html
- NumPy where reference: https://numpy.org/doc/stable/reference/generated/numpy.where.html
- NumPy histogram reference: https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
- NumPy set routines: https://numpy.org/doc/stable/reference/routines.set.html