Computers use tiny dots called pixels to display images. Each pixel is stored as one or more numbers that represent color intensities.
Example. In an 8-bit grayscale image, each pixel is a single number. The number represents light intensity ranging from black (0) to white (255).
Example. In a 24-bit RGB color image, each pixel is an array of 3 numbers. These numbers range from 0 to 255 and represent red, green, and blue intensity, respectively. For instance, (0, 0, 255) is bright blue and (255, 128, 0) is orange.
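As a quick illustration (using a hypothetical 1-by-2 image, not part of the assignment), pixel values can be inspected directly as NumPy arrays:

```python
import numpy as np

# A hypothetical 1x2 RGB image: one bright blue pixel and one orange pixel.
tiny = np.array([[[0, 0, 255], [255, 128, 0]]], dtype=np.uint8)

print(tiny.shape)     # (1, 2, 3): height, width, and the 3 color channels
print(tiny[0, 0])     # the blue pixel: [  0   0 255]
print(tiny[0, 1, 0])  # red intensity of the orange pixel: 255
```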
In this assignment, you'll use Python and NumPy to manipulate 24-bit RGB color images.
You can use Image.open() from the Python Imaging Library (PIL) to open an image:
from PIL import Image
# Cat image from https://unsplash.com/photos/FqkBXo2Nkq0
cat_img = Image.open("cat.png")
Images display inline in Jupyter notebooks:
cat_img
In a Python terminal, you can display the image in a new window with .show() instead.
NumPy can convert images to arrays:
import numpy as np
cat = np.array(cat_img)
To convert an array back to an image (for display) use the function below:
def as_image(x):
    """Convert an ndarray to an Image.

    Args:
        x (ndarray): The array of pixels.

    Returns:
        Image: The Image object.
    """
    return Image.fromarray(np.uint8(x))
Exercise 1.1. How many dimensions does the cat array have? What does each dimension represent?
print(cat.shape)
print(cat.dtype)
a, b, c = cat.shape
This shows that cat is a 300-by-451-pixel image with three channels (red, green, and blue), so the array has three dimensions.
Exercise 1.2. Use .copy() to copy the cat array to a new variable. Swap the green and blue color channels in the copy. Display the result.
colors = cat.copy()
for i in range(a):      # first dimension (rows)
    for j in range(b):  # second dimension (columns)
        # Swap the green (1) and blue (2) values of each pixel.
        colors[i][j][1], colors[i][j][2] = cat[i][j][2], cat[i][j][1]
as_image(colors)
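As a side note, the same swap can be done without explicit loops using NumPy fancy indexing. This is a sketch using a small random stand-in for the cat array:

```python
import numpy as np

# Random stand-in for the cat array (hypothetical data, shape H x W x 3).
cat = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Swap green (channel 1) and blue (channel 2) with a single indexed assignment.
colors = cat.copy()
colors[:, :, [1, 2]] = cat[:, :, [2, 1]]

# The green channel of the copy now holds the original blue values.
assert (colors[:, :, 1] == cat[:, :, 2]).all()
assert (colors[:, :, 0] == cat[:, :, 0]).all()
```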
Exercise 1.3. Why is .copy() necessary in exercise 1.2? What happens if you don't use .copy()?

.copy() is necessary because assigning the array to a new name without it only creates a second reference to the same data, not a new array. Without .copy(), modifying the new variable would overwrite the original cat array as well.
Exercise 1.4. Flip the blue color channel from left to right. Display the resulting image. Hint: see the NumPy documentation on array manipulation routines.
flip = cat.copy()
flip[:, :, 2] = np.fliplr(cat[:, :, 2])
as_image(flip)
Suppose $X$ is an $n \times p$ matrix (for instance, one color channel of the cat image). The singular value decomposition (SVD) factors $X$ as $X = UDV^T$, where:
$U$ is an $n \times n$ orthogonal matrix,
$D$ is an $n \times p$ diagonal matrix with non-negative diagonal entries $d_1 \ge d_2 \ge \ldots$, and
$V$ is a $p \times p$ orthogonal matrix.
Note that a matrix $A$ is orthogonal when $A^T A = I$ and $AA^T = I$.
Example. We can use NumPy to compute the SVD for a matrix:
x = np.array(
[[0, 2, 3],
[3, 2, 1]]
)
u, d, vt = np.linalg.svd(x)
# NumPy returns only the diagonal of D, so d is a length-2 vector here.
print("u is:\n", u, "\nd is:\n", d, "\nv^T is:\n", vt)
If we let $u_i$ denote the $i$-th column of $U$, $v_i$ the $i$-th column of $V$, and $d_i$ the $i$-th diagonal entry of $D$, then we can write the SVD as $\ X = UDV^T = d_1 u_1 v_1^T + \ldots + d_m u_m v_m^T\ $, where $m = \min(n, p)$, using the rules of matrix multiplication. In other words, the SVD decomposes $X$ into a sum!
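This sum can be checked numerically on the small example matrix above (using full_matrices=False so the factor shapes line up):

```python
import numpy as np

x = np.array([[0, 2, 3],
              [3, 2, 1]])
u, d, vt = np.linalg.svd(x, full_matrices=False)

# Sum the rank-1 terms d_i * u_i * v_i^T, one per singular value.
approx = sum(d[i] * np.outer(u[:, i], vt[i]) for i in range(len(d)))

# The sum of rank-1 terms recovers X exactly.
assert np.allclose(approx, x)
```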
If we eliminate some of the terms in the sum, we get a simple approximation for $X$. For instance, we could eliminate all but the first 3 terms to get the approximation $X \approx d_1 u_1 v_1^T + d_2 u_2 v_2^T + d_3 u_3 v_3^T$. This is the same as if we kept only the first 3 columns of $U$ and $V$ and the first 3 diagonal entries of $D$.
We always eliminate terms starting from the end rather than the beginning, because the singular values are sorted in decreasing order, so these terms contribute the least to $X$.
Why would we want to approximate a matrix $X$?
In statistics, principal components analysis uses this approximation to reduce the dimension (number of covariates) in a centered (mean 0) data set. The vectors $d_i u_i$ are called the principal components of $X$. The vectors $v_i^T$ are called the basis vectors. Note that both depend on $X$. The dimension is reduced by using the first $q$ principal components instead of the original $p$ covariates. In other words, the $n \times p$ data $X$ is replaced by the $n \times q$ data $UD_q = XV_q$, where $V_q$ contains the first $q$ columns of $V$.
In computing, this approximation is sometimes used to reduce the number of bits needed to store a matrix (or image). If $q$ terms are kept, then only $nq + pq$ values (for $XV_q$ and $V_q^T$) need to be stored instead of the uncompressed $np$ values.
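For example, with the 300-by-451 red channel reported above and $q = 10$ terms, the value counts work out to roughly 5.6% of the original:

```python
n, p, q = 300, 451, 10

full = n * p                # values in the uncompressed channel
compressed = n * q + p * q  # values in XV_q plus V_q^T

print(full, compressed, round(100 * compressed / full, 1))
# 135300 7510 5.6
```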
Exercise 2.1. Write the functions described below.
A function that takes a matrix $X$ and returns its principal component matrix $XV_q$ and basis matrix $V_q^T$. This function should also take the number of terms kept $q$ as an argument.
A function that takes a principal component matrix $XV_q$ and basis matrix $V_q^T$ and returns an approximation $\hat{X}$ for the original matrix.
As usual, make sure to document your functions. Test your functions on the red color channel of the cat image. What's the smallest number of terms for which the cat is still recognizable as a cat?
red_channel = cat[:, :, 0]
as_image(red_channel)
x = red_channel
print(x.shape)  # The red channel has only 2 dimensions, which is what we want.
def cat_svd(x, q):
    """Compute the first q principal components of x.

    Args:
        x (ndarray): The matrix to decompose.
        q (int): The number of terms to keep.

    Returns:
        list: The principal component matrix XV_q and the basis matrix V_q^T.
    """
    u, d, vt = np.linalg.svd(x)
    vt_q = vt[:q]
    p_comp = x.dot(vt_q.T)
    return [p_comp, vt_q]

p_comp, vt_q = cat_svd(x, 10)

def unclear_cat(p_mat, basis_mat):
    """Reconstruct an approximation of the original matrix.

    Args:
        p_mat (ndarray): The principal component matrix XV_q.
        basis_mat (ndarray): The basis matrix V_q^T.

    Returns:
        ndarray: The approximation X-hat of the original matrix.
    """
    x_hat = p_mat.dot(basis_mat)
    return x_hat

x_hat = unclear_cat(p_comp, vt_q)
as_image(x_hat)
In my opinion, the cat is still recognizable as a cat when q = 10. It should be noted that this is subjective: some people may recognize it with fewer terms, and others may need more.
Exercise 2.2. You can check the number of bytes used by a NumPy array with the .nbytes attribute. How many bytes does the red color channel of the cat image use? How many bytes does the compressed version use when 10 terms are kept? What percentage of the original size is this?
redbyte = red_channel.nbytes
comp_byte = p_comp.nbytes + vt_q.nbytes
print("The red color channel of the cat image uses", redbyte, "bytes.")
print("The compressed version with 10 terms kept uses", comp_byte, "bytes.")
print("The compressed image uses", 100 * float(comp_byte) / float(redbyte), "% of the bytes of the original.")
Sources and Notes
http://scikit-image.org/docs/dev/user_guide/numpy_images.html
http://knowpapa.com/opencv-rgb-split/
http://stackoverflow.com/questions/38538952/how-to-swap-blue-and-green-channel-in-an-image-using-opencv
Worked with Chad Pickering, Ricky Safran, Hannah Kosinovsky, Sierra Tevlin, Janice Luong, and other people in the class whom I can't keep track of.