Is the CelebA dataset incorrect?


Python

Andrei1penguin1, 2021-04-02 20:29:55

Good day. There is the CelebA face dataset; here is a link to it on Kaggle:
https://www.kaggle.com/jessicali9530/celeba-dataset
The archive also contains several files with additional information, in particular list_bbox_celeba.csv, whose lines have the format:
image_id,x_1,y_1,width,height
But these values do not match the bounding boxes on the images. For example, all images in the dataset are 178x218, yet the first data line of list_bbox_celeba.csv is:
000001.jpg,95,71,226,313
A box that is 313 pixels tall (let alone one that also starts at y = 71) cannot fit in a 178x218 image, and neither can one that is 226 pixels wide.
Here is the code I use to draw the rectangle:
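A quick arithmetic check makes the mismatch concrete: the box's right and bottom edges are x_1 + width and y_1 + height, and for a valid box neither may exceed the image size. A minimal sketch, where the function name `box_fits` is hypothetical and 178x218 is the image size stated above:

```python
def box_fits(x_1, y_1, width, height, img_w=178, img_h=218):
    """Return True if the box (x_1, y_1, width, height) lies inside an img_w x img_h image."""
    right = x_1 + width    # x coordinate of the right edge
    bottom = y_1 + height  # y coordinate of the bottom edge
    return x_1 >= 0 and y_1 >= 0 and right <= img_w and bottom <= img_h

# First data line of list_bbox_celeba.csv: 000001.jpg,95,71,226,313
print(box_fits(95, 71, 226, 313))  # False: right edge 321 > 178, bottom edge 384 > 218
```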

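import cv2

img = cv2.imread("img.jpg")  # should be 000001.jpg, whose box is read below
with open("list_bbox_celeba.csv") as file:
    # Line 0 is the header; line 1 is the first data row (000001.jpg)
    points = file.readlines()[1].strip().split(",")[1:]
    x, y, w, h = (int(point) for point in points)
# cv2.rectangle draws in place and returns the same array, so draw on a copy
# to keep the original for comparison; (255, 0, 0) is blue in OpenCV's BGR order
box = cv2.rectangle(img.copy(), (x, y), (x + w, y + h), (255, 0, 0), 3)
cv2.imshow("img", img)
cv2.imshow("img_box", box)
cv2.waitKey(0)
cv2.destroyAllWindows()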
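The snippet above only ever reads the first data row (`readlines()[1]`). A slightly more general sketch, assuming the CSV header is exactly `image_id,x_1,y_1,width,height` as quoted above, loads every row into a dict keyed by file name using only the standard `csv` module; `load_bboxes` is a hypothetical helper name:

```python
import csv
import io

def load_bboxes(f):
    """Map image_id -> (x_1, y_1, width, height) from a list_bbox_celeba.csv-style file object."""
    reader = csv.DictReader(f)
    return {
        row["image_id"]: tuple(int(row[k]) for k in ("x_1", "y_1", "width", "height"))
        for row in reader
    }

# Usage with an in-memory sample; with the real file use:
#   with open("list_bbox_celeba.csv", newline="") as f: boxes = load_bboxes(f)
sample = io.StringIO("image_id,x_1,y_1,width,height\n000001.jpg,95,71,226,313\n")
boxes = load_bboxes(sample)
print(boxes["000001.jpg"])  # (95, 71, 226, 313)
```

Looking boxes up by `image_id` avoids silently pairing an image with the wrong row when the image file is not 000001.jpg.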

Please tell me: is this a problem with the dataset, or am I visualizing it incorrectly?


1 answer


Vladimir Kuts, 2021-04-03
@fox_12

Well, yes, this is actually covered in the dataset's discussions on Kaggle:

"The bbox coordinates mentioned here correspond to the original images in CelebA. These are face crops generated by some other technique. You can either use the original images or just skip using bbox."

In other words, the Kaggle images are crops of the original images, while the bbox coordinates refer to the originals.
A quick search leads to this site:
https://www.programmersought.com/article/60434058932/
which in turn leads here:
mmlab.ie.cuhk.edu.hk/projects/CelebA.html
There you can download the original images.
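To check that the mismatch is systematic (boxes in the coordinates of larger, uncropped images) rather than a few corrupt rows, one could count how many boxes overflow the 178x218 frame. A minimal sketch using only the standard library; `count_overflowing` is a hypothetical name, and the second sample row is made up purely for illustration:

```python
import csv
import io

def count_overflowing(f, img_w=178, img_h=218):
    """Return (overflowing, total): rows whose box extends past an img_w x img_h image."""
    overflowing = total = 0
    for row in csv.DictReader(f):
        x, y = int(row["x_1"]), int(row["y_1"])
        w, h = int(row["width"]), int(row["height"])
        total += 1
        if x + w > img_w or y + h > img_h:
            overflowing += 1
    return overflowing, total

sample = io.StringIO(
    "image_id,x_1,y_1,width,height\n"
    "000001.jpg,95,71,226,313\n"
    "hypothetical.jpg,10,10,50,50\n"  # made-up row that fits
)
print(count_overflowing(sample))  # (1, 2)
```

Run against the real list_bbox_celeba.csv, a large overflow count would support the explanation that the boxes belong to the original images rather than to the cropped ones.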