Studying photo features: Matz et al. (2019)

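The second paper in my study of photo features is Matz et al. (2019). The paper's code is provided in MATLAB. For me personally, that makes it somewhat less useful than papers whose code is in Python. Still, most of its features overlap with those of Segalin et al. (2017), which I covered in previous posts. Apart from the content-related features, the color, composition, and texture features are nearly identical. Even the content-related features contain nothing particularly noteworthy. So I'll only go over them briefly.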

1. Object detectors (18 features)

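Naturally, the most basic aspect of content is which objects or subjects appear in the photo. To capture this, the authors used the object detector proposed in earlier work (Felzenszwalb et al., 2010). They applied it to nine objects considered to appear most frequently: person, airplane, bicycle, bottle, bus, car, cat, motorbike, and chair. For each of these nine objects, the number of detections and the average bounding box size were used as features (9 objects × 2 = 18 features).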
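To make the 9 × 2 structure concrete, here is a minimal sketch that turns a list of detections into the 18 features. It assumes a hypothetical detection format of `(label, x, y, w, h)` tuples. It also assumes box size means the raw pixel area `w * h`; the paper may well normalize by image size. It does not run the Felzenszwalb et al. detector itself, so any detector's output could be plugged in.

```python
from collections import defaultdict

# The nine object classes used as content features
CLASSES = ["person", "airplane", "bicycle", "bottle", "bus",
           "car", "cat", "motorbike", "chair"]

def object_features(detections):
    """Return a count and an average box size for each of the nine classes.

    detections: list of (label, x, y, w, h) tuples (hypothetical format).
    Box size is assumed to be the raw pixel area w * h. Classes with no
    detections get count 0 and average size 0.0. Other labels are ignored.
    """
    areas = defaultdict(list)
    for label, _x, _y, w, h in detections:
        if label in CLASSES:
            areas[label].append(w * h)

    feats = {}
    for c in CLASSES:
        a = areas[c]
        feats[f"{c}_count"] = len(a)
        feats[f"{c}_avg_box"] = sum(a) / len(a) if a else 0.0
    return feats  # 9 classes x 2 = 18 features


feats = object_features([("person", 0, 0, 10, 20), ("person", 5, 5, 30, 10),
                         ("cat", 1, 1, 4, 4), ("dog", 0, 0, 9, 9)])
print(feats["person_count"], feats["person_avg_box"])  # 2 250.0
```

Labels outside the nine classes (like `dog` here) are simply dropped, which mirrors the fixed feature set.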

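However, using this today is questionable. Many good tools for object detection and photo content analysis are now available. The Computer Vision API in Azure Cognitive Services, which I have been using in my research, is quite good. It tags each photo according to the objects that appear in it and gives a confidence score for each tag. Judging the content of a photo (or a set of photos) from those tags is the better approach.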
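As a rough illustration of judging content from tags, here is a sketch that summarizes tags over a set of photos. It assumes the tags have already been retrieved and stored as one list per photo, each tag a dict with `name` and `confidence` keys. This is an assumption about the stored format, and the code does not call the API. The `min_conf` threshold of 0.5 is an arbitrary, hypothetical choice. For each tag, the summary gives the share of photos carrying it and its mean confidence.

```python
from collections import defaultdict

def tag_profile(photos, min_conf=0.5):
    """Summarize tags over a set of photos.

    photos: list of per-photo tag lists; each tag is a dict with "name" and
    "confidence" keys (assumed storage format). Only tags with
    confidence >= min_conf are counted.
    Returns {tag: (share of photos with the tag, mean confidence)}.
    """
    hits = defaultdict(list)
    for tags in photos:
        seen = {}
        for t in tags:
            if t["confidence"] >= min_conf:
                # If a tag repeats within one photo, keep its highest confidence
                seen[t["name"]] = max(seen.get(t["name"], 0.0), t["confidence"])
        for name, conf in seen.items():
            hits[name].append(conf)

    n = len(photos)
    return {name: (len(c) / n, sum(c) / len(c)) for name, c in hits.items()}


photos = [
    [{"name": "outdoor", "confidence": 1.0}, {"name": "dog", "confidence": 0.4}],
    [{"name": "outdoor", "confidence": 0.5}, {"name": "person", "confidence": 0.95}],
]
print(tag_profile(photos))
# {'outdoor': (1.0, 0.75), 'person': (0.5, 0.95)}
```

The per-tag share works as a simple content profile for a user's photo set. Low-confidence tags such as `dog` at 0.4 are filtered out before counting.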

2. Faces (3 features), Upper bodies (1 feature), Number of people (1 feature)

People are (among) the most common subjects in social media photos, so it is natural to use features about them. This paper used features for faces and upper bodies. For faces, it used the face detector proposed in earlier work (Viola & Jones, 2001). The features were the number of faces, the area of the bounding boxes, and the pose angle. It also used the upper-body detector from earlier work (Ferrari et al., 2008), with the area of its bounding boxes as a feature. The number of people was annotated manually using Amazon Mechanical Turk.
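The post doesn't say exactly how the per-face values are aggregated into three numbers. Below is a minimal sketch under two assumptions of my own: the total face box area is expressed as a fraction of the image area, and the pose angle is the mean absolute yaw in degrees. Faces are assumed to arrive as hypothetical `(x, y, w, h, yaw_deg)` tuples from some detector plus pose estimator. (OpenCV's `CascadeClassifier` is a common Viola-Jones implementation, but it does not estimate pose.)

```python
def face_features(faces, image_w, image_h):
    """Compute three face features under assumed aggregation rules.

    faces: list of (x, y, w, h, yaw_deg) tuples (hypothetical format).
    Returns the face count, the total face box area as a fraction of the
    image area (assumption), and the mean absolute yaw in degrees
    (assumption; 0.0 when there are no faces).
    """
    n = len(faces)
    img_area = image_w * image_h
    area_ratio = sum(w * h for _x, _y, w, h, _yaw in faces) / img_area
    mean_pose = sum(abs(f[4]) for f in faces) / n if n else 0.0
    return {"n_faces": n, "face_area_ratio": area_ratio, "mean_abs_pose": mean_pose}


print(face_features([(0, 0, 10, 10, 20.0), (50, 50, 20, 10, -40.0)], 100, 100))
# {'n_faces': 2, 'face_area_ratio': 0.03, 'mean_abs_pose': 30.0}
```

The same area-ratio idea would carry over to the single upper-body feature by passing upper-body boxes instead of face boxes.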

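These features are also hard to justify using today. For faces, many better detectors and analysis tools are now available. I also have doubts about whether upper bodies are an appropriate feature at all. I'll just keep using what I've been using so far.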

3. Visual clutter (2 features)

These are not explained in detail. The table only describes them as "Feature congestion and measures to describe the busyness of an image (Rosenholtz, Li, Mansfield & Jin, 2005)".
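Since the paper gives no detail, here is only a much-simplified stand-in that captures the intuition behind feature congestion: more local feature variability means a busier image. It is not Rosenholtz et al.'s actual measure. That measure pools local covariance of color, contrast, and orientation features across scales. This sketch uses luminance only, at a single scale. It returns the mean per-block standard deviation over non-overlapping `win` × `win` blocks. The block size of 8 is an arbitrary, hypothetical choice.

```python
import numpy as np

def local_variance_clutter(gray, win=8):
    """Crude busyness score (simplified stand-in, not feature congestion).

    gray: 2-D array of luminance values, e.g. in [0, 1].
    Crops the image to a multiple of win, splits it into non-overlapping
    win x win blocks, and returns the mean of the per-block standard deviations.
    """
    g = np.asarray(gray, dtype=float)
    h = (g.shape[0] // win) * win
    w = (g.shape[1] // win) * win
    if h == 0 or w == 0:
        raise ValueError("image is smaller than one block")
    # (rows, cols) -> (block_row, in_row, block_col, in_col)
    blocks = g[:h, :w].reshape(h // win, win, w // win, win)
    # Standard deviation over the pixels inside each block, then average
    return float(blocks.std(axis=(1, 3)).mean())


flat = np.full((16, 16), 0.5)
checker = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)
print(local_variance_clutter(flat), local_variance_clutter(checker))  # 0.0 0.5
```

A uniform image scores 0. A 0/1 checkerboard scores 0.5, because every block is half black and half white.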

4. Computer Graphics (5 features)

According to the paper, these are basically for distinguishing natural images from artificial ones. They didn't interest me much, so I'll skip them.

Overall, the content features are not explained in detail. None of them is clearly better than the features I currently use. Content-related features are still important, but this paper, at least, doesn't seem to introduce any new ones.

<References>

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627-1645. https://doi.org/10.1109/TPAMI.2009.167
Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008). Progressive search space reduction for human pose estimation. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference (pp. 1–8). IEEE.
Matz, S. C., Segalin, C., Stillwell, D., Müller, S. R., & Bos, M. W. (2019). Predicting the personal appeal of marketing images using computational methods. Journal of Consumer Psychology, 29(3), 370-390. https://doi.org/10.1002/jcpy.1092
Segalin, C., Perina, A., Cristani, M., & Vinciarelli, A. (2017). The pictures we like are our image: Continuous mapping of favorite pictures into self-assessed and attributed personality traits. IEEE Transactions on Affective Computing, 8(2), 268-285. https://doi.org/10.1109/taffc.2016.2516994
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference (Vol. 1, p. I–511). IEEE.