Studying photo features: Sharma & Peng (2023)

The third paper in my study of photo features is Sharma and Peng (2023). It predicts the popularity of food photos from features of the photos themselves, and the authors also provide their code. The features related to visual aesthetics are listed below.

1. Specific color

This is the share of pixels that belong to a particular color. These are the same features that Segalin et al. (2017), covered in the previous post, called color names. Of the 11 colors, only the shares of the colors relevant to this study (red, orange, yellow, green, blue) were used.

2. Brightness and Colorfulness

These two features were also used in Segalin et al. (2017).

3. Visual complexity

3-1. Feature complexity

a) file size of the image stored as a JPEG file, divided by the image size

This is exactly what the description says.

b) percentage of edge points

This was also used in Segalin et al. (2017), under the name edge pixels.

c) number of segments

This was also used in Segalin et al. (2017), where it was one of the two measures of level of detail.

3-2. Compositional complexity

a) distance between edge points

The paper describes it as follows. Note that the original paper it cites calls this feature edge distribution.

In an image with high compositional complexity, edge points should be further away from each other. We used the average Euclidean distance among all the pairs of edge points, divided by the image’s diagonal (Peng & Jemmott, 2018)
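The quoted definition can be sketched in a few lines. This is only a minimal illustration, not the authors' code: `edge_mask` is a hypothetical boolean array produced by whatever edge detector is used, and taking the diagonal from the array shape is an assumption.

```python
import numpy as np

def edge_point_spread(edge_mask: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between edge pixels,
    divided by the image diagonal (assumed to be hypot(height, width))."""
    ys, xs = np.nonzero(edge_mask)
    pts = np.stack([ys, xs], axis=1).astype(float)
    n = len(pts)
    if n < 2:
        return 0.0  # no pair of edge points to measure
    # All pairwise differences, shape (n, n, 2); memory grows as n^2.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(n, k=1)  # each unordered pair once
    h, w = edge_mask.shape
    return float(dist[upper].mean() / np.hypot(h, w))
```

Because the pairwise matrix is quadratic in the number of edge points, a real image would probably need the edge points to be subsampled first.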

b) relative size of a minimal bounding box that contained at least 90% of the edge points

This feature was also used in Peng and Jemmott (2018), although there the threshold is 95%. Both of the features above originally come from Ke et al. (2006), where they are called edge spatial distribution. Not only in this paper but in any writing, using a feature's exact name when citing earlier work seems the best way to avoid confusing readers. If a name is changed, the reason should be given as well.
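One way to read feature b) is to take "minimal" literally and search over axis-aligned boxes. The sketch below does that by brute force; it is an assumption about the procedure, not the method of any of the cited papers, and `edge_mask` and `coverage` are hypothetical names. It is only practical for small or downsampled masks.

```python
import numpy as np

def min_bbox_fraction(edge_mask, coverage=0.90):
    """Area of the smallest axis-aligned box holding at least `coverage`
    of the edge points, as a fraction of the image area."""
    mask = np.asarray(edge_mask, dtype=bool)
    h, w = mask.shape
    total = int(mask.sum())
    if total == 0:
        return 0.0
    need = int(np.ceil(coverage * total))
    # rows_cum[r] holds per-column edge counts for rows [0, r).
    rows_cum = np.vstack([np.zeros((1, w), dtype=int), np.cumsum(mask, axis=0)])
    best_area = h * w
    for top in range(h):
        for bottom in range(top + 1, h + 1):
            col_counts = rows_cum[bottom] - rows_cum[top]
            if col_counts.sum() < need:
                continue  # this row band cannot reach the target
            # Sliding window: narrowest run of columns with >= need points.
            left, running = 0, 0
            for right in range(w):
                running += col_counts[right]
                while running - col_counts[left] >= need:
                    running -= col_counts[left]
                    left += 1
                if running >= need:
                    area = (bottom - top) * (right - left + 1)
                    best_area = min(best_area, area)
    return best_area / (h * w)
```

Passing `coverage=0.95` would correspond to the threshold reported for Peng and Jemmott (2018).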

3-3. Color variety

a) hue count

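This feature is built from the pixels whose hue falls in a range that can be judged reliably. Only pixels with S > 0.2 and 0.15 < V < 0.85 are kept (the cited original paper, Ke et al. (2006), uses 0.15 < V < 0.95, and I do not know why this was changed). A 20-bin histogram of hue is built from these pixels. Let $m$ be the maximum value of the histogram, and let $N$ be the set of bins whose value is greater than $\alpha m$, where $\alpha$ is a parameter controlling noise sensitivity, set to 0.05 here. The feature is then 20 minus the number of bins in $N$, that is, $20 - |N|$.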
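The hue count steps can be sketched as follows. This is a minimal illustration under stated assumptions: the input `hsv` is a hypothetical array with all three channels scaled to [0, 1], and the saturation filter is assumed to keep the more saturated pixels (S > 0.2), as in Ke et al. (2006), since hue is unreliable at low saturation.

```python
import numpy as np

def hue_count_feature(hsv, alpha=0.05, n_bins=20,
                      s_min=0.2, v_range=(0.15, 0.85)):
    """Return n_bins minus the number of hue bins that exceed alpha * m,
    where m is the largest bin count among the retained pixels."""
    hsv = np.asarray(hsv, dtype=float)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # Keep pixels with enough saturation and mid-range brightness.
    keep = (s > s_min) & (v > v_range[0]) & (v < v_range[1])
    hist, _ = np.histogram(h[keep], bins=n_bins, range=(0.0, 1.0))
    m = hist.max()
    n_significant = int((hist > alpha * m).sum())  # size of the set N
    return n_bins - n_significant
```

With these definitions, a photo dominated by one or two hues scores close to 20, while a photo with many distinct hues scores lower.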

b) Shannon index

This takes the pixel shares of the 11 colors computed for the Specific color features, drops black, white, and gray, and measures the diversity of the shares of the remaining 8 colors. The paper does not explain exactly how it was computed.
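Since the paper gives no details, the following is only a guess at the computation using the textbook Shannon index, $H = -\sum_i p_i \ln p_i$. Renormalizing the 8 shares so that they sum to 1 and using the natural logarithm are both assumptions, and `shares` is a hypothetical input.

```python
import math

def shannon_index(shares):
    """Shannon diversity of color shares, renormalized to sum to 1.
    Colors with zero share contribute nothing to the sum."""
    total = sum(shares)
    if total <= 0:
        return 0.0
    probs = [s / total for s in shares if s > 0]
    return -sum(p * math.log(p) for p in probs)
```

Under these assumptions the index is 0 when a single color covers everything and reaches its maximum, $\ln 8$, when all 8 colors have equal shares.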

3-4. Repetition

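This feature measures whether visually similar elements are used repeatedly. The image is divided into $2 \times 2$ blocks, and each block is fed to a pretrained neural network to obtain a 4,096-dimensional vector. The cosine similarity scores between the blocks are averaged and used as the feature. Two prior studies are cited on how visually repetitive photos affect viewers, but no citation is given for why repetition is measured this way.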
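The block splitting and the averaging step can be sketched as below. The pretrained network is left out on purpose: the vectors passed to `mean_pairwise_cosine` stand in for whatever 4,096-dimensional embeddings the network produces, and both function names are hypothetical. Averaging over the 6 unordered pairs of blocks is also an assumption.

```python
import numpy as np

def split_into_2x2_blocks(image):
    """Split an image array into four blocks: top-left, top-right,
    bottom-left, bottom-right. Odd trailing rows/columns are dropped."""
    h, w = image.shape[:2]
    bh, bw = h // 2, w // 2
    return [image[r:r + bh, c:c + bw] for r in (0, bh) for c in (0, bw)]

def mean_pairwise_cosine(block_vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    x = np.asarray(block_vectors, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T
    upper = np.triu_indices(len(x), k=1)
    return float(sim[upper].mean())
```

In use, each block from `split_into_2x2_blocks` would be embedded by the network and the four resulting vectors passed to `mean_pairwise_cosine`.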

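Overall, the paper is so-so. First, the features under each sub-category of visual complexity were converted to z-scores and then combined into a single variable, which I find questionable. Cronbach's $\alpha$ values are reported, but even if all of these can be called complexity, they seem to measure clearly different aspects. It is also unsatisfying that feature names and measurement details are changed at will, and that adequate explanation and citations are not always provided.

<References>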
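For reference, the z-score composite and Cronbach's $\alpha$ can be computed as in the sketch below. It assumes the combination is a simple mean of z-scores, which the paper may or may not use, and `features` is a hypothetical array with one row per photo and one column per feature.

```python
import numpy as np

def zscore_composite(features):
    """Standardize each column, then average across columns per row."""
    x = np.asarray(features, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return z.mean(axis=1)

def cronbach_alpha(features):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    x = np.asarray(features, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```

A high $\alpha$ only shows that the columns covary; it does not by itself show that they measure the same aspect of complexity, which is the concern raised above.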

Ke, Y., Tang, X., & Jing, F. (2006). The design of high-level features for photo quality assessment. Presented at the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY. https://doi.org/10.1109/cvpr.2006.303
Peng, Y., & Jemmott, J. B. (2018). Feast for the eyes: Effects of food perceptions and computer vision features on food photo popularity. International Journal of Communication, 12, 313-336. https://ijoc.org/index.php/ijoc/article/view/6678
Segalin, C., Perina, A., Cristani, M., & Vinciarelli, A. (2017). The pictures we like are our image: Continuous mapping of favorite pictures into self-assessed and attributed personality traits. IEEE Transactions on Affective Computing, 8(2), 268-285. https://doi.org/10.1109/taffc.2016.2516994
Sharma, M., & Peng, Y. (2023). How visual aesthetics and calorie density predict food image popularity on Instagram: A computer vision analysis. Health Communication, 1-15. https://doi.org/10.1080/10410236.2023.2175635