The Korean Society for Journalism & Communication Studies (KSJCS)

[ Article ]

Korean Journal of Journalism & Communication Studies - Vol. 68, No. 2, pp.100-139

ISSN: 2586-7369 (Online)

Print publication date 30 Apr 2024

Received 12 Oct 2023; Revised 29 Mar 2024; Accepted 01 Apr 2024

DOI: https://doi.org/10.20879/kjjcs.2024.68.2.003

Analyzing Violence Framing Mechanisms in Protest News Videos: Classifying Violent Images Using Vision Transformer

Moon Hyuk Lee^*** ; Jong Hyuk Lee^****

***Ph.D. Candidate, Department of Media, Kyung Hee University, dalvitt@gmail.com
****Professor, Department of Media, Kyung Hee University, corresponding author, jonghhhh@khu.ac.kr

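Abstract (translated from the Korean version)

Rallies and protests are a means by which citizens exercise their basic rights. Nevertheless, citizen rallies against government policies and workers' protests demanding wage increases are often cracked down on as illegal and covered negatively by the media. Motivated by this problem, this study examined the video editing strategies used to frame violence in broadcast news coverage of protests. Editing strategies that emphasize violence in video can be discussed from two perspectives: the position of shots within the story and their duration. From the first perspective, because violence-related footage can attract viewers' attention, it was predicted to appear early in the news story. From the second perspective, violent footage was predicted to be edited into as many short shots as possible, heightening vividness and tension to draw viewers' attention. Violent shots should therefore have relatively short durations. To test these hypotheses, this study developed a classifier, based on the Vision Transformer (ViT), that judges whether an image is violent. Specifically, the researchers fine-tuned the vit-large-patch16-224 model published on Hugging Face so that its final output distinguishes violent from non-violent images. The training data were images published on Roboflow (Dinesh Narianir's Violence&not_violence Computer Vision Project). The classifier's accuracy and F1 score were both 97.12%, a high level overall. The researchers then collected 335 news videos (from 9 broadcasters) retrieved from Naver News with the search term 'Labor Day protest' (노동절 시위) for the period 2003 to 2023. The 13,156 keyframes extracted from these videos were classified as violent or non-violent by the classifier. The analysis showed that keyframes early in the news story contained more violent scenes than later keyframes, and that short keyframes contained more violent scenes than long ones. The interaction between keyframe position and duration was also significant. This indicates that media outlets that prioritize violent scenes place them at the start of the video and edit them quickly, drawing on many different shots. Protest footage typically consists of rallies, speeches, chanting, marches, and performances, and is sometimes accompanied by violent scenes such as scuffles, Molotov cocktails, property damage, occupations, and disturbances. Among these, the violent scenes are placed early in the video to capture viewers' immediate attention, and many varied violent scenes are cut into short shots that amplify that attention. Such editing strategies suggest an aim of drawing viewers in and raising ratings by foregrounding the news value of deviance. This kind of editing fails to adequately convey the content and goals of the protests. The 'protest paradigm', the established routine for covering protests, includes not only riot and confrontation frames but also a discussion frame. Korean media should move away from the practice of using violence-related frames such as riot and confrontation out of concern for ratings, and should instead attend to the substance of protests and foster public debate.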
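The abstract reports that the classifier's accuracy and F1 score were both 97.12%. As a sketch of how these two metrics are computed from a confusion matrix, and why they can land close together when classes are balanced and errors are roughly symmetric, consider the following. The counts are hypothetical rather than the study's data, and the abstract does not say whether the reported F1 is the positive-class or a macro-averaged score, so both are shown.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; 'violent' is treated as the positive class."""
    tp: int  # violent images predicted violent
    fp: int  # non-violent images predicted violent
    fn: int  # violent images predicted non-violent
    tn: int  # non-violent images predicted non-violent


def accuracy(c: Confusion) -> float:
    """Share of all images whose predicted label matches the true label."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def macro_f1(c: Confusion) -> float:
    """Unweighted mean of the per-class F1 scores (violent and non-violent)."""
    # Swapping roles makes 'non-violent' the positive class for the second score
    flipped = Confusion(tp=c.tn, fp=c.fn, fn=c.fp, tn=c.tp)
    return (f1(c) + f1(flipped)) / 2


# Hypothetical, roughly balanced test set with symmetric errors (not the study's data)
counts = Confusion(tp=480, fp=15, fn=15, tn=490)
print(f"accuracy = {accuracy(counts):.4f}")  # 970 / 1000
print(f"F1       = {f1(counts):.4f}")        # 960 / 990
print(f"macro F1 = {macro_f1(counts):.4f}")
```

With these toy counts all three figures round to about 0.97. If the classes were heavily imbalanced, accuracy and F1 could diverge sharply, so reporting both is informative.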

Abstract

Protests are acts in which citizens exercise their basic rights. However, citizen rallies opposing government policies and labor strikes demanding wage increases are often suppressed as illegal and portrayed negatively by the media. Journalism research has criticized the media's reporting routine, known as the 'protest paradigm', for frequently depicting protests as disturbances and confrontations. Most studies of protest news have focused on textual analysis, with little in-depth analysis of news videos. Against this background, this study examined the video editing strategies used to frame violence in broadcast news coverage of protests. Editing strategies that emphasize violence in videos can be discussed from two perspectives: the position and the duration of violence-related shots. From the first perspective, violence-related scenes are expected to be placed early in the news story to attract viewers' attention. From the second perspective, violent scenes are expected to be edited into as many brief shots as possible to heighten tension and capture viewers' attention, so shots involving violence should be relatively short. To test these hypotheses, this study developed a ViT-based (Vision Transformer) classifier that determines whether an image depicts violence. The researchers fine-tuned the vit-large-patch16-224 model publicly available on Hugging Face, replacing its output layer with violent/non-violent classes. The classifier achieved an accuracy of 97.12% and an F1 score of 97.12%. The researchers then collected 335 news videos (from 9 broadcasters) retrieved from Naver News with the query "Labor Day protests" for the period 2003 to 2023. The 13,156 keyframes extracted from these videos were classified as violent or non-violent using the developed classifier.
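The checkpoint name vit-large-patch16-224 encodes the input resolution (224 × 224 pixels) and the patch size (16 × 16 pixels). The sketch below works through what that implies for the token sequence the Transformer processes, and how small the replaced output layer is. The hidden size of 1,024 is the standard ViT-Large width. The 1,000-class original head assumes the checkpoint's ImageNet-1k classifier. `ViTShape` and `patchify` are hypothetical helpers for illustration, not Hugging Face APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTShape:
    """Hypothetical helper: shape arithmetic implied by a ViT checkpoint name."""
    image_size: int = 224  # "224" in the name: input height and width in pixels
    patch_size: int = 16   # "patch16": side length of each square patch
    channels: int = 3      # RGB input
    hidden: int = 1024     # ViT-Large embedding width

    @property
    def patches_per_side(self) -> int:
        if self.image_size % self.patch_size:
            raise ValueError("image size must be a multiple of the patch size")
        return self.image_size // self.patch_size

    @property
    def num_patches(self) -> int:
        return self.patches_per_side ** 2

    @property
    def seq_len(self) -> int:
        # patch tokens plus one [CLS] token whose final state feeds the classifier
        return self.num_patches + 1

    @property
    def patch_dim(self) -> int:
        # raw values in one flattened patch before the linear patch embedding
        return self.patch_size ** 2 * self.channels

    def head_params(self, num_labels: int) -> int:
        # weights plus biases of a linear layer mapping hidden -> num_labels
        return self.hidden * num_labels + num_labels


def patchify(image, patch):
    """Hypothetical helper: split an H x W x C nested-list image into row-major, flattened patches."""
    patches = []
    for top in range(0, len(image), patch):
        for left in range(0, len(image[0]), patch):
            patches.append([value
                            for row in image[top:top + patch]
                            for pixel in row[left:left + patch]
                            for value in pixel])
    return patches


vit = ViTShape()
print(vit.num_patches, vit.seq_len, vit.patch_dim)  # 196 197 768
print(vit.head_params(1000), vit.head_params(2))    # 1025000 2050
```

Replacing a 1,000-way head with a 2-way head shrinks the output layer from about one million parameters to 2,050, while the pretrained encoder is kept. In the Hugging Face transformers library, this kind of head swap is commonly requested by calling `ViTForImageClassification.from_pretrained` with `num_labels=2` and `ignore_mismatched_sizes=True`. The abstract does not state the exact training configuration the authors used.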
The results showed that keyframes located early in the news story contained more violent scenes than later keyframes, and that shorter keyframes contained more violent scenes than longer ones. Moreover, there was a significant interaction effect between keyframe position and duration. This indicates that media emphasizing violent scenes tend to place them at the beginning of the video and cut rapidly between many different shots. This editing strategy may be designed to capture the audience's attention by foregrounding the news value of deviance. The protest paradigm in media coverage of protests includes not only riot and confrontation frames but also a discussion frame. Korean media, mindful of viewer ratings, tend to rely on violent frames such as riot and confrontation. Moving forward, it is essential for the media to treat the themes of protests as socially significant issues and to facilitate the exchange of opinions among members of society through discussion frames.
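The abstract reports effects of keyframe position and duration and a significant interaction between them, but it does not name the estimating model. For a binary outcome such as a keyframe's violent/non-violent label, one standard specification is a logistic regression with an interaction term. The indicator names below are illustrative assumptions, not the authors' variables. Here $V_i = 1$ if keyframe $i$ is classified as violent, and $\mathrm{Early}_i$ and $\mathrm{Short}_i$ are 0/1 indicators for position and duration.

```latex
\begin{aligned}
\operatorname{logit}\Pr(V_i = 1) &= \beta_0 + \beta_1\,\mathrm{Early}_i + \beta_2\,\mathrm{Short}_i + \beta_3\,(\mathrm{Early}_i \times \mathrm{Short}_i) \\
\beta_3 &= \bigl[\operatorname{logit} p_{ES} - \operatorname{logit} p_{EL}\bigr] - \bigl[\operatorname{logit} p_{LS} - \operatorname{logit} p_{LL}\bigr]
\end{aligned}
```

With both predictors binary, the model is saturated, and each cell's log-odds is a sum of coefficients. For example, the early-short cell ($p_{ES}$) has log-odds $\beta_0 + \beta_1 + \beta_2 + \beta_3$, and the late-long cell ($p_{LL}$) has log-odds $\beta_0$. Under this specification, positive $\beta_1$ and $\beta_2$ would correspond to the position and duration hypotheses, and $\beta_3$ captures the interaction. Continuous measures of position and duration could be used in place of the indicators.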

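The abstract does not state how keyframe position and duration were operationalized. The sketch below makes the following assumptions:

- Each keyframe marks the start of a shot that lasts until the next keyframe in the same story.
- "Early" means the first half of the story.
- "Short" means below the median duration.

These cut-offs, the `Keyframe` type, and all helper names are hypothetical. The code tabulates the share of violent keyframes in each position × duration cell. It also computes a difference-in-differences contrast, which is zero when the short-versus-long gap is the same at both positions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Keyframe:
    """Hypothetical record for one keyframe."""
    story_id: str
    start: float   # seconds from the start of the news story
    violent: bool  # label assigned by the image classifier


def code_keyframes(frames, story_length):
    """Return (keyframe, duration, relative_position) for each keyframe.

    frames must be sorted by (story_id, start); story_length maps story_id to seconds.
    Assumption: a shot lasts until the next keyframe of the same story,
    or until the story ends for its last keyframe.
    """
    coded = []
    for i, kf in enumerate(frames):
        nxt = frames[i + 1] if i + 1 < len(frames) else None
        same_story = nxt is not None and nxt.story_id == kf.story_id
        end = nxt.start if same_story else story_length[kf.story_id]
        coded.append((kf, end - kf.start, kf.start / story_length[kf.story_id]))
    return coded


def violence_by_cell(coded, early_cutoff=0.5):
    """Share of violent keyframes in each (early/late) x (short/long) cell."""
    cut = median(duration for _, duration, _ in coded)  # assumed median split on duration
    counts = {}
    for kf, duration, position in coded:
        cell = ("early" if position < early_cutoff else "late",
                "short" if duration < cut else "long")
        n, v = counts.get(cell, (0, 0))
        counts[cell] = (n + 1, v + int(kf.violent))
    return {cell: v / n for cell, (n, v) in counts.items()}


def interaction_contrast(p):
    """Difference in differences: how much larger the short-vs-long gap is early than late."""
    return ((p[("early", "short")] - p[("early", "long")])
            - (p[("late", "short")] - p[("late", "long")]))


# Toy data for two hypothetical stories (not the study's keyframes)
frames = [Keyframe("a", 0, True), Keyframe("a", 2, False), Keyframe("a", 10, False),
          Keyframe("a", 12, False), Keyframe("b", 0, True), Keyframe("b", 4, True)]
shares = violence_by_cell(code_keyframes(frames, {"a": 20, "b": 10}))
print(shares)
print(interaction_contrast(shares))  # 0.5
```

In the toy data, violent keyframes are concentrated among early, short shots, so the contrast is positive. The helpers assume that all four cells are non-empty. A descriptive contrast like this complements a significance test of the interaction but does not replace one.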
Keywords:

Protest Paradigm, Violence, Framing, Image Classification, Vision Transformer

Acknowledgments

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea in 2022 [NRF-2022S1A5C2A03093660].

The authors thank the anonymous reviewers, whose comments greatly helped improve this paper.
