Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning

Proceedings of the IEEE:1-6 (forthcoming)

Abstract

This work demonstrates the implementation and use of an encoder-decoder model that performs a many-to-many mapping of video data to text captions. The mapping takes an input temporal sequence of video frames to an output sequence of words that forms a caption sentence. Data preprocessing, model construction, and model training are discussed. Caption correctness is evaluated using 2-gram BLEU scores across the different splits of the dataset. Specific examples of output captions are shown to demonstrate model generality over the video temporal dimension. Predicted captions are shown to generalize over video action, even in instances where the video scene changes dramatically. Changes to the model architecture that may improve sentence grammar and correctness are also discussed.
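
To make the evaluation metric named in the abstract concrete, the following is a minimal sketch of a sentence-level 2-gram BLEU score in plain Python. It is an illustration only: the abstract does not say which BLEU implementation, tokenization, or smoothing the author used, so the function names (`ngrams`, `modified_precision`, `bleu2`), the single-reference setup, the lack of smoothing, and the example captions are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams that occur in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it appears
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu2(candidate, reference):
    """BLEU with equal weights on 1-gram and 2-gram precision.

    Assumptions of this sketch: a single reference caption and no smoothing,
    so a candidate with no matching bigram scores 0.
    """
    p1 = modified_precision(candidate, reference, 1)
    p2 = modified_precision(candidate, reference, 2)
    if p1 == 0.0 or p2 == 0.0:
        return 0.0
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(0.5 * math.log(p1) + 0.5 * math.log(p2))


# Hypothetical captions, not taken from the paper's dataset.
reference = "a man is playing a guitar".split()
candidate = "a man is playing guitar".split()
score = bleu2(candidate, reference)
print(f"BLEU-2: {score:.3f}")
```

In this example every unigram of the candidate appears in the reference, but one of its four bigrams does not, and the candidate is one word shorter, so both the bigram precision and the brevity penalty pull the score below 1.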

Similar books and articles

The short-term/long-term memory distinction: Back to the past? Giuseppe Vallar - 2003 - Behavioral and Brain Sciences 26 (6):757-758.
Short-term prediction of parking availability in an open parking lot. Vijay Paidi - 2022 - Journal of Intelligent Systems 31 (1):541-554.

Author's Profile

Tosin Ige
University of Texas at El Paso

Citations of this work

Evaluating the level of Press Freedom in Modern Nigeria. Ajijola Samuel - forthcoming - International Journal of Research and Innovation in Social Sciences.
Comparative Analysis of Deep Learning and Naïve Bayes for Language Processing Task. Olalere Abiodun - forthcoming - International Journal of Research and Innovation in Applied Sciences.

