
Meet MAGVIT: A Novel Masked Generative Video Transformer To Address AI Video Generation Tasks

January 22, 2023


Artificial intelligence models have recently become very powerful thanks to the increase in the size of the datasets used for training and in the computational power needed to run the models.

This increase in resources and model capabilities usually leads to higher accuracy than smaller architectures achieve. Small datasets affect the performance of neural networks in a similar way, given the small sample size compared to the data variance or unbalanced class samples.

While model capabilities and accuracy rise, the tasks performed in these cases are limited to a few specific ones (for instance, content generation, image inpainting, image outpainting, or frame interpolation).

A novel framework called MAsked Generative VIdeo Transformer (MAGVIT), which covers ten different generation tasks, has been proposed to overcome this limitation.

As reported by the authors, MAGVIT was developed to address Frame Prediction (FP), Frame Interpolation (FI), Central Outpainting (OPC), Vertical Outpainting (OPV), Horizontal Outpainting (OPH), Dynamic Outpainting (OPD), Central Inpainting (IPC), Dynamic Inpainting (IPD), Class-conditional Generation (CG), and Class-conditional Frame Prediction (CFP).

An overview of the architecture's pipeline is presented in the figure below.

Source: https://arxiv.org/pdf/2212.05199.pdf

In a nutshell, the idea behind the proposed framework is to train a transformer-based model to recover a corrupted image. The corruption is modeled here as masked tokens, which refer to portions of the input frame.

Specifically, MAGVIT models a video as a sequence of visual tokens in the latent space and learns to predict masked tokens with BERT (Bidirectional Encoder Representations from Transformers), a transformer-based machine learning approach originally designed for natural language processing (NLP).
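To make the masked-token idea concrete, here is a minimal sketch in plain Python. It is not the authors' code: the `MASK_ID` value, the masking ratio, and the helper names are assumptions chosen for illustration. It corrupts a toy token sequence and scores a prediction only at the masked positions, which is the general shape of a BERT-style objective.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the [MASK] token


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions,
    i.e. the positions a model would be trained to predict.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions


def masked_accuracy(predicted, target, positions):
    """Fraction of masked positions where the prediction matches the target."""
    hits = sum(predicted[p] == target[p] for p in positions)
    return hits / len(positions)


rng = random.Random(0)
video_tokens = [7, 3, 3, 9, 1, 4, 4, 2]  # toy token ids, not real video tokens
corrupted, positions = mask_tokens(video_tokens, mask_ratio=0.5, rng=rng)
```

In a real system the prediction would come from a bidirectional transformer and the score would be a cross-entropy loss; accuracy is used here only to keep the example dependency-free.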

There are two main modules in the proposed framework.

First, vector embeddings (or tokens) are produced by 3D vector-quantized (VQ) encoders, which quantize and flatten the video into a sequence of discrete tokens.
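The quantize-and-flatten step can be illustrated with a small sketch. This is a generic nearest-neighbour vector quantizer, not MAGVIT's actual encoder: the two-dimensional codebook, the toy latents, and the raster ordering (time, then height, then width) are all assumptions made for the example.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize_and_flatten(latents, codebook):
    """Map latents[t][h][w] (an embedding vector) to a flat list of token ids.

    Tokens are emitted in (time, height, width) raster order.
    """
    return [
        nearest_code(vec, codebook)
        for frame in latents
        for row in frame
        for vec in row
    ]


# Hypothetical 4-entry codebook of 2-D embeddings.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Toy latent grid: 2 frames, 1 row, 2 columns.
latents = [
    [[(0.1, 0.0), (0.9, 0.2)]],
    [[(0.2, 0.8), (0.9, 0.9)]],
]

tokens = quantize_and_flatten(latents, codebook)
print(tokens)  # [0, 1, 2, 3]
```

Each latent vector is replaced by the index of its closest codebook entry, so the whole clip becomes a sequence of integers that a transformer can consume.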

2D and 3D convolutional layers are used together with 2D and 3D upsampling or downsampling layers to account for spatial and temporal dependencies efficiently.

The downsampling is performed by the encoder, while the upsampling is performed in the decoder, whose goal is to reconstruct the image represented by the tokens provided by the encoder.

Second, a masked token modeling (MTM) scheme is used for multitask video generation.

Unlike conventional MTM in image/video synthesis, an embedding method is proposed to model a video condition using a multivariate mask.

The multivariate masking scheme facilitates learning for video generation tasks with different conditions.

The conditions can be a spatial region for inpainting/outpainting or a few frames for frame prediction/interpolation.
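As a rough illustration of how different tasks translate into different conditions, the sketch below builds a boolean mask over a token grid for four of the tasks. The task names, the choice of the central half of the grid as the "central" region, and the `n_cond_frames` parameter are assumptions for this example and are not taken from the paper.

```python
def condition_mask(task, T, H, W, n_cond_frames=1):
    """Return mask[t][h][w], True where the token is given as a condition.

    Tokens marked False are the ones the model has to generate.
    """
    def in_center(h, w):
        # Hypothetical "central" region: the middle half of each axis.
        return H // 4 <= h < H - H // 4 and W // 4 <= w < W - W // 4

    def given(t, h, w):
        if task == "frame_prediction":      # the first frames are known
            return t < n_cond_frames
        if task == "frame_interpolation":   # first and last frames are known
            return t == 0 or t == T - 1
        if task == "central_inpainting":    # everything but the center is known
            return not in_center(h, w)
        if task == "central_outpainting":   # only the center is known
            return in_center(h, w)
        raise ValueError(f"unknown task: {task}")

    return [[[given(t, h, w) for w in range(W)] for h in range(H)] for t in range(T)]


def count_given(mask):
    """Number of condition tokens in a mask."""
    return sum(cell for frame in mask for row in frame for cell in row)


fp = condition_mask("frame_prediction", T=4, H=4, W=4)
ipc = condition_mask("central_inpainting", T=4, H=4, W=4)
print(count_given(fp), count_given(ipc))  # 16 48
```

The point of the sketch is that a single model interface can serve several tasks: only the mask changes, not the network.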

The output video is generated according to the masked conditioning tokens and is refined at each step after prediction is performed.
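The step-by-step refinement can be sketched as an iterative decoding loop. This is a generic confidence-based scheme, not the authors' exact procedure: the cosine schedule, the `MASK_ID` value, and the `toy_predict` stand-in for the transformer are assumptions. At each step the loop predicts every masked token, commits the most confident predictions, and leaves the rest masked for later steps.

```python
import math

MASK_ID = -1  # hypothetical id reserved for the [MASK] token


def iterative_decode(tokens, predict, steps):
    """Fill masked tokens over several steps, most confident first.

    `predict(tokens)` must return {position: (token_id, confidence)} for
    every masked position. Unmasked (condition) tokens are never changed.
    """
    tokens = list(tokens)
    n_masked_start = sum(t == MASK_ID for t in tokens)
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        if not masked:
            break
        # Assumed cosine schedule: how many tokens stay masked after this step.
        keep_masked = math.floor(n_masked_start * math.cos(math.pi / 2 * step / steps))
        preds = predict(tokens)
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        n_fill = max(1, len(masked) - keep_masked)  # always make progress
        for i in ranked[:n_fill]:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens):
    """Stand-in for a transformer: deterministic token and confidence per position."""
    return {i: (i * 10, 1.0 / (1 + i)) for i, t in enumerate(tokens) if t == MASK_ID}


start = [5, MASK_ID, MASK_ID, 8, MASK_ID, MASK_ID]
result = iterative_decode(start, toy_predict, steps=3)
print(result)  # [5, 10, 20, 8, 40, 50]
```

Because many tokens are committed per step, the number of model calls equals the number of steps rather than the number of tokens, which is the usual intuition for why this style of decoding is faster than autoregressive generation.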

Based on the reported experiments, the authors of this research claim that the proposed architecture establishes the best-published FVD (Fréchet Video Distance) on three video generation benchmarks.
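For readers unfamiliar with the metric, FVD is commonly defined as the Fréchet distance between two Gaussians fitted to features of real and generated videos, where the features come from a pretrained video network. The formula below is that standard definition, not something stated in this article; the symbols $\mu_r, \Sigma_r$ (mean and covariance of real-video features) and $\mu_g, \Sigma_g$ (the same for generated videos) are notation introduced here. Lower values indicate that the generated videos are statistically closer to the real ones.

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```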

Furthermore, according to their results, MAGVIT outperforms existing methods in inference time by two orders of magnitude against diffusion models and by 60× against autoregressive models.

Finally, a single MAGVIT model has been developed to support ten diverse generation tasks and generalize across videos from different visual domains.

In the figure below, some results are reported for class-conditional sample generation compared to state-of-the-art approaches. For the other tasks, please refer to the paper.

Source: https://arxiv.org/pdf/2212.05199.pdf

This was a summary of MAGVIT, a novel AI framework that addresses various video generation tasks jointly. If you are interested, you can find more information in the links below.


Check out the Paper and Project. All credit for this research goes to the researchers on this project.



Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA, and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.

