Pinned
Thanks for the appreciation. It's surreal for me to be acknowledged by the author themselves.
Feb 1
Papers Explained 257: Nougat
Nougat (Neural Optical Understanding for Academic Documents) is a Visual Transformer model that performs an Optical Character Recognition…
Just now
Papers Explained 256: DePlot
This paper presents the first few-shot (one-shot) solution to visual language reasoning. It proposes to decompose visual language reasoning into…
1d ago
Papers Explained 255: Matcha
Matcha (Math reasoning and Chart derendering pretraining) proposes several pre-training tasks that cover plot deconstruction and numerical…
2d ago
Papers Explained 254: Pix2Struct
Pix2Struct is a pretrained image-to-text model for purely visual language understanding that can be fine-tuned on tasks containing…
3d ago
Papers Explained 253: SPADE
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each…
6d ago
Papers Explained 252: Nemotron-Mini-Hindi
Nemotron-Mini-Hindi is a 4B bilingual SLM supporting both Hindi and English, based on Nemotron-Mini 4B. The model emphasizes the importance…
Nov 14
Papers Explained 251: H2OVL-Mississippi
H2OVL-Mississippi is a collection of smaller vision-language models, including H2OVL-Mississippi-0.8B and H2OVL-Mississippi-2B. These…
Nov 13
Papers Explained 250: DINO v2
This work demonstrates that existing pre-training methods, especially self-supervised methods, can produce general-purpose visual features…
Nov 12
Papers Explained 249: DINO
This paper investigates whether self-supervised learning enhances Vision Transformer performance compared to convolutional networks…
Nov 11