PR202111/Computer_vision

Computer Vision Architectures – From Scratch Implementations

Overview

This repository is a structured exploration of classical and modern computer vision architectures implemented in PyTorch.

The goal of this project is to understand the evolution of convolutional and attention-based models, from AlexNet to Vision Transformers, by implementing each one modularly and analyzing its architectural trade-offs.


Architectures Implemented

  • AlexNet
  • VGG
  • ResNet
  • DenseNet
  • GoogLeNet (Inception v1)
  • MobileNet
  • SqueezeNet
  • Vision Transformer (ViT)

Additional Modules

  • Convolution operations (implemented from scratch)
  • Regularization techniques
  • Transfer learning strategies
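
As an illustration of what a from-scratch convolution involves, here is a minimal sketch. It is not the code from this repository: the function name `conv2d` and the plain-list representation are assumptions chosen to keep the example dependency-free (a real implementation would more likely operate on NumPy arrays or PyTorch tensors). It computes a single-channel, no-padding cross-correlation, which is the operation deep learning frameworks call "convolution".

```python
def conv2d(image, kernel, stride=1):
    """Single-channel 2D cross-correlation with no padding.

    `image` and `kernel` are lists of rows (lists of numbers).
    This is a hypothetical helper for illustration only.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Output size for "valid" (unpadded) convolution.
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            # Slide the kernel over the window anchored at (i*stride, j*stride).
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Diagonal-difference kernel: each output is image[i][j] - image[i+1][j+1].
kernel = [
    [1, 0],
    [0, -1],
]
result = conv2d(image, kernel)
```

On this input every diagonal neighbour differs by 5, so `result` is a 3×3 grid filled with -5.0. The four nested loops make the cost visible: one multiply-add per kernel element per output position.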

Research Motivation

This repository was created to:

  • Understand architectural innovations in deep learning
  • Analyze parameter efficiency vs performance trade-offs
  • Compare convolution-based and attention-based approaches
  • Explore generalization techniques in deep neural networks

Key Insights from Implementation

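  • Residual connections give gradients an identity path around each block, which mitigates vanishing gradients in deep networks
  • Dense connectivity feeds every layer the outputs of all preceding layers in its block, which encourages feature reuse
  • Depthwise separable convolutions split a standard convolution into a per-channel spatial filter and a 1×1 pointwise mix, reducing parameters and computational cost
  • Vision Transformers lack the built-in spatial locality bias of convolutions, so they typically need larger datasets or pretraining to perform well
  • Regularization techniques improve generalization, especially when training data is limited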
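
The saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below is an illustration, not code from this repository: the function names are hypothetical, bias terms are ignored, and the layer sizes (3×3 kernel, 64 input channels, 128 output channels) are assumed values.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128  # assumed example sizes
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.3f}")
```

For these sizes the standard layer has 73,728 weights and the separable version 8,768, roughly 12% of the original. The ratio simplifies to 1/c_out + 1/k², so with a 3×3 kernel it approaches 1/9 as the number of output channels grows.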

Tech Stack

  • Python
  • PyTorch
  • NumPy

Future Work

  • Hybrid CNN-Transformer architectures
  • 3D CNNs for hyperspectral imagery
  • Vision-Language Models (VLMs)
  • Self-Supervised Pretraining methods

This repository was created as a structured study of deep learning architectures to understand their mathematical foundations and architectural evolution.

About

My learning notes on computer vision, covering different models and algorithms.
