Python Face Recognition and Face Detection

Build Your Own Face Recognition Tool With Python

Table of Contents

Project Overview

Prerequisites
Step 1: Prepare Your Environment and Data
Step 2: Load Training Data and Train Your Model
Step 3: Recognize Unlabeled Faces
Step 4: Display Results
Step 5: Validate Your Model
Step 6: Add Command-Line Arguments
Step 7: Perform Face Recognition With Python

Do you have a phone that you can unlock with your face? Have you ever wondered how that works? Have you ever wanted to build your own face recognizer? With Python, some data, and a few helper packages, you can create your very own. In this project, you’ll use face detection and face recognition to identify faces in a given image.

In this tutorial, you’ll build your own face recognition tool using:

  • Face detection to find faces in an image
  • Machine learning to power face recognition for given images
  • Command-line arguments to direct your application with argparse
  • Bounding boxes to label faces with the help of Pillow

With these techniques, you’ll gain a solid foundation in computer vision. After implementing them in this project, you’ll be ready to apply them to real-world problems beyond face recognition.

Click the link below to download the complete source code for this project:

Free Bonus: Click here to download the full source code to build your own face recognition app with Python.

When you’re done with this project, you’ll have a face recognition application that you can train on any set of images. Once it’s trained, you’ll be able to give your application a new image, and the app will draw boxes on any faces that it finds and label each face by name:

In this video, you saw this project in action: training a new model on a list of images, validating it against an image with known faces, and then testing it with a brand-new image. After finishing this tutorial, you’ll have your very own application that works just like this.

Your program will be a typical command-line application, but it’ll offer some impressive capabilities. To accomplish this feat, you’ll first use face detection, or the ability to find faces in an image. Then, you’ll implement face recognition, which is the ability to identify detected faces in an image. To that end, your program will do three primary tasks:

  • Train a new face recognition model.
  • Validate the model.
  • Test the model.

When training , your face recognizer will need to open and read many image files. It’ll also need to know who appears in each one. To accomplish this, you’ll set up a directory structure to give your program information about the data. Specifically, your project directory will contain three data directories:

  • training/
  • output/
  • validation/

You can put images directly into validation/. For training/, you should have images separated by subject into directories with the subject’s name.

Setting your training directory up this way will allow you to give your face recognizer the information that it needs to associate a label—the person pictured—with the underlying image data.

Note: This strategy works well for training on images that contain a single face. If you want to train on images with multiple identifiable faces, then you’ll have to investigate an alternative strategy for marking the faces in the training images.

You’ll walk through this project step by step, starting with preparing your environment and data. After that, you’ll be ready to load your training data and get to work on training your model to recognize unlabeled faces.

Once your app is able to do that, you’ll need a way to display your results. You’ll build a command-line interface so that users can interact with your app.

Finally, you’ll run the app through all of its paces. This is of vital importance because it’ll help you see your application through the eyes of a user. That way, you can better understand how your application works in practice, a process that’s key to finding bugs.

To build this face recognition application, you won’t need advanced linear algebra, deep machine learning algorithm knowledge, or even any experience with OpenCV, one of the leading Python libraries enabling a lot of computer vision work.

Instead, you should have an intermediate-level understanding of Python. You should be comfortable with:

  • Installing third-party modules with pip
  • Using argparse to create a command-line interface
  • Opening and reading files with pathlib
  • Serializing and deserializing Python objects with pickle

With these skills in hand, you’ll be more than ready to start on step one of this project: preparing your environment and data.

In this step, you’ll create a project environment, install necessary dependencies, and set the stage for your application.

First, create your project and data directories:

  • Linux + macOS
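On Linux and macOS, the commands that this step describes might look like this:

mkdir face_recognizer
cd face_recognizer
mkdir output training validation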

Running these commands creates a directory called face_recognizer/ , moves to it, then creates the folders output/ , training/ , and validation/ , which you’ll use throughout the project. Now you can create a virtual environment using the tool of your choice.

Before you start installing this project’s dependencies with pip , you’ll need to ensure that you have CMake and a C compiler like gcc installed on your system. If your system doesn’t already have them installed, then follow these instructions to get started:

To install CMake on Windows, visit the CMake downloads page and install the appropriate installer for your system.

You can’t get gcc as a stand-alone download for Windows, but you can install it as a part of the MinGW runtime environment through the Chocolatey package manager with the following command:
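Assuming you install the mingw package from Chocolatey, the command might look like this:

choco install mingw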

To install CMake on Linux, visit the CMake downloads page and install the appropriate installer for your system. Alternatively, CMake binaries may also be available through your favorite package manager. If you use apt package management , for example, then you can install CMake with this:
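With apt, that might be:

sudo apt-get update
sudo apt-get install cmake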

You’ll also install gcc through your package manager. To install gcc with apt , you’ll install the build-essential metapackage:
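For example:

sudo apt-get install build-essential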

To verify that you’ve successfully installed gcc, you can check the version:
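For example:

gcc --version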

If this returns a version number, then you’re good to go!

To install CMake on macOS, visit the CMake downloads page and install the appropriate installer for your system. If you have Homebrew installed, then you can install both CMake and gcc that way:
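With Homebrew, that might look like this:

brew install cmake gcc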

After following these steps for your operating system, you’ll have CMake and gcc installed and ready to assist you in building your project.

Now open your favorite text editor to create your requirements.txt file:
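As a sketch, the file could look something like the following. The package names reflect the libraries used later in this tutorial, but the version pins shown here are only illustrative, so prefer the pins that ship with the downloadable source code:

dlib==19.24.0
face-recognition==1.3.0
numpy==1.24.2
Pillow==9.4.0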

This tells pip which dependencies your project will be using and pins them to these specific versions. This is important because future versions could have changes to their APIs that break your code. When you specify the versions needed, you have full control over what versions are compatible with your project.

Note: This project was built on Python 3.9 and also tested on 3.10. Because some of the packages used in this tutorial still use the legacy setup.py installation method, you may run into issues if you use 3.11.

After creating the requirements file and activating your virtual environment, you can install all of your dependencies at once:
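For example:

python -m pip install -r requirements.txt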

This command calls pip and tells it to install the dependencies in the requirements.txt file that you just created.

Next, you’ll need to find a dataset for training and validating your data. Celebrity images are a popular choice for testing face recognition because so many celebrity headshots are widely available. That’s the approach that you’ll take in this tutorial.

If you haven’t already, you can download everything you need for data training and validation by clicking the link below:

As an alternative, it can be great practice to set up your own dataset and folder structure. If you’d like to give that a try, then you can use this dataset or pictures of your own.

If your dataset isn’t already split into training and validation sets , then you should go ahead and make that split now.

In the training/ directory, you should create a separate folder for each person who appears in your training images. Then you can put all the images into their appropriate folders:
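As an illustration, with two hypothetical subjects the layout could look like this:

training/
├── person_a/
│   ├── img_1.jpg
│   └── img_2.jpg
└── person_b/
    ├── img_1.jpg
    └── img_2.jpg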

You can place the validation images directly into the validation/ directory. Your validation images need to be images that you don’t train with, but you can identify the people who appear in them.

In this step, you’ve prepared your environment. First, you created a directory and several subdirectories to house your project and its data.

Then you created a virtual environment, installed some dependencies manually, and then created a requirements.txt file with your project dependencies pinned to a specific version.

With that, you used pip to install your project dependencies. Then, you downloaded a dataset and split it into training and validation sets. Next, you’ll write the code to load the data and train your model.

In this step, you’ll start writing code. This code will load your training data and start training your model. By the end of this step, you’ll have loaded your training data, detected faces in each image, and saved them as encodings.

First, you’ll need to load images from training/ and train your model on them. To do that, open your favorite editor, create a file called detector.py, and start writing some code:
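Based on the walkthrough that follows, a minimal sketch of this first chunk of detector.py might look like this (treat it as a sketch rather than the exact listing from the downloadable source):

from pathlib import Path

import face_recognition

DEFAULT_ENCODINGS_PATH = Path("output/encodings.pkl")

# Create the data directories if they don't already exist
Path("training").mkdir(exist_ok=True)
Path("output").mkdir(exist_ok=True)
Path("validation").mkdir(exist_ok=True)


def encode_known_faces(
    model: str = "hog", encodings_location: Path = DEFAULT_ENCODINGS_PATH
) -> None:
    # Each subdirectory of training/ is named after the person in its images
    for filepath in Path("training").glob("*/*"):
        name = filepath.parent.name
        image = face_recognition.load_image_file(filepath)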

You start your script by importing pathlib.Path from Python’s standard library, along with face_recognition , a third-party library that you installed in the previous step.

Then, you define a constant for the default encoding path. Keeping this path as a constant toward the top of your script will help you down the line if you want to change that path.

Next, you add three calls to .mkdir() and set exist_ok to True . You may not need these lines of code if you already created the three directories in the previous step. However, for convenience, this code automatically creates all the directories that you’ll use if they don’t already exist.

Finally, you define encode_known_faces() . This function uses a for loop to go through each directory within training/ , saves the label from each directory into name , then uses the load_image_file() function from face_recognition to load each image.

As input, encode_known_faces() will require a model type and a location to save the encodings that you’ll generate for each image.

Note: You’re not using the required arguments yet, but you’ll add more code to encode_known_faces() that relies on these arguments in just a moment.

The model determines what you’ll use to locate faces in the input images. Valid model type choices are "hog" and "cnn" , which refer to the respective algorithms used:

  • HOG (histogram of oriented gradients) is a common technique for object detection. For this tutorial, you only need to remember that it works best on a CPU.
  • CNN (convolutional neural network) is another technique for object detection. In contrast to HOG, a CNN works better on a GPU, the kind of processor found on a video card.

HOG doesn’t rely on deep learning, while a CNN does. If you’d like to learn more about how detection algorithms like these work under the hood, then Traditional Face Detection With Python is your guide.

Next, you’ll use face_recognition to detect the face in each image and get its encoding . This is an array of numbers describing the features of the face, and it’s used with the main model underlying face_recognition to reduce training time while improving the accuracy of a large model. This is known as transfer learning .

Then, you’ll add all the names and encodings to the lists names and encodings , respectively:
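One way the updated function could look at this point, with the intermediate variable names assumed from the description below:

def encode_known_faces(
    model: str = "hog", encodings_location: Path = DEFAULT_ENCODINGS_PATH
) -> None:
    names = []
    encodings = []

    for filepath in Path("training").glob("*/*"):
        name = filepath.parent.name
        image = face_recognition.load_image_file(filepath)

        # Detect the faces, then generate an encoding for each detected face
        face_locations = face_recognition.face_locations(image, model=model)
        face_encodings = face_recognition.face_encodings(image, face_locations)

        for encoding in face_encodings:
            names.append(name)
            encodings.append(encoding)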

After updating your project with this code, your encode_known_faces() function is ready to collect names and encodings from all the files in your training/ directory.

The call to face_recognition.face_locations() detects the locations of faces in each image. The function returns a list of four-element tuples, one tuple for each detected face. The four elements per tuple provide the four coordinates of a box that could surround the detected face. Such a box is also known as a bounding box.

Next, face_recognition.face_encodings() generates encodings for the detected faces in an image. Remember that an encoding is a numeric representation of facial features that’s used to match similar faces by their features.

Finally, the inner for loop adds the names and their encodings to separate lists.

Now you’ve generated encodings and added them, along with the label for each image, to a list. Next, you’ll combine them into a single dictionary and save that dictionary to disk.

Note: You’re saving your encodings to disk because generating them can be time-consuming, especially if you don’t have a dedicated GPU. Once they’re generated, saving them allows you to reuse the encodings in other parts of your code without re-creating them every time.

Import pickle from the standard library and use it to save the name-encoding dictionary:
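A rough sketch of that addition:

import pickle  # goes with the other imports at the top of detector.py

Then, at the end of encode_known_faces(), after the for loop:

    name_encodings = {"names": names, "encodings": encodings}
    with encodings_location.open(mode="wb") as f:
        pickle.dump(name_encodings, f)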

With this addition to encode_known_faces() , you create a dictionary that puts the names and encodings lists together and denotes which list is which. Then, you use pickle to save the encodings to disk.

Finally, you add a call to encode_known_faces() at the end so that you can test whether it works. You can now run your script to confirm that it creates your encodings:
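The temporary test call at the bottom of the script can be as simple as this:

encode_known_faces()

Then run python detector.py from your project directory.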

After some time, your script should finish execution, having created a file called encodings.pkl in your output/ directory. Well done, you’ve completed this step!

In this section, you created the encode_known_faces() function, which loads your training images, finds the faces within the images, and then creates a dictionary containing the two lists that you created with each image.

You then saved that dictionary to disk so that you could reuse the encodings. Now you’re ready to deal with unlabeled faces!

In this step, you’ll build the recognize_faces() function, which recognizes faces in images that don’t have a label.

First, you’ll open the encodings that you saved in the previous step and load the unlabeled image with face_recognition.load_image_file() :
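A sketch of the new function, with the parameter names assumed from the description that follows:

def recognize_faces(
    image_location: str,
    model: str = "hog",
    encodings_location: Path = DEFAULT_ENCODINGS_PATH,
) -> None:
    # Load the saved encodings from step two
    with encodings_location.open(mode="rb") as f:
        loaded_encodings = pickle.load(f)

    input_image = face_recognition.load_image_file(image_location)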

After adding this code, your recognize_faces() function will now be able to open and load the saved face encodings using pickle and then load the image in which you want to recognize faces. This is also known as your test image .

You’ll pass the location of the unlabeled image, the model you want to use for face detection, and the location of the saved encodings from the previous step to this function. Then you’ll open the encodings file and load the data with pickle . You’ll also load the image with face_recognition.load_image_file() and assign the output to input_image .

You’ll use face_recognition to find the face in input_image and get its encoding:
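Continuing inside recognize_faces(), that step might look like this:

    input_face_locations = face_recognition.face_locations(
        input_image, model=model
    )
    input_face_encodings = face_recognition.face_encodings(
        input_image, input_face_locations
    )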

Your recognize_faces() function has just gotten more interesting. With these lines of code, you can detect faces in your input image and get their encodings, which will aid your code in identifying the faces.

Now you’ll use the encoding of the detected face to make a comparison with all of the encodings that you found in the previous step. This will happen within a loop so that you can detect and recognize multiple faces in your unknown image:
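One way to write that loop:

    for bounding_box, unknown_encoding in zip(
        input_face_locations, input_face_encodings
    ):
        name = _recognize_face(unknown_encoding, loaded_encodings)
        if not name:
            name = "Unknown"
        print(name, bounding_box)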

In this additional code, you iterate through input_face_locations and input_face_encodings in parallel using zip() . Then, you call the non-public function _recognize_face() , passing the encodings for the unknown image and the loaded encodings. This function doesn’t yet exist, but you’ll build it in just a moment.

Note: Looping over two iterables at the same time using Python’s zip() function is called parallel iteration . This comes in handy when you need to do an operation on all the elements of two lists at the same time.

You also add a conditional statement that assigns "Unknown" to name if _recognize_face() doesn’t find a match. Finally, you print name and the coordinates of the identified face that are saved in bounding_box .

Before you can run recognize_faces() , you’ll need to implement _recognize_face() . This helper function will take the unknown and loaded encodings. It’ll make a comparison between the unknown encoding and each of the loaded encodings using compare_faces() from face_recognition . Ultimately, _recognize_face() will return the most likely match, or it’ll implicitly return None if the function exits without reaching a return statement :
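A sketch of the helper, following the voting logic described below:

from collections import Counter  # goes with the other imports at the top


def _recognize_face(unknown_encoding, loaded_encodings):
    # Compare the unknown encoding against every known encoding
    boolean_matches = face_recognition.compare_faces(
        loaded_encodings["encodings"], unknown_encoding
    )
    # Count a vote for each name whose encoding matched
    votes = Counter(
        name
        for match, name in zip(boolean_matches, loaded_encodings["names"])
        if match
    )
    if votes:
        return votes.most_common(1)[0][0]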

You’ve now created _recognize_face() , which does the hard work of identifying each face in the given image. In this function, you call compare_faces() to compare each unknown encoding in your test image with the encodings that you loaded previously.

The compare_faces() function returns a list of True and False values for each loaded encoding. The indices of this list are equal to those of the loaded encodings, so the next thing you do is keep track of votes for each possible match.

You do this with Counter , which you imported from collections at the top of your script. Using Counter allows you to track how many votes each potential match has by counting the True values for each loaded encoding by the associated name. You then return the name that has the most votes in its favor.

But what’s a vote, and who’s voting? Think back to the first function that you wrote in this tutorial, where you generated encodings for a bunch of training images of celebrities’ faces.

When you call compare_faces() , your unknown face is compared to every known face that you have encodings for. Each match acts as a vote for the person with the known face. Since you should have multiple images of each known face, a closer match will have more votes than one that isn’t as close a match.

Finally, outside of the function definition, you add a call to recognize_faces() to test that it’s working as expected.
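Using the sample image mentioned below, that test call might be:

recognize_faces("unknown.jpg")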

Note: Remember to remove the call to encode_known_faces() that you previously used to create the encodings. Unless you change the training data, you won’t have to run this function again, and it would unnecessarily use computing time.

In its current state, recognize_faces() fetches the encodings that you created in step two and compares them to the encodings that it generates on an input image. It does that for all the faces that it can find in an image.

For example, if you download the example code for step three, then you’ll find an image called unknown.jpg that shows two characters from the American sitcom Seinfeld :

Recall that at the end of the last snippet, you added a test call to recognize_faces() with the parameter "unknown.jpg". If you use that image, then running detector.py should print the name, or "Unknown", along with the bounding-box coordinates for each face that the script finds.

Your script will recognize only one of the two people shown in the image because you only included one of the two characters’ faces in the training data. Python will label any face that the script locates but can’t identify from the encoding that you generated in the previous step as "Unknown" . Try it out with some other images!

Now that you’ve gotten the prediction for your image, you’ll extend this function to show it to the user. One way to do this is to display the results on the input image itself. This has the bonus of being clear for the user and requiring little extra work on their part.

Now comes the time to draw on your input image! This will help the user see which face is being identified and what it’s being identified as.

A popular technique is to draw a bounding box around the face and give it a label. To do this, you’ll use Pillow , a high-powered image processing library for Python.

Note: Are you interested in image processing with Python? If so, check out the Real Python podcast interview with Mike Driscoll.

For now, just load the image into Pillow and create an ImageDraw object in the recognize_faces() function:
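A sketch of those additions:

from PIL import Image, ImageDraw  # goes with the other imports at the top

Then, inside recognize_faces(), after generating the input encodings:

    pillow_image = Image.fromarray(input_image)
    draw = ImageDraw.Draw(pillow_image)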

Here, you start by adding three lines of code that set up the ability to draw on an existing image:

  • The import at the top of your script brings in the Image and ImageDraw modules from PIL.
  • Image.fromarray() creates a Pillow image object from your loaded input image.
  • ImageDraw.Draw() creates an ImageDraw object, which will help you draw a bounding box around detected faces.

Next, within the for loop in recognize_faces(), you remove the print() call from step three and instead call another new helper function, this one named _display_face().

Finally, you add some housekeeping that Pillow requires. You manually remove the draw object from the current scope with the del statement, and then you show the image by calling .show().
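Putting those changes together, the end of recognize_faces() might now look like this, with _display_face() defined next:

    for bounding_box, unknown_encoding in zip(
        input_face_locations, input_face_encodings
    ):
        name = _recognize_face(unknown_encoding, loaded_encodings)
        if not name:
            name = "Unknown"
        _display_face(draw, bounding_box, name)

    del draw
    pillow_image.show()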

Next, you’ll implement the _display_face() function, which will draw a bounding box on the recognized face and add a caption to that bounding box with the name of the identified face, or Unknown if it doesn’t match any known face.

To do this, _display_face() will need to take as parameters the ImageDraw object, the tuple of points that define a square area around a recognized face, and the name that you got from _recognize_face() :
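A sketch of the helper; the constant names are assumptions, since the prose only specifies the two colors:

BOUNDING_BOX_COLOR = "blue"
TEXT_COLOR = "white"


def _display_face(draw, bounding_box, name):
    # Draw a box around the recognized face
    top, right, bottom, left = bounding_box
    draw.rectangle(((left, top), (right, bottom)), outline=BOUNDING_BOX_COLOR)

    # Draw a filled caption box anchored to the bottom-left corner of the face box
    text_left, text_top, text_right, text_bottom = draw.textbbox(
        (left, bottom), name
    )
    draw.rectangle(
        ((text_left, text_top), (text_right, text_bottom)),
        fill=BOUNDING_BOX_COLOR,
        outline=BOUNDING_BOX_COLOR,
    )
    draw.text((text_left, text_top), name, fill=TEXT_COLOR)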

You start by creating two constants near the top of your script and assigning them to two common HTML color names , "blue" and "white" . You then use these constants multiple times in _display_face() . Defining them as constants means that you’ll have less maintenance effort if you want to change the colors later on.

Then, in the first line of your new helper function, you unpack the bounding_box tuple into its four parts: top , right , bottom , and left . You use these coordinates in the next line to draw a rectangle around the recognized face using the .rectangle() method in ImageDraw .

The next step is to determine the bounding box for the text caption. You do this with .textbbox() , which takes a pair of anchor coordinates and the caption text as parameters and returns the four coordinates of a bounding box that fits the caption.

The anchor is a coordinate tuple of where you want the box to start. Because you read English left to right, and captions are typically on the bottom, you use the left and bottom coordinates of the face’s bounding box as the anchor for your caption box.

Next, you draw another rectangle, but for this one, you define the rectangle with the bounding box coordinates that you got in the previous line. You also color in the rectangle by using the fill parameter. This second rectangle serves as the caption area directly under the bounding box that surrounds the recognized face.

And last, you call .text() on the ImageDraw object to write the name in the caption box that you just drew. You use the fill parameter again, but in this case, it determines the color of the text.

After you define _display_face() , your recognize_faces() function is complete. You just wrote the backbone of your project, which takes an image with an unknown face, gets its encoding, checks that against all the encodings made during the training process, and then returns the most likely match for it.

You can now use this function when you want to recognize an unknown face. If you run your script at the end of this step, then Python will display the image for you with the predictions of who’s in the image baked right into the image:

Two characters from the TV show Seinfeld, labeled by the face recognizer

The next step is to validate your model to ensure that your model isn’t overfitted or tuned too specifically to the training data.

Model validation is a technique that tests your trained model by providing data that it hasn’t seen before but that you have. Knowing the correct label for each image allows you to get an idea of your model’s performance on new data.

At the most basic level, you’re just running your recognize_faces() function on images that already contain a known face. In step one , you created a validation directory that contains images with faces that you can recognize.

The function that you’ll build next will use pathlib to open each of the validation images and then call recognize_faces() on them:
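One way the function could look:

def validate(model: str = "hog"):
    # Run recognition on every file in the validation directory
    for filepath in Path("validation").rglob("*"):
        if filepath.is_file():
            recognize_faces(
                image_location=str(filepath.absolute()), model=model
            )


validate()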

You open the validation/ directory with pathlib.Path and then use .rglob() to get all the files in that directory. You confirm that each resource is a file before processing it. Then you call the recognize_faces() function from step three on the current image file.

Finally, you add a call to validate() so that you can test your script. If you run detector.py now, then Python will make all the images from within validation/ pop up with predictions baked right into the images.

A more robust validation could include accuracy measures and visualizations, such as a confusion matrix showing the true positives, true negatives, false positives, and false negatives from your validation run.

How else could you extend this? In addition to the traditional confusion matrix, you could calculate model evaluation measures such as overall accuracy and true positive rate, also known as recall .

Once you’ve built your validation function, it’s time to tie your app together and make it user-friendly.

To make sure that users can access your app’s functionality, you’ll build a command-line interface for your script using the standard library’s argparse module. Think about the types of tasks that you think your users might want to do before reading on.

What did you come up with? Maybe your user will want to:

  • Train the model
  • Validate the model
  • Evaluate an unknown image
  • Pick a model to use
  • Provide the filename of an unlabeled image

First, use argparse to set up the input arguments for each of these activities at the top of your file:
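A sketch of that setup; the -m flag name for the model is an assumption, while -f is described later in this tutorial:

import argparse  # goes with the other imports at the top

parser = argparse.ArgumentParser(description="Recognize faces in an image")
parser.add_argument("--train", action="store_true", help="Train on input data")
parser.add_argument("--validate", action="store_true", help="Validate trained model")
parser.add_argument("--test", action="store_true", help="Test the model with an unknown image")
parser.add_argument(
    "-m",
    action="store",
    default="hog",
    choices=["hog", "cnn"],
    help="Which model to use for training: hog (CPU), cnn (GPU)",
)
parser.add_argument("-f", action="store", help="Path to an image with an unknown face")
args = parser.parse_args()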

Here you import argparse at the top of your script. Then you create a few Boolean arguments, a limited-choice argument for picking the model used for training, and a string argument for getting the filename of the image that you want to check.

Next, you’ll use these arguments in the main part of your script to call the correct functions and pass in the correct arguments. The arguments passed to the script by the user are all attributes of the args variable that you create with parser.parse_args().

You’ll then set up a name-main idiom at the bottom of your script, and use the attributes of args there:
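A sketch of that block:

if __name__ == "__main__":
    if args.train:
        encode_known_faces(model=args.m)
    if args.validate:
        validate(model=args.m)
    if args.test:
        recognize_faces(image_location=args.f, model=args.m)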

And with that, you’re ready for the final step: play! Save your script and run it, testing out the options that you set up with argparse , including --help . You didn’t set that one up yourself, but argparse builds a nice help menu from all of the help parameters that you passed to .add_argument() .

In this step, you made your code more user-friendly by adding command-line arguments to create a simple user interface that allows your users to easily interact with your code.

Now that you’ve built your project, it’s time to actually perform face recognition. You might have saved and played with your program already, but it’s always worthwhile to take it for another spin. That way, you can diagnose bugs, uncover different uses, and more. Watch the video below for a short guided tour through your new project:

This video shows you all of the options that you can use to interact with your face recognizer. These are:

  • --help will show you a list of options, a description of what each of them does, and any arguments that they take.
  • --train will start the training process. You can optionally specify whether to use the CPU-based HOG method or a GPU-based CNN.
  • --validate will run the validation process, where the model takes images with known faces and tries to identify them correctly.
  • --test is the option that you’ll probably use the most. Use this along with the -f option to specify the location of an image with unknown faces that you want to identify. Under the hood, this works the same as validation except that you specify the image location yourself.

Before your first use, you’ll want to train the model with your training images. This will allow your model to be especially good at identifying those particular faces. You can check the accuracy of your model by running the validation process with new images of the same people in your training data and seeing if the labels match the faces.

If you’re not satisfied with the results, then try adding more images to your training data, retraining the model, and attempting validation again. If you got the desired results, then you can start using the --test option with images that you choose.

Congratulations, you’ve built your very own face recognition tool! With it, you can train a model to identify specific faces. Then, you can test that model against other images and give it images with unknown faces to identify.

You took it a step further, though. You also made your code user-friendly by anticipating your users’ needs and likely workflow, and you used argparse to build an interface to address those needs.

While building this project, you’ve learned how to:

  • Build usable datasets for face recognition
  • Use face_recognition to detect faces
  • Generate face encodings from detected face images
  • Recognize a known face in an unknown image
  • Use argparse to build a command-line interface
  • Use Pillow to draw bounding boxes

You built a face recognition application from start to finish and expanded your mastery of Python. Great work! What’s next?

There are several directions you can take this project now that you’ve finished it. Here are several ideas to build on your already-impressive project and stretch your newfound skills:

  • Extend this project to work with video . Your project will detect faces in each frame of a video and give real-time predictions for each detected face.
  • Change your training data. You can update this project to recognize the faces of your friends and family. How about a pet? Do you think it’ll work just as well with a pet’s face? Why or why not?
  • Build a portable security camera. The face_recognition package is tested to work on single-board computers like the Raspberry Pi . Can you adapt this project to work with a camera connected to a Raspberry Pi to identify people and alert you to unknown guests?

Can you think of more ways to extend this project? Post your ideas and suggestions in the comments below.


About Kyle Stratis


Kyle is a self-taught developer working as a senior data engineer at Vizit Labs. In the past, he has founded DanqEx (formerly Nasdanq: the original meme stock exchange) and Encryptid Gaming.

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Aldren Santos


Face Recognition

605 papers with code • 23 benchmarks • 64 datasets

Facial Recognition is the task of making a positive identification of a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then works on identification of those detected faces.

The state-of-the-art tables for this task mainly cover its two consistent subtasks: face verification and face identification.

( Image credit: Face Verification )

Benchmarks

The leaderboards for these benchmarks currently feature models such as GhostFaceNetV2-1, fine-tuned ArcFace variants, SFace, ArcFace+CSFM, QMagFace, MagFace, Prodpoly, FaceNet with an adaptive threshold, up-convolution models with DoG filters, multi-task models, FaceTransformer with octuplet loss, Partial FC, and MCN.


Most implemented papers

FaceNet: A Unified Embedding for Face Recognition and Clustering

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.

ArcFace: Additive Angular Margin Loss for Deep Face Recognition


Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability.

VGGFace2: A dataset for recognising faces across pose and age

The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.

SphereFace: Deep Hypersphere Embedding for Face Recognition

This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

A Light CNN for Deep Face Representation with Noisy Labels

This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels.

Learning Face Representation from Scratch

The current situation in the field of face recognition is that data is more important than algorithm.

Circle Loss: A Unified Perspective of Pair Similarity Optimization

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base.

DeepID3: Face Recognition with Very Deep Neural Networks

Very deep neural networks recently achieved great success on general object recognition because of their superb learning capacity.

Can we still avoid automatic face detection?


In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition?

Neural Networks for Face Recognition with TensorFlow

Michael Guerzhoy (University of Toronto and LKS-CHART, St. Michael's Hospital, [email protected]).

The assignment can be completed with a single CPU, but students should be encouraged to experiment with cloud services and GPUs.

Meta Information

Summary

Students build feedforward neural networks for face recognition using TensorFlow. Students then visualize the weights of the neural networks they train. The visualization allows students to understand feedforward one-hidden layer neural networks in terms of template matching, and allows students to explore overfitting. Using a framework such as TensorFlow allows for the students to be able to run a variety of experiments in order to obtain interesting visualizations. In the second part of the assignment, students use transfer learning and build a convolutional network to improve the performance of their face recognition system. For bonus marks, students visualize what the convolutional network is doing. Students work with a bare-bones and comprehensible implementation of AlexNet pretrained on ImageNet, and with a TensorFlow implementation of a neural network that classifies MNIST digits.

Topics

Feedforward neural networks, face recognition, weight visualization, overfitting, transfer learning, convolutional neural networks.

Audience

Third and fourth year students in Intro ML classes. Students need to have had practice with making medium-sized (several hundreds of lines of code) programs.

Difficulty

Students find this assignment quite difficult. Spending more than 10 hours on it is not unusual.

Strengths
Weaknesses
Dependencies: Students should have a good understanding of feedforward neural networks, including a brief explanation of visualizing the weights of the neural network. Students should have had a brief introduction to ConvNets and transfer learning. Slides are available upon request.
Variants: Several variants of the assignment are accessible to students.

Handout: HTML (Markdown source).

Lessons learned


Nacho García

Stories of a Machine Teacher

Face Recognition in R using Keras

Introduction

For millions of years, evolution has selected and improved the human ability to recognize faces. Yes! We humans are one of the few mammals able to recognize faces, and we are very good at it. During the course of our lives, we remember around 5,000 faces that we can later recall despite poor illumination conditions, major changes such as strong facial expressions, or the presence of beards, glasses, hats, and so on. The ability to recognize a face is one of those hard-encoded capabilities of our brains. Nobody taught you how to recognize a face; it is something that you just can do without knowing how.

Despite this apparent simplicity, training a computer to recognize a face is an extremely complex task, mainly because faces are indeed very similar. All faces follow the same patterns: they have two eyes, two ears, a nose, and a mouth in the same areas. What makes faces recognizable are millimetric differences, so how can we train a machine to recognize those details? Easy: using convolutional neural networks (CNNs).

CNNs are special types of neural networks in which the data is processed by one or several convolutional layers before being fed into a classical neural network. A convolutional layer processes each value of the input differently, depending on the neighboring data. If we are talking of images, the processed value for each pixel of the image will depend on the surrounding pixels and the rules to process them are what we call filters.

One of the main features of convolutional layers is that they are very good at finding patterns. Let's see how they work with a simple but very intuitive example: imagine a picture of 6x6 pixels in which each pixel can take only two values. One of the few shapes that it is possible to draw in such a rudimentary system is a diagonal line. Indeed, there are two possible diagonal lines: ascending and descending ones. It is possible to devise a simple convolutional layer consisting of one single filter that finds descending diagonals in the picture while ignoring the ascending ones:

CNN1

The output image is the result of multiplying the values of the input by the filter in the different areas of the image covered by the filter. If the CNN finds the pattern, the pattern is preserved; otherwise, it is filtered out:

CNN2

Of course, CNNs are much more complex than this. The filter can slide through the matrix pixel by pixel, the values obtained after applying the filter can be added together, etc… however, the concept is exactly the same.

So the idea behind a face recognition algorithm is to send the images through several convolutional layers to find the patterns that are unique for each person and link them to the identity of the subject through a neural net that we can train.

The Olivetti face database.

Since I just wanted to play around with these concepts and test different models, I needed a small database so that I didn't spend several hours training a model each time I wanted to change a parameter. For that purpose the Olivetti database is perfect: it consists of 400 B/W images of 40 different subjects (10 images per subject). The images are 92 x 112 pixels with 256 tones of grey (8 bits). You can download the original dataset from the AT&T lab here or a ready-to-use version here (the ready-to-use version is exactly the same as the original one, but for simplicity it contains all the images in the same folder, with the number of the subject already encoded in the name of the file).

Faces

Loading of images into the R environment.

If you are using the ready-to-use file you will find that the images have this name structure SX-Y.pgm where X is the number of the subject and Y the number of the picture. So once you have all the images ready in the same folder you can load them into R using something like this:

First, the script loads the imager library and stores the path where the images are located in a variable. Next, in order to make the script work independently of the number of images in the dataset, the first image in the folder is loaded, transformed into a matrix, and accommodated into an array with 4 dimensions: number of images, height, width, and 1 (since the images are B/W, one channel is enough to describe the color depth). Finally, all the images are sequentially loaded into the array with a for() loop.

By inspecting the array it is possible to see that all the grey values have already been normalized so the maximum possible white value is 1. To visualize the images in R you can use the image() function like this:

Face1

It seems that the data is loaded upside down in the array, but I don’t really care since this inversion is common to all the images in the array.

Next, I prepared the dependent variable to be matched with each image during the training step. To do that I extracted the number of the subject from the name of the file:

The last step before constructing the model is to divide the dataset into training and testing. I decided to use 9 images of each subject for the training process and the remaining image for testing purposes. The number in the sequence of pictures for testing is the same for all subjects and it is randomly defined.

Building the model.

To construct the model I used Keras, which is a very flexible and powerful library for machine learning. Although most of the tutorials and examples over the internet about Keras are based on Python, it is possible to use the library in R and this is what I am going to explain in this post.

As usual, the first time you use Keras you have to install the library:
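The installation commands could look like this:

install.packages("keras")
library(keras)
install_keras()  # pulls in TensorFlow, the backend that powers Keras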

These commands will install Keras and TensorFlow, which is the core of Keras. Once Keras has been installed it is possible to load it like the rest of the R libraries with library(keras)

With Keras it is possible to create recursive models in which some layers are reused or models with several inputs/output but the most simple and common type of models are the sequential models. In a sequential model, the data flows through the different layers to end up in the output layer where it is compared with the dependent variable during the training.

This face recognition model is a sequential model in which the data extracted from the images is transformed through the different layers to be compared in the last layer with the dependent variable to tune the weights of the model in order to minimize the loss function .

The use of the pipe operator %>% (Ctrl+Shift+M in RStudio) is extremely useful to add the different layers that make up the model:
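A sketch of the convolutional section of the model, assuming the image array from the loading step is called x_train (the filter count of the second layer isn't stated in the post):

library(keras)

model <- keras_model_sequential()

model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = dim(x_train)[-1]) %>%  # image dimensions and channel count
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2))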

In the CNN part, the data enters into a 2D convolutional layer with 32 filters of 3x3 size. In this layer the shape of the data is also defined by input_shape=c(width, height, channels). As the activation function, I used the most common one in CNNs, which is ReLU. Then the data from the first convolutional layer is processed by a second one to find patterns of a higher order. Finally, a max pooling layer is added for regularization. In this layer the data is downsampled, which also helps prevent overfitting. The pool size is 2x2, which means that the matrix is aggregated in 2-by-2 blocks and the maximum value of the 4 pixels is selected:

MaxPool

Now, let’s look at the neural network section:
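Continuing the same pipeline, a sketch of the dense section:

model %>%
  layer_flatten() %>%
  layer_dense(units = 1024, activation = "relu") %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 40, activation = "softmax")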

First, the data coming from the CNN is flattened, meaning that the array is reshaped into a vector with only one dimension. Then, the data is sent through two fully-connected layers of 1024 and 128 neurons with ReLU again as the activation function. Next, a regularization layer is added to drop out 30% of the neurons. Finally, an output layer with the same number of units as elements to classify (40 in this case) is added, the activation function of this layer is softmax , that means that for each prediction the probability of belonging to each one of the 40 classes is calculated.

Once the model is created it is possible to visualize it using summary(model)

The next step is to compile the model with the optimization parameters and the loss function :
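The compilation step might look like this:

model %>% compile(
  optimizer = optimizer_adam(),
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)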

For the optimization, I am using the Adam optimizer which is one of the state-of-the-art algorithms for weight tuning commonly used in image classification. The loss function here will be the cross-entropy . Here you can read more about the Adam optimizer and here you can read an interesting post where the author explains very well the cross-entropy.

There is one last step before training the model which is to do one-hot-encoding of the dependent variable. One-hot-encoding is an alternative way of representing the dependent variable in opposition to the label-encoding , which is the traditional way of showing it:

OHE

The to_categorical() function from Keras can do the job:
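For example, assuming the labels are the subject numbers 1 to 40 stored in train_labels and test_labels (to_categorical() expects classes that start at 0, hence the -1):

y_train_onehot <- to_categorical(train_labels - 1, num_classes = 40)
y_test_onehot  <- to_categorical(test_labels - 1, num_classes = 40)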

It is finally time to train the model:
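A sketch of the training call, assuming the image arrays are x_train and x_test:

history <- model %>% fit(
  x_train, y_train_onehot,
  batch_size = 10,
  epochs = 10,
  validation_data = list(x_test, y_test_onehot)
)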

The important parameters here are the batch size, which is the number of samples that are processed before the model is updated, and the number of epochs, which is the number of times that the entire dataset goes through the model. I set both to 10 as a starting point.

After 10 minutes of training on my slow computer (i5-4200U/4GB), the model achieves an impressive 92.5% prediction accuracy, with only 3 faces misclassified. This is an awesome score, considering that the model had only 9 samples per subject to train on and only ran through the whole dataset 10 times. Moreover, by looking at the shape of the training curve it is possible to anticipate that more epochs would lead to better predictions.

Loss

Another amazing feature of Keras/TensorFlow is the possibility of using TensorBoard , a kind of front-end for TensorFlow, which shows a lot of information presented in a beautiful way.

TensorBoard

Conclusions.

In this post, we have seen a very basic example of image recognition and classification in R with Keras. Using this playground it is possible to implement advanced models to solve more complex image-classification tasks. I hope you enjoy it as much as I did.

As always you can download the code of this post Here

Bibliography and Sources of Inspiration.

  • Keras for R
  • How to implement Deep Learning in R using Keras and Tensorflow


Face Recognition using Deep Learning CNN in Python

CNN case study

Convolutional Neural Networks (CNNs) changed the way we used to learn images. They made it very, very easy! A CNN mimics the way humans see images, by focusing on one portion of the image at a time and scanning the whole image.

A CNN boils down every image to a vector of numbers, which can be learned by the fully connected Dense layers of an ANN. More information about CNNs can be found here.

The diagram below summarizes the overall flow of the CNN algorithm.

CNN face recognition case study

In this case study, I will show you how to implement a face recognition model using CNN. You can use this template to create an image classification model on any group of images by putting them in a folder and creating a class.

Getting Images for the case study

You can download the data required for this case study here.

The data contains cropped face images of 16 people divided into Training and testing. We will train the CNN model using the images in the Training folder and then test the model by using the unseen images from the testing folder, to check if the model is able to recognise the face number of the unseen images or not.

'''This script uses a database of images and creates a CNN model on top of it to test'''

'''####### IMAGE PRE-PROCESSING for TRAINING and TESTING data #######'''
TrainingImagePath = '/Users/farukh/Python Case Studies/Face Images/Final Training Images'

from keras.preprocessing.image import ImageDataGenerator

# Data augmentation for the training images
train_datagen = ImageDataGenerator(
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)

# No augmentation for the test images
test_datagen = ImageDataGenerator()

# Generate batches of labeled images from the folder structure
training_set = train_datagen.flow_from_directory(
    TrainingImagePath,
    target_size=(64, 64),
    batch_size=32,
    class_mode='categorical')

test_set = test_datagen.flow_from_directory(
    TrainingImagePath,
    target_size=(64, 64),
    batch_size=32,
    class_mode='categorical')

# Mapping of face names to numeric class indices
test_set.class_indices

Reading image data for CNN

Creating a mapping for index and face names

The above class_indices dictionary has face names as keys and the numeric mapping as values. We need to swap it, because the classifier model will return the answer as the numeric mapping, and we need to get the face name out of it.

Also, since this is a multi-class classification problem, we are counting the number of unique faces, as that will be used as the number of output neurons in the output layer of fully connected ANN classifier.

'''############ Creating lookup table for all faces ############'''
TrainClasses = training_set.class_indices

# Swap keys and values so that a numeric prediction maps back to a face name
ResultMap = {}
for faceValue, faceName in zip(TrainClasses.values(), TrainClasses.keys()):
    ResultMap[faceValue] = faceName

# Save the mapping for later use
import pickle
with open("ResultsMap.pkl", 'wb') as fileWriteStream:
    pickle.dump(ResultMap, fileWriteStream)

print("Mapping of Face and its ID", ResultMap)

# The number of output neurons equals the number of distinct faces
OutputNeurons = len(ResultMap)
print('\n The Number of output neurons: ', OutputNeurons)

Face id mapping for CNN

Creating the CNN face recognition model

In the below code snippet, I have created a CNN model with

  • 2 hidden layers of convolution
  • 2 hidden layers of max pooling
  • 1 layer of flattening
  • 1 Hidden ANN layer
  • 1 output layer with 16-neurons (one for each face)

You can increase or decrease the convolution, max pooling, and hidden ANN layers and the number of neurons in it.

Just keep in mind, the more layers/neurons you add, the slower the model becomes.

Also, when you have a large number of images, to the tune of 50K and above, your laptop's CPU might not be efficient enough to learn from that many images. You will have to get a GPU-enabled laptop, or use cloud services like AWS or Google Cloud.

Since the data we have used for the demonstration is small containing only 244 images for training, you can run it on your laptop easily 🙂

Apart from selecting the best number of layers and the number of neurons in each layer, there are some hyperparameters that need to be tuned as well.

Take a quick look at some of the important hyperparameters

  • Filters=32: This number indicates how many filters we are using to look at the image pixels during the convolution step. Some filters may catch sharp edges, some may catch color variations, some may catch outlines, etc. In the end, we get important information from the images. In the first layer, 32 filters is a common choice, then the count increases in powers of 2: 64 in the next layer, 128 in the layer after that, and so on.
  • kernel_size=(5,5): This indicates the size of the sliding window during convolution. In this case study we are using a 5x5-pixel sliding window.
  • strides=(1, 1): How fast or slow the sliding window moves during convolution. We are using the lowest setting of 1x1 pixels, meaning the 5x5 (kernel_size) convolution window slides by 1 pixel along the x-axis and 1 pixel along the y-axis until the whole image is scanned.
  • input_shape=(64,64,3): Images are nothing but matrices of RGB color codes. During our data pre-processing we compressed the images to 64x64, hence the expected shape is 64x64x3, meaning 3 arrays of 64x64, one for each RGB color channel.
  • kernel_initializer='uniform': When the neurons start their computation, some algorithm has to decide the value for each initial weight. This parameter specifies that. You can choose different values for it, like 'normal' or 'glorot_uniform'.
  • activation='relu': This specifies the activation function for the calculations inside each neuron. You can choose values like 'relu', 'tanh', 'sigmoid', etc.
  • optimizer='adam': This parameter helps to find the optimum values of each weight in the neural network. 'adam' is one of the most useful optimizers; another one is 'rmsprop'.
  • batch_size=10: This specifies how many rows are passed to the network in one go, after which the loss calculation begins and the neural network starts adjusting its weights based on the errors. When all the rows have been passed in batches of 10 rows each, as specified by this parameter, we call that one epoch, or one full data cycle. This is also known as mini-batch gradient descent. A small batch_size makes the network look at the data slowly, like 2 or 4 rows at a time, which could lead to overfitting, compared to a large value like 20 or 50 rows at a time, which makes the network look at the data fast and could lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning.
  • Epochs=10: The same activity of adjusting weights continues for 10 times, as specified by this parameter. In simple terms, the network looks at the full training data 10 times and adjusts its weights.
'''######################## Create CNN deep learning model ########################'''
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense

'''Initializing the Convolutional Neural Network'''
classifier = Sequential()

''' STEP--1 Convolution '''
classifier.add(Convolution2D(32, kernel_size=(5, 5), strides=(1, 1),
                             input_shape=(64, 64, 3), activation='relu'))

'''# STEP--2 MAX Pooling'''
classifier.add(MaxPool2D(pool_size=(2, 2)))

'''############## ADDITIONAL LAYER of CONVOLUTION for better accuracy #################'''
classifier.add(Convolution2D(64, kernel_size=(5, 5), strides=(1, 1), activation='relu'))
classifier.add(MaxPool2D(pool_size=(2, 2)))

'''# STEP--3 FLattening'''
classifier.add(Flatten())

'''# STEP--4 Fully Connected Neural Network'''
classifier.add(Dense(64, activation='relu'))
classifier.add(Dense(OutputNeurons, activation='softmax'))

'''# Compiling the CNN'''
classifier.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])

# Measure how long training takes
import time
StartTime = time.time()

classifier.fit_generator(
    training_set,
    steps_per_epoch=30,
    epochs=10,
    validation_data=test_set,
    validation_steps=10)

EndTime = time.time()
print("###### Total Time Taken: ", round((EndTime - StartTime) / 60), 'Minutes ######')

Fitting the CNN model on the training data

Testing the CNN classifier on unseen images

Using any one of the images from the testing data folder, we can check if the model is able to recognize the face.

'''########### Making single predictions ###########'''
import numpy as np
from keras.preprocessing import image

ImagePath = '/Users/farukh/Python Case Studies/Face Images/Final Testing Images/face4/3face4.jpg'

# Load the image, convert it to an array, and add a batch dimension
test_image = image.load_img(ImagePath, target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)

result = classifier.predict(test_image, verbose=0)

print('####' * 10)
print('Prediction is: ', ResultMap[np.argmax(result)])

Prediction for a single face

The model has predicted this face correctly! You can try for other faces and see if it gets recognized. You can also add your own pics and train the model again.

You can modify this template to create a classification model for any group of images. Just put the images of each category in its respective folder and train the model.

The CNN algorithm has helped us create many great applications around us! Facebook is the perfect example! It has trained its DeepFace CNN model on millions of images and reports around 97% accuracy at recognizing anyone on Facebook. This may surpass even humans, as you can remember only a few faces 🙂

CNNs are being used in the medical industry as well, to help doctors get an early prediction about whether a tumor is benign or malignant from tumor images, to get an idea about diseases like typhoid from X-ray images, and so on.

The uses of CNNs are many, and they are developing fast all around us!

I hope that after reading this post, you are a little more confident about implementing the CNN algorithm for some use cases in your projects!


35 thoughts on “Face Recognition using Deep Learning CNN in Python”


ResultMap[faceValue]=faceName getting error for this line, could you please help


Can you please tell me how you solved this problem?

I was able to solve the issue I was getting. Wonderful article, many thanks for sharing!


Thank you Suman! I am glad you liked it!


Will this categorize an image that is not in the training set?

Yes, the test folder which has been used in the example for single predictions was totally unseen by the model.

For other implementations, just make sure the target size of the image is same as the training data while passing a new image to check.


ResultMap = {}
for faceValue, faceName in zip(TrainClasses.values(), TrainClasses.keys()):
    ResultMap[faceValue] = faceName

Is this the correct code?

Hi Sunny, what is the exact issue you are facing? If you can send me a screenshot of the command and the error, I will be able to help.


I am also facing that issue. - Deepak


I am getting an error while training the model, I get: WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 100 batches). You may need to use the repeat() function when building your dataset.

So the training is not working and the accuracy is 0.0492, should I change anything?

Please use a smaller steps_per_epoch value.

Regards, Farukh Hashmi

Just fixed it, the steps_per_epoch value must be set to 8


Whether the total number of epochs is 10 or 8, the accuracy level is always less than 0.07, and the model could not correctly identify any of the images I tried, over several attempts.

Hi Amalendu,

Can you try once by increasing the neurons in the Dense layer to 128 or 150? Let me know if it works.


I have the same issue and tried increasing dense layer and it still identifies incorrectly with very low accuracy level, help!

Hi Abdullah,

Are you using the same data as the case study?

Can you share a little more information about the data/config so that I can help

Regards Farukh Hashmi


I have my own data for training this model, but can you tell me where the data-splitting code is? Are you splitting the data before training? The training and test data both have the same path, i.e. TrainingImagePath.

The split happens based on the folders themselves. I have used the same images for train and test, and kept the testing folder images to check the model performance manually in the last section. If you want to split your data, please keep them in separate folders and provide different paths for training and testing. Hope that helps!


I would like to ask what version of Keras you are using for this. I keep getting this error: from tensorflow.python.eager.context import get_config ImportError: cannot import name 'get_config'

I have found some solutions online, and they mention that this error may be caused by the version of the library when copying someone else's code.



Hi Farukh Hashmi,

Thanks for the program you provided.


Hey sir! I got a problem with the testing. The image location works elsewhere, but here I get a traceback error: "No such file or directory".

Hi Muhammad,

Can you share a screenshot of the error? I might be able to help.


This is the best article! Thank you, guys!!


For the testing part, I’m receiving this :- “AttributeError: module ‘keras.preprocessing.image’ has no attribute ‘load_img'”

Please help me out in this part.


Try using "from tensorflow.keras.preprocessing import image" instead of "from keras.preprocessing import image".


How can we use this for live video detection once the model is trained?

Has anyone done live recognition through a webcam from this trained model? Please do let me know.


Hi, this is really helpful. I tried the code and data, and it worked, but the result is always wrong. Does this result make sense? Thank you!

Can you try increasing the number of neurons in the hidden layer to 128 or 150? The accuracy will improve with parameter tuning if you are not getting good results from the out-of-the-box code.


Hey, I am getting this error

WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 100 batches). You may need to use the repeat() function when building your dataset.

I tried reducing the steps per epoch, but I am still getting the error. Also, I have a dataset of 10 people with 100 images each, but the accuracy I am getting is around 9%, which is very low. I have also observed that when I train the model with 2 images the accuracy is around 50%, and with 10 images it is around 10%. If you are aware of why this is happening, please let me know! Thanks.


training_set = train_datagen.flow_from_directory(TrainingImagePath, target_size=(64, 64), batch_size=32, class_mode='categorical') - I am getting a "file not found" error on this line. Please help.


Hello dear … I urgently need code to classify masked faces using Siamese networks, if available.



Table of Contents

  • Face Recognition
  • Preparing the Data
  • Algorithmic Description of Eigenfaces Method
  • Eigenfaces in OpenCV
  • Algorithmic Description of Fisherfaces Method
  • Fisherfaces in OpenCV
  • Algorithmic Description of LBPH Method
  • Local Binary Patterns Histograms in OpenCV
  • The Database of Faces
  • Yale Facedatabase A
  • Yale Facedatabase B
  • Creating the CSV File
  • Aligning Face Images
  • CSV for the AT&T Facedatabase

Introduction

OpenCV (Open Source Computer Vision) is a popular computer vision library started by Intel in 1999. The cross-platform library sets its focus on real-time image processing and includes patent-free implementations of the latest computer vision algorithms. In 2008 Willow Garage took over support and OpenCV 2.3.1 now comes with a programming interface to C, C++, Python and Android . OpenCV is released under a BSD license so it is used in academic projects and commercial products alike.

OpenCV 2.4 now comes with the very new FaceRecognizer class for face recognition, so you can start experimenting with face recognition right away. This document is the guide I wished for when I was working my way into face recognition. It shows you how to perform face recognition with FaceRecognizer in OpenCV (with full source code listings) and gives you an introduction to the algorithms behind it. I'll also show how to create the visualizations you can find in many publications, because a lot of people have asked for them.

The currently available algorithms are:

  • Eigenfaces (see EigenFaceRecognizer::create)
  • Fisherfaces (see FisherFaceRecognizer::create)
  • Local Binary Patterns Histograms (see LBPHFaceRecognizer::create)

You don't need to copy and paste the source code examples from this page, because they are available in the src folder coming with this documentation. If you have built OpenCV with the samples turned on, chances are good you have them compiled already! Although it might be interesting for very advanced users, I've decided to leave the implementation details out as I am afraid they confuse new users.

All code in this document is released under the BSD license , so feel free to use it for your projects.

Face recognition is an easy task for humans. Experiments in [217] have shown, that even one to three day old babies are able to distinguish between known faces. So how hard could it be for a computer? It turns out we know little about human recognition to date. Are inner features (eyes, nose, mouth) or outer features (head shape, hairline) used for a successful face recognition? How do we analyze an image and how does the brain encode it? It was shown by David Hubel and Torsten Wiesel , that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Since we don't see the world as scattered pieces, our visual cortex must somehow combine the different sources of information into useful patterns. Automatic face recognition is all about extracting those meaningful features from an image, putting them into a useful representation and performing some kind of classification on them.

Face recognition based on the geometric features of a face is probably the most intuitive approach to face recognition. One of the first automated face recognition systems was described in [115] : marker points (position of eyes, ears, nose, ...) were used to build a feature vector (distance between the points, angle between them, ...). The recognition was performed by calculating the euclidean distance between feature vectors of a probe and reference image. Such a method is robust against changes in illumination by its nature, but has a huge drawback: the accurate registration of the marker points is complicated, even with state of the art algorithms. Some of the latest work on geometric face recognition was carried out in [37] . A 22-dimensional feature vector was used and experiments on large datasets have shown, that geometrical features alone may not carry enough information for face recognition.

The Eigenfaces method described in [218] took a holistic approach to face recognition: A facial image is a point from a high-dimensional image space and a lower-dimensional representation is found, where classification becomes easy. The lower-dimensional subspace is found with Principal Component Analysis, which identifies the axes with maximum variance. While this kind of transformation is optimal from a reconstruction standpoint, it doesn't take any class labels into account. Imagine a situation where the variance is generated from external sources, let it be light. The axes with maximum variance do not necessarily contain any discriminative information at all, hence a classification becomes impossible. So a class-specific projection with a Linear Discriminant Analysis was applied to face recognition in [17] . The basic idea is to minimize the variance within a class, while maximizing the variance between the classes at the same time.

Recently various methods for a local feature extraction emerged. To avoid the high-dimensionality of the input data only local regions of an image are described, the extracted features are (hopefully) more robust against partial occlusion, illumation and small sample size. Algorithms used for a local feature extraction are Gabor Wavelets ( [238] ), Discrete Cosinus Transform ( [154] ) and Local Binary Patterns ( [3] ). It's still an open research question what's the best way to preserve spatial information when applying a local feature extraction, because spatial information is potentially useful information.

Face Database

Let's get some data to experiment with first. I don't want to do a toy example here. We are doing face recognition, so you'll need some face images! You can either create your own dataset or start with one of the available face databases, http://face-rec.org/databases/ gives you an up-to-date overview. Three interesting databases are (parts of the description are quoted from http://face-rec.org ):

  • AT&T Facedatabase The AT&T Facedatabase, sometimes also referred to as ORL Database of Faces , contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

  • Yale Facedatabase A, also known as Yalefaces. The AT&T Facedatabase is good for initial tests, but it's a fairly easy database. The Eigenfaces method already has a 97% recognition rate on it, so you won't see any great improvements with other algorithms. The Yale Facedatabase A is a more appropriate dataset for initial experiments, because the recognition problem is harder. The database consists of 15 people (14 male, 1 female) each with 11 grayscale images sized \(320 \times 243\) pixel. There are changes in the light conditions (center light, left light, right light), facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no-glasses).

The original images are not cropped and aligned. Please look into the Appendix for a Python script, that does the job for you.

  • Extended Yale Facedatabase B The Extended Yale Facedatabase B contains 2414 images of 38 different people in its cropped version. The focus of this database is set on extracting features that are robust to illumination; the images have almost no variation in emotion/occlusion/... . I personally think that this dataset is too large for the experiments I perform in this document. You had better use the AT&T Facedatabase for initial testing. A first version of the Yale Facedatabase B was used in [17] to see how the Eigenfaces and Fisherfaces methods perform under heavy illumination changes. [125] used the same setup to take 16128 images of 28 people. The Extended Yale Facedatabase B is the merge of the two databases, which is now known as the Extended Yale Facedatabase B.

Once we have acquired some data, we'll need to read it in our program. In the demo applications I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a filename followed by a ; followed by the label (as integer number ), making up a line like this:
/path/to/image.ext;0
Let's dissect the line. /path/to/image.ext is the path to an image, probably something like this if you are in Windows: C:/faces/person0/image0.jpg. Then there is the separator ; and finally we assign the label 0 to the image. Think of the label as the subject (the person) this image belongs to, so same subjects (persons) should have the same label.

Download the AT&T Facedatabase and the corresponding CSV file at.txt, which looks like this (the real file is without the ..., of course):
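The file contents are not reproduced in this extract; illustratively (paths are placeholders consistent with the Search & Replace step below), the lines look like this:

./at/s1/1.pgm;0
./at/s1/2.pgm;0
...
./at/s2/1.pgm;1
./at/s2/2.pgm;1
...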

Imagine I have extracted the files to D:/data/at and have downloaded the CSV file to D:/data/at.txt. Then you would simply need to Search & Replace ./ with D:/data/. You can do that in an editor of your choice, every sufficiently advanced editor can do this. Once you have a CSV file with valid filenames and labels, you can run any of the demos by passing the path to the CSV file as parameter:

Please, see Creating the CSV File for details on creating CSV file.

The problem with the image representation we are given is its high dimensionality. Two-dimensional \(p \times q\) grayscale images span a \(m = pq\)-dimensional vector space, so an image with \(100 \times 100\) pixels lies in a \(10,000\)-dimensional image space already. The question is: Are all dimensions equally useful for us? We can only make a decision if there's any variance in data, so what we are looking for are the components that account for most of the information. The Principal Component Analysis (PCA) was independently proposed by Karl Pearson (1901) and Harold Hotelling (1933) to turn a set of possibly correlated variables into a smaller set of uncorrelated variables. The idea is, that a high-dimensional dataset is often described by correlated variables and therefore only a few meaningful dimensions account for most of the information. The PCA method finds the directions with the greatest variance in the data, called principal components.

Let \(X = \{ x_{1}, x_{2}, \ldots, x_{n} \}\) be a random vector with observations \(x_i \in R^{d}\).

Compute the mean \(\mu\)

\[\mu = \frac{1}{n} \sum_{i=1}^{n} x_{i}\]

Compute the covariance matrix \(S\)

\[S = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu) (x_{i} - \mu)^{T}\]

Compute the eigenvalues \(\lambda_{i}\) and eigenvectors \(v_{i}\) of \(S\)

\[S v_{i} = \lambda_{i} v_{i}, i=1,2,\ldots,n\]

  • Order the eigenvectors descending by their eigenvalue. The \(k\) principal components are the eigenvectors corresponding to the \(k\) largest eigenvalues.

The \(k\) principal components of the observed vector \(x\) are then given by:

\[y = W^{T} (x - \mu)\]

where \(W = (v_{1}, v_{2}, \ldots, v_{k})\).

The reconstruction from the PCA basis is given by:

\[x = W y + \mu\]
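As a minimal NumPy sketch (not OpenCV's implementation) of the projection \(y = W^{T}(x - \mu)\) and the reconstruction \(x \approx W y + \mu\) defined above, assuming X holds one flattened sample per row; computing the full covariance matrix this way is only feasible for small dimensions (see the trick discussed below):

import numpy as np

def pca(X, k):
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)                  # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]     # k largest eigenvalues
    return mu, W

def project(x, mu, W):
    return W.T @ (x - mu)                             # y = W^T (x - mu)

def reconstruct(y, mu, W):
    return W @ y + mu                                 # x ~= W y + mu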

The Eigenfaces method then performs face recognition by:

  • Projecting all training samples into the PCA subspace.
  • Projecting the query image into the PCA subspace.
  • Finding the nearest neighbor between the projected training images and the projected query image.

Still there's one problem left to solve. Imagine we are given \(400\) images sized \(100 \times 100\) pixel. The Principal Component Analysis solves the covariance matrix \(S = X X^{T}\), where \({size}(X) = 10000 \times 400\) in our example. You would end up with a \(10000 \times 10000\) matrix, roughly \(0.8 GB\). Solving this problem isn't feasible, so we'll need to apply a trick. From your linear algebra lessons you know that a \(M \times N\) matrix with \(M > N\) can only have \(N - 1\) non-zero eigenvalues. So it's possible to take the eigenvalue decomposition \(S = X^{T} X\) of size \(N \times N\) instead:

\[X^{T} X v_{i} = \lambda_{i} v_{i}\]

and get the original eigenvectors of \(S = X X^{T}\) with a left multiplication of the data matrix:

\[X X^{T} (X v_{i}) = \lambda_{i} (X v_{i})\]

The resulting eigenvectors are orthogonal, to get orthonormal eigenvectors they need to be normalized to unit length. I don't want to turn this into a publication, so please look into [60] for the derivation and proof of the equations.
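A small NumPy sketch of this trick, under the assumption that X is the \(d \times N\) matrix of mean-centred, flattened images (one image per column):

import numpy as np

def pca_small_trick(X, k):
    # Solve the small N x N eigenproblem X^T X v = lambda v ...
    eigvals, v = np.linalg.eigh(X.T @ X)
    v = v[:, np.argsort(eigvals)[::-1][:k]]
    # ... and map back: X X^T (X v) = lambda (X v)
    u = X @ v
    u /= np.linalg.norm(u, axis=0)                    # normalise to unit length
    return u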

For the first source code example, I'll go through it with you. I am first giving you the whole source code listing, and after this we'll look at the most important lines in detail. Please note: every source code listing is commented in detail, so you should have no problems following it.

The source code for this demo application is also available in the src folder coming with this documentation:
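The full listing is not reproduced in this extract. As a rough Python counterpart of the C++ demo (a sketch, not the official sample), the cv2.face module from opencv-contrib-python exposes the same recognizers; the read_csv helper and the file path below are assumptions:

import cv2
import numpy as np

def read_csv(csv_path):
    """Read 'filename;label' lines into grayscale images and integer labels."""
    images, labels = [], []
    with open(csv_path) as f:
        for line in f:
            path, label = line.strip().split(';')
            images.append(cv2.imread(path, cv2.IMREAD_GRAYSCALE))
            labels.append(int(label))
    return images, np.array(labels, dtype=np.int32)

images, labels = read_csv('at.txt')                   # placeholder CSV path
model = cv2.face.EigenFaceRecognizer_create()
model.train(images, labels)

label, confidence = model.predict(images[0])          # query with any probe image
print('Predicted label:', label, 'confidence:', confidence)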

I've used the jet colormap, so you can see how the grayscale values are distributed within the specific Eigenfaces. You can see, that the Eigenfaces do not only encode facial features, but also the illumination in the images (see the left light in Eigenface #4, right light in Eigenfaces #5):

[Figure: eigenfaces_opencv.png]

We've already seen, that we can reconstruct a face from its lower dimensional approximation. So let's see how many Eigenfaces are needed for a good reconstruction. I'll do a subplot with \(10,30,\ldots,310\) Eigenfaces:

10 Eigenvectors are obviously not sufficient for a good image reconstruction, while 50 Eigenvectors may already be sufficient to encode the important facial features. You'll get a good reconstruction with approximately 300 Eigenvectors for the AT&T Facedatabase. There are rules of thumb for how many Eigenfaces you should choose for successful face recognition, but it heavily depends on the input data. [255] is a good point to start researching this:

[Figure: eigenface_reconstruction_opencv.png]

Fisherfaces

The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn't consider any classes and so a lot of discriminative information may be lost when throwing components away. Imagine a situation where the variance in your data is generated by an external source, let it be the light. The components identified by a PCA do not necessarily contain any discriminative information at all, so the projected samples are smeared together and a classification becomes impossible (see http://www.bytefish.de/wiki/pca_lda_with_gnu_octave for an example).

The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented by the great statistician Sir R. A. Fisher . He successfully used it for classifying flowers in his 1936 paper The use of multiple measurements in taxonomic problems [75] . In order to find the combination of features that separates best between classes the Linear Discriminant Analysis maximizes the ratio of between-classes to within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes should cluster tightly together, while different classes are as far away as possible from each other in the lower-dimensional representation. This was also recognized by Belhumeur , Hespanha and Kriegman and so they applied a Discriminant Analysis to face recognition in [17] .

Let \(X\) be a random vector with samples drawn from \(c\) classes:

\[\begin{align*} X & = & \{X_1,X_2,\ldots,X_c\} \\ X_i & = & \{x_1, x_2, \ldots, x_n\} \end{align*}\]

The scatter matrices \(S_{B}\) and \(S_{W}\) are calculated as:

\[\begin{align*} S_{B} & = & \sum_{i=1}^{c} N_{i} (\mu_i - \mu)(\mu_i - \mu)^{T} \\ S_{W} & = & \sum_{i=1}^{c} \sum_{x_{j} \in X_{i}} (x_j - \mu_i)(x_j - \mu_i)^{T} \end{align*}\]

, where \(\mu\) is the total mean:

\[\mu = \frac{1}{N} \sum_{i=1}^{N} x_i\]

And \(\mu_i\) is the mean of class \(i \in \{1,\ldots,c\}\):

\[\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j\]

Fisher's classic algorithm now looks for a projection \(W\), that maximizes the class separability criterion:

\[W_{opt} = \operatorname{arg\,max}_{W} \frac{|W^T S_B W|}{|W^T S_W W|}\]

Following [17] , a solution for this optimization problem is given by solving the General Eigenvalue Problem:

\[\begin{align*} S_{B} v_{i} & = & \lambda_{i} S_w v_{i} \nonumber \\ S_{W}^{-1} S_{B} v_{i} & = & \lambda_{i} v_{i} \end{align*}\]

There's one problem left to solve: The rank of \(S_{W}\) is at most \((N-c)\), with \(N\) samples and \(c\) classes. In pattern recognition problems the number of samples \(N\) is almost always smaller than the dimension of the input data (the number of pixels), so the scatter matrix \(S_{W}\) becomes singular (see [180]). In [17] this was solved by performing a Principal Component Analysis on the data and projecting the samples into the \((N-c)\)-dimensional space. A Linear Discriminant Analysis was then performed on the reduced data, because \(S_{W}\) isn't singular anymore.

The optimization problem can then be rewritten as:

\[\begin{align*} W_{pca} & = & \operatorname{arg\,max}_{W} |W^T S_T W| \\ W_{fld} & = & \operatorname{arg\,max}_{W} \frac{|W^T W_{pca}^T S_{B} W_{pca} W|}{|W^T W_{pca}^T S_{W} W_{pca} W|} \end{align*}\]

The transformation matrix \(W\), that projects a sample into the \((c-1)\)-dimensional space is then given by:

\[W = W_{fld}^{T} W_{pca}^{T}\]
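A minimal NumPy sketch of solving \(S_{W}^{-1} S_{B} v = \lambda v\) for the LDA projection described above, assuming X holds one sample per row, y the integer class labels, and that \(S_{W}\) is invertible (i.e. the PCA reduction discussed above has already been applied):

import numpy as np

def lda(X, y, k):
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
        Sw += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:k]           # at most c-1 useful components
    return eigvecs[:, order].real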

For this example I am going to use the Yale Facedatabase A, just because the plots are nicer. Each Fisherface has the same length as an original image, thus it can be displayed as an image. The demo shows (or saves) the first, at most 16 Fisherfaces:

[Figure: fisherfaces_opencv.png]

The Fisherfaces method learns a class-specific transformation matrix, so they do not capture illumination as obviously as the Eigenfaces method does. The Discriminant Analysis instead finds the facial features that discriminate between the persons. It's important to mention that the performance of the Fisherfaces heavily depends on the input data as well. Practically speaking: if you learn the Fisherfaces for well-illuminated pictures only and then try to recognize faces in badly illuminated scenes, the method is likely to find the wrong components (just because those features may not be predominant in badly illuminated images). This is somewhat logical, since the method had no chance to learn the illumination.

The Fisherfaces allow a reconstruction of the projected image, just like the Eigenfaces did. But since we only identified the features to distinguish between subjects, you can't expect a nice reconstruction of the original image. For the Fisherfaces method we'll project the sample image onto each of the Fisherfaces instead. So you'll have a nice visualization, which feature each of the Fisherfaces describes:

The differences may be subtle for the human eyes, but you should be able to see some differences:

[Figure: fisherface_reconstruction_opencv.png]

Local Binary Patterns Histograms

Eigenfaces and Fisherfaces take a somewhat holistic approach to face recognition. You treat your data as a vector somewhere in a high-dimensional image space. We all know high-dimensionality is bad, so a lower-dimensional subspace is identified, where (probably) useful information is preserved. The Eigenfaces approach maximizes the total scatter, which can lead to problems if the variance is generated by an external source, because components with a maximum variance over all classes aren't necessarily useful for classification (see http://www.bytefish.de/wiki/pca_lda_with_gnu_octave ). So to preserve some discriminative information we applied a Linear Discriminant Analysis and optimized as described in the Fisherfaces method. The Fisherfaces method worked great... at least for the constrained scenario we've assumed in our model.

Now real life isn't perfect. You simply can't guarantee perfect light settings in your images or 10 different images of a person. So what if there's only one image for each person? Our covariance estimates for the subspace may be horribly wrong, so will the recognition. Remember the Eigenfaces method had a 96% recognition rate on the AT&T Facedatabase? How many images do we actually need to get such useful estimates? Here are the Rank-1 recognition rates of the Eigenfaces and Fisherfaces method on the AT&T Facedatabase, which is a fairly easy image database:

[Figure: at_database_small_sample_size.png]

So in order to get good recognition rates you'll need at least 8(+-1) images for each person and the Fisherfaces method doesn't really help here. The above experiment is a 10-fold cross validated result carried out with the facerec framework at: https://github.com/bytefish/facerec . This is not a publication, so I won't back these figures with a deep mathematical analysis. Please have a look into [149] for a detailed analysis of both methods, when it comes to small training datasets.

So some research concentrated on extracting local features from images. The idea is to not look at the whole image as a high-dimensional vector, but to describe only local features of an object. The features you extract this way will implicitly have a low dimensionality. A fine idea! But you'll soon observe that the image representation we are given doesn't only suffer from illumination variations. Think of things like scale, translation or rotation in images - your local description has to be at least somewhat robust against those things. Just like SIFT, the Local Binary Patterns methodology has its roots in 2D texture analysis. The basic idea of Local Binary Patterns is to summarize the local structure in an image by comparing each pixel with its neighborhood. Take a pixel as center and threshold its neighbors against it. If the intensity of the center pixel is greater than or equal to its neighbor, denote it with 1, and with 0 if not. You'll end up with a binary number for each pixel (for example, 11001111).

So with 8 surrounding pixels you'll end up with \(2^8\) possible combinations, called Local Binary Patterns or sometimes referred to as LBP codes. The first LBP operator described in the literature actually used a fixed 3 x 3 neighborhood, just like this:

[Figure: lbp.png]

A more formal description of the LBP operator can be given as:

\[LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^p s(i_p - i_c)\]

, with \((x_c, y_c)\) as the central pixel with intensity \(i_c\), and \(i_p\) being the intensity of the neighbor pixel. \(s\) is the sign function defined as:

\[\begin{equation} s(x) = \begin{cases} 1 & \text{if \(x \geq 0\)}\\ 0 & \text{else} \end{cases} \end{equation}\]

This description enables you to capture very fine-grained details in images. In fact, the authors were able to compete with state-of-the-art results for texture classification. Soon after the operator was published it was noted that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood in [3]. The idea is to align an arbitrary number of neighbors on a circle with a variable radius, which enables capturing the following neighborhoods:

[Figure: patterns.png]

For a given Point \((x_c,y_c)\) the position of the neighbor \((x_p,y_p), p \in P\) can be calculated by:

\[\begin{align*} x_{p} & = & x_c + R \cos({\frac{2\pi p}{P}})\\ y_{p} & = & y_c - R \sin({\frac{2\pi p}{P}}) \end{align*}\]

Where \(R\) is the radius of the circle and \(P\) is the number of sample points.

The operator is an extension of the original LBP codes, so it's sometimes called Extended LBP (also referred to as Circular LBP). If a point's coordinate on the circle doesn't correspond to image coordinates, the point gets interpolated. Computer science has a bunch of clever interpolation schemes; the OpenCV implementation does a bilinear interpolation:

\[\begin{align*} f(x,y) \approx \begin{bmatrix} 1-x & x \end{bmatrix} \begin{bmatrix} f(0,0) & f(0,1) \\ f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix} 1-y \\ y \end{bmatrix}. \end{align*}\]

By definition the LBP operator is robust against monotonic gray scale transformations. We can easily verify this by looking at the LBP image of an artificially modified image (so you see what an LBP image looks like!):

[Figure: lbp_yale.jpg]

So what's left to do is to incorporate the spatial information into the face recognition model. The representation proposed by Ahonen et al. [3] is to divide the LBP image into \(m\) local regions and extract a histogram from each. The spatially enhanced feature vector is then obtained by concatenating the local histograms (not merging them). These histograms are called Local Binary Patterns Histograms.
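A minimal NumPy sketch (not OpenCV's implementation) of the two ideas above: the fixed 3x3 LBP operator, and the spatially enhanced feature vector obtained by concatenating one 256-bin histogram per region. The 8x8 grid size is an assumption.

import numpy as np

def lbp_3x3(img):
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    code = np.zeros((h - 2, w - 2), dtype=np.int32)
    # 8 neighbour offsets, walked clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (neighbour >= center) * (1 << p)      # 2^p * s(i_p - i_c)
    return code

def lbph_features(lbp_image, grid_x=8, grid_y=8):
    h, w = lbp_image.shape
    hists = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            region = lbp_image[gy * h // grid_y:(gy + 1) * h // grid_y,
                               gx * w // grid_x:(gx + 1) * w // grid_x]
            hist, _ = np.histogram(region, bins=256, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists)                      # concatenated, not merged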

You've learned how to use the new FaceRecognizer in real applications. After reading the document you also know how the algorithms work, so now it's time for you to experiment with the available algorithms. Use them, improve them and let the OpenCV community participate!

This document wouldn't be possible without the kind permission to use the face images of the AT&T Database of Faces and the Yale Facedatabase A/B .

Important: when using these images, please give credit to "AT&T Laboratories, Cambridge."

The Database of Faces, formerly The ORL Database of Faces , contains a set of face images taken between April 1992 and April 1994. The database was used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department.

There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

The files are in PGM format. The size of each image is 92x112 pixels, with 256 grey levels per pixel. The images are organised in 40 directories (one for each subject), which have names of the form sX, where X indicates the subject number (between 1 and 40). In each of these directories, there are ten different images of that subject, which have names of the form Y.pgm, where Y is the image number for that subject (between 1 and 10).

A copy of the database can be retrieved from: http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip .

With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.

The Yale Face Database A (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. (Source: http://cvc.yale.edu/projects/yalefaces/yalefaces.html )

The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as the Yale Face Database B. Please refer to the homepage of the Yale Face Database B (or one copy of this page) for more detailed information of the data format.

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001, [bibtex].

The extended database as opposed to the original Yale Face Database B with 10 subjects was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting, PAMI, May, 2005 [pdf] ." All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images. If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well. (Source: http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html )

You don't really want to create the CSV file by hand. I have prepared a little Python script, create_csv.py (you find it at src/create_csv.py coming with this tutorial), that automatically creates a CSV file for you. If you have your images in a hierarchy like this (/basepath/<subject>/<image.ext>):

Then simply call create_csv.py with the base path of the folder (here at) as its argument, and you can save the output to a file:

Here is the script, if you can't find it:
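The script itself is not included in this extract; a hedged reconstruction that walks /basepath/<subject>/<image.ext> and prints one "filename;label" line per image (giving every subject folder its own integer label) could look like this:

import os
import sys

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: create_csv.py <base_path>')
        sys.exit(1)
    base_path = sys.argv[1]
    separator = ';'
    label = 0
    for subject in sorted(os.listdir(base_path)):
        subject_path = os.path.join(base_path, subject)
        if not os.path.isdir(subject_path):
            continue
        for filename in sorted(os.listdir(subject_path)):
            print(os.path.join(subject_path, filename) + separator + str(label))
        label += 1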

An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... you don't want to do this by hand. So I've prepared a tiny Python script. The code is really easy to use. To scale, rotate and crop the face image you just need to call CropFace(image, eye_left, eye_right, offset_pct, dest_sz), where:

  • eye_left is the position of the left eye
  • eye_right is the position of the right eye
  • offset_pct is the percent of the image you want to keep next to the eyes (horizontal, vertical direction)
  • dest_sz is the size of the output image

If you are using the same offset_pct and dest_sz for your images, they are all aligned at the eyes.

Imagine we are given this photo of Arnold Schwarzenegger, which is under a Public Domain license. The (x,y)-position of the eyes is approximately (252, 364) for the left and (420, 366) for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have.
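As a usage sketch only: CropFace is the helper from the appendix script mentioned above; the module name 'crop_face' and the file names below are assumptions.

from PIL import Image
from crop_face import CropFace          # assumed module name for the appendix script

img = Image.open('arnold.jpg')          # placeholder input file
aligned = CropFace(img,
                   eye_left=(252, 364), eye_right=(420, 366),
                   offset_pct=(0.2, 0.2), dest_sz=(200, 200))
aligned.save('arnold_20_20_200_200.jpg')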

Here are some examples:

Configuration (horizontal offset, vertical offset, dest_sz):

  • 0.1 (10%), 0.1 (10%), (200,200)
  • 0.2 (20%), 0.2 (20%), (200,200)
  • 0.3 (30%), 0.3 (30%), (200,200)
  • 0.2 (20%), 0.2 (20%), (70,70)

(The corresponding cropped, scaled and rotated face images are not reproduced here.)


Deep-Learning-Specialization-Coursera

This repo contains the updated version of all the assignments/labs (done by me) of the Deep Learning Specialization on Coursera by Andrew Ng. It includes building various deep learning models from scratch and implementing them for object detection, facial recognition, autonomous driving, neural machine translation, trigger word detection, etc.

GitHub Repo

Announcement

[!IMPORTANT] Check our latest paper (accepted in ICDAR’23) on Urdu OCR

UTRNet

This repo contains all of the solved assignments of Coursera’s most famous Deep Learning Specialization of 5 courses offered by deeplearning.ai

Instructor: Prof. Andrew Ng

This Specialization was updated in April 2021 to include developments in deep learning and programming frameworks. One of the biggest changes was the shift from TensorFlow 1 to TensorFlow 2. New material was also added. However, most of the old online repositories still don't have the updated code. This repo contains the updated versions of the assignments. Happy learning :)

Programming Assignments

Course 1: Neural Networks and Deep Learning

  • W2A1 - Logistic Regression with a Neural Network mindset
  • W2A2 - Python Basics with Numpy
  • W3A1 - Planar data classification with one hidden layer
  • W4A1 - Building your Deep Neural Network: Step by Step
  • W4A2 - Deep Neural Network for Image Classification: Application

Course 2: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

  • W1A1 - Initialization
  • W1A2 - Regularization
  • W1A3 - Gradient Checking
  • W2A1 - Optimization Methods
  • W3A1 - Introduction to TensorFlow

Course 3: Structuring Machine Learning Projects

  • There were no programming assignments in this course. It was completely theoretical.
  • Here is a link to the course

Course 4: Convolutional Neural Networks

  • W1A1 - Convolutional Model: step by step
  • W1A2 - Convolutional Model: application
  • W2A1 - Residual Networks
  • W2A2 - Transfer Learning with MobileNet
  • W3A1 - Autonomous Driving - Car Detection
  • W3A2 - Image Segmentation - U-net
  • W4A1 - Face Recognition
  • W4A2 - Neural Style transfer

Course 5: Sequence Models

  • W1A1 - Building a Recurrent Neural Network - Step by Step
  • W1A2 - Character level language model - Dinosaurus land
  • W1A3 - Improvise A Jazz Solo with an LSTM Network
  • W2A1 - Operations on word vectors
  • W2A2 - Emojify
  • W3A1 - Neural Machine Translation With Attention
  • W3A2 - Trigger Word Detection
  • W4A1 - Transformer Network
  • W4A2 - Named Entity Recognition - Transformer Application
  • W4A3 - Extractive Question Answering - Transformer Application

I've uploaded these solutions here only to be used as a reference by those who get stuck somewhere; it may help them save some time. I strongly recommend that no one directly copy any part of the code (from here or anywhere else) while doing the assignments of this specialization. The assignments are fairly easy, and one learns a great deal by doing them. Thanks to the deeplearning.ai team for giving this treasure to us.

Connect with me

Name: Abdur Rahman

Institution: Indian Institute of Technology Delhi

Find me on:

LinkedIn


Emojify using Face Recognition with Machine Learning

In this article, we will learn how to implement an application that shows an emoji matching the expression on your face. This is a fun computer vision project in which we use an image classification model to classify different expressions of a person.

This project will be implemented in two parts:

  • Building an image classification model which can classify facial images with different expressions on them.
  • Extracting the face from an image and then classifying the expression on it using the classifier.

Data flow of the project

Modules Used

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib – This library is used to draw visualizations.
  • Sklearn – This module contains multiple libraries having pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
  • OpenCV – This is an open-source library mainly focused on image processing and handling.
  • Tensorflow – This is an open-source library that is used for Machine Learning and Artificial intelligence and provides a range of functions to achieve complex functionalities with single lines of code.
         

Importing Dataset

The dataset we will use contains around 30,000 images across seven different categories of emotions. The images are quite small, (48, 48) RGB images, which is far smaller than typical real-life photos, so our model may have difficulty learning the patterns of different expressions on human faces.

 

Data Exploration

In this section, we will try to explore the data by visualizing the number of images provided for each category for the train and the test data.

These are the seven classes of emotions that we have here.

Countplot of images present in each category

One important observation we can draw here is the high data imbalance in the "disgust" category. Because of this, our model may not perform well on this class of images.

Model Development

From this step onward we will use the TensorFlow library to build our CNN model. The Keras API of TensorFlow contains all the functionality one may need to define the architecture of a Convolutional Neural Network and train it on the data.

Model Architecture

We will implement a Sequential model which will contain the following parts:

  • Three Convolutional Layers followed by MaxPooling Layers.
  • The Flatten layer to flatten the output of the convolutional layer.
  • Then we will have two fully connected layers operating on the flattened output.
  • We have included some BatchNormalization layers to enable stable and fast training and a Dropout layer before the final layer to avoid any possibility of overfitting.
  • The final layer is the output layer which outputs soft probabilities for the seven classes. 
   

Let's store the labels that are assigned to the different classes of emotions.

Let’s define the model’s architecture.
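The original code block is not reproduced in this extract; the following is a hedged Keras sketch that matches the architecture bullets above (three convolution + max-pooling blocks, flatten, two dense layers with BatchNormalization, Dropout, and a 7-way softmax). The exact filter counts, dense sizes and channel count are assumptions, not the article's original values.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(48, 48, 3)),                  # (48, 48) images; 3 channels assumed
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),                             # guard against overfitting
    layers.Dense(7, activation='softmax'),           # seven emotion classes
])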

   

While compiling a model we provide these three essential parameters:

  • optimizer – This is the method that helps to optimize the cost function by using gradient descent.
  • loss – The loss function by which we monitor whether the model is improving with training or not.
  • metrics – This helps to evaluate the model by predicting the training and the validation data.

Callbacks are used to check whether the model is improving with each epoch. If it is not, corrective steps can be taken automatically: ReduceLROnPlateau decreases the learning rate further, and if performance still does not improve, training is stopped by EarlyStopping. We can also define custom callbacks to stop training early once the desired results have been obtained.
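A hedged sketch of compiling the model and wiring up the two callbacks named above; the monitored metric, factors and patience values are assumptions.

from tensorflow import keras

# 'model' is the Sequential network sketched earlier.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2),
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                  restore_best_weights=True),
]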

   

Now we will train our model:
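The training call is not shown in this extract; a hedged version could look like this, where train_ds and val_ds stand in for whatever image pipeline was built earlier and the epoch count is an assumption.

history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=25,                       # assumed epoch count
                    callbacks=callbacks)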

Training progress of the model

This completes the first part of our project: a classifier that can classify different emotions.

Predicting Emoji in Real Time

This is part two of the project, where we predict the expression on a person's face and show a matching emoji in real time. To detect faces in a video feed we will use a Haar cascade classifier.

Below is a helper function that will be used to plot the images.
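The original helper is not reproduced here; a hedged stand-in that shows an image (BGR, as read by OpenCV) next to the emoji chosen for it might look like this:

import matplotlib.pyplot as plt

def plot_result(frame, emoji_img, title=''):
    fig, axes = plt.subplots(1, 2, figsize=(6, 3))
    axes[0].imshow(frame[:, :, ::-1])                # BGR -> RGB for matplotlib
    axes[0].set_title(title)
    axes[1].imshow(emoji_img[:, :, ::-1])
    for ax in axes:
        ax.axis('off')
    plt.show()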

Now we will detect the face with the Haar cascade classifier and predict the emoji for its expression.
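The original code block is not reproduced in this extract; here is a hedged sketch of that step for a single image. The label order, the emoji file names, the sample image path and the trained 'model' object are assumptions; 'plot_result' is the helper sketched above.

import cv2
import numpy as np

emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']   # assumed order
emojis = {name: cv2.imread(f'emojis/{name}.png') for name in emotions}           # assumed paths

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('sample.jpg')                        # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    face = cv2.resize(img[y:y + h, x:x + w], (48, 48)) / 255.0
    probs = model.predict(np.expand_dims(face, axis=0), verbose=0)
    label = emotions[int(np.argmax(probs))]
    plot_result(img, emojis[label], title=label)      # show image and matching emoji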

             

Image along with the predicted emoji

Here is the code for the video version:
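Again, the original listing is not shown in this extract; a hedged sketch of the real-time loop, reusing the face_cascade, model, emotions and emojis objects assumed above, could look like this:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                             # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(frame[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(np.expand_dims(face, axis=0), verbose=0)
        emoji = emojis[emotions[int(np.argmax(probs))]]
        # Paste the emoji in the top-left corner and box the detected face.
        frame[0:emoji.shape[0], 0:emoji.shape[1]] = emoji
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('Emojify', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):             # press q to quit
        break
cap.release()
cv2.destroyAllWindows()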

                   

Conclusion:

There might be some discrepancies in the results, because the classifier we built predicts emotions with only about 55% accuracy. If we use a larger and better dataset, the accuracy of the model will increase, and so will the quality of the predicted emoji. The example above shows the classifier applied to a static image; swap in the video-capture code to predict the emoji in real time.


Deep-Learning-Specialization

Coursera Deep Learning Specialization - Convolutional Neural Networks

This course will teach you how to build convolutional neural networks and apply them to image data. Thanks to deep learning, computer vision is working far better than just two years ago, and this is enabling numerous exciting applications ranging from safe autonomous driving, to accurate face recognition, to automatic reading of radiology images.

  • Understand how to build a convolutional neural network, including recent variations such as residual networks.
  • Know how to apply convolutional networks to visual detection and recognition tasks.
  • Know to use neural style transfer to generate art.
  • Be able to apply these algorithms to a variety of image, video, and other 2D or 3D data.

Week 1: Foundations of Convolutional Neural Networks

Key concepts of week 1.

  • Understand the convolution operation
  • Understand the pooling operation
  • Remember the vocabulary used in convolutional neural network (padding, stride, filter, …)
  • Build a convolutional neural network for image multi-class classification

Assignment of Week 1

  • Quiz 1: The basics of ConvNets
  • Programming Assignment: Convolutional Model: step by step
  • Programming Assignment: Convolutional Model: application

Week 2: Deep convolutional models

Key concepts of week 2.

  • Understand multiple foundational papers of convolutional neural networks
  • Analyze the dimensionality reduction of a volume in a very deep network
  • Understand and Implement a Residual network
  • Build a deep neural network using Keras
  • Implement a skip-connection in your network
  • Clone a repository from github and use transfer learning

Assignment of Week 2

  • Quiz 2: Deep convolutional models
  • Programming Assignment: Residual Networks

Week 3: Convolutional Neural Networks

Key concepts of week 3.

  • Understand the challenges of Object Localization, Object Detection and Landmark Finding
  • Understand and implement non-max suppression
  • Understand and implement intersection over union
  • Understand how we label a dataset for an object detection application
  • Remember the vocabulary of object detection (landmark, anchor, bounding box, grid, …)

Assignment of Week 3

  • Quiz 3: Detection algorithms
  • Programming Assignment: Car detection with YOLO

Week 4: Special applications: Face recognition & Neural style transfer

Discover how CNNs can be applied to multiple fields, including art generation and face recognition. Implement your own algorithm to generate art and recognize faces!

Assignment of Week 4

  • Quiz 4: Special applications: Face recognition & Neural style transfer
  • Programming Assignment: Art generation with Neural Style Transfer
  • Programming Assignment: Face Recognition

Course Certificate

Certificate


A deep learning specialization series of 5 courses offered by Andrew Ng at Coursera

abhishektripathi24/Deep-Learning-Specialization-Coursera


Deep Learning Specialization

Master Deep Learning, and Break into AI

Instructor: Andrew Ng | Community: deeplearning.ai


I created this repository after completing the Deep Learning Specialization on Coursera. It includes solutions to the quizzes and programming assignments which are required for successful completion of the courses.

Note: The Coursera Honor Code advises against plagiarism. Readers are requested to use this repo only for insights and reference. If you are taking these courses on Coursera, please submit your original work only.

The contents of the repository, organized by course module, quizzes and programming assignments, are as follows:

  • Quiz - Introduction to deep learning
  • Quiz - Neural Network Basics
  • Programming Assignment - Python basics with numpy
  • Programming Assignment - Logistic Regression with a Neural Network mindset
  • Quiz - Shallow Neural Networks
  • Programming Assignment - Planar data classification with a hidden layer
  • Programming Assignment - Building your Deep Neural Network: Step by Step
  • Programming Assignment - Deep Neural Network - Application
  • Quiz - Practical aspects of deep learning
  • Programming Assignment - Initialization
  • Programming Assignment - Regularization
  • Programming Assignment - Gradient Checking
  • Quiz - Optimization algorithms
  • Programming Assignment - Optimization
  • Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks
  • Programming Assignment - Tensorflow
  • Quiz - Bird recognition in the city of Peacetopia (case study)
  • Quiz - Autonomous driving (case study)
  • Quiz - The basics of ConvNets
  • Programming Assignment - Convolutional Model: step by step
  • Programming Assignment - Convolutional Model: application
  • Quiz - Deep convolutional models
  • Programming Assignment - Keras Tutorial
  • Programming Assignment - Residual Networks
  • Quiz - Detection algorithms
  • Programming Assignment - Car detection with YOLO
  • Quiz - Special applications: Face recognition & Neural style transfer
  • Programming Assignment - Art generation with Neural Style Transfer
  • Programming Assignment - Face Recognition
  • Quiz - Recurrent Neural Networks
  • Programming Assignment - Building a recurrent neural network - step by step
  • Programming Assignment - Dinosaur Island - Character-Level Language Modeling
  • Programming Assignment - Jazz improvisation with LSTM
  • Quiz - Natural Language Processing & Word Embeddings
  • Programming Assignment - Operations on word vectors - Debiasing
  • Programming Assignment - Emojify
  • Quiz - Sequence models & Attention mechanism
  • Programming Assignment - Neural Machine Translation with Attention
  • Programming Assignment - Trigger word detection

Lecture Notes References

Here are some references to lecture notes and reviews written by various communities, authors and editors:

  • https://www.deeplearning.ai/ai-notes/
  • https://www.slideshare.net/TessFerrandez/notes-from-coursera-deep-learning-courses-by-andrew-ng
  • https://towardsdatascience.com/deep-learning-specialization-by-andrew-ng-21-lessons-learned-15ffaaef627c

Acknowledgement

Deep Learning Specialization offered by Andrew Ng is an excellent blend of content for deep learning enthusiasts. I thoroughly enjoyed the course and earned the certificate .


COMMENTS

  1. GitHub

    Coursera - CNN Programming Assignment: In this project, we will build a face recognition system with FaceNet. Face recognition is a method of identifying or verifying the identity of an individual using their face in photos, video, or in real-time Topics

  2. Face Recognition

    Perform face verification and face recognition with these encodings Channels-last notation For this assignment, you'll be using a pre-trained model which represents ConvNet activations using a "channels last" convention, as used during the lecture and in previous programming assignments.

  3. Build Your Own Face Recognition Tool With Python

    Project Overview. Your program will be a typical command-line application, but it'll offer some impressive capabilities. To accomplish this feat, you'll first use face detection, or the ability to find faces in an image.Then, you'll implement face recognition, which is the ability to identify detected faces in an image.To that end, your program will do three primary tasks:

  4. Deep-Learning-Specialization-Coursera/Course 4. Convolutional ...

    You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window.

  5. Face+Recognition+for+the+Happy+House+-+v3.ipynb

    Welcome to the first assignment of week 4! Here you will build a face recognition system. ... as opposed to the "channels last" convention used in lecture and previous programming assignments. In other words, a batch of images will be of shape (m, n C, n H, n W) instead of (m, n H, n W, n C). Both of these conventions have a reasonable amount ...

  6. Face_Recognition_v3a

    Use these encodings to perform face verification and face recognition Channels-first notation ¶ In this exercise, we will be using a pre-trained model which represents ConvNet activations using a "channels first" convention, as opposed to the "channels last" convention used in lecture and previous programming assignments.

  7. Step by Step Face Recognition Code Implementation From Scratch In

    So in this way we done the face recognition with just one image using the One Shot Learning approach. Input Image (Left), Output image (Right) Python Package. To make this process even more simple and easy to use we have created a python package that is even more easy to use. We have eliminated all the steps to download the supporting files and ...

  8. Face Recognition

    **Facial Recognition** is the task of making a positive identification of a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then works on identification of those detected faces. The state of the art tables for this task are contained mainly in the consistent parts of the task ...

  9. Neural Networks for Face Recognition with TensorFlow

    In this assignment, students build several feedforward neural networks for face recognition using TensorFlow. Students train both shallow and deep networks to classify faces of famous actors. The assignment serves as an introduction to TensorFlow. Visualization of the weights to explain how the networks work is emphasized.

  10. Face Recognition in R using Keras

    This face recognition model is a sequential model in which the data extracted from the images is transformed through the different layers to be compared in the last layer with the dependent variable to tune the weights of the model in order to minimize the loss function. #Model. model <-keras_model_sequential()

  11. Face Recognition using Deep Learning CNN in Python

    Creating the CNN face recognition model. In the below code snippet, I have created a CNN model with. You can increase or decrease the convolution, max pooling, and hidden ANN layers and the number of neurons in it. Just keep in mind, the more layers/neurons you add, the slower the model becomes.

  12. Face-Recognition/Face_Recognition.ipynb at main

    Coursera - CNN Programming Assignment: In this project, we will build a face recognition system with FaceNet. Face recognition is a method of identifying or verifying the identity of an individual ...

  13. OpenCV: Face Recognition with OpenCV

    Algorithmic description of the LBPH method. A more formal description of the LBP operator can be given as $\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^p \, s(i_p - i_c)$, with $(x_c, y_c)$ as the central pixel with intensity $i_c$, and $i_p$ being the intensity of the neighboring pixel $p$. Here $s$ is the sign function defined as $s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$. (A small Python transcription of this operator appears after this list.)

  14. Facial Expression Recognition with PyTorch

    About this Guided Project. In this 2-hour guided project, you will load a pretrained state-of-the-art CNN and train it in PyTorch to classify facial expressions. The data that you will use consists of 48 x 48 pixel grayscale images of faces, and there are seven targets (angry, disgust, fear, happy, sad, surprise, neutral). (A hedged PyTorch fine-tuning sketch appears after this list.)

  15. Deep Learning Specialization Coursera [UPDATED Version 2021]

    This repo contains the updated version of all the assignments/labs (done by me) of the Deep Learning Specialization on Coursera by Andrew Ng. It includes building various deep learning models from scratch and implementing them for object detection, facial recognition, autonomous driving, neural machine translation, trigger word detection, etc.

  16. Emojify using Face Recognition with Machine Learning

    This article aims to quickly build a Python face recognition program that can easily be trained on multiple images per person and then recognize known faces in an image. The code uses ageitgey's face_recognition API for Python. This API is built using dlib's face recognition algorithms, and it allows the user to easily implement face recognition.

  17. Convolutional Neural Networks

    There are 4 modules in this course. In the fourth course of the Deep Learning Specialization, you will understand how computer vision has evolved and become familiar with its exciting applications such as autonomous driving, face recognition, reading radiology images, and more. By the end, you will be able to build a convolutional neural ...

  18. Convolutional Neural Networks

    Assignments of Week 3: Quiz 3 (Detection algorithms); Programming Assignment (Car detection with YOLO). Week 4, Special applications: Face recognition & Neural style transfer. Discover how CNNs can be applied to multiple fields, including art generation and face recognition. Implement your own algorithm to generate art and recognize faces ...

  19. Deep-Learning-Specialization/4. Convolutional Neural Networks/week 4

    This repository contains the programming assignments and slides from the deep learning courses on Coursera offered by deeplearning.ai - gmortuza/Deep-Learning-Specialization

  20. amanchadha/coursera-deep-learning-specialization

    Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks and Deep Learning; (ii) Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization; (iii) Structuring Machine Learning Projects; (iv) Convolutional Neural Networks; (v) Sequence Models - amanchadha/coursera-deep ...

  21. Deep-Learning-Specialization-Coursera/Convolutional Neural ...

    / Week 4-Programming Assignment Face Recognition / Face_Recognition.ipynb

  22. abhishektripathi24/Deep-Learning-Specialization-Coursera

    It includes solutions to the quizzes and programming assignments that are required for successful completion of the courses. Note: the Coursera Honor Code advises against plagiarism. Readers are requested to use this repo only for insights and reference. ... Programming Assignment - Face Recognition; Sequence Models. Week 1 Quiz - Recurrent Neural ...
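
The channels-last versus channels-first distinction raised in items 2, 5, and 6 above is only about where the color-channel axis sits in the image tensor. As a minimal NumPy sketch, assuming a made-up batch of 8 RGB images of size 96 x 96, converting between the two layouts is just an axis transpose:

    import numpy as np

    # A batch of m = 8 RGB images, 96 x 96 pixels, in "channels last" layout:
    # (m, n_H, n_W, n_C)
    batch_channels_last = np.zeros((8, 96, 96, 3))

    # The same batch in "channels first" layout: (m, n_C, n_H, n_W).
    # np.transpose only reorders the axes; the pixel data is unchanged.
    batch_channels_first = np.transpose(batch_channels_last, (0, 3, 1, 2))

    print(batch_channels_last.shape)   # (8, 96, 96, 3)
    print(batch_channels_first.shape)  # (8, 3, 96, 96)

TensorFlow and Keras default to channels last, while PyTorch uses channels first, which is why the convention matters when moving tensors or pre-trained models between frameworks.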
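
Items 2, 6, and 12 describe FaceNet-style face verification: each face is mapped to a fixed-length encoding, and two faces are judged to belong to the same person when the distance between their encodings falls below a threshold. The sketch below shows only that comparison step, with random 128-dimensional vectors standing in for real encodings and a threshold value chosen purely for illustration:

    import numpy as np

    def is_same_person(encoding_a, encoding_b, threshold=0.7):
        """Verify two faces by comparing the L2 distance of their encodings."""
        distance = np.linalg.norm(encoding_a - encoding_b)
        return distance < threshold, distance

    # Random 128-dimensional vectors stand in for real FaceNet encodings here.
    rng = np.random.default_rng(0)
    enc_claimed = rng.normal(size=128)
    enc_camera = enc_claimed + rng.normal(scale=0.05, size=128)  # "same" person

    same, dist = is_same_person(enc_claimed, enc_camera)
    print(f"Same person: {same} (distance {dist:.3f})")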
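
Items 7 and 16 both rely on comparing new faces against encodings built from as little as one reference image per person. The sketch below illustrates that idea with the face_recognition package named in item 16; the image file names are placeholders, and this is not the code from either article:

    import face_recognition

    # Encode one known reference image (placeholder file names).
    known_image = face_recognition.load_image_file("alice_reference.jpg")
    known_encoding = face_recognition.face_encodings(known_image)[0]

    # Encode every face found in a new, unlabeled image.
    unknown_image = face_recognition.load_image_file("unknown.jpg")
    unknown_encodings = face_recognition.face_encodings(unknown_image)

    # Compare each detected face against the single reference encoding.
    for encoding in unknown_encodings:
        match = face_recognition.compare_faces([known_encoding], encoding)[0]
        distance = face_recognition.face_distance([known_encoding], encoding)[0]
        print(f"Match: {match}, distance: {distance:.3f}")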
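
Item 11 describes stacking convolution, max pooling, and dense (ANN) layers into a CNN classifier. The Keras sketch below shows that general pattern rather than the article's exact model; the 64 x 64 RGB input size and the five identity classes are assumptions made for the example:

    from tensorflow import keras
    from tensorflow.keras import layers

    num_people = 5  # assumed number of identities in the training set

    model = keras.Sequential([
        layers.Input(shape=(64, 64, 3)),           # assumed 64x64 RGB input
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),      # hidden ANN layer
        layers.Dense(num_people, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

Adding more Conv2D/MaxPooling2D pairs or widening the Dense layers follows the same pattern, at the cost of a slower model, as item 11 notes.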
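
The LBP operator from item 13 is easy to transcribe directly: each of the P neighbors contributes the bit s(i_p - i_c), weighted by 2^p. The sketch below computes the code for the center pixel of a made-up 3 x 3 grayscale patch with the usual P = 8 neighborhood; it follows the formula literally and is not OpenCV's optimized implementation:

    import numpy as np

    def lbp_code(patch):
        """LBP(x_c, y_c) = sum over p of 2**p * s(i_p - i_c) for a 3x3 patch."""
        center = patch[1, 1]
        # Eight neighbors, walked clockwise from the top-left corner.
        neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                     patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
        s = [1 if i_p >= center else 0 for i_p in neighbors]  # sign function s
        return sum(bit << p for p, bit in enumerate(s))       # sum of 2**p * s

    # A made-up 3x3 grayscale patch; the center pixel has intensity 90.
    patch = np.array([[120,  85,  93],
                      [ 70,  90, 100],
                      [ 60,  95,  81]])
    print(lbp_code(patch))  # one 8-bit LBP code in the range 0..255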
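
Item 14 describes fine-tuning a pretrained CNN in PyTorch on 48 x 48 grayscale faces with seven expression classes. One common way to do that, sketched below with an assumed ResNet-18 backbone from torchvision and made-up optimizer settings (the guided project may use a different model entirely), is to swap the input and output layers and run a standard training step:

    import torch
    from torch import nn
    from torchvision import models

    # Load a pretrained ResNet-18 (an assumed choice of backbone).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Adapt the first convolution to 1-channel (grayscale) inputs
    # and the final layer to the seven expression classes.
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 7)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # One illustrative training step on a fake batch of 48x48 grayscale images.
    images = torch.randn(16, 1, 48, 48)
    labels = torch.randint(0, 7, (16,))

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")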