In this post, we are going to talk about PySyft, a tool that facilitates private Deep Learning. What is private Deep Learning, and how can we practice it? With the advent of new regulations such as the General Data Protection Regulation (GDPR), Deep Learning and Machine Learning are coming under extensive scrutiny over their privacy implications. What if we train a model on sensitive data and the model leaks information?


To tackle these privacy issues, many different approaches have been proposed. One of them is Federated Learning, which was introduced by Google. A quick look at Google Trends shows the rising interest in the term “Federated Learning”.

PySyft is designed to practice privacy-preserving methods such as Federated Learning and Differential Privacy. In this post, you will learn:

What is PySyft?
How do you install it?
How does it enable Federated Learning?

Introduction

Assume the goal is to do Federated Learning. To do that, we need a toolkit. Why? The majority of available Deep Learning frameworks, such as TensorFlow and PyTorch, assume we have access to the aggregated data in a centralized manner. But what if our data is not all in one place? That is the foundation of Federated Learning.

Here, we are going to introduce PySyft, an extension to PyTorch for private Deep Learning. PySyft is capable of many things, including:

  • Aggregating gradients for Federated Learning
  • Executing code on remote machines and coordinating their collaboration to build a model
  • Providing an environment that is very similar to PyTorch; all we have to do is add the PySyft elements!

NOTE: Refer to the great Secure and Private AI course by Udacity to learn the basics of private Machine Learning and PySyft in particular.


PySyft Installation and Setup

At first, we are going to install PySyft:

pip install syft

In the next step, let’s import PyTorch, NumPy, and Syft:

import numpy as np
import torch
import syft

The next phase customizes PyTorch to enable the tools PySyft provides. The TorchHook on top of the torch library is what we need: it extends torch tensors with PySyft functionality, such as sending them to remote workers.

# Modify the PyTorch API
hook = syft.TorchHook(torch)

Simulation of remote workers

The question is: how do we simulate remote code execution on different machines? In Federated Learning, there are N remote machines that aim to collaborate with each other. PySyft can simulate that setup by creating virtual workers.

# Let's create the first Machine
M1 = syft.VirtualWorker(hook, id='M1')
# Send some data to M1 and create the pointer to that machine
data = torch.tensor([0,1])
M1_Pointer = data.send(M1)
# Compare the variable types
print(type(data), type(M1_Pointer))

You should get the following output:

<class 'torch.Tensor'> <class 'torch.Tensor'>

Procedure

As can be observed above, the pointer is itself of type torch.Tensor. However, it carries the information needed for communication between the local and remote workers. Let’s explore what happens:

  1. The local worker (you; check the M1_Pointer.owner attribute) sends some data to the M1 location (check M1_Pointer.location).
  2. The remote worker locates that particular tensor and executes specific commands on top of it (check M1_Pointer.id_at_location to find the tensor’s id on the remote machine).
  3. Get the information back from M1, which empties M1 (try M1_Pointer.get()). This retrieves the tensor from the remote machine and brings it to the local machine.

Run the following commands to see the outputs:

M1_Pointer.location
M1_Pointer.id_at_location
# Get the information from M1
# Check the machine objects before and after .get() command
print('Objects in M1 machine before .get(): ',M1._objects)
data = M1_Pointer.get()
print('Objects in M1 machine after .get(): ',M1._objects)
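If everything is wired correctly, the output should look like the following. Note that the object id is assigned randomly, so yours will differ:

Objects in M1 machine before .get():  {85716832470: tensor([0, 1])}
Objects in M1 machine after .get():  {}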

Working with two machines

Let’s define variables on two machines and play around with them. We do the following:

  1. Create the second machine
  2. Create variables on both machines
  3. Check how the variables are connected

Run the following commands:

# Create the second machine
M2 = syft.VirtualWorker(hook, id='M2')
# Send to the first Machine
x = torch.tensor([0,1,2]).send(M1)
y = torch.tensor([1,1,1]).send(M1)
# Send to the second machine
xx = torch.tensor([0,2,4]).send(M2)
yy = torch.tensor([1,0,1]).send(M2)
# We cannot do operations on variables located on different machines
try:
    total = x + xx
except Exception:
    print("ERROR: The variables MUST be on the same machine!")
# Extract the location of the variables and check whether they are the same
print('Statement: The location of both variables is the same: ', x.location == xx.location)
# You can add two variables on one machine
z = x + y
print(z)
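Running this should print something along the following lines; the pointer ids are random, and the exact pointer representation depends on the PySyft version:

ERROR: The variables MUST be on the same machine!
Statement: The location of both variables is the same:  False
(Wrapper)>[PointerTensor | me:2584738571 -> M1:8427561930]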

Let’s walk through the code above:

  • The second remote machine, M2, is defined first.
  • The tensors x and y are defined and sent to the first remote machine.
  • The tensors xx and yy are defined and sent to the second remote machine.
  • The try/except block shows why it is not possible to do operations on variables that live on different remote machines!
  • Finally, z = x + y adds two variables that live on the same remote machine.
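Having data spread over two workers is exactly the setting Federated Learning assumes. As a taste of where this leads, here is a minimal sketch of one training round over the two virtual workers, following the pattern used in the PySyft 0.2.x tutorials. The toy tensors, model, and learning rate here are made up for illustration:

import torch
from torch import nn, optim

# Hypothetical toy shards: each (inputs, targets) pair lives on one worker
remote_data = [
    (torch.tensor([[0., 1., 2.]]).send(M1), torch.tensor([[1.]]).send(M1)),
    (torch.tensor([[0., 2., 4.]]).send(M2), torch.tensor([[2.]]).send(M2)),
]

model = nn.Linear(3, 1)

for inputs, targets in remote_data:
    # Ship a copy of the model to whichever worker holds this shard
    remote_model = model.copy().send(inputs.location)
    opt = optim.SGD(remote_model.parameters(), lr=0.1)
    opt.zero_grad()
    loss = ((remote_model(inputs) - targets) ** 2).sum()
    loss.backward()             # gradients are computed on the remote worker
    opt.step()                  # the remote copy is updated in place
    model = remote_model.get()  # pull the updated model back locally

Note that the workers are visited sequentially here; proper Federated Averaging would train the copies in parallel and average their parameters before the next round.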

Move Data Between Two Remote Machines Directly

Pointer chains are designed to move data between remote workers directly, eliminating the need to communicate back and forth with the central machine. This can enhance security. Let’s clear the existing objects on both remote machines first.

# Remove the objects in previous machines
M1.clear_objects()
M2.clear_objects()

Using the command below, the data is sent to M1.

data = torch.tensor([1,2,3,4]).send(M1)
print(M1._objects)
print(M2._objects)

It should output something similar to the following:

{42362651711: tensor([1, 2, 3, 4])}
{}

What if we want to move the data directly to M2? Do the following and check the output:

data = data.move(M2)
print(M1._objects)
print(M2._objects)
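If the move succeeded, M1 no longer holds the tensor and M2 does. The object ids will again differ on your machine:

{}
{63194275808: tensor([1, 2, 3, 4])}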

Conclusion

Here you became familiar with PySyft, a simple yet powerful toolkit for private Deep Learning. The goal was to introduce the tool and its basic setup. The beauty of PySyft is that it operates on top of PyTorch, so you do not need to learn a completely different library. If you know PyTorch, you only need to learn how to adapt PySyft for particular applications such as Federated Learning. Feel free to share your thoughts about PySyft, this tutorial, or any related concept you find useful to discuss here.
