With the advent of big data and computational power, stakeholders are becoming more and more interested in leveraging Artificial Intelligence (AI) and data-driven methods. However, one specific setback is the privacy restrictions on sensitive and private data. One example of such confidential data is patients’ healthcare records, which should not be shared with unauthorized individuals. To work within these restrictions, researchers proposed and employed federated learning as a method to train Machine Learning models. Federated learning eliminates the need to collect data from data holders, which can substantially improve data privacy.
In a typical Machine Learning problem, we put all the data on one system, such as a local computer or a server. The challenge is that standard Machine Learning models require this centralization for training. With data volumes growing rapidly and privacy restrictions tightening, sticking to the old approach becomes a hurdle. What if we want to distribute the computation across many machines? What if it is not possible to collect all the data in one place? Can we keep the data decentralized and still improve our Machine Learning model? Federated learning aims to answer this question.
In specific domains such as healthcare, it is impractical to assume we can have a large centralized dataset to work with for Machine Learning purposes. Due to privacy restrictions enforced by laws and regulations, the majority of healthcare data holders cannot, or will not, share their data, and they may not even be willing to share a model trained on that sensitive data.
What is federated learning?
Federated learning has been introduced as a novel approach [Communication-Efficient Learning of Deep Networks from Decentralized Data]. The goal is to train a model using the federation of participating systems. The job of each participating system is to train a sub-model using its own data without sharing the data with anyone else.
Federated learning is an approach to train a Machine Learning model with the data that we do NOT have access to. It is a promising system for private Machine Learning.
A well-known example of training a model with such participating systems is the use of mobile devices. Federated learning eliminates the need for centralizing the data by relying solely on the sub-models trained on each participating system instead of their training data. In other words, the training data stays on users’ mobile devices, and the devices are used as computational resources to update and improve a global model.
The procedure can be summarized as follows:
- The central system (server) defines a task and an initial model to be trained.
- The initial model is sent to the participating systems (nodes).
- Each node trains its own sub-model, starting from the initial model and using its local data, and sends this fine-tuned model back to the central system.
- The central system collects all trained sub-models and combines them to update the initial model into a better one.
NOTE: The central system never has access to any of the local data.
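The procedure above can be sketched in a few lines of Python. This is only a minimal illustration, not a production implementation: the function names (`local_train`, `federated_average`, `make_node_data`), the toy linear model, and the synthetic data following y = 2x + 1 are all assumptions made for the example. The aggregation step weights each sub-model by the size of its node's dataset, in the spirit of the federated averaging idea from the paper cited above.

```python
import random

def local_train(weights, data, lr=0.05, epochs=20):
    """Fine-tune a copy of the global model (w, b) on one node's private data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one local sample
            w -= lr * err * x       # plain SGD step for a linear model
            b -= lr * err
    return (w, b)

def federated_average(updates, sizes):
    """Combine sub-models, weighting each by its node's dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_node_data(n, rng):
    """Hypothetical private data: y = 2x + 1 plus a little noise."""
    xs = (rng.uniform(-1, 1) for _ in range(n))
    return [(x, 2 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
# Three nodes, each holding data that never leaves the node.
nodes = [make_node_data(n, rng) for n in (20, 50, 30)]

global_model = (0.0, 0.0)  # initial model chosen by the server
for _ in range(10):        # communication rounds
    # Each node receives the global model and returns only its trained sub-model.
    updates = [local_train(global_model, data) for data in nodes]
    global_model = federated_average(updates, [len(d) for d in nodes])

print(global_model)  # should end up close to (2, 1)
```

Note that the server-side code only ever touches `updates` and the dataset sizes, never the samples themselves.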
Is federated learning secure?
Is there a protocol that assures the security and privacy of sensitive or personal data? Regarding security, the model updates are usually encrypted before they are communicated. To prevent the central system from identifying individual data based on the information it receives, Google has developed and uses the Secure Aggregation protocol.
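To give a feel for how an aggregation can hide individual contributions, here is a heavily simplified sketch of the pairwise-masking idea. It is not the actual Secure Aggregation protocol, which also involves key agreement between nodes and handling of nodes that drop out. In this toy version, every pair of nodes shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. The name `mask_updates`, the single shared random generator standing in for pairwise secrets, and the integer updates are assumptions made for illustration.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo a fixed value

def mask_updates(updates, seed=0):
    """Add pairwise masks that cancel out when all masked updates are summed."""
    rng = random.Random(seed)  # stand-in for secrets agreed between each pair
    masked = list(updates)
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.randrange(MODULUS)
            masked[i] = (masked[i] + m) % MODULUS  # node i adds the mask
            masked[j] = (masked[j] - m) % MODULUS  # node j subtracts it
    return masked

updates = [12, 7, 30]            # hypothetical integer-encoded updates
masked = mask_updates(updates)   # what the server actually receives
total = sum(masked) % MODULUS    # masks cancel, leaving only the sum

print(masked)
print(total)  # equals 12 + 7 + 30
```

The server learns the aggregate but each individual masked value looks random on its own.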
Another approach to protecting privacy is Differential Privacy. It was introduced by Cynthia Dwork, a world-renowned scientist in the privacy domain, and her colleagues. Differential privacy aims to provide precise, statistical guarantees about what an adversary can learn from observing the outputs of a randomized algorithm. In simple words, a differentially private system aims to guarantee that the outcomes of using the data (such as training Machine Learning models or generating class-specific features) do NOT reveal sensitive information about individuals.
Although the above approaches help to improve the security of federated learning, there are still vulnerabilities, and they remain a subject of extensive research.
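As a small illustration of the idea, the sketch below applies the Laplace mechanism, a standard differentially private technique, to a counting query. It is an example under stated assumptions rather than a recipe for federated learning itself: the function `private_count`, the toy patient records, and the choice of `epsilon` are hypothetical. A count changes by at most 1 when one person is added or removed, so noise drawn from a Laplace distribution with scale 1/epsilon is added to the true count. A smaller `epsilon` means more noise and stronger privacy.

```python
import random

def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count using the Laplace mechanism (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two exponential samples follows a Laplace distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical records: (patient_id, has_condition)
records = [(i, i % 4 == 0) for i in range(100)]
rng = random.Random(42)

noisy = private_count(records, lambda r: r[1], epsilon=1.0, rng=rng)
print(noisy)  # close to the true count of 25, but not exactly 25
```

Any single answer is perturbed, yet averages over many queries stay close to the truth, which is why each release consumes part of a privacy budget in practice.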
Healthcare and the promise of federated learning
Despite all the advantages of modern technologies and data resources, it is still difficult to develop and maintain an effective and efficient healthcare data system. On top of that, there are the privacy considerations and restrictions surrounding patients’ personal information. Together, these make establishing and organizing a reliable and secure healthcare system a complex task.
As mentioned earlier, access to healthcare data is highly restricted due to patient privacy concerns. Health information is protected by laws such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Fear of a breach of private information, and of the resulting consequences, discourages healthcare data holders from sharing data, even if it is de-identified. Another complicating matter is the ownership of patient data among different stakeholders in the healthcare system, such as providers, patients, payors, and third-party vendors that might be responsible for holding and managing the data.
Federated learning has the potential to solve some of the privacy hurdles faced by methods that need to centralize sensitive health data. With federated learning, healthcare stakeholders are not obliged to share confidential health data. The individual data providers, or even the patients themselves, can keep full control of their data without needing to share it. Such a system could revolutionize the application of Artificial Intelligence and Machine Learning in healthcare.
This was a short introduction to federated learning, and the story does not end here. Federated learning has its own challenges and drawbacks, and it is under extensive research. You learned what the data privacy challenge is and how federated learning can help to address it. You also learned the general procedure of federated learning.
What is your view? Do you believe federated learning will be around for a while, or is it not that promising? Do you believe the general picture described above is sufficient, or are there missing elements?