1. Anonymize patient data
Imagine that you are collecting your patient data for a study on the quality of care in your hospital. In a case like this, it makes sense to anonymize all identifying patient data before performing any other operations on it. You will do this with the map function.
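Outside of Streams, the same map-then-filter idea can be tried with Python's built-in map and filter. This is a minimal sketch under stated assumptions, not the lab solution: a plain list stands in for the patient data stream, each tuple is assumed to be a dict with patientId and locationId keys, and the heartRate field and the "> 100" filter condition are hypothetical. Because the map step runs first, the filter (and anything after it) only ever sees hashed identifiers.

```python
import hashlib


def anonymize(tup):
    """Return a copy of tup with its identifying fields replaced by SHA-256 digests.

    Assumes tup is a dict with 'patientId' and 'locationId' keys (hypothetical layout).
    """
    anonymized = dict(tup)  # copy so the input record is left unchanged
    for key in ("patientId", "locationId"):
        # str() guards against numeric IDs; .digest() returns raw bytes, as in the lab hint
        anonymized[key] = hashlib.sha256(str(anonymized[key]).encode("utf-8")).digest()
    return anonymized


# Hypothetical sample records; a plain list stands in for the incoming stream.
patient_data = [
    {"patientId": "patient-1", "locationId": "ward-A", "heartRate": 72},
    {"patientId": "patient-2", "locationId": "ward-B", "heartRate": 110},
]

# Anonymize first, then filter: the filter never sees the original identifiers.
patient_x = map(anonymize, patient_data)
high_heart_rate = list(filter(lambda t: t["heartRate"] > 100, patient_x))

for t in high_heart_rate:
    print(t)
```

Returning a copy rather than modifying the input is a choice made for this sketch so the original records can be compared afterwards; modifying and returning the same tuple would also satisfy "returns the hashed tuple".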
For this step, develop your application in the coding cell under the Lab 3 – Anonymize and average data section of the course exercise notebook.
- At the beginning of your application, import the
- Define a function called anonymize that:
  - Takes a single parameter, tuple
  - Hashes the tuple's patientId and locationId values with the SHA-256 algorithm and replaces those values with their hashes
    Hint: To hash a value, call:
    item_to_encode = hashlib.sha256(item_to_encode.encode('utf-8')).digest()
  - Returns the hashed tuple
- Call map on the patient data stream, passing your anonymize function, so that each tuple is anonymized. Name the resulting stream patientX. Place this step immediately before the call to filter.
patientX = patientData.map(anonymize)
- Update the code downstream of this step so that it operates on patientX instead of patientData.
- Ensure the patient data simulator is running on your Bluemix account.
- Submit your application to the Bluemix Streaming Analytics service as you did in the previous labs. Run your application and view its output. In the Console Log, notice that the patientId and locationId values have been replaced by hashes, which anonymizes the patients.
- Cancel the job running your application.
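The hashing call from the hint in the steps above can be checked on its own. In this sketch (the identifier is hypothetical), digest() returns the hash as 32 raw bytes, which print as a bytes literal, while hexdigest() returns the same hash as a 64-character hexadecimal string, which may be easier to read in a log. Either way, SHA-256 is deterministic: the same input always yields the same hash, so all tuples for one patient still share a single (hashed) identifier.

```python
import hashlib

patient_id = "patient-1"  # hypothetical identifier

raw = hashlib.sha256(patient_id.encode("utf-8")).digest()      # raw bytes
text = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()  # hex string

print(type(raw).__name__, len(raw))    # bytes 32
print(type(text).__name__, len(text))  # str 64

# hexdigest() is simply the hexadecimal encoding of digest()
print(raw.hex() == text)  # True

# Hashing the same ID again gives the same result
same_again = hashlib.sha256(patient_id.encode("utf-8")).digest()
print(same_again == raw)  # True
```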