Healthcare data attracts a great deal of talk but relatively little action when it comes to actually increasing quality of care for patients. With more data being generated than ever before, grand promises are held hostage in the archives of health systems around the world. Innumerable stakeholders are interested in mining for gold in these archives, but security, technical, legal, and ethical challenges make it hard to capitalize on that promise. Recent announcements like the formation of Truveta have underscored both the desire to capitalize on data opportunities and the value of doing so.
At the same time, the COVID-19 pandemic has proven that quick innovation in the face of challenge is possible, especially when it comes to digital transformation. As the world came to a screeching halt, the crisis made it easy for providers to see the necessity of embracing new practices. Many of the solutions that were adopted, like telemedicine and multi-institution collaborations, will continue to bring higher quality care to patients for years to come. Unfortunately, COVID-19 also brought a year of record losses for the provider sector, leaving institutions in need of new revenue streams to maintain a high quality of care.
One of the most anticipated innovations of the past few years is artificial intelligence (AI). AI promises to lower costs, reduce medical error rates, deliver faster and higher quality diagnoses, and ease physician workload. In terms of future promise, AI is one of the shining stars.
At this moment, however, we are far from being able to realize most of the potential that AI holds. One large problem for development is that not all patients are represented in the training, testing, and validation datasets used to bring these AI algorithms to market. In fact, one analysis of US studies that trained deep learning algorithms on patient data found that 71% drew their cohorts from only three states.* As AI gains ground in healthcare and paths to reimbursement become more readily available, this selection problem will exacerbate existing biases in the health system and potentially create new ones.
One way for providers to reduce financial losses and to enable high quality, equitable care for patients in the future is to monetize their archives by providing data securely to algorithm developers in industry. Here are some factors that need to be carefully considered as health systems are having these discussions:
- Technological feasibility: Does your institution have the capacity to search across archives (EHR, radiology, pathology), bulk export multi-modal data, ensure it is properly de-identified, and transfer it? Will you deliver a copy of the data to researchers through SFTP or VPN or allow them to access it securely on a remote server? Will you provide labeling and ground-truthing services?
- Legal: Can you properly link and de-identify data that may be delivered to outside researchers and companies? Do you have data licensing contracts that stipulate what data can and cannot be used for? If necessary, can you ensure that your customers are HIPAA compliant?
- Ethical: Will you be transparent with your patients about sharing their de-identified data and allow them to opt out if they wish? What is your data allowed to be used for, and under what conditions? How much oversight will you have into how the data is being used?
- Build vs. Partner: Data sharing can bring in hundreds of thousands to millions of dollars a year. Will some of this money go into sourcing customers, building a technical pipeline, and fulfilling deals in-house? Or will you partner with a company in the data-for-medical-AI space that will handle this process end-to-end? What about data exclusivity?
It’s clear that providers have a duty to innovate on behalf of patients. COVID-19 has brought huge momentum on this front, along with the need to diversify revenue to make up for losses. Putting data archives to work is one pathway providers can consider, both to ensure high quality care for patients in the future and to open a new revenue stream.
*Ref: Kaushal, A., Langlotz, C. (2020). Geographic Distribution of US Cohorts Used to Train Deep Learning Algorithms. JAMA: The Journal of the American Medical Association, 324(12), 1212-1213.