There is a promising new trend in healthcare data science. Innovative health payers are beginning to complement their use of structured data with raw data – from healthcare transactions, physician notes, digital health applications, and more – to make decisions that affect population and individual member health.
Because payers have massive amounts of both types of data at their disposal, the concept of a “data lake” is gaining traction. A data lake stores both structured and unstructured data without the aid of expensive computer infrastructure. Although most healthcare data flows are traditionally unidirectional, a true data lake both receives data from transactional systems and returns data to those systems to support and enhance decision making.
Payers interested in utilizing data lake technology to better manage and leverage their data should follow these seven steps for success.
1. Determine where your organization sits on the spectrum of data science adoption.
Payers using traditional statistical models may prefer to continue using an enterprise data warehouse, on the assumption that they have already identified all of their data signals. Conversely, payers that have invested in artificial intelligence (AI) and machine learning may be prepared to build and implement data lakes to support the demands of their data analytics and data science teams.
2. Commit to a data lake approach across the enterprise.
Stakeholders across the organization must understand and appreciate the value that a data lake offers if this approach is to be successful. Payers currently dealing with data silos may face resistance to the data lake strategy, often because specific departments do not want to let go of their individual data sets, hardware, and statistical methods. Additionally, because all data lakes must be refined in their early stages, a strong commitment from leaders and team members is critical for navigating any initial bumps in the road. Ensure that you have leadership champions and invested individuals from various departments who can help manage the change associated with a data lake initiative.
3. Involve data scientists in the project from the beginning.
Data scientists are the primary consumers and users of the data lake. Therefore, they should be involved in guiding design decisions and choices related to data structure and access.
4. Assess whether your organization has the internal support and capabilities to build a data lake, or whether external support is required.
Payers will need the following expertise:
– Technical skills, specifically in the form of software developers and integrators
– Data management skills, as the data management team will be responsible for data governance and security (e.g., determining which fields exist in various data zones and who has access), data curation, and master file maintenance
– Development, operations, and infrastructure expertise, which are all in huge demand and are required to keep the infrastructure running and the deployments performing at scale
– Business and product management knowledge, as individuals with these skills will look for use cases and applications coming from the data lake and will work across the enterprise to direct data lake outputs
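To make the governance item above concrete, here is a minimal sketch of how a data management team might record which fields exist in each data zone and which roles may read them. The zone names, field names, and roles are hypothetical and chosen only for illustration; a real deployment would normally rely on the access controls of its storage platform rather than an in-memory table.

```python
# Hypothetical zones, fields, and roles, for illustration only.
ZONE_POLICY = {
    "raw": {
        "fields": {"member_id", "physician_note", "billed_amount"},
        "roles": {"data_engineer"},
    },
    "curated": {
        "fields": {"member_id", "billed_amount"},
        "roles": {"data_engineer", "data_scientist"},
    },
}


def can_read(role: str, zone: str, field: str) -> bool:
    """True only if the zone exists, exposes the field, and allows the role."""
    policy = ZONE_POLICY.get(zone)
    if policy is None:
        return False
    return field in policy["fields"] and role in policy["roles"]
```

In this sketch a data scientist can read `billed_amount` from the curated zone but cannot read `physician_note` from the raw zone, which is the kind of decision the governance role would own.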
If external support is necessary, ensure any potential partners have the specific skills and technology your organization needs. Ask the following questions:
– Does the vendor understand data science? Data science skills will be critical to the success of this initiative.
– Does the vendor understand your domain? Data lake architecture has domain-specific nuances that can easily lead to failure if an experienced team is not on board.
– Does the vendor understand the architecture required to meet your business needs? Architecture must be designed to meet your organization’s business goals now and in the future.
5. Identify the goal(s) of the data lake.
When designing a data lake, having a specific purpose in mind is key to ensuring the initial and ongoing success of the data lake program. More than one strategy can dictate how the data lake operates. This is particularly important to remember for organizations that need to start by building individual data lakes for specific business units rather than one for the enterprise.
6. Plan a rapid, but incremental, rollout.
A rapid initial data lake rollout is useful for gathering the timely feedback necessary to refine the solution. A best practice is to start by targeting small solutions, running the process end to end, and getting it to market immediately. Refinements are easy to make once the framework is established.
7. Apply and maintain an automated approach to enforcing data quality standards.
When it comes to data lakes, the quality of the data is more important than the quantity. Any data that does not meet minimum quality standards should be filtered out of, not pooled in, the lake.
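One way to picture this filtering is as an automated quality gate that runs before records are admitted to the lake. The sketch below is illustrative only: the `Claim` record, its fields, and the three rules are assumptions made for the example, not standards taken from any particular payer or product.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Claim:
    # Hypothetical record shape; field names are illustrative only.
    member_id: Optional[str]
    service_date: Optional[str]  # ISO date string, e.g. "2023-04-01"
    billed_amount: Optional[float]


def quality_issues(claim: Claim) -> list[str]:
    """Return the rule violations for one record (empty list if clean)."""
    issues = []
    if not claim.member_id:
        issues.append("missing member_id")
    if not claim.service_date:
        issues.append("missing service_date")
    if claim.billed_amount is None or claim.billed_amount < 0:
        issues.append("invalid billed_amount")
    return issues


def quality_gate(claims: Iterable[Claim]):
    """Split records into those admitted to the lake and those rejected."""
    admitted, rejected = [], []
    for claim in claims:
        issues = quality_issues(claim)
        if issues:
            rejected.append((claim, issues))
        else:
            admitted.append(claim)
    return admitted, rejected


# Example batch: one clean record and two that fail a rule.
batch = [
    Claim("M001", "2023-04-01", 125.0),
    Claim(None, "2023-04-02", 80.0),
    Claim("M003", "2023-04-03", -5.0),
]
admitted, rejected = quality_gate(batch)
```

Keeping the rejected records together with the reasons they failed, rather than silently dropping them, gives the data management team something to review when the rules themselves need refinement.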
Data lakes are proving incredibly valuable and useful for payer organizations. However, payers must clearly specify the purpose and goal of the data lake, understand the skills and level of expertise needed, locate the appropriate internal and/or external resources, and identify the leaders who will help to drive the initiative forward. With the proper foundation, a strategic data lake will deliver new data-driven insights and the desired return on investment.
Sumant Rao is senior vice president and business owner of performance analytics at Cotiviti, a leading provider of payment accuracy, risk adjustment, and quality and performance solutions for at-risk organizations.