5. Perform market research, inventory your data, and assess your data readiness
Market research
GenAI is evolving rapidly. The limits of what it can do are expanding almost daily. What you think it can do may fall well short of what it actually can do. Conducting market research is a useful way to get a pulse check of GenAI’s current capabilities.
One option is to hold a vendor pitch day to hear from innovators. In September 2024, the Government Operations Agency hosted the first-ever California GenAI Innovator Showcase. Dozens of GenAI innovators presented ideas about how to solve some departments’ biggest challenges.
Or perhaps your team wants to conduct more traditional market research. CDT has a Market Research Guide. It may also be helpful to research how other departments solved a similar problem with GenAI.
No matter how you conduct your research, it’s important to be informed about what GenAI can do today.
Inventory your data
Data is critical for any GenAI solution. Comprehensive and accurate data ensures the models learn correctly and give reliable output. The saying “garbage in, garbage out” is highly relevant in GenAI.
It’s also important to think about potential biases in your data. Making your dataset as diverse and representative as possible helps reduce bias in the GenAI models. Being aware of gaps in representation will help inform any mitigations.
The quality of the underlying data is vital to the quality of a GenAI solution’s output. Poor data quality can lead to errors, more bias, and ineffective solutions.
Plan for data preparation, mitigation, and cleaning before you tune or train a procured GenAI solution. We recommend you start by asking your program subject matter experts the following types of questions:
- What is your data? Make a list of the datasets you’ll need and provide a simple description of each.
- Where is your data? GenAI relies on large, well-organized datasets to function well. A great place to start is knowing where your data lives right now. Write down whether it’s stored in a database (cloud or on-premises) or whether there is no centralized storage solution.
- How much data do you have? If you don’t have much data or information, the models may not have what they need to perform well. If you have a lot of data, note that too, so you can better understand how much compute will be needed. Be sure to write down both the actual size (in MB, GB, etc.) and the span or breadth of your data.
- What format is your data in? Many GenAI solutions require the data to be in a specific format. Be sure to capture the format of each dataset.
- Is your data accessible? It’s important to get ahead of any access issues, both legal and technical. Asking this question early can prevent long delays and inefficiencies later in the process.
- Is your data very sparse? Widespread missing or incomplete values in your data can make it difficult for GenAI models to perform well.
- Is your data mostly consistent? Large inconsistencies in how the data has been reported or processed can lead to unreliable outputs in the GenAI solution.
- Is your data from reliable sources? If the data is not from credible sources, it can be difficult for the AI solution to make informed decisions.
- Who owns the data? Do you have legal rights to use the data for this purpose? Who controls it?
- Has the data been scrubbed of personally identifiable information (PII)? If the data will need to be scrubbed of PII, that is another step in the process that should be flagged early on.
- What steps will need to be taken to prepare the data? Often the data is not completely ready and will need significant cleaning, reformatting, or migration before it is usable. Meeting with data subject matter experts will be critical to understanding how much preparation your data needs.
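Some of the questions above can be partly answered with a quick script before you meet with your subject matter experts. The following is a minimal sketch in Python, not a definitive method. It assumes a dataset that can be exported as CSV text. The names `DatasetProfile`, `profile_csv`, and `PII_PATTERNS`, along with the sample records, are hypothetical and used only for illustration. The two patterns shown catch only obvious email addresses and Social Security-style numbers, so treat any result as a prompt for human review, not as proof that data is free of PII.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only. Real PII detection
# needs much more than two regular expressions.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class DatasetProfile:
    """One inventory entry; the fields mirror the checklist questions."""
    name: str                   # What is your data?
    description: str
    location: str               # Where is your data?
    owner: str                  # Who owns the data?
    data_format: str            # What format is your data in?
    size_bytes: int             # How much data do you have?
    row_count: int
    missing_fraction: dict      # Is your data very sparse? (per column)
    possible_pii_columns: list  # Has the data been scrubbed of PII?


def profile_csv(text, name, description, location, owner):
    """Build an inventory entry from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []

    missing_fraction = {}
    possible_pii_columns = []
    for column in columns:
        # Treat absent fields and blank strings as missing values.
        values = [(row[column] or "").strip() for row in rows]
        empty = sum(1 for value in values if value == "")
        missing_fraction[column] = empty / len(rows) if rows else 0.0
        # Flag the column if any value matches any illustrative pattern.
        if any(p.search(v) for v in values for p in PII_PATTERNS.values()):
            possible_pii_columns.append(column)

    return DatasetProfile(
        name=name,
        description=description,
        location=location,
        owner=owner,
        data_format="CSV",
        size_bytes=len(text.encode("utf-8")),
        row_count=len(rows),
        missing_fraction=missing_fraction,
        possible_pii_columns=possible_pii_columns,
    )


# Made-up sample records, for illustration only.
SAMPLE = """case_id,applicant_email,county,notes
1001,pat@example.org,Yolo,Renewal
1002,,Kern,
1003,lee@example.org,,Follow-up needed
1004,,Kern,
"""

profile = profile_csv(
    SAMPLE,
    name="Example case records",
    description="Made-up records used to demonstrate the inventory",
    location="No centralized storage (spreadsheet export)",
    owner="Example program office",
)

for column, fraction in profile.missing_fraction.items():
    print(f"{column}: {fraction:.0%} missing")
print("Possible PII columns:", profile.possible_pii_columns)
```

A script like this only covers the measurable questions (size, format, sparsity, and an initial PII flag). Questions about ownership, legal rights, reliability of sources, and consistency of reporting still need answers from your program and data subject matter experts.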
You’ll want your data to be ready before you sign a purchase order.