I've spent the last eight months as an intern at Noom. I've worked on the Growth and Data teams, so I've seen first-hand how survey data is handled. As my internship was coming to an end, I was asked if there was anything specific I wanted to pursue with my remaining time. That question can be surprisingly tricky to answer; it's not often you get a blank slate. However, I realized this was the perfect opportunity to tackle the survey data problem. I outlined the problem and my proposed solution, pitched it to the team, and for the next few weeks, it was my main focus.

All survey responses are stored in a JSON blob called `survey_data`. We use Redshift as our analysis platform, and getting data out of a JSON blob using Redshift is not ideal (more on this later). Our current view only captures a subset of the questions we ask, and it still takes *over two hours* to run.

Instead of living in a database or a CMS, all survey questions are defined in code to help engineers get experiments out more quickly. From a data perspective, the downside is that we have no idea what a question was asking: that information lives in code and doesn't make it into the `survey_data` blob. All we have is a `question_id`, which is a short string identifying the question. The list of questions changes all the time because the Growth team is continuously running experiments. That means there's no comprehensive list of the questions that are asked at a given time.
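To make the shape of the problem concrete, here is a minimal Python sketch of flattening a `survey_data` blob into one row per answer. The blob layout (a flat object mapping each `question_id` to an answer, with lists for multi-select questions), the function name `flatten_survey_data`, and the sample IDs are all assumptions for illustration, not the actual schema or pipeline.

```python
import json
from typing import Iterator, Tuple


def flatten_survey_data(user_id: int, survey_data: str) -> Iterator[Tuple[int, str, str]]:
    """Yield one (user_id, question_id, answer) row per answer in the blob.

    Assumes the blob is a flat JSON object keyed by question_id (hypothetical layout).
    """
    blob = json.loads(survey_data)
    for question_id, answer in blob.items():
        # Assumption: multi-select answers are stored as lists; emit one row per choice.
        answers = answer if isinstance(answer, list) else [answer]
        for value in answers:
            yield (user_id, question_id, str(value))


# Hypothetical sample blob for a single user.
sample = '{"goal": "lose_weight", "activities": ["walking", "yoga"]}'
rows = list(flatten_survey_data(42, sample))
for row in rows:
    print(row)
```

Note that the rows carry only the `question_id`, never the question text, which is exactly the gap described above.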