StrataConf NYC 2018

I’m behind in writing this, but the other week I had the opportunity to attend my first O’Reilly Strata Data Conference (StrataConf) in NYC. If you’re not familiar with it, StrataConf is an annual conference on the latest trends in data engineering, machine learning, and data science, held at three locations throughout the year. My goal in attending was to gain a high-level overview of industry trends, and specifically of the technologies that companies are moving towards…or away from.

The conference used to put more emphasis on Hadoop and was formerly known as Strata + Hadoop World. However, since the “big data” ecosystem (a term that isn’t really used much anymore either) has evolved rapidly over the last few years and is less Hadoop-centric, the conference is now called the Strata Data Conference.

Observations:

  1. Companies struggle to get value from data efforts. Data and hot new technologies will NOT solve your culture or process problems!
  2. In 2018 this one is probably obvious, but very few people are building these systems on-prem
  3. Usage of Hadoop is declining in favor of Spark
  4. Streaming near real-time data using Apache Kafka seems to be trending upward
  5. Providing analysts and data scientists with interactive query ability (Spark SQL, etc.)
  6. Companies are beginning to “democratize” data by empowering business users (you probably don’t need a $150,000 “Unicorn” Data Scientist for most business cases)
  7. Few believe they have implemented everything they need to be GDPR compliant

Hot / Trending Skills:

  1. Data literacy
  2. Apache Spark (batch, streaming, SQL)
  3. Apache Kafka (streaming, SQL)
  4. Facebook Presto
  5. Apache Flume
  6. Google TensorFlow
  7. Google BigQuery, GIS, and Geo Viz (not widely adopted but has my attention)

Cold / Declining Skills:

  1. Hadoop
  2. Apache Drill
  3. Apache Storm

“We have so much data…our goal should be to make it useful”