Spark 1.6.0 was released today, with API and performance updates to the core, and improvements to Spark Streaming and MLlib.
This release brings operational and performance improvements in Spark core, including a new network transport subsystem designed for very large shuffles. Spark SQL introduces an API for external data sources along with Hive 13 support, dynamic partitioning, and the fixed-precision decimal type. MLlib adds a new pipeline-oriented package (spark.ml) for composing multiple algorithms. Spark Streaming adds a Python API and a write-ahead log for fault tolerance. Finally, GraphX has graduated from alpha and introduces a stable API.
Go ahead: give yourself a pat on the back. You’ve been doing a great job with big data. You’re collecting and analyzing customer information, gleaning insights into what customers want and need, and acting on those insights. For the first time ever, you’re able to position products to respond to customers’ greatest needs — and you know it’s working, because you’re collecting data that proves it. You’re way ahead of most of your peers in deriving real value from big data. But you’re not done yet. If you want to stay competitive as data growth continues to skyrocket, you’re going to have to do much more to get the maximum value from the customer data you’re collecting. And to do it, you’re going to need artificial intelligence / machine learning.
Feb 12, 2014 – Elasticsearch.org announced the release of Elasticsearch v1.0.0, the open source distributed RESTful search and analytics system, built on top of Lucene 4.6.1. The release adds a number of enhancements that make it a more robust enterprise search solution.
2013 Internet Trends report by Mary Meeker, KPCB.
Extensive and insightful slide deck on the state of the Web, from Mary Meeker & Liang Wu, KPCB (Kleiner Perkins Caufield & Byers), presented at D11 Conference, AllThingsD.
Overview of 6 announcements from O’Reilly’s Strata Conference, by Doug Henschen, InformationWeek
EMC brings SQL analysis to Hadoop — Intel throws its weight behind Hadoop — Revolution brings predictive analytics to big data — Cloudera makes Hadoop safer — Hortonworks and Microsoft deliver as promised — MapR and Google rev their engines.
BT Addresses Big Data and Security Challenges With New Visualization Service | SecurityWeek.Com
BT launched Assure Analytics, a new security data analysis service that helps organisations collect, arrange and evaluate big data sets and presents them visually to improve decision-making. It enables businesses to make informed, split-second decisions and develop effective long-term policies to govern their use of resources and their response to potential risks and security threats across their infrastructure and operations.
Cascading is an application framework for Java developers to quickly and easily develop robust data analytics and data management applications on Apache Hadoop. Cascading 2.0 is now publicly available under the Apache 2.0 license.