Wednesday, November 22, 2017

Connecting Zeppelin, Spark, and MongoDB

It took me a few hours to get Zeppelin, Spark, and MongoDB working together. I couldn't find a solution to this problem online, hence this short entry.

First, I added a dependency on the MongoDB Connector for Spark to my Zeppelin notebook.


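%dep
// The %dep paragraph must run before the Spark interpreter starts
z.reset()  // clear any previously loaded artifacts
// Scala 2.10 build of the MongoDB Connector for Spark, version 2.2.0
z.load("org.mongodb.spark:mongo-spark-connector_2.10:2.2.0")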

%spark
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
// Reads spark.mongodb.input.uri (or .database) from the SparkContext's configuration
val rdd = MongoSpark.load(sc)

This gave:

java.lang.IllegalArgumentException: Missing database name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.database' property

Then I realized that you cannot reconfigure a SparkContext once it has been created, so I set the property in the Spark interpreter settings through the Zeppelin GUI instead.
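The same rule applies outside Zeppelin: the connector reads its input settings from the configuration the SparkContext was built with, so the property has to be in place before the context exists. Below is a minimal sketch of a standalone Scala program, assuming Spark 2.x with the same mongo-spark-connector on the classpath and a MongoDB reachable at the hypothetical URI `mongodb://127.0.0.1/test.myCollection` (the names `MongoLoadSketch`, `buildConf`, `test`, and `myCollection` are placeholders of my own). My understanding is that the Zeppelin interpreter setting plays the role of the `set` call here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark

object MongoLoadSketch {
  // Hypothetical URI: host, database ("test") and collection ("myCollection")
  // are placeholders; replace them with your own.
  val InputUri = "mongodb://127.0.0.1/test.myCollection"

  // Build the configuration up front. The connector looks up
  // spark.mongodb.input.uri in the SparkContext's configuration, and that
  // configuration cannot be changed after the context is created.
  def buildConf(inputUri: String): SparkConf =
    new SparkConf()
      .setAppName("mongo-load-sketch")
      .setMaster("local[*]")
      .set("spark.mongodb.input.uri", inputUri)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf(InputUri))
    try {
      // Same call as in the notebook; database and collection come from the URI.
      val rdd = MongoSpark.load(sc)
      println(s"Loaded ${rdd.count()} documents")
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the configuration in `buildConf` makes it easy to check which properties the context will start with, without needing a running MongoDB.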


It is working well now!

rdd: com.mongodb.spark.rdd.MongoRDD[org.bson.Document] = MongoRDD[0] at RDD at MongoRDD.scala:47