It took me a few hours to connect Zeppelin, Spark, and MongoDB, and I couldn't find a solution to this problem online; hence this short entry.
First, I added a dependency to the MongoDB Connector for Spark in my Zeppelin notebook.
%dep
z.reset()
z.load("org.mongodb.spark:mongo-spark-connector_2.10:2.2.0")

%spark
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD

val rdd = MongoSpark.load(sc)
This gave:
java.lang.IllegalArgumentException: Missing database name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.database' property
Then, after realizing that you cannot dynamically reconfigure a running SparkContext, I used the Zeppelin GUI (the Spark interpreter settings) to set the property instead.
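For reference, this is roughly what the setting looks like as an interpreter property; the host, port, database, and collection below are placeholders, so substitute your own values:

spark.mongodb.input.uri = mongodb://localhost:27017/mydb.mycollection

After saving, restart the Spark interpreter so the new property is picked up by the SparkContext.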
It is working well now!
rdd: com.mongodb.spark.rdd.MongoRDD[org.bson.Document] = MongoRDD[0] at RDD at MongoRDD.scala:47
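To sanity-check the connection, you can run a couple of actions on the RDD. The snippet below is a minimal sketch: it assumes the collection configured above exists and contains at least one document.

%spark
// Count the documents in the configured collection
println(rdd.count())

// Print the first document as JSON (MongoRDD yields org.bson.Document)
println(rdd.first.toJson)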