When you google “How to run PySpark on Jupyter”, you get so many tutorials showcasing so many different ways to configure the IPython notebook to support PySpark that it’s a little confusing for non-geeks like me. So when I finally figured out a way to do it, with the help of multiple websites, I thought I would post it here to help my fellow non-geeks who wish to rock the world of Data Science!
Prerequisites: You should have Jupyter Notebook and PySpark locally installed on your machine.
Objective: Predict the pitch type, given various measurements of the pitch.
This dataset was obtained from a national baseball team, as part of a Predictive Modeling Challenge. The objective was to predict the Pitch_Type given various other parameters, such as start speed, height, and angles. It’s a classification problem, and I used Random Forest to build the model.
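To make the idea concrete, here is a minimal sketch of a Random Forest classifier for this kind of problem. It assumes scikit-learn, which may not be the library used in the original model, and it runs on randomly generated stand-in data, not the challenge dataset. The feature names, pitch labels, and cluster centers are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the challenge dataset).
# Hypothetical features: start speed (mph), release height (ft), break angle (deg).
feature_names = ["start_speed", "height", "break_angle"]
pitch_types = ["FF", "CU", "SL"]  # hypothetical Pitch_Type labels
centers = {
    "FF": (93.0, 6.0, 5.0),
    "CU": (78.0, 6.2, -10.0),
    "SL": (85.0, 5.9, -3.0),
}
n_per_class = 200

# One Gaussian cluster of pitches per pitch type, stacked in label order.
X = np.vstack([
    rng.normal(loc=centers[p], scale=(2.0, 0.3, 3.0), size=(n_per_class, 3))
    for p in pitch_types
])
y = np.repeat(pitch_types, n_per_class)

# Hold out 25% of the pitches to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Held-out accuracy: {accuracy:.3f}")
print(dict(zip(feature_names, model.feature_importances_.round(3))))
```

With real pitch data you would replace the synthetic `X` and `y` with the measured columns and the Pitch_Type column. The feature importances give a rough view of which measurements the forest relies on most.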
a) Count the number of students whose names start with the letter “D”.
b) Display the names of the students, their average on the 3 exams, and a letter grade. The letter grade is computed as follows:
an average >= 90 is an “A”, 80 to <90 is a “B”, and so on.
c) What is the class average on exam_1?
d) Repeat (b), but the display should be sorted in descending order by average on the 3