mahmoudparsian/pyspark-algorithms

Source Code for PySpark Algorithms Book

Unlock the power of big data with the PySpark Algorithms book


PySpark Algorithms Book:

Author: Mahmoud Parsian (mahmoud.parsian@yahoo.com)

Publication date: August 2019


About PySpark Algorithms Book

  • This book is about PySpark (Python API for Spark)
  • Introductory book on how to solve data problems using PySpark
  • Learn how to use mappers, filters, and reducers
  • Learn how to partition data for fast queries
  • Learn how to use the mapPartitions() transformation
  • Learn how to use reduceByKey(), groupByKey(), and combineByKey() transformations
  • Learn how to use Spark's transformations and actions for solving real problems
  • Learn how to use RDDs and DataFrames
  • Learn how to read/write data from many data sources
  • Learn how to use logistic regression
  • Learn how to use Spark's reduction transformations
  • Learn how to use GraphFrames
  • Learn how to use motifs in GraphFrames
  • Learn how to use monoids in MapReduce algorithms
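
As an illustration of the monoid idea in the last bullet, here is a minimal plain-Python sketch (not taken from the book, and with hypothetical sample data). It assumes the common trick of computing a per-key mean by carrying `(sum, count)` pairs: the pairs combine associatively and have an identity element, so partial results from different partitions can be merged in any order. In PySpark the same combining function could plausibly be passed to `reduceByKey()` after a `mapValues(lambda v: (v, 1))` step.

```python
from functools import reduce

# Hypothetical sample data: (key, value) pairs split across two "partitions".
partitions = [
    [("a", 2.0), ("b", 4.0), ("a", 6.0)],
    [("a", 1.0), ("b", 8.0)],
]

# Identity element of the (sum, count) monoid.
IDENTITY = (0.0, 0)

def to_monoid(value):
    """Lift a raw value into the (sum, count) monoid."""
    return (value, 1)

def combine(x, y):
    """Associative combine: add sums and counts component-wise."""
    return (x[0] + y[0], x[1] + y[1])

def reduce_partition(pairs):
    """Local (per-partition) reduction, like a combiner."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine(acc.get(key, IDENTITY), to_monoid(value))
    return acc

def merge(left, right):
    """Merge two partial results key by key using the same combine."""
    merged = dict(left)
    for key, sum_count in right.items():
        merged[key] = combine(merged.get(key, IDENTITY), sum_count)
    return merged

partials = [reduce_partition(p) for p in partitions]
totals = reduce(merge, partials, {})

# Only at the very end are the (sum, count) pairs turned into means.
means = {key: s / c for key, (s, c) in totals.items()}
print(means)
```

Averaging the raw values directly is not associative (a mean of means is generally wrong), which is why the sketch defers the division until all partial `(sum, count)` pairs have been merged.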


Software


Table of Contents

chap01: Introduction to PySpark
chap02: Hello World
chap03: Data Abstractions
chap04: Getting Started -- Sample Chapter
chap05: Transformations in Spark
chap06: Reductions in Spark
chap07: DataFrames and SQL
chap08: Spark DataSources
chap09: Logistic Regression
chap10: Movie Recommendations
chap11: Graph Algorithms
chap12: Design Patterns and Monoids

Appendix A: How to Install Spark
Appendix B: How to Use Lambda Expressions
Appendix C: Questions and Answers (50+ Q&A)


Future chapters:

chap13: FP-Growth
chap14: LDA
chap15: Linear Regression

