General Instructions
You are expected to complete a final project in this course, applying the tools we learned to a qualifying dataset that matches your interests. Your project should use at least one of the tools covered in this course, including, but not limited to, Hive, Impala, MapReduce, Spark, or a combination of these tools. Use bash scripting in the initial steps to clean and prepare your data before loading it into Hadoop. The final project consists of two parts: 1- Presentation and 2- Report.
Please be creative.
Find cool results and present them with appropriate (and equally cool) graphs and techniques.
For visualization, you can use Tableau to create polished plots. You can also make word cloud plots online using this website. It is free! Try it.
1. Final presentation
Prepare to present your project in about 8 minutes, followed by approximately 2 minutes of Q&A. All group members are expected to participate in the presentation.
1.1. Presentation format
Apart from your group name, the presentation should include the following:
1- Description of the data, 2- Problem Statement, 3- Why is this big data? 4- Method & Results, and 5- Conclusion.
1- Description of the data:
Tell us what the data is. When was it collected? Who collected it? What is the source? How large is it? Do you have any links to the data? How many records does it have? How many features (columns)? Is it structured or unstructured? …
2- Problem Statement
What are you trying to do? What is your aim? What are your research questions?
3- Why is this big data?
Why did you select this data? Why does it qualify as big data?
4- Method & Results:
What methods did you use? Which Hadoop tools did you use? What are your findings? Do you have any plots or graphs?
5- Conclusion
Summarize your findings and let us know if you have any suggestions based on the data. For example: "Machine 12 has too many issues, so it should be investigated."
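For the "Description of the data" part, the size, record count, and column count asked about above can be gathered in one streaming pass over the file. The sketch below is a minimal example, assuming a single UTF-8 CSV file whose first row is a header; `describe_csv` is a hypothetical helper name, not a course-provided tool.

```python
import csv
import sys
from pathlib import Path


def describe_csv(path: str) -> dict:
    """Report size, record count, and column count of one CSV file.

    Assumes the first row is a header. Rows are streamed one at a time,
    so the whole file never has to fit in memory.
    """
    size_bytes = Path(path).stat().st_size
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # column names; [] if the file is empty
        # Count the remaining non-blank rows; csv.reader yields [] for blank lines.
        records = sum(1 for row in reader if row)
    return {
        "size_mb": size_bytes / 1_000_000,  # decimal megabytes (10**6 bytes)
        "records": records,
        "columns": len(header),
        "column_names": header,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python describe_csv.py data.csv
    print(describe_csv(sys.argv[1]))
```

Using the `csv` module instead of simply counting lines (for example with `wc -l`) keeps quoted fields that contain line breaks from being counted as extra records.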
1.2. Grading Rubric for Presentation
Each member’s presentation rubric includes the following:
a) Presentation skills (staying on time, clear delivery, narration, PowerPoint style, etc.) 7 pts
b) Project introduction 3 pts
c) Problem Statement 3 pts
d) Dataset 3 pts (How big is your dataset? What is it about? Why is it big data? How many rows and columns? When was it collected? …)
e) Methods 5 pts
f) Conclusion 5 pts
g) Novelty & Creativity 7 pts (being creative in your findings and results; having a novel method and dataset)
h) Participation 7 pts
For this part, each team member evaluates the other members through a questionnaire in which you assign points to your teammates (not yourself). I’ll send the questionnaire a few days before the presentations. By default, I assume each member receives full points; otherwise, I’ll use the grades submitted.
2. Report
Discuss your project in detail and submit a clean, professional, neat report. Your report must include an executive summary (learn how to write one), all sections discussed in the presentation (Description of the data, Problem Statement, Why this is big data, Method and Results, and Conclusion), and your code in an appendix. Reports are limited to 10 pages (excluding appendices).
2.1. Grading Rubric for Report
a) Professional report skills (clean, well-organized, clear grammar, title page, page numbers, etc.) 10 pts
b) Executive summary 10 pts
c) Introduction & Problem Statement 5 pts
d) Code and Dataset 5 pts (How big is your dataset? What is it about? Why is it big data? …)
e) Methods 5 pts
f) Conclusion 5 pts
Everyone is expected to cooperate positively with the team. Remember that your group members assign 7 of your presentation points (Participation). If a member does not contribute to the project, 1- all of that member’s project points will be deducted, and 2- the case will be reported to the school for further action.
3. Dataset
You must bring your own dataset. Your data should be large enough to qualify as big data (minimum 500 MB) but not so large that it fills the server space (maximum 2 GB). There are many sources from which you can get a dataset. Here are some to get you started:
Kaggle
noaa.gov
Weather.com
…
You may check with me to discuss whether your dataset qualifies for this project.
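Before uploading, the 500 MB to 2 GB limits stated above can be checked with a small script such as the sketch below. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and counts the uncompressed size of a single file or of every file under a directory; the course may measure size differently, so treat the thresholds as assumptions to confirm. `dataset_size_bytes` and `size_verdict` are hypothetical helper names.

```python
from pathlib import Path

# Assumed decimal units; confirm with the instructor whether 1024-based units apply.
MIN_BYTES = 500 * 10**6  # 500 MB lower limit
MAX_BYTES = 2 * 10**9    # 2 GB upper limit


def dataset_size_bytes(path: str) -> int:
    """Total size of a single file, or of every file under a directory."""
    p = Path(path)
    if p.is_file():
        return p.stat().st_size
    # rglob("*") walks subdirectories too; skip entries that are not regular files.
    return sum(f.stat().st_size for f in p.rglob("*") if f.is_file())


def size_verdict(size_bytes: int) -> str:
    """Classify a byte count against the 500 MB to 2 GB project window (inclusive)."""
    if size_bytes < MIN_BYTES:
        return "too small"
    if size_bytes > MAX_BYTES:
        return "too large"
    return "ok"
```

For example, `size_verdict(dataset_size_bytes("data/"))` returns `"ok"`, `"too small"`, or `"too large"`. In bash, `du -sh data/` gives a comparable quick estimate, though it reports disk usage in 1024-based units.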
So, in total, I need three things: one report, one PPT, and a speech script for the PPT, with each paragraph marked with the PPT slide it belongs to.
By admin