Description:

In the realm of data analytics, the ability to translate theoretical concepts into practical applications is crucial. This diagnostic analytics assignment gives students the chance to bridge that gap by working with real-world data. Students will select an organization, business, or industry they aspire to work in after graduation, whether as the founder of their own business or as an Analyst Consultant serving clients.

By immersing themselves in the data of their chosen field, students will gain experience in extracting meaningful insights, identifying trends, and making data-driven recommendations. Through this process, they will enhance their analytical skills and gain a deeper understanding of the challenges and opportunities within their chosen industry.

Key Concepts:

UML (Unified Modeling Language): UML is a standardized modeling language used in software engineering to visualize the design of a system. It provides a set of graphical notations to represent the structure and behavior of a system.

Process Flow Diagram: A process flow diagram is a visual representation of the steps involved in a process. It shows the flow of data or information through a system and helps the reader understand the sequence of operations.

Rationale: Rationale refers to the underlying reasons or justifications behind a particular decision, choice, or action. In the context of this assignment, providing a rationale means explaining the reasoning behind the choices made in the data analysis process.

RapidMiner: RapidMiner is a data science platform that provides a range of tools and techniques for data analysis, machine learning, and predictive modeling.

Operators (in the context of RapidMiner and data handling): In RapidMiner, operators are the building blocks used to perform specific tasks or operations on data.
These operators include data preprocessing, transformation, modeling, and evaluation tools.

Steps:

1. Select Target Company/Organization/Business Type:
   o Students will choose a target company, organization, or business type. This should be a general category rather than a specific company. Examples include an extermination company, a luxury automobile seller, a mortgage reinsurer, a convenience store, or a demolitions company.

2. Select Target City:
   o Students will choose a large city as the target location for their analysis. They need to ensure that this city has an "open data" repository. Students who are unfamiliar with "open data" should visit the Toronto or New York open data portals to understand what "open data" repositories are.

3. Define Objective for Target Organization:
   o Students will define the objective they aim to achieve or the problem they intend to solve for their chosen organization. For example, an extermination company may aim to identify new commercial clients. It could use the city's "health inspection" records to identify restaurants with recurring infestation problems and focus its marketing efforts on the areas with the most potential clients.

4. Locate Relevant Data Set from Open Data:
   o Students will find a relevant data set in the chosen city's open data repository to help them achieve their goals. This data set should relate to the objective defined for the target organization.

5. Rationale and Explanation:
   o Alongside the diagram or in an attached write-up, students will provide a rationale for their design choices.
   o For each step in the process, students should explain:
      • Why they selected specific operators.
      • How these operators are leveraged by RapidMiner to cleanse the data.
      • What problem or issue was resolved by each process step.

6. Research RapidMiner Operators:
   o Students are required to research and understand various RapidMiner operators, focusing on those related to data preprocessing, merging datasets, and handling incomplete or erroneous data.
   o They should explore the RapidMiner documentation and other resources to familiarize themselves with the functionality of the different operators.
   o Students must refer to the 'course case' provided for their narrative and work. They cannot select a case outside of the one provided.

7. Design a Process Flow Diagram:
   o Using a tool like Microsoft Visio, students will create a detailed process flow diagram demonstrating how they would handle and merge datasets in RapidMiner.
   o The diagram should include:
      • Start and end points.
      • Each step of the process, represented by appropriate symbols or shapes.
      • Connections between steps indicating the flow of data.
      • Annotations or descriptions for each step.
   o Begin exploring RapidMiner and the operators within it. These will be the individual tasks within your process flow. Spend a significant amount of time on this component, as it is one of the most important and challenging elements of your assignment delivery.

Importance of Cleansed Data in Diagnostic Analytics:

Cleansed data is a critical component of diagnostic analytics because it ensures the accuracy and reliability of analysis results. In diagnostic analytics, the goal is to identify the root causes of problems or issues within a system. Clean and well-prepared data enables analysts to:
• Understand Business Needs: Cleansed data provides a clear and accurate representation of the business environment, allowing analysts to understand the specific challenges and requirements of the organization.
• Generate Insights: By analyzing clean data, analysts can uncover patterns, trends, and anomalies that provide valuable insights into business operations, customer behavior, and market trends.
• Make Informed Decisions: With reliable data, decision-makers can make informed, strategic decisions that drive business growth and efficiency.

A clean dataset serves as the foundational element for any type of analytics, including diagnostic analytics, because it:
• Reduces Bias: Clean data minimizes the influence of errors, outliers, and inconsistencies, reducing bias in analysis results.
• Improves Accuracy: By removing irrelevant or inaccurate data points, cleansed data improves the accuracy of analytical models and findings.
• Enhances Repeatability: Analysts can confidently reproduce analysis processes and results when working with clean datasets, ensuring consistency and reliability.

Submission Requirements:

Students are required to prepare a comprehensive summary report of at least 2 pages, addressing every item in the Steps section of this document, for submission to their faculty member.

You will present your target company, its industry, the problem you have identified, and the dataset(s) you will use. You will need to identify the specific fields (columns) that contain the data you require and explain why these fields (columns) are relevant to your solution. You will then build a process flow showing the steps you'll need to correct or rectify the data so that it fits your specific needs. Each of the major data handling/cleansing tasks must have a clear rationale attached or annotated to it, explaining how the task contributes to your overall effort to generate value-add insights. If you require additional training on how to build a process flow, watch the micro-lectures on this topic included within this course.

For those considering a new business venture, your faculty member represents your Angel Investor. Your report must convince them of your company's viability before any funding is provided.
For students assisting an existing company, your faculty member acts as your direct Supervisor at the consultancy firm. You must demonstrate to your faculty member that your proposed solution is marketable to clients.

Additional Notes:

Students are encouraged to engage in brainstorming, discussions, and collaborative efforts throughout this project. The creative process benefits greatly from multiple perspectives working together to explore innovative ways to assist a company, conduct research, or enhance its potential. This collaborative approach is fundamental to developing critical thinking skills.

It is expected that each student will select a unique combination of company and problem to address. You may consult with your faculty member to gain approval for your chosen topic. Alternatively, you can present your topic to the class, allowing the entire class to brainstorm and provide input on how to assist your target company.

Conclusion:

This assignment is an opportunity for students to develop an understanding of the practical application of data in analytics and of how the framework for diagnostic analytics and insight generation is set. It is crucial to note that this is an individual assignment, and each student must submit their own work.

Time management is a critical skill: Students must be aware of the posted end and due dates for the assignment. Deadlines are non-negotiable, and late submissions will incur penalties. Effective time management is therefore essential to ensure timely completion of the assignment.

By working through this assignment, students will gain a deeper understanding of data and diagnostic analytics, of the capabilities of tools such as RapidMiner, and of operators and data cleansing techniques. They will also develop valuable skills in research, problem-solving, and data analysis.
Understanding the importance of clean and well-prepared data lays a strong foundation for successful analytics projects, enabling organizations to make informed decisions and drive business growth.

Success Criteria:

This assessment's overall weight is stated in the Instructional Plan (IP) and reflected in the eConestoga grade book. Any student who discovers a conflict between the IP and the grade book shall notify their course faculty member.

For specific evaluation standards, students shall consult the associated assessment rubric found in the Rubrics section of eConestoga.
• Failure to submit an assessment by the specified end date, to the correct dropbox, will result in a grade of zero (0).
   o No opportunity will be provided to make up for an unsubmitted deliverable.
   o It is the student's responsibility to ensure that their work has been submitted through eConestoga, on time, to the correct course and in the correct folder.
• Be aware that Conestoga College's Academic Offense policy will be enforced.

Format Requirements:

The following requirements are enforced for this assessment.
• Length of paper: N/A
• Number of files: 1, 1-page
• File Name: See O3 Student Submissions Naming Convention.pdf
• Audience and Tone:
   o Use a professional, neutral tone
   o Audience type: professional
   o Avoid use of passive phrasing
• Citation Style:
   o APA (required)
• Title Page (10% penalty if not included):
   o College Name, Program Code, Course Code, Course Section, Assignment Title, Student Name, Date
• Font Style:
   o Body Size: 11pt
   o Font Style: Calibri
• Line Spacing:
   o 1.08 (Microsoft default)
• Margins:
   o Narrow setting (0.5 inch)
• Document Format:
   o For each non-title page, a header (or footer) including: date, course code, student name
   o Dropbox submission through eConestoga
   o PD