A Rundown of Cloudera's Insanely Demanding CCP:DS Certification Syllabus

Required Exams

· DS700 – Descriptive and Inferential Statistics on Big Data

· DS701 – Advanced Analytical Techniques on Big Data

· DS702 – Machine Learning at Scale

The exams may be taken in any order, but all three must be passed within 365 days of each other. A candidate who fails an exam must wait thirty calendar days, beginning the day after the failed attempt, before retaking that exam. Candidates must pay for each exam attempt.

Each passed exam is verifiable in your exam transcript and history.

Each exam is a single challenge scenario. You are provided access to the scenario, the data sets, and the cluster. You are given eight (8) hours to complete the challenge.

Required Skills

Common Skills (all exams)

· Extract relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of “noise”

· Extract features from data stored in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data
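As an illustration of the common skills, here is a minimal Python sketch of feature extraction that tolerates bad and partial records. The JSON-lines input and the field names `user_id` and `duration_ms` are assumptions made up for this example; nothing here comes from the exam material itself.

```python
import json

def extract_features(lines):
    """Parse JSON-lines records, skipping malformed or partial ones.

    Returns (features, n_rejected). The fields `user_id` and `duration_ms`
    are hypothetical, chosen only for illustration.
    """
    features, rejected = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            user = str(record["user_id"])
            duration_ms = float(record["duration_ms"])
        except (ValueError, KeyError, TypeError):
            # ValueError: invalid JSON or a non-numeric duration
            # KeyError:   a required field is missing (partial record)
            # TypeError:  the record is not an object, or a value is null
            rejected += 1
            continue
        if duration_ms < 0:
            # Well-formed but impossible value: treat it as noise.
            rejected += 1
            continue
        features.append((user, duration_ms / 1000.0))  # seconds
    return features, rejected

raw = [
    '{"user_id": "a1", "duration_ms": 1500}',
    '{"user_id": "a2"}',                          # partial record
    '{"user_id": "a3", "duration_ms": "oops"}',   # bad value
    'not json at all',                            # noise
    '{"user_id": "a4", "duration_ms": -20}',      # impossible value
    '{"user_id": "a5", "duration_ms": 250}',
]
features, rejected = extract_features(raw)
print(features, rejected)
```

Counting the rejected records, rather than dropping them silently, makes it easy to notice when a parsing rule is discarding far more data than expected.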

DS700 - Descriptive and Inferential Statistics on Big Data

· Use statistical tests to determine confidence for a hypothesis

· Calculate common summary statistics, such as mean, variance, and counts

· Fit a distribution to a dataset and use that distribution to predict event likelihoods

· Perform complex statistical calculations on a large dataset
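The summary-statistics and hypothesis-testing skills above can be sketched in plain Python. This is an illustration under stated assumptions, not an exam solution: it uses Welford's single-pass update so the data never has to fit in memory, merges per-partition results the way a distributed job might, and applies a large-sample z-test. The class name `RunningStats`, the sample values, and the null-hypothesis mean of 3.0 are all hypothetical.

```python
import math
from statistics import NormalDist

class RunningStats:
    """Single-pass count, mean, and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two partitions' statistics without revisiting the data."""
        merged = RunningStats()
        merged.n = self.n + other.n
        if merged.n == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.n / merged.n
        merged.m2 = (self.m2 + other.m2
                     + delta * delta * self.n * other.n / merged.n)
        return merged

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def z_test_p_value(stats, mu0):
    """Two-sided p-value for H0: true mean == mu0 (normal approximation)."""
    standard_error = math.sqrt(stats.variance / stats.n)
    z = (stats.mean - mu0) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Two hypothetical partitions of one dataset.
partitions = [[2.0, 4.0, 4.0, 4.0], [5.0, 5.0, 7.0, 9.0]]
total = RunningStats()
for part in partitions:
    local = RunningStats()
    for value in part:
        local.add(value)
    total = total.merge(local)

p_value = z_test_p_value(total, mu0=3.0)
print(total.n, total.mean, total.variance, p_value)
```

With only eight values the normal approximation is rough; it is used here because it becomes reasonable at the sample sizes a cluster would actually process.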

DS701 - Advanced Analytical Techniques on Big Data

· Build a model that contains relevant features from a large dataset

· Define relevant data groupings, including number, size, and characteristics

· Assign data records from a large dataset into a defined set of data groupings

· Evaluate goodness of fit for a given set of data groupings and a dataset

· Apply advanced analytical techniques, such as network graph analysis or outlier detection
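Defining groupings, assigning records to them, and scoring the fit can be illustrated with a small k-means sketch. k-means and the within-cluster sum of squares are one possible choice, picked here for brevity; the syllabus does not name a specific technique. The points and starting centroids are made-up values.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of its members.
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def wcss(points, centroids, labels):
    """Within-cluster sum of squares: lower means a tighter fit."""
    return sum(math.dist(p, centroids[label]) ** 2
               for p, label in zip(points, labels))

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
fit = wcss(points, centroids, labels)
print(labels, fit)
```

Comparing this score across different numbers of groupings is one common way to judge how many groupings the data supports.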

DS702 - Machine Learning at Scale

· Build a model that contains relevant features from a large dataset

· Predict labels for an unlabeled dataset using a labeled dataset for reference

· Select a classification algorithm that is appropriate for the given dataset

· Tune algorithm metaparameters to maximize algorithm performance

· Use validation techniques to determine how successful a given algorithm is on the given dataset
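Tuning and validation can be sketched with a k-nearest-neighbours classifier whose `k` is chosen by cross-validation. The classifier, the candidate values of `k`, and the one-dimensional toy data (including two deliberately mislabeled points) are all assumptions made for this illustration; on the exam cluster a library such as scikit-learn or Spark's ML tooling would be the more likely route.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training rows nearest to `query`."""
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def cross_val_accuracy(data, k, folds):
    """Mean accuracy of k-NN over `folds` interleaved validation folds."""
    scores = []
    for f in range(folds):
        held_out = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / folds

# Toy labeled data: class "a" near 0..5, class "b" near 10..15.
data = [((float(x),), "a") for x in range(0, 6)]
data += [((float(x),), "b") for x in range(10, 16)]
data += [((2.5,), "b"), ((12.5,), "a")]  # two deliberately mislabeled points

# folds == len(data) gives leave-one-out cross-validation.
folds = len(data)
accuracy = {k: cross_val_accuracy(data, k, folds) for k in (1, 3, 5)}
best_k = max(accuracy, key=accuracy.get)
print(accuracy, best_k)
```

With `k = 1` each mislabeled point misleads its two nearest neighbours, while larger values of `k` outvote it. That is the kind of effect a validation score is meant to expose.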

Exam Delivery and Cluster Information

All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime.

Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others. In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb, SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER), Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib, ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water.

Currently, the cluster is open to the internet and there are no restrictions on tools you can install or websites or resources you may use.

