Summary of EMBRC / St Andrews requirements for Processing
Detailed requirements
- Data processing desiderata: input
a. What data are to be processed? What are their:
> Typologies: Varies
> Volume: Varies
> Velocity: Varies
> Variety: Varies
b. How is the data made available to the analytics phase? By file, by web (stream/protocol), etc.
> Files
c. Please provide concrete examples of data.
> It varies a lot. There are also data protection issues.
- Data processing desiderata: analytics
a. Computing needs quantification:
a.1 How many processes do you need to execute?
a.2 How much time does each process take/should take?
> Varies
b. Process implementation:
b.1 What do you use in terms of:
> Programming languages: Varies
> Platform: Varies
> Specific software requirements: Varies
c. Is there a possibility to inject proprietary/user defined algorithms/processes for each of the above?
> Yes
d. Do you use a sandbox to test and tune the algorithm/process for each of the above?
> Yes
f. Do you use batch or interactive processing?
> Both
g. Do you use a monitoring console?
> It varies
h. Please provide concrete examples of processes to be supported/currently in use.
> It varies
- Data processing desiderata: output
a. What data are produced?
> Mainly results of analysis
- How are analytics outcomes made available?
> By paper
- Statistical questions
a. Is the data collected with a distinct question/hypothesis in mind, or is something simply being measured?
> Varies
b. Will questions/hypotheses be generated or refined (broadened or narrowed in scope) after the data has been collected? (N.B. Such activity would not be good statistical practice)
> Hopefully not
- Statistical data
a. Does the question involve analysing the responses of a single set of data (univariate) to other predictor variables, or are there multiple response data (bi- or multivariate data)?
> Varies
b. Is the data continuous or discrete?
> Varies
c. Is the data bounded in some form (i.e. what is the possible range of the data)?
> Varies
d. Approximately how many data points are there, typically?
> Can be millions
- Statistical data analysis
a. Is it desired to work within a statistics or data mining paradigm?
> Mainly statistical
b. Is it desired that there is some sort of outlier/anomaly assessment?
> Desirable
c. Are you interested in a statistical approach which rejects null hypotheses (frequentist) or one which generates probable belief in a hypothesis (Bayesian), or do you have no real preference?
> Both
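As an illustration of the outlier/anomaly assessment marked as desirable above, the following is a minimal sketch of one common robust approach, the modified z-score based on the median absolute deviation (MAD). It is not a method named by the respondents. The function name `flag_outliers`, the cutoff of 3.5 (a widely used convention), and the sample values are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only. The modified z-score is
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half of the values equal the median, so the score is
        # undefined; report nothing rather than divide by zero.
        return []
    return [
        i
        for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


if __name__ == "__main__":
    # Illustrative measurements only, not data from the questionnaire.
    sample = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
    print(flag_outliers(sample))
```

Because the median and MAD are barely affected by a few extreme values, this kind of check stays usable when the data contain the anomalies it is meant to find. For data sets with millions of points, the same calculation would normally be done with an array library rather than pure Python lists.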
| Go-between | Cristina Adriana Alexandru |
|---|---|
| RI representative | Charles Paxton |
| Period of requirements collection | November 2015 |
| Status | |