IDE
CTPAXoff, 2015-04-24 14:20:12

Which environment to choose and how to conduct research?

The actual question is as follows.
There is a large data set of about 500,000 observations in the following form:
TICKER.PER.DATE.TIME.OPEN.HIGH.LOW.CLOSE.VOL
GAZP.D.20060123.000000.239.0000000.239.0000000.218.4900000.218.8900000.5078252
GAZP.D.20060124.000000.220.5000000.224.6800000.219.6600000.224.0000000.8971078
GAZP.D.20060125.000000.225.2000000.231.0000000.225.0000000.228.3800000.15467697
GAZP.D.20060126.000000.228.9000000.229.4100000.223.5100000.224.4700000.7585458
GAZP.D.20060127.000000.226.2000000.231.5000000.224.0000000.228.7500000.12719299
Put simply, each of these lines is one observation with 9 variables. First, we need to extract from this data the sequences of observations for which a rule of the following type holds:
open[i] > open[i-1] & close[i] > close[i-1]
Here [i-1] is the previous observation and [i] is the current one. We are then interested in how the price changes a given number of observations later: 3, 5, 7, 10 periods, and so on.
Let's say:
[i-N]
...
[i-3] .75914790
[i-2] GAZP.D.20131205.000000.137.0200000.137.8000000.134.8100000.135.2500000.60114330
[i-1] GAZP.D.20131206.000000.135.7100000.137.9400000.135.0000000.137.4800000.57176190
[i] GAZP.D.20131209.000000.137.7500000.138.8500000.136.6100000.138.4300000.52689680
[i+1] GAZP.D.20131210.000000.138.6000000.138.7500000.136.9000000.137.6100000.43531940
[i+2] GAZP.D.20131211.000000.137.1000000.137.8700000.135.8100000.137.0400000.42658230
[i+3] GAZP.D.20131212.000000.135.7500000.136.1800000.133.5500000.134.1600000.65908040
[i+4] GAZP.D.20131213.000000.134.1800000.134.8700000.132.6900000.133.3800000.62409420
[i+5] GAZP.D.20131216.000000.132.7000000.136.6000000.132.3200000.136.5000000.74785600
[i+6] GAZP.D.20131217.000000.136.5000000.139.0800000.136.2300000.138.5500000.75623300
[i+7] GAZP.D.20131218.000000.139.0000000.139.8800000.137.3600000.139.8700000.64051730
[i+8] GAZP.D.20131219.000000.141.0100000.142.4800000.140.1400000.140.4500000.68051170
...
[i+N]
We see that on December 9 this condition is met: the open price is greater than the open price of December 6, and the close price is greater than the close price of December 6. Next, this sequence needs to be saved to a separate file (for example, by combining the sequence of observations that satisfies the rule into one new observation). The lines of interest are from [i-2] to [i+10].
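For illustration, the rule check and the window extraction could be sketched as follows in Python, using only the standard library. The tuple layout and the names rule_matches and window are assumptions made for this example; the prices are the December 2013 rows shown above.

```python
# Each observation is a (date, open, high, low, close) tuple.
# This layout is an assumption made for the example.
ROWS = [
    ("20131205", 137.02, 137.80, 134.81, 135.25),
    ("20131206", 135.71, 137.94, 135.00, 137.48),
    ("20131209", 137.75, 138.85, 136.61, 138.43),
    ("20131210", 138.60, 138.75, 136.90, 137.61),
    ("20131211", 137.10, 137.87, 135.81, 137.04),
    ("20131212", 135.75, 136.18, 133.55, 134.16),
]

OPEN, CLOSE = 1, 4  # tuple positions of the open and close prices


def rule_matches(rows):
    """Return every index i with open[i] > open[i-1] and close[i] > close[i-1]."""
    return [
        i
        for i in range(1, len(rows))
        if rows[i][OPEN] > rows[i - 1][OPEN]
        and rows[i][CLOSE] > rows[i - 1][CLOSE]
    ]


def window(rows, i, before=2, after=10):
    """Return rows [i-before .. i+after], or None if the window leaves the data."""
    if i - before < 0 or i + after >= len(rows):
        return None
    return rows[i - before : i + after + 1]


matches = rule_matches(ROWS)
print([ROWS[i][0] for i in matches])  # dates on which the rule holds
```

With only six rows the default window of [i-2] to [i+10] does not fit, so window() returns None for it. On the full data set each non-None window would become one new combined observation.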
What needs to be done next, once a certain amount of data has been collected: compare all the newly obtained observations and calculate, both in percentages and in absolute numbers:
- how much the close price changed on average after 3/5/7/10 days;
- what the average maximum value of the high parameter was during these 3/5/7/10 days.
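As a rough sketch of these calculations (again Python, standard library only): for every matching index i and a horizon of k periods, take the close change from [i] to [i+k] and the highest high over [i+1] to [i+k], then average over all matches. Whether the window should include day [i] itself is an assumption here (it does not), and horizon_stats is a hypothetical name. Short horizons are used because the sample has only 11 rows.

```python
from statistics import mean

# Close and high prices for 20131205 .. 20131219, taken from the sample rows.
CLOSES = [135.25, 137.48, 138.43, 137.61, 137.04, 134.16,
          133.38, 136.50, 138.55, 139.87, 140.45]
HIGHS = [137.80, 137.94, 138.85, 138.75, 137.87, 136.18,
         134.87, 136.60, 139.08, 139.88, 142.48]

# Indices where open[i] > open[i-1] and close[i] > close[i-1] in this sample.
MATCHES = [2, 8, 9, 10]


def horizon_stats(closes, highs, matches, k):
    """Average close change (absolute and percent) and average maximum high
    over the k periods after each match. Matches without k future rows are skipped."""
    abs_changes, pct_changes, max_highs = [], [], []
    for i in matches:
        if i + k >= len(closes):
            continue  # not enough future observations for this match
        change = closes[i + k] - closes[i]
        abs_changes.append(change)
        pct_changes.append(100.0 * change / closes[i])
        max_highs.append(max(highs[i + 1 : i + k + 1]))
    if not abs_changes:
        return None
    return {
        "n": len(abs_changes),
        "avg_abs_change": mean(abs_changes),
        "avg_pct_change": mean(pct_changes),
        "avg_max_high": mean(max_highs),
    }


for k in (1, 2, 3):
    print(k, horizon_stats(CLOSES, HIGHS, MATCHES, k))
```

On the real data the loop would run over k in (3, 5, 7, 10), and the "n" field shows how many matches actually contributed to each average.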
And now the main question: in what environment can this be done? Can SPSS do all of the above, or is other software needed?

1 answer(s)
zaplokee, 2015-04-25

I can't speak for SPSS, but in RStudio everything is possible with the preinstalled packages plus dplyr (dplyr is optional, but in my opinion it makes working with data frames more convenient).
Load the original data into a data frame.
Select the rows that satisfy your condition into a new data frame. In dplyr, the filter() function does the row selection.
To reshape the rows and columns of a data frame, you can use mutate() from dplyr and melt() from the reshape2 package.
The summarise() function from dplyr will help you calculate the average, maximum and minimum values. Graphs can be built either with the base plot() function or with the lattice or ggplot2 packages.
The resulting data can be written to a file (txt, xls/xlsx, csv, html, pdf).
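The load and save steps are not specific to R. As a rough sketch outside R, here is how they might look in Python with the standard csv module. It assumes the real file separates fields with commas (the dots between fields in the sample may be a display artifact). The names load and save are hypothetical, and an in-memory string stands in for the file.

```python
import csv
import io

# Two sample lines, written with commas between fields
# (an assumption about the real file).
RAW = """TICKER,PER,DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOL
GAZP,D,20060123,000000,239.0000000,239.0000000,218.4900000,218.8900000,5078252
GAZP,D,20060124,000000,220.5000000,224.6800000,219.6600000,224.0000000,8971078
"""

PRICE_FIELDS = ("OPEN", "HIGH", "LOW", "CLOSE")


def load(stream):
    """Read observations into a list of dicts, converting prices and volume to numbers."""
    rows = []
    for rec in csv.DictReader(stream):
        for name in PRICE_FIELDS:
            rec[name] = float(rec[name])
        rec["VOL"] = int(rec["VOL"])
        rows.append(rec)
    return rows


def save(rows, stream):
    """Write a non-empty list of observations back out as CSV with a header line."""
    writer = csv.DictWriter(
        stream, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = load(io.StringIO(RAW))
out = io.StringIO()
save(rows, out)
print(out.getvalue())
```

For a file on disk, the StringIO objects would be replaced by open(path, newline="") for reading and open(path, "w", newline="") for writing, where path is whatever the export is called.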
