Python
Strohmann, 2017-12-08 13:14:10

How to properly parse a CSV file in pandas that contains many datasets in the form Timestamp0; Value0; Timestamp1; Value1; ...?

Hello.
I just started learning Python and immediately decided to tackle a pressing practical problem: creating interactive reports.
I sketched a module that works with one specific CSV file. It produces the result I need, but I want to do it properly.
The next step is to generalize it so that it can import a file with an unknown number of datasets, and this raised the question of how best to process the source data.
The data in CSV has the format:
Timestamp0; Value0; Timestamp1; Value1; ... TimestampN; ValueN;
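To make the layout concrete, here is a minimal sketch with a small hypothetical sample. The `BK1`/`BK2` header names and all values are invented; the `<name> Time` / `<name> ValueY` header pattern, decimal commas and day-first dates are assumed from the code below. Reading the file once gives a single row index shared by every column, while each `Time` column is just ordinary data:

```python
import io

import pandas as pd

# Hypothetical sample in the "Timestamp; Value; ..." layout (values invented).
SAMPLE = """BK1 Time;BK1 ValueY;BK2 Time;BK2 ValueY
08.12.2017 13:00:00;20,5;08.12.2017 13:00:05;21,0
08.12.2017 13:01:00;20,7;08.12.2017 13:01:05;21,4
08.12.2017 13:02:00;20,6;;
"""

# decimal=',' turns "20,5" into 20.5; the dates are left as text here.
df_all = pd.read_csv(io.StringIO(SAMPLE), sep=';', decimal=',')

print(df_all.columns.tolist())  # four columns: two (Time, ValueY) pairs
print(df_all.index)             # one RangeIndex shared by every column
```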
The main problem is understanding how to organize the indexing. When reading with pandas, you can choose a column (or several columns) as the index, but in my case each value column needs its own timestamp index. For the subsequent plotting with Plotly, I simply read the CSV several times, each time with the required column in index_col. I understand that this is very wrong, but I cannot figure out how to do it properly.
Could you tell me how to work with such data?
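One way to give every dataset its own index without reading the file repeatedly is to read it once and stack the (timestamp, value) pairs into a single long Series with a two-level (dataset, time) index. This is a sketch of a different technique from the answer below (`pd.concat` over a dict, whose keys become the outer index level); the sample rows and the `BK1`/`BK2` names are hypothetical, and day-first dates are assumed:

```python
import io

import pandas as pd

# Hypothetical sample (names and values invented); BK2 is one row shorter.
SAMPLE = """BK1 Time;BK1 ValueY;BK2 Time;BK2 ValueY
08.12.2017 13:00:00;20,5;08.12.2017 13:00:05;21,0
08.12.2017 13:01:00;20,7;08.12.2017 13:01:05;21,4
08.12.2017 13:02:00;20,6;;
"""

df_all = pd.read_csv(io.StringIO(SAMPLE), sep=';', decimal=',')

pieces = {}
# Columns come in (time, value) pairs: 0-1, 2-3, ...
for time_col, value_col in zip(df_all.columns[0::2], df_all.columns[1::2]):
    name = time_col.rsplit(' ', 1)[0]                   # 'BK1 Time' -> 'BK1'
    times = pd.to_datetime(df_all[time_col], dayfirst=True)
    s = pd.Series(df_all[value_col].to_numpy(), index=times)
    pieces[name] = s[s.index.notna()].dropna()          # drop padding rows

# One long Series indexed by (dataset, time); each dataset keeps its own times.
stacked = pd.concat(pieces).rename_axis(['dataset', 'time'])
print(stacked.loc['BK2'])
```

`stacked.loc['BK2']` gives back a plain Series indexed by BK2's own timestamps, which is the per-dataset index the question asks for.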

Current implementation
import plotly
import plotly.graph_objs as go
#from plotly.graph_objs import Scatter, Layout

import pandas

plotly.offline.init_notebook_mode(connected=True)  # initialize Plotly offline mode
# read the CSV file

source = "d:/TEMP/GAS.csv"
# Column-name prefixes exactly as they appear in the CSV header
# ('Термопара' = 'Thermocouple')
trend1 = 'Термопара BK1, °C'
trend2 = 'Термопара BK2, °C'
trend3 = 'Термопара BK3, °C'
trend1_name = 'A-A.BK1'
trend2_name = 'A-A.BK2'
trend3_name = 'A-A.BK3'

df = pandas.read_csv(source,
                     sep=';',
                     parse_dates=[trend1 + ' Time'],
                     dayfirst=True,
                     index_col=[trend1 + ' Time'],
                     decimal=','
                     )


dfBK2 = pandas.read_csv(source,
                        sep=';',
                        parse_dates=[trend2 + ' Time'],
                        dayfirst=True,
                        index_col=[trend2 + ' Time'],
                        decimal=','
                        )

dfBK3 = pandas.read_csv(source,
                        sep=';',
                        parse_dates=[trend3 + ' Time'],
                        dayfirst=True,
                        index_col=[trend3 + ' Time'],
                        decimal=','
                        )

# define the traces
trace1 = go.Scatter(x=df.index,
                    y=df[trend1 + ' ValueY'],
                    name=trend1_name
                    )

trace2 = go.Scatter(x=dfBK2.index,
                    y=dfBK2[trend2 + ' ValueY'],
                    name=trend2_name,
                    yaxis='y2'
                    )
trace3 = go.Scatter(x=dfBK3.index,
                    y=dfBK3[trend3 + ' ValueY'],
                    name=trend3_name,
                    yaxis='y3'
                    )
data = [trace1, trace2, trace3]

# define the plotting area
Width = 1
High = 1
domainWidth = Width - 0.1
layout = dict(legend=dict(x=0,
                          y=1
                          ),
              hovermode='x',
              xaxis=dict(domain=[0, domainWidth]  # size of the plot area
                         ),
              yaxis=dict(showgrid=True,
                         side='right',
                         title=trend1_name
                         ),

              yaxis2=dict(overlaying='y',
                          anchor='free',
                          side='right',
                          title=trend2_name,
                          position=domainWidth + 0.05
                          ),
              yaxis3=dict(overlaying='y',
                          anchor='free',
                          side='right',
                          title=trend3_name,
                          position=domainWidth + 0.1
                          ),
              )
fig = dict(data=data, layout=layout)
plotly.offline.plot(fig)
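The three hand-written traces and y-axes above follow a fixed pattern, so they can be generated in a loop for any number of datasets. A minimal sketch, assuming the data has already been split into a dict that maps a display name to a pandas Series indexed by timestamps; `build_figure` and `step` are hypothetical names. Traces and layout are plain dicts (the same kind of figure dict that `plotly.offline.plot(fig)` receives above), so the function itself does not import Plotly:

```python
import pandas as pd


def build_figure(series_by_name, step=0.05):
    """Return a Plotly figure dict with one trace and one y-axis per dataset.

    Follows the hand-written layout above: the first y-axis sits at the right
    edge of the plot area, and each extra axis is shifted right by `step`.
    """
    n = len(series_by_name)
    domain_width = 1 - step * (n - 1)      # 0.9 for three datasets, as above
    if domain_width <= 0:
        raise ValueError('too many datasets for this axis spacing')

    data = []
    layout = {'legend': {'x': 0, 'y': 1},
              'hovermode': 'x',
              'xaxis': {'domain': [0, domain_width]}}
    for i, (name, s) in enumerate(series_by_name.items()):
        suffix = '' if i == 0 else str(i + 1)          # '', '2', '3', ...
        data.append({'type': 'scatter', 'x': s.index, 'y': s.values,
                     'name': name, 'yaxis': 'y' + suffix})
        axis = {'side': 'right', 'title': name}
        if i == 0:
            axis['showgrid'] = True
        else:
            axis.update(overlaying='y', anchor='free',
                        position=domain_width + step * i)
        layout['yaxis' + suffix] = axis
    return {'data': data, 'layout': layout}


# Hypothetical data; in practice pass the per-dataset Series read from the CSV.
idx = pd.date_range('2017-12-08 13:00', periods=3, freq='min')
series = {'A-A.BK1': pd.Series([20.5, 20.7, 20.6], index=idx),
          'A-A.BK2': pd.Series([21.0, 21.4, 21.2], index=idx),
          'A-A.BK3': pd.Series([19.8, 19.9, 20.1], index=idx)}
fig = build_figure(series)
# plotly.offline.plot(fig)  # requires plotly
```

Shrinking the x domain by `step` per extra axis keeps the last axis position at or below 1, which for three datasets reproduces the 0.9 / 0.95 / 1.0 positions used above.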
Link to a sample file, if anyone is interested.


1 answer(s)
Sergey, 2017-12-09
@Strohmann

You can simply split the table into several independent DataFrames; there is no need to read the CSV multiple times.
For example, like this:

import pandas as pd


def get_df(filename, parse_dates=None):
    """Build a list of :class:`DataFrame` objects from a CSV file.

    Expected CSV file format::

        Timestamp0; Value0; Timestamp1; Value1; ... TimestampN; ValueN;

    Args:
        filename: CSV filename.
        parse_dates: List of columns with dates.

    Returns:
        List of DataFrames with 'time' as the index and 'value' as the
        value column.

    Notes:
        ``DataFrame.attrs['name']`` contains the name extracted
        from the 'TimestampX' column header.
    """

    df_all = pd.read_csv(filename, sep=';', decimal=',', dayfirst=True,
                         parse_dates=parse_dates, header=0)

    # Columns must come in (timestamp, value) pairs.
    assert len(df_all.columns) % 2 == 0

    lst = []
    columns = ['time', 'value']

    # Walk the columns as 2-item chunks: (Timestamp0, Value0), (Timestamp1, Value1), ...
    for time_col, value_col in zip(df_all.columns[0::2], df_all.columns[1::2]):
        df = df_all[[time_col, value_col]].copy()  # split off a 2-column DataFrame.
        df.columns = columns                        # change column names.
        df = df.set_index('time')                   # set index to timestamps.
        df.attrs['name'] = time_col.split(',')[0]   # attach name (after set_index,
                                                    # which returns a new object).
        lst.append(df)

    return lst


df_list = get_df('GAS.csv', parse_dates=[0, 2, 4])
df = df_list[0]
print(df.index)
print(df.attrs['name'])

...
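Building on the answer, here is a sketch that needs no hard-coded `parse_dates` list, so it works for an unknown number of datasets: it reads the file once, converts each timestamp column with `pd.to_datetime(..., dayfirst=True)` after reading, and returns a dict of Series keyed by dataset name. `split_datasets` is a hypothetical name; the `;` separator, decimal commas, day-first dates, a possible trailing `;` on every line, and the "name before the comma" header convention are assumptions taken from the question, and the sample rows are invented:

```python
import io

import pandas as pd


def split_datasets(source, name_sep=','):
    """Read a 'Timestamp; Value; ...' CSV once and return {name: Series}.

    Assumed (from the question): ';' separator, decimal commas, day-first
    dates, and a dataset name before `name_sep` in each timestamp header.
    """
    df_all = pd.read_csv(source, sep=';', decimal=',')

    # A trailing ';' on every line yields an extra empty 'Unnamed: N' column.
    empty_extra = (df_all.columns.str.startswith('Unnamed')
                   & df_all.isna().all().to_numpy())
    df_all = df_all.loc[:, ~empty_extra]
    if len(df_all.columns) % 2:
        raise ValueError('expected (timestamp, value) column pairs')

    result = {}
    for time_col, value_col in zip(df_all.columns[0::2], df_all.columns[1::2]):
        times = pd.to_datetime(df_all[time_col], dayfirst=True)
        s = pd.Series(df_all[value_col].to_numpy(), index=times, name=value_col)
        # Shorter datasets are padded with empty cells; drop those rows.
        result[time_col.split(name_sep)[0]] = s[s.index.notna()].dropna()
    return result


# Hypothetical sample with trailing ';' and headers shaped like the question's.
SAMPLE = """BK1, °C Time;BK1, °C ValueY;BK2, °C Time;BK2, °C ValueY;
08.12.2017 13:00:00;20,5;08.12.2017 13:00:05;21,0;
08.12.2017 13:01:00;20,7;08.12.2017 13:01:05;21,4;
08.12.2017 13:02:00;20,6;;;
"""

datasets = split_datasets(io.StringIO(SAMPLE))
for name, s in datasets.items():
    print(name, len(s))
```

Each Series keeps its own timestamps as its index, so it can be passed to a Plotly trace as `x=s.index, y=s.values` without re-reading the file.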
