Apache Arrow with Pandas (Local File System)

Converting a Pandas DataFrame to an Apache Arrow Table:

```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# The example data and partition column list were lost in extraction;
# the dict and the 'one' column below are illustrative stand-ins.
df = pd.DataFrame({'one': [1, 2, 3]}, index=list('abc'))
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='dataset_name', partition_cols=['one'])
```

Compatibility note: if you are using pq.write_to_dataset to create a table that will then be used by HIVE, partition column values must be compatible with the allowed character set of the HIVE version you are running.

There are two ways to read a parquet file from HDFS.

Using pandas with the PyArrow engine:

```python
import pandas as pd

pdIris = pd.read_parquet('hdfs:///iris/part-00000-27c8e2d3-fcc9-47ff-8fd1-',
                         engine='pyarrow')
pdIris.head()
```

Using pyarrow.parquet:

```python
import pyarrow.parquet as pq

path = 'hdfs:///iris/part-00000-71c8h2d3-fcc9-47ff-8fd1-'
table = pq.read_table(path)
table.schema

df = table.to_pandas()
df.head()
```

Since we can store any kind of file in HDFS (SAS, STATA, Excel, JSON or objects), most of them are easily interpreted by Python. To accomplish that we'll use the open function, which returns a buffer object that many pandas functions, such as read_sas and read_json, accept as input instead of a string URL.

SAS

```python
import pandas as pd
import pyarrow as pa

fs = pa.hdfs.connect()
with fs.open('/datalake/airplane.sas7bdat', 'rb') as f:
    sas_df = pd.read_sas(f, format='sas7bdat')
sas_df.head()
```

Excel

```python
import pandas as pd
import pyarrow as pa

fs = pa.hdfs.connect()
# Download the workbook from HDFS to a local file, then read it with pandas.
with open('airplane.xlsx', 'wb') as local:
    fs.download('/datalake/airplane.xlsx', local)
ex_df = pd.read_excel('airplane.xlsx')
```