Converting a List of Dictionaries to Pandas Dataframe

I have this code: import urllib.request, json url = "https://api.wto.org/timeseries/v1/indicator_categories?lang=1" hdr ={ # Request headers 'Cache-Control': 'no-cache', 'Ocp-Apim-Subscription-Key': '21cda66d75fc4010b8b4d889

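A minimal sketch, assuming the response body parses (via `json.loads`) into a list of dictionaries. The payload below is a hypothetical stand-in, not the WTO API's actual schema. `pd.DataFrame` maps each dict to a row and each key to a column, and `pd.json_normalize` flattens nested dicts into dotted column names.

```python
import json

import pandas as pd

# Hypothetical payload standing in for the API response body;
# the real WTO field names may differ.
raw = '[{"code": "A", "name": "Alpha"}, {"code": "B", "name": "Beta", "extra": 1}]'

records = json.loads(raw)   # list of dicts
df = pd.DataFrame(records)  # one row per dict, one column per key
# A key missing from a dict becomes NaN in that row (here: "extra" in row 0).

# Nested dicts can be flattened into "parent.child" columns.
nested = [{"id": 1, "meta": {"lang": "en", "rank": 2}}]
flat = pd.json_normalize(nested)  # columns id, meta.lang and meta.rank
```

If the real response wraps the list inside an outer object, the list would need to be pulled out of that object first.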

Sum of values with 2 decimals resulting in a number with a lot of decimals

I am working on a project where I use DataFrames, and at some point I need to sum a column. This column contains float values that have all been rounded to 2 decimals. The screenshot shows 2953.8 but in reality it is 2953.80. Wh

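The extra digits come from binary floating point: values such as 0.1 have no exact binary representation, so a sum of "2-decimal" floats can land a tiny distance away from the decimal result. A minimal sketch of two common remedies, using hypothetical values: round and format the total for display, or do exact decimal arithmetic with Python's `decimal` module. On the display side, 2953.8 and 2953.80 are the same float; only a format spec such as `:.2f` forces two decimals.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical column of values that were "rounded to 2 decimals".
prices = pd.Series([0.10, 0.20])

total = prices.sum()  # float64; may differ from 0.3 in the last few bits

# Remedy 1: round the result, then format with exactly two decimals.
shown = f"{round(total, 2):.2f}"

# Remedy 2: exact decimal arithmetic, built from strings (not floats).
exact = sum(Decimal(v) for v in ["0.10", "0.20"])

# 2953.8 and 2953.80 are the same float; only the formatting differs.
label = f"{2953.8:.2f}"
```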

How to group by and find new or disappearing items

I am trying to assess, in a sales database, whether the number of advertisements has changed. The example DataFrame I am using is as follows: df = pd.DataFrame({"offer-id": [1,1,2,2,3,4,5], "date": ["2024-02-10","2024-02-11","2024-02-10","2024-02-

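A minimal sketch, assuming "new" means an `offer-id` present on a date but absent on the previous date, and "disappearing" means the reverse. The excerpt's data is truncated, so the rows below are hypothetical. Collecting one set of ids per date and diffing consecutive sets yields both lists plus the daily count.

```python
import pandas as pd

# Hypothetical data, not the asker's (their excerpt is truncated).
df = pd.DataFrame({
    "offer-id": [1, 2, 3, 1, 2, 4, 5],
    "date": ["2024-02-10"] * 3 + ["2024-02-11"] * 4,
})
df["date"] = pd.to_datetime(df["date"])

# One set of offer ids per date, in chronological order.
ids_by_date = df.groupby("date")["offer-id"].agg(set).sort_index()

rows = []
prev = set()  # empty before the first date, so every id on day 1 counts as "new"
for date, ids in ids_by_date.items():
    rows.append({
        "date": date,
        "count": len(ids),
        "new": sorted(ids - prev),   # present now, absent on the previous date
        "gone": sorted(prev - ids),  # present on the previous date, absent now
    })
    prev = ids
changes = pd.DataFrame(rows)
```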

Back-fill data on the condition that there is a 1 located at most 2 rows before the current row

df = pd.DataFrame( { "group": [0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,1, 1, 1, 1], "value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,0, 0, 1, 1], "desire

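The `desired` column is cut off in the excerpt, so this sketch assumes the goal is a flag that is 1 when `value` was 1 in either of the two preceding rows. Shifting by one row excludes the current row, and a rolling window of 2 then spans rows i-1 and i-2. The series below is hypothetical, and the sketch ignores the `group` column.

```python
import pandas as pd

# Hypothetical input column.
df = pd.DataFrame({"value": [0, 1, 0, 0, 0, 1, 1, 0]})

# shift(1) drops the current row from consideration; rolling(2) then spans
# the two previous rows, and max() is 1 if either of them holds a 1.
df["flag"] = (
    df["value"].shift(1)
    .rolling(window=2, min_periods=1)
    .max()
    .fillna(0)  # row 0 has no history
    .astype(int)
)
```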

Pandas DataFrame groupby(...).agg(...) adding an extra row for a sum value instead of merging it into the row with the id that the groupby uses

This is part two of, or a follow-up to, [this question][1]. I have a DataFrame that looks like this: |id |name |start_date|clicks |conversions|installs|downloads| |---|------|-----------|--------|---------|---------|--------| |101|India |202

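Without the full question, this shows only one possible cause, labeled as an assumption. `groupby` merges rows only when the key values are exactly equal, so an `id` stored as the integer `101` in one row and the string `"101"` in another produces two groups, and therefore an extra row. The hypothetical data below shows the split, then normalizes the key with `pd.to_numeric` so the sums land in a single row per `id`.

```python
import pandas as pd

# Hypothetical data: the same id appears once as int and once as str.
df = pd.DataFrame({
    "id": [101, "101", 102],
    "name": ["India", "India", "Brazil"],
    "clicks": [10, 5, 7],
    "installs": [1, 2, 3],
})

# sort=False avoids comparing int and str keys while sorting groups.
split = df.groupby("id", sort=False).agg({"clicks": "sum", "installs": "sum"})

# Convert the key to one numeric dtype so equal ids fall into the same group.
df["id"] = pd.to_numeric(df["id"])
merged = df.groupby(["id", "name"], as_index=False).agg(
    {"clicks": "sum", "installs": "sum"}
)
```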

Why does Python's pandas report a difference when comparing two CSV files where a cell is empty?

(1) I am using Python's pandas to compare two CSV files. Both files have exactly the same data set, so the comparison should return something like the statement "two files are identical". However, there is one column with the header "Error" and the col

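This follows from how missing values compare. An empty CSV cell is read as `NaN`, and `NaN == NaN` evaluates to `False`, so an element-wise `==` reports those cells as different even when both files are identical. A minimal sketch with hypothetical inline CSV data: `DataFrame.equals` treats `NaN`s in the same position as equal, and the explicit mask applies the same rule cell by cell.

```python
import io

import pandas as pd

# Hypothetical CSV content; the "Error" cell in the first row is empty.
csv_text = "id,Error\n1,\n2,timeout\n"
a = pd.read_csv(io.StringIO(csv_text))
b = pd.read_csv(io.StringIO(csv_text))

naive_equal = (a == b).all().all()       # False: NaN == NaN is False
same = a.equals(b)                       # True: aligned NaNs count as equal
mask = (a == b) | (a.isna() & b.isna())  # per-cell check that treats NaN pairs as equal
```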

Switch row values into column headers and cell values

My Excel content has a header in row 0, but the 4th and 5th rows are also header rows. How can I bring them up into row 0, next to ID? The sample table and the real source image are below. I tried the code below: df = pd.DataFrame([[1,2,3],[4,5,6]]) print(p

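The excerpt is hard to pin down, so this is only one interpretation, labeled as an assumption: values that currently sit in rows should become column headers next to `ID`, with their matching entries as cell values. That is a long-to-wide reshape, which `DataFrame.pivot` performs. The column names `field` and `value` below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: each row holds one (ID, field, value) triple.
long_df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "field": ["Name", "City", "Name", "City"],
    "value": ["Ann", "Oslo", "Bob", "Lima"],
})

# pivot() turns the distinct values of "field" into column headers
# and places the matching "value" entries in the cells.
wide = long_df.pivot(index="ID", columns="field", values="value").reset_index()
wide.columns.name = None  # drop the leftover "field" label on the column axis
```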

Fastest way to assign row to dataframe in pandas groupby loop

Ok so I have 2 dataframes: df = pd.DataFrame({'A':['German Shepherd','Border Collie','Golden Retriever','Beagle','Daschund']}) df = df.T df.columns = df.iloc[0] df = df.drop(df.index[0]) A German Shepherd Border Collie Golden Re

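Writing rows into a DataFrame one at a time inside a loop tends to be slow, because enlarging or realigning the frame can copy data on each assignment. A common alternative, sketched here with hypothetical column names, is to collect plain dicts during the `groupby` loop and build the DataFrame once at the end. When the per-group work is a standard reduction, a single `agg` call avoids the Python loop entirely.

```python
import pandas as pd

# Hypothetical input: one row per (breed, score) observation.
obs = pd.DataFrame({
    "breed": ["Beagle", "Border Collie", "Beagle", "Border Collie"],
    "score": [3, 8, 5, 6],
})

# Loop version: accumulate dicts, then construct the DataFrame once.
rows = []
for breed, group in obs.groupby("breed"):
    rows.append({"breed": breed, "mean_score": group["score"].mean(), "n": len(group)})
result = pd.DataFrame(rows)

# Vectorized version of the same summary, without a Python loop.
summary = obs.groupby("breed")["score"].agg(["mean", "size"])
```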

The best algorithm to get the indices within a specific range in a pandas DataFrame

I need to detect whether the driver was overspeeding while driving. GPS information is uploaded every second from the GPS device installed in the driver's vehicle, like below. [(37.165224, 127.2354123), ... ,(37.123456, 127

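A minimal sketch, assuming fixes arrive exactly once per second as `(lat, lon)` pairs. The haversine formula (spherical Earth, mean radius 6,371 km) gives the distance between consecutive fixes, and at 1 s spacing that distance is also the speed in m/s. `np.flatnonzero` then returns the indices where the speed exceeds a limit. The track and the 100 km/h limit are hypothetical.

```python
import numpy as np


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between arrays of points (spherical Earth)."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))


# Hypothetical 1 Hz track of (lat, lon) fixes.
track = np.array([
    (37.0000, 127.0000),
    (37.0002, 127.0000),  # about 22 m after 1 s (about 80 km/h)
    (37.0006, 127.0000),  # about 44 m after 1 s (about 160 km/h)
    (37.0007, 127.0000),  # about 11 m after 1 s (about 40 km/h)
])
lat, lon = track[:, 0], track[:, 1]

step_m = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])
dt_s = 1.0                        # assumed sampling interval
speed_kmh = step_m / dt_s * 3.6   # m/s -> km/h
limit_kmh = 100.0                 # hypothetical speed limit

# Index of the fix that ends each over-limit segment.
over_idx = np.flatnonzero(speed_kmh > limit_kmh) + 1
```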

PySpark UDF storing incorrect data despite the function producing the correct result

So I have this weird issue. I'm using a huge dataset in which each date and time is stored as a single string. This data can easily be converted using datetime.strptime(), but the data is so huge that I need to use pysp


Filling in a numpy ndarray with a 1d array and a mask/index of true/false values

Let's say I have the following: import numpy as np X = np.array([[1, 2, 3], [4, 5, 6]]) mask = np.array([[False, False, True], [False, True, False]]) values = np.array([20, 30]) I want to fill in the values of 20 and 30 in where the ma

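Boolean-mask assignment does this directly. `X[mask] = values` writes the values into the `True` positions in row-major (C) order, so `values` must hold exactly `mask.sum()` elements. A minimal sketch with the arrays from the question; copying first leaves the original `X` untouched.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask = np.array([[False, False, True], [False, True, False]])
values = np.array([20, 30])

# One value is needed per True cell in the mask.
assert mask.sum() == values.size

out = X.copy()      # keep the original array unchanged
out[mask] = values  # fills (0, 2) with 20 and (1, 1) with 30, in row-major order
```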