Storing Data

Putting some turbo boost into our gradient descent code


  • Bijon Setyawan Raya

  • July 18, 2022

    7 mins


    Series: Optimized GD 🔥 (2 Parts)


    gd_with_df() is the function I originally used to run my experiments. However, it becomes prohibitively slow as the number of iterations grows.

    In this post, I will present three ways of storing the per-epoch results in a dataframe.
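    All three versions below rely on a predict() helper defined earlier in this series. For the snippets here to run on their own, a minimal sketch, assuming a simple linear model and synthetic data (the toy x and y below are my own illustration, not necessarily the exact setup used for the timings):

```python
import numpy as np

def predict(intercept, coefficient, x):
    # simple linear model: y_hat = b0 + b1 * x, vectorized over x
    return intercept + coefficient * x

# toy data: a noisy linear relationship
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.shape)
```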

    Dataframe

    def gd_with_df(x, y, epochs, df, alpha=0.01):
        intercept, coefficient = 2.0, -7.5
        predictions = predict(intercept, coefficient, x)
        sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))
        df.loc[0] = [intercept, coefficient, sum_error]
        for epoch in range(1, epochs + 1):
            predictions = predict(intercept, coefficient, x)
            b0_error = (1 / len(x)) * np.sum(predictions - y)
            b1_error = (1 / len(x)) * np.sum((predictions - y) * x)
            intercept = intercept - alpha * b0_error
            coefficient = coefficient - alpha * b1_error
            sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))
            # one row-by-row .loc write per epoch
            df.loc[epoch] = [intercept, coefficient, sum_error]
        return df
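
    Each df.loc[epoch] = [...] assignment goes through pandas' label-based indexing machinery and grows the frame one row at a time, so the overhead is paid on every single epoch. It can be isolated with a toy loop (timings will vary by machine):

```python
import time
import pandas as pd

n = 500
df = pd.DataFrame(columns=['intercept', 'coefficient', 'sum_error'])

start = time.perf_counter()
for i in range(n):
    df.loc[i] = [2.0, -7.5, 0.0]  # one indexed write per row
elapsed = time.perf_counter() - start

print(f"{n} row-by-row .loc writes took {elapsed:.3f}s")
```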
    

    List

    def gd_with_list(x, y, epochs, df, alpha=0.01):
        # accumulate each column in a plain Python list
        intercepts, coefficients, sum_errors = [], [], []

        intercept, coefficient = 2.0, -7.5
        predictions = predict(intercept, coefficient, x)
        sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))

        intercepts.append(intercept)
        coefficients.append(coefficient)
        sum_errors.append(sum_error)

        for epoch in range(1, epochs + 1):
            predictions = predict(intercept, coefficient, x)
            b0_error = (1 / len(x)) * np.sum(predictions - y)
            b1_error = (1 / len(x)) * np.sum((predictions - y) * x)
            intercept = intercept - alpha * b0_error
            coefficient = coefficient - alpha * b1_error
            sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))

            intercepts.append(intercept)
            coefficients.append(coefficient)
            sum_errors.append(sum_error)

        # assign each column once, instead of one row per epoch
        df['intercept'] = intercepts
        df['coefficient'] = coefficients
        df['sum_error'] = sum_errors

        return df
    

    Dictionary

    def gd_with_dict(x, y, epochs, alpha=0.01):
        intercept, coefficient = 2.0, -7.5
        predictions = predict(intercept, coefficient, x)
        sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))

        # accumulate every column in a dict of lists
        result = {
            'intercept': [intercept],
            'coefficient': [coefficient],
            'sum_error': [sum_error]
        }

        for epoch in range(1, epochs + 1):
            predictions = predict(intercept, coefficient, x)
            b0_error = (1 / len(x)) * np.sum(predictions - y)
            b1_error = (1 / len(x)) * np.sum((predictions - y) * x)
            intercept = intercept - alpha * b0_error
            coefficient = coefficient - alpha * b1_error
            sum_error = np.sum((predictions - y) ** 2) / (2 * len(x))

            result['intercept'].append(intercept)
            result['coefficient'].append(coefficient)
            result['sum_error'].append(sum_error)

        # convert the dict to a dataframe in one shot
        return pd.DataFrame(result)
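
    To sanity-check the comparison below without running the full gradient descent, the three storage patterns can be benchmarked in isolation (my numbers will differ from yours, but the ordering should hold):

```python
import time
import pandas as pd

n = 2000

# pattern 1: row-by-row .loc writes (gd_with_df)
start = time.perf_counter()
df_loc = pd.DataFrame(columns=['intercept', 'coefficient', 'sum_error'])
for i in range(n):
    df_loc.loc[i] = [2.0, -7.5, 0.0]
t_df = time.perf_counter() - start

# pattern 2: append to lists, assign columns once (gd_with_list)
start = time.perf_counter()
intercepts, coefficients, sum_errors = [], [], []
for i in range(n):
    intercepts.append(2.0)
    coefficients.append(-7.5)
    sum_errors.append(0.0)
df_list = pd.DataFrame()
df_list['intercept'] = intercepts
df_list['coefficient'] = coefficients
df_list['sum_error'] = sum_errors
t_list = time.perf_counter() - start

# pattern 3: append to a dict of lists, convert once (gd_with_dict)
start = time.perf_counter()
result = {'intercept': [], 'coefficient': [], 'sum_error': []}
for i in range(n):
    result['intercept'].append(2.0)
    result['coefficient'].append(-7.5)
    result['sum_error'].append(0.0)
df_dict = pd.DataFrame(result)
t_dict = time.perf_counter() - start

print(f"df.loc: {t_df:.4f}s  list: {t_list:.4f}s  dict: {t_dict:.4f}s")
```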
    

    | Iterations | Dataframe (mins) | List (mins) | Dictionary (mins) |
    | ---------- | ---------------- | ----------- | ----------------- |
    | 1,000 | 0.0303 | 0.0014 | 0.0013 |
    | 10,000 | 0.3123 | 0.0114 | 0.0112 |
    | 100,000 | 5.7611 | 0.1045 | 0.1064 |
    | 1,000,000 | I can take a nap | 1.1075 | 1.1141 |
    | 10,000,000 | $&!^@&#@( | 10.987 | 10.521 |

    Putting the numbers into a single picture, we can see that working with lists and dictionaries is one to two orders of magnitude faster than writing into a dataframe row by row.

    Figure: Time required per iteration
