import numpy as np
from scipy.stats import norm
# Example data set and the confidence level we want (80%)
data = np.array([1.2, 3.4, 2.3, 4.5, 3.2])
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)
confidence_level = 0.8
# Look up Z-value
# norm.ppf() is the percent point function (inverse CDF) of the standard normal distribution
# norm.ppf() returns the critical value for a given probability or percentile
z_value = norm.ppf((1 + confidence_level) / 2)
# Calculate standard error
standard_error = sample_std / np.sqrt(len(data))
# Calculate confidence interval
lower_bound = sample_mean - z_value * standard_error
upper_bound = sample_mean + z_value * standard_error
# Print results
print("Z_value is ", z_value)
print("Sample Mean is ", sample_mean)
print("************************************")
print("{0}% confidence interval is ({1}, {2})".format(int(confidence_level * 100), lower_bound, upper_bound))
# Generate a random sample of 100 values from a standard normal distribution
sample = np.random.normal(size=100)
# Calculate mean and standard deviation of sample
mean = np.mean(sample)
std_dev = np.std(sample, ddof=1)
# Calculate Z-score of a specific value
value = 1.5
z_score = (value - mean) / std_dev
# Probability that a standard normal variable is less than z_score
prob = norm.cdf(z_score)
print("Z_score of a value of 1.5 is ",z_score)
print("Probability of z_score is ",prob)
import numpy as np
import pandas as pd
from scipy.stats import norm
ms = pd.read_csv('Data_CSV/microsoft.csv') # imports the Microsoft stock data from a CSV file located in the "Data_CSV" folder of the current workspace.
ms.set_index('Date',inplace = True) # sets the DataFrame's index to the 'Date' column.
# add the daily log return, which the interval below is built on
ms['logReturn'] = np.log(ms['Close'].shift(-1)) - np.log(ms['Close'])
ms.head()
# values for calculating the 80% confidence interval
z_left = norm.ppf(0.1) # left quantile
z_right = norm.ppf(0.9) # right quantile
sample_size = ms['logReturn'].shape[0]
sample_mean = ms['logReturn'].mean()
# note: despite its name, sample_std holds the standard error of the mean (sample SD / sqrt(n))
sample_std = ms['logReturn'].std(ddof=1)/sample_size**0.5
# Confidence interval for daily return
# An 80% confidence interval means we are 80% confident that the average daily return lies between "interval_left" and "interval_right".
interval_left = sample_mean+z_left*sample_std # lower bound
interval_right = sample_mean+z_right*sample_std # upper bound
print("Z_left*sample_std is ", z_left*sample_std)
print("Z_right*sample_std is ", z_right*sample_std)
print("Sample Mean is ", sample_mean)
print("************************************")
print("80% confidence interval is ",(interval_left, interval_right) )
Today, we are going to delve into the topic of how to estimate the average return utilizing confidence intervals. A confidence interval is a range of values within which we are reasonably sure that the true value of a population parameter lies. It is calculated from the sample data, and the width of the interval depends on the level of confidence and the sample size.
For example, suppose we want to estimate the average height of students in a school. We first take a random sample of 50 students and find that their average height is 160cm, with a standard deviation of 5cm. We can then use this information to calculate a confidence interval for the true average height of all students in the school.
Let's say we want a 95% confidence interval. We can use this formula to calculate the interval:
CI = x̄ ± z * (s / sqrt(n))
where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and z is the z-score corresponding to the desired confidence level.
In this case, the z-score for a 95% confidence level is 1.96 (you can find this value in a Z-table or compute it with norm.ppf(0.975) in Python). So the 95% confidence interval is:
CI = 160 ± 1.96 * (5 / sqrt(50)) = (158.61, 161.39)
This means that we are 95% confident that the true average height of all students in the school is between 158.61cm and 161.39cm.
Here's the Python code to calculate the confidence interval:
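import numpy as np
from scipy.stats import norm
z = norm.ppf(0.975)  # z-score for a 95% confidence level, about 1.96
print(160 - z * 5 / np.sqrt(50), 160 + z * 5 / np.sqrt(50))  # about (158.61, 161.39)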
In finance, analysts use confidence intervals to estimate the possible range of outcomes for a given data set. For example, a financial analyst might use a confidence interval to estimate the possible range of returns for a particular investment. The confidence interval provides a measure of the level of uncertainty associated with the estimate, which can be used to make informed decisions about the investment. A wider confidence interval indicates more uncertainty, while a narrower confidence interval indicates more confidence in the estimate.
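To make the statements about width concrete, the sketch below computes the width 2 * z * s / sqrt(n) of a Z-based interval. It reuses the height example's standard deviation of 5 cm, and the helper name `ci_width` is hypothetical. Quadrupling the sample size halves the width, and raising the confidence level widens the interval.

```python
import numpy as np
from scipy.stats import norm


def ci_width(sample_std, sample_size, confidence_level):
    """Width of a Z-based confidence interval for a mean: 2 * z * s / sqrt(n)."""
    z = norm.ppf((1 + confidence_level) / 2)  # two-sided critical value
    return 2 * z * sample_std / np.sqrt(sample_size)


for level in (0.80, 0.95):
    for n in (50, 200):
        print(f"{level:.0%} level, n = {n}: width = {ci_width(5, n, level):.2f} cm")
```

The width depends only on the standard deviation, the sample size, and the confidence level, not on the sample mean.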
Here is a sample of daily log returns of Microsoft stock, from which we can compute the sample mean, an estimate of the true population mean. Our goal in this video is to go beyond this single point estimate. If a sample is a good representation of the population, it is reasonable to expect the population mean to be close to the sample mean. Thus, our task in this video is to estimate the population mean with a range that has a lower and an upper bound.
To begin, we need to standardize the sample mean, since different samples have different means and standard deviations. We standardize the sample mean by subtracting the population mean and dividing by the standard deviation of the sample mean, which equals the population standard deviation divided by the square root of the sample size. After standardization, the sample mean follows a standard normal distribution (exactly when the population is normal, and approximately for large samples), also known as the Z-distribution.
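A quick simulation can check this claim. The sketch below assumes a normal population with mean 10 and standard deviation 5 (the values used in the sample-mean demo). It draws many samples of size 30, standardizes each sample mean with the population values, and checks that the results have a mean near 0 and a standard deviation near 1. The helper name `standardized_sample_means` and the fixed seed are assumptions for illustration.

```python
import numpy as np


def standardized_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from a normal population N(mu, sigma^2)
    and return each sample mean standardized as (xbar - mu) / (sigma / sqrt(n))."""
    rng = np.random.default_rng(seed)                  # seeded for reproducibility
    samples = rng.normal(mu, sigma, size=(trials, n))  # one sample per row
    xbar = samples.mean(axis=1)                        # one sample mean per row
    return (xbar - mu) / (sigma / np.sqrt(n))


z = standardized_sample_means(mu=10, sigma=5, n=30, trials=20_000)
print("mean of standardized means:", z.mean())     # close to 0
print("SD of standardized means:", z.std(ddof=1))  # close to 1
```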
The Z-distribution is a probability distribution with a mean of 0 and a standard deviation of 1. It is a bell-shaped curve that is symmetric around the mean, with about 99.7% of its probability lying within 3 standard deviations of the mean.
The Z-distribution is important in statistics and probability because it allows us to standardize data and compare it to other distributions. By converting data to Z-scores, we can calculate probabilities and make comparisons across different data sets.
In finance, the Z-distribution is often used in hypothesis testing to determine whether a given result is statistically significant. For example, a financial analyst might use the Z-distribution to test whether a particular investment strategy is outperforming the market. By comparing the results to the Z-distribution, the analyst can determine whether the results are statistically significant or whether they could have occurred by chance. This can help to guide investment decisions and avoid costly mistakes.
In Python, we can use NumPy and the scipy.stats module to work with the Z-distribution. The demo code for this video generates a random sample of 100 values from a standard normal distribution with the np.random.normal() function.
It then calculates the Z-score of a specific value by dividing the difference between the value and the sample mean by the sample standard deviation (std_dev); in this case, the value is 1.5. Finally, it uses the stats.norm.cdf() function to calculate the probability that a standard normal variable is less than that Z-score.
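The steps described above can be sketched as follows. This version fixes a random seed (an assumption, so the output is reproducible) and uses `scipy.stats.norm.cdf` for the probability. It also checks the claim that about 99.7% of a standard normal distribution lies within 3 standard deviations of the mean.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)     # seeded generator (assumption: for reproducibility)
sample = rng.normal(size=100)       # 100 draws from the standard normal distribution

mean = np.mean(sample)
std_dev = np.std(sample, ddof=1)    # sample standard deviation

value = 1.5
z_score = (value - mean) / std_dev  # how many standard deviations 1.5 lies above the mean
prob = norm.cdf(z_score)            # P(Z < z_score) for a standard normal Z

within_3_sd = norm.cdf(3) - norm.cdf(-3)  # probability within 3 SDs, about 0.9973

print("Z-score of 1.5:", z_score)
print("P(Z < z-score):", prob)
print("P(-3 < Z < 3):", within_3_sd)
```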
Okay, now you have a good understanding of both the confidence interval and the Z-distribution.
Now, I am going to show you the standard steps to calculate confidence interval using the Z-distribution.
1. First, we need to determine the confidence level we want to use. For example, if you want to calculate an 80% confidence interval, your confidence level would be 0.8.
2. Then, we need to look up the Z-value that corresponds to our desired confidence level, either in a Z-table or in Python with norm.ppf((1 + confidence_level) / 2). For an 80% confidence level, the result is about 1.28.
3. Next, we calculate the sample mean and sample standard deviation for the data set we are analyzing.
4. Then, we calculate the standard error, which is equal to the sample standard deviation divided by the square root of the sample size.
5. Then, we multiply the standard error by the Z-value from step 2.
6. Lastly, we subtract the value obtained in the last step from the sample mean to get the lower bound of the confidence interval, and add it to the sample mean to get the upper bound.
Here is an example Python code that calculates an 80% confidence interval using the Z-distribution:
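A minimal sketch of those six steps follows. It wraps them in a hypothetical helper `z_confidence_interval` and applies it to the five-point example data set from this video's demo code.

```python
import numpy as np
from scipy.stats import norm


def z_confidence_interval(data, confidence_level):
    """Z-based confidence interval for the mean of `data`."""
    data = np.asarray(data, dtype=float)
    # Step 2: critical value, about 1.28 for an 80% level
    z_value = norm.ppf((1 + confidence_level) / 2)
    # Step 3: sample mean and sample standard deviation
    sample_mean = data.mean()
    sample_std = data.std(ddof=1)
    # Step 4: standard error of the mean
    standard_error = sample_std / np.sqrt(len(data))
    # Step 5: margin of error
    margin = z_value * standard_error
    # Step 6: subtract for the lower bound, add for the upper bound
    return sample_mean - margin, sample_mean + margin


data = [1.2, 3.4, 2.3, 4.5, 3.2]
lower, upper = z_confidence_interval(data, confidence_level=0.8)  # Step 1: choose 80%
print(f"80% confidence interval: ({lower:.2f}, {upper:.2f})")
```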
In this example, the 80% confidence interval for the data set is (2.21, 3.63). This means that we can be 80% confident that the true population mean falls within this range.
Okay, now let’s put what we just learned into practice by constructing the 80% confidence interval for the daily return of Microsoft stock. First, we use the norm.ppf function to obtain the 10% and 90% quantiles of the standard normal distribution. We then compute the sample mean and the standard error of the sample mean, in which the unknown population standard deviation is replaced by the sample standard deviation. If we run the cell, we can read off the 80 percent confidence interval. Notably, the whole interval lies on the positive side of zero, which suggests that the average daily return of Microsoft stock is likely to be positive.
Overall, confidence intervals and the Z-distribution are essential tools for financial analysts who need to make data-driven decisions based on statistical significance. By understanding these concepts and applying them to real-world scenarios, analysts can make more informed decisions and improve their performance.
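import pandas as pd
import numpy as np
from scipy.stats import norm
%matplotlib inline
# one sample of 30 observations from a normal population with mean 10 and SD 5
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
print('Sample mean is ', Fstsample[0].mean())
print('Sample SD is ', Fstsample[0].std(ddof=1))
# 10,000 samples of size 30: record each sample's mean and variance
meanlist = []
varlist = []
for t in range(10000):
    sample = pd.DataFrame(np.random.normal(10,5,size=30))
    meanlist.append(sample[0].mean())
    varlist.append(sample[0].var(ddof=1))
# store both lists as columns of one DataFrame
collection = pd.DataFrame()
collection['meanlist'] = meanlist
collection['varlist'] = varlist
# population (cyan) versus distribution of the sample mean (red)
pop = pd.DataFrame(np.random.normal(10,5,size=100000))
pop[0].hist(bins=500, color='cyan', density=True)
collection['meanlist'].hist(bins=500,density=True,color='red',figsize=(10,4))
# non-normal population (1, 0, 1, 0, 1): 100,000 samples of size 10, drawn with replacement
samplemeanlist = []
apop = pd.DataFrame([1,0,1,0,1])
for t in range(100000):
    sample = apop[0].sample(10,replace=True)
    samplemeanlist.append(sample.mean())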
import pandas as pd
import numpy as np
from scipy.stats import norm
%matplotlib inline
Firstly, the code imports the libraries needed for data manipulation, numerical calculation, and statistical computation, and enables inline plotting for visualization; all of these are essential for quantitative trading. Specifically, pandas is a library for data manipulation and analysis, and NumPy is a library for numerical computation.
‘from scipy.stats import norm’ - This line of code imports the norm object from the scipy.stats module. norm is SciPy's normal continuous random variable: by default it describes the standard normal distribution (mean 0, standard deviation 1), and it provides methods such as norm.pdf() for the probability density function (PDF), norm.cdf() for the cumulative distribution function, and norm.ppf() for the percent point function, the inverse of the CDF.
‘%matplotlib inline’ - This line of code is a Jupyter Notebook magic command that allows for the visualization of plots and charts inline within the notebook.
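The short sketch below shows the `norm` methods used throughout these demos. Each call is evaluated for the standard normal distribution, which is the default (`loc=0`, `scale=1`).

```python
from scipy.stats import norm

density_at_0 = norm.pdf(0)   # height of the bell curve at the mean, about 0.3989
cdf_at_0 = norm.cdf(0)       # P(Z < 0) = 0.5 by symmetry
z_975 = norm.ppf(0.975)      # inverse of the CDF: about 1.96, the 95% two-sided critical value
p_right = norm.sf(1.96)      # survival function, 1 - cdf(1.96), about 0.025

print(density_at_0, cdf_at_0, z_975, p_right)
```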
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
print('Sample mean is ', Fstsample[0].mean())
print('Sample SD is ', Fstsample[0].std(ddof=1))
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
This line of code generates a sample of 30 numbers from a normal distribution with a mean of 10 and a standard deviation of 5 using the numpy.random.normal() function. It then converts the generated sample into a pandas DataFrame object and assigns it to the variable 'Fstsample'.
print('Sample mean is ', Fstsample[0].mean())
This line of code prints the sample mean of the generated sample. It does this by selecting the column labeled 0 (the only column) of the 'Fstsample' DataFrame with the index operator ([0]) and computing its mean with the mean() method.
print('Sample SD is ', Fstsample[0].std(ddof=1))
This line of code prints the sample standard deviation of the generated sample. It does this by selecting the column labeled 0 of the 'Fstsample' DataFrame with the index operator ([0]) and computing the sample standard deviation with the std() method, with the 'ddof' parameter set to 1. 'ddof' stands for "delta degrees of freedom": the sum of squared deviations is divided by n - ddof, so ddof=1 divides by n - 1 (Bessel's correction). This removes the downward bias of the variance estimate, which matters most for small samples.
In summary, this code generates a random sample of 30 numbers from a normal distribution, computes the sample mean and sample standard deviation of that sample, and prints these values to the console. These statistical measures are important in quantitative trading for analyzing the distribution of data and making informed investment decisions.
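The sketch below shows what `ddof` changes, using the five-point data set from the confidence-interval demo. With `ddof=0` the sum of squared deviations is divided by n, and with `ddof=1` it is divided by n - 1, so the two results differ by a factor of sqrt(n / (n - 1)). Note that NumPy's `np.std` defaults to `ddof=0`, while pandas' `Series.std` defaults to `ddof=1`.

```python
import numpy as np
import pandas as pd

values = np.array([1.2, 3.4, 2.3, 4.5, 3.2])
n = len(values)

sd_population = np.std(values, ddof=0)  # divides by n
sd_sample = np.std(values, ddof=1)      # divides by n - 1 (Bessel's correction)
sd_pandas = pd.Series(values).std()     # pandas uses ddof=1 by default

print(sd_population, sd_sample, sd_pandas)
print(sd_population * np.sqrt(n / (n - 1)))  # equals sd_sample
```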
meanlist = []
varlist = []
for t in range(10000):
    sample = pd.DataFrame(np.random.normal(10,5,size=30))
    meanlist.append(sample[0].mean())
    varlist.append(sample[0].var(ddof=1))
This code generates 10,000 random samples of size 30 from a normal distribution with a mean of 10 and a standard deviation of 5 using the numpy.random.normal() function. For each sample, it computes the sample mean and sample variance, and stores these values in two separate lists 'meanlist' and 'varlist', respectively.
The code first initializes two empty lists 'meanlist' and 'varlist' to store the calculated sample means and sample variances.
The code then enters a for loop that iterates 10,000 times. In each iteration, the code generates a new sample of size 30 from the normal distribution using the numpy.random.normal() function, converts it to a pandas DataFrame object, and assigns it to the variable 'sample'.
The code then calculates the sample mean of the generated sample using the mean() method, and appends this value to the 'meanlist' using the append() method.
Next, the code calculates the sample variance of the generated sample using the var() method with the 'ddof' parameter set to 1. The calculated sample variance is then appended to the 'varlist' using the append() method.
After 10,000 iterations, the 'meanlist' and 'varlist' contain the sample means and sample variances of the 10,000 generated samples, respectively.
This code is useful in quantitative trading because it enables the simulation of random samples from a distribution, allowing traders to study the behavior of a security or portfolio under various scenarios and make informed investment decisions.
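The same simulation can also be written without a Python loop. The sketch below is an alternative formulation, not the video's code. It draws all 10,000 samples at once as a 10,000 x 30 NumPy array, using a seeded generator (an assumption, for reproducibility), and computes one mean and one variance per row. The average of the sample means should be close to 10, the variance of the sample means close to 25 / 30, and the average of the sample variances close to the population variance of 25.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 10, 5, 30, 10_000

samples = rng.normal(mu, sigma, size=(trials, n))  # each row is one sample of size 30
meanlist = samples.mean(axis=1)                    # sample mean of each row
varlist = samples.var(axis=1, ddof=1)              # sample variance of each row

print("average of sample means:", meanlist.mean())        # close to mu = 10
print("variance of sample means:", meanlist.var(ddof=1))  # close to sigma**2 / n, about 0.833
print("average of sample variances:", varlist.mean())     # close to sigma**2 = 25
```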
In this video, let’s delve into the variability of the sample mean and its corresponding distribution. Understanding the principles of variation is crucial in enabling us to accurately evaluate estimations and validate assertions concerning populations based on samples.
For instance, let us consider a scenario where historical data for a hundred days is available. We can calculate the sample mean and variance of the stock returns. Based on these statistics, can we make inferences regarding the parameters? To what extent are the statistics close to population parameters? Can we assert, from observing stock data for a hundred days, that the stock is experiencing an upward trend, implying that the mean return is positive? Our comprehension of the sample mean distribution is vital in answering these questions.
In this case, we randomly sample 30 observations from a normal population with a mean of 10 and a standard deviation of 5. Running this cell multiple times gives different sample means and standard deviations each time, such as these examples. This difference from one sample to the next is called sampling variation, and it arises because each sample is drawn at random from the population.
Moreover, the sample mean and standard deviation are not arbitrary; they follow specific rules because they are drawn from the same population. In this code, we use a loop to generate 10,000 samples from the same population and compute the mean and variance of each one. meanlist stores the 10,000 sample means, and varlist stores the 10,000 sample variances.
Finally, we create an empty DataFrame named "collection" and add meanlist and varlist to it as separate columns. The histogram of the sample means is symmetric and resembles a normal distribution, whereas the histogram of the sample variances is not normal: it is right-skewed. It can be shown mathematically that if the population is normal with mean mu and variance sigma squared, then the sample mean is also normal, with mean mu and variance sigma squared divided by the sample size n. Why is the variance of the sample mean smaller than the variance of the population? Intuitively, the sample mean averages n individuals from the population, so random highs and lows partly cancel out, and the sample mean varies less than individual observations do. The Python script provided here demonstrates this concept: the blue histogram represents the population, while the red one depicts the distribution of the sample mean.
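Both claims in this paragraph can be checked numerically. The sketch below is an illustration under stated assumptions: a population N(10, 5^2), samples of size 30, and a seeded generator. It compares the spread of the sample means with sigma / sqrt(n), and uses `scipy.stats.skew` to confirm that the sample variances are right-skewed while the sample means are not. For a normal population, (n - 1) s^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom, which is why the variance histogram has a long right tail.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
mu, sigma, n, trials = 10, 5, 30, 10_000

samples = rng.normal(mu, sigma, size=(trials, n))
sample_means = samples.mean(axis=1)
sample_vars = samples.var(axis=1, ddof=1)

# Spread of the sample mean versus the theoretical sigma / sqrt(n)
print("SD of sample means:", sample_means.std(ddof=1), "theory:", sigma / np.sqrt(n))

# Skewness: near 0 for the symmetric sample-mean distribution,
# clearly positive (right-skewed) for the sample variances
print("skew of means:", skew(sample_means))
print("skew of variances:", skew(sample_vars))
```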
What if the population is not normal? According to the central limit theorem, if the sample size is large enough, the distribution of the sample mean is approximately normal, even when the population itself is not normal. We illustrate this with a DataFrame named 'apop' that stores a non-normal population of five values: 1, 0, 1, 0, 1. If we generate 100,000 samples with a small sample size of 10, the histogram of the sample means does not look like a normal distribution. However, if we generate 100,000 samples with a large sample size of 2,000, the distribution of the sample mean looks like a normal distribution.
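A minimal sketch of this experiment follows, with one stated simplification. Drawing n values with replacement from [1, 0, 1, 0, 1] is the same as n Bernoulli trials with success probability 0.6, so each sample mean is generated directly as a binomial count divided by n. This is much faster than calling `DataFrame.sample` 100,000 times. The helper `clt_check`, the seed, and the use of the Kolmogorov-Smirnov distance (`scipy.stats.kstest`) to measure closeness to a normal distribution are assumptions for illustration.

```python
import numpy as np
from scipy.stats import kstest

apop = np.array([1, 0, 1, 0, 1])  # the non-normal population from the demo
p = apop.mean()                   # 0.6: chance of drawing a 1 on each draw


def clt_check(n, trials=100_000, seed=7):
    """Simulate `trials` sample means of size n drawn with replacement from apop."""
    rng = np.random.default_rng(seed)
    # the number of 1s in n draws with replacement is Binomial(n, p)
    sample_means = rng.binomial(n, p, size=trials) / n
    theory_sd = np.sqrt(p * (1 - p) / n)  # population SD / sqrt(n)
    # Kolmogorov-Smirnov distance between the simulated means and N(p, theory_sd^2)
    distance = kstest(sample_means, 'norm', args=(p, theory_sd)).statistic
    return sample_means, theory_sd, distance


results = {n: clt_check(n) for n in (10, 2000)}
for n, (means, theory_sd, distance) in results.items():
    print(f"n = {n}: SD of means = {means.std(ddof=1):.4f} (theory {theory_sd:.4f}), "
          f"KS distance to normal = {distance:.3f}")
```

The distance is large for n = 10, where the sample mean can take only 11 distinct values, and small for n = 2000.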
In this video, we focused on the distribution of sample mean and the probability rule of variation of sample mean. We will apply these concepts in the next two videos to explore two important quantitative statistical tools: confidence interval and hypothesis testing.
NeedCode
Approach: Elementary Math
Intuition
Keep track of the carry using a variable and simulate a digit-by-digit sum starting from the head of the list, which contains the least-significant digit.
Example: the addition of two numbers, 342 + 465 = 807.
Each node contains a single digit, and the digits are stored in reverse order, so 342 is the list 2 → 4 → 3 and 807 is 7 → 0 → 8.
Algorithm
Just like how you would sum two numbers on a piece of paper, we begin by summing the least-significant digits, which are the heads of l1 and l2. Since each digit is in the range 0…9, summing two digits may "overflow". For example, 5 + 7 = 12. In this case, we set the current digit to 2 and bring over carry = 1 to the next iteration. carry must be either 0 or 1 because the largest possible sum of two digits (including the carry) is 9 + 9 + 1 = 19.
The pseudocode is as follows:
Initialize current node to dummy head of the returning list.
Initialize carry to 0.
Loop through lists l1 and l2 until you reach both ends and carry is 0.
Set x to node l1's value. If l1 has reached the end of l1, set x to 0.
Set y to node l2's value. If l2 has reached the end of l2, set y to 0.
Set sum = x + y + carry.
Update carry = sum / 10 (integer division).
Create a new node with the digit value of (sum mod 10) and set it to current node's next, then advance current node to next.
Advance both l1 and l2.
Return dummy head's next node.
Note that we use a dummy head to simplify the code. Without a dummy head, you would have to write extra conditional statements to initialize the head's value.
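A minimal Python sketch of the pseudocode above follows. The `ListNode` class is an assumption (a typical singly linked list node with `val` and `next`). The helpers `from_digits` and `to_digits` are hypothetical and exist only to build and read lists for the example. The running sum is named `total` to avoid shadowing Python's built-in `sum`.

```python
from typing import List, Optional


class ListNode:
    """Singly linked list node (assumed definition)."""
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def add_two_numbers(l1: Optional[ListNode], l2: Optional[ListNode]) -> Optional[ListNode]:
    dummy = ListNode()          # dummy head avoids special-casing the first node
    current = dummy
    carry = 0
    while l1 or l2 or carry:
        x = l1.val if l1 else 0  # 0 once l1 is exhausted
        y = l2.val if l2 else 0  # 0 once l2 is exhausted
        total = x + y + carry
        carry = total // 10      # integer division: carry is 0 or 1
        current.next = ListNode(total % 10)
        current = current.next
        l1 = l1.next if l1 else None
        l2 = l2.next if l2 else None
    return dummy.next


def from_digits(digits: List[int]) -> Optional[ListNode]:
    """Build a list from digits given least-significant first (hypothetical helper)."""
    dummy = ListNode()
    current = dummy
    for d in digits:
        current.next = ListNode(d)
        current = current.next
    return dummy.next


def to_digits(node: Optional[ListNode]) -> List[int]:
    """Read the digits back out of a list (hypothetical helper)."""
    digits = []
    while node:
        digits.append(node.val)
        node = node.next
    return digits


# 342 + 465 = 807, with digits stored in reverse order
print(to_digits(add_two_numbers(from_digits([2, 4, 3]), from_digits([5, 6, 4]))))  # [7, 0, 8]
```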
3 years ago | [YT] | 0
The problem we are solving in this video is to find the indices of the two numbers in a given array of integers that add up to a given target (9 in case 1, and 6 in cases 2 and 3). We may assume that each input has exactly one solution, and we may not use the same element twice. We can return the answer in any order.
Approach: Brute Force
Algorithm
The brute force approach is simple.
Loop through each element x and check whether there is another value equal to target − x.
Complexity Analysis
Time complexity: O(n^2). For each element, we try to find its complement by looping through the rest of the array, which takes O(n) time. Therefore, the time complexity is O(n^2).
Space complexity: O(1). The space required does not depend on the size of the input array, so only constant space is used.
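A minimal sketch of the brute-force approach described above follows; the function name `two_sum_brute_force` and the example input are hypothetical. The inner loop starts at `i + 1`, so no element is paired with itself. The two nested loops give the O(n^2) time and O(1) extra space stated in the analysis.

```python
from typing import List, Optional


def two_sum_brute_force(nums: List[int], target: int) -> Optional[List[int]]:
    n = len(nums)
    for i in range(n):
        complement = target - nums[i]
        # look for the complement among the later elements only
        for j in range(i + 1, n):
            if nums[j] == complement:
                return [i, j]
    return None  # not reached when exactly one solution is guaranteed


print(two_sum_brute_force([2, 7, 11, 15], 9))  # [0, 1]
```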
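Demo Code:
from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        seen = {}  # value -> index of every element visited so far
        for i in range(len(nums)):
            if target - nums[i] not in seen:
                seen[nums[i]] = i
            else:
                # the complement appeared earlier, so the pair has been found
                return [seen[target - nums[i]], i]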
The algorithm used in this code is called the "hash table" approach. By using a dictionary to store each integer in nums and its corresponding index, we can quickly check whether the difference between target and the current integer has already been encountered. If it has, then the current integer and the one stored in the dictionary add up to target, and we can return their indices. Because each dictionary lookup and insertion takes O(1) time on average, this single pass solves the "two sum" problem in O(n) time, using O(n) extra space for the dictionary.
Algorithmic Trading Python 2023 - 4.1 - Association of Random Variables #trading #technology #python #algorithmic
Demo Code:
import pandas as pd
import numpy as np
from scipy.stats import norm
%matplotlib inline
# LSTAT: percentage of the population classified as low status
# INDUS: proportion of non-retail business acres per town
# NOX: nitric oxide concentrations
# RM: average number of rooms
# MEDV: median value of owner-occupied homes in $1000
housing = pd.read_csv('Data_CSV/housing.csv', index_col=0) # housing price in Boston
housing.head()
# Quantifying association with covariance
housing.cov()
# correlation
housing.corr()
# scatter matrix plot
from pandas.plotting import scatter_matrix
sm = scatter_matrix(housing, figsize=(10, 10))
# Observe the association between RM and MEDV
housing.plot(kind='scatter', x='RM', y='MEDV', figsize=(6,6))
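To show what `housing.cov()` and `housing.corr()` compute, the sketch below uses two short hypothetical columns (not the Boston data) and checks pandas against the textbook formulas. Sample covariance divides the sum of cross-deviations by n - 1. Pearson correlation divides the covariance by the product of the two sample standard deviations, which puts it on a -1 to 1 scale.

```python
import numpy as np
import pandas as pd

# hypothetical data: rooms per dwelling and median home value (in $1000s)
df = pd.DataFrame({"RM": [5.9, 6.2, 6.5, 6.8, 7.1, 7.4],
                   "MEDV": [18.0, 20.5, 22.0, 25.5, 28.0, 33.0]})

x, y = df["RM"], df["MEDV"]
n = len(df)
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
corr_manual = cov_manual / (x.std(ddof=1) * y.std(ddof=1))

print(df.cov().loc["RM", "MEDV"], cov_manual)    # same value
print(df.corr().loc["RM", "MEDV"], corr_manual)  # same value, between -1 and 1
```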
Algorithmic Trading Python 2023 - 3.7 - Hypothesis Testing #trading #technology #python #algorithmic
import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
%matplotlib inline
# import microsoft.csv, and add a new feature - logreturn
ms = pd.read_csv('Data_CSV/microsoft.csv')
ms.set_index('Date',inplace = True) # sets the DataFrame's index to the 'Date' column.
ms['logReturn'] = np.log(ms['Close'].shift(-1)) - np.log(ms['Close'])
# Log return goes up and down during the period
plt.title("Daily Return of Microsoft from 2014 to 2017", size = 20)
ms['logReturn'].plot(figsize=(20, 8))
plt.axhline(0, color='red')
plt.xlabel("Time", size = 10)
plt.ylabel("Daily Return", size = 10)
plt.show()
plt.title("Close Price of Microsoft from 2014 to 2017", size = 10)
plt.xlabel("Time", size = 10)
plt.ylabel("US $", size = 10)
plt.plot(ms.loc[:,'Close'])
plt.title("Histogram of Daily Return of Microsoft from 2014 to 2017", size = 10)
ms.loc[:,'logReturn'].dropna().hist(bins = 100,figsize=(10, 4))
# Test statistic for H0: mu = 0 (sample mean divided by its standard error)
xbar = ms['logReturn'].mean() # mean of daily log return of Microsoft
s = ms['logReturn'].std(ddof=1) # sample standard deviation
n = ms['logReturn'].shape[0] # sample size
zhat = (xbar-0)/(s/(n**0.5))
print(xbar)
print(zhat)
# Two-sided test (H1: mu != 0): reject if zhat falls in either tail
alpha = 0.05
zleft = norm.ppf(alpha/2,0,1)
zright = -zleft
print('zleft = ',zleft,', zright =',zright)
print("Rejection region: zhat < {:.4f} or zhat > {:.4f}".format(zleft, zright))
print('At the significance level of ', alpha)
print('Shall we reject?:', zhat<zleft or zhat>zright )
# One-sided test (H1: mu > 0): reject if zhat is above the right critical value
alpha = 0.05
zright = norm.ppf(1-alpha,0,1)
print("zright:{:.4f}, zhat:{:.4f}".format(zright,zhat))
print('At the significance level of ', alpha)
print('Shall we reject?:', zhat>zright )
# Null hypothesis mu = 0
mu = 0
# Test statistic (z-score)
zhat = 1.6141477140003675
alpha = 0.05
p = 2 *(1 - norm.cdf(np.abs(zhat), 0, 1))
print('At the significance level of ', alpha,', p value =', p)
print('Shall we reject?:', p<alpha )
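The decision rules above can be collected into one function. This is a sketch, not the video's code: the name `z_test_mean` is hypothetical, and it is exercised on simulated returns plus the z-statistic printed in the demo, because the Microsoft CSV is not included here. Like the demo, it uses the sample standard deviation in place of sigma. It returns the two-sided p-value (the same formula as above) and the one-sided p-value for H1: mu > 0.

```python
import numpy as np
from scipy.stats import norm


def z_test_mean(returns, mu0=0.0, alpha=0.05):
    """One-sample z-test of H0: mean == mu0, with the sample SD standing in for sigma."""
    returns = np.asarray(returns, dtype=float)
    returns = returns[~np.isnan(returns)]  # drop missing values, e.g. from shift(-1)
    n = returns.size
    zhat = (returns.mean() - mu0) / (returns.std(ddof=1) / np.sqrt(n))
    p_two_sided = 2 * norm.sf(abs(zhat))   # equals 2 * (1 - norm.cdf(|zhat|))
    p_right = norm.sf(zhat)                # one-sided, H1: mean > mu0
    return {"zhat": zhat,
            "p_two_sided": p_two_sided, "reject_two_sided": p_two_sided < alpha,
            "p_right": p_right, "reject_right": p_right < alpha}


# hypothetical daily log returns, simulated only to exercise the function
rng = np.random.default_rng(3)
print(z_test_mean(rng.normal(0.0005, 0.015, size=1000)))

# p-values for the z-statistic reported in the demo code
zhat = 1.6141477140003675
print("two-sided p:", 2 * norm.sf(abs(zhat)))  # roughly 0.11, not below alpha = 0.05
print("one-sided p:", norm.sf(zhat))           # roughly 0.053, also not below 0.05
```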
Algorithmic Trading Python 2023 - 3.6 - Confidence Interval - CODE EXPLAINED #trading #technology #python #algorithmic
Demo Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import norm
%matplotlib inline
# calculate the confidence interval: CI = x̄ ± z * (s / sqrt(n))
# sample data
sample_mean = 160
sample_std = 5
sample_size = 50
# z-score for 95% confidence level
z = norm.ppf(0.975)
print(f"Z-score for 95% confidence interval: {z}")
# calculate confidence interval
interval_left = sample_mean - z * (sample_std / np.sqrt(sample_size))
interval_right = sample_mean + z * (sample_std / np.sqrt(sample_size))
print(f"95% confidence interval: ({interval_left:.2f}, {interval_right:.2f})")
import numpy as np
from scipy.stats import norm
# Example data set
data = np.array([1.2, 3.4, 2.3, 4.5, 3.2])
# Calculate sample mean and sample standard deviation
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)
# Determine confidence level
confidence_level = 0.8
# Look up Z-value
# norm.ppf() is the percent point function of the standard normal distribution
# norm.ppf() returns the critical value for a given probability or percentile
z_value = norm.ppf((1 + confidence_level) / 2)
# Calculate standard error
standard_error = sample_std / np.sqrt(len(data))
# Calculate confidence interval
lower_bound = sample_mean - z_value * standard_error
upper_bound = sample_mean + z_value * standard_error
# Print results
print("Z_value is ", z_value)
print("Sample Mean is ", sample_mean)
print("************************************")
print("{0}% confidence interval is ({1}, {2})".format(int(confidence_level * 100), lower_bound, upper_bound))
# Generate a random sample of 100 values from a standard normal distribution
sample = np.random.normal(size=100)
# Calculate mean and standard deviation of sample
mean = np.mean(sample)
std_dev = np.std(sample, ddof=1)
# Calculate Z-score of a specific value
value = 1.5
z_score = (value - mean) / std_dev
# Calculate probability of Z-score being less than a certain value
prob = stats.norm.cdf(z_score)
print("Z_score of a value of 1.5 is ",z_score)
print("Probability of z_score is ",prob)
ms = pd.read_csv('Data_CSV/microsoft.csv') # imports the Microsoft stock data from a CSV file located in the "Data_CSV" folder of the current workspace.
ms.set_index('Date',inplace = True) # sets the DataFrame's index to the 'Date' column.
ms.head()
ms['logReturn'] = np.log(ms['Close'].shift(-1)) - np.log(ms['Close'])
# values for calculating the 80% confidence interval
z_left = norm.ppf(0.1) # left quantile
z_right = norm.ppf(0.9) # right quantile
sample_size = ms['logReturn'].count() # number of non-NaN log returns
sample_mean = ms['logReturn'].mean()
sample_std = ms['logReturn'].std(ddof=1)/sample_size**0.5 # standard error of the mean: s / sqrt(n)
# Confidence interval for daily return
# 80% confidence interval: we are 80% confident that the true average daily log return lies between "interval_left" and "interval_right".
interval_left = sample_mean+z_left*sample_std # lower bound
interval_right = sample_mean+z_right*sample_std # upper bound
print("Z_left*sample_std is ", z_left*sample_std)
print("Z_right*sample_std is ", z_right*sample_std)
print("Sample Mean is ", sample_mean)
print("************************************")
print("80% confidence interval is ",(interval_left, interval_right) )
https://www.youtube.com/watch?v=NJA87...
3 years ago | [YT] | 1
NeedCode
Algorithmic Trading Python 2023 - 3.5 - Confidence Interval #trading #technology #python #algorithmic
https://www.youtube.com/watch?v=EY-O6...
Today, we are going to delve into the topic of how to estimate the average return utilizing confidence intervals. A confidence interval is a range of values within which we are reasonably sure that the true value of a population parameter lies. It is calculated from the sample data, and the width of the interval depends on the level of confidence and the sample size.
For example, suppose we want to estimate the average height of students in a school. We first take a random sample of 50 students and find that their average height is 160cm, with a standard deviation of 5cm. We can then use this information to calculate a confidence interval for the true average height of all students in the school.
Let's say we want a 95% confidence interval. We can use this formula to calculate the interval:
CI = x̄ ± z * (s / sqrt(n))
where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and z is the z-score corresponding to the desired confidence level.
In this case, the z-score for the 95% confidence level is 1.96 (you can find this value in a Z-table or with the norm.ppf() function in Python). So the 95% confidence interval is:
CI = 160 ± 1.96 * (5 / sqrt(50)) = (158.61, 161.39)
This means that we are 95% confident that the true average height of all students in the school is between 158.61cm and 161.39cm.
Here's the Python code to calculate the confidence interval:
import numpy as np
from scipy.stats import norm
# sample data
sample_mean = 160
sample_std = 5
sample_size = 50
# z-score for 95% confidence level
z = norm.ppf(0.975)
# calculate confidence interval
interval_left = sample_mean - z * (sample_std / np.sqrt(sample_size))
interval_right = sample_mean + z * (sample_std / np.sqrt(sample_size))
print(f"95% confidence interval: ({interval_left:.2f}, {interval_right:.2f})")
In finance, analysts use confidence intervals to estimate the possible range of outcomes for a given data set. For example, a financial analyst might use a confidence interval to estimate the possible range of returns for a particular investment. The confidence interval provides a measure of the level of uncertainty associated with the estimate, which can be used to make informed decisions about the investment. A wider confidence interval indicates more uncertainty, while a narrower confidence interval indicates more confidence in the estimate.
Here, we have a sample of daily log returns of Microsoft stock, from which we can compute the sample mean, an estimate of the true population mean. Our goal in this video is to go beyond this point estimate. If the sample is a good representation of the population, the population mean should be close to the sample mean, so our task is to estimate the population mean with a range that has a lower and an upper bound.
To begin, we need to standardize the sample mean, since different samples have different means and standard deviations. We standardize the sample mean by subtracting the population mean and dividing by the standard deviation of the sample mean, which equals the population standard deviation divided by the square root of the sample size. After standardization, the sample mean follows a standard normal distribution (exactly if the population is normal, approximately if the sample size is large), also known as the Z-distribution.
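A quick simulation check of this standardization. A sketch under made-up population parameters (mu = 10, sigma = 5, n = 30, the same values as the sampling demo in lesson 3.4, not estimates from the Microsoft data): if the reasoning holds, the standardized sample means should have a mean near 0 and a standard deviation near 1.

```python
import numpy as np

# Hypothetical population parameters (not estimated from the Microsoft data)
mu, sigma, n, trials = 10.0, 5.0, 30, 20_000

rng = np.random.default_rng(42)
# Each row is one sample of size n; average across each row
sample_means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

# Standardize: subtract the population mean, divide by sigma / sqrt(n)
z = (sample_means - mu) / (sigma / np.sqrt(n))

print("mean of z:", z.mean())       # close to 0
print("std of z:", z.std(ddof=1))   # close to 1
```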
The Z-distribution is a probability distribution with a mean of 0 and a standard deviation of 1. It is a bell-shaped curve, symmetric around the mean, with about 99.7% of the probability lying within 3 standard deviations of the mean.
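The share of a standard normal distribution lying within k standard deviations of the mean can be checked directly with `scipy.stats.norm.cdf`. A small sketch (the choice of k = 1, 2, 3 is ours):

```python
from scipy.stats import norm

# P(-k < Z < k) for a standard normal Z
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```

This prints roughly 0.6827, 0.9545 and 0.9973, the familiar 68-95-99.7 rule.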
The Z-distribution is important in statistics and probability because it provides a common scale: standardizing data lets us compare values measured in different units or with different spreads. By converting data to Z-scores, we can calculate probabilities and make comparisons across different data sets.
In finance, the Z-distribution is often used in hypothesis testing to determine whether a given result is statistically significant. For example, a financial analyst might use the Z-distribution to test whether a particular investment strategy is outperforming the market. By comparing the results to the Z-distribution, the analyst can determine whether the results are statistically significant or whether they could have occurred by chance. This can help to guide investment decisions and avoid costly mistakes.
In Python, we can use NumPy to draw values from the Z-distribution and the scipy.stats module to compute its probabilities. For example, we can generate a random sample of 100 values from a standard normal distribution.
This code uses the np.random.normal() function to generate a random sample of 100 values from a standard normal distribution. We can also calculate the Z-score of a specific value by subtracting the sample mean from the value and dividing the difference by the sample standard deviation (std_dev); in this case, we calculate the Z-score of the value 1.5. Finally, the stats.norm.cdf() function gives the probability that a standard normal variable is less than a given Z-score.
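A self-contained version of that calculation. The seeded generator `np.random.default_rng(0)` is our substitution for the unseeded `np.random.normal`, so the run is reproducible; without a seed, each run draws a different sample and prints a different z-score.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=100)          # 100 draws from N(0, 1)

mean = sample.mean()
std_dev = sample.std(ddof=1)           # sample standard deviation

value = 1.5
z_score = (value - mean) / std_dev     # how many SDs 1.5 lies above the sample mean
prob = stats.norm.cdf(z_score)         # P(Z < z_score) under the standard normal

print("Z-score of 1.5:", z_score)
print("P(Z < z_score):", prob)
```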
Okay, now you have a good understanding of both confidence intervals and the Z-distribution.
Now, I am going to show you the standard steps to calculate confidence interval using the Z-distribution.
1. First, we need to determine the confidence level we want to use. For example, if you want to calculate an 80% confidence interval, your confidence level would be 0.8.
2. Next, we look up the Z-value that corresponds to the desired confidence level using a Z-table. In Python, norm.ppf((1 + confidence_level) / 2) gives the same value; for an 80% confidence level the result is about 1.28.
3. Third, we calculate the sample mean and sample standard deviation of the data set we are analyzing.
4. Then, we calculate the standard error, which is equal to the sample standard deviation divided by the square root of the sample size.
5. Then, we multiply the standard error by the Z-value from step 2.
6. Lastly, we subtract the value obtained in the previous step from the sample mean to get the lower bound of the confidence interval, and add it to the sample mean to get the upper bound.
Here is example Python code that calculates an 80% confidence interval using the Z-distribution:
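A sketch of the steps above, wrapped in a function. The five-value data set is the one used in the lesson's demo code; the function name `z_confidence_interval` is our own.

```python
import numpy as np
from scipy.stats import norm


def z_confidence_interval(data, confidence_level=0.8):
    """Z-based confidence interval for the population mean."""
    data = np.asarray(data, dtype=float)
    z_value = norm.ppf((1 + confidence_level) / 2)     # about 1.28 for 80%
    sample_mean = data.mean()
    sample_std = data.std(ddof=1)                      # sample SD (n - 1 divisor)
    standard_error = sample_std / np.sqrt(len(data))
    margin = z_value * standard_error
    return sample_mean - margin, sample_mean + margin  # (lower, upper)


lower, upper = z_confidence_interval([1.2, 3.4, 2.3, 4.5, 3.2], 0.8)
print(f"80% confidence interval: ({lower:.2f}, {upper:.2f})")
```

This prints (2.21, 3.63). With only five observations a t-based interval would usually be preferred; the z version is kept here to match the lesson.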
In this example, the confidence interval for the data set is between (2.21, 3.63) with an 80% confidence level. This means that we can be 80% confident that the true population mean falls within this range.
Okay, now let's apply what we just learned to construct the 80% confidence interval for the daily return of Microsoft stock. First, we need the quantiles of the standard normal distribution, which we obtain with the norm.ppf function. We then compute the sample mean and the standard error of the mean, in which the population standard deviation is replaced by the sample standard deviation. If we run the cell, we can read off the 80 percent confidence interval. Notably, the whole interval lies on the positive side, which suggests that the average daily return of Microsoft stock over this period is likely to be positive.
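The same calculation applied to a price series. A minimal sketch, assuming a pandas Series of closing prices; since microsoft.csv is not reproduced here, the prices below are synthetic (hypothetical), and the helper name `log_return_ci` is our own. The helper drops the NaN produced by shift(-1) before counting observations.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def log_return_ci(close, confidence_level=0.8):
    """Z-based confidence interval for the mean daily log return."""
    log_return = (np.log(close.shift(-1)) - np.log(close)).dropna()
    n = len(log_return)
    standard_error = log_return.std(ddof=1) / np.sqrt(n)
    z = norm.ppf((1 + confidence_level) / 2)   # about 1.28 for 80%
    mean = log_return.mean()
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical prices: a random walk in log space over 1,000 trading days
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.015, size=1000))))

lower, upper = log_return_ci(close)
print("80% CI for the mean daily log return:", (lower, upper))
```

If the lower bound is above zero, the interval lies entirely on the positive side, which is the situation described for the Microsoft data.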
Overall, confidence intervals and the Z-distribution are essential tools for financial analysts who need to make data-driven decisions based on statistical significance. By understanding these concepts and applying them to real-world scenarios, analysts can make more informed decisions and improve their performance.
3 years ago | [YT] | 0
NeedCode
Algorithmic Trading Python 2023 - 3.4 - Variation of Sample - code explained #trading #technology #python #algorithmic
Demo Code:
import pandas as pd
import numpy as np
from scipy.stats import norm
%matplotlib inline
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
print('Sample mean is ', Fstsample[0].mean())
print('Sample SD is ', Fstsample[0].std(ddof=1))
meanlist = []
varlist = []
for t in range(10000):
    sample = pd.DataFrame(np.random.normal(10,5,size=30))
    meanlist.append(sample[0].mean())
    varlist.append(sample[0].var(ddof=1))
collection = pd.DataFrame()
collection['meanlist'] = meanlist
collection['varlist'] = varlist
collection['meanlist'].hist(bins=500,density=True,figsize=(10,4))
collection['varlist'].hist(bins=500,density=True,figsize=(10,4))
pop = pd.DataFrame(np.random.normal(10,5,size=100000))
pop[0].hist(bins=500, color='cyan', density=True)
collection['meanlist'].hist(bins=500,density=True,color='red',figsize=(10,4))
samplemeanlist = []
apop = pd.DataFrame([1,0,1,0,1])
for t in range(100000):
    sample = apop[0].sample(10,replace=True)
    samplemeanlist.append(sample.mean())
acollec = pd.DataFrame()
acollec['meanlist'] = samplemeanlist
acollec['meanlist'].hist(bins=500, color='red',density=True,figsize=(10,4))
3 years ago | [YT] | 0
NeedCode
Algorithmic Trading Python 2023 - 3.4 - Variation of Sample - code explained #trading #technology #python #algorithmic
import pandas as pd
import numpy as np
from scipy.stats import norm
%matplotlib inline
Firstly, the code imports the necessary libraries for data manipulation, numerical calculations, statistical computations, and visualization, which are all essential for quantitative trading. Specifically, pandas is a library for data manipulation and analysis, and numpy is a library for numerical computations.
‘from scipy.stats import norm’ - This line of code imports norm from the scipy.stats library. norm is SciPy's normal distribution object (standard normal by default); its methods include pdf() for the probability density function, cdf() for the cumulative distribution function, and ppf() for quantiles.
‘%matplotlib inline’ - This line of code is a Jupyter Notebook magic command that allows for the visualization of plots and charts inline within the notebook.
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
print('Sample mean is ', Fstsample[0].mean())
print('Sample SD is ', Fstsample[0].std(ddof=1))
Fstsample = pd.DataFrame(np.random.normal(10,5,size=30))
This line of code generates a sample of 30 numbers from a normal distribution with a mean of 10 and a standard deviation of 5 using the numpy.random.normal() function. It then converts the generated sample into a pandas DataFrame object and assigns it to the variable 'Fstsample'.
print('Sample mean is ', Fstsample[0].mean())
This line of code prints the sample mean of the generated sample. It selects the column labeled 0 (the DataFrame's only column) with the indexing operator ([0]) and computes its mean with the mean() method.
print('Sample SD is ', Fstsample[0].std(ddof=1))
This line of code prints the sample standard deviation of the generated sample. It selects the same column and computes the sample standard deviation with the std() method, with the 'ddof' (delta degrees of freedom) parameter set to 1. With ddof=1, the sum of squared deviations is divided by n - 1 instead of n (Bessel's correction), which makes the variance estimate unbiased; the correction matters most for small samples.
In summary, this code generates a random sample of 30 numbers from a normal distribution, computes the sample mean and sample standard deviation of the generated sample, and prints these values to the console. These statistical measures are important in quantitative trading for analyzing the distribution of data and making informed investment decisions.
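The ddof setting is easy to get wrong because NumPy and pandas use different defaults: np.std divides by n (ddof=0) unless told otherwise, while pandas' Series.std divides by n - 1 (ddof=1). A small check on a made-up five-value sample:

```python
import numpy as np
import pandas as pd

values = [2.0, 4.0, 4.0, 4.0, 6.0]   # hypothetical data, mean = 4

np_default = np.std(values)              # ddof=0: sqrt(8 / 5)
np_sample = np.std(values, ddof=1)       # ddof=1: sqrt(8 / 4)
pd_default = pd.Series(values).std()     # pandas defaults to ddof=1

print(np_default, np_sample, pd_default)
```

The squared deviations sum to 8, so NumPy's default gives sqrt(1.6), about 1.265, while ddof=1 and the pandas default both give sqrt(2), about 1.414.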
meanlist = []
varlist = []
for t in range(10000):
    sample = pd.DataFrame(np.random.normal(10,5,size=30))
    meanlist.append(sample[0].mean())
    varlist.append(sample[0].var(ddof=1))
This code generates 10,000 random samples of size 30 from a normal distribution with a mean of 10 and a standard deviation of 5 using the numpy.random.normal() function. For each sample, it computes the sample mean and sample variance, and stores these values in two separate lists 'meanlist' and 'varlist', respectively.
The code first initializes two empty lists 'meanlist' and 'varlist' to store the calculated sample means and sample variances.
The code then enters a for loop that iterates 10,000 times. In each iteration, the code generates a new sample of size 30 from the normal distribution using the numpy.random.normal() function, converts it to a pandas DataFrame object, and assigns it to the variable 'sample'.
The code then calculates the sample mean of the generated sample using the mean() method, and appends this value to the 'meanlist' using the append() method.
Next, the code calculates the sample variance of the generated sample using the var() method with the 'ddof' parameter set to 1. The calculated sample variance is then appended to the 'varlist' using the append() method.
After 10,000 iterations, the 'meanlist' and 'varlist' contain the sample means and sample variances of the 10,000 generated samples, respectively.
This code is useful in quantitative trading because it enables the simulation of random samples from a distribution, allowing traders to study the behavior of a security or portfolio under various scenarios and make informed investment decisions.
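The same simulation can be written without a Python loop by drawing all 10,000 samples at once as a 10,000 x 30 array. A sketch, assuming NumPy's seeded `default_rng` generator (our substitution for `np.random.normal`, used so the run is reproducible): with ddof=1, the average of the sample variances should come out close to the population variance 5² = 25, and the average of the sample means close to 10.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
samples = rng.normal(10, 5, size=(10_000, 30))   # one row per sample of size 30

collection = pd.DataFrame({
    "meanlist": samples.mean(axis=1),            # sample mean of each row
    "varlist": samples.var(axis=1, ddof=1),      # sample variance, n - 1 divisor
})

print(collection.mean())   # meanlist near 10, varlist near 25
```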
3 years ago | [YT] | 0
NeedCode
Algorithmic Trading Python 2023 - 3.3 - Variation of Sample #trading #technology #python #algorithmic
In this video, let’s delve into the variability of the sample mean and its corresponding distribution. Understanding the principles of variation is crucial in enabling us to accurately evaluate estimations and validate assertions concerning populations based on samples.
For instance, let us consider a scenario where historical data for a hundred days is available. We can calculate the sample mean and variance of the stock returns. Based on these statistics, can we make inferences regarding the parameters? To what extent are the statistics close to population parameters? Can we assert, from observing stock data for a hundred days, that the stock is experiencing an upward trend, implying that the mean return is positive? Our comprehension of the sample mean distribution is vital in answering these questions.
In this case, we randomly sample 30 observations from a normally distributed population with a mean of 10 and a standard deviation of 5. Running this cell multiple times gives different results, such as these examples. This variability from one sample to the next is known as sampling variation, and it arises because each sample is a random selection from the normal distribution.
Moreover, the sample mean and standard deviation are not arbitrary; they follow specific rules because the samples are drawn from the same population. In this code, we generate a thousand samples from the same population and record the mean and variance of each one: the list meanlist stores the 1,000 sample means and varlist stores the 1,000 sample variances. A loop generates the samples, computes each sample's mean and variance, and appends them to the respective lists.
Finally, we create an empty DataFrame named "collection" and add meanlist and varlist to it as two columns. The histogram of the sample means is symmetric and resembles a normal distribution. In contrast, the histogram of the sample variances is not normal; it is right-skewed. In fact, it can be proved mathematically that if the population is normal with mean μ and variance σ², then the sample mean is also normal, with mean μ and variance σ²/n, where n is the sample size. Why is the variance of the sample mean smaller than the variance of the population? Intuitively, the sample mean averages n individuals from the population, so it varies less than the individuals themselves. The Python script provided here demonstrates this concept: the cyan histogram represents the population, while the red one depicts the sample means.
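A numerical check of the σ²/n rule, using the lesson's population (mean 10, standard deviation 5) and sample size 30. A sketch assuming a seeded NumPy generator; the number of simulated samples (10,000) is our choice.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 10.0, 5.0, 30

population = rng.normal(mu, sigma, size=100_000)
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print("population variance:", population.var(ddof=1))         # about 25
print("variance of sample means:", sample_means.var(ddof=1))  # about 25 / 30
print("theory sigma^2 / n:", sigma**2 / n)
```

The variance of the sample means comes out close to 0.83, roughly 30 times smaller than the population variance.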
What if the population is not normal? According to the central limit theorem, if the sample size is large enough, the distribution of the sample mean is approximately normal. Thus, even if the population is not normal, the sample mean is approximately normally distributed when the sample size is large enough. Here is an example of the distribution of sample means when the population is not normal. We use a DataFrame named 'apop' to store the population, which consists of five values, ones and zeros ([1, 0, 1, 0, 1]). If we generate 100,000 samples with a small sample size of 10, the resulting histogram of sample means does not look like a normal distribution. However, if we generate 100,000 samples with a large sample size of 2,000, the distribution of the sample mean now looks like a normal distribution.
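A sketch of the same experiment without building a 100,000 x 2,000 array. Drawing with replacement from [1, 0, 1, 0, 1] gives a 1 with probability 0.6, so the sum of one sample of size n is Binomial(n, 0.6); we swap in NumPy's `Generator.binomial` for `DataFrame.sample` on that basis. The helper name `sample_means` is our own.

```python
import numpy as np

rng = np.random.default_rng(5)
apop = np.array([1, 0, 1, 0, 1])   # population from the demo
p = apop.mean()                    # share of ones: 0.6


def sample_means(size, trials=100_000):
    """Means of `trials` samples of `size` drawn with replacement from apop."""
    # The sum of one sample is Binomial(size, p); divide by size for the mean
    return rng.binomial(size, p, size=trials) / size


small = sample_means(10)
large = sample_means(2000)

print("distinct means, n=10:", np.unique(small).size)   # at most 11 values
print("std of means, n=2000:", large.std(ddof=1))
print("theory sqrt(p(1-p)/n):", np.sqrt(p * (1 - p) / 2000))
```

With n = 10 only 11 distinct means are possible, which is why that histogram looks discrete; with n = 2,000 the means are centered on 0.6 with the spread the central limit theorem predicts.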
In this video, we focused on the distribution of the sample mean and the probability rules governing its variation. We will apply these concepts in the next two videos to explore two important statistical tools for quantitative analysis: confidence intervals and hypothesis testing.
3 years ago | [YT] | 0