Ok, so this post is more of a tutorial for anyone facing similar issues using a storage volume on EC2. As a Linux/Ubuntu newbie, figuring out how to write files from Python to that 1TB EC2 SSD storage volume was one of my more frustrating experiences. The solution is actually quite simple, so here it is if you’re looking to store large files on EC2.
First, the storage volumes you attach are probably not formatted or mounted. To find out whether they are, use the command df -h
This is the disk free command, which reports disk space usage for each mounted file system; the -h option prints the sizes in human-readable units (K, M, G). If your storage volume is formatted and mounted, it will show up in the list (as /dev/xvdb does in my case).
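If you would rather check free space from Python than from the shell, the standard library can report roughly the same numbers. This is only a minimal sketch: format_size and report_usage are helper names I made up for illustration, and the path "/" is a placeholder for whatever mount point you care about.

```python
import shutil

def format_size(num_bytes):
    """Render a byte count in human-readable units, similar in spirit to df -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit that fits, or at T as the largest unit handled here.
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024

def report_usage(path):
    """Return (total, used, free) as strings for the file system containing path."""
    usage = shutil.disk_usage(path)
    return format_size(usage.total), format_size(usage.used), format_size(usage.free)

if __name__ == "__main__":
    # "/" is only a placeholder; substitute your own mount point.
    total, used, free = report_usage("/")
    print(f"total={total} used={used} free={free}")
```

Note that shutil.disk_usage only sees file systems that are already mounted, just like df.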
If the volume isn’t formatted and mounted, it won’t appear in the df output, but you can still list it using the lsblk (list block devices) command. The unmounted volume will not have a mount point (as is the case for /dev/xvdc here).
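The same check can be scripted. The sketch below assumes your lsblk supports JSON output (the --json flag in reasonably recent util-linux versions) and that the output has the usual "blockdevices" / "children" / "mountpoint" layout. The function names and the sample data are made up for illustration, and a missing mount point does not by itself tell you whether the device is formatted.

```python
import json
import subprocess

def unmounted_devices(lsblk_json):
    """Return names of block devices that have no mount point and no child partitions."""
    found = []

    def walk(devices):
        for dev in devices:
            children = dev.get("children", [])
            if children:
                # A disk with partitions: look at the partitions instead.
                walk(children)
            elif not dev.get("mountpoint"):
                found.append(dev["name"])

    walk(lsblk_json.get("blockdevices", []))
    return found

def read_lsblk():
    """Run lsblk and parse its JSON output (assumes --json is supported)."""
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,MOUNTPOINT"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

# Made-up sample shaped like lsblk's JSON output, not real output from my instance.
sample = {
    "blockdevices": [
        {"name": "xvda", "mountpoint": None,
         "children": [{"name": "xvda1", "mountpoint": "/"}]},
        {"name": "xvdb", "mountpoint": "/mnt"},
        {"name": "xvdc", "mountpoint": None},
    ]
}
print(unmounted_devices(sample))
```

On a real instance you would call unmounted_devices(read_lsblk()) instead of passing in the sample.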
Ok, so now we know the available disks that have yet to be mounted. There are 3 steps to mounting this disk.
1. Create a file system on the device using the command below (with the appropriate device name of course)
sudo mkfs -t ext4 /dev/xvdc
Note that sudo runs the command as an administrator (root), mkfs is make file system, and the -t option specifies the file system type, in this case ext4 (a Linux standard). Be careful: mkfs wipes whatever is already on the device, so only run it on a new, empty volume.
2. Make the new directory. Simply mkdir an appropriate destination name as below
sudo mkdir /mnt/data2
3. Mount the file system on the new directory
sudo mount /dev/xvdc /mnt/data2
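If you set up volumes often, the three steps above can be collected into one helper. This is only a sketch: mount_commands is a hypothetical name, and it deliberately builds and prints the commands rather than running them, since mkfs is destructive and you will want to double check the device name first.

```python
import shlex

def mount_commands(device, mount_point, fs_type="ext4"):
    """Build the three commands from the steps above without running them."""
    return [
        ["sudo", "mkfs", "-t", fs_type, device],   # step 1: create the file system
        ["sudo", "mkdir", mount_point],            # step 2: create the mount directory
        ["sudo", "mount", device, mount_point],    # step 3: mount the device there
    ]

# Print the commands for review; nothing is executed here.
for cmd in mount_commands("/dev/xvdc", "/mnt/data2"):
    print(shlex.join(cmd))
```

Once you are happy with what is printed, each list could be handed to subprocess.run(cmd, check=True).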
So now that your storage is mounted, you’re ready to go, right? Not so fast. If you try to make any kind of change to this volume from IPython, you will get a permission error.
However, note that if you try to mkdir on the volume from the Ubuntu shell, this is perfectly okay as long as you use sudo mkdir. So the problem is that IPython’s user isn’t granted write access; only root can make changes on the new device. Python/IPython actually runs as the user ubuntu, under which we’ve installed Anaconda (you can confirm this in the output of the “top” command).
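You can confirm this diagnosis from inside Python itself by comparing the user running the interpreter with the owner of the mount directory. A minimal sketch, assuming a Unix system (the pwd module is not available on Windows); describe_access and user_name are names I made up for illustration.

```python
import os
import pwd  # Unix only

def user_name(uid):
    """Look up a user name for a numeric uid, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)

def describe_access(path):
    """Report who is running Python, who owns path, and whether we can write to it."""
    st = os.stat(path)
    return {
        "current_user": user_name(os.getuid()),
        "owner": user_name(st.st_uid),
        "writable": os.access(path, os.W_OK),
    }
```

In the situation described here, describe_access("/mnt/data2") would be expected to report root as the owner, ubuntu as the current user, and writable as False.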
To fix this, we can simply change the owner of the mounted drive with the following command:
sudo chown -R ubuntu:ubuntu /mnt/data2
Note that chown is change owner, and the -R option makes it recursive, so that the subfolders are changed as well.
Now let’s try to mkdir storage without the sudo command…success!
Finally, let’s try to change the folder from the IPython Notebook again...
And there you go. Now you can store away all the things you want from the IPython Notebook. In our case, we are saving ~1 TB of data crawled from the web.
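For completeness, here is roughly what writing to the volume from Python can look like once the ownership is fixed. This is a sketch, not our actual crawler code: save_bytes is a hypothetical helper, and the file names in the usage note are placeholders.

```python
import os

def save_bytes(base_dir, relative_name, data):
    """Write data under base_dir, creating subfolders as needed; return the full path."""
    # Fail early with a clear message if the chown step was skipped.
    if not os.access(base_dir, os.W_OK):
        raise PermissionError(f"{base_dir} is not writable by the current user")
    target = os.path.join(base_dir, relative_name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

For example, save_bytes("/mnt/data2", "crawl/page-0001.html", html_bytes) would write one page into a crawl subfolder on the mounted volume, where html_bytes stands for whatever bytes you downloaded.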