File Uploads
When Django handles a file upload, the file data ends up placed in request.FILES (for more on the request object see the documentation for request and response objects). This document explains how files are stored on disk and in memory, and how to customize the default behavior.
Basic file uploads
Consider a simple form containing a FileField:
from django import forms
class UploadFileForm(forms.Form):
    title = forms.CharField(max_length=50)
    file = forms.FileField()
A view handling this form will receive the file data in request.FILES, which is a dictionary containing a key for each FileField (or ImageField, or other FileField subclass) in the form. So the data from the above form would be accessible as request.FILES['file'].
Note that request.FILES will only contain data if the request method was POST and the <form> that posted the request has the attribute enctype="multipart/form-data". Otherwise, request.FILES will be empty.
Most of the time, you’ll simply pass the file data from request into the form as described in Binding uploaded files to a form. This would look something like:
from django.http import HttpResponseRedirect
from django.shortcuts import render
from .forms import UploadFileForm
# Imaginary function to handle an uploaded file.
from somewhere import handle_uploaded_file
def upload_file(request):
    if request.method == 'POST':
        # Bind both the POST data and the uploaded files to the form.
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            handle_uploaded_file(request.FILES['file'])
            return HttpResponseRedirect('/success/url/')
    else:
        form = UploadFileForm()
    return render(request, 'upload.html', {'form': form})
Notice that we have to pass request.FILES into the form’s constructor; this is how file data gets bound into a form.
Handling uploaded files
- class UploadedFile
The final piece of the puzzle is handling the actual file data from request.FILES. Each entry in this dictionary is an UploadedFile object – a simple wrapper around an uploaded file. You’ll usually use one of these methods to access the uploaded content:
- read()
Read the entire uploaded data from the file. Be careful with this method: if the uploaded file is huge it can overwhelm your system if you try to read it into memory. You’ll probably want to use chunks() instead; see below.
- multiple_chunks()
Returns True if the uploaded file is big enough to require reading in multiple chunks. By default this will be any file larger than 2.5 megabytes, but that’s configurable; see below.
- chunks()
A generator returning chunks of the file. If multiple_chunks() is True, you should use this method in a loop instead of read().
In practice, it’s often easiest simply to use chunks() all the time; see the example below.
- name
The name of the uploaded file (e.g. my_file.txt).
- size
The size, in bytes, of the uploaded file.
There are a few other methods and attributes available on UploadedFile objects; see UploadedFile objects for a complete reference.
Putting it all together, here’s a common way you might handle an uploaded file:
def handle_uploaded_file(f):
    with open('some/file/name.txt', 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)
Looping over UploadedFile.chunks() instead of using read() ensures that large files don’t overwhelm your system’s memory.
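As a variation on the function above, the following sketch streams an upload to a path chosen by the caller and computes a SHA-256 digest in the same pass, so the data is never held in memory all at once. It relies only on the chunks() method described earlier. The names save_with_digest and FakeUpload are hypothetical: FakeUpload is a stand-in with a chunks() method so that the snippet runs outside a Django view, and it is not part of Django.

```python
import hashlib
import os
import tempfile


def save_with_digest(f, destination_path):
    """Stream an upload to disk; return (bytes_written, sha256_hex)."""
    digest = hashlib.sha256()
    written = 0
    with open(destination_path, "wb") as destination:
        # f only needs a chunks() method, as UploadedFile provides.
        for chunk in f.chunks():
            destination.write(chunk)
            digest.update(chunk)
            written += len(chunk)
    return written, digest.hexdigest()


class FakeUpload:
    """Hypothetical stand-in for an UploadedFile, for demonstration only."""

    def __init__(self, data, chunk_size=4):
        self._data = data
        self._chunk_size = chunk_size

    def chunks(self):
        # Yield the data in fixed-size pieces, like a chunked upload.
        for i in range(0, len(self._data), self._chunk_size):
            yield self._data[i:i + self._chunk_size]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "upload.bin")
    size, checksum = save_with_digest(FakeUpload(b"hello world"), path)
    print(size, checksum)
```

In a real view you would pass request.FILES['file'] instead of the stand-in object.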
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold the entire contents of the upload in memory. This means that saving the file involves only a read from memory and a write to disk and thus is very fast.
However, if an uploaded file is too large, Django will write the uploaded file to a temporary file stored in your system’s temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload. If an upload is large enough, you can watch this file grow in size as Django streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable defaults”. Read on for details on how you can customize or completely replace upload behavior.
Changing upload handler behavior
Four settings control Django’s file upload behavior:
- FILE_UPLOAD_MAX_MEMORY_SIZE
The maximum size, in bytes, for files that will be uploaded into memory. Files larger than FILE_UPLOAD_MAX_MEMORY_SIZE will be streamed to disk.
Defaults to 2.5 megabytes.
- FILE_UPLOAD_TEMP_DIR
The directory where uploaded files larger than FILE_UPLOAD_MAX_MEMORY_SIZE will be stored.
Defaults to your system’s standard temporary directory (i.e. /tmp on most Unix-like systems).
- FILE_UPLOAD_PERMISSIONS
The numeric mode (e.g. 0644) to apply to newly uploaded files. For more information about what these modes mean, see the documentation for os.chmod().
If this isn’t given or is None, you’ll get operating-system dependent behavior. On most platforms, temporary files will have a mode of 0600, and files saved from memory will be saved using the system’s standard umask.
Warning
If you’re not familiar with file modes, please note that the leading 0 is very important: it indicates an octal number, which is the way that modes must be specified. If you use 644 instead, it is read as a decimal number and you’ll get totally incorrect behavior.
Always prefix the mode with a 0. In Python 3, where a bare leading 0 is a syntax error, write the octal literal as 0o644.
- FILE_UPLOAD_HANDLERS
The actual handlers for uploaded files. Changing this setting allows complete customization – even replacement – of Django’s upload process. See upload handlers, below, for details.
Defaults to:
("django.core.files.uploadhandler.MemoryFileUploadHandler", "django.core.files.uploadhandler.TemporaryFileUploadHandler",)
Which means “try to upload to memory first, then fall back to temporary files.”
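As an illustration, a settings module that overrides all four settings might look like the fragment below. Every value is an assumption chosen for the example (the 5 MB threshold, the directory path, and the mode are not recommendations), and the handler tuple simply repeats the default shown above.

```python
# settings.py (fragment). All values here are illustrative assumptions.

# Keep uploads of up to 5 MB in memory (the default is 2.5 MB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 5 * 1024 * 1024

# Hypothetical directory for larger uploads; it must exist and be writable.
FILE_UPLOAD_TEMP_DIR = "/var/tmp/django_uploads"

# Octal mode rw-r--r--. Python 3 spells octal literals with a 0o prefix.
FILE_UPLOAD_PERMISSIONS = 0o644

# The default handlers, listed explicitly: memory first, then temporary files.
FILE_UPLOAD_HANDLERS = (
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
)
```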
Handling uploaded files with a model
If you’re saving a file on a Model with a FileField, using a ModelForm makes this process much easier. The file object will be saved to the location specified by the upload_to argument of the corresponding FileField when calling form.save():
from django.http import HttpResponseRedirect
from django.shortcuts import render
from .forms import ModelFormWithFileField
def upload_file(request):
    if request.method == 'POST':
        form = ModelFormWithFileField(request.POST, request.FILES)
        if form.is_valid():
            # file is saved
            form.save()
            return HttpResponseRedirect('/success/url/')
    else:
        form = ModelFormWithFileField()
    return render(request, 'upload.html', {'form': form})
If you are constructing an object manually, you can simply assign the file object from request.FILES to the file field in the model:
from django.http import HttpResponseRedirect
from django.shortcuts import render
from .forms import UploadFileForm
from .models import ModelWithFileField
def upload_file(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            instance = ModelWithFileField(file_field=request.FILES['file'])
            instance.save()
            return HttpResponseRedirect('/success/url/')
    else:
        form = UploadFileForm()
    return render(request, 'upload.html', {'form': form})
UploadedFile objects
In addition to those inherited from File, all UploadedFile objects define the following methods/attributes:
- UploadedFile.content_type
The content-type header uploaded with the file (e.g. text/plain or application/pdf). Like any data supplied by the user, you shouldn’t trust that the uploaded file is actually this type. You’ll still need to validate that the file contains the content that the content-type header claims – “trust but verify.”
- UploadedFile.charset
For text/* content-types, the character set (e.g. utf8) supplied by the browser. Again, “trust but verify” is the best policy here.
- UploadedFile.temporary_file_path()
Only files uploaded onto disk will have this method; it returns the full path to the temporary uploaded file.
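Because only disk-backed uploads have temporary_file_path(), its presence can be used to tell the two storage cases apart. The sketch below shows one possible way to do that: it copies the temporary file directly when the upload is already on disk, and otherwise writes the in-memory data chunk by chunk. The function copy_upload and both stand-in classes are hypothetical names introduced here so that the snippet runs without Django.

```python
import os
import shutil
import tempfile


def copy_upload(f, destination_path):
    """Copy an upload to destination_path; return 'disk' or 'memory'."""
    if hasattr(f, "temporary_file_path"):
        # Already on disk: copy the temporary file directly.
        shutil.copyfile(f.temporary_file_path(), destination_path)
        return "disk"
    # Held in memory: write it out chunk by chunk.
    with open(destination_path, "wb") as destination:
        for chunk in f.chunks():
            destination.write(chunk)
    return "memory"


class InMemoryStandIn:
    """Hypothetical stand-in for an upload held in memory."""

    def __init__(self, data):
        self._data = data

    def chunks(self):
        yield self._data


class OnDiskStandIn:
    """Hypothetical stand-in for an upload streamed to a temporary file."""

    def __init__(self, path):
        self._path = path

    def temporary_file_path(self):
        return self._path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "copy.bin")
    print(copy_upload(InMemoryStandIn(b"small upload"), target))
```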
Note
Like regular Python files, you can read the file line-by-line simply by iterating over the uploaded file:
for line in uploadedfile:
    do_something_with(line)
However, unlike standard Python files, UploadedFile only understands \n (also known as “Unix-style”) line endings. If you know that you need to handle uploaded files with different line endings, you’ll need to do so in your view.
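One way to handle other line endings is to split the raw chunks yourself. The following sketch accepts any iterable of byte strings, such as the result of uploadedfile.chunks(), and yields lines terminated by LF, CRLF, or a bare CR. The helper name iter_lines is hypothetical, and the approach is only one of several you could take in a view.

```python
def iter_lines(chunks):
    """Yield lines (terminators removed) from an iterable of byte chunks.

    Accepts LF, CRLF, and bare CR line endings, including a CRLF pair
    that is split across two chunks.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        held = b""
        if pending.endswith(b"\r"):
            # A trailing CR may be the first half of a CRLF; decide later.
            pending, held = pending[:-1], b"\r"
        normalized = pending.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        lines = normalized.split(b"\n")
        # The last piece is an unfinished line; keep it for the next chunk.
        pending = lines.pop() + held
        for line in lines:
            yield line
    # Flush whatever is left once the data is exhausted.
    if pending.endswith(b"\r"):
        yield pending[:-1]
    elif pending:
        yield pending


if __name__ == "__main__":
    for line in iter_lines([b"first\r", b"\nsecond\rthird\n", b"fourth"]):
        print(line)
```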
Upload Handlers
When a user uploads a file, Django passes off the file data to an upload handler – a small class that handles file data as it gets uploaded. Upload handlers are initially defined in the FILE_UPLOAD_HANDLERS setting, which defaults to:
("django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler",)
Together the MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django’s default file upload behavior of reading small files into memory and large ones onto disk.
You can write custom handlers that customize how Django handles files. You could, for example, use custom handlers to enforce user-level quotas, compress data on the fly, render progress bars, and even send data to another storage location directly without storing it locally.
Modifying upload handlers on the fly
Sometimes particular views require different upload behavior. In these cases, you can override upload handlers on a per-request basis by modifying request.upload_handlers. By default, this list will contain the upload handlers given by FILE_UPLOAD_HANDLERS, but you can modify the list as you would any other list.
For instance, suppose you’ve written a ProgressBarUploadHandler that provides feedback on upload progress to some sort of AJAX widget. You’d add this handler to your upload handlers like this:
request.upload_handlers.insert(0, ProgressBarUploadHandler())
You’d probably want to use list.insert() in this case (instead of append()) because a progress bar handler would need to run before any other handlers. Remember, the upload handlers are processed in order.
If you want to replace the upload handlers completely, you can just assign a new list:
request.upload_handlers = [ProgressBarUploadHandler()]
Note
You can only modify upload handlers before accessing request.POST or request.FILES – it doesn’t make sense to change upload handlers after upload handling has already started. If you try to modify request.upload_handlers after reading from request.POST or request.FILES, Django will raise an error.
Thus, you should always modify upload handlers as early in your view as possible.
Also, request.POST is accessed by CsrfViewMiddleware which is enabled by default. This means you will need to use csrf_exempt() on your view to allow you to change the upload handlers. You will then need to use csrf_protect() on the function that actually processes the request. Note that this means that the handlers may start receiving the file upload before the CSRF checks have been done. Example code:
from django.views.decorators.csrf import csrf_exempt, csrf_protect
@csrf_exempt
def upload_file_view(request):
    request.upload_handlers.insert(0, ProgressBarUploadHandler())
    return _upload_file_view(request)
@csrf_protect
def _upload_file_view(request):
    ... # Process request
Writing custom upload handlers
All file upload handlers should be subclasses of django.core.files.uploadhandler.FileUploadHandler. You can define upload handlers wherever you wish.
Required methods
Custom file upload handlers must define the following methods:
- FileUploadHandler.receive_data_chunk(self, raw_data, start)
Receives a “chunk” of data from the file upload.
raw_data is a byte string containing the uploaded data.
start is the position in the file where this raw_data chunk begins.
The data you return will get fed into the subsequent upload handlers’ receive_data_chunk methods. In this way, one handler can be a “filter” for other handlers.
Return None from receive_data_chunk to short-circuit the remaining upload handlers so that they never receive this chunk. This is useful if you’re storing the uploaded data yourself and don’t want future handlers to store a copy of the data.
If you raise a StopUpload or a SkipFile exception, the upload will abort or the file will be completely skipped.
- FileUploadHandler.file_complete(self, file_size)
Called when a file has finished uploading.
The handler should return an UploadedFile object that will be stored in request.FILES. Handlers may also return None to indicate that the UploadedFile object should come from subsequent upload handlers.
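The contract described above (each handler's return value feeds the next handler, and None stops the chain) can be modelled in a few lines of plain Python. This is a simplified sketch of the behavior as documented, not Django's parser code. The names feed_chunk, finish, UppercaseFilter, and Collector are hypothetical; real handlers subclass FileUploadHandler, and a real file_complete() returns an UploadedFile rather than raw bytes. The model also passes the same start offset to every handler, which is only accurate when filters do not change the data's length.

```python
def feed_chunk(handlers, raw_data, start):
    """Pass one chunk down the handler list; stop when a handler returns None."""
    for handler in handlers:
        raw_data = handler.receive_data_chunk(raw_data, start)
        if raw_data is None:
            break  # short-circuit: later handlers never see this chunk


def finish(handlers, file_size):
    """Return the first non-None result from the handlers' file_complete()."""
    for handler in handlers:
        result = handler.file_complete(file_size)
        if result is not None:
            return result
    return None


class UppercaseFilter:
    """Hypothetical filter: transforms the data and passes it on."""

    def receive_data_chunk(self, raw_data, start):
        return raw_data.upper()

    def file_complete(self, file_size):
        return None  # defer to a later handler


class Collector:
    """Hypothetical storing handler: keeps the data and stops the chain."""

    def __init__(self):
        self.parts = []

    def receive_data_chunk(self, raw_data, start):
        self.parts.append(raw_data)
        return None

    def file_complete(self, file_size):
        # Assumption for the sketch: return bytes instead of an UploadedFile.
        return b"".join(self.parts)


handlers = [UppercaseFilter(), Collector(), Collector()]
position = 0
for chunk in (b"abc", b"def"):
    feed_chunk(handlers, chunk, position)
    position += len(chunk)
result = finish(handlers, position)
```

Here the first Collector receives the filtered data and returns None, so the second Collector is never given any chunks.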
Optional methods
Custom upload handlers may also define any of the following optional methods or attributes:
- FileUploadHandler.chunk_size
Size, in bytes, of the “chunks” Django should store into memory and feed into the handler. That is, this attribute controls the size of chunks fed into FileUploadHandler.receive_data_chunk.
For maximum performance the chunk sizes should be divisible by 4 and should not exceed 2 GB (2^31 bytes) in size. When there are multiple chunk sizes provided by multiple handlers, Django will use the smallest chunk size defined by any handler.
The default is 64*2^10 bytes, or 64 KB.
- FileUploadHandler.new_file(self, field_name, file_name, content_type, content_length, charset)
Callback signaling that a new file upload is starting. This is called before any data has been fed to any upload handlers.
field_name is a string name of the file <input> field.
file_name is the unicode filename that was provided by the browser.
content_type is the MIME type provided by the browser, e.g. 'image/jpeg'.
content_length is the length of the file given by the browser. Sometimes this won’t be provided and will be None.
charset is the character set (e.g. utf8) given by the browser. Like content_length, this sometimes won’t be provided.
This method may raise a StopFutureHandlers exception to prevent future handlers from handling this file.
- FileUploadHandler.upload_complete(self)
Callback signaling that the entire upload (all files) has completed.
- FileUploadHandler.handle_raw_input(self, input_data, META, content_length, boundary, encoding)
Allows the handler to completely override the parsing of the raw HTTP input.
input_data is a file-like object that supports read()-ing.
META is the same object as request.META.
content_length is the length of the data in input_data. Don’t read more than content_length bytes from input_data.
boundary is the MIME boundary for this request.
encoding is the encoding of the request.
Return None if you want upload handling to continue, or a tuple of (POST, FILES) if you want to return the new data structures suitable for the request directly.
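To make the chunk_size rules above concrete, the sketch below restates them as two small functions: one picks the smallest chunk size declared by a set of handlers, and the other checks the performance guidance (divisible by 4, at most 2^31 bytes). The function names are hypothetical and the code mirrors the rules as stated here, not necessarily Django's exact implementation. Falling back to the default when no sizes are given is an assumption made for the example.

```python
DEFAULT_CHUNK_SIZE = 64 * 2 ** 10  # 64 KB, the documented default
MAX_CHUNK_SIZE = 2 ** 31           # 2 GB, the documented upper bound


def effective_chunk_size(chunk_sizes):
    """Return the smallest declared chunk size, or the default if none."""
    sizes = [size for size in chunk_sizes if size]
    return min(sizes) if sizes else DEFAULT_CHUNK_SIZE


def is_recommended(size):
    """Check the performance guidance: divisible by 4, at most 2 GB."""
    return 0 < size <= MAX_CHUNK_SIZE and size % 4 == 0


if __name__ == "__main__":
    print(effective_chunk_size([DEFAULT_CHUNK_SIZE, 4096, 1024]))
    print(is_recommended(1024), is_recommended(1023))
```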