
#1 2018-03-28 12:54:23

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,579

Stupid simple dynamic python web pages

I recently switched from Apache to nginx for my own server and have since been tinkering with all the possibilities it opens up for fcgi and/or proxying to other servers.  I've really wanted a simple way to serve dynamic python-generated pages.

I've experimented with many of the major players in wsgi (e.g., CherryPy and friends), and while these worked very well, they all felt quite bloated for my purposes.  They ran substantial daemon processes and could do all sorts of things I had no need for.  It was also not trivial to set them up so that python scripts in my server's document root would, when "visited", be interpreted to generate html output.  Clearly this goal is influenced by my background with Apache (a file-based server) and php (a script file generates an html 'file'), while nginx and wsgi remove these constraints.  But despite not being constrained to doing just that, I wanted to be able to do at least that, and none of the popular wsgi servers were made for this.

So I started writing a couple of {f,s}cgi servers from scratch to be sure I understood the inner workings.  My own custom servers were a fun learning experience, but I'd not likely want to use them on a daily basis: while the daemon itself could be ridiculously light, it had to spawn an entirely separate process for every page visit.  Basically they worked like cgi dispatchers that invoked python (or any other interpreter) to turn a script into an HTTP response to send back to nginx.

Then I stumbled upon the perfect middle ground.  Python's standard library has a wsgi server built in, and it does more than I need.  With just over a dozen lines of python, I have a wsgi server that can serve up individual python files.

The WSGI Server:

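#!/bin/env python3

# Note: the imp module is deprecated and was removed in Python 3.12
from imp import load_source
from wsgiref.simple_server import make_server

def serve(env, start):
    # Defaults: a plain 200 response with an html content type
    status = '200 OK'
    headers = [('Content-type', 'text/html; charset=utf-8')]
    try:
        # nginx supplies the document root via the X-Document-Root header
        fname = env['HTTP_X_DOCUMENT_ROOT'] + env['PATH_INFO']
        # Load the requested .wsgi file as a module and let its main() build the page
        script = load_source('script', fname)
        content = script.main(env, status, headers)
    except Exception:
        # Any failure (missing file, no main(), error in the script) redirects to the site root
        status = '301 Moved Permanently'
        headers = [('Location', 'http://' + env['HTTP_HOST'])]
        content = ''
    start(status, headers)
    return [content.encode('utf-8')]

# Listen on the loopback interface only; nginx proxies requests here
httpd = make_server('localhost', 8080, serve)
httpd.serve_forever()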
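The `imp` module used above has been deprecated since Python 3.4 and was removed in Python 3.12, so on a current python that import fails.  Below is a minimal sketch of a stand-in `load_source` built on `importlib`; the assumption is that swapping the `from imp import load_source` line for this function is the only change the server needs.

```python
import importlib.machinery
import importlib.util


def load_source(name, path):
    """Import the Python source file at `path` as a module called `name`.

    Stand-in for imp.load_source. The loader is passed explicitly because a
    file ending in .wsgi is not recognised as Python source by its extension.
    """
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code; the module is not added to sys.modules
    loader.exec_module(module)
    return module
```

Each request re-executes the file, just like the `imp` version, so edits to a .wsgi script show up on the next page load without restarting the server.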

Nginx proxy config example:

location ~ \.wsgi$ {
   proxy_pass http://localhost:8080;
   proxy_redirect     off;
   proxy_set_header   Host $host;
   proxy_set_header   X-Real-IP $remote_addr;
   proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_set_header   X-Forwarded-Host $server_name;
   proxy_set_header   X-Document-Root '/srv/http';
}

With these, any file ending in .wsgi is interpreted by python to generate html.  A simple example, '/srv/http/hello.wsgi':

#!/bin/env python3

def main(env, status, headers):
    return 'Hello World'

The individual .wsgi files need to have a 'main' function that accepts 3 parameters.  It is free to ignore all of them and simply return html content as a string.  The first parameter (env above) is the wsgi environment dictionary, the second is the status to be returned (e.g. to return an http error code), and the third is a list of the HTTP response headers with a minimal default, which can be modified as needed or just left alone to simply return html output.
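To illustrate that calling convention, here is a sketch of a slightly less trivial .wsgi script.  It reads a `name` parameter from the query string via the env dictionary, appends a header to the shared list, and escapes the user-supplied value before putting it into html.  The `name` parameter and the `Cache-Control` header are just illustrative choices, not anything the server requires.

```python
from html import escape
from urllib.parse import parse_qs


def main(env, status, headers):
    # parse_qs returns a list of values for each key, e.g. {'name': ['Bob']}
    params = parse_qs(env.get('QUERY_STRING', ''))
    name = params.get('name', ['World'])[0]
    # Appending to the list modifies the same object the server holds
    headers.append(('Cache-Control', 'no-cache'))
    # Never put raw user input into html
    return '<p>Hello %s</p>' % escape(name)
```

Visiting /hello.wsgi?name=Bob would then produce `<p>Hello Bob</p>`.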

I may need to change how 'status' is passed to the .wsgi scripts.  I've always been annoyed at the python documentation on how function parameters are passed, as it seems to butcher the terminology of by-value and by-reference so that those words don't mean what they mean in any other language.  The header list is definitely passed by reference (so headers can be modified in the .wsgi script), but status is a string, and strings are immutable, so reassigning it inside a .wsgi script has no effect on the server.  I think I need to adjust the above code a bit for changes to status to take effect.  Input from python experts welcome: I know I could pass status as a single-item list, which would then work, but it'd be a bit ugly.  Is there a better way to have a mutable string parameter passed by reference?
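The behaviour described above is easy to demonstrate: python passes references to objects, so rebinding a parameter only changes the function's local name, while mutating a list changes the object the caller also holds.  One possible alternative to the single-item list is sketched below, assuming it is acceptable to change the `main` contract: `main` may return either a plain string (status stays 200) or a `(status, content)` tuple.  The `call_main` helper is hypothetical; in the server it would replace the direct `script.main(...)` call.

```python
def demo_main(env, status, headers):
    status = '404 Not Found'           # rebinds the local name only
    headers.append(('X-Demo', 'yes'))  # mutates the list the caller also holds
    return 'Not found'


caller_status = '200 OK'
caller_headers = []
demo_main({}, caller_status, caller_headers)
# caller_status is still '200 OK'; caller_headers now holds one entry


def call_main(main, env):
    """Hypothetical convention: main() returns a str (keep '200 OK')
    or a (status, content) tuple to override the status."""
    status = '200 OK'
    headers = [('Content-type', 'text/html; charset=utf-8')]
    result = main(env, status, headers)
    if isinstance(result, tuple):
        status, content = result
    else:
        content = result
    return status, headers, content
```

This keeps existing scripts like hello.wsgi working unchanged, since a plain string return still means "200 OK".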

The other caveat is that this is "stupid simple", and it would be stupid to use it for non-simple environments.  This is made to fill the gap for very low traffic servers that serve almost exclusively static content but want the ability to serve up dynamic python-generated pages on occasion.

(edit: several typos down, several more to go.)

Last edited by Trilby (2018-03-28 13:04:12)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#2 2018-03-28 13:59:11

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,808

Re: Stupid simple dynamic python web pages

Nice.  I too have been immersed in nginx and cherrypy.  What is your gut feel as to the security of your approach?   The cherrypy solution seems to be rock solid, especially when tucked behind nginx.  Any chance an adversary could break out of HTTP_X_DOCUMENT_ROOT with a well-crafted request?   Also, I've not looked into this yet, but what about RESTful APIs?  Or is that getting beyond what you are up to?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way


#3 2018-03-28 14:21:55

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,579

Re: Stupid simple dynamic python web pages

This is still behind nginx, as the python server only listens for local connections.  As for backing out of the document root, even if this were possible, nginx will only forward requests for file names ending in .wsgi, so at worst an attacker could cause the server to import a file from outside the document root if it also had a .wsgi extension.  Even then the exception handler will catch it if this is not an importable python file, and it will also fire if said file does not have a main function that accepts 3 parameters (although any top-level code in an importable file would already have run by that point).

That said, the individual .wsgi files can do whatever they want (or anything the user the wsgi server runs as can do).  There is no equivalent of 'open_basedir' here.

So, that is a vanishingly small attack surface.  But of course I wasn't happy with that, so I tested whether even that was possible.  I added the following line to the top of the serve function to monitor what files it would even try to import:

print(env['HTTP_X_DOCUMENT_ROOT'] + env['PATH_INFO'])

I tried many variations of "../" paths to back out to a parent directory.  In every case the resulting "filename" was within the document root.

I can't find anything in PEP 3333 or the wsgiref documentation suggesting that the PATH_INFO should/would be sanitized in such a way, so I'd speculate that perhaps nginx is doing this.  Later today I'll fire up some of my own experimental servers to identify exactly where the path is restricted - but it seems that even that vanishingly small attack surface is not accessible with wsgiref behind nginx.
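Rather than relying on nginx (or the client) to normalize the path, the server could also check containment itself.  A minimal sketch, assuming a hypothetical `resolve_script` helper that would replace the string concatenation in `serve`: it resolves `..` components and symlinks with `os.path.realpath` and rejects anything outside the document root or not ending in .wsgi.

```python
import os.path


def resolve_script(doc_root, path_info, ext='.wsgi'):
    """Map PATH_INFO onto a file under doc_root.

    Returns the resolved absolute path, or None if the path escapes doc_root
    (via '..' or symlinks) or does not end in `ext`.
    """
    root = os.path.realpath(doc_root)
    # lstrip('/') so os.path.join does not treat path_info as an absolute path
    candidate = os.path.realpath(os.path.join(root, path_info.lstrip('/')))
    if os.path.commonpath([root, candidate]) != root:
        return None
    if not candidate.endswith(ext):
        return None
    return candidate
```

In `serve`, a `None` result could simply raise an exception so the existing except branch handles it.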

As for RESTful APIs, that's a buzzword I've never bothered to learn much about, as every time I have tried to learn about it, it seemed to be all buzzword and no content.  The wsgiref server can accept and return any form of HTTP payload (json, xml, image, whatever you want).  The env dictionary passed to the .wsgi script contains all the WSGI environment variables, including the request method, query string and request headers, plus 'wsgi.input', a file-like object for reading the request body.  So with the above setup, the .wsgi script could receive any valid HTTP request and respond with any HTTP response.  The one exception is that my dozen-or-so line server has utf-8 encoding hard-coded into it, but that would be trivially easy to change if needed.
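As an example of a non-html payload, here is a sketch of a .wsgi script that accepts a JSON request body and answers with JSON, using only the same main(env, status, headers) convention.  It assumes the client sends valid JSON with a Content-Length header (invalid JSON would raise, which the server above turns into its redirect).  Replacing the contents of `headers` in place, rather than rebinding the name, is what lets the server see the new content type.

```python
import json


def main(env, status, headers):
    # Content-Length may be missing or empty, e.g. for a GET request
    try:
        length = int(env.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # wsgi.input is a file-like object wrapping the request body
    body = env['wsgi.input'].read(length) if length > 0 else b''
    data = json.loads(body) if body else {}
    # Slice assignment mutates the shared list instead of rebinding the name
    headers[:] = [('Content-type', 'application/json; charset=utf-8')]
    reply = {'method': env.get('REQUEST_METHOD', 'GET'), 'received': data}
    return json.dumps(reply)
```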

If you need anything other than HTTP, then you'd need to use a different nginx config, like a 'stream' block.  I had my own gopher server running behind an nginx stream for a bit until I realized that was pretty silly: nginx would forward all port 70 traffic to my gopher server as-is, so it didn't do anything useful as a middleman, and I just had my gopher server listen on port 70 itself.  All nginx could really add might be encryption, but gopher over TLS would be rather pointless ... no gopher client could connect.

Last edited by Trilby (2018-03-28 14:31:53)



#4 2018-03-28 14:37:13

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,808

Re: Stupid simple dynamic python web pages

I usually take RESTful to mean that the thing that identifies the 'app' is not necessarily at the end of the URL.  For example, http://xyz.com/musicCatalog/artists/BobMarley might be the URL, but the infrastructure finds musicCatalog and passes it ['artists','BobMarley'] and the request (POST, GET, ...); OTOH http://xyz.com/musicCatalog/album/DarkSideOfTheMoon would call musicCatalog with ['album','DarkSideOfTheMoon'] and the request.

So, to answer my own pondering, it should not be a major issue. You would simply look at PATH_INFO, truncate as appropriate, and pass the trimmed part to the python script.
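For file-based scripts, that trimming could look something like the following.  This is a hypothetical `find_script` helper, assuming URLs like /musicCatalog/artists/BobMarley map onto a musicCatalog.wsgi file in the document root (which would also need an nginx location that does not require the .wsgi suffix); the leftover segments are returned as a list for the script to interpret.

```python
import os.path


def find_script(doc_root, path_info, ext='.wsgi'):
    """Split PATH_INFO into (script_path, remaining_segments).

    Walks the path from the left and stops at the first prefix that, with
    `ext` appended, names an existing file under doc_root. Returns
    (None, []) when no prefix matches.
    """
    segments = [s for s in path_info.split('/') if s]
    # Refuse relative components outright rather than trying to resolve them
    if any(s in ('.', '..') for s in segments):
        return None, []
    for i in range(1, len(segments) + 1):
        candidate = os.path.join(doc_root, *segments[:i]) + ext
        if os.path.isfile(candidate):
            return candidate, segments[i:]
    return None, []
```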

In any event, I bookmarked this thread for the next time I need something simple like this.  Parting thought:  I don't expect this to scale well, but I think you indicated that in your first post.



#5 2018-03-28 14:52:23

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,579

Re: Stupid simple dynamic python web pages

Ah, yes.  The version in the OP is for file/path-based python scripts as that was what I felt was missing.

What you described above is actually even easier with wsgiref.  My first versions of this script worked as you described (which is much more like CherryPy and others).  But this wasn't what I was looking for, and the 'imp' module actually was the missing piece for me.

To use something much like my python script above for a URL-based dispatcher as you describe, you would likely want to use `shift_path_info` from wsgiref.util.  You can pop one level at a time off the front of the URI and match that against a dict (or other dispatch method of your choosing) to pass control to the right handler.
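A minimal sketch of that kind of dispatcher, using the musicCatalog example from above.  The `music_catalog` handler and the `ROUTES` table are hypothetical; `shift_path_info` is the wsgiref.util function, which moves one path segment from PATH_INFO to SCRIPT_NAME and returns it (or None once PATH_INFO is empty).

```python
from wsgiref.util import shift_path_info


def music_catalog(env, args):
    # Hypothetical handler: args holds the segments left after 'musicCatalog'
    return 'musicCatalog called with %r via %s' % (args, env.get('REQUEST_METHOD', 'GET'))


# Dispatch table keyed on the first path segment
ROUTES = {'musicCatalog': music_catalog}


def app(env, start):
    # Pops the first segment, e.g. 'musicCatalog'; None if the path is empty
    handler = ROUTES.get(shift_path_info(env))
    if handler is None:
        start('404 Not Found', [('Content-type', 'text/plain; charset=utf-8')])
        return [b'Not found']
    args = []
    while True:
        part = shift_path_info(env)
        # None once PATH_INFO is exhausted; '' for a trailing slash
        if not part:
            break
        args.append(part)
    content = handler(env, args)
    start('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
    return [content.encode('utf-8')]
```

It could be served with `make_server('localhost', 8080, app)`, just like the file-based version.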

As I played with that approach, I didn't want all URIs to end in .wsgi, so my nginx block was a bit different.

As much as I liked CherryPy, the more I work with wsgiref, the more I realize that CherryPy doesn't seem to add any functionality, just some syntactic sugar.

Last edited by Trilby (2018-03-28 14:53:52)

